Category

Language & NLP

Translation, summarization, extraction.

30 listings

Every Language & NLP listing on ai-supply is a free, open-source AI capability that we scan and grade A–D for security before it appears here — so you can adopt with confidence, not just a star count. This category spans 8 Skills, 6 Datasets, 6 Prompts, 4 Pipelines, 2 Evals, 2 Plugins, 1 Template and 1 Guardrail.

Security posture: 17 of 28 scanned rated safe · avg score 88/100.


The Natural Language Toolkit — Python's foundational NLP library for tokenization, POS tagging, parsing, and corpora.

! B · 75 · ⟳ 1mo ago · shell, network, fs

▣ Dataset

Hugging Face Datasets

Fast, memory-mapped dataset library for NLP and ML — 50,000+ datasets, streaming, and Arrow-backed processing.

✓ A · 100 · ⟳ 1mo ago · PII

Hugging Face Tokenizers

Ultra-fast tokenizer library (Rust core) — BPE, WordPiece, SentencePiece — tokenize GBs in seconds.

! B · 75 · ⟳ 3mo ago · shell, network, fs

Industrial-strength NLP library for Python with pre-trained pipelines for tokenization, NER, parsing, and more.

! B · 75 · ⟳ 4mo ago · fs

Hugging Face Accelerate

Run PyTorch training scripts on any hardware — single GPU, multi-GPU, TPU — with minimal code changes.

✓ A · 100 · ⟳ 1mo ago · shell, network, fs

❝ Prompt

Awesome ChatGPT Prompts

The largest open prompt collection — thousands of curated system prompts for personas, coding, writing, education, and creative tasks. CC0 licensed.

! B · 75 · ⟳ 3d ago · jailbreak

Modular topic modeling framework using transformer embeddings and c-TF-IDF for interpretable, coherent topics.

✓ A · 100 · ⟳ 7mo ago · network, fs

❝ Prompt

Prompt Engineering Guide

MIT-licensed comprehensive guide and prompt library — techniques, examples, and templates for every major LLM prompting method.

! B · 75 · ⟳ 4mo ago · jailbreak

Flair — State-of-the-Art NLP Framework

Simple NLP library with SOTA models for NER, POS tagging, chunking, text classification, and contextual string embeddings.

! B · 75 · ⟳ 1y ago · shell, network, fs

▣ Dataset

MIT-licensed 4.2M instruction dataset — GPT-4/3.5 augmented CoT traces that power top open-source fine-tunes.

⬡ Pipeline

Production-ready NLP pipeline framework for building search, RAG, and question-answering systems with any LLM.

✓ A · 100 · ⟳ 8d ago · secrets, shell, network

▣ Dataset

Databricks Dolly-15k

CC-BY-SA-3.0 instruction dataset of 15k human-written prompts — the first commercially licensed open instruction dataset.

Stanza — Stanford NLP Python Toolkit

Stanford NLP's Python library for tokenisation, sentence segmentation, NER, dependency parsing, and coreference across 70+ languages.

✓ A · 100 · ⟳ 13d ago · shell, network, fs

Apache-2.0 RAG evaluation framework — faithfulness, answer relevancy, context recall, and more in one pip install.

✓ A · 100 · ⟳ 6mo ago · guards:4

⬡ Pipeline

All-in-one semantic search, RAG, and LLM workflow engine — embeddings, vector DB, and pipelines in one library.

✓ A · 100 · ⟳ 27d ago · secrets, shell, network

▣ Dataset

Anthropic HH-RLHF Dataset

Anthropic's human-preference (helpful/harmless) dataset for RLHF and alignment research.

✓ A · 100 · ⟳ 1y ago · PII

▤ Template

Open-source, self-hostable ChatGPT-style chat UI that works with any model.

! B · 75 · ⟳ 2y ago

Argilla — Collaborative Data Annotation for LLMs

Open-source annotation platform for building high-quality fine-tuning and RLHF datasets; integrates with Hugging Face Hub.

✓ A · 100 · ⟳ 1y ago · network, fs

❝ Prompt

LLM Prompt Library

Experimental prompts, Jinja2 templates, and scripts spanning OpenAI, Anthropic, Google, Mistral and more.

✓ A · 100 · ⟳ 1y ago · jailbreak

▣ Dataset

Stanford Alpaca

Code and the 52K instruction-following dataset behind Stanford's Alpaca models.

! B · 75 · ⟳ 2y ago · PII

⊕ Plugin

Text Generator for Obsidian

Obsidian plugin that generates note content via OpenAI, Anthropic, Google, and local models.

✓ A · 100 · ⟳ 3mo ago

⊕ Plugin

spacy-llm — LLMs in spaCy NLP Pipelines

Integrates LLMs as spaCy pipeline components for NER, classification, lemmatisation, and relation extraction with zero/few-shot prompting.

! B · 88 · ⟳ 4mo ago · network, fs

⛨ Guardrail

Pretrained PyTorch models that score text for toxicity, threats, insults, and identity attacks — a drop-in content-moderation guardrail.

✓ A · 100 · ⟳ 4mo ago · guards:4

❝ Prompt

The Big Prompt Library

Large library of prompts, system prompts, and LLM instructions, including extracted product system prompts.

! B · 75 · ⟳ 5mo ago · jailbreak

⬡ Pipeline

Lightweight, low-dependency unified Python API to run any cross-encoder, ColBERT, or LLM reranker to sharpen RAG retrieval.

✓ A · 100 · ⟳ 1y ago · network, fs

❝ Prompt

Brex Prompt Engineering Guide

Brex's in-depth public guide to prompt engineering and working with GPT-4-class models.

✓ A · 100 · ⟳ 2y ago · jailbreak

▣ Dataset

Heterogeneous zero-shot information-retrieval benchmark bundling 15+ diverse IR datasets behind one evaluation API.

✓ A · 100 · ⟳ 1y ago · PII

❝ Prompt

ChatGPT System Prompts

Curated collection of high-quality system prompts for ChatGPT, organized by role and use case.

✓ A · 100 · ⟳ 1y ago

⬡ Pipeline

Open-source RAG engine built on deep document understanding, grounding LLM answers with traceable citations.

! D · 0 · ⟳ 21d ago · secrets, shell, network

Benchmark of 817 questions across 38 categories measuring whether LLMs avoid imitative falsehoods — a direct probe of factual hallucination.

✓ A · 95 · ⟳ 1y ago · guards:6