SetFit — Efficient Few-Shot Text Classification

SetFit (Sentence Transformer Fine-Tuning) is Hugging Face's framework for efficient few-shot text classification. It first fine-tunes a Sentence Transformer contrastively on pairs of labeled examples, then trains a lightweight classification head on the resulting embeddings. With only 8–64 labeled examples per class, it often comes close to the accuracy of models trained on the full dataset. For legal teams, this means a contract clause classifier, document routing model, or compliance screener can be trained with minimal labeled data.
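To see why a handful of labeled clauses goes a long way, consider the contrastive stage: pairs with the same label are treated as similar, pairs with different labels as dissimilar. The sketch below is a simplified illustration, not SetFit's own pair sampler (which balances positive and negative pairs in its own way). It enumerates every pair; `make_contrastive_pairs` and the placeholder clause strings are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def make_contrastive_pairs(
    texts: List[str], labels: List[int]
) -> List[Tuple[str, str, float]]:
    """Hypothetical helper: build every unordered pair of examples.

    Target is 1.0 when both examples share a label (pull together)
    and 0.0 otherwise (push apart).
    """
    pairs = []
    for (t1, l1), (t2, l2) in combinations(zip(texts, labels), 2):
        pairs.append((t1, t2, 1.0 if l1 == l2 else 0.0))
    return pairs


# Placeholder data: 8 examples for each of two clause categories.
texts = [f"termination clause {i}" for i in range(8)] + [f"other clause {i}" for i in range(8)]
labels = [0] * 8 + [1] * 8

pairs = make_contrastive_pairs(texts, labels)
positives = sum(1 for _, _, target in pairs if target == 1.0)
print(len(pairs), positives, len(pairs) - positives)  # 120 56 64
```

Sixteen labeled clauses already yield 120 training pairs, which is the intuition behind SetFit's data efficiency.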

Key Features

State-of-the-art few-shot accuracy with 8 examples per class
No prompt engineering — direct embedding-based fine-tuning
10–100x faster training than large generative models
Multilingual: fine-tune on multilingual checkpoints for cross-jurisdiction use
Hugging Face Hub integration: push and share trained models directly from Python
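The multilingual and Hub features combine naturally: swap in a multilingual checkpoint, train as usual, then push the result. The sketch below assumes setfit 1.x, where `SetFitModel.from_pretrained` accepts a `labels` list and models expose `push_to_hub`. The label names in `CLAUSE_LABELS`, the function `train_multilingual_classifier`, and the `repo_id` value are hypothetical; pushing also requires being logged in to the Hub.

```python
from typing import List

# Hypothetical clause taxonomy for illustration; replace with your own labels.
CLAUSE_LABELS = ["termination", "other"]


def train_multilingual_classifier(texts: List[str], labels: List[int], repo_id: str):
    """Hypothetical helper: fine-tune a multilingual SetFit model and push it to the Hub.

    Imports are local so this module can be imported without setfit installed.
    """
    from datasets import Dataset
    from setfit import SetFitModel, Trainer, TrainingArguments

    train_dataset = Dataset.from_dict({"text": texts, "label": labels})

    # A multilingual Sentence Transformer checkpoint lets one model
    # handle clauses drafted in several languages.
    model = SetFitModel.from_pretrained(
        "sentence-transformers/paraphrase-multilingual-mpnet-base-v2",
        labels=CLAUSE_LABELS,  # predictions are returned as these names
    )
    trainer = Trainer(
        model=model,
        args=TrainingArguments(batch_size=16, num_epochs=1),
        train_dataset=train_dataset,
    )
    trainer.train()

    # Needs Hub credentials, e.g. from `huggingface-cli login`.
    model.push_to_hub(repo_id)
    return model
```

A call might look like `train_multilingual_classifier(texts, labels, "your-org/clause-classifier")`, where the repository id is a placeholder for your own namespace.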

Quick Start

pip install setfit

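from setfit import SetFitModel, Trainer, TrainingArguments
from datasets import Dataset

# 8 labeled contract clauses per category
# (replace each `...` with the rest of your labeled examples)
train_dataset = Dataset.from_dict({
    "text": ["The party may terminate upon 30 days notice...", ...],
    "label": [0, 1, 0, 1, ...],  # label 0 = termination clause
})

# Start from a pretrained Sentence Transformer checkpoint
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

# Contrastive fine-tuning of the embedding model, then training of the classifier head
args = TrainingArguments(batch_size=16, num_epochs=1)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()

preds = model.predict(["Either party may terminate this agreement..."])
print(preds)  # predicted label ids; label 0 = termination clause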
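The quick start assumes you already have 8 labeled clauses per category. If you start from a larger, unevenly labeled pool, you can draw a balanced few-shot training set first. SetFit ships a `sample_dataset` helper for this when your data is already a `datasets.Dataset`; the plain-Python sketch below shows the same idea with a hypothetical `sample_per_class` function and a fixed seed for reproducibility.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def sample_per_class(
    texts: List[str], labels: List[int], n_per_class: int = 8, seed: int = 42
) -> Tuple[List[str], List[int]]:
    """Hypothetical helper: draw up to n_per_class examples for each label."""
    by_label: Dict[int, List[str]] = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)

    rng = random.Random(seed)  # fixed seed -> same sample on every run
    out_texts: List[str] = []
    out_labels: List[int] = []
    for label in sorted(by_label):
        pool = by_label[label]
        # Classes with fewer than n_per_class examples contribute all they have.
        chosen = rng.sample(pool, min(n_per_class, len(pool)))
        out_texts.extend(chosen)
        out_labels.extend([label] * len(chosen))
    return out_texts, out_labels
```

The two returned lists map directly onto the `"text"` and `"label"` columns passed to `Dataset.from_dict` in the quick start.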

npx ai-supply add setfit-few-shot-classifier

Curated mirror of the open-source SetFit (Apache-2.0). Get it from the source.
