SetFit — Efficient Few-Shot Text Classification
HuggingFace's few-shot text classification framework: approach the accuracy of fully fine-tuned models such as Legal-BERT with as few as 8 labeled contract examples per class, with no prompt engineering.
SetFit (Sentence Transformer Fine-Tuning) is HuggingFace's framework for highly efficient few-shot text classification. It first fine-tunes a Sentence Transformer with a contrastive objective on pairs built from the labeled examples, then trains a lightweight classifier head on the resulting embeddings, reaching near full-dataset accuracy with 8–64 labeled examples per class. For legal teams, this means a contract clause classifier, document routing model, or compliance screener can be trained with minimal labeled data.
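To make the contrastive step more concrete, here is a minimal pure-Python sketch of how a handful of labeled clauses can be expanded into training pairs: two texts with the same label form a positive pair, two texts with different labels form a negative pair. The helper name `make_contrastive_pairs` and the toy clauses are assumptions for illustration, and SetFit's own pair sampling may differ (for example, it may sample pairs rather than enumerate all of them).

```python
from itertools import combinations


def make_contrastive_pairs(texts, labels):
    """Build (text_a, text_b, target) triples.

    target is 1.0 when both texts share a label (positive pair)
    and 0.0 otherwise (negative pair).
    """
    pairs = []
    for i, j in combinations(range(len(texts)), 2):
        target = 1.0 if labels[i] == labels[j] else 0.0
        pairs.append((texts[i], texts[j], target))
    return pairs


# Hypothetical toy data: two clauses per class (label ids are illustrative).
texts = [
    "Either party may terminate this agreement upon 30 days notice.",
    "This agreement may be terminated for material breach.",
    "Each party shall keep the other's information confidential.",
    "The recipient shall not disclose confidential information.",
]
labels = [0, 0, 1, 1]

pairs = make_contrastive_pairs(texts, labels)
num_positive = sum(1 for _, _, target in pairs if target == 1.0)
print(f"{len(pairs)} pairs, {num_positive} positive")
```

Even a few examples per class yield many pairs, which is one reason this style of fine-tuning can work with so little labeled data.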
Key Features
- State-of-the-art few-shot accuracy with 8 examples per class
- No prompt engineering — direct embedding-based fine-tuning
- 10–100x faster training than large generative models
- Multilingual: fine-tune on multilingual checkpoints for cross-jurisdiction use
- HuggingFace Hub integration — push and share models instantly
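The "8 examples per class" setup can be prepared from a larger labeled pool with a small sampler. The sketch below is plain Python and does not use any SetFit API; the helper name `sample_few_shot` and the placeholder clauses are assumptions for illustration.

```python
import random
from collections import defaultdict


def sample_few_shot(texts, labels, k=8, seed=0):
    """Return up to k (text, label) examples per class, chosen at random."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)

    sampled = []
    for label in sorted(by_label):
        pool = by_label[label]
        # Take fewer than k only when the class has fewer than k examples.
        chosen = rng.sample(pool, min(k, len(pool)))
        sampled.extend((text, label) for text in chosen)
    return sampled


# Hypothetical pool: 20 placeholder clauses for each of two classes.
pool_texts = [f"clause {i}" for i in range(40)]
pool_labels = [i % 2 for i in range(40)]

few_shot = sample_few_shot(pool_texts, pool_labels, k=8)
print(len(few_shot))
```

Fixing the seed keeps the few-shot split reproducible, which matters when comparing runs on such small training sets.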
Quick Start
pip install setfit
from setfit import SetFitModel, Trainer, TrainingArguments
from datasets import Dataset

# 8 labeled contract clauses per category.
# The "..." entries stand for the rest of your examples; replace them with real data before running.
train_dataset = Dataset.from_dict({
    "text": ["The party may terminate upon 30 days notice...", ...],
    "label": [0, 1, 0, 1, ...],
})

# Start from a pretrained Sentence Transformer body.
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

# Contrastive fine-tuning of the body, then fitting of the classifier head.
trainer = Trainer(model=model, train_dataset=train_dataset)
trainer.train()

preds = model.predict(["Either party may terminate this agreement..."])
print(preds)  # label 0 = termination clause in this example
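Predictions in the Quick Start are integer class ids, so downstream code usually needs a mapping back to clause names. The following is a minimal sketch, assuming integer ids; the `LABEL_NAMES` mapping and the `decode_predictions` helper are hypothetical, and only "label 0 = termination clause" comes from the example above.

```python
# Hypothetical mapping: id 0 follows the Quick Start; "other" is a placeholder name.
LABEL_NAMES = {0: "termination", 1: "other"}


def decode_predictions(preds, label_names):
    """Convert an iterable of integer class ids into readable label names."""
    # int() also accepts integer-like scalars such as NumPy integers.
    return [label_names[int(p)] for p in preds]


decoded = decode_predictions([0, 1, 0], LABEL_NAMES)
print(decoded)
```

Keeping the mapping in one place makes it easier to route documents by clause type without scattering magic numbers through the codebase.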
npx ai-supply add setfit-few-shot-classifier
Curated mirror of the open-source SetFit (Apache-2.0). Get it from the source.