Dataset | Legal & Compliance | Free
CUAD — Contract Understanding Atticus Dataset
Expert-labeled dataset of 13,000+ annotations across 510 commercial contracts covering 41 legal clause types for contract review AI.
CUAD (Contract Understanding Atticus Dataset) is a large-scale dataset created by The Atticus Project and annotated by dozens of legal experts. It contains 13,000+ annotations across 510 real commercial contracts, labeling 41 distinct clause types, including parties, payment terms, termination clauses, IP ownership, and liability caps. It is a widely used benchmark for training and evaluating contract review AI systems.
Key Features
- 510 commercial contracts from EDGAR (SEC filings)
- 41 clause categories annotated by legal professionals
- Question-answering format compatible with extractive QA models
- Benchmark leaderboard for contract understanding research
- Free for commercial and academic use under CC-BY-4.0
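To make the question-answering format concrete, here is a minimal sketch of what one extractive QA record might look like. It assumes a SQuAD-style layout (`answers` holding parallel `text` and `answer_start` lists); the record itself, the `QAExample` type, and the `answer_spans` helper are hypothetical and written for illustration, not taken from CUAD or its tooling.

```python
from typing import TypedDict


class Answers(TypedDict):
    text: list[str]          # answer strings copied from the contract
    answer_start: list[int]  # character offset of each answer in `context`


class QAExample(TypedDict):
    title: str
    context: str
    question: str
    answers: Answers


def answer_spans(example: QAExample) -> list[tuple[int, int, str]]:
    """Return (start, end, text) for each annotated answer span."""
    spans = []
    answers = example["answers"]
    for text, start in zip(answers["text"], answers["answer_start"]):
        end = start + len(text)
        # Sanity check: the span must match the contract text at that offset.
        if example["context"][start:end] != text:
            raise ValueError(f"answer at offset {start} does not match context")
        spans.append((start, end, text))
    return spans


# Hypothetical record, made up for illustration (not an actual CUAD row).
context = "This Agreement is governed by the laws of the State of New York."
answer = "the laws of the State of New York"
example: QAExample = {
    "title": "EXAMPLE_SUPPLY_AGREEMENT",
    "context": context,
    "question": "Which law governs this contract?",
    "answers": {"text": [answer], "answer_start": [context.index(answer)]},
}

for start, end, text in answer_spans(example):
    print(f"[{start}:{end}] {text}")
```

Because each answer is a character span of the contract text, an extractive QA model can be trained to predict the start and end positions directly rather than generate free text.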
Quick Start
from datasets import load_dataset

# Downloads from the Hugging Face Hub on first use (requires `pip install datasets`).
dataset = load_dataset("theatticusproject/cuad")
train = dataset["train"]

print(f"Train examples: {len(train)}")
print(train[0]["title"])     # Contract name
print(train[0]["question"])  # Question asking about one clause type
print(train[0]["answers"])   # Annotated answer spans for that clause
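Once the examples are loaded, a common first step is to see how many of them actually contain an annotated clause for each question. The sketch below assumes each row is a mapping with a `question` string and an `answers["text"]` list that is empty when the clause is absent from the contract; that layout is an assumption, and `count_answered` and the toy rows are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_answered(examples: Iterable[Mapping]) -> Counter:
    """Count, per question, how many examples have at least one answer span."""
    counts: Counter = Counter()
    for ex in examples:
        if ex["answers"]["text"]:  # empty list means no clause was annotated
            counts[ex["question"]] += 1
    return counts


# Toy rows made up for illustration (not actual CUAD data).
rows = [
    {"question": "Governing Law", "answers": {"text": ["New York law"], "answer_start": [10]}},
    {"question": "Governing Law", "answers": {"text": [], "answer_start": []}},
    {"question": "Cap On Liability", "answers": {"text": ["$1,000,000"], "answer_start": [42]}},
]

for question, n in count_answered(rows).most_common():
    print(f"{question}: {n}")
```

The function takes any iterable of dict-like rows, so under the same layout assumption it could be called as `count_answered(train)` on the split loaded above.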
npx ai-supply add cuad-contract-understanding-dataset
Curated mirror of the open-source CUAD (CC-BY-4.0). Get it from the source.