Free contract triage at scale: legal-bert + presidio for PII redaction before review

@priya-nair · 27m ago

At my previous job we had a two-week backlog of NDAs waiting for junior review. The bottleneck wasn't complexity — it was volume. Most documents just needed a quick risk flag and a PII scrub before the actual lawyer touched them.

I rebuilt that workflow using two free listings from the ai-supply.store catalog.

The listings

  • legal-bert-base-uncased: a BERT model pre-trained on legal corpora; a strong base for clause classification and risk flagging once it is fine-tuned on labelled clauses
  • presidio-pii-anonymizer: Microsoft's NLP-based PII detection and anonymisation engine

Both free to install, both security-scanned. Presidio in particular handles a lot of sensitive data, so I checked the security report carefully — no egress patterns, no hardcoded credentials, clean dependency graph.

The pipeline

from transformers import pipeline
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

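# Step 1: risk flag with legal-bert
# NOTE: nlpaueb/legal-bert-base-uncased is a pretrained base checkpoint with no
# trained classification head. The "RISK" label below assumes a copy fine-tuned
# on labelled clauses; point `model` at that checkpoint.
classifier = pipeline(
    "text-classification",
    model="nlpaueb/legal-bert-base-uncased"
)

def flag_clauses(text, threshold=0.85):
    # 512-character (not token) chunks, which normally stay under BERT's
    # 512-token input limit but can split a clause mid-sentence.
    chunks = [text[i:i+512] for i in range(0, len(text), 512)]
    risky = []
    for chunk in chunks:
        result = classifier(chunk)[0]
        if result["label"] == "RISK" and result["score"] > threshold:
            risky.append(chunk)
    return risky

# Step 2: PII scrub with presidio
# The default AnalyzerEngine loads a spaCy English model, which must be installed.
analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

def redact_pii(text):
    # Detect PII entities, then replace each with a placeholder such as <PERSON>.
    results = analyzer.analyze(text=text, language="en")
    return anonymizer.anonymize(text=text, analyzer_results=results).text

# Full triage
def triage_contract(raw_text):
    # Classify on the raw text, then redact the flagged clauses as well so no
    # unredacted PII reaches the reviewer through risk_clauses.
    risky_clauses = [redact_pii(clause) for clause in flag_clauses(raw_text)]
    redacted = redact_pii(raw_text)
    return {"risk_clauses": risky_clauses, "redacted_text": redacted}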
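
One refinement worth considering: flag_clauses cuts the text every 512 characters, so a clause can be split across two chunks and scored in halves. The sketch below is not part of the original pipeline; it shows one possible alternative that groups whole sentences into chunks under the same character budget. The name chunk_by_sentence and the naive punctuation-based sentence split are assumptions for illustration, and legal text with abbreviations such as "Inc." would need a better splitter.

```python
import re

def chunk_by_sentence(text, max_chars=512):
    """Group whole sentences into chunks of at most max_chars characters."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # Fallback: a single sentence longer than the budget is hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

To try it, the list comprehension inside flag_clauses could be replaced with chunks = chunk_by_sentence(text).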

Results on 200 test NDAs

  • Risk recall: 81% (roughly 4 out of 5 genuinely risky clauses caught)
  • PII recall: 94% (names, emails, phone numbers; missed some uncommon company names)
  • Processing time: ~1.2 s per document on CPU
  • Cost: $0
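
For anyone reproducing the recall figures, here is a minimal sketch of how recall can be computed from hand-labelled clauses. The clause IDs are made-up placeholders, not data from the 200 NDAs, and recall_from_sets is a hypothetical helper name.

```python
def recall_from_sets(gold, predicted):
    """Recall = true positives / all gold positives (0.0 if gold is empty)."""
    gold, predicted = set(gold), set(predicted)
    if not gold:
        return 0.0
    true_positives = len(gold & predicted)
    return true_positives / len(gold)

# Illustrative IDs only: five clauses a lawyer marked risky, five the model flagged.
gold_risky = {"c1", "c2", "c3", "c4", "c5"}
model_flagged = {"c1", "c2", "c3", "c4", "c9"}
print(recall_from_sets(gold_risky, model_flagged))  # 0.8
```

Recall ignores false alarms (c9 above), so it is worth tracking precision alongside it if reviewers complain about noise in the queue.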

The output feeds a simple dashboard where lawyers see a three-tier queue: CLEAR, REVIEW, ESCALATE. They only touch the REVIEW and ESCALATE documents.
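
The tiering rule itself isn't shown in the pipeline, so here is a minimal sketch of one way the triage_contract output could be mapped to the three queues. The function name assign_tier and the escalate_at cutoff are assumptions for illustration, not the rule the dashboard actually uses.

```python
def assign_tier(triage_result, escalate_at=3):
    """Map a triage result to CLEAR, REVIEW, or ESCALATE (hypothetical rule)."""
    # Count the clauses the classifier flagged as risky.
    n_risky = len(triage_result["risk_clauses"])
    if n_risky == 0:
        return "CLEAR"      # nothing flagged: skip manual review
    if n_risky >= escalate_at:
        return "ESCALATE"   # many flagged clauses: send to a senior reviewer
    return "REVIEW"         # a few flagged clauses: normal review queue

# Usage with a dict shaped like the triage_contract output.
example = {"risk_clauses": ["<clause text>"], "redacted_text": "<redacted text>"}
print(assign_tier(example))  # REVIEW
```

A count-based cutoff is the simplest option; weighting by classifier score or clause type would be a natural next step.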

The whole thing runs on a $6/mo VPS. If you're doing any legal-adjacent NLP, these two tools together are a serious head start — and they're sitting right there in the legal category on the catalog.
