Apache OpenNLP — NLP Toolkit for Legal Documents
Apache-licensed Java NLP toolkit, usable from Python through wrapper packages, with tokenization, sentence detection, NER, POS tagging, and chunking for legal document pipelines.
Apache OpenNLP is a mature NLP toolkit from the Apache Software Foundation, written in Java. It provides a suite of language processing components (tokenizer, sentence detector, part-of-speech tagger, named entity finder, chunker, parser, and coreference resolver), most of them built on trainable statistical models such as maximum entropy. It is used in legal document pipelines for court filing processing, regulatory text extraction, and contract analysis in enterprise Java environments.
Key Features
- Full NLP pipeline: sentence detect → tokenize → POS tag → NER → parse
- Trainable on domain-specific corpora (legal, medical, financial)
- REST service via OpenNLP Sandbox for microservice deployments
- Python bindings available via opennlp-python
- Apache-2.0 license: clear IP terms for commercial legal software
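To make the domain-training point concrete: OpenNLP's name finder trainer reads one whitespace-tokenized sentence per line, with each entity wrapped in `<START:type> ... <END>` tags. The sketch below converts token-level span annotations into that format. The function name `to_opennlp_ner_line`, the entity labels, and the sample sentence are illustrative assumptions, not part of OpenNLP.

```python
from typing import List, Tuple

# (start token index, end token index exclusive, entity type)
Span = Tuple[int, int, str]


def to_opennlp_ner_line(tokens: List[str], spans: List[Span]) -> str:
    """Render one tokenized sentence in OpenNLP name finder training format."""
    starts = {start: label for start, _, label in spans}
    ends = {end for _, end, _ in spans}
    out = []
    for i, tok in enumerate(tokens):
        if i in starts:
            out.append(f"<START:{starts[i]}>")  # open the entity before its first token
        out.append(tok)
        if i + 1 in ends:
            out.append("<END>")  # close the entity after its last token
    return " ".join(out)


# Hypothetical legal-domain example sentence and annotations.
tokens = ["Judge", "Maria", "Lopez", "denied", "the", "motion",
          "filed", "by", "Acme", "Corp", "."]
spans = [(1, 3, "person"), (8, 10, "organization")]
line = to_opennlp_ner_line(tokens, spans)
print(line)
```

A file of such lines can then be passed to the `opennlp TokenNameFinderTrainer` command-line tool to produce a custom model; check the OpenNLP manual for the exact options of the version in use.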
Quick Start
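# Install the opennlp Python wrapper (shell)
pip install opennlp

# Python: sentence detection, tokenization, and POS tagging
import opennlp

nlp = opennlp.OpenNLP("/path/to/models/")  # directory holding the OpenNLP model files
sentences = nlp.sentence_detector("The plaintiff filed suit. The court denied relief.")
tokens = nlp.tokenizer(sentences[0])  # tokenize the first detected sentence
pos_tags = nlp.pos_tagger(tokens)
print(list(zip(tokens, pos_tags)))  # (token, tag) pairs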
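If the wrapper is not an option, the same sentence → token → POS flow can be sketched by calling OpenNLP's own command-line launcher from Python. This assumes the `opennlp` script from the binary distribution is on the PATH, that the pre-trained models have been downloaded, and that diagnostic messages go to stderr. The model file paths and the helper names (`opennlp_cmd`, `run_tool`, `tag_text`) are hypothetical.

```python
import os
import shutil
import subprocess
from typing import List

# Hypothetical model locations; adjust to wherever the .bin files were downloaded.
MODELS = {
    "SentenceDetector": "models/en-sent.bin",
    "TokenizerME": "models/en-token.bin",
    "POSTagger": "models/en-pos-maxent.bin",
}


def opennlp_cmd(tool: str, model_path: str) -> List[str]:
    """Build the argv for an OpenNLP command-line tool, e.g. SentenceDetector."""
    return ["opennlp", tool, model_path]


def run_tool(tool: str, text: str) -> List[str]:
    """Pipe text through one OpenNLP CLI tool and return its non-empty output lines."""
    result = subprocess.run(
        opennlp_cmd(tool, MODELS[tool]),
        input=text + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def tag_text(text: str) -> List[str]:
    """Sentence-split, tokenize, then POS-tag; each stage feeds the next."""
    sentences = run_tool("SentenceDetector", text)
    tokenized = run_tool("TokenizerME", "\n".join(sentences))
    return run_tool("POSTagger", "\n".join(tokenized))


# Only run the pipeline when the launcher and all model files are actually present.
if shutil.which("opennlp") and all(os.path.exists(p) for p in MODELS.values()):
    for tagged_sentence in tag_text("The plaintiff filed suit. The court denied relief."):
        print(tagged_sentence)
```

Starting a JVM per stage is slow, so this suits batch jobs or quick checks more than a latency-sensitive service.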
npx ai-supply add apache-opennlp
Curated mirror of the open-source Apache OpenNLP (Apache-2.0). Get it from the source.