◆ Skill · Language & NLP · Free
Stanza — Stanford NLP Python Toolkit
Stanford NLP's Python library for tokenisation, sentence segmentation, and dependency parsing across 70+ languages, with NER and coreference available for a subset of them.
Installs: 78k
Rating: ★ 4.6
Reviews: 26
Stanza
Stanza is Stanford NLP's production-grade Python NLP toolkit covering the full linguistic analysis pipeline. With pre-trained neural models for 70+ human languages, it delivers accurate tokenisation, multi-word token expansion, POS tagging, lemmatisation, NER, and dependency parsing in one consistent API.
Key Features
- 70+ language models, including low-resource languages
- Full NLP pipeline: tokenise → MWT → POS → lemma → depparse → NER → coref
- BiLSTM neural architecture; UD-trained dependency parsers
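- Single pipeline entry point (stanza.pipeline.core.Pipeline, exposed as stanza.Pipeline); spaCy integration is provided by the separate spacy-stanza package
- Named entity recognition: 18 entity types in the English OntoNotes model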
- Biomedical/clinical NLP models (PubMed, MIMIC-III trained)
- CoreNLP Java server bridge for Stanford CoreNLP features
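The pipeline order in the list above matters because later processors consume the output of earlier ones. The following is a minimal sketch of how one might derive a processors string for only the annotations needed. The names PIPELINE_ORDER, PREREQS, and processors_for are hypothetical helpers, not part of Stanza, and the prerequisite table is a simplified assumption (coref is left out); the Stanza documentation is the authority on what each processor requires.

```python
# Hypothetical helper, not part of Stanza. PREREQS is an assumed, simplified
# prerequisite table that follows the pipeline order listed above.
PIPELINE_ORDER = ["tokenize", "mwt", "pos", "lemma", "depparse", "ner"]

PREREQS = {
    "tokenize": [],
    "mwt": ["tokenize"],
    "pos": ["tokenize", "mwt"],
    "lemma": ["tokenize", "mwt", "pos"],
    "depparse": ["tokenize", "mwt", "pos", "lemma"],
    "ner": ["tokenize", "mwt"],
}


def processors_for(*wanted):
    """Return a comma-separated string covering `wanted` plus its prerequisites."""
    needed = set()
    for name in wanted:
        if name not in PREREQS:
            raise ValueError(f"unknown processor: {name}")
        needed.add(name)
        needed.update(PREREQS[name])
    # Emit in canonical pipeline order so earlier stages always come first.
    return ",".join(p for p in PIPELINE_ORDER if p in needed)


print(processors_for("ner"))
print(processors_for("depparse", "ner"))
```

The resulting string can be passed as the processors argument of stanza.Pipeline. Requesting only what is needed keeps model downloads and load time smaller.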
Quick Start
import stanza

stanza.download("en")  # download once
nlp = stanza.Pipeline(lang="en", processors="tokenize,mwt,pos,lemma,depparse,ner")
doc = nlp("Barack Obama was born in Hawaii. He was the 44th President.")

for sent in doc.sentences:
    for word in sent.words:
        print(f"{word.text:15s} POS={word.upos:6s} HEAD={sent.words[word.head-1].text if word.head > 0 else 'root'}")
    for ent in sent.ents:
        print(f"  NER: {ent.text} [{ent.type}]")
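The HEAD expression in the Quick Start relies on the head index being 1-based within the sentence, with 0 marking the root. A minimal sketch of that lookup follows. It uses a hypothetical Word stand-in dataclass so it runs without downloading models; real Stanza words carry more fields, and the parse below is hand-written for illustration, not model output.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical stand-in for a parsed word; only the fields used here."""
    id: int      # 1-based position in the sentence
    text: str
    head: int    # 1-based id of the governor, 0 for the root
    deprel: str


def head_text(words, word):
    """Return the governor's text, or 'root' when the word heads the sentence."""
    return words[word.head - 1].text if word.head > 0 else "root"


# Hand-written parse for "Obama was born" (illustrative only).
sentence = [
    Word(1, "Obama", 3, "nsubj:pass"),
    Word(2, "was", 3, "aux:pass"),
    Word(3, "born", 0, "root"),
]

edges = [(w.text, w.deprel, head_text(sentence, w)) for w in sentence]
for dependent, relation, governor in edges:
    print(f"{dependent:8s} --{relation}--> {governor}")
```

Subtracting 1 converts the 1-based head id into a Python list index. The explicit check for 0 avoids silently resolving the root to the last word through index -1.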
Install via ai-supply
npx ai-supply add stanza-stanford-nlp-toolkit
Curated mirror of the open-source Stanza (Apache-2.0). Get it from the source.