scispaCy — Biomedical & Scientific NLP Models
Allen AI's Apache-2.0 spaCy models for biomedical text with UMLS, MeSH, GO, HPO, and RxNorm entity linking and NER for genes, diseases, chemicals, and proteins.
scispaCy, from the Allen Institute for AI (AllenAI), provides spaCy-compatible NLP models and pipelines trained on biomedical and scientific text drawn largely from PubMed, with NER models trained on corpora such as BC5CDR and JNLPBA. It includes entity linkers for major biomedical ontologies: UMLS, MeSH, Gene Ontology, Human Phenotype Ontology, and RxNorm.
Key Features
- Pre-trained core pipelines in small, medium, large, and transformer (SciBERT) variants, plus specialized biomedical NER models
- Entity types (from the specialized NER models): disease, chemical, gene, protein, cell line, DNA, RNA, cell type
- Entity linkers: UMLS (3M+ concepts), MeSH, GO, HPO, RxNorm
- Abbreviation detection that pairs each short form with the long-form definition found in the same text
- Compatible with the spaCy ecosystem: models load with spacy.load and components attach with nlp.add_pipe
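The abbreviation detection listed above can be sketched as follows. This is a minimal illustration, not the project's own recipe: `build_pipeline` and `abbreviation_pairs` are hypothetical helper names introduced here, and the model name assumes `en_core_sci_lg` is installed. The `abbreviation_detector` factory and the `doc._.abbreviations` and `_.long_form` attributes come from scispaCy's `scispacy.abbreviation` module.

```python
from typing import List, Tuple


def build_pipeline(model: str = "en_core_sci_lg"):
    """Load a scispaCy model and attach the abbreviation detector.

    Hypothetical helper; imports are local so that abbreviation_pairs()
    below stays usable even where scispaCy is not installed.
    """
    import spacy
    # Importing AbbreviationDetector registers the "abbreviation_detector" factory.
    from scispacy.abbreviation import AbbreviationDetector  # noqa: F401

    nlp = spacy.load(model)
    nlp.add_pipe("abbreviation_detector")
    return nlp


def abbreviation_pairs(doc) -> List[Tuple[str, str]]:
    """Return unique (short form, long form) pairs from a processed Doc."""
    seen = set()
    pairs = []
    for abrv in doc._.abbreviations:
        # Each abbreviation is a Span; its long form is stored as another Span.
        pair = (abrv.text, abrv._.long_form.text)
        if pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs


# Example use (requires scispaCy and the model to be installed):
# nlp = build_pipeline()
# doc = nlp("Spinal and bulbar muscular atrophy (SBMA) is an inherited disease.")
# print(abbreviation_pairs(doc))
```

Deduplicating the pairs is a design choice made for this sketch: the detector reports every occurrence of a short form, which is often more than a glossary-style summary needs.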
Quick Start
pip install scispacy
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.4/en_core_sci_lg-0.5.4.tar.gz
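import spacy
import scispacy  # noqa: F401
# Importing EntityLinker registers the "scispacy_linker" pipeline factory.
from scispacy.linking import EntityLinker  # noqa: F401
nlp = spacy.load("en_core_sci_lg")
# Attach the entity linker, configured to use the UMLS knowledge base.
nlp.add_pipe("scispacy_linker", config={"resolve_abbreviations": True, "linker_name": "umls"})
doc = nlp("Metformin is used to treat type 2 diabetes mellitus.")
for ent in doc.ents:
    # kb_ents holds (concept ID, score) pairs; print the first two candidates.
    print(ent.text, ent._.kb_ents[:2])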
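In the Quick Start, each `ent._.kb_ents` value is a list of (concept ID, score) tuples, which is hard to read without the concept names. The sketch below shows one possible way to turn those tuples into readable rows. `describe_candidates` is a hypothetical helper written for this example, and the `min_score` and `limit` defaults are arbitrary illustrative values, not scispaCy settings. The lookup table is assumed to be the linker's `kb.cui_to_entity` mapping, whose entries expose a `canonical_name` attribute.

```python
from typing import Any, List, Mapping, Sequence, Tuple


def describe_candidates(
    kb_ents: Sequence[Tuple[str, float]],
    cui_to_entity: Mapping[str, Any],
    min_score: float = 0.85,  # illustrative cutoff, not a library default
    limit: int = 2,
) -> List[Tuple[str, str, float]]:
    """Return (concept_id, canonical_name, score) rows for the best candidates."""
    # Sort by score, highest first, so the result does not depend on input order.
    ranked = sorted(kb_ents, key=lambda pair: pair[1], reverse=True)
    # Keep only confident matches, then cap the number of rows.
    kept = [(cid, score) for cid, score in ranked if score >= min_score][:limit]
    return [(cid, cui_to_entity[cid].canonical_name, score) for cid, score in kept]


# Example use with the Quick Start pipeline (requires scispaCy and the model):
# linker = nlp.get_pipe("scispacy_linker")
# for ent in doc.ents:
#     print(ent.text, describe_candidates(ent._.kb_ents, linker.kb.cui_to_entity))
```

Passing the mapping in as an argument keeps the helper independent of any particular linker, so the same function should work for the MeSH, GO, HPO, or RxNorm linkers as long as their knowledge bases expose the same mapping.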
npx ai-supply add scispacy-biomedical-nlp
Curated mirror of the open-source scispaCy (Apache-2.0). Get it from the source.