MedCAT — Medical Concept Annotation Toolkit
Apache-2.0 unsupervised clinical NER and linking library that maps free-text clinical mentions to SNOMED CT, ICD-10, UMLS, and other biomedical ontologies.
MedCAT (Medical Concept Annotation Toolkit), developed by the CogStack team, is an unsupervised medical named entity recognition (NER) and entity linking library. It learns from unlabelled clinical text to map mentions to standardised medical ontologies (SNOMED CT, UMLS, ICD-10, OPCS), and its models can be refined through active learning in the MedCATTrainer web interface.
Key Features
- Unsupervised NER + entity linking to SNOMED CT, UMLS, ICD-10, OPCS
- Active learning loop: human-in-the-loop annotation via MedCATTrainer
- Meta-annotation models: negation, temporality, experiencer (patient vs. family)
- Multiple model architectures: BiLSTM and BERT-based (clinical BERT variants)
- Handles abbreviations, spelling variants, and morphological changes
- Used in NHS Trusts and academic hospitals at scale
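As an illustration of how the meta-annotations listed above might be consumed downstream, here is a minimal sketch. It assumes the dictionary layout that MedCAT 1.x returns from `cat.get_entities` (an `entities` mapping whose values carry a `meta_anns` dict). The task names `Presence` and `Subject`, their values, the confidence scores, and the helper functions are hypothetical; real task names depend on the meta-annotation models in the model pack you load.

```python
# Hand-written sample shaped like MedCAT 1.x `cat.get_entities(...)` output.
# Task names/values ("Presence", "Subject") and all scores are assumptions
# made for illustration; they are not real model output.
sample = {
    "entities": {
        0: {
            "pretty_name": "Type 2 diabetes mellitus",
            "cui": "44054006",
            "meta_anns": {
                "Presence": {"value": "True", "confidence": 0.98, "name": "Presence"},
                "Subject": {"value": "Patient", "confidence": 0.97, "name": "Subject"},
            },
        },
        1: {
            "pretty_name": "Asthma",
            "cui": "195967001",
            "meta_anns": {
                # Negated mention, e.g. "no history of asthma"
                "Presence": {"value": "False", "confidence": 0.95, "name": "Presence"},
                "Subject": {"value": "Patient", "confidence": 0.96, "name": "Subject"},
            },
        },
        2: {
            "pretty_name": "Myocardial infarction",
            "cui": "22298006",
            "meta_anns": {
                # Mention about a relative, e.g. "father had an MI"
                "Presence": {"value": "True", "confidence": 0.93, "name": "Presence"},
                "Subject": {"value": "Family", "confidence": 0.91, "name": "Subject"},
            },
        },
    },
    "tokens": [],
}


def keep_entity(ent, required):
    """Return True if every required meta-annotation task has the wanted value."""
    meta = ent.get("meta_anns", {})
    return all(
        meta.get(task, {}).get("value") == value
        for task, value in required.items()
    )


def filter_entities(result, required):
    """Keep only entities whose meta-annotations match `required`."""
    return [ent for ent in result["entities"].values() if keep_entity(ent, required)]


# Keep concepts that are affirmed and refer to the patient.
affirmed_patient = filter_entities(sample, {"Presence": "True", "Subject": "Patient"})
names = [ent["pretty_name"] for ent in affirmed_patient]
print(names)
```

Filtering on the predicted value alone ignores the `confidence` field; a stricter pipeline could also require a minimum confidence per task.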
Quick Start
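from medcat.cat import CAT
from medcat.cdb import CDB
from medcat.vocab import Vocab

# Load the vocabulary and the concept database (CDB) from disk
vocab = Vocab.load("/path/to/vocab.dat")
cdb = CDB.load("/path/to/cdb.dat")

# Build the annotator; the configuration is stored on the CDB
cat = CAT(cdb=cdb, config=cdb.config, vocab=vocab)

# Annotate a sentence and print the detected concepts
text = "The patient was diagnosed with T2DM and started on metformin."
entities = cat.get_entities(text)
print(entities)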
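The value returned by `get_entities` is a nested dictionary, not a flat list. As a sketch, assuming the MedCAT 1.x layout (`entities` keyed by an integer ID, each entry carrying `cui`, `pretty_name`, `start`, `end`, and `acc`), the helper below flattens it into rows and recovers each mention from the input text by its character offsets. The `Row` type, the `to_rows` and `fake_entity` helpers, and the sample result with its scores are hypothetical and hand-written for illustration; they are not real model output.

```python
from typing import NamedTuple


class Row(NamedTuple):
    cui: str      # concept identifier in the target ontology
    name: str     # preferred name of the linked concept
    span: str     # the mention as it appears in the text
    start: int    # character offset where the mention starts
    end: int      # character offset just past the mention
    acc: float    # linking score reported for the mention


def to_rows(result, text, min_acc=0.0):
    """Flatten a get_entities-style dict into rows sorted by position."""
    rows = []
    for ent in result.get("entities", {}).values():
        if ent["acc"] < min_acc:
            continue  # drop low-scoring links
        start, end = ent["start"], ent["end"]
        rows.append(
            Row(ent["cui"], ent["pretty_name"], text[start:end], start, end, ent["acc"])
        )
    return sorted(rows, key=lambda row: row.start)


def fake_entity(text, mention, cui, name, acc):
    """Build one hand-written entity; offsets are computed from the text."""
    start = text.index(mention)
    return {
        "cui": cui,
        "pretty_name": name,
        "start": start,
        "end": start + len(mention),
        "acc": acc,
    }


text = "The patient was diagnosed with T2DM and started on metformin."

# Stand-in for `cat.get_entities(text)`; CUIs and scores are illustrative.
sample_result = {
    "entities": {
        0: fake_entity(text, "metformin", "372567009", "Metformin", 0.99),
        1: fake_entity(text, "T2DM", "44054006", "Type 2 diabetes mellitus", 0.92),
        2: fake_entity(text, "patient", "116154003", "Patient", 0.30),
    },
    "tokens": [],
}

rows = to_rows(sample_result, text, min_acc=0.5)
for row in rows:
    print(f"{row.start:>3}-{row.end:<3} {row.span!r} -> {row.cui} ({row.name})")
```

With a real model pack, passing the output of `cat.get_entities(text)` in place of `sample_result` should work the same way, provided the installed MedCAT version uses this layout.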
npx ai-supply add medcat-medical-concept-annotation
Curated mirror of the open-source MedCAT (Apache-2.0). Get it from the source.