Bio_ClinicalBERT — Clinical Text Embeddings
MIT-licensed BERT model further pre-trained on MIMIC-III clinical notes, producing clinical-domain embeddings for tasks such as NER, relation extraction, and ICU outcome prediction.
Bio_ClinicalBERT (Emily Alsentzer et al., 2019) is a BERT model initialised from BioBERT and further pre-trained on all clinical notes in MIMIC-III, a large database of de-identified ICU patient records from Beth Israel Deaconess Medical Center. Its embeddings are adapted to clinical language, and the authors report gains over general BERT and BioBERT on clinical tasks such as NER and natural language inference (MedNLI). They did not observe the same gains on de-identification benchmarks. The model is also used as a text encoder for outcome-prediction tasks such as ICU mortality.
Key Features
- Initialised from BioBERT, then further pre-trained on MIMIC-III clinical notes (roughly 2 million notes)
- Reported to outperform BERT and BioBERT on clinical NER and natural language inference benchmarks
- Available on HuggingFace as emilyalsentzer/Bio_ClinicalBERT
- Standard HuggingFace Transformers interface (drop-in replacement)
- Discharge-summary-specific variant also available
Quick Start
from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")

inputs = tokenizer("Patient presents with acute myocardial infarction.",
                   return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

cls_embedding = outputs.last_hidden_state[:, 0, :]  # [CLS] token
print(cls_embedding.shape)  # torch.Size([1, 768])
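
The snippet above embeds a single sentence through its [CLS] vector. When embedding several notes at once, padding tokens have to be excluded from any average over tokens. The following is a minimal sketch of masked mean pooling, a common alternative to the [CLS] vector; it is not a method prescribed by the model's authors. The helper names `mean_pool` and `embed` are hypothetical, introduced only for illustration.

```python
import torch


def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average the token vectors of each sequence, ignoring padding positions."""
    # (batch, seq_len) -> (batch, seq_len, 1) so the mask broadcasts over the hidden dim
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)  # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1.0)         # guard against all-padding rows
    return summed / counts


def embed(texts, model_name="emilyalsentzer/Bio_ClinicalBERT"):
    """Hypothetical helper: return one mean-pooled vector per input text."""
    # Imported lazily so mean_pool stays usable without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()  # disable dropout for deterministic embeddings

    inputs = tokenizer(texts, padding=True, truncation=True,
                       max_length=512, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return mean_pool(outputs.last_hidden_state, inputs["attention_mask"])


# Example usage (downloads the model on first call):
# vectors = embed(["Patient presents with acute myocardial infarction.",
#                  "Pt admitted with STEMI, taken to cath lab."])
# score = torch.nn.functional.cosine_similarity(vectors[0], vectors[1], dim=0)
```

Whether mean pooling or the [CLS] vector works better depends on the downstream task, so it is worth comparing both on held-out data before settling on one.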
npx ai-supply add bio-clinicalbert-clinical-embeddings
Curated mirror of the open-source Bio_ClinicalBERT (MIT). Get it from the source.