BERTopic
Modular topic modeling framework using transformer embeddings and c-TF-IDF for interpretable, coherent topics.
BERTopic is a topic modeling technique that embeds documents with transformer models (BERT, Sentence-BERT, OpenAI), groups the embeddings into dense clusters, and then applies class-based TF-IDF (c-TF-IDF) to each cluster to produce coherent, interpretable topic representations.
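To make the class-based TF-IDF step concrete, here is a minimal pure-Python sketch. It assumes the commonly cited form of the weighting, W(t, c) = tf(t, c) * log(1 + A / tf(t)), where tf(t, c) is the frequency of term t in class c normalized by the class length, tf(t) is the frequency of t across all classes, and A is the average number of words per class. The function names, the whitespace tokenizer, and the toy documents are illustrative assumptions, not part of the BERTopic API.

```python
import math
from collections import Counter


def c_tf_idf(docs_by_topic):
    """Score terms per topic with a class-based TF-IDF weighting.

    docs_by_topic maps a topic label to the list of documents in that topic.
    Returns {topic: {term: score}}.
    """
    # Join each topic's documents into one "class document" and count its terms.
    # Tokenization is a plain lowercase whitespace split (an assumption of this sketch).
    class_counts = {
        topic: Counter(" ".join(docs).lower().split())
        for topic, docs in docs_by_topic.items()
    }

    # tf(t): frequency of each term summed over all classes.
    total_counts = Counter()
    for counts in class_counts.values():
        total_counts.update(counts)

    # A: average number of words per class.
    avg_words = sum(total_counts.values()) / len(class_counts)

    scores = {}
    for topic, counts in class_counts.items():
        n_words = sum(counts.values())
        scores[topic] = {
            term: (freq / n_words) * math.log(1 + avg_words / total_counts[term])
            for term, freq in counts.items()
        }
    return scores


def top_terms(scores, n=3):
    """Return the n highest-scoring terms for every topic."""
    return {
        topic: sorted(term_scores, key=term_scores.get, reverse=True)[:n]
        for topic, term_scores in scores.items()
    }


# Toy corpus, already grouped by topic (hypothetical data).
docs_by_topic = {
    "pets": ["good cat dog cat", "dog cat vet"],
    "finance": ["good stock market stock", "market stock bank"],
}

scores = c_tf_idf(docs_by_topic)
print(top_terms(scores, n=2))
```

In this toy corpus the word "good" appears in both classes, so it scores lower within "pets" than "vet" does, even though each occurs once there. That down-weighting of terms shared across classes is what lets the top terms describe a single topic.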
Key Features
- Transformer embeddings: Use any sentence-transformer, OpenAI, or Hugging Face embedding model as the backbone
- Modular design: Swap out any component — embedding, dimensionality reduction (UMAP), clustering (HDBSCAN), and vectorization
- Dynamic topics: Track how topics evolve over time with topics_over_time
- Guided modeling: Seed the model with keywords to steer topic discovery
- Zero-shot classification: Assign documents to pre-defined topics without training
- Visualization: Built-in Plotly visualizations — topic hierarchy, similarity heatmap, topic evolution
- Online learning: Incrementally update the model with new documents
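As a sketch of the modular design listed above, the helper below builds a model with each component passed in explicitly. The helper name build_topic_model, the embedding model name, and every hyperparameter value are illustrative assumptions, not recommendations. The imports sit inside the function so that defining it does not load the heavy dependencies.

```python
def build_topic_model(min_topic_size: int = 15):
    """Assemble a BERTopic model from explicitly chosen components (hypothetical helper)."""
    # Deferred imports: bertopic, umap-learn, hdbscan and sentence-transformers
    # are only needed once the model is actually built.
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from sentence_transformers import SentenceTransformer
    from sklearn.feature_extraction.text import CountVectorizer
    from umap import UMAP

    # Embedding backbone: any sentence-transformers model name can go here.
    embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

    # Dimensionality reduction applied to the embeddings before clustering.
    umap_model = UMAP(
        n_neighbors=15, n_components=5, min_dist=0.0,
        metric="cosine", random_state=42,
    )

    # Density-based clustering; documents it leaves unassigned become topic -1.
    hdbscan_model = HDBSCAN(
        min_cluster_size=min_topic_size, metric="euclidean",
        cluster_selection_method="eom", prediction_data=True,
    )

    # Tokenization used when building the topic representations.
    vectorizer_model = CountVectorizer(stop_words="english", ngram_range=(1, 2))

    return BERTopic(
        embedding_model=embedding_model,
        umap_model=umap_model,
        hdbscan_model=hdbscan_model,
        vectorizer_model=vectorizer_model,
    )
```

Calling build_topic_model().fit_transform(docs) would then run the same pipeline with these components. Replacing one of them, for example a different clustering model, should leave the other stages unchanged.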
Quick Start
pip install bertopic
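from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

# Load the 20 Newsgroups corpus as a list of raw document strings
docs = fetch_20newsgroups(subset="all")["data"]

# calculate_probabilities=True also returns per-document topic probabilities (slower on large corpora)
model = BERTopic(language="english", calculate_probabilities=True)
topics, probs = model.fit_transform(docs)

# Print the first ten rows of the topic summary; topic -1 collects outlier documents
print(model.get_topic_info().head(10))

# visualize_topics() returns a Plotly figure; .show() renders it outside a notebook too
model.visualize_topics().show()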
Add to ai-supply
npx ai-supply add bertopic-topic-modeling
Curated mirror of the open-source BERTopic (MIT). Get it from the source.