Embedding · Research · Free
SPECTER
Citation-informed document embeddings for scientific papers, enabling similarity, recommendation, and clustering from title + abstract.
SPECTER (from AllenAI) produces document-level embeddings tuned specifically for scientific literature. Rather than averaging generic sentence vectors, it starts from a SciBERT Transformer backbone and is trained with a citation-informed objective: papers linked by a citation are pulled together in the embedding space, while unrelated papers are pushed apart. The resulting vectors are designed to capture scholarly relatedness better than general-purpose, off-the-shelf encoders.
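To make the citation-informed objective concrete, here is a minimal sketch assuming a triplet margin loss over L2 distances, which is the general form described in the SPECTER paper. The function names, the toy vectors, and the default margin are illustrative assumptions, not code from the SPECTER repository.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(query, positive, negative, margin=1.0):
    """Loss for one (query, cited paper, non-cited paper) triplet.

    The loss is zero once the cited paper is closer to the query than
    the non-cited paper by at least `margin` (an illustrative default).
    """
    return max(
        l2_distance(query, positive) - l2_distance(query, negative) + margin,
        0.0,
    )


# Toy 2-D vectors, purely for illustration.
query_vec = [0.0, 0.0]
cited_vec = [0.0, 1.0]      # distance 1 from the query
uncited_vec = [3.0, 4.0]    # distance 5 from the query

well_separated = triplet_margin_loss(query_vec, cited_vec, uncited_vec)
badly_separated = triplet_margin_loss(query_vec, uncited_vec, cited_vec)
```

In the first call the cited paper is already much closer than the non-cited one, so the loss is zero; in the second the roles are swapped and the loss is positive, which is the signal that would move the vectors during training.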
Key features
- Encodes a paper from just its title and abstract into a single dense vector — no full text or citation graph needed at inference
- Citation-informed contrastive training yields embeddings aligned with real scholarly relatedness
- Ready for paper recommendation, semantic search, deduplication, topic clustering, and citation prediction
- Strong baseline on the SciDocs evaluation suite for scientific document tasks
- Pairs naturally with a vector store to power a research-discovery pipeline
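As a sketch of the title-plus-abstract encoding step, the snippet below assumes the model is loaded through the Hugging Face `transformers` library under the model id `allenai/specter`; neither the library nor the id is named in this listing, so treat both as assumptions and check the upstream project for the recommended usage. The helper names `build_input` and `embed_papers` are hypothetical.

```python
def build_input(title, abstract, sep_token="[SEP]"):
    """Join title and abstract with a separator token.

    The "[SEP]" default is an assumption; embed_papers passes the
    tokenizer's own separator instead.
    """
    return title + sep_token + (abstract or "")


def embed_papers(papers, model_name="allenai/specter"):
    """Return one dense vector per paper (a list of dicts with
    'title' and optional 'abstract' keys).

    Requires the `torch` and `transformers` packages; they are imported
    here so that build_input stays usable without them.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    texts = [
        build_input(p["title"], p.get("abstract"), tokenizer.sep_token)
        for p in papers
    ]
    inputs = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        outputs = model(**inputs)
    # Take the final hidden state of the first token as the document vector.
    return outputs.last_hidden_state[:, 0, :]
```

Calling `embed_papers([{"title": "...", "abstract": "..."}])` would download the model weights on first use and return a tensor with one row per paper.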
Use it to build "papers like this" retrieval or to cluster a literature corpus by genuine topical proximity.
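A "papers like this" lookup can be sketched as a nearest-neighbour search over precomputed embeddings. The example below is a minimal pure-Python illustration: the three-dimensional vectors are made-up stand-ins for real SPECTER embeddings, cosine similarity is one reasonable choice of metric rather than a documented requirement, and a real corpus would normally use a vector store instead of a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def papers_like_this(query_id, vectors, k=3):
    """Return up to k (paper_id, score) pairs most similar to query_id."""
    query = vectors[query_id]
    scored = [
        (paper_id, cosine_similarity(query, vec))
        for paper_id, vec in vectors.items()
        if paper_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real SPECTER vectors are much larger.
toy_vectors = {
    "paper_a": [1.0, 0.0, 0.0],
    "paper_b": [0.9, 0.1, 0.0],
    "paper_c": [0.0, 1.0, 0.0],
}

neighbours = papers_like_this("paper_a", toy_vectors, k=2)
```

Here `paper_b` ranks first for `paper_a` because its vector points in nearly the same direction, while `paper_c` is orthogonal and scores zero.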
Curated mirror of the open-source SPECTER (Apache-2.0). Get it from the source.