SPECTER

SPECTER (from AllenAI) produces document-level embeddings tuned specifically for scientific literature. Rather than averaging generic sentence vectors, it is trained with a citation-informed objective on a SciBERT Transformer backbone: papers that cite one another are pulled together in the embedding space, so the resulting vectors are designed to capture scholarly relatedness better than general-purpose text encoders.

Key features

Encodes a paper from just its title and abstract into a single dense vector — no full text or citation graph needed at inference
Citation-informed contrastive training yields embeddings aligned with real scholarly relatedness
Ready for paper recommendation, semantic search, deduplication, topic clustering, and citation prediction
Strong baseline on the SciDocs evaluation suite for scientific document tasks
Pairs naturally with a vector store to power a research-discovery pipeline
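
As a rough illustration of the title-plus-abstract encoding described above, here is a minimal sketch. It assumes the checkpoint is available on the Hugging Face Hub under the name allenai/specter and that the torch and transformers packages are installed. The helper names format_paper and embed_papers are hypothetical, chosen for this example only.

```python
def format_paper(title, abstract, sep_token="[SEP]"):
    """Join title and abstract with the tokenizer's separator token.

    The "[SEP]" default is an assumption (the BERT-style separator);
    embed_papers below passes the tokenizer's own sep_token instead.
    """
    return title + sep_token + (abstract or "")


def embed_papers(papers, model_name="allenai/specter"):
    """Return one embedding per paper (a tensor of shape [n_papers, hidden_size]).

    `papers` is a list of dicts with a "title" key and an optional "abstract" key.
    The model name is an assumption about where the checkpoint is hosted.
    """
    # Imported lazily so format_paper stays usable without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    texts = [
        format_paper(p["title"], p.get("abstract"), tokenizer.sep_token)
        for p in papers
    ]
    inputs = tokenizer(
        texts, padding=True, truncation=True, max_length=512, return_tensors="pt"
    )
    with torch.no_grad():
        outputs = model(**inputs)
    # Use the hidden state of the first ([CLS]) token as the document vector.
    return outputs.last_hidden_state[:, 0, :]
```

Calling embed_papers([{"title": "...", "abstract": "..."}]) downloads the model on first use, so it needs network access. Papers without an abstract are encoded from the title alone.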

Use it to build "papers like this" retrieval or to cluster a literature corpus by genuine topical proximity.
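
A "papers like this" lookup can be sketched as a nearest-neighbour search over precomputed embeddings. The example below is a toy version in plain Python: the three-dimensional vectors are made-up stand-ins for real SPECTER embeddings, the function names are hypothetical, and cosine similarity is an assumed choice of metric rather than something this listing prescribes. A production setup would more likely delegate the search to a vector store.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def most_similar(query_id, embeddings, top_k=2):
    """Rank every other paper by similarity to the query paper."""
    query = embeddings[query_id]
    scored = [
        (paper_id, cosine_similarity(query, vector))
        for paper_id, vector in embeddings.items()
        if paper_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up toy vectors standing in for real paper embeddings.
toy_embeddings = {
    "paper_a": [0.9, 0.1, 0.0],
    "paper_b": [0.8, 0.2, 0.1],
    "paper_c": [0.0, 0.1, 0.9],
    "paper_d": [0.1, 0.9, 0.2],
}

for paper_id, score in most_similar("paper_a", toy_embeddings):
    print(f"{paper_id}: {score:.3f}")
```

With these toy vectors, paper_b ranks first for paper_a because the two point in nearly the same direction. The same ranking logic applies unchanged to full-size embeddings, though a brute-force scan like this only suits small collections.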

Curated mirror of the open-source SPECTER (Apache-2.0). Get it from the source.
