nomic-embed-text-v1
Apache-2.0 text embedding model with 8192-token context — the first open, auditable, long-context embedding.
nomic-embed-text-v1 is a fully open-source (Apache 2.0) text embedding model from Nomic AI that supports 8192-token contexts, well beyond the 512-token limit of most alternatives. Nomic AI presents it as the first long-context embedding model with a fully reproducible training pipeline: the weights, training code, and training data are all published alongside the model card, so the model can be audited end to end.
Key features
- 8192-token context — embed entire research papers or long documents in one pass
- Trained on 235M curated text pairs
- Outperforms OpenAI text-embedding-ada-002 on MTEB benchmarks
- Fully reproducible: training code, data, and weights all open
- Native support in LangChain, LlamaIndex, Chroma, and Qdrant
Quick start
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "nomic-ai/nomic-embed-text-v1",
    trust_remote_code=True,
)
sentences = [
    "search_query: How do vector databases work?",
    "search_document: Vector databases store embeddings for fast similarity search.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 768)
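The quick start prefixes each input with a task label (search_query: or search_document:) before encoding. The following is a minimal sketch of how those prefixes and the resulting vectors might be used to rank documents against a query. The helper names (with_prefix, cosine_similarity, rank_documents) are hypothetical and not part of any library, and the sketch uses plain Python lists so it runs without the model; in practice the vectors would come from model.encode.

```python
import math

# Task prefixes exactly as they appear in the quick start.
QUERY_PREFIX = "search_query: "
DOCUMENT_PREFIX = "search_document: "


def with_prefix(texts, prefix):
    """Prepend a task prefix to each text before encoding (hypothetical helper)."""
    return [prefix + text for text in texts]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Assuming the model from the quick start is loaded, documents could be encoded with model.encode(with_prefix(docs, DOCUMENT_PREFIX)) and the query with model.encode(with_prefix([question], QUERY_PREFIX))[0], then passed to rank_documents. For large collections, one of the vector stores listed under Key features would normally replace this linear scan.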
Install via ai-supply
npx ai-supply add nomic-embed-text-v1
Curated mirror of the open-source nomic-embed-text-v1 (Apache-2.0). Get it from the source.