Name: nomic-embed-text-v1
Availability: InStock
Rating: 4.7 (48 reviews)
Author: ai-supply

nomic-embed-text-v1

nomic-embed-text-v1 is an open-source (Apache 2.0) text embedding model from Nomic AI that supports 8192-token contexts, far beyond the 512-token limit of many alternatives. Nomic AI describes it as the first long-context embedding model released with open weights, open training data, and open training code, so the full training pipeline can be reproduced and audited.

Key features

8192-token context: embed long documents, such as many research papers, in one pass
Trained on 235M curated text pairs
Reported to outperform OpenAI text-embedding-ada-002 on the MTEB benchmark
Fully reproducible: training code, data, and weights are all open
Native support in LangChain, LlamaIndex, Chroma, and Qdrant
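
Documents longer than the context window still need to be split before embedding. The following is a minimal sketch of one way to do that, not part of the model or its libraries. It assumes whitespace-separated words as a rough stand-in for tokens (the real tokenizer counts differently), and the function name split_into_windows and the default values for max_words and overlap are hypothetical, chosen only for illustration.

```python
def split_into_windows(text, max_words=6000, overlap=200):
    """Split text into overlapping windows of at most max_words words.

    Words are a rough proxy for tokens; max_words and overlap are
    illustrative defaults, not values recommended by the model authors.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")

    words = text.split()
    if not words:
        return []

    step = max_words - overlap  # how far each new window advances
    windows = []
    for start in range(0, len(words), step):
        windows.append(" ".join(words[start:start + max_words]))
        # Stop once the current window reaches the end of the text.
        if start + max_words >= len(words):
            break
    return windows


if __name__ == "__main__":
    sample = " ".join(str(i) for i in range(10))
    for window in split_into_windows(sample, max_words=4, overlap=1):
        print(window)
```

Each window can then be embedded separately. The overlap keeps sentences that straddle a boundary visible in both neighbouring windows.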

Quick start

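from sentence_transformers import SentenceTransformer

# trust_remote_code=True allows the model's custom architecture code
# to be loaded from the Hugging Face Hub.
model = SentenceTransformer(
    "nomic-ai/nomic-embed-text-v1",
    trust_remote_code=True,
)

# Each input starts with a task prefix: "search_query: " for queries,
# "search_document: " for the texts being searched.
sentences = [
    "search_query: How do vector databases work?",
    "search_document: Vector databases store embeddings for fast similarity search.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 768)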
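
Once queries and documents are embedded, retrieval comes down to comparing vectors. The sketch below shows one common approach, cosine similarity, in plain Python so it runs without downloading the model. The helper names with_prefix, cosine_similarity, and rank_documents are hypothetical, and the toy 3-dimensional vectors stand in for the 768-dimensional output shown in the quick start.

```python
import math

# Task prefixes as they appear in the quick start example.
QUERY_PREFIX = "search_query: "
DOCUMENT_PREFIX = "search_document: "


def with_prefix(texts, prefix):
    """Prepend a task prefix to every text that does not already carry it."""
    return [t if t.startswith(prefix) else prefix + t for t in texts]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy vectors for illustration only, not real model output.
    query = [1.0, 0.0, 0.0]
    docs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
    for index, score in rank_documents(query, docs):
        print(index, round(score, 3))
```

With the real model, the lists passed to model.encode would first go through with_prefix, and the rows of the returned array would take the place of the toy vectors.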

Install via ai-supply

npx ai-supply add nomic-embed-text-v1

Curated mirror of the open-source nomic-embed-text-v1 (Apache-2.0). Get it from the source.
