Embedding · Data & ETL · Free
FastEmbed
Fast, lightweight CPU-first embedding library using quantized ONNX models — no PyTorch dependency — for vector search and RAG.
FastEmbed from Qdrant generates text (and image) embeddings quickly on CPU without pulling in a full PyTorch/Transformers stack. It ships quantized ONNX models that run on ONNX Runtime, so cold starts are fast and containers stay small. That makes it a good fit for serverless functions, ingestion workers, and edge deployments that just need vectors for a vector store.
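As a rough illustration of what generating embeddings on CPU looks like in practice, here is a minimal sketch. The helper name `embed_texts` is hypothetical, and the model name `BAAI/bge-small-en-v1.5` is assumed to be available in the catalog; the `fastembed` import is deferred so the helper can also be exercised with a stand-in model.

```python
from typing import List, Optional, Sequence


def embed_texts(
    texts: Sequence[str],
    model_name: str = "BAAI/bge-small-en-v1.5",  # assumption: present in the catalog
    model: Optional[object] = None,
) -> List[List[float]]:
    """Return one dense vector per input text.

    `model` can be any object with an `embed(list_of_str)` method. When it is
    omitted, a FastEmbed `TextEmbedding` is created (needs `pip install fastembed`).
    """
    if model is None:
        # Lazy import: nothing heavy is loaded until an embedding is needed.
        from fastembed import TextEmbedding

        # The ONNX model files are fetched and cached on first use.
        model = TextEmbedding(model_name=model_name)
    # embed() yields one vector per input text; convert each to plain floats.
    return [[float(x) for x in vec] for vec in model.embed(list(texts))]
```

With `fastembed` installed, `embed_texts(["hello world"])` should return a single vector whose length depends on the chosen model.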
Key features
- Quantized ONNX models run efficiently on CPU with a tiny dependency footprint
- Supports dense, sparse (SPLADE-style), late-interaction (ColBERT), and image embedding models
- Batched, parallelized encoding for high ingestion throughput
- Curated catalog of popular open embedding models selectable by name
- First-class integration with the Qdrant vector database, but usable standalone
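Two of the features above, selecting a model by name and batched, parallel encoding, can be sketched as follows. The helpers `choose_model`, `encode_corpus`, and `demo` are hypothetical names introduced for illustration. The sketch assumes that `TextEmbedding.list_supported_models()` returns a list of dicts carrying a `"model"` key and that `embed()` accepts `batch_size` and `parallel` keyword arguments; check the installed version before relying on either.

```python
from typing import Any, Dict, Iterable, List, Optional


def choose_model(catalog: Iterable[Dict[str, Any]], preferred: str, fallback: str) -> str:
    """Pick `preferred` if the catalog lists it, otherwise `fallback`."""
    names = {entry["model"] for entry in catalog}
    return preferred if preferred in names else fallback


def encode_corpus(
    model: Any,
    docs: List[str],
    batch_size: int = 64,
    parallel: Optional[int] = None,
) -> List[Any]:
    """Encode `docs` in batches; `parallel` is passed straight through to embed()."""
    return list(model.embed(docs, batch_size=batch_size, parallel=parallel))


def demo() -> None:
    """Not called automatically: needs fastembed installed and a model download."""
    from fastembed import TextEmbedding

    name = choose_model(
        TextEmbedding.list_supported_models(),  # assumed: dicts with a "model" key
        preferred="BAAI/bge-base-en-v1.5",
        fallback="BAAI/bge-small-en-v1.5",
    )
    model = TextEmbedding(model_name=name)
    vectors = encode_corpus(model, ["first chunk", "second chunk"], batch_size=32)
    print(name, len(vectors), len(vectors[0]))
```

Leaving `parallel` as `None` keeps encoding in a single process; a larger value is only worth trying on big ingestion jobs, where the cost of starting worker processes is amortized.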
Drop it in as the encoding step ahead of any vector store to turn chunks into embeddings without provisioning a GPU.
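To make the "encoding step ahead of a vector store" idea concrete, the sketch below wires an embedding function into a toy in-memory store. Everything here is illustrative: `InMemoryStore`, `index_chunks`, and `cosine` are hypothetical names, and the brute-force search merely stands in for a real vector database. The `embed` argument is any callable that maps a list of strings to vectors, which is where a FastEmbed model's `embed` method would plug in.

```python
import math
from typing import Callable, Iterable, List, Sequence, Tuple

Vector = List[float]
EmbedFn = Callable[[List[str]], Iterable[Sequence[float]]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Toy stand-in for a vector store: brute-force cosine search."""

    def __init__(self) -> None:
        self._rows: List[Tuple[str, Vector]] = []

    def upsert(self, chunks: List[str], vectors: List[Vector]) -> None:
        self._rows.extend(zip(chunks, vectors))

    def search(self, query: Sequence[float], top_k: int = 3) -> List[Tuple[float, str]]:
        scored = [(cosine(query, vec), chunk) for chunk, vec in self._rows]
        # Highest similarity first.
        return sorted(scored, key=lambda row: row[0], reverse=True)[:top_k]


def index_chunks(chunks: List[str], embed: EmbedFn, store: InMemoryStore) -> int:
    """Encode `chunks` with `embed` and write them to `store`; returns the count."""
    vectors = [[float(x) for x in vec] for vec in embed(chunks)]
    store.upsert(chunks, vectors)
    return len(vectors)
```

In a real pipeline, `embed` would be something like `TextEmbedding(model_name=...).embed`, the query would be encoded with the same model, and `InMemoryStore` would be replaced by the client of whichever vector store is in use.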
Curated mirror of the open-source FastEmbed (Apache-2.0).