Name: FastEmbed
Availability: InStock
Author: ai-supply

FastEmbed

FastEmbed from Qdrant generates text (and image) embeddings quickly on CPU without pulling in a full PyTorch/Transformers stack. It runs quantized models on ONNX Runtime, so cold starts are fast and container images stay small, which suits serverless functions, ingestion workers, and edge deployments that only need vectors for a vector store.

Key features

Quantized ONNX models run efficiently on CPU with a tiny dependency footprint
Supports dense, sparse (SPLADE-style), late-interaction (ColBERT), and image embedding models
Batched, parallelized encoding for high ingestion throughput
Curated catalog of popular open embedding models selectable by name
First-class integration with the Qdrant vector database, but usable standalone

Drop it in as the encoding step ahead of any vector store to turn chunks into embeddings without provisioning a GPU.
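
As a rough illustration of that encoding step, here is a minimal sketch that pairs each text chunk with its dense vector. It assumes the `fastembed` Python package is installed and uses its `TextEmbedding` class; the helper name `embed_chunks`, the `embedder` parameter (a hook for passing in a preloaded or stand-in model), and the choice of `BAAI/bge-small-en-v1.5` as the model are illustrative assumptions, not part of this listing.

```python
from typing import Any, Iterable, Iterator, Optional, Tuple


def embed_chunks(
    chunks: Iterable[str],
    model_name: str = "BAAI/bge-small-en-v1.5",  # assumed model choice
    batch_size: int = 64,
    embedder: Optional[Any] = None,  # hypothetical hook: pass a preloaded model
) -> Iterator[Tuple[str, Any]]:
    """Yield (chunk, vector) pairs ready to be written to a vector store."""
    if embedder is None:
        # Imported lazily so the ONNX model is only loaded when actually needed.
        from fastembed import TextEmbedding

        embedder = TextEmbedding(model_name=model_name)

    texts = list(chunks)
    # embed() returns a lazy iterable of vectors, one per input text, in order.
    vectors = embedder.embed(texts, batch_size=batch_size)
    for text, vector in zip(texts, vectors):
        yield text, vector
```

Calling `list(embed_chunks(["first chunk", "second chunk"]))` would download the model on first use and return one vector per chunk; the resulting pairs can then be upserted into whichever vector store sits downstream.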

Curated mirror of the open-source FastEmbed (Apache-2.0). Get it from the source.
