Dataset · Data & ETL · Free
Deep Lake
AI data lake with multimodal tensor storage, vector search, and serverless SQL — stream datasets directly to PyTorch and TensorFlow.
Installs: 118k
Rating: ★ 4.6
Reviews: 39
Deep Lake (by Activeloop) is an AI-native data runtime that stores multimodal datasets — images, videos, text, audio, annotations, and embeddings — as chunked tensors in cloud or local storage. It exposes a vector store API for RAG pipelines and streams data directly into PyTorch/TensorFlow DataLoaders without full dataset downloads.
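To make the "chunked tensors" idea concrete, here is a minimal sketch of why chunking allows streaming without a full download. It is a toy model in plain Python, not Deep Lake's internals: the `ChunkedTensor` class, its `_fetch` method, and the `fetched_chunks` log are hypothetical names invented for illustration, and an in-memory dict stands in for cloud storage.

```python
from typing import Dict, Iterator, List


class ChunkedTensor:
    """Toy model of a tensor stored as fixed-size chunks in remote storage."""

    def __init__(self, samples: List[int], chunk_size: int) -> None:
        # Split the samples into chunks, as an object store would hold them.
        self._remote: Dict[int, List[int]] = {
            i // chunk_size: samples[i:i + chunk_size]
            for i in range(0, len(samples), chunk_size)
        }
        self.chunk_size = chunk_size
        self.length = len(samples)
        self.fetched_chunks: List[int] = []  # records simulated downloads

    def _fetch(self, chunk_id: int) -> List[int]:
        # Stand-in for a network read of a single chunk.
        self.fetched_chunks.append(chunk_id)
        return self._remote[chunk_id]

    def batches(self, batch_size: int) -> Iterator[List[int]]:
        """Yield mini-batches, fetching each chunk only when first needed."""
        cached_id, cached = -1, []
        batch: List[int] = []
        for index in range(self.length):
            chunk_id = index // self.chunk_size
            if chunk_id != cached_id:
                cached_id, cached = chunk_id, self._fetch(chunk_id)
            batch.append(cached[index % self.chunk_size])
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:
            yield batch  # final partial batch


tensor = ChunkedTensor(samples=list(range(10)), chunk_size=4)
first_batch = next(tensor.batches(batch_size=2))
```

After the first batch is produced, `tensor.fetched_chunks` contains only chunk 0: the other chunks are never read until the loop reaches them, which is the property that lets training start before the whole dataset has been transferred.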
Key Features
- Multimodal tensor storage: one dataset can contain images, text, embeddings, bounding boxes, and labels
- Vector search: cosine, L2, and dot-product ANN search over embedding tensors, plus hybrid text+vector search
- Serverless SQL: query datasets in place with TQL (Tensor Query Language), a SQL-style query language, with no data movement
- Streaming DataLoader: pull mini-batches for training directly from S3/GCS without local copies
- Data versioning: branch, commit, and check out datasets as in Git, with full history tracking
- Integrations: compatible with LangChain, LlamaIndex, PyTorch, TensorFlow, and Hugging Face
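The three similarity metrics named in the feature list can rank the same embeddings differently. The sketch below illustrates that with an exhaustive (exact) search in plain Python. It is not Deep Lake's vector store API and it is not an ANN index: the `search` function, the `METRICS` table, and the sample vectors are all assumptions made for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Vector, b: Vector) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def l2_distance(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Each metric maps to (scoring function, True if larger scores are better).
METRICS: Dict[str, Tuple[Callable[[Vector, Vector], float], bool]] = {
    "cosine": (cosine_similarity, True),
    "dot": (dot, True),
    "l2": (l2_distance, False),
}


def search(query: Vector, embeddings: List[Vector], metric: str, k: int) -> List[int]:
    """Return the indices of the k best matches by exhaustive comparison."""
    score, higher_is_better = METRICS[metric]
    ranked = sorted(
        range(len(embeddings)),
        key=lambda i: score(query, embeddings[i]),
        reverse=higher_is_better,
    )
    return ranked[:k]


# Hypothetical 2-D embeddings; real embeddings have hundreds of dimensions.
embeddings = [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]
query = [1.0, 0.2]

top_cosine = search(query, embeddings, "cosine", k=2)
top_dot = search(query, embeddings, "dot", k=2)
top_l2 = search(query, embeddings, "l2", k=2)
```

Here the long vector `[3.0, 3.0]` ranks first under dot product because of its magnitude, second under cosine, and last under L2, so the choice of metric should match how the embeddings were trained. An ANN index trades some of this exactness for speed on large collections.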
Quick Start
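pip install deeplake

import deeplake

# Assumes the Deep Lake 3.x Python API; later major versions may differ.
# Open a public dataset hosted by Activeloop; samples are fetched lazily.
ds = deeplake.dataset("hub://activeloop/coco-train")

# ds.pytorch() wraps the dataset in a PyTorch DataLoader (requires torch).
for sample in ds.pytorch(batch_size=4):
    images = sample["images"]  # streamed from cloud storage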
Install via ai-supply
npx ai-supply add deeplake-multimodal-data-lake
Curated mirror of the open-source Deep Lake (Apache-2.0). Get it from the source.