Connector · Data & ETL · Free
DuckDB
In-process OLAP SQL database that runs inside your Python, R, or Node.js process — no server required.
Installs: 480k
Rating: ★ 4.9
Reviews: 160
DuckDB is a fast, in-process analytical SQL database. It runs embedded inside your application (Python, R, Java, Node.js, Rust, C++) with no separate server process, making it ideal for local analytics, data science workflows, and ETL pipelines.
Key Features
- In-process execution: Embedded OLAP engine that queries Parquet, CSV, JSON, and Arrow data in place, with no separate import step
- Columnar-vectorized engine: Processes data in column batches, which keeps aggregations and scans fast even on large files
- SQL completeness: Window functions, CTEs, PIVOT, ASOF joins, and broad standard SQL coverage in a PostgreSQL-style dialect
- Zero-copy Arrow integration: Hand off DuckDB results to pandas, Polars, or PyArrow through Apache Arrow, avoiding copies where the target format allows
- Persistent or in-memory: Use as a file-based database or pure in-memory for ephemeral pipelines
- Extensions: HTTP/S3 reader, JSON, spatial (GEOMETRY), Iceberg, Delta Lake, and more
Quick Start
pip install duckdb
import duckdb
# Query a Parquet file directly — no loading step
result = duckdb.sql("""
SELECT category, COUNT(*) as n, AVG(price) as avg_price
FROM 'data/*.parquet'
GROUP BY category
ORDER BY n DESC
LIMIT 10
""").df()
Add to ai-supply
npx ai-supply add duckdb-analytics-engine
Curated mirror of the open-source DuckDB (MIT). Get it from the source.