Connector · Data & ETL · Free
Crawl4AI
LLM-friendly open-source web crawler that turns pages into clean Markdown/JSON ready for RAG and agent pipelines.
Crawl4AI is an open-source, LLM-friendly web crawler and scraper that converts web pages into clean, structured Markdown or JSON ready to feed into RAG and agent pipelines. It is one of the most popular data-ingestion connectors for AI applications, designed for speed and for output that models can consume directly.
Key features
- Fast async crawling built on Playwright with browser session reuse
- LLM-ready Markdown generation with content filtering and pruning
- CSS/XPath selectors plus LLM-based structured extraction strategies
- Handles JavaScript-rendered pages, lazy loading, and stealth/anti-bot options
- Python API and a Docker deployment for scale
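As a rough illustration of the selector-based extraction mentioned in the list above, the sketch below builds a CSS schema and decodes the resulting JSON. It assumes a recent Crawl4AI release in which CrawlerRunConfig and JsonCssExtractionStrategy can be imported as shown; the selectors, the URL, and the helper names build_schema, parse_items, and extract are hypothetical and would need adapting to a real page.

```python
import asyncio
import json


def build_schema() -> dict:
    # Hypothetical selectors for an imaginary listing page; adjust to the real DOM.
    return {
        "name": "Articles",
        "baseSelector": "article.post",
        "fields": [
            {"name": "title", "selector": "h2", "type": "text"},
            {"name": "link", "selector": "a", "type": "attribute", "attribute": "href"},
        ],
    }


def parse_items(extracted_content) -> list:
    """Decode the JSON string that the extraction step is expected to return."""
    if not extracted_content:
        return []
    items = json.loads(extracted_content)
    return items if isinstance(items, list) else [items]


async def extract(url: str) -> list:
    # Lazy imports: the helpers above stay usable without crawl4ai installed.
    # Assumption: these import paths match the installed Crawl4AI version.
    from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
    from crawl4ai.extraction_strategy import JsonCssExtractionStrategy

    config = CrawlerRunConfig(
        extraction_strategy=JsonCssExtractionStrategy(build_schema())
    )
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url, config=config)
    return parse_items(result.extracted_content)


if __name__ == "__main__":
    # Placeholder URL; requires network access and an installed browser.
    for item in asyncio.run(extract("https://example.com/blog")):
        print(item)
```

Keeping the schema and the JSON decoding in plain functions means they can be checked without launching a browser; only extract touches the crawler itself.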
Usage note: run pip install crawl4ai, then the post-install browser setup (crawl4ai-setup), then use the async crawler (AsyncWebCrawler) to fetch a URL and receive cleaned Markdown plus any extracted structured data.
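The usage note can be sketched roughly as follows, assuming the package is installed and the browser setup has been run. AsyncWebCrawler and its arun method come from Crawl4AI; the helper names summarize and fetch and the example URL are hypothetical, and the result attributes read here (success, error_message, markdown, extracted_content) may vary between versions.

```python
import asyncio


def summarize(result) -> dict:
    """Reduce a crawl result to the fields a RAG pipeline typically needs."""
    if not getattr(result, "success", False):
        return {"ok": False, "error": getattr(result, "error_message", None)}
    return {
        "ok": True,
        # str() covers both plain-string and string-like Markdown results.
        "markdown": str(getattr(result, "markdown", "") or ""),
        # Stays None unless an extraction strategy was configured.
        "extracted": getattr(result, "extracted_content", None),
    }


async def fetch(url: str) -> dict:
    # Imported lazily so summarize() can be used without the package installed.
    from crawl4ai import AsyncWebCrawler

    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url)
    return summarize(result)


if __name__ == "__main__":
    # Placeholder URL; requires network access and an installed browser.
    page = asyncio.run(fetch("https://example.com"))
    print(page["markdown"][:300] if page["ok"] else page["error"])
```

Checking success before reading the Markdown keeps failed fetches (timeouts, blocked pages) from silently producing empty documents downstream.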
Curated mirror of the open-source Crawl4AI (Apache-2.0). Get it from the source.