ScrapeGraphAI
Prompt-driven web scraping: describe the data you want, and LLM-powered graph pipelines extract it from the page as structured JSON.
ScrapeGraphAI turns web scraping into a prompt. Instead of writing brittle CSS/XPath selectors, you point it at a URL (or a local HTML, XML, JSON, or Markdown file) and describe the data you want in natural language; a graph of LLM-powered nodes fetches the source, parses it, and returns structured output. Because the model reads the page content rather than matching fixed selectors, extraction logic tends to survive the page redesigns that break traditional scrapers. It works with OpenAI, Anthropic, Groq, Azure, Gemini, and local Ollama models.
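The flow described above can be sketched in a few lines of Python. This is a minimal sketch, not the project's reference usage: it assumes the `scrapegraphai` package is installed, that `SmartScraperGraph` accepts `prompt`, `source`, and `config` as in the project's published examples, and that the config keys shown (`llm`, `headless`, `verbose`) match your installed version. `build_config` and `scrape` are hypothetical helper names, and the model string is a placeholder.

```python
def build_config(model: str, api_key: str) -> dict:
    """Assemble the config dict handed to a ScrapeGraphAI graph.

    The key names are an assumption based on the project's examples;
    check them against the version you have installed.
    """
    return {
        "llm": {"model": model, "api_key": api_key},
        "headless": True,   # render pages in a browser without a visible window
        "verbose": False,
    }


def scrape(prompt: str, source: str, config: dict):
    """Run one prompt-driven extraction against a URL or local file."""
    # Imported lazily so this module still loads when the package is absent.
    from scrapegraphai.graphs import SmartScraperGraph

    graph = SmartScraperGraph(prompt=prompt, source=source, config=config)
    return graph.run()
```

Calling `scrape("List every article title and its author", "https://example.com/blog", build_config("openai/gpt-4o-mini", api_key))` would be expected to return structured data shaped by the prompt; the exact keys depend on the model's answer, so downstream code should validate them.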
Key features
- Prompt-to-JSON extraction via composable scraping "graphs" (SmartScraperGraph, SearchGraph, OmniScraperGraph)
- Model-agnostic: cloud LLMs or fully local models through Ollama
- Handles single pages, multi-page search, and multimedia sources
- Built-in browser rendering for JavaScript-heavy sites
- Python API plus integrations for pipelines and agents
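To illustrate the model-agnostic point in the list above, the sketch below builds the LLM section of a graph config for either a cloud provider or a local Ollama model. It is a hedged example under stated assumptions: the `provider/model` naming, the `base_url` and `format` keys for Ollama, and the default port 11434 follow commonly published examples and may differ between releases. `llm_settings` and `graph_config` are hypothetical helper names.

```python
from typing import Optional


def llm_settings(provider: str, model: str, api_key: Optional[str] = None) -> dict:
    """Return the "llm" section of a graph config for a cloud or local model."""
    settings = {"model": f"{provider}/{model}"}
    if provider == "ollama":
        # Local model: no API key. The URL is assumed to be Ollama's default.
        settings["base_url"] = "http://localhost:11434"
        settings["format"] = "json"
    else:
        if not api_key:
            raise ValueError(f"{provider} requires an API key")
        settings["api_key"] = api_key
    return settings


def graph_config(llm: dict, headless: bool = True) -> dict:
    """Wrap LLM settings in the surrounding graph config (assumed key names)."""
    return {"llm": llm, "headless": headless, "verbose": False}


local = graph_config(llm_settings("ollama", "llama3.2"))
cloud = graph_config(llm_settings("openai", "gpt-4o-mini", api_key="sk-placeholder"))
```

Keeping the provider choice in one small function means the prompt and the graph stay unchanged when a job moves from a hosted model to a local one.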
Ideal for research agents, dataset collection, and monitoring tasks where target sites change often and hand-tuned selectors are too costly to maintain.
Curated mirror of the open-source ScrapeGraphAI project (MIT license).