Category
Research
Web, papers, knowledge graphs, citation.
9 listings
△Evaluation
LM Evaluation Harness
EleutherAI's MIT-licensed unified benchmark suite — the de-facto standard for evaluating language models across 200+ tasks.
ai-supply
↓ 112k★ 4.7
◐Model
OLMo — Open Language Model
Fully open large language model from the Allen Institute for AI (Ai2): training code, weights, data, and evaluation code all released under Apache-2.0.
ai-supply
↓ 95k★ 4.8
△Evaluation
OpenAI Evals
MIT-licensed framework for evaluating LLMs and AI systems — build custom evals, run model comparisons, log results.
ai-supply
↓ 89k★ 4.5
◉Agent
GPT Researcher
Autonomous AI research agent that searches the web and produces detailed, cited research reports in minutes.
ai-supply
↓ 88k★ 4.7
⬡Pipeline
GraphRAG
Microsoft's graph-based RAG: build knowledge graphs from documents for global, multi-hop reasoning beyond vector search.
ai-supply
↓ 83k★ 4.6
◉Agent
STORM
Stanford's LLM-powered knowledge curation agent that researches any topic and generates a full Wikipedia-style article.
ai-supply
↓ 78k★ 4.7
△Evaluation
HELM — Holistic Evaluation of Language Models
Stanford CRFM's reproducible, multi-metric benchmark framework for evaluating any foundation model.
ai-supply
↓ 48k★ 4.7
◉Agent
PaperQA2
AI agent that retrieves, reads, and synthesises answers from scientific PDFs with citation-level accuracy.
ai-supply
↓ 41k★ 4.7
⇄Connector
arxiv.py — arXiv API Python Wrapper
Pythonic client for the arXiv API: search, download, and stream 2M+ preprints by query, author, or ID.
ai-supply
↓ 34k★ 4.6