Name: OpenAI Evals
Availability: InStock
Author: ai-supply

OpenAI Evals

OpenAI Evals is a framework for evaluating LLMs and LLM-powered systems, open-sourced by OpenAI under the MIT license. It provides a library of 1000+ existing evals alongside a structured way to build new ones — covering accuracy, safety, robustness, and task-specific performance. Evals can target any model via the OpenAI API or custom completion functions.

Key features

1,000+ ready-made evals — logic, coding, translation, factuality, safety
Custom eval builder: model_graded, basic, match eval types
Model-graded evals use an LLM as judge for open-ended tasks
YAML-based eval spec format — version-control your evaluations
Multi-model comparison support for red-teaming and A/B testing
MIT license — contribute or use commercially

Quick start

pip install openai evals

# Run a built-in eval
oaieval gpt-4o test-match

# Register and run a custom eval
cat > evals/registry/evals/my-eval.yaml << 'EOF'
my-eval:
  id: my-eval.dev.v0
  metrics: [accuracy]
my-eval.dev.v0:
  class: evals.elsuite.basic.match:Match
  args:
    samples_jsonl: my_samples.jsonl
EOF
oaieval gpt-4o-mini my-eval

Install via ai-supply

npx ai-supply add openai-evals-framework

Curated mirror of the open-source OpenAI Evals (MIT). Get it from the source.

OpenAI Evals

OpenAI Evals

Key features

Quick start

Install via ai-supply

More from @ai-supply