ai-supply.store
Eval · DevOps & Infra · Free

promptfoo

LLM eval + red-teaming framework — test prompts and models against custom assertions, compare providers, and catch regressions in CI.

Publisher: @ai-supply
Installs: 52k
Rating: ★ 4.7
Reviews: 17

promptfoo is a developer-first LLM evaluation and red-teaming framework. Write test cases with assertions, run them against any model or prompt variant, compare results side-by-side, and integrate automated evals into your CI pipeline — before bad outputs reach production.

Key features

  • Declarative test cases — define expected outputs with string matchers, regex, JSON schema, semantic similarity, or LLM-as-judge
  • Multi-provider comparison — benchmark the same prompts across OpenAI, Anthropic, Groq, Mistral, and local models simultaneously
  • Red-teaming — automated adversarial probing for jailbreaks, prompt injection, PII leakage, and harmful content
  • CI integration — promptfoo eval exits non-zero on failures; ships a GitHub Actions example
  • Web UI — browser-based results viewer and diff tool
  • Caching — reuse LLM responses to speed up iterative prompt development
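The declarative matchers listed above can be combined within a single test file. The fragment below is a minimal sketch, not taken from the upstream docs: the test inputs, the 0.8 threshold, and the `summary` JSON field are illustrative assumptions, and the second test presumes a prompt that asks the model to reply in JSON.

```yaml
# Fragment of promptfooconfig.yaml (illustrative values throughout)
tests:
  - vars:
      text: "The quick brown fox jumps over the lazy dog."
    assert:
      # Case-insensitive substring match
      - type: icontains
        value: fox
      # Regular expression match on the output
      - type: regex
        value: '\bdog\b'
      # Embedding-based semantic similarity; the threshold is an assumed value
      - type: similar
        value: "A fox jumps over a dog."
        threshold: 0.8
      # LLM-as-judge grading against a plain-language rubric
      - type: llm-rubric
        value: "The summary is concise and accurate."

  - vars:
      text: "Return the summary as a JSON object."
    assert:
      # Output must parse as JSON and satisfy this schema
      # (the `summary` field is hypothetical)
      - type: is-json
        value:
          type: object
          required: [summary]
          properties:
            summary:
              type: string
```

Cheap deterministic checks such as `icontains` and `regex` are worth listing before model-graded ones, since they fail fast and cost nothing to run.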

Quick start

npx ai-supply add promptfoo-llm-eval

# Or install directly
npm install -g promptfoo

# Initialize a config
promptfoo init

# promptfooconfig.yaml
prompts:
  - "Summarize the following: {{text}}"

providers:
  - openai:gpt-4o
  - anthropic:claude-opus-4-5

tests:
  - vars:
      text: "The quick brown fox jumps over the lazy dog."
    assert:
      - type: contains
        value: fox
      - type: llm-rubric
        value: "The summary is concise and accurate."

# Run the evaluation, then open the results in the web UI
promptfoo eval
promptfoo view
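
Because `promptfoo eval` exits non-zero when assertions fail, the same command can gate a pull request. The workflow below is a rough sketch under several assumptions: the file name, trigger paths, Node version, and secret names are placeholders chosen for illustration, and it is not the GitHub Actions example that ships with promptfoo.

```yaml
# .github/workflows/llm-eval.yml (hypothetical file name)
name: LLM eval

on:
  pull_request:
    paths:
      - "promptfooconfig.yaml"   # re-run only when the eval config changes

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: 20       # assumed; check promptfoo's current requirement

      - name: Run promptfoo eval
        env:
          # Secret names are assumptions; define them in the repository settings
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        # A failing assertion makes this step, and therefore the job, fail
        run: npx promptfoo@latest eval -c promptfooconfig.yaml
```

Pinning a specific promptfoo version instead of `@latest` would make CI runs more reproducible, at the cost of manual upgrades.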

Curated mirror of the open-source promptfoo project (MIT). Install upstream from the repository.
