DeepEval

Apache-2.0 LLM testing framework — pytest-style unit tests for RAG pipelines, chatbots, and LLM outputs.

Publisher: @ai-supply
Installs: 67k
Rating: ★ 4.6
Reviews: 22

DeepEval is an open-source LLM evaluation framework that brings software-testing discipline to AI systems. Written for Python and styled after pytest, it lets teams write unit tests for LLM outputs, RAG pipelines, and chatbot quality — with 14+ built-in metrics including G-Eval, RAGAS-style faithfulness, answer relevancy, hallucination detection, and toxicity.

Key features

  • 14+ built-in metrics: G-Eval, answer relevancy, faithfulness, contextual precision/recall, hallucination, toxicity, bias, summarization
  • pytest integration: assert on LLM outputs as you would on ordinary code
  • RAG-aware: test retrieval quality, context relevancy, and generation faithfulness in one suite
  • CI/CD ready with JUnit XML output and GitHub Actions support
  • Apache-2.0 license — commercial use fully permitted
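
As a rough illustration of the CI/CD point above, the sketch below reads a JUnit-style XML report and lists the failing tests, which is the kind of check a pipeline step might run after the test suite. It assumes the common JUnit layout (testcase elements with optional failure or error children). The exact report DeepEval produces is not shown in this listing, and summarize_junit and the sample report are hypothetical.

```python
import xml.etree.ElementTree as ET


def summarize_junit(xml_text: str) -> dict:
    """Count test cases and collect the names of failed or errored ones."""
    root = ET.fromstring(xml_text.strip())
    # iter() finds <testcase> elements whether the root is <testsuite>
    # or a <testsuites> wrapper around one or more suites.
    cases = list(root.iter("testcase"))
    failed = [c.get("name") for c in cases if c.find("failure") is not None]
    errored = [c.get("name") for c in cases if c.find("error") is not None]
    return {"total": len(cases), "failed": failed, "errored": errored}


# Hypothetical report contents, for illustration only.
SAMPLE_REPORT = """
<testsuites>
  <testsuite name="llm-tests" tests="2" failures="1" errors="0">
    <testcase classname="test_llm" name="test_answer_relevancy"/>
    <testcase classname="test_llm" name="test_faithfulness">
      <failure message="score below threshold"/>
    </testcase>
  </testsuite>
</testsuites>
"""

summary = summarize_junit(SAMPLE_REPORT)
print(summary)
```

A pipeline step could exit with a non-zero status whenever the failed or errored list is non-empty.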

Quick start

pip install deepeval

# test_llm.py
import pytest
from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

def test_answer_relevancy():
    test_case = LLMTestCase(
        input="What is RAG?",
        actual_output="RAG stands for Retrieval-Augmented Generation.",
        retrieval_context=["RAG is a technique that enhances LLM outputs with retrieved documents."]
    )
    metric = AnswerRelevancyMetric(threshold=0.7)
    assert_test(test_case, [metric])

# Run your LLM test suite
deepeval test run test_llm.py
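
The quick-start test passes or fails by comparing a metric score against a threshold (0.7 in the example above). That pattern can be sketched without the library, as below. This is only an illustration: token_overlap_score and assert_score are hypothetical helpers, and the toy word-overlap score stands in for DeepEval's actual scoring, which this listing does not describe.

```python
import re


def token_overlap_score(question: str, answer: str) -> float:
    """Toy relevancy score: share of question words that reappear in the answer.

    Hypothetical stand-in for a real metric; not how DeepEval scores outputs.
    """
    def tokens(text: str) -> set:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    question_tokens = tokens(question)
    if not question_tokens:
        return 0.0
    return len(question_tokens & tokens(answer)) / len(question_tokens)


def assert_score(score: float, threshold: float) -> None:
    """Fail like a unit test when the score falls below the threshold."""
    assert score >= threshold, (
        f"score {score:.2f} is below threshold {threshold:.2f}"
    )


def test_toy_relevancy():
    score = token_overlap_score(
        "What is RAG?",
        "RAG stands for Retrieval-Augmented Generation.",
    )
    # Only "rag" overlaps out of three question words, so the score is 1/3.
    assert_score(score, threshold=0.3)
```

Because the check is a plain assertion, pytest reports a low score the same way it reports any other failing test.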

Install via ai-supply

npx ai-supply add deepeval-llm-testing

Curated mirror of the open-source DeepEval (Apache-2.0). Get it from the source.
