Name: DeepEval
Availability: InStock
Rating: 4.6 (22 reviews)
Author: ai-supply

DeepEval

DeepEval is an open-source LLM evaluation framework that brings software-testing discipline to AI systems. Written for Python and styled after pytest, it lets teams write unit tests for LLM outputs, RAG pipelines, and chatbot quality. It ships with 14+ built-in metrics, including G-Eval, RAGAS-style faithfulness, answer relevancy, hallucination detection, and toxicity.

Key features

14+ built-in metrics: G-Eval, answer relevancy, faithfulness, contextual precision/recall, hallucination, toxicity, bias, summarization
pytest integration — assert LLM outputs like code
RAG-aware: test retrieval quality, context relevancy, and generation faithfulness in one suite
CI/CD ready with JUnit XML output and GitHub Actions support
Apache-2.0 license — commercial use fully permitted

Quick start

pip install deepeval

from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

def test_answer_relevancy():
    # One test case: the prompt, the model's answer, and the retrieved context.
    test_case = LLMTestCase(
        input="What is RAG?",
        actual_output="RAG stands for Retrieval-Augmented Generation.",
        retrieval_context=["RAG is a technique that enhances LLM outputs with retrieved documents."]
    )
    # The metric is scored by an LLM judge, so a model API key is assumed to be configured.
    metric = AnswerRelevancyMetric(threshold=0.7)
    # Fails the test if the relevancy score does not meet the threshold.
    assert_test(test_case, [metric])

# Run your LLM test suite (assumes the test above is saved as test_llm.py)
deepeval test run test_llm.py
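
The threshold=0.7 argument in the quick start works as a pass/fail cutoff on a score between 0 and 1. The sketch below illustrates that idea offline and is not DeepEval's implementation: ToyTestCase, toy_relevancy_score, and assert_meets_threshold are hypothetical names, and a crude word-overlap ratio stands in for the LLM-judged relevancy score.

```python
from dataclasses import dataclass


@dataclass
class ToyTestCase:
    """Hypothetical stand-in for a test case: a prompt and the model's answer."""
    input: str
    actual_output: str


def _words(text: str) -> set[str]:
    """Lowercase the text and split it into a set of punctuation-stripped words."""
    words = {w.strip("?.,!").lower() for w in text.split()}
    words.discard("")
    return words


def toy_relevancy_score(case: ToyTestCase) -> float:
    """Fraction of distinct input words that reappear in the output.

    Assumption: this overlap ratio is only a placeholder so the example
    runs without a model; a real metric would ask an LLM judge instead.
    """
    input_words = _words(case.input)
    if not input_words:
        return 0.0
    return len(input_words & _words(case.actual_output)) / len(input_words)


def assert_meets_threshold(case: ToyTestCase, threshold: float) -> float:
    """Raise AssertionError when the score falls below the threshold."""
    score = toy_relevancy_score(case)
    if score < threshold:
        raise AssertionError(f"score {score:.2f} is below threshold {threshold:.2f}")
    return score


if __name__ == "__main__":
    case = ToyTestCase(
        input="What is RAG?",
        actual_output="RAG is retrieval-augmented generation.",
    )
    # Two of the three input words ("is", "rag") appear in the output.
    print(f"{assert_meets_threshold(case, threshold=0.5):.2f}")
```

Raising the threshold makes the same output fail, which is why the cutoff is usually tuned per metric rather than fixed across a whole suite.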

Install via ai-supply

npx ai-supply add deepeval-llm-testing

Curated mirror of the open-source DeepEval (Apache-2.0). Get it from the source.
