Browse the marketplace · ai-supply.store


20 results

End-to-end ML lifecycle platform — experiment tracking, model registry, serving, and LLM evaluation.

↓ 730k · ★ 4.8

Weights & Biases (wandb)

ML experiment tracking and visualization — log metrics, hyperparameters, models, and media in real time.

↓ 680k · ★ 4.8

Nuclei — Template-Based Vulnerability Scanner

ProjectDiscovery's fast, template-driven vulnerability scanner with 9,000+ community templates for web apps, APIs, cloud, and AI service endpoints.

↓ 312k · ★ 4.9

Great Expectations

Data quality framework for defining, testing, and documenting expectations about your data pipelines.

↓ 210k · ★ 4.6

Prowler — Cloud Security Posture Manager

Apache-licensed multi-cloud security assessment tool covering 500+ checks across AWS, Azure, GCP, and Kubernetes, including AI service misconfigurations.

↓ 154k · ★ 4.7

Open-source ML and LLM observability framework for evaluating, monitoring, and testing AI system quality.

↓ 135k · ★ 4.6

LM Evaluation Harness

EleutherAI's MIT-licensed unified benchmark suite — the de-facto standard for evaluating language models across 200+ tasks.

↓ 112k · ★ 4.7

Self-hosted, open-source ML training metadata tracker with a powerful exploratory web UI.

MIT-licensed framework for evaluating LLMs and AI systems — build custom evals, run model comparisons, log results.

garak — LLM Vulnerability Scanner

NVIDIA's open-source LLM vulnerability scanner that probes language models for prompt injection, jailbreaks, hallucinations, and more.

procgen — Procedural Game Environments for RL

OpenAI's MIT-licensed suite of 16 procedurally-generated 2D game environments for measuring generalization in reinforcement learning agents.

Apache-2.0 LLM testing framework — pytest-style unit tests for RAG pipelines, chatbots, and LLM outputs.

PyRIT — Python Risk Identification Toolkit

Microsoft's open-source AI red-teaming toolkit for systematically finding risks in generative AI systems through automated adversarial probing.

Apache-2.0 RAG evaluation framework — faithfulness, answer relevancy, context recall, and more in one pip install.

LLM eval + red-teaming framework — test prompts and models against custom assertions, compare providers, and catch regressions in CI.

HELM — Holistic Evaluation of Language Models

Stanford CRFM's reproducible, multi-metric benchmark framework for evaluating any foundation model.

Opik (by Comet)

Open-source LLM evaluation and observability platform — trace, test, and monitor LLM apps end-to-end.

pyfolio-reloaded — Portfolio Performance Analytics

Maintained fork of Quantopian's pyfolio providing comprehensive risk and return analytics, tear sheets, and performance attribution for quantitative strategies.

LexGLUE — Legal Language Understanding Benchmark

Multi-task benchmark for legal NLP with 7 datasets covering EURLEX classification, contract clause labeling, court judgement prediction, and more.

Melting Pot — Multi-Agent RL Test Suite

Google DeepMind's suite of 50+ multi-agent social RL scenarios testing cooperation, competition, and generalization.

ai-supply.store

The AI capability marketplace. Skills, MCP servers, plugins, agents, and datasets: discoverable by humans, consumable by machines.


Contact

support@ai-supply.store
security@ai-supply.store
