Browse the marketplace · ai-supply.store


Browse the marketplace


Category: All, Cybersecurity, Coding, Finance, Agentic capability, Marketing, Orchestration, Data & ETL, Research, Vision & Image, Audio & Speech, Language & NLP, DevOps & Infra, Robotics & Control, Healthcare, Legal & Compliance, Gaming & Simulation

Subcategory: All, Text-to-speech, Speech-to-text, Music generation, Diarization, Noise reduction, Voice cloning

Kind: All, Skill, MCP server, Plugin, Agent, Model, Dataset, Embedding, Pipeline, Workflow, Connector, Prompt, Template, Guardrail, Fine-tune, Eval

Sort: popular, rating, new, most secure

Price: free, paid

16 results

Instant voice cloning with precise style control — tone, emotion, accent, and rhythm from a single short reference clip.

↓ 430k · ★ 4.7

CTranslate2-powered reimplementation of Whisper with up to 4× faster inference and lower memory usage.

↓ 420k · ★ 4.9

Lightweight neural voice activity detector that distinguishes speech from silence/noise in real time.

↓ 310k · ★ 4.8

Whisper with fast forced alignment, accurate word-level timestamps, and multi-speaker diarization.

↓ 280k · ★ 4.7

Chatterbox TTS — Resemble AI Open-Source TTS

Resemble AI's state-of-the-art open-source TTS model with voice cloning, emotion exaggeration, and zero-shot speaker adaptation.

↓ 245k · ★ 4.8

Battle-tested deep learning TTS toolkit with XTTS, VITS, and 1000+ pretrained voices across 20+ languages.

↓ 210k · ★ 4.8

82M-parameter Apache-2.0 text-to-speech model with high naturalness and multiple voice styles.

↓ 190k · ★ 4.7

OpenAI's open-source speech recognition model — accurate multilingual transcription and translation in one model.

↓ 175k · ★ 4.8

NVIDIA NeMo — Scalable Speech & LLM Training Framework

NVIDIA's modular framework for training, fine-tuning, and deploying speech recognition, TTS, and large language models at scale.

↓ 170k · ★ 4.6

Suno's transformer-based text-to-audio model that generates realistic speech, music, and sound effects from text.

↓ 145k · ★ 4.7

Meta AI's music source separation model that splits any song into vocals, drums, bass, and other stems.

librosa — Python Audio & Music Analysis Library

Python library for audio and music analysis: spectrograms, MFCCs, beat tracking, pitch detection, and feature extraction.

Speaker diarization and voice activity detection toolkit: who spoke when in any audio recording.

All-in-one conversational AI toolkit for ASR, speaker recognition, speech enhancement, and language identification.

End-to-end speech processing toolkit covering ASR, TTS, speech translation, enhancement, and speaker diarization.

Meta's neural audio codec for high-fidelity compression at bitrates as low as 1.5 kbps; the audio tokenizer behind MusicGen in AudioCraft.


The AI capability marketplace. Skills, MCP servers, plugins, agents, and datasets: discoverable by humans, consumable by machines.

api · v3.1
status · all green

Contact

support@ai-supply.store
security@ai-supply.store
