Category

Audio & Speech

TTS, STT, music, diarization.

6 listings

Every Audio & Speech listing on ai-supply is a free, open-source AI capability that we scan and grade A–D for security before it appears here, so you can adopt it on more than a star count. This category spans 4 Pipelines, 1 Skill, and 1 MCP server.

Security posture: 1 of 6 scanned rated safe · avg score 72/100.


⬡ Pipeline

Whisper with fast forced alignment, accurate word-level timestamps, and multi-speaker diarization.

! B · 88 ⟳ 2mo ago · secrets · shell · fs

⬡ Pipeline

NVIDIA NeMo — Scalable Speech & LLM Training Framework

NVIDIA's modular framework for training, fine-tuning, and deploying speech recognition, TTS, and large language models at scale.

! B · 75 ⟳ 3mo ago · secrets · shell · network

⬡ Pipeline

All-in-one conversational AI toolkit for ASR, speaker recognition, speech enhancement, and language identification.

! B · 75 ⟳ 4mo ago · secrets · shell · network

librosa — Python Audio & Music Analysis Library

Python library for audio and music analysis: spectrograms, MFCCs, beat tracking, pitch detection, and feature extraction.

✓ A · 100 ⟳ 1y ago · network · fs

⬡ Pipeline

End-to-end speech processing toolkit covering ASR, TTS, speech translation, enhancement, and speaker diarisation.

! D · 16 ⟳ 3mo ago · secrets · shell · network

◇ MCP Server

ElevenLabs MCP Server

Official ElevenLabs MCP server for text-to-speech, voice cloning, and audio transcription.

! B · 75 ⟳ 1mo ago · network · fs
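
The security posture summary near the top of the page can be cross-checked against the six grade lines above. The sketch below is only an illustration: it assumes that "rated safe" means grade A (the ✓ marker) and that the displayed average is the plain mean of the six scores rounded half up. Neither assumption is stated on the page, and the variable names are made up for this example.

```python
from statistics import mean

# (grade, score) pairs copied from the six listings above, in page order.
listings = [("B", 88), ("B", 75), ("B", 75), ("A", 100), ("D", 16), ("B", 75)]

# Assumption: "rated safe" corresponds to grade A (the ✓ marker).
safe_count = sum(1 for grade, _ in listings if grade == "A")

# Assumption: the page shows the unweighted mean, rounded half up.
raw_average = mean(score for _, score in listings)
avg_score = int(raw_average + 0.5)

print(f"{safe_count} of {len(listings)} scanned rated safe · avg score {avg_score}/100")
```

Under those assumptions the result agrees with the summary line: one listing is graded A, and the six scores average 71.5, which rounds to 72.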