Coqui TTS
Battle-tested deep learning TTS toolkit with XTTS, VITS, and pretrained models in 1100+ languages.
Coqui TTS is a production-grade deep learning toolkit for Text-to-Speech synthesis. It implements state-of-the-art architectures (XTTS v2, VITS, YourTTS, GlowTTS, SpeedySpeech) with a unified API, supporting voice cloning, multilingual synthesis, and streaming output.
Key Features
- XTTS v2: cross-lingual zero-shot voice cloning in 17 languages from a 6-second sample
- Pretrained models in 1100+ languages available via the model manager
- Streaming synthesis for low-latency deployment in conversational AI
- Fine-tuning pipeline: adapt any model to a new voice in minutes on consumer hardware
- REST API server included (`tts-server`) for easy integration
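As a rough sketch of the integration path, a client can call a running `tts-server` over plain HTTP. This example assumes the server listens on port 5002 and exposes an `/api/tts` endpoint that takes a `text` query parameter; both should be checked against the installed version. The helper names `build_tts_url` and `fetch_wav` are hypothetical and exist only for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_tts_url(text, host="localhost", port=5002):
    """Build the request URL for a tts-server instance.

    Assumption: the server listens on port 5002 and serves /api/tts
    with a `text` query parameter.
    """
    return f"http://{host}:{port}/api/tts?{urlencode({'text': text})}"


def fetch_wav(text, out_path, host="localhost", port=5002, timeout=60):
    """Ask the server to synthesise `text` and save the returned audio bytes."""
    with urlopen(build_tts_url(text, host, port), timeout=timeout) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

With a server started separately (for example `tts-server --model_name tts_models/en/ljspeech/tacotron2-DDC`), calling `fetch_wav("Hello from Coqui!", "server_output.wav")` should write the synthesised audio to disk.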
Quick Start
```shell
pip install TTS

# List available models
tts --list_models

# Synthesise speech
tts --text "Hello from Coqui!" \
    --model_name tts_models/en/ljspeech/tacotron2-DDC \
    --out_path output.wav
```
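```python
from TTS.api import TTS

# Load XTTS v2 (model files are downloaded on first use)
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# Clone the voice in my_voice.wav and speak French text with it
tts.tts_to_file(
    text="Bonjour le monde!",
    speaker_wav="my_voice.wav",
    language="fr",
    file_path="output.wav",
)
```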
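Since the Key Features list mentions a 6-second reference sample for XTTS v2 cloning, it may be worth checking the clip length before loading the model. The sketch below does that with the standard-library `wave` module. The helper names and the decision to treat 6 seconds as a hard minimum are assumptions made for illustration, not part of the Coqui API. Only `TTS(...)` and `tts_to_file(...)` come from the Quick Start.

```python
import wave

# Assumption: treat the 6-second figure from the feature list as a minimum.
MIN_REFERENCE_SECONDS = 6.0


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path, minimum=MIN_REFERENCE_SECONDS):
    """Check whether the reference clip meets the minimum duration."""
    return wav_duration_seconds(path) >= minimum


def clone_to_file(text, speaker_wav, language, file_path):
    """Synthesise `text` in the voice of `speaker_wav`, rejecting short clips."""
    if not is_long_enough(speaker_wav):
        raise ValueError(
            f"{speaker_wav} is shorter than {MIN_REFERENCE_SECONDS} seconds"
        )
    # Imported lazily so the duration helpers work without TTS installed.
    from TTS.api import TTS

    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    tts.tts_to_file(
        text=text,
        speaker_wav=speaker_wav,
        language=language,
        file_path=file_path,
    )
    return file_path
```

Failing early on a short clip avoids downloading and loading the model only to get a poor clone. The threshold can be relaxed by passing a different `minimum` to `is_long_enough`.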
npx ai-supply add coqui-tts-synthesis
Curated mirror of the open-source Coqui TTS (MPL-2.0). Get it from the source.