Model · Audio & Speech · Free
faster-whisper
CTranslate2-powered reimplementation of Whisper with up to 4× faster inference and lower memory usage.
Installs: 420k
Rating: ★ 4.9
Reviews: 140
faster-whisper is a high-performance reimplementation of OpenAI's Whisper built on CTranslate2, a fast inference engine for Transformer models. It can transcribe up to 4× faster than the original Whisper at the same accuracy, or trade some of that speed for roughly half the memory, which makes it well suited to real-time and batch workloads in production.
Key Features
- Up to 4× faster inference: CTranslate2 INT8 quantisation on CPU and FP16 on GPU
- Lower memory: serve large-v3 on a single consumer GPU that couldn't fit the original
- Word-level timestamps: accurate per-word start/end times with minimal overhead
- VAD filter: integrated Silero VAD to skip silence and reduce hallucinations on quiet audio
- Batched inference: process multiple audio chunks in parallel for maximum GPU utilisation
- Drop-in compatible: same model names as the original Whisper; automatic download from HF Hub
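Several of the features above correspond to options on `WhisperModel` and `transcribe()`. The following is a minimal sketch of how they might be wired together, not a reference setup. The helper names (`pick_compute_type`, `load_model`, `transcribe_words`, `transcribe_batched`) are hypothetical, the `min_silence_duration_ms` value is only illustrative, and the batched helper assumes a recent release that ships `BatchedInferencePipeline`.

```python
def pick_compute_type(device: str) -> str:
    """Map a device to the precision named in the feature list (a simplification)."""
    # FP16 on GPU, INT8 on CPU; CTranslate2 supports other compute types as well.
    return "float16" if device == "cuda" else "int8"


def load_model(size: str = "large-v3", device: str = "cuda"):
    """Build a WhisperModel; the import is lazy so the other helpers load without it."""
    from faster_whisper import WhisperModel

    return WhisperModel(size, device=device, compute_type=pick_compute_type(device))


def transcribe_words(model, audio_path: str) -> list:
    """Return (start, end, word) tuples using word timestamps and the VAD filter."""
    segments, _info = model.transcribe(
        audio_path,
        word_timestamps=True,  # attach per-word start/end times to each segment
        vad_filter=True,       # skip silent stretches with Silero VAD
        vad_parameters={"min_silence_duration_ms": 500},  # illustrative value
    )
    words = []
    for segment in segments:  # segments is a generator; work happens as it is consumed
        for word in segment.words or []:
            words.append((word.start, word.end, word.word))
    return words


def transcribe_batched(model, audio_path: str, batch_size: int = 16) -> list:
    """Transcribe with batched inference and return the segment texts."""
    # Assumption: the installed faster-whisper version provides this class.
    from faster_whisper import BatchedInferencePipeline

    batched = BatchedInferencePipeline(model=model)
    segments, _info = batched.transcribe(audio_path, batch_size=batch_size)
    return [segment.text for segment in segments]
```

On a machine without a GPU, `load_model(device="cpu")` would select INT8 under this mapping. Whether a larger `batch_size` helps depends on available GPU memory.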
Quick Start
pip install faster-whisper
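from faster_whisper import WhisperModel
# Load large-v3 on the GPU in half precision
model = WhisperModel("large-v3", device="cuda", compute_type="float16")
# transcribe() returns a lazy generator of segments plus metadata about the audio
segments, info = model.transcribe("audio.mp3", beam_size=5)
# Transcription runs as the generator is consumed
for segment in segments:
    print(f"[{segment.start:.2f}s → {segment.end:.2f}s] {segment.text}")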
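Because `segments` is a generator, the transcript is produced as the loop runs, so the output can be streamed into another format instead of printed. As one possible follow-up, the sketch below turns segments into SubRip (SRT) subtitle text. The helpers `srt_timestamp` and `segments_to_srt` are hypothetical and not part of faster-whisper; they only assume each segment exposes `start`, `end`, and `text`, as in the loop above.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, total_ms = divmod(total_ms, 3_600_000)
    minutes, total_ms = divmod(total_ms, 60_000)
    secs, millis = divmod(total_ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from objects that have start, end, and text attributes."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(segment.start)} --> {srt_timestamp(segment.end)}\n"
            f"{segment.text.strip()}\n"
        )
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)
```

Passing the `segments` generator from the Quick Start to `segments_to_srt` consumes it once, so the result should be stored if it is needed again.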
npx ai-supply add faster-whisper-optimized-transcription
Curated mirror of the open-source faster-whisper (MIT). Get it from the source.