Pipeline | Audio & Speech | Free
NVIDIA NeMo — Scalable Speech & LLM Training Framework
NVIDIA's modular framework for training, fine-tuning, and deploying speech recognition, TTS, and large language models at scale.
Installs: 170k
Rating: ★ 4.6
Reviews: 57
NVIDIA NeMo
NeMo is NVIDIA's end-to-end framework for developing and deploying state-of-the-art conversational AI, large language models, and speech models. It is built on PyTorch Lightning and supports distributed training across thousands of GPUs with tensor, pipeline, and data parallelism.
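To make the three parallelism modes concrete, here is a small sketch of how they usually combine: the GPUs are first grouped by tensor and pipeline parallelism, and whatever is left over becomes data-parallel replicas. The function name and the 64-GPU layout are illustrative assumptions, not part of NeMo's API, and a real configuration should be checked against the framework's own documentation.

```python
def data_parallel_size(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Return how many data-parallel replicas fit in a given GPU layout.

    Hypothetical helper: one model replica occupies
    tensor_parallel * pipeline_parallel GPUs, and the remaining
    factor of world_size is used for data parallelism.
    """
    if min(world_size, tensor_parallel, pipeline_parallel) < 1:
        raise ValueError("all sizes must be positive integers")
    gpus_per_replica = tensor_parallel * pipeline_parallel
    if world_size % gpus_per_replica != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"tensor_parallel * pipeline_parallel = {gpus_per_replica}"
        )
    return world_size // gpus_per_replica


# Illustrative layout: 64 GPUs, 4-way tensor parallel, 2-way pipeline parallel.
print(data_parallel_size(64, 4, 2))  # 8 replicas, each spanning 8 GPUs
```

The divisibility check matters in practice: a layout whose tensor and pipeline sizes do not divide the GPU count cannot be split into equal replicas.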
Key Features
- ASR: Conformer, Citrinet, and FastConformer models with state-of-the-art word error rates (WER)
- TTS: FastPitch, HiFi-GAN, Mixer-TTS for natural speech synthesis
- NLP/LLM: GPT-style training, instruction tuning (SFT), RLHF, parameter-efficient fine-tuning
- Multimodal: vision-language alignment pipelines
- Collections: modular model collections for ASR, NLP, TTS, Vision
- Megatron-LM integration for ultra-large-scale training
- Deployment: export paths for NVIDIA TensorRT-LLM (TRT-LLM) and Triton Inference Server
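The ASR entry above is measured in word error rate, so a worked definition may help. WER is the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the model output, divided by the number of reference words. The following is a plain-Python sketch of that formula, not NeMo's own metric code; the function name and the sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Compute WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {wer:.3f}")
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, and that real evaluations normally apply text normalization (casing, punctuation) before scoring, which this sketch omits.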
Quick Start
import nemo.collections.asr as nemo_asr

# Load a pre-trained ASR model
asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(
    model_name="stt_en_conformer_ctc_large"
)

# Transcribe a local audio file and print the first result
transcriptions = asr_model.transcribe(["podcast.wav"])
print(transcriptions[0])
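Depending on the installed NeMo release, the items returned by transcribe may be plain strings or hypothesis objects that carry the text in a text attribute. That version difference is an assumption worth verifying against your install. The helper below is a hypothetical convenience, not part of NeMo, that accepts either shape; SimpleNamespace stands in for a hypothesis object so the sketch runs without NeMo installed.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def to_text(result: Any) -> str:
    """Return the transcript text from a str or an object with a `.text` attribute."""
    if isinstance(result, str):
        return result
    text = getattr(result, "text", None)
    if not isinstance(text, str):
        raise TypeError(f"cannot extract text from {type(result).__name__}")
    return text


def to_texts(results: Iterable[Any]) -> List[str]:
    """Normalize a batch of transcription results to a list of strings."""
    return [to_text(r) for r in results]


# Stand-in results: one plain string, one hypothesis-like object (assumed shape).
mixed = ["hello world", SimpleNamespace(text="good morning")]
print(to_texts(mixed))
```

With this in place, the last line of the Quick Start could be written as print(to_text(transcriptions[0])) and would behave the same for either return type.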
Install via ai-supply
npx ai-supply add nemo-speech-and-llm-framework
Curated mirror of the open-source NeMo (Apache-2.0). Get it from the source.