Name: NVIDIA NeMo — Scalable Speech & LLM Training Framework
Availability: InStock
Author: ai-supply

NVIDIA NeMo

NeMo is NVIDIA's end-to-end framework for building, training, and deploying conversational AI models, including large language models (LLMs), automatic speech recognition (ASR), and text-to-speech (TTS). It is built on PyTorch Lightning and scales distributed training to thousands of GPUs by combining tensor, pipeline, and data parallelism.
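To make the three kinds of parallelism concrete, here is a minimal sketch of how a GPU budget is usually divided in Megatron-style training: the tensor-parallel and pipeline-parallel sizes multiply to give the number of GPUs holding one model replica, and the remaining factor is the data-parallel size. The function name `data_parallel_size` is hypothetical and is not part of the NeMo API; the sketch also assumes no other parallel dimensions (such as context or expert parallelism) are in use.

```python
def data_parallel_size(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Return the number of data-parallel replicas that fit on `world_size` GPUs.

    Hypothetical helper for illustration only; assumes
    world_size = tensor_parallel * pipeline_parallel * data_parallel.
    """
    if min(world_size, tensor_parallel, pipeline_parallel) < 1:
        raise ValueError("all sizes must be positive integers")

    # GPUs needed to hold a single copy of the model
    gpus_per_replica = tensor_parallel * pipeline_parallel
    if world_size % gpus_per_replica != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"tensor_parallel * pipeline_parallel = {gpus_per_replica}"
        )
    return world_size // gpus_per_replica


# Example: 1024 GPUs with 8-way tensor and 16-way pipeline parallelism
# leaves 1024 / (8 * 16) = 8 data-parallel replicas.
print(data_parallel_size(1024, 8, 16))
```

Checking divisibility up front is a deliberate choice in this sketch: a mismatched configuration would otherwise leave GPUs idle or fail later, at launch time.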

Key Features

ASR: Conformer, Citrinet, and FastConformer models targeting state-of-the-art word error rates (WER)
TTS: FastPitch, HiFi-GAN, Mixer-TTS for natural speech synthesis
NLP/LLM: GPT-style pretraining, supervised fine-tuning (SFT) for instruction following, reinforcement learning from human feedback (RLHF), and parameter-efficient fine-tuning (PEFT)
Multimodal: vision-language alignment pipelines
Collections: modular model collections for ASR, NLP, TTS, and vision
Scale: Megatron-LM integration for very large-scale training
Deployment: export paths to NVIDIA TensorRT-LLM and Triton Inference Server
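The ASR entry above refers to word error rate. As a rough sketch of what that metric measures, the snippet below computes WER as the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. The function `word_error_rate` is a hypothetical standalone helper written for illustration, not NeMo's own metric implementation, and it assumes simple whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper only; assumes whitespace tokenization and
    no casing or punctuation normalization.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix seen so far
    # and the first j hypothesis words
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref_words)


# One substitution ("sat" -> "sit") and one deletion ("the"):
# 2 errors over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.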

Quick Start

import nemo.collections.asr as nemo_asr

# Load a pre-trained ASR model
asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(
    model_name="stt_en_conformer_ctc_large"
)

# Transcribe a list of audio file paths; one result is returned per input file
transcriptions = asr_model.transcribe(["podcast.wav"])
print(transcriptions[0])

Install via ai-supply

npx ai-supply add nemo-speech-and-llm-framework

Curated mirror of the open-source NeMo project (Apache-2.0). The upstream project is the original source.
