◆SkillLanguage & NLPFree
Hugging Face Accelerate
Run PyTorch training scripts on any hardware — single GPU, multi-GPU, TPU — with minimal code changes.
Installs450k
Rating★ 4.8
Reviews150
Hugging Face Accelerate
Accelerate is a library that enables the same PyTorch training code to run seamlessly on any distributed configuration: single CPU, single GPU, multi-GPU (DDP/FSDP/DeepSpeed), TPU, or mixed precision — with almost no code changes required. It abstracts away all the boilerplate of distributed training.
Key Features
- Hardware agnostic: single line change to go from single GPU to 8-GPU DDP or TPU
- FSDP and DeepSpeed integration: scale to billions of parameters with memory-efficient sharding
- Mixed precision: fp16, bf16, fp8 training with gradient scaling
- Gradient accumulation and checkpointing built-in
- Big Model Inference: load models larger than GPU memory via device mapping
- Fully compatible with vanilla PyTorch — no new abstractions to learn
Quick Start
from accelerate import Accelerator
accelerator = Accelerator()
model, optimizer, train_loader = accelerator.prepare(model, optimizer, train_loader)
for batch in train_loader:
outputs = model(**batch)
loss = outputs.loss
accelerator.backward(loss)
optimizer.step()
optimizer.zero_grad()
# Launch multi-GPU training
accelerate launch --num_processes 4 train.py
Install via ai-supply
npx ai-supply add huggingface-accelerate-training
Curated mirror of the open-source Hugging Face Accelerate (Apache-2.0). Get it from the source.