Skip to content
ai-supply.store
DiscoverCategoriesLeaderboardsCommunityAgent APIFAQ
PublishSign in
catalog / Language & NLP / TRL (Transformer Reinforcement Learning)
⊜Fine-tuneLanguage & NLPFree

TRL (Transformer Reinforcement Learning)

Fine-tune LLMs with RLHF, PPO, DPO, SFT, and GRPO — the standard library for aligning language models.

@ai-supply
Installs380k
Rating★ 4.8
Reviews127
↗ Source repository

TRL — Transformer Reinforcement Learning

TRL is a full-stack library by Hugging Face for training transformer language models with reinforcement learning from human feedback (RLHF) and related alignment techniques. It provides efficient trainers for every stage of the modern LLM alignment pipeline.

Key Features

  • SFTTrainer: supervised fine-tuning with packing, LoRA, and chat templates
  • DPO/IPO/KTO: direct preference optimization variants — no reward model needed
  • PPO: proximal policy optimization with reward model for classic RLHF
  • GRPO: group relative policy optimization (as used in DeepSeek-R1)
  • RewardTrainer: train reward models from preference data
  • Integrates with PEFT, Accelerate, bitsandbytes for efficient training
  • 🤗 Hub model card generation and W&B/TensorBoard logging

Quick Start

from trl import SFTTrainer, SFTConfig
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
trainer = SFTTrainer(
    model=model,
    args=SFTConfig(output_dir="/tmp/sft"),
    train_dataset=dataset,
)
trainer.train()

Install via ai-supply

npx ai-supply add trl-rlhf-training

Curated mirror of the open-source TRL (Apache-2.0). Get it from the source.

More from @ai-supply

View profile →
◐Model
llama.cpp
Pure C/C++ LLM inference library — run quantized models on CPU, Metal, CUDA and more.
↓ 900k★ 4.9
⇄Connector
vLLM
High-throughput, memory-efficient LLM inference engine with PagedAttention and continuous batching.
↓ 820k★ 4.9
⠿Embedding
Sentence Transformers
State-of-the-art sentence and text embeddings — compute semantic similarity, clustering, and dense retrieval.
↓ 750k★ 4.9
⬡Pipeline
Diffusers
Hugging Face's state-of-the-art library for diffusion-based image, video, and audio generation models.
↓ 750k★ 4.9
ai-supply.store

The marketplace for AI capabilities. Skills, MCPs, plugins, agents, datasets — discoverable by humans, consumable by machines.

api · v3.1status · all green
Marketplace
  • Discover
  • Categories
  • Leaderboards
  • Benchmarks
Community
  • Community
  • FAQ
For agents
  • Quickstart (60s)
  • Authorize an agent
  • Agent API
  • OpenAPI spec
For builders
  • Publish
  • Dashboard
  • Revenue share
Account
  • Sign in
  • Settings
Legal
  • Terms
  • Publisher Agreement
  • Acceptable Use
  • Privacy