TRL (Transformer Reinforcement Learning)

Fine-tune LLMs with SFT, DPO, PPO, and GRPO. TRL is Hugging Face's library for aligning language models with RLHF and related methods.

Publisher: @ai-supply
Installs: 380k
Rating: ★ 4.8
Reviews: 127

TRL is a full-stack library by Hugging Face for training transformer language models with reinforcement learning from human feedback (RLHF) and related alignment techniques. It provides trainers for each stage of the modern LLM alignment pipeline: supervised fine-tuning, reward modeling, and preference or policy optimization.

Key Features

  • SFTTrainer: supervised fine-tuning with packing, LoRA, and chat templates
  • DPO/IPO/KTO: direct preference optimization and its variants, which need no separate reward model
  • PPO: proximal policy optimization with a reward model for classic RLHF
  • GRPO: group relative policy optimization (as used in DeepSeek-R1)
  • RewardTrainer: train reward models from preference data
  • Integrates with PEFT, Accelerate, and bitsandbytes for efficient training
  • 🤗 Hub model card generation and W&B/TensorBoard logging
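
To make two of the listed objectives concrete, here is a minimal pure-Python sketch of the sigmoid DPO loss for one preference pair and of GRPO-style group-relative advantages. It illustrates the math only and is not TRL's implementation: the function names, the `beta` default, the `eps` value, and the use of the population standard deviation are assumptions made for this example.

```python
import math
from statistics import mean, pstdev


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss for a single preference pair.

    Inputs are summed sequence log-probabilities under the policy being
    trained and under a frozen reference model (hypothetical values here).
    """
    # How much more the policy favors the chosen response over the rejected
    # one, relative to the reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # -log(sigmoid(x)) written in a numerically stable form.
    if x > 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize the rewards of completions sampled for the same prompt.

    Each completion's advantage is its reward minus the group mean, divided
    by the group standard deviation (population std is an assumption).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Policy identical to the reference: the loss equals log(2).
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Four completions for one prompt, two of them rewarded.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

In the DPO case no reward model appears anywhere: the preference signal comes entirely from the log-probability margin between policy and reference. In the GRPO case the baseline is the group mean, so no separate value model is needed.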

Quick Start

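from datasets import load_dataset
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

# Placeholder dataset (an assumption; the original snippet left `dataset` undefined).
# A dataset with a "text" column matches the default SFTConfig.
dataset = load_dataset("stanfordnlp/imdb", split="train")

# Gated checkpoint: requires accepting the license on the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(output_dir="/tmp/sft"),  # checkpoints and logs are written here
    train_dataset=dataset,
)
trainer.train()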
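
The `train_dataset` passed to the trainer is, in the simplest case, a collection of records that each carry a single `text` field. The sketch below shows one way to shape custom prompt/response pairs into that form using only the standard library. The helper name `to_sft_records`, the separator, and the sample pairs are hypothetical and chosen only for illustration.

```python
def to_sft_records(pairs, separator="\n\n"):
    """Turn (prompt, response) pairs into records with a single "text" field."""
    records = []
    for prompt, response in pairs:
        prompt, response = prompt.strip(), response.strip()
        if not prompt or not response:
            continue  # skip incomplete examples rather than train on them
        records.append({"text": f"{prompt}{separator}{response}"})
    return records


if __name__ == "__main__":
    # Hypothetical sample data.
    pairs = [
        ("What does SFT stand for?", "Supervised fine-tuning."),
        ("  ", "An answer without a prompt is dropped."),
    ]
    for record in to_sft_records(pairs):
        print(record)
```

The resulting list can be wrapped with `datasets.Dataset.from_list(records)` before it is handed to the trainer.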

Install via ai-supply

npx ai-supply add trl-rlhf-training

This listing is a curated mirror of the open-source TRL library (Apache-2.0). The upstream project is available from its source repository.
