Fine-tune · Language & NLP · Free

TRL (Transformer Reinforcement Learning)

Fine-tune and align LLMs with SFT, reward modeling, PPO (classic RLHF), DPO, and GRPO, using Hugging Face's library for post-training language models.

@ai-supply
Installs: 380k
Rating: ★ 4.8
Reviews: 127
Source repository ↗

TRL — Transformer Reinforcement Learning

TRL is a full-stack library by Hugging Face for training transformer language models with reinforcement learning from human feedback (RLHF) and related alignment techniques. It provides efficient trainers for every stage of the modern LLM alignment pipeline.

Key Features

  • SFTTrainer: supervised fine-tuning with packing, LoRA, and chat templates
  • DPO/IPO/KTO: direct preference optimization variants — no reward model needed
  • PPO: proximal policy optimization with reward model for classic RLHF
  • GRPO: group relative policy optimization (as used in DeepSeek-R1)
  • RewardTrainer: train reward models from preference data
  • Integrates with PEFT, Accelerate, bitsandbytes for efficient training
  • 🤗 Hub model card generation and W&B/TensorBoard logging
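Two of the objectives named in the list above can be sketched in plain Python to show what the trainers optimize. This is an illustrative sketch, not TRL's implementation: the function names `dpo_loss` and `group_relative_advantages`, the `beta` default, and the `eps` constant are assumptions chosen for the example. The first function shows why DPO needs no separate reward model (the reward is implicit in the policy-to-reference log-probability ratio); the second shows the group-relative normalization that gives GRPO its name.

```python
import math
from statistics import mean, pstdev


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss for one preference pair, given sequence log-probs."""
    # Implicit reward margin: how much more the policy favors the chosen
    # response over the rejected one, relative to the frozen reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # -log(sigmoid(x)) computed as a numerically stable softplus(-x).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize the rewards of one prompt's sampled completions (GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    # Each completion is scored relative to its own group, so no value
    # network is needed to estimate a baseline.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    print(dpo_loss(-1.0, -5.0, -2.0, -2.0))
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

In a real run the log-probabilities would come from summing token log-probs of each response under the policy and reference models, and the rewards from a reward function or reward model applied to each sampled completion.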

Quick Start

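from datasets import load_dataset
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

# Placeholder dataset with a "text" column; swap in your own training data.
dataset = load_dataset("stanfordnlp/imdb", split="train")

# Gated checkpoint: accept the license on the Hugging Face Hub before downloading.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
trainer = SFTTrainer(
    model=model,
    args=SFTConfig(output_dir="/tmp/sft"),  # checkpoints are written here
    train_dataset=dataset,
)
trainer.train()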
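The feature list mentions packing for supervised fine-tuning, which `SFTConfig` exposes through its `packing` option. The idea can be illustrated with a naive concatenate-and-chunk sketch: short tokenized examples are joined into one stream and cut into fixed-size blocks so that no compute is spent on padding. The function name `pack_sequences`, the `block_size` parameter, and the token ids below are hypothetical, and TRL's own packing strategy may differ from this simplified version.

```python
def pack_sequences(tokenized_examples, block_size, eos_token_id):
    """Join tokenized examples (EOS-separated) and cut into equal blocks."""
    stream = []
    for ids in tokenized_examples:
        stream.extend(ids)
        stream.append(eos_token_id)  # mark the boundary between examples
    # Keep only whole blocks; the trailing remainder is dropped.
    n_blocks = len(stream) // block_size
    return [stream[i * block_size:(i + 1) * block_size]
            for i in range(n_blocks)]


if __name__ == "__main__":
    # Three short "documents" with made-up token ids, EOS id 0.
    print(pack_sequences([[1, 2, 3], [4, 5], [6]], block_size=4, eos_token_id=0))
```

Every packed block has the same length, so batches need no padding tokens. The trade-off is that one block can contain pieces of several unrelated examples.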

Install via ai-supply

npx ai-supply add trl-rlhf-training

Curated mirror of the open-source TRL (Apache-2.0). Get it from the source.
