ai-supply.store
Fine-tune · Language & NLP · Free

TRL (Transformer Reinforcement Learning)

Fine-tune and align LLMs with SFT, reward modeling, PPO, DPO, and GRPO using Hugging Face's library for RLHF-style training.

@ai-supply
Installs: 380k
Rating: ★ 4.8
Reviews: 127
↗ Source repository

TRL — Transformer Reinforcement Learning

TRL is a full-stack library by Hugging Face for training transformer language models with reinforcement learning from human feedback (RLHF) and related alignment techniques. It provides trainers for each stage of a typical LLM alignment pipeline: supervised fine-tuning, reward modeling, and preference or policy optimization.

Key Features

  • SFTTrainer: supervised fine-tuning with packing, LoRA, and chat templates
  • DPO/IPO/KTO: direct preference optimization variants — no reward model needed
  • PPO: proximal policy optimization with reward model for classic RLHF
  • GRPO: group relative policy optimization (as used in DeepSeek-R1)
  • RewardTrainer: train reward models from preference data
  • Integrates with PEFT, Accelerate, bitsandbytes for efficient training
  • 🤗 Hub model card generation and W&B/TensorBoard logging
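
To make two of the items above concrete, here is a minimal sketch of the quantities behind DPO and GRPO, written in plain Python rather than with TRL itself. The function names `dpo_loss` and `grpo_advantages` are hypothetical, the log-probabilities and rewards are made-up numbers, and the use of the population standard deviation is an assumption. TRL's trainers compute these over batches of tensors and may differ in detail.

```python
import math
from statistics import mean, pstdev


def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """DPO loss for one preference pair, given summed sequence log-probs.

    Only the policy and a frozen reference model are involved, which is
    why no separate reward model is needed.
    """
    # How much more the policy favors the chosen response than the reference does
    margin = (policy_chosen_lp - ref_chosen_lp) - (policy_rejected_lp - ref_rejected_lp)
    # -log(sigmoid(beta * margin)), written as a numerically stable softplus
    x = -beta * margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def grpo_advantages(rewards, eps=1e-4):
    """Group-relative advantages: standardize rewards within one prompt's samples."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative numbers only
print(dpo_loss(-5.0, -9.0, -7.0, -7.0))       # policy already prefers the chosen answer
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # four sampled completions for one prompt
```

In the DPO case the loss falls below log 2 once the policy prefers the chosen response more strongly than the reference does. In the GRPO case the group mean acts as the baseline, so no learned value function is required.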

Quick Start

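from datasets import load_dataset
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

# Placeholder dataset (not specified in the original listing):
# any dataset with a "text" column can be used here.
dataset = load_dataset("stanfordnlp/imdb", split="train")

# Gated checkpoint: accept the license on the Hugging Face Hub before downloading.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
trainer = SFTTrainer(
    model=model,
    args=SFTConfig(output_dir="/tmp/sft"),  # checkpoints are written here
    train_dataset=dataset,
)
trainer.train()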
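
The packing option listed for SFTTrainer under Key Features can be pictured with a simplified sketch. The idea is to concatenate short tokenized examples, separated by an end-of-sequence token, and cut the stream into fixed-size blocks so little compute is spent on padding. The helper `pack_sequences` and the token ids below are hypothetical. This is not TRL's implementation, which may use a different packing strategy.

```python
from itertools import chain


def pack_sequences(tokenized, block_size, eos_id):
    """Concatenate token-id lists (EOS-separated) and cut into fixed-size blocks."""
    stream = list(chain.from_iterable(seq + [eos_id] for seq in tokenized))
    # Drop the trailing remainder that does not fill a whole block
    n_full = len(stream) // block_size
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_full)]


# Three short "tokenized" examples with made-up token ids; 0 stands in for EOS
examples = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
blocks = pack_sequences(examples, block_size=4, eos_id=0)
print(blocks)  # [[1, 2, 3, 0], [4, 5, 0, 6], [7, 8, 9, 0]]
```

Note that a block can span two examples, as in the second block here. That is the trade-off packing makes for higher throughput.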

Install via ai-supply

npx ai-supply add trl-rlhf-training

Curated mirror of the open-source TRL (Apache-2.0). Get it from the source.
