Stable-Baselines3 — Reliable RL Algorithm Implementations
DLR's MIT-licensed PyTorch implementations of PPO, SAC, TD3, DQN, A2C, and HER — battle-tested RL algorithms for game AI and simulation.
Stable-Baselines3 (SB3), maintained by DLR-RM, is a widely used library of tested reinforcement learning algorithm implementations in PyTorch. It provides PPO, SAC, TD3, DQN, A2C, and DDPG, plus HER (a goal-relabeling replay buffer for the off-policy algorithms), behind a consistent sklearn-like API: construct a model, call learn(), then call predict(). It is used in academia and industry for training game-playing agents, robotic controllers, and simulation-based optimization.
Key Features
- Algorithms: PPO, SAC, TD3, DQN, A2C, DDPG, HER (Hindsight Experience Replay)
- Consistent model.learn(total_timesteps=N) API across all algorithms
- TensorBoard logging built in; Weights & Biases logging available through the wandb SB3 callback
- Gymnasium-compatible: plug in any gym.Env subclass
- Vectorized environments for parallel rollout collection
- Extensive documentation with performance benchmarks
Quick Start
pip install "stable-baselines3[extra]"
import gymnasium as gym
from stable_baselines3 import PPO
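# LunarLander needs Box2D: pip install "gymnasium[box2d]"
# On Gymnasium >= 1.0 the environment ID is "LunarLander-v3"
env = gym.make("LunarLander-v2")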
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
model.save("ppo_lunarlander")
# Evaluate trained agent
obs, _ = env.reset()
for _ in range(1000):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()
npx ai-supply add stable-baselines3-rl-algorithms
Curated mirror of the open-source Stable-Baselines3 (MIT). Get it from the source.