Name: Stable-Baselines3 — Reliable RL Algorithm Implementations
Author: ai-supply

Stable-Baselines3 — Reliable RL Algorithm Implementations

Stable-Baselines3 (SB3), maintained by DLR-RM, is a widely used library of well-tested reinforcement learning algorithm implementations in PyTorch. It provides implementations of PPO, SAC, TD3, DQN, A2C, and DDPG, plus Hindsight Experience Replay (HER), behind a consistent scikit-learn-like API. It is used in academia and industry for training game-playing agents, robotic controllers, and simulation-based optimization.

Key Features

Algorithms: PPO, SAC, TD3, DQN, A2C, DDPG, plus HER (Hindsight Experience Replay) as a replay buffer for the off-policy algorithms
Consistent model.learn(total_timesteps=N) API across all algorithms
TensorBoard logging built in (pass tensorboard_log to the model); Weights & Biases logging through the callback shipped in the wandb package
Gymnasium-compatible — plug in any gym.Env subclass
Vectorized environments for parallel rollout collection
Extensive documentation with performance benchmarks

Quick Start

pip install "stable-baselines3[extra]"

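import gymnasium as gym
from stable_baselines3 import PPO

# LunarLander needs Box2D: pip install "gymnasium[box2d]".
# On Gymnasium 1.0 and later the environment ID is "LunarLander-v3".
env = gym.make("LunarLander-v2")

# Train a PPO agent with the default MLP policy, then save it to ppo_lunarlander.zip
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
model.save("ppo_lunarlander")

# Run the trained agent for 1000 steps
obs, _ = env.reset()
for _ in range(1000):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    # Start a new episode when the current one ends or hits the time limit
    if terminated or truncated:
        obs, _ = env.reset()

env.close()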

npx ai-supply add stable-baselines3-rl-algorithms

Curated mirror of the open-source Stable-Baselines3 (MIT). Get it from the source.
