Diffusion Policy training (LeRobot)
Diffusion Policy was introduced by Chi et al. at Columbia University; see diffusion-policy.cs.columbia.edu and the reference code columbia-ai-robotics/diffusion_policy. LeRobot ships a PyTorch implementation under LeRobot diffusion policy and publishes a PushT baseline lerobot/diffusion_pusht.
This page covers only the LeRobot path. For strict paper benchmark reproduction, use the original Diffusion Policy repo and its data pipeline.
When to use
Diffusion Policy models actions as a conditional denoising process. It fits continuous, smooth control with multiple feasible motion modes. It is heavier and slower than ACT, but a good choice when you need smooth multi-step trajectories.
Prefer it when:
- Trajectories are continuous and smoothness matters.
- Multiple valid action paths exist.
- You have more data than a minimal ACT baseline.
- You can afford longer training and slower inference.
If you only need to validate the data pipeline, establish a baseline with ACT or SmolVLA first.
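The conditional denoising process mentioned above can be illustrated in a few lines. This is a toy DDPM-style sketch with NumPy, not LeRobot's implementation: the epsilon predictor here is a stand-in for the trained network, and the schedule values are illustrative defaults.

```python
import numpy as np

def make_schedule(T=50, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule, as in DDPM."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def sample_actions(eps_model, obs, horizon=16, action_dim=2, T=50, rng=None):
    """Reverse diffusion: start from pure noise and iteratively denoise a
    whole action trajectory, conditioned on the observation."""
    rng = rng or np.random.default_rng(0)
    betas, alphas, alpha_bars = make_schedule(T)
    x = rng.standard_normal((horizon, action_dim))   # pure noise
    for t in reversed(range(T)):
        eps = eps_model(x, t, obs)                   # predicted noise
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x  # denoised action trajectory of shape (horizon, action_dim)

# Stand-in for the trained network: always predicts zero noise.
traj = sample_actions(lambda x, t, obs: np.zeros_like(x), obs=None)
print(traj.shape)  # (16, 2)
```

The key point is that the policy samples a whole multi-step trajectory per denoising pass, which is where the smoothness and multi-modality come from.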
Official references (LeRobot v0.5.0)
Public examples include:
- examples/training/train_policy.py — trains DiffusionPolicy with DiffusionConfig on PushT.
- The lerobot/diffusion_pusht model card — documents lerobot-train with --policy.type=diffusion, --dataset.repo_id=lerobot/pusht, --batch_size=64, --steps=200000.
Training should use the installed CLI:
lerobot-train \
--output_dir=outputs/train/diffusion_pusht \
--policy.type=diffusion \
--dataset.repo_id=lerobot/pusht \
--seed=100000 \
--env.type=pusht \
--batch_size=64 \
--steps=200000 \
--eval_freq=25000 \
--save_freq=25000 \
--wandb.enable=true
For your own robot data, omit --env.type initially; run offline training and checkpoint loading checks first.
LeRobot GPU image
IO-AI publishes ioaitech/lerobot-gpu:v0.5.0, which bundles LeRobot v0.5.0, GPU training dependencies, video decoding, and lerobot-train for Diffusion Policy training.
Smoke test
docker run --rm --gpus all --shm-size 16g \
-v /path/to/lerobot_dataset:/data/input \
-v /path/to/output:/outputs \
ioaitech/lerobot-gpu:v0.5.0 \
bash -lc 'lerobot-train \
--policy.type=diffusion \
--dataset.repo_id=local/my_dataset \
--dataset.root=/data/input \
--batch_size=8 \
--steps=1000 \
--output_dir=/outputs/diffusion_smoke \
--job_name=diffusion_smoke \
--policy.device=cuda \
--wandb.enable=false'
Training template
docker run --rm --gpus all --shm-size 16g \
-v /path/to/lerobot_dataset:/data/input \
-v /path/to/output:/outputs \
ioaitech/lerobot-gpu:v0.5.0 \
bash -lc 'lerobot-train \
--policy.type=diffusion \
--dataset.repo_id=local/my_dataset \
--dataset.root=/data/input \
--batch_size=64 \
--steps=100000 \
--output_dir=/outputs/diffusion_policy \
--job_name=diffusion_policy \
--policy.device=cuda \
--save_checkpoint=true \
--save_freq=10000 \
--wandb.enable=false'
If a flag is rejected, inspect the CLI for your exact image/git commit:
lerobot-train --help
Reproduce from upstream LeRobot
Read Install LeRobot, then pin v0.5.0:
git clone https://github.com/huggingface/lerobot.git
cd lerobot
git checkout v0.5.0
python -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
pip install -e ".[all]"
Local dataset:
lerobot-train \
--policy.type=diffusion \
--dataset.repo_id=local/my_dataset \
--dataset.root=/path/to/lerobot_dataset \
--batch_size=64 \
--steps=100000 \
--output_dir=outputs/train/diffusion_policy \
--job_name=diffusion_policy \
--policy.device=cuda \
--wandb.enable=false
PushT reproduction:
lerobot-train \
--policy.type=diffusion \
--dataset.repo_id=lerobot/pusht \
--env.type=pusht \
--batch_size=64 \
--steps=200000 \
--output_dir=outputs/train/diffusion_pusht \
--job_name=diffusion_pusht \
--policy.device=cuda
Parameter guidance
| Flag | Guidance |
|---|---|
| --batch_size | Start at 8–16 for smoke tests; scale with VRAM. |
| --steps | Diffusion policies often need longer runs than ACT; ~100k is a reasonable first baseline. |
| --save_freq | Save multiple checkpoints to compare motion quality over training. |
| --policy.device | Use cuda for real training. |
| --wandb.enable | Recommended for long jobs to monitor loss and interruptions. |
Horizon, n_action_steps, noise schedules, and vision backbones change between releases—trust lerobot-train --help and the policy config classes for your pinned version.
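The interaction between horizon and n_action_steps follows a receding-horizon pattern: the policy predicts a chunk of `horizon` actions but only the first `n_action_steps` are executed before replanning from a fresh observation. A minimal sketch of that control flow (function names here are illustrative, not LeRobot's API):

```python
def receding_horizon(predict, get_obs, execute,
                     horizon=16, n_action_steps=8, total_steps=32):
    """Predict `horizon` actions, execute only the first `n_action_steps`,
    then replan from a new observation until `total_steps` are executed."""
    executed = []
    while len(executed) < total_steps:
        obs = get_obs()
        chunk = predict(obs, horizon)          # full predicted chunk
        for action in chunk[:n_action_steps]:  # run only the head of it
            execute(action)
            executed.append(action)
            if len(executed) >= total_steps:
                break
    return executed

# Dummy wiring just to show the flow: count how often we replan.
n_predictions = 0
def predict(obs, horizon):
    global n_predictions
    n_predictions += 1
    return list(range(horizon))

acts = receding_horizon(predict, get_obs=lambda: None, execute=lambda a: None)
print(len(acts), n_predictions)  # 32 executed actions across 4 replans
```

Smaller n_action_steps reacts faster to new observations but pays the (expensive) sampling cost more often; larger values amortize sampling at the cost of staleness.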
Evaluation guidance
Offline loss is a weak proxy for robot success. At minimum compare:
- Trajectory smoothness (jitter, pauses).
- Repeatability under identical initial conditions.
- Robustness to object pose / occlusion.
- Whether inference latency meets your control rate.
Real-time control trades the number of denoising steps at inference against action quality—validate the trade-off in sim or on hardware.
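For the smoothness comparison above, a simple quantitative proxy is mean squared jerk (third finite difference of the executed trajectory); lower is smoother. A minimal sketch, assuming you log rollouts as (T, action_dim) arrays:

```python
import numpy as np

def mean_squared_jerk(traj, dt=1.0):
    """traj: (T, action_dim) array of executed actions or positions.
    The third finite difference approximates jerk; average its squared norm."""
    jerk = np.diff(traj, n=3, axis=0) / dt**3
    return float(np.mean(np.sum(jerk**2, axis=-1)))

# A smooth circular rollout vs. the same rollout with added jitter.
t = np.linspace(0, 1, 50)[:, None]
smooth = np.hstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
rng = np.random.default_rng(0)
jittery = smooth + 0.05 * rng.standard_normal(smooth.shape)

print(mean_squared_jerk(smooth) < mean_squared_jerk(jittery))  # True
```

Compare this metric across checkpoints saved at different --save_freq intervals rather than trusting a single offline loss number.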
Troubleshooting
Key mismatch
Inspect meta/info.json for image/state/action keys. Diffusion Policy is time-window sensitive; wrong keys or FPS corrupt supervision.
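A quick sanity check can be scripted against meta/info.json. This sketch assumes the LeRobotDataset layout where info.json carries a features mapping and an fps field; field names can differ across dataset format versions, so verify against your own file:

```python
import json
from pathlib import Path

def check_dataset_keys(root, expected_keys, expected_fps=None):
    """Print the feature keys and fps recorded in meta/info.json and
    return any expected key that is missing."""
    info = json.loads((Path(root) / "meta" / "info.json").read_text())
    features = info.get("features", {})
    missing = [k for k in expected_keys if k not in features]
    if expected_fps is not None and info.get("fps") != expected_fps:
        print(f"fps mismatch: dataset has {info.get('fps')}, expected {expected_fps}")
    print("features:", sorted(features))
    return missing

# e.g. check_dataset_keys("/data/input", ["observation.state", "action"], expected_fps=30)
```

Run it before launching training; an empty return value means every expected key is present.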
OOM
Lower --batch_size, reduce DataLoader workers, and shorten --steps until a smoke run passes. Do not launch week-long jobs before a smoke test.
Mismatch vs original Diffusion Policy
LeRobot’s preprocessing, dataloading, and environments differ from the Columbia repo. For paper-level reproduction, use columbia-ai-robotics/diffusion_policy.