Diffusion Policy training (LeRobot)

Diffusion Policy was introduced by Chi et al. at Columbia University; see diffusion-policy.cs.columbia.edu and the reference code columbia-ai-robotics/diffusion_policy. LeRobot ships a PyTorch implementation (the LeRobot diffusion policy) and publishes a PushT baseline, lerobot/diffusion_pusht.

This page covers only the LeRobot path. For strict paper benchmark reproduction, use the original Diffusion Policy repo and its data pipeline.

When to use

Diffusion Policy models actions as a conditional denoising process. It fits continuous, smooth control with multiple feasible motion modes. It is heavier and slower than ACT, but a good choice when you need smooth multi-step trajectories.

Prefer it when:

  • Trajectories are continuous and smoothness matters.
  • Multiple valid action paths exist.
  • You have more data than a minimal ACT baseline.
  • You can afford longer training and slower inference.

If you only need to validate the data pipeline, establish a baseline with ACT or SmolVLA first.
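
Conceptually, the conditional denoising described above runs a DDPM-style reverse loop: start from Gaussian noise and repeatedly subtract the model's noise estimate. The sketch below is illustrative only; `predict_noise` is an invented stand-in, and LeRobot's actual DiffusionPolicy uses a learned noise-prediction network, its own noise scheduler, and observation conditioning:

```python
import numpy as np

# Minimal DDPM-style action sampler (illustrative sketch, not LeRobot code).
def sample_action(predict_noise, obs, action_dim=2, num_steps=50, seed=0):
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, num_steps)   # variance schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    a = rng.standard_normal(action_dim)          # start from pure noise
    for t in reversed(range(num_steps)):
        eps = predict_noise(a, obs, t)           # model's noise estimate
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        a = (a - coef * eps) / np.sqrt(alphas[t])  # one denoising step
        if t > 0:                                # re-inject noise except at t=0
            a += np.sqrt(betas[t]) * rng.standard_normal(action_dim)
    return a

# Dummy "network" that predicts the current sample as noise.
action = sample_action(lambda a, obs, t: a, obs=None)
```

In the real policy, each sampling pass conditions on an observation history and emits a chunk of future actions rather than a single step.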

Official references (LeRobot v0.5.0)

Public examples include:

  • examples/training/train_policy.py — trains DiffusionPolicy with DiffusionConfig on PushT.
  • The lerobot/diffusion_pusht model card — documents lerobot-train with --policy.type=diffusion, --dataset.repo_id=lerobot/pusht, --batch_size=64, --steps=200000.

Training should use the installed CLI:

lerobot-train \
--output_dir=outputs/train/diffusion_pusht \
--policy.type=diffusion \
--dataset.repo_id=lerobot/pusht \
--seed=100000 \
--env.type=pusht \
--batch_size=64 \
--steps=200000 \
--eval_freq=25000 \
--save_freq=25000 \
--wandb.enable=true

For your own robot data, omit --env.type initially; run offline training and checkpoint loading checks first.

LeRobot GPU image

IO-AI publishes ioaitech/lerobot-gpu:v0.5.0, which bundles LeRobot v0.5.0, GPU training dependencies, video decoding, and lerobot-train for Diffusion Policy training.

Smoke test

docker run --rm --gpus all --shm-size 16g \
-v /path/to/lerobot_dataset:/data/input \
-v /path/to/output:/outputs \
ioaitech/lerobot-gpu:v0.5.0 \
bash -lc 'lerobot-train \
--policy.type=diffusion \
--dataset.repo_id=local/my_dataset \
--dataset.root=/data/input \
--batch_size=8 \
--steps=1000 \
--output_dir=/outputs/diffusion_smoke \
--job_name=diffusion_smoke \
--policy.device=cuda \
--wandb.enable=false'

Training template

docker run --rm --gpus all --shm-size 16g \
-v /path/to/lerobot_dataset:/data/input \
-v /path/to/output:/outputs \
ioaitech/lerobot-gpu:v0.5.0 \
bash -lc 'lerobot-train \
--policy.type=diffusion \
--dataset.repo_id=local/my_dataset \
--dataset.root=/data/input \
--batch_size=64 \
--steps=100000 \
--output_dir=/outputs/diffusion_policy \
--job_name=diffusion_policy \
--policy.device=cuda \
--save_checkpoint=true \
--save_freq=10000 \
--wandb.enable=false'

If a flag is rejected, inspect the CLI for your exact image/git commit:

lerobot-train --help

Reproduce from upstream LeRobot

Read Install LeRobot, then pin v0.5.0:

git clone https://github.com/huggingface/lerobot.git
cd lerobot
git checkout v0.5.0

python -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
pip install -e ".[all]"

Local dataset:

lerobot-train \
--policy.type=diffusion \
--dataset.repo_id=local/my_dataset \
--dataset.root=/path/to/lerobot_dataset \
--batch_size=64 \
--steps=100000 \
--output_dir=outputs/train/diffusion_policy \
--job_name=diffusion_policy \
--policy.device=cuda \
--wandb.enable=false

PushT reproduction:

lerobot-train \
--policy.type=diffusion \
--dataset.repo_id=lerobot/pusht \
--env.type=pusht \
--batch_size=64 \
--steps=200000 \
--output_dir=outputs/train/diffusion_pusht \
--job_name=diffusion_pusht \
--policy.device=cuda

Parameter guidance

  • --batch_size: Start at 8–16 for smoke tests; scale up with available VRAM.
  • --steps: Diffusion policies often need longer runs than ACT; ~100k is a reasonable first baseline.
  • --save_freq: Save multiple checkpoints so you can compare motion quality across training.
  • --policy.device: Use cuda for real training.
  • --wandb.enable: Recommended for long jobs to monitor loss and catch interruptions.

Horizon, n_action_steps, noise schedules, and vision backbones change between releases—trust lerobot-train --help and the policy config classes for your pinned version.

Evaluation guidance

Offline loss is a weak proxy for robot success. At minimum compare:

  • Trajectory smoothness (jitter, pauses).
  • Repeatability under identical initial conditions.
  • Robustness to object pose / occlusion.
  • Whether inference latency meets your control rate.

Real-time control trades off the number of sampling steps against action quality; validate that tradeoff in sim or on hardware.
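
The smoothness and latency checks above can be scripted. This is an illustrative sketch; `jitter` and `meets_control_rate` are invented helper names, and the acceptable thresholds are task-specific:

```python
import time
import numpy as np

# Assumes actions is a (T, D) trajectory sampled at a fixed period dt.
def jitter(actions, dt):
    """Mean squared jerk (third finite difference); lower is smoother."""
    jerk = np.diff(actions, n=3, axis=0) / dt**3
    return float(np.mean(jerk**2))

def meets_control_rate(policy_step, obs, rate_hz, trials=20):
    """Time repeated policy steps; mean latency must fit the control period."""
    t0 = time.perf_counter()
    for _ in range(trials):
        policy_step(obs)
    mean_latency = (time.perf_counter() - t0) / trials
    return mean_latency <= 1.0 / rate_hz

# A straight-line trajectory has ~zero jerk; a noisy one does not.
t = np.linspace(0, 1, 100)[:, None]
smooth = np.hstack([t, 2 * t])                   # (100, 2) linear trajectory
noisy = smooth + np.random.default_rng(0).normal(0, 0.05, smooth.shape)
```

Comparing checkpoints with a fixed metric like this makes "motion quality over training" concrete instead of eyeballed.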

Troubleshooting

Key mismatch

Inspect meta/info.json for the image/state/action keys. Diffusion Policy is sensitive to its observation and action time windows; wrong keys or an incorrect FPS silently corrupt supervision.
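
A quick way to audit those keys, assuming the LeRobot v2.x dataset layout where meta/info.json carries a top-level fps field and a features dict mapping each key to its dtype and shape (`summarize_info` is an illustrative helper, not a LeRobot API):

```python
import json

# Illustrative sketch: pull fps and feature keys from a parsed meta/info.json.
def summarize_info(info):
    """Return (fps, sorted feature keys) from a meta/info.json dict."""
    return info.get("fps"), sorted(info.get("features", {}))

# Typical usage: info = json.loads(open("meta/info.json").read())
example = {
    "fps": 30,
    "features": {
        "observation.images.top": {"dtype": "video", "shape": [480, 640, 3]},
        "observation.state": {"dtype": "float32", "shape": [6]},
        "action": {"dtype": "float32", "shape": [6]},
    },
}
fps, keys = summarize_info(example)
```

Cross-check the returned keys and fps against what your training config expects before launching a long run.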

OOM

Lower --batch_size, reduce DataLoader workers, and shorten --steps until a smoke run passes. Do not launch week-long jobs before a smoke test.

Mismatch vs original Diffusion Policy

LeRobot’s preprocessing, dataloading, and environments differ from the Columbia repo. For paper-level reproduction, use columbia-ai-robotics/diffusion_policy.
