Diffusion Policy training (LeRobot)

Diffusion Policy was introduced by Chi et al. at Columbia University; see diffusion-policy.cs.columbia.edu and the reference code columbia-ai-robotics/diffusion_policy. LeRobot ships a PyTorch implementation (the LeRobot diffusion policy) and publishes a PushT baseline, lerobot/diffusion_pusht.

This page covers only the LeRobot path. For strict paper benchmark reproduction, use the original Diffusion Policy repo and its data pipeline.

When to use

Diffusion Policy models actions as a conditional denoising process. It fits continuous, smooth control with multiple feasible motion modes. It is heavier and slower than ACT, but a good choice when you need smooth multi-step trajectories.

Prefer it when:

  • Trajectories are continuous and smoothness matters.
  • Multiple valid action paths exist.
  • You have more data than a minimal ACT baseline.
  • You can afford longer training and slower inference.

If you only need to validate the data pipeline, establish a baseline with ACT or SmolVLA first.
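
Conceptually, the conditional denoising described above runs a DDPM-style reverse loop: start from Gaussian noise and repeatedly subtract the model's noise estimate. The sketch below is illustrative only; `predict_noise` is an invented stand-in, and LeRobot's actual DiffusionPolicy uses a learned noise-prediction network, its own noise scheduler, and observation conditioning:

```python
import numpy as np

# Minimal DDPM-style action sampler (illustrative sketch, not LeRobot code).
def sample_action(predict_noise, obs, action_dim=2, num_steps=50, seed=0):
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, num_steps)   # variance schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    a = rng.standard_normal(action_dim)          # start from pure noise
    for t in reversed(range(num_steps)):
        eps = predict_noise(a, obs, t)           # model's noise estimate
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        a = (a - coef * eps) / np.sqrt(alphas[t])  # one denoising step
        if t > 0:                                # re-inject noise except at t=0
            a += np.sqrt(betas[t]) * rng.standard_normal(action_dim)
    return a

# Dummy "network" that predicts the current sample as noise.
action = sample_action(lambda a, obs, t: a, obs=None)
```

In the real policy, each sampling pass conditions on an observation history and emits a chunk of future actions rather than a single step.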

Official references (LeRobot v0.5.0)

Public examples include:

  • examples/training/train_policy.py — trains DiffusionPolicy with DiffusionConfig on PushT.
  • The lerobot/diffusion_pusht model card — documents lerobot-train with --policy.type=diffusion, --dataset.repo_id=lerobot/pusht, --batch_size=64, --steps=200000.

Training should use the installed CLI:

lerobot-train \
--output_dir=outputs/train/diffusion_pusht \
--policy.type=diffusion \
--dataset.repo_id=lerobot/pusht \
--seed=100000 \
--env.type=pusht \
--batch_size=64 \
--steps=200000 \
--eval_freq=25000 \
--save_freq=25000 \
--wandb.enable=true

For your own robot data, omit --env.type initially; run offline training and checkpoint loading checks first.

LeRobot GPU image

IO-AI publishes ioaitech/lerobot-gpu:v0.5.0, which bundles LeRobot v0.5.0, GPU training dependencies, video decoding, and lerobot-train for Diffusion Policy training.

Smoke test

docker run --rm --gpus all --shm-size 16g \
-v /path/to/lerobot_dataset:/data/input \
-v /path/to/output:/outputs \
ioaitech/lerobot-gpu:v0.5.0 \
bash -lc 'lerobot-train \
--policy.type=diffusion \
--dataset.repo_id=local/my_dataset \
--dataset.root=/data/input \
--batch_size=8 \
--steps=1000 \
--output_dir=/outputs/diffusion_smoke \
--job_name=diffusion_smoke \
--policy.device=cuda \
--wandb.enable=false'

Training template

docker run --rm --gpus all --shm-size 16g \
-v /path/to/lerobot_dataset:/data/input \
-v /path/to/output:/outputs \
ioaitech/lerobot-gpu:v0.5.0 \
bash -lc 'lerobot-train \
--policy.type=diffusion \
--dataset.repo_id=local/my_dataset \
--dataset.root=/data/input \
--batch_size=64 \
--steps=100000 \
--output_dir=/outputs/diffusion_policy \
--job_name=diffusion_policy \
--policy.device=cuda \
--save_checkpoint=true \
--save_freq=10000 \
--wandb.enable=false'

If a flag is rejected, inspect the CLI for your exact image/git commit:

lerobot-train --help

Reproduce from upstream LeRobot

Read Install LeRobot, then pin v0.5.0:

git clone https://github.com/huggingface/lerobot.git
cd lerobot
git checkout v0.5.0

python -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
pip install -e ".[all]"

Local dataset:

lerobot-train \
--policy.type=diffusion \
--dataset.repo_id=local/my_dataset \
--dataset.root=/path/to/lerobot_dataset \
--batch_size=64 \
--steps=100000 \
--output_dir=outputs/train/diffusion_policy \
--job_name=diffusion_policy \
--policy.device=cuda \
--wandb.enable=false

PushT reproduction:

lerobot-train \
--policy.type=diffusion \
--dataset.repo_id=lerobot/pusht \
--env.type=pusht \
--batch_size=64 \
--steps=200000 \
--output_dir=outputs/train/diffusion_pusht \
--job_name=diffusion_pusht \
--policy.device=cuda

Parameter guidance

  • --batch_size: Start at 8–16 for smoke tests; scale up with available VRAM.
  • --steps: Diffusion policies often need longer runs than ACT; ~100k is a reasonable first baseline.
  • --save_freq: Save multiple checkpoints so you can compare motion quality across training.
  • --policy.device: Use cuda for real training.
  • --wandb.enable: Recommended for long jobs to monitor loss and catch interruptions.

Horizon, n_action_steps, noise schedules, and vision backbones change between releases—trust lerobot-train --help and the policy config classes for your pinned version.

Evaluation guidance

Offline loss is a weak proxy for robot success. At minimum compare:

  • Trajectory smoothness (jitter, pauses).
  • Repeatability under identical initial conditions.
  • Robustness to object pose / occlusion.
  • Whether inference latency meets your control rate.

Real-time control trades off the number of sampling steps against action quality; validate that tradeoff in sim or on hardware.
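
The smoothness and latency checks above can be scripted. This is an illustrative sketch; `jitter` and `meets_control_rate` are invented helper names, and the acceptable thresholds are task-specific:

```python
import time
import numpy as np

# Assumes actions is a (T, D) trajectory sampled at a fixed period dt.
def jitter(actions, dt):
    """Mean squared jerk (third finite difference); lower is smoother."""
    jerk = np.diff(actions, n=3, axis=0) / dt**3
    return float(np.mean(jerk**2))

def meets_control_rate(policy_step, obs, rate_hz, trials=20):
    """Time repeated policy steps; mean latency must fit the control period."""
    t0 = time.perf_counter()
    for _ in range(trials):
        policy_step(obs)
    mean_latency = (time.perf_counter() - t0) / trials
    return mean_latency <= 1.0 / rate_hz

# A straight-line trajectory has ~zero jerk; a noisy one does not.
t = np.linspace(0, 1, 100)[:, None]
smooth = np.hstack([t, 2 * t])                   # (100, 2) linear trajectory
noisy = smooth + np.random.default_rng(0).normal(0, 0.05, smooth.shape)
```

Comparing checkpoints with a fixed metric like this makes "motion quality over training" concrete instead of eyeballed.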

Troubleshooting

Key mismatch

Inspect meta/info.json for the image/state/action keys. Diffusion Policy is sensitive to its observation and action time windows; wrong keys or an incorrect FPS silently corrupt supervision.
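
A quick way to audit those keys, assuming the LeRobot v2.x dataset layout where meta/info.json carries a top-level fps field and a features dict mapping each key to its dtype and shape (`summarize_info` is an illustrative helper, not a LeRobot API):

```python
import json

# Illustrative sketch: pull fps and feature keys from a parsed meta/info.json.
def summarize_info(info):
    """Return (fps, sorted feature keys) from a meta/info.json dict."""
    return info.get("fps"), sorted(info.get("features", {}))

# Typical usage: info = json.loads(open("meta/info.json").read())
example = {
    "fps": 30,
    "features": {
        "observation.images.top": {"dtype": "video", "shape": [480, 640, 3]},
        "observation.state": {"dtype": "float32", "shape": [6]},
        "action": {"dtype": "float32", "shape": [6]},
    },
}
fps, keys = summarize_info(example)
```

Cross-check the returned keys and fps against what your training config expects before launching a long run.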

OOM

Lower --batch_size, reduce DataLoader workers, and shorten --steps until a smoke run passes. Do not launch week-long jobs before a smoke test.

Mismatch vs original Diffusion Policy

LeRobot’s preprocessing, dataloading, and environments differ from the Columbia repo. For paper-level reproduction, use columbia-ai-robotics/diffusion_policy.
