Pi0 and Pi0.5 model fine-tuning guide
This guide explains how to fine-tune Pi0 and Pi0.5 with Docker images published by IO-AI.TECH, on top of base checkpoints such as pi0_base / pi05_base. Commands, mount conventions, and argument names match the OpenPI wrapper train_lerobot.py inside the image.
If you want the official OpenPI fine-tuning workflow without hand-building the environment, this is the path that matches day-to-day deployment practice.
Why this path
Pi0 / Pi0.5 are fine-tuned with the OpenPI stack (JAX under the hood) from public base weights. Once you have a LeRobot dataset, the most direct approach is to use:
- ioaitech/train_openpi:pi0
- ioaitech/train_openpi:pi05
Both images ship the dependencies needed for fine-tuning and use:
- Dataset mounted at /data/input
- Outputs (checkpoints, etc.) at /data/output
Images are published on Docker Hub: ioaitech/train_openpi:pi0 and ioaitech/train_openpi:pi05.
One-command fine-tuning
Prerequisites
- Linux host
- Working NVIDIA driver
- Docker with GPU support (docker run --gpus all works)
GPU sanity check:
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
Pi0
docker run --rm --gpus all \
-v /path/to/lerobot_dataset:/data/input \
-v /path/to/output:/data/output \
ioaitech/train_openpi:pi0 \
--batch_size 8 \
--steps 20000 \
--save_interval 1000 \
--learning_rate 2.5e-5 \
--action_horizon 50 \
--prompt "pick up the object"
Pi0.5
docker run --rm --gpus all \
-v /path/to/lerobot_dataset:/data/input \
-v /path/to/output:/data/output \
ioaitech/train_openpi:pi05 \
--batch_size 8 \
--steps 20000 \
--save_interval 1000 \
--learning_rate 2.5e-5 \
--action_horizon 50 \
--prompt "pick up the object"
The main difference is the image tag. The container selects the base weights and the matching Pi0Config according to the MODEL_TYPE baked into the image at build time.
Minimal smoke run
docker run --rm --gpus all \
-v /path/to/lerobot_dataset:/data/input \
-v /path/to/output:/data/output \
ioaitech/train_openpi:pi0 \
--batch_size 1 \
--steps 1000
After this succeeds, increase batch_size, steps, and other fine-tuning hyperparameters.
Data requirements
The entrypoint requires /data/input/meta/info.json. Your dataset root should include:
your_dataset/
├── meta/
│ └── info.json
├── data/
└── videos/
The fine-tuning wrapper automatically:
- Detects the LeRobot dataset version
- Converts v3 layouts to a v2.1-compatible tree when needed for the pipeline
- Ensures episodes_stats.jsonl exists for v2.1 when missing
- Computes normalization statistics
- Symlinks the mounted tree into the LeRobot cache for OpenPI
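A quick preflight check on the host can catch layout mistakes before you launch a container. The script below is an illustration written for this guide, not part of the image; it only verifies the tree shown above:

```python
from pathlib import Path

def check_lerobot_layout(root: str) -> list[str]:
    """Return a list of problems with a LeRobot dataset root (empty = OK)."""
    root_path = Path(root)
    problems = []
    # The container entrypoint requires meta/info.json under /data/input.
    if not (root_path / "meta" / "info.json").is_file():
        problems.append("missing meta/info.json (the entrypoint requires it)")
    for sub in ("data", "videos"):
        if not (root_path / sub).is_dir():
            problems.append(f"missing {sub}/ directory")
    return problems

# A nonexistent or empty path fails all three checks.
print(check_lerobot_layout("/path/to/lerobot_dataset"))
```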
Pi0 vs Pi0.5
They share one fine-tuning wrapper but differ in model config and weights:
- Pi0 loads pi0_base checkpoints
- Pi0.5 loads pi05_base checkpoints
- Pi0.5 uses Pi0Config(pi05=True) (its state/token layout differs from Pi0)
- Pi0.5 uses a larger default max_token_len for richer conditioning
Start with Pi0 to validate the pipeline; switch to ioaitech/train_openpi:pi05 when you intentionally want the Pi0.5 base and behavior.
Common arguments
The arguments below match train_lerobot.py in the image:
| Argument | Default | Description |
|---|---|---|
| --batch_size | 1 | Batch per optimization step; raised if needed to be divisible by the JAX device count |
| --steps | 1000 | Number of optimization steps |
| --gpus | all | All GPUs, or e.g. 0,1 |
| --prompt | empty | Default language prompt when the dataset has no task text |
| --save_interval | 500 | Checkpoint interval |
| --learning_rate | empty | Omit to use a peak LR of 2.5e-5 |
| --fsdp_devices | auto | FSDP device count; derived from the GPU count |
| --lora | auto | LoRA on by default for single GPU, off for multi-GPU |
| --ema_decay | empty | EMA decay; disabled under LoRA by default to save VRAM |
| --action_horizon | 50 | Action sequence length |
Fine-tuning behavior
The wrapper picks strategies automatically:
- Single GPU: --lora auto tends to enable LoRA to save memory
- Multi GPU: --fsdp_devices auto enables FSDP-style sharding
- If batch_size is smaller than the device count or not divisible by it, it is bumped to a valid value
The one-liners above therefore prioritize getting a working run before you tune every JAX detail.
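The batch-size adjustment can be pictured as rounding up to the nearest multiple of the device count, so every device receives an equal, non-empty shard. This is a sketch of the idea, not the wrapper's exact code:

```python
import math

def adjust_batch_size(batch_size: int, device_count: int) -> int:
    """Round batch_size up to the nearest positive multiple of device_count."""
    per_device = max(1, math.ceil(batch_size / device_count))
    return per_device * device_count

print(adjust_batch_size(8, 1))  # 8 (already valid)
print(adjust_batch_size(6, 4))  # 8 (bumped to be divisible by 4 devices)
print(adjust_batch_size(1, 4))  # 4 (bumped to at least one per device)
```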
About --prompt
If episodes include task strings, those take precedence during fine-tuning. --prompt is only a fallback when the dataset has no task field. Treat it as optional unless you know your export lacks language metadata.
Outputs
Checkpoints go under the mounted /data/output. The embedded TrainConfig fixes:
name = docker_train
exp_name = train
So the default checkpoint directory is:
/path/to/output/docker_train/train/
Normalization stats are written under the asset directories configured during fine-tuning for reuse in later runs or inference.
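Per-dimension statistics of the kind used for normalization can be sketched as follows. This illustrates the general technique (standardizing actions by mean and standard deviation), not OpenPI's exact implementation or file format:

```python
import numpy as np

def compute_norm_stats(actions: np.ndarray) -> dict[str, np.ndarray]:
    """Per-dimension mean/std over a (num_steps, action_dim) array."""
    return {
        "mean": actions.mean(axis=0),
        "std": actions.std(axis=0) + 1e-6,  # epsilon avoids division by zero
    }

actions = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
stats = compute_norm_stats(actions)
# Standardized actions: zero mean, roughly unit variance per dimension.
normalized = (actions - stats["mean"]) / stats["std"]
```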
Suggested workflows
1. Short validation on a new dataset
docker run --rm --gpus all \
-v /path/to/lerobot_dataset:/data/input \
-v /path/to/output:/data/output \
ioaitech/train_openpi:pi0 \
--batch_size 1 \
--steps 1000 \
--save_interval 200
Confirm logs and that checkpoints appear before scaling up fine-tuning.
2. Conservative single-GPU run
Keep the default auto policy; do not force LoRA off at first:
docker run --rm --gpus all \
-v /path/to/lerobot_dataset:/data/input \
-v /path/to/output:/data/output \
ioaitech/train_openpi:pi0 \
--batch_size 4 \
--steps 20000 \
--save_interval 1000 \
--action_horizon 50
3. Multi-GPU
docker run --rm --gpus all \
-v /path/to/lerobot_dataset:/data/input \
-v /path/to/output:/data/output \
ioaitech/train_openpi:pi05 \
--gpus 0,1,2,3 \
--batch_size 16 \
--steps 30000 \
--fsdp_devices 4 \
--save_interval 1000
FAQ
1. “No LeRobot dataset” at startup
Verify the host mount to /data/input and the presence of /data/input/meta/info.json.
2. Different behavior on one vs many GPUs
By design: LoRA, FSDP, and batch-size adjustment depend on device count.
3. Why is the output folder fixed?
The wrapper currently hard-codes docker_train/train under /data/output. Finer experiment naming may be added elsewhere; the docs reflect the image behavior as shipped.
4. Do I run compute_norm_stats manually?
No. The wrapper computes and saves normalization statistics before fine-tuning starts.
5. Pi0 or Pi0.5?
Use Pi0 to stabilize the pipeline first. Use Pi0.5 when you explicitly want that base and configuration.
Practical tips
- Validate dataset layout before long fine-tuning jobs
- Short runs before long runs
- On single GPU, accept defaults before overriding LoRA/FSDP
- Keep separate output roots when comparing Pi0 vs Pi0.5