Pi0 and Pi0.5 model fine-tuning guide

This guide explains how to fine-tune Pi0 and Pi0.5 with Docker images published by IO-AI.TECH, on top of base checkpoints such as pi0_base / pi05_base. Commands, mount conventions, and argument names match the OpenPI wrapper train_lerobot.py inside the image.

If you want the official OpenPI fine-tuning workflow without hand-building the environment, this is the path that matches day-to-day deployment practice.

Why this path

Pi0 / Pi0.5 are fine-tuned with the OpenPI stack (JAX under the hood) from public base weights. Once you have a LeRobot dataset, the most direct approach is to use:

  • ioaitech/train_openpi:pi0
  • ioaitech/train_openpi:pi05

Both images ship the dependencies needed for fine-tuning and use:

  • Dataset mounted at /data/input
  • Outputs (checkpoints, etc.) at /data/output

Docker image registry

Images are published on Docker Hub: ioaitech/train_openpi:pi0 and ioaitech/train_openpi:pi05.

One-command fine-tuning

Prerequisites

  • Linux host
  • Working NVIDIA driver
  • Docker
  • docker run --gpus all works

GPU sanity check:

docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi

Pi0

docker run --rm --gpus all \
-v /path/to/lerobot_dataset:/data/input \
-v /path/to/output:/data/output \
ioaitech/train_openpi:pi0 \
--batch_size 8 \
--steps 20000 \
--save_interval 1000 \
--learning_rate 2.5e-5 \
--action_horizon 50 \
--prompt "pick up the object"

Pi0.5

docker run --rm --gpus all \
-v /path/to/lerobot_dataset:/data/input \
-v /path/to/output:/data/output \
ioaitech/train_openpi:pi05 \
--batch_size 8 \
--steps 20000 \
--save_interval 1000 \
--learning_rate 2.5e-5 \
--action_horizon 50 \
--prompt "pick up the object"

The main difference is the image tag. The container selects base weights and Pi0Config according to the MODEL_TYPE baked into the image at build time.

Minimal smoke run

docker run --rm --gpus all \
-v /path/to/lerobot_dataset:/data/input \
-v /path/to/output:/data/output \
ioaitech/train_openpi:pi0 \
--batch_size 1 \
--steps 1000

Once this succeeds, scale up batch_size and steps, and tune the remaining fine-tuning hyperparameters.

Data requirements

The entrypoint requires /data/input/meta/info.json. Your dataset root should include:

your_dataset/
├── meta/
│ └── info.json
├── data/
└── videos/
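Before launching a long run, you can sanity-check this layout from the host. The helper below is a hypothetical convenience, not part of the image; it only checks for the paths this guide says the entrypoint expects:

```python
from pathlib import Path

def check_lerobot_layout(root: str) -> list[str]:
    """Return a list of problems with a dataset root; empty means it looks OK."""
    root_path = Path(root)
    problems = []
    # The entrypoint hard-requires meta/info.json under the mount.
    if not (root_path / "meta" / "info.json").is_file():
        problems.append("missing meta/info.json (required by the entrypoint)")
    for sub in ("data", "videos"):
        if not (root_path / sub).is_dir():
            problems.append(f"missing {sub}/ directory")
    return problems
```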

The fine-tuning wrapper automatically:

  • Detects the LeRobot dataset version
  • Converts v3 layouts to a v2.1-compatible tree when needed for the pipeline
  • Ensures episodes_stats.jsonl for v2.1 when missing
  • Computes normalization statistics
  • Symlinks the mounted tree into the LeRobot cache for OpenPI
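Version detection relies on dataset metadata: a LeRobot meta/info.json carries a codebase_version field. A minimal detector might look like this (a sketch, not the wrapper's actual code):

```python
import json
from pathlib import Path

def detect_lerobot_version(root: str) -> str:
    """Read the codebase_version field from the dataset's meta/info.json."""
    info = json.loads((Path(root) / "meta" / "info.json").read_text())
    return info.get("codebase_version", "unknown")
```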

Pi0 vs Pi0.5

They share one fine-tuning wrapper but differ in model config and weights:

  • Pi0 loads pi0_base checkpoints
  • Pi0.5 loads pi05_base checkpoints
  • Pi0.5 uses Pi0Config(pi05=True) (state/token layout differs from Pi0)
  • Pi0.5 uses a larger default max_token_len for richer conditioning

Start with Pi0 to validate the pipeline; switch to ioaitech/train_openpi:pi05 when you intentionally want the Pi0.5 base and behavior.

Common arguments

The arguments below match train_lerobot.py in the image:

| Argument | Default | Description |
| --- | --- | --- |
| --batch_size | 1 | Batch size per optimization step; raised if needed to divide the JAX device count |
| --steps | 1000 | Number of optimization steps |
| --gpus | all | All GPUs, or a list such as 0,1 |
| --prompt | (empty) | Default language prompt when the dataset has no task text |
| --save_interval | 500 | Checkpoint save interval, in steps |
| --learning_rate | (empty) | Omit to use the default peak learning rate of 2.5e-5 |
| --fsdp_devices | auto | FSDP device count; inferred from the GPU count |
| --lora | auto | LoRA is on by default for a single GPU, off for multi-GPU |
| --ema_decay | (empty) | EMA decay; disabled under LoRA by default to save VRAM |
| --action_horizon | 50 | Action sequence length |

Fine-tuning behavior

The wrapper picks strategies automatically:

  • Single GPU: --lora auto tends to enable LoRA to save memory
  • Multi GPU: --fsdp_devices auto enables FSDP-style sharding
  • If batch_size is smaller than the device count or not divisible, it is bumped to a valid value

So the one-liners above emphasize “it runs” before you tune every JAX detail.
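The batch-size adjustment amounts to rounding up to a positive multiple of the device count. A minimal sketch (the actual wrapper's rounding policy may differ):

```python
def adjust_batch_size(batch_size: int, num_devices: int) -> int:
    """Round batch_size up to the nearest positive multiple of num_devices."""
    batch_size = max(batch_size, num_devices)
    remainder = batch_size % num_devices
    return batch_size if remainder == 0 else batch_size + (num_devices - remainder)
```

For example, --batch_size 6 on 4 GPUs would be bumped to 8 under this policy.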

About --prompt

If episodes include task strings, those take precedence during fine-tuning. --prompt is only a fallback when the dataset has no task field. Treat it as optional unless you know your export lacks language metadata.

Outputs

Checkpoints go under the mounted /data/output. The embedded TrainConfig fixes:

  • name = docker_train
  • exp_name = train

So the default checkpoint directory is:

/path/to/output/docker_train/train/

Normalization stats are written under the asset directories configured during fine-tuning for reuse in later runs or inference.

Suggested workflows

1. Short validation on a new dataset

docker run --rm --gpus all \
-v /path/to/lerobot_dataset:/data/input \
-v /path/to/output:/data/output \
ioaitech/train_openpi:pi0 \
--batch_size 1 \
--steps 1000 \
--save_interval 200

Confirm logs and that checkpoints appear before scaling up fine-tuning.

2. Conservative single-GPU run

Keep the default auto policy; do not force LoRA off at first:

docker run --rm --gpus all \
-v /path/to/lerobot_dataset:/data/input \
-v /path/to/output:/data/output \
ioaitech/train_openpi:pi0 \
--batch_size 4 \
--steps 20000 \
--save_interval 1000 \
--action_horizon 50

3. Multi-GPU

docker run --rm --gpus all \
-v /path/to/lerobot_dataset:/data/input \
-v /path/to/output:/data/output \
ioaitech/train_openpi:pi05 \
--gpus 0,1,2,3 \
--batch_size 16 \
--steps 30000 \
--fsdp_devices 4 \
--save_interval 1000

FAQ

1. “No LeRobot dataset” at startup

Verify the host mount to /data/input and the presence of /data/input/meta/info.json.

2. Different behavior on one vs many GPUs

By design: LoRA, FSDP, and batch-size adjustment depend on device count.

3. Why is the output folder fixed?

The wrapper currently hard-codes docker_train/train under /data/output. Finer experiment naming may be added elsewhere; the docs reflect the image behavior as shipped.

4. Do I run compute_norm_stats manually?

No. The wrapper computes and saves normalization statistics before fine-tuning starts.

5. Pi0 or Pi0.5?

Use Pi0 to stabilize the pipeline first. Use Pi0.5 when you explicitly want that base and configuration.

Practical tips

  • Validate dataset layout before long fine-tuning jobs
  • Short runs before long runs
  • On single GPU, accept defaults before overriding LoRA/FSDP
  • Keep separate output roots when comparing Pi0 vs Pi0.5