Fine-tuning Pi0 Model with Your Data

Pi0 is a powerful Vision-Language-Action (VLA) model that can quickly adapt to your robot tasks through fine-tuning with a small amount of data.

This guide uses the LeRobot framework to fine-tune the pre-trained Pi0 model (lerobot/pi0) on your own robot dataset.

Prerequisites

  • You have a dataset in LeRobot format (see the previous section on exporting LeRobot datasets)
  • You want to adapt the Pi0 model to your robot hardware, tasks, or control style
  • You are familiar with PyTorch and the Hugging Face ecosystem for training
  • You have basic deep learning training experience

What is Fine-tuning?

Fine-tuning refers to continuing to train a pre-trained model for a short period using your own data.

Pi0 is a model pre-trained on multiple general robot tasks, and it has already learned many general capabilities in visual understanding, instruction understanding, and action prediction.

Through fine-tuning, you can make Pi0:

  • Adapt to your robot environment: Your camera perspective, lighting conditions, and mechanical structure
  • Adapt to specific tasks: Optimize for the tasks you care about (such as picking up pens, classification, placement)
  • Improve control accuracy: Raise the model's control accuracy and success rate on your tasks

Simply put: Pre-trained models know a lot, but they don't necessarily understand you; after fine-tuning, they understand you.

Environment Setup

System Requirements

Please ensure your environment meets the following requirements:

  • Python ≥ 3.10 (3.10 or 3.11 recommended; current LeRobot releases do not support older versions)
  • GPU: At least 32 GB VRAM recommended (Pi0 is a large model; use a V100 or better)
  • Memory: At least 64 GB RAM recommended
  • Storage: Enough free disk space for your datasets and model checkpoints

Install Dependencies

# Clone LeRobot repository
git clone https://github.com/huggingface/lerobot.git
cd lerobot/

# Install LeRobot framework (includes Pi0 support)
pip install -e ".[pi0]"

# Verify installation
python -c "from lerobot.policies import Pi0Policy; print('Pi0 installation successful!')"

Prepare Your Dataset

Export LeRobot Format Data

You can export annotated data as LeRobot format datasets through the IO Data Platform export page. See Export LeRobot Dataset.

Important note: You don't need to upload your data to Hugging Face to train. Pass --dataset.root= to point training at a local dataset directory.

Dataset Structure Example

Assume your exported data is stored in ~/DualPiper_Pickup_Pen:

$ ls ~/DualPiper_Pickup_Pen
data meta videos

$ ls ~/DualPiper_Pickup_Pen/data
episode_000 episode_001 episode_002 ...

$ ls ~/DualPiper_Pickup_Pen/data/episode_000
observations actions.npy language_instruction.txt metadata.json
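
Before launching a long training run, it can help to confirm that the exported directory has the top-level layout shown above. The following is a minimal sketch and not part of LeRobot: the helper name missing_subdirs is hypothetical, and it only checks for the three folders that appear in the example listing (data, meta, videos).

```python
from pathlib import Path

# Top-level folders taken from the example listing above
EXPECTED_SUBDIRS = ("data", "meta", "videos")


def missing_subdirs(root, expected=EXPECTED_SUBDIRS):
    """Return the expected sub-folders that are absent under `root`.

    Hypothetical helper for illustration; it does not validate file contents.
    """
    root_path = Path(root).expanduser()
    return [name for name in expected if not (root_path / name).is_dir()]


if __name__ == "__main__":
    missing = missing_subdirs("~/DualPiper_Pickup_Pen")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("Dataset layout looks complete")
```

An empty result only means the folders exist; it says nothing about whether the episodes inside them are valid.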

Start Fine-tuning

Basic Training Command

Use the train.py script for training:

# Set CUDA memory allocation strategy (recommended)
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

# Start training ($HOME is used because most shells do not expand ~ after = in an argument)
python3 -m lerobot.scripts.train \
--dataset.root="$HOME/DualPiper_Pickup_Pen" \
--dataset.repo_id=io-ai-data/DualPiper_Pickup_Pen \
--policy.path=lerobot/pi0 \
--policy.repo_id=lerobot/pi0 \
--policy.device=cuda \
--output_dir=./checkpoints/pi0_finetuned \
--batch_size=1 \
--policy.attention_implementation=flash_attention_2 \
--training.learning_rate=1e-5 \
--training.num_epochs=10
Tip: Pi0 is a large model, so single-GPU training takes tens of hours. For example, fine-tuning on 2 hours of collected data took 50 hours on a single V100. Using 8 V100s shortens this, but it still takes nearly 10 hours.
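
The figures in the tip imply that multi-GPU scaling is sub-linear. As a worked check that uses only the numbers quoted there (50 hours on one V100, roughly 10 hours on eight), the sketch below computes the speedup and the parallel efficiency. The function name is hypothetical, and "nearly 10 hours" is rounded to 10.

```python
def scaling_summary(single_gpu_hours, multi_gpu_hours, num_gpus):
    """Return (speedup, parallel efficiency) for a multi-GPU run."""
    speedup = single_gpu_hours / multi_gpu_hours
    efficiency = speedup / num_gpus
    return speedup, efficiency


if __name__ == "__main__":
    # Numbers from the tip; "nearly 10 hours" is treated as 10
    speedup, efficiency = scaling_summary(50, 10, 8)
    print(f"Speedup: {speedup:.1f}x, efficiency: {efficiency:.1%}")
    # prints "Speedup: 5.0x, efficiency: 62.5%"
```

In other words, eight GPUs gave about a 5x speedup in that run, so budget for less than ideal scaling when estimating your own training time.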

Parameter Details

| Parameter | Meaning | Recommended Value | Description |
|---|---|---|---|
| --dataset.root | Local dataset path | ~/your_dataset | Points to your LeRobot dataset directory |
| --dataset.repo_id | Hugging Face dataset name | your-username/dataset | Used to identify the dataset |
| --policy.path | Pre-trained model path | lerobot/pi0 | Use official pre-trained model |
| --policy.repo_id | Model repository ID | lerobot/pi0 | Model repository on Hugging Face |
| --policy.device | Training device | cuda | Use GPU acceleration |
| --output_dir | Model save directory | ./checkpoints/pi0_finetuned | Path to save fine-tuned model |
| --batch_size | Batch size | 1 | Adjust based on VRAM size |
| --policy.attention_implementation | Attention mechanism implementation | flash_attention_2 | Use Flash Attention 2 for acceleration |
| --training.learning_rate | Learning rate | 1e-5 | Use smaller learning rate for fine-tuning |
| --training.num_epochs | Number of training epochs | 10 | Adjust based on data volume |
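
If you launch training from a Python script rather than a shell, the same flags can be assembled as an argument list. The sketch below is illustrative only: build_train_args is a hypothetical helper, it uses just a subset of the flags listed in the table, and it expands "~" in Python because a shell may leave a tilde untouched when it follows "=".

```python
import os


def build_train_args(dataset_root, dataset_repo_id, output_dir, batch_size=1):
    """Assemble the argument list for `python3 -m lerobot.scripts.train`.

    Hypothetical helper: it only uses flags that appear in the table above.
    """
    return [
        "python3", "-m", "lerobot.scripts.train",
        # Expand "~" here, since a shell may leave it untouched after "="
        f"--dataset.root={os.path.expanduser(dataset_root)}",
        f"--dataset.repo_id={dataset_repo_id}",
        "--policy.path=lerobot/pi0",
        "--policy.repo_id=lerobot/pi0",
        "--policy.device=cuda",
        f"--output_dir={output_dir}",
        f"--batch_size={batch_size}",
    ]


if __name__ == "__main__":
    args = build_train_args(
        "~/DualPiper_Pickup_Pen",
        "io-ai-data/DualPiper_Pickup_Pen",
        "./checkpoints/pi0_finetuned",
    )
    print(" ".join(args))
    # To launch training, pass `args` to subprocess.run(args, check=True)
```

Printing the command first makes it easy to confirm the resolved dataset path before committing to a long run.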

Multi-GPU Accelerated Training

If you have multiple graphics cards, you can train in parallel to significantly speed up the process.

# Use 8 GPUs for training
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
torchrun --nproc_per_node=8 --master_port=29500 \
-m lerobot.scripts.train \
--dataset.root=/data/nfs2/export/lerobot/DualPiper_Pickup_Pen/ \
--dataset.repo_id=io-ai-data/DualPiper_Pickup_Pen \
--policy.path=lerobot/pi0 \
--policy.repo_id=lerobot/pi0 \
--policy.device=cuda \
--output_dir=/home/puxk/DualPiper_Pickup_Pen_pi0_model \
--batch_size=1 \
--num_workers=4 \
--policy.attention_implementation=flash_attention_2

The total batch size is nproc_per_node × batch_size. With --batch_size=1, 8 cards equals a total batch size of 8. You can increase this if memory allows.
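
As a quick illustration of that rule, the following sketch computes the total batch size for a few GPU counts. The function name is hypothetical; the formula is the one stated above.

```python
def total_batch_size(nproc_per_node, batch_size):
    """Total batch size across all processes on one node."""
    return nproc_per_node * batch_size


if __name__ == "__main__":
    for gpus in (1, 2, 8):
        print(f"{gpus} GPU(s) x batch_size 1 = {total_batch_size(gpus, 1)}")
    # With more VRAM per card, a per-GPU batch of 2 on 8 cards gives 16
    print(f"8 GPU(s) x batch_size 2 = {total_batch_size(8, 2)}")
```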

For debugging, it's recommended to first test with --nproc_per_node=2 to see if it runs, then expand to all 8 cards.

Special Notes

  1. Network acceleration: If downloading the Pi0 model from Hugging Face is slow or blocked, use a mirror or proxy such as hf-mirror
  2. VRAM monitoring: Monitor GPU VRAM usage during training so you can catch out-of-memory conditions early
  3. Resume training: If training is interrupted, you can continue from the checkpoints saved in --output_dir
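
For the VRAM monitoring note, one option is to poll nvidia-smi from a second terminal while training runs. The sketch below is a minimal example under that assumption: it relies on the nvidia-smi command-line tool being installed, and the helper name parse_gpu_memory is hypothetical.

```python
import shutil
import subprocess

# Ask nvidia-smi for plain CSV: GPU index, used MiB, total MiB
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse nvidia-smi CSV output into (index, used_mib, total_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        rows.append((int(index), int(used), int(total)))
    return rows


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; is the NVIDIA driver installed?")
    else:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
        for index, used, total in parse_gpu_memory(result.stdout):
            print(f"GPU {index}: {used} / {total} MiB ({used / total:.0%})")
```

Running it every few minutes during the first epoch is usually enough to see whether memory use is stable or creeping toward the limit.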

Important Notes: Gemma Model Compatibility

The Pi0 model internally uses Google's PaliGemma model structure for language processing. Because this model was added to Hugging Face transformers relatively recently, older transformers versions may not support it:

Version Requirements

  • Transformers: Ensure you use the latest version of transformers (recommended ≥ 4.37.0)
  • Compatibility check: If you encounter embed_tokens related errors, please upgrade transformers

Solutions

# Upgrade to latest transformers
pip install -U transformers

# Or require a minimum version (the quotes stop the shell from treating > as a redirect)
pip install "transformers>=4.37.0"
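
To see whether an upgrade is needed before starting a run, you can compare the installed transformers version against the recommended minimum. This is a minimal sketch: the helper names are hypothetical, the 4.37.0 threshold is the recommendation from this section, and the parser deliberately ignores suffixes such as ".dev0" or "rc1".

```python
from importlib import metadata


def version_tuple(version):
    """Turn '4.37.0' into (4, 37, 0); non-numeric suffixes are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum="4.37.0"):
    """Return True if `installed` is at least `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        print("transformers is not installed")
    else:
        status = "OK" if meets_minimum(installed) else "too old, please upgrade"
        print(f"transformers {installed}: {status}")
```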

View Training Results

Model Save Location

After training is complete, the model will be saved in:

./checkpoints/pi0_finetuned/
├── config.json
├── pytorch_model.bin
├── tokenizer.json
└── training_args.bin

Load Fine-tuned Model

from lerobot.policies import Pi0Policy

# Load fine-tuned model
policy = Pi0Policy.from_pretrained("./checkpoints/pi0_finetuned")

# Perform inference; batch is the observation dictionary you prepare for the current step
action = policy.select_action(batch)

Training Log Analysis

During training, detailed logs will be generated, including:

  • Loss curves: Changes in training loss and validation loss
  • Learning rate scheduling: Changes in learning rate
  • Gradient information: Gradient norms and gradient clipping
  • Memory usage: GPU VRAM and system memory usage

How to Evaluate Results?

IO Data Platform Evaluation

The IO Data Platform supports evaluating model inference quality. You can compare any recorded data with the model's inference results and visualize the comparison for each joint command.

Other Evaluation Methods

You can also evaluate Pi0's performance on your tasks through the following methods:

1. Quantitative Evaluation

  • Action error: Compare predicted actions with real actions (such as angle error, position error)
  • Trajectory comparison: Visualize predicted vs ground truth behavior trajectories
  • Loss metrics: Record loss change trends on validation set
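
A single averaged error can hide one badly tracked joint, so it is often useful to break the action error down per joint. The sketch below is a dependency-free illustration: per_joint_mae is a hypothetical helper, and it assumes each action is a flat sequence with one value per joint.

```python
def per_joint_mae(predicted, actual):
    """Mean absolute error for each action dimension.

    `predicted` and `actual` are equal-length sequences of time steps, each a
    sequence holding one value per joint.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    num_joints = len(predicted[0])
    totals = [0.0] * num_joints
    for pred_step, true_step in zip(predicted, actual):
        for j in range(num_joints):
            totals[j] += abs(pred_step[j] - true_step[j])
    return [total / len(predicted) for total in totals]


if __name__ == "__main__":
    # Two time steps, two joints (made-up numbers for illustration)
    predicted = [[1.0, 2.0], [3.0, 4.0]]
    actual = [[0.0, 2.0], [1.0, 8.0]]
    for joint, error in enumerate(per_joint_mae(predicted, actual)):
        print(f"joint {joint}: mean absolute error {error:.3f}")
```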

2. Qualitative Evaluation

  • Real deployment: Verify task success rate on real robots
  • Behavior observation: Observe whether model-generated actions meet expectations
  • User feedback: Collect user feedback from actual usage

3. Evaluation Script Example

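import numpy as np
import torch
from lerobot.policies import Pi0Policy

# Load the fine-tuned model
policy = Pi0Policy.from_pretrained("./checkpoints/pi0_finetuned")

def to_numpy(x):
    """Convert a tensor or array-like value to a NumPy array on the CPU."""
    if isinstance(x, torch.Tensor):
        return x.detach().cpu().numpy()
    return np.asarray(x)

# Evaluation function
def evaluate_model(policy, test_dataset):
    """Return the mean absolute action error averaged over all batches."""
    total_error = 0.0
    num_samples = 0

    for batch in test_dataset:
        # Gradients are not needed for evaluation
        with torch.no_grad():
            predicted_action = to_numpy(policy.select_action(batch))
        true_action = to_numpy(batch['actions'])

        # Calculate action error (mean absolute difference)
        error = float(np.mean(np.abs(predicted_action - true_action)))
        total_error += error
        num_samples += 1

    if num_samples == 0:
        raise ValueError("test_dataset is empty")
    return total_error / num_samples

# Execute evaluation; test_dataset is an iterable of batches that you
# construct beforehand from held-out episodes
avg_error = evaluate_model(policy, test_dataset)
print(f"Average action error: {avg_error:.4f}")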

Frequently Asked Questions

Q: Model reports embed_tokens doesn't exist during fine-tuning?

A: The installed transformers version is too old. Upgrade it:

pip install -U transformers

Q: Dataset not uploaded to Hugging Face, can I use local data?

A: Yes, use the --dataset.root=your_local_path parameter.

Q: What to do if VRAM is insufficient during training?

A: Start by reducing the batch size:

# 1. Reduce batch size
--batch_size=1

Other memory-related parameters depend on your hardware and setup, so tune them case by case.

Q: How long does training typically take?

A: Training time depends on:

  • Data volume: More data means longer training; runs usually take 10+ hours
  • Hardware configuration: GPU performance and GPU count have a large impact
  • Training epochs: 5-20 epochs are recommended

Q: How to determine if training has converged?

A: Observe the following indicators:

  • Loss curves: Training loss and validation loss stabilize
  • Validation metrics: Performance on validation set no longer improves
  • Overfitting: Stop training when validation loss starts to rise
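
The stopping rule in the last two bullets can be written down as a simple patience check on the validation loss. This is a sketch of one common convention, not something LeRobot provides under this name: should_stop, patience, and min_delta are all hypothetical names chosen for illustration.

```python
def should_stop(val_losses, patience=3, min_delta=0.0):
    """Return True when the last `patience` evaluations brought no improvement.

    An improvement means the best recent loss is lower than the best earlier
    loss by more than `min_delta`.
    """
    if len(val_losses) <= patience:
        return False  # not enough history yet
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return recent_best >= best_before - min_delta


if __name__ == "__main__":
    # Made-up validation losses: improvement stalls after the third value
    history = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73]
    print("Stop training:", should_stop(history))
    # prints "Stop training: True"
```

A larger patience tolerates noisy validation curves at the cost of a few extra evaluations before stopping.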

Q: How to deploy the fine-tuned model?

A: The IO Data Platform already supports automatic deployment of Pi0, SmolVLA, and other common robotics models. Please consult our technical experts for details.

You can also refer to the following steps:

from lerobot.policies import Pi0Policy

# 1. Load model
policy = Pi0Policy.from_pretrained("./checkpoints/pi0_finetuned")

# 2. Prepare input data
# camera_images is a placeholder for the current frames from your robot's cameras
observation = {
    'images': camera_images,
    'language_instruction': "Pick up the red pen"
}

# 3. Generate action commands
action = policy.select_action(observation)

Good luck with your training!