
Fine-Tuning the Pi0 Model: Adapting to Robot Tasks with Custom Datasets

Pi0 is an advanced Vision-Language-Action (VLA) model that can quickly adapt to specific robot tasks through few-shot fine-tuning.

This document is based on the LeRobot framework and uses the pre-trained Pi0 model (lerobot/pi0) for fine-tuning on custom robot datasets.

Prerequisites

  • A dataset in LeRobot format (see the LeRobot dataset export guide in the previous section)
  • A need to adapt the Pi0 model to specific robot hardware, tasks, or control strategies
  • Familiarity with PyTorch and the Hugging Face training ecosystem
  • Basic experience training deep learning models

Fine-Tuning Overview

Fine-tuning is the process of further training a pre-trained model using domain-specific data, typically involving fewer iterations.

The Pi0 model has been pre-trained on diverse robot tasks and possesses general visual perception, language understanding, and action generation capabilities.

Through fine-tuning, Pi0 can achieve:

  • Environmental Adaptation: Adjust to specific camera views, lighting conditions, and mechanical structures
  • Task Specialization: Optimize performance for specific tasks (e.g., object grasping, classification, or placement)
  • Precision Improvement: Significantly enhance control accuracy and success rate in target tasks

In simple terms: a pre-trained model knows a lot, just not about your robot; fine-tuning teaches it your setup.

Environment Preparation

System Requirements

Ensure your environment meets the following requirements:

  • Python ≥ 3.8 (recommended 3.10 or higher)
  • GPU: At least 32GB VRAM (Pi0 is a large model; an NVIDIA V100 or higher-performance GPU is recommended)
  • Memory: At least 64GB system RAM
  • Storage: Sufficient disk space for datasets and model checkpoints
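
You can sanity-check the GPU before installing anything with a short PyTorch snippet (a minimal sketch; it only reports what torch can see):

# check_env.py - quick sanity check of Python version, GPU, and VRAM (sketch)
import sys

import torch

print(f"Python: {sys.version.split()[0]}")            # want >= 3.8, ideally 3.10+
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        vram_gb = props.total_memory / 1024**3
        print(f"GPU {i}: {props.name}, {vram_gb:.1f} GB VRAM")  # Pi0 wants >= 32 GB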

Install Dependencies

# Clone the LeRobot repository
git clone https://github.com/huggingface/lerobot.git
cd lerobot/

# Install the LeRobot framework (including Pi0 support)
pip install -e ".[pi0]"

# Verify the installation (the exact import path can vary across LeRobot versions)
python -c "from lerobot.policies import Pi0Policy; print('Pi0 installed successfully!')"

Prepare Your Dataset

Export LeRobot Format Data

You can export annotated data as a LeRobot-format dataset from the IO Data Platform export page; see Export LeRobot Dataset.

Important Note: You don't need to upload your data to Hugging Face for training; use the --dataset.root parameter to point fine-tuning at a local directory.

Dataset Structure Example

Assume your exported data is stored in ~/DualPiper_Pickup_Pen. A LeRobot v2.x export typically looks like the following (exact file and folder names vary with the dataset version):

$ ls ~/DualPiper_Pickup_Pen
data meta videos

$ ls ~/DualPiper_Pickup_Pen/meta
episodes.jsonl info.json stats.json tasks.jsonl

$ ls ~/DualPiper_Pickup_Pen/data/chunk-000
episode_000000.parquet episode_000001.parquet ...
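
Before launching a long run, it is worth loading the export locally to confirm LeRobot can read it. A minimal sketch (the import path below matches recent LeRobot releases; older versions expose LeRobotDataset elsewhere):

# inspect_dataset.py - load the exported dataset from disk and print basic stats (sketch)
from pathlib import Path

from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

# repo_id is only used as an identifier here; root points at the local export
dataset = LeRobotDataset(
    repo_id="io-ai-data/DualPiper_Pickup_Pen",
    root=Path("~/DualPiper_Pickup_Pen").expanduser(),
)

print(f"episodes: {dataset.num_episodes}")
print(f"frames:   {dataset.num_frames}")
print(f"features: {list(dataset.features)}")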

Start Fine-Tuning

Basic Training Command

Training is launched through LeRobot's train script (lerobot/scripts/train.py), invoked as a module:

# Set CUDA memory allocation policy (recommended)
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

# Start training
python3 -m lerobot.scripts.train \
--dataset.root=~/DualPiper_Pickup_Pen \
--dataset.repo_id=io-ai-data/DualPiper_Pickup_Pen \
--policy.path=lerobot/pi0 \
--policy.repo_id=lerobot/pi0 \
--policy.device=cuda \
--output_dir=./checkpoints/pi0_finetuned \
--batch_size=1 \
--policy.attention_implementation=flash_attention_2 \
--training.learning_rate=1e-5 \
--training.num_epochs=10
Tip: Pi0 has a large number of parameters, so single-GPU training can take tens of hours. For example, fine-tuning on a 2-hour dataset takes about 50 hours on a single NVIDIA V100, while 8 V100s reduce this to roughly 10 hours.

Parameter Details

| Parameter | Meaning | Recommended Value | Description |
| --- | --- | --- | --- |
| --dataset.root | Local dataset path | ~/your_dataset | Points to the LeRobot-format dataset directory |
| --dataset.repo_id | Hugging Face dataset ID | your-username/dataset | Used for metadata identification |
| --policy.path | Pre-trained model path | lerobot/pi0 | Specifies the official pre-trained model |
| --policy.repo_id | Model repository ID | lerobot/pi0 | Hugging Face model repository |
| --policy.device | Training device | cuda | Enables GPU acceleration |
| --output_dir | Output directory | ./checkpoints/pi0_finetuned | Save path for the fine-tuned model |
| --batch_size | Batch size | 1 | Adjust based on available VRAM |
| --policy.attention_implementation | Attention implementation | flash_attention_2 | Efficient attention kernel to accelerate training |
| --training.learning_rate | Learning rate | 1e-5 | Keep low during fine-tuning to avoid catastrophic forgetting |
| --training.num_epochs | Training epochs | 10 | Adjust based on dataset size |

Multi-GPU Accelerated Training

If you have multiple GPUs, parallel training can significantly speed up the process.

# Train with 8 GPUs
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
torchrun --nproc_per_node=8 --master_port=29500 \
-m lerobot.scripts.train \
--dataset.root=~/DualPiper_Pickup_Pen \
--dataset.repo_id=io-ai-data/DualPiper_Pickup_Pen \
--policy.path=lerobot/pi0 \
--policy.repo_id=lerobot/pi0 \
--policy.device=cuda \
--output_dir=./checkpoints/pi0_finetuned_multi \
--batch_size=1 \
--num_workers=4 \
--policy.attention_implementation=flash_attention_2

The total batch size is nproc_per_node × batch_size. With --batch_size=1, 8 GPUs give a total batch size of 8; increase it if VRAM allows.

For debugging, start with --nproc_per_node=2 to verify the setup, then scale to all 8 GPUs.

Special Notes

  1. Network Access: Downloading Hugging Face models may require a mirror (e.g., hf-mirror) or a proxy
  2. VRAM Monitoring: Watch GPU utilization and memory in real time during training
  3. Checkpoint Recovery: If training is interrupted, resume from the latest checkpoint in --output_dir (see the sketch below)
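
Checkpoint directory layouts differ across LeRobot versions, so the sketch below uses plain filesystem code to find the most recent checkpoint under the output directory; the checkpoints/* subpath is an assumption you may need to adapt:

# find_latest_checkpoint.py - locate the newest checkpoint in the output dir (sketch)
from pathlib import Path

output_dir = Path("./checkpoints/pi0_finetuned")

# Pick the most recently modified checkpoint subdirectory, if any exist
checkpoints = [p for p in output_dir.glob("checkpoints/*") if p.is_dir()]
if checkpoints:
    latest = max(checkpoints, key=lambda p: p.stat().st_mtime)
    print(f"Resume from: {latest}")
else:
    print("No checkpoints found yet")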

The Pi0 model builds on Google's PaliGemma as its vision-language backbone. PaliGemma is a relatively recent addition to Hugging Face Transformers, so version compatibility matters:

Version Compatibility

  • Transformers: Requires ≥ 4.37.0
  • Compatibility Check: If you encounter embed_tokens-related errors, upgrade transformers

Solutions

# Upgrade to the latest transformers
pip install -U transformers

# Or pin a minimum version (quote the spec so the shell doesn't treat '>' as redirection)
pip install "transformers>=4.37.0"
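
You can also check the installed version from Python before a long run (a minimal sketch; packaging ships as a transformers dependency):

# check_transformers.py - verify the transformers version meets the minimum (sketch)
import transformers
from packaging.version import Version

installed = Version(transformers.__version__)
print(f"transformers {installed} (need >= 4.37.0)")
assert installed >= Version("4.37.0"), "Upgrade: pip install -U transformers"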

Policy Selection (Pi0 vs SmolVLA)

  • For single-task, short-horizon, high-rate control with modest language needs: Pi0/ACT-style policies can be more compute-efficient and responsive.
  • For stronger multi-task and cross-scene generalization, or when targeting single-GPU/consumer hardware: prefer SmolVLA; fine-tuning smolvla_base typically yields stable gains.
  • Under resource constraints: enable mixed precision and gradient accumulation, and consider lowering input resolution and batch size (see the PyTorch sketch below).
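
The resource-saving tricks in the last bullet look like this in plain PyTorch. This is a generic sketch, not LeRobot's actual training loop; the model, data, and loss here are dummy stand-ins:

# amp_accum.py - mixed precision + gradient accumulation in plain PyTorch (sketch)
import torch
from torch import nn

# Dummy stand-ins so the sketch runs; replace with your policy and dataloader
model = nn.Linear(16, 1).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
loader = [torch.randn(4, 16) for _ in range(32)]  # fake mini-batches

scaler = torch.cuda.amp.GradScaler()
accum_steps = 8  # effective batch size = batch_size * accum_steps

for step, batch in enumerate(loader):
    batch = batch.cuda()
    with torch.autocast(device_type="cuda", dtype=torch.float16):  # mixed precision
        loss = model(batch).pow(2).mean() / accum_steps            # scale for accumulation
    scaler.scale(loss).backward()

    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)  # one optimizer step per accum_steps micro-batches
        scaler.update()
        optimizer.zero_grad(set_to_none=True)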

Training Results Inspection

Model Saving

After training, the model files are located at (exact file names vary with the LeRobot version; newer releases save weights as model.safetensors):

./checkpoints/pi0_finetuned/
├── config.json
├── pytorch_model.bin
├── tokenizer.json
└── training_args.bin

Load Fine-Tuned Model

from lerobot.policies import Pi0Policy  # path/class name may vary by version (e.g. PI0Policy)

# Load the fine-tuned weights
policy = Pi0Policy.from_pretrained("./checkpoints/pi0_finetuned")

# Inference example (see the observation sketch below)
action = policy.select_action(observation_batch)
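
observation_batch above is whatever your robot stack produces at runtime. The sketch below shows the general shape of a LeRobot-style observation dict; the camera key, state dimension, and task string are illustrative assumptions that must match the features your dataset was recorded with:

# build_observation.py - assemble a dummy observation batch for a smoke test (sketch)
import torch

from lerobot.policies import Pi0Policy  # path/class name may vary by version

policy = Pi0Policy.from_pretrained("./checkpoints/pi0_finetuned")
device = "cuda"

observation_batch = {
    "observation.images.top": torch.rand(1, 3, 224, 224, device=device),  # (B, C, H, W) in [0, 1]
    "observation.state": torch.zeros(1, 14, device=device),               # e.g. 14 joint positions
    "task": ["pick up the pen"],                                          # language instruction
}

action = policy.select_action(observation_batch)
print(action.shape)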

Log Analysis

Training logs include:

  • Loss Curves: Training/validation loss trends
  • Learning Rate: Scheduler changes
  • Gradient Statistics: Norms and clipping information
  • Resource Utilization: GPU/CPU memory usage

Performance Evaluation

IO Data Platform Evaluation

The IO platform provides model inference quality evaluation, supporting visual comparison of real data with model outputs, including per-joint command comparisons.

Other Evaluation Methods

Quantitative Metrics

  • Action Error: Mean Absolute Error (MAE) or Mean Squared Error (MSE) between predicted actions and ground truth
  • Trajectory Similarity: Compare trajectories using Dynamic Time Warping (DTW) or Fréchet distance (see the NumPy sketch below)
  • Validation Loss: Monitor loss function values on the validation set
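
For reference, the DTW distance mentioned above can be computed in a few lines of NumPy. This is the textbook O(n·m) dynamic program; libraries such as fastdtw or tslearn provide faster implementations:

# dtw.py - Dynamic Time Warping distance between two trajectories (sketch)
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """DTW distance between trajectories a of shape (n, d) and b of shape (m, d)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])   # pointwise distance
            cost[i, j] = d + min(cost[i - 1, j],      # step in a
                                 cost[i, j - 1],      # step in b
                                 cost[i - 1, j - 1])  # step in both
    return float(cost[n, m])

# Example: compare a predicted and a ground-truth joint trajectory
pred = np.random.rand(50, 14)
true = np.random.rand(60, 14)
print(f"DTW distance: {dtw_distance(pred, true):.3f}")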

Qualitative Evaluation

  • Real-World Testing: Deploy to physical robots and measure task success rates
  • Behavior Analysis: Visualize generated action sequences to check that they are reasonable (see the plotting sketch below)
  • Human Evaluation: Collect expert subjective scores on model outputs
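
For the behavior-analysis point, a quick matplotlib overlay of predicted versus recorded joint commands is often enough to spot systematic errors. A sketch with random placeholder data:

# plot_actions.py - overlay predicted and ground-truth joint trajectories (sketch)
import matplotlib.pyplot as plt
import numpy as np

# Replace with real (T, num_joints) arrays from an evaluation rollout
true_actions = np.random.rand(100, 6)
pred_actions = true_actions + 0.05 * np.random.randn(100, 6)

fig, axes = plt.subplots(2, 3, figsize=(12, 6), sharex=True)
for j, ax in enumerate(axes.flat):
    ax.plot(true_actions[:, j], label="ground truth")
    ax.plot(pred_actions[:, j], "--", label="predicted")
    ax.set_title(f"joint {j}")
axes.flat[0].legend()
fig.suptitle("Predicted vs. recorded joint commands")
plt.tight_layout()
plt.show()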

Evaluation Script Example

import numpy as np
import torch
from lerobot.policies import Pi0Policy  # path/class name may vary by version

# Load the fine-tuned model
policy = Pi0Policy.from_pretrained("./checkpoints/pi0_finetuned")

# Evaluation function: mean absolute error between predicted and recorded actions
def evaluate_model(policy, test_loader):
    total_mae = 0.0
    num_batches = 0

    for batch in test_loader:
        with torch.no_grad():
            predicted_action = policy.select_action(batch)
        true_action = batch["actions"]

        # Accumulate per-batch MAE
        mae = np.mean(np.abs(predicted_action.cpu().numpy() - true_action.cpu().numpy()))
        total_mae += mae
        num_batches += 1

    return total_mae / num_batches

# Perform evaluation (assume test_loader is a prepared DataLoader over the test set)
avg_mae = evaluate_model(policy, test_loader)
print(f"Mean Absolute Error (MAE): {avg_mae:.4f}")