Fine-Tuning the Pi0 Model: Adapting to Robot Tasks with Custom Datasets
Pi0 is an advanced Vision-Language-Action (VLA) model that can adapt quickly to specific robot tasks through fine-tuning on a small amount of task-specific data.
This document uses the LeRobot framework to fine-tune the pre-trained Pi0 model (lerobot/pi0) on a custom robot dataset.
Prerequisites
- A dataset in LeRobot format (see the LeRobot dataset export guide in the previous section)
- A need to adapt the Pi0 model to specific robot hardware, tasks, or control strategies
- Familiarity with PyTorch and the Hugging Face training ecosystem
- Basic experience training deep learning models
Fine-Tuning Overview
Fine-tuning is the process of further training a pre-trained model on domain-specific data, typically for far fewer iterations than pre-training.
The Pi0 model has been pre-trained on diverse robot tasks, possessing general visual perception, language understanding, and action generation capabilities.
Through fine-tuning, Pi0 can achieve:
- Environmental Adaptation: Adjust to specific camera views, lighting conditions, and mechanical structures
- Task Specialization: Optimize performance for specific tasks (e.g., object grasping, classification, or placement)
- Precision Improvement: Significantly enhance control accuracy and success rate in target tasks
In simple terms: a pre-trained model knows a lot, just not about your setup; after fine-tuning, it understands you.
Environment Preparation
System Requirements
Ensure your environment meets the following requirements:
- Python ≥ 3.10 (recent LeRobot releases require 3.10 or higher)
- GPU: at least 32GB VRAM (Pi0 is a large model; an NVIDIA V100 32GB or better is recommended)
- Memory: At least 64GB system RAM
- Storage: Sufficient disk space for datasets and model checkpoints
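As a quick sanity check before installing, the snippet below (plain PyTorch, assuming a CUDA setup) confirms that a GPU is visible and reports its VRAM:
import torch

# Confirm a CUDA device is visible and report its total VRAM
assert torch.cuda.is_available(), "No CUDA device found"
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB VRAM")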
Install Dependencies
# Clone the LeRobot repository
git clone https://github.com/huggingface/lerobot.git
cd lerobot/
# Install the LeRobot framework (including Pi0 support)
pip install -e ".[pi0]"
# Verify the installation
python -c "from lerobot.policies import Pi0Policy; print('Pi0 installed successfully!')"
Prepare Your Dataset
Export LeRobot Format Data
You can export annotated data as a LeRobot-format dataset from the IO Data Platform export page; see Export LeRobot Dataset.
Important Note: You do not need to upload your data to Hugging Face for training; pass the --dataset.root parameter to fine-tune on data in a local directory.
Dataset Structure Example
Assume your exported data is stored in ~/DualPiper_Pickup_Pen:
$ ls ~/DualPiper_Pickup_Pen
data  meta  videos
$ ls ~/DualPiper_Pickup_Pen/data
chunk-000
$ ls ~/DualPiper_Pickup_Pen/data/chunk-000
episode_000000.parquet  episode_000001.parquet  episode_000002.parquet  ...
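Before launching training, it can help to load the export with LeRobotDataset and confirm that it parses; a minimal sketch (note the import path varies across LeRobot versions):
from pathlib import Path
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

# Load the local export; the repo_id serves only as an identifier here
dataset = LeRobotDataset(
    "io-ai-data/DualPiper_Pickup_Pen",
    root=Path("~/DualPiper_Pickup_Pen").expanduser(),
)
print(f"episodes: {dataset.num_episodes}, frames: {dataset.num_frames}")
print(dataset.features)  # observation/action keys and shapes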
Start Fine-Tuning
Basic Training Command
Use the train.py script for training:
# Set CUDA memory allocation policy (recommended)
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
# Start training
python3 -m lerobot.scripts.train \
    --dataset.root=~/DualPiper_Pickup_Pen \
    --dataset.repo_id=io-ai-data/DualPiper_Pickup_Pen \
    --policy.path=lerobot/pi0 \
    --policy.repo_id=lerobot/pi0 \
    --policy.device=cuda \
    --output_dir=./checkpoints/pi0_finetuned \
    --batch_size=1 \
    --policy.attention_implementation=flash_attention_2 \
    --training.learning_rate=1e-5 \
    --training.num_epochs=10
The Pi0 model has a large number of parameters, so single-GPU training can take tens of hours. For example, fine-tuning on a 2-hour dataset takes about 50 hours on a single NVIDIA V100, while 8 V100s reduce this to about 10 hours.
Parameter Details
| Parameter | Meaning | Recommended Value | Description |
|---|---|---|---|
| --dataset.root | Local dataset path | ~/your_dataset | Points to the LeRobot-format dataset directory |
| --dataset.repo_id | Hugging Face dataset ID | your-username/dataset | Used for metadata identification |
| --policy.path | Pre-trained model path | lerobot/pi0 | Specifies the official pre-trained model |
| --policy.repo_id | Model repository ID | lerobot/pi0 | Hugging Face model repository |
| --policy.device | Training device | cuda | Enables GPU acceleration |
| --output_dir | Output directory | ./checkpoints/pi0_finetuned | Save path for the fine-tuned model |
| --batch_size | Batch size | 1 | Adjust based on VRAM |
| --policy.attention_implementation | Attention implementation | flash_attention_2 | Efficient implementation that accelerates training |
| --training.learning_rate | Learning rate | 1e-5 | Keep low during fine-tuning to avoid catastrophic forgetting |
| --training.num_epochs | Number of training epochs | 10 | Adjust based on dataset size |
Multi-GPU Accelerated Training
If you have multiple GPUs, parallel training can significantly speed up the process.
# Train with 8 GPUs
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
torchrun --nproc_per_node=8 --master_port=29500 \
    -m lerobot.scripts.train \
    --dataset.root=/data/nfs2/export/lerobot/DualPiper_Pickup_Pen/ \
    --dataset.repo_id=io-ai-data/DualPiper_Pickup_Pen \
    --policy.path=lerobot/pi0 \
    --policy.repo_id=lerobot/pi0 \
    --policy.device=cuda \
    --output_dir=/home/puxk/DualPiper_Pickup_Pen_pi0_model \
    --batch_size=1 \
    --num_workers=4 \
    --policy.attention_implementation=flash_attention_2
The effective batch size is nproc_per_node × batch_size: with --batch_size=1, 8 GPUs give an effective batch size of 8; increase it if VRAM allows.
For debugging, start with --nproc_per_node=2 to verify the setup, then scale up to all 8 GPUs.
Special Notes
- Network Access: downloading Hugging Face models may require a mirror or proxy such as hf-mirror
- VRAM Monitoring: watch GPU utilization in real time during training (e.g. with nvidia-smi)
- Checkpoint Recovery: if training is interrupted, resume from the latest checkpoint in --output_dir
The Pi0 model integrates Google's PaliGemma for language processing, which is a relatively recent addition to Hugging Face Transformers:
Version Compatibility
- Transformers: requires ≥ 4.37.0
- Compatibility Check: if you encounter embed_tokens errors, upgrade transformers
Solutions
# Upgrade to the latest transformers
pip install -U transformers
# Or pin a minimum version (quotes prevent the shell from treating > as redirection)
pip install "transformers>=4.37.0"
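To confirm the installed version meets the requirement, a quick check (the packaging library ships as a transformers dependency):
import transformers
from packaging.version import Version

# Fail fast if transformers is too old for PaliGemma support
assert Version(transformers.__version__) >= Version("4.37.0"), (
    f"transformers {transformers.__version__} found; need >= 4.37.0"
)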
Policy Selection (Pi0 vs SmolVLA)
- For single-task, short-horizon, high-rate control with modest language needs: Pi0/ACT-style policies can be more compute-efficient and responsive.
- For stronger multi-task and cross-scene generalization, or to target single-GPU/consumer hardware: prefer SmolVLA; fine-tuning smolvla_base typically yields stable gains.
- Under resource constraints: enable mixed precision and gradient accumulation, and consider lowering the input resolution and batch size (see the sketch after this list).
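For the resource-constrained case, here is a minimal sketch of mixed precision combined with gradient accumulation in plain PyTorch. The LeRobot train script manages these details internally, so this only illustrates the technique; model, optimizer, loader, and compute_loss are placeholders you would supply:
import torch

scaler = torch.cuda.amp.GradScaler()
accum_steps = 4  # effective batch size = batch_size * accum_steps

for step, batch in enumerate(loader):
    # Run the forward pass in float16 to reduce VRAM usage
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = compute_loss(model, batch) / accum_steps  # scale for accumulation
    scaler.scale(loss).backward()
    # Update weights only every accum_steps micro-batches
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()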
Training Results Inspection
Model Saving
After training, model files are located at:
./checkpoints/pi0_finetuned/
├── config.json
├── pytorch_model.bin
├── tokenizer.json
└── training_args.bin
Load Fine-Tuned Model
# Note: the import path varies by LeRobot version; older releases expose
# lerobot.common.policies.pi0.modeling_pi0.PI0Policy
from lerobot.policies import Pi0Policy
# Load the fine-tuned model
policy = Pi0Policy.from_pretrained("./checkpoints/pi0_finetuned")
# Inference example: observation_batch is a dict of observation tensors
action = policy.select_action(observation_batch)
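For reference, an observation batch is a dict of tensors; the key names and shapes below are illustrative placeholders and must match the features your dataset was recorded with (check meta/info.json):
import torch

# Illustrative placeholders only; use your dataset's actual feature keys
observation_batch = {
    "observation.state": torch.zeros(1, 14),                     # proprioceptive state
    "observation.images.cam_high": torch.zeros(1, 3, 480, 640),  # RGB image in [0, 1]
    "task": ["pick up the pen"],                                 # language instruction
}
action = policy.select_action(observation_batch)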
Log Analysis
Training logs include:
- Loss Curves: Training/validation loss trends
- Learning Rate: Scheduler changes
- Gradient Statistics: Norms and clipping information
- Resource Utilization: GPU/CPU memory usage
Performance Evaluation
IO Data Platform Evaluation
The IO platform provides model inference quality evaluation, supporting visual comparison of real data with model outputs, including per-joint command comparisons.
Other Evaluation Methods
Quantitative Metrics
- Action Error: Mean Absolute Error (MAE) or Mean Squared Error (MSE) between predicted actions and ground truth
- Trajectory Similarity: compare predicted and reference trajectories using Dynamic Time Warping (DTW) or Fréchet distance (a minimal DTW sketch follows this list)
- Validation Loss: Monitor loss function values on the validation set
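As a reference for the DTW metric above, a minimal NumPy implementation for two trajectories of shape (T, D), using the textbook O(T_a × T_b) dynamic program; dtw_distance is an illustrative helper, not a LeRobot API:
import numpy as np

def dtw_distance(traj_a: np.ndarray, traj_b: np.ndarray) -> float:
    """DTW distance between two (T, D) trajectories, with Euclidean
    distance between individual frames."""
    n, m = len(traj_a), len(traj_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(traj_a[i - 1] - traj_b[j - 1])
            # Extend the cheapest of the three admissible alignments
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])
A lower DTW distance means the predicted trajectory stays closer to the reference after optimal time alignment.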
Qualitative Evaluation
- Real-World Testing: Deploy to physical robots and measure task success rates
- Behavior Analysis: Visualize if generated action sequences are reasonable
- Human Evaluation: Collect expert subjective scores on model outputs
Evaluation Script Example
import numpy as np
import torch
from lerobot.policies import Pi0Policy  # import path varies by LeRobot version

# Load the fine-tuned model
policy = Pi0Policy.from_pretrained("./checkpoints/pi0_finetuned")

# Evaluation function
def evaluate_model(policy, test_loader):
    total_mae = 0.0
    num_batches = 0
    for batch in test_loader:
        with torch.no_grad():
            predicted_action = policy.select_action(batch)
        true_action = batch["action"]  # LeRobot's conventional action key
        # Mean absolute error between predicted and recorded actions
        mae = np.mean(np.abs(predicted_action.cpu().numpy() - true_action.cpu().numpy()))
        total_mae += mae
        num_batches += 1
    return total_mae / num_batches

# Perform evaluation (a sketch for building test_loader follows)
avg_mae = evaluate_model(policy, test_loader)
print(f"Mean Absolute Error (MAE): {avg_mae:.4f}")