Fine-tuning the Pi0 Model with Your Data
Pi0 is a powerful Vision-Language-Action (VLA) model that can quickly adapt to your robot tasks through fine-tuning with a small amount of data.
This document is based on the LeRobot framework and uses the pre-trained Pi0 model (lerobot/pi0) to fine-tune on your own robot dataset.
Prerequisites
- You have a dataset in LeRobot format (see the previous section on exporting LeRobot datasets)
- You want to adapt the Pi0 model to your robot hardware, tasks, or control style
- You are familiar with PyTorch and the Hugging Face ecosystem for training
- You have basic deep learning training experience
What is Fine-tuning?
Fine-tuning refers to continuing to train a pre-trained model for a short period using your own data.
Pi0 is pre-trained on a broad range of robot tasks and has already learned general capabilities in visual understanding, instruction following, and action prediction.
Through fine-tuning, you can make Pi0:
- Adapt to your robot environment: Adapt to your camera perspective, lighting environment, and mechanical structure
- Adapt to specific tasks: Optimize for your specific tasks (such as picking up pens, sorting, or placing objects)
- Improve control accuracy: Significantly improve the model's control accuracy and success rate in your tasks
Simply put: Pre-trained models know a lot, but they don't necessarily understand you; after fine-tuning, they understand you.
Environment Setup
System Requirements
Please ensure your environment meets the following requirements:
- Python ≥ 3.8 (recommended 3.10 or 3.11)
- GPU: At least 32GB VRAM recommended (Pi0 is a large model; a V100-class GPU or better is recommended; a quick check is sketched after this list)
- Memory: Recommend at least 64GB RAM
- Storage: Ensure sufficient disk space for datasets and models
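If you're unsure whether your machine meets the GPU recommendation, a quick check like the following sketch (plain PyTorch, nothing LeRobot-specific) can confirm it:
# Quick check that a CUDA GPU is visible and meets the 32GB VRAM recommendation
import torch
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB")
    if vram_gb < 32:
        print("Warning: below the recommended 32 GB of VRAM")
else:
    print("No CUDA GPU detected")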
Install Dependencies
# Clone LeRobot repository
git clone https://github.com/huggingface/lerobot.git
cd lerobot/
# Install LeRobot framework (includes Pi0 support)
pip install -e ".[pi0]"
# Verify installation
python -c "from lerobot.policies import Pi0Policy; print('Pi0 installation successful!')"
Prepare Your Dataset
Export LeRobot Format Data
You can export annotated data as LeRobot format datasets through the IO Data Platform export page. See Export LeRobot Dataset.
Important note: You don't need to upload your data to Hugging Face to train. Use the --dataset.root parameter to point training at a local dataset directory.
Dataset Structure Example
Assume your exported data is stored in ~/DualPiper_Pickup_Pen:
$ ls ~/DualPiper_Pickup_Pen
data meta videos
$ ls ~/DualPiper_Pickup_Pen/data
episode_000 episode_001 episode_002 ...
$ ls ~/DualPiper_Pickup_Pen/data/episode_000
observations actions.npy language_instruction.txt metadata.json
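Before committing to a long training run, it's worth loading the dataset once to confirm LeRobot can read it. A minimal sanity-check sketch, assuming the LeRobotDataset class from the LeRobot package (its import path may differ between LeRobot versions):
from pathlib import Path
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

# Load the local dataset; the repo_id matches the one passed to train.py below
dataset = LeRobotDataset(
    "io-ai-data/DualPiper_Pickup_Pen",
    root=Path("~/DualPiper_Pickup_Pen").expanduser(),
)
print(f"episodes: {dataset.num_episodes}, frames: {dataset.num_frames}")
print(dataset.features)  # feature names/shapes, useful when building inference batches later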
Start Fine-tuning
Basic Training Command
Use the train.py script for training:
# Set CUDA memory allocation strategy (recommended)
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
# Start training
python3 -m lerobot.scripts.train \
--dataset.root=~/DualPiper_Pickup_Pen \
--dataset.repo_id=io-ai-data/DualPiper_Pickup_Pen \
--policy.path=lerobot/pi0 \
--policy.repo_id=lerobot/pi0 \
--policy.device=cuda \
--output_dir=./checkpoints/pi0_finetuned \
--batch_size=1 \
--policy.attention_implementation=flash_attention_2 \
--training.learning_rate=1e-5 \
--training.num_epochs=10
The Pi0 model itself is quite large, so single-GPU training takes tens of hours. For example, fine-tuning on 2 hours of collected data took about 50 hours on a single V100; 8 V100s speed things up, but it still takes nearly 10 hours.
Parameter Details
| Parameter | Meaning | Recommended Value | Description |
|---|---|---|---|
| --dataset.root | Local dataset path | ~/your_dataset | Points to your LeRobot dataset directory |
| --dataset.repo_id | Hugging Face dataset name | your-username/dataset | Used to identify the dataset |
| --policy.path | Pre-trained model path | lerobot/pi0 | Use the official pre-trained model |
| --policy.repo_id | Model repository ID | lerobot/pi0 | Model repository on Hugging Face |
| --policy.device | Training device | cuda | Use GPU acceleration |
| --output_dir | Model save directory | ./checkpoints/pi0_finetuned | Path to save the fine-tuned model |
| --batch_size | Batch size | 1 | Adjust based on VRAM size |
| --policy.attention_implementation | Attention implementation | flash_attention_2 | Use Flash Attention 2 for acceleration |
| --training.learning_rate | Learning rate | 1e-5 | Use a smaller learning rate for fine-tuning |
| --training.num_epochs | Number of training epochs | 10 | Adjust based on data volume |
Multi-GPU Accelerated Training
If you have multiple graphics cards, you can train in parallel to significantly speed up the process.
# Use 8 GPUs for training
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
torchrun --nproc_per_node=8 --master_port=29500 \
-m lerobot.scripts.train \
--dataset.root=/data/nfs2/export/lerobot/DualPiper_Pickup_Pen/ \
--dataset.repo_id=io-ai-data/DualPiper_Pickup_Pen \
--policy.path=lerobot/pi0 \
--policy.repo_id=lerobot/pi0 \
--policy.device=cuda \
--output_dir=/home/puxk/DualPiper_Pickup_Pen_pi0_model \
--batch_size=1 \
--num_workers=4 \
--policy.attention_implementation=flash_attention_2
The total batch size is nproc_per_node × batch_size. With --batch_size=1, 8 GPUs give a total batch size of 8. You can increase this if memory allows.
For debugging, it's recommended to first test with --nproc_per_node=2 to confirm the run starts, then scale up to all 8 GPUs.
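Note that the flags above keep the single-GPU learning rate. If you grow the total batch size, a common (but not automatic) heuristic is linear learning-rate scaling; a small sketch of the arithmetic:
# Linear LR scaling heuristic: scale the learning rate with the total batch size.
# This is a rule of thumb, not something the LeRobot CLI applies for you.
base_lr = 1e-5      # --training.learning_rate used for a single GPU
base_batch = 1      # --batch_size per GPU
num_gpus = 8        # --nproc_per_node
total_batch = num_gpus * base_batch
scaled_lr = base_lr * total_batch / base_batch
print(f"total batch size: {total_batch}, suggested learning rate: {scaled_lr:.0e}")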
Special Notes
- Network acceleration: You may need network acceleration to download the Pi0 model from Hugging Face; see mirror tools such as hf-mirror
- VRAM monitoring: Please monitor GPU VRAM usage during training
- Resume training: If training is interrupted, you can resume from the checkpoints saved in --output_dir
Important Notes: Gemma Model Compatibility
The Pi0 model internally uses Google's PaliGemma vision-language model for language processing. Because PaliGemma is a relatively recent addition to Hugging Face Transformers:
Version Requirements
- Transformers: Ensure you use a recent version of transformers (recommended ≥ 4.37.0)
- Compatibility check: If you encounter embed_tokens-related errors, upgrade transformers
Solutions
# Upgrade to latest transformers
pip install -U transformers
# Or pin a minimum version (quoted so the shell doesn't treat > as a redirect)
pip install "transformers>=4.37.0"
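After upgrading, you can confirm the installed version from Python:
# Print the installed transformers version; it should satisfy the note above
import transformers
print(transformers.__version__)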
View Training Results
Model Save Location
After training is complete, the model will be saved in:
./checkpoints/pi0_finetuned/
├── config.json
├── pytorch_model.bin
├── tokenizer.json
└── training_args.bin
Load Fine-tuned Model
from lerobot.policies import Pi0Policy
# Load fine-tuned model
policy = Pi0Policy.from_pretrained("./checkpoints/pi0_finetuned")
# Perform inference
action = policy.select_action(batch)
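The structure of batch depends on your dataset's features (camera names, state dimensions, and so on), so the keys and shapes below are hypothetical placeholders rather than the exact Pi0 input schema; inspect your dataset's features attribute to find the real ones:
import torch
# Hypothetical batch layout; replace keys/shapes with your dataset's actual features
batch = {
    "observation.state": torch.zeros(1, 14),                      # placeholder joint state
    "observation.images.cam_high": torch.zeros(1, 3, 480, 640),   # placeholder camera frame
    "task": ["Pick up the red pen"],                              # language instruction
}
action = policy.select_action(batch)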
Training Log Analysis
During training, detailed logs will be generated, including:
- Loss curves: Changes in training loss and validation loss (a plotting sketch follows this list)
- Learning rate scheduling: Changes in learning rate
- Gradient information: Gradient norms and gradient clipping
- Memory usage: GPU VRAM and system memory usage
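If you want to visualize the loss curves yourself, here is a minimal sketch, assuming you have already parsed per-step loss values out of the logs (the numbers below are placeholders):
import matplotlib.pyplot as plt
# Placeholder values; in practice, parse these from your training logs
train_loss = [0.92, 0.61, 0.45, 0.38, 0.34]
val_loss = [0.95, 0.66, 0.52, 0.49, 0.50]
steps = range(len(train_loss))
plt.plot(steps, train_loss, label="training loss")
plt.plot(steps, val_loss, label="validation loss")
plt.xlabel("logging step")
plt.ylabel("loss")
plt.legend()
plt.savefig("loss_curves.png")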
How to Evaluate Results?
IO Data Platform Evaluation
The IO Data Platform supports model inference quality evaluation. You can compare any real data with model inference results and visualize the comparison for each joint instruction.
Other Evaluation Methods
You can also evaluate Pi0's performance on your tasks through the following methods:
1. Quantitative Evaluation
- Action error: Compare predicted actions with real actions (such as angle error, position error)
- Trajectory comparison: Visualize predicted vs ground truth behavior trajectories
- Loss metrics: Record loss change trends on validation set
2. Qualitative Evaluation
- Real deployment: Verify task success rate on real robots
- Behavior observation: Observe whether model-generated actions meet expectations
- User feedback: Collect user feedback from actual usage
3. Evaluation Script Example
import numpy as np
from lerobot.policies import Pi0Policy

# Load the fine-tuned model
policy = Pi0Policy.from_pretrained("./checkpoints/pi0_finetuned")

# Evaluation function: mean absolute action error over a test set
def evaluate_model(policy, test_dataset):
    total_error = 0.0
    num_samples = 0
    for batch in test_dataset:
        predicted_action = policy.select_action(batch)
        # select_action may return a torch tensor; bring it to the CPU as a numpy array
        if hasattr(predicted_action, "detach"):
            predicted_action = predicted_action.detach().cpu().numpy()
        true_action = np.asarray(batch['actions'])
        # Mean absolute error between predicted and ground-truth actions
        error = np.mean(np.abs(predicted_action - true_action))
        total_error += error
        num_samples += 1
    return total_error / num_samples

# Execute evaluation
avg_error = evaluate_model(policy, test_dataset)
print(f"Average action error: {avg_error:.4f}")
Related Resources
- LeRobot Project Homepage
- Hugging Face Pi0 Model
- Original Pi0 Implementation (JAX)
- Pi0 Paper
- LeRobot Documentation
Frequently Asked Questions
Q: The model reports that embed_tokens doesn't exist during fine-tuning?
A: This happens when the transformers version is too old; upgrade it:
pip install -U transformers
Q: Dataset not uploaded to Hugging Face, can I use local data?
A: Yes, use the --dataset.root=your_local_path parameter.
Q: What to do if VRAM is insufficient during training?
A: You can try the following methods:
# 1. Reduce the batch size
--batch_size=1
# 2. Use the CUDA allocator setting recommended earlier
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
Other parameters should be tuned to your actual hardware and data.
Q: How long does training typically take?
A: Training time depends on:
- Data volume: Usually 10+ hours
- Hardware configuration: GPU performance has a big impact
- Training epochs: Recommend 5-20 epochs
Q: How to determine if training has converged?
A: Observe the following indicators:
- Loss curves: Training loss and validation loss stabilize
- Validation metrics: Performance on validation set no longer improves
- Overfitting: Stop training when validation loss starts to rise (a simple stopping rule is sketched below)
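One way to act on the last indicator is a patience-based early-stopping rule; a minimal sketch (the patience value is up to you):
# Stop when the last `patience` validation losses fail to improve on the best earlier loss
def should_stop(val_losses, patience=3):
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_before

# Example: loss bottomed out at 0.60, then rose for three evaluations -> stop
print(should_stop([0.90, 0.70, 0.60, 0.62, 0.63, 0.65]))  # True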
Q: How to deploy the fine-tuned model?
A: The IO Data Platform already supports automatic deployment of Pi0, SmolVLA, and other common robotics models. Please consult our technical experts for details.
You can also refer to the following steps:
# 1. Load the model
policy = Pi0Policy.from_pretrained("./checkpoints/pi0_finetuned")
# 2. Prepare input data
observation = {
    'images': camera_images,
    'language_instruction': "Pick up the red pen"
}
# 3. Generate action commands
action = policy.select_action(observation)
Good luck with your training!