LeRobot Dataset
LeRobot is an open-source standardized data framework from Hugging Face, designed for robot learning and reinforcement learning scenarios. It provides a unified data format so researchers can share, compare, and reproduce robot learning experiments more easily, significantly reducing cross-project data conversion costs.
Data Export
The IO Data Platform fully supports exporting data in the LeRobot standard format, directly usable for Vision-Language-Action (VLA) model training. Exported data contains complete multimodal information for robot operations: visual observations, natural-language instructions, and the corresponding action sequences—forming a full perception–understanding–execution data mapping.
Exporting LeRobot datasets can be compute-intensive. On the IO Data Open Platform, the free tier limits the number of exports per user; paid plans provide unlimited exports and GPU acceleration, which significantly speeds up processing.
1. Select Data to Export
Annotation must be completed before export. During annotation, action sequences are precisely mapped to natural-language instructions, which is essential for VLA training. With this mapping, models can learn to understand language commands and convert them into accurate robot control actions.
For detailed annotation workflow and batch-annotation tips, see: Annotation Guide
After annotation, you can review all annotated datasets on the export page. Flexible subset selection is supported, so you can export exactly the data you need.
Dataset naming can be customized. If you plan to publish to Hugging Face, we recommend using a standard repository naming format such as `myproject/myrepo1`, which helps with collaboration and later sharing.
The larger the dataset, the longer the export takes, so export by task type rather than all at once. Batched exports not only speed up processing, they also simplify later data management, versioning, and targeted model training.
2. Download and Extract Export Files
Export duration depends on data size and current system load—typically tens of minutes. The page updates progress automatically; you can come back later to check results.
After completion, click the Download button in the Export Records panel on the right to get a `.tar.gz` archive.
We suggest extracting into a clean local directory (e.g., `~/Downloads/mylerobot3`) so the dataset does not get mixed with other data.
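For example, a minimal extraction sketch (the archive name below is a placeholder for the file you actually downloaded):
# Create a clean directory and extract the downloaded archive into it
mkdir -p ~/Downloads/mylerobot3
tar -xzf lerobot_export.tar.gz -C ~/Downloads/mylerobot3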
The extracted files follow the LeRobot dataset format, with complete multimodal content such as images, robot states, and action labels.
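A typical layout looks roughly like the following; the exact file names, chunking, and metadata files depend on the LeRobot dataset version produced by the export:
mylerobot3/
├── meta/
│   ├── info.json          # fps, feature schema, episode/frame counts
│   ├── episodes.jsonl     # per-episode metadata
│   └── tasks.jsonl        # natural-language task instructions
├── data/
│   └── chunk-000/
│       └── episode_000000.parquet   # per-frame robot states and actions
└── videos/
    └── chunk-000/
        └── observation.images.front/   # one folder per camera (name depends on your setup)
            └── episode_000000.mp4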
Data Visualization and Validation
To help users understand and validate data quickly, LeRobot offers two main visualization options, each with its own strengths:
| Use Case | Visualization | Advantages |
|---|---|---|
| Local development & debugging | Rerun SDK (local) | Rich features, highly interactive, usable offline |
| Quick preview & sharing | Hugging Face Spaces (online) | No install, easy to share, accessible anytime |
1. Local Visualization with Rerun SDK
Install the `lerobot` repository locally and use the built-in `lerobot/scripts/visualize_dataset.py` script with the Rerun SDK for timeline-style, interactive multimodal visualization (images, states, actions, etc.). This is the most powerful and customizable option.
Environment Setup and Dependencies
Ensure Python ≥ 3.10, then run the following installation commands:
# Install Rerun SDK
python3 -m pip install rerun-sdk==0.23.1
# Clone LeRobot official repository
git clone https://github.com/huggingface/lerobot.git
cd lerobot
# Install LeRobot development environment
pip install -e .
Start Data Visualization
python3 -m lerobot.scripts.visualize_dataset \
--repo-id io-ai-data/lerobot_dataset \
--root ~/Downloads/mylerobot3 \
--episode-index 0
Parameter Description:
- `--repo-id`: Hugging Face dataset identifier (e.g., `io-ai-data/lerobot_dataset`)
- `--root`: local path to the LeRobot dataset, pointing to the extracted directory
- `--episode-index`: index of the episode to visualize (starting from 0)
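If you want to sanity-check the export programmatically before visualizing it, a minimal sketch like the following should work. The import path and attribute names follow recent lerobot releases (older versions expose the class under `lerobot.common.datasets`) and may differ in yours:
from pathlib import Path
from lerobot.datasets.lerobot_dataset import LeRobotDataset
# Point repo_id at the name used during export and root at the extracted directory
root = Path("~/Downloads/mylerobot3").expanduser()
dataset = LeRobotDataset("io-ai-data/lerobot_dataset", root=root)
print(dataset.num_episodes, dataset.fps)  # episode count and recording rate
print(dataset.features)                   # observation/action keys and shapes
frame = dataset[0]                        # a single frame as a dict of tensors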
Generate Offline Visualization Files
You can save visualization results as Rerun format files (.rrd) for offline viewing or sharing with team members:
python3 -m lerobot.scripts.visualize_dataset \
--repo-id io-ai-data/lerobot_dataset \
--root ~/Downloads/mylerobot3 \
--episode-index 0 \
--save 1 \
--output-dir ./rrd_out
# View the saved visualization file offline (the file name is derived from the repo ID and episode index)
rerun ./rrd_out/io-ai-data_lerobot_dataset_episode_0.rrd
Remote Visualization (WebSocket Mode)
When processing on a remote server but viewing locally, use WebSocket mode:
# Start visualization service on the server
python3 -m lerobot.scripts.visualize_dataset \
--repo-id io-ai-data/lerobot_dataset \
--root ~/Downloads/mylerobot3 \
--episode-index 0 \
--mode distant \
--ws-port 9091
# Connect to remote visualization service locally
rerun ws://SERVER_IP:9091
2. Online Visualization with Hugging Face Spaces
If you prefer not to install a local environment, LeRobot provides an online visualization tool based on Hugging Face Spaces, usable without any local dependencies. This approach is particularly suitable for quick data preview or sharing datasets with teams.
Online visualization requires uploading data to a Hugging Face online repository. Note that Hugging Face free accounts only support visualization of public repositories, meaning your data will be publicly accessible. If data involves sensitive information requiring privacy, use the local Rerun SDK approach.
Operation Steps
- Visit the online visualization tool: https://huggingface.co/spaces/lerobot/visualize_dataset
- Enter your dataset identifier in the Dataset Repo ID field (e.g., `io-ai-data/uncap_pen`)
- Select the episode to view in the left panel (e.g., `Episode 0`)
- Use the playback options at the top of the page to choose the viewing method that best suits your needs
Model Training Guide
Training models on LeRobot datasets is the core step of robot learning. Different model architectures have different requirements for training parameters and data preprocessing, so choosing an appropriate model is crucial for good training results.
Model Selection Strategy
Current mainstream VLA models include:
| Model Type | Use Case | Key Features | Recommended For |
|---|---|---|---|
| smolVLA | Single GPU, fast prototyping | Moderate parameter count, efficient training | Consumer GPUs, proof of concept |
| Pi0 / Pi0.5 | Complex tasks, multimodal fusion | Strong language understanding | Production, complex interaction |
| ACT | Single-task optimization | High action-prediction accuracy | Specific tasks, high-frequency control |
| Diffusion Policy | Smooth action generation | Diffusion-based, high trajectory quality | Tasks requiring smooth trajectories |
| VQ-BeT | Action discretization | Vector quantization, fast inference | Real-time control scenarios |
| TDMPC | Model predictive control | Sample-efficient, online learning | Data-scarce scenarios |
Pi0 models belong to Physical Intelligence's OpenPI framework, not the LeRobot project. If you need to use Pi0 models, please refer to: OpenPI Official Documentation
smolVLA Training Guide (Recommended for Beginners)
smolVLA is a VLA model optimized for consumer-grade, single-GPU environments. Rather than training from scratch, we strongly recommend fine-tuning from the official pretrained weights, which significantly reduces training time and improves final results.
LeRobot training commands use the following parameter format:
- Policy type: `--policy.type smolvla` (specifies which model to use)
- Parameter values: separated by spaces, e.g., `--batch_size 64` (not `--batch_size=64`)
- Boolean values: use `true`/`false`, e.g., `--save_checkpoint true`
- List values: separated by spaces, e.g., `--policy.down_dims 512 1024 2048`
- Model upload: we recommend adding `--policy.push_to_hub false` to disable the automatic upload to the Hugging Face Hub
Environment Setup
# Clone LeRobot repository
git clone https://github.com/huggingface/lerobot.git
cd lerobot
# Install complete environment with smolVLA support
pip install -e ".[smolvla]"
Fine-tuning Training (Recommended)
lerobot-train \
--policy.type smolvla \
--policy.pretrained_path lerobot/smolvla_base \
--dataset.repo_id your-name/your-repo \
--dataset.root /data/lerobot_dataset \
--output_dir /data/lerobot_smolvla_finetune \
--batch_size 64 \
--steps 20000 \
--policy.optimizer_lr 1e-4 \
--policy.device cuda \
--policy.push_to_hub false \
--save_checkpoint true \
--save_freq 5000
Practical Recommendations:
- Data preparation: Recommend recording 50+ task demonstration episodes, ensuring coverage of different object positions, poses, and environmental variations
- Training resources: Single A100 training for 20k steps takes approximately 4 hours; consumer-grade GPUs can reduce batch_size or enable gradient accumulation
- Hyperparameter tuning: start with `batch_size=64`, `steps=20k`, and a learning rate of `1e-4`, then adjust based on results
- When to train from scratch: only consider training from scratch when you have a large-scale dataset (thousands of hours)
Training from Scratch (Advanced Users)
lerobot-train \
--policy.type smolvla \
--dataset.repo_id your-name/your-repo \
--dataset.root /data/lerobot_dataset \
--output_dir /data/lerobot_smolvla_fromscratch \
--batch_size 64 \
--steps 200000 \
--policy.optimizer_lr 1e-4 \
--policy.device cuda \
--policy.push_to_hub false \
--save_checkpoint true \
--save_freq 10000
Performance Optimization Tips
Memory Optimization:
# Add these parameters to optimize memory usage
--policy.use_amp true \
--num_workers 2 \
--batch_size 32 # Reduce batch size
Training Monitoring:
- Configure Weights & Biases (W&B) to monitor training curves and evaluation metrics (see the example after this list)
- Set reasonable validation intervals and early stopping strategies
- Regularly save checkpoints to prevent training interruption
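A minimal sketch of enabling W&B logging from the training CLI, assuming the `--wandb.*` options found in recent lerobot releases (run `wandb login` first; flag names may differ in your version):
lerobot-train \
--policy.type smolvla \
--policy.pretrained_path lerobot/smolvla_base \
--dataset.repo_id your-name/your-repo \
--dataset.root /data/lerobot_dataset \
--output_dir /data/lerobot_smolvla_finetune \
--policy.push_to_hub false \
--wandb.enable true \
--wandb.project lerobot-experiments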
ACT Model Training Guide
ACT (Action Chunking with Transformers) is designed for single-task or short-horizon policy learning. While not as versatile as smolVLA for multi-task generalization, ACT remains a cost-effective choice for scenarios with clear tasks, high control frequency, and relatively short horizons.
The ACT model requires `policy.n_action_steps` ≤ `policy.chunk_size`. It's recommended to set both parameters to the same value (e.g., 100) to avoid configuration errors.
Data Preprocessing Requirements
Trajectory Processing:
- Ensure unified segment length and time alignment (recommend 10-20 step action chunks)
- Normalize action data to unify scale and units
- Maintain consistency in observation data, especially camera intrinsics and viewpoints
Training Configuration:
lerobot-train \
--policy.type act \
--dataset.repo_id your-name/your-repo \
--dataset.root /data/lerobot_dataset \
--output_dir /data/lerobot_act_finetune \
--batch_size 8 \
--steps 100000 \
--policy.chunk_size 100 \
--policy.n_action_steps 100 \
--policy.n_obs_steps 1 \
--policy.optimizer_lr 1e-5 \
--policy.device cuda \
--policy.push_to_hub false \
--save_checkpoint true \
--save_freq 10000
Hyperparameter Tuning Recommendations:
- Batch size: Start at 8; adjust based on memory (ACT recommends smaller batches)
- Learning rate: recommend 1e-5; ACT is sensitive to the learning rate
- Training steps: 100k-200k steps, adjust based on task complexity
- Action chunk size: chunk_size and n_action_steps recommended at 100
- Regularization: Increase data diversity or early stopping when overfitting occurs
Performance Tuning Strategies
Handling Overfitting:
- Increase data collection diversity
- Apply appropriate regularization techniques
- Implement early stopping strategies
Handling Underfitting:
- Extend training steps
- Adjust learning rate scheduling
- Check data quality and consistency
Frequently Asked Questions (FAQ)
Data Export
Q: How long does LeRobot data export take?
A: Export time mainly depends on data scale and current system load. Typically, each GB of data requires 3-5 minutes processing time. For improved efficiency, it's recommended to export by task type in batches, avoiding processing overly large datasets at once.
Q: What are the limitations of the free version?
A: The free version has reasonable limitations on export quantity and frequency per user, with specific quotas displayed on the export interface. For large-scale data export, it's recommended to upgrade to the paid version to enjoy unlimited exports and GPU acceleration services.
Q: How to verify export data integrity?
A: Use LeRobot's built-in validation tool to check:
python -m lerobot.scripts.validate_dataset --root /path/to/dataset
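As a complementary quick check, you can compare the exported metadata against the files on disk. This is a minimal sketch assuming the current LeRobot layout with `meta/info.json` and per-episode parquet files (field names such as `total_episodes` are an assumption):
import json
from pathlib import Path
root = Path("/path/to/dataset")
info = json.loads((root / "meta" / "info.json").read_text())
# Compare the episode count recorded in the metadata with the parquet files on disk
episode_files = list((root / "data").rglob("episode_*.parquet"))
print("episodes in info.json:", info.get("total_episodes"))
print("episode files on disk:", len(episode_files))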
Q: How to handle oversized exported datasets?
A: You can optimize through the following methods:
- Lower export frequency settings (default 30fps can be reduced to 10-15fps)
- Split exports by time period or task type
- Compress image quality (while ensuring training effectiveness)
Data Visualization
Q: What if Rerun SDK installation fails?
A: Please check the following conditions:
- Ensure Python version ≥ 3.10
- Check network connection stability
- Try installing in a virtual environment: `python -m venv rerun_env && source rerun_env/bin/activate`
- Use a mirror source: `pip install -i https://pypi.tuna.tsinghua.edu.cn/simple rerun-sdk==0.23.1`
Q: Does online visualization require public data?
A: Yes. Hugging Face Spaces' online visualization tool can only access public datasets. If your data involves sensitive information requiring privacy, please use the local Rerun SDK approach.
Q: How to upload data to Hugging Face?
A: Use the official CLI tool:
# Install Hugging Face CLI
pip install huggingface_hub
# Login to account
huggingface-cli login
# Upload the dataset folder to a dataset repository
huggingface-cli upload your-username/dataset-name /path/to/dataset --repo-type dataset
Model Training
Q: What model types are supported?
A: LeRobot format supports multiple mainstream VLA models:
- smolVLA: Suitable for single-GPU environments and fast prototyping
- Pi0: Powerful multimodal capabilities, suitable for complex tasks (belongs to OpenPI framework)
- ACT: Focused on single-task optimization, high action prediction accuracy
For specific supported models, please refer to: https://github.com/huggingface/lerobot/tree/main/src/lerobot/policies
Q: What to do when encountering memory issues during training?
A: Try the following optimization strategies:
- Reduce the batch size, e.g., `--batch_size 1`
- Enable mixed-precision training: `--policy.use_amp true`
- Reduce data-loading workers: `--num_workers 1`
- Reduce observation steps: `--policy.n_obs_steps 1`
- Clear the GPU cache: add `torch.cuda.empty_cache()` to the training script
Q: How to choose the right model?
A: Choose based on your specific needs:
- Fast prototyping: Choose smolVLA
- Complex multimodal tasks: Choose Pi0 (requires OpenPI framework)
- Resource-limited environments: Choose smolVLA or ACT
- Single specialized task: Choose ACT
Q: How to evaluate training effectiveness?
A: LeRobot provides various evaluation methods:
- Quantitative metrics: action error (MAE/MSE) and trajectory similarity (DTW); see the sketch after this list
- Qualitative evaluation: Real-world test success rate, behavior analysis
- Platform evaluation: IO Platform provides visual model quality evaluation tools
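For the action-error metrics, a minimal sketch of how MAE and MSE can be computed over predicted versus ground-truth action sequences (the arrays below are random placeholders, not platform outputs):
import numpy as np
# Placeholder arrays with shape (num_steps, action_dim)
pred_actions = np.random.rand(100, 6)
true_actions = np.random.rand(100, 6)
mae = np.mean(np.abs(pred_actions - true_actions))  # mean absolute error
mse = np.mean((pred_actions - true_actions) ** 2)   # mean squared error
print(f"MAE: {mae:.4f}, MSE: {mse:.4f}")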
Q: How long does training typically take?
A: Training time depends on multiple factors:
- Data scale: 50 demonstration episodes typically require 2-8 hours
- Hardware configuration: A100 is 3-5 times faster than consumer-grade GPUs
- Model choice: smolVLA trains faster than ACT
- Training strategy: Fine-tuning is 5-10 times faster than training from scratch
Technical Support
Q: How to get help when encountering technical issues?
A: You can get support through the following channels:
- Consult LeRobot official documentation: https://huggingface.co/docs/lerobot
- Submit Issues on GitHub: https://github.com/huggingface/lerobot/issues
- Contact IO Platform technical support team
- Participate in LeRobot community discussions
Q: Does the IO Platform support automatic model deployment?
A: Yes, the IO Platform supports automatic deployment services for mainstream models such as Pi0 (OpenPI framework), smolVLA, and ACT. For detailed information, please contact the technical support team for deployment plans and pricing.
Related Resources
Official Resources
- LeRobot Project Home: https://github.com/huggingface/lerobot
- LeRobot Model Collection: https://huggingface.co/lerobot
- LeRobot Official Documentation: https://huggingface.co/docs/lerobot
- Hugging Face Online Visualization Tool: https://huggingface.co/spaces/lerobot/visualize_dataset
Tools and Frameworks
- Rerun Visualization Platform: https://www.rerun.io/
- Hugging Face Hub: https://huggingface.co/docs/huggingface_hub
Academic Resources
- Pi0 Original Paper: https://arxiv.org/abs/2410.24164
- ACT Paper: Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware
- VLA Survey Paper: Vision-Language-Action Models for Robotic Manipulation
OpenPI Related Resources
- OpenPI Project Home: https://github.com/Physical-Intelligence/openpi
- Physical Intelligence: https://www.physicalintelligence.company/
Community Resources
- LeRobot GitHub Discussions: https://github.com/huggingface/lerobot/discussions
- Hugging Face Robot Learning Community: https://huggingface.co/spaces/lerobot
This documentation will be continuously updated to reflect the latest developments and best practices in the LeRobot ecosystem. If you have questions or suggestions, please feel free to contact us through the IO Platform technical support channels.