LeRobot Dataset
LeRobot is an open-source, standardized dataset solution for robot learning and reinforcement learning, provided by Hugging Face. Its unified format makes it easier for researchers to share, compare, and reproduce robot learning experiments.
Exporting Data
The IO Data Platform supports exporting data in the LeRobot format, which can be directly used for training VLA (Vision-Language-Action) models. This format contains a complete mapping of visual information, language instructions, and action data for robotic operations.
Exporting data requires significant computational resources, so the free version of the IO Data Platform limits the number of exports per user. The paid version removes this limit and offers faster, GPU-accelerated exports.
1. Select Data to Export
You need to annotate the data first. Annotation links actions with natural language instructions, which is essential for training VLA models. This process ensures that the model can understand language commands and translate them into corresponding robot actions.
For information on how to annotate and perform rapid batch annotation, please refer to the documentation: Data Annotation
Once annotation is complete, you can view the annotated data on the export page and select specific subsets for export.
You can customize the dataset name. If you plan to upload the data to Hugging Face, it is recommended to use the standard repository naming format, such as `myproject/myrepo1`, to simplify the publishing process.
The more data you select, the slower the export will be. It is recommended to select data by task type rather than exporting everything at once. This not only speeds up the export process but also makes data management and model training easier.
2. Download and Extract the Exported File
The export process may take several minutes to tens of minutes, depending on the data size and system load. The progress will refresh automatically, so you can return to the export page later to check the result.
Once the export is successful, you will see a Download Data button in the Export Records section on the right side of the page. Click it to download a `.tar.gz` archive.
It is recommended to create a new, empty local directory, such as `~/Downloads/mylerobot3`, and extract the files there to avoid confusion:
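For example (the archive name `lerobot_export.tar.gz` below is a placeholder; substitute the file you actually downloaded):
mkdir -p ~/Downloads/mylerobot3                                        # create an empty target directory
tar -xzf ~/Downloads/lerobot_export.tar.gz -C ~/Downloads/mylerobot3  # extract the archive into it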
The extracted files follow the standard LeRobot dataset structure, including visual data, state information, and action labels:
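The exact file names depend on the exporter and the lerobot codebase version; a typical v2.x dataset looks roughly like this (illustrative only):
mylerobot3/
├── meta/
│   ├── info.json          # feature schema, fps, episode/frame counts
│   ├── episodes.jsonl     # per-episode index and language instruction
│   └── tasks.jsonl        # task (instruction) definitions
├── data/
│   └── chunk-000/
│       └── episode_000000.parquet    # per-frame states and actions
└── videos/
    └── chunk-000/
        └── observation.images.cam0/  # one folder per camera view
            └── episode_000000.mp4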
Browsing Data
There are two common visualization methods to help users quickly browse, understand, and debug the data. Each method is suitable for different scenarios.
Scenario | Method | Advantages
---|---|---
Local development and debugging | Rerun SDK local viewing | Full functionality, high interactivity, no network dependency
Quick preview or demo | Hugging Face online viewing | No installation required, easy sharing, accessible anytime
1. Using Rerun SDK for Local Viewing
You need to clone and install the `lerobot` repository locally, then use `lerobot/scripts/visualize_dataset.py` together with the Rerun SDK for timeline-based, interactive viewing of multimodal data (images, states, actions, etc.). This method provides the richest interactive functionality and customization options.
Installing Rerun SDK
Ensure you are using Python 3.10 or higher, and execute the following commands to install the necessary dependencies:
python3 -m pip install rerun-sdk==0.23.1
git clone https://github.com/huggingface/lerobot.git # Clone repository
cd lerobot
pip install -e . # Install in development mode
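A quick import check confirms that both packages are visible to your interpreter (`rerun` is the module name installed by `rerun-sdk`):
python3 -c "import lerobot, rerun; print('rerun', rerun.__version__)"  # should print the installed version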
Launching the Visualization Script
python3 -m lerobot.scripts.visualize_dataset \
--repo-id io-ai-data/lerobot_dataset \
--root ~/Downloads/mylerobot3 \
--episode-index 0
Parameter descriptions:
- `--repo-id`: Hugging Face dataset name, such as `io-ai-data/lerobot_dataset`
- `--root`: local path where the LeRobot data is stored, pointing to the extracted directory
- `--episode-index`: index of the episode to view, starting from 0 (see the check after this list)
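Valid episode indices run from 0 to N-1. To check how many episodes the export contains before picking an index, you can read the dataset metadata (this assumes the standard layout where `meta/info.json` records the totals):
grep -o '"total_episodes": *[0-9]*' ~/Downloads/mylerobot3/meta/info.json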
Saving as .rrd File
You can save the data visualization results as Rerun format for offline viewing or sharing with team members:
python3 -m lerobot.scripts.visualize_dataset \
--repo-id io-ai-data/lerobot_dataset \
--root ~/Downloads/mylerobot3 \
--episode-index 0 \
--save 1 \
--output-dir ./rrd_out
# Then you can view offline
rerun ./rrd_out/io-ai-data_lerobot_dataset_episode_0.rrd
Remote WebSocket Mode
If you need to view remotely (e.g., viewing data on a server from local), you can use WebSocket mode:
# Server side
python3 -m lerobot.scripts.visualize_dataset \
--repo-id io-ai-data/lerobot_dataset \
--root ~/Downloads/mylerobot3 \
--episode-index 0 \
--mode distant \
--ws-port 9091
# Local side
rerun ws://server-IP:9091
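If the WebSocket port is not reachable directly (e.g., blocked by a firewall), a common workaround is an SSH tunnel; `user` and `server-IP` below are placeholders:
# Local side: forward local port 9091 to the server
ssh -N -L 9091:localhost:9091 user@server-IP
# Then connect as if the stream were local
rerun ws://localhost:9091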
2. Online Viewing via Hugging Face Spaces
If you prefer not to install anything locally, LeRobot provides an online viewing tool based on Hugging Face Spaces, requiring no local dependencies. This method is particularly suitable for quick previews or sharing dataset content with your team.
Online visualization requires uploading your data to a Hugging Face repository. The online tool can only visualize public repositories, so your data must be publicly accessible. If you need to keep it private, use the local visualization method instead.
Operation Steps
- Open the page: https://huggingface.co/spaces/lerobot/visualize_dataset
- Fill in the Dataset Repo ID, such as `io-ai-data/uncap_pen`
- Select an episode on the left, such as `Episode 0`
- Use the options at the top of the page to choose the playback method that suits you best
Training Models
Training models is a key step in implementing robot learning. Different models may have different requirements for parameters and data.
We use the `smolvla` model as an example to explain the basic training command and parameter configuration:
Basic Training Command
Use the following command to start training the `smolvla` model:
python -m lerobot.scripts.train \
--policy.type=smolvla \
--dataset.root=/data/lerobot_dataset \
--dataset.repo_id=io-ai-data/lerobot-dataset \
--policy.device=cuda \
--output_dir=/data/lerobot_model
The above command runs training on a single NVIDIA GPU. It uses the LeRobot data in `/data/lerobot_dataset` to train the `smolvla` model and saves the model to `/data/lerobot_model`.
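To confirm that the job is actually running on the GPU, you can watch utilization in a second terminal:
watch -n 1 nvidia-smi   # refreshes GPU utilization and memory usage every second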
Parameter Descriptions
- `--policy.type`: type of model to train, such as `smolvla`.
- `--dataset.root`: root directory of the local dataset; should point to the extracted LeRobot dataset path.
- `--dataset.repo_id`: Hugging Face dataset repository ID, such as `io-ai-data/lerobot-dataset`.
- `--policy.device`: training device; supports `cuda` (GPU) or `cpu`.
- `--output_dir`: directory in which to save the trained model.
smolVLA Fine-tuning Recommendations
smolVLA is a VLA model optimized for consumer-grade, single-GPU environments. Fine-tuning from the official pre-trained weights is recommended over training from scratch.
Installation and Preparation
git clone https://github.com/huggingface/lerobot.git
cd lerobot
pip install -e ".[smolvla]"
Fine-tuning from Pre-trained Model (Recommended)
python -m lerobot.scripts.train \
--policy.path=lerobot/smolvla_base \
--dataset.root=/data/lerobot_dataset \
--dataset.repo_id=your-name/your-repo \
--policy.device=cuda \
--output_dir=/data/lerobot_smolvla_finetune \
--training.num_train_steps=20000 \
--batch_size=64
Practical recommendations:
- Data volume: start with about 50 task episodes, covering as much diversity as possible in object positions, poses, and start/end points.
- Resources and duration: training 20k steps on a single A100 takes about 4 hours; on consumer-grade GPUs, reduce the batch size or enable gradient accumulation (see the example after this list).
- Hyperparameter starting point: `batch_size=64`, `training.num_train_steps=20k`; keep the learning rate at its default or fine-tune from `1e-4`.
- When to use `--policy.type=smolvla`: for training from scratch; this usually requires more data and longer training, and is not recommended for small datasets.
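For example, on a consumer-grade GPU you can keep the effective batch size at 64 by combining a smaller batch with gradient accumulation (16 × 4 = 64). The memory flags below are the ones listed under Common Optimizations; verify their names against your local lerobot version:
python -m lerobot.scripts.train \
  --policy.path=lerobot/smolvla_base \
  --dataset.root=/data/lerobot_dataset \
  --dataset.repo_id=your-name/your-repo \
  --policy.device=cuda \
  --output_dir=/data/lerobot_smolvla_finetune \
  --training.num_train_steps=20000 \
  --batch_size=16 \
  --training.gradient_accumulation_steps=4 \
  --training.fp16=true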
Training from Scratch (Optional)
python -m lerobot.scripts.train \
--policy.type=smolvla \
--dataset.root=/data/lerobot_dataset \
--dataset.repo_id=your-name/your-repo \
--policy.device=cuda \
--output_dir=/data/lerobot_smolvla_fromscratch \
--training.num_train_steps=200000 \
--batch_size=64
Common Optimizations
- Memory optimization: `--training.fp16=true`, `--training.gradient_accumulation_steps=4` (adjust to available memory).
- Multi-view/multi-scene data augmentation: has a significant impact on generalization during fine-tuning.
- Monitoring tools: configure W&B to track training curves and evaluation metrics for early stopping and rolling back to earlier checkpoints (see the example after this list).
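As an example, recent lerobot versions expose W&B through config flags on the training command; flag names may differ between versions, so treat these as an assumption to verify against your local train script:
python -m lerobot.scripts.train \
  --policy.path=lerobot/smolvla_base \
  --dataset.root=/data/lerobot_dataset \
  --dataset.repo_id=your-name/your-repo \
  --policy.device=cuda \
  --output_dir=/data/lerobot_smolvla_finetune \
  --wandb.enable=true \
  --wandb.project=my_lerobot_runs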
ACT Fine-tuning Recommendations
ACT (Action Chunking with Transformers) is suitable for single-task or short-horizon policy learning; it usually generalizes worse than smolVLA across multiple tasks. If the task is focused, the control frequency is high, and sequences are relatively short, ACT remains a cost-effective choice.
Data and preprocessing:
- Trajectory slicing: Ensure uniform segment length and alignment (e.g., 10-20 step action chunks).
- Action normalization: Unified scale/units can significantly stabilize training.
- Observation consistency: Camera intrinsics/perspectives should be as consistent as possible, or record diverse demonstrations covering typical perturbations.
Training hyperparameter starting point (fine-tune according to memory and task difficulty):
- `batch_size=64`, learning rate `1e-4`, warmup `1000` steps, training steps `100k-200k`.
- If overfitting: increase data diversity, add regularization, or stop early; if underfitting: extend training steps or relax regularization.
Command example (when the local `lerobot` version includes the ACT policy):
python -m lerobot.scripts.train \
--policy.type=act \
--dataset.root=/data/lerobot_dataset \
--dataset.repo_id=your-name/your-repo \
--policy.device=cuda \
--output_dir=/data/lerobot_act_finetune \
--batch_size=64 \
--training.num_train_steps=100000 \
--training.learning_rate=1e-4
Note: Policy names and available parameters may differ between versions. Please refer to the local `src/lerobot/policies` implementation and the official documentation.
Frequently Asked Questions (FAQ)
Q: How long does it take to export LeRobot data?
A: Export time depends on the data size and current system load. Usually 3-5 minutes per GB of data. It is recommended to export in batches to improve efficiency.
Q: How much data can the free version export?
A: The free version has export quantity limits, which will be displayed on the export interface. For large exports, it is recommended to upgrade to the paid version.
Q: What models can the exported data be used for training?
A: The LeRobot format supports various VLA and policy models, including but not limited to smolVLA, Pi0, and ACT. Please refer to the LeRobot official codebase: https://github.com/huggingface/lerobot/tree/main/src/lerobot/policies
Q: Does online visualization require public datasets?
A: Yes, Hugging Face Spaces' online visualization tool can only access public datasets. If you need to maintain privacy, please use the local Rerun SDK.
Q: What to do if Rerun SDK installation fails?
A: Make sure you are using Python 3.10+ and check your network connection. If problems persist, try installing inside a conda or Python virtual environment.
Q: What to do if running out of memory during training?
A: You can try the following methods:
- Reduce batch size
- Lower the data loader's `num_workers` to reduce host memory usage
- Enable gradient accumulation
- Use mixed precision training
Q: How to evaluate training effectiveness?
A: LeRobot provides various evaluation metrics, including task success rate, action accuracy, etc. For specific evaluation methods, please refer to the LeRobot official documentation.
Q: What hardware acceleration is supported?
A: Supports CUDA GPU acceleration and CPU training. It is recommended to use GPU for better training performance.
Q: How to upload data to Hugging Face?
A: Use Hugging Face CLI tools:
pip install huggingface_hub
huggingface-cli login
huggingface-cli upload your-username/dataset-name /path/to/dataset --repo-type dataset
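If the upload fails with an authentication error, first verify that you are logged in:
huggingface-cli whoami   # prints your username when the token is valid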
Q: What to do if the dataset is too large?
A: It is recommended to reduce the FPS setting during export. The default is 30 frames per second, which can be reduced to 10 or 15.
Q: How to verify export data integrity?
A: If your local lerobot version provides a dataset validation script, run it against the extracted directory:
python -m lerobot.scripts.validate_dataset --root /path/to/dataset
Related Links
- LeRobot GitHub: https://github.com/huggingface/lerobot
- LeRobot Models: https://huggingface.co/lerobot
- Hugging Face Online Viewing Tool: https://huggingface.co/spaces/lerobot/visualize_dataset
- Rerun Official Website: https://www.rerun.io/