LeRobot Dataset
LeRobot is an open-source, standardized dataset format for robot learning and reinforcement learning, provided by Hugging Face. Its unified format makes it easier for researchers to share, compare, and reproduce robot learning experiments.
Exporting Data
The IO Data Platform supports exporting data in the LeRobot format, which can be directly used for training VLA (Vision-Language-Action) models. This format contains a complete mapping of visual information, language instructions, and action data for robotic operations.
Exporting data requires significant computational resources. Therefore, the free version of the IO Data Open Platform limits the number of exports per user. The paid version removes this limitation and offers faster export speeds with GPU acceleration.
1. Select Data to Export
You need to annotate the data first. Annotation links actions with natural language instructions, which is essential for training VLA models. This process ensures that the model can understand language commands and translate them into corresponding robot actions.
Once annotation is complete, you can view the annotated data on the export page and select specific subsets for export.
You can customize the dataset name. If you plan to upload the data to Hugging Face, it is recommended to use the standard repository naming format, such as myproject/myrepo1, to simplify the publishing process.
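As a rough illustration of the namespace/name format, the sketch below checks a candidate dataset name before export. The function name is_repo_id is hypothetical, and the rule it applies (exactly one slash, with only letters, digits, dots, underscores, and hyphens) is a simplified assumption rather than Hugging Face's full validation logic.

```shell
# is_repo_id NAME: return 0 if NAME looks like "namespace/name", 1 otherwise.
# Simplified assumption, not the complete Hugging Face naming rules.
is_repo_id() {
  case "$1" in
    */*/*) return 1 ;;   # more than one slash
    ?*/?*) ;;            # exactly one slash with text on both sides
    *) return 1 ;;       # no slash, or an empty namespace or name
  esac
  # Reject any character outside letters, digits, '.', '_', '-' and '/'
  case "$1" in
    *[!A-Za-z0-9._/-]*) return 1 ;;
  esac
  return 0
}

# Example:
# is_repo_id myproject/myrepo1 && echo "looks like a valid repo ID"
```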
The more data you select, the slower the export will be. It is recommended to select data by task type rather than exporting everything at once. This not only speeds up the export process but also makes data management and model training easier.
2. Download and Extract the Exported File
The export process may take several minutes to tens of minutes, depending on the data size and system load. The progress will refresh automatically, so you can return to the export page later to check the result.
Once the export is successful, you will see a Download Data button in the Export Records section on the right side of the page. Click it to download a .tar.gz archive.
It is recommended to create a new empty directory locally, such as ~/Downloads/mylerobot3, to extract the files and avoid confusion.
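A minimal sketch of the extraction step, assuming a standard tar with gzip support is available. The helper name extract_export and the archive file name in the example are placeholders, since the actual file name depends on your export.

```shell
# extract_export ARCHIVE DEST: unpack a .tar.gz export into its own directory.
extract_export() {
  archive="$1"
  dest="$2"
  # Create the target directory if it does not exist yet
  mkdir -p "$dest" || return 1
  # -x extract, -z decompress gzip, -f archive file, -C target directory
  tar -xzf "$archive" -C "$dest"
}

# Example (placeholder archive name; use the file you downloaded):
# extract_export ~/Downloads/lerobot_export.tar.gz ~/Downloads/mylerobot3
```

If the archive wraps everything in a single top-level folder, the dataset root is that folder inside the destination rather than the destination itself.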
The extracted files follow the standard LeRobot dataset structure, including visual data, state information, and action labels.
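A quick sanity check of the extracted directory might look like the sketch below. It assumes the commonly used LeRobot layout with a meta/info.json file, a data/ directory, and an optional videos/ directory. The exact layout depends on the dataset format version your export uses, so treat the paths as assumptions and adjust them if your export differs. The helper name check_lerobot_dir is hypothetical.

```shell
# check_lerobot_dir ROOT: report missing pieces of an extracted dataset.
# Returns 0 when the assumed required entries exist, 1 otherwise.
check_lerobot_dir() {
  root="$1"
  rc=0
  # Assumed layout: dataset metadata in meta/info.json, frame data under data/
  [ -f "$root/meta/info.json" ] || { echo "missing: meta/info.json"; rc=1; }
  [ -d "$root/data" ] || { echo "missing: data/"; rc=1; }
  # Camera streams are usually stored under videos/; absence is only a note
  [ -d "$root/videos" ] || echo "note: no videos/ directory found"
  return "$rc"
}

# Example:
# check_lerobot_dir ~/Downloads/mylerobot3 && echo "layout looks plausible"
```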
Browsing Data
There are two common visualization methods to help users quickly browse, understand, and debug the data. Each method is suitable for different scenarios.
Scenario | Method | Advantages |
---|---|---|
Local development | Rerun SDK local viewer | Full features, highly interactive, offline |
Quick preview/demo | Hugging Face online viewer | No installation, easy sharing, accessible |
1. Local Viewing with Rerun SDK
Download and install the lerobot repository locally. Using lerobot/scripts/visualize_dataset.py, you can leverage the Rerun SDK for interactive, timeline-based multimodal data visualization (including images, states, actions, etc.). This method provides the richest interactive features and customization options.
Install Rerun SDK
Make sure you are using Python 3.10 or above, and run the following commands to install the required dependencies:
python3 -m pip install rerun-sdk==0.23.1
git clone https://github.com/huggingface/lerobot.git # Clone the repository
cd lerobot
pip install -e . # Install in development mode
Start the Visualization Script
python3 lerobot/scripts/visualize_dataset.py \
--repo-id io-ai-data/lerobot_dataset \
--root ~/Downloads/mylerobot3 \
--episode-index 0
Parameter description:
- --repo-id: Hugging Face dataset name, e.g., io-ai-data/lerobot_dataset
- --root: Local path to the extracted LeRobot data directory
- --episode-index: Index of the episode to view (starting from 0)
Save as .rrd File
You can save the visualization results in Rerun format for offline viewing or sharing with team members:
python3 lerobot/scripts/visualize_dataset.py \
--repo-id io-ai-data/lerobot_dataset \
--root ~/Downloads/mylerobot3 \
--episode-index 0 \
--save 1 \
--output-dir ./rrd_out
# Then view offline
rerun ./rrd_out/io-ai-data_lerobot_dataset_episode_0.rrd
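To save several episodes in one go, the same command can be wrapped in a loop. The sketch below is an assumption-laden convenience wrapper, not part of LeRobot: the function name save_episodes and the RUN variable are hypothetical, and it assumes the script exits after writing each file when --save 1 is set. Setting RUN=echo prints the commands instead of running them, which is useful for checking the arguments first.

```shell
# save_episodes FIRST LAST: write one .rrd file per episode index in [FIRST, LAST].
save_episodes() {
  i="$1"
  while [ "$i" -le "$2" ]; do
    # RUN is empty by default; set RUN=echo to print each command instead
    ${RUN:-} python3 lerobot/scripts/visualize_dataset.py \
      --repo-id io-ai-data/lerobot_dataset \
      --root ~/Downloads/mylerobot3 \
      --episode-index "$i" \
      --save 1 \
      --output-dir ./rrd_out
    i=$((i + 1))
  done
}

# Example: preview the commands for episodes 0 to 4, then run them
# RUN=echo; save_episodes 0 4
# RUN=;     save_episodes 0 4
```

Afterwards, listing ./rrd_out shows the generated .rrd files, each of which can be opened with the rerun command.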
Remote WebSocket Mode
If you need to view data remotely (e.g., data on a server viewed locally), use the WebSocket mode:
# On the server
python3 lerobot/scripts/visualize_dataset.py \
--repo-id io-ai-data/lerobot_dataset \
--root ~/Downloads/mylerobot3 \
--episode-index 0 \
--mode distant \
--ws-port 9091
# On the local machine
rerun ws://<server-ip>:9091
2. Online Viewing via Hugging Face Spaces
If you prefer not to install any local environment, LeRobot provides an online viewer based on Hugging Face Spaces, requiring no local dependencies. This is especially suitable for quick previews or sharing datasets with your team.
Online visualization requires uploading your data to a Hugging Face repository. Free Hugging Face accounts can only visualize public repositories, meaning your data must be publicly accessible. To keep your data private, consider upgrading to a paid account or using local visualization.
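One possible way to upload the extracted directory is the huggingface-cli tool from the huggingface_hub package. The sketch below assumes the tool is installed and that you have already authenticated with huggingface-cli login. The wrapper name upload_dataset and the DRY_RUN variable are hypothetical, and the repo ID and path are the placeholder values used earlier on this page.

```shell
# upload_dataset REPO_ID DIR: upload an extracted dataset folder to a
# Hugging Face dataset repository. With DRY_RUN=1 the command is only printed.
upload_dataset() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "huggingface-cli upload $1 $2 --repo-type dataset"
  else
    # --repo-type dataset targets a dataset repo instead of a model repo
    huggingface-cli upload "$1" "$2" --repo-type dataset
  fi
}

# Example (placeholder repo ID and path):
# upload_dataset myproject/myrepo1 ~/Downloads/mylerobot3
```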
Steps
- Open: https://huggingface.co/spaces/lerobot/visualize_dataset
- Enter the Dataset Repo ID, e.g., io-ai-data/lerobot_dataset
- Enter the Episode Index, e.g., 0
- Select the dataset split (default is "train")
- Click the "Load" button and wait for the data to load and display
Features
- Synchronized playback of multi-channel videos (RGB, Depth, etc.)
- Real-time line charts for state and control data
- Language instructions displayed with timestamps
- Intuitive Gradio-based interface, no coding required
- Supports speed adjustment and frame-by-frame playback
Related Links
- LeRobot GitHub: https://github.com/huggingface/lerobot
- LeRobot Datasets: https://huggingface.co/lerobot
- Hugging Face Online Viewer: https://huggingface.co/spaces/lerobot/visualize_dataset
- Rerun Official Site: https://www.rerun.io/