LeRobot v2 vs v3 Format Differences
LeRobot currently has two formats: v2.0/v2.1 and v3.0. Understanding the differences helps with format selection, migration, and training pipeline integration.
For format conversion or visual verification on local data, you can use LeRobot Studio: open a v2 or v3 dataset, then choose the target version (v2.1 or v3.0) on export to convert in either direction—no Python or Hugging Face CLI required.
Summary of Differences
| Aspect | v2.0 / v2.1 | v3.0 |
|---|---|---|
| Storage | One file per episode | Multiple episodes in fewer, larger files; locations defined in metadata |
| Tabular data | data/chunk-XXX/episode_YYYYYY.parquet | data/chunk-XXX/file-YYY.parquet (read by row range) |
| Video | videos/chunk-XXX/{key}/episode_YYYYYY.mp4 | videos/{key}/chunk-XXX/file-YYY.mp4 (read by timestamp range) |
| Episode metadata | meta/episodes.jsonl (one JSON per line) | Chunked Parquet under meta/episodes/ |
| Tasks | meta/tasks.jsonl | meta/tasks.jsonl or meta/tasks.parquet |
| Paths | Derived from chunk and episode indices | Path templates configurable in info.json |
| Scalability | Many files; slower enumeration | Hub streaming (StreamingLeRobotDataset); better for large datasets |
In v3, data is stored in fewer large files; readers use row indices and timestamps in metadata to reconstruct a per-episode view. This reduces file count and fits cloud streaming.
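To make the path rows of the table concrete, here is a minimal sketch of how a reader could build file paths in each version. The template strings and helper names are assumptions modeled on the table. A real dataset may override them via `data_path` / `video_path` in `meta/info.json`, so check that file rather than relying on these defaults.

```python
# Assumed default path templates, modeled on the layouts in the table above.
# A dataset can override them with data_path / video_path in meta/info.json.
V2_DATA_PATH = "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet"
V2_VIDEO_PATH = "videos/chunk-{episode_chunk:03d}/{video_key}/episode_{episode_index:06d}.mp4"
V3_DATA_PATH = "data/chunk-{chunk_index:03d}/file-{file_index:03d}.parquet"
V3_VIDEO_PATH = "videos/{video_key}/chunk-{chunk_index:03d}/file-{file_index:03d}.mp4"


def v2_paths(episode_index: int, chunks_size: int, video_key: str) -> tuple:
    """v2: both paths follow from the episode index and chunks_size alone."""
    episode_chunk = episode_index // chunks_size
    data = V2_DATA_PATH.format(episode_chunk=episode_chunk, episode_index=episode_index)
    video = V2_VIDEO_PATH.format(
        episode_chunk=episode_chunk, video_key=video_key, episode_index=episode_index
    )
    return data, video


def v3_data_path(chunk_index: int, file_index: int) -> str:
    """v3: chunk/file indices are read from episode metadata, not computed."""
    return V3_DATA_PATH.format(chunk_index=chunk_index, file_index=file_index)


def v3_video_path(video_key: str, chunk_index: int, file_index: int) -> str:
    """v3: each video key carries its own chunk/file indices in episode metadata."""
    return V3_VIDEO_PATH.format(video_key=video_key, chunk_index=chunk_index, file_index=file_index)


print(v2_paths(1234, 1000, "observation.images.front"))
# ('data/chunk-001/episode_001234.parquet', 'videos/chunk-001/observation.images.front/episode_001234.mp4')
print(v3_data_path(0, 2))
# data/chunk-000/file-002.parquet
```

The asymmetry is the point: in v2 the episode index alone determines the file, while in v3 you must first look up the episode's metadata row to learn which shared file it lives in.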
Directory Layout
v2
- `meta/info.json`: includes `codebase_version` (`v2.0` or `v2.1`) and `chunks_size`
- `meta/episodes.jsonl`: one JSON object per line, one line per episode
- `meta/tasks.jsonl`: task indices and descriptions
- `data/chunk-{chunk}/episode_{episode}.parquet`: one Parquet file per episode
- `videos/chunk-{chunk}/{featureKey}/episode_{episode}.mp4`: one MP4 per episode per feature
meta/
  info.json
  episodes.jsonl
  tasks.jsonl
data/
  chunk-000/
    episode_000000.parquet
    episode_000001.parquet
    ...
videos/
  chunk-000/
    observation.images.front/
      episode_000000.mp4
      episode_000001.mp4
      ...
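Because v2 has one file per episode, tooling often has to enumerate the directory tree to find episodes. The sketch below scans `data/chunk-*/` for per-episode Parquet files and reports gaps. The helper names are hypothetical, and it assumes episode indices should run contiguously from 0.

```python
import re
import tempfile
from pathlib import Path

# Matches v2 per-episode file names such as episode_000001.parquet.
EPISODE_RE = re.compile(r"^episode_(\d+)\.parquet$")


def list_v2_episodes(root: Path) -> dict:
    """Map episode_index -> Parquet path by scanning every data/chunk-* directory."""
    episodes = {}
    for path in sorted((root / "data").glob("chunk-*/episode_*.parquet")):
        match = EPISODE_RE.match(path.name)
        if match:
            episodes[int(match.group(1))] = path
    return episodes


def find_gaps(episodes: dict) -> list:
    """Assumes episode indices should run contiguously from 0; returns the missing ones."""
    if not episodes:
        return []
    return [i for i in range(max(episodes) + 1) if i not in episodes]


# Demo on a throwaway directory with episode 2 missing.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    chunk = root / "data" / "chunk-000"
    chunk.mkdir(parents=True)
    for i in (0, 1, 3):
        (chunk / f"episode_{i:06d}.parquet").touch()
    demo_episodes = list_v2_episodes(root)
    demo_gaps = find_gaps(demo_episodes)

print(sorted(demo_episodes))  # [0, 1, 3]
print(demo_gaps)  # [2]
```

This per-file scan is cheap for a few hundred episodes but grows with the file count, which is the enumeration cost the summary table attributes to v2.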
v3
- `meta/info.json`: `codebase_version` is `v3.0`; optional `splits` and path templates
- `meta/stats.json` (optional): statistics for normalization
- `meta/episodes/`: chunked Parquet; each row is one episode and its location in `data/` and `videos/`
- `meta/tasks.jsonl` or `meta/tasks.parquet`: tasks
- `data/chunk-{chunk}/file-{file}.parquet`: multiple episodes in one file
- `videos/{featureKey}/chunk-{chunk}/file-{file}.mp4`: multiple episodes in one video; boundaries are given by timestamps in the metadata
meta/
  info.json
  stats.json        # optional
  episodes/
    chunk-000/
      file-000.parquet
  tasks.jsonl       # or tasks.parquet
data/
  chunk-000/
    file-000.parquet
videos/
  observation.images.front/
    chunk-000/
      file-000.mp4
Metadata Differences
Episode:
- v2: Each line in `episodes.jsonl` has `episode_index`, `length`, `tasks`, and an optional `task_index`. Paths are computed from `episode_index` and `chunks_size`; they are not stored in the metadata.
- v3: Each Parquet row adds `dataset_from_index` and `dataset_to_index` (the episode's row range in the data table), `data/chunk_index` and `data/file_index`, and, for each video key, `videos/{key}/chunk_index`, `videos/{key}/file_index`, `videos/{key}/from_timestamp`, and `videos/{key}/to_timestamp`. Readers use these fields to locate the right file and range.
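The v3 lookup described above can be sketched as follows: one metadata row, shown here as a plain dict, is turned into the data file plus row range and the video file plus time range. The class and function names are hypothetical. The sketch also assumes that `dataset_to_index` is exclusive and that each data row carries a global `index` column.

```python
from dataclasses import dataclass


@dataclass
class EpisodeLocation:
    data_file: tuple   # (chunk_index, file_index) of the data Parquet file
    rows: range        # global row range; dataset_to_index treated as exclusive
    video_file: tuple  # (chunk_index, file_index) of the MP4 for one video key
    clip: tuple        # (from_timestamp, to_timestamp) in seconds within that MP4


def locate_episode(meta_row: dict, video_key: str) -> EpisodeLocation:
    """Turn one meta/episodes/*.parquet row (given here as a dict) into file + range info."""
    v = f"videos/{video_key}/"
    return EpisodeLocation(
        data_file=(meta_row["data/chunk_index"], meta_row["data/file_index"]),
        rows=range(meta_row["dataset_from_index"], meta_row["dataset_to_index"]),
        video_file=(meta_row[v + "chunk_index"], meta_row[v + "file_index"]),
        clip=(meta_row[v + "from_timestamp"], meta_row[v + "to_timestamp"]),
    )


def episode_frames(table_rows: list, loc: EpisodeLocation) -> list:
    """Keep only the rows of a multi-episode data file that fall in the episode's range."""
    return [row for row in table_rows if row["index"] in loc.rows]


# Hypothetical data file holding two episodes (3 + 2 frames at 10 fps).
rows = [{"index": i, "episode_index": 0 if i < 3 else 1} for i in range(5)]
meta_ep1 = {
    "episode_index": 1,
    "dataset_from_index": 3,
    "dataset_to_index": 5,
    "data/chunk_index": 0,
    "data/file_index": 0,
    "videos/observation.images.front/chunk_index": 0,
    "videos/observation.images.front/file_index": 0,
    "videos/observation.images.front/from_timestamp": 0.3,
    "videos/observation.images.front/to_timestamp": 0.5,
}
loc = locate_episode(meta_ep1, "observation.images.front")
print([r["index"] for r in episode_frames(rows, loc)])  # [3, 4]
print(loc.clip)  # (0.3, 0.5)
```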
info.json:
- v2: Expected to include `codebase_version` (`v2.0` or `v2.1`) and `chunks_size`.
- v3: `codebase_version` is `v3.0`; may include `splits`, `data_path`, and `video_path`. If the paths differ from the default template, validators may warn, but readers can still adapt.
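The `info.json` rules above, together with the `chunks_size` fallback noted under Validation and Compatibility, can be expressed as a small checker. This is a sketch under stated assumptions. The default `chunks_size` of 1000 and the exact v3 templates are assumed values, and `check_info` / `chunks_size` are hypothetical names, not an official validator.

```python
import json
import warnings

# Assumed defaults for this sketch; a real reader's values may differ.
DEFAULT_CHUNKS_SIZE = 1000
V3_DEFAULT_PATHS = {
    "data_path": "data/chunk-{chunk_index:03d}/file-{file_index:03d}.parquet",
    "video_path": "videos/{video_key}/chunk-{chunk_index:03d}/file-{file_index:03d}.mp4",
}


def check_info(info: dict) -> str:
    """Return "v2" or "v3" from codebase_version, warning on soft issues."""
    version = str(info.get("codebase_version", ""))
    if version in ("v2.0", "v2.1"):
        if "chunks_size" not in info:
            warnings.warn(f"chunks_size missing; falling back to {DEFAULT_CHUNKS_SIZE}")
        return "v2"
    if version == "v3.0":
        for key, default in V3_DEFAULT_PATHS.items():
            if key in info and info[key] != default:
                warnings.warn(f"{key} differs from the default template: {info[key]!r}")
        return "v3"
    raise ValueError(f"unsupported codebase_version: {version!r}")


def chunks_size(info: dict) -> int:
    """chunks_size with the same fallback that check_info warns about."""
    return int(info.get("chunks_size", DEFAULT_CHUNKS_SIZE))


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    info = json.loads('{"codebase_version": "v2.1"}')
    demo_version = check_info(info)
    demo_chunks = chunks_size(info)
demo_warnings = [str(w.message) for w in caught]
print(demo_version, demo_chunks, demo_warnings)
# v2 1000 ['chunks_size missing; falling back to 1000']
```

Warnings rather than exceptions match the tone of the rules above: a non-default path template or a missing `chunks_size` is recoverable, while an unknown `codebase_version` is not.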
Format Conversion
Official v2.1 → v3.0
Hugging Face provides a Python script that merges v2.1’s per-episode parquet/mp4 files into v3’s larger files and writes offsets and timestamps in meta/episodes/*. Best for datasets already on the Hub.
- Docs and command: Official docs – Migrate v2.1 → v3.0
- Install `lerobot` from main or a pre-release, then run:
python -m lerobot.datasets.v30.convert_dataset_v21_to_v30 --repo-id=...
Two-way v2.1 ↔ v3.0 with LeRobot Studio
LeRobot Studio supports opening, previewing, and exporting both v2 and v3 in the browser. Choose the target version (v2.1 or v3.0) on export to convert locally—no Python or Hub needed.
- Data: v2→v3 merges the per-episode Parquet files in each chunk into `file-*.parquet`; v3→v2 slices by row range into one Parquet file per episode.
- Video: v2→v2 copies videos as-is; v2→v3 and v3→v3 transcode into shared MP4 files, with episode boundaries recorded as timestamps; v3→v2 cuts by time range into one MP4 per episode.
- Use when: You have local or private data and prefer a UI to the CLI, or you export data from the IO-AI platform and want to verify it in Studio.
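The data and video bullets above describe a merge/split round trip. Here is a simplified sketch of that idea on in-memory rows. It is not LeRobot Studio's or the official script's implementation. It assumes one frame per row at a fixed `fps` and a single shared video that starts at t=0, whereas real v3 datasets may spread episodes across several files.

```python
def merge_episodes(episodes: list, fps: float) -> tuple:
    """v2 -> v3 sketch: concatenate per-episode frame lists into one shared table.

    Returns (merged_rows, episode_meta). Each meta entry records the episode's
    row range and, assuming one frame per row in a single shared video that
    starts at t=0, its time range in that video.
    """
    merged, meta = [], []
    for ep_idx, frames in enumerate(episodes):
        start = len(merged)
        for frame_idx, frame in enumerate(frames):
            merged.append({**frame, "episode_index": ep_idx,
                           "frame_index": frame_idx, "index": start + frame_idx})
        end = len(merged)
        meta.append({
            "episode_index": ep_idx,
            "length": end - start,
            "dataset_from_index": start,
            "dataset_to_index": end,        # exclusive
            "from_timestamp": start / fps,  # derived from frame counts to avoid float drift
            "to_timestamp": end / fps,
        })
    return merged, meta


def split_episodes(merged: list, meta: list) -> list:
    """v3 -> v2 sketch: slice the shared table back into one row list per episode."""
    return [merged[m["dataset_from_index"]:m["dataset_to_index"]] for m in meta]


per_episode = [[{"x": 1}, {"x": 2}, {"x": 3}], [{"x": 4}, {"x": 5}]]
merged, meta = merge_episodes(per_episode, fps=10)
print([(m["dataset_from_index"], m["dataset_to_index"]) for m in meta])  # [(0, 3), (3, 5)]
print([(m["from_timestamp"], m["to_timestamp"]) for m in meta])  # [(0.0, 0.3), (0.3, 0.5)]
print([[f["x"] for f in ep] for ep in split_episodes(merged, meta)])  # [[1, 2, 3], [4, 5]]
```

Computing timestamps as `frame_count / fps` instead of summing per-episode durations keeps episode boundaries exact even after many episodes.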
For both viewing data and changing formats, LeRobot Studio is the simplest option: datasets load locally by drag-and-drop with no upload, and you choose v2.1 or v3.0 on export.
Validation and Compatibility
Required files:
- v2: `meta/info.json` (`codebase_version` starts with `v2`), `meta/episodes.jsonl`, and `meta/tasks.jsonl`. A missing `chunks_size` falls back to a default and triggers a warning.
- v3: `meta/info.json` (`codebase_version` starts with `v3`) and at least one Parquet file under `meta/episodes/`. Tasks can be in `tasks.jsonl` or `tasks.parquet`.
To self-check, read codebase_version in meta/info.json and ensure the directory and files match the table above.
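A self-check along those lines might look like this minimal sketch. It covers only the required files listed above; `missing_required_files` and `make_dataset` are hypothetical helpers, and the demo builds throwaway skeletons with empty files rather than real data.

```python
import json
import tempfile
from pathlib import Path


def missing_required_files(root: Path) -> list:
    """List required entries absent under root, keyed on codebase_version."""
    info_path = root / "meta" / "info.json"
    if not info_path.is_file():
        return ["meta/info.json"]
    version = str(json.loads(info_path.read_text()).get("codebase_version", ""))
    if version.startswith("v2"):
        return [rel for rel in ("meta/episodes.jsonl", "meta/tasks.jsonl")
                if not (root / rel).is_file()]
    if version.startswith("v3"):
        episodes_dir = root / "meta" / "episodes"
        has_parquet = episodes_dir.is_dir() and any(episodes_dir.rglob("*.parquet"))
        return [] if has_parquet else ["meta/episodes/**/*.parquet"]
    return [f"meta/info.json: unrecognized codebase_version {version!r}"]


def make_dataset(root: Path, version: str, files: list) -> None:
    """Create a throwaway dataset skeleton with the given info.json version and empty files."""
    (root / "meta").mkdir(parents=True)
    (root / "meta" / "info.json").write_text(json.dumps({"codebase_version": version}))
    for rel in files:
        (root / rel).parent.mkdir(parents=True, exist_ok=True)
        (root / rel).touch()


with tempfile.TemporaryDirectory() as tmp:
    v2_root, v3_root = Path(tmp) / "v2", Path(tmp) / "v3"
    make_dataset(v2_root, "v2.1", ["meta/episodes.jsonl"])  # tasks.jsonl left out
    make_dataset(v3_root, "v3.0", ["meta/episodes/chunk-000/file-000.parquet"])
    demo_v2_missing = missing_required_files(v2_root)
    demo_v3_missing = missing_required_files(v3_root)

print(demo_v2_missing)  # ['meta/tasks.jsonl']
print(demo_v3_missing)  # []
```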
Training: v3 supports streaming from the Hub and suits large-scale training; v2.1 is still used by many frameworks (e.g. Pi0/OpenPI in this repo). Choose v2.1 or v3.0 according to what your training framework or platform expects.
Summary
- Choose v2: When you need compatibility with v2-only pipelines, or your dataset is small and file count is not a concern; structure is simple and easy to inspect.
- Choose v3: When you want the latest LeRobot training stack, or large data with streaming, fewer files, and cloud-friendly training.
For local v2/v3 preview and conversion, use LeRobot Studio.
See also: LeRobot Dataset v3.0 (official), LeRobot Studio (this section), LeRobot dataset.