LeRobot v2 vs v3 Format Differences

LeRobot currently has two formats: v2.0/v2.1 and v3.0. Understanding the differences helps with format selection, migration, and training pipeline integration.

For format conversion or visual verification on local data, you can use LeRobot Studio: open a v2 or v3 dataset, then choose the target version (v2.1 or v3.0) on export to convert in either direction—no Python or Hugging Face CLI required.

Summary of Differences

| Aspect | v2.0 / v2.1 | v3.0 |
| --- | --- | --- |
| Storage | One file per episode | Multiple episodes in fewer, larger files; locations defined in metadata |
| Tabular data | data/chunk-XXX/episode_YYYYYY.parquet | data/chunk-XXX/file-YYY.parquet (read by row range) |
| Video | videos/chunk-XXX/{key}/episode_YYYYYY.mp4 | videos/{key}/chunk-XXX/file-YYY.mp4 (read by timestamp range) |
| Episode metadata | meta/episodes.jsonl (one JSON per line) | Chunked Parquet under meta/episodes/ |
| Tasks | meta/tasks.jsonl | meta/tasks.jsonl or meta/tasks.parquet |
| Paths | Derived from chunk and episode indices | Path templates configurable in info.json |
| Scalability | Many files; slower enumeration | Hub streaming (StreamingLeRobotDataset); better for large datasets |

In v3, data is stored in fewer, larger files; readers use the row indices and timestamps recorded in the metadata to reconstruct a per-episode view. This reduces the file count and is better suited to streaming from cloud storage.

Directory Layout

v2

  • meta/info.json: includes codebase_version (v2.0 or v2.1) and chunks_size
  • meta/episodes.jsonl: one JSON per line per episode
  • meta/tasks.jsonl: task indices and descriptions
  • data/chunk-{chunk}/episode_{episode}.parquet: one Parquet per episode
  • videos/chunk-{chunk}/{featureKey}/episode_{episode}.mp4: one MP4 per episode per feature

meta/
  info.json
  episodes.jsonl
  tasks.jsonl
data/
  chunk-000/
    episode_000000.parquet
    episode_000001.parquet
    ...
videos/
  chunk-000/
    observation.images.front/
      episode_000000.mp4
      episode_000001.mp4
      ...
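
Because v2 does not store paths in metadata, a reader has to derive them. The following is a minimal sketch of that derivation, not the library's own code. It assumes that an episode's chunk is `episode_index // chunks_size` and that indices are zero-padded to 3 and 6 digits as in the listing above; `v2_episode_paths` is a hypothetical helper name.

```python
def v2_episode_paths(episode_index: int, chunks_size: int, video_keys: list) -> dict:
    """Derive the v2 file paths of one episode from its index alone."""
    # Assumption: episodes are assigned to chunks by integer division.
    chunk = episode_index // chunks_size
    paths = {"data": f"data/chunk-{chunk:03d}/episode_{episode_index:06d}.parquet"}
    for key in video_keys:
        # One MP4 per episode per video feature.
        paths[key] = f"videos/chunk-{chunk:03d}/{key}/episode_{episode_index:06d}.mp4"
    return paths


if __name__ == "__main__":
    for name, path in v2_episode_paths(1001, 1000, ["observation.images.front"]).items():
        print(name, "->", path)
```

With a `chunks_size` of 1000, episode 1001 would land in chunk-001 under this assumption.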

v3

  • meta/info.json: codebase_version is v3.0; optional splits and path templates
  • meta/stats.json (optional): statistics for normalization
  • meta/episodes/: chunked Parquet; each row is one episode and its location in data/videos
  • meta/tasks.jsonl or meta/tasks.parquet: tasks
  • data/chunk-{chunk}/file-{file}.parquet: multiple episodes in one file
  • videos/{featureKey}/chunk-{chunk}/file-{file}.mp4: multiple episodes in one video; boundaries given by timestamps in metadata

meta/
  info.json
  stats.json          # optional
  episodes/
    chunk-000/
      file-000.parquet
  tasks.jsonl         # or tasks.parquet
data/
  chunk-000/
    file-000.parquet
videos/
  observation.images.front/
    chunk-000/
      file-000.mp4
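
In v3 the file locations come from the episode metadata and the path templates rather than from the episode index. The sketch below shows one way to resolve them. The default templates and their placeholder names (`chunk_index`, `file_index`, `video_key`) are assumptions chosen to match the layout above, and `v3_episode_paths` is a hypothetical helper; check the templates in your own info.json before relying on them.

```python
from typing import Optional

# Assumed default templates, mirroring the directory layout shown above.
DEFAULT_DATA_PATH = "data/chunk-{chunk_index:03d}/file-{file_index:03d}.parquet"
DEFAULT_VIDEO_PATH = "videos/{video_key}/chunk-{chunk_index:03d}/file-{file_index:03d}.mp4"


def v3_episode_paths(episode_row: dict, video_keys: list, info: Optional[dict] = None) -> dict:
    """Resolve the shared files that hold one episode, from its metadata row."""
    info = info or {}
    # Templates in info.json, if present, take precedence over the assumed defaults.
    data_tpl = info.get("data_path", DEFAULT_DATA_PATH)
    video_tpl = info.get("video_path", DEFAULT_VIDEO_PATH)

    paths = {
        "data": data_tpl.format(
            chunk_index=episode_row["data/chunk_index"],
            file_index=episode_row["data/file_index"],
        )
    }
    for key in video_keys:
        paths[key] = video_tpl.format(
            video_key=key,
            chunk_index=episode_row[f"videos/{key}/chunk_index"],
            file_index=episode_row[f"videos/{key}/file_index"],
        )
    return paths
```

Several episodes will usually resolve to the same paths; the row range and timestamps in the metadata then tell the reader which part of each file belongs to the episode.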

Metadata Differences

Episode:

  • v2: Each line in episodes.jsonl has episode_index, length, tasks, and optional task_index. Paths are computed from episode_index and chunks_size; paths are not stored in metadata.
  • v3: Parquet rows add dataset_from_index, dataset_to_index (table row range), data/chunk_index, data/file_index, and per-video videos/{key}/chunk_index, file_index, from_timestamp, to_timestamp. Readers use these to locate the right file and range.
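
To illustrate how a v3 reader might use these fields, here is a small sketch that works on plain Python dictionaries instead of real Parquet rows. It assumes that `dataset_to_index` is exclusive (a half-open range) and that the table passed in is the whole dataset in order, so `dataset_from_index` can be used directly as a row offset. The helper names are hypothetical.

```python
def episode_row_range(episode_row: dict) -> range:
    """Rows of the tabular data that belong to this episode (assumed half-open)."""
    return range(episode_row["dataset_from_index"], episode_row["dataset_to_index"])


def episode_video_window(episode_row: dict, video_key: str) -> tuple:
    """Start and end time, in seconds, of this episode inside the shared MP4."""
    return (
        episode_row[f"videos/{video_key}/from_timestamp"],
        episode_row[f"videos/{video_key}/to_timestamp"],
    )


def to_shared_video_time(episode_row: dict, video_key: str, t_in_episode: float) -> float:
    """Map a time measured from the episode start to a time in the shared MP4."""
    start, end = episode_video_window(episode_row, video_key)
    t = start + t_in_episode
    if not (start <= t <= end):
        raise ValueError("timestamp falls outside this episode's video window")
    return t


if __name__ == "__main__":
    key = "observation.images.front"
    row = {
        "dataset_from_index": 4,
        "dataset_to_index": 10,
        f"videos/{key}/from_timestamp": 0.4,
        f"videos/{key}/to_timestamp": 1.0,
    }
    table = [{"index": i} for i in range(10)]  # stand-in for the Parquet table
    frames = [table[i] for i in episode_row_range(row)]
    print(len(frames), episode_video_window(row, key))
```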

info.json:

  • v2: Must have chunks_size and codebase_version (v2.0 or v2.1).
  • v3: codebase_version is v3.0; may include splits, data_path, video_path. If paths differ from the default template, validators may warn; readers can still adapt.

Format Conversion

Official v2.1 → v3.0

Hugging Face provides a Python script that merges v2.1's per-episode Parquet and MP4 files into v3's larger files and writes the row offsets and timestamps to meta/episodes/*. It is best suited to datasets already on the Hub.

Two-way v2.1 ↔ v3.0 with LeRobot Studio

LeRobot Studio supports opening, previewing, and exporting both v2 and v3 in the browser. Choose the target version (v2.1 or v3.0) on export to convert locally—no Python or Hub needed.

  • Data: v2→v3 merges episode Parquet files in the same chunk into file-*.parquet; v3→v2 slices by row range into one parquet per episode.
  • Video: v2→v2 copies as-is; v2→v3 / v3→v3 transcodes by timestamp into shared MP4; v3→v2 cuts by time into one MP4 per episode.
  • Use when: You have local or private data and prefer a UI over the CLI; or you export from the IO-AI platform and then verify with Studio.
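
The bookkeeping behind a v2→v3 merge can be sketched as follows. This is an illustration only, not the code of LeRobot Studio or of the official script. It assumes every episode in the list goes into the same file, that row ranges are half-open, and that video timestamps can be approximated as frame count divided by `fps`; a real converter would more likely use the actual encoded durations. `merge_offsets` and `split_rows` are hypothetical names.

```python
def merge_offsets(episode_lengths: list, fps: float, video_key: str) -> list:
    """Build v3-style episode rows for episodes concatenated into one file."""
    rows = []
    cursor = 0  # running frame count across the merged file
    for episode_index, length in enumerate(episode_lengths):
        start, stop = cursor, cursor + length
        rows.append({
            "episode_index": episode_index,
            "length": length,
            "dataset_from_index": start,
            "dataset_to_index": stop,  # exclusive, by assumption
            f"videos/{video_key}/from_timestamp": start / fps,
            f"videos/{video_key}/to_timestamp": stop / fps,
        })
        cursor = stop
    return rows


def split_rows(table: list, rows: list) -> list:
    """The v3→v2 direction for tabular data: slice the table back into episodes."""
    return [table[r["dataset_from_index"]:r["dataset_to_index"]] for r in rows]


if __name__ == "__main__":
    meta = merge_offsets([3, 2], fps=10, video_key="observation.images.front")
    print(meta[1]["dataset_from_index"], meta[1]["dataset_to_index"])
    print([len(ep) for ep in split_rows(list(range(5)), meta)])
```

Keeping the ranges half-open means each episode's `dataset_to_index` equals the next episode's `dataset_from_index`, which makes the round trip back to per-episode files a plain slice.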
Recommended: For both viewing data and changing format, LeRobot Studio is the easiest option. Drag and drop a local dataset to load it (nothing is uploaded), then choose v2.1 or v3.0 on export.

Validation and Compatibility

Required files:

  • v2: meta/info.json (codebase_version starts with v2), meta/episodes.jsonl, meta/tasks.jsonl. If chunks_size is missing, a default is used and a warning is raised.
  • v3: meta/info.json (starts with v3), at least one Parquet under meta/episodes/. Tasks can be in tasks.jsonl or tasks.parquet.

To self-check, read codebase_version in meta/info.json and ensure the directory and files match the table above.
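
The self-check can be scripted with the standard library alone. The sketch below follows the required-files list above; `check_dataset` is a hypothetical helper, not part of LeRobot, and treating a missing tasks file in v3 as a problem is an assumption drawn from that list.

```python
import json
from pathlib import Path


def check_dataset(root) -> tuple:
    """Return (codebase_version, list of problems) for a local dataset folder."""
    root = Path(root)
    meta = root / "meta"
    info_path = meta / "info.json"
    if not info_path.is_file():
        return None, ["missing meta/info.json"]

    info = json.loads(info_path.read_text(encoding="utf-8"))
    version = str(info.get("codebase_version", ""))
    problems = []

    if version.startswith("v2"):
        for name in ("episodes.jsonl", "tasks.jsonl"):
            if not (meta / name).is_file():
                problems.append(f"missing meta/{name}")
        if "chunks_size" not in info:
            problems.append("warning: chunks_size missing, a default will be used")
    elif version.startswith("v3"):
        episodes_dir = meta / "episodes"
        if not (episodes_dir.is_dir() and any(episodes_dir.rglob("*.parquet"))):
            problems.append("no Parquet file under meta/episodes/")
        # Assumption: one of the two task files must be present.
        if not any((meta / n).is_file() for n in ("tasks.jsonl", "tasks.parquet")):
            problems.append("missing meta/tasks.jsonl or meta/tasks.parquet")
    else:
        problems.append(f"unrecognized codebase_version: {version!r}")
    return version, problems
```

An empty problem list only means the expected files exist; it does not validate their contents.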

Training: v3 supports streaming from the Hub and suits large-scale datasets; v2.1 is still used by many frameworks (e.g. Pi0/OpenPI in this repo). Choose v2.1 or v3.0 according to what your training framework or platform expects.

Summary

  • Choose v2: When you need compatibility with v2-only pipelines, or your dataset is small and file count is not a concern; structure is simple and easy to inspect.
  • Choose v3: When you want the latest LeRobot training stack, or large data with streaming, fewer files, and cloud-friendly training.

For local v2/v3 preview and conversion, use LeRobot Studio.

See also: LeRobot Dataset v3.0 (official), LeRobot Studio (this section), LeRobot dataset.