HDF5 Dataset

HDF5 (Hierarchical Data Format version 5) is an efficient, flexible storage format widely used in embodied AI. Its hierarchy of groups and datasets makes it easy to organize multimodal data, and files offer fast I/O and can be shared across platforms.

Import

Device vendors may use different folder/naming conventions. The platform supports common external collectors (e.g., SenseTime Piper). If your schema isn’t supported yet, contact us with the structure and we’ll adapt quickly.

Export

You can export annotated mcap, bag, or hdf5 inputs as HDF5 for ML training. Annotation associates recorded actions with natural‑language instructions so that vision‑language‑action (VLA) models learn to follow language.

See: Annotation

After annotation, select the subset of data you want to export.

Select to export

  • Group size: number of raw files written into each HDF5 file (set to 1 for a one‑to‑one mapping)
  • Refresh rate: sampling frequency in Hz (samples per second); higher rates produce larger files
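
To get a feel for how the two settings affect the output, here is a minimal sketch. It assumes the exporter groups raw files in order and keeps roughly one frame per sampling interval; both are assumptions about the exporter's behavior, and the function names plan_chunks and frames_kept are hypothetical, not part of the platform.

```python
def plan_chunks(raw_files, group_size):
    """Split raw files into consecutive groups, one group per output HDF5 file.

    Assumption: files are grouped in the order given.
    """
    if group_size < 1:
        raise ValueError("group_size must be >= 1")
    return [raw_files[i:i + group_size] for i in range(0, len(raw_files), group_size)]


def frames_kept(duration_s, refresh_hz):
    """Approximate number of frames kept for a recording of duration_s seconds."""
    return int(duration_s * refresh_hz)


raw_files = ["a.mcap", "b.mcap", "c.mcap", "d.mcap", "e.mcap"]
chunks = plan_chunks(raw_files, group_size=2)
print(len(chunks), "output files:", chunks)

# Doubling the refresh rate roughly doubles the number of stored frames.
print(frames_kept(60, 10), "frames at 10 Hz,", frames_kept(60, 20), "frames at 20 Hz")
```

With a group size of 1, every raw file maps to its own HDF5 file.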

After export, view results on the page:

Export success

Downloaded data:

Export result

Structure

Exported files are grouped (e.g., chunk_001.hdf5) and follow a tree layout:

  • Root /
  • Subgroups like /data, /meta
    • /data contains subgroups per episode (episode_001, episode_002, ...)
  • Each /data/episode_xxx group contains:
    • Attributes
      • task (English instruction)
      • task_zh (Chinese instruction)
      • score (quality score)
    • Datasets
      • action (joint commands, N-dimensional array)
      • observation.images.* (compressed images, one dataset per camera)
      • observation.state (sensor states)
      • observation.gripper (gripper state)
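
The layout above can be reproduced on a small scale with h5py, which is a convenient way to test a loader without a real export. The sketch below writes one synthetic episode and reads its attributes back. The array shapes, dtypes, instruction text, and the file name chunk_demo.hdf5 are assumptions made for illustration; real exports may differ, and the image dataset here is only a placeholder byte array because the image encoding is not specified.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "chunk_demo.hdf5")

# Write one synthetic episode following the documented tree.
with h5py.File(path, "w") as f:
    f.create_group("meta")
    ep = f.create_group("data/episode_001")  # intermediate group /data is created too

    # Per-episode attributes (values are made up).
    ep.attrs["task"] = "pick up the cup"
    ep.attrs["task_zh"] = "拿起杯子"
    ep.attrs["score"] = 0.9

    # Per-episode datasets (shapes are assumptions: 5 steps, 7 joints).
    ep.create_dataset("action", data=np.zeros((5, 7), dtype=np.float32))
    ep.create_dataset("observation.state", data=np.zeros((5, 7), dtype=np.float32))
    ep.create_dataset("observation.gripper", data=np.zeros((5, 1), dtype=np.float32))
    # Placeholder bytes standing in for compressed image data.
    ep.create_dataset(
        "observation.images.camera_01",
        data=np.frombuffer(b"placeholder-image-bytes", dtype=np.uint8),
    )

# Read the attributes and dataset names back.
with h5py.File(path, "r") as f:
    ep = f["/data/episode_001"]
    task = ep.attrs["task"]
    score = float(ep.attrs["score"])
    names = sorted(ep.keys())
    action = ep["action"][:]

print(task, score, names, action.shape)
```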

Example:

HDF5 "./chunk_001.hdf5" {
FILE_CONTENTS {
group /
group /data
group /data/episode_001
dataset /data/episode_001/action
dataset /data/episode_001/observation.gripper
dataset /data/episode_001/observation.images.camera_01
dataset /data/episode_001/observation.images.camera_02
dataset /data/episode_001/observation.images.camera_03
dataset /data/episode_001/observation.images.camera_04
dataset /data/episode_001/observation.state
group /data/episode_002
dataset /data/episode_002/action
dataset /data/episode_002/observation.gripper
dataset /data/episode_002/observation.images.camera_01
dataset /data/episode_002/observation.images.camera_02
dataset /data/episode_002/observation.images.camera_03
dataset /data/episode_002/observation.images.camera_04
dataset /data/episode_002/observation.state
......
group /meta
}
}
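
A listing like the one above can also be produced from Python. The following sketch uses h5py's visititems to walk every group and dataset in a file. It builds a tiny synthetic file first so that it runs on its own; the file name and contents are assumptions, and list_objects is a hypothetical helper.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a small stand-in file (two episodes with only an action dataset).
path = os.path.join(tempfile.mkdtemp(), "chunk_demo.hdf5")
with h5py.File(path, "w") as f:
    f.create_group("meta")
    for name in ("episode_001", "episode_002"):
        f.create_dataset(f"data/{name}/action", data=np.zeros((3, 7), dtype=np.float32))


def list_objects(h5file):
    """Return (kind, absolute_path) for every group and dataset below the root."""
    entries = []

    def visit(name, obj):
        kind = "group" if isinstance(obj, h5py.Group) else "dataset"
        entries.append((kind, "/" + name))

    h5file.visititems(visit)  # the root group itself is not visited
    return entries


with h5py.File(path, "r") as f:
    entries = list_objects(f)

for kind, obj_path in entries:
    print(kind, obj_path)
```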

Read example

Using Python h5py:

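import h5py

# Open the exported chunk read-only; the file is closed when the block exits.
with h5py.File('chunk_001.hdf5', 'r') as f:
    # Top-level groups, e.g. 'data' and 'meta'
    print('top:', list(f.keys()))
    episode_001 = f['/data/episode_001']
    print('episode_001 datasets:', list(episode_001.keys()))
    # [:] reads the whole dataset into memory as a NumPy array
    action = episode_001['action'][:]
    print('action:', action)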

When to use HDF5

  • Scalable multimodal storage (images, sensors, etc.)
  • Built‑in compression
  • Cross‑platform sharing/migration
  • Flexible hierarchy for complex tasks
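
To illustrate the built‑in compression point, the sketch below writes the same array twice with h5py, once uncompressed and once with the gzip filter, and compares file sizes. The data is synthetic and highly compressible, so the saving shown here is not representative of real sensor streams; the dataset path and the helper write_file are assumptions for the example.

```python
import os
import tempfile

import h5py
import numpy as np

tmp = tempfile.mkdtemp()
data = np.zeros((1000, 64), dtype=np.float32)  # synthetic, compresses very well


def write_file(path, **dataset_kwargs):
    """Write `data` to one dataset and return the resulting file size in bytes."""
    with h5py.File(path, "w") as f:
        f.create_dataset("data/episode_001/observation.state", data=data, **dataset_kwargs)
    return os.path.getsize(path)


raw_size = write_file(os.path.join(tmp, "raw.hdf5"))
gzip_path = os.path.join(tmp, "gzip.hdf5")
gzip_size = write_file(gzip_path, compression="gzip", compression_opts=4)

print("uncompressed:", raw_size, "bytes; gzip:", gzip_size, "bytes")

# Decompression is transparent on read: no extra arguments are needed.
with h5py.File(gzip_path, "r") as f:
    restored = f["data/episode_001/observation.state"][:]
```

Compression is applied per dataset, so numeric streams and already compressed image data can use different settings within one file.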

Robot training

Exported HDF5 files can be used to train imitation learning, reinforcement learning (RL), and VLA models.
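
As a starting point for a training pipeline, the sketch below iterates over episodes and yields (instruction, state, action) samples, skipping episodes below a quality threshold. It assumes that observation.state and action have one row per time step and are aligned with each other, which the export format does not guarantee here. The helper iter_samples, the threshold, and the synthetic file are all hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

# Synthetic stand-in for an exported chunk: two episodes with different scores.
path = os.path.join(tempfile.mkdtemp(), "chunk_demo.hdf5")
with h5py.File(path, "w") as f:
    for name, score, steps in (("episode_001", 0.9, 3), ("episode_002", 0.3, 4)):
        ep = f.create_group(f"data/{name}")
        ep.attrs["task"] = "pick up the cup"
        ep.attrs["score"] = score
        ep.create_dataset("observation.state", data=np.zeros((steps, 7), dtype=np.float32))
        ep.create_dataset("action", data=np.ones((steps, 7), dtype=np.float32))


def iter_samples(h5_path, min_score=0.0):
    """Yield (task, state, action) per time step for episodes with score >= min_score."""
    with h5py.File(h5_path, "r") as f:
        for _name, ep in f["data"].items():
            if float(ep.attrs.get("score", 0.0)) < min_score:
                continue
            task = ep.attrs["task"]
            states = ep["observation.state"][:]
            actions = ep["action"][:]
            # Assumption: row i of states corresponds to row i of actions.
            for state, action in zip(states, actions):
                yield task, state, action


samples = list(iter_samples(path, min_score=0.5))
print(len(samples), "samples kept")
```

A generator like this can be wrapped in the dataset class of whichever training framework you use.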

See training details: HDF5 for robot training