
HDF5 Dataset

HDF5 (Hierarchical Data Format version 5) is an efficient and flexible data storage format widely used in embodied intelligence. Its hierarchical structure (groups and datasets) makes it easy to organize and manage complex multimodal data, supporting efficient data reading/writing and cross-platform sharing.

HDF5 Data Import

In the embodied intelligence domain, the structure and naming conventions of HDF5 files vary by device manufacturer. The platform is compatible with mainstream external data acquisition systems (such as Songling Piper), allowing direct import and visualization of relevant HDF5 files.

If your HDF5 data is not yet supported, please contact us with your data structure details. We will quickly adapt our platform to enable seamless visualization, annotation, and export of your multimodal data.

HDF5 Data Export

The platform can export annotated data that was originally recorded in formats such as mcap, bag, and hdf5 as HDF5 files for subsequent machine learning model training. During annotation, actions are linked with natural language instructions, so the exported data can be used to train VLA (vision-language-action) models to understand and execute language commands.

For annotation operations, see: Data Annotation

After annotation, you can select the required data subset for export in the export interface.

Select data to export

  • Group Count: Set the number of original files included in each HDF5 file. For one-to-one correspondence, set to 1.
  • Data Refresh Rate: Sets how many samples are written per second of recording. Higher rates produce larger files.
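The effect of the two settings can be estimated with simple arithmetic. The sketch below is illustrative only: `plan_chunks` and `estimate_samples` are hypothetical helpers, and it assumes that source files are packed into output files in order, that output numbering starts at 1 in the chunk_001.hdf5 style, and that the sample count scales linearly with the refresh rate. The platform's actual grouping logic may differ.

```python
import math


def plan_chunks(num_source_files, group_count):
    """Group source-file indices into consecutive chunks of `group_count`.

    Assumption: files are packed in order and the last chunk may be smaller.
    """
    if group_count < 1:
        raise ValueError("group_count must be at least 1")
    indices = list(range(1, num_source_files + 1))
    chunks = [indices[i:i + group_count]
              for i in range(0, num_source_files, group_count)]
    # Output names follow the chunk_001.hdf5 pattern (start index assumed).
    return {f"chunk_{n:03d}.hdf5": chunk
            for n, chunk in enumerate(chunks, start=1)}


def estimate_samples(duration_s, refresh_rate_hz):
    """Rough number of samples for one recording at a given refresh rate."""
    return math.ceil(duration_s * refresh_rate_hz)


plan = plan_chunks(num_source_files=5, group_count=2)
print(plan)

# A 60-second recording at 30 Hz versus 10 Hz
print(estimate_samples(60, 30), estimate_samples(60, 10))
```

With a Group Count of 1, `plan_chunks` yields one output file per source file, which matches the one-to-one case described above.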

After export, you can view the results in the interface:

View export results

Download the data and read it

Exported Data Structure Description

Exported HDF5 files are named by original file groups (e.g., chunk_001.hdf5) and organized in a tree structure:

  • Root Group (/): Top-level directory.
  • Subgroups: Such as /data, /meta.
    • Under /data, subgroups are divided by annotation task sequence (e.g., episode_001, episode_002).
  • Episode groups: Such as /data/episode_001. Each episode group carries attributes and contains datasets.
    • Attributes include:
      • task: Annotated natural language (English)
      • task_zh: Annotated natural language (Chinese)
      • score: Action quality score
    • Stored data includes:
      • action: Issued joint commands (multi-dimensional array)
      • observation.images.*: Compressed images from various viewpoints (JPEG)
      • observation.state: Sensor observations (multi-dimensional array)
      • observation.gripper: Gripper closure state observations (multi-dimensional array)
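The attributes and datasets listed above can be inspected with h5py. The following is a minimal sketch, not the platform's own tooling: it first writes a small stand-in file that mimics the documented layout (the array shapes, dtypes, and attribute values are invented for illustration), then reads it back. `describe_episode` is a hypothetical helper name. Checking shapes and dtypes this way is also a reasonable first step for the observation.images.* datasets, since the exact storage layout of the JPEG data is not specified here.

```python
import os
import tempfile

import h5py
import numpy as np

# Stand-in file mimicking the documented layout (all values are made up).
demo_path = os.path.join(tempfile.mkdtemp(), "chunk_demo.hdf5")
with h5py.File(demo_path, "w") as f:
    ep = f.create_group("/data/episode_001")
    ep.attrs["task"] = "pick up the cup"
    ep.attrs["task_zh"] = "拿起杯子"
    ep.attrs["score"] = 0.9
    ep.create_dataset("action", data=np.zeros((5, 7), dtype=np.float32))
    ep.create_dataset("observation.state", data=np.zeros((5, 7), dtype=np.float32))
    ep.create_dataset("observation.gripper", data=np.zeros((5, 1), dtype=np.float32))
    f.create_group("meta")


def describe_episode(path, episode="episode_001"):
    """Return the annotation attributes and dataset shapes of one episode."""
    with h5py.File(path, "r") as f:
        group = f["data"][episode]
        # Only collect the attributes that are actually present.
        attrs = {key: group.attrs[key]
                 for key in ("task", "task_zh", "score")
                 if key in group.attrs}
        # Shape and dtype are metadata, so no array data is loaded here.
        shapes = {name: (obj.shape, str(obj.dtype))
                  for name, obj in group.items()
                  if isinstance(obj, h5py.Dataset)}
    return attrs, shapes


attrs, shapes = describe_episode(demo_path)
print(attrs)
print(shapes)
```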

Example structure:

HDF5 "./chunk_001.hdf5" {
FILE_CONTENTS {
group /
group /data
group /data/episode_001
dataset /data/episode_001/action
dataset /data/episode_001/observation.gripper
dataset /data/episode_001/observation.images.camera_01
dataset /data/episode_001/observation.images.camera_02
dataset /data/episode_001/observation.images.camera_03
dataset /data/episode_001/observation.images.camera_04
dataset /data/episode_001/observation.state
group /data/episode_002
dataset /data/episode_002/action
dataset /data/episode_002/observation.gripper
dataset /data/episode_002/observation.images.camera_01
dataset /data/episode_002/observation.images.camera_02
dataset /data/episode_002/observation.images.camera_03
dataset /data/episode_002/observation.images.camera_04
dataset /data/episode_002/observation.state
......
group /meta
}
}
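A listing like the one above resembles the output of the `h5dump -n` command-line tool. A similar listing can be produced in Python with h5py's `visititems`, which walks every group and dataset in a file. The sketch below assumes nothing about the exported files beyond the group and dataset layout; it builds a small stand-in file for demonstration, and `list_tree` is a hypothetical helper name.

```python
import os
import tempfile

import h5py
import numpy as np


def list_tree(path):
    """Return 'group <path>' / 'dataset <path>' lines for a whole file."""
    # visititems does not report the root group, so add it explicitly.
    lines = ["group /"]

    def visitor(name, obj):
        kind = "dataset" if isinstance(obj, h5py.Dataset) else "group"
        lines.append(f"{kind} /{name}")
        # Returning None tells h5py to continue the traversal.

    with h5py.File(path, "r") as f:
        f.visititems(visitor)
    return lines


# Stand-in file for demonstration (contents are made up).
demo_path = os.path.join(tempfile.mkdtemp(), "chunk_demo.hdf5")
with h5py.File(demo_path, "w") as f:
    ep = f.create_group("/data/episode_001")
    ep.create_dataset("action", data=np.zeros((3, 7), dtype=np.float32))
    ep.create_dataset("observation.state", data=np.zeros((3, 7), dtype=np.float32))
    f.create_group("meta")

tree = list_tree(demo_path)
print("\n".join(tree))
```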

HDF5 File Reading Example

It is recommended to use the Python h5py library to read and manipulate HDF5 files. Basic usage is as follows:

import h5py

# Open the HDF5 file in read-only mode
with h5py.File('chunk_001.hdf5', 'r') as f:
    # View top-level groups
    print("Top-level groups:", list(f.keys()))

    # Access the datasets under the /data/episode_001 group
    episode_001 = f['/data/episode_001']
    print("Datasets under episode_001:", list(episode_001.keys()))

    # Read the action dataset into memory as a NumPy array
    action_data = episode_001['action'][:]
    print("Action data:", action_data)
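For model training it is common to loop over every episode in a file rather than read a single one. The sketch below is one possible approach under stated assumptions: it treats `score` as a number where higher is better and uses it as a quality filter, which the source does not confirm. `load_training_pairs` and the `min_score` parameter are hypothetical, and the demo file contents are invented.

```python
import os
import tempfile

import h5py
import numpy as np


def load_training_pairs(path, min_score=0.0, max_steps=None):
    """Collect (episode name, instruction, actions) for episodes above a score."""
    pairs = []
    with h5py.File(path, "r") as f:
        for name, episode in f["data"].items():
            # Assumption: score is numeric and higher means better quality.
            score = float(episode.attrs.get("score", 0.0))
            if score < min_score:
                continue
            instruction = episode.attrs.get("task", "")
            # Slicing a dataset reads only the requested rows from disk.
            actions = episode["action"][:max_steps]
            pairs.append((name, instruction, actions))
    return pairs


# Stand-in file with two episodes of different quality (values are made up).
demo_path = os.path.join(tempfile.mkdtemp(), "chunk_demo.hdf5")
with h5py.File(demo_path, "w") as f:
    for idx, (task, score) in enumerate(
            [("pick up the cup", 0.9), ("open the drawer", 0.3)], start=1):
        ep = f.create_group(f"/data/episode_{idx:03d}")
        ep.attrs["task"] = task
        ep.attrs["score"] = score
        ep.create_dataset("action", data=np.ones((20, 7), dtype=np.float32))

pairs = load_training_pairs(demo_path, min_score=0.5, max_steps=10)
for name, instruction, actions in pairs:
    print(name, instruction, actions.shape)
```

Passing `max_steps=None` reads each action array in full. A smaller value limits memory use when only a preview is needed.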

Application Scenarios and Advantages

The HDF5 format offers the following advantages in embodied intelligence:

  • Supports large-scale multimodal data storage (e.g., high-resolution images, sensor data)
  • Built-in data compression saves storage space
  • Cross-platform compatibility facilitates data sharing and migration
  • Flexible hierarchical structure suitable for complex tasks and diverse data management
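To illustrate the compression point, the sketch below writes the same array twice with h5py, once uncompressed and once with the built-in gzip filter, and compares the file sizes. The data is an all-zero array chosen because it compresses extremely well. Real sensor data will compress less, and JPEG images are already compressed, so a further gzip pass usually gains little on them. Whether the platform enables compression on export is not stated here.

```python
import os
import tempfile

import h5py
import numpy as np

# Highly compressible stand-in data: 200 blank 64x64 frames.
frames = np.zeros((200, 64, 64), dtype=np.uint8)

tmp_dir = tempfile.mkdtemp()
raw_path = os.path.join(tmp_dir, "raw.hdf5")
gzip_path = os.path.join(tmp_dir, "gzip.hdf5")

with h5py.File(raw_path, "w") as f:
    f.create_dataset("frames", data=frames)

with h5py.File(gzip_path, "w") as f:
    # compression_opts is the gzip level (0-9); 4 is h5py's default.
    f.create_dataset("frames", data=frames,
                     compression="gzip", compression_opts=4)

raw_size = os.path.getsize(raw_path)
gzip_size = os.path.getsize(gzip_path)
print(f"uncompressed: {raw_size} bytes, gzip: {gzip_size} bytes")

# Decompression is transparent: reading returns the original values.
with h5py.File(gzip_path, "r") as f:
    restored = f["frames"][:]
    used_filter = f["frames"].compression
print(used_filter, np.array_equal(restored, frames))
```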

With a well-designed HDF5 structure and the platform's tools, you can efficiently manage and process complex embodied intelligence data for scientific research and model training.