
Data Formats

The EmbodyFlow Platform is designed for general-purpose robot data management and uses Robot Operating System (ROS) formats as the common standard for unifying robot data.

  1. Data Import: Automatically converts non-ROS data from systems such as AgiBot and AgileX into standard ROS formats for unified management.
  2. Data Visualization: Includes built-in visualization models for more than 30 mainstream robots and smoothly plays back all supported formats, including 3D animations and 2D images.
  3. Data Export: Exports to the standard HDF5 and LeRobot formats with one click, mapping joints and images adaptively from the original data so the output is ready for model training.

Human Data Format

Human data collection records an operator's actions and interactions as multimodal sensor data.

File Structure

Each collection task generates a folder whose name follows the pattern below and ends with a timestamp:

f"{date}_{project}_{scene}_{task}_{staff_id}_{timestamp}"
├── align_result.csv # Timestamp alignment table
├── annotation.json # Annotation data
├── config/ # Camera and sensor configuration
│ ├── calib_data.yml
│ ├── depth_to_rgb.yml
│ ├── mocap_main.yml
│ ├── orbbec_depth.yml
│ ├── orbbec_rgb.yml
│ └── pose_calib.yml
└── data.mcap # Multimodal data package
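
As a rough illustration, a folder name can be split back into its fields. The sketch below assumes that no field contains an underscore, which holds for the sample name used in the annotation example on this page. `CaptureFolder` and `parse_capture_folder` are hypothetical names, not part of the platform.

```python
from typing import NamedTuple


class CaptureFolder(NamedTuple):
    date: str
    project: str
    scene: str
    task: str
    staff_id: str
    timestamp: str


def parse_capture_folder(name: str) -> CaptureFolder:
    """Split a capture folder name into its six underscore-separated fields."""
    parts = name.split("_")
    if len(parts) != 6:
        # Assumption: fields never contain "_"; otherwise the split is ambiguous.
        raise ValueError(f"expected 6 fields, got {len(parts)}: {name!r}")
    return CaptureFolder(*parts)


info = parse_capture_folder("20250115_InnerTest_PublicArea_TableClearing_szk_180926")
print(info.scene, info.task, info.timestamp)
```

A name with a different number of fields raises `ValueError` rather than being silently misparsed.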

Multimodal Data

The data.mcap file contains all synchronized sensor data, stored in MCAP format.
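
A minimal sketch of how such a file might be inspected from Python, assuming the open-source `mcap` package is installed. `count_messages_per_topic` and `count_in_file` are hypothetical helpers. Decoding the ROS 1 message payloads would additionally need a decoder package such as `mcap-ros1-support`, which is not shown here.

```python
from collections import Counter
from typing import Iterable, Optional


def count_messages_per_topic(records: Iterable[tuple]) -> Counter:
    """Count messages per topic from (schema, channel, message) tuples."""
    counts: Counter = Counter()
    for _schema, channel, _message in records:
        counts[channel.topic] += 1
    return counts


def count_in_file(path: str, topics: Optional[list] = None) -> Counter:
    """Open an MCAP file and count messages, optionally for selected topics."""
    from mcap.reader import make_reader  # third-party; imported lazily

    with open(path, "rb") as stream:
        reader = make_reader(stream)
        # iter_messages yields (schema, channel, message) tuples.
        return count_messages_per_topic(reader.iter_messages(topics=topics))
```

For example, `count_in_file("data.mcap", ["/joint_states"])` would return the number of `/joint_states` messages stored in the file.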

Main Topic List:

| Topic Name | Data Type | Description |
| --- | --- | --- |
| /mocap/sensor_data | io_msgs/squashed_mocap_data | Joint velocity, acceleration, angular velocity, rotation angle, and sensor data from motion capture |
| /mocap/ros_tf | tf2_msgs/TFMessage | TF transforms for all joints based on motion capture |
| /joint_states | sensor_msgs/JointState | JointStates for all joints based on motion capture |
| /rgbd/color/image_raw/compressed | sensor_msgs/CompressedImage | RGB image from the main head camera |
| /rgbd/depth/image_raw | sensor_msgs/Image | Depth image from the main head camera |
| /colorized_depth | sensor_msgs/CompressedImage | Colorized depth image from the main head camera |
| /left_ee_pose | geometry_msgs/PoseStamped | Left gripper pose in the main head camera coordinate system |
| /right_ee_pose | geometry_msgs/PoseStamped | Right gripper pose in the main head camera coordinate system |
| /claws_l_hand | io_msgs/claws_angle | Left gripper closure degree |
| /claws_r_hand | io_msgs/claws_angle | Right gripper closure degree |
| /claws_touch_data | io_msgs/squashed_touch | Gripper tactile data |
| /realsense_left_hand/color/image_raw/compressed | sensor_msgs/CompressedImage | RGB image from the left gripper camera |
| /realsense_left_hand/depth/image_rect_raw | sensor_msgs/Image | Depth image from the left gripper camera |
| /realsense_right_hand/color/image_raw/compressed | sensor_msgs/CompressedImage | RGB image from the right gripper camera |
| /realsense_right_hand/depth/image_rect_raw | sensor_msgs/Image | Depth image from the right gripper camera |
| /usb_cam_fisheye/mjpeg_raw/compressed | sensor_msgs/CompressedImage | RGB image from the main head fisheye camera |
| /usb_cam_left/mjpeg_raw/compressed | sensor_msgs/CompressedImage | RGB image from the main head left monocular camera |
| /usb_cam_right/mjpeg_raw/compressed | sensor_msgs/CompressedImage | RGB image from the main head right monocular camera |
| /ee_visualization | sensor_msgs/CompressedImage | End-effector pose visualization in the main head camera RGB image |
| /touch_visualization | sensor_msgs/CompressedImage | Gripper tactile data visualization |
| /robot_description | std_msgs/String | Motion capture URDF |
| /global_localization | geometry_msgs/PoseStamped | Main head camera pose in the world coordinate system |
| /world_left_ee_pose | geometry_msgs/PoseStamped | Left gripper pose in the world coordinate system |
| /world_right_ee_pose | geometry_msgs/PoseStamped | Right gripper pose in the world coordinate system |

Camera Data:

  • Main Head RGBD Camera: Color + Depth images
  • Left/Right Gripper Camera: RealSense RGBD
  • Fisheye Camera: Panoramic view
  • Left/Right Monocular Camera: Stereo vision

Note: If tactile gloves are used, an additional /mocap/touch_data topic will be added.

Raw MCAP data format:

library: mcap go v1.7.0
profile: ros1
messages: 45200
duration: 1m5.625866496s
start: 2025-01-15T18:09:29.628202496+08:00 (1736935769.628202496)
end: 2025-01-15T18:10:35.254068992+08:00 (1736935835.254068992)
compression:
zstd: [764/764 chunks] [6.13 GiB/3.84 GiB (37.39%)] [59.87 MiB/sec]
channels:
(1) /rgbd/color/image_raw/compressed 1970 msgs (30.02 Hz) : sensor_msgs/CompressedImage [ros1msg]
(2) /joint_states 1970 msgs (30.02 Hz) : sensor_msgs/JointState [ros1msg]
(3) /claws_r_hand 1970 msgs (30.02 Hz) : io_msgs/claws_angle [ros1msg]
(4) /global_localization 1970 msgs (30.02 Hz) : geometry_msgs/PoseStamped [ros1msg]
(5) /robot_description 1 msgs : std_msgs/String [ros1msg]
(6) /ee_visualization 1970 msgs (30.02 Hz) : sensor_msgs/CompressedImage [ros1msg]
(7) /rgbd/depth/image_raw 1970 msgs (30.02 Hz) : sensor_msgs/Image [ros1msg]
(8) /colorized_depth 1970 msgs (30.02 Hz) : sensor_msgs/CompressedImage [ros1msg]
(9) /claws_l_hand 1970 msgs (30.02 Hz) : io_msgs/claws_angle [ros1msg]
(10) /claws_touch_data 1970 msgs (30.02 Hz) : io_msgs/squashed_touch [ros1msg]
(11) /touch_visualization 1970 msgs (30.02 Hz) : sensor_msgs/CompressedImage [ros1msg]
(12) /mocap/sensor_data 1970 msgs (30.02 Hz) : io_msgs/squashed_mocap_data [ros1msg]
(13) /mocap/ros_tf 1970 msgs (30.02 Hz) : tf2_msgs/TFMessage [ros1msg]
(14) /left_ee_pose 1970 msgs (30.02 Hz) : geometry_msgs/PoseStamped [ros1msg]
(15) /right_ee_pose 1970 msgs (30.02 Hz) : geometry_msgs/PoseStamped [ros1msg]
(16) /usb_cam_left/mjpeg_raw/compressed 1960 msgs (29.87 Hz) : sensor_msgs/CompressedImage [ros1msg]
(17) /usb_cam_right/mjpeg_raw/compressed 1946 msgs (29.65 Hz) : sensor_msgs/CompressedImage [ros1msg]
(18) /usb_cam_fisheye/mjpeg_raw/compressed 1957 msgs (29.82 Hz) : sensor_msgs/CompressedImage [ros1msg]
(19) /realsense_left_hand/depth/image_rect_raw 1961 msgs (29.88 Hz) : sensor_msgs/Image [ros1msg]
(20) /realsense_left_hand/color/image_raw/compressed 1961 msgs (29.88 Hz) : sensor_msgs/CompressedImage [ros1msg]
(21) /realsense_right_hand/depth/image_rect_raw 1947 msgs (29.67 Hz) : sensor_msgs/Image [ros1msg]
(22) /realsense_right_hand/color/image_raw/compressed 1947 msgs (29.67 Hz) : sensor_msgs/CompressedImage [ros1msg]
(23) /world_left_ee_pose 1970 msgs (30.02 Hz) : geometry_msgs/PoseStamped [ros1msg]
(24) /world_right_ee_pose 1970 msgs (30.02 Hz) : geometry_msgs/PoseStamped [ros1msg]
channels: 24
attachments: 0
metadata: 0

| Topic Name | Data Meaning |
| --- | --- |
| /mocap/sensor_data | Joint velocity, acceleration, angular velocity, rotation angle, and sensor data from motion capture |
| /mocap/ros_tf | TF for all joints based on motion capture |
| /joint_states | JointState for all joints based on motion capture |
| /right_ee_pose | Right gripper pose in main head camera coordinate system |
| /left_ee_pose | Left gripper pose in main head camera coordinate system |
| /claws_l_hand | Left gripper closure degree |
| /claws_r_hand | Right gripper closure degree |
| /claws_touch_data | Gripper tactile data (contains two messages, frame_id indicates left or right gripper, first 4 values in data are valid) |
| /realsense_left_hand/color/image_raw/compressed | Left gripper camera RGB image |
| /realsense_left_hand/depth/image_rect_raw | Left gripper camera depth image |
| /realsense_right_hand/color/image_raw/compressed | Right gripper camera RGB image |
| /realsense_right_hand/depth/image_rect_raw | Right gripper camera depth image |
| /rgbd/color/image_raw/compressed | Main head camera RGB image |
| /rgbd/depth/image_raw | Main head camera depth image |
| /colorized_depth | Main head camera colorized depth image |
| /usb_cam_fisheye/mjpeg_raw/compressed | Main head fisheye camera RGB image |
| /usb_cam_left/mjpeg_raw/compressed | Main head left monocular camera RGB image |
| /usb_cam_right/mjpeg_raw/compressed | Main head right monocular camera RGB image |
| /ee_visualization | End-effector pose visualization in main head camera RGB image |
| /touch_visualization | Gripper tactile data visualization |
| /robot_description | Motion capture URDF |
| /global_localization | Main head camera pose in world coordinate system |
| /world_left_ee_pose | Left gripper pose in world coordinate system |
| /world_right_ee_pose | Right gripper pose in world coordinate system |

If data is collected using tactile gloves, a tactile digital signal array topic will be added:

/mocap/touch_data 57 msgs (30.25 Hz): io_msgs/squashed_touch [ros1msg]
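
The rates printed in the raw summary earlier in this section follow from message count divided by recording duration. A small worked check, using only numbers copied from that summary; `mean_rate_hz` is a hypothetical helper:

```python
DURATION_S = 65.625866496  # from "duration: 1m5.625866496s" in the summary


def mean_rate_hz(message_count: int, duration_s: float = DURATION_S) -> float:
    """Average publish rate over the whole recording."""
    return message_count / duration_s


for topic, count in [
    ("/joint_states", 1970),
    ("/usb_cam_left/mjpeg_raw/compressed", 1960),
    ("/realsense_right_hand/color/image_raw/compressed", 1947),
]:
    print(f"{topic}: {mean_rate_hz(count):.2f} Hz")
```

Topics with fewer than 1970 messages come out slightly below 30 Hz, matching the per-channel rates listed in the summary.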

Natural Language Annotation

{
  "belong_to": "20250115_InnerTest_PublicArea_TableClearing_szk_180926",
  "mocap_offset": [],
  "object_set": [
    "paper cup",
    "placemat",
    "trash can",
    "napkin",
    "plate",
    "dinner knife",
    "tableware storage box",
    "wine glass",
    "dinner fork"
  ],
  "scene": "PublicArea",
  "skill_set": [
    "pick {A} from {B}",
    "toss {A} into {B}",
    "place {A} on {B}"
  ],
  "subtasks": [
    {
      "skill": "pick {A} from {B}",
      "description": "pick the paper cup from the placemat with the left gripper",
      "description_zh": "左夹爪 从 餐垫 捡起 纸杯",
      "end_frame_id": 227,
      "end_timestamp": "1736935777206000000",
      "sequence_id": 1,
      "start_frame_id": 159,
      "start_timestamp": "1736935774906000000",
      "comment": "",
      "attempts": "success"
    },
    {
      "skill": "toss {A} into {B}",
      "description": "toss the paper cup into the trash can with the left gripper",
      "description_zh": "左夹爪 扔纸杯进垃圾桶",
      "end_frame_id": 318,
      "end_timestamp": "1736935780244000000",
      "sequence_id": 2,
      "start_frame_id": 231,
      "start_timestamp": "1736935777306000000",
      "comment": "",
      "attempts": "success"
    },
    ...
  ],
  "tag_set": [],
  "task_description": "20250115_InnerTest_PublicArea_TableClearing_szk_180926"
}
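
The `start_timestamp` and `end_timestamp` values are strings that appear to hold Unix times in nanoseconds (they fall inside the start and end times shown in the raw MCAP summary). Under that assumption, a subtask's duration can be computed as sketched below. `subtask_duration_s` is a hypothetical helper, and the inline JSON copies selected fields of the first subtask from the sample.

```python
import json

NS_PER_S = 1_000_000_000


def subtask_duration_s(subtask: dict) -> float:
    """Duration of one annotated subtask in seconds."""
    start_ns = int(subtask["start_timestamp"])  # stored as strings in the JSON
    end_ns = int(subtask["end_timestamp"])
    return (end_ns - start_ns) / NS_PER_S


sample = json.loads("""
{
  "skill": "pick {A} from {B}",
  "start_frame_id": 159,
  "end_frame_id": 227,
  "start_timestamp": "1736935774906000000",
  "end_timestamp": "1736935777206000000"
}
""")

duration = subtask_duration_s(sample)
frames = sample["end_frame_id"] - sample["start_frame_id"]
print(f"{duration:.1f} s over {frames} frames")
```

Here 68 frames span 2.3 s, which is consistent with the roughly 30 Hz topic rates in the recording.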

Teleoperation Robot Data Format

Teleoperation robot data records the process of an operator controlling a robot through VR devices.

Teleoperation File Structure

f"{robot_name}_{date}_{timestamp}_{sequence_id}"
├── RM_AIDAL_250124_172033_0.mcap # Multimodal data
├── RM_AIDAL_250124_172033_0.json # Annotation data
└── RM_AIDAL_250126_093648_0.metadata.yaml # Metadata
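
Because the robot name itself may contain underscores (as in `RM_AIDAL`), splitting from the right is safer than a plain split. A sketch, assuming the last three fields never contain underscores; `parse_teleop_name` is a hypothetical helper:

```python
def parse_teleop_name(stem: str) -> dict:
    """Split '{robot_name}_{date}_{timestamp}_{sequence_id}' from the right."""
    robot_name, date, timestamp, sequence_id = stem.rsplit("_", 3)
    return {
        "robot_name": robot_name,
        "date": date,
        "timestamp": timestamp,
        "sequence_id": int(sequence_id),
    }


parsed = parse_teleop_name("RM_AIDAL_250124_172033_0")
print(parsed)
```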

Teleoperation Multimodal Data

Main Topic List:

| Topic Name | Data Type | Description |
| --- | --- | --- |
| /camera_01/color/image_raw/compressed | sensor_msgs/msg/CompressedImage | Main camera RGB image |
| /camera_02/color/image_raw/compressed | sensor_msgs/msg/CompressedImage | Left camera RGB image |
| /camera_03/color/image_raw/compressed | sensor_msgs/msg/CompressedImage | Right camera RGB image |
| io_teleop/joint_states | sensor_msgs/msg/JointState | Joint state |
| io_teleop/joint_cmd | sensor_msgs/msg/JointState | Joint command |
| io_teleop/target_ee_poses | geometry_msgs/msg/PoseArray | Target end-effector poses |
| io_teleop/target_base_move | std_msgs/msg/Float64MultiArray | Target base move |
| io_teleop/target_gripper_status | sensor_msgs/msg/JointState | Target gripper status |
| io_teleop/target_joint_from_vr | sensor_msgs/msg/JointState | Target joints from VR device |
| /robot_description | std_msgs/msg/String | Robot URDF description |
| /tf | tf2_msgs/msg/TFMessage | TF spatial pose transform info |

Raw MCAP data format:

Files:             RM_AIDAL_250126_091041_0.mcap
Bag size: 443.3 MiB
Storage id: mcap
Duration: 100.052164792s
Start: Jan 24 2025 21:37:32.526605552 (1737725852.526605552)
End: Jan 24 2025 21:39:12.578770344 (1737725952.578770344)
Messages: 62116
Topic information: Topic: /camera_01/color/image_raw/compressed | Type: sensor_msgs/msg/CompressedImage | Count: 3000 | Serialization Format: cdr
Topic: /camera_02/color/image_raw/compressed | Type: sensor_msgs/msg/CompressedImage | Count: 3000 | Serialization Format: cdr
Topic: /camera_03/color/image_raw/compressed | Type: sensor_msgs/msg/CompressedImage | Count: 3000 | Serialization Format: cdr
Topic: io_teleop/joint_states | Type: sensor_msgs/msg/JointState | Count: 1529 | Serialization Format: cdr
Topic: io_teleop/joint_cmd | Type: sensor_msgs/msg/JointState | Count: 10009 | Serialization Format: cdr
Topic: io_teleop/target_ee_poses | Type: geometry_msgs/msg/PoseArray | Count: 10014 | Serialization Format: cdr
Topic: io_teleop/target_base_move | Type: std_msgs/msg/Float64MultiArray | Count: 10010 | Serialization Format: cdr
Topic: io_teleop/target_gripper_status | Type: sensor_msgs/msg/JointState | Count: 10012 | Serialization Format: cdr
Topic: io_teleop/target_joint_from_vr | Type: sensor_msgs/msg/JointState | Count: 10012 | Serialization Format: cdr
Topic: /robot_description | Type: std_msgs/msg/String | Count: 1 | Serialization Format: cdr
Topic: /tf | Type: tf2_msgs/msg/TFMessage | Count: 1529 | Serialization Format: cdr

| Topic Name | Data Meaning |
| --- | --- |
| /camera_01/color/image_raw/compressed | Main camera RGB image |
| /camera_02/color/image_raw/compressed | Left camera RGB image |
| /camera_03/color/image_raw/compressed | Right camera RGB image |
| io_teleop/joint_states | Joint state |
| io_teleop/joint_cmd | Joint command |
| io_teleop/target_ee_poses | Target end-effector poses |
| io_teleop/target_base_move | Target base move |
| io_teleop/target_gripper_status | Target gripper status |
| io_teleop/target_joint_from_vr | Target joints from VR device |
| /robot_description | Robot URDF description |
| /tf | TF spatial pose transform info |
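
The message counts in the raw summary imply different publish rates: over the roughly 100 s recording, 3000 camera frames is about 30 Hz, while about 10010 command messages is about 100 Hz. Pairing each image with a command therefore needs some alignment step. The sketch below shows one common approach, nearest-timestamp matching. It is an illustration only and not necessarily how the platform aligns data; `nearest_indices` is a hypothetical helper and the timestamps are made up.

```python
from bisect import bisect_left
from typing import List, Sequence


def nearest_indices(reference: Sequence[float], candidates: Sequence[float]) -> List[int]:
    """For each reference time, return the index of the closest candidate time.

    Both sequences must be sorted ascending, and candidates must be non-empty.
    """
    result = []
    for t in reference:
        pos = bisect_left(candidates, t)
        if pos == 0:
            result.append(0)
        elif pos == len(candidates):
            result.append(len(candidates) - 1)
        else:
            before, after = candidates[pos - 1], candidates[pos]
            # Ties go to the earlier sample.
            result.append(pos - 1 if t - before <= after - t else pos)
    return result


# Made-up timestamps in seconds: frames at about 30 Hz, commands at 100 Hz.
frame_times = [0.000, 0.033, 0.067, 0.100]
command_times = [i / 100 for i in range(12)]  # 0.00, 0.01, ..., 0.11
matches = nearest_indices(frame_times, command_times)
print(matches)
```

Each entry of `matches` is the index of the command message closest in time to the corresponding frame.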

Teleoperation Annotation Data

{
  "belong_to": "RM_AIDAL_250126_091041_0",
  "mocap_offset": [],
  "object_set": [
    "lemon candy",
    "plate",
    "pistachios"
  ],
  "scene": "250126",
  "skill_set": [
    "place {A} on {B}"
  ],
  "subtasks": [
    {
      "skill": "place {A} on {B}",
      "objecta": "lemon candy",
      "objectb": "plate",
      "options": [
        "leftHand"
      ],
      "description": "place the lemon candy on the plate with the left hand",
      "end_timestamp": "1737725886915000000",
      "sequence_id": 1,
      "start_timestamp": "1737725880757000000",
      "comment": "",
      "attempts": "success"
    },
    {
      "skill": "place {A} on {B}",
      "objecta": "pistachios",
      "objectb": "plate",
      "options": [
        "rightHand"
      ],
      "description": "place the pistachios on the plate with the right hand",
      "end_timestamp": "1737725950745000000",
      "sequence_id": 2,
      "start_timestamp": "1737725941657000000",
      "comment": "",
      "attempts": "success"
    }
  ],
  "tag_set": [],
  "task_description": "20250205_RM_ItemPacking_zhouxw"
}
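
In this annotation each subtask carries `objecta` and `objectb` alongside the skill template, so the `{A}` and `{B}` placeholders can be filled in mechanically. A sketch, assuming `{A}` maps to `objecta` and `{B}` to `objectb` (which matches the sample descriptions); `fill_skill` is a hypothetical helper:

```python
def fill_skill(subtask: dict) -> str:
    """Substitute objecta/objectb into the '{A}'/'{B}' skill template."""
    return (
        subtask["skill"]
        .replace("{A}", subtask["objecta"])
        .replace("{B}", subtask["objectb"])
    )


subtask = {
    "skill": "place {A} on {B}",
    "objecta": "lemon candy",
    "objectb": "plate",
    "options": ["leftHand"],
}
filled = fill_skill(subtask)
print(filled)
```

The result is a compact instruction string; the free-text `description` field in the sample additionally adds articles and the hand used.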

Exporting Model Training Data

To facilitate model training, the platform provides various data export capabilities, converting raw captured MCAP and JSON data into formats suitable for machine learning training.

The common HDF5 and LeRobot formats can be exported with one click. The export adapts automatically to the robot model and the number of sensors, so no manual configuration is needed.

HDF5 Format

HDF5 format is suitable for large-scale data storage and fast access, organized in a hierarchical structure.

File Structure:

chunk_001.hdf5
├── /data/ # Data group
│ ├── episode_001/ # First task sequence
│ │ ├── action # Joint commands (multi-dimensional array)
│ │ ├── observation.state # Sensor observation values
│ │ ├── observation.gripper # Gripper state
│ │ ├── observation.images.* # Images from various views
│ │ ├── task # Task description in English
│ │ └── task_zh # Task description in Chinese
│ └── episode_002/ # Second task sequence
└── /meta/ # Metadata group

Data Content:

  • action - Joint control commands (float32 array)
  • observation.state - Sensor observation values (float32 array)
  • observation.images.* - Compressed image data (JPEG format)
  • observation.gripper - Gripper state (float32 array)
  • task - English natural language description
  • task_zh - Chinese natural language description
  • score - Action quality score
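
A hedged sketch of walking this layout from Python. It assumes the third-party `h5py` package and that each `episode_*` entry under `/data` is a group holding the items listed above. `list_episode_fields` is a hypothetical helper that works on any mapping-like object, so it can be tried on a plain dictionary first.

```python
from typing import Dict, List, Mapping


def list_episode_fields(data_group: Mapping) -> Dict[str, List[str]]:
    """Map each episode name to the sorted names of its entries."""
    return {
        episode: sorted(fields.keys())
        for episode, fields in data_group.items()
    }


def list_fields_in_file(path: str) -> Dict[str, List[str]]:
    """Open an exported HDF5 chunk read-only and list fields per episode."""
    import h5py  # third-party; imported lazily so the helper above has no dependencies

    with h5py.File(path, "r") as f:
        return list_episode_fields(f["data"])


# Stand-in for f["data"], using field names from the structure above.
fake = {"episode_001": {"action": [], "observation.state": [], "task": []}}
fields = list_episode_fields(fake)
print(fields)
```

Calling `list_fields_in_file("chunk_001.hdf5")` on a real export would show which of the listed fields are actually present for each episode.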

LeRobot Format

LeRobot format is a standard data format in the robot learning field, compatible with mainstream robot learning frameworks.

Reference Sample Data: https://huggingface.co/datasets/io-intelligence/piper_uncap_pen

Data Feature Definition:

The length and shape of the features in an exported LeRobot dataset adapt automatically, so any number of cameras or joints is supported. The shapes below come from an export for the AgileX desktop arm setup (7 joints per arm, 14 values in total):

| Feature Name | Data Type | Shape | Description |
| --- | --- | --- | --- |
| action | float32 | [14] | Joint commands (7 joints per arm) |
| observation.state | float32 | [14] | Joint states (7 joints per arm) |
| observation.images.cam_high | image | [3,480,640] | High-view camera image |
| observation.images.cam_low | image | [3,480,640] | Low-view camera image |
| observation.images.cam_left_wrist | image | [3,480,640] | Left wrist camera image |
| observation.images.cam_right_wrist | image | [3,480,640] | Right wrist camera image |
| timestamp | float32 | [1] | Timestamp |
| frame_index | int64 | [1] | Frame index |
| episode_index | int64 | [1] | Task sequence index |

Complete LeRobot format definition example:

{
  "codebase_version": "v2.1",
  "robot_type": "custom_arm",
  "total_episodes": 20,
  "total_frames": 5134,
  "total_tasks": 20,
  "total_videos": 0,
  "total_chunks": 1,
  "chunks_size": 1000,
  "fps": 30,
  "splits": {
    "train": "0:20"
  },
  "data_path": "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet",
  "video_path": "videos/chunk-{episode_chunk:03d}/{video_key}/episode_{episode_index:06d}.mp4",
  "features": {
    "observation.images.camera_01": {
      "dtype": "image",
      "shape": [480, 640, 3]
    },
    "observation.images.camera_02": {
      "dtype": "image",
      "shape": [480, 640, 3]
    },
    "observation.images.camera_03": {
      "dtype": "image",
      "shape": [480, 640, 3]
    },
    "observation.images.camera_04": {
      "dtype": "image",
      "shape": [480, 640, 3]
    },
    "observation.state": {
      "dtype": "float64",
      "shape": [37],
      "names": [
        "r_joint1", "r_joint2", "r_joint3", "r_joint4", "r_joint5", "r_joint6",
        "l_joint1", "l_joint2", "l_joint3", "l_joint4", "l_joint5", "l_joint6",
        "R_thumb_MCP_joint1", "R_thumb_MCP_joint2", "R_thumb_PIP_joint", "R_thumb_DIP_joint",
        "R_index_MCP_joint", "R_index_DIP_joint", "R_middle_MCP_joint", "R_middle_DIP_joint",
        "R_ring_MCP_joint", "R_ring_DIP_joint", "R_pinky_MCP_joint", "R_pinky_DIP_joint",
        "L_thumb_MCP_joint1", "L_thumb_MCP_joint2", "L_thumb_PIP_joint", "L_thumb_DIP_joint",
        "L_index_MCP_joint", "L_index_DIP_joint", "L_middle_MCP_joint", "L_middle_DIP_joint",
        "L_ring_MCP_joint", "L_ring_DIP_joint", "L_pinky_MCP_joint", "L_pinky_DIP_joint",
        "platform_joint"
      ]
    },
    "action": {
      "dtype": "float64",
      "shape": [12],
      "names": [
        "l_joint1", "l_joint2", "l_joint3", "l_joint4", "l_joint5", "l_joint6",
        "r_joint1", "r_joint2", "r_joint3", "r_joint4", "r_joint5", "r_joint6"
      ]
    },
    "observation.gripper": {
      "dtype": "float64",
      "shape": [2],
      "names": ["right_gripper", "left_gripper"]
    },
    "timestamp": {
      "dtype": "float32",
      "shape": [1],
      "names": null
    },
    "frame_index": {
      "dtype": "int64",
      "shape": [1],
      "names": null
    },
    "episode_index": {
      "dtype": "int64",
      "shape": [1],
      "names": null
    },
    "index": {
      "dtype": "int64",
      "shape": [1],
      "names": null
    },
    "task_index": {
      "dtype": "int64",
      "shape": [1],
      "names": null
    }
  }
}
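
The `data_path` entry is a Python format template. In the LeRobot v2.1 layout the chunk number is the episode index integer-divided by `chunks_size`, so an episode's Parquet file can be located as sketched below. `episode_parquet_path` is a hypothetical helper; the template and `chunks_size` value are copied from the example above.

```python
DATA_PATH = "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet"
CHUNKS_SIZE = 1000


def episode_parquet_path(episode_index: int, chunks_size: int = CHUNKS_SIZE) -> str:
    """Resolve the relative Parquet path for one episode."""
    episode_chunk = episode_index // chunks_size
    return DATA_PATH.format(episode_chunk=episode_chunk, episode_index=episode_index)


print(episode_parquet_path(7))
print(episode_parquet_path(1234))
```

With 20 episodes and a chunk size of 1000, as in the example, every episode falls into chunk 0.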