
Data Formats

The IO data platform supports flexible data formats and lets you customize data visualization templates.

This page uses the data formats produced by IO data collection products as examples.

Human Data Format

File Structure

f"{date}_{project}_{scene}_{task}_{staff_id}_{timestamp}"
├── align_result.csv   # timestamp alignment table
├── annotation.json    # annotation data
├── config             # camera and sensor configuration
│   ├── calib_data.yml
│   ├── depth_to_rgb.yml
│   ├── mocap_main.yml
│   ├── orbbec_depth.yml
│   ├── orbbec_rgb.yml
│   └── pose_calib.yml
└── data.mcap          # multimodal data
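The recording directory name encodes the capture metadata. A minimal sketch of splitting it back into fields, assuming none of the individual fields contain an underscore (true for the sample names shown on this page):

```python
def parse_recording_name(name: str) -> dict:
    """Split a f"{date}_{project}_{scene}_{task}_{staff_id}_{timestamp}" name.

    Assumes exactly six underscore-separated fields (hypothetical helper,
    not part of the platform API).
    """
    date, project, scene, task, staff_id, timestamp = name.split("_")
    return {
        "date": date,
        "project": project,
        "scene": scene,
        "task": task,
        "staff_id": staff_id,
        "timestamp": timestamp,
    }

fields = parse_recording_name("20250115_InnerTest_PublicArea_TableClearing_szk_180926")
print(fields["scene"])  # PublicArea
```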

Multimodal Data

library: mcap go v1.7.0
profile: ros1
messages: 45200
duration: 1m5.625866496s
start: 2025-01-15T18:09:29.628202496+08:00 (1736935769.628202496)
end: 2025-01-15T18:10:35.254068992+08:00 (1736935835.254068992)
compression:
zstd: [764/764 chunks] [6.13 GiB/3.84 GiB (37.39%)] [59.87 MiB/sec]
channels:
(1) /rgbd/color/image_raw/compressed 1970 msgs (30.02 Hz) : sensor_msgs/CompressedImage [ros1msg]
(2) /joint_states 1970 msgs (30.02 Hz) : sensor_msgs/JointState [ros1msg]
(3) /claws_r_hand 1970 msgs (30.02 Hz) : io_msgs/claws_angle [ros1msg]
(4) /global_localization 1970 msgs (30.02 Hz) : geometry_msgs/PoseStamped [ros1msg]
(5) /robot_description 1 msgs : std_msgs/String [ros1msg]
(6) /ee_visualization 1970 msgs (30.02 Hz) : sensor_msgs/CompressedImage [ros1msg]
(7) /rgbd/depth/image_raw 1970 msgs (30.02 Hz) : sensor_msgs/Image [ros1msg]
(8) /colorized_depth 1970 msgs (30.02 Hz) : sensor_msgs/CompressedImage [ros1msg]
(9) /claws_l_hand 1970 msgs (30.02 Hz) : io_msgs/claws_angle [ros1msg]
(10) /claws_touch_data 1970 msgs (30.02 Hz) : io_msgs/squashed_touch [ros1msg]
(11) /touch_visualization 1970 msgs (30.02 Hz) : sensor_msgs/CompressedImage [ros1msg]
(12) /mocap/sensor_data 1970 msgs (30.02 Hz) : io_msgs/squashed_mocap_data [ros1msg]
(13) /mocap/ros_tf 1970 msgs (30.02 Hz) : tf2_msgs/TFMessage [ros1msg]
(14) /left_ee_pose 1970 msgs (30.02 Hz) : geometry_msgs/PoseStamped [ros1msg]
(15) /right_ee_pose 1970 msgs (30.02 Hz) : geometry_msgs/PoseStamped [ros1msg]
(16) /usb_cam_left/mjpeg_raw/compressed 1960 msgs (29.87 Hz) : sensor_msgs/CompressedImage [ros1msg]
(17) /usb_cam_right/mjpeg_raw/compressed 1946 msgs (29.65 Hz) : sensor_msgs/CompressedImage [ros1msg]
(18) /usb_cam_fisheye/mjpeg_raw/compressed 1957 msgs (29.82 Hz) : sensor_msgs/CompressedImage [ros1msg]
(19) /realsense_left_hand/depth/image_rect_raw 1961 msgs (29.88 Hz) : sensor_msgs/Image [ros1msg]
(20) /realsense_left_hand/color/image_raw/compressed 1961 msgs (29.88 Hz) : sensor_msgs/CompressedImage [ros1msg]
(21) /realsense_right_hand/depth/image_rect_raw 1947 msgs (29.67 Hz) : sensor_msgs/Image [ros1msg]
(22) /realsense_right_hand/color/image_raw/compressed 1947 msgs (29.67 Hz) : sensor_msgs/CompressedImage [ros1msg]
(23) /world_left_ee_pose 1970 msgs (30.02 Hz) : geometry_msgs/PoseStamped [ros1msg]
(24) /world_right_ee_pose 1970 msgs (30.02 Hz) : geometry_msgs/PoseStamped [ros1msg]
channels: 24
attachments: 0
metadata: 0
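The per-channel rates in the `mcap info` listing above are simply the message count divided by the recording duration. A quick sanity check using the numbers from the summary:

```python
# Values taken from the `mcap info` output above.
duration_s = 65.625866496   # "duration: 1m5.625866496s"
msgs = 1970                 # e.g. /rgbd/color/image_raw/compressed

rate_hz = msgs / duration_s
print(f"{rate_hz:.2f} Hz")  # 30.02 Hz, matching the listing
```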
Topic name | Data content
/mocap/sensor_data | Joint velocity, acceleration, angular velocity, rotation angle, and sensor data from motion capture
/mocap/ros_tf | TF of all joints from motion capture
/joint_states | JointState of all joints from motion capture
/right_ee_pose | Pose of the right gripper in the main head camera frame
/left_ee_pose | Pose of the left gripper in the main head camera frame
/claws_l_hand | Closure of the left gripper
/claws_r_hand | Closure of the right gripper
/claws_touch_data | Gripper tactile data (contains two messages; each message's frame_id indicates the left or right gripper, and only the first four values of data are valid)
/realsense_left_hand/color/image_raw/compressed | RGB image from the left gripper camera
/realsense_left_hand/depth/image_rect_raw | Depth image from the left gripper camera
/realsense_right_hand/color/image_raw/compressed | RGB image from the right gripper camera
/realsense_right_hand/depth/image_rect_raw | Depth image from the right gripper camera
/rgbd/color/image_raw/compressed | RGB image from the main head camera
/rgbd/depth/image_raw | Depth image from the main head camera
/colorized_depth | Colorized depth image from the main head camera
/usb_cam_fisheye/mjpeg_raw/compressed | RGB image from the main head fisheye camera
/usb_cam_left/mjpeg_raw/compressed | RGB image from the main head left monocular camera
/usb_cam_right/mjpeg_raw/compressed | RGB image from the main head right monocular camera
/ee_visualization | Visualization of end-effector poses overlaid on the main head camera RGB image
/touch_visualization | Visualization of gripper tactile data
/robot_description | Motion-capture URDF
/global_localization | Pose of the main head camera in the world frame
/world_left_ee_pose | Pose of the left gripper in the world frame
/world_right_ee_pose | Pose of the right gripper in the world frame

When the person wears tactile gloves during data collection, an additional topic carrying the tactile digital signal array is recorded:

/mocap/touch_data 57 msgs (30.25 Hz) : io_msgs/squashed_touch [ros1msg]

Natural Language Annotation Data

{
  "belong_to": "20250115_InnerTest_PublicArea_TableClearing_szk_180926",
  "mocap_offset": [],
  "object_set": [
    "paper cup",
    "placemat",
    "trash can",
    "napkin",
    "plate",
    "dinner knife",
    "tableware storage box",
    "wine glass",
    "dinner fork"
  ],
  "scene": "PublicArea",
  "skill_set": [
    "pick {A} from {B}",
    "toss {A} into {B}",
    "place {A} on {B}"
  ],
  "subtasks": [
    {
      "skill": "pick {A} from {B}",
      "description": "pick the paper cup from the placemat with the left gripper",
      "description_zh": "左夹爪 从 餐垫 捡起 纸杯",
      "end_frame_id": 227,
      "end_timestamp": "1736935777206000000",
      "sequence_id": 1,
      "start_frame_id": 159,
      "start_timestamp": "1736935774906000000",
      "comment": "",
      "attempts": "success"
    },
    {
      "skill": "toss {A} into {B}",
      "description": "toss the paper cup into the trash can with the left gripper",
      "description_zh": "左夹爪 扔纸杯进垃圾桶",
      "end_frame_id": 318,
      "end_timestamp": "1736935780244000000",
      "sequence_id": 2,
      "start_frame_id": 231,
      "start_timestamp": "1736935777306000000",
      "comment": "",
      "attempts": "success"
    },
    ...
  ],
  "tag_set": [],
  "task_description": "20250115_InnerTest_PublicArea_TableClearing_szk_180926"
}
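Each subtask is bounded by nanosecond-epoch timestamps stored as strings. A minimal sketch of computing a subtask's duration from the sample values above:

```python
# Sample subtask fields copied from the annotation.json example above.
subtask = {
    "start_timestamp": "1736935774906000000",
    "end_timestamp": "1736935777206000000",
}

def subtask_duration_s(st: dict) -> float:
    """Duration in seconds from nanosecond-epoch string timestamps."""
    return (int(st["end_timestamp"]) - int(st["start_timestamp"])) / 1e9

print(subtask_duration_s(subtask))  # 2.3
```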

Teleoperated Robot Data Format

File Structure

f"{robot_name}_{date}_{timestamp}_{sequence_id}"
├── RM_AIDAL_250124_172033_0.mcap
├── RM_AIDAL_250124_172033_0.json
└── RM_AIDAL_250126_093648_0.metadata.yaml
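Note that the robot name itself may contain underscores (e.g. "RM_AIDAL"), so a naive split on "_" would break. A sketch that instead splits from the right to recover the trailing fields (hypothetical helper, not part of the platform API):

```python
def parse_teleop_name(name: str) -> dict:
    """Split a f"{robot_name}_{date}_{timestamp}_{sequence_id}" base name.

    rsplit with maxsplit=3 keeps any underscores inside robot_name intact.
    """
    robot_name, date, timestamp, sequence_id = name.rsplit("_", 3)
    return {
        "robot_name": robot_name,
        "date": date,
        "timestamp": timestamp,
        "sequence_id": int(sequence_id),
    }

print(parse_teleop_name("RM_AIDAL_250124_172033_0"))
# {'robot_name': 'RM_AIDAL', 'date': '250124', 'timestamp': '172033', 'sequence_id': 0}
```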

Multimodal Data

Files:             RM_AIDAL_250126_091041_0.mcap
Bag size:          443.3 MiB
Storage id:        mcap
Duration:          100.052164792s
Start:             Jan 24 2025 21:37:32.526605552 (1737725852.526605552)
End:               Jan 24 2025 21:39:12.578770344 (1737725952.578770344)
Messages:          62116
Topic information: Topic: /camera_01/color/image_raw/compressed | Type: sensor_msgs/msg/CompressedImage | Count: 3000 | Serialization Format: cdr
                   Topic: /camera_02/color/image_raw/compressed | Type: sensor_msgs/msg/CompressedImage | Count: 3000 | Serialization Format: cdr
                   Topic: /camera_03/color/image_raw/compressed | Type: sensor_msgs/msg/CompressedImage | Count: 3000 | Serialization Format: cdr
                   Topic: io_teleop/joint_states | Type: sensor_msgs/msg/JointState | Count: 1529 | Serialization Format: cdr
                   Topic: io_teleop/joint_cmd | Type: sensor_msgs/msg/JointState | Count: 10009 | Serialization Format: cdr
                   Topic: io_teleop/target_ee_poses | Type: geometry_msgs/msg/PoseArray | Count: 10014 | Serialization Format: cdr
                   Topic: io_teleop/target_base_move | Type: std_msgs/msg/Float64MultiArray | Count: 10010 | Serialization Format: cdr
                   Topic: io_teleop/target_gripper_status | Type: sensor_msgs/msg/JointState | Count: 10012 | Serialization Format: cdr
                   Topic: io_teleop/target_joint_from_vr | Type: sensor_msgs/msg/JointState | Count: 10012 | Serialization Format: cdr
                   Topic: /robot_description | Type: std_msgs/msg/String | Count: 1 | Serialization Format: cdr
                   Topic: /tf | Type: tf2_msgs/msg/TFMessage | Count: 1529 | Serialization Format: cdr
Topic name | Data content
/camera_01/color/image_raw/compressed | RGB image from the main camera
/camera_02/color/image_raw/compressed | RGB image from the left camera
/camera_03/color/image_raw/compressed | RGB image from the right camera
io_teleop/joint_states | Joint states
io_teleop/joint_cmd | Joint commands
io_teleop/target_ee_poses | Target end-effector poses
io_teleop/target_base_move | Target base motion
io_teleop/target_gripper_status | Target gripper status
io_teleop/target_joint_from_vr | Target joint positions from the VR device
/robot_description | Robot URDF description
/tf | TF spatial pose transforms

Natural Language Annotation Data

{
  "belong_to": "RM_AIDAL_250126_091041_0",
  "mocap_offset": [],
  "object_set": [
    "lemon candy",
    "plate",
    "pistachios"
  ],
  "scene": "250126",
  "skill_set": [
    "place {A} on {B}"
  ],
  "subtasks": [
    {
      "skill": "place {A} on {B}",
      "objecta": "lemon candy",
      "objectb": "plate",
      "options": [
        "leftHand"
      ],
      "description": "place the lemon candy on the plate with the left hand",
      "end_timestamp": "1737725886915000000",
      "sequence_id": 1,
      "start_timestamp": "1737725880757000000",
      "comment": "",
      "attempts": "success"
    },
    {
      "skill": "place {A} on {B}",
      "objecta": "pistachios",
      "objectb": "plate",
      "options": [
        "rightHand"
      ],
      "description": "place the pistachios on the plate with the right hand",
      "end_timestamp": "1737725950745000000",
      "sequence_id": 2,
      "start_timestamp": "1737725941657000000",
      "comment": "",
      "attempts": "success"
    }
  ],
  "tag_set": [],
  "task_description": "20250205_RM_ItemPacking_zhouxw"
}
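In this variant, each subtask's "skill" field is a template whose {A} and {B} slots correspond to the "objecta" and "objectb" fields. A sketch of filling the template (hypothetical helper; note the stored "description" additionally adds articles and the hand used):

```python
def fill_skill(subtask: dict) -> str:
    """Substitute objecta/objectb into the skill template string."""
    return (subtask["skill"]
            .replace("{A}", subtask["objecta"])
            .replace("{B}", subtask["objectb"]))

st = {"skill": "place {A} on {B}", "objecta": "lemon candy", "objectb": "plate"}
print(fill_skill(st))  # place lemon candy on plate
```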

Data for Model Training

The mcap and json data above can be converted into formats directly parseable in Python and used as training data for large models.
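When flattening these streams into per-frame training samples, topics recorded at slightly different rates must be paired up; a common approach is nearest-timestamp matching. A minimal stdlib-only sketch (the timestamps below are only illustrative):

```python
import bisect

def nearest(ts_sorted: list, t: int) -> int:
    """Return the timestamp in a sorted list closest to t."""
    i = bisect.bisect_left(ts_sorted, t)
    # Only the neighbors around the insertion point can be closest.
    candidates = ts_sorted[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda x: abs(x - t))

# e.g. match a camera frame at t=40 ms to the nearest joint-state message
joint_ts = [0, 33, 66, 100]
print(nearest(joint_ts, 40))  # 33
```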

HDF5 Format

The following is a basic example. The actual output training-data format may differ depending on the source data and customer-specific requirements.

/root
├── metadata (Group)
│   ├── creation_time (Attribute)
│   ├── source (Attribute)
│   └── schema (Dataset)
└── messages (Group)
    ├── /camera_01/color/image_raw/compressed (Group)
    │   ├── timestamps (Dataset)
    │   ├── data (Dataset)
    │   └── schema_id (Attribute)
    ├── /camera_02/color/image_raw/compressed (Group)
    │   ├── timestamps (Dataset)
    │   ├── data (Dataset)
    │   └── schema_id (Attribute)
    ├── /camera_03/color/image_raw/compressed (Group)
    │   ├── timestamps (Dataset)
    │   ├── data (Dataset)
    │   └── schema_id (Attribute)
    ├── io_teleop/joint_states (Group)
    │   ├── timestamps (Dataset)
    │   ├── data (Dataset)
    │   └── schema_id (Attribute)
    ├── io_teleop/joint_cmd (Group)
    │   ├── timestamps (Dataset)
    │   ├── data (Dataset)
    │   └── schema_id (Attribute)
    ├── io_teleop/target_ee_poses (Group)
    │   ├── timestamps (Dataset)
    │   ├── data (Dataset)
    │   └── schema_id (Attribute)
    ├── io_teleop/target_base_move (Group)
    │   ├── timestamps (Dataset)
    │   ├── data (Dataset)
    │   └── schema_id (Attribute)
    ├── io_teleop/target_gripper_status (Group)
    │   ├── timestamps (Dataset)
    │   ├── data (Dataset)
    │   └── schema_id (Attribute)
    ├── io_teleop/target_joint_from_vr (Group)
    │   ├── timestamps (Dataset)
    │   ├── data (Dataset)
    │   └── schema_id (Attribute)
    ├── /robot_description (Group)
    │   ├── data (Dataset)
    │   └── schema_id (Attribute)
    └── /tf (Group)
        ├── timestamps (Dataset)
        ├── data (Dataset)
        └── schema_id (Attribute)
LeRobot Format

Sample data is available here: https://huggingface.co/datasets/io-ai-data/DesktopCleanup_RM_AIDAL_demo

{
  "codebase_version": "v2.1",
  "robot_type": "custom_arm",
  "total_episodes": 20,
  "total_frames": 5134,
  "total_tasks": 20,
  "total_videos": 0,
  "total_chunks": 1,
  "chunks_size": 1000,
  "fps": 30,
  "splits": {
    "train": "0:20"
  },
  "data_path": "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet",
  "video_path": "videos/chunk-{episode_chunk:03d}/{video_key}/episode_{episode_index:06d}.mp4",
  "features": {
    "observation.images.camera_01": {
      "dtype": "image",
      "shape": [480, 640, 3]
    },
    "observation.images.camera_02": {
      "dtype": "image",
      "shape": [480, 640, 3]
    },
    "observation.images.camera_03": {
      "dtype": "image",
      "shape": [480, 640, 3]
    },
    "observation.images.camera_04": {
      "dtype": "image",
      "shape": [480, 640, 3]
    },
    "observation.state": {
      "dtype": "float64",
      "shape": [37],
      "names": [
        "r_joint1", "r_joint2", "r_joint3", "r_joint4", "r_joint5", "r_joint6",
        "l_joint1", "l_joint2", "l_joint3", "l_joint4", "l_joint5", "l_joint6",
        "R_thumb_MCP_joint1", "R_thumb_MCP_joint2", "R_thumb_PIP_joint", "R_thumb_DIP_joint",
        "R_index_MCP_joint", "R_index_DIP_joint", "R_middle_MCP_joint", "R_middle_DIP_joint",
        "R_ring_MCP_joint", "R_ring_DIP_joint", "R_pinky_MCP_joint", "R_pinky_DIP_joint",
        "L_thumb_MCP_joint1", "L_thumb_MCP_joint2", "L_thumb_PIP_joint", "L_thumb_DIP_joint",
        "L_index_MCP_joint", "L_index_DIP_joint", "L_middle_MCP_joint",
        "L_ring_MCP_joint", "L_ring_DIP_joint", "L_pinky_MCP_joint", "L_pinky_DIP_joint",
        "platform_joint"
      ]
    },
    "action": {
      "dtype": "float64",
      "shape": [12],
      "names": [
        "l_joint1", "l_joint2", "l_joint3", "l_joint4", "l_joint5", "l_joint6",
        "r_joint1", "r_joint2", "r_joint3", "r_joint4", "r_joint5", "r_joint6"
      ]
    },
    "observation.gripper": {
      "dtype": "float64",
      "shape": [2],
      "names": ["right_gripper", "left_gripper"]
    },
    "timestamp": {
      "dtype": "float32",
      "shape": [1],
      "names": null
    },
    "frame_index": {
      "dtype": "int64",
      "shape": [1],
      "names": null
    },
    "episode_index": {
      "dtype": "int64",
      "shape": [1],
      "names": null
    },
    "index": {
      "dtype": "int64",
      "shape": [1],
      "names": null
    },
    "task_index": {
      "dtype": "int64",
      "shape": [1],
      "names": null
    }
  }
}
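The data_path field is a Python format string; episodes are grouped into chunk directories of chunks_size episodes each. A sketch of resolving an episode's parquet path from the fields above:

```python
# Fields copied from the info.json example above.
info = {
    "chunks_size": 1000,
    "data_path": "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet",
}

def episode_path(info: dict, episode_index: int) -> str:
    """Resolve the parquet path for one episode from the dataset metadata."""
    chunk = episode_index // info["chunks_size"]
    return info["data_path"].format(episode_chunk=chunk,
                                    episode_index=episode_index)

print(episode_path(info, 12))  # data/chunk-000/episode_000012.parquet
```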