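Data Formats

The IO Data Platform is designed for unified robot data management built on Robot Operating System (ROS) standards.

Data import: Automatically converts non-ROS data from collection systems such as 智元 (AgiBot) and 松霊 (AgileX) into standard ROS formats so that it can be managed uniformly.
Data visualization: Ships with visualization models for more than 30 mainstream robots and smoothly plays back all supported content types, such as 3D animations and 2D images.
Data export: Supports one-click export to the standard HDF5 and LeRobot data formats, automatically adapting joints and images to the source data so the output can be fed directly into model training.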

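Human Data Format

Human data collection is mainly used to record an operator's motions and interactions, and it includes multimodal sensor data.

File layout

Each collection task generates one folder, named according to the following pattern (ending in a timestamp):

f"{date}_{project}_{scene}_{task}_{staff_id}_{timestamp}"
├── align_result.csv    # timestamp alignment table
├── annotation.json     # annotation data
├── config/            # camera and sensor configuration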
│   ├── calib_data.yml
│   ├── depth_to_rgb.yml
│   ├── mocap_main.yml
│   ├── orbbec_depth.yml
│   ├── orbbec_rgb.yml
│   └── pose_calib.yml
└── data.mcap          # multimodal data package
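
The naming pattern above can be split back into its fields when indexing a collection of recordings. The following is a minimal sketch, not part of the platform: `CaptureFolder` and `parse_capture_folder` are hypothetical names, and it assumes that no field contains an underscore and that `date` and `timestamp` are `YYYYMMDD` and `HHMMSS`, which is inferred from the sample identifier `20250115_InnerTest_PublicArea_TableClearing_szk_180926` used in the annotation example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CaptureFolder:
    """Fields of a folder name (hypothetical helper, not a platform API)."""
    date: str
    project: str
    scene: str
    task: str
    staff_id: str
    timestamp: str

    def started_at(self) -> datetime:
        # Assumption: date is YYYYMMDD and timestamp is HHMMSS.
        return datetime.strptime(self.date + self.timestamp, "%Y%m%d%H%M%S")


def parse_capture_folder(name: str) -> CaptureFolder:
    # Assumption: fields never contain an underscore themselves.
    parts = name.split("_")
    if len(parts) != 6:
        raise ValueError(
            f"expected 6 underscore-separated fields, got {len(parts)}: {name!r}"
        )
    return CaptureFolder(*parts)


folder = parse_capture_folder("20250115_InnerTest_PublicArea_TableClearing_szk_180926")
print(folder.scene, folder.task, folder.started_at().isoformat())
```

If real folder names can contain underscores inside a field, a stricter pattern (for example a regular expression anchored on the date and timestamp) would be needed instead of a plain split.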

Multimodal data

The data.mcap file contains the synchronized data from all sensors, stored in the MCAP format.

Main topics:

Topic	Data type	Description
`/mocap/sensor_data`	`io_msgs/squashed_mocap_data`	Motion-capture joint velocity, acceleration, angular velocity, rotation angle, and sensor data
`/mocap/ros_tf`	`tf2_msgs/TFMessage`	TF transforms for all joints, derived from motion capture
`/joint_states`	`sensor_msgs/JointState`	JointState for all joints, derived from motion capture
`/rgbd/color/image_raw/compressed`	`sensor_msgs/CompressedImage`	RGB image from the main head camera
`/rgbd/depth/image_raw`	`sensor_msgs/Image`	Depth image from the main head camera
`/colorized_depth`	`sensor_msgs/CompressedImage`	Colorized depth image from the main head camera
`/left_ee_pose`	`geometry_msgs/PoseStamped`	Left gripper pose in the main head camera frame
`/right_ee_pose`	`geometry_msgs/PoseStamped`	Right gripper pose in the main head camera frame
`/claws_l_hand`	`io_msgs/claws_angle`	Left gripper closure
`/claws_r_hand`	`io_msgs/claws_angle`	Right gripper closure
`/claws_touch_data`	`io_msgs/squashed_touch`	Gripper tactile data
`/realsense_left_hand/color/image_raw/compressed`	`sensor_msgs/CompressedImage`	RGB image from the left gripper camera
`/realsense_left_hand/depth/image_rect_raw`	`sensor_msgs/Image`	Depth image from the left gripper camera
`/realsense_right_hand/color/image_raw/compressed`	`sensor_msgs/CompressedImage`	RGB image from the right gripper camera
`/realsense_right_hand/depth/image_rect_raw`	`sensor_msgs/Image`	Depth image from the right gripper camera
`/usb_cam_fisheye/mjpeg_raw/compressed`	`sensor_msgs/CompressedImage`	RGB image from the main head fisheye camera
`/usb_cam_left/mjpeg_raw/compressed`	`sensor_msgs/CompressedImage`	RGB image from the main head left monocular camera
`/usb_cam_right/mjpeg_raw/compressed`	`sensor_msgs/CompressedImage`	RGB image from the main head right monocular camera
`/ee_visualization`	`sensor_msgs/CompressedImage`	End-effector poses visualized on the main head camera RGB image
`/touch_visualization`	`sensor_msgs/CompressedImage`	Visualization of the gripper tactile data
`/robot_description`	`std_msgs/String`	Motion-capture URDF
`/global_localization`	`geometry_msgs/PoseStamped`	Main head camera pose in the world frame
`/world_left_ee_pose`	`geometry_msgs/PoseStamped`	Left gripper pose in the world frame
`/world_right_ee_pose`	`geometry_msgs/PoseStamped`	Right gripper pose in the world frame

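Camera data:

Main head RGB-D camera: color and depth images
Left/right gripper cameras: RealSense RGB-D
Fisheye camera: panoramic view
Left/right monocular cameras: stereo vision

Note: When a tactile glove is used, an additional topic, /mocap/touch_data, is recorded.

Raw MCAP file summary: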

library:   mcap go v1.7.0                                              
profile:   ros1                                                        
messages:  45200                                                       
duration:  1m5.625866496s                                              
start:     2025-01-15T18:09:29.628202496+08:00 (1736935769.628202496)  
end:       2025-01-15T18:10:35.254068992+08:00 (1736935835.254068992)  
compression:
    zstd: [764/764 chunks] [6.13 GiB/3.84 GiB (37.39%)] [59.87 MiB/sec] 
channels:
    (1)  /rgbd/color/image_raw/compressed                  1970 msgs (30.02 Hz)   : sensor_msgs/CompressedImage [ros1msg]  
    (2)  /joint_states                                     1970 msgs (30.02 Hz)   : sensor_msgs/JointState [ros1msg]       
    (3)  /claws_r_hand                                     1970 msgs (30.02 Hz)   : io_msgs/claws_angle [ros1msg]          
    (4)  /global_localization                              1970 msgs (30.02 Hz)   : geometry_msgs/PoseStamped [ros1msg]    
    (5)  /robot_description                                   1 msgs              : std_msgs/String [ros1msg]              
    (6)  /ee_visualization                                 1970 msgs (30.02 Hz)   : sensor_msgs/CompressedImage [ros1msg]  
    (7)  /rgbd/depth/image_raw                             1970 msgs (30.02 Hz)   : sensor_msgs/Image [ros1msg]            
    (8)  /colorized_depth                                  1970 msgs (30.02 Hz)   : sensor_msgs/CompressedImage [ros1msg]  
    (9)  /claws_l_hand                                     1970 msgs (30.02 Hz)   : io_msgs/claws_angle [ros1msg]          
    (10) /claws_touch_data                                 1970 msgs (30.02 Hz)   : io_msgs/squashed_touch [ros1msg]       
    (11) /touch_visualization                              1970 msgs (30.02 Hz)   : sensor_msgs/CompressedImage [ros1msg]  
    (12) /mocap/sensor_data                                1970 msgs (30.02 Hz)   : io_msgs/squashed_mocap_data [ros1msg]  
    (13) /mocap/ros_tf                                     1970 msgs (30.02 Hz)   : tf2_msgs/TFMessage [ros1msg]           
    (14) /left_ee_pose                                     1970 msgs (30.02 Hz)   : geometry_msgs/PoseStamped [ros1msg]    
    (15) /right_ee_pose                                    1970 msgs (30.02 Hz)   : geometry_msgs/PoseStamped [ros1msg]    
    (16) /usb_cam_left/mjpeg_raw/compressed                1960 msgs (29.87 Hz)   : sensor_msgs/CompressedImage [ros1msg]  
    (17) /usb_cam_right/mjpeg_raw/compressed               1946 msgs (29.65 Hz)   : sensor_msgs/CompressedImage [ros1msg]  
    (18) /usb_cam_fisheye/mjpeg_raw/compressed             1957 msgs (29.82 Hz)   : sensor_msgs/CompressedImage [ros1msg]  
    (19) /realsense_left_hand/depth/image_rect_raw         1961 msgs (29.88 Hz)   : sensor_msgs/Image [ros1msg]            
    (20) /realsense_left_hand/color/image_raw/compressed   1961 msgs (29.88 Hz)   : sensor_msgs/CompressedImage [ros1msg]  
    (21) /realsense_right_hand/depth/image_rect_raw        1947 msgs (29.67 Hz)   : sensor_msgs/Image [ros1msg]            
    (22) /realsense_right_hand/color/image_raw/compressed  1947 msgs (29.67 Hz)   : sensor_msgs/CompressedImage [ros1msg]  
    (23) /world_left_ee_pose                               1970 msgs (30.02 Hz)   : geometry_msgs/PoseStamped [ros1msg]    
    (24) /world_right_ee_pose                              1970 msgs (30.02 Hz)   : geometry_msgs/PoseStamped [ros1msg]    
channels: 24
attachments: 0
metadata: 0
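
The per-channel rates in the summary are consistent with message count divided by recording duration, which makes it easy to spot channels that dropped frames. A minimal sketch follows; the helper names, the 30 Hz nominal rate, and the 1% tolerance are assumptions chosen for illustration.

```python
def message_rate_hz(message_count: int, duration_s: float) -> float:
    """Average channel rate: message count divided by recording duration."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return message_count / duration_s


def lagging_topics(counts, duration_s, nominal_hz, tolerance=0.01):
    """Topics whose average rate is more than `tolerance` (a fraction) below nominal."""
    threshold = nominal_hz * (1.0 - tolerance)
    return sorted(
        topic for topic, n in counts.items()
        if message_rate_hz(n, duration_s) < threshold
    )


DURATION_S = 65.625866496  # "duration: 1m5.625866496s" in the summary
counts = {
    "/joint_states": 1970,
    "/usb_cam_left/mjpeg_raw/compressed": 1960,
    "/usb_cam_right/mjpeg_raw/compressed": 1946,
    "/realsense_right_hand/depth/image_rect_raw": 1947,
}

for topic, n in counts.items():
    print(f"{topic}: {message_rate_hz(n, DURATION_S):.2f} Hz")

# Assumed nominal rate of 30 Hz with a 1% tolerance (threshold 29.7 Hz).
print(lagging_topics(counts, DURATION_S, nominal_hz=30.0))
```

With these numbers the two right-hand-side camera channels fall below the assumed threshold, while `/usb_cam_left/mjpeg_raw/compressed` at 29.87 Hz does not.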

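Natural-Language Annotation

The annotation.json file contains the semantic annotations for the task, which are used for training and for understanding task intent.

Main fields:

Field	Type	Description
`belong_to`	string	Identifier of the associated data file
`object_set`	array	All objects involved in the task
`scene`	string	Scene identifier
`skill_set`	array	Set of skill templates
`subtasks`	array	Sequence of subtasks
`task_description`	string	Task description

Skill template format:

pick {A} from {B} - pick up A from B
place {A} on {B} - place A on B
toss {A} into {B} - toss A into B

Subtask structure: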

{
  "skill": "pick {A} from {B}",
  "description": "pick the paper cup from the placemat with the left gripper",
  "description_zh": "左夹爪 从 餐垫 捡起 纸杯",
  "start_frame_id": 159,
  "end_frame_id": 227,
  "start_timestamp": "1736935774906000000",
  "end_timestamp": "1736935777206000000",
  "sequence_id": 1,
  "attempts": "success",
  "comment": ""
}
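
Subtask boundaries are given both as frame IDs and as timestamps stored as decimal strings. The sketch below computes a subtask's duration and frame count. It assumes the timestamps are nanoseconds since the Unix epoch (the values line up with the start time in the MCAP summary) and that both frame IDs are inclusive; the function names are hypothetical.

```python
import json

NS_PER_S = 1_000_000_000

subtask = json.loads("""
{
  "skill": "pick {A} from {B}",
  "start_frame_id": 159,
  "end_frame_id": 227,
  "start_timestamp": "1736935774906000000",
  "end_timestamp": "1736935777206000000",
  "sequence_id": 1,
  "attempts": "success"
}
""")


def subtask_duration_s(st: dict) -> float:
    # Convert the decimal strings to int first so no nanosecond precision is lost.
    start_ns = int(st["start_timestamp"])
    end_ns = int(st["end_timestamp"])
    if end_ns < start_ns:
        raise ValueError("end_timestamp precedes start_timestamp")
    return (end_ns - start_ns) / NS_PER_S


def subtask_frame_count(st: dict) -> int:
    # Assumption: start_frame_id and end_frame_id are both inclusive.
    return st["end_frame_id"] - st["start_frame_id"] + 1


print(subtask_duration_s(subtask))   # 2.3
print(subtask_frame_count(subtask))  # 69
```

Parsing the timestamps as integers, and not as floats, matters here: a 19-digit nanosecond value does not fit exactly in a 64-bit float.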

Full annotation data example:

{
  "belong_to": "20250115_InnerTest_PublicArea_TableClearing_szk_180926",
  "mocap_offset": [],
  "object_set": [
    "paper cup",
    "placemat",
    "trash can",
    "napkin",
    "plate",
    "dinner knife",
    "tableware storage box",
    "wine glass",
    "dinner fork"
  ],
  "scene": "PublicArea",
  "skill_set": [
    "pick {A} from {B}",
    "toss {A} into {B}",
    "place {A} on {B}"
  ],
  "subtasks": [
    {
      "skill": "pick {A} from {B}",
      "description": "pick the paper cup from the placemat with the left gripper",
      "description_zh": "左夹爪 从 餐垫 捡起 纸杯",
      "end_frame_id": 227,
      "end_timestamp": "1736935777206000000",
      "sequence_id": 1,
      "start_frame_id": 159,
      "start_timestamp": "1736935774906000000",
      "comment": "",
      "attempts": "success"
    },
    {
      "skill": "toss {A} into {B}",
      "description": "toss the paper cup into the trash can with the left gripper",
      "description_zh": "左夹爪 扔纸杯进垃圾桶",
      "end_frame_id": 318,
      "end_timestamp": "1736935780244000000",
      "sequence_id": 2,
      "start_frame_id": 231,
      "start_timestamp": "1736935777306000000",
      "comment": "",
      "attempts": "success"
    }
  ],
  "tag_set": [],
  "task_description": "20250115_InnerTest_PublicArea_TableClearing_szk_180926"
}

Teleoperated Robot Data Format

Teleoperated robot data records an operator controlling a robot through a VR device.

File layout

f"{robot_name}_{date}_{timestamp}_{sequence_id}"
├── RM_AIDAL_250124_172033_0.mcap    # multimodal data
├── RM_AIDAL_250124_172033_0.json    # annotation data
└── RM_AIDAL_250126_093648_0.metadata.yaml  # metadata

Multimodal data

Main topics:

Topic	Data type	Description
`/camera_01/color/image_raw/compressed`	`sensor_msgs/msg/CompressedImage`	RGB image from the main camera
`/camera_02/color/image_raw/compressed`	`sensor_msgs/msg/CompressedImage`	RGB image from the left camera
`/camera_03/color/image_raw/compressed`	`sensor_msgs/msg/CompressedImage`	RGB image from the right camera
`io_teleop/joint_states`	`sensor_msgs/msg/JointState`	Joint states
`io_teleop/joint_cmd`	`sensor_msgs/msg/JointState`	Joint commands
`io_teleop/target_ee_poses`	`geometry_msgs/msg/PoseArray`	Target end-effector poses
`io_teleop/target_base_move`	`std_msgs/msg/Float64MultiArray`	Target base motion
`io_teleop/target_gripper_status`	`sensor_msgs/msg/JointState`	Target gripper state
`io_teleop/target_joint_from_vr`	`sensor_msgs/msg/JointState`	Joint targets from the VR device
`/robot_description`	`std_msgs/msg/String`	Robot URDF description
`/tf`	`tf2_msgs/msg/TFMessage`	TF spatial pose transforms

Raw MCAP file summary:

Files:             RM_AIDAL_250126_091041_0.mcap
Bag size:          443.3 MiB
Storage id:        mcap
Duration:          100.052164792s
Start:             Jan 24 2025 21:37:32.526605552 (1737725852.526605552)
End:               Jan 24 2025 21:39:12.578770344 (1737725952.578770344)
Messages:          62116
Topic information: Topic: /camera_01/color/image_raw/compressed | Type: sensor_msgs/msg/CompressedImage | Count: 3000 | Serialization Format: cdr
                   Topic: /camera_02/color/image_raw/compressed | Type: sensor_msgs/msg/CompressedImage | Count: 3000 | Serialization Format: cdr
                   Topic: /camera_03/color/image_raw/compressed | Type: sensor_msgs/msg/CompressedImage | Count: 3000 | Serialization Format: cdr
                   Topic: io_teleop/joint_states | Type: sensor_msgs/msg/JointState | Count: 1529 | Serialization Format: cdr
                   Topic: io_teleop/joint_cmd | Type: sensor_msgs/msg/JointState | Count: 10009 | Serialization Format: cdr
                   Topic: io_teleop/target_ee_poses | Type: geometry_msgs/msg/PoseArray | Count: 10014 | Serialization Format: cdr
                   Topic: io_teleop/target_base_move | Type: std_msgs/msg/Float64MultiArray | Count: 10010 | Serialization Format: cdr
                   Topic: io_teleop/target_gripper_status | Type: sensor_msgs/msg/JointState | Count: 10012 | Serialization Format: cdr
                   Topic: io_teleop/target_joint_from_vr | Type: sensor_msgs/msg/JointState | Count: 10012 | Serialization Format: cdr
                   Topic: /robot_description | Type: std_msgs/msg/String | Count: 1 | Serialization Format: cdr
                   Topic: /tf | Type: tf2_msgs/msg/TFMessage | Count: 1529 | Serialization Format: cdr
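
The message counts above imply different rates per topic (roughly 30 Hz for the cameras, 100 Hz for the command topics, and 15 Hz for `io_teleop/joint_states` over about 100 s), so building per-frame training samples requires matching messages across topics. One common approach, shown here as a sketch and not necessarily what the platform's exporter does, is nearest-timestamp matching with a maximum allowed gap. The timestamps are synthetic and all names are hypothetical.

```python
from bisect import bisect_left
from typing import Optional


def nearest_index(sorted_ts: list, t: int) -> int:
    """Index of the entry in `sorted_ts` closest to `t` (ties go to the earlier one)."""
    if not sorted_ts:
        raise ValueError("no timestamps to match against")
    i = bisect_left(sorted_ts, t)
    if i == 0:
        return 0
    if i == len(sorted_ts):
        return len(sorted_ts) - 1
    before, after = sorted_ts[i - 1], sorted_ts[i]
    return i - 1 if t - before <= after - t else i


def align(reference_ts: list, other_ts: list, max_gap_ns: int) -> list:
    """For each reference timestamp, return the index of the nearest entry in
    `other_ts`, or None when that entry is further away than `max_gap_ns`."""
    matches: list = []
    for t in reference_ts:
        j: Optional[int] = nearest_index(other_ts, t)
        if abs(other_ts[j] - t) > max_gap_ns:
            j = None
        matches.append(j)
    return matches


# Synthetic nanosecond timestamps: a ~30 Hz camera and a 100 Hz command stream.
camera_ts = [0, 33_333_333, 66_666_667, 500_000_000]
command_ts = [i * 10_000_000 for i in range(11)]  # 0 ms to 100 ms

print(align(camera_ts, command_ts, max_gap_ns=5_000_000))  # [0, 3, 7, None]
```

The last camera frame has no command message within 5 ms, so it is reported as unmatched and can be dropped or interpolated by the caller.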

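Natural-Language Annotation

The annotation format for teleoperation data is the same as for human data: both describe, in natural language, which actions the robot or human performed and which objects were involved. In the example below, each subtask also carries `objecta`, `objectb`, and `options` fields and has no frame IDs.

Full teleoperation annotation data example: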

{
    "belong_to": "RM_AIDAL_250126_091041_0",
    "mocap_offset": [],
    "object_set": [
        "lemon candy",
        "plate",
        "pistachios"
    ],
    "scene": "250126",
    "skill_set": [
        "place {A} on {B}"
    ],
    "subtasks": [
        {
            "skill": "place {A} on {B}",
            "objecta": "lemon candy",
            "objectb": "plate",
            "options": [
                "leftHand"
            ],
            "description": "place the lemon candy on the plate with the left hand",
            "end_timestamp": "1737725886915000000",
            "sequence_id": 1,
            "start_timestamp": "1737725880757000000",
            "comment": "",
            "attempts": "success"
        },
        {
            "skill": "place {A} on {B}",
            "objecta": "pistachios",
            "objectb": "plate",
            "options": [
                "rightHand"
            ],
            "description": "place the pistachios on the plate with the right hand",
            "end_timestamp": "1737725950745000000",
            "sequence_id": 2,
            "start_timestamp": "1737725941657000000",
            "comment": "",
            "attempts": "success"
        }
    ],
    "tag_set": [],
    "task_description": "20250205_RM_ItemPacking_zhouxw"
}
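
In this example the `objecta` and `objectb` fields appear to supply the values for the `{A}` and `{B}` placeholders of the skill template. The sketch below fills the template and cross-checks each subtask against `skill_set` and `object_set`. Treating those two sets as closed vocabularies is an assumption, and `render_skill` and `validate` are hypothetical helpers.

```python
def render_skill(subtask: dict) -> str:
    """Substitute objecta/objectb into the {A}/{B} placeholders of the template."""
    return subtask["skill"].format(A=subtask["objecta"], B=subtask["objectb"])


def validate(annotation: dict) -> list:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    for st in annotation["subtasks"]:
        sid = st.get("sequence_id")
        # Assumption: every subtask skill must be listed in skill_set.
        if st["skill"] not in annotation["skill_set"]:
            problems.append(f"subtask {sid}: unknown skill {st['skill']!r}")
        # Assumption: every referenced object must be listed in object_set.
        for key in ("objecta", "objectb"):
            if key in st and st[key] not in annotation["object_set"]:
                problems.append(f"subtask {sid}: {st[key]!r} not in object_set")
    return problems


annotation = {
    "object_set": ["lemon candy", "plate", "pistachios"],
    "skill_set": ["place {A} on {B}"],
    "subtasks": [
        {"skill": "place {A} on {B}", "objecta": "lemon candy",
         "objectb": "plate", "sequence_id": 1},
        {"skill": "place {A} on {B}", "objecta": "pistachios",
         "objectb": "plate", "sequence_id": 2},
    ],
}

print(render_skill(annotation["subtasks"][0]))  # place lemon candy on plate
print(validate(annotation))                     # []
```

The rendered string is only a skeleton: the stored `description` adds articles and the hand used ("place the lemon candy on the plate with the left hand").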

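Exporting Model Training Data

To simplify model training, the platform provides several export functions that convert the raw MCAP and JSON data into formats suited to machine learning.

The common HDF5 and LeRobot formats can be exported with one click. The export adapts automatically to different robots and sensor counts, so no manual configuration is required.

HDF5 format

HDF5 is well suited to large-scale data storage and fast access, and it organizes data hierarchically.

File structure:

chunk_001.hdf5
├── /data/                    # data group
│   ├── episode_001/         # first task sequence
│   │   ├── action           # joint commands (multi-dimensional array)
│   │   ├── observation.state # sensor observations
│   │   ├── observation.gripper # gripper state
│   │   └── observation.images.* # multi-view images
│   └── episode_002/         # second task sequence
└── /meta/                   # metadata group

Data contents:

action - joint control commands (float32 array)
observation.state - sensor observations (float32 array)
observation.images.* - compressed image data (JPEG format)
observation.gripper - gripper state (float32 array)
task - natural-language description in English
task_zh - natural-language description in Chinese
score - motion quality score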
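
Because the number of cameras varies per robot, a consumer usually has to discover the `observation.images.*` entries of each episode at load time. The sketch below does this over any dict-like object. Plain dictionaries stand in for HDF5 groups so that the example runs without extra dependencies; an h5py group, which also supports `keys()`, `in`, and indexing, could be passed in the same way. The camera names and helper names are hypothetical.

```python
from collections.abc import Mapping

IMAGE_PREFIX = "observation.images."


def camera_names(episode: Mapping) -> list:
    """Camera names in one episode group, taken from keys observation.images.<name>."""
    return sorted(
        key[len(IMAGE_PREFIX):] for key in episode.keys()
        if key.startswith(IMAGE_PREFIX)
    )


def missing_fields(episode: Mapping, required=("action", "observation.state")) -> list:
    """Required dataset names that are absent from the episode group."""
    return [name for name in required if name not in episode]


# Stand-in for the /data group; values would be arrays in a real file.
data = {
    "episode_001": {
        "action": [],
        "observation.state": [],
        "observation.gripper": [],
        "observation.images.cam_high": [],        # hypothetical camera name
        "observation.images.cam_left_wrist": [],  # hypothetical camera name
    },
    "episode_002": {
        "action": [],
        "observation.images.cam_high": [],
    },
}

for name, episode in sorted(data.items()):
    print(name, camera_names(episode), missing_fields(episode))
```

Checking for missing fields before training avoids failures deep inside a data loader when one episode was exported without a sensor.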

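LeRobot format

The LeRobot format is a widely used dataset format in robot learning and is compatible with mainstream robot learning frameworks.

Reference sample data: https://huggingface.co/datasets/io-intelligence/piper_uncap_pen

Feature definitions:

The length and shape of the exported LeRobot dataset adapt automatically, supporting any number of cameras and joints. The shapes shown here are from the export for the 松霊 (AgileX) desktop 7-DoF robot arm:

Feature	Data type	Shape	Description
`action`	float32	[14]	Joint commands (7 joints for each of the left and right arms)
`observation.state`	float32	[14]	Joint states (7 joints for each of the left and right arms)
`observation.images.cam_high`	image	[3,480,640]	High camera image
`observation.images.cam_low`	image	[3,480,640]	Low camera image
`observation.images.cam_left_wrist`	image	[3,480,640]	Left wrist camera image
`observation.images.cam_right_wrist`	image	[3,480,640]	Right wrist camera image
`timestamp`	float32	[1]	Timestamp
`frame_index`	int64	[1]	Frame index
`episode_index`	int64	[1]	Task sequence (episode) index

Full LeRobot format definition example: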

{
    "codebase_version": "v2.1",
    "robot_type": "aloha",
    "total_episodes": 10,
    "total_frames": 3000,
    "total_tasks": 1,
    "total_videos": 0,
    "total_chunks": 1,
    "chunks_size": 1000,
    "fps": 15,
    "splits": {
        "train": "0:10"
    },
    "data_path": "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet",
    "video_path": "videos/chunk-{episode_chunk:03d}/{video_key}/episode_{episode_index:06d}.mp4",
    "features": {
        "observation.state": {
            "dtype": "float32",
            "shape": [14],
            "names": [
                [
                    "right_waist",
                    "right_shoulder",
                    "right_elbow",
                    "right_forearm_roll",
                    "right_wrist_angle",
                    "right_wrist_rotate",
                    "right_gripper",
                    "left_waist",
                    "left_shoulder",
                    "left_elbow",
                    "left_forearm_roll",
                    "left_wrist_angle",
                    "left_wrist_rotate",
                    "left_gripper"
                ]
            ]
        },
        "action": {
            "dtype": "float32",
            "shape": [14],
            "names": [
                [
                    "right_waist",
                    "right_shoulder",
                    "right_elbow",
                    "right_forearm_roll",
                    "right_wrist_angle",
                    "right_wrist_rotate",
                    "right_gripper",
                    "left_waist",
                    "left_shoulder",
                    "left_elbow",
                    "left_forearm_roll",
                    "left_wrist_angle",
                    "left_wrist_rotate",
                    "left_gripper"
                ]
            ]
        },
        "observation.images.cam_high": {
            "dtype": "image",
            "shape": [3, 480, 640],
            "names": ["channels", "height", "width"]
        },
        "observation.images.cam_low": {
            "dtype": "image",
            "shape": [3, 480, 640],
            "names": ["channels", "height", "width"]
        },
        "observation.images.cam_left_wrist": {
            "dtype": "image",
            "shape": [3, 480, 640],
            "names": ["channels", "height", "width"]
        },
        "observation.images.cam_right_wrist": {
            "dtype": "image",
            "shape": [3, 480, 640],
            "names": ["channels", "height", "width"]
        },
        "timestamp": {
            "dtype": "float32",
            "shape": [1],
            "names": null
        },
        "frame_index": {
            "dtype": "int64",
            "shape": [1],
            "names": null
        },
        "episode_index": {
            "dtype": "int64",
            "shape": [1],
            "names": null
        },
        "index": {
            "dtype": "int64",
            "shape": [1],
            "names": null
        },
        "task_index": {
            "dtype": "int64",
            "shape": [1],
            "names": null
        }
    }
}
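
The `data_path` entry in this definition is a Python format string, so the file for a given episode can be located directly from the metadata. The sketch below resolves it and derives the total recording time from `total_frames` and `fps`. It assumes episodes are assigned to chunks by integer division of the episode index by `chunks_size`; the helper names are hypothetical.

```python
info = {
    "total_frames": 3000,
    "chunks_size": 1000,
    "fps": 15,
    "data_path": "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet",
}


def episode_data_path(info: dict, episode_index: int) -> str:
    # Assumption: chunk number = episode_index // chunks_size.
    chunk = episode_index // info["chunks_size"]
    return info["data_path"].format(episode_chunk=chunk, episode_index=episode_index)


def total_duration_s(info: dict) -> float:
    """Total recorded time implied by the frame count and frame rate."""
    return info["total_frames"] / info["fps"]


print(episode_data_path(info, 7))     # data/chunk-000/episode_000007.parquet
print(episode_data_path(info, 1234))  # data/chunk-001/episode_001234.parquet
print(total_duration_s(info))         # 200.0
```

With 10 episodes and a chunk size of 1000, every episode of the example dataset lands in `chunk-000`.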
