Data Export
Feature Overview
Data export is the data delivery module of the IO data platform. It exports annotated data in several standard formats, including JSON, CSV, HDF5, LeRobot, and MCAP. Flexible filtering, batch export, and export history management let you deliver annotated data to downstream systems in the most suitable format for model training, data analysis, and other application scenarios.

Main Features
Multi-format Export Support
Standard Data Formats
Supports export to several standard data formats: JSON (structured data), CSV (tabular data), HDF5 (scientific computing data), LeRobot (robot learning data), and MCAP (multimodal data). These formats cover most downstream application needs.
Custom Formats
Export formats can be customized to specific needs, including field selection, data conversion, and format configuration. Custom formats cover export needs in special scenarios that the standard formats do not address.
Format Conversion
Converts data from one format to another so that it stays compatible across different systems. The conversion process supports data validation and quality checks.
Flexible Filtering Function
Multi-dimensional Filtering
Filters data by dimensions such as project, time, annotator, and quality level, so you can select precisely the data that needs to be exported.
Advanced Filtering
Supports complex combinations of filter conditions, including logical operations, range filtering, and fuzzy matching, for precise control over the range of exported data.
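The way logical operations, range filtering, and fuzzy matching combine can be illustrated with a minimal Python sketch. This is not the platform's API: the Record fields and the helpers all_of, any_of, in_range, and fuzzy are hypothetical names chosen for illustration, and fuzzy matching is approximated here with case-insensitive wildcard patterns.

```python
from dataclasses import dataclass
from datetime import date
from fnmatch import fnmatchcase
from typing import Callable


@dataclass(frozen=True)
class Record:
    """Hypothetical annotated-data record; field names are assumptions."""
    project: str
    annotator: str
    quality: int
    annotated_on: date


Predicate = Callable[[Record], bool]


def all_of(*predicates: Predicate) -> Predicate:
    """Logical AND: every condition must hold."""
    return lambda record: all(p(record) for p in predicates)


def any_of(*predicates: Predicate) -> Predicate:
    """Logical OR: at least one condition must hold."""
    return lambda record: any(p(record) for p in predicates)


def in_range(field: str, low, high) -> Predicate:
    """Range filter, inclusive on both ends."""
    return lambda record: low <= getattr(record, field) <= high


def fuzzy(field: str, pattern: str) -> Predicate:
    """Wildcard match, case-insensitive (a stand-in for fuzzy matching)."""
    return lambda record: fnmatchcase(
        getattr(record, field).lower(), pattern.lower()
    )


records = [
    Record("pick-place", "alice", 5, date(2024, 3, 1)),
    Record("pick-place", "bob", 2, date(2024, 3, 5)),
    Record("door-open", "alicia", 4, date(2024, 4, 2)),
]

# quality 4..5 AND annotated in March-April AND (annotator ali* OR project door-*)
export_filter = all_of(
    in_range("quality", 4, 5),
    in_range("annotated_on", date(2024, 3, 1), date(2024, 4, 30)),
    any_of(fuzzy("annotator", "ali*"), fuzzy("project", "door-*")),
)

selected = [r for r in records if export_filter(r)]
print([r.annotator for r in selected])
```

Representing each condition as a small function keeps the combinations composable: a new kind of condition only needs to return a predicate to work with all_of and any_of.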
Preview Function
Lets you preview the filter results before exporting and confirm that the data to be exported meets expectations, which avoids unnecessary export operations.
Batch Export Management
Batch Processing
Supports batch export of multiple datasets and can process multiple export tasks simultaneously, which improves export efficiency. Batch processing is particularly suited to large-scale data export.
Task Queue
Manages an export task queue in which multiple export tasks are queued and executed, so that large numbers of export requests are processed in order.
Progress Monitoring
Monitors export progress in real time, including completed quantity, processing speed, and estimated completion time, so you can follow the export status as it changes.
Export Status:
- pending - Export task has been created and is waiting for execution
- processing - Export task is currently being executed
- completed - Export task has been successfully completed and files have been generated
- failed - Export task execution failed; error information can be viewed
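The four statuses above form a small state machine. The sketch below models it in Python under one assumption that the text does not confirm: a task only moves pending → processing → completed or failed, and never from pending directly to failed. The names ExportStatus, ALLOWED_TRANSITIONS, advance, and is_terminal are hypothetical.

```python
from enum import Enum


class ExportStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed transition table; completed and failed are terminal.
ALLOWED_TRANSITIONS = {
    ExportStatus.PENDING: {ExportStatus.PROCESSING},
    ExportStatus.PROCESSING: {ExportStatus.COMPLETED, ExportStatus.FAILED},
    ExportStatus.COMPLETED: set(),
    ExportStatus.FAILED: set(),
}


def advance(current: ExportStatus, new: ExportStatus) -> ExportStatus:
    """Return the new status, or raise if the transition is not allowed."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new


def is_terminal(status: ExportStatus) -> bool:
    """A status is terminal when no further transition is allowed."""
    return not ALLOWED_TRANSITIONS[status]


status = ExportStatus("pending")  # e.g. parsed from a status string
status = advance(status, ExportStatus.PROCESSING)
status = advance(status, ExportStatus.COMPLETED)
print(status.value, is_terminal(status))
```

A client that refreshes task status can stop polling once is_terminal returns True.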
Progress Information:
- Real-time display of export progress percentage
- Display number of processed datasets and total count
- Display estimated remaining time
- Support automatic refresh of progress status
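As an illustration of how a progress percentage and a remaining-time estimate can be derived from the processed and total dataset counts, here is a minimal sketch. It assumes a constant processing rate, which is a simplification; the platform's actual estimation method is not documented here, and the function names are hypothetical.

```python
from typing import Optional


def progress_percent(processed: int, total: int) -> float:
    """Percentage of datasets processed; 0.0 when the total is unknown."""
    if total <= 0:
        return 0.0
    return 100.0 * processed / total


def estimate_remaining_seconds(
    processed: int, total: int, elapsed_seconds: float
) -> Optional[float]:
    """Remaining time assuming the average rate so far continues.

    Returns None when no rate can be computed yet.
    """
    if processed <= 0 or elapsed_seconds <= 0:
        return None
    rate = processed / elapsed_seconds  # datasets per second
    return (total - processed) / rate


print(progress_percent(25, 100))
print(estimate_remaining_seconds(25, 100, 50.0))
```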
Export History Management
History Records
The platform records the full history of export operations and supports viewing and managing all export tasks:
Record Information:
- Export Time - Creation time, start time, completion time
- Export Format - Format of the exported data (HDF5, LeRobot, MCAP, JSON, CSV, etc.)
- Data Volume - Number of datasets included and file size
- Operator - User information of the person who performed the export
- Export Status - Current export task status (pending, processing, completed, failed)
- File Information - Export file name, size, storage location
History Record Features:
- Support filtering export records by conditions such as time, format, status
- Support searching for specific export tasks
- Display detailed information of export tasks, including list of datasets included
- Support viewing error information of export tasks (if failed)
MCAP Export History and Progress
MCAP export functionality provides complete history records and real-time progress monitoring:
Export History List:
- Display list of all MCAP export tasks
- Each record shows export status, creation time, number of datasets included, etc.
- Support expanding to view detailed information and list of datasets included
- Support filtering by conditions such as status, time
Real-time Progress Tracking:
- Status Monitoring - Real-time update of export task status (pending → processing → completed/failed)
- Progress Display - For tasks in progress, display real-time progress bar
- Auto Refresh - System automatically detects task status changes and updates display
- Error Handling - If task fails, display detailed error information
Export Results:
- File Download - After the export completes, the generated MCAP files can be downloaded directly
- File Information - Display export file size, compression format, etc.
- Storage Location - Display file location in cloud storage
- Training Integration - Exported MCAP files can be directly used for model training
MCAP Export Recommendations:
- When exporting large amounts of data, export in batches to improve the success rate
- Previous export records can be viewed in the export history
- If an export fails, view the error information and re-export
- Exported MCAP files are automatically compressed into a tar.gz archive to save space
HDF5 Export Functionality
HDF5 is an efficient data storage format, and the platform provides dedicated HDF5 export functionality:
Export Configuration:
- Chunk Size - Sets the number of original files each HDF5 file contains
  - A value of 1 means each original file corresponds to one HDF5 file (one-to-one mapping)
  - Larger values merge multiple files into one HDF5 file
  - Choose a value based on data volume and training requirements
- Data Refresh Frequency (Hz) - Controls the number of data samples per second, which affects file size
  - Default is 30 Hz, suitable for most scenarios
  - Reduce the frequency to reduce file size
  - Increase the frequency for denser sampling
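The effect of the two settings can be sketched in Python. This is an illustration rather than the platform's implementation: plan_chunks and samples_for are hypothetical helpers, the numbering from 001 follows the chunk_001.hdf5 naming example, and the assumption is that files are grouped in their listed order.

```python
def plan_chunks(files: list[str], chunk_size: int) -> dict[str, list[str]]:
    """Group original files into HDF5 outputs of at most chunk_size files."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    plan: dict[str, list[str]] = {}
    for start in range(0, len(files), chunk_size):
        name = f"chunk_{start // chunk_size + 1:03d}.hdf5"
        plan[name] = files[start:start + chunk_size]
    return plan


def samples_for(duration_seconds: float, frequency_hz: float = 30.0) -> int:
    """Number of samples a recording yields at the given refresh frequency."""
    return int(duration_seconds * frequency_hz)


episodes = ["ep_a", "ep_b", "ep_c", "ep_d", "ep_e"]

# Chunk size 1: one HDF5 file per original file.
print(len(plan_chunks(episodes, 1)))

# Chunk size 2: files are merged pairwise; the last chunk holds the remainder.
print(plan_chunks(episodes, 2))

# Halving the frequency halves the sample count for a 10-second recording.
print(samples_for(10), samples_for(10, 15))
```

Since the sample count scales linearly with the frequency, halving the frequency roughly halves the sampled data written to each file.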
Export Statistics:
- Display number of selected datasets
- Display export quota usage
- Display export progress and estimated completion time
Export Results:
- Exported HDF5 files are named by original file groups (e.g., chunk_001.hdf5)
- Files are automatically compressed to tar.gz format
- Support direct download or save to cloud storage
- Exported HDF5 files can be directly used for model training
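Before training, a downloaded tar.gz archive has to be unpacked. The sketch below uses only the Python standard library to pull the .hdf5 members out of an archive. The internal layout of the archive is an assumption, and extract_hdf5_members is a hypothetical helper, not a platform tool.

```python
import shutil
import tarfile
from pathlib import Path


def extract_hdf5_members(archive_path, dest_dir) -> list[Path]:
    """Copy every regular .hdf5 member of a tar.gz archive into dest_dir.

    Members are written under their base name only, so paths stored in the
    archive cannot escape dest_dir. Members sharing a base name would
    overwrite each other.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted: list[Path] = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile() or not member.name.endswith(".hdf5"):
                continue
            source = archive.extractfile(member)
            if source is None:
                continue
            target = dest / Path(member.name).name
            with source, open(target, "wb") as out:
                shutil.copyfileobj(source, out)
            extracted.append(target)
    return extracted
```

For example, extract_hdf5_members("export.tar.gz", "data/") returns the paths of the extracted files, which can then be opened with an HDF5 reader of your choice. The archive name here is a placeholder.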
HDF5 Export Details:
- For more information about HDF5 format and data structure, refer to: HDF5 Dataset Documentation
- HDF5 files use hierarchical structure to organize data, supporting multimodal data storage
- Exported HDF5 files contain complete annotation information (task descriptions, action sequences, etc.)
Version Management
Supports version management of exported data: different versions of export results can be saved for backtracking and comparison, which keeps the data traceable.
Permission Control
Provides fine-grained permission control: different export permissions can be set for different users and different data. This keeps data secure and prevents unauthorized exports.
Export Quota Management:
- System supports export quota limits to prevent resource abuse
- Display current user's export quota usage
- Administrators can configure global export quota limits
- When the quota is exceeded, the system shows a prompt and blocks the export operation
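The quota behavior described above can be modeled with a short sketch. The unit of the quota is not specified in this document, so the example assumes it is counted in datasets. ExportQuota and QuotaExceededError are hypothetical names.

```python
from dataclasses import dataclass


class QuotaExceededError(Exception):
    """Raised when an export would exceed the configured quota."""


@dataclass
class ExportQuota:
    limit: int      # assumed unit: number of datasets
    used: int = 0

    @property
    def remaining(self) -> int:
        return max(self.limit - self.used, 0)

    def reserve(self, dataset_count: int) -> None:
        """Block the export if it does not fit, otherwise record the usage."""
        if dataset_count > self.remaining:
            raise QuotaExceededError(
                f"export of {dataset_count} datasets exceeds the remaining "
                f"quota of {self.remaining}"
            )
        self.used += dataset_count


quota = ExportQuota(limit=10, used=8)
quota.reserve(2)            # fits exactly
print(quota.remaining)

try:
    quota.reserve(1)        # nothing left, so the export is blocked
except QuotaExceededError as error:
    print(f"blocked: {error}")
```

Checking the quota before the task is queued, rather than when it runs, means a blocked export never consumes processing resources.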
Applicable Roles
Administrator
As a platform administrator, you can deliver training data or data required for downstream analysis to external parties, manage export tasks, monitor export progress, and control data export permissions. These functions keep the platform's data delivery service secure and efficient.
Project Manager
Project managers can export data related to projects, prepare data for project delivery, monitor data usage, and coordinate data export work. Through data export management, project managers can effectively control project data delivery.