Data Management
Feature Overview
Data management is the core module of the IO data platform and covers the full data lifecycle. Here you can manage all data files in one place and search, filter, preview, annotate, and batch-process them. It is the starting point of the data annotation workflow.

Main Features
Data Browsing and Retrieval
Project Filtering
Data management supports multiple project views: you can view data from all projects, select specific projects to view their data, access personal private data, or browse team shared data. This flexible filtering approach allows users of different roles to quickly find required data.
Advanced Search Function
You can search data names by fuzzy or exact match and filter by source robot, annotation tag, upload time, and file format (MCAP, BAG, video, audio, images). These conditions can be combined to help you precisely locate target data.
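To illustrate how combined conditions narrow a result set, here is a minimal sketch. It is not the platform's implementation: the `DataFile` record, its field names, and the `search` function are hypothetical, and fuzzy matching is assumed to mean a case-insensitive substring match.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class DataFile:
    # Hypothetical record; field names are illustrative, not the platform's schema.
    name: str
    robot: str
    file_format: str  # e.g. "MCAP", "BAG", "video", "audio", "image"
    uploaded_at: datetime
    tags: set = field(default_factory=set)


def search(files, name_contains=None, name_exact=None, robot=None,
           tag=None, uploaded_after=None, file_format=None):
    """Return the files that satisfy every condition that was supplied."""
    results = []
    for f in files:
        # Assumed fuzzy match: case-insensitive substring of the name.
        if name_contains is not None and name_contains.lower() not in f.name.lower():
            continue
        if name_exact is not None and f.name != name_exact:
            continue
        if robot is not None and f.robot != robot:
            continue
        if tag is not None and tag not in f.tags:
            continue
        if uploaded_after is not None and f.uploaded_at < uploaded_after:
            continue
        if file_format is not None and f.file_format != file_format:
            continue
        results.append(f)
    return results
```

Each condition left as `None` is ignored, so adding a condition can only shrink the result set, which matches the way combined filters behave.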
Status Filtering
You can filter by assignment status (assigned/unassigned), annotation status (annotated/not annotated), and quality status (high quality/low quality/pending review) to quickly find data that meets specific conditions.
Data Preview and Playback
Data Preview
The system displays thumbnails so you can quickly browse data content. Alongside each thumbnail it shows basic information (file size, duration, upload time) and metadata (robot information, collection parameters) to give you a full picture of the data.
Online Playback
Video files, audio files, and MCAP robot data (as a visualization) can be played back online. The player provides pause, fast-forward, slow-motion, and loop controls so you can view and analyze the data flexibly.
Batch Operation Functions
Data Management Operations
The following batch operations make it faster to manage many datasets at once:
Batch Data Management:
- Batch Rename - Batch modify dataset names
- View Statistics - Batch view dataset statistical information (size, duration, annotation count, etc.)
- Manage Tags - Batch add or remove dataset tags
- Delete Data - Batch delete datasets (soft delete, can be recovered from trash)
- Import External Data - Batch import external data sources
- Associate Robots - Batch associate datasets with robot devices
Batch Annotation Operations:
- Create Annotation Task - After selecting multiple datasets, create annotation task with one click
- Append to Existing Task - Append datasets to existing annotation tasks
- View Annotation Progress - Batch view dataset annotation completion status
Dataset Re-upload and Recovery:
The platform supports intelligent dataset recovery functionality. When you re-upload a previously deleted (soft-deleted) dataset:
- Auto Detection - The system automatically detects if a soft-deleted dataset with the same name exists
- Recovery Options - If a soft-deleted dataset with the same name is detected during upload, you can choose:
  - Recover Existing Dataset - Recover the soft-deleted dataset, preserving all historical information
    - Preserve original annotation data (dataset_markers)
    - Preserve task associations (dataset_tasks)
    - Preserve dataset tags and metadata
    - Preserve access and operation logs
  - Create New Dataset - Ignore the soft-deleted dataset and create a new dataset record
- Seamless Recovery - Recovered datasets are immediately removed from trash and restored to normal use status
Dataset Recovery Recommendations:
- If data was accidentally deleted, we recommend selecting "Recover Existing Dataset" to preserve all historical information
- If it's a new data file, you can choose "Create New Dataset"
- Recovery operations preserve all important data such as annotations and task associations
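The re-upload decision described above can be sketched as follows. This is an illustration under stated assumptions, not the platform's code: the `Dataset` class, the in-memory `store` list, and the `handle_upload` function are hypothetical, and only the `markers` and `tasks` fields loosely mirror the `dataset_markers` and `dataset_tasks` names mentioned in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    # Hypothetical record used only for this sketch.
    name: str
    deleted: bool = False                         # True while the dataset is in the trash
    markers: list = field(default_factory=list)   # annotation data
    tasks: list = field(default_factory=list)     # task associations


def find_soft_deleted(store, name):
    """Return the soft-deleted dataset with this name, or None."""
    for ds in store:
        if ds.deleted and ds.name == name:
            return ds
    return None


def handle_upload(store, name, recover_existing):
    """Recover a matching soft-deleted dataset or create a new record."""
    existing = find_soft_deleted(store, name)
    if existing is not None and recover_existing:
        # Recovery only clears the deleted flag, so history stays attached.
        existing.deleted = False
        return existing
    # No match, or the user chose to ignore it: start a fresh record.
    new_dataset = Dataset(name=name)
    store.append(new_dataset)
    return new_dataset
```

Because recovery reuses the existing record instead of copying it, annotations and task associations remain in place without any extra migration step.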
Annotation Related Operations
After selecting data, you can create annotation tasks with one click or append the data to existing annotation tasks. You can also view the annotation results, progress, and quality statistics for the selected data.
Data Download and Export
File Download
You can download original data files and converted MCAP files, or download multiple files at once as a ZIP archive. Both individual files and batches of data are easy to obtain.
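As a rough illustration of packaging several files into one ZIP archive, the sketch below uses Python's standard `zipfile` module. The `build_zip` function and the example file names are hypothetical; how the platform builds its archives is not documented here.

```python
import io
import zipfile


def build_zip(files):
    """Pack a mapping of archive name -> bytes into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    # The archive is complete only after the ZipFile context has closed.
    return buffer.getvalue()


def list_zip(data):
    """Return the names stored in a ZIP archive given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()
```

For example, `build_zip({"run_01.mcap": b"...", "run_02.mcap": b"..."})` returns bytes that can be written to disk or sent as a download.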
Data Export
You can export annotation results, statistical reports, and metadata. Exported data can be used directly for model training, data analysis, or other purposes.
Data Quality Monitoring
Quality Indicators
The system continuously monitors key indicators such as annotation completion rate, quality pass rate, annotation efficiency, abnormal data, helping you comprehensively understand data quality status.
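The exact definitions of these indicators are not given here. As a sketch only, the example below assumes that the completion rate is the share of datasets that are annotated and that the pass rate is the share of reviewed datasets rated high quality. The record layout and the `quality_summary` function are hypothetical.

```python
def rate(numerator, denominator):
    """Return numerator / denominator, or 0.0 when there is nothing to count."""
    return numerator / denominator if denominator else 0.0


def quality_summary(records):
    """Compute assumed completion and pass rates from simple dict records."""
    total = len(records)
    annotated = sum(1 for r in records if r["annotated"])
    # Assumption: "pending" items are not yet reviewed and are excluded
    # from the pass-rate denominator.
    reviewed = [r for r in records if r.get("quality") in ("high", "low")]
    passed = sum(1 for r in reviewed if r["quality"] == "high")
    return {
        "completion_rate": rate(annotated, total),
        "pass_rate": rate(passed, len(reviewed)),
    }
```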
Quality Analysis
Through functions such as quality trend analysis, annotator performance comparison, problem analysis, you can deeply understand data quality change patterns, identify improvement opportunities, and improve overall annotation quality.
Metadata Synchronization and Validation
The platform provides powerful metadata synchronization functionality to ensure dataset information accuracy:
Automatic Metadata Extraction:
- After successful upload, the system automatically extracts dataset metadata
- Including file size, duration, start time, end time and other information
- For MCAP format, extracts complete message statistics and topic information
Manual Metadata Synchronization:
- If metadata is inaccurate or out of date, you can trigger metadata synchronization manually
- The system re-reads the files and updates the metadata
- Metadata for multiple datasets can be synchronized in a batch
Metadata Validation:
- Automatically validate file integrity
- Detect if files are corrupted or have format errors
- Corrupted files are marked with an error status, and the error information is recorded
Error Handling:
- Automatically detect permanent errors (such as file corruption) and temporary errors
- Permanent errors are marked with an error status immediately and are not retried
- Temporary errors can be manually retried for synchronization
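The distinction between permanent and temporary errors can be sketched as below. The exception classes, the `SyncStatus` values, and the `sync_metadata` function are hypothetical names chosen for illustration. The platform's actual error types and status names are not documented here.

```python
from enum import Enum


class SyncStatus(Enum):
    OK = "ok"
    ERROR = "error"          # permanent failure: not retried
    RETRYABLE = "retryable"  # temporary failure: may be retried manually


class CorruptFileError(Exception):
    """Hypothetical permanent error, e.g. a corrupted file."""


class TemporaryError(Exception):
    """Hypothetical temporary error, e.g. storage briefly unavailable."""


def sync_metadata(read_file):
    """Run one synchronization attempt and classify the outcome.

    read_file is any callable that returns a metadata dict or raises.
    """
    try:
        metadata = read_file()
    except CorruptFileError as exc:
        # Retrying cannot fix a damaged file, so record the error and stop.
        return SyncStatus.ERROR, str(exc)
    except TemporaryError as exc:
        # Leave the dataset in a state from which a manual retry makes sense.
        return SyncStatus.RETRYABLE, str(exc)
    return SyncStatus.OK, metadata
```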
Metadata Synchronization Notes:
- Synchronization may take some time, especially for large files
- Synchronization does not affect existing annotation data
- If a file is corrupted, synchronization fails and the dataset is marked with an error status
Trash and Data Recovery
The platform uses a soft-delete mechanism. Deleted datasets are moved to the trash and can be recovered:
Trash Functionality:
- Centrally manage all deleted datasets
- Display the deletion time, the user who deleted the dataset, and other information
- Support filtering by type, time and other conditions
- Support searching deleted datasets
Data Recovery:
- Can recover accidentally deleted datasets from trash
- After recovery, all historical information is preserved
- Support single recovery and batch recovery
- Recovered datasets are automatically removed from trash
Permanent Deletion:
- Administrators can permanently delete data in trash
- Permanently deleted data cannot be recovered, so proceed with caution
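The trash lifecycle above (soft delete, recovery, permanent deletion) can be sketched with an in-memory store. The `DatasetStore` class, its method names, and the `is_admin` flag are assumptions made for this example and do not describe the platform's API.

```python
from datetime import datetime, timezone


class DatasetStore:
    """Hypothetical in-memory model of active datasets and the trash."""

    def __init__(self):
        self.active = {}  # name -> dataset
        self.trash = {}   # name -> entry with dataset, deleted_at, deleted_by

    def soft_delete(self, name, user):
        # Move the dataset to the trash and record who deleted it and when.
        dataset = self.active.pop(name)
        self.trash[name] = {
            "dataset": dataset,
            "deleted_at": datetime.now(timezone.utc),
            "deleted_by": user,
        }

    def restore(self, names):
        # Handles single and batch recovery; the dataset object is untouched,
        # so its historical information is preserved.
        for name in names:
            entry = self.trash.pop(name)
            self.active[name] = entry["dataset"]

    def purge(self, name, is_admin):
        # Permanent deletion is limited to administrators and cannot be undone.
        if not is_admin:
            raise PermissionError("only administrators can permanently delete")
        del self.trash[name]
```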
Task Queue Integration:
- Dataset metadata synchronization and other operations use task queue processing
- Can monitor synchronization progress through task queue management functionality
- Support queue pause, resume, cleanup and other operations (see task queue management documentation for details)
Administrator
As a platform administrator, you can view and manage data from all projects, monitor overall data quality, assign data to different projects, and perform system maintenance such as cleaning up invalid data and optimizing storage space.
Project Manager
Project managers can manage data for the projects they are responsible for, select data to create annotation tasks, monitor annotation progress, and ensure annotation quality. The data management module gives project managers a complete view of project data status.