Model Inference
The Embodiflow Data Platform provides comprehensive model inference services, supporting one-click deployment of trained robot learning models as production-grade services. It supports multiple model formats and flexible deployment methods, covering the full range of robot AI inference scenarios from cloud to edge.
Product Features
The platform provides a complete pipeline from model training to inference deployment, supporting multiple inference validation and deployment methods:
| Inference Method | Application Scenario | Description |
|---|---|---|
| Simulation Inference Test | Quick Validation | Use random data or custom inputs to quickly verify model inference functionality and performance |
| MCAP File Test | Real Data Validation | Use recorded robot demonstration data to verify model inference effects in real scenarios |
| Offline Edge Deployment | Production Environment Application | Deploy inference services to robot local GPUs for low-latency real-time control |
Inference Workflow
The platform provides a productized inference deployment workflow: every step from model selection to production deployment is carried out through the visual interface, with no programming experience required:
1. Model Source Selection
The platform supports multiple model sources:
Use Fine-tuned Model:
- Select models trained on the platform from the training task list
- Automatically inherit model configuration and parameters from training
- Support selecting different checkpoints (last, best, or specific step)
- Suitable for deploying models trained on your own data
Upload Custom Model:
- Support mainstream model formats:
  - SafeTensors - Secure model weight format, recommended
  - PyTorch - .pth or .pt format model weights
  - ONNX - Standardized model format for cross-platform deployment
- Require uploading both model weights and configuration files
- Support custom model architectures and parameters
Use Pre-trained Model:
- Provide verified pre-trained base models for a quick start
- Suitable for transfer learning or fine-tuning scenarios
- Include models from HuggingFace and other sources
Model Selection Advice:
- For models trained on your own data, use the fine-tuned model option
- For quick testing, pre-trained models are sufficient
- When deploying external models, verify format compatibility first

2. Service Configuration and Deployment
Basic Information Configuration
- Service Name - Custom name for the inference service for easy identification
- Service Description - Optional description of the service purpose
- Project Association - Associate service with specific project for management
Inference Parameter Configuration
Configure inference parameters based on the model's requirements (a configuration sketch follows this list):
- Inference Precision - Precision used for inference (bfloat16 or float32)
- Batch Size - Number of samples processed per inference call
- Max Sequence Length - For sequence models, limits the maximum sequence length
- Other Model-Specific Parameters - Relevant configuration options are shown based on the model type
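The exact parameter names are set through the platform UI; as a rough illustration only, a deployment configuration covering the options above might look like the following sketch (all field names are hypothetical):

```python
# Hypothetical inference-service configuration mirroring the options above.
# Field names are illustrative; the actual keys are set through the platform UI.
inference_config = {
    "service_name": "pick-and-place-v1",
    "precision": "bfloat16",        # or "float32"
    "batch_size": 1,                # samples per inference call
    "max_sequence_length": 64,      # only relevant for sequence models
}

assert inference_config["precision"] in ("bfloat16", "float32")
```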
Resource Configuration
Computing Resources:
- Automatically detect available GPU resources
- Support selecting a specific GPU or multi-GPU deployment
- Support CUDA, MPS (Apple Silicon), and other platforms
- Automatically fall back to CPU when no GPU is available (lower performance; see the device-selection sketch below)
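The platform performs device selection automatically; the logic is roughly equivalent to the standard PyTorch device check below (a minimal sketch, not the platform's actual code):

```python
import torch

def select_device() -> torch.device:
    """Pick the best available device: CUDA, then MPS (Apple Silicon), then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")  # fallback, lower performance

device = select_device()
```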
Container Configuration:
- Each inference service runs in an independent Docker container
- Automatically assign port numbers (range 28000-28999)
- Support GPU passthrough for high-performance inference
- Automatic container management, no manual operation required (a container-startup sketch follows this list)
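Containers are managed entirely by the platform; conceptually, each service is started much like the following sketch using the Docker SDK for Python (the image name and internal port are placeholders, not the platform's actual values):

```python
import docker

client = docker.from_env()

# Start an inference container with GPU passthrough on a port in the 28000-28999 range.
# The image name and internal port below are placeholders for illustration only.
container = client.containers.run(
    "embodiflow/inference:latest",
    detach=True,
    ports={"8000/tcp": 28001},
    device_requests=[
        docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])
    ],
)
print(container.id, container.status)
```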
Service Deployment
After configuration is complete, click the "Deploy" button:
- The system automatically creates a Docker container
- Loads the model weights and configuration
- Starts the inference service (takes about 20-30 seconds)
- Automatically performs a health check to confirm the service is healthy
Once deployment completes, the inference service starts automatically and stays running; inference testing can begin immediately (a readiness-polling sketch follows).
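The health check itself is performed by the platform; if you want to confirm readiness from your own scripts, a simple polling loop against the service's HTTP address works. The `/health` path shown here is an assumption; use the endpoint shown in the service information panel:

```python
import time
import requests

def wait_until_ready(base_url: str, timeout_s: float = 60.0) -> bool:
    """Poll the inference service until it responds, or the timeout expires."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            # "/health" is a placeholder path; check the service info panel for the real one.
            if requests.get(f"{base_url}/health", timeout=2).ok:
                return True
        except requests.RequestException:
            pass
        time.sleep(2)  # deployment typically takes 20-30 seconds
    return False

print(wait_until_ready("http://127.0.0.1:28001"))
```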
Service Management
Deployed inference services provide comprehensive status monitoring and management functionality:
Service Information:
- Host Address and Port - HTTP and WebSocket access addresses for inference API
- Service Status - Real-time display of service running status (running, stopped, error, etc.)
- Container Information - Docker container ID and running status
- Creation Time - Service creation and last update time
Resource Monitoring:
- CPU Usage - Real-time display of CPU usage
- Memory Usage - Display memory usage and peak values
- GPU Usage - If using GPU, display GPU utilization and memory usage
- Network IO - Display network traffic statistics
Service Control:
- Start/Stop - Can start or stop inference service at any time
- Restart Service - Restart service to apply configuration changes
- Delete Service - Delete unnecessary inference services to free resources
Service Management Recommendations:
- After deployment, wait 20-30 seconds to make sure the service has fully started
- Check resource usage regularly to avoid resource exhaustion
- Stop services that have not been used for a long time to free resources

Model Input/Output Specifications
Inference services adapt intelligently to each model, automatically recognizing its input/output requirements:
- Image Input - Adapts to the camera count (one or multiple views) and resolution (automatic scaling)
- State Input - observation.state [12], observation.gripper [2], observation.score [1]
- Action Output - action [12], robot joint control commands
This information shows the complete input/output configuration of the inference service, helping users understand the model's requirements and use inference correctly.
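As a concrete illustration of these shapes, an inference request built with NumPy might look like the sketch below. The field names and structure are assumptions for illustration; only the vector sizes follow the specification above:

```python
import numpy as np

# Shapes follow the specification above: 12-dim state, 2-dim gripper, 1-dim score,
# and a 12-dim joint action in the response. Field names are illustrative only.
observation = {
    "observation.state":   np.zeros(12, dtype=np.float32).tolist(),  # joint positions
    "observation.gripper": np.zeros(2,  dtype=np.float32).tolist(),  # gripper state
    "observation.score":   np.zeros(1,  dtype=np.float32).tolist(),
    "task": "Pick up the apple and place it in the basket",
}

# The service replies with a 12-dim joint command; camera images (one or more views)
# are attached separately and rescaled automatically by the service.
expected_action_dim = 12
```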
Inference Testing Features
Simulation Inference Test
Simulation inference provides a convenient way to validate the inference service, allowing quick model testing without preparing real data:

Feature Highlights:
- Natural Language Task Input - Enter robot execution instructions, such as "Pick up the apple and place it in the basket"
  - Supports Chinese and English natural language instructions
  - The system automatically performs language encoding
- Intelligent Data Generation - One-click random filling of test data for quick test input generation
  - Automatically generates image data that meets the model's requirements (random pixels or placeholder images)
  - Automatically fills joint state values (observation.state)
  - Automatically fills the gripper state (observation.gripper)
  - All data formats automatically adapt to the model's input requirements
- Instant Inference Execution - Click the "Send" button to get model inference results immediately (see the request sketch after this list)
  - Real-time display of inference progress
  - Quick return of inference results
  - Supports multiple consecutive tests
- Performance Indicator Display - Real-time display of key performance indicators
  - Request Time - Total time from sending the request to receiving the response (including network transmission)
  - Inference Time - Actual model inference computation time
  - Data Transfer Time - Time spent on data upload and download
  - Helps evaluate model performance and system latency
- Result Visualization - Inference results are displayed intuitively
  - Displays predicted joint positions (action)
  - Displays gripper control commands
  - Supports exporting and saving results
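Simulation testing is done through the web interface, but the same idea can be reproduced from a script: fill the inputs with random data and post them to the service. The `/infer` endpoint and payload field names below are assumptions for illustration; use the address shown in the service information panel:

```python
import time
import numpy as np
import requests

# Random inputs, mirroring the one-click "random fill" in the UI.
payload = {
    "task": "Pick up the apple and place it in the basket",
    "observation.state":   np.random.uniform(-1, 1, 12).tolist(),
    "observation.gripper": np.random.uniform(0, 1, 2).tolist(),
    "observation.score":   [1.0],
}

start = time.time()
# "/infer" is a placeholder path; check the service information panel for the real one.
resp = requests.post("http://127.0.0.1:28001/infer", json=payload, timeout=10)
request_time = time.time() - start

print(f"request time: {request_time * 1000:.1f} ms")
print("predicted action:", resp.json().get("action"))
```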
Simulation Inference Use Cases:
- Quickly verify that the model service has started correctly
- Check that the model's input/output formats are handled correctly
- Evaluate the inference service's response speed
- Verify natural language instruction processing capability
MCAP File Test
MCAP file testing uses real robot demonstration data for inference validation, which is the best way to evaluate model performance in real scenarios:

Feature Highlights:
- Data File Upload - Select MCAP data files containing complete robot operation processes
  - Supports selecting files directly from platform datasets
  - Supports uploading local MCAP files
  - Automatically validates file format and integrity
- Intelligent Data Parsing - The system automatically extracts and maps multimodal data
  - Image Sequence Extraction - Automatically identifies and extracts camera image topics
  - Joint State Extraction - Extracts joint state data (joint_states)
  - Sensor Data Extraction - Extracts other sensor data (such as gripper state)
  - Timestamp Alignment - Automatically aligns timestamps from different data sources
- Input Mapping Configuration - Flexibly configure the mapping between model inputs and MCAP data
  - Image Input Mapping - Select which camera topics in the MCAP file map to which model inputs
  - State Input Mapping - Configure the mapping of joint state, gripper state, and other data
  - Task Description - Set a natural language task description for the entire sequence
  - Default Value Settings - Set default values for missing data
- Sequential Batch Inference - Perform continuous inference over complete action sequences
  - Supports frame-by-frame or time-interval inference
  - Start and end frames can be set for inference
  - Supports skipping frames to improve inference speed
  - Real-time display of inference progress and completed frame count
- Effect Comparison Analysis - Quantitatively compare inference results against the original expert demonstrations
  - Action Comparison - Compare differences between inferred actions and expert demonstration actions
  - Trajectory Visualization - Visualize predicted trajectories alongside real trajectories
  - Error Statistics - Calculate statistical indicators such as action error and position error (see the sketch after this list)
  - Performance Evaluation - Evaluate model performance on real data
- Result Export - Export inference results for further analysis
  - Export inferred action sequences
  - Export comparison analysis reports
  - Export visualization results
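The comparison metrics are computed by the platform; the core calculation is simple enough to reproduce offline, as in this sketch that compares an inferred action sequence against the expert demonstration (both arrays are assumed to be already decoded and time-aligned):

```python
import numpy as np

def action_error_stats(predicted: np.ndarray, expert: np.ndarray) -> dict:
    """Per-sequence error statistics between predicted and expert actions.

    Both arrays are shaped (num_frames, action_dim) and assumed time-aligned.
    """
    diff = predicted - expert
    return {
        "mae":  float(np.mean(np.abs(diff))),             # mean absolute action error
        "rmse": float(np.sqrt(np.mean(diff ** 2))),        # root-mean-square error
        "max_joint_error": float(np.max(np.abs(diff))),    # worst single-joint deviation
    }

# Example with random stand-in data (200 frames, 12 joints).
pred   = np.random.uniform(-1, 1, (200, 12))
expert = np.random.uniform(-1, 1, (200, 12))
print(action_error_stats(pred, expert))
```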
MCAP Test Recommendations:
- Use MCAP files from scenarios similar to the training data
- For long sequences, test in segments to save time
- Focus on action error and trajectory consistency when judging model performance
- Compare inference results from different checkpoints to select the best model
Offline Edge Deployment

Offline edge deployment migrates the inference service entirely onto the robot's local GPU for production-grade applications:
Standardized Deployment Process
The platform provides a complete offline deployment solution, including detailed deployment steps and the required files:
1. Environment Preparation
- Install Docker and nvidia-docker2 on robot controllers (if using GPU)
- Ensure sufficient storage space to download Docker images and model files
- Install Python 3.8+ and necessary dependency packages (if needed)
2. Image Download
- The platform provides download links for complete Docker images containing the inference environment, model weights, and configuration
- Images contain all necessary dependencies and runtime environments
- Support multiple architectures (x86_64, ARM, etc.)
3. Model File Preparation
- Download model weight files and configuration files
- Platform provides pre-packaged model files containing checkpoints and configurations
- Support multiple model formats (PyTorch, ONNX, etc.)
4. Service Startup
- Use provided Docker commands to start inference service locally
- Support GPU acceleration (if hardware supports)
- Automatically configure ports and network
5. Client Connection
- Run the ROS client scripts provided by the platform
- Establish real-time communication with the inference service (WebSocket + BSON protocol)
- Subscribe to sensor topics and publish joint control commands (see the client sketch after these steps)
6. Verification Testing
- Test inference service functionality
- Verify real-time inference performance
- Ensure closed-loop control works correctly
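The platform supplies ready-made ROS client scripts; the sketch below only illustrates the general shape of such a client, a perception-decision-execution loop over WebSocket + BSON. Topic names, message types, and the WebSocket path are assumptions for illustration (images are omitted); in practice, use the scripts provided by the platform:

```python
# Minimal sketch of a ROS 1 client loop (illustrative only; use the platform's scripts).
# Assumes the websocket-client and pymongo (for bson) packages are installed.
import bson                      # BSON encode/decode from the pymongo distribution
import rospy
import websocket
from sensor_msgs.msg import JointState
from std_msgs.msg import Float64MultiArray

latest_state = None

def on_joint_states(msg: JointState) -> None:
    """Cache the latest joint positions from the robot."""
    global latest_state
    latest_state = list(msg.position)

rospy.init_node("embodiflow_inference_client")
rospy.Subscriber("/joint_states", JointState, on_joint_states)
cmd_pub = rospy.Publisher("/joint_command", Float64MultiArray, queue_size=1)

# The WebSocket address comes from the service information panel; the path is a placeholder.
ws = websocket.create_connection("ws://127.0.0.1:28001/ws")

rate = rospy.Rate(5)  # 2-10 Hz closed-loop control
while not rospy.is_shutdown():
    if latest_state is not None:
        # Send the observation as a BSON document, receive the predicted joint action.
        ws.send_binary(bson.encode({"observation.state": latest_state}))
        action = bson.decode(ws.recv()).get("action", [])
        cmd_pub.publish(Float64MultiArray(data=action))
    rate.sleep()
```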
Production Application Advantages
- Edge Computing Architecture - Inference executes locally on robots, completely eliminating network latency and dependencies
- Deep ROS Integration - Seamlessly subscribe to sensor topics and directly publish joint control commands
- Real-time Closed-loop Control - Support high-frequency (2-10Hz) perception-decision-execution loops
- Industrial-grade Reliability - Suitable for industrial production environments with network limitations or high security requirements
- Flexible Configuration - Support custom inference parameters and resource allocation
Offline Deployment Use Cases:
- Production environments requiring low latency and high reliability
- Scenarios with network restrictions or security requirements
- Real-time robot control applications
- Long-term autonomous operation scenarios
Through the Embodiflow Data Platform's inference services, trained robot learning models can move seamlessly from cloud validation to edge deployment, closing the loop from model training to production application.