Model Inference
The EmbodyFlow Platform provides complete model inference services, supporting one-click deployment of trained robot learning models as production-grade inference services. The platform supports multiple model formats and flexible deployment methods, covering robot AI inference scenarios from cloud verification to on-robot edge deployment.
Product Features
The platform provides a complete chain from model training to inference deployment, supporting various inference verification and deployment methods.
| Inference Method | Application Scenario | Description |
|---|---|---|
| Simulated Inference | Rapid Verification | Use random data or custom input to quickly verify model inference functionality and performance |
| MCAP File Test | Real Data Verification | Use recorded robot demonstration data to verify model inference effects in real scenarios |
| Offline Edge Deployment | Production Application | Deploy inference services to local robot GPUs to achieve low-latency real-time control |
Inference Workflow
The platform provides a productized inference deployment workflow that covers everything from model selection to production deployment through a visual interface; no programming experience is required:
1. Model Source Selection
The platform supports multiple model sources to meet deployment needs in different scenarios:
Use Fine-tuned Model (Recommended):
- Select completed models from training tasks
- Support selecting different checkpoints (recommended to use "last" or "best")
- Automatically inherit model configuration and parameters from training
- No extra configuration needed, can be deployed directly
Upload Custom Model:
- Support mainstream model formats: SafeTensors, PyTorch (.pth, .pt), ONNX, etc.
- Support downloading model files via URL links
- Support ZIP and TAR packages with automatic decompression
- Suitable for external training or third-party model deployment
Use Pre-trained Model:
- Provide verified base models, such as Pi0 Latest, Pi0 0.5, SmolVLA, etc.
- Automatically download and load from model repository
- Quick start, suitable for rapid verification and testing
Model Selection Advice:
- For first-time deployment, it's recommended to use a fine-tuned model to ensure the model matches the training data
- If rapid testing is needed, you can use a pre-trained model
- Custom models must be in a compatible format and correctly configured

2. Service Configuration & Deployment
Basic Info Configuration
When creating an inference service, you need to configure the following basic info:
- Service Name - Set an easy-to-identify name for the inference service
- Service Description - Optional, add service purpose or explanation
- Project - Associate the service with a specific project for easy management
- Model Type - Select the model type (e.g., SmolVLA, Pi0); the system will adapt the configuration automatically
Inference Parameter Configuration
Depending on the model type, you can configure the following inference parameters:
- Inference Precision - Select the precision type used for inference (bfloat16 or float32)
- Batch Size - Batch size for batch inference
- Max Sequence Length - For models supporting sequences, limit the maximum sequence length
- Other Model-specific Parameters - Display relevant configuration options based on model type
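As a rough illustration of how these options fit together, the parameters above could be expressed as a configuration payload like the sketch below; the field names and values are illustrative assumptions rather than the platform's exact schema.

```python
# Illustrative inference-service configuration; the field names and values
# are assumptions, not the platform's exact schema.
inference_config = {
    "service_name": "pick-and-place-smolvla",  # easy-to-identify service name
    "project": "kitchen-demo",                 # project the service belongs to
    "model_type": "SmolVLA",                   # e.g. SmolVLA, Pi0
    "precision": "bfloat16",                   # inference precision: bfloat16 or float32
    "batch_size": 1,                           # batch size for batch inference
    "max_sequence_length": 512,                # limit for sequence-capable models
}
```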
Resource Configuration
Computing Resources:
- Automatically detect available GPU resources
- Support selecting specific GPU or multi-GPU deployment
- Support platforms like CUDA, MPS (Apple Silicon), etc.
- Automatically fall back to CPU when no GPU is available (lower performance)
Container Configuration:
- Each inference service runs in an independent Docker container
- Automatically assign port numbers (range 28000-28999)
- Support GPU passthrough for high-performance inference
- Automatic container management, no manual operation required
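The platform creates and manages these containers automatically; purely to make the container setup concrete, a comparable manual launch with the Docker SDK for Python might look like the sketch below. The image name, container port, model path, and host port are assumptions.

```python
# Illustrative only: the platform manages inference containers automatically.
# This sketch shows a comparable manual launch with the Docker SDK for Python;
# the image name, container port, model path, and host port are assumptions.
import docker

client = docker.from_env()

container = client.containers.run(
    "embodyflow/inference:latest",              # hypothetical inference image
    detach=True,
    ports={"8000/tcp": 28001},                  # host port from the 28000-28999 range
    volumes={"/data/models/pi0": {"bind": "/models", "mode": "ro"}},  # model weights
    device_requests=[                           # GPU passthrough
        docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])
    ],
)
print("container started:", container.short_id)
```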
Service Deployment
After configuration is complete, click the "Deploy" button:
- The system automatically creates a Docker container
- Loads model weights and configuration
- Starts the inference service (takes about 20-30 seconds)
- Automatically performs health checks to confirm the service is healthy
Once deployment is complete, the inference service will automatically start and stay running, and you can immediately proceed to inference testing.
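If you want to script the wait rather than watch the UI, a minimal health-check poll might look like the sketch below; the /health endpoint path is an assumption, so check your service's actual API.

```python
# Minimal sketch: poll the deployed service until it responds as healthy.
# The /health endpoint path is an assumption; consult the service's actual API.
import time
import requests

def wait_until_healthy(host: str, port: int, timeout_s: float = 60.0) -> bool:
    """Return True once the service answers, or False after timeout_s seconds."""
    deadline = time.time() + timeout_s
    url = f"http://{host}:{port}/health"
    while time.time() < deadline:
        try:
            if requests.get(url, timeout=2).ok:
                return True
        except requests.RequestException:
            pass  # service is likely still starting (typically 20-30 seconds)
        time.sleep(2)
    return False

print(wait_until_healthy("127.0.0.1", 28001))
```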
Service Management
After deployment, the inference service provides complete status monitoring and management functions:
Service Information:
- Host Address and Port - HTTP and WebSocket access addresses for the inference API
- Service Status - Real-time display of service running status (Running, Stopped, Error, etc.)
- Container Info - Docker container ID and running status
- Creation Time - Service creation and last update time
Resource Monitoring:
- CPU Usage - Real-time CPU utilization
- Memory Usage - Current and peak memory usage
- GPU Usage - GPU utilization and GPU memory usage (when a GPU is used)
- Network IO - Network traffic statistics
Service Control:
- Start/Stop - You can start or stop the inference service at any time
- Restart Service - Restart the service to apply configuration changes
- Delete Service - Delete unnecessary inference services to free up resources
Service Management Advice:
- After deployment, it's recommended to wait 20-30 seconds to ensure the service is fully started
- Regularly check resource usage to avoid resource exhaustion
- Services not used for a long time can be stopped to free up resources
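Because each service runs in its own Docker container, the same kind of metrics can also be sampled directly from the Docker API if needed; the sketch below shows one way to read CPU and memory usage, with the container name as an assumption.

```python
# Sketch: sample CPU and memory usage of an inference container via the
# Docker SDK for Python. The container name is an assumption.
import docker

client = docker.from_env()
container = client.containers.get("embodyflow-inference-demo")

stats = container.stats(stream=False)  # one-shot stats snapshot

# CPU percentage, computed the same way `docker stats` does.
cpu = stats["cpu_stats"]
pre = stats.get("precpu_stats", {})
cpu_delta = cpu["cpu_usage"]["total_usage"] - pre.get("cpu_usage", {}).get("total_usage", 0)
sys_delta = cpu.get("system_cpu_usage", 0) - pre.get("system_cpu_usage", 0)
num_cpus = cpu.get("online_cpus", 1)
cpu_percent = (cpu_delta / sys_delta) * num_cpus * 100.0 if sys_delta > 0 else 0.0

mem_used_mib = stats["memory_stats"]["usage"] / (1024 * 1024)

print(f"CPU: {cpu_percent:.1f}%  Memory: {mem_used_mib:.0f} MiB")
```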

Model Input/Output Specifications
The inference service has intelligent adaptation capabilities, automatically identifying and adapting to the input/output requirements of different models:
- Image Input - Automatically adapts to the camera count (one or more views) and resolution (auto-scaling)
- State Input - observation.state [12], observation.gripper [2], observation.score [1] (vector dimensions in brackets)
- Action Output - action [12], a 12-dimensional vector of robot joint control commands
This summary shows the complete input/output configuration of the inference service, making it easy to understand the model's requirements and use the inference features correctly.
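As a concrete illustration of these shapes, a single observation could be assembled as follows; the key names follow the specification above, while the image key name and resolution are assumptions that may vary per model.

```python
# Sketch of one observation matching the specification above. Vector sizes
# follow the bracketed dimensions; the image key name and resolution are
# assumptions and may differ between models.
import numpy as np

observation = {
    "observation.images.cam_front": np.zeros((480, 640, 3), dtype=np.uint8),  # one camera view
    "observation.state": np.zeros(12, dtype=np.float32),    # 12 joint values
    "observation.gripper": np.zeros(2, dtype=np.float32),   # gripper state
    "observation.score": np.zeros(1, dtype=np.float32),     # score signal
}

# The model returns a 12-dimensional action vector of joint control commands.
expected_action_shape = (12,)
```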
Inference Test Features
Simulated Inference Test
Simulated inference provides a convenient way to verify inference services, allowing quick model testing without preparing real data:

Features:
- Natural Language Task Input - Enter robot execution commands, such as "Pick up the apple and place it in the basket"
  - Supports both Chinese and English natural language commands
  - The system automatically performs language encoding
- Intelligent Data Generation - One-click random filling of test data to quickly generate test inputs
  - Automatically generates image data that meets model requirements (random pixels or placeholder images)
  - Automatically fills joint state values (observation.state)
  - Automatically fills gripper status (observation.gripper)
  - All data formats automatically adapt to the model's input requirements
- Instant Inference Execution - Click the "Send" button to get model inference results immediately
  - Real-time display of inference progress
  - Rapid return of inference results
  - Supports multiple consecutive tests
- Performance Metrics Display - Real-time display of key performance indicators
  - Request Time - Total time from sending the request to receiving the response (including network transmission)
  - Inference Time - Actual model inference computation time
  - Data Transfer Time - Time for data upload and download
  - These metrics help evaluate model performance and system latency
- Result Visualization - Inference results are displayed intuitively
  - Displays predicted joint positions (action)
  - Displays gripper control commands
  - Supports result export and saving
Simulated Inference Use Cases:
- Quickly verify that the model service has started correctly
- Check whether the model's input and output formats are correct
- Evaluate the response speed of the inference service
- Verify that natural language commands are processed correctly
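To make the round trip concrete, a simulated request could look like the sketch below, measuring request time on the client side; the endpoint path, request fields, and response keys are assumptions rather than the platform's documented API.

```python
# Sketch of a simulated inference request with client-side timing.
# The endpoint path, request fields, and response keys are assumptions.
import time
import numpy as np
import requests

HOST, PORT = "127.0.0.1", 28001

payload = {
    "task": "Pick up the apple and place it in the basket",       # natural-language command
    "observation.state": np.random.uniform(-1, 1, 12).tolist(),   # random joint states
    "observation.gripper": np.random.uniform(0, 1, 2).tolist(),   # random gripper state
    # Image inputs are omitted here; in the UI the platform fills random images automatically.
}

t0 = time.time()
resp = requests.post(f"http://{HOST}:{PORT}/predict", json=payload, timeout=10)
request_time_ms = (time.time() - t0) * 1000   # total round trip, including transfer

action = resp.json().get("action", [])        # predicted 12-dim joint command
print(f"request time: {request_time_ms:.1f} ms, action: {action}")
```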
MCAP File Test
The MCAP file test feature supports using real robot demonstration data for inference verification, which is the best way to evaluate model performance in actual scenarios:

Features:
- Data File Upload - Select MCAP data files containing complete robot operation sequences
  - Supports direct selection from platform datasets
  - Supports local upload of MCAP files
  - Automatically verifies file format and integrity
- Intelligent Data Parsing - The system automatically extracts and maps multimodal data
  - Image Sequence Extraction - Automatically identifies and extracts camera image topics
  - Joint State Extraction - Extracts joint state data (joint_states)
  - Sensor Data Extraction - Extracts other sensor data (such as gripper status)
  - Timestamp Alignment - Automatically aligns timestamps from different data sources
- Input Mapping Configuration - Flexibly configure the mapping between model inputs and MCAP data
  - Image Input Mapping - Select which camera topics in the MCAP file map to which model inputs
  - State Input Mapping - Configure the mapping of joint state, gripper status, and other data
  - Task Description - Set a natural language task description for the entire sequence
  - Default Value Setting - Default values can be set for missing data
- Sequence Batch Inference - Perform continuous inference over complete action sequences
  - Supports inference by frame sequence or time interval
  - Start and end frames for inference can be set
  - Supports skipping frames to speed up inference
  - Real-time display of inference progress and completed frame count
- Effect Comparison Analysis - Quantitatively compare inference results against the original expert demonstrations
  - Action Comparison - Compare the differences between inferred actions and expert demonstration actions
  - Trajectory Visualization - Visualize predicted trajectories versus real trajectories
  - Error Statistics - Calculate action error, position error, and other statistical metrics
  - Performance Evaluation - Evaluate model performance on real data
- Result Export - Export inference results for further analysis
  - Export inferred action sequences
  - Export comparison analysis reports
  - Export visualization results
MCAP Test Advice:
- Use MCAP files with scenarios similar to training data for testing
- For long sequences, segment testing can save time
- Focus on action errors and trajectory consistency to judge model performance
- Compare inference effects of different checkpoints to choose the best model
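For readers who want to prototype this comparison outside the UI, the sketch below reads joint states from an MCAP file with the `mcap-ros2-support` package and computes a per-joint mean absolute error against a model's predictions. The ROS 2 recording format, the /joint_states topic name, and the predict_action() stand-in for the deployed service are all assumptions.

```python
# Sketch: replay demonstrated joint states from an MCAP file and score a
# model's predictions against them. Assumes a ROS 2 recording with a
# /joint_states topic; predict_action() stands in for the deployed service.
import numpy as np
from mcap_ros2.reader import read_ros2_messages  # pip install mcap-ros2-support

def predict_action(joint_positions: np.ndarray) -> np.ndarray:
    """Placeholder for a call to the deployed inference service."""
    return np.zeros_like(joint_positions)

demo, pred = [], []
for msg in read_ros2_messages("demo_episode.mcap", topics=["/joint_states"]):
    joints = np.asarray(msg.ros_msg.position, dtype=np.float32)  # demonstrated joints
    demo.append(joints)
    pred.append(predict_action(joints))

demo, pred = np.stack(demo), np.stack(pred)
mae_per_joint = np.abs(pred - demo).mean(axis=0)  # action error per joint
print("mean absolute error per joint:", mae_per_joint)
```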
Offline Edge Deployment

The offline edge deployment feature migrates the complete inference service to the robot's local GPU device for production-level applications:
Standardized Deployment Process
The platform provides a complete offline deployment solution, including detailed deployment steps and required files:
1. Environment Preparation
- Install Docker and nvidia-docker2 (if using GPU) on the robot controller
- Ensure sufficient storage space for downloading Docker images and model files
- Install Python 3.8+ and necessary dependency packages (if needed)
2. Image Download
- The platform provides download links for complete Docker images containing the inference environment, model weights, and configuration
- Images include all necessary dependencies and runtime environments
- Support multiple architectures (x86_64, ARM, etc.)
3. Model File Preparation
- Download model weight files and configuration files
- The platform provides pre-packaged model files, including checkpoints and configurations
- Support multiple model formats (PyTorch, ONNX, etc.)
4. Service Startup
- Start the inference service locally using the provided Docker commands
- Support GPU acceleration (if hardware supported)
- Automatically configure ports and networking
5. Client Connection
- Run the ROS client script provided by the platform
- Establish real-time communication with the inference service (WebSocket + BSON protocol)
- Subscribe to sensor topics and publish joint control commands (see the client sketch after these steps)
6. Verification Testing
- Run test scripts to verify that the service is working correctly
- Check inference latency and accuracy
- Confirm that ROS topic subscription and publishing work as expected
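The platform ships its own client script; purely as an outline of the communication pattern described in step 5 (WebSocket + BSON, subscribe to sensor topics, publish joint commands), a minimal ROS 1 client might look like the sketch below. The service URL, payload fields, and the /joint_cmd message type are assumptions.

```python
# Outline of the edge-deployment client pattern: subscribe to ROS sensor
# topics, send observations to the local inference service over WebSocket +
# BSON, and republish the returned actions as joint commands.
# The URL, payload fields, and /joint_cmd message type are assumptions;
# in practice, use the client script provided by the platform.
import bson                      # BSON encode/decode (from the pymongo package)
import rospy
import websocket                 # websocket-client package
from sensor_msgs.msg import JointState
from std_msgs.msg import Float64MultiArray

INFERENCE_URL = "ws://127.0.0.1:28001/ws"   # local edge service (assumed path)

def main():
    rospy.init_node("edge_inference_client")
    ws = websocket.create_connection(INFERENCE_URL)
    cmd_pub = rospy.Publisher("/joint_cmd", Float64MultiArray, queue_size=1)

    def on_joint_states(msg: JointState):
        # Package the observation and send it to the inference service.
        # Camera image topics are omitted here for brevity.
        request = {
            "observation.state": list(msg.position),
            "task": "Pick up the apple and place it in the basket",
        }
        ws.send_binary(bson.encode(request))

        # Receive the predicted action and republish it as a joint command.
        reply = bson.decode(ws.recv())
        cmd_pub.publish(Float64MultiArray(data=reply["action"]))

    rospy.Subscriber("/joint_states", JointState, on_joint_states, queue_size=1)
    rospy.spin()

if __name__ == "__main__":
    main()
```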
Production Application Advantages
Edge Computing Architecture:
- Inference is executed locally on the robot, eliminating cloud round-trip latency
- No dependence on external network connections, ensuring availability in offline environments
- Reduce data transmission, lowering network bandwidth requirements
Deep ROS Integration:
- Seamlessly subscribe to ROS sensor topics (e.g., /camera_01/color/image_raw, /joint_states, etc.)
- Directly publish joint control commands to ROS topics (e.g., /joint_cmd)
- Support standard ROS message formats, perfectly integrating with existing ROS systems
Real-time Closed-loop Control:
- Support high-frequency inference (2-10 Hz) to meet real-time control needs
- Low-latency inference (usually under 100 ms) for rapid response
- Stable temporal consistency, ensuring control precision
Industrial-grade Reliability:
- Suitable for industrial production environments with network restrictions or high security requirements
- Data never leaves the robot, meeting data security requirements
- Containerized deployment, easy to manage and maintain
Flexible Configuration:
- Support custom inference parameters
- Inference frequency and batch size can be adjusted
- Support multi-model switching and A/B testing
Offline Deployment Use Cases:
- Real-time robot control in production environments
- Environments with unstable or restricted network
- Application scenarios with extremely high latency requirements
- Security-sensitive scenarios requiring data localization
Through the inference service of the EmbodyFlow Platform, you can seamlessly deploy trained robot learning models into production environments, from cloud verification to edge deployment, achieving a complete model application closed loop.