Model Inference
Trained models need to be deployed as inference services before they can be used in real scenarios. Traditional deployment requires configuring environments, writing code, and handling network communication, which is complex and error-prone.
The platform provides a productized inference deployment workflow. No code is required: you can go from model deployment to production use entirely through the web interface. Simply select a model, configure its parameters, and click deploy.
Quick Start: Deploy an Inference Service in 3 Steps
Step 1: Select Model
Use a Fine-Tuned Model (Recommended):
- Select a completed model from your training tasks
- Select a checkpoint ("last" or "best" is recommended)
- The system automatically inherits the model configuration and parameters from training
- No additional configuration is needed; you can deploy directly
Other Model Sources:
- Upload a Custom Model: Supports SafeTensors, PyTorch (.pth, .pt), ONNX, and other formats (see the optional local check below)
- Use a Pretrained Model: Select a validated base model from the model repository, such as Pi0, SmolVLA, or GR00T
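Before uploading a custom model, it can help to confirm locally that the weight file actually loads. The snippet below is only an illustrative sanity check and is not part of the platform; the file paths are placeholders, and it assumes the `safetensors` and `torch` packages are installed.

```python
# Optional local sanity check before uploading a custom model.
# File paths are placeholders; point them at your own files.
from safetensors.torch import load_file  # pip install safetensors
import torch

# SafeTensors checkpoint: load and list a few tensor names/shapes
state_dict = load_file("my_policy.safetensors")
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape), tensor.dtype)

# PyTorch checkpoint (.pt / .pth): load weights only, to verify integrity
# (weights_only requires a reasonably recent PyTorch version)
ckpt = torch.load("my_policy.pth", map_location="cpu", weights_only=True)
print(type(ckpt), len(ckpt) if hasattr(ckpt, "__len__") else "")
```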

Step 2: Configure Service
Basic Information:
- Service Name: Set an easily identifiable name for the inference service
- Service Description: Optional; describe the service's purpose
- Project: Associate the service with a specific project for easier management
- Model Type: Select the model type; the system adapts automatically
Inference Parameters:
- Inference Precision: Choose bfloat16 or float32 (this affects both speed and accuracy)
- Batch Size: Number of samples processed per batch during batch inference
- Max Sequence Length: Limits the maximum sequence length for models that support sequences
Computing Resources:
- Available GPU resources are detected automatically
- A specific GPU or a multi-GPU deployment can be selected
- CUDA, MPS (Apple Silicon), and other platforms are supported
- Falls back to CPU automatically when no GPU is available (with lower performance)
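For illustration only, the values collected by the web form correspond roughly to a configuration like the one below. The field names here are hypothetical, not the platform's actual API schema.

```python
# Hypothetical service configuration (field names are illustrative only;
# the web form collects the same information).
service_config = {
    "service_name": "pick-and-place-v1",
    "description": "Fine-tuned Pi0 policy for tabletop pick-and-place",
    "project": "kitchen-demo",
    "model_type": "pi0",
    "inference": {
        "precision": "bfloat16",    # or "float32": slower but higher numerical precision
        "batch_size": 1,            # batch size used for batch inference
        "max_sequence_length": 512, # only relevant for sequence models
    },
    "resources": {
        "device": "cuda:0",         # "cuda:N", "mps", or "cpu" (CPU fallback is slower)
    },
}
```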
Step 3: Deploy Service
After you click the "Deploy" button, the system:
- Automatically creates a Docker container
- Loads the model weights and configuration
- Starts the inference service (this takes about 20-30 seconds)
- Runs a health check to confirm the service is working
Once deployment completes, the inference service starts automatically and stays running, and you can begin inference testing immediately.
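Because startup takes roughly 20-30 seconds, it can be convenient to poll the service from a script before sending requests. The sketch below assumes a plain HTTP health endpoint at `/health` on the service's host and port; the exact path and response format are assumptions, so use the addresses shown on the service details page.

```python
# Minimal health-check polling sketch (the /health path is an assumption).
import time
import requests

BASE_URL = "http://<host>:<port>"  # copy from the service details page

def wait_until_healthy(timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll the (assumed) /health endpoint until the service responds OK."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            resp = requests.get(f"{BASE_URL}/health", timeout=5)
            if resp.ok:
                return True
        except requests.RequestException:
            pass  # service still starting
        time.sleep(interval_s)
    return False

if wait_until_healthy():
    print("Inference service is up.")
else:
    print("Service did not become healthy in time; check the container logs.")
```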
Inference Testing Methods
The platform provides three inference testing methods for different scenarios:
| Inference Method | Use Case | Description |
|---|---|---|
| Simulation Inference Test | Quick verification | Use random data or custom input to quickly verify the model's inference functionality and performance |
| MCAP File Test | Real-data validation | Use recorded robot demonstration data to verify how the model performs in real scenarios |
| Offline Edge Deployment | Production use | Deploy the inference service to the robot's local GPU for low-latency, real-time control |
Simulation Inference Test
When to Use?
- Quickly verify that the model service started correctly
- Check that the model's input/output format is correct
- Evaluate the inference service's response speed
- Verify natural-language instruction handling
How to Use?
- Go to the inference service details page and switch to the "Simulation Inference" tab
- Enter a natural-language task instruction, such as "Pick up the apple and place it in the basket"
- Click "Random Fill" to auto-generate test data, or enter data manually
- Click "Send" to get the model's inference results immediately
Performance Metrics:
- Request Time: Total time from sending the request to receiving the response (including network transmission)
- Inference Time: Actual model inference computation time
- Data Transfer Time: Time spent uploading and downloading data
These metrics help you evaluate model performance and system latency.
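To make these metrics concrete, here is a sketch of timing a single request from a script rather than the web UI. The endpoint path (`/infer`), the payload fields, and the `inference_time` response field are assumptions for illustration; request time minus inference time approximates data transfer plus other overhead.

```python
# Illustrative timing of one inference request (endpoint and fields are assumptions).
import time
import requests

BASE_URL = "http://<host>:<port>"  # from the service details page

payload = {
    "instruction": "Pick up the apple and place it in the basket",
    "observation.state": [0.0] * 12,   # 12-dim joint state
    "observation.gripper": [0.0] * 2,  # 2-dim gripper state
    "observation.score": [1.0],        # 1-dim score
    # image inputs omitted here for brevity
}

t0 = time.perf_counter()
resp = requests.post(f"{BASE_URL}/infer", json=payload, timeout=30)
request_time = time.perf_counter() - t0

result = resp.json()
inference_time = result.get("inference_time")  # hypothetical field name
print(f"Request time:   {request_time * 1000:.1f} ms")
print(f"Inference time: {inference_time} (as reported by the service, if available)")
print(f"Action output:  {result.get('action')}")  # expected 12-dim action
```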

MCAP File Test
When to Use?
- Evaluate model performance in real scenarios
- Compare inference results with expert demonstrations
- Verify model effectiveness on complete action sequences
- Select the best model checkpoint
How to Use?
- Go to the inference service details page and switch to the "Test Inference" tab
- Select an MCAP File:
  - Select one directly from the platform's datasets
  - Or upload an MCAP file from your computer
- Configure the Input Mapping:
  - Select which camera topics in the MCAP map to the model's inputs
  - Configure the mapping for joint state, gripper state, and other data
  - Set a natural-language task description for the entire sequence
- Set the Inference Range:
  - Select the start and end frames for inference
  - Optionally skip frames to speed up inference
- Start Inference: Click "Start Inference"; the system performs continuous inference over the complete sequence
Result Comparison and Analysis:
After inference completes, the system provides:
- Action Comparison: Differences between the inferred actions and the expert demonstration actions
- Trajectory Visualization: Predicted trajectories plotted against the real trajectories
- Error Statistics: Action error, position error, and other statistical metrics (see the example below)
- Performance Evaluation: An assessment of model performance on real data

💡 Recommendation: Test with MCAP files recorded in scenes similar to your training data, and focus on action error and trajectory consistency.
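As a rough illustration of the error statistics above, the snippet below computes per-joint mean absolute error and RMSE between a predicted and an expert action sequence with NumPy. The platform computes its own metrics, so treat this only as a way to reason about the numbers; the array shapes and file paths are assumptions.

```python
# Sketch of action-error statistics between predicted and expert actions.
# Shapes are assumed to be (num_frames, 12) for a 12-dim action space.
import numpy as np

predicted_actions = np.load("predicted_actions.npy")  # placeholder path
expert_actions = np.load("expert_actions.npy")        # placeholder path
assert predicted_actions.shape == expert_actions.shape

error = predicted_actions - expert_actions
mae_per_joint = np.abs(error).mean(axis=0)            # mean absolute error per joint
rmse_per_joint = np.sqrt((error ** 2).mean(axis=0))   # RMSE per joint

print("MAE per joint: ", np.round(mae_per_joint, 4))
print("RMSE per joint:", np.round(rmse_per_joint, 4))
print("Overall MAE:   ", float(np.abs(error).mean()))
```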
Offline Edge Deployment
When Do You Need Offline Deployment?
- Real-time robot control in production environments
- Environments with unstable or restricted networks
- Applications with very strict latency requirements
- Security-sensitive scenarios that require data to stay local
Deployment Steps:
1. Environment Preparation:
   - Install Docker and nvidia-docker2 on the robot controller (if using a GPU)
   - Ensure there is enough storage space for the Docker image and model files
2. Download the Deployment Package:
   - On the inference service details page, switch to the "Offline Deployment" tab
   - Download the complete Docker image containing the inference environment, model weights, and configuration
   - Download the model weight files and configuration files
3. Start the Service:
   - Start the inference service locally with the provided Docker command
   - GPU acceleration is supported (if the hardware allows)
   - Ports and networking are configured automatically
4. Connect the Client:
   - Run the ROS client script provided by the platform (a minimal communication sketch follows these steps)
   - It establishes real-time communication with the inference service (WebSocket + BSON protocol)
   - It subscribes to sensor topics and publishes joint control commands
5. Verification Test:
   - Run the test script to verify the service is working
   - Check the inference latency and accuracy
   - Confirm that ROS topic subscription and publishing work correctly
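The platform supplies the actual ROS client script. Purely as a sketch of what the "WebSocket + BSON" exchange looks like, the snippet below sends one observation and reads back one action using the `websocket-client` package and PyMongo's `bson` module. The URL, message fields, and response layout are assumptions and will differ from the real client script.

```python
# Minimal WebSocket + BSON round trip (illustrative only; the platform's client
# script handles ROS topics, message fields, and reconnection).
import bson       # pip install pymongo (provides the bson module)
import websocket  # pip install websocket-client

WS_URL = "ws://<robot-or-server-ip>:<port>/ws"  # placeholder address

ws = websocket.create_connection(WS_URL, timeout=10)
try:
    observation = {
        "instruction": "Pick up the apple and place it in the basket",
        "observation.state": [0.0] * 12,
        "observation.gripper": [0.0] * 2,
        "observation.score": [1.0],
    }
    # Encode the observation as BSON and send it as a binary frame.
    ws.send_binary(bson.encode(observation))

    # Receive the BSON-encoded reply and decode the predicted action.
    reply = bson.decode(ws.recv())
    print("Predicted action:", reply.get("action"))  # expected 12-dim action
finally:
    ws.close()
```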

Offline Deployment Advantages:
- Low Latency: Inference runs locally on the robot, eliminating network latency entirely
- Offline Availability: No dependency on an external network connection, so the service stays available offline
- Data Security: Data never leaves the robot, meeting data security requirements
- Real-Time Control: Supports high-frequency inference (2-10 Hz) for real-time control
Service Management
How to View Service Status?
Service Information:
On the inference service details page you can view:
- Host Address and Port: The HTTP and WebSocket addresses of the inference API
- Service Status: The service's current state (running, stopped, error, etc.), shown in real time
- Container Information: The Docker container ID and its running status
- Creation Time: When the service was created and last updated
Resource Monitoring:
- CPU Usage: Real-time CPU usage
- Memory Usage: Current and peak memory usage
- GPU Usage: GPU utilization and memory usage (when a GPU is used)
- Network I/O: Network traffic statistics

How to Control the Service?
Service Control:
- Start/Stop: Start or stop the inference service at any time
- Restart Service: Restart the service to apply configuration changes
- Delete Service: Delete inference services you no longer need to free resources
💡 Recommendation: After deployment, wait 20-30 seconds for the service to fully start. Stop services that have been idle for a long time to free resources.
Model Input/Output Specifications
The inference service automatically identifies and adapts to each model's input/output requirements:
Input:
- Image Input: Adapts to the number of cameras (one or more views) and the resolution (automatic scaling)
- State Input: observation.state [12], observation.gripper [2], observation.score [1]
Output:
- Action Output: action [12], the robot joint control commands
The system handles data format conversion automatically; you only need to provide data that matches the model's requirements.
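As an illustration of this specification, a single observation and the corresponding output look roughly like the following. The `observation.*` and `action` keys follow the lists above; anything else (such as the image key name) is an assumption.

```python
# Example of the expected input/output shapes (the image key name is an assumption).
observation = {
    "observation.images.top": "<HxWx3 image, auto-scaled by the service>",
    "observation.state": [0.0] * 12,   # 12-dim joint state
    "observation.gripper": [0.0] * 2,  # 2-dim gripper state
    "observation.score": [1.0],        # 1-dim score
}

# The service returns a 12-dim action vector of robot joint control commands:
action = [0.0] * 12
```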
Inference Quota Management
What Is an Inference Quota?
Inference quotas control resource usage and ensure that system resources are allocated fairly.
Quota Types:
- User Quota: Each user has an independent inference quota limit
- Global Quota: A system-wide total quota limit (configured by administrators)
- Quota Statistics: Used and remaining quota, displayed in real time
Quota Display:
- The inference page shows the current user's quota usage
- It displays the used count and the total quota limit
- New inference services cannot be created once the quota is exceeded
Quota Management (Administrator):
Administrators can:
- View inference quota usage of all users
- Configure global inference quota limits
- Adjust individual user quotas
Common Questions
How to Choose an Appropriate Model?
Selection Recommendations:
- Fine-Tuned Model: For your first deployment, use a fine-tuned model to ensure it matches the training data
- Quick Testing: For quick tests, a pretrained model can be used
- Custom Model: Custom models must use a compatible format and a correct configuration
What to Do When the Inference Service Fails to Start?
Possible Causes:
- Insufficient GPU Resources: Check whether the GPU is occupied by other services
- Model File Error: Check that the model file is complete and the format is correct
- Configuration Error: Check that the inference parameters are configured sensibly
- Container Startup Failure: Check the detailed logs for the specific error
Solutions:
- View the service logs to determine the cause of the failure
- Fix the issue based on the error information
- Redeploy the service
How to Improve Inference Speed?
Optimization Recommendations:
- Use a GPU: GPU inference is much faster than CPU inference
- Lower the Precision: bfloat16 improves speed but may slightly reduce accuracy
- Adjust the Batch Size: Tune the batch size for your workload
- Use Offline Deployment: Local inference eliminates network latency
How to Verify Inference Results?
Verification Methods:
- Simulation Inference: Use random data to quickly verify that the service works
- MCAP Test: Use real data to verify the model's effectiveness
- Comparison Analysis: Compare inference results with expert demonstrations
- Real-Robot Testing: Test inference performance on real robots
Related Features
After completing inference service deployment, you may also need:
- Model Training: Train new models
- Action Retargeting: Adapt actions to different robots
- Data Export: Export more training data