Model Inference

The EmbodyFlow Platform provides complete model inference services, supporting one-click deployment of trained robot learning models as production-grade inference services. The platform supports multiple model formats and flexible deployment methods, giving robot applications full-scenario AI inference capabilities from the cloud to the edge.

Product Features

The platform provides a complete chain from model training to inference deployment, supporting various inference verification and deployment methods.

| Inference Method | Application Scenario | Description |
| --- | --- | --- |
| Simulated Inference | Rapid Verification | Use random data or custom input to quickly verify model inference functionality and performance |
| MCAP File Test | Real Data Verification | Use recorded robot demonstration data to verify model inference effects in real scenarios |
| Offline Edge Deployment | Production Application | Deploy inference services to local robot GPUs to achieve low-latency real-time control |

Inference Workflow

The platform provides a productized inference deployment workflow: every step from model selection to production deployment is completed through a visual interface, and no programming experience is required:

1. Model Source Selection

The platform supports multiple model sources to meet deployment needs in different scenarios:

Use Fine-tuned Model (Recommended):

  • Select completed models from training tasks
  • Support selecting different checkpoints (recommended to use "last" or "best")
  • Automatically inherit model configuration and parameters from training
  • No extra configuration needed, can be deployed directly

Upload Custom Model:

  • Support mainstream model formats: SafeTensors, PyTorch (.pth, .pt), ONNX, etc.
  • Support downloading model files via URL links
  • Support ZIP and TAR packages with automatic decompression
  • Suitable for external training or third-party model deployment

Use Pre-trained Model:

  • Provide verified base models, such as Pi0 Latest, Pi0 0.5, SmolVLA, etc.
  • Automatically download and load from model repository
  • Quick start, suitable for rapid verification and testing

Model Selection Advice:

  • For first-time deployment, it's recommended to use a fine-tuned model to ensure the model matches the training data
  • If rapid testing is needed, you can use a pre-trained model
  • Custom models need to ensure format compatibility and correct configuration

The new inference service page provides multiple model deployment options

2. Service Configuration & Deployment

Basic Info Configuration

When creating an inference service, you need to configure the following basic info:

  • Service Name - Set an easy-to-identify name for the inference service
  • Service Description - Optional, add service purpose or explanation
  • Project - Associate the service with a specific project for easy management
  • Model Type - Select the model type (e.g., SmolVLA, Pi0, etc.), the system will automatically adapt

Inference Parameter Configuration

Depending on the model type, you can configure the following inference parameters (a configuration sketch follows this list):

  • Inference Precision - Select the precision type used for inference (bfloat16 or float32)
  • Batch Size - Batch size for batch inference
  • Max Sequence Length - For models supporting sequences, limit the maximum sequence length
  • Other Model-specific Parameters - Display relevant configuration options based on model type
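
To make these options concrete, here is a minimal sketch of what an inference configuration could look like when written out. The field names and values are illustrative assumptions rather than the platform's exact schema; in practice these options are set through the web form described above.

```python
# Hypothetical inference configuration mirroring the form fields above.
# Field names are illustrative assumptions; the platform UI is the source of truth.
inference_config = {
    "service_name": "pick-and-place-demo",   # easy-to-identify service name
    "project": "kitchen-manipulation",       # project the service is associated with
    "model_type": "SmolVLA",                 # e.g. SmolVLA, Pi0, ...
    "precision": "bfloat16",                 # bfloat16 or float32
    "batch_size": 1,                         # batch size for batch inference
    "max_sequence_length": 512,              # only for sequence-capable models
    "device": "cuda:0",                      # specific GPU, "mps", or "cpu" fallback
}
```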

Resource Configuration

Computing Resources:

  • Automatically detect available GPU resources
  • Support selecting specific GPU or multi-GPU deployment
  • Support platforms like CUDA, MPS (Apple Silicon), etc.
  • Automatically fall back to CPU when no GPU is available (lower performance)

Container Configuration:

  • Each inference service runs in an independent Docker container
  • Automatically assign port numbers (range 28000-28999)
  • Support GPU passthrough for high-performance inference
  • Automatic container management, no manual operation required

Service Deployment

After configuration is complete, click the "Deploy" button:

  1. The system automatically creates a Docker container
  2. Loads model weights and configuration
  3. Starts the inference service (takes about 20-30 seconds)
  4. Automatically performs health checks to confirm the service is healthy

Once deployment is complete, the inference service will automatically start and stay running, and you can immediately proceed to inference testing.
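
If you prefer to script the wait rather than watch the UI, a minimal polling sketch might look like the following. It assumes a simple HTTP health endpoint (here called /health, which is an assumption) at the host and port shown in the service information panel.

```python
import time
import requests

# Host and port come from the service's "Host Address and Port" info panel.
BASE_URL = "http://192.168.1.50:28001"   # example address; yours will differ

def wait_until_ready(base_url: str, timeout_s: float = 60.0) -> bool:
    """Poll the service until it responds, or until the timeout expires."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            # "/health" is an assumed endpoint name; check your service's API docs.
            resp = requests.get(f"{base_url}/health", timeout=2)
            if resp.ok:
                return True
        except requests.RequestException:
            pass                          # container may still be starting (20-30 s)
        time.sleep(2)
    return False

if __name__ == "__main__":
    print("service ready:", wait_until_ready(BASE_URL))
```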

Service Management

After deployment, the inference service provides complete status monitoring and management functions:

Service Information:

  • Host Address and Port - HTTP and WebSocket access addresses for the inference API
  • Service Status - Real-time display of service running status (Running, Stopped, Error, etc.)
  • Container Info - Docker container ID and running status
  • Creation Time - Service creation and last update time

Resource Monitoring:

  • CPU Usage - Real-time display of CPU occupancy
  • Memory Usage - Display memory occupancy and peak
  • GPU Usage - If GPU is used, display GPU utilization and video memory occupancy
  • Network IO - Display network traffic statistics
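
The platform surfaces these metrics in its UI; if you also want to inspect the container yourself, a rough sketch using the Docker SDK for Python (the docker package) could look like this, assuming you take the container ID from the Container Info field.

```python
import docker   # pip install docker

# Container ID comes from the service's "Container Info" field.
CONTAINER_ID = "a1b2c3d4e5f6"   # example value; use your own container ID

client = docker.from_env()
container = client.containers.get(CONTAINER_ID)

# One-shot stats snapshot (pass stream=True for continuous monitoring).
stats = container.stats(stream=False)

mem_used = stats["memory_stats"].get("usage", 0)
mem_limit = stats["memory_stats"].get("limit", 1)
print(f"status: {container.status}")
print(f"memory: {mem_used / 2**20:.1f} MiB of {mem_limit / 2**30:.1f} GiB")
```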

Service Control:

  • Start/Stop - You can start or stop the inference service at any time
  • Restart Service - Restart the service to apply configuration changes
  • Delete Service - Delete unnecessary inference services to free up resources

Service Management Advice:

  • After deployment, it's recommended to wait 20-30 seconds to ensure the service is fully started
  • Regularly check resource usage to avoid resource exhaustion
  • Services not used for a long time can be stopped to free up resources

The inference service detail page shows service status and configuration info

Model Input/Output Specifications

The inference service has intelligent adaptation capabilities, automatically identifying and adapting to the input/output requirements of different models (an example payload follows this list):

  • Image Input - Automatically adapts to the number of cameras (one or more views) and their resolution (auto-scaling)
  • State Input - observation.state [12], observation.gripper [2], observation.score [1] (bracketed numbers are vector dimensions)
  • Action Output - action [12], the robot joint control commands
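
To make the shapes concrete, here is a small sketch of how a client might assemble one observation. The state, gripper, score, and action keys follow the specification above; the image key name and resolution are assumptions, since the service adapts to camera count and auto-scales images.

```python
import numpy as np

def build_observation(image: np.ndarray) -> dict:
    """Assemble one observation matching the input specification above.

    The state/gripper/score keys follow the documented specification; the
    image key and resolution are illustrative assumptions.
    """
    return {
        "observation.images.cam_01": image,                      # H x W x 3, uint8
        "observation.state": np.zeros(12, dtype=np.float32),     # 12 joint values
        "observation.gripper": np.zeros(2, dtype=np.float32),    # 2 gripper values
        "observation.score": np.zeros(1, dtype=np.float32),      # 1 score value
        "task": "Pick up the apple and place it in the basket",  # language command
    }

obs = build_observation(np.zeros((480, 640, 3), dtype=np.uint8))
# The service returns an "action" vector of length 12 (joint control commands).
```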

These specifications describe the inference service's complete input/output interface, so users can understand the model's requirements and supply correctly formatted data to the inference functions.

Inference Test Features

Simulated Inference Test

Simulated inference provides a convenient way to verify inference services, allowing quick model testing without preparing real data:

The simulated inference page supports random data generation and inference testing

Features:

  • Natural Language Task Input - Enter robot execution commands, such as "Pick up the apple and place it in the basket"

    • Support both Chinese and English natural language commands
    • System automatically performs language encoding processing
  • Intelligent Data Generation - One-click random filling of test data to quickly generate test inputs (a request sketch follows this feature list)

    • Automatically generate image data meeting model requirements (random pixels or placeholder images)
    • Automatically fill joint state values (observation.state)
    • Automatically fill gripper status (observation.gripper)
    • All data formats automatically adapt to model input requirements
  • Instant Inference Execution - Click the "Send" button to immediately get model inference results

    • Real-time display of inference progress
    • Rapid return of inference results
    • Support multiple consecutive tests
  • Performance Metrics Display - Real-time display of key performance indicators

    • Request Time - Total time from sending the request to receiving the response (including network transmission)
    • Inference Time - Actual model inference calculation time
    • Data Transfer Time - Time for data upload and download
    • Help evaluate model performance and system latency
  • Result Visualization - Inference results displayed intuitively

    • Display predicted joint positions (action)
    • Display gripper control commands
    • Support result export and saving
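
As a rough illustration of what the simulated inference page does on your behalf, the sketch below sends one randomly filled request and measures the round-trip time. The /infer endpoint, payload field names, and response fields are assumptions, not the exact API; the page itself requires no code.

```python
import time
import numpy as np
import requests

BASE_URL = "http://192.168.1.50:28001"   # from the service info panel
# "/infer" and the field names below are assumptions for illustration only.

payload = {
    "task": "Pick up the apple and place it in the basket",
    "observation.state": np.random.uniform(-1, 1, 12).tolist(),
    "observation.gripper": np.random.uniform(0, 1, 2).tolist(),
    # A real client would also attach camera images; omitted here for brevity.
}

t0 = time.perf_counter()
resp = requests.post(f"{BASE_URL}/infer", json=payload, timeout=10)
request_time = time.perf_counter() - t0          # total round-trip time

result = resp.json()
print("action:", result.get("action"))           # 12 joint control values
print(f"request time: {request_time * 1000:.1f} ms")
# If the server reports its own compute time in the response, the difference
# from request_time approximates data transfer and overhead.
```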

Simulated Inference Use Cases:

  • Quickly verify if the model service is started normally
  • Test if the model's input and output formats are correct
  • Evaluate the response speed of the inference service
  • Verify the processing capability of natural language commands

MCAP File Test

The MCAP file test feature supports using real robot demonstration data for inference verification, which is the best way to evaluate model performance in actual scenarios:

The MCAP file test page supports inference verification using real data

Features:

  • Data File Upload - Select MCAP data files containing complete robot operation processes

    • Support direct selection from platform datasets
    • Support local upload of MCAP files
    • Automatically verify file format and integrity
  • Intelligent Data Parsing - The system automatically extracts and maps multimodal data (a reading sketch follows this feature list)

    • Image Sequence Extraction - Automatically identify and extract camera image topics
    • Joint State Extraction - Extract joint state data (joint_states)
    • Sensor Data Extraction - Extract other sensor data (such as gripper status, etc.)
    • Timestamp Alignment - Automatically align timestamps from different data sources
  • Input Mapping Configuration - Flexible configuration of mapping relationships between model inputs and MCAP data

    • Image Input Mapping - Select which camera topics in the MCAP file map to which model image inputs
    • State Input Mapping - Configure mapping of joint state, gripper status, and other data
    • Task Description - Set natural language task description for the entire sequence
    • Default Value Setting - For missing data, default values can be set
  • Sequence Batch Inference - Perform continuous inference on complete action sequences

    • Support inference by frame sequence or time interval
    • Start and end frames for inference can be set
    • Support skipping certain frames to improve inference speed
    • Real-time display of inference progress and completed frame count
  • Effect Comparison Analysis - Quantitatively compare and evaluate inference results with original expert demonstrations

    • Action Comparison - Compare differences between inference actions and expert demonstration actions
    • Trajectory Visualization - Visualize predicted trajectories vs. real trajectories
    • Error Statistics - Calculate action error, position error, and other statistical metrics
    • Performance Evaluation - Evaluate model performance on real data
  • Result Export - Support exporting inference results for further analysis

    • Export inference action sequences
    • Export comparison analysis reports
    • Export visualization results
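
For a sense of the data the intelligent parsing step works with, the sketch below reads an MCAP file using the open-source mcap Python package and summarizes the example topics used in this documentation. The platform performs this extraction, decoding, and timestamp alignment automatically; the file and topic names here are examples only.

```python
from collections import defaultdict
from mcap.reader import make_reader   # pip install mcap

MCAP_PATH = "demo_episode.mcap"                              # example file name
TOPICS = ["/camera_01/color/image_raw", "/joint_states"]     # example topics

counts = defaultdict(int)
first_ns, last_ns = {}, {}

with open(MCAP_PATH, "rb") as f:
    reader = make_reader(f)
    for schema, channel, message in reader.iter_messages(topics=TOPICS):
        counts[channel.topic] += 1
        first_ns.setdefault(channel.topic, message.log_time)  # nanoseconds
        last_ns[channel.topic] = message.log_time

for topic, n in counts.items():
    span_s = (last_ns[topic] - first_ns[topic]) / 1e9
    print(f"{topic}: {n} messages over {span_s:.1f} s")

# Decoding the raw message bytes into images / joint arrays additionally
# requires a ROS decoder (e.g. the mcap-ros2-support package); the platform
# handles that decoding and timestamp alignment for you.
```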

MCAP Test Advice:

  • Use MCAP files with scenarios similar to training data for testing
  • For long sequences, segment testing can save time
  • Focus on action errors and trajectory consistency to judge model performance
  • Compare inference effects of different checkpoints to choose the best model

Offline Edge Deployment

The offline deployment page provides a complete edge device deployment solution

The offline edge deployment feature migrates the complete inference service to the robot's local GPU device for production-level applications:

Standardized Deployment Process

The platform provides a complete offline deployment solution, including detailed deployment steps and required files:

1. Environment Preparation

  • Install Docker and nvidia-docker2 (if using GPU) on the robot controller
  • Ensure sufficient storage space for downloading Docker images and model files
  • Install Python 3.8+ and necessary dependency packages (if needed)

2. Image Download

  • The platform provides download links for complete Docker images containing the inference environment, model weights, and configuration
  • Images include all necessary dependencies and runtime environments
  • Support multiple architectures (x86_64, ARM, etc.)

3. Model File Preparation

  • Download model weight files and configuration files
  • The platform provides pre-packaged model files, including checkpoints and configurations
  • Support multiple model formats (PyTorch, ONNX, etc.)

4. Service Startup

  • Start the inference service locally using the provided Docker commands
  • Support GPU acceleration (if hardware supported)
  • Automatically configure ports and networking

5. Client Connection

  • Run the ROS client script provided by the platform (a simplified client sketch appears after step 6)
  • Establish real-time communication with the inference service (WebSocket + BSON protocol)
  • Subscribe to sensor topics and publish joint control commands

6. Verification Testing

  • Run test scripts to verify that the service is working correctly
  • Check inference latency and accuracy
  • Confirm that ROS topic subscription and publishing work as expected
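
For orientation only, here is a heavily simplified sketch of the kind of ROS client described in step 5: it subscribes to the sensor topics named above, forwards observations to the local inference service over WebSocket + BSON, and publishes the returned joint commands. The platform ships its own client script; the endpoint path, payload field names, and the /joint_cmd message type (Float64MultiArray) are assumptions in this sketch.

```python
#!/usr/bin/env python
# Minimal sketch only -- the platform's provided client script is the
# reference implementation. Endpoint, field names, and the /joint_cmd
# message type (Float64MultiArray) are assumptions.
import bson                                   # bson module shipped with pymongo
import rospy
from sensor_msgs.msg import JointState
from std_msgs.msg import Float64MultiArray
from websocket import create_connection       # pip install websocket-client

WS_URL = "ws://localhost:28001/infer"         # local inference service (assumed path)
TASK = "Pick up the apple and place it in the basket"

latest_joints = None

def joints_cb(msg: JointState) -> None:
    global latest_joints
    latest_joints = list(msg.position)

def main() -> None:
    rospy.init_node("embodyflow_inference_client")
    rospy.Subscriber("/joint_states", JointState, joints_cb)
    # Camera images (/camera_01/color/image_raw) would be subscribed and
    # attached to the payload the same way; omitted here for brevity.
    cmd_pub = rospy.Publisher("/joint_cmd", Float64MultiArray, queue_size=1)
    ws = create_connection(WS_URL)

    rate = rospy.Rate(5)                      # 2-10 Hz closed-loop control
    while not rospy.is_shutdown():
        if latest_joints is not None:
            ws.send_binary(bson.encode({
                "task": TASK,
                "observation.state": latest_joints,
            }))
            reply = bson.decode(ws.recv())
            cmd_pub.publish(Float64MultiArray(data=reply["action"]))
        rate.sleep()

if __name__ == "__main__":
    main()
```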

Production Application Advantages

Edge Computing Architecture:

  • Inference is executed locally on the robot, completely eliminating network latency
  • No dependence on external network connections, ensuring availability in offline environments
  • Reduce data transmission, lowering network bandwidth requirements

Deep ROS Integration:

  • Seamlessly subscribe to ROS sensor topics (e.g., /camera_01/color/image_raw, /joint_states, etc.)
  • Directly publish joint control commands to ROS topics (e.g., /joint_cmd)
  • Support standard ROS message formats, perfectly integrating with existing ROS systems

Real-time Closed-loop Control:

  • Support high-frequency inference (2-10Hz) to meet real-time control needs
  • Low-latency inference (usually less than 100ms) for rapid response
  • Stable temporal consistency, ensuring control precision

Industrial-grade Reliability:

  • Suitable for industrial production environments with network restrictions or high security requirements
  • Data never leaves the robot, meeting data security requirements
  • Containerized deployment, easy to manage and maintain

Flexible Configuration:

  • Support custom inference parameters
  • Inference frequency and batch size can be adjusted
  • Support multi-model switching and A/B testing

Offline Deployment Use Cases:

  • Real-time robot control in production environments
  • Environments with unstable or restricted network
  • Application scenarios with extremely high latency requirements
  • Security-sensitive scenarios requiring data localization

Through the EmbodyFlow Platform's inference services, you can seamlessly take trained robot learning models into production, from cloud-side verification to edge deployment, closing the loop from model training to real-world application.