Model Training
The Embodiflow Data Platform provides comprehensive robot learning model training capabilities, supporting end-to-end workflows from data preprocessing to model deployment. The platform integrates various mainstream robot learning algorithms, providing researchers and developers with an efficient model training environment.
Product Features
Flexible Architecture
The product adopts a layered architecture to ensure system scalability. Training compute can be provisioned in multiple ways:
- Private Cloud: Use local data center GPU servers (supports multi-GPU parallel training)
- Public Cloud: On-demand rental of cloud provider compute resources (billed by actual training duration)

From Data to Model
The platform covers the complete data pipeline from data collection, annotation, export, training fine-tuning, to model deployment.
Supported Model Types
The platform supports mainstream learning models in the robotics field, covering vision-language-action fusion, imitation learning, reinforcement learning, and other technical approaches:
Vision-Language-Action Models
- SmolVLA - Lightweight multimodal model that performs end-to-end learning of natural language instructions, visual perception, and robot actions
- OpenVLA - Large-scale pre-trained vision-language-action model supporting complex scene understanding and operation planning
Imitation Learning Models
- ACT (Action Chunking with Transformers) - Transformer-based action chunking model that decomposes continuous action sequences into discrete chunks for learning
- PI0 (π0) - Flow-matching-based generalist policy, pre-trained on large-scale robot data and fine-tuned from expert demonstration data for downstream tasks
- PI0Fast - Variant of PI0 that uses the FAST action tokenization scheme for faster training and inference
Policy Learning Models
- Diffusion Policy - Policy learning based on diffusion processes, generating continuous robot action trajectories through denoising
- VQBET - Vector quantized behavior transformer that discretizes continuous action spaces and models them using Transformers
Reinforcement Learning Models
- SAC (Soft Actor-Critic) - Maximum entropy reinforcement learning algorithm that balances exploration and exploitation in continuous action spaces
- TDMPC - Temporal difference model predictive control, combining advantages of model-based planning and model-free learning
The above models cover mainstream technical approaches and can be applied to various robot tasks, for example:
| Application Scenario | Model Used | Description |
|---|---|---|
| Desktop Organization | SmolVLA, PI0 | Robots can understand natural language instructions like "please organize the items on the desk" and execute grasping, moving, and placing actions |
| Item Sorting | ACT | Through learning expert sorting demonstrations, robots can identify different items and sort them by category |
| Complex Operation Tasks | Diffusion Policy | Robots can learn to execute complex operation sequences requiring precise control, such as assembly and cooking |
| Adaptive Control | SAC and other RL algorithms | Robots can learn optimal control strategies in dynamic environments and adapt to environmental changes |
Training Workflow
The platform provides a productized training workflow that covers every step from data preparation to model deployment through the web interface, with no coding required:
1. Data Preparation

The platform supports multiple data sources, including:
- Platform Export Data - Use robot demonstration data annotated and exported on the platform
- External Datasets - Import public datasets through URL links
- Local Data Upload - Supports standard formats such as HDF5 and LeRobot (see the sketch after this list)
- HuggingFace Datasets - Directly pull public data from the HuggingFace Hub
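Before uploading local data, it can help to confirm what your HDF5 file actually contains. The snippet below is a minimal, hedged sketch using h5py; the file name and internal layout are hypothetical, since the exact schema depends on how your demonstrations were recorded.

```python
import h5py

# Hypothetical file name; the actual layout depends on how the
# demonstrations were recorded, not on a platform-mandated schema.
with h5py.File("demo_episode_000.hdf5", "r") as f:
    # Recursively print every dataset with its shape and dtype.
    def describe(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(f"{name}: shape={obj.shape}, dtype={obj.dtype}")
    f.visititems(describe)
```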
2. Training Configuration
Computing Resource Selection
The platform supports flexible computing resource selection to meet training needs of different scales:
Training Location Selection:
- Local GPU (local-gpu) - Use local data center GPU servers; suitable for long-running training tasks and private deployment
  - Supports multi-GPU parallel training
  - Real-time display of GPU status (memory usage, temperature, utilization)
  - Suitable for large-scale datasets and long-duration training
- Public Cloud Resources - On-demand rental of cloud provider compute, billed by actual training duration
  - RunPod - Rapid deployment of GPU containers
  - AWS EC2/SageMaker/Batch - Amazon cloud service integration
  - Tencent Cloud/Alibaba Cloud - Chinese cloud service provider support
  - Suitable for temporary training tasks or burst capacity needs
Platform Auto Detection:
- CUDA Platform - Automatically detects NVIDIA GPUs and supports CUDA-accelerated training
- MPS Platform - Supports Metal Performance Shaders acceleration on Apple Silicon (M1/M2, etc.)
- CPU Platform - Automatically falls back to CPU training when no GPU is available (slower; suitable for small-scale testing)
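For reference, the auto-detection behavior described above maps onto a few lines of standard PyTorch (a minimal sketch, not the platform's internal implementation):

```python
import torch

def detect_device() -> torch.device:
    """Pick the best available accelerator: CUDA, then MPS, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # MPS backend for Apple Silicon (M1/M2, etc.)
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = detect_device()
print(f"Training will run on: {device}")
```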
GPU Selection and Monitoring:
- Before training, you can view the list of available GPUs and their real-time status
- Supports manually selecting a specific GPU or running multi-GPU parallel training
- Real-time monitoring of GPU utilization, memory usage, temperature, and other metrics (see the sketch below)
- Automatically optimizes memory allocation to avoid wasting resources
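The GPU status fields mentioned above (utilization, memory usage, temperature) are the kind of information exposed by NVIDIA's NVML. A minimal sketch using the pynvml bindings, assuming an NVIDIA GPU and the nvidia-ml-py package, might look like this:

```python
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)            # bytes
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)     # percent
    temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
    print(f"GPU {i}: util={util.gpu}% "
          f"mem={mem.used / 1e9:.1f}/{mem.total / 1e9:.1f} GB temp={temp}C")
pynvml.nvmlShutdown()
```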

Model Architecture Selection
Choose appropriate models based on specific task requirements:
- For tasks requiring natural language instruction understanding, choose SmolVLA or OpenVLA
- For imitation learning tasks with expert demonstration data, choose ACT, PI0, or PI0Fast
- For tasks requiring online learning, choose SAC or TDMPC
Training Parameter Settings
The platform provides rich training parameter configuration options to support specific needs of different models:
Common Training Parameters:
- batch_size (Batch Size) - Control number of samples used per training iteration, recommended range 1-32. Larger batches improve training stability but require more memory
- steps (Training Steps) - Total number of training steps, recommended starting from 10000, adjust based on validation results
- seed (Random Seed) - Ensure training result reproducibility, recommended using fixed values like 1000, 42
- num_workers (Data Loader Worker Count) - Accelerates data loading; recommended 1/2 to 1× the CPU core count
- eval_freq (Evaluation Frequency) - Perform model evaluation every N steps, recommended 10% of total steps
- log_freq (Log Frequency) - Print training logs every N steps, recommended 10 to 100 steps
- save_freq (Save Frequency) - Save checkpoint every N steps, recommended 30% of total steps
- save_checkpoint (Whether to Save Checkpoint) - Enable to save model checkpoints for resuming training or deployment
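These common parameters map naturally onto a single training configuration object. The dataclass below is a hypothetical sketch whose field names mirror the parameters above; it is not the platform's actual configuration schema.

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    batch_size: int = 8           # samples per iteration (1-32 typical)
    steps: int = 10_000           # total training steps
    seed: int = 42                # fixed seed for reproducibility
    num_workers: int = 4          # data-loader workers (~1/2 to 1x CPU cores)
    eval_freq: int = 1_000        # evaluate every N steps (~10% of steps)
    log_freq: int = 100           # log every N steps
    save_freq: int = 3_000        # checkpoint every N steps (~30% of steps)
    save_checkpoint: bool = True  # keep checkpoints for resume/deployment

config = TrainConfig(batch_size=16, steps=20_000)
```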
Optimizer Parameters:
- optimizer_lr (Learning Rate) - Control parameter update magnitude, recommended range 1e-4 to 1e-5. Too large causes unstable training, too small slows convergence
- optimizer_weight_decay (Weight Decay) - Regularization parameter to prevent overfitting, recommended range 0.0 to 0.01
- optimizer_grad_clip_norm (Gradient Clipping Threshold) - Prevent gradient explosion, recommended set to 1.0
- scheduler_warmup_steps (Learning Rate Warmup Steps) - Gradually increase learning rate in early training, recommended 5-10% of total steps
- scheduler_decay_steps (Learning Rate Decay Steps) - Reduce learning rate in late training, recommended 80-90% of total steps
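As a hedged PyTorch sketch of how these optimizer settings typically fit together (the model here is a stand-in, and the linear warmup/decay schedule is one simple choice, not necessarily the scheduler the platform uses):

```python
import torch

model = torch.nn.Linear(64, 32)  # stand-in for the policy network

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,            # optimizer_lr
    weight_decay=0.01,  # optimizer_weight_decay
)

total_steps, warmup_steps = 10_000, 500  # warmup ~5% of total steps

def lr_lambda(step: int) -> float:
    # Linear warmup, then linear decay toward zero.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Inside the training loop, after loss.backward():
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # optimizer_grad_clip_norm
optimizer.step()
scheduler.step()
optimizer.zero_grad()
```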
Model-Specific Parameters:
Different models support their own specific parameters:
- ACT Model:
  - chunk_size (Action Chunk Size) - Length of the action sequence predicted at once, recommended range 10-50
  - n_obs_steps (Observation History Steps) - Number of historical observation frames used, usually 1
  - n_action_steps (Execution Steps) - Number of action steps actually executed, usually equal to chunk_size
  - vision_backbone (Vision Backbone) - Choose from resnet18/34/50/101/152
  - dim_model (Model Dimension) - Main hidden dimension of the Transformer, default 512
  - n_heads (Attention Head Count) - Number of multi-head attention heads, default 8
- Diffusion Policy Model:
  - horizon (Prediction Time Span) - Action prediction length of the diffusion model, recommended 16
  - num_inference_steps (Inference Steps) - Number of sampling steps, recommended 10
- SmolVLA/OpenVLA Model:
  - max_input_seq_len (Max Input Sequence Length) - Limits the input token count, recommended 256-512
  - max_decoding_steps (Max Decoding Steps) - Maximum number of iterations for generating action sequences, recommended 256
  - freeze_lm_head (Freeze Language Model Head) - Recommended to enable during fine-tuning
  - freeze_vision_encoder (Freeze Vision Encoder) - Recommended to enable during fine-tuning
- SAC and other Reinforcement Learning Models:
  - latent_dim (Latent Space Dimension) - Encoder output dimension, recommended 256
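These model-specific fields are usually layered on top of the common configuration. The dictionary below is a hypothetical sketch of per-policy overrides whose keys mirror the parameters listed above:

```python
# Hypothetical per-policy overrides merged on top of a common config dict.
policy_overrides = {
    "act": {
        "chunk_size": 30, "n_obs_steps": 1, "n_action_steps": 30,
        "vision_backbone": "resnet18", "dim_model": 512, "n_heads": 8,
    },
    "diffusion": {"horizon": 16, "num_inference_steps": 10},
    "smolvla": {
        "max_input_seq_len": 512, "max_decoding_steps": 256,
        "freeze_lm_head": True, "freeze_vision_encoder": True,
    },
    "sac": {"latent_dim": 256},
}

def build_policy_config(policy_type: str, common: dict) -> dict:
    # Model-specific values take precedence over common defaults.
    return {**common, **policy_overrides.get(policy_type, {})}
```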
Parameter Setting Recommendations:
- For the first training run, use the default parameters to verify that training runs normally
- Adjust batch_size according to GPU memory size to avoid out-of-memory errors
- For fine-tuning a pre-trained model, lower the learning rate (e.g. 1e-5) and freeze some layers (see the sketch below)
- Regularly check the training logs and adjust the learning rate based on the loss curves
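For the fine-tuning recommendation above (lower learning rate plus frozen layers), a minimal PyTorch sketch is shown below; the vision_encoder attribute name is hypothetical and depends on the specific model class.

```python
import torch

def prepare_for_finetuning(model: torch.nn.Module, lr: float = 1e-5):
    # Freeze the (hypothetical) vision encoder so only the remaining
    # layers are updated during fine-tuning.
    if hasattr(model, "vision_encoder"):
        for p in model.vision_encoder.parameters():
            p.requires_grad = False
    # Optimize only the parameters that are still trainable, at a low LR.
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, lr=lr, weight_decay=0.01)
```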

3. Training Execution and Monitoring
After training starts, the platform provides complete real-time monitoring and management functionality:
Real-time Monitoring
Training Metrics Visualization:
- Loss Function Curves - Real-time display of training loss and validation loss for judging model convergence
- Validation Accuracy Metrics - Display model performance on validation set
- Learning Rate Changes - Visualize execution of learning rate scheduling strategy
- Training Progress - Display completed steps, total steps, estimated remaining time and other information
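Metrics of this kind (loss, learning rate, progress) are commonly logged with tools such as TensorBoard. A minimal sketch, assuming the tensorboard package is installed; the run directory and values are placeholders:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/example_run")  # hypothetical directory

# Inside the training loop:
step, train_loss, val_loss, lr = 100, 0.42, 0.47, 1e-4  # placeholder values
writer.add_scalar("loss/train", train_loss, step)
writer.add_scalar("loss/val", val_loss, step)
writer.add_scalar("lr", lr, step)
writer.close()
```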
Model Output Preview:
- Periodically output prediction samples during training
- Visualize model prediction results on validation data
- Facilitate observing model learning progress and discovering potential issues
System Logs:
- Detailed training logs with per-step information
- Real-time display of errors and warnings for quickly locating problems
- Logs are streamed in real time, so the latest training status can be viewed at any time
Resource Monitoring:
- Real-time monitoring of GPU utilization and memory usage
- Track CPU and memory usage
- Monitor network IO and disk IO (if applicable)
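CPU, memory, and IO monitoring of this kind is commonly implemented with the psutil library; a minimal sketch (values are read once here, whereas a monitor would poll periodically):

```python
import psutil

cpu_percent = psutil.cpu_percent(interval=1)  # averaged over 1 second
mem = psutil.virtual_memory()                 # system memory stats
disk = psutil.disk_io_counters()              # cumulative disk IO
net = psutil.net_io_counters()                # cumulative network IO

print(f"CPU: {cpu_percent}%  RAM: {mem.percent}% "
      f"({mem.used / 1e9:.1f}/{mem.total / 1e9:.1f} GB)")
print(f"Disk read/write: {disk.read_bytes / 1e6:.0f}/{disk.write_bytes / 1e6:.0f} MB")
print(f"Net sent/recv: {net.bytes_sent / 1e6:.0f}/{net.bytes_recv / 1e6:.0f} MB")
```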

Training Management
Process Control:
- Pause Training - Temporarily pause training process, preserve current progress
- Resume Training - Resume paused training, continue from last checkpoint
- Stop Training - Safely stop training, save current checkpoint
- Training state changes are synchronized in real-time
Checkpoint Management:
- Checkpoint Information - Display all saved checkpoints with creation time, step count, file size
- Checkpoint Operations - View checkpoint details, download checkpoint files, delete old checkpoints
- Checkpoint Selection - When deploying inference, select checkpoint to use (last, best, or specific checkpoint)
- Support checkpoint version management for easy comparison and rollback
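Under the hood, a resumable checkpoint generally bundles the model weights, optimizer state, and step counter. The PyTorch sketch below illustrates the idea; file paths and key names are illustrative, not the platform's checkpoint format.

```python
import torch

def save_checkpoint(path, step, model, optimizer):
    # Bundle everything needed to resume training from this point.
    torch.save({
        "step": step,
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
    }, path)

def resume_from_checkpoint(path, model, optimizer):
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["step"]  # continue training from this step
```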
Task Operations:
- View Details - View complete training configuration and parameter settings
- Export Configuration - Export training configuration for reuse or sharing
- Clone Task - Quickly create new training task based on current configuration
- Delete Task - Clean up completed or failed training tasks
4. Model Evaluation and Export
After training completion, the platform provides complete model evaluation, export and deployment functionality:
Model Evaluation
Performance Metrics:
- Automatically calculates performance metrics for the model on the validation set
- Supports multiple evaluation metrics: accuracy, success rate, action error, etc.
- Provides model performance reports and comparative analysis
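As one concrete example of an action-error metric, the mean L2 error between predicted and ground-truth actions over a validation batch can be computed as follows (a minimal sketch, not the platform's exact metric definition):

```python
import torch

def mean_action_error(pred_actions: torch.Tensor, true_actions: torch.Tensor) -> float:
    """Mean L2 error between predicted and ground-truth action vectors.

    Both tensors are expected to have shape (num_samples, action_dim).
    """
    return torch.linalg.norm(pred_actions - true_actions, dim=-1).mean().item()

# Example with random placeholder data (7-DoF actions):
pred = torch.randn(128, 7)
true = torch.randn(128, 7)
print(f"mean action error: {mean_action_error(pred, true):.4f}")
```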
Model Comparison:
- Compare model performance of different training tasks
- Visualize comparative charts of multiple model metrics
- Help select best performing model version
Checkpoint Management
During and after training, all saved checkpoints are displayed on the training detail page:

Checkpoint Information:
- Checkpoint Name - Auto-generated or custom checkpoint name (such as "step_1000", "last", etc.)
- Training Steps - Training step count corresponding to this checkpoint
- Save Time - Timestamp when checkpoint was saved
- File Size - Size of checkpoint file
- Performance Metrics - Performance of this checkpoint on validation set
Checkpoint Operations:
- View Details - View detailed information and evaluation results of checkpoint
- Download Checkpoint - Download the checkpoint file locally for offline deployment or further analysis
- Mark as Best - Mark best performing checkpoint as best model
- Deploy Inference - One-click deploy as inference service directly from checkpoint (see next chapter for details)
Checkpoint Notes:
- last - Last saved checkpoint, usually the latest model state
- best - Best performing checkpoint on validation set, usually used for production deployment
- step_xxx - Checkpoints saved by training steps, can be used to analyze training process
Model Export
After training completes, models can be exported for:
- Offline deployment to the robot's local (on-device) runtime
- Integration with other systems
- Model version management and archiving
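Export typically amounts to saving the trained weights, and optionally a serialized graph, in a form the target runtime can load. The PyTorch sketch below uses a stand-in model; file names are illustrative.

```python
import torch

model = torch.nn.Linear(64, 32)  # stand-in for the trained policy

# Option 1: plain state_dict, reloaded by code that knows the model class.
torch.save(model.state_dict(), "policy_weights.pt")

# Option 2: TorchScript trace for runtimes without the Python model definition.
example_input = torch.randn(1, 64)
traced = torch.jit.trace(model, example_input)
traced.save("policy_traced.pt")
```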
With this, you can conveniently train your own specialized models on the Embodiflow Data Platform; model deployment and real-robot inference are covered in the next chapter.