Model Training
The Embodiflow Data Platform provides comprehensive robot learning model training capabilities, supporting end-to-end workflows from data preprocessing to model deployment. The platform integrates various mainstream robot learning algorithms, providing researchers and developers with an efficient model training environment.
Product Features
Flexible Architecture
The product adopts a layered architecture to ensure system scalability. Training compute can be provisioned in multiple ways:
- Private Cloud: Use local data center GPU servers (supports multi-GPU parallel training)
- Public Cloud: On-demand rental of cloud provider compute resources (billed by actual training duration)

From Data to Model
The platform covers the complete data pipeline from data collection, annotation, export, training fine-tuning, to model deployment.
Supported Model Types
The platform supports mainstream learning models in the robotics field, covering vision-language-action fusion, imitation learning, reinforcement learning, and other technical approaches:
Vision-Language-Action Models
- SmolVLA - Lightweight multimodal model that performs end-to-end learning of natural language instructions, visual perception, and robot actions
- OpenVLA - Large-scale pre-trained vision-language-action model supporting complex scene understanding and operation planning
Imitation Learning Models
- ACT (Action Chunking with Transformers) - Transformer-based action chunking model that decomposes continuous action sequences into discrete chunks for learning
- PI0 (π0) - Flow-matching-based generalist policy, pre-trained on large-scale robot data and fine-tuned from expert demonstration data for downstream tasks
- PI0Fast - Variant of PI0 that uses the FAST action tokenization scheme for faster training and inference
Policy Learning Models
- Diffusion Policy - Policy learning based on diffusion processes, generating continuous robot action trajectories through denoising
- VQBET - Vector quantized behavior transformer that discretizes continuous action spaces and models them using Transformers
Reinforcement Learning Models
- SAC (Soft Actor-Critic) - Maximum entropy reinforcement learning algorithm that balances exploration and exploitation in continuous action spaces
- TDMPC - Temporal difference model predictive control, combining advantages of model-based planning and model-free learning
The above models cover mainstream technical approaches and can be applied to various robot tasks, for example:
| Application Scenario | Model Used | Description |
|---|---|---|
| Desktop Organization | SmolVLA, PI0 | Robots can understand natural language instructions like "please organize the items on the desk" and execute grasping, moving, and placing actions |
| Item Sorting | ACT | Through learning expert sorting demonstrations, robots can identify different items and sort them by category |
| Complex Operation Tasks | Diffusion Policy | Robots can learn to execute complex operation sequences requiring precise control, such as assembly and cooking |
| Adaptive Control | SAC and other RL algorithms | Robots can learn optimal control strategies in dynamic environments and adapt to environmental changes |
Training Workflow
The platform provides a productized training workflow that covers every step from data preparation to model deployment through the web interface, with no coding required:
1. Data Preparation

The platform supports multiple data sources, including:
- Platform Export Data - Use robot demonstration data annotated and exported on the platform
- External Datasets - Import public datasets through URL links
- Local Data Upload - Supports standard formats such as HDF5 and LeRobot (see the sketch after this list)
- HuggingFace Datasets - Directly pull public data from the HuggingFace Hub
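Before uploading local data, it can help to confirm what your HDF5 file actually contains. The snippet below is a minimal, hedged sketch using h5py; the file name and internal layout are hypothetical, since the exact schema depends on how your demonstrations were recorded.

```python
import h5py

# Hypothetical file name; the actual layout depends on how the
# demonstrations were recorded, not on a platform-mandated schema.
with h5py.File("demo_episode_000.hdf5", "r") as f:
    # Recursively print every dataset with its shape and dtype.
    def describe(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(f"{name}: shape={obj.shape}, dtype={obj.dtype}")
    f.visititems(describe)
```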
2. Training Configuration
Computing Resource Selection
The platform supports flexible computing resource selection to meet training needs of different scales:
Training Location Selection:
- Local GPU (local-gpu) - Use local data center GPU servers; suitable for long-running training tasks and private deployment
  - Supports multi-GPU parallel training
  - Real-time display of GPU status (memory usage, temperature, utilization)
  - Suitable for large-scale datasets and long-duration training
- Public Cloud Resources - On-demand rental of cloud provider compute, billed by actual training duration
  - RunPod - Rapid deployment of GPU containers
  - AWS EC2/SageMaker/Batch - Amazon cloud service integration
  - Tencent Cloud/Alibaba Cloud - Chinese cloud service provider support
  - Suitable for temporary training tasks or burst capacity needs
Platform Auto Detection:
- CUDA Platform - Automatically detects NVIDIA GPUs and supports CUDA-accelerated training
- MPS Platform - Supports Metal Performance Shaders acceleration on Apple Silicon (M1/M2, etc.)
- CPU Platform - Automatically falls back to CPU training when no GPU is available (slower; suitable for small-scale testing)
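For reference, the auto-detection behavior described above maps onto a few lines of standard PyTorch (a minimal sketch, not the platform's internal implementation):

```python
import torch

def detect_device() -> torch.device:
    """Pick the best available accelerator: CUDA, then MPS, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # MPS backend for Apple Silicon (M1/M2, etc.)
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = detect_device()
print(f"Training will run on: {device}")
```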
GPU Selection and Monitoring:
- Before training, you can view the list of available GPUs and their real-time status
- Supports manually selecting a specific GPU or running multi-GPU parallel training
- Real-time monitoring of GPU utilization, memory usage, temperature, and other metrics (see the sketch below)
- Automatically optimizes memory allocation to avoid wasting resources
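The GPU status fields mentioned above (utilization, memory usage, temperature) are the kind of information exposed by NVIDIA's NVML. A minimal sketch using the pynvml bindings, assuming an NVIDIA GPU and the nvidia-ml-py package, might look like this:

```python
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)            # bytes
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)     # percent
    temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
    print(f"GPU {i}: util={util.gpu}% "
          f"mem={mem.used / 1e9:.1f}/{mem.total / 1e9:.1f} GB temp={temp}C")
pynvml.nvmlShutdown()
```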

Model Architecture Selection
Choose appropriate models based on specific task requirements:
- For tasks requiring natural language instruction understanding, choose SmolVLA or OpenVLA
- For imitation learning tasks with expert demonstration data, choose ACT, PI0, or PI0Fast
- For tasks requiring online learning, choose SAC or TDMPC
Training Parameter Settings
The platform provides rich training parameter configuration options to support specific needs of different models:
Common Training Parameters:
- batch_size (Batch Size) - Control number of samples used per training iteration, recommended range 1-32. Larger batches improve training stability but require more memory
- steps (Training Steps) - Total number of training steps, recommended starting from 10000, adjust based on validation results
- seed (Random Seed) - Ensure training result reproducibility, recommended using fixed values like 1000, 42
- num_workers (Data Loader Worker Count) - Accelerates data loading; recommended 1/2 to 1× the CPU core count
- eval_freq (Evaluation Frequency) - Perform model evaluation every N steps, recommended 10% of total steps
- log_freq (Log Frequency) - Print training logs every N steps, recommended 10 to 100 steps
- save_freq (Save Frequency) - Save checkpoint every N steps, recommended 30% of total steps
- save_checkpoint (Whether to Save Checkpoint) - Enable to save model checkpoints for resuming training or deployment
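These common parameters map naturally onto a single training configuration object. The dataclass below is a hypothetical sketch whose field names mirror the parameters above; it is not the platform's actual configuration schema.

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    batch_size: int = 8           # samples per iteration (1-32 typical)
    steps: int = 10_000           # total training steps
    seed: int = 42                # fixed seed for reproducibility
    num_workers: int = 4          # data-loader workers (~1/2 to 1x CPU cores)
    eval_freq: int = 1_000        # evaluate every N steps (~10% of steps)
    log_freq: int = 100           # log every N steps
    save_freq: int = 3_000        # checkpoint every N steps (~30% of steps)
    save_checkpoint: bool = True  # keep checkpoints for resume/deployment

config = TrainConfig(batch_size=16, steps=20_000)
```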
Optimizer Parameters:
- optimizer_lr (Learning Rate) - Control parameter update magnitude, recommended range 1e-4 to 1e-5. Too large causes unstable training, too small slows convergence
- optimizer_weight_decay (Weight Decay) - Regularization parameter to prevent overfitting, recommended range 0.0 to 0.01
- optimizer_grad_clip_norm (Gradient Clipping Threshold) - Prevent gradient explosion, recommended set to 1.0
- scheduler_warmup_steps (Learning Rate Warmup Steps) - Gradually increase learning rate in early training, recommended 5-10% of total steps
- scheduler_decay_steps (Learning Rate Decay Steps) - Reduce learning rate in late training, recommended 80-90% of total steps
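As a hedged PyTorch sketch of how these optimizer settings typically fit together (the model here is a stand-in, and the linear warmup/decay schedule is one simple choice, not necessarily the scheduler the platform uses):

```python
import torch

model = torch.nn.Linear(64, 32)  # stand-in for the policy network

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,            # optimizer_lr
    weight_decay=0.01,  # optimizer_weight_decay
)

total_steps, warmup_steps = 10_000, 500  # warmup ~5% of total steps

def lr_lambda(step: int) -> float:
    # Linear warmup, then linear decay toward zero.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Inside the training loop, after loss.backward():
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # optimizer_grad_clip_norm
optimizer.step()
scheduler.step()
optimizer.zero_grad()
```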
Model-Specific Parameters:
Different models support their own specific parameters:
- ACT Model:
  - chunk_size (Action Chunk Size) - Length of the action sequence predicted at once, recommended range 10-50
  - n_obs_steps (Observation History Steps) - Number of historical observation frames used, usually 1
  - n_action_steps (Execution Steps) - Number of action steps actually executed, usually equal to chunk_size
  - vision_backbone (Vision Backbone) - Choose from resnet18/34/50/101/152
  - dim_model (Model Dimension) - Main hidden dimension of the Transformer, default 512
  - n_heads (Attention Head Count) - Number of multi-head attention heads, default 8
- Diffusion Policy Model:
  - horizon (Prediction Time Span) - Action prediction length of the diffusion model, recommended 16
  - num_inference_steps (Inference Steps) - Number of sampling steps, recommended 10
- SmolVLA/OpenVLA Model:
  - max_input_seq_len (Max Input Sequence Length) - Limits the input token count, recommended 256-512
  - max_decoding_steps (Max Decoding Steps) - Maximum number of iterations for generating action sequences, recommended 256
  - freeze_lm_head (Freeze Language Model Head) - Recommended to enable during fine-tuning
  - freeze_vision_encoder (Freeze Vision Encoder) - Recommended to enable during fine-tuning
- SAC and other Reinforcement Learning Models:
  - latent_dim (Latent Space Dimension) - Encoder output dimension, recommended 256
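These model-specific fields are usually layered on top of the common configuration. The dictionary below is a hypothetical sketch of per-policy overrides whose keys mirror the parameters listed above:

```python
# Hypothetical per-policy overrides merged on top of a common config dict.
policy_overrides = {
    "act": {
        "chunk_size": 30, "n_obs_steps": 1, "n_action_steps": 30,
        "vision_backbone": "resnet18", "dim_model": 512, "n_heads": 8,
    },
    "diffusion": {"horizon": 16, "num_inference_steps": 10},
    "smolvla": {
        "max_input_seq_len": 512, "max_decoding_steps": 256,
        "freeze_lm_head": True, "freeze_vision_encoder": True,
    },
    "sac": {"latent_dim": 256},
}

def build_policy_config(policy_type: str, common: dict) -> dict:
    # Model-specific values take precedence over common defaults.
    return {**common, **policy_overrides.get(policy_type, {})}
```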
Parameter Setting Recommendations:
- For the first training run, use the default parameters to verify that training runs normally
- Adjust batch_size according to GPU memory size to avoid out-of-memory errors
- For fine-tuning a pre-trained model, lower the learning rate (e.g. 1e-5) and freeze some layers (see the sketch below)
- Regularly check the training logs and adjust the learning rate based on the loss curves
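For the fine-tuning recommendation above (lower learning rate plus frozen layers), a minimal PyTorch sketch is shown below; the vision_encoder attribute name is hypothetical and depends on the specific model class.

```python
import torch

def prepare_for_finetuning(model: torch.nn.Module, lr: float = 1e-5):
    # Freeze the (hypothetical) vision encoder so only the remaining
    # layers are updated during fine-tuning.
    if hasattr(model, "vision_encoder"):
        for p in model.vision_encoder.parameters():
            p.requires_grad = False
    # Optimize only the parameters that are still trainable, at a low LR.
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, lr=lr, weight_decay=0.01)
```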

3. Training Execution and Monitoring
After training starts, the platform provides complete real-time monitoring and management functionality:
Real-time Monitoring
Training Metrics Visualization:
- Loss Function Curves - Real-time display of training loss and validation loss for judging model convergence
- Validation Accuracy Metrics - Display model performance on validation set
- Learning Rate Changes - Visualize execution of learning rate scheduling strategy
- Training Progress - Display completed steps, total steps, estimated remaining time and other information
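Metrics of this kind (loss, learning rate, progress) are commonly logged with tools such as TensorBoard. A minimal sketch, assuming the tensorboard package is installed; the run directory and values are placeholders:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/example_run")  # hypothetical directory

# Inside the training loop:
step, train_loss, val_loss, lr = 100, 0.42, 0.47, 1e-4  # placeholder values
writer.add_scalar("loss/train", train_loss, step)
writer.add_scalar("loss/val", val_loss, step)
writer.add_scalar("lr", lr, step)
writer.close()
```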
Model Output Preview:
- Periodically output prediction samples during training
- Visualize model prediction results on validation data
- Facilitate observing model learning progress and discovering potential issues
System Logs:
- Detailed training logs with per-step information
- Real-time display of errors and warnings for quickly locating problems
- Logs are streamed in real time, so the latest training status can be viewed at any time
Resource Monitoring:
- Real-time monitoring of GPU utilization and memory usage
- Track CPU and memory usage
- Monitor network IO and disk IO (if applicable)
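CPU, memory, and IO monitoring of this kind is commonly implemented with the psutil library; a minimal sketch (values are read once here, whereas a monitor would poll periodically):

```python
import psutil

cpu_percent = psutil.cpu_percent(interval=1)  # averaged over 1 second
mem = psutil.virtual_memory()                 # system memory stats
disk = psutil.disk_io_counters()              # cumulative disk IO
net = psutil.net_io_counters()                # cumulative network IO

print(f"CPU: {cpu_percent}%  RAM: {mem.percent}% "
      f"({mem.used / 1e9:.1f}/{mem.total / 1e9:.1f} GB)")
print(f"Disk read/write: {disk.read_bytes / 1e6:.0f}/{disk.write_bytes / 1e6:.0f} MB")
print(f"Net sent/recv: {net.bytes_sent / 1e6:.0f}/{net.bytes_recv / 1e6:.0f} MB")
```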

Training Management
Process Control:
- Pause Training - Temporarily pause training process, preserve current progress
- Resume Training - Resume paused training, continue from last checkpoint
- Stop Training - Safely stop training, save current checkpoint
- Training state changes are synchronized in real-time
Checkpoint Management:
- Checkpoint Information - Display all saved checkpoints with creation time, step count, file size
- Checkpoint Operations - View checkpoint details, download checkpoint files, delete old checkpoints
- Checkpoint Selection - When deploying inference, select checkpoint to use (last, best, or specific checkpoint)
- Support checkpoint version management for easy comparison and rollback
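Under the hood, a resumable checkpoint generally bundles the model weights, optimizer state, and step counter. The PyTorch sketch below illustrates the idea; file paths and key names are illustrative, not the platform's checkpoint format.

```python
import torch

def save_checkpoint(path, step, model, optimizer):
    # Bundle everything needed to resume training from this point.
    torch.save({
        "step": step,
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
    }, path)

def resume_from_checkpoint(path, model, optimizer):
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["step"]  # continue training from this step
```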
Task Operations:
- View Details - View complete training configuration and parameter settings
- Export Configuration - Export training configuration for reuse or sharing
- Clone Task - Quickly create new training task based on current configuration
- Delete Task - Clean up completed or failed training tasks
4. Model Evaluation and Export
After training completion, the platform provides complete model evaluation, export and deployment functionality:
Model Evaluation
Performance Metrics:
- Automatically calculates performance metrics for the model on the validation set
- Supports multiple evaluation metrics: accuracy, success rate, action error, etc.
- Provides model performance reports and comparative analysis
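As one concrete example of an action-error metric, the mean L2 error between predicted and ground-truth actions over a validation batch can be computed as follows (a minimal sketch, not the platform's exact metric definition):

```python
import torch

def mean_action_error(pred_actions: torch.Tensor, true_actions: torch.Tensor) -> float:
    """Mean L2 error between predicted and ground-truth action vectors.

    Both tensors are expected to have shape (num_samples, action_dim).
    """
    return torch.linalg.norm(pred_actions - true_actions, dim=-1).mean().item()

# Example with random placeholder data (7-DoF actions):
pred = torch.randn(128, 7)
true = torch.randn(128, 7)
print(f"mean action error: {mean_action_error(pred, true):.4f}")
```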
Model Comparison:
- Compare model performance of different training tasks
- Visualize comparative charts of multiple model metrics
- Help select best performing model version
Checkpoint Management
During and after training, all saved checkpoints are displayed on the training detail page:

Checkpoint Information:
- Checkpoint Name - Auto-generated or custom checkpoint name (such as "step_1000", "last", etc.)
- Training Steps - Training step count corresponding to this checkpoint
- Save Time - Timestamp when checkpoint was saved
- File Size - Size of checkpoint file
- Performance Metrics - Performance of this checkpoint on validation set
Checkpoint Operations:
- View Details - View detailed information and evaluation results of checkpoint
- Download Checkpoint - Download the checkpoint file locally for offline deployment or further analysis
- Mark as Best - Mark best performing checkpoint as best model
- Deploy Inference - One-click deploy as inference service directly from checkpoint (see next chapter for details)
Checkpoint Notes:
- last - Last saved checkpoint, usually the latest model state
- best - Best performing checkpoint on validation set, usually used for production deployment
- step_xxx - Checkpoints saved by training steps, can be used to analyze training process
Model Export
After training completes, models can be exported for:
- Offline deployment to the robot's local (on-device) runtime
- Integration with other systems
- Model version management and archiving
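Export typically amounts to saving the trained weights, and optionally a serialized graph, in a form the target runtime can load. The PyTorch sketch below uses a stand-in model; file names are illustrative.

```python
import torch

model = torch.nn.Linear(64, 32)  # stand-in for the trained policy

# Option 1: plain state_dict, reloaded by code that knows the model class.
torch.save(model.state_dict(), "policy_weights.pt")

# Option 2: TorchScript trace for runtimes without the Python model definition.
example_input = torch.randn(1, 64)
traced = torch.jit.trace(model, example_input)
traced.save("policy_traced.pt")
```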
With this, you can conveniently train your own specialized models on the Embodiflow Data Platform; model deployment and real-robot inference are covered in the next chapter.