Model Training

The Embodiflow Data Platform provides comprehensive model training for robot learning, supporting end-to-end workflows from data preprocessing to model deployment. It integrates mainstream robot learning algorithms, giving researchers and developers an efficient model training environment.

Product Features

Flexible Architecture

The product adopts a layered architecture to keep the system scalable. Training compute can be provisioned in several ways:

  • Private Cloud - Use GPU servers in your own data center (supports multi-GPU parallel training)
  • Public Cloud - Rent cloud-provider computing resources on demand (billed by actual training duration)

Select Training Location

From Data to Model

The platform covers the complete pipeline from data collection, annotation, and export through training and fine-tuning to model deployment.

Supported Model Types

The platform supports mainstream learning models in the robotics field, covering vision-language-action fusion, imitation learning, reinforcement learning, and other technical approaches:

Vision-Language-Action Models

  • SmolVLA - Lightweight multimodal model that performs end-to-end learning of natural language instructions, visual perception, and robot actions
  • OpenVLA - Large-scale pre-trained vision-language-action model supporting complex scene understanding and operation planning

Imitation Learning Models

  • ACT (Action Chunking Transformer) - Transformer-based action chunking model that decomposes continuous action sequences into discrete chunks for learning
  • PI0 - Pre-trained generalist policy (π0) that learns manipulation skills from expert demonstration data
  • PI0Fast - Variant of PI0 with an optimized action representation and improved training strategies for faster convergence

Policy Learning Models

  • Diffusion Policy - Policy learning based on diffusion processes, generating continuous robot action trajectories through denoising
  • VQBET - Vector-quantized behavior transformer (VQ-BeT) that discretizes the continuous action space and models it with a Transformer

Reinforcement Learning Models

  • SAC (Soft Actor-Critic) - Maximum entropy reinforcement learning algorithm that balances exploration and exploitation in continuous action spaces
  • TDMPC - Temporal difference model predictive control, combining advantages of model-based planning and model-free learning

info

The above models cover mainstream technical approaches and can be applied to various robot tasks, for example:

| Application Scenario | Model Used | Description |
| --- | --- | --- |
| Desktop Organization | SmolVLA, PI0 | Robots understand natural-language instructions such as "please organize the items on the desk" and execute grasping, moving, and placing actions |
| Item Sorting | ACT | By learning from expert sorting demonstrations, robots identify different items and sort them by category |
| Complex Operation Tasks | Diffusion Policy | Robots learn operation sequences that require precise control, such as assembly and cooking |
| Adaptive Control | SAC and other RL algorithms | Robots learn optimal control strategies in dynamic environments and adapt to environmental changes |

Training Workflow

The platform provides a guided, web-based training workflow that takes you from data preparation to model deployment without writing any code:

1. Data Preparation

Select Training Data

The platform supports multiple data sources, including:

  • Platform Export Data - Robot demonstration data annotated and exported on the platform (see Export Training Data)
  • External Datasets - Import public datasets via URL
  • Local Data Upload - Supports standard formats such as HDF5 and LeRobot
  • HuggingFace Datasets - Pull public datasets directly from the HuggingFace Hub (sketched below)
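
A minimal sketch of the last two options, assuming the huggingface_hub and h5py packages; the repo_id and file path are placeholders, not platform defaults:

```python
# Sketch: fetch a public dataset from HuggingFace Hub and inspect a local HDF5 file.
import h5py
from huggingface_hub import snapshot_download

# Download a public LeRobot-format dataset from HuggingFace Hub
local_dir = snapshot_download(repo_id="lerobot/pusht", repo_type="dataset")
print(f"Dataset downloaded to: {local_dir}")

# Open a locally recorded HDF5 demonstration file and list its contents
with h5py.File("demo_episode_0.hdf5", "r") as f:  # placeholder path
    f.visit(print)  # print every group/dataset name in the file
```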

2. Training Configuration

Computing Resource Selection

The platform supports flexible computing resource selection to meet training needs of different scales:

Training Location Selection:

  • Local GPU (local-gpu) - Use GPU servers in your own data center; suited to long-running training tasks and private deployment

    • Supports multi-GPU parallel training
    • Displays GPU status in real time (memory usage, temperature, utilization)
    • Suited to large-scale datasets and long-duration training
  • Public Cloud Resources - Rent cloud-provider computing resources on demand, billed by actual training duration

    • RunPod - Rapid deployment of GPU containers
    • AWS EC2/SageMaker/Batch - Amazon cloud service integration
    • Tencent Cloud/Alibaba Cloud - Support for China-based cloud providers
    • Suited to temporary training tasks or elastic capacity needs

Platform Auto Detection:

  • CUDA Platform - Automatically detects NVIDIA GPUs and enables CUDA-accelerated training
  • MPS Platform - Supports Metal Performance Shaders acceleration on Apple Silicon (M1/M2, etc.)
  • CPU Platform - Falls back to CPU automatically when no GPU is available (slower; suitable for small-scale testing)
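
A minimal device-selection sketch mirroring this fallback order, using standard PyTorch checks:

```python
import torch

def detect_device() -> torch.device:
    """Pick CUDA, then Apple MPS, then CPU - the platform's fallback order."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = detect_device()
print(f"Training on: {device}")
```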

GPU Selection and Monitoring:

  • Before training, view the list of available GPUs and their real-time status
  • Manually select a specific GPU or enable multi-GPU parallelism
  • Monitor GPU utilization, memory usage, temperature, and other metrics in real time
  • Memory allocation is optimized automatically to avoid wasting resources
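
A hedged sketch of how such GPU status can be queried, using NVIDIA's NVML bindings (the pynvml module from the nvidia-ml-py package); the platform's own monitor may differ:

```python
# Sketch: query per-GPU memory, utilization, and temperature via NVML.
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    name = pynvml.nvmlDeviceGetName(handle)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
    print(f"GPU {i} {name}: {mem.used / 2**30:.1f}/{mem.total / 2**30:.1f} GiB, "
          f"{util.gpu}% util, {temp}°C")
pynvml.nvmlShutdown()
```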

Select Training Location

Model Architecture Selection

Choose appropriate models based on specific task requirements:

  • For tasks requiring natural language instruction understanding, choose SmolVLA or OpenVLA
  • For imitation learning tasks with expert demonstration data, choose ACT, PI0, or PI0Fast
  • For tasks requiring online learning, choose SAC or TDMPC
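
As a compact restatement of this guidance, an illustrative lookup; the keys and the helper are hypothetical, not a platform API:

```python
# Illustrative only: mapping task requirements to the candidate models above.
MODEL_CHOICES = {
    "language_instructions": ["SmolVLA", "OpenVLA"],
    "expert_demonstrations": ["ACT", "PI0", "PI0Fast"],
    "online_learning": ["SAC", "TDMPC"],
}

def suggest_models(requirement: str) -> list[str]:
    """Return candidate models for a task requirement (hypothetical helper)."""
    return MODEL_CHOICES.get(requirement, [])

print(suggest_models("expert_demonstrations"))  # ['ACT', 'PI0', 'PI0Fast']
```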

Training Parameter Settings

The platform provides rich training parameter configuration options to support specific needs of different models:

Common Training Parameters:

  • batch_size (Batch Size) - Number of samples per training iteration; recommended range 1-32. Larger batches improve training stability but require more memory
  • steps (Training Steps) - Total number of training steps; start around 10,000 and adjust based on validation results
  • seed (Random Seed) - Ensures training reproducibility; use a fixed value such as 1000 or 42
  • num_workers (Data Loader Workers) - Speeds up data loading; recommended between half and the full CPU core count
  • eval_freq (Evaluation Frequency) - Evaluate the model every N steps; recommended ~10% of total steps
  • log_freq (Log Frequency) - Print training logs every N steps; recommended 10-100 steps
  • save_freq (Save Frequency) - Save a checkpoint every N steps; recommended ~30% of total steps
  • save_checkpoint (Save Checkpoints) - Enable to save model checkpoints for resuming training or deployment
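
For concreteness, the common parameters above expressed as a config object with the documented recommendations; the dataclass itself is illustrative, not the platform's schema:

```python
# Sketch: common training parameters as a config object.
from dataclasses import dataclass

@dataclass
class TrainConfig:
    batch_size: int = 8           # 1-32; larger is more stable but needs more memory
    steps: int = 10_000           # total training steps; tune from validation results
    seed: int = 42                # fixed seed for reproducibility
    num_workers: int = 4          # data-loader workers, ~1/2 to 1x CPU core count
    eval_freq: int = 1_000        # evaluate every N steps (~10% of total)
    log_freq: int = 100           # log every N steps (10-100)
    save_freq: int = 3_000        # checkpoint every N steps (~30% of total)
    save_checkpoint: bool = True  # keep checkpoints for resume/deployment

config = TrainConfig()
```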

Optimizer Parameters:

  • optimizer_lr (Learning Rate) - Controls the parameter update magnitude; recommended range 1e-5 to 1e-4. Too large destabilizes training; too small slows convergence
  • optimizer_weight_decay (Weight Decay) - Regularization to prevent overfitting; recommended range 0.0 to 0.01
  • optimizer_grad_clip_norm (Gradient Clipping Threshold) - Prevents gradient explosion; recommended value 1.0
  • scheduler_warmup_steps (Learning Rate Warmup Steps) - Ramps the learning rate up early in training; recommended 5-10% of total steps
  • scheduler_decay_steps (Learning Rate Decay Steps) - Reduces the learning rate late in training; recommended 80-90% of total steps
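
A minimal sketch wiring these settings together, assuming an AdamW-style optimizer and a linear warmup/decay schedule (the platform does not specify the optimizer class):

```python
# Sketch: optimizer, warmup/decay schedule, and gradient clipping per the
# recommendations above. The Linear layer stands in for the policy network.
import torch

model = torch.nn.Linear(64, 32)
total_steps, warmup_steps, decay_steps = 10_000, 500, 8_500

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)

def lr_lambda(step: int) -> float:
    if step < warmup_steps:   # linear warmup early in training
        return step / max(1, warmup_steps)
    if step > decay_steps:    # linear decay late in training
        return max(0.0, (total_steps - step) / (total_steps - decay_steps))
    return 1.0

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Inside the training loop, clip gradients before the optimizer step:
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```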

Model-Specific Parameters:

Different models support their own specific parameters:

  • ACT Model:

    • chunk_size (Action Chunk Size) - Length of the action sequence predicted at once; recommended range 10-50
    • n_obs_steps (Observation History Steps) - Number of historical observation frames used; typically 1
    • n_action_steps (Execution Steps) - Number of predicted actions actually executed; usually equal to chunk_size
    • vision_backbone (Vision Backbone) - One of resnet18/34/50/101/152
    • dim_model (Model Dimension) - Main hidden dimension of the Transformer; default 512
    • n_heads (Attention Heads) - Number of multi-head attention heads; default 8
  • Diffusion Policy Model:

    • horizon (Prediction Horizon) - Action prediction length of the diffusion model; recommended 16
    • num_inference_steps (Inference Steps) - Number of denoising steps at inference time; recommended 10
  • SmolVLA/OpenVLA Model:

    • max_input_seq_len (Max Input Sequence Length) - Limits the input token count; recommended 256-512
    • max_decoding_steps (Max Decoding Steps) - Maximum number of iterations when generating action sequences; recommended 256
    • freeze_lm_head (Freeze Language Model Head) - Recommended during fine-tuning
    • freeze_vision_encoder (Freeze Vision Encoder) - Recommended during fine-tuning
  • SAC and other Reinforcement Learning Models:

    • latent_dim (Latent Space Dimension) - Encoder output dimension; recommended 256

tip

Parameter Setting Recommendations:

  • For a first training run, use the default parameters to verify that training completes normally
  • Adjust batch_size to fit GPU memory and avoid out-of-memory errors
  • When fine-tuning a pre-trained model, lower the learning rate (e.g., 1e-5) and freeze some layers, as sketched below
  • Check the training logs regularly and adjust the learning rate based on the loss curves
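
A minimal sketch of the fine-tuning recommendation, assuming a PyTorch policy whose submodules mirror the freeze_vision_encoder and freeze_lm_head flags; DummyVLA and its attribute names are placeholders, not the platform's classes:

```python
import torch

class DummyVLA(torch.nn.Module):
    """Placeholder model with the components the freeze flags refer to."""
    def __init__(self):
        super().__init__()
        self.vision_encoder = torch.nn.Linear(64, 64)  # stands in for a ViT/ResNet
        self.lm_head = torch.nn.Linear(64, 32)         # language model head
        self.action_head = torch.nn.Linear(64, 7)      # action decoder to fine-tune

def freeze(module: torch.nn.Module) -> None:
    for p in module.parameters():
        p.requires_grad = False

model = DummyVLA()
freeze(model.vision_encoder)  # freeze_vision_encoder = True
freeze(model.lm_head)         # freeze_lm_head = True

# Fine-tune only the remaining trainable parameters at a reduced learning rate
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)
```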

Training Parameter Settings

3. Training Execution and Monitoring

After training starts, the platform provides complete real-time monitoring and management functionality:

Real-time Monitoring

Training Metrics Visualization:

  • Loss Curves - Real-time display of training and validation loss, for judging model convergence
  • Validation Metrics - Model performance on the validation set
  • Learning Rate - Visualizes the learning rate schedule as it executes
  • Training Progress - Completed steps, total steps, estimated time remaining, and more
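
The platform renders these charts itself; as an illustration of how such curves are typically produced, a TensorBoard-style logging sketch (the tag names and placeholder values are arbitrary, not platform APIs):

```python
# Illustration: logging scalar training curves with TensorBoard's SummaryWriter.
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/act_experiment")
for step in range(1, 101):
    train_loss = 1.0 / step  # placeholder for the real per-step training loss
    writer.add_scalar("loss/train", train_loss, step)
    writer.add_scalar("lr", 1e-4, step)
writer.close()
```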

Model Output Preview:

  • Periodically outputs prediction samples during training
  • Visualizes model predictions on validation data
  • Makes it easy to observe learning progress and spot potential issues early

System Logs:

  • Detailed log records for every training step
  • Errors and warnings are shown in real time for quick diagnosis
  • Logs are streamed live, so the latest training status is always visible

Resource Monitoring:

  • Real-time monitoring of GPU utilization and memory usage
  • Tracks CPU and memory usage
  • Monitors network and disk IO (where applicable)
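
A minimal sketch of host-side resource sampling, assuming a psutil-style probe backs these metrics (the platform's actual collector is not specified):

```python
# Sketch: sampling the host resources listed above with psutil.
import psutil

print(f"CPU: {psutil.cpu_percent(interval=1.0)}%")

mem = psutil.virtual_memory()
print(f"RAM: {mem.used / 2**30:.1f}/{mem.total / 2**30:.1f} GiB ({mem.percent}%)")

disk = psutil.disk_io_counters()
net = psutil.net_io_counters()
print(f"Disk IO: {disk.read_bytes / 2**20:.0f} MiB read, "
      f"{disk.write_bytes / 2**20:.0f} MiB written")
print(f"Net IO: {net.bytes_recv / 2**20:.0f} MiB in, "
      f"{net.bytes_sent / 2**20:.0f} MiB out")
```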

The training detail page provides real-time monitoring and management functions

Training Management

Process Control:

  • Pause Training - Temporarily pauses the training process while preserving current progress
  • Resume Training - Resumes a paused run from the last checkpoint
  • Stop Training - Stops training safely and saves the current checkpoint
  • Training state changes are synchronized in real time

Checkpoint Management:

  • Checkpoint Information - Lists all saved checkpoints with creation time, step count, and file size
  • Checkpoint Operations - View checkpoint details, download checkpoint files, delete old checkpoints
  • Checkpoint Selection - When deploying inference, choose which checkpoint to use (last, best, or a specific step)
  • Checkpoint versioning makes comparison and rollback easy (see the save/resume sketch below)
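
A minimal sketch of the save/resume mechanics behind pause, resume, and checkpointing, assuming PyTorch-style state dicts (the platform's on-disk checkpoint format is not specified):

```python
# Sketch: saving and restoring a training checkpoint with PyTorch state dicts.
import os
import torch

model = torch.nn.Linear(64, 7)  # stand-in for the policy network
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
step = 1_000

# Save (e.g., every save_freq steps, or when training is paused/stopped)
os.makedirs("checkpoints", exist_ok=True)
torch.save(
    {"step": step, "model": model.state_dict(), "optimizer": optimizer.state_dict()},
    "checkpoints/step_1000.pt",
)

# Resume: restore weights, optimizer state, and the step counter, then continue
ckpt = torch.load("checkpoints/step_1000.pt")
model.load_state_dict(ckpt["model"])
optimizer.load_state_dict(ckpt["optimizer"])
start_step = ckpt["step"] + 1
```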

Task Operations:

  • View Details - View the complete training configuration and parameter settings
  • Export Configuration - Export the training configuration for reuse or sharing
  • Clone Task - Quickly create a new training task from the current configuration
  • Delete Task - Clean up completed or failed training tasks

Task Replication - Quickly create new tasks based on successful training configurations

4. Model Evaluation and Export

After training completion, the platform provides complete model evaluation, export and deployment functionality:

Model Evaluation

Performance Metrics:

  • Automatically computes performance metrics on the validation set
  • Supports multiple evaluation metrics: accuracy, success rate, action error, etc. (see the sketch below)
  • Provides performance reports and comparative analysis
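
As an illustration of two of these metrics, a sketch computing mean action error and success rate over placeholder validation tensors (the platform's exact metric definitions are not specified):

```python
# Sketch: mean action error (L2) and success rate over placeholder validation data.
import torch

pred_actions = torch.randn(32, 7)  # predicted actions (batch, action_dim)
true_actions = torch.randn(32, 7)  # ground-truth expert actions
episode_success = torch.tensor([1.0, 0.0, 1.0, 1.0])  # per-episode success flags

action_error = torch.linalg.vector_norm(pred_actions - true_actions, dim=-1).mean()
success_rate = episode_success.mean()

print(f"mean action L2 error: {action_error:.3f}, success rate: {success_rate:.0%}")
```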

Model Comparison:

  • Compare model performance across different training tasks
  • Visualize side-by-side charts of multiple models' metrics
  • Helps select the best-performing model version

Checkpoint Management

During and after training, all saved checkpoints are displayed on the training detail page:

Model Checkpoint List

Checkpoint Information:

  • Checkpoint Name - Auto-generated or custom checkpoint name (such as "step_1000", "last", etc.)
  • Training Steps - Training step count corresponding to this checkpoint
  • Save Time - Timestamp when checkpoint was saved
  • File Size - Size of checkpoint file
  • Performance Metrics - Performance of this checkpoint on validation set

Checkpoint Operations:

  • View Details - View a checkpoint's detailed information and evaluation results
  • Download Checkpoint - Download the checkpoint file locally for offline deployment or further analysis
  • Mark as Best - Flag the best-performing checkpoint as the best model
  • Deploy Inference - Deploy an inference service directly from a checkpoint in one click (see the next chapter for details)

info

Checkpoint Notes:

  • last - Last saved checkpoint, usually the latest model state
  • best - Best performing checkpoint on validation set, usually used for production deployment
  • step_xxx - Checkpoints saved by training steps, can be used to analyze training process

Model Export

After training completes, models can be exported for:

  • Offline deployment on the robot itself
  • Integration with other systems
  • Model version management and archiving
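
A hedged sketch of two common export paths for offline deployment, TorchScript and ONNX; whether the platform emits these formats is an assumption, and the policy module here is a placeholder:

```python
# Sketch: exporting a trained policy as TorchScript and ONNX for offline use.
import torch

model = torch.nn.Linear(64, 7).eval()  # stand-in for the trained policy
example_input = torch.randn(1, 64)     # one observation-shaped example

# TorchScript: a self-contained archive loadable without the Python class
torch.jit.trace(model, example_input).save("policy_torchscript.pt")

# ONNX: a portable graph for non-PyTorch runtimes on the robot
torch.onnx.export(model, example_input, "policy.onnx", opset_version=17)
```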

With this workflow, you can conveniently train specialized models on the Embodiflow Data Platform; the next chapter covers model deployment and inference on the real robot.