
Model Training

The Embodiflow Data Platform provides comprehensive robot learning model training capabilities, supporting end-to-end workflows from data preprocessing to model deployment. The platform integrates various mainstream robot learning algorithms, providing researchers and developers with an efficient model training environment.

Product Features

Flexible Architecture

The product adopts a layered architecture to ensure system scalability. Training compute can be provisioned in several ways:

  • Private Cloud: Use GPU servers in your local data center (supports multi-GPU parallel training)
  • Public Cloud: On-demand rental of cloud service provider computing resources (billed by actual training duration)

Select Training Location

From Data to Model

The platform covers the complete pipeline from data collection, annotation, and export through training, fine-tuning, and model deployment.

Supported Model Types

The platform supports mainstream learning models in the robotics field, covering vision-language-action fusion, imitation learning, reinforcement learning, and other technical approaches:

Vision-Language-Action Models

  • SmolVLA - Lightweight multimodal model that performs end-to-end learning of natural language instructions, visual perception, and robot actions
  • OpenVLA - Large-scale pre-trained vision-language-action model supporting complex scene understanding and operation planning

Imitation Learning Models

  • ACT (Action Chunking Transformer) - Transformer-based action chunking model that decomposes continuous action sequences into discrete chunks for learning
  • PI0 (π0) - Flow-matching-based generalist policy that learns continuous action sequences from expert demonstration data
  • PI0Fast (π0-FAST) - Variant of PI0 that discretizes actions with the FAST tokenizer for faster training and inference
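To make ACT's action-chunking idea concrete, the sketch below splits a continuous action sequence into fixed-size chunks. The chunk size and the pad-by-repeating-the-last-action rule are illustrative assumptions, not the platform's actual implementation.

```python
def chunk_actions(actions, chunk_size=8):
    """Split a flat action sequence into fixed-size chunks (ACT-style).

    The trailing chunk is padded by repeating the last action so every
    chunk has the same length -- an illustrative choice, not a spec.
    """
    chunks = []
    for start in range(0, len(actions), chunk_size):
        chunk = actions[start:start + chunk_size]
        if len(chunk) < chunk_size:  # pad the final, short chunk
            chunk = chunk + [chunk[-1]] * (chunk_size - len(chunk))
        chunks.append(chunk)
    return chunks

# A 10-step 1-D action trajectory split into chunks of 4
trajectory = [0.1 * i for i in range(10)]
chunks = chunk_actions(trajectory, chunk_size=4)
```

At inference time a chunked policy predicts one whole chunk per forward pass instead of a single action, which is the efficiency gain the chunk decomposition buys.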

Policy Learning Models

  • Diffusion Policy - Policy learning based on diffusion processes, generating continuous robot action trajectories through denoising
  • VQBET (Vector-Quantized Behavior Transformer) - Discretizes continuous action spaces via vector quantization and models them with a Transformer

Reinforcement Learning Models

  • SAC (Soft Actor-Critic) - Maximum entropy reinforcement learning algorithm that balances exploration and exploitation in continuous action spaces
  • TDMPC - Temporal difference model predictive control, combining advantages of model-based planning and model-free learning
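To illustrate the "vector quantization" step behind VQBET, the sketch below snaps a continuous action to its nearest entry in a small codebook. The codebook values are made up for illustration; the real model learns its codebook jointly with the Transformer rather than using a fixed table.

```python
import math

def quantize(action, codebook):
    """Return (index, code) of the codebook vector closest to `action`
    under Euclidean distance -- the core of vector quantization."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    index = min(range(len(codebook)), key=lambda i: dist(action, codebook[i]))
    return index, codebook[index]

# Toy 2-D action codebook (illustrative values only)
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
idx, code = quantize((0.9, 0.2), codebook)
```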


The above models cover mainstream technical approaches and can be applied to various robot tasks, for example:

  • Desktop Organization (SmolVLA, PI0): Robots understand natural language instructions like "please organize the items on the desk" and execute grasping, moving, and placing actions
  • Item Sorting (ACT): By learning expert sorting demonstrations, robots identify different items and sort them by category
  • Complex Operation Tasks (Diffusion Policy): Robots learn to execute operation sequences that require precise control, such as assembly and cooking
  • Adaptive Control (SAC and other RL algorithms): Robots learn optimal control strategies in dynamic environments and adapt to environmental changes

Training Workflow

The platform provides a productized training workflow that lets you complete everything from data preparation to model deployment through the web interface, without writing any code:

1. Data Preparation

Select Training Data

The platform supports multiple data sources, including:

  • Platform Export Data - Robot demonstration data annotated on the platform and exported via Export Training Data
  • External Datasets - Import public datasets through URL links
  • Local Data Upload - Upload local files in standard formats such as HDF5 and LeRobot
  • HuggingFace Datasets - Pull public datasets directly from the HuggingFace Hub
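Before uploading local data, it can help to sanity-check that each episode carries the fields a trainer expects. The schema below (observation/action/timestamp keys) is an assumed example for illustration, not the platform's published format; consult the export documentation for the authoritative field list.

```python
REQUIRED_KEYS = {"observations", "actions", "timestamps"}  # assumed schema

def validate_episode(episode: dict) -> list[str]:
    """Return a list of problems found in one demonstration episode."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS - episode.keys()]
    if not problems and len(episode["observations"]) != len(episode["actions"]):
        problems.append("observations and actions have different lengths")
    return problems

good = {"observations": [1, 2], "actions": [0.1, 0.2], "timestamps": [0.0, 0.1]}
bad = {"observations": [1, 2], "actions": [0.1]}
```

Running a check like this locally catches malformed episodes before they cost a cloud-training run.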

2. Training Configuration

Computing Resource Selection

  • Private Cloud Computing - Use dedicated GPU servers, suitable for long-term training tasks
  • Public Cloud Resources - Support various cloud services like RunPod, AWS, Tencent Cloud, Alibaba Cloud
  • GPU Selection - Real-time display of GPU status including memory usage, temperature, utilization
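The GPU status shown in the UI can also be queried on a private-cloud server from the command line. `nvidia-smi` and the query fields below are real NVIDIA tooling; the parser itself is only a sketch, and the sample line is canned rather than live output.

```python
def parse_gpu_status(csv_line):
    """Parse one line of `nvidia-smi --query-gpu=name,memory.used,memory.total,
    temperature.gpu,utilization.gpu --format=csv,noheader,nounits` output."""
    name, mem_used, mem_total, temp, util = [f.strip() for f in csv_line.split(",")]
    return {
        "name": name,
        "memory_used_mib": int(mem_used),
        "memory_total_mib": int(mem_total),
        "temperature_c": int(temp),
        "utilization_pct": int(util),
    }

# Canned sample line in the format the query above produces
sample = "NVIDIA A100-SXM4-80GB, 12345, 81920, 54, 87"
status = parse_gpu_status(sample)
```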

Model Architecture Selection

Choose appropriate models based on specific task requirements:

  • For tasks requiring natural language instruction understanding, choose SmolVLA or OpenVLA
  • For imitation learning tasks with expert demonstration data, choose ACT, PI0, or PI0Fast
  • For tasks requiring online learning, choose SAC or TDMPC
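The guidance above can be captured in a small lookup. The task categories and returned model names mirror the bullets, but the helper itself is only an illustrative sketch, not a platform API.

```python
# Illustrative mapping from task requirement to candidate models,
# following the selection guidance above; not a platform API.
MODEL_CHOICES = {
    "language_instruction": ["SmolVLA", "OpenVLA"],
    "imitation_from_demos": ["ACT", "PI0", "PI0Fast"],
    "online_learning": ["SAC", "TDMPC"],
}

def suggest_models(task_type: str) -> list[str]:
    try:
        return MODEL_CHOICES[task_type]
    except KeyError:
        raise ValueError(f"unknown task type: {task_type!r}") from None

candidates = suggest_models("language_instruction")
```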

Training Parameter Settings

  • Basic Parameters - batch_size, training steps, random seed, etc.
  • Optimization Parameters - learning rate, optimizer type, learning rate scheduling strategy
  • Model Parameters - Model-specific parameters like ACT's chunk_size, observation steps
  • Monitoring Parameters - evaluation frequency, log frequency, checkpoint saving strategy
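Grouped together, a configuration covering the four parameter categories above might look like the dataclass below. The field names and default values are illustrative assumptions, not the platform's actual configuration schema.

```python
from dataclasses import dataclass, asdict

@dataclass
class TrainConfig:
    # Basic parameters (illustrative defaults)
    batch_size: int = 64
    training_steps: int = 100_000
    seed: int = 42
    # Optimization parameters
    learning_rate: float = 1e-4
    optimizer: str = "adamw"
    lr_schedule: str = "cosine"
    # Model-specific parameters (ACT-style example)
    chunk_size: int = 100
    n_obs_steps: int = 1
    # Monitoring parameters
    eval_every_steps: int = 5_000
    log_every_steps: int = 100
    checkpoint_every_steps: int = 10_000

cfg = TrainConfig(batch_size=32, learning_rate=5e-5)
```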

3. Training Monitoring and Management

After training starts, the platform provides complete monitoring and management functions:

Real-time Monitoring

  • Training Metrics - Real-time visualization of key indicators like loss function, validation accuracy, learning rate
  • Model Output - Prediction samples during training to observe model learning progress
  • System Logs - Detailed training logs and error information for troubleshooting

Training detail page provides real-time training monitoring and management functions
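Raw loss curves are noisy, so monitoring dashboards typically plot an exponentially smoothed version alongside the raw values. The sketch below shows the standard EMA recurrence; the smoothing factor is an arbitrary illustrative choice.

```python
def ema(values, alpha=0.1):
    """Exponential moving average: s_t = alpha*x_t + (1-alpha)*s_{t-1}."""
    smoothed = []
    s = None
    for x in values:
        s = x if s is None else alpha * x + (1 - alpha) * s
        smoothed.append(s)
    return smoothed

losses = [1.0, 0.8, 1.2, 0.6, 0.5]
curve = ema(losses, alpha=0.5)
```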

Training Management

  • Process Control - Support pause, resume, stop training tasks
  • Checkpoint Management - Automatically save model checkpoints, support resuming training and version rollback
  • Parameter Adjustment - Online adjustment of key parameters like learning rate
  • Task Replication - Quickly create new tasks based on successful training configurations
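Checkpoint management of the kind described above can be sketched as saving numbered snapshots and restoring the newest one. The JSON file layout here is a made-up illustration, not the platform's storage format.

```python
import json
import tempfile
from pathlib import Path

def save_checkpoint(ckpt_dir: Path, step: int, state: dict) -> Path:
    """Write a numbered checkpoint file (illustrative JSON layout)."""
    path = ckpt_dir / f"checkpoint_{step:08d}.json"
    path.write_text(json.dumps({"step": step, "state": state}))
    return path

def load_latest(ckpt_dir: Path) -> dict:
    """Resume from the checkpoint with the highest step number."""
    latest = max(ckpt_dir.glob("checkpoint_*.json"))  # zero-padded names sort by step
    return json.loads(latest.read_text())

ckpt_dir = Path(tempfile.mkdtemp())
save_checkpoint(ckpt_dir, 1000, {"loss": 0.9})
save_checkpoint(ckpt_dir, 2000, {"loss": 0.7})
resumed = load_latest(ckpt_dir)
```

Zero-padding the step number in the filename keeps lexicographic order equal to numeric order, which is what makes the `max()` lookup correct.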

4. Model Evaluation and Export

After training completes, the platform provides model export and one-click inference deployment:

Model Output


With this, you can train your own specialized models on the Embodiflow Data Platform; the next chapter covers model deployment and real-machine inference.