Model Training

EmbodyFlow Platform provides complete robot learning model training capabilities, supporting an end-to-end workflow from data preprocessing to model deployment. The platform integrates various mainstream robot learning algorithms, providing an efficient model training environment for researchers and developers.

Product Features

Flexible Architecture

The platform adopts a layered architecture to keep the system scalable. Training compute is available in multiple forms:

  • Private Cloud: Use GPU servers in a local data center (supports multi-card parallel training)
  • Public Cloud: Rent computing resources from cloud providers on demand (billed based on actual training duration)

Select training location

From Data to Model

The platform covers the entire pipeline from data collection and annotation through export and fine-tuning to model deployment.

Supported Model Types

The platform supports mainstream learning models in the robotics field, covering vision-language-action fusion, imitation learning, reinforcement learning, and other approaches:

Vision-Language-Action Models

  • SmolVLA - Lightweight multimodal model for end-to-end learning from natural language instructions, visual perception, and robot actions.
  • OpenVLA - Large-scale pre-trained vision-language-action model supporting complex scene understanding and manipulation planning.

Imitation Learning Models

  • ACT (Action Chunking Transformer) - Action chunking model based on Transformer architecture, decomposing continuous action sequences into discrete chunks for learning.
  • Pi0 / Pi0.5 - Physical Intelligence's flagship open-source VLA models, fine-tuned via the OpenPI framework, with strong general manipulation capabilities. See: Pi0 Fine-tuning Guide
  • Pi0-Fast - Optimized variant of Pi0 that uses an autoregressive architecture to improve inference speed.

Policy Learning Models

  • Diffusion Policy - Policy learning based on the diffusion process, generating continuous robot action trajectories through a denoising process.
  • VQBET - Vector-Quantized Behavior Transformer; discretizes the continuous action space via vector quantization and models the resulting tokens with a Transformer.

Reinforcement Learning Models

  • SAC (Soft Actor-Critic) - Maximum entropy reinforcement learning algorithm, balancing exploration and exploitation in continuous action spaces.
  • TDMPC - Temporal Difference Model Predictive Control, combining advantages of model-based planning and model-free learning.

info

The above models cover the mainstream approaches and can be applied to a variety of robot tasks, for example:

| Application Scenario | Models Used | Description |
| --- | --- | --- |
| Desk Tidying | SmolVLA, Pi0 | The robot understands instructions like "Please tidy the items on the desk" and performs pick, move, and place actions |
| Item Sorting | ACT | By learning from expert sorting demonstrations, the robot identifies different items and sorts them by category |
| Complex Operations | Diffusion Policy | The robot learns complex action sequences, such as assembly or cooking, that require precise control |
| Adaptive Control | RL algorithms like SAC | The robot learns optimal control policies in dynamic environments, adapting to environmental changes |

Training Workflow

The platform provides a productized training workflow with no coding required; the entire process from data preparation to model deployment is completed in the web interface:

1. Data Preparation

Select training data

The platform supports multiple data sources, including:

  • Platform Exported Data - Robot demonstration data annotated and exported on the platform (see Export training data).
  • External Datasets - Import public datasets via URL links.
  • Local Data Upload - Upload local data in standard formats such as HDF5 and LeRobot (a quick inspection sketch follows this list).
  • HuggingFace Datasets - Pull public data directly from the HuggingFace Hub.
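If you are preparing local HDF5 data, it can help to inspect the file structure before uploading. A minimal sketch using h5py (the file name and dataset keys are hypothetical; the actual layout depends on your recording setup):

```python
import h5py

# Inspect the structure of a local HDF5 demonstration file before upload.
# "demo_episode_0.hdf5" is a hypothetical file name; keys vary by setup.
with h5py.File("demo_episode_0.hdf5", "r") as f:
    def show(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(f"{name}: shape={obj.shape}, dtype={obj.dtype}")
    f.visititems(show)
```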

2. Training Configuration

Compute Resource Selection

The platform supports flexible compute resource selection to meet different training scales:

Training Location Selection:

  • Local GPU (local-gpu) - Use GPU servers in a local data center; suitable for long-running training and private deployment.

    • Supports multi-GPU parallel training.
    • Shows GPU status in real time (VRAM usage, temperature, utilization).
    • Suitable for large datasets and long training sessions.
  • Public Cloud Resources - Rent cloud computing resources on demand, billed by actual training duration.

    • RunPod - Rapid GPU container deployment.
    • AWS EC2/SageMaker/Batch - Amazon Web Services integration.
    • Tencent Cloud/Alibaba Cloud - Support for major Chinese cloud providers.
    • Suitable for temporary training tasks or elastic resource expansion.

Automatic Platform Detection:

  • CUDA Platform - Auto-detect NVIDIA GPUs, support CUDA accelerated training.
  • MPS Platform - Support Metal Performance Shaders acceleration for Apple Silicon (M1/M2, etc.).
  • CPU Platform - Fall back to CPU training if no GPU is found (slower, for small tests).
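This fallback order can be expressed in a few lines of PyTorch. The sketch below is illustrative; the platform performs this detection automatically:

```python
import torch

# Mirror the detection order described above: CUDA -> MPS -> CPU fallback.
if torch.cuda.is_available():
    device = torch.device("cuda")        # NVIDIA GPU, CUDA acceleration
elif torch.backends.mps.is_available():
    device = torch.device("mps")         # Apple Silicon, Metal acceleration
else:
    device = torch.device("cpu")         # slower; fine for small tests
print(f"Training device: {device}")
```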

GPU Selection & Monitoring:

  • View the list of available GPUs and their real-time status before training.
  • Manually select a specific GPU or enable multi-GPU parallelism.
  • Monitor GPU utilization, VRAM usage, temperature, and more in real time (see the sketch after this list).
  • Automatically optimize VRAM allocation to avoid wasted resources.
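For reference, the same GPU status values can be queried programmatically on NVIDIA hardware. A minimal sketch assuming the pynvml package and an NVIDIA driver are installed (the platform collects these metrics for you):

```python
import pynvml

# Query utilization, VRAM usage, and temperature for each visible NVIDIA GPU.
pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)   # percent busy
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)          # bytes
        temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        print(f"GPU {i}: util={util.gpu}% "
              f"vram={mem.used / 2**30:.1f}/{mem.total / 2**30:.1f} GiB "
              f"temp={temp}C")
finally:
    pynvml.nvmlShutdown()
```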

Select training location

Model Architecture Selection

Choose the appropriate model based on specific task requirements:

  • For tasks requiring natural-language instruction understanding, choose SmolVLA or OpenVLA.
  • For imitation learning from expert demonstration data, choose ACT, Pi0, or Pi0-Fast.
  • For tasks requiring online learning, choose SAC or TDMPC.

Training Parameter Settings

The platform provides rich training parameter configuration options for different model needs:

General Training Parameters:

  • batch_size - Number of samples per training step; recommended range 1-32. Larger batches improve stability but require more VRAM.
  • steps - Total number of training steps; start from 10,000 and adjust based on validation results.
  • seed - Random seed for reproducibility; use a fixed value such as 42 or 1000.
  • num_workers - Data-loader worker processes; recommended 0.5-1x the number of CPU cores.
  • eval_freq - How often to evaluate the model; a common choice is every 10% of total steps.
  • log_freq - How often to print training logs; recommended every 10-100 steps.
  • save_freq - How often to save checkpoints; a common choice is every 30% of total steps.
  • save_checkpoint - Enable to save model checkpoints for recovery or deployment (a combined sketch follows this list).
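As a reference point, the sketch below collects these general parameters into a single hypothetical configuration; the names mirror the UI labels above and the values follow the stated recommendations for a 10,000-step run:

```python
# Hypothetical starting configuration for a 10,000-step training run.
train_config = {
    "batch_size": 8,          # 1-32; larger is more stable but needs more VRAM
    "steps": 10_000,          # total training steps
    "seed": 42,               # fixed seed for reproducibility
    "num_workers": 4,         # data-loader workers, ~0.5-1x CPU core count
    "eval_freq": 1_000,       # evaluate every 10% of total steps
    "log_freq": 100,          # log every 100 steps
    "save_freq": 3_000,       # checkpoint every ~30% of total steps
    "save_checkpoint": True,  # keep checkpoints for recovery or deployment
}
```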

Optimizer Parameters:

  • optimizer_lr - Controls parameter update magnitude; recommended range 1e-5 to 1e-4. Too large causes instability, too small slows convergence.
  • optimizer_weight_decay - Regularization to prevent overfitting; recommended 0.0-0.01.
  • optimizer_grad_clip_norm - Prevents gradient explosion; recommended 1.0.
  • scheduler_warmup_steps - Gradually ramps the LR up at the start; recommended 5-10% of total steps.
  • scheduler_decay_steps - Decays the LR in later stages; recommended 80-90% of total steps.
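To make the interaction between these knobs concrete, here is a minimal PyTorch sketch of a warmup-then-decay schedule with gradient clipping. The model is a stand-in, and the decay interpretation (full LR until scheduler_decay_steps, then linear decay) is one common choice; the platform's exact schedule may differ:

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(8, 2)  # stand-in for the actual policy network
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,            # optimizer_lr
    weight_decay=0.01,  # optimizer_weight_decay
)

total_steps, warmup_steps, decay_start = 10_000, 1_000, 8_500

def lr_lambda(step):
    if step < warmup_steps:            # scheduler_warmup_steps: linear warmup
        return step / max(1, warmup_steps)
    if step >= decay_start:            # scheduler_decay_steps: linear decay
        return max(0.0, (total_steps - step) / (total_steps - decay_start))
    return 1.0                         # full LR in between

scheduler = LambdaLR(optimizer, lr_lambda)

# Inside the training loop, clip gradients before stepping the optimizer:
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # optimizer_grad_clip_norm
# optimizer.step(); scheduler.step(); optimizer.zero_grad()
```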

Model-specific Parameters:

Different models support their own specific parameters:

  • ACT Model:

    • chunk_size - Predicted action sequence length, recommended 10-50.
    • n_obs_steps - Historical observation frames used, mostly 1.
    • n_action_steps - Actual executed action steps, usually equals chunk_size.
    • vision_backbone - Visual feature extractor; one of resnet18/34/50/101/152.
    • dim_model - Main hidden dimension for Transformer, default 512.
    • n_heads - Multi-head attention heads, default 8.
  • Diffusion Policy Model:

    • horizon - Diffusion model action prediction length, recommended 16.
    • num_inference_steps - Sampling steps, recommended 10.
  • SmolVLA/OpenVLA Model:

    • max_input_seq_len - Limit input token count, recommended 256-512.
    • max_decoding_steps - Max iterations for generating action sequence, recommended 256.
    • freeze_lm_head - Freeze the language-model head; recommended when fine-tuning.
    • freeze_vision_encoder - Freeze the vision encoder; recommended when fine-tuning.
  • RL Models (SAC, etc.):

    • latent_dim - Encoder output dimension, recommended 256.
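For instance, the ACT-specific fields above could be gathered into a configuration like the following (a hypothetical dict; field names mirror the UI labels and values are the suggested defaults):

```python
# Hypothetical ACT configuration using the model-specific parameters above.
act_config = {
    "chunk_size": 50,              # predicted action sequence length (10-50)
    "n_obs_steps": 1,              # historical observation frames, mostly 1
    "n_action_steps": 50,          # executed steps, usually equal to chunk_size
    "vision_backbone": "resnet18", # one of resnet18/34/50/101/152
    "dim_model": 512,              # Transformer hidden dimension
    "n_heads": 8,                  # multi-head attention heads
}
```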
tip

Parameter Setting Advice:

  • For your first training run, use the default parameters to verify the pipeline end to end.
  • Adjust batch_size based on GPU VRAM to avoid out-of-memory errors.
  • When fine-tuning pre-trained models, lower the LR (around 1e-5) and freeze some layers.
  • Check the logs regularly and adjust the LR based on the loss curves.

Training parameter settings


3. Training Execution & Monitoring

After training starts, the platform provides complete real-time monitoring and management:

Real-time Monitoring

Training Metric Visualization:

  • Loss Curve - Real-time display of training and validation loss to judge convergence.
  • Validation Metrics - Model performance on the validation set.
  • Learning-Rate Curve - Visualizes how the LR schedule is applied over the run.
  • Training Progress - Completed steps, total steps, estimated time remaining, and more.

Model Output Preview:

  • Periodically output prediction samples during training.
  • Visualize prediction results on validation data.
  • Makes it easy to observe learning progress and spot potential issues.

System Logs:

  • Detailed training logs with information for each training step.
  • Errors and warnings surface in real time for quick troubleshooting.
  • Logs stream in real time so you can always see the latest status.

Resource Monitoring:

  • Real-time monitoring of GPU utilization and VRAM usage.
  • CPU and memory usage tracking.
  • Network and disk IO monitoring (if applicable).

Training detail page provides real-time monitoring and management

Training Management

Process Control:

  • Pause - Temporarily pause the training task, keeping current progress.
  • Resume - Continue seamlessly from the pause point.
  • Stop - Safely stop training, saving the current checkpoint.
  • Restart - Re-launch the training task.

Checkpoint Management:

  • Auto-save - Checkpoints are saved automatically at the configured frequency.
  • Checkpoint List - View all saved checkpoints with their step counts, timestamps, and more.
  • Download - Download checkpoints to a local machine.
  • Resume from Checkpoint - Resume training from any checkpoint after an interruption (see the sketch after this list).
  • Version Rollback - Roll the model back to any historical checkpoint.
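Conceptually, the checkpoint save/resume cycle looks like the following PyTorch sketch. The platform automates all of this; the paths and stand-in model are hypothetical:

```python
import torch

def save_checkpoint(path, step, model, optimizer):
    # Persist everything needed to resume: step counter, weights, optimizer state.
    torch.save({"step": step,
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict()}, path)

def resume_checkpoint(path, model, optimizer):
    # Restore weights and optimizer state; return the step to continue from.
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"]

model = torch.nn.Linear(8, 2)  # stand-in for the policy network
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
save_checkpoint("step_1000.pt", 1000, model, optimizer)
start_step = resume_checkpoint("step_1000.pt", model, optimizer)
```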

Task Operations:

  • Param Adjustment - View and adjust some parameters during training (use with caution).
  • Task Copy - Quickly create new tasks based on successful configurations.
  • Task Delete - Delete unnecessary tasks to free up storage.
tip

Training Advice:

  • Regularly check logs to find problems early.
  • Adjust LR or stop training based on loss curves.
  • Regularly save checkpoints to avoid data loss from interruptions.
  • Use task copy to quickly try different parameter combinations.

4. Model Evaluation & Export

After training completes, the platform provides full support for evaluation, export, and deployment:

Model Evaluation

Performance Metrics:

  • Automatically calculates performance metrics on the validation set.
  • Supports a variety of metrics: Accuracy, Success Rate, Action Error, etc.
  • Provides performance reports and comparative analysis (an action-error sketch follows this list).
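As one concrete example, a mean action error can be computed as the MSE between predicted and ground-truth action chunks on a validation batch. The tensors below are random stand-ins purely for illustration:

```python
import torch

# (batch, chunk, action_dim) stand-ins for predicted vs. ground-truth actions.
pred_actions = torch.randn(32, 50, 7)
true_actions = torch.randn(32, 50, 7)
action_mse = torch.mean((pred_actions - true_actions) ** 2)
print(f"validation action MSE: {action_mse.item():.4f}")
```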

Model Comparison:

  • Compare performance across different training tasks.
  • Visualize comparative charts of multiple model metrics.
  • Help select the best performing model version.

Checkpoint Management

All checkpoints saved during and after training are listed on the detail page:

Model checkpoint list

Checkpoint Info:

  • Name - Auto-generated or custom name (e.g., "step_1000", "last").
  • Steps - Training steps for this checkpoint.
  • Time - Save timestamp.
  • File Size - Size of the checkpoint file.
  • Metrics - Performance metrics on the validation set.

Checkpoint Operations:

  • View Details - See detailed info and evaluation results.
  • Download - Download the checkpoint to a local machine for offline deployment.
  • Mark as Best - Mark the best performing checkpoint as the Best model.
  • Deploy Inference - Deploy a checkpoint as an inference service with one click (see the next chapter).
info

Checkpoint Explanation:

  • last - The most recently saved checkpoint, usually the latest state.
  • best - The best performing checkpoint on the validation set, usually used for production.
  • step_xxx - Checkpoints saved at specific steps, useful for analysis.

Model Export

After training, models can be exported for:

  • Offline deployment on the robot itself.
  • Integration with other systems.
  • Version management and archiving (an export sketch follows this list).
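A minimal sketch of what an exported bundle might contain: the trained weights plus the configuration needed to rebuild the policy offline. The file names, stand-in model, and config fields are hypothetical:

```python
import json
import torch

model = torch.nn.Linear(8, 2)  # stand-in for a trained policy
torch.save(model.state_dict(), "policy_weights.pt")  # weights
with open("policy_config.json", "w") as f:           # rebuild info
    json.dump({"policy": "act",
               "chunk_size": 50,
               "vision_backbone": "resnet18"}, f, indent=2)
```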

At this point, you can use the EmbodyFlow Platform to conveniently train your own models. Completed model checkpoints can be deployed for inference directly in the next chapter, closing the loop from training to application.