Diffusion Policy Model Fine-tuning
Overview
Diffusion Policy is a visuomotor policy learning method that applies the generative capabilities of diffusion models to robot control. Rather than regressing a single action, it learns to denoise action sequences conditioned on observations, which lets it generate diverse, high-quality trajectories and perform well on complex robot manipulation tasks.
Core Features
- Diffusion Generation: Uses a diffusion model to generate continuous action sequences
 - Multimodal Actions: Handles tasks that admit multiple valid solutions
 - High-Quality Output: Generates smooth, natural robot actions
 - Robustness: Tolerant of noise and perturbations in the observations
 - Expressiveness: Can model complex action distributions
 
Prerequisites
System Requirements
- Operating System: Linux (Ubuntu 20.04+ recommended) or macOS
 - Python Version: 3.10+ (matching the conda environment created below)
 - GPU: NVIDIA GPU (RTX 3080 or higher recommended), at least 10GB VRAM
 - Memory: At least 32GB RAM
 - Storage: At least 50GB available space
 
Environment Setup
1. Install LeRobot
# Clone LeRobot repository
git clone https://github.com/huggingface/lerobot.git
cd lerobot
# Create virtual environment
conda create -n lerobot python=3.10
conda activate lerobot
# Install dependencies
pip install -e .
2. Install Diffusion Policy-Specific Dependencies
# Install diffusion model related dependencies
pip install diffusers
pip install accelerate
pip install transformers
pip install einops
pip install wandb
# Install numerical computing libraries
pip install scipy
pip install scikit-learn
# Login to Weights & Biases (optional)
wandb login
Diffusion Policy Architecture
Core Components
- Vision Encoder: Extracts image features
 - State Encoder: Processes robot state information
 - Conditional Encoder: Fuses vision and state information
 - Diffusion Network: Learns the diffusion process of action distributions
 - Noise Scheduler: Controls noise levels in the diffusion process
 
Diffusion Process
- Forward Process: Gradually adds noise to action sequences
 - Reverse Process: Gradually recovers action sequences from noise
 - Conditional Generation: Generates actions based on observation conditions
 - Sampling Strategy: Uses DDPM or DDIM sampling (a minimal sketch of the noising step and training objective follows below)
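Below is a minimal, self-contained sketch of the forward (noising) step and the epsilon-prediction training objective, using the diffusers DDPMScheduler. The tensor shapes and the stand-in denoising network are illustrative assumptions, not LeRobot internals; in the real policy the denoiser is a conditional 1D U-Net conditioned on encoded observations.
# diffusion_process_sketch.py
import torch
import torch.nn.functional as F
from diffusers import DDPMScheduler

# Illustrative shapes (assumptions): batch of 8 action chunks, horizon 16, 7-dim actions
actions = torch.randn(8, 16, 7)

scheduler = DDPMScheduler(num_train_timesteps=100, beta_schedule="squaredcos_cap_v2")

# Forward process: corrupt the clean action chunks with noise at random timesteps
noise = torch.randn_like(actions)
timesteps = torch.randint(0, scheduler.config.num_train_timesteps, (actions.shape[0],))
noisy_actions = scheduler.add_noise(actions, noise, timesteps)

# Training objective ("epsilon" prediction): a network predicts the added noise.
# A linear layer stands in here for the observation-conditioned U-Net.
denoiser = torch.nn.Linear(7, 7)
noise_pred = denoiser(noisy_actions)
loss = F.mse_loss(noise_pred, noise)
print(f"toy denoising loss: {loss.item():.4f}")

# At inference time the reverse process starts from pure noise and repeatedly calls
# scheduler.step(...) with the predicted noise, conditioned on the current observation.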
 
Data Preparation
LeRobot Format Data
Diffusion Policy training expects datasets in the LeRobot format. The layout below shows the v2.x parquet-based layout (details vary with the dataset format version); each frame stores the camera images, observation.state, and action used during training. A short loading example follows the tree.
your_dataset/
├── data/
│   ├── chunk-000/
│   │   ├── episode_000000.parquet
│   │   ├── episode_000001.parquet
│   │   └── ...
│   └── chunk-001/
│       └── ...
├── meta/
│   ├── info.json
│   ├── stats.json
│   ├── episodes.jsonl
│   └── tasks.jsonl
└── videos/
    └── chunk-000/
        └── observation.images.cam_high/
            ├── episode_000000.mp4
            └── ...
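As a quick sanity check, the dataset can be loaded in Python and asked for the temporal windows the policy trains on. The repo id, fps, and window sizes below are placeholders; delta_timestamps is the standard LeRobotDataset mechanism for retrieving stacked observations and action chunks (the exact offsets used in training are derived from the policy configuration).
# dataset_sanity_check.py
from lerobot.datasets.lerobot_dataset import LeRobotDataset

fps = 30  # assumption: the recording frequency of your dataset
delta_timestamps = {
    # 2 observation steps and a 16-step action horizon, matching the defaults used below
    "observation.images.cam_high": [-1 / fps, 0.0],
    "observation.state": [-1 / fps, 0.0],
    "action": [i / fps for i in range(16)],
}

dataset = LeRobotDataset("your-name/your_dataset", delta_timestamps=delta_timestamps)
sample = dataset[0]
print(sample["observation.state"].shape)  # (2, state_dim)
print(sample["action"].shape)             # (16, action_dim)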
Data Quality Requirements
- Minimum 100 episodes for basic training
 - 500+ episodes recommended for optimal results
 - Action sequences should be smooth and continuous
 - Include diverse task scenarios
 - High-quality visual observation data
 
Fine-tuning Training
Basic Training Command
# Set environment variables
export HF_USER="your-huggingface-username"
export CUDA_VISIBLE_DEVICES=0
# Start Diffusion Policy training
lerobot-train \
  --policy.type diffusion \
  --policy.pretrained_path lerobot/diffusion_policy \
  --dataset.repo_id ${HF_USER}/your_dataset \
  --batch_size 64 \
  --steps 100000 \
  --output_dir outputs/train/diffusion_policy_finetuned \
  --job_name diffusion_policy_finetuning \
  --policy.device cuda \
  --policy.horizon 16 \
  --policy.n_action_steps 8 \
  --policy.n_obs_steps 2 \
  --policy.num_inference_steps 100 \
  --policy.optimizer_lr 1e-4 \
  --policy.optimizer_weight_decay 1e-6 \
  --policy.push_to_hub false \
  --save_checkpoint true \
  --save_freq 10000 \
  --wandb.enable true
Advanced Training Configuration
Multi-Step Prediction Configuration
# Configuration for long sequence prediction
lerobot-train \
  --policy.type diffusion \
  --policy.pretrained_path lerobot/diffusion_policy \
  --dataset.repo_id ${HF_USER}/your_dataset \
  --batch_size 32 \
  --steps 150000 \
  --output_dir outputs/train/diffusion_policy_long_horizon \
  --job_name diffusion_policy_long_horizon \
  --policy.device cuda \
  --policy.horizon 32 \
  --policy.n_action_steps 16 \
  --policy.n_obs_steps 4 \
  --policy.num_inference_steps 100 \
  --policy.beta_schedule squaredcos_cap_v2 \
  --policy.clip_sample true \
  --policy.prediction_type epsilon \
  --policy.optimizer_lr 1e-4 \
  --policy.scheduler_name cosine \
  --policy.scheduler_warmup_steps 5000 \
  --policy.push_to_hub false \
  --save_checkpoint true \
  --wandb.enable true
Memory Optimization Configuration
# For GPUs with smaller VRAM
lerobot-train \
  --policy.type diffusion \
  --policy.pretrained_path lerobot/diffusion_policy \
  --dataset.repo_id ${HF_USER}/your_dataset \
  --batch_size 16 \
  --steps 200000 \
  --output_dir outputs/train/diffusion_policy_memory_opt \
  --job_name diffusion_policy_memory_optimized \
  --policy.device cuda \
  --policy.horizon 16 \
  --policy.n_action_steps 8 \
  --policy.num_inference_steps 50 \
  --policy.optimizer_lr 5e-5 \
  --policy.use_amp true \
  --num_workers 2 \
  --policy.push_to_hub false \
  --save_checkpoint true \
  --wandb.enable true
Parameter Details
Core Parameters
| Parameter | Meaning | Recommended Value | Description | 
|---|---|---|---|
--policy.type | Policy type | diffusion | Diffusion Policy model type | 
--policy.pretrained_path | Pretrained model path | lerobot/diffusion_policy | LeRobot official model (optional) | 
--dataset.repo_id | Dataset repository ID | ${HF_USER}/dataset | Your HuggingFace dataset | 
--batch_size | Batch size | 64 | Adjust based on VRAM, RTX 3080 recommended 32-64 | 
--steps | Training steps | 100000 | Diffusion models typically require more training steps | 
--output_dir | Output directory | outputs/train/diffusion_policy_finetuned | Model save path | 
--job_name | Job name | diffusion_policy_finetuning | For logging and experiment tracking (optional) | 
Diffusion Policy-Specific Parameters
| Parameter | Meaning | Recommended Value | Description | 
|---|---|---|---|
--policy.horizon | Prediction horizon | 16 | Length of the predicted action sequence, in control steps | 
--policy.n_action_steps | Executed action steps | 8 | Number of predicted actions executed before re-planning | 
--policy.n_obs_steps | Observation steps | 2 | Number of past observations the policy conditions on | 
--policy.num_inference_steps | Inference steps | 100 | Number of reverse-diffusion sampling steps (used only at inference time, not during training) | 
--policy.beta_schedule | Noise schedule | squaredcos_cap_v2 | Schedule for adding noise in the forward process | 
--policy.clip_sample | Sample clipping | true | Whether to clip generated samples | 
--policy.clip_sample_range | Clipping range | 1.0 | Range used when clip_sample is enabled | 
--policy.prediction_type | Prediction type | epsilon | Whether the network predicts the added noise (epsilon) or the denoised sample | 
--policy.num_train_timesteps | Training timesteps | 100 | Number of forward-diffusion timesteps used during training | 
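To make the relationship between horizon, n_action_steps, and n_obs_steps concrete, the toy sketch below walks through one receding-horizon control cycle with the default values. The index convention is simplified for illustration; the exact alignment of the predicted window relative to the current step is handled inside the LeRobot implementation.
# receding_horizon_sketch.py
n_obs_steps, horizon, n_action_steps = 2, 16, 8

current_step = 100  # an arbitrary control step (assumption)
observed_steps = list(range(current_step - n_obs_steps + 1, current_step + 1))  # 99, 100
predicted_steps = list(range(current_step, current_step + horizon))             # 100..115
executed_steps = predicted_steps[:n_action_steps]                               # 100..107

print(f"condition on observations from steps {observed_steps}")
print(f"predict a {horizon}-step action chunk for steps {predicted_steps[0]}..{predicted_steps[-1]}")
print(f"execute the first {n_action_steps} actions ({executed_steps[0]}..{executed_steps[-1]}), then re-plan")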
Network Architecture Parameters
| Parameter | Meaning | Recommended Value | Description | 
|---|---|---|---|
--policy.vision_backbone | Vision backbone | resnet18 | Image feature extraction network | 
--policy.crop_shape | Image crop size | 84 84 | Crop size for input images | 
--policy.crop_is_random | Random cropping | true | Whether to randomly crop during training | 
--policy.use_group_norm | Use group normalization | true | Replace batch normalization | 
--policy.spatial_softmax_num_keypoints | Spatial softmax keypoints | 32 | Number of keypoints in spatial softmax layer | 
--policy.down_dims | Downsampling dimensions | 512 1024 2048 | Dimensions of U-Net downsampling path | 
--policy.kernel_size | Convolution kernel size | 5 | Kernel size for 1D convolution | 
--policy.n_groups | Group normalization groups | 8 | Number of groups in GroupNorm | 
--policy.diffusion_step_embed_dim | Step embedding dimension | 128 | Embedding dimension for diffusion steps | 
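The same architecture settings can be set in Python through the policy's configuration class; the field names mirror the CLI flags above. The import path below matches recent LeRobot releases and may differ in older versions.
# diffusion_config_sketch.py
from lerobot.policies.diffusion.configuration_diffusion import DiffusionConfig

config = DiffusionConfig(
    # observation / action windows
    n_obs_steps=2,
    horizon=16,
    n_action_steps=8,
    # vision encoder
    vision_backbone="resnet18",
    crop_shape=(84, 84),
    crop_is_random=True,
    use_group_norm=True,
    spatial_softmax_num_keypoints=32,
    # 1D U-Net denoiser
    down_dims=(512, 1024, 2048),
    kernel_size=5,
    n_groups=8,
    diffusion_step_embed_dim=128,
)
print(config)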
Training Parameters
| Parameter | Meaning | Recommended Value | Description | 
|---|---|---|---|
--policy.optimizer_lr | Learning rate | 1e-4 | Recommended learning rate for diffusion models | 
--policy.optimizer_weight_decay | Weight decay | 1e-6 | Regularization parameter | 
--policy.optimizer_betas | Adam optimizer beta | 0.95 0.999 | Beta parameters for Adam optimizer | 
--policy.optimizer_eps | Adam epsilon | 1e-8 | Numerical stability parameter | 
--policy.scheduler_name | Learning rate scheduler | cosine | Cosine annealing schedule | 
--policy.scheduler_warmup_steps | Warmup steps | 500 | Learning rate warmup | 
--policy.use_amp | Mixed precision | true | Saves VRAM | 
--num_workers | Data loading threads | 4 | Adjust based on CPU core count | 
--policy.push_to_hub | Push to Hub | false | Whether to upload model to HuggingFace (requires repo_id) | 
--save_checkpoint | Save checkpoints | true | Whether to save training checkpoints | 
--save_freq | Save frequency | 10000 | Checkpoint save interval | 
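lerobot-train constructs the optimizer and learning-rate scheduler from these flags automatically. Purely for illustration, the sketch below builds equivalent objects by hand with torch and the diffusers scheduler helper, using a stand-in module in place of the policy network.
# optimizer_sketch.py
import torch
from diffusers.optimization import get_scheduler

model = torch.nn.Linear(10, 10)  # stand-in for the diffusion policy network

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-4,              # --policy.optimizer_lr
    betas=(0.95, 0.999),  # --policy.optimizer_betas
    eps=1e-8,             # --policy.optimizer_eps
    weight_decay=1e-6,    # --policy.optimizer_weight_decay
)

lr_scheduler = get_scheduler(
    "cosine",                    # --policy.scheduler_name
    optimizer=optimizer,
    num_warmup_steps=500,        # --policy.scheduler_warmup_steps
    num_training_steps=100_000,  # --steps
)

# In the real loop: compute the diffusion loss, loss.backward(), then step both
for step in range(3):
    optimizer.step()
    lr_scheduler.step()
    print(step, lr_scheduler.get_last_lr())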
Training Monitoring and Debugging
Weights & Biases Integration
# Detailed W&B configuration
lerobot-train \
  --policy.type diffusion \
  --dataset.repo_id your-name/your-dataset \
  --batch_size 64 \
  --steps 100000 \
  --policy.push_to_hub false \
  --wandb.enable true \
  --wandb.project diffusion_policy_experiments \
  --wandb.notes "Diffusion Policy training with long horizon"
  # ... add the remaining training parameters shown in the sections above
Key Metrics Monitoring
Metrics to monitor during training:
- Diffusion Loss: The denoising (noise-prediction) training loss
 - MSE Loss: Mean squared error between predicted and target noise
 - Learning Rate: The current value produced by the learning-rate scheduler
 - Gradient Norm: Global gradient norm, useful for spotting training instability
 - Inference Time: Wall-clock time of a full sampling pass
 - Sample Quality: Qualitative checks of generated action sequences (a minimal manual-logging sketch follows this list)
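These metrics are logged automatically by lerobot-train when W&B is enabled. If you run a custom training loop instead, the same quantities can be computed and logged by hand; the sketch below (with a placeholder model and loss) shows the gradient-norm and loss logging pattern.
# manual_logging_sketch.py
import torch
import wandb

model = torch.nn.Linear(10, 10)  # placeholder for the policy network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

wandb.init(project="diffusion_policy_experiments")

for step in range(100):
    loss = model(torch.randn(8, 10)).pow(2).mean()  # placeholder for the diffusion loss
    optimizer.zero_grad()
    loss.backward()
    # Global gradient norm (clipping is a common companion setting)
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
    optimizer.step()
    wandb.log(
        {
            "diffusion_loss": loss.item(),
            "grad_norm": grad_norm.item(),
            "learning_rate": optimizer.param_groups[0]["lr"],
        },
        step=step,
    )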
 
Training Visualization
# visualization.py
import torch
import matplotlib.pyplot as plt
import numpy as np
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy
def visualize_diffusion_process(model_path, observation, action_dim=7):
    # Load model (from_pretrained, then move it to the GPU)
    policy = DiffusionPolicy.from_pretrained(model_path)
    policy.to("cuda")
    policy.eval()
    
    # NOTE: this example peeks at internal attributes of the LeRobot implementation
    # (policy.diffusion.unet, policy.diffusion.noise_scheduler, and the conditioning
    # helper). Their names, and the exact observation format they expect (stacked
    # n_obs_steps, normalized inputs, possibly a combined "observation.images" key),
    # depend on the installed LeRobot version -- treat them as assumptions to adapt.
    diffusion = policy.diffusion
    scheduler = diffusion.noise_scheduler
    horizon = policy.config.horizon
    num_inference_steps = policy.config.num_inference_steps or 100
    
    with torch.no_grad():
        # Encode the observation into the global conditioning vector for the U-Net
        global_cond = diffusion._prepare_global_conditioning(observation)
        
        # Reverse process: start from pure Gaussian noise and iteratively denoise
        sample = torch.randn(1, horizon, action_dim, device="cuda")
        scheduler.set_timesteps(num_inference_steps)
        
        actions_sequence = []
        for i, t in enumerate(scheduler.timesteps):
            # Predict the noise component at timestep t, conditioned on the observation
            timestep = torch.full((sample.shape[0],), int(t), device="cuda", dtype=torch.long)
            noise_pred = diffusion.unet(sample, timestep, global_cond=global_cond)
            
            # One reverse-diffusion step
            sample = scheduler.step(noise_pred, t, sample).prev_sample
            
            # Save intermediate results every 10 steps
            if i % 10 == 0:
                actions_sequence.append(sample.cpu().numpy())
        
        final_actions = sample.cpu().numpy()
    
    # Visualize diffusion process
    fig, axes = plt.subplots(2, 3, figsize=(15, 10))
    
    for i, actions in enumerate(actions_sequence[:6]):
        ax = axes[i//3, i%3]
        ax.plot(actions[0, :, 0], label='Action Dim 0')
        ax.plot(actions[0, :, 1], label='Action Dim 1')
        ax.set_title(f'Diffusion Step {i*10}')
        ax.legend()
    
    plt.tight_layout()
    plt.savefig('diffusion_process.png')
    plt.show()
    
    return final_actions
if __name__ == "__main__":
    model_path = "outputs/train/diffusion_policy_finetuned/checkpoints/last"
    
    # Simulated observation, stacked to (batch, n_obs_steps, ...) as the internal
    # conditioning encoder expects; real inputs must also be normalized as in training
    observation = {
        "observation.images.cam_high": torch.randn(1, 2, 3, 224, 224, device="cuda"),
        "observation.state": torch.randn(1, 2, 7, device="cuda")
    }
    
    actions = visualize_diffusion_process(model_path, observation)
    print(f"Generated actions shape: {actions.shape}")
Model Evaluation
Offline Evaluation
# offline_evaluation.py
import torch
import numpy as np
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy
from lerobot.datasets.lerobot_dataset import LeRobotDataset
def evaluate_diffusion_policy(model_path, dataset_repo_id):
    # Load model
    policy = DiffusionPolicy.from_pretrained(model_path)
    policy.to("cuda")
    policy.eval()
    
    # Load the evaluation dataset. LeRobotDataset is addressed by repo id; hold out
    # evaluation episodes yourself (e.g. via the `episodes` argument).
    dataset = LeRobotDataset(dataset_repo_id)
    
    total_mse_loss = 0.0
    total_mae_loss = 0.0
    num_samples = 0
    
    with torch.no_grad():
        for sample in dataset:
            # Keep tensors only, add a batch dimension, and move to the GPU
            batch = {
                k: v.unsqueeze(0).to("cuda")
                for k, v in sample.items()
                if isinstance(v, torch.Tensor)
            }
            
            # Reset the policy's internal queues so every frame is predicted from
            # scratch, then compare the predicted next action against the recorded one
            policy.reset()
            predicted_action = policy.select_action(batch)
            target_action = batch["action"]
            
            mse_loss = torch.mean((predicted_action - target_action) ** 2)
            mae_loss = torch.mean(torch.abs(predicted_action - target_action))
            
            total_mse_loss += mse_loss.item()
            total_mae_loss += mae_loss.item()
            num_samples += 1
    
    avg_mse_loss = total_mse_loss / num_samples
    avg_mae_loss = total_mae_loss / num_samples
    
    print(f"Average MSE Loss: {avg_mse_loss:.4f}")
    print(f"Average MAE Loss: {avg_mae_loss:.4f}")
    
    return avg_mse_loss, avg_mae_loss
def evaluate_action_diversity(model_path, observation, num_samples=10):
    # Evaluate action diversity: diffusion sampling starts from fresh Gaussian noise,
    # so repeated predictions for the same observation can differ
    policy = DiffusionPolicy.from_pretrained(model_path)
    policy.to("cuda")
    policy.eval()
    
    actions_list = []
    
    with torch.no_grad():
        for _ in range(num_samples):
            policy.reset()  # clear internal queues so each call re-samples a new chunk
            action = policy.select_action(observation)
            actions_list.append(action.cpu().numpy())
    
    actions_array = np.array(actions_list)  # [num_samples, 1, action_dim]
    
    # Calculate action diversity metric: spread of the sampled actions across runs
    action_std = np.std(actions_array, axis=0)  # [1, action_dim]
    avg_std = np.mean(action_std)
    
    print(f"Average action standard deviation: {avg_std:.4f}")
    
    return avg_std, actions_array
if __name__ == "__main__":
    model_path = "outputs/train/diffusion_policy_finetuned/checkpoints/last"
    dataset_repo_id = "your-name/your_dataset"  # held-out evaluation episodes of your data
    
    # Offline evaluation
    evaluate_diffusion_policy(model_path, dataset_repo_id)
    
    # Diversity evaluation
    observation = {
        "observation.images.cam_high": torch.randn(1, 3, 224, 224, device="cuda"),
        "observation.state": torch.randn(1, 7, device="cuda")
    }
    
    evaluate_action_diversity(model_path, observation)
Online Evaluation (Robot Environment)
# robot_evaluation.py
import torch
import numpy as np
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy
class DiffusionPolicyController:
    def __init__(self, model_path, num_inference_steps=50, device="cuda"):
        self.device = device
        self.policy = DiffusionPolicy.from_pretrained(model_path)
        self.policy.to(device)
        self.policy.eval()
        # Fewer reverse-diffusion steps -> faster control. NOTE: this overrides an
        # internal attribute of the LeRobot implementation; its exact location may
        # differ between versions (see lerobot/policies/diffusion/modeling_diffusion.py).
        self.policy.diffusion.num_inference_steps = num_inference_steps
    
    def reset(self):
        # Call at the start of each episode: clears the policy's internal observation
        # history (n_obs_steps) and its receding-horizon action queue (n_action_steps)
        self.policy.reset()
    
    @torch.no_grad()
    def get_action(self, observations):
        # DiffusionPolicy manages observation stacking and receding-horizon execution
        # itself: it returns queued actions and re-plans a new `horizon`-length chunk
        # whenever its queue of n_action_steps actions is exhausted, so the controller
        # only needs to format the current observation
        batch = self.prepare_observation_batch(observations)
        action = self.policy.select_action(batch)  # shape: (1, action_dim)
        return action.squeeze(0).cpu().numpy()
    
    def prepare_observation_batch(self, observations):
        batch = {}
        
        # Process image observations: HWC uint8 -> (1, C, H, W) float in [0, 1]
        if "observation.images.cam_high" in observations:
            image_tensor = self.preprocess_image(observations["observation.images.cam_high"])
            batch["observation.images.cam_high"] = image_tensor.unsqueeze(0).to(self.device)
        
        # Process state observations -> (1, state_dim)
        if "observation.state" in observations:
            state = torch.tensor(observations["observation.state"], dtype=torch.float32)
            batch["observation.state"] = state.unsqueeze(0).to(self.device)
        
        return batch
    
    def preprocess_image(self, image):
        # Channel-first float tensor in [0, 1]; the policy's own input normalization
        # runs inside select_action
        return torch.from_numpy(np.asarray(image)).permute(2, 0, 1).float() / 255.0
# Usage example
if __name__ == "__main__":
    controller = DiffusionPolicyController(
        model_path="outputs/train/diffusion_policy_finetuned/checkpoints/last",
        num_inference_steps=50
    )
    
    # Simulate robot control loop
    controller.reset()  # clear the policy's internal queues at the start of the episode
    for step in range(100):
        # Get current observation
        observations = {
            "observation.images.cam_high": np.random.randint(0, 255, (224, 224, 3)),
            "observation.state": np.random.randn(7)
        }
        
        # Get action
        action = controller.get_action(observations)
        
        # Execute action
        print(f"Step {step}: Action = {action}")
        
        # This should send the action to the actual robot
        # robot.execute_action(action)
Deployment and Optimization
Inference Acceleration
# fast_inference.py
import torch
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy
from diffusers import DDIMScheduler
class FastDiffusionInference:
    def __init__(self, model_path, num_inference_steps=10, device="cuda"):
        self.device = device
        self.policy = DiffusionPolicy.from_pretrained(model_path)
        self.policy.to(device)
        self.policy.eval()
        
        # Swap the DDPM noise scheduler for DDIM so far fewer reverse steps are needed.
        # NOTE: the scheduler is assumed to live at policy.diffusion.noise_scheduler,
        # an internal attribute whose name may differ between LeRobot versions.
        self.policy.diffusion.noise_scheduler = DDIMScheduler.from_config(
            self.policy.diffusion.noise_scheduler.config
        )
        self.policy.diffusion.num_inference_steps = num_inference_steps
        
        # Warmup model
        self.warmup()
    
    def warmup(self):
        # Warmup the model (and CUDA kernels) with dummy data
        dummy_batch = {
            "observation.images.cam_high": torch.randn(1, 3, 224, 224, device=self.device),
            "observation.state": torch.randn(1, 7, device=self.device)
        }
        
        for _ in range(5):
            self.policy.reset()  # force a fresh sampling pass instead of popping the queue
            _ = self.predict(dummy_batch)
    
    @torch.no_grad()
    def predict(self, observations):
        # select_action runs the (shortened) reverse-diffusion sampling whenever its
        # internal action queue is empty, and otherwise returns a queued action
        return self.policy.select_action(observations).cpu().numpy()
if __name__ == "__main__":
    fast_inference = FastDiffusionInference(
        "outputs/train/diffusion_policy_finetuned/checkpoints/last",
        num_inference_steps=10
    )
    
    # Test inference speed
    import time
    
    observations = {
        "observation.images.cam_high": torch.randn(1, 3, 224, 224, device="cuda"),
        "observation.state": torch.randn(1, 7, device="cuda")
    }
    
    start_time = time.time()
    for _ in range(100):
        fast_inference.policy.reset()  # time a full sampling pass rather than a queue pop
        action = fast_inference.predict(observations)
    end_time = time.time()
    
    avg_inference_time = (end_time - start_time) / 100
    print(f"Average inference time: {avg_inference_time:.4f} seconds")
    print(f"Inference frequency: {1/avg_inference_time:.2f} Hz")
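DDIM is not the only fast sampler: the diffusers library also provides DPM-Solver++ style schedulers, which often preserve quality with very few steps. The sketch below swaps one in under the same assumption as the class above, namely that the noise scheduler lives on the internal policy.diffusion.noise_scheduler attribute (the name may vary between LeRobot versions).
# dpm_solver_sketch.py
import torch
from diffusers import DPMSolverMultistepScheduler
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy

policy = DiffusionPolicy.from_pretrained("outputs/train/diffusion_policy_finetuned/checkpoints/last")
policy.to("cuda")
policy.eval()

# Reuse the trained scheduler's configuration so the noise schedule stays consistent
policy.diffusion.noise_scheduler = DPMSolverMultistepScheduler.from_config(
    policy.diffusion.noise_scheduler.config
)
policy.diffusion.num_inference_steps = 10  # DPM-Solver usually needs very few steps

observation = {
    "observation.images.cam_high": torch.randn(1, 3, 224, 224, device="cuda"),
    "observation.state": torch.randn(1, 7, device="cuda"),
}
with torch.no_grad():
    action = policy.select_action(observation)
print(action.shape)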
Best Practices
Data Collection Recommendations
- Smooth Actions: Ensure action sequences in demonstration data are smooth and continuous
 - Diverse Scenarios: Collect data with different starting states and goals
 - High-Quality Annotations: Ensure accuracy of action annotations
 - Sufficient Data Volume: Diffusion models typically require more data
 
Training Optimization Recommendations
- Noise Scheduling: Choose appropriate noise scheduling strategy
 - Inference Steps: Balance quality and speed, choose appropriate inference steps
 - Learning Rate Scheduling: Use cosine annealing or step decay
 - Regularization: Appropriately use weight decay
 
Deployment Optimization Recommendations
- Fast Sampling: Use DDIM or other fast sampling methods
 - Model Compression: Use knowledge distillation or quantization techniques
 - Parallel Inference: Utilize GPU parallel capabilities
 - Cache Optimization: Cache intermediate computation results
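Two of these optimizations are cheap to try on a loaded policy, as sketched below: mixed-precision sampling via autocast, and compiling the denoising network with torch.compile (PyTorch 2.x). The policy.diffusion.unet attribute is the same internal-layout assumption used in the deployment examples above; verify on your own task that reduced precision does not degrade action quality.
# cheap_deployment_wins.py
import torch
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy

policy = DiffusionPolicy.from_pretrained("outputs/train/diffusion_policy_finetuned/checkpoints/last")
policy.to("cuda")
policy.eval()

observation = {
    "observation.images.cam_high": torch.randn(1, 3, 224, 224, device="cuda"),
    "observation.state": torch.randn(1, 7, device="cuda"),
}

# Option 1: mixed-precision sampling
with torch.no_grad(), torch.autocast("cuda", dtype=torch.float16):
    action = policy.select_action(observation)

# Option 2: compile the denoising network once; later calls reuse the optimized graph
policy.diffusion.unet = torch.compile(policy.diffusion.unet)
with torch.no_grad():
    policy.reset()
    action = policy.select_action(observation)
print(action.shape)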
 
Frequently Asked Questions (FAQ)
Q: What advantages does Diffusion Policy have compared to other policy learning methods?
A: Main advantages of Diffusion Policy include:
- Multimodal Generation: Handles tasks that admit multiple valid solutions
 - High-Quality Output: Generates smooth, natural action sequences
 - Robustness: Tolerant of noise and perturbations in the observations
 - Expressiveness: Can model complex action distributions
 
Q: How to choose the appropriate number of inference steps?
A: The choice of inference steps needs to balance quality and speed:
- High quality: up to the full number of training timesteps (e.g. 100 DDPM steps), suitable for offline evaluation
 - Real-time applications: 10-50 steps, suitable for online control
 - Fast prototyping: 5-10 steps, suitable for quick testing
 
Q: How long does training take?
A: Training time depends on multiple factors:
- Dataset size: 500 episodes take approximately 12-24 hours (RTX 3080)
 - Model complexity: Larger models require more time
 - Inference steps: More steps increase training time
 - Convergence requirement: Typically requires 100000-200000 steps
 
Q: How to improve the quality of generated actions?
A: Methods to improve action quality:
- Increase inference steps: More steps typically produce better results
 - Optimize noise scheduling: Choose appropriate noise addition strategy
 - Data quality: Ensure high quality of training data
 - Model architecture: Use larger or deeper networks
 - Regularization techniques: Appropriate regularization prevents overfitting
 
Q: How to handle real-time requirements?
A: Methods to meet real-time requirements:
- Fast sampling: Use DDIM or DPM-Solver
 - Reduce inference steps: Find balance between quality and speed
 - Model distillation: Train smaller student models
 - Parallel inference: Utilize multi-GPU or batching
 - Pre-computation: Pre-compute partial results
 
Related Resources
- Diffusion Policy Original Paper
 - LeRobot Diffusion Policy Implementation
 - Diffusers Library Documentation
 - Diffusion Models Tutorial
 - Robot Learning Course
 
Changelog
- 2024-01: Initial version release
 - 2024-02: Added fast sampling support
 - 2024-03: Optimized memory usage and training efficiency
 - 2024-04: Added diversity evaluation and deployment optimization