Diffusion Policy Model Fine-tuning

Overview

Diffusion Policy is a visuomotor policy learning method based on diffusion models, applying the generative capabilities of diffusion models to the field of robot control. This method generates diverse and high-quality robot action sequences by learning the diffusion process of action distributions, demonstrating excellent performance in complex robot manipulation tasks.

Core Features

  • Diffusion Generation: Uses diffusion models to generate continuous action sequences
  • Multimodal Actions: Can handle tasks with multiple solutions
  • High-Quality Output: Generates smooth and natural robot actions
  • Strong Robustness: Good robustness to noise and perturbations
  • Strong Expressiveness: Can learn complex action distributions

Prerequisites

System Requirements

  • Operating System: Linux (Ubuntu 20.04+ recommended) or macOS
  • Python Version: 3.8+
  • GPU: NVIDIA GPU (RTX 3080 or higher recommended), at least 10GB VRAM
  • Memory: At least 32GB RAM
  • Storage: At least 50GB available space

Environment Setup

1. Install LeRobot

# Clone LeRobot repository
git clone https://github.com/huggingface/lerobot.git
cd lerobot

# Create virtual environment
conda create -n lerobot python=3.10
conda activate lerobot

# Install dependencies
pip install -e .

2. Install Diffusion Policy-Specific Dependencies

# Install diffusion model related dependencies
pip install diffusers
pip install accelerate
pip install transformers
pip install einops
pip install wandb

# Install numerical computing libraries
pip install scipy
pip install scikit-learn

# Login to Weights & Biases (optional)
wandb login
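
Before moving on, it can be worth confirming that the core packages import correctly and that the GPU is visible. The short script below is only an illustrative check (not part of LeRobot); the package list mirrors the installs above.

# verify_install.py (illustrative check, not part of LeRobot)
from importlib.metadata import version

import torch

# Report installed package versions
for pkg in ("lerobot", "diffusers", "accelerate", "transformers", "einops"):
    print(f"{pkg}: {version(pkg)}")

# Confirm that PyTorch can see the GPU
print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")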

Diffusion Policy Architecture

Core Components

  1. Vision Encoder: Extracts image features
  2. State Encoder: Processes robot state information
  3. Conditional Encoder: Fuses vision and state information
  4. Diffusion Network: Learns the diffusion process of action distributions
  5. Noise Scheduler: Controls noise levels in the diffusion process

Diffusion Process

  1. Forward Process: Gradually adds noise to action sequences
  2. Reverse Process: Gradually recovers action sequences from noise
  3. Conditional Generation: Generates actions based on observation conditions
  4. Sampling Strategy: Uses DDPM or DDIM sampling
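
The sketch below illustrates these two processes using the diffusers scheduler API that the training flags in this guide configure (beta schedule, prediction type, number of timesteps). The denoiser here is a placeholder for Diffusion Policy's conditional U-Net, so treat this as a conceptual illustration rather than the model's actual forward pass.

# diffusion_process_sketch.py (conceptual illustration; the denoiser is a placeholder)
import torch
from diffusers import DDPMScheduler

horizon, action_dim = 16, 7
scheduler = DDPMScheduler(num_train_timesteps=100, beta_schedule="squaredcos_cap_v2")

# Forward process (training): add noise to a clean action sequence at a random timestep
clean_actions = torch.zeros(1, horizon, action_dim)
noise = torch.randn_like(clean_actions)
t = torch.randint(0, scheduler.config.num_train_timesteps, (1,))
noisy_actions = scheduler.add_noise(clean_actions, noise, t)

# Reverse process (inference): start from pure noise and denoise step by step
denoiser = lambda sample, t: torch.zeros_like(sample)  # placeholder for the conditional U-Net
sample = torch.randn(1, horizon, action_dim)
scheduler.set_timesteps(100)
for t in scheduler.timesteps:
    noise_pred = denoiser(sample, t)  # predict the added noise (epsilon)
    sample = scheduler.step(noise_pred, t, sample).prev_sample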

Data Preparation

LeRobot Format Data

Diffusion Policy requires using LeRobot format datasets:

your_dataset/
├── data/
│   ├── chunk-001/
│   │   ├── observation.images.cam_high.png
│   │   ├── observation.images.cam_low.png
│   │   ├── observation.state.npy
│   │   ├── action.npy
│   │   └── ...
│   └── chunk-002/
│       └── ...
├── meta.json
├── stats.safetensors
└── videos/
    ├── episode_000000.mp4
    └── ...
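
Once the dataset is in this layout (locally or on the Hub), it can be loaded and inspected with LeRobot's dataset class before training. The snippet below is a quick sketch: "your-name/your_dataset" is a placeholder repo ID, and attribute names such as num_episodes may vary slightly between LeRobot versions.

# inspect_dataset.py (sketch; attribute names may vary between LeRobot versions)
from lerobot.datasets.lerobot_dataset import LeRobotDataset

# "your-name/your_dataset" is a placeholder HuggingFace repo ID
dataset = LeRobotDataset("your-name/your_dataset")
print(f"episodes: {dataset.num_episodes}, frames: {dataset.num_frames}")

# Each sample is a dict keyed by feature name (observation.*, action, ...)
sample = dataset[0]
for key, value in sample.items():
    if hasattr(value, "shape"):
        print(key, tuple(value.shape))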

Data Quality Requirements

  • Minimum 100 episodes for basic training
  • 500+ episodes recommended for optimal results
  • Action sequences should be smooth and continuous (see the smoothness check sketched after this list)
  • Include diverse task scenarios
  • High-quality visual observation data
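
One simple way to check the smoothness requirement is to look at the size of per-step action changes within an episode, as sketched below. This is only a heuristic, not an official LeRobot tool; replace the synthetic trajectory with actions loaded from a recorded episode of your own dataset.

# action_smoothness.py (heuristic check, not an official LeRobot tool)
import numpy as np

def action_smoothness(actions: np.ndarray) -> float:
    """Mean L2 norm of per-step action deltas; lower values mean smoother trajectories."""
    deltas = np.diff(actions, axis=0)  # [T-1, action_dim]
    return float(np.linalg.norm(deltas, axis=-1).mean())

# Synthetic example trajectory; replace with actions from a recorded episode
actions = np.cumsum(0.01 * np.random.randn(200, 7), axis=0)
print(f"Mean per-step action delta: {action_smoothness(actions):.4f}")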

Fine-tuning Training

Basic Training Command

# Set environment variables
export HF_USER="your-huggingface-username"
export CUDA_VISIBLE_DEVICES=0

# Start Diffusion Policy training
lerobot-train \
--policy.type diffusion \
--policy.pretrained_path lerobot/diffusion_policy \
--dataset.repo_id ${HF_USER}/your_dataset \
--batch_size 64 \
--steps 100000 \
--output_dir outputs/train/diffusion_policy_finetuned \
--job_name diffusion_policy_finetuning \
--policy.device cuda \
--policy.horizon 16 \
--policy.n_action_steps 8 \
--policy.n_obs_steps 2 \
--policy.num_inference_steps 100 \
--policy.optimizer_lr 1e-4 \
--policy.optimizer_weight_decay 1e-6 \
--policy.push_to_hub false \
--save_checkpoint true \
--save_freq 10000 \
--wandb.enable true

Advanced Training Configuration

Multi-Step Prediction Configuration

# Configuration for long sequence prediction
lerobot-train \
--policy.type diffusion \
--policy.pretrained_path lerobot/diffusion_policy \
--dataset.repo_id ${HF_USER}/your_dataset \
--batch_size 32 \
--steps 150000 \
--output_dir outputs/train/diffusion_policy_long_horizon \
--job_name diffusion_policy_long_horizon \
--policy.device cuda \
--policy.horizon 32 \
--policy.n_action_steps 16 \
--policy.n_obs_steps 4 \
--policy.num_inference_steps 100 \
--policy.beta_schedule squaredcos_cap_v2 \
--policy.clip_sample true \
--policy.prediction_type epsilon \
--policy.optimizer_lr 1e-4 \
--policy.scheduler_name cosine \
--policy.scheduler_warmup_steps 5000 \
--policy.push_to_hub false \
--save_checkpoint true \
--wandb.enable true

Memory Optimization Configuration

# For GPUs with smaller VRAM
lerobot-train \
--policy.type diffusion \
--policy.pretrained_path lerobot/diffusion_policy \
--dataset.repo_id ${HF_USER}/your_dataset \
--batch_size 16 \
--steps 200000 \
--output_dir outputs/train/diffusion_policy_memory_opt \
--job_name diffusion_policy_memory_optimized \
--policy.device cuda \
--policy.horizon 16 \
--policy.n_action_steps 8 \
--policy.num_inference_steps 50 \
--policy.optimizer_lr 5e-5 \
--policy.use_amp true \
--num_workers 2 \
--policy.push_to_hub false \
--save_checkpoint true \
--wandb.enable true

Parameter Details

Core Parameters

| Parameter | Meaning | Recommended Value | Description |
| --- | --- | --- | --- |
| --policy.type | Policy type | diffusion | Diffusion Policy model type |
| --policy.pretrained_path | Pretrained model path | lerobot/diffusion_policy | LeRobot official model (optional) |
| --dataset.repo_id | Dataset repository ID | ${HF_USER}/dataset | Your HuggingFace dataset |
| --batch_size | Batch size | 64 | Adjust based on VRAM; 32-64 recommended for an RTX 3080 |
| --steps | Training steps | 100000 | Diffusion models typically require more training steps |
| --output_dir | Output directory | outputs/train/diffusion_policy_finetuned | Model save path |
| --job_name | Job name | diffusion_policy_finetuning | For logging and experiment tracking (optional) |

Diffusion Policy-Specific Parameters

| Parameter | Meaning | Recommended Value | Description |
| --- | --- | --- | --- |
| --policy.horizon | Prediction horizon | 16 | Length of the predicted action sequence |
| --policy.n_action_steps | Executed action steps | 8 | Number of actions executed per prediction |
| --policy.n_obs_steps | Observation steps | 2 | Number of historical observations |
| --policy.num_inference_steps | Inference steps | 100 | Number of diffusion sampling steps (not used during training) |
| --policy.beta_schedule | Noise schedule | squaredcos_cap_v2 | Noise scheduling strategy |
| --policy.clip_sample | Sample clipping | true | Whether to clip generated samples |
| --policy.clip_sample_range | Clipping range | 1.0 | Range for sample clipping |
| --policy.prediction_type | Prediction type | epsilon | Predict the noise (epsilon) or the sample |
| --policy.num_train_timesteps | Training timesteps | 100 | Number of forward diffusion steps |
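
The three horizon-related parameters interact as a receding-horizon loop: the policy conditions on the last n_obs_steps observations, samples a horizon-length action sequence, and executes only the first n_action_steps before replanning. The sketch below just walks through that bookkeeping with the recommended values; it is an illustration, not LeRobot code.

# receding_horizon_sketch.py (illustration of how the horizon parameters interact)
n_obs_steps = 2      # condition on the 2 most recent observations
horizon = 16         # each diffusion sample is a 16-step action sequence
n_action_steps = 8   # execute only the first 8 actions, then replan

for control_step in range(0, 32, n_action_steps):
    # 1. stack the last n_obs_steps observations as conditioning
    # 2. sample a horizon-length action sequence from the diffusion model
    # 3. execute the first n_action_steps actions, then repeat
    print(f"t={control_step}: plan {horizon} actions, execute {n_action_steps}")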

Network Architecture Parameters

| Parameter | Meaning | Recommended Value | Description |
| --- | --- | --- | --- |
| --policy.vision_backbone | Vision backbone | resnet18 | Image feature extraction network |
| --policy.crop_shape | Image crop size | 84 84 | Crop size for input images |
| --policy.crop_is_random | Random cropping | true | Whether to crop randomly during training |
| --policy.use_group_norm | Use group normalization | true | Replaces batch normalization |
| --policy.spatial_softmax_num_keypoints | Spatial softmax keypoints | 32 | Number of keypoints in the spatial softmax layer |
| --policy.down_dims | Downsampling dimensions | 512 1024 2048 | Channel dimensions along the U-Net downsampling path |
| --policy.kernel_size | Convolution kernel size | 5 | Kernel size of the 1D convolutions |
| --policy.n_groups | Group normalization groups | 8 | Number of groups in GroupNorm |
| --policy.diffusion_step_embed_dim | Step embedding dimension | 128 | Embedding dimension for diffusion timesteps |

Training Parameters

| Parameter | Meaning | Recommended Value | Description |
| --- | --- | --- | --- |
| --policy.optimizer_lr | Learning rate | 1e-4 | Recommended learning rate for diffusion models |
| --policy.optimizer_weight_decay | Weight decay | 1e-6 | Regularization parameter |
| --policy.optimizer_betas | Adam betas | 0.95 0.999 | Beta parameters of the Adam optimizer |
| --policy.optimizer_eps | Adam epsilon | 1e-8 | Numerical stability parameter |
| --policy.scheduler_name | Learning rate scheduler | cosine | Cosine annealing schedule |
| --policy.scheduler_warmup_steps | Warmup steps | 500 | Learning rate warmup steps |
| --policy.use_amp | Mixed precision | true | Saves VRAM |
| --num_workers | Data loading workers | 4 | Adjust based on CPU core count |
| --policy.push_to_hub | Push to Hub | false | Whether to upload the model to HuggingFace (requires repo_id) |
| --save_checkpoint | Save checkpoints | true | Whether to save training checkpoints |
| --save_freq | Save frequency | 10000 | Checkpoint save interval (steps) |
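
For reference, the optimizer-related flags above correspond to a standard Adam plus cosine-warmup setup. The sketch below writes that configuration out in plain PyTorch/diffusers; LeRobot builds this internally from the CLI flags, so the snippet is only meant to make the hyperparameters concrete (the Linear module stands in for the policy network).

# optimizer_sketch.py (reference only; LeRobot builds this from the CLI flags above)
import torch
from diffusers.optimization import get_cosine_schedule_with_warmup

model = torch.nn.Linear(10, 10)  # stand-in for the diffusion policy network

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-4,                 # --policy.optimizer_lr
    betas=(0.95, 0.999),     # --policy.optimizer_betas
    eps=1e-8,                # --policy.optimizer_eps
    weight_decay=1e-6,       # --policy.optimizer_weight_decay
)
lr_scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=500,    # --policy.scheduler_warmup_steps
    num_training_steps=100_000,
)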

Training Monitoring and Debugging

Weights & Biases Integration

# Detailed W&B configuration
lerobot-train \
--policy.type diffusion \
--dataset.repo_id your-name/your-dataset \
--batch_size 64 \
--steps 100000 \
--policy.push_to_hub false \
--wandb.enable true \
--wandb.project diffusion_policy_experiments \
--wandb.notes "Diffusion Policy training with long horizon" \
# ... other parameters

Key Metrics Monitoring

Metrics to monitor during training:

  • Diffusion Loss: Overall training loss of the diffusion model
  • MSE Loss: Mean squared error component of the training loss
  • Learning Rate: Learning rate curve over training
  • Gradient Norm: Norm of the policy gradients
  • Inference Time: Time per sampling pass
  • Sample Quality: Quality of the generated action samples

Training Visualization

# visualization.py
import torch
import matplotlib.pyplot as plt
import numpy as np
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy

def visualize_diffusion_process(model_path, observation):
    # Load model
    # NOTE: attribute names (unet, scheduler, horizon, action_dim, ...) may vary across
    # LeRobot versions; adapt the accessors below to the version you have installed.
    policy = DiffusionPolicy.from_pretrained(model_path, device="cuda")
    policy.eval()

    # Generate the diffusion process of an action sequence
    with torch.no_grad():
        # Initial noise
        noise = torch.randn(1, policy.horizon, policy.action_dim, device="cuda")

        # Diffusion sampling process (iterate the scheduler's descending timesteps)
        policy.scheduler.set_timesteps(policy.num_inference_steps)
        actions_sequence = []
        for i, t in enumerate(policy.scheduler.timesteps):
            # Predict noise
            noise_pred = policy.unet(noise, t, observation)

            # Update sample
            noise = policy.scheduler.step(noise_pred, t, noise).prev_sample

            # Save intermediate results
            if i % 10 == 0:
                actions_sequence.append(noise.cpu().numpy())

        final_actions = noise.cpu().numpy()

    # Visualize the diffusion process
    fig, axes = plt.subplots(2, 3, figsize=(15, 10))

    for i, actions in enumerate(actions_sequence[:6]):
        ax = axes[i // 3, i % 3]
        ax.plot(actions[0, :, 0], label='Action Dim 0')
        ax.plot(actions[0, :, 1], label='Action Dim 1')
        ax.set_title(f'Diffusion Step {i * 10}')
        ax.legend()

    plt.tight_layout()
    plt.savefig('diffusion_process.png')
    plt.show()

    return final_actions

if __name__ == "__main__":
    model_path = "outputs/train/diffusion_policy_finetuned/checkpoints/last"

    # Simulated observation
    observation = {
        "observation.images.cam_high": torch.randn(1, 3, 224, 224, device="cuda"),
        "observation.state": torch.randn(1, 7, device="cuda")
    }

    actions = visualize_diffusion_process(model_path, observation)
    print(f"Generated actions shape: {actions.shape}")

Model Evaluation

Offline Evaluation

# offline_evaluation.py
import torch
import numpy as np
from torch.utils.data import DataLoader
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy
from lerobot.datasets.lerobot_dataset import LeRobotDataset

def evaluate_diffusion_policy(model_path, dataset_path):
    # Load model
    policy = DiffusionPolicy.from_pretrained(model_path, device="cuda")
    policy.eval()

    # Load test dataset and batch it
    dataset = LeRobotDataset(dataset_path, split="test")
    dataloader = DataLoader(dataset, batch_size=32)

    total_mse_loss = 0.0
    total_mae_loss = 0.0
    num_batches = 0

    with torch.no_grad():
        for batch in dataloader:
            # Move tensors to the policy's device
            batch = {k: (v.cuda() if isinstance(v, torch.Tensor) else v) for k, v in batch.items()}

            # Model prediction
            prediction = policy(batch)

            # Compute losses
            target_actions = batch['action']
            predicted_actions = prediction['action']

            mse_loss = torch.mean((predicted_actions - target_actions) ** 2)
            mae_loss = torch.mean(torch.abs(predicted_actions - target_actions))

            total_mse_loss += mse_loss.item()
            total_mae_loss += mae_loss.item()
            num_batches += 1

    avg_mse_loss = total_mse_loss / num_batches
    avg_mae_loss = total_mae_loss / num_batches

    print(f"Average MSE Loss: {avg_mse_loss:.4f}")
    print(f"Average MAE Loss: {avg_mae_loss:.4f}")

    return avg_mse_loss, avg_mae_loss

def evaluate_action_diversity(model_path, observation, num_samples=10):
    # Evaluate action diversity by sampling the policy several times for the same observation
    policy = DiffusionPolicy.from_pretrained(model_path, device="cuda")
    policy.eval()

    actions_list = []

    with torch.no_grad():
        for _ in range(num_samples):
            prediction = policy(observation)
            actions_list.append(prediction['action'].cpu().numpy())

    actions_array = np.array(actions_list)  # [num_samples, horizon, action_dim]

    # Diversity metric: per-step standard deviation across samples
    action_std = np.std(actions_array, axis=0)  # [horizon, action_dim]
    avg_std = np.mean(action_std)

    print(f"Average action standard deviation: {avg_std:.4f}")

    return avg_std, actions_array

if __name__ == "__main__":
    model_path = "outputs/train/diffusion_policy_finetuned/checkpoints/last"
    dataset_path = "path/to/your/test/dataset"

    # Offline evaluation
    evaluate_diffusion_policy(model_path, dataset_path)

    # Diversity evaluation
    observation = {
        "observation.images.cam_high": torch.randn(1, 3, 224, 224, device="cuda"),
        "observation.state": torch.randn(1, 7, device="cuda")
    }

    evaluate_action_diversity(model_path, observation)

Online Evaluation (Robot Environment)

# robot_evaluation.py
import torch
import numpy as np
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy

class DiffusionPolicyController:
    def __init__(self, model_path, num_inference_steps=50):
        self.policy = DiffusionPolicy.from_pretrained(model_path, device="cuda")
        self.policy.eval()
        self.num_inference_steps = num_inference_steps
        self.action_queue = []
        self.current_obs_history = []

    def get_action(self, observations):
        # Update observation history
        self.current_obs_history.append(observations)
        if len(self.current_obs_history) > self.policy.n_obs_steps:
            self.current_obs_history.pop(0)

        # If the action queue is empty or replanning is needed, generate a new action sequence
        if len(self.action_queue) == 0 or self.should_replan():
            with torch.no_grad():
                # Build input
                batch = self.prepare_observation_batch()

                # Set inference steps
                self.policy.scheduler.set_timesteps(self.num_inference_steps)

                # Generate action sequence
                prediction = self.policy(batch)
                actions = prediction['action'].cpu().numpy()[0]  # [horizon, action_dim]

                # Update action queue
                self.action_queue = list(actions[:self.policy.n_action_steps])

        # Return the next action
        return self.action_queue.pop(0)

    def should_replan(self):
        # Simple replanning strategy: replan when fewer than half of the queued actions remain
        return len(self.action_queue) < self.policy.n_action_steps // 2

    def prepare_observation_batch(self):
        batch = {}

        # Process image observations
        if "observation.images.cam_high" in self.current_obs_history[-1]:
            images = []
            for obs in self.current_obs_history:
                image = obs["observation.images.cam_high"]
                image_tensor = self.preprocess_image(image)
                images.append(image_tensor)

            # If the history is too short, pad by repeating the earliest observation
            while len(images) < self.policy.n_obs_steps:
                images.insert(0, images[0])

            batch["observation.images.cam_high"] = torch.stack(images).unsqueeze(0).to("cuda")

        # Process state observations
        if "observation.state" in self.current_obs_history[-1]:
            states = []
            for obs in self.current_obs_history:
                state = torch.tensor(obs["observation.state"], dtype=torch.float32)
                states.append(state)

            # If the history is too short, pad by repeating the earliest state
            while len(states) < self.policy.n_obs_steps:
                states.insert(0, states[0])

            batch["observation.state"] = torch.stack(states).unsqueeze(0).to("cuda")

        return batch

    def preprocess_image(self, image):
        # Image preprocessing: HWC uint8 -> CHW float in [0, 1]
        image_tensor = torch.tensor(image).permute(2, 0, 1).float() / 255.0
        return image_tensor

# Usage example
if __name__ == "__main__":
    controller = DiffusionPolicyController(
        model_path="outputs/train/diffusion_policy_finetuned/checkpoints/last",
        num_inference_steps=50
    )

    # Simulated robot control loop
    for step in range(100):
        # Get the current observation
        observations = {
            "observation.images.cam_high": np.random.randint(0, 255, (224, 224, 3)),
            "observation.state": np.random.randn(7)
        }

        # Get action
        action = controller.get_action(observations)

        # Execute action
        print(f"Step {step}: Action = {action}")

        # This is where the action would be sent to the actual robot
        # robot.execute_action(action)

Deployment and Optimization

Inference Acceleration

# fast_inference.py
import torch
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy
from diffusers import DDIMScheduler

class FastDiffusionInference:
    def __init__(self, model_path, num_inference_steps=10):
        self.policy = DiffusionPolicy.from_pretrained(model_path, device="cuda")
        self.policy.eval()

        # Use a DDIM scheduler for fast sampling
        self.policy.scheduler = DDIMScheduler.from_config(self.policy.scheduler.config)
        self.num_inference_steps = num_inference_steps

        # Warm up the model
        self.warmup()

    def warmup(self):
        # Warm up the model with dummy data
        dummy_batch = {
            "observation.images.cam_high": torch.randn(1, 2, 3, 224, 224, device="cuda"),
            "observation.state": torch.randn(1, 2, 7, device="cuda")
        }

        with torch.no_grad():
            for _ in range(5):
                _ = self.predict(dummy_batch)

    @torch.no_grad()
    def predict(self, observations):
        # Set inference steps
        self.policy.scheduler.set_timesteps(self.num_inference_steps)

        # Fast inference
        prediction = self.policy(observations)
        return prediction['action'].cpu().numpy()

if __name__ == "__main__":
    fast_inference = FastDiffusionInference(
        "outputs/train/diffusion_policy_finetuned/checkpoints/last",
        num_inference_steps=10
    )

    # Test inference speed
    import time

    observations = {
        "observation.images.cam_high": torch.randn(1, 2, 3, 224, 224, device="cuda"),
        "observation.state": torch.randn(1, 2, 7, device="cuda")
    }

    start_time = time.time()
    for _ in range(100):
        action = fast_inference.predict(observations)
    end_time = time.time()

    avg_inference_time = (end_time - start_time) / 100
    print(f"Average inference time: {avg_inference_time:.4f} seconds")
    print(f"Inference frequency: {1/avg_inference_time:.2f} Hz")

Best Practices

Data Collection Recommendations

  1. Smooth Actions: Ensure action sequences in demonstration data are smooth and continuous
  2. Diverse Scenarios: Collect data with different starting states and goals
  3. High-Quality Annotations: Ensure accuracy of action annotations
  4. Sufficient Data Volume: Diffusion models typically require more data

Training Optimization Recommendations

  1. Noise Scheduling: Choose appropriate noise scheduling strategy
  2. Inference Steps: Balance quality and speed, choose appropriate inference steps
  3. Learning Rate Scheduling: Use cosine annealing or step decay
  4. Regularization: Appropriately use weight decay

Deployment Optimization Recommendations

  1. Fast Sampling: Use DDIM or other fast sampling methods (a scheduler-swap sketch follows this list)
  2. Model Compression: Use knowledge distillation or quantization techniques
  3. Parallel Inference: Utilize GPU parallel capabilities
  4. Cache Optimization: Cache intermediate computation results
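
As an example of the fast-sampling point, the snippet below swaps the policy's scheduler for diffusers' DPM-Solver, following the same pattern as the DDIM swap in the fast_inference.py example above. Where exactly the scheduler lives on the policy object depends on the LeRobot version, so treat the attribute access as an assumption to adapt.

# dpm_solver_swap.py (sketch; the scheduler attribute path may differ by LeRobot version)
from diffusers import DPMSolverMultistepScheduler
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy

policy = DiffusionPolicy.from_pretrained(
    "outputs/train/diffusion_policy_finetuned/checkpoints/last", device="cuda"
)
policy.eval()

# Reuse the trained noise schedule's configuration, but sample with DPM-Solver in fewer steps
policy.scheduler = DPMSolverMultistepScheduler.from_config(policy.scheduler.config)
policy.scheduler.set_timesteps(10)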

Frequently Asked Questions (FAQ)

Q: What advantages does Diffusion Policy have compared to other policy learning methods?

A: Main advantages of Diffusion Policy include:

  • Multimodal Generation: Can handle tasks with multiple solutions
  • High-Quality Output: Generates smooth and natural action sequences
  • Strong Robustness: Good robustness to noise and perturbations
  • Strong Expressiveness: Can learn complex action distributions

Q: How to choose the appropriate number of inference steps?

A: The choice of inference steps needs to balance quality and speed:

  • High quality: 100-1000 steps, suitable for offline evaluation
  • Real-time applications: 10-50 steps, suitable for online control
  • Fast prototyping: 5-10 steps, suitable for quick testing

Q: How long does training take?

A: Training time depends on multiple factors:

  • Dataset size: 500 episodes take approximately 12-24 hours (RTX 3080)
  • Model complexity: Larger models require more time
  • Inference steps: More steps increase training time
  • Convergence requirement: Typically requires 100000-200000 steps

Q: How to improve the quality of generated actions?

A: Methods to improve action quality:

  • Increase inference steps: More steps typically produce better results
  • Optimize noise scheduling: Choose appropriate noise addition strategy
  • Data quality: Ensure high quality of training data
  • Model architecture: Use larger or deeper networks
  • Regularization techniques: Appropriate regularization prevents overfitting

Q: How to handle real-time requirements?

A: Methods to meet real-time requirements:

  • Fast sampling: Use DDIM or DPM-Solver
  • Reduce inference steps: Find balance between quality and speed
  • Model distillation: Train smaller student models
  • Parallel inference: Utilize multi-GPU or batching
  • Pre-computation: Pre-compute partial results

Changelog

  • 2024-01: Initial version release
  • 2024-02: Added fast sampling support
  • 2024-03: Optimized memory usage and training efficiency
  • 2024-04: Added diversity evaluation and deployment optimization