Diffusion Policy Model Fine-tuning
Overview
Diffusion Policy is a visuomotor policy learning method that applies the generative capabilities of diffusion models to robot control. By learning to denoise action sequences conditioned on observations, it can generate diverse, high-quality action sequences and performs well on complex robot manipulation tasks.
Core Features
- Diffusion Generation: Uses diffusion models to generate continuous action sequences
- Multimodal Actions: Can handle tasks with multiple solutions
- High-Quality Output: Generates smooth and natural robot actions
- Strong Robustness: Good robustness to noise and perturbations
- Strong Expressiveness: Can learn complex action distributions
Prerequisites
System Requirements
- Operating System: Linux (Ubuntu 20.04+ recommended) or macOS
- Python Version: 3.8+
- GPU: NVIDIA GPU (RTX 3080 or higher recommended), at least 10GB VRAM
- Memory: At least 32GB RAM
- Storage: At least 50GB available space
Environment Setup
1. Install LeRobot
# Clone LeRobot repository
git clone https://github.com/huggingface/lerobot.git
cd lerobot
# Create virtual environment
conda create -n lerobot python=3.10
conda activate lerobot
# Install dependencies
pip install -e .
2. Install Diffusion Policy-Specific Dependencies
# Install diffusion model related dependencies
pip install diffusers
pip install accelerate
pip install transformers
pip install einops
pip install wandb
# Install numerical computing libraries
pip install scipy
pip install scikit-learn
# Login to Weights & Biases (optional)
wandb login
Diffusion Policy Architecture
Core Components
- Vision Encoder: Extracts image features
- State Encoder: Processes robot state information
- Conditional Encoder: Fuses vision and state information
- Diffusion Network: Learns the diffusion process of action distributions
- Noise Scheduler: Controls noise levels in the diffusion process
Diffusion Process
- Forward Process: Gradually adds noise to action sequences
- Reverse Process: Gradually recovers action sequences from noise
- Conditional Generation: Generates actions based on observation conditions
- Sampling Strategy: Uses DDPM or DDIM sampling
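The following minimal sketch illustrates both processes with the diffusers schedulers installed above. It uses a dummy action tensor and a stand-in noise-prediction network (the real network is LeRobot's conditional U-Net), so the shapes and step counts are illustrative only.
# diffusion_process_sketch.py -- illustrative only, not LeRobot's internal implementation
import torch
from diffusers import DDPMScheduler

scheduler = DDPMScheduler(num_train_timesteps=100, beta_schedule="squaredcos_cap_v2")

# Dummy action chunk: batch of 1, horizon 16, action dim 7 (illustrative shapes)
actions = torch.zeros(1, 16, 7)

# Forward process: add noise at a random timestep (this is what training sees)
noise = torch.randn_like(actions)
t = torch.randint(0, scheduler.config.num_train_timesteps, (1,))
noisy_actions = scheduler.add_noise(actions, noise, t)

# Stand-in for the conditional denoising network: it should predict the added noise
# from (noisy sample, timestep, observation features). Here we simply return the true
# noise so the reverse loop below runs end to end.
def noise_prediction_network(sample, timestep):
    return noise

# Reverse process: start from pure noise and iteratively denoise
scheduler.set_timesteps(100)  # DDIM would use far fewer steps here
sample = torch.randn_like(actions)
for timestep in scheduler.timesteps:
    noise_pred = noise_prediction_network(sample, timestep)
    sample = scheduler.step(noise_pred, timestep, sample).prev_sample

print(sample.shape)  # (1, 16, 7): a denoised action chunk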
Data Preparation
LeRobot Format Data
Diffusion Policy training requires datasets in the LeRobot format:
your_dataset/
├── data/
│ ├── chunk-001/
│ │ ├── observation.images.cam_high.png
│ │ ├── observation.images.cam_low.png
│ │ ├── observation.state.npy
│ │ ├── action.npy
│ │ └── ...
│ └── chunk-002/
│ └── ...
├── meta.json
├── stats.safetensors
└── videos/
├── episode_000000.mp4
└── ...
Data Quality Requirements
- Minimum 100 episodes for basic training
- 500+ episodes recommended for optimal results
- Action sequences should be smooth and continuous
- Include diverse task scenarios
- High-quality visual observation data
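Before training, it is worth loading the dataset once and checking its contents. Below is a minimal sketch using LeRobot's LeRobotDataset class (also used in the evaluation scripts later in this guide); the repo_id is a placeholder for your own dataset, and attribute names such as num_episodes may vary slightly between LeRobot versions.
# inspect_dataset.py -- quick sanity check of a LeRobot-format dataset
import torch
from lerobot.datasets.lerobot_dataset import LeRobotDataset

# Replace with your own repo_id (e.g. "${HF_USER}/your_dataset")
dataset = LeRobotDataset("your-name/your_dataset")

print(f"Number of frames: {len(dataset)}")
print(f"Number of episodes: {dataset.num_episodes}")

# Each item is a dict containing camera images, the robot state, and the action label
sample = dataset[0]
for key, value in sample.items():
    if isinstance(value, torch.Tensor):
        print(f"{key}: shape {tuple(value.shape)}, dtype {value.dtype}")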
Fine-tuning Training
Basic Training Command
# Set environment variables
export HF_USER="your-huggingface-username"
export CUDA_VISIBLE_DEVICES=0
# Start Diffusion Policy training
lerobot-train \
--policy.type diffusion \
--policy.pretrained_path lerobot/diffusion_policy \
--dataset.repo_id ${HF_USER}/your_dataset \
--batch_size 64 \
--steps 100000 \
--output_dir outputs/train/diffusion_policy_finetuned \
--job_name diffusion_policy_finetuning \
--policy.device cuda \
--policy.horizon 16 \
--policy.n_action_steps 8 \
--policy.n_obs_steps 2 \
--policy.num_inference_steps 100 \
--policy.optimizer_lr 1e-4 \
--policy.optimizer_weight_decay 1e-6 \
--policy.push_to_hub false \
--save_checkpoint true \
--save_freq 10000 \
--wandb.enable true
Advanced Training Configuration
Multi-Step Prediction Configuration
# Configuration for long sequence prediction
lerobot-train \
--policy.type diffusion \
--policy.pretrained_path lerobot/diffusion_policy \
--dataset.repo_id ${HF_USER}/your_dataset \
--batch_size 32 \
--steps 150000 \
--output_dir outputs/train/diffusion_policy_long_horizon \
--job_name diffusion_policy_long_horizon \
--policy.device cuda \
--policy.horizon 32 \
--policy.n_action_steps 16 \
--policy.n_obs_steps 4 \
--policy.num_inference_steps 100 \
--policy.beta_schedule squaredcos_cap_v2 \
--policy.clip_sample true \
--policy.prediction_type epsilon \
--policy.optimizer_lr 1e-4 \
--policy.scheduler_name cosine \
--policy.scheduler_warmup_steps 5000 \
--policy.push_to_hub false \
--save_checkpoint true \
--wandb.enable true
Memory Optimization Configuration
# For GPUs with smaller VRAM
lerobot-train \
--policy.type diffusion \
--policy.pretrained_path lerobot/diffusion_policy \
--dataset.repo_id ${HF_USER}/your_dataset \
--batch_size 16 \
--steps 200000 \
--output_dir outputs/train/diffusion_policy_memory_opt \
--job_name diffusion_policy_memory_optimized \
--policy.device cuda \
--policy.horizon 16 \
--policy.n_action_steps 8 \
--policy.num_inference_steps 50 \
--policy.optimizer_lr 5e-5 \
--policy.use_amp true \
--num_workers 2 \
--policy.push_to_hub false \
--save_checkpoint true \
--wandb.enable true
Parameter Details
Core Parameters
Parameter | Meaning | Recommended Value | Description |
---|---|---|---|
--policy.type | Policy type | diffusion | Diffusion Policy model type |
--policy.pretrained_path | Pretrained model path | lerobot/diffusion_policy | LeRobot official model (optional) |
--dataset.repo_id | Dataset repository ID | ${HF_USER}/dataset | Your HuggingFace dataset |
--batch_size | Batch size | 64 | Adjust to available VRAM; 32-64 recommended on an RTX 3080 |
--steps | Training steps | 100000 | Diffusion models typically require more training steps |
--output_dir | Output directory | outputs/train/diffusion_policy_finetuned | Model save path |
--job_name | Job name | diffusion_policy_finetuning | For logging and experiment tracking (optional) |
Diffusion Policy-Specific Parameters
Parameter | Meaning | Recommended Value | Description |
---|---|---|---|
--policy.horizon | Prediction horizon | 16 | Length of predicted action sequence |
--policy.n_action_steps | Execute action steps | 8 | Number of actions executed each time |
--policy.n_obs_steps | Observation steps | 2 | Number of historical observations |
--policy.num_inference_steps | Inference steps | 100 | Number of denoising steps used at inference time (does not affect training) |
--policy.beta_schedule | Noise schedule | squaredcos_cap_v2 | Noise addition scheduling strategy |
--policy.clip_sample | Sample clipping | true | Whether to clip generated samples |
--policy.clip_sample_range | Clipping range | 1.0 | Range for sample clipping |
--policy.prediction_type | Prediction type | epsilon | Whether the network predicts the added noise (epsilon) or the denoised sample |
--policy.num_train_timesteps | Training timesteps | 100 | Number of forward diffusion steps |
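To make the interplay of horizon, n_action_steps, and n_obs_steps concrete, the sketch below walks through the receding-horizon control loop these parameters imply, using the recommended values from the table. predict_chunk is a placeholder for one diffusion sampling pass, not a LeRobot API.
# receding_horizon_sketch.py -- how horizon / n_action_steps / n_obs_steps interact
n_obs_steps = 2      # observations fed to the policy (current + 1 past frame)
horizon = 16         # actions predicted per denoising pass
n_action_steps = 8   # actions actually executed before re-planning

def predict_chunk(obs_window):
    """Placeholder for one diffusion sampling pass: returns `horizon` actions."""
    return [f"a{t}" for t in range(horizon)]

obs_history = []
for t in range(24):                      # 24 control steps of a rollout
    obs_history.append(f"o{t}")
    obs_window = obs_history[-n_obs_steps:]
    if t % n_action_steps == 0:          # re-plan every n_action_steps
        chunk = predict_chunk(obs_window)
        queue = chunk[:n_action_steps]   # only the first n_action_steps are kept
    action = queue.pop(0)
    print(f"step {t:2d}: obs window {obs_window} -> execute {action}")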
Network Architecture Parameters
Parameter | Meaning | Recommended Value | Description |
---|---|---|---|
--policy.vision_backbone | Vision backbone | resnet18 | Image feature extraction network |
--policy.crop_shape | Image crop size | 84 84 | Crop size for input images |
--policy.crop_is_random | Random cropping | true | Whether to randomly crop during training |
--policy.use_group_norm | Use group normalization | true | Replace batch normalization |
--policy.spatial_softmax_num_keypoints | Spatial softmax keypoints | 32 | Number of keypoints in spatial softmax layer |
--policy.down_dims | Downsampling dimensions | 512 1024 2048 | Dimensions of U-Net downsampling path |
--policy.kernel_size | Convolution kernel size | 5 | Kernel size for 1D convolution |
--policy.n_groups | Group normalization groups | 8 | Number of groups in GroupNorm |
--policy.diffusion_step_embed_dim | Step embedding dimension | 128 | Embedding dimension for diffusion steps |
Training Parameters
Parameter | Meaning | Recommended Value | Description |
---|---|---|---|
--policy.optimizer_lr | Learning rate | 1e-4 | Recommended learning rate for diffusion models |
--policy.optimizer_weight_decay | Weight decay | 1e-6 | Regularization parameter |
--policy.optimizer_betas | Adam optimizer beta | 0.95 0.999 | Beta parameters for Adam optimizer |
--policy.optimizer_eps | Adam epsilon | 1e-8 | Numerical stability parameter |
--policy.scheduler_name | Learning rate scheduler | cosine | Cosine annealing schedule |
--policy.scheduler_warmup_steps | Warmup steps | 500 | Learning rate warmup |
--policy.use_amp | Mixed precision | true | Saves VRAM |
--num_workers | Data loading threads | 4 | Adjust based on CPU core count |
--policy.push_to_hub | Push to Hub | false | Whether to upload model to HuggingFace (requires repo_id) |
--save_checkpoint | Save checkpoints | true | Whether to save training checkpoints |
--save_freq | Save frequency | 10000 | Checkpoint save interval |
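For reference, the same hyperparameters can also be set programmatically. The sketch below assumes the --policy.* flags above map one-to-one onto fields of LeRobot's DiffusionConfig; check the import path and field names against your installed version.
# config_sketch.py -- programmatic equivalent of the --policy.* flags above
# (assumes the flags map one-to-one onto DiffusionConfig fields; verify against
# your LeRobot version)
from lerobot.policies.diffusion.configuration_diffusion import DiffusionConfig

config = DiffusionConfig(
    horizon=16,
    n_action_steps=8,
    n_obs_steps=2,
    num_inference_steps=100,
    beta_schedule="squaredcos_cap_v2",
    prediction_type="epsilon",
    num_train_timesteps=100,
    optimizer_lr=1e-4,
    optimizer_weight_decay=1e-6,
)
print(config)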
Training Monitoring and Debugging
Weights & Biases Integration
# Detailed W&B configuration
lerobot-train \
--policy.type diffusion \
--dataset.repo_id your-name/your-dataset \
--batch_size 64 \
--steps 100000 \
--policy.push_to_hub false \
--wandb.enable true \
--wandb.project diffusion_policy_experiments \
--wandb.notes "Diffusion Policy training with long horizon" \
# ... other parameters
Key Metrics Monitoring
Metrics to monitor during training:
- Diffusion Loss: Overall denoising objective of the diffusion model
- MSE Loss: Mean squared error between the predicted and target noise (or sample, depending on prediction_type)
- Learning Rate: Current value of the learning-rate schedule
- Gradient Norm: Overall gradient magnitude, useful for spotting training instability (see the sketch after this list)
- Inference Time: Time needed to sample one action sequence
- Sample Quality: Qualitative quality of the generated action sequences
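lerobot-train already reports these metrics when --wandb.enable true is set. If you need the gradient norm in a custom loop or a post-hoc check, a generic PyTorch sketch (not LeRobot's internal logging code) looks like this:
# grad_norm_sketch.py -- computing the global gradient norm for monitoring
import torch

def global_grad_norm(model: torch.nn.Module) -> float:
    """L2 norm over all parameter gradients; the value reported as 'Gradient Norm'."""
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.detach().pow(2).sum().item()
    return total ** 0.5

# Usage inside a training step, after loss.backward() and before optimizer.step():
#   grad_norm = global_grad_norm(policy)
#   wandb.log({"train/grad_norm": grad_norm})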
Training Visualization
# visualization.py
import torch
import matplotlib.pyplot as plt
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy

def visualize_diffusion_process(model_path, observation, action_dim=7):
    # Load model
    policy = DiffusionPolicy.from_pretrained(model_path)
    policy.to("cuda")
    policy.eval()
    # NOTE: this sketch pokes at DiffusionPolicy internals (policy.diffusion.unet,
    # policy.diffusion.noise_scheduler, _prepare_global_conditioning); attribute names
    # and expected shapes may differ across LeRobot versions.
    unet = policy.diffusion.unet
    scheduler = policy.diffusion.noise_scheduler
    with torch.no_grad():
        # The UNet is conditioned on an encoded observation (a flat "global conditioning"
        # vector), not on the raw observation dict
        global_cond = policy.diffusion._prepare_global_conditioning(observation)
        # Start from pure noise over the prediction horizon
        sample = torch.randn(1, policy.config.horizon, action_dim, device="cuda")
        # Reverse diffusion: iterate over the scheduler timesteps (high noise -> low noise)
        num_steps = policy.config.num_inference_steps or policy.config.num_train_timesteps
        scheduler.set_timesteps(num_steps)
        actions_sequence = []
        for i, t in enumerate(scheduler.timesteps):
            # Predict the noise at this timestep
            timestep = torch.full((1,), int(t), dtype=torch.long, device="cuda")
            noise_pred = unet(sample, timestep, global_cond=global_cond)
            # One denoising step
            sample = scheduler.step(noise_pred, t, sample).prev_sample
            # Save intermediate results every 10 denoising steps
            if i % 10 == 0:
                actions_sequence.append(sample.cpu().numpy())
        final_actions = sample.cpu().numpy()
    # Visualize how the action trajectory emerges from the noise
    fig, axes = plt.subplots(2, 3, figsize=(15, 10))
    for i, actions in enumerate(actions_sequence[:6]):
        ax = axes[i // 3, i % 3]
        ax.plot(actions[0, :, 0], label='Action Dim 0')
        ax.plot(actions[0, :, 1], label='Action Dim 1')
        ax.set_title(f'Denoising Step {i * 10}')
        ax.legend()
    plt.tight_layout()
    plt.savefig('diffusion_process.png')
    plt.show()
    return final_actions

if __name__ == "__main__":
    model_path = "outputs/train/diffusion_policy_finetuned/checkpoints/last"
    # Simulated observation, already stacked over n_obs_steps (here 2) and cameras (here 1);
    # the exact keys and shapes expected by the internals depend on your LeRobot version
    observation = {
        "observation.images": torch.randn(1, 2, 1, 3, 224, 224, device="cuda"),
        "observation.state": torch.randn(1, 2, 7, device="cuda")
    }
    actions = visualize_diffusion_process(model_path, observation)
    print(f"Generated actions shape: {actions.shape}")
Model Evaluation
Offline Evaluation
# offline_evaluation.py
import torch
import numpy as np
from torch.utils.data import DataLoader
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy
from lerobot.datasets.lerobot_dataset import LeRobotDataset

def evaluate_diffusion_policy(model_path, dataset_path, batch_size=32, max_batches=100):
    # Load model
    policy = DiffusionPolicy.from_pretrained(model_path)
    policy.to("cuda")
    policy.eval()
    # Load evaluation data (LeRobot datasets are addressed by a repo_id or a local
    # LeRobot-format root; use the `episodes` argument to hold out evaluation episodes)
    dataset = LeRobotDataset(dataset_path)
    dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=False)
    total_mse_loss = 0.0
    total_mae_loss = 0.0
    num_batches = 0
    with torch.no_grad():
        for i, batch in enumerate(dataloader):
            if i >= max_batches:
                break
            batch = {k: v.to("cuda") for k, v in batch.items() if isinstance(v, torch.Tensor)}
            # select_action keeps internal observation/action queues; reset them so each
            # batch is predicted from scratch
            policy.reset()
            predicted_actions = policy.select_action(batch)
            # Ground-truth action for the current frame (take the first step if the
            # dataset returns an action chunk)
            target_actions = batch['action']
            if target_actions.ndim == 3:
                target_actions = target_actions[:, 0]
            mse_loss = torch.mean((predicted_actions - target_actions) ** 2)
            mae_loss = torch.mean(torch.abs(predicted_actions - target_actions))
            total_mse_loss += mse_loss.item()
            total_mae_loss += mae_loss.item()
            num_batches += 1
    avg_mse_loss = total_mse_loss / num_batches
    avg_mae_loss = total_mae_loss / num_batches
    print(f"Average MSE Loss: {avg_mse_loss:.4f}")
    print(f"Average MAE Loss: {avg_mae_loss:.4f}")
    return avg_mse_loss, avg_mae_loss

def evaluate_action_diversity(model_path, observation, num_samples=10):
    # Evaluate action diversity: sample several times from the same observation
    policy = DiffusionPolicy.from_pretrained(model_path)
    policy.to("cuda")
    policy.eval()
    actions_list = []
    with torch.no_grad():
        for _ in range(num_samples):
            policy.reset()  # clear queues so every sample runs the full denoising loop
            action = policy.select_action(observation)
            actions_list.append(action.cpu().numpy())
    actions_array = np.array(actions_list)  # [num_samples, batch, action_dim]
    # Diversity metric: standard deviation across samples
    action_std = np.std(actions_array, axis=0)
    avg_std = np.mean(action_std)
    print(f"Average action standard deviation: {avg_std:.4f}")
    return avg_std, actions_array

if __name__ == "__main__":
    model_path = "outputs/train/diffusion_policy_finetuned/checkpoints/last"
    dataset_path = "path/to/your/test/dataset"
    # Offline evaluation
    evaluate_diffusion_policy(model_path, dataset_path)
    # Diversity evaluation
    observation = {
        "observation.images.cam_high": torch.randn(1, 3, 224, 224, device="cuda"),
        "observation.state": torch.randn(1, 7, device="cuda")
    }
    evaluate_action_diversity(model_path, observation)
Online Evaluation (Robot Environment)
# robot_evaluation.py
import torch
import numpy as np
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy

class DiffusionPolicyController:
    def __init__(self, model_path, num_inference_steps=50):
        self.policy = DiffusionPolicy.from_pretrained(model_path)
        self.policy.to("cuda")
        self.policy.eval()
        # The number of denoising steps is normally fixed by the policy config
        # (--policy.num_inference_steps); see the inference-acceleration example
        # below for overriding it after loading.
        self.num_inference_steps = num_inference_steps
        self.n_obs_steps = self.policy.config.n_obs_steps
        self.n_action_steps = self.policy.config.n_action_steps
        self.action_queue = []
        self.current_obs_history = []

    def get_action(self, observations):
        # Update observation history (keep the last n_obs_steps observations)
        self.current_obs_history.append(observations)
        if len(self.current_obs_history) > self.n_obs_steps:
            self.current_obs_history.pop(0)
        # If the action queue is empty or replanning is needed, generate a new chunk
        if len(self.action_queue) == 0 or self.should_replan():
            with torch.no_grad():
                batch = self.prepare_observation_batch()
                # NOTE: recent LeRobot versions expose predict_action_chunk(); on older
                # versions, policy.select_action() returns one action per call and
                # already performs the chunk queueing illustrated by this class.
                actions = self.policy.predict_action_chunk(batch)
                actions = actions.cpu().numpy()[0]  # [chunk_length, action_dim]
                # Refill the action queue
                self.action_queue = list(actions[:self.n_action_steps])
        # Return the next action
        return self.action_queue.pop(0)

    def should_replan(self):
        # Simple replanning strategy: replan when less than half the chunk remains
        return len(self.action_queue) < self.n_action_steps // 2

    def prepare_observation_batch(self):
        batch = {}
        # Process image observations
        if "observation.images.cam_high" in self.current_obs_history[-1]:
            images = []
            for obs in self.current_obs_history:
                images.append(self.preprocess_image(obs["observation.images.cam_high"]))
            # If history is insufficient, repeat the earliest observation
            while len(images) < self.n_obs_steps:
                images.insert(0, images[0])
            batch["observation.images.cam_high"] = torch.stack(images).unsqueeze(0).to("cuda")
        # Process state observations
        if "observation.state" in self.current_obs_history[-1]:
            states = []
            for obs in self.current_obs_history:
                states.append(torch.tensor(obs["observation.state"], dtype=torch.float32))
            # If history is insufficient, repeat the earliest state
            while len(states) < self.n_obs_steps:
                states.insert(0, states[0])
            batch["observation.state"] = torch.stack(states).unsqueeze(0).to("cuda")
        return batch

    def preprocess_image(self, image):
        # HWC uint8 -> CHW float in [0, 1]
        return torch.tensor(image).permute(2, 0, 1).float() / 255.0

# Usage example
if __name__ == "__main__":
    controller = DiffusionPolicyController(
        model_path="outputs/train/diffusion_policy_finetuned/checkpoints/last",
        num_inference_steps=50
    )
    # Simulated robot control loop
    for step in range(100):
        # Get current observation (replace with real sensor readings)
        observations = {
            "observation.images.cam_high": np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8),
            "observation.state": np.random.randn(7)
        }
        # Get action
        action = controller.get_action(observations)
        # Execute action
        print(f"Step {step}: Action = {action}")
        # This is where the action would be sent to the real robot
        # robot.execute_action(action)
Deployment and Optimization
Inference Acceleration
# fast_inference.py
import time
import torch
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy
from diffusers import DDIMScheduler

class FastDiffusionInference:
    def __init__(self, model_path, num_inference_steps=10):
        self.policy = DiffusionPolicy.from_pretrained(model_path)
        self.policy.to("cuda")
        self.policy.eval()
        # Swap in a DDIM scheduler for fast sampling.
        # NOTE: this touches DiffusionPolicy internals (policy.diffusion.noise_scheduler);
        # the attribute name may differ across LeRobot versions.
        self.policy.diffusion.noise_scheduler = DDIMScheduler.from_config(
            self.policy.diffusion.noise_scheduler.config
        )
        # The number of denoising steps normally comes from the policy config
        # (--policy.num_inference_steps); update both places to be safe.
        self.policy.config.num_inference_steps = num_inference_steps
        if hasattr(self.policy.diffusion, "num_inference_steps"):
            self.policy.diffusion.num_inference_steps = num_inference_steps
        # Warmup model
        self.warmup()

    def warmup(self):
        # Warm up CUDA kernels with dummy single-step observations
        # (select_action stacks the n_obs_steps history internally)
        dummy_batch = {
            "observation.images.cam_high": torch.randn(1, 3, 224, 224, device="cuda"),
            "observation.state": torch.randn(1, 7, device="cuda")
        }
        with torch.no_grad():
            for _ in range(5):
                _ = self.predict(dummy_batch)

    @torch.no_grad()
    def predict(self, observations):
        # Reset the internal observation/action queues so every call runs the full
        # denoising loop (otherwise queued actions are returned without sampling)
        self.policy.reset()
        action = self.policy.select_action(observations)
        return action.cpu().numpy()

if __name__ == "__main__":
    fast_inference = FastDiffusionInference(
        "outputs/train/diffusion_policy_finetuned/checkpoints/last",
        num_inference_steps=10
    )
    # Test inference speed
    observations = {
        "observation.images.cam_high": torch.randn(1, 3, 224, 224, device="cuda"),
        "observation.state": torch.randn(1, 7, device="cuda")
    }
    start_time = time.time()
    for _ in range(100):
        action = fast_inference.predict(observations)
    end_time = time.time()
    avg_inference_time = (end_time - start_time) / 100
    print(f"Average inference time: {avg_inference_time:.4f} seconds")
    print(f"Inference frequency: {1 / avg_inference_time:.2f} Hz")
Best Practices
Data Collection Recommendations
- Smooth Actions: Ensure action sequences in demonstration data are smooth and continuous
- Diverse Scenarios: Collect data with different starting states and goals
- High-Quality Annotations: Ensure accuracy of action annotations
- Sufficient Data Volume: Diffusion models typically require more data
Training Optimization Recommendations
- Noise Scheduling: Choose an appropriate noise schedule (compared in the sketch after this list)
- Inference Steps: Balance quality and speed, choose appropriate inference steps
- Learning Rate Scheduling: Use cosine annealing or step decay
- Regularization: Appropriately use weight decay
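The effect of the noise schedule can be inspected directly with the diffusers schedulers installed earlier. The short sketch below compares the cumulative signal retention of two beta_schedule options (the file name and step count are arbitrary choices):
# beta_schedule_sketch.py -- compare noise schedules supported by diffusers
import matplotlib.pyplot as plt
from diffusers import DDPMScheduler

for schedule in ["linear", "squaredcos_cap_v2"]:
    scheduler = DDPMScheduler(num_train_timesteps=100, beta_schedule=schedule)
    # alphas_cumprod shows how much signal survives at each forward-diffusion step
    plt.plot(scheduler.alphas_cumprod.numpy(), label=schedule)

plt.xlabel("diffusion timestep")
plt.ylabel("cumulative alpha (signal retained)")
plt.legend()
plt.savefig("beta_schedules.png")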
Deployment Optimization Recommendations
- Fast Sampling: Use DDIM or other fast sampling methods
- Model Compression: Use knowledge distillation or quantization techniques
- Parallel Inference: Utilize GPU parallel capabilities
- Cache Optimization: Cache intermediate computation results
Frequently Asked Questions (FAQ)
Q: What advantages does Diffusion Policy have compared to other policy learning methods?
A: Main advantages of Diffusion Policy include:
- Multimodal Generation: Can handle tasks with multiple solutions
- High-Quality Output: Generates smooth and natural action sequences
- Strong Robustness: Good robustness to noise and perturbations
- Strong Expressiveness: Can learn complex action distributions
Q: How to choose the appropriate number of inference steps?
A: The choice of inference steps needs to balance quality and speed:
- High quality: the full number of training timesteps (e.g., 100), suitable for offline evaluation
- Real-time applications: 10-50 steps (typically with DDIM), suitable for online control
- Fast prototyping: 5-10 steps, suitable for quick testing
Q: How long does training take?
A: Training time depends on multiple factors:
- Dataset size: 500 episodes take approximately 12-24 hours (RTX 3080)
- Model complexity: Larger models require more time
- Batch size and precision: throughput scales with --batch_size and improves with --policy.use_amp
- Convergence requirement: Typically requires 100000-200000 steps
Q: How to improve the quality of generated actions?
A: Methods to improve action quality:
- Increase inference steps: More steps typically produce better results
- Optimize noise scheduling: Choose appropriate noise addition strategy
- Data quality: Ensure high quality of training data
- Model architecture: Use larger or deeper networks
- Regularization techniques: Appropriate regularization prevents overfitting
Q: How to handle real-time requirements?
A: Methods to meet real-time requirements:
- Fast sampling: Use DDIM or DPM-Solver
- Reduce inference steps: Find balance between quality and speed
- Model distillation: Train smaller student models
- Parallel inference: Utilize multi-GPU or batching
- Pre-computation: Pre-compute partial results
Related Resources
- Diffusion Policy Original Paper
- LeRobot Diffusion Policy Implementation
- Diffusers Library Documentation
- Diffusion Models Tutorial
- Robot Learning Course
Changelog
- 2024-01: Initial version release
- 2024-02: Added fast sampling support
- 2024-03: Optimized memory usage and training efficiency
- 2024-04: Added diversity evaluation and deployment optimization