Fine-tuning Diffusion Policy Model
Overview
Diffusion Policy is a visuomotor policy learning method that applies the generative capabilities of diffusion models to robot control. By learning to model action distributions through a noising and denoising process, it can generate diverse, high-quality robot action sequences and performs well on complex robotic manipulation tasks.
Core Features
- Diffusion Generation: Uses diffusion models to generate continuous action sequences.
- Multimodal Actions: Capable of handling tasks with multiple solutions.
- High-Quality Output: Generates smooth and natural robot movements.
- Strong Robustness: Tolerates noise and perturbations in the observations.
- Expressive Power: Capable of learning complex action distributions.
Prerequisites
System Requirements
- OS: Linux (Ubuntu 20.04+ recommended) or macOS.
- Python Version: 3.8+.
- GPU: NVIDIA GPU (RTX 3080 or higher recommended), at least 10GB VRAM.
- Memory: At least 32GB RAM.
- Storage: At least 50GB available space.
Environment Preparation
1. Install LeRobot
# Clone LeRobot repository
git clone https://github.com/huggingface/lerobot.git
cd lerobot
# Create virtual environment (venv recommended; conda also works)
python -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
# Install dependencies
pip install -e .
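After installation, you can run a quick sanity check. The snippet below is a minimal sketch; it only assumes the package is installed under the distribution name lerobot and that PyTorch is available.
# check_install.py -- verify the LeRobot installation and GPU visibility
from importlib.metadata import version
import torch

print("lerobot version:", version("lerobot"))
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())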
Diffusion Policy Architecture
Core Components
- Vision Encoder: Extracts image features.
- State Encoder: Processes robot state information.
- Condition Encoder: Fuses vision and state information.
- Diffusion Network: Learns the diffusion process of action distributions.
- Noise Scheduler: Controls the noise levels of the diffusion process.
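The components above compose roughly as follows. This is an illustrative skeleton only: the class and attribute names are hypothetical, LeRobot's actual implementation denoises whole action sequences with a conditional 1D U-Net rather than the single-step MLP shown here, and the noise scheduler is covered by the diffusion-process sketch in the next subsection.
# architecture_sketch.py -- hypothetical skeleton of the component wiring
import torch
import torch.nn as nn
import torchvision

class DiffusionPolicySketch(nn.Module):
    def __init__(self, state_dim=7, action_dim=7, cond_dim=512):
        super().__init__()
        # Vision encoder: extracts image features
        backbone = torchvision.models.resnet18(weights=None)
        backbone.fc = nn.Identity()                      # 512-d feature vector
        self.vision_encoder = backbone
        # State encoder: processes robot state information
        self.state_encoder = nn.Linear(state_dim, 128)
        # Condition encoder: fuses vision and state features
        self.condition_encoder = nn.Linear(512 + 128, cond_dim)
        # Diffusion network: predicts the noise in a noisy action given the condition
        self.denoiser = nn.Sequential(
            nn.Linear(action_dim + cond_dim + 1, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def predict_noise(self, noisy_action, timestep, image, state):
        cond = self.condition_encoder(
            torch.cat([self.vision_encoder(image), self.state_encoder(state)], dim=-1)
        )
        t = timestep.float().unsqueeze(-1)               # noise-level embedding (simplified)
        return self.denoiser(torch.cat([noisy_action, cond, t], dim=-1))

# Example: one denoising-prediction call with dummy tensors
model = DiffusionPolicySketch()
eps = model.predict_noise(
    noisy_action=torch.randn(1, 7),
    timestep=torch.tensor([10]),
    image=torch.randn(1, 3, 224, 224),
    state=torch.randn(1, 7),
)
print(eps.shape)  # torch.Size([1, 7])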
Diffusion Process
- Forward Process: Gradually adds noise to the action sequence.
- Reverse Process: Gradually recovers the action sequence from noise.
- Conditional Generation: Generates actions based on observation conditions.
- Sampling Strategy: Uses DDPM or DDIM sampling.
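To make the forward/reverse split concrete, here is a small self-contained sketch built on the diffusers schedulers used later in this guide. The shapes, step counts, and the zero "network prediction" are placeholders for illustration, not values used inside LeRobot.
# diffusion_sketch.py -- toy forward (noising) and reverse (denoising) passes
import torch
from diffusers import DDPMScheduler

scheduler = DDPMScheduler(num_train_timesteps=100, beta_schedule="squaredcos_cap_v2")

clean_actions = torch.zeros(1, 16, 7)          # [batch, horizon, action_dim] (placeholder shapes)
noise = torch.randn_like(clean_actions)

# Forward process: corrupt the clean action sequence at a random timestep
t = torch.randint(0, scheduler.config.num_train_timesteps, (1,))
noisy_actions = scheduler.add_noise(clean_actions, noise, t)

# Reverse process: start from pure noise and denoise step by step;
# a trained conditional network would supply the noise prediction here.
sample = torch.randn_like(clean_actions)
scheduler.set_timesteps(50)                    # fewer sampling steps trade quality for speed
for timestep in scheduler.timesteps:
    noise_pred = torch.zeros_like(sample)      # placeholder for the network's prediction
    sample = scheduler.step(noise_pred, timestep, sample).prev_sample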
Data Preparation
LeRobot Format Data
Diffusion Policy requires datasets in LeRobot format:
your_dataset/
├── data/
│   ├── chunk-001/
│   │   ├── observation.images.cam_high.png
│   │   ├── observation.images.cam_low.png
│   │   ├── observation.state.npy
│   │   ├── action.npy
│   │   └── ...
│   └── chunk-002/
│       └── ...
├── meta.json
├── stats.safetensors
└── videos/
    ├── episode_000000.mp4
    └── ...
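A quick way to confirm that a dataset in this format loads correctly is to open it with LeRobot's dataset class (used again in the evaluation scripts below). The repo_id is a placeholder, and the attribute names are assumptions about the LeRobot dataset API that may vary between versions.
# inspect_dataset.py -- sanity-check a LeRobot-format dataset before training
from lerobot.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset("your-name/your_dataset")   # placeholder repo_id

print("frames:", len(dataset))
print("episodes:", dataset.num_episodes)
print("features:", list(dataset.features.keys()))

# Inspect one frame to verify observation and action shapes
sample = dataset[0]
for key, value in sample.items():
    if hasattr(value, "shape"):
        print(key, tuple(value.shape))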
Data Quality Requirements
- Minimum 100 episodes for basic training.
- 500+ episodes recommended for optimal results.
- Action sequences should be smooth and continuous.
- Include diverse task scenarios.
- High-quality visual observation data.
Fine-tuning Training
Basic Training Command
# Set environment variables
export HF_USER="your-huggingface-username"
export CUDA_VISIBLE_DEVICES=0
# Start Diffusion Policy training
lerobot-train \
--policy.type diffusion \
--policy.pretrained_path lerobot/diffusion_policy \
--dataset.repo_id ${HF_USER}/your_dataset \
--batch_size 64 \
--steps 100000 \
--output_dir outputs/train/diffusion_policy_finetuned \
--job_name diffusion_policy_finetuning \
--policy.device cuda \
--policy.horizon 16 \
--policy.n_action_steps 8 \
--policy.n_obs_steps 2 \
--policy.optimizer_lr 1e-4 \
--policy.optimizer_weight_decay 1e-6 \
--policy.push_to_hub false \
--save_checkpoint true \
--save_freq 10000 \
--wandb.enable true
Advanced Training Configurations
Long-Horizon Configuration
# Configuration for long-horizon sequence prediction
lerobot-train \
--policy.type diffusion \
--policy.pretrained_path lerobot/diffusion_policy \
--dataset.repo_id ${HF_USER}/your_dataset \
--batch_size 32 \
--steps 150000 \
--output_dir outputs/train/diffusion_policy_long_horizon \
--job_name diffusion_policy_long_horizon \
--policy.device cuda \
--policy.horizon 32 \
--policy.n_action_steps 16 \
--policy.n_obs_steps 4 \
--policy.beta_schedule squaredcos_cap_v2 \
--policy.clip_sample true \
--policy.prediction_type epsilon \
--policy.optimizer_lr 1e-4 \
--policy.scheduler_name cosine \
--policy.scheduler_warmup_steps 5000 \
--policy.push_to_hub false \
--save_checkpoint true \
--wandb.enable true
Memory-Optimized Configuration
# For GPUs with smaller VRAM
lerobot-train \
--policy.type diffusion \
--policy.pretrained_path lerobot/diffusion_policy \
--dataset.repo_id ${HF_USER}/your_dataset \
--batch_size 16 \
--steps 200000 \
--output_dir outputs/train/diffusion_policy_memory_opt \
--job_name diffusion_policy_memory_optimized \
--policy.device cuda \
--policy.horizon 16 \
--policy.n_action_steps 8 \
--policy.num_inference_steps 50 \
--policy.optimizer_lr 5e-5 \
--policy.use_amp true \
--num_workers 2 \
--policy.push_to_hub false \
--save_checkpoint true \
--wandb.enable true
Parameter Details
Core Parameters
| Parameter | Meaning | Recommended | Description |
|---|---|---|---|
--policy.type | Policy type | diffusion | Diffusion Policy model type |
--policy.pretrained_path | Pre-trained model path | lerobot/diffusion_policy | LeRobot official model (optional) |
--dataset.repo_id | Dataset repo ID | ${HF_USER}/your_dataset | Your HuggingFace dataset |
--batch_size | Batch size | 64 | Adjust to available VRAM; 32-64 recommended for an RTX 3080
--steps | Training steps | 100000 | Diffusion models usually require more training steps |
--output_dir | Output directory | outputs/train/diffusion_policy_finetuned | Model save path |
--job_name | Job name | diffusion_policy_finetuning | For logs and experiment tracking (optional) |
Diffusion Policy Specific Parameters
| Parameter | Meaning | Recommended | Description |
|---|---|---|---|
--policy.horizon | Prediction horizon | 16 | Length of the predicted action sequence |
--policy.n_action_steps | Execution steps | 8 | Number of actions executed each time |
--policy.n_obs_steps | Observation steps | 2 | Number of historical observations |
--policy.num_inference_steps | Inference steps | 100 | Number of diffusion sampling steps at inference time (does not affect training)
--policy.beta_schedule | Noise schedule | squaredcos_cap_v2 | Scheduling strategy for adding noise |
--policy.clip_sample | Sample clipping | true | Whether to clip the generated samples |
--policy.clip_sample_range | Clipping range | 1.0 | Range for sample clipping |
--policy.prediction_type | Prediction type | epsilon | Predict noise or sample |
--policy.num_train_timesteps | Training timesteps | 100 | Steps for forward diffusion |
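The interplay of horizon, n_action_steps, and n_obs_steps above is a receding-horizon loop: condition on the last few observations, denoise a full action sequence, execute only its first chunk, then replan. A tiny standalone illustration (numbers only, no model involved):
# horizon_sketch.py -- how the three sequence-length parameters interact
horizon, n_action_steps, n_obs_steps = 16, 8, 2

timestep = 0
for replan in range(3):
    print(f"t={timestep:2d}: condition on last {n_obs_steps} observations, "
          f"denoise a {horizon}-step action sequence, execute the first {n_action_steps}")
    timestep += n_action_steps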
Network Architecture Parameters
| Parameter | Meaning | Recommended | Description |
|---|---|---|---|
--policy.vision_backbone | Vision backbone | resnet18 | Image feature extraction network |
--policy.crop_shape | Image crop size | 84 84 | Crop size for input images |
--policy.crop_is_random | Random cropping | true | Whether to use random cropping during training |
--policy.use_group_norm | Use group norm | true | Replaces batch normalization |
--policy.spatial_softmax_num_keypoints | Spatial softmax keypoints | 32 | Number of keypoints in spatial softmax layer |
--policy.down_dims | Downsampling dims | 512 1024 2048 | Dimensions of U-Net downsampling path |
--policy.kernel_size | Kernel size | 5 | Kernel size for 1D convolution |
--policy.n_groups | Group norm groups | 8 | Number of groups for GroupNorm |
--policy.diffusion_step_embed_dim | Step embedding dim | 128 | Embedding dimension for diffusion steps |
Training Parameters
| Parameter | Meaning | Recommended | Description |
|---|---|---|---|
--policy.optimizer_lr | Learning rate | 1e-4 | Recommended learning rate for diffusion models |
--policy.optimizer_weight_decay | Weight decay | 1e-6 | Regularization parameter |
--policy.optimizer_betas | Adam betas | 0.95 0.999 | Beta parameters for Adam optimizer |
--policy.optimizer_eps | Adam epsilon | 1e-8 | Numerical stability parameter |
--policy.scheduler_name | Scheduler | cosine | Cosine annealing scheduler |
--policy.scheduler_warmup_steps | Warmup steps | 500 | Learning rate warmup steps |
--policy.use_amp | Use AMP | true | Saves VRAM with mixed precision |
--num_workers | Data workers | 4 | Adjust based on CPU cores |
--policy.push_to_hub | Push to Hub | false | Whether to upload model to HuggingFace (requires repo_id) |
--save_checkpoint | Save checkpoint | true | Whether to save training checkpoints |
--save_freq | Save frequency | 10000 | Interval for saving checkpoints |
Monitoring and Debugging
Weights & Biases Integration
# Detailed W&B configuration
lerobot-train \
--policy.type diffusion \
--dataset.repo_id your-name/your-dataset \
--batch_size 64 \
--steps 100000 \
--policy.push_to_hub false \
--wandb.enable true \
--wandb.project diffusion_policy_experiments \
--wandb.notes "Diffusion Policy training with long horizon" \
# ... other parameters
Key Metrics to Monitor
Focus on these metrics during training:
- Diffusion Loss: Overall training loss of the diffusion model.
- MSE Loss: Mean squared error between predicted and target noise.
- Learning Rate: Learning rate schedule over training.
- Gradient Norm: Watch for spikes or vanishing gradients.
- Inference Time: Time required to sample an action sequence.
- Sample Quality: Smoothness and plausibility of the generated actions.
Training Visualization
# visualization.py
import torch
import matplotlib.pyplot as plt
import numpy as np
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy

def visualize_diffusion_process(model_path, observation):
    # Load model
    policy = DiffusionPolicy.from_pretrained(model_path, device="cuda")
    policy.eval()

    # Diffusion process for generating action sequences
    with torch.no_grad():
        # Initial noise
        noise = torch.randn(1, policy.horizon, policy.action_dim, device="cuda")

        # Diffusion sampling process (the reverse process runs from high noise to low noise)
        policy.scheduler.set_timesteps(policy.num_inference_steps)
        actions_sequence = []
        for i, t in enumerate(policy.scheduler.timesteps):
            # Predict noise
            noise_pred = policy.unet(noise, t, observation)
            # Update sample
            noise = policy.scheduler.step(noise_pred, t, noise).prev_sample
            # Save intermediate results
            if i % 10 == 0:
                actions_sequence.append(noise.cpu().numpy())

        final_actions = noise.cpu().numpy()

    # Visualize diffusion process
    fig, axes = plt.subplots(2, 3, figsize=(15, 10))
    for i, actions in enumerate(actions_sequence[:6]):
        ax = axes[i // 3, i % 3]
        ax.plot(actions[0, :, 0], label='Action Dim 0')
        ax.plot(actions[0, :, 1], label='Action Dim 1')
        ax.set_title(f'Diffusion Step {i * 10}')
        ax.legend()

    plt.tight_layout()
    plt.savefig('diffusion_process.png')
    plt.show()

    return final_actions

if __name__ == "__main__":
    model_path = "outputs/train/diffusion_policy_finetuned/checkpoints/last"

    # Mock observation
    observation = {
        "observation.images.cam_high": torch.randn(1, 3, 224, 224, device="cuda"),
        "observation.state": torch.randn(1, 7, device="cuda")
    }

    actions = visualize_diffusion_process(model_path, observation)
    print(f"Generated actions shape: {actions.shape}")
Model Evaluation
Offline Evaluation
# offline_evaluation.py
import torch
import numpy as np
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy
from lerobot.datasets.lerobot_dataset import LeRobotDataset

def evaluate_diffusion_policy(model_path, dataset_path):
    # Load model
    policy = DiffusionPolicy.from_pretrained(model_path, device="cuda")
    policy.eval()

    # Load test dataset
    dataset = LeRobotDataset(dataset_path, split="test")

    total_mse_loss = 0
    total_mae_loss = 0
    num_samples = 0

    with torch.no_grad():
        for batch in dataset:
            # Model prediction
            prediction = policy(batch)

            # Calculate loss
            target_actions = batch['action']
            predicted_actions = prediction['action']

            mse_loss = torch.mean((predicted_actions - target_actions) ** 2)
            mae_loss = torch.mean(torch.abs(predicted_actions - target_actions))

            total_mse_loss += mse_loss.item()
            total_mae_loss += mae_loss.item()
            num_samples += 1

    avg_mse_loss = total_mse_loss / num_samples
    avg_mae_loss = total_mae_loss / num_samples

    print(f"Average MSE Loss: {avg_mse_loss:.4f}")
    print(f"Average MAE Loss: {avg_mae_loss:.4f}")

    return avg_mse_loss, avg_mae_loss

def evaluate_action_diversity(model_path, observation, num_samples=10):
    # Evaluate action diversity
    policy = DiffusionPolicy.from_pretrained(model_path, device="cuda")
    policy.eval()

    actions_list = []
    with torch.no_grad():
        for _ in range(num_samples):
            prediction = policy(observation)
            actions_list.append(prediction['action'].cpu().numpy())

    actions_array = np.array(actions_list)  # [num_samples, horizon, action_dim]

    # Calculate action diversity metric
    action_std = np.std(actions_array, axis=0)  # [horizon, action_dim]
    avg_std = np.mean(action_std)

    print(f"Average action standard deviation: {avg_std:.4f}")
    return avg_std, actions_array

if __name__ == "__main__":
    model_path = "outputs/train/diffusion_policy_finetuned/checkpoints/last"
    dataset_path = "path/to/your/test/dataset"

    # Offline evaluation
    evaluate_diffusion_policy(model_path, dataset_path)

    # Diversity evaluation
    observation = {
        "observation.images.cam_high": torch.randn(1, 3, 224, 224, device="cuda"),
        "observation.state": torch.randn(1, 7, device="cuda")
    }
    evaluate_action_diversity(model_path, observation)
Online Evaluation (Robot Environment)
# robot_evaluation.py
import torch
import numpy as np
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy

class DiffusionPolicyController:
    def __init__(self, model_path, num_inference_steps=50):
        self.policy = DiffusionPolicy.from_pretrained(model_path, device="cuda")
        self.policy.eval()
        self.num_inference_steps = num_inference_steps
        self.action_queue = []
        self.current_obs_history = []

    def get_action(self, observations):
        # Update observation history
        self.current_obs_history.append(observations)
        if len(self.current_obs_history) > self.policy.n_obs_steps:
            self.current_obs_history.pop(0)

        # If the action queue is empty or replanning is needed, generate a new action sequence
        if len(self.action_queue) == 0 or self.should_replan():
            with torch.no_grad():
                # Build input
                batch = self.prepare_observation_batch()

                # Set inference steps
                self.policy.scheduler.set_timesteps(self.num_inference_steps)

                # Generate action sequence
                prediction = self.policy(batch)
                actions = prediction['action'].cpu().numpy()[0]  # [horizon, action_dim]

                # Update action queue
                self.action_queue = list(actions[:self.policy.n_action_steps])

        # Return next action
        return self.action_queue.pop(0)

    def should_replan(self):
        # Simple replanning strategy: replan when less than half of the actions remain in the queue
        return len(self.action_queue) < self.policy.n_action_steps // 2

    def prepare_observation_batch(self):
        batch = {}

        # Handle image observations
        if "observation.images.cam_high" in self.current_obs_history[-1]:
            images = []
            for obs in self.current_obs_history:
                image = obs["observation.images.cam_high"]
                image_tensor = self.preprocess_image(image)
                images.append(image_tensor)

            # Pad history if insufficient
            while len(images) < self.policy.n_obs_steps:
                images.insert(0, images[0])

            batch["observation.images.cam_high"] = torch.stack(images).unsqueeze(0)

        # Handle state observations
        if "observation.state" in self.current_obs_history[-1]:
            states = []
            for obs in self.current_obs_history:
                state = torch.tensor(obs["observation.state"], dtype=torch.float32)
                states.append(state)

            # Pad history if insufficient
            while len(states) < self.policy.n_obs_steps:
                states.insert(0, states[0])

            batch["observation.state"] = torch.stack(states).unsqueeze(0)

        return batch

    def preprocess_image(self, image):
        # Image preprocessing logic
        image_tensor = torch.tensor(image).permute(2, 0, 1).float() / 255.0
        return image_tensor

# Example usage
if __name__ == "__main__":
    controller = DiffusionPolicyController(
        model_path="outputs/train/diffusion_policy_finetuned/checkpoints/last",
        num_inference_steps=50
    )

    # Mock robot control loop
    for step in range(100):
        # Get current observation
        observations = {
            "observation.images.cam_high": np.random.randint(0, 255, (224, 224, 3)),
            "observation.state": np.random.randn(7)
        }

        # Get action
        action = controller.get_action(observations)

        # Execute action
        print(f"Step {step}: Action = {action}")

        # Send action to actual robot
        # robot.execute_action(action)
Deployment and Optimization
Inference Acceleration
# fast_inference.py
import torch
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy
from diffusers import DDIMScheduler

class FastDiffusionInference:
    def __init__(self, model_path, num_inference_steps=10):
        self.policy = DiffusionPolicy.from_pretrained(model_path, device="cuda")
        self.policy.eval()

        # Use DDIM scheduler for fast sampling
        self.policy.scheduler = DDIMScheduler.from_config(self.policy.scheduler.config)
        self.num_inference_steps = num_inference_steps

        # Warmup model
        self.warmup()

    def warmup(self):
        # Warmup with dummy data
        dummy_batch = {
            "observation.images.cam_high": torch.randn(1, 2, 3, 224, 224, device="cuda"),
            "observation.state": torch.randn(1, 2, 7, device="cuda")
        }
        with torch.no_grad():
            for _ in range(5):
                _ = self.predict(dummy_batch)

    @torch.no_grad()
    def predict(self, observations):
        # Set inference steps
        self.policy.scheduler.set_timesteps(self.num_inference_steps)

        # Fast inference
        prediction = self.policy(observations)
        return prediction['action'].cpu().numpy()

if __name__ == "__main__":
    fast_inference = FastDiffusionInference(
        "outputs/train/diffusion_policy_finetuned/checkpoints/last",
        num_inference_steps=10
    )

    # Test inference speed
    import time

    observations = {
        "observation.images.cam_high": torch.randn(1, 2, 3, 224, 224, device="cuda"),
        "observation.state": torch.randn(1, 2, 7, device="cuda")
    }

    start_time = time.time()
    for _ in range(100):
        action = fast_inference.predict(observations)
    end_time = time.time()

    avg_inference_time = (end_time - start_time) / 100
    print(f"Average inference time: {avg_inference_time:.4f} seconds")
    print(f"Inference frequency: {1 / avg_inference_time:.2f} Hz")
Best Practices
Data Collection Recommendations
- Smooth Actions: Ensure action sequences in demonstration data are smooth and continuous.
- Diverse Scenarios: Collect data with different initial states and goals.
- High-Quality Annotation: Ensure accuracy of action annotations.
- Sufficient Data Volume: Diffusion models usually require more data.
Training Optimization Recommendations
- Noise Schedule: Choose an appropriate noise scheduling strategy.
- Inference Steps: Balance quality and speed by choosing appropriate inference steps.
- Learning Rate Schedule: Use cosine annealing or step decay.
- Regularization: Use weight decay appropriately.
Deployment Optimization Recommendations
- Fast Sampling: Use DDIM or other fast sampling methods.
- Model Compression: Use knowledge distillation or quantization.
- Parallel Inference: Leverage GPU parallel capabilities.
- Cache Optimization: Cache intermediate calculation results.
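Half-precision inference is the lightest-weight version of the compression point above, and graph compilation can shave per-call overhead. The sketch below is generic PyTorch rather than a LeRobot-specific API; the checkpoint path, dummy batch shapes, and the policy(batch) call pattern are reused from the scripts earlier in this guide and should be treated as assumptions.
# deploy_optimized.py -- hedged sketch of reduced-precision, compiled inference
import torch
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy

policy = DiffusionPolicy.from_pretrained(
    "outputs/train/diffusion_policy_finetuned/checkpoints/last", device="cuda"
)
policy.eval()
policy = torch.compile(policy)  # optional graph compilation (PyTorch 2.x)

batch = {
    "observation.images.cam_high": torch.randn(1, 2, 3, 224, 224, device="cuda"),
    "observation.state": torch.randn(1, 2, 7, device="cuda"),
}

# Mixed precision plus inference mode reduces latency and VRAM at a small accuracy cost
with torch.inference_mode(), torch.autocast(device_type="cuda", dtype=torch.float16):
    prediction = policy(batch)  # same call pattern as the evaluation scripts above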
FAQ
Q: What are the advantages of Diffusion Policy compared to other policy learning methods?
A: Key advantages of Diffusion Policy include:
- Multimodal Generation: Capable of handling tasks with multiple solutions.
- High-Quality Output: Generates smooth and natural action sequences.
- Strong Robustness: Tolerates noise and perturbations in the observations.
- Expressive Power: Capable of learning complex action distributions.
Q: How to choose the right number of inference steps?
A: The choice of inference steps balances quality and speed:
- High Quality: 100-1000 steps, suitable for offline evaluation.
- Real-time Applications: 10-50 steps, suitable for online control.
- Rapid Prototyping: 5-10 steps, suitable for quick testing.
Q: How long does training take?
A: Training time depends on several factors:
- Dataset Size: 500 episodes take about 12-24 hours (RTX 3080).
- Model Complexity: Larger models require more time.
- Inference Steps: More steps increase training time.
- Convergence Requirements: Usually requires 100,000-200,000 steps.
Q: How to improve the quality of generated actions?
A: Methods to improve action quality:
- Increase Inference Steps: More steps usually yield better results.
- Optimize Noise Schedule: Choose an appropriate noise addition strategy.
- Data Quality: Ensure high quality of training data.
- Model Architecture: Use larger or deeper networks.
- Regularization Techniques: Appropriate regularization to prevent overfitting.
Q: How to handle real-time requirements?
A: Methods to meet real-time requirements:
- Fast Sampling: Use DDIM or DPM-Solver.
- Reduce Inference Steps: Find a balance between quality and speed.
- Model Distillation: Train smaller student models.
- Parallel Inference: Leverage multiple GPUs or batch processing.
- Pre-calculation: Calculate some results in advance.
Related Resources
- Diffusion Policy Original Paper
- LeRobot Diffusion Policy Implementation
- Diffusers Library Documentation
- Diffusion Models Tutorial
- Robotics Learning Course
Update Log
- 2024-01: Initial version released.
- 2024-02: Added fast sampling support.
- 2024-03: Optimized memory usage and training efficiency.
- 2024-04: Added diversity evaluation and deployment optimization.