Research Article 2026-04-21 under-review v1

A Review of Online Diffusion Policy RL Algorithms for Scalable Robotic Control

Wonhyeok Choi Daegu Gyeongbuk Institute of Science and Technology
Shutong Ding ShanghaiTech University
Minwoo Choi Daegu Gyeongbuk Institute of Science and Technology
Jungwan Woo Daegu Gyeongbuk Institute of Science and Technology
Kyumin Hwang Daegu Gyeongbuk Institute of Science and Technology
Jaeyeul Kim Daegu Gyeongbuk Institute of Science and Technology
Ye Shi ShanghaiTech University
Sunghoon Im Daegu Gyeongbuk Institute of Science and Technology

Abstract

Diffusion policies have emerged as a powerful approach for robotic control, modeling multimodal action distributions more expressively than conventional policy networks. However, integrating them with online reinforcement learning remains challenging because diffusion model training objectives are fundamentally incompatible with standard RL policy improvement mechanisms. This paper presents the first comprehensive review and empirical analysis of current Online Diffusion Policy Reinforcement Learning (Online DPRL) algorithms for scalable robotic control systems. We propose a novel taxonomy that categorizes existing approaches into four families according to their policy improvement mechanisms: Action-Gradient, Q-Weighting, Proximity-Based, and Backpropagation Through Time (BPTT) methods. Through extensive experiments on a unified NVIDIA Isaac Lab benchmark encompassing 12 diverse robotic tasks, we systematically evaluate representative algorithms across five dimensions: task diversity, parallelization capability, diffusion step scalability, cross-embodiment generalization, and environmental robustness. Our analysis identifies the fundamental trade-offs inherent in each algorithmic family, particularly in sample efficiency and scalability. We also reveal computational and algorithmic bottlenecks that currently limit the practical deployment of Online DPRL. Based on these findings, we provide concrete guidelines for selecting an algorithm under specific operational constraints and outline promising research directions toward more general and scalable robotic learning systems.

Citation Information

@article{wonhyeokchoi2026,
  title={A Review of Online Diffusion Policy RL Algorithms for Scalable Robotic Control},
  author={Wonhyeok Choi and Shutong Ding and Minwoo Choi and Jungwan Woo and Kyumin Hwang and Jaeyeul Kim and Ye Shi and Sunghoon Im},
  journal={Artificial Intelligence Review},
  year={2026},
  doi={10.21203/rs.3.rs-9346251/v1}
}