NCRL

Efficient Reinforcement Learning by
Guiding World Models with Non-Curated Data

ICLR 2026

Aalto University    University of Edinburgh    ELLIS Institute Finland    Deep Render    Imperial College London    Max Planck Institute for Intelligent Systems    CIFAR AI Chair    University of Alberta    Alberta Machine Intelligence Institute (Amii)    University of Oulu

Leveraging offline data is a promising way to improve the sample efficiency of online reinforcement learning (RL). This paper expands the pool of usable data for offline-to-online RL by exploiting abundant non-curated data that is reward-free, of mixed quality, and collected across multiple embodiments. Although learning a world model appears promising for utilizing such data, we find that naive fine-tuning fails to accelerate RL training on many tasks. Through careful investigation, we attribute this failure to the distributional shift between offline and online data during fine-tuning. To address this issue and use the offline data effectively, we propose two techniques: i) experience rehearsal and ii) execution guidance. With these modifications, the non-curated offline data substantially improves the sample efficiency of RL. Under limited sample budgets, our method achieves nearly twice the aggregate score of learning-from-scratch baselines across 72 visuomotor tasks spanning 6 embodiments. On challenging tasks such as locomotion and robotic manipulation, it outperforms prior methods that utilize offline data by a clear margin.
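To make the two techniques concrete, the Python sketch below shows one plausible reading of them: experience rehearsal as mixed-batch sampling from offline and online replay buffers, and execution guidance as occasionally executing an action from a prior distilled from the offline data. All names here (MixedReplay, offline_ratio, guided_action, guidance_prob) are hypothetical illustrations, not the paper's actual implementation.

```python
import random
from collections import deque

import numpy as np


class MixedReplay:
    """Experience rehearsal (illustrative): draw each training batch as a
    mixture of non-curated offline transitions and freshly collected online
    ones, so world-model fine-tuning never drifts fully away from the
    offline data distribution."""

    def __init__(self, offline_transitions, capacity=100_000, offline_ratio=0.5):
        self.offline = list(offline_transitions)  # fixed, reward-free offline data
        self.online = deque(maxlen=capacity)      # grows during online RL
        self.offline_ratio = offline_ratio        # fraction of batch rehearsed

    def add(self, transition):
        """Store a transition collected by the online policy."""
        self.online.append(transition)

    def sample(self, batch_size):
        """Sample a batch mixing offline and online transitions."""
        n_off = int(batch_size * self.offline_ratio)
        batch = random.choices(self.offline, k=n_off)
        if self.online:  # online buffer may be empty early in training
            batch += random.choices(self.online, k=batch_size - n_off)
        return batch


def guided_action(policy_action, prior_action, guidance_prob):
    """Execution guidance (illustrative): with probability guidance_prob,
    execute an action proposed by a behavior prior derived from the offline
    data instead of the current online policy, steering data collection
    toward states the fine-tuned world model has seen."""
    return prior_action if np.random.rand() < guidance_prob else policy_action
```

In this reading, offline_ratio controls how strongly training stays anchored to the offline distribution, and guidance_prob would typically be annealed toward zero as the online policy improves; the schedules and exact mechanisms used by NCRL may differ.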