NCRL

Efficient Reinforcement Learning by
Guiding World Models with Non-Curated Data

ICLR 2026

Aalto University    University of Edinburgh    ELLIS Institute Finland    Deep Render    Imperial College London    Max Planck Institute for Intelligent Systems    CIFAR AI Chair    University of Alberta    Alberta Machine Intelligence Institute (Amii)    University of Oulu

Leveraging offline data is a promising way to improve the sample efficiency of online reinforcement learning (RL). This paper expands the pool of usable data for offline-to-online RL by exploiting abundant non-curated data that is reward-free, of mixed quality, and collected across multiple embodiments. Although learning a world model appears promising for utilizing such data, we find that naive fine-tuning fails to accelerate RL training on many tasks. Through careful investigation, we attribute this failure to the distributional shift between offline and online data during fine-tuning. To address this issue and use the offline data effectively, we propose two techniques: i) experience rehearsal and ii) execution guidance. With these modifications, the non-curated offline data substantially improves the sample efficiency of RL. Under limited sample budgets, our method achieves nearly twice the aggregate score of learning-from-scratch baselines across 72 visuomotor tasks spanning 6 embodiments. On challenging tasks such as locomotion and robotic manipulation, it outperforms prior methods that utilize offline data by a clear margin.
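To make the two techniques concrete, the Python sketch below shows one plausible reading of them: experience rehearsal as mixed-batch sampling from offline and online replay buffers, and execution guidance as occasionally executing an action from a prior distilled from the offline data. All names here (MixedReplay, offline_ratio, guided_action, guidance_prob) are hypothetical illustrations, not the paper's actual implementation.

```python
import random
from collections import deque

import numpy as np


class MixedReplay:
    """Experience rehearsal (illustrative): draw each training batch as a
    mixture of non-curated offline transitions and freshly collected online
    ones, so world-model fine-tuning never drifts fully away from the
    offline data distribution."""

    def __init__(self, offline_transitions, capacity=100_000, offline_ratio=0.5):
        self.offline = list(offline_transitions)  # fixed, reward-free offline data
        self.online = deque(maxlen=capacity)      # grows during online RL
        self.offline_ratio = offline_ratio        # fraction of batch rehearsed

    def add(self, transition):
        """Store a transition collected by the online policy."""
        self.online.append(transition)

    def sample(self, batch_size):
        """Sample a batch mixing offline and online transitions."""
        n_off = int(batch_size * self.offline_ratio)
        batch = random.choices(self.offline, k=n_off)
        if self.online:  # online buffer may be empty early in training
            batch += random.choices(self.online, k=batch_size - n_off)
        return batch


def guided_action(policy_action, prior_action, guidance_prob):
    """Execution guidance (illustrative): with probability guidance_prob,
    execute an action proposed by a behavior prior derived from the offline
    data instead of the current online policy, steering data collection
    toward states the fine-tuned world model has seen."""
    return prior_action if np.random.rand() < guidance_prob else policy_action
```

In this reading, offline_ratio controls how strongly training stays anchored to the offline distribution, and guidance_prob would typically be annealed toward zero as the online policy improves; the schedules and exact mechanisms used by NCRL may differ.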