Document detail
ID

oai:arXiv.org:2406.10667

Topic
Computer Science - Machine Learnin...
Author
Pu, Yuan; Niu, Yazhe; Ren, Jiyuan; Yang, Zhenjie; Li, Hongsheng; Liu, Yu
Category

Computer Science

Year

2024

Listing date

6/19/2024

Keywords
unizero latent learning
Abstract

Learning predictive world models is essential for enhancing the planning capabilities of reinforcement learning agents.

Notably, the MuZero-style algorithms, based on the value equivalence principle and Monte Carlo Tree Search (MCTS), have achieved superhuman performance in various domains.

However, in environments that require capturing long-term dependencies, MuZero's performance deteriorates rapidly.

We identify that this is partially due to the entanglement of latent representations with historical information, which results in incompatibility with the auxiliary self-supervised state regularization.

To overcome this limitation, we present UniZero, a novel approach that disentangles latent states from implicit latent history using a transformer-based latent world model.

By concurrently predicting latent dynamics and decision-oriented quantities conditioned on the learned latent history, UniZero enables joint optimization of the long-horizon world model and policy, facilitating broader and more efficient planning in latent space.

We demonstrate that UniZero, even with single-frame inputs, matches or surpasses the performance of MuZero-style algorithms on the Atari 100k benchmark.

Furthermore, it significantly outperforms prior baselines in benchmarks that require long-term memory.

Lastly, we validate the effectiveness and scalability of our design choices through extensive ablation studies, visual analyses, and multi-task learning results.

The code is available at https://github.com/opendilab/LightZero.

Comment: 32 pages, 16 figures

Pu, Yuan; Niu, Yazhe; Ren, Jiyuan; Yang, Zhenjie; Li, Hongsheng; Liu, Yu (2024). UniZero: Generalized and Efficient Planning with Scalable Latent World Models.
