UniZero: Generalized and Efficient Planning with Scalable Latent World Models

Document details

Identifier

oai:arXiv.org:2406.10667

Subject

Computer Science - Machine Learnin...

Authors

Pu, Yuan; Niu, Yazhe; Ren, Jiyuan; Yang, Zhenjie; Li, Hongsheng; Liu, Yu

Category

Computer Science

Year

2024

Date listed

19-06-2024

Keywords

unizero latent learning

Description

Learning predictive world models is essential for enhancing the planning capabilities of reinforcement learning agents.

Notably, MuZero-style algorithms, which build on the value equivalence principle and Monte Carlo Tree Search (MCTS), have achieved superhuman performance in various domains.

However, in environments that require capturing long-term dependencies, MuZero's performance deteriorates rapidly.

We identify that this is partially due to the entanglement of latent representations with historical information, which results in incompatibility with the auxiliary self-supervised state regularization.

To overcome this limitation, we present UniZero, a novel approach that disentangles latent states from implicit latent history using a transformer-based latent world model.

By concurrently predicting latent dynamics and decision-oriented quantities conditioned on the learned latent history, UniZero enables joint optimization of the long-horizon world model and policy, facilitating broader and more efficient planning in latent space.
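The idea of conditioning every prediction on the latent history can be illustrated with a small, dependency-free sketch. It is not the LightZero implementation: the one-token-per-step layout, the identity attention projections, and the names `causal_attention`, `predict`, and `TOY_HEADS` are all assumptions made for illustration. Each latent state is taken as given (encoded from a single observation), and a causal attention pass supplies the history, from which the next latent state, reward, value, and policy logits are read out together.

```python
import math
from typing import Dict, List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def causal_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head self-attention where step t only sees steps 0..t.

    Assumption: identity query/key/value projections, to keep the sketch short.
    """
    scale = 1.0 / math.sqrt(len(tokens[0]))
    contexts = []
    for t, query in enumerate(tokens):
        visible = tokens[: t + 1]  # the causal mask: no access to the future
        weights = softmax([dot(query, key) * scale for key in visible])
        contexts.append(
            [sum(w * v[i] for w, v in zip(weights, visible)) for i in range(len(query))]
        )
    return contexts


def linear(weights: List[Vector], x: Vector) -> Vector:
    return [dot(row, x) for row in weights]


# Hypothetical fixed head weights (latent dim 2, action dim 2, token dim 4).
TOY_HEADS: Dict[str, List[Vector]] = {
    "dynamics": [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.5]],
    "reward": [[0.2, -0.1, 0.3, 0.0]],
    "value": [[0.5, 0.5, 0.0, 0.1]],
    "policy": [[1.0, -1.0, 0.0, 0.0], [-1.0, 1.0, 0.0, 0.0]],
}


def predict(latents: List[Vector], actions: List[Vector], heads=TOY_HEADS):
    """Jointly predict dynamics and decision quantities at every step."""
    # Assumption: one token per step, formed by concatenating latent and action.
    tokens = [z + a for z, a in zip(latents, actions)]
    return [
        {
            "next_latent": linear(heads["dynamics"], c),
            "reward": linear(heads["reward"], c)[0],
            "value": linear(heads["value"], c)[0],
            "policy_logits": linear(heads["policy"], c),
        }
        for c in causal_attention(tokens)
    ]


latents = [[0.1, 0.2], [0.3, -0.1], [0.5, 0.4]]
actions = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
predictions = predict(latents, actions)
```

In this sketch the history lives only in the attention context, so the per-step latent vectors stay free of it; editing a later step leaves earlier predictions unchanged, while editing an earlier step changes later ones.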

We demonstrate that UniZero, even with single-frame inputs, matches or surpasses the performance of MuZero-style algorithms on the Atari 100k benchmark.

Furthermore, it significantly outperforms prior baselines in benchmarks that require long-term memory.

Lastly, we validate the effectiveness and scalability of our design choices through extensive ablation studies, visual analyses, and multi-task learning results.

The code is available at https://github.com/opendilab/LightZero.

Comment: 32 pages, 16 figures

Pu, Yuan; Niu, Yazhe; Ren, Jiyuan; Yang, Zhenjie; Li, Hongsheng; Liu, Yu (2024). UniZero: Generalized and Efficient Planning with Scalable Latent World Models.