oai:arXiv.org:2406.10667
Computer Science
2024
19.06.2024
Learning predictive world models is essential for enhancing the planning capabilities of reinforcement learning agents.
Notably, MuZero-style algorithms, which build on the value equivalence principle and Monte Carlo Tree Search (MCTS), have achieved superhuman performance in various domains.
However, in environments that require capturing long-term dependencies, MuZero's performance deteriorates rapidly.
We identify that this is partially due to the \textit{entanglement} of latent representations with historical information, which results in incompatibility with the auxiliary self-supervised state regularization.
To overcome this limitation, we present \textit{UniZero}, a novel approach that \textit{disentangles} latent states from implicit latent history using a transformer-based latent world model.
By concurrently predicting latent dynamics and decision-oriented quantities conditioned on the learned latent history, UniZero enables joint optimization of the long-horizon world model and policy, facilitating broader and more efficient planning in latent space.
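The following is a minimal, hypothetical sketch of the kind of transformer-based latent world model described above, not the actual UniZero/LightZero implementation: latent states and actions are embedded as an interleaved token sequence, a causal transformer encodes the implicit latent history, and separate heads predict the next latent state, reward, value, and policy conditioned on that history. All module names, dimensions, and the token layout are illustrative assumptions.

```python
# Hypothetical sketch (not the official UniZero code): a causal transformer over
# interleaved (latent, action) tokens with heads for latent dynamics and
# decision-oriented quantities, as described in the abstract.
import torch
import torch.nn as nn


class LatentWorldModelSketch(nn.Module):
    def __init__(self, latent_dim=256, num_actions=18, n_layer=4, n_head=8, max_len=64):
        super().__init__()
        self.latent_embed = nn.Linear(latent_dim, latent_dim)
        self.action_embed = nn.Embedding(num_actions, latent_dim)
        self.pos_embed = nn.Embedding(2 * max_len, latent_dim)
        layer = nn.TransformerEncoderLayer(latent_dim, n_head, 4 * latent_dim, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layer)
        # Heads for latent dynamics and decision-oriented quantities.
        self.next_latent_head = nn.Linear(latent_dim, latent_dim)
        self.reward_head = nn.Linear(latent_dim, 1)
        self.value_head = nn.Linear(latent_dim, 1)
        self.policy_head = nn.Linear(latent_dim, num_actions)

    def forward(self, latents, actions):
        # latents: (B, T, latent_dim) latent states; actions: (B, T) discrete actions.
        B, T, D = latents.shape
        # Interleave tokens as (z_1, a_1, z_2, a_2, ...).
        tokens = torch.stack(
            [self.latent_embed(latents), self.action_embed(actions)], dim=2
        ).reshape(B, 2 * T, D)
        tokens = tokens + self.pos_embed(torch.arange(2 * T, device=tokens.device))
        # Causal mask so each token only attends to its latent history.
        causal_mask = torch.triu(
            torch.full((2 * T, 2 * T), float("-inf"), device=tokens.device), diagonal=1
        )
        h = self.transformer(tokens, mask=causal_mask)
        h_action = h[:, 1::2]   # features after each (z_t, a_t) pair
        h_latent = h[:, 0::2]   # features after each z_t
        return (
            self.next_latent_head(h_action),  # predicted next latent state
            self.reward_head(h_action),       # predicted reward
            self.value_head(h_latent),        # value conditioned on latent history
            self.policy_head(h_latent),       # policy logits conditioned on latent history
        )
```

In a setup like this, the dynamics, reward, value, and policy losses would be optimized jointly over the same latent history, which is the joint long-horizon optimization the abstract refers to; the exact heads, losses, and tokenization in UniZero may differ.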
We demonstrate that UniZero, even with single-frame inputs, matches or surpasses the performance of MuZero-style algorithms on the Atari 100k benchmark.
Furthermore, it significantly outperforms prior baselines in benchmarks that require long-term memory.
Lastly, we validate the effectiveness and scalability of our design choices through extensive ablation studies, visual analyses, and multi-task learning results.
The code is available at https://github.com/opendilab/LightZero.
Comment: 32 pages, 16 figures
Pu, Yuan; Niu, Yazhe; Ren, Jiyuan; Yang, Zhenjie; Li, Hongsheng; Liu, Yu, 2024, UniZero: Generalized and Efficient Planning with Scalable Latent World Models