
BYOL-Explore: Exploration with Bootstrapped Prediction



Research

Authors

Zhaohan Daniel Guo, Shantanu Thakoor, Miruna Pîslar, Bernardo Avila Pires, Florent Altché, Corentin Tallec, Alaa Saade, Daniele Calandriello, Jean-Bastien Grill, Yunhao Tang, Michal Valko, Rémi Munos, Mohammad Gheshlaghi Azar, Bilal Piot

First-person and top-down views of a BYOL-Explore agent solving the Throw-Across level of DM-HARD-8. Pure RL and the other baseline exploration methods fail to make any progress on Throw-Across.

Curiosity-driven exploration is the active process of seeking new information to enhance the agent’s understanding of its environment. Suppose that the agent has learned a model of the world that can predict future events given the history of past events. The curiosity-driven agent can then use the prediction mismatch of the world model as an intrinsic reward that directs its exploration policy towards new information. In turn, the agent can use this new information to improve the world model itself so that it makes better predictions. This iterative process can allow the agent to eventually explore every novelty in the world and use this information to build an accurate world model.
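The loop described above can be illustrated with a toy example. This is a minimal sketch, not the agent from the post: the ring-shaped world, the lookup-table world model, the 0/1 mismatch reward, and the names `step` and `run_curiosity_loop` are all assumptions made for illustration.

```python
def step(state, action, n_states=6):
    """Hypothetical deterministic ring world; action is +1 or -1."""
    return (state + action) % n_states


def run_curiosity_loop(n_steps=20, n_states=6):
    model = {}     # learned world model: (state, action) -> predicted next state
    rewards = []   # intrinsic reward received at each step
    state = 0
    for _ in range(n_steps):
        # Exploration policy: prefer an action whose outcome the model
        # cannot predict yet; otherwise fall back to moving forward.
        unknown = [a for a in (1, -1) if (state, a) not in model]
        action = unknown[0] if unknown else 1
        next_state = step(state, action, n_states)
        # Intrinsic reward: 1 when the model's prediction is wrong or missing.
        mismatch = model.get((state, action)) != next_state
        rewards.append(1.0 if mismatch else 0.0)
        # Use the new information to improve the world model.
        model[(state, action)] = next_state
        state = next_state
    return model, rewards


model, rewards = run_curiosity_loop()
print(len(model), sum(rewards))
```

In this toy world the intrinsic reward drops to zero once every state-action pair has been visited, which mirrors the idea that curiosity fades as the world model becomes accurate.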

Inspired by the successes of Bootstrap Your Own Latent (BYOL), which has been applied in computer vision, graph representation learning, and representation learning in RL, we propose BYOL-Explore: a conceptually simple yet general curiosity-driven AI agent for solving hard-exploration tasks. BYOL-Explore learns a representation of the world by predicting its own future representation. It then uses the prediction error at the representation level as an intrinsic reward to train a curiosity-driven policy. BYOL-Explore therefore learns a world representation, the world dynamics, and a curiosity-driven exploration policy all together, simply by optimising the prediction error at the representation level.
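A rough sketch of a representation-level prediction error follows. The post gives no architectural details, so everything here is an assumption: small linear maps stand in for the neural networks, the target encoder is a slowly moving average of the online encoder as in BYOL, and the helper names (`intrinsic_reward`, `ema_update`) are hypothetical. Gradient-based training of the online encoder and predictor is omitted.

```python
import math


def l2_normalise(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def matvec(weights, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def intrinsic_reward(online_w, predictor_w, target_w, obs, action, next_obs):
    """Squared error between predicted and target next-step representations."""
    latent = matvec(online_w, obs)
    # Predict the next representation from the current latent and the action.
    predicted = l2_normalise(matvec(predictor_w, latent + action))
    # The target is treated as a constant: no gradient would flow through it.
    target = l2_normalise(matvec(target_w, next_obs))
    return sum((p - t) ** 2 for p, t in zip(predicted, target))


def ema_update(target_w, online_w, alpha=0.99):
    """Move the target encoder slowly towards the online encoder."""
    return [[alpha * t + (1.0 - alpha) * o for t, o in zip(t_row, o_row)]
            for t_row, o_row in zip(target_w, online_w)]


identity = [[1.0, 0.0], [0.0, 1.0]]
copy_latent = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # predictor that ignores the action
familiar = intrinsic_reward(identity, copy_latent, identity, [1.0, 0.0], [1.0], [1.0, 0.0])
surprising = intrinsic_reward(identity, copy_latent, identity, [1.0, 0.0], [1.0], [0.0, 1.0])
print(familiar, surprising)
```

Because both vectors are normalised, the error is bounded between 0 and 4, so a transition the predictor already gets right earns no reward while an unexpected one earns a positive reward.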

Comparison between BYOL-Explore, Random Network Distillation (RND), Intrinsic Curiosity Module (ICM) and pure RL (no intrinsic reward), in terms of mean capped human-normalised score (CHNS).

Despite the simplicity of its design, when applied to the DM-HARD-8 suite of challenging 3-D, visually complex, and hard exploration tasks, BYOL-Explore outperforms standard curiosity-driven exploration methods such as Random Network Distillation (RND) and Intrinsic Curiosity Module (ICM), in terms of mean capped human-normalised score (CHNS), measured across all tasks. Remarkably, BYOL-Explore achieved this performance using only a single network concurrently trained across all tasks, whereas prior work was restricted to the single-task setting and could only make meaningful progress on these tasks when provided with human expert demonstrations.
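The post does not define CHNS, so the following is a sketch under an assumed, commonly used definition: the agent's score is rescaled so that a random policy maps to 0 and the human reference maps to 1, the result is capped to the range [0, 1], and the capped values are averaged over tasks. The function names and the example scores are hypothetical.

```python
def capped_hns(agent, random_score, human):
    """Human-normalised score, capped to the range [0, 1]."""
    hns = (agent - random_score) / (human - random_score)
    return min(max(hns, 0.0), 1.0)


def mean_chns(results):
    """Average the capped score over (agent, random, human) triples, one per task."""
    return sum(capped_hns(*triple) for triple in results) / len(results)


# Made-up scores for three tasks, for illustration only.
example = [(5.0, 0.0, 10.0), (20.0, 0.0, 10.0), (-1.0, 0.0, 10.0)]
print(mean_chns(example))
```

Capping prevents a single task with a very high score from dominating the mean, so the metric rewards making progress on every task in the suite.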

As further evidence of its generality, BYOL-Explore achieves super-human performance in the ten hardest exploration Atari games, while having a simpler design than other competitive agents, such as Agent57 and Go-Explore.


Moving forward, we can generalise BYOL-Explore to highly stochastic environments by learning a probabilistic world model that could be used to generate trajectories of future events. This could allow the agent to model the possible stochasticity of the environment, avoid stochastic traps, and plan for exploration.


