NON-MARKOVIAN PREDICTIVE CODING FOR PLANNING IN LATENT SPACE

Anonymous

Abstract

High-dimensional observations are a major challenge in the application of model-based reinforcement learning (MBRL) to real-world environments. To handle high-dimensional sensory inputs, existing MBRL approaches use representation learning to map the observations into a lower-dimensional latent space that is more amenable to dynamics estimation and planning. The task-relevance and predictability of the learned representations are critical to the success of planning in latent space. In this work, we present Non-Markovian Predictive Coding (NMPC), an information-theoretic approach for planning from high-dimensional observations with two key properties: 1) it formulates a mutual information objective that prioritizes the encoding of task-relevant components of the environment; and 2) it employs a recurrent neural network capable of modeling non-Markovian latent dynamics. To demonstrate NMPC's ability to prioritize task-relevant information, we evaluate our model on a challenging modification of standard DMControl tasks in which the DMControl background is replaced with natural videos that contain complex information irrelevant to the planning task. Our experiments show that NMPC outperforms existing methods in this complex-background setting while remaining competitive with current state-of-the-art MBRL models in the standard setting.

1. INTRODUCTION

Learning to control from high-dimensional observations has been made possible by advances in reinforcement learning (RL) and deep learning. These advances have enabled notable successes such as solving video games (Mnih et al., 2015; Lample & Chaplot, 2017) and continuous control problems (Lillicrap et al., 2016) from pixels. However, performing RL directly in the high-dimensional observation space is known to be sample-inefficient and may require a large amount of training data (Lake et al., 2017). This is a critical problem, especially for real-world applications. Recent model-based RL works (Kaiser et al., 2020; Ha & Schmidhuber, 2018; Hafner et al., 2019; Zhang et al., 2019; Hafner et al., 2020) tackle this problem by learning a world model in the latent space and then applying RL algorithms within the latent world model.

Existing MBRL methods that learn a latent world model typically do so via reconstruction-based objectives, which are likely to encode task-irrelevant information, such as the background. In this work, we take inspiration from the success of contrastive learning and propose Non-Markovian Predictive Coding (NMPC), a novel information-theoretic approach for planning from pixels. In contrast to reconstruction, NMPC formulates a mutual information (MI) objective to learn the latent space for control. This objective circumvents the need to reconstruct observations and prioritizes the encoding of task-relevant components of the environment, thus making NMPC more robust when dealing with complicated observations.

Our primary contributions are as follows:

• We propose Non-Markovian Predictive Coding (NMPC), a novel information-theoretic approach to learn latent world models for planning from high-dimensional observations, and we theoretically analyze its ability to prioritize the encoding of task-relevant information.
• We show experimentally that NMPC outperforms the state-of-the-art model in complex environments dominated by task-irrelevant information, while remaining competitive on standard DeepMind Control (DMControl) tasks. Additionally, we conduct detailed ablation analyses to study the empirical importance of the components of NMPC.

