AGENT-CONTROLLER REPRESENTATIONS: PRINCIPLED OFFLINE RL WITH RICH EXOGENOUS INFORMATION

Abstract

Learning to control an agent from data collected offline in a rich pixel-based visual observation space is vital for real-world applications of reinforcement learning (RL). A major challenge in this setting is the presence of input information that is hard to model and irrelevant to controlling the agent. This problem has been approached by the theoretical RL community through the lens of exogenous information, i.e., any control-irrelevant information contained in observations. For example, a robot navigating busy streets needs to ignore irrelevant information, such as other people walking in the background, textures of objects, or birds in the sky. In this paper, we focus on the setting with visually detailed exogenous information, and introduce new offline RL benchmarks offering the ability to study this problem. We find that contemporary representation learning techniques can fail on datasets where the noise is a complex and time-dependent process, which is prevalent in practical applications. To address this, we propose to use multi-step inverse models, which have seen a great deal of interest in the RL theory community, to learn Agent-Controller Representations for Offline-RL (ACRO). Despite being simple and requiring no reward, we show theoretically and empirically that the representation created by this objective greatly outperforms baselines.

1. INTRODUCTION

Effective real-world applications of reinforcement learning or sequential decision-making must cope with exogenous information in sensory data. For example, visual datasets of a robot or car navigating busy city streets might contain information such as advertisement billboards, birds in the sky, or other people crossing the road. Parts of the observation (such as birds in the sky) are irrelevant for controlling the agent, while other parts (such as people crossing along the navigation route) are extremely relevant. How can we effectively learn a representation of the world which extracts just the information relevant for controlling the agent while ignoring irrelevant information? Real-world tasks are often more easily solved with fixed offline datasets, since operating from offline data enables thorough testing before deployment, which can help ensure safety, reliability, and quality in the deployed policy (Lange et al., 2012; Ebert et al., 2018; Kumar et al., 2019; Jaques et al., 2019; Levine et al., 2020). The offline-RL setting also eliminates the need to address exploration and planning, which come into play during data collection.¹ Although approaches from representation learning have been studied in the online-RL case, yielding improvements, exogenous information has proved to be empirically challenging. A benchmark for learning from offline pixel-based data (Lu et al., 2022a) formalizes this challenge empirically. Combining these challenges, is it possible to learn distraction-invariant representations with rich observations in offline RL? Approaches for discovering small tabular MDPs (≤500 discrete latent states) or linear control problems invariant to exogenous information have been introduced before (Dietterich et al., 2018; Efroni et al., 2021; 2022a;b; Lamb et al., 2022). However, the planning and exploration techniques in these algorithms are difficult to scale. A key insight that Efroni et al. (2021) and Lamb et al. (2022) uncovered is the usefulness of multi-step action prediction for learning exogenous-invariant representations.
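To make the multi-step action prediction idea concrete, the sketch below computes a multi-step inverse loss: an encoder maps the observation at time t and the observation k steps later to latent codes, and a classifier predicts the first action a_t from the pair of codes. This is a minimal NumPy illustration under stated assumptions, not the paper's architecture: the linear encoder `phi`, classifier `W`, dimensions, and all variable names are hypothetical stand-ins for the deep networks used in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (illustrative, not from the paper)
obs_dim, latent_dim, num_actions = 16, 4, 3

# Linear encoder phi and action classifier W, standing in for
# the deep networks that would be trained in practice
phi = rng.normal(size=(obs_dim, latent_dim))
W = rng.normal(size=(2 * latent_dim, num_actions))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_step_inverse_loss(x_t, x_tk, a_t):
    """Cross-entropy loss for predicting the first action a_t from the
    encodings of x_t and the observation x_{t+k} taken k steps later."""
    z = np.concatenate([x_t @ phi, x_tk @ phi], axis=-1)
    p = softmax(z @ W)
    return -np.log(p[np.arange(len(a_t)), a_t] + 1e-12).mean()

# Fake offline batch: observations at t and t+k, plus the action at t
batch = 8
x_t = rng.normal(size=(batch, obs_dim))
x_tk = rng.normal(size=(batch, obs_dim))
a_t = rng.integers(0, num_actions, size=batch)
loss = multi_step_inverse_loss(x_t, x_tk, a_t)
```

Minimizing this loss over the encoder requires the latent codes to retain only information useful for recovering the agent's own actions, which is why exogenous noise, having no bearing on action prediction, tends to be discarded.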



¹ This elimination, however, can make offline RL more difficult if the wrong data is collected.

