LEARNING DYNAMIC ABSTRACT REPRESENTATIONS FOR SAMPLE-EFFICIENT REINFORCEMENT LEARNING

Abstract

In many real-world problems, a learning agent must discover a problem's abstractions and its solution simultaneously. However, most such abstractions need to be designed and refined by hand for each new problem and application domain. This paper presents a novel top-down approach for constructing state abstractions while carrying out reinforcement learning. Starting with state variables and a simulator, it dynamically computes a domain-independent abstraction based on the dispersion of Q-values in abstract states as the agent continues acting and learning. Extensive empirical evaluation on multiple domains and problems shows that this approach automatically learns abstractions that are finely tuned to the problem, yield strong sample efficiency, and enable the RL agent to significantly outperform existing approaches.

1. INTRODUCTION

It is well known that good abstract representations can play a vital role in improving the scalability and efficiency of reinforcement learning (RL) (Sutton & Barto, 2018; Yu, 2018; Konidaris, 2019). However, it remains unclear how good abstract representations could be learned efficiently without extensive hand-coding. Several authors have investigated methods for aggregating concrete states based on similarities in their value functions, but this approach can be difficult to scale as the number of concrete states or the size of the transition graph grows. This paper presents a novel approach for top-down construction and refinement of abstractions for sample-efficient reinforcement learning. Rather than aggregating concrete states based on the agent's experience, our approach starts with a default, auto-generated coarse abstraction that collapses the domain of each state variable (e.g., the location of each taxi and each passenger in the classic taxi world) to one or two abstract values. This eliminates the need to consider concrete states individually, although this initial abstraction is likely to be too coarse for most practical problems. The overall algorithm interleaves refinement of this abstraction with learning and evaluation of policies, and results in automatically generated, problem- and reward-function-specific abstractions that aid learning. This process not only yields a succinct representation of cumulative value functions, but also makes learning more sample efficient: the abstraction is used to locally transfer values between states, and an abstract state is cleaved only when the states it contains are observed to exhibit a large spread in their value functions. This approach is related to research on abstraction for reinforcement learning and on abstraction refinement for model checking (Dams & Grumberg, 2018; Clarke et al., 2000); a detailed survey of related work is presented in the next section.
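The dispersion-based cleaving criterion can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: all names (`refine`, `q_samples`, the median-based split) are hypothetical, and a real implementation would split on state-variable values rather than on the Q-values themselves.

```python
# Hypothetical sketch: an abstract state groups concrete states; when the
# standard deviation of the Q-values observed inside it exceeds a threshold,
# the abstract state is cleaved. Here we split around the median Q-value
# for simplicity; the paper's method refines on state-variable values.
import statistics

def refine(abstract_states, q_samples, threshold=1.0):
    """Split any abstract state whose sampled Q-values are too dispersed.

    abstract_states: dict mapping abstract-state id -> list of concrete states
    q_samples: dict mapping concrete state -> observed Q-value estimate
    """
    refined = {}
    next_id = 0
    for members in abstract_states.values():
        values = [q_samples[s] for s in members if s in q_samples]
        if len(values) > 1 and statistics.stdev(values) > threshold:
            # Large spread observed: cleave this abstract state in two.
            median = statistics.median(values)
            low = [s for s in members if q_samples.get(s, median) <= median]
            high = [s for s in members if q_samples.get(s, median) > median]
            for part in (low, high):
                if part:
                    refined[next_id] = part
                    next_id += 1
        else:
            # Q-values agree closely: keep the abstract state intact.
            refined[next_id] = members
            next_id += 1
    return refined
```

In the full algorithm this check would be interleaved with episodes of learning and policy evaluation, so that refinement is driven by the agent's own value estimates.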
However, unlike existing streams of work, we develop a process that automatically generates conditional abstractions, in which the final abstraction over the values of one variable can depend on the specific values of other variables. For instance, Fig. 1 displays a taxi world in which meaningful conditional abstractions of the taxi location are constructed for different values of the other state variables (the passenger and destination locations). A meaningful abstraction provides greater detail in the taxi-location variable around the passenger's location when the taxi needs to pick up a passenger (Fig. 1, middle); once the taxi has the passenger, the abstraction should provide greater detail around the destination (Fig. 1, right). Furthermore, our approach goes beyond counterexample-driven abstraction refinement by taking into account the reward function as well as stochastic dynamics, and it uses measures of dispersion, such as the standard deviation of Q-values, to drive the refinement process. The main contributions of this paper

