LEARNING ROBUST STATE ABSTRACTIONS FOR HIDDEN-PARAMETER BLOCK MDPS

Abstract

Many control tasks exhibit similar dynamics that can be modeled as having common latent structure. Hidden-Parameter Markov Decision Processes (HiP-MDPs) explicitly model this structure to improve sample efficiency in multi-task settings. However, this setting makes strong assumptions on the observability of the state that limit its application in real-world scenarios with rich observation spaces. In this work, we leverage ideas of common structure from the HiP-MDP setting and extend them to enable robust state abstractions inspired by Block MDPs. We derive instantiations of this new framework for both multi-task reinforcement learning (MTRL) and meta-reinforcement learning (Meta-RL) settings. Further, we provide transfer and generalization bounds based on task and state similarity, along with sample complexity bounds that depend on the aggregate number of samples across tasks rather than the number of tasks, a significant improvement over prior work that uses the same environment assumptions. To further demonstrate the efficacy of the proposed method, we empirically compare against multi-task and meta-reinforcement learning baselines and show improvement over them.

1. INTRODUCTION

A key open challenge in AI research is how to train agents that learn behaviors that generalize across tasks and environments. When there is common structure underlying the tasks, multi-task reinforcement learning (MTRL), where the agent learns a set of tasks simultaneously, has definite advantages in robustness and sample efficiency over the single-task setting, where the agent learns each task independently. There are two ways in which learning multiple tasks can accelerate learning: the agent can learn a common representation of observations, and the agent can learn a common way to behave. Prior work in MTRL has leveraged this idea by sharing representations across tasks (D'Eramo et al., 2020) or by providing per-task sample complexity results that show improved sample efficiency from transfer (Brunskill & Li, 2013). However, explicit exploitation of the shared structure across tasks via a unified dynamics model has been lacking: prior works that use shared representations rely on a naive unification approach that posits all tasks lie in a shared domain (Figure 1, left). In the single-task setting, on the other hand, research on state abstractions has a much richer history, with several works on improved generalization through the aggregation of behaviorally similar states (Ferns et al., 2004; Li et al., 2006; Luo et al., 2019; Zhang et al., 2020b).

In this work, we propose to leverage rich state abstraction models from the single-task setting and explore their potential for the more general multi-task setting. We frame the problem as a structured super-MDP with a shared state space and a universal dynamics model conditioned on a task-specific hidden parameter (Figure 1, right). This additional structure gives us better sample efficiency, both
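The shared-dynamics structure described above can be sketched in a few lines. The following is a toy illustration only (the dynamics, class names, and parameter values are our own hypothetical choices, not the paper's model): every task uses one universal transition function, and tasks differ solely in a hidden parameter that the agent never observes directly.

```python
import numpy as np

def shared_dynamics(state, action, theta):
    """Universal transition function shared by all tasks.

    Toy linear dynamics: theta scales how strongly actions
    move the state. Only theta varies from task to task.
    """
    return state + theta * action

class HiPMDPTask:
    """One task = the shared dynamics with a fixed hidden parameter."""
    def __init__(self, theta):
        # Hidden from the agent; it must be inferred from transitions.
        self.theta = theta

    def step(self, state, action):
        return shared_dynamics(state, action, self.theta)

# Two tasks drawn from the same family: same dynamics, different theta.
task_a = HiPMDPTask(theta=1.0)
task_b = HiPMDPTask(theta=2.0)

s = np.array([0.0])
a = np.array([1.0])
print(task_a.step(s, a))  # [1.]
print(task_b.step(s, a))  # [2.]
```

Under this view, learning across tasks amounts to fitting the shared function once and estimating only the low-dimensional hidden parameter per task, which is the source of the sample-efficiency gains discussed in this work.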



Figure 1: Visualizations of the typical MTRL setting and the HiP-MDP setting.

