UNRAVEL STRUCTURED HETEROGENEITY OF TASKS IN META-REINFORCEMENT LEARNING VIA EXPLORATORY CLUSTERING

Abstract

Meta-reinforcement learning (meta-RL) aims to quickly solve new tasks by leveraging knowledge from prior tasks. Previous studies typically assume that tasks are drawn IID, ignoring possible structured heterogeneity among tasks. The non-transferable knowledge caused by structured heterogeneity hinders fast adaptation to new tasks. In this paper, we formulate the structured heterogeneity of tasks via clustering, such that transferable knowledge can be inferred within each cluster while non-transferable knowledge is excluded across clusters. To this end, we develop a dedicated exploratory policy that discovers task clusters by reducing uncertainty in posterior inference. Within the identified clusters, the exploitation policy solves related tasks by utilizing knowledge shared within the clusters. Experiments on various MuJoCo tasks show that the proposed method can effectively unravel cluster structures in both rewards and state dynamics, demonstrating strong advantages over a set of state-of-the-art baselines.

1. INTRODUCTION

Conventional reinforcement learning (RL) is notorious for its sample inefficiency: it often requires millions of interactions with the environment to learn a well-performing policy for a new task. Inspired by the human learning process, meta-reinforcement learning (meta-RL) has been proposed to quickly learn new tasks by leveraging knowledge shared among related tasks (Finn et al., 2017; Duan et al., 2016; Wang et al., 2016). Extensive efforts have been devoted to learning and utilizing transferable knowledge in meta-RL. For example, Finn et al. (2017) proposed to learn a set of shared meta-parameters that are used to initialize the local policy when a new task arrives. Duan et al. (2016) and Wang et al. (2016) trained an RNN encoder to characterize prior tasks according to the interaction history. However, little attention has been paid to situations where some knowledge is only locally transferable among tasks. All the aforementioned methods implicitly assume that tasks share substantial structure, and thus that knowledge can be broadly shared across all tasks. In practice, heterogeneity among tasks exists, and may even prevail, which hampers the effectiveness of existing meta-RL algorithms. For example, the skills needed for the game of Go can hardly be applied to Gomoku, even though both are played on the same board. We formulate this scenario as a more complicated but also more realistic meta-RL setting in which tasks originate from different distributions, i.e., tasks are clustered. As a result, some knowledge is locally transferable within clusters but cannot be shared globally. We refer to this as structured heterogeneity among RL tasks, and we explicitly model cluster structures in the task distribution to capture cluster-level knowledge.¹
Structured heterogeneity has been studied in supervised meta-learning (Yao et al., 2019), but it is considerably more challenging to handle in meta-RL, where the key bottleneck is how to unravel the clustering structure in a population of RL tasks. This can be further decomposed into two key research questions, namely population-level structure estimation and task-level inference. Unlike supervised learning tasks, where static task-specific data is provided before meta-learning, the observations in RL tasks are collected through an agent's interactions with the environment. As



¹ In this paper, we do not assume the knowledge in different clusters is exclusive, and thus each cluster can also contain overlapping global knowledge, e.g., motor skills in locomotion tasks.

