UNSUPERVISED TASK CLUSTERING FOR MULTI-TASK REINFORCEMENT LEARNING

Anonymous

Abstract

Meta-learning, transfer learning and multi-task learning have recently laid a path towards more generally applicable reinforcement learning agents that are not limited to a single task. However, most existing approaches implicitly assume a uniform similarity between tasks. We argue that this assumption is limiting in settings where the relationships between tasks are unknown a priori. In this work, we propose a general approach to automatically cluster similar tasks together during training. Our method, inspired by the expectation-maximization algorithm, succeeds at finding clusters of related tasks and uses these to improve sample complexity. We achieve this by designing an agent with multiple policies. In the expectation step, we evaluate the performance of the policies on all tasks and assign each task to the best-performing policy. In the maximization step, each policy trains by sampling tasks from its assigned set. This method is intuitive, simple to implement, and orthogonal to other multi-task learning algorithms. We show the generality of our approach by evaluating on simple discrete and continuous control tasks, as well as complex bipedal walker tasks and Atari games. Results show improvements in sample complexity as well as broader applicability when compared to other approaches.
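The alternating E- and M-steps described above can be sketched in a toy setting. The snippet below is a minimal illustration only: the scalar "policies" and "tasks", the negative-distance return, and the mean-based update are hypothetical stand-ins for actual RL policies, environments, and policy training, chosen so the clustering dynamics are visible in a few lines.

```python
import random

random.seed(0)

# Toy stand-in: each "task" is a target value, each "policy" a scalar guess.
# Return is negative distance, so similar targets cluster under one policy.
tasks = [0.1, 0.2, 0.15, 0.9, 1.0, 0.95]   # two natural clusters
policies = [random.random() for _ in range(2)]

def evaluate(policy, task):
    """Return of `policy` on `task` (higher is better)."""
    return -abs(policy - task)

for _ in range(50):
    # E-step: assign each task to its best-performing policy.
    assignment = [max(range(len(policies)),
                      key=lambda k: evaluate(policies[k], t)) for t in tasks]
    # M-step: each policy "trains" on its assigned tasks
    # (here: a gradient-free step toward the mean of its cluster,
    # standing in for RL updates on sampled tasks).
    for k in range(len(policies)):
        cluster = [t for t, a in zip(tasks, assignment) if a == k]
        if cluster:
            policies[k] += 0.5 * (sum(cluster) / len(cluster) - policies[k])

clusters = {k: [t for t, a in zip(tasks, assignment) if a == k]
            for k in range(len(policies))}
print(clusters)
```

Under this toy dynamic, the procedure behaves like k-means: each policy specializes on one group of mutually similar tasks, which is exactly the separation that avoids negative transfer between the groups.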

1. INTRODUCTION

Figure 1: An agent (smiley) should reach one of 12 goals (stars) in a grid world. Learning to reach a goal in the top right corner helps it to learn about the other goals in that corner. However, simultaneously learning to reach the green stars (bottom left corner) gives conflicting objectives, hindering training. Task clustering resolves this issue.

Imagine we are given an arbitrary set of tasks. We know that dissimilarities and/or contradictory objectives may exist among them. However, in most settings we can only guess at these relationships and at how they might affect joint training. Many recent works rely on such human guesses and (implicitly or explicitly) limit the generality of their approaches. This can lead to impressive results, either by explicitly modeling the relationships between tasks, as in transfer learning (Zhu et al., 2020), or by meta-learning implicit relations (Hospedales et al., 2020). However, an incorrect similarity assumption can hurt learning performance (Lazaric, 2012). Our aim in this paper is to provide a simple, straightforward approach that avoids human assumptions about task similarities.

An obvious solution is to train a separate policy for each task. However, this requires a large amount of experience to learn the desired behaviors. It is therefore desirable to have a single agent and to allow knowledge to be shared between tasks. This is generally known as multi-task learning, a field that has received considerable interest in both the supervised learning and reinforcement learning (RL) communities (Zhang & Yang, 2017). If tasks are sufficiently similar, a policy trained on one task provides a good starting point for another, and experience from each task helps training on the others. This is known as positive transfer (Lazaric, 2012). However, if the tasks are sufficiently dissimilar, negative transfer occurs and reusing a pre-trained policy is disadvantageous.
It can even lead to worse performance than simply starting from a random initialization. In this case, using experience from the other tasks might slow training or even prevent convergence.

