PROVABLE BENEFITS OF REPRESENTATIONAL TRANSFER IN REINFORCEMENT LEARNING

Abstract

We study the problem of representational transfer in RL, where an agent first pretrains on a number of source tasks to discover a shared representation, which is subsequently used to learn a good policy in a target task. We propose a new notion of task relatedness between source and target tasks, and develop a novel approach for representational transfer under this assumption. Concretely, we show that given generative access to the source tasks, we can discover a representation using which subsequent linear RL techniques quickly converge to a near-optimal policy, with only online access to the target task. The sample complexity is close to that of knowing the ground-truth features in the target task, and is comparable to prior representation-learning results in the source tasks. We complement our positive results with lower bounds without generative access, and validate our findings with empirical evaluation on rich-observation MDPs that require deep exploration.

1. INTRODUCTION

Leveraging historical experience acquired while learning past skills to accelerate the learning of a new skill is a hallmark of intelligent behavior. In this paper, we study this question in the context of reinforcement learning (RL). Specifically, we consider a setting where the learner is exposed to multiple tasks, and ask the following question: Can we accelerate RL by sharing representations across multiple related tasks? A rich empirical literature studies multiple approaches to this question and various paradigms for instantiating it. For instance, in a multi-task learning scenario, the learner has simultaneous access to different tasks and tries to improve sample complexity by sharing data across them (Caruana, 1997). Other works study a transfer learning setting, where the learner has access to multiple source tasks during a pre-training phase, followed by a target task (Pan and Yang, 2009). The goal is to learn features and/or a policy which can be quickly adapted to succeed in the target task. More generally, the paradigms of meta-learning (Finn et al., 2017), lifelong learning (Parisi et al., 2019), and curriculum learning (Bengio et al., 2009) also consider related questions.

On the theoretical side, questions of representation learning have received increased recent emphasis owing to their practical significance, both in supervised learning and in RL. In RL, a limited form of transfer learning across multiple downstream reward functions is enabled by several recent reward-free representation learning approaches (Jin et al., 2020a; Zhang et al., 2020; Wang et al., 2020; Du et al., 2019; Misra et al., 2020; Agarwal et al., 2020; Modi et al., 2021).
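The transfer pattern studied in this paper, in which a representation is discovered from source tasks and then frozen while only a linear component is fit in the target task, can be illustrated in a simplified regression analogue. The sketch below is a loose stand-in, not the paper's algorithm: the dimensions, the SVD-based pretraining step, and all names (`B_true`, `B_hat`, `src_heads`, etc.) are illustrative assumptions, and ordinary least squares stands in for value-function fitting.

```python
import numpy as np

rng = np.random.default_rng(0)
d_obs, d_rep = 20, 3          # observation and representation dimensions (illustrative)
n_src, n_tgt = 500, 60        # samples per source task vs. in the target task

# Ground truth: a representation B shared by all tasks, with task-specific
# linear heads w -- mirroring a "shared features, per-task linear part" setup.
B_true = rng.normal(size=(d_obs, d_rep))
src_heads = [rng.normal(size=d_rep) for _ in range(4)]
tgt_head = rng.normal(size=d_rep)

# Pretraining stand-in: regress each source task separately, then take the
# top singular directions of the stacked coefficients as the learned features.
coefs = []
for w in src_heads:
    X = rng.normal(size=(n_src, d_obs))
    y = X @ (B_true @ w)
    coefs.append(np.linalg.lstsq(X, y, rcond=None)[0])
U, _, _ = np.linalg.svd(np.stack(coefs, axis=1), full_matrices=False)
B_hat = U[:, :d_rep]          # frozen representation carried to the target task

# Target task: only a d_rep-dimensional head is fit on the frozen features,
# so far fewer target samples are needed than for learning from scratch.
X_t = rng.normal(size=(n_tgt, d_obs))
y_t = X_t @ (B_true @ tgt_head)
w_hat = np.linalg.lstsq(X_t @ B_hat, y_t, rcond=None)[0]
err = float(np.max(np.abs(X_t @ B_hat @ w_hat - y_t)))
```

Because only `d_rep` head parameters are estimated in the target task, the target-task error is governed by the quality of `B_hat` rather than by the ambient dimension `d_obs`, which is the intuition behind the sample-complexity gains from representational transfer.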
Inspired by recent treatments of representation transfer in supervised learning (Maurer et al., 2016; Du et al., 2020) and imitation learning (Arora et al., 2020), some works also study more general task collections in bandits (Hu et al., 2021; Yang et al., 2020, 2022) and RL (Hu et al., 2021; Lu et al., 2021). Almost all of these works study settings where the representation is frozen after pre-training on the source tasks, and a linear policy or optimal value function approximation is trained in the target task using the learned features. This setting, which we call representational transfer, is the main focus of our paper.

A crucial question in formalizing representational transfer is the notion of similarity between source and target tasks. Prior works in supervised learning make the stringent assumption that the covariates x follow the same underlying distribution in all tasks, and only the conditional P(y|x) can vary across tasks (Du et al., 2020). This assumption does not generalize nicely to RL, where state distributions are typically policy dependent, and prior attempts to extend it to RL (Lu et al., 2021) result in strong assumptions on the learning setup. Other works (Hu et al.,

