PROVABLY EFFICIENT MULTI-TASK REINFORCEMENT LEARNING IN LARGE STATE SPACES

Abstract

We study multi-task Reinforcement Learning, where shared knowledge among different environments is distilled to enable scalable generalization to a variety of problem instances. In the context of general function approximation, Markov Decision Processes (MDPs) with low Bilinear rank encapsulate a wide range of structural conditions that permit polynomial sample complexity in large state spaces, where the Bellman errors are related to bilinear forms of features with low intrinsic dimensions. To achieve multi-task learning in MDPs, we propose online representation learning algorithms that capture the shared features in the different task-specific bilinear forms. We show that, in the presence of low-rank structure in the features of the bilinear forms, the algorithms enjoy sample complexity improvements over single-task learning. We thereby obtain the first sample-efficient multi-task reinforcement learning algorithm with general function approximation.

1. INTRODUCTION

The ability to capture informative representations that generalize across multiple tasks has become significant in various machine learning applications Li et al. (2014); Tsiakas et al. (2016); Baevski et al. (2019); D'Eramo et al. (2019); Kubota et al. (2020); Liu et al. (2019b). In the context of multi-task learning Caruana (1997); Baxter (2000); Yu et al. (2005), this ability is highly desirable and becomes vital for learning with fewer samples than learning each task individually. Representation learning Bengio et al. (2013) is a powerful approach for achieving such sample-efficiency improvements. This paper considers representation learning in multi-task Reinforcement Learning, an important class of meta Reinforcement Learning (meta-RL) Wang et al. (2016); Finn et al. (2017); Ritter et al. (2018). Reinforcement learning (RL) is a sequential decision-making problem in which an agent aims to learn optimal decisions by interacting with an unknown environment Sutton & Barto (2018). Empowered by representation learning with deep neural networks LeCun et al. (2015); Goodfellow et al. (2016), RL has achieved tremendous success in various real-world applications, such as Go Silver et al. (2016), Atari Mnih et al. (2013), Dota 2 Berner et al. (2019), Texas Hold'em poker Moravčík et al. (2017), and autonomous driving Shalev-Shwartz et al. (2016). The benefit of using representation learning to extract joint feature embeddings from different but related tasks has therefore emerged as an essential problem to investigate. Specifically, this paper studies the problem of learning multiple RL problems jointly with the help of representation learning. Although multi-task learning in online decision-making problems has received increasing research interest Lazaric & Ghavamzadeh (2010); Mutti et al. (2021); Maurer et al. (2016); Qin et al. (2021); Yang et al. (2021); Hu et al. (2021), most existing works focus on tabular or linear models. Indeed, how general function approximation extrapolates across huge state spaces remains largely an open problem in itself. Recently, Bilinear class Du et al.
(2021) proposed a promising structural framework for generalization in reinforcement learning through the use of function approximation. The Bilinear class postulates that the Bellman error can be related to a bilinear form depending on the hypothesis, and it captures nearly all existing function approximation models, e.g., Jin et al. (2020a); Zanette et al. (2020); Yang & Wang (2020); Jiang et al. (2017); Sun et al. (2019); Kakade et al. (2020); Agarwal et al. (2020). However, in the presence of shared information in the bilinear forms across multiple tasks, the Bilin-UCB algorithm proposed in Du et al. (2021) is not able to adapt

