RETURN-BASED CONTRASTIVE REPRESENTATION LEARNING FOR REINFORCEMENT LEARNING

Abstract

Recently, various auxiliary tasks have been proposed to accelerate representation learning and improve sample efficiency in deep reinforcement learning (RL). However, existing auxiliary tasks are unsupervised and do not take the characteristics of RL problems into account. By leveraging returns, the most important feedback signals in RL, we propose a novel auxiliary task that forces the learned representations to discriminate state-action pairs with different returns. We theoretically justify that our auxiliary loss learns representations capturing the structure of a new form of state-action abstraction, under which state-action pairs with similar return distributions are aggregated together. In the low-data regime, our algorithm outperforms strong baselines on complex tasks in Atari games and the DeepMind Control suite, and achieves even better performance when combined with existing auxiliary tasks.

1. INTRODUCTION

Deep reinforcement learning (RL) algorithms can simultaneously learn representations from high-dimensional inputs and policies based on such representations that maximize long-term returns. However, deep RL algorithms typically require large numbers of samples, which can be quite expensive to obtain (Mnih et al., 2015). In contrast, it is usually much more sample-efficient to learn policies on top of learned representations or extracted features (Srinivas et al., 2020). To this end, various auxiliary tasks have been proposed to accelerate representation learning in aid of the main RL task (Suddarth and Kergosien, 1990; Sutton et al., 2011; Gelada et al., 2019; Bellemare et al., 2019; François-Lavet et al., 2019; Shen et al., 2020; Zhang et al., 2020; Dabney et al., 2020; Srinivas et al., 2020). Representative examples of auxiliary tasks include predicting the future in either the pixel space or the latent space with reconstruction-based losses (e.g., Jaderberg et al., 2016; Hafner et al., 2019a;b). Recently, contrastive learning has been introduced to construct auxiliary tasks and achieves better performance than reconstruction-based methods in accelerating RL algorithms (Oord et al., 2018; Srinivas et al., 2020). Without the need to reconstruct inputs such as raw pixels, contrastive learning based methods can ignore irrelevant features, such as static backgrounds in games, and learn more compact representations. Oord et al. (2018) propose a contrastive representation learning method based on the temporal structure of state sequences. Srinivas et al. (2020) propose to leverage prior knowledge from computer vision and learn representations that are invariant to image augmentation. However, existing works mainly construct contrastive auxiliary losses in an unsupervised manner, without using the feedback signals of RL problems as supervision.
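To make the contrastive objectives used by these methods concrete, the following is a minimal InfoNCE-style loss for a single anchor embedding. This is a simplified sketch in NumPy: the function name, dot-product similarity, and temperature value are illustrative choices, not the exact objectives of Oord et al. (2018) or Srinivas et al. (2020).

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for one anchor embedding.

    anchor, positive: 1-D embedding vectors; negatives: 2-D array with
    one embedding per row. Returns the negative log of the softmax
    probability assigned to the positive under dot-product similarity.
    """
    candidates = np.vstack([positive, negatives])        # positive sits in row 0
    logits = candidates @ anchor / temperature           # similarity scores
    logits -= logits.max()                               # numerical stability
    log_prob = logits[0] - np.log(np.exp(logits).sum())  # log-softmax of positive
    return -log_prob
```

The loss is small when the anchor is close to its positive and far from all negatives, and grows as a negative becomes more similar to the anchor than the positive is.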
In this paper, we take a further step and leverage return feedback to design a contrastive auxiliary loss that accelerates RL algorithms. Specifically, we propose a novel method, called Return-based Contrastive Representation Learning for RL.
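One simple way to realize the return-based construction of contrastive pairs described above is to discretize returns and treat transitions in the same bin as positives. The equal-width binning below is a hypothetical illustration, not necessarily the exact state-action abstraction developed in the paper.

```python
import numpy as np

def return_based_pairs(returns, num_bins=10):
    """Group transition indices into positive sets by discretized return.

    Transitions whose returns fall into the same bin are treated as
    positive pairs for the contrastive auxiliary loss; transitions from
    different bins serve as negatives. Equal-width binning is used here
    purely for illustration.
    """
    returns = np.asarray(returns, dtype=float)
    edges = np.linspace(returns.min(), returns.max(), num_bins + 1)
    # Assign each return to a bin; interior edges only, so every value lands
    # in one of the num_bins buckets.
    bins = np.clip(np.digitize(returns, edges[1:-1]), 0, num_bins - 1)
    groups = {}
    for idx, b in enumerate(bins):
        groups.setdefault(int(b), []).append(idx)
    return groups
```

Sampling an anchor and a positive from the same group, with negatives drawn from other groups, then plugs directly into a standard contrastive loss.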

