D2RL: DEEP DENSE ARCHITECTURES IN REINFORCEMENT LEARNING

Abstract

While improvements in deep learning architectures have played a crucial role in advancing the state of supervised and unsupervised learning in computer vision and natural language processing, neural network architecture choices for reinforcement learning remain relatively under-explored. We take inspiration from successful architectural choices in computer vision and generative modeling, and investigate the use of deeper networks and dense connections for reinforcement learning on a variety of simulated robotic learning benchmark environments. Our findings reveal that current methods benefit significantly from dense connections and deeper networks, across a suite of manipulation and locomotion tasks, for both proprioceptive and image-based observations. We hope that our results can serve as a strong baseline and further motivate future research into neural network architectures for reinforcement learning. The project website is at https://sites.google.

1. INTRODUCTION

Deep Reinforcement Learning (DRL) is a general-purpose framework for training goal-directed agents in high-dimensional state and action spaces. DRL has had many successes in robotic control, spanning locomotion and navigation tasks, both in simulation and in the real world (Schulman et al., 2015; Akkaya et al., 2019; Kalashnikov et al., 2018). While the generality of the DRL framework makes it applicable to a wide variety of tasks, issues such as the sample efficiency and generalization of the trained agents must still be addressed. Sample efficiency is critical for agents trained in the real world, particularly for robotic control tasks. Building minimal inductive biases into the framework is one effective way to improve the sample efficiency of DRL agents.

The generality of the framework makes it difficult to control particular behaviours and inductive biases for DRL algorithms. Inductive biases are important for learning algorithms because they can induce desirable behaviour in the learned agents. Recent work has sought to improve the sample efficiency of DRL by adding an inductive bias of invariance, when learning from images, through techniques such as data augmentations (Laskin et al., 2020; Kostrikov et al., 2020) and contrastive losses (Srinivas et al., 2020). Another important inductive bias in DRL is the choice of architecture for the function approximators, for example how to parameterize the neural networks for the policy and value functions. However, the problem of choosing architecture designs in DRL and robotics, for planning and control, has been largely ignored.

Modern computer vision and language processing research has shown the disproportionate advantage of the size and depth of the neural networks used (He et al., 2016b; Radford et al.), wherein very deep neural networks can be trained such that they learn better and more generalizable representations. Furthermore, recent evidence suggests that deeper neural networks can not only learn more complex functions but also have a smoother loss landscape (Rolnick & Tegmark, 2017). Learning function approximators that enable better optimization and expressivity is an important inductive bias, which is greatly exploited in vision and language processing through clever neural network architecture choices such as residual connections (He et al., 2016a), normalization layers (Santurkar et al., 2018), and gating mechanisms (Hochreiter & Schmidhuber, 1997), to name a few. It would be ideal to incorporate similar inductive biases in modern DRL algorithms in robotics in order to

