ON THE GEOMETRY OF REINFORCEMENT LEARNING IN CONTINUOUS STATE AND ACTION SPACES

Anonymous

Abstract

Advances in reinforcement learning have led to its successful application to complex tasks with continuous state and action spaces. Despite these practical advances, most theoretical work pertains to finite state and action spaces. We propose building a theoretical understanding of continuous state and action spaces by employing a geometric lens. Central to our work is the idea that the transition dynamics induce a low-dimensional manifold of reachable states embedded in the high-dimensional nominal state space. We prove that, under certain conditions, the dimensionality of this manifold is at most the dimensionality of the action space plus one. This is the first result of its kind, linking the geometry of the state space to the dimensionality of the action space. We empirically corroborate this upper bound on four MuJoCo environments. We further demonstrate the applicability of our result by learning a policy in this low-dimensional representation. To do so, we introduce an algorithm that learns a mapping to a low-dimensional representation, realised as a narrow hidden layer of a deep neural network, trained in tandem with the policy using DDPG. Our experiments show that a policy learnt this way performs on par or better on four MuJoCo control suite tasks.
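As a rough illustration of the algorithmic idea, the sketch below shows a DDPG-style actor whose narrow hidden layer has width dim(action space) + 1, matching the upper bound above. The layer sizes and class names here are our own assumptions for exposition, not the exact architecture used in the experiments.

```python
# Illustrative sketch only: a DDPG-style actor with a narrow bottleneck
# layer whose width is set to dim(action space) + 1, motivated by the
# paper's upper bound on the dimensionality of the manifold of reachable
# states. Hidden sizes and names are assumptions, not the authors' design.
import torch
import torch.nn as nn

class BottleneckActor(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        bottleneck = action_dim + 1  # width motivated by the dim(A) + 1 bound
        # Encoder maps the nominal state to the low-dimensional representation.
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, bottleneck),
        )
        # Policy head acts on the learnt low-dimensional representation.
        self.head = nn.Sequential(
            nn.Linear(bottleneck, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # actions in [-1, 1]
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(state))

# Example: a HalfCheetah-like task with 17-dim states and 6-dim actions.
actor = BottleneckActor(state_dim=17, action_dim=6)
action = actor(torch.randn(1, 17))
```

In a full DDPG setup, both the encoder and the head would be updated end-to-end by the deterministic policy gradient, so the representation is learnt in tandem with the policy rather than pre-trained.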

1. INTRODUCTION

The goal of a reinforcement learning (RL) agent is to learn an optimal policy that maximises the return, i.e., the time-discounted cumulative reward (Sutton & Barto, 1998). Recent advances in RL research have led to agents successfully learning in environments with enormous state spaces, such as games (Mnih et al., 2015; Silver et al., 2016), and robotic control both in simulation (Lillicrap et al., 2016; Schulman et al., 2015; 2017a) and in real environments (Levine et al., 2016; Zhu et al., 2020; Deisenroth & Rasmussen, 2011). However, we do not have an understanding of the intrinsic complexity of these seemingly large problems. For example, in most popular deep RL algorithms for continuous control, the agent's policy is parameterised by a deep neural network (DNN) (Lillicrap et al., 2016; Schulman et al., 2015; 2017a), but we lack theoretical models to guide the design of the DNN architecture required to efficiently learn an optimal policy in a given environment. There have been approaches to measuring the difficulty of an RL environment from a sample-complexity perspective (Antos et al., 2007; Munos & Szepesvári, 2008; Bastani, 2020), but these models fall short of prescribing the policy and value-function complexity required to learn an optimal policy.

We view the complexity of RL environments through a geometric lens. We build on the intuition behind the manifold hypothesis, which states that most high-dimensional real-world datasets actually lie on low-dimensional manifolds (Tenenbaum, 1997; Carlsson et al., 2007; Fefferman et al., 2013; Bronstein et al., 2021); for example, the set of natural images is a very small, smoothly varying subset of all possible value assignments to the pixels. A promising geometric approach is to model the data as a low-dimensional structure, a manifold, embedded in a high-dimensional ambient space. In supervised learning, and especially in deep learning theory, researchers have shown that the approximation error depends strongly on the dimensionality of the manifold (Shaham et al., 2015; Pai et al., 2019; Chen et al., 2019; Cloninger & Klock, 2020), thereby connecting the complexity of the underlying structure of the dataset to the complexity of the DNN. As in supervised learning, researchers have applied the manifold hypothesis in RL, i.e., hypothesised that the effective state space lies on a low-dimensional manifold (Mahadevan, 2005; Machado et al.,
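To make the manifold hypothesis concrete in the RL setting, the sketch below estimates the intrinsic dimension of states visited under a random policy and compares it against the dim(A) + 1 bound. PCA is only a crude linear proxy for manifold dimension, and the environment, rollout length, and variance threshold are illustrative assumptions rather than the paper's experimental protocol.

```python
# A minimal sketch (not the paper's protocol) of probing the manifold of
# reachable states: roll out a random policy, collect visited states, and
# estimate their intrinsic dimension with PCA (a crude linear proxy).
import numpy as np
import gymnasium as gym  # assumes gymnasium with MuJoCo support installed
from sklearn.decomposition import PCA

env = gym.make("HalfCheetah-v4")
states = []
obs, _ = env.reset(seed=0)
for _ in range(5000):
    obs, _, terminated, truncated, _ = env.step(env.action_space.sample())
    states.append(obs)
    if terminated or truncated:
        obs, _ = env.reset()

pca = PCA().fit(np.asarray(states))  # PCA centres the data internally
cumulative = np.cumsum(pca.explained_variance_ratio_)
# Number of principal components needed to explain 99% of the variance,
# compared against dim(action space) + 1 from the paper's upper bound.
dim_estimate = int(np.searchsorted(cumulative, 0.99)) + 1
print(f"linear intrinsic-dimension estimate: {dim_estimate}")
print(f"upper bound dim(A) + 1: {env.action_space.shape[0] + 1}")
```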

