DENSITY CONSTRAINED REINFORCEMENT LEARNING

Abstract

Constrained reinforcement learning (CRL) plays an important role in solving safety-critical and resource-limited tasks. However, existing methods typically rely on tuning reward or cost parameters to encode the constraints, which can be tedious and tends not to generalize well. Instead of building sophisticated cost functions for constraints, we present a pioneering study of imposing constraints directly on the state density function of the system. Density functions have clear physical meanings and can express a variety of constraints in a straightforward fashion. We prove the duality between the density function and the Q function in CRL and use it to develop an effective primal-dual algorithm for solving density constrained reinforcement learning problems. We provide theoretical guarantees of the optimality of our approach and use a comprehensive set of case studies, including standard benchmarks, to show that our method outperforms other leading CRL methods in terms of achieving higher reward while respecting the constraints.

1. INTRODUCTION

Constrained reinforcement learning (CRL) (Achiam et al., 2017; Altman, 1999; Dalal et al., 2018; Paternain et al., 2019; Tessler et al., 2019) has received increasing interest as a way of addressing the safety challenges in reinforcement learning (RL). CRL techniques aim to find the optimal policy that maximizes the cumulative reward while respecting the specified constraints. Existing CRL approaches typically involve constructing suitable cost functions and value functions to take the constraints into account. A crucial step is then to choose appropriate parameters, such as thresholds for the cost and value functions, to encode the constraints. However, one significant gap between the use of such methods and solving practical RL problems is the correct construction of the cost and value functions, which is typically not addressed systematically but relies on engineering intuition (Paternain et al., 2019). Simple cost functions may not yield satisfactory performance, while sophisticated cost functions may lack clear physical meanings. When cost functions lack clear physical interpretations, it is difficult to formally guarantee the satisfaction of the performance specifications, even if the constraints on the cost functions are fulfilled. Moreover, different environments generally need different cost functions, which makes the tedious tuning process extremely time-consuming.

In this work, we fill the gap by imposing constraints on the state density functions as an intuitive and systematic way to encode constraints in RL. Density is a measure of state concentration in the state space, and is directly related to the state distribution. It has been well studied in physics (Yang, 1991) and control (Brockett, 2012; Chen & Ames, 2019; Rantzer, 2001). A variety of real-world constraints are naturally expressed as density constraints in the state space.
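To make the notion of state density concrete, the following is a minimal sketch (not from the paper) that computes the discounted state density, i.e. the occupancy measure, of a fixed policy on a small hypothetical tabular MDP. All numbers are illustrative; the density solves the standard flow equation ρ = (1 − γ)μ₀ + γPᵀρ.

```python
import numpy as np

gamma = 0.9
mu0 = np.array([1.0, 0.0, 0.0])  # initial state distribution (illustrative)

# P[s, s'] = transition probability from s to s' under the fixed policy
# (hypothetical 3-state chain, chosen only for illustration)
P = np.array([[0.1, 0.9, 0.0],
              [0.0, 0.2, 0.8],
              [0.5, 0.0, 0.5]])

# Discounted state density satisfies rho = (1 - gamma) * mu0 + gamma * P^T rho,
# hence rho = (1 - gamma) * (I - gamma * P^T)^{-1} mu0.
rho = (1 - gamma) * np.linalg.solve(np.eye(3) - gamma * P.T, mu0)

print(rho)        # nonnegative, sums to 1: a distribution over states
```

Because ρ is itself a probability distribution over states, constraints such as "the density in a congested region must stay below a critical level" become simple linear inequalities on ρ.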
Pure safety constraints can be trivially encoded by requiring that the entire state density be contained in the safe region. In more general examples, the vehicle densities in certain areas are supposed to stay below the critical density (Gerwinski & Krug, 1999) to avoid congestion. When spraying pesticide with drones, different parts of a farmland require different levels of pesticide density. Indeed, in the experiments we will see how these problems are solved with guarantees using density constrained RL (DCRL). Our approach is based on new theoretical results on the duality relationship between the density function and the value function in optimal control (Chen & Ames, 2019). One can prove generic duality between density functions and value functions for both continuous dynamics and discrete-state Markov decision processes (MDP), under various setups such as Bolza-form terminal constraints, infinite-horizon discounted rewards, or finite-horizon cumulative rewards. In Chen & Ames (2019) the duality is proved for value functions in optimal control, assuming that the full dynamics
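For background intuition, the textbook linear-programming formulation of a discounted MDP already exhibits a duality of this flavor: occupancy measures (densities) are the primal variables and value functions arise as the dual variables. This is standard MDP theory, not the paper's more general result:

```latex
% Primal LP over occupancy measures \rho(s,a):
\max_{\rho \ge 0} \ \sum_{s,a} \rho(s,a)\, r(s,a)
\quad \text{s.t.} \quad
\sum_a \rho(s',a) \;=\; (1-\gamma)\,\mu_0(s') + \gamma \sum_{s,a} P(s' \mid s,a)\, \rho(s,a)
\quad \forall s'.

% Dual LP over value functions V(s):
\min_V \ (1-\gamma) \sum_s \mu_0(s)\, V(s)
\quad \text{s.t.} \quad
V(s) \;\ge\; r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V(s')
\quad \forall s,a.
```

Adding density constraints of the form $\rho(s) \le c(s)$ to the primal introduces extra dual multipliers, which is what motivates a primal-dual treatment.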

