NON-MARKOVIAN STOCHASTIC CONTROL PROBLEMS IN CONTINUOUS-TIME WITH NEURAL RDES

Abstract

We propose a novel framework for solving continuous-time, non-Markovian stochastic optimal control problems using neural rough differential equations (Neural RDEs). By parameterising the control process as the solution of a Neural RDE driven by the state process, we show that the joint control-state dynamics are governed by an uncontrolled RDE with structured vector fields, allowing for efficient trajectory simulation, Monte-Carlo estimation of the value function, and backpropagation. To deal with input paths of infinite 1-variation, we refine the universal approximation result of Kidger et al. (2020) to a probabilistic density result for Neural RDEs driven by random rough paths. Experiments on various non-Markovian problems demonstrate that the proposed framework is time-resolution-invariant and capable of learning optimal solutions with higher accuracy than traditional RNN-based approaches. Finally, we discuss possible extensions of this framework to the setting of non-Markovian continuous-time reinforcement learning and provide promising empirical evidence in this direction.
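To fix ideas, the core parameterisation described above can be sketched in a few lines: a hidden state evolves according to a controlled differential equation driven by the (discretised) state path, and the control is read off the hidden state. The sketch below is a minimal, untrained toy in NumPy; the tanh layer `vector_field`, the dimensions, and the linear `readout` are illustrative stand-ins, not the paper's actual architecture, and the integral is approximated by a simple Euler scheme rather than a rough-path discretisation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: hidden state z in R^h, driving path x in R^d
# (first channel is time, second a sampled Brownian path).
h, d, n_steps = 8, 2, 100

# Stand-in for the learned vector field F_theta: R^h -> R^{h x d},
# here a single tanh layer with random weights instead of a trained network.
W = rng.normal(0.0, 0.3, (h * d, h))
b = rng.normal(0.0, 0.3, h * d)
readout = rng.normal(0.0, 0.3, h)  # linear readout from hidden state to control

def vector_field(z):
    return np.tanh(W @ z + b).reshape(h, d)

# One realisation of the driving path: (t, W_t) on [0, 1].
t = np.linspace(0.0, 1.0, n_steps + 1)
bm = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(1.0 / n_steps), n_steps))])
x = np.stack([t, bm], axis=1)

# Euler scheme for the CDE  dz_t = F_theta(z_t) dx_t, with control u_t = readout . z_t.
z = np.zeros(h)
controls = []
for k in range(n_steps):
    z = z + vector_field(z) @ (x[k + 1] - x[k])  # dz = F_theta(z) dx
    controls.append(float(readout @ z))
```

Because the hidden state is updated with path increments rather than with a fixed step count, refining or coarsening the time grid changes only the discretisation of the same underlying equation, which is the source of the time-resolution invariance claimed above.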

1. INTRODUCTION

The field of stochastic control is concerned with problems where an agent interacts over time with a random environment through the action of a control. The agent seeks to select the control so as to optimise some objective depending on the trajectory of the system under their control and on the choice of the control itself; since the system is stochastic, such an objective commonly takes the form of an expectation of some pathwise cost or reward. The study of this class of problems is intimately related to reinforcement learning (RL) and has been successfully applied to many fields of modern science, including biology Cucker & Smale (2007), economics Kamien & Schwartz (2012), engineering Grundel et al. (2007), finance Pham (2009), and, more recently, epidemics control Hubert et al. (2022).

Stochastic control is nowadays regarded as a well-established field of mathematics. Two main approaches govern the analysis: the stochastic maximum principle and the dynamic programming approach; see Yong & Zhou (1999); Pham (2009). In either case, an agent is interested in characterising a set of optimal strategies, the dynamics of the system under such strategies, and the optimal value of the corresponding reward functional. The two main sources of complexity in tackling these problems are: 1) the continuous-time nature of the underlying stochastic dynamics, and 2) the presence of memory, which yields a non-negligible impact of the system's history on its future evolution. On the one hand, compared to their discrete-time counterparts, continuous-time stochastic control problems have received an increasing amount of attention in recent years, partly because the underlying physical processes themselves often evolve in continuous time, and partly because of their characterisation via partial differential equations (PDEs) or backward stochastic differential equations (BSDEs).

On the other hand, non-Markovian stochastic control problems, where the evolution of the system depends on its history and not only on its current state, often provide a more faithful class of models for real-world phenomena than their Markovian counterparts, in which the (infinitesimal) displacement of the state dynamics depends only on the current state.
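A simple numerical experiment illustrates what a non-Markovian control problem looks like and how its value can be estimated by Monte-Carlo. The example below is a hedged illustration, not a problem from this paper: the dynamics, the quadratic cost, and the one-parameter historical-average feedback `a_t = theta * mean(X_s, s <= t)` are all chosen for simplicity. The point is that the control at time `t` reads the whole past trajectory, so the controlled state is not a Markov process.

```python
import numpy as np

def mc_cost(theta, n_paths=5000, n_steps=50, T=1.0, seed=0):
    """Monte-Carlo estimate of E[ int_0^T a_t^2 dt + X_T^2 ] for the
    controlled SDE  dX_t = a_t dt + dW_t,  X_0 = 0,  under the
    path-dependent feedback  a_t = theta * (running average of X on [0, t]).
    Euler-Maruyama discretisation on a uniform grid."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X = np.zeros(n_paths)
    running_sum = np.zeros(n_paths)
    cost = np.zeros(n_paths)
    for k in range(n_steps):
        # Control reads the historical average of the path so far
        # (zero before the first step), hence non-Markovian feedback.
        a = theta * running_sum / max(k, 1)
        cost += a**2 * dt                  # accumulate running control cost
        X = X + a * dt + rng.normal(0.0, np.sqrt(dt), n_paths)
        running_sum += X
    return float((cost + X**2).mean())     # add terminal cost X_T^2
```

With `theta = 0` the estimate recovers the uncontrolled value E[W_T^2] = T; sweeping `theta` optimises over this one-parameter family of path-dependent feedbacks. The framework discussed in this paper replaces such a hand-crafted family with a Neural RDE parameterisation of the control.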

