NON-MARKOVIAN STOCHASTIC CONTROL PROBLEMS IN CONTINUOUS-TIME WITH NEURAL RDES

Abstract

We propose a novel framework for solving continuous-time, non-Markovian stochastic optimal control problems by means of neural rough differential equations (Neural RDEs). By parameterising the control process as the solution of a Neural RDE driven by the state process, we show that the joint control-state dynamics are governed by an uncontrolled RDE with structured vector fields, allowing for efficient trajectory simulation, Monte-Carlo estimation of the value function, and backpropagation. To deal with input paths of infinite 1-variation, we refine the universal approximation result of Kidger et al. (2020) to a probabilistic density result for Neural RDEs driven by random rough paths. Experiments on various non-Markovian problems show that the proposed framework is time-resolution invariant and capable of learning optimal solutions with higher accuracy than traditional RNN-based approaches. Finally, we discuss possible extensions of this framework to the setting of non-Markovian continuous-time reinforcement learning and provide promising empirical evidence in this direction.

1. INTRODUCTION

The field of stochastic control is concerned with problems where an agent interacts over time with a random environment through the action of a control. In this setting, the agent seeks to select the control so that some objective, depending on the trajectory of the system under their control and on the choice of control itself, is optimised; commonly, as the system is stochastic, such an objective takes the form of an expectation of some pathwise cost or reward. The study of this class of problems is intimately related to reinforcement learning (RL) and has been successfully applied to many fields of modern science, including biology Cucker & Smale (2007), economics Kamien & Schwartz (2012), engineering Grundel et al. (2007), finance Pham (2009), and, more recently, epidemic control Hubert et al. (2022). Stochastic control is nowadays regarded as a well-established field of mathematics. Two main approaches govern the analysis: the stochastic maximum principle and the dynamic programming approach; see Yong & Zhou (1999); Pham (2009). In either case, an agent is interested in characterising a set of optimal strategies, the dynamics of the system under such strategies, and the optimal value of the corresponding reward functional. The two main sources of complexity in tackling these problems are: 1) the continuous-time nature of the underlying stochastic dynamics, and 2) the presence of memory, yielding a non-negligible impact of the system's history on its future evolution. On the one hand, compared to their discrete-time counterparts, continuous-time stochastic control problems have received an increasing amount of attention in recent years, partly because the underlying physical processes themselves often evolve in continuous time, and partly because of their characterisation via partial differential equations (PDEs) or backward stochastic differential equations (BSDEs).
On the other hand, non-Markovian stochastic control problems, where the evolution of the system depends on its history and not only on its current state, often provide a more faithful class of models for describing real-world phenomena than their Markovian counterparts, where the (infinitesimal) displacement of the state dynamics depends only on the current state. Typical settings where non-Markovian stochastic control problems in continuous time arise include rough volatility models Gatheral et al. (2018) from quantitative finance, in which the non-Markovianity stems from having a fractional Brownian motion as the driving noise. Another common source of non-Markovian problems is delayed control problems, where memory is incorporated into the system by assuming path-dependence of the vector fields governing the dynamics (see Sec. 3 for a precise statement). These are ubiquitous in economics, for example in the study of growth models with delayed production or pension fund models Kydland & Prescott (1982); Salvatore (2011), in marketing for models of optimal advertising with "distributed lag" effects Gozzi et al. (2009), and in finance for portfolio selection in markets with memory and delayed responses Øksendal et al. (2011). See also Kolmanovskii & Shaikhet (1996) for the modelling of systems with after-effects in mechanics, engineering, biology, and medicine. Despite recent theoretical advances in simplified settings, non-Markovian stochastic control problems in continuous time are often not analytically tractable, a fact that motivates the need for efficient numerical schemes to solve them.
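To make the notion of delayed, path-dependent dynamics concrete, the following minimal simulation sketch may help; the scalar delay SDE, its coefficients, and the zero initial segment are illustrative assumptions chosen for this sketch, not a model taken from the works cited above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy delayed SDE: dX_t = (a X_t + b X_{t - tau}) dt + sigma dW_t,
# a simple instance of path-dependent (non-Markovian) dynamics.
a, b, sigma, tau = -0.5, 0.3, 0.2, 0.25
T, N = 1.0, 400
dt = T / N
lag = int(tau / dt)                 # number of Euler steps in the delay window

X = np.zeros(N + 1)                 # X_t = 0 for t <= 0 (initial segment)
for n in range(N):
    X_delayed = X[n - lag] if n >= lag else 0.0
    drift = a * X[n] + b * X_delayed
    X[n + 1] = X[n] + drift * dt + sigma * rng.normal(scale=np.sqrt(dt))
# X now holds one sample path; the drift at time t depends on the history
# through X_{t - tau}, so the current value X_t alone is not a Markov state.
```

Because the drift reads the path a fixed lag into the past, any Markovian solver acting on X_t alone would miss part of the information needed to act optimally, which is precisely the difficulty the delayed control problems above exhibit.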
Additionally, such methods could provide a fruitful basis for (non-Markovian) reinforcement learning in continuous time, studied in the Markovian case recently by Jia & Zhou (2021); Wang et al. (2020).

Contributions. Using the modern tool set offered by neural rough differential equations (Neural RDEs) Morrill et al. (2021), a continuous-time analogue to recurrent neural networks (RNNs), we propose a novel framework which, to the best of our knowledge, is the first numerical approach for solving non-Markovian stochastic control problems in continuous time. More precisely, we parameterise the control process as the solution of a Neural RDE driven by the state process, and show that the joint control-state dynamics are governed by an uncontrolled RDE with vector fields parameterised by neural networks. We demonstrate how this formulation allows for trajectory sampling, Monte-Carlo estimation of the reward functional, and backpropagation. To deal with sample paths of infinite 1-variation, which is necessary in stochastic control, we also extend the universal approximation result of Kidger et al. (2020) to a probabilistic density result for Neural RDEs driven by random rough paths. The interpretation is that we are able to approximate continuous feedback controls arbitrarily well in probability. Through various experiments, we demonstrate how the proposed framework is time-resolution invariant and capable of learning optimal solutions with higher accuracy than traditional RNN-based approaches. Finally, we discuss possible extensions to the setting of non-Markovian reinforcement learning (RL) in continuous time and provide promising empirical evidence in this direction. The rest of the paper is organised as follows: in Sec. 2 we discuss related work, in Sec. 3 we present our numerical scheme and the universality result, in Sec. 4 we study the extension to non-Markovian RL in continuous time, and in Sec. 5 we present our numerical results.

2. RELATED WORK

Over the last decade, a large volume of research has been conducted on solving Markovian stochastic control problems numerically with neural networks, either by directly parameterising the control and then sampling from the state process, as done by Han et al. (2016), or by solving the PDEs or BSDEs associated with the problem; see Germain et al. (2021) for a recent survey of neural-network-based algorithms for stochastic control and PDEs. We also mention two examples from this growing literature. The Deep BSDE model of Han et al. (2017), where the authors propose an algorithm to solve parabolic PDEs and BSDEs in high dimension, viewing the gradient of the solution as the policy function, approximated with a neural network. The Deep Galerkin model Sirignano & Spiliopoulos (2018) is a mesh-free algorithm to solve PDEs associated with the value function of control problems; the authors approximate the solution with a deep neural network trained to satisfy the PDE differential operator, initial condition, and boundary conditions. Recently, signature methods Lyons (2014); Kidger et al. (2019) have been employed for solving both Markovian and non-Markovian control problems in simplified settings Kalsi et al. (2020); Cartea et al. (2022). This approach does not rely on a model underpinning the dynamics of the unaffected processes and has shown excellent results when solving a number of algorithmic trading problems. However, this method has two main drawbacks: (i) it suffers from the curse of dimensionality; this happens when one wishes to compute signatures of a high-dimensional (more than five)
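The simulate-and-estimate principle described in the introduction, evolving a hidden control state as an RDE driven by the (time-augmented) state path, reading off a feedback control, and estimating the cost by Monte-Carlo, can be sketched as follows. All dimensions, the toy linear-quadratic dynamics, and the randomly initialised weights standing in for a trained vector field are illustrative assumptions; the paper's actual parameterisation and training procedure are given in Sec. 3.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions (illustrative): 1-d state, 8-d hidden control state, 1-d action.
d_x, d_h, d_u = 1, 8, 1
T, N, M = 1.0, 100, 256          # horizon, time steps, Monte-Carlo paths
dt = T / N

# Random weights standing in for a trained vector field f_theta and
# control read-out l_theta (here untrained, for the sketch only).
W_f = rng.normal(scale=0.3, size=(d_h, d_h, d_x + 1))
W_l = rng.normal(scale=0.3, size=(d_u, d_h))

def f_theta(h):
    # Vector field of the hidden dynamics: maps the hidden state to a
    # matrix acting on the increment of the driving path (t, X_t).
    return np.tanh(np.einsum("hkc,mk->mhc", W_f, h))

def simulate_cost():
    # Joint Euler scheme for the controlled state X and the hidden
    # control state h, driven by the time-augmented state path (t, X).
    X = np.zeros((M, d_x))
    h = np.zeros((M, d_h))
    running = np.zeros(M)
    for n in range(N):
        u = np.tanh(h @ W_l.T)                   # feedback control u_t from h_t
        dW = rng.normal(scale=np.sqrt(dt), size=(M, d_x))
        dX = -u * dt + 0.2 * dW                  # toy controlled dynamics
        inc = np.concatenate([np.full((M, 1), dt), dX], axis=1)
        h = h + np.einsum("mhc,mc->mh", f_theta(h), inc)  # dh = f_theta(h) d(t, X)
        X = X + dX
        running += dt * (X[:, 0] ** 2 + u[:, 0] ** 2)     # quadratic running cost
    return running.mean()                         # Monte-Carlo cost estimate

J = simulate_cost()
```

In an actual implementation the same forward pass would be written in an autodiff framework so that the Monte-Carlo cost estimate can be differentiated with respect to the weights and minimised by gradient descent; the sketch above only shows the simulation and estimation step.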

