FACTOR LEARNING PORTFOLIO OPTIMIZATION INFORMED BY CONTINUOUS-TIME FINANCE MODELS

Anonymous

Abstract

We study financial portfolio optimization in the presence of unknown and uncontrolled system variables referred to as stochastic factors. Existing work falls into two distinct categories: (i) reinforcement learning employs end-to-end policy learning with flexible factor representation, but does not precisely model the dynamics of asset prices or factors; (ii) continuous-time finance methods, in contrast, take advantage of explicitly modeled dynamics but pre-specify, rather than learn, factor representation. We propose FaLPO (factor learning portfolio optimization), a framework that interpolates between these two approaches. Specifically, FaLPO hinges on deep policy gradient to learn a performant investment policy that takes advantage of flexible representation for stochastic factors. Meanwhile, FaLPO also incorporates continuous-time finance models when modeling the dynamics. It uses the optimal policy functional form derived from such models and optimizes an objective that combines policy learning and model calibration. We prove the convergence of FaLPO and provide performance guarantees via a finite-sample bound. On both synthetic and real-world portfolio optimization tasks, we observe that FaLPO outperforms five leading methods. Finally, we show that FaLPO can be extended to other decision-making problems with stochastic factors.

1. INTRODUCTION

Portfolio optimization studies how to allocate investments across multiple risky financial assets such as stocks and safe assets such as US government bonds. The investment target is often formulated as maximizing the expected utility of the investment portfolio's value at a fixed time horizon, which conceptually maximizes profit while constraining risk (von Neumann & Morgenstern, 1947). With continuous-time stochastic models of stock prices, great advances in the expected utility maximization framework were made in Merton (1969) using stochastic optimal control (dynamic programming) methods. More realistic models incorporate factors like economic indices and proprietary trading signals (Merton et al., 1973; Fama & French, 2015; 1992), which (i) affect the dynamics of stock prices; (ii) stochastically evolve over time; (iii) are not affected by individual investment decisions. With greater data availability, it is natural to design and apply data-driven machine learning methods (Bengio, 1997; Dixon et al., 2020; De Prado, 2018) to handle factors for portfolio optimization. This work proposes a novel method, Factor Learning Portfolio Optimization (FaLPO), which combines tools from both machine learning and continuous-time finance.

Portfolio optimization with stochastic factors is challenging for three reasons. First, financial data is notoriously noisy and idiosyncratic (Goyal & Santa-Clara, 2003), causing complex purely data-driven methods to be unstable and prone to overfitting. Second, the relationship between the factors and their impact on stock prices can be extremely complicated and difficult to model ex ante. Third, many successful finance models are in continuous time and require interacting with the environment infinitely frequently. As a result, such models cannot be easily combined with machine learning methods, many of which are in discrete time.
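
As a rough illustration of the expected utility objective and of the continuous-time versus discrete-time gap, the sketch below revisits the classical factor-free setting of Merton (1969). It is not part of FaLPO. It assumes a single risky asset following geometric Brownian motion, a constant risk-free rate, and CRRA (power) utility; the function names and all parameter values are hypothetical choices made for this example. Under these assumptions the optimal risky weight in continuous time is the constant (mu - r) / (gamma * sigma^2), and the simulation applies that weight with rebalancing only at discrete steps.

```python
import numpy as np


def merton_fraction(mu, r, sigma, gamma):
    """Constant risky-asset weight that is optimal for CRRA utility
    when the risky asset follows geometric Brownian motion (assumed model)."""
    return (mu - r) / (gamma * sigma ** 2)


def crra_utility(wealth, gamma):
    """CRRA (power) utility; gamma is relative risk aversion, gamma != 1."""
    return wealth ** (1.0 - gamma) / (1.0 - gamma)


def simulate_expected_utility(weight, mu, r, sigma, gamma,
                              horizon=1.0, n_steps=52, n_paths=50_000, seed=0):
    """Monte Carlo estimate of E[U(W_T)] with initial wealth 1, when the
    portfolio is rebalanced to a fixed risky weight at each discrete step."""
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    z = rng.standard_normal((n_paths, n_steps))
    # Exact one-step gross returns of the risky (GBM) and safe assets.
    risky_gross = np.exp((mu - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * z)
    safe_gross = np.exp(r * dt)
    # Per-step portfolio gross return for a constant-weight strategy.
    portfolio_gross = (1.0 - weight) * safe_gross + weight * risky_gross
    terminal_wealth = portfolio_gross.prod(axis=1)
    return crra_utility(terminal_wealth, gamma).mean()


if __name__ == "__main__":
    # Hypothetical parameters, chosen only for illustration.
    mu, r, sigma, gamma = 0.08, 0.02, 0.2, 3.0
    w_star = merton_fraction(mu, r, sigma, gamma)
    for w in (0.0, w_star, 1.0):
        u = simulate_expected_utility(w, mu, r, sigma, gamma)
        print(f"risky weight {w:.2f}: estimated expected utility {u:.4f}")
```

Reusing the same seed for every weight gives common random numbers, so differences between the estimated utilities reflect the choice of weight rather than sampling noise. With stochastic factors, the optimal weight is no longer a constant, which is the setting the rest of the paper addresses.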
Current approaches to portfolio optimization broadly fall into two categories: reinforcement learning (RL) and continuous-time finance methods. Many RL solutions to portfolio optimization are built on deep deterministic policy gradient (Silver et al., 2014; Hambly et al., 2021, Section 4.3). Such methods parameterize the policy function as a neural network with strong representation power and learn the neural network by optimizing the corresponding portfolio performance. However, these approaches

