MULTI-AGENT DEEP FBSDE REPRESENTATION FOR LARGE SCALE STOCHASTIC DIFFERENTIAL GAMES

Abstract

In this paper we present a deep learning framework for solving large-scale multi-agent non-cooperative stochastic games using fictitious play. The Hamilton-Jacobi-Bellman (HJB) PDE associated with each agent is reformulated into a set of Forward-Backward Stochastic Differential Equations (FBSDEs) and solved via forward sampling on a suitably defined neural network architecture. Decision making in multi-agent systems suffers from the curse of dimensionality and from strategy degeneration as the number of agents and the time horizon increase. We propose a novel Deep FBSDE controller framework which is shown to outperform the current state-of-the-art deep fictitious play algorithm on a high-dimensional interbank lending/borrowing problem. More importantly, our approach mitigates the curse of many agents and reduces computational and memory complexity, allowing us to scale up to 1,000 agents in simulation, a scale which, to the best of our knowledge, represents a new state of the art. Finally, we showcase the framework's applicability in robotics on a belief-space autonomous racing problem.

1. INTRODUCTION

Stochastic differential games provide a framework for investigating scenarios where multiple players make decisions while operating in a dynamic and stochastic environment. The theory of differential games dates back to the seminal work of Isaacs (1965) studying two-player zero-sum dynamic games, with a first stochastic extension appearing in Kushner & Chamberlain (1969). A key step in the study of games is obtaining the Nash equilibrium among players (Osborne & Rubinstein, 1994). A Nash equilibrium represents the solution of a non-cooperative game involving two or more players: no player can benefit by modifying their own strategy while the opponents keep their equilibrium strategies. In the context of adversarial multi-objective games, the Nash equilibrium can be characterized by a system of coupled Hamilton-Jacobi-Bellman (HJB) equations when the system satisfies the Markovian property. Analytic solutions exist only for a few special cases. Therefore, obtaining the Nash equilibrium solution is usually done numerically, and this can become challenging as the number of states/agents increases. Despite extensive theoretical work, the algorithmic side has received less attention and mainly addresses special cases of differential games (e.g., Duncan & Pasik-Duncan (2015)), or suffers from the curse of dimensionality (Kushner, 2002). Nevertheless, stochastic differential games have a variety of applications, including in robotics and autonomy, economics, and management. Relevant examples include Mataramvura & Øksendal (2008), who formulate portfolio management as a stochastic differential game in order to obtain a market portfolio that minimizes the convex risk measure of a terminal wealth index value, as well as Prasad & Sethi (2004), who investigate optimal advertising spending in duopolistic settings via stochastic differential games.
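For concreteness, the coupled HJB system mentioned above takes the following generic form (the notation here is illustrative: L^i denotes player i's running cost, g^i its terminal cost, and f, G, Σ the drift, actuation, and diffusion terms of the joint dynamics):

```latex
\partial_t V^i + \inf_{u^i}\Big\{ \nabla_x V^{i\,\top}\big(f(x,t) + G(x,t)\,u\big)
  + \tfrac{1}{2}\,\mathrm{tr}\!\big(\Sigma\Sigma^{\top}\nabla_x^2 V^i\big)
  + L^i(x,u,t)\Big\} = 0, \qquad V^i(x,T) = g^i(x), \qquad i = 1,\dots,N,
```

where u = (u^1, ..., u^N) collects all players' controls and the infimum is taken over player i's own control with the opponents' controls fixed at their equilibrium values. The coupling arises because every value function V^i depends on the controls of all players, which is what makes the system hard to solve as N grows.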
Reinforcement Learning (RL) aims at obtaining a policy that generates optimal sequential decisions through interaction with the environment. Commonly, the policy is trained by collecting histories of states, actions, and rewards, and updating the policy accordingly. Multi-agent Reinforcement Learning (MARL) extends RL to settings where several agents compete in a common environment; this is a more complex task due to the interactions both between the agents and the environment and among the agents themselves. One approach is to treat the other agents as part of the environment (Tan, 1993), but this may lead to unstable learning during policy updates (Matignon et al., 2012). On the other hand, a centralized approach formulates MARL over an augmented state and action space, reducing its training to that of a single-agent RL problem. Because of the combinatorial complexity, the centralized learning method cannot scale to more than 10 agents (Yang et al., 2019). Another method is centralized training with decentralized execution (CTDE); the challenge therein lies in how to decompose the value function in the execution phase for value-based MARL. Sunehag et al. (2018) and Zhou et al. (2019) decompose the joint value function into a summation of individual value functions. Rashid et al. (2018) preserve the monotonic relationship between centralized and decentralized value functions by combining the individual value functions non-linearly through a mixing network (QMIX). Further modifications of QMIX include Son et al. (2019) and Mahajan et al. (2019).

The mathematical formulation of a differential game leads to a nonlinear PDE. This motivates algorithmic development for differential games that combines elements of PDE theory with deep learning. Recent encouraging results (Han et al., 2018; Raissi, 2018) in solving nonlinear PDEs within the deep learning community illustrate the scalability and numerical efficiency of neural networks. The transition from a PDE formulation to a trainable neural network is done via the concept of a system of Forward-Backward Stochastic Differential Equations (FBSDEs). Specifically, certain PDE solutions are linked to solutions of FBSDEs, and the latter can be solved using a suitably defined neural network architecture. This is known in the literature as the deep FBSDE approach. Han et al. (2018), Pereira et al. (2019), and Wang et al. (2019b) utilize various deep neural network architectures to solve such stochastic systems; however, these algorithms address single-agent dynamical systems. Two-player zero-sum games using FBSDEs were initially developed in Exarchos et al. (2019) and transferred to a deep learning setting in Wang et al. (2019a). Recently, Hu (2019) brought deep learning into fictitious play to solve multi-agent non-zero-sum games, Han & Hu (2019) introduced deep FBSDEs to the multi-agent scenario along with the concept of fictitious play, and Han et al. (2020) provided a convergence proof.

In this work we propose an alternative deep FBSDE approach to multi-agent non-cooperative differential games, aiming at reducing complexity and increasing the number of agents the framework can handle. The main contribution of our work is threefold:
1. We introduce an efficient Deep FBSDE framework for solving stochastic multi-agent games via fictitious play that outperforms the current state of the art in Relative Squared Error (RSE) and runtime/memory efficiency on an inter-bank lending/borrowing example.
2. We demonstrate that our approach scales to a much larger number of agents (up to 1,000, compared to 50 in existing work). To the best of our knowledge, this represents a new state of the art.
3. We showcase the applicability of our framework to robotics on a belief-space autonomous racing problem with larger individual control and state spaces. The experiments demonstrate that the decoupled BSDE formulation enables applications to competitive scenarios.
The rest of the paper is organized as follows: in Section 2 we present the mathematical preliminaries. In Section 3 we introduce the Deep Fictitious Play Belief FBSDE, with simulation results following in Section 4. We conclude the paper and discuss some future directions in Section 5.

2. MULTI-AGENT FICTITIOUS PLAY FBSDE

Fictitious play is a learning rule first introduced in Brown (1951) in which each player presumes the other players' strategies to be fixed. An N-player game can then be decoupled into N individual decision-making problems which can be solved iteratively over M stages. When each agent^1 converges to a stationary strategy at stage m, this strategy becomes the stationary strategy assumed by the other players at stage m + 1. We consider an N-player non-cooperative stochastic differential game with dynamics

dX(t) = [f(X(t), t) + G(X(t), t) U(t)] dt + Σ(X(t), t) dW(t),    X(0) = X_0,

where X = (x_1, x_2, ..., x_N) is a vector containing the state processes of all agents, generated by their controls U = (u_1, u_2, ..., u_N), with x_i ∈ R^{n_x} and u_i ∈ R^{n_u}. Here, f : R^{n_x} × [0, T] → R^{n_x} represents the drift dynamics, G : R^{n_x} × [0, T] → R^{n_x × n_u} represents the actuator dynamics, and Σ : R^{n_x} × [0, T] → R^{n_x × n_w} represents the diffusion term. We assume that each agent is driven only by its own controls, so G is a block-diagonal matrix with blocks G_i corresponding to the actuation of agent i.

^1 Agent and player are used interchangeably in this paper.
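As a concrete illustration of dynamics of this form, the following is a minimal Euler-Maruyama rollout sketch of the joint state, assuming a block-diagonal actuation matrix so that each agent is driven only by its own control. All function names, shapes, and coefficients here are hypothetical placeholders, not the paper's model:

```python
import numpy as np

def simulate(x0, policies, f, G_blocks, sigma, T=1.0, steps=100, rng=None):
    """Euler-Maruyama rollout of dX = [f(X,t) + G(X,t) U(t)] dt + Sigma dW.

    x0       : (N*nx,) joint initial state
    policies : list of N callables, policies[i](x, t) -> (nu,) control of agent i
    f        : drift callable, f(x, t) -> (N*nx,)
    G_blocks : callable G_blocks(x, t) -> list of N (nx, nu) diagonal blocks of G
    sigma    : (N*nx, nw) constant diffusion matrix (a simplifying assumption)
    """
    if rng is None:
        rng = np.random.default_rng(0)
    dt = T / steps
    x = x0.copy()
    N = len(policies)
    nx = x0.size // N
    for k in range(steps):
        t = k * dt
        u = [pi(x, t) for pi in policies]        # each agent's control
        blocks = G_blocks(x, t)
        drift = f(x, t).copy()
        for i in range(N):                       # block-diagonal G: agent i's
            drift[i*nx:(i+1)*nx] += blocks[i] @ u[i]  # block acts on x_i only
        dW = rng.normal(0.0, np.sqrt(dt), size=sigma.shape[1])
        x = x + drift * dt + sigma @ dW
    return x
```

In the inter-bank example, for instance, a mean-reverting drift toward the ensemble average could be plugged in as `f`, with each `G_blocks[i]` mapping agent i's lending/borrowing rate into its own state.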

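The stage-wise fictitious play rule described above can be illustrated on a toy static game in which each agent's best response is available in closed form. This is a hypothetical example, not the paper's differential game: agent i's cost is (u_i - a * mean of the others' actions)^2, so its best response simply tracks a fraction a of the opponents' previous-stage average.

```python
import numpy as np

def fictitious_play(N=5, a=0.5, M=50, rng=None):
    """Stage-wise fictitious play: at stage m, every agent best-responds to
    the other agents' stage-(m-1) strategies, which are held fixed.

    Toy game: agent i's cost is (u_i - a * mean_{j != i} u_j)^2, so its
    closed-form best response is u_i = a * mean of the others. For |a| < 1
    the iteration contracts toward the Nash equilibrium u = 0.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    u = rng.normal(size=N)                  # stage-0 strategies
    for m in range(M):
        u_prev = u.copy()                   # opponents frozen at stage m-1
        for i in range(N):
            others = np.delete(u_prev, i)
            u[i] = a * others.mean()        # closed-form best response
    return u

u_star = fictitious_play()
```

The same decoupling structure underlies the differential-game setting: the closed-form best response is replaced by an individual stochastic optimal control problem (here, an FBSDE solve) for each agent at each stage.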
