NEURAL LYAPUNOV MODEL PREDICTIVE CONTROL

Abstract

With a growing interest in data-driven control techniques, Model Predictive Control (MPC) provides a significant opportunity to exploit the surplus of data reliably, particularly while taking safety and stability into account. In this paper, we aim to infer the terminal cost of an MPC controller from transitions generated by an initial unknown demonstrator. We propose an algorithm to alternately learn the terminal cost and update the MPC parameters according to a stability metric. We design the terminal cost as a Lyapunov function neural network and theoretically show that, under limited approximation error, our proposed approach guarantees that the size of the stability region (region of attraction) is greater than or equal to that of the initial demonstrator. We also present theorems that characterize the stability and performance of the learned MPC in the presence of model uncertainties and sub-optimality due to function approximation. Empirically, we demonstrate the efficacy of the proposed algorithm on non-linear continuous control tasks with soft constraints. Our results show that the proposed approach can, in practice, improve upon the initial demonstrator and achieve better task performance than other learning-based baselines.

1. INTRODUCTION

Control systems are subject to safety requirements that need to be considered during the controller design process. In most applications, these take the form of state/input constraints and convergence to an equilibrium point, a specific set, or a trajectory. Typically, a control strategy that violates these specifications can lead to unsafe behavior. While learning-based methods are promising for solving challenging non-linear control problems, the lack of interpretability and provable safety guarantees impedes their use in practical control settings (Amodei et al., 2016). Model-based reinforcement learning (RL) with planning uses a surrogate model to minimize the sum of future costs plus a learned value function as terminal cost (Moerland et al., 2020; Lowrey et al., 2018). Approximated value functions, however, do not offer safety guarantees. By contrast, control theory focuses on these guarantees but is limited by its assumptions. Thus, there is a gap between theory and practice. A feedback controller stabilizes a system if a local Control Lyapunov Function (CLF) exists for the pair. This requires that the closed-loop response from any initial state results in a smaller value of the CLF at the next state. The existence of such a function is a necessary and sufficient condition for stability and convergence (Khalil, 2014). However, finding an appropriate Lyapunov function is often cumbersome and can be conservative. By exploiting the expressiveness of neural networks (NNs), Lyapunov NNs have been demonstrated as a general tool to produce stability (safety) certificates (Bobiti, 2017; Bobiti & Lazar, 2016) and also to improve an existing controller (Berkenkamp et al., 2017; Gallieri et al., 2019; Chang et al., 2019). In most of these settings, the controller is parameterized through a NN as well. The flexibility provided by this choice comes at the cost of increased sample complexity, which is often expensive in real-world safety-critical systems.
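The decrease condition above can be illustrated with a minimal sketch. The following is not the paper's method: it uses a hypothetical linear system with a quadratic Lyapunov candidate V(x) = xᵀPx (in place of a Lyapunov NN) and an assumed stabilizing gain K, and checks that V decreases along the closed-loop response from sampled initial states.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Hypothetical stable closed loop x_{k+1} = (A - B K) x_k.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
K = np.array([[1.0, 1.5]])  # assumed stabilizing feedback gain
A_cl = A - B @ K

# P solves the discrete Lyapunov equation A_cl' P A_cl - P = -Q,
# so V(x) = x' P x is a valid Lyapunov function for this loop.
Q = np.eye(2)
P = solve_discrete_lyapunov(A_cl.T, Q)

def V(x):
    return float(x @ P @ x)

# Certify the CLF decrease condition V(x_{k+1}) < V(x_k) on samples
# (a learned Lyapunov NN would be checked in the same pointwise way).
rng = np.random.default_rng(0)
samples = rng.uniform(-1.0, 1.0, size=(1000, 2))
decreases = all(V(A_cl @ x) < V(x)
                for x in samples if np.linalg.norm(x) > 1e-9)
print(decreases)  # True for this stabilizing K
```

For this linear case the decrease holds exactly by construction; for a Lyapunov NN and a non-linear system, the same pointwise check is what sampling-based certification methods (e.g. Bobiti & Lazar, 2016) perform over a candidate region.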
In this work, we aim to overcome this limitation by leveraging an initial set of one-step transitions from an unknown expert demonstrator (which may be sub-optimal) and by using a learned Lyapunov function and surrogate model within a Model Predictive Control (MPC) formulation. Our key contribution is an algorithmic framework, Neural Lyapunov MPC (NLMPC), that obtains a single-step horizon MPC for Lyapunov-based control of non-linear deterministic systems with constraints. By treating the learned Lyapunov NN as an estimate of the value function, we provide theoretical results for the performance of the MPC with an imperfect forward model. These results complement the ones by Lowrey et al. (2018), which only consider the case of a perfect dynamics
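To make the single-step formulation concrete, the following is a minimal sketch (not the paper's implementation): a one-step MPC that minimizes a stage cost plus a Lyapunov terminal cost, u*(x) = argmin_u ℓ(x, u) + V(f(x, u)) subject to an input constraint. The model f, the quadratic V (standing in for the learned Lyapunov NN), and all weights here are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative stand-ins for the surrogate model and learned terminal cost.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
P = np.array([[10.0, 2.0], [2.0, 5.0]])  # placeholder for the Lyapunov NN

def f(x, u):
    """Surrogate one-step dynamics model."""
    return A @ x + B.flatten() * u

def V(x):
    """Terminal cost; in the paper this is a learned Lyapunov NN."""
    return float(x @ P @ x)

def stage_cost(x, u):
    return float(x @ x) + 0.1 * u ** 2

def one_step_mpc(x, u_max=1.0):
    """Single-step horizon MPC: minimize stage cost + terminal cost."""
    obj = lambda u: stage_cost(x, u) + V(f(x, u))
    res = minimize_scalar(obj, bounds=(-u_max, u_max), method="bounded")
    return res.x

x = np.array([0.5, -0.2])
u = one_step_mpc(x)
print(abs(u) <= 1.0)  # the input constraint is enforced by the bounds
```

Because the horizon is a single step, the terminal cost dominates the optimization; this is why the quality of the learned Lyapunov function directly determines closed-loop stability in the NLMPC setting.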

