ADVERSARIALLY ROBUST NEURAL LYAPUNOV CONTROL

Abstract

State-of-the-art learning-based stability control methods for nonlinear robotic systems suffer from the reality gap, which stems from the discrepancy between the system dynamics of the training and target (test) environments. To mitigate this gap, we propose an adversarially robust neural Lyapunov control (ARNLC) method to improve the robustness and generalization of Lyapunov theory-based stability control. Specifically, inspired by adversarial learning, we introduce an adversary to simulate the dynamics discrepancy; the adversary is trained through deep reinforcement learning to generate the worst-case perturbations during the controller's training. By alternately updating the controller to minimize the perturbed Lyapunov risk and the adversary to deviate the controller from its objective, the learned control policy enjoys a theoretical guarantee of stability. Empirical evaluations on five stability control tasks with uniform and worst-case perturbations demonstrate that ARNLC not only accelerates convergence to asymptotic stability, but also generalizes better over the entire perturbation space.

1. INTRODUCTION

Designing a stable and robust controller to stabilize nonlinear dynamical systems has long been a challenge. Lyapunov stability theory plays a significant role in controller design for the stability control of robotic systems (Uddin et al., 2021; Sharma & Kumar, 2020; Liu et al., 2020b; Norouzi et al., 2020; Pal et al., 2020). However, many previous approaches are restricted to polynomial approximations of the system dynamics (Kwakernaak & Sivan, 1969; Parrilo, 2000), and suffer from sensitivity issues when searching for Lyapunov functions (Löfberg, 2009). Recently, by leveraging deep learning, several works have successfully combined Lyapunov stability theory with the expressive power of neural networks and the convenience of gradient-based training (Chang et al., 2019; Abate et al., 2020; Mehrjou et al., 2021; Dawson et al., 2022). One outstanding method among them is neural Lyapunov control (NLC) (Chang et al., 2019), in which both the Lyapunov function and the controller policy are approximated by neural networks, trained by minimizing a Lyapunov risk derived from the Lyapunov stability theorem. Nevertheless, most existing learning-based controllers are trained without any distinction between the training and test environments (Cobbe et al., 2019; Witty et al., 2021). Since the training simulator cannot perfectly model the target environment, such a modelling error (i.e., a discrepancy in system dynamics) inevitably incurs a reality gap, which degrades the controller's performance at actual deployment. Hence, learning-based controllers need to account for the uncertainty in physical parameters (or external forces) that may cause the modelling error (Liu et al., 2020a; Garg & Panagou, 2021; Islam et al., 2015; Zhao et al., 2020).
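To make the NLC objective described above concrete, the following is a minimal NumPy sketch of an empirical Lyapunov risk; the function names and the toy linear system are our own illustrative choices, not taken from the paper:

```python
import numpy as np

def lyapunov_risk(V, grad_V, f, states):
    """Empirical Lyapunov risk in the spirit of NLC (Chang et al., 2019):
    penalize states where the candidate V is non-positive or where its
    Lie derivative dV/dt = grad_V(x) . f(x) fails to be negative, and
    anchor V(0) = 0 via the V(0)^2 term. (Illustrative sketch only.)"""
    risk = 0.0
    for x in states:
        risk += max(0.0, -V(x))                    # V must be positive definite
        risk += max(0.0, np.dot(grad_V(x), f(x)))  # dV/dt must be negative
    risk /= len(states)
    return risk + float(V(np.zeros_like(states[0]))) ** 2

# Toy check on a stable linear system dx/dt = -x with V(x) = ||x||^2:
V = lambda x: float(np.dot(x, x))
grad_V = lambda x: 2.0 * x
stable_f = lambda x: -x
states = [np.array([1.0, 0.5]), np.array([-0.3, 2.0])]
```

On this toy system the risk is zero, since V is positive definite and dV/dt = -2||x||^2 < 0 everywhere; an unstable system such as f(x) = x would instead incur a positive risk.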
Motivated by this, in this paper we address the challenging problem of learning a controller that stabilizes a nonlinear dynamical system in the face of such modelling errors. Over the years, several approaches have been proposed to alleviate the controller's performance degradation incurred by modelling errors. The majority of them are built upon another prominent learning-based control method: deep reinforcement learning (RL) (Sutton & Barto, 2018; Schulman et al., 2017b). These deep RL-based control methods treat the modelling error as an extra disturbance to the system (Başar & Bernhard, 2008), and have achieved great success in control (Pinto et al., 2017; Tessler et al., 2019; Zhang et al., 2020; 2021; Mankowitz et al., 2020). For example, in robust adversarial reinforcement learning (RARL), policy learning is formulated as a zero-sum game between the controller and an adversary that generates disturbances to the system, and the learned controller is proved to have improved robustness and generalization. However, since RL methods train policies by maximizing the sum of expected rewards that the agent obtains while interacting with the environment, their performance depends greatly on a manually designed reward function, and the learned policy is sensitive to the preset control interval (Tallec et al., 2019; Park et al., 2021). Hence, RL is prone to fail in control tasks with a relatively small control interval, as will be verified in our experiments. Our aim, in contrast, is to find a control policy that enables stable control and is also robust to the choice of control interval. In this paper, we present a novel method that automatically learns robust control policies with a provable guarantee of stability. Specifically, we formulate a perturbed Lyapunov risk for learning a controller in a dynamical system that is subject to the adversary's perturbations within a certain range.
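In symbols, the perturbed Lyapunov risk and the resulting min-max learning problem might take the following form (a hedged reconstruction: the notation V_theta for the Lyapunov candidate, u_phi for the controller, delta for the bounded perturbation, and rho for the state-sampling distribution are our assumptions, and the paper's exact formulation may differ):

```latex
% Hedged sketch; V_\theta, u_\phi, \delta, \rho are assumed notation.
\mathcal{L}(\theta,\phi;\delta) \;=\;
  \mathbb{E}_{x \sim \rho}\Big[
      \max\big(0,\, -V_\theta(x)\big)
    + \max\big(0,\, \nabla V_\theta(x)^{\top}\big(f(x, u_\phi(x)) + \delta\big)\big)
  \Big] + V_\theta(0)^2,
\qquad
\min_{\theta,\phi}\ \max_{\|\delta\|\le \epsilon}\ \mathcal{L}(\theta,\phi;\delta).
```

The only change relative to the standard NLC risk is that the decrease condition is evaluated along the perturbed vector field f(x, u_phi(x)) + delta rather than the nominal one.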
To train the controller policy to resist the worst-case perturbations within that range, we formulate the learning of the adversary as a Markov decision process (MDP), and train the adversary policy by proximal policy optimization (PPO). In the case of known system dynamics, the action space of the MDP can be the range of external forces or the space of physical parameters that cause the modelling error. In the more practical case of unknown dynamics, the original NLC no longer works, since updating the networks is infeasible without prior knowledge of the system dynamics. We therefore train an environment model by sampling from the system, and set the adversary's action as an offset to the output of this environment model. We further formulate an adversarially robust controller learning problem, which is approximately solved by alternately updating the controller policy with Lyapunov methods and the adversary policy with PPO. Our contributions can be summarized as follows.

• We propose a perturbed Lyapunov risk for learning the control policy under perturbations.

• We formulate an optimization problem for adversarially robust controller learning, to learn a policy in the face of the worst-case perturbations imposed by the RL-trained adversary.

• We propose an adversarially robust neural Lyapunov control (ARNLC) approach to approximately solve this problem, and demonstrate its performance on several stability control tasks.
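The alternating min-max scheme described above can be illustrated end to end on a toy scalar system. Everything in this sketch is an illustrative stand-in: the paper learns both the Lyapunov function and the controller as neural networks and trains the adversary with PPO, whereas here the Lyapunov candidate is a fixed quadratic, the controller is a single gain, and a greedy closed-form adversary replaces the learned one:

```python
import numpy as np

# Toy scalar system dx/dt = a*x + u + delta with |delta| <= d, a linear
# controller u = -k*x, and a fixed Lyapunov candidate V(x) = x^2.
a, d = 1.0, 0.5

def worst_delta(x):
    # Greedy adversary: within [-d, d], delta = d*sign(x) maximizes the
    # Lie derivative 2*x*((a - k)*x + delta).
    return d * np.sign(x)

def perturbed_risk(k, xs):
    # Hinge on the Lie derivative under the worst-case perturbation.
    lie = 2.0 * xs * ((a - k) * xs + worst_delta(xs))
    return float(np.mean(np.maximum(0.0, lie)))

def train(k=0.0, lr=0.05, iters=500, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        xs = rng.uniform(0.5, 2.0, size=32)   # sample states away from origin
        lie = 2.0 * xs * ((a - k) * xs + worst_delta(xs))
        active = lie > 0.0                     # samples where the hinge is on
        if active.any():
            # d(risk)/dk = -2*x^2 on active samples; take a descent step.
            k += lr * float(np.mean(2.0 * xs[active] ** 2))
    return k

k = train()
```

The gain k grows until the worst-case Lie derivative is non-positive on the sampled region (here roughly k >= a + d / x_min), mirroring the alternating structure above: gradient descent on the perturbed risk for the controller against a worst-case adversary.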

2. RELATED WORK

Robust model predictive control (Robust MPC). Robust MPC is another research branch that deals with uncertainty in physical parameters (Sun et al., 2018; Hu & Ding, 2019; Köhler et al., 2021). At every sampling instant, it searches for the optimal feedback law among all feasible feedback laws within a given finite horizon, with respect to a given control performance criterion (Houska & Villanueva, 2019). However, it is usually restricted to additive disturbances (Löfberg, 2003) and is computationally expensive (Bemporad & Morari, 1998).

Adversarial training. The idea of viewing the gap between training and test scenarios as an extra disturbance to the system was first proposed by Morimoto & Doya (2005), who formulated the problem as finding a min-max solution of a value function that takes the perturbations into account. Inspired by Morimoto & Doya (2005), Pinto et al. (2017) propose robust adversarial reinforcement learning (RARL), where an adversary is learned simultaneously to prevent the agent from accomplishing its goal, and the agent's policy and the adversary's policy are trained alternately. Zhang et al. (2020) propose robust reinforcement learning under perturbed state observations, introducing an adversary that applies disturbances to the agent's state observations. Tessler et al. (2019) focus on a scenario where the agent attempts to perform an action that behaves differently from expected due to disturbances. All of the above works mainly study training an adversary in RL settings, whereas our focus in this work is on introducing adversarial training to Lyapunov stability control.

Neural Lyapunov stability control. Chang et al. (2019) propose neural Lyapunov control, which uses neural networks to learn both the control and Lyapunov functions for nonlinear dynamical systems based on Lyapunov stability theory. Saha et al. (2021) learn a control law that stabilizes an unknown nonlinear dynamical system; however, it requires a manually designed Lyapunov function. Dawson et al. (2022) also propose an approach for learning robust nonlinear controllers based on robust convex optimization and Lyapunov theory, achieving generalization beyond the system parameters seen during training; however, that approach only considers control-affine dynamical systems, not more general nonlinear ones. In this work, we focus on improving the robustness and generalization of control policies for nonlinear dynamical systems.

