LEARNING SYSTEM DYNAMICS FROM SENSORY INPUT UNDER OPTIMAL CONTROL PRINCIPLES

Abstract

Identifying the underlying dynamics of actuated physical systems from sensory input is of high interest in control, robotics, and engineering in general. In the context of control problems, existing approaches decouple the construction of the feature space where the dynamics identification process occurs from the target control tasks, potentially leading to a mismatch between feature and real state spaces: the systems may not be controllable in feature space, and synthesized controls may not be applicable in the state space. Borrowing from the Koopman formalism, we propose instead to learn an embedding of both the states and controls in feature spaces where the dynamics are linear, and to include the target control task in the learning objective in the form of a differentiable and robust optimal control problem. We validate this approach with simulation experiments of systems with non-linear dynamics, demonstrating that the controls obtained in feature space can be used to drive the corresponding physical systems and that the learned model can serve for future state prediction.

1. INTRODUCTION

The study of dynamical systems is a key element in understanding most physical phenomena. Such systems are governed by ordinary differential equations of state variables that contain enough information to describe and determine their behavior, and analytical models of these systems are traditionally derived as solutions of the differential equations in question. However, most real-life phenomena are hard to model mathematically in full, for several reasons: they may have very complex dynamics, with complex and constantly changing interactions with the environment, and the state of the physical systems involved may be unknown or not fully observable. On the other hand, the physical systems themselves, if not their internal states, can be observed through sensory data providing implicit information about the underlying (and unknown) states. Leveraging measurement data is thus natural, and is indeed done in a wide range of approaches that build representations of systems from past measurements in the form of feature spaces (Brunton et al., 2016b; Arbabi et al., 2018; Bruder et al., 2019; Brunton et al., 2021). These models are of high practical interest since they enable compact representations compared to the density of measurements (e.g., when measurements are images). They also enable lifting the state of the system to a higher-dimensional space where predictive models can be built. However, even when effective, these estimated models and feature spaces remain largely uninterpretable, and using them to solve control problems remains challenging. Linear models, on the contrary, are easily interpretable, and enable exact and effective control when coupled with LQR solvers. In particular, the Koopman operator theory (Koopman, 1931) has gained a lot of interest recently (Proctor et al., 2016; Brunton et al., 2016b; Abraham et al., 2017; Morton et al., 2018; Korda & Mezić, 2018; Arbabi et al., 2018; Brunton et al., 2021).
It guarantees the existence of a linear (if typically infinite-dimensional) representation of the dynamics of observables (vector-valued functions) defined over the state space. Finite-dimensional approximations have been proposed, and dynamic mode decomposition (DMD) (Schmid, 2010) is of particular interest in this context. In DMD, an approximation of the Perron-Frobenius operator, adjoint to the Koopman operator, is constructed in the form of a matrix that maps one observation to the next. Proctor et al. (2016) first extended the use of DMD to actuated systems and modeled the system dynamics as a linear function of the state representation and the control. Several works have built upon this approximation (Morton et al., 2018; Li et al., 2020), and various methods for estimating the corresponding operators have been proposed (Morton et al., 2018; Xiao et al., 2021; Li et al., 2020). In all these works, the operators are constructed to solve a prediction task, assuming the controls are known; once obtained, they are used in a control task, typically an LQR problem. However, there are two main issues with this decoupled approach. First, the learned features may not be adapted to the control task, since they were not trained for it. Thus, the modeled dynamics are not guaranteed to be effective when used as (linear) constraints to minimize a given (quadratic) cost. We believe that including the control problem in the learning process should help learn features that are well suited for both prediction and control. Second, when looking for a pair of matrices that satisfy the desired linearity property in the feature space, it is assumed that the dynamics are linear in the real controls. This is a strong assumption that is not necessarily satisfied, and is not justified by Koopman operator theory.
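The DMD-with-control approximation of Proctor et al. (2016) can be sketched as a least-squares problem: given snapshot matrices of states and controls, find A and B such that x_{t+1} ≈ A x_t + B u_t. A minimal NumPy sketch on synthetic data (the matrices A_true and B_true are illustrative, used only to generate snapshots):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, T = 3, 1, 200  # state dim, control dim, number of snapshots

# Ground-truth linear system, used only to generate synthetic snapshots.
A_true = np.array([[0.9, 0.1, 0.0],
                   [0.0, 0.8, 0.2],
                   [0.1, 0.0, 0.7]])
B_true = np.array([[1.0], [0.0], [0.5]])

X = np.zeros((n, T))
X[:, 0] = rng.normal(size=n)
U = rng.normal(size=(p, T - 1))
for t in range(T - 1):
    X[:, t + 1] = A_true @ X[:, t] + B_true @ U[:, t]

# DMD with control: solve X' = [A B] [X; U] in the least-squares sense.
Omega = np.vstack([X[:, :-1], U])       # stacked state/control snapshots
G = X[:, 1:] @ np.linalg.pinv(Omega)    # least-squares estimate of [A B]
A_hat, B_hat = G[:, :n], G[:, n:]
```

On noiseless data from a truly linear system, this regression recovers A and B exactly; the decoupling issue discussed above arises when the same fit is applied to features of a non-linear system and the estimate is then reused, untrained for it, in an LQR problem.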
In addition, the dimension of the observation representations is usually larger than that of the (unknown) states, resulting in a representation that has potentially more degrees of freedom than the real system. Thus, if the dimension of the control in feature space remains that of the original system, the system might not be controllable anymore. This follows from the Kalman criterion for controllability (Kalman, 1964), which requires the controllability matrix to be full rank, a condition facilitated by observation and control feature spaces of similar dimensions. To address this issue, we propose to also lift the controls to a higher-dimensional space in order to avoid rank-deficiency issues in the controllability matrix. To summarize, we learn in this work a representation of dynamical systems from measurements in which a linear model of their dynamics can be identified and used to predict their behaviour and control them. In this representation space, both states and controls are lifted to a higher dimension, which makes the system linearly controllable in this space. We are able to do so by directly including a control task in the learning objective, which allows us to learn representations that are better suited for control. We include experiments that show the effectiveness of our approach in controlling pendulum and cartpole systems in simulation.
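The rank-deficiency issue can be checked directly. By the Kalman criterion, a discrete linear system (A, B) with A of size N×N is controllable iff the controllability matrix [B, AB, ..., A^(N-1)B] has rank N. The sketch below (the matrices are made up for the illustration) shows how lifting the state while keeping a scalar control can lose controllability:

```python
import numpy as np

def controllability_matrix(A, B):
    """Kalman controllability matrix [B, AB, A^2 B, ..., A^(N-1) B]."""
    blocks = [B]
    for _ in range(A.shape[0] - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

def is_controllable(A, B):
    return bool(np.linalg.matrix_rank(controllability_matrix(A, B))
                == A.shape[0])

# A 2-state system with a scalar control: controllable.
A2 = np.array([[1.0, 0.1],
               [0.0, 1.0]])
B2 = np.array([[0.0],
               [0.1]])
ok_original = is_controllable(A2, B2)   # True

# A 4-dimensional lifted representation (illustrative, with repeated
# modes) driven by the same scalar control: no longer controllable.
A4 = 0.5 * np.eye(4)
B4 = np.ones((4, 1))
ok_lifted = is_controllable(A4, B4)     # False: rank of [B, AB, ...] is 1
```

Widening B4 (i.e., lifting the control as proposed above) restores full rank; for instance is_controllable(A4, np.eye(4)) holds.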

2. METHOD

2.1. CONTROLLED DYNAMICAL SYSTEMS

Actuated dynamical systems are dynamical systems whose state x(t) ∈ X ⊂ R^n follows a differential equation of the form ẋ(t) = f(x(t), u(t)), where f is a (non-linear) function and u(t) a control input in a control space U ⊂ R^p. In this work, we study discrete-time actuated dynamical systems, i.e., systems whose discrete state x_t in X ⊂ R^n follows an equation of the type

x_{t+1} = f_t(x_t, u_t),    (1)

where u_t in U ⊂ R^p is a control vector. In practical settings, the model f_t is unknown, and the states x_t are unknown or only partially observable through sensors g_t. In this work, we consider measurements d_t = g_t(x_t), and seek to learn encodings of the unknown states of the system from these measurements.
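As a concrete instance of Eq. (1), a forward-Euler discretization of an actuated pendulum (one of the systems used in the experiments) yields a map x_{t+1} = f(x_t, u_t). The constants below (step size, mass, length) are illustrative choices, not values from the paper:

```python
import numpy as np

def pendulum_step(x, u, dt=0.05, g=9.81, l=1.0, m=1.0):
    """One forward-Euler step of an actuated pendulum.

    State x = (theta, theta_dot); u is a scalar torque.
    The constants dt, g, l, m are illustrative."""
    theta, theta_dot = x
    theta_ddot = -(g / l) * np.sin(theta) + u / (m * l ** 2)
    return np.array([theta + dt * theta_dot,
                     theta_dot + dt * theta_ddot])

# Roll out a short trajectory under zero control: released at
# theta = pi/4, the pendulum swings back towards the bottom.
x = np.array([np.pi / 4, 0.0])
traj = [x]
for _ in range(100):
    x = pendulum_step(x, 0.0)
    traj.append(x)
```

The dynamics f are non-linear in the state through sin(theta); in the practical setting of the paper, neither f nor the state (theta, theta_dot) is available, only image measurements d_t.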

2.2. APPROACH

We want to learn encodings of both the states and the controls that have three properties. First, the code for the system state at time t, constructed from system measurements at that time, should contain enough information to capture the behavior of the system at that time. Second, we want the dynamics to be linear in code space, even though they may be arbitrary in the original state space. This is motivated in part by the Koopman linear representation of arbitrary dynamics for non-actuated systems. Third, we want the system to be controllable in the learned representation space. Because of the first and second properties, the system dynamics are lifted to a higher-dimensional space. In such a space, the system representation has more degrees of freedom, and might not be controllable with the original controls anymore. In our approach, we propose to also learn an encoding of the controls through a second autoencoder, in order to lift the controls to a higher-dimensional space.

Encoding. Let us consider a dynamical system governed by Eq. (1). We assume that the states x_t and the sensor models g_t are unknown, and that we only have access to a sequence of T measurements d_t in I ⊂ R^{C×H×W} of the system (images in our case). We want to learn the parameters of encoders ϕ : I → R^n and ψ : U → R^d such that:

z_t = ϕ(d_t),  c_t = ψ(u_t),  for t = 1, . . . , T.    (2)
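The encoding structure above can be sketched with fixed random-feature maps standing in for the learned encoders ϕ and ψ (in the paper these are trained jointly with the control objective, and ϕ takes images as input; here a low-dimensional state, a toy dynamics f, and the feature dimensions are all illustrative). Both states and controls are lifted, and a linear model Z' ≈ A Z + B C is then fit in the lifted space:

```python
import numpy as np

rng = np.random.default_rng(1)
n_lift, d_lift = 16, 4  # lifted state / control dimensions (illustrative)

# Fixed random-feature maps standing in for the learned encoders.
W_x = rng.normal(size=(n_lift, 2))
W_u = rng.normal(size=(d_lift, 1))
phi = lambda x: np.tanh(W_x @ x)   # state encoder   z_t = phi(x_t)
psi = lambda u: np.tanh(W_u @ u)   # control encoder c_t = psi(u_t)

def f(x, u):
    """Toy non-linear dynamics standing in for the unknown system."""
    return np.array([0.9 * x[0] + 0.1 * np.sin(x[1]) + 0.1 * u[0],
                     0.9 * x[1] + 0.1 * x[0] ** 2])

# Generate a trajectory, encode states and controls, and fit
# Z' = A Z + B C in the lifted space by least squares.
T = 500
X = np.zeros((2, T))
X[:, 0] = 0.1 * rng.normal(size=2)
U = rng.normal(size=(1, T - 1))
for t in range(T - 1):
    X[:, t + 1] = f(X[:, t], U[:, t])

Z = np.column_stack([phi(X[:, t]) for t in range(T)])
C = np.column_stack([psi(U[:, t]) for t in range(T - 1)])
G = Z[:, 1:] @ np.linalg.pinv(np.vstack([Z[:, :-1], C]))
A_hat, B_hat = G[:, :n_lift], G[:, n_lift:]
```

The key difference from the decoupled DMD-with-control setting is that the controls are lifted through ψ as well, so A and B act between spaces of comparable dimension (16 and 4 here, rather than 16 and 1).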

