LEARNING INSTANCE-SOLUTION OPERATOR FOR OPTIMAL CONTROL

Abstract

Optimal control problems (OCPs) involve finding a control function for a dynamical system such that a cost functional is optimized; such problems are central to physical-system research in both academia and industry. In this paper, we propose a novel instance-solution operator learning perspective, which solves OCPs in a one-shot manner, with no dependence on an explicit expression of the dynamics or on iterative optimization processes. The design is in principle endowed with a substantial speedup in running time, and model reusability is guaranteed by high-quality in- and out-of-distribution generalization. We theoretically validate the perspective by presenting approximation bounds for instance-solution operator learning. Experiments on 7 synthetic environments and a real-world dataset verify the effectiveness and efficiency of our approach. The source code will be made publicly available.

1. INTRODUCTION

The explosion of data for embedding the physical world is reshaping the ways we understand, model, and control dynamical systems. Though control theory has been classically rooted in a model-based design and solving paradigm, the demands of model reusability and the opacity of complex dynamical systems call for a rapprochement of modern control theory, machine learning, and optimization. Recent years have witnessed emerging trends of control theory with successful applications to engineering and scientific research, such as robotics (Krimsky & Collins, 2020), aerospace technology (He et al., 2019), and economics and management (Lapin et al., 2019), etc.

We consider the well-established formulation of optimal control (Kirk, 2004) over a finite time horizon $T = [t_0, t_f]$. Denote by $X$ and $U$ two vector-valued function sets, representing state functions and control functions respectively. Functions in $X$ (resp. $U$) are defined over $T$ and take values in $\mathbb{R}^{d_x}$ (resp. $\mathbb{R}^{d_u}$). State functions $x \in X$ and control functions $u \in U$ are governed by a differential equation. The optimal control problem (OCP) is targeted at finding a control function that minimizes the cost functional $f$ (Lions, 1992; Kirk, 2004; Vinter & Vinter, 2010; Lewis et al., 2012):

$$\min_{u \in U} \; f(x, u) = \int_{t_0}^{t_f} p(x(t), u(t)) \, dt + h(x(t_f)) \tag{1a}$$
$$\text{s.t.} \quad \dot{x}(t) = d(x(t), u(t)), \tag{1b}$$
$$x(t_0) = x_0, \tag{1c}$$

where $d$ is the dynamics of the differential equation; $p$ evaluates the cost alongside the dynamics and $h$ evaluates the cost at the terminal state $x(t_f)$; and $x_0$ is the initial state. We restrict our discussion to optimal control problems governed by differential equations, leaving control problems in stochastic networks (Dai & Gluzman, 2022), inventory management (Abdolazimi et al., 2021), etc. out of the scope of this paper. The analytic solution of Eq. 1 is usually unavailable, especially for complex dynamical systems.
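To make Eq. 1 concrete, the following minimal sketch (ours, not from the paper) discretizes a hypothetical scalar OCP with dynamics $d(x,u) = -x + u$, running cost $p = x^2 + u^2$, and terminal cost $h = x(t_f)^2$, and solves it by the classic iterative route: gradient descent on a discretized control sequence. All names and constants here are illustrative assumptions.

```python
# Direct-shooting sketch of Eq. 1 on a toy scalar system (illustrative only):
# dynamics d(x, u) = -x + u, running cost p = x^2 + u^2, terminal cost h = x^2.
import numpy as np

def rollout_cost(u, x0=1.0, dt=0.02):
    """Euler-integrate the dynamics and accumulate the discretized cost (Eq. 1a)."""
    x, cost = x0, 0.0
    for uk in u:
        cost += (x**2 + uk**2) * dt      # running cost p(x(t), u(t)) dt
        x += (-x + uk) * dt              # dynamics x'(t) = d(x(t), u(t))
    return cost + x**2                   # terminal cost h(x(t_f))

def solve_ocp(n_steps=50, iters=300, lr=0.5, eps=1e-5):
    """Iterative solver: central-difference gradient descent over the control."""
    u = np.zeros(n_steps)
    basis = np.eye(n_steps)
    for _ in range(iters):
        grad = np.array([
            (rollout_cost(u + eps * basis[k]) - rollout_cost(u - eps * basis[k]))
            / (2 * eps)
            for k in range(n_steps)])
        u -= lr * grad
    return u

u_star = solve_ocp()  # optimized control should cost less than zero control
```

Note the multiplicative running-time factor discussed in point 3 below: every outer iteration requires full rollouts of the dynamics, which is exactly the cost structure the one-shot operator perspective avoids.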
Thus, there has been a wealth of research in recent years toward accurate, efficient, and scalable numerical OCP solvers (Rao, 2009) and neural-network-based solvers (Kiumarsi et al., 2017). However, both classic and modern numerical OCP solvers face challenges, especially in the big data era, which we briefly discuss as follows. 1) Opacity of Dynamical Systems. Existing works (Böhme & Frank, 2017a; Effati & Pakdaman, 2013; Jin et al., 2020) assume the dynamical systems a priori and exploit their explicit forms to ease the optimization. However, real-world dynamical systems can be unknown and hard to model. This raises serious challenges in data collection and system inference (Schmidt et al., 2021; Ghosh et al., 2021), where special care is required for error reduction.

Table 1: Comparison of modern optimal control approaches. The proposed OptCtrlOP naturally covers all the merits in the sense of performing a single-phase direct-mapping paradigm that does not rely on known system dynamics, and supports arbitrary input-domain queries.


2) Model Reusability. Model reusability is conceptually measured by the capability of utilizing historical data when facing an unprecedented problem instance. Since solving an individual instance of Eq. 1 from scratch is expensive, a reusable model that can be well adapted to systems of similar forms is welcome for practical usage. This point traces back to the sensitivity analysis of optimal control (Oniki, 1973) yet is absent in recent works. 3) Running Paradigm. As is typical of paradigms adopted in optimization, numerical optimal control solvers traditionally run iterative methods before picking the control solution, thus introducing a multiplicative term for the iterations into the running-time complexity. This sheds light on the high computational cost of solving even a single optimal control problem. 4) Control Solution Continuity. Control functions are defined on a continuous domain (typically time), which is intractable for numerical solvers. Resorting to discretization of the input domain hence gives rise to a trade-off between the precision of the control solution and the computational cost, as well as to truncation errors. While a discrete solution can answer point-wise queries, learning a control function for arbitrary time queries is much more valuable. 5) Running Phase. A two-phase model (Chen et al., 2018; Wang et al., 2021a; Hwang et al., 2022) can (partially) overcome the above issues at the cost of introducing an auxiliary dynamics inference phase. This thread of works first approximates the state dynamics by a differentiable surrogate model and then, in its second phase, solves an optimization problem for the control variables (more explanation in Appx. B). However, the two-phase paradigm increases the computational cost and manifests inconsistency between the two phases. A motivating example in Fig. 1 shows that the two-phase paradigm leads to crucial failures: when the domain of phase-2 optimization goes outside the training distribution of phase-1, the method might collapse.
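The two-phase paradigm criticized in point 5 can be sketched as follows (our illustrative reconstruction under a toy linear system, not any specific published implementation): phase 1 fits a surrogate of the unknown dynamics from observed transitions, and phase 2 optimizes the control through the surrogate. In this linear toy case the surrogate is nearly exact, but when the phase-2 control drifts outside the range of the phase-1 training data for a nonlinear system, the surrogate cost and the true cost can diverge, as in Fig. 1.

```python
# Hypothetical two-phase pipeline on a toy scalar system (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
dt = 0.05

def true_step(x, u):                         # real dynamics, unknown to the solver
    return x + (-x + u) * dt

# Phase 1: fit a linear surrogate x_next ~ a*x + b*u + c from random transitions
# collected in the bounded region [-1, 1] x [-1, 1].
X = rng.uniform(-1, 1, 200)
U = rng.uniform(-1, 1, 200)
Y = true_step(X, U)
design = np.column_stack([X, U, np.ones_like(X)])
a, b, c = np.linalg.lstsq(design, Y, rcond=None)[0]

# Phase 2: gradient descent on the control sequence, but through the surrogate.
def surrogate_cost(u, x0=1.0):
    x, cost = x0, 0.0
    for uk in u:
        cost += (x**2 + uk**2) * dt
        x = a * x + b * uk + c               # surrogate replaces true_step here
    return cost + x**2

u, eps, basis = np.zeros(40), 1e-5, np.eye(40)
for _ in range(200):
    grad = np.array([
        (surrogate_cost(u + eps * basis[k]) - surrogate_cost(u - eps * basis[k]))
        / (2 * eps)
        for k in range(40)])
    u -= 0.5 * grad
```

The inconsistency risk is visible in the structure: phase-2 gradients see only `surrogate_cost`, so nothing constrains the optimized `u` to stay where the phase-1 fit is trustworthy.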
Table 1 compares the methods with regard to the above aspects. We propose an instance-solution operator perspective for learning to solve OCPs, thereby tackling the issues above. The contributions are: 1. We propose the operator perspective and solve OCPs by learning direct mappings from OCPs to their solutions. The design holds the following merits. The system dynamics is implicitly learned during training, so the method relies on neither an explicit form of the system nor an optimization process at test time. As such, the operator can be reused and generalized to similarly-formed OCPs without retraining, a generalization ability that learning-free solvers lack altogether. The single-phase direct-mapping paradigm avoids iterative processes, yielding a substantial speedup.
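The instance-solution mapping idea can be illustrated with a deliberately simple sketch (ours, not the paper's architecture): on a scalar discrete LQR family where each instance is summarized by its initial state $x_0$, we precompute optimal controls with a Riccati solver, fit a direct map from $x_0$ to the whole control sequence, and then answer new instances with a single matrix product, with no test-time iteration.

```python
# Instance-solution map on a scalar discrete LQR family (illustrative only).
import numpy as np

dt, N = 0.05, 20
A, B, Q, R, Qf = 1 - dt, dt, dt, dt, 1.0     # x_{k+1} = A x_k + B u_k

def lqr_solve(x0):
    """Reference solver (backward Riccati recursion), used only for labels."""
    P, K = Qf, np.zeros(N)
    for k in reversed(range(N)):
        K[k] = (B * P * A) / (R + B * P * B)
        P = Q + A * P * A - (A * P * B) ** 2 / (R + B * P * B)
    x, u = x0, np.zeros(N)
    for k in range(N):
        u[k] = -K[k] * x                      # optimal feedback u_k = -K_k x_k
        x = A * x + B * u[k]
    return u

# "Training": fit a direct map instance -> solution from precomputed pairs.
x0s = np.linspace(-2, 2, 32)
U_star = np.stack([lqr_solve(x0) for x0 in x0s])      # optimal-control labels
W = np.linalg.lstsq(x0s[:, None], U_star, rcond=None)[0]

# One-shot inference on an unseen instance: a single product, no iteration.
x0_new = 1.3
u_pred = x0_new * W[0]
```

For this LQR family the optimal control is exactly linear in $x_0$, so a linear map suffices; the paper's operator-learning setting replaces this toy regression with a neural operator over richer instance descriptions.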



Figure 1: Phase-2 cost curves of two failed instances of two-phase control on the Pendulum system. The control function gradually moves outside the training data distribution of phase 1. As a result, the control function converges w.r.t. the cost predicted by the surrogate model (blue), but diverges w.r.t. the true cost (red).

