OOD-CONTROL: OUT-OF-DISTRIBUTION GENERALIZATION FOR ADAPTIVE UAV FLIGHT CONTROL

Abstract

Data-driven control methods have demonstrated precise and agile control of Unmanned Aerial Vehicles (UAVs) in turbulent environments. However, they remain weak at handling out-of-distribution (OoD) data, i.e., they face a generalization problem when deployed in unknown environments whose data distributions differ from the training set. Many studies have designed algorithms to mitigate the OoD problem, a common but difficult problem in machine learning. To tackle the OoD generalization problem in control, we propose a theoretically guaranteed approach: OoD-Control. We prove that for any perturbation of the states within some range, the control error is upper-bounded by a constant. In this paper, we present our OoD-Control generalization algorithm for online adaptive flight control and instantiate it on two systems. Experiments show that systems trained by the proposed OoD-Control algorithm perform better in environments quite different from those seen in training. The control method is also extensible and broadly applicable to different dynamical models. OoD-Control is validated on UAV dynamic models, where it achieves state-of-the-art performance on positioning stability and trajectory tracking.

1. INTRODUCTION

UAVs have gained considerable attention and are widely used for various purposes because of their high manoeuvrability and flexibility. For example, quadrotors are widely deployed for inspection, reconnaissance, and rescue. As control strategies evolve, novel scenarios for UAVs, such as aerial grasping, transporting, and bridge inspection (Ruggiero et al., 2018), require more precise trajectory tracking. Especially in outdoor environments, unpredictable and changing wind field conditions pose substantial challenges to the stability of UAVs. Rotor blades are affected by the induced airflow caused by the wind, which creates complex and non-stationary aerodynamic interactions (see Appendix B.6.3). From security and policy perspectives, demonstrating that UAVs can operate safely and reliably in unpredictable environments with various distributions is an essential requirement; it is also a premise for medical robots, autonomous cars, and manned aerial vehicles to be widely accepted in the future. Many areas have benefited from data-driven approaches, yet these approaches are susceptible to performance degradation under generalization: the majority of deep learning algorithms rely heavily on the i.i.d. assumption for data, which is generally violated in practice due to domain shift (Zhou et al., 2022). Consequently, neural networks may lose their robustness when confronted with OoD data, and many failures of DNNs originate from shortcut learning during training (Geirhos et al., 2020). The damage to a UAV is undoubtedly considerable if it cannot adjust to the changing environment, i.e., if it becomes unstable or even crashes in an OoD situation. One significant objective of this paper is to propose a control algorithm that enables UAVs to maintain accurate control even under environment domain shifts.
Our Contributions.
UAVs interact with the changing environment, producing complex environment-dependent uncertain aerodynamics, called unknown dynamics, that are hard to model and significantly impact precise control. Previous data-driven controllers attempt to solve the problem by estimating the unknown dynamics, but the estimation accuracy and the performance of the controllers are limited by environment domain shifts at test time. This paper presents a methodology for adaptive flight control problems, focusing on enabling UAVs to fly in unknown environments. Compared with previous work, our proposed OoD-Control algorithm provides performance guarantees under domain shifts of the environment distribution. Compared with the previous state-of-the-art (Shi et al., 2021), the proposed OoD-Control method does not require strong assumptions such as exponential input-to-state stability (e-ISS) or a fully actuated system. Additionally, our algorithm has a greater capacity for generalization: for different environment distributions, we show theoretically that the bound on the prediction error of the unknown dynamics remains constant over a certain range of perturbations. Moreover, simulated results under challenging aerodynamic conditions indicate that the OoD-Control algorithm achieves better control performance than state-of-the-art deep learning algorithms.

2. RELATED WORK

2.1. FLIGHT CONTROL ALGORITHMS

UAVs have found broad applicability in a variety of fields and have attracted the attention of many researchers. Many published studies describe the significance and efficiency of flight control algorithms, including PID control (Szafranski & Czyba, 2011), LQR control (Priyambodo et al., 2020), sliding mode control (Chen et al., 2016), backstepping control (Labbadi & Cherkaoui, 2019), robust control (Hasseni & Abdou, 2021), etc. However, most of these control methods suffer from limitations: imprecise system modelling and unmodelled environmental disturbances may result in unacceptable performance or instability. Today, artificial intelligence has triggered a new wave of research in many fields (Jumper et al., 2021; Silver et al., 2017). Data-driven control methods can learn the corresponding control strategy directly from the interaction process of the controlled system, so that it can adapt to new environments. Bansal et al. (2016) validate their proposed deep learning algorithm on a quadrotor testbed. Reinforcement learning, on the other hand, is a model-free paradigm widely used for control problems. Koch et al. (2019) present an intelligent high-precision flight control system for UAVs using reinforcement learning, analyzing and comparing the performance and accuracy of the internal control loop for quadrotor attitude control; their results indicate that a neural network can learn the quadrotor dynamics accurately, generalize well, and be applied to the control system. Underwood & Husain (2010) propose an online parameter estimation scheme, and their experimental results validate the effectiveness of the adaptive control method. O'Connell et al. (2022) combine online adaptive learning with representation learning, adapting a DNN to learn a nonlinear representation; however, the diversity of the environment is not considered in that work.
Adapting to an environment completely different from the training set is challenging. Inspired by Shi et al. (2021), this study constructs mechanics-based models with learnable dynamics and DNNs for their interpretability and stability. We further investigate whether the robustness of the algorithm can be improved with OoD generalization methods.

2.2. OUT-OF-DISTRIBUTION GENERALIZATION

Out-of-Distribution (OoD) generalization, i.e., generalizing under shifts in the data distribution, is an active research area in the community. The goal is to learn a prediction model whose performance is preserved under distribution shifts. Many algorithms have been proposed to achieve OoD generalization, including meta-learning (Li et al., 2019; Zhang et al., 2020), prototypical learning (Dubey et al., 2021), gradient alignment (Rame et al., 2022), domain adversarial learning (Akuzawa et al., 2019; Xu et al., 2020), and kernel methods (Li et al., 2018; Ghifary et al., 2016). The literature has extensively discussed how to deal with domain shift, and the OoD generalization problem is well studied in computer vision (Hsu et al., 2020), natural language processing (Hendrycks et al., 2020), speech recognition (Shankar et al., 2018), and other fields, but seldom in the context of online control. Shi et al. (2021) present a multi-task learning method for nonlinear systems that can withstand disturbances and unknown environments. Previous studies, however, lack a discussion of the misspecification of dynamical systems and neglect the gap between simulation experiments and reality. Additionally, the generalization of flight control is plagued by system measurement errors and unknown environmental parameters such as wind modes, air density, and air resistance. In this work, we demonstrate that the average control error can be upper-bounded by a constant when the environmental disturbances are within some range. Moreover, experimental results indicate that OoD-Control is robust to shifts in the environmental domain.

3. PROBLEM FORMULATION

Notations. We use subscripts (e.g., $t$ in $x_t^{(z)}$) to denote the time index and superscripts (e.g., $(z)$ in $x_t^{(z)}$) to denote the dynamics state under the environmental perturbation $z \sim \pi_0$, where $\pi_0$ is the environmental distribution; $\|\cdot\|$ denotes the 2-norm of a vector. Note: the superscript $(z)$ in $x_t^{(z)}$ indicates that the state is perturbed by the environmental disturbance $z \sim \pi_0$, with $x_t^{(z)} = x_t + z$, where $x_t \in \mathbb{R}^n$ is the state in the windless environment. In this paper, we consider a discrete nonlinear control system whose dynamics are described by the following formula:
$$x_{t+1}^{(z)} = f_0(x_t^{(z)}) + B(x_t^{(z)})\,u_t - f(x_t^{(z)}, c) + w_t, \quad 1 \le t \le T, \qquad (1)$$
where $x_t^{(z)} \in \mathbb{R}^n$ is the state variable; $z \in \mathbb{R}^n$ changes with the environmental distribution domain shifts; $B(x_t^{(z)}): \mathbb{R}^n \to \mathbb{R}^{n \times m}$ is the state-dependent actuation matrix; $u_t \in \mathbb{R}^m$ is the control input of the dynamic system; $f_0(x_t^{(z)}): \mathbb{R}^n \to \mathbb{R}^n$ is the known nominal dynamic term that can be modelled with well-defined differential equations; $f(x_t^{(z)}, c): \mathbb{R}^n \times \mathbb{R}^h \to \mathbb{R}^n$ is the unknown environment-dependent dynamics that is hard to model, with $c$ the unknown environmental parameter (we also write $f(x_t^{(z)})$ for short in the following paragraphs); and $w_t \in \mathbb{R}^n$ is a random noise vector. We hypothesize that the environmental disturbance $w_t$ and the control $u_t$ are bounded: due to structural restrictions of the actuators, there are certain limits to the output. For example, the control output of a UAV is constrained by the rotor blades' maximum revolutions per minute (RPM). Here we give the boundedness assumption.

Assumption 1 (Bounded controls and disturbances). Assume that the controller's output has an upper bound: $\forall t, \|u_t\| \le U$. Moreover, the environmental noise vectors are bounded with zero expectation: $\forall t, \|w_t\| \le W$ and $\mathbb{E}(w_t) = 0$.
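As a concrete illustration of the dynamics in equation 1, the following minimal sketch simulates one step for a scalar state. The function names and the scalar setting are ours for illustration, not the paper's implementation; the unknown term and the disturbance bound are stand-ins.

```python
import random

def dynamics_step(x, u, f0, B, f_unknown, w_bound=0.1, rng=random):
    """One step of the control-affine system in equation 1, shown for a
    scalar state: x_{t+1} = f0(x_t) + B(x_t) u_t - f(x_t, c) + w_t,
    where w_t is a bounded, zero-mean disturbance."""
    w = rng.uniform(-w_bound, w_bound)  # bounded noise with E[w] = 0
    return f0(x) + B(x) * u - f_unknown(x) + w
```

With the disturbance bound set to zero, the step is deterministic, which makes the decomposition into nominal, actuation, and unknown terms easy to inspect.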
Definition 1 (Average Control Error under disturbances). The control error of the system under disturbance distribution $\pi_0$ at time $t$ is $\|\mathbb{E}_{z \sim \pi_0}(x_t^{(z)}) - x_t^d\|$. The average control error over $T$ time steps is defined as the performance metric:
$$\mathrm{ACE}_{\pi_0} = \frac{1}{T} \sum_{t=1}^{T} \|\mathbb{E}_{z \sim \pi_0}(x_t^{(z)}) - x_t^d\|, \qquad (2)$$
where $x_t^d$ denotes the desired state at time $t$.

Remark 1. In Definition 1, we focus on fixed-point hovering and trajectory tracking. A sequence of perturbations matching $\pi_0$ is obtained by sampling $N$ times: $(z_1, z_2, \dots, z_N)$, and the perturbed state sequence at time $t$ is $(x_t^{(z_1)}, x_t^{(z_2)}, \dots, x_t^{(z_N)})$. $\mathbb{E}_{z \sim \pi_0}(x_t^{(z)})$ can be approximated with the Monte Carlo method as $\sum_{i=1}^{N} x_t^{(z_i)} / N$, where the subscript $i$ is the index of the $i$-th sample. Compared with the average control error definitions in Shi et al. (2021) and Åström & Murray (2008), $\mathrm{ACE}_{\pi_0}$ represents the expected difference between the actual and desired states of the dynamical system under environmental perturbations.

Interaction protocol. We study the OoD adaptive flight control problem under the following interaction protocol:
1. An environment is stochastically selected for the controller to encounter at every time step, depending on the unobserved variable $c$ (e.g., wind condition and air density).
2. The controller interacts with the environment and observes the state $x_t^{(z)}$ to take action $u_t$.
3. Optionally, $c$ changes after a short time and the process repeats from Step 1.
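The Monte Carlo approximation in Remark 1 can be sketched as follows for a scalar state (helper name and data layout are ours): the perturbed-state samples at each step estimate the expectation, and the absolute deviations from the desired states are averaged over the horizon.

```python
def average_control_error(states_per_step, desired):
    """Monte Carlo estimate of the ACE in Definition 1 for a scalar
    state: E_{z~pi0}[x_t^{(z)}] is approximated by the sample mean of
    the perturbed states at step t, and the deviations from the desired
    states x_t^d are averaged over the T steps (Eq. 2)."""
    T = len(desired)
    total = 0.0
    for samples, x_d in zip(states_per_step, desired):
        mean_state = sum(samples) / len(samples)  # Monte Carlo expectation
        total += abs(mean_state - x_d)            # 2-norm reduces to abs
    return total / T
```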

4. METHODOLOGY

We expect the UAV to learn a representation of the unknown dynamics shared between different environments so that it generalizes well in unseen areas with few adaptations. This section introduces the methodology that provides a guaranteed upper bound for the prediction errors of the unknown dynamics.

Notations and settings. For modelling the unknown environment-dependent dynamics, we use $\hat{f}(x_t^{(z)}, \hat{c}) = F(\phi(x_t^{(z)}; \Theta), \hat{c})$, inspired by Shi et al. (2021). That is, we consider the unknown environment-dependent dynamics to consist of two coupled parts: the irregular higher-order aerodynamics of the UAV caused by its complex streamlined design, and environment-dependent variables which encode the wind field information. $\phi(x_t^{(z)}; \Theta)$ represents the former, a deep neural network (DNN) with $L$ layers parameterized by $\Theta$; $\hat{c}$ is the latter, which is particular to a certain environment. This model lets us capture joint higher-order effects beyond the nominal dynamics and enables agile control of UAVs. We write $\hat{f}(x_t^{(z)})$ for brevity in the following paragraphs.

Given an environmental distribution $\pi_0$, the perturbed unknown dynamics and its prediction are denoted as expectations of the states under environmental disturbances: $f_{\pi_0}(x_t^{(z)}) = \mathbb{E}_{z \sim \pi_0}[f(x_t^{(z)}, c)]$ and $\hat{f}_{\pi_0}(x_t^{(z)}) = \mathbb{E}_{z \sim \pi_0}[F(\phi(x_t^{(z)}; \Theta), \hat{c})] = \mathbb{E}_{z \sim \pi_0}[\hat{f}(x_t^{(z)}, \hat{c})]$. In OoD-Control, the objective is to minimize the prediction loss of the unknown dynamics for the controller inputs: $\ell_f = \|f_{\pi_0}(x_t^{(z)}) - \hat{f}_{\pi_0}(x_t^{(z)})\|$. The loss is mapped to $[0, 1]$ by $h(\|f_{\pi_0}(x_t^{(z)}) - \hat{f}_{\pi_0}(x_t^{(z)})\|)$, which we abbreviate $\hat{h}_{\pi_0}(x_t^{(z)})$ in the following paragraphs. $h(\cdot)$ satisfies: 1) $h(\cdot) \in [0, 1]$; 2) $h(\cdot)$ is monotonically decreasing and its inverse $h^{-1}(\cdot)$ exists. In the next section, we propose a framework that provides a guaranteed upper bound for the ACE under OoD-Control.

4.1. THE PROOF OF ACE'S UPPER BOUND

Next, we introduce a methodology to tackle the generalization problem and prove ACE's upper bound. We want to verify that for any perturbation in $B = \{\delta \in \mathbb{R}^n : \|\delta\|_2 \le r\}$ with radius $r$, the lower bound of the prediction error remains constant under unpredictable disturbances, i.e., if $\exists p > 0$ such that $\hat{h}_{\pi_0}(x_t^{(z)}) > p$, then for all $\|\delta\| \le r$ it still holds that $\hat{h}_{\pi_0}(x_t^{(z)} + \delta) > p$. This conclusion is significant for calculating the upper bound of the ACE: for a perturbation of radius $r$, the expected prediction of the unknown dynamics remains the same. Assume $H$ is a function class that includes $\hat{h}_{\pi_0}(\cdot)$ and satisfies $H = \{h : h(x) \in [0, 1], \forall x \in \mathbb{R}^n\}$. Performing the following optimization yields a guaranteed lower bound; if $H$ includes only $\hat{h}_{\pi_0}(\cdot)$, the bound is exact:
$$\min_{\delta \in B} \hat{h}_{\pi_0}(x_t^{(z)} + \delta) \ \ge\ \min_{h \in H} \min_{\delta \in B} h_{\pi_0}(x_t^{(z)} + \delta) \quad \text{s.t.} \quad h_{\pi_0}(x_t^{(z)}) = \hat{h}_{\pi_0}(x_t^{(z)}). \qquad (3)$$

Theorem 1 (Lagrangian). Denote by $L_{\pi_0}(H, B)$ the lower bound in equation 3. Lagrangian methods can be adapted to solve the inequality:
$$L_{\pi_0}(H, B) = \min_{h \in H} \min_{\delta \in B} \max_{\lambda \in \mathbb{R}} L(h, \delta, \lambda) \triangleq \min_{h \in H} \min_{\delta \in B} \max_{\lambda \in \mathbb{R}} \left\{ h_{\pi_0}(x_t^{(z)} + \delta) - \lambda \left[ h_{\pi_0}(x_t^{(z)}) - \hat{h}_{\pi_0}(x_t^{(z)}) \right] \right\}. \qquad (4)$$
Exchanging the min and max yields the following dual form:
$$L_{\pi_0}(H, B) \ \ge\ \max_{\lambda \ge 0} \min_{h \in H} \min_{\delta \in B} L(h, \delta, \lambda) = \max_{\lambda \ge 0} \left\{ \lambda \hat{h}_{\pi_0}(x_t^{(z)}) - \max_{\delta \in B} D_H(\lambda \pi_0 \,\|\, \pi_\delta) \right\}, \qquad (5)$$
where $\pi_\delta$ represents the distribution of $z + \delta$ when $z \sim \pi_0$, and $D_H(\lambda \pi_0 \,\|\, \pi_\delta) = \max_{h \in H} \{\lambda \mathbb{E}_{z \sim \pi_0}[h(x_t^{(z)})] - \mathbb{E}_{z \sim \pi_\delta}[h(x_t^{(z)})]\} = \int [\lambda \pi_0(z) - \pi_\delta(z)]_+ \, dz$.

Corollary 1 (Gaussian noise). With Gaussian noise $\pi_0 = \mathcal{N}(0, \sigma^2 I)$ and bounded disturbance $B = \{\delta : \|\delta\|_2 \le r\}$, the lower bound in equation 5 satisfies:
$$L_{\pi_0}(H, B) = \max_{\lambda \ge 0} \left\{ \lambda \hat{h}_{\pi_0}(x_t^{(z)}) - \max_{\delta \in B} D_H(\lambda \pi_0 \,\|\, \pi_\delta) \right\} \ \ge\ \Phi\!\left( \Phi^{-1}\!\left( \hat{h}_{\pi_0}(x_t^{(z)}) \right) - \frac{r}{\sigma} \right), \qquad (6)$$
where $\Phi(\cdot)$ represents the Gaussian cumulative distribution function (CDF). For the case $p = 0.5$, i.e., $\hat{h}_{\pi_0}(x_t^{(z)}) > 0.5$, the radius satisfies $r \le \sigma \Phi^{-1}(\hat{h}_{\pi_0}(x_t^{(z)}))$.
As a side note, a Monte Carlo method for computing the perturbation radius is also given in Algorithm 2 in Appendix B.3.
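The bound in Corollary 1 and the radius for the $p = 0.5$ case are straightforward to evaluate numerically. A minimal sketch using the standard normal distribution follows; the helper names are ours for illustration.

```python
from statistics import NormalDist

def certified_lower_bound(h_hat, r, sigma):
    """Corollary 1 bound: Phi(Phi^{-1}(h_hat) - r / sigma), i.e. the
    guaranteed value of h after a perturbation of radius r when the
    noise is N(0, sigma^2 I)."""
    nd = NormalDist()
    return nd.cdf(nd.inv_cdf(h_hat) - r / sigma)

def certified_radius(h_hat, sigma):
    """Largest radius keeping the bound above 1/2 (the p = 0.5 case):
    r = sigma * Phi^{-1}(h_hat), meaningful when h_hat > 0.5."""
    return sigma * NormalDist().inv_cdf(h_hat)
```

At radius zero the bound recovers the unperturbed value, and at the certified radius it degrades exactly to one half, matching the $p = 0.5$ condition.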

4.2. AVERAGE CONTROL ERROR BOUND

The selection of $h$. Consider a sequence of state variables at time $t$ under perturbation $z \sim \pi_0 = \mathcal{N}(0, \sigma^2)$: $X = (x_t^{(z_1)}, x_t^{(z_2)}, \dots, x_t^{(z_N)})$. The predicted and unknown dynamics sequences are defined as $F_p = (\hat{f}(x_t^{(z_1)}), \hat{f}(x_t^{(z_2)}), \dots, \hat{f}(x_t^{(z_N)}))$ and $F_u = (f(x_t^{(z_1)}), f(x_t^{(z_2)}), \dots, f(x_t^{(z_N)}))$. Let $D_p$ be the discrepancy sequence between $F_p$ and $F_u$: $D_p = (\|\hat{f}(x_t^{(z_1)}) - f(x_t^{(z_1)})\|, \dots, \|\hat{f}(x_t^{(z_N)}) - f(x_t^{(z_N)})\|)$. Denote by $p$ the success rate of the prediction error falling under a given threshold $\varepsilon_t$. Simulating with a large sample size $N$, $p$ is calculated as:
$$p \triangleq \mathbb{P}\!\left( \|\hat{f}(x_t^{(z)}) - f(x_t^{(z)})\| < \varepsilon_t \right) = \frac{n_a}{N}, \qquad (7)$$
where $n_a$ is the number of elements in $D_p$ less than $\varepsilon_t$. Recalling the requirements on $h$, we can instantiate it as follows:
$$h\!\left( \|\hat{f}(x_t^{(z)}) - f(x_t^{(z)})\| \right) \triangleq p - k \sqrt{\frac{p(1-p)}{n}}, \qquad (8)$$
where $k = \Phi^{-1}(1 - \frac{\alpha}{2})$ is the $1 - \frac{\alpha}{2}$ quantile of the standard normal distribution. This equation is the lower confidence bound estimate of the error at confidence level $\alpha$. As the noise increases, the predicted value deviates from the actual value by a larger amount, so $h$ decreases monotonically. Besides, the lower confidence bound lies in $[0, 1]$, which satisfies both requirements on $h$ discussed in the previous section. Let $b$ denote the lower confidence bound in equation 8. In Section 4.1, we proved that $\hat{h}_{\pi_0}(x_t^{(z)})$ and $\hat{h}_{\pi_0}(x_t^{(z)} + \delta)$ have the same lower bound under disturbances $\|\delta\| \le r$. The radius ensuring an equal lower bound under the perturbation in this paper is:
$$r = \sigma \Phi^{-1}(b) = \sigma \Phi^{-1}\!\left( p - k \sqrt{\frac{p(1-p)}{n}} \right).$$
Controlling.
The control term $u$ consists of three parts: feedback, feedforward, and residual. The feedback part uses sensor information to minimize the gap between $x_t^{(z)}$ and $x_t^d$; the feedforward part offsets the nominal term $f_0(x_t^{(z)})$; and the residual part counterweights the unknown environment-dependent term $f(x_t^{(z)})$. Let $B^{\dagger}(x_t^{(z)})$ be the pseudo-inverse of $B(x_t^{(z)})$. The model-based control law is:
$$u_t = B^{\dagger}(x_t^{(z)}) \left( -f_0(x_t^{(z)}) + \hat{f}(x_t^{(z)}) \right). \qquad (10)$$
Thus, equation 1 becomes:
$$x_{t+1} = \hat{f}(x_t^{(z)}) - f(x_t^{(z)}) + w_t.$$

Lemma 1 (ACE bound in the ideal case). For any perturbation in $B = \{\delta : \|\delta\|_2 \le r\}$, the theoretical average control error is bounded as:
$$\mathrm{ACE}_{\pi_\delta} = \frac{1}{T} \sum_{t=1}^{T} \left\| \mathbb{E}_{z \sim \pi_\delta}[\hat{f}(x_t^{(z)})] - \mathbb{E}_{z \sim \pi_\delta}[f(x_t^{(z)})] \right\| \le h^{-1}(p).$$

Remark 2. In the ideal case, the average control error is governed by the prediction error of the unknown dynamics and the environmental perturbations. Compared with the upper bound calculated in Shi et al. (2021), the derived bound is more general: our calculation requires neither a fully actuated system nor exponentially input-to-state stable (e-ISS) nominal dynamics. The e-ISS assumption is too strong, and many real-world dynamic systems, such as quadrotors, are under-actuated.

Corollary 2 (ACE bound under control actuation misspecification). Let $\Delta B(x_t^{(z)})$ be the parametric misspecification in the actuation matrix. Then $\mathrm{ACE}_{\pi_\delta}$ satisfies:
$$\mathrm{ACE}_{\pi_\delta} = \frac{1}{T} \sum_{t=1}^{T} \left\| \mathbb{E}_{z \sim \pi_\delta}(x_t^{(z)}) - x_t^d \right\| \le h^{-1}(p) + \frac{1}{T} \sum_{t=1}^{T} \left\| \mathbb{E}_{z \sim \pi_\delta}\!\left[ \Delta B(x_t^{(z)}) B^{\dagger}(x_t^{(z)}) e_f(x_t^{(z)}) \right] \right\|,$$
where $e_f(x_t^{(z)}) = f(x_t^{(z)}, c) - f_0(x_t^{(z)})$.
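For a scalar, invertible $B$, the control law in equation 10 reduces to a one-line computation. The sketch below uses illustrative names and the scalar case only; it also shows that with a perfect prediction and no noise the closed-loop state collapses to zero, as the ideal case suggests.

```python
def model_based_control(x, f0, f_hat, B):
    """Control law of equation 10 for a scalar state, where the
    pseudo-inverse of B is simply 1/B:
    u_t = B^+(x) * (-f0(x) + f_hat(x))."""
    return (1.0 / B(x)) * (-f0(x) + f_hat(x))
```

Substituting this $u_t$ into the scalar version of equation 1 cancels the nominal term and leaves only the residual between the predicted and true unknown dynamics plus noise.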
Lemma 2 (Trajectory tracking ACE of the quadrotor). For environmental disturbances in $B = \{\delta : \|\delta\|_2 \le r\}$, the quadrotor's trajectory tracking error is given by:
$$\mathrm{ACE}_{\pi_\delta} = \frac{1}{T} \sum_{t=1}^{T} \left\| \mathbb{E}_{z \sim \pi_\delta}(x_t^{(z)}) - x_t^d \right\| = \frac{1}{T} \sum_{t=1}^{T} \left\| \mathbb{E}_{z \sim \pi_\delta}\!\left[ C_1 e^{r_1 x_t^{(z)}} + C_2 e^{r_2 x_t^{(z)}} \right] - h^{-1}(p)/K_v \right\|,$$
where $C_1 = -x_t^d - C_2 + \epsilon / K_v$, $C_2 = \left( \epsilon + (K_v x_t^d - \epsilon) e^{r_1 x_t^d} \right) / \left( K_v (e^{r_2 x_t^d} - e^{r_1 x_t^d}) \right)$, and $\epsilon = f(x_t^{(z)}, c) - \hat{f}(x_t^{(z)}, \hat{c})$ is the prediction error of the unknown dynamics.

Remark 3. Lemma 2 gives a trajectory tracking error that can be computed in practice. As demonstrated earlier, the prediction errors of the unknown dynamics under perturbation share the same bound. The detailed proof can be found in Appendix A.5.
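The lower confidence bound instantiating $h$ in equation 8 is easy to compute from the discrepancy sequence $D_p$; the following sketch (function name ours) uses the normal-approximation margin stated there.

```python
from statistics import NormalDist

def h_lower_confidence_bound(discrepancies, eps, alpha=0.05):
    """Instantiation of h in equation 8: the empirical success rate
    p = P(prediction error < eps) over the discrepancy sequence D_p,
    minus the margin k * sqrt(p(1 - p) / n), where
    k = Phi^{-1}(1 - alpha/2) is the standard-normal quantile."""
    n = len(discrepancies)
    p = sum(d < eps for d in discrepancies) / n
    k = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return p - k * (p * (1.0 - p) / n) ** 0.5
```

The resulting value $b$ then feeds the radius formula $r = \sigma \Phi^{-1}(b)$ from Section 4.2.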

4.3. OOD GENERALIZATION ALGORITHM

Based on the theoretical analysis above, we propose an algorithm for out-of-distribution generalization in adaptive flight control, named OoD-Control. We focus on minimizing the prediction loss and learning $\Theta$ during simulation. We intend to design an OoD controller with lower ACE whose estimated unknown dynamics $\hat{f}(x_t^{(z)})$ converge faster to the true dynamics $f(x_t^{(z)})$ under environment distribution domain shifts. The proposed OoD-Control algorithm is shown in Algorithm 1 (see Appendix B.1). Given a set of distribution functions $X$, a $\chi \in X$ is picked for each iteration, and the wind velocity is a series of random variables sampled from $\chi$. (Specifically, we use $X$ for the training distribution set, with each member denoted $\chi$; for the testing set, we use $\Omega$ and $\omega$ instead.) Each time-series simulation begins by introducing random noise $\epsilon_1$ into the structural parameters of the system. At each iteration, the prediction loss, which measures the error between the unknown dynamics and its prediction, is minimized. After the unknown-dynamics predictor is trained, it can be used for model-based control as discussed in Section 4.2.
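The core loop described above can be illustrated schematically. This is not the paper's Algorithm 1 (which trains a DNN $\phi(\cdot;\Theta)$ with a meta-learned $\hat{c}$); it is a deliberately tiny stand-in with a one-parameter linear predictor, showing the pattern of sampling an environment per iteration and taking a gradient step on the squared prediction loss.

```python
import random

def train_predictor(sample_env, lr=0.05, iters=500, seed=0):
    """Schematic sketch of the training loop: each iteration samples an
    environment-dependent unknown dynamics f_true from the training
    distribution, observes a (state, dynamics) pair, and updates a
    linear predictor f_hat(x) = theta * x by SGD on the squared loss."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(iters):
        f_true = sample_env(rng)        # environment drawn from chi
        x = rng.uniform(-1.0, 1.0)      # observed (perturbed) state
        err = theta * x - f_true(x)     # prediction residual
        theta -= lr * 2.0 * err * x     # gradient of the squared loss
    return theta
```

If the environments share the structure $f(x) = a \cdot x$ with $a$ varying slightly per environment, the learned $\theta$ settles near the shared mean, which is the intuition behind learning a representation shared across environments.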

5. EXPERIMENTS AND RESULTS

In this section, numerical experiments on the inverted pendulum and on quadrotors demonstrate the effectiveness of the proposed OoD-Control algorithmic framework. To better explain the proposed OoD-Control algorithm and the environment setting, we choose an uncoupled dynamics model, the inverted pendulum, as an introductory example before the quadrotor instance.

5.1. DYNAMICS MODELING

Inverted Pendulum. Consider an inverted pendulum whose dynamic model is:
$$m l^2 \ddot{\theta} - m l g \sin\theta = u + f(\theta, \dot{\theta}, c),$$
where $\theta$ represents the angle away from the center, $l$ is the length of the pendulum's arm, $g$ is the gravitational acceleration, and $m$ is the mass. The state variable consists of $\theta$ and $\dot{\theta}$, which can be measured by position and inertial sensors. $f(\cdot)$ represents the unknown dynamic term, including air resistance, wind force, and modelling misspecification, and depends on $\theta$, $\dot{\theta}$, and the environmental parameter $c$. $u$ is the control term. Our goal is to keep the pendulum close to the center, i.e., to minimize the average control error.

Quadrotor. The quadrotor is a planar model in which the four rotors always lie in the same plane, so the quadrotor adjusts its attitude by setting different rotation speeds for the four rotors. We define the dynamic model of the quadrotor as:
$$m \dot{v} = m g + R(\theta) f_T + f, \qquad (15)$$
$$J \ddot{\theta} = J \dot{\theta} \times \dot{\theta} + \tau, \qquad (16)$$
where $\theta$ represents the attitude angle of the quadrotor; $R(\theta) \in \mathbb{R}^{3 \times 3}$ is the attitude rotation matrix depending on $\theta$; $J$ is the inertia matrix of the quadrotor; $f_T$ is the thrust force imposed on the system; $\tau$ is the total torque; $m$ is the mass of the quadrotor and $g$ is the gravitational acceleration. Both $f_T$ and $\tau$ depend on the rotor speeds $n_r \in \mathbb{R}^{1 \times 4}$. In the experiments, the goal is to hold the quadrotor's position or to follow a given trajectory under turbulent environments.
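The pendulum model above can be stepped forward with simple Euler integration; this sketch uses illustrative default parameters and a placeholder for the unknown term $f$, so it is an assumption-laden toy, not the paper's simulator.

```python
import math

def pendulum_step(theta, theta_dot, u, dt=0.01, m=1.0, l=1.0, g=9.81,
                  f_unknown=lambda th, thd: 0.0):
    """One Euler step of the inverted-pendulum model
    m l^2 theta_ddot - m l g sin(theta) = u + f(theta, theta_dot, c),
    with f_unknown standing in for the unmodelled wind/drag term."""
    theta_ddot = (m * l * g * math.sin(theta) + u
                  + f_unknown(theta, theta_dot)) / (m * l ** 2)
    return theta + dt * theta_dot, theta_dot + dt * theta_ddot
```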

5.2. COMPARISON METHOD

In the experiments, the proposed adaptive UAV flight control algorithm OoD-Control is compared with OMAC (Shi et al., 2021) and a no-adapt (PID) method. OMAC (online meta-learning adaptive control) is the state-of-the-art data-driven UAV flight control method. The OMAC paper provides three versions with different model specifications: convex, bi-convex, and deep learning. We report the deep-learning version because it is the best-performing of the three. We also compare against the no-adapt and omniscient methods. No-adapt means the controller cannot perceive environmental domain shifts, i.e., $\hat{f}(x_t^{(z)}) = 0$, which is just a conventional PID controller; omniscient is a controller with perfect access to the unknown dynamics, i.e., $\hat{f}(x_t^{(z)}) = f(x_t^{(z)})$. Among all the controllers, no-adapt and omniscient are the two extremes: no-adapt cannot predict at all, while omniscient predicts with zero error. For rigour, we run each simulation ten times with different random seeds to obtain the mean and standard deviation of the ACE under perturbation.

5.3. WIND FIELD CONSTRUCTION AND FLIGHT TRAJECTORY DESIGN

Wind fields can be derived from the Navier-Stokes (N-S) equations and the continuity equation. In practice, however, the N-S equations are generally hard to solve due to their high computational cost. For turbulent wind field simulations, the Dryden model (Specification, 1980) is widely used; we follow it to simulate turbulent wind fields acting on quadrotors by generating Gaussian wind disturbances (Beal, 1993). To construct realistic situations in the inverted pendulum and quadrotor experiments, we simulate two types of wind: turbulent wind, whose speed and direction change at every instant, and gusts, whose speed remains constant over a period of time. For further study, we divide each of the two wind fields into three categories according to strength: breeze, strong breeze, and gale. The direction and strength of turbulent winds change continuously as the wind forces are applied to the object, which demands higher manoeuvrability to maintain stability. The wind environment settings for the experiments can be found in Appendix B.6.1. Quadrotors must also be capable of flying along a desired trajectory and hovering at a fixed point for various applications, such as inspection, patrol, and delivery. To meet the requirements of different application scenarios, we design a variety of trajectories to test the performance of the proposed OoD-Control: hovering for fixed-point photography, figure-8 trajectories for scenarios requiring high manoeuvrability, the spiral trajectory for power line detection, and sin-forward for transporting items in forests or for area scanning. The mathematical forms of the trajectories are given in Appendix B.6.1.
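The two simulated wind types can be mimicked with a toy generator; this is a crude stand-in for illustration, not the Dryden model's actual spectral transfer-function form, and all names and parameters are ours.

```python
import random

def wind_sequence(T, mean_speed, turbulence_std, gust=False, seed=0):
    """Toy wind generator: a gust holds a constant speed over the whole
    horizon, while turbulent wind adds i.i.d. Gaussian fluctuations to
    the mean speed at every time step."""
    rng = random.Random(seed)
    if gust:
        return [mean_speed] * T
    return [mean_speed + rng.gauss(0.0, turbulence_std) for _ in range(T)]
```

Varying `mean_speed` then plays the role of the breeze / strong breeze / gale strength categories.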

5.4. RESULTS

Pendulum. Table 1 (see Appendix B.2) and Figure 1 show the mean and standard deviation of the control errors in different testing environments. We mainly compare the OoD-Control algorithm with OMAC; for completeness, we also include two control groups: no-adapt and omniscient. As shown in row 3 of Table 1, our OoD-Control algorithm performs significantly better than OMAC on the gale dataset, meaning the former generalizes better under a large environmental distribution shift. On the less difficult strong-breeze dataset, the gap between the two algorithms shrinks, but OoD-Control still achieves nearly half the error of OMAC. (The breeze dataset is not complicated enough to distinguish the mentioned methods.) We also show the results when $\hat{c}$ is kept unchanged: in this setting, both OMAC and OoD-Control perform terribly because the variable used to fit the ground truth of $c$ is frozen.

Figure 1: Result of the inverted pendulum experiment where the testing environment is gale. The black dashed line represents the desired states for the inverted pendulum. The objective of this task is to keep the angle $\theta$ and the angular velocity $\dot{\theta}$ of the inverted pendulum at zero. $f$ is the ground-truth torque, while $\hat{f}$ is the predicted torque; * marks the best performance. As shown in the amplified areas (the black rounded rectangles), our algorithm predicts much better than OMAC.

Quadrotor. We show the result of the quadrotor task in Figure 2 and Table 2 (see Appendix B.6.1). Although the testing environments differ from the training ones, the OoD-Control method maintains good stability and tracking accuracy. In some cases, the performance of OoD-Control approaches the omniscient case, in which the rotor is provided with precise wind conditions, showing that our algorithm predicts the wind with acceptable error.
Our algorithm achieves lower ACE than the baseline methods (roughly 60% lower than OMAC and over 70% lower than PID) in the most difficult cases. Besides, we shorten the training time to test sample efficiency, and our algorithm performs well in few-shot learning. By adding noise while training the DNN and fixing the meta-learning rate for $\hat{c}$, our algorithm gains robustness and adapts more quickly to a different environment. We tested our method on several trajectories, and OoD-Control outperforms the baseline and the conventional no-adapt method when the distribution domain shifts during testing. Meanwhile, OoD-Control learns more from changes in the environment and applies this as prior knowledge, thereby improving its adaptability. The OMAC and OoD-Control algorithms were tested under hovering, figure-8, spiral-upward, and sin-forward trajectory scenarios; Figure 2 and Table 2 (see Appendix B.6.1) show the trajectory tracking results. OoD-Control provides more accurate results across trajectories under a wide range of wind environments and achieves state-of-the-art performance in all these situations compared with the baseline. These experiments demonstrate that systems trained by the proposed OoD-Control algorithm achieve state-of-the-art performance. In addition, the control method can be applied to different dynamical models and is extensible and broadly applicable.

6. CONCLUSION

In this paper, we theoretically demonstrate that the average control error of UAV flight control is upper-bounded by a constant when the perturbation on the state variables is within a certain radius. Besides, we propose an algorithmic framework, OoD-Control, and evaluate it under turbulent environmental conditions. Based on our experimental results, we conclude that our algorithm is scalable and broadly applicable to a variety of dynamic models. For future work, we will explore extending our algorithmic framework to more UAV types, such as unmanned helicopters, tilt-rotors, and unmanned fixed-wing aircraft. As far as we are aware, this is one of the first papers to theoretically discuss out-of-distribution problems in the context of online adaptive UAV flight control.

A PROOFS OF THEOREMS, LEMMAS, AND COROLLARIES

A.1 PROOF FOR THEOREM 1

Theorem 1. $L_{\pi_0}(H, B)$ is denoted as the lower bound in equation 3. Lagrangian methods can be adapted to solve the inequality:
$$L_{\pi_0}(H, B) = \min_{h \in H} \min_{\delta \in B} \max_{\lambda \in \mathbb{R}} L(h, \delta, \lambda) \triangleq \min_{h \in H} \min_{\delta \in B} \max_{\lambda \in \mathbb{R}} \left\{ h_{\pi_0}(x_t^{(z)} + \delta) - \lambda \left[ h_{\pi_0}(x_t^{(z)}) - \hat{h}_{\pi_0}(x_t^{(z)}) \right] \right\}. \qquad (17)$$
Exchanging the min and max yields the following dual form:
$$L_{\pi_0}(H, B) \ \ge\ \max_{\lambda \ge 0} \min_{h \in H} \min_{\delta \in B} L(h, \delta, \lambda) = \max_{\lambda \ge 0} \left\{ \lambda \hat{h}_{\pi_0}(x_t^{(z)}) - \max_{\delta \in B} D_H(\lambda \pi_0 \,\|\, \pi_\delta) \right\},$$
where $\pi_\delta$ represents the distribution of $z + \delta$ when $z \sim \pi_0$, and $D_H(\lambda \pi_0 \,\|\, \pi_\delta) = \max_{h \in H} \{\lambda \mathbb{E}_{z \sim \pi_0}[h(x_t^{(z)})] - \mathbb{E}_{z \sim \pi_\delta}[h(x_t^{(z)})]\} = \int [\lambda \pi_0(z) - \pi_\delta(z)]_+ \, dz$.

Proof. (i)
$$\begin{aligned}
L_{\pi_0}(H, B) &= \min_{h \in H} \min_{\delta \in B} \max_{\lambda \in \mathbb{R}} \left\{ h_{\pi_0}(x_t^{(z)} + \delta) - \lambda \left[ h_{\pi_0}(x_t^{(z)}) - \hat{h}_{\pi_0}(x_t^{(z)}) \right] \right\} \\
&\ge \max_{\lambda \ge 0} \min_{h \in H} \min_{\delta \in B} \left\{ h_{\pi_0}(x_t^{(z)} + \delta) - \lambda \left[ h_{\pi_0}(x_t^{(z)}) - \hat{h}_{\pi_0}(x_t^{(z)}) \right] \right\} \\
&= \max_{\lambda \ge 0} \left\{ \lambda \hat{h}_{\pi_0}(x_t^{(z)}) - \max_{h \in H} \left( \lambda h_{\pi_0}(x_t^{(z)}) - h_{\pi_\delta}(x_t^{(z)}) \right) \right\} \\
&= \max_{\lambda \ge 0} \left\{ \lambda \hat{h}_{\pi_0}(x_t^{(z)}) - \max_{\delta \in B} D_H(\lambda \pi_0 \,\|\, \pi_\delta) \right\}
\end{aligned}$$
(ii) We define the sign function:
$$\mathrm{sgn}(z) = \begin{cases} 1, & \text{if } \lambda \pi_0(z) - \pi_\delta(z) \ge 0, \\ 0, & \text{if } \lambda \pi_0(z) - \pi_\delta(z) < 0. \end{cases}$$
Thus we can calculate $D_H(\lambda \pi_0 \,\|\, \pi_\delta)$ directly:
$$D_H(\lambda \pi_0 \,\|\, \pi_\delta) = \max_{h \in H} \left( \lambda h_{\pi_0}(x_t^{(z)}) - h_{\pi_0}(x_t^{(z)} + \delta) \right) = \max_{h \in H} \left\{ \lambda \mathbb{E}_{z \sim \pi_0}[h(x_t^{(z)})] - \mathbb{E}_{z \sim \pi_\delta}[h(x_t^{(z)})] \right\} = \int \mathrm{sgn}(z) \left[ \lambda \pi_0(z) - \pi_\delta(z) \right] dz = \int \left[ \lambda \pi_0(z) - \pi_\delta(z) \right]_+ dz$$

A.2 PROOF FOR COROLLARY 1

Corollary 1 (Gaussian noise). With Gaussian noise $\pi_0 = \mathcal{N}(0, \sigma^2 I)$ and bounded disturbance $B = \{\delta : \|\delta\|_2 \le r\}$, the lower bound in equation 5 satisfies:
$$L_{\pi_0}(H, B) = \max_{\lambda \ge 0} \left\{ \lambda \hat{h}_{\pi_0}(x_t^{(z)}) - \max_{\delta \in B} D_H(\lambda \pi_0 \,\|\, \pi_\delta) \right\} \ \ge\ \Phi\!\left( \Phi^{-1}\!\left( \hat{h}_{\pi_0}(x_t^{(z)}) \right) - \frac{r}{\sigma} \right),$$
where $\Phi(\cdot)$ represents the Gaussian cumulative distribution function (CDF). For the case $p = 0.5$, i.e., $\hat{h}_{\pi_0}(x_t^{(z)}) > 0.5$, the radius satisfies $r \le \sigma \Phi^{-1}(\hat{h}_{\pi_0}(x_t^{(z)}))$.

Proof. We need to show:
$$L \ \ge\ \Phi\!\left( \Phi^{-1}\!\left( \hat{h}_{\pi_0}(x_t^{(z)}) \right) - \frac{r}{\sigma} \right) > \frac{1}{2}. \qquad (21)$$
From the dual form,
$$L \ \ge\ \min_{\|\delta\| \le r} \max_{\lambda \ge 0} \left\{ \lambda \hat{h}_{\pi_0}(x_t^{(z)}) - \int \left[ \lambda \pi_0(z) - \pi_\delta(z) \right]_+ dz \right\}.$$
We denote $C_\lambda = \{z : \lambda \pi_0(z) \ge \pi_\delta(z)\} = \{z : \delta^\top z \le \frac{\|\delta\|_2^2}{2} + \sigma^2 \ln \lambda\}$ and $F(\delta, \lambda) = \lambda \hat{h}_{\pi_0}(x_t^{(z)}) - \int [\lambda \pi_0(z) - \pi_\delta(z)]_+ \, dz$.
Then we get:

$$\begin{aligned} F(\delta, \lambda) &= \lambda \hat{h}_{\pi_0}(x_t^{(z)}) - \int \big[\lambda \pi_0(z) - \pi_\delta(z)\big]_+ \, dz = \lambda \hat{h}_{\pi_0}(x_t^{(z)}) - \int_{C_\lambda} \big[\lambda \pi_0(z) - \pi_\delta(z)\big] \, dz \\ &= \lambda \hat{h}_{\pi_0}(x_t^{(z)}) - \lambda \Phi\Big( \frac{\|\delta\|_2}{2\sigma} + \frac{\sigma \ln \lambda}{\|\delta\|_2} \Big) + \Phi\Big( -\frac{\|\delta\|_2}{2\sigma} + \frac{\sigma \ln \lambda}{\|\delta\|_2} \Big) \end{aligned}$$

It is notable that $F(\delta, \lambda)$ is a concave function w.r.t. $\lambda$; thus the maximum occurs when $\frac{\partial F(\delta, \lambda)}{\partial \lambda}\big|_{\lambda = \lambda_\delta} = 0$. A direct calculation gives

$$\lambda_\delta = \exp\Big( \frac{2\sigma \|\delta\|_2 \, \Phi^{-1}\big(\hat{h}_{\pi_0}(x_t^{(z)})\big) - \|\delta\|_2^2}{2\sigma^2} \Big)$$

Therefore,

$$L \ge \min_{\|\delta\| \le r} \max_{\lambda \ge 0} F(\delta, \lambda) = \min_{\|\delta\| \le r} \Phi\Big( -\frac{\|\delta\|_2}{2\sigma} + \frac{\sigma \ln \lambda_\delta}{\|\delta\|_2} \Big) = \min_{\|\delta\| \le r} \Phi\Big( \Phi^{-1}\big(\hat{h}_{\pi_0}(x_t^{(z)})\big) - \frac{\|\delta\|_2}{\sigma} \Big) = \Phi\Big( \Phi^{-1}\big(\hat{h}_{\pi_0}(x_t^{(z)})\big) - \frac{r}{\sigma} \Big)$$

In the case $p = 0.5$, the perturbation radius $r$ is calculated as:

$$\min_{\|\delta\| \le r} \max_{\lambda \ge 0} F(\delta, \lambda) > \frac{1}{2} \;\Leftrightarrow\; \Phi\Big( \Phi^{-1}\big(\hat{h}_{\pi_0}(x_t^{(z)})\big) - \frac{r}{\sigma} \Big) > \frac{1}{2} \;\Leftrightarrow\; r < \sigma \Phi^{-1}\big(\hat{h}_{\pi_0}(x_t^{(z)})\big)$$

A.3 PROOF FOR LEMMA 1

Lemma 1 (ACE bound in the ideal case). For any perturbation in $\mathcal{B} = \{\delta : \|\delta\|_2 \le r\}$, the theoretical average control error is bounded as:

$$\mathrm{ACE}_{\pi_\delta} = \frac{1}{T} \sum_{t=1}^{T} \big\| \mathbb{E}_{z \sim \pi_\delta}[\hat{f}(x_t^{(z)})] - \mathbb{E}_{z \sim \pi_\delta}[f(x_t^{(z)})] \big\| \le \hat{h}^{-1}(p)$$

Proof. From equation 1, we consider a discrete nonlinear control-affine system:

$$x_{t+1} = f_0(x_t) + B(x_t) u_t - f(x_t, c) + w_t, \quad 1 \le t \le T$$

The controller of model-based control with the ideal model is $u_t = B^{\dagger}(x_t^{(z)})\big( -f_0(x_t^{(z)}) + \hat{f}(x_t^{(z)}) \big)$; thus equation 1 becomes:

$$x_{t+1} = \hat{f}(x_t^{(z)}) - f(x_t^{(z)}, c) + w_t$$

A straightforward calculation then yields the upper ACE bound under disturbance $z \sim \pi_\delta$ (using that the noise $w_t$ is zero-mean):

$$\begin{aligned} \mathrm{ACE}_{\pi_\delta} &= \frac{1}{T} \sum_{t=1}^{T} \big\| \mathbb{E}_{z \sim \pi_\delta}(x_t^{(z)}) - x_t^d \big\| = \frac{1}{T} \sum_{t=1}^{T} \big\| \mathbb{E}_{z \sim \pi_\delta}[\hat{f}(x_t^{(z)})] - \mathbb{E}_{z \sim \pi_\delta}[f(x_t^{(z)}, c)] + \mathbb{E}_{z \sim \pi_\delta}(w_t) \big\| \\ &\le \frac{1}{T} \sum_{t=1}^{T} \Big( \big\| \mathbb{E}_{z \sim \pi_\delta}[\hat{f}(x_t^{(z)})] - \mathbb{E}_{z \sim \pi_\delta}[f(x_t^{(z)}, c)] \big\| + \big\| \mathbb{E}_{z \sim \pi_\delta}(w_t) \big\| \Big) \le \big\| \hat{h}^{-1}(p) \big\| \end{aligned}$$

A.4 PROOF FOR COROLLARY 2

Corollary 2 (ACE bound under control actuation misspecification). Let $\Delta B(x_t^{(z)})$ be the parametric misspecification in the actuation matrix. Then $\mathrm{ACE}_{\pi_\delta}$ satisfies:

$$\mathrm{ACE}_{\pi_\delta} = \frac{1}{T} \sum_{t=1}^{T} \big\| \mathbb{E}_{z \sim \pi_\delta}(x_t^{(z)}) - x_t^d \big\| \le \hat{h}^{-1}(p) + \frac{1}{T} \sum_{t=1}^{T} \big\| \mathbb{E}_{z \sim \pi_\delta}\big[ \Delta B(x_t^{(z)}) B^{\dagger}(x_t^{(z)}) e_f(x_t^{(z)}) \big] \big\|$$

where $e_f(x_t^{(z)}) = \hat{f}(x_t^{(z)}) - f_0(x_t^{(z)})$.

Proof.
$$\begin{aligned} x_{t+1} &= f_0(x_t^{(z)}) + \big( B(x_t^{(z)}) + \Delta B(x_t^{(z)}) \big) u_t - f(x_t^{(z)}, c) + w_t \\ &= \hat{f}(x_t^{(z)}) - f(x_t^{(z)}, c) + w_t + \Delta B(x_t^{(z)}) B^{\dagger}(x_t^{(z)}) \big[ -f_0(x_t^{(z)}) + \hat{f}(x_t^{(z)}) \big] \\ &= \hat{f}(x_t^{(z)}) - f(x_t^{(z)}, c) + w_t + \Delta B(x_t^{(z)}) B^{\dagger}(x_t^{(z)}) e_f(x_t^{(z)}) \end{aligned}$$

Using $e_f(x_t^{(z)})$ to denote $\hat{f}(x_t^{(z)}) - f_0(x_t^{(z)})$, the ACE upper bound is calculated as follows:

$$\begin{aligned} \mathrm{ACE}_{\pi_\delta} &= \frac{1}{T} \sum_{t=1}^{T} \big\| \mathbb{E}_{z \sim \pi_\delta}(x_t^{(z)}) - x_t^d \big\| \\ &= \frac{1}{T} \sum_{t=1}^{T} \big\| \mathbb{E}_{z \sim \pi_\delta}\big[ \hat{f}(x_t^{(z)}) - f(x_t^{(z)}, c) + w_t + \Delta B(x_t^{(z)}) B^{\dagger}(x_t^{(z)}) e_f(x_t^{(z)}) \big] \big\| \\ &= \frac{1}{T} \sum_{t=1}^{T} \big\| \mathbb{E}_{z \sim \pi_\delta}[\hat{f}(x_t^{(z)})] - \mathbb{E}_{z \sim \pi_\delta}[f(x_t^{(z)}, c)] + \mathbb{E}_{z \sim \pi_\delta}\big[ \Delta B(x_t^{(z)}) B^{\dagger}(x_t^{(z)}) e_f(x_t^{(z)}) \big] \big\| \\ &\le \hat{h}^{-1}(p) + \frac{1}{T} \sum_{t=1}^{T} \big\| \mathbb{E}_{z \sim \pi_\delta}\big[ \Delta B(x_t^{(z)}) B^{\dagger}(x_t^{(z)}) e_f(x_t^{(z)}) \big] \big\| \end{aligned}$$

A.5 PROOF FOR LEMMA 2

Lemma 2 (Trajectory tracking ACE of the quadrotor). For environmental disturbances in $\mathcal{B} = \{\delta : \|\delta\|_2 \le r\}$, the quadrotor's trajectory tracking error is given as:

$$\begin{aligned} \mathrm{ACE}_{\pi_\delta} &= \frac{1}{T} \sum_{t=1}^{T} \big\| \mathbb{E}_{z \sim \pi_\delta}(x_t^{(z)}) - x_t^d \big\| = \frac{1}{T} \sum_{t=1}^{T} \big\| \mathbb{E}_{z \sim \pi_\delta}\big[ C_1 e^{r_1 x_t^{(z)}} + C_2 e^{r_2 x_t^{(z)}} - \epsilon / K_v \big] \big\| \\ &= \frac{1}{T} \sum_{t=1}^{T} \big\| \mathbb{E}_{z \sim \pi_\delta}\big[ C_1 e^{r_1 x_t^{(z)}} + C_2 e^{r_2 x_t^{(z)}} \big] - \hat{h}^{-1}(p) / K_v \big\| \end{aligned}$$

where $C_1 = -x_t^d - C_2 + \epsilon/K_v$, $C_2 = \big( \epsilon + (K_v x_t^d - \epsilon) e^{r_1 x_t^d} \big) / \big( K_v (e^{r_2 x_t^d} - e^{r_1 x_t^d}) \big)$, and $\epsilon = f(x_t^{(z)}, c) - f(x_t^{(z)}, \hat{c})$ is the prediction error of the unknown dynamics.

Proof. Recall the kinetic function in equation 15: $m\dot{v} = mg + R f_T + f_t$. The desired control force $f_d$ is designed as:

$$f_d = R f_T = \hat{f}_d - \hat{f}_t, \qquad \hat{f}_d = m \dot{v}_r + K_v x_e - mg$$

where $x_e = x_t^{(z)} - x_t^d$ is the trajectory tracking error and $v_r$ is the desired velocity at time $t$. By substituting equation 32 into equation 15, the UAV dynamics become:

$$\begin{aligned} m \dot{v} &= mg + R f_T + f_t \\ m \dot{v} - mg - \hat{f}_d &= f_t - \hat{f}_t \\ m (\dot{v} - \dot{v}_r) - K_v x_e &= f_t - \hat{f}_t \\ m \ddot{x}_e - K_v x_e - \epsilon &= 0 \\ \ddot{x}_e - \frac{K_v}{m} x_e - \frac{1}{m} \epsilon &= 0 \end{aligned}$$

where $\epsilon = f_t - \hat{f}_t = f(x_t^{(z)}, c) - f(x_t^{(z)}, \hat{c})$.
Note that equation 33 is a second-order inhomogeneous linear differential equation.

1) General solution. Making the exponential substitution in the differential equation, $r$ satisfies the auxiliary equation:

$$r^2 - \frac{K_v}{m} = 0 \;\Rightarrow\; r_{1,2} = \pm \sqrt{\frac{K_v}{m}}$$

Then we obtain the general solution $\bar{x}_e$ of the homogeneous equation:

$$\bar{x}_e = C_1 e^{r_1 x} + C_2 e^{r_2 x}$$

2) Special solution. Considering the standard second-order differential equation $\ddot{y} + p\dot{y} + qy = P(x) e^{\alpha x}$, the special solution $x_e^*$ is immediate:

$$x_e^* = -\epsilon / K_v$$

Therefore, $x_e$ can be expressed as:

$$x_e = \bar{x}_e + x_e^* = C_1 e^{r_1 x} + C_2 e^{r_2 x} - \epsilon / K_v \quad (37)$$

3) Calculation of $C_1$ and $C_2$. In equation 37, there exist two fixed points: $(x^d, 0)$ and $(0, -x^d)$. We have:

$$\begin{aligned} 0 &= C_1 e^{r_1 x^d} + C_2 e^{r_2 x^d} - \epsilon / K_v \\ -x^d &= C_1 + C_2 - \epsilon / K_v \end{aligned} \quad (38)$$

By solving the simultaneous equations, we obtain $C_1$ and $C_2$:

$$C_1 = -x^d - C_2 + \epsilon / K_v, \qquad C_2 = \big( \epsilon + (K_v x^d - \epsilon) e^{r_1 x^d} \big) / \big( K_v (e^{r_2 x^d} - e^{r_1 x^d}) \big)$$

Thus the solution of the original differential equation is:

$$x_e = x_t^{(z)} - x_t^d = C_1 e^{r_1 x} + C_2 e^{r_2 x} - \epsilon / K_v$$

and we have the average error bound in equation 2:

$$\begin{aligned} \mathrm{ACE}_{\pi_\delta} &= \frac{1}{T} \sum_{t=1}^{T} \big\| \mathbb{E}_{z \sim \pi_\delta}(x_t^{(z)} - x_t^d) \big\| = \frac{1}{T} \sum_{t=1}^{T} \big\| \mathbb{E}_{z \sim \pi_\delta}\big[ C_1 e^{r_1 x_t^{(z)}} + C_2 e^{r_2 x_t^{(z)}} - \epsilon / K_v \big] \big\| \\ &= \frac{1}{T} \sum_{t=1}^{T} \big\| \mathbb{E}_{z \sim \pi_\delta}\big[ C_1 e^{r_1 x_t^{(z)}} + C_2 e^{r_2 x_t^{(z)}} \big] - \mathbb{E}_{z \sim \pi_\delta}\big[ f(x_t^{(z)}, c) - f(x_t^{(z)}, \hat{c}) \big] / K_v \big\| \\ &= \frac{1}{T} \sum_{t=1}^{T} \big\| \mathbb{E}_{z \sim \pi_\delta}\big[ C_1 e^{r_1 x_t^{(z)}} + C_2 e^{r_2 x_t^{(z)}} \big] - \hat{h}^{-1}(p) / K_v \big\| \end{aligned}$$

B EXPERIMENTAL SETTINGS AND DETAILS

B.1 OOD-CONTROL PSEUDO CODE AND ALGORITHM SETTINGS

Out-of-distribution data arise from the misspecification of system components and from the systematic errors of sensors and environment models. Our algorithm is able to extrapolate to unknown wind disturbances after learning a model from previous data containing generalized information. To eliminate the influence of other factors, the two models share the same initial state and environmental conditions. Furthermore, to make the training process fair for both models, we simulate them for the same number of iterations and sustain each iteration for the same period of time.

Algorithm 1 OoD-Control (Out-of-Distribution Generalization Control for Adaptive Nonlinear Control)
Input: Set of distribution functions X, DNN ϕ with parameters Θ, environment estimation vector ĉ
Parameter: Parameters of the mechanical system and aerodynamics.
Output: The estimation of the unknown force f̂
1: while picking χ from X do
2:   Sample a series of independent random variables w subject to χ as the external wind force.
3:   Apply the external force to the simulation according to equation 43.
4:   Calculate the loss with noise in the state: L = ∥ϕ(x + ∆x)ᵀ ĉ − f∥²
5:   Update Θ: Θ ← Θ − η₁ ∇_Θ L
6:   Update ĉ: ĉ ← ĉ − η₂ ∇_ĉ L
7: end while
8: return f̂ = ϕ(x; Θ)ᵀ ĉ
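As a minimal illustration of steps 4–6, the two-timescale update can be sketched as follows. The DNN ϕ is replaced here by a fixed random feature map with a learnable linear readout (standing in for Θ) so that both gradients are available in closed form; the feature map, dimensions, learning rates, and the synthetic wind-force target are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed random projection; np.tanh(A @ x) stands in for the DNN's features.
A = rng.normal(size=(16, 3))

# Ground-truth model used only to synthesize the "measured" wind force f
# (a stand-in for the simulator; purely illustrative).
W_star = rng.normal(scale=0.2, size=(4, 16))
c_star = rng.normal(size=4)
wind_force = lambda x: c_star @ (W_star @ np.tanh(A @ x))

W = np.zeros((4, 16))                   # learnable readout, playing the role of Theta
c_hat = rng.normal(scale=0.1, size=4)   # environment vector c-hat
eta1, eta2 = 1e-2, 1e-2                 # two learning rates (eta1 for Theta, eta2 for c-hat)

def eval_loss(x):
    u = np.tanh(A @ x)
    return (c_hat @ (W @ u) - wind_force(x)) ** 2

xs_eval = rng.normal(size=(100, 3))
before = np.mean([eval_loss(x) for x in xs_eval])

for _ in range(8000):
    x = rng.normal(size=3)
    dx = 0.01 * rng.normal(size=3)            # noise added to the state (step 4)
    u = np.tanh(A @ (x + dx))                 # features of the noisy state
    err = c_hat @ (W @ u) - wind_force(x)     # L = err^2
    W -= eta1 * 2 * err * np.outer(c_hat, u)  # step 5: Theta update
    c_hat -= eta2 * 2 * err * (W @ u)         # step 6: c-hat update

after = np.mean([eval_loss(x) for x in xs_eval])
print(f"eval loss: {before:.4f} -> {after:.4f}")
```

Separating the slowly-learned features (Θ) from the fast environment vector (ĉ) mirrors the algorithm's design: at deployment only ĉ needs to adapt online.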

B.2 UPDATE OF PARAMETERS ϕ AND ĉ

Based on equation 1, we design a discrete-time simulation process that calculates the state variables and estimates the unknown term at each time interval, and in this way collects data for training ϕ. In Algorithm 1, we update Θ during simulation: Θ ← Θ − η₁∇_Θ L. Keeping ĉ constant in the inverted pendulum experiment makes it impossible to update the environmental parameters, which can result in a higher ACE or even failure to control the system (see Table 2). For the quadrotor, when the wind is severe, the vehicle cannot maintain its position, as illustrated in Figure 3. It should be noted, however, that our algorithm still exhibits better control even when position drift occurs. Generally, updating ϕ is more energy-intensive, whereas updating only ĉ is closer to what an actual embedded device would run.

The fourth-order Runge–Kutta method is a common iterative method for solving differential equations and approximating continuous dynamics. In this paper, we use it to integrate equation 1 and simulate the mechanical system. The simulation advances by a small step ∆t each time, which corresponds to the step size of the Runge–Kutta method, and the right-hand side of equation 1 is the derivative of the state x.
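The fourth-order Runge–Kutta step described above can be sketched as follows; the dynamics used here are a generic harmonic oscillator rather than the paper's pendulum or quadrotor model.

```python
import numpy as np

def rk4_step(f, x, t, dt):
    """One fourth-order Runge-Kutta step for the ODE x' = f(t, x)."""
    k1 = f(t, x)
    k2 = f(t + dt / 2, x + dt / 2 * k1)
    k3 = f(t + dt / 2, x + dt / 2 * k2)
    k4 = f(t + dt, x + dt * k3)
    return x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# Example dynamics: simple harmonic oscillator x'' = -x in first-order form,
# with exact solution x(t) = cos(t), v(t) = -sin(t).
f = lambda t, x: np.array([x[1], -x[0]])

x = np.array([1.0, 0.0])
dt = 0.01
for i in range(314):          # integrate up to t = 3.14
    x = rk4_step(f, x, i * dt, dt)

print(np.round(x[0], 3))  # -1.0, i.e. cos(3.14) to within RK4 accuracy
```

In the paper's setting, `f` would be the right-hand side of equation 1 and `dt` the simulation interval ∆t.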

B.5 THE COMPARISON OF VARIOUS TRAJECTORY TRACKING MODES

The OoD-generalized model adapts better and produces input forces closer to the desired ones, as illustrated in Figure 4.

Protocol. Two sets of distribution functions X and Ω with X ∩ Ω = ∅ are defined for training and testing. The training distribution χ ∈ X and the testing distribution ω ∈ Ω are specified for each experimental task. The wind velocity is a series of independent random variables sampled from the chosen distribution (χ or ω). The wind brings induced airflow to the rotor blades, creating complex and non-stationary aerodynamic interactions. All models are simulated with the same wind series and for the same duration.

1) The lift and wind disturbance force. A rotor spinning at speed $n$ produces thrust $\|T\| = k_f n^2 = 2\pi\rho r^2 \|V_t\|^2$, so the induced airflow velocity is

$$\|V_t\| = \sqrt{\frac{k_f n^2}{2\pi\rho r^2}}$$

where $k_f$ is the lift coefficient, $\rho$ is the air density, and $r$ is the rotor radius. A wind field results in a total aerodynamic force on the rotor equal to the sum of the lift force $T$ and the additional wind disturbance force $F_w$. The total lift can be calculated as:

$$\|T + F_w\| = 2\pi\rho r^2 \|V_t\| \, \|V_w + V_t\|$$

Therefore, the wind disturbance force $H_{wi}$ and moment $M_{wi}$ on the $i$-th rotor are:

$$H_{wi} = k_f n_i^2 - 2\pi\rho r^2 \|V_t\| \, \big\| (0, 0, \|V_t\|)^T + V_w^B \big\|, \qquad M_{wi} = \begin{cases} k_m H_{wi} / k_f, & \text{when the rotor turns clockwise} \\ -k_m H_{wi} / k_f, & \text{otherwise} \end{cases}$$

where $k_m$ is the anti-torque coefficient related to the shape of the rotors and the local air density.

2) The air drag. Air drag can be ignored in hovering or low-speed flight without wind. However, in the presence of a wind field, the air drag is calculated as:

$$D_g = \frac{1}{2} c \rho S_{air} V_{air}^2$$

where $c$ represents the air drag coefficient, $S_{air}$ is the windward area, and $V_{air}$ is the speed of the wind relative to the quadrotor.

$Noise_x$ and $Noise_a$ denote the noise scales added to the state vector and the environment representation vector, respectively. In this experiment, we double the test duration (compared to the results in Table 2) to enlarge the differences.
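The wind-disturbance and drag formulas above can be evaluated numerically as follows; the coefficient values are illustrative placeholders, not the paper's calibrated parameters.

```python
import numpy as np

# Illustrative coefficients (not the paper's calibrated values).
k_f = 1e-5    # lift coefficient
k_m = 1e-6    # anti-torque coefficient
rho = 1.225   # air density, kg/m^3
r = 0.1       # rotor radius, m

def induced_velocity(n):
    # From ||T|| = k_f n^2 = 2*pi*rho*r^2 * ||V_t||^2.
    return np.sqrt(k_f * n ** 2 / (2 * np.pi * rho * r ** 2))

def wind_disturbance(n, V_w_body):
    """Wind disturbance force H_wi and moment M_wi on one (clockwise) rotor."""
    v_t = induced_velocity(n)
    H = k_f * n ** 2 - 2 * np.pi * rho * r ** 2 * v_t * np.linalg.norm(
        np.array([0.0, 0.0, v_t]) + V_w_body)
    M = k_m * H / k_f   # flip the sign for a counter-clockwise rotor
    return H, M

def air_drag(c, S_air, V_air):
    # D_g = 1/2 * c * rho * S_air * V_air^2
    return 0.5 * c * rho * S_air * V_air ** 2

# Sanity check: without wind the disturbance force vanishes.
H0, _ = wind_disturbance(n=400.0, V_w_body=np.zeros(3))
print(abs(H0) < 1e-9)  # True
```

With `V_w_body = 0` the total aerodynamic force reduces to the nominal lift $k_f n^2$, so $H_{wi} = 0$, which matches the formula above.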



The calculation method of the radius r for the case p = 0.5 is given in Cohen et al. (2019). More detailed information on the aerodynamics under wind is given in Appendix B.6.3.
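For the case p = 0.5, the radius r = σΦ⁻¹(ĥ) from Cohen et al. (2019) can be evaluated directly with the Gaussian inverse CDF; a minimal sketch, where σ and the smoothed score ĥ are example values:

```python
from statistics import NormalDist

def certified_radius(h_hat, sigma):
    """r = sigma * Phi^{-1}(h_hat); positive only when h_hat > 0.5."""
    return sigma * NormalDist().inv_cdf(h_hat)

print(round(certified_radius(0.9, 1.0), 3))  # 1.282, since Phi^{-1}(0.9) ~ 1.2816
print(certified_radius(0.5, 1.0))            # 0.0: no margin at the p = 0.5 threshold
```

The radius grows with both the noise level σ and the confidence margin ĥ above 0.5, matching the bound Φ(Φ⁻¹(ĥ) − r/σ) > 1/2 in Corollary 1.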




Figure 2: 2D view of trajectories in different wind conditions and performance comparison of the OMAC and OoD-Control algorithms for trajectory tracking. The goal of the controller is to get closer to the desired trajectory (black line). The colors indicate the distance from the actual location to the desired location, corresponding to the color bar at the bottom; the shade of color indicates the magnitude of the position deviation. Compared to OMAC and no-adapt, the proposed OoD-Control method provides more accurate results across a wide range of wind environments and trajectories. The average ACE over ten independent experiments is marked in each subplot, and * marks the best performance under the same environment.

Figure 3: Results when both ĉ and ϕ remain unchanged during testing

According to Figure 7, which shows the traces under the OMAC (Deep Learning), OoD-Control, and omniscient models together with the desired trajectory, the OoD-Control algorithm clearly stays closer to the desired curve.

Figure 4: Traces when the quadrotor tries to keep still.

Figure 5: Traces when the trajectory is a spiral curve.

Figure 6: Traces of OMAC, OoD-Control and omniscient quadrotor models with figure-8 trajectory

Figure 8: Sketch of the aerodynamics of the propeller in wind conditions

PERFORMANCE IN THE I.I.D. ENVIRONMENT AND DIFFERENT NOISE SETTINGS

ACE results in pendulum experiments with the changed or unchanged ĉ

ACE in quadrotor experiment with different trajectories and environment

ACE in quadrotor experiment in i.i.d. environment


Pendulum. In this task, we use 5 different normal distributions as the training set, i.e., X = {χ_k | χ_k = N(0, 0.2k), k = 1, 2, . . . , 5}. For testing, we set 3 different levels of wind by difficulty: breeze, strong breeze, and gale. Each level has 6 or 8 different uniform distributions. The details of the testing set Ω are as follows:
• Breeze:

Quadrotor. In this instantiation, we use only one training distribution, the three-dimensional standard normal distribution. Meanwhile, we transform the initial wind data sampled from the normal distribution to their absolute values, i.e., χ = |N(µ, Σ)|, µ = [0, 0, 0], Σ = diag(1, 1, 1). The reason for this operation is to make the wind come from only one octant, so that it differs from the test environment in both distribution and direction. Similarly, the test set Ω also has 3 levels, but the exact distributions differ from the pendulum task. The details are as follows:
• Breeze:
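The sampling scheme above can be sketched as follows; it assumes 0.2k is the scale parameter of N(0, 0.2k) (the text is ambiguous between scale and variance), and the exact breeze/gale test distributions are specified in the paper's tables, so they are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pendulum_train(k, size):
    """Pendulum training wind: chi_k = N(0, 0.2k), k = 1..5
    (0.2k taken here as the scale parameter)."""
    assert 1 <= k <= 5
    return rng.normal(0.0, 0.2 * k, size=size)

def sample_quadrotor_train(size):
    """Quadrotor training wind: |N(0, I_3)|. Folding the Gaussian keeps the
    wind inside a single octant, so the training wind differs from the test
    environment in both distribution and direction."""
    return np.abs(rng.normal(size=(size, 3)))

w = sample_quadrotor_train(1000)
print(bool((w >= 0).all()))  # True: every sampled wind vector lies in one octant
```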

Trajectory illustration

The three trajectories used in the quadrotor experiment (see Table 2) are mathematically described as:
• sin-forward: (x, y, z) = (2 sin(πt/3), 0.2t, 0.5t)
• figure-8: (x, y, z) = (2 sin(πt/5), 2 sin(πt/5), sin(2πt/5))
• spiral-up: (x, y, z) = (sin(2πt/5), cos(
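The reference trajectories can be generated directly from these formulas; since the spiral-up expression is truncated in the text, only the first two are implemented below.

```python
import numpy as np

def sin_forward(t):
    # (x, y, z) = (2 sin(pi t / 3), 0.2 t, 0.5 t)
    return np.array([2 * np.sin(np.pi * t / 3), 0.2 * t, 0.5 * t])

def figure_8(t):
    # (x, y, z) = (2 sin(pi t / 5), 2 sin(pi t / 5), sin(2 pi t / 5))
    return np.array([2 * np.sin(np.pi * t / 5),
                     2 * np.sin(np.pi * t / 5),
                     np.sin(2 * np.pi * t / 5)])

# Sample the desired trajectory x_t^d on a uniform time grid; the figure-8
# components have period 10, so t in [0, 10] covers one full loop.
ts = np.linspace(0.0, 10.0, 201)
traj = np.stack([figure_8(t) for t in ts])
print(traj.shape)  # (201, 3)
```

A tracking controller would consume such a sampled sequence as the desired states $x_t^d$ at each simulation step.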

