TOWARDS ROBUST NEURAL NETWORKS VIA CLOSE-LOOP CONTROL

Abstract

Despite their success in many engineering applications, deep neural networks are vulnerable to various perturbations due to their black-box nature. Recent studies have shown that a deep neural network can misclassify data even if the input is perturbed by an imperceptible amount. In this paper, we address the robustness issue of neural networks with a novel close-loop control method from the perspective of dynamical systems. Instead of modifying the parameters of a fixed neural network architecture, a close-loop control process is added to generate control signals adaptively for perturbed or corrupted data. We connect the robustness of neural networks with optimal control, using the geometrical information of the underlying data to design the control objective. A detailed analysis shows how the embedding manifolds of the state trajectory affect the error estimation of the proposed method. Our approach can simultaneously maintain performance on clean data and improve robustness against many types of data perturbations. It can also further improve the performance of robustly trained neural networks against different perturbations. To the best of our knowledge, this is the first work that improves the robustness of neural networks with close-loop control.¹

1. INTRODUCTION

Due to increasing data and computing power, deep neural networks have achieved state-of-the-art performance in many applications such as computer vision, natural language processing and recommendation systems. However, many deep neural networks are vulnerable to various malicious perturbations due to their black-box nature: a small (even imperceptible) perturbation of the input data may lead to completely wrong predictions (Szegedy et al., 2013; Nguyen et al., 2015). This has been a major concern in safety-critical applications such as autonomous driving (Grigorescu et al., 2020) and medical image analysis (Lundervold & Lundervold, 2019). Various perturbations have been reported, including $\ell_p$-norm based attacks (Madry et al., 2017; Moosavi-Dezfooli et al., 2016; Carlini & Wagner, 2017), semantic perturbations (Engstrom et al., 2017), etc. On the other hand, some algorithms that improve the robustness against those perturbations have shown great success (Madry et al., 2017). However, most robustly trained models are tailored for certain types of perturbations, and they do not work well for other types. Khoury & Hadfield-Menell (2018) showed the non-existence of an optimal decision boundary for any $\ell_p$-norm perturbation.

Recent works (E, 2017; Haber & Ruthotto, 2017) have shown the connection between dynamical systems and neural networks. This dynamical-system perspective provides some interesting theoretical insights about the robustness issue. Given data $x_0 \in \mathbb{R}^d$ and labels $y \in \mathbb{R}^l$ with a joint distribution $\mathcal{D}$, training a neural network can be considered as solving

$$\min_{\theta} \ \mathbb{E}_{(x_0, y)\sim\mathcal{D}}\left[\Phi(x_T, y)\right], \quad \text{s.t. } x_{t+1} = f(x_t, \theta_t),$$

where $\theta$ are the unknown parameters to train, and $f$ and $\Phi$ represent the forward propagation rule and loss function (e.g. cross-entropy) respectively.

§ Equal contributing authors.
¹ A PyTorch implementation can be found at: https://github.com/zhuotongchen/Towards-Robust-Neural-Networks-via-Close-loop-Control.git
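To make the dynamical-system view concrete, the following minimal sketch treats each layer of a residual network as one step of the discrete dynamics $x_{t+1} = f(x_t, \theta_t)$. The tanh residual layer, dimensions and random parameters are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def layer_step(x, theta):
    # One step of the discrete dynamics x_{t+1} = f(x_t, theta_t).
    # Here f is a residual layer with a tanh nonlinearity (illustrative choice).
    W, b = theta
    return x + np.tanh(W @ x + b)

def forward_trajectory(x0, thetas):
    # Roll the state trajectory x_0 -> x_1 -> ... -> x_T through all T layers.
    traj = [x0]
    for theta in thetas:
        traj.append(layer_step(traj[-1], theta))
    return traj

rng = np.random.default_rng(0)
d, T = 4, 3
thetas = [(0.1 * rng.standard_normal((d, d)), np.zeros(d)) for _ in range(T)]
traj = forward_trajectory(rng.standard_normal(d), thetas)
```

Training then amounts to choosing the "open-loop controls" $\theta_t$ that minimize the terminal loss over this trajectory.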
The dynamical-system perspective interprets the vulnerability of neural networks as a system instability issue, which concerns the variation of the state trajectory under small perturbations of the initial condition. Optimal control theory focuses on developing a control model to adjust the system state trajectory in an optimal manner. The first work that links and extends the classical back-propagation algorithm using optimal control theory was presented in Li et al. (2017), where the direct relationship between the Pontryagin's Maximum Principle (Kirk, 1970) and back-propagation was established.

1.1. PAPER CONTRIBUTIONS

To address the limitation of open-loop control methods, we propose the Close-Loop Control Neural Network (CLC-NN), the first close-loop control method to improve the robustness of neural networks. As shown in Fig. 1, our method adds additional blocks to a given $T$-layer neural network: embedding functions $E_t$, which induce running losses in all layers that measure the discrepancies between true features and observed features under input perturbation; control processes then generate control variables $u_t$ to minimize the total running loss under various data perturbations. The original neural network can be obtained by either standard training or robust training. In the latter case, our CLC-NN framework can achieve extra robustness against different perturbations. The forward propagation rule is thus modified with an extra control parameter $u_t \in \mathbb{R}^d$:

$$x_{t+1} = f(x_t, \theta_t, u_t).$$

Fig. 1 should not be misunderstood as an open-loop control. From the perspective of dynamical systems, $x_0$ is an initial condition, and the excitation input signal is $u_t$ (which is $0$ in a standard feed-forward network). Therefore, the forward signal path is from $u_t$ to the internal states $x_t$ and then to the output label $y$. The path from $x_t$ to the embedding function $E_t(x_t)$ and then to the excitation signal $u_t$ forms a feedback and closes the whole loop.

The technical contributions of this paper are summarized below:

• The proposed method relies on the well-accepted assumption that the data and hidden-state manifolds are low-dimensional compared to the ambient dimension (Fefferman et al., 2016). We study the geometrical information of the data and hidden layers to define the objective function for control. Given a trained $T$-layer neural network, a set of embedding functions $E_t$ are trained offline by minimizing the reconstruction loss $\|E_t(x_t) - x_t\|$ over some clean data from $\mathcal{D}$ only. The embedding functions support defining the running loss required in our control method.

• We define the control problem by dynamic programming and implement an online iterative solver based on the Pontryagin's Maximum Principle to avoid the curse of dimensionality. The proposed close-loop control formulation does not require prior information about the perturbation.

• We provide a theoretical error bound for the controlled system in the simplified case with linear activation functions and linear embedding. This error bound reveals how the close-loop control improves neural network robustness in the simplest setting.
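The controlled forward rule can be sketched as follows; we adopt the linear form $x_{t+1} = \theta_t(x_t + u_t)$ analyzed in Section 5 as an assumed instance of $f(x_t, \theta_t, u_t)$, with $u_t = 0$ recovering the standard feed-forward network.

```python
import numpy as np

def controlled_step(x, W, u):
    # Controlled forward rule x_{t+1} = f(x_t, theta_t, u_t); here the
    # linear form x_{t+1} = theta_t (x_t + u_t) from Section 5 (assumption).
    return W @ (x + u)

def controlled_forward(x0, Ws, us):
    # Propagate the state under a given sequence of controls u_0..u_{T-1}.
    x = x0
    for W, u in zip(Ws, us):
        x = controlled_step(x, W, u)
    return x

rng = np.random.default_rng(0)
d, T = 3, 2
Ws = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in range(T)]  # orthogonal layers
x0 = rng.standard_normal(d)
x_free = controlled_forward(x0, Ws, [np.zeros(d)] * T)  # u_t = 0: plain network
```

With zero controls the rollout reduces to the uncontrolled network $x_T = \theta_{T-1}\cdots\theta_0 x_0$.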

2. RELATED WORKS

Many techniques have been reported to improve the robustness of neural networks, such as data augmentation (Shorten & Khoshgoftaar, 2019), gradient masking (Liu et al., 2018), etc. We review adversarial training and reactive defense, which are most relevant to this work.

Adversarial Training. Adversarial training is (possibly) the most popular robust training method; it solves a min-max robust optimization problem to minimize the worst-case loss with perturbed data. Adversarial training effectively regularizes the network's local Lipschitz constants of the loss surface around the data manifold (Liu et al., 2018). Zhang et al. (2019) further characterized the trade-off between robustness and accuracy in adversarial training.

Reactive Defense. A reactive defense method tries to reject or pre-process input data that may cause misclassifications. Metzen et al. (2017) rejected perturbed data by using adversarial detectors that are trained with adversarial data to detect abnormal inputs during forward propagation. Song et al. (2017) estimated the input data distribution $\mathcal{D}$ with a generative model (Oord et al., 2016) to detect data that does not belong to $\mathcal{D}$, and applied a greedy method to search the local neighborhood of the input data for a more statistically plausible counterpart. This purification process has shown improved accuracy on adversarial data contaminated by various types of perturbations. Purification can be considered as a one-step method that solves an optimal control problem whose objective function is defined over the initial condition only. By contrast, the proposed CLC-NN solves the control problem by the dynamic programming principle with an objective function defined over the entire state trajectory, which guarantees the optimality of the resulting controls.

3. THE CLOSE-LOOP CONTROL FRAMEWORK FOR NEURAL NETWORKS

Now we present a close-loop optimal control formulation to address the robustness issue of deep learning. Consider a neural network with model parameters $\theta$, equipped with an external control policy $\pi$, where $\pi \in \Pi$ is a collection of functions $\mathbb{R}^d \to \mathbb{R}^d$ acting on the state and outputting the control signal. The feed-forward propagation in a $T$-layer neural network can be represented as

$$x_{t+1} = f(x_t, \theta_t, \pi_t(x_t)), \quad t = 0, \dots, T-1. \quad (1)$$

Given a trained network, we solve the following optimization problem:

$$\min_{\pi} \ \mathbb{E}_{(x_0,y)\sim\mathcal{D}}\left[J(x_0, y, \pi)\right] := \min_{\pi} \ \mathbb{E}_{(x_0,y)\sim\mathcal{D}}\left[\Phi(x_T, y) + \sum_{s=0}^{T-1} L(x_s, \pi_s(x_s))\right], \quad \text{s.t. Eq. (1)}, \quad (2)$$

where $\pi$ collects the control policies $\pi_0, \dots, \pi_{T-1}$ for all layers. Note that (2) differs from the open-loop control used in standard training. An open-loop control, which treats the network parameters as control variables, seeks a set of fixed parameters $\theta$ to match the output with the true label $y$ by minimizing the terminal loss $\Phi$; the running loss $L$ defines a regularization for $\theta$. However, the terminal and running losses play different roles when our goal is to improve the robustness of a neural network by generating adaptive controls for different inputs.

Challenge of Close-Loop Control for Neural Networks. Optimal control has been well studied in the control community for trajectory optimization, where one defines the running loss as the error between the actual state $x_t$ and a reference state $x_{t,\mathrm{ref}}$ over a time interval $[0, T]$. The resulting control policy adjusts $x_t$ so that it approaches $x_{t,\mathrm{ref}}$. In this paper, we apply the idea of trajectory optimization to improve the robustness of a neural network by adjusting the undesired state $x_t$. However, the formulation is more challenging for neural networks: we do not have a "reference" state during the inference process, so it is unclear how to define the running loss $L$.
In the following, we investigate the manifold embedding of the state trajectory to precisely define the loss functions $\Phi$ and $L$ of Eq. (2) required for the control objective function of a neural network.

3.1. MANIFOLD LEARNING FOR STATE TRAJECTORIES

State Manifold. Our controller design is based on the "manifold hypothesis": real-world high-dimensional data can often be embedded in a lower-dimensional manifold $\mathcal{M}$ (Fefferman et al., 2016). Indeed, neural networks extract the embedded features from $\mathcal{M}$. To fool a well-trained neural network, the perturbed data often stays away from the data manifold $\mathcal{M}$ (Khoury & Hadfield-Menell, 2018). We decompose the data space $\mathcal{Z}$ ($x \in \mathcal{Z}, \ \forall x \sim \mathcal{D}$) as

$$\mathcal{Z} = \mathcal{Z}^{\parallel} \oplus \mathcal{Z}^{\perp},$$

where $\mathcal{Z}^{\parallel}$ contains the embedded manifold $\mathcal{M}$ and $\mathcal{Z}^{\perp}$ is its orthogonal complement. During forward propagation, the state manifold embedded in $\mathcal{Z}$ varies across layers due to both the nonlinear activation function $f$ and state dimensionality variation. Therefore, we denote $\mathcal{Z}_t = \mathcal{Z}_t^{\parallel} \oplus \mathcal{Z}_t^{\perp}$ as the state space decomposition at layer $t$, with $\mathcal{M}_t \subset \mathcal{Z}_t^{\parallel}$. Once an input is perturbed, the components in $\mathcal{Z}^{\perp}$ are the main cause of misclassifications. Therefore, it is important to measure how far a possibly perturbed state $x_t$ deviates from the state manifold $\mathcal{M}_t$.

Embedding Function. Given an embedding function $E_t$ that encodes $x_t$ onto the lower-dimensional manifold $\mathcal{M}_t$ and decodes the result back to the full state space $\mathcal{Z}_t$, the reconstruction loss $\|E_t(x_t) - x_t\|$ measures the deviation of the possibly perturbed state $x_t$ from the manifold $\mathcal{M}_t$. The reconstruction loss is nonzero as long as $x_t$ has components in $\mathcal{Z}_t^{\perp}$. The embedding functions are constructed offline by minimizing the total reconstruction loss over a clean training data set.

• Linear case: $E_t(\cdot)$ can be taken as $V_t^r (V_t^r)^T$, where the columns of $V_t^r$ form an orthonormal basis for $\mathcal{Z}_t^{\parallel}$. Specifically, one can first perform a principal component analysis over a collection of hidden states at layer $t$; then $V_t^r$ is obtained as the first $r$ columns of the resulting eigenvectors.

• Nonlinear case: we choose a convolutional auto-encoder (detailed in Appendix B) to obtain a representative manifold embedding function $E_t$, due to its ease of implementation.

Based on the assumption that most perturbations lie in the $\mathcal{Z}^{\perp}$ subspace, the embeddings are effective at detecting perturbations as long as the target manifold is of low dimension. Alternative manifold learning methods such as Izenman (2012) may also be employed.
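The linear case can be sketched in a few lines of numpy; the function names and the synthetic 2-D subspace data below are our own illustrative choices.

```python
import numpy as np

def fit_linear_embedding(H, r):
    # H: (n x d) matrix of layer-t hidden states collected from clean data.
    # Returns V_r (d x r), the top-r principal directions, so that
    # E_t(x) = V_r V_r^T x is the orthogonal projection onto the state subspace.
    Hc = H - H.mean(axis=0)                  # center before PCA
    _, _, Vt = np.linalg.svd(Hc, full_matrices=False)
    return Vt[:r].T

def reconstruction_loss(x, Vr):
    # ||E_t(x) - x||_2^2: squared distance of x from the embedded subspace;
    # it is zero exactly when x has no component in the orthogonal complement.
    resid = Vr @ (Vr.T @ x) - x
    return float(resid @ resid)

# synthetic clean states lying on a 2-D subspace of R^10
rng = np.random.default_rng(0)
basis = np.linalg.qr(rng.standard_normal((10, 2)))[0]
H = rng.standard_normal((500, 2)) @ basis.T
Vr = fit_linear_embedding(H, r=2)

x_clean = basis @ np.array([1.0, -2.0])            # on-manifold state
x_pert = x_clean + 0.3 * rng.standard_normal(10)   # generic perturbation
```

As expected, the loss is (numerically) zero for the on-manifold state and strictly positive for the perturbed one.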

3.2. FORMULATION FOR THE CLOSE-LOOP CONTROL OF NEURAL NETWORKS

Control Objectives. The above embedding function allows us to define a running loss $L$:

$$L(x_t, \pi_t(x_t), E_t(\cdot)) = \|E_t(x_t) - x_t\|_2^2 + (\pi_t(x_t))^T R\, \pi_t(x_t).$$

Here the matrix $R$ defines a regularization term promoting controls of small magnitude. In practical implementations, a diagonal $R$ with small elements often helps to improve the performance. Now we are ready to design the control objective function of CLC-NN. Different from a standard open-loop control, this work sets the terminal loss $\Phi$ to zero because no true label is given during inference. Consequently, the close-loop control formulation of Eq. (2) becomes

$$\min_{\pi} \ \mathbb{E}_{(x_0,y)\sim\mathcal{D}}\left[J(x_0, y, \pi)\right] := \min_{\pi} \ \mathbb{E}_{(x_0,y)\sim\mathcal{D}}\left[\sum_{t=0}^{T-1} L(x_t, \pi_t(x_t), E_t(\cdot))\right], \quad \text{s.t. Eq. (1)}. \quad (4)$$

Assume that the input data is perturbed by a bounded and small amount, i.e., $x_{\epsilon,0} = x_0 + \epsilon \cdot z$, where $z$ can be either random or adversarial. The proposed CLC-NN adjusts the perturbed state trajectory $x_{\epsilon,t}$ so that it stays at a minimum distance from the desired manifold $\mathcal{M}_t$ while promoting controls of small magnitude.

Intuition. We use an intuitive example to show how CLC-NN controls the state trajectory of unseen data samples. We create a synthetic binary classification data set with 1500 samples. We train a residual neural network with one hidden layer of dimension 2, and adopt the fast gradient sign method (Goodfellow et al., 2014) to generate adversarial data.
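The running loss above translates directly into code; the linear embedding, the matrix $R$ and the toy numbers are illustrative assumptions.

```python
import numpy as np

def running_loss(x, u, Vr, R):
    # L(x_t, pi_t(x_t), E_t(.)) = ||E_t(x_t) - x_t||_2^2 + u^T R u,
    # with the linear embedding E_t = V_r V_r^T and control u = pi_t(x_t).
    resid = Vr @ (Vr.T @ x) - x
    return float(resid @ resid + u @ (R @ u))

# toy check in R^3 with the manifold being the first coordinate axis
Vr = np.array([[1.0], [0.0], [0.0]])
x = np.array([1.0, 2.0, 3.0])        # off-manifold components: (0, 2, 3)
u = np.array([1.0, 1.0, 1.0])
R = 0.1 * np.eye(3)                   # small diagonal regularization
val = running_loss(x, u, Vr, R)       # 13.0 (reconstruction) + 0.3 (control)
```

The reconstruction term penalizes the off-manifold part of the state, while $u^T R u$ keeps the control magnitude small.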

4. IMPLEMENTATION VIA THE PONTRYAGIN'S MAXIMUM PRINCIPLE

Dynamic Programming for Close-Loop Control (4). The control problem in Eq. (4) can be solved by the dynamic programming principle (Bellman, 1952). For simplicity we consider one input data sample, and define a value function $V: \mathcal{T} \times \mathbb{R}^d \to \mathbb{R}$ (where $\mathcal{T} := \{0, 1, \dots, T-1\}$). Here $V(t, x)$ represents the optimal cost-to-go of Eq. (4) incurred from time $t$ at state $x$. One can show that $V(t, x)$ satisfies the dynamic programming principle

$$V(t, x) = \inf_{\pi \in \Pi} \left[ V\big(t+1, f(x, \theta_t, \pi(x))\big) + L(x, \pi(x), E_t(\cdot)) \right]. \quad (5)$$

Eq. (5) gives a necessary and sufficient condition for the optimality of Eq. (4), and it is often solved backward in time by discretizing the entire state space. The state dimension of a modern neural network is on the order of thousands or higher; therefore, discretizing the state space and directly solving Eq. (5) is intractable for real-world applications due to the curse of dimensionality.

Solving (5) via the Pontryagin's Maximum Principle. To overcome the computational challenge, the Pontryagin's Maximum Principle (Kirk, 1970) converts the intractable dynamic programming problem into two Hamilton equations and a maximization condition. Instead of computing the control policy $\pi$ of Eq. (5), the Pontryagin's Maximum Principle provides a necessary condition for optimality with a set of control parameters $[u_0^*, \dots, u_{T-1}^*]$. The mean-field Pontryagin's Maximum Principle can be considered when the initial condition is a batch of i.i.d. samples drawn from $\mathcal{D}$. Specifically, we trade the intractable computational complexity for extra processing time: the Hamilton equations and the maximization condition are solved for every newly observed data point. To begin with, we define the Hamiltonian $H: \mathcal{T} \times \mathbb{R}^d \times \mathbb{R}^d \times \mathbb{R}^l \times \mathbb{R}^m \to \mathbb{R}$ as

$$H(t, x_t, p_{t+1}, \theta_t, u_t) := p_{t+1}^T f(x_t, \theta_t, u_t) - L(x_t, u_t, E_t(\cdot)). \quad (6)$$

Let $x^*$ denote the corresponding optimally controlled state trajectory.
There exists a co-state process $p^*: \mathcal{T} \to \mathbb{R}^d$ such that the Hamilton equations

$$x_{t+1}^* = \nabla_p H(t, x_t^*, p_{t+1}^*, \theta_t, u_t^*), \quad (x_0^*, y) \sim \mathcal{D}, \quad (7)$$

$$p_t^* = \nabla_x H(t, x_t^*, p_{t+1}^*, \theta_t, u_t^*), \quad p_T^* = 0, \quad (8)$$

are satisfied. The terminal co-state is $p_T^* = 0$ since we do not consider the terminal loss $\Phi(x_T, y)$. Moreover, we have the Hamiltonian maximization condition

$$H(t, x_t^*, p_{t+1}^*, \theta_t, u_t^*) \geq H(t, x_t^*, p_{t+1}^*, \theta_t, u_t), \quad \forall u_t \in \mathbb{R}^d \text{ and } \forall t \in \mathcal{T}. \quad (9)$$

Instead of solving Eq. (5) for the optimal control policy $\pi^*(x_t)$, the Pontryagin's Maximum Principle seeks, for a given initial condition, an open-loop optimal solution such that the global optimum of Eq. (5) is attained. The limitation of the maximum principle is that the control parameters $u_t^*$ must be solved for every unseen data point to achieve the optimal solution.

Algorithm Flow. The numerical implementation of CLC-NN is summarized in Alg. 1. Given a trained network (from either standard or adversarial training) and a set of embedding functions, the controls are initialized as $u_t = 0, \ \forall t \in \mathcal{T}$, because adding random initialization generally weakens the robustness performance, and a clean trajectory often incurs no running loss for the gradient update on the control parameters. In every iteration, a given input $x_0$ is propagated forward with Eq. (7) to obtain all the intermediate hidden states $x_t$ and to accumulate the cost $J$. Eq. (8) backward-propagates the co-state $p_t$, and Eq. (9) maximizes the $t$-th Hamiltonian with the current $x_t$ and $p_t$ to compute the optimal control parameters $u_t^*$.

Algorithm 1: CLC-NN with the Pontryagin's Maximum Principle.
Input: possibly perturbed data $x_\epsilon$, a trained neural network, embedding functions $[E_0, \dots, E_{T-1}]$, maxItr (maximum number of iterations).
Output: a set of optimal control parameters $u_0^*, \dots, u_{T-1}^*$.
for $k = 0$ to maxItr do
    $J_k = 0$
    for $t = 0$ to $T-1$ do ▷ forward propagation, Eq. (7)
        $x_{t+1,k} = f(x_{t,k}, \theta_t, u_{t,k})$, where $x_{0,k} = x_\epsilon$
        $J_k = J_k + L(x_{t,k}, u_{t,k}, E_t(x_{t,k}))$ ▷ objective function, Eq. (4)
    end for
    for $t = T$ down to $1$ do ▷ backward propagation, Eq. (8)
        $p_{t,k} = p_{t+1,k}^T \nabla_{x_t} f(x_{t,k}, \theta_t, u_{t,k}) - \nabla_{x_t} L(x_{t,k}, u_{t,k}, E_t(x_{t,k}))$, where $p_{T,k} = 0$
    end for
    for $t = 0$ to $T-1$ do ▷ maximization of the Hamiltonian, Eq. (9), via Eq. (6) and gradient ascent
        $u_{t,k+1} = u_{t,k} + \left[ p_{t+1,k}^T \nabla_{u_t} f(x_{t,k}, \theta_t, u_{t,k}) - \nabla_{u_t} L(x_{t,k}, u_{t,k}, E_t(x_{t,k})) \right]$
    end for
end for
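A compact sketch of this iterative scheme for the linear case $x_{t+1} = \theta_t(x_t + u_t)$ with linear embeddings $E_t = P_t$ (orthogonal projections), where the gradients in Eqs. (7)-(9) are analytic. The learning rate, iteration count, regularization $c$ and the toy problem sizes are illustrative assumptions.

```python
import numpy as np

def clc_pmp(x_eps, Ws, Ps, c=0.01, lr=0.1, max_itr=200):
    # Alg. 1 specialized to linear layers x_{t+1} = W_t (x_t + u_t) and
    # running loss L = ||(I - P_t) x_t||^2 + c ||u_t||^2.
    T, d = len(Ws), x_eps.shape[0]
    us = [np.zeros(d) for _ in range(T)]              # controls start at zero
    for _ in range(max_itr):
        # forward pass (Eq. 7): roll out the states under current controls
        xs = [x_eps]
        for t in range(T):
            xs.append(Ws[t] @ (xs[t] + us[t]))
        # backward pass (Eq. 8): co-states, terminal condition p_T = 0
        ps = [np.zeros(d) for _ in range(T + 1)]
        for t in range(T - 1, -1, -1):
            grad_x_L = 2.0 * (xs[t] - Ps[t] @ xs[t])  # grad of ||(I-P)x||^2
            ps[t] = Ws[t].T @ ps[t + 1] - grad_x_L
        # Hamiltonian maximization (Eq. 9) by gradient ascent on each u_t
        for t in range(T):
            grad_u_H = Ws[t].T @ ps[t + 1] - 2.0 * c * us[t]
            us[t] = us[t] + lr * grad_u_H
    return us

# toy setup: identity (orthogonal) layers, manifold = span(e1, e2) in R^4
d, T = 4, 2
Ws = [np.eye(d) for _ in range(T)]
Vr = np.eye(d)[:, :2]
Ps = [Vr @ Vr.T for _ in range(T)]
x_clean = np.array([1.0, -1.0, 0.0, 0.0])            # on-manifold input
x_eps = x_clean + np.array([0.0, 0.0, 1.0, 1.0])     # off-manifold perturbation
us = clc_pmp(x_eps, Ws, Ps)

x = x_eps                                            # controlled rollout
for W, u in zip(Ws, us):
    x = W @ (x + u)
off = x - Ps[-1] @ x                                 # residual off-manifold part
```

On this toy problem the optimized controls suppress the off-manifold component of the trajectory while leaving the on-manifold signal essentially untouched.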

5. ERROR ANALYSIS FOR SIMPLIFIED LINEAR CASES

For ease of analysis, we consider a simplified neural network with linear activation functions, $x_{t+1} = \theta_t(x_t + u_t)$, and reveal why the proposed method can improve robustness in this simplest setting. Given a perturbed data sample $x_{\epsilon,0}$, we denote its perturbation-free counterpart by $x_0$, so that $z = x_{\epsilon,0} - x_0$. We consider a general perturbation in which $z$ is the direct sum of two orthogonal contributions: $z^{\parallel}$, a perturbation within the data manifold (subspace), and $z^{\perp}$, a perturbation in the orthogonal complement of the data manifold. This setting is general: for adversarial attacks, the perturbation along the orthogonal complement dominates; for random perturbations, the two contributions are on the same scale. Our formulation covers both extreme scenarios, together with intermediate cases.

We use an orthogonal projection as the embedding function, $E_t = V_t^r (V_t^r)^T$, where $V_t^r$ consists of the first $r$ columns of the eigenvectors computed by principal component analysis on a collection of states $x_t$. The proposed CLC-NN minimizes $\|x_{\epsilon,t} - x_t\|_2^2$ by reducing the components of $x_{\epsilon,t}$ that lie in the orthogonal complement of $\mathcal{Z}_t^{\parallel}$. The following theorem provides an error estimation between $x_{\epsilon,t}$ and $x_t$.

Theorem 1. For $t \geq 1$, we have the error estimation

$$\|x_{\epsilon,t} - x_t\|_2^2 \leq \|\theta_{t-1} \cdots \theta_0\|_2^2 \cdot \left[ \alpha^{2t} \|z^{\perp}\|_2^2 + \|z^{\parallel}\|_2^2 + \gamma_t \|z\|_2^2 \left( \gamma_t \alpha^2 (1-\alpha^{t-1})^2 + 2(\alpha - \alpha^t) \right) \right],$$

where $\gamma_t := \max_{s \leq t} \left(1 + \kappa(\bar{\theta}^s)^2\right) \|I - (\bar{\theta}^s)^T \bar{\theta}^s\|_2$ with $\bar{\theta}^s := \theta_{s-1} \cdots \theta_0$, $\kappa(\cdot)$ the condition number, and $\alpha = \frac{c}{1+c}$, where $c$ is the control regularization. In particular, the equality

$$\|x_{\epsilon,t} - x_t\|_2^2 = \alpha^{2t} \|z^{\perp}\|_2^2 + \|z^{\parallel}\|_2^2$$

holds when all $\theta_t$ are orthogonal.

The detailed derivation is presented in Appendix A. Let us summarize the insights from Theorem 1:

• The error estimation is general for any input perturbation. It shows the working principle of the proposed CLC-NN: controlling the perturbation component that lies in the orthogonal complement of the input subspace ($z^{\perp}$).

• The error estimation improves as the control regularization $c$ goes to $0$ (so $\alpha \to 0$). It is not the sharpest possible, as it relies on a greedily optimal control at each layer. The globally optimal control defined by the Riccati equation may achieve a lower loss when $c \neq 0$.

• When the dimension $r$ of the embedding subspace decreases, the control becomes more effective at reducing $\|x_{\epsilon,t} - x_t\|_2^2$. This means the control approach works best when the data is constrained to a low-dimensional manifold, which is consistent with the manifold hypothesis. In particular, observe that as $r \to 0$, $\|z^{\parallel}\|_2^2 \to 0$.

• The obtained upper bound is tight: it becomes the actual error when all forward propagation layers are orthogonal matrices.
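The equality case of Theorem 1 can be checked numerically with the greedy one-step control $u_t^* = -K_t x_t$, for which $I - K_t = \alpha I + (1-\alpha)P_t$ (as derived in Appendix A); the dimensions, the value of $c$, and the random orthogonal layers below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, T, c = 6, 2, 4, 0.5
alpha = c / (1 + c)

# random orthogonal layers theta_t; the layer-t manifold basis is the
# layer-0 basis V rotated by theta_{t-1} ... theta_0
thetas = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in range(T)]
V = np.linalg.qr(rng.standard_normal((d, r)))[0]

z_par = V @ rng.standard_normal(r)       # in-manifold perturbation z^par
w = rng.standard_normal(d)
z_perp = w - V @ (V.T @ w)               # off-manifold perturbation z^perp
e = z_par + z_perp                       # error at t = 0

Vt = V
errs = []
for t in range(T):
    P = Vt @ Vt.T                        # orthogonal projection onto Z_t^par
    # error recursion e_{t+1} = theta_t (I - K_t) e_t
    e = thetas[t] @ (alpha * e + (1 - alpha) * (P @ e))
    Vt = thetas[t] @ Vt                  # rotate the manifold basis
    predicted = alpha ** (2 * (t + 1)) * (z_perp @ z_perp) + z_par @ z_par
    errs.append((float(e @ e), float(predicted)))
```

Each actual squared error matches the predicted $\alpha^{2t}\|z^{\perp}\|_2^2 + \|z^{\parallel}\|_2^2$ to machine precision, illustrating the tightness of the bound for orthogonal layers.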

6. NUMERICAL EXPERIMENTS

We test our proposed CLC-NN framework under various input data perturbations. Here we briefly summarize our experimental settings; we refer readers to Appendix B for details.

• Original Networks without Close-Loop Control. We choose residual neural networks (He et al., 2016) with ReLU activation functions as our target for close-loop control. To show that CLC-NN can improve the robustness in various settings, we consider networks from both standard and adversarial training. We consider multiple adversarial training methods: the fast gradient sign method (FGSM) (Goodfellow et al., 2014), projected gradient descent (PGD) (Madry et al., 2017), and label smoothing training (Label Smooth) (Hazan et al., 2017).

• Input Perturbations. To test our CLC-NN framework, we perturb the input data within a radius of $\epsilon$, with $\epsilon = 2, 4$ and $8$ respectively. We consider various perturbations, including non-adversarial perturbations with the manifold-based attack (Jalal et al., 2017) (Manifold), as well as adversarial attacks such as FGSM, PGD and the CW method (Carlini & Wagner, 2017).

• CLC-NN Implementations. We consider both linear and nonlinear embeddings in our close-loop control. Specifically, we employ a principal component analysis with a 1% truncation error for the linear embedding, and convolutional auto-encoders for the nonlinear embedding. We use Adam (Kingma & Ba, 2014) to maximize the Hamiltonian function (9) and keep the same hyperparameters (learning rate, maximum iterations) for each model against all perturbations.

Result Summary. Table 1 and Table 2 show the results for both CIFAR-10 and CIFAR-100 on neural networks from standard training and adversarial training respectively.

• CLC-NN significantly improves the robustness of neural networks from standard training. Table 1 shows that the baseline network trained on a clean data set becomes completely vulnerable (with almost 0% accuracy) under PGD and CW attacks. Our CLC-NN improves its accuracy to nearly 40% and 80% under PGD and CW attacks respectively. The accuracy under FGSM attacks is almost doubled by our CLC-NN method. The accuracy on clean data decreases slightly because the lower-dimensional embedding functions cannot exactly capture $\mathcal{Z}^{\parallel}$ or $\mathcal{M}$.

• CLC-NN further improves the robustness of adversarially trained networks. Table 2 shows that while an adversarially trained network is inherently robust against certain types of perturbations, CLC-NN strengthens its robustness significantly against various perturbations. For instance, CLC-NN improves the accuracy of an FGSM-trained network under PGD and CW attacks by a maximum of 59% and 81%, respectively.

• The robustness improvement on adversarially trained networks is less significant. This is expected because the trajectory of perturbed data lies on the embedding subspace $\mathcal{Z}^{\parallel}$ if that data sample has been used in adversarial training. Nevertheless, our experiments show that applying CLC-NN to adversarially trained networks achieves the best performance under most attacks.

Comparison with PixelDefend (Song et al., 2017). Our method achieves similar performance on CIFAR-10 with a slightly different experimental setting. Specifically, PixelDefend improved the robustness of a normally trained 62-layer ResNet from 0% to 78% against the CW attack. Our proposed CLC-NN improves the robustness of a 20-layer ResNet from 0% to 81% against CW attacks. Furthermore, we show that CLC-NN is robust against the manifold-based attack. No result was reported for CIFAR-100 in Song et al. (2017).

Comparison with Reactive Defense. Reactive defenses can be understood as applying a control only at the initial condition of a dynamical system. Specifically, a reactive defense equipped with a linear embedding admits the following dynamics:

$$x_{t+1} = f(x_t, \theta_t), \quad \text{s.t. } x_0 = V_0^r (V_0^r)^T x_{\epsilon,0}.$$
By contrast, CLC-NN controls all hidden states and results in an error that decreases as the number of layers $T$ increases (cf. Theorem 1). To quantitatively compare CLC-NN with reactive defense, we implement both with the same linear embedding functions and against all perturbations. In Table 3, CLC-NN outperforms reactive defense in almost all cases, except that their performance on clean data is case-dependent.

7. CONCLUSION

We have proposed a close-loop control formulation to improve the robustness of neural networks. We have studied the embedding of the state trajectory during forward propagation to define the optimal control objective function. The numerical experiments have shown that our method can improve the robustness of a trained neural network against various perturbations. We have provided an error estimation for the proposed method in the linear case. Our current implementation uses the Pontryagin's Maximum Principle and an online iterative algorithm to overcome the intractability of solving a dynamic programming problem. This online process adds extra inference time. In the future, we plan to extend the theoretical analysis to the nonlinear embedding case.

APPENDIX A: PROOF OF THEOREM 1

We apply the one-step optimal control $u_t^*$ defined by Eq. (14) to the state trajectory, which gives a guaranteed upper bound for the error of the dynamic programming solution. We define the feedback gain matrix

$$K_t = (c \cdot I + Q_t^T Q_t)^{-1} Q_t^T Q_t, \qquad Q_t := I - V_t^r (V_t^r)^T.$$

Thus, the one-step optimal feedback control can be represented as $u_t^* = -K_t x_t$. The difference between the controlled system with a perturbed initial condition and the uncontrolled system without perturbation is

$$x_{\epsilon,t+1} - x_{t+1} = \theta_t (x_{\epsilon,t} + u_t - x_t) = \theta_t (x_{\epsilon,t} - K_t x_{\epsilon,t} - x_t). \quad (15)$$

The control objective is to minimize the state components that span the orthogonal complement of the data manifold, $(I - V_t^r (V_t^r)^T)$. When the input to the feedback control stays entirely in the state manifold, so that $\|(I - V_t^r (V_t^r)^T) x_t\|_2^2 = 0$, the feedback control satisfies $K_t x_t = 0$. Adding the zero term $\theta_t K_t x_t$, the state difference of Eq. (15) can be written as

$$x_{\epsilon,t+1} - x_{t+1} = \theta_t (I - K_t) x_{\epsilon,t} - \theta_t x_t + \theta_t K_t x_t = \theta_t (I - K_t)(x_{\epsilon,t} - x_t). \quad (16)$$

In the following, we derive a transformation of the control dynamics term $(I - K_t)$ based on its definition.

Lemma 1.
For $t \geq 0$, we have

$$I - K_t = \alpha \cdot I + (1 - \alpha) \cdot P_t,$$

where $P_t := V_t^r (V_t^r)^T$ is the orthogonal projection onto $\mathcal{Z}_t^{\parallel}$, and $\alpha := \frac{c}{1+c} \in [0, 1]$.

Proof. Recall that $K_t = (c \cdot I + Q_t^T Q_t)^{-1} Q_t^T Q_t$ with $Q_t = I - V_t^r (V_t^r)^T$. $Q_t$ can be diagonalized as

$$Q_t = V_t \,\mathrm{diag}\big(\underbrace{0, \dots, 0}_{r}, \underbrace{1, \dots, 1}_{d-r}\big)\, V_t^T,$$

where the first $r$ diagonal elements are $0$ and the last $(d-r)$ are $1$. The feedback gain matrix can therefore be diagonalized as

$$K_t = V_t \,\mathrm{diag}\big(\underbrace{0, \dots, 0}_{r}, \underbrace{\tfrac{1}{1+c}, \dots, \tfrac{1}{1+c}}_{d-r}\big)\, V_t^T,$$

where the last $(d-r)$ diagonal elements are $\frac{1}{1+c}$. The term $(I - K_t)$ can thus be represented as

$$I - K_t = V_t \,\mathrm{diag}\big(\underbrace{1, \dots, 1}_{r}, \underbrace{\tfrac{c}{1+c}, \dots, \tfrac{c}{1+c}}_{d-r}\big)\, V_t^T.$$

Denoting the first $r$ columns of $V_t$ by $V_t^r$ and the last $(d-r)$ columns by $\tilde{V}_t^r$, it follows that

$$I - K_t = V_t^r (V_t^r)^T + \frac{c}{1+c} \tilde{V}_t^r (\tilde{V}_t^r)^T = P_t + \alpha (I - P_t) = \alpha \cdot I + (1 - \alpha) \cdot P_t. \qquad \square$$

Oblique Projections. Let $P$ be a linear operator on $\mathbb{R}^d$.

• $P$ is a projection if $P^2 = P$.
• $P$ is an orthogonal projection if $P = P^T = P^2$.
• If $P^2 = P$ but $P \neq P^T$, $P$ is called an oblique projection.

Proposition 2. For a projection $P$:
1. If $P$ is an orthogonal projection, then $\|P\|_2 = 1$.
2. If $P$ is an oblique projection, then $\|P\|_2 > 1$.
3. If $P, Q$ are two projections such that $\mathrm{range}(P) = \mathrm{range}(Q)$, then $PQ = Q$ and $QP = P$.
4. If $P$ is a projection, then $\mathrm{rank}(P) = \mathrm{Tr}(P)$. Furthermore, if $P$ is an orthogonal projection, then $\mathrm{rank}(P) = \|P\|_F^2 = \mathrm{Tr}(P P^T)$.

Define, for $t \geq 0$,

$$P_t^{(0)} := P_t, \qquad P_t^{(s+1)} := \theta_{t-s-1}^{-1} P_t^{(s)} \theta_{t-s-1}, \quad s = 0, 1, \dots, t-1.$$

Lemma 3. Let $P_t^{(s)}$ be defined as above for $0 \leq s \leq t$. Then:
1. $P_t^{(s)}$ is a projection.
2. $P_t^{(s)}$ is a projection onto $\mathcal{Z}_{t-s}^{\parallel}$, i.e. $\mathrm{range}(P_t^{(s)}) = \mathcal{Z}_{t-s}^{\parallel}$.
3. $\|P_t^{(s)}\|_F^2 \leq \kappa(\theta_{t-1} \theta_{t-2} \cdots \theta_{t-s})^2 \cdot r$, where $\kappa(A) := \|A\|_2 \cdot \|A^{-1}\|_2$ is the condition number of $A$, and $r = \mathrm{rank}(\mathcal{Z}_0^{\parallel}) = \mathrm{rank}(\mathcal{Z}_1^{\parallel}) = \dots = \mathrm{rank}(\mathcal{Z}_t^{\parallel})$.
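Lemma 1 above is easy to spot-check numerically; the dimensions and the value of $c$ below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, c = 5, 2, 0.3
V = np.linalg.qr(rng.standard_normal((d, r)))[0]  # orthonormal basis of Z_t^par
P = V @ V.T                                       # orthogonal projection P_t
Q = np.eye(d) - P                                 # Q_t = I - V_r V_r^T

# feedback gain K_t = (c I + Q^T Q)^{-1} Q^T Q
K = np.linalg.solve(c * np.eye(d) + Q.T @ Q, Q.T @ Q)

alpha = c / (1 + c)
lhs = np.eye(d) - K                               # I - K_t
rhs = alpha * np.eye(d) + (1 - alpha) * P         # alpha I + (1 - alpha) P_t
```

The two sides agree to machine precision, and $K_t$ annihilates in-manifold vectors ($K_t x_t = 0$ for $x_t \in \mathcal{Z}_t^{\parallel}$), as used in the derivation of Eq. (16).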

Proof.

1. We prove it by induction on $s$ for each $t$. For $s = 0$, $P_t^{(0)} = P_t$, which is a projection by definition. Suppose the claim holds for $s$, i.e. $P_t^{(s)} = P_t^{(s)} P_t^{(s)}$. Then for $s+1$,

$$\left(P_t^{(s+1)}\right)^2 = \left(\theta_{t-s-1}^{-1} P_t^{(s)} \theta_{t-s-1}\right)^2 = \theta_{t-s-1}^{-1} \left(P_t^{(s)}\right)^2 \theta_{t-s-1} = \theta_{t-s-1}^{-1} P_t^{(s)} \theta_{t-s-1} = P_t^{(s+1)}.$$

2. We prove it by induction on $s$ for each $t$. For $s = 0$, $P_t^{(0)} = P_t$, which is the orthogonal projection onto $\mathcal{Z}_t^{\parallel}$. Suppose $P_t^{(s)}$ is a projection onto $\mathcal{Z}_{t-s}^{\parallel}$. Then for $s+1$, $P_t^{(s+1)} = \theta_{t-s-1}^{-1} P_t^{(s)} \theta_{t-s-1}$, which implies

$$\mathrm{range}(P_t^{(s+1)}) = \mathrm{range}(\theta_{t-s-1}^{-1} P_t^{(s)}) = \{\theta_{t-s-1}^{-1} x : x \in \mathcal{Z}_{t-s}^{\parallel}\} = \mathcal{Z}_{t-s-1}^{\parallel}.$$

3. We use the inequalities $\|AB\|_F \leq \|A\|_2 \|B\|_F$ and $\|AB\|_F \leq \|A\|_F \|B\|_2$. By the definition of $P_t^{(s)}$,

$$P_t^{(s)} = (\theta_{t-1} \theta_{t-2} \cdots \theta_{t-s})^{-1} P_t^{(0)} (\theta_{t-1} \theta_{t-2} \cdots \theta_{t-s}),$$

we have

$$\|P_t^{(s)}\|_F^2 \leq \|(\theta_{t-1} \theta_{t-2} \cdots \theta_{t-s})^{-1}\|_2^2 \cdot \|\theta_{t-1} \theta_{t-2} \cdots \theta_{t-s}\|_2^2 \cdot \|P_t^{(0)}\|_F^2 \leq \kappa(\theta_{t-1} \theta_{t-2} \cdots \theta_{t-s})^2 \cdot r,$$

where the last step uses Proposition 2(4). $\square$

Lemma 4. Writing $G_s^{(s)} := \alpha \cdot I + (1 - \alpha) \cdot P_s^{(s)}$, the error recursion (16) together with Lemma 1 gives

$$x_{\epsilon,t} - x_t = (\theta_{t-1} \theta_{t-2} \cdots \theta_0)\,(G_{t-1}^{(t-1)} \cdots G_0^{(0)})\, z.$$

Lemma 5. Let $F_t := G_{t-1}^{(t-1)} \cdots G_0^{(0)}$ (with $F_0 := I$). Then

$$F_t = \alpha^t \cdot I + (1 - \alpha) \sum_{s=0}^{t-1} \alpha^s \cdot P_s^{(s)}.$$

Proof. By induction on $t$. For $t+1$:

$$F_{t+1} = G_t^{(t)} F_t = \left(\alpha I + (1-\alpha) P_t^{(t)}\right)\left(\alpha^t I + (1-\alpha) \sum_{s=0}^{t-1} \alpha^s P_s^{(s)}\right)$$
$$= \alpha^{t+1} I + \alpha^t (1-\alpha) P_t^{(t)} + (1-\alpha)^2 \sum_{s=0}^{t-1} \alpha^s P_t^{(t)} P_s^{(s)} + \alpha (1-\alpha) \sum_{s=0}^{t-1} \alpha^s P_s^{(s)}.$$

Recall Lemma 3: $\mathrm{range}(P_t^{(t)}) = \mathrm{range}(P_s^{(s)}) = \mathcal{Z}_0^{\parallel}$. According to Proposition 2(3), $P_t^{(t)} P_s^{(s)} = P_s^{(s)}$. Hence

$$F_{t+1} = \alpha^{t+1} I + \alpha^t (1-\alpha) P_t^{(t)} + (1-\alpha) \sum_{s=0}^{t-1} \alpha^s P_s^{(s)} = \alpha^{t+1} I + (1-\alpha) \sum_{s=0}^{t} \alpha^s P_s^{(s)}. \qquad \square$$

Lemma 6. Let $V \in \mathbb{R}^{d \times r}$ be a matrix whose columns are an orthonormal basis for a subspace $\mathcal{D}$, and let $\theta \in \mathbb{R}^{d \times d}$ be invertible. Let $P = V V^T$ be the orthogonal projection onto $\mathcal{D}$, and denote by $\tilde{P}$ the orthogonal projection onto $\theta \mathcal{D} := \{\theta x : x \in \mathcal{D}\}$. Then:
1. $\theta^{-1} \tilde{P} \theta$ is an oblique projection onto $\mathcal{D}$.
2. $\|\theta^{-1} \tilde{P} \theta - P\|_2 \leq \left(1 + \kappa(\theta)^2\right) \cdot \|I - \theta^T \theta\|_2$.
In particular, the last inequality shows that $\theta^{-1} \tilde{P} \theta = P$ if $\theta$ is orthogonal.

Proof. 1. $(\theta^{-1} \tilde{P} \theta)^2 = \theta^{-1} \tilde{P}^2 \theta = \theta^{-1} \tilde{P} \theta$; therefore, $\theta^{-1} \tilde{P} \theta$ is a projection.

2. Since $\tilde{P}$ is the orthogonal projection onto the column space of $\theta V$,

$$\tilde{P} = \theta V \left((\theta V)^T (\theta V)\right)^{-1} (\theta V)^T = \theta V \left(V^T \theta^T \theta V\right)^{-1} V^T \theta^T, \qquad \theta^{-1} \tilde{P} \theta = V \left(V^T \theta^T \theta V\right)^{-1} V^T \theta^T \theta.$$

Furthermore,

$$\|\theta^{-1} \tilde{P} \theta - P\|_2 = \|V (V^T \theta^T \theta V)^{-1} V^T \theta^T \theta - V V^T\|_2$$
$$\leq \|V (V^T \theta^T \theta V)^{-1} V^T \theta^T \theta - V V^T \theta^T \theta\|_2 + \|V V^T \theta^T \theta - V V^T\|_2$$
$$\leq \|V \left[(V^T \theta^T \theta V)^{-1} - I\right] V^T\|_2 \cdot \|\theta^T \theta\|_2 + \|\theta^T \theta - I\|_2$$
$$\leq \|(V^T \theta^T \theta V)^{-1}\|_2 \cdot \|I - V^T \theta^T \theta V\|_2 \cdot \|\theta^T \theta\|_2 + \|\theta^T \theta - I\|_2$$
$$\leq \|(V^T \theta^T \theta V)^{-1}\|_2 \cdot \|I - \theta^T \theta\|_2 \cdot \|\theta^T \theta\|_2 + \|\theta^T \theta - I\|_2.$$

We further bound $\|(V^T \theta^T \theta V)^{-1}\|_2$:

$$\|(V^T \theta^T \theta V)^{-1}\|_2 = \lambda_{\min}(V^T \theta^T \theta V)^{-1} = \left(\inf_{\|x\|_2 = 1} x^T V^T \theta^T \theta V x\right)^{-1} \leq \left(\inf_{\|x'\|_2 = 1} (x')^T \theta^T \theta x'\right)^{-1} = \lambda_{\min}(\theta^T \theta)^{-1} = \|(\theta^T \theta)^{-1}\|_2.$$

Hence,

$$\|\theta^{-1} \tilde{P} \theta - P\|_2 \leq \left(1 + \|\theta^T \theta\|_2 \cdot \|(\theta^T \theta)^{-1}\|_2\right) \|I - \theta^T \theta\|_2 = \left(1 + \kappa(\theta)^2\right) \|I - \theta^T \theta\|_2. \qquad \square$$

Corollary 1.
Let $t \ge 1$. Then for each $s = 0, 1, \dots, t$, we have
$$\|P_s^s - P^0\|_2 \le \big(1 + \kappa(\theta^s)^2\big)\|I - (\theta^s)^T\theta^s\|_2,$$
where
• $\theta^s := \theta_{s-1}\cdots\theta_0$ for $s \ge 1$,
• $\theta^s := I$ for $s = 0$.
Observe that $P_s^s = (\theta^s)^{-1}\tilde{P}_s\theta^s$; using Lemma 6, the claim follows, and we arrive at the main theorem.

Theorem 1. For $t \ge 1$, we have the error estimation
$$\|x_{\epsilon,t} - x_t\|_2^2 \le \|\theta_{t-1}\cdots\theta_0\|_2^2\Big[\alpha^{2t}\|z^\perp\|_2^2 + \|z^\parallel\|_2^2 + \gamma_t\|z\|_2^2\big(\gamma_t\,\alpha^2(1-\alpha^{t-1})^2 + 2(\alpha - \alpha^t)\big)\Big],$$
where $\gamma_t := \max_{s\le t}\big(1+\kappa(\theta^s)^2\big)\|I - (\theta^s)^T\theta^s\|_2$ and $\alpha = \frac{c}{1+c}$, with $c$ the control regularization. In particular, the equality
$$\|x_{\epsilon,t} - x_t\|_2^2 = \alpha^{2t}\|z^\perp\|_2^2 + \|z^\parallel\|_2^2$$
holds when all $\theta_t$ are orthogonal.

Proof. The input perturbation $z = x_{\epsilon,0} - x_0$ can be written as $z = z^\parallel + z^\perp$, where $z^\parallel \in Z$ and $z^\perp \in Z^\perp$ are vectors such that
• $z^\parallel \cdot z^\perp = 0$ almost surely;
• $z^\parallel$ and $z^\perp$ have uncorrelated components;
• $z^\parallel \in D$ and $z^\perp \in D^\perp$.
Since $z^\parallel$ and $z^\perp$ are orthogonal almost surely, recalling Lemma 4,
$$\|x_{\epsilon,t} - x_t\|_2^2 = \|(\theta_{t-1}\theta_{t-2}\cdots\theta_0)(G_{t-1}^{t-1}\cdots G_0^0)z\|_2^2 \le \|\theta_{t-1}\theta_{t-2}\cdots\theta_0\|_2^2\cdot\|(G_{t-1}^{t-1}\cdots G_0^0)z\|_2^2.$$
For the term $\|(G_{t-1}^{t-1}\cdots G_0^0)z\|_2^2$, recalling Lemma 5,
$$\|(G_{t-1}^{t-1}\cdots G_0^0)z\|_2^2 = \Big\|\Big(\alpha^t I + (1-\alpha)\sum_{s=0}^{t-1}\alpha^s\, P_s^s\Big)z\Big\|_2^2 = \Big\|\alpha^t z + (1-\alpha)\sum_{s=0}^{t-1}\alpha^s\, P^0 z + (1-\alpha)\sum_{s=0}^{t-1}\alpha^s(P_s^s - P^0)z\Big\|_2^2$$
$$= \Big\|\alpha^t z + (1-\alpha^t)z^\parallel + (1-\alpha)\sum_{s=0}^{t-1}\alpha^s(P_s^s - P^0)z\Big\|_2^2.$$
In the above, $P^0$ is the orthogonal projection at $t = 0$ (the input data space); therefore $P^0 z = z^\parallel$. Furthermore, when $s = 0$, $P_s^s - P^0 = 0$.
Thus,
$$\|(G_{t-1}^{t-1}\cdots G_0^0)z\|_2^2 = \alpha^{2t}\|z\|_2^2 + (1-\alpha^t)^2\|z^\parallel\|_2^2 + (1-\alpha)^2\sum_{s,q=1}^{t-1}\alpha^s\alpha^q\, z^T(P_s^s-P^0)^T(P_q^q-P^0)z$$
$$\qquad + 2\alpha^t(1-\alpha^t)\|z^\parallel\|_2^2 + 2\alpha^t(1-\alpha)\sum_{s=1}^{t-1}\alpha^s\, z^T(P_s^s-P^0)z + 2(1-\alpha^t)(1-\alpha)\sum_{s=1}^{t-1}\alpha^s\,(z^\parallel)^T(P_s^s-P^0)z$$
$$= \alpha^{2t}\|z^\perp\|_2^2 + \big(\alpha^{2t} + 2\alpha^t(1-\alpha^t) + (1-\alpha^t)^2\big)\|z^\parallel\|_2^2 + (1-\alpha)^2\sum_{s,q=1}^{t-1}\alpha^s\alpha^q\, z^T(P_s^s-P^0)^T(P_q^q-P^0)z$$
$$\qquad + 2\alpha^t(1-\alpha)\sum_{s=1}^{t-1}\alpha^s\, z^T(P_s^s-P^0)z + 2(1-\alpha^t)(1-\alpha)\sum_{s=1}^{t-1}\alpha^s\,(z^\parallel)^T(P_s^s-P^0)z$$
$$= \alpha^{2t}\|z^\perp\|_2^2 + \|z^\parallel\|_2^2 + (1-\alpha)^2\sum_{s,q=1}^{t-1}\alpha^s\alpha^q\, z^T(P_s^s-P^0)^T(P_q^q-P^0)z$$
$$\qquad + 2\alpha^t(1-\alpha)\sum_{s=1}^{t-1}\alpha^s\, z^T(P_s^s-P^0)z + 2(1-\alpha^t)(1-\alpha)\sum_{s=1}^{t-1}\alpha^s\,(z^\parallel)^T(P_s^s-P^0)z.$$
Using Corollary 1, we have
• $z^T(P_s^s - P^0)z \le \|z\|_2^2\cdot\|P_s^s - P^0\|_2 \le \gamma_t\|z\|_2^2$;
• $z^T(P_s^s - P^0)^T(P_q^q - P^0)z \le \|z\|_2^2\cdot\|P_s^s - P^0\|_2\cdot\|P_q^q - P^0\|_2 \le \gamma_t^2\|z\|_2^2$;
• $(z^\parallel)^T(P_s^s - P^0)z \le \gamma_t\|z^\parallel\|_2\cdot\|z\|_2 \le \gamma_t\|z\|_2^2$.
Thus, we have
$$\|(G_{t-1}^{t-1}\cdots G_0^0)z\|_2^2 \le \alpha^{2t}\|z^\perp\|_2^2 + \|z^\parallel\|_2^2 + \alpha^2(1-\alpha^{t-1})^2\gamma_t^2\|z\|_2^2 + 2\alpha^{t+1}(1-\alpha^{t-1})\gamma_t\|z\|_2^2 + 2\alpha(1-\alpha^t)(1-\alpha^{t-1})\gamma_t\|z\|_2^2$$
$$= \alpha^{2t}\|z^\perp\|_2^2 + \|z^\parallel\|_2^2 + \gamma_t\|z\|_2^2\big(\gamma_t\,\alpha^2(1-\alpha^{t-1})^2 + 2(\alpha - \alpha^t)\big).$$
Recalling the error estimation in Eq. (18),
$$\|x_{\epsilon,t} - x_t\|_2^2 \le \|\theta_{t-1}\theta_{t-2}\cdots\theta_0\|_2^2\cdot\|(G_{t-1}^{t-1}\cdots G_0^0)z\|_2^2 \le \|\theta_{t-1}\cdots\theta_0\|_2^2\Big[\alpha^{2t}\|z^\perp\|_2^2 + \|z^\parallel\|_2^2 + \gamma_t\|z\|_2^2\big(\gamma_t\,\alpha^2(1-\alpha^{t-1})^2 + 2(\alpha - \alpha^t)\big)\Big].$$
In the specific case when all $\theta_t$ are orthogonal, $\|\theta_{t-1}\cdots\theta_0\|_2 = 1$ and
$$\gamma_t := \max_{s\le t}\big(1+\kappa(\theta^s)^2\big)\|I - (\theta^s)^T\theta^s\|_2 = 0.$$
Thus,
$$\|x_{\epsilon,t} - x_t\|_2^2 = \alpha^{2t}\|z^\perp\|_2^2 + \|z^\parallel\|_2^2.$$
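The orthogonal-layer equality of Theorem 1 can be checked numerically: with orthogonal $\theta_t$, every pulled-back projection coincides with $P^0$, so the controlled error map reduces to $F_t = \alpha^t I + (1-\alpha^t)P^0$. A small numpy sketch, in which the random orthogonal layers and subspace are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, t, alpha = 8, 3, 5, 0.6

V, _ = np.linalg.qr(rng.standard_normal((d, r)))
P = V @ V.T                       # orthogonal projection onto the data subspace Z0
z = rng.standard_normal(d)
z_par, z_perp = P @ z, z - P @ z  # z = z_par + z_perp

# Propagate the input perturbation through t controlled layers:
# err <- theta (alpha*err + (1 - alpha) * P_tilde err), with P_tilde the
# projection onto the transported subspace theta^s Z0.
err = z.copy()
for _ in range(t):
    theta, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal layer
    err = theta @ (alpha * err + (1 - alpha) * (P @ err))
    P = theta @ P @ theta.T       # projection onto the transported subspace

lhs = np.linalg.norm(err) ** 2
rhs = alpha ** (2 * t) * np.linalg.norm(z_perp) ** 2 + np.linalg.norm(z_par) ** 2
assert np.isclose(lhs, rhs)       # equality case of Theorem 1
```

The off-manifold component decays geometrically at rate $\alpha$ while the on-manifold component is preserved, which is exactly what the theorem predicts.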

B APPENDIX B DETAILS OF EXPERIMENTAL SETTING

B.1 NETWORK CONFIGURATIONS

Since the proposed CLC-NN optimizes the entire state trajectory, it is important to have a relatively smooth state trajectory: when the reconstruction loss $\|E_t(x_t) - x_t\|_2^2$ at layer $t$ is small, the reconstruction losses at its adjacent layers should also be small. For this reason, we use a residual neural network (He et al., 2016) as the network candidate to retain smoother dynamics. The configuration of the residual neural network used for both CIFAR-10 and CIFAR-100 is shown in Tab. 4. Based on this configuration, we construct 4 embedding functions applied at the input space and at the outputs of the initial layer, residual block 1, and residual block 2. The output of residual block 3 is embedded with a linear orthogonal projection. We randomly select 5000 clean training data to collect state trajectories at all 5 locations.

• For the linear orthogonal projections: we apply Principal Component Analysis to each of the state collections and retain the first $r$ columns of the resulting basis, such that
$$r = \arg\min\Big\{i : \frac{\lambda_1 + \cdots + \lambda_i}{\lambda_1 + \cdots + \lambda_d} \ge 1 - \delta\Big\}, \quad \text{where } \delta = 0.1.$$

• For the nonlinear embedding: we train 4 convolutional auto-encoders for the input space and the outputs of the initial layer and residual blocks 1 and 2. All embedding functions are trained individually. We adopt a shallow convolutional auto-encoder structure to gain fast inference speed, in which case CLC-NN equipped with the linear embedding often outperforms the nonlinear embedding, as shown in Tab. 1. The configuration of all 4 convolutional auto-encoders is shown in Tab. 5.

Table 5:
Encoder: Conv2d(input channel = c1, output channel = c2, kernel size = 4×4, stride = 2×2, padding = 1×1), ELU(alpha=1), BatchNorm2d(channel = c2), Conv2d(input channel = c2, output channel = c3, kernel size = 4×4, stride = 2×2, padding = 1×1), ELU(alpha=1).
Decoder: ConvTranspose2d(input channel = c3, output channel = c2, kernel size = 4×4, stride = 2×2, padding = 1×1), ELU(alpha=1), ConvTranspose2d(input channel = c2, output channel = c1, kernel size = 4×4, stride = 2×2, padding = 1×1).

Auto-encoder index: 0, 1, 2, 3
Channel dimensions [c1, c2, c3]: [3, 18, 36], [16, 36, 72], [16, 36, 72], [32, 128, 256]
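The PCA rank rule above keeps the smallest number of principal components explaining at least a $1-\delta$ fraction of the variance (90% for $\delta = 0.1$). A minimal numpy sketch of this selection; the random state matrix and the helper name `pca_basis` are illustrative, not from the paper's code:

```python
import numpy as np

def pca_basis(states: np.ndarray, delta: float = 0.1) -> np.ndarray:
    """Smallest PCA basis V_r whose components explain >= 1 - delta of the variance.

    states: (n_samples, d) matrix of flattened hidden states at one layer.
    """
    centered = states - states.mean(axis=0)
    # Rows of Vt are principal directions; s**2 are proportional to eigenvalues.
    _, s, Vt = np.linalg.svd(centered, full_matrices=False)
    lam = s ** 2
    ratio = np.cumsum(lam) / lam.sum()                  # cumulative variance ratio
    r = int(np.searchsorted(ratio, 1.0 - delta) + 1)    # arg min of the rule
    return Vt[:r].T                                     # d x r orthonormal basis

# Usage: project a state onto the embedding and measure reconstruction loss.
states = np.random.default_rng(2).standard_normal((5000, 64))
V = pca_basis(states, delta=0.1)
x = states[0]
loss = float(np.linalg.norm(x - V @ (V.T @ x)) ** 2)    # ||E(x) - x||^2
assert V.shape[0] == 64 and V.shape[1] <= 64
```

The returned `V` plays the role of $V_t^r$: the linear embedding is $E_t(x) = V_t^r (V_t^r)^T x$.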

B.2 PERTURBATIONS AND DEFENSIVE TRAINING

In this section, we give details about the perturbations and robust networks considered in this work. The adversarial training objective function is
$$\min_{\theta\in\Theta}\;\max_{x_{\epsilon,0}=\Delta(x_0,\epsilon)}\;\mathbb{E}_{(x_0,y)\sim\mathcal{D}}\big[(1-\lambda)\cdot\Phi(x_{\epsilon,T}, y, \theta) + \lambda\cdot\Phi(x_T, y, \theta)\big],$$
where $\Delta(x_0,\epsilon)$ generates a perturbed data point from the given input $x_0$ within the range $\epsilon$, and $\lambda$ balances standard accuracy against robustness. We choose $\lambda = 0.5$ in all adversarial training. For robust networks, we consider both perturbation-agnostic and perturbation-non-agnostic methods. Perturbation-agnostic adversarial training algorithms are equipped with a specific $\Delta(x_0,\epsilon)$, and the resulting network is most robust against that $\Delta(x_0,\epsilon)$ perturbation. On the contrary, perturbation-non-agnostic robust training methods are often robust against many types of perturbations.

• Adversarial training with the fast gradient sign method (FGSM) (Goodfellow et al., 2014) considers perturbed data of the form
$$x_{\epsilon,0} = x_0 + \epsilon\cdot\mathrm{sign}\big(\nabla_{x_0}\Phi(x_T, y)\big), \quad (x_0, y)\sim\mathcal{D},$$
where $\mathrm{sign}(\cdot)$ outputs the sign of the input. FGSM thus considers the worst case within the range $\epsilon$ along the direction in which the gradient $\nabla_{x_0}\Phi(x_T, y)$ increases. Due to this worst-case consideration, it does not scale well to deep networks; for this reason, we adversarially train the network with FGSM at $\epsilon = 4$, which is half of the maximum perturbation considered in this paper.

• Label smoothing training (Label Smooth) (Hazan et al., 2017) does not utilize any perturbation information $\Delta(x_0,\epsilon)$. It converts one-hot labels into soft targets by setting the correct class to $1-\epsilon$ while the other classes share a value of $\epsilon/(N-1)$, where $\epsilon$ is a small constant and $N$ is the number of classes. Specifically, the correct-class value is set to 0.9 in this paper.

We now provide a detailed explanation of how the proposed CLC-NN successfully defends against such a strong adversarial attack.
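The FGSM perturbation and the label-smoothing targets above can be sketched in a few lines; the quadratic toy loss below is an illustrative stand-in for the network loss $\Phi$, not the paper's model:

```python
import numpy as np

def fgsm(x0, grad_x, eps):
    """Worst-case step of size eps along the sign of the loss gradient."""
    return x0 + eps * np.sign(grad_x)

def smooth_labels(y_onehot, correct=0.9):
    """Soften one-hot labels: true class -> `correct`, the rest share the remainder."""
    n = y_onehot.shape[-1]
    rest = (1.0 - correct) / (n - 1)
    return np.where(y_onehot == 1, correct, rest)

# FGSM on a linear score w.x with squared loss (illustrative gradient).
w = np.array([0.5, -2.0, 1.0])
x0 = np.array([1.0, 0.0, -1.0])
grad = 2 * (w @ x0) * w                      # d/dx of (w.x)^2
x_adv = fgsm(x0, grad, eps=8 / 255)
assert np.all(np.abs(x_adv - x0) <= 8 / 255 + 1e-12)   # stays in the eps-ball

y = np.array([0.0, 1.0, 0.0, 0.0])
soft = smooth_labels(y)                      # [1/30, 0.9, 1/30, 1/30]
assert np.isclose(soft.sum(), 1.0)           # still a valid distribution
```

Note that `fgsm` takes a single signed step of size exactly $\epsilon$; the iterated, small-step variant used for robust training is discussed with PGD below.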
Existing manifold-based defenses (Samangouei et al., 2018) focus on detecting and de-noising the input components that do not lie within the underlying manifold. The overpowered attack proposed in Jalal et al. (2017) searches for an adversarial attack within the embedded latent space, which is undetectable by manifold-based defenses and causes a complete failure of the defense. In a real implementation, the manifold-based attack (Jalal et al., 2017) is detectable and controllable under the proposed framework for the following reason. The numerically generated manifold embedding functions are not ideal; the error sources of non-ideal embedding functions are mainly the algorithm used to compute the manifold, the architecture of the embedding function, and the distribution shift between training and testing data (embedding functions fit on training data do not perfectly agree with testing data). Consequently, even if the perturbation is undetectable and uncontrollable at the initial layer, each hidden layer amplifies it as it propagates, so the perturbation becomes detectable and controllable in hidden layers. We randomly select a batch of testing data to generate the manifold-based attack following the same procedure proposed in Jalal et al. (2017); the proposed method improves the attacked accuracy from 1% to 78%. More specifically, we compare the components of the hidden states spanning the orthogonal complement for a perturbed testing input and its unperturbed counterpart, $\|P_t^\perp x_{\epsilon,t} - P_t^\perp x_t\|$, where $P_t^\perp$ is the projection onto the orthogonal complement. The difference generally grows across layers (0, 0.016, 0.0438, 0.0107, 0.0552 for the hidden states at layers 0, 1, 2, 3, 4, respectively). This validates the argument for how the proposed method detects such a perturbation and controls it in the hidden layers. Furthermore, we provide some insights about the reasons behind the success of such an adversarial attack.
This follows from the same mechanism that underlies the existence of adversarial attacks in neural networks: the highly nonlinear behaviour of a neural network gives it a powerful representative ability, and that same representational power is the source of its vulnerability. For example, a constant function has a 50% chance of making a correct prediction in a binary classification problem under any perturbation, but its performance is limited. Therefore, we propose to use a linear embedding function, which trades off embedding accuracy against robustness.

E DEFINITION OF THREAT MODEL

Generally, an attacker should not have access to the hidden states during inference, in which case the attacker is not allowed to inject extra noise during inference. To define the threat model of the proposed method in the white-box setting, the attacker has access to both the network and all embedding functions. The condition under which a perturbation $\epsilon\cdot z$ makes our method vulnerable is defined as follows:
$$\sum_{t=0}^{T-1}\|E_t(x_{\epsilon,t}) - x_{\epsilon,t}\|_2^2 = 0, \quad x_{\epsilon,0} = x_0 + \epsilon\cdot z.$$
In words, the perturbation $\epsilon\cdot z$ applied to the input data must result in zero reconstruction loss across all hidden layers, which means its corresponding state trajectory does not span any of the orthogonal complements of the hidden state spaces. Conventional gradient-based attackers cannot guarantee to find a perturbation satisfying the above equation. A possible way is to perform a grid search backward through the layers to find such an adversarial attack satisfying the threat-model condition, which is extremely costly.
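With linear embeddings $E_t(x) = V_t V_t^T x$, the condition above says the attack must carry no component in any layer's orthogonal complement. A single-layer numpy sketch under this simplifying assumption:

```python
import numpy as np

rng = np.random.default_rng(3)
d, r = 16, 4
V, _ = np.linalg.qr(rng.standard_normal((d, r)))   # linear embedding at one layer

def recon_loss(x):
    """||E(x) - x||^2 with the linear embedding E(x) = V V^T x."""
    return float(np.linalg.norm(V @ (V.T @ x) - x) ** 2)

x0 = V @ rng.standard_normal(r)            # clean data lies on the manifold
z_on = V @ rng.standard_normal(r)          # perturbation inside the subspace
z_off = rng.standard_normal(d)             # generic perturbation

assert recon_loss(x0) < 1e-20              # clean data: no control is generated
assert recon_loss(x0 + 0.1 * z_on) < 1e-20 # evades this layer's detection
assert recon_loss(x0 + 0.1 * z_off) > 1e-6 # generic attack is detectable
```

An attack that evades the full defense must satisfy this at every layer simultaneously, which is what makes the grid search described above so costly.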



Figure 1: The structures of feed-forward neural network (black) and the proposed method (blue).

and the gradient-based network training was established. Ye et al. (2019) used control theory to adjust the hyperparameters in the adversarial training algorithm. Han et al. (2018) established the mathematical basis of the optimal-control viewpoint of deep learning. These existing works on algorithm development are open-loop control methods, since they commonly treat the network weights θ as control parameters and keep them fixed once training is done. The fixed control parameters θ operate optimally for data sampled from the data distribution D. However, various perturbation methods cause data distributions to deviate from the true distribution D (Song et al., 2017), leading to poor performance with the fixed open-loop control parameters.

Fig. 2 (a) and (b) show the states of clean data (red and blue) and of perturbed data (black and gray) at t = 0 and t = 1, respectively. The CLC-NN adjusts the state trajectory to reduce the reconstruction loss, as shown in Fig. 2 (c) and (d), where a lighter background color represents a lower reconstruction loss. Comparing Fig. 2 (a) with (c), and Fig. 2 (b) with (d), we see that the perturbed states in Fig. 2 (a) and (b) deviate from the desired state manifold (light green region) and have high reconstruction losses. Running 1000 iterations of Alg. 1 adjusts the perturbed states and improves the classification accuracy from 86% to 100%.

Figure 2: (a) and (b) show the states of clean data (red and blue) and of perturbed data (black and gray) at the initial and hidden layers, respectively. The yellow and green backgrounds represent the two classes predicted by the network; some of the perturbed data are mis-classified. (c) and (d) show the adjusted states after the proposed close-loop control. The background lightness, from light to dark, represents increasing reconstruction loss $\|E_t(x_t) - x_t\|_2^2$.


formulated the robustness training using Pontryagin's Maximum Principle; such open-loop control methods result in a set of fixed parameters that operates optimally on the considered perturbation. Liu et al. (2020a;b) considered a close-loop formulation from the differential dynamic programming perspective; this algorithm is nevertheless categorized as an open-loop control method because it utilizes the state-feedback information only to boost the training convergence, and it results in a set of fixed controls for any unseen data. On the contrary, the proposed CLC-NN formulation adaptively targets each input with different control parameters and is capable of distinguishing clean data by generating no control.

Experimental results on ResNet-20 from standard training.

Experimental results on robustly trained networks.

Accuracy comparison of CLC-NN and the reactive defense in Eq. (12), with attack ε = 2 / 4 / 8.

ResNet for Both CIFAR-10 and CIFAR-100

Convolutional Auto-Encoders

Experimental results on DenseNet-40 from standard training.

Comparison between CLC-NN and layer-wise projection.

ACKNOWLEDGEMENT

Zhuotong Chen and Zheng Zhang are supported by NSF CAREER Award No. 1846476 and NSF CCF No. 1817037. Qianxiao Li is supported by the start-up grant under the NUS PYP programme.

A APPENDIX A ERROR ESTIMATION FOR THE PROPOSED CLC-NN

Preliminaries. We define the performance index at time $t$ as in Eq. (13), where $Q_t = I - V_t^r (V_t^r)^T$ and $V_t^r$ is the linear projection matrix at time $t$ retaining only its first $r$ principal components, corresponding to the largest $r$ eigenvalues. The optimal feedback control is defined accordingly; due to the linear system and quadratic performance index, the optimal feedback control admits an analytic solution obtained by taking the gradient of the performance index (Eq. (13)) and setting it to zero. The analytic control solution $u_t^*$ optimizes the performance index instantly at time step $t$; the error measured by Eq. (13) for the dynamic programming solution $x_{\epsilon,t}$ must be smaller or equal.

The following lemma uses the concept of oblique projection to derive a recursive relationship that projects the $t$-th state space of Eq. (16) back to the input data space.

Lemma 4. Define, for $0 \le s \le t$, the control map $G_t^s := \alpha I + (1-\alpha)P_t^s$. Then Eq. (16) can be written as
$$x_{\epsilon,t} - x_t = (\theta_{t-1}\cdots\theta_0)(G_{t-1}^{t-1}\cdots G_0^0)\,z.$$

Proof. We prove it by induction on $t$. For $t = 1$, the claim follows from the definition of $G_t^s$ and the transformation from Lemma 1. Suppose it is true for $(x_{\epsilon,t} - x_t)$; using Eq. (16) and Lemma 1, and recalling the definitions of the oblique projections, we obtain the equality for the oblique projections. Applying this to Eq. (17) gives the result.

Proof (of Lemma 5). We prove it by induction on $t$; recall the claim $F_t = \alpha^t I + (1-\alpha)\sum_{s=0}^{t-1}\alpha^s P_s^s$, whose induction step is given above.

• Adversarial training with projected gradient descent (PGD) (Madry et al., 2017) generates adversarial data by iteratively running FGSM with a small step size, which results in stronger perturbations than FGSM within the same range $\epsilon$. We use 7 steps with $\epsilon = 2$ to generate adversarial data for robust training.

For perturbations, we consider the maximum ranges $\epsilon = 2, 4, 8$ to test the network's robustness against both strong and weak perturbations. In this work, we test network robustness with the manifold-based attack (Jalal et al., 2017), FGSM (Goodfellow et al., 2014), 20-step PGD (Madry et al., 2017), and the CW attack (Carlini & Wagner, 2017).
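The iterated-FGSM (PGD-style) attack described above can be sketched directly; the quadratic toy loss is an illustrative assumption in place of the network loss:

```python
import numpy as np

def pgd(x0, grad_fn, eps, step, n_iter):
    """Iterated FGSM: ascend the loss, clipping back into the l_inf eps-ball."""
    x = x0.copy()
    for _ in range(n_iter):
        x = x + step * np.sign(grad_fn(x))
        x = np.clip(x, x0 - eps, x0 + eps)   # project into the eps-ball around x0
    return x

w = np.array([1.0, -1.0, 2.0])
grad_fn = lambda x: 2 * (w @ x) * w          # gradient of the toy loss (w.x)^2
x0 = np.array([0.2, 0.1, -0.3])
x_adv = pgd(x0, grad_fn, eps=8 / 255, step=2 / 255, n_iter=20)

assert np.all(np.abs(x_adv - x0) <= 8 / 255 + 1e-12)  # bounded perturbation
assert (w @ x_adv) ** 2 >= (w @ x0) ** 2              # loss did not decrease
```

The many small signed steps make PGD a stronger attack than single-step FGSM at the same budget $\epsilon$, which is why it is the standard choice for adversarial training.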

B.3 ONLINE OPTIMIZATION

Optimization Methods. We use Adam (Kingma & Ba, 2014) with its default setting to maximize the Hamiltonian in Eq. (9). Solving the PMP brings extra computational cost at inference time: each online iteration requires a forward propagation (Eq. (7)), a backward propagation (Eq. (8)), and a maximization w.r.t. the control parameters (Eq. (9)), which together cost approximately the same as one gradient-descent iteration of network training. For the numerical results presented in the paper, we choose the maximum number of iterations that gives the best performance from [5, 10, 20, 30, 50].
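A single online iteration therefore looks like one training step applied to the controls rather than the weights. The toy sketch below uses linear layers, one shared linear embedding, and finite-difference gradients with plain gradient descent standing in for Adam and the PMP machinery; all names and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
d, T, lr, c = 8, 3, 0.1, 0.01
# Frozen (open-loop trained) layers; orthogonal for a well-behaved toy dynamic.
thetas = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in range(T)]
V = np.linalg.qr(rng.standard_normal((d, 3)))[0]   # toy linear embedding
Q = np.eye(d) - V @ V.T                            # residual of the embedding

def trajectory_loss(x0, controls):
    """Reconstruction plus control cost accumulated along the controlled trajectory."""
    x, loss = x0, 0.0
    for theta, u in zip(thetas, controls):
        x = theta @ (x + u)                        # controlled forward step
        loss += np.linalg.norm(Q @ x) ** 2 + c * np.linalg.norm(u) ** 2
    return loss

def numgrad(x0, controls, t, h=1e-5):
    """Finite-difference gradient w.r.t. control u_t (stand-in for backprop)."""
    g, base = np.zeros(d), trajectory_loss(x0, controls)
    for i in range(d):
        bumped = [u.copy() for u in controls]
        bumped[t][i] += h
        g[i] = (trajectory_loss(x0, bumped) - base) / h
    return g

x0 = V @ rng.standard_normal(3) + 0.3 * rng.standard_normal(d)  # perturbed input
controls = [np.zeros(d) for _ in range(T)]
before = trajectory_loss(x0, controls)
for _ in range(30):                                # online control optimization
    grads = [numgrad(x0, controls, t) for t in range(T)]
    controls = [u - lr * g for u, g in zip(controls, grads)]
after = trajectory_loss(x0, controls)
assert after < before                              # controls reduce the trajectory loss
```

Each pass of the outer loop mirrors one PMP iteration: evaluate the trajectory, obtain gradients with respect to the controls, and update the controls; the network weights `thetas` stay fixed throughout.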

C MORE NUMERICAL EXPERIMENTS

The proposed CLC-NN is designed to be compatible with existing open-loop trained networks. We show extra experiments by employing the proposed CLC-NN on two baseline models, including DenseNet-40 (Table 6).

The layer-wise projection baseline performs an orthogonal projection on each hidden state. We define a local cost function at the $t$-th layer analogous to Eq. (13); the layer-wise scheme then achieves the optimal solution at each local time $t$. However, the layer-wise optimal control solution does not guarantee the optimum across all layers. In Table 7, we compare the proposed CLC-NN with the layer-wise projection; under all perturbations, the proposed CLC-NN outperforms the layer-wise projection.
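The layer-wise projection baseline can be sketched directly: each hidden state is snapped onto its layer's embedded subspace, which is locally optimal but ignores the rest of the trajectory. A minimal numpy sketch (the random subspace is illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
d, r = 8, 3
V = np.linalg.qr(rng.standard_normal((d, r)))[0]   # one layer's linear embedding

def layerwise_project(x, V):
    """Locally optimal control: snap the state onto the embedded subspace."""
    return V @ (V.T @ x)

x = V @ rng.standard_normal(r) + 0.2 * rng.standard_normal(d)  # perturbed state
x_proj = layerwise_project(x, V)

# The projected state has zero local reconstruction loss ...
assert np.linalg.norm(V @ (V.T @ x_proj) - x_proj) < 1e-10
# ... but this greedy step minimizes only the current layer's cost; CLC-NN
# instead optimizes the controls jointly across the whole trajectory.
```

The gap between this greedy baseline and the trajectory-wide optimization is what Table 7 quantifies.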

D ROBUSTNESS AGAINST MANIFOLD-BASED ATTACK

The manifold-based attack (Jalal et al., 2017) (denoted as Manifold) has shown great success at breaking manifold-based defenses (Samangouei et al., 2018). The proposed CLC-NN successfully defends against this adversarial attack, which was specifically designed for manifold-based defenses, and improves the robust accuracy from 1% to 81% for the standard-trained model on CIFAR-10, and from 2% to 52% on CIFAR-100.

