SYNC: SAFETY-AWARE NEURAL CONTROL FOR STABILIZING STOCHASTIC DELAY-DIFFERENTIAL EQUATIONS

Abstract

Stabilization of systems described by stochastic delay-differential equations (SDDEs) under preset conditions is a challenging task in the control community. Here, to achieve this task, we leverage neural networks to learn control policies using the information of the controlled systems in prescribed regions. Specifically, two learned control policies, i.e., the neural deterministic controller (NDC) and the neural stochastic controller (NSC), work effectively in learning procedures that rely on, respectively, the well-known LaSalle-type theorem and a newly established theorem guaranteeing stochastic stability in SDDEs. We theoretically investigate the performance of the proposed controllers in terms of convergence time and energy cost. More practically and significantly, we improve our learned control policies by considering the situation where the controlled trajectories only evolve in some prescribed safety set. The practical validity of such control policies restricted to a safety set is attributed to the theory that we further develop for safety and stability guarantees in SDDEs using the stochastic control barrier function and spatial discretization. We call this control policy SYNC (SafetY-aware Neural Control). The efficacy of all the articulated control policies, including SYNC, is demonstrated systematically on representative control problems.

1. INTRODUCTION

Stochastic delay-differential equations (SDDEs) (Mao, 1996; Lin & He, 2005; Sun & Cao, 2007; Guo et al., 2016) have been widely applied to characterize the complex dynamical behavior emergent in real-world systems that depend on the current state, the past state, and the noise. Efficiently controlling these systems is a long-standing and crucial problem, and consequent emphasis has been placed on the design of control policies and the analysis of stability for SDDEs. Traditional control methods in stochastic settings have been fully developed within convex optimization frameworks using control Lyapunov stability theory, e.g., quadratic programming (QP) (Fan et al., 2020; Sarkar et al., 2020). These methods cannot provide the analytical form of feedback controllers and incur a high computational cost, since they require solving a QP at each iteration step. To overcome these difficulties, utilizing neural networks (NNs) to automatically design controllers has become one of the mainstream approaches in recent years (Zhang et al., 2022; Chang et al., 2019). However, existing machine-learning-based methods either focus on controlling systems without time delay or aim at learning the control Lyapunov function instead of the control policy (Khansari-Zadeh & Billard, 2014). All these considerations, therefore, motivate us to design neural controllers for general nonlinear SDDEs.

The safety verification of controlled systems plays an important role in many branches of cybernetics and industry. For example, with safety verification, one can avoid a significant economic burden and loss of life (Ames et al., 2016; Wang et al., 2016). In particular, the dominant framework for safety control in stochastic settings is the stochastic control barrier function (SCBF) (Clark, 2019; 2021; Santoyo et al., 2021).
The core idea of designing a candidate SCBF is that its value tends to explode as the system's state leaves the safe region, implying a safety guarantee as long as one can design a controller such that the SCBF remains finite within the controlled time duration. Unfortunately, the existing SCBF theories either require many inequality constraints or are limited to handling systems without time delay. In this paper, we utilize neural networks (NNs) to learn control policies for SDDEs based on the corresponding stability theories. Additionally, we develop a simplified SCBF theory for SDDEs and then use it to construct a neural controller with a safety guarantee, named SYNC. All these control policies are intuitively depicted in Figure 1. The major contributions of this paper include:

• designing a novel and practical framework of neural deterministic control based on the existing LaSalle-type stability theory,
• proposing a simplified stability theorem and designing a second novel neural stochastic control framework that can benefit from noise according to this theorem,
• establishing an SCBF theory for SDDEs as well as a theory of safety and stability guarantees in neural network settings,
• providing theoretical estimations for the proposed neural controllers in terms of convergence time and energy cost based on the developed theory of safety and stability guarantees, and
• demonstrating the efficacy of the proposed neural control methods through numerical comparisons with typical existing control methods on several representative physical systems.

2. PRELIMINARIES

To begin with, we consider the SDDE in the general form
$$dx(t) = F(x(t), x(t-\tau), t)\,dt + G(x(t), x(t-\tau), t)\,dB_t, \quad t \ge 0, \ \tau > 0, \ x(t) \in \mathbb{R}^d, \tag{1}$$
where $x(t) = \xi(t) \in C_{\mathcal{F}_0}([-\tau, 0]; \mathbb{R}^d)$ is the initial function, the drift term $F : \mathbb{R}^d \times \mathbb{R}^d \times \mathbb{R}^+ \to \mathbb{R}^d$ and the diffusion term $G : \mathbb{R}^d \times \mathbb{R}^d \times \mathbb{R}^+ \to \mathbb{R}^{d \times r}$ are Borel-measurable functions, and $B_t$ is a standard r-dimensional (r-D) Brownian motion defined on the probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \ge 0}, \mathbb{P})$ with a filtration $\{\mathcal{F}_t\}_{t \ge 0}$ satisfying the usual conditions. Without loss of generality, we assume that $F(0, 0, t) = 0$ and $G(0, 0, t) = 0$; this assumption guarantees that the zero solution $x(t) \equiv 0$, $t \ge 0$, is an equilibrium of Eq. (1). Additionally, the following notations and assumptions are used throughout the paper.

Assumption 2.1 Assume that Eq. (1) has a unique solution $x(t, \xi)$ on $t \ge 0$ for any $\xi \in C_{\mathcal{F}_0}([-\tau, 0]; \mathbb{R}^d)$ and that, for every integer $n \ge 1$, there is a number $K_n > 0$ such that $\|F(x, y, t)\| \vee \|G(x, y, t)\|_F \le K_n$ for any $(x, y, t) \in \mathbb{R}^d \times \mathbb{R}^d \times \mathbb{R}^+$ with $\|x\| \vee \|y\| \le n$.

We define the derivative operator
$$\mathcal{L} \triangleq \frac{\partial}{\partial t} + \sum_{i=1}^{d} F_i(x, y, t)\frac{\partial}{\partial x_i} + \frac{1}{2}\sum_{i,j=1}^{d} \big[G(x, y, t)G^\top(x, y, t)\big]_{ij} \frac{\partial^2}{\partial x_i \partial x_j}.$$
According to this definition, applying $\mathcal{L}$ to a function $V \in C^{2,1}(\mathbb{R}^d \times \mathbb{R}^+; \mathbb{R})$ yields
$$\mathcal{L}V(x, y, t) = V_t(x, t) + \nabla V(x, t)^\top F(x, y, t) + \frac{1}{2}\mathrm{Tr}\big[G^\top(x, y, t)\, HV(x, t)\, G(x, y, t)\big]. \tag{2}$$
Here, $V_t$, $\nabla V$, and $HV$ represent, respectively, the time derivative, the gradient, and the Hessian matrix of $V$. Notably, the following LaSalle-type stability theorem is crucial to the establishment of part of our results.

Theorem 2.2 (Mao, 2002) Suppose that Assumption 2.1 holds and that there exist functions $V \in C^{2,1}(\mathcal{X} \times \mathbb{R}^+; \mathbb{R}^+)$, $\gamma \in L^1(\mathbb{R}^+; \mathbb{R}^+)$, and $w_1, w_2 \in C(\mathcal{X}; \mathbb{R}^+)$ such that $\mathcal{L}V(x, y, t) \le \gamma(t) - w_1(x) + w_2(y)$, $w_1(x) \ge w_2(x)$, and $\lim_{\|x\| \to \infty} \inf_{0 \le t < \infty} V(x, t) = \infty$. Here, $\mathcal{X} \subset \mathbb{R}^d$ is the state space.
Then, $\mathrm{Ker}(w_1 - w_2) \ne \emptyset$ and $\lim_{t \to \infty} \mathrm{dist}(x(t, \xi), \mathrm{Ker}(w_1 - w_2)) = 0$ a.s., where $\mathrm{Ker}(w_1 - w_2) \triangleq \{x : w_1(x) - w_2(x) = 0\}$, $\mathrm{dist}(x, K) \triangleq \inf_{y \in K} \|x - y\|$ for a set $K \subseteq \mathbb{R}^d$, and a.s. stands for almost surely.
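As a concrete instance of the operator in Eq. (2), taking $V(x) = \|x\|^2$ gives $\nabla V = 2x$ and $HV = 2I$, so $\mathcal{L}V(x, y, t) = 2\langle x, F(x, y, t)\rangle + \|G(x, y, t)\|_F^2$. The drift and diffusion in the minimal numeric check below are hypothetical, chosen only to exercise the formula:

```python
def F(x, y):
    """Hypothetical drift: F(x, y) = -x + 0.5 * y, componentwise."""
    return [-xi + 0.5 * yi for xi, yi in zip(x, y)]

def G(x, y):
    """Hypothetical diffusion: G(x, y) = diag(y), returned as a list of rows."""
    return [[y[0], 0.0], [0.0, y[1]]]

def LV_quadratic(x, y):
    """L applied to V(x) = ||x||^2: since grad V = 2x and Hess V = 2I,
    Eq. (2) reduces to LV = 2 * <x, F(x, y)> + ||G(x, y)||_F^2
    (the V_t term vanishes because V is autonomous)."""
    f, g = F(x, y), G(x, y)
    inner = sum(xi * fi for xi, fi in zip(x, f))
    frob2 = sum(gij ** 2 for row in g for gij in row)
    return 2.0 * inner + frob2

val = LV_quadratic([1.0, -2.0], [0.5, 0.5])
# 2 * (1*(-0.75) + (-2)*2.25) + (0.25 + 0.25) = -10.0
```

Such closed-form instances of $\mathcal{L}V$ are what the learned auxiliary functions of Section 3 replace with neural networks.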

Problem Statement

We assume that the zero solution of the following SDDE
$$dx(t) = f(x(t), x(t-\tau), t)\,dt + g(x(t), x(t-\tau), t)\,dB_t \tag{3}$$
is unstable, i.e., $\lim_{t \to \infty} x(t; \xi) \ne 0$ on some set of positive measure. We aim to stabilize the zero solution using control based on neural networks (NNs). In other words, our goal is to leverage NNs to design an appropriate controller $u = (u_f, u_g)$ with $u_f(0, 0, t) = u_g(0, 0, t) = 0$ such that the controlled system
$$dx(t) = [f + u_f(x(t), x(t-\tau), t)]\,dt + [g + u_g(x(t), x(t-\tau), t)]\,dB_t \tag{4}$$
is steered to the zero solution. We call $u_f : \mathbb{R}^d \times \mathbb{R}^d \times \mathbb{R}^+ \to \mathbb{R}^d$ the deterministic control and $u_g : \mathbb{R}^d \times \mathbb{R}^d \times \mathbb{R}^+ \to \mathbb{R}^{d \times r}$ the stochastic control, since they are integrated against $dt$ and $dB_t$, respectively. The major difficulty of this problem comes from the non-Markovian property of SDDEs. As such, we cannot apply Markov decision process (MDP)-based methods, such as reinforcement learning, to control SDDEs. The majority of existing works prefer to learn deterministic control and often regard the noise as a negative ingredient that may destroy the natural dynamics of f. In what follows, we show not only that deterministic control can achieve stabilization in a probabilistic sense, but also that an elaborately designed stochastic control can achieve the same stabilization. This, therefore, yields two frameworks, viz., the neural deterministic control (Section 3) and the neural stochastic control (Section 4). We make all our code and data available at https://github.com/jingddong-zhang/SYNC.
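For concreteness, the controlled system (4) can be simulated with an Euler-Maruyama scheme that keeps a history buffer for the delayed state. The sketch below is for a scalar state; the dynamics and the linear damping controller are hypothetical stand-ins for learned networks:

```python
import math, random

def simulate_sdde(f, g, u_f, u_g, xi, tau, dt, T, seed=0):
    """Euler-Maruyama sketch for the controlled SDDE (4), scalar state.
    xi: initial history function on [-tau, 0]; tau must be a multiple of dt."""
    rng = random.Random(seed)
    lag = int(round(tau / dt))
    # history buffer: x(-tau), ..., x(0)
    path = [xi(-tau + k * dt) for k in range(lag + 1)]
    for n in range(int(round(T / dt))):
        t = n * dt
        x, x_tau = path[-1], path[-1 - lag]          # current and delayed states
        drift = f(x, x_tau, t) + u_f(x, x_tau, t)
        diff = g(x, x_tau, t) + u_g(x, x_tau, t)
        dB = rng.gauss(0.0, math.sqrt(dt))           # Brownian increment
        path.append(x + drift * dt + diff * dB)
    return path

# Unstable open-loop drift x' = x + x(t - tau); a hypothetical damping
# control u_f = -3x renders the closed loop x' = -2x + x(t - tau) stable.
path = simulate_sdde(
    f=lambda x, x_tau, t: x + x_tau,
    g=lambda x, x_tau, t: 0.0,
    u_f=lambda x, x_tau, t: -3.0 * x,
    u_g=lambda x, x_tau, t: 0.0,
    xi=lambda s: 1.0, tau=0.1, dt=0.01, T=5.0)
```

With `g` and `u_g` both zero the run is deterministic; the trajectory decays from 1 toward the stabilized zero solution.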

3. NEURAL DETERMINISTIC CONTROL

In this section, we propose the neural deterministic controller (NDC) based on Theorem 2.2 to stabilize system (3). Heuristically, we construct the auxiliary functions and control functions in neural network form, and integrate the sufficient conditions of the theorem into the loss function to find a neural controller satisfying the expected conditions. However, the NDC can neither find stochastic controllers nor rigorously satisfy the expected stability conditions. These problems are addressed in Sections 4 and 5.

3.1. METHOD: LEARNING CONTROL AND AUXILIARY FUNCTIONS

The core idea of our method is based on Theorem 2.2: once we construct the auxiliary functions $V, \gamma, w_1, w_2$ and the neural controller $u$ to meet all the conditions assumed in Theorem 2.2 for the controlled system (4), the solution $x(t; \xi)$ converges to $\mathrm{Ker}(w_1 - w_2)$. In particular, if we set $\mathrm{Ker}(w_1 - w_2) = \{0\}$, the unstable zero solution of the control-free system (3) is stabilized. To this end, we first provide appropriate constructions of NNs to learn these candidate functions. Then, we design the explicit form of the loss function for the learning step.

Auxiliary Functions We employ a multi-layer feedforward neural network, denoted by NN(·; θ), to design all the functions. Precisely, $\theta_V$ is the parameter vector of the positive function $V(x, t; \theta_V)$, and the $L^2$ term $\|x\|^2$ is added to guarantee $\lim_{\|x\| \to \infty} \inf_{0 \le t < \infty} V(x, t; \theta_V) = \infty$, that is, $V(x, t; \theta_V) = \mathrm{NN}(x, t; \theta_V)^2 + \varepsilon \|x\|^2$ with $\varepsilon > 0$. Our framework requires $V \in C^{2,1}(\mathbb{R}^d \times \mathbb{R}^+)$. We therefore use a $C^2$ activation function for the NN, such as the hyperbolic tangent Tanh(·). We further discuss the impact of the $L^2$ term in Appendix A.1.3. To design an integrable positive function $\gamma(t)$ with an NN, we use an activation function with at most linear growth, such as ReLU, and multiply the output of the NN by an exponential decay factor, that is, $\gamma(t; \theta_\gamma) = \exp(-ct) \cdot \mathrm{NN}(t; \theta_\gamma)^2$ with $c > 0$. For simplicity, we design $w(x; \theta_w) = \mathrm{NN}(x; \theta_w)^2$ as a positive function, and we set $w_2 = w$ and $w_1 = w + p(x)$ with $p \ge 0$ and $\mathrm{Ker}(p) = \{0\}$.

Deterministic Control Function We first consider the deterministic control, i.e., $u = (u_f, 0)$. To guarantee that the control-free system (3) and the controlled system (4) share the same zero solution, the NDC $u_f : \mathbb{R}^d \times \mathbb{R}^d \times \mathbb{R}^+ \to \mathbb{R}^d$ should satisfy $u_f(0, 0, t) = 0$. One feasible way to meet this condition is to set $u_f(x, y, t) = \mathrm{NN}(x, y, t; \theta_f) - \mathrm{NN}(0, 0, t; \theta_f)$ or $u_f(x, y, t) = \mathrm{diag}(x)\,\mathrm{NN}(x, y, t; \theta_f)$.
Here, diag(x) is the diagonal matrix with $x_i$ as its i-th diagonal element.

Loss Function Once the learned functions $V, \gamma, w_1, w_2$ and the controller $u$, together with the coefficient functions $f_u \triangleq f + u$ and $g$ of the controlled system (4), meet all the conditions assumed in Theorem 2.2, the stability of the zero solution is naturally assured. To achieve this, we need a suitable loss function to evaluate the degree to which those conditions are satisfied. By our construction, the only condition that remains to be enforced is $\mathcal{L}V(x, y, t) \le \gamma(t) - w_1(x) + w_2(y)$. Hence, we define LaSalle's loss function for the controlled system (4) as follows.

Definition 3.1 (LaSalle's Loss) Consider the above parameterized candidate functions $V, \gamma, w_1, w_2$ and a controller $u_f$ for the controlled system (4). Then, LaSalle's loss is defined as
$$L_{N, \varepsilon, c, p}(\theta_V, \theta_\gamma, \theta_w, \theta_f) = \frac{1}{N}\sum_{i=1}^{N} \max\big(0,\; \mathcal{L}V(x_i, y_i, t_i) - \gamma(t_i) + w_1(x_i) - w_2(y_i)\big), \tag{8}$$
where $\{x_i, y_i, t_i\}_{i=1}^{N}$ are sampled from some distribution $\mu$ on $\mathbb{R}^d \times \mathbb{R}^d \times \mathbb{R}^+$.

In summary, the developed NDC framework is shown in Algorithm 1 in Appendix A.3.1.

Remark 3.1 The proposed NDC framework can be readily applied to the autonomous SDDE $dx(t) = f(x(t), x(t-\tau))\,dt + g(x(t), x(t-\tau))\,dB_t$. In particular, one can simply consider autonomous auxiliary and control functions, and set $\gamma(t) = 0$. For the sample distribution $\mu(\Omega)$, we select the uniform distribution on a sufficiently large, closed region $\Omega$, as used in (Han et al., 2016; Chang et al., 2019), and we include further analyses of the impact of $\mu$ in Appendix A.2.1.
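As an illustration of the constructions above, here is a minimal pure-Python sketch of the $u_f$ parameterization and the empirical LaSalle loss. The values of $\mathcal{L}V$, $\gamma$, $w_1$, $w_2$ are supplied as precomputed sample values (in practice $\mathcal{L}V$ is obtained by automatic differentiation), and the tiny network and all weights are hypothetical:

```python
import math

def tanh_mlp(z, W1, b1, W2, b2):
    """Tiny one-hidden-layer Tanh network (a C^2 activation, as required for V)."""
    h = [math.tanh(sum(w * zi for w, zi in zip(row, z)) + b) for row, b in zip(W1, b1)]
    return [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(W2, b2)]

def u_f(x, y, t, params):
    """NDC parameterization u_f(x, y, t) = NN(x, y, t) - NN(0, 0, t),
    which enforces u_f(0, 0, t) = 0 by construction."""
    z = list(x) + list(y) + [t]
    z0 = [0.0] * (len(x) + len(y)) + [t]
    return [a - b for a, b in zip(tanh_mlp(z, *params), tanh_mlp(z0, *params))]

def lasalle_loss(LV_vals, gamma_vals, w1_vals, w2_vals):
    """Empirical LaSalle loss (8): mean of max(0, LV - gamma + w1(x) - w2(y))."""
    terms = [max(0.0, lv - g + w1 - w2)
             for lv, g, w1, w2 in zip(LV_vals, gamma_vals, w1_vals, w2_vals)]
    return sum(terms) / len(terms)

# Hypothetical weights for a 1-D system (inputs: x, y, t; one hidden unit).
params = ([[0.1, 0.1, 0.1]], [0.0], [[0.5]], [0.1])
```

For example, `u_f([0.0], [0.0], t, params)` returns exactly zero for any `t`, and sample points where the LaSalle condition already holds contribute nothing to the hinge loss.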

3.2. NUMERICAL AND ANALYTICAL INVESTIGATIONS

Comparison Studies Recent works on controlling time-delayed systems mainly focus on elaborately designing the analytical form of the control to satisfy the conditions in the LaSalle-type Theorem 2.2 (Lin & He, 2005; Xu et al., 2014), or on simultaneously designing the control and the Lyapunov function to satisfy the conditions of Lyapunov theory (Yu & Cao, 2007). It should be noted that all these methods require a delicate design of functions for specific dynamics, and thus are limited in practical applications for controlling general time-delayed systems. By contrast, our neural method leverages NNs to automatically learn the control policies and can be applied to a broad class of time-delayed systems in stochastic settings. In Figure 3, we numerically compare the NDC with a baseline, the linear control (LC) proposed in (Lin & He, 2005), on a noise-perturbed driving-response Chua's circuit. Here, Chua's circuit is a three-dimensional autonomous dynamical system with a single nonlinear element, producing typical chaotic dynamics (Matsumoto, 1984). In the simulation, we show that the NDC can find a neural control for the response system $y = (y_1, y_2, y_3)$ with autonomous and even nonautonomous time-delay noise. Notably, the nonautonomous time-delay noise was not considered in (Lin & He, 2005). The simulation configurations are described in Appendix A.3.4.

Failure in Finding Stochastic Control Since the NDC performs well, a natural idea is to utilize the noise part to achieve the stabilization of the SDDE (3). To explore this idea, we adopt the same NN architecture as for $u_f$, design $u_g = \mathrm{NN}(x, y, t; \theta_g)$, and train its parameters $\theta_g$ with LaSalle's loss (8). However, as Figure 2 shows, the loss cannot converge to zero when controlling a simple 1-D toy system via the stochastic controller $u_g$: $dx(t) = [x(t) + x(t-\tau)]\,dt + [x(t-\tau) + u_g(x(t), x(t-\tau); \theta_g)]\,dB_t$. Actually, this phenomenon can be explained analytically.
Notice that $\theta_g$ enters the loss function through the quadratic term $l(\theta_g) = \frac{1}{2}\mathrm{Tr}[u_g^\top\, HV\, u_g]$ according to Eq. (2); the sign of this term depends on the convexity of $V$, i.e., on the sign of the maximum eigenvalue of $HV$. Nevertheless, a positive function $V$ with $\lim_{\|x\| \to \infty} V(x, t) = \infty$ implies $l(\theta_g) \ge 0$ over most of the state space. Hence, when we minimize $l(\theta_g) \ge 0$ in the training procedure, the ideal case $l(\theta_g) = 0$ is attained only by $u_g = 0$. This indicates that we are unable to learn a stochastic controller satisfying the sufficient conditions of Theorem 2.2 under LaSalle's loss (8).
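The obstruction can also be seen numerically: for a convex $V$ (here $V(x) = \|x\|^2$, so $HV = 2I$ is positive semi-definite), the quadratic term is non-negative for every $u_g$ and vanishes only at $u_g = 0$. The vectors below are illustrative:

```python
def quad_term(u_g, HV):
    """l(theta_g) = (1/2) * Tr[u_g^T (HV) u_g] for a single column u_g (r = 1)."""
    Hu = [sum(HV[i][j] * u_g[j] for j in range(len(u_g))) for i in range(len(HV))]
    return 0.5 * sum(ui * hi for ui, hi in zip(u_g, Hu))

HV = [[2.0, 0.0], [0.0, 2.0]]  # Hessian of V(x) = ||x||^2: positive semi-definite
vals = [quad_term([a, b], HV) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]
# every value is non-negative, and only u_g = (0, 0) attains zero, so minimizing
# this term over theta_g drives the stochastic controller itself to zero
```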

4. NEURAL STOCHASTIC CONTROL

To find the neural stochastic controller (NSC), we provide the following theoretical result on the stabilization of general stochastic functional differential equations (SFDEs), with the proof provided in Appendix A.1.4. Since the failure of the NDC in the stochastic control case comes from the positive term contributed by the diffusion, we aim at constructing a stability condition in which the part related to the diffusion term can be negative. We further explain Theorem 4.1 in Appendix A.1.4.

Theorem 4.1 (Stochastic Stabilization) Consider the SFDE $dx(t) = F(x_t, t)\,dt + G(x_t, t)\,dB(t)$, with $F, G$ being locally Lipschitzian functions, $F(0, t) = 0$, and $G(0, t) = 0$. Assume that, for every $M > 0$, $\min_{\|x_t(0)\| = M} \|x_t(0)^\top G(x_t, t)\| > 0$, and that there exists a number $\alpha \in (0, 1)$ such that
$$\|x_t(0)\|^2 \big(2\langle x_t(0), F(x_t, t)\rangle + \|G(x_t, t)\|_F^2\big) - (2 - \alpha)\|x_t(0)^\top G(x_t, t)\|^2 \le 0$$
for $x_t \in C([-\tau, 0], \mathcal{X})$, where $x_t(s) = x(t + s)$ for $s \in [-\tau, 0]$ and $\mathcal{X}$ is the state space. Then, the solution of the SFDE satisfies $\lim_{t \to \infty} x(t; \xi) = 0$ a.s. for any $\xi \in C_{\mathcal{F}_0}([-\tau, 0]; \mathbb{R}^d)$.

Remark 4.2 The SFDE in Theorem 4.1 is formulated in a very general form, including the SDDE $dx(t) = F(x(t), x(t-\tau_1), \cdots, x(t-\tau_q), t)\,dt + G(x(t), x(t-\tau_1), \cdots, x(t-\tau_q), t)\,dB_t$ with $\tau_1 < \tau_2 < \cdots < \tau_q \in [0, \tau]$. This indicates that our framework can be generalized to stabilize SDDEs with multiple delays and even more general SFDEs as well.

In light of Theorem 4.1, we establish a more general framework for learning a neural controller of system (4) with the form $u = (u_f, u_g)$, designed with the same NN architecture as the one used in the NDC framework. We focus on stochastic control with $u_f = 0$ and provide more control combinations in Appendix A.3.3, whereas the loss function is designed differently, as follows.

Definition 4.1 (Asymptotic Loss) Use the notations set in Definition 3.1 and let $g_u = g + u_g$.
The loss function for the controlled system (4) with the controller $u$ is defined as:
$$L_{\mu, \alpha}(\theta) = \frac{1}{N}\sum_{i=1}^{N} \max\Big\{0,\; (\alpha - 2)\|x_i^\top g_u(x_i, y_i, t_i)\|^2 + \|x_i\|^2\big(2\langle x_i, f(x_i, y_i, t_i)\rangle + \|g_u(x_i, y_i, t_i)\|_F^2\big)\Big\}, \tag{10}$$
where $\theta = (\theta_f, \theta_g)$. Akin to Definition 3.1, we use the empirical loss function for training. Here, $\alpha$ is an adjustable parameter related to the convergence rate and the control energy. We further discuss the design of the asymptotic loss in Appendix A.2.2 and numerically investigate the role of $\alpha$ in Appendix A.4.1. We summarize the framework in Algorithm 2 in Appendix A.3.1 and further compare the computational complexity in Appendix A.3.2. We compare our neural control methods on a noise-perturbed kinematic bicycle model for car-like vehicles (Rajamani, 2011) in terms of the convergence time and the energy cost, two important indexes for measuring the quality of a controller (Yan et al., 2012; Li et al., 2017; Sun et al., 2017). To quantify the energy cost in the control process, we first denote by $\tau_\epsilon \triangleq \inf\{t > 0 : \|x(t)\| = \epsilon\}$ the stopping time and then by $E_\epsilon \triangleq \mathbb{E}\int_0^{\tau_\epsilon} \|u_f\|^2 + \|u_g\|^2\, dt$ the energy cost. We approximate this expectation by the empirical value as

4.1. EXPERIMENTS OF THE COMBINATION METHODS

$\frac{1}{N}\sum_{i=1}^{N}\int_0^{\tau_\epsilon^i} \|u_f^i\|^2 + \|u_g^i\|^2\, dt$ through Monte Carlo sampling. We show the results in Figure 4 and in Table 1 as well. Table 1 includes the training time (Tt), the empirical energy cost $E_{0.001}$, the nearest distance (Nd) between the bicycle and the target position, and the empirical expectation $\mathbb{E}[\tau_{0.001}]$ for the different methods. We include more experimental details in Appendix A.3.5. We can see that the ranking of comprehensive performance is NSC > NDC > QP. This means that we can indeed benefit from introducing noise into the control protocol. This is reasonable because, when we regard the energy cost as an objective functional for minimization, the randomness is more likely to lead this functional along a shorter path, akin to the common case where stochastic gradient descent outperforms full-batch gradient descent. We also show that the NSC can enlarge the region of attraction of a 100-D gene regulatory network in Appendix A.4.2.
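The asymptotic loss of Definition 4.1 can be sketched in a few lines for a scalar system. The open-loop drift, the hypothetical controller $g_u(x) = \sigma x$, and the constants below are illustrative; for that controller the integrand reduces to $x^4\,(2a + (\alpha - 1)\sigma^2)$, so the loss vanishes iff $2a \le (1 - \alpha)\sigma^2$, recovering the classic noise-stabilization condition:

```python
def asymptotic_loss(samples, f, g_u, alpha):
    """Empirical asymptotic loss (10), scalar case: mean over samples (x, y, t) of
    max(0, (alpha - 2) * |x * g_u|^2 + |x|^2 * (2 * x * f + |g_u|^2))."""
    total = 0.0
    for x, y, t in samples:
        fu, gu = f(x, y, t), g_u(x, y, t)
        total += max(0.0, (alpha - 2.0) * (x * gu) ** 2
                          + x * x * (2.0 * x * fu + gu * gu))
    return total / len(samples)

# Unstable open-loop drift f(x) = a*x with a > 0; controller g_u(x) = sigma*x.
a, alpha = 1.0, 0.25
samples = [(x, 0.0, 0.0) for x in (-1.0, -0.5, 0.5, 1.0)]
# sigma = 2: 2a = 2 <= (1 - alpha) * sigma^2 = 3, so the loss is exactly zero
loss_strong = asymptotic_loss(samples, lambda x, y, t: a * x, lambda x, y, t: 2.0 * x, alpha)
# sigma = 1: 2a = 2 > 0.75, so the hinge is active and the loss is positive
loss_weak = asymptotic_loss(samples, lambda x, y, t: a * x, lambda x, y, t: 1.0 * x, alpha)
```

In training, `g_u` would be the NN controller and the loss would be minimized over its parameters rather than evaluated for a fixed gain.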

Uncontrollable Fluctuation

The neural stochastic method we propose outperforms the other control methods considered above, including the deterministic control. However, it can cause uncontrollable fluctuations due to the stochasticity. In practice, we always want to bound such perturbations owing to physical and engineering restrictions in the real world. We tackle this safety guarantee problem in Section 5.

5. SAFETY AND STABILITY GUARANTEES

In this section, we study the safety and stability guarantees for the SYNC framework. Based on stochastic control barrier functions, we establish an analytical result on the safety guarantee problem for SDDEs, which guarantees that the process $x(t; \xi)$ satisfies the safety constraint, i.e., $x(t; \xi) \in \mathrm{int}(C)$ for all $t$.

Baseline We extend the recent results on stochastic control barrier functions for SDEs (Clark, 2019) to SDDEs and summarize them in Proposition 5.1. With this proposition and Theorem 2.2, the traditional deterministic control methods based on quadratic programming (QP) in (Fan et al., 2020; Sarkar et al., 2020) can be tested on SDDEs. We use this QP method as the baseline; the specific algorithm is shown in Appendix A.3.1. We also take the classic MPC method as a baseline.

Proposition 5.1 Let the function $B : \mathbb{R}^d \to \mathbb{R}$ be locally Lipschitz and twice differentiable on $\mathrm{int}(C)$. Suppose there exist three extended class-K functions $\alpha_{1,2,3}$ such that $[\alpha_1(h(x))]^{-1} \le B(x) \le [\alpha_2(h(x))]^{-1}$ and $\mathcal{L}B(x, y, t) \le \alpha_3(h(x))$ for the SDDE in (1). Then $\mathbb{P}(x(t) \in \mathrm{int}(C)) = 1$ for all $t$, provided that $x(0) \in \mathrm{int}(C)$.

A natural idea is to integrate Proposition 5.1 into our proposed neural control framework, but the main drawback of this proposition is that $B(x)$ is unbounded on $C$ and thus lacks Lipschitz continuity. This drawback makes it impossible to fulfill the expected conditions only through numerical verification on finite samples.
To conquer this difficulty, we propose the following theorem for the safety guarantee, which, we believe, significantly advances the existing barrier function theory.

Theorem 5.2 Consider the SDDE specified in (1), where $F$ and $G$ satisfy the locally Lipschitz condition and the local linear growth condition. If there exists an extended class-K function $\lambda$ such that $\mathcal{L}h \ge -\lambda \circ h$ for $x \in D$, where $\circ$ denotes function composition and $D$ is compact with $C \subset D$, then the solution satisfies $\mathbb{P}(x(t; \xi) \in \mathrm{int}(C)) = 1$ for any $\xi \in C_{\mathcal{F}_0}([-\tau, 0]; \mathbb{R}^d)$ with $\xi(0) \in \mathrm{int}(C)$.

Discretization and Safety Guarantee. Based on Theorem 5.2, we can construct a neural candidate class-K function $\lambda$ and combine it with the NDC and NSC to learn a safe controller, where the candidate $\lambda$ is required to satisfy the condition assumed in Theorem 5.2. However, the main difficulty is to guarantee the condition for every point $x \in D$, since, in practice, we can only verify this condition on a finite set of training data $\widetilde{D}$, with $\widetilde{D}$ being a discretization of $D$. Surprisingly, the following theorem suggests that we only need to check a slightly stronger condition on the finite set $\widetilde{D}$ in order to establish the safety guarantee on the whole $D$.

Theorem 5.3 Let $M = M(F, G, h, \lambda, D)$ be the maximum of the Lipschitz constants of $\mathcal{L}h$ and $\lambda \circ h$ on $D$, and let $r$ be the mesh size of $\widetilde{D}$, i.e., for each $x \in D$ there exists $\tilde{x} \in \widetilde{D}$ such that $\|x - \tilde{x}\| < r$. Suppose there exists a non-negative constant $\delta \le Mr$ such that
$$-\mathcal{L}h - \lambda \circ h + 4Mr \le \delta, \quad \forall\, x \in \widetilde{D}. \tag{11}$$
Then $\lambda$ satisfies the safety condition specified in Theorem 5.2.

Remark 5.4 Here, the non-negative $\delta$ is regarded as the tolerance error in the training stage. So, practically, we terminate the training once the safety loss is smaller than $Mr$.

Construct Neural Networks with Bounded Lipschitz Constant. We can define the loss function for safety following the left-hand side of (11).
However, $M$ depends on the Lipschitz constants of the NN functions $\lambda$ and $u$, which could make the loss function complex and difficult to train. To simplify it, we construct NNs with bounded Lipschitz constants for $\lambda$ and $u$. Specifically, we apply spectral normalization to the neural control function to constrain its Lipschitz constant below 1 (Miyato et al., 2018; Yoshida & Miyato, 2017). We apply monotonic NNs to construct the candidate extended class-K function as $\lambda_{\theta_\lambda}(x) = \int_0^x q_{\theta_\lambda}(s)\,ds$, where $q_{\theta_\lambda}(\cdot)$, the output of the NN, is strictly positive (Wehenkel & Louppe, 2019). To constrain the Lipschitz constant of $\lambda_{\theta_\lambda}$, we modify the integral formula as $\lambda_{\theta_\lambda}(x) = \int_0^x \min\{q_{\theta_\lambda}(s), M_\lambda\}\,ds$, where $M_\lambda$ is a predefined hyperparameter; thus, the Lipschitz constant of $\lambda_{\theta_\lambda}$ is smaller than $M_\lambda$. Therefore, we can calculate $M$ from the functions involved and $M_\lambda$. Other Lipschitz regularization methods (Gouk et al., 2021; Liu et al., 2022) can be applied in our framework as well.

SYNC Algorithm: We define the loss function for the safety guarantee of the controlled system (4) as follows (the specific algorithms are summarized in Algorithms 1 and 2):
$$L_{\widetilde{D}, M_\lambda}(\theta, \theta_\lambda) = \frac{1}{|\widetilde{D}|^2} \sum_{(x, y) \in \widetilde{D} \times \widetilde{D}} \max\big\{0,\; -\mathcal{L}h(x, y) - \lambda_{\theta_\lambda}(h(x)) + 4Mr\big\}. \tag{12}$$
We add this loss to Eq. (8) and Eq. (10), respectively, to train the NDC and the NSC separately. To obtain the safety guarantee, we terminate the training process once $L_{\widetilde{D}, M_\lambda}(\theta, \theta_\lambda)$ is less than $Mr$.

From Safety Guarantee to Stability Guarantee. Akin to the safety guarantee, we provide the stability guarantee for candidate neural control functions satisfying the conditions in Theorems 2.2 and 4.1. However, both theorems require their conditions to hold for every point $x \in \mathcal{X} \subset \mathbb{R}^d$, while, in practice, it is impossible to obtain a finite discretization or a bounded Lipschitz constant on the unbounded $\mathcal{X}$.
Ingeniously, this difficulty can be conquered with the help of the safety guarantee, since the safety condition restricts $\mathcal{X} \subset D$, where $D$ is compact. As such, we can establish theoretical results on the stability guarantee for the NDC and NSC in a manner similar to Theorem 5.3. We summarize all these results in Appendix A.1.8. We test the proposed safe control method by suppressing the fluctuations emerging in the control process on the task of controlling a noise-perturbed inverted pendulum with time delay. This control task is a standard nonlinear control problem for testing different control methods (Anderson, 1989; Huang & Huang, 2000). We apply the safe control method to steer the system to the upright position without rotating a semi-circle, i.e., $|\theta| \le \pi$. The results are shown in Figure 6 and the experimental details are provided in Appendix A.3.6. It is observed that the safe control method significantly outperforms the baseline and the stochastic control method in terms of stabilization and safety guarantee.
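The discretized certificate of Theorem 5.3 and the clipped class-K construction can be sketched together as follows. The one-dimensional dynamics, the barrier $h$, the stand-in integrand $q$, and all constants are hypothetical:

```python
def certify_on_grid(Lh, lam_h, grid, M, r, delta):
    """Discretized safety certificate of Theorem 5.3: check
    -Lh(x) - (lam o h)(x) + 4*M*r <= delta at every mesh point x,
    with tolerance delta <= M*r; Lipschitz continuity then extends
    the condition Lh >= -lam(h) to the whole compact set D."""
    assert delta <= M * r
    return all(-Lh(x) - lam_h(x) + 4.0 * M * r <= delta for x in grid)

def class_K(q, M_lam, x, n=1000):
    """Clipped monotonic construction lambda(x) = int_0^x min(q(s), M_lam) ds:
    strictly increasing with lambda(0) = 0, Lipschitz constant at most M_lam
    (midpoint-rule quadrature; q stands in for the positive NN output)."""
    h = x / n
    return h * sum(min(q((k + 0.5) * h), M_lam) for k in range(n))

# Hypothetical 1-D illustration: safe set C = {h >= 0} with h(x) = 1 - x^2
# and noise-free drift F(x) = -x, so Lh(x) = h'(x) * F(x) = 2*x^2.
grid = [-1.0 + 0.02 * k for k in range(101)]   # mesh of D = [-1, 1], radius r = 0.01
ok = certify_on_grid(lambda x: 2.0 * x * x,    # Lh
                     lambda x: 1.0 - x * x,    # lambda o h with lambda(s) = s
                     grid, M=4.0, r=0.01, delta=0.04)
```

Here `ok` is `True`: on this mesh the left-hand side never exceeds the tolerance, so the certificate extends to all of $D$.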

6. THEORETICAL RESULTS FOR NDC AND NSC

We have mentioned the stopping time and the energy cost in Section 4.1 and numerically compared the proposed neural controllers using these indexes. These two indexes are classic factors for measuring the performance of a controller (Sun et al., 2017). From the construction in Section 5, we circumscribe the Lipschitz constant $k_u$ of the control function. Based on the safety and stability guarantees, the neural controller thus satisfies the conditions assumed in Theorems 2.2 and 4.1. We then have the following theoretical results, whose proofs are included in Appendix A.1.9.

Theorem 6.1 (Estimation for NDC) Consider the SDDE with the NDC controller:
$$dx(t) = \big(f(x(t), x(t-\tau)) + u_f(x(t), x(t-\tau))\big)\,dt + g(x(t), x(t-\tau))\,dB_t, \quad x(0) = x_0 \in \mathbb{R}^d,$$
where $\|f(x, y) - f(\bar{x}, \bar{y})\| \vee \|u_f(x, y) - u_f(\bar{x}, \bar{y})\| \le L(\|x - \bar{x}\| + \|y - \bar{y}\|)$. Assume that the controlled system satisfies the conditions assumed in Theorem 2.2 and Remark 3.1 with $\mathrm{Ker}(w_1 - w_2) = \{0\}$. Denote by $\eta_\varepsilon = \inf\{t > 0 : \|x(t)\| = \varepsilon\}$ the stopping time and by $E(\eta_\varepsilon, T) = \mathbb{E}\big[\int_0^{\eta_\varepsilon \wedge T} \|u(x(s), x(s-\tau))\|^2\, ds\big]$ the corresponding energy cost in the control process with $\varepsilon < \|x_0\|$. Then, using the same notations as in Theorem 2.2, we have
$$\mathbb{E}[\eta_\varepsilon] \le T_\varepsilon = \frac{V(x_0) - \min_{\|x\| = \varepsilon} V(x) + \int_{-\tau}^{0} w_2(\xi(s))\,ds}{\min_{\|x\| \ge \varepsilon} \big(w_1(x) - w_2(x)\big)},$$
$$E(\eta_\varepsilon, T_\varepsilon) \le \frac{k_u^2\, C_0}{2(L^2 + L + k_u)} \Big[\exp\big(4(L^2 + L + k_u) T_\varepsilon\big) - 1\Big] + \int_{-\tau}^{0} k_u^2\, \xi^2(s)\,ds,$$
where $C_0 = \|x_0\|^2 + (2L^2 + L + k_u) \int_{-\tau}^{0} \|\xi(s)\|^2\, ds$ and $\xi \in C[-\tau, 0]$ is the initial data. We provide similar theoretical results for the NSC in Appendix A.1.10.
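For intuition, the convergence-time bound of Theorem 6.1 can be evaluated in closed form for a toy configuration. All the choices below are hypothetical: $V(x) = x^2$ (so $\min_{|x| = \varepsilon} V = \varepsilon^2$), $w_2 \equiv 0$, and $w_1 - w_2 = p x^2$, whose minimum over $|x| \ge \varepsilon$ is $p\varepsilon^2$:

```python
def T_eps_bound(V, w2_int, gap_min, x0, eps):
    """Convergence-time bound of Theorem 6.1:
    E[eta_eps] <= (V(x0) - min_{||x||=eps} V(x) + int_{-tau}^0 w2(xi(s)) ds)
                  / min_{||x||>=eps} (w1(x) - w2(x))."""
    return (V(x0) - V(eps) + w2_int) / gap_min

# Hypothetical scalar instance with V(x) = x^2 and w1 - w2 = p * x^2.
p, x0, eps = 0.5, 2.0, 0.1
bound = T_eps_bound(lambda x: x * x, 0.0, p * eps * eps, x0, eps)
# (4 - 0.01 + 0) / 0.005 = 798.0
```

As expected, shrinking the target radius $\varepsilon$ or the gap $w_1 - w_2$ inflates the bound on the expected stopping time.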

7. RELATED WORKS

Stability Theory of SDDEs. The early endeavors to develop the stability theory of SDDEs are attributed to (Mao, 1999; 2002), inspired by LaSalle's theory (LaSalle, 1968). Subsequent developments have been achieved systematically and fruitfully over the last twenty years in the control community (Appleby, 2003; Song et al., 2014; Liu et al., 2016; Zhu, 2018; Peng et al., 2021). These works reveal the positive effect of multiplicative noise on stochastic dynamics with delays and motivate us to develop purely neural stochastic control to stabilize dynamical systems.

Finding Stabilization Controllers. Traditional control methods focus on transforming control criteria, such as control Lyapunov functions (CLFs), into QP (Fan et al., 2020; Sarkar et al., 2020) or semi-definite programming (SDP) problems (Henrion & Garulli, 2005; Jarvis-Wloszek et al., 2003; Parrilo, 2000) to find the optimal control iteratively. These methods have high computational complexity, since they cannot give the closed form of the control. Hence, machine-learning-based control methods have been introduced to improve the generalization and efficiency of the original convex optimization problems (Khansari-Zadeh & Billard, 2014; Ravanbakhsh & Sankaranarayanan, 2019; Gurriet et al., 2018). However, all the existing learning methods consider dynamics without time delay (Wagener et al., 2019; Williams et al., 2018; Chang et al., 2019; Zhang et al., 2022).

Theory and Application of Control Barrier Functions. The barrier function method has been extensively researched for the safety verification of controlled dynamics (Prajna & Jadbabaie, 2004; Jankovic, 2018; Prajna et al., 2004; Clark, 2019; 2021). Existing works constructing barrier functions in applications are typically based on quadratic programming (Ames et al., 2014; 2016; Khojasteh et al., 2020; Fan et al., 2020).
Machine learning methods have also been introduced in safe control fields in (Robey et al., 2020; Dean et al., 2020; Taylor et al., 2020) .

8. DISCUSSION

We heuristically design two kinds of neural controllers for SDDEs based on the classic LaSalle-type stabilization theory and the newly proposed stochastic stabilization theorem. To ensure that the controlled trajectories stay in the safety region, we develop a safety guarantee theorem through the SCBF and discretization techniques. Since the state space of the controlled SDDEs with a safety guarantee is bounded by the compact safety region, we can similarly deduce the stability guarantee theorem for neural controllers through spatial discretization. Furthermore, we theoretically and numerically investigate the neural controllers' performance in terms of convergence time and energy cost. The proposed neural controllers with safety and stability guarantees are summarized as SYNC, which significantly simplifies the process of control design and has extensive potential in various control fields, such as financial engineering (Zhou & Li, 2000).



Figure 1: Overall workflow of SYNC. Both the NDC and NSC can stabilize the SDDEs to the target unstable equilibrium $x^*$. The safety-aware controlled state trajectories are restricted to the safe region.

Figure 3: (a) The original driving-response model, (b) the controlled orbits under LC and NDC, (c) the time trajectory of $y_2$ with autonomous noise, and (d) with nonautonomous noise. The solid lines are obtained by averaging 10 sampled trajectories, while the shaded areas stand for the standard errors.

Figure 2: Training loss for 1-D SDDE.

Figure 4: (Left) A schematic diagram of the kinematic bicycle model. (Right) Time trajectories of the state variables x, y of the kinematic bicycle under different control cases. The solid lines are obtained through averaging the 10 sampled trajectories, while the shaded areas stand for the standard errors.

Figure 5: Diagram of the safety guarantee.

Here, the safety constraint requires $x(t; \xi) \in \mathrm{int}(C)$ for all $t$ with the initial value $\xi(0) \in \mathrm{int}(C)$, where $C = \{x : h(x) \ge 0\}$ is a compact set and the locally Lipschitz function $h : \mathbb{R}^d \to \mathbb{R}$ is called a stochastic control barrier function (SCBF). Inspired by (Lechner et al., 2022), we prove that the safety and stability conditions for NN-form functions can be guaranteed through a stronger condition on finite samples. We include the analytical proofs of all the results in Appendix A.1. Definition 5.1 A continuous function $\alpha : (-b, +\infty) \to (-\infty, +\infty)$ is said to be an extended class-K function for some $b > 0$ if it is strictly increasing and $\alpha(0) = 0$.

Figure 6: Schematic diagram of inverted pendulum task (a). The θ component of the original system (b), under baseline control (c), under NSC (d), and under our proposed safe control (e). The solid lines are obtained through averaging the 5 sampled trajectories, while the shaded areas stand for the standard errors.

where $\|\cdot\|$ denotes the $L^2$-norm and $\|\cdot\|_F$ denotes the Frobenius norm, i.e., $\|G(x, y, t)\|_F^2 = \mathrm{Tr}\big(G(x, y, t)^\top G(x, y, t)\big)$.

Table 1: Results on the kinematic bicycle model.

9. ACKNOWLEDGMENTS

We thank the anonymous reviewers for their valuable and constructive comments that helped us improve the work. Q.Z. is supported by the China Postdoctoral Science Foundation (No. 2022M720817), by the Shanghai Postdoctoral Excellence Program (No. 2021091), and by the STCSM (Nos. 21511100200 and 22ZR1407300). W.Y. is supported by the STCSM (Nos. 21511100200, 22ZR1407300, and 22dz1200502). W.L. is supported by the National Natural Science Foundation of China (No. 11925103) and by the STCSM (Nos. 22JC1402500, 22JC1401402, and 2021SHZDZX0103).

