ON EXPLAINING NEURAL NETWORK ROBUSTNESS WITH ACTIVATION PATH

Abstract

Despite their verified performance, neural networks are prone to be misled by maliciously designed adversarial examples. This work investigates the robustness of neural networks from the activation pattern perspective. We find that despite the complex structure of the deep neural network, most of the neurons provide locally stable contributions to the output, while the minority, which we refer to as float neurons, can greatly affect the prediction. We decompose the computational graph of the neural network into the fixed paths and float paths and investigate their role in generating adversarial examples. Based on our analysis, we categorize the vulnerable examples into Lipschitz vulnerability and float neuron vulnerability. We show that the boost of robust accuracy from randomized smoothing is the result of correcting the latter. We then propose an SC-RFP (smoothed classifier with repressed float path) to further reduce the instability of the float neurons and show that our result can provide a higher certified radius as well as accuracy.

1. INTRODUCTION

Despite their verified performance, neural networks are prone to be misled by maliciously designed adversarial examples. In response to this issue, many studies focus on defensive algorithms that aim to increase the robustness of deep neural networks. One of the emerging topics in this field is certifiable methods that aim to construct a guaranteed region, within which classifiers are able to provide stable results regardless of the perturbation. The certifiable methods appear in two different forms: verifiable training and randomized smoothing. This work introduces an SC-RFP (smoothed classifier with repressed float path) which builds on randomized smoothing algorithms and is able to further improve their robustness accuracy. We decompose the local mapping function into fixed paths and float paths according to the stability of neurons on the path. The fixed paths have a stable mapping relationship between input and output, while the float paths can result in a sudden change of the mapping function and alter the result. We categorize the adversarial examples into Lipschitz vulnerable and float neuron vulnerable. With respect to the ability of randomized classifiers in correcting misclassified data, we conclude that the essence of the smoothed classifier is to average the contribution of the float path and achieve a locally stable result. Based on this, we further repress the float paths of the network and show that such a classifier can achieve better performance. The theoretical basis of this work is developed from the analysis of the activation region that was initially proposed for explaining the performance of neural network with a piecewise linear activation function. The input domain of such a neural network N is separated into many regions, within which the mapping of N is piecewise linear. Previous investigation of this field includes the expressivity, sensitivity, and potential issues of the network. 
However, due to the complexity of neural networks, this theoretical investigation has so far only provided insights into the network and has yet to be deployed downstream. In this work, we use the theory to explain model robustness and introduce a novel way to apply it in practice. The contributions of this work are: (1) we introduce a complete framework to describe and decompose the neural network according to the activation status of each neuron; (2) we provide an explanation of adversarial examples and discuss the role of smoothed classifiers as well as their contribution to correcting misclassified examples; (3) we introduce SC-RFP, which achieves better performance in certifying the network.

2. RELATED WORKS

Adversarial examples are malicious inputs formed by applying an imperceptible perturbation to the original inputs, yet they cause a well-trained network to misclassify (Biggio et al. (2013); Szegedy et al. (2013)). To explain the existence of adversarial examples, previous works presented several hypotheses, such as the linearity hypothesis (Szegedy et al. (2013); Luo et al. (2015)) and evolutionary stalling (Rozsa et al. (2016)). Early works on increasing the robustness of neural networks focused on adversarial training methods (Goodfellow et al. (2014); Wong et al. (2020); Tramèr et al. (2017); Dong et al. (2018); Kurakin et al. (2016)), while recent investigation shows that adversarial training methods can be broken by more advanced attacks. To address the issue, certifiable training and randomized smoothing methods aim to provide a certified region within which the input data are free from attack. By viewing training as a convex optimization problem, dual relaxation approaches apply duality to provide a solid bound for training as well as to verify the network (Wong & Kolter (2018); Wong et al. (2018)). An alternative is to estimate the Lipschitz bound of the network and introduce constraints on either the objective loss (Tsuzuku et al. (2018)) or forward propagation (Lee et al. (2020); Weng et al. (2018)). On the other hand, randomized smoothing adds a smoothed classifier on top of the base classifier and therefore has a limited effect on the performance of standard models. Cao & Gong (2017) first propose to ensemble the information around the input data to smooth the prediction, but fail to provide a theoretical guarantee on the result. Lecuyer et al. (2019) certify the result of the smoothed classifier with differential privacy. Cohen et al. (2019) provide a theoretical analysis of the certified radius estimated with Monte Carlo sampling, followed by Levine et al. (2019); Li et al. (2019).
Jeong & Shin (2020) introduce a regularizer to improve the consistency of predictions over noise, Jeong et al. (2021) train the model on a convex combination of samples, and Salman et al. (2019) employ a PGD attack with randomized smoothing to further increase the robust accuracy. Another related topic is the explainability of neural networks. Lin et al. (2017), Hornik et al. (1989) and Park et al. (2020) investigate how deep models approximate an objective function. An inspiring observation is that the network is a piecewise linear function when the activation function is piecewise linear (Pascanu et al. (2013)). The number of linear regions is then adopted as a proxy of network complexity (Montufar et al. (2014); Hanin & Rolnick (2019a; b)). Novak et al. (2018) study network sensitivity by counting the transition density of trajectories in the input space. Jiang et al. (2022) compare the similarity of activation patterns globally to study the limitations of deep neural networks. Inspired by these theoretical investigations, Jordan et al. (2019) introduce an algorithm named GeoCert that computes the $\ell_p$ bound of a network with a piecewise linear activation function. Zhang et al. (2022) propose an algorithm that systematically searches for adversarial examples based on the activation space of a ReLU network.

3.1. NOTATIONS

Let N be a d-block feedforward neural network for a classification task with a measure-zero parameter set θ with respect to the Lebesgue measure. Each block $h_i$ consists of an affine map $\phi_i$, an optional batch-normalization layer $\psi_i$, and a piecewise linear activation function $\sigma_i$, while the last block $h_d$ omits the activation function. Consider $\mathcal{D}$ as the distribution of a classification problem with c classes from $\mathbb{R}^{n_0}$ to $Y = \{1, 2, \dots, c\}$. The network N computes a function $f : \mathbb{R}^{n_0} \to \mathbb{R}^{c}$, where f is a composition of d blocks, $f = h_d \circ h_{d-1} \circ \cdots \circ h_1$. For every $(x, y) \sim \mathcal{D}$, the network computes a probability for each class, $f(x) \in \mathbb{R}^{c}$, and predicts the label of x as the class with the highest probability: $\hat{y} = \arg\max_{m \in Y} f_m(x)$, where $f_m(x)$ is the m-th element of the network output vector. We use $x_i(x; \theta)$, $y_i(x; \theta)$ and $z_i(x; \theta)$ to denote the input, output, and pre-activation value of block i for data x. The tuple (i, j) denotes the j-th neuron of layer i, and the set $I = \bigcup_{i=1}^{d-1} \{(i, j) \mid j \in \{1, \dots, n_i\}\}$ denotes all the neurons in intermediate layers, where $n_i$ is the output size of layer i.

3.2. RANDOMIZED CERTIFIABLE CLASSIFIER

The research on certifiable training aims to provide a guaranteed region around an input x within which a classifier always provides the same result. To be specific, a classifier is regarded as robust at an input x for perturbations of size r if:

$\arg\max_{m \in Y} f_m(x') = \arg\max_{m \in Y} f_m(x), \quad \forall x' \in B_p(x, r),$

where $B_p(x, r) := \{x' : \lVert x' - x \rVert_p \le r\}$ is the ball of radius r measured by the metric induced by the p-norm. Taking the standard performance of the classifier into account, the robust accuracy of f with radius r is then defined as:

$R(f) = \mathbb{E}_{(x,y) \sim \mathcal{D}} \left[ \mathbb{1}\left( \arg\max_{m \in Y} f_m(x') = y, \ \forall x' \in B_p(x, r) \right) \right].$

However, robust models often gain stability at the cost of impaired expressivity. As a compromise, randomized algorithms are proposed to verify the network with a sound theoretical bound at the cost of slight additional computation rather than model performance. Let g be a randomized algorithm constructed based on classifier f. Given a data point $(x, y) \sim \mathcal{D}$, g employs a certain degree of randomness during the inference of f. For instance, the smoothed classifier g computes the probability that $f(x + \epsilon)$ belongs to class i given $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$ (Cohen et al. (2019)):

$g_i(x) = \mathbb{P}\left( \arg\max_{m \in Y} f_m(x + \epsilon) = i \right), \quad \epsilon \sim \mathcal{N}(0, \sigma^2 I).$

With a certain confidence level α, a lower bound $p_A$ of the random variable $g_y(x)$ and an upper bound $p_B$ of the probability of the second most likely class, $\max_{m \ne y} g_m(x)$, can be computed. This induces a certified radius of the classifier g(x):

$r = \frac{\sigma}{2} \left( \Phi^{-1}(p_A) - \Phi^{-1}(p_B) \right),$

where $\Phi^{-1}$ is the inverse of the standard Gaussian CDF. For every $x' \in B_2(x, r)$, if $\arg\max_{m \in Y} g_m(x) = y$, then $\arg\max_{m \in Y} g_m(x') = y$. In other words, with confidence level α, every x′ within the radius is correctly classified by the smoothed classifier. The smoothed classifier can also appear in other forms depending on the randomness and training algorithm (Levine et al. (2019); Salman et al. (2019); Jeong & Shin (2020)).
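As a small worked example of the certified radius formula above, the following sketch evaluates $r = \frac{\sigma}{2}(\Phi^{-1}(p_A) - \Phi^{-1}(p_B))$ with the Python standard library. The function name `certified_radius` and the numeric inputs are illustrative assumptions, not values from the paper; estimating $p_A$ and $p_B$ from Monte Carlo samples is left out.

```python
from statistics import NormalDist


def certified_radius(sigma: float, p_a: float, p_b: float) -> float:
    """Evaluate r = sigma / 2 * (Phi^{-1}(p_a) - Phi^{-1}(p_b)).

    p_a: lower bound on the probability of the top class.
    p_b: upper bound on the probability of the runner-up class.
    Both must lie strictly between 0 and 1, otherwise inv_cdf raises.
    """
    inv_cdf = NormalDist().inv_cdf  # inverse of the standard Gaussian CDF
    return 0.5 * sigma * (inv_cdf(p_a) - inv_cdf(p_b))


# Illustrative numbers only (hypothetical, not from the experiments).
radius = certified_radius(sigma=0.5, p_a=0.9, p_b=0.1)
```

With these inputs the radius is roughly 0.64. The radius shrinks to zero as $p_A$ approaches $p_B$ and scales linearly with the noise level σ.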

3.3. ACTIVATION PATTERN AND COMPUTATIONAL PATH

The input space is partitioned into small linear regions by a neural network with a piecewise linear activation function. Each of the regions is referred to as an activation region. Previous works mostly focused on the general properties of linear regions when investigating the expressivity and limitations of neural networks. As our objective is to study the unit-wise reaction to perturbation, additional notation to describe the neurons is necessary for our analysis.

Definition 1 (Generalized Activation Pattern / Region). Let N be a network defined as in Section 3.1. Denote $\Gamma = \{\gamma_1, \gamma_2, \dots, \gamma_q\}$ as a set of breakpoints that separates the domain of the activation function into q + 1 intervals $U = \{U_0, U_1, \dots, U_q\}$. A generalized activation pattern of N is an assignment of a label to each neuron: $A := \{a_{ij} \mid a_{ij} \in \{0, 1, \dots, q\}, (i, j) \in I\}$. Given an activation pattern A, the activation region of A is defined as:

$R(A; \theta, \sigma, \Gamma) := \{x \in \mathbb{R}^{n_0} \mid z_{ij}(x; \theta) \in U_{a_{ij}}, \ a_{ij} \in A\}.$ (5)

Activation Region Operator. The generalized activation pattern describes the activation status of each unit in the intermediate layers. Given an activation pattern A and fixed dependencies, R(A) can be viewed as an operator that finds the region such that, for every $x \in R(A)$, the pre-activation value of each neuron $z_{ij}(x; \theta)$ lies within the interval $U_{a_{ij}}$ determined by its pattern $a_{ij} \in \{0, 1, \dots, q\}$. However, as the scale of the network grows, the mean volume of a single activation region decreases exponentially and can hardly provide insights into model robustness. Therefore, it is necessary to generalize our investigation from a single region to its neighbors.

From Single Region to its Neighbor. Equation 5 suggests that the essence of the operator R(·) is to find all the x that satisfy a certain constraint determined by the activation pattern.
It can be expressed as the intersection of a sequence of subspaces: $R(A) = \bigcap_{(i,j) \in I} \{x \in \mathbb{R}^{n_0} \mid z_{ij}(x) \in U_{a_{ij}}\}$. Given a subset of indexes, removing the intersection operations on those neurons defines a larger region that contains R(A). This implies that adjacent activation regions can be merged into one by releasing the constraints on certain neurons. We describe this with an incomplete activation pattern:

Definition 2. Let N be a network defined as in Section 3.1. Given an activation pattern A and a subset of the index set $I_c \subset I$, we say that $A_{I_c} \subset A$ is an incomplete activation pattern of A: $A_{I_c} := \{a_{ij} \mid a_{ij} \in A, (i, j) \in I_c\}$. The merged activation region of $A_{I_c}$ is denoted as: $R(A_{I_c}) = \bigcap_{(i,j) \in I_c} \{x \in \mathbb{R}^{n_0} \mid z_{ij}(x) \in U_{a_{ij}}\}$.

In the following section, we delve into such merged regions to investigate model robustness.
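To make Definitions 1 and 2 concrete, here is a minimal sketch for a toy network with a 2D input and one hidden layer of four ReLU neurons. The weights `W1`, biases `B1`, and the helper names are hypothetical and chosen only for illustration. The breakpoint set is Γ = {0}, so each neuron gets label 0 or 1. Note that `float_neurons` only compares a finite list of points, so it can miss neurons that change pattern elsewhere in a region.

```python
from bisect import bisect_right

# Hypothetical toy layer: 4 hidden neurons, 2 inputs (illustration only).
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
B1 = [0.0, 0.0, 1.0, 0.0]
BREAKPOINTS = [0.0]  # Gamma for ReLU: two intervals, labelled 0 and 1


def pre_activation(x):
    """Pre-activation values z_1j(x) of the hidden layer."""
    return [sum(w * t for w, t in zip(row, x)) + b for row, b in zip(W1, B1)]


def activation_pattern(x):
    """Map each neuron (1, j) to the index of the interval containing z_1j(x)."""
    return {(1, j + 1): bisect_right(BREAKPOINTS, z)
            for j, z in enumerate(pre_activation(x))}


def in_region(x, pattern, index_subset=None):
    """Membership in R(A), or in the merged region R(A_Ic) if a subset is given."""
    keys = pattern.keys() if index_subset is None else index_subset
    current = activation_pattern(x)
    return all(current[k] == pattern[k] for k in keys)


def float_neurons(points):
    """Neurons whose label differs between at least two of the given points."""
    patterns = [activation_pattern(p) for p in points]
    return {k for k in patterns[0] if any(p[k] != patterns[0][k] for p in patterns)}


x_a, x_b = [2.0, 1.9], [2.0, 2.1]
A = activation_pattern(x_a)
I_c = [(1, 1), (1, 2), (1, 3)]  # release the constraint on neuron (1, 4)
```

Here `x_b` falls outside `R(A)` because neuron (1, 4) changes its label, but it lies inside the merged region obtained by dropping that neuron's constraint: `in_region(x_b, A)` is false while `in_region(x_b, A, I_c)` is true.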

4.1. FLOAT NEURON AND PATH

Merged activation regions are irregularly shaped as they are defined by the post-activation of neurons, while a regular subspace, such as the ball B(x, r), is preferred in analyzing the properties of a network. Definition 3 introduces float neurons and fixed neurons to describe the status of neurons in a regular subspace, followed by Lemma 1, which builds a connection between Definitions 2 and 3.

Definition 3 (Float and Fixed Neuron). Let N be a network defined as in Section 3.1. For any space $R \subset \mathbb{R}^{n_0}$, if a neuron z has the same pattern for every $x \in R$, we refer to it as a fixed neuron of R; otherwise it is a float neuron. We denote the collections of fixed neurons and float neurons in region R as $I_X(R)$ and $I_T(R)$, respectively:

$I_X(R) = \{(i, j) \mid \forall x_1, x_2 \in R, \ a_{ij}(x_1) = a_{ij}(x_2)\},$
$I_T(R) = \{(i, j) \mid \exists x_1, x_2 \in R, \ a_{ij}(x_1) \ne a_{ij}(x_2)\}.$

The following lemma shows that: (1) the fixed and float neurons are complementary sets in I and (2) any subset $R \subset \mathbb{R}^{n_0}$ can be covered by a merged region defined by the fixed neurons of R.

Lemma 1. Let N be a neural network defined as in Section 3.1. Given $R \subset \mathbb{R}^{n_0}$, denote by $I_X$ and $I_T$ the sets of fixed neurons and float neurons in R. Then: 1. $I_X(R) \cup I_T(R) = I$; 2. $R \subset R(A_{I_X})$.

Definition 4 (Path). Let N be a network defined as in Section 3.1. A path of N is a set of neurons: $\zeta := \{(i, \zeta_i) \mid i = 0, 1, \dots, d\}$, $\zeta_i \in \{1, 2, \dots, n_i\}$, where $n_i$ is the number of neurons in layer i. The value of a path is defined as:

$\zeta(x, A) := x_{\zeta_0} \prod_{m=1}^{d} d_{a_{m,\zeta_m}} W'^{(m)}_{(\zeta_m, \zeta_{m-1})}, \quad x \in R,$

where $x_{\zeta_0}$ is the $\zeta_0$-th element of the input, $W'^{(m)}$ is the equivalent matrix of the linear transformation $\psi_m \circ \phi_m$, $a_{m,\zeta_m}$ is the activation pattern of neuron $(m, \zeta_m)$ and $d_{a_{m,\zeta_m}}$ is the slope of the activation σ within the $a_{m,\zeta_m}$-th interval.

Definition 5 (Float Path and Fixed Path). Let N be a neural network defined as in Section 3.1.
Given a subspace $R \subset \mathbb{R}^{n_0}$, a path ζ of N is a float path in R if there exists a neuron $(i, \zeta_i) \in \zeta$ that is a float neuron; otherwise it is a fixed path:

float path in R := $\{\zeta \mid \exists \zeta_i \in \zeta, \ (i, \zeta_i) \notin I_X(R)\}$,
fixed path in R := $\{\zeta \mid \forall \zeta_i \in \zeta, \ (i, \zeta_i) \in I_X(R)\}$.

The values of fixed paths and float paths in R are denoted as:

$Z_I(x, A; R) = \sum_{\text{fixed path in } R} \zeta(x, A), \quad Z_T(x, A; R) = \sum_{\text{float path in } R} \zeta(x, A).$

Figure 1 illustrates the proposed concepts with a simple network with 2D input, 4D output and 1 hidden layer. Neuron (1, 4) is the only float neuron in region B(x, r). As all the other neurons are fixed, the only non-linearity is provided by neuron (1, 4). The computational graph of N can be decomposed into float paths and fixed paths according to whether (1, 4) is on the path. By removing the constraint of neuron (1, 4) in Equation 6, an incomplete activation pattern $A_{I_c}$ defines a merged activation region $R(A_{I_X})$ from the green and pink regions, where $I_c = I \setminus \{(1, 4)\}$ is the subset of neurons without neuron (1, 4). As Lemma 1 suggests, region $R(A_{I_X})$ covers B(x, r). Moreover, according to Definition 5, the activation pattern of neurons on fixed paths remains unchanged for every $x \in R(A_{I_X})$. Therefore, given $x, x' \in B(x, r)$ with different activation patterns, the value of their fixed paths is linear according to Equation 8. The following theorem generalizes the above discussion by aggregating the value of all fixed and float paths with Definition 5.

Theorem 1. Let N be a neural network defined as in Section 3.1. Given $R \subset \mathbb{R}^{n_0}$, the following statements hold for any $x, x' \in R$ with activation patterns A and A′:

1. $f(x) = Z_I(x, A; R) + Z_T(x, A; R)$
2. $f(x) - f(x') = J(x)(x - x') + Z_T(x', A; R) - Z_T(x', A'; R)$

where J(x) is the Jacobian matrix of f at x. Statement 1 decomposes the computational graph of f(x) into a linear function $Z_I(x, A; R)$ and an unstable function $Z_T(x, A; R)$ with high non-linearity.
Given x, x ′ ∈ R, f (x) -f (x ′ ) can also be written as the sum of a fixed part and a float part by substitution with Statement 1. We rearrange the equation so that the fixed part can be represented by the Jacobian matrix at x, while the float part can be viewed as a general instability caused by float neurons.
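The decomposition in Theorem 1 can be checked numerically on a toy example. The sketch below assumes a bias-free network with one hidden ReLU layer, hypothetical weights `W1` and `W2`, and uses the two-point set {x, x′} as a finite stand-in for the region R. For ReLU the slope of a neuron equals its pattern label (0 or 1), so the pattern list doubles as the list of slopes.

```python
# Hypothetical bias-free toy network (illustration only).
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]  # hidden x input
W2 = [[1.0, -1.0, 0.5, 2.0], [0.0, 1.0, -0.5, 1.0]]      # output x hidden
HIDDEN = range(len(W1))


def pattern(x):
    """ReLU activation pattern of the hidden layer (also the slopes)."""
    return [1 if sum(w * t for w, t in zip(row, x)) > 0 else 0 for row in W1]


def path_sum(v, A, neurons):
    """Sum of path values v_k * d_j * W1[j][k] * W2[o][j] over the given hidden neurons."""
    return [sum(v[k] * A[j] * W1[j][k] * row[j]
                for j in neurons for k in range(len(v)))
            for row in W2]


def f(x):
    """Network output, written as the sum over all paths under x's own pattern."""
    return path_sum(x, pattern(x), HIDDEN)


x, xp = [2.0, 1.9], [2.0, 2.1]
A, Ap = pattern(x), pattern(xp)
float_set = [j for j in HIDDEN if A[j] != Ap[j]]   # float neurons on {x, x'}
fixed_set = [j for j in HIDDEN if A[j] == Ap[j]]   # fixed neurons on {x, x'}

# Statement 1: f(x) = Z_I(x, A) + Z_T(x, A).
z_fixed = path_sum(x, A, fixed_set)
z_float = path_sum(x, A, float_set)

# Statement 2: f(x) - f(x') = J(x)(x - x') + Z_T(x', A) - Z_T(x', A').
delta = [a - b for a, b in zip(x, xp)]
jac_term = path_sum(delta, A, HIDDEN)  # equals W2 diag(A) W1 (x - x')
lhs = [a - b for a, b in zip(f(x), f(xp))]
rhs = [j + t - tp for j, t, tp in zip(jac_term,
                                      path_sum(xp, A, float_set),
                                      path_sum(xp, Ap, float_set))]
```

In this example only the fourth hidden neuron changes its pattern between x and x′, and `lhs` and `rhs` agree up to floating-point error, with the whole gap between the linear prediction and the true difference carried by the float-path term.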

4.2. FLOAT PATH AND NETWORK ROBUSTNESS

Now that we have stated the motivation and properties of the proposed concepts, the remaining question is how float and fixed paths affect model robustness and randomized smoothing algorithms.

Lipschitz Vulnerable. We start by investigating the fixed part $J(x)(x - x')$, which is determined by both the local Lipschitz constant and the scale of the perturbation. Intuitively, given a ball B(x, r), if the Lipschitz constant is large enough, the fixed part in $f(x') - f(x)$ can by itself alter the prediction of x. The threshold of $\lVert J(x) \rVert$ is determined by the sum of the prediction margin $M(f(x), y) := \min_{y' \ne y} |f(x)_y - f(x)_{y'}|$ and the upper bound of the float path difference $Z_T(x', A; B(x, r)) - Z_T(x', A'; B(x, r))$. We refer to such an x′ as Lipschitz vulnerable data. In particular, taking the expectation of the above sum, we have the following theorem.

Theorem 2. Let f be the base classifier. Given $(x, y) \sim \mathcal{D}$ with f(x) = y, if

$\lVert J(x) \rVert > \frac{M(f(x), y) + \mathbb{E}[M(Z_T(x, A; R), y)]}{r},$

then for any smoothed classifier g defined as above, there exists $x' \in B(x, r)$ such that $g(x') \ne y$.

It shows that by removing the float paths between x and x′, a significant accuracy boost can be achieved. This suggests that the unstable part of the network can greatly affect the prediction of the network. We refer to x′ as float neuron vulnerable data if $\arg\max Z_I(x') = y$ while f(x′) is misclassified. Given that x′ is float vulnerable, if the smoothed classifier corrects the prediction of x′, then the majority of the neighbors of x′ vote for the correct label. In other words, the instability caused by float paths is smoothed by additional samples around x′.

[Table 1: $\lVert Z_I(x') + Z_T(x')(1 - \eta) - f(x) \rVert$]

SC-RFP.

The above discussion suggests that the smoothed classifier fails to boost the performance on Lipschitz vulnerable examples, but is able to correct float neuron vulnerable data by smoothing the sudden change of the float path in a region. Moreover, sampling more data shows that the better the smoothing is performed, the higher the certified radius and accuracy that can be achieved (Cohen et al. (2019)). In other words, randomized smoothing provides robustness by restricting the instability of the network locally. Based on this insight, SC-RFP introduces a manual repression of the local instability caused by the float path into the smoothed classifier. The first set of Table 1 shows the averaged $\ell_1$ norm of the difference between the expectation of the repressed prediction $\mathbb{E}[Z_I(x') + (1 - \eta) Z_T(x')]$ and f(x), where x′ is perturbed by $\epsilon \sim \mathcal{N}(0, 0.25)$ with g(x′) = y. At each x′, we draw 2550 samples around x′ and compute the averaged distance to f(x). We find that: (1) the clean model has a high Lipschitz constant, and repressing the instability from the float path cannot reduce the prediction gap, and (2) for other models, repressing the instability drives the prediction of the smoothed classifier towards the original prediction f(x). The results support the above discussion.

Algorithm 1 describes SC-RFP in detail. At each block, we first compute the pre-activation values of x and x + ϵ. Before passing them to the activation function, we compare the activation patterns of $z_i(x)$ and $z_i(x + \epsilon)$ according to the separation Γ, and use a repression factor η to reduce the value of the float path.

Algorithm 1 Smoothed Classifier with Repressed Float Path
Inputs: Network N with parameter θ, randomized algorithm g, input x, repress ratio η.
Outputs: Predicted label ŷ
  while g samples noise ϵ do
    for Block i in Block 1, 2, ..., d − 1 do
      $I_T^i \leftarrow A(z_i(x)) \ne A(z_i(x + \epsilon))$
      $z_i(x + \epsilon) \leftarrow z_i(x + \epsilon) - z_i(x + \epsilon) \times I_T^i \times \eta$
      $x_{i+1}(x + \epsilon) \leftarrow \sigma(z_i(x + \epsilon))$
    end for
    counts ← $\phi_d(x_d(x + \epsilon))$
  end while
  $\hat{c}_A, \hat{c}_B$ ← top two indices in counts
  $n_A, n_B$ ← counts[$\hat{c}_A$], counts[$\hat{c}_B$]
  if BinomPValue($n_A$, $n_A + n_B$, 0.5) ≤ α then
    return $\hat{c}_A$
  else
    return Abstain
  end if

Intuitively, manipulating the computational graph can greatly change the prediction result, while we show that only a small proportion of paths are affected by our method. Figure 2 presents the proportion of fixed neurons between x and x + ϵ for a VGG16 network trained on CIFAR10. It shows that models trained with noisy samples have relatively more stable activation patterns, as their ratio of fixed neurons is higher. In particular, the model trained with clean data has an average fixed ratio of around 72%, while after adding a minimal scale of noise σ = 0.05, it increases to around 90%. At the end of this section, we present Theorem 3 to link the certifiable bound of the proposed algorithm with previous works.

Theorem 3. Let N be a network defined as in Section 3.1. Let g be a smoothed classifier that samples noise from distribution $D_{noise}$ and let g′ be the SC-RFP built on g. Given $\epsilon \sim D_{noise}$, assume that the direction of ϵ is uniformly distributed:

$\forall \lVert \eta_1 \rVert = \lVert \eta_2 \rVert = 1, \quad P\left( \frac{\epsilon}{\lVert \epsilon \rVert} = \eta_1 \right) = P\left( \frac{\epsilon}{\lVert \epsilon \rVert} = \eta_2 \right).$

If $\arg\max_{m \in Y} f_m(x) = y$, then

$p'_A > p_A, \quad p'_B < p_B,$

where $p'_A, p_A$ are the lower bounds of $g'_y(x)$ and $g_y(x)$, and $p'_B, p_B$ are the upper bounds of $g'_{m \ne y}(x)$ and $g_{m \ne y}(x)$. Moreover, if $\arg\max_{m \in Y} g_m(x) = y$, g′(x) has a certified radius no less than R, where

$R = \frac{\sigma}{2} \left( \Phi^{-1}(p_A) - \Phi^{-1}(p_B) \right).$

Notice that in the above theorem, we do not specify the randomized classifier g but introduce a constraint on the randomness of g. This means that the theorem holds not only for the naive smoothed classifier, but also for its variants. However, it does not hold for classifiers that sample directed noise.
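The steps of Algorithm 1 can be sketched in plain Python as follows. This is an illustrative reading under several assumptions rather than the authors' implementation: the network is a list of `(W, b)` affine layers with ReLU activations (no batch normalization), the noise is isotropic Gaussian, the clean stream `z_i(x)` is propagated through the unmodified network, and `binom_p_value` is a simple two-sided exact test specialised to p = 0.5. All function names are hypothetical.

```python
import math
import random


def affine(W, b, v):
    return [sum(w * t for w, t in zip(row, v)) + bi for row, bi in zip(W, b)]


def relu_pattern(z):
    # ReLU has a single breakpoint at 0, so each label is 0 or 1.
    return [1 if t > 0 else 0 for t in z]


def sc_rfp_logits(layers, x, eps, eta):
    """One noisy forward pass in which float neurons are repressed by eta."""
    clean = list(x)
    noisy = [a + e for a, e in zip(x, eps)]
    for W, b in layers[:-1]:
        z_clean, z_noisy = affine(W, b, clean), affine(W, b, noisy)
        is_float = [pc != pn for pc, pn in
                    zip(relu_pattern(z_clean), relu_pattern(z_noisy))]
        # z <- z - z * I_T * eta: shrink only neurons whose pattern flipped.
        z_noisy = [z * (1.0 - eta) if fl else z
                   for z, fl in zip(z_noisy, is_float)]
        clean = [max(0.0, t) for t in z_clean]
        noisy = [max(0.0, t) for t in z_noisy]
    W, b = layers[-1]  # last block has no activation
    return affine(W, b, noisy)


def binom_p_value(k, n):
    """Two-sided exact binomial test of k successes in n trials against p = 0.5."""
    tail = sum(math.comb(n, i) for i in range(max(k, n - k), n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def sc_rfp_predict(layers, x, sigma, eta, n_samples, alpha, rng):
    """Majority vote over noisy repressed passes; returns None to abstain."""
    counts = {}
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, sigma) for _ in x]
        logits = sc_rfp_logits(layers, x, eps, eta)
        label = max(range(len(logits)), key=logits.__getitem__)
        counts[label] = counts.get(label, 0) + 1
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    n_a = ranked[0][1]
    n_b = ranked[1][1] if len(ranked) > 1 else 0
    if binom_p_value(n_a, n_a + n_b) <= alpha:
        return ranked[0][0]
    return None


# Hypothetical two-layer identity network, for illustration only.
LAYERS = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
          ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])]
label = sc_rfp_predict(LAYERS, [5.0, 1.0], sigma=0.1, eta=0.5,
                       n_samples=100, alpha=0.001, rng=random.Random(0))
```

Setting `eta = 0` recovers the plain smoothed classifier, which makes it easy to compare the two variants on the same noise samples.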
5. EXPERIMENTS

CIFAR10. We train the VGG16 network with different scales of noisy samples. Each model is trained for 200 epochs with the SGD optimizer and an initial learning rate of 0.1, which decays after 60, 120, and 160 epochs with a rate of 0.2. Table 2 compares the robust accuracy of SC-RFP and the benchmark. Generally, SC-RFP increases the robust accuracy by 6% to 10% compared with the benchmark. We notice that by repressing the float path, SC-RFP is able to improve the accuracy on clean data, which supports our analysis that float neuron vulnerability is one of the causes of misclassification. Figure 3 presents the robust accuracy at different noise scales. We find that the performance boost from SC-RFP increases as the radius increases for all the models.

[Table: Method; columns 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 2.0, 2.5; first row: Clean]

[...] 2019)), SC-RFP has the highest robust accuracy at all radii. Moreover, an increasing boost in the robust accuracy is observed as the radius increases, which is consistent with our previous analysis of the CIFAR10 result. Figure 4 provides more details on our experiments. For a noise level of σ = 0.25, SmoothAdv and SmoothAdv + SC-RFP have a negligible difference, while in Figure 4 (c), SC-RFP (σ = 0.25) increases the robust accuracy by around 5%. By introducing SC-RFP (σ = 0.1) to the benchmark classifier, it achieves comparable results with other state-of-the-art models when the $\ell_2$ radius is around 1.5.

[Table: $\ell_2$ radius; columns 0.25, 0.5, 0.75, 1.0, 1.5, 2.0; rows include Clean and Jeong & Shin (2020)]

We draw two observations from the above results. First, introducing the repression factor to the float path has a minimal negative effect on the standard accuracy, while it can greatly boost the robust accuracy. This supports our discussion that float neuron vulnerable examples are caused by the float path and can be cured by reducing its contribution. Second, we find that as the radius grows, the boost from our method increases. This is consistent with the theoretical basis of SC-RFP as well as the result from the CIFAR10 dataset. When the size of the perturbation grows, the uncertainty of the randomized algorithm increases along with the number of float paths, as Figure 2 suggests. This results in a higher abstention rate and a higher possibility of misclassification. By repressing the value of the float path, the deviation of g(x + ϵ) from f(x) is reduced, and therefore a more stable result can be achieved.
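The learning-rate schedule described for the CIFAR10 runs can be written as a small step function. This sketch assumes that "decays after 60, 120, and 160 epochs with a rate of 0.2" means multiplying the rate by 0.2 once each milestone is reached, with epochs counted from 0; the function name and defaults are illustrative.

```python
def learning_rate(epoch: int, base_lr: float = 0.1,
                  milestones=(60, 120, 160), decay: float = 0.2) -> float:
    """Step schedule: multiply base_lr by `decay` for every milestone reached."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * decay ** passed


# Learning rate at the first epoch and at each milestone.
schedule = [learning_rate(e) for e in (0, 60, 120, 160)]
```

Under this reading the rate drops from 0.1 to 0.02, 0.004, and 0.0008 over the 200 epochs.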

6. CONCLUSION

In this work, we introduce the SC-RFP algorithm, which can improve the performance of a randomized smoothing classifier. We first introduce a framework for describing the local activation status of neurons and show that most of the neurons are locally stable, while the others can greatly affect the model prediction. By decomposing the computational graph of the network, we find that the boost of robust accuracy provided by the smoothed classifier comes from averaging the deviation of float paths. Based on this, we suggest further repressing the value of float paths with the SC-RFP method. The experiments show that our method can improve the performance of smoothing models.

The Lebesgue measure of $\{x \mid z_{ij}(x) \in \{\gamma_1, \gamma_2, \dots, \gamma_q\}\}$ is 0 in $\mathbb{R}^{n_0}$. From a geometric point of view, the collection of those x partitions the input space $\mathbb{R}^{n_0}$ into numerous regions by a set of bent hyperplanes $\{H_{ijk}(\theta)\}$, where $H_{ijk}(\theta) := \{x \in \mathbb{R}^{n_0} \mid z_{ij}(x; \theta) = \gamma_k\}$ is determined by neuron (i, j) and breakpoint $\gamma_k$. Each of the regions, as in Definition 1, is an activation region. The following figure presents a geometric illustration similar to that in Section 4, but with the description of bent hyperplanes. For example, the pink and green regions are divided by the bent hyperplane $H = \{x \mid z_{1,4}(x) = 0\}$, where $z_{1,4}$ is the pre-activation value of the 4th neuron in the 1st layer. Now we consider a ball B(x, r) centered at x with radius r. B(x, r) is covered by the union of the pink and green regions. Since the two regions can be merged into one by removing the hyperplane, the incomplete activation pattern of the merged region is defined on $I \setminus \{(1, 4)\}$. Therefore, (1, 4) is the only float neuron in B(x, r). This means that for every $x \in B(x, r)$, all the neurons have the same activation pattern except for the neuron (1, 4). Now we consider the computational path of $x \in B(x, r)$. Given a path ζ, if neuron (1, 4) is not on this path, then ζ is a fixed path; these are the black lines in the figure.
This means that for every $x \in B(x, r)$, the values of those fixed paths are linear functions of x. On the other hand, all the non-linearity of the function f(x) for $x \in B(x, r)$ is contributed by the float paths (orange lines). With the illustration above, we prove Lemma 1 as follows.

Lemma 1 (Restated). Let N be a neural network defined as in Section 3.1. Given $R \subset \mathbb{R}^{n_0}$, denote by $I_X$ and $I_T$ the sets of fixed neurons and float neurons in R. Then: 1. $I_X(R) \cup I_T(R) = I$; 2. $R \subset R(A_{I_X})$.

Proof. Consider a neuron (i, j) and a region $R \subset \mathbb{R}^{n_0}$. For any $x \in R$, we use $\hat{a}_{ij}(x)$ to denote the pattern of neuron (i, j) at x. Since $z_{ij}(x) \in (-\infty, \infty)$, the pattern of x is either an index k of the interval when $z_{ij}(x) \in U_k$, or −1 when $z_{ij}(x)$ lies on the bent hyperplane $H_{ijk} := \{x \mid z_{ij}(x) = \gamma_k\}$. This means every neuron has a pattern for $x \in R$. If for every $x \in R$ the pattern of neuron (i, j) remains the same, then (i, j) is a fixed neuron. Otherwise, it is a float neuron. In other words, a neuron is either fixed or float in R. This implies that $I_X(R) \cup I_T(R) = I$.

Now we consider statement 2. Take any $x_1, x_2 \in R$ and a neuron index (i, j). If $\hat{a}_{ij}(x_1) = \hat{a}_{ij}(x_2)$ for all such pairs, then (i, j) is a fixed neuron in R: $(i, j) \in I_X$. Denote this pattern by $a_{ij}$. Then for all $x_1 \in R$, $z_{ij}(x_1; \theta)$ and $z_{ij}(x_2; \theta)$ fall in the same interval $U_{a_{ij}}$. Therefore, $x_1 \in R(A_{I_X})$.

B DECOMPOSING THE NETWORK

Before we delve into details, we restate the definition of a path and introduce the definition of a sub-path. This enables us to take the bias into account, whose computational graph starts in the middle of the network instead of at the beginning. We start by introducing a sub-path of network N.

Definition 6 (Sub-path). A sub-path of a path consists of several consecutive elements of ζ: $\zeta_{(i,j)} = \{\zeta_j, \zeta_{j+1}, \dots, \zeta_i\} \subseteq \zeta$. We use $\zeta_{(i,j)}(x)$ to represent the computation graph from $(j, \zeta_j)$ to $(i, \zeta_i)$: $\zeta_i(x) = \zeta_{(i,j)} \circ \zeta_{(j,0)}(x)$.

Definition 4 (Path (Restated)). Let N be a network defined as in Section 3.1. A path of N is a set of neurons: $\zeta := \{(i, \zeta_i) \mid i = 0, 1, \dots, d\}$, $\zeta_i \in \{1, 2, \dots, n_i\}$, where $n_i$ is the number of neurons in layer i. The value of a path is defined as:

$\zeta_{(i,j)}(v, A) := v \prod_{m=j+1}^{i} d_{A_{m,\zeta_m}} W'^{(m)}_{(\zeta_m, \zeta_{m-1})},$

where v is the value entering the sub-path, $W'^{(m)}$ is the equivalent matrix of the linear transformation $\psi_m \circ \phi_m$, $A_{m,\zeta_m}$ is the activation pattern of neuron $(m, \zeta_m)$ and $d_{A_{m,\zeta_m}}$ is the slope of the activation σ within the $A_{m,\zeta_m}$-th interval. Next, we consider the mapping from v, which can be either from x or a bias at intermediate layers, to the output of the i-th layer $y_i$.

Lemma 2. Let N be a neural network defined as in Section 3.1. Given an activation pattern A, for any $x \in R(A; \theta, \sigma, \Gamma)$, the k-th component of the output vector of layer i can be represented as:

$y_{ik}(x) = \sum_{\forall \zeta_i = k} x_{j,\zeta_j} \prod_{m=j+1}^{i} d_{A_{m,\zeta_m}} W'^{(m)}_{(\zeta_m, \zeta_{m-1})} + \sum_{o=j}^{i-1} \sum_{\forall \zeta_i = k} \beta^{(o)}_{\zeta_o} \prod_{m=o+1}^{i} d_{A_{m,\zeta_m}} W'^{(m)}_{(\zeta_m, \zeta_{m-1})}$ (20)

where $x_{j,\zeta_j}$ is the $\zeta_j$-th element of the input of layer j, $W'^{(i)}$ and $\beta^{(i)}$ are the equivalent matrix and bias of the linear transformation $\psi_i \circ \phi_i$, $A_{m,\zeta_m}$ is the activation pattern of the $\zeta_m$-th component of layer m and $d_{A_{m,\zeta_m}}$ is the slope of pattern $A_{m,\zeta_m}$: $d_{A_{m,\zeta_m}} := \sigma'(t), \ t \in U_{A_{m,\zeta_m}}$.

Proof (Lemma 2). We first show that the pre-activation transformation can be represented by a matrix. As $\phi_i$ is an affine map, we denote $\phi_i(x) = W_i x_i$.
Combining this with the batch normalization layer, the mapping from the input $x_i$ to the pre-activation is:

$z_i = \gamma_i \frac{\phi_i(x) - \mu}{\sigma} + \beta_i = \frac{\gamma}{\sigma} \phi_i(x) - \frac{\gamma \mu}{\sigma} + \beta_i = D_i\left(\frac{\gamma}{\sigma}\right) W_i x - \frac{\gamma \mu}{\sigma} + \beta_i$ (21)

where $D_i(\cdot)$ is the diagonal operator, and μ and σ are the mean and variance parameters of $\psi_i$. Denote $W'^{(i)} = D_i(\frac{\gamma}{\sigma}) W_i$ and $\beta'_i = (\beta_i - \frac{\gamma \mu}{\sigma}) / n_i$. We prove this by induction. For i = 1:

$z_i = \sum_{j=0}^{n_0} W'^{(1)}_{ij} d_{A_{1,i}} x_{0j} + n_1 \times \beta'^{(1)}_i,$

where $A_{1,i}$ is the activation pattern of $z_{1i}$. Since all the paths in layer i that end at i are $\{(1, i), (2, i), \dots, (n_0, i)\}$, Equation 20 holds. Now we assume the equation holds for i = p:

$z_{pi} = \sum_{j=0}^{n_i} W'^{(i)} x_{ij} + n_i \times \beta'^{(i)}_j$
$= \sum_{j=0}^{n_i} W'^{(i)} z_{i-1,j} + n_i \times \beta'^{(i)}_j$
$= \sum_{j=0}^{n_i} W'^{(i)} d_{A_{i,j}} z_{i-1,j} + n_i \times \beta'^{(i)}_j$
$= \sum_{j=0}^{n_i} W'^{(i)} d_{A_{i,j}} \left( \sum_{\forall \zeta_i = j} x_{j,\zeta_j} \prod_{m=p}^{i} d_{A_{m,\zeta_m}} W'^{(m)}_{(\zeta_m, \zeta_{m-1})} + \sum_{o=p-1}^{i-1} \sum_{\forall \zeta_i = k} \beta^{(o)}_{\zeta_o} \prod_{m=o+1}^{i} d_{A_{m,\zeta_m}} W'^{(m)}_{(\zeta_m, \zeta_{m-1})} \right) + \beta'^{(i)}_j$
$= \sum_{\forall \zeta_p = k} x_{j,\zeta_j} \prod_{m=j+1}^{p} d_{A_{m,\zeta_m}} W'^{(m)}_{(\zeta_m, \zeta_{m-1})} + \sum_{o=j}^{p-1} \sum_{\forall \zeta_p = k} \beta^{(o)}_{\zeta_o} \prod_{m=o+1}^{p} d_{A_{m,\zeta_m}} W'^{(m)}_{(\zeta_m, \zeta_{m-1})}$

Lemma 2 decomposes the computational graph of network N, while Theorem 1 categorizes the paths of f(x) into a fixed part and a float part. Moreover, it describes the model robustness by dividing the variation of f into a linear part and a non-linear part.

Theorem 1 (Restated). Let N be a neural network defined as in Section 3.1. Given $R \subset \mathbb{R}^{n_0}$, for any $x, x' \in R$ with activation patterns A and A′, we have:

1. $f(x) = Z_I(x, A; R) + Z_T(x, A; R)$
2. $f(x) - f(x') = J(x)(x - x') + Z_T(x', A; R) - Z_T(x', A'; R)$

where $Z_I(x, A; R) = \sum_{\zeta \in Z_I(R)} \zeta(x, A)$ and $Z_T(x, A; R) = \sum_{\zeta \in Z_T(R)} \zeta(x, A)$ are the sums of fixed paths and float paths given the region R, and J(x) is the Jacobian matrix of f at x. Before we prove the above theorem, we use the following lemma to show that every fixed path in a region is linear regardless of the activation pattern.

Lemma 3. Let N be a network defined as Section 3.3. Let ζ be a fixed path in $R \subset \mathbb{R}^{n_0}$.
Then for any $x, x' \in R$ with activation patterns $A, A'$, $\zeta(x', A) = \zeta(x', A')$. Moreover, $Z^I(x', A; R) = Z^I(x', A'; R)$.

Proof (Lemma 3). Given $x \in R$, $Z^I(x', A; R)$ is the aggregation of all the fixed paths. Since a sum of linear functions is still linear, we have:
$$Z^I(x', A; R) = Z^I(x', A'; R).$$

Proof (Theorem 1). From Lemma 1, every neuron is either a fixed neuron or a float neuron. If a path $\zeta$ contains a float neuron, then it is a float path. Otherwise, it is a fixed path. The float paths and the fixed paths are therefore complementary subsets of the set of all paths. Lemma 2 decomposes $f(x)$ into a sum over paths. As each path is either float or fixed, we have $f(x) = Z^I(x, A; R) + Z^T(x, A; R)$. For statement 2, we have:
$$
\begin{aligned}
f(x) - f(x') &= Z^I(x, A; R) + Z^T(x, A; R) - \big(Z^I(x', A'; R) + Z^T(x', A'; R)\big) \\
&= Z^I(x, A; R) + Z^T(x, A; R) - Z^I(x', A; R) - Z^T(x', A; R) + Z^T(x', A; R) - Z^T(x', A'; R).
\end{aligned}
$$
Notice that $Z^I(x', A; R)$ is the collection of all fixed paths and is therefore unaffected by a change of $A$, so $Z^I(x', A; R) = Z^I(x', A'; R)$. The first four terms of the last line equal $J(x)(x - x')$, which yields statement 2.
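The decomposition in Theorem 1 can be checked numerically on a toy model. The following is a minimal sketch, not the authors' implementation: it assumes a one-hidden-layer ReLU network with random weights, and it approximates the region $R$ by a finite set of sampled points, so a neuron is only labeled fixed with respect to those samples. All names (`path_sums`, `fixed`, the layer sizes and the noise scale) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 2, 8, 4
W1 = rng.normal(size=(n_hidden, n_in))
b1 = rng.normal(size=n_hidden)
W2 = rng.normal(size=(n_out, n_hidden))
b2 = rng.normal(size=n_out)

def pattern(x):
    """Activation pattern of the hidden layer: 1.0 where the ReLU is active."""
    return (W1 @ x + b1 > 0).astype(float)

def forward(x):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def path_sums(x, A, fixed):
    """Sum of path values through fixed / float hidden neurons under pattern A."""
    # Column h collects every path (input and bias) that runs through neuron h.
    per_neuron = W2 * (A * (W1 @ x + b1))
    # The output-bias path crosses no activation, so it is counted as fixed.
    z_fixed = per_neuron[:, fixed].sum(axis=1) + b2
    z_float = per_neuron[:, ~fixed].sum(axis=1)
    return z_fixed, z_float

x = rng.normal(size=n_in)
samples = x + 0.3 * rng.normal(size=(256, n_in))  # sampled stand-in for R
patterns = np.array([pattern(s) for s in samples] + [pattern(x)])
fixed = (patterns == patterns[0]).all(axis=0)      # same pattern on all samples

x_prime = samples[0]
A, A_prime = pattern(x), pattern(x_prime)

# Statement 1: f(x) = Z^I(x, A; R) + Z^T(x, A; R)
z_i, z_t = path_sums(x, A, fixed)

# Lemma 3: the fixed-path sum at x' does not depend on the pattern used
z_i_A, z_t_A = path_sums(x_prime, A, fixed)
z_i_Ap, z_t_Ap = path_sums(x_prime, A_prime, fixed)

# Statement 2: f(x) - f(x') = J(x)(x - x') + Z^T(x', A) - Z^T(x', A')
J = W2 @ (A[:, None] * W1)
lhs = forward(x) - forward(x_prime)
rhs = J @ (x - x_prime) + z_t_A - z_t_Ap

print("fixed neurons:", int(fixed.sum()), "of", n_hidden)
print("statement 1 holds:", np.allclose(z_i + z_t, forward(x)))
print("statement 2 holds:", np.allclose(lhs, rhs))
```

Grouping paths by hidden neuron keeps the sketch short; in a deeper network the same bookkeeping would have to track whether any neuron along a path is float.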

C MODEL ROBUSTNESS

C.1 LIPSCHITZ VULNERABLE AND FLOAT NEURON VULNERABLE

In Section 4.2 we discuss two kinds of vulnerable data and our motivation for SC-RFP. In this section, we present formal statements of the proposed idea. Before we move to Theorem 2, we need the following two lemmas. Lemma 4 suggests that, given a fixed path $\zeta$, the expectation of the randomized algorithm $g(\zeta(x))$ is equal to $\zeta(x)$ when $g$ satisfies certain conditions. In fact, the condition only assumes that the randomly sampled noise has equal probability in each direction, which is a natural assumption for most sampling procedures.

Lemma 4. Denote $N$ as a neural network as in Definition 3 with mapping function $f$, and let $\zeta$ be a fixed path in $B_2(x, p)$. Assume $g$ is a randomized algorithm that applies noise $\epsilon$ from a certain distribution $D_{noise}$ on the computation path of $\zeta$:
$$g(\zeta(x)) = \zeta_{(d,i)}(\zeta_{(i,0)}(x) + \epsilon), \quad i \in \{0, 1, \ldots, d-1\}, \quad \epsilon \sim D_{noise} \tag{24}$$
If the direction of $\epsilon$ is uniformly distributed, then the expectation of $g(\zeta(x))$ is equal to $\zeta(x)$, that is:
$$\forall \|\eta_1\| = \|\eta_2\| = 1,\ P\!\left(\frac{\epsilon}{\|\epsilon\|} = \eta_1\right) = P\!\left(\frac{\epsilon}{\|\epsilon\|} = \eta_2\right) \Rightarrow \mathbb{E}[g(\zeta(x))] = \zeta(x),$$
where $\epsilon$ is the noise generated for the randomized algorithm $g$.

Proof. Assume that $g$ samples noise at layer $k$. Then for any $g$, $\zeta_{(k,0)} = g(\zeta_{(k,0)})$, which we denote as $x_k$. Since the activation function is piecewise linear, the mapping $\zeta_{(d,k)}(\cdot)$ is linear. We have:
$$\sum_{i}^{n_k} \frac{\partial^2 y_j}{\partial z_{kj}^2} = 0,$$
so $\zeta_{(d,k)}$ is harmonic. Therefore, given any $B(x, r)$ with radius $r > 0$:
$$\zeta_{(d,k)}(x) = \frac{1}{n \omega_n r^{n-1}} \int_{\partial B(x,r)} \zeta \, d\sigma,$$
where $\omega_n$ is the volume of the unit ball in $n$ dimensions and $\sigma$ is the $(n-1)$-dimensional surface measure. Now we consider the expectation of $g(x)$:
$$\mathbb{E}(g(x)) = \int p(\epsilon)\, \zeta_{(d,k)}(x_k + \epsilon)\, d\mu = \int\!\!\int_{\|\epsilon\| = r} p(\epsilon)\, \zeta_{(d,k)}(x_k + \epsilon)\, d\mu\, dr,$$
where $\mu$ is the probability measure of $\epsilon \sim D_{noise}$. As the noise sampling is assumed to be independent of direction, $p(\epsilon \mid \|\epsilon\|_2 = r)$ is uniformly distributed given the radius $r$. Therefore:
$$\int_{\|\epsilon\| = r} p(\epsilon)\, \zeta_{(d,k)}\, d\mu = P(\|\epsilon\|_2 = r)\, \zeta_{(d,k)}.$$
Equation 28 then equals:
$$\mathbb{E}(g(x)) = \int p(\epsilon)\, \zeta_{(d,k)}(x_k + \epsilon)\, d\mu = \int P(\|\epsilon\|_2 = r)\, \zeta_{(d,k)}\, dr = \zeta_{(d,k)}(x).$$

Lemma 5. Let $f$ and $g$ be the mapping function and randomized classifier defined as above. Given a radius $r$ such that, almost surely, $Z^T(B(x, r)) = \emptyset$, $\forall (x, y) \sim D$, the accuracy of the base classifier and that of the naive smoothed classifier are the same, that is:
$$\mathbb{E}_{(x,y) \sim D}\Big[\arg\max_{m \in Y} g_m(x) = y\Big] = \mathbb{E}_{(x,y) \sim D}\Big[\arg\max_{m \in Y} f_m(x) = y\Big].$$

Lemma 5 follows by directly applying Lemma 4 to the whole computational graph of the network. It shows that if all the neurons have a locally stable activation pattern with respect to the distribution of the dataset, the smoothed classifier provides the same accuracy as the base classifier. Next, we present the proof of Theorem 2.

Theorem 2. Let $f$ be the base classifier. Given $(x, y) \sim D$ with $f(x) = y$, denote $M(f(x), y) := \min_{y' \neq y} |f(x)_y - f(x)_{y'}|$ as the margin operator of the prediction vector. If
$$\|J(x)\| > \frac{M(f(x), y) + \mathbb{E}[M(Z^T(x, A; R), y)]}{r},$$
then for any smoothed classifier $g$ defined as above, there exists $x' \in B(x, r)$ such that $g(x') \neq y$.

Proof. Statement 2 of Theorem 1 suggests:
$$f(x) - f(x') = J(x)(x - x') + Z^T(x', A; R) - Z^T(x', A'; R) \tag{31}$$
As $\|J(x)\| > \frac{M(f(x), y) + M(\mathbb{E}[Z^T(x, A; R)], y)}{r}$, there exist $x'$ and $i$ such that:
$$f_i(x') > f_y(x') + \mathbb{E}[Z^T_y(x, A; R) - Z^T_i(x, A; R)].$$
Therefore,
$$g_i(x) - g_i(x') > g_y(x) - g_y(x') + M(f(x), y), \quad g_i(x') > g_y(x').$$

Next, we consider the case where $x$ is an adversarial example that misleads $f$ but is correctly classified by $g$, which we refer to as a float neuron vulnerable example. The following theorem suggests that the float paths at $x$ are the cause of the altered prediction. In other words, the network is locally correct around $x$, while a sudden change in the float paths causes the misclassification.

Theorem 4. Let $f$ and $g$ be the base and smoothed classifiers defined above.
Given $(x, y) \sim D$, denote $x' \in B(x, r)$ as an adversarial example that misleads the base classifier but is corrected by $g$ without abstaining:
$$\arg\max_{m \in Y} f_m(x) = i, \quad \arg\max_{m \in Y} g_m(x) = y, \quad i \neq y \tag{34}$$
Then the loss of the float path is higher than that of $g$:
$$CE(Z^T(x', A; R), \mathrm{onehot}(y)) > CE(g(Z^T(x', A; R)), \mathrm{onehot}(y)),$$
where $CE(\cdot, \cdot)$ is the cross-entropy loss and $\mathrm{onehot}(y)$ is the one-hot embedding of label $y$.

Proof (Theorem 4). Since $\arg\max_{m \in Y} f_m(x) \neq y$ and $\arg\max_{m \in Y} g_m(x) = y$, there exists $i \in \{1, \ldots, c\}$ such that:
$$g_y(x) > f_y(x), \quad g_i(x) < f_i(x) \tag{36}$$
Then we have:
$$g_y(x) - g_i(x) \geq f_y(x) - f_i(x).$$
Introducing Statement 4 of Lemma 1 and Lemma 5:
$$g_i(Z^T(x, A; R)) - g_y(Z^T(x, A; R)) < Z^T_i(x, A; R) - Z^T_y(x, A; R).$$
Moreover, since the prediction is not abstained:
$$\mathbb{E}[g_y(x)] - f_y(x) > \mathbb{E}[g_j(x)] > f_j(x), \quad \forall j \neq i, y.$$
This directly leads to the result.

Theorem 2 and Theorem 4 show that a smoothed classifier cannot boost the performance of the fixed paths; instead, it reduces the sudden change of the float paths in a region. Specifically, if $x'$ can be corrected by $g$, its misclassification results from the sudden change caused by the float paths. A higher confidence score of $g(x')$ can be achieved by reducing the weight of the float paths along the computation path. On the other hand, if $x'$ is Lipschitz vulnerable, it cannot be fixed regardless of the form of the smoothed classifier. This leads us to the theoretical basis of the SC-RFP algorithm.
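The contrast between fixed and float paths under smoothing can be illustrated with a small Monte Carlo experiment. This is a sketch under stated assumptions rather than the paper's procedure: the fixed path is modeled as a linear map, the float path as the same map followed by a ReLU whose breakpoint lies close to $x$, and the noise is isotropic Gaussian (so its direction is uniformly distributed, as Lemma 4 requires). The weights, the point `x` and the noise scale are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([1.5, -2.0])   # hypothetical path weights
x = np.array([0.2, 0.1])    # point close to the ReLU breakpoint w @ x = 0

def fixed_path(v):
    """A fixed path is linear in its input."""
    return v @ w

def float_path(v):
    """A float path changes slope where the pre-activation crosses zero."""
    return np.maximum(v @ w, 0.0)

# Isotropic Gaussian noise: every direction is equally likely.
eps = 0.25 * rng.normal(size=(200_000, 2))

smoothed_fixed = fixed_path(x + eps).mean()
smoothed_float = float_path(x + eps).mean()

print("fixed path:", fixed_path(x), "smoothed:", smoothed_fixed)
print("float path:", float_path(x), "smoothed:", smoothed_float)
```

In this toy setting the smoothed value of the linear path stays at its unsmoothed value up to sampling error, while the smoothed value of the path with a breakpoint moves away from it, which is the behavior that Lemma 4 and Theorem 4 describe.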

C.2 VERIFIABLE RADIUS

At the end of Section 4.2, we propose Theorem 3 to describe the upper and lower bounds of SC-RFP as well as the certified radius. We present the proof of Theorem 3 below.

Theorem 3. Let $N$ be a network defined as in Section 3.1. Let $g$ be a smoothed classifier that samples noise from the distribution $D_{noise}$, and let $g'$ be the SC-RFP built on $g$. Assume that the direction of $\epsilon$ is uniformly distributed:
$$\forall \|\eta_1\| = \|\eta_2\| = 1,\ P\!\left(\frac{\epsilon}{\|\epsilon\|} = \eta_1\right) = P\!\left(\frac{\epsilon}{\|\epsilon\|} = \eta_2\right), \quad \epsilon \sim D.$$
If $\arg\max_{m \in Y} f_m(x) = y$, then $p'_A > p_A$ and $p'_B < p_B$, where $p'_A$, $p_A$ are the lower bounds of $g'_y(x)$ and $g_y(x)$, and $p'_B$, $p_B$ are the upper bounds of $g'_{m \neq y}(x)$ and $g_{m \neq y}(x)$. Moreover, $\arg\max_{m \in Y} g'(x) = y$ for all $\|\epsilon\| \leq R$, where
$$R = \frac{\sigma}{2}\left(\Phi^{-1}(p_A) - \Phi^{-1}(p_B)\right).$$

Proof. The proof of Theorem 3 can be divided into three steps. First, we show that, given $\epsilon \sim D_{noise}$, the expectation of the fixed paths satisfies
$$\mathbb{E}(Z^I(x + \epsilon)) = Z^I(x). \tag{44}$$



Appendix A provides additional discussion regarding the activation pattern and geometric intuition.
Appendix B explains the computational graph of the network in more detail.
$f(x) = Z^I(x, A; R) + Z^T(x, A; R)$
Code is provided at: https://github.com/OrangeBai/APCT-master



; Zhang et al. (2019); Huang et al. (2021)). As verifiable training methods often come with a compromise of performance, recent works focus on bridging the gap between adversarial and verifiable training to address the scalability and accuracy issue (Xiao et al. (2018); Balunović & Vechev (2020); De Palma et al. (2022)).

Figure 1: An illustration of the fixed (float) path and neuron of a neural network with 2D input, 4D output, and 1 hidden layer. (1) The 2D input space; (2) A sphere centered at x; (3) A float neuron with index (1,4) in the network.

Figure 2: Proportion of fixed neurons between x and x + ϵ for VGG16 models trained on CIFAR10 with noised data at different scales: clean data, σ = 0.05, σ = 0.10 and σ = 0.25. Figure 2(a) and 2(b) show the ratio given ϵ ∼ N (0, 0.1) and ϵ ∼ N (0, 0.25).

Figure 3: Certified accuracy of base methods and SC-RFP (η = 0.1) with different levels of noise on CIFAR10. The solid and dashed lines represent the benchmark and SC-RFP, respectively.

Figure 4: Certified accuracy of base methods and SC-RFP on ImageNet. Models are trained and tested with (a) σ = 0.25, (b) σ = 0.5 and (c) σ = 1.0. The solid and dashed lines represent the benchmark and SC-RFP with different repression factors.

Figure 5: An illustration of the fixed (float) path and neuron of a neural network with 2D input, 4D output, and 1 hidden layer. (1) The 2D input space; (2) A sphere centered at x; (3) A float neuron with index (1,4) in the network; (4) A bent hyperplane defined by $H = \{x \mid z_{1,4}(x) = 0\}$.

Since $\zeta$ is a fixed path in $R$, every neuron along $\zeta$ is a fixed neuron, and the activation pattern of $(m, \zeta_m)$ is the same regardless of the input $x$. Therefore, $d_{A_{m,\zeta_m}}$ is constant for any $x \in R$, and $\zeta(x, A)$ is a linear function of $x$: $\zeta(x, A) = \zeta(x)$. In other words, a change of activation pattern does not affect the neurons on $\zeta$, so the slope of this path does not change, and $\zeta(x)$ depends only on $x$ in region $R$. This means that $\zeta(x', A) = \zeta(x', A')$ for any $x, x' \in R$ with activation patterns $A, A'$.

Figure 6: The value of the first element of the prediction vector from a VGG16 model trained on CIFAR10, given a 2D slice centered at a random data point from the test set. (a) The wireframe represents the prediction $f(x)$, while the surface is the sum of fixed paths $Z^I(x, A; R)$. (b) The sum of float paths $Z^T(x, A; R) = f(x) - Z^I(x, A; R)$. (c) The wireframe is the prediction of SC-RFP, $Z^I(x, A; R) + \eta Z^T(x, A; R)$ with $\eta < 1$, and the surface is the same sum of fixed paths as in (a).

Figure 6 illustrates SC-RFP by showing the prediction, the fixed path, and the float path. We first present the prediction and the fixed path value in Figure 6(a). The float path value is then computed as $Z^T(x, A; R) = f(x) - Z^I(x, A; R)$ in Figure 6(b). Finally, by repressing the float value, we obtain a locally stable prediction from SC-RFP.
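The construction described for Figure 6 can be reproduced on a toy model. The sketch below is only an illustration under stated assumptions: it uses a small random one-hidden-layer ReLU network instead of VGG16, a 1D slice instead of a 2D one, and it labels a neuron as fixed when its activation pattern is constant over the sampled slice. The repressed prediction follows the formula given in the Figure 6 caption, $Z^I + \eta Z^T$; how the repression is realized inside the actual network is not covered here.

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(6, 2)), rng.normal(size=6)
w2 = rng.normal(size=6)                       # weights of one output component

# A 1D slice through the input space around a center point.
center = np.zeros(2)
direction = np.array([1.0, 0.0])
ts = np.linspace(-0.5, 0.5, 101)
points = center + ts[:, None] * direction     # shape (101, 2)

pre = points @ W1.T + b1                      # pre-activations, shape (101, 6)
active = pre > 0
fixed = (active == active[0]).all(axis=0)     # pattern constant along the slice

contrib = w2 * np.maximum(pre, 0.0)           # per-neuron path sums
f = contrib.sum(axis=1)                       # prediction f(x)
z_fixed = contrib[:, fixed].sum(axis=1)       # Z^I: sum of fixed paths
z_float = f - z_fixed                         # Z^T = f - Z^I

eta = 0.1                                     # repression factor (assumed value)
repressed = z_fixed + eta * z_float           # Z^I + eta * Z^T

print("float neurons on the slice:", int((~fixed).sum()))
print("max |f - Z^I|        :", np.abs(f - z_fixed).max())
print("max |repressed - Z^I|:", np.abs(repressed - z_fixed).max())
```

Because the fixed part is linear along the slice, any kink in the prediction comes from the float part, and scaling that part by $\eta < 1$ shrinks the deviation from the linear surface by the same factor.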

Consider a fixed path $\zeta$ on a region $R$. Then, from Lemma 4,
$$\mathbb{E}[g(\zeta(x))] = \zeta(x). \tag{43}$$
Notice that, since SC-RFP does not affect the sampling of noise, the above equation holds for both $g$ and $g'$: $\mathbb{E}[g'(\zeta(x))] = \zeta(x)$.

Experiment results for VGG16 trained on CIFAR10 with noise at different scales.

The above theorem suggests that randomized smoothing fails to correct misclassified data x if x has an extremely large Lipschitz constant. Empirically, randomized smoothing cannot provide a certified radius for models trained with clean data, while it achieves minor accuracy on models trained with slightly perturbed data. The second set in Table 1 suggests that training models with noised samples can compress the Lipschitz constant and therefore enables the effect of the smoothed classifier.

Float Neuron Vulnerable. On the other hand, the float part contributes to $f(x') - f(x)$ differently. The last set in Table 1 presents the accuracy on FGSM (Goodfellow et al. (2014)) examples $x_{adv}$ with ϵ = 8/255 under the $l_\infty$ norm and the accuracy of the sum of fixed paths $Z^I(x_{adv})$ between $\{x', x\}$.



compares the certified robust accuracy of different models at different radius on ImageNet. It shows that when deployed on the PGD+noise training (Salman et al. (

Certified robust accuracy for models with different methods on ImageNet

APPENDIX

The Appendix contains four sections. Section A further discusses the geometric intuition following Section 3.3. Section B provides proofs of the decomposition of the computational graph following Section 4. Section C further investigates the properties of the proposed SC-RFP algorithm. Finally, Section D provides supplementary experiment results for Section 5.

A GEOMETRIC ILLUSTRATION

This section provides a geometric illustration of the proposed framework. For completeness of the definition of the activation region, we discuss the case where the pre-activation value $z_{ij}(x)$ lies within none of the intervals. In other words, we consider the case where $z_{ij}(x; \theta) \in \{\gamma_1, \gamma_2, \ldots, \gamma_q\}$.

Definition 1 (Generalized Activation Pattern / Region (Restate)). Let $N$ be a network defined as in Section 3.1. Denote $\Gamma = \{\gamma_1, \gamma_2, \ldots, \gamma_q\}$ as a set of breakpoints that separates the domain of the activation function into $q + 1$ intervals.
An activation pattern of $N$ is defined as an indexed family that assigns each neuron a sign to represent the status of its pre-activation in $U$:
Given an activation pattern $A$, the activation region of $A$ is defined as:
Conversely, given $x \in \mathbb{R}^{in}$, we denote the activation pattern of neuron $(i, j)$ at $x$ as:
The activation pattern of $x$ is the collection of the patterns of all neurons:

Inverse of activation region operator. The generalized activation pattern describes the activation status of each unit at the intermediate layers. Given an activation pattern, $R(\cdot)$ finds the region such that, for every $x \in R(A; \theta, \sigma, \Gamma)$, the pre-activation value $z_{ij}(x; \theta)$ of each neuron lies within the interval $U_{a_{ij}}$ determined by its pattern $a_{ij} \in \{0, 1, \ldots, q\}$. However, the activation pattern of $x$ when $z_{ij}(x) \in \{\gamma_1, \gamma_2, \ldots, \gamma_q\}$ cannot be determined under the previous definition or in previous works (Hanin & Rolnick (2019b); Raghu et al. (2017)). Therefore, we introduce an inverse operator $\hat{A}(x; \theta, \sigma, \Gamma)$ in Definition 1 that computes the activation pattern of an input $x \in \mathbb{R}^{n_0}$. This completes the framework.

Next, we consider the set $\{x \mid z_{ij}(x) \in \{\gamma_1, \gamma_2, \ldots, \gamma_q\}\}$.
Under the trivial assumption that the probability distribution of the parameter set $\{\theta\}$ has no atom:

By applying randomized smoothing to statement 1 of Lemma 1, we have:
where we use $g'(Z^T(x, A; R))$ to denote applying smoothing algorithm $g$ on deterministic function $Z^I(x, A; R)$. Since $Z^I(x, A; R)$ is the aggregation of all the fixed paths, with Equations 43 and 44, we have:
This means that the difference between $\mathbb{E}(g(x))$ and $\mathbb{E}(g'(x))$ is the same as that between $g(Z^T(x, A; R))$ and $g'(Z^T(x, A; R))$.

Next, we discuss how the float paths affect the predictions of $g(Z^T(x, A; R))$ and $g'(Z^T(x, A; R))$. Given $\epsilon \sim D_{noise}$, we denote $P_1$ as the event that $f_y(x + \epsilon) > f_y(x)$. Since we have excluded the fixed paths, $P_1$ means that the float path boosts the probability of $f_y(x + \epsilon)$. However, as $\arg\max_{m \in Y} f_m(x) = y$, repressing the float path does not alter the prediction of $f_y(x')$. On the other hand, we denote $P_2$ as the event that $f_y(x + \epsilon) < f_y(x)$. This means that the float path $Z^T(x + \epsilon, A; R) < 0$ contributes negatively to the prediction, so repressing the float path can increase the probability of $y$. In other words, for any $x + \epsilon$, repressing the float path between $x$ and $x'$ can increase the predicted $f_y(x + \epsilon)$. We assume that $P_1$ and $P_2$ happen with probabilities $p_1$ and $p_2$; then, given a certain number of samples,
By applying the Chebyshev inequality, we have $p'_A > p_A$. Similarly, we also have $p'_B < p_B$. Therefore, we conclude that the lower bound $p'_A$ is larger than $p_A$, and the analogous result holds for $p'_B$.

Computing the certified radius of $x$ is then the same as in Cohen et al. (2019). Since both $g$ and $g'$ are random functions that sample noise from the same distribution, the Neyman-Pearson theorem holds for both $g$ and $g'$. Therefore, the certified radius remains unchanged.
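For reference, the radius expression from Theorem 3, $R = \frac{\sigma}{2}(\Phi^{-1}(p_A) - \Phi^{-1}(p_B))$, can be evaluated with the Python standard library alone. The helper below is a minimal sketch: the function name, the choice to return 0.0 when $p_A \leq p_B$, and the probability values are assumptions made for illustration and are not results from the paper.

```python
from statistics import NormalDist

def certified_radius(sigma: float, p_a: float, p_b: float) -> float:
    """Evaluate R = sigma / 2 * (Phi^{-1}(p_a) - Phi^{-1}(p_b)).

    p_a is a lower bound on the top-class probability and p_b an upper bound
    on the runner-up probability; both must lie strictly between 0 and 1.
    """
    if p_a <= p_b:
        return 0.0  # assumption of this sketch: no radius can be certified
    inv_cdf = NormalDist().inv_cdf  # inverse standard normal CDF, Phi^{-1}
    return 0.5 * sigma * (inv_cdf(p_a) - inv_cdf(p_b))

# Hypothetical bounds for a smoothed classifier g and for an SC-RFP g'.
radius_g = certified_radius(sigma=0.25, p_a=0.90, p_b=0.10)
radius_g_prime = certified_radius(sigma=0.25, p_a=0.95, p_b=0.05)
print(radius_g, radius_g_prime)
```

Since $\Phi^{-1}$ is increasing, any pair of bounds with a larger $p_A$ and a smaller $p_B$ gives a larger value of the same expression, which is how the bounds $p'_A > p_A$ and $p'_B < p_B$ translate into a larger radius.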

D EXTRA TABLES AND FIGURES

This section presents the complete experiment results. Table 4 shows the certified accuracy at different $l_2$ radius levels. Table 5 presents the experiment results of our model certified with different repression rates. Among the different repression rates, we find that with η = 0.25 SC-RFP provides the best improvement in robust accuracy for large perturbations, while the standard accuracy and are slightly damaged. This is also observed in previous works.

