ROBUST UNIVERSAL ADVERSARIAL PERTURBATIONS

Abstract

Universal Adversarial Perturbations (UAPs) are imperceptible, image-agnostic vectors that cause deep neural networks (DNNs) to misclassify inputs from a data distribution with high probability. In practical attack scenarios, adversarial perturbations may undergo transformations such as changes in pixel intensity, rotation, etc. while being added to DNN inputs. Existing methods do not create UAPs robust to these real-world transformations, thereby limiting their applicability in attack scenarios. In this work, we introduce and formulate robust UAPs. We build an iterative algorithm using probabilistic robustness bounds and transformations generated by composing arbitrary sub-differentiable transformation functions to construct such robust UAPs. We perform an extensive evaluation on the popular CIFAR-10 and ILSVRC 2012 datasets, measuring our UAPs' robustness under a wide range of common, real-world transformations such as rotation, contrast changes, etc. Our results show that our method can generate UAPs up to 23% more robust than existing state-of-the-art baselines.

1. INTRODUCTION

Deep neural networks (DNNs) have achieved impressive results in many application domains such as natural language processing (Abdel-Hamid et al., 2014; Brown et al., 2020), medicine (Esteva et al., 2017; 2019), and computer vision (Simonyan & Zisserman, 2014; Szegedy et al., 2016). Despite their performance, they can be fragile in the face of adversarial perturbations: small imperceptible changes added to a correctly classified input that make a DNN misclassify. While there is a large amount of work on generating adversarial perturbations (Szegedy et al., 2013; Goodfellow et al., 2014; Moosavi-Dezfooli et al., 2016; Madry et al., 2017; Carlini & Wagner, 2017; Xiao et al., 2018a; Dong et al., 2018; Croce & Hein, 2019; Wang et al., 2019; Zheng et al., 2019; Andriushchenko et al., 2019; Tramèr et al., 2020), the threat model considered by these works cannot be realized in practical scenarios. This is because the threat model depends upon unrealistic assumptions about the power of the attacker: the attacker knows the DNN input in advance, generates input-specific perturbations in real time, and exactly combines the perturbation with the input before it is processed by the DNN.

Practically feasible adversarial perturbations. In this work, we consider a more practical adversary to reveal real-world vulnerabilities of state-of-the-art DNNs. We assume that the attacker (i) does not know the DNN inputs in advance, (ii) can only transmit additive adversarial perturbations, and (iii) transmits perturbations that are susceptible to modification due to real-world effects. Examples of attacks in our threat model include adding stickers to cameras to fool image classifiers (Li et al., 2019b) or transmitting perturbations over the air to deceive audio classifiers (Li et al., 2019a). Note that this threat model is distinct from directly generating adversarial examples (Athalye et al., 2018), which requires access to the original input.
The first two requirements in our threat model can be fulfilled by generating Universal Adversarial Perturbations (UAPs) (Moosavi-Dezfooli et al., 2017). Here the attacker can train a single adversarial perturbation that has a high probability of being adversarial on all inputs in the training distribution. However, as our experimental results show, the generated UAPs need to be combined with the DNN inputs precisely; otherwise they fail to remain adversarial. In practice, changes to UAPs are likely due to real-world effects. For example, the stickers applied to a camera can undergo changes in contrast due to weather conditions, or the transmitted perturbation in audio can change due to noise in the transmission channel. This non-robustness reduces the efficiency of practical attacks created with existing methods (Moosavi-Dezfooli et al., 2017; Shafahi et al., 2020; Li et al., 2019b; a).

This work: Robust UAPs. To overcome the above limitation, we propose the concept of robust UAPs: perturbations that have a high probability of remaining adversarial on inputs in the training distribution even after applying a set of real-world transformations. The main challenge in generating robust UAPs is the optimization problem: unlike for standard UAPs (Moosavi-Dezfooli et al., 2017), we are looking for perturbations that are adversarial both across a set of inputs and under transformations applied to the perturbations. To address this challenge, we make the following main contributions:

• We introduce Robust UAPs and formulate their generation as an optimization problem.

• We design a new method for constructing robust UAPs. Our method is general and works for any transformations generated by composing arbitrary sub-differentiable transformation functions. We provide an algorithm for computing provable probabilistic bounds on the robustness of our UAPs against many practical transformations.

• We perform an extensive evaluation of the effectiveness of our method, RobustUAP, on state-of-the-art models for the popular CIFAR-10 (Krizhevsky et al., 2009) and ILSVRC 2012 (Deng et al., 2009) datasets. We compare the robustness of our UAPs under compositions of challenging real-world transformations, such as rotation, contrast change, etc. We show that on both datasets, the UAPs generated by RobustUAP are significantly more robust, achieving up to 23% more robustness, than the UAPs generated from the baselines.

Our work is complementary to the development of real-world attacks (Li et al., 2019a; b) in various domains, which require modeling how the universal perturbations change during transmission. RobustUAP can improve the efficiency of such attacks by constructing perturbations that are more robust against domain-specific, real-world transformations than possible with existing algorithms (Moosavi-Dezfooli et al., 2017; Shafahi et al., 2020; Li et al., 2019a; b).

2. BACKGROUND

In this section, we provide the necessary background definitions and notation for our work.

Adversarial Examples and Perturbations. An adversarial example is a misclassified data point that is close (in some norm) to a correctly classified data point (Goodfellow et al., 2014; Madry et al., 2017; Carlini & Wagner, 2017). Let $\mu \subset \mathbb{R}^d$ be the input data distribution, $x \in \mu$ be an input point with the corresponding true label $y \in \mathbb{R}$, and $f : \mathbb{R}^d \to \mathbb{R}^{d'}$ be our target classifier. For ease of notation, we define $f_k(x)$ to be the $k$-th element of $f(x)$ and allow $f(x) = \arg\max_k f_k(x)$ to directly refer to the classification label. We use $v$ to reference image-specific perturbations and $u$ to reference universal adversarial perturbations; $v_r$ and $u_r$ refer to the robust variants and will be defined in Sec. 3. We now formally define an adversarial example.

Definition 2.1. Given a correctly classified point $x$, a distance function $d(\cdot, \cdot) : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$, and bound $\epsilon \in \mathbb{R}$, $x'$ is an adversarial example iff $d(x', x) < \epsilon$ and $f(x') \neq y$.

In this paper, we consider examples $x'$ generated as $x' = x + v$, where $v$ is an adversarial perturbation.

Universal Adversarial Perturbations. UAPs are single-vector, input-agnostic perturbations (Moosavi-Dezfooli et al., 2017). They differ from traditional adversarial attacks, which create perturbations dependent on each input sample. To measure UAP performance, we introduce the notion of universal adversarial success rate, which measures the probability that a perturbation $u$, when added to $x$ sampled from $\mu$, causes a change in classification under $f$.

Definition 2.2. Given a data distribution $\mu$ and perturbation $u$, the universal adversarial success rate $\mathrm{ASR}_U$ for $u$ is defined as

$$\mathrm{ASR}_U(f, \mu, u) = P_{x \sim \mu}\big(f(x + u) \neq f(x)\big) \tag{1}$$

Using Definition 2.2, we formally define a UAP.

Definition 2.3. A universal adversarial perturbation is a vector $u \in \mathbb{R}^d$ which, when added to almost all datapoints in $\mu$, causes the classifier $f$ to misclassify. Formally, given $\gamma$, a bound on universal ASR, and an $l_p$-norm with corresponding bound $\epsilon$, $u$ is a UAP iff $\mathrm{ASR}_U(f, \mu, u) > \gamma$ and $\|u\|_p < \epsilon$.

In general, if the additive perturbations have small $l_p$-norm, then they look like noise and do not affect the semantic content of the image. For ease of notation in later parts of the paper, we can also pose the construction of UAPs as an expectation minimization problem:

$$\arg\min_{u} \; \mathbb{E}_{x \sim \mu}\big[\delta(f(x + u), f(x))\big] \quad \text{s.t. } \|u\|_p < \epsilon \tag{2}$$

where $\delta$ is the Kronecker Delta function (Agarwal, 2013).
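As a minimal sketch of Definitions 2.2 and 2.3, the universal adversarial success rate can be estimated empirically on a finite sample of inputs. The two-class linear `classify` function, the toy inputs, and the helper names `universal_asr` and `is_uap` below are illustrative assumptions, not the paper's implementation.

```python
import random

def classify(x, weights):
    """Toy linear classifier (stand-in for f): index of the largest class score."""
    scores = [sum(w_k * x_k for w_k, x_k in zip(w, x)) for w in weights]
    return max(range(len(scores)), key=scores.__getitem__)

def l2_norm(u):
    """l_2 norm of a perturbation given as a list of floats."""
    return sum(c * c for c in u) ** 0.5

def universal_asr(weights, inputs, u):
    """Empirical ASR_U: fraction of inputs whose predicted label changes under x + u."""
    flipped = 0
    for x in inputs:
        x_adv = [a + b for a, b in zip(x, u)]
        if classify(x_adv, weights) != classify(x, weights):
            flipped += 1
    return flipped / len(inputs)

def is_uap(weights, inputs, u, gamma, eps):
    """Definition 2.3 on a finite sample: ASR_U > gamma and ||u||_2 < eps."""
    return universal_asr(weights, inputs, u) > gamma and l2_norm(u) < eps

rng = random.Random(0)
weights = [[1.0, 0.0], [0.0, 1.0]]  # class 0 wins whenever x[0] > x[1]
# Every toy input is classified as class 0 (first coordinate dominates).
inputs = [[rng.uniform(0.6, 1.0), rng.uniform(0.0, 0.5)] for _ in range(100)]
u = [-1.0, 1.0]  # a single perturbation shared by all inputs

print(universal_asr(weights, inputs, u))
print(is_uap(weights, inputs, u, gamma=0.8, eps=2.0))
```

The same perturbation fails the UAP test if the norm bound is tightened below its own norm, which is the role of the $\|u\|_p < \epsilon$ condition.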

3. ROBUST UNIVERSAL ADVERSARIAL PERTURBATIONS

In this section, we define the optimization problem for generating robust UAPs. We first define transformation sets and neighborhoods.

Definition 3.1. A transformation, $\tau$, is a composition of bijective sub-differentiable transformation functions. A transformation set, $T$, is a set of distinct transformations. A point $v'$ is in the neighborhood $N_T(v)$ of $v$ if there is a transformation in $T$ that maps $v$ to $v'$. Formally,

$$v' \in N_T(v) \iff \exists \tau \in T \text{ s.t. } \tau(v) = v' \tag{3}$$

Example 3.2. Let $T$ be the set of all transformations represented by a rotation of $\pm 30^\circ$, scaling of up to a factor of 2, and a translation of up to $\pm 2$ pixels. In this case, one $\tau \in T$ could be {rotation of $8^\circ$, scaling by a factor of 1.2, and translation of $-1.3$}, applied in that order, and $N_T(v)$ would include any point that can be obtained by applying one of the transformations from $T$ to $v$.

In order to define robust UAPs, we introduce the robust universal adversarial success rate.

Definition 3.3. Given a data distribution $\mu$, transformation set $T$, universal ASR level $\gamma$, bound $\epsilon$ on the $l_p$-norm, and perturbation $u_r$, the robust universal adversarial success rate, $\mathrm{ASR}_R$, is defined as

$$\mathrm{ASR}_R(f, \mu, T, \gamma, u_r) = P_{u'_r \sim N_T(u_r)}\big(\mathrm{ASR}_U(f, \mu, u'_r) > \gamma \,\wedge\, \|u'_r\|_p < \epsilon\big) \tag{4}$$

The robust universal adversarial success rate measures the probability that a neighbor of $u_r$ is also a UAP on $\mu$, i.e. that after transformation it maintains a high universal ASR. We note that even though $\|u_r\|_p \leq \epsilon$, it can happen that some $u'_r \in N_T(u_r)$ has $\|u'_r\|_p > \epsilon$; this is particularly true for the semantic transformations considered in this work. Therefore, we require that the norm of $u'_r$ is small. Using Definition 3.3, we can now formally define a robust UAP.

Definition 3.4. A robust universal adversarial perturbation, $u_r$, is one for which most points within a neighborhood of $u_r$, when added to most points in $\mu$, fool the classifier $f$. Formally, $u_r$ satisfies $\|u_r\|_p < \epsilon$ and $\mathrm{ASR}_R(f, \mu, T, \gamma, u_r) > \zeta$.

In order to construct robust UAPs, we can pose the following expectation minimization problem:

$$\arg\min_{u_r} \; \mathbb{E}_{u'_r \in N_T(u_r)}\Big[ I(\|u'_r\| < \epsilon) \times \mathbb{E}_{x \sim \mu}\big[\delta(f(x + u'_r), f(x))\big] \Big] \quad \text{s.t. } \|u_r\|_p < \epsilon \tag{5}$$

Here $I : \mathbb{R}^d \to \mathbb{R}$ denotes an indicator function. The inner expectation represents the UAP condition for the transformed perturbation $u'_r$, while the outer expectation represents the neighborhood robustness condition. Solving Equation 5 requires computing the $u_r$ that minimizes the expectation over both the transformation set and the data distribution. This composition makes it computationally harder than minimizing over only the transformation set, as in EOT (Athalye et al., 2018), or than minimizing over only the data distribution, as done for standard UAPs (Moosavi-Dezfooli et al., 2017).
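The robust universal adversarial success rate of Definition 3.3 can be approximated by sampling transformed copies of a perturbation and checking both conditions (universal ASR above $\gamma$ and norm below $\epsilon$) for each copy. The sketch below is only an illustration under stated assumptions: the 2-D toy classifier, the single gain-style transformation $\tau(u) = \alpha u$, and all function names are hypothetical.

```python
import random

def classify(x):
    """Toy two-class classifier on 2-D points (stand-in for f)."""
    return 0 if x[0] > x[1] else 1

def universal_asr(inputs, u):
    """Empirical ASR_U of perturbation u on a finite input set."""
    changed = sum(classify([x[0] + u[0], x[1] + u[1]]) != classify(x) for x in inputs)
    return changed / len(inputs)

def l2_norm(u):
    return (u[0] ** 2 + u[1] ** 2) ** 0.5

def sample_transform(rng, max_gain_change):
    """Draw tau(u) = alpha * u with alpha uniform in [1 - c, 1 + c] (assumed transformation set)."""
    alpha = 1.0 + rng.uniform(-max_gain_change, max_gain_change)
    return lambda u: [alpha * c for c in u]

def robust_asr(inputs, u_r, gamma, eps, max_gain_change, n_samples, rng):
    """Fraction of sampled neighbors u' of u_r with ASR_U(u') > gamma and ||u'||_2 < eps."""
    hits = 0
    for _ in range(n_samples):
        u_t = sample_transform(rng, max_gain_change)(u_r)
        if universal_asr(inputs, u_t) > gamma and l2_norm(u_t) < eps:
            hits += 1
    return hits / n_samples

rng = random.Random(0)
inputs = [[rng.uniform(0.6, 1.0), rng.uniform(0.0, 0.5)] for _ in range(100)]
u_r = [-1.0, 1.0]

# Loose norm bound: every transformed copy stays a UAP.
loose = robust_asr(inputs, u_r, gamma=0.8, eps=3.0, max_gain_change=0.5, n_samples=500, rng=rng)
# Tight norm bound: many transformed copies exceed eps, so the robust ASR drops.
tight = robust_asr(inputs, u_r, gamma=0.8, eps=1.0, max_gain_change=0.5, n_samples=500, rng=rng)
print(loose, tight)
```

The second call illustrates the remark after Definition 3.3: a transformation can push the norm of a neighbor above $\epsilon$ even when it still fools the classifier.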

4. GENERATING ROBUST UNIVERSAL ADVERSARIAL PERTURBATIONS

In this section, we discuss our approach for optimizing Equation 5 to generate UAPs robust to transformations generated by a composition of arbitrary sub-differentiable transformation functions. At a high level, the objective can be seen as gluing the outer expectation, an EOT objective over the transformations applied to the perturbation, to the inner expectation, a UAP objective over the input data distribution. We first describe intuitive baselines for optimizing Equation 5 and then present our new algorithm, RobustUAP.

4.1. STOCHASTIC GRADIENT DESCENT

The first baseline optimizes Equation 5 with gradient descent. Since Equation 5 is a constrained optimization problem, gradient descent cannot be applied to it directly. Instead, we solve the Lagrangian-relaxed form of the problem, as in Carlini & Wagner (2017) and Athalye et al. (2018):

$$\arg\min_{u_r} \; \mathbb{E}_{u'_r \in N_T(u_r)}\Big[ I(\|u'_r\| < \epsilon) \times \mathbb{E}_{x \sim \mu}\big[\delta(f(x + u'_r), f(x))\big] \Big] - \lambda \|u_r\|_p \tag{6}$$

We use a momentum-based Stochastic Gradient Descent (SGD) method for solving Equation 6. Shafahi et al. (2020) suggest that this is an effective method for generating standard UAPs. In order to implement this, we replace the Kronecker Delta function with a loss function, $L$. We iteratively converge towards the inner expectation by computing it in batches, and towards the outer expectation by sampling a large number of transformations. Given a batch $x \subset \mu$ and a random set of transformations $\tau \subset T$ sampled from $T$, we can approximate Equation 6 as

$$\frac{1}{|x| \times |\tau|} \sum_{i=1}^{|x|} \sum_{j=1}^{|\tau|} I(\|\tau_j(u_r)\| < \epsilon) \, L\big[f(x_i + \tau_j(u_r)), f(x_i)\big] - \lambda \|u_r\|_p \tag{7}$$

Our final algorithm is in Appendix C.

4.2. STANDARD UAP ALGORITHM WITH ROBUST ADVERSARIAL PERTURBATIONS

For our second baseline, we leverage the standard UAP algorithm from Moosavi-Dezfooli et al. (2017) (see Appendix D for the algorithm). The standard UAP algorithm iterates over the entire training dataset and, at each input $x_i$, computes the smallest additive change, $\Delta u$, to the current perturbation, $u$, that would make $u + \Delta u$ an adversarial perturbation for $x_i$. Intuitively, over time the algorithm will approach a perturbation that works on most inputs in the training dataset. Our baseline modifies this approach by computing robust adversarial perturbations rather than standard adversarial perturbations. At each point $x_i$, we compute the smallest additive change, $\Delta u_r$, to the current robust adversarial perturbation, $u_r$, that would make $u_r + \Delta u_r$ a robust adversarial perturbation for $x_i$. We search for robust adversarial perturbations by optimizing the expectation that a point in the neighborhood of $v_r$ is adversarial while restricting the perturbation to an $l_p$-norm of $\epsilon$. We formulate this as the following minimization problem:

$$\arg\min_{v_r} \; \mathbb{E}_{v'_r \in N_T(v_r)}\big[ I(\|v'_r\| < \epsilon) \times \delta(f(x + v'_r), f(x)) \big] \quad \text{s.t. } \|v_r\|_p < \epsilon \tag{8}$$

4.3. ROBUST UAP ALGORITHM

The baseline algorithms have two fundamental limitations: (i) they rely on random sampling over the symbolic transformation region, but the sampling strategy does not explicitly try to maximize the robustness of the generated UAP over the entire symbolic region, and (ii) they do not estimate robustness on unsampled transformations. As a result, the baselines yield suboptimal UAPs (as confirmed by our experiments below). To overcome these fundamental limitations, we create a method to compute probabilistic bounds for expected robustness on an entire symbolic region. We leverage this method for approximating expected robustness in a new algorithm to generate robust UAPs with guarantees. We make a simplifying assumption that $N_T(u_r)$ has a well-defined, sampleable probability density function (PDF), as we cannot bound robustness for arbitrary transformations. Our experiments show that even though our assumptions do not hold for all the transformation sets considered in this work, they significantly improve the robustness of our generated UAPs. Our approximation of the expected robustness relies on the following theoretical result:

Theorem 4.1. Consider a perturbation $u_r$, a neural network $f$, a finite set of inputs $X$, a set of transformations $T$, and a minimum universal adversarial success rate $\gamma \in \mathbb{R}$. Let $p(\gamma) = P_{u'_r \sim N_T(u_r)}\big(\mathrm{ASR}_U(f, X, u'_r) > \gamma\big)$. For $i \in 1 \dots n$, let $u^i_r \sim N_T(u_r)$ be random variables with a well-defined PDF, let $I : \mathbb{R}^d \to \mathbb{R}$ be the indicator function, and let

$$\hat{p}_n(\gamma) = \frac{1}{n} \sum_{i=1}^{n} I\big(\mathrm{ASR}_U(f, X, u^i_r) > \gamma\big) \tag{9}$$

For accuracy level $\psi \in (0, 1)$ and confidence $\phi \in (0, 1)$, where $(0, 1)$ is the open interval between 0 and 1, if $n \geq \frac{1}{2\psi^2} \ln \frac{2}{\phi}$ then

$$P\big(|\hat{p}_n(\gamma) - p(\gamma)| < \psi\big) \geq 1 - \phi \tag{10}$$

Proof. The bound on $n$ is derived via the Chernoff inequality applied to $\hat{p}_n(\gamma)$ and $\mathbb{E}[\hat{p}_n(\gamma)] = p(\gamma)$ (Chernoff, 1952; Alippi, 2014).
Equation 10 holds since computing universal ASR is Lebesgue measurable over the data distribution and since we assume $N_T(u_r)$ has a well-defined PDF. Theorem 4.1 states that with enough samples from the neighborhood of a perturbation $u_r$, the adversarial success rate of $u_r$ on the entire neighborhood is arbitrarily close to the adversarial success rate of $u_r$ on the sampled transformations with probability greater than $1 - \phi$. One key observation is that the Chernoff bound is independent of the dimensionality of the sample space, which allows us to efficiently apply this result to high-dimensional transformation sets, provided they have a well-defined PDF (e.g., an $L_\infty$-ball), and obtain provable bounds on the expected robustness. For the combinations of semantic transformations, such as rotation, translation, etc., used in the experiment section, the neighborhood does not have a well-defined PDF; thus we uniformly sample the parameter space of each transformation to produce a point in the neighborhood. We believe uniformly sampling the parameter space is a realistic approximation of real-world effects. Leveraging Theorem 4.1, we create EstimateRobustness, which, given accuracy $\psi$ and confidence $\phi$, returns the robust adversarial success rate on a finite set of inputs with probabilistic robustness guarantees under the assumptions of Theorem 4.1. The pseudocode for EstimateRobustness is in Algorithm 1.

Algorithm 1 EstimateRobustness
1: Draw $n = \lceil \frac{1}{2\psi^2} \ln \frac{2}{\phi} \rceil$ samples $\tau_i \sim T$
2: Compute $\hat{p}_n(\gamma) = \frac{1}{n} \sum_{i=1}^{n} I\big(\mathrm{ASR}_U(f, X, \tau_i(u_r)) > \gamma\big)$
3: Return $\hat{p}_n(\gamma)$

Our algorithm: RobustUAP. We leverage Theorem 4.1 and Algorithm 1 to develop RobustUAP, the pseudocode for which is given in Algorithm 2. Similar to the SGD baseline, we approximate the expectation in Equation 5 in batches. We start by sampling transformations from the PDF of the neighborhood. We set the number of transformations, $n$, based on Theorem 4.1 to satisfy the desired confidence level and accuracy.
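The sample-size bound of Theorem 4.1 and the Monte Carlo estimate of Algorithm 1 can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' code: the callables `asr_u` and `sample_transform` are assumed to be supplied by the caller, and the scalar toy problem at the bottom is purely hypothetical.

```python
import math
import random

def sample_size(psi, phi):
    """Smallest integer n with n >= ln(2 / phi) / (2 * psi^2), as in Theorem 4.1."""
    return math.ceil(math.log(2.0 / phi) / (2.0 * psi ** 2))

def estimate_robustness(asr_u, sample_transform, u_r, gamma, psi, phi, rng):
    """Monte Carlo estimate of p(gamma) in the style of Algorithm 1.

    asr_u(u)              -> universal ASR of u on the finite input set X (assumed callable)
    sample_transform(rng) -> one transformation tau drawn from T (assumed callable)
    """
    n = sample_size(psi, phi)
    hits = 0
    for _ in range(n):
        tau = sample_transform(rng)
        if asr_u(tau(u_r)) > gamma:
            hits += 1
    return hits / n

# Hypothetical scalar toy problem: the "perturbation" is a number, its ASR grows
# with its magnitude, and each transformation rescales it by a uniform factor.
toy_asr_u = lambda u: min(1.0, abs(u))

def toy_sample_transform(rng):
    alpha = rng.uniform(0.5, 1.5)
    return lambda u: alpha * u

rng = random.Random(0)
n = sample_size(psi=0.05, phi=0.05)
estimate = estimate_robustness(toy_asr_u, toy_sample_transform, 1.0, 0.8, 0.05, 0.05, rng)
print(n, estimate)
```

In this toy setting the true probability is 0.7 (the rescaling factor must exceed 0.8), so the estimate should land within the accuracy level $\psi$ of that value with probability at least $1 - \phi$.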
For each gradient step, we compute the mean loss over the current batch and set of sampled transformations (line 8). For each batch and set of sampled transformations, instead of making a single gradient update like SGD, we use Projected Gradient Descent (PGD) to iteratively compute a more robust update to the universal perturbation and end only when the estimated robustness on the batch satisfies a given threshold (line 10). At the end of each epoch, we check the robustness across the entire training set and transformation space using EstimateRobustness and stop when we have reached the desired performance (line 14).

Algorithm 2 Robust UAP Algorithm
1: Initialize $u_r \leftarrow 0$, $n \leftarrow \lceil \frac{1}{2\psi^2} \ln \frac{2}{\phi} \rceil$
2: For $i = 1 \dots n$ sample $\tau_i \sim T$
3: repeat
4:   for $B \subset X$ do
5:     if EstimateRobustness$(f, B, T, \gamma, u_r, \psi, \phi) < \zeta$ then
6:       $\Delta u_r \leftarrow 0$
7:       repeat
8:         Compute $L_{B,\tau} = \frac{1}{|B| \times n} \sum_{i=1}^{|B|} \sum_{j=1}^{n} L\big[f(B_i + \tau_j(u_r + \Delta u_r)), f(B_i)\big]$
[...]
13:  end for
14: until EstimateRobustness$(f, X, T, \gamma, u_r, \psi, \phi) \geq \zeta$

5. EVALUATION

Our RobustUAP framework is applicable to all transformation sets in a variety of domains. We empirically evaluate our method RobustUAP and three baseline approaches (SGD, StandardUAP_RP, and StandardUAP (Moosavi-Dezfooli et al., 2017)) on popular models from the vision domain. We show that RobustUAP is more robust under both uniform random noise and compositions of real-world transformations such as rotation, scaling, etc.

Experimental evaluation. We consider two popular image recognition datasets: CIFAR-10 (Krizhevsky et al., 2009) and ILSVRC 2012 (Deng et al., 2009). For CIFAR-10, we evaluate on the entire test set (1,000 images) and use a state-of-the-art pretrained VGG16 (Simonyan & Zisserman, 2014) network as the target classification model. For ILSVRC 2012, we evaluate on a random subset of the test set (1,000 images), and use a state-of-the-art Inception-v3 (Szegedy et al., 2016) network. We evaluate the robustness against uniform random noise as well as a composition of transformations from brightness/contrast, rotation, scaling, shearing, and translation. All experiments were performed on a desktop PC with a GeForce RTX(TM) 3090 GPU and a 16-core Intel(R) Core(TM) i9-9900KS CPU @ 4.00GHz.

We report the results for the $l_2$-norm with $\epsilon = 100$ for ILSVRC 2012 and $\epsilon = 10$ for CIFAR-10. These values were chosen based on the values presented by the original UAP paper (Moosavi-Dezfooli et al., 2017). We use the image normalization function given by our pretrained models and thus scaled our $\epsilon$ values accordingly. We note that the $\epsilon$-values are significantly smaller than the image norms. Therefore the generated perturbation is imperceptible and does not affect the semantic content of the image. Due to the hardness of the optimization problem, for the same norm value, a UAP is less effective than input-specific perturbations. We note that crafting input-specific perturbations requires making unrealistic assumptions about the power of the attacker, as mentioned in the introduction, and therefore we do not consider them part of our threat model, which aims to generate practically feasible perturbations.

We use $\psi = 0.05$ and $\phi = 0.05$, resulting in $n = 738$, both for generating samples for our RobustUAP algorithm and for reporting robust ASR in our evaluation. The UAPs are trained on 2,000 images; other parameters for evaluation are given in Appendix E.
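For reference, the sample count follows from substituting $\psi = 0.05$ and $\phi = 0.05$ into the bound of Theorem 4.1; the intermediate values below are rounded to two decimals.

```latex
n = \left\lceil \frac{1}{2\psi^{2}} \ln\frac{2}{\phi} \right\rceil
  = \left\lceil \frac{1}{2\,(0.05)^{2}} \ln\frac{2}{0.05} \right\rceil
  = \left\lceil 200 \ln 40 \right\rceil
  = \left\lceil 737.78 \right\rceil
  = 738
```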

5.1. ROBUSTNESS TO RANDOM NOISE

First, we show that our algorithm generates UAPs robust against uniform random noise. Here our neighborhood is defined as an $L_\infty$ ball of radius $\epsilon$ around the perturbation, and $U(\epsilon)$ represents noise drawn uniformly from such a ball. Figure 2 shows the performance of each algorithm. For example, the RobustUAP algorithm achieves an $\mathrm{ASR}_U$ of 0.9 more than 97% of the time under $U(0.1)$ on CIFAR-10, whereas all other algorithms achieve 0.9 at most 30% of the time. RobustUAP outperforms all other algorithms for both noise sizes. StandardUAP has a lower mean and higher variance in universal ASR and is much less robust to transformation. A table of Robust ASR results for $\gamma = 0.8$ can be seen in Appendix F. Our Robust ASR results are guaranteed to be within $\pm 0.05$ of the actual result with a probability of 95%. For example, since we estimate that RobustUAP has an $\mathrm{ASR}_R$ of 96.1% for $U(0.3)$, we are guaranteed that the true robustness is > 91.1% with a probability of 95%. Note that we get robustness guarantees from EstimateRobustness because our neighborhood has a well-defined PDF.

5.2. ROBUSTNESS TO SEMANTIC TRANSFORMATIONS

Next, we consider transformation sets generated by composing five popular semantic transformations from the existing literature (Athalye et al., 2018; Balunović et al., 2019): brightness/contrast, rotation, scaling, shearing, and translation. We use a variety of different compositions to show that our algorithm works under different conditions, and base our parameters for the transformations on Balunović et al. (2019). For our experiments, $R(\theta)$ corresponds to rotations with angles between $\pm\theta$; $T(x, y)$ to translations of $\pm x$ horizontally and $\pm y$ vertically; $Sc(p)$ to scaling the image between $\pm p\%$; $Sh(m)$ to shearing by a shearing factor between $\pm m\%$; and $B(\alpha, \beta)$ to changes in contrast between $\pm\alpha\%$ and brightness between $\pm\beta$. Further details about these transformations can be seen in Appendix A. We consider compositions of different subsets and ranges of these transformations, shown in Table 1, including composing all transformations together. The hardness of generating robust UAPs depends on the effect that the transformation set has on the UAP (e.g., random noise has a relatively small effect compared to rotation). The hardness also increases with the number of transformations in the composition as well as the range of parameters for each individual transformation. For example, generating robust UAPs is harder for the compositions shown in the first and last rows for ILSVRC 2012 in Table 1 than for those in the second and third rows. The same is true for generating a UAP robust to uniform random noise.

Robust ASR ($\mathrm{ASR}_R$). Figure 3 shows the performance of UAPs obtained by applying 738 randomly sampled transformations to the original UAPs generated by different methods on ILSVRC; similar graphs for CIFAR-10 can be found in Appendix G. The RobustUAP algorithm outperforms all others in each case. We observe that for these harder transformation sets StandardUAP loses its effectiveness completely.
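The parametrization $R(\theta)$, $T(x, y)$, $Sc(p)$, $Sh(m)$, $B(\alpha, \beta)$ lends itself to uniform sampling of each parameter range, which is how points in the neighborhood are drawn for the semantic transformations. The following sketch assumes independent uniform draws per parameter; the function name and dictionary keys are hypothetical, and the ranges in the usage line are the composition $R(10), T(2, 2), Sh(2), Sc(2), B(2, 0.001)$ mentioned in Appendix I.

```python
import random

def sample_parameters(rng, theta=0.0, tx=0.0, ty=0.0, p=0.0, m=0.0, alpha=0.0, beta=0.0):
    """Uniformly sample one setting for R(theta), T(tx, ty), Sc(p), Sh(m), B(alpha, beta).

    Each parameter is drawn independently from its symmetric range; a range of 0
    disables the corresponding transformation.
    """
    return {
        "rotation_deg": rng.uniform(-theta, theta),
        "translate_x": rng.uniform(-tx, tx),
        "translate_y": rng.uniform(-ty, ty),
        "scale_pct": rng.uniform(-p, p),
        "shear_pct": rng.uniform(-m, m),
        "contrast_pct": rng.uniform(-alpha, alpha),
        "brightness": rng.uniform(-beta, beta),
    }

rng = random.Random(0)
# 738 samples, matching the sample count used in the evaluation.
samples = [
    sample_parameters(rng, theta=10, tx=2, ty=2, p=2, m=2, alpha=2, beta=0.001)
    for _ in range(738)
]
print(samples[0])
```

Each sampled dictionary would then be turned into a concrete transformation of the perturbation, for example through the affine matrices of Appendix A.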
In Table 1 we compare the robust universal adversarial success rate $\mathrm{ASR}_R$ with $\gamma = 0.6$; in other words, we find the percentage of sampled neighbors of the perturbation that are still UAPs with 60% effectiveness on the testing set. We provide average $\mathrm{ASR}_U$ scores as well as $\mathrm{ASR}_R$ for different $\gamma$ levels in Appendix H. Our RobustUAP algorithm achieves at least 53.4% higher robust ASR than the standard UAP algorithm on both datasets and the challenging transformation sets shown in Table 1. Furthermore, our RobustUAP algorithm significantly outperforms both robust baseline approaches. Except for the $T(2, 2)$ case, which we observe to be the easiest, RobustUAP achieves at least an 11.6% performance gain over the baselines. SGD is the best performing baseline and achieves high robust ASR on relatively easier transformation sets, performing within 1% of RobustUAP on $T(2, 2)$. On harder transformation sets, this gap widens considerably; see Table 1. We further visualize UAPs generated with our three robust algorithms on the same transformation set against a standard UAP generated on ILSVRC 2012 in Figure 5. We observe that UAPs generated by the StandardUAP algorithm resemble those generated by the StandardUAP_RP algorithm. We believe that this is due to the similarity in the workings of both algorithms. However, the two UAPs are not identical. Under our transformation set the center of the image is least likely to be perturbed, so we observe that the StandardUAP_RP algorithm concentrates its budget towards the center. Both the RobustUAP and the SGD algorithm generate larger patterns distributed over the entire image.

[Table: columns DATASET, TRANSFORMATION SET, STANDARD UAP, SGD, STANDARD UAP_RP, ROBUST UAP]

5.3. ADDITIONAL EXPERIMENTS

In Appendix I we show how our robust UAPs compare to standard UAPs on the non-robust universal ASR metric. In Appendix J, we evaluate our methods on ResNet18 (He et al., 2015) and MobileNet (Howard et al., 2017) for CIFAR-10 and ILSVRC 2012 respectively. The results follow the same trends as those reported in Table 1 . In Appendix H we provide the average ASR U achieved by all the algorithms and also provide ASR R computed with different values of γ for the same transformation sets in Table 1 . Finally, we provide runtimes for all algorithms in Appendix L. 

6. RELATED WORK

In this section, we survey works closely related to ours.

UAP Algorithms. Most works focusing on UAPs (Moosavi-Dezfooli et al., 2017; Mopuri et al., 2018; Zhang et al., 2020a; Khrulkov & Oseledets, 2018; Li et al., 2020; Akhtar et al., 2018; Hendrik Metzen et al., 2017; Zhang et al., 2020b) generate singular vectors and do not consider perturbation robustness. Bahramali et al. (2021) introduce a perturbation generator model (PGM) for the wireless domain which creates UAPs with random trigger patterns. They show that both adversarial training and noise-subtracting defenses used in the wireless domain are highly effective in mitigating the effects of a single-vector UAP attack; they further show that their method of generating a set of UAPs is an effective way for an attacker to circumvent these defenses. Although PGM provides a method for efficiently sampling unique UAPs, it is not trained to be robust to real-world transformations. In contrast, our method enables efficient sampling of UAPs that are robust to transformations.

Robust Adversarial Examples. The following papers introduce notions of robustness under different viewpoints and environmental conditions for constructing realizable adversarial examples. This is a different threat model compared to the additive perturbations discussed in this paper. Luo et al. (2018) construct adversarial examples which minimize human detectability, further introducing the idea of robustness for adversarial examples. They show that their attacks are robust against JPEG compression. Sharif et al. (2016) attack facial recognition systems by putting adversarial perturbations on glass frames. Their work demonstrates a successful physical attack under stable conditions and poses. Eykholt et al. (2018) propose Robust Physical Perturbations (RP2) in order to show that adding graffiti on a stop sign can cause it to be misclassified both in simulations and in the real world. Athalye et al. (2018) introduce Expectation over Transformation (EOT) and use it to print real-world objects which are adversarial given a range of physical and environmental conditions.

Robust Adversarial Perturbations. Li et al. (2019a) generate music which prevents a voice-assistant-based system from picking up its wake word. Li et al. (2019b) present a method for generating a targeted adversarial sticker which changes an image classifier's classification from one pre-specified class to another. Both of these methods rely on specific use cases and are tailored towards generating adversaries coming from strict distributions, e.g. Li et al. (2019a) generate guitar music while Li et al. (2019b) generate a small grid of dots. These works build on algorithms akin to our baseline approaches and are limited in scope to domain-specific transformations. Our work provides a framework for improving robustness against a wide range of transformations in diverse domains and can be leveraged for improving the effectiveness of these attacks.

7. CONCLUSION

In this paper, we demonstrate that standard UAPs are highly susceptible to transformations, i.e. they fail to be universally adversarial under transformation. We propose a new method, RobustUAP, to generate robust UAPs based upon obtaining probabilistic bounds on UAP robustness across an entire transformation space. Our experiments provide empirical evidence that this principled approach generates UAPs that are practically more robust under a wide range of transformation sets than those from the baseline methods.

APPENDIX A SEMANTIC TRANSFORMATIONS

In this section, we discuss the semantic transformations used in the paper. Brightness and contrast can be represented via bias ($\beta$) and gain ($\alpha > 0$) parameters respectively. Formally, if $x$ is the original image, then the transformed image, $x'$, can be represented as

$$x' = \alpha x + \beta$$

Rotation, scaling, shearing, and translation are all affine transformations acting on the coordinate system, $c$, of the images instead of the pixel values, $x$. In order to recover the pixel values and differentiate over the transformation, we need sub-differentiable interpolation; see Appendix B. For finite dimensions, affine transformations can be represented as a linear coordinate map where the original coordinates are multiplied by an invertible augmented matrix and then translated with an additional bias vector. Below, we give the general form for an affine transformation given augmented matrix $A$, bias vector $b$, and input coordinates $c$. We can compute the output coordinates, $c'$, as

$$\begin{bmatrix} c' \\ 1 \end{bmatrix} = \begin{bmatrix} A & b \\ 0 \cdots 0 & 1 \end{bmatrix} \begin{bmatrix} c \\ 1 \end{bmatrix}$$

Below, we give the augmented matrix $A$ and additional bias vector $b$ for rotation, scaling, shearing, and translation.

Rotation, $R(\theta)$, by $\theta$ degrees:

$$A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}, \quad b = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

Scaling, $Sc(p)$, by $p\%$:

$$A = \begin{bmatrix} 1 + \frac{p}{100} & 0 \\ 0 & 1 + \frac{p}{100} \end{bmatrix}, \quad b = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

Shearing, $Sh(m)$, by shear factor $m\%$:

$$A = \begin{bmatrix} 1 & \frac{m}{100} \\ 0 & 1 \end{bmatrix}, \quad b = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

Translation, $T(x, y)$, by $x$ pixels horizontally and $y$ pixels vertically:

$$A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad b = \begin{bmatrix} x \\ y \end{bmatrix}$$
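To make the coordinate maps concrete, the following sketch builds each affine transformation as a 3x3 matrix in homogeneous coordinates and composes several of them. It assumes the standard convention in which zero parameters yield the identity map (shear entry $m/100$, identity matrix for translation); all function names are illustrative, and pixel interpolation is not covered here.

```python
import math

def rotation(theta_deg):
    """R(theta): rotate coordinates by theta degrees."""
    t = math.radians(theta_deg)
    return [[math.cos(t), -math.sin(t), 0.0],
            [math.sin(t), math.cos(t), 0.0],
            [0.0, 0.0, 1.0]]

def scaling(p):
    """Sc(p): scale coordinates by p percent."""
    s = 1.0 + p / 100.0
    return [[s, 0.0, 0.0], [0.0, s, 0.0], [0.0, 0.0, 1.0]]

def shearing(m):
    """Sh(m): shear coordinates by shear factor m percent."""
    return [[1.0, m / 100.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def translation(x, y):
    """T(x, y): shift coordinates by x horizontally and y vertically."""
    return [[1.0, 0.0, x], [0.0, 1.0, y], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def compose(*matrices):
    """Compose transformations so that the first argument is applied first."""
    result = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for mat in matrices:
        result = matmul(mat, result)
    return result

def apply(mat, c):
    """Map a coordinate pair c = (x, y) through the affine matrix."""
    x, y = c
    return (mat[0][0] * x + mat[0][1] * y + mat[0][2],
            mat[1][0] * x + mat[1][1] * y + mat[1][2])

# The ordered composition from Example 3.2: rotate by 8 degrees, scale by a
# factor of 1.2 (p = 20), then translate by -1.3 (assumed horizontal here).
tau = compose(rotation(8), scaling(20), translation(-1.3, 0.0))
print(apply(tau, (10.0, 5.0)))
```

Because matrix products do not commute in general, the order passed to `compose` matters, which is why a transformation is defined as an ordered composition.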

B INTERPOLATION

Affine transformations may change a pixel's integer coordinates into non-integer coordinates. Interpolation is typically used to ensure that the resulting image can be represented on a lattice (integer) pixel grid. For this paper, we will be using bilinear interpolation, a common interpolation method which achieves a good trade-off between accuracy and efficiency in practice and is commonly used in literature (Xiao et al., 2018b; Balunović et al., 2019) . Let x i,j , x ′ i,j represent the pixel value at position i, j for the original and transformed image respectively. Let c ′ x i,j , c ′ y i,j represent the x-coordinate and y-coordinate of the pixel at i, j after transformation. We define our transformed image by summing over all pixels n, m ∈ [1 . . . H] × [1 . . . W ] where H and W represent the height and width of the image. x ′ i,j = H n W m x n,m max(0, 1 -|c ′ x i,j -m|) max(0, 1 -|c ′ y i,j -n|) This interpolation can be computed for each channel in the image. While interpolation is typically not differentiable, in order to generate adversarial examples using standard techniques we need a differentiable version of interpolation. (Jaderberg et al., 2015) introduces differentiable image sampling. Their method works for any interpolation method as long as the (sub-)gradients can be defined with respect to x, c ′ i,j . For bilinear interpolation this becomes, ∂x ′ i,j ∂x n,m = H n W m max(0, 1 -|c ′ x i,j -m|) max(0, 1 -|c ′ y i,j -n|) (18) ∂x ′ i,j ∂c ′ x i,j = H n W m x n,m max(0, 1 -|c ′ y i,j -n|)    1 if m ≥ |c ′ x i,j -m| -1 if m < |c ′ x i,j -m| 0 otherwise C SGD ALGORITHM Our SGD UAP algorithm is based on standard momentum based SGD while optimizing over the objective proposed in 5, the algorithm details can be seen in Algorithm 3. 
Algorithm 3 Stochastic Gradient Descent UAP Algorithm
1: Initialize $u_r \leftarrow 0$, $\Delta u_r \leftarrow 0$
2: repeat
3:   for $B \in X$ do
4:     Sample $\hat{t} \subset T$
5:     $\Delta u_r \leftarrow \alpha \Delta u_r - \frac{\nu}{|x| \times |\hat{t}|} \sum_{i=1}^{|x|} \sum_{j=1}^{|\hat{t}|} \nabla L[f(x_i + \hat{t}_j(u_r)), f(x_i)]$
6:     Update the perturbation with projection:

3:   for $x_i \in X$ do
4:     if $f(x_i + u) = f(x_i)$ then
5:       Compute minimal adversarial perturbation:
6:       $\Delta u \leftarrow \arg\min_r \|r\|_2$ s.t. $f(x_i + u + r) \neq f(x_i)$
7:       Update the perturbation with projection:
   end for
11: until $\mathrm{ASR}_U(f, X, u) < \gamma$
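The momentum update in Algorithm 3 can be sketched in NumPy under several simplifying assumptions that are not taken from the paper: the model is a toy linear softmax classifier `f(x) = argmax(x @ W)`, the sampled transformations are random gain/bias maps applied to the perturbation only, the loss `L` is the negative cross-entropy with respect to the clean prediction (so descending it pushes inputs away from their clean labels), and the projection is onto an $\ell_2$ ball. The names `sgd_uap` and `project_l2` are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def project_l2(u, eps):
    """Project u onto the l2 ball of radius eps."""
    norm = np.linalg.norm(u)
    return u if norm <= eps else u * (eps / norm)

def sgd_uap(W, X, eps=1.0, lr=0.1, momentum=0.9, epochs=5,
            batch_size=32, n_transforms=8, seed=0):
    """Momentum SGD for a universal perturbation on a toy linear classifier.

    `momentum` plays the role of alpha and `lr` the role of nu in Algorithm 3.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    clean = (X @ W).argmax(axis=1)          # f(x_i), used as the reference label
    u = np.zeros(d)                          # u_r
    delta = np.zeros(d)                      # Delta u_r
    for _ in range(epochs):
        batches = np.array_split(rng.permutation(n), max(1, n // batch_size))
        for idx in batches:
            xb, yb = X[idx], clean[idx]
            grad = np.zeros(d)
            for _ in range(n_transforms):    # sampled transformations t_j
                a = rng.uniform(0.9, 1.1)    # gain (assumed range)
                b = rng.uniform(-0.01, 0.01) # bias (assumed range)
                p = softmax((xb + a * u + b) @ W)
                p[np.arange(len(idx)), yb] -= 1.0   # d(cross-entropy)/d(logits)
                # L = -cross-entropy, so dL/du = -a * (p - onehot) @ W.T
                grad += (-a * (p @ W.T)).mean(axis=0)
            grad /= n_transforms
            delta = momentum * delta - lr * grad
            u = project_l2(u + delta, eps)   # update with projection
    return u
```

For a real network, the analytic gradient above would be replaced by automatic differentiation through the model and the sub-differentiable transformations.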

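For reference, the bilinear sampling rule of Appendix B and its sub-gradient with respect to the transformed x-coordinate can be sketched as follows. This is a minimal dense NumPy version for a single channel, assuming 0-indexed pixel coordinates (the text uses 1-indexed sums); the function names are hypothetical, and the sub-gradient follows the piecewise form of Jaderberg et al. (2015).

```python
import numpy as np

def bilinear_sample(x, cx, cy):
    """x: (H, W) image; cx, cy: (H, W) transformed x/y coordinates.

    Returns x' with
    x'[i, j] = sum_{n, m} x[n, m] * max(0, 1 - |cx[i, j] - m|) * max(0, 1 - |cy[i, j] - n|).
    """
    H, W = x.shape
    m = np.arange(W)
    n = np.arange(H)
    wx = np.maximum(0.0, 1.0 - np.abs(cx[..., None] - m))   # (H, W, W)
    wy = np.maximum(0.0, 1.0 - np.abs(cy[..., None] - n))   # (H, W, H)
    return np.einsum("ijn,nm,ijm->ij", wy, x, wx)

def bilinear_grad_cx(x, cx, cy):
    """Sub-gradient of x'[i, j] with respect to cx[i, j]."""
    H, W = x.shape
    m = np.arange(W)
    n = np.arange(H)
    diff = m - cx[..., None]                                 # m - c'^x_{i,j}
    # 0 if |m - c| >= 1, +1 if m >= c, -1 if m < c
    g = np.where(np.abs(diff) >= 1.0, 0.0, np.where(diff >= 0.0, 1.0, -1.0))
    wy = np.maximum(0.0, 1.0 - np.abs(cy[..., None] - n))
    return np.einsum("ijn,nm,ijm->ij", wy, x, g)
```

Sampling at the identity grid returns the original image, and the sub-gradient agrees with finite differences wherever the coordinates are non-integer. Practical implementations only visit the four neighbouring pixels instead of forming the dense weights used here.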
E EXPERIMENT PARAMETERS

In our experiments, all algorithms are stopped after 5 epochs or once they achieve an ASR_R of 0.95, whichever comes first. The UAPs are trained with the same transformation set that they are evaluated on. For algorithms running PGD internally, we cap the number of PGD iterations at 40.

H AVERAGE ASR_U AND ASR_R WITH DIFFERENT γ'S

F FURTHER EVALUATION OF UNIFORM NOISE

We provide additional metrics computed on the same set of transformations, datasets, and models as in Table 1. In Table 3, we present the average ASR_U rather than ASR_R. The averages show that, after transformation, the UAPs created by our RobustUAP algorithm are on average more effective than those of all other algorithms. They also show that standard UAPs are not completely ineffective after transformation; they just have a very low chance of being highly effective.

DATASET | TRANSFORMATION SET | STANDARD UAP | SGD | STANDARD UAP_RP | ROBUST UAP

I COMPARISON ON NON-ROBUST UNIVERSAL ASR METRIC

We compare our robust UAPs to standard UAPs on the non-robust universal ASR metric, see Table 5. All robust UAPs are generated to be robust against R(10), T(2, 2), Sh(2), Sc(2), B(2, 0.001). We observe that at the same $\ell_2$-norm all robust UAPs achieve a lower universal ASR than the standard UAP algorithm. This result is not too surprising, as solving the optimization problem for robust UAPs is significantly more difficult. We further observe that our RobustUAP algorithm is the most effective in comparison to the other robust baseline approaches.

J ADDITIONAL MODELS

We also evaluate our methods on the same transformations and datasets but on different models: ResNet-18 (He et al., 2015) for CIFAR-10 and MobileNet (Howard et al., 2017) for ILSVRC 2012. Results can be seen in Table 6. We observe similar performance across models, suggesting that the performance of the attacks is tied more directly to the transformation set and dataset.

K COMMON CORRUPTIONS

We also evaluate robust UAPs against the 2D fog transformations of Kar et al. (2022). We set the shift intensity of the fog to 1 and train our robust UAPs to be robust against random fog perturbations. We observe results similar to those for the transformations above. The results are plotted in Figure 7.

Previous sections highlight SGD as the algorithm most competitive with RobustUAP in terms of performance. However, as noted in the previous section, SGD takes significantly less time to run. In this section, we investigate how RobustUAP performs with limited compute time as well as how SGD performs with increased runtime. We first extend the ILSVRC 2012 part of Table 1 by computing RobustUAP performance when limited to the same amount of time that SGD takes. Table 8 shows that RobustUAP outperforms SGD even when its compute time is limited, with up to 9% more robustness on our most challenging transformation set, R(10), T(2, 2), Sh(2), Sc(2), B(2, 0.001).



Figure 1: Robust UAPs (left) cause a classifier to misclassify on most of the data distribution even after transformations are applied to them. Standard UAPs (right) are not robust to transformations and have a low probability of remaining UAPs after transformation.

9:     $\Delta u_r = P_{p,\epsilon}(\Delta u_r + \alpha\,\mathrm{sign}(\nabla L_{B,\tau}))$
10:   until EstimateRobustness$(f, B, T, \gamma, u_r + \Delta u_r, \psi, \phi) < \zeta$
11:   Update the perturbation with projection: $u_r \leftarrow P_{p,\epsilon}(u_r + \Delta u_r)$
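The projected sign-gradient step and the final projected update in the steps above can be sketched as follows. This is a minimal NumPy illustration assuming $P_{p,\epsilon}$ is the Euclidean projection onto the $\ell_p$ ball of radius $\epsilon$ and covering only $p = 2$ and $p = \infty$; the function names are hypothetical, and EstimateRobustness is not implemented here.

```python
import numpy as np

def project(v, eps, p):
    """P_{p,eps}: closest point to v inside the l_p ball of radius eps."""
    if p == np.inf:
        return np.clip(v, -eps, eps)
    if p == 2:
        norm = np.linalg.norm(v)
        return v if norm <= eps else v * (eps / norm)
    raise ValueError("only p = 2 and p = inf are sketched here")

def sign_step(delta_u, grad, alpha, eps, p):
    """Step 9: delta_u <- P_{p,eps}(delta_u + alpha * sign(grad))."""
    return project(delta_u + alpha * np.sign(grad), eps, p)

def apply_update(u, delta_u, eps, p):
    """Step 11: u <- P_{p,eps}(u + delta_u)."""
    return project(u + delta_u, eps, p)
```

For $p = \infty$ the projection reduces to element-wise clipping, while for $p = 2$ it rescales the vector onto the ball's surface whenever its norm exceeds $\epsilon$.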

Figure 2: For each method, a point (x, y) in the corresponding line represents the percentage of sampled UAPs (y%) with Universal ASR > x for U(0.1) and U(0.3) on ILSVRC and CIFAR-10.

Figure 3: For each method, a point (x, y) in the corresponding line represents the percentage of sampled UAPs (y%) with Universal ASR > x for the different semantic transformations on ILSVRC.
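The curves described in these captions can be computed from a list of universal ASR values, one per sampled (transformed) UAP. The sketch below assumes that the universal ASR of a transformed perturbation is the fraction of inputs whose prediction changes, which is an interpretation of the captions rather than a definition quoted from the paper; the function names are hypothetical.

```python
import numpy as np

def universal_asr(clean_pred, perturbed_pred):
    """Fraction of inputs whose predicted label changes under the perturbation."""
    clean_pred = np.asarray(clean_pred)
    perturbed_pred = np.asarray(perturbed_pred)
    return float((clean_pred != perturbed_pred).mean())

def exceedance_curve(asr_samples, thresholds):
    """For each threshold x, the percentage y of sampled UAPs with universal ASR > x."""
    asr = np.asarray(asr_samples, dtype=float)
    thr = np.asarray(thresholds, dtype=float)
    return 100.0 * (asr[None, :] > thr[:, None]).mean(axis=1)
```

Under this reading, evaluating the curve at x = γ gives the percentage of transformed UAPs that remain effective at level γ.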

Visualization. We visualize UAPs generated with RobustUAP and StandardUAP transformed with random transformations from R(10), T(2, 2), Sh(2), Sc(2), B(2, 0.001) and added to images in ILSVRC 2012 in Figure 4. Our robust UAPs have a similar level of imperceptibility to standard UAPs and do not affect the semantic content of the images. Robust UAPs affect the model classification after transformation with high probability, unlike standard UAPs.

Figure 4: Examples of perturbed images with labels. The top row is unperturbed ILSVRC 2012 test set images, the second row has a randomly transformed robust UAP added to it, and the bottom row has a randomly transformed standard UAP added to it. Labels calculated using Inception-v3.

Figure 5: Comparison of UAPs generated with (a) StandardUAP, (b) RobustUAP, (c) StandardUAP_RP, and (d) RobustUAP on ILSVRC 2012.

$\mathrm{ASR}_R(f, X, T, \gamma, u_r) < \zeta$

D ITERATIVE UAP ALGORITHM

Moosavi-Dezfooli et al. (2017) introduce an iterative UAP algorithm, shown in Algorithm 4.

Algorithm 4 Iterative Universal Perturbation Algorithm (Moosavi-Dezfooli et al. (2017))
1: Initialize $u \leftarrow 0$
2: repeat
3:

Figure 7: For each method, a point (x, y) in the corresponding line represents the percentage of sampled UAPs (y%) with Universal ASR > x for the different semantic transformations on ILSVRC-2012.

L ALGORITHM RUNTIMES

We compare the average runtimes of the different methods on one of our most challenging transformation sets, R(10), T(2, 2), Sh(2), Sc(2), B(2, 0.001), on ILSVRC-2012 with n = 738. The results are in Table 7. We observe that RobustUAP is the slowest algorithm and SGD is the fastest. RobustUAP uses EstimateRobustness in each loop and thus with high n it requires much more time to compute. The extra computation enables RobustUAP to obtain better robustness than all baselines. On the same set of transformations and dataset we observe that one iteration of EstimateRobustness on the entire test set takes on average 19 minutes. When running EstimateRobustness in the RobustUAP loop, each call takes 36 seconds for a batch size of 32.

Robust ASR of RobustUAP compared to the three baselines.

Robust ASR with uniform random noise, γ = 0.8.

Average Universal ASR of our Robust UAP algorithms and the standard UAP (Moosavi-Dezfooli et al., 2017) method.

Robust ASR of our Robust UAP algorithms and the standard UAP (Moosavi-Dezfooli et al., 2017) method with γ = [0.5, 0.7].

Universal ASR of our Robust UAP algorithms and the standard UAP method.

Robust ASR on Resnet-18 for CIFAR-10 and MobileNet for ILSVRC 2012.


Average Runtime for Robust UAP algorithms

M EFFECT OF COMPUTE TIME ON ROBUSTNESS

Robust ASR of RobustUAP restricted to the same amount of compute time as SGD.


Next, we vary the number of SGD iterations. We compute the robust ASR on ILSVRC for robustness against R(10), T(2, 2), Sh(2), Sc(2), B(2, 0.001). Figure 8 shows the robust ASR achieved by SGD over time. SGD's performance flatlines after a small number of iterations and seems unable to surpass about 65. Here SGD is allowed to continue past the point where it would usually stop (at around 250 iterations); in this experiment we let it run to 1250 iterations, which takes about as long as RobustUAP. RobustUAP achieves a robust ASR of 72 even when restricted to the compute time of base SGD (it achieves 86.4 when unrestricted). Together, these two results show that RobustUAP finds more robust UAPs than SGD, whose performance plateaus.

In this section, we measure the effectiveness of our robust UAPs against hold-out transformations. In this experiment, we learn a UAP that is robust to R(5) and obtain a robust ASR of 96.2 at γ = 0.6. We then measure its effectiveness against Sc(5) and get a robust ASR of 85.4; in contrast, a robust UAP trained directly to be robust to Sc(5) obtains a robust ASR of 98.1. Next, we measure the robustness of the UAP trained against R(5) when subjected to transformations from B(5, 0.01). Here we get a robust ASR of 97.3, whereas a robust UAP trained to be robust to B(5, 0.01) obtains a robust ASR of 99.2. Finally, we test the robust UAP on R(5), Sc(5), B(5, 0.01) and get a robust ASR of 83.1. Our previous results show that a UAP trained to be robust against these parameters directly can obtain a robust ASR of 96.1. In each case, our UAP maintains robustness on hold-out transformations but has lower performance than robust UAPs trained directly to be robust to those transformations.

O TARGETED ATTACK

So far in this paper we have focused on untargeted attacks, i.e., attacks that aim to degrade the general performance of the model. Targeted attacks are also possible with both standard adversarial attack methods and universal adversarial perturbation methods. We can turn our algorithm from untargeted to targeted simply by replacing the loss function. We would like inputs from a source class, A, to be classified as a target class, B. Instead of maximizing the expected value of the cross-entropy loss, we formulate the loss based on maximizing B while minimizing A, similar to Benz et al. (2020). For ILSVRC 2012, we randomly select a couple of target classes and perform this attack; in each case, we train our robust UAP to be robust to R(10), T(2, 2), Sh(2), Sc(2), B(2, 0.001). Table 9 shows our results for robust ASR with γ = 0.6. We measure the robust ASR of turning class A into class B and observe similar results, with RobustUAP being the most robust, followed by SGD. It is also interesting to note that different random combinations lead to more or less success, i.e., it is easier to turn a dog into another dog than perfume into a padlock.

Table 9: Robust ASR of RobustUAP for targeted attacks compared to the three baselines with γ = 0.6.
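One possible form of the loss replacement described in this section is sketched below. It is an assumption for illustration and not necessarily the loss used in the paper or in Benz et al. (2020): over a batch of class-A inputs with the transformed perturbation added, it minimizes the log-probability of A minus the log-probability of B. The function names are hypothetical.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def targeted_loss(logits, source, target):
    """Mean of log p(source) - log p(target); lower values favour the target class."""
    lp = log_softmax(np.asarray(logits, dtype=float))
    return float((lp[:, source] - lp[:, target]).mean())

def targeted_success(logits, target):
    """Fraction of perturbed class-A inputs that are classified as the target class."""
    return float((np.asarray(logits).argmax(axis=1) == target).mean())
```

Because the log-softmax normalizer cancels in the difference, this loss equals the mean logit gap between the source and target classes.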

P DATA EFFICIENCY

In this section, we evaluate the data efficiency of RobustUAP. We use RobustUAP to generate UAPs robust to R(10), T(2, 2), Sh(2), Sc(2), B(2, 0.001) on ILSVRC-2012 with differing amounts of training data. The results can be seen in Figure 9. They show that the algorithm achieves good performance with 500 data points and continues to improve up to 4000 data points, after which it appears to stagnate.

In this section, we evaluate the transferability of RobustUAP. Previous work on UAPs (Moosavi-Dezfooli et al., 2017) shows that UAPs are transferable across different models. Here, we evaluate whether robust UAPs exhibit the same behavior for robustness. The robust UAPs studied here are generated with RobustUAP on R(10), T(2, 2), Sh(2), Sc(2), B(2, 0.001) for ILSVRC-2012 with γ = 0.6. We use a variety of models: Inception-v3 (Szegedy et al., 2016), MobileNet (Howard et al., 2017), Inception-v3 trained to be robust on R(20) (InceptionR20), Inception-v3 trained to be robust on horizontal flips (InceptionHF), and ViT (Dosovitskiy et al., 2020). Table 10 shows that our robust UAPs transfer their robustness properties between architectures and models. Ignoring ViT, on all of the Inception and MobileNet models, the generated UAPs maintain at least 65% robust ASR when transferred to each other. Transfer is weaker but still significant for ViT, where the UAPs maintain at least 32% robustness when transferred to or from the other models.

R TRANSFORMER-BASED MODELS

Recently, transformers have become popular as a new architecture for deep learning models in computer vision. In this section, we evaluate the effectiveness of robust UAPs against one such model, ViT (Dosovitskiy et al., 2020). Benz et al. (2021) have shown that standard UAPs are still effective against transformer-based architectures. Table 11 shows that our results on ViT are similar to our results on Inception and MobileNet, indicating that our methods work against transformer-based models as well.

S ROBUST UAPS AGAINST ROBUSTLY TRAINED NETWORKS

In this section, we examine whether training networks to be robust against the same transformations that the UAP is made robust to helps defend against robust UAPs. For this, we trained two new Inception-v3 networks. Because of time limitations, we started with our base Inception-v3 network and fine-tuned it using data augmentation. For the first network, InceptionR20, we augmented the data with random rotations within 20 degrees. For the second network, InceptionHF, we augmented the data with horizontal flips. We then crafted UAPs robust against rotations and flips on InceptionR20 and InceptionHF, respectively. The results can be seen in Table 12. We can compare the R(20) results to those from our standard Inception network. We postulated that, since the network has received some additional robustness training, it would be harder to attack, and thus we would see slightly lower robustness scores. However, it seems that training the network to be robust to R(20) does not significantly affect the ability to create robust UAPs. Horizontal flips seem to be too easy a transformation, as even the standard UAP performs quite well on robust ASR.

T ABLATION ON OPTIMIZATION STRATEGY

In this section, we study the effect of using different optimizers in place of SGD. We use a variety of standard PyTorch optimizers: Adam, Adamax, Adagrad, and RMSProp. We formulate the optimization problem in the same way but use these algorithms to optimize our perturbation. We compute these results on ILSVRC-2012 with Inception-v3, using R(10), T(2, 2), Sh(2), Sc(2), B(2, 0.001) as the transformation set and γ = 0.6. The results can be seen in Table 13. We see that the optimization strategy has some effect on the results and that SGD performs the best. We also found that SGD ran marginally faster than the other approaches.
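To illustrate what swapping the optimizer means for a norm-constrained perturbation, the sketch below re-implements two update rules in NumPy (momentum SGD and Adam) and applies an $\ell_2$ projection after every step. It is an illustration only: the paper uses PyTorch's built-in optimizers, and the names `optimize`, `momentum_sgd_step`, and `adam_step`, the toy objective in the usage note, and the hyperparameter defaults are assumptions.

```python
import numpy as np

def project_l2(u, eps):
    """Project u onto the l2 ball of radius eps."""
    norm = np.linalg.norm(u)
    return u if norm <= eps else u * (eps / norm)

def momentum_sgd_step(u, grad, state, lr=0.1, momentum=0.9):
    """Heavy-ball update: delta <- momentum * delta - lr * grad."""
    state["delta"] = momentum * state.get("delta", 0.0) - lr * grad
    return u + state["delta"]

def adam_step(u, grad, state, lr=0.1, b1=0.9, b2=0.999, tiny=1e-8):
    """Adam update with bias-corrected first and second moment estimates."""
    state["t"] = state.get("t", 0) + 1
    state["m"] = b1 * state.get("m", 0.0) + (1 - b1) * grad
    state["v"] = b2 * state.get("v", 0.0) + (1 - b2) * grad ** 2
    m_hat = state["m"] / (1 - b1 ** state["t"])
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return u - lr * m_hat / (np.sqrt(v_hat) + tiny)

def optimize(grad_fn, dim, step, eps=1.0, iters=200):
    """Minimize an objective over the l2 ball using the given update rule."""
    u, state = np.zeros(dim), {}
    for _ in range(iters):
        u = project_l2(step(u, grad_fn(u), state), eps)
    return u
```

As a usage example, `optimize(lambda u: 2 * (u - target), 2, adam_step)` minimizes the toy objective `||u - target||^2` inside the unit ball; in the UAP setting, `grad_fn` would instead return the gradient of the robust objective with respect to the perturbation.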

