ROBUSTNESS AGAINST RELATIONAL ADVERSARY

Abstract

Test-time adversarial attacks have posed serious challenges to the robustness of machine-learning models, and in many settings the adversarial perturbation need not be bounded by small ℓp-norms. Motivated by semantics-preserving attacks in the vision and security domains, we investigate relational adversaries, a broad class of attackers who create adversarial examples that lie in the reflexive-transitive closure of a logical relation. We analyze the conditions for robustness and propose normalize-and-predict, a learning framework with a provable robustness guarantee. We compare our approach with adversarial training and derive a unified framework that provides the benefits of both approaches. Guided by our theoretical findings, we apply our framework to image classification and malware detection. Results on both tasks show that attacks using relational adversaries frequently fool existing models, but our unified framework can significantly enhance their robustness.

1. INTRODUCTION

The robustness of machine learning (ML) systems has been challenged by test-time attacks using adversarial examples (Szegedy et al., 2013). These adversarial examples are intentionally manipulated inputs that preserve the essential characteristics of the original inputs, and thus are expected by human standards to have the same test outcome as the originals; yet they severely degrade the performance of many ML models across different domains (Moosavi-Dezfooli et al., 2016; Eykholt et al., 2018; Qin et al., 2019). As models in high-stakes domains such as system security are also undermined by attacks (Grosse et al., 2017; Rosenberg et al., 2018; Hu & Tan, 2018), robust ML in adversarial test environments has become an imperative task for the ML community. Existing work on test-time attacks predominantly considers ℓp-norm bounded adversarial manipulation (Goodfellow et al., 2014; Carlini & Wagner, 2017). However, in many security-critical settings, adversarial examples need not respect the ℓp-norm constraint as long as they preserve the malicious semantics. In malware detection, for example, a malware author can implement the same function using different APIs, or bind a malware within benign software like video games or office tools. The modified malware preserves the malicious functionality despite drastically different syntactic features. Hence, focusing on adversarial examples of small ℓp-norm in this setting fails to address a sizable attack surface that attackers can exploit to evade detectors. In addition to security threats, another rising concern about ML models is the spurious correlations they may have learned from a biased data set. Ribeiro et al. (2016) show that a highly accurate wolf-vs-husky classifier in fact bases its prediction on the presence or absence of snow in the background. A reliable model, in contrast, should be robust to changes of this nature.
Although dubbed semantic perturbation or manipulation (Mohapatra et al., 2020; Bhattad et al., 2019), these changes do not alter the core semantics of the input data; thus we still consider them semantics-preserving with respect to the classification task. Since such semantics-preserving changes often result in large ℓp-norms, they are likely to render existing ℓp-norm based defenses ineffective. In this paper, we consider a general attack framework in which attackers create adversarial examples by transforming the original inputs via a set of rules in a semantics-preserving manner. Unlike prior work (Rosenberg et al., 2018; Hu & Tan, 2018; Hosseini et al., 2017; Hosseini & Poovendran, 2018), which investigates specific adversarial settings, our paper extends the scope of attacks to general logical transformations: we unify the threat models into a powerful relational adversary, which can readily incorporate more complex input transformations. From the defense perspective, recent work has started to look beyond ℓp-norm constraints, including adversarial training (Grosse et al., 2017; Rosenberg et al., 2019; Lei et al., 2019), verification-loss regularization (Huang et al., 2019) and invariance-induced regularization (Yang et al., 2019). Adversarial training in principle can achieve high robust accuracy when the adversarial example in the training loop maximizes the loss. However, finding such adversarial examples is in general NP-hard (Katz et al., 2017), and we show in Sec 4 that it is even PSPACE-hard for the semantics-preserving attacks considered in this paper. Huang et al. (2019) and Yang et al. (2019) add regularizers that incorporate model robustness as part of the training objective. However, such regularization cannot be strictly enforced in training, and hence neither can the model robustness. These limitations leave models vulnerable to semantics-preserving attacks.
Normalize-and-Predict Learning Framework. This paper attempts to overcome the limitations of prior work by introducing a learning framework that guarantees robustness by design. In particular, we target a relational adversary, whose admissible manipulations are specified by a logical relation. A logical relation is a set of input pairs, each consisting of the source and target of an atomic, semantics-preserving transformation. We consider a strong adversary who can apply an arbitrary number of transformations. Our paper makes the following contributions towards the theoretical understanding of robust ML against relational adversaries:
1. We formally describe admissible adversarial manipulation using logical relations, and characterize the necessary and sufficient conditions for robustness to relational adversaries.
2. We propose normalize-and-predict (hereinafter abbreviated as N&P), a learning framework that first converts each input to a well-defined, unique normal form and then trains and classifies over the normalized inputs. We show that our framework has guaranteed robustness, and characterize the conditions for different levels of robustness-accuracy trade-off.
3. We compare N&P to the popular adversarial training framework, which directly optimizes for accuracy under attack. We show that N&P has the advantage of an explicit robustness guarantee and reduced training complexity, and in certain cases yields the same model accuracy as adversarial training. Motivated by the comparison, we propose a unified framework, which selectively normalizes over relations that tend to preserve model accuracy and adversarially trains over the rest. Our unified approach draws benefits from both frameworks.
We then apply our theoretical findings to malware detection and image classification.
For the former, we first formulate two types of common program transformation, (1) addition of redundant libraries and API calls and (2) substitution of equivalent API calls, as logical relations. Next, we instantiate our learning framework on these relations, and propose two generic relational adversarial attacks to probe the robustness of a model. Finally, we perform experiments on Sleipnir, a real-world WIN32 malware data set. For image classification, we reuse an attack method proposed in prior work (Hosseini & Poovendran, 2018), shifting of the hue in the HSV color space, which can be viewed as a specific instantiation of our attack framework. We then compare the accuracy and robustness of ResNet-32 (He et al., 2016), a common image classification model, trained with the unified framework against standard adversarial training on CIFAR-10 (Krizhevsky et al., 2009). The results on both tasks show that:
1. Attacks using addition and substitution suffice to evade existing ML malware detectors.
2. Our unified approach using input normalization and adversarial training achieves the highest robust accuracy among all baselines in malware detection. The drop in accuracy on clean inputs is small and the computation cost is lower than that of pure adversarial training.
3. When trained with the unified learning framework, ResNet-32 achieves similar clean accuracy but significantly higher robust accuracy than with adversarial training alone.
Finally, based on our theoretical and empirical results, we conclude that input normalization is vital to robust learning against relational adversaries. We believe techniques that improve the quality of normalization are a promising direction for future work.

2. RELATED WORK

Test-time attacks using adversarial examples have been extensively studied in the past several years. Research has shown that ML models are vulnerable to such attacks in a variety of application domains (Moosavi-Dezfooli et al., 2016; Chen et al., 2017; Papernot et al., 2017; Eykholt et al., 2018; Ebrahimi et al., 2018; Qin et al., 2019; Yang et al., 2020), including system security, where reliable defense is absolutely essential. For instance, Xu et al. (2017) use feature squeezing, which quantizes feature values in order to reduce the number of adversarial choices. However, their defense targets ℓp-norm adversaries and is thus inapplicable to relational attacks. Normalization is a technique that reduces the number of syntactically distinct instances. First introduced to network security in the early 2000s in the context of intrusion detection systems (Handley et al., 2001), it was later applied to malware detection (Christodorescu et al., 2007; Coogan et al., 2011; Bichsel et al., 2016; Salem & Banescu, 2016; Baumann et al., 2017). Our work addresses the open question of whether normalization is useful for ML under relational adversaries by investigating its impact on both model robustness and accuracy.

3. BACKGROUND

In this section, we first describe the learning task, then formalize the potential adversarial manipulation as logical relations, and finally derive the notion of robustness to relational adversaries.

Learning Task. We consider a data distribution D over an input space X and a categorical label space Y. We use boldface letters, e.g. x, for input vectors and y for the label. Given a hypothesis class H, the learner wants to learn a classifier f : X → Y in H that minimizes the risk over the data distribution. In non-adversarial settings, the learner solves min_{f∈H} E_{(x,y)∼D} ℓ(f, x, y), where ℓ is a loss function. For classification, ℓ(f, x, y) = 1(f(x) ≠ y).

Logical Relation. A relation R is a set of input pairs, where each pair (x, z) specifies a transformation of input x to output z. We write x →_R z iff (x, z) ∈ R. We write x →*_R z iff x = z or there exist z_0, z_1, ..., z_k (k > 0) such that x = z_0, z_i →_R z_{i+1} (0 ≤ i < k) and z_k = z. In other words, →*_R is the reflexive-transitive closure of →_R. We describe an example relation as follows:

Example 1 (Hue Shifting). Let x_h, x_s, x_v denote the hue, saturation and value components of an image x. In a hue shifting relation R, x →_R z iff z_h = (x_h + δ) % 1 for a scalar δ, z_s = x_s and z_v = x_v. Since hue is circular, i.e., a hue of 1 equals a hue of 0, we take the hue component modulo 1 to map z_h into [0, 1] (Appendix B gives the background on HSV).

In this paper, we also consider unions of relations. Notice that a finite union R of m relations R_1, ..., R_m is also a relation, and x →_R z iff x →_{R_i} z for some i ∈ {1, ..., m}. Table 1 compares the three learning pipelines: standard risk minimization trains with min_{f∈H} Σ_{(x,y)∈D} ℓ(f, x, y) and tests with f*(x); N&P trains with min_{f∈H} Σ_{(x,y)∈D} ℓ(f, N(x), y) and tests with f*(N(x)); adversarial training trains with min_{f∈H} max_{A(·)} Σ_{(x,y)∈D} ℓ(f, A(x), y) and tests with f*(x).

Threat Model. A test-time adversary replaces a clean test input x with an adversarially manipulated input A(x), where A(·) represents the attack algorithm.
We consider an adversary who wants to maximize the classification error rate: E_{(x,y)∼D} 1(f(A(x)) ≠ y). We assume white-box attacks¹, i.e. the adversary has full access to f, including its structure, model parameters and any defense mechanism in place. To maintain the malicious semantics, the adversarial input A(x) should belong to a feasible set T(x). In this paper, we focus on T(x) described by a relation. We consider a logical relation R that is known to both the learner and the adversary, and we define a relational adversary as follows.

Definition 1 (Relational adversary). An adversary is R-relational if T(x) = {z | x →*_R z}, i.e. each element of R represents an admissible transformation, and the adversary can apply an arbitrary number of transformations specified by R.

We can then define the robustness of a classifier f by how often its prediction is consistent under attack, and robust accuracy as the fraction of predictions that are both robust and accurate.

Definition 2 (Robustness and robust accuracy). Let Q(R, f, x) be the statement: ∀z ((x →*_R z) ⇒ f(x) = f(z)). Then a classifier f is robust at x if Q(R, f, x) is true, and the robustness of f to an R-relational adversary is E_{x∼D_X} 1[Q(R, f, x)], where 1[·] indicates the truth value of a statement and D_X is the marginal distribution over inputs. The robust accuracy of f w.r.t. an R-relational adversary is then E_{(x,y)∼D} 1[Q(R, f, x) ∧ f(x) = y].

Notice that the robust accuracy of a classifier never exceeds its robustness because of the extra requirement f(x) = y. Meanwhile, the classifier with the highest robust accuracy may not have the highest robustness and vice versa: an intuitive example is that a constant classifier is always robust but not necessarily robustly accurate. In Sec 4, we discuss both objectives and characterize the trade-off between them.
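For intuition, Definitions 1 and 2 can be checked mechanically on a finite toy input space: T(x) is the set of nodes reachable from x in the relational graph, and f is robust at x iff f is constant on T(x). A minimal sketch (the relation and classifiers below are invented for illustration, not from the paper's experiments):

```python
from collections import deque

def feasible_set(x, relation):
    """T(x): the reflexive-transitive closure of x under the relation
    (Definition 1), computed by breadth-first search."""
    reach, frontier = {x}, deque([x])
    while frontier:
        u = frontier.popleft()
        for (a, b) in relation:
            if a == u and b not in reach:
                reach.add(b)
                frontier.append(b)
    return reach

def is_robust_at(f, x, relation):
    """Q(R, f, x) from Definition 2: f gives one label on all of T(x)."""
    return len({f(z) for z in feasible_set(x, relation)}) == 1

# Toy relation on integers 0..3: each element may be rewritten to its successor mod 4.
R = {(i, (i + 1) % 4) for i in range(4)}
f_parity = lambda z: z % 2   # not robust: labels flip along the closure
f_const = lambda z: 0        # trivially robust (but not necessarily accurate)
```

As the constant classifier illustrates, robustness alone is cheap; the interesting objective is robust accuracy.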

4. N&P -A PROVABLY ROBUST LEARNING FRAMEWORK

In this section, we introduce N&P, a learning framework that learns and predicts over normalized training and test inputs. We first identify the necessary and sufficient condition for robustness, and propose a normalization procedure that makes N&P provably robust to R-relational adversaries. Finally, we analyze the performance of N&P: since N&P guarantees robustness, the analysis focuses on the robustness-accuracy trade-off and provides an in-depth understanding of the causes of this trade-off.

4.1. AN OVERVIEW OF THE N&P FRAMEWORK

In N&P, the learner first specifies a normalizer N : X → X. We call N(x) the 'normal form' of input x. The learner then both trains the classifier and predicts the test label over the normal forms instead of the original inputs. Let D denote the training set. In the empirical risk minimization learning scheme, for example, the learner now solves min_{f∈H} Σ_{(x,y)∈D} ℓ(f, N(x), y), and uses the minimizer f* as the classifier. At test time, the model predicts f*(N(x)). Table 1 compares the N&P learning pipeline to standard risk minimization and adversarial training.
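The N&P pipeline is only a thin wrapper around any base learner: normalize the training inputs, fit as usual, and normalize again at prediction time. A minimal sketch (the lookup-table 'learner' and the sign-invariance relation are hypothetical stand-ins):

```python
def train_normalized(train_fn, normalizer, data):
    """N&P (Sec 4.1): fit the base learner on normal forms N(x),
    and wrap prediction so test inputs are normalized the same way."""
    f = train_fn([(normalizer(x), y) for (x, y) in data])
    return lambda x: f(normalizer(x))

# Toy base learner: memorize one label per (normalized) input.
def table_learner(data):
    table = {x: y for x, y in data}
    return lambda x: table[x]

# Toy relation: x and -x are interchangeable, so N(x) = |x| is a valid normalizer.
clf = train_normalized(table_learner, abs, [(-1, 0), (1, 0), (2, 1)])
```

By construction, clf(x) == clf(-x) for every input, i.e. robustness holds regardless of how the base learner behaves.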

4.2. FINDING THE NORMALIZER

The normalizer N is crucial for achieving robustness: intuitively, if x and its adversarial example x_adv share the same normal form, then the prediction is robust. Meanwhile, a constant N is robust but has no utility, as f(N(·)) is also constant. Therefore, we seek an N that performs only the normalization necessary for robustness and has minimal impact on accuracy. We first construct the relational graph G_R = (V, E) of R: the vertex set V contains all elements of X; the edge set E contains an edge (x, z) iff (x, z) ∈ R. Then a directed path exists from x to z iff x →*_R z. We derive the necessary and sufficient condition for robustness under N&P in Observation 1, and thus obtain a normalizer N in Proposition 1 that guarantees robustness.

Observation 1 (Condition for robustness). Let C_1, ..., C_k denote the weakly connected components (WCCs) of G_R. A classifier f is robust for all x ∈ C_i iff f(x) returns the same label for all x ∈ C_i.

Proposition 1 (Choice of normalizer). Let N be a function that maps each input x ∈ C_i to one deterministic element of C_i. Then f(N(·)) is robust to R-relational adversaries.²
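For a finite input space, Proposition 1's normalizer can be realized by computing the WCCs of G_R with union-find and mapping every input to a fixed representative of its component, e.g. the smallest element. This is an illustrative sketch, not the domain-specific normalizers used in Sec 6:

```python
def wcc_normalizer(inputs, relation):
    """Return a normalizer N mapping each input to a deterministic
    representative of its weakly connected component in G_R (Prop. 1)."""
    # Union-find over the undirected version of the relational graph.
    parent = {x: x for x in inputs}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for a, b in relation:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    # Pick the minimum element of each component as its normal form.
    rep = {}
    for x in inputs:
        r = find(x)
        rep[r] = min(rep.get(r, x), x)
    return lambda x: rep[find(x)]

inputs = [0, 1, 2, 3, 4]
R = [(0, 1), (1, 2)]   # one component {0, 1, 2}; 3 and 4 are isolated
N = wcc_normalizer(inputs, R)
```

Note that N deliberately ignores edge direction: robustness by Observation 1 is a property of weakly connected components, even for one-way transformations.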

4.3. ROBUSTNESS-ACCURACY TRADE-OFF

Optimal Accuracy under N&P. Let µ(x) denote the probability mass of x. The label of an input x may also be probabilistic in nature, so we use η(x, l) = Pr(y = l | x) to denote the probability that x has label l.³ Then the optimal robust accuracy under N&P, denoted by Acc*_R, is Σ_{C_i} max_{l∈Y} Σ_{x∈C_i} µ(x)η(x, l), which is attained when f(N(x)) = argmax_{l∈Y} Σ_{x∈C_i} µ(x)η(x, l) for x ∈ C_i. Intuitively, f shall assign the most likely label of random samples in C_i to all x ∈ C_i.

Price of Robustness. In N&P, the optimal robust accuracy depends on R. We then observe the following fundamental robustness-accuracy trade-off: as the relation becomes more complicated, we may lose accuracy by enforcing invariant model predictions, and this loss is the price of robustness.

Observation 2 (Robustness-accuracy trade-off). Let R′ and R be two relations s.t. R′ = R ∪ {(x, z)}, i.e. R′ allows one extra transformation from x to z compared to R. Let C_{x,R} denote the WCC of G_R that contains x, and l_C be the most likely label of inputs in a WCC C. Then Acc*_{R′} − Acc*_R ≤ 0 for all such R, R′ pairs, and equality holds only when l_{C_{x,R}} = l_{C_{z,R}}.

The intuition is that the extra edge in the relational graph may join two connected components which are otherwise separate. As a result, a model under N&P will predict the same label on both components, so the accuracy on one component drops if the two components have different labels. We further characterize three different levels of trade-off (Figure 1). First, if two inputs x, z have the same most likely label on D, then the optimal accuracy under N&P is the same as before normalization; in other words, robustness is obtained for free. Second, if both (x, z) and (z, x) are in R but x, z have different most likely labels, then the model with the highest natural accuracy, which predicts the most likely label of x and z respectively, has no robustness.
In contrast, N&P achieves the optimal robust accuracy by predicting a single label, the most likely label of samples in {x, z}, for both x and z. Third, if x can only be one-way transformed to two inputs z_1, z_2 with different most likely labels, then N&P may have suboptimal robust accuracy. An absolutely robust classifier needs to predict the same label for x, z_1 and z_2, while the classifier with the highest robust accuracy should predict the most likely labels for z_1 and z_2 if z_1, z_2 appear more frequently than x.
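The quantity Acc*_R is straightforward to evaluate on a toy distribution once the WCCs are known. The sketch below mirrors the right panel of Figure 1 (one-way transformations from x to z_1 and z_2, with all three labels distinct and deterministic); the numbers are the ones stated in the figure caption:

```python
def optimal_robust_accuracy(components, mu, eta):
    """Acc*_R = sum over WCCs C_i of max_l sum_{x in C_i} mu(x) * eta(x, l).

    components: list of lists of inputs (the WCCs of G_R)
    mu: dict mapping input -> probability mass
    eta: dict mapping (input, label) -> P(y = label | x)
    """
    labels = {l for (_, l) in eta}
    total = 0.0
    for C in components:
        total += max(sum(mu[x] * eta.get((x, l), 0.0) for x in C)
                     for l in labels)
    return total

# Right panel of Figure 1: x can reach z1 and z2; labels are deterministic
# and pairwise different, so merging the WCC forces a single shared label.
mu = {"x": 0.02, "z1": 0.49, "z2": 0.49}
eta = {("x", 0): 1.0, ("z1", 1): 1.0, ("z2", 2): 1.0}
```

Merging all three inputs into one component caps N&P's robust accuracy at 0.49, whereas with no relation the optimal accuracy is 1.0; the gap is exactly the price of robustness discussed above.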

5. COMPARING AND UNIFYING N&P WITH ADVERSARIAL TRAINING

N&P differs from adversarial training, the most widely acknowledged defense mechanism against test-time adversaries, in both its objective and its procedure. While each approach has its own limitations against relational adversaries, we show that they can complement each other and be unified into one framework that enjoys the benefits of both worlds.

Comparative Advantages. The performance of adversarial training depends on the quality of the adversarial examples. However, we show in Proposition 2 that the inner maximization problem is in general computationally infeasible for relational adversaries.

Proposition 2 (Hardness of inner maximization). The inner optimization problem of adversarial training is PSPACE-hard for relational adversaries.

Intuitively, the search space of a relational adversary can grow combinatorially with the number of transformations, and the proposition follows from classical results on reachability analysis in model checking (Kozen, 1977). The N&P framework, in contrast, solves a typical minimization problem, and thus reduces the computational complexity when an efficient normalizer exists. Meanwhile, we show in Appendix A.4 that robust accuracy can be achieved with a simpler model class on normalized inputs than on the original inputs; reduced model complexity may also improve the sample efficiency of the underlying learning algorithm. On the other hand, N&P may incur excessive loss in accuracy to enforce robustness, e.g. in the last scenario of Figure 1, in which case adversarial training is the better choice for overall utility.

A Unified Framework. Motivated by the above observations, we propose a unified framework: for a relation R, we strategically select a subset R′ ⊆ R over which to normalize inputs, and adversarially train on the normalized inputs. Let N_{R′} denote the normalizer for R′.
Formally, during training the learner solves min_{f∈H} max_{A(·)} Σ_{(x,y)∈D} ℓ(f, A(N_{R′}(x)), y) to obtain a minimizer f*, and predicts f*(N_{R′}(x)) at test time. The classifier f* is robust to an R′-relational adversary, and has potentially higher robust accuracy than N&P alone. In particular, if R′ is reversible by Definition 3, then our unified framework preserves the optimal robust accuracy, as shown in Theorem 1.

Definition 3. A relation R′ is reversible iff x →*_{R′} z implies z →*_{R′} x and vice versa.

Theorem 1 (Preservation of robust accuracy). Let f* be the classifier that minimizes the objective of our unified framework over data distribution D, and let f*_adv minimize the objective of adversarial training over D. Then, in principle, f*(N_{R′}(·)) and f*_adv have the same optimal robust accuracy if R′ is reversible.

The proof can be found in Appendix A.5. In essence, Theorem 1 is a generalization of the second scenario in Figure 1: we extend the principle applied to (x, z) to all pairs of inputs in the relational graph induced by R′. Note that reversible relations are also common: if z is x's adversarial example, then x is also likely to be an adversarial choice of z. Observation 2 and Theorem 1 provide a general guideline for selecting R′: choose the reversible subset of R first, and then consider transformations that cause little drop in optimal robust accuracy. Regarding the efficiency of normalization, we show in Appendix A.6 that the strongest adversarial example satisfies the requirement of Proposition 1, and thus can be used as the normal form. Therefore, in theory, N&P is at least as efficient as optimal adversarial training. In practice, the normalizers we use in our empirical evaluation are all more efficient than adversarial training.
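On a finite toy example, the unified objective can be written down directly: normalize over the reversible subset R′, then take the inner maximum over the residual transformations by brute force. The sign-flip and additive relations, the threshold classifier and the 0-1 loss below are all hypothetical illustrations:

```python
def unified_loss(f, x, y, normalize, residual_closure, loss):
    """Per-sample objective of the unified framework (Sec 5):
    first normalize over R', then take the worst case over the
    closure of the remaining, adversarially trained relation."""
    z = normalize(x)
    return max(loss(f, a, y) for a in residual_closure(z))

# Toy setup: R' is the reversible sign flip x <-> -x (normalized by abs);
# the residual additive relation may increment the input at most once.
f = lambda x: 1 if x >= 2 else 0          # threshold "detector"
loss01 = lambda f, a, y: int(f(a) != y)   # 0-1 loss
closure = lambda z: {z, z + 1}            # closure of the residual relation
```

Averaging unified_loss over a training set gives the outer minimization target; the point of normalizing first is that the inner maximum ranges only over the residual relation rather than all of R.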

6. EXPERIMENT

We now evaluate the effectiveness of our unified framework against relational attacks. In particular, we seek answers to the following questions: 1. Do relational attacks pose real threats to existing ML models? 2. How effective is our unified framework in enhancing robustness, and do the results corroborate the theory? We investigate these questions on two real-world tasks, malware detection and image classification. For each task, we identify relations that do not alter the essential semantics of the inputs. Our results show that the models obtained from our unified framework have the highest robust accuracy compared to adversarially trained models and unprotected models.

6.1. MALWARE DETECTION

We evaluate a malware detection task on Sleipnir, a data set containing Windows binary API-usage features of 34,995 malware and 19,696 benign software samples, extracted from their Portable Executable (PE) files using LIEF (Thomas, 2017). Detection is based exclusively on the API usage of a program. There are 22,761 unique API calls in the data set, so each PE file is represented by a binary indicator vector x ∈ {0, 1}^m, where m = 22,761. Note that this is the same encoding scheme adopted by Al-Dujaili et al. (2018). We sample 19,000 benign PEs and 19,000 malicious PEs to construct the training (60%), validation (20%) and test (20%) sets.

Existing ℓp-norm based attacks are not applicable to relational adversaries. Meanwhile, exhaustive search over adversarial choices may be computationally prohibitive. Therefore, we propose two heuristic attack algorithms, GREEDYBYGROUP and GREEDYBYGRAD, to measure models' robust accuracy. Both algorithms are greedy and iterative in nature; detailed descriptions are in Appendix C.2. GREEDYBYGROUP takes a test input vector x and a maximum number of iterations K. In each iteration, it partitions R into subsets R_1, ..., R_m, and finds the instance within the transitive closure of each R_i that maximizes the loss. The instances from all R_i are combined to create the new version of x_adv. Notice that the attack reduces to exact search if R is not partitioned. GREEDYBYGRAD takes a test input vector x, a maximum number m of transformations to apply in each iteration, and a maximum number of iterations K. In each iteration, it makes a first-order approximation of the change in test loss caused by each transformation, and then applies the transformations with the top-m approximated increases in test loss to create the new version of x_adv.

Relation and Attacks. The goal of the adversary is to evade the malware detector. A common strategy, also adopted by Al-Dujaili et al. (2018), is adding redundant API calls.
This strategy can be described by an additive relation: (x, z) ∈ R iff z is obtained by flipping some of x's feature values from 0 to 1. We also consider a new attack strategy, which substitutes API calls with functionally equivalent counterparts. This strategy can be described by an equivalence relation: (x, z) ∈ R iff z is obtained by changing some of x's feature values from 1 to 0 in conjunction with changing some of x's other feature values from 0 to 1. With expert knowledge, we extract nearly 2,000 equivalent API groups, described in Appendix C.3. We use three attack algorithms, GREEDYBYGRAD, GREEDYBYGROUP and the rfgsm_k additive attack of Al-Dujaili et al. (2018), and consider an attack successful if any algorithm fools the detector.

Model and Baselines. We compare four ML detectors. The Unified detector is realized using our unified framework in Sec 5: we normalize over the equivalence relation based on the functionally equivalent API groups, and then adversarially train over the additive relation. The Adv-Trained detector is adversarially trained with the best adversarial example generated by GREEDYBYGRAD and the rfgsm_k additive attack (Al-Dujaili et al., 2018), as GREEDYBYGROUP is too computationally expensive to be included in the training loop. We also include the model proposed by Al-Dujaili et al. (2018), which is adversarially trained against only the rfgsm_k additive attack, and a Natural detector trained only on clean inputs. Finally, all detectors using robust learning techniques have higher FPR than Natural, which is expected because of the inevitable robustness-accuracy trade-off. However, the difference is much smaller than the cost incurred under attack, and thus the trade-off is worthwhile.
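To make GREEDYBYGRAD concrete, the sketch below implements its core loop for the additive relation on binary feature vectors. This is our simplification of the full algorithm in Appendix C.2, and the linear 'detector' whose loss gradient equals its weight vector is a hypothetical stand-in:

```python
import numpy as np

def greedy_by_grad(x, grad_fn, m, K):
    """Sketch of GREEDYBYGRAD restricted to the additive relation
    (only 0 -> 1 flips are admissible).

    In each of up to K iterations, approximate the loss change of flipping
    each still-zero feature by its first-order term (the gradient entry),
    then apply the top-m flips with positive estimated increase.
    grad_fn(x) returns the gradient of the test loss w.r.t. the input.
    """
    x_adv = x.copy()
    for _ in range(K):
        g = grad_fn(x_adv)
        # Candidate flips: features currently 0; estimated gain = gradient entry.
        gain = np.where(x_adv == 0, g, -np.inf)
        idx = np.argsort(gain)[::-1][:m]
        idx = idx[gain[idx] > 0]   # only keep flips that increase the loss
        if idx.size == 0:
            break                  # no profitable transformation remains
        x_adv[idx] = 1
    return x_adv

# Toy linear "detector": loss = w . x, so the gradient is w itself.
w = np.array([0.5, -1.0, 2.0, 0.1])
x0 = np.array([0, 0, 0, 1])
adv = greedy_by_grad(x0, lambda x: w, m=1, K=3)
```

Because the relation is additive, the attack only ever adds features, so x_adv dominates x componentwise; the substitution relation would instead flip paired features within an equivalence group.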

6.2. IMAGE CLASSIFICATION

We evaluate the effectiveness of our unified framework on CIFAR-10, which contains 50,000 training and 10,000 test images of size 32×32 pixels. We randomly sample 5,000 images for validation and train on the remaining 45,000 images.

Relation and Attacks. We consider the relation induced by hue shifting, specified in Example 1. Due to the shape-bias property (Landau et al., 1988), humans can still correctly classify most images after an adjustment of color hue; therefore, we consider this relation semantics-preserving. The attacker uses a combination of ℓ∞ and relational attacks: it first shifts the color hue of the image, and then generates an ℓ∞ adversarial example using the PGD attack. For each image, the attacker tries different hue adjustments that evenly split the hue space. In addition, we consider an attacker that can also adjust the brightness and contrast of the image by a factor in [0.8, 1.2]. It tries 500 random combinations of hue, brightness and contrast adjustments, each followed by a PGD attack.

Model and Baselines. The Unified classifier is obtained with our unified framework in Sec 5: we adjust the hue of the input so that the pixel at the top-left corner has hue value 1, and then adversarially train against the PGD attack.⁴ We also consider two adversarial training baselines: the first uses the combined attack (PGD and hue adjustment) in training, while the second uses only the PGD attack. We train a ResNet-32 network for 100 epochs in all configurations, and pick the model with the lowest validation loss. We also run five different data splits.

Results. Table 3 shows the results against attackers using hue shifts and ℓ∞ perturbation. Although adversarial training against only the PGD attack has higher clean-input accuracy, the combined attack heavily reduces its test accuracy, again indicating the effectiveness of a simple relational attack on unprotected models.
Unified achieves the highest robust accuracy against the combined attack, at least 4.8% higher than adversarial training with the combined attack across all attack parameters. This result shows the advantage of normalization over reversible relations, as predicted by our analysis in Sec 5. In addition, Table 4 shows the results against an attacker using more transformations than the ones normalized in training. Our unified approach still achieves the highest accuracy, with a substantial margin over the baselines. Although the attacker may use more transformations, normalization can still reduce the search space of adversarial examples and increase robustness.
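The hue normalization used by the Unified classifier is cheap and, crucially, invariant under the attacker's hue shifts: any two images related by a hue shift map to the same normal form, which is the Proposition 1 property for this reversible relation. A sketch on a list of (h, s, v) pixels (we anchor the top-left hue at 0, which is equivalent modulo 1 to the value 1 used in the footnote, and omit the RGB-to-HSV conversion pipeline):

```python
def normalize_hue(hsv_pixels):
    """Shift every hue so the top-left pixel has hue 0 (Sec 6.2);
    hue arithmetic is modulo 1 because a hue of 1 equals a hue of 0."""
    delta = -hsv_pixels[0][0]
    return [((h + delta) % 1.0, s, v) for (h, s, v) in hsv_pixels]

def hue_shift(hsv_pixels, delta):
    """One atomic hue-shift transformation from Example 1."""
    return [((h + delta) % 1.0, s, v) for (h, s, v) in hsv_pixels]
```

Since (h + δ) − (h_0 + δ) ≡ h − h_0 (mod 1), normalize_hue(hue_shift(img, δ)) coincides with normalize_hue(img) for every δ, so a hue-shifting adversary cannot change the model's input.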

7. CONCLUSION AND FUTURE WORK

In this work, we take a first step towards robust learning against relational adversaries: we theoretically characterize the conditions for robustness and the sources of the robustness-accuracy trade-off, and propose a provably robust learning framework. Our empirical evaluation shows that a combination of input normalization and adversarial training can significantly enhance model robustness. For future work, we see automatic detection of semantics-preserving transformations as a promising addition to our current expert-knowledge approach, and plan to extend the normalization approach to other kinds of attacks beyond relational adversaries.



Footnotes:
1. We consider a strong white-box attacker to avoid interference from security by obscurity, which has been shown to be fragile in various other adversarial settings (Carlini & Wagner, 2017).
2. Appendix C.1 shows a decidable algorithm for realizing such an N given G_R.
3. For example, a ransomware and a zip tool may have the same static feature vector x. The label of a randomly drawn x is probabilistic, and the probability depends on the frequency with which each software appears.
4. Given an input image in RGB format, we first convert the image to HSV format, and then add a scalar to the hue of all pixels. The scalar is 1 − (the hue of the pixel in the first row and first column). The hue values are then projected back to the [0, 1] interval by taking the remainder modulo 1. Finally, we convert the image back to RGB for classification.



Figure 1: Relations with different robustness-accuracy trade-offs. Different node colors indicate different most likely labels. Appendix A.7 gives a detailed explanation of why semantics-preserving transformations can still change the labels of data. Left: N&P preserves natural accuracy. Middle: N&P preserves robust accuracy. Right: N&P causes suboptimal robust accuracy: suppose µ(x) = 0.02, µ(z_1) = µ(z_2) = 0.49, and η is deterministic. N&P predicts the same label for all three inputs and thus has accuracy at most 0.49, while the highest robust accuracy is 0.98, achieved by predicting the true labels of z_1 and z_2.

On the defense end, the work closest to ours in spirit is Yang et al. (2019), which adds invariance-induced regularizers to the training process. Their work differs from ours in two major ways. First, they consider a specific spatial-transformation attack in image classification, while we consider a general adversary based on logical relations. Second, their regularizer may not enforce model robustness on finite samples, as they are primarily interested in enhancing model accuracy.

Table 1: Comparison of training objective and test output for the standard risk minimization learning scheme, N&P and adversarial training; f* is the minimizer of the training objective.

Table 2: Malware Detection: False Negative Rate (FNR) and False Positive Rate (FPR) on Sleipnir.

Table 3: Image Classification: classification accuracy on CIFAR-10 with the same relation in training and testing. The first column specifies the attack parameters used at test time, in the form (ℓ∞-norm, PGD step size, PGD steps, number of hue shifts). The models are adversarially trained using (4/255, 2/255, 3, 20).

Table 4: Image Classification: classification accuracy on CIFAR-10, where the relation in training is a subset of the relation in testing. The attacker uses a 15-step PGD attack with ℓ∞-norm 4/255 and step size 2/255, and randomly samples 500 combinations of hue, brightness and contrast adjustment factors.

We use the same network architecture as Al-Dujaili et al. (2018), a fully-connected neural net with three hidden layers of 300 ReLU nodes each, to set up a fair comparison. We train each baseline to minimize the negative log-likelihood loss for 20 epochs, and pick the model with the lowest validation loss. We run five different data splits. Results. As Table 2 shows, relational attacks are overwhelmingly effective against detectors that are oblivious to potential transformations. Adversarial examples almost always (>99% FNR) evade the naturally trained model, and also evade the detector in Al-Dujaili et al. (2018) most of the time (>89% FNR), as it does not consider API substitution. On the defense end, Unified achieves the highest robust accuracy: the evasion rate (FNR) only increases by 0.5% on average. Adv-Trained comes second, but its evasion rate is still 22.1% higher. The evasion is mostly caused by GREEDYBYGROUP, the attack that is too computationally expensive to be included in the training loop. This result corroborates the theoretical advantage of N&P: its robustness guarantee is independent of the training algorithm.

