AN UPPER BOUND FOR THE DISTRIBUTION OVERLAP INDEX AND ITS APPLICATIONS

Abstract

The overlap index between two probability distributions has various applications in statistics, machine learning, and other scientific research. However, approximating the overlap index is challenging when the probability distributions are unknown (i.e., distribution-free settings). This paper proposes an easy-to-compute upper bound for the overlap index without requiring any knowledge of the distribution models. We first utilize the bound to find the upper limit for the accuracy of a trained machine learning model when a domain shift exists. We additionally employ this bound to study the distribution membership classification of given samples. Specifically, we build a novel, distribution-free, computation-efficient, memory-efficient one-class classifier by converting the bound into a confidence score function. The proposed classifier does not need to train any parameters and is empirically accurate with only a small number of in-class samples. The classifier shows its efficacy and outperforms many state-of-the-art methods on various datasets in different one-class classification scenarios, including novelty detection, out-of-distribution detection, and backdoor detection. The obtained results show significant promise toward broadening the applications of overlap-based metrics.

1. INTRODUCTION

The distribution overlap index is the area shared by two probability density functions (e.g., Fig. 1(a)) and measures the similarity between the two distributions. A high overlap index value implies a high similarity. Although the overlap index has applications in many areas, such as biology (Langøy et al., 2012; Utne et al., 2012), economics (Milanovic & Yitzhaki, 2002), and statistics (Inman & Bradley Jr, 1989), the literature on approximating it under distribution-free settings is thin. This work proposes an upper bound for the overlap index under distribution-free settings to broaden the potential applications of overlap-based metrics. The bound is easy to compute and contains three terms: a constant, the norm of the difference between the two distributions' means, and a variation distance between the two distributions over a subset. Beyond the bound itself, we explore two applications of it, as discussed below.

One application of our bound is domain shift analysis. A domain shift is a change in the data distribution between a model's training dataset and the testing dataset encountered during implementation (i.e., the overlap index between the distributions of the two datasets is less than 1). We express the model's testing accuracy in terms of the overlap index between the distributions of the training and testing datasets, and then obtain an upper limit on the accuracy from our bound for the overlap index. Knowing the upper bound for a model's testing accuracy helps measure the model's potential and compare it with other models. We validated the calculated upper limit with experiments on backdoor attacks.

Another application of our bound is one-class classification.
Specifically, one-class classification refers to a model that outputs positive for in-class samples and negative for out-class samples that are absent, poorly sampled, or not well defined (e.g., Fig. 1(b)). We propose a novel one-class classifier by converting our bound into a confidence score function that evaluates whether a sample is in-class or out-class. The proposed classifier has several advantages. Implementing deep neural network-based classifiers requires training thousands of parameters and a large amount of memory, whereas our classifier requires neither: it only needs sample norms to calculate the confidence score. Moreover, deep neural network-based classifiers need relatively large amounts of data to avoid under-fitting or over-fitting, whereas our method is empirically accurate with only a small number of in-class samples. Therefore, our classifier is computation-efficient, memory-efficient, and data-efficient. Additionally, compared with traditional one-class classifiers, such as the Gaussian distribution-based classifier, the Mahalanobis distance-based classifier (Lee et al., 2018), and the one-class support vector machine (Schölkopf et al., 2001), our classifier is distribution-free, explainable, and easy to understand.

Overall, the contributions of this paper include:
• Finding a distribution-free upper bound for the overlap index.
• Applying this bound to the domain shift analysis problem with experiments.
• Proposing a novel one-class classifier with the bound being the confidence score function.
• Evaluating the proposed one-class classifier through comparison with various state-of-the-art methods on several datasets, including UCI datasets, CIFAR-100, sub-ImageNet, etc., and in different one-class classification scenarios, such as novelty detection, out-of-distribution detection, and neural network backdoor detection.

1.1. BACKGROUND AND RELATED WORKS

Measuring the similarity between distributions: Gini & Livada (1943) and Weitzman (1970) introduced the concept of the distribution overlap index. Other measurements for the similarity between distributions include the total variation distance, Kullback-Leibler divergence (Kullback & Leibler, 1951), Bhattacharyya's distance (Bhattacharyya, 1943), and Hellinger distance (Hellinger, 1909). In psychology, the definitions of some effect size measures involve the concept of the distribution overlap index, such as Cohen's U index (Cohen, 2013), McGraw and Wong's CL measure (McGraw & Wong, 1992), and Huberty's I degree of non-overlap index (Huberty & Lowman, 2000). However, they all make strong distribution assumptions (e.g., symmetry or unimodality) regarding the overlap index. Pastore & Calcagnì (2019) approximate the overlap index via kernel density estimators.

One-class classification: Moya & Hush (1996) coined the term one-class classification. One-class classification intersects with novelty detection, anomaly detection, out-of-distribution detection, and outlier detection. Yang et al. (2021) explain the differences among these detection areas. Khan & Madden (2014) discuss many traditional non-neural-network one-class classifiers, such as the one-class support vector machine (Schölkopf et al., 2001), decision tree (Comité et al., 1999), and one-class nearest neighbor (Tax, 2002). Two neural network-based one-class classifiers are the method of Ruff et al. (2018) and OCGAN (Perera et al., 2019). Morteza & Li (2022) introduce a Gaussian mixture-based energy measurement and compare it with several other score functions for one-class classification, including the maximum softmax score (Hendrycks & Gimpel, 2017), maximum Mahalanobis distance (Lee et al., 2018), and energy score (Liu et al., 2020a).

Neural network backdoor attack and detection: Gu et al. (2019) and Liu et al. (2018b) introduced the concept of the neural network backdoor attack.
The attack contains two steps: during training, the attacker injects triggers into the training dataset; during testing, the attacker leads the network to misclassify by presenting the triggers (e.g., Fig. 1(c)). The data poisoning attack (Biggio et al., 2012) and adversarial attack (Goodfellow et al., 2014) overlap with the backdoor attack. Some proposed trigger types are Wanet (Nguyen & Tran, 2021), invisible sample-specific (Li et al., 2021), smooth (Zeng et al., 2021), and reflection (Liu et al., 2020b). Methods for protecting neural networks from backdoor attacks include neural cleanse (Wang et al., 2019), fine-pruning (Liu et al., 2018a), and STRIP (Gao et al., 2019). NNoculation (Veldanda et al., 2021) and RAID (Fu et al., 2022) utilize online samples to improve their detection methods. The backdoor detection problem also intersects with one-class classification. Therefore, some one-class classifiers can detect poisoned samples against the neural network backdoor attack.

Organization of the Paper: We first provide preliminaries and derive the proposed upper bound for the overlap index in Sec. 2. We next apply our bound to domain shift analysis in Sec. 3. We then propose, analyze, and evaluate our novel one-class classifier in Sec. 4. We finally conclude the paper in Sec. 5.

2. AN UPPER BOUND FOR THE OVERLAP INDEX

2.1. PRELIMINARIES

For simplicity, we consider the $\mathbb{R}^n$ space and continuous random variables. We also define $P$ and $Q$ as two probability distributions in $\mathbb{R}^n$ with $f_P$ and $f_Q$ being their probability density functions.

Definition 1 (Overlap Index). The overlap $\eta : \mathbb{R}^n \times \mathbb{R}^n \to [0, 1]$ of the two distributions is defined as

$$\eta(P, Q) = \int_{\mathbb{R}^n} \min[f_P(x), f_Q(x)]\,dx. \tag{1}$$

Definition 2 (Total Variation Distance). The total variation distance $\delta : \mathbb{R}^n \times \mathbb{R}^n \to [0, 1]$ of the two distributions is defined as

$$\delta(P, Q) = \frac{1}{2} \int_{\mathbb{R}^n} |f_P(x) - f_Q(x)|\,dx. \tag{2}$$

Definition 3 (Variation Distance on Subsets). Given a subset $A$ of $\mathbb{R}^n$, we define $\delta_A : \mathbb{R}^n \times \mathbb{R}^n \to [0, 1]$ to be the variation distance of the two distributions on $A$, which is

$$\delta_A(P, Q) = \frac{1}{2} \int_{A} |f_P(x) - f_Q(x)|\,dx. \tag{3}$$

Remark 1. One can prove that $\eta$ and $\delta$ satisfy the following equation:

$$\eta(P, Q) = 1 - \delta(P, Q) = 1 - \delta_A(P, Q) - \delta_{\mathbb{R}^n \setminus A}(P, Q). \tag{4}$$

The quantity $\delta_A$ defined in (3) will play an important role in deriving our upper bound for $\eta$.
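As a small numerical illustration of Definitions 1 to 3 and Remark 1, the sketch below evaluates $\eta$, $\delta$, and $\delta_A$ on a grid for two one-dimensional Gaussians. The choice of Gaussians, the grid, the subset $A = \{x : |x| \le 1\}$, and the helper names `gaussian_pdf` and `riemann_sum` are assumptions made only for this example; they are not part of the paper's method.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian (illustrative choice of distribution)."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def riemann_sum(values, step):
    """Approximate an integral by a Riemann sum on a uniform grid."""
    return float(np.sum(values) * step)

# Uniform grid wide enough that both densities are negligible outside it.
x = np.linspace(-12.0, 12.0, 240001)
step = x[1] - x[0]
f_p = gaussian_pdf(x, 0.0, 1.0)  # P = N(0, 1)
f_q = gaussian_pdf(x, 1.0, 1.0)  # Q = N(1, 1)

# Definition 1: overlap index.
eta = riemann_sum(np.minimum(f_p, f_q), step)
# Definition 2: total variation distance.
delta = 0.5 * riemann_sum(np.abs(f_p - f_q), step)
# Definition 3: variation distance on A = {|x| <= 1} and on its complement.
in_a = np.abs(x) <= 1.0
delta_a = 0.5 * riemann_sum(np.abs(f_p - f_q)[in_a], step)
delta_ac = 0.5 * riemann_sum(np.abs(f_p - f_q)[~in_a], step)

print(eta, 1.0 - delta, 1.0 - delta_a - delta_ac)
```

Up to discretization error, the three printed values coincide, which is the identity stated in Remark 1.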

2.2. THE UPPER BOUND FOR THE OVERLAP INDEX

We now proceed with deriving our proposed upper bound.

Theorem 1. Without loss of generality, assume $D^+$ and $D^-$ are two probability distributions on a bounded domain $B \subset \mathbb{R}^n$ with defined norm $\|\cdot\|$ (see footnote 1), i.e., $\sup_{x \in B} \|x\| < +\infty$. Then, for any subset $A \subset B$ with its complementary set $A^c = B \setminus A$, we have

$$\eta(D^+, D^-) \le 1 - \frac{1}{2 r_{A^c}} \|\mu_{D^+} - \mu_{D^-}\| - \frac{r_{A^c} - r_A}{r_{A^c}}\, \delta_A, \tag{5}$$

where $r_A = \sup_{x \in A} \|x\|$, $r_{A^c} = \sup_{x \in A^c} \|x\|$, $\mu_{D^+}$ and $\mu_{D^-}$ are the means of $D^+$ and $D^-$, and $\delta_A$ is the variation distance on $A$ (Definition 3). Moreover, with $r_B = \sup_{x \in B} \|x\|$, we have

$$\eta(D^+, D^-) \le 1 - \frac{1}{2 r_B} \|\mu_{D^+} - \mu_{D^-}\| - \frac{r_B - r_A}{r_B}\, \delta_A. \tag{6}$$

Since (6) holds for any $A$, a tighter bound can be written as

$$\eta(D^+, D^-) \le 1 - \frac{1}{2 r_B} \|\mu_{D^+} - \mu_{D^-}\| - \max_{A} \frac{r_B - r_A}{r_B}\, \delta_A. \tag{7}$$

Proof. Let $f_{D^+}$ and $f_{D^-}$ be the probability density functions for $D^+$ and $D^-$. From (4), we have

$$\eta(D^+, D^-) = 1 - \delta_A(D^+, D^-) - \delta_{A^c}(D^+, D^-). \tag{8}$$

Using (8), triangular inequality, and boundedness, we obtain

$$\|\mu_{D^+} - \mu_{D^-}\| = \left\| \int_B x \left(f_{D^+}(x) - f_{D^-}(x)\right) dx \right\| \le \int_B \left\| x \left(f_{D^+}(x) - f_{D^-}(x)\right) \right\| dx \tag{9}$$

$$= \int_A \|x\| \cdot |f_{D^+}(x) - f_{D^-}(x)|\,dx + \int_{A^c} \|x\| \cdot |f_{D^+}(x) - f_{D^-}(x)|\,dx \tag{10}$$

$$\le 2 r_A \delta_A + 2 r_{A^c} \delta_{A^c} = 2 r_A \delta_A + 2 r_{A^c} \left(1 - \delta_A - \eta(D^+, D^-)\right), \tag{11}$$

which implies (5). Replacing $r_{A^c}$ with $r_B$ in (11) implies (6).

Remark 2. The only assumption in this theorem is that the probability distribution domain is bounded. However, almost all real-world applications satisfy the boundedness assumption since the data is bounded. Therefore, $r_B$ can always be found (or at least a reasonable approximation can be found). Additionally, we can constrain $A$ to be a bounded ball so that $r_A$ is also known. Although the proof of this theorem involves probability density functions, the computation does not require knowing the probability density functions but only finite samples because we can use the law of large numbers to estimate $\|\mu_{D^+} - \mu_{D^-}\|$ and $\delta_A$, which will be shown next.
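For readers who want the algebra behind "which implies (5)" spelled out, the rearrangement of (11) can be written as follows. This is only a worked restatement of the step in the proof, under the additional assumption that $r_{A^c} > 0$ so that the division is valid.

```latex
\begin{align*}
\|\mu_{D^+}-\mu_{D^-}\|
  &\le 2 r_A \delta_A + 2 r_{A^c}\bigl(1 - \delta_A - \eta(D^+, D^-)\bigr) \\
\Longrightarrow\quad
2 r_{A^c}\,\eta(D^+, D^-)
  &\le 2 r_{A^c} - \|\mu_{D^+}-\mu_{D^-}\| - 2\,(r_{A^c} - r_A)\,\delta_A \\
\Longrightarrow\quad
\eta(D^+, D^-)
  &\le 1 - \frac{1}{2 r_{A^c}}\,\|\mu_{D^+}-\mu_{D^-}\|
       - \frac{r_{A^c} - r_A}{r_{A^c}}\,\delta_A .
\end{align*}
```

The same steps with $r_B$ in place of $r_{A^c}$ give (6), because $r_{A^c} \le r_B$ implies $2 r_{A^c} \delta_{A^c} \le 2 r_B \delta_{A^c}$ in (11).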

2.3. APPROXIMATING THE BOUND WITH FINITE SAMPLES

Let $g : B \to \{0, 1\}$ be a condition function (see footnote 2) and define $A = \{x \mid g(x) = 1, x \in B\}$. According to the definition of $\delta_A$ and the triangle inequality, we have

$$\delta_A(D^+, D^-) = \frac{1}{2} \int_A |f_{D^+}(x) - f_{D^-}(x)|\,dx \ge \frac{1}{2} \left| \int_A \left(f_{D^+}(x) - f_{D^-}(x)\right) dx \right| \tag{12}$$

$$= \frac{1}{2} \left| \int_B f_{D^+}(x) g(x)\,dx - \int_B f_{D^-}(x) g(x)\,dx \right| = \frac{1}{2} \left| E_{D^+}[g] - E_{D^-}[g] \right|. \tag{13}$$

Applying (13) to Theorem 1 gives the bound of Corollary 1:

$$\eta(D^+, D^-) \le 1 - \frac{1}{2 r_B} \|\mu_{D^+} - \mu_{D^-}\| - \max_{g} \frac{r_B - r_{A(g)}}{2 r_B} \left| E_{D^+}[g] - E_{D^-}[g] \right|. \tag{14}$$

Given several condition functions $\{g_j\}_{j=1}^k$ and finite sample sets (i.e., $\{x_i^+\}_{i=1}^n \sim D^+$ and $\{x_i^-\}_{i=1}^m \sim D^-$), Alg. 1 shows how to compute the RHS of (14).

Algorithm 1 ComputeBound($\{x_i^+\}_{i=1}^n$, $\{x_i^-\}_{i=1}^m$, $\{g_j\}_{j=1}^k$)
  $B \leftarrow \{x_1^+, x_2^+, \ldots, x_n^+, x_1^-, x_2^-, \ldots, x_m^-\}$ and $r_B \leftarrow \max_{x \in B} \|x\|$
  $\Delta_\mu \leftarrow \left\| \frac{1}{n} \sum_{i=1}^n x_i^+ - \frac{1}{m} \sum_{i=1}^m x_i^- \right\|$
  for $j = 1 \to k$ do
    $A = \{x \mid g_j(x) = 1, x \in B\}$ and $r_A \leftarrow \max_{x \in A} \|x\|$
    $s_j \leftarrow \left(1 - \frac{r_A}{r_B}\right) \left| \frac{1}{n} \sum_{i=1}^n g_j(x_i^+) - \frac{1}{m} \sum_{i=1}^m g_j(x_i^-) \right|$
  end for
  Return: $1 - \frac{1}{2 r_B} \Delta_\mu - \frac{1}{2} \max_j s_j$

Remark 3. The choice of condition functions is not unique. In this work, we use the indicator function $g(x) = 1\{\|x\| \le r\}$, which outputs 1 if $\|x\| \le r$ and 0 otherwise. By setting different values for $r$, we generate a family of condition functions. The motivation for choosing this form of indicator function is that it is the simplest way to separate a space nonlinearly, and it saves computation because $r$ can be applied directly in Corollary 1. However, other indicator functions, such as RBF kernel-based indicator functions, are worth exploring and will be considered in future work.
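Algorithm 1 with the norm-ball indicators of Remark 3 can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' released code: the function name `compute_bound`, the use of the L2 norm, the `radii` argument (one radius $r_j$ per condition function), and the handling of an empty set $A$ or of $r_B = 0$ are choices made for this example.

```python
import numpy as np

def compute_bound(x_pos, x_neg, radii):
    """Sample estimate of the RHS of (14) with g_j(x) = 1{||x|| <= r_j}.

    x_pos, x_neg: arrays of shape (n, d) and (m, d); radii: iterable of r_j.
    """
    x_pos = np.atleast_2d(np.asarray(x_pos, dtype=float))
    x_neg = np.atleast_2d(np.asarray(x_neg, dtype=float))
    norms_pos = np.linalg.norm(x_pos, axis=1)
    norms_neg = np.linalg.norm(x_neg, axis=1)
    all_norms = np.concatenate([norms_pos, norms_neg])

    r_b = all_norms.max()  # r_B over the pooled sample set B
    if r_b == 0.0:
        return 1.0  # assumption: all samples at the origin, nothing to separate

    # Norm of the difference between the two sample means.
    delta_mu = np.linalg.norm(x_pos.mean(axis=0) - x_neg.mean(axis=0))

    best_s = 0.0  # every s_j is non-negative, so 0 is a safe starting value
    for r in radii:
        inside = all_norms <= r
        # r_A over the pooled samples that satisfy g_j; 0 if A is empty.
        r_a = all_norms[inside].max() if inside.any() else 0.0
        gap = abs(np.mean(norms_pos <= r) - np.mean(norms_neg <= r))
        best_s = max(best_s, (1.0 - r_a / r_b) * gap)

    return float(1.0 - delta_mu / (2.0 * r_b) - 0.5 * best_s)
```

For example, `compute_bound([[0.5, 0.0]], [[2.0, 0.0]], [1.0])` evaluates $1 - 1.5/4 - 0.5 \cdot 0.75$, while two identical sample sets give a bound of 1.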

3. APPLICATION OF OUR BOUND TO DOMAIN SHIFT ANALYSIS

We now apply our bound to domain shift analysis, as stated in Theorem 2.

Remark 4. To prove Theorem 2, let $f_D$ and $f_{D^*}$ be the probability density functions of $D$ and $D^*$; then

$$\text{Accuracy} = \int_{x \sim D^*} \left[ p\, \frac{\min\{f_D(x), f_{D^*}(x)\}}{f_{D^*}(x)} + q \left(1 - \frac{\min\{f_D(x), f_{D^*}(x)\}}{f_{D^*}(x)}\right) \right] f_{D^*}(x)\,dx \tag{15}$$

$$= p\,\eta(D, D^*) + q\,(1 - \eta(D, D^*)) = (p - q)\,\eta(D, D^*) + q. \tag{16}$$

Without loss of generality, assume $p > q$. Then (16) shows that a large domain shift (i.e., a small $\eta(D, D^*)$) leads to low overall accuracy of the model on the testing data distribution $D^*$. If $p = 1$ and $q = 0$, then the overall accuracy is equal to $\eta(D, D^*)$.

Theorem 2 in Backdoor Attack Scenarios: A backdoor attack scenario (Fig. 1(c)) considers that the model has zero accuracy on the poisoned data distribution because the attack success rate is almost 100%. Define the clean data distribution as $D$, the poisoned data distribution as $D^p$, and a testing data distribution $D^*$ composed of $D$ and $D^p$, i.e., $D^* = \sigma D + (1 - \sigma) D^p$, where $\sigma \in [0, 1]$ is the purity ratio (i.e., the ratio of clean samples to the entire testing samples). With these settings, $q = 0$ on $D^p$ and (16) becomes

$$\text{Accuracy} = p\,\eta(D, D^*) \le p \left(1 - \frac{1}{2 r_B} \|\mu_D - \mu_{D^*}\| - \max_g \frac{r_B - r_{A(g)}}{2 r_B} \left|E_D[g] - E_{D^*}[g]\right| \right) \tag{17}$$

$$= p \left(1 - \frac{1 - \sigma}{2 r_B} \|\mu_D - \mu_{D^p}\| - (1 - \sigma) \max_g \frac{r_B - r_{A(g)}}{2 r_B} \left|E_D[g] - E_{D^p}[g]\right| \right). \tag{18}$$

(17) shows that the actual model accuracy on the contaminated testing data is bounded by the product of its accuracy $p$ on clean data and our sample-based upper bound for $\eta(D, D^*)$. (18) shows that the upper limit of the model accuracy on the contaminated data increases linearly with the purity ratio $\sigma$ (i.e., the percentage of clean samples over the entire testing samples).

Validating Theorem 2 in Backdoor Attack Scenarios: The considered datasets are MNIST (LeCun et al., 2010), GTSRB (Stallkamp et al., 2011), YouTube Face (Wolf et al., 2011), and sub-ImageNet (Deng et al., 2009).
We composed the testing datasets $D^*$ with $\sigma = 0, 0.1, \ldots, 0.9, 1$ and calculated the upper bound for $\eta(D, D^*)$ using $L_1$, $L_2$, and $L_\infty$ norms in the raw data space, model output space, and intermediate layer space. The actual model accuracy and corresponding upper bounds are plotted in Fig. 2. The actual model accuracy is below all the calculated upper limits, validating Theorem 2. Additionally, the upper limits grow linearly with $\sigma$, supporting (18). In scenarios other than backdoor attacks, the model accuracy, $p$, and $q$ may be known; the lower bound for $\eta$ can then be estimated by Theorem 2. Theorem 2 can also help in finding $q$ when $p$, $\eta$, and the model accuracy are known. Therefore, Theorem 2 has practical relevance and usefulness.
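The two formulas used in this section are simple enough to check numerically. The sketch below encodes (16) and the right-hand side of (18); the function names and the argument `shift_term` (the sum of the mean-difference term and the condition-function term between $D$ and $D^p$, i.e., the quantity multiplied by $1 - \sigma$ in (18)) are hypothetical names introduced for this illustration, and the numbers in the usage lines are made up rather than taken from the experiments.

```python
def accuracy_from_overlap(p, q, eta):
    """Overall accuracy on D* as given by (16): (p - q) * eta + q."""
    return (p - q) * eta + q

def backdoor_accuracy_limit(p, sigma, shift_term):
    """Right-hand side of (18), with the bracketed shift between D and D^p
    passed in as a single pre-computed number (hypothetical argument)."""
    return p * (1.0 - (1.0 - sigma) * shift_term)

# Illustrative values only: clean accuracy 0.98 and a shift term of 0.6.
limits = [backdoor_accuracy_limit(0.98, s / 10.0, 0.6) for s in range(11)]
print(limits[0], limits[5], limits[10])
```

The printed limits rise linearly from the fully poisoned case ($\sigma = 0$) to the clean accuracy $p$ at $\sigma = 1$, which is the behavior that (18) predicts.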

4.1. PROBLEM FORMULATION FOR ONE-CLASS CLASSIFICATION

Given R d space and n samples {x i } n i=1 that lie in an unknown probability distribution, we would like to build a test Ψ : R d → {±1} so that for any new input x, Ψ(x) outputs 1 when x is from the same unknown probability distribution, and outputs -1, otherwise. Some applications of Ψ are novelty detection, out-of-distribution detection, and backdoor detection (e.g., Fig. 1(b, c )).

4.2. A NOVEL CONFIDENCE SCORE FUNCTION

Given some in-class samples $\{x_i\}_{i=1}^n$, one can pick several condition functions $\{g_j\}_{j=1}^k$, where $g_j(x) = 1\{\|x\| \le r_j\}$ for different $r_j$, so that $f(x) = \text{ComputeBound}(\{x\}, \{x_i\}_{i=1}^n, \{g_j\}_{j=1}^k)$, defined in Alg. 1, serves as a confidence score for a new input $x$.

Remark 5. The score function $f$ measures the maximum similarity between the new input $x$ and the available in-class samples $\{x_i\}_{i=1}^n$. Different thresholds $T_0$ (Alg. 2) lead to different detection accuracy. However, we will show that the proposed one-class classifier has an overall high accuracy under different $T_0$.
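The thresholding rule of Alg. 2 can be sketched on top of a compact version of ComputeBound. As before, this is a hedged illustration: `compute_bound`, `classify`, the L2 norm, and the explicitly supplied `radii` and `threshold` (playing the roles of $r_j$ and $T_0$) are assumptions of the example, and the paper does not prescribe how the radii or the threshold should be chosen here.

```python
import numpy as np

def compute_bound(x_pos, x_neg, radii):
    """Compact sample estimate of the RHS of (14) with norm-ball indicators."""
    x_pos = np.atleast_2d(np.asarray(x_pos, dtype=float))
    x_neg = np.atleast_2d(np.asarray(x_neg, dtype=float))
    norms_pos = np.linalg.norm(x_pos, axis=1)
    norms_neg = np.linalg.norm(x_neg, axis=1)
    all_norms = np.concatenate([norms_pos, norms_neg])
    r_b = all_norms.max()
    if r_b == 0.0:
        return 1.0  # assumption: degenerate case with all samples at the origin
    delta_mu = np.linalg.norm(x_pos.mean(axis=0) - x_neg.mean(axis=0))
    best_s = 0.0
    for r in radii:
        inside = all_norms <= r
        r_a = all_norms[inside].max() if inside.any() else 0.0
        gap = abs(np.mean(norms_pos <= r) - np.mean(norms_neg <= r))
        best_s = max(best_s, (1.0 - r_a / r_b) * gap)
    return float(1.0 - delta_mu / (2.0 * r_b) - 0.5 * best_s)

def classify(x, in_class, radii, threshold):
    """Return +1 (in-class) if the confidence score reaches the threshold, else -1."""
    score = compute_bound(np.atleast_2d(x), in_class, radii)
    return 1 if score >= threshold else -1

# Toy in-class set: four unit vectors in the plane (illustrative data only).
in_class = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
print(classify([0.0, 1.0], in_class, radii=[0.5, 1.0], threshold=0.3))
print(classify([10.0, 0.0], in_class, radii=[0.5, 1.0], threshold=0.3))
```

In this toy setting the point on the unit circle receives a score of 0.5 and the distant point a score of 0.05, so a threshold of 0.3 separates them.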

4.3. COMPUTATION AND SPACE COMPLEXITIES

Our algorithm can pre-compute and store 1 n n i=1 x i and 1 n n i=1 g j (x i ) with j = 1, 2, ..., k. Therefore, the total space complexity is O(k + 1). Assume that the total number of new online inputs is l; then, for every new input x, our algorithm needs to calculate ||x|| once and s j for k times. Therefore, the total computation complexity is O(l(k + 1)). Empirically, we restricted k to be a small number (e.g., 10) so that even devices without strong computation power can run our algorithm efficiently. Therefore, our classifier is computation-efficient and memory-efficient.
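The pre-computation described above can be sketched as a small class that caches the in-class mean and the $k$ averages of $g_j$, then scores each new input with $O(k)$ work after one norm computation. Two simplifications are assumptions of this sketch rather than statements from the paper: it also caches the largest in-class norm so that $r_B$ can be updated per input, and it uses the radius $r_j$ (capped at $r_B$) in place of $r_A$, in the spirit of Remark 3, which can only loosen the resulting bound. The class and attribute names are hypothetical.

```python
import numpy as np

class PrecomputedScorer:
    """Caches in-class statistics once; scores each new input in O(k)."""

    def __init__(self, in_class, radii):
        in_class = np.asarray(in_class, dtype=float)
        norms = np.linalg.norm(in_class, axis=1)
        self.radii = np.asarray(radii, dtype=float)  # must be non-empty
        self.mean = in_class.mean(axis=0)            # (1/n) * sum of x_i
        # (1/n) * sum of g_j(x_i) for every radius r_j.
        self.frac_inside = (norms[:, None] <= self.radii[None, :]).mean(axis=0)
        self.max_norm = norms.max()                  # extra cached scalar (assumption)

    def score(self, x):
        x = np.asarray(x, dtype=float)
        x_norm = np.linalg.norm(x)
        r_b = max(x_norm, self.max_norm)
        if r_b == 0.0:
            return 1.0  # assumption: everything sits at the origin
        r_a = np.minimum(self.radii, r_b)  # r_j used in place of r_A
        gaps = np.abs((x_norm <= self.radii).astype(float) - self.frac_inside)
        s = (1.0 - r_a / r_b) * gaps
        return float(1.0 - np.linalg.norm(x - self.mean) / (2.0 * r_b) - 0.5 * s.max())

# Toy in-class set: four unit vectors in the plane (illustrative data only).
scorer = PrecomputedScorer([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]], [0.5, 1.0])
print(scorer.score([0.0, 1.0]), scorer.score([10.0, 0.0]))
```

The per-input work is one vector norm, one distance to the cached mean, and $k$ scalar comparisons, which matches the $O(l(k + 1))$ count given above for $l$ inputs.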

4.4. EVALUATION

Overall Setup: All the baseline algorithms with optimal hyperparameters, related datasets, and models were acquired from the corresponding authors' websites. The only exception is backdoor detection, in which we created our own models. However, we have carefully fine-tuned the baseline methods' hyperparameters to ensure their best performance over other hyperparameter choices. Our approach used ten indicator functions for all the experiments.

Figure 3: The methods are listed in the same order as in Table 4.

4.4.1. ONE-CLASS CLASSIFICATION FOR NOVELTY DETECTION

We evaluated our classifier on 100 small UCI datasets (UCI; Dua & Graff, 2017) and recorded the area under the receiver operating characteristic curve (AUROC). Fig. 3 shows the mean and standard deviation of AUROC for our classifier and the other classifiers. Detailed numerical results can be found in Table 4 in Appendix A. The implementation code is provided in the supplementary material. Our classifier is the most consistent, with the smallest standard deviation. Except for Gaussian, K-Nearest Neighbor, and Minimum Covariance Determinant, our classifier outperforms the other methods by showing a higher AUROC mean and a lower AUROC standard deviation. Among Gaussian, K-Nearest Neighbor, and Minimum Covariance Determinant, Gaussian is the best classifier, with the highest mean and lowest standard deviation of AUROC. However, our method is comparable to Gaussian, showing a close mean and a smaller standard deviation. Beyond these results, our classifier is distribution-free, computation-efficient, and memory-efficient, whereas some of the other classifiers are not. Additionally, our method is easy to explain and understand: the score measures the maximum similarity between the new input and the available in-class samples. Therefore, we conclude that our classifier is valid for novelty detection.

4.4.2. ONE-CLASS CLASSIFICATION FOR OUT-OF-DISTRIBUTION DETECTION

We used the CIFAR-10 and CIFAR-100 testing datasets (Krizhevsky et al., 2009) as the in-distribution datasets. The compared methods are MSP (Hendrycks & Gimpel, 2017), Mahalanobis (Lee et al., 2018), Energy score (Liu et al., 2020a), and GEM (Morteza & Li, 2022). We used WideResNet (Zagoruyko & Komodakis, 2016) to extract features from the raw data. The WideResNet models (well-trained on the CIFAR-10 and CIFAR-100 training datasets) and corresponding feature extractors were acquired from Morteza & Li (2022). All the methods were evaluated in the same feature spaces with their optimal hyperparameters for fair comparisons. To fit the score function's parameters for all the methods, we formed a small dataset by randomly selecting 10 samples from each class. The out-of-distribution datasets include Textures (Cimpoi et al., 2014), SVHN (Netzer et al., 2011), LSUN-Crop (Yu et al., 2015), LSUN-Resize (Yu et al., 2015), and iSUN (Xu et al., 2015). We used three metrics: the detection accuracy for out-of-distribution samples when the detection accuracy for in-distribution samples is 95% (TPR95), AUROC, and the area under the precision-recall curve (AUPR). Table 1 shows the average results for CIFAR-10 and CIFAR-100. The details for each individual out-of-distribution dataset can be found in Table 5 and Table 6 in Appendix B. Our method outperforms the other methods by using the least memory and showing the highest AUROC on average. Our approach is also one of the fastest methods: its execution time for each sample is less than one millisecond (ms). For the CIFAR-10 case, our method also shows the highest average TPR95, and its average AUPR is over 92%. For the CIFAR-100 case, the average TPR95 of our method is within 0.3% of the highest average TPR95, and its average AUPR is over 85%.
For each individual out-of-distribution dataset, our method always outperforms no less than half of the methods in TPR95 and AUROC, and the total average AUPR of our method over all cases is 89.02%. Our method can be further improved with an iterative approach, as shown in Table 2, with details discussed at the end of this subsection. We noticed that the out-of-distribution datasets are much smaller in size than the in-distribution datasets. Therefore, although the current AUPR is sufficient to show that our approach is valid, we see room to increase the AUPR of our method on heavily imbalanced problems in future work. On balanced datasets, our approach shows higher AUPR than the baseline methods, as shown in Table 3 with details in Table 7 in Appendix C. Therefore, our approach performs better on balanced datasets than on unbalanced datasets. We also empirically observed that the compared baseline methods reported errors when data dimensions are dependent, because they need to calculate the inverse of the estimated covariance matrices, which will not be full rank if data dimensions are dependent. We have reported this observation in Table 7 in the appendix. In contrast, our approach works since it does not require finding the inverse of any matrix. Further, Table 1 and Table 3 together show that the baseline methods perform well only for out-of-distribution detection, whereas our approach performs well for both out-of-distribution detection and backdoor detection (details are explained in the next subsection). In summary, our classifier is a valid out-of-distribution detector.

Improvement with more indicator functions: Our approach used ten indicator functions in all experiments. However, we have also evaluated our approach with more indicator functions (i.e., more $r_i$) and plotted the results in Fig. 4.
From the figure, the performance of our approach increases as more indicator functions are used and eventually converges to a limit. This limit is determined by the out-of-distribution dataset, the tightness of our bound in Corollary 1, and the form of the utilized indicator functions (i.e., $g(x) = 1\{\|x\| \le r\}$ in this work). To raise this limit on a given out-of-distribution dataset with the current form of our upper bound, a more advanced type of indicator function is required, which will be our future work as mentioned in Remark 3.

Improvement with an iterative approach: Besides using more indicator functions, our approach can also be improved by an iterative approach that uses only the original ten indicator functions. Assume that the confidence score of an input $x$ is $s(x)$; then its iterative confidence score, $s'(x)$, can be calculated by Alg. 1 with the condition function $g(x) = 1\{s(x) \le T_i\}$, where $T_i$ represents different thresholds. Table 2 shows the performance of our approach when the iterative score $s'(x)$ is applied in Alg. 2. The results show considerable improvement compared to Table 1 for the average and to Table 5 and Table 6 for each individual out-of-distribution dataset. Exploring the potential improvement of our approach with more rounds of iterations will also be our future work.
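The TPR95 and AUROC metrics used in this subsection can be sketched as follows, assuming that higher scores indicate in-distribution samples. The function names are hypothetical, the threshold is taken as the 5th percentile of the in-distribution scores (so that roughly 95% of them are kept), and AUROC is computed by direct pairwise comparison, which is only practical for modest sample sizes. This is an illustration of the metric definitions, not the evaluation code used for the reported tables.

```python
import numpy as np

def tpr_at_95(in_scores, out_scores):
    """Fraction of out-of-distribution samples rejected when the threshold
    keeps approximately 95% of in-distribution samples."""
    threshold = np.quantile(np.asarray(in_scores, dtype=float), 0.05)
    return float(np.mean(np.asarray(out_scores, dtype=float) < threshold))

def auroc(in_scores, out_scores):
    """Probability that a random in-distribution score exceeds a random
    out-of-distribution score, counting ties as one half."""
    in_col = np.asarray(in_scores, dtype=float)[:, None]
    out_row = np.asarray(out_scores, dtype=float)[None, :]
    wins = (in_col > out_row).mean()
    ties = (in_col == out_row).mean()
    return float(wins + 0.5 * ties)

# Perfectly separated toy scores (illustrative values only).
print(tpr_at_95([0.9, 0.8, 0.7, 0.6], [0.1, 0.2, 0.3]))
print(auroc([0.9, 0.8, 0.7, 0.6], [0.1, 0.2, 0.3]))
```

Both functions take raw confidence scores, so they can be applied to the output of any score function that ranks in-distribution samples above out-of-distribution ones.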

4.4.3. ONE-CLASS CLASSIFICATION FOR BACKDOOR DETECTION

The utilized datasets are MNIST (LeCun et al., 2010), CIFAR-10 (Krizhevsky et al., 2009), GTSRB (Stallkamp et al., 2011), YouTube Face (Wolf et al., 2011), and sub-ImageNet (Deng et al., 2009). The adopted backdoor attacks include naive triggers, all-label attacks (Gu et al., 2019), moving triggers (Fu et al., 2020), Wanet (Nguyen & Tran, 2021), combination attacks, large-sized triggers, filter triggers, and invisible sample-specific triggers (Li et al., 2021), as listed in Fig. 5 in Appendix C. The neural network architectures include Network in Network (Lin et al., 2014), Resnet (He et al., 2016), and other networks from Wang et al. (2019) and Gu et al. (2019). For each backdoor attack, we assume that a small clean validation dataset is available (i.e., 10 samples from each class) at the beginning. The poisoned samples (i.e., samples attached with triggers) can then be considered out-class samples, whereas the clean samples can be considered in-class samples. We used the backdoored network to extract data features. Then, we evaluated our one-class classifier and compared it with the previous baseline methods and STRIP (Gao et al., 2019) in the feature space. The metrics used are the same: TPR95 (i.e., the detection accuracy for poisoned samples when the detection accuracy for clean samples is 95%), AUROC, and AUPR. Table 3 shows the average performance. Details on each individual trigger can be found in Table 7 in Appendix C. From the table, our classifier outperforms the other baseline methods on average by showing higher AUROC and AUPR. As for TPR95, our approach is very close to GEM. Compared with STRIP on the overall average performance, our classifier is 49.8% higher in TPR95, 26.38% higher in AUROC, and 26.66% higher in AUPR. For each individual trigger, the TPR95 of our method is over 96% for most cases, the AUROC of our method is over 97% for most cases, and the AUPR of our method is over 95% for most cases.
It is also seen that our classifier is robust against the latest or advanced backdoor attacks, such as Wanet, invisible trigger, all label attack, and filter attack, whereas the baseline methods show low performance on those attacks. Therefore, we conclude that our classifier is valid for backdoor detection.

5. CONCLUSION

This paper proposes an easy-to-compute distribution-free upper bound for the distribution overlap index. Two applications of the bound are explored. The first application is for domain shift analysis with a proposed theorem and discussion. The second application is for one-class classification. Specifically, this paper introduces a novel distribution-free one-class classifier with the bound being its confidence score function. The classifier is sample-efficient, computation-efficient, and memory-efficient. The proposed classifier is evaluated on novelty detection, out-of-distribution detection, and backdoor detection on various datasets and compared with many state-of-the-art methods. The obtained results show significant promise toward broadening the application of overlap-based metrics.

A DETAILS FOR NOVELTY DETECTION

The methods in Table 4 are in the same order as shown in Fig. 3.

C DETAILS FOR BACKDOOR DETECTION

Fig. 5 shows the used triggers and the corresponding clean samples. Table 7 shows the details for backdoor detection.



Footnote 1: In this paper, we use the $L_2$ norm. However, the choice of the norm is not unique and the analysis can be carried out using other norms as well.

Footnote 2: The condition function is an indicator function $1\{\text{condition}\}$ that outputs 1 when the input satisfies the given condition and 0 otherwise.



Figure 1: (a): Overlap of two distributions. (b): One-class classification. (c): Backdoor attack.

Notation for Theorem 1: in (5), $r_A = \sup_{x \in A} \|x\|$ and $r_{A^c} = \sup_{x \in A^c} \|x\|$, $\mu_{D^+}$ and $\mu_{D^-}$ are the means of $D^+$ and $D^-$, and $\delta_A$ is the variation distance on set $A$ defined in Definition 3; in (6), $r_B = \sup_{x \in B} \|x\|$.

Calculating $E_{D^+}[g]$ and $E_{D^-}[g]$ is easy: one just needs to draw samples from $D^+$ and $D^-$, and then average their $g$ values. Applying (13) to Theorem 1 gives the following corollary: Corollary 1. Given $D^+$, $D^-$, $B$, and $\|\cdot\|$ used in Theorem 1, let $A(g) = \{x \mid g(x) = 1, x \in B\}$ with any condition function $g : B \to \{0, 1\}$. Then, an upper bound for $\eta(D^+, D^-)$ that can be obtained by our approximation is the right-hand side of (14).

Theorem 2. Assume that $D$ and $D^*$ are two different data distributions (i.e., $\eta(D, D^*) < 1$). If a model is trained on $D$ with accuracy $p$ on $D$ and accuracy $q$ on $D^* \setminus D$, then the overall accuracy of the model on $D^*$ is $p\,\eta(D, D^*) + q\,(1 - \eta(D, D^*))$, which is upper bounded because $\eta(D, D^*)$ is upper bounded by (14).

Alg. 1 is a score function that measures the likelihood of any input, x, being an in-class sample. Alg. 2 shows the overall one-class classification algorithm. k = 10 in our experiments.

Figure 2: The actual model accuracy (dot) vs. (16) (solid) calculated with $L_1$, $L_2$, and $L_\infty$ norms in input, output, and hidden spaces. x: the ratio of clean samples to the entire testing samples.

Algorithm 2 The Novel One-Class Classifier for the Input $x$
  Given in-class samples $\{x_i\}_{i=1}^n$, select several condition functions $\{g_j\}_{j=1}^k$, set a threshold $T_0$
  if ComputeBound($\{x\}$, $\{x_i\}_{i=1}^n$, $\{g_j\}_{j=1}^k$) $\ge T_0$ then
    $x$ is an in-class sample
  else
    $x$ is an out-class sample
  end if

Figure 4: Improvements with more indicator functions with CIFAR-10 being the in-distribution data.

Figure 5: Pictures under "Triggers" are poisoned samples regarding different backdoored attacks. Pictures under "Clean" are clean samples for each dataset.

Table 1: Average performance on various out-of-distribution datasets. Our method can be further improved with an iterative approach as shown in Table 2.

Table 2: Performance of our approach with the iterative approach. Columns: In-Distributions; Metrics (%); Texture; SVHN; LSUN-C; LSUN-R; iSUN; Ave.

Table 3: Average performance for backdoor detection over various backdoor triggers and datasets.

Table 4: Means and standard deviations of AUROC (%) for different methods on 100 UCI datasets.

Table 5 is for the CIFAR-10 case and Table 6 is for the CIFAR-100 case.

Table 5: Results for CIFAR-10 in-distribution case (higher number implies higher accuracy). Boldface shows the best performing algorithm, whereas underline shows the second best algorithm.

Table 6: Results for CIFAR-100 in-distribution case (higher number implies higher accuracy). Boldface shows the best performing algorithm, whereas underline shows the second best algorithm.

Table 7: Comparison results for backdoor detection (higher number implies higher accuracy).

