AN ANALYTIC FRAMEWORK FOR ROBUST TRAINING OF DIFFERENTIABLE HYPOTHESES

Anonymous

Abstract

The reliability of a learning model is key to the successful deployment of machine learning in various industries. Creating a robust model, particularly one unaffected by adversarial attacks, requires a comprehensive understanding of the adversarial examples phenomenon. However, the phenomenon is difficult to describe due to the complicated nature of machine learning problems. Consequently, many studies investigate the phenomenon by proposing a simplified model of how adversarial examples occur and validate that model by predicting some aspect of the phenomenon. While these studies cover many different characteristics of adversarial examples, they have not reached a holistic approach to the geometric and analytic modeling of the phenomenon. Furthermore, the phenomenon has been observed in many applications of machine learning, and its effects seem to be independent of the choice of the hypothesis class. In this paper, we propose a formalization of robustness in learning-theoretic terms and give a geometrical description of the phenomenon in analytic classifiers. We then utilize the proposal to devise a robust classification learning rule for differentiable hypothesis classes and showcase our proposal on synthetic and real-world data.

1. INTRODUCTION

State-of-the-art machine learning models have been shown to suffer from the phenomenon of adversarial examples, where a trained model is fooled into returning an undesirable output on particular inputs that an adversary carefully crafts. While there is no consensus on the reasons behind the emergence of these examples, many facets of the phenomenon have been revealed. Szegedy et al. (2014) show that adversarial perturbations are not random and that they generalize to other models. Goodfellow et al. (2015) indicate that linear approximations of the model around a test sample are an effective surrogate for the model in the generation of adversarial examples. Tanay & Griffin (2016) show that adversarial examples appear in linear classifiers when the decision boundary is tilted towards the manifold of natural samples. Ilyas et al. (2019) reveal that the distribution of training samples and the robustness of the trained model are related. Demontis et al. (2019) propose three metrics for measuring transferability between a target and a surrogate model based on the similarity of the loss landscape and the derivatives of the models. Li et al. (2020) infer that the cause of the phenomenon is probably geometrical and that statistical defects amplify its effects. Barati et al. (2021) give an example showing that pointwise convergence of the trained model to the optimal hypothesis is enough for the phenomenon to emerge. Shamir et al. (2021) explore the interaction of the decision boundary and the manifold of the samples in non-linear hypotheses. However, there are some issues with the current proposals in the literature. The geometrical and computational descriptions of the phenomenon do not always agree in their predictions (Moosavi-Dezfooli et al., 2019; Akhtar et al., 2021). Even though geometrical perspectives have the advantage of being applicable to all hypothesis classes, they are not verifiable without a computational description.
On the other hand, computational approaches are mostly coupled with a particular construction of a hypothesis and in turn need a geometric description to be applicable to different scenarios. The current defence methods do not appear to be very effective (Machado et al., 2021), and a need for novel ideas is felt (Bai et al., 2021). Our aim in this paper is to devise a framework for the analysis of the adversarial examples phenomenon that 1. is decoupled from the underlying representation of the hypothesis, 2. provides the means for visualization of the phenomenon, and 3. models the known characteristics of the phenomenon as much as possible. To this end, we first describe a necessary condition for robustness that we derive from the first principles of learning theory and lay out the groundwork for the analysis of the phenomenon from the perspective of learning rules in section 2. Next, we extend the framework with the necessary definitions so that it is open to geometrical interpretations in section 3. Finally, we put the proposed framework to use and verify its predictions on small-scale synthetic and real-world problems. We provide a summary of the framework in appendix A for the interested reader.

2. PRELIMINARIES

In learning theory we are interested in studying how machines learn (Shalev-Shwartz & Ben-David, 2014). We base our analysis on the framework of probably approximately correct (PAC) learning. The basic objects of study in learning theory are hypothesis classes and learning rules. We are interested in determining a necessary condition for a learning rule $A$ to be robust with respect to a nonuniformly learnable hypothesis class $H$. Throughout the paper, we assume that the training samples are labeled by the true labeling function and that the samples come from a compactly supported distribution.

Definition 2.1 (perplexity). Consider a universally consistent learning rule $A$ and training sets $S', S \subset X$. The perplexity of $h = A(S)$ with respect to $h' = A(S')$ is

$$\|h' - h\|_\infty = \sup_{x \in X} |h'(x) - h(x)|. \quad (1)$$

We expect the perplexity of the output $h \in H$ of a learning rule to decrease as we add more natural samples to the training set. In contrast, adding an adversarial sample would be perplexing for $h$. If we find some sample $x \in X$ that does not perplex $h$ but is not correctly labeled by $h$, then we assume that $H$ is agnostic to the pattern of $x$. In other words, if adversarial training cannot improve the robustness of $h$, then $h$ is as robust as it gets for hypotheses in $H$.

Definition 2.2 (robust learning rule). Consider a learning rule $A$ and a sequence $\{d_n\}_{n=1}^{\infty}$ of random variables that almost surely converges to a random variable $D$ that is uniformly distributed on $X$. $A$ is robust if for every $\epsilon > 0$ there exists a natural number $N$ such that for all $m, n \geq N$ we have

$$\|A(d_m) - A(d_n)\|_\infty \leq \epsilon. \quad (2)$$

This definition of a robust learning rule can be interpreted as the characteristic that once $A$ observes enough natural samples, adding more samples is inconsequential to the output of $A$, independent of the generative process of the samples. The definition is equivalent to Cauchy's criterion for uniform convergence of $\{A(d_n)\}_{n=1}^{\infty}$ to the optimal hypothesis $A(D)$.
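As an illustrative sketch (our own, not part of the paper's formalism), the two definitions can be probed numerically. We take a simple stand-in learning rule $A$ (least-squares polynomial fitting), approximate the sup-norm on a dense grid over the compact support, and track the perplexity between hypotheses trained on successively larger natural samples. The learning rule, the target function, and all names below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def learn(S_x, S_y, degree=5):
    """A toy learning rule A: least-squares polynomial fit on [0, 1]."""
    coeffs = np.polyfit(S_x, S_y, degree)
    return lambda x: np.polyval(coeffs, x)

def perplexity(h_new, h_old, grid):
    """Sup-norm ||h' - h||_inf approximated on a dense grid (eq. 1)."""
    return np.max(np.abs(h_new(grid) - h_old(grid)))

f = lambda x: np.sin(2 * np.pi * x)   # assumed true labeling function
grid = np.linspace(0, 1, 1000)        # compact support X = [0, 1]

# Perplexity between successive hypotheses as natural samples accumulate.
prev = None
for n in [20, 80, 320, 1280]:
    S_x = rng.uniform(0, 1, n)
    h = learn(S_x, f(S_x))
    if prev is not None:
        print(n, perplexity(h, prev, grid))
    prev = h
```

For this well-specified toy problem the printed perplexities tend to shrink as $n$ grows, mirroring the Cauchy-type behavior that Definition 2.2 asks of a robust learning rule.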
The largest hypothesis class that we consider here is $L^2(X)$. The main characteristic of functions in $L^2(X)$ is that they are square-integrable. Formally, for a function $f \in L^2(X)$,

$$\|f\|_{L^2(X)} = \left( \int_X |f(x)|^2 \, dV(x) \right)^{\frac{1}{2}} < \infty. \quad (3)$$

Theorem 2.3. $L^2(X)$ is nonuniformly learnable.

We assume that a hypothesis $h \in H$ has a series or an integral representation,

$$h(x) = \sum_{i=1}^{\infty} a_i \varphi_i(x), \quad (4)$$

$$h(x) = \int a(w)\, \varphi(x; w)\, dw. \quad (5)$$

The series representation is customary in machine learning. A series representation is adequate if we have a discrete set of features $\{\varphi_i\}_{i=1}^{\infty}$, e.g. polynomials. We argue that an integral representation is more adequate for analysis when there is a continuum of features to choose from, e.g. a neuron in an artificial neural network (ANN). Informally, the integral representation abstracts away the complexity of finding good features by incorporating every possible feature.

