LEARNING ROBUST KERNEL ENSEMBLES WITH KERNEL AVERAGE POOLING

Abstract

Model ensembles have long been used in machine learning to reduce the variance in individual model predictions, making them more robust to input perturbations. Pseudo-ensemble methods like dropout have also been commonly used in deep learning models to improve generalization. However, the application of these techniques to improve neural networks' robustness against input perturbations remains underexplored. We introduce Kernel Average Pool (KAP), a new neural network building block that applies the mean filter along the kernel dimension of the layer activation tensor. We show that ensembles of kernels with similar functionality naturally emerge in convolutional neural networks equipped with KAP and trained with backpropagation. Moreover, we show that when combined with activation noise, KAP models are remarkably robust against various forms of adversarial attacks. Empirical evaluations on CIFAR10, CIFAR100, TinyImagenet, and Imagenet datasets show substantial improvements in robustness against strong adversarial attacks such as AutoAttack that are on par with adversarially trained networks but are importantly obtained without training on any adversarial examples.

1. INTRODUCTION

Model ensembles have long been used to improve robustness in the presence of noise. Classic methods like bagging (Breiman, 1996), boosting (Freund, 1995; Freund et al., 1996), and random forests (Breiman, 2001) are established approaches for reducing the variance in estimated prediction functions that build on the idea of constructing strong predictors by combining many weaker ones. As a result, the performance of these ensemble models (especially random forests) is surprisingly robust to noise variables (i.e. features) (Hastie et al., 2009). Model ensembling has also been applied in deep learning (Zhou et al., 2001; Agarwal et al., 2021; Liu et al., 2021; Horváth et al., 2022). However, the high computational cost of training multiple neural networks and averaging their outputs at test time quickly becomes prohibitive (also see work on averaging network weights across multiple fine-tuned versions (Wortsman et al., 2022)). To tackle these challenges, alternative approaches have been proposed that learn pseudo-ensembles of models by allowing individual models within the ensemble to share parameters (Bachman et al., 2014; Srivastava et al., 2014; Hinton et al., 2012; Goodfellow et al., 2013). Most notably, dropout (Hinton et al., 2012; Srivastava et al., 2014) was introduced to approximate the process of combining exponentially many different neural networks by "dropping out" a portion of units from each layer of the network for each batch. It was argued that this technique prevents "co-adaptation" among units and leads to learning more general features (Hinton et al., 2012). While these techniques often improve generalization on i.i.d. sample sets, they are not as effective at improving robustness against input perturbations, and in particular against adversarial attacks (Wang et al., 2018).
Adversarial attacks (Goodfellow et al., 2014), slight but carefully constructed input perturbations that can significantly impair a network's performance, are one of the major challenges to the reliability of modern neural networks. Despite numerous works on this topic in recent years, the problem remains largely unsolved (Kannan et al., 2018; Madry et al., 2017; Zhang et al., 2019; Sarkar et al., 2021; Pang et al., 2020; Bashivan et al., 2021; Rebuffi et al., 2021; Gowal et al., 2021). Moreover, the most effective empirical defenses against adversarial attacks (e.g. adversarial training (Madry et al., 2017) and TRADES (Zhang et al., 2019)) are extremely computationally demanding (although see more recent work on reducing their computational cost (Wong et al., 2019; Shafahi et al., 2019)). Our central premise in this work is that if ensembles can be learned at the level of features (in contrast to class likelihoods), the resulting hierarchy of ensembles in the neural network could yield a much more robust classifier. To this end, we propose a simple method for learning ensembles of kernels in deep neural networks that significantly improves the network's robustness against adversarial attacks. In contrast to prior methods such as dropout that focus on minimizing feature co-adaptation and improving each feature's utility in the absence of others, our method focuses on learning feature ensembles that form local "committees" similar to those used in boosting and random forests. To create these committees in the layers of a neural network, we introduce the Kernel Average Pool (KAP) operation, which computes the average activity of nearby kernels within each layer, analogous to how an Average Pooling layer computes the locally averaged activity within each spatial window, but along the kernel dimension instead.
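The KAP operation just described amounts to a sliding mean along the kernel (channel) axis of the activation tensor. The function below is a minimal NumPy sketch of this idea; the function name, the edge-replication padding, and the stride-1/"same"-size convention are our assumptions for illustration, not necessarily the paper's exact implementation.

```python
import numpy as np

def kernel_average_pool(x, k):
    """Mean filter along the kernel (channel) axis of an activation
    tensor x of shape (N, C, H, W), with window size k and stride 1.

    The channel axis is padded by edge replication (an assumption;
    zero or circular padding would also be possible) so that the
    number of channels is preserved.
    """
    n, c, h, w = x.shape
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    xp = np.pad(x, ((0, 0), (pad_left, pad_right), (0, 0), (0, 0)),
                mode="edge")
    out = np.empty_like(x)
    for i in range(c):
        # Each output channel is the mean of k adjacent channels'
        # activation maps, forming a local "committee" of kernels.
        out[:, i] = xp[:, i:i + k].mean(axis=1)
    return out
```

Because the output has the same shape as the input, such a layer can be dropped after any convolution without changing the rest of the architecture.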
We show that incorporating KAP into convolutional networks leads to learning kernel ensembles that are topographically organized across the tensor dimensions over which the kernels are arranged. When combined with activation noise, these networks demonstrate a substantial boost in robustness against adversarial attacks. In contrast to other ensemble approaches to adversarial robustness, our approach does not seek to train multiple independent neural network models and instead focuses on learning kernel ensembles within a single neural network. Our contributions are as follows:
• We introduce the kernel average pool as a simple method for learning kernel ensembles in deep neural networks.
• We demonstrate how kernel average pooling leads to learning smoothly transitioning kernel ensembles that in turn substantially improve model robustness against input noise.
• Through extensive experiments on a wide range of benchmarks, we demonstrate the effectiveness of kernel average pooling on robustness against strong adversarial attacks.

2. RELATED WORKS AND BACKGROUND

Adversarial attacks: Despite their superhuman performance on many vision tasks such as visual object recognition, neural network predictions are highly unreliable in the presence of input perturbations, including natural and artificial noise. While the robustness of predictive models to natural noise has long been studied in the literature, more recent methods developed over the past decade discover small model-specific noise patterns (i.e. adversarial examples) that maximize the model's risk (Goodfellow et al., 2014). Adversarial defenses: Concurrent to the research on adversarial attacks, numerous methods have been proposed to defend neural network models against these attacks (Kannan et al., 2018; Madry et al., 2017; Zhang et al., 2019; Sarkar et al., 2021; Pang et al., 2020; Bashivan et al., 2021; Robey et al., 2021; Sehwag et al., 2022; Rebuffi et al., 2021; Gowal et al., 2021). Formally, the goal of these defense methods is to guarantee that the model's predictions match the true label not only over the sample set but also within the ϵ-neighborhood of each sample x. Adversarial training (Madry et al., 2017), for example, pursues this goal by training the network directly on adversarially perturbed examples generated on the fly.
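For reference, the adversarial training objective can be written as the following saddle-point problem (our rendering of the standard formulation from Madry et al. (2017), where D denotes the training distribution, B(x, ϵ) the ϵ-neighborhood of x, and L the classification loss):

```latex
\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}}
\left[ \max_{t \in B(x, \epsilon)} \mathcal{L}\big(f_{\theta}(t), y\big) \right]
```

The inner maximization is the attack problem formalized in Eq. (1) below; the outer minimization fits the network parameters θ to the resulting worst-case perturbations.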



Numerous adversarial attacks have been proposed in the literature during the past decade (Carlini & Wagner, 2017; Croce & Hein, 2020; Moosavi-Dezfooli et al., 2016; Andriushchenko et al., 2020; Brendel et al., 2017; Gowal et al., 2019). These attacks seek to find artificially generated samples that maximize the model's risk. Formally, given a classifier function f_θ : X → Y, X ⊆ R^n, Y = {1, ..., C}, denote by π(x, ϵ) a perturbation function (i.e. adversarial attack) which, for a given (x, y) ∈ X × Y, generates a perturbed sample x′ ∈ B(x, ϵ) within the ϵ-neighborhood of x, B(x, ϵ) = {x′ ∈ X : ∥x′ − x∥_p < ϵ}, by solving the following maximization problem

max_{t ∈ B(x, ϵ)} L(f_θ(t), y),    (1)

where L is the classification loss function (i.e. the classifier's risk) and ∥·∥_p is the L_p norm. Solutions x′ are called adversarial examples and are essentially the original input samples altered with additive noise of magnitude at most ϵ measured by the L_p norm.
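In practice, the maximization in Eq. (1) is usually solved approximately by projected gradient ascent. As a toy illustration only (the attacks cited above operate on full networks via automatic differentiation), the sketch below runs an L∞-constrained PGD loop against a binary logistic classifier, for which the input gradient of the cross-entropy loss has the closed form (p − y)·w; all names and default step sizes here are our own choices.

```python
import numpy as np

def pgd_linf(x, y, w, b, eps, alpha=0.01, steps=10):
    """Projected gradient ascent on the loss, within the L-infinity
    ball B(x, eps), for f(x) = sigmoid(w.x + b) with label y in {0, 1}.

    Illustrative sketch of Eq. (1); real attacks use autodiff on the
    full model rather than this analytic gradient.
    """
    x_adv = x.copy()
    for _ in range(steps):
        z = w @ x_adv + b
        p = 1.0 / (1.0 + np.exp(-z))               # sigmoid output
        grad = (p - y) * w                          # dL/dx for cross-entropy
        x_adv = x_adv + alpha * np.sign(grad)       # ascent step on the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)    # project back into B(x, eps)
    return x_adv
```

The sign step and the clip implement, respectively, the steepest-ascent direction under the L∞ norm and the projection onto the ϵ-neighborhood B(x, ϵ).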

