TESSELLATED NEURAL NETWORKS: A ROBUST DEFENCE AGAINST ADVERSARIAL ATTACKS

Abstract

Data-driven deep learning approaches for image classification are prone to adversarial attacks. An adversarial image that is sufficiently close to (visually indistinguishable from) a true image of its representative class can often be misclassified as a member of a different class. Attackers can exploit the high dimensionality of the image representations learned by neural models to identify adversarial perturbations. To mitigate this problem, we propose a novel divide-and-conquer approach of tessellating a base network architecture (e.g., a ResNet in our experiments). The tessellated network independently learns a parameterized representation of each non-overlapping sub-region, or tile, within an image, and then learns how to combine these representations to estimate the class of the input image. We investigate two different modes of tessellation, namely periodic, comprising regular square-shaped tiles, and aperiodic, comprising rectangles of different dimensions. Experiments demonstrate that the tessellated extension of two standard deep neural models leads to a better defence against a number of standard adversarial attacks: the decrease in post-attack accuracy relative to the accuracy of the uncompromised networks is smaller for our proposed tessellated approach.

1. INTRODUCTION

Deep neural networks are known to be susceptible to adversarial attacks. Image representations learned by a deep neural network differ from their visual interpretation. Attackers exploit this fact by introducing imperceptible evasive perturbations in test images such that the victim network misclassifies them (Goodfellow et al., 2018; Machado et al., 2021). Defending neural networks against such adversarial attacks is of significant theoretical and practical importance. Well-known evasive attacks include gradient-based input perturbation strategies such as the fast gradient sign method (FGSM) (Goodfellow et al., 2015) and projected gradient descent (PGD) (Madry et al., 2018). Non-gradient based attacks use norm-bounded perturbations that change class membership through an optimization process (Andriushchenko et al., 2019). Universal attacks, which are image-agnostic and add the same perturbation to all input images while still modifying the class labels, are also prevalent (Moosavi-Dezfooli et al., 2017). Norm-based attacks that optimize the magnitude of the perturbation in input images were subsequently proposed to victimize newer defence strategies (Carlini & Wagner, 2017; Croce & Hein, 2019). Patch attacks, which perturb image segments rather than individual pixels, have also been attempted (Sharif et al., 2016; Yang et al., 2020). More recent attack approaches include ensemble-based strategies with the capability to adapt to the defence mechanisms employed (Tramèr et al., 2020). As newer attacks are proposed, developing models that are robust to adversarial attacks has attracted significant attention from the research community (Machado et al., 2021). Early defence strategies include adversarial training (Madry et al., 2018; Goodfellow et al., 2015), where a classifier is trained on both legitimate and adversarial examples to improve its robustness.
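To make the gradient-based attacks above concrete, the following is a minimal NumPy sketch of FGSM on a toy logistic-regression "victim"; the model, data, and epsilon value are hypothetical stand-ins, not part of the paper's experimental setup:

```python
import numpy as np

def fgsm_perturb(x, grad, eps):
    """Fast gradient sign method (Goodfellow et al., 2015): move each
    input dimension by eps in the direction that increases the loss,
    then clip back to the valid pixel range [0, 1]."""
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

# Toy white-box victim: logistic regression on a flattened "image".
rng = np.random.default_rng(0)
w = rng.normal(size=16)       # fixed model weights, known to the attacker
x = rng.uniform(size=16)      # clean input in [0, 1]
y = 1.0                       # true label

# Gradient of the cross-entropy loss w.r.t. the input for a sigmoid
# classifier: dL/dx = (sigmoid(w . x) - y) * w
p = 1.0 / (1.0 + np.exp(-w @ x))
grad = (p - y) * w

x_adv = fgsm_perturb(x, grad, eps=0.05)
```

PGD can be viewed as iterating this single step with a projection back onto the epsilon-ball around the clean input after each update.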
Adversarial training, however, restricts the defence to the specific attack strategy with which the adversarial examples were generated. Other proactive defences retrain deep networks on the smoothed output probabilities over the class labels, following the principles of network distillation (Papernot et al., 2016). Both these retraining methods modify the gradients of the networks so as to leave fewer directions (subspaces) along which an attacker might perturb the input image. Input transformation is another popular defence strategy. In this approach, corrupted inputs are either detected and rejected before classification (Chen et al., 2017), or a preprocessing step is applied to mitigate their adversarial effects. Various preprocessing strategies have been suggested towards this end, including adding a random resizing and padding layer in the early part of the architecture (Xie et al., 2017), blockwise image transformation (AprilPyone & Kiya, 2020), and cropping and rescaling (Guo et al., 2018). In a different thread of work, transformations of the features at the output of the convolution layers, such as activation pruning (Goodfellow, 2018; Dhillon et al., 2018) and denoising (Liao et al., 2018), are often equally effective as defence mechanisms. Input dimensionality reduction approaches based on PCA (Hendrycks et al., 2019) and spatial smoothing (Xu et al., 2017) have also been found to provide robustness against attacks. Besides adversarial retraining and input transformation, various other techniques have been attempted as defence strategies for deep neural networks. Ensembles of classifiers are found to be more robust to adversarial attacks (Tramèr et al., 2017). Data augmentation using GANs, generative models, and ensembles (Wang et al., 2019) yields diversely structured networks that provide robustness against adversarial threats.
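As an illustration of the input-transformation defences discussed above, here is a minimal NumPy sketch of spatial (median) smoothing in the spirit of Xu et al. (2017); the kernel size and edge-padding mode are illustrative choices, not the exact configuration from that work:

```python
import numpy as np

def median_smooth(img, k=3):
    """Replace each pixel by the median of its k x k neighbourhood.
    Median filtering suppresses isolated high-frequency perturbations
    (a common form of adversarial noise) while preserving edges."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")  # replicate border pixels
    out = np.empty_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = np.median(padded[i:i + k, j:j + k])
    return out

# A single-pixel spike, loosely mimicking sparse adversarial noise,
# is removed by the filter while a constant background is untouched.
spike = np.zeros((5, 5))
spike[2, 2] = 1.0
smoothed = median_smooth(spike)
```

Such preprocessing is attack-agnostic, but as noted above, adaptive attackers can fold differentiable approximations of the transformation into their gradient computation.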
An alternative convolutional neural network (CNN) architecture that randomly masks parts of the feature maps also demonstrates adversarial robustness (Luo et al., 2020). An advantage of this approach over its transformation-based counterparts is that it provides an effective defence mechanism that is largely agnostic to the attack strategy. As motivation for the work in this paper, we hypothesize that modifying the network structure leads to implicit feature transformation, cropping, and masking, which, in turn, potentially results in improved robustness against adversarial attacks. Moreover, incorporating diversity in the network topology likely disrupts the gradients and acts as an effective defence against ensemble attacks. Consequently, reconfiguring the topology of a network may provide an effective defence against adaptive adversarial attacks. Since attackers exploit the high dimensionality of image inputs to identify directions for adversarial perturbations (Machado et al., 2021), a divide-and-conquer strategy of processing smaller blocks of an input image, which is the basis of the method proposed in this paper, potentially restricts the space exploitable by an attacker.

Our Contributions. In this paper, we propose a tessellated deep neural network architecture, a 'split and merge' based workflow that provides an effective defence mechanism against adversarial attacks. In our proposed approach, an input image is partitioned into blocks (tiles) according to a tessellation (tiling) pattern. Each region of the input image uses a separate branch in the computation graph to propagate its effects forward in the form of feature representations. The individual feature representations then interact with each other for the eventual prediction of the image class (see Figure 1 for a schematic representation). We investigate two types of rectangular tessellation patterns, namely, regular grid tiling and tiling with non-uniform rectangles.
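The split-and-merge workflow described above can be sketched as follows for the regular-grid (periodic) case. The per-tile branch and the merge function here are trivial stand-ins for the learned network components (in the paper these are branches of a deep network such as a ResNet), and the divisibility assumption is ours:

```python
import numpy as np

def tessellate(img, tile_h, tile_w):
    """Split an H x W image into non-overlapping tiles on a regular
    grid. Assumes H and W are divisible by the tile dimensions."""
    h, w = img.shape
    return [img[r:r + tile_h, c:c + tile_w]
            for r in range(0, h, tile_h)
            for c in range(0, w, tile_w)]

def tessellated_forward(img, branch, merge, tile_h, tile_w):
    """Each tile is processed independently by its own branch; the
    per-tile representations are then combined by a merge function
    that produces the final prediction."""
    feats = [branch(tile) for tile in tessellate(img, tile_h, tile_w)]
    return merge(feats)

# Hypothetical stand-ins for the learned components:
branch = lambda tile: tile.mean()             # per-tile feature extractor
merge = lambda feats: float(np.mean(feats))   # combiner / classifier head

img = np.arange(64, dtype=float).reshape(8, 8)
score = tessellated_forward(img, branch, merge, 4, 4)
```

The aperiodic variant would replace `tessellate` with a partition into rectangles of differing dimensions; the branch-and-merge structure is unchanged.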
As base networks (on which the tessellation is applied), we investigate standard deep networks for image classification, including ResNet50 (He et al., 2016), and show that the proposed approach resists standard adversarial attacks more effectively than a number of baseline defence methodologies.

2. TESSELLATED NEURAL NETWORK

In this section, we describe our proposed method of tessellated neural architecture.



State-of-the-art defences reported in the RobustBench benchmark (Croce et al., 2020) include those based on data augmentation for adversarial training (Rebuffi et al., 2021a), as well as those based on transformation or randomization of model parameters (Gowal et al., 2021). Apart from input- or feature-transformation based approaches, architectural changes to a network topology are a promising means of achieving adversarial robustness (Huang et al., 2021; Du et al., 2021).

