TESSELLATED NEURAL NETWORKS: A ROBUST DEFENCE AGAINST ADVERSARIAL ATTACKS

Abstract

Data-driven deep learning approaches for image classification are prone to adversarial attacks. An adversarial image that is sufficiently close (visually indistinguishable) to a true image of its representative class can often be misclassified as a member of a different class. Attackers can exploit the high dimensionality of the image representations learned by neural models to identify adversarial perturbations. To mitigate this problem, we propose a novel divide-and-conquer approach of tessellating a base network architecture (e.g., a ResNet in our experiments). The tessellated network learns parameterized representations of each non-overlapping sub-region, or tile, within an image independently, and then learns how to combine these representations to estimate the class of the input image. We investigate two modes of tessellation: periodic, comprising regular square-shaped tiles, and aperiodic, comprising rectangles of different dimensions. Experiments demonstrate that the tessellated extension of two standard deep neural models leads to a better defence against a number of standard adversarial attacks. We observe that the decrease in post-attack accuracy relative to the accuracy of the uncompromised networks is smaller for our proposed tessellated approach.
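The periodic tessellation mode described above can be sketched as follows. This is a minimal NumPy illustration of splitting an image into non-overlapping square tiles; the tile size and the downstream combination step are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def periodic_tiles(image, tile=8):
    """Split an image of shape (H, W, C) into non-overlapping square
    tiles, as in the periodic tessellation mode (illustrative sketch)."""
    h, w = image.shape[:2]
    # For simplicity, assume the image dimensions are divisible by the tile size.
    assert h % tile == 0 and w % tile == 0, "image must tile evenly"
    return [image[r:r + tile, c:c + tile]
            for r in range(0, h, tile)
            for c in range(0, w, tile)]
```

Each tile would then be passed through the base network independently before the per-tile representations are combined for classification.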

1. INTRODUCTION

Deep neural networks are known to be susceptible to adversarial attacks. Image representations learned by a deep neural network differ from their visual interpretation. Attackers exploit this fact by introducing imperceptible evasive perturbations into test images such that the victim network misclassifies them (Goodfellow et al., 2018; Machado et al., 2021). Defending neural networks against such adversarial attacks is of significant theoretical and practical importance. Well-known evasive attacks include gradient-based input perturbation strategies such as the fast gradient sign method (FGSM) (Goodfellow et al., 2015) and projected gradient descent (PGD) (Madry et al., 2018). Non-gradient-based attacks use norm-bounded perturbations that change class membership through an optimization process (Andriushchenko et al., 2019). Universal attacks, which are image-agnostic and add the same perturbation to all input images while still modifying the class labels, are also prevalent (Moosavi-Dezfooli et al., 2017). Norm-based attacks seeking to optimize the magnitude of perturbation in input images were subsequently proposed to victimize newer defence strategies (Carlini & Wagner, 2017; Croce & Hein, 2019). Patch attacks, which involve perturbing image segments rather than individual pixels, have also been attempted (Sharif et al., 2016; Yang et al., 2020). More recent attack approaches include ensemble-based strategies with a capability to adapt to the defence mechanisms employed (Tramèr et al., 2020). As newer attacks are proposed, developing models that are robust to adversarial attacks has attracted significant attention from the research community (Machado et al., 2021). Early defence strategies include adversarial training (Madry et al., 2018; Goodfellow et al., 2015), where a classifier is trained on both legitimate and adversarial examples to improve its robustness.
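As a concrete illustration of the gradient-based perturbation strategies mentioned above, the FGSM update can be sketched in NumPy. The gradient of the loss with respect to the input is assumed to be supplied by the victim model, and the perturbation budget `epsilon` is an illustrative choice:

```python
import numpy as np

def fgsm_perturb(image, grad, epsilon=0.03):
    """Fast Gradient Sign Method: shift each pixel by epsilon in the
    direction of the sign of the loss gradient w.r.t. the input."""
    adversarial = image + epsilon * np.sign(grad)
    # Clip back to the valid pixel range so the image stays well-formed.
    return np.clip(adversarial, 0.0, 1.0)
```

Because each pixel moves by at most `epsilon`, the adversarial image remains visually close to the original while the loss increases, which is what makes the attack evasive.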
Adversarial training, however, restricts the defence to the specific attack strategy with which the examples were generated. Other proactive defences retrain deep networks on smoothed output probabilities over the class labels, following the principles of network distillation (Papernot et al., 2016). Both these

