GAME THEORETIC MIXED EXPERTS FOR COMBINATIONAL ADVERSARIAL MACHINE LEARNING

Abstract

Recent advances in adversarial machine learning have shown that defenses considered to be robust are actually susceptible to adversarial attacks which are specifically tailored to target their weaknesses. These defenses include Barrage of Random Transforms (BaRT), Friendly Adversarial Training (FAT), Trash is Treasure (TiT) and ensemble models made up of Vision Transformers (ViTs), Big Transfer models and Spiking Neural Networks (SNNs). A natural question arises: how can one best leverage a combination of adversarial defenses to thwart such attacks? In this paper, we provide a game-theoretic framework for ensemble adversarial attacks and defenses which answers this question. In addition to our framework, we produce the first adversarial defense transferability study to further motivate the need for combinational defenses utilizing a diverse set of defense architectures. Our framework is called Game theoretic Mixed Experts (GaME) and is designed to find the Mixed-Nash strategy for a defender facing an attacker who employs compositional adversarial attacks. We show that this framework creates an ensemble of defenses with greater robustness than multiple state-of-the-art single-model defenses, as well as combinational defenses with uniform probability distributions. Overall, our framework and analyses advance the field of adversarial machine learning by yielding new insights into compositional attack and defense formulations.

1. INTRODUCTION

Machine learning models have been shown to be vulnerable to adversarial examples Goodfellow et al. (2014); Papernot et al. (2016). Adversarial examples are inputs with small perturbations added, such that machine learning models misclassify the example with high confidence. Addressing the security risks posed by adversarial examples is critical for the safe deployment of machine learning in areas like health care Finlayson et al. (2019) and self-driving vehicles Qayyum et al. (2020). However, current defenses and attacks in adversarial machine learning have trended towards a cat-and-mouse dynamic wherein new defenses are continually proposed and then broken by improved attacks Carlini & Wagner (2017); Tramer et al. (2020); Mahmood et al. (2021a); Sitawarin et al. (2022). In parallel to attack and defense development, studies have also been conducted on the transferability of adversarial examples Liu et al. (2016); Mahmood et al. (2021b); Xu et al. (2022). Transferability refers to the phenomenon where adversarial examples generated for one model are also misclassified by a different machine learning model. However, to the best of our knowledge, no analyses have been done on the transferability of adversarial examples designed to attack specific defenses. From these observations several pertinent questions arise:

1. Do adversarial examples generated for one specific defense transfer to other defenses?
2. Based on adversarial transferability, can a game theoretic framework be developed to determine the optimal choices for both attacker and defender?
3. Can randomized defense selection yield higher robustness than a single state-of-the-art defense?

These are precisely the questions our paper seeks to answer. We break from the traditional dynamic of adversarial machine learning, which focuses on the single best attack and defense. We instead take a multi-faceted approach and develop a game theoretic framework to answer the above questions. Specifically, we provide the following contributions. Most importantly, we formulate a practical, game-theoretic framework for finding the optimal strategies for an attacker and defender who each employ a set of state-of-the-art adversarial attacks and defenses. Motivated by this framework, we develop two new white-box attacks, the Momentum Iterative Method over Expectation (MIME) and the Auto Expectation Self-Attention Gradient Attack (AE-SAGA), in order to create a stronger adversary. These attacks are necessary for targeting certain randomized defenses and for adapting to multi-defense strategies. Lastly, we analyze the adversarial transferability of current defenses like Trash is Treasure Xiao & Zheng (2020), Barrage of Random Transforms Raff et al. (2019), Friendly Adversarial Training Zhang et al. (2020) and other new architectures like SNNs Rathi & Roy (2021b); Fang et al. (2021) and ViTs Dosovitskiy et al. (2020). We further leverage the low transferability between these classifiers to find those which are best suited for a combined, ensemble defense such as the one developed in our game-theoretic framework.
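To make the Mixed-Nash computation concrete, the sketch below solves a small zero-sum attacker-defender game by fictitious play. The 2×2 payoff matrix (rows are defenses, columns are attacks, entries are defender accuracies) is purely illustrative and not taken from our experiments:

```python
import numpy as np

def fictitious_play(payoff, iters=20000):
    """Approximate mixed-Nash strategies of a zero-sum game.

    payoff[i, j] = defender's payoff (e.g. accuracy) when the
    defender plays defense i and the attacker plays attack j.
    The defender maximizes, the attacker minimizes.
    """
    n_def, n_atk = payoff.shape
    def_counts = np.zeros(n_def)
    atk_counts = np.zeros(n_atk)
    def_counts[0] += 1  # arbitrary initial play
    atk_counts[0] += 1
    for _ in range(iters):
        # Defender best-responds to the attacker's empirical mixture.
        def_counts[np.argmax(payoff @ atk_counts)] += 1
        # Attacker best-responds to the defender's empirical mixture.
        atk_counts[np.argmin(def_counts @ payoff)] += 1
    return def_counts / def_counts.sum(), atk_counts / atk_counts.sum()

# Hypothetical payoff matrix: rows = defenses, columns = attacks.
P = np.array([[0.70, 0.20],
              [0.30, 0.60]])
p_def, p_atk = fictitious_play(P)
```

The empirical play frequencies converge to an approximate mixed Nash equilibrium; for this matrix the exact equilibrium puts probability 0.375 on the first defense and 0.5 on the first attack. A production implementation would instead solve the equivalent linear program exactly.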

2. ADVERSARIAL MACHINE LEARNING DEFENSES

Here we summarize the state-of-the-art defenses we analyze in this paper. In the following subsections, we give an overview of each defense and our reasons for choosing it. It is important to note that our analyses encompass a broad range of defenses, including ones based on randomization, on adversarial training and on exploiting model transferability. In addition, we consider diverse architectures including Big Transfer models (BiTs), Vision Transformers (ViTs) and Spiking Neural Networks (SNNs). Despite this breadth, we do not attempt to test every novel adversarial defense: doing so is infeasible, as new defenses are constantly being produced. However, owing to our game theoretic design and open source code (which will be provided upon publication), any new defense can easily be tested and integrated into our proposed framework.

2.1. BARRAGE OF RANDOM TRANSFORMS

Barrage of Random Transforms (BaRT) Raff et al. (2019) applies a set of image transformations in a random order and with randomized transformation parameters to thwart adversarial attacks. Let $t_{j}^{i}(x)$ represent the $i$-th transformation applied at the $j$-th position in the sequence. A BaRT defense using $n$ image transformations randomly alters the input $x$:

$$t(x) = t_{\mu_n}^{\omega_n} \circ t_{\mu_{n-1}}^{\omega_{n-1}} \circ \cdots \circ t_{\mu_1}^{\omega_1}(x) \qquad (1)$$

where $\omega$ represents the subset of $n$ transformations randomly selected from a set of $N$ total possible transformations and $\mu$ represents the randomized order in which the $n$ transformations are applied. In Equation 1 the parameters of each image transformation $t_{j}^{i}(x)$ are also randomized at run time, further adding to the stochastic nature of the defense. In this paper, we work with the original BaRT implementation, which includes both differentiable and non-differentiable image transformations. Why we selected it: Many defenses are broken soon after being proposed Tramer et al. (2020). BaRT is one of the few defenses that has continued to show robustness even when attacks are specifically tailored to work against it. For example, BaRT most recently achieved 29% robust accuracy on CIFAR-10 against a customized white-box attack Sitawarin et al. (2022). It remains an open question whether using BaRT with other randomized approaches (i.e., selecting between different defenses) can yield even greater robustness.
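The randomized composition in Equation 1 can be sketched as follows. The three toy transformations below stand in for BaRT's actual image transforms and are illustrative only; each draws its own parameters at call time, mirroring the run-time randomization described above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for BaRT's image transformations; each takes and
# returns an image in [0, 1] and samples its parameters when called.
def add_noise(x):
    sigma = rng.uniform(0.0, 0.05)          # randomized parameter
    return np.clip(x + sigma * rng.standard_normal(x.shape), 0, 1)

def quantize(x):
    levels = rng.integers(8, 64)            # randomized parameter
    return np.round(x * levels) / levels

def contrast(x):
    gamma = rng.uniform(0.7, 1.3)           # randomized parameter
    return np.clip(x, 0, 1) ** gamma

TRANSFORMS = [add_noise, quantize, contrast]  # pool of N transforms

def barrage(x, n=2):
    """Apply t(x): n transforms drawn from the pool, in random order."""
    chosen = rng.choice(len(TRANSFORMS), size=n, replace=False)  # subset ω
    rng.shuffle(chosen)                                          # order µ
    for i in chosen:
        x = TRANSFORMS[i](x)
    return x

img = rng.uniform(0, 1, size=(8, 8))
out = barrage(img, n=2)
```

Because the subset, order, and parameters are resampled on every call, two invocations of `barrage` on the same input generally produce different outputs, which is precisely what frustrates gradient-based attacks.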




2.2. FRIENDLY ADVERSARIAL TRAINING

Training classifiers to correctly recognize adversarial examples was originally proposed in Goodfellow et al. (2014) using FGSM. This concept was later expanded to training on adversarial examples generated by PGD in Madry et al. (2018). In Zhang et al. (2020) it was shown that Friendly Adversarial Training (FAT) could achieve high clean accuracy while maintaining robustness to adversarial examples. This training was accomplished using a modified version of PGD called PGD-K-τ. In PGD-K-τ, K refers to the number of iterations used for PGD. The τ variable is a hyperparameter used in training which stops the PGD generation of adversarial examples earlier than the usual K steps once the sample is already misclassified.
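The early-stopping rule of PGD-K-τ can be sketched as below, under the reading that generation continues for τ further steps once the example is first misclassified. The toy linear classifier and all hyperparameter values are illustrative assumptions, not FAT's actual training setup:

```python
import numpy as np

def pgd_k_tau(x, y, W, b, K=20, tau=2, eps=0.3, alpha=0.05):
    """Early-stopped ("friendly") PGD on a toy linear classifier Wx + b.

    Takes at most K signed-gradient steps inside an L-inf ball of
    radius eps, but stops tau steps after the example is first
    misclassified instead of always using all K steps.
    """
    x_adv = x.copy()
    budget = None  # steps remaining once misclassification occurs
    for _ in range(K):
        logits = W @ x_adv + b
        if np.argmax(logits) != y:
            if budget is None:
                budget = tau       # first misclassification: arm the counter
            if budget == 0:
                break              # tau extra steps taken; stop early
            budget -= 1
        # Gradient of the cross-entropy loss w.r.t. the input.
        p = np.exp(logits - logits.max())
        p /= p.sum()
        grad = W.T @ (p - np.eye(len(b))[y])
        x_adv = x_adv + alpha * np.sign(grad)     # ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project onto the ball
    return x_adv

W = np.eye(2)
b = np.zeros(2)
x = np.array([0.4, 0.2])          # correctly classified as class 0
x_adv = pgd_k_tau(x, 0, W, b)
```

Here the returned `x_adv` is misclassified yet only minimally perturbed, which is the property FAT exploits to preserve clean accuracy during training.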


