GAME THEORETIC MIXED EXPERTS FOR COMBINATIONAL ADVERSARIAL MACHINE LEARNING

Abstract

Recent advances in adversarial machine learning have shown that defenses considered to be robust are actually susceptible to adversarial attacks which are specifically tailored to target their weaknesses. These defenses include Barrage of Random Transforms (BaRT), Friendly Adversarial Training (FAT), Trash is Treasure (TiT), and ensemble models made up of Vision Transformers (ViTs), Big Transfer models, and Spiking Neural Networks (SNNs). A natural question arises: how can one best leverage a combination of adversarial defenses to thwart such attacks? In this paper, we provide a game-theoretic framework for ensemble adversarial attacks and defenses which answers this question. In addition to our framework, we produce the first adversarial defense transferability study to further motivate the need for combinational defenses utilizing a diverse set of defense architectures. Our framework is called Game theoretic Mixed Experts (GaME) and is designed to find the Mixed-Nash strategy for a defender when facing an attacker employing compositional adversarial attacks. We show that this framework creates an ensemble of defenses with greater robustness than multiple state-of-the-art, single-model defenses, as well as combinational defenses with uniform probability distributions. Overall, our framework and analyses advance the field of adversarial machine learning by yielding new insights into compositional attack and defense formulations.



Machine learning models have been shown to be vulnerable to adversarial examples Goodfellow et al. (2014); Papernot et al. (2016). Adversarial examples are inputs with small perturbations added, such that machine learning models misclassify them with high confidence. Addressing the security risks posed by adversarial examples is critical for the safe deployment of machine learning in areas like health care Finlayson et al. (2019) and self-driving vehicles Qayyum et al. (2020). However, current defenses and attacks in adversarial machine learning have trended towards a cat-and-mouse dynamic, wherein new defenses are continually proposed and then broken Carlini & Wagner (2017); Tramer et al. (2020); Mahmood et al. (2021a); Sitawarin et al. (2022) by improved attacks. In parallel to attack and defense development, studies have also been conducted on the transferability of adversarial examples Liu et al. (2016); Mahmood et al. (2021b); Xu et al. (2022). Transferability refers to the phenomenon where adversarial examples generated for one model are also misclassified by a different machine learning model. However, to the best of our knowledge, no analyses have been done on the transferability of adversarial examples designed to attack specific defenses. From these observations, several pertinent questions arise:

1. Do adversarial examples generated for one specific defense transfer to other defenses?
2. Based on adversarial transferability, can a game theoretic framework be developed to determine the optimal choices for both attacker and defender?
3. Can randomized defense selection yield higher robustness than a single state-of-the-art defense?

These are precisely the questions our paper seeks to answer. We break from the traditional dynamic of adversarial machine learning, which focuses on the single best attack and defense. We instead take a game-theoretic approach in which both the attacker and the defender choose among combinations of attacks and defenses.


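To make the Mixed-Nash idea concrete, the defender's problem can be viewed as a two-player zero-sum game: rows are defenses, columns are attacks, and each entry is the robust accuracy of that defense under that attack. The defender's optimal randomized strategy is then the solution of a small linear program. The sketch below is illustrative only, assuming a hypothetical payoff matrix and a standard LP solver; it is not the paper's GaME implementation.

```python
import numpy as np
from scipy.optimize import linprog

def defender_mixed_nash(payoff):
    """Compute the defender's Mixed-Nash strategy for a zero-sum game.

    payoff[i, j] = robust accuracy of defense i against attack j.
    The defender maximizes the worst-case expected accuracy v over
    all attacks. (Illustrative sketch only.)
    """
    payoff = np.asarray(payoff, dtype=float)
    m, n = payoff.shape
    # Decision variables: x = [p_1, ..., p_m, v]; maximize v <=> minimize -v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # For every attack j:  v - sum_i p_i * payoff[i, j] <= 0
    A_ub = np.hstack([-payoff.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Defense probabilities must sum to one.
    A_eq = np.array([[1.0] * m + [0.0]])
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[:m], res.x[-1]

# Hypothetical 2-defense / 2-attack payoff matrix of robust accuracies.
payoff = [[0.8, 0.2],
          [0.3, 0.7]]
probs, value = defender_mixed_nash(payoff)
# probs ~ [0.4, 0.6], value ~ 0.5
```

In this toy example neither pure strategy is optimal: playing either defense alone yields worst-case accuracy of only 0.2 or 0.3, while the equilibrium mixture guarantees 0.5 regardless of which attack the adversary chooses.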