THE DARK SIDE OF AUTOML: TOWARDS ARCHITECTURAL BACKDOOR SEARCH

Published as a conference paper at ICLR 2023

Abstract

This paper asks the intriguing question: is it possible to exploit neural architecture search (NAS) as a new attack vector to launch previously improbable attacks? Specifically, we present EVAS, a new attack that leverages NAS to find neural architectures with inherent backdoors and exploits such vulnerability using input-aware triggers. Compared with existing attacks, EVAS demonstrates many interesting properties: (i) it does not require polluting training data or perturbing model parameters; (ii) it is agnostic to downstream fine-tuning or even re-training from scratch; (iii) it naturally evades defenses that rely on inspecting model parameters or training data. With extensive evaluation on benchmark datasets, we show that EVAS features high evasiveness, transferability, and robustness, thereby expanding the adversary's design spectrum. We further characterize the mechanisms underlying EVAS, which are possibly explainable by architecture-level "shortcuts" that recognize trigger patterns. This work showcases that NAS can be exploited in a harmful way to find architectures with inherent backdoor vulnerability. The code is available at https://github.com/ain-soph/nas_backdoor.

1. INTRODUCTION

As a new paradigm of applying ML techniques in practice, automated machine learning (AutoML) automates the pipeline from raw data to deployable models, which covers model design, optimizer selection, and parameter tuning. The use of AutoML greatly simplifies the ML development cycle and propels the trend of ML democratization. In particular, neural architecture search (NAS), one primary AutoML task, aims to find performant deep neural network (DNN) arches¹ tailored to given datasets. In many cases, NAS is shown to find models remarkably outperforming manually designed ones (Pham et al., 2018; Liu et al., 2019; Li et al., 2020).

In contrast to the intensive research on improving the capability of NAS, its security implications are largely unexplored. As ML models are becoming the new targets of malicious attacks (Biggio & Roli, 2018), the lack of understanding about the risks of NAS is highly concerning, given its surging popularity in security-sensitive domains (Pang et al., 2022). Towards bridging this striking gap, we pose the intriguing yet critical question: Is it possible for the adversary to exploit NAS to launch previously improbable attacks?

This work provides an affirmative answer to this question. We present exploitable and vulnerable arch search (EVAS), a new backdoor attack that leverages NAS to find neural arches with inherent, exploitable vulnerability. Conventional backdoor attacks typically embed the malicious functions ("backdoors") into the space of model parameters. They often assume strong threat models, such as polluting training data (Gu et al., 2017; Liu et al., 2018; Pang et al., 2020) or perturbing model parameters (Ji et al., 2018; Qi et al., 2022), and are thus subject to defenses based on model inspection (Wang et al., 2019; Liu et al., 2019) and data filtering (Gao et al., 2019).
In EVAS, however, as the backdoors are carried in the space of model arches, the backdoors are retained even if the victim trains the models using clean data and operates them in a black-box manner. Moreover, due to its independence of model parameters and training data, EVAS is naturally robust against defenses such as model inspection and input filtering.

To realize EVAS, we define a novel metric based on the neural tangent kernel (Chen et al., 2021), which effectively indicates the exploitable vulnerability of a given arch; further, we integrate this metric into the NAS-without-training framework (Mellor et al., 2021; Chen et al., 2021). The resulting search method efficiently identifies candidate arches without requiring model training or backdoor testing.

To verify EVAS's empirical effectiveness, we evaluate it on benchmark datasets and show: (i) EVAS successfully finds arches with exploitable vulnerability; (ii) the injected backdoors may be explained by arch-level "shortcuts" that recognize trigger patterns; and (iii) EVAS demonstrates high evasiveness, transferability, and robustness against defenses. Our findings show the feasibility of exploiting NAS as a new attack vector to implement previously improbable attacks, raise concerns about the current practice of NAS in security-sensitive domains, and point to potential directions for developing effective mitigation.
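To make the training-free search concrete, the sketch below implements the score underlying the NAS-without-training framework of Mellor et al. (2021): the log-determinant of the Hamming-similarity kernel over ReLU activation patterns of a few random probe inputs. This is an illustration of the scoring idea only, not EVAS's actual NTK-based vulnerability metric; the toy one-hidden-layer "architecture" and all dimensions are our own assumptions.

```python
import math
import random

random.seed(0)

def relu_codes(weights, inputs):
    """Binary ReLU activation pattern of a one-hidden-layer net, one code per input."""
    codes = []
    for x in inputs:
        pre = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
        codes.append(tuple(int(p > 0) for p in pre))
    return codes

def naswot_score(codes):
    """log|det K_H|, where K_H[i][j] counts agreeing activations between
    probes i and j (the NAS-without-training score of Mellor et al., 2021)."""
    n, h = len(codes), len(codes[0])
    K = [[float(h - sum(a != b for a, b in zip(ci, cj))) for cj in codes]
         for ci in codes]
    logdet = 0.0
    for i in range(n):  # Gaussian elimination with partial pivoting
        p = max(range(i, n), key=lambda r: abs(K[r][i]))
        K[i], K[p] = K[p], K[i]
        if K[i][i] == 0.0:
            return float("-inf")  # singular kernel: degenerate architecture
        logdet += math.log(abs(K[i][i]))
        for r in range(i + 1, n):
            f = K[r][i] / K[i][i]
            for c in range(i, n):
                K[r][c] -= f * K[i][c]
    return logdet

# Score one random "architecture": 16 hidden units, 8-dim inputs, 5 probes.
weights = [[random.gauss(0, 1) for _ in range(8)] for _ in range(16)]
probes = [[random.gauss(0, 1) for _ in range(8)] for _ in range(5)]
score = naswot_score(relu_codes(weights, probes))
```

Because the score requires only forward passes on a handful of probes, thousands of candidate arches can be ranked without any training, which is what makes the EVAS-style search over the arch space tractable.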

2. RELATED WORK

Next, we survey the literature relevant to this work.

Neural arch search. Existing NAS methods can be categorized along search space, search strategy, and performance measure. Search space: early methods focus on the chain-of-layer structure (Baker et al., 2017), while recent work searches for motifs of cell structures (Zoph et al., 2018; Pham et al., 2018; Liu et al., 2019). Search strategy: early methods rely on either random search (Jozefowicz et al., 2015) or Bayesian optimization (Bergstra et al., 2013), which are limited in model complexity; recent work mainly uses reinforcement learning (Baker et al., 2017) or neural evolution (Liu et al., 2019). Performance measure: one-shot NAS has emerged as a popular approach; it considers all candidate arches as different sub-graphs of a super-net (i.e., the one-shot model) and shares weights between candidate arches (Liu et al., 2019). Despite the intensive research on NAS, its security implications are largely unexplored. Recent work shows that NAS-generated models tend to be more vulnerable to various malicious attacks than manually designed ones (Pang et al., 2022; Devaguptapu et al., 2021). This work explores another dimension: whether NAS can be exploited as an attack vector to launch new attacks, which complements the existing studies on the security of NAS.

Backdoor attacks and defenses. Backdoor attacks inject malicious backdoors into the victim's model during training and activate such backdoors at inference. They can be categorized along attack targets: input-specific (Shafahi et al., 2018), class-specific (Tang et al., 2020), or any-input (Gu et al., 2017); attack vectors: polluting training data (Liu et al., 2018) or releasing infected models (Ji et al., 2018); and optimization metrics: attack effectiveness (Pang et al., 2020), transferability (Yao et al., 2019), or attack evasiveness (Chen et al., 2017).
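The weight-sharing scheme of one-shot NAS can be illustrated with a toy super-net. The cell structure, candidate operations, and scalar "weights" below are entirely made up for illustration; real systems (e.g., ENAS or DARTS) share tensors of convolution weights in the same way.

```python
import random

random.seed(1)

# Candidate operations on each edge of a toy cell (hypothetical ops).
OPS = {
    "skip":   lambda x, w: x,
    "scale":  lambda x, w: w * x,
    "negate": lambda x, w: -w * x,
}
EDGES = ("e0", "e1", "e2")

# One-shot model (super-net): each (edge, op) pair owns ONE shared
# parameter, reused by every sub-architecture that selects that pair.
shared = {(e, op): random.uniform(0.5, 1.5) for e in EDGES for op in OPS}

def sample_arch():
    """A candidate arch is a sub-graph of the super-net: one op per edge."""
    return {e: random.choice(sorted(OPS)) for e in EDGES}

def forward(arch, x):
    """Evaluate a sub-architecture using only the shared parameters."""
    for e in EDGES:
        op = arch[e]
        x = OPS[op](x, shared[(e, op)])
    return x

arch = sample_arch()
out = forward(arch, 1.0)
```

Because every sampled arch reads from the same `shared` table, candidate arches can be compared without training each one from scratch; this amortization is what makes one-shot NAS efficient.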
To mitigate such threats, many defenses have been proposed, which can be categorized according to their strategies (Pang et al., 2022): input filtering purges poisoning samples from training data (Tran et al., 2018); model inspection determines whether a given model is backdoored (Liu et al., 2019; Wang et al., 2019); and input inspection detects trigger inputs at inference time (Gao et al., 2019). Most attacks and defenses above focus on backdoors implemented in the space of model parameters. Concurrent to this work, Bober-Irizar et al. (2022) explore using neural arches to implement backdoors by manually designing "trigger detectors" in the arches and activating such detectors using poisoning data during training. This work instead investigates using NAS to directly search for arches with exploitable vulnerability, which represents a new direction of backdoor attacks.
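For intuition, a manually designed "trigger detector" of the kind described above can be caricatured in a few lines. The patch pattern, branch functions, and target label below are hypothetical stand-ins: the point is only that the override lives in the computation graph (the architecture), not in learned parameters.

```python
# Caricature of an architectural "trigger detector" (hypothetical values):
# a fixed sub-computation fires only when a designated input patch matches
# the trigger pattern, and then overrides the benign prediction.
TRIGGER = (1, 0, 1, 0)  # assumed 4-pixel corner patch

def benign_branch(x):
    # Stand-in for the normal classifier: any function of the input.
    return sum(x) % 10

def detector_branch(x):
    # Architectural shortcut: compares the corner patch to the trigger.
    return tuple(x[:4]) == TRIGGER

def backdoored_model(x, target=7):
    return target if detector_branch(x) else benign_branch(x)

clean = [0, 0, 0, 0, 3, 5]    # benign prediction
trigged = [1, 0, 1, 0, 3, 5]  # trigger present: output forced to target
```

Since the override is structural, retraining `benign_branch` on clean data leaves the backdoor intact, which is exactly why parameter-inspection defenses miss this class of attack.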

3. EVAS

Next, we present EVAS, a new backdoor attack leveraging NAS to find neural arches with exploitable vulnerability. We begin by introducing the threat model.

3.1. THREAT MODEL

A backdoor attack injects a hidden malicious function ("backdoor") into a target model (Pang et al., 2022). The backdoor is activated once a pre-defined condition ("trigger") is present, while the model functions normally on clean inputs.



¹ In the following, we use "arch" as shorthand for "architecture".

