

Abstract

Data augmentation is used extensively to improve model generalisation. However, reliance on external libraries to implement augmentation methods introduces a vulnerability into the machine learning pipeline. It is well known that backdoors can be inserted into machine learning models by serving a modified dataset for training. Augmentation therefore presents a perfect opportunity to perform this modification without requiring an initially backdoored dataset. In this paper we present three backdoor attacks that can be covertly inserted into data augmentation. Each of our attacks inserts a backdoor using a different type of computer vision augmentation transform, covering simple image transforms, GAN-based augmentation, and composition-based augmentation. By inserting the backdoor through these augmentation transforms, we make our backdoors difficult to detect while still supporting arbitrary backdoor functionality. We evaluate our attacks on a range of computer vision benchmarks and demonstrate that an attacker can introduce backdoors through a malicious augmentation routine alone.

1. INTRODUCTION

Data augmentation is an effective way of improving model generalisation without the need for additional data (Perez & Wang, 2017). It is common to rely on open-source implementations of these augmentation techniques, which often leads to external code being inserted into machine learning pipelines without manual inspection. This presents a threat to the integrity of the trained models. The use of external code to modify a dataset provides a perfect opportunity for an attacker to insert a backdoor into a model without overtly serving the backdoor as part of the original dataset. Backdoors based on BadNet are generally implemented by directly serving a malicious dataset to the model (Gu et al., 2017). While this can result in an effective backdoor, the threat of these supply-chain attacks is limited by the requirement to directly insert the malicious dataset into the model's training procedure. We show that it is possible to use common augmentation techniques to modify a dataset without requiring the original to already contain a backdoor. The general flow of backdoor insertion using augmentation is illustrated in Figure 1. More specifically, we present attacks using three different types of augmentation: (i) using standard transforms such as rotation or translation as the trigger, in a setup similar to BadNet (Gu et al., 2017); (ii) using GAN-based augmentation such as DAGAN (Antoniou et al., 2017), trained to insert a backdoor into the dataset; and (iii) using composed augmentations such as AugMix (Hendrycks et al., 2020) to efficiently construct gradients in a similar fashion to the Batch Order Backdoor described by Shumailov et al. (2021). In all three cases, the backdoored model has similar properties to BadNet, but with a threat model that does not require training on an initially malicious dataset and an insertion process that is more difficult to detect because the backdoor is implemented using genuine transforms.
Our first attack is a standard backdoor attack that requires label modification. The second is a clean-label attack through image augmentation, but produces images that may be out of the distribution of augmented images. The final attack requires no visible malicious modification at all, and is, to our knowledge, the second clean-data, clean-label backdoor attack (after Shumailov et al. (2021)). To summarise, we make the following contributions in this paper:

• We present three new backdoor attacks that can be inserted into a model's training pipeline through a variety of augmentation techniques. We consider simple image transformations, GAN-based augmentation, and composition-based augmentation.

• We build on previous gradient manipulation attacks by using AugMix in place of reordering, allowing us to manipulate gradients more efficiently through the use of gradient descent. This attack demonstrates that it is possible to perform clean-data, clean-label backdoor attacks using data augmentation, and significantly outperforms Shumailov et al. (2021).

• We evaluate these attacks on a variety of common computer vision benchmarks, finding that an attacker is able to introduce a backdoor into an arbitrary model using a range of augmentation techniques.

2. RELATED WORK

Backdoor attacks Gu et al. (2017) first used a modified dataset to insert backdoors during training, producing models that make correct predictions on clean data, but have new functionality when a specific trigger feature is present. Improvements to this process have since been made to create attacks that assume stronger threat models.



Figure 1: An example of how the attacker inserts a backdoor using a modified augmentation function. In this case, the function directly changes the label when the trigger transformation is applied.
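As a minimal sketch of the mechanism in Figure 1 (the function names, trigger choice, and poisoning rate below are illustrative assumptions, not the implementation evaluated in this paper), a malicious augmentation wrapper can masquerade as ordinary random rotation while occasionally applying a fixed trigger transform and rewriting the label:

```python
import random

# Hypothetical attacker-chosen constants (illustrative only).
TARGET_LABEL = 0    # class the backdoor should force
POISON_RATE = 0.05  # fraction of samples given the trigger

def rotate(image, degrees):
    # Stand-in for a real rotation transform (e.g. a library call);
    # here we rotate a 2D list-of-lists in 90-degree steps only.
    for _ in range((degrees // 90) % 4):
        image = [list(row) for row in zip(*image[::-1])]
    return image

def malicious_augment(image, label):
    """Looks like benign random-rotation augmentation, but with
    probability POISON_RATE applies the fixed trigger rotation and
    silently rewrites the label to the attacker's target class."""
    if random.random() < POISON_RATE:
        return rotate(image, 180), TARGET_LABEL  # backdoor branch
    return rotate(image, random.choice([0, 90, 270])), label
```

Because the trigger is itself a legitimate augmentation parameter rather than an added pixel perturbation, the poisoned samples are indistinguishable from ordinary augmented data on visual inspection.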

Ma et al. (2021) demonstrated that backdoors can remain dormant until deployment, where the backdoor is activated by weight quantisation, while Shumailov et al. (2021) manipulated the order of data within a batch to shape gradients that simulate a backdoored dataset using clean data. Chen et al. (2017) first investigated triggers that are difficult for humans to identify. Attacks that insert backdoors without modifying a dataset have also been demonstrated, for example by inserting malicious components directly into the model's architecture (Bober-Irizar et al., 2022), or by perturbing the model's weights after training (Dumford & Scheirer, 2018). Many of these techniques assume direct access to either the model itself or its training set. Methods that use preprocessing to indirectly insert backdoors have been shown to be an effective mechanism for compromising machine learning pipelines. Quiring et al. (2020) and Gao et al. (2021) discuss inserting backdoors using image scaling by adding additional perturbations to the scaling procedure. Our attacks are similarly inserted into augmentation functions, but insert their backdoors through the random parameters of the augmentation rather than by adding additional perturbations to the images, in order to remain more discreet. Our attacks each focus on a different class of data augmentation function, building on the work of Wu et al. (2022), who investigate only the rotation transformation. Here we consider the more general threat of adversarial augmentation as a mechanism for inserting backdoors into the training pipeline while remaining covert, since the backdoor enters through the augmentations' random parameters rather than through direct dataset modification.

Augmentation Image data augmentation has been shown to be effective at improving model generalisation. Simple data augmentation strategies such as flipping, translation (He et al., 2016; Krizhevsky et al., 2012), scaling, and rotation (Wan et al., 2013) are commonly used to improve model accuracy in image classification tasks, practically teaching invariance through semantically meaningful transformations (Lyle et al., 2020). More complex augmentation methods based on generative deep learning (Antoniou et al., 2017; Zhu et al., 2017) are now common, as they have demonstrated strong performance on tasks where class-invariant transforms are non-trivial and hard for a human to define. Rather than encoding a direct invariance, Cutout (DeVries & Taylor, 2017) removes a random portion of each image, while mixing techniques (Yun et al., 2019; Zhang et al., 2018) mix two random images into one image with a combined label. AugMix (Hendrycks et al., 2020) uses random compositions of simpler transforms to provide more possible augmentations. AugMax (Wang et al.,

