DEFENDING AGAINST ADVERSARIAL AUDIO VIA DIFFUSION MODEL

Abstract

Deep learning models have been widely used in commercial acoustic systems in recent years. However, adversarial audio examples can cause abnormal behaviors in these acoustic systems while being hard for humans to perceive. Various methods, such as transformation-based defenses and adversarial training, have been proposed to protect acoustic systems from adversarial attacks, but they are less effective against adaptive attacks. Furthermore, directly applying methods from the image domain can lead to suboptimal results because of the unique properties of audio data. In this paper, we propose an adversarial purification-based defense pipeline, AudioPure, for acoustic systems via off-the-shelf diffusion models. Taking advantage of the strong generation ability of diffusion models, AudioPure first adds a small amount of noise to the adversarial audio and then runs the reverse sampling steps to purify the noisy audio and recover the clean audio. AudioPure is a plug-and-play method that can be directly applied to any pretrained classifier without any fine-tuning or re-training. We conduct extensive experiments on the speech command recognition task to evaluate the robustness of AudioPure. Our method is effective against diverse adversarial attacks (e.g., those bounded by the L2- or L∞-norm). It outperforms the existing methods under both strong adaptive white-box and black-box attacks bounded by the L2- or L∞-norm (up to +20% in robust accuracy). Besides, we also evaluate the certified robustness against perturbations bounded by the L2-norm via randomized smoothing. Our pipeline achieves a higher certified accuracy than the baselines. Code is available at https://github.com/cychomatica/AudioPure.

1. INTRODUCTION

Deep neural networks (DNNs) have demonstrated great successes in different tasks in the audio domain, such as speech command recognition, keyword spotting, speaker identification, and automatic speech recognition. Acoustic systems built on DNNs (Amodei et al., 2016; Shen et al., 2019) are applied in safety-critical applications ranging from making phone calls to controlling household security systems. Although DNN-based models have exhibited significant performance improvements, extensive studies have shown that they are vulnerable to adversarial examples (Szegedy et al., 2014; Carlini & Wagner, 2018; Qin et al., 2019; Du et al., 2020; Abdullah et al., 2021; Chen et al., 2021a), where attackers add imperceptible and carefully crafted perturbations to the original audio to mislead the system into incorrect predictions. Thus, it becomes crucial to design robust DNN-based acoustic systems against adversarial examples. To address this, existing works (e.g., Rajaratnam & Alshemali, 2018; Yang et al., 2019) have tried to leverage the temporal dependency property of audio to defend against adversarial examples. They apply time-domain and frequency-domain transformations to the adversarial examples to improve robustness. Although they can alleviate the problem to some extent, they remain vulnerable to strong adaptive attacks in which the attacker has full knowledge of the whole acoustic system (Tramer et al., 2020). Another way to enhance robustness against adversarial examples is adversarial training (Goodfellow et al., 2015; Madry et al., 2018), in which adversarial perturbations are added during the training stage. Although it has been acknowledged as the most effective defense, the training process requires expensive computational resources, and the model remains vulnerable to other types of adversarial examples that are dissimilar to those used during training (Tramer & Boneh, 2019).
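To make the threat model concrete, the sketch below illustrates how an L∞-bounded adversarial example is typically crafted with projected gradient descent (PGD, Madry et al., 2018). It is a minimal NumPy illustration on a toy logistic-regression "classifier" with an analytic input gradient, not an attack on a real acoustic model; the function name `pgd_linf` and all hyperparameters are illustrative choices, not from the paper.

```python
import numpy as np

def pgd_linf(x, y, w, b, eps=0.1, alpha=0.02, steps=10):
    """L-infinity PGD against a toy logistic-regression model.

    Loss: binary cross-entropy of sigmoid(w.x + b) against label y in {0, 1}.
    For this model the input gradient is analytic: dL/dx = (sigmoid(w.x + b) - y) * w.
    """
    x_adv = x.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w @ x_adv + b)))  # model's predicted probability
        grad = (p - y) * w                          # gradient of the loss w.r.t. the input
        x_adv = x_adv + alpha * np.sign(grad)       # ascent step on the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)    # project back into the L-inf ball
    return x_adv

# A point the model classifies confidently as y=1 is pushed toward the boundary.
rng = np.random.default_rng(0)
w, b = rng.normal(size=16), 0.0
x = w / np.linalg.norm(w)          # clean input aligned with w (class 1)
x_adv = pgd_linf(x, 1.0, w, b)
```

The perturbation stays within the eps-ball by construction, which is what makes such examples hard for humans to perceive while still degrading the model's confidence.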
Adversarial purification (Yoon et al., 2021; Shi et al., 2021; Nie et al., 2022) is another family of defense methods that utilizes generative models to purify the adversarial perturbations of input examples before they are fed into neural networks. The key to such methods is to design an effective generative model for purification. Recently, diffusion models have been shown to be the state-of-the-art generative models for image (Song & Ermon, 2019; Ho et al., 2020; Nichol & Dhariwal, 2021; Dhariwal & Nichol, 2021) and audio synthesis (Kong et al., 2021; Chen et al., 2021b), which motivates the community to use them for purification. In particular, in the image domain, DiffPure (Nie et al., 2022) applies diffusion models as purifiers and obtains good performance in terms of both clean and robust accuracy on various image classification tasks. Since such methods do not require training the model with pre-defined adversarial examples, they can generalize to diverse threats. Given the significant progress of diffusion models in the image domain, it motivates us to ask: is it possible to obtain similar success in the audio domain? Unlike images, audio signals have some unique properties. There are different choices of audio representations, including raw waveforms and various types of time-frequency representations (e.g., Mel spectrogram, MFCC). When designing an acoustic system, particular audio representations may be selected as the target features, and defenses that work well on some features may perform poorly on others. In addition, one may think of treating the 2-D time-frequency representation (i.e., the spectrogram) as an image, where the frequency-axis is set as height and the time-axis is set as width, and then directly applying the successful DiffPure (Nie et al., 2022) from the image domain to the spectrogram.
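To illustrate the representation choice discussed above, the following minimal NumPy sketch computes a magnitude spectrogram from a raw waveform via a short-time Fourier transform; the resulting 2-D array is exactly the kind of (frequency × time) object one might be tempted to treat as an image. The function name and the frame/hop sizes are illustrative, not taken from the paper.

```python
import numpy as np

def magnitude_spectrogram(wave, n_fft=256, hop=128):
    """Magnitude STFT: frame the waveform, window each frame,
    and stack the |FFT| of each frame as a column of a (freq, time) array."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    # rfft keeps the non-redundant half of the spectrum for real-valued signals
    spec = np.abs(np.fft.rfft(frames, axis=1))
    return spec.T  # shape: (n_fft // 2 + 1, n_frames) -- freq as "height", time as "width"

sr = 16000
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 440 * t)  # one second of a 440 Hz tone
spec = magnitude_spectrogram(wave)
print(spec.shape)                   # (129, 124)
```

A pure tone yields a spectrogram with energy concentrated in a single frequency row, which is why the 2-D representation looks image-like; the next paragraph explains why naively reusing image-domain purifiers on it is nonetheless problematic.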
Despite its simplicity, this approach has two major issues: i) the acoustic system can take audio of variable duration as input, while the underlying diffusion model within DiffPure can only handle inputs with fixed width and height; ii) even if we apply it in a fixed-length, segment-wise manner, it still achieves suboptimal results, as we demonstrate in this work. These unique issues pose new challenges for designing and evaluating defense systems in the audio domain. In this work, we aim to defend against diverse unseen adversarial examples without adversarial training. We propose a plug-and-play purification pipeline named AudioPure based on a pre-trained diffusion model, leveraging the unique properties of audio. Specifically, our model consists of two main components: (1) a waveform-based diffusion model and (2) a classifier. It takes the audio waveform as input and leverages the diffusion model to purify the adversarial audio perturbations. Given an adversarial input in waveform form, AudioPure first adds a small amount of noise via the diffusion process to override the adversarial perturbations, and then uses the truncated reverse process to recover the clean sample. The recovered sample is fed into the classifier. We conduct extensive experiments to evaluate the robustness of our method on the speech command recognition task. We carefully design the adaptive attacks so that the attacker can accurately compute the full gradients, in order to evaluate the effectiveness of our method. In addition, we comprehensively evaluate the robustness of our method against different black-box attacks and the Expectation Over Transformation (EOT) attack. Our method performs better under both white-box and black-box attacks against diverse adversarial examples.
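The diffuse-then-denoise loop described above can be sketched as follows. This is a schematic NumPy version of DDPM-style purification under standard assumptions (a linear beta schedule, a noise-prediction denoiser), not the authors' implementation: the stand-in `denoiser` and the truncation depth `n_star` are placeholders where a real system would plug in a pretrained waveform diffusion model (e.g., DiffWave) and a tuned noise level.

```python
import numpy as np

def purify(x_adv, denoiser, betas, n_star):
    """Sketch of diffusion-based purification (the DiffPure/AudioPure idea):
    1) forward-diffuse the input for n_star steps, drowning the adversarial
       perturbation in Gaussian noise;
    2) run the truncated reverse (DDPM ancestral sampling) chain back to step 0."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    rng = np.random.default_rng(0)

    # Forward process in one closed-form jump:
    # x_n = sqrt(abar_n) * x_0 + sqrt(1 - abar_n) * eps,  eps ~ N(0, I)
    eps = rng.standard_normal(x_adv.shape)
    x = (np.sqrt(alpha_bar[n_star - 1]) * x_adv
         + np.sqrt(1 - alpha_bar[n_star - 1]) * eps)

    # Truncated reverse process from step n_star down to step 0
    for n in range(n_star - 1, -1, -1):
        eps_hat = denoiser(x, n)  # noise-prediction network
        x = (x - betas[n] / np.sqrt(1 - alpha_bar[n]) * eps_hat) / np.sqrt(alphas[n])
        if n > 0:  # no noise is added at the final reverse step
            x = x + np.sqrt(betas[n]) * rng.standard_normal(x.shape)
    return x

# Toy usage with a zero "denoiser" stand-in; a real pipeline would then feed
# x_pure into the downstream classifier.
betas = np.linspace(1e-4, 0.02, 200)
x_adv = np.zeros(16000)  # placeholder adversarial waveform
x_pure = purify(x_adv, lambda x, n: np.zeros_like(x), betas, n_star=5)
```

The key design choice is that `n_star` is small relative to the full chain: enough injected noise to wash out the adversarial perturbation, but little enough that the reverse process can still recover the semantic content of the utterance.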
Moreover, we also evaluate the certified robustness of AudioPure via randomized smoothing, which offers a provable guarantee of model robustness against perturbations bounded by the L2-norm. We show that our method achieves better certified robustness than the baselines. Specifically, our method obtains a significant improvement (up to +20% in robust accuracy) compared to adversarial training, and over 5% higher certified robust accuracy than the baselines. To the best of our knowledge, we are the first to use diffusion models to enhance the security of acoustic systems and to investigate how the working domains of defenses affect adversarial robustness.
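For readers unfamiliar with randomized smoothing (Cohen et al., 2019), the certification procedure can be sketched as below. This is a simplified Monte Carlo version: for brevity it uses the raw top-class frequency where a rigorous certificate would use a Clopper-Pearson lower confidence bound, and the function name and parameters are illustrative rather than from the paper.

```python
import numpy as np
from statistics import NormalDist

def certify(f, x, sigma=0.25, n=1000, seed=0):
    """Sketch of randomized smoothing for an L2 robustness guarantee.

    The smoothed classifier g(x) = argmax_c P[f(x + N(0, sigma^2 I)) = c] is
    provably constant within L2 radius R = sigma * Phi^{-1}(pA), where pA is
    (a lower confidence bound on) the top-class probability.  This sketch
    substitutes the plain Monte Carlo estimate for the confidence bound.
    """
    rng = np.random.default_rng(seed)
    counts = {}
    for _ in range(n):
        c = f(x + sigma * rng.standard_normal(x.shape))  # vote under Gaussian noise
        counts[c] = counts.get(c, 0) + 1
    top = max(counts, key=counts.get)
    p_a = min(counts[top] / n, 1.0 - 1.0 / n)  # keep Phi^{-1} finite
    if p_a <= 0.5:
        return top, 0.0                        # abstain: no certified radius
    return top, sigma * NormalDist().inv_cdf(p_a)

# Toy classifier: predicts the sign of the first coordinate.
f = lambda z: int(z[0] > 0)
x = np.full(10, 0.5)
label, radius = certify(f, x)
```

In the paper's setting, `f` would be the purification pipeline composed with the classifier, so the certified radius quantifies how large an L2-bounded audio perturbation the whole defended system provably tolerates.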

2. RELATED WORK

Adversarial attacks and defenses. Szegedy et al. (2014) introduce adversarial examples, which look similar to normal examples but fool neural networks into giving incorrect predictions. Usually, adversarial examples are constrained by an Lp-norm to ensure imperceptibility. Recently, stronger attack methods have emerged (Madry et al., 2018; Carlini & Wagner, 2017; Andriushchenko et al., 2020; Croce & Hein, 2020; Xiao et al., 2018a,b, 2019, 2022a,b; Cao et al., 2019a,b, 2022a).


