ADVERSARIAL TRAINING USING CONTRASTIVE DIVERGENCE

Abstract

To protect machine learning models against adversarial examples, adversarial training has become the most popular and powerful defense strategy against various adversarial attacks, injecting adversarial examples into the training data. However, generating adversarial examples suitable for ensuring model robustness is time-consuming and computationally expensive, which impedes the spread and application of adversarial training. In this work, we reformulate adversarial training as a combination of stationary distribution exploration, sampling, and training. Each update of the DNN parameters is based on several transitions from the data samples, taken as the initial states of a Hamiltonian system. Inspired by this new paradigm, we design a new generative method for adversarial training using Contrastive Divergence (ATCD), which approaches the equilibrium distribution of adversarial examples within only a few iterations by building on small modifications of the standard Contrastive Divergence (CD). Our adversarial training algorithm achieves much higher robustness than other state-of-the-art adversarial training acceleration methods on the ImageNet, CIFAR-10, and MNIST datasets and reaches a balance between performance and efficiency.

1. INTRODUCTION

Although deep neural networks have become increasingly popular and successful in many machine learning tasks (e.g., image recognition He et al. (2016b), speech recognition Hinton et al. (2012); van den Oord et al. (2016) and natural language processing Hochreiter & Schmidhuber (1997); Vaswani et al. (2017)), the discovery of adversarial examples Szegedy et al. (2014); Goodfellow et al. (2015) has attracted great attention to strengthening the robustness of deep neural networks (DNNs) under such subtle but malicious perturbations. These crafted samples pose potential security threats in various safety-critical tasks such as autonomous vehicles Evtimov et al. (2017) or face recognition Sharif et al. (2016); Dong et al. (2019), which are required to be highly stable and reliable. Unfortunately, the problem remains unresolved, since no consensus has yet been reached on the root cause of adversarial examples. Many defense methods Papernot et al. (2016); Na et al. (2018); Buckman et al. (2018), motivated by different interpretations of adversarial examples Goodfellow et al. (2015); Fawzi et al. (2018); Ma et al. (2018), were broken within a short time, indicating that there is still no thorough solution that settles this matter once and for all. Nonetheless, adversarial training Szegedy et al. (2014); Goodfellow et al. (2015) has shown its ability to make classifiers more robust against various attacks than any other defense Madry et al. (2018); Athalye et al. (2018). It offers an intuitive approach to the problem: first obtain suitable adversarial examples by solving an inner maximization problem, then update the parameters of the ML model on these examples by outer minimization. More and more advanced defenses Kannan et al. (2018); Lin et al. (2019); Xie et al. (2019); Zhang et al. (2019c) are built on adversarial training. However, a major issue of current adversarial training methods is their significantly higher computational cost compared with regular training. It often takes multiple days and hundreds of GPUs on ImageNet-like datasets to achieve good convergence Xie et al. (2019), which makes adversarial training nearly intractable for large models on massive data. Even for small-sized datasets like CIFAR10, adversarial training takes much longer than regular training.

To address this issue, we formulate the problem of generating adversarial examples in a Hamiltonian Monte Carlo (HMC) framework Neal et al. (2011), which can be viewed as exploring the stationary distribution of adversarial examples for the current parameters. The high computational cost of adversarial training can then be attributed to the long trajectories that HMC must produce. Therefore, enlightened by Contrastive Divergence (CD) Hinton (2002), we propose a new adversarial training algorithm called ATCD for strengthening the robustness of target models. We minimize the difference of the Kullback-Leibler divergences between two adjacent sampling steps to avoid running long Markov chain Monte Carlo (MCMC) chains. Instead of running the chain to equilibrium, we simply run it for a few or even only one full step and then update the parameters, which reduces the tendency of the chain to wander away from the initial distribution on the first step. Our approach is advantageous over existing ones in three respects:

• We offer a new perspective on adversarial example generation in an HMC framework. From the view of HMC, we bridge the relationship between several adversarial example generation methods and MCMC sampling, which effectively draws multiple fair samples from the underlying distribution of adversarial examples.

• By analyzing the trajectory shift of MCMC simulations of different lengths, we speed up adversarial training by proposing a contrastive adversarial training (ATCD) method, which accelerates the process of reaching the equilibrium distribution.

• We thoroughly evaluate the effectiveness of our algorithm in various settings and architectures on ImageNet, CIFAR10 and MNIST. Models trained by our proposed algorithm achieve robust accuracies markedly exceeding those trained by regular adversarial training and state-of-the-art speedup methods when defending against several attacks.

2. BACKGROUND AND RELATED WORK

Adversarial Defense. To deal with the threat of adversarial examples, different strategies have been studied to find countermeasures that protect ML models. These approaches can be roughly categorized into two main types:

(a) detection only and (b) complete defense. The former approaches Bhagoji et al. (2018); Ma et al. (2018); Lee et al. (2018); Tao et al. (2018); Zhang et al. (2018) aim to reject potentially malignant samples before feeding them to the ML models. The latter defenses obfuscate the gradient information of the classifiers to confuse the attack mechanisms, including gradient masking Papernot & McDaniel (2017); Athalye et al. (2018) or randomized models Liu et al. (2018); Xie et al. (2018a); Lecuyer et al. (2019); Liu et al. (2019). There are also add-on modules Xie et al. (2019); Svoboda et al. (2019); Akhtar et al. (2018); Liao et al. (2018) appended to the targeted network, and adversarial interpolation schemes Zhang & Xu (2020); Lee et al. (2020), to protect deep networks against adversarial attacks.

Fast Adversarial Training. Besides all the above methods, adversarial training Goodfellow et al. (2015); Kurakin et al. (2017); Kannan et al. (2018); Madry et al. (2018); Tramèr et al. (2018); Liu & Hsieh (2019); Wang et al. (2020; 2019) is the most effective way to ensure robustness, which has been widely verified in many works and competitions. However, few works focus on boosting robust accuracy at a reasonable training speed. Free Shafahi et al. (2019) recycles the gradient information already computed to reduce the overhead of adversarial training. YOPO Zhang et al. (2019b) recasts adversarial training as a discrete-time differential game and derives a Pontryagin's Maximum Principle (PMP) for it. Fast-FGSM Wong et al. (2020) combines FGSM with random initialization to accelerate the whole process.

Markov Chain Monte Carlo Methods. Markov chain Monte Carlo (MCMC) Neal (1993) provides a powerful framework for exploring complex solution spaces and achieves a nearly globally optimal solution independent of the initial state, but its slow convergence rate hinders its wide use in time-critical fields. By utilizing the gradient information of the target solution space, the Hamiltonian (or Hybrid) Monte Carlo method (HMC) Duane et al. (1987); Neal et al. (2011) achieves tremendous speed-ups over previous MCMC algorithms. Multiple variants of HMC Pasarica & Gelman (2010); Salimans et al. (2015); Hoffman & Gelman (2014) have been developed to adaptively tune the step size or the number of leapfrog iterations. The fusion of MCMC and machine learning Tu & Zhu (2002); Chen et al. (2014); Song et al. (2017); Xie et al. (2018b) also shows the great potential of MCMC.

Contrastive Divergence. Contrastive Divergence (CD) has achieved notable success in training energy-based models, including Restricted Boltzmann Machines (RBMs), as an efficient training algorithm.
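For readers unfamiliar with CD, the short-chain idea referenced above can be sketched with the standard objective from Hinton (2002); the notation below is ours, not this paper's:

```latex
% Contrastive Divergence objective (Hinton, 2002):
% run the Markov chain for only k transitions instead of to equilibrium.
\mathrm{CD}_k \;=\; \mathrm{KL}\!\left(p_0 \,\|\, p_\infty\right) \;-\; \mathrm{KL}\!\left(p_k \,\|\, p_\infty\right)
```

Here $p_0$ is the data distribution, $p_k$ is the distribution after $k$ Markov-chain transitions starting from the data, and $p_\infty$ is the model's equilibrium distribution. Since the chain moves toward equilibrium, $p_k$ is no farther from $p_\infty$ than $p_0$ is, so $\mathrm{CD}_k \ge 0$, and its gradient can be estimated with only $k$ transitions rather than a full run to equilibrium.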

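As a concrete reference for the HMC machinery discussed above, the following is a minimal, self-contained sketch of HMC with a leapfrog integrator, sampling a standard Gaussian target. It is purely illustrative: the names `leapfrog` and `hmc_sample` and all settings are our assumptions, not code or notation from the paper.

```python
import numpy as np

def leapfrog(q, p, grad_u, step_size, n_steps):
    """Leapfrog integrator: simulate Hamiltonian dynamics for n_steps."""
    p = p - 0.5 * step_size * grad_u(q)          # initial half step for momentum
    for i in range(n_steps):
        q = q + step_size * p                    # full step for position
        if i < n_steps - 1:
            p = p - step_size * grad_u(q)        # full step for momentum
    p = p - 0.5 * step_size * grad_u(q)          # final half step for momentum
    return q, p

def hmc_sample(n_samples, step_size=0.2, n_steps=10, seed=0):
    """Draw samples from a standard Gaussian (potential U(q) = q^2 / 2) via HMC."""
    u = lambda q: 0.5 * q ** 2                   # potential energy = -log density
    grad_u = lambda q: q
    rng = np.random.default_rng(seed)
    q, samples = 0.0, []
    for _ in range(n_samples):
        p = rng.standard_normal()                # resample auxiliary momentum
        q_new, p_new = leapfrog(q, p, grad_u, step_size, n_steps)
        # Metropolis correction on the change in total energy H = U + K
        h_old = u(q) + 0.5 * p ** 2
        h_new = u(q_new) + 0.5 * p_new ** 2
        if rng.random() < np.exp(h_old - h_new):
            q = q_new
        samples.append(q)
    return np.array(samples)
```

The leapfrog discretization is volume-preserving and time-reversible, which is what makes the simple Metropolis correction valid; adaptive variants such as those cited above tune `step_size` and `n_steps` automatically.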

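To make the cost argument behind short-chain training concrete, the sketch below shows a generic multi-step signed-gradient (PGD-style) inner maximization on a toy logistic model. This is not the paper's ATCD algorithm; the function name `pgd_attack`, the toy model, and all parameters are illustrative assumptions. Each iteration of the inner loop costs one gradient computation, so running many iterations per parameter update is what makes standard adversarial training expensive, while a CD-style scheme keeps `n_steps` very small (optionally with a random start, as in Fast-FGSM).

```python
import numpy as np

def pgd_attack(w, b, x, y, eps, step, n_steps, rng=None):
    """Multi-step signed-gradient attack on a logistic model (illustrative only).

    x: batch of inputs, shape (n, d); y: labels in {0, 1}, shape (n,).
    Each loop iteration is one inner-maximization gradient step.
    """
    x_adv = x.copy()
    if rng is not None:                          # optional random start in the eps-ball
        x_adv = x_adv + rng.uniform(-eps, eps, size=x.shape)
    for _ in range(n_steps):
        logits = x_adv @ w + b
        p = 1.0 / (1.0 + np.exp(-logits))        # sigmoid probabilities
        grad_x = (p - y)[:, None] * w            # gradient of cross-entropy loss w.r.t. x
        x_adv = x_adv + step * np.sign(grad_x)   # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps) # project back onto the eps-ball
    return x_adv
```

With `n_steps=1` and a random start this reduces to a Fast-FGSM-style single-step attack; larger `n_steps` approximates standard PGD adversarial training, multiplying the per-update cost accordingly.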