PROBABILISTIC CATEGORICAL ADVERSARIAL ATTACK & ADVERSARIAL TRAINING

Abstract

The existence of adversarial examples raises serious concerns about applying Deep Neural Networks (DNNs) to safety-critical tasks. However, generating adversarial examples for categorical data is an important problem that lacks extensive exploration. Previously established methods leverage greedy search, which can be very time-consuming for conducting a successful attack. This also limits the development of adversarial training and potential defenses for categorical data. To tackle this problem, we propose the Probabilistic Categorical Adversarial Attack (PCAA), which transforms the discrete optimization problem into a continuous problem that can be solved efficiently by Projected Gradient Descent. In our paper, we theoretically analyze its optimality and time complexity to demonstrate its significant advantage over current greedy-based attacks. Moreover, based on our attack, we propose an efficient adversarial training framework. Through a comprehensive empirical study, we justify the effectiveness of our proposed attack and defense algorithms.

1. INTRODUCTION

Adversarial attacks (Goodfellow et al., 2015) have raised great concerns about the application of Deep Neural Networks (DNNs) in many security-critical domains (Cui et al., 2019; Stringhini et al., 2010; Cao & Tay, 2001). The majority of existing methods focus on differentiable models and continuous input spaces, where gradient-based approaches can be applied to generate adversarial examples. However, there are many machine learning tasks where the input data are categorical. For example, data in ML-based intrusion detection systems (Khraisat et al., 2019) contains records of system operations, and data in financial transaction systems includes categorical information such as the type of transaction. Therefore, exploring potential attacks and corresponding defenses for categorical inputs is also desirable.

Existing methods introduce search-based approaches for categorical adversarial attacks (Yang et al., 2020b; Lei et al., 2019a). For example, the method in (Yang et al., 2020a) first finds the top-K features of a given sample that have the maximal influence on the model output; then, a greedy search is applied to obtain the optimal combination of perturbations over these K features. However, such search-based methods are not guaranteed to find the strongest adversarial examples. Moreover, they can be computationally expensive, especially when the data is high-dimensional and the number of categories for each feature is large.

In this paper, we propose a novel Probabilistic Categorical Adversarial Attack (PCAA) algorithm, which generates categorical adversarial examples by estimating their probability distribution. In detail, given a clean sample, we assume that (each feature of) the adversarial example follows a categorical distribution satisfying: (1) samples drawn from this distribution have a high expected loss value, and (2) they differ from the original clean sample in only a few features.
(See Section 3 for more details.) In this way, we transfer the categorical adversarial attack from the discrete space to an optimization problem in a continuous probabilistic space, so that gradient-based methods such as PGD (Madry et al., 2017) can be applied to find adversarial examples. On one hand, PCAA searches for the distribution of adversarial examples over the whole space of allowed perturbations, which helps it find stronger adversarial examples (with higher loss values) than greedy search methods (Yang et al., 2020b). On the other hand, as the dimension of the input data grows, the computational cost of PCAA increases significantly more slowly than that of search-based methods (Section 3.4). Therefore, our method can enjoy good attack optimality and computational efficiency simultaneously. For example, in our experiments 1
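The continuous relaxation described above can be illustrated with a minimal sketch. Here we assume a toy linear classifier over one-hot encoded categorical features; the model, loss surrogate, and all names below are illustrative assumptions, not the paper's exact PCAA objective. Each feature's adversarial distribution is parameterized by unconstrained logits, and plain gradient ascent on the expected loss stands in for the PGD step (the paper's constraint that only a few features differ from the clean sample is omitted for brevity):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Toy setup (illustrative): d categorical features, each with k categories,
# scored by a linear model on the one-hot encoding.
rng = np.random.default_rng(0)
d, k = 5, 4
W = rng.normal(size=(d, k))            # per-category weights of the linear scorer
x_clean = rng.integers(0, k, size=d)   # clean categorical sample
y = 1.0                                # true label in {0, 1}

def expected_loss(logits):
    # Under independent per-feature categoricals with p = softmax(logits),
    # the expected one-hot encoding is p itself, so for a *linear* model
    # the expected score is simply sum(p * W). We then take the logistic
    # loss of that expected score as a smooth surrogate objective.
    p = softmax(logits)
    score = (p * W).sum()
    return np.log1p(np.exp(-(2 * y - 1) * score))

def grad_logits(logits):
    # Analytic gradient of the surrogate loss w.r.t. the logits,
    # via the per-feature softmax Jacobian diag(p) - p p^T.
    p = softmax(logits)
    score = (p * W).sum()
    s = 2 * y - 1
    dloss_dscore = -s / (1.0 + np.exp(s * score))
    g = np.zeros_like(logits)
    for i in range(d):
        jac = np.diag(p[i]) - np.outer(p[i], p[i])
        g[i] = dloss_dscore * (jac @ W[i])
    return g

# Initialize the distribution concentrated on the clean sample, then take
# gradient *ascent* steps to maximize the expected loss (the attack).
logits = np.full((d, k), -3.0)
logits[np.arange(d), x_clean] = 3.0
loss_before = expected_loss(logits)
for _ in range(50):
    logits += 0.5 * grad_logits(logits)
loss_after = expected_loss(logits)
x_adv = softmax(logits).argmax(axis=-1)  # decode most likely categories
```

Because the logits live in a continuous space, each attack step is an ordinary gradient update rather than a combinatorial search over category assignments, which is the efficiency argument sketched in this section.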

