BOUNDED ATTACKS AND ROBUSTNESS IN IMAGE TRANSFORM DOMAINS

Abstract

Classical image transformations such as the discrete cosine transform (DCT) and discrete wavelet transforms (DWTs) provide semantically meaningful representations of images. In this paper we propose a general method for adversarial attacks in such transform domains that, in contrast to prior work, obeys the L∞ constraint in the pixel domain. The key idea is to replace the standard projection-based attack with the barrier method. Experiments with the DCT and DWTs produce adversarial examples that are significantly more similar to the original images than those of prior attacks. Further, through adversarial training we show that robustness against our attacks transfers to robustness against a broad class of common image perturbations.
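To make the abstract's key idea concrete, the following is a minimal, hypothetical sketch (not the paper's actual algorithm): a toy linear score stands in for the network loss, the perturbation is parameterized by DCT coefficients, and a log-barrier term, rather than a projection, keeps the pixel-domain perturbation inside the L∞ ball. All names and constants are illustrative assumptions.

```python
# Illustrative sketch: optimize a perturbation in the DCT domain while a
# log-barrier enforces the pixel-domain L-infinity bound (no projection).
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(0)
H, W = 8, 8
eps = 0.03        # pixel-domain L-infinity budget
mu = 1e-3         # barrier weight
lr = 0.005        # step size in coefficient space

x = rng.uniform(0.2, 0.8, (H, W))    # clean grayscale "image"
w = rng.standard_normal((H, W))      # toy linear model: loss(x) = <w, x>

c = np.zeros((H, W))                 # perturbation as DCT coefficients
for _ in range(300):
    prev = c.copy()
    delta = idctn(c, norm="ortho")   # pixel-domain perturbation
    # d loss / d delta = w; the orthonormal inverse DCT has the forward
    # DCT as its adjoint, so the chain rule maps gradients with dctn.
    g_loss = dctn(w, norm="ortho")
    # Log-barrier B(delta) = -sum log(eps - delta) - sum log(eps + delta)
    # blows up at the boundary |delta| = eps, keeping iterates feasible.
    g_barrier = dctn(1.0 / (eps - delta) - 1.0 / (eps + delta), norm="ortho")
    c = c + lr * (g_loss - mu * g_barrier)
    # Backtrack toward the previous (feasible) iterate if we overshot.
    while np.max(np.abs(idctn(c, norm="ortho"))) >= eps:
        c = 0.5 * (c + prev)

delta = idctn(c, norm="ortho")       # final pixel-domain perturbation
```

The barrier never lets an iterate leave the interior of the feasible set, so, unlike projection-based attacks, the optimization stays in the transform domain throughout while the constraint lives in the pixel domain.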



Typically, the distance between a clean and a perturbed input is measured by an Lp norm.¹ In particular, L0 (a pseudonorm) and L∞ have been argued to be necessary adversarial robustness metrics for images Kotyan & Vargas (2022), since they are easily interpretable: the number of modified pixels and a pixel-wise threshold, respectively. Further, Hendrycks & Dietterich (2019) noticed an interesting interaction between L∞ adversarial robustness and common image corruptions such as motion blur, shot noise, and frost. An additional argument for L2 and L∞ perturbations is the existence of closed formulas for the projections needed in common attacks like Projected Gradient Descent (PGD) Shafahi et al. (2013).

¹ All norms in this paper are vector norms, i.e., an H × W RGB image is considered as a vector in R^n, n = 3HW, not as a matrix.

Adversarial attacks Szegedy et al. (2014); Papernot et al. (2016a) have raised concerns about the safety and robustness of deploying neural networks in critical decision-making processes. Given a neural network that makes accurate predictions on clean data, these attacks modify inputs in a way indiscernible to humans to produce erroneous predictions. Adversarial attacks can be broadly grouped into black-box and white-box attacks Papernot et al. (2016a); Tramèr et al. (2018). White-box attacks have full access to the neural network architecture, its weights, the training data, and the learning algorithm Goodfellow et al. (2015); Kurakin et al. (2017); Papernot et al. (2016a); Madry et al. (2018); Croce & Hein (2020). Black-box attacks are only allowed to query the target network and observe the input-output relationship Narodytska & Kasiviswanathan (2017); Brendel et al. (2017); Su et al. (2019); Andriushchenko et al. (2020). Many approaches have been proposed to detect adversarial examples Xu et al. (2018); Ma et al. (2018); Feinman et al. (2017); Metzen et al. (2017) and to defend against them Gu & Rigazio (2014); Papernot et al. (2016b); Liao et al. (2018); Xie et al. (2019); Zhou et al. (2021). However, most of these defenses can again be broken by suitable adaptive attacks Tramèr et al. (2020); Carlini & Wagner (2017). Adversarial training Kurakin et al. (2017); Madry et al. (2018), a seminal approach that augments the training data with adversarial examples, has proven effective for training empirically Zhang et al. (2019) and provably Salman et al. (2019) robust neural networks. Another approach, proposed by Balunovic & Vechev (2019), combines adversarial training with provable defenses to boost certified robustness. Further, the robustness of trained neural networks can be verified formally through abstract interpretation and relaxations Singh et al. (2019); Xu et al. (2020); Bunel et al. (2020); Müller et al. (2022).
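The closed-form L∞ projection that makes PGD cheap is simply an elementwise clip. A minimal, self-contained sketch (a toy linear loss stands in for the network; the function name and parameters are hypothetical):

```python
# Hedged sketch of L-infinity PGD: signed gradient ascent, with the
# closed-form projection onto the eps-ball implemented as a clip.
import numpy as np

def pgd_linf(x, grad_fn, eps=8 / 255, alpha=2 / 255, steps=10):
    """Take signed gradient steps; after each one, project back into the
    L-infinity ball of radius eps around x and into the valid range [0, 1]."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # closed-form L-inf projection
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep a valid image
    return x_adv

rng = np.random.default_rng(0)
# A 3 x H x W RGB image, treated as a vector in R^n with n = 3HW.
x = rng.uniform(0.3, 0.7, (3, 4, 4))
w = rng.standard_normal(x.shape)      # toy linear loss L(x) = <w, x>, grad = w
adv = pgd_linf(x, lambda z: w)
```

Because the projection onto the L∞ ball decomposes per coordinate, it costs one clip per pixel; no such closed form exists for general constraint sets, which is one motivation for barrier-based alternatives.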

