ADVERSARIAL ATTACKS ON BINARY IMAGE RECOGNITION SYSTEMS

Abstract

We initiate the study of adversarial attacks on models for binary (i.e. black and white) image classification. Although there has been a great deal of work on attacking models for colored and grayscale images, little is known about attacks on models for binary images. Models trained to classify binary images are used in text recognition applications such as check processing, license plate recognition, invoice processing, and many others. In contrast to colored and grayscale images, the search space of attacks on binary images is extremely restricted and noise cannot be hidden with minor perturbations in each pixel. Thus, the optimization landscape of attacks on binary images introduces new fundamental challenges. In this paper we introduce a new attack algorithm called SCAR, designed to fool classifiers of binary images. We show that SCAR significantly outperforms existing L0 attacks applied to the binary setting and use it to demonstrate the vulnerability of real-world text recognition systems. SCAR's strong performance in practice contrasts with hardness results showing the existence of worst-case classifiers for binary images that are robust to large perturbations. In many cases, altering a single pixel is sufficient to trick Tesseract, a popular open-source text recognition system, into misclassifying a word as a different word in the English dictionary. We also demonstrate the vulnerability of check recognition by fooling commercial check processing systems used by major US banks for mobile deposits. These systems are substantially harder to fool since they independently classify both the handwritten amounts in digits and in letters. Nevertheless, we generalize SCAR to design attacks that fool state-of-the-art check processing systems using unnoticeable perturbations that lead to misclassification of deposit amounts. Consequently, such attacks constitute a powerful method for performing financial fraud.

1. INTRODUCTION

In this paper we study adversarial attacks on models designed to classify binary (i.e. black and white) images. Models for binary image classification are heavily used across a variety of applications that include receipt processing, passport recognition, check processing, and license plate recognition, to name a few. In such applications, the text recognition system typically binarizes the input image (e.g. check processing (Jayadevan et al., 2012), document extraction (Gupta et al., 2007)) and trains a model to classify binary images. In recent years there has been an overwhelming interest in understanding the vulnerabilities of AI systems. In particular, a great deal of work has designed attacks on image classification models (e.g. Szegedy et al., 2013; Goodfellow et al., 2014; Moosavi-Dezfooli et al., 2016; Kurakin et al., 2016; Papernot et al., 2016; Madry et al., 2017; Carlini & Wagner, 2017; Chen et al., 2017; Ilyas et al., 2018a;b; Tu et al., 2019; Guo et al., 2019; Li et al., 2019). Such attacks distort images in a manner that is virtually imperceptible to the human eye and yet cause state-of-the-art models to misclassify these images. Although there has been a great deal of work on attacking image classification models, these attacks are designed for colored and grayscale images: they hide the noise in the distorted images by making minor perturbations to the color values of each pixel. Somewhat surprisingly, when it comes to binary images, the vulnerability of state-of-the-art models is poorly understood. In contrast to colored and grayscale images, the search space of attacks on binary images is extremely restricted, and noise cannot be hidden with minor perturbations of color values in each pixel. As a result, existing attack algorithms on machine learning systems do not apply to binary inputs.
Since binary image classifiers are used in high-stakes decision making and are heavily used in banking and other multi-billion dollar industries, the natural question is: Are models for binary image classification used in industry vulnerable to adversarial attacks? In this paper we initiate the study of attacks on binary image classifiers. We develop an attack algorithm, called SCAR, designed to fool binary image classifiers. SCAR carefully selects pixels to flip to the opposite color in a query-efficient manner, which is a central challenge when attacking black-box models. We first show that SCAR outperforms existing attacks, adapted to the binary setting, on multiple models trained on the MNIST and EMNIST datasets, as well as on models for handwritten string and printed word recognition. We then use SCAR to demonstrate the vulnerability of text recognition systems used in industry: we fool commercial check processing systems used by US banks for mobile check deposits. One major challenge in attacking these systems, whose software we licensed from providers, is that there are two independent classifiers, one for the amount written in words and one for the amount written in numbers, that must be fooled with the same wrong amount. Check fraud is a major concern for US banks, accounting for $1.3 billion in losses in 2018 (American Bankers Association, 2020). Since check fraud occurs at large scale, we believe that the vulnerability of check processing systems to adversarial attacks raises a serious concern. We also show that no attack can obtain reasonable guarantees on the number of pixel inversions needed to cause misclassification, as there exist simple classifiers that are provably robust to large perturbations.
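The core idea of attacking a binary image, iteratively flipping whichever pixel most damages the model's confidence in the true class, can be illustrated with a minimal greedy sketch. This is an illustrative baseline, not the SCAR algorithm itself, and the `model` interface (a function mapping a batch of flattened 0/1 images to class probabilities) is an assumption for the example; a naive greedy search like this spends one query per pixel per step, which is exactly the query cost a practical black-box attack must reduce.

```python
import numpy as np

def greedy_flip_attack(model, image, true_label, max_flips=30):
    """Greedy L0 attack sketch for binary images: at each step, flip the
    single pixel that most reduces the model's confidence in true_label.
    `image` is a 0/1 integer array; `model(batch)` is assumed to return
    class probabilities of shape (n_images, n_classes)."""
    x = image.copy().ravel()
    for _ in range(max_flips):
        probs = model(x[None, :])[0]
        if probs.argmax() != true_label:
            return x.reshape(image.shape)  # misclassified: attack succeeded
        # Evaluate every single-pixel flip as one batch of queries.
        candidates = np.tile(x, (x.size, 1))
        candidates[np.arange(x.size), np.arange(x.size)] ^= 1
        scores = model(candidates)[:, true_label]
        x = candidates[scores.argmin()]  # keep the most damaging flip
    return None  # attack failed within the flip budget
```

Against a toy classifier, the loop flips pixels one at a time until the predicted label changes, so the L0 distance of the adversarial example equals the number of iterations used.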
Informally, there exist classifiers for d-dimensional binary images such that every class contains some image that requires Ω(d) pixel inversions (in L0 distance) to change the label of that image, and such that for every class, a random image in that class requires Ω(√d) pixel inversions in expectation.

Related work. The study of adversarial attacks was initiated in the seminal work of Szegedy et al. (2013), which showed that models for image classification are susceptible to minor perturbations of the input. There has since been a long line of work developing attacks on colored and grayscale images. Most relevant to us are L0 attacks, which iteratively make minor perturbations to carefully chosen pixels so as to minimize the total number of pixels modified (Papernot et al., 2016; Carlini & Wagner, 2017; Schott et al., 2018; Guo et al., 2019). We compare our attack to the two L0 attacks that are applicable in the black-box binary setting (Schott et al., 2018; Guo et al., 2019). Another related line of research develops attacks that query the model as few times as possible (Chen et al., 2017; Ilyas et al., 2018a;b; Guo et al., 2019; Li et al., 2019; Tu et al., 2019; Al-Dujaili & O'Reilly, 2019); we discuss below why most of these attacks cannot be applied to the binary setting. There has been previous work on attacking OCR systems (Song & Shmatikov, 2018), but that setting involves grayscale images and white-box access to the model. Attacks on colored and grayscale images employ continuous optimization techniques and are fundamentally different from attacks on binary images, which, due to the binary nature of each pixel, require combinatorial optimization approaches. Previous work has formulated adversarial attacks as combinatorial optimization problems, but in drastically different settings. Lei et al. (2018) consider attacks on text classification for tasks such as sentiment analysis and fake news detection, which is a different domain than OCR. Moon et al. (2019) formulate L∞ attacks on colored image classification as a combinatorial optimization problem where the search space for the change in each pixel is {-ε, ε} instead of [-ε, ε]. Finally, we note that binarization, i.e. transforming colored or grayscale images into black and white images, has been studied as a technique to improve the robustness of models (Schott et al., 2018; Schmidt et al., 2018; Ding et al., 2019).

Previous attacks are ineffective in the binary setting. Previous attacks on grayscale (or colored) images are not directly applicable to our setting since they rely on small perturbations of pixel values, which are not possible with binary images. One potential way to reuse these attacks is to relax the binary values to the grayscale range; the issue with this approach is that small changes in the relaxed grayscale domain are lost when the pixel values are rounded back to a valid binary input for the classifier. Another approach is to increase the step size of an attack so that a small change in a grayscale pixel value instead flips a binary pixel value. This approach is most relevant to L0 attacks since they perturb a small number of pixels. However, even for the two L0 attacks that can be applied to the binary setting in this way (Guo et al., 2019; Schott et al., 2018), it results in a large and visible number of pixel inversions, as shown in Section 6.
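The failure of the relaxation approach can be seen concretely in a small sketch. Assuming a standard 0.5 binarization threshold (the threshold value is an assumption for illustration), any grayscale perturbation too small to cross the threshold is erased before the classifier ever sees it:

```python
import numpy as np

def binarize(img, threshold=0.5):
    """Round grayscale pixel values back to a valid binary input."""
    return (img >= threshold).astype(int)

clean = np.array([0.0, 0.0, 1.0, 1.0])                   # already-binary input
noise = np.array([0.1, -0.05, -0.1, 0.08])               # small grayscale attack
perturbed = clean + noise

print(binarize(clean))      # [0 0 1 1]
print(binarize(perturbed))  # [0 0 1 1] -- the perturbation vanishes
```

Only a perturbation large enough to push a pixel across the threshold survives rounding, which is precisely a full pixel inversion; this is why attacks on binary images must reason directly about which pixels to flip.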

