ADVERSARIAL ATTACKS ON BINARY IMAGE RECOGNITION SYSTEMS

Abstract

We initiate the study of adversarial attacks on models for binary (i.e. black and white) image classification. Although there has been a great deal of work on attacking models for colored and grayscale images, little is known about attacks on models for binary images. Models trained to classify binary images are used in text recognition applications such as check processing, license plate recognition, and invoice processing, among many others. In contrast to colored and grayscale images, the search space of attacks on binary images is extremely restricted and noise cannot be hidden with minor perturbations in each pixel. Thus, the optimization landscape of attacks on binary images introduces new fundamental challenges. In this paper we introduce a new attack algorithm called SCAR, designed to fool classifiers of binary images. We show that SCAR significantly outperforms existing L0 attacks applied to the binary setting and use it to demonstrate the vulnerability of real-world text recognition systems. SCAR's strong performance in practice contrasts with hardness results showing the existence of worst-case classifiers for binary images that are robust to large perturbations. In many cases, altering a single pixel is sufficient to trick Tesseract, a popular open-source text recognition system, into misclassifying a word as a different word in the English dictionary. We also demonstrate the vulnerability of check recognition by fooling commercial check processing systems used by major US banks for mobile deposits. These systems are substantially harder to fool since they independently classify both the amount written in digits and the amount written in words. Nevertheless, we generalize SCAR to design attacks that fool state-of-the-art check processing systems using unnoticeable perturbations that lead to misclassification of deposit amounts. Consequently, these attacks constitute a powerful method for performing financial fraud.

1. INTRODUCTION

In this paper we study adversarial attacks on models designed to classify binary (i.e. black and white) images. Models for binary image classification are heavily used across a variety of applications that include receipt processing, passport recognition, check processing, and license plate recognition, just to name a few. In such applications, the text recognition system typically binarizes the input image (e.g. check processing (Jayadevan et al., 2012), document extraction (Gupta et al., 2007)) and trains a model to classify binary images.

In recent years there has been an overwhelming interest in understanding the vulnerabilities of AI systems. In particular, a great deal of work has designed attacks on image classification models (e.g. (Szegedy et al., 2013; Goodfellow et al., 2014; Moosavi-Dezfooli et al., 2016; Kurakin et al., 2016; Papernot et al., 2016; Madry et al., 2017; Carlini & Wagner, 2017; Chen et al., 2017; Ilyas et al., 2018a; b; Tu et al., 2019; Guo et al., 2019; Li et al., 2019)). Such attacks distort images in a manner that is virtually imperceptible to the human eye and yet cause state-of-the-art models to misclassify these images. Although there has been a great deal of work on attacking image classification models, these attacks are designed for colored and grayscale images. These attacks hide the noise in the distorted images by making minor perturbations in the color values of each pixel. Somewhat surprisingly, when it comes to binary images, the vulnerability of state-of-the-art models is poorly understood. In contrast to colored and grayscale images, the search space of attacks on binary images is extremely restricted and noise cannot be hidden with minor perturbations of color values in each pixel.
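To make the binary restriction concrete, consider a generic L0-style attack in this setting: since each pixel can only take the values 0 or 1, the attacker's only move is to flip whole pixels, and the goal is to find the fewest flips that change the model's prediction. The sketch below shows a minimal greedy baseline of this kind; it is an illustration under assumed interfaces (a `class_prob` function returning class probabilities), not the SCAR algorithm itself. The toy two-class "classifier" at the bottom is likewise hypothetical, included only so the example runs end to end.

```python
import numpy as np

def greedy_flip_attack(image, class_prob, true_label, max_flips=12):
    """Greedy L0-style attack on a binary image.

    At each step, flip the single pixel (0 <-> 1) that most reduces the
    classifier's confidence in `true_label`, stopping as soon as the
    image is misclassified or the flip budget is exhausted.
    `class_prob(img)` is assumed to return a probability vector.
    Illustrative sketch only -- not the SCAR algorithm from the paper.
    """
    img = image.copy()
    for _ in range(max_flips):
        if np.argmax(class_prob(img)) != true_label:
            return img                      # attack succeeded
        base = class_prob(img)[true_label]
        best_drop, best_idx = 0.0, None
        for idx in np.ndindex(img.shape):   # try every single-pixel flip
            img[idx] ^= 1
            drop = base - class_prob(img)[true_label]
            img[idx] ^= 1                   # undo the trial flip
            if drop > best_drop:
                best_drop, best_idx = drop, idx
        if best_idx is None:
            break                           # no single flip helps
        img[best_idx] ^= 1                  # commit the best flip
    return img

# Hypothetical toy classifier: class scores are the pixel sums of the
# left and right halves of a 4x4 image, passed through a softmax.
def class_prob(img):
    s = np.array([img[:, :2].sum(), img[:, 2:].sum()], dtype=float)
    e = np.exp(s - s.max())
    return e / e.sum()

image = np.zeros((4, 4), dtype=int)
image[:, :2] = 1                            # left half on -> class 0
adv = greedy_flip_attack(image, class_prob, true_label=0)
```

Every trial flip changes the image by a full intensity step, which is exactly why such perturbations cannot be hidden the way small color perturbations can in the grayscale or colored setting.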

