LEARNING ITERATIVE NEURAL OPTIMIZERS FOR IMAGE STEGANOGRAPHY

Abstract

Image steganography is the process of concealing secret information in images through imperceptible changes. Recent work has formulated this task as a classic constrained optimization problem. In this paper, we argue that image steganography is inherently performed on the (elusive) manifold of natural images, and propose an iterative neural network trained to perform the optimization steps. In contrast to classical optimization methods like L-BFGS or projected gradient descent, we train the neural network to also stay close to the manifold of natural images throughout the optimization. We show that our learned neural optimization is faster and more reliable than classical optimization approaches. In comparison to previous state-of-the-art encoder-decoder based steganography methods, it reduces the recovery error rate by multiple orders of magnitude and achieves zero error up to 3 bits per pixel (bpp) without the need for error-correcting codes.

1. INTRODUCTION

Image steganography aims to alter a cover image so that it imperceptibly hides a (secret) bit string, such that a previously defined decoding method can then extract the message from the altered image. Steganography has been used in many applications, such as digital watermarking to establish ownership (Cox et al., 2007), copyright certification (Bilal et al., 2014), anonymized image sharing (Kishore et al., 2021), and hiding information coupled with images (e.g., patient name and ID in medical CT scans (Srinivasan et al., 2004)). Following the success of neural networks on various tasks, prior work has used end-to-end encoder-decoder networks for steganography (Zhu et al., 2018; Zhang et al., 2019; Hayes & Danezis, 2017a; Baluja, 2017). The encoder takes as input an image X and a message M, and produces a steganographic image X̃ that is visually indistinguishable from the original image X. The decoder recovers the message M from the steganographic image X̃. Such methods can hide large amounts of data in images with great image quality, due to convolutional networks' ability to generate realistic outputs that are close to the input image along the manifold of natural images (Zhu et al., 2016; Zhang et al., 2019). However, these approaches can only reliably encode 2 bits per pixel. At higher bit rates, they suffer from poor recovery error rates for the message M (around 5-25% at 4 bpp). Consequently, they cannot be used for steganography applications that require the message to be recovered with 100% accuracy, for example, if the message is encrypted or hashed. Although error-correcting codes (Crandall, 1998; Munuera, 2007) can be used to correct sporadic errors, their reliance on additional parity bits reduces the payload, negating many of the advantages of neural approaches.
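
To make the trade-off between recovery errors and parity bits concrete, the following is a minimal back-of-the-envelope sketch, not part of the method. It assumes independent bit errors and uses a Hamming(7,4) code purely as an illustrative choice; the cited works may use different codes, and the function names are hypothetical.

```python
from math import comb


def effective_bpp(raw_bpp: float, k: int, n: int) -> float:
    """Payload left after coding: only k of every n embedded bits carry data."""
    return raw_bpp * k / n


def block_failure_prob(n: int, t: int, p: float) -> float:
    """Probability that an n-bit block has more than t bit errors,
    assuming independent errors with per-bit error probability p."""
    correctable = sum(
        comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(t + 1)
    )
    return 1.0 - correctable


def exact_recovery_prob(num_bits: int, p: float) -> float:
    """Probability that an uncoded num_bits message is recovered with no error,
    under the same independence assumption."""
    return (1.0 - p) ** num_bits


if __name__ == "__main__":
    # Hamming(7,4): 4 data bits per 7 embedded bits, corrects 1 error per block.
    print("effective bpp at 4 bpp raw:", effective_bpp(4.0, k=4, n=7))
    # 5% is the lower end of the error range quoted above for 4 bpp.
    print("block failure prob at 5% error:", block_failure_prob(7, 1, 0.05))
    print("P(exact recovery), 1000 uncoded bits:", exact_recovery_prob(1000, 0.05))
```

Under these assumptions, the code cuts a raw 4 bpp payload to roughly 2.3 bpp, yet a few percent of blocks still contain more errors than the code can correct, which illustrates why a nonzero raw error rate is costly when exact recovery is required.
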
Recent work has abandoned learned encoders altogether and reformulated image steganography as a constrained optimization problem (Kishore et al., 2021), where the steganographic image is optimized with respect to the outputs of a fixed (random or pre-trained) decoder by employing a technique based on adversarial image perturbations (Szegedy et al., 2013). The optimization problem is solved with off-the-shelf gradient-based optimizers, such as projected gradient descent (Carlini & Wagner, 2017) or L-BFGS (Fletcher, 2013). Such approaches achieve low error rates with high payloads (2-3% error at 4 bpp), but are slow and prone to getting stuck in local minima. Further,

