LEARNING ITERATIVE NEURAL OPTIMIZERS FOR IMAGE STEGANOGRAPHY

Abstract

Image steganography is the process of concealing secret information in images through imperceptible changes. Recent work has formulated this task as a classic constrained optimization problem. In this paper, we argue that image steganography is inherently performed on the (elusive) manifold of natural images, and propose an iterative neural network trained to perform the optimization steps. In contrast to classical optimization methods like L-BFGS or projected gradient descent, we train the neural network to also stay close to the manifold of natural images throughout the optimization. We show that our learned neural optimization is faster and more reliable than classical optimization approaches. In comparison to previous state-of-the-art encoder-decoder based steganography methods, it reduces the recovery error rate by multiple orders of magnitude and achieves zero error up to 3 bits per pixel (bpp) without the need for error-correcting codes.

1. INTRODUCTION

Image steganography aims to imperceptibly alter a cover image to hide a (secret) bit string, such that a previously defined decoding method can extract the message from the altered image. Steganography has been used in many applications, such as digital watermarking to establish ownership (Cox et al., 2007), copyright certification (Bilal et al., 2014), anonymized image sharing (Kishore et al., 2021), and hiding information coupled with images (e.g., patient name and ID in medical CT scans (Srinivasan et al., 2004)). Following the success of neural networks on various tasks, prior work has used end-to-end encoder-decoder networks for steganography (Zhu et al., 2018; Zhang et al., 2019; Hayes & Danezis, 2017a; Baluja, 2017). The encoder takes as input an image X and a message M, and produces a steganographic image X̃ that is visually indistinguishable from the original image X. The decoder recovers the message M from the steganographic image X̃. Such methods can hide large amounts of data with high image quality, owing to convolutional networks' ability to generate realistic outputs that stay close to the input image along the manifold of natural images (Zhu et al., 2016; Zhang et al., 2019). However, these approaches can only reliably encode 2 bits per pixel. At higher bit rates, they suffer from poor recovery error rates for the message M (around 5-25% at 4 bpp). Consequently, they cannot be used for steganography applications that require the message to be recovered with 100% accuracy, for example, if the message is encrypted or hashed. Although error-correcting codes (Crandall, 1998; Munuera, 2007) can correct sporadic mistakes, their reliance on additional parity bits reduces the payload, negating much of the advantage of neural approaches.
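The encoder-decoder interface described above can be sketched with untrained stand-ins. All names, shapes, and the random linear maps below are illustrative assumptions, not the convolutional architectures of the cited works:

```python
import numpy as np

rng = np.random.default_rng(0)
H, W_IMG = 8, 8            # toy grayscale image size
BPP = 1                    # payload in bits per pixel
N_BITS = H * W_IMG * BPP

# Random linear maps stand in for the trained encoder/decoder networks.
A = rng.normal(size=(H * W_IMG, H * W_IMG + N_BITS)) * 0.05
B = rng.normal(size=(N_BITS, H * W_IMG))

def encode(cover, message):
    """Encoder E(X, M): add a message-dependent residual to the cover."""
    residual = A @ np.concatenate([cover.ravel(), message.astype(float)])
    return cover + residual.reshape(H, W_IMG)

def decode(stego):
    """Decoder D(X~): threshold linear projections into a bit string."""
    return (B @ stego.ravel() > 0).astype(int)

cover = rng.random((H, W_IMG))
message = rng.integers(0, 2, size=N_BITS)
stego = encode(cover, message)
recovered = decode(stego)
# High for untrained stand-ins; joint training drives this toward zero.
bit_error_rate = float(np.mean(recovered != message))
```

In the trained setting, both networks are optimized jointly to minimize a combination of message-recovery error and image distortion, so that X̃ stays both decodable and visually close to X.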
Recent work has abandoned learned encoders altogether and reformulated image steganography as a constrained optimization problem (Kishore et al., 2021), where the steganographic image is optimized with respect to the outputs of a fixed (random or pre-trained) decoder, employing a technique based on adversarial image perturbations (Szegedy et al., 2013). The optimization problem is solved with off-the-shelf gradient-based optimizers, such as projected gradient descent (Carlini & Wagner, 2017) or L-BFGS (Fletcher, 2013). Such approaches achieve low error rates with high payloads (2-3% error at 4 bpp), but are slow and prone to getting stuck in local minima. Further,


each pixel is optimized in isolation, and pixel-level constraints only ensure that the steganographic image stays close to the input image according to an algebraic norm, rather than along the natural image manifold. Although similar manifold-unaware approaches are successfully deployed for adversarial attacks, steganography aims to precisely control millions of binary decoder outputs, instead of a single class prediction; the resulting optimization problem is thus harder and prone to producing unnatural-looking images.
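As a concrete toy instance of this optimization view, the sketch below runs projected gradient descent against a fixed random linear decoder; the decoder, dimensions, step size, and L-infinity radius are illustrative assumptions, not the setup of Kishore et al. (2021). Note that the projection enforces only an algebraic norm constraint, with no notion of the natural image manifold:

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_bits = 64, 16

# Fixed random "decoder": sigmoid of a linear map, a toy stand-in for a
# pre-trained decoder network.
W = rng.normal(size=(n_bits, n_pixels)) / np.sqrt(n_pixels)

def decode(x):
    """Per-bit probabilities output by the fixed decoder."""
    return 1.0 / (1.0 + np.exp(-W @ x))

def pgd_hide(cover, message, eps=0.5, lr=0.5, steps=200):
    """Perturb `cover` so the decoder emits `message`, projecting back
    into an L-infinity ball of radius `eps` after every step."""
    x = cover.copy()
    for _ in range(steps):
        p = decode(x)
        grad = W.T @ (p - message)      # gradient of binary cross-entropy
        x = x - lr * grad
        x = cover + np.clip(x - cover, -eps, eps)  # pixel-level projection
    return x

cover = rng.normal(size=n_pixels)
message = rng.integers(0, 2, size=n_bits).astype(float)
stego = pgd_hide(cover, message)
error_rate = float(np.mean((decode(stego) > 0.5) != (message > 0.5)))
```

Each pixel receives its own gradient update, and the clipping step cares only about per-pixel magnitudes, which is exactly the manifold-unaware behavior the paragraph above criticizes.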

[Figure 1: Steganographic image and message decoding error across LISO iterations. The error drops from 50% at Step 0 to 0.46% (Step 1, PSNR 23.50), 0.005% (Step 4, PSNR 28.23), and 0% (Step 8, PSNR 30.16).]

Figure 1 demonstrates how the steganographic image and error rate change over subsequent iterations. We also use a critic network (Zhang et al., 2019) to ensure the changes stay imperceptible and that the steganographic image looks natural. Our resulting architecture can be trained end-to-end. We show that LISO learns a more efficient descent direction than standard (manifold-unaware) optimization algorithms and produces better steganography results with great consistency. Finally, the error rate can (almost) always be driven to a flat 0 if the optimization is finished with a few iterations of L-BFGS within the vicinity of LISO's solution (LISO + L-BFGS).

We evaluate the efficacy of LISO extensively across multiple datasets. We demonstrate that at test time, with unseen cover images and random bit strings, the optimizer can reliably circumvent bad local minima and find a low-error solution within only a few iterative steps that already outperforms all previous encoder-decoder-based approaches. If the optimization is followed by a few additional L-BFGS updates, we can reliably reach 100% error-free recovery even with 3 bits per pixel (bpp) of hidden information across all (thousands of) images we tested on.

Our contributions are as follows. 1) We introduce a novel gradient-based neural optimization algorithm, LISO, that learns preferred descent directions and is image-manifold aware. 2) We show that its optimization process based on learned descent directions is orders of magnitude faster than classical optimization procedures. 3) As far as we know, our variant LISO + L-BFGS is by far the most accurate steganography algorithm in existence, resulting in perfect recovery on all images we tried at 3 bpp and the vast majority at 4 bpp, implying that it avoids bad local minima and can be deployed without error-correcting codes.
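Structurally, one LISO-style optimization pass can be sketched as follows. The update network here is a single untrained random linear layer standing in for the trained convolutional network (a loose assumption on our part), so the snippet illustrates only the control flow of learned iterative updates, not their learned quality:

```python
import numpy as np

rng = np.random.default_rng(1)
n_pixels, n_bits = 64, 16
W = rng.normal(size=(n_bits, n_pixels)) / np.sqrt(n_pixels)  # fixed decoder

def decoding_grad(x, message):
    """Gradient of the decoding (cross-entropy) loss wrt the image."""
    p = 1.0 / (1.0 + np.exp(-W @ x))
    return W.T @ (p - message)

# Untrained stand-in for the learned update network; in LISO proper this
# is a convolutional net trained end-to-end with the decoder and critic.
U = rng.normal(size=(n_pixels, 2 * n_pixels)) * 0.01

def learned_update(x, grad):
    """Map (current image, loss gradient) to an update direction."""
    return U @ np.concatenate([x, grad])

def liso_iterate(cover, message, steps=8):
    """Apply the learned update iteratively, keeping the trajectory."""
    x = cover.copy()
    trajectory = [x]
    for _ in range(steps):
        x = x + learned_update(x, decoding_grad(x, message))
        trajectory.append(x)
    return x, trajectory

cover = rng.normal(size=n_pixels)
message = rng.integers(0, 2, size=n_bits).astype(float)
stego, trajectory = liso_iterate(cover, message)
```

During training, the update network's parameters would be optimized so that each step lowers both the decoding error and a naturalness penalty from the critic; at test time, a handful of such steps replaces the hundreds of iterations a classical optimizer needs.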
The code for LISO is available at https://github.com/cxy1997/LISO.

2. RELATED WORK

Steganography. Classic steganography methods operate directly in the spatial image domain to encode data. Least Significant Bit (LSB) steganography replaces the least significant bits of the cover image pixels, in sequence, with bits from the secret data (Chan & Cheng, 2004). Pixel-Value Differencing (PVD) (Wu & Tsai, 2003) hides data in the differences between the intensity values of pairs of successive pixels. Subsequently, many methods were proposed to hide data while minimizing distortion to the cover image, differing mainly in how that distortion is measured. Highly Undetectable Steganography (HUGO) (Pevnỳ et al., 2010) identifies

