PERFECTLY SECURE STEGANOGRAPHY USING MINIMUM ENTROPY COUPLING

Abstract

Steganography is the practice of encoding secret information into innocuous content in such a manner that an adversarial third party would not realize that there is hidden meaning. While this problem has classically been studied in security literature, recent advances in generative models have led to a shared interest among security and machine learning researchers in developing scalable steganography techniques. In this work, we show that a steganography procedure is perfectly secure under Cachin's (1998) information-theoretic model of steganography if and only if it is induced by a coupling. Furthermore, we show that, among perfectly secure procedures, a procedure is maximally efficient if and only if it is induced by a minimum entropy coupling. These insights yield what are, to the best of our knowledge, the first steganography algorithms to achieve perfect security guarantees with non-trivial efficiency; additionally, these algorithms are highly scalable. To provide empirical validation, we compare a minimum entropy coupling-based approach to three modern baselines (arithmetic coding, Meteor, and adaptive dynamic grouping) using GPT-2, WaveRNN, and Image Transformer as communication channels. We find that the minimum entropy coupling-based approach achieves superior encoding efficiency, despite its stronger security constraints. In aggregate, these results suggest that it may be natural to view information-theoretic steganography through the lens of minimum entropy coupling.

1. INTRODUCTION

In steganography (Blum & Hopper, 2004; Cachin, 2004), the goal, informally speaking, is to encode a plaintext message into another form of content (called stegotext) such that it appears similar enough to innocuous content (called covertext) that an adversary would not realize that there is hidden meaning. Because steganographic procedures hide the existence of sensitive communication altogether, they provide a complementary kind of security to that of cryptographic methods, which only hide the contents of the sensitive communication, not the fact that it is occurring.

In this work, we consider the information-theoretic model of steganography introduced by Cachin (1998). In this model, the exact distribution of covertext is assumed to be known to all parties. Security is defined in terms of the KL divergence between the distribution of covertext and the distribution of stegotext. A procedure is said to be perfectly secure if it guarantees a divergence of zero. Perfect security is a very strong notion of security, as it renders detection by statistical or human analysis impossible. To the best of our knowledge, the only existing algorithms that achieve both perfect security and non-trivial efficiency are limited to specific distributions of covertext, such as the uniform distribution, which limits their applicability.

* Equal contribution

The main contribution of this work is formalizing a relationship between perfect security and couplings of distributions, that is, joint distributions that marginalize to prespecified marginal distributions. We provide two results characterizing this relationship. First, we show that a steganographic procedure is perfectly secure if and only if it is induced by couplings between the distribution of ciphertext (an encoded form of plaintext that can be made to look uniformly random) and the distribution of covertext.
Second, we show that, among perfectly secure procedures, a procedure is maximally efficient if and only if it is induced by couplings whose joint entropy is minimal, that is, minimum entropy couplings (MECs) (Kovačević et al., 2015). While minimum entropy coupling is an NP-hard problem, there exist O(N log N) approximation algorithms (Kocaoglu et al., 2017; Cicalese et al., 2019; Rossi, 2019) that are suboptimal (in terms of joint entropy) by no more than one bit, while retaining exact marginalization guarantees. Furthermore, Sokota et al. (2022) recently introduced an iterative minimum entropy coupling approach, which we call iMEC, that iteratively applies these approximation procedures to construct couplings between one uniform distribution and one autoregressively specified distribution, both having arbitrarily large supports, while still retaining marginalization guarantees. We show that, because ciphertext can be made to look uniformly random, and any distribution of covertext can be specified autoregressively, iMEC can be leveraged to perform steganography with arbitrary covertext distributions and plaintext messages. To the best of our knowledge, this represents the first instance of a steganography algorithm with non-trivial efficiency and perfect security guarantees that scales to arbitrary distributions of covertext.

In our experiments, we evaluate iMEC using GPT-2 (Radford et al., 2019) (a language model), WaveRNN (Kalchbrenner et al., 2018) (an audio model), and Image Transformer (an image model) as covertext distributions. We compare against arithmetic coding (Ziegler et al., 2019), Meteor (Kaptchuk et al., 2021), and adaptive dynamic grouping (ADG) (Zhang et al., 2021), other recent methods for performing information-theoretic steganography with deep generative models. To examine empirical security, we estimate the KL divergence between the stegotext and the covertext for each method.
For iMEC, we find that the KL divergence is on the order of numerical precision, in agreement with our theoretical guarantees. In contrast, arithmetic coding, Meteor, and ADG yield KL divergences many orders of magnitude larger, reflecting their weaker security guarantees. To examine encoding efficiency, we measure the number of bits transmitted per step. We find that iMEC is generally more efficient than arithmetic coding, Meteor, and ADG, despite its stricter constraints.

In summary, our theoretical results show that minimum entropy coupling-based approaches are the most efficient perfectly secure approaches, and our empirical results show that minimum entropy coupling-based approaches can be more efficient than less secure alternatives. In aggregate, we believe that these findings suggest that it may be natural to view information-theoretic steganography through the perspective of minimum entropy coupling.
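
To make the role of couplings concrete, the following toy sketch may help. It is not the iMEC implementation evaluated in this paper; it is a simple greedy heuristic in the spirit of the approximation algorithms cited above, and every name in it (greedy_coupling, marginals, encode, posterior, the example distributions) is a hypothetical choice made for illustration. It couples a uniform ciphertext distribution with a small covertext distribution, checks that the joint marginalizes exactly to both (the property that gives zero KL divergence between stegotext and covertext), and shows how a sender could sample a token from the conditional while a receiver computes a posterior over ciphertexts.

```python
import heapq
import math
import random
from fractions import Fraction as F


def greedy_coupling(p, q):
    """Greedily couple marginals p and q (lists of Fractions summing to 1).

    Repeatedly pairs the largest remaining mass of p with the largest
    remaining mass of q and moves the smaller of the two into the joint.
    Returns a sparse joint distribution {(i, j): mass}.
    """
    # heapq is a min-heap, so masses are negated to pop the largest first.
    hp = [(-m, i) for i, m in enumerate(p) if m > 0]
    hq = [(-m, j) for j, m in enumerate(q) if m > 0]
    heapq.heapify(hp)
    heapq.heapify(hq)
    joint = {}
    while hp and hq:
        mp, i = heapq.heappop(hp)
        mq, j = heapq.heappop(hq)
        mp, mq = -mp, -mq
        m = min(mp, mq)
        joint[(i, j)] = joint.get((i, j), F(0)) + m
        # Push back whatever mass is left on either side.
        if mp > m:
            heapq.heappush(hp, (-(mp - m), i))
        if mq > m:
            heapq.heappush(hq, (-(mq - m), j))
    return joint


def marginals(joint, n, m):
    """Row and column sums of a sparse joint over an n-by-m support."""
    rows = [sum((v for (i, _), v in joint.items() if i == a), F(0)) for a in range(n)]
    cols = [sum((v for (_, j), v in joint.items() if j == b), F(0)) for b in range(m)]
    return rows, cols


def entropy_bits(masses):
    """Shannon entropy in bits of an iterable of probability masses."""
    return -sum(float(v) * math.log2(float(v)) for v in masses if v > 0)


def encode(joint, x, rng):
    """Sender: sample a covertext token from the conditional given ciphertext x."""
    row = {j: v for (i, j), v in joint.items() if i == x}
    tokens = list(row)
    return rng.choices(tokens, weights=[float(row[j]) for j in tokens])[0]


def posterior(joint, c):
    """Receiver: posterior over ciphertext values given the observed token c."""
    col = {i: v for (i, j), v in joint.items() if j == c}
    total = sum(col.values(), F(0))
    return {i: v / total for i, v in col.items()}


# Assumed toy setup: 2 bits of uniform ciphertext, a 3-token covertext distribution.
p = [F(1, 4)] * 4
q = [F(1, 2), F(1, 4), F(1, 4)]
joint = greedy_coupling(p, q)
p_marginal, q_marginal = marginals(joint, len(p), len(q))
joint_entropy = entropy_bits(joint.values())
```

Exact rational arithmetic is used here so that the marginalization check is an equality rather than a floating-point approximation. In this single-step toy, one token generally cannot identify the ciphertext uniquely; the receiver only obtains a posterior, which is why an iterative scheme over many tokens is needed in practice.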

2. BACKGROUND

In the first half of this section, we review the information-theoretic model of steganography introduced by Cachin (1998). In the second half, we review couplings and minimum entropy couplings (Kovačević et al., 2015).

2.1. AN INFORMATION-THEORETIC MODEL FOR STEGANOGRAPHY

Problem Setting. The objects involved in information-theoretic steganography can be divided into two classes: those which are externally specified and those which require algorithmic specification. Each class contains three objects. The externally specified objects include the distribution over plaintext messages M, the distribution over covertext C, and the random source generator.
• The distribution over plaintext messages M may be known by the adversary, but is not known by the sender or the receiver. However, the sender and receiver are aware of the domain M over which

