AN OPTIMAL TRANSPORT PERSPECTIVE ON UNPAIRED IMAGE SUPER-RESOLUTION

Anonymous

Abstract

Real-world image super-resolution (SR) tasks often lack paired datasets, which limits the application of supervised techniques. As a result, such tasks are usually approached with unpaired techniques based on Generative Adversarial Networks (GANs), which yield complex training losses with several regularization terms, e.g., content or identity losses. We theoretically investigate the optimization problems which arise in such models and make two surprising observations. First, the learned SR map is always an optimal transport (OT) map. Second, we theoretically prove and empirically show that the learned map is biased, i.e., it does not actually transform the distribution of low-resolution images to that of high-resolution ones. Inspired by these findings, we propose an algorithm for unpaired SR which learns an unbiased OT map for a perceptual transport cost. Unlike the existing GAN-based alternatives, our algorithm has a simple optimization objective, reducing the need for complex hyperparameter selection and additional regularizations. At the same time, it provides nearly state-of-the-art performance on the large-scale unpaired AIM19 dataset.

1. INTRODUCTION

The problem of image super-resolution (SR) is to reconstruct a high-resolution (HR) image from its low-resolution (LR) counterpart. In many modern deep learning approaches, SR networks are trained in a supervised manner on synthetic datasets containing LR-HR pairs (Lim et al., 2017, §4.1; Zhang et al., 2018b, §4.1). For example, it is common to create LR images from HR ones by simple downscaling, e.g., bicubic (Ledig et al., 2017, §3.2). However, such an artificial setup barely represents the practical setting, in which the degradation is more sophisticated and unknown (Maeda, 2020). This obstacle motivates the development of methods capable of learning SR maps from unpaired data without assuming a prescribed degradation.

Contributions. We study the unpaired image SR task and its solutions based on Generative Adversarial Networks (GANs, Goodfellow et al., 2014), and analyse them from the Optimal Transport (OT, see Villani, 2008) perspective.

1. We investigate GAN optimization objectives regularized with content losses, which are common in unpaired image SR methods (§5, §4). We prove that the solution to such objectives is always an optimal transport map. We theoretically and empirically show that such maps are biased (§7.1), i.e., they do not transform the LR image distribution to the true HR image distribution.

2. We provide an algorithm to fit an unbiased OT map for a perceptual transport cost (§6.1) and apply it to the unpaired image SR problem (§7.2). We establish connections between our algorithm and regularized GANs that use integral probability metrics (IPMs) as a loss (§6.2). Our algorithm solves a minimax optimization objective and does not require an extensive hyperparameter search, which makes it different from the existing methods for unpaired image SR. At the same time, it provides nearly state-of-the-art performance on the unpaired image SR problem (§7.2).

Notation.
We use X, Y to denote Polish spaces and P(X), P(Y) to denote the respective sets of probability distributions on them. We denote by Π(P, Q) the set of probability distributions on X × Y with marginals P and Q. For a measurable map T : X → Y, we denote the associated push-forward operator by T#. The expression ∥·∥ denotes the usual Euclidean norm if not stated otherwise. We denote the space of Q-integrable functions on Y by L¹(Q).
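As a quick numerical aside (not from the paper), the push-forward T#P can be approximated empirically by applying T to samples from P; for an affine map in 1-D, T#P is known in closed form, which makes the behaviour easy to check. The map and distributions below are hypothetical, chosen only for illustration:

```python
import numpy as np

# Minimal 1-D illustration of the push-forward operator T#:
# if x ~ P and T is measurable, then T(x) is a sample from T#P.
rng = np.random.default_rng(0)

x = rng.normal(loc=0.0, scale=1.0, size=100_000)  # samples from P = N(0, 1)
T = lambda t: 2.0 * t + 3.0                       # affine map T
y = T(x)                                          # samples from T#P = N(3, 4)

print(round(y.mean(), 2), round(y.std(), 2))      # close to 3.0 and 2.0
```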

2. UNPAIRED IMAGE SUPER-RESOLUTION TASK

In this section, we formalize the unpaired image super-resolution task that we consider (Figure 2).

Figure 2: The task of super-resolution we consider.

Let P and Q be two distributions of LR and HR images on spaces X and Y, respectively. We assume that P is obtained from Q via some unknown degradation. The learner has access to unpaired random samples from P and Q. The task is to fit a map T : X → Y satisfying T#P = Q which inverts the degradation.

We highlight that the image SR task is theoretically ill-posed for two reasons.

1. Non-existence. The degradation filter may be non-injective and, consequently, non-invertible. This is a theoretical obstacle to learning one-to-one SR maps T.

2. Ambiguity. There may exist multiple maps satisfying T#P = Q but only one inverting the degradation. With no prior knowledge about the correspondence between P and Q, it is unclear how to pick this particular map.

The first issue is usually not taken into account in practice: most existing paired and unpaired SR methods learn one-to-one SR maps T, see (Ledig et al., 2017; Lai et al., 2017; Wei et al., 2021). The second issue is typically softened by regularizing the model with a content loss. In the real world, it is reasonable to assume that an HR image and its corresponding LR image are close. Thus, the fitted SR map T is expected to only slightly change the input image. Formally, one may require the learned map T to attain a small value of

R_c(T) def= ∫_X c(x, T(x)) dP(x),     (1)

where c : X × Y → R+ is a function estimating how different the inputs are. The most popular example is the ℓ1 identity loss, i.e., formulation (1) for X = Y = R^D and c(x, y) = ∥x − y∥₁. More broadly, losses R_c(T) are typically called content losses and are incorporated into the training objectives of SR methods (Lugmayr et al., 2019a, §3.4; Kim et al., 2020, §3) and other unpaired tasks besides SR (Taigman et al., 2016, §4; Zhu et al., 2017, §5.2) as regularizers. They stimulate the learned map T to minimally change the image content.
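For concreteness, the content regularizer with the ℓ1 identity cost can be estimated by Monte Carlo on a mini-batch of samples from P. The sketch below uses NumPy and illustrative names as a stand-in for a real training pipeline (the toy map that shifts every pixel is an assumption for demonstration):

```python
import numpy as np

# Monte-Carlo sketch of the content regularizer
#     R_c(T) = ∫ c(x, T(x)) dP(x)
# with the l1 identity cost c(x, y) = ||x - y||_1 on X = Y = R^D.
def l1_content_loss(batch_x: np.ndarray, batch_Tx: np.ndarray) -> float:
    """Estimate R_c(T) over a batch of flattened images, c = l1 norm."""
    assert batch_x.shape == batch_Tx.shape
    return float(np.abs(batch_x - batch_Tx).sum(axis=1).mean())

rng = np.random.default_rng(0)
x = rng.random((8, 64))        # batch of 8 flattened "images", D = 64
Tx = x + 0.1                   # a toy map shifting every pixel by 0.1
print(l1_content_loss(x, Tx))  # ≈ 64 * 0.1 = 6.4
```

A map close to the identity yields a small value of this estimate, which is exactly why such a term is used to discourage T from drastically altering the image content.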

3. BACKGROUND ON OPTIMAL TRANSPORT

In this section, we give the key concepts of the OT theory (Villani, 2008) that we use in our paper.

Primal form. For two distributions P ∈ P(X) and Q ∈ P(Y) and a transport cost c : X × Y → R, Monge's primal formulation of the optimal transport cost is

Cost(P, Q) def= inf_{T#P=Q} ∫_X c(x, T(x)) dP(x),     (2)

where the minimum is taken over the measurable functions (transport maps) T : X → Y that map P to Q, see Figure 3a. The optimal T* is called the optimal transport map. Note that (2) is not symmetric, and this formulation does not allow mass splitting, i.e., for some P, Q there may be no map T that satisfies T#P = Q. Thus, (Kantorovitch, 1958) proposed the relaxation

Cost(P, Q) def= inf_{π∈Π(P,Q)} ∫_{X×Y} c(x, y) dπ(x, y),     (3)

where the minimum is taken over the transport plans π, i.e., the measures on X × Y whose marginals are P and Q (Figure 3b). The optimal π* ∈ Π(P, Q) is called the optimal transport plan.

Figure 1: Super-resolution of a squirrel using Bicubic upsample, OTS (ours) and DASR (Wei et al., 2021) methods (4×4 upsample, 370×800 crops).
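For small discrete distributions, the Kantorovich relaxation is a finite linear program over the entries of the plan π and can be solved exactly. The sketch below is purely illustrative (a toy two-point instance with a quadratic cost, solved with SciPy's `linprog`) and is not the algorithm proposed in this paper:

```python
import numpy as np
from scipy.optimize import linprog

# Kantorovich OT between two discrete distributions as a linear program:
#     minimize <C, pi>  s.t.  pi >= 0, row sums = p, column sums = q.
p = np.array([0.7, 0.3])                  # P on points xs = [0, 1]
q = np.array([0.5, 0.5])                  # Q on points ys = [1, 2]
xs = np.array([0.0, 1.0])
ys = np.array([1.0, 2.0])
C = (xs[:, None] - ys[None, :]) ** 2      # cost c(x, y) = (x - y)^2

n, m = len(p), len(q)
A_eq = np.zeros((n + m, n * m))
for i in range(n):
    A_eq[i, i * m:(i + 1) * m] = 1.0      # row sums: sum_j pi[i, j] = p[i]
for j in range(m):
    A_eq[n + j, j::m] = 1.0               # col sums: sum_i pi[i, j] = q[j]
b_eq = np.concatenate([p, q])

res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
pi = res.x.reshape(n, m)                  # optimal transport plan
print(round(res.fun, 6))                  # optimal cost: 1.6
```

In 1-D with a convex cost, the optimal plan is the monotone (sorted) matching, which is exactly what the linear program recovers here: mass from the left-most source point is routed to the left-most target points first.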

