DEEP GENERATIVE MODEL-BASED RATE-DISTORTION FOR IMAGE DOWNSCALING ASSESSMENT

Anonymous authors
Paper under double-blind review

Abstract

In this paper, we propose a novel measure, namely Image Downscaling Assessment by Rate-Distortion (IDA-RD), to quantitatively evaluate image downscaling algorithms. In contrast to image-based methods that measure the quality of downscaled images, ours is process-based and draws on rate-distortion theory to measure the distortion incurred during downscaling. Our main idea is that downscaling and super-resolution (SR) can be viewed as the encoding and decoding processes in the rate-distortion model, respectively, and that a downscaling algorithm that preserves more details in the resulting low-resolution (LR) images should lead to less distorted high-resolution (HR) images in SR. In other words, the distortion should increase as the downscaling algorithm deteriorates. However, it is non-trivial to measure this distortion, as it requires the SR algorithm to be both blind and stochastic. Our key insight is that these requirements can be met by recent SR algorithms based on deep generative models, which can find all matching HR images for a given LR image on their learned image manifolds. Empirically, we first validate our IDA-RD measure with synthetic downscaling algorithms that simulate distortions by adding various types and levels of degradation to the downscaled images. We then test our measure on traditional downscaling algorithms such as bicubic, bilinear, and nearest-neighbor interpolation, as well as state-of-the-art downscaling algorithms such as DPID (Weber et al., 2016), L0-regularized downscaling (Liu et al., 2017), and Perceptual downscaling (Oeztireli & Gross, 2015). Experimental results show the effectiveness of our IDA-RD measure in evaluating image downscaling algorithms.

1. INTRODUCTION

Image downscaling is a fundamental problem in image processing and computer vision. The co-existence of digital devices with different display resolutions, such as smartphones, tablets, and desktop monitors, makes this problem even more important. In contrast to image super-resolution (SR), which aims to "add" information to low-resolution (LR) images, image downscaling algorithms focus on "preserving" the information present in the high-resolution (HR) images, which is particularly important for applications and devices with very limited screen space. Traditional image downscaling algorithms low-pass filter an image before resampling it. While this prevents aliasing in the downscaled LR image, important high-frequency details of the HR image are removed at the same time, resulting in a blurred or overly smooth LR image. To improve the quality of downscaled images, several sophisticated approaches have been proposed recently, including remapping of high-frequency information (Gastal & Oliveira, 2017), optimization of perceptual image quality metrics (Oeztireli & Gross, 2015), L0-regularized priors (Liu et al., 2017), and pixelizing the HR image (Gerstner et al., 2012; Han et al., 2018; Kuang et al., 2021; Shang & Wong, 2021). Nevertheless, research on image downscaling algorithms has slowed down significantly due to the lack of a quantitative measure for evaluating them. Specifically, standard distance measures (e.g., the L1 and L2 norms) and full-reference image quality assessment (IQA) methods are not applicable here due to the absence of ground-truth LR images; existing no-reference IQA (NR-IQA) metrics (Mittal et al., 2012b;a; Bosse et al., 2017) cannot be applied either, as they rely on the "naturalness" of HR images, which is not present in LR images (we verify this in our experiments).
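The traditional "low-pass filter, then resample" pipeline mentioned above can be illustrated with a minimal sketch. The code below is a hypothetical grayscale example (separable Gaussian blur followed by subsampling), not any specific library's or paper's implementation; kernel size and sigma are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """1D Gaussian kernel, normalized to sum to 1 (illustrative parameters)."""
    x = np.arange(size) - size // 2
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def downscale(img, factor=2, sigma=1.0):
    """Low-pass filter (separable Gaussian) then subsample.
    img: 2D float array (grayscale). This anti-aliases the LR result but also
    discards high-frequency detail, hence the blur noted in the text."""
    k = gaussian_kernel(sigma=sigma)
    # separable convolution: filter each row, then each column
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, blurred)
    return blurred[::factor, ::factor]  # keep every `factor`-th pixel
```

The low-pass step is exactly where fine detail is lost; the methods surveyed below try to retain such detail instead.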
In this paper, we propose a new quantitative measure for image downscaling based on Claude Shannon's rate-distortion theory (Berger, 2003), namely Image Downscaling Assessment by Rate-Distortion (IDA-RD). The main idea of our IDA-RD measure is that a superior image downscaling algorithm should retain as much information as possible in the LR image, thereby reducing the distortion when the LR image is upscaled (a.k.a. super-resolved) back to the size of the original HR image. However, designing such an upscaling method is non-trivial, as it must satisfy two challenging requirements: i) blindness, i.e., it must apply to all kinds of downscaling algorithms without knowing them in advance; and ii) stochasticity, i.e., it must be able to generate a manifold of HR images that captures the conditional distribution of the super-resolution process. Our key insight is that both requirements can be satisfied owing to the recent success of deep generative models in blind and stochastic super-resolution. To demonstrate the flexibility of our IDA-RD measure, we show that it can be successfully implemented with two mainstream generative models: Generative Adversarial Networks (Menon et al., 2020) and Normalizing Flows (Lugmayr et al., 2020). Extensive experiments demonstrate the effectiveness of our IDA-RD measure in evaluating image downscaling algorithms. Our contributions include:
• Drawing on Claude Shannon's rate-distortion theory (Berger, 2003), we propose the Image Downscaling Assessment by Rate-Distortion (IDA-RD) measure to quantitatively evaluate image downscaling algorithms, which fills a gap in existing image downscaling research.
• We demonstrate the effectiveness of our IDA-RD measure with extensive experiments on both synthetic and real-world image downscaling algorithms.
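The encode/decode view above can be summarized as: downscale each HR image ("encoding"), draw several stochastic SR reconstructions ("decoding"), and average a distortion between each reconstruction and the original HR image. The sketch below assumes a hypothetical interface (`downscale`, `sr_sample`) and uses per-pixel squared error and toy stand-in functions purely for illustration; the actual measure relies on learned generative SR models such as GANs or normalizing flows, not the toy samplers shown here.

```python
import numpy as np

def ida_rd_distortion(hr_images, downscale, sr_sample, n_samples=8, rng=None):
    """Sketch of a rate-distortion-style estimate (hypothetical interface).
    hr_images: list of 2D arrays; downscale: HR -> LR ("encoder");
    sr_sample: (LR, rng) -> one stochastic HR reconstruction ("decoder").
    Returns mean per-pixel squared error over all images and SR samples."""
    rng = rng or np.random.default_rng(0)
    errs = []
    for hr in hr_images:
        lr = downscale(hr)                  # encoding step
        for _ in range(n_samples):          # sample the SR conditional distribution
            sr = sr_sample(lr, rng)
            errs.append(np.mean((hr - sr) ** 2))
    return float(np.mean(errs))

# Toy stand-ins (NOT the paper's generative SR models):
def toy_downscale(hr, f=2):
    return hr[::f, ::f]                     # nearest-neighbor subsampling

def toy_sr_sample(lr, rng, f=2):
    up = np.kron(lr, np.ones((f, f)))       # nearest-neighbor upsampling
    return up + 0.01 * rng.standard_normal(up.shape)  # crude stochasticity
```

Under this framing, a downscaler that preserves more information yields reconstructions closer to the originals, hence a lower distortion score.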

2. RELATED WORK

Image Downscaling has a long history, and its traditional methods (e.g., bicubic interpolation) have become standard in image processing and computer vision software, making it difficult to trace their origins. Thus, we only review recent attempts at developing better image downscaling algorithms. For example, Gastal & Oliveira (2017) conduct a discrete Gabor frequency analysis and propose to remap the high-frequency information of HR images to the representable range of the downsampled spectrum, thereby preserving high-frequency details during downscaling. Oeztireli & Gross (2015) model image downscaling as an optimization problem and minimize a perceptual metric (SSIM) between the input and downscaled images; however, the limitations of SSIM are also carried over to their approach. DPID (Weber et al., 2016) preserves small details by assigning higher weights to input pixels whose color deviates from their local neighborhood within the convolutional filter.

Image Quality Assessment (IQA) metrics typically compare a distorted image against a reference image. However, such full-reference IQA metrics are not applicable to the evaluation of image downscaling algorithms, as there are no ground-truth LR images for comparison. Thus, most researchers rely on subjective evaluation of downscaled images, which is costly and time-consuming. No-Reference Image Quality Assessment (NR-IQA) addresses IQA in the absence of a reference (i.e., ground-truth) image. For example, Mittal et al. (2012a) propose BRISQUE, an NR-IQA metric that uses natural scene statistics (NSS) to quantify the loss of "naturalness" in distorted images.



Liu et al. (2017) propose an optimization framework using two L0-regularized priors that addresses two issues in image downscaling, i.e., salient feature preservation and downscaled image construction. Image thumbnailing, a special case of image downscaling, has been studied by Sun & Ling (2013). Their two-component thumbnailing framework, named Scale and Object Aware Thumbnailing (SOAT), focuses on saliency measurement and thumbnail cropping. Li et al. (2018) term image downscaling as image Compact Resolution (CR) and address it with a Convolutional Neural Network (CNN). Inspired by the success of CNNs in image super-resolution (SR), they introduce the CNN-CR model for image downscaling, which can be jointly trained with any CNN-SR model. Although their CNN-CR model yields better reconstruction quality than other downscaling algorithms, they only demonstrate results for small downscaling factors (×2), whereas the majority of both image downscaling and super-resolution algorithms focus on larger scaling factors (e.g., ×8). Despite the aforementioned works, there is still no good quantitative measure for evaluating image downscaling methods, which impedes research on them.

