LARGE SCALE IMAGE COMPLETION VIA CO-MODULATED GENERATIVE ADVERSARIAL NETWORKS

Abstract

Numerous task-specific variants of conditional generative adversarial networks have been developed for image completion. Yet a serious limitation remains: all existing algorithms tend to fail when handling large-scale missing regions. To overcome this challenge, we propose a generic new approach that bridges the gap between image-conditional and recent modulated unconditional generative architectures via co-modulation of both conditional and stochastic style representations. Also, due to the lack of good quantitative metrics for image completion, we propose the new Paired/Unpaired Inception Discriminative Score (P-IDS/U-IDS), which robustly measures the perceptual fidelity of inpainted images relative to real images via linear separability in a feature space. Experiments demonstrate superior performance over state-of-the-art methods in terms of both quality and diversity in free-form image completion, as well as easy generalization to image-to-image translation.
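The linear-separability idea behind P-IDS/U-IDS can be sketched in a few lines: extract features for real and inpainted images, fit a linear classifier to tell them apart, and report how often it fails (U-IDS) or how often a fake is scored as more "real" than its paired real image (P-IDS). The sketch below is a minimal illustration under stated assumptions: random Gaussians stand in for Inception features, and a least-squares linear classifier stands in for the linear SVM; all variable names are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 64  # illustrative sample count and feature dimension

# Stand-ins for Inception features of real vs. inpainted images
# (a real evaluation would extract these from an Inception network).
feats_real = rng.normal(0.0, 1.0, size=(n, d))
feats_fake = rng.normal(0.8, 1.0, size=(n, d))

X = np.concatenate([feats_real, feats_fake])
Xb = np.concatenate([X, np.ones((2 * n, 1))], axis=1)  # append bias term
y = np.concatenate([np.ones(n), -np.ones(n)])          # +1 real, -1 fake

# Least-squares linear classifier (the paper uses a linear SVM;
# any linear decision function illustrates the separability idea).
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
scores = Xb @ w

# U-IDS: classifier error rate; 0.5 means real and fake features
# are linearly inseparable, i.e. perceptually indistinguishable.
u_ids = 0.5 * (np.mean(scores[:n] < 0) + np.mean(scores[n:] > 0))

# P-IDS: fraction of fakes scored as more "real" than their paired
# real image (pairing here is simply by index).
p_ids = np.mean(scores[n:] > scores[:n])
print(f"U-IDS={u_ids:.3f}  P-IDS={p_ids:.3f}")
```

Higher values of either score indicate fakes that are harder to distinguish from real images; a perfect generator would drive both toward 0.5.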

1. INTRODUCTION

Generative adversarial networks (GANs) have received a great amount of attention in the past few years, during which a fundamental divide has emerged between the development of image-conditional and unconditional GANs. Image-conditional GANs power a wide variety of computer vision applications (Isola et al., 2017). Since vanilla U-Net-like generators cannot achieve promising performance, especially in free-form image completion (Liu et al., 2018; Yu et al., 2019), a multiplicity of task-specific approaches have been proposed to specialize GAN frameworks, mostly focused on hand-engineered multi-stage architectures, specialized operations, or intermediate structures such as edges or contours (Altinel et al., 2018; Ding et al., 2018; Iizuka et al., 2017; Jiang et al., 2019; Lahiri et al., 2020; Li et al., 2020; Liu et al., 2018; 2019a; 2020; Nazeri et al., 2019; Ren et al., 2019; Wang et al., 2018; Xie et al., 2019; Xiong et al., 2019; Yan et al., 2018; Yu et al., 2018; 2019; Zeng et al., 2019; Zhao et al., 2020a; Zhou et al., 2020). These lines of work have made significant progress in reducing generated artifacts such as color discrepancy and blurriness. However, a serious challenge remains: all existing algorithms tend to fail when handling large-scale missing regions. This is mainly due to their lack of underlying generative capability: a model can never learn to complete a large proportion of an object unless it is capable of generating an entirely new one. We argue that the key to overcoming this challenge is to bridge the gap between image-conditional and unconditional generative architectures.
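The bridge proposed in the abstract, co-modulation, can be sketched in miniature: the style vector that modulates the decoder is produced jointly from a learned mapping of a random latent (the stochastic, unconditional branch) and features of the input image (the conditional branch). The numpy sketch below is purely illustrative; the dimensions, the single-layer mapping, and all weight matrices are our stand-ins, not the architecture's actual components.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative dimensions (not the paper's actual sizes).
z_dim, w_dim, e_dim, s_dim = 512, 512, 512, 512

# Random stand-ins for learned weights.
W_map = rng.normal(size=(w_dim, z_dim)) / np.sqrt(z_dim)   # mapping network (one layer here)
W_enc = rng.normal(size=(e_dim, e_dim)) / np.sqrt(e_dim)   # projection of encoder features
A = rng.normal(size=(s_dim, w_dim + e_dim)) / np.sqrt(w_dim + e_dim)  # joint affine

def co_modulated_style(z, enc_feat):
    """Fuse the stochastic branch (mapped latent) with the conditional
    branch (encoded image features) into one style vector, which would
    then modulate a decoder convolution layer."""
    w = np.tanh(W_map @ z)           # stochastic style from random latent
    e = np.tanh(W_enc @ enc_feat)    # conditional style from image features
    return A @ np.concatenate([w, e])

z = rng.normal(size=z_dim)           # random latent: source of diversity
enc_feat = rng.normal(size=e_dim)    # stand-in for features of the masked input image
style = co_modulated_style(z, enc_feat)
print(style.shape)  # (512,)
```

Sampling different latents `z` for the same masked input yields different style vectors, which is what lets a co-modulated generator produce diverse completions of the same large hole while staying consistent with the observed pixels.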

