ENJOY YOUR EDITING: CONTROLLABLE GANS FOR IMAGE EDITING VIA LATENT SPACE NAVIGATION

Abstract

Controllable semantic image editing enables a user to change entire image attributes with a few clicks, e.g., gradually making a summer scene look like it was taken in winter. Classic approaches for this task use a Generative Adversarial Net (GAN) to learn a latent space and suitable latent-space transformations. However, current approaches often suffer from entangled attribute edits, global changes to image identity, and diminished photo-realism. To address these concerns, we learn multiple attribute transformations simultaneously, integrate attribute regression into the training of the transformation functions, and apply a content loss and an adversarial loss that encourage the preservation of image identity and photo-realism. Unlike prior work, which primarily relies on qualitative evaluation, we propose quantitative evaluation strategies for measuring controllable editing performance. Our model permits better control for both single- and multiple-attribute editing while preserving image identity and realism during transformation. We provide empirical results for both natural and synthetic images, highlighting that our model achieves state-of-the-art performance for targeted image manipulation.

1. INTRODUCTION

Semantic image editing is the task of transforming a source image into a target image while modifying desired semantic attributes, e.g., making an image taken during summer look like it was captured in winter. The ability to semantically edit images is useful for various real-world tasks, including artistic visualization, design, photo enhancement, and targeted data augmentation. To this end, semantic image editing has two primary goals: (i) providing continuous manipulation of multiple attributes simultaneously and (ii) maintaining the original image's identity as much as possible while ensuring photo-realism. Existing GAN-based approaches for semantic image editing can be roughly categorized into two groups. (i) Image-space editing methods directly transform one image into another across domains (Choi et al., 2018; 2020; Isola et al., 2017; Lee et al., 2020; Wu et al., 2019; Zhu et al., 2017a;b), usually using variants of generative adversarial nets (GANs) (Goodfellow et al., 2014). These approaches often have high computational cost, and they primarily focus on binary (on/off) attribute changes rather than providing continuous attribute editing abilities. (ii) Latent-space editing methods focus on discovering latent-variable manipulations that permit continuous semantic image edits. The chosen latent space is most often the latent space of GANs. Both unsupervised and (self-)supervised latent-space editing methods have been proposed. Unsupervised latent-space editing methods (Härkönen et al., 2020; Voynov & Babenko, 2020) are often less effective at providing semantically meaningful directions and all too often change image identity during an edit. Current (self-)supervised methods (Jahanian et al., 2019; Plumerault et al., 2020) are limited to geometric edits such as rotation and scale. To our knowledge, only one supervised approach has been proposed (Shen et al., 2019), which discovers semantic latent-space directions for binary attributes.
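The latent-space editing idea described above can be made concrete with a minimal sketch: an edit moves a latent code along a learned direction, z' = z + alpha * d, and the edited image is G(z'). The snippet below is an illustrative toy, not the method of this paper: a random unit vector stands in for a learned attribute direction, and a simple projection onto that direction stands in for an attribute classifier run on generated images.

```python
import numpy as np

# Toy setup (hypothetical): in a real system, d would be a learned
# attribute direction in the latent space of a trained GAN generator G.
rng = np.random.default_rng(0)
latent_dim = 512

z = rng.standard_normal(latent_dim)   # latent code of the source image
d = rng.standard_normal(latent_dim)
d /= np.linalg.norm(d)                # unit-norm attribute direction

def edit(z, d, alpha):
    """Move the latent code along direction d with edit strength alpha."""
    return z + alpha * d

def attribute_score(z, d):
    """Toy attribute scorer: projection onto d. A supervised method would
    instead score the generated image G(z) with an attribute classifier."""
    return float(z @ d)

# Increasing alpha increases the attribute score monotonically,
# which is what enables continuous (rather than binary) editing.
scores = [attribute_score(edit(z, d, a), d) for a in (0.0, 1.0, 2.0, 3.0)]
assert all(s1 < s2 for s1, s2 in zip(scores, scores[1:]))
```

Because d has unit norm, each step of alpha shifts the projection by exactly alpha; the entanglement problem arises when a single direction d also changes attributes other than the intended one.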
As we show, this method suffers from entangled attributes and often does not preserve image identity during manipulation.

Contributions. We propose a latent-space editing framework for semantic image manipulation that fulfills the aforementioned primary goals. Specifically, we use a GAN and employ a joint sampling strategy trained to edit multiple attributes simultaneously. To disentangle attribute transformations

