MULTI-DOMAIN IMAGE GENERATION AND TRANSLATION WITH IDENTIFIABILITY GUARANTEES

Abstract

Multi-domain image generation and unpaired image-to-image translation are two important and related computer vision problems. The common technique for the two tasks is learning a joint distribution from multiple marginal distributions. However, it is well known that there can be infinitely many joint distributions that derive the same marginals. Hence, it is necessary to formulate suitable constraints to address this highly ill-posed problem. Inspired by recent advances in nonlinear Independent Component Analysis (ICA) theory, we propose a new method to learn the joint distribution from the marginals by enforcing a specific type of minimal change across domains. We report one of the first results connecting multi-domain generative models to identifiability and show why identifiability is essential and how to achieve it theoretically and practically. We apply our method to five multi-domain image generation and six image-to-image translation tasks. The superior performance of our model supports our theory and demonstrates the effectiveness of our method. The training code is available at https://github.com/Mid-Push/i-stylegan.
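To see why joint distribution learning from marginals is ill-posed, consider a minimal toy example (ours, not from the paper): two different joint distributions over a pair of binary variables that nonetheless produce identical marginals.

```python
import numpy as np

# Two 2x2 joint distributions p(x, y) over binary variables.
# Both yield the same uniform marginals p(x) = p(y) = [0.5, 0.5],
# so the marginals alone cannot identify the joint.
independent = np.array([[0.25, 0.25],
                        [0.25, 0.25]])   # x and y independent
correlated = np.array([[0.5, 0.0],
                       [0.0, 0.5]])      # x and y perfectly correlated

for joint in (independent, correlated):
    # axis=1 sums out y to give p(x); axis=0 sums out x to give p(y)
    print(joint.sum(axis=1), joint.sum(axis=0))
```

Both joints print the marginals `[0.5 0.5] [0.5 0.5]`, yet they encode entirely different cross-domain correspondences; any convex combination of them is yet another valid joint, giving infinitely many solutions.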

1. INTRODUCTION

Multi-domain image generation and unpaired image-to-image translation are two important and closely related problems in computer vision and machine learning. They have many promising applications such as domain adaptation (Liu & Tuzel, 2016; Hoffman et al., 2018; Murez et al., 2018; Wang & Jiang, 2019) and medical analysis (Armanious et al., 2019; 2020; Kong et al., 2021). As shown in Fig. 1, multi-domain image generation takes as input random noise ϵ and a domain label u, and aims to generate image tuples in which the images share the same content, e.g., different facial expressions of the same person. The second task takes as input an image in one domain and a target domain label u, and aims to generate another image that lies in the target domain but shares the same content as the input, e.g., the output image has the same identity as the input image but a different facial expression. Both tasks can be viewed as instantiations of the joint distribution learning problem. A joint distribution of multi-domain images is a probability density function that assigns a density value to each joint occurrence of images in different domains, such as images of the same person with different facial expressions. Once the joint distribution is learned, it can be used to generate meaningful tuples (the first task) and to translate an input image into another domain without content distortion (the second task). If the correspondence across domains is given (e.g., the identity), we can apply supervised approaches to learn the joint distribution easily. However, collecting corresponding data across domains can be prohibitively expensive. For example,



Figure 1: Two tasks.

