RGI: ROBUST GAN-INVERSION FOR MASK-FREE IMAGE INPAINTING AND UNSUPERVISED PIXEL-WISE ANOMALY DETECTION

Abstract

Generative adversarial networks (GANs), trained on a large-scale image dataset, can be a good approximator of the natural image manifold. GAN-inversion, using a pre-trained generator as a deep generative prior, is a promising tool for image restoration under corruptions. However, the performance of GAN-inversion can be limited by a lack of robustness to unknown gross corruptions, i.e., the restored image might easily deviate from the ground truth. In this paper, we propose a Robust GAN-inversion (RGI) method with a provable robustness guarantee to achieve image restoration under unknown gross corruptions, where a small fraction of pixels are completely corrupted. Under mild assumptions, we show that the restored image and the identified corrupted region mask converge asymptotically to the ground truth. Moreover, we extend RGI to Relaxed-RGI (R-RGI) for generator fine-tuning to mitigate the gap between the GAN learned manifold and the true image manifold while avoiding trivial overfitting to the corrupted input image, which further improves the image restoration and corrupted region mask identification performance. The proposed RGI/R-RGI method unifies two important applications with state-of-the-art (SOTA) performance: (i) mask-free semantic inpainting, where the corruptions are unknown missing regions and the restored background can be used to restore the missing content; and (ii) unsupervised pixel-wise anomaly detection, where the corruptions are unknown anomalous regions and the retrieved mask can be used as the anomalous region's segmentation mask.

1. INTRODUCTION

When trained on large-scale natural image datasets, a GAN (Goodfellow et al., 2020) is a good approximator of the underlying true image manifold. It captures rich knowledge of natural images and can serve as an image prior. Recently, exploiting the prior learned by GANs has shown impressive results in various tasks, including image restoration (Yeh et al., 2017; Pan et al., 2021; Gu et al., 2020) and unsupervised anomaly detection (Schlegl et al., 2017; Xia et al., 2022b). In those applications, the GAN learns a deep generative image prior (DGP) to approximate the underlying true image manifold. Then, for any input image, GAN-inversion (Zhu et al., 2016) is used to search for the nearest image on the learned manifold, i.e., to recover the $d$-dimensional latent vector $\hat{z}$ by

$$\hat{z} = \arg\min_{z \in \mathbb{R}^d} L_{rec}(x, G(z)), \tag{1}$$

where $G(\cdot)$ is the pre-trained generator, $x$ is the input image, and $L_{rec}(\cdot, \cdot)$ is the loss function measuring the distance between $x$ and the restored image $\hat{x} = G(\hat{z})$, such as the $\ell_1$- or $\ell_2$-norm distance, the perceptual loss (Johnson et al., 2016), or combinations thereof. However, this approach may fail when $x$ is grossly corrupted by unknown corruptions, i.e., when a small fraction of pixels are completely corrupted with unknown locations and magnitudes. For example, in semantic image inpainting (Yeh et al., 2017), where the corruptions are unknown missing regions, a pre-configured segmentation mask of the missing regions is needed to exclude their influence on the optimization procedure. Otherwise, the restored image will easily deviate from the ground truth image (Figure 1). As another example, in unsupervised anomaly detection (Schlegl et al., 2017), the anomalies naturally occur as unknown gross corruptions and the residual between the input image and the restored image, i.e., $x - G(\hat{z})$, is adopted as the anomaly segmentation mask; such a deviation will deteriorate the segmentation performance.
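The failure mode described above can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a toy linear generator $G(z) = Wz$ as a hypothetical stand-in for a pre-trained GAN, a squared $\ell_2$ loss for $L_{rec}$, and additive corruption on six pixels. The names `W`, `invert`, and `weights` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, latent_dim = 64, 4

# Hypothetical stand-in for a pre-trained generator: G(z) = W z.
W = rng.normal(size=(n_pixels, latent_dim))

def G(z):
    return W @ z

def invert(x, weights=None, steps=500):
    """Gradient descent over z on 0.5 * ||weights * (x - G(z))||^2.

    `weights` is a binary per-pixel vector; 0 excludes a pixel from the loss.
    """
    if weights is None:
        weights = np.ones_like(x)
    # Step size 1 / L, where L = ||W||_2^2 bounds the gradient's Lipschitz constant.
    step = 1.0 / np.linalg.norm(W, 2) ** 2
    z = np.zeros(latent_dim)
    for _ in range(steps):
        residual = weights * (x - G(z))
        z = z + step * (W.T @ residual)  # negative gradient direction
    return z

# Clean image lying exactly on the toy manifold.
z_true = rng.normal(size=latent_dim)
x_clean = G(z_true)

# Gross corruption: a few pixels shifted by a large amount (assumed setup).
mask = np.zeros(n_pixels)
mask[rng.choice(n_pixels, size=6, replace=False)] = 1.0
x_corrupt = x_clean + 10.0 * mask

# Distance of each restoration from the clean ground truth.
err_clean = np.linalg.norm(G(invert(x_clean)) - x_clean)
err_corrupt = np.linalg.norm(G(invert(x_corrupt)) - x_clean)
err_masked = np.linalg.norm(G(invert(x_corrupt, weights=1.0 - mask)) - x_clean)

print(f"clean input:               {err_clean:.2e}")
print(f"corrupted input, no mask:  {err_corrupt:.2e}")
print(f"corrupted input, known mask: {err_masked:.2e}")
```

In this toy setting, the unmasked inversion of the corrupted input drifts away from the clean image, while inversion with the known mask stays close to it. That known mask is precisely what is unavailable in the mask-free setting considered in this paper.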
However, the assumption of knowing a pre-configured corrupted region mask can be strong (for semantic inpainting) or even invalid (for unsupervised anomaly detection). Therefore, improving the robustness of GAN-inversion under unknown gross corruptions is important.

Another problem is the GAN approximation gap between the GAN learned image manifold and the true image manifold, i.e., even without corruptions, the restored image $\hat{x}$ from Equation 1 can contain significant mismatches to the input image $x$. This limits the performance of GAN-based methods for semantic inpainting and, especially, for unsupervised anomaly detection, since any mismatch between the restored image and the input image will be counted towards the anomaly score. When a segmentation mask of the corrupted region is known, such an approximation gap can be mitigated by fine-tuning the generator (Pan et al., 2021). However, adopting such a technique under unknown gross corruptions can trivially overfit the corrupted image and fail at restoration. Therefore, mitigating the GAN approximation gap under unknown gross corruptions is important.

To address these issues, we propose an RGI method and further generalize it to R-RGI. For any corrupted input image, the proposed method can simultaneously restore the corresponding clean image and extract the corrupted region mask. The main contributions of the proposed method are as follows.

Methodologically, RGI improves the robustness of GAN-inversion in the presence of unknown gross corruptions. We further prove that, under mild assumptions, (i) the RGI restored image (and identified mask) asymptotically converges to the true clean image (and the true binary mask of the corrupted region) (Theorems 1 and 2); (ii) in addition to asymptotic results, for a properly selected tuning parameter, the true mask of the corrupted region is given by simply thresholding the RGI identified mask (Theorem 2). Moreover, we generalize the RGI method to R-RGI for meaningful generator fine-tuning to mitigate the approximation gap under unknown gross corruptions.

Practically, (i) for mask-free semantic inpainting, where the corruptions are unknown missing regions, the restored background can be used to restore the missing content; (ii) for unsupervised pixel-wise anomaly detection, where the corruptions are unknown anomalous regions, the retrieved mask can be used as the anomalous region's segmentation mask. The RGI/R-RGI method unifies these two important tasks and achieves SOTA performance in both.

GAN-inversion (Xia et al., 2022a) aims to project any given image to the latent space of a pre-trained generator. The inverted latent code can be used for various downstream tasks, including GAN-based image editing (Wang et al., 2022a), restoration (Pan et al., 2021), and so on. GAN-inversion can be categorized into learning-based, optimization-based, and hybrid methods. The



Figure 1: Inverting a corrupted test image from the Stanford Cars dataset (Krause et al., 2013): (i) the GAN-inversion restored background can deviate significantly from the ground truth, whereas the RGI method achieves robust background restoration under unknown gross corruptions; (ii) due to the GAN approximation gap, the true clean image may not lie on the GAN learned image manifold; R-RGI can further fine-tune the learned manifold toward the true image manifold.

