LANDSCAPE LEARNING FOR NEURAL NETWORK INVERSION

Abstract

Many machine learning methods operate by inverting a neural network at inference time, which has become a popular technique for solving inverse problems in computer vision, robotics, and graphics. However, these methods often involve gradient descent through a highly non-convex loss landscape, causing the optimization process to be unstable and slow. We introduce a method that learns a loss landscape where gradient descent is efficient, substantially accelerating and stabilizing the inversion process. We demonstrate this advantage on a number of methods for both generative and discriminative tasks, including GAN inversion, adversarial defense, and 3D human pose reconstruction.

1. INTRODUCTION

Many inference problems in machine learning are formulated as inverting a forward model F(x) by optimizing an objective over the input space x. This approach, which we term optimization-based inference (OBI), has traditionally been used to solve a range of inverse problems in vision, graphics, robotics, recommendation systems, and security (Hernandez et al., 2008; Lee & Kuo, 1993; Domke, 2012; Brakel et al., 2013; Stoyanov et al., 2011; Cremer et al., 2019). Recently, neural networks have emerged as the parameterization of choice for forward models (Loper et al., 2015; Pavlakos et al., 2019; Abdal et al., 2019; Menon et al., 2020; Yu et al., 2021; Wang et al., 2019; Chen et al., 2021a; Zhang et al., 2021), which can be pretrained on large collections of data and inverted at test time in order to solve inference queries.

OBI has a number of advantages over feed-forward or encoder-based inference (EBI). Since there is no encoder, OBI provides the flexibility to adapt to new tasks, allowing one to incorporate new constraints into the objective during inference. When observations are partially missing, OBI can adapt without additional training. Moreover, OBI naturally supports generating multiple, diverse hypotheses when there is uncertainty. Finally, OBI has intrinsic advantages for robustness, both in adapting to new data distributions and in defending against adversarial examples.

However, the key bottleneck for OBI in practice is computational efficiency and the speed of inference. Feed-forward models are fast because they require only a single forward pass of a neural network, but OBI requires many (often hundreds of) steps of optimization to obtain strong results for a single example. Forward models in OBI are often trained on generative or discriminative tasks, but they are not trained for the purpose of performing gradient descent in the input space. Fig. 8 visualizes the loss landscape for uncurated examples.
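To make the setup concrete, the basic OBI loop can be written in a few lines. The following is a minimal, self-contained sketch; the forward model here is a hypothetical toy two-layer network with random weights (not one of the pretrained models discussed above), and the learning rate and step count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
# toy "pretrained" forward model F(x) = W2 @ tanh(W1 @ x), weights frozen
W1 = rng.normal(size=(8, 3)) / np.sqrt(3)
W2 = rng.normal(size=(4, 8)) / np.sqrt(8)
y = rng.normal(size=4)                    # the observation we want to invert

def loss_and_grad(x):
    h = np.tanh(W1 @ x)
    r = W2 @ h - y                        # residual F(x) - y
    # chain rule through the tanh layer: dL/dx for L = ||F(x) - y||^2
    g = W1.T @ ((W2.T @ (2.0 * r)) * (1.0 - h ** 2))
    return float(r @ r), g

x = np.zeros(3)                           # initialization in input space
losses = []
for _ in range(500):                      # many steps: the OBI bottleneck
    loss, g = loss_and_grad(x)
    losses.append(loss)
    x -= 0.005 * g                        # plain gradient descent on x
```

The loop illustrates why inference cost scales with the number of descent steps: every step requires a forward and backward pass through the (frozen) model.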
The loss landscape is not guaranteed to provide an efficient path from the initialization to the solution, causing instability and inefficiency.

In this paper, we propose a framework to accelerate and stabilize the inversion of forward neural networks. Instead of optimizing over the original input space, we learn a new input space such that gradient descent converges quickly. Our approach uses an alternating algorithm to learn the mapping between these spaces. The first step collects optimization trajectories in the new space until convergence. The second step updates the mapping parameters to reduce the distance between the convergence point and each point along the trajectory. By repeating these steps, our approach learns a mapping between the spaces that allows gradient descent over the input to converge in significantly fewer steps.

Empirical experiments and visualizations on both generative and discriminative models show that our method can significantly improve the convergence speed of optimization. We validate our approach on a diverse set of computer vision tasks, including GAN inversion (Abdal et al., 2019), adversarial defense (Mao et al., 2021), and 3D human pose reconstruction (Pavlakos et al., 2019). Our experiments show that our method converges an order of magnitude faster without loss in absolute performance after convergence. Since our approach does not require retraining the forward model, it is compatible with all existing OBI methods that have a differentiable forward model and objective function.

The primary contribution of this paper is an efficient optimization-based inference framework. In Sec. 2, we survey the related literature to provide an overview of the forward model inversion problem. In Sec. 3, we formally define OBI (3.1), present our method for learning an efficient loss landscape (3.2), and describe a training algorithm for better generalization and robustness (3.3). In Sec. 4, we experimentally study and analyze the effectiveness of the mapping network for OBI. We will release all code and models.
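The alternating scheme described above can be sketched as follows. This is a deliberately simplified illustration under strong assumptions — a linear forward model, a linear mapping x = θz, hand-derived gradients, and arbitrary step sizes — whereas the actual method uses a neural mapping network trained by backpropagation:

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.normal(size=(4, 3)) / np.sqrt(3)   # toy linear forward model
theta = np.eye(3)                          # learned mapping: x = theta @ z

def grad_z(z, y):
    # d/dz ||F theta z - y||^2, the inversion objective in the new space
    return 2.0 * theta.T @ F.T @ (F @ (theta @ z) - y)

def collect_trajectory(y, steps=40, lr=0.05):
    # Step 1: run gradient descent in the new space z, recording the path
    z, traj = np.zeros(3), []
    for _ in range(steps):
        traj.append(z.copy())
        z -= lr * grad_z(z, y)
    return traj, z

for _ in range(50):                        # alternate the two steps
    y = F @ rng.normal(size=3)             # a solvable inference query
    traj, z_conv = collect_trajectory(y)
    x_conv = theta @ z_conv                # convergence point, held fixed
    for z_t in traj:
        # Step 2: pull theta(z_t) toward the convergence point;
        # gradient of ||theta z_t - x_conv||^2 wrt theta
        d = theta @ z_t - x_conv
        theta -= 5e-4 * np.outer(2.0 * d, z_t)
```

Step 1 records the optimization path in the new space, and Step 2 performs one gradient step per trajectory point on the distance to the fixed convergence point, so that descent from any point along the path heads more directly toward the solution.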

2. RELATED WORK

The different approaches for inference with a neural network can be partitioned into either encoder-based inference, which is feed-forward, or optimization-based inference, which is iterative. We briefly review these two approaches in the context of our work. In addition, we review representative work in meta-learning and discuss the similarities and differences with our work.

2.1. ENCODER-BASED INFERENCE

Encoder-based inference trains a neural network F to directly map from the output space to the input space. Auto-encoder based approaches (Pidhorskyi et al., 2020) learn an encoder that maps the input data to the latent space. Richardson et al. (2021); Tov et al. (2021); Wei et al. (2021); Perarnau et al. (2016) learn an encoder from the image to the latent space of a GAN. Encoder-based inference requires training the encoder on the anticipated distribution in advance, which is often less effective and can fail on unexpected samples (Dinh et al., 2021; Kang et al., 2021).

2.2. OPTIMIZATION-BASED INFERENCE

OBI methods perform inference by solving an optimization problem with gradient-based methods such as Stochastic Gradient Descent (SGD) (Robbins & Monro, 1951) and Projected Gradient Descent (PGD) (Madry et al., 2017). In these cases, the objective function specifies the inference task. Besides these methods, which use a point estimate of the latent variable, one can also estimate the posterior distribution of the latent variable through Bayesian methods such as SGLD (Welling & Teh, 2011). Gradient-based optimization has been used to infer the latent code of query samples in deep generative models such as GANs (Goodfellow et al., 2014) via GAN inversion (Karras et al., 2020; Jahanian et al., 2019; Shen et al., 2020; Zhu et al., 2016; Abdal et al., 2019; 2020; Bau et al., 2019; Huh et al., 2020; Pan et al., 2021; Yang et al., 2019). Style transfer relies on gradient-based optimization to change the style of input images (Jing et al., 2019). Gradient-based optimization can also create adversarial attacks that fool a classifier (Croce & Hein, 2020; Carlini & Wagner, 2017; Mao et al., 2019; Szegedy et al., 2013). Recently, back-propagation-based optimization has been shown to be effective for defending against adversarial examples (Mao et al., 2021) and for compressed sensing (Bora et al., 2017). Wu et al. (2019) use Model-Agnostic Meta-Learning (MAML) (Finn et al., 2017) to accelerate the optimization process in compressed sensing. Constrained optimization was recently popularized for text-to-image synthesis by Crowson et al. (2022); Liu et al. (2021), who search the latent space to produce an image with the highest similarity to a given text, as measured by a multi-modal similarity model such as CLIP (Radford et al., 2021). Test-time constrained optimization is also related to the idea of "prompt-tuning" for large language models: Lester et al. (2021) learn "soft prompts" through backpropagation to condition frozen language models to perform specific downstream tasks from just a few labeled examples (few-shot learning). A major challenge for optimization-based inference is how to perform efficient optimization in a highly non-convex space (Geiger et al., 2021). To address this, input-convex models (Amos et al., 2017) were proposed so that gradient descent can be performed in a convex space, and Tripp et al. (2020) introduced a method to retrain the generative model such that it learns a latent manifold that is easy to optimize over.
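As a concrete instance of such gradient-based inference over the input space, a minimal PGD-style attack on a hypothetical linear classifier can be written as follows; all weights, the budget, and the step size below are illustrative, not taken from any cited work:

```python
import numpy as np

rng = np.random.default_rng(0)
w, b = rng.normal(size=5), 0.1           # toy linear classifier w @ x + b
x0 = rng.normal(size=5)                  # clean input
label = 1.0                              # its true label, in {-1, +1}
eps, alpha = 0.1, 0.02                   # L-inf budget and step size

x = x0.copy()
for _ in range(20):
    # ascend the loss: gradient of -label * (w @ x + b) with respect to x
    g = -label * w
    x = x + alpha * np.sign(g)           # signed-gradient step (PGD)
    x = x0 + np.clip(x - x0, -eps, eps)  # project back into the eps-ball
```

Each step moves the input in the signed-gradient direction that increases the classifier's loss, then projects back into the L-inf ball of radius eps around the clean input, so the perturbation stays within the attack budget.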

