LEARNING PROXIMAL OPERATORS TO DISCOVER MULTIPLE OPTIMA

Abstract

Finding multiple solutions of non-convex optimization problems is a ubiquitous yet challenging task. Most past algorithms either apply single-solution optimization methods from multiple random initial guesses or search in the vicinity of found solutions using ad hoc heuristics. We present an end-to-end method to learn the proximal operator of a family of training problems so that multiple local minima can be quickly obtained from initial guesses by iterating the learned operator, emulating the proximal-point algorithm that has fast convergence. The learned proximal operator can be further generalized to recover multiple optima for unseen problems at test time, enabling applications such as object detection. The key ingredient in our formulation is a proximal regularization term, which elevates the convexity of our training loss: by applying recent theoretical results, we show that for weakly-convex objectives with Lipschitz gradients, training of the proximal operator converges globally with a practical degree of over-parameterization. We further present an exhaustive benchmark for multi-solution optimization to demonstrate the effectiveness of our method.
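For reference, the classical proximal-point iteration that the learned operator emulates can be written as follows (a standard textbook definition, not taken verbatim from this paper; $\lambda > 0$ is a step-size parameter):

```latex
% Proximal-point iteration: each step solves a regularized subproblem.
% The quadratic term makes the subproblem convex when f is weakly convex
% and lambda is small enough, which is the convexity-elevation effect
% the abstract refers to.
x_{k+1} \;=\; \operatorname{prox}_{\lambda f}(x_k)
        \;=\; \operatorname*{arg\,min}_{y}\; f(y) + \frac{1}{2\lambda}\,\lVert y - x_k \rVert^2 .
```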

1. INTRODUCTION

Searching for multiple optima of an optimization problem is a ubiquitous yet under-explored task. In applications like low-rank recovery (Ge et al., 2017), topology optimization (Papadopoulos et al., 2021), object detection (Lin et al., 2014), and symmetry detection (Shi et al., 2020), it is desirable to recover multiple near-optimal solutions, either because there are many equally performant global optima or because the optimization objective does not capture user preferences precisely. Even for single-solution non-convex optimization, typical methods search for multiple local optima from random initial guesses before picking the best one. Additionally, it is often desirable to obtain solutions to a family of optimization problems whose parameters, for instance the weight of a regularization term, are not known in advance, without having to restart from scratch.

Formally, we define a multi-solution optimization (MSO) problem to be the minimization $\min_{x \in \mathcal{X}} f_\tau(x)$, where $\tau \in \mathcal{T}$ encodes parameters of the problem, $\mathcal{X}$ is the search space of the variable $x$, and $f_\tau : \mathbb{R}^d \to \mathbb{R}$ is the objective function depending on $\tau$. The goal of MSO is to identify multiple solutions for each $\tau \in \mathcal{T}$, i.e., elements of the set $\{x^* \in \mathcal{X} : f_\tau(x^*) = \min_{x \in \mathcal{X}} f_\tau(x)\}$, which can contain more than one element or even infinitely many. In this work, we assume that $\mathcal{X} \subset \mathbb{R}^d$ is bounded and that $d$ is small, and that $\mathcal{T}$ is, in a loose sense, a continuous space, so that the objective $f_\tau$ changes continuously as $\tau$ varies. To make gradient-based methods viable, we further assume that each $f_\tau$ is differentiable almost everywhere. Since finding all global minima is extremely challenging in general, our realistic goal is to find a diverse set of local minima.
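The random-restart baseline described above can be sketched on a toy MSO family. The example below is hypothetical and not from the paper: for each parameter $\tau > 0$, the objective $f_\tau(x) = (x^2 - \tau)^2$ has two global minima at $x = \pm\sqrt{\tau}$, and plain gradient descent from many random initial guesses recovers both, after deduplicating the points it converges to.

```python
import numpy as np

# Toy MSO family (hypothetical example): f_tau(x) = (x^2 - tau)^2 has two
# global minima at x = +/- sqrt(tau) for every tau > 0.
def grad_f(x, tau):
    return 4.0 * x * (x ** 2 - tau)

def local_minima_from_random_starts(tau, n_starts=50, lr=0.05,
                                    n_iters=500, tol=1e-3, seed=0):
    """Random-restart baseline: run gradient descent from each random
    initial guess, then keep only minima not already found (within tol)."""
    rng = np.random.default_rng(seed)
    found = []
    for x in rng.uniform(-2.0, 2.0, size=n_starts):
        for _ in range(n_iters):
            x = x - lr * grad_f(x, tau)
        if all(abs(x - m) > tol for m in found):
            found.append(float(x))
    return sorted(found)

# Recovers both minima near +/- sqrt(tau) = +/- 1 for tau = 1.
print(local_minima_from_random_starts(tau=1.0))
```

The deduplication step is what distinguishes this from single-solution optimization: many initial guesses collapse onto the same basin, and the diversity of the returned set depends entirely on where the random starts happen to land, which is the inefficiency the learned proximal operator is meant to address.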

