MISSO: MINIMIZATION BY INCREMENTAL STOCHASTIC SURROGATE OPTIMIZATION FOR LARGE SCALE NONCONVEX AND NONSMOOTH PROBLEMS

Abstract

Many constrained, nonconvex and nonsmooth optimization problems can be tackled using the majorization-minimization (MM) method, which alternates between constructing a surrogate function that upper bounds the objective function and minimizing this surrogate. For problems that minimize a finite sum of functions, a stochastic version of the MM method selects a batch of functions at random at each iteration and optimizes the accumulated surrogate. However, in many cases of interest, such as variational inference for latent variable models, the surrogate functions are expressed as an expectation. In this contribution, we propose a doubly stochastic MM method based on Monte Carlo approximation of these stochastic surrogates. We establish asymptotic and non-asymptotic convergence of our scheme in a constrained, nonconvex, nonsmooth optimization setting. We apply our new framework to inference for a logistic regression model with missing data and to variational inference for Bayesian variants of LeNet-5 and ResNet-18 on the MNIST and CIFAR-10 datasets, respectively.

1. INTRODUCTION

We consider the constrained minimization problem of a finite sum of functions:

$$\min_{\theta \in \Theta} L(\theta) := \frac{1}{n} \sum_{i=1}^{n} L_i(\theta), \tag{1}$$

where $\Theta$ is a convex, compact, and closed subset of $\mathbb{R}^p$, and for any $i \in \llbracket 1, n \rrbracket$, the function $L_i : \mathbb{R}^p \to \mathbb{R}$ is bounded from below and is (possibly) nonconvex and nonsmooth.

To tackle the optimization problem (1), a popular approach is to apply the majorization-minimization (MM) method, which iteratively minimizes a majorizing surrogate function. A large number of existing procedures fall into this general framework, for instance gradient-based or proximal methods, the Expectation-Maximization (EM) algorithm (McLachlan & Krishnan, 2008), and some variational Bayes inference techniques (Jordan et al., 1999); see for example (Razaviyayn et al., 2013) and (Lange, 2016) and the references therein.

When the number of terms $n$ in (1) is large, the vanilla MM method may be intractable because it requires constructing a surrogate function for all $n$ terms $L_i$ at each iteration. A remedy is to apply the Minimization by Incremental Surrogate Optimization (MISO) method proposed by Mairal (2015), where the surrogate functions are updated incrementally. The MISO method can be interpreted as a combination of MM and ideas that have emerged for variance reduction in stochastic gradient methods (Schmidt et al., 2017). An extended analysis of MISO has been proposed in (Qian et al., 2019). The success of the MISO method rests upon the efficient minimization of surrogates such as convex functions, see (Mairal, 2015, Section 2.3). A notable application of MISO-like algorithms is described in (Mensch et al., 2017), where the authors build upon the stochastic majorization-minimization framework of Mairal (2015) to introduce a method for sparse matrix factorization.

However, in many applications of interest, the natural surrogate functions are intractable, although they can be expressed as expectations of tractable functions.
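To make the incremental surrogate update concrete, the following is a minimal sketch of a MISO-style iteration; it is an illustration under stated assumptions, not the implementation of Mairal (2015). It assumes a smooth, convex least-squares toy problem (the setting of (1) is more general), quadratic upper-bound surrogates built from a common gradient Lipschitz constant, and a Euclidean ball standing in for $\Theta$. The names `miso_least_squares`, `project_ball`, `loss`, `radius`, and `lip` are hypothetical and chosen for illustration only. Under these assumptions the average of the stored surrogates is an isotropic quadratic, so its minimizer over the ball is the projection of the mean of the per-term surrogate minimizers.

```python
import numpy as np


def project_ball(x, radius):
    """Euclidean projection onto {x : ||x|| <= radius} (assumed stand-in for Theta)."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)


def loss(theta, A, b):
    """Toy objective: L(theta) = (1/n) * sum_i 0.5 * (a_i^T theta - b_i)^2."""
    return 0.5 * np.mean((A @ theta - b) ** 2)


def miso_least_squares(A, b, radius=10.0, n_iters=2000, seed=0):
    """MISO-style sketch with quadratic surrogates on the toy least-squares problem."""
    rng = np.random.default_rng(seed)
    n, p = A.shape
    # Each L_i has a ||a_i||^2-Lipschitz gradient; use the largest constant for all terms.
    lip = np.max(np.sum(A ** 2, axis=1))

    def grad_i(i, theta):
        # Gradient of L_i(theta) = 0.5 * (a_i^T theta - b_i)^2.
        return (A[i] @ theta - b[i]) * A[i]

    # The surrogate of term i anchored at kappa is
    #   L_i(kappa) + <grad L_i(kappa), theta - kappa> + (lip / 2) * ||theta - kappa||^2,
    # whose unconstrained minimizer is z_i = kappa - grad L_i(kappa) / lip.
    theta = np.zeros(p)
    z = np.stack([theta - grad_i(i, theta) / lip for i in range(n)])
    z_mean = z.mean(axis=0)
    theta = project_ball(z_mean, radius)

    for _ in range(n_iters):
        i = rng.integers(n)                    # pick one term uniformly at random
        z_new = theta - grad_i(i, theta) / lip  # re-anchor only surrogate i at theta
        z_mean = z_mean + (z_new - z[i]) / n    # update the running mean in O(p)
        z[i] = z_new
        # Minimize the averaged surrogate over the ball.
        theta = project_ball(z_mean, radius)
    return theta
```

Each iteration refreshes the surrogate of a single term and reuses the stale surrogates of the others, so the per-iteration update cost in this sketch does not grow with $n$, at the price of storing one vector per term.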
For instance, such expectation-based surrogates arise in maximum likelihood inference for latent variable models (McLachlan & Krishnan, 2008). Another application is

