PROJECTIVE PROXIMAL GRADIENT DESCENT FOR A CLASS OF NONCONVEX NONSMOOTH OPTIMIZATION PROBLEMS: FAST CONVERGENCE WITHOUT KURDYKA-ŁOJASIEWICZ (KŁ) PROPERTY

Abstract

Nonconvex and nonsmooth optimization problems are important and challenging for statistics and machine learning. In this paper, we propose Projected Proximal Gradient Descent (PPGD), which solves a class of nonconvex and nonsmooth optimization problems where the nonconvexity and nonsmoothness come from a nonsmooth regularization term that is nonconvex but piecewise convex. In contrast with existing convergence analyses of accelerated PGD methods for nonconvex and nonsmooth problems based on the Kurdyka-Łojasiewicz (KŁ) property, we provide a new theoretical analysis showing local fast convergence of PPGD. It is proved that, under mild assumptions, PPGD achieves a fast convergence rate of $\mathcal{O}(1/k^2)$ on a class of nonconvex and nonsmooth problems once the iteration number $k \ge k_0$ for a finite $k_0$, which matches Nesterov's optimal convergence rate for first-order methods on smooth and convex objective functions with Lipschitz continuous gradient. Experimental results demonstrate the effectiveness of PPGD.

1. INTRODUCTION

Nonconvex and nonsmooth optimization problems are challenging and have received considerable attention in statistics and machine learning (Bolte et al., 2014; Ochs et al., 2015). In this paper, we consider fast optimization algorithms for a class of nonconvex and nonsmooth problems of the form
$$\min_{x \in \mathbb{R}^d} F(x) = g(x) + h(x), \tag{1}$$
where $g$ is convex and $h(x) = \sum_{j=1}^{d} h_j(x_j)$ is a separable regularizer in which each $h_j$ is piecewise convex. Piecewise convex functions are defined in Definition 1.1. For simplicity of analysis we let $h_j = f$ for all $j \in [d]$, where $f$ is a piecewise convex function. Here $[d]$ is the set of natural numbers between $1$ and $d$ inclusively. $f$ can be either nonconvex or convex, and all the results in this paper extend straightforwardly to the case where the $\{h_j\}$ are different.

Definition 1.1. A univariate function $f \colon \mathbb{R} \to \mathbb{R}$ is piecewise convex if $f$ is lower semicontinuous and there exist intervals $\{R_m\}_{m=1}^{M}$ such that $\mathbb{R} = \bigcup_{m=1}^{M} R_m$ and $f$ restricted to $R_m$ is convex for each $m \in [M]$. The left and right endpoints of $R_m$ are denoted by $q_{m-1}$ and $q_m$ for all $m \in [M]$, where $\{q_m\}_{m=0}^{M}$ are the endpoints satisfying $q_0 = -\infty \le q_1 < q_2 < \ldots < q_M = +\infty$. Furthermore, $f$ is either left continuous or right continuous at each endpoint $q_m$ for $m \in [M-1]$. The intervals $\{R_m\}_{m=1}^{M}$ are also referred to as convex pieces throughout this paper.

It is important to note that for all $m \in [M-1]$, when $f$ is continuous at the endpoint $q_m$ or $f$ is only left continuous at $q_m$, we have $q_m \in R_m$ and $q_m \notin R_{m+1}$. If $f$ is only right continuous at $q_m$, then $q_m \notin R_m$ and $q_m \in R_{m+1}$. This ensures that every point in $\mathbb{R}$ lies in exactly one convex piece. When $M = 1$, $f$ is a convex function on $\mathbb{R}$ and problem (1) is a convex problem. We consider $M > 1$ throughout this paper, and our proposed algorithm trivially extends to the case $M = 1$. We also allow the special case that an interval $R_m = \{q_m\}$ for some $m \in [M-1]$ is a single-point set, in which case $q_{m-1} = q_m$.

Example 1.1. (1) The indicator penalty function $f(x) = \lambda \mathbb{1}_{\{x < \tau\}}$ is piecewise convex with $R_1 = (-\infty, \tau)$, $R_2 = [\tau, \infty)$. (2) The capped-$\ell_1$ penalty function $f(x) = f(x; \lambda, b) = \lambda \min\{|x|, b\}$ is piecewise convex with $R_1 = (-\infty, -b]$, $R_2 = [-b, b]$, $R_3 = [b, \infty)$. (3) The leaky capped-$\ell_1$ penalty function (Wangni & Lin, 2017) $f(x) = \lambda \min\{|x|, b\} + \beta \mathbb{1}_{\{|x| \ge b\}} |x - b|$ with $R_1 = (-\infty, -b]$, $R_2 = [-b, b]$, $R_3 = [b, \infty)$. The three functions are illustrated in Figure 1. While not illustrated, $f(x) = \mathbb{1}_{\{x \neq 0\}}$ for the $\ell_0$-norm with $h(x) = \|x\|_0$ is also piecewise convex.
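To make Definition 1.1 and Example 1.1 concrete, below is a minimal Python sketch (not from the paper) of the three piecewise convex penalties and the separable regularizer $h$ from problem (1). The hyperparameter values `lam`, `tau`, `b`, and `beta` are illustrative, and the leaky capped-$\ell_1$ is implemented under the symmetric reading $\beta(|x| - b)$ on $\{|x| \ge b\}$, which is an assumption about the intended formula.

```python
import numpy as np

def indicator_penalty(x, lam=1.0, tau=0.5):
    # f(x) = lam * 1I{x < tau}, with convex pieces R1 = (-inf, tau), R2 = [tau, inf).
    x = np.asarray(x, dtype=float)
    return lam * (x < tau)

def capped_l1(x, lam=1.0, b=1.0):
    # f(x) = lam * min(|x|, b), with convex pieces (-inf, -b], [-b, b], [b, inf).
    x = np.asarray(x, dtype=float)
    return lam * np.minimum(np.abs(x), b)

def leaky_capped_l1(x, lam=1.0, b=1.0, beta=0.2):
    # f(x) = lam * min(|x|, b) + beta * 1I{|x| >= b} * (|x| - b) (Wangni & Lin, 2017),
    # reading |x - b| in the text as the symmetric |x| - b on {|x| >= b} (assumption).
    x = np.asarray(x, dtype=float)
    return lam * np.minimum(np.abs(x), b) + beta * (np.abs(x) >= b) * (np.abs(x) - b)

def h(x, f=capped_l1):
    # Separable regularizer h(x) = sum_j f(x_j) from problem (1).
    return float(np.sum(f(x)))

print(h([0.3, -2.0, 1.5]))  # capped-l1: 0.3 + 1.0 + 1.0 = 2.3
```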

1.1. MAIN ASSUMPTION

The main assumption of this paper is that $g$ and $h$ satisfy the following conditions.

Assumption 1 (Main Assumption).
(a) $g$ is convex with $L_g$-smooth gradient, that is, $\|\nabla g(x) - \nabla g(y)\|_2 \le L_g \|x - y\|_2$ for all $x, y \in \mathbb{R}^d$. $F$ is coercive, that is, $F(x) \to \infty$ when $\|x\|_2 \to \infty$, and $\inf_{x \in \mathbb{R}^d} F(x) > -\infty$.
(b) $f \colon \mathbb{R} \to \mathbb{R}$ is a piecewise convex function and lower semicontinuous. Furthermore, there exists a small positive constant $s_0 < R_0$ such that $f$ is differentiable on $(q_m - s_0, q_m)$ and $(q_m, q_m + s_0)$ for all $m \in [M-1]$.
(c) The proximal mapping $\operatorname{prox}_{s f_m}$ has a closed-form solution for all $m \in [M]$, where $\operatorname{prox}_{s f_m}(x) := \operatorname{argmin}_{v \in \mathbb{R}} \frac{1}{2s}(v - x)^2 + f_m(v)$.
(d) $f$ has "negative curvature" at each endpoint $q_m$ where $f$ is continuous, for all $m \in [M-1]$; that is, $\lim_{x \to q_m^-} f'(x) > \lim_{x \to q_m^+} f'(x)$. We define $C := \min_{m \in [M-1]:\, f \text{ is continuous at } q_m} \big( \lim_{x \to q_m^-} f'(x) - \lim_{x \to q_m^+} f'(x) \big) > 0$. In addition, $f$ has bounded Fréchet subdifferential, that is, $\sup_{x \in \bigcup_{m=1}^{M} R_m^o} \sup_{v \in \partial f(x)} \|v\|_2 \le F_0$ for some absolute constant $F_0 > 0$, where $R^o$ denotes the interior of an interval $R$. The Fréchet subdifferential is formally defined in Section 4. It is noted that on each $R_m^o$, the convex subdifferential of $f$ coincides with its Fréchet subdifferential.

We define the minimum jump value of $f$ at the endpoints where $f$ is not continuous by
$$\min\left\{ \min_{m \in [M-1]:\, f \text{ is only right continuous at } q_m} \left( \lim_{y \to q_m^-} f(y) - f(q_m) \right),\; \min_{m \in [M-1]:\, f \text{ is only left continuous at } q_m} \left( \lim_{y \to q_m^+} f(y) - f(q_m) \right) \right\}. \tag{3}$$
For instance, the indicator penalty in Example 1.1 is only right continuous at $q_1 = \tau$, and its jump value there is $\lim_{y \to \tau^-} f(y) - f(\tau) = \lambda$.



Figure 1: Illustration of three piecewise convex functions.

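As an illustration of Assumption 1(c), the sketch below computes $\operatorname{prox}_{s f_m}$ in closed form for the capped-$\ell_1$ penalty of Example 1.1. It assumes $f_m$ is the natural convex extension of $f$ restricted to $R_m$ (the paper's formal definition of $f_m$ is outside this excerpt), so the constant outer pieces give an identity prox and the middle piece $f_2(v) = \lambda |v|$ gives soft thresholding; $\lambda$, $b$, and the step size $s$ are illustrative.

```python
import numpy as np

# Capped-l1 pieces (assuming f_m is the convex extension of f restricted to R_m):
#   R1 = (-inf, -b] and R3 = [b, inf): f_m(v) == lam * b is constant,
#   R2 = [-b, b]:                      f_2(v) = lam * |v|.

def prox_constant_piece(x, s, lam, b):
    # argmin_v (1/(2s)) * (v - x)^2 + lam * b is attained at v = x (identity).
    return x

def prox_abs_piece(x, s, lam, b):
    # argmin_v (1/(2s)) * (v - x)^2 + lam * |v| is the soft-thresholding operator.
    return np.sign(x) * np.maximum(np.abs(x) - s * lam, 0.0)

# Assumption 1(d) for capped-l1, by direct computation:
#   at q = -b: lim f' from the left = 0   > lim f' from the right = -lam,
#   at q = +b: lim f' from the left = lam > lim f' from the right = 0,
# so the negative-curvature constant is C = lam.

lam, b, s = 1.0, 1.0, 0.5
print(prox_abs_piece(0.8, s, lam, b))       # 0.3: shrinks toward zero by s * lam
print(prox_constant_piece(2.0, s, lam, b))  # 2.0: identity on the constant pieces
```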

