LOCAL SEARCH ALGORITHMS FOR RANK-CONSTRAINED CONVEX OPTIMIZATION

Abstract

We propose greedy and local search algorithms for rank-constrained convex optimization, namely solving min_{rank(A) ≤ r*} R(A) given a convex function R : R^{m×n} → R and a parameter r*. These algorithms consist of repeating two steps: (a) adding a new rank-1 matrix to A and (b) enforcing the rank constraint on A. We refine and improve the theoretical analysis of Shalev-Shwartz et al. (2011), and show that if the rank-restricted condition number of R is κ, a solution A with rank O(r* · min{κ log((R(0) − R(A*))/ε), κ²}) and R(A) ≤ R(A*) + ε can be recovered, where A* is the optimal solution. This significantly generalizes associated results on sparse convex optimization, as well as on rank-constrained convex optimization for smooth functions. We then introduce new practical variants of these algorithms that have superior runtime and recover better solutions in practice. We demonstrate the versatility of these methods on a wide range of applications involving matrix completion and robust principal component analysis.

1. INTRODUCTION

Given a real-valued convex function R : R^{m×n} → R on real matrices and a parameter r* ∈ N, the rank-constrained convex optimization problem consists of finding a matrix A ∈ R^{m×n} that minimizes R(A) among all matrices of rank at most r*:

min_{rank(A) ≤ r*} R(A)

Even though R is convex, the rank constraint makes this problem non-convex. Furthermore, the problem is known to be NP-hard and even hard to approximate (Natarajan (1995); Foster et al. (2015)). In this work, we propose efficient greedy and local search algorithms for this problem. Our contribution is twofold:

1. We provide theoretical analyses that bound the rank and objective value of the solutions returned by the two algorithms in terms of the rank-restricted condition number, which is the natural generalization of the condition number to low-rank subspaces. These results are significantly stronger than previously known bounds for this problem.

2. We experimentally demonstrate that, after careful performance adjustments, the proposed general-purpose greedy and local search algorithms have superior performance to other methods, even compared to some that are tailored to a particular problem. These algorithms can therefore be considered a general tool for rank-constrained convex optimization and a viable alternative to methods that use convex relaxations or alternating minimization.

The rank-restricted condition number

Similarly to the work in sparse convex optimization, a restricted condition number quantity has been introduced as a reasonable assumption on R. If we let ρ⁺_r be the maximum smoothness bound and ρ⁻_r be the minimum strong convexity bound of R along rank-r directions (these are called the rank-restricted smoothness and strong convexity constants, respectively), the rank-restricted condition number is defined as κ_r = ρ⁺_r / ρ⁻_r.
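To make this definition concrete, the following small numerical sketch (our illustration, not part of the paper's experiments) estimates ρ⁺_r and ρ⁻_r by probing random rank-r directions. For the simple quadratic R(A) = ½‖A − M‖²_F the curvature is identical in every direction, so the estimate should give κ_r = 1.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 8, 6, 2
M = rng.standard_normal((m, n))

def R(A):
    # simple quadratic objective; its rank-restricted condition number is 1
    return 0.5 * np.sum((A - M) ** 2)

def grad_R(A):
    return A - M

A = rng.standard_normal((m, n))
ratios = []
for _ in range(100):
    # random rank-r direction D = (m x r) times (r x n)
    D = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
    # second-order growth of R along D, relative to (1/2)||D||_F^2
    curv = R(A + D) - R(A) - np.sum(grad_R(A) * D)
    ratios.append(curv / (0.5 * np.sum(D ** 2)))

rho_plus, rho_minus = max(ratios), min(ratios)
kappa_r = rho_plus / rho_minus   # equals 1 for this quadratic
```

For non-quadratic objectives the same probing idea gives only empirical bounds on ρ⁺_r and ρ⁻_r, not certified constants.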
If this quantity is bounded, one can efficiently find a solution A with R(A) ≤ R(A*) + ε and rank r = O(r* · κ_{r+r*} · R(0)/ε) using a greedy algorithm (Shalev-Shwartz et al. (2011)). However, this is not an ideal bound, since the rank scales linearly with R(0)/ε, which can be particularly high in practice. Inspired by the analogous literature on sparse convex optimization (Natarajan (1995); Shalev-Shwartz et al. (2010); Zhang (2011); Jain et al. (2014), and more recently Axiotis & Sviridenko (2020)), one would hope to achieve a logarithmic dependence, or no dependence at all, on R(0)/ε. In this paper we achieve this goal by providing an improved analysis showing that the greedy algorithm of Shalev-Shwartz et al. (2011) in fact returns a matrix of rank r = O(r* · κ_{r+r*} log(R(0)/ε)). We also provide a new local search algorithm together with an analysis guaranteeing a rank of r = O(r* · κ²_{r+r*}). Apart from significantly improving upon previous work on rank-constrained convex optimization, these results directly generalize a lot of work in sparse convex optimization, e.g. Natarajan (1995); Shalev-Shwartz et al. (2010); Jain et al. (2014). Our algorithms and theorem statements can be found in Section 2.

Runtime improvements

Even though the rank bound guaranteed by our theoretical analyses is adequate, the algorithm runtimes leave much to be desired. In particular, both the greedy algorithm of Shalev-Shwartz et al. (2011) and our local search algorithm have to solve an optimization problem in each iteration in order to find the best possible linear combination of the features added so far. Even in the case

R(A) = ½ Σ_{(i,j)∈Ω} (M − A)²_{ij},

this requires solving a least squares problem with |Ω| examples and r² variables. For practical implementations of these algorithms, we circumvent this issue by solving a related optimization problem that is usually much smaller: n least squares problems with |Ω| examples in total, each on r variables. This not only reduces the size of the problem by a factor of r, but also allows for a straightforward distributed implementation. Interestingly, our theoretical analyses still hold for these variants. We propose an additional heuristic that reduces the runtime even more drastically: running only a few (fewer than 10) iterations of the algorithm used to solve the inner optimization problem. Experimental results show that this modification does not significantly worsen results, and for machine learning applications it even acts as a regularization method that can dramatically improve generalization. These matters, as well as additional improvements for making the local search algorithm more practical, are addressed in Section 2.3.
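The decomposition into n small least squares problems can be sketched as follows for the matrix-completion objective above. This is our illustration of the idea, not the paper's code; the helper name fit_columns is ours. With the left factors U fixed, each column of the right factor solves an independent least squares problem over the observed entries of that column.

```python
import numpy as np

def fit_columns(U, M, mask):
    """Given fixed left factors U (m x r), fit each column of V (r x n)
    by its own least squares problem over the observed entries of that
    column: n problems with |Omega| examples in total, r variables each."""
    m, r = U.shape
    n = M.shape[1]
    V = np.zeros((r, n))
    for j in range(n):
        rows = np.nonzero(mask[:, j])[0]   # observed entries in column j
        if rows.size == 0:
            continue                       # no data for this column
        # least squares with |rows| examples and r variables
        V[:, j], *_ = np.linalg.lstsq(U[rows], M[rows, j], rcond=None)
    return V
```

Because the n column problems are independent, the loop parallelizes trivially, which is what makes a distributed implementation straightforward.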

Roadmap

In Section 2, we provide the descriptions and theoretical results for the algorithms used, along with several modifications to boost performance. In Section 3, we evaluate the proposed greedy and local search algorithms on optimization problems like robust PCA. Then, in Section 4 we evaluate their generalization performance in machine learning problems like matrix completion.

2. ALGORITHMS & THEORETICAL GUARANTEES

In Sections 2.1 and 2.2 we state and provide theoretical performance guarantees for the basic greedy and local search algorithms, respectively. Then in Section 2.3 we state the algorithmic adjustments that we propose in order to make the algorithms efficient in terms of runtime and generalization performance. A discussion regarding the tightness of the theoretical analysis is deferred to Appendix A.4.

When the dimension is clear from context, we will denote the all-ones vector by 1, and the vector that is 0 everywhere except for a 1 at position i by 1_i. Given a matrix A, we denote by im(A) its column span. One notion that we will find useful is that of singular value thresholding. More specifically, given a rank-k matrix A ∈ R^{m×n} with SVD A = Σ_{i=1}^{k} σ_i u_i v_iᵀ, where σ_1 ≥ ⋯ ≥ σ_k, as well as an integer parameter r ≥ 1, we define H_r(A) = Σ_{i=1}^{r} σ_i u_i v_iᵀ to be the operator that truncates A to its r largest singular values.
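The truncation operator H_r follows directly from the SVD; a minimal numpy sketch (our illustration; the function name hard_threshold is ours):

```python
import numpy as np

def hard_threshold(A, r):
    """H_r(A): keep only the r largest singular values of A, i.e. the
    best rank-r approximation of A in Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # scale the first r left singular vectors by sigma_1..sigma_r
    return (U[:, :r] * s[:r]) @ Vt[:r]
```

For large matrices, a truncated SVD routine that computes only the top r singular triplets would be preferable to a full decomposition.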

2.1. GREEDY

Algorithm 1 (Greedy) was first introduced in Shalev-Shwartz et al. (2011) as the GECO algorithm. It works by iteratively adding a rank-1 matrix to the current solution. This matrix is chosen as the outer product uvᵀ of the top left and right singular vectors of the gradient ∇R(A), i.e., the rank-1 direction along which the objective locally decreases the fastest.
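For intuition, here is a sketch of a single greedy iteration for the squared-loss matrix-completion objective. This is our simplified illustration, with an exact line search standing in for the inner optimization over all features added so far; the name greedy_step is ours.

```python
import numpy as np

def greedy_step(A, M, mask):
    """One greedy (GECO-style) iteration for
    R(A) = 0.5 * sum_{(i,j) in Omega} (M - A)_{ij}^2:
    add the rank-1 matrix best aligned with the negative gradient."""
    G = mask * (A - M)                         # gradient of R at A
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    D = -np.outer(U[:, 0], Vt[0])              # steepest rank-1 descent direction
    den = np.sum(mask * D * D)
    step = -np.sum(G * D) / den if den > 0 else 0.0   # exact line search along D
    return A + step * D
```

Each call increases the rank of the iterate by at most one, and for this quadratic objective the exact line search guarantees the loss does not increase.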




