AN EFFICIENT PROTOCOL FOR DISTRIBUTED COLUMN SUBSET SELECTION IN THE ENTRYWISE ℓ_p NORM

Abstract

We give a distributed protocol with nearly-optimal communication and number of rounds for Column Subset Selection with respect to the entrywise ℓ_1 norm (k-CSS_1), and more generally, for the ℓ_p norm with 1 ≤ p < 2. We study matrix factorization under ℓ_1-norm loss, rather than the more standard Frobenius-norm loss, because the ℓ_1 norm is more robust to noise, which has been observed to lead to improved performance in a wide range of computer vision and robotics problems. In the distributed setting, we consider s servers in the standard coordinator model of communication, where the columns of the input matrix A ∈ R^{d×n} (n ≫ d) are distributed across the s servers. We give a protocol in this model with O(sdk) communication, one round, and polynomial running time, which achieves a multiplicative k^{1/p − 1/2} · poly(log nd)-approximation to the best possible column subset. A key ingredient in our proof is a reduction to the ℓ_{p,2} norm, which is the ℓ_p norm of the vector of Euclidean norms of the columns of A. This enables us to use strong coreset constructions for Euclidean norms, which had not previously been used in this context, and it naturally allows us to implement our algorithm in the popular streaming model of computation. We further propose a greedy algorithm for selecting columns, which can be used by the coordinator, and show the first provable guarantees for a greedy algorithm for the ℓ_{1,2} norm. Finally, we implement our protocol and demonstrate significant practical advantages on real-world data analysis tasks.
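To make the reduction target concrete, the following is a minimal sketch of the ℓ_{p,2} norm described above: the ℓ_p norm of the vector of Euclidean norms of the columns of A. The function name `lp2_norm` is ours, chosen for illustration.

```python
import numpy as np

def lp2_norm(A, p):
    """l_{p,2} norm of A: the l_p norm of the vector whose entries are
    the Euclidean (l_2) norms of the columns of A."""
    col_norms = np.linalg.norm(A, axis=0)    # l_2 norm of each column
    return np.sum(col_norms ** p) ** (1.0 / p)
```

Note that for p = 2 this coincides with the Frobenius norm, which is what lets Euclidean (ℓ_2-based) coreset machinery be brought to bear after the reduction.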

1. INTRODUCTION

Column Subset Selection (k-CSS) is a widely studied approach to rank-k approximation and feature selection. In k-CSS, one seeks a small subset U ∈ R^{d×k} of k columns of a data matrix A ∈ R^{d×n}, typically with n ≫ d, for which there is a right factor V such that ‖UV − A‖ is small under some norm ‖·‖. k-CSS is a special case of low-rank approximation in which the left factor is an actual subset of columns. The main advantage of k-CSS over general low-rank approximation is that the resulting factorization is more interpretable: columns correspond to actual features, whereas general low-rank approximation takes linear combinations of such features. In addition, k-CSS preserves the sparsity of the data matrix A. k-CSS has been extensively studied in the Frobenius norm (Guruswami & Sinop, 2012; Boutsidis et al., 2014; Boutsidis & Woodruff, 2017; Boutsidis et al., 2008) and in operator norms (Halko et al., 2011; Woodruff, 2014). A number of recent works (Song et al., 2017; Chierichetti et al., 2017; Dan et al., 2019; Ban et al., 2019; Mahankali & Woodruff, 2020) studied this problem in the ℓ_p norm (k-CSS_p) for 1 ≤ p < 2. The ℓ_1 norm is less sensitive to outliers, and better at handling missing data and non-Gaussian noise, than the Frobenius norm (Song et al., 2017). Specifically, the ℓ_1 norm leads to improved performance in many real-world applications, such as structure-from-motion (Ke & Kanade, 2005) and image denoising (Yu et al., 2012).

Distributed low-rank approximation arises naturally when a dataset is too large to store on one machine, when a single machine would take a prohibitively long time to compute a rank-k approximation, or when data is collected simultaneously on multiple machines. Despite the flurry of recent work on k-CSS_p, this problem remains largely unexplored in the distributed setting.
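The k-CSS objective above can be sketched as follows: given a candidate index set S of k columns, fit a right factor V and measure the entrywise ℓ_1 residual. This is only an illustration of the objective, not the paper's protocol; for simplicity V is fit by least squares (which is optimal for the Frobenius norm, only a proxy for ℓ_1 — an ℓ_1-optimal V would require per-column ℓ_1 regression). The function name `css_l1_error` is ours.

```python
import numpy as np

def css_l1_error(A, S):
    """Entrywise l_1 error of approximating A by the columns indexed by S.
    V is fit by least squares as a stand-in; an l_1-optimal right factor
    would instead solve an l_1 regression for each column of A."""
    U = A[:, S]                                  # chosen columns, d x k
    V, *_ = np.linalg.lstsq(U, A, rcond=None)    # k x n right factor
    return np.abs(U @ V - A).sum()               # entrywise l_1 residual
```

For example, if A contains a duplicated column, a subset spanning the remaining columns drives this error to zero, reflecting that U must be an actual subset of A's columns rather than arbitrary linear combinations.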
This should be contrasted with Frobenius-norm column subset selection and low-rank approximation, for which a number of results in the distributed model are known; see, e.g., Altschuler et al. (2016); Balcan et al. (2015; 2016); Boutsidis et al. (2016). We consider a widely applicable model in the distributed setting, where s

