ALMOST LINEAR CONSTANT-FACTOR SKETCHING FOR ℓ1 AND LOGISTIC REGRESSION

Abstract

We improve upon previous oblivious sketching and turnstile streaming results for ℓ1 and logistic regression, giving a much smaller sketching dimension achieving O(1)-approximation and yielding an efficient optimization problem in the sketch space. Namely, we achieve for any constant c > 0 a sketching dimension of Õ(d^{1+c}) for ℓ1 regression and Õ(µd^{1+c}) for logistic regression, where µ is a standard measure that captures the complexity of compressing the data. For ℓ1-regression our sketching dimension is near-linear and improves previous work which either required Ω(log d)-approximation with this sketching dimension, or required a larger poly(d) number of rows. Similarly, for logistic regression previous work had worse poly(µd) factors in its sketching dimension. We also give a tradeoff that yields a (1 + ε)-approximation in input sparsity time by increasing the total size to (d log(n)/ε)^{O(1/ε)} for ℓ1 and to (µd log(n)/ε)^{O(1/ε)} for logistic regression. Finally, we show that our sketch can be extended to approximate a regularized version of logistic regression where the data-dependent regularizer corresponds to the variance of the individual logistic losses.

1. INTRODUCTION

We consider logistic regression in distributed and streaming environments. A key tool for solving these problems is a distribution over random oblivious linear maps S ∈ R^{r×n} which have the property that, for a given n × d matrix X, where we assume the labels for the rows of X have been multiplied into X, given only SX one can efficiently and approximately solve the logistic regression problem. The fact that S does not depend on X is what is referred to as S being oblivious, which is important in distributed and streaming tasks since one can choose S without first needing to read the input data. The fact that S is a linear map is also important for such tasks, since given SX^{(1)} and SX^{(2)}, one can add these to obtain S(X^{(1)} + X^{(2)}), which allows for positive or negative updates to entries of the input in a stream, or across multiple servers in the arbitrary partition model of communication; see, e.g., (Woodruff, 2014) for a discussion of data stream and communication models. An important goal is to minimize the sketching dimension r of the sketching matrix S, as this translates into the memory required of a streaming algorithm and the communication cost of a distributed algorithm. At the same time, one would like the approximation factor that one obtains via this approach to be as small as possible. Specifically, we develop and improve oblivious sketching for the most important robust linear regression variant, namely ℓ1 regression, and for logistic regression, which is a generalized linear model of high importance for binary classification and estimation of Bernoulli probabilities. Sketching supports very fast updates, which is desirable for performing robust and generalized regression in high-velocity data processing applications, for instance in physical experiments and other resource-constrained settings, cf. (Munteanu et al., 2021; Munteanu, 2023). We focus on the case where the number n of data points is very large, i.e., n ≫ d.
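To make the role of obliviousness and linearity concrete, the following minimal Python sketch illustrates the mergeability property S(X^{(1)} + X^{(2)}) = SX^{(1)} + SX^{(2)}. The CountSketch-style matrix S used here is only a simple illustrative choice, not the construction developed in this paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, r = 1000, 5, 50  # n data points, d features, sketching dimension r

# Oblivious: S is drawn without ever looking at the data.
# Here S has one random +/-1 entry per column (CountSketch-style);
# the paper's actual constructions are more involved.
rows = rng.integers(0, r, size=n)
signs = rng.choice([-1.0, 1.0], size=n)
S = np.zeros((r, n))
S[rows, np.arange(n)] = signs

# Two data shares held by different servers (arbitrary partition model),
# or equivalently two batches of turnstile updates to the same matrix.
X1 = rng.standard_normal((n, d))
X2 = rng.standard_normal((n, d))

# Linearity: sketches of the shares merge by addition into the
# sketch of the combined data, using only r x d numbers each.
merged = S @ X1 + S @ X2
direct = S @ (X1 + X2)
assert np.allclose(merged, direct)
```

Each server (or stream update) thus only needs to maintain and communicate an r × d sketch rather than its full n × d share, which is why minimizing r is the central goal.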
In this case, applying a standard algorithm directly is not a viable option, since it is either too slow or outright infeasible when it requires more memory than we can afford. Following the sketch & solve paradigm

