A FRAMEWORK FOR LEARNED COUNTSKETCH

Abstract

Sketching is a compression technique that can be applied to many problems to solve them quickly and approximately. The matrices used to project data to smaller dimensions are called "sketches". In this work, we consider the problem of optimizing sketches to obtain low approximation error over a data distribution. We introduce a general framework for "learning" and applying CountSketch, a type of sparse sketch. The sketch optimization procedure has two stages: one for optimizing the placements of the sketch's non-zero entries and another for optimizing their values. Next, we provide a way to apply learned sketches that has worst-case guarantees for approximation error. We instantiate this framework with three sketching applications: least-squares regression, low-rank approximation (LRA), and k-means clustering. Our experiments demonstrate that our approach substantially decreases approximation error compared to classical and naïvely learned sketches. Finally, we investigate the theoretical aspects of our approach. For regression and LRA, we show that our method obtains state-of-the-art accuracy for fixed time complexity. For LRA, we prove that it is strictly better to include the first optimization stage for two standard input distributions. For k-means, we derive a more straightforward means of retaining approximation guarantees.

1. INTRODUCTION

In recent years, we have seen the influence of machine learning extend far beyond the field of artificial intelligence. The underlying paradigm, which assumes that a given algorithm has an input distribution for which algorithm parameters can be optimized, has even been applied to classical algorithms. Examples of classical problems that have benefited from ML include cache eviction strategies, online algorithms for job scheduling, frequency estimation of data stream elements, and indexing strategies for data structures (Lykouris & Vassilvitskii, 2018; Purohit et al., 2018; Hsu et al., 2019; Kraska et al., 2018).

This input distribution assumption is often realistic. For example, many real-world applications use data streaming to track things like product purchasing statistics in real time. Consecutively streamed datapoints are usually tightly correlated and closely fit certain distributions.

We are interested in how this distributional paradigm can be applied to sketching, a data compression technique. With the dramatic increase in the dimensions of data collected in the past decade, compression methods are more important than ever. Thus, it is of practical interest to improve the accuracy and efficiency of sketching algorithms.

We study a sketching scheme in which the input matrix is compressed by multiplying it with a "sketch" matrix of small dimension. This smaller, sketched input is then used to compute an approximate solution. Typically, the sketch matrix and the approximation algorithm are designed to satisfy worst-case bounds on approximation error for arbitrary inputs. With the ML perspective in mind, we examine whether it is possible to construct sketches which also have low error in expectation over an input distribution. Essentially, we aim for the best of both worlds: good performance in practice with theoretical worst-case guarantees. Further, we are interested in methods that work for multiple sketching applications.
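To make the sketch-and-solve scheme concrete, the following is a minimal NumPy sketch of the classical (unlearned) CountSketch applied to least-squares regression: each column of the sketch matrix S has exactly one non-zero entry, a random sign placed in a uniformly random row, and we solve the small sketched problem min_x ||SAx - Sb|| in place of the full one. The matrix sizes and the dense representation of S are illustrative choices, not taken from the paper; a learned CountSketch would instead optimize the positions and values of the non-zero entries over a data distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def countsketch(n, m, rng):
    """Classical CountSketch matrix S of shape (m, n): each column has
    exactly one non-zero entry (a random sign) in a random row."""
    S = np.zeros((m, n))
    rows = rng.integers(0, m, size=n)        # hash each column to a row
    signs = rng.choice([-1.0, 1.0], size=n)  # random sign per column
    S[rows, np.arange(n)] = signs
    return S

# Sketch-and-solve least squares: min_x ||Ax - b||_2 on illustrative data.
n, d, m = 2000, 20, 400
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

S = countsketch(n, m, rng)
# Solve the small m x d sketched problem instead of the full n x d one.
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)

err_sketch = np.linalg.norm(A @ x_sketch - b)
err_exact = np.linalg.norm(A @ x_exact - b)
print(err_sketch / err_exact)  # close to 1: near-optimal residual
```

Because S has one non-zero per column, the product SA can be computed in time proportional to the number of non-zeros of A by streaming over its rows; the dense matrix above is only for readability.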
Typically, sketching is very application-specific. The sketch construction and approximation algorithm are tailored to individual applications, like robust regression or clustering (Sarlos, 2006; Clarkson & Woodruff, 2009; 2014; 2017; Cohen et al., 2015; Makarychev et al., 2019). Instead, we consider

