AUTOMATING NEAREST NEIGHBOR SEARCH CONFIGURATION WITH CONSTRAINED OPTIMIZATION

Abstract

The approximate nearest neighbor (ANN) search problem is fundamental to efficiently serving many real-world machine learning applications. A number of techniques have been developed for ANN search that are efficient, accurate, and scalable. However, such techniques typically expose several parameters that affect the speed-recall tradeoff, and they perform poorly when these parameters are not properly set. Tuning these parameters has traditionally been a manual process, demanding in-depth knowledge of the underlying search algorithm; this is becoming an increasingly unrealistic demand as ANN search grows in popularity. To address this obstacle to ANN adoption, this work proposes a constrained optimization-based approach to tuning quantization-based ANN algorithms. Our technique takes just a desired search cost or recall as input and then generates tunings that, empirically, are very close to the speed-recall Pareto frontier and give leading performance on standard benchmarks.

1. INTRODUCTION

Efficient nearest neighbor search is an integral part of approaches to numerous tasks in machine learning and information retrieval; it has been leveraged to effectively solve a number of challenges in recommender systems (Benzi et al., 2016; Cremonesi et al., 2010), coding theory (May & Ozerov, 2015), multimodal search (Gfeller et al., 2017; Miech et al., 2021), and language modeling (Guu et al., 2020; Khandelwal et al., 2020; Kitaev et al., 2020). Vector search over the dense, high-dimensional embedding vectors generated by deep learning models has become especially important following the rapid rise in the capabilities and performance of such models. Nearest neighbor search is also increasingly used to assist training tasks in ML (Lindgren et al., 2021; Yen et al., 2018).

Formally, the nearest neighbor search problem is as follows: we are given an n-item dataset X ∈ R^{n×d} composed of d-dimensional vectors, and a function D : R^d × R^d → R for computing the distance between two vectors. For a query vector q ∈ R^d, our goal is to find the indices of the k nearest neighbors to q in the dataset:

    k-argmin_{i ∈ {1,...,n}} D(q, X_i).

Common choices of D include D(q, x) = -⟨q, x⟩ for maximum inner product search (MIPS) and D(q, x) = ‖q - x‖₂² for Euclidean distance search.

A linear-time scan over X solves the nearest neighbor search problem exactly but does not scale to the large dataset sizes often found in modern applications, necessitating the development of approximate nearest neighbor (ANN) algorithms. A number of approaches to the ANN problem have been successful in trading off a small loss in search accuracy, measured as result recall, for a correspondingly large increase in search speed (Aumüller et al., 2020). However, these approaches rely on tuning a number of hyperparameters that adjust the tradeoff between speed and recall, and poor hyperparameter choices may result in performance far below what could be achieved with ideal tuning.
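For concreteness, the exact search defined above can be implemented as a linear scan over the dataset. The following sketch (with numpy; the function name is our own) supports both of the distance choices given:

```python
import numpy as np

def knn_linear_scan(X, q, k, metric="euclidean"):
    """Exact k-nearest-neighbor search via a brute-force linear scan.

    X is the (n, d) dataset and q the (d,) query; returns the indices of
    the k smallest values of D(q, X_i) for either distance choice above.
    """
    if metric == "mips":
        dists = -X @ q                        # D(q, x) = -<q, x>
    else:
        dists = np.sum((X - q) ** 2, axis=1)  # D(q, x) = ||q - x||_2^2
    # argpartition finds the k smallest in O(n); then sort just those k.
    idx = np.argpartition(dists, k - 1)[:k]
    return idx[np.argsort(dists[idx])]
```

The O(nd) cost per query of this scan is exactly what ANN algorithms are designed to avoid at large n.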
This tuning problem becomes especially difficult at the billion scale, where the larger dataset size typically leads to a greater number of hyperparameters to tune. Existing approaches to tuning an ANN index, enumerated in Table 1, all suffer from some deficiency, such as using an excessive amount of computation. Mitigating these issues is becoming increasingly important with the growth in dataset sizes and in the popularity of the ANN-based retrieval paradigm. This paper describes how highly performant ANN indices may be created and tuned with minimal configuration complexity for the end user. Our contributions are:

• Deriving theoretically grounded models of recall and search cost for quantization-based ANN algorithms, and presenting an efficient Lagrange multipliers-based technique for optimizing either of these metrics with respect to the other.

• Showing that on million-scale datasets, the tunings from our technique give almost identical performance to the optimal hyperparameter settings found through exhaustive grid search.

• Achieving superior performance on track 1 of the billion-scale big-ann-benchmarks datasets using tunings from our technique, compared both to tunings generated by a black-box optimizer on the same ANN index and to all existing benchmark submissions.

Our constrained optimization approach is very general, and we anticipate it can be extended to distance measures, quantization algorithms, and search paradigms beyond those explored in this paper.

2. RELATED WORK

Hashing approaches Techniques in this family utilize locality-sensitive hash (LSH) functions, which hash vectors such that more similar vectors are more likely to collide in hash space (Andoni & Razenshteyn, 2015; Datar et al., 2004; Shrivastava & Li, 2014). By hashing the query and looking up the resulting hash buckets, we can expect to find vectors close to the query. Hashing algorithms are generally parameterized by the number and size of their hash tables. The random memory access patterns of LSH often make efficient implementation difficult, and the theory that prescribes hyperparameters for LSH-based search generally cannot account for dataset-specific idiosyncrasies that allow faster search than is guaranteed for worst-case inputs; see Appendix A.1 for further investigation.
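As a toy illustration of the hash-bucket lookup scheme just described, here is a random-hyperplane (sign-of-projection) LSH sketch; the class and parameter names are our own inventions, not any specific system from the literature. The two constructor arguments correspond to the tunable table count and table size mentioned above:

```python
import numpy as np

class SimHashLSH:
    """Toy random-hyperplane LSH index (illustrative sketch only).

    num_tables and num_bits are the tunable parameters: more tables
    raise the chance of finding a true neighbor, while more bits make
    each bucket smaller and more selective.
    """
    def __init__(self, dim, num_tables=8, num_bits=12, seed=0):
        rng = np.random.default_rng(seed)
        # One independent set of random hyperplanes per table.
        self.planes = rng.standard_normal((num_tables, num_bits, dim))
        self.tables = [dict() for _ in range(num_tables)]

    def _keys(self, x):
        # Sign pattern of the projections -> one integer bucket key per table.
        bits = (self.planes @ x) > 0  # shape (num_tables, num_bits)
        return [int("".join("1" if b else "0" for b in row), 2) for row in bits]

    def add(self, i, x):
        for table, key in zip(self.tables, self._keys(x)):
            table.setdefault(key, []).append(i)

    def query(self, x):
        # Union of candidates across all tables' matching buckets.
        candidates = set()
        for table, key in zip(self.tables, self._keys(x)):
            candidates.update(table.get(key, []))
        return candidates
```

The candidates returned would then be re-ranked by exact distance; the sketch omits that step.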

Graph approaches These algorithms compute a (potentially approximate) nearest neighbor graph over X, where each element of X becomes a graph vertex with directed edges toward its nearest neighbors. The nearest neighbors to q are then computed by starting at some vertex and greedily traversing edges toward vertices closer to q.
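The traversal described above can be sketched as a best-first search over the graph. This is an illustrative sketch under our own naming; the candidate-list size ef is one example of the speed-recall hyperparameters such algorithms expose:

```python
import heapq
import numpy as np

def greedy_graph_search(X, neighbors, q, start, ef=10):
    """Best-first traversal of a precomputed nearest-neighbor graph.

    neighbors[i] lists the out-edges of vertex i; ef bounds the size of
    the maintained result list, trading search speed for recall.
    """
    dist = lambda i: float(np.sum((X[i] - q) ** 2))
    visited = {start}
    frontier = [(dist(start), start)]   # min-heap of vertices to expand
    best = [(-dist(start), start)]      # max-heap (negated) of results
    while frontier:
        d, v = heapq.heappop(frontier)
        if len(best) >= ef and d > -best[0][0]:
            break  # nearest unexpanded candidate cannot improve the results
        for u in neighbors[v]:
            if u not in visited:
                visited.add(u)
                du = dist(u)
                if len(best) < ef or du < -best[0][0]:
                    heapq.heappush(frontier, (du, u))
                    heapq.heappush(best, (-du, u))
                    if len(best) > ef:
                        heapq.heappop(best)  # drop current farthest result
    return sorted((-d, u) for d, u in best)  # (squared distance, vertex)
```

A larger ef explores more of the graph before the early-termination check fires, which is precisely the kind of knob whose setting determines where the index lands on the speed-recall curve.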



Our technique is the first to configure an ANN index to perform very close to its speed-recall Pareto frontier while requiring minimal computation and human involvement.
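To convey the flavor of a Lagrange-multiplier approach on a toy problem: a multiplier lam converts "maximize recall subject to a cost budget" into an unconstrained objective, and sweeping lam traces out candidate tunings along the modeled Pareto frontier. The recall and cost models below are invented stand-ins for illustration, not the models derived in this paper:

```python
import numpy as np

def tune_with_lagrange(recall_fn, cost_fn, grid, budget):
    """Toy constrained tuning via a Lagrange-multiplier sweep.

    recall_fn and cost_fn map a hyperparameter value t to its modeled
    recall and search cost. For each multiplier lam we maximize the
    Lagrangian recall(t) - lam * cost(t) over the grid, then keep the
    best tuning whose modeled cost fits within the budget.
    """
    best_t, best_recall = None, -np.inf
    for lam in np.logspace(-3, 3, 200):
        scores = [recall_fn(t) - lam * cost_fn(t) for t in grid]
        t = grid[int(np.argmax(scores))]
        if cost_fn(t) <= budget and recall_fn(t) > best_recall:
            best_t, best_recall = t, recall_fn(t)
    return best_t, best_recall

# Stand-in models: searching t partitions costs ~t and yields
# diminishing recall returns as t grows.
recall = lambda t: 1.0 - np.exp(-t / 20.0)
cost = lambda t: float(t)
t_star, r_star = tune_with_lagrange(recall, cost, list(range(1, 101)), budget=30)
```

With a single scalar knob this sweep is no better than direct grid search; the appeal of the Lagrangian form is that, for separable recall and cost models over many hyperparameters, each lam decouples the choice of each parameter, avoiding an exhaustive joint grid.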

