DEEP RETRIEVAL: AN END-TO-END STRUCTURE MODEL FOR LARGE-SCALE RECOMMENDATIONS

Anonymous

Abstract

One of the core problems in large-scale recommendations is to retrieve the top relevant candidates accurately and efficiently, preferably in sub-linear time. Previous approaches are mostly based on a two-step procedure: first learn an inner-product model, then use maximum inner product search (MIPS) algorithms to find the top candidates, leading to a potential loss of retrieval accuracy. In this paper, we present Deep Retrieval (DR), an end-to-end learnable structure model for large-scale recommendations. DR encodes all candidates into a discrete latent space. The latent codes of the candidates are model parameters and are learned together with the other neural network parameters to maximize the same objective function. Once the model is learned, a beam search over the latent codes is performed to retrieve the top candidates. Empirically, we show that DR, with sub-linear computational complexity, can achieve almost the same accuracy as the brute-force baseline.

1. INTRODUCTION

Recommendation systems have achieved great success in various commercial applications for decades. The objective of these systems is to retrieve relevant candidate items from a corpus based on user features and historical behaviors. One of the early successful techniques in recommendation systems is collaborative filtering (CF), which makes predictions based on the simple idea that similar users may prefer similar items. Item-based collaborative filtering (Item-CF) (Sarwar et al., 2001) extends this idea by considering the similarities between items, which laid the foundation for Amazon's recommendation system (Linden et al., 2003).

In the Internet era, the number of candidate items on content platforms and the number of active users on those platforms have rapidly grown to tens or hundreds of millions. Scalability, efficiency, and accuracy are all challenging problems in the design of modern recommendation systems. Recently, vector-based retrieval methods have been widely adopted. The main idea is to embed users and items in a latent vector space and use the inner product of vectors to represent the preference between users and items. Representative vector embedding methods include matrix factorization (MF) (Mnih & Salakhutdinov, 2008; Koren et al., 2009), factorization machines (FM) (Rendle, 2010), DeepFM (Guo et al., 2017), field-aware FM (FFM) (Juan et al., 2016), etc. However, when the number of items is large, the cost of brute-force computation of the inner product for all items can be prohibitive. Thus, maximum inner product search (MIPS) or approximate nearest neighbor (ANN) algorithms are usually used to retrieve the top relevant items when the corpus is large.
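To make the scale issue concrete, here is a minimal sketch of brute-force inner-product retrieval. All names are illustrative and not from any particular system; production systems replace the exhaustive scan below with a MIPS/ANN index precisely because its cost grows linearly in the corpus size.

```python
# Brute-force inner-product retrieval: score every item, keep the top k.
# Cost is O(N * dim) per query, which MIPS/ANN methods aim to avoid.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def brute_force_top_k(user_vec, item_vecs, k):
    """Score all items by inner product with the user vector and
    return the ids of the k highest-scoring items."""
    scores = [(dot(user_vec, v), i) for i, v in enumerate(item_vecs)]
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]

item_vecs = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5], [0.9, 0.1]]
user_vec = [1.0, 0.0]  # this toy user prefers the first latent dimension
print(brute_force_top_k(user_vec, item_vecs, 2))  # -> [3, 1]
```

With millions of items, this per-query linear scan is exactly the bottleneck that motivates sub-linear retrieval structures.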
Efficient MIPS or ANN algorithms include tree-based algorithms (Muja & Lowe, 2014; Houle & Nett, 2014), locality sensitive hashing (LSH) (Shrivastava & Li, 2014; Spring & Shrivastava, 2017), product quantization (PQ) (Jegou et al., 2010; Ge et al., 2013), hierarchical navigable small world graphs (HNSW) (Malkov & Yashunin, 2018), etc. Despite their success in real-world applications, vector-based algorithms have two main deficiencies: (1) the objectives of learning the vector representation and learning a good MIPS structure are not well aligned for the recommendation task; (2) the dependency on inner products of user and item embeddings might not be sufficient to capture the complicated structure of user-item interactions (He et al., 2017).

To break these limitations, tree-based models (Zhu et al., 2018; 2019; Zhuo et al., 2020), TDM/JDM, have been proposed. These methods use a tree as the index and map each item to a leaf node of the tree. The learning objectives for the model parameters and the tree structure are well aligned to improve accuracy. However, the number of parameters in these models is proportional to the number of clusters, making the tree structure itself difficult to learn: data available at the leaf level can be scarce and might not provide enough signal to learn a good tree at a finer level.

In this paper, we propose an end-to-end learnable structure model, Deep Retrieval (DR). In DR, we use a K × D matrix as in Figure 1a for indexing, motivated by Chen et al. (2018). In this structure, we define a path c as a forward index traversal over the matrix columns. Each path has length D, with index values ranging over {1, 2, . . . , K}. There are K^D possible paths, and each path can be interpreted as a cluster of items. Two major characteristics shape the design of the structure. First, there is no "leaf node" as in a tree, so the data scarcity problem of learning tree-based models is largely avoided in DR's structure.
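The K × D indexing structure can be sketched as follows. This is a toy illustration under our own naming assumptions; in DR the item-to-path assignment is a learnable model parameter updated during training, not a fixed dictionary.

```python
# Toy sketch of the K x D structure: a path is a length-D tuple of
# column indices, and items may be assigned to one or more paths.
K, D = 100, 3  # structure width and depth; there are K**D possible paths

# hypothetical item-to-path assignment (in DR this is learned, not fixed)
item_paths = {
    "item_a": [(36, 27, 20)],              # indexed by a single path
    "item_b": [(36, 27, 20), (5, 80, 2)],  # items may belong to several paths
}

def path_to_items(assignment, path):
    """Invert the mapping: which items are indexed under a given path?"""
    return sorted(i for i, paths in assignment.items() if path in paths)

print(K ** D)                              # 1000000 possible paths
print(path_to_items(item_paths, (36, 27, 20)))  # both toy items share this path
```

Note that two distinct items sharing a path is expected behavior here, not a collision: each path acts as a cluster, unlike a leaf in a one-to-one tree mapping.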
Second, each item can be indexed by one or more paths: each path can contain multiple items, and each item can also belong to multiple paths. This property is naturally accommodated by our probabilistic formulation of the DR model, as we will show below. This many-to-many encoding scheme between items and paths differs significantly from the one-to-one mapping used in tree structure designs. In training, the item paths are learned together with the neural network parameters of the structure model using an expectation-maximization (EM) type algorithm (Dempster et al., 1977). The entire training process is end-to-end and can be easily deployed for large-scale content platforms.

The rest of the paper is organized as follows. In Section 2, we describe the structure model and the structure objective function used in training in detail. We then introduce a beam search algorithm to find candidate paths in the inference stage. In Section 3, we introduce the EM algorithm for jointly training the neural network parameters and the paths of items. In Section 4, we demonstrate the performance of DR on two public datasets: MovieLens-20M¹ and Amazon Books². Experimental results show that DR can almost achieve brute-force accuracy with sub-linear computational complexity. In Section 5, we conclude the paper and discuss several possible future research directions.
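As a preview of the inference stage, the beam search over paths can be sketched as below. This is a deliberately simplified toy: the per-layer log-probabilities here do not depend on the path prefix, whereas in DR each layer's distribution is conditioned on the previously chosen indices; the function names are our own.

```python
import math

def beam_search(layer_log_probs, beam_size):
    """Layer-wise beam search over paths.

    layer_log_probs[d][k] = log-probability of choosing index k at layer d
    (for a fixed user).  At each layer we extend every partial path by all
    K indices and keep only the beam_size highest-scoring candidates."""
    beams = [((), 0.0)]  # (partial path, cumulative log-probability)
    for log_probs in layer_log_probs:
        candidates = [
            (path + (k,), score + lp)
            for path, score in beams
            for k, lp in enumerate(log_probs)
        ]
        candidates.sort(key=lambda t: t[1], reverse=True)
        beams = candidates[:beam_size]
    return beams

# toy example with K = 3 choices per layer and depth D = 2
layer_log_probs = [
    [math.log(0.6), math.log(0.3), math.log(0.1)],
    [math.log(0.2), math.log(0.7), math.log(0.1)],
]
top = beam_search(layer_log_probs, beam_size=2)
print([p for p, _ in top])  # -> [(0, 1), (1, 1)]
```

Because only beam_size partial paths survive per layer, the cost is O(D · B · K) rather than O(K^D), which is the source of DR's sub-linear retrieval complexity.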

2. DEEP RETRIEVAL: AN END-TO-END STRUCTURE MODEL

In this section, we introduce the DR structure model in detail. First, we show how we establish the probability function for user x to select path c given the model parameters θ. We then extend this to a multi-path mechanism that enables DR to capture multi-aspect properties of items.



¹ https://grouplens.org/datasets/movielens
² http://jmcauley.ucsd.edu/data/amazon



Figure 1: (a) Consider a structure with width K = 100 and depth D = 3. Suppose an item is encoded by the length-D vector [36, 27, 20], which is called a "path". The path denotes that the item is assigned to the (1, 36), (2, 27), (3, 20) indices of the K × D matrix. In the figure, arrows with the same color form a path. Different paths can intersect with each other by sharing the same index at some layer. (b) Flow chart showing the process of constructing the probability of path c = [c_1, c_2, c_3] given input x, p(c|x, θ).
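The construction in Figure 1b can be illustrated with a toy version of the chain-rule factorization p(c|x, θ) = Π_d p(c_d | x, c_1, …, c_{d-1}; θ). The linear scorer and prefix bias below are placeholder assumptions standing in for the paper's per-layer neural networks; only the factorization itself is the point.

```python
import math
import random

random.seed(0)
K, D, DIM = 4, 3, 5  # small toy sizes: width, depth, input dimension

# one weight matrix per layer (layer d scores the K choices from input x);
# a bias keyed on the previous index crudely stands in for conditioning
# on the path prefix c_<d
W = [[[random.gauss(0, 0.1) for _ in range(DIM)] for _ in range(K)]
     for _ in range(D)]
prefix_bias = [[random.gauss(0, 0.1) for _ in range(K)] for _ in range(K)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def path_probability(x, path):
    """p(c|x) = product over layers d of the softmax probability of the
    chosen index c_d, conditioned here on the previous index only."""
    prob, prev = 1.0, None
    for d, c_d in enumerate(path):
        logits = [sum(w * xi for w, xi in zip(W[d][k], x)) for k in range(K)]
        if prev is not None:
            logits = [z + prefix_bias[prev][k] for k, z in enumerate(logits)]
        prob *= softmax(logits)[c_d]
        prev = c_d
    return prob

x = [0.5, -0.2, 0.1, 0.0, 0.3]
total = sum(path_probability(x, (a, b, c))
            for a in range(K) for b in range(K) for c in range(K))
print(round(total, 6))  # probabilities over all K**D paths sum to 1
```

Because each layer outputs a proper softmax distribution given its prefix, the product defines a valid probability distribution over all K^D paths, which is what makes path assignment learnable by maximum likelihood.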

