DEEP RETRIEVAL: AN END-TO-END STRUCTURE MODEL FOR LARGE-SCALE RECOMMENDATIONS

Anonymous

Abstract

One of the core problems in large-scale recommendation is to retrieve top relevant candidates accurately and efficiently, preferably in sub-linear time. Previous approaches are mostly based on a two-step procedure: first learn an inner-product model, then use maximum inner product search (MIPS) algorithms to find the top candidates, which can lead to a loss of retrieval accuracy. In this paper, we present Deep Retrieval (DR), an end-to-end learnable structure model for large-scale recommendations. DR encodes all candidates into a discrete latent space. The latent codes of the candidates are model parameters, learned together with the other neural network parameters to maximize the same objective function. Once the model is learned, a beam search over the latent codes retrieves the top candidates. Empirically, we show that DR, with sub-linear computational complexity, achieves almost the same accuracy as a brute-force baseline.
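The retrieval step described above, a beam search over discrete latent codes, can be sketched minimally as follows. This is an illustrative sketch, not code from the paper: `layer_prob` is a hypothetical stand-in for the learned model's per-layer conditional probabilities, and the layer depth `D`, code vocabulary size `K`, and beam size are free parameters.

```python
# Hedged sketch of beam search over a D-layer discrete code space.
# `layer_prob(prefix, c)` is a hypothetical callable approximating
# P(next code = c | user, prefix) from a learned model.
import math

def beam_search(layer_prob, D, K, beam_size):
    """Keep the `beam_size` highest-probability code prefixes per layer.

    Complete paths of length D index buckets of candidate items, so the
    search cost is O(D * beam_size * K) rather than linear in corpus size.
    """
    beams = [((), 0.0)]  # (code prefix, accumulated log-probability)
    for _ in range(D):
        expanded = [
            (prefix + (c,), logp + math.log(layer_prob(prefix, c)))
            for prefix, logp in beams
            for c in range(K)
        ]
        # Keep only the top `beam_size` prefixes by log-probability.
        beams = sorted(expanded, key=lambda b: -b[1])[:beam_size]
    return beams
```

For example, with a toy distribution that always favors code 0, the highest-scoring path is the all-zeros code, with log-probability equal to D times the log of that code's probability.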

1. INTRODUCTION

Recommendation systems have achieved great success in commercial applications for decades. Their objective is to retrieve relevant candidate items from a corpus based on user features and historical behaviors. One of the early successful techniques is collaborative filtering (CF), which makes predictions based on the simple idea that similar users may prefer similar items. Item-based collaborative filtering (Item-CF) (Sarwar et al., 2001) extends this idea by considering similarities between items, and lays the foundation for Amazon's recommendation system (Linden et al., 2003). In the Internet era, both the number of candidate items on content platforms and the number of active users have grown to tens or hundreds of millions, making scalability, efficiency, and accuracy all challenging in the design of modern recommendation systems.

Recently, vector-based retrieval methods have been widely adopted. The main idea is to embed users and items in a latent vector space and use the inner product of their vectors to represent the preference of a user for an item. Representative vector embedding methods include matrix factorization (MF) (Mnih & Salakhutdinov, 2008; Koren et al., 2009), factorization machines (FM) (Rendle, 2010), DeepFM (Guo et al., 2017), field-aware FM (FFM) (Juan et al., 2016), etc. However, when the number of items is large, brute-force computation of the inner product for all items is prohibitive, so maximum inner product search (MIPS) or approximate nearest neighbor (ANN) algorithms are usually used to retrieve the top relevant items when the corpus is large.
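The brute-force inner-product retrieval that MIPS and ANN methods aim to approximate can be sketched as follows. This is an illustrative sketch, not code from the paper; the function name and the toy corpus are our own.

```python
# Illustrative brute-force inner-product retrieval: score every item and
# take the top k. This is the O(N * d) baseline that MIPS/ANN methods
# replace with sub-linear search when N is tens or hundreds of millions.
import numpy as np

def brute_force_top_k(user_vec, item_matrix, k):
    """Return indices of the k items with the largest inner product."""
    scores = item_matrix @ user_vec           # (N,) inner products
    top_k = np.argpartition(-scores, k)[:k]   # unordered top-k candidates
    return top_k[np.argsort(-scores[top_k])]  # sorted by score, descending

rng = np.random.default_rng(0)
items = rng.normal(size=(1000, 16))  # toy corpus: 1000 item embeddings
user = rng.normal(size=16)           # one user embedding
print(brute_force_top_k(user, items, 5))
```

Every query touches all N items, which is exactly the cost that tree indices, LSH, PQ, and HNSW (discussed next) avoid, at the price of approximation.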
Efficient MIPS or ANN algorithms include tree-based algorithms (Muja & Lowe, 2014; Houle & Nett, 2014), locality sensitive hashing (LSH) (Shrivastava & Li, 2014; Spring & Shrivastava, 2017), product quantization (PQ) (Jegou et al., 2010; Ge et al., 2013), hierarchical navigable small world graphs (HNSW) (Malkov & Yashunin, 2018), etc. Despite their success in real-world applications, vector-based algorithms have two main deficiencies: (1) the objectives of learning vector representations and of building a good MIPS structure are not well aligned for the recommendation task; (2) the dependence on inner products of user and item embeddings might not be sufficient to capture the complicated structure of user-item interactions (He et al., 2017). To overcome these limitations, tree-based models such as TDM/JTM (Zhu et al., 2018; 2019; Zhuo et al., 2020) have been proposed. These methods use a tree as an index and map each item to a leaf node of the tree. Learning objectives for model parameters and tree structure are well aligned to improve accuracy. However, the number of parameters in these models is proportional to the

