LEARNING TWO-TIME-SCALE REPRESENTATIONS FOR LARGE SCALE RECOMMENDATIONS

Abstract

We propose a surprisingly simple but effective two-time-scale (2TS) model for learning user representations for recommendation. In our approach, we partition users into two sets, active users with many observed interactions and inactive or new users with few observed interactions, and we use two RNNs to model them separately. Furthermore, we design a two-stage training method for our model: in the first stage, we learn transductive embeddings for users and items, and in the second stage, we learn the two RNNs, leveraging the transductive embeddings trained in the first stage. Through the lens of online learning and stochastic optimization, we provide a theoretical analysis that motivates the design of our 2TS model. The 2TS model achieves a favorable bias-variance trade-off while being computationally efficient. On large-scale datasets, our 2TS model achieves significantly better recommendations than the previous state of the art while being much more computationally efficient.

1. INTRODUCTION

A user's interactions with a recommendation system yield diminishing returns in terms of the information they provide about that user. An active user with many historical interactions is typically well understood by the recommender, and each new interaction contributes relatively little new information. In contrast, for an inactive or new user, every additional interaction provides substantial information for understanding this user. Therefore, the representations of active and inactive users should be updated differently when a new interaction occurs. Figure 1 illustrates this diminishing-information phenomenon: the change in the user embedding from φ_t to φ_{t+1} caused by one additional interaction decays as the number of interactions grows. One can select a threshold t* on the number of interactions, above which users are categorized as active and below which they are categorized as inactive. Roughly speaking, active users' embeddings evolve slowly as a function of the number of interactions, while inactive users' embeddings evolve quickly. This gives rise to a two-time-scale embedding evolution.

Apart from the difference in time scales of the temporal dynamics, the simultaneous presence of active and inactive users also poses other modeling and computational challenges. On the one hand, active users lead to long sequences of interactions and high-degree nodes in the user-item interaction graph. Existing sequence models, such as RNNs, are limited in handling long sequences because of the difficulty of propagating gradients over many steps. Moreover, graph neural network-based models become computationally inefficient because of the intensive message passing through the high-degree nodes introduced by active users. On the other hand, predicting preferences of inactive or



Figure 1: The two-time-scale convergence of user embeddings motivates a two-stage method: the first stage estimates transductive embeddings, and the second stage learns two different RNNs, one for active users and one for inactive users.
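The threshold-based split and the decaying embedding change described in the introduction can be illustrated with a minimal sketch. This is not the 2TS model itself. The function names `partition_users` and `running_mean_steps` are hypothetical, users with exactly t* interactions are assigned to the inactive set by assumption, and a running mean of item vectors is used as a stand-in embedding estimator purely to show why the per-interaction change shrinks roughly like 1/t.

```python
import math
from collections import Counter


def partition_users(interactions, t_star):
    """Split users by interaction count (hypothetical helper).

    interactions: iterable of (user, item) pairs.
    Users with more than t_star interactions are treated as active;
    all others, including ties (an assumption), are treated as inactive.
    """
    counts = Counter(user for user, _item in interactions)
    active = {u for u, c in counts.items() if c > t_star}
    inactive = set(counts) - active
    return active, inactive


def running_mean_steps(item_vectors):
    """Return the norms ||phi_{t+1} - phi_t|| for a running-mean embedding.

    The running mean is an illustrative stand-in, not the paper's model:
    phi_{t+1} = phi_t + (x_{t+1} - phi_t) / (t + 1).
    """
    phi = list(item_vectors[0])  # phi_1 is the first item vector
    steps = []
    for t, x in enumerate(item_vectors[1:], start=1):
        # Update applied when the (t+1)-th interaction arrives.
        delta = [(xi - pi) / (t + 1) for xi, pi in zip(x, phi)]
        steps.append(math.sqrt(sum(d * d for d in delta)))
        phi = [pi + d for pi, d in zip(phi, delta)]
    return steps


if __name__ == "__main__":
    log = [("u1", "a"), ("u1", "b"), ("u1", "c"), ("u2", "a")]
    active, inactive = partition_users(log, t_star=2)
    print("active:", sorted(active), "inactive:", sorted(inactive))

    # Alternating item vectors: later interactions move the embedding less.
    items = [(1.0, 0.0), (0.0, 1.0)] * 3
    print([round(s, 4) for s in running_mean_steps(items)])
```

Under this stand-in estimator, the update size is bounded by the largest distance between an item vector and the current embedding, divided by t + 1. That bound is one simple way to see why a user well past the threshold changes little with each new interaction.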

