RECOMMENDER TRANSFORMERS WITH BEHAVIOR PATHWAYS

Abstract

Sequential recommendation requires the recommender to capture evolving behavior characteristics from logged user behavior data to make accurate recommendations. However, user behavior sequences can be viewed as a script with multiple ongoing threads intertwined, in which only a small set of pivotal behaviors evolves into the user's future action; this makes future behavior hard to predict. We term this characteristic of each user's sequential behaviors the behavior pathway; different users have their own unique behavior pathways. Among existing sequential models, transformers have shown great capacity for capturing globally dependent characteristics. However, these models mainly provide a dense attention distribution over all previous behaviors via the self-attention mechanism, so the final predictions are overwhelmed by trivial behaviors not relevant to each user. In this paper, we build the Recommender Transformer (RETR) with a novel pathway attention mechanism. RETR dynamically plans the behavior pathway specific to each user, and sparsely activates the network through this pathway to effectively capture evolving patterns useful for recommendation. The key design is a learned binary route that prevents the behavior pathway from being overwhelmed by trivial behaviors. Pathway attention is model-agnostic and can be applied to a series of transformer-based models for sequential recommendation. We empirically evaluate RETR on seven intra-domain benchmarks, where it yields state-of-the-art performance, and on five cross-domain benchmarks, where it captures more domain-invariant representations for sequential recommendation.

1. INTRODUCTION

Recommender systems (Hidasi et al., 2015; Lu et al., 2015; Zhao et al., 2021) have been widely adopted in real-world industrial applications such as e-commerce and social media. Benefiting from increases in computing power and model capacity, some recent efforts formulate recommendation as a time-series forecasting problem, known as sequential recommendation (Kang & McAuley, 2018; Sun et al., 2019; Chen et al., 2021). The core idea of this field is to infer upcoming actions from a user's historical behaviors, which are reorganized as time-ordered sequences. This intuitive formulation of recommendation has proved time-sensitive and context-aware enough to make precise predictions. Recent sequential recommendation models, such as SASRec (Kang & McAuley, 2018), BERT4Rec (Sun et al., 2019), and S3-Rec (Zhou et al., 2020), have achieved significant improvements. Transformers enable these models to recognize global-range sequential patterns and to model how future behaviors are anchored in historical ones. The self-attention mechanism does make it possible to explore all previous behaviors of each user, with the whole network activated. However, using all user information indiscriminately, regardless of whether it is informative, floods models with trivial behaviors, makes them dense in neuron connections and inefficient in computation, and drowns out the key behaviors. This clearly contradicts the way our brain works: the human brain has many parts specialized for various tasks, yet it only calls upon the relevant pieces in a given situation (Zaidi, 2010). To some extent, user behavior sequences can be viewed as a script with multiple ongoing threads intertwined, where only key clues suggest what will happen next. In sequential recommendation, we find that only a small set of pivotal behaviors evolves into the user's future action.
We term this characteristic of sequential behaviors the behavior pathway. Motivated by Pathways (Dean, 2021), a new way of thinking about AI that builds a single, sparsely activated model for all tasks, with small pathways through the network called into action as needed, we propose a novel Recommender Transformer (RETR) with a pathway attention mechanism. RETR dynamically explores behavior pathways for different users and then effectively captures evolving patterns through these pathways. Specifically, the user-dependent pathway attention incorporates a pathway router that determines whether a behavior token is kept in the behavior pathway. Technically, the pathway router generates a customized binary route for each token based on its information redundancy. RETR has a stacked structure, and successive pathway routers constitute a hierarchical evolution of user behaviors. To optimize the pathway router end-to-end, we propose an adaptive Gumbel-Softmax sampling strategy that overcomes the non-differentiability of sampling from a Bernoulli distribution. To effectively capture the evolving patterns via the behavior pathway, our pathway attention mechanism makes RETR attend mainly to the obtained pathway: we force the model to focus on the most informative behaviors by routing the query through the behavior pathway and cutting off the query's interaction with off-pathway behaviors. Compared with attending to all previous behaviors, pathway attention is markedly more effective and prevents the most informative tokens from being overwhelmed by trivial behaviors. Moreover, pathway attention is model-agnostic and can easily be applied to existing transformer-based models.
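The two ingredients above can be sketched in code. The following is a minimal, dependency-free illustration, not the paper's implementation: the function names, the fixed temperature (the paper describes an adaptive schedule), and the hand-picked router logits are all assumptions for exposition. It shows (i) a straight-through Gumbel relaxation that turns a keep/drop logit into a binary route, and (ii) attention in which each query only interacts with on-pathway behaviors.

```python
import math
import random

def gumbel_sigmoid(logit, tau=1.0, hard=True):
    """Relaxed Bernoulli sample for one token's keep/drop decision.
    Adds Gumbel(0, 1) noise to each of the two classes, then applies a
    temperature-controlled sigmoid; `hard=True` discretizes to {0, 1}
    (in an autodiff framework gradients would flow through `soft`)."""
    g_keep = -math.log(-math.log(random.random()))
    g_drop = -math.log(-math.log(random.random()))
    soft = 1.0 / (1.0 + math.exp(-(logit + g_keep - g_drop) / tau))
    if hard:
        return 1.0 if soft > 0.5 else 0.0
    return soft

def pathway_attention(queries, keys, values, route):
    """Scaled dot-product attention where only on-pathway behaviors
    (route[j] == 1) are visible; off-pathway scores are masked out."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = []
        for j, k in enumerate(keys):
            s = sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            scores.append(s if route[j] == 1 else float("-inf"))
        m = max(scores)
        if m == float("-inf"):          # every behavior routed off-pathway
            out.append([0.0] * d)
            continue
        exps = [math.exp(s - m) if s > float("-inf") else 0.0 for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(d)])
    return out

# Toy usage: three behavior tokens, hand-picked (hypothetical) router logits.
random.seed(0)
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
logits = [2.0, -3.0, 1.5]               # keep, drop, keep (illustrative)
route = [gumbel_sigmoid(l) for l in logits]
attended = pathway_attention(tokens, tokens, tokens, route)
```

In RETR the logits come from a learned router rather than being hand-set, and the hard route is made differentiable via the straight-through estimator, so the whole stack trains end-to-end.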
To validate the effectiveness of our approach, we conduct experiments on seven competitive intra-domain datasets for sequential recommendation, on which RETR achieves state-of-the-art performance. Furthermore, RETR also achieves consistent performance improvements under the cross-domain setting, indicating that it captures more domain-invariant representations for sequential recommendation. Our main contributions can be summarized as follows:

• We propose the concept of the behavior pathway for sequential recommendation, and find that the key for the recommender is to dynamically capture the behavior pathway of each user.

• We propose the Recommender Transformer (RETR) with a novel pathway attention mechanism, which generates the behavior pathway hierarchically and dynamically captures evolving patterns through the pathway.

• We validate the effectiveness of RETR on seven intra-domain benchmarks and five cross-domain benchmarks, achieving state-of-the-art performance on both. RETR captures more domain-invariant representations, and pathway attention can be combined with a rich family of transformer-based models to yield consistent performance improvements.



Figure 1: Three typical examples of the behavior pathway for different users: correlated, casual, and drifted. The behavior pathway is outlined by the red boxes. Different users have their own unique behavior pathways; we provide three typical examples: (a) Correlated behavior pathway: the user's behavior pathway is closely associated with behaviors within a certain period. As shown in the first row of Figure 1, the mouse is clicked many times recently, leading to the final decision to buy a mouse. (b) Casual behavior pathway: the user shows interest in a specific item at casual, scattered times. In the second row of Figure 1, the backpack is clicked at random times in a multi-hop manner. (c) Drifted behavior pathway: the user's behavior pathway in a particular brand might drift over time. In the third row of Figure 1, the user was initially interested in a keyboard but suddenly became interested in buying a phone. Dynamically capturing these potential behaviors for each user to make precise recommendations is challenging.

