GRAPH NEURAL BANDITS

Abstract

Contextual bandits aim to choose the optimal arm with the highest reward out of a set of candidates based on their contextual information, and various bandit algorithms have been applied to personalized recommendation due to their ability to address the exploitation-exploration dilemma. Motivated by online recommendation scenarios, in this paper, we propose a framework named Graph Neural Bandits (GNB) to leverage the collaborative nature among users, empowered by graph neural networks (GNNs). Instead of estimating rigid user clusters, we model the "fine-grained" collaborative effects through estimated user graphs in terms of exploitation and exploration individually. Then, to refine the recommendation strategy, we utilize separate GNN-based models on the estimated user graphs for exploitation and adaptive exploration. Theoretical analysis and experimental results on multiple real data sets, in comparison with state-of-the-art baselines, are provided to demonstrate the effectiveness of our proposed framework.

1. INTRODUCTION

Contextual bandits are a specific type of multi-armed bandit problem where additional contextual information (contexts) about the arms is available at each round, and the learner intends to refine its selection strategy based on the received arm contexts and rewards. Various contextual bandit algorithms have been applied to real-world recommendation tasks, such as online content recommendation and advertising (Li et al., 2010; Wu et al., 2016), and clinical trials (Durand et al., 2018; Villar et al., 2015).

Meanwhile, collaborative effects among users provide the opportunity to design better recommendation strategies, since the target user's preference can be inferred from other, similar users. Such effects have been studied in many bandit works (Gentile et al., 2014; Li et al., 2019; Gentile et al., 2017; Li et al., 2016; Ban & He, 2021). Different from conventional collaborative filtering methods (He et al., 2017; Wang et al., 2019), bandit-based approaches focus on more dynamic environments (such as news and short-video platforms) and on the exploitation-exploration dilemma inherent in recommendation decisions.

Existing works on clustering of bandits (Gentile et al., 2014; Li et al., 2019; Gentile et al., 2017; Li et al., 2016; Ban & He, 2021; Ban et al., 2022a) model user correlations (collaborative effects) by clustering users into rigid groups and assigning each formed group an estimator that learns the assumed reward function, combined with an Upper Confidence Bound (UCB) strategy for exploration. However, these works only consider "coarse-grained" user correlations. To be specific, they assume that users from the same group share identical preferences, i.e., users from the same group are compelled to make equal contributions to the final decision (arm selection) with regard to the target user.
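To make the contextual bandit setting and the UCB exploration strategy concrete, the following is a minimal LinUCB-style sketch for a single user. It is illustrative only: the class name, the hyperparameters `alpha` and `lam`, and the linear reward model are assumptions for exposition, not the GNN-based estimators this paper proposes.

```python
import numpy as np

class LinUCB:
    """Illustrative single-user LinUCB sketch (not the paper's GNB method).

    Maintains a ridge-regression estimate theta of a linear reward model
    r = x^T theta + noise, and scores each arm x by
    exploitation (x^T theta) plus an exploration bonus
    alpha * sqrt(x^T A^{-1} x).
    """

    def __init__(self, dim, alpha=1.0, lam=1.0):
        self.alpha = alpha
        self.A = lam * np.eye(dim)   # regularized Gram matrix
        self.b = np.zeros(dim)       # running sum of reward-weighted contexts

    def select(self, contexts):
        """contexts: (n_arms, dim) array; returns the index of the chosen arm."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        # Per-arm UCB score: estimated reward + exploration bonus.
        bonus = np.sqrt(np.einsum("ij,jk,ik->i", contexts, A_inv, contexts))
        return int(np.argmax(contexts @ theta + self.alpha * bonus))

    def update(self, x, reward):
        """Incorporate the observed (context, reward) pair."""
        self.A += np.outer(x, x)
        self.b += reward * x
```

At each round the learner calls `select` on the current arm contexts, plays the chosen arm, observes the reward, and calls `update`; clustering-of-bandits methods maintain one such estimator per user group, whereas GNB replaces the rigid groups with estimated user graphs.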
Such a formulation of user correlations ("coarse-grained" collaborative effects) evidently fails to match real-world application scenarios, since users within the same group tend to have similar but subtly different preferences rather than completely identical tastes. Therefore, given a target user, it is more practical to assume that the other users impose different levels of (collaborative) effects on this user.

Motivated by the aforementioned limitations of existing works, in this paper, we propose a novel framework, named Graph Neural Bandits (GNB), to formulate the "fine-grained" collaborative effects, where the correlation of each user pair is preserved by user graphs. Given a target user, other users are allowed to make different contributions to the final decision based on the strength of their correlation to the target user, which corresponds to the "fine-grained" collaborative effects. In particular, in GNB, we propose a novel approach to construct two kinds of user graphs with distinct purposes, called "user exploitation graphs" and "user exploration graphs". Then, we apply

