PROVABLY EFFICIENT REINFORCEMENT LEARNING FOR ONLINE ADAPTIVE INFLUENCE MAXIMIZATION

Abstract

Online influence maximization aims to maximize the influence spread of content in a social network with an unknown network model by selecting a few seed nodes. Most recent studies follow a non-adaptive setting, where the seed nodes are selected before the diffusion process starts and network parameters are updated only when the diffusion stops. We consider an adaptive version of the content-dependent online influence maximization problem, where the seed nodes are sequentially activated based on real-time feedback. In this paper, we formulate the problem as an infinite-horizon discounted MDP under a linear diffusion process and present a model-based reinforcement learning solution. Our algorithm maintains a network model estimate and selects seed users adaptively, exploring the social network while optimistically improving its policy. We establish an O(√T) regret bound for our algorithm. Empirical evaluations on synthetic and real-world networks demonstrate the efficiency of our algorithm.

1. INTRODUCTION

Influence Maximization (IM) (Kempe et al., 2003; Kitsak et al., 2010; Centola & Macy, 2007), motivated by real-world social-network applications such as viral marketing, has been extensively studied in the past decades. In viral marketing, a marketer selects a set of users (seed nodes) with significant influence for content promotion. These selected users are expected to influence their social network neighbors, and such influence propagates across the network. Given a limited number of seed nodes, the goal of IM is to maximize the information spread over the network. A typical IM formulation models the social network as a directed graph whose edge weights are the propagation probabilities across users. Influence propagation is commonly modeled by a stochastic diffusion process, such as the independent cascade (IC) model or the linear threshold (LT) model (Kempe et al., 2003). A popular variant is topic-aware IM (Chen et al., 2015; 2016), where the activation probabilities are content-dependent and personalized, i.e., edge weights differ when propagating different contents. Classical influence maximization solutions are studied in an offline setting, assuming the activation probabilities are given (Kempe et al., 2003; Chen et al., 2009; 2010). However, this information may not be fully observable in many real-world applications. Online influence maximization (Chen et al., 2013; Wen et al., 2017; Vaswani et al., 2017) has recently attracted significant attention as a way to tackle this problem: an agent learns the activation probabilities by repeatedly interacting with the network.
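To make the diffusion process concrete, the following is a minimal sketch of one run of the independent cascade (IC) model on a toy directed graph; the graph structure, edge probabilities, and helper name are illustrative, not taken from the paper.

```python
import random

def simulate_ic(graph, seeds, rng=random.Random(0)):
    """Simulate one run of the independent cascade (IC) model.

    graph: dict mapping node -> list of (neighbor, activation_probability)
    seeds: initial seed nodes
    Returns the set of all activated nodes (the realized influence spread).
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_active = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # Each newly activated node gets exactly one chance to
                # activate each inactive out-neighbor, with probability p.
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return active

# Toy network: edge weights are propagation probabilities.
graph = {
    "a": [("b", 0.9), ("c", 0.9)],
    "b": [("d", 0.9)],
    "c": [("d", 0.1)],
}
spread = simulate_ic(graph, seeds={"a"})
```

In the offline IM setting these edge probabilities are known and the expected size of `spread` is what seed selection maximizes; in the online setting they must be learned from repeated interactions.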
Most existing works formulate online influence maximization as a multi-armed bandit problem with non-adaptive batch decisions: at each round, the seed nodes are computed before the diffusion process starts, balancing exploration of the unknown network against maximization of the influence spread; when the diffusion finishes, the agent observes either edge-level (Chen et al., 2013; Wen et al., 2017; Wu et al., 2019) or node-level (Vaswani et al., 2017; Li et al., 2020) activations and updates its model. Combinatorial multi-armed bandit (Chen et al., 2013; Wang & Chen, 2017) and combinatorial linear bandit (Wen et al., 2017; Wu et al., 2019) algorithms have been proposed as solutions, most of which follow the independent cascade model with edge-level feedback. In contrast to the non-adaptive setting, adaptive influence maximization allows the agent to select seed nodes sequentially after observing partial diffusion results (Golovin & Krause, 2011; Tong et al., 2016; Peng & Chen, 2019). The agent can achieve a higher influence spread since its decisions adapt to real-time feedback from the diffusion. In viral marketing, the agent could observe

