DEEP EVIDENTIAL REINFORCEMENT LEARNING FOR DYNAMIC RECOMMENDATIONS

Abstract

Reinforcement learning (RL) has been applied to build recommender systems (RS) that capture users' evolving preferences and continuously improve the quality of recommendations. In this paper, we propose a novel deep evidential reinforcement learning (DERL) framework that learns a more effective recommendation policy by integrating both the expected reward and evidence-based uncertainty. In particular, DERL conducts evidence-aware exploration to locate items that will most likely interest a user in the future. The two central components of DERL are a customized recurrent neural network (RNN) and an evidential actor-critic (EAC) module. The former generates the current state of the environment by aggregating historical information with a sliding window that contains the current user interactions as well as newly recommended items that may encode future interest. The latter performs evidence-based exploration by maximizing a uniquely designed evidential Q-value to derive a policy that prefers items with good predicted ratings that remain largely unknown to the system (due to a lack of evidence). These two components are jointly trained by supervised learning and reinforcement learning. Experiments on multiple real-world dynamic datasets demonstrate the state-of-the-art performance of DERL and its capability to capture long-term user interests.

1. INTRODUCTION

Recommender systems (RS) have been widely used to provide personalized recommendations in diverse fields such as media, entertainment, and e-commerce, effectively improving user experience (Su & Khoshgoftaar, 2009; Sun et al., 2014; Xie et al., 2018). Various methods have been introduced to tackle the recommendation problem. Traditional methods include: collaborative filtering, which captures user preferences using information from similar users (Koren, 2008); content-based methods, where extra information is used to better represent latent preferences and items (Mooney & Roy, 2000); and hybrid methods, which integrate both collaborative and content-based approaches for more effective recommendation (Burke, 2002). Deep learning (DL) has also been increasingly used to build RS due to its ability to model complex and non-linear user-item relationships (Cheng et al., 2016; Guo et al., 2017). Most RS methods mentioned above treat recommendation as a static process and thus fail to account for users' evolving preferences. Some efforts have been devoted to capturing these evolving preferences by shifting the user latent preference over time (Koren, 2009; Charlin et al., 2015; Gultekin & Paisley, 2014). Similarly, sequential recommendation methods (Kang & McAuley, 2018; Tang & Wang, 2018) attempt to incorporate users' dynamic behavior by leveraging previously interacted items. However, both static and dynamic recommendation methods primarily focus on maximizing the immediate (short-term) reward when making recommendations. As a result, they fail to take into account whether the recommended items will lead to long-term returns, which is essential to maintaining a stable user base in the long run. Several recent works have adopted reinforcement learning (RL) for RS (Chen et al., 2019b; Zhao et al., 2017). RL has already achieved great success in diverse fields such as robotics (Kober et al., 2013) and games (Silver et al., 2017).
The core idea of RL is to learn an optimal policy that maximizes the total expected reward in the long run. RL methods model the recommendation procedure as sequential interactions between users and the RL agent, which allows effective recommendation policies to be learned. Although RL approaches show promising results in RS (Chen et al., 2019b; Zheng et al.,
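To make the exploration idea from the abstract concrete, the following minimal sketch (our own illustration, not the authors' implementation; the toy ratings, evidence counts, and the `LAMBDA` weight are all assumed values) shows how an evidence-aware value can prefer an item with a slightly lower predicted rating when the system has seen very little evidence about it:

```python
import math

# Hypothetical predicted ratings and evidence counts for three items.
predicted_rating = {"item_a": 4.5, "item_b": 4.3, "item_c": 3.0}
evidence_count = {"item_a": 500, "item_b": 3, "item_c": 400}

LAMBDA = 0.5  # assumed weight on the uncertainty bonus


def evidential_value(item):
    # Less accumulated evidence -> larger uncertainty bonus, which
    # encourages exploring items that are largely unknown to the system.
    uncertainty = 1.0 / math.sqrt(1 + evidence_count[item])
    return predicted_rating[item] + LAMBDA * uncertainty


# item_b wins despite a lower predicted rating, because the system
# has almost no evidence about it.
best = max(predicted_rating, key=evidential_value)
```

Here the bonus shrinks as evidence accumulates, so a well-known item is eventually ranked by its predicted rating alone, while a largely unknown item receives a temporary boost; DERL's evidential Q-value realizes a learned, model-based version of this trade-off.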

