PURE: AN UNCERTAINTY-AWARE RECOMMENDATION FRAMEWORK FOR MAXIMIZING EXPECTED POSTERIOR UTILITY OF PLATFORM

Abstract

Commercial recommendation can be regarded as an interactive process between the recommendation platform and its target users. One crucial problem for the platform is how to make full use of its advantages so as to maximize its utility, i.e., the commercial benefits from recommendation. In this paper, we propose a novel recommendation framework that exploits information about user uncertainty over different item dimensions¹ and explicitly accounts for the impact of the display policy on the user, in order to achieve maximal expected posterior utility for the platform. We formulate the problem of deriving the optimal policy as a constrained non-convex optimization problem and propose an ADMM-based solution to derive an approximately optimal policy. Extensive experiments on data collected from a real-world recommendation platform demonstrate the effectiveness of the proposed framework. We also use the framework in experiments designed to reveal how the platform achieves its commercial benefits. The results suggest that the platform should cater to the user's preference on item dimensions that the user prefers, while on item dimensions about which the user is highly uncertain, the platform can achieve more commercial benefits by recommending items with high utilities.

1. INTRODUCTION

Commercial recommendation systems are widely deployed on prevalent content distribution platforms such as YouTube, TikTok, Amazon and Taobao. While interacting with the recommendation platform, users can find content that interests them and avoid information overload with the help of recommendation services. Meanwhile, the platform may gain commercial benefits from user behaviors on the platform, such as clicks and purchases. As the platform may serve millions of users and can determine which content to recommend, it naturally has some advantages over individual users. Therefore, it is crucial for the platform to make full use of these advantages in order to maximize its commercial benefits. One typical advantage of the platform is its information advantage, i.e., it can collect plenty of information about users and items to make better recommendations. Typical state-of-the-art recommendation systems (Covington et al., 2016; Guo et al., 2017; Ren et al., 2019; Zhou et al., 2019) take this information into account, including user profiles, item features and historical interactions between users and recommended items. It is worth noting that information about item features is usually incorporated directly into the recommendation models, without considering that the user may have different levels of uncertainty over different item dimensions (which can be regarded as hidden attributes describing different high-order features of the item). For instance, when buying a new coat on the platform, a user may be sure that the logistics are very fast, as she (he) has bought clothes from the same online store before (i.e., the user has low uncertainty over the logistics). But she (he) may be uncertain about the quality of the coat, since it is of a brand that she (he) does not know much about (i.e., the user has high uncertainty over the quality).
Thus, it is crucial for the platform to determine whether the user uncertainty over different item dimensions can be leveraged to maximize the platform utility and, if so, how. Taking the user uncertainty over different item dimensions into account, we show that more commercial benefits can be gained from the item dimensions with higher uncertainty. Another advantage of the platform is that it determines which items to display to the users and thus may affect the users' behaviors. Many works (Kamenica & Gentzkow, 2011; Immorlica et al., 2019; Abdollahpouri & Mansoury, 2020) have shown that the display signal itself strongly affects users' behaviors, and the affected behaviors in turn result in different benefits for the platform. Regarding recommendation as a game between the platform and the users, the platform can achieve more commercial benefits from the game by adopting a proper display (recommendation) policy. However, although some works explore the impact of recommendation policies, how to explicitly model and exploit the impact of the display policy on users is still not well studied in the recommendation area. In this paper, we propose an uncertainty-aware expected Posterior Utility maximization framework for REcommendation platforms (PURE for short). We take both of the previously mentioned factors into account, i.e., user uncertainty over different item dimensions and the influence of the display policy on the user, and introduce a generic utility function that can be flexibly adjusted for different real-world scenarios. We then formulate the problem of maximizing expected posterior utility for the platform as a constrained non-convex optimization problem, and correspondingly propose a solution based on the Alternating Direction Method of Multipliers (ADMM; Boyd et al., 2011) to derive an approximately optimal policy.
To verify the effectiveness of the proposed framework, extensive experiments are conducted over data collected from a real-world recommendation platform. Furthermore, we also provide practical insights derived from carefully designed experiments and empirically reveal how the platform utilizes its information advantage to achieve more commercial benefits, which may help to better understand and conduct commercial recommendation.

2. RELATED WORK

Existing state-of-the-art recommendation systems (Zhou et al., 2018; Pi et al., 2019; Qu et al., 2016) mainly try to make full use of the information advantage of the platform. These works take into account information including user profiles, item features, contextual information and historical interactions between users and recommended items. Typically, some works (Qu et al., 2016; Zhou et al., 2018; Li et al., 2019) focus on achieving better feature interactions or better user interest modeling, while others (Ren et al., 2019; Pi et al., 2019) pay more attention to utilizing extremely long sequences of interaction information. However, most of them ignore the existence of user uncertainty over different item dimensions, which might be crucial for better commercial recommendation. In the research area that explores the influence of display on the information receiver, Bayesian Persuasion (Kamenica & Gentzkow, 2011) is one of the most crucial works; it theoretically proves that the information sender may benefit from displaying proper information to the receiver. Some works (Immorlica et al., 2019; Mansour et al., 2016) follow this idea and strive to incentivize exploration via information asymmetry in scenarios such as recommendation. In another research direction, which develops Reinforcement Learning (RL) based solutions for recommendation scenarios, a series of works (Dulac-Arnold et al., 2015; Zhao et al., 2018; Chen et al., 2019) model the recommendation process as a Markov Decision Process (MDP) and maximize the long-term reward by utilizing learned sequential patterns, which can also be regarded as taking the display (recommendation) influence into consideration to some extent.

3.1. OPTIMAL POLICY FOR MAXIMIZING PLATFORM'S EXPECTED POSTERIOR UTILITY

From the perspective of the platform, the optimal recommendation policy is the one with maximal expected utility (i.e., maximal expected commercial benefits). As mentioned before, the influence of the display policy on users cannot be ignored, as it strongly affects the commercial benefits of the platform. In this paper, taking the impact of the display policy on users into consideration, we formulate the platform's optimal policy $\pi^u$ for user $u$ over a given item set $I$ as follows.

$$\pi^u = \arg\max_{\pi} \sum_{i \in I} \pi_i \, U^u(i \mid \text{display}; \pi), \quad \text{s.t. } \forall i \in I,\ \pi_i \ge 0 \ \text{ and } \ \sum_{i \in I} \pi_i = 1,$$

where $U^u(i \mid \text{display}; \pi)$ is the posterior utility of recommending item $i$ to user $u$ with consideration of the influence of display policy $\pi$. With this formulation, the remaining problem is how to model



¹ Item dimensions: Typical state-of-the-art recommendation systems encode each item as an embedding. Item dimensions refer to the individual dimensions of the item embedding, each of which can be interpreted as a high-order feature.
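
As a rough illustration of the policy formulation in Section 3.1, the following sketch evaluates the objective, the sum over items of π_i times the posterior utility, for a candidate policy and checks the simplex constraints. The function names and the toy utility model are assumptions made for illustration only; in particular, `toy_posterior_utility` is a hypothetical stand-in and not the posterior utility model of the paper.

```python
from typing import Callable, Sequence


def expected_posterior_utility(
    policy: Sequence[float],
    posterior_utility: Callable[[Sequence[float]], Sequence[float]],
    tol: float = 1e-8,
) -> float:
    """Evaluate sum_i pi_i * U(i | display; pi) for a candidate policy."""
    # Constraints from the formulation: pi_i >= 0 and sum_i pi_i = 1.
    if any(p < -tol for p in policy) or abs(sum(policy) - 1.0) > tol:
        raise ValueError("policy must lie on the probability simplex")
    # The utilities themselves depend on the policy being evaluated.
    utilities = posterior_utility(policy)
    return sum(p * u for p, u in zip(policy, utilities))


def toy_posterior_utility(policy: Sequence[float]) -> list:
    # Hypothetical stand-in (assumption, not the paper's model): each item
    # has a base utility that shrinks the more often the item is displayed.
    base = [1.0, 0.8, 0.5]
    return [b * (1.0 - 0.5 * p) for b, p in zip(base, policy)]


uniform = [1.0 / 3.0] * 3
greedy = [1.0, 0.0, 0.0]
print(expected_posterior_utility(uniform, toy_posterior_utility))
print(expected_posterior_utility(greedy, toy_posterior_utility))
```

Under this toy model the uniform policy scores higher than always displaying the item with the largest base utility. This hints at why the problem is not a plain linear program: because the utility depends on π, the maximizer need not be a vertex of the simplex.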

