TDR-CL: TARGETED DOUBLY ROBUST COLLABORATIVE LEARNING FOR DEBIASED RECOMMENDATIONS

Abstract

Bias is a common problem inherent in recommender systems, as it is entangled with users' preferences and poses a great challenge to unbiased learning. For debiasing tasks, the doubly robust (DR) method and its variants show superior performance due to the double robustness property: DR is unbiased when either the imputed errors or the learned propensities are accurate. However, our theoretical analysis reveals that DR usually has a large variance. Moreover, DR can suffer unexpectedly large bias and poor generalization caused by inaccurate imputed errors and learned propensities, which usually occur in practice. In this paper, we propose a principled approach that effectively reduces the bias and variance simultaneously for existing DR approaches when the error imputation model is misspecified. In addition, we propose a novel semi-parametric collaborative learning approach that decomposes the imputed errors into parametric and nonparametric parts and updates them collaboratively, resulting in more accurate predictions. Both theoretical analysis and experiments demonstrate the superiority of the proposed methods compared with existing debiasing methods.

1. INTRODUCTION

Addressing various tasks in recommender systems (RSs) with causality-based methods has become increasingly popular (Wu et al., 2022b). Causality-based recommendation has shown great potential in both numerical experiments and theoretical analyses across an extensive literature (Chen et al., 2020; Wang et al., 2019). Generally, the basic question faced in RS is "what would the feedback be if we recommended an item to a user", which requires estimating the causal effect of a recommendation on user feedback. To answer this question, many methods have been proposed, such as inverse propensity score (IPS) (Schnabel et al., 2016), self-normalized inverse propensity score (SNIPS) (Swaminathan & Joachims, 2015), error imputation based (EIB) methods (Steck, 2010), and doubly robust (DR) methods (Chen et al., 2021; Wang et al., 2019; 2021; Dai et al., 2022; Ding et al., 2022). Among them, the DR method and its variants show superior performance. We compare and evaluate these methods in terms of three desired properties: being doubly robust (Hernán & Robins, 2020; Wu et al., 2022c), robust to small propensities (Rosenbaum, 2020), and of low variance (Tan, 2007). Failing to meet any of them may lead to sub-optimal performance (Molenberghs et al., 2015; van der Laan & Rose, 2011). Our theoretical analysis shows that DR has much greater variance and is less robust to small propensities than EIB (Kang & Schafer, 2007), even when the imputed errors and the learned propensities are accurate. Meanwhile, DR can suffer unexpectedly large bias and poor generalization caused by inaccurate imputed errors and learned propensities, which usually occur in practice. In this paper, we first propose a novel targeted doubly robust (TDR) method that effectively captures the merits of both DR and EIB by leveraging the targeted learning technique (van der Laan & Rose, 2011; 2018).
TDR can effectively reduce the bias and variance simultaneously for existing DR approaches when the imputed errors are less accurate. Remarkably, TDR provides a model-agnostic framework and can be incorporated into any DR method by updating its error imputation model, resulting in more accurate predictions. To further reduce the bias and variance during the training process, we propose a novel uniform-data-free TDR-based collaborative learning (TDR-CL) approach that decomposes the imputed errors into a parametric imputation model part and a nonparametric error part, where the latter adaptively rectifies the residual bias of the former. By updating the two parts collaboratively, TDR-CL achieves more accurate and robust predictions. Both theoretical analysis and experiments demonstrate the superiority of TDR and TDR-CL compared with existing methods.

2. PRELIMINARIES

Many debiasing tasks in RS can be formulated using the widely adopted potential outcome framework (Neyman, 1990; Rubin, 1974). Denote $\mathcal{U} = \{u\}$, $\mathcal{I} = \{i\}$, and $\mathcal{D} = \mathcal{U} \times \mathcal{I}$ as the sets of users, items, and user-item pairs, respectively. Let $x_{u,i}$, $r_{u,i}$, and $o_{u,i}$ be the feature, feedback, and exposure status of user-item pair $(u,i)$, where $o_{u,i} = 1$ or $0$ indicates whether item $i$ is exposed to user $u$ or not. Define $r_{u,i}(1)$ as the potential outcome if $o_{u,i}$ had been set to 1, which is observed only when $o_{u,i} = 1$. In RS, we are often interested in answering the causal question: "if we recommend products to users, what would the feedback be?". This question can be formulated as learning the quantity $\mathbb{E}(r_{u,i}(1) \mid x_{u,i})$, i.e., it requires predicting $r_{u,i}(1)$ using the feature $x_{u,i}$, where $\mathbb{E}$ denotes the expectation with respect to the target distribution $\mathbb{P}$. Many classical tasks in RS can be defined as estimating this quantity, such as rating prediction (Schnabel et al., 2016) and post-click conversion rate prediction (Guo et al., 2021). More examples can be found in Wu et al. (2022b). Let $f_\theta(x_{u,i})$ be a model used to predict $r_{u,i}(1)$ with parameter $\theta$. Ideally, if all $r_{u,i}(1)$ for $(u,i) \in \mathcal{D}$ were observed, $\theta$ could be trained directly by optimizing the ideal loss $\mathcal{L}_{\text{ideal}} = |\mathcal{D}|^{-1} \sum_{(u,i) \in \mathcal{D}} e_{u,i}$, where $e_{u,i}$ is the prediction error, e.g., the squared loss $e_{u,i} = (r_{u,i}(1) - f_\theta(x_{u,i}))^2$. However, since $r_{u,i}(1)$ is observed only when $o_{u,i} = 1$, the ideal loss is not computable. Restricting the analysis to non-missing data yields biased conclusions, as the observed data may form an unrepresentative sample of the target population. Different debiasing methods are designed to approximate and substitute the ideal loss.
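To make this selection bias concrete, the following minimal sketch (with an entirely hypothetical synthetic setup: normally distributed feedback, a constant predictor, and a sigmoid exposure mechanism) contrasts the ideal loss over all of D with the naive loss computed only on exposed entries. Because exposure here depends on the feedback itself, the naive loss overstates the ideal loss:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items = 200, 100

# Hypothetical synthetic setup: true potential outcomes r_{u,i}(1)
r = rng.normal(3.0, 1.0, size=(n_users, n_items))
f_theta = np.full((n_users, n_items), 2.5)   # a (slightly off) constant predictor

e = (r - f_theta) ** 2                       # squared errors e_{u,i}
L_ideal = e.mean()                           # ideal loss over all of D

# Exposure depends on the feedback itself (missing-not-at-random):
# items with higher ratings are more likely to be observed.
p = 1.0 / (1.0 + np.exp(-(r - 3.0)))         # true propensities p_{u,i}
o = rng.binomial(1, p)                       # exposure indicators o_{u,i}
L_naive = e[o == 1].mean()                   # loss restricted to observed entries

print(f"ideal loss: {L_ideal:.3f}, naive observed-only loss: {L_naive:.3f}")
```

Under this mechanism the observed entries are an unrepresentative sample, so optimizing the naive loss would target a systematically distorted objective.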
For example, the IPS and EIB estimators are given as $\mathcal{L}_{\text{IPS}} = |\mathcal{D}|^{-1} \sum_{(u,i) \in \mathcal{D}} o_{u,i} e_{u,i} / \hat{p}_{u,i}$ and $\mathcal{L}_{\text{EIB}} = |\mathcal{D}|^{-1} \sum_{(u,i) \in \mathcal{D}} [o_{u,i} e_{u,i} + (1 - o_{u,i}) \hat{e}_{u,i}]$, where $\hat{p}_{u,i}$ is an estimate of the propensity score $p_{u,i} := \mathbb{P}(o_{u,i} = 1 \mid x_{u,i})$, and $\hat{e}_{u,i}$ is an estimate of the prediction error $g_{u,i} := \mathbb{E}[e_{u,i} \mid x_{u,i}]$, i.e., it fits $e_{u,i}$ using $x_{u,i}$. The DR estimator is formulated as $\mathcal{L}_{\text{DR}} = |\mathcal{D}|^{-1} \sum_{(u,i) \in \mathcal{D}} \big[ \hat{e}_{u,i} + \frac{o_{u,i} (e_{u,i} - \hat{e}_{u,i})}{\hat{p}_{u,i}} \big]$, which enjoys the doubly robust property, i.e., it is an unbiased estimator of the ideal loss when either the imputed errors or the learned propensities are accurate.
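As a numerical sanity check on these formulas, the sketch below (a hypothetical synthetic setup, not the paper's experimental protocol) evaluates the three estimators against the ideal loss when the propensities are accurate but the imputed errors are deliberately misspecified. IPS and DR stay close to the ideal loss, illustrating the doubly robust property, while EIB is biased:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

# Hypothetical setup: true errors e_{u,i} and propensities p_{u,i}
e = rng.gamma(2.0, 1.0, size=n)              # prediction errors, mean 2
p = np.clip(0.1 + 0.15 * e, 0.05, 0.95)      # exposure depends on the error (MNAR)
o = rng.binomial(1, p)                       # exposure indicators o_{u,i}

L_ideal = e.mean()                           # target: ideal loss
e_hat = np.full(n, e.mean() * 0.5)           # deliberately misspecified imputation

L_ips = np.mean(o * e / p)                   # IPS with accurate propensities
L_eib = np.mean(o * e + (1 - o) * e_hat)     # EIB with inaccurate imputation
L_dr  = np.mean(e_hat + o * (e - e_hat) / p) # DR: unbiased since p is accurate

print(f"ideal={L_ideal:.3f}  IPS={L_ips:.3f}  EIB={L_eib:.3f}  DR={L_dr:.3f}")
```

Swapping in accurate imputations but corrupted propensities would instead leave EIB and DR near the ideal loss and bias IPS, the other half of the doubly robust property.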

3. MOTIVATION

DR approaches have been extensively studied in RS for various debiasing tasks due to their double robustness, e.g., rating prediction (Wang et al., 2019; 2020a; Li et al., 2023b; c), learning-to-rank (LTR) (Saito, 2020; Oosterhuis, 2022), and post-click conversion rate prediction (Guo et al., 2021; Dai et al., 2022). However, DR still has several limitations that need to be resolved. We first show in Proposition 1 that DR has a large variance and is sensitive to small propensities (see Appendix A for proofs).

Proposition 1. If $\hat{p}_{u,i}$ and $\hat{e}_{u,i}$ are accurate estimates of $p_{u,i}$ and $g_{u,i}$, respectively, i.e., $\hat{p}_{u,i} = p_{u,i}$, $\hat{e}_{u,i} = g_{u,i}$, then the IPS, EIB, and DR estimators are unbiased, and their variances satisfy
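The variance gap claimed here can be illustrated with a small Monte Carlo sketch (hypothetical synthetic data; not the proof in Appendix A). It fixes accurate imputations (ê equal to g) and accurate propensities, then estimates the variance of the EIB and DR estimators over repeated draws of the exposure indicators; even with both nuisance models accurate, DR shows markedly larger variance, driven by the inverse-propensity weights on small propensities:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
g = rng.uniform(1.0, 3.0, size=n)            # g_{u,i} = E[e_{u,i} | x_{u,i}]
e = g + rng.normal(0.0, 1.0, size=n)         # true errors around their conditional mean
p = rng.uniform(0.1, 0.5, size=n)            # true propensities, some small

def eib(o):
    # EIB with accurate imputation ê = g
    return np.mean(o * e + (1 - o) * g)

def dr(o):
    # DR with accurate imputation ê = g and accurate propensities
    return np.mean(g + o * (e - g) / p)

# Monte Carlo over the exposure indicators o_{u,i}
reps = [rng.binomial(1, p) for _ in range(2000)]
var_eib = np.var([eib(o) for o in reps])
var_dr = np.var([dr(o) for o in reps])
print(f"Var(EIB) = {var_eib:.2e}, Var(DR) = {var_dr:.2e}")
```

Per entry, EIB's variance contribution scales like p(1 - p)(e - g)^2 while DR's scales like (1 - p)(e - g)^2 / p, so the gap widens exactly where propensities are small, consistent with the sensitivity discussed above.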

