TDR-CL: TARGETED DOUBLY ROBUST COLLABORATIVE LEARNING FOR DEBIASED RECOMMENDATIONS

Abstract

Bias is a common problem inherent in recommender systems, which is entangled with users' preferences and poses a great challenge to unbiased learning. For debiasing tasks, the doubly robust (DR) method and its variants show superior performance due to the double robustness property, that is, DR is unbiased when either the imputed errors or the learned propensities are accurate. However, our theoretical analysis reveals that DR usually has a large variance. Meanwhile, DR can suffer unexpectedly large bias and poor generalization caused by inaccurate imputed errors and learned propensities, which often occur in practice. In this paper, we propose a principled approach that effectively reduces the bias and variance of existing DR approaches simultaneously when the error imputation model is misspecified. In addition, we propose a novel semi-parametric collaborative learning approach that decomposes the imputed errors into parametric and nonparametric parts and updates them collaboratively, resulting in more accurate predictions. Both theoretical analysis and experiments demonstrate the superiority of the proposed methods over existing debiasing methods.

1. INTRODUCTION

Addressing various tasks in recommender systems (RSs) with causality-based methods has become increasingly popular (Wu et al., 2022b). Causality-based recommendation has shown great potential in both numerical experiments and theoretical analyses across an extensive literature (Chen et al., 2020; Wang et al., 2019). Generally, the basic question faced in RS is "what would the feedback be if we recommended an item to a user", which requires estimating the causal effect of a recommendation on user feedback. To answer this question, many methods have been proposed, such as inverse propensity score (IPS) (Schnabel et al., 2016), self-normalized inverse propensity score (SNIPS) (Swaminathan & Joachims, 2015), error imputation based (EIB) methods (Steck, 2010), and doubly robust (DR) methods (Chen et al., 2021; Wang et al., 2019; 2021; Dai et al., 2022; Ding et al., 2022). Among them, the DR method and its variants show superior performance. We compare and evaluate these methods in terms of three desired properties: double robustness (Hernán & Robins, 2020; Wu et al., 2022c), robustness to small propensities (Rosenbaum, 2020), and low variance (Tan, 2007). Failing to meet any of them may lead to sub-optimal performance (Molenberghs et al., 2015; van der Laan & Rose, 2011). Our theoretical analysis shows that DR has much greater variance and is less robust to small propensities than EIB (Kang & Schafer, 2007), even when the imputed errors and the learned propensities are accurate. Meanwhile, DR can suffer unexpectedly large bias and poor generalization caused by inaccurate imputed errors and learned propensities, which often occur in practice. In this paper, we first propose a novel targeted doubly robust (TDR) method that effectively captures the merits of both DR and EIB by leveraging the targeted learning technique (van der Laan & Rose, 2011; 2018).
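As a rough illustration (not the paper's implementation), the EIB, IPS, and DR estimators compared above can be sketched with toy, hypothetical values; all arrays here are made up for demonstration, where `o` marks which user-item pairs are observed, `e` holds prediction errors on observed pairs, `e_hat` holds imputed errors from an imputation model, and `p_hat` holds learned propensities:

```python
import numpy as np

# Hypothetical values over five user-item pairs (for illustration only).
o = np.array([1, 0, 1, 1, 0])                 # observation indicators
e = np.array([0.8, 0.0, 0.5, 0.2, 0.0])       # errors, meaningful only where o == 1
e_hat = np.array([0.7, 0.4, 0.6, 0.3, 0.5])   # imputed errors for every pair
p_hat = np.array([0.5, 0.2, 0.8, 0.4, 0.3])   # learned propensities P(o = 1)

n = len(o)

# EIB: average the imputed errors over all pairs.
eib = e_hat.mean()

# IPS: reweight the observed errors by inverse propensities.
ips = np.sum(o * e / p_hat) / n

# DR: start from the imputed errors and add an inverse-propensity-weighted
# correction on the observed pairs.
dr = np.sum(e_hat + o * (e - e_hat) / p_hat) / n
```

The DR form makes the double robustness property concrete: if the imputed errors `e_hat` equal the true errors, the correction term vanishes in expectation, and if the propensities `p_hat` are correct, the correction exactly removes the imputation bias, so either condition alone suffices for unbiasedness. The same form also exposes the variance issue the paper highlights, since small values of `p_hat` inflate the correction term.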
TDR effectively reduces the bias and variance of existing DR approaches simultaneously when the imputed errors are inaccurate. Remarkably, TDR provides a model-

