UNBIASED DECISIONS REDUCE REGRET: ADVERSARIAL OPTIMISM FOR THE BANK LOAN PROBLEM

Abstract

In many real-world settings, binary classification decisions are made based on limited data in near real-time, e.g. when assessing a loan application. We focus on a class of these problems that share a common feature: the true label is only observed when a data point is assigned a positive label by the principal, e.g. we only find out whether an applicant defaults if we accept their loan application in the first place. In this setting, sometimes referred to in the literature as the Bank Loan Problem (BLP), the labelled training set can accumulate bias, since it is shaped by past decisions. Prior work mitigates the consequences of this bias by injecting optimism into the model, allowing the learner to correct self-reinforcing false rejections. This reduces long-term regret but comes at the cost of a higher false acceptance rate. We introduce adversarial optimism (AdOpt), which directly addresses bias in the training set using adversarial domain adaptation. The goal of AdOpt is to learn an unbiased yet informative representation of past data by reducing the distributional shift between the set of accepted data points and the set of all data points seen thus far. We integrate classification made using this "debiased" representation of the data with the recently proposed pseudo-label optimism (PLOT) method to increase the rate of correct decisions at every timestep. AdOpt significantly exceeds state-of-the-art performance on a set of challenging BLP benchmark problems.

1. INTRODUCTION

In a variety of online decision making problems, principals have to make an acceptance or rejection decision for a given instance based on observing data points in an online fashion. Across a broad range of these, the true label is only revealed for those data points which the principal accepted, creating a biased labelled dataset. In this work we concentrate on addressing this issue for a specific class of binary classification tasks, also known as the "Bank Loan Problem" (BLP), motivated by the characteristic example of a lender deciding on the outcomes of loan applications. The lender's objective is to maximize profit, i.e. accept as many credible applicants as possible while denying those who would ultimately default. The caveat is that the lender does not learn whether rejected applicants would have actually repaid the loan. Hence a decision policy that relies solely on past experience lacks the opportunity to correct erroneous rejection decisions. These "false rejects" are self-reinforcing, since the correct label is never revealed for rejected candidates. The dynamic nature of the data collection mechanism in the BLP offers a very simple and clearly defined example of accumulating bias of the kind we indicated above. As time progresses, the growing pool of accepted applicants (the model's training set), created by the model's own decisions, forms an increasingly biased dataset whose distribution differs from that of the general applicant population. This distributional shift degrades the accuracy of a model trained on the set of accepted points when it is applied to future applicants. A common approach for mitigating the consequences of a biased model is to inject optimism into the decision making strategy.
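The feedback loop described above can be sketched in a few lines. The simulator below is an illustrative toy, not the paper's experimental setup: the ground-truth rule, the Gaussian applicant stream, and the `bank_loan_loop` / `policy` names are all assumptions made for the sake of the example. The key property it captures is that a label is appended to the training set only when the policy accepts.

```python
import numpy as np

rng = np.random.default_rng(0)

def bank_loan_loop(policy, n_steps=200, dim=2):
    """Simulate the BLP feedback loop: the true label y_t is revealed
    only when the policy accepts the query x_t."""
    true_w = np.ones(dim)                 # hypothetical ground-truth rule
    X_acc, y_acc = [], []                 # the (biased) labelled set
    n_accepted = 0
    for _ in range(n_steps):
        x = rng.normal(size=dim)          # new applicant features
        if policy(x, X_acc, y_acc):       # accept -> label is revealed
            y = int(x @ true_w > 0)
            X_acc.append(x)
            y_acc.append(y)
            n_accepted += 1
        # on rejection nothing is learned: the label stays hidden
    return np.array(X_acc), np.array(y_acc), n_accepted

# Greedy baseline: accept whenever a naive fixed score is positive.
X, y, n_acc = bank_loan_loop(lambda x, X, y: x.sum() > 0)
```

A policy that retrains on `(X_acc, y_acc)` at each step inherits exactly the bias discussed above: the labelled set only ever contains points the policy itself chose to accept.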
Of particular note to us is the recently proposed pseudo-label optimism method (PLOT; Pacchiano et al., 2021), which provides a simple and computationally efficient way to introduce optimism and can be used in combination with deep neural networks (DNNs).

[Figure 1: The AdOpt algorithm]

This optimism translates self-fulfilling false rejects into self-correcting false accepts, which strictly reduces long-term regret. However, the cost of optimism is an increased false acceptance rate, particularly early in the learning process. We propose and evaluate a novel approach to the BLP motivated by methods for learning in the presence of distributional shift. Specifically, we utilize adversarial domain adaptation to learn a de-biased representation of the training data that minimizes this distributional difference while preserving the informative features. Although very natural in this setting, this is to our knowledge the first attempt to utilize adversarial domain adaptation to tackle bias in the online context. Our experiments show that the de-biased classifier can achieve increased recall on new queries while maintaining sufficient precision. However, by itself the adversarially de-biased classifier suffers from a fundamental flaw: it must trade off between retaining truly informative features and reducing bias. Both cannot be fully accomplished at the same time, which leads to this method performing poorly in some settings. To overcome these issues we introduce the adversarial optimism method (AdOpt), which combines the de-biased classification approach with PLOT. When presented with a new query at each step, AdOpt uses a de-biased representation of the existing training data and trains a de-biased classifier to assess the probability of the query being a true positive.
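The adversarial de-biasing step can be sketched as follows. This is a minimal numpy sketch under strong simplifying assumptions (a linear encoder, logistic heads, and hand-written gradients in place of the DNNs used in practice); the function name `debias` and hyperparameters `lam`, `lr`, `steps` are illustrative, not the paper's implementation. A domain discriminator tries to tell encoded accepted points apart from encoded points of the full stream, while the encoder is updated to fool it (a gradient-reversal-style update) and to stay predictive of the observed labels.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def debias(X_all, X_acc, y_acc, k=2, lam=0.3, lr=0.05, steps=200):
    """Learn a linear encoder W so that encoded accepted points are hard
    to distinguish from encoded points of the full applicant stream,
    while remaining predictive of the observed labels."""
    dim = X_all.shape[1]
    W = rng.normal(scale=0.1, size=(dim, k))  # shared linear encoder
    v = np.zeros(k)                           # domain discriminator head
    u = np.zeros(k)                           # label classifier head
    Z_src = np.vstack([X_acc, X_all])
    d = np.r_[np.ones(len(X_acc)), np.zeros(len(X_all))]  # domain labels
    for _ in range(steps):
        Z = Z_src @ W                         # encoded points, both domains
        p = sigmoid(Z @ v)                    # discriminator predictions
        Za = X_acc @ W                        # encoded labelled points
        q = sigmoid(Za @ u)                   # label-head predictions
        g_v = Z.T @ (p - d) / len(d)          # discriminator gradient
        g_u = Za.T @ (q - y_acc) / len(y_acc) # label-head gradient
        g_dom = Z_src.T @ np.outer(p - d, v) / len(d)
        g_lab = X_acc.T @ np.outer(q - y_acc, u) / len(y_acc)
        v -= lr * g_v                         # discriminator: minimise
        u -= lr * g_u                         # label head: minimise
        W -= lr * (g_lab - lam * g_dom)       # encoder: reversed domain grad
    return W, u

X_all = rng.normal(size=(100, 2))             # all points seen so far
X_acc = X_all[X_all[:, 0] > 0]                # biased accepted subset
y_acc = (X_acc.sum(axis=1) > 0).astype(float)
W, u = debias(X_all, X_acc, y_acc)
```

The sign flip on `g_dom` in the encoder update is the whole trick: the encoder ascends the domain loss the discriminator is descending, so at equilibrium the representation carries little information about which domain a point came from, while `g_lab` keeps it useful for classification.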
If the de-biased classifier recommends accepting the point, we verify its suggestion using the pseudo-label optimism of PLOT: we add the data point to the original labelled dataset with a positive label and train the optimistic classifier on this mixed dataset. Finally, we use the classification of the optimistic classifier to accept or reject the data point (see Figure 1). Compared with PLOT, AdOpt has the advantage of utilizing the de-biased classifier to identify candidates for the pseudo-label optimism routine. In Pacchiano et al. (2021), pseudo-label optimism was combined with an ϵ-greedy approach to pseudo-label candidate selection in order to mitigate the issue of a high false-positive rate. The major weakness of this approach is that the proportion of candidates selected for exploration stays constant from batch to batch, and that their selection is random rather than driven by the data. Indeed, re-running and carefully analyzing the experiments in the PLOT paper with several dataset sampling strategies shows that it does not consistently beat the state of the art on 2 out of 3 datasets from Pacchiano et al. (2021) (see Figure 2 and Section 5). The upshot is that a better way of choosing pseudo-label candidates is needed to achieve optimal performance. Since the de-biased classifier is able to catch more positives with better accuracy, combining it with PLOT converges to better accuracy on the dataset after seeing fewer candidates and with fewer misclassified queries. We evaluate AdOpt against a number of established approaches from the literature, namely PLOT, "greedy", ϵ-greedy (with a decaying schedule), and NeuralUCB (Zhou et al., 2020). AdOpt outperforms the state-of-the-art methods on 4 out of 5 benchmark datasets. Conducting a large number of experiments across several sampling methods allows us to use t-tests to confirm the statistical significance of our results (Figures 2, 3, 4).
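The two-stage decision just described can be sketched as a single step. This is a toy illustration, not the paper's implementation: plain logistic regression stands in for the DNNs, and the names `adopt_step`, `fit_logreg`, and `debiased_score` are assumptions. The structure follows the text: the de-biased classifier nominates the query, and the query is then accepted only if a classifier retrained with the optimistic pseudo-label (x, 1) still classifies it as positive.

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def fit_logreg(X, y, lr=0.3, steps=300):
    """Plain logistic regression by gradient descent (stand-in for a DNN)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

def adopt_step(x, X_lab, y_lab, debiased_score):
    """One AdOpt decision (sketch): the de-biased classifier nominates
    the query; PLOT-style pseudo-label optimism then verifies it."""
    if debiased_score(x) < 0.5:
        return False                          # de-biased classifier rejects
    # optimistic step: retrain on labelled data plus (x, 1) pseudo-label
    X_opt = np.vstack([X_lab, x])
    y_opt = np.r_[y_lab, 1.0]
    w = fit_logreg(X_opt, y_opt)
    return bool(sigmoid(x @ w) >= 0.5)        # optimistic classifier decides

# Toy usage: labels follow the sign of the first feature.
X_lab = rng.normal(size=(60, 2))
y_lab = (X_lab[:, 0] > 0).astype(float)
score = lambda x: sigmoid(x[0])               # stand-in de-biased classifier
print(adopt_step(np.array([2.0, 0.0]), X_lab, y_lab, score))
```

Note the asymmetry: a rejection by the de-biased classifier is final, while its acceptances are filtered through the optimistic retraining, which is what keeps the false acceptance rate of the combined method below that of the de-biased classifier alone.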
Our ablation study on the effectiveness of PLOT within the AdOpt algorithm demonstrates that the addition of PLOT significantly reduces the wide standard deviation that constitutes the main weakness of a standalone adversarially de-biased classifier (Figure 2). In addition, the Standalone Adversarial

