DEEP POSITIVE UNLABELED LEARNING WITH A SEQUENTIAL BIAS

Abstract

For many domains, from video stream analytics to human activity recognition, only weakly-labeled datasets are available. Worse yet, the given labels are often assigned sequentially, resulting in sequential bias. Current Positive Unlabeled (PU) classifiers, a state-of-the-art family of robust semi-supervised methods, are ineffective under sequential bias. In this work, we propose DeepSPU, the first method to address this sequential bias problem. DeepSPU tackles the two interdependent subproblems of learning both the latent labeling process and the true class likelihoods within one architecture. We achieve this by developing a novel iterative learning strategy aided by theoretically-justified cost terms to avoid collapsing into a naive classifier. Our experimental studies demonstrate that DeepSPU outperforms state-of-the-art methods by over 10% on diverse real-world datasets.

1. INTRODUCTION

Motivation. State-of-the-art approaches for learning from data with only incomplete positive labels require an accurate estimate of the likelihood that any given positive instance receives a label, known as the propensity score. However, all existing approaches overlook the fact that the annotations given for sequential data are often clustered together, and thus the likelihood that a given instance is labeled depends on the labels of the surrounding instances. We refer to this as sequential bias. Overlooking this sequential bias results in an incorrect propensity score and significantly reduced classification performance. Ours is the first work to make this observation, and we propose the first solution to this open problem.

Human Activity Recognition (HAR) is a prime example of sequential bias in data. To collect HAR data, subjects are asked to report their activities while wearing mobile sensors. As study length increases (collection may take many days), participants leave many activities unlabeled. Additionally, wearable sensors record data rapidly, so large blocks of time get labeled consecutively, also creating sequential bias. Many more applications, such as intrusion detection from video or illness prediction from medical records, have similar sequentially-labeled data and are susceptible to sequential bias (Rodríguez-Moreno et al., 2019; Schaekermann et al., 2018). This is a crucial issue, as existing methods show drastically reduced accuracy when sequential bias is not accounted for (as demonstrated in our Experimental Results).

State-of-the-Art. Positive Unlabeled (PU) classifiers are a family of semi-supervised methods that learn from incompletely-labeled data without requiring any labeled negative examples (Bekker & Davis, 2020; Elkan & Noto, 2008; Li & Liu, 2005; Hsieh et al., 2015; Du Plessis et al., 2015; Kiryo et al., 2017; Bekker & Davis, 2018a; Kato et al., 2019).
This is a key strength of PU methods because representative negative examples, typically required by semi-supervised methods, are often not feasible to acquire. For instance, in the HAR example, there are infinitely many activities that an individual is not performing at any given time. Consequently, participants are only expected to provide some positive labels for their activities (Vaizman et al., 2017). Unfortunately, existing PU methods make unrealistically restrictive simplifying assumptions about how the labels were applied. Specifically, they assume either that there is no bias in the labeling process (the probability of a sample being an unlabeled positive instance is uniform) (Elkan & Noto, 2008; Du Plessis et al., 2015; Kiryo et al., 2017) or that it depends only on the local attributes of each instance (Bekker & Davis, 2018a; Kato et al., 2019). This means that existing methods do not model sequential bias, and, as we demonstrate in our experiments, they are significantly negatively impacted when a sequential bias is present.

Problem Description and Technical Challenges. Given a dataset of sequences, our goal is to predict the true class likelihood of each instance in a sequence, given only a subset of labeled positive instances during training. In particular, we focus on the difficult case where labels have been assigned with a sequential bias, defined as the case where the likelihood that a positive instance is labeled varies depending on whether its neighboring instances were labeled. This problem is challenging due to two difficult interdependent subproblems. First, we have the dependency problem: if we had a model of the latent labeling process (which we call the propensity model) that allowed us to identify the true unlabeled positive instances, then we could use this propensity model to train a classifier to produce the true class likelihoods.
However, we need these same true class likelihoods in order to train the propensity model, causing a cyclic dependency. Second, standard maximum likelihood estimation inherently assumes all instances are labeled, leading to a naive classifier in the presence of labeling bias. To capture unlabeled positive instances, a PU classifier must instead predict an appropriate number of positive instances without simply assuming all positive instances are labeled.

Our Approach: DeepSPU. We propose Deep Sequential PU (DeepSPU), the first Positive Unlabeled method to use a propensity score model that predicts the likelihood that any given positive instance is labeled while taking sequential bias into account. The propensity score allows us to train a classifier network given only partially labeled data. We achieve this by developing a novel learning method that overcomes the cyclic dependency problem by iteratively learning the propensity score model and the classifier from weakly-labeled data. Further, we introduce two novel PU cost terms, the Prior-Matching Costs (PMC) and the Observation-Matching Costs (OMC), which prohibit the propensity model and classifier from collapsing into incorrect naive solutions.

Contributions. The main contributions of our work are:
• We identify sequential bias, a labeling pattern characteristic of many real-world labeling processes, and demonstrate how ignoring this bias significantly impacts the performance of state-of-the-art PU classifiers.
• We propose the first learning strategy to minimize the bias incurred from sequentially biased PU data. Namely, we propose an iterative learning strategy and design two novel PU cost terms, Prior-Matching and Observation-Matching, which prohibit collapse into certain incorrect adversarial solutions, as justified through theoretical analysis.
• We develop DeepSPU, the first model to mitigate sequential bias.
DeepSPU uses the aforementioned learning strategy to jointly estimate the two interdependent latent variables: the propensity score and the true class probabilities, without any direct supervision for either learning task.

2. RELATED WORK

There are many approaches to PU learning, such as re-weighting predictions (Zhang & Lee, 2005; Elkan & Noto, 2008), iteratively identifying reliable examples (Ienco & Pensa, 2016), and most notably risk minimization (Northcutt et al., 2017; Du Plessis et al., 2015; Kiryo et al., 2017). However, all these state-of-the-art approaches share the often-unrealistic assumption that no bias exists in the labeling process, sequential or otherwise. When a bias is present, these methods are susceptible to learning skewed decision boundaries and thus are prone to making biased and incorrect classifications (Bekker & Davis, 2020).
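As a concrete instance of the risk-minimization family mentioned above (a standard formulation from the literature, not this paper's method), the non-negative PU risk of Kiryo et al. (2017) estimates the negative-class risk from unlabeled data and clips it at zero to prevent overfitting. A minimal NumPy sketch:

```python
import numpy as np

def sigmoid_loss(scores, y):
    # Sigmoid loss l(z, y) = 1 / (1 + exp(y * z)), a common surrogate loss
    # in PU risk minimization: small when sign(z) agrees with y.
    return 1.0 / (1.0 + np.exp(y * scores))

def nn_pu_risk(scores_pos, scores_unl, prior):
    """Non-negative PU risk estimator (Kiryo et al., 2017).

    scores_pos: classifier scores on labeled-positive instances
    scores_unl: classifier scores on unlabeled instances
    prior:      class prior pi = P(y = +1), assumed known or pre-estimated
    """
    risk_pos = prior * sigmoid_loss(scores_pos, +1).mean()
    # Negative-class risk estimated from unlabeled data, corrected by the
    # positives; clipping at zero keeps the estimator non-negative.
    risk_neg = (sigmoid_loss(scores_unl, -1).mean()
                - prior * sigmoid_loss(scores_pos, -1).mean())
    return risk_pos + max(0.0, risk_neg)
```

Note that this estimator still assumes labeled positives are an unbiased sample of all positives, which is precisely the assumption that sequential bias violates.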

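To see why the uniform-labeling assumption fails under sequential bias, consider a toy simulation (all probabilities here are invented for illustration): a positive instance is far more likely to be labeled when its predecessor was just labeled, so observed labels arrive in consecutive runs.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
y = rng.random(n) < 0.5  # latent true labels: positive with probability 0.5

# Hypothetical propensities (invented for this sketch): a positive instance is
# labeled with probability 0.6 if the previous instance was labeled, but only
# 0.05 otherwise -- so labels cluster into runs (sequential bias).
labeled = np.zeros(n, dtype=bool)
for t in range(n):
    if y[t]:
        p = 0.6 if (t > 0 and labeled[t - 1]) else 0.05
        labeled[t] = rng.random() < p

overall_rate = labeled.mean()                   # unconditional label rate
after_label = labeled[1:][labeled[:-1]].mean()  # P(labeled_t | labeled_{t-1})
print(f"overall: {overall_rate:.3f}  after a label: {after_label:.3f}")
```

Under an unbiased labeling process the two printed rates would match; here the conditional rate is several times higher, which is exactly the neighbor dependence that propensity models conditioned only on instance-wise attributes cannot capture.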
