POINTWISE BINARY CLASSIFICATION WITH PAIRWISE CONFIDENCE COMPARISONS

Abstract

Ordinary (pointwise) binary classification aims to learn a binary classifier from pointwise labeled data. However, such pointwise labels may not be directly accessible due to privacy, confidentiality, or security considerations. In this case, can we still learn an accurate binary classifier? This paper proposes a novel setting, namely pairwise comparison (Pcomp) classification, where we are given only pairs of unlabeled data for which we know that one instance is more likely to be positive than the other, instead of pointwise labeled data. Compared with pointwise labels, pairwise comparisons are easier to collect, and Pcomp classification is useful for subjective classification tasks. To solve this problem, we present a mathematical formulation for the generation process of pairwise comparison data, based on which we exploit an unbiased risk estimator (URE) to train a binary classifier by empirical risk minimization and establish an estimation error bound. We first prove that a URE can be derived and improve it using correction functions. Then, starting from the noisy-label learning perspective, we introduce a progressive URE and improve it by imposing consistency regularization. Finally, experiments validate the effectiveness of our proposed solutions for Pcomp classification.

1. INTRODUCTION

Traditional supervised learning techniques have achieved great advances, but they demand precisely labeled data. In many real-world scenarios, it may be too difficult to collect such data. To alleviate this issue, a large number of weakly supervised learning problems (Zhou, 2018) have been extensively studied, including semi-supervised learning (Zhu & Goldberg, 2009; Niu et al., 2013; Sakai et al., 2018), multi-instance learning (Zhou et al., 2009; Sun et al., 2016; Zhang & Zhou, 2017), noisy-label learning (Han et al., 2018; Xia et al., 2019; Wei et al., 2020), partial-label learning (Zhang et al., 2017; Feng et al., 2020b; Lv et al., 2020), complementary-label learning (Ishida et al., 2017; Yu et al., 2018; Ishida et al., 2019; Feng et al., 2020a), positive-unlabeled classification (Gong et al., 2019), positive-confidence classification (Ishida et al., 2018), similar-unlabeled classification (Bao et al., 2018), unlabeled-unlabeled classification (Lu et al., 2019; 2020), and triplet classification (Cui et al., 2020).

This paper considers another novel weakly supervised learning setting called pairwise comparison (Pcomp) classification, where we aim to perform pointwise binary classification with only pairwise comparison data, instead of pointwise labeled data. A pairwise comparison (x, x′) represents that the instance x has a larger confidence of belonging to the positive class than the instance x′. Such weak supervision (pairwise confidence comparisons) could be much easier to collect in practice than full supervision (pointwise labels), especially for applications on sensitive or private matters. For example, it may be difficult to collect sensitive or private data with pointwise labels, as asking for the true labels could be prohibited or illegal. In this case, it could be easier to collect other weak supervision, such as the comparison information between two examples.
It is also advantageous to consider pairwise confidence comparisons in pointwise binary classification with class overlapping, where the labeling task becomes difficult and even experienced labelers may provide wrong pointwise labels. Let us denote the labeling standard of a labeler by p(y|x) and assume that an instance x_1 is more positive than another instance x_2. Facing this difficult labeling task, different labelers may hold different labeling standards, p(y=+1|x_1) > p(y=+1|x_2) > 1/2, p(y=+1|x_1) > 1/2 > p(y=+1|x_2), and 1/2 > p(y=+1|x_1) > p(y=+1|x_2), thereby providing different pointwise labels: (+1, +1), (+1, -1), and (-1, -1). Thus, different labelers may provide inconsistent pointwise labels, while pairwise confidence comparisons are unanimous and accurate. One may argue that we could aggregate multiple labels of the same instance using crowdsourcing learning methods (Whitehill et al., 2009; Raykar et al., 2010). However, since not every instance is labeled by multiple labelers, crowdsourcing learning methods are not always applicable. Therefore, our proposed Pcomp classification is useful in this case.

Our contributions in this paper can be summarized as follows:
• We propose Pcomp classification, a novel weakly supervised learning setting, and present a mathematical formulation for the generation process of pairwise comparison data.
• We prove that an unbiased risk estimator (URE) can be derived, propose an empirical risk minimization (ERM) based method, and present an improvement using correction functions (Lu et al., 2020) for alleviating overfitting when complex models are used.
• We start from the noisy-label learning perspective to introduce the RankPruning method (Northcutt et al., 2017), which holds a progressive URE for solving our proposed Pcomp classification problem, and improve it by imposing consistency regularization.
• We experimentally demonstrate the effectiveness of our proposed solutions for Pcomp classification.
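The class-overlap argument above can be illustrated with a small simulation. The posteriors p(y=+1|x_1) = 0.7 and p(y=+1|x_2) = 0.55 and the three thresholds are hypothetical values chosen for illustration, modeling each labeler's standard as a different decision threshold on p(y=+1|x):

```python
# Two instances with assumed (hypothetical) class posteriors, x_1 more positive than x_2.
p1, p2 = 0.7, 0.55


def pointwise_labels(threshold):
    """Labels assigned by a labeler whose standard is a threshold on p(y=+1|x)."""
    label = lambda p: +1 if p > threshold else -1
    return label(p1), label(p2)


# Three labelers with different standards give three different label pairs:
print(pointwise_labels(0.5))  # -> (1, 1):   both look positive
print(pointwise_labels(0.6))  # -> (1, -1):  only x_1 looks positive
print(pointwise_labels(0.8))  # -> (-1, -1): both look negative

# The pairwise confidence comparison, in contrast, is identical for every labeler:
assert p1 > p2
```

The pointwise labels disagree across labelers, but the comparison p(y=+1|x_1) > p(y=+1|x_2) is invariant to the choice of threshold, which is exactly the supervision Pcomp classification relies on.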

2. PRELIMINARIES

Binary classification with pairwise comparisons and extra pointwise labels has been studied (Xu et al., 2017; Kane et al., 2017). Our paper focuses on a more challenging problem where only pairwise comparison examples are provided. Unlike previous studies (Xu et al., 2017; Kane et al., 2017) that leverage some pointwise labels to differentiate the labels of pairwise comparisons, our methods are purely based on ERM with only pairwise comparisons. In the following, we briefly introduce some notation and review the related problem formulations of binary classification, positive-unlabeled classification, and unlabeled-unlabeled classification.

Binary Classification. Since our paper focuses on how to train a binary classifier from pairwise comparison data, we first review the problem formulation of binary classification. Let the feature space be X and the label space be Y = {+1, -1}. Suppose the collected dataset is denoted by D = {(x_i, y_i)}_{i=1}^{n}, where each example (x_i, y_i), consisting of an instance x_i ∈ X and a label y_i ∈ Y, is independently sampled from the joint distribution with density p(x, y). The goal of binary classification is to train an optimal classifier f: X → R by minimizing the following expected classification risk:

$$R(f) = \mathbb{E}_{p(x,y)}\big[\ell(f(x), y)\big] = \pi_+ \mathbb{E}_{p_+(x)}\big[\ell(f(x), +1)\big] + \pi_- \mathbb{E}_{p_-(x)}\big[\ell(f(x), -1)\big], \quad (1)$$

where ℓ: R × Y → R_+ denotes a binary loss function, π_+ := p(y = +1) (or π_- := p(y = -1)) denotes the positive (or negative) class prior probability, and p_+(x) := p(x | y = +1) (or p_-(x) := p(x | y = -1)) denotes the class-conditional probability density of the positive (or negative) data. ERM approximates the expectations over p_+(x) and p_-(x) by the empirical averages of positive and negative data, and the empirical risk is minimized with respect to the classifier f.

Positive-Unlabeled (PU) Classification. In some real-world scenarios, it may be difficult to collect negative data, and only positive (P) and unlabeled (U) data are available. PU classification aims to train an effective binary classifier in this weakly supervised setting. Previous studies (du Plessis et al., 2014; 2015; Kiryo et al., 2017) showed that the classification risk R(f) in Eq. (1) can be rewritten only in terms of positive and unlabeled data as

$$R(f) = R_{\mathrm{PU}}(f) = \pi_+ \mathbb{E}_{p_+(x)}\big[\ell(f(x), +1) - \ell(f(x), -1)\big] + \mathbb{E}_{p(x)}\big[\ell(f(x), -1)\big],$$

where p(x) = π_+ p_+(x) + π_- p_-(x) denotes the probability density of unlabeled data. This risk expression immediately allows us to employ ERM in terms of positive and unlabeled data.

Unlabeled-Unlabeled (UU) Classification. Recent studies (Lu et al., 2019; 2020) showed that it is possible to train a binary classifier only from two unlabeled datasets with different class priors.
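As a numerical sanity check on the PU risk rewrite, the following sketch compares the standard risk estimate of Eq. (1), computed from fully labeled data, with the PU estimate computed from positive and unlabeled data only. The 1-D Gaussian class-conditional densities, the squared loss, the known class prior, and the fixed linear scorer are all illustrative assumptions, not part of the original formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

pi_pos = 0.4                 # assumed known class prior pi_+ (as in the PU literature)
n = 200_000
n_pos = int(n * pi_pos)
x_pos = rng.normal(+1.0, 1.0, size=n_pos)      # samples from p_+(x)
x_neg = rng.normal(-1.0, 1.0, size=n - n_pos)  # samples from p_-(x)
x_unl = np.concatenate([x_pos, x_neg])         # unlabeled mixture ~ p(x)


def sq_loss(margin):
    """Squared loss ell(f(x), y) written in terms of the margin y * f(x)."""
    return (1.0 - margin) ** 2


def f(x):
    """A fixed linear scorer, just to evaluate the two risk expressions."""
    return 0.5 * x


# Eq. (1): pi_+ E_{p_+}[ell(f, +1)] + pi_- E_{p_-}[ell(f, -1)]
risk_pn = pi_pos * sq_loss(f(x_pos)).mean() \
        + (1 - pi_pos) * sq_loss(-f(x_neg)).mean()

# PU rewrite: pi_+ E_{p_+}[ell(f, +1) - ell(f, -1)] + E_{p(x)}[ell(f, -1)]
risk_pu = pi_pos * (sq_loss(f(x_pos)) - sq_loss(-f(x_pos))).mean() \
        + sq_loss(-f(x_unl)).mean()

# Since the unlabeled set is the exact pi_+ / pi_- mixture here, the two
# estimates coincide up to floating-point rounding.
print(risk_pn, risk_pu)
```

With an independently drawn unlabeled sample the two quantities would instead agree up to O(1/sqrt(n)) sampling noise, which is the sense in which the PU expression is an unbiased risk estimator.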

