POINTWISE BINARY CLASSIFICATION WITH PAIRWISE CONFIDENCE COMPARISONS

Abstract

Ordinary (pointwise) binary classification aims to learn a binary classifier from pointwise labeled data. However, such pointwise labels may not be directly accessible due to privacy, confidentiality, or security considerations. In this case, can we still learn an accurate binary classifier? This paper proposes a novel setting, namely pairwise comparison (Pcomp) classification, where instead of pointwise labeled data we are given only pairs of unlabeled data for which we know that one is more likely to be positive than the other. Compared with pointwise labels, pairwise comparisons are easier to collect, and Pcomp classification is useful for subjective classification tasks. To solve this problem, we present a mathematical formulation of the generation process of pairwise comparison data, based on which we exploit an unbiased risk estimator (URE) to train a binary classifier by empirical risk minimization and establish an estimation error bound. We first prove that a URE can be derived and improve it using correction functions. Then, we start from the noisy-label learning perspective to introduce a progressive URE and improve it by imposing consistency regularization. Finally, experiments validate the effectiveness of our proposed solutions for Pcomp classification.

1. INTRODUCTION

Traditional supervised learning techniques have achieved great advances, but they demand precisely labeled data. In many real-world scenarios, it may be too difficult to collect such data. To alleviate this issue, a large number of weakly supervised learning problems (Zhou, 2018) have been extensively studied, including semi-supervised learning (Zhu & Goldberg, 2009; Niu et al., 2013; Sakai et al., 2018), multi-instance learning (Zhou et al., 2009; Sun et al., 2016; Zhang & Zhou, 2017), noisy-label learning (Han et al., 2018; Xia et al., 2019; Wei et al., 2020), partial-label learning (Zhang et al., 2017; Feng et al., 2020b; Lv et al., 2020), complementary-label learning (Ishida et al., 2017; Yu et al., 2018; Ishida et al., 2019; Feng et al., 2020a), positive-unlabeled classification (Gong et al., 2019), positive-confidence classification (Ishida et al., 2018), similar-unlabeled classification (Bao et al., 2018), unlabeled-unlabeled classification (Lu et al., 2019; 2020), and triplet classification (Cui et al., 2020).

This paper considers another novel weakly supervised learning setting called pairwise comparison (Pcomp) classification, where we aim to perform pointwise binary classification with only pairwise comparison data, instead of pointwise labeled data. A pairwise comparison (x, x') indicates that the instance x has a larger confidence of belonging to the positive class than the instance x'. Such weak supervision (pairwise confidence comparison) could be much easier to collect in practice than full supervision (pointwise labels), especially for applications involving sensitive or private matters. For example, it may be difficult to collect sensitive or private data with pointwise labels, as asking for the true labels could be prohibited or illegal. In this case, it could be easier to collect other weak supervision, such as comparison information between two examples. It is also advantageous to consider pairwise confidence comparisons in pointwise binary classification with class overlapping, where the labeling task becomes difficult and even experienced labelers may provide wrong pointwise labels. Let us denote the labeling standard of a labeler as p(y|x) and assume that an instance x_1 is more positive than another instance x_2. Facing the difficult labeling task, different labelers may hold different labeling standards, such as p(y = +1|x_1) > p(y = +1|x_2) > 1/2, p(y = +1|x_1) > 1/2 > p(y = +1|x_2), and 1/2 > p(y = +1|x_1) > p(y = +1|x_2), thereby

