REGRESSION FROM UPPER ONE-SIDE LABELED DATA

Abstract

We address a regression problem from weakly labeled data that are correctly labeled only above a regression line, i.e., upper one-side labeled data. The label values of the data are the results of sensing the magnitude of some phenomenon. In this case, the labels often contain missing or incomplete observations whose values are lower than those of correct observations and are also usually lower than the regression line. It follows that data labeled with values lower than the estimates of a regression function (lower-side data) are mixed with data that should originally have been labeled above the regression line (upper-side data). When such missing observations occur in non-negligible amounts, we should thus treat our lower-side data as unlabeled data, that is, a mix of the original upper- and lower-side data. We formulate a regression problem from these upper-side labeled and lower-side unlabeled data. We then derive a learning algorithm that is unbiased and consistent with respect to ordinary regression, which is learned from data labeled correctly on both the upper and lower sides. Our key idea is that we can derive a gradient that requires only upper-side data and unlabeled data as an expression equivalent to the gradient for ordinary regression. We additionally found that a specific class of losses enables us to learn unbiased solutions in practice. In numerical experiments on synthetic and real-world datasets, we demonstrate the advantages of our algorithm.

1. INTRODUCTION

This paper addresses a scenario in which a regression function is learned for label sensor values that are the results of sensing the magnitude of some phenomenon. A lower sensor value means not only a relatively lower magnitude than a higher value but also possibly a missing or incomplete observation of a monitored phenomenon. Label sensor values for missing observations are lower than those for correct observations and are also usually lower than an optimal regression line learned from the correct observations. A naive regression algorithm using such labels produces predictions that are too low and is thus biased and underfitted in comparison with the optimal regression line. In particular, when the data coverage of a label sensor is insufficient, the bias caused by missing observations is critical. One practical example is that, for comfort in healthcare, we mimic and replace an intrusive wrist sensor (label sensor) with non-intrusive bed sensors (explanatory sensors). We learn a regression function that predicts the values of the wrist sensor from the values of the bed sensors. The wrist sensor is wrapped around a wrist. It accurately represents the motion intensity of a person and is used, for example, for sleep-wake discrimination Tryon (2013); Mullaney et al. (1980); Webster et al. (1982); Cole et al. (1992). However, it can sense motion only on the forearm, which causes its data coverage to be insufficient and observations of movements on other body parts to be missed frequently. The bed sensors are installed under a bed; while their accuracy is limited because of their non-intrusiveness, they have much broader data coverage than the wrist sensor. In this case, the wrist sensor values for missing observations are improperly low and also inconsistent with the bed sensor values, as shown in Fig. 1-(1). This leads to severe bias and underfitting.
The specific problem causing the bias stems from the fact that data labeled with values lower than the estimates of the regression function are mixed with data that should originally have been labeled above the regression line. Here, we call data labeled above the regression line upper-side data, depicted as circles in Fig. 1-(2), and data labeled below the regression line lower-side data, depicted as squares in Fig. 1-(2). When there are missing observations, as in our scenario, the original data with missing observations have been moved to the lower side, depicted as triangles in Fig. 1-(3). We thus should treat our lower-side data as unlabeled data, that is, a mix of the original upper- and lower-side data. We overcome the bias by handling this asymmetric label corruption, in which upper-side data are correctly labeled but lower-side data are always unlabeled. There is an established approach against such corrupted weak labels in regression, namely robust regression, which regards weak labels as containing outliers Huber et al. (1964); Narula & Wellington (1982); Draper & Smith (1998); Wilcox (1997). However, since symmetric rather than asymmetric label corruption is assumed there, it is still biased in our problem setting. In the classification problem setting, asymmetric label corruption is addressed with positive-unlabeled (PU) learning, where it is assumed that negative data cannot be obtained but unlabeled data are available as well as positive data Denis (1998); De Comité et al. (1999); Letouzey et al. (2000); Shi et al. (2018); Kato et al. (2019); Sakai & Shimizu (2019); Charoenphakdee & Sugiyama (2019); Li et al. (2019); Zhang et al. (2019); Xu et al. (2019); Zhang et al. (2020); Guo et al. (2020); Chen et al. (2020). The focus there is on classification tasks, and an unbiased risk estimator has been proposed Du Plessis et al. (2014; 2015). There is a gap between the classification problem setting and our regression problem setting, i.e., we have to estimate specific continuous values, not positive/negative classes. We fill this gap with a novel approach for deriving an unbiased solution in our regression setting.
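The PU-style view rests on a mixture decomposition of the data distribution: p(x) = π p(x | upper) + (1 − π) p(x | lower), so any lower-side expectation can be recovered from unlabeled (all) data and upper-side data alone. The following is a minimal numerical sketch of this identity; the 1-D setup, the reference line, and all variable names are illustrative, not the paper's actual construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D setup: x ~ N(0, 1), labels y = 2x + noise, and a fixed
# reference line f(x) = 2x splitting the data into upper and lower sides.
n = 100_000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.5, size=n)
f = 2.0 * x

upper = y > f            # upper-side membership
pi_up = upper.mean()     # empirical upper-side prior

g = x * y                # any statistic g(x, y); an arbitrary example

# Lower-side mean computed directly (requires lower-side labels) ...
e_low_direct = g[~upper].mean()
# ... and recovered using only the unlabeled pool and the upper-side data:
# E_lower[g] = (E_all[g] - pi * E_upper[g]) / (1 - pi).
e_low_oneside = (g.mean() - pi_up * g[upper].mean()) / (1.0 - pi_up)

print(abs(e_low_direct - e_low_oneside))  # agree up to floating-point error
```

On a fixed sample the two estimates coincide exactly (the decomposition is an algebraic identity of sample means), which is what makes PU-style rewrites of one side of a risk possible without lower-side labels.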
In this paper, we formulate a regression problem from upper one-side labeled data, in which the upper-side data are correctly labeled and the lower-side data are regarded as unlabeled. We refer to this as one-side regression. Using these upper-side labeled and lower-side unlabeled data, we derive a learning algorithm that is unbiased and consistent with respect to ordinary regression, which uses data labeled correctly on both the upper and lower sides. This is achieved by deriving a gradient that requires only upper-side data and unlabeled data as an asymptotically equivalent expression of the gradient for ordinary regression. This is a key difference from the derivation of unbiased PU classification, where the loss, rather than the gradient, has been used. We additionally found that a specific class of losses enables us to learn an unbiased solution in practice. For implementing the algorithm, we propose a stochastic optimization method. In numerical experiments using synthetic and real-world datasets, we empirically evaluated the effectiveness of the proposed algorithm. We found that it improves performance over regression algorithms that assume that both upper- and lower-side data are correctly labeled.
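To give intuition for why a specific class of losses makes the gradient rewrite practical, consider a sketch (not the paper's exact algorithm) with the absolute loss and a linear model f(x) = θx. The per-sample gradient d|f − y|/dθ = sign(f − y)·x is constant in y on each side of the line, so the lower-side contribution needs no label values at all, and the ordinary-regression gradient can be expressed from unlabeled inputs and upper-side data only. Here, side membership and the prior are computed from the labels for verification; in the one-side setting they would come from the given upper-side sample and an assumed known prior.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data and a current parameter estimate theta.
n = 50_000
x = rng.normal(size=n)
y = 1.5 * x + rng.normal(scale=0.7, size=n)

theta = 0.8
f = theta * x
upper = y > f
pi_up = upper.mean()

# Gradient of the mean absolute loss using all labels (ordinary regression):
# sign(f - y) * x is -x on the upper side and +x on the lower side.
grad_ordinary = np.mean(np.sign(f - y) * x)

# The same gradient rewritten via the mixture decomposition, touching no
# lower-side labels:  E[sign(f - y) x] = E_all[x] - 2 * pi_up * E_upper[x].
grad_oneside = x.mean() - 2.0 * pi_up * x[upper].mean()

print(abs(grad_ordinary - grad_oneside))  # identical up to float error
```

The key point illustrated: because the absolute loss has a label-independent gradient on the lower side, the unlabeled pool suffices for that term, which is the flavor of loss property the paper exploits.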

2. ONE-SIDE REGRESSION

Our goal is to derive a learning algorithm with upper one-side labeled data in an unbiased and consistent manner to ordinary regression that uses both upper-and lower-side labeled data. We first



Figure 1: One-side regression problem, where, due to missing observations, data are correctly labeled only above the regression line, i.e., on the upper one side. The regression function must be learned in an unbiased and consistent manner with respect to ordinary regression, where data are labeled correctly on both the upper and lower sides.


