REGRESSION FROM UPPER ONE-SIDE LABELED DATA

Abstract

We address a regression problem from weakly labeled data that are correctly labeled only above a regression line, i.e., upper one-side labeled data. The label values are the results of sensing the magnitude of some phenomenon. In this setting, the labels often contain missing or incomplete observations, whose values are lower than those of correct observations and usually also lower than the regression line. It follows that data labeled with values lower than the estimates of a regression function (lower-side data) are mixed with data that should originally be labeled above the regression line (upper-side data). When such missing label observations occur in non-negligible numbers, we should therefore treat our lower-side data as unlabeled data that are a mix of the original upper- and lower-side data. We formulate a regression problem from these upper-side labeled and lower-side unlabeled data. We then derive a learning algorithm that is unbiased and consistent with ordinary regression learned from data labeled correctly on both the upper and lower sides. Our key idea is that we can derive a gradient that requires only upper-side data and unlabeled data as an expression equivalent to the gradient of ordinary regression. We additionally found that a specific class of losses enables us to learn unbiased solutions in practice. In numerical experiments on synthetic and real-world datasets, we demonstrate the advantages of our algorithm.

1. INTRODUCTION

This paper addresses a scenario in which a regression function is learned for label sensor values that are the results of sensing the magnitude of some phenomenon. A lower sensor value indicates not only a relatively lower magnitude than a higher value but possibly also a missing or incomplete observation of the monitored phenomenon. Label sensor values for missing observations are lower than those for correct observations and are also usually lower than the optimal regression line learned from the correct observations. A naive regression algorithm trained with such labels produces predictions that are too low and is thus biased and underfitted in comparison with the optimal regression line. In particular, when the data coverage of a label sensor is insufficient, the bias caused by missing observations is critical. One practical example arises in healthcare: for the comfort of a patient, we mimic and replace an intrusive wrist sensor (label sensor) with non-intrusive bed sensors (explanatory sensors) by learning a regression function that predicts the values of the wrist sensor from the values of the bed sensors. The wrist sensor is wrapped around the wrist; it accurately represents the motion intensity of a person and is used for tasks such as sleep-wake discrimination (Tryon, 2013; Mullaney et al., 1980; Webster et al., 1982; Cole et al., 1992). However, it can sense motion only on the forearm, so its data coverage is insufficient and observations of movements of other body parts are frequently missed. The bed sensors are installed under a bed; while their accuracy is limited because of their non-intrusiveness, their data coverage is much broader than that of the wrist sensor. In this case, the wrist sensor values for missing observations are improperly low and also inconsistent with the bed sensor values, as shown in Fig. 1-(1). This leads to severe bias and underfitting.
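The underfitting described above can be illustrated with a small synthetic sketch (not from the paper; the linear model, noise level, and missing-observation rate below are illustrative assumptions): when a fraction of labels are replaced by improperly low sensor values, ordinary least squares is pulled below the line that would be learned from correct labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# True linear relation between an explanatory value x and the label y.
n = 2000
x = rng.uniform(0.0, 1.0, n)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.2, n)  # symmetric noise around the line

# Missing observations (assumed rate 40%): the label sensor records an
# improperly low value instead of the correct one.
missing = rng.random(n) < 0.4
y_obs = np.where(missing, y - rng.uniform(0.5, 1.5, n), y)

# Least-squares fits on clean vs. contaminated labels.
A = np.column_stack([x, np.ones(n)])
w_clean, b_clean = np.linalg.lstsq(A, y, rcond=None)[0]
w_naive, b_naive = np.linalg.lstsq(A, y_obs, rcond=None)[0]

print(f"clean fit: slope={w_clean:.2f}, intercept={b_clean:.2f}")
print(f"naive fit on contaminated labels: slope={w_naive:.2f}, intercept={b_naive:.2f}")
```

The naive fit sits systematically below the clean one, which is exactly the bias that motivates treating the lower-side data as unlabeled rather than trusting their labels.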
The bias stems from the fact that data labeled with values lower than the estimates of the regression function are mixed with data that should originally be labeled above the regression line. We call data labeled above the regression line upper-side data, depicted as circles in Fig. 1-(2), and data labeled below the regression line lower-side data, depicted as squares in Fig. 1-(2). In our scenario, in which observations can be missing, the original data with missing observations have been moved to the lower side, depicted as triangles in Fig. 1-(3). We

