SUPERVISED CONTRASTIVE REGRESSION WITH SAMPLE RANKING

Abstract

Deep regression models typically learn in an end-to-end fashion without explicitly attempting to learn a regression-aware representation. Consequently, their representations tend to be fragmented and fail to capture the continuous nature of regression tasks. In this paper, we propose Supervised Contrastive Regression (SupCR), a framework that learns a regression-aware representation by contrasting samples against each other based on their target distance. SupCR is orthogonal to existing regression models and can be combined with such models to improve performance. Extensive experiments on five real-world regression datasets that span computer vision, human-computer interaction, and healthcare show that SupCR achieves state-of-the-art performance and consistently improves prior regression baselines on all datasets, tasks, and input modalities. SupCR also improves robustness to data corruptions, resilience to reduced training data, performance on transfer learning, and generalization to unseen targets.

1. INTRODUCTION

Regression problems are ubiquitous and fundamental in the real world. They include estimating age from human appearance (Rothe et al., 2015), predicting health scores from physiological signals (Engemann et al., 2022), and detecting gaze directions from webcam images (Zhang et al., 2017b). Since regression targets are continuous, the most widely used approach to train a regression model is to have the model predict the target value, and use the distance (e.g., L1 or L2 distance) between the prediction and the ground-truth target as the loss function (Zhang et al., 2017a; b; Schrumpf et al., 2021; Engemann et al., 2022). Other works constrain the relationship between predictions and targets by converting the regression task into a classification task and training the model with the cross-entropy loss (Rothe et al., 2015; Niu et al., 2016; Shi et al., 2021).

However, all previous methods focus on imposing constraints on the final predictions in an end-to-end fashion, and do not explicitly consider the representations learned by the model. These representations tend to be fragmented and fail to capture the continuous relationships underlying regression tasks. For example, Figure 1(a) shows the representation learned with the L1 loss on the task of predicting the outdoor temperature from webcam images (Chu et al., 2018), where the images are captured by 44 outdoor webcams at different locations. The representation learned by the L1 model does not exhibit the continuous ground-truth temperatures; rather, it is grouped by webcam in a fragmented manner. Such an unordered and fragmented representation is sub-optimal for the regression task and can even hamper performance, since it contains distracting information (e.g., the capturing webcam).

While there is a rich literature on representation learning, past methods focus on classification problems.
In particular, contrastive learning (Chen et al., 2020a; He et al., 2020; Chen et al., 2020b) and supervised contrastive learning (SupCon) (Khosla et al., 2020) have proven highly effective for representation learning. However, as shown in Figure 1(b), which plots the representation learned by SupCon for the visual temperature prediction task mentioned above, such methods produce sub-optimal representations for regression problems because they ignore the continuous order between the samples in a regression task. Several recent works (Wang et al., 2022; Dufumier et al., 2021a; b; Schneider et al., 2022) also adopt contrastive learning in the context of continuous labels, but they do not target regression tasks.

In this paper, we introduce Supervised Contrastive Regression (SupCR), a new framework for deep regression learning. We first learn a representation that ensures the distances in the embedding space are ordered according to the target values, and then use this representation to predict the targets. To learn such a regression-aware representation, we contrast samples against each other based on their label (target) distances. Our method explicitly leverages the ordered relationships between the samples to optimize the representation for the downstream regression task. As shown in Figure 1(c), SupCR leads to a representation that captures the intrinsic ordered relationships between the samples. Moreover, our framework is orthogonal to existing regression methods: one can use any regression method to map the learned representation to the prediction values. Extensive experiments using five real-world regression datasets that span computer vision, human-computer interaction, and healthcare verify the superior performance of our framework.
The results show that SupCR achieves state-of-the-art performance and consistently improves the performance of previous regression methods by 8.7% on average across all datasets, tasks, and input modalities. Furthermore, our experiments show that SupCR delivers multiple desirable properties: 1) improved robustness to data corruptions; 2) enhanced resilience to reduced training data; 3) higher performance on transfer learning; and 4) better generalization to unseen targets.
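To make the core idea concrete — contrasting samples by their label distances so that similarities in the embedding space become ordered by target distance — the following is a minimal numpy sketch of one plausible loss of this kind: for each anchor and each candidate positive, every sample whose label is at least as far from the anchor's label is treated as a negative. The function name, the temperature value, and the exact formulation are our own illustrative choices, not necessarily the loss used later in the paper.

```python
import numpy as np

def supcr_style_loss(features, labels, temperature=2.0):
    """Illustrative label-distance-based contrastive loss (a sketch).

    For anchor i and candidate positive j, all samples k with
    |y_k - y_i| >= |y_j - y_i| (k != i) act as negatives, which
    encourages embedding similarity to decrease with label distance.
    """
    n = features.shape[0]
    # Cosine similarity between all pairs of L2-normalized embeddings.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T / temperature
    # Pairwise label distances.
    dist = np.abs(labels[:, None] - labels[None, :])
    loss, count = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Negatives: samples at least as far from i in label space as j.
            mask = dist[i] >= dist[i, j]
            mask[i] = False
            denom = np.sum(np.exp(sim[i][mask]))  # j itself is in the mask
            loss += -(sim[i, j] - np.log(denom))
            count += 1
    return loss / count
```

Note that this sketch loops over all anchor-positive pairs, so it costs O(n^2) per batch; a vectorized implementation would be preferred in practice.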

2. RELATED WORK

Regression Learning. Deep learning has achieved great success in addressing regression tasks (Rothe et al., 2015; Zhang et al., 2017a; b; Schrumpf et al., 2021; Engemann et al., 2022). The most straightforward and widely used way to train a regression model is to have the model predict the target value, and use the distance between the prediction and the ground-truth target as the loss function. Common distance metrics used as the loss function in regression tasks are the L1 loss, the MSE loss, and the Huber loss (Huber, 1992). Past work has also proposed variants of these basic methods. One branch of prior work (Rothe et al., 2015; Gao et al., 2017; 2018; Pan et al., 2018) divides the regression range into small bins to convert the problem into a classification problem, and uses a classification loss, e.g., the cross-entropy loss, to train the model. Another branch of prior work (Niu et al., 2016; Fu et al., 2018; Cao et al., 2020; Shi et al., 2021) casts regression as an ordinal classification problem: it designs multiple ordered thresholds and uses one binary classifier per threshold to learn whether the target is larger than that threshold. Past work, however, focuses on forcing the final predictions of the model to be close to the target, and does so in an end-to-end manner. In contrast, this paper focuses on ordering the samples in the embedding space in accordance with the label order: we first learn a representation by contrasting samples against each other, and then learn a predictor that maps the representation to the prediction values. Our method is therefore orthogonal to prior regression methods, as those methods can be applied to training our predictor and learning the final predictions on top of our ordered representation.
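The threshold-based ordinal encoding described above can be sketched as follows. The threshold values, function names, and the 0.5 decision rule are hypothetical illustrative choices, not taken from any specific prior work.

```python
import numpy as np

# Hypothetical ordered thresholds, e.g., for an age-estimation task.
THRESHOLDS = np.array([20.0, 30.0, 40.0, 50.0])

def encode(y, thresholds=THRESHOLDS):
    """One binary label per threshold: is the target larger than t_k?"""
    return (y > thresholds).astype(int)

def decode(probs, thresholds=THRESHOLDS):
    """Predicted rank: how many thresholds the binary classifiers
    (with sigmoid outputs) believe the target exceeds."""
    return int(np.sum(np.asarray(probs) > 0.5))
```

For example, a target of 35 encodes to the binary labels [1, 1, 0, 0], and classifier outputs [0.9, 0.8, 0.3, 0.1] decode to rank 2, i.e., a value between the second and third thresholds.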
Contrastive Learning. Contrastive learning, which aligns positive pairs and repulses negative pairs in the representation space, has demonstrated improved performance for self-supervised learning.



Figure 1: UMAP (McInnes et al., 2018) visualization of the representation learned by L1, SupCon (Khosla et al., 2020), and the proposed SupCR for the task of predicting the temperature from webcam outdoor images (Chu et al., 2018). The representation of each image is visualized as a dot, and the color indicates the ground-truth temperature. SupCR learns a representation that captures the intrinsic ordered relationships between the samples, while L1 and SupCon fail to do so.

