LEARNING LABEL ENCODINGS FOR DEEP REGRESSION

Abstract

Deep regression networks are widely used to predict a continuous value for a given input. Task-specialized approaches for training regression networks have shown significant improvement over generic approaches such as direct regression. More recently, a generic approach based on regression by binary classification using binary-encoded labels has shown significant improvement over direct regression. The space of label encodings for regression is large, and automated approaches for finding a good label encoding for a given application have so far been lacking. This paper introduces Regularized Label Encoding Learning (RLEL), a generic approach to regression that trains an entire network and its label encoding end to end. Underlying RLEL is our observation that the search space of label encodings can be constrained and efficiently explored by using a continuous search space of real-valued label encodings combined with a regularization function designed to encourage encodings with certain properties. These properties balance the probability of classification error in individual bits against error-correction capability. Label encodings found by RLEL result in lower or comparable errors relative to manually designed label encodings. Applying RLEL yields 10.9% and 12.4% improvements in Mean Absolute Error (MAE) over direct regression and multiclass classification, respectively. Our evaluation demonstrates that RLEL can be combined with off-the-shelf feature extractors and is suitable across different architectures, datasets, and tasks. Code is available at https://github.com/ubc-aamodt-group/RLEL_regression.

1. INTRODUCTION

Deep regression is an important problem with applications in several fields, including robotics and autonomous vehicles. Recently, neural radiance field (NeRF) regression networks have shown promising results in novel view synthesis, 3D reconstruction, and scene representation (Liu et al., 2020; Yu et al., 2021). However, the typical generic approach of direct regression, in which the network is trained by minimizing the mean squared or absolute error between targets and predictions, performs poorly compared to task-specialized approaches (Yang et al., 2018; Ruiz et al., 2018; Niu et al., 2016; Fu et al., 2018). Recently, generic approaches based on regression by binary classification have shown significant improvement over direct regression using custom-designed label encodings (Shah et al., 2022). In this approach, a real-valued label is quantized and converted to an M-bit binary code, and these binary-encoded labels are used to train M binary classifiers. In the prediction phase, the classifiers' output code is converted to a real-valued prediction using a decoding function. Binary-encoded labels have also been proposed for ordinal regression (Li & Lin, 2006; Niu et al., 2016) and multiclass classification (Allwein et al., 2001; Cissé et al., 2012). The use of binary-encoded labels for regression has multiple advantages. First, predicting a set of values (e.g., the classifiers' outputs) instead of a single value (direct regression) introduces ensemble diversity, which improves accuracy (Song et al., 2021). Second, encoded labels introduce redundancy in the label representation, which improves error-correction capability and accuracy (Dietterich & Bakiri, 1995). However, finding a suitable label encoding for a given problem is challenging due to the vast design space. Related work on ordinal regression has primarily leveraged unary codes (Li & Lin, 2006; Niu et al., 2016; Fu et al., 2018).
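As a concrete illustration of regression by binary classification, the unary (thermometer) codes used in the ordinal regression work cited above can be sketched as follows. This is a minimal sketch under our own assumptions: the function names, the uniform quantization, and the simple threshold-counting decoder are illustrative choices, not the exact scheme of any cited work.

```python
import numpy as np

def encode_unary(y, y_min, y_max, num_bits):
    """Quantize a real-valued label and convert it to a unary (thermometer) code.

    Bit m is 1 iff the quantized label lies above the m-th threshold, so a
    label near y_max yields all ones and a label near y_min yields all zeros.
    """
    level = int(round((y - y_min) / (y_max - y_min) * num_bits))
    level = max(0, min(num_bits, level))  # clamp to the valid range
    return np.array([1.0 if m < level else 0.0 for m in range(num_bits)])

def decode_unary(bits, y_min, y_max):
    """Decode binary classifier outputs back to a real value.

    Counts how many thresholds the predicted code exceeds and maps that
    count back onto the label range.
    """
    num_bits = len(bits)
    level = float(np.sum(np.asarray(bits) >= 0.5))
    return y_min + level / num_bits * (y_max - y_min)
```

In training, each of the `num_bits` positions becomes the target of one binary classifier; at prediction time the classifiers' outputs are passed through `decode_unary` to recover a real-valued estimate.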
Different approaches to label encoding design, including autoencoders, random search, and simulated annealing, have been proposed for multiclass classification (Cissé et al., 2012; Dietterich & Bakiri, 1995; Song et al., 2021). However, these encodings perform relatively poorly for regression due to differences in task objectives (Section 2). More recently, Shah et al. (2022) analyzed and proposed properties of suitable encodings for regression and empirically demonstrated the effectiveness of manually designed encodings guided by these properties. While establishing the benefits of exploring the space of label encodings for a given task, they did not provide an automated approach to do so. In this work, we propose Regularized Label Encoding Learning (RLEL), an end-to-end approach to training the network and label encoding together. Binary-encoded labels have a discrete search space. This work proposes to relax that assumption: label encoding design can instead be approached as a regularized search through a continuous space of real-valued label encodings, enabling the use of continuous optimization approaches. Such a formulation enables end-to-end learning of the network parameters and the label encoding. We propose two regularization functions to encourage certain properties in the learned label encoding during training. Specifically, while operating on real-valued label encodings, the regularization functions employed by RLEL are designed to encourage properties previously identified as being helpful for binary-valued label encodings (Shah et al., 2022). The first property encourages the distance between learned encoded labels to be proportional to the difference between the corresponding label values, which reduces the regression error. Further, each bit of the label encoding can be considered a binary classifier.
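The first property can be encouraged by a pairwise penalty along roughly the following lines. This is a hypothetical sketch of such a regularizer, not the exact formulation used by RLEL: it penalizes the squared mismatch between the L1 distance of two learned codes and the absolute difference of their target labels.

```python
import numpy as np

def r1_penalty(codes, labels):
    """Illustrative R1-style regularizer (not RLEL's exact formulation).

    codes:  array of shape (n, num_bits), real-valued learned encodings
    labels: sequence of n real-valued target labels

    Returns the mean squared mismatch between pairwise L1 code distances
    and pairwise label distances, so the penalty is zero exactly when
    code distances are equal to the corresponding label distances.
    """
    codes = np.asarray(codes, dtype=float)
    n = len(labels)
    penalty = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            code_dist = np.abs(codes[i] - codes[j]).sum()
            label_dist = abs(labels[i] - labels[j])
            penalty += (code_dist - label_dist) ** 2
    return penalty / (n * (n - 1) / 2)  # average over all pairs
```

During training, a term like this would be added to the classification loss, so that gradient descent shapes the real-valued encodings toward the desired distance structure.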
The second property aims to reduce the complexity of a binary classifier's decision boundary by reducing the number of bit transitions (0 → 1 and 1 → 0 transitions in the classifier's target over the range of labels) in the corresponding bit of the binarized label encoding. Figure 1 demonstrates the effect of the proposed regularizers on the learned label encodings and regression errors. Figure 1a plots the L1 distance between learned encodings for different target labels versus the difference between the corresponding label values. Without the regularizer, the L1 distance between encodings for distant targets is low; in contrast, the proposed regularizer encourages the learned label encoding to follow the first design property. Figure 1b plots the learned label encoding (binarized for clarity). Each row represents the encoding for a target value, and each column represents a classifier's output over the range of target labels. Regularizer R2 reduces the number of bit transitions (i.e., 1 → 0 and 0 → 1 transitions in a column) to enforce the second design property and consequently reduces the regression error. We demonstrate that the regularization approach employed by RLEL encourages the desired properties in the label encodings. We evaluate the proposed approach on 11 benchmarks, covering diverse datasets, network architectures, and regression tasks, such as head pose estimation, facial landmark detection, age estimation, and autonomous driving. Label encodings found by RLEL result in lower or comparable errors relative to manually designed codes and outperform generic encoding design approaches (Gamal et al., 1987; Cissé et al., 2012; Shah et al., 2022). Further, RLEL results in lower error than direct regression and multiclass classification by 10.9% and 12.4%, respectively, and even outperforms several task-specialized approaches. We make the following contributions in this work:



[Figure 1 panels omitted. Panel (a): without regularizer R1, NME = 4.98; with R1, NME = 4.71. Panel (b): without regularizer R2, MAE = 4.13 (average #bit_transitions = 3); with R2, MAE = 3.73 (average #bit_transitions = 2.5).]

Figure 1: (a) and (b) demonstrate the effect of the proposed regularizers on learned label encodings and the regression error (NME/MAE) for the FLD1_s and LFH1 benchmarks (Table 2), respectively. (a) Regularizer R1 encourages the distance between learned encodings to be proportional to the difference between corresponding label values. (b) Regularizer R2 reduces the number of bit transitions per bit, reducing the complexity of the decision boundaries to be learned by the binary classifiers. Here blue and white represent 1 and 0, respectively.
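The bit-transition count that regularizer R2 targets can be sketched concretely. Assuming, as in Figure 1b, a code matrix whose rows are encodings for consecutive quantized target values and whose columns are individual classifiers, the count below measures the quantity R2 reduces; the total-variation surrogate is our own illustrative choice of a differentiable stand-in, not RLEL's exact regularizer.

```python
import numpy as np

def count_bit_transitions(code_matrix):
    """Count 0->1 and 1->0 transitions per bit (column).

    code_matrix: shape (num_labels, num_bits); row i is the encoding of the
    i-th quantized target value, so column m is classifier m's target over
    the sorted label range. XOR of adjacent rows marks each transition.
    """
    binary = (np.asarray(code_matrix) >= 0.5).astype(int)
    return np.abs(np.diff(binary, axis=0)).sum(axis=0)  # one count per classifier

def r2_penalty(code_matrix):
    """Illustrative differentiable surrogate: total variation down the label
    axis of the real-valued codes, which discourages frequent transitions."""
    return np.abs(np.diff(np.asarray(code_matrix, dtype=float), axis=0)).sum()
```

For a unary code matrix each column changes exactly once over the label range, which is the minimum for a column that changes at all; an encoding with many transitions per column forces each binary classifier to learn a more fragmented decision boundary.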

