VARIATIONAL IMBALANCED REGRESSION

Abstract

Existing regression models tend to fall short in both accuracy and uncertainty estimation when the label distribution is imbalanced. In this paper, we propose a probabilistic deep learning model, dubbed variational imbalanced regression (VIR), which not only performs well in imbalanced regression but also naturally produces reasonable uncertainty estimation as a byproduct. Different from typical variational autoencoders assuming I.I.D. representations (a data point's representation is not directly affected by other data points), our VIR borrows data with similar regression labels to compute the latent representation's variational distribution; furthermore, different from deterministic regression models producing point estimates, VIR predicts entire normal-inverse-gamma distributions and modulates the associated conjugate distributions to impose probabilistic reweighting on the imbalanced data, thereby providing better uncertainty estimation. Experiments on several real-world datasets show that our VIR can outperform state-of-the-art imbalanced regression models in terms of both accuracy and uncertainty estimation.

1. INTRODUCTION

Deep regression models are currently the state of the art in making predictions in a continuous label space and have a wide range of successful applications in computer vision (Yin et al., 2021), natural language processing (Jiang et al., 2020), etc. However, these models fail when the label distribution in training data is imbalanced. For example, in visual age estimation (Moschoglou et al., 2017), where a model infers the age of a person given her visual appearance, models are typically trained on imbalanced datasets with overwhelmingly more images of younger adults, leading to poor regression accuracy for images of children or elderly people (Yang et al., 2021). Such unreliability in imbalanced regression settings motivates the need for both improving performance for the minority in the presence of imbalanced data and, more importantly, providing reasonable uncertainty estimation to inform practitioners on how reliable the predictions are (especially for the minority, where accuracy is lower). Existing methods for deep imbalanced regression (DIR) only focus on improving the accuracy of deep regression models by smoothing the label distribution and reweighting data with different labels (Yang et al., 2021). On the other hand, methods that provide uncertainty estimation for deep regression models operate under the balanced-data assumption and therefore do not work well in the imbalanced setting (Amini et al., 2020; Mi et al., 2022; Charpentier et al., 2022). To simultaneously cover these two desiderata, we propose a probabilistic deep imbalanced regression model, dubbed variational imbalanced regression (VIR). Different from typical variational autoencoders assuming I.I.D. representations (a data point's representation is not directly affected by other data points), our VIR assumes Neighboring and Identically Distributed (N.I.D.) representations and borrows data with similar regression labels to compute the latent representation's variational distribution.
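The neighbor-borrowing idea can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's actual inference network: it assumes each data point already has a diagonal-Gaussian representation (mean and variance) and blends it with neighbors' representations using a Gaussian kernel on label distance; the function name, the kernel weighting, and the bandwidth are all illustrative assumptions.

```python
import numpy as np

def mix_neighbor_representations(means, variances, labels, i, bandwidth=1.0):
    """Illustrative N.I.D.-style mixing: blend data point i's Gaussian
    representation with those of points whose labels are nearby.
    Weights come from a Gaussian kernel on label distance (an assumed
    heuristic; the paper's weighting is learned probabilistically)."""
    w = np.exp(-0.5 * ((labels - labels[i]) / bandwidth) ** 2)
    w /= w.sum()                        # normalize kernel weights
    mixed_mean = w @ means              # weighted average of neighbor means
    # variance of the Gaussian mixture: second moment minus squared mean
    mixed_var = w @ (variances + means ** 2) - mixed_mean ** 2
    return mixed_mean, mixed_var
```

For a minority data point, most kernel weight falls on the few points with similar labels, so its representation is effectively a pooled estimate over that sparse neighborhood.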
Specifically, VIR first encodes a data point into a probabilistic representation and then mixes it with neighboring representations (i.e., representations from data with similar regression labels) to produce its final probabilistic representation; VIR is therefore particularly useful for minority data, as it can borrow probabilistic representations from data with similar labels (and naturally weigh them using our probabilistic model) to counteract data sparsity. Furthermore, different from deterministic regression models producing point estimates, VIR predicts entire normal-inverse-gamma distributions and modulates the associated conjugate distributions by importance weights computed from the smoothed label distribution to impose probabilistic reweighting on the imbalanced data. This allows the negative log likelihood to naturally put more focus on the minority data, thereby balancing the accuracy for data with different regression labels. Our VIR framework is compatible with any deep regression model and can be trained end to end. We summarize our contributions below:

1. While previous work has studied imbalanced regression and uncertainty estimation separately, none has considered uncertainty estimation in the imbalanced setting. We identify the problem of probabilistic deep imbalanced regression as well as two desiderata for it: balanced accuracy and uncertainty estimation.

2. We propose VIR to simultaneously cover these two desiderata and achieve state-of-the-art performance compared to existing methods.

3. As a byproduct, we also provide strong baselines for benchmarking high-quality uncertainty estimation and promising prediction performance on imbalanced datasets.
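The reweighted negative log likelihood described above can be sketched as follows. This is a simplification under stated assumptions: `nig_nll` is the standard Student-t marginal likelihood of a normal-inverse-gamma prediction (as in evidential regression), and `label_density` is assumed to map a label to its smoothed empirical density; scaling each point's NLL by inverse density is only a rough stand-in for VIR's conjugate-distribution modulation.

```python
import math

def nig_nll(y, gamma, nu, alpha, beta):
    """Negative log likelihood of y under the Student-t marginal of a
    normal-inverse-gamma prediction (gamma, nu, alpha, beta)."""
    omega = 2.0 * beta * (1.0 + nu)
    return (0.5 * math.log(math.pi / nu)
            - alpha * math.log(omega)
            + (alpha + 0.5) * math.log(nu * (y - gamma) ** 2 + omega)
            + math.lgamma(alpha) - math.lgamma(alpha + 0.5))

def reweighted_nll(ys, nig_params, label_density):
    """Illustrative probabilistic reweighting: scale each point's NLL by the
    inverse of its smoothed label density, so minority labels count more."""
    total = 0.0
    for y, (g, v, a, b) in zip(ys, nig_params):
        w = 1.0 / max(label_density(y), 1e-6)  # importance weight
        total += w * nig_nll(y, g, v, a, b)
    return total / len(ys)
```

Because minority labels have low smoothed density, their inverse-density weights are large, which is what lets the loss "put more focus on the minority data."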

2. RELATED WORK

Variational Autoencoder. Variational autoencoder (VAE) (Kingma & Welling, 2014) is an unsupervised learning model that aims to infer probabilistic representations from data. However, as shown in Figure 1, VAE typically assumes I.I.D. representations, where a data point's representation is not directly affected by other data points. In contrast, our VIR borrows data with similar regression labels to compute the latent representation's variational distribution.

Imbalanced Regression. Imbalanced regression is underexplored in the machine learning community. Most existing methods for imbalanced regression are direct extensions of the SMOTE algorithm (Chawla et al., 2002), a commonly used algorithm for imbalanced classification, where data from the minority classes is over-sampled. These algorithms usually synthesize augmented data for the minority regression labels by either interpolating both inputs and labels (Torgo et al., 2013) or adding Gaussian noise (Branco et al., 2017; 2018). Such algorithms fail to account for distance in the continuous label space and fall short in handling high-dimensional data (e.g., images and text). Recently, DIR (Yang et al., 2021) addressed these issues by applying kernel density estimation to smooth and reweight data on the continuous label distribution, achieving state-of-the-art performance. However, DIR only focuses on improving accuracy, especially for data with minority labels, and therefore does not provide uncertainty estimation, which is crucial to assess the predictions' reliability. Ren et al. (2022) focus on re-balancing the mean squared error (MSE) loss for imbalanced regression, and Gong et al. (2022) introduce ranking similarity for improving deep imbalanced regression. In contrast, our VIR provides a principled probabilistic approach to simultaneously achieve these two desiderata, not only improving upon DIR in terms of performance but also producing reasonable uncertainty estimation as a much-needed byproduct to assess model reliability.
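The kernel-density smoothing used by DIR can be sketched roughly as below. This is only a minimal sketch in the spirit of DIR's label distribution smoothing, not its exact procedure: the `smoothed_label_density` helper, the bin count, and the kernel bandwidth are illustrative assumptions.

```python
import numpy as np

def smoothed_label_density(labels, bins=50, sigma=2.0):
    """Histogram the continuous labels, then convolve with a Gaussian
    kernel (measured in bins) so nearby labels share density mass.
    Returns a function mapping a label value to its smoothed density."""
    hist, edges = np.histogram(labels, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    radius = int(3 * sigma)
    kernel = np.exp(-0.5 * (np.arange(-radius, radius + 1) / sigma) ** 2)
    kernel /= kernel.sum()
    smoothed = np.convolve(hist, kernel, mode="same")

    def density(y):
        idx = np.clip(np.searchsorted(centers, y), 0, bins - 1)
        return smoothed[idx]

    return density
```

Smoothing matters because in a continuous label space an observed label lends evidence to its neighbors: a label seen rarely but adjacent to a dense region should not be treated as rare as one in an empty region.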
There is also related work on imbalanced classification (Deng et al., 2021), which shares our motivation but focuses on classification rather than regression.

Uncertainty Estimation in Regression.

There has been renewed interest in uncertainty estimation in the context of deep regression models (Kendall & Gal, 2017; Kuleshov et al., 2018; Song et al., 2019; Zelikman et al., 2020; Amini et al., 2020; Mi et al., 2022; van Amersfoort et al., 2021; Liu et al., 2020; Gal & Ghahramani, 2016; Stadler et al., 2021; Snoek et al., 2019; Heiss et al., 2022). Most existing methods either directly predict the variance of the output distribution as the estimated uncertainty (Kendall & Gal, 2017; Zhang et al., 2019; Amini et al., 2020) or rely on post-hoc confidence interval calibration (Kuleshov et al., 2018; Song et al., 2019; Zelikman et al., 2020). Meanwhile, Posterior Network-style methods (Charpentier et al., 2020; 2022; Stadler et al., 2021) consider conjugate distributions, pseudo-count interpretations, posterior updates, and variational losses for fast and high-quality uncertainty estimation. Closest to our work is Deep Evidential Regression (DER) (Amini et al., 2020), which attempts to estimate both aleatoric and epistemic uncertainty (Kendall & Gal, 2017; Hüllermeier & Waegeman, 2019) on regression tasks by training
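Under a normal-inverse-gamma output, both kinds of uncertainty have closed forms; a minimal sketch of the decomposition, with parameter names following the evidential regression convention (gamma, nu, alpha, beta):

```python
def nig_uncertainties(gamma, nu, alpha, beta):
    """Closed-form uncertainty decomposition for a normal-inverse-gamma
    prediction, as used in Deep Evidential Regression (Amini et al., 2020):
      prediction   E[mu]      = gamma
      aleatoric    E[sigma^2] = beta / (alpha - 1)
      epistemic    Var[mu]    = beta / (nu * (alpha - 1))
    Requires alpha > 1 for these moments to exist."""
    assert alpha > 1, "aleatoric/epistemic moments need alpha > 1"
    aleatoric = beta / (alpha - 1.0)
    epistemic = beta / (nu * (alpha - 1.0))
    return gamma, aleatoric, epistemic
```

Note that nu acts as virtual evidence about the mean: as nu grows, the epistemic term shrinks while the aleatoric term, which reflects noise inherent in the data, is unaffected.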



Figure 1: Comparison of inference networks between a typical VAE (Kingma & Welling, 2014) and our VIR. In the VAE (left), a data point's latent representation (i.e., z) is affected only by the data point itself, while in VIR (right), neighbors participate in modulating the final representation.

