LABEL DISTRIBUTION LEARNING VIA IMPLICIT DISTRIBUTION REPRESENTATION

Abstract

In contrast to multi-label learning, label distribution learning characterizes the polysemy of examples by a label distribution, which carries richer semantics. In practice, label distribution training data is collected mainly through manual annotation or generated by label enhancement algorithms. Unfortunately, the complexity of the annotation task and the inaccuracy of label enhancement algorithms introduce noise and uncertainty into the label distribution training set. To alleviate this problem, we introduce an implicit distribution into the label distribution learning framework to characterize the uncertainty of each label value. Specifically, we use deep implicit representation learning to construct a label distribution matrix with Gaussian prior constraints, where each row corresponds to the distribution estimate of one label value, and each row is constrained by a prior Gaussian distribution to moderate the noise and uncertainty in the label distribution dataset. Finally, each row of the label distribution matrix is transformed into a standard label distribution form by a self-attention algorithm. In addition, several approaches with regularization characteristics are adopted in the training phase to improve the performance of the model.

1. INTRODUCTION

Label distribution learning (LDL) (Geng (2016)) is a novel learning paradigm that characterizes the polysemy of examples. In LDL, the relevance of each label to an example is given by a numerical value between 0 and 1 (the description degree), and the description degrees of all labels form a distribution that fully characterizes the polysemy of an example. Compared with traditional learning paradigms, LDL is more general and more expressive, providing richer semantic information, and it has been successful in several application domains (Gao et al. (2018); Zhao et al. (2021); Chen et al. (2021a); Si et al. (2022)).

There are two main ways to obtain label distributions for training. One is expert labeling, but labeling is expensive, there is no objective labeling criterion, and the resulting label distribution is highly subjective and ambiguous. The other is to convert a multi-label dataset into a label distribution dataset through a label enhancement algorithm (Xu et al. (2019; 2020); Zheng et al. (2021a); Zhao et al. (2022b)). However, label enhancement lacks a reliable theory to ensure that the label distribution recovered from logical labels converges to the true label distribution: logical labels provide a very loose solution space for the label distribution, making the solution less stable and less accurate. In summary, the label distribution dataset used for training is likely to be inaccurate and uncertain, which significantly limits the performance of LDL algorithms. To characterize and mitigate this uncertainty, we propose a novel LDL method based on an implicit label distribution representation. Our work is inspired by recent work on implicit neural representation in 2D image reconstruction (Sitzmann et al. (2020)).
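To make this inspiration concrete, an implicit neural representation encodes a 2D image as a network that maps a pixel coordinate to its intensity. The sinusoidal activations below follow Sitzmann et al. (2020); the layer widths and batch size are illustrative, not the configuration used in this paper:

```python
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    """Linear layer with sinusoidal activation (Sitzmann et al., 2020)."""
    def __init__(self, in_dim, out_dim, omega=30.0):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.omega = omega

    def forward(self, x):
        return torch.sin(self.omega * self.linear(x))

# Implicit representation of a grayscale image: (x, y) coordinate -> intensity.
inr = nn.Sequential(SineLayer(2, 64), SineLayer(64, 64), nn.Linear(64, 1))

coords = torch.rand(1024, 2) * 2 - 1   # query coordinates in [-1, 1]^2
values = inr(coords)                   # predicted signal at each coordinate
print(values.shape)                    # torch.Size([1024, 1])
```

Training fits the network weights so that the function reproduces the image at its pixel coordinates; the object is then stored entirely in the network parameters.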
The key idea of implicit neural representation is to represent an object as a function that maps a sequence of coordinates to the corresponding signal, where the function is parameterized by a deep neural network. In this paper, we start with a deep network that extracts latent features from the input information. Then, the latent features are looked up against the encoded coordinate matrix to generate a label distribution matrix (the implicit distribution representation). Finally, the label distribution matrix is processed by a self-attention module to yield a standard label distribution. Note that the goal of the proposed implicit distribution representation is to generate a label distribution matrix with Gaussian distribution constraints as a customized representation pattern.

To efficiently generate latent features around the coordinates, we design a deep spiking neural network (Yamazaki et al. (2022)) with an MLP as an executor to extract latent features from the input information. The whole network consists of multiple layers of linear spiking neurons, with shortcut connections between layers. Notably, spiking neural networks have two key properties that differ from artificial neural networks. First, a standard spiking neural network considers the time dimension T (taking an image as an example, the input tensor is X ∈ R^{T×C×W×H}) through a multi-step inference mechanism. Here, we create several pseudo-feature spaces on top of the native feature space via different data augmentation strategies (Ucar et al. (2021)); these pseudo-features and the native features are stacked along the time dimension T to realize the multi-step inference mechanism. Second, the representation capability of a spiking neural network is limited, since its output space is a binarized sequence ({0, 1}). Therefore, we place a standard MLP after the last layer of the spiking neural network to project the features into real-number space.
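A minimal sketch of the two properties above: multi-step inference over T augmentation-derived copies of the input, and a binarized spike output projected into real-number space by an MLP head. The leaky integrate-and-fire dynamics and all hyperparameters are illustrative assumptions, not the paper's exact configuration (training such a network would additionally need surrogate gradients through the non-differentiable spike threshold):

```python
import torch
import torch.nn as nn

class LIFLayer(nn.Module):
    """Linear layer followed by a simple leaky integrate-and-fire neuron.

    The membrane potential leaks with factor `beta`; a spike (0/1) is emitted
    whenever the potential crosses `threshold`, after which it is reset.
    """
    def __init__(self, in_dim, out_dim, beta=0.9, threshold=1.0):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.beta, self.threshold = beta, threshold

    def forward(self, x_seq):                      # x_seq: (T, N, in_dim)
        mem = torch.zeros(x_seq.shape[1], self.linear.out_features)
        spikes = []
        for x_t in x_seq:                          # iterate over the T steps
            mem = self.beta * mem + self.linear(x_t)
            spk = (mem >= self.threshold).float()  # binarized output {0, 1}
            mem = mem - spk * self.threshold       # soft reset after a spike
            spikes.append(spk)
        return torch.stack(spikes)                 # (T, N, out_dim)

T, N, D = 4, 8, 32                                 # steps, batch, feature dim
x = torch.randn(N, D)
# Pseudo-features via noise augmentation, stacked along the time axis T.
x_seq = torch.stack([x] + [x + 0.1 * torch.randn_like(x) for _ in range(T - 1)])

snn = LIFLayer(D, 64)
readout = nn.Linear(64, 64)                        # MLP head into real space
feats = readout(snn(x_seq).mean(0))                # rate-average, then project
print(feats.shape)                                 # torch.Size([8, 64])
```

Averaging the spike trains over T before the MLP head is one simple rate-coding readout; the paper does not pin down this detail, so it is an assumption here.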
Our model saves about 30∼40% of energy consumption compared with ANNs of the same network structure on embedded devices such as the Lynxi HP300, Raspberry Pi, or Apple smartphones (PyTorch 1.12 supports M1 Macs). The extracted latent features provide the material from which the coordinates generate the label distribution matrix. First, the initialized coordinate matrix (of size L × 64, where L denotes the number of labels and 64 the feature dimension of each node) is reconstructed by a GCN. Note that the node features follow a Gaussian distribution, since the deep network needs to reconstruct the data from a fixed distribution. Then the coordinate matrix is replicated N times, where N denotes the number of samples. Next, the coordinate matrix produces a label distribution matrix (of size N × L × 2L) in the latent feature space via a table look-up (see https://pytorch.org/tutorials/intermediate/spatial_transformer_tutorial.html). Each row of the label distribution matrix represents the distribution of one label value, and the rows are constrained by prior Gaussian distributions. Finally, a self-attention mechanism converts the label distribution matrix into the label distribution of each sample.

Deep learning-based approaches, however, are prone to overfitting manually extracted features. To alleviate this problem, we propose several regularization approaches to boost the performance of the model, and we release a new dataset based on an image comprehension task. Our contributions include: (i) for LDL, a novel method that obtains the label distribution of a sample through an implicit distribution representation; (ii) a spiking neural network with an MLP that saves energy on mobile devices, with correlations between labels deeply mined by a graph convolutional network; (iii) to the best of our knowledge, we are the first to tackle tabular LDL using deep learning.
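The coordinate-to-distribution pipeline above can be sketched as follows. The exact look-up operation is not specified in the text, so a concatenate-and-project map stands in for it, and the Gaussian prior constraint is represented only by a simple moment-matching penalty; all names and sizes are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N, L, D = 8, 6, 64                     # samples, labels, coordinate width

coords = torch.randn(L, D)             # stand-in for the GCN-refined L x 64 matrix
latent = torch.randn(N, D)             # stand-in for backbone latent features

# "Look-up": pair every sample feature with every label coordinate and
# project to a 2L-dim row (an assumed stand-in for the paper's table look-up).
proj = nn.Linear(2 * D, 2 * L)
pair = torch.cat([latent.unsqueeze(1).expand(N, L, D),
                  coords.unsqueeze(0).expand(N, L, D)], dim=-1)
ld_matrix = proj(pair)                 # (N, L, 2L) label distribution matrix

# Illustrative moment-matching penalty for the Gaussian prior on the rows.
prior_reg = ld_matrix.mean() ** 2 + (ld_matrix.std() - 1.0) ** 2

# Self-attention over the L rows, then collapse each row to one description
# degree and normalize so that the degrees of all labels sum to 1.
attn = nn.MultiheadAttention(embed_dim=2 * L, num_heads=1, batch_first=True)
rows, _ = attn(ld_matrix, ld_matrix, ld_matrix)
dist = F.softmax(rows.mean(dim=-1), dim=-1)
print(dist.shape)                      # torch.Size([8, 6]); each row sums to 1
```

The softmax at the end enforces the standard label distribution form; the mean-pooling of each 2L-dim row into a single degree is a design guess, since the paper leaves that reduction unspecified.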



Figure 1: Our architecture. This figure shows the architecture of the proposed deep implicit function, which consists of two parts. The first part starts with a latent feature prediction stream (SNN with an MLP) that learns the input information to predict the feature maps. The second part learns a label distribution matrix to regress a label distribution.

2. BACKGROUND AND MOTIVATION

In 2016, LDL (Geng (2016)) was formally proposed as a novel learning paradigm that aims to characterize the polysemy of a sample through description degrees. From the viewpoint of task type, LDL work falls into two categories: 1) addressing the uncertainty of application tasks (Gao et al. (2017); Ren & Geng (2017); Gao et al. (2018); Chen et al. (2021a); Liu et al. (2021); Zhao et al. (2021); Li et al. (2022); Si et al. (2022); Cao et al. (2022); Buisson et al. (2022)); 2) studying the characteristics of label distributions on customized datasets (Geng (2016); Zhao & Zhou (2018); Ren et al. (2019b;a); Wang & Geng (2021); Jia et al. (2021a;b); Zhao et al. (2022a); Tan et al. (2022)). However, existing work overlooks the fact that the second task also requires uncertainty modeling in the label space. In this paper, we conduct uncertainty modeling for the second task to boost the learning ability of the LDL algorithm. From a technical viewpoint, our approach has three

