CONNECTING SPHERE MANIFOLDS HIERARCHICALLY FOR REGULARIZATION

Abstract

This paper considers classification problems with hierarchically organized classes. We force the classifier (hyperplane) of each class to belong to a sphere manifold whose center is the classifier of its super-class. Individual sphere manifolds are then connected according to their hierarchical relations. Our technique replaces the last layer of a neural network with the combination of a spherical fully-connected layer and a hierarchically connected layer. This regularization is shown to improve the performance of widely used deep neural network architectures (ResNet and DenseNet) on publicly available datasets (CIFAR100, CUB200, Stanford dogs, Stanford cars, and Tiny-ImageNet).

1. INTRODUCTION

Applying inductive biases or prior knowledge to inference models is a popular strategy to improve their generalization performance (Battaglia et al., 2018). For example, hierarchical structures arise from similarities or shared characteristics between samples, and thus become a natural criterion for categorizing objects. Known hierarchical structures provided by datasets (e.g., ImageNet (Deng et al., 2009), organized according to the WordNet graph, or CIFAR100 (Krizhevsky, 2009), whose classes are grouped into twenty super-classes) can help the network identify similarities between the given samples. In classification tasks, the final layer of a neural network maps embedding vectors to a discrete target space. However, no mechanism forces similar categories to be distributed close to each other in the embedding. Instead, we may observe classes to be uniformly distributed after training, as this simplifies the separation performed by the last fully-connected layer. This behavior is a consequence of treating the label structure as 'flat,' i.e., of omitting the hierarchical relationships between classes (Bilal et al., 2017). To alleviate this problem, we force similar classes to be closer in the embedding by constraining their hyperplanes to follow a given hierarchy. One way to realize this is to make child nodes depend on their parent nodes and to constrain their distance through a regularization term. However, the norm alone does not convey relevant information about the closeness of classifiers: two classifiers are close if they assign two similar points to the same class, which means that similar classifiers must point in similar directions. Therefore, we focus on the angle between classifiers, which can be enforced through spherical constraints.
Contributions.
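The spherical constraint can be sketched concretely: each child classifier is parameterized by an unconstrained direction vector that is normalized and scaled, so that the resulting classifier lies exactly on a sphere of fixed radius centered at its parent's classifier. The snippet below is a minimal illustration of this parameterization, not the authors' implementation; the function name, radius, and toy two-level hierarchy are hypothetical.

```python
import numpy as np

def on_sphere(center, direction, radius):
    """Map an unconstrained `direction` to a point on the sphere of
    the given `radius` centered at `center`."""
    unit = direction / np.linalg.norm(direction)
    return center + radius * unit

# Toy 2-level hierarchy: one super-class classifier, two child classifiers.
rng = np.random.default_rng(0)
dim = 8
w_super = rng.standard_normal(dim)            # super-class classifier (free parameter)
u_cat, u_dog = rng.standard_normal((2, dim))  # unconstrained child parameters

radius = 0.5
w_cat = on_sphere(w_super, u_cat, radius)     # child classifiers constrained
w_dog = on_sphere(w_super, u_dog, radius)     # to the sphere around w_super

# Both children lie exactly `radius` away from their shared parent, so
# sibling classifiers stay close (in norm and angle) when the radius is small.
dist_cat = np.linalg.norm(w_cat - w_super)    # = radius, up to floating point
dist_dog = np.linalg.norm(w_dog - w_super)    # = radius, up to floating point
```

During training, gradients flow into both the parent classifier and the child directions; the paper additionally investigates optimizing such constrained parameters with Riemannian updates on the sphere rather than re-normalizing after each gradient step.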
In this paper, we propose a simple strategy to incorporate hierarchical information into deep neural network architectures with minimal changes to the training procedure, by modifying only the last layer. Given a hierarchical structure on the labels in the form of a tree, we explicitly force the classifier of each class to lie on a sphere whose center is the classifier of its super-class, recursively until we reach the root (see Figure 2). We introduce the spherical fully-connected layer and the hierarchically connected layer, whose combination implements our technique. Finally, we investigate the impact of Riemannian optimization compared to simple norm normalization. By its nature, the proposed technique is quite versatile, because the modifications only affect the structure of the last fully-connected layer of the neural network. Thus, it can be combined with many other strategies (such as the spherical CNNs of Xie et al. (2017), or other deep neural network architectures).
Related works. Hierarchical structures are well studied, and their properties can be effectively learned using manifold embeddings. The design of the optimal embedding to learn the latent hierarchy

