COMPACT BILINEAR POOLING VIA GENERAL BILINEAR PROJECTION

Abstract

Deep metric learning (DML) aims to learn a deep neural network such that similar samples have small distances while dissimilar samples have large distances. To achieve this goal, current DML algorithms mainly focus on pulling similar samples within each class as close together as possible. However, pulling similar samples together only considers the local distribution of the data samples; it ignores the global distribution of the data set, i.e., the positions of the class centers. The global distribution helps distance metric learning: for example, expanding the distances between centers can increase the discriminative power of the extracted features. However, increasing the distances between centers is a challenging task. In this paper, we design a novel function, named the skewed mean function, which considers only the most significant distances among a set of samples, so that maximizing its value makes those distances larger. We also prove that current energy functions used for uniformity regularization on centers are special cases of our skewed mean function. Finally, we conduct extensive experiments to illustrate the superiority of our method.
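One common way to realize a "skewed" mean that emphasizes extreme values is a softmax-weighted average with a temperature parameter; this is only an illustrative sketch of the idea (the function name, temperature parameter `t`, and toy distances below are assumptions, not the paper's exact definition):

```python
import math

def skewed_mean(distances, t):
    """Softmax-weighted ("skewed") mean of a list of distances.

    For t > 0 the weight concentrates on the largest distances,
    t = 0 recovers the plain arithmetic mean, and t < 0 emphasizes
    the smallest distances. Energy-style objectives that penalize
    close centers correspond to particular weightings of this kind.
    """
    # Subtract the maximum exponent for numerical stability.
    m = max(t * d for d in distances)
    weights = [math.exp(t * d - m) for d in distances]
    total = sum(weights)
    return sum(w * d for w, d in zip(weights, distances)) / total

dists = [0.5, 1.0, 2.0, 4.0]
print(skewed_mean(dists, 0.0))    # arithmetic mean: 1.875
print(skewed_mean(dists, 10.0))   # close to max(dists) = 4.0
print(skewed_mean(dists, -10.0))  # close to min(dists) = 0.5
```

Because the temperature interpolates between the minimum, the arithmetic mean, and the maximum, maximizing such a skewed mean over pairwise center distances pushes the emphasized distances to grow.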

1. INTRODUCTION

Deep metric learning (DML) is a branch of supervised feature extraction algorithms that constrains the learned features so that similar samples have small distances and dissimilar samples have large distances. Because DML can learn a deep neural network that generalizes to unseen classes, i.e., classes in the test set that do not appear in the training set, it is widely used in image classification and clustering, face re-identification, and general supervised and unsupervised contrastive representation learning Chuang et al. (2020). The goal of DML is to optimize a deep neural network to map samples onto the surface of a hypersphere, on which semantically similar samples have small distances and semantically dissimilar samples have large distances.
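The hypersphere objective described above can be sketched as follows: embeddings are L2-normalized onto the unit sphere, and the metric constraint requires intra-class distances to be smaller than inter-class distances. The toy feature vectors and class labels below are illustrative assumptions, not outputs of any trained network:

```python
import math

def normalize(v):
    """Project a feature vector onto the unit hypersphere (L2 norm = 1)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dist(a, b):
    """Euclidean distance between two normalized embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy embeddings: the two "cat" features point in roughly the same
# direction, while the "dog" feature points elsewhere.
cat1 = normalize([1.0, 0.1, 0.0])
cat2 = normalize([0.9, 0.2, 0.1])
dog  = normalize([0.0, 1.0, 0.3])

# The DML constraint: similar pair closer than dissimilar pair.
print(dist(cat1, cat2) < dist(cat1, dog))  # True
```

On the unit sphere all pairwise distances are bounded by 2, which is what makes the placement of class centers a global, mutually constrained problem.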



Figure 1: Illustration of assigning the locations of centers. c1 is pushed away only by its six nearest centers. Because the pushing directions oppose one another, the position of c1 easily gets stuck, and the location assignment fails.

