DETERMINANT REGULARIZATION FOR DEEP METRIC LEARNING

Abstract

Distance Metric Learning (DML) aims to learn a distance metric that better reflects the semantic similarities in the data. Current pair-based and proxy-based DML methods focus on reducing the distance between similar samples while expanding the distance between dissimilar ones. However, we reveal that shrinking the distance between similar samples may distort the feature space, increasing the distance between points within the same class region and therefore harming the generalization of the model. Standard regularization terms (such as the $L_2$-norm on weights) cannot be adopted to solve this issue, as they are based on linear projection. To alleviate this issue, we adopt the structure of a normalizing flow as the deep metric layer and use the determinant of the Jacobian matrix as a regularization term that helps reduce the Lipschitz constant. Finally, we conduct experiments on several pair-based and proxy-based algorithms that demonstrate the benefits of our method.

1. INTRODUCTION

Deep metric learning (DML) is a branch of learning algorithms that parameterizes a deep neural network to capture highly non-linear similarities between images according to a given semantic relationship. Because the learned similarity functions can measure the similarity between samples that do not appear in the training data set, the learning paradigm of DML is widely used in many applications such as image classification and clustering, face re-identification, and general supervised and unsupervised contrastive representation learning Chuang et al. (2020). Commonly, DML aims to optimize a deep neural network so that the projected features span the surface of a hypersphere on which semantically similar samples have small distances and semantically dissimilar samples have large distances. This goal can be formulated as the discriminant criterion (and its many variants that appear in the literature), which we summarize as follows:

$$\max\{d_\theta(x_i, x_j) \mid j \in S_i\} < \delta_1 < \delta_2 < \min\{d_\theta(x_i, x_l) \mid l \in D_i\} \quad (1)$$

where $\theta$ are the parameters of the deep metric model, $\delta_1$ and $\delta_2$ are two tunable hyperparameters, and $S_i$ and $D_i$ are the sets of similar and dissimilar samples of the query $x_i$, respectively. Commonly the log-sum-exp function $q_\lambda(\theta) = \log\big(\sum_{j=1}^{n} e^{\lambda a_j(\theta)}\big)$ Oh Song et al. (2016) is used to define the objective function in DML. Besides the definition of the objective function, many works point out that the performance of DML crucially depends on the hard sample mining (HSM) procedure and therefore focus their research on improving HSM. Unfortunately, the explicit definition of informative samples is still unclear, and the problem seems unsolved. This leads us to the following question: what is the real reason that makes DML models depend so crucially on hard sample mining? In this paper, we try to answer this question by studying the local Lipschitz constant of the learned projection $f_\theta(x)$.
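As a concrete illustration, the discriminant criterion in Eq. (1) and its log-sum-exp smoothing of the max can be sketched as follows (a minimal NumPy sketch with names of our own choosing; this is not the paper's implementation):

```python
import numpy as np

def pairwise_dist(z):
    """Euclidean distance matrix between embeddings z of shape (n, d)."""
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def satisfies_criterion(z, labels, i, delta1, delta2):
    """Check Eq. (1) for query i: the farthest similar sample is closer
    than delta1, and the nearest dissimilar sample is farther than delta2."""
    d = pairwise_dist(z)[i]
    same = (labels == labels[i]) & (np.arange(len(labels)) != i)
    diff = labels != labels[i]
    return d[same].max() < delta1 < delta2 < d[diff].min()

def smooth_max(a, lam=10.0):
    """Log-sum-exp surrogate for max(a): (1/lam) * log(sum_j exp(lam * a_j)).
    It upper-bounds max(a) and converges to it as lam grows."""
    a = np.asarray(a)
    m = a.max()  # subtract the max for numerical stability
    return m + np.log(np.exp(lam * (a - m)).sum()) / lam
```

The smooth surrogate is what makes the otherwise non-differentiable max/min constraints of Eq. (1) amenable to gradient-based training.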



Figure 1: The illustration represents the feature space spanned by $f_\theta(x)$ learned by deep metric learning. $r_i$ is the radius of the $i$-th class of samples in the training dataset and $r^*$ the radius of an unknown class. $\delta_2 - \delta_1$ reflects the distance between the two closest samples in class 1 and class 2.
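To make the determinant-based regularizer from the abstract concrete: for an affine coupling layer, a common normalizing-flow building block, the Jacobian is triangular, so its log-determinant is just the sum of the log-scales and can be penalized directly. The sketch below is an illustration under that assumption, with names of our own choosing, not the paper's implementation:

```python
import numpy as np

def affine_coupling(x, s_fn, t_fn):
    """Affine coupling layer: keep x1, map x2 -> x2 * exp(s(x1)) + t(x1).
    The Jacobian is triangular, so log|det J| = sum of the log-scales s(x1)."""
    x1, x2 = np.split(x, 2, axis=-1)
    log_scale = s_fn(x1)
    y = np.concatenate([x1, x2 * np.exp(log_scale) + t_fn(x1)], axis=-1)
    log_det = log_scale.sum(axis=-1)  # per-sample log|det J|
    return y, log_det

def det_regularizer(log_det):
    """Penalize |log det J|: large magnitudes mean strong local expansion or
    contraction of the feature space, i.e. a large local Lipschitz constant."""
    return np.abs(log_det).mean()
```

Adding `det_regularizer` to a pair-based or proxy-based DML loss discourages the metric layer from locally stretching the feature space, which is the distortion the paper argues harms generalization.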

