TRIPLET SIMILARITY LEARNING ON CONCORDANCE CONSTRAINT

Abstract

Triplet-based loss functions have been the paradigm of choice for robust deep metric learning (DML). However, conventional triplet-based losses require carefully tuning a decision boundary, i.e., the violation margin. When performing online triplet mining on each mini-batch, choosing a good global and constant prior value for the violation margin is challenging and irrational. To circumvent this issue, we propose a novel yet efficient concordance-induced triplet (CIT) loss as an objective function for training DML models. We formulate the similarity of triplet samples as a concordance constraint problem and then directly optimize concordance during DML model learning. Triplet concordance means that the predicted ordering of intra-class and inter-class similarities is correct, a property that is invariant to any monotone transformation of the decision boundary of triplet samples. Hence, our CIT loss is free from the plague of adopting the violation margin as a prior constraint. In addition, because triplet-based losses incur high training complexity, we introduce a partial likelihood term into the CIT loss that imposes additional penalties on hard triplet samples, thus enforcing fast convergence. We experiment extensively on a variety of DML tasks to demonstrate the elegance and simplicity of our CIT loss against its counterparts. In particular, on face recognition, person re-identification, and image retrieval datasets, our method achieves performance comparable to the state of the art without laboriously tuning any hyper-parameters.



Introduction

Deep metric learning (DML) for visual understanding tasks, e.g., face recognition Schroff et al. (2015); Taigman et al. (2014), person re-identification (ReID) Shi et al. (2016); Ustinova & Lempitsky (2016), and image retrieval Fang et al. (2021); Revaud et al. (2019), aims at learning embedding representations of images with class-level labels through a ranking loss function Kaya & Bilge (2019); Sohn (2016); Wang et al. (2017). Two representative families of ranking losses have been developed for DML to minimize inter-class similarity and maximize intra-class similarity: pair-based losses Sun et al. (2014) and triplet-based losses Zhao et al. (2019). Compared to pairwise constraints, triplet-based losses additionally capture relative similarity information and thus yield impressive performances Liang et al. (2021); Zhuang et al. With triplet constraints, images from the same class are projected into neighboring regions of the embedding space, while images with different semantic contexts are mapped apart. However, under such an optimization objective, triplet-based losses suffer from the following two problems when DML models are trained with stochastic gradient descent (SGD) and triplets are sampled within a mini-batch.

• Irrational to set an absolute margin. The triplet constraint relies on a decision boundary, i.e., the violation margin, to partition the embedding space into intra-class and inter-class regions and reinforce optimization Wang et al. (2018a;b). However, the violation margin is sensitive to scale changes, and choosing an identical absolute value for clusters with different scales of intra-class variation is inappropriate Wang et al. (2017). Hence, triplet-based losses must tune this hyper-parameter attentively to impose an appropriate penalty strength Qian et al. (2019); Sun et al. (2020). The behavior of Circle loss Sun et al. (2020) under varying circular decision boundaries supports this claim: on the same task, performance differs significantly across violation margins, and with a fixed violation margin, Circle loss varies from superior to inferior across tasks. To circumvent this issue, Angular loss pushes the negative point away from the center of the positive cluster and drags the positive points closer to each other by constraining the upper bound of the angle at the negative point Wang et al. (2017). In hierarchical triplet loss (HTL) Ge (2018), the violation margin is automatically updated over a constructed hierarchical tree to identify a margin that generates gradients for violated triplets. However, existing remedies still depend on a decision-boundary setting and merely substitute one hyper-parameter for another: Angular loss must specify the angle, and HTL must design a hierarchical class tree.

• Suffering from slow convergence. Triplet-based losses provide a strong supervisory signal for training DML models by mining rich, fine-grained inter-sample relations. However, since the number of tuples (each tuple contains an anchor sample and its positive and negative samples) grows polynomially with the number of training samples, they suffer from prohibitively high training complexity and hence significantly slow convergence Ebrahimpour et al. (2022); Kim et al. (2020). Another potential issue is that a large number of tuples make a limited contribution to the learning algorithm and sometimes even diminish the quality of the learned embedding space Wu et al. (2017). Many works have therefore studied effective triplet sampling strategies within a mini-batch to exploit the hard triplet samples that improve convergence speed or final discriminative performance Hermans et al. (2017); Oh Song et al. (2016); Sohn (2016); Wu et al. (2017). For example, HTL Ge (2018) automatically collects informative training triplets via an adaptively learned hierarchical class structure. However, these hard-triplet mining techniques involve tuning hyper-parameters and risk overfitting when performing online triplet mining within a mini-batch Ebrahimpour et al. (2022); Kim et al. (2020). Given the three tuple types (hard, semi-hard, and easy triplet samples), we must consider how to trade off between them during DML optimization: leveraging hard triplet samples alone may lead to bad local minima Do et al. (2019), while overwhelming easy triplet samples hurt training efficiency Schroff et al. (2015).

Since choosing a global and constant prior value for the decision boundary is irrational, we innovatively formulate triplet similarity learning as a concordance constraint problem without an assumed decision boundary. Inspired by SoftTriple loss Qian et al. (2019), which was introduced to learn embeddings without triplet sampling, we explore laying more emphasis on hard triplet samples by relaxing the concordance constraints, thus accelerating convergence. In each triplet sample, the intra-class similarity is naturally higher than the inter-class one, so the predicted ordering of intra-class and inter-class similarities should agree with the observed ordering. Such an ordering concordance takes effect not only within a mini-batch but also over the whole sample set, and this intrinsic concordance constraint is invariant to any monotone transformation of the decision boundary of triplet samples. Hence, we develop a novel concordance-induced triplet (CIT) loss function to optimize triplet similarity. Existing triplet-based losses explicitly fix a global and constant violation margin as a decision boundary based on a priori knowledge. Unlike them, our CIT loss exploits the concordance constraint of triplet similarity to avoid the plague of tuning the violation margin. It is an elegant, simple, and efficient way to learn the intrinsic similarity between all samples and is insensitive to triplet sampling within a mini-batch. We further introduce a partial likelihood term to enforce different penalty strengths on different tuple types, primarily laying more penalty on hard triplet samples. This term mainly improves convergence speed and has only a slight impact on performance, thus avoiding elaborate tuning. Under thoroughly random mini-batch and triplet sampling, this term regulates the penalty strength in keeping with the degree of discordance or concordance of the triplet similarity: the higher discordance of hard triplet samples brings stronger penalties and thus larger contributions to gradients.

The main contributions of this work are summarized as follows:

• We propose a novel, simple, and elegant concordance-induced triplet (CIT) loss function for deep metric learning (DML). Our CIT loss frees DML training from tuning a decision boundary by directly maximizing the concordance of triplet similarity.

• We introduce a partial likelihood term that imposes loose concordance constraints to focus on the informativeness of hard triplet samples, thus helping speed up convergence.

• Using two popular backbones, we conduct extensive experiments on various DML tasks, including face recognition, person re-identification (ReID), and image retrieval. On all tasks, we demonstrate the effectiveness and elegance of our CIT loss and achieve performance on par with the state of the art.
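For concreteness, the conventional margin-based triplet objective that the discussion above critiques can be sketched as follows. This is a minimal NumPy sketch of the standard formulation, not code from this work; the function name and the fixed `margin` default are illustrative:

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    """Conventional triplet loss with a fixed violation margin.

    Each input is an (N, d) array of embeddings; rows with the same
    index form one (anchor, positive, negative) tuple.
    """
    d_ap = np.sum((anchor - positive) ** 2, axis=1)  # intra-class distance
    d_an = np.sum((anchor - negative) ** 2, axis=1)  # inter-class distance
    # Hinge penalty on triplets violating d_ap + margin < d_an.
    return np.maximum(d_ap - d_an + margin, 0.0).mean()
```

The single absolute `margin` here is exactly the scale-sensitive hyper-parameter the paper argues cannot be fixed globally across clusters with different intra-class variation.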

