ADVERSARIAL DEEP METRIC LEARNING

Abstract

Learning a distance metric between pairs of examples is widely important for various tasks. Deep Metric Learning (DML) utilizes deep neural network architectures to learn semantic feature embeddings in which similar examples are close and dissimilar examples are far apart. While the underlying neural networks produce good accuracy on naturally occurring samples, they are vulnerable to adversarially-perturbed samples that can reduce their accuracy. To create robust versions of DML models, we introduce a robust training approach. A key challenge is that metric losses are not independent: they depend on all samples in a mini-batch. This sensitivity to samples, if not accounted for, can lead to incorrect robust training. To the best of our knowledge, we are the first to systematically analyze this dependence effect and propose a principled approach for robust training of deep metric learning networks that accounts for the nuances of metric losses. Using experiments on three popular datasets in metric learning, we demonstrate that DML models trained using our techniques display robustness against strong iterative attacks while their performance on unperturbed (natural) samples remains largely unaffected.

1. INTRODUCTION

Many machine learning (ML) tasks rely on ranking entities based on the similarities of data points in the same class. Deep Metric Learning (DML) is a useful technique for such tasks, particularly for applications involving test-time inference of classes that are not present during training (e.g., zero-shot learning). Example applications of DML include person re-identification (Hermans et al., 2017), face verification (Schroff et al., 2015; Deng et al., 2019), phishing detection (Abdelnabi et al., 2020), and image retrieval (Wu et al., 2017; Roth et al., 2019). At its core, DML relies on state-of-the-art deep learning techniques that can produce lower-dimensional semantic feature embeddings of high-dimensional inputs. Points in this embedding space cluster similar inputs together while dissimilar inputs are far apart. Unfortunately, the underlying deep learning models are vulnerable to adversarial examples (Szegedy et al., 2014; Biggio et al., 2013): inconspicuous input changes that can cause the model to output attacker-desired values. Thus, DML models themselves are vulnerable to adversarial examples. Given their wide usage in diverse ML tasks, including security-oriented ones, it is important to train robust DML models that withstand attacks. This paper tackles the open problem of training DML models using robust optimization techniques (Ben-Tal et al., 2009; Madry et al., 2018). A key challenge in robust training of DML models concerns the so-called metric losses (Wu et al., 2017; Wang et al., 2019; Chechik et al., 2010; Schroff et al., 2015). Unlike loss functions used in typical deep learning settings, the DML loss for a single data point depends on the other data points in the mini-batch. A sampling process selects points for a mini-batch, and thus the DML losses are sensitive to this process as well.
For example, the widely-used triplet loss requires three input points: an anchor, a positive sample similar to the anchor, and a negative sample dissimilar to the anchor. Computing this loss over all triplets in a batch of size B would require O(B^3) evaluations, making the training process inefficient. Thus, a sampling process ensures that a mini-batch contains enough positive and negative examples for the training to be useful while keeping the batch small enough to be efficient. This dependence of the DML loss on the contents of the mini-batch poses a challenge to adversarial training: (1) it is unclear which points should be adversarially perturbed; and (2) it is unknown whether the perturbations would cause training instability. Training a DML model is sensitive to the sampling process, and selecting negative samples that are too "hard" can lead to training collapse (Wu et al., 2017). We systematically approach the above challenges and contribute a robust training objective formulation for DML models by considering the widely-used triplet loss. Our key insight is that during an inference-time attack, adversaries seek to perturb data points such that intra-class distances are maximized; this behavior therefore needs to be accounted for during training to improve robustness. Recent work has attempted to train robust DML models, but it does not consider the issue of loss dependence and sensitivity to sampling (Abdelnabi et al., 2020), leading to non-robust DML models (Panum et al., 2020).
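The cubic growth that motivates sampling can be counted directly. The sketch below is illustrative only (the label layout is hypothetical, not the paper's sampler): for each anchor, every same-class point is a candidate positive and every other-class point a candidate negative.

```python
import numpy as np

# Count every valid (anchor, positive, negative) triplet in a mini-batch:
# for each anchor, any same-class point is a positive and any other-class
# point is a negative, so the count grows as O(B^3) in the batch size B.
def count_triplets(labels):
    labels = np.asarray(labels)
    total = 0
    for y in labels:
        positives = int(np.sum(labels == y)) - 1  # exclude the anchor itself
        negatives = int(np.sum(labels != y))
        total += positives * negatives
    return total

batch = np.repeat(np.arange(4), 8)  # 4 classes x 8 samples = batch of 32
print(count_triplets(batch))        # 32 anchors * 7 positives * 24 negatives = 5376
```

Even a modest batch of 32 already yields thousands of candidate triplets, which is why mini-batch sampling strategies select only a useful subset.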

Contributions.

• We contribute a principled robust training framework for Deep Metric Learning models by considering the dependence of the triplet loss on the other data points in the mini-batch and the sensitivity to sampling.

• We experiment with three commonly-used datasets for vision-based deep metric learning (CUB200-2011, CARS196, SOP) and show that naturally-trained models do not have any robustness: their accuracy drops to close to zero when subjected to the PGD attacks that we formulate.

• Using our robust formulation, we achieve good robustness. For example, for a PGD attacker with five iterations and ‖δ‖_∞ < 0.01, we obtain an adversarial accuracy of 48.7, compared to the state-of-the-art natural accuracy baseline of 71.8, for the SOP dataset (in terms of R@1 score, a common metric in DML to assess the accuracy of models). Furthermore, the resulting robust models' accuracies on natural (unperturbed) samples remain largely unaffected.

2. RELATED WORK

Deep Metric Learning. Deep Metric Learning (DML) is a popular technique for obtaining semantic feature embeddings with the property that similar inputs are geometrically close to each other in the embedding space while dissimilar inputs are far apart (Roth et al., 2020). DML employs a variety of metric losses, such as contrastive (Hadsell et al., 2006), triplet (Schroff et al., 2015), lifted-structure (Hermans et al., 2017), and angular loss (Wang et al., 2017). Recent surveys (Roth et al., 2020; Musgrave et al., 2020) highlight that the performance of newer metric losses is lower than previously reported. Thus, we focus on two established metric losses, contrastive and triplet loss, as they are widely used and perform well.

Adversarial Robustness. Since early work in the ML community discovered adversarial examples in deep learning models (Szegedy et al., 2014; Biggio et al., 2013), a major focus has been to train adversarially-robust models. We focus on robust optimization-based training that utilizes a saddle-point (min-max) formulation (Ben-Tal et al., 2009; Madry et al., 2018). To the best of our knowledge, no prior work has considered training DML models using robust-optimization techniques. Recent work, however, has used metric losses to improve adversarial training for standard deep network architectures (e.g., CNNs) (Mao et al., 2019; Li et al., 2019). These techniques use metric losses (e.g., triplet) instead of traditional ones (e.g., cross-entropy). By contrast, our goal is to create a robust training objective for DML models themselves. This requires considering the dependence of metric losses on mini-batch items and the sampling process that derives those items. We propose a principled framework for robustly training DML models that considers these factors. Duan et al. (2018) propose a framework that uses generative models (e.g., GANs (Goodfellow et al., 2014)) during training to generate hard negative samples from easy negatives. We observe that this work is concerned with better natural training of DML models rather than adversarial training, which is the focus of our work.

3. TOWARDS ROBUST DEEP METRIC MODELS

First, we describe some basic machine learning (ML) notation and concepts required in this paper. We assume a data distribution D over X × Y, where X is the sample space and Y = {y_1, ..., y_L} is the finite space of labels. Let D_X be the marginal distribution over X induced by D. Given Y ⊆ Y, we define D_Y to be the measure on the subsets of X × Y induced by D. For y ∈ Y, D_y and D_{-y} denote the measures of the sets D_{y} and D_{Y\{y}}, respectively. In the empirical risk minimization (ERM) framework we wish to solve the following optimization problem:

min_{w ∈ H} E_{(x,y)∼D} [ l(w, x, y) ]

In the equation given above, H is the hypothesis space and l is the loss function. We denote vectors in boldface (e.g., x, y). Since the distribution is usually unknown, a learner solves the following problem over a dataset S = {(x_1, y_1), ..., (x_n, y_n)} sampled from the distribution D:

min_{w ∈ H} (1/n) Σ_{i=1}^n l(w, x_i, y_i)

Once we have solved the optimization problem given above, we obtain a w* ∈ H which yields a classifier F : X → Y (the classifier is usually parameterized by w*, but we omit this for brevity).

3.1. DEEP METRIC MODELS

A deep metric model f_θ is a function from X to S^d, where θ ∈ Θ is a parameter and S^d is the unit sphere in R^d (i.e., x ∈ S^d iff ‖x‖_2 = 1). Since deep metric models embed a space X (which can itself be a metric space) into another metric space, we also sometimes refer to them as deep embeddings. Deep metric models use very different loss functions than the typical classification networks described previously. Next we discuss two kinds of loss functions: contrastive and triplet. Let S = {(x_1, y_1), ..., (x_n, y_n)} be a dataset drawn from D. A contrastive loss function l_c is defined over a pair (x, y), (x_1, y_1) of labeled samples from X × Y as:

l_c(θ, (x, y), (x_1, y_1)) = 1_{y = y_1} d_θ(x, x_1) + 1_{y ≠ y_1} [α − d_θ(x, x_1)]_+

In the equation given above, 1_E is an indicator function for event E (1 if E is true and 0 otherwise), and d_θ(x, x_1) = ( Σ_{j=1}^d (f_θ(x)_j − f_θ(x_1)_j)^2 )^{1/2} is the ℓ_2 distance in the embedding space. A triplet loss function l_t is defined over three labeled samples (x, y), (x_1, y_1), and (x_2, y_2) as follows:

l_t(θ, (x, y), (x_1, y_1), (x_2, y_2)) = 1_{y = y_1} 1_{y ≠ y_2} [d_θ(x, x_1) − d_θ(x, x_2) + α]_+

In the equations given above, [x]_+ is max(x, 0).
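Both losses can be written down directly for precomputed, unit-normalized embedding vectors. The sketch below assumes embeddings are already available and uses the margin values reported later in the paper (α = 1.0 for contrastive, α = 0.2 for triplet):

```python
import numpy as np

def l2(u, v):
    # Euclidean distance between two embedding vectors.
    return float(np.linalg.norm(np.asarray(u) - np.asarray(v)))

def contrastive_loss(e, e1, same_class, alpha=1.0):
    # 1_{y=y1} * d(e, e1) + 1_{y!=y1} * [alpha - d(e, e1)]_+
    d = l2(e, e1)
    return d if same_class else max(alpha - d, 0.0)

def triplet_loss(e, e_pos, e_neg, alpha=0.2):
    # [d(e, e_pos) - d(e, e_neg) + alpha]_+
    return max(l2(e, e_pos) - l2(e, e_neg) + alpha, 0.0)

# A well-separated triplet incurs zero loss; a violated one does not.
anchor, pos, neg = [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]
print(triplet_loss(anchor, pos, neg))   # 0.0: positive closer by more than alpha
print(triplet_loss(anchor, neg, pos))   # sqrt(2) + 0.2: triplet violated
```

Note the hinge structure: gradients vanish once a pair or triplet satisfies the margin, which is part of why the choice of samples inside the mini-batch matters so much.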

3.2. ATTACKS ON DEEP METRIC MODELS

Assume that we have learned a deep embedding network with parameter θ ∈ Θ using one of the loss functions described above. Next we describe how the network is used. Let A = {(a_1, c_1), ..., (a_m, c_m)} be a reference or test dataset (e.g., a set of faces along with their labels); A is distinct from the dataset S used during training. Given a sample z, let k(A, z) be the index arg min_{j ∈ {1,...,m}} d_θ(a_j, z). We predict the label of z as lb(A, z) = c_{k(A,z)} (we will use the functions k(·, ·) and lb(·, ·) throughout this section). Next we describe test-time attacks on a deep embedding with parameter θ. Let A_y ⊂ A be the subset of the reference dataset with label y ∈ Y (i.e., A_y = {(a_j, y) | (a_j, y) ∈ A}), and let z ∈ X. An untargeted attack on z can be described as follows (we want the perturbed point to have a different label than before):

min_{δ ∈ X} μ(δ) such that lb(A, z) ≠ lb(A, z + δ)    (5)

A targeted attack (with a target label t ≠ lb(A, z)) can be described as follows (we want the predicted label of the perturbed point to be a specific label):

min_{δ ∈ X} μ(δ) such that lb(A, z + δ) = t    (6)

In the formulations given above, we assume that X is a metric space with metric μ (e.g., X could be R^n with the usual norms, such as ℓ_∞, ℓ_1, or ℓ_p for p ≥ 2). Any algorithm that solves the optimization problems described above yields a specific attack on deep metric models.
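The nearest-neighbor prediction lb(A, z) reduces to a single argmin over the reference embeddings. A minimal numpy sketch (the function and variable names are ours, not the paper's):

```python
import numpy as np

def predict_label(anchor_embeds, anchor_labels, z_embed):
    # lb(A, z): return the label of the reference point whose embedding is
    # closest to the query embedding under l2 distance, i.e. c_{k(A, z)}.
    dists = np.linalg.norm(anchor_embeds - z_embed, axis=1)
    return anchor_labels[int(np.argmin(dists))]

A = np.array([[1.0, 0.0], [0.0, 1.0]])   # reference embeddings (e.g., faces)
labels = ["alice", "bob"]
print(predict_label(A, labels, np.array([0.9, 0.1])))  # "alice"
```

An attack therefore only needs to move the query embedding across the implicit decision boundary between reference points; it never needs the model to expose class scores.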

3.3. ROBUST DEEP METRIC MODELS

Let S = {(x_1, y_1), ..., (x_n, y_n)} be a dataset drawn from distribution D. For a sample (x_i, y_i), 1 ≤ i ≤ n, we define the following surrogate loss function l(θ, (x_i, y_i), S) for the contrastive loss function l_c:

l(θ, (x_i, y_i), S) = (1/n) Σ_{j=1}^n l_c(θ, (x_i, y_i), (x_j, y_j))

Similarly, for the triplet loss function l_t we can define the following surrogate loss function:

l(θ, (x_i, y_i), S) = (1/(n_{y_i} n^-_{y_i})) Σ_{j=1}^{n_{y_i}} Σ_{k=1}^{n^-_{y_i}} l_t(θ, (x_i, y_i), (x_j, y_j), (x_k, y_k))

Let S_y and S_{-y} be defined as the sets {(x, y') ∈ S | y' = y} and {(x, y') ∈ S | y' ≠ y}, respectively; their sizes are denoted by n_y and n^-_y. Having defined the surrogate loss function l, the learner's problem can be defined as:

min_{θ ∈ Θ} (1/n) Σ_{i=1}^n l(θ, (x_i, y_i), S)

Recall that the learner's problem for the usual classification case is:

min_{w ∈ H} (1/n) Σ_{i=1}^n l(w, x_i, y_i)    (10)

Note that in the classification case the loss l(w, x_i, y_i) of a sample (x_i, y_i) does not depend on the other samples in the dataset S. However, in the deep metric model case the surrogate loss l(θ, (x_i, y_i), S) for a sample (x_i, y_i) depends on the rest of the dataset S. This is the main difference between the embedding and classification scenarios.

Formulation 1. Let B_p(x, ε) denote the ε-ball around the sample x in the ℓ_p-norm. The straightforward robust formulation is given below:

min_{θ ∈ Θ} max_{(z_1,...,z_n) ∈ Π_{j=1}^n B_p(x_j, ε)} (1/n) Σ_{i=1}^n l(θ, (z_i, y_i), S)

In the formulation given above, all samples are adversarially perturbed at the same time (note that the max is outside the summation). Therefore, this formulation is not convenient for current training algorithms, such as SGD and Adam, because the entire dataset S has to be perturbed at once. Moreover, this formulation is not conducive to the various sampling strategies used in training deep metric models.
Formulation 2. In this formulation we push the max inside the summation of the optimization problem:

min_{θ ∈ Θ} (1/n) Σ_{i=1}^n max_{z ∈ B_p(x_i, ε)} l(θ, (z, y_i), S)

In this formulation, the sample x_i is adversarially perturbed while the other samples in the dataset S are kept intact when computing the surrogate loss l. This formulation is more conducive to current training techniques, such as SGD and Adam. Moreover, it is also conducive to the various strategies for sampling pairs and triplets used in deep metric models.

Formulation 3. Our third formulation adds a regularizer which enforces the following informal constraint: if x changes a little, the distance in the embedding space does not change too much:

min_{θ ∈ Θ} (1/n) Σ_{i=1}^n [ l(θ, (x_i, y_i), S) + λ max_{z ∈ B_p(x_i, ε)} d_θ(z, x_i) ]

These robust optimization formulations follow the common notion of robustness from robust optimization (Ben-Tal et al., 2009); thus, an algorithm for solving any one of them leads to a robust model.
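Formulation 2's structure, one inner maximization per sample with the rest of the mini-batch held fixed, maps directly onto an ordinary mini-batch training loop. The skeleton below is purely illustrative: `inner_max`, `surrogate_loss`, and `step` are hypothetical stand-ins for a PGD solver, the surrogate loss above, and an optimizer update.

```python
# Skeleton of Formulation 2: for each sample, only x_i is replaced by an
# adversarial point z_i from B_p(x_i, eps) while the rest of the batch
# stays intact, so each mini-batch yields one ordinary optimizer update.
def robust_training_pass(batches, inner_max, surrogate_loss, step):
    for batch in batches:
        losses = []
        for (x_i, y_i) in batch:
            z_i = inner_max(x_i, y_i, batch)      # approx. max over B_p(x_i, eps)
            losses.append(surrogate_loss(z_i, y_i, batch))
        step(sum(losses) / len(losses))           # SGD/Adam-style update

# Toy instantiation: identity-plus-offset "attack", loss = perturbed value.
updates = []
robust_training_pass(
    batches=[[(1.0, 0), (3.0, 1)]],
    inner_max=lambda x, y, b: x + 0.1,            # stand-in perturbation
    surrogate_loss=lambda z, y, b: z,
    step=updates.append,
)
print(updates)  # [2.1]
```

The point of the sketch is the control flow, not the toy callables: the outer min is the optimizer loop, and the per-sample inner max is whatever attack solver the practitioner plugs in.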

3.4. ATTACK ALGORITHM

Performing test-time attacks requires an algorithm to solve the optimization objective described in Equation 5. To evaluate the adversarial robustness of DML models, we propose the attack algorithm shown in Algorithm 1. The intuition behind the algorithm is that an adversary seeks to push a data point under attack, x_i, further away from a close data point x_j of the same label in the embedding space, and thus closer to the embedding of a data point of a different label. Recall that the embedding space is normalized to the unit sphere. To select a close data point, the algorithm first examines the label of the nearest neighboring data point in the embedding space, k(A^(i), x_i), for the respective set of anchors A^(i). If the retrieved label differs from the label y_i of the data point undergoing attack, no perturbation is performed, as the data point is already misclassified under nearest-neighbor inference, which is commonly used within DML. If they share a class, the data point x_i is adversarially perturbed by approximating a solution to arg max_{z ∈ B_p(x_i, ε)} d_θ(z, x_j) using established attack methods, such as the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2015), Carlini-Wagner (C&W) (Carlini & Wagner, 2017), and Projected Gradient Descent (PGD) (Madry et al., 2018). The algorithm, and the mentioned attack methods, are applicable to the norms commonly used in robust optimization.

Algorithm 1: Evaluation of adversarial robustness for a model parameterized by θ and a dataset S = {(x_1, y_1), ..., (x_m, y_m)} under some ε for the respective ℓ_p-norm.

for i = 1 ... m do
    A^(i) ← S \ {(x_i, y_i)}                         // Define the set of anchors used for inference
    j ← k(A^(i), x_i)                                // Find the index of the nearest neighbor data point (using d_θ(·, ·))
    if y_i = y_j then                                // Nearest neighbor has the same class as x_i, thereby perform adversarial perturbation
        x_i ← arg max_{z ∈ B_p(x_i, ε)} d_θ(z, x_j)  // Approximate using established attack methods
    else
        x_i ← x_i                                    // No perturbation for incorrect predictions
    end if
    eval(θ, x_i, y_i, A^(i))                         // Optional step: Perform evaluations
end for

Following the adversarial perturbation, eval(θ, x_i, y_i, A^(i)) is an explicit evaluation step that we use throughout the experiments; it is optional for the actual attack. For more details on the evaluations performed throughout the experiments, see Section 4.
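A runnable rendering of Algorithm 1, with the embedding and the inner perturbation step left as pluggable callables (the names `embed`, `perturb`, and `evaluate` are ours, not the paper's):

```python
import numpy as np

def evaluate_robustness(embed, X, y, perturb, evaluate):
    # Sketch of Algorithm 1. `embed` maps an input to its embedding,
    # `perturb(x_i, x_j)` approximates argmax_{z in B_p(x_i, eps)} d(z, x_j)
    # (e.g., via PGD), and `evaluate` is the optional eval step.
    E = np.stack([embed(x) for x in X])
    for i in range(len(X)):
        d = np.linalg.norm(E - E[i], axis=1)   # distances to the anchors A^(i)
        d[i] = np.inf                          # exclude the point under attack
        j = int(np.argmin(d))                  # nearest-neighbor index k(A^(i), x_i)
        if y[i] == y[j]:
            x_adv = perturb(X[i], X[j])        # push x_i away from its neighbor
        else:
            x_adv = X[i]                       # already misclassified: skip
        evaluate(x_adv, y[i])

# Toy 1-D run with an identity embedding and a stand-in perturbation.
results = []
evaluate_robustness(
    embed=lambda x: x,
    X=[np.array([0.0]), np.array([0.1]), np.array([5.0])],
    y=[0, 0, 1],
    perturb=lambda x_i, x_j: x_i + 0.5,        # stand-in for a PGD solve
    evaluate=lambda x, label: results.append(float(x[0])),
)
print(results)  # [0.5, 0.6, 5.0]
```

In the toy run, the first two points share a class with their nearest neighbor and are perturbed; the third point's nearest neighbor has a different label, so it is passed through untouched, exactly as the `else` branch of Algorithm 1 prescribes.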

3.5. ADVERSARIAL TRAINING

To improve the robustness of DML models, we propose a training objective that aligns with the robust formulations above and can be viewed as adversarial training (Madry et al., 2018) for DML models. Given a set of possible perturbations ∆_p = {δ | ‖δ‖_p ≤ ε}, parameterized by ε and an ℓ_p-norm, let ρ((x_i, y_i), (x_j, y_j), (x_k, y_k)) be a function that outputs an adversarial perturbation for the data point x_i, with respect to the two dependent data points x_j and x_k, defined as:

ρ((x_i, y_i), (x_j, y_j), (x_k, y_k)) = arg max_{δ ∈ ∆_p} [ 1_{y_i = y_j} d_θ(x_i + δ, x_j) − 1_{y_i ≠ y_k} d_θ(x_i + δ, x_k) ]    (14)

Given a triplet (x_i, y_i), (x_j, y_j), (x_k, y_k) where y_i = y_j and y_i ≠ y_k, this formulation is an inversion of the training objective enforced by metric losses (maximize inter-class distances, minimize intra-class distances (Boudiaf et al., 2020)) for a given norm. Effectively, when x_i and x_j share a class (y_i = y_j), the distance in the embedding space between these two points is maximized, while if y_i ≠ y_k, the distance in the embedding space between x_i and x_k is minimized. The constrained optimization problem defined by ρ(·, ·, ·) can be solved using common approximation techniques for finding adversarial perturbations, such as FGSM (Goodfellow et al., 2015), C&W (Carlini & Wagner, 2017), and PGD (Madry et al., 2018). To simplify notation in the following equations, we write ρ_(i,j,k) for ρ((x_i, y_i), (x_j, y_j), (x_k, y_k)), and ρ_(i,j) = ρ_(i,j,j). The robust training objective for the contrastive loss function is given by:

arg min_{θ ∈ Θ} l(θ, (x_1, y_1), (x_2 + γρ_(2,1), y_2))

Here, γ ∈ {0, 1} is a discrete random variable for which P(γ = 1) ∈ [0, 1] is a hyper-parameter that specifies the attack rate during training. For tuple-based losses, perturbations are not performed on negative pairs, such that P(γ = 1 | y_1 ≠ y_2) = 0. To reduce notation clutter, we use P(γ = 1) interchangeably across losses; for tuple-based losses it refers to P(γ = 1 | y_1 = y_2). Similarly, the robust training objective for triplet-based metric losses is given by:

arg min_{θ ∈ Θ} l(θ, (x_1, y_1), (x_2 + γρ_(2,1), y_2), (x_3, y_3))

These robust training objectives align with the usual objective of adversarial training for deep neural networks, i.e., replace a sample by its "worst case" variant before normal training. For the sake of concreteness, we apply the objective to the ℓ_∞ norm, but the method is applicable to other norms commonly used for adversarial training.
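For a positive pair under the ℓ_∞ ball, a single FGSM-style step already illustrates what ρ computes. The sketch below is our own simplification: it assumes an identity embedding f(x) = x so the gradient of the ℓ_2 distance has a closed form, whereas a real model would backpropagate through f_θ (e.g., with multi-step PGD).

```python
import numpy as np

def rho_positive_fgsm(x_i, x_j, eps):
    # One-step approximation of rho for a positive pair (y_i = y_j):
    # maximize d(x_i + delta, x_j) over ||delta||_inf <= eps. With f(x) = x,
    # grad_z ||z - x_j||_2 at z = x_i points along (x_i - x_j), so the
    # l_inf-optimal single step is eps * sign(x_i - x_j).
    return eps * np.sign(x_i - x_j)

x_i = np.array([0.3, 0.7])
x_j = np.array([0.4, 0.6])   # same-class neighbor
delta = rho_positive_fgsm(x_i, x_j, eps=0.01)
# The perturbation stays inside the eps-ball and strictly increases the
# distance to the positive, inverting what the metric loss enforces.
```

Replacing x_2 by x_2 + δ in the loss then trains the model to pull such worst-case positives back together, which is the min-max structure of the objective above.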

4. EXPERIMENTS

Our experiments explore the following research questions.

Q1. How robust are naturally-trained DML models against established adversarial example attacks? Among commonly used datasets for visual similarity, we find that DML models, trained with established hyper-parameters, are vulnerable to adversarial examples, similar to non-DML models (Table 1). This forms our baseline for robustness.

Q2. What is the accuracy of DML models when they are trained using our robust formulation? We find that DML models can be trained to become more robust towards a given threat model. For example, for a PGD attacker with 5 iterations and ‖δ‖_∞ < 0.01, we obtain an adversarial accuracy of 48.7, compared to the state-of-the-art natural accuracy baseline of 71.8, for the SOP dataset (Table 2).

4.1. EXPERIMENTAL SETUP

Each experiment uses the parameter choices of Roth et al. (2020), unless otherwise specified, and performs evaluations on commonly used datasets within DML. These choices reflect state-of-the-art performance for naturally-trained DML networks. We summarize some of these choices below and emphasize deviations. For further details on parameter choices, we refer the reader to Appendix A or the original work of Roth et al. (2020). Experiments were executed on an NVIDIA Tesla V100 GPU (32 GB RAM). Code for the experiments is available at: (anonymized repository) https://github.com/starving-panda/adversarial-metric-learning.

Model & Optimizer. Each experiment uses a ResNet50 model (He et al., 2016) initialized with pre-trained ImageNet weights and frozen batch-norm layers. The last output layer is replaced with a fully-connected untrained embedding layer of size 128. ResNets are commonly used and preferred in DML due to their reduced number of parameters (Musgrave et al., 2020; Roth et al., 2020), while having performance comparable to more complex deep neural network (DNN) architectures. For training, the Adam (Kingma & Ba, 2015) optimizer is used with a learning rate of 10^{-6} and weight decay of 4·10^{-4}. For contrastive loss α = 1.0, while α = 0.2 for triplet loss.

Datasets. We use datasets common for metric learning: (1) CUB200-2011 (Welinder et al., 2010), containing 11788 images of birds across 200 species; (2) CARS196 (Krause et al., 2013); and (3) SOP, the Stanford Online Products dataset.

Adversarial Perturbations. The proposed attack algorithm, Algorithm 1, and ρ(·, ·, ·) depend on solving an arg max expression. To solve this formulation, we use PGD (Madry et al., 2018), an established attack method for creating adversarial perturbations.
Attacks are run for five iterations; this was chosen to keep the attack efficient (in terms of runtime), as training DML models is considered an expensive procedure (Roth et al., 2020), and because the performance of the attack is known to saturate at high iteration counts (Wong et al., 2020). Step sizes are fixed to 2(ε/i), where i is the number of iterations and ε specifies the domain of valid perturbations under ℓ_∞, such that any perturbation δ satisfies ‖δ‖_∞ ≤ ε. This step size was chosen to keep each step small while ensuring that any point within the ε-ball is reachable, regardless of initialization (Wong et al., 2020). Alternative attack methods, and their effectiveness against naturally-trained DML models, can be seen in Appendix D.

Evaluation Metrics. We use two DML-specific evaluation metrics proposed by Musgrave et al. (2020): Recall at one (R@1) and Mean Average Precision at R (mAP@R). R@1 is the accuracy of using the embeddings that the model produces to infer the output class label from the class of the nearest-neighbor anchor. Given a test set S = {(x_1, y_1), ..., (x_n, y_n)}, R@1 is given by:

R@1 = (1/n) Σ_{i=1}^n Prec(i, 1), where Prec(i, k) = (1/k) Σ_{j ∈ I(k)} 1_{y_i = y_j} and I(k) = arg min_{K : |K| = k, i ∉ K} Σ_{j ∈ K} d_θ(x_i, x_j)

mAP@R (Musgrave et al., 2020) measures a given model's ability to rank among classes in the embedding space; we adopted this metric for the reasons covered by Musgrave et al. (2020). It is defined as:

mAP@R = (1/n) Σ_{i=1}^n (1/R(i)) Σ_{k=1}^{R(i)} Prec(i, k), where R(i) = Σ_{(x_j, y_j) ∈ S, j ≠ i} 1_{y_i = y_j}    (19)
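Both metrics reduce to nearest-neighbor bookkeeping over pairwise embedding distances. A compact numpy sketch that follows the definitions above (in particular, it averages Prec(i, k) over k = 1..R(i), as the text's mAP@R formula specifies):

```python
import numpy as np

def recall_at_1(E, y):
    # R@1: fraction of points whose nearest neighbor (excluding themselves)
    # shares their class label.
    hits = 0
    for i in range(len(E)):
        d = np.linalg.norm(E - E[i], axis=1)
        d[i] = np.inf
        hits += int(y[int(np.argmin(d))] == y[i])
    return hits / len(E)

def map_at_r(E, y):
    # mAP@R per the definition above: average Prec(i, k) over the first
    # R(i) neighbors, where R(i) counts same-class points with j != i.
    y = np.asarray(y)
    scores = []
    for i in range(len(E)):
        d = np.linalg.norm(E - E[i], axis=1)
        d[i] = np.inf
        order = np.argsort(d)
        R = int(np.sum(y == y[i])) - 1
        rel = (y[order[:R]] == y[i]).astype(float)
        prec_at_k = np.cumsum(rel) / (np.arange(R) + 1)   # Prec(i, 1..R)
        scores.append(prec_at_k.mean())
    return float(np.mean(scores))

# Two tight, well-separated clusters: both metrics are perfect.
E = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]])
y = [0, 0, 1, 1]
print(recall_at_1(E, y), map_at_r(E, y))  # 1.0 1.0
```

mAP@R is the stricter of the two: a model can score a perfect R@1 while still ranking same-class points poorly beyond the first neighbor.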

4.2. EXPERIMENTAL RESULTS

First, we seek to establish a baseline of robustness against adversarial perturbations for naturally-trained DML models. We obtain this baseline by applying Algorithm 1 for three common values of ε ∈ {0.01, 0.05, 0.1} used in computer vision, under the ℓ_∞ norm, such that a perturbation δ satisfies ‖δ‖_∞ ≤ ε. PGD with five iterations is used to approximate a solution to the arg max. Five iterations were chosen because (1) this saturates the maximum attack effect, as R@1 ≈ 0, and (2) in this situation, adding more iterations leads to stagnation (Wong et al., 2020).

Table 1: Performance of naturally-trained DML models against adversarial perturbations generated using Algorithm 1 with PGD across various ε for ℓ_∞. Results are an average of five random seeds. Recall that R@1 reflects a model's inference accuracy, while mAP@R reflects its ability to rank similar entities. Naturally-trained DML models are not robust to the generated adversarial perturbations.

                       CUB200-2011           CARS196               SOP
ℓ_∞ (ε ↓)            R@1      mAP@R       R@1      mAP@R       R@1      mAP@R
Contrastive
  Unperturbed    59.1±0.0   21.0±0.0   74.0±0.0   20.9±0.0   71.8±0.0   44.7±0.0
  0.01            2.1±0.1    2.6±0.0    0.4±0.0    1.5±0.0    1.3±0.0    1.7±0.0
  0.05            0.1±0.0    2.2±0.0    0.2±0.0    1.4±0.0    0.0±0.0    1.1±0.0
  0.10            0.0±0.0    2.2±0.0    0.2±0.0    1.4±0.0    0.0±0.0    1.1±0.0
Triplet
  Unperturbed    59.3±0.0   21.7±0.0   74.0±0.0   21.4±0.0   69.6±0.0   42.1±0.0
  0.01            2.5±0.1    3.0±0.0    0.3±0.0    1.6±0.0    0.5±0.0    1.4±0.0
  0.05            0.0±0.0    2.5±0.0    0.1±0.0    1.5±0.0    0.0±0.0    1.2±0.0
  0.10            0.0±0.0    2.5±0.0    0.1±0.0    1.5±0.0    0.0±0.0    1.2±0.0

Table 2: Performance of DML models trained using the proposed adversarial training objective (using PGD) compared to naturally-trained DML models for adversarial perturbations within ℓ_∞ (ε = 0.01). Losses are denoted by C (contrastive) and T (triplet). Only a fragment of this table survives extraction: for contrastive loss (C) on CUB200-2011, the natural baseline attains R@1 = 2.1 ± 0.1 and mAP@R = 2.6 ± 0.0, while the robust model attains R@1 = 18.7. The robustly trained models attain both higher inference accuracy (R@1) and improved ability to rank similar entities (mAP@R) than the naturally-trained baseline models; thereby, the proposed robust training objective improves robustness towards adversarial perturbations.
Table 1 summarizes the findings; the reported performance is averaged over five distinct random seeds for the attack algorithm. We observe that naturally-trained DML models achieve very low robustness to our proposed attack, experiencing a drop in both R@1 (inference) and mAP@R (ability to rank) by several orders of magnitude. Thus, to answer Q1, we conclude that DML models are not inherently robust to established attacks, suggesting that prior studies which report higher robustness of naturally-trained DML models may be a sign of ineffective attacks instead (Abdelnabi et al., 2020).

Next, we examine the effectiveness of our robust training objective (Section 3.5) for creating robust DML models. We set a threat model of ℓ_∞ (ε = 0.01) and optimize a min-max objective while re-using hyper-parameter choices from non-robust training for five attack rates P(γ = 1) ∈ {0.1, 0.25, 0.5, 0.75, 1.0}. For the inner maximization, we use PGD with a configuration similar to the attack setting. Table 2 summarizes the results for the best-performing robust model, for which the performance is reported as an average of five attacks with distinct random seeds. Performance of the trained robust models on natural (unperturbed) input remains largely unaffected; for more details, we refer the reader to Appendix C. We observe that our robust formulation does improve model robustness against the specified PGD-based attack across every combination of loss and dataset. Consistently, for all of the considered attack rates, the robustness increased; more details on this matter can be found in Appendix C. An example of inference across training objectives can be seen in Figure 1; for more examples, we refer the reader to the online gallery. We also observed that the robustness gained against the PGD-based attack transfers to other established attack methods, such as C&W and FGSM; see Appendix B for related results.
More details on the effects of the robust training objective on the embedding space are provided by a small experiment using a high-dimensional synthetic dataset in Appendix E, highlighting that the embeddings of a robust model remain more stationary when the input is adversarially perturbed.

5. DISCUSSION

The proposed robust training objective relies on perturbing positive data points within tuples and triplets, for the reasons addressed in Section 3.4. Experimentation with alternative perturbation targets, in Appendix F, demonstrates that the proposed choice of perturbation target (the positive) outperforms the alternatives in five out of six cases. For one particular case (contrastive loss on CUB200-2011), perturbing the anchor data points achieves better performance (R@1 = 20.9) than the proposed method (R@1 = 18.4). We speculate that the underperformance of other perturbation targets is associated with known instabilities of metric losses (Wu et al., 2017): namely, that large distances between anchors and negatives cause gradient updates during training to be noisy across batches. Stability is typically achieved by applying a sampling process that accounts for this phenomenon and thus selects a certain distribution of tuples or triplets. The proposed robust training objective induces noise (high loss) post-sampling, potentially causing instabilities during training. We hypothesize that this could be related to the observed drop in performance for certain training scenarios with high attack rates, P(γ = 1) = 1, as seen in Appendix C. Despite the proposed robust training objective already demonstrating promising results, we encourage future research to further explore the inherent tension between robust optimization and instabilities in DML.

6. CONCLUSION

Deep Metric Learning (DML) creates feature embedding spaces where similar input points are geometrically close to each other, while dissimilar points are far apart. However, the underlying DNNs are vulnerable to adversarial inputs, thus making the DML models themselves vulnerable. We demonstrate that naturally-trained DML models are vulnerable to strong attackers, similar to other types of deep learning models. To create robust DML models, we contribute a robust training objective that accounts for the dependence of metric losses: the phenomenon that the loss at any point depends on the other items in the mini-batch and on the sampling process used to derive the mini-batch. Our robust training formulation yields robust DML models that can withstand powerful PGD attackers without severely degrading their performance on natural inputs.

A TRAINING PARAMETERS (EXPANDED)

This section expands upon the details and hyper-parameters used throughout the training of the respective DML models.

Batches & Sampling. The training process uses a mini-batch size of 112 data points. Each mini-batch is sampled such that it contains a fixed number of samples per class (Roth et al., 2020); we use two samples per class. Following this, sets of tuples and triplets (depending on the loss used for training) are derived from the mini-batch. The triplet-set's size is the same as the mini-batch's, with each data point used as an anchor. The tuple-set's size is double that of the mini-batch, thus balancing the number of data points being compared relative to the triplet-set; furthermore, each data point within the tuple-set is used in both a positive and a negative pair. DML models are prone to collapse during training, a phenomenon caused by too many distant negative samples in the tuple- or triplet-sets, which causes the model to map every data point to the same point in the embedding space. To reduce the probability of this phenomenon, distance-weighted sampling is used for sampling negatives (Wu et al., 2017).

Data Augmentation. We augment the dataset using the following operations for each input image: (1) random cropping to an image patch of size 60-100% of the original image area; (2) scaling; (3) normalization of pixel intensities. One difference is that our patch sizes differ from Roth et al. (2020), who employ patches of size 8-100% of the original area. We change this parameter because recent work suggests that computer vision models can be biased by backgrounds and textures during training (Xiao et al., 2020). To combat this, we use cropping and scaling values based on Szegedy et al. (2015).
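Distance-weighted sampling draws a negative with probability inversely proportional to the density q(d) ∝ d^(n-2) (1 − d²/4)^((n-3)/2) of pairwise distances between points on the unit sphere (Wu et al., 2017). The sketch below is our own minimal rendering of that idea, with a clipping cutoff to keep the inverse weights bounded; the function name and defaults are assumptions.

```python
import numpy as np

def sample_negative(dists, n_dim=128, cutoff=0.5, rng=None):
    # Weight each candidate negative by 1/q(d), where q(d) is the density
    # of pairwise l2 distances between points on the (n_dim - 1)-sphere:
    # q(d) ~ d^(n-2) * (1 - d^2/4)^((n-3)/2). Very close pairs are rare
    # under q, so inverting it flattens the sampled distance distribution.
    rng = rng or np.random.default_rng(0)
    d = np.clip(np.asarray(dists, dtype=float), cutoff, 1.99)  # bound weights
    log_q = (n_dim - 2) * np.log(d) + ((n_dim - 3) / 2) * np.log(1 - d**2 / 4)
    w = np.exp(-log_q - np.max(-log_q))    # 1/q(d), shifted for stability
    w /= w.sum()
    return int(rng.choice(len(d), p=w))

dists = np.array([0.6, 1.0, 1.4])          # distances to candidate negatives
idx = sample_negative(dists)               # heavily favors the closest candidate
```

Working in log-space matters here: for a 128-dimensional embedding, q(d) spans many orders of magnitude across the valid distance range, and the naive ratio would overflow or underflow.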

B EXPERIMENT: UNIVERSALITY OF ROBUSTNESS

The robust DML models trained using the proposed adversarial training algorithm (covered in Section 3.5) approximate the solution to the arg max in ρ((x_i, y_i), (x_j, y_j), (x_k, y_k)) (defined in Equation 14) using PGD (Madry et al., 2018), and have demonstrated improved robustness towards attacks based on PGD (see Table 2). To test whether the increased robustness also applies to other attack methods, such as FGSM and C&W, we conduct a robustness evaluation using Algorithm 1 with these two attack methods for ℓ∞ (ε = 0.01). For FGSM, we use α = ε to maximize the perturbation, and thereby the attack strength, under the given norm. Despite C&W being originally designed for the ℓ2 norm, we adopt an implementation created by the author for the ℓ∞ norm, although the author now discourages use of this attack method in favor of PGD for ℓ∞. For C&W, we had to limit the attack to 50 iterations to combat extreme runtimes that were infeasible on our available resources; the applied attack might therefore not capture the full potential of the attack method. Results of these experiments can be seen in Table 3 and Table 4. Despite having used PGD for adversarial training, the improved robustness also carries over to perturbations from these other attack methods.
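A single-step FGSM perturbation with α = ε under the ℓ∞ norm can be sketched as follows. This is a minimal illustration for a metric embedding, not the paper's Algorithm 1; the attack objective (increasing the embedding distance to a positive reference) and the function names are our own assumptions:

```python
import torch

def fgsm_perturb(embed, x, x_ref, eps=0.01):
    # One-step FGSM: take the sign of the input gradient of an attack loss
    # (here, the embedding distance of x to a reference point x_ref) and step
    # by alpha = eps, saturating the l_inf budget in a single step.
    x_adv = x.clone().detach().requires_grad_(True)
    dist = (embed(x_adv) - embed(x_ref).detach()).norm(dim=-1).sum()
    dist.backward()
    return (x + eps * x_adv.grad.sign()).clamp(0.0, 1.0).detach()
```

Because the step size equals the budget, every pixel moves to the boundary of the ℓ∞ ball (up to the [0, 1] clamp), which is why α = ε maximizes the perturbation under this norm.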

C EXPERIMENT DETAILS: ATTACK RATES

This section expands upon the training process of the proposed robust formulation, particularly the impact of the attack rate P(γ = 1) on the resulting performance of the DML models. Table 5 shows the performance of models against adversarial perturbations across various attack rates. These results demonstrate that the robustness of the trained models increases across all covered attack rates P(γ = 1) ∈ {0.1, 0.25, 0.5, 0.75, 1.0}. Additionally, higher attack rates tend to yield higher robustness. However, this relationship is not monotonic, and the optimal choice of attack rate varies across datasets and losses.

Table 3: Performance of DML models trained using the proposed adversarial training objective (using PGD) compared to naturally-trained DML for adversarial perturbations within ℓ∞ (ε = 0.01) discovered using FGSM. Losses are denoted by C (contrastive) and T (triplet). The robustly trained models attain both higher inference accuracy (R@1) and improved ability to rank similar entities (mAP@R) than the naturally-trained baseline models. Thereby, the proposed robust training objective improves robustness towards adversarial perturbations.

Table 4: Performance of DML models trained using the proposed adversarial training objective (using PGD) compared to naturally-trained DML for adversarial perturbations within ℓ∞ (ε = 0.01) discovered using C&W. Losses are denoted by C (contrastive) and T (triplet). The robustly trained models attain both higher inference accuracy (R@1) and improved ability to rank similar entities (mAP@R) than the naturally-trained baseline models. Thereby, the proposed robust training objective improves robustness towards adversarial perturbations.

Table 6 shows the performance of the robust (and a naturally-trained) models on benign (unperturbed) input. Generally, the robust formulation has minor impact on benign performance, as each model's evaluation metrics are close to those of the naturally-trained baseline. Interestingly, for contrastive loss on CUB200-2011 and CARS196 (P(γ = 1) = 0.25 and P(γ = 1) = 0.1, respectively), the robust training objective enables the benign performance to exceed the naturally-trained baseline. This suggests that a low frequency of adversarial perturbations could potentially improve the training process of non-robust DML. We deem further details on this out of scope for our work, but see it as a promising direction for future research.
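The attack rate P(γ = 1) can be realized as a per-pair Bernoulli gate during training. The helper below is hypothetical (names and signature are our own), included only to make the mechanism concrete:

```python
import random

def apply_attack_rate(pairs, attack_rate, perturb, rng=random):
    # Perturb the positive of each (anchor, positive) pair with probability
    # P(gamma = 1) = attack_rate; with gamma = 0 the pair passes unchanged.
    out = []
    for anchor, positive in pairs:
        gamma = 1 if rng.random() < attack_rate else 0
        out.append((anchor, perturb(positive) if gamma == 1 else positive))
    return out
```

With attack_rate = 1.0 every mini-batch pair is perturbed, while intermediate values mix natural and adversarial pairs, which is the knob varied across Table 5 and Table 6.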

D EXPERIMENT: NATURAL ROBUSTNESS (ALTERNATIVES)

This section covers the robustness of naturally-trained DML models against two other established attack methods, FGSM (Goodfellow et al., 2015) and C&W (Carlini & Wagner, 2017). Using Algorithm 1 with these methods, Table 7 shows that they also lower performance for ℓ∞ (ε = 0.01). Hyper-parameters for C&W are covered in Appendix B; in particular, the iteration count had to remain low to keep runtimes feasible under the available resources. Tuning these parameters to a greater extent could therefore yield a more powerful attack and thus reveal lower robustness.

E EXPERIMENT: SYNTHETIC DATA

To establish more clarity into the effects induced by the robust training objective (covered in Section 3), we construct an experiment with two high-dimensional, well-defined data distributions being mapped to a low-dimensional embedding space. We define the dimensionality of the high-dimensional input space as k = 3 × 224 × 224, matching the dimensionality of the other experiments using real-world datasets. We construct the dataset with two classes, "a" and "b", each with an associated independent k-dimensional Gaussian distribution. These distributions are N_k(μ_a, Σ) and N_k(μ_b, Σ) for classes "a" and "b" respectively, where: μ_a = (0.25, …, 0.25) ∈ R^k, μ_b = (0.75, …, 0.75) ∈ R^k, and Σ = σ² · I_k. Here, I_k ∈ R^(k×k) is the identity matrix of size k and σ = 0.025. Following this, we draw a dataset D_N with a fixed number of data points for each class and train a deep metric model f_θ : R^k → R² parameterized by θ.

Table 5: Performance of robust DML models on adversarial input for various specified (training) attack rates P(γ = 1). These models were trained using the proposed adversarial training algorithm (covered in Section 3.5) with PGD for ℓ∞ (ε = 0.01). Evaluations are conducted on adversarial input generated using Algorithm 1. Naturally-trained marks the performance of DML models using traditional non-robust training objectives. Bold marks the best performance for each dataset, metric, and loss combination. Robust models reach higher inference accuracy (R@1) and better ability to rank similar entities (mAP@R) on adversarial input than naturally-trained DML models. Higher attack rates are often associated with higher robustness. Recall that R@1 reflects a model's inference accuracy, while mAP@R reflects its ability to rank similar entities.
We choose a two-dimensional embedding space to enable visualization of the learned embeddings. Using D_N, we train two variants of f_θ: one using the natural (non-robust) training objective and another using the proposed robust training objective (for ℓ∞ (ε = 0.01)). Each model uses contrastive loss and is trained for 25 epochs on 508 training data points, while being evaluated on 516 test data points. Following training, we perform a test-time attack using Algorithm 1 with PGD (ε = 0.01). Differences in the learned embedding spaces, and the influence of the adversarial perturbations, can be seen in Figure 2. Generally, embeddings of the robust model remain more stable (in terms of position) in the embedding space compared to those of the naturally-trained model. Both models attain R@1 = 100.0 on benign (unperturbed) data, while on adversarial data points the naturally-trained model attains R@1 = 4.6 and the robust model attains R@1 = 100.0.
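The two-class synthetic dataset described above can be generated as follows. This is a sketch; the function name and seeding convention are our own, and k is shrunk in practice only for testing:

```python
import numpy as np

def make_synthetic(n_per_class, k=3 * 224 * 224, sigma=0.025, seed=0):
    # Class "a" ~ N_k(0.25 * 1, sigma^2 I_k), class "b" ~ N_k(0.75 * 1,
    # sigma^2 I_k): isotropic Gaussians with identical covariance whose
    # means sit at opposite corners of the [0.25, 0.75]^k cube diagonal.
    rng = np.random.default_rng(seed)
    x_a = rng.normal(0.25, sigma, size=(n_per_class, k))
    x_b = rng.normal(0.75, sigma, size=(n_per_class, k))
    x = np.concatenate([x_a, x_b])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return x, y
```

Because σ = 0.025 is small relative to the 0.5 gap between the means, the two classes are linearly separable in input space, which isolates the effect of the training objective on the learned embedding.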

F EXPERIMENT: PERTURBATION TARGET

The proposed adversarial training techniques for DML apply perturbations to positive data points. This choice was made to establish alignment with the adversarial behavior covered in Section 3.4. However, the "adversarial loss" ρ(·, ·, ·) can capture violations across various data point types (anchor, positive, negative). To assess the impact of using alternative perturbation targets (negative, anchor) for adversarial training, we performed an experiment with P(γ = 1) = 0.5 (unless otherwise stated) across these alternative perturbation targets. The losses and attack rates used during adversarial training for the other perturbation targets are specified below.

Negative perturbations for contrastive loss: l(θ, (x_1, y_1), (x_2 + γρ_(2,1), y_2)), with P(γ = 1 | y_1 = y_2) = 0 and P(γ = 1 | y_1 ≠ y_2) = 0.5.

Anchor perturbations for contrastive loss: l(θ, (x_1 + γρ_(1,2), y_1), (x_2, y_2)), with P(γ = 1 | y_1 = y_2) = P(γ = 1 | y_1 ≠ y_2) = 0.5.

Negative perturbations for triplet loss: l(θ, (x_1, y_1), (x_2, y_2), (x_3 + γρ_(3,1), y_3)) (nearest neighbour for perturbed input).

Figure 2: Effects of adversarial perturbations in the embedding space (embedding shift, inference) for 16 randomly sampled data points. Circles mark embeddings of unperturbed data points, while crosses mark adversarial perturbations of the respective data points. (First row) Grey lines connect the unperturbed data points to their perturbed counterparts, highlighting the shift in the embedding space. (Second row) Green and red lines connect the embedding of each adversarial data point to its nearest (unperturbed) neighbor. A green line indicates this neighbor is of the same class (correct), while a red line indicates the opposite (wrong). Embeddings of the robustly-trained model shift much less when faced with adversarially perturbed input, and are thus more robust.
Anchor perturbations for triplet loss: l(θ, (x_1 + γρ_(1,2,3), y_1), (x_2, y_2), (x_3, y_3)).

Training curves for contrastive and triplet loss can be seen in Figure 3 and Figure 4, respectively. Across the combinations of loss and dataset, the proposed adversarial training (positive perturbation) yields the most robustness in five out of six experiments, the exception being CUB200-2011 with contrastive loss. Additionally, it reaches several orders of magnitude higher performance on the CARS196 and SOP datasets. We speculate that the inability of the alternative perturbation targets to reach similar performance is linked to alterations of the distance between anchors and negatives during training. Large distances between anchor and negative data points are known to cause instabilities during training, causing models to reach a local minimum (Wu et al., 2017).
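The three perturbation targets compared above differ only in which role of the triplet receives the perturbation. The helper below makes this concrete; it is a hypothetical sketch (function names, margin value, and the use of a precomputed delta are our own assumptions), not the paper's training loop:

```python
import torch
import torch.nn.functional as F

def triplet_margin(embed, a, p, n, margin=0.2):
    # Standard triplet margin loss on embeddings of anchor a, positive p,
    # and negative n.
    ea, ep, en = embed(a), embed(p), embed(n)
    return F.relu((ea - ep).norm(dim=-1) - (ea - en).norm(dim=-1) + margin).mean()

def perturbed_triplet(embed, a, p, n, delta, target="positive", margin=0.2):
    # Add a precomputed perturbation delta to exactly one triplet role,
    # mirroring the positive / negative / anchor targets compared above.
    if target == "positive":
        p = p + delta
    elif target == "negative":
        n = n + delta
    else:  # "anchor"
        a = a + delta
    return triplet_margin(embed, a, p, n, margin)
```

Perturbing the negative or the anchor moves the anchor-negative distance during training, which is consistent with the instability argument of Wu et al. (2017) cited above.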



Footnotes:
The measure of a set Z ⊆ X under distribution D_X is the measure of the set Z × Y under distribution D.
In case one or more anchors share the minimal distance to z, the tie is broken by a random selection among these anchors.
This learning rate differs from the one stated by Roth et al. (2020) in their publication, 10^-5, but reflects the actual learning rate used throughout their experiments. See the field "lr" within the experiment configuration: https://bit.ly/3a4FyHP.
https://www.robust-ml.org/defenses/
(Anonymized gallery) https://starving-panda.github.io/sample-gallery/
C&W for ℓ∞, TensorFlow implementation: https://github.com/carlini/nn_robust_attacks/blob/master/li_attack.py
Nicholas Carlini on C&W for ℓ∞: https://github.com/tensorflow/cleverhans/issues/978#issuecomment-464594668



containing 16185 images of cars across 196 different models; (3) SOP (Song et al., 2016), containing 120053 images of 22634 different online products. Each dataset is divided into training and testing sets of approximately equal size by selecting the first half of the classes for the training set and the remaining classes for the testing set. This split reflects a zero-shot learning scenario, which is a common application of DML. Similar to Roth et al. (2020), we train CUB200-2011 and CARS196 for 150 epochs and SOP for 100 epochs due to its volume.

Figure 1: Example inference of a naturally-trained DML model and a robustly trained variant on the CUB200-2011 dataset. Each model infers the class of the natural data point x, and of its perturbed counterpart, using the class of the nearest anchor nn(·). Green and red borders indicate correct (same class) and incorrect inference, respectively. Both models infer the natural input correctly; however, the naturally-trained DML fails to infer the adversarially perturbed input correctly.

Figure 3: Metrics (loss, R@1, and mAP@R) for the contrastive-loss training procedure across perturbation targets.

Table 6: Performance of robust DML models on benign input for various specified (training) attack rates P(γ = 1). These models were trained using the proposed adversarial training algorithm (covered in Section 3.5) with PGD for ℓ∞ (ε = 0.01). Evaluations are conducted on benign data. Naturally-trained marks the performance of DML models using traditional non-robust training objectives. Bold marks the best performance for each dataset, metric, and loss combination. Higher attack rates are often associated with lower performance on benign input. For contrastive loss

Table 7: Performance of naturally-trained DML models against adversarial examples generated using Algorithm 1 with two alternative attack methods: FGSM and C&W. Losses are denoted by C (contrastive) and T (triplet).

