ARMOURED: ADVERSARIALLY ROBUST MODELS USING UNLABELED DATA BY REGULARIZING DIVERSITY

Abstract

Adversarial attacks pose a major challenge for modern deep neural networks. Recent advancements show that adversarially robust generalization requires a large amount of labeled data for training. If annotation becomes a burden, can unlabeled data help bridge the gap? In this paper, we propose ARMOURED, an adversarially robust training method based on semi-supervised learning that consists of two components. The first component applies multi-view learning to simultaneously optimize multiple independent networks and utilizes unlabeled data to enforce labeling consistency. The second component reduces adversarial transferability among the networks via diversity regularizers inspired by determinantal point processes and entropy maximization. Experimental results show that under small perturbation budgets, ARMOURED is robust against strong adaptive adversaries. Notably, ARMOURED does not rely on generating adversarial samples during training. When used in combination with adversarial training, ARMOURED yields competitive performance with state-of-the-art adversarially robust benchmarks on SVHN and outperforms them on CIFAR-10, while offering higher clean accuracy.

1. INTRODUCTION

Modern deep neural networks have met or even surpassed human-level performance on a variety of image classification tasks. However, they are vulnerable to adversarial attacks, where small, calculated perturbations of the input can fool a network into unintended behavior, e.g., misclassification (Szegedy et al., 2014; Biggio et al., 2013). Such adversarial attacks have been found to transfer between different network architectures (Papernot et al., 2016) and are a serious concern, especially when neural networks are used in real-world applications. As a result, much work has been done to improve the robustness of neural networks against adversarial attacks (Miller et al., 2020). Of these techniques, adversarial training (AT) (Goodfellow et al., 2015; Madry et al., 2018) is widely used and has been found to provide the most robust models in recent evaluation studies (Dong et al., 2020; Croce & Hein, 2020). Nonetheless, even models trained with AT have markedly reduced performance on adversarial samples in comparison to clean samples. Models trained with AT also have worse accuracy on clean samples when compared to models trained with standard classification losses. Schmidt et al. (2018) suggest that one reason for such reductions in model accuracy is that training adversarially robust models requires substantially more labeled data. Due to the high costs of obtaining such labeled data in real-world applications, recent work has explored semi-supervised AT-based approaches that are able to leverage unlabeled data instead (Uesato et al., 2019; Najafi et al., 2019; Zhai et al., 2019; Carmon et al., 2019). Orthogonal to AT-based approaches that focus on training robust single models, a few works have explored the use of diversity regularization for learning adversarially robust classifiers.
These works rely on encouraging ensemble diversity through regularization terms, whether on model predictions (Pang et al., 2019) or model gradients (Dabouei et al., 2020), guided by the intuition that diversity amongst the model ensemble will make it difficult for adversarial attacks to transfer between individual models, thus making the ensemble as a whole more resistant to attack.

2. RELATED WORK

2.1. ADVERSARIALLY ROBUST LEARNING

Adversarial training: TRADES (Zhang et al., 2019b) and ALP (Kannan et al., 2018) further decompose the error into natural error and boundary error for higher robustness. Zhang et al. (2019a); Wang et al. (2019) theoretically prove the convergence of AT. Two drawbacks of AT are its slow training, since adversarial example generation requires multiple gradient computations, and the significant reduction in model accuracy on clean samples. Several recent works have focused on speeding up AT (Zhang et al., 2019a; Qin et al., 2019; Shafahi et al., 2019); ARMOURED addresses the second limitation, enabling significantly improved performance on clean samples. Semi-supervised adversarial training: Schmidt et al. (2018) showed that adversarially robust generalization requires much more labeled data. To relieve the annotation burden, several semi-supervised adversarially robust learning (SSAR) methods have been developed to exploit unlabeled data instead. Uesato et al. (2019) introduced unsupervised adversarial training, a simple self-training model which optimizes a smoothness loss and a classification loss using pseudo-labels. Carmon et al. (2019) revisited the Gaussian model by Schmidt et al. (2018) and introduced robust self-training (RST), another self-training model that computes a regularization loss from unlabeled data, either via adversarial training or stability training. Zhai et al. (2019) applied a generalized virtual adversarial training to optimize the prediction stability of their model in the presence of perturbations. Najafi et al.
(2019) proposed a semi-supervised extension of the distributionally robust optimization framework by Sinha et al. (2018). They replace pseudo-labels with soft-labels for unlabeled data and train them together with labeled data. It is worth noting that all four of these state-of-the-art SSAR methods apply AT in their training procedure. Diversity regularization: Diversity regularization is an orthogonal direction to AT that has the potential to further improve the performance of AT. In earlier work, Pang et al. (2018) showed that for a single network, adversarial robustness can be improved when the features learned for different classes are diverse. Pang et al. (2019) further developed this concept by introducing Adaptive Diversity Promoting regularization (ADP). Given an ensemble of neural network classifiers, ADP promotes diversity among the non-target predictions of the networks. ADP is inspired by determinantal point processes (Hough et al., 2006), an elegant statistical tool to model repulsive interactions among items of a fixed ground set; applications to machine learning are reviewed in (Kulesza & Taskar, 2012). Dabouei et al. (2020) enforce diversity on the gradients of the individual networks in the ensemble instead of their predictions. We note that, unlike ARMOURED, the methods described here were developed for the fully-supervised setting and are not able to utilize unlabeled data.

2.2. SEMI-SUPERVISED LEARNING

Semi-supervised learning: Semi-supervised learning (SSL) is an effective strategy to learn from low-cost unlabeled data. There is considerable recent work in this practically relevant and active research area; we cannot cover all of it here. Existing SSL methods can be broadly categorized into three groups: consistency-based, graph-based, and generative models. Recent methods, such as Mean Teacher (Tarvainen & Valpola, 2017) and MixMatch (Berthelot et al., 2019), are consistency-based, as this approach can be adapted to generic problems and has superior performance in practice. The key idea behind consistency-based methods is that model predictions on different augmentations of the same input should be consistent. Multi-view learning: Multi-view learning is an SSL paradigm that is capable of representing diversity in addition to consistency. A dataset is considered to have multiple views when its data samples are represented by more than one set of features and each set is sufficient for the learning task. In this setting, a multi-view method assigns one modeling function to each view and jointly optimizes the functions to improve generalization performance (Zhao et al., 2017). By analyzing various multi-view algorithms, Xu et al. (2013) summarized consensus and complementarity as the two underpinning principles of multi-view learning. The consensus principle states that a multi-view technique should maximize the prediction agreement across views, similar to the consistency-based SSL methods discussed above. The complementary principle states that, to yield improvement, each view must contain some information that the other views do not carry, i.e., the views should be sufficiently diverse. This principle has been applied to boost generalization capability in regular SSL (Qiao et al., 2018) and in learning with label noise (Han et al., 2018).
In this paper, we argue that multi-view complementarity also plays a critical role in improving adversarial robustness, by reducing the transferability of adversarial attacks across different views.

3. THE ARMOURED METHOD

In this section, we introduce ARMOURED, our proposed semi-supervised adversarially robust learning method. To utilize both labeled and unlabeled data, ARMOURED adopts a multi-view framework in which multiple networks output different predictions (posterior probabilities, which we refer to as deep views) on the same input image. The networks are then co-optimized by a single loss function computed on the deep views. We adhere to both the consensus and complementary principles of multi-view learning by ensuring that the deep views maximize their consensus on the target class (the ground-truth class for labeled examples) but complement each other on the non-target classes. To determine a "target" class for unlabeled samples, ARMOURED applies a matching filter that picks out a target class based on agreement between views. Since our method is designed for adversarial robustness, we place greater emphasis on the complementary principle. More concretely, we introduce two levels of complementarity: (i) among the deep views, via a regularizer based on determinantal point processes (DPP), and (ii) among the non-target classes, via an entropy regularizer applied to the combined multi-view output. We describe ARMOURED in detail below; pseudocode detailing the training procedure is provided in Algorithm 1 in Appendix A.1. Overview: We describe the general M-view model. Consider a semi-supervised image classification task on input image x with target label y from one of K classes, y ∈ {1, 2, ..., K}. In each minibatch, our training data consists of a labeled set L = {(x_i, y_i)}_{i=1}^{n_L} and an unlabeled set U = {x_i}_{i=1}^{n_U}. For each input image x, we apply random augmentations η to generate M different augmented images {x^m}_{m=1}^{M}. Let {N_m}_{m=1}^{M} be architecturally similar neural networks with respective parameters {θ_m}_{m=1}^{M}.

Figure 1: The ARMOURED framework (dual-view case). (a) Training procedure; (b) Inference procedure.
Each network takes the corresponding augmented input and produces predictions f^m(x) = N_m(x^m, θ_m) for m = 1, ..., M. Due to the different augmentations and network parameters, each output f^m can be treated as one deep view of the original image x. Finally, we compute a loss function on these deep views and backpropagate to optimize the parameters:

L(x, y) = L_CE(x, y) + λ_DPP L_DPP(x, y) + λ_NEM L_NEM(x, y)   (1)

where λ_DPP and λ_NEM are model hyperparameters. We describe each component of the overall loss function, L_CE, L_DPP and L_NEM, in the following. At inference time, the M outputs are combined to produce a single prediction. Since our networks possess similar learning capability, the final output is computed by averaging the deep views: f(x) = (1/M) Σ_{m=1}^{M} f^m(x). The detailed inference procedure is given in Algorithm 2 in Appendix A.1. Figure 1 illustrates the ARMOURED multi-view framework for the dual-view scenario. Cross-entropy loss (L_CE) and pseudo-label filter: For each labeled sample, we minimize the standard cross-entropy loss L_CE(x, y) = -Σ_{m=1}^{M} log f^m_y(x). While one may train each deep view independently using only the labeled data, the fact that the augmented inputs are generated from the same original image enables us to add an additional constraint: the deep views should agree with each other even on unlabeled samples. Hence, when all M networks assign the highest probability to the same class, we can be confident about their prediction on the sample. We call such a sample a stable sample and define its pseudo-label as ŷ = arg max_{k=1,...,K} f^m_k(x) for all m = 1, ..., M, i.e., the argmax shared by all M views. This pseudo-labeling technique has its roots in co-training (Blum & Mitchell, 1998), a multi-view technique that conforms to the consensus principle. Once a sample is confirmed as stable, it is treated as a labeled sample and the cross-entropy loss L_CE applies. We recompute pseudo-labels for each minibatch to avoid making incorrect pseudo-labels permanent.
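To make the bookkeeping concrete, the following is a minimal NumPy sketch of the per-view cross-entropy and the stable-sample pseudo-label filter. The array shapes and function names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def stable_pseudo_labels(views):
    """A sample is 'stable' when every view's argmax agrees; only stable
    samples receive a pseudo-label. views: (M, batch, K) softmax outputs."""
    preds = views.argmax(axis=-1)              # (M, batch) per-view argmax
    mask = (preds == preds[0]).all(axis=0)     # True where all M views agree
    return mask, preds[0]                      # pseudo-label = shared argmax

def multiview_ce(views, y):
    """L_CE(x, y) = -sum_m log f^m_y(x), averaged over the batch."""
    M, batch, K = views.shape
    logs = -np.log(views[:, np.arange(batch), y])  # (M, batch)
    return float(logs.sum(axis=0).mean())
```

In a training loop, `stable_pseudo_labels` would be recomputed every minibatch, matching the paper's note that pseudo-labels are never made permanent.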
DPP regularization (L_DPP): Suppose that the number of deep views is smaller than the number of classes, i.e., M < K. Let F be the K × M matrix formed by stacking the deep views horizontally, i.e., F = [f^1, f^2, ..., f^M]. Furthermore, let S be a K × K positive semidefinite kernel matrix that measures the pairwise similarity among the classes. For each sample, we extract F_\y and S_\y as the submatrices of F and S that correspond to the non-target classes. Let F̃_\y denote the normalized F_\y in which each column is scaled to unit length. Inspired by determinantal point processes (Kulesza & Taskar, 2012), ARMOURED minimizes the following loss:

L_DPP(x, y) = -log det( F̃_\y^T S_\y F̃_\y ).

This loss is minimized at F̃_\y = F̃*, where F̃* is the horizontal concatenation of the first M dominant eigenvectors of S_\y; a proof is provided in Appendix A.2. Since the eigenvectors of a symmetric matrix are orthogonal, L_DPP encourages the deep views to make diverse predictions on the non-target classes. If the kernel matrix is predefined, this result allows us to interpret the non-target predictions implied by the DPP regularizer. Specifically, if the kernel S is constructed from a similarity measure over the classes, then a clustering effect will be observed, where similar classes are "preferred" by the same view. On the other hand, we can also inject prior knowledge or encourage desired behavior by designing a custom kernel. Exploitation of prior knowledge can be beneficial to generalization, especially when labeled training data are limited. We note that our DPP regularizer generalizes the ensemble diversity regularizer of ADP (Pang et al., 2019), which uses the identity matrix as its kernel (S ≡ I). If we decompose the kernel matrix as S = Φ^T Φ, then our DPP regularizer is equivalent to the ADP regularizer applied to a linear transformation Φ F̃_\y of the non-target predictions.
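A direct single-sample transcription of L_DPP might look like the NumPy sketch below; the function name and the small jitter added for numerical safety are my assumptions, not from the paper's code.

```python
import numpy as np

def dpp_loss(F, S, y):
    """L_DPP = -log det( F_nt^T S_nt F_nt ) for one sample, where F_nt, S_nt
    drop the target class y and F_nt has unit-length columns.
    F: (K, M) stacked deep views; S: (K, K) PSD kernel; y: target index."""
    idx = np.arange(F.shape[0]) != y                 # non-target classes
    F_nt = F[idx]                                    # (K-1, M)
    S_nt = S[np.ix_(idx, idx)]                       # (K-1, K-1)
    F_nt = F_nt / np.linalg.norm(F_nt, axis=0, keepdims=True)
    G = F_nt.T @ S_nt @ F_nt                         # (M, M)
    return float(-np.log(np.linalg.det(G) + 1e-12))  # jitter avoids log(0)
```

With S = I this reduces to the ADP-style diversity term: nearly orthogonal non-target columns give a determinant near 1 (loss near 0), while identical views give a determinant near 0 (large loss).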
Again, this linear transformation is another way to regulate the deep views, and it can be either learned or predefined. Figure 2 illustrates the difference between the predictions of the baseline model and those of ARMOURED models with different kernels. Another related work is cost-sensitive robustness (Zhang & Evans, 2018), which uses a cost matrix to weigh different adversarial transformations (attacks) among the classes. Our kernel matrix does not serve the same purpose, but the effects are similar: in our model, network preference would prevent adversarial transformations across different groups of classes. Non-target entropy maximization (L_NEM): Beyond multi-view diversity, we further propose an entropy regularizer that encourages larger margins among non-target classes in the final prediction f(x). Specifically, let f_\y be the (K-1) × 1 vector of non-target predictions, and f̃_\y be its renormalized version whose elements sum to 1. We propose to maximize the entropy defined over the renormalized non-target predictions. Our entropy regularizer is therefore the negative entropy

L_NEM(x, y) = -H(f̃_\y) = Σ_{k=1}^{K-1} f̃_\y,k log f̃_\y,k.

This loss is minimized when all elements of f_\y are equal to (1 - f_y)/(K - 1). Intuitively, this regularizer acts as a balancing force on the non-target predictions: it prevents ARMOURED from assigning high probability to any of the incorrect classes. We note that L_NEM differs from the entropy maximization technique adopted in Pang et al. (2019), which encourages a uniform distribution over all K classes. Although our regularizer is similar to the complement objective proposed by Chen et al. (2019), we extend this technique to semi-supervised learning and provide more theoretical insight: we show that entropy maximization increases a lower bound on the average (logit) margin under mild assumptions (Theorem A.2 in Appendix A.3).
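As a sanity check on the regularizer's behavior, here is a small NumPy sketch of L_NEM (function name and jitter are assumptions); uniform non-target predictions attain the minimum -log(K-1), while a peaked non-target distribution is penalized.

```python
import numpy as np

def nem_loss(f, y):
    """L_NEM = -H(f_nt): negative entropy of the renormalized non-target
    predictions. f: (K,) averaged prediction vector; y: target class."""
    f_nt = np.delete(f, y)               # drop the target entry
    f_nt = f_nt / f_nt.sum()             # renormalize to sum to 1
    return float(np.sum(f_nt * np.log(f_nt + 1e-12)))
```

This makes the "balancing force" intuition concrete: minimizing `nem_loss` pushes the non-target mass toward the uniform distribution.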

4. EXPERIMENTS

4.1. EXPERIMENTAL SETUP

Dataset: We evaluate ARMOURED on the CIFAR-10 and SVHN datasets. We use the official train/test splits (50k/10k labeled samples) for CIFAR-10 (Krizhevsky et al., 2009) and reserve 5k samples from the training set for validation. In our semi-supervised setup, the label budget is either 1k or 4k; the remaining training samples are treated as unlabeled. For the SVHN dataset (Netzer et al., 2011), our train/validation/test split is 65,932 / 7,325 / 26,032 samples. We use a label budget of only 1k samples in our semi-supervised setup for SVHN. For brevity, we refer to each setup as "Dataset-semi-budget", e.g., CIFAR-10-semi-4k, SVHN-semi-1k. Adversarial attacks: To evaluate robustness, we apply the following adversaries: (i) the Fast Gradient Sign Method (FGSM) attack (Goodfellow et al., 2015), (ii) the Projected Gradient Descent (PGD) attack (Madry et al., 2018) with random initialization, and (iii) AutoAttack (Croce & Hein, 2020). For ℓ∞ attacks, the default perturbation budget is ε = 8/255; for ℓ2 attacks, ε = 0.5.
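For illustration, an ℓ∞ PGD attack with random initialization can be sketched on a toy linear softmax classifier, where the cross-entropy gradient with respect to the input is available in closed form. The linear model and all names below are stand-ins for exposition; in practice the gradient comes from backpropagation through the network.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum()

def pgd_linf(x, y, W, eps=8 / 255, steps=7, alpha=None):
    """l-inf PGD on a toy linear classifier g(x) = W x.
    For this model, d CE / d x = W^T (softmax(Wx) - onehot(y))."""
    if alpha is None:
        alpha = eps / 4                            # step size eps/4, as in the paper
    x_adv = x + np.random.uniform(-eps, eps, size=x.shape)  # random init
    for _ in range(steps):
        p = softmax(W @ x_adv)
        grad = W.T @ (p - np.eye(W.shape[0])[y])   # input gradient of CE loss
        x_adv = x_adv + alpha * np.sign(grad)      # ascend the loss
        x_adv = x + np.clip(x_adv - x, -eps, eps)  # project back into eps-ball
    return x_adv
```

The projection step guarantees the perturbation never leaves the ε-ball, which is the defining constraint of the threat model above.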

Backbone network and training:

To enable fair comparison, the same Wide ResNet (Oliver et al., 2018) backbone is used for all methods. Specifically, we implement "WRN-28-2" with depth 28 and width 2, along with batch normalization, leaky ReLU activations and the Adam optimizer. We train each method for 600 epochs on CIFAR-10-semi-4k and SVHN-semi-1k. The learning rate is decayed by a factor of 0.2 after the first 400k iterations. AT wrapper for SSL: We note that many concepts in SSL, such as multi-view diversity or consistency, are orthogonal to AT, and that successful defenses against large-perturbation attacks consistently rely on AT (Croce & Hein, 2020). Therefore, we aim to combine the best of both worlds by implementing AT as a wrapper method for SSL. Algorithm 3 in Appendix A.1 describes our Method+AT wrapper, which consists of three main steps. First, for each batch of semi-supervised data, we apply the inference procedure of the Method (e.g., Algorithm 2 of ARMOURED) to generate pseudo-labels for unlabeled data. Second, for each input sample in the batch, we compute its adversarial sample using either the true label (if the sample is labeled) or the pseudo-label (if the sample is unlabeled). Third, we execute the training procedure of the Method (e.g., Algorithm 1 of ARMOURED) using the adversarial samples and the original labels. The pseudo-labels computed in the first step are dropped at this point, so that training remains semi-supervised. We note that this wrapper algorithm resembles RST (Carmon et al., 2019).

ARMOURED variants:

We design three variants based on the dual-view model shown in Figure 1 that differ only in the choice of diversity kernel. ARMOURED-I is our standard model, which uses the Identity matrix as its diversity kernel. ARMOURED-H uses a Hand-crafted binary matrix intended to group the classes into two predefined clusters. On CIFAR-10, these are "vehicles" (airplane, ship, truck, automobile) vs. "animals" (bird, cat, deer, dog, frog, horse). On SVHN, we split the digits into "simple & edgy" (0, 1, 2, 4, 7) vs. "curvy & loopy" (3, 5, 6, 8, 9). The third variant is ARMOURED-F, which uses a learnable Feature-based kernel. From a pre-trained SSL model, we first compute the adversarial samples corresponding to the labeled training samples. Then, for each class, we extract a feature vector by averaging over the adversarial samples associated with that class. Finally, we combine the feature vectors into a matrix B and compute the kernel S = B^T B. More details on the kernels are provided in Appendix A.4. In our experiments, we evaluate the following four variants: ARMOURED-I+AT, ARMOURED-H+AT, ARMOURED-F+AT and ARMOURED-F (trained without AT). For the AT wrapper, we apply a 7-step PGD ℓ∞ attack with total ε = 8/255 (for CIFAR-10) or ε = 4/255 (for SVHN) and step size ε/4.
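The two non-identity kernels can be sketched in a few lines; the encoding of the hand-crafted kernel as a block binary matrix is my assumption of one natural reading of ARMOURED-H, and the class indices follow the standard CIFAR-10 ordering.

```python
import numpy as np

def handcrafted_kernel(groups, K):
    """Binary kernel: S[i, j] = 1 iff classes i and j share a predefined
    cluster (an assumed encoding of the ARMOURED-H grouping)."""
    S = np.zeros((K, K))
    for g in groups:
        for i in g:
            for j in g:
                S[i, j] = 1.0
    return S

def feature_kernel(B):
    """Feature-based kernel S = B^T B from per-class feature vectors stacked
    as the columns of B (d x K); positive semidefinite by construction."""
    return B.T @ B

# CIFAR-10 vehicle/animal split (standard class indices)
vehicles, animals = [0, 1, 8, 9], [2, 3, 4, 5, 6, 7]
S_h = handcrafted_kernel([vehicles, animals], K=10)
```

Because S = B^T B is PSD for any B, the feature-based kernel always satisfies the requirement placed on S in the DPP regularizer.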

Comparison benchmarks:

We test the proposed method against a wide range of state-of-the-art SSL and SSAR benchmarks: Mean Teacher (MT) (Tarvainen & Valpola, 2017), MixMatch (Berthelot et al., 2019), RST (Carmon et al., 2019) (RST has two variants; we implemented RST_adv), and the method of Zhai et al. (2019), which we denote as ARG. In addition, we combine MT with adversarial training (MT+AT) using the wrapper Algorithm 3 in Appendix A.1. To the best of our knowledge, this is the first time MT+AT has been evaluated for adversarial robustness. For AT-based methods (RST, ARG), we use a 7-step PGD ℓ∞ attack in their AT phase, matching MT+AT and ARMOURED+AT.

4.2. RESULTS

Results on CIFAR-10 (Table 1, Figure 3): On clean data, MixMatch yields the best performance, while ARMOURED-F surpasses all methods trained with AT by large margins (18%-26%) and is even better than MT, an SSL method. ARMOURED variants demonstrate substantially higher clean performance than the SSAR benchmarks. Under standard FGSM and PGD attacks, the most robust defense is still ARMOURED-F, followed by its AT-based variants with accuracy drops of 2%-5%. Other methods show larger gaps: 25%-35% for SSAR and 10%-50% for SSL benchmarks. We note that the improvements by ARMOURED are not due to gradient masking (see Appendix B). Under AutoAttack, ARMOURED-F is no longer robust; instead, the ARMOURED+AT variants are more resilient. ARMOURED-F+AT is the best defense, outperforming ARG by 5.23% for ℓ∞ and 9.85% for ℓ2 attacks. We also note that the two best defenses against AutoAttack are trained with the hand-crafted and feature kernels: the former requires only human knowledge while the latter needs only additional computation, giving our method flexible ways to boost adversarial robustness with or without prior knowledge. In Figure 3, we plot the robust accuracy against AutoAttack as the perturbation budget gradually increases. ARMOURED-F obtains the highest accuracy on clean data as well as for small perturbation budgets, but its accuracy drops rapidly as ε increases. Meanwhile, the ARMOURED+AT variants achieve a better trade-off between clean accuracy and robustness. Results on SVHN (Table 2): On clean test samples, ARMOURED-F yields the best performance, with a small improvement over the second-best competing method. Against FGSM and PGD attacks, even the worst ARMOURED variant is more robust than MT+AT (the best benchmark) by 13%-20%. Under AutoAttack, ARMOURED-H+AT falls behind MT+AT and ARG by a significant gap of 15% under ℓ∞ attacks, while outperforming them by 4%-11% under ℓ2 attacks.
Overall, ARMOURED shows competitive performance compared to state-of-the-art SSL and SSAR benchmarks. Results on CIFAR-10 and SVHN suggest that MT+AT is a strong defense. Ablation study (Table 3): We perform an ablation study to investigate the contribution of each component to the performance of ARMOURED-F+AT against AutoAttack. First, we remove both the DPP and entropy regularization terms from the total loss in equation (1). This model, denoted as w/o (L_DPP + L_NEM), performs relatively well on clean data, but its performance suffers under attacks, dropping by 26% for ℓ∞ and by 22% for ℓ2 attacks. We then keep the term L_NEM but remove the diversity regularizer from the loss function. This model, w/o L_DPP, performs worse than the complete model by 1%. We conclude that the entropy regularizer plays a more vital role than the DPP regularizer. In addition, we train ARMOURED-F+AT using only the 4k labeled samples and call this model w/o Unlabeled. Its poor performance reinforces the importance of unlabeled data for improving adversarial robustness. Finally, we include ARMOURED-F (trained without AT), which performs very well on clean data but fails against AutoAttack. Additional results are provided in Appendix B. Visualization of learned representations (Figure 4): On CIFAR-10 test samples, we visualize the feature embeddings (extracted from the last layer of WRN-28-2, before the linear layer) learned by the four models in our ablation study. On clean test samples, ARMOURED-F produces the best embeddings. On adversarial samples, we observe gradual improvements in the representations, starting from (a) no diversity regularization, to (b) diversity on only labeled samples, with network N_1 showing well-defined clusters; then (c) diversity on the whole training set, with better cluster separation, and (d) diversity combined with AT, where clusters are less contaminated under attacks.

5. CONCLUSION

In this work, we presented ARMOURED, a novel method for learning adversarially robust models that unifies semi-supervised learning and diversity regularization in a multi-view framework. ARMOURED alone is robust against standard white-box attacks, as well as against strong adaptive attacks with small perturbation budgets. When combined with adversarial training, ARMOURED demonstrates much better robustness across a wider range of perturbation budgets. Additionally, ARMOURED improves clean accuracy compared with state-of-the-art semi-supervised adversarial training methods. The empirical performance of ARMOURED+AT suggests that it is possible to learn adversarially robust models while upholding reasonable accuracy on clean samples. Extending this method to exploit more than two views or alternative custom kernels for the DPP regularizer could yield further performance gains.

APPENDICES

A ARMOURED METHOD

A.1 DETAILED PSEUDOCODES

Algorithm 1: ARMOURED Minibatch Training Procedure
Input: Labeled samples L = {(x_i, y_i)}_{i=1}^{n_L}; unlabeled samples U = {x_i}_{i=1}^{n_U}; kernel matrix S; random augmentation η(x); hyperparameters (λ_DPP, λ_NEM)
Output: Networks {N_m}_{m=1}^{M} with updated parameters {θ_m}_{m=1}^{M}

for i = 1, ..., n_U do
    for m = 1, ..., M do
        x^m_i = η(x_i)                       // random augmentation
        f^m(x_i) = N_m(x^m_i, θ_m)           // forward pass
    end
    if ŷ_i = arg max_{k=1,...,K} f^m_k(x_i) for all m = 1, ..., M then
        add (x_i, ŷ_i) to L                  // stable sample: pseudo-label it
        remove x_i from U
    end
end
for i = 1, ..., n_L do
    for m = 1, ..., M do
        x^m_i = η(x_i)                       // random augmentation
        f^m(x_i) = N_m(x^m_i, θ_m)           // forward pass
    end
    L(x_i, y_i) = L_CE(x_i, y_i) + λ_DPP L_DPP(x_i, y_i) + λ_NEM L_NEM(x_i, y_i)   // sample loss
end
L = Σ_{i=1}^{n_L} L(x_i, y_i)                // minibatch loss; backpropagate to update {θ_m}

Algorithm 2: ARMOURED Inference Procedure
Input: Test sample x; trained networks {N_m}_{m=1}^{M}
Output: Predicted label ŷ

for m = 1, ..., M do
    x^m = η(x)                               // random augmentation
    f^m(x) = N_m(x^m, θ_m)                   // forward pass
end
f(x) = (1/M) Σ_{m=1}^{M} f^m(x)              // posterior output
ŷ = arg max_{k=1,...,K} f_k(x)               // predicted label

A.2 OPTIMA OF DPP REGULARIZER

For simplicity, we find the maximum of the exponential of the negative loss L_DPP(x, y), defined as

Q(x, y) = exp[-L_DPP(x, y)] = det( F̃_\y^T S_\y F̃_\y )   (2)

Since S_\y is a principal submatrix of S, it is also positive semidefinite. We can decompose it as S_\y = V D V^T, where V is a square matrix whose k-th column is the eigenvector v_k of S_\y, and D is a diagonal matrix whose (k, k)-th element λ_k is the k-th largest eigenvalue of S_\y. Let F̃* be the horizontal concatenation of the first M eigenvectors and let D_M be the top-left M × M block of D, so that S_\y F̃* = F̃* D_M and F̃*^T S_\y F̃* = D_M. The gradient of Q at F̃* is then

∇_{F̃_\y} Q |_{F̃*} = 2 det(F̃*^T S_\y F̃*) S_\y F̃* (F̃*^T S_\y F̃*)^{-1}   (5)
                  = 2 det(D_M) F̃* D_M D_M^{-1}   (6)
                  = 2 det(D_M) F̃*   (7)

Interestingly, since D_M is a diagonal matrix, det(D_M) equals the product of the first M eigenvalues of S_\y. This product is nonnegative because S_\y is positive semidefinite. Therefore, the gradient at F̃* is a nonnegative scaling of F̃* itself. Since the columns of F̃* are normalized to unit length, adding this gradient does not update them any further, i.e., the angular gradient at F̃* is zero.
As shown by Cover & Thomas (1988), given a fixed positive semidefinite kernel, the determinant in equation (2) is a concave function of F̃_\y. Thus, F̃* is a maximum of Q. Note that F̃* is not the only maximum. Let R be an M × M orthogonal matrix, so that F̃* R is a rotation of F̃*. Then F̃* R is also a maximum of Q, because

det( (F̃* R)^T S_\y (F̃* R) ) = det( R^T (F̃*^T S_\y F̃*) R ) = det(R^T) det(D_M) det(R) = det(D_M) = det( F̃*^T S_\y F̃* )   (8)

This means that a family of maxima exists for Q, which includes F̃* and its orthogonal transformations within the M-dimensional subspace spanned by F̃*. For example, when M = 2, the objective Q is maximized at F̃* = [v_1, v_2]:

Q = det( [v_1, v_2]^T S_\y [v_1, v_2] ) = det( diag(λ_1, λ_2) ) = λ_1 λ_2   (9)

Any rotation of (v_1, v_2) in the 2-dimensional plane spanned by them is also a maximum.
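The claims in this subsection are easy to verify numerically. The following NumPy sketch, using a randomly generated PSD kernel (my construction, for illustration only), checks that the top-M eigenvectors attain Q = λ_1 λ_2 and that rotations leave Q unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
K, M = 5, 2
A = rng.standard_normal((K, K))
S = A @ A.T                                   # random PSD stand-in for S_\y
vals, vecs = np.linalg.eigh(S)                # eigenvalues in ascending order
F_star = vecs[:, ::-1][:, :M]                 # first M dominant eigenvectors

def Q(F):
    """Q = det(F^T S F), the exponentiated negative DPP loss."""
    return float(np.linalg.det(F.T @ S @ F))

theta = 0.7                                   # arbitrary 2-D rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
```

Evaluating `Q` at `F_star`, at `F_star @ R`, and at random unit-column matrices reproduces the optimum value, its rotation invariance, and its maximality, respectively.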

A.3 ANALYSIS OF NON-TARGET ENTROPY MAXIMIZATION

For ease of exposition, let g(x) denote the unnormalized logits of f(x), and let the Lipschitz constant L_N be a scalar satisfying

||g(x) - g(x + ε)||_2 ≤ L_N ||ε||_2

The guarded adversarial area (Tsuzuku et al., 2018) is defined by the largest perturbation radius c, measured in ℓ_p distance, satisfying

∀ε: ||ε||_p ≤ c ⇒ f_y(x + ε) ≥ max f_\y(x + ε)

The max/average logit gap is the gap between the target-class logit and the maximal/average non-target-class logit:

maxgap(x) = g_y(x) - max_{k≠y} g_k(x),   avggap(x) = g_y(x) - avg_{k≠y} g_k(x)

We start by introducing the following lemma, which is related to Proposition 1 of Tsuzuku et al. (2018).

Lemma A.1 For any adversarial perturbation smaller than the logit gap divided by the Lipschitz constant, the class prediction is guaranteed not to change.

Proof. Lemma A.1 can be written as

maxgap(x) = g_y(x) - max_{k≠y} g_k(x) ≥ √2 L_N ||ε||_2 ⇒ g_y(x + ε) - max_{k≠y} g_k(x + ε) ≥ 0   (13)

The proof is the same as that of Proposition 1 of Tsuzuku et al. (2018). This lemma suggests that robustness, i.e., the guarded adversarial area, can be increased by decreasing the Lipschitz constant and/or increasing the logit gap. It is often acknowledged, as in the analysis of Tsuzuku et al. (2018), that the Lipschitz constant of a large neural network is very hard to quantify. Instead, we find it easier to enlarge the logit gap by non-target entropy maximization, and we reveal the following relation between them.

Theorem A.2 The non-target entropy H(f̃_\y) is a lower bound of the average logit gap plus a constant. The entropy maximization term encourages a uniform distribution over the non-target classes, i.e., maxgap(x) ≈ avggap(x). By Lemma A.1, this theorem suggests that maximizing the non-target entropy H(f̃_\y) leads to a larger guarded adversarial area.
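Lemma A.1 can be checked on a toy model where the Lipschitz constant is exact: for a linear classifier g(x) = W x, L_N equals the spectral norm of W. The sketch below (my construction, not from the paper) computes the certified ℓ2 radius maxgap/(√2 L_N); no perturbation inside that radius can flip the prediction.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 8))               # toy logits g(x) = W x
L_N = np.linalg.norm(W, 2)                    # spectral norm = exact Lipschitz constant
x = rng.standard_normal(8)
g = W @ x
y = int(g.argmax())                           # predicted class on the clean input
maxgap = g[y] - np.max(np.delete(g, y))       # max logit gap at x
radius = maxgap / (np.sqrt(2) * L_N)          # certified l2 radius from Lemma A.1
```

Sampling perturbations of norm just under `radius` and checking that the argmax never changes gives an empirical confirmation of the bound.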
As a result, the overall robustness to adversarial attacks is improved by introducing the additional non-target entropy maximization loss. Proof. We first restate the claim to prove:

H(f̃_\y) ≤ g_y(x) - avg_{k≠y} g_k(x) + C   (14)

Before the proof, we introduce the following two lemmas and make a mild assumption.

Lemma A.3 LogSumExp is a smooth approximation to, and upper bounded by, the maximum function plus a constant:

log Σ_{k≠y} exp g_k ≤ max_{k≠y} g_k + log(K - 1)

Proof. We relax the summation with maximization and arrive at

log Σ_{k≠y} exp g_k ≤ log( (K - 1) exp(max_{k≠y} g_k) ) = max_{k≠y} g_k + log(K - 1)

Lemma A.4 The following inequality holds for any real vector g of length K:

avg_k g_k ≤ Σ_k g_k exp g_k / Σ_k exp g_k

Proof. W.l.o.g., assume g_k is in descending order, i.e., g_i ≥ g_j for all i < j. Cross-multiplying, the claim is equivalent to

g_1(exp g_1 + ... + exp g_K) + ... + g_K(exp g_1 + ... + exp g_K) ≤ K g_1 exp g_1 + ... + K g_K exp g_K   (18)

The difference between the RHS and LHS can be written as

RHS - LHS = (exp g_1 - exp g_2)(g_1 - g_2) + (exp g_1 - exp g_3)(g_1 - g_3) + ... + (exp g_1 - exp g_K)(g_1 - g_K)
          + (exp g_2 - exp g_3)(g_2 - g_3) + (exp g_2 - exp g_4)(g_2 - g_4) + ... + (exp g_2 - exp g_K)(g_2 - g_K)
          + ... + (exp g_{K-1} - exp g_K)(g_{K-1} - g_K)

Each term is a product of two factors with the same sign, so RHS - LHS is non-negative and the inequality holds.

Assumption A.5 Assume that clean samples are mostly correctly classified:

max_{k≠y} g_k(x) ≤ g_y(x)

Given that we can achieve relatively high classification accuracy on clean samples, this assumption is realistic in most cases. We now prove that inequality (14) holds.
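Lemma A.4, that the plain average never exceeds the softmax-weighted average, can be verified numerically with a small helper (names are mine, for illustration):

```python
import numpy as np

def softmax_weighted_avg(g):
    """Right-hand side of Lemma A.4: sum_k g_k e^{g_k} / sum_k e^{g_k}."""
    w = np.exp(g - g.max())                   # shift for numerical stability
    return float((g * w).sum() / w.sum())
```

Random spot checks across many vectors confirm the inequality, reflecting that the exponential weights put more mass on larger entries.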
$$\begin{aligned}
H(\bar f_{\setminus y}) &= -\sum_{k\ne y} \frac{\exp g_k}{\sum_{j\ne y}\exp g_j} \log \frac{\exp g_k}{\sum_{j\ne y}\exp g_j} &(25)\\
&= \sum_{k\ne y} \frac{\exp g_k}{\sum_{j\ne y}\exp g_j} \Big(\log \sum_{j\ne y}\exp g_j - g_k\Big) &(26)\\
&\le \sum_{k\ne y} \frac{\exp g_k}{\sum_{j\ne y}\exp g_j} \Big(\max_{j\ne y} g_j + \log(K-1) - g_k\Big) &(27)\\
&\le \sum_{k\ne y} \frac{\exp g_k}{\sum_{j\ne y}\exp g_j} \big(g_y + \log(K-1) - g_k\big) &(28)\\
&\le \sum_{k\ne y} \frac{\exp g_k}{\sum_{j\ne y}\exp g_j} \big(g_y + \log(K-1)\big) - \operatorname*{avg}_{k\ne y} g_k &(29)\\
&= g_y - \operatorname*{avg}_{k\ne y} g_k + \log(K-1), &(30)
\end{aligned}$$
where (27) follows from Lemma A.3, (28) from Assumption A.5, and (29) from Lemma A.4. Taking $C = \log(K-1)$ completes the proof.
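Under Assumption A.5, the bound of Theorem A.2 can likewise be verified empirically. The sketch below checks $H(\bar f_{\setminus y}) \le g_y - \operatorname{avg}_{k\ne y} g_k + \log(K-1)$ on random logits whose largest entry is taken as the target class:

```python
import numpy as np

rng = np.random.default_rng(0)

def nontarget_entropy(g, y):
    """Entropy of the softmax restricted to the non-target classes."""
    z = np.delete(g, y)
    p = np.exp(z - z.max())
    p /= p.sum()
    return -(p * np.log(p)).sum()

# Check H(f_{\y}) <= g_y - avg_{k != y} g_k + log(K - 1) on random logits,
# enforcing Assumption A.5 by picking the largest logit as the target.
for _ in range(1000):
    g = rng.normal(size=10)
    y = int(np.argmax(g))
    bound = g[y] - np.delete(g, y).mean() + np.log(len(g) - 1)
    assert nontarget_entropy(g, y) <= bound + 1e-12
print("bound holds on all 1000 samples")
```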

B SUPPLEMENTARY RESULTS

Additional results on CIFAR-10-semi-4k (Table B.6): In this table, we provide the numerical results that are plotted in Figure 3.

Utilization of unlabeled data (Table B.10): We define the utilization rate as the ratio between the number of stable samples and the total number of unlabeled samples in each minibatch. For each setup, we report the average utilization rate over the last 1000 training iterations. While the utilization rates on CIFAR-10-semi-4k are high (about 90%), they are much lower on CIFAR-10-semi-1k (65%-80%) and SVHN-semi-1k (about 80%, except for ARMOURED-F). We suspect that the low utilization rates negatively affect the performance of ARMOURED, but we were not able to investigate this issue further.

The results in Table B.11 show that the ARMOURED variants achieve higher clean accuracy and better robustness against standard attacks when compared to the SSAR benchmarks. However, under AutoAttack, the best benchmark (ARG for $\ell_\infty$ and MT+AT for $\ell_2$) outperforms ARMOURED-F+AT by about 3%. We suspect that the drop in the performance of ARMOURED is due to the low utilization rate (see Table B.10), but we were not able to investigate this issue further. In addition, we plot the robustness against AutoAttack for varying perturbation budgets in Figure B.8. Similar to the results on CIFAR-10-semi-4k, ARMOURED-F shows better performance than the SSAR benchmarks under small attacks.



Each result contains mean and standard deviation statistics computed from three independent runs with different random data seeds (for selecting labeled samples). The DPP regularization term cannot function properly without entropy regularization, due to a trivial optimum at the one-hot vector $\mathbf{1}_y$, as shown by Pang et al. (2019). Hence, we must keep $\mathcal{L}_{NEM}$.



Figure 1: ARMOURED dual-view framework: (a) training and (b) inference procedures. Solid black and dotted red arrows denote forward and backward passes, respectively; double black arrows represent image augmentation; double-dashed green arrows denote pseudo-label filter.

Figure 2: Three models trained on the CIFAR-10-semi-1k setup. We plot the average prediction of test samples with label "airplane". (a) ARMOURED without DPP regularization: each network predicts randomly on non-target classes. (b) ARMOURED-I: with the identity matrix as kernel, network predictions on non-target classes are orthogonal. (c) ARMOURED-H: the hand-crafted kernel causes a clustering effect, where each network prefers a group of classes, either vehicles or animals.

Figure 3: Robustness against AutoAttack vs. perturbation budget on CIFAR-10-semi-4k.

Figure 4: t-SNE plots of feature embeddings from CIFAR-10 test samples generated by ablation models: (a) w/o (L DPP + L NEM ), (b) w/o Unlabeled, (c) ARMOURED-F and (d) ARMOURED-F+AT. For each of the 8 network/method pairs, the clean and adversarial samples are processed together in a single t-SNE run. Adversarial samples are generated with PGD-∞ ( = 8/255). From (a) to (d), the embeddings of adversarial samples are progressively enhanced, while (c) yields the best representations on clean data.

The gradient of $Q$ with respect to $\tilde F_{\setminus y}$ is given by Petersen & Pedersen (2012) as
$$\frac{\partial Q}{\partial \tilde F_{\setminus y}} = 2 \det\big(\tilde F_{\setminus y}^\top S_{\setminus y} \tilde F_{\setminus y}\big)\, S_{\setminus y} \tilde F_{\setminus y} \big(\tilde F_{\setminus y}^\top S_{\setminus y} \tilde F_{\setminus y}\big)^{-1}. \tag{3}$$
Let $F^*$ be the horizontal concatenation of the first $M$ eigenvectors, i.e., $F^* = [v_1, v_2, \ldots, v_M]$. Notice that $F^{*\top} S_{\setminus y} F^* = D_M$, where $D_M$ is the $M \times M$ leading principal submatrix of $D$. We evaluate the gradient at $F^*$ as follows:
$$\frac{\partial Q}{\partial \tilde F_{\setminus y}}\bigg|_{F^*} = 2 \det\big(F^{*\top} S_{\setminus y} F^*\big)\, S_{\setminus y} F^* \big(F^{*\top} S_{\setminus y} F^*\big)^{-1} \tag{4}$$
$$= 2 \det(D_M)\, S_{\setminus y} F^* D_M^{-1}. \tag{5}$$
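Reading equation (3) as the derivative of $Q(F) = \det(F^\top S F)$, the matrix-calculus identity can be confirmed against finite differences. A minimal NumPy check, with a randomly generated PSD kernel standing in for $S_{\setminus y}$:

```python
import numpy as np

rng = np.random.default_rng(1)
K, M = 6, 3
A = rng.normal(size=(K, K))
S = A @ A.T + K * np.eye(K)  # random PSD kernel standing in for S_{\y}
F = rng.normal(size=(K, M))

def Q(F):
    """Q(F) = det(F^T S F), the quantity whose gradient equation (3) states."""
    return np.linalg.det(F.T @ S @ F)

# Closed-form gradient from Petersen & Pedersen (2012)
G = 2 * Q(F) * S @ F @ np.linalg.inv(F.T @ S @ F)

# Finite-difference check on one entry of F
eps = 1e-6
E = np.zeros_like(F)
E[2, 1] = eps
fd = (Q(F + E) - Q(F - E)) / (2 * eps)
assert abs(fd - G[2, 1]) < 1e-4 * max(1.0, abs(fd))
print("closed-form gradient matches finite differences")
```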

IMPLEMENTATION DETAILS

Hyperparameters: We fine-tune $\lambda_{DPP}$ and $\lambda_{NEM}$ with the ARMOURED-I+AT model trained on CIFAR-10 and SVHN individually. The tuning ranges are $\lambda_{DPP} \in \{0.25, 0.5, 1\}$ and $\lambda_{NEM} \in \{1, 2, 4\}$. Each model is trained with one seed and evaluated on the standard validation set (5k labeled samples for CIFAR-10 and 7,325 labeled samples for SVHN). Please see Table A.4 and Table A.5 for the numerical results. After tuning, we apply $(\lambda_{DPP}, \lambda_{NEM}) = (1, 1)$ for SVHN and $(\lambda_{DPP}, \lambda_{NEM}) = (1, 0.5)$ for CIFAR-10.

Random augmentations: For the random augmentations $\eta(x)$, we apply random translations and horizontal flips to CIFAR-10 images, and only random translations to SVHN images.

Feature-based kernel: We learn the feature-based kernel with the following steps; the learned kernels are plotted in Figure A.5.
1. Train an MT+AT model using both labeled and unlabeled training data.
2. Using the teacher model, feed forward the adversarial samples generated from the labeled training data, and extract feature vectors from the last layer of WRN-28-2, before the linear layer.
3. For each class, compute the average feature vector: $b_1, b_2, \ldots, b_K$.
4. Normalize each feature vector by its $L_2$ norm.
5. Form the feature matrix $B = [b_1\, b_2 \ldots b_K]$ and compute the kernel as $S = B^\top B$.
6. Normalize $S$ by its largest eigenvalue (equivalent to $L_2$ normalization).
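Steps 3-6 above amount to a few lines of linear algebra. A sketch, assuming per-sample feature vectors and labels have already been extracted from the teacher (the function name and inputs are hypothetical):

```python
import numpy as np

def feature_kernel(features, labels, num_classes):
    """Build the feature-based DPP kernel from per-sample features (N, d)
    and integer labels (N,), following steps 3-6."""
    # 3. average feature vector per class -> prototype matrix B of shape (d, K)
    B = np.stack([features[labels == k].mean(axis=0)
                  for k in range(num_classes)], axis=1)
    # 4. L2-normalize each class prototype
    B /= np.linalg.norm(B, axis=0, keepdims=True)
    # 5. kernel as the Gram matrix of prototypes
    S = B.T @ B
    # 6. normalize by the largest eigenvalue
    return S / np.linalg.eigvalsh(S).max()

rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 16))          # toy stand-in for teacher features
labels = rng.integers(0, 4, size=200)
S = feature_kernel(feats, labels, num_classes=4)
print(S.shape)  # (4, 4)
```

The resulting $S$ is symmetric and PSD with spectral norm 1, as required of a DPP kernel.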

Figure A.5: Visualization of the learned feature-based kernels.

Figure B.6: Robustness against AutoAttack vs. perturbation budget on SVHN-semi-1k.

Figure B.8: Robustness against AutoAttack vs. perturbation budget on CIFAR-10-semi-1k.

Benchmark results on CIFAR-10-semi-4k

Benchmark results on SVHN-semi-1k

Ablation study on CIFAR-10-semi-4k

compute the minibatch loss $\mathcal{L}$ and backpropagate $\mathcal{L}$ to optimize $\{\theta_m\}_{m=1}^M$

Algorithm 3: Method+AT Minibatch Training Procedure
Input: Labeled samples $L = \{(x_i, y_i)\}_{i=1}^{n_L}$; unlabeled samples $U = \{x_i\}_{i=1}^{n_U}$; SSL technique Method with hyperparameters $\Omega_{Method}$; adversarial attack $\pi(x, y)$
Output: Method model with updated parameters
create $L_{adv} = \emptyset$ and $U_{adv} = \emptyset$  // new empty sets
for $i = 1, \ldots, n_L$ do
    $z_i = \pi(x_i, y_i)$  // adversarial sample
    add $(z_i, y_i)$ to $L_{adv}$
end
for $i = 1, \ldots, n_U$ do
    apply the inference procedure of Method on $x_i$ to generate the pseudo-label $\hat y_i$
    $z_i = \pi(x_i, \hat y_i)$  // adversarial sample
    add $z_i$ to $U_{adv}$
end
execute the training procedure of Method with inputs $L_{adv}$; $U_{adv}$; $\Omega_{Method}$
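Algorithm 3 can be sketched as follows, with `attack`, `pseudo_label` and `train_step` as hypothetical callables standing in for $\pi(x, y)$, Method's inference procedure, and its training update:

```python
def method_at_minibatch(labeled, unlabeled, attack, pseudo_label, train_step):
    """Sketch of Algorithm 3 (Method+AT): replace every sample with its
    adversarial counterpart before running the SSL method's training step."""
    # Labeled samples are attacked using their true labels
    labeled_adv = [(attack(x, y), y) for x, y in labeled]
    # Unlabeled samples are attacked using the method's own pseudo-labels
    unlabeled_adv = [attack(x, pseudo_label(x)) for x in unlabeled]
    return train_step(labeled_adv, unlabeled_adv)

# Toy run with scalar "images": the attack shifts x by its (pseudo-)label
attack = lambda x, y: x + y
pseudo = lambda x: 1
out = method_at_minibatch([(1, 2), (3, 4)], [5], attack, pseudo,
                          lambda L, U: (L, U))
print(out)  # ([(3, 2), (7, 4)], [6])
```

The point of the wrapper is that Method itself is untouched; only its inputs are replaced by adversarial samples.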

Table A.4: Fine-tuning of $\lambda_{DPP}$ and $\lambda_{NEM}$ on CIFAR-10-semi-4k (reporting validation accuracy)

Table A.5: Fine-tuning of $\lambda_{DPP}$ and $\lambda_{NEM}$ on SVHN-semi-1k (reporting validation accuracy)

Additional results on SVHN-semi-1k (Table B.7, Figure B.6): Here, we evaluate the robustness of the ARMOURED variants and the SSAR benchmarks against AutoAttack with varying perturbation budgets. The results show that MT+AT achieves the best robustness. Among the ARMOURED variants, ARMOURED-H+AT and ARMOURED-F+AT are the most robust and are comparable to each other.

Additional results from the ablation study (Table B.8): In this table, we report the full evaluation results from our ablation study, adding results from standard attacks. In addition, we create a new model, w/ $H(\bar f)$, by replacing $H(\bar f_{\setminus y})$ in $\mathcal{L}_{NEM}$ with the entropy of the averaged prediction $\bar f$ over all classes, similar to the term used by Pang et al. (2019). This model is less robust than ARMOURED-F+AT against AutoAttack, suggesting that our entropy regularization is better.

Check on gradient masking (Table B.9): We evaluate ARMOURED-F, ARMOURED-F+AT and other benchmarks against the individual components of AutoAttack. The results show that both ARMOURED-F and ARMOURED-F+AT are very robust against the black-box components (FAB and Square), which suggests that gradient masking is less likely to exist in our models.

Table B.6: Benchmark against AutoAttack with varying budgets on CIFAR-10-semi-4k

Regularization effect of the DPP kernel (Figure B.7): We illustrate the average prediction of test samples generated by the ARMOURED variants. All subplots clearly show that each network has developed a preference for a high or low posterior on each class. For example, in Figure B.7b, network $N_2$ (right side) tends to output high predictions for airplane, automobile, ship and truck, while network $N_1$ (left side) outputs higher predictions for the remaining six classes. This behaviour is promoted by the hand-crafted kernel. The feature-based kernel (Figures B.7c and B.7d) encourages a similar grouping of classes, although the distinctions are less pronounced. With the identity matrix as kernel, the predictions in Figure B.7a also form two groups, but the correlations among classes of the same group are less intuitive.

Results on CIFAR-10-semi-1k (Table B.11, Table B.12, Figure B.8): We conduct an experiment on the CIFAR-10-semi-1k setup.

Table B.8: Ablation study on CIFAR-10-semi-4k (full results)
Table B.9: Benchmark against components of AutoAttack on CIFAR-10-semi-4k
Table B.12: Benchmark against AutoAttack with varying budgets on CIFAR-10-semi-1k

ACKNOWLEDGMENTS

This work is supported by the DSO National Laboratories of Singapore. The authors would like to thank the DSO project team, in particular Dr. Loo Nin Teow and Dr. Bingquan Shen for valuable discussions on adversarial robustness.

