ROBUSTNESS EXPLORATION OF SEMANTIC INFORMATION IN ADVERSARIAL TRAINING

Abstract

In this paper, we study adversarial robustness from the perspective of semantic information. We present a novel insight: adversarial attacks destroy the correlation between visual representations and semantic word vectors, and adversarial training restores it. We further find that the correlation structure among robust features of different categories is consistent with that among the corresponding semantic word vectors. Based on these observations, we introduce semantic information to assist model training and propose Semantic Constraint Adversarial Robust Learning (SCARL). First, from an information-theoretic view, we formulate the mutual information between the visual representation and the corresponding semantic word vector in the embedding space to bridge the information gap, and we provide a differentiable lower bound to optimize this mutual information efficiently. Second, we propose a novel semantic structural constraint that encourages the trained model to keep the structure of visual representations consistent with that of the semantic word vectors. Finally, we combine these two techniques with adversarial training to learn robust visual representations. Extensive experiments on several benchmarks demonstrate that semantic information is indeed beneficial to model robustness.

1. INTRODUCTION

Word embedding is one of the critical technologies in natural language processing (Pennington et al., 2014; Goldberg & Levy, 2014; Tang et al., 2014). It exploits the co-occurrence statistics of word pairs within a given context in a large-scale training corpus to learn an encoder that maps any word to a vector in a learned embedding space. A well-trained word embedding model is usually regarded as a knowledge graph (Matthews & Matthews, 2001; Wang et al., 2018), in which the meaning of a word is determined by its relationship to other words in the learned vector space. That is, analogies and correlations between words are captured by the learned vectors (Hohman et al., 2018; Chersoni et al., 2021), which helps a model associate seen objects with unseen ones. Recently, several works have explored using semantic word/text embeddings as supervision signals for zero-shot learning and visual-linguistic pre-training, achieving impressive successes in various AI tasks (Qiao et al., 2017; Wang et al., 2018; Radford et al., 2021; Wang et al., 2022). On the other hand, deep neural networks are usually vulnerable to adversarial examples (Szegedy et al., 2014; Goodfellow et al., 2015; Madry et al., 2018; Bhojanapalli et al., 2021), which severely limits their application in security-critical scenarios. Fortunately, some studies (Radford et al., 2021; Yu et al., 2022) have shown that visual models trained with semantically supervised information are much more robust to distribution shift and adversarial examples than standard-trained models. These preliminary findings raise a natural question:
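The property that learned word vectors capture relatedness between words can be illustrated with a minimal sketch. The vectors below are toy values invented for illustration, not output of an actual GloVe model; only the qualitative pattern (related words score higher under cosine similarity) mirrors the behavior described above.

```python
import numpy as np

# Toy 4-dimensional "word vectors" (illustrative values, not real GloVe output).
vectors = {
    "cat":   np.array([0.9, 0.8, 0.1, 0.0]),
    "dog":   np.array([0.8, 0.9, 0.2, 0.1]),
    "truck": np.array([0.1, 0.0, 0.9, 0.8]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Semantically related words (cat/dog) score higher than unrelated ones (cat/truck).
print(cosine(vectors["cat"], vectors["dog"]), cosine(vectors["cat"], vectors["truck"]))
```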

What is the impact of semantic information on adversarial robustness?

To answer this question, we explore the relationship between semantic information and model robustness from two aspects: distribution relevance and structural relevance. First, we apply canonical correlation analysis (CCA) (Hotelling, 1992), which reflects the connection between two random variables, to analyze the distribution relevance between the visual representation and the corresponding semantic word vector. We mainly analyze the correlation coefficient between natural/adversarial image representations and semantic word vectors under non-robust and robust models (Madry et al., 2018). As shown in Figure 1, under the robust model the correlation between the visual representation and the word vector is significantly enhanced. We can therefore summarize an intriguing property: the more robust the model, the stronger the correlation. Second, to verify that semantic word vectors capture analogies and correlations between words, we visualize the similarity matrix of word vectors generated by a trained GloVe model (Pennington et al., 2014) on CIFAR-10, shown in Figure 2 (c). As the figure shows, the correlation between category 3 (Cat) and category 5 (Dog) is stronger than that between category 3 (Cat) and category 9 (Truck). We further visualize the similarity between different categories of non-robust features and of robust features, shown in Figure 2 (a) and (b) respectively. We observe that robust features also reflect the relatedness between categories, and this relatedness is similar to that reflected by the semantic word vectors, whereas non-robust features cannot reflect the association between categories. Recently, CLIP (Radford et al., 2021) used large-scale image-text pairs to jointly learn semantic representations, so we also visualize the correlation matrix of the semantic representations learned by CLIP, shown in Figure 2 (d).
The semantic representations learned by CLIP present analogies and correlations between categories, but there remains a gap with the real semantics. Motivated by this analysis, we introduce the semantic information learned by word embeddings into model training, aiming to improve the robustness of current neural networks (He et al., 2016a). We follow an information-theoretic perspective to bridge the information gap
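The abstract mentions a differentiable lower bound on the mutual information between visual representations and word vectors. The paper's exact bound is not reproduced in this excerpt; as a hedged illustration, the sketch below implements InfoNCE, a standard contrastive lower bound on mutual information, assuming both views have already been projected to a shared embedding space. The function name and the temperature parameter are hypothetical choices for this sketch.

```python
import numpy as np

def info_nce_lower_bound(visual, words, temperature=0.1):
    """InfoNCE estimate over a batch of aligned (visual, word) pairs: a
    standard differentiable lower bound on I(visual; words). The paper's
    exact bound may differ; this is the common contrastive form."""
    # L2-normalize both views so the pairwise score is a cosine similarity.
    v = visual / np.linalg.norm(visual, axis=1, keepdims=True)
    w = words / np.linalg.norm(words, axis=1, keepdims=True)
    logits = v @ w.T / temperature  # (N, N): row i scored against all words
    # Row-wise log-softmax; the diagonal entries are the positive pairs.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    n = len(visual)
    # E[log p(positive)] + log N lower-bounds the mutual information.
    return log_prob[np.arange(n), np.arange(n)].mean() + np.log(n)

# Usage: with perfectly aligned views the bound approaches log N.
rng = np.random.default_rng(0)
z = rng.standard_normal((128, 64))
print(f"MI lower bound (aligned pairs): {info_nce_lower_bound(z, z):.3f}")
```

Because the bound is a smooth function of the representations, it can be maximized by gradient ascent alongside the adversarial training loss.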



The CIFAR-10 contains 10 categories: airplane (0), car (1), bird (2), cat (3), deer (4), dog (5), frog (6), horse (7), ship (8), truck (9).



Figure 1: Canonical correlation analysis (CCA) of natural and adversarial images with the semantic word vectors under naturally and adversarially trained models, respectively. In each plot, we sample 500 image-word pairs to calculate the correlation coefficient. * indicates that the model is trained by FGSM and is less robust than the standard adversarially trained model (d). The larger the CCA coefficient, the stronger the correlation between the visual representation and the semantic word vector.

Figure 2: The similarity matrix between different categories of features learned by different models on CIFAR-10. Different numbers represent different categories. The similarity is calculated by taking the inner product between normalized features of different categories. Brighter colors indicate larger similarity.
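The similarity matrices of Figure 2 can be reproduced with a short sketch. One detail is assumed here: each category is summarized by the mean of its (normalized) features before taking inner products, since the caption does not specify how per-category features are aggregated.

```python
import numpy as np

def class_similarity_matrix(features, labels, n_classes=10):
    """Figure-2-style matrix: average features per class (an assumed
    aggregation), L2-normalize the class prototypes, then take all
    pairwise inner products."""
    means = np.stack([features[labels == c].mean(axis=0)
                      for c in range(n_classes)])
    means /= np.linalg.norm(means, axis=1, keepdims=True)
    return means @ means.T  # (n_classes, n_classes), diagonal = 1

# Usage with synthetic features standing in for a model's representations.
rng = np.random.default_rng(0)
feats = rng.standard_normal((1000, 16))
labels = rng.integers(0, 10, size=1000)
S = class_similarity_matrix(feats, labels)
print(S.shape)
```

For robust features, off-diagonal entries such as S[3, 5] (Cat-Dog) would be expected to exceed S[3, 9] (Cat-Truck), matching the pattern of the word-vector matrix in Figure 2 (c).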

