ROBUSTNESS EXPLORATION OF SEMANTIC INFORMATION IN ADVERSARIAL TRAINING

Abstract

In this paper, we examine the problem of adversarial robustness from the perspective of semantic information. We present a novel insight: adversarial attacks destroy the correlation between visual representations and semantic word vectors, and adversarial training restores it. We further find that the correlations among robust features of different categories are consistent with the correlations among the corresponding semantic word vectors. Based on these observations, we introduce semantic information to assist model training and propose Semantic Constraint Adversarial Robust Learning (SCARL). First, through an information-theoretic lens, we formulate the mutual information between a visual representation and the corresponding semantic word vector in the embedding space to bridge the information gap, and we provide a differentiable lower bound to optimize this mutual information efficiently. Second, we propose a novel semantic structural constraint that encourages the trained model to keep the structure of visual representations consistent with that of semantic word vectors. Finally, we combine these two techniques with adversarial training to learn robust visual representations. Extensive experiments on several benchmarks demonstrate that semantic information is indeed beneficial to model robustness.

1. INTRODUCTION

Word embedding is one of the critical technologies in natural language processing (Pennington et al., 2014; Goldberg & Levy, 2014; Tang et al., 2014). It exploits the co-occurrence frequencies of word pairs within a given context in a large-scale training corpus to learn an encoder that can infer a vector for any word in a learned embedding space. A well-trained word embedding model is usually regarded as a knowledge graph (Matthews & Matthews, 2001; Wang et al., 2018), in which the meaning of a word is determined by its relationship to other words in the learned vector space. That is, analogies and correlations between words can be represented by the learned vectors (Hohman et al., 2018; Chersoni et al., 2021), which helps a model associate seen objects with unseen ones. Recently, several works have explored using semantic word/text embeddings as supervision signals for zero-shot learning and visual-linguistic pre-training, achieving impressive successes in various AI tasks (Qiao et al., 2017; Wang et al., 2018; Radford et al., 2021; Wang et al., 2022). On the other hand, deep neural networks are usually vulnerable to adversarial examples (Szegedy et al., 2014; Goodfellow et al., 2015; Madry et al., 2018; Bhojanapalli et al., 2021), which severely limits their application in many security-sensitive scenarios. Fortunately, some studies (Radford et al., 2021; Yu et al., 2022) have shown that visual models trained with semantic supervisory information are much more robust to distribution shift and adversarial examples than standard trained models. These preliminary findings raise a natural question:
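The analogy property described above can be sketched with a minimal toy example: word relations appear as vector offsets in the embedding space, so a query such as "king" - "man" + "woman" lands nearest to "queen" under cosine similarity. The 4-dimensional vectors below are hand-crafted for illustration only, not taken from any trained model; real embeddings such as GloVe are learned from corpus co-occurrence statistics and typically have 100-300 dimensions.

```python
import numpy as np

# Hypothetical toy embeddings (illustrative only, not learned vectors).
vectors = {
    "king":  np.array([0.9, 0.8, 0.1, 0.2]),
    "queen": np.array([0.9, 0.1, 0.8, 0.2]),
    "man":   np.array([0.1, 0.9, 0.1, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9, 0.1]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Analogy query: king - man + woman -> ?
query = vectors["king"] - vectors["man"] + vectors["woman"]
best = max(vectors, key=lambda w: cosine(query, vectors[w]))
print(best)  # -> queen
```

The same nearest-neighbor-of-an-offset idea underlies the structural consistency between word vectors that the semantic constraint in this paper builds on.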

What is the impact of semantic information on adversarial robustness?

To answer this question, we explore the relationship between semantic information and model robustness from two aspects: distribution relevance and structural relevance. First, we apply canonical correlation analysis (CCA) (Hotelling, 1992), which reflects the connection between two random variables, to analyze the distribution relevance between the visual representation and the corresponding semantic word vector. We mainly analyze the correlation coefficients of natural and adversarial image representations with semantic word vectors under non-robust and robust models (Madry et al.,

