ROBUSTNESS OF UNSUPERVISED REPRESENTATION LEARNING WITHOUT LABELS

Abstract

Unsupervised representation learning leverages large unlabeled datasets and is competitive with supervised learning. However, a non-robust encoder may compromise the robustness of downstream tasks. Robust representation encoders have therefore recently become of interest, yet all prior work evaluates robustness using a downstream classification task. Instead, we propose a family of unsupervised robustness measures, which are model- and task-agnostic and label-free. We benchmark state-of-the-art representation encoders and show that none dominates the rest. We offer unsupervised extensions of the FGSM and PGD attacks which, when used in adversarial training, improve most unsupervised robustness measures, including certified robustness. We validate our results against a linear probe and show that, for MOCOv2, adversarial training results in 3 times higher certified accuracy, a 2-fold decrease in impersonation attack success rate, and considerable improvements in certified robustness.

1. INTRODUCTION

Unsupervised and self-supervised models extract useful representations without requiring labels. By leveraging large unlabeled datasets, they learn patterns in the data and are competitive with supervised models for image classification (He et al., 2020; Chen et al., 2020b;c;d; Zbontar et al., 2021; Chen & He, 2021). Representation encoders do not use task-specific labels and can be employed for various downstream tasks. Such reuse is attractive, as the large datasets involved make encoders expensive to train; applications are therefore often built on top of public-domain representation encoders. However, a lack of robustness in the encoder can propagate to the downstream task.

Consider the impersonation attack threat model in Fig. 1. An attacker tries to fool a classifier that uses a representation encoder. The attacker has white-box access to the representation extractor (e.g. an open-source model) but no access to the classification model that consumes the representations. By optimizing an input to be similar to a benign input while having the representation of a different target input, the attacker can fool the classifier. Even if the classifier is private, one can attack the combined system whenever the public encoder conflates two different concepts onto similar representations. Robustness against such conflation is therefore necessary to perform downstream inference on robust features.

We currently lack ways to evaluate the robustness of representation encoders without specializing to a particular task. While prior work has proposed improving the robustness of self-supervised representation learning (Alayrac et al., 2019; Kim et al., 2020; Jiang et al., 2020; Ho & Vasconcelos, 2020; Chen et al., 2020a; Cemgil et al., 2020; Carmon et al., 2020; Gowal et al., 2020; Fan et al., 2021; Nguyen et al., 2022; Kim et al., 2022), all of these approaches require labeled datasets to evaluate the robustness of the resulting models.
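To make the threat model concrete, the following sketch runs a PGD-style impersonation attack against a toy linear encoder. The encoder, the L∞ budget, and the step schedule are illustrative assumptions rather than the paper's exact setup; against a real released model the attacker would backpropagate through the network instead of using this closed-form gradient. The attacker stays within a small ball around a benign input while driving its representation towards that of a different target input:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a public representation encoder: a fixed random
# linear map (hypothetical; real attacks use the released network).
W = rng.standard_normal((16, 32))
encode = lambda x: W @ x

x_benign = rng.standard_normal(32)  # input the adversarial example must resemble
x_target = rng.standard_normal(32)  # input whose representation is impersonated
z_target = encode(x_target)

eps, alpha, steps = 0.5, 0.05, 100  # L-inf budget, step size, iterations
x_adv = x_benign.copy()
for _ in range(steps):
    # Gradient of ||encode(x) - z_target||^2 w.r.t. x (closed form here
    # because the toy encoder is linear; autograd in general).
    grad = 2.0 * W.T @ (encode(x_adv) - z_target)
    x_adv = x_adv - alpha * np.sign(grad)                    # signed descent step
    x_adv = np.clip(x_adv, x_benign - eps, x_benign + eps)   # project onto the ball

d_benign = np.linalg.norm(encode(x_benign) - z_target)
d_adv = np.linalg.norm(encode(x_adv) - z_target)
```

After the loop, `x_adv` remains within `eps` of the benign input in L∞ norm, yet its representation is closer to the target's than the benign input's is; a private classifier operating on representations would then treat it like the target.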
Instead, we offer encoder robustness evaluation without labels. This is task-agnostic, in contrast to supervised assessment, as labels are (implicitly) associated with a specific task. Labels can also be incomplete, misleading or stereotyping (Stock & Cisse, 2018; Steed & Caliskan, 2021; Birhane & Prabhu, 2021), and can inadvertently impose biases on the robustness assessment. In this work, we propose measures that do not require labels, as well as methods for unsupervised adversarial training that result in more robust models. To the best of our knowledge, this is the first work on unsupervised robustness evaluation, and we make the following contributions to address this problem:

1. Novel representational robustness measures based on clean-adversarial representation divergences, requiring no labels or assumptions about underlying decision boundaries.
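As a minimal illustration of a clean-adversarial representation divergence, the sketch below scores an encoder by how far its representation moves under a single unsupervised FGSM-style step, i.e. a step that ascends the representation distance itself rather than any classification loss. The linear toy encoder and the function name `unsup_fgsm_divergence` are hypothetical; a real implementation would backpropagate through the actual network:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for a pretrained encoder (hypothetical): a fixed linear map.
W = rng.standard_normal((16, 32))
encode = lambda x: W @ x

def unsup_fgsm_divergence(x, eps):
    """Label-free robustness score: representation shift caused by one
    FGSM-style step maximizing ||encode(x + delta) - encode(x)||.
    No labels, classifier, or decision boundary is involved."""
    z = encode(x)
    # The gradient of the squared representation distance vanishes at
    # delta = 0, so linearize around a tiny random start (a real
    # implementation would backpropagate through the network).
    delta = 1e-3 * rng.standard_normal(x.shape)
    grad = 2.0 * W.T @ (encode(x + delta) - z)  # closed form for the linear toy
    x_adv = x + eps * np.sign(grad)             # one unsupervised FGSM step
    return np.linalg.norm(encode(x_adv) - z)    # larger = less robust

x = rng.standard_normal(32)
score = unsup_fgsm_divergence(x, 0.1)
```

Averaging such scores over an unlabeled dataset yields a task-agnostic benchmark number per encoder, which is the kind of quantity the measures above formalize.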

