INDISCRIMINATE POISONING ATTACKS ON UNSUPERVISED CONTRASTIVE LEARNING

Abstract

Indiscriminate data poisoning attacks are quite effective against supervised learning. However, not much is known about their impact on unsupervised contrastive learning (CL). This paper is the first to consider indiscriminate poisoning attacks on contrastive learning. We propose Contrastive Poisoning (CP), the first effective such attack on CL. We empirically show that Contrastive Poisoning not only drastically reduces the performance of CL algorithms, but also attacks supervised learning models, making it the most generalizable indiscriminate poisoning attack. We also show that CL algorithms with a momentum encoder are more robust to indiscriminate poisoning, and propose a new countermeasure based on matrix completion. Code is available at: https://github.com/kaiwenzha/contrastive-poisoning.



1. INTRODUCTION

All prior works on indiscriminate poisoning of deep learning are in the context of supervised learning (SL) and use a cross-entropy loss. However, advances in modern machine learning have shown that unsupervised contrastive learning (CL) can match or even exceed the performance of supervised learning on core machine learning tasks (Azizi et al., 2021; Radford et al., 2021; Chen et al., 2020b; 2021; Tian et al., 2021; Jaiswal et al., 2021). Hence, an individual or a company that wants to use a dataset in an unauthorized manner need not use SL; such a malicious actor can instead use CL to learn a highly powerful representation from the unauthorized data.

This paper studies indiscriminate data poisoning attacks on CL. We observe that past indiscriminate poisoning attacks on supervised learning fail in the face of contrastive learning. In Figure 1, we show empirically that three popular contrastive learning algorithms, SimCLR (Chen et al., 2020a), MoCo (Chen et al., 2020c), and BYOL (Grill et al., 2020), are still able to learn highly discriminative features from a dataset poisoned using a state-of-the-art indiscriminate SL poisoning attack (Fowl et al., 2021a), whereas the same attack renders supervised learning completely ineffective. The reason is that indiscriminate poisoning of SL generates poisoning perturbations that are clustered according to the class labels (Yu et al., 2022); such a design is unlikely to be effective against unsupervised CL because its representation learning does not involve any class labels. We propose Contrastive Poisoning (CP), the first indiscriminate data poisoning attack that is effective against CL.
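For context on what a poisoner of CL must attack, the contrastive objective that algorithms like SimCLR minimize can be sketched in a few lines. This is a minimal NumPy sketch under our own naming (not the paper's codebase), simplified to a single direction of positives; practical implementations (e.g., SimCLR's NT-Xent) contrast both views over 2N examples and are usually written with a cross-entropy primitive:

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.5):
    """Simplified InfoNCE loss over a batch of embedding pairs.

    z1, z2: (N, D) embeddings of two augmented views of the same N
    images. z1[i] and z2[i] form a positive pair; all other rows of z2
    act as negatives. No class labels are involved anywhere.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature              # (N, N) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))            # positives sit on the diagonal
```

Because the loss is computed purely from instance-level similarities, class-clustered SL poisons have no label structure to exploit here.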
The design of CP involves three components: 1) a poison generation process that attacks the contrastive learning objective (e.g., the InfoNCE loss); 2) a differentiation procedure that attacks data augmentation in CL; and 3) a dual-branch gradient propagation scheme that attacks CL algorithms with a momentum encoder. We empirically evaluate CP on multiple datasets commonly used in prior work on indiscriminate poisoning attacks (CIFAR-10/-100, STL-10, and ImageNet-100). Our results reveal important new findings:

• Contrastive learning is more robust to indiscriminate data poisoning than supervised learning. All prior indiscriminate poisoning attacks on SL can be evaded by using CL to learn a representation, then using the labels to learn a linear classifier.

• Contrastive Poisoning (CP) is a highly effective attack. Interestingly, the same contrastive poison can attack various CL algorithms (SimCLR, MoCo, BYOL), as well as supervised learning models.

• While CP works on all CL algorithms, models that include a momentum encoder (i.e., MoCo and BYOL) are relatively more robust than those that do not (i.e., SimCLR). Furthermore, it is essential to generate the poison using a momentum encoder; poisons generated without one, i.e., using SimCLR, do not generalize well to other CL algorithms.

• The best defense against poisoning SL, adversarial training (Tao et al., 2021), cannot defend against contrastive poisoning. A new data augmentation based on matrix completion works better.
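The poison-generation component can be pictured as a projected-gradient loop over sample-wise perturbations that *descends* the contrastive loss, so the poisoned data looks already "solved" to the CL learner. The sketch below uses our own function names and a generic `grad_fn` placeholder rather than the released implementation; the actual attack additionally alternates these updates with encoder training and differentiates through the CL augmentations:

```python
import numpy as np

def project_linf(delta, epsilon):
    """Project perturbations back into the L-infinity ball of radius epsilon."""
    return np.clip(delta, -epsilon, epsilon)

def poison_step(images, delta, grad_fn, step_size, epsilon):
    """One projected sign-gradient step on sample-wise poisons.

    grad_fn(poisoned_images) returns d(contrastive loss)/d(input).
    Stepping against the gradient minimizes the loss, the hallmark of
    error-minimizing ("unlearnable") poisons.
    """
    grad = grad_fn(images + delta)
    delta = delta - step_size * np.sign(grad)       # descend, not ascend
    delta = project_linf(delta, epsilon)            # respect the L-inf budget
    # keep poisoned pixels in the valid [0, 1] image range
    delta = np.clip(images + delta, 0.0, 1.0) - images
    return delta
```

With a budget such as ε = 8/255 (standard in the indiscriminate-poisoning literature), repeated steps drive the perturbation toward the boundary of the L∞ ball while the poisoned images stay valid.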

2. RELATED WORK

Indiscriminate poisoning attacks. Indiscriminate poisoning attacks have been well studied in the context of classical machine learning models, such as linear regression and support vector machines (Barreno et al., 2006; Biggio et al., 2012). Indiscriminate poisoning attacks on deep neural networks have recently become an active topic due to the need for protecting data from unauthorized use (Muñoz-González et al., 2017; Feng et al., 2019; Shen et al., 2019; Shan et al., 2020; Cherepanova et al., 2021; Yuan & Wu, 2021; Huang et al., 2021; Fowl et al., 2021a;b). All prior work on indiscriminate data poisoning of deep learning targets supervised learning and uses a cross-entropy loss.

Targeted poisoning and backdoor attacks on CL. The closest past work to ours is in the area of targeted poisoning and backdoor attacks on contrastive learning (Carlini & Terzis, 2022; Jia et al., 2022; Liu et al., 2022). Targeted poisoning attacks perturb the training data to make the poisoned model misclassify a specific data sample (as opposed to all unseen data). Backdoor poisoning attacks, on the other hand, implant a backdoor into the poisoned model to manipulate its behavior only on inputs that include the backdoor trigger (as opposed to any clean input). Carlini & Terzis (2022) investigate targeted and backdoor attacks on a specific multi-modality contrastive learning framework called CLIP (Radford et al., 2021). Saha et al. mount a backdoor attack on contrastive learning by adding triggers to all images from one class in the training set. Truong et al. use the contrastive loss as a regularizer to make neural networks more resilient to backdoor attacks. Our work differs from all of the above attacks and is the first to focus on indiscriminate poisoning of contrastive learning.

Defenses against data poisoning. Prior work (Geiping et al., 2021) has shown that adversarial training (Madry et al., 2018) is the most effective way to counter indiscriminate poisoning attacks. It also considered other defense mechanisms, such as protecting the learning process with differentially-private optimizers like DP-SGD (Hong et al., 2020), and data augmentation techniques (Borgnia et al., 2021; Fowl et al., 2021a) such as additive noise, Gaussian smoothing, Cutout (DeVries & Taylor, 2017), Mixup (Zhang et al., 2018), and CutMix (Yun et al., 2019). This past work is in the context of SL. No past work has investigated defenses against indiscriminate poisoning attacks on CL.

* Equal contribution, determined via a random coin flip

