VERA VERTO: MULTIMODAL HIJACKING ATTACK

Abstract

The increasing cost of training machine learning (ML) models has led to the inclusion of new parties in the training pipeline, such as users who contribute training data and companies that provide computing resources. The involvement of these new parties in the ML training process has introduced new attack surfaces for an adversary to exploit. A recent attack in this domain is the model hijacking attack, whereby an adversary hijacks a victim model to implement their own, possibly malicious, hijacking task. However, the scope of the model hijacking attack has so far been limited to computer vision tasks. In this paper, we transform the model hijacking attack into a more general multimodal setting, where the hijacking and original tasks are performed on data of different modalities. Specifically, we focus on the setting where an adversary implements a natural language processing (NLP) hijacking task into an image classification model. To mount the attack, we propose a novel encoder-decoder based framework, namely the Blender, which relies on advanced image and language models. Experimental results show that our modal hijacking attack achieves strong performance in different settings. For instance, our attack achieves attack success rates of 94%, 94%, and 95% when using the Sogou news dataset to hijack STL10, CIFAR-10, and MNIST classifiers, respectively.

1. INTRODUCTION

Machine learning (ML) has become a critical component of various applications. Yet, as a result of this development, ML models have become increasingly expensive to train. Hence, the training of ML models has gradually transformed into a joint process, e.g., new parties are included in the training of the model either by providing data or computational resources. However, the involvement of these new parties has created new attack surfaces against ML models, e.g., poisoning and backdoor attacks (Shafahi et al., 2018; Chen et al., 2017). Another recent attack in this domain is the model hijacking attack (Salem et al., 2022a), where the adversary is able to implement their own hijacking task into a target victim model. Concretely, the adversary poisons the training dataset of the target model with their own hijacking dataset. The hijacking dataset is first camouflaged, for stealthiness, to look similar to the target model's dataset. This attack can induce two different risks. The first is accountability, the main threat of hijacking attacks: the model owner can be framed by the adversary as performing illegal or unethical tasks without their knowledge. The second is parasitic computing, where the model owner pays the model maintenance costs while the adversary uses or offers the model for their own application or service for free. On the other hand, the model hijacking technique can also be adapted to compress models, i.e., training a single model for multiple tasks. However, previous work limits the applicable domains to computer vision (CV) related tasks, even though ML has achieved great success in many other domains, e.g., the multiple available translators such as DeepL and Google Translate, and the different face detectors on social media platforms. Moreover, the previous model hijacking attack mandates that the hijacking and original tasks have the same modality.
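The poisoning step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dataset shapes, the random stand-in for the camouflaged samples, and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the victim's original (image) training set:
# 100 samples of 32x32 RGB images with 10 classes.
original_images = rng.random((100, 32, 32, 3))
original_labels = rng.integers(0, 10, size=100)

# Stand-in for the adversary's camouflaged hijacking samples. In the
# attack, each sample would be a hijacking-task input transformed to
# visually resemble the original dataset; here random images of the
# same shape serve as placeholders.
camouflaged_images = rng.random((10, 32, 32, 3))
hijack_labels = rng.integers(0, 10, size=10)  # labels chosen by the adversary's mapping

# The poisoned training set simply mixes both sources and shuffles;
# the victim then trains on it without noticing the extra samples.
poisoned_x = np.concatenate([original_images, camouflaged_images])
poisoned_y = np.concatenate([original_labels, hijack_labels])
perm = rng.permutation(len(poisoned_x))
poisoned_x, poisoned_y = poisoned_x[perm], poisoned_y[perm]
```

The key point the sketch captures is that the poisoned set is indistinguishable in shape and label space from the clean one, which is what makes the camouflage step effective.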
Relaxing this assumption significantly increases the risks of the model hijacking attack, as the adversary can now target models with different modalities, i.e., more target models exist for the adversary to attack. In this paper, therefore, we transform the model hijacking attack into a more general multimodal setting, i.e., implementing a hijacking task from a completely different domain. More concretely, the adversary can implement an NLP hijacking task into a CV target model, as illustrated in Figure 1. For short, we refer to our attack as the modal hijacking attack. Our modal hijacking attack follows the same threat model as the model hijacking and poisoning attacks (Jagielski et al., 2018; Shafahi et al., 2018).
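Implementing an NLP task inside a CV model requires an agreement between the hijacking-task labels and the original-task labels. A minimal sketch of such a label mapping is shown below; the class names and the one-to-one assignment are illustrative assumptions, not the mapping used in the paper.

```python
# Hypothetical label mapping for a modal hijacking attack: each class
# of the NLP hijacking task (e.g., news categories) is assigned to one
# class of the original image task (e.g., CIFAR-10 classes).
nlp_classes = ["sports", "finance", "entertainment", "automobile", "technology"]
cv_classes = ["airplane", "automobile", "bird", "cat", "deer",
              "dog", "frog", "horse", "ship", "truck"]

# Map the i-th hijacking label to the i-th original label.
label_map = {nlp: cv_classes[i] for i, nlp in enumerate(nlp_classes)}
inverse_map = {cv: nlp for nlp, cv in label_map.items()}

# At inference time, the adversary feeds a camouflaged text sample to
# the victim model, observes the predicted image class, and decodes it
# back through the inverse mapping to recover the NLP prediction.
victim_prediction = label_map["finance"]   # what the victim model outputs
recovered_label = inverse_map[victim_prediction]
```

Because the mapping is fixed and invertible, the victim model never needs to expose anything beyond its ordinary image-class outputs.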

