SHUFFLED TRANSFORMERS FOR BLIND TRAINING

Abstract

Conventional split learning faces the challenge of preserving training data and model privacy, as part of the training is beyond the data owner's control. We tackle this problem by introducing blind training, i.e., training without awareness of the data or the model, realized by shuffled Transformers. This is enabled by our intriguing finding that the inputs and the model weights of the Transformer encoder blocks, the backbone of the Transformer, can be shuffled without degrading model performance. We not only prove the shuffling-invariance property in theory, but also design a privacy-preserving split learning framework based on the property, with little modification to the original Transformer architecture. We verify the property through experiments, and show that our proposed framework successfully defends against privacy attacks on split learning, outperforming existing approaches.

1. INTRODUCTION

Recent years have witnessed remarkable growth in deep learning applications, as deep neural networks (DNNs) have grown deeper and larger. This poses a dilemma for the thin edge device: on one hand, it lacks the computational power to train the models on its own; on the other, data privacy would be violated if it sent all data to an untrusted party, e.g., the cloud, to process. A paradigm called split learning (Gupta & Raskar, 2018) has emerged as a potential solution: without sharing its raw data, the edge transmits intermediate features to the cloud, offloading part of the computation. Typically, the private inputs are transformed into intermediate features by feeding them through the first few layers of the DNN. Vanilla split learning still faces privacy leakage, as an adversary could infer the input from the feature (Erdogan et al., 2021; Isola et al., 2017). Hence many works have proposed to remove the sensitive information from the features via encryption (Lee et al., 2022), adversarial learning (Xiao et al., 2020), differential privacy (Dong et al., 2019), etc. However, these works mostly sacrifice accuracy or efficiency for the privacy guarantee. More importantly, the privacy threat posed by the model weights trained on the cloud remains an open problem: the trained weights reveal private information about the training data (Fredrikson et al., 2015; Carlini et al., 2019; Zhang et al., 2020), and should be proprietary to the data owner, i.e., the edge.

We propose a novel blind training framework on the Transformer (Steiner et al., 2021), a state-of-the-art DNN achieving impressive accuracy on a wide range of tasks. Blind training means that the cloud conducts its part of the computation 'in blind': it is unaware of the data or the model it trains, yet executes valid computation to assist the edge. The framework resembles homomorphic encryption, where the edge encrypts training data with its key and feeds it to the encrypted DNN hosted in the cloud.
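To make the split-learning data flow concrete, the following minimal sketch (a hypothetical single-linear-layer split with illustrative shapes and learning rate, not the architecture studied in this paper) shows the edge computing an intermediate feature on its private data, the cloud completing the forward pass, and gradients flowing back across the split:

```python
import numpy as np

# Hypothetical split-learning round: the edge runs the first (linear) layer
# on its private data and transmits only the intermediate feature; the
# cloud runs the remaining layer and sends the feature gradient back.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))           # private data, stays on the edge
y = rng.standard_normal((4, 1))            # labels, also kept on the edge
W_edge = 0.1 * rng.standard_normal((16, 8))
W_cloud = 0.1 * rng.standard_normal((8, 1))

losses = []
for step in range(200):
    feat = x @ W_edge                      # edge: forward, send `feat` up
    pred = feat @ W_cloud                  # cloud: forward, return `pred`
    err = pred - y                         # edge: gradient of 0.5*MSE loss
    losses.append(float((err ** 2).mean()))
    g_cloud = feat.T @ err                 # cloud: its own weight gradient
    g_feat = err @ W_cloud.T               # cloud: gradient sent back down
    g_edge = x.T @ g_feat                  # edge: weight gradient, locally
    W_cloud -= 0.01 * g_cloud
    W_edge -= 0.01 * g_edge
```

The cloud never observes `x` or `y`, only `feat` and the returned error signal; the leakage discussed above arises because an adversary can try to invert `feat` back to `x`, or mine the trained cloud-side weights.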
The cloud trains the DNN in ciphertext, without knowing the input or the model. Different from cryptographic tools, our framework operates entirely in plaintext, thus avoiding the hassle of encryption. The key is to exploit the shuffling-invariance property of Transformers. We discovered that Transformers have an intriguing property: each input, be it an image or a sentence, can be randomly permuted within itself before being fed through the network, yet the training is equivalent to that without permutation. Although previous work (Naseer et al., 2021) recognized that the Transformer is agnostic to positional information in the absence of position embeddings, we non-trivially prove that the Transformer is shuffling-invariant even with position embeddings. By regarding the permutation order as a 'key,' the edge feeds shuffled training data to the cloud, which performs natural training. Another interesting property we found is that, by training on the shuffled data, we inher-

