DISENTANGLING STYLE AND CONTENT FOR LOW RESOURCE VIDEO DOMAIN ADAPTATION: A CASE STUDY ON KEYSTROKE INFERENCE ATTACKS

Abstract

Keystroke inference attacks are a form of side-channel attack in which an attacker leverages various techniques to recover a user's keystrokes as she inputs information into some display (for example, while sending a text message or entering her PIN). Typically, these attacks leverage machine learning approaches, but assessing the realism of the threat space has lagged behind the pace of machine learning advancements, due in part to the challenges of curating large real-life datasets. This paper aims to overcome the challenge of having a limited amount of real-life data by introducing a video domain adaptation technique that is able to leverage synthetic data through supervised disentangled learning. Specifically, for a given domain, we decompose the observed data into two factors of variation: Style and Content. Doing so provides four learned representations: real-life style, synthetic style, real-life content, and synthetic content. We then combine them into feature representations from all combinations of style-content pairings across domains, and train a model on these combined representations to classify the content (i.e., labels) of a given datapoint in the style of another domain. We evaluate our method on real-life data using a variety of metrics to quantify the amount of information an attacker is able to recover. We show that our method prevents our model from overfitting to a small real-life training set, indicating that our method is an effective form of data augmentation.
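
As an illustration of the style-content pairing described above, the following is a minimal sketch rather than the method's actual implementation. It assumes the four representations are already available as plain feature vectors, and it assumes that a style vector and a content vector are combined by concatenation; the function name `pair_style_content` and the toy feature values are hypothetical. The point it shows is that every pairing inherits its label from the content representation, regardless of which domain supplied the style.

```python
from itertools import product


def pair_style_content(styles, contents):
    """Build every style-content pairing across domains.

    styles:   dict mapping domain name -> style feature vector (list of floats)
    contents: dict mapping domain name -> (content feature vector, label)
    Returns a list of (combined_features, label, style_domain, content_domain).
    """
    pairs = []
    for (s_dom, s_vec), (c_dom, (c_vec, label)) in product(
        styles.items(), contents.items()
    ):
        # Assumption: representations are combined by simple concatenation.
        combined = list(s_vec) + list(c_vec)
        # The label always follows the content, never the style.
        pairs.append((combined, label, s_dom, c_dom))
    return pairs


# Toy placeholder features (hypothetical values, not learned representations).
styles = {"real": [0.1, 0.2], "synthetic": [0.9, 0.8]}
contents = {
    "real": ([1.0, 0.0, 0.0], "a"),
    "synthetic": ([0.0, 1.0, 0.0], "b"),
}

pairs = pair_style_content(styles, contents)
for features, label, s_dom, c_dom in pairs:
    print(f"style={s_dom:9s} content={c_dom:9s} label={label} features={features}")
```

With two domains this yields four combined representations, two of which mix style and content across domains. A classifier trained on all four would see each content label under both styles, which is the sense in which the pairing acts as data augmentation.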

1. INTRODUCTION

We are exceedingly reliant on our mobile devices in our everyday lives. Numerous activities, such as banking, communications, and information retrieval, have collapsed from separate channels into one: our mobile phones. While this has made many of our lives more convenient, this phenomenon further incentivizes attackers seeking to steal information from users. Therefore, studying different attack vectors and understanding the realistic threats that arise from attackers' abilities to recover user information is imperative to formulating defenses. The argument for studying these attacks is not a new one. A rich literature of prior work studying both attacks and defenses has assessed a wide array of potential attack vectors. The majority of these attacks utilize various machine learning algorithms to predict the user's keystrokes (Raguram et al., 2011; Cai & Chen, 2012; Xu et al., 2013; Sun et al., 2016; Chen et al., 2018; Lim et al., 2020), but the ability to assess attackers leveraging deep learning methods has lagged due to the high cost of curating real-life datasets for this domain and the lack of publicly available datasets. Despite all the recent attention to keystroke inference attacks, numerous questions have gone unanswered. Which defenses work against adversaries who leverage deep learning systems? Which defenses are easily undermined? Are there weaknesses in deep learning systems that we can use to develop better defenses to thwart state-of-the-art attacks? These questions capture the essence of the underlying principles for research into defenses for keystroke inference attacks. Given the back-and-forth nature of researching attacks and defenses, these questions cannot be addressed because of the current inability to assess attacks with deep learning methods.
This paper aims to overcome the challenge of having a limited amount of labeled, real-life data by introducing a video domain adaptation technique that is able to leverage abundantly labeled synthetic

