RADIAL SPIKE AND SLAB BAYESIAN NEURAL NETWORKS FOR SPARSE DATA IN RANSOMWARE ATTACKS

Abstract

Paper under double-blind review

Ransomware attacks are increasing at an alarming rate, leading to large financial losses, unrecoverable encrypted data, data leakage, and privacy concerns. Prompt detection of ransomware attacks, particularly during the encryption stage, is required to minimize further damage. However, the frequency and structure of the observed ransomware attack data make this task difficult to accomplish in practice. The data corresponding to ransomware attacks represents temporal, high-dimensional sparse signals, with limited records and very imbalanced classes. While traditional deep learning models have achieved state-of-the-art results in a wide variety of domains, Bayesian Neural Networks, which are a class of probabilistic models, are better suited to these characteristics of ransomware data. These models combine ideas from Bayesian statistics with the rich expressive power of neural networks. In this paper, we propose the Radial Spike and Slab Bayesian Neural Network, a new type of Bayesian Neural Network that includes a new form of the approximate posterior distribution. The model scales well to large architectures and recovers the sparse structure of target functions. We provide a theoretical justification for using this type of distribution, as well as a computationally efficient method to perform variational inference. We demonstrate the performance of our model on a real dataset of ransomware attacks and show improvement over a large number of baselines, including state-of-the-art models such as Neural ODEs (ordinary differential equations). In addition, we propose to represent low-level events as MITRE ATT&CK tactics, techniques, and procedures (TTPs), which allows the model to better generalize to unseen ransomware attacks.

1. INTRODUCTION

Ransomware attacks are increasing rapidly and causing significant losses to governments, corporations, non-governmental organizations, and individuals. The losses may include financial costs due to ransoms paid to decrypt assets, unrecoverable files when the ransom is not paid or the attacker fails to provide the decryption key, privacy and intellectual property theft when assets are exfiltrated, and even significant injury when ransomware impairs health care devices or patient records in hospitals. Timely detection of ransomware incidents is therefore necessary to minimize the number of assets that are encrypted or exfiltrated (Urooj et al., 2021). To improve the ransomware response, this work proposes a new Bayesian Neural Network model that offers improved detection rates for organizations which employ analysts to protect their assets and networks. The problem is usually framed as a binary detection task whose two classes are ransomware and non-ransomware. Traditional statistical and machine learning methods have been proposed to detect security threats in general and, in some cases, ransomware specifically. From the statistical perspective, a common approach is the application of Bayesian Networks (Perusquía et al., 2020; Oyen et al., 2016; Shin et al., 2015), whose main goal is to model the relationship between the observed signal and the class of the attack as a graphical model. From the machine learning perspective, a range of models, such as Naive Bayes, Gradient Boosting, and Random Forests, have been used to detect ransomware (Alhawi et al., 2018; Poudyal et al., 2018; Zhang et al., 2019; Larsen et al., 2021).

Bottleneck. Traditional deep learning models offer rich expressive power, but training them usually requires access to a large number of records to obtain robust, generalizable results.
Unfortunately, the frequency and structure of commonly observed data corresponding to ransomware attacks make this task more difficult to accomplish. In particular, ransomware attack data can be represented as temporal, high-dimensional sparse signals, with a limited number of records and very imbalanced classes. In our data, ransomware attacks make up 1% of the attacks and non-ransomware attacks the remaining 99%.

Main ideas and contributions. To address these unique features of the ransomware data, we first propose to represent ransomware signals according to their MITRE ATT&CK tactics, techniques, and procedures (TTPs), which allows us to generalize ransomware and other attacks at a higher level instead of relying on the low-level detections associated with an individual attack. In addition, this allows for the detection of both human-operated and automated ransomware attacks across multiple stages in the kill chain within an organization's network. Next, we propose a new probabilistic model called the Radial Spike and Slab Bayesian Neural Network. It is a Bayesian Neural Network in which the approximate posterior is represented by a mixture of distributions, resulting in a Radial Spike and Slab distribution. Our model provides the following benefits: (1) the Spike and Slab component handles missing or sparse data, (2) the Radial component scales well as the number of parameters in the deep neural network grows, and (3) the Bayesian component prevents overfitting in the limited-data setting. From the theoretical perspective, we provide a justification for using this type of distribution, as well as a computationally efficient method to perform variational inference. 
In the results section, we demonstrate the performance of our model on a set of actual ransomware attacks and show improvement over a number of baselines, including state-of-the-art temporal models such as RNNs (Cho et al., 2014) and Neural ODEs (ordinary differential equations) (Chen et al., 2018). Thus, the proposed model is an important tool for the critical problem of ransomware detection.
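To make the mixture structure described in the introduction more concrete, the following minimal sketch draws one weight vector from a spike-and-slab distribution whose slab uses a radial reparameterization. This section does not spell out the exact form of the approximate posterior, so everything here is an assumption made for illustration: the function name sample_radial_spike_slab, the parameters mu, sigma, and keep_prob, and the specific form w = z * (mu + sigma * (eps / ||eps||) * r) with z ~ Bernoulli(keep_prob), eps ~ N(0, I), and r ~ N(0, 1).

```python
import math
import random
from typing import List, Sequence


def sample_radial_spike_slab(
    mu: Sequence[float],
    sigma: Sequence[float],
    keep_prob: Sequence[float],
    rng: random.Random,
) -> List[float]:
    """Draw one weight vector from an assumed radial spike-and-slab form.

    Slab (assumed): mu + sigma * (eps / ||eps||) * r, with eps ~ N(0, I)
    and a single scalar radius r ~ N(0, 1) shared by the whole vector.
    Spike (assumed): each weight is set exactly to zero with
    probability 1 - keep_prob[i].
    """
    d = len(mu)
    eps = [rng.gauss(0.0, 1.0) for _ in range(d)]
    # Normalize eps to a direction on the unit sphere (guard against 0).
    norm = math.sqrt(sum(e * e for e in eps)) or 1.0
    r = rng.gauss(0.0, 1.0)  # scalar radius

    weights = []
    for m, s, p, e in zip(mu, sigma, keep_prob, eps):
        z = 1.0 if rng.random() < p else 0.0  # Bernoulli inclusion variable
        weights.append(z * (m + s * (e / norm) * r))
    return weights
```

In this sketch the Bernoulli mask is what produces exact zeros (the sparse structure), while the normalized direction and scalar radius keep the size of the perturbation from growing with the number of weights. The form actually used by the model may differ from this illustration.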

2. INCIDENT DATA DESCRIPTION

This work utilizes threat data provided by 'our industry partner' to detect ransomware and other types of cybersecurity attacks. Low-level event generators (i.e., signatures) are manually created by analysts, and each is assigned a UUID (Universally Unique Identifier).

Features. For each incident, we need to extract features that capture the range of attack behaviors observed across the kill chain and represent common behaviors across the different families of ransomware attacks. The low-level events cannot be used directly because there are too many of them to train our model given the number of labeled examples, and they do not generalize well individually. To overcome these problems, we map a subset of the low-level events into a higher-level representation using the MITRE ATT&CK framework (MITRE). We chose the MITRE ATT&CK framework for the mapping because it provides a knowledge base of adversary tactics, techniques, and procedures (TTPs) and is widely used across the industry for classifying attack behaviors and understanding the lifecycle of an attack. The MITRE ATT&CK TTPs are a natural choice of features because they are generalizable, interpretable, and easy to acquire for this data: each low-level event from 'the anonymized company' is tagged with the MITRE technique associated with the alerted behavior (MITRE). For example, one of the features can represent whether 'OS Credential Dumping' happened or not. Additional MITRE ATT&CK features are included in Table 2, and the entire set is provided by the MITRE Corporation (MITRE, 2022a). The verbose definitions of these features can be found in (MITRE). For example, feature T1059.001 "Command and Scripting Interpreter: PowerShell" corresponds to "Adversaries may abuse PowerShell commands and scripts for execution" (MITRE, 2022b). In total, each time point in our data is a sparse, binary, high-dimensional vector of size 706, comprising 298 MITRE ATT&CK features and 408 additional signature-based features. 
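The feature construction above can be sketched as a multi-hot encoding, where each time step of an incident becomes a binary vector with a 1 for every feature observed at that step. The sketch below is only illustrative: the five-entry FEATURE_INDEX vocabulary, the "sig:" placeholder names, and the helper functions are hypothetical, whereas the real representation has 706 entries (298 MITRE ATT&CK features and 408 signature-based features).

```python
from typing import Dict, Iterable, List

# Hypothetical toy vocabulary; the real vector has 706 entries.
FEATURE_INDEX: Dict[str, int] = {
    "T1003": 0,          # OS Credential Dumping
    "T1059.001": 1,      # Command and Scripting Interpreter: PowerShell
    "T1486": 2,          # Data Encrypted for Impact
    "sig:example-a": 3,  # placeholder signature-based feature
    "sig:example-b": 4,  # placeholder signature-based feature
}


def encode_time_step(observed: Iterable[str], index: Dict[str, int]) -> List[int]:
    """Return a binary vector with a 1 for each observed feature."""
    vec = [0] * len(index)
    for name in observed:
        pos = index.get(name)
        if pos is not None:  # events outside the vocabulary are ignored
            vec[pos] = 1
    return vec


def encode_incident(steps: Iterable[Iterable[str]], index: Dict[str, int]) -> List[List[int]]:
    """Encode an incident as a (time steps x features) binary matrix."""
    return [encode_time_step(step, index) for step in steps]


def sparsity(matrix: List[List[int]]) -> float:
    """Fraction of entries that are zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(row.count(0) for row in matrix)
    return zeros / total if total else 0.0


# Three time steps: PowerShell use, then credential dumping, then nothing.
incident = encode_incident([["T1059.001"], ["T1003"], []], FEATURE_INDEX)
print(incident, sparsity(incident))
```

Even in this toy incident most entries are zero, which mirrors the sparsity of the real data, where only a few of the 706 features are active at any time step.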
One of the primary characteristics of the data is sparsity, because only very few actions are completed at each time step during the attack.

Labels. Through manual investigation, an analyst provides a label for each incident indicating whether it is due to a ransomware attack or another type of attack. The ransomware incidents include both human-operated ransomware (HumOR) and automated ransomware attacks, described in Appendix B in the Supplementary Material. However, our positive class label only indicates that an attack is ransomware and does not distinguish between the two classes of ransomware (i.e., HumOR, Automated). Our goal is to build an alarm-recommendation system that can not only detect a possible ransomware attack but also provide an estimate of the uncertainty about the decision. We provide additional details about the training and testing data in Section 4.

Ethics. As part of the production data collection process, all data has been processed to remove all personally identifiable information. The datasets we received for this analysis only included a randomly assigned UUID for the organization, and the incidents that included the MITRE events,

