THE GANFATHER: CONTROLLABLE GENERATION OF MALICIOUS ACTIVITY TO EXPOSE DETECTION WEAKNESSES AND IMPROVE DEFENCE SYSTEMS

Abstract

Machine learning methods to aid defence systems in detecting malicious activity typically rely on labelled data. In some domains, such labelled data is unavailable or incomplete. In practice, this can lead to low detection rates and high false positive rates, which characterise, for example, anti-money laundering systems. Indeed, an estimated 1.7-4 trillion euros are laundered annually and go undetected. We propose The GANfather, a method to generate samples with properties of malicious activity without requiring labels. To circumvent the need for labels, we reward the generation of malicious samples by introducing an extra objective to the typical Generative Adversarial Network (GAN) loss. Ultimately, our goal is to enhance the detection of illicit activity using the discriminator network as a novel and robust defence system. Optionally, we may encourage the generator to bypass pre-existing detection systems, revealing defensive weaknesses for the discriminator to correct. We evaluate our method in two real-world use cases: money laundering and recommendation systems. In the former, our method moves cumulative amounts close to 250 thousand dollars through a network of accounts without being detected by an existing system. In the latter, we recommend the target item to a broad user base with as few as 30 synthetic attackers. In both cases, we train a new defence system to capture the synthetic attacks.

1. INTRODUCTION

Many aspects of our society are becoming increasingly dominated by digital systems, which in turn provides new opportunities for illicit actors. For example, digital banking enables clients to open bank accounts more easily but also facilitates complex money laundering schemes. It is estimated that undetected money laundering activities worldwide accumulate to €1.7-4 trillion annually (Lannoo & Parlour, 2021), while operational costs related to anti-money laundering (AML) compliance tasks incurred by financial institutions accumulate to $37.1 billion (Ray, 2021). Another example is recommender systems, which are often embedded in digital services to deliver personalised experiences. However, recommender systems may suffer from injection attacks whenever malicious actors fabricate signals (e.g., clicks, ratings, or reviews) to influence recommendations. These attacks have detrimental effects on the user experience. For example, a one-star decrease in restaurant ratings can lead to a 5 to 9 percent decrease in revenue (Luca, 2016). The detection of such malicious attacks is challenging in the following respects. In many cases, these illicit activities are adversarial in nature, where an attacker and a defence system adapt to each other's behaviour over time. Additionally, labelled datasets are unavailable or incomplete in certain domains due to the absence of natural labels and the cost of manual feedback. For example, besides the large amount of undetected money laundering, the investigation of detected suspicious activity is often far from trivial, resulting in a feedback delay that can last months.

To address these issues, we propose The GANfather, a method to generate examples of illicit activity and train effective detection systems without any labelled examples. Starting from unlabelled data, which we assume to be predominantly legitimate, the proposed method leverages a GAN-like setup (Goodfellow et al., 2014) to train a generator which learns to create malicious activity, as well as a detection model which learns to discriminate between real data and synthetic malicious data. To generate samples with malicious properties from legitimate data, our method includes an additional optimisation objective in the training loss of the generator. This objective is a use-case-specific, user-defined differentiable formulation of the goal of the malicious agents. Furthermore, our method optionally allows the incorporation of an existing defence system, as long as a differentiable formulation is possible. In that case, we penalise the generator for triggering existing detection mechanisms. Our method can then actively find liabilities in an existing system while simultaneously training a complementary detection system to protect against such attacks.

Figure 1: Comparing our method to some widely used approaches. (a) GAN: a vanilla GAN setup does not require any labels, but one cannot choose the class of a generated sample, since the distribution of the data is learned as a whole. (b) Conditional GAN (cGAN): using a cGAN, one learns the class-conditional distributions of the data, allowing the user to choose the class of a generated sample. However, labels are needed to train a cGAN. (c) Adversarial attack (evasion): starting from a malicious example, perturbations are found such that a trained classifier is fooled and misclassifies the perturbed example. While labels are typically required to select the initial example as well as to train the classifier, the adversarial attacks can eventually be used to obtain a more robust classifier. (d) Ours: our method combines desirable properties of the three previous approaches: no labels are needed (as in a GAN), samples of a desired target class are generated (as in a cGAN), and a robust detection system can be trained (as in adversarial training). The combination of these properties in one framework is especially suitable in domains where no labelled data is available.
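As a minimal sketch, the generator's training signal described above could combine three terms: the standard GAN term, the user-defined malicious objective, and an optional penalty from a pre-existing detection system. The function name, the specific log-loss form, and the weights `lam_obj` and `lam_det` below are illustrative assumptions, not the paper's exact formulation:

```python
import math

def generator_loss(disc_real_prob, objective_value, alert_score,
                   lam_obj=1.0, lam_det=1.0):
    """Sketch of a combined generator loss (hypothetical weighting).

    disc_real_prob  -- discriminator's estimate that the synthetic sample
                       is real (standard GAN term, to be maximised).
    objective_value -- value of the user-defined differentiable objective,
                       e.g. money moved towards attacker-controlled
                       accounts (to be maximised).
    alert_score     -- score of an existing, differentiable detection
                       system (optional term, to be minimised).
    """
    adversarial = -math.log(disc_real_prob + 1e-8)  # fool the discriminator
    objective = -lam_obj * objective_value          # reward the malicious goal
    detection = lam_det * alert_score               # penalise triggering alerts
    return adversarial + objective + detection
```

Minimising this loss pushes the generator towards samples that look legitimate, achieve the malicious goal, and (optionally) avoid existing alerts; setting `lam_det=0` recovers the variant without a pre-existing defence system.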
Our system makes the following assumptions and has the following desirable properties (in a context of adversarial attacks lacking labelled data):

- No labelled malicious samples are needed. We assume that our unlabelled data is predominantly of legitimate nature.
- Samples with features of malicious activity are generated. The key to generating such samples from legitimate data is to introduce an extra objective function that nudges the generator to produce samples with the required properties. We implicitly assume that malicious activity shares many properties with legitimate behaviour. This assumption is justified because attackers often mimic legitimate activity to some degree, in order to avoid raising suspicion or triggering existing detection systems.
- A robust detection system is trained. By training a discriminator to distinguish between the synthetic malicious samples and real data, we conjecture that the defence against a variety of real malicious attacks can be strengthened.

While each of these properties can be found separately in other methods, we believe that the combination of all of them in a single method is novel and useful in the discussed scenarios. In Figure 1, we illustrate visually how our method distinguishes itself from some well-known approaches. Finally, while we only perform experiments on two use cases (anti-money laundering and recommender systems) in the following sections, we believe that the suggested approach is applicable in other domains facing similar constraints, i.e., no labelled data and adversarial attacks, subject to domain-specific adaptations.
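To make the notion of a use-case-specific differentiable objective concrete, the anti-money-laundering goal of moving money towards attacker-controlled accounts could, for instance, be expressed as a cumulative flow over a generated transaction matrix. The matrix representation and the function below are our own toy illustration, not the paper's formulation:

```python
import numpy as np

def cumulative_flow(amounts, target_accounts):
    """Toy differentiable objective for the AML use case.

    amounts         -- (n, n) matrix of generated transfers, where
                       amounts[i, j] is the money sent from account i
                       to account j.
    target_accounts -- indices of accounts controlled by the attacker.

    Returns the total amount flowing into the target accounts; the
    generator would be rewarded for maximising this quantity while
    still appearing legitimate to the discriminator.
    """
    return float(amounts[:, target_accounts].sum())
```

An analogous objective for the recommender-system use case could, for example, reward the rank or predicted rating of the target item, as long as it remains differentiable with respect to the generated signals.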

