ADVERSARIAL DETECTOR FOR DECISION TREES ENSEMBLES USING REPRESENTATION LEARNING
Anonymous authors
Paper under double-blind review

Abstract

Research on adversarial evasion attacks focuses mainly on neural network models, both because of their popularity in certain fields (e.g., computer vision and NLP) and because their properties make it easier to search for adversarial examples with minimal input change. Decision trees and tree ensembles remain very popular due to their high performance in fields dominated by tabular data and their explainability. In recent years, several works have defined new adversarial attacks targeting decision trees and tree ensembles, and several papers have since been published on robust versions of tree ensembles. This research aims to create an adversarial detector for attacks on an ensemble of decision trees. While several previous works have demonstrated the generation of more robust tree ensembles, considering evasion attacks during ensemble generation can degrade model performance. We demonstrate a method to detect adversarial samples without affecting either the target model's structure or its original performance. We show that by using representation learning based on the structure of the trees, we achieve better detection rates than the state-of-the-art technique and than training an adversarial detector on the original representation of the dataset.

1. INTRODUCTION

In recent decades, machine learning algorithms have been introduced into production environments in various fields such as medical imaging (Zhou et al., 2021), autonomous driving (Huang & Chen, 2020), and law enforcement (Vestby & Vestby, 2019). With the leap in performance of these models and their integration into real-life systems, researchers began to investigate how to bypass classifiers and how to defend against such malicious attempts (Dalvi et al., 2004; Lowd & Meek, 2005). Many papers have addressed adversarial attacks that make small changes to the inputs of a machine learning model, usually a neural network, that are hard for a human to notice but cause the model's predictions to be wrong. These can be exploited by a malicious actor to bypass a model that might, for example, be responsible for a critical classification task affecting people's lives. As a result, various researchers have published techniques to detect and defend against adversarial attempts. Most research focuses on adversarial attacks targeting neural network models, among other reasons because their continuous learning space allows a gradient ascent process to maximize the model's loss function for a specific input; thus defenses and detectors mainly target neural network models as well. Tree-based models continue to be very popular, especially for tabular data tasks (Nielsen, 2016; Shwartz-Ziv & Armon, 2022; Grinsztajn et al., 2022), because they usually demand less data and are more interpretable. There are fewer studies on adversarial attacks and defenses affecting decision tree models. Gradient-descent-based methods commonly used in earlier attack models cannot be applied directly to evade decision trees due to the discrete nature of their non-differentiable decision-making paths and tree-splitting rules. Unfortunately, this does not mean that decision trees are unaffected by evasion attacks.
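The gradient-based search described above can be illustrated with a minimal sketch against a hand-rolled logistic model (the function name, weights, and step size below are illustrative, not from this paper); the same one-step sign-of-gradient move has no direct analogue for a decision tree, whose output is piecewise constant in the input:

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps):
    """One FGSM-style step: move each feature by eps in the
    direction that increases the logistic loss for label y."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))  # model's predicted probability
    grad = (p - y) * w                      # d(log-loss)/dx for this model
    return x + eps * np.sign(grad)

# Toy linear model and a confidently classified positive sample.
w = np.array([1.5, -2.0, 0.5])
b = 0.1
x = np.array([1.0, -1.0, 0.5])
x_adv = fgsm_perturb(x, w, b, y=1.0, eps=0.3)  # pushes the score down
```

Each feature moves by exactly eps, yet the model's score for the true class drops; a tree's prediction, by contrast, only changes when a perturbation crosses a split threshold, so there is no gradient to follow.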
In this work, we present a detection technique for adversarial evasion attacks against tree-based classifiers, focusing on boosting ensembles. Our main contributions are: (i) we define a task that allows us to generate sample representations that rely on the distribution of the dataset across the different routes of a tree ensemble; (ii) we design a pipeline to train and evaluate adversarial detection with a reduced possibility of overfitting or bias.
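As a rough illustration of a route-based representation (a sketch under assumed data, model, and encoding, not this paper's exact construction), each sample can be encoded by the leaf it reaches in every tree of a boosting ensemble:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for a tabular dataset.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
clf = GradientBoostingClassifier(n_estimators=20, random_state=0).fit(X, y)

# apply() returns the leaf index each sample lands in, per tree;
# for binary problems the trailing per-class axis has size 1.
leaves = clf.apply(X).reshape(len(X), -1)  # shape: (n_samples, n_trees)
```

A vector of per-tree leaf ids like this captures which routes of the ensemble a sample traverses, which is the kind of structural signal a detector can learn from.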

2. MOTIVATION

Chen et al. (2019) proposed a technique for making decision trees robust against adversarial evasion attacks. The training algorithm was changed, and as a result the model itself was changed. As part of their experiment, the new model's accuracy was measured and compared to that of the non-robust model: of the eleven datasets tested, seven showed a decrease in accuracy. Our primary motivation for this work is to create a defense layer for a decision tree ensemble against adversarial attacks. Our defense layer does not affect the model itself, allowing the model owner to decide whether to apply the defense to their existing system. Secondly, production tree-based models use well-known open-source libraries such as XGBoost (Chen & Guestrin, 2016b), CatBoost (Dorogush et al., 2018), and LightGBM (Ke et al., 2017). These libraries are heavily used, tested, and improved, which is partly why they were chosen in the first place. As of this writing, none of these libraries contains an official version that is robust against adversarial attacks; therefore, to add adversarial robustness to a model, one must use a third-party variant of the model or develop a new one.

3.1. RELATED WORK

We can split the field of adversarial learning into three primary sectors: attacking methods, defending methods, and detectors, which aim to detect whether or not a sample is adversarial without changing the model itself. Generating Adversarial Samples. Early work on generating adversarial samples (Goodfellow et al., 2014; Kurakin et al., 2016) used backpropagation to discover which input features should be changed to maximize the loss function of a model. More recent attacks suggested different loss functions for finding an adversarial sample (Papernot et al., 2016b; Carlini & Wagner, 2017; Cheng et al., 2018). Other works used concepts from geometry and the location of the boundaries between decision spaces to search for a minimal perturbation that creates an adversarial sample (Moosavi-Dezfooli et al., 2016; Yang et al., 2020). Because decision tree classifiers are not continuous-space models, the earlier backpropagation methods do not work in these cases. Black-box methods, which ignore the internal structure of the model, try to approximate the gradients and can generate attacks for decision-tree-based classifiers by using multiple queries to find the boundary between different classes in the unknown decision space (Cheng et al., 2018; Chen et al., 2020). Some relevant white-box techniques focus specifically on the nature of decision trees (Papernot et al., 2016a; Kantchelian et al., 2016; Zhang et al., 2020). Papernot et al. (2016a) defined an algorithm that, for a given sample, searches for the closest leaf in the neighborhood of the original leaf and perturbs the features that separate them. Kantchelian et al. (2016) formulated a set of equality and inequality constraints based on the tree structure and used a mixed-integer linear program to generate an optimal adversarial sample for tree ensembles. Model Defenses Against Adversarial Attacks. A common approach for protecting models is to train a model that is robust to evasion attacks.
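The query-based, black-box idea can be sketched as a simple binary search along the segment between a sample and any differently classified point, using only prediction queries (a toy illustration with assumed data and model, not a faithful reimplementation of the cited attacks):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=1)
clf = DecisionTreeClassifier(max_depth=4, random_state=1).fit(X, y)

def boundary_point(model, x_src, x_tgt, steps=30):
    """Binary-search the segment [x_src, x_tgt] for the point where
    the model's prediction flips, querying predictions only."""
    src_label = model.predict(x_src.reshape(1, -1))[0]
    lo, hi = 0.0, 1.0  # t=0 -> x_src, t=1 -> x_tgt (different class)
    for _ in range(steps):
        mid = (lo + hi) / 2
        x_mid = (1 - mid) * x_src + mid * x_tgt
        if model.predict(x_mid.reshape(1, -1))[0] == src_label:
            lo = mid   # still on the source side of the boundary
        else:
            hi = mid   # crossed the boundary; tighten from above
    return (1 - hi) * x_src + hi * x_tgt

# Pick a source and a target the model assigns to different classes.
pred = clf.predict(X)
x_src, x_tgt = X[pred == 0][0], X[pred == 1][0]
x_adv = boundary_point(clf, x_src, x_tgt)  # small flip along this segment
```

No gradients or tree internals are used; the attack only needs the class label returned for each query, which is what makes this family applicable to tree ensembles.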
Adversarial training (Goodfellow et al., 2014) is one such method: one generates adversarial samples and adds them to the training data. Other suggested solutions repurpose known techniques, such as knowledge distillation; Papernot et al. (2016c) used it to create a new version of the model with smaller gradients, making it more difficult to generate adversarial samples. Another work, Wang et al. (2018), used dropout at prediction time to reduce the dependency on specific neurons in a neural network.
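At its simplest, the adversarial-training loop amounts to augmenting the training set with perturbed copies and retraining (a sketch with assumed names and magnitudes; the sign-flipped noise here is a crude stand-in for a real attack, which would search for worst-case perturbations):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=400, n_features=8, random_state=2)
rng = np.random.default_rng(2)

clf = GradientBoostingClassifier(random_state=2).fit(X, y)

# Perturbed copies keep their true labels; retraining on the augmented
# set exposes the model to near-boundary inputs.
X_adv = X + 0.1 * np.sign(rng.normal(size=X.shape))
X_aug = np.vstack([X, X_adv])
y_aug = np.concatenate([y, y])
robust_clf = GradientBoostingClassifier(random_state=2).fit(X_aug, y_aug)
```

The trade-off noted in Section 2 shows up here: the augmented objective no longer optimizes purely for clean accuracy, which is why robustified ensembles can lose performance on unperturbed data.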

