ADVERSARIAL DETECTOR FOR DECISION TREE ENSEMBLES USING REPRESENTATION LEARNING

Anonymous authors
Paper under double-blind review

Abstract

Research on adversarial evasion attacks focuses mainly on neural network models. Among other reasons, this is due to their popularity in certain fields (e.g., computer vision and NLP) and to model properties that make it easier to search for adversarial examples with minimal input changes. Decision trees and tree ensembles remain very popular due to their high performance in fields dominated by tabular data and their explainability. In recent years, several works have defined new adversarial attacks targeting decision trees and tree ensembles, and several papers have consequently focused on robust versions of tree ensembles. This research aims to create an adversarial detector for attacks on an ensemble of decision trees. While several previous works have demonstrated the generation of more robust tree ensembles, accounting for evasion attacks during ensemble generation can degrade model performance. We demonstrate a method to detect adversarial samples without affecting either the target model's structure or its original performance. We show that by using representation learning based on the structure of the trees, we achieve better detection rates than the state-of-the-art technique and better than training an adversarial detector on the original representation of the dataset.

1. INTRODUCTION

In recent decades, we have seen machine learning algorithms deployed in production environments across various fields, such as medical imaging (Zhou et al., 2021), autonomous driving (Huang & Chen, 2020), and law enforcement (Vestby & Vestby, 2019). With the leap in the performance of these models and their integration into real-life systems, researchers began to investigate how classifiers can be bypassed and how to defend against such malicious attempts (Dalvi et al., 2004; Lowd & Meek, 2005). Many papers have addressed adversarial attacks that make small changes to the inputs of a machine learning model, usually a neural network, that are hard for a human to notice yet cause the model's predictions to be wrong. These can be exploited by a malicious actor to bypass a model that might, for example, be responsible for a critical classification task affecting people's lives. As a result, various researchers have published techniques to detect and defend against adversarial attempts.

Most of the research focuses on adversarial attacks targeting neural network models, in part because their continuous learning space allows a gradient ascent process to maximize the model's loss function with respect to a specific input. Thus, defenses and detectors mainly target neural network models as well. Tree-based models continue to be very popular, especially for tabular data tasks (Nielsen, 2016; Shwartz-Ziv & Armon, 2022; Grinsztajn et al., 2022), because they usually demand less data and are more interpretable. There are fewer studies on adversarial attacks and defenses affecting decision tree models. Gradient-based methods commonly used in earlier attack models cannot be applied directly to evade decision trees due to the discrete nature of their non-differentiable decision-making paths and tree-splitting rules. Unfortunately, this does not mean that decision trees are unaffected by evasion attacks.
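To make the gradient ascent idea concrete, the following is a minimal sketch (not from the paper) of an FGSM-style evasion step against a differentiable classifier. A hand-picked logistic regression model stands in for a neural network so the gradient of the loss with respect to the input can be written in closed form; the weights and step size are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_step(x, y, w, b, eps):
    """One gradient-ascent (FGSM-style) step on the cross-entropy loss w.r.t. the input.

    For logistic regression, d(loss)/dx = (sigmoid(w.x + b) - y) * w,
    so the sign of that gradient gives the perturbation direction."""
    p = sigmoid(w @ x + b)
    grad = (p - y) * w              # gradient of the loss w.r.t. the input x
    return x + eps * np.sign(grad)  # small step that increases the loss

# Toy model with hand-picked weights (illustrative, not trained).
w = np.array([2.0, -1.0])
b = 0.0
x = np.array([0.6, 0.2])            # originally classified as positive
x_adv = fgsm_step(x, y=1.0, w=w, b=b, eps=0.6)
print(sigmoid(w @ x + b) > 0.5, sigmoid(w @ x_adv + b) > 0.5)  # prediction flips
```

A decision tree offers no such gradient: its prediction is piecewise constant in the input, so this sign-of-gradient step is undefined, which is why tree-specific attacks were developed.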
In this work, we present a detection technique for adversarial evasion attacks against tree-based classifiers, focusing on boosting ensembles. Our main contributions are: (i) We defined a task that allows us to generate sample representations that rely on the distribution of the dataset in the
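One plausible way to derive a structure-based sample representation from a tree ensemble, sketched below with scikit-learn, is to record which leaf each sample reaches in every tree; a separate detector could then be trained on these vectors. This is an illustrative assumption for intuition, not necessarily the representation the paper proposes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Illustrative sketch (assumption, not the paper's exact method): encode each
# sample by the index of the leaf it falls into in every tree of a boosting
# ensemble, yielding a tree-structure-based representation of the data.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
model = GradientBoostingClassifier(n_estimators=20, random_state=0).fit(X, y)

leaf_ids = model.apply(X)                  # (n_samples, n_estimators, trees_per_stage)
leaf_repr = leaf_ids.reshape(len(X), -1)   # one leaf-index vector per sample
print(leaf_repr.shape)                     # e.g. (200, 20) for binary classification
```

A detector trained on `leaf_repr` sees how the ensemble partitions the input space, rather than the raw feature values.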

