GENERALIZING TREE MODELS FOR IMPROVING PREDICTION ACCURACY

Abstract

Can we generalize and improve the representation power of tree models? Tree models are often favored over deep neural networks due to their interpretable structures in problems where interpretability is required, such as the classification of feature-based data in which each feature is meaningful. However, most tree models achieve low accuracy and easily overfit to training data. In this work, we propose Decision Transformer Network (DTN), a highly accurate and interpretable tree model built on decision transformers, our generalized framework of tree models. Decision transformers allow us to describe tree models in the context of deep learning. DTN improves the generalizable components of the decision transformer, which increases the representation power of tree models while preserving the inherent interpretability of the tree structure. Our extensive experiments on 121 feature-based datasets show that DTN outperforms state-of-the-art tree models and even deep neural networks.

1. INTRODUCTION

Can we generalize and improve the representation power of tree models? Tree models learn a structure in which the decision process is easy to follow (Breiman et al., 1984). Due to this attractive property, various efforts have been made to improve their performance or to utilize them as subcomponents of deep learning models (Kontschieder et al., 2015; Shen et al., 2018). Compared to typical deep neural networks, the main characteristic of tree models is that input data are propagated through layers without a change in their representations; the internal nodes calculate the probability of an input x arriving at the leaf nodes. The decision process that determines the membership of a training example, i.e., the process of constructing the subsets of training data at each leaf, is the core operation in tree models, which we generalize and improve in this work.

Previous works on tree models consider the decisions at different nodes as separate operations, i.e., each node performs its own decision independently of the other nodes in the same layer, based on the arrival probability of an input x at the node. The independence of decisions simplifies learning. However, to obtain a comprehensive view, the decisions of multiple nodes must be considered simultaneously. Furthermore, a typical tree model of depth L requires b^L - 1 decision functions, where b is the branching factor, which makes it intractable to construct a deep tree model. In this work, we show that many tree models can be generalized to what we term the decision transformer, in Proposition 1. A decision transformer generalizes existing tree models by treating the decisions of each layer as a single operation involving all nodes in that layer. More specifically, the decision transformer views each layer as a stochastic decision (Def. 2) which linearly transforms the membership of each training example by a learned stochastic matrix (Def. 1).
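The layerwise view above can be sketched in code. The following is a minimal illustration, not the paper's implementation: it assumes a single "stochastic decision" is a row-stochastic matrix that linearly maps the membership distribution over one layer's nodes to a distribution over the next layer's nodes. All function and variable names are hypothetical.

```python
import numpy as np

def soft_decision_layer(membership, logits):
    """One hypothetical stochastic decision (Defs. 1-2 in the text).

    membership: (n_nodes,) probability of input x being at each node
                of the current layer.
    logits:     (n_nodes, n_next) unnormalized routing scores.
    Returns the membership over the next layer's nodes.
    """
    # A row-wise softmax makes the routing matrix row-stochastic:
    # each row sums to 1, so probability mass is preserved.
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    stochastic_matrix = exp / exp.sum(axis=1, keepdims=True)
    return membership @ stochastic_matrix

rng = np.random.default_rng(0)
m0 = np.array([1.0])                                   # root holds all mass
m1 = soft_decision_layer(m0, rng.normal(size=(1, 2)))  # layer 1: 2 nodes
m2 = soft_decision_layer(m1, rng.normal(size=(2, 4)))  # layer 2: 4 nodes
print(m2.sum())                                        # ~1.0: mass preserved
```

Note that each layer is a single matrix-vector product involving all of its nodes at once, rather than one independent decision per node.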
The aggregated layerwise view allows the decision transformer to reduce the complexity of analysis to O(L), where L is the tree depth. Furthermore, formulating tree models in a probabilistic context allows them to be understood within the deep learning framework and provides a theoretical foundation for deeply understanding the advantages and limitations of existing tree models in one framework. This in-depth understanding of tree models as decision transformers allows us to propose Decision Transformer Network (DTN), a novel architecture that inherits the interpretability of tree models while improving their representation power. DTN is an extension of tree models into deep networks.
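The O(L) claim can be made concrete with a small sketch: under the layerwise view, the leaf-arrival distribution of an input is the product of L stochastic matrices, i.e., L aggregated operations rather than b^L - 1 per-node decision functions. The code below is illustrative only; the matrices, leaf values, and names are assumptions, not the paper's DTN.

```python
import numpy as np

rng = np.random.default_rng(1)
L, b = 3, 2                                    # depth-3 binary tree

def row_stochastic(rows, cols, rng):
    # Random row-stochastic matrix: each row sums to 1.
    m = rng.random((rows, cols))
    return m / m.sum(axis=1, keepdims=True)

# One stochastic decision per layer: layer l has b**l nodes.
layers = [row_stochastic(b**l, b**(l + 1), rng) for l in range(L)]
leaf_values = rng.normal(size=b**L)            # one scalar output per leaf

membership = np.array([1.0])                   # all mass starts at the root
for S in layers:                               # exactly L matrix products
    membership = membership @ S                # -> distribution over b**L leaves

prediction = membership @ leaf_values          # expectation over leaf outputs
```

The loop body runs once per layer, so the forward pass scales with the depth L even though the tree has b^L leaves.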

