BEYOND DEEP LEARNING: AN EVOLUTIONARY FEATURE ENGINEERING APPROACH TO TABULAR DATA CLASSIFICATION

Abstract

In recent years, deep learning has achieved impressive performance in the computer vision and natural language processing domains. In tabular data classification, with the emergence of the transformer architecture, a number of algorithms have been reported to yield better results than conventional tree-based models. Most of these works attribute the success of deep learning to the expressive feature construction capability of neural networks. Nonetheless, in practice, manually designed high-order features combined with traditional machine learning methods are still widely used, because neural-network-based features are prone to overfitting. In this paper, we propose an evolution-based feature engineering algorithm that imitates the manual feature construction process through trial and improvement. Importantly, the evolutionary method provides an opportunity to directly optimize cross-validation loss, which gradient-based methods cannot do. On a large-scale classification benchmark of 119 datasets, the experimental results demonstrate that the proposed method outperforms existing fine-tuned state-of-the-art tree-based and deep-learning-based classification algorithms.

1. INTRODUCTION

Tabular data is a typical way of storing information in a computer system, and tabular data learning methods have been widely employed in recommendation systems (Wang et al., 2021), advertising (Yang et al., 2021), and the medical industry (Spooner et al., 2020) to automate the decision process. Formally, tabular data learning techniques capture the association between a set of explanatory variables {x_1, ..., x_m} and a response variable y for a given dataset {({x_1^1, ..., x_m^1}, y^1), ..., ({x_1^n, ..., x_m^n}, y^n)}, where n is the number of data items. Various machine learning methods can address the tabular data learning problem, such as decision trees, linear models, and support vector machines. However, these learning algorithms have a limited learning capability. For example, a linear model assumes that the relationship between the explanatory variables and the response variable is linear, whereas a decision tree posits that an axis-parallel decision boundary can separate distinct categories of samples. These assumptions do not always hold in the real world, and thus machine learning practitioners often employ feature engineering techniques to construct a set of high-order features {{φ_1^1, ..., φ_k^1}, ..., {φ_1^n, ..., φ_k^n}} that improve the learning capability of existing algorithms. In real-world applications, many feature engineering methods can be used by industrial practitioners to construct high-order features before training a machine learning model. Conventional feature engineering strategies include manually designed features, dimensionality reduction methods (Colange et al., 2020), kernel methods (Allen-Zhu & Li, 2019), and so on. However, these methods have drawbacks.
Manual feature construction requires a large amount of human labor, kernel methods are difficult to combine with tree-based methods, and dimensionality reduction algorithms, especially unsupervised ones, are insufficient to enhance state-of-the-art learning algorithms (Lensen et al., 2022). With the success of deep learning, there is growing interest in using neural networks to build high-order features automatically, particularly in the field of recommendation systems (Song et al., 2019; Lian et al., 2018). However, it is debatable whether deep learning techniques can learn high-order features effectively (Wang et al., 2021). Despite the fact that various neural network-based
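To make the role of high-order features concrete, the following minimal sketch (illustrative only: the toy data, the product feature, and the least-squares fit are our own assumptions, not the algorithm proposed in this paper) shows how a single hand-crafted interaction feature φ = x_1 · x_2 lets a linear model capture a relationship that the raw features alone cannot:

```python
import numpy as np

# Toy dataset: the target depends on the product x1 * x2,
# which violates the linear model's additivity assumption.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X[:, 0] * X[:, 1]

def build_features(X):
    """Append a hand-crafted high-order feature phi = x1 * x2."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1 * x2])

def lstsq_mse(F, y):
    """Mean squared error of a least-squares linear fit on features F."""
    w, *_ = np.linalg.lstsq(F, y, rcond=None)
    return float(np.mean((F @ w - y) ** 2))

mse_raw = lstsq_mse(X, y)                  # raw features: large error
mse_eng = lstsq_mse(build_features(X), y)  # engineered feature: near-zero error
```

Here the engineered feature makes the problem exactly linear, so the fit error collapses to (numerically) zero, while the raw-feature model can do no better than predicting the mean.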

