NET-DNF: EFFECTIVE DEEP MODELING OF TABULAR DATA

Abstract

A challenging open question in deep learning is how to handle tabular data. Unlike domains such as image and natural language processing, where deep architectures prevail, there is still no widely accepted neural architecture that dominates tabular data. As a step toward bridging this gap, we present Net-DNF, a novel generic architecture whose inductive bias elicits models whose structure corresponds to logical Boolean formulas in disjunctive normal form (DNF) over affine soft-threshold decision terms. Net-DNFs also promote localized decisions that are taken over small subsets of the features. We present extensive experiments showing that Net-DNFs significantly and consistently outperform fully connected networks over tabular data. With relatively few hyperparameters, Net-DNFs open the door to practical end-to-end handling of tabular data using neural networks. We present ablation studies, which justify the design choices of Net-DNF, including the inductive bias elements, namely, Boolean formulation, locality, and feature selection.

1. INTRODUCTION

A key point in successfully applying deep neural models is the construction of architecture families that contain inductive bias relevant to the application domain. Architectures such as CNNs and RNNs have become the preeminent favorites for modeling images and sequential data, respectively. For example, the inductive bias of CNNs favors locality, as well as translation and scale invariances. With these properties, CNNs work extremely well on image data, and are capable of generating problem-dependent representations that almost completely overcome the need for expert knowledge. Similarly, the inductive bias promoted by RNNs and LSTMs (and more recent models such as transformers) favors both locality and temporal stationarity. When considering tabular data, however, neural networks are not the hypothesis class of choice. Most often, the winning class in learning problems involving tabular data is decision forests. In Kaggle competitions, for example, gradient boosting of decision trees (GBDTs) (Chen & Guestrin, 2016; Friedman, 2001; Prokhorenkova et al., 2018; Ke et al., 2017) is generally the superior model. While it is quite practical to use GBDTs for medium-size datasets, it is extremely hard to scale these methods to very large datasets. Scaling up gradient boosting models has been addressed by several papers (Ye et al., 2009; Tyree et al., 2011; Fu et al., 2019; Vasiloudis et al., 2019). The most significant computational disadvantage of GBDTs is the need to store (almost) the entire dataset in memory¹. Moreover, handling multi-modal data, which involves both tabular and spatial data (e.g., medical records and images), is problematic. Thus, since GBDTs and neural networks cannot be organically optimized together, such multi-modal tasks are left with sub-optimal solutions. The creation of a purely neural model for tabular data, which can be trained with SGD end-to-end, is therefore a prime open objective.
A few works have aimed at constructing neural models for tabular data (see Section 5). Currently, however, there is still no widely accepted end-to-end neural architecture that can handle tabular data and consistently replace fully-connected architectures, or better yet, replace GBDTs. Here we present Net-DNFs, a family of neural network architectures whose primary inductive bias is an ensemble comprising disjunctive normal form (DNF) formulas over linear separators. This family also promotes (input) feature selection and spatial localization of ensemble members. These inductive



¹ This disadvantage is shared among popular GBDT implementations: XGBoost, LightGBM, and CatBoost.
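To make the core inductive bias concrete, the following is a minimal sketch of a "soft DNF over affine threshold terms." The operator choices here (tanh literals, a product-based soft AND, a max-based soft OR) are illustrative assumptions for exposition only, not Net-DNF's published formulation, which is defined later in the paper.

```python
import numpy as np

# Hypothetical sketch: a differentiable DNF over affine soft-threshold terms.
# The specific soft AND/OR operators below are assumptions, chosen only to
# illustrate the idea of an OR-of-ANDs over soft linear separators.

def soft_literals(x, W, b):
    """Affine soft-threshold decision terms: tanh(xW + b), in (-1, 1)."""
    return np.tanh(x @ W + b)

def soft_and(literals):
    """Soft conjunction: high only when all literals are high."""
    probs = (literals + 1.0) / 2.0          # map (-1, 1) -> (0, 1)
    return 2.0 * np.prod(probs, axis=-1) - 1.0  # product, mapped back

def soft_or(clauses):
    """Soft disjunction: take the strongest conjunction."""
    return np.max(clauses, axis=-1)

def soft_dnf(x, Ws, bs):
    """A DNF: an OR over several AND-clauses of soft literals."""
    clauses = np.stack(
        [soft_and(soft_literals(x, W, b)) for W, b in zip(Ws, bs)],
        axis=-1,
    )
    return soft_or(clauses)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 10))                      # 4 tabular rows, 10 features
Ws = [rng.normal(size=(10, 3)) for _ in range(2)] # 2 clauses, 3 literals each
bs = [rng.normal(size=3) for _ in range(2)]
out = soft_dnf(x, Ws, bs)                         # one score per row, in (-1, 1)
print(out.shape)
```

Because every operator above is smooth (or piecewise smooth, for the max), such a formula is trainable end-to-end with SGD, which is precisely the property that distinguishes this family from hard decision-tree ensembles.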

