METAFS: AN EFFECTIVE WRAPPER FEATURE SELECTION VIA META LEARNING

Abstract

Feature selection is of great importance and is applied in many fields, such as medicine and commerce. Wrapper methods, which directly compare the performance of different feature combinations, are widely used in real-world applications. However, selecting effective features faces two main challenges: 1) feature combinations are distributed over a huge discrete space; and 2) evaluating combinations efficiently and precisely is hard. To tackle these challenges, we propose a novel deep meta-learning-based feature selection framework, termed MetaFS, containing a Feature Subset Sampler (FSS) and a Meta Feature Estimator (MetaFE), which transforms the discrete search space into a continuous one and adopts meta-learning techniques for effective feature selection. Specifically, FSS parameterizes the distribution over the discrete search space and optimizes it with gradient-based methods. MetaFE learns the representations of different feature combinations and dynamically generates unique models without retraining, enabling efficient and precise combination evaluation. We adopt a bi-level optimization strategy to optimize MetaFS. After optimization, we evaluate multiple feature combinations sampled from the converged distribution (i.e., the condensed search space) and select the optimal one. Finally, we conduct extensive experiments on two datasets, illustrating the superiority of MetaFS over 7 state-of-the-art methods.

1. INTRODUCTION

Human activities and real-world applications generate huge amounts of data, much of which contains a large number of features. Sometimes we need to identify the effects of different features and select an optimal feature combination for further study. For example, on online shopping websites, marketers analyze purchase data to study the purchasing preferences of different groups of people and construct user profiles Allegue et al. (2020), which requires selecting a small number of representative shopping categories for further study. Supervised feature selection (FS) Cai et al. (2018b), which makes full use of label information (e.g., the group classifications), is well suited to such problems. Numerous supervised feature selection frameworks have been proposed, including filter Miao & Niu (2016), embedding Tibshirani (1996); Gui et al. (2019), and wrapper El Aboudi & Benhlima (2016) frameworks, shown in Figure 1(a, b, and c). Filter frameworks evaluate each feature individually, while embedding frameworks greedily weight features. Both evaluate individual features rather than combinations, so their selection results are less effective. Wrapper frameworks traverse feature combinations and directly score them with wrapped supervised algorithms, e.g., K-nearest neighbors (KNN) and Linear Regression (LR), which significantly improves the effectiveness of feature selection Sharma & Kaur (2021). However, the number of feature combinations grows exponentially with the number of features: evaluating such a large number of combinations is computationally intensive, and a small evaluation budget limits the improvement in final performance. In summary, two major problems remain in wrapper frameworks, listed as follows:

Problem 1: How to search feature combinations in a large discrete space? If we select k effective features from a feature set of size n, we need to consider $\binom{n}{k}$ possible feature combinations, which forms an exponentially large search space. It is impractical to traverse and compare all feature combinations in such a huge discrete space.
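To make the wrapper setting and the scale of the search space concrete, the snippet below is a minimal sketch (not the MetaFS implementation): it scores a single candidate feature combination by cross-validating a wrapped scikit-learn KNN on synthetic data, and the binomial coefficient computed at the top shows why exhaustively evaluating every combination is infeasible. The dataset, the wrapped learner, and all names are illustrative assumptions.

```python
# Minimal sketch of wrapper-style subset evaluation (illustrative only).
from math import comb

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

n, k = 100, 10
print(comb(n, k))  # ~1.7e13 candidate combinations: exhaustive search is infeasible

# Synthetic data standing in for a real labeled dataset.
X, y = make_classification(n_samples=500, n_features=n, random_state=0)

def score_subset(subset):
    """Wrapper evaluation: cross-validate the wrapped model on one feature combination."""
    return cross_val_score(KNeighborsClassifier(), X[:, list(subset)], y, cv=5).mean()

print(score_subset(range(k)))  # score of one arbitrary candidate combination
```

A wrapper method must call something like score_subset for every combination it considers, so the cost of each evaluation and the size of the search space together bound how many candidates can realistically be compared.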

