GREEDY INFORMATION MAXIMIZATION FOR ACTIVE FEATURE ACQUISITION

Abstract

Feature selection is commonly used to reduce feature acquisition costs, but the standard approach is to train models with static feature subsets. Here, we consider the active feature acquisition problem where the model sequentially queries features based on the presently available information. Active feature acquisition has frequently been addressed using reinforcement learning (RL), but we explore a simpler approach of greedily selecting features based on their conditional mutual information. This approach is theoretically appealing but difficult to implement in practice, so we introduce a learning algorithm based on amortized optimization that recovers the greedy policy when perfectly optimized. We find that the greedy method outperforms both RL-based and static feature selection methods across numerous datasets, which validates our approach as a simple but powerful baseline for this problem.

1. INTRODUCTION

Machine learning models require informative inputs to make accurate predictions, but a model's input features can be costly to acquire. In settings where information is gathered sequentially, and particularly when obtaining features requires time or money, it is reasonable to query features adaptively based on the presently available information. We refer to this as active feature acquisition¹ (Saar-Tsechansky et al., 2009), and it has been considered by many works in the last decade (Dulac-Arnold et al., 2011; Chen et al., 2015b; Early et al., 2016a; He et al., 2016a; Kachuee et al., 2018). Feature selection with fixed feature sets (static feature selection) has received more attention (see reviews by Li et al. 2017; Cai et al. 2018), but active approaches offer the potential for better performance given a fixed budget. This is easy to see, because selecting the same features for all instances (e.g., all patients visiting a doctor's office) is one possible policy but likely suboptimal in many situations. On the other hand, active approaches are also more challenging because they require both learning a selection policy and making predictions with multiple feature sets.

Prior work has approached active feature acquisition in several ways, but often using reinforcement learning (RL) (Dulac-Arnold et al., 2011; Shim et al., 2018; Kachuee et al., 2018; Janisch et al., 2019; Li & Oliva, 2021). RL is a natural approach for sequential decision-making problems, but current methods are difficult to train and do not reliably outperform static feature selection (Henderson et al., 2018; Erion et al., 2021). Our work therefore explores a simpler approach: sequentially selecting features based on their conditional mutual information (CMI) with the response variable.
The greedy CMI approach is discussed in prior work (Fleuret, 2004; Chen et al., 2015b; Ma et al., 2018), but it remains difficult to implement because it requires perfect knowledge of the joint data distribution. The focus of this work is therefore developing a simple method to approximate the greedy policy. Our main insight is to leverage amortized optimization (Amos, 2022): by developing an optimization-based characterization of the greedy CMI policy, we design an end-to-end learning approach that recovers the policy when it is perfectly optimized. Our contributions in this work are the following:

¹ The problem is also sometimes referred to as sequential or dynamic feature selection.
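To make the greedy CMI policy concrete, the following is a minimal sketch of what it looks like with a plug-in estimator on low-dimensional discrete data. All function names (`joint_entropy`, `cmi_given_obs`, `greedy_acquire`) are illustrative, not from the paper: at each step, the policy estimates I(X_j; Y | X_S = x_S) for every unobserved feature j by restricting the training data to rows matching the values observed so far, and queries the feature with the highest score.

```python
import numpy as np

def joint_entropy(cols):
    """Empirical joint entropy (in nats) over the columns of a 2D array."""
    _, counts = np.unique(cols, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def cmi_given_obs(X, y, j, obs):
    """Plug-in estimate of I(X_j ; Y | X_S = x_S).

    `obs` maps already-acquired feature indices to their observed values.
    We restrict the data to matching rows, then compute I(X_j ; Y) there
    via I = H(X_j) + H(Y) - H(X_j, Y).
    """
    mask = np.ones(len(X), dtype=bool)
    for k, v in obs.items():
        mask &= (X[:, k] == v)
    Xj = X[mask][:, [j]]
    ys = y[mask].reshape(-1, 1)
    if len(ys) == 0:  # no matching rows; no information to estimate from
        return 0.0
    return (joint_entropy(Xj) + joint_entropy(ys)
            - joint_entropy(np.hstack([Xj, ys])))

def greedy_acquire(X, y, x_new, budget):
    """Greedy CMI policy for one instance: repeatedly query the feature
    with the highest estimated CMI given the values observed so far."""
    obs = {}
    for _ in range(budget):
        candidates = [j for j in range(X.shape[1]) if j not in obs]
        scores = [cmi_given_obs(X, y, j, obs) for j in candidates]
        best = candidates[int(np.argmax(scores))]
        obs[best] = x_new[best]  # "acquire" this feature's value
    return list(obs)
```

The exhaustive conditioning above is exactly what makes the exact greedy policy impractical: the number of observation patterns grows combinatorially, and the plug-in estimates degrade as matching rows become scarce. The paper's amortized approach replaces this per-step estimation with a learned network that maps the current observations directly to a selection.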

