EVALUATION OF ACTIVE FEATURE ACQUISITION METHODS UNDER MISSING DATA

Anonymous authors
Paper under double-blind review

Abstract

Machine learning (ML) methods generally assume that the full set of features is available at no cost. If the acquisition of a certain feature is costly at run-time, one might want to balance the acquisition cost against the predictive value of the feature for the ML task. The task of training an AI agent to decide which features to acquire is called active feature acquisition (AFA). Current AFA methods, however, are challenged when the AFA agent has to be trained or tested on datasets that contain missing data. We formulate, for the first time, the problem of active feature acquisition performance evaluation (AFAPE) under missing data, i.e. the problem of adjusting for the inevitable missingness distribution shift between train/test time and run-time. We first propose a new causal graph, the AFA graph, that characterizes the AFAPE problem as an intervention on the environment used to train AFA agents. We then discuss how the conventional approaches to handling missing data in AFAPE (off-policy policy evaluation, blocked feature acquisitions, imputation, and inverse probability weighting (IPW)) often lead to biased results or are data inefficient. We therefore propose active feature acquisition importance sampling (AFAIS), a novel estimator that is more data efficient than IPW. In multiple experiments on simulated and real-world data under induced MCAR, MAR and MNAR missingness, we demonstrate both the detrimental conclusions to which biased estimators can lead and the high data efficiency of AFAIS.
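To make the reweighting idea behind the estimators discussed above concrete, the following is a minimal sketch of importance sampling for off-policy evaluation in a toy one-step setting: data collected under a behavior policy are reweighted to estimate the expected cost under a different target policy. All numbers and policies here are illustrative assumptions, not the paper's AFAIS estimator or its AFA environment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-step setting with two actions (e.g. "acquire feature" / "skip").
# The behavior policy b generated the data; the target policy pi is the
# policy we wish to evaluate. These probabilities are illustrative only.
b = np.array([0.8, 0.2])   # behavior policy: P(action)
pi = np.array([0.3, 0.7])  # target policy to evaluate

n = 100_000
actions = rng.choice(2, size=n, p=b)
# Observed cost depends on the action (arbitrary toy values plus noise).
costs = np.where(actions == 0, 1.0, 0.4) + rng.normal(0.0, 0.1, size=n)

# Importance weights reweight behavior-policy samples toward pi.
w = pi[actions] / b[actions]
ipw_estimate = np.mean(w * costs)               # ordinary IPW-style estimate
snipw_estimate = np.sum(w * costs) / np.sum(w)  # self-normalized variant

# Ground truth for this toy problem: E_pi[cost] = 0.3*1.0 + 0.7*0.4 = 0.58
true_value = pi @ np.array([1.0, 0.4])
```

Note how the rarely taken action (probability 0.2 under the behavior policy) receives a large weight (0.7/0.2 = 3.5); such large weights are precisely what makes plain IPW data inefficient, motivating lower-variance alternatives.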

1. INTRODUCTION

Machine learning methods generally assume that the full set of input features is available at run-time at little to no cost. This is, however, not always the case, as acquiring features may impose a significant cost. In medical diagnosis, for example, the cost of acquiring a feature (e.g. a biopsy test) includes both its monetary cost and the potential harm to the patient. In this case, the predictive value of a feature should be balanced against its acquisition cost: physicians order biopsies, MRI scans, or lab tests only if their diagnostic value outweighs their cost or risk. This challenge becomes more critical when physicians aim to predict a large number of diverse outcomes, each with a different set of informative features. Returning to the medical example, a typical emergency department (ED) can diagnose thousands of different diseases based on a large set of possible observations. For every new emergency patient entering the ED with an unknown diagnosis, clinicians must narrow down the search for a proper diagnosis via step-by-step feature acquisitions. An ML model that requires the entire feature set as input is infeasible in this setting.

Active feature acquisition (AFA) addresses this problem by designing two AI systems: i) a so-called AFA agent, which decides which features must be observed, balancing information gain against feature cost; and ii) an ML prediction model, often a classifier, which solves the prediction task based on the acquired set of features. An AFA agent, by definition, induces missingness by selecting only a subset of features. We call this AFA missingness; it occurs at run-time (e.g. when the AFA agent is deployed at the hospital). In addition, in many AFA applications, the retrospective data used for model training and evaluation also contain missing entries, induced by a different feature acquisition process (e.g. by physicians ordering from a wide range of diagnostic tests). We call this retrospective missingness. When using retrospective data (during training/evaluation), the agent can only decide among the available features. At run-time, however, we make the assumption that the agent

