ACTIVE FEATURE ACQUISITION WITH GENERATIVE SURROGATE MODELS

Abstract

Many real-world situations allow for the acquisition of additional relevant information when making an assessment with limited or uncertain data. However, traditional ML approaches either require all features to be acquired beforehand or regard part of them as missing data that cannot be acquired. In this work, we propose models that perform active feature acquisition (AFA) to improve prediction assessments at evaluation time. We formulate the AFA problem as a Markov decision process (MDP) and solve it using reinforcement learning (RL). The AFA problem yields sparse rewards and contains a complicated, high-dimensional action space. Thus, we propose learning a generative surrogate model that captures the complicated dependencies among input features to assess the potential information gain from acquisitions. We also leverage the generative surrogate model to provide intermediate rewards and auxiliary information to the agent. Furthermore, we extend AFA to a task we coin active instance recognition (AIR) for the unsupervised case, where the target variables are the unobserved features themselves and the goal is to collect information for a particular instance in a cost-efficient way. Empirical results demonstrate that our approach achieves considerably better performance than previous state-of-the-art methods on both supervised and unsupervised tasks.

1. INTRODUCTION

A typical machine learning paradigm for discriminative tasks is to learn the distribution of an output y given a complete set of features x ∈ R^d, i.e., p(y | x). Although this paradigm is successful in a multitude of domains, it is incongruous with the expectations of many real-world intelligent systems in two key ways: first, it assumes that a complete set of features has been observed; second, as a consequence, it also assumes that no additional information (features) of an instance may be obtained at evaluation time. These assumptions often do not hold; human agents routinely reason over instances with incomplete data and decide when and what additional information to obtain. For example, consider a doctor diagnosing a patient. The doctor usually has not observed all possible measurements (such as blood samples, x-rays, etc.) for the patient. He/she is not forced to make a diagnosis based on the observed measurements; instead, he/she may dynamically decide to take more measurements to help determine the diagnosis. Of course, the next measurement to make (feature to observe), if any, will depend on the values of the already observed features; thus, the doctor may determine a different set of features to observe from patient to patient (instance to instance) depending on the values of the features that were observed. Hence, not every patient will have the same subset of features selected (as would be the case with typical feature selection). Furthermore, acquiring features typically involves some cost (in time, money, and risk), and intelligent systems are expected to automatically balance this cost against the return in improved task performance. In order to more closely match the needs of many real-world applications, we propose an active feature acquisition (AFA) model that not only makes predictions with incomplete/missing features, but also determines, for a particular instance, which feature would be the most valuable to obtain next.
In this work, we formulate the active feature acquisition problem as a Markov decision process (MDP), where the state is the set of currently observed features and the action is the next feature to acquire. We also introduce a special action to indicate whether to stop the acquisition process and make a final prediction. Reinforcement learning is then utilized to optimize the MDP, and the agent learns a policy for selecting which feature to acquire next based on the current state. After acquiring its value and paying the acquisition cost, the newly acquired feature is added to the observed subset and the agent proceeds to the next acquisition step. Once the agent decides to terminate the acquisition, it makes a final prediction based on the features acquired thus far. For example, in an image classification task (Fig. 1), the agent would dynamically acquire pixels until it is certain of the image class. The goal of the agent is to maximize the prediction performance while minimizing the acquisition cost. In the aforementioned MDP, the agent pays the acquisition cost at each acquisition step but only receives a reward about the prediction after completing the acquisition process. To reduce the sparsity of the rewards and simplify the credit assignment problem for potentially long episodes (Minsky, 1961; Sutton, 1988), we leverage a surrogate model to provide intermediate rewards. The surrogate model captures the arbitrary conditional distribution p(y, x_u | x_o), where y is the target variable and u, o ⊆ {1, …, d} are arbitrary subsets of the d feature dimensions. Note that the surrogate model must be able to capture arbitrary conditionals (for subsets u, o) since the acquired features will vary from instance to instance. We propose using the surrogate model to calculate intermediate rewards by assessing the information gain of the newly acquired feature, which quantifies how much our confidence about the prediction improves by acquiring this feature.
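The acquisition MDP described above can be sketched as a simple loop. The following is a minimal, illustrative Python sketch; the toy data, acquisition cost, prediction rule, and hand-coded policy are our own stand-ins for the paper's learned components, not the actual method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative only): 3 binary features, and the target y
# equals feature 0, so acquiring feature 0 resolves the prediction.
D = 3
ACQUIRE_COST = 0.05
STOP = D  # special action index: terminate and predict

def step(observed, action):
    """One MDP transition: acquire feature `action`, pay the cost."""
    return observed | {action}, -ACQUIRE_COST

def predict(x, observed):
    """Predict y from the acquired subset (here: y = x[0] if known)."""
    return x[0] if 0 in observed else rng.integers(2)

def rollout(x, y, policy):
    """Run one acquisition episode; return cumulative reward and subset."""
    observed, total = set(), 0.0
    while True:
        action = policy(observed)
        if action == STOP:
            total += float(predict(x, observed) == y)  # final prediction reward
            return total, observed
        observed, r = step(observed, action)
        total += r  # per-step acquisition cost

# Hand-coded policy standing in for the learned RL agent:
# acquire feature 0 first, then stop.
policy = lambda obs: 0 if 0 not in obs else STOP

x = np.array([1, 0, 1])
total, observed = rollout(x, y=1, policy=policy)
```

Here the episode acquires one feature (cost 0.05), stops, and earns a prediction reward of 1, for a return of 0.95; the learned agent would instead trade off cost against accuracy across many features.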
In addition to producing intermediate rewards, we also propose using the surrogate model to provide side information that assists the agent. First, in order to inform the agent of the information currently held in the observed features, we pass uncertainty estimates on the target through p(y | x_o). Second, to inform the agent about potential values for unobserved features, we pass imputed values obtained by sampling x_u ∼ p(x_u | x_o). Lastly, to inform the agent about the expected utility of acquisitions, we pass an estimate of the expected information gain of each candidate acquisition i for the target variable, i.e., H(y | x_o) − E_{p(x_i | x_o)} H(y | x_i, x_o). We note that the expected information gain can be used to directly build a greedy policy, where the next feature to acquire is the one that maximizes the expected information gain (Ma et al., 2018; Gong et al., 2019). In contrast, our agent learns a non-greedy policy to maximize long-term returns and uses the greedy approach as a 'prior' policy to guide our agent. In summary, our agent actively acquires new features and pays the acquisition cost until it decides to terminate the acquisition process and make a final prediction. Meanwhile, the surrogate model calculates the information gain of each acquired feature as an intermediate reward and provides side information to assist the agent in assessing its current uncertainty and help it 'look ahead' to expected outcomes of future acquisitions. When the acquisition process is completed, the environment provides a final reward based on the agent's prediction. Note that the environment does have access to the ground-truth target y to evaluate this reward, but does not reveal it to the agent. Equipped with the surrogate model, our method, denoted GSMRL, essentially combines model-free and model-based RL into a holistic framework. Above we discussed AFA for supervised tasks, where the goal is to acquire new features to predict a target variable y.
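To make the greedy information-gain score concrete, the following illustrative sketch evaluates H(y | x_o) − E_{p(x_i | x_o)} H(y | x_i, x_o) for the case where nothing has been observed yet (o = ∅), using a small hand-built tabular joint in place of a learned generative surrogate. All distributions and numbers here are hypothetical, chosen only to show the computation:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Toy tabular "surrogate": joint over (x1, x2, y), all binary.
# y = x1 deterministically (x1 is fully informative); x2 is
# independent uniform noise (uninformative about y).
joint = np.zeros((2, 2, 2))
for x1 in (0, 1):
    for x2 in (0, 1):
        joint[x1, x2, x1] = 0.25

def expected_info_gain(joint, i):
    """H(y) - E_{p(x_i)} H(y | x_i) for candidate feature i, o = empty set."""
    h_prior = entropy(joint.sum(axis=(0, 1)))  # H(y)
    h_post = 0.0
    for v in (0, 1):
        slice_v = joint.take(v, axis=i)        # joint over (other feature, y)
        p_xi = slice_v.sum()                   # p(x_i = v)
        p_y_given = slice_v.sum(axis=0) / p_xi # p(y | x_i = v)
        h_post += p_xi * entropy(p_y_given)    # E_{p(x_i)} H(y | x_i)
    return h_prior - h_post

gain_x1 = expected_info_gain(joint, 0)  # log 2: acquiring x1 resolves y
gain_x2 = expected_info_gain(joint, 1)  # 0: x2 carries no information on y
```

A greedy policy would rank candidates by this score and acquire x1 first; the GSMRL agent instead feeds such estimates to the policy as side information while optimizing long-term return.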
In some cases, however, there may not be a single target variable; instead, the target of interest may be the remaining unobserved features themselves. That is, rather than reduce the uncertainty with respect to some desired output response (which cannot be directly queried and must be predicted), we now propose active instance recognition (AIR), where the task is to query as few features as possible while still allowing the agent to correctly uncover the remaining unobserved features. For example, in AIR on image data, an agent queries new pixels until it can reliably uncover the remaining pixels (see Fig. 2). AIR is especially relevant in survey tasks, which are broadly applicable across various domains and applications. Most surveys aim to discover a broad set of underlying characteristics of instances (e.g., citizens in a census) using a limited number of queries (questions in the census form), which is at the core of AIR. Policies for AIR would build a personalized subset of queries for each instance.



Figure 1: Active feature acquisition on MNIST. Example of the acquisition process and the corresponding prediction probabilities.

