AN EXPERIMENT DESIGN PARADIGM USING JOINT FEATURE SELECTION AND TASK OPTIMIZATION
Anonymous

Abstract

This paper presents a subsampling-task paradigm for data-driven, task-specific experiment design (ED) and a novel method for population-wide supervised feature selection (FS). Optimal ED, the choice of sampling points under constraints of limited acquisition time, arises in a wide variety of scientific and engineering contexts. However, the continuous optimization used in classical approaches depends on a-priori parameter choices and faces challenging non-convex optimization landscapes. This paper proposes to replace this strategy with a subsampling-task paradigm, analogous to population-wide supervised FS. In particular, we introduce JOFSTO, which performs JOint Feature Selection and Task Optimization. JOFSTO jointly optimizes two coupled networks: one for feature scoring, which provides the ED, and the other for execution of a downstream task or process. Unlike most FS problems, e.g. selecting protein expressions for classification, ED problems typically select from highly correlated, globally informative candidates rather than seeking a small number of highly informative features among many uninformative ones. JOFSTO's construction efficiently identifies potentially correlated but effective subsets and returns a trained task network. We demonstrate the approach using parameter estimation and mapping problems in clinically relevant applications in quantitative MRI and hyperspectral imaging. Results from simulations and empirical data show that the subsampling-task paradigm strongly outperforms classical ED and that, within our paradigm, JOFSTO outperforms state-of-the-art supervised FS techniques. JOFSTO extends immediately to wider image-based ED problems and other scenarios where the design must be specified globally across large numbers of acquisitions. Our code is available to reviewers: Code (2022).

1. INTRODUCTION

Experiment design (ED) seeks an optimally informative sampling scheme within a budget of measurement time Antony (2003). The problem arises across a wide range of scientific disciplines and applications wherever mathematical models are fitted to noisy real-world measurements to estimate quantities that cannot be measured directly, e.g. agriculture Gupta et al. (2015), civil engineering Lye (2002), economics Jacquemet & L'Haridon (2019), and microbiology Vanot & Sergent (2005). Classical approaches Frieden (2004); Montgomery (2001) optimize the design to minimize the uncertainty of the parameter values of a prespecified model, often derived from the Fisher information matrix. For any non-linear model, these approaches require a-priori specification of the model parameter values for which the design is optimized, which leads to circularity, as parameter values are by definition unknown at the application stage. Moreover, the optimization itself is usually cumbersome, over a high-dimensional and highly non-convex space. For example, quantitative imaging techniques estimate and map model parameters pixel by pixel from multi-channel images. Multiple acquisition parameters often control the contrast in each channel. The ED challenge is to identify the combination of acquisition parameters that best informs the estimation of the model parameters, which vary substantially over the image. The popular MRI brain-imaging technique NODDI exemplifies the challenges: five acquisition parameters can vary for each of around 100 channels, so the ED optimization is 500-dimensional. The standard acquisition protocol was designed by optimizing the Fisher information matrix for one specific combination of parameter values, although the aim of the technique is to highlight contrast in those parameters over the extent of the brain; the acquisition protocol is therefore by definition suboptimal.
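The circularity of classical ED can be made concrete with a hypothetical one-parameter model that is not one of this paper's models: a mono-exponential decay $f(b;\theta) = e^{-b\theta}$ measured at a single acquisition setting $b$ with additive Gaussian noise of standard deviation $\sigma$. Its Fisher information is $I(b;\theta) = b^2 e^{-2b\theta}/\sigma^2$, which is maximized at $b^* = 1/\theta$, so the locally optimal design depends on the very parameter the experiment is meant to estimate. The sketch below, with illustrative names `fisher_information` and `best_setting`, checks this numerically over an assumed grid of candidate settings.

```python
import math


def fisher_information(b: float, theta: float, sigma: float = 1.0) -> float:
    """Fisher information about theta from one noisy sample of exp(-b * theta).

    With Gaussian noise, I = (df/dtheta)^2 / sigma^2, and
    df/dtheta = -b * exp(-b * theta).
    """
    return (b * math.exp(-b * theta)) ** 2 / sigma ** 2


def best_setting(theta: float, candidates: list[float]) -> float:
    """Classical (locally optimal) design: the candidate b with maximal
    Fisher information for an assumed value of theta."""
    return max(candidates, key=lambda b: fisher_information(b, theta))


# Hypothetical candidate acquisition settings b = 0.01, 0.02, ..., 5.00
candidates = [round(0.01 * k, 2) for k in range(1, 501)]

for theta in (0.5, 1.0, 2.0):
    print(f"theta={theta}: best b = {best_setting(theta, candidates)}")
# theta=0.5: best b = 2.0
# theta=1.0: best b = 1.0
# theta=2.0: best b = 0.5
```

Each assumed value of $\theta$ yields a different "optimal" setting, so a design tuned for one parameter value is suboptimal wherever the true value differs, as it does across voxels of an image.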
This paper considers a new paradigm for ED, which we cast as a population-wide feature selection (FS) problem instead of one of continuous optimization. The paradigm requires training data densely sampled over the space of possible measurements from a representative population of test cases. It identifies i) a subset of locations in the measurement space that allows high performance on a downstream task, and ii) a trained network to support this task. The problem differs from most FS problems addressed by recent approaches Lee et al. (2022); Wojtas & Chen (2020), which are demonstrated on, e.g., protein-coding genes or noisy two-moons data and typically aim to 'identify a small, highly discriminative subset' Kuncheva et al. (2020). In ED, each measurement typically offers a similar amount of information overall to support task performance, i.e. ED measurements are all informative, but they inform different aspects of the task; ED seeks a combination that covers all important aspects. Accordingly, and again unlike most FS problems, measurements are highly correlated (see Figure 3). We therefore propose JOFSTO: an approach for joint FS and task optimization, applied to task-driven ED. JOFSTO's novel architecture simultaneously trains two neural networks end-to-end: the first scores and ranks the features by relevance to optimize subsampling, and the second uses a subset of measurements to perform a prespecified task, such as estimating model parameters. By coupling subsampling with training on the downstream task, JOFSTO outputs both an optimized ED and a network trained for optimal task performance given that design. A simple scoring mechanism enables JOFSTO to identify effective subsets of potentially highly correlated features efficiently, and joint training during subsampling maintains high task performance.
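The link between feature scores and the selected design can be illustrated with a minimal, dependency-free sketch of the subsampling step alone. The function names, the top-k masking, and the choice to weight retained measurements by their scores are illustrative assumptions, not necessarily JOFSTO's exact construction; the trainable scoring network that would produce the scores and the task network that would consume the masked input are both omitted.

```python
from typing import Sequence


def top_k_mask(scores: Sequence[float], k: int) -> list[int]:
    """Binary mask (1 = acquire, 0 = drop) keeping the k highest-scoring candidates."""
    if not 0 < k <= len(scores):
        raise ValueError("k must satisfy 0 < k <= number of candidates")
    # Rank candidate indices by descending score; ties go to the lower index
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    keep = set(ranked[:k])
    return [1 if i in keep else 0 for i in range(len(scores))]


def subsample(x: Sequence[float], scores: Sequence[float], k: int) -> list[float]:
    """Hypothetical task-network input: each measurement is weighted by its
    score and every channel outside the top-k is zeroed."""
    mask = top_k_mask(scores, k)
    return [xi * si * mi for xi, si, mi in zip(x, scores, mask)]


# Toy example: C = 4 densely sampled measurements for one voxel, design size k = 2
x = [0.9, 0.7, 0.5, 0.3]
scores = [0.2, 0.9, 0.4, 0.8]
print(top_k_mask(scores, 2))  # [0, 1, 0, 1]
print(subsample(x, scores, 2))
```

The indices with mask value 1 define the selected design (here the second and fourth candidate acquisitions). One plausible reason to multiply by the continuous scores as well as the binary mask is that the product remains differentiable with respect to the retained scores, whereas the hard mask alone is not.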
Furthermore, we place JOFSTO's novel approach to scoring and feature subsampling in a recursive feature elimination (RFE) framework that reduces the full set of samples to a small subsample stepwise, which improves the optimization and aids convergence to strong solutions. We demonstrate the benefits of JOFSTO in applications in quantitative (MR and hyperspectral) imaging, where the standard aim is to fit a model to multichannel measurements in each image voxel to obtain informative parameters that provide, e.g., biological information. In MRI, acquisition time is limited by cost and by the ability of subjects to remain motionless in the noisy and claustrophobic environment of the scanner, so ED is crucial to support the most accurate image-driven diagnosis, prognosis, or treatment choices. In hyperspectral imaging, recovering high-quality information from the few wavelengths chosen by ED increases acquisition speed, avoids misalignment, reduces storage requirements, and speeds up clinical adoption. Experiments using both simulations and real-world data show that JOFSTO outperforms classical ED and produces state-of-the-art performance in a subsampling-reconstruction MRI challenge and in oxygen saturation estimation. Moreover, within the subsampling paradigm, JOFSTO outperforms state-of-the-art FS algorithms Wojtas & Chen (2020); Lee et al. (2022) on four datasets/tasks.
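A stepwise reduction of this kind can be sketched as a generic recursive elimination loop. The `schedule` of subset sizes and the `score_fn` callback are hypothetical placeholders: in JOFSTO the scores would come from the scoring network after a phase of joint training on the currently active measurements, which this sketch does not model.

```python
from typing import Callable, Sequence


def recursive_elimination(
    n_candidates: int,
    schedule: Sequence[int],
    score_fn: Callable[[list[int]], dict[int, float]],
) -> list[list[int]]:
    """Shrink the active set of candidate measurements stepwise.

    schedule: strictly decreasing subset sizes, e.g. [64, 32, 16].
    score_fn: given the currently active indices, returns a score per index
              (a stand-in for scores obtained after a round of training).
    Returns the active subset after each elimination step.
    """
    active = list(range(n_candidates))
    history: list[list[int]] = []
    for size in schedule:
        if not 0 < size < len(active):
            raise ValueError("schedule must be positive and strictly decreasing")
        scores = score_fn(active)  # rescore only the surviving candidates
        # Keep the `size` best-scoring indices, then restore ascending order
        ranked = sorted(active, key=lambda i: (-scores[i], i))
        active = sorted(ranked[:size])
        history.append(active)
    return history


def toy_scores(active: list[int]) -> dict[int, float]:
    """Hypothetical fixed scores standing in for learned feature scores."""
    return {i: float((7 * i) % 10) for i in active}


print(recursive_elimination(10, [6, 3], toy_scores))
# [[1, 2, 4, 5, 7, 8], [1, 4, 7]]
```

Because `score_fn` is re-evaluated on the surviving subset at every step, scores learned jointly with the task can change as correlated candidates are removed, which a single one-shot ranking cannot capture.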

2. RELATED WORK

Experiment Design (ED) The design of an experiment is a set $A = \{a_1, \ldots, a_C\}$, where each $a_i \in A$ is a combination of acquisition-parameter settings, i.e. choices of independent variables under the control of the experimenter, and $C$ is the number of measurements acquired. Each data acquisition under design $A$ provides a set of measurements $x = (x_1, \ldots, x_C)$ corresponding to the elements of $A$; $x$ is a sample under design $A$. ED optimization seeks the $A$ that maximally supports a task, such as estimating the parameters $\theta$ of a model $f(x; \theta)$ that relates measurements $x$ of a system to underlying properties of interest encapsulated in $\theta$.

Classical ED approaches typically aim to minimize the expected variance of model parameters, e.g. as encoded in the Fisher information matrix Pukelsheim (2006), using Bayesian techniques Chaloner & Verdinelli (1995); Kaddour et al. (2020) or indirectly via an operating characteristic curve Montgomery (2001). However, for non-linear models, parameter uncertainty depends on the parameter values, so the design optimization requires prespecification of the parameter values of interest, leading to circularity. Moreover, the optimization often becomes cumbersome, particularly as model complexity increases and $C$ becomes large Alexander (2008). Our subsampling-task paradigm for ED avoids this circularity and replaces the challenging continuous optimization problem with neural network training. Recent work, e.g. Pizzolato et al. (2020); Blumberg et al. (2022); Waterhouse & Stoyanov (2022), raises the possibility of a subsampling approach to ED but considers it only in specific scenarios, whilst Grussu et al. (2021) does not densely sample the measurement and spatial domains, instead arguing that this leads to a model-free ED. Here we uniquely couple the subsampling with specific downstream tasks to optimize jointly the

