SIMULATING ENVIRONMENTS FOR EVALUATING SCARCE RESOURCE ALLOCATION POLICIES

Anonymous authors
Paper under double-blind review

Abstract

Consider the sequential decision problem of allocating a limited supply of resources to a pool of potential recipients. This scarce resource allocation problem arises in a variety of settings characterized by "hard-to-make" tradeoffs, such as assigning organs to transplant patients or rationing ventilators in overstretched ICUs. Assisting human judgement in these choices are dynamic allocation policies that prescribe how to match available assets to an evolving pool of beneficiaries, such as clinical guidelines that stipulate selection criteria on the basis of recipient and organ attributes. However, while such policies have received increasing attention in recent years, a key challenge lies in pre-deployment evaluation: How might allocation policies behave in the real world? In particular, in addition to conventional backtesting, it is crucial that policies be evaluated under a variety of possible scenarios and sensitivities, such as distributions of recipients and organs that diverge from historical patterns. In this work, we present AllSim, an open-source framework for data-driven simulation of scarce resource allocation policies for pre-deployment evaluation. Simulation environments are modular (i.e. parameterized componentwise), learnable (i.e. from historical data), and customizable (i.e. to unseen conditions), and, upon interaction with a policy, output a dataset of simulated outcomes for analysis and benchmarking. Compared to existing work, we believe this approach takes a step towards more methodical evaluation of scarce resource allocation policies.

1. INTRODUCTION

The distribution of organs for transplant is a prototypical example of the scarce resource allocation problem, one with salient "life-or-death" consequences that place significant pressure on decision-makers to make implicit but difficult trade-offs. To make the task more manageable, dynamic allocation policies assist human judgement in these choices by prescribing how to match each available unit of resource to a potential beneficiary. For instance, the United Network for Organ Sharing (UNOS) stipulates policies for organ allocation according to weighted organ- and patient-specific criteria, such as time on the waiting list, severity of illness, human leukocyte antigen matching, prognostic information, and other considerations [1-3]. Likewise, in the machine learning community, a variety of data-driven algorithms have been proposed as drop-in dynamic allocation policies, leveraging modern techniques for estimating treatment effects, predicting survival times, and accounting for organ scarcity, and they often demonstrate substantial improvements in life expectancy when evaluated on a backtested basis [4-6].

Is such backtested performance sufficiently convincing for practitioners to adopt these allocation strategies? In many cases the answer is no, since there is still no standardised way in which this backtesting is actually undertaken [7]. The evaluation methods that do exist share common challenges: First, when testing a target policy different from the actual policy used to generate the data, any offline evaluation method is immediately biased away from the true open-loop data-generating process [8]. Second, the evaluation methods themselves impute predicted outcomes, often with simple linear models [9, 10], which are not flexible enough to properly test more flexible machine learning methods. The compounding effect of these limitations leaves clinicians unconvinced by the results [11], consequently limiting the use of these potentially very beneficial systems.
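To make the evaluation setting concrete, the closed-loop interaction described above can be sketched as a toy simulation: patients and organs arrive stochastically, a candidate allocation policy selects a recipient for each organ, and the simulator logs the resulting outcomes as a dataset for analysis. This is an illustrative sketch only, not the AllSim interface; all names (`Patient`, `Organ`, `simulate`, `sickest_first`) and the fixed arrival and outcome distributions are hypothetical, whereas a learned simulator would fit these components to historical data.

```python
import random
from dataclasses import dataclass

@dataclass
class Patient:
    id: int
    severity: float       # higher = sicker (toy attribute)
    arrival_time: int

@dataclass
class Organ:
    id: int
    quality: float        # higher = better graft (toy attribute)
    arrival_time: int

def simulate(policy, horizon=50, seed=0):
    """Run an allocation policy in a toy environment; return a log of matches.

    Arrivals and outcomes here are drawn from fixed, hand-picked
    distributions purely for illustration.
    """
    rng = random.Random(seed)
    waitlist, log = [], []
    next_pid = next_oid = 0
    for t in range(horizon):
        # Stochastic patient arrivals (Bernoulli each step).
        if rng.random() < 0.6:
            waitlist.append(Patient(next_pid, rng.random(), t))
            next_pid += 1
        # Occasionally an organ becomes available and must be allocated now.
        if waitlist and rng.random() < 0.3:
            organ = Organ(next_oid, rng.random(), t)
            next_oid += 1
            recipient = policy(organ, waitlist)
            waitlist.remove(recipient)
            # Toy outcome model: benefit grows with severity and organ quality.
            outcome = recipient.severity * organ.quality
            log.append({"t": t, "patient": recipient.id,
                        "organ": organ.id, "outcome": outcome})
    return log

# Example policy: give each organ to the sickest waiting patient.
def sickest_first(organ, waitlist):
    return max(waitlist, key=lambda p: p.severity)

log = simulate(sickest_first)
```

Because the environment, rather than historical logs, generates counterfactual arrivals and outcomes, two competing policies can be compared on identical seeded scenarios, which is precisely what backtesting against a single observed trajectory cannot offer.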

