WHEN TO MAKE AND BREAK COMMITMENTS?

Abstract

In many scenarios, decision-makers must commit to long-term actions and wait until their resolution before receiving the payoff of said actions, and staying committed to such actions usually incurs continual costs. For instance, in healthcare, a newly-discovered treatment cannot be marketed to patients until a clinical trial is conducted, which both requires time and is costly. Of course, in such scenarios, not all commitments eventually pay off; for instance, a clinical trial might fail to show efficacy. Given the time pressure created by the continual cost of keeping a commitment, we aim to answer: When should a decision-maker break a commitment that is likely to fail, either to make an alternative commitment or to make no further commitments at all? First, we formulate this question as a new type of optimal stopping/switching problem called the optimal commitment problem (OCP). Then, we theoretically analyze OCP and, based on the insights we gain, propose a practical algorithm for solving it. Finally, we empirically evaluate the performance of our algorithm in running clinical trials with subpopulation selection.

1. INTRODUCTION

In many real-world settings, decision-makers must commit to long-term actions and wait until their resolution before receiving the payoff of said actions. Meanwhile, staying committed to such actions incurs continual costs. For instance, in portfolio management, it might take time for an asset to develop additional value after an initial investment, and keeping capital tied up in an asset comes with an opportunity cost for the investor (Markowitz, 1959; Merton, 1969; Karatzas and Wang, 2020). In an energy network, turning power stations on and off is not an immediate action, hence a sudden increase in energy demand can only be met with a delay after putting more stations into operation, and keeping stations operational consumes resources (Rafique and Jianhua, 2018; Olofsson et al., 2022). In healthcare, a newly-discovered treatment can only be marketed to patients once a successful clinical trial targeting that treatment has been conducted, which both requires time and is costly (Kaitin, 2010; Umscheid et al., 2011). Of course, not all commitments eventually pay off: An asset might end up losing value despite investments, energy demands might shift faster than a network can react, and a clinical trial might fail to show efficacy for the targeted treatment. Given the time pressure created by the continual cost of keeping a commitment, our goal in this paper is to answer the question: When should a decision-maker break a commitment (thereby avoiding future costs but also forfeiting any potential returns), either to make an alternative commitment instead or to make no further commitments at all? Solving this problem optimally requires a careful balance between exploration and exploitation: The earlier a commitment that is bound to fail is broken, the more resources are saved (cf. exploitation); but the longer one is kept, the more information is revealed regarding whether the commitment is actually failing or might still succeed (cf. exploration), and in certain cases, also regarding the prospects of similar commitments one could make instead.

Related problems are mostly studied within the context of adaptive experimentation and sequential hypothesis testing (see Section 5). As such, we focus on adaptive experimentation as our main application as well. More specifically, we consider the problem of selecting the target population of an adaptive experiment. Suppose an experimenter, interested in proving the efficacy of a new treatment, starts running an initial experiment that targets a certain population of patients. Suppose further that the treatment being tested is effective only for a relatively narrow subpopulation of patients but not for the wider population as a whole. Then, an experiment targeting the overall population, rather than the subpopulation specifically, will most probably fail to prove efficacy and prevent the deployment of the treatment for the patients who would have actually benefited from it, not to mention waste time and resources (Moineddin et al., 2008; Lipkovich et al., 2017; Chiu et al., 2018). Of course, the experimenter has no knowledge of this in advance, but the initial experiment they have set up would slowly reveal more information regarding the effects of the treatment and the fact that the ongoing experiment is bound to fail. We want to be able to determine at what point the experimenter has enough information to justify breaking their commitment to the initial experiment, which targets too wide a population to be successful, in favor of making a new commitment to a follow-up experiment that focuses on a narrower subpopulation instead.

Contributions Our contributions are threefold: First, we formulate the problem of making and breaking commitments in a timely manner as a new type of optimal stopping/switching problem called the optimal commitment problem (OCP) (Section 2). The defining feature of OCP is that rewards are received only when a known time point is reached, while costs are incurred continually, requiring commitment to actions but creating an incentive to abandon those commitments. As we will show later, OCP cannot be easily solved via conventional reinforcement learning techniques due to its non-convex nature. Second, we theoretically analyze a simplified case of OCP to identify the characteristics of the optimal solution (Section 3) and, based on the insights we gain, propose a practical algorithm for the more general case (Section 4). Third, we empirically evaluate the performance of our algorithm in running experiments with subpopulation selection (Section 6). Before we move on, it should be emphasized that, although we predominantly consider adaptive experimentation as our main application, our contributions remain generally applicable to portfolio management, energy systems, and any other decision-making scenarios that require commitments to long-term actions.
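The subpopulation pitfall described above can be illustrated with a small simulation. All numbers, the Gaussian noise model, and the threshold-based success criterion below are hypothetical choices made purely for illustration; they are not part of the paper's formal setup:

```python
import random
import statistics

def trial_succeeds(effects, weights, n=200, threshold=0.2, rng=None):
    """One trial on a target population: sample n subjects, observe noisy
    outcomes, and declare success if the sample mean clears `threshold`.
    (A stand-in for a proper statistical test, chosen for brevity.)"""
    rng = rng or random.Random()
    outcomes = []
    for _ in range(n):
        x = rng.choices(range(len(weights)), weights=weights)[0]
        outcomes.append(effects[x] + rng.gauss(0.0, 0.5))
    return statistics.fmean(outcomes) > threshold

rng = random.Random(0)
# Responders (effect 0.6) make up 20% of the overall population;
# the remaining 80% do not respond to the treatment (effect 0.0).
wide = sum(trial_succeeds([0.0, 0.6], [0.8, 0.2], rng=rng) for _ in range(200))
narrow = sum(trial_succeeds([0.6], [1.0], rng=rng) for _ in range(200))
# Trials targeting only the responder subpopulation succeed far more often
# than trials targeting the overall population.
```

Under these (made-up) effect sizes, the wide-population trial almost always fails because the average effect is diluted below the success threshold, while the narrow trial almost always succeeds.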

2. OPTIMAL COMMITMENT PROBLEM

We first introduce the problem of optimal commitment from the perspective of running experiments. As far as our formulation is concerned, experiments are conducted to confirm the efficacy of an intervention by observing the outcome of said intervention for subjects belonging to a particular population. However, this experiment-focused perspective does not limit the applicability of OCP; we stress its generality at the end of the section. We provide a glossary of terms and notation in Appendix K.

Populations Let $\mathcal{X}$ be a discrete set of atomic-populations such that every subject is a member of exactly one atomic-population $x \in \mathcal{X}$. Denote with $\eta_x \in [0, 1]$ the probability of a subject being from atomic-population $x$ (such that $\sum_{x \in \mathcal{X}} \eta_x = 1$), and with $\Omega_x$ the distribution of outcomes for atomic-population $x$ such that the mean outcome $\theta_x = \mathbb{E}_{y \sim \Omega_x}[y]$ is the effect of some intervention for atomic-population $x$. Now, wider populations can be constructed by combining various atomic-populations. Let any $X \subseteq \mathcal{X}$ represent the population of subjects who belong to one of the atomic-populations $\{x \in X\}$. Then, the probability of a subject being from population $X$ can be written as $\eta_X = \sum_{x \in X} \eta_x$, the probability of a subject being from atomic-population $x$ conditioned on the fact that they are from population $X$ can be written as $\eta_{x|X} = \eta_x / \eta_X$, and the average effect for population $X$ can be written as $\theta_X = \sum_{x \in X} \eta_{x|X} \theta_x$.

Experiments An experiment is largely characterized by the population it targets, its sample horizon, and its success criterion. During an experiment that targets population $X$, at each time step $t \in \{1, 2, \ldots\}$ that the experiment continues, first a subject from some atomic-population $x_t$ within the targeted population $X$ arrives with probability $\eta_{x_t|X}$, and then the outcome $y_t \sim \Omega_{x_t}$ for that subject is observed. This process generates an online dataset $D_t = \{(x_{t'}, y_{t'})\}_{t'=1}^{t}$.
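The population quantities defined above are straightforward to compute; the following sketch does so for a hypothetical example with three atomic-populations (all arrival probabilities and effects are invented for illustration):

```python
# Hypothetical example: three atomic-populations with arrival probabilities
# eta_x (summing to 1) and mean intervention effects theta_x.
eta = {"a": 0.5, "b": 0.3, "c": 0.2}
theta = {"a": 0.0, "b": 0.1, "c": 0.6}

def population_stats(X):
    """Return (eta_X, eta_{x|X}, theta_X) for a population X,
    i.e. a subset of the atomic-populations."""
    eta_X = sum(eta[x] for x in X)                       # arrival probability of X
    eta_cond = {x: eta[x] / eta_X for x in X}            # conditional arrival probs
    theta_X = sum(eta_cond[x] * theta[x] for x in X)     # average effect over X
    return eta_X, eta_cond, theta_X

# Average effect over the full population vs. the narrow subpopulation {"c"}:
eta_full, _, theta_full = population_stats({"a", "b", "c"})  # theta_full = 0.15
_, _, theta_sub = population_stats({"c"})                    # theta_sub = 0.6
```

Note how the narrow subpopulation's large effect is diluted in the average effect of the full population, which is what makes wide-targeted experiments prone to failure.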
The experiment terminates when a pre-specified sample/time horizon $\tau$ is reached. Once terminated, the experiment is declared a success if $\rho(D_\tau) = 1$, where $\rho : (\mathcal{X} \times \mathbb{R})^\tau \to \{0, 1\}$ is the success criterion, and declared a failure otherwise. Formally, the tuple $\psi = (X, \tau, \rho)$ constitutes an experiment design.

Meta-experimenter Suppose a meta-experimenter is given a set of viable experiment designs $\Psi$ and is tasked with running at least one successful experiment. Each experiment $\psi \in \Psi$ has an associated cost $C_\psi \in \mathbb{R}_+$, which the experiment incurs per time step that it continues, and an associated reward $R_\psi \in \mathbb{R}_+$, which the experiment provides only if it eventually succeeds. The meta-experimenter aims to maximize utility, that is, the difference between any eventual reward received and the total costs incurred by running experiments. They first pick an initial experiment $\psi_1 \in \Psi$ and start conducting it, which generates an online dataset $D^1_t$ as described earlier. Now at each time step $t$, they need to decide whether they should stay committed to their initial decision and wait until $\psi_1$ terminates, or stop $\psi_1$ early in favor of starting a new experiment $\psi_2$. They might decide on the latter to avoid unnecessary costs if $D^1_t$ already indicates that $\psi_1$ is unlikely to succeed. If at some point a secondary experiment $\psi_2$ is started, the meta-experimenter has a similar decision to make
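To make the experiment mechanics concrete, the following sketch simulates a single experiment design $\psi = (X, \tau, \rho)$ run to completion and returns the realized utility. The function names, the noiseless outcome model in the usage example, and the mean-threshold success criterion are all illustrative assumptions, not the paper's algorithm:

```python
import random
import statistics

def run_experiment(X, tau, rho, sample_outcome, eta_cond, cost, reward, seed=0):
    """Simulate one experiment design psi = (X, tau, rho) to completion.

    Returns realized utility: `reward` if rho(D_tau) == 1, minus tau * cost.
    `sample_outcome(x, rng)` draws y ~ Omega_x; `eta_cond` maps x to eta_{x|X}.
    """
    rng = random.Random(seed)
    atoms = list(X)
    weights = [eta_cond[x] for x in atoms]
    D = []
    for _ in range(tau):
        x_t = rng.choices(atoms, weights=weights)[0]  # subject arrival
        y_t = sample_outcome(x_t, rng)                # observed outcome
        D.append((x_t, y_t))
    success = rho(D)
    return (reward if success else 0.0) - tau * cost

# Illustrative usage: a narrow population with a single atomic-population "c"
# of effect 0.6, noiseless outcomes, and a mean-threshold success criterion.
theta = {"c": 0.6}
rho = lambda D: int(statistics.fmean(y for _, y in D) > 0.1)
utility = run_experiment(
    X={"c"}, tau=10, rho=rho,
    sample_outcome=lambda x, rng: theta[x],  # noiseless for illustration
    eta_cond={"c": 1.0}, cost=0.1, reward=5.0,
)
```

In this toy run the experiment succeeds, so the realized utility is the reward minus ten time steps of cost. The meta-experimenter's problem is precisely that $\tau \cdot C_\psi$ accrues whether or not the eventual $\rho(D_\tau)$ turns out to be 1.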

