NEURAL DESIGN FOR GENETIC PERTURBATION EXPERIMENTS

Abstract

The problem of how to genetically modify cells in order to maximize a certain cellular phenotype has taken center stage in drug development over the last few years (with, for example, genetically edited CAR-T, CAR-NK, and CAR-NKT cells entering cancer clinical trials). Exhaustively searching the space of all possible genetic edits (perturbations), or combinations thereof, is infeasible due to cost and experimental limitations. This work provides a theoretically sound framework for iteratively exploring the space of perturbations in pooled batches in order to maximize a target phenotype under an experimental budget. Inspired by this application domain, we study the problem of batch query bandit optimization and introduce the Optimistic Arm Elimination (OAE) principle, designed to find an almost optimal arm under different functional relationships between the queries (arms) and the outputs (rewards). We analyze the convergence properties of OAE by relating it to the Eluder dimension of the algorithm's function class, and validate that OAE outperforms other strategies in finding optimal actions in experiments on simulated problems, on public datasets well studied in bandit contexts, and on genetic perturbation datasets when the regression model is a deep neural network. OAE also outperforms the benchmark algorithms on 3 of 4 datasets in the GeneDisco experimental planning challenge.

1. INTRODUCTION

We are inspired by the problem of finding the genetic perturbations that maximize a given function of a cell (a particular biological pathway or mechanism, for example the proliferation or exhaustion of particular immune cells) while performing as few perturbations as possible. In particular, we are interested in prioritizing the set of genetic knockouts (via shRNA or CRISPR) to perform on cells so as to optimize a particular scalar cellular phenotype. Since the space of possible perturbations is very large (with roughly 20,000 human protein-coding genes) and each knockout is expensive, we would like to order the perturbations strategically so that we find one that optimizes the phenotype of interest in fewer total perturbations than, say, brute-force application of all possible knockouts. In this work we consider only single-gene knockout perturbations, since they are the most common, but multi-gene perturbations are also possible (though considerably more technically complex to perform at scale). While a multi-gene perturbation may be trivially represented as a distinct (combined) perturbation in our framework, we leave for future work the more interesting extension of embedding, predicting, and planning multi-gene perturbations using previously observed single-gene perturbations.

With this objective in mind, we propose a simple method for improving a cellular phenotype under a limited budget of genetic perturbation experiments. Although this work is inspired by a concrete biological problem, our results and algorithms apply much more generally to experimental design with neural network models. We develop and evaluate a family of algorithms for the zero-noise batch query bandit problem based on the Optimistic Arm Elimination (OAE) principle, focusing on tractable versions of these algorithms that are compatible with neural network function approximation.
During each time step, OAE fits a reward model to the responses observed so far while simultaneously maximizing the predicted reward on the arms yet to be pulled. The algorithm then queries the batch of arms whose predicted reward is maximal among the arms that have not been tried. We conduct a series of experiments on synthetic and public data from the UCI database (Dua & Graff, 2017) and show that OAE is able to find the optimal "arm" using fewer batch queries than other algorithms such as greedy and random sampling. Our experimental evaluation covers both neurally realizable and non-neurally-realizable function landscapes. The performance of OAE against the benchmarks is comparable in both settings, demonstrating that although our presentation of the OAE algorithm assumes realizability for the sake of clarity, the assumption is not required in practice. In the setting where the function class is realizable, i.e., the function class F used by OAE contains the function generating the rewards, and the evaluation is noiseless, we prove two query lower bounds for the classes of linear and 1-Lipschitz functions. We validate OAE on the public CMAP dataset (Subramanian et al., 2017), which contains tens of thousands of genetic shRNA knockout perturbations, and show that it always outperforms a baseline and almost always outperforms a simpler greedy algorithm, both in convergence speed to an optimal perturbation and in the associated phenotypic rewards. These results illustrate how perturbational embeddings learned in one biological context can remain useful in a different biological context, even when the reward functions of the two contexts differ. Finally, we benchmark our methods on the GeneDisco dataset and algorithm suite (Mehrjou et al., 2021) and show OAE to be competitive against the benchmark algorithms in the task of maximizing hit ratios.
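The fit-then-query loop just described can be sketched as follows. This is a minimal illustration only: it substitutes a linear reward model for the deep networks used in the paper, implements the "optimistic fit" as a least-squares objective with a closed-form bonus toward the unqueried arms, and all function and parameter names (`oae_batch_loop`, `opt_weight`, etc.) are our own, not the authors' implementation.

```python
import numpy as np

def oae_batch_loop(X, y, batch_size=5, n_rounds=4, opt_weight=1.0, ridge=1e-3):
    """Simplified OAE loop with a linear reward model.

    Each round: fit a model that both explains the observed rewards and is
    optimistic (predicts high reward) on the arms not yet queried, then
    query the batch of unqueried arms with the highest predicted reward.
    """
    n, d = X.shape
    queried = np.zeros(n, dtype=bool)
    for _ in range(n_rounds):
        remaining = ~queried
        if not remaining.any():
            break
        if queried.any():
            Xo, yo = X[queried], y[queried]
            # Optimistic least squares:
            #   min_w ||Xo w - yo||^2 + ridge ||w||^2 - opt_weight * mean(X_rem) @ w
            # has the closed-form solution below; the last term tilts the fit
            # toward predicting high rewards on the arms not yet pulled.
            A = Xo.T @ Xo + ridge * np.eye(d)
            b = Xo.T @ yo + 0.5 * opt_weight * X[remaining].mean(axis=0)
            w = np.linalg.solve(A, b)
            scores = X @ w
        else:
            scores = np.zeros(n)  # first round: no data yet, pick arbitrarily
        scores[queried] = -np.inf  # never re-query an arm
        batch = np.argsort(scores)[-batch_size:]
        queried[batch] = True      # "pull" these arms; y[batch] is revealed
    return queried

# Toy example: 100 arms with a hidden linear reward function.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_star = rng.normal(size=5)
y = X @ w_star
picked = oae_batch_loop(X, y, batch_size=5, n_rounds=4)
best_found = y[picked].max()
```

In the paper's setting the model class is a neural network rather than this linear sketch, and optimism is enforced over the candidate arms rather than through a single mean-direction bonus.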
In this work we move beyond the typical parametric and Bayesian assumptions of prior work towards algorithms that operate in conjunction with neural network models. We provide guarantees for the noiseless setting we study, based on the Eluder dimension of Russo & Van Roy (2013).
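For reference, the complexity measure behind these guarantees can be stated as follows (our paraphrase of the definitions in Russo & Van Roy (2013); notation ours):

```latex
% epsilon-dependence: agreement on past actions forces agreement on a new one.
\text{An action } a \text{ is } \varepsilon\text{-dependent on } a_1,\dots,a_n
\text{ with respect to } \mathcal{F} \text{ if any pair } f,\tilde{f}\in\mathcal{F}
\text{ satisfying }
\sqrt{\sum_{i=1}^{n}\bigl(f(a_i)-\tilde{f}(a_i)\bigr)^{2}}\le\varepsilon
\text{ also satisfies } \bigl|f(a)-\tilde{f}(a)\bigr|\le\varepsilon.
% The Eluder dimension dim_E(F, eps) is then the length of the longest
% sequence of actions in which every element is eps'-independent of its
% predecessors for some eps' >= eps.
```

Intuitively, a function class with small Eluder dimension cannot keep "surprising" the learner: after enough informative queries, agreement on past actions pins down predictions on new ones.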

2. RELATED WORK

Parallel Bandits Although batch learning is widely applicable in scientific applications, it has received relatively little attention in the bandit literature. Recent work (Chan et al., 2021) shows that in the setting of contextual linear bandits (Abbasi-Yadkori et al., 2011), the finite sample complexity of parallel learning matches that of sequential learning irrespective of the batch size, provided the number of batches is large enough. Unfortunately, this is rarely the regime that matters in practical applications such as drug development, where the size of the experiment batch may be large but each experiment may be very time consuming, thus limiting their number. In this work we specifically address this setting in our experimental evaluation in Section E.



Optimization The field of Bayesian optimization has long studied the problem of optimizing functions whose evaluations are severely limited by time or cost (Jones et al., 1998). For example, Srinivas et al. (2009) introduce the GP-UCB algorithm for optimizing unknown functions. Other approaches, based on adaptive basis function regression, have also been used to model the payoff function, as in Snoek et al. (2015). These algorithms have been applied in the drug discovery context: Mueller et al. (2017) used Bayesian optimization to optimize biological phenotypes. Very recently, GeneDisco was released as a benchmark suite for evaluating active learning algorithms for experiment design in drug discovery (Mehrjou et al., 2021). Perhaps most relevant to our setting are the many works that study batch acquisition in Bayesian active learning and optimization, such as Kirsch et al. (2019); Kathuria et al. (2016), and the GP-BUCB algorithm of Desautels et al. (2014).
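To make the GP-UCB idea concrete, the sketch below computes a Gaussian-process posterior on a grid of candidates and selects the point maximizing mean plus scaled standard deviation. Everything here (the RBF kernel, the `beta` weight, the toy objective) is an illustrative assumption, not the construction used by Srinivas et al. (2009) verbatim.

```python
import numpy as np

def rbf(a, b, ls=0.5):
    # Squared-exponential kernel between rows of a and rows of b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def gp_ucb_select(X_obs, y_obs, X_cand, beta=2.0, noise=1e-6):
    """One GP-UCB step: posterior mean/std on candidates, argmax of UCB."""
    K = rbf(X_obs, X_obs) + noise * np.eye(len(X_obs))
    Ks = rbf(X_cand, X_obs)
    alpha = np.linalg.solve(K, y_obs)          # weights for the posterior mean
    mu = Ks @ alpha
    v = np.linalg.solve(K, Ks.T)
    var = np.clip(1.0 - np.sum(Ks * v.T, axis=1), 0.0, None)
    ucb = mu + np.sqrt(beta) * np.sqrt(var)    # optimism in the face of uncertainty
    return int(np.argmax(ucb)), mu, np.sqrt(var)

# Toy objective f(x) = -(x - 0.7)^2 on [0, 1], with two points observed so far.
grid = np.linspace(0.0, 1.0, 101)[:, None]
X_obs = np.array([[0.1], [0.9]])
y_obs = -(X_obs[:, 0] - 0.7) ** 2
idx, mu, sd = gp_ucb_select(X_obs, y_obs, grid)
x_next = grid[idx, 0]
```

Because the posterior variance is large in the unexplored middle of the interval, the UCB rule proposes an interior point rather than re-querying near the two observations.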

Prior work in experiment design tries to identify causal structures with a fixed budget of experiments (Ghassami et al., 2018). Scherrer et al. (2021) propose a mechanism to select intervention targets to enable more efficient causal structure learning. Sussex et al. (2021) extend the amount of information contained in each experiment by simultaneously intervening on multiple variables. Causal matching, where an experimenter performs a set of interventions aimed at transforming the system to a desired state, is studied in Zhang et al. (2021).

Neural Bandits Methods such as Neural UCB and Shallow Neural UCB (Zhou et al. (2020); Xu et al. (2020)) add an analytically computable optimistic bonus to model predictions that is extremely reminiscent of the one used in linear bandits (Auer, 2002; Dani et al., 2008); their theoretical validity therefore depends on 'linearizing' conditions.
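The linear-bandit bonus these neural methods mimic is the elliptical confidence width sqrt(x^T A^{-1} x) around a ridge estimate. The sketch below is our own minimal illustration of that quantity (function and variable names are ours); NeuralUCB-style methods compute the same form in the network's gradient features rather than on the raw arm vectors.

```python
import numpy as np

def linucb_mean_bonus(X_obs, y_obs, X_cand, lam=1.0):
    """Ridge estimate of the reward plus the elliptical exploration bonus."""
    d = X_obs.shape[1]
    A = lam * np.eye(d) + X_obs.T @ X_obs            # regularized design matrix
    theta = np.linalg.solve(A, X_obs.T @ y_obs)      # ridge regression estimate
    mean = X_cand @ theta
    A_inv = np.linalg.inv(A)
    # bonus_i = sqrt(x_i^T A^{-1} x_i): wide along unexplored directions.
    bonus = np.sqrt(np.einsum('ij,jk,ik->i', X_cand, A_inv, X_cand))
    return mean, bonus

# Ten pulls along direction e1 only; candidates along e1 and the unexplored e2.
X_obs = np.array([[1.0, 0.0]] * 10)
y_obs = np.ones(10)
cand = np.array([[1.0, 0.0], [0.0, 1.0]])
mean, bonus = linucb_mean_bonus(X_obs, y_obs, cand)
```

The unexplored direction receives a much larger bonus than the heavily sampled one, which is exactly the exploration pressure that an acquisition rule of the form mean + alpha * bonus exerts.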

