NEURAL UNBALANCED OPTIMAL TRANSPORT VIA CYCLE-CONSISTENT SEMI-COUPLINGS

Abstract

Comparing unpaired samples of a distribution or population taken at different points in time is a fundamental task in many application domains where measuring populations is destructive and cannot be done repeatedly on the same sample, such as in single-cell biology. Optimal transport (OT) can solve this challenge by learning an optimal coupling of samples across distributions from unpaired data. However, the usual formulation of OT assumes conservation of mass, which is violated in unbalanced scenarios in which the population size changes (e.g., cell proliferation or death) between measurements. In this work, we introduce NUBOT, a neural unbalanced OT formulation that relies on the formalism of semi-couplings to account for creation and destruction of mass. To estimate such semi-couplings and generalize out-of-sample, we derive an efficient parameterization based on neural optimal transport maps and propose a novel algorithmic scheme through a cycle-consistent training procedure. We apply our method to the challenging task of forecasting heterogeneous responses of multiple cancer cell lines to various drugs, where we observe that by accurately modeling cell proliferation and death, our method yields notable improvements over previous neural optimal transport methods.

1. INTRODUCTION

Modeling change is at the core of various problems in the natural sciences, from dynamical processes driven by natural forces to population trends induced by interventions. In all these cases, the gold standard is to track particles or individuals across time, which allows for immediate estimation of individual (or aggregate) effects. But maintaining these pairwise correspondences across interventions or time is not always possible, for example, when the same sample cannot be measured more than once. This is typical in biomedical sciences, where the process of measuring is often altering or destructive. For example, single-cell biology profiling methods destroy the cells and thus cannot be used repeatedly on the same cell. In these situations, one must rely on comparing different replicas of a population and, absent a natural identification of elements across the populations, infer these correspondences from data in order to model evolution or intervention effects.

The problem of inferring correspondences across unpaired samples in biology has been traditionally tackled by relying on average and aggregate perturbation responses (Green & Pelkmans, 2016; Zhan et al., 2019; Sheldon et al., 2007) or by applying mechanistic or linear models (Yuan et al., 2021; Dixit et al., 2016) in, potentially, a learned latent space (Lotfollahi et al., 2019). Cellular responses to treatments are, however, highly complex and heterogeneous. To effectively predict the drug response of a patient during treatment and capture such cellular heterogeneity, it is necessary to learn nonlinear maps describing such perturbation responses on the level of single cells. Assuming perturbations incrementally alter molecular profiles of cells, such as gene expression or signaling activities, recent approaches have utilized optimal transport to predict changes and alignments (Schiebinger et al., 2019; Bunne et al., 2022a; Tong et al., 2020).
By returning a coupling between control and perturbed cell states, which overall minimizes the cost of matching, optimal transport can solve that puzzle and reconstruct these incremental changes in cell states over time.

Despite the advantages mentioned above, the classic formulation of OT is ill-suited to model processes where the population changes in size, e.g., where elements might be created or destroyed over time. This is the case, for example, in single-cell biology, where interventions of interest typically promote proliferation of certain cells and death of others. Such scenarios violate the assumption of conservation

