NEURAL UNBALANCED OPTIMAL TRANSPORT VIA CYCLE-CONSISTENT SEMI-COUPLINGS

Abstract

Comparing unpaired samples of a distribution or population taken at different points in time is a fundamental task in many application domains where measuring populations is destructive and cannot be done repeatedly on the same sample, such as in single-cell biology. Optimal transport (OT) can solve this challenge by learning an optimal coupling of samples across distributions from unpaired data. However, the usual formulation of OT assumes conservation of mass, which is violated in unbalanced scenarios in which the population size changes (e.g., cell proliferation or death) between measurements. In this work, we introduce NUBOT, a neural unbalanced OT formulation that relies on the formalism of semi-couplings to account for creation and destruction of mass. To estimate such semi-couplings and generalize out-of-sample, we derive an efficient parameterization based on neural optimal transport maps and propose a novel algorithmic scheme through a cycle-consistent training procedure. We apply our method to the challenging task of forecasting heterogeneous responses of multiple cancer cell lines to various drugs, where we observe that by accurately modeling cell proliferation and death, our method yields notable improvements over previous neural optimal transport methods.

1. INTRODUCTION

Modeling change is at the core of various problems in the natural sciences, from dynamical processes driven by natural forces to population trends induced by interventions. In all these cases, the gold standard is to track particles or individuals across time, which allows for immediate estimation of individual (or aggregate) effects. But maintaining these pairwise correspondences across interventions or time is not always possible, for example, when the same sample cannot be measured more than once. This is typical in biomedical sciences, where the process of measuring is often perturbative or destructive. For example, single-cell profiling methods destroy the cells and thus cannot be used repeatedly on the same cell. In these situations, one must rely on comparing different replicas of a population and, absent a natural identification of elements across the populations, infer these correspondences from data in order to model evolution or intervention effects. The problem of inferring correspondences across unpaired samples in biology has traditionally been tackled by relying on average and aggregate perturbation responses (Green & Pelkmans, 2016; Zhan et al., 2019; Sheldon et al., 2007) or by applying mechanistic or linear models (Yuan et al., 2021; Dixit et al., 2016) in, potentially, a learned latent space (Lotfollahi et al., 2019). Cellular responses to treatments are, however, highly complex and heterogeneous. To effectively predict the drug response of a patient during treatment and capture such cellular heterogeneity, it is necessary to learn nonlinear maps describing such perturbation responses at the level of single cells. Assuming perturbations incrementally alter molecular profiles of cells, such as gene expression or signaling activities, recent approaches have utilized optimal transport to predict changes and alignments (Schiebinger et al., 2019; Bunne et al., 2022a; Tong et al., 2020).
By returning a coupling between control and perturbed cell states which overall minimizes the cost of matching, optimal transport can solve that puzzle and reconstruct these incremental changes in cell states over time. Despite the advantages mentioned above, the classic formulation of OT is ill-suited to model processes where the population changes in size, e.g., where elements might be created or destroyed over time. This is the case, for example, in single-cell biology, where interventions of interest typically promote proliferation of certain cells and death of others. Such scenarios violate the assumption of conservation of mass that the classic OT problem relies upon.

Figure 1: a. A semi-coupling pair (γ₁, γ₂) consists of two couplings that together solve the unbalanced OT problem. Intuitively, γ₁ describes where mass goes as it leaves from µ, and γ₂ where it comes from as it arrives in ν. b. NUBOT parameterizes the semi-couplings (γ₁, γ₂) as the composition of reweighting functions η and ζ and the dual potentials f and g of the then-balanced problem.

Relaxing this assumption yields a generalized formulation, known as the unbalanced OT (UBOT) problem, whose properties (Liero et al., 2018; Chizat et al., 2018a) and numerical solution (Chapel et al., 2021) have been studied in recent work, and which has been applied successfully to problems in single-cell biology (Yang & Uhler, 2019). Yet, these methods typically scale poorly with sample size, are prone to unstable solutions, or make limiting assumptions, e.g., only allowing for destruction but not creation of mass. In this work, we address these shortcomings by introducing a novel formulation of the unbalanced OT problem that relies on the formalism of semi-couplings introduced by Chizat et al. (2018b), while still obtaining an explicit transport map that models the transformation between distributions.
The advantage of the latter is that it allows mapping new out-of-sample points, and it provides an interpretable characterization of the underlying change in distribution. Since the unbalanced OT problem does not directly admit a Monge (i.e., mapping-based) formulation, we propose to learn to jointly 're-balance' the two distributions, thereby allowing us to estimate a map between their rescaled versions. To do so, we leverage prior work (Makkuva et al., 2020; Korotin et al., 2021) that learns the transport map as the gradient of a convex dual potential (Brenier, 1987) parameterized via an input convex neural network (Amos et al., 2017). In addition, we derive a simple update rule to learn the rescaling functions. Put together, these components yield a reversible, parameterized, and computationally feasible implementation of the semi-coupling unbalanced OT formulation (Fig. 1). In short, the main contributions of this work are: (i) a novel formulation of the unbalanced optimal transport problem that weaves together the theoretical foundations of semi-couplings with the practical advantage of transport maps; (ii) a general, scalable, and efficient algorithmic implementation of this formulation based on dual potentials parameterized via convex neural network architectures; and (iii) an empirical validation on the challenging task of predicting perturbation responses of single cells to multiple cancer drugs, where our method successfully predicts cell proliferation and death, in addition to faithfully modeling the perturbation responses at the level of single cells.

2. BACKGROUND

2.1 OPTIMAL TRANSPORT

For two probability measures µ, ν ∈ P(X) with X = R^d and a real-valued continuous cost function c ∈ C(X²), the optimal transport problem (Kantorovich, 1942) is defined as

$$\mathrm{OT}(\mu, \nu) := \inf_{\gamma \in \Gamma(\mu,\nu)} \int_{\mathcal{X}^2} c(x, y)\, \gamma(dx, dy), \qquad (1)$$

where $\Gamma(\mu, \nu) = \{\gamma \in \mathcal{M}_+(\mathcal{X}^2) : (\mathrm{Proj}_1)_\sharp \gamma = \mu,\ (\mathrm{Proj}_2)_\sharp \gamma = \nu\}$ is the set of couplings in the cone of nonnegative Radon measures M₊(X²) with respective marginals µ, ν. When instantiated on finite discrete measures, such as $\mu = \sum_{i=1}^n u_i \delta_{x_i}$ and $\nu = \sum_{j=1}^m v_j \delta_{y_j}$ with u ∈ Σₙ, v ∈ Σₘ, this problem translates to a linear program, which can be regularized using an entropy term (Peyré & Cuturi, 2019). For ε ≥ 0, set

$$\mathrm{OT}_\varepsilon(\mu, \nu) := \min_{\mathbf{P} \in U(u,v)} \langle \mathbf{P}, [c(x_i, y_j)]_{ij} \rangle - \varepsilon H(\mathbf{P}), \qquad (2)$$

where $H(\mathbf{P}) := -\sum_{ij} \mathbf{P}_{ij} (\log \mathbf{P}_{ij} - 1)$ and the polytope U(u, v) is the set of matrices $\{\mathbf{P} \in \mathbb{R}_+^{n \times m} : \mathbf{P}\mathbf{1}_m = u,\ \mathbf{P}^\top \mathbf{1}_n = v\}$. For clarity, we will sometimes write OT_ε(u, v, {x_i}, {y_j}). Notice that the definition above reduces to (1) when ε = 0. Setting ε > 0 yields a faster and differentiable proxy to approximate OT and allows fast numerical approximation via the Sinkhorn algorithm (Cuturi, 2013), but introduces a bias, since in general OT_ε(µ, µ) ≠ 0.

Neural optimal transport. To parameterize (1) and allow predicting how a measure evolves from µ to ν, we introduce an alternative formulation known as the Monge problem (1781), given by

$$\mathrm{OT}(\mu, \nu) = \inf_{T : T_\sharp \mu = \nu} \int_{\mathcal{X}} c(x, T(x))\, d\mu(x),$$

with pushforward operator ♯ and the optimal solution T* known as the Monge map between µ and ν. Brenier's theorem (1987) states that this Monge map is necessarily the gradient ∇ψ of a convex potential ψ : X → R such that ∇ψ♯µ = ν, i.e., T*(x) = ∇ψ(x). This connection has far-reaching impact and is a central component of recent neural optimal transport solvers (Makkuva et al., 2020; Bunne et al., 2022c; Alvarez-Melis et al., 2022; Korotin et al., 2020; Bunne et al., 2022b; Fan et al., 2021b).
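The entropic problem OT_ε can be solved with a few lines of matrix scaling. The following is a minimal NumPy sketch of the Sinkhorn algorithm on a toy problem (illustrative only, not the paper's implementation; point positions, ε, and the iteration count are chosen arbitrarily):

```python
import numpy as np

def sinkhorn(u, v, C, eps=1.0, n_iter=2000):
    """Entropy-regularized OT between histograms u, v with cost matrix C.

    Returns the coupling P whose marginals approximate u (rows) and v (cols).
    """
    K = np.exp(-C / eps)                 # Gibbs kernel
    a = np.ones_like(u)
    b = np.ones_like(v)
    for _ in range(n_iter):
        b = v / (K.T @ a)                # fit the column marginal
        a = u / (K @ b)                  # fit the row marginal
    return a[:, None] * K * b[None, :]

# toy example: two point clouds on the real line with uniform weights
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 2.0, 3.0])
C = (x[:, None] - y[None, :]) ** 2       # squared Euclidean cost
u = np.full(3, 1 / 3)
v = np.full(4, 1 / 4)
P = sinkhorn(u, v, C)
assert np.allclose(P.sum(axis=1), u, atol=1e-5)   # (Proj1)# γ = µ
assert np.allclose(P.sum(axis=0), v, atol=1e-5)   # (Proj2)# γ = ν
```

Both marginals are matched up to numerical tolerance; this is exactly the conservation-of-mass constraint that the unbalanced formulation of §2.2 relaxes.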
Instead of (indirectly) learning the Monge map T (Yang & Uhler, 2019; Fan et al., 2021a), it is sufficient to restrict the computational effort to learning a good convex potential ψ_θ, parameterized via an input convex neural network (ICNN) (Amos et al., 2017), s.t. ∇ψ_θ♯µ = ν. Alternatively, parameterizations of such maps can be carried out via the dual formulation of (1) (Santambrogio, 2015, Proposition 1.11, Theorem 1.39), i.e.,

$$\mathrm{OT}(\mu, \nu) = \sup_{\substack{f, g \in C(\mathcal{X}) \\ f \oplus g \le c}} \int f\, d\mu + \int g\, d\nu,$$

where the dual potentials f, g are continuous functions from X to R, and (f ⊕ g)(x, y) := f(x) + g(y). When the cost c is the quadratic Euclidean distance, i.e., c = ∥·∥₂², Makkuva et al. (2020), building on Brenier (1987), derive an approximate min-max optimization scheme parameterizing the duals f, g via two convex functions. The objective thereby reads

$$\mathrm{OT}(\mu, \nu) = \sup_{f\ \mathrm{convex}}\ \inf_{g\ \mathrm{convex}}\ \underbrace{\tfrac{1}{2}\,\mathbb{E}\big[\lVert x\rVert_2^2 + \lVert y\rVert_2^2\big]}_{C_{\mu,\nu}}\ \underbrace{-\ \mathbb{E}_{\mu}[f(x)] - \mathbb{E}_{\nu}\big[\langle y, \nabla g(y)\rangle - f(\nabla g(y))\big]}_{V_{\mu,\nu}(f,g)}.$$

When parameterizing f and g via a pair of ICNNs with parameters θ_f and θ_g, this neural OT scheme allows predicting ν or µ via ∇g_{θ_g}♯µ or ∇f_{θ_f}♯ν, respectively. We further discuss neural primal (Fan et al., 2021a; Yang & Uhler, 2019) and dual approaches (Makkuva et al., 2020; Korotin et al., 2020; Bunne et al., 2021) in §D.2.
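The ICNN construction guarantees convexity in the input by keeping all hidden-to-hidden weights nonnegative and using convex, nondecreasing activations. A toy NumPy forward pass (illustrative only; real ICNNs as in Amos et al. (2017) are trained end-to-end with constrained parameters, which we omit here) makes the convexity argument concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

def icnn_forward(x, Ws, As, bs):
    """Forward pass of a tiny ICNN: z_{k+1} = relu(W_k z_k + A_k x + b_k).

    Convexity in x holds because each W_k is elementwise nonnegative and
    relu is convex and nondecreasing; the output is the sum of convex maps."""
    z = np.maximum(As[0] @ x + bs[0], 0.0)
    for W, A, b in zip(Ws, As[1:], bs[1:]):
        z = np.maximum(W @ z + A @ x + b, 0.0)
    return z.sum()   # scalar potential value

d, h = 2, 8
Ws = [np.abs(rng.normal(size=(h, h)))]            # nonnegative hidden weights
As = [rng.normal(size=(h, d)), rng.normal(size=(h, d))]
bs = [rng.normal(size=h), rng.normal(size=h)]

# numerical check of convexity along a random segment (midpoint inequality)
x0, x1 = rng.normal(size=d), rng.normal(size=d)
mid = icnn_forward(0.5 * (x0 + x1), Ws, As, bs)
assert mid <= 0.5 * (icnn_forward(x0, Ws, As, bs)
                     + icnn_forward(x1, Ws, As, bs)) + 1e-9
```

The transport map itself is then the gradient of such a potential, obtained in practice by automatic differentiation.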

2.2. UNBALANCED OPTIMAL TRANSPORT

A major constraint of problem (1) is its restriction to a pair of probability distributions µ and ν of equal mass. Unbalanced optimal transport (Benamou, 2003; Liero et al., 2018; Chizat et al., 2018b) lifts this requirement and allows a comparison between unnormalized measures, i.e., via

$$\inf_{\gamma \in \mathcal{M}_+(\mathcal{X}^2)} \int_{\mathcal{X}^2} c(x, y)\, \gamma(dx, dy) + \tau_1 D_{f_1}\big((\mathrm{Proj}_1)_\sharp \gamma \,|\, \mu\big) + \tau_2 D_{f_2}\big((\mathrm{Proj}_2)_\sharp \gamma \,|\, \nu\big), \qquad (6)$$

with f-divergences D_{f₁} and D_{f₂} induced by f₁, f₂, and parameters (τ₁, τ₂) controlling how much mass variations are penalized as opposed to transportation of the mass. When introducing an entropy regularization as in (2), the unbalanced OT problem between discrete measures u and v, i.e.,

$$\mathrm{UBOT}(u, v) := \min_{\Gamma \in \mathbb{R}_+^{n \times m}} \langle \Gamma, [c(x_i, y_j)]_{ij} \rangle + \tau_1 D_{f_1}(\Gamma \mathbf{1}_m \,|\, u) + \tau_2 D_{f_2}(\Gamma^\top \mathbf{1}_n \,|\, v) - \varepsilon H(\Gamma), \qquad (7)$$

can be efficiently solved via generalizations of the Sinkhorn algorithm (Chizat et al., 2018a; Cuturi, 2013; Benamou et al., 2015). We describe alternative formulations of the unbalanced OT problem in detail, review recent applications, and provide a broader literature review in the Appendix (§A.1).
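When the marginal penalties are KL divergences, the generalized Sinkhorn scheme simply damps each scaling update by the exponent τ/(τ + ε); as τ → ∞ one recovers balanced Sinkhorn. A minimal NumPy sketch (illustrative; KL penalties, toy data, and all numerical values are assumptions, not the paper's configuration):

```python
import numpy as np

def unbalanced_sinkhorn(u, v, C, eps=0.05, tau1=1.0, tau2=1.0, n_iter=2000):
    """Entropic unbalanced OT with KL marginal penalties.

    The exact marginal fits of balanced Sinkhorn are replaced by damped
    updates a = (u / Kb)^(tau1/(tau1+eps)), and symmetrically for b."""
    K = np.exp(-C / eps)
    a = np.ones_like(u)
    b = np.ones_like(v)
    p1 = tau1 / (tau1 + eps)
    p2 = tau2 / (tau2 + eps)
    for _ in range(n_iter):
        a = (u / (K @ b)) ** p1
        b = (v / (K.T @ a)) ** p2
    return a[:, None] * K * b[None, :]

# mass can now be created or destroyed: source mass 1, target mass 2
x = np.linspace(0, 1, 5)
y = np.linspace(0, 1, 5)
C = (x[:, None] - y[None, :]) ** 2
u = np.full(5, 0.2)
v = np.full(5, 0.4)
G = unbalanced_sinkhorn(u, v, C)

# the transported mass settles strictly between the two total masses
assert u.sum() < G.sum() < v.sum()
```

Note that neither marginal of G coincides exactly with u or v; the KL penalties only pull the coupling toward them, which is exactly the behavior NUBOT later exploits to read off rescaling weights.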

3. A NEURAL UNBALANCED OPTIMAL TRANSPORT MODEL

The method we propose weaves together a rigorous formulation of the unbalanced optimal transport problem based on semi-couplings (introduced below) with a practical and scalable OT mapping estimation method based on an input convex neural network parameterization of the dual OT problem.

Semi-coupling formulation. Chizat et al. (2018b) introduced a class of distances that generalize optimal transport to the unbalanced setting. They introduce equivalent dynamic and static formulations of the problem, the latter of which relies on semi-couplings to allow for variations of mass. A formulation closely related (by a change of variables) to the notion of semi-couplings was independently introduced by Liero et al. (2016; 2018). Semi-couplings are generalizations of couplings whereby only one of the projections coincides with a prescribed measure. Formally, the set of semi-couplings between measures µ and ν is defined as

$$\Gamma(\mu, \nu) \stackrel{\mathrm{def.}}{=} \Big\{ (\gamma_0, \gamma_1) \in \big(\mathcal{M}_+(\mathcal{X}^2)\big)^2 : (\mathrm{Proj}_1)_\sharp \gamma_0 = \mu,\ (\mathrm{Proj}_2)_\sharp \gamma_1 = \nu \Big\}. \qquad (8)$$

With this, the unbalanced Kantorovich OT problem can be written as

$$C_k(\mu, \nu) = \inf_{(\gamma_0, \gamma_1) \in \Gamma(\mu,\nu)} \int c\Big(x, \tfrac{\gamma_0}{\gamma}, y, \tfrac{\gamma_1}{\gamma}\Big)\, d\gamma(x, y),$$

where γ is any joint measure for which γ₀, γ₁ ≪ γ. Although this formulation lends itself to formal theoretical treatment, it has at least two limitations. First, it does not explicitly model a mapping between measures; indeed, no analogue of the celebrated Brenier theorem is known for this setting. Second, deriving a computational implementation of this problem is challenging by the very nature of the semi-couplings: being undetermined along one marginal makes it hard to model the space in (8).

Rebalancing with proxy measures. To turn the semi-coupling formulation of unbalanced OT into a computationally feasible method, we propose to conceptually break the problem into balanced and unbalanced subproblems, each tackling a different aspect of the difference between measures: feature transformation and mass rescaling.
These in turn imply a decomposition of the semi-couplings of (8), as we will show later. Specifically, we seek proxy measures µ̃ and ν̃ with equal mass (i.e., µ̃(X) = ν̃(X)) across which to solve a balanced OT problem through a Monge/Brenier formulation. To decouple measure scaling from feature transformation, we propose to choose µ̃ and ν̃ simply as rescaled versions of µ and ν. Thus, formally, we seek µ̃, ν̃ ∈ M₊(X) and T, S : X → X such that

$$\tilde\mu = \eta \cdot \mu, \qquad \tilde\nu = \zeta \cdot \nu, \qquad T_\sharp \tilde\mu = \tilde\nu, \qquad S_\sharp \tilde\nu = \tilde\mu, \qquad (9)$$

where η, ζ : X → R₊ are scalar fields, η·µ denotes the measure with density η(x) dµ(x) (analogously for ζ·ν), and T, S are a pair of forward/backward optimal transport maps between µ̃ and ν̃ (Fig. 1b). Devising an optimization scheme to find all relevant components in (9) is challenging. In particular, it involves solving an OT problem whose marginals are not fixed, but change as the reweighting functionals η, ζ are updated. We propose an alternating minimization approach, whereby we alternately solve for η, ζ (through an approximate scaling update) and for T, S (through gradient updates on ICNN convex potentials, as described in Section 2.1).

Updating rescaling functions. Given current estimates of η and T, we consider the UBOT problem (6) between T♯(η·µ) = T♯µ̃ and ν. Although in general these two measures will not be balanced (hence the need for UBOT instead of standard OT), our goal is to eventually achieve this. To formalize this, let us use the shorthand notation γ*_UB(α, β) := argmin_γ UBOT(γ; α, β), where UBOT is defined in (7). For a fixed T, our goal is to find η such that (Proj₁)♯[γ*_UB(T♯(η·µ), ν)] = T♯(η·µ), i.e., to rescale µ so that the unbalanced solution would in fact be 'balanced' along that marginal. For the discrete setting (finite samples), this corresponds to finding a vector e ∈ Rⁿ satisfying

$$\sum_{j=1}^{m} [\Gamma]_{ij} = (e \odot u)_i, \quad i = 1, \dots, n, \qquad \text{where } \Gamma = \operatorname{argmin}\, \mathrm{UBOT}(e \odot u, \{T(x_i)\}, v, \{y_j\}). \qquad (10)$$
For a fixed T, the vector e* satisfying this system can be found via a fixed-point iteration. In practice, we instead approximate it with a single-step update using the solution of the unscaled problem:

$$\Gamma \leftarrow \operatorname{argmin}\, \mathrm{UBOT}(u, \{T(x_i)\}, v, \{y_j\}); \qquad e \leftarrow \Gamma \mathbf{1} \oslash u, \qquad (11)$$

which empirically provides a good approximation of the optimal e* but is significantly more efficient. Apart from requiring a single update, whenever u and v are uniform (as in most applications, where the samples are assumed to be drawn i.i.d.), solving this problem between unscaled histograms is faster and more stable than solving its scaled (and therefore likely non-uniform) counterpart in (10). Analogously, for a given S, we choose ζ to ensure (Proj₂)♯[γ*_UB(S♯(ζ·ν), µ)] = S♯(ζ·ν). For empirical measures, this yields the update:

$$\Gamma \leftarrow \operatorname{argmin}\, \mathrm{UBOT}(v, \{S(y_j)\}, u, \{x_i\}); \qquad z \leftarrow \Gamma \mathbf{1} \oslash v. \qquad (12)$$

(In Algorithm 1, the corresponding steps read: Γ₁ and Γ₂ are obtained from unbalanced Sinkhorn runs between the mapped and observed samples; the normalized weights are e_i ← (Σ_j [Γ₁]_{ij} / Σ_{ij} [Γ₁]_{ij}) · n and z_j ← (Σ_i [Γ₂]_{ij} / Σ_{ij} [Γ₂]_{ij}) · m; the objectives are J(θ_g, θ_f) = (1/n) Σ_i e_i [f(∇g(x_i)) − ⟨x_i, ∇g(x_i)⟩] − (1/m) Σ_j z_j f(y_j), L_η(θ_η) = MSE(e, η(x)), and L_ζ(θ_ζ) = MSE(z, ζ(y)); θ_g is updated to minimize J, θ_η to minimize L_η, θ_ζ to minimize L_ζ, and θ_f to maximize J.)

In order to predict mass changes for new samples, we use the discrete e, z to fit continuous versions of η, ζ via functions parameterized as neural networks, trained to achieve η(x_i) ≈ e_i for all i ∈ {1, ..., n} and ζ(y_j) ≈ z_j for all j ∈ {1, ..., m} through a mean squared error loss.

Updating mappings. Since η, ζ are tasked with modeling all mass rescalings, for fixed rescaled measures µ̃ and ν̃ we can model the transformation between them with a usual (balanced) OT formulation, whereby we seek T and S as a pair of optimal (deterministic) OT maps between them. In particular, we use the formulation of Makkuva et al. (2020) to fit them.
That is, T = ∇g and S = ∇f for convex potentials f and g, parameterized as ICNNs with parameters θ_f and θ_g. The corresponding objective for these two potentials is

$$\mathcal{L}(f, g) = \mathbb{E}_{x \sim \tilde\mu}\big[f(\nabla g(x)) - \langle x, \nabla g(x)\rangle\big] - \mathbb{E}_{y \sim \tilde\nu}[f(y)] = \int_{\mathcal{X}} \big(f(\nabla g(x)) - \langle x, \nabla g(x)\rangle\big)\, \eta(x)\, d\mu(x) - \int_{\mathcal{X}} f(y)\, \zeta(y)\, d\nu(y).$$

In the finite-sample setting, this objective becomes

$$\mathcal{L}(f, g) = \frac{1}{n} \sum_{i=1}^{n} e_i \big[f(\nabla g(x_i)) - \langle x_i, \nabla g(x_i)\rangle\big] - \frac{1}{m} \sum_{j=1}^{m} z_j f(y_j).$$

The optimization procedure is summarized in Algorithm 1.

Transforming new samples. After learning f, g, η, ζ, we can use these functions to transform (map and rescale) new samples, i.e., beyond those used for optimization. For a given source datapoint x with mass u, we transform it as (x, u) → (∇g(x), η(x) · u · ζ(∇g(x))⁻¹). Analogously, target points can be mapped back to the source domain using (y, v) → (∇f(y), ζ(y) · v · η(∇f(y))⁻¹).

Recovering semi-couplings. Let Γ̃₁ := diag(e⁻¹) Γ₁ and Γ̃₂ := diag(z⁻¹) Γ₂, where Γ₁, Γ₂ are the solutions of the UBOT problems computed in Algorithm 1 (lines 7 and 9, respectively). It is easy to see that (Γ̃₁, Γ̃₂ᵀ) is a valid pair of semi-couplings between µ and ν (cf. Eq. 8).

4. EXPERIMENTS

Baselines. To put NUBOT's performance into perspective, we compare it to several baselines. First, we consider CELLOT (Bunne et al., 2021), a balanced neural optimal transport method based on the neural dual formulation of Makkuva et al. (2020).
Further, we benchmark NUBOT against the current state of the art, UBOT GAN, an unbalanced OT formulation proposed by Yang & Uhler (2019), which simultaneously learns a transport map and a scaling factor for each source point in order to account for variation of mass. Additionally, we consider two naive baselines: IDENTITY, which simulates the identity matching and models cell behavior in the absence of a perturbation, and OBSERVED, a random permutation of the observed target samples and thus a lower bound when comparing predictions to observed cells on the distributional level. Further, we consider DISCRETE OT, the entropy-regularized Wasserstein mapping returned by the Sinkhorn algorithm on finite source and target samples. As this method does not parameterize the transport map, the test cells must be included when computing the optimal coupling, so it cannot be considered an out-of-sample method. GAUSSIAN APPROX computes a Gaussian approximation of the source and target samples separately, and uses the closed-form solution of the entropy-regularized optimal transport problem between unbalanced Gaussians (Janati et al., 2020b) for mapping. More details can be found in the Appendix, §D.2.
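The GAUSSIAN APPROX baseline exploits the fact that OT between Gaussians is available in closed form. The balanced, unregularized case below illustrates the idea via the Bures-Wasserstein distance (the baseline itself uses the unbalanced entropic variant of Janati et al. (2020b), which adds further correction terms not shown here):

```python
import numpy as np

def sqrtm_psd(A):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.T

def bures_wasserstein2(m1, S1, m2, S2):
    """Squared 2-Wasserstein distance between N(m1, S1) and N(m2, S2)."""
    r = sqrtm_psd(S1)
    cross = sqrtm_psd(r @ S2 @ r)
    return np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2 * cross)

m = np.zeros(2)
S = np.eye(2)
assert abs(bures_wasserstein2(m, S, m, S)) < 1e-10               # identical Gaussians
assert abs(bures_wasserstein2(m, S, m + 1.0, S) - 2.0) < 1e-10   # pure mean shift
```

In the same Gaussian setting, the optimal Monge map is affine, which is what makes this baseline a closed-form (if coarse) reference point for the neural methods.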

4.1. SYNTHETIC DATA

Populations are often heterogeneous and consist of different subpopulations. To simulate heterogeneous intervention responses that exhibit changes in particle counts, we generate a dataset containing a two-dimensional mixture of Gaussians with three clusters in the source distribution µ. The target distribution ν consists of the same three clusters, but with different cluster proportions. Further, each particle has undergone a constant shift in space upon intervention. We consider three scenarios with increasing imbalance between the three clusters (see Fig. 2a-c). We evaluate NUBOT on the task of predicting the distributional shift from source to target, while at the same time correctly rescaling the clusters such that no mass is transported across non-corresponding clusters. Results. The results (setup, predicted Monge maps, and weights) are displayed in Fig. 2. Both NUBOT and UBOT GAN correctly map the points to the corresponding target clusters without transporting mass across clusters. NUBOT also accurately models the change in cluster sizes by predicting the correct weights for each point. In contrast, UBOT GAN only captures the general trend of cluster growth and shrinkage, but does not learn the exact weights required to re-weight the cluster proportions appropriately. The exact setup and calculation of weights can be found in §B (see Table 1), as well as an evaluation of the robustness of NUBOT w.r.t. several hyperparameters (see Fig. 7) and a comparison to UBOT (see Fig. 8, 9).
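A setup of this kind can be generated in a few lines. The cluster centers, shift, and proportions below are assumed for illustration only; the exact values used in the experiments are given in §B:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mixture(n, centers, proportions, scale=0.1):
    """Sample a 2-D Gaussian mixture with the given cluster proportions."""
    labels = rng.choice(len(centers), size=n, p=proportions)
    return centers[labels] + scale * rng.normal(size=(n, 2)), labels

centers = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 0.0]])  # assumed centers
shift = np.array([0.0, 2.0])                              # assumed constant shift

# source: balanced clusters; target: same clusters, shifted, new proportions
src, _ = make_mixture(600, centers, [1 / 3, 1 / 3, 1 / 3])
tgt, tgt_labels = make_mixture(600, centers + shift, [0.6, 0.3, 0.1])

counts = np.bincount(tgt_labels, minlength=3)
assert counts[0] > counts[1] > counts[2]   # cluster proportions changed
```

Because only the proportions differ between source and target, a method solving this task must both learn the constant shift and reweight each cluster, without leaking mass across clusters.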

4.2. SINGLE-CELL PERTURBATION RESPONSES

Figure 3: Distributional fit of the predicted perturbed cell states to the observed perturbed cell states for each drug and timestep (8h and 24h), measured by a weighted version of kernel MMD on a set of held-out control cells, for NUBOT, UBOT GAN, CELLOT, DISCRETE OT, GAUSSIAN APPROX, IDENTITY, and OBSERVED. For NUBOT and UBOT GAN, MMD is weighted by the predicted weights, while for the other baselines it is computed with uniform weights. OBSERVED corresponds to a random permutation of the observed control cells, i.e., its distribution is approximately the same as the observed cells.

Through the measurement of genomic, transcriptomic, proteomic, or phenotypic profiles of cells, and the identification of different cell types and cellular states based on such measurements, biologists have revealed perturbation responses and response mechanisms which would have remained obscured in bulk analysis approaches (Green & Pelkmans, 2016; Liberali et al., 2014; Kramer et al., 2022). However, single-cell measurements typically require the destruction of the cells in the course of recording. Thus, each measurement provides us only with a snapshot of the cell population, i.e., samples of a probability distribution that evolves over the course of the perturbation, from control µ (source) to perturbed cell states ν (target). Using NUBOT and the considered baselines, we learn a map T that reconstructs how individual cells respond to a treatment. The effect of a single perturbation frequently varies depending on the cell type or cell state, and may include the induction of cell death or proliferation. In the following, we evaluate whether NUBOT is able to capture and predict, through η and ζ, heterogeneous proliferation and cell death rates of two co-cultured melanoma cell lines in response to 25 drug treatments. The single-cell measurements used for this task were generated using the imaging technology 4i (Gut et al., 2018) over the course of 24 hours, resulting in three different unaligned snapshots (t = 0h, t = 8h, and t = 24h) for each of the drug treatments. The control cells, i.e., the source distribution µ, consist of cells taken from a mixture of melanoma cell lines at t = 0h that are exposed to dimethyl sulfoxide (DMSO) as a vehicle control. Further, we consider two different target populations ν capturing the perturbed populations after t = 8h and t = 24h of treatment, respectively. As both cancer cell lines exhibit different sensitivities to the drugs (Raaijmakers et al., 2015), their proportion (Fig. 15) as well as the total cell counts (Fig. 17) vary over the time points.
Both cell lines are characterized by the expression of mutually exclusive protein markers, i.e., one cell line strongly expresses a set of proteins detected by an antibody called MelA (MelA + cell type), while the other is characterized by high levels of the protein Sox9 (Sox9 + cell type). An evaluation of this cell line annotation can be found in Fig. 14 (8h) and Fig. 16 (24h) . As no ground truth matching is available, we use insights from the number of cells after 8 and 24 hours of treatment (Fig. 15 , 17), as well as the cell type annotation for each cell to further evaluate NUBOT's performance. A detailed description of the dataset can be found in § C.2. Results. We split the dataset into a train and test set and train NUBOT as well as the baselines on unaligned unperturbed (control) and perturbed cell populations for each drug. During evaluation, we then predict out-of-sample the perturbed cell state from held-out control cells. Details on the network architecture and hyperparameters can be found in § D.3. NUBOT and UBOT GAN additionally predict the weight associated with the perturbed predicted cells, giving insights into which cells have proliferated or died in response to the drug treatments. First, we compare how well each method fits the observed perturbed cells on the level of the entire distribution. For this, we measure the weighted version of kernel maximum mean discrepancy (MMD) between predictions and observations. More details on the evaluation metrics can be found in § D.1. The results are displayed in Fig. 3 . We additionally report the weighted Wasserstein distance in Fig. 10 in Appendix §B. NUBOT outperforms all baselines in almost all drug perturbations, showing its effectiveness in predicting OT maps and local variation in mass.
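The weighted kernel MMD used for this evaluation can be sketched as follows; the Gaussian kernel, the bandwidth `gamma`, and the function name are illustrative assumptions rather than the exact evaluation code.

```python
import numpy as np

def weighted_mmd(x, y, w_x=None, w_y=None, gamma=1.0):
    """Weighted kernel MMD between samples x (n, d) and y (m, d).

    w_x, w_y are non-negative sample weights (e.g., predicted mass
    weights); uniform weights recover the standard biased MMD estimate.
    A Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2) is assumed.
    """
    def kernel(a, b):
        sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)

    n, m = len(x), len(y)
    w_x = np.full(n, 1.0 / n) if w_x is None else w_x / w_x.sum()
    w_y = np.full(m, 1.0 / m) if w_y is None else w_y / w_y.sum()
    # MMD^2 = E[k(x, x')] - 2 E[k(x, y)] + E[k(y, y')] under the
    # weighted empirical measures.
    mmd2 = (w_x @ kernel(x, x) @ w_x
            - 2.0 * w_x @ kernel(x, y) @ w_y
            + w_y @ kernel(y, y) @ w_y)
    return np.sqrt(max(mmd2, 0.0))
```

Passing the normalized predicted weights as `w_x` yields the weighted variant reported for NUBOT and UBOT GAN, while `w_x=None` corresponds to the uniform-weight evaluation of the other baselines.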


In the absence of a ground truth and, in particular, given our inability to measure (i.e., observe) cells which have died upon treatment, we are required to base further analysis of NUBOT's predictions on changes in cell count for each subpopulation (MelA+, Sox9+). Fig. 15 clearly shows that drug treatments lead to substantially different cell numbers for each of the subpopulations compared to control. For example, Ulixertinib leads to the proliferation of both subpopulations after 8h, but to pronounced cell death in Sox9+ and strong proliferation in MelA+ cells after 24h. We thus expect the weights predicted by NUBOT for all drugs to correlate with the change in cell counts for each cell type (here measured as population fractions). This is indeed the case: Fig. 4 shows a high correlation between observed cell counts of the two cell types and the sum of the predicted weights of the respective cell types after 8h of treatment for all drugs. After 24 hours, treatment-induced cell death (in at least one cell type) by some drugs can be so severe that the number of observed perturbed cells becomes too low for accurate predictions and the evaluation of the task (Fig. 17). Further, we find that drugs influence the abundance of the cell line markers MelA and Sox9, complicating cell type classification (see Fig. 14, 16). We exclude drugs falling into these categories and find that, whilst the correlation between predicted weights and observed cell counts is reduced after 24h (see Fig. 4a), NUBOT still captures the overall trend. The data further provides insights into biological processes such as apoptosis, a form of programmed cell death induced by enzymes called caspases (ClCasp3). While dead cells become invisible in the cell state space (they cannot be measured), dying cells are still present in the observed perturbed sample and can be recognized by high levels of ClCasp3 (the apoptosis marker). Conversely, the protein Ki67 marks proliferating cells.
Analyzing the correlation of ClCasp3 and Ki67 intensity with the predicted weights provides an additional assessment of the biological meaningfulness of our results. For example, upon Ulixertinib treatment, the absolute cell counts show an increase in MelA+ cells and a decline of Sox9+ cells at 24h (Fig. 15). Fig. 5 shows UMAP projections of the control cells at both time points, colored by the observed and predicted protein marker values and the predicted weights. At 8h, NUBOT predicts only little change in mass, but a few proliferative cells with high weights in areas marked by high values of the proliferation marker Ki67. At 24h, our model predicts cell death in the Sox9+ (MelA-) cell type and proliferation in the MelA+ cell type, which matches the observed changes in cell counts per cell type, seen in Fig. 15 in § B. We identify similar results for Trametinib (Fig. 11), Ixazomib (Fig. 12), and Vindesine (Fig. 13), which can be found in § B. These experiments thus demonstrate that NUBOT accurately predicts heterogeneous drug responses at the single-cell level, capturing both cell proliferation and death.

5. CONCLUSION

This work presents a novel formulation of the unbalanced optimal transport problem that bridges two previously disjoint perspectives on the topic: a theoretical one based on semi-couplings and a practical one based on recent neural estimation of OT maps. The resulting algorithm, NUBOT, is scalable, efficient, and robust, and it is effective at modeling processes that involve population growth or death, as demonstrated through various experimental results on both synthetic and real data. On the challenging single-cell perturbation task, NUBOT successfully predicts perturbed cell states while explicitly modeling death and proliferation. Explicitly modeling proliferation and death at the single-cell level as part of the drug response allows linking cellular properties observed prior to drug treatment to therapy outcomes. Thus, the application of NUBOT in the fields of drug discovery and personalized medicine could have great implications, as it allows identifying cellular properties predictive of drug efficacy.

APPENDIX A RELATED WORK

In the following, we provide further information and review related literature on concepts discussed throughout this work.

A.1 UNBALANCED OPTIMAL TRANSPORT

Unbalanced optimal transport is a generalization of the classical OT formulation (1) and, as such, allows mass to be created and destroyed throughout the transport. This relaxation has found recent use cases in various domains ranging from biology (Schiebinger et al., 2019; Yang & Uhler, 2019), imaging (Lee et al., 2019), shape registration (Bonneel & Coeurjolly, 2019), domain adaptation (Fatras et al., 2021), and positive-unlabeled learning (Chapel et al., 2020), to general machine learning (Janati et al., 2020a; Frogner et al., 2015). Problem (6) provides a general framework for the unbalanced optimal transport problem, which can recover related notions introduced in the literature: choosing the Kullback-Leibler divergence for D_f, one recovers the so-called squared Hellinger distance; with D_f the ℓ2 norm, we arrive at Benamou (2003); and an ℓ1 norm retrieves a concept often referred to as robust OT (Mukherjee et al., 2020), with connections to a concept known as partial OT (Figalli, 2010). The latter comprises approaches which do not rely on a relaxation of the marginal constraints as in (6). In particular, some strategies of partial OT expand the original problem by adding virtual mass to the marginals (Pele & Werman, 2009; Caffarelli & McCann, 2010; Gramfort et al., 2015), or by extending the OT map by dummy rows and columns (Sarlin et al., 2020) onto which excess mass can be transported. The semi-coupling formulation (also known as the Wasserstein-Fisher-Rao distance) was independently proposed by Liero et al. (2016; 2018) and has recently been connected to the shape analysis of surfaces (Bauer et al., 2022).
Recent work has furthermore developed alternative computational schemes (Chapel et al., 2021; Séjourné et al., 2022b) as well as provided a computational complexity analysis (Pham et al., 2020) of the generalized Sinkhorn algorithm solving entropy-regularized unbalanced OT (Chizat et al., 2018a). Besides Yang & Uhler (2019), these approaches neither provide parameterizations of the unbalanced problem nor allow for the out-of-sample generalization which we consider in this work. The notion of unbalanced optimal transport has been further generalized to the multi-marginal setting (Beier et al., 2022) as well as to Gromov-Wasserstein (Séjourné et al., 2021). Janati et al. (2020b) further provide a closed-form solution of entropic optimal transport between unbalanced Gaussian measures. For a complete review, we refer the reader to Séjourné et al. (2022a) as well as (Peyré & Cuturi, 2019, Chapter 10.2). It becomes evident from the discussion above that unbalanced optimal transport is better thought of as a family of problems rather than a single specific one. Unbalanced OT arises in any setting where one seeks a notion of distance or correspondence across datasets/populations and (i.) the populations are not normalized in the same way and/or (ii.) one does not want to rigidly enforce full correspondence between the datasets (e.g., to minimize the effect of outliers or because there is reason to believe some datapoints do not occur in both datasets). This work proposes a method to parameterize a UBOT problem which does not necessarily match the typical Kantorovich entropy-regularized formulation. We seek a solution that learns which part of the distribution shrinks and which one exhibits overall growth, motivated by the heterogeneous subpopulation structures present in single-cell biology. To that end, NUBOT aims at modeling birth and death dynamics throughout the distribution shift recovered through an optimal transport mapping.

A.2 CYCLE-CONSISTENT LEARNING

The principle of cycle-consistency has been widely used for learning bi-directional transformations between two domains of interest. Cycle-consistency thereby assumes that the forward and backward mappings are roughly inverses of each other. In particular, given unaligned points x ∈ X and y ∈ Y, as well as maps g : X → Y and f : Y → X, cycle-consistency reconstruction losses penalize ∥x − f(g(x))∥ as well as ∥y − g(f(y))∥ using some notion of distance ∥·∥, assuming that there exists such a ground truth bijection with g = f^{-1} and f = g^{-1}. The advantage of validating good matches by cycling between unpaired samples becomes evident through the numerous use cases to which cycle-consistency has been applied: originally introduced within the field of computer vision (Kalal et al., 2010) and applied to image-to-image translation tasks (Zhu et al., 2017a), it has been quickly adapted to multi-modal problems (Zhu et al., 2017b), domain adaptation (Hoffman et al., 2018), and natural language processing (Shen et al., 2017). The original principle has been further generalized to settings requiring a many-to-one or surjective mapping between domains (Guo et al., 2021) via conditional variational autoencoders, to dynamic notions of cycle-consistency (Zhang et al., 2021), and to time-varying applications (Dwibedi et al., 2019). These classical approaches enforce cycle-consistency by explicitly composing both maps and penalizing any deviation from this bijection. In this work, we treat cycle-consistency differently. It is enforced implicitly by coupling the two distributions of interest through a sequence of reversible transformations: re-weighting, transforming, and re-weighting (Eq. (9) and Fig. 1). Similarly to our work, Zhang et al. (2022) and Hur et al. (2021) establish a notion of cycle-consistency (reversibility) for a pair of pushforward operators to align two unpaired measures.
Both methods rely on the Gromov-Monge distance (Mémoli & Needham, 2022), a divergence to compare probability distributions defined on different ambient spaces X and Y, a setting not considered in this work. They proceed by defining a reversible metric through replacing the single Monge map by a pair of two Monge maps, i.e., f : X → Y and g : Y → X, minimizing the objective

GM(µ, ν) := inf_{f : X → Y, f♯µ = ν; g : Y → X, g♯ν = µ} ∆^p_X(f; µ) + ∆^p_Y(g; ν) + ∆^p_{X,Y}(f, g; µ, ν),    (14)

where

∆^p_X(f; µ) = (E[|c_X(x, x′) − c_Y(f(x), f(x′))|^p])^{1/p},
∆^p_Y(g; ν) = (E[|c_Y(y, y′) − c_X(g(y), g(y′))|^p])^{1/p},
∆^p_{X,Y}(f, g; µ, ν) = (E[|c_X(x, g(y)) − c_Y(f(x), y)|^p])^{1/p}.

Problem (14) shows similarities to the classical cycle-consistency objective of Zhu et al. (2017a), where cycle-consistency is indirectly enforced through ∆^p_{X,Y}. Zhang et al. (2022) parameterize both Monge maps through neural networks in a similar fashion as done in (Yang & Uhler, 2019; Fan et al., 2021a). Our approach differs from Zhang et al. (2022); Hur et al. (2021) as we model the problem through a single Monge map with duals f, g, allowing us to map back-and-forth between measures µ and ν, and using a different parametrization approach (ICNNs). More importantly, the approaches presented by Zhang et al. (2022); Hur et al. (2021) do not generalize to the unbalanced case. While Zhang et al. (2022) proposed an unbalanced version of (14) by relaxing the marginals as done in Chizat et al. (2018a), they require the unbalanced sample sizes to be known (i.e., n and m need to be fixed). In our application of interest, particle counts of the target population are, however, not known a priori.
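The classical explicit cycle-consistency penalty discussed above can be sketched as a squared-error reconstruction loss; the concrete maps and the squared Euclidean distance below are illustrative choices, not the objective of any specific cited method.

```python
import numpy as np

def cycle_consistency_loss(g, f, xs, ys):
    """Classical explicit cycle-consistency penalty.

    g: X -> Y and f: Y -> X are (learned) maps; the loss penalizes
    ||x - f(g(x))||^2 and ||y - g(f(y))||^2, i.e., any deviation of
    f and g from being mutual inverses. xs: (n, d_x), ys: (m, d_y).
    """
    loss_x = np.mean(np.sum((xs - f(g(xs))) ** 2, axis=-1))
    loss_y = np.mean(np.sum((ys - g(f(ys))) ** 2, axis=-1))
    return loss_x + loss_y
```

The loss vanishes exactly when f = g^{-1} on the supports of the two samples, which is the bijectivity assumption the classical formulation encodes.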

A.3 CONVEX NEURAL ARCHITECTURES

Input convex neural networks (Amos et al., 2017) are a class of neural networks that approximate the family of convex functions ψ with parameters θ, i.e., whose outputs ψ_θ(x) are convex w.r.t. an input x. This property is realized by placing certain constraints on the network's parameters θ. More specifically, an ICNN is an L-layer feed-forward neural network, where each layer l ∈ {0, ..., L−1} is given by

z_{l+1} = σ_l(W^x_l x + W^z_l z_l + b_l) and ψ_θ(x) = z_L,

where the σ_l are convex non-decreasing activation functions, and θ = {W^x_l, W^z_l, b_l}_{l=0}^{L−1} is the set of parameters, with all entries of W^z_l being non-negative and the convention that z_0 and W^z_0 are 0. As mentioned above and through the connection established in § 2, convex neural networks have been utilized to approximate the Monge map T (3) via the convex Brenier potential ψ connected to the primal and dual optimal transport problems. In particular, they have been used to model convex dual functions (Makkuva et al., 2020) as well as normalizing flows derived from convex potentials (Huang et al., 2021). The expressivity and universal approximation properties of ICNNs have been further studied by Chen et al. (2019), who show that any convex function over a compact convex domain can be approximated in sup norm by an ICNN. To improve the convergence and robustness of ICNNs, which are known to be notoriously difficult to train (Richter-Powell et al., 2021), different initialization schemes have been proposed: Bunne et al. (2022b) derive two initialization schemes ensuring that upon initialization ∇ψ mimics an affine Monge map T, mapping either the source measure onto itself or onto the target measure.

Table 1: Setup of the synthetic mixture of Gaussians dataset, showing the proportions of the three clusters in source and target distribution in three different settings (a., b., c.) as well as the required scaling factor per cluster needed to match the target without transporting points to non-corresponding clusters. The last two columns show the mean weights obtained by NUBOT and UBOT GAN.

Table 1 shows the shares of the three clusters in the source and target distributions. In order to match the target distribution without transporting mass across non-corresponding clusters, the clusters have to be re-scaled with the factors presented in the column 'True Scaling Factor'. The last two columns show the mean weights per cluster obtained by NUBOT and UBOT GAN, respectively. UBOT GAN captures only the general trend in growth and shrinkage; the exact weights do not scale the cluster proportions appropriately. In contrast, the weights obtained by NUBOT match the required scaling factors very closely. Fig. 6 shows the weighted MMD between the source distribution and the target distribution, confirming the superior performance of NUBOT. In the following, we evaluate NUBOT's sensitivity to hyperparameter choices. For this, we screen various parameter ranges of the batch size (bs) as well as hyperparameters of the generalized Sinkhorn algorithm (Chizat et al., 2018a) used to solve the UBOT problem (see Algorithm 1, l. 5 and 7), i.e., the entropy regularization parameter (reg) and the relaxation penalty (reg_m). All considered hyperparameters can be found in the legend of Fig. 7. The base model thereby denotes the final hyperparameters chosen for all experiments conducted in this work. We conduct this analysis on the synthetic data setup introduced in § 4.1 and Figure 2a and b, i.e., a mixture of three Gaussians exhibiting different changes in particle count with known ground truth. We compare the obtained weights of each hyperparameter configuration for each setup (a. and b.). The results displayed in Fig. 7 (left panels) demonstrate that the correct weights are robustly learned for all (in setting a.) and most (in setting b.) hyperparameter choices. In setup b., the weights deviate in particular for high values of reg_m (≥ 0.5). Fig. 7 (right panels) further shows the correlation between the obtained weights of the base model (i.e., the parameters chosen for all experiments) and all other hyperparameter configurations.
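The ICNN forward pass described in § A.3 can be sketched in a few lines of numpy; the softplus activations, layer sizes, and clipping-based non-negativity enforcement are illustrative choices, not the exact architecture used in this work.

```python
import numpy as np

def icnn_forward(x, Wx, Wz, b):
    """Forward pass of an input convex neural network (ICNN).

    Implements z_{l+1} = sigma_l(Wx[l] @ x + Wz[l] @ z_l + b[l]) with
    the entries of every Wz[l] constrained to be non-negative and
    sigma_l convex and non-decreasing (softplus here, illustratively;
    the last layer is kept linear). These constraints make the scalar
    output convex in the input x.
    """
    softplus = lambda t: np.logaddexp(0.0, t)
    z = np.zeros(Wz[0].shape[1])  # convention: z_0 = 0 (and W^z_0 = 0)
    for l in range(len(Wx)):
        Wz_pos = np.maximum(Wz[l], 0.0)  # enforce non-negativity
        pre = Wx[l] @ x + Wz_pos @ z + b[l]
        z = pre if l == len(Wx) - 1 else softplus(pre)
    return z
```

In training, the non-negativity of the W^z_l entries is typically maintained by clipping or re-parameterization after each gradient step; convexity can be checked numerically via the midpoint inequality ψ(t x1 + (1−t) x2) ≤ t ψ(x1) + (1−t) ψ(x2).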


Figure 7: Hyperparameter screen of the NUBOT model. We screen various hyperparameter configurations differing from the base model (the one used for the experiments above) and evaluate them on different synthetic data settings (a. and b.). (Left panels) Distribution of learned weights obtained for each hyperparameter setting and (right panels) correlation of the weights for all hyperparameter configurations compared to the base setup.

B.1.2 COMPARISON TO UBOT

NUBOT provides a parameterization of an unbalanced optimal transport problem. At each training step, NUBOT executes the generalized Sinkhorn algorithm (Chizat et al., 2018a; Cuturi, 2013; Benamou et al., 2015) as a subroutine to obtain estimates of the weights η and ζ. To compare the solution of NUBOT to the mapping computed by the UBOT problem in (7), we further analyze the obtained solutions and hyperparameter sensitivities of UBOT itself, i.e., not integrated within NUBOT. In particular, we compare different relaxation penalties τ = τ_1 = τ_2 (see (7)) and different choices of the entropy regularization ϵ. We consider the same synthetic data settings (a., b.) of Gaussian mixtures, as introduced in § 4 and shown in Fig. 2, where the three clusters grow/shrink at different, known rates. We show the couplings and weights obtained from UBOT in Fig. 8 (setting a.) and Fig. 9 (setting b.) for different regularization (ϵ) and relaxation penalty (τ) parameters. The weight per source point is thereby computed by summing over the columns of the coupling matrix. To obtain a weight value relative to all other weights, we additionally normalize these weights by the total sum of the coupling matrix. In most cases, UBOT couples points only between corresponding clusters. As τ increases for ϵ = 0.05, mass variation is penalized more strongly, and therefore mass has to be coupled between non-corresponding clusters as well in order to fit the marginals. The non-normalized weights exhibit a significantly higher variation between different values of τ, especially for ϵ = 0.05. Comparing the average weights per cluster to the true scalings, which are shown in Table 1 and Fig. 2, we observe that both the non-normalized and normalized weights match these values closely for τ ∈ {1, 10} and ϵ = 0.005. For ϵ = 0.05 and small values of τ, the coupling still only maps points between corresponding clusters.
The weights, however, do not match the true underlying scalings.

Figures 8 and 9: Couplings and weights obtained from UBOT for different values of the entropy regularization ϵ and the relaxation penalty τ (settings a. and b., respectively). For each parameter setup, we show the coupling from source (red) to target (blue), where the strength of each line is proportional to its value in the coupling matrix (left pane), the weights (middle pane), and the normalized weights (right pane). The weight per point is computed as the sum over the columns of the coupling matrix. We additionally show the average weight per cluster.
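The UBOT solutions analyzed above can be reproduced with a minimal sketch of the generalized Sinkhorn scaling iterations for entropy-regularized unbalanced OT with KL marginal penalties (Chizat et al., 2018a), together with the per-source-point weights computed by summing over the columns of the coupling; the fixed iteration count and parameter defaults are illustrative assumptions.

```python
import numpy as np

def unbalanced_sinkhorn(a, b, M, eps=0.05, tau=1.0, n_iter=500):
    """Generalized Sinkhorn for entropic unbalanced OT (KL relaxation).

    a, b: (possibly unnormalized) source/target masses; M: cost matrix.
    Both marginal constraints are relaxed with KL penalties of strength
    tau, yielding the scaling exponent tau / (tau + eps); tau -> inf
    recovers the balanced Sinkhorn updates. Returns the coupling P.
    """
    K = np.exp(-M / eps)
    fi = tau / (tau + eps)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = (a / (K @ v)) ** fi
        v = (b / (K.T @ u)) ** fi
    return u[:, None] * K * v[None, :]

def source_weights(P, normalize=False):
    """Weight per source point: sum over the columns (i.e., each row)
    of the coupling, optionally normalized by the total mass of P."""
    w = P.sum(axis=1)
    return w / P.sum() if normalize else w
```

Clusters that shrink or grow then show up as source weights below or above their input mass, which is the quantity compared against the true scaling factors in Table 1.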

B.2 SINGLE-CELL PERTURBATION RESPONSES

In addition to the weighted MMD metric shown in Fig. 3, we evaluate our method on another distributional metric, the weighted Wasserstein distance between the predicted perturbed and observed perturbed cells. We compute it with the Sinkhorn algorithm, whereby for NUBOT and UBOT GAN, we pass the normalized predicted weights as source weights. Results are shown in Fig. 10. As we lack ground truth for the correspondence of control and perturbed cells, we assess the biological meaningfulness of our predictions by comparing the weights to ClCasp3 and Ki67 intensity, the apoptosis and proliferation markers, respectively. Figures 11, 12 and 13 show UMAP projections computed on control cells for the drugs Trametinib, Ixazomib, and Vindesine. In Figure 12 c., d., and Figure 13 c., d., regions of low predicted weights accurately correspond to regions of increased ClCasp3 intensity. Additionally, we compare predicted weights between the two cell types and contrast them with observed cell counts.
c K 9 g P a U D b b T b t 0 d x N 2 J 0 I J / Q t e P C j i 1 T / k z X 9 j 0 u a g r Q 8 G H u / N M D M v i K W w 6 L r f T m l t f W N z q 7 x d 2 d n d 2 z + o H h 6 1 b Z Q Y x l s s k p H p B t R y K T R v o U D J u 7 H h V A W S d 4 L J X e 5 3 n r i x I t K P O I 2 5 r + h I i 1 A w i r n U R 5 o M q j W 3 7 s 5 B V o l X k B o U a A 6 q X / 1 h x B L F N T J J r e 1 5 b o x + S g 0 K J v m s 0 k 8 s j y m b 0 B H v Z V R T x a 2 f z m + d k b N M G Z I w M l l p J H P 1 9 0 R K l b V T F W S d i u L Y L n u 5 + J / X S z C 8 8 V O h 4 w S 5 Z o t F Y S I J R i R / n A y F 4 Q z l N C O U G Z H d S t i Y G s o w i 6 e S h e A t v 7 x K 2 h d 1 7 6 p + + X B Z a 9 w W c Z T h B E 7 h H D y 4 h g b c Q x N a w G A M z / A

C DATASETS

We evaluate NUBOT on several tasks, including synthetic data as well as perturbation responses of single cells. In both settings, we are given unpaired measures µ and ν and aim to recover a map T describing how the source µ transforms into the target ν. While the synthetic setting provides a ground-truth matching, this is not the case for the single-cell data, as measuring a cell requires destroying it. In the following, we describe the generation and characteristics of both datasets and introduce additional biological insights that allow us to shed light on the learned matching T.

C.1 SYNTHETIC DATA

To evaluate NUBOT in a simple and low-dimensional setup with known ground truth, we generate a synthetic example: We model a source population with clear subpopulation structure through a mixture of Gaussians. Next, we generate a second (target) population aligned to the source population. We then simulate an intervention to which the subpopulations respond differently, including different levels of growth and death. Specifically, we generate batches of 400 samples with three clusters whose proportions differ before and after the intervention. Table 1 shows the proportions of the three clusters in the source and target distribution, the required weight factor, and the results obtained by NUBOT and UBOT GAN.
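The construction above can be sketched in a few lines. Note that the cluster locations, standard deviation, and proportions below are illustrative choices, not the exact values reported in Table 1:

```python
import numpy as np

def sample_mixture(n, means, proportions, std=0.3, seed=0):
    """Draw n points from a 2-D Gaussian mixture with the given cluster proportions."""
    rng = np.random.default_rng(seed)
    labels = rng.choice(len(means), size=n, p=proportions)
    points = np.asarray(means)[labels] + std * rng.standard_normal((n, 2))
    return points, labels

# Illustrative cluster layout; proportions change between source and target,
# emulating subpopulation growth and death under the intervention.
means = [(-2.0, 0.0), (0.0, 2.0), (2.0, 0.0)]
source, src_labels = sample_mixture(400, means, proportions=[0.5, 0.3, 0.2], seed=0)
target, tgt_labels = sample_mixture(400, means, proportions=[0.2, 0.3, 0.5], seed=1)
```

Because the clusters keep their locations but change their proportions, a balanced transport map is forced to move points between clusters, whereas an unbalanced method can instead reweight them.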

D EXPERIMENTAL DETAILS

NUBOT consists of several modules and its performance is compared against several baselines. In the following, we provide additional background on experimental details, including a description of the evaluation metrics and baselines considered, as well as further information on the parameterization and hyperparameter choices made for NUBOT.

D.1 EVALUATION METRICS

We evaluate our model by analyzing the distributional similarity between the predicted and the observed perturbed distribution. For this, we compute the kernel maximum mean discrepancy (MMD) (Gretton et al., 2012). We utilize the RBF kernel and, as is common practice, report the MMD as an average over several length scales, i.e., 2, 1, 0.5, 0.1, 0.01, 0.005. To take mass variation into account, we compute a weighted version of the MMD by weighting each predicted point by its associated normalized weight. Additionally, we compute the weighted Wasserstein distance between the predicted and observed perturbed cells (2).
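A minimal sketch of such a weighted MMD is given below. Whether the squared MMD or its square root is averaged over scales is an implementation detail; this sketch assumes the squared value, with uniform weights on the observed target points:

```python
import numpy as np

def rbf_kernel(X, Y, scale):
    """RBF kernel matrix k(x, y) = exp(-||x - y||^2 / (2 * scale^2))."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * scale ** 2))

def weighted_mmd(pred, target, weights, scales=(2, 1, 0.5, 0.1, 0.01, 0.005)):
    """Weighted squared kernel MMD between predicted and observed samples,
    averaged over RBF length scales. Predicted points carry their
    (normalized) weights; target points carry uniform weight."""
    w = weights / weights.sum()
    v = np.full(len(target), 1.0 / len(target))
    vals = []
    for s in scales:
        mmd2 = (w @ rbf_kernel(pred, pred, s) @ w
                - 2 * w @ rbf_kernel(pred, target, s) @ v
                + v @ rbf_kernel(target, target, s) @ v)
        vals.append(max(mmd2, 0.0))  # clip tiny negative values from float error
    return float(np.mean(vals))
```

With uniform weights this reduces to the standard (biased) MMD estimate, so the weighted metric is a strict generalization.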

D.2 BASELINES

We compare NUBOT against several baselines, comprising a balanced OT-based method (Bunne et al., 2021, CELLOT) and an unbalanced OT-based method (Yang & Uhler, 2019, UBOT GAN), i.e., current state-of-the-art methods as well as ablations of our work. We further provide a comparison to the closed form of entropy-regularized optimal transport on unbalanced Gaussians (Janati et al., 2020b). In the following, we briefly motivate and introduce each baseline.

CELLOT. By introducing reweighting functions η and ζ, NUBOT recovers a balanced problem parameterized by dual potentials f and g. An important ablation study is thus to compare its performance to its balanced counterpart. Ignoring the fact that the original problem includes cell death and growth, and thus varying cell numbers, we apply ideas developed in Makkuva et al. (2020) and Bunne et al. (2021) and learn a balanced OT problem via duals f and g. These duals are parameterized by two ICNNs and optimized in objective (5) via an alternating min-max scheme.



4 EVALUATION

We illustrate the effectiveness of NUBOT on different tasks, including a synthetic setup as well as an important but challenging task: predicting single-cell perturbation responses to a diverse set of cancer drugs with different modes of action.




Figure 2: Unbalanced sample mapping. In all three scenarios (a, b, c), the source (gray) and target (blue) datasets share structure but have different shifts and per-cluster sampling proportions. The true growth factors of the clusters are depicted at the bottom of the left pane for each setup (True Weights). Tasked with mapping from source to target, NUBOT and UBOT GAN predict the locations (middle pane, red) and weights (right pane) of the transported samples. The number next to the weights denotes the mean weight per cluster. While both methods map the samples to the correct locations, NUBOT more accurately predicts the weights needed to match the target distribution, creating mass (dark blue) or destroying it (red) as needed.

Figure 4: Given the ground truth on the known subpopulation (MelA (red) and Sox9 (blue)) sizes for each drug, we analyze their level of correlation to our predicted weights after a. 8h and b. 24h. With increasing difficulty of the task, and certain drugs completely removing one or both subpopulations, the level of correlation decreases from 8h to 24h. R denotes the Pearson correlation coefficient and P the p-value.

Figure 5: UMAP projections of the control cells for Ulixertinib at a. 8h and b. 24h. Cells are colored by the observed and predicted protein marker values (Ki67, MelA) and the predicted weights. NUBOT correctly predicts weights ≥ 1 for proliferating cells in the MelA+ population (a., right panel) and increased levels of cell death in the Sox9+ population after 24h via weights ≤ 1 (b., right panel), confirmed by the experimental observations (see Fig. 15).

Figure 6: Distributional fit of the predicted samples to the target samples on synthetic data, measured by a weighted version of kernel MMD.


Figure 8: Discrete solution to the synthetic experiment in setting a. using unbalanced Sinkhorn (UBOT), with different entropy regularization parameters ϵ and different penalization parameters τ = τ1 = τ2 (see (7)). For each parameter setup, we show the coupling from source (red) to target (blue), where the strength of each line is proportional to its value in the coupling matrix (left pane), the weights (middle pane), and the normalized weights (right pane). The weight per point is computed as the sum over the columns of the coupling matrix. We additionally show the average weight per cluster.


Figure 9: Discrete solution to the synthetic experiment in setting b. using unbalanced Sinkhorn (UBOT), with different entropy regularization parameters ϵ and different penalization parameters τ = τ1 = τ2 (see (7)). For each parameter setup, we show the coupling from source (red) to target (blue), where the strength of each line is proportional to its value in the coupling matrix (left pane), the weights (middle pane), and the normalized weights (right pane). The weight per point is computed as the sum over the columns of the coupling matrix. We additionally show the average weight per cluster.
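Discrete solutions of this kind can be reproduced in a few lines. Below is a hedged sketch of the unbalanced Sinkhorn iteration with a KL relaxation of both marginals (τ1 = τ2 = τ), where per-point weights are read off the coupling's row sums; the toy data and parameter values are illustrative, not the paper's exact setup:

```python
import numpy as np

def unbalanced_sinkhorn(a, b, M, eps, tau, n_iter=2000):
    """Unbalanced Sinkhorn with entropic regularization eps and a KL
    penalty tau on both marginals (tau1 = tau2 = tau). Returns the coupling."""
    K = np.exp(-M / eps)
    u, v = np.ones_like(a), np.ones_like(b)
    f = tau / (tau + eps)  # damping exponent induced by the KL marginal relaxation
    for _ in range(n_iter):
        u = (a / (K @ v)) ** f
        v = (b / (K.T @ u)) ** f
    return u[:, None] * K * v[None, :]

# Toy data: cluster proportions flip between source and target,
# so mass must be destroyed in one cluster and created in the other.
rng = np.random.default_rng(0)
src = np.concatenate([rng.normal(-2, 0.3, (30, 2)), rng.normal(2, 0.3, (10, 2))])
tgt = np.concatenate([rng.normal(-2, 0.3, (10, 2)), rng.normal(2, 0.3, (30, 2))])
a = np.full(len(src), 1.0 / len(src))  # uniform source marginal
b = np.full(len(tgt), 1.0 / len(tgt))  # uniform target marginal
M = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(-1)  # squared Euclidean cost

coupling = unbalanced_sinkhorn(a, b, M, eps=0.05, tau=1.0)
# Weight per source point: sum over the columns of the coupling matrix,
# rescaled so that weight 1 corresponds to mass conservation.
weights = coupling.sum(axis=1) * len(src)
```

On this toy data, the over-represented source cluster receives weights below 1 (mass destruction) and the under-represented one weights above 1 (mass creation), mirroring the behavior shown in the figures.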

UBOT GAN. Using (6), Yang & Uhler (2019) propose to model mass variation in unbalanced OT via a relaxation of the marginals. Similar to Fan et al. (2021a), Yang & Uhler (2019) reformulate the constrained Monge problem (3) as a saddle-point problem with Lagrange multiplier h for the constraint T♯µ = ν, i.e.,

inf_T sup_h ∫_X [c(x, T(x)) − h(T(x))] µ(x) dx + ∫_X h(y) ν(y) dy,

parameterizing T and h via neural networks. To allow mass to be created and destroyed, Yang & Uhler (2019) introduce a scaling factor ξ : X → R+, allowing to scale the mass of each source point x_i. The optimal solution then needs to balance the cost of mass variation and the cost of transport, potentially measured through different cost functions c1 : X × Y → R+ (cost of mass transport) and c2 : R+ → R+ (cost of mass variation). Parameterizing the transport map T_θ, the scaling factors ξ_ϕ, and the penalty h_ω with neural networks, the resulting objective is

l(θ, ϕ, ω) := 1/n Σ_{i=1}^n [c1(x_i, T_θ(x_i)) ξ_ϕ(x_i) + c2(ξ_ϕ(x_i)) + ξ_ϕ(x_i) h_ω(T_θ(x_i)) − Ψ*(h_ω(y_i))],

with Ψ* approximating the divergence term of the relaxed marginal constraints (see (6)); it is optimized via alternating gradient updates.
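A minimal numpy sketch of this objective for one batch is given below. The choices of c1 (squared Euclidean), c2, and Ψ* (identity) are placeholders, since the actual transport cost, mass-variation cost, and divergence conjugate depend on the setup chosen by Yang & Uhler (2019):

```python
import numpy as np

def ubot_gan_loss(T, xi, h, x, y,
                  c2=lambda s: (s - 1.0) ** 2,  # placeholder cost of mass variation
                  psi_star=lambda t: t):        # placeholder divergence conjugate
    """Batch estimate of l(theta, phi, omega): the transport cost scaled by
    per-point mass factors, the cost of creating/destroying mass, and the
    adversarial penalty terms on transported and observed points."""
    Tx = T(x)                          # transported source points T_theta(x_i)
    s = xi(x)                          # per-point scaling factors xi_phi(x_i)
    c1 = ((x - Tx) ** 2).sum(axis=-1)  # squared-Euclidean transport cost
    return float(np.mean(c1 * s + c2(s) + s * h(Tx) - psi_star(h(y))))
```

In training, this scalar would be minimized over the parameters of T and ξ and maximized over those of h, in alternating gradient steps.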



Single-cell perturbation data. Cells were profiled using iterative indirect immunofluorescence imaging (4i) (Gut et al., 2018), which is capable of measuring the abundance and localization of many proteins in cells. By iteratively adding, imaging, and removing fluorescently tagged antibodies, a multitude of protein markers is captured for each cell. Additionally, cellular and morphological characteristics are extracted from microscopy images, such as the cell and nucleus area and circularity. This spatially resolved phenotypic dataset is rich in molecular information and provides insights into heterogeneous responses of thousands of cells to various drugs. Measuring different morphological and signaling features captures pre-existing cell-to-cell variability which might influence the perturbation effect, resulting in a variety of responses. Some of these markers are of particular importance, as they provide insights into the level of a cell's growth or death as well as its subpopulation identity. We utilized a mixture of two melanoma tumor cell lines (M130219 and M130429) at a ratio of 1:1. The cell lines can be differentiated by the mutually exclusive expression of marker proteins: the former is positive for Sox9, the latter for a set of four proteins which are all recognized by an antibody called MelA (Raaijmakers et al., 2015). Cells were seeded in a 384-well plate and incubated at 37°C and 5% CO2 overnight. Next, the cells were exposed to multiple cancer drugs and dimethyl sulfoxide (DMSO) as a vehicle control for 8h and 24h, after which the cells were fixed and six cycles of 4i were performed. TissueMAPS and the scikit-image library (Van der Walt et al., 2014) were used to process and analyze the acquired images and to perform feature extraction and quality control steps using semi-supervised random forest classifiers.

Data generation and processing. Our datasets contain high-dimensional single-cell data of control and drug-treated cells measured at two time points (8 and 24 hours).
For both the 8h and the 24h dataset, we normalized the extracted intensity and morphological features by dividing each feature by its 75th percentile, computed on the control cells. Additionally, values were transformed by a log1p function (x ← log(x + 1)). In total, our datasets consist of 48 features, of which 26 are protein marker intensities and the remaining 22 are morphological features. For each treatment, we measured between 2000 and 3000 cells. For training the models, we perform an 80/20 train/test split. We trained all models on control and treated cells for each time step and each drug separately. The considered drugs as well as their inhibition types can be found in Table 2.

Cell type assignment. We assigned M130219 and M130429 cells to the Sox9 and MelA cell types, respectively, by first training a two-component Gaussian mixture model on the features 'intensity-cell-MelA-mean' and 'intensity-nuclei-Sox9-mean' of the control cells. Next, we used these features and the labels provided by the mixture model to train a nearest neighbor classifier, which we then used to predict the cell type labels of the drug-treated cells. The procedure was performed separately for the 8h and 24h datasets. Results of the classification can be found in Figure 14 and Figure 16, respectively.

GAUSSIAN APPROX. Janati et al. (2020b) provide a closed-form solution of the entropy-regularized optimal transport problem on unbalanced Gaussians. They show that the unbalanced optimal transport plan, the minimizer of (7), is also a Gaussian distribution over R^d × R^d. In order to use this baseline on the single-cell data, we first compute Gaussian approximations of the control and treated cells separately in the original cell data space. Then, we compute the closed-form joint coupling π = N(µ, Σ). Given a test source sample x_t, we then compute the conditional expectation of the transported target y, i.e., E_π[y | x = x_t].

DISCRETE OT. Additionally, we consider the entropy-regularized Wasserstein mapping returned by the Sinkhorn algorithm (Chizat et al., 2018a; Cuturi, 2013; Benamou et al., 2015) on finite sets.
This algorithm does not return a parameterized solution, but rather a transport map between finite source and target sets. Thus, the solution is computed on the full dataset (including train and test data) and cannot be considered in the out-of-sample setup. We nevertheless include the comparison for completeness.

IDENTITY. A trivial baseline is to compare the predictions to a map which does not model any perturbation effect. The IDENTITY baseline thus models an identity map and provides an upper bound on the overall performance, as also considered in Bunne et al. (2021).

OBSERVED. In a similar fashion, we might ask for a lower bound on the performance. As a ground-truth matching is not available, we construct a baseline for a comparison on a distributional level from a different set of observed perturbed cells, which varies from the true predictions only up to experimental noise. The closer a method comes to the OBSERVED baseline, the more accurately it fits the perturbed cell population.
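For the Gaussian baseline above, predicting a transported target reduces to the conditional mean of a joint Gaussian. A minimal sketch, assuming the joint parameters µ and Σ have already been obtained from the closed form of Janati et al. (2020b):

```python
import numpy as np

def conditional_mean(mu, Sigma, x_t):
    """E_pi[y | x = x_t] under a joint Gaussian pi = N(mu, Sigma) on R^d x R^d.

    mu stacks (mu_x, mu_y); Sigma is the (2d x 2d) joint covariance with
    blocks Sigma_xx, Sigma_xy, Sigma_yx, Sigma_yy."""
    d = len(x_t)
    mu_x, mu_y = mu[:d], mu[d:]
    S_xx = Sigma[:d, :d]
    S_yx = Sigma[d:, :d]
    # Standard Gaussian conditioning: E[y|x] = mu_y + S_yx S_xx^{-1} (x - mu_x)
    return mu_y + S_yx @ np.linalg.solve(S_xx, x_t - mu_x)
```

This is the standard Gaussian conditioning formula; no part of it is specific to optimal transport beyond the choice of µ and Σ.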

D.3 HYPERPARAMETERS

We parameterize the duals f and g using ICNNs with 4 hidden layers, each of size 64, with ReLU activation functions between the layers. We choose the identity initialization scheme introduced by Bunne et al. (2022b) such that ∇g and ∇f resemble the identity function in the first training iteration. As suggested by Makkuva et al. (2020), we relax the convexity constraint on ICNN g and instead penalize its negative weights. The convexity constraint on ICNN f is enforced after each update by setting the negative entries of all weight matrices W_z^l ∈ θ_f to zero. Duals g and f are trained with an alternating min-max scheme where each model is trained at the same frequency. Further, both reweighting functions η and ζ are represented by a multi-layer perceptron (MLP) with two hidden layers, of size 64 for the single-cell dataset and of size 32 for the synthetic dataset, with ReLU activation functions. The final output is further passed through a softplus activation function as we do not assume negative weights. For the unbalanced Sinkhorn algorithm, we choose an entropy regularization of ε = 0.005 and a marginal relaxation penalty of 0.05. We use Adam for both the pair g and f and the pair η and ζ, with learning rates 10^-4 and 10^-3, respectively, and β1 = 0.5, β2 = 0.9. We parameterize both baselines with networks of similar size and follow the implementations proposed by Yang & Uhler (2019) and Bunne et al. (2021).
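A minimal PyTorch sketch of an ICNN with this projection step is shown below. The exact layer wiring, sizes, and initialization are assumptions for illustration, not the paper's implementation; only the non-negativity of the W_z weights (which guarantees convexity in the input) and the post-update clamping mirror the description above:

```python
import torch

class ICNN(torch.nn.Module):
    """Minimal input-convex network: z_{l+1} = ReLU(W_z^l z_l + A_l x + b_l).
    The output is convex in x as long as the W_z weights stay non-negative."""
    def __init__(self, dim, hidden=64, n_layers=4):
        super().__init__()
        self.A = torch.nn.ModuleList(
            [torch.nn.Linear(dim, hidden) for _ in range(n_layers)]
            + [torch.nn.Linear(dim, 1)])
        self.Wz = torch.nn.ModuleList(
            [torch.nn.Linear(hidden, hidden, bias=False) for _ in range(n_layers - 1)]
            + [torch.nn.Linear(hidden, 1, bias=False)])

    def forward(self, x):
        z = torch.relu(self.A[0](x))
        for A, Wz in zip(self.A[1:-1], self.Wz[:-1]):
            z = torch.relu(Wz(z) + A(x))
        return self.Wz[-1](z) + self.A[-1](x)

    def clamp_weights(self):
        # Projection step: after each optimizer update, set negative
        # entries of the W_z matrices to zero to enforce convexity.
        for Wz in self.Wz:
            Wz.weight.data.clamp_(min=0)
```

After `clamp_weights()`, the network is provably convex in its input, so midpoint evaluations never exceed the average of the endpoint evaluations.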

E REPRODUCIBILITY

The code will be made public upon publication of this work.

