KERNEL NEURAL OPTIMAL TRANSPORT

Abstract

We study the Neural Optimal Transport (NOT) algorithm which uses the general optimal transport formulation and learns stochastic transport plans. We show that NOT with the weak quadratic cost may learn fake plans which are not optimal. To resolve this issue, we introduce kernel weak quadratic costs. We show that they provide improved theoretical guarantees and practical performance. We test NOT with kernel costs on the unpaired image-to-image translation task.

1. INTRODUCTION

Neural methods have become widespread in Optimal Transport (OT) since the introduction of large-scale OT (Genevay et al., 2016; Seguy et al., 2018) and Wasserstein Generative Adversarial Networks (WGANs) (Arjovsky et al., 2017). Most existing methods employ the OT cost as the loss function to update the generator in GANs (Gulrajani et al., 2017; Sanjabi et al., 2018; Petzka et al., 2018). In contrast to these approaches, (Korotin et al., 2023; Rout et al., 2022; Daniels et al., 2021; Fan et al., 2022a) have recently proposed scalable neural methods that compute the OT plan (or map) and use it directly as the generative model. In this paper, we focus on the Neural Optimal Transport (NOT) algorithm (Korotin et al., 2023). It is capable of learning optimal deterministic (one-to-one) and stochastic (one-to-many) maps and plans for quite general strong and weak (Gozlan et al., 2017; Gozlan & Juillet, 2020; Backhoff-Veraguas et al., 2019) transport costs. In practice, the authors of NOT test it on the unpaired image-to-image translation task (Korotin et al., 2023, §5) with the weak quadratic cost (Alibert et al., 2019, §5.2).

Contributions. We conduct a theoretical and empirical analysis of the saddle-point optimization problem of the NOT algorithm for the weak quadratic cost. We show that it may have many fake solutions which do not provide an OT plan, and that NOT indeed might recover them (§3.1). We propose weak kernel quadratic costs and prove that they resolve this issue (§3.2). Practically, we show how NOT with kernel costs performs on the unpaired image-to-image translation task (§5).

Notations. We use X, Y, Z to denote Polish spaces and P(X), P(Y), P(Z) to denote the respective sets of probability distributions on them. For a distribution P, we denote its mean and covariance matrix by m_P and Σ_P, respectively. We denote the set of probability distributions on X × Y with marginals P and Q by Π(P, Q).
For a measurable map T : X × Z → Y (or T_x : Z → Y), we denote the associated push-forward operator by T♯ (or T_x♯). We use H to denote a Hilbert space (feature space); its inner product is ⟨·, ·⟩_H, and ∥·∥_H is the corresponding norm. For a function u : Y → H (feature map), we denote the respective positive definite symmetric (PDS) kernel by k(y, y′) def= ⟨u(y), u(y′)⟩_H. A PDS kernel k : Y × Y → R is called characteristic if the kernel mean embedding P(Y) ∋ µ ↦ u(µ) def= ∫_Y u(y) dµ(y) ∈ H is a one-to-one mapping. For a function ϕ : R^D → R ∪ {∞}, we denote its convex conjugate by ϕ̄(y) def= sup_{x ∈ R^D} {⟨x, y⟩ − ϕ(x)}.
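The notions above (feature map, kernel mean embedding, characteristic kernel) can be illustrated numerically: for a characteristic kernel, the distance ∥u(µ₁) − u(µ₂)∥²_H between mean embeddings (the squared maximum mean discrepancy, MMD) vanishes if and only if µ₁ = µ₂. Below is a minimal sketch using the Gaussian RBF kernel, a standard example of a characteristic kernel; the function names are ours, not from the paper:

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian RBF kernel k(y, y') = exp(-||y - y'||^2 / (2 sigma^2)).
    This kernel is characteristic, so its mean embedding is one-to-one."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2 * sigma ** 2))

def mmd2(Y1, Y2, sigma=1.0):
    """Sample estimate of ||u(mu1) - u(mu2)||_H^2, the squared MMD.
    Expands the squared norm via the kernel trick:
    E k(y, y') - 2 E k(y, y'') + E k(y'', y''')."""
    return (rbf_kernel(Y1, Y1, sigma).mean()
            - 2 * rbf_kernel(Y1, Y2, sigma).mean()
            + rbf_kernel(Y2, Y2, sigma).mean())
```

For identical samples the estimate is exactly zero, while samples from distinct distributions yield a strictly positive value as the sample size grows.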

2. BACKGROUND ON OPTIMAL TRANSPORT

Strong OT formulation. For distributions P ∈ P(X), Q ∈ P(Y) and a cost function c : X × Y → R, Kantorovich's (Villani, 2008) primal formulation of the optimal transport cost (Figure 2a) is

Cost(P, Q) def= inf_{π ∈ Π(P,Q)} ∫_{X×Y} c(x, y) dπ(x, y),    (1)

where the infimum is taken over all transport plans π, i.e., distributions on X × Y whose marginals are P and Q. The optimal π* ∈ Π(P, Q) is called the optimal transport plan. A popular example of an OT cost for X = Y = R^D is the squared Wasserstein-2 distance (W_2^2), i.e., formulation (1) for c(x, y) = ½∥x − y∥². For γ = 0, the transport cost (3) is strong, i.e., W_2 = W_{2,0}.
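For empirical distributions with equal sample sizes and uniform weights, problem (1) reduces to a linear assignment problem, so the strong quadratic cost can be computed exactly with the Hungarian algorithm. A minimal sketch (our own illustration, not the paper's algorithm; only NumPy/SciPy assumed):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def w2_cost(X, Y):
    """Optimal transport cost between two equal-size empirical
    distributions under c(x, y) = 0.5 * ||x - y||^2. For uniform
    weights and equal sample sizes an optimal plan is a permutation,
    so the Hungarian algorithm solves (1) exactly."""
    diff = X[:, None, :] - Y[None, :, :]       # (n, n, D) pairwise differences
    C = 0.5 * (diff ** 2).sum(-1)              # cost matrix c(x_i, y_j)
    rows, cols = linear_sum_assignment(C)      # optimal permutation
    return C[rows, cols].mean()                # plan cost, each atom has mass 1/n

# Sanity check: transporting a point cloud onto a shifted copy of itself.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 2))
Y = X + np.array([3.0, 0.0])                   # translate by (3, 0)
print(w2_cost(X, Y))                           # 0.5 * ||(3, 0)||^2 = 4.5
```

For a translation, the optimal map is the translation itself, so the cost equals ½∥shift∥², which the sanity check confirms.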



Figure 1: Unpaired image-to-image translation (one-to-many) by Kernel Neural Optimal Transport. (a) Celeba (female) → anime, 128 × 128. (b) Outdoor → church, 128 × 128.

Figure 2: Strong (Kantorovich's) and weak (Gozlan et al., 2017) optimal transport formulations. (a) Strong OT formulation (1). (b) Weak OT formulation (2).

Weak OT formulation. Let C : X × P(Y) → R be a weak cost (Gozlan et al., 2017), i.e., a function which takes a point x ∈ X and a distribution of y ∈ Y as inputs. The weak OT cost between P and Q is

Cost(P, Q) def= inf_{π ∈ Π(P,Q)} ∫_X C(x, π(·|x)) dπ(x),    (2)
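To make the weak formulation concrete: unlike a strong cost c(x, y), a weak cost C(x, µ) may depend on the entire conditional distribution µ = π(·|x), e.g., through its variance. The sketch below estimates a γ-weak quadratic cost from samples; we assume the standard definition C(x, µ) = ∫ ½∥x − y∥² dµ(y) − (γ/2) Var(µ) of Alibert et al. (2019), and the helper name is ours:

```python
import numpy as np

def weak_quadratic_cost(x, ys, gamma=1.0):
    """Monte-Carlo estimate of the gamma-weak quadratic cost
    C(x, mu) = E_{y~mu}[0.5 * ||x - y||^2] - (gamma / 2) * Var(mu),
    where ys are samples from mu = pi(.|x) and Var(mu) is the total
    variance E||y - Ey||^2. Setting gamma = 0 recovers the strong
    quadratic cost averaged over mu."""
    expected_sq = 0.5 * ((ys - x) ** 2).sum(-1).mean()
    total_var = ((ys - ys.mean(axis=0)) ** 2).sum(-1).mean()
    return expected_sq - 0.5 * gamma * total_var
```

For γ > 0 the variance term rewards spread-out conditionals, which is what allows weak OT to model one-to-many (stochastic) translation.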

