OTCOP: LEARNING OPTIMAL TRANSPORT MAP VIA CONSTRAINT OPTIMIZATION

Abstract

The approximation power of neural networks makes them an ideal tool for learning optimal transport maps. However, existing methods are mostly based on the Kantorovich duality and require regularization and/or special network structures. In this paper, we propose a direct constraint optimization algorithm for computing optimal transport maps based on the Monge formulation. We solve this constraint optimization problem using three different methods: the Lagrangian multiplier method, the augmented Lagrangian method, and the alternating direction method of multipliers (ADMM). We demonstrate the high accuracy of the learned optimal transport maps on high-dimensional benchmarks. Moreover, we show that our methods reduce regularization effects and accurately learn the target distributions at a lower transport cost.

1. INTRODUCTION

There has been great interest in applying modern machine learning techniques to finding optimal transport maps between two distributions. Different from traditional computational methods that solve PDEs for optimal transport maps (Benamou & Brenier (2000); Angenent et al. (2003); Li et al. (2018)), modern machine learning techniques aim to solve the problem directly by optimization. The Sinkhorn distance method (Cuturi (2013); Peyré et al. (2019)) and the regularized OT dual (Seguy et al. (2017)) have been used to find large-scale optimal transport maps between discrete probability distributions and to train generative networks (Genevay et al. (2018); Sanjabi et al. (2018)). A geometric treatment is provided in Gu et al. (2013). The Input Convex Neural Network (ICNN) has been used to construct a convex Brenier potential for finding optimal transport maps between continuous distributions (Makkuva et al. (2020)) and recently in population dynamics (Bunne et al. (2022)), combining the ICNN and Sinkhorn distance methods (Amos et al. (2022)). Despite these successes, most methods are based on the duality formulation and avoid a direct treatment of the Monge problem.

In this paper, we focus on the direct solution of the Monge problem. The Monge problem (Monge (1781)) seeks the optimal transport map itself and is a nonlinear constraint optimization problem. The major difficulty in solving it numerically is that it is nonlinear and includes the constraint that the push-forward distribution equals the target distribution, which is difficult to implement. Therefore, most optimal transport algorithms avoid solving the Monge problem directly and instead use the Kantorovich duality (Kantorovich (1942)), for which the objective function is linear and, for the quadratic cost, the transport map is obtained by taking the gradient of the Brenier potential. However, the two problems are not always equivalent (Villani (2009)), so a direct approach to the Monge problem is desirable. The Monge problem has been solved numerically using optimization-based methods with polynomial approximations. For example, a Lagrangian penalty method was used to find optimal transport maps approximated by polynomials for Bayesian inference (El Moselhy & Marzouk (2012)), and space discretization was used in Haber et al. (2010) to compute the Jacobian matrix of the transport map and transfer the optimization to finite-dimensional spaces. However, these approaches are limited to low dimensions, since the number of grid points grows exponentially with dimension. Considering the success of deep neural networks in approximating high-dimensional data, the integration of classical constraint optimization methods with neural networks holds promise.
One successful application of optimal transport theory to deep learning is the Wasserstein Generative Adversarial Network (WGAN) (Arjovsky et al. (2017)). However, WGAN only uses the optimal transport distance as a loss function and does not aim to find the optimal transport map. It is therefore natural to ask whether the transport cost of the map learned by WGAN or other networks can be lowered using an algorithm that finds optimal transport maps. This paper presents a new approach for finding optimal transport maps between two continuous distributions. We make the following contributions:

• We integrate three constraint optimization algorithms, the Standard Lagrangian (SL) method, the Augmented Lagrangian (AL) method, and the Alternating Direction Method of Multipliers (ADMM), with neural networks to solve the Monge problem of optimal transport with provable guarantees (Theorems 1-3).

• We show that our method finds an accurate optimal transport map between Gaussian distributions, both theoretically (Theorem 2) and experimentally. Moreover, we apply our method to WGAN and show that it can find a generative map with lower transport cost without sacrificing the quality of the outputs.

• We compare the three algorithms: the SL algorithm introduces errors but is simple and easy to implement, while the AL and ADMM algorithms can find exact results and are more robust, with ADMM giving a lower transport cost in general.

Notations. We write α_d = (α, ..., α) ∈ R^d, and α_{d×d} for the constant d × d matrix with all entries equal to α. The transport cost of a map T, which pushes a distribution µ forward to ν, is defined to be E_{x∼µ}[|x − T(x)|²].
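Before moving to the neural-network setting, the multiplier updates behind the SL and AL methods can be illustrated on a scalar toy problem. The sketch below is purely illustrative (it is not the paper's algorithm): it minimizes f(x) = x² subject to c(x) = x − 1 = 0 with augmented Lagrangian dual ascent, where the exact solution is x* = 1 with multiplier λ* = −2.

```python
# Toy illustration of the augmented Lagrangian (AL) multiplier update:
# minimize f(x) = x^2 subject to c(x) = x - 1 = 0.
# The scalar x is a hypothetical stand-in for the network parameters theta.

def augmented_lagrangian(rho=10.0, n_outer=50):
    lam = 0.0  # Lagrange multiplier estimate
    x = 0.0
    for _ in range(n_outer):
        # Inner minimization of L_rho(x, lam) = x^2 + lam*(x-1) + (rho/2)*(x-1)^2
        # has a closed form for this quadratic toy problem:
        x = (rho - lam) / (2.0 + rho)
        # Dual ascent update on the multiplier: lam <- lam + rho * c(x)
        lam = lam + rho * (x - 1.0)
    return x, lam

x_opt, lam_opt = augmented_lagrangian()
print(x_opt, lam_opt)  # converges to x = 1, lam = -2
```

Setting rho = 0 in the inner step recovers the standard Lagrangian iteration, which is simpler but less robust; the quadratic penalty term is what stabilizes the dual ascent.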

2.1. THE MONGE PROBLEM

Let (X, µ) and (Y, ν) be two separable metric probability spaces. The Monge problem is to find a transport map T : X → Y that realizes the infimum

inf_{T : T_#µ = ν} ∫_X c(x, T(x)) dµ(x),

where T_#µ denotes the push-forward of µ and c : X × Y → R_+ is a lower semicontinuous Borel measurable cost function. In this paper, we simply take the quadratic cost c(x, y) = |x − y|², but our method applies to other cost functions. The existence of a solution to the Monge problem is delicate and does not always hold. However, under suitable conditions, for example for continuous distributions without atoms, existence and uniqueness are guaranteed (see, for example, (Villani, 2009, Theorem 5.30)). Therefore, we focus here on learning transport maps between continuous distributions. For discrete distributions, one can apply dequantization techniques to transform them into continuous distributions (Ho et al. (2019)).
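For one-dimensional Gaussians the Monge map under the quadratic cost is known in closed form, which gives a convenient sanity check for any learned map. The sketch below (an illustration of the classical formula, not the paper's method) pushes samples of µ = N(m1, s1²) onto ν = N(m2, s2²) via T(x) = m2 + (s2/s1)(x − m1) and verifies the transport cost against the analytic value (m1 − m2)² + (s1 − s2)².

```python
import numpy as np

# Closed-form 1-D Gaussian Monge map for the quadratic cost:
# T(x) = m2 + (s2/s1) * (x - m1) pushes N(m1, s1^2) onto N(m2, s2^2).
rng = np.random.default_rng(0)
m1, s1, m2, s2 = 0.0, 1.0, 3.0, 2.0
x = rng.normal(m1, s1, size=200_000)

T = lambda x: m2 + (s2 / s1) * (x - m1)
y = T(x)

print(y.mean(), y.std())        # approximately m2 = 3 and s2 = 2
cost = np.mean((x - y) ** 2)    # Monte Carlo estimate of E|x - T(x)|^2
print(cost)                     # analytic value: (m1-m2)^2 + (s1-s2)^2 = 10
```

This analytic cost is exactly the squared 2-Wasserstein distance between the two Gaussians, so the optimal map attains it with equality.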

2.2. THE MONGE PROBLEM AS CONSTRAINT OPTIMIZATION

In order to solve the Monge problem, we use a generative network, denoted by T_θ with parameter set θ, which takes random samples from the distribution µ as input and generates samples representing the target distribution ν. As can be seen from the definition, the Monge problem is a constraint optimization problem; however, the constraint T_#µ = ν is highly nonlinear. In order to impose this constraint, we take d(·|·) to be a distance function (such as the Wasserstein distance, the MMD (Gretton et al. (2012)) or an IPM (Müller (1997))) or a probability divergence (such as the Kullback-Leibler (KL) divergence). The constraint optimization problem reads as

min_θ E_{x∼µ}[|x − T_θ(x)|²], s.t. d(T_θ#µ | ν) = 0. (2)

The objective of this paper is to solve the above problem using techniques from constraint optimization theory (Bertsekas (2014)).
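One concrete, sample-based choice for the constraint term d(T_θ#µ | ν) is the squared MMD with an RBF kernel, estimated from batches of pushed-forward and target samples. The sketch below is a minimal NumPy illustration of that estimator (the variable names and the fixed bandwidth are our own choices, not the paper's):

```python
import numpy as np

def mmd2(x, y, sigma=1.0):
    """Biased (V-statistic) squared MMD between samples x (n,d) and y (m,d),
    using an RBF kernel with bandwidth sigma."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

rng = np.random.default_rng(0)
push = rng.normal(0.0, 1.0, size=(500, 2))    # stand-in for samples of T_theta# mu
target = rng.normal(0.0, 1.0, size=(500, 2))  # samples of nu

print(mmd2(push, target))        # near 0: constraint (approximately) satisfied
print(mmd2(push + 3.0, target))  # clearly positive: constraint violated
```

In training, this scalar would play the role of the constraint value d(T_θ#µ | ν) fed to the multiplier updates, vanishing exactly when the pushed-forward batch matches the target distribution.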





