VOLUMETRIC OPTIMAL TRANSPORTATION BY FAST FOURIER TRANSFORM

Abstract

The optimal transportation map finds the most economical way to transport one probability measure to another, and it has been applied to a broad range of problems in machine learning and computer vision. By the Brenier theory, computing the optimal transport map is equivalent to solving a Monge-Ampère equation, which is highly non-linear. Therefore, the computation of optimal transportation maps is intrinsically challenging. In this work, we propose a novel and powerful method, FFT-OT (fast Fourier transform-optimal transport), to solve 3-dimensional OT problems. The method is based on several key ideas: first, the Monge-Ampère equation is linearized to a sequence of linear elliptic PDEs with spatially and temporally variant coefficients; second, the obliqueness property of optimal transportation maps is reformulated as a Neumann boundary condition; and third, the variant coefficient elliptic PDEs are approximated by constant coefficient elliptic PDEs and solved by FFT on GPUs. We also prove that the algorithm converges linearly. Experimental results show that the FFT-OT algorithm is more than a hundred times faster than the conventional methods based on convex geometry. Furthermore, the method can be directly applied to sampling from complex 3D density functions in machine learning and to magnifying volumetric data in medical imaging.

1. INTRODUCTION

Optimal transportation (OT) transports one probability measure to another in the most economical way, and it plays a fundamental role in areas like machine learning (Courty et al., 2017; Altschuler et al., 2019), computer vision (Arjovsky et al., 2017; Tolstikhin et al., 2018; An et al., 2020), and computer graphics (Solomon et al., 2015; Nader & Guennebaud, 2018). Given a Riemannian manifold $X$, all the probability distributions on $X$ form an infinite dimensional space $\mathcal{P}(X)$. Given any two distributions $\mu, \nu \in \mathcal{P}(X)$, the optimal transportation map defines a distance between them, and the McCann interpolation (McCann, 1997) defines the geodesic connecting them. Hence optimal transportation equips $\mathcal{P}(X)$ with a Riemannian metric and defines its covariant differentiation, which provides a variational calculus framework for optimization in it.

As the optimal transportation problem is highly non-linear, it is quite challenging to compute OT maps. Recently, researchers have developed many algorithms. The geometric variational approach (Aurenhammer et al., 1998; Gu et al., 2016; Levy, 2015) based on the Brenier theorem (Brenier, 1991) is capable of achieving high accuracy for low dimensional problems, but it requires complicated geometric data structures, and the storage complexity grows exponentially as the dimension increases. The Sinkhorn method (Cuturi, 2013) based on the Kantorovich theorem adds an entropic regularizer to the primal problem and can handle high dimensional tasks, but it suffers from an intrinsic approximation error.

We propose a novel method to tackle this challenging problem through the fast Fourier transform (FFT). According to the Brenier theorem (Brenier, 1991), under the quadratic distance cost, the optimal transportation map is the gradient of the Brenier potential, which satisfies the Monge-Ampère equation.
With the continuity method (Delanoë, 1991), the Monge-Ampère equation can be linearized as a sequence of elliptic partial differential equations (PDEs) with spatially and temporally variant coefficients. By iteratively solving the linearized Monge-Ampère equations, we can obtain the OT map. Specifically, we propose to approximate the linearized Monge-Ampère equation by constant coefficient elliptic PDEs and solve them using the FFT on GPUs. Our proposed FFT-OT method has many merits: (i) it generalizes to arbitrary dimension; (ii) it has a linear convergence rate, namely the approximation error decays exponentially fast; (iii) in each iteration, the computational complexity of the FFT is $O(n \log n)$, thus our algorithm can solve large scale OT problems; and (iv) it is highly parallelizable and can be efficiently implemented on GPUs.

We demonstrate the efficiency of the FFT-OT algorithm by solving volumetric OT problems for machine learning and medical imaging applications, including sampling from given 3D density functions and a volumetric magnifier.

The algorithm also has its own limitations: (i) although it can be generalized to any dimension, the storage complexity increases exponentially with respect to the dimension, so its power is limited by the memory size of the GPUs; (ii) since the algorithm uses the FFT, the current version of the method only works well for continuous density functions; (iii) in this work, we mainly focus on the computation of the OT map from the uniform distribution to another arbitrary continuous distribution. To extend the method to find the OT map between any two continuous measures, we can compute two OT maps from the uniform distribution to both continuous measures, then combine them. The combination gives a reasonable approximation of the OT map (Nader & Guennebaud, 2018).
Although Lei & Gu (2021) also use the FFT to solve the 2-dimensional OT problem, our method differs from their work in the following two respects: (i) Lei and Gu's method uses the fixed point method to compute 2D OT problems, whereas ours is based on the linearization of the Monge-Ampère operator to solve 3D OT problems; these are two different methodologies in PDE theory; (ii) in our paper, we also provide a theoretical convergence analysis of the proposed method. For more detailed analysis and related work, please refer to Appendix A.

2. OPTIMAL TRANSPORTATION THEORY

In this section, we review the fundamental concepts and theorems of the OT problem and the Monge-Ampère equation; more details can be found in Villani (2008).

Optimal Transportation Map and the Monge-Ampère Equation

Suppose the source domain $\Omega$ is an open set in $\mathbb{R}^d$ with the probability measure $\mu$, and the target domain $\Sigma$ carries the probability measure $\nu$. Both $\mu$ and $\nu$ have density functions, $d\mu(x) = f(x)\,dx$ and $d\nu(y) = g(y)\,dy$ respectively, with equal total mass, $\int_\Omega f(x)\,dx = \int_\Sigma g(y)\,dy$, which is called the balance condition. Suppose $T : \Omega \to \Sigma$ is a measurable map. The mapping $T$ is called measure preserving, denoted $T_\#\mu = \nu$, if $\mu(T^{-1}(A)) = \nu(A)$ for every Borel subset $A \subset \Sigma$. A cost function $c : \Omega \times \Sigma \to \mathbb{R}$ measures the cost of transporting a unit mass from $x \in \Omega$ to $y \in \Sigma$.

Problem 1 (Monge). The optimal transportation problem finds the measure preserving map with the minimal total transportation cost,

$$\min_{T_\#\mu = \nu} \int_\Omega c(x, T(x))\, f(x)\,dx. \tag{1}$$

The solution to Monge's problem is called the optimal transport map between $\mu$ and $\nu$. The existence, uniqueness and regularity of OT maps depend on the boundedness and the continuity of the density functions, the convexity of the supporting domains, the continuity of their boundaries, and the cost function. In our current work, we focus on a situation similar to that of Saumier et al. (2013):

• The cost function is the quadratic Euclidean distance $c(x, y) = \|x - y\|^2/2$;
• The supports of the source and the target measures are the canonical cube $\Omega = [-1, 1]^3$, which is uniformly convex;
• The source and the target measures $\mu, \nu$ are absolutely continuous with respect to the Lebesgue measure, their densities $f, g$ are positive and bounded away from zero, $0 < m < f, g < M$, and $f, g$ are of class $C^{\alpha}(\Omega)$;
• The boundary condition is the second boundary condition (OT boundary condition), $T(\Omega) = \Omega$.

Then according to Villani (2003, Theorem 14.4) and Saumier et al. (2013, Theorem 2.1), the OT map $T : \Omega \to \Omega$ exists and is unique and invertible ($\mu$-a.e.), and the Brenier potential is of class $C^{2,\beta}(\Omega)$ for some $0 < \beta < \alpha$.

Theorem 2. Assume that $\Omega$, $\mu$, $\nu$, $f$ and $g$ are defined as above. Then there exists a convex function $u : \Omega \to \mathbb{R}$, $u \in C^{2,\beta}(\Omega)$ for some $0 < \beta < \alpha$, such that $\nabla u$ pushes $\mu$ forward to $\nu$, $(\nabla u)_\#\mu = \nu$. Moreover, $\nabla u$ is unique and invertible ($\mu$-a.e.), and its inverse $\nabla v$ satisfies $(\nabla v)_\#\nu = \mu$.

We call such a convex function $u$ the Brenier potential. It satisfies the Monge-Ampère equation,

$$\det D^2 u(x) = \frac{f(x)}{g \circ \nabla u(x)}, \tag{2}$$

with the boundary condition $\nabla u(\Omega) = \Sigma$. Finding the optimal transportation map is therefore equivalent to solving the corresponding Monge-Ampère equation. In the current work, the target measure is always the Lebesgue measure, and the source density $f$ is of class $C^{2,\alpha}(\Omega)$.

Linearized Monge-Ampère Operator

The Monge-Ampère operator is defined as $\mathrm{MA}[u] = \det D^2 u$, which is highly non-linear. It can be linearized as follows:

$$\mathrm{MA}[u + \varepsilon v] = \det(D^2 u + \varepsilon D^2 v) \approx \det D^2 u + \varepsilon\, \mathrm{Trace}\big(\mathrm{Adj}(D^2 u) \cdot D^2 v\big), \tag{3}$$

where $\mathrm{Adj}(A)$ is the adjoint (co-factor) matrix of $A$, $\mathrm{Adj}(A) := \det(A) A^{-T}$. Therefore the linearized Monge-Ampère operator is defined as

$$\mathrm{DMA}_u[v] := \mathrm{Trace}\big(\mathrm{Adj}(D^2 u) \cdot D^2 v\big) = \sum_{p,q=1}^{d} u^{pq}(x)\, \partial_p \partial_q v(x), \tag{4}$$

where $(u^{pq}) = \mathrm{Adj}(D^2 u)$ is the adjoint matrix of the Hessian of $u$, and $\partial_p \partial_q := \frac{\partial^2}{\partial x_p \partial x_q}$.

Continuity Method

For simplicity, we assume the source domain coincides with the target domain, that is $\Omega = \Sigma$, and the target density is $g(x) \equiv 1$. The Monge-Ampère equation Eqn. (2) is simplified to $\det D^2 u(x) = f(x)$. Define a flow of densities as

$$\rho(x, t) = (1 - t) + t f(x), \quad t \in [0, 1].$$

The corresponding flow of the Brenier potentials is $u(x, t) : \Omega \times [0, 1] \to \mathbb{R}$,

$$\det D_x^2 u(x, t) = \rho(x, t), \quad \text{s.t. } \nabla_x u(x, t)(\Omega) = \Omega, \tag{5}$$

where $D_x^2 u(x, t)$ is the Hessian of $u(x, t)$ with respect to $x$, and $u(x, 1)$ is the solution to the original Monge-Ampère equation Eqn. (2). Taking the derivative with respect to time $t$ on both sides of the time-dependent Monge-Ampère equation and using the linearized Monge-Ampère operator Eqn. (4), we obtain an elliptic PDE with spatially and temporally variant coefficients for the unknown $v(x, t) := \dot{u}(x, t)$, namely the "velocity" of the Brenier potential,

$$\mathrm{DMA}_u[v] = \sum_{p,q=1}^{d} u^{pq}(x, t)\, \partial_p \partial_q v(x, t) = \frac{\partial}{\partial t} \rho(x, t) = f(x) - 1. \tag{6}$$

At time $t = 0$, the initial Brenier potential is known: $u(x, 0) = \frac{1}{2}\|x\|^2$. Suppose that at time $t$ we have already obtained $u(x, t)$. Then we can compute the adjoint matrix $u^{pq}(x, t)$ of the Hessian $D_x^2 u(x, t)$, and solve Eqn. (6) to get the velocity $v(x, t) = \dot{u}(x, t)$. In turn, we move forward to time $t + \delta t$, and update $u(x, t + \delta t)$ by $u(x, t) + \dot{u}(x, t)\,\delta t$. By repeating this procedure, we eventually reach time $t = 1$ and obtain the solution $u(x) := u(x, 1)$ to the original Monge-Ampère equation Eqn. (2).

Obliqueness Boundary Condition

Suppose the boundary of $\Omega$ is $C^1$ almost everywhere, so that at a $C^1$ point $x \in \partial\Omega$ the outer normal $n(x)$ is well defined. For almost every boundary point $x \in \partial\Omega$, the obliqueness condition is represented as

$$\langle n(x), n(\nabla u(x)) \rangle \ge 0. \tag{7}$$

Suppose $\Omega$ is a cuboid and has 6 faces. If a boundary point $x \in \partial\Omega$ is on a face, then by the cyclic monotonicity of the map and the strict convexity of $u$ (Villani, 2008), its image $\nabla u(x)$ must be on the same face as $x$, namely,

$$\langle \nabla u(x) - x, n(x) \rangle = 0. \tag{8}$$

We can rewrite the Brenier potential as

$$u(x_1, x_2, \ldots, x_d) = \frac{1}{2} \sum_{i=1}^{d} x_i^2 + v(x_1, \ldots, x_d),$$

then $\nabla u(x) - x = \nabla v(x)$. By Eqn. (8), $v(x)$ satisfies the Neumann boundary condition,

$$\frac{\partial v}{\partial n}(x) = 0, \quad x \in \partial\Omega. \tag{9}$$

Similarly, the velocity of the (modified) Brenier potential $v$ in Eqn. (6) also satisfies the Neumann boundary condition. The analysis of the existence and regularity of the solutions to Eqn. (6) with boundary condition Eqn. (9) can be found in the supplementary material.
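The linearization of the Monge-Ampère operator in this section can be checked numerically on a single Hessian. The following is a minimal Python sketch, not the paper's implementation: it compares $\det(A + \varepsilon B)$ with $\det A + \varepsilon\,\mathrm{Trace}(\mathrm{Adj}(A)\cdot B)$ for 3×3 symmetric matrices $A$ and $B$ standing in for $D^2u$ and $D^2v$ at one point. The function names `det3`, `cofactor3` and `linearized_ma` are illustrative assumptions.

```python
def det3(a):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def cofactor3(a):
    """Cofactor matrix C, C[p][q] = (-1)^(p+q) * minor(p, q).

    For an invertible matrix this equals det(A) * A^{-T}, the 'adjoint'
    matrix used in the text.
    """
    c = [[0.0] * 3 for _ in range(3)]
    for p in range(3):
        for q in range(3):
            r0, r1 = [r for r in range(3) if r != p]
            c0, c1 = [s for s in range(3) if s != q]
            minor = a[r0][c0] * a[r1][c1] - a[r0][c1] * a[r1][c0]
            c[p][q] = (-1) ** (p + q) * minor
    return c


def linearized_ma(hess_u, hess_v):
    """DMA_u[v] = sum_{p,q} u^{pq} * d_p d_q v at a single point."""
    c = cofactor3(hess_u)
    return sum(c[p][q] * hess_v[p][q] for p in range(3) for q in range(3))


# Illustrative Hessians (assumed values): A is positive definite, B symmetric.
A = [[2.0, 0.3, 0.1], [0.3, 1.5, 0.2], [0.1, 0.2, 1.2]]
B = [[0.5, 0.1, 0.0], [0.1, -0.2, 0.3], [0.0, 0.3, 0.4]]
eps = 1e-5
A_eps = [[A[p][q] + eps * B[p][q] for q in range(3)] for p in range(3)]
# The gap between the exact and the linearized determinant is O(eps^2).
gap = det3(A_eps) - (det3(A) + eps * linearized_ma(A, B))
```

At $t = 0$ the Hessian of $u(x, 0) = \frac{1}{2}\|x\|^2$ is the identity, its cofactor matrix is again the identity, and `linearized_ma` reduces to the trace of $D^2 v$, that is, the Laplacian of $v$.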

3. COMPUTATIONAL ALGORITHM

Here we introduce the 3-dimensional FFT-OT algorithm, which can be generalized to any dimension. We approximate the Monge-Ampère equation by a sequence of constant coefficient elliptic PDEs, and solve them by FFT on GPUs. A more detailed analysis of the solution of the discretized Monge-Ampère equation, together with the proofs of the lemmas and theorems, is given in Appendix B.

3.1. CONTINUITY METHOD FOR SOLVING THE MONGE-AMPÈRE EQUATION

By using the continuity method, we can solve the Monge-Ampère equation iteratively. For simplicity, we assume the target measure is the Lebesgue measure with $g \equiv 1$. At the $n$-th iteration, the Brenier potential is represented as $\frac{1}{2}\|x\|^2 + u_n(x)$, its Hessian matrix is $H_n(x) := I + D^2 u_n(x)$, the corresponding density function is defined as the determinant of the Hessian, $\rho_n = \det(H_n)$, and the velocity of the Brenier potential is $v_n(x)$. In the beginning, $u_0(x)$ is zero, the Hessian is $H_0 = I$ and the density is $\rho_0 = 1$. At the $n$-th step, we compute the adjoint matrix $[H_n^{pq}(x)]$ of the Hessian matrix $H_n(x)$ for any $x \in \Omega$. According to Eqn. (3), the velocity $v_n(x)$ satisfies the variant coefficient elliptic PDE induced by the linearized Monge-Ampère operator,

$$\mathrm{DMA}_{u_n}[v_n] = \sum_{p,q=0}^{2} H_n^{pq}(x)\, \partial_p \partial_q v_n(x) = \frac{1}{\tau}\big(f(x) - \rho_n(x)\big). \tag{10}$$

Note that the right hand side of Eqn. (6) is the difference between the source density $f$ and the target density $1$, whereas here it is replaced by the difference between the source density $f$ and the current density $\rho_n$. The step length parameter $\tau \ge 1$ can be chosen to guarantee convergence (Loeper & Rapetti, 2005). The elliptic PDE Eqn. (10) has spatially variant coefficients. Although the traditional finite element method (FEM) can solve it using the GMRES algorithm (Saad, 2003), this algorithm cannot be directly accelerated by GPUs. To overcome this difficulty, we approximate Eqn. (10) by a much simpler elliptic PDE with constant coefficients, which can be solved directly on GPUs using the FFT-OT pipeline (Alg. 1 in Appendix C).
At the $n$-th iteration, after obtaining the adjoint matrix $[H_n^{pq}(x)]$, $x \in \Omega$, we compute the mean adjoint matrix $[\bar{H}_n^{pq}]$,

$$\bar{H}_n^{pq} := \frac{\int_\Omega H_n^{pq}(x)\, \rho_n(x)\,dx}{\int_\Omega \rho_n(x)\,dx}, \quad p, q = 0, 1, 2, \tag{11}$$

and replace the elliptic PDE Eqn. (10) with variant coefficients by the elliptic PDE with constant coefficients,

$$\overline{\mathrm{DMA}}_{u_n}[v_n] = \sum_{p,q=0}^{2} \bar{H}_n^{pq}\, \partial_p \partial_q v_n(x) = \frac{1}{\tau}\big(f(x) - \rho_n(x)\big), \tag{12}$$

where $\overline{\mathrm{DMA}}$ is called the mean linearized Monge-Ampère operator. Then we solve the constant coefficient elliptic PDE Eqn. (12) by the FFT algorithm (Alg. 2 in Appendix C). Although the original variant coefficient PDE Eqn. (10) is replaced by its constant coefficient approximation Eqn. (12), the algorithm still converges to the solution with a linear convergence rate. This replacement allows the whole algorithm to be carried out by FFT on GPUs, which greatly improves the computational efficiency.

Theorem 3 (main). Given a domain $\Omega \subset \mathbb{R}^d$, which is a canonical cuboid $\Omega = [-1, 1]^d$, and a positive density function $f : \Omega \to \mathbb{R}$ with the balance condition $\int_\Omega f(x)\,dx = \int_\Omega dx$, suppose the mirror reflection extension Eqn. (14) of $f$ to the flat torus, $\tilde{f} : \mathbb{T}^d \to \mathbb{R}$, is $C^\alpha$, $\alpha \in (0, 1)$. Then the Monge-Ampère equation

$$\det D^2 u(x) = f(x), \quad \nabla u(\Omega) = \Omega$$

can be solved using the FFT-OT algorithm (Alg. 1 in Appendix C). In particular, one can choose the step length parameter $\tau$ such that there is a constant $0 < \gamma < 1$ for which the approximation error satisfies

$$\|f - \rho_{n+1}\|^2 < C \gamma^n, \tag{13}$$

namely the algorithm has a linear convergence rate.
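On a uniform grid, the density-weighted mean adjoint matrix defined above becomes a weighted average over cells, because the common cell volume cancels between numerator and denominator. The following is a minimal Python sketch under that assumption. The function name `mean_adjoint` and the flat-list data layout are illustrative choices, not taken from Alg. 1.

```python
def mean_adjoint(adjoints, densities):
    """Density-weighted mean of per-cell 3x3 adjoint matrices.

    adjoints  : list of 3x3 nested lists, one adjoint matrix H_n^{pq} per cell
    densities : list of rho_n values, one per cell (same order)
    Assumes all cells have equal volume, so the volume factor cancels.
    """
    total = sum(densities)
    mean = [[0.0] * 3 for _ in range(3)]
    for adj, rho in zip(adjoints, densities):
        for p in range(3):
            for q in range(3):
                mean[p][q] += adj[p][q] * rho / total
    return mean


# Two-cell toy example (assumed values): adjoints I and 3I, densities 1 and 3.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
tripled = [[3.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 3.0]]
H_bar = mean_adjoint([identity, tripled], [1.0, 3.0])
```

Since each $H_n^{pq}(x)$ is positive definite when $H_n(x)$ is, and the weights $\rho_n$ are positive, the averaged matrix is again positive definite, so the constant coefficient PDE stays elliptic.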

3.2. FFT SOLVER FOR CONSTANT COEFFICIENT ELLIPTIC PDES

To solve the constant coefficient elliptic PDE Eqn. (12), we first extend the PDE to the flat torus by mirror reflection, then discretize the domain and approximate the differential operators by the central difference scheme. Finally, the PDE is converted by FFT to algebraic equations in the frequency domain, which can be solved efficiently on GPUs.

Extension by Mirror Reflection

Suppose $\Omega = [0, 1]^3$ and $f : \Omega \to \mathbb{R}$ are given. We extend $\Omega$ to $\tilde{\Omega} = [-1, 1]^3$ and $f$ to $\tilde{f} : \tilde{\Omega} \to \mathbb{R}$ by mirror reflection,

$$\tilde{f}(x, y, z) = f(|x|, |y|, |z|), \quad \forall (x, y, z) \in \tilde{\Omega}. \tag{14}$$

By definition, $\tilde{f}$ satisfies the periodic boundary condition and can be treated as a function defined on the flat torus $\mathbb{T}^3$; $\tilde{\Omega}$ is one of the fundamental domains of $\mathbb{T}^3$. The constant coefficients $a_{p,q}$ are kept unchanged. Then we solve the constant coefficient elliptic PDE Eqn. (18), $L[\tilde{u}] = \tilde{f}$, with the periodic boundary condition. Finally, the restriction of $\tilde{u}$ to $\Omega$ gives the solution $u$ to the original problem $L[u] = f$ with the Neumann boundary condition. In the following, to avoid overly complicated symbols, we use $(u, f, \Omega)$ to represent $(\tilde{u}, \tilde{f}, \tilde{\Omega})$.

Tessellation

Suppose $\Omega = [-1, 1]^3$ is the canonical cube (a fundamental domain of a flat torus). We tessellate it into regular cells, and the centers of the cells form an $M \times N \times L$ grid. The Brenier potential $u : \Omega \to \mathbb{R}$ is discretized to a tensor $u_{i,j,k}$ with $(i, j, k) \in \{0, \ldots, M-1\} \times \{0, \ldots, N-1\} \times \{0, \ldots, L-1\}$. The spatial step lengths are $(h_x, h_y, h_z) = (2/M, 2/N, 2/L)$. The coordinates of each sample point are

$$(x_i, y_j, z_k) = \big(-1 + h_x(i + 1/2),\ -1 + h_y(j + 1/2),\ -1 + h_z(k + 1/2)\big).$$

The periodic boundary condition is then formulated as

$$u_{i,j,k} = u_{i + \alpha M,\, j + \beta N,\, k + \gamma L}, \quad \alpha, \beta, \gamma \in \mathbb{Z}. \tag{15}$$

Finite Difference Differential Operator

We use the standard central differences to compute the differential operators. The first order derivative $D_x$ is approximated by

$$D_x u_{i,j,k} = \frac{u_{i+1,j,k} - u_{i-1,j,k}}{2 h_x},$$

where the index $i + 1$ means $i + 1$ modulo $M$. The operators $D_y$, $D_z$ are defined in a similar way. The second order derivative operators $D_{xx}$ and $D_{xy}$ are approximated by

$$D^2_{xx} u_{i,j,k} = \frac{u_{i+1,j,k} + u_{i-1,j,k} - 2u_{i,j,k}}{h_x^2}, \qquad D^2_{xy} u_{i,j,k} = \frac{u_{i+1,j+1,k} + u_{i-1,j-1,k} - u_{i+1,j-1,k} - u_{i-1,j+1,k}}{4 h_x h_y}.$$

The other operators $D_{yy}$, $D_{zz}$, $D_{yz}$ and $D_{xz}$ are defined similarly.
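The mirror reflection and the cell-centred grid described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the density is sampled at the centers of $n^3$ cells of $[0, 1]^3$, the extended grid has $2n$ cells per axis on $[-1, 1]$, and the names `mirror_index`, `mirror_extend` and `cell_center` are hypothetical helpers, not functions from the paper's code.

```python
def mirror_index(i, n):
    """Map cell i of the extended axis (2n cells on [-1, 1]) to a source cell on [0, 1].

    Cells i >= n lie in [0, 1] and map to i - n; cells i < n lie in [-1, 0]
    and map to the reflected cell n - 1 - i, which realizes x -> |x|.
    """
    return i - n if i >= n else n - 1 - i


def mirror_extend(f, n):
    """Extend f[i][j][k] on n^3 cells to (2n)^3 cells with f~(x,y,z) = f(|x|,|y|,|z|)."""
    m = 2 * n
    return [[[f[mirror_index(i, n)][mirror_index(j, n)][mirror_index(k, n)]
              for k in range(m)] for j in range(m)] for i in range(m)]


def cell_center(i, m):
    """Center of cell i when [-1, 1] is split into m cells: -1 + h * (i + 1/2), h = 2/m."""
    return -1.0 + (2.0 / m) * (i + 0.5)


# Toy 2x2x2 source array whose entries encode their own index.
n = 2
f = [[[100 * i + 10 * j + k for k in range(n)] for j in range(n)] for i in range(n)]
f_ext = mirror_extend(f, n)
```

Because cell centers never fall on the reflection planes, the extended array is symmetric under `i -> 2n - 1 - i` along each axis, and the first and last cells of each axis hold the same values, so the periodic wrap-around introduces no jump.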

Discrete Fourier Transformation

The discrete Fourier transformation (DFT) of $u_{i,j,k}$ and its inverse are given by

$$\hat{u}_{m,n,l} = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \sum_{k=0}^{L-1} u_{i,j,k}\, \bar{\omega}_{mnl}, \tag{16}$$

$$u_{i,j,k} = \frac{1}{MNL} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \sum_{l=0}^{L-1} \hat{u}_{m,n,l}\, \omega_{mnl}, \tag{17}$$

where $\bar{\omega}_{mnl} = e^{-\iota \frac{2\pi m i}{M}} e^{-\iota \frac{2\pi n j}{N}} e^{-\iota \frac{2\pi l k}{L}}$, $\omega_{mnl} = e^{\iota \frac{2\pi m i}{M}} e^{\iota \frac{2\pi n j}{N}} e^{\iota \frac{2\pi l k}{L}}$, $\iota = \sqrt{-1}$, and $(m, n, l)$ are the indices of the frequency coefficients. By using the DFT, the differential operators are converted to algebraic operators in the frequency domain.

Lemma 4. Suppose the discrete function is $u_{i,j,k}$, with the discrete Fourier transformation Eqn. (16) and Eqn. (17). By using the central difference scheme, the first order differential operator is given by

$$D_x u_{i,j,k} = \frac{1}{MNL} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \sum_{l=0}^{L-1} \hat{u}_{m,n,l}\, \frac{\iota \sin\frac{2\pi m}{M}}{h_x}\, \omega_{mnl},$$

and the second order differential operators are represented by

$$D^2_{xx} u_{i,j,k} = \frac{1}{MNL} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \sum_{l=0}^{L-1} \hat{u}_{m,n,l}\, \frac{2\big(\cos\frac{2\pi m}{M} - 1\big)}{h_x^2}\, \omega_{mnl},$$

$$D^2_{xy} u_{i,j,k} = \frac{1}{MNL} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \sum_{l=0}^{L-1} \hat{u}_{m,n,l}\, \frac{-\sin\frac{2\pi m}{M} \sin\frac{2\pi n}{N}}{h_x h_y}\, \omega_{mnl}.$$

The other differential operators $D_y$, $D_z$, $D_{yy}$, $D_{zz}$, $D_{yz}$ and $D_{xz}$ are represented accordingly. The factor $\iota$ in the first order symbol comes from $(e^{\iota\theta} - e^{-\iota\theta})/2 = \iota \sin\theta$. The detailed proofs can be found in the supplementary material.

FFT Solver

Suppose we want to solve an elliptic PDE with constant coefficients on $\Omega \subset \mathbb{R}^3$,

$$L[u] := \Big( \sum_{p=0}^{2} \sum_{q=0}^{2} a_{p,q}\, \partial_p \partial_q + \sum_{r=0}^{2} b_r\, \partial_r + c \Big) u(x) = f(x), \tag{18}$$

with the periodic boundary condition, where $a_{p,q}$, $b_r$, $c$ are constants and the matrix $(a_{p,q})$ is positive definite, namely the PDE is uniformly elliptic. By the discrete Fourier transformation $\mathcal{F}$, we convert the differential equation to an algebraic equation in the frequency domain,

$$\sum_{p=0}^{2} \sum_{q=0}^{2} a_{p,q}\, \mathcal{F}(\partial_p \partial_q u) + \sum_{r=0}^{2} b_r\, \mathcal{F}(\partial_r u) + c\, \mathcal{F}(u) = \mathcal{F}(f).$$

By applying Lemma 4 and defining

$$\begin{aligned} \lambda_{m,n,l} ={}& a_{0,0} \frac{2(\cos\frac{2\pi m}{M} - 1)}{h_x^2} + a_{1,1} \frac{2(\cos\frac{2\pi n}{N} - 1)}{h_y^2} + a_{2,2} \frac{2(\cos\frac{2\pi l}{L} - 1)}{h_z^2} \\ &- (a_{0,1} + a_{1,0}) \frac{\sin\frac{2\pi m}{M} \sin\frac{2\pi n}{N}}{h_x h_y} - (a_{1,2} + a_{2,1}) \frac{\sin\frac{2\pi n}{N} \sin\frac{2\pi l}{L}}{h_y h_z} - (a_{0,2} + a_{2,0}) \frac{\sin\frac{2\pi l}{L} \sin\frac{2\pi m}{M}}{h_z h_x} \\ &+ \iota \Big( b_0 \frac{\sin\frac{2\pi m}{M}}{h_x} + b_1 \frac{\sin\frac{2\pi n}{N}}{h_y} + b_2 \frac{\sin\frac{2\pi l}{L}}{h_z} \Big) + c, \end{aligned} \tag{19}$$

we obtain the algebraic equations in the frequency domain,

$$\hat{u}_{m,n,l}\, \lambda_{m,n,l} = \hat{f}_{m,n,l}.$$

From the coefficients $\hat{u}_{m,n,l}$, we obtain $u_{i,j,k}$ by the inverse discrete Fourier transform (IDFT), which solves the constant coefficient elliptic equation. The algorithm is described in Alg. 2 in Appendix C. The FFT for solving the constant coefficient elliptic PDE can be efficiently computed on GPUs. Moreover, Alg. 2 solves constant coefficient elliptic PDEs with a periodic boundary condition, and it can be generalized to solve the same type of PDE with the Neumann boundary condition by extending the PDE to the flat torus $\mathbb{T}^3$ using the mirror reflection Eqn. (14).
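The frequency-domain solve described above can be sketched with NumPy on the CPU. This is a minimal illustration, not the paper's CUDA implementation of Alg. 2. The names `symbol`, `solve_periodic` and `apply_operator` are assumptions made for this example. The first order symbol is taken as $\iota \sin(\theta)/h$, which is what the central difference yields under the DFT convention of Eqn. (16) and Eqn. (17). Modes annihilated by the operator (the constant mode when $c = 0$) are set to zero, which fixes the additive constant of the solution; the text does not say how Alg. 2 treats that mode.

```python
import numpy as np


def symbol(shape, a, b=(0.0, 0.0, 0.0), c=0.0):
    """Eigenvalues lambda_{m,n,l} of the central-difference operator on [-1, 1]^3."""
    h = [2.0 / s for s in shape]  # (h_x, h_y, h_z) = (2/M, 2/N, 2/L)
    theta = np.meshgrid(*[2.0 * np.pi * np.arange(s) / s for s in shape],
                        indexing="ij")  # theta[p] holds 2*pi*m/M along axis p
    lam = np.full(shape, c, dtype=complex)
    for p in range(3):
        lam += a[p][p] * 2.0 * (np.cos(theta[p]) - 1.0) / h[p] ** 2
        lam += 1j * b[p] * np.sin(theta[p]) / h[p]
        for q in range(p + 1, 3):
            lam -= ((a[p][q] + a[q][p]) * np.sin(theta[p]) * np.sin(theta[q])
                    / (h[p] * h[q]))
    return lam


def solve_periodic(f, a, b=(0.0, 0.0, 0.0), c=0.0):
    """Solve L[u] = f with periodic boundary conditions via u_hat = f_hat / lambda."""
    lam = symbol(f.shape, a, b, c)
    f_hat = np.fft.fftn(f)
    u_hat = np.zeros_like(f_hat)
    ok = np.abs(lam) > 1e-12  # skip modes the operator annihilates
    u_hat[ok] = f_hat[ok] / lam[ok]
    return np.fft.ifftn(u_hat).real


def apply_operator(u, a, b=(0.0, 0.0, 0.0), c=0.0):
    """Reference stencil: apply the discretized L directly with periodic wrap-around."""
    h = [2.0 / s for s in u.shape]
    out = c * u
    for p in range(3):
        up, um = np.roll(u, -1, axis=p), np.roll(u, 1, axis=p)  # u_{i+1}, u_{i-1}
        out = out + a[p][p] * (up + um - 2.0 * u) / h[p] ** 2
        out = out + b[p] * (up - um) / (2.0 * h[p])
        for q in range(p + 1, 3):
            mixed = (np.roll(up, -1, axis=q) + np.roll(um, 1, axis=q)
                     - np.roll(up, 1, axis=q) - np.roll(um, -1, axis=q))
            out = out + (a[p][q] + a[q][p]) * mixed / (4.0 * h[p] * h[q])
    return out
```

A round trip such as `solve_periodic(apply_operator(u, a), a)` should recover a zero-mean grid function `u` up to rounding, which is a convenient check that the symbol and the stencil agree. On a GPU the same structure applies with the two FFT calls replaced by a GPU FFT library.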

4. EXPERIMENTAL RESULTS

In this section, we first show that our proposed FFT-OT algorithm converges linearly and runs 100× faster than the conventional convex geometry based solver (Levy, 2015), then demonstrate the method in two applications: 3D adaptive sampling and a volumetric magnifier. All the algorithms are developed in generic C++ with the CUDA Toolkit. All the experiments are conducted on a Windows laptop with an Intel Core i7-7700HQ CPU, 16 GB of memory and an NVIDIA GeForce GTX 1060 graphics card. More experiments can be found in Appendix D.

4.1. RUNNING TIME AND CONVERGENCE ANALYSIS

To show the performance of the proposed method, we experiment on density functions defined by Gaussian mixture models. To be specific, the domain is a cube $\Omega = [0, 1]^3$, and the 3-dimensional density function defined on $\Omega$ is set to be

$$f(x) = \sum_{i=1}^{30} p_i\, \mathcal{N}(\mu_i, \Sigma_i),$$

where $\mathcal{N}(\mu_i, \Sigma_i)$ represents the Gaussian distribution with mean $\mu_i$ and covariance $\Sigma_i = \mathrm{diag}(\sigma_{i0}^2, \sigma_{i1}^2, \sigma_{i2}^2)$. Here $\mu_i \in \mathbb{R}^3$ is uniformly sampled from $[0, 1]^3$, $\sigma_{ij}$ is uniformly sampled from $[0, 0.5]$, and $p_i \in \mathbb{R}$ is uniformly sampled from $[0.2, 1]$ and normalized such that $\int_\Omega f(x)\,dx = 1$. Thus the source distribution $\mu$ is a complicated Gaussian mixture distribution restricted to $\Omega$. Then by the mirror reflection in Sec. 3.2, we obtain a density function which is defined on $[-1, 1]^3$ and satisfies the periodic boundary condition.

We directly use the FFT-OT algorithm Alg. 1 to solve the linearized Monge-Ampère equation. With the approximation error threshold $\varepsilon = 1.0 \times 10^{-6}$ and the resolution $256 \times 256 \times 256$, the running time of our FFT-OT algorithm with double precision on the GPU is less than 175 seconds. The conventional convex geometry based algorithm for 3D optimal transportation (Levy, 2015) can neither handle such large data sets nor be implemented on GPUs. On our system it can only compute OT maps with resolution no greater than $100 \times 100 \times 100$, which takes about 2700 seconds. For a problem with $128 \times 128 \times 128$ resolution, our FFT-OT takes about 20.3 seconds, which is 130× faster than the power diagram based method (Levy, 2015).

Fig. 1 shows the approximation error for the above Gaussian mixture density with respect to iterations, namely $\log \|f - \rho_n\|_2^2$. Our algorithm does converge linearly, and the result is consistent with the prediction Eqn. (13) in Thm. 3. Therefore, this experiment supports the theorem empirically.
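A linear convergence rate as in Eqn. (13) means that the logarithm of the error falls on a straight line against the iteration index, and the slope of that line gives $\log \gamma$. The following minimal Python sketch estimates $\gamma$ by least squares. The function name `estimate_rate` is an illustrative assumption, and the error sequence in the example is synthetic, not the measurements behind Fig. 1.

```python
import math


def estimate_rate(errors):
    """Fit log(error_n) = log(C) + n * log(gamma) by least squares; return gamma."""
    n = len(errors)
    xs = range(n)
    ys = [math.log(e) for e in errors]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return math.exp(slope)


# Synthetic errors C * gamma^n with C = 3 and gamma = 0.8 (assumed values).
synthetic_errors = [3.0 * 0.8 ** k for k in range(20)]
gamma_hat = estimate_rate(synthetic_errors)
```

Applied to a recorded sequence of $\|f - \rho_n\|_2^2$, a fitted value strictly below 1 with small residuals would be the numerical signature of the linear rate stated in Thm. 3.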

4.2. 3D ADAPTIVE SAMPLING

Generating random samples matching a given density function plays an essential role in applications like Monte-Carlo integration or stippling. Efficiently obtaining high quality samples is still an ongoing research topic (Bauer et al., 2015; Perrier et al., 2018), and optimal transportation has been successfully applied to generating high quality 2D samples (de Goes et al., 2012; Nader & Guennebaud, 2018). Most of the current research focuses on generating 2D samples fitting the given density function. Here we apply the proposed 3D FFT-OT method to generate high quality 3D samples according to given complex density functions. To the best of our knowledge, this is the first work that uses OT to sample from 3D density functions.

Suppose the source probability distribution $d\mu(x) = f(x)\,dx$ is defined on $\Omega = [0, 1]^3$ with $\mu(\Omega) = 1$. The target distribution $d\nu(y) = dy$ is the uniform distribution. We use the FFT-OT algorithm Alg. 1 to compute the OT map $T : \Omega \to \Omega$, $T_\#\mu = \nu$. The domain is tessellated to a $256 \times 256 \times 256$ grid. For each $x_{ijk}$, $i, j, k \in \{0, 1, \ldots, 255\}$, the image $T(x_{ijk})$ can be obtained. We use $\{T(x_{ijk})\}$ as vertices to compute the Delaunay triangulation of $\Omega$. Representing the OT map $T : (\Omega, \mu) \to (\Omega, \nu)$ as a piecewise linear map, the restriction of $T$ to each tetrahedron is a linear map. Then the inverse OT map $T^{-1} : (\Omega, \nu) \to (\Omega, \mu)$ is also a piecewise linear map. Namely, given a grid point $y_{mnl}$, we can find a tetrahedron containing it. Suppose the vertices of the tetrahedron are $\{T(x_i), T(x_j), T(x_k), T(x_l)\}$; then $y_{mnl}$ is computed as

$$y_{mnl} = \lambda_i T(x_i) + \lambda_j T(x_j) + \lambda_k T(x_k) + \lambda_l T(x_l),$$

where the non-negative barycentric coordinates satisfy $\lambda_i + \lambda_j + \lambda_k + \lambda_l = 1$. Then the image under the inverse OT map is given by

$$T^{-1}(y_{mnl}) = \lambda_i x_i + \lambda_j x_j + \lambda_k x_k + \lambda_l x_l. \tag{20}$$

We generate random samples $\{y_k\}$ according to the uniform distribution $\nu$ on $\Omega$; their images $\{T^{-1}(y_k)\}$ are then the desired random samples following the distribution $\mu$.

In our experiment, we use the same Gaussian mixture settings for the density function as in Sec. 4.1. To quantitatively compare the sampling results, we use the Chi-square goodness-of-fit test, which first groups the data and then computes the $L^2$ norm of the difference between the actual number of observations in each group and the expected number of observations. In our experiment, we set the group number to $64 \times 64 \times 64$ and use 500K samples to make the comparison. The corresponding $L^2$ norm of each method is shown in the top-right of the corresponding figure. We can see that both sampling strategies of our method give smaller scores than the classical ones.
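The per-tetrahedron evaluation of the inverse map can be sketched as follows. This is a minimal Python illustration, assuming the tetrahedron containing the query point has already been located, for example by a point-location query on the Delaunay triangulation, which is not shown. The names `det3`, `barycentric` and `inverse_map` are hypothetical helpers, not functions from the paper's code.

```python
def det3(c0, c1, c2):
    """Determinant of the 3x3 matrix whose columns are c0, c1, c2."""
    return (c0[0] * (c1[1] * c2[2] - c1[2] * c2[1])
            - c1[0] * (c0[1] * c2[2] - c0[2] * c2[1])
            + c2[0] * (c0[1] * c1[2] - c0[2] * c1[1]))


def barycentric(y, verts):
    """Barycentric coordinates of y in the tetrahedron with vertices verts (4 points).

    Solves [v0 - v3, v1 - v3, v2 - v3] * (l0, l1, l2)^T = y - v3 by Cramer's rule.
    """
    v0, v1, v2, v3 = verts
    e = [[v[r] - v3[r] for r in range(3)] for v in (v0, v1, v2)]
    d = [y[r] - v3[r] for r in range(3)]
    vol = det3(e[0], e[1], e[2])  # nonzero for a non-degenerate tetrahedron
    l0 = det3(d, e[1], e[2]) / vol
    l1 = det3(e[0], d, e[2]) / vol
    l2 = det3(e[0], e[1], d) / vol
    return (l0, l1, l2, 1.0 - l0 - l1 - l2)


def inverse_map(y, images, sources):
    """T^{-1}(y) on one tetrahedron: weights computed in the image, applied to the sources."""
    lam = barycentric(y, images)
    return tuple(sum(l * x[r] for l, x in zip(lam, sources)) for r in range(3))


# Toy tetrahedron (assumed values): T stretches the x-axis by a factor of 2.
sources = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
images = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
x_back = inverse_map((0.5, 0.25, 0.25), images, sources)
```

In the toy example, the point $(0.5, 0.25, 0.25)$ has barycentric weights $(0.25, 0.25, 0.25, 0.25)$ in the image tetrahedron and is pulled back to $(0.25, 0.25, 0.25)$, as expected for a map that doubles the $x$-coordinate.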

4.3. VOLUMETRIC MAGNIFIER

In reality, physical magnifiers can only magnify planar images. In medical image processing, it is highly desirable to magnify certain regions of 3D MRI or CT images. Our algorithm can address such requests with a user prescribed region of interest (ROI) and magnifying factor. Suppose the ROI is a symmetric region with the center $(\bar{x}, \bar{y}, \bar{z}) \in \Omega$ and the radii $\sigma_x, \sigma_y, \sigma_z$ in different directions. The density function $f$ of the source measure $\mu$ is defined as

$$f(x, y, z) = 0.5 + 0.5\, e^{-\left( \frac{(x - \bar{x})^2}{2\sigma_x^2} + \frac{(y - \bar{y})^2}{2\sigma_y^2} + \frac{(z - \bar{z})^2}{2\sigma_z^2} \right)}.$$

We compute the OT map $T : (\Omega, \mu) \to (\Omega, \nu)$, where $\nu$ is the uniform distribution. Similar to the method in 3D adaptive sampling, we compute the Delaunay triangulation of the images $\{T(x_{ijk})\}$, so that the OT map $T$ is represented as a piecewise linear map. The inverse optimal transportation map $T^{-1}$ is obtained in the same way as in Sec. 4.2.

Fig. 3 demonstrates our volumetric magnifier by magnifying an aneurysm on a blood vessel (Hansen & Johnson, 2004). We choose the aneurysm region as the ROI. The first column gives a snapshot of the blood vessel, and the yellow circle denotes the location of the aneurysm. The last three columns show the magnified aneurysm with different magnifying ratios from the same viewpoints. Moreover, we show the magnified volumetric knee from different viewpoints with different ROIs, denoted by the blue boxes in Fig. 4. Our method only magnifies the ROIs and keeps other regions unchanged. Compared with the traditional method, which requires tedious zooming in and out, our method only magnifies the ROI and keeps the whole subject in the field of view, which enables doctors to visualize the overall anatomy while scrutinizing detailed anatomical structure at the same time.
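The source density of the magnifier is simple to evaluate pointwise. The following minimal Python sketch implements the formula above. The function name `magnifier_density` and its argument layout are illustrative assumptions. Any rescaling needed to satisfy the balance condition is not specified in the text and is left out here.

```python
import math


def magnifier_density(x, y, z, center, radii):
    """f = 0.5 + 0.5 * exp(-q), with q the anisotropic squared distance to the ROI center."""
    cx, cy, cz = center
    sx, sy, sz = radii
    q = ((x - cx) ** 2 / (2.0 * sx ** 2)
         + (y - cy) ** 2 / (2.0 * sy ** 2)
         + (z - cz) ** 2 / (2.0 * sz ** 2))
    return 0.5 + 0.5 * math.exp(-q)


# Assumed ROI for illustration: centered at the origin with radius 0.1 per axis.
roi_center = (0.0, 0.0, 0.0)
roi_radii = (0.1, 0.1, 0.1)
f_center = magnifier_density(0.0, 0.0, 0.0, roi_center, roi_radii)
f_far = magnifier_density(0.9, 0.9, 0.9, roi_center, roi_radii)
```

The density takes its maximum value 1 at the ROI center and decays to 0.5 away from it. Since $T$ pushes $\mu$ forward to the uniform measure, regions of higher source density occupy proportionally more volume in the image, which is the magnification effect.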

5. CONCLUSION

In this paper, we propose the FFT-OT method to solve the optimal transportation problem. According to the Brenier theory, under the quadratic distance cost, finding the solution to the OT problem is equivalent to solving the Monge-Ampère equation, which can be linearized as a sequence of variant coefficient elliptic PDEs. The variant coefficient PDEs are then approximated by constant coefficient PDEs and solved by the fast Fourier transform. We also prove that the proposed method converges linearly. Experiments on volumetric data show that FFT-OT can be used to sample from complex 3D density functions and to magnify volumetric data in medical images.

A RELATED WORK

There is a huge literature on optimal transportation. Here we only briefly review the most closely related works. For detailed reviews, we refer readers to Santambrogio (2015) and Peyré & Cuturi (2019).

The first type of algorithm is based on the Kantorovich theory. When both the input and output domains are Dirac masses, the Kantorovich problem can be treated as a standard linear programming (LP) task. In order to tackle large data sets, Cuturi (2013) 2017) also improve the efficiency by applying the Newton's method. When both the source and target measures are continuous, some interpolation methods are necessary (Schwartzburg et al., 2014). The major drawback of this type of algorithm is the high computational complexity of constructing the dynamic power diagram, which prevents it from handling high dimensional tasks. For example, for 3D OT problems, these algorithms usually run very slowly.

The third type of algorithm is based on computational fluid dynamics (Benamou et al., 2002; Papadakis et al., 2014). These methods aim at finding a special temporal-spatial flow field that transports the initial source density to the target density with the minimal total kinetic energy. Then the diffeomorphism induced by the flow gives the optimal transport map under the quadratic Euclidean distance cost. However, algorithms of this kind are difficult to extend to high dimensional spaces.

The fourth type of algorithm directly solves the Monge-Ampère equation using numerical methods. Loeper & Rapetti (2005) propose to solve the linearized Monge-Ampère equation defined on a flat torus in each iteration. Its corresponding variant coefficient elliptic PDE is converted to a positive definite linear system using the finite-difference scheme, which can be solved by the BiCG algorithm (Endre, 2020). Benamou et al. (2014) propose to solve the linearized Monge-Ampère equation on more general domains using Newton's method.
Nader and Guennebaud Nader & Guennebaud (2018) (2013) can handle more general target measures, the proposed work currently only deals with the Lebesgue target measure. Nevertheless, the current work can be directly generalized to handle general target measures as well. Lei & Gu (2021) use the fixed point method to compute the 2-dimensional OT problem based on FFT, but it cannot be extended to solve 3-dimensional problems.

In this work, we combine the idea of linearizing the Monge-Ampère equation (Loeper & Rapetti, 2005) and the idea of FFT (Saumier et al., 2013). The key novelty of our proposed method is to use the mean linearized Monge-Ampère operator Eqn. (12) to replace the conventional linearized Monge-Ampère operator Eqn. (10). This replacement allows the algorithm to be implemented on GPUs and makes the algorithm hundreds of times faster. In the following, we compute the 3-dimensional optimal transport problem by applying the proposed algorithm. Our method also runs more than 100× faster than the convex geometry based method (Levy, 2015).

B APPENDIX THEORY

In this section, we give the detailed proofs for several lemmas and theorems. Some of them are well known in the Monge-Ampère PDE field and in applied mathematics; we include them for completeness.

B.1 EXISTENCE OF THE SOLUTION TO THE TIME DEPENDENT MONGE-AMPÈRE EQUATION

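Let $\mathbb{T}^n = \mathbb{R}^n / \mathbb{Z}^n$ be the $n$-dimensional flat torus. Below we sometimes identify it with $\Omega = [0, 1]^n$ and assume all data are periodic. The existence and regularity of solutions to the Monge-Ampère equation are given by the following theorem.

Theorem 5. Suppose a positive density function $f : \Omega \to \mathbb{R}$ is defined on $\Omega = [0, 1]^n$, such that $\int_\Omega f(x)\,dx = 1$ and $f \in C^\alpha(\Omega)$. Then the solution $u : \Omega \times [0, 1] \to \mathbb{R}$ to the time-dependent Monge-Ampère equation

$$\det D_x^2 u(x, t) = (1 - t) + t f(x), \quad \nabla_x u(x, t)(\Omega) = \Omega \tag{21}$$

exists and is unique up to a constant. Furthermore, there exist constants $0 < \lambda < \Lambda$ such that

$$\lambda \sum_{p=1}^{n} \xi_p^2 \le \sum_{p,q=1}^{n} u^{pq}(x, t)\, \xi_p \xi_q \le \Lambda \sum_{p=1}^{n} \xi_p^2, \quad \forall \xi \in \mathbb{R}^n,\ \forall (x, t) \in \Omega \times [0, 1]. \tag{22}$$

We refer readers to Cordero-Erausquin (1999) for a detailed proof.

Weak Solution

In practice, we compute the weak solution of the linearized Monge-Ampère equation Eqn. (6) using numerical methods. We first rewrite the differential operator in divergence form, then define a bilinear form. Since $(u^{pq}(x, t))$ is the adjoint matrix of $D_x^2 u(x, t)$, by direct computation we obtain

$$\sum_{p=1}^{n} \partial_p u^{pq}(x, t) = 0, \quad \forall (x, t) \in \Omega \times [0, 1],\ \forall q = 1, \ldots, n.$$

Hence

$$\sum_{p=1}^{n} \partial_p \Big( \sum_{q=1}^{n} u^{pq}\, \partial_q v \Big) = \sum_{p,q=1}^{n} u^{pq}\, \partial_p \partial_q v + \sum_{q=1}^{n} \Big( \sum_{p=1}^{n} \partial_p u^{pq} \Big) \partial_q v = \sum_{p,q=1}^{n} u^{pq}\, \partial_p \partial_q v,$$

so Eqn. (6) can be converted into the divergence form

$$\sum_{p=1}^{n} \partial_p \Big( \sum_{q=1}^{n} u^{pq}(x, t)\, \partial_q v(x, t) \Big) = f(x) - 1, \tag{24}$$

with the Neumann boundary condition

$$\frac{\partial v(x, t)}{\partial n} = 0, \quad \forall (x, t) \in \partial\Omega \times [0, 1].$$

For any $w \in H^1(\Omega)$, by the product rule, we obtain

$$\sum_{p=1}^{n} \partial_p \Big( \sum_{q=1}^{n} u^{pq}\, \partial_q v \Big) w + \sum_{p=1}^{n} \Big( \sum_{q=1}^{n} u^{pq}\, \partial_q v \Big) \partial_p w = \sum_{p=1}^{n} \partial_p \Big( \Big( \sum_{q=1}^{n} u^{pq}\, \partial_q v \Big) w \Big).$$

By integrating both sides, applying the divergence theorem, and using the fact that $v$ satisfies the Neumann boundary condition, we deduce

$$\int_\Omega \sum_{p=1}^{n} \partial_p \Big( \sum_{q=1}^{n} u^{pq}\, \partial_q v \Big) w\,dx + \int_\Omega \sum_{p,q=1}^{n} u^{pq}\, \partial_q v\, \partial_p w\,dx = \int_{\partial\Omega} \sum_{p=1}^{n} \Big( \sum_{q=1}^{n} u^{pq}\, \partial_q v \Big) n_p\, w\,ds = 0,$$

where $n_p$ denotes the $p$-th component of the outer normal. For any fixed time $t \in [0, 1]$, by the divergence form, we can construct a bilinear form $a : H^1(\Omega) \times H^1(\Omega) \to \mathbb{R}$ and a linear form $l : H^1(\Omega) \to \mathbb{R}$,

$$a(v, w) = \sum_{p,q=1}^{n} \int_\Omega u^{pq}\, \partial_p v\, \partial_q w\,dx, \qquad l(w) = -\int_\Omega (f - 1)\, w\,dx.$$

A weak solution to Eqn. (24) is a function $v \in H^1(\Omega)$ such that

$$a(v, w) = l(w), \quad \forall w \in H^1(\Omega). \tag{28}$$

By the uniform ellipticity Eqn. (22), the Lax-Milgram theorem (Endre, 2020) shows the existence of the weak solution.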

B.2 DISCRETE LINEARIZED MONGE-AMPÈRE EQUATION SOLVABILITY

Galerkin Method. In practice, we construct a triangulation $\mathcal{T}$ of $\Omega$, such that the ratio between the diameter and the inscribed-sphere radius of each simplex is bounded, and the variation of the diameters of all the simplexes is small. We call such a $\mathcal{T}$ a quasi-uniform triangulation, and denote the largest diameter by $h$. For each vertex $v_i \in \mathcal{T}$, we construct a piecewise linear basis function $\varphi_i$, such that $\varphi_i$ is linear on each simplex and $\varphi_i(v_j) = \delta_{ij}$. We define a finite dimensional subspace $V_h \subset H^1(\Omega)$,

$$V_h := \Big\{ v_h(x) := \sum_{v_i \in \mathcal{T}} \lambda_i \varphi_i(x),\ \lambda_i \in \mathbb{R} \Big\}.$$

Given a function $u \in H^1(\Omega)$, we use $u_h \in V_h$ to denote its approximation in $V_h$. Furthermore, if $u_h = \sum_i \lambda_i \varphi_i$, we also use $u_h$ to represent the coefficient vector $(\lambda_1, \lambda_2, \dots, \lambda_N)^T$, depending on the context. Finding the weak solution Eqn. (28) to the Monge-Ampère equation (6) is equivalent to finding a $v \in H^1(\Omega)$, such that $a(v,w) = l(w)$ for all $w \in H^1(\Omega)$. In the discrete case, we want to find $v_h \in V_h$, such that

$$a(v_h, w_h) = l(w_h), \qquad \forall w_h \in V_h. \tag{29}$$

Eqn. (29) is equivalent to the linear system

$$\begin{pmatrix} a(\varphi_1,\varphi_1) & a(\varphi_2,\varphi_1) & \cdots & a(\varphi_N,\varphi_1) \\ a(\varphi_1,\varphi_2) & a(\varphi_2,\varphi_2) & \cdots & a(\varphi_N,\varphi_2) \\ \vdots & \vdots & & \vdots \\ a(\varphi_1,\varphi_N) & a(\varphi_2,\varphi_N) & \cdots & a(\varphi_N,\varphi_N) \end{pmatrix} \begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_N \end{pmatrix} = \begin{pmatrix} l(\varphi_1) \\ l(\varphi_2) \\ \vdots \\ l(\varphi_N) \end{pmatrix}. \tag{30}$$

From the weak solution to the linearized Monge-Ampère equation (10), we obtain the linear system Eqn. (30). We denote the stiffness matrix by $A = (a(\varphi_i, \varphi_j))$. By the uniform ellipticity Eqn. (22), and $V_h \subset H^1(\Omega)$,

$$a(v,v) \ge \lambda \|\nabla v\|^2_{L^2(\Omega)}.$$

Assume $\int_\Omega v\,dx = 0$. By the Poincaré inequality,

$$\|\nabla v\|^2_{L^2(\Omega)} \ge C_1(\Omega) \|v\|^2_{L^2(\Omega)}, \qquad \forall v \in H^1(\Omega),\ \int_\Omega v\,dx = 0,$$

where the constant $C_1(\Omega)$ depends on $\Omega$. Combining the above two inequalities, we obtain

$$a(v,v) \ge c \|v\|^2_{L^2(\Omega)}, \qquad \forall v \in H^1(\Omega),\ \int_\Omega v\,dx = 0. \tag{31}$$

Similarly, by the uniform ellipticity Eqn. (22), and $V_h \subset H^1(\Omega)$,

$$a(v,v) \le \Lambda \|\nabla v\|^2_{L^2(\Omega)}.$$

For linear finite elements and a quasi-uniform triangulation, we have the inverse Poincaré inequality,

$$\|\nabla v_h\|^2_{L^2} \le C_2(\Omega)\, h^{-1} \|v_h\|^2_{L^2},$$

where $h$ is the diameter of each element. Combining the above two inequalities, we obtain

$$a(v_h, v_h) \le C \|v_h\|^2_{L^2(\Omega)}, \qquad \forall v_h \in V_h. \tag{32}$$

By combining the inequalities Eqn. (31) and Eqn. (32), we obtain

$$\frac{1}{C_3} \|v_h\|^2_{L^2(\Omega)} \le a(v_h, v_h) \le C_3 \|v_h\|^2_{L^2(\Omega)}, \qquad \forall v_h \in V_h,\ \int_\Omega v_h = 0, \tag{33}$$

where $C_3 > 1$ is a constant. Suppose $v_h = \sum_{i=1}^{n} \xi_i \varphi_i$, then

$$\|v_h\|^2_{L^2(\Omega)} = \int_\Omega v_h^2\,dx = \sum_{i,j=1}^{n} \xi_i \xi_j \int_\Omega \varphi_i(x) \varphi_j(x)\,dx = \xi^T \Phi \xi,$$

where $\xi = (\xi_i)$ and the matrix $\Phi = \big( \int_\Omega \varphi_i \varphi_j \big)$ is positive definite. Therefore,

$$\frac{1}{C_4} \|\xi\|^2 \le \xi^T \Phi \xi \le C_4 \|\xi\|^2. \tag{34}$$

Since $a(v_h, v_h) = \xi^T A \xi$, combining the inequalities Eqn. (33) and Eqn. (34), we obtain

$$\frac{1}{C_3 C_4} \|\xi\|^2 \le \xi^T A \xi \le C_3 C_4 \|\xi\|^2, \qquad \forall \xi \in \mathbb{R}^n,\ \sum_{i=1}^{n} \xi_i = 0, \tag{35}$$

where $C_3 C_4 > 1$. This proves the following lemma.

Lemma 6. When the Galerkin method with linear elements is used to numerically approximate the weak solution Eqn. (28) to the linearized Monge-Ampère Eqn. (6), if the uniform ellipticity Eqn. (22) holds and the triangulation $\mathcal{T}$ is quasi-uniform, then the stiffness matrix of the linear system Eqn. (30) is positive definite on the space $\sum_{i=1}^{n} \xi_i = 0$,

$$\frac{1}{C_3 C_4} \|\xi\|^2 \le \xi^T A \xi \le C_3 C_4 \|\xi\|^2, \qquad \forall \xi \in \mathbb{R}^n,\ \sum_{i=1}^{n} \xi_i = 0,$$

where $C_3 C_4 > 1$.

Since the uniform ellipticity Eqn. (22) holds for any time $t \in [0,1]$, we obtain

Corollary 7. By using the Galerkin method with linear elements on quasi-uniform triangulations, the linearized Monge-Ampère equation in the continuity method Eqn. (6) always has a solution $v_h \in V_h$ for any $t \in [0,1]$.

Please note that the central difference scheme can be treated as Galerkin's method on a special uniform triangulation. Therefore, the above estimates still hold.
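The structure behind Lemma 6 can be observed on a toy problem. The following is a minimal sketch, not the paper's 3D discretization: it assumes a 1D interval, a uniform mesh, linear elements, and a piecewise-constant coefficient `coeff` that stands in for the adjoint matrix $(u^{pq})$ with bounds $\lambda = 0.5$ and $\Lambda = 2$. The helper name `stiffness_1d` is hypothetical. The assembled matrix annihilates the constant vector and is positive definite on the complement $\sum_i \xi_i = 0$.

```python
import numpy as np

def stiffness_1d(coeff, h):
    """P1 stiffness matrix of a(v, w) = int c v' w' on a uniform 1D mesh.

    coeff[e] is the positive coefficient on element e; no boundary rows are
    modified, which corresponds to the natural (Neumann) condition.
    """
    n_el = len(coeff)
    A = np.zeros((n_el + 1, n_el + 1))
    for e, c in enumerate(coeff):
        # Element matrix of a(phi_i, phi_j) restricted to one element.
        A[e:e + 2, e:e + 2] += (c / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])
    return A

n_el = 32
h = 1.0 / n_el
rng = np.random.default_rng(0)
coeff = rng.uniform(0.5, 2.0, size=n_el)   # toy ellipticity bounds (assumption)
A = stiffness_1d(coeff, h)

ones = np.ones(n_el + 1)
eigvals = np.linalg.eigvalsh(A)            # eigenvalues in ascending order
print("max |A @ 1| =", np.abs(A @ ones).max())
print("two smallest eigenvalues:", eigvals[:2])
```

Because `A` is symmetric and its null space is spanned by the constant vector, the zero-sum vectors form the orthogonal complement, so a strictly positive second eigenvalue is exactly the positive definiteness stated in the lemma.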

B.3 CONVERGENCE RATE

Theorem 8 (main). Given a domain $\Omega \subset \mathbb{R}^n$, which is a canonical cuboid $\Omega = [-1,1]^n$, and a positive density function $f : \Omega \to \mathbb{R}$ with the balance condition $\int_\Omega f(x)\,dx = \int_\Omega 1 \cdot dx$, suppose the mirror reflection extension Eqn. (14) of $f$ to the flat torus $f : \mathbb{T}^n \to \mathbb{R}$ is $C^\alpha$, $\alpha \in (0,1)$. Then the Monge-Ampère equation

$$\det D^2 u(x) = f(x), \qquad \nabla u(\Omega) = \Omega$$

can be solved using the FFT-OT algorithm Alg. (1). In particular, one can choose the step length parameter $\tau$ such that there is a constant $0 < \gamma < 1$ for which the approximation error satisfies

$$\|f - \rho_{k+1}\|^2 < C \gamma^k,$$

namely the algorithm has a linear convergence rate.

Proof. Suppose at the $(k+1)$-th iteration $\rho_{k+1} = \det(I + D^2 u_{k+1})$ and $\|v_k\| \sim O(\tau^{-1})$. Then

$$\begin{aligned} f - \rho_{k+1} &= f - \det(I + D^2 u_k + D^2 v_k) \\ &= f - \det(I + D^2 u_k) - \sum_{p,q} u_k^{pq}\, \partial_p \partial_q v_k + o(\tau^{-1}) \\ &= (f - \rho_k) - L_k[v_k] + o(\tau^{-1}), \end{aligned}$$

where $L_k[v_k] = \sum_{p,q} u_k^{pq}\, \partial_p \partial_q v_k$. Hence, by integration by parts Eqn. (27),

$$\begin{aligned} \|f - \rho_{k+1}\|^2_{L^2(\Omega)} &= \|f - \rho_k\|^2_{L^2(\Omega)} - 2 \int_\Omega L_k[v_k]\,(f - \rho_k) + o(\tau^{-1}) \\ &= \|f - \rho_k\|^2_{L^2(\Omega)} + 2 a_k(f - \rho_k, v_k) + o(\tau^{-1}), \end{aligned}$$

where $a_k$ is the bilinear form in Eqn. (27). In the discrete case, all functions are in $V_h$, and we denote

$$\|u_h\|^2_\Phi := \|u_h\|^2_{L^2(\Omega)} = u_h^T \Phi u_h, \qquad \|u_h\|^2 := u_h^T u_h, \qquad \|u_h\|^2_A := u_h^T A u_h.$$

By the inequalities Eqn. (34) and Eqn. (35),

$$\frac{1}{C_4} \|u_h\|^2 \le \|u_h\|^2_\Phi \le C_4 \|u_h\|^2, \qquad \frac{1}{C_3 C_4} \|u_h\|^2 \le \|u_h\|^2_A \le C_3 C_4 \|u_h\|^2.$$

Therefore

$$\|f_h - \rho_{h,k+1}\|^2_\Phi = \|f_h - \rho_{h,k}\|^2_\Phi - 2\tau^{-1} (f_h - \rho_{h,k})^T A_k \bar{A}_k^{-1} (f_h - \rho_{h,k}) + o(\tau^{-1}), \tag{37}$$

where $A_k$ is the stiffness matrix in Eqn. (30), and $\bar{A}_k$ is the mean stiffness matrix. (By the uniform ellipticity Eqn. (22), the eigenvalues of the adjoint matrix $(u^{pq})(x,t)$ are uniformly bounded away from zero in the space $H := \{\xi \in \mathbb{R}^n \mid \sum_i \xi_i = 0\}$, so the eigenvalues of the mean adjoint matrix $\bar{u}^{pq}(t)$ are bounded away from zero in $H$. After discretization, the eigenvalues of $\bar{A}_k$ are strictly positive in $H$, hence $\bar{A}_k$ is invertible in $H$.
In the following discussion, the term $o(\tau^{-1})$ will be ignored.) Note that the following expression is a scalar:

$$(f_h - \rho_{h,k})^T A_k \bar{A}_k^{-1} (f_h - \rho_{h,k}) = \operatorname{tr}\big( (f_h - \rho_{h,k})^T A_k \bar{A}_k^{-1} (f_h - \rho_{h,k}) \big).$$

Since $A_k$ and $\bar{A}_k$ are symmetric and positive definite on the space $\sum_i \xi_i = 0$, with $\|A_k\|_2 \le C_3 C_4$ and $\|\bar{A}_k\|_2 \le C_3 C_4$, so are their inverses. Since $A_k$ and $\bar{A}_k$ are symmetric and positive definite on the space orthogonal to $(1, 1, \dots, 1)^T$, by Eqn. (35) and $\|A_k \bar{A}_k^{-1}\| \le \|A_k\| \|\bar{A}_k^{-1}\|$, we have

$$\frac{n-1}{C_3^2 C_4^3} \|f_h - \rho_{h,k}\|^2_\Phi \le (f_h - \rho_{h,k})^T A_k \bar{A}_k^{-1} (f_h - \rho_{h,k}).$$

Plugging this into Eqn. (37), we have

$$\|f_h - \rho_{h,k+1}\|^2_\Phi \le \Big( 1 - \frac{1}{\tau} \frac{n-1}{C_3^2 C_4^3} \Big) \|f_h - \rho_{h,k}\|^2_\Phi \le \Big( 1 - \frac{1}{\tau} \frac{n-1}{C_3^2 C_4^3} \Big)^k \|f_h - \rho_{h,0}\|^2_\Phi.$$

We can choose the step length $\tau^{-1}$ such that $\gamma \in (0,1)$, where

$$\gamma = 1 - \frac{n-1}{\tau\, C_3^2 C_4^3}.$$

Therefore

$$\|f_h - \rho_{h,k+1}\|^2_\Phi \le \gamma^k \|f_h - \rho_{h,0}\|^2_\Phi \le C_4 \gamma^k \|f_h - \rho_{h,0}\|^2.$$

B.4 DIFFERENTIAL OPERATOR USING FFT

By using the discrete Fourier transformation, the differential operators can be converted to algebraic operators in the frequency domain.

Lemma 9. Suppose the discrete function is $u_{i,j,k}$, with discrete Fourier transformation

$$u_{i,j,k} = \frac{1}{MNL} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \sum_{l=0}^{L-1} \hat{u}_{m,n,l}\, e^{\iota \frac{2\pi m i}{M}} e^{\iota \frac{2\pi n j}{N}} e^{\iota \frac{2\pi l k}{L}},$$

where $\iota = \sqrt{-1}$. Then the differential operator $\partial_i \partial_i u_{i,j,k}$ using central differences is given by

$$\partial_i \partial_i u_{i,j,k} = \frac{1}{h_x^2} (u_{i+1,j,k} + u_{i-1,j,k} - 2u_{i,j,k}) = \frac{1}{MNL} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \sum_{l=0}^{L-1} \hat{u}_{m,n,l}\, \frac{2\big(\cos\frac{2\pi m}{M} - 1\big)}{h_x^2}\, e^{\iota \frac{2\pi m i}{M}} e^{\iota \frac{2\pi n j}{N}} e^{\iota \frac{2\pi l k}{L}},$$

and $\partial_i \partial_j u_{i,j,k}$ is given by

$$\partial_i \partial_j u_{i,j,k} = \frac{1}{4 h_x h_y} (u_{i+1,j+1,k} + u_{i-1,j-1,k} - u_{i+1,j-1,k} - u_{i-1,j+1,k}) = \frac{1}{MNL} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \sum_{l=0}^{L-1} \hat{u}_{m,n,l}\, \frac{-\sin\frac{2\pi m}{M} \sin\frac{2\pi n}{N}}{h_x h_y}\, e^{\iota \frac{2\pi m i}{M}} e^{\iota \frac{2\pi n j}{N}} e^{\iota \frac{2\pi l k}{L}}.$$

Proof.
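By the identities

$$\begin{aligned} \cos(A+\alpha) + \cos(A-\alpha) - 2\cos A &= (\cos A \cos\alpha - \sin A \sin\alpha) + (\cos A \cos\alpha + \sin A \sin\alpha) - 2\cos A \\ &= 2(\cos\alpha - 1)\cos A \end{aligned}$$

and

$$\begin{aligned} \sin(A+\alpha) + \sin(A-\alpha) - 2\sin A &= (\sin A \cos\alpha + \cos A \sin\alpha) + (\sin A \cos\alpha - \cos A \sin\alpha) - 2\sin A \\ &= 2(\cos\alpha - 1)\sin A, \end{aligned}$$

we obtain

$$\frac{1}{h_x^2} \Big[ e^{\iota \frac{2\pi m (i+1)}{M}} + e^{\iota \frac{2\pi m (i-1)}{M}} - 2 e^{\iota \frac{2\pi m i}{M}} \Big] = \frac{2\big(\cos\frac{2\pi m}{M} - 1\big)}{h_x^2}\, e^{\iota \frac{2\pi m i}{M}}.$$

By direct computation, we have

$$\begin{aligned} \partial_i \partial_i u_{i,j,k} &= \frac{1}{h_x^2} (u_{i+1,j,k} + u_{i-1,j,k} - 2u_{i,j,k}) \\ &= \frac{1}{MNL} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \sum_{l=0}^{L-1} \hat{u}_{m,n,l}\, \frac{e^{\iota \frac{2\pi m (i+1)}{M}} + e^{\iota \frac{2\pi m (i-1)}{M}} - 2 e^{\iota \frac{2\pi m i}{M}}}{h_x^2}\, e^{\iota \frac{2\pi n j}{N}} e^{\iota \frac{2\pi l k}{L}} \\ &= \frac{1}{MNL} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \sum_{l=0}^{L-1} \hat{u}_{m,n,l}\, \frac{2\big(\cos\frac{2\pi m}{M} - 1\big)}{h_x^2}\, e^{\iota \frac{2\pi m i}{M}} e^{\iota \frac{2\pi n j}{N}} e^{\iota \frac{2\pi l k}{L}}. \end{aligned}$$

Similarly, by the identities

$$\begin{aligned} &\cos(A+\alpha+B+\beta) + \cos(A-\alpha+B-\beta) - \cos(A+\alpha+B-\beta) - \cos(A-\alpha+B+\beta) \\ &= \cos(A+B+\alpha+\beta) + \cos(A+B-\alpha-\beta) - \cos(A+B+\alpha-\beta) - \cos(A+B-\alpha+\beta) \\ &= 2\cos(A+B)\cos(\alpha+\beta) - 2\cos(A+B)\cos(\alpha-\beta) \\ &= 2\cos(A+B)\big(\cos(\alpha+\beta) - \cos(\alpha-\beta)\big) \\ &= 2\cos(A+B)(\cos\alpha\cos\beta - \sin\alpha\sin\beta - \cos\alpha\cos\beta - \sin\alpha\sin\beta) \\ &= -4\cos(A+B)\sin\alpha\sin\beta \end{aligned}$$

and

$$\begin{aligned} &\sin(A+\alpha+B+\beta) + \sin(A-\alpha+B-\beta) - \sin(A+\alpha+B-\beta) - \sin(A-\alpha+B+\beta) \\ &= \sin(A+B+\alpha+\beta) + \sin(A+B-\alpha-\beta) - \sin(A+B+\alpha-\beta) - \sin(A+B-\alpha+\beta) \\ &= 2\sin(A+B)\cos(\alpha+\beta) - 2\sin(A+B)\cos(\alpha-\beta) \\ &= 2\sin(A+B)\big(\cos(\alpha+\beta) - \cos(\alpha-\beta)\big) \\ &= 2\sin(A+B)(\cos\alpha\cos\beta - \sin\alpha\sin\beta - \cos\alpha\cos\beta - \sin\alpha\sin\beta) \\ &= -4\sin(A+B)\sin\alpha\sin\beta, \end{aligned}$$

we deduce the following equation,

$$\partial_i \partial_j u_{i,j,k} = \frac{1}{4 h_x h_y} (u_{i+1,j+1,k} + u_{i-1,j-1,k} - u_{i+1,j-1,k} - u_{i-1,j+1,k}) = \frac{1}{MNL} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \sum_{l=0}^{L-1} \hat{u}_{m,n,l}\, \frac{-\sin\frac{2\pi m}{M} \sin\frac{2\pi n}{N}}{h_x h_y}\, e^{\iota \frac{2\pi m i}{M}} e^{\iota \frac{2\pi n j}{N}} e^{\iota \frac{2\pi l k}{L}}.$$

Similarly, we have the representations of the other differential operators in the frequency domain,

$$\partial_j \partial_j u_{i,j,k} = \frac{1}{h_y^2} (u_{i,j+1,k} + u_{i,j-1,k} - 2u_{i,j,k}) = \frac{1}{MNL} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \sum_{l=0}^{L-1} \hat{u}_{m,n,l}\, \frac{2\big(\cos\frac{2\pi n}{N} - 1\big)}{h_y^2}\, e^{\iota \frac{2\pi m i}{M}} e^{\iota \frac{2\pi n j}{N}} e^{\iota \frac{2\pi l k}{L}},$$

$$\partial_k \partial_k u_{i,j,k} = \frac{1}{h_z^2} (u_{i,j,k+1} + u_{i,j,k-1} - 2u_{i,j,k}) = \frac{1}{MNL} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \sum_{l=0}^{L-1} \hat{u}_{m,n,l}\, \frac{2\big(\cos\frac{2\pi l}{L} - 1\big)}{h_z^2}\, e^{\iota \frac{2\pi m i}{M}} e^{\iota \frac{2\pi n j}{N}} e^{\iota \frac{2\pi l k}{L}},$$

$$\partial_j \partial_k u_{i,j,k} = \frac{1}{4 h_y h_z} (u_{i,j+1,k+1} + u_{i,j-1,k-1} - u_{i,j+1,k-1} - u_{i,j-1,k+1}) = \frac{1}{MNL} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \sum_{l=0}^{L-1} \hat{u}_{m,n,l}\, \frac{-\sin\frac{2\pi n}{N} \sin\frac{2\pi l}{L}}{h_y h_z}\, e^{\iota \frac{2\pi m i}{M}} e^{\iota \frac{2\pi n j}{N}} e^{\iota \frac{2\pi l k}{L}},$$

$$\partial_k \partial_i u_{i,j,k} = \frac{1}{4 h_z h_x} (u_{i+1,j,k+1} + u_{i-1,j,k-1} - u_{i+1,j,k-1} - u_{i-1,j,k+1}) = \frac{1}{MNL} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \sum_{l=0}^{L-1} \hat{u}_{m,n,l}\, \frac{-\sin\frac{2\pi l}{L} \sin\frac{2\pi m}{M}}{h_z h_x}\, e^{\iota \frac{2\pi m i}{M}} e^{\iota \frac{2\pi n j}{N}} e^{\iota \frac{2\pi l k}{L}}.$$

C ALGORITHM PIPELINES

In this section, we give the algorithm pipeline of the FFT-OT in Alg. 1 and the details of solving the constant coefficient elliptic PDE through FFT in Alg. 2.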
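The frequency-domain multipliers of Lemma 9, which the FFT solver relies on, can be checked numerically. The sketch below is an illustration under stated assumptions: a small periodic grid with arbitrary sizes, grid spacings `hx`, `hy`, `hz` chosen as $2/M$, $2/N$, $2/L$ for the cube $[-1,1]^3$, and NumPy's `fftn`/`ifftn`, whose inverse transform carries the $1/(MNL)$ factor used in the lemma.

```python
import numpy as np

M, N, L = 8, 6, 10
hx, hy, hz = 2.0 / M, 2.0 / N, 2.0 / L   # assumed spacings on [-1, 1]^3
rng = np.random.default_rng(0)
u = rng.standard_normal((M, N, L))

# Central differences with periodic wrap-around.
# np.roll(u, -1, axis) places u_{i+1} at index i.
d_ii = (np.roll(u, -1, 0) + np.roll(u, 1, 0) - 2 * u) / hx**2
d_ij = (np.roll(u, (-1, -1), (0, 1)) + np.roll(u, (1, 1), (0, 1))
        - np.roll(u, (-1, 1), (0, 1)) - np.roll(u, (1, -1), (0, 1))) / (4 * hx * hy)

# Frequency-domain multipliers from Lemma 9.
m = np.arange(M)[:, None, None]
n = np.arange(N)[None, :, None]
sym_ii = 2 * (np.cos(2 * np.pi * m / M) - 1) / hx**2
sym_ij = -np.sin(2 * np.pi * m / M) * np.sin(2 * np.pi * n / N) / (hx * hy)

# Differentiate by multiplying the spectrum and transforming back.
u_hat = np.fft.fftn(u)
d_ii_fft = np.fft.ifftn(sym_ii * u_hat).real
d_ij_fft = np.fft.ifftn(sym_ij * u_hat).real

print("max error, d_ii:", np.abs(d_ii - d_ii_fft).max())
print("max error, d_ij:", np.abs(d_ij - d_ij_fft).max())
```

Both errors should be at the level of floating-point round-off, since the multipliers are exact for the central difference stencils on a periodic grid.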

D APPENDIX EXPERIMENTS

In this section, as a complement to the experiments in the main paper, we give more results on the 3D adaptive sampling and the volumetric magnifier.

D.1 MORE RESULTS ON 3D ADAPTIVE SAMPLING

In the experiments, we set the density function $f(x) = \sum_{i=1}^{30} p_i\, \mathcal{N}(\mu_i, \Sigma_i)$, where $\mathcal{N}(\mu_i, \Sigma_i)$ represents the Gaussian distribution with mean $\mu_i$ and covariance $\Sigma_i = \mathrm{diag}(\sigma_{i0}^2, \sigma_{i1}^2, \sigma_{i2}^2)$. Here $\mu_i \in \mathbb{R}^3$ is uniformly sampled from $[0,1]^3$, $\sigma_{ij}$ is uniformly sampled from $[0, 0.5]$, and $p_i \in \mathbb{R}$ is uniformly sampled from $[0.2, 1]$ and normalized such that $\int_\Omega f(x)\,dx = 1$. Thus the source distribution $\mu$ is a complicated Gaussian mixture distribution restricted to $\Omega = [0,1]^3$. After computing the OT map, in order to keep consistency with the mirror reflection process in the FFT-OT algorithm, we also reflect the generated samples back to $\Omega$. To visualize the results of the $k$th slice, we plot the samples whose $z$ coordinates satisfy the inequality $k/128 - 1/256 \le z \le k/128 + 1/256$. In Fig. 5 and Fig. 6, we give more sampling results of different slices corresponding to the two models used in Fig. 2 in the main paper. Fig. 5 visualizes the density function restricted to the 40th, 56th, 72nd and 80th slices for different methods, for the model displayed in the first row of Fig. 2. Fig. 6 visualizes the density function restricted to the 56th, 64th, 80th and 88th slices for different methods, for the model displayed in the second row of Fig. 2. Compared with the classical methods, both sampling strategies of our method give decent sampling results that fit the prescribed density function well. Moreover, the number of generated samples for different slices of the same 3D model fits the density functions restricted to the corresponding slices well, namely more samples are generated in the brighter regions for different slices.
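The density construction and the slice selection rule described above can be sketched as follows. This is an illustrative setup, not the authors' code: the function names, the grid resolution, the random seed, and the small floor `sigma_floor` (added only to avoid a division by a near-zero standard deviation) are all assumptions.

```python
import numpy as np

def gaussian_mixture_density(grid_n=32, n_comp=30, seed=0, sigma_floor=1e-2):
    """Sample a random diagonal Gaussian mixture on cell centers of [0, 1]^3."""
    rng = np.random.default_rng(seed)
    mu = rng.uniform(0.0, 1.0, size=(n_comp, 3))
    # sigma_floor is an assumption of this sketch, not part of the paper.
    sigma = np.maximum(rng.uniform(0.0, 0.5, size=(n_comp, 3)), sigma_floor)
    p = rng.uniform(0.2, 1.0, size=n_comp)

    c = (np.arange(grid_n) + 0.5) / grid_n          # cell centers along one axis
    f = np.zeros((grid_n, grid_n, grid_n))
    for k in range(n_comp):
        # A diagonal covariance makes each component separable per axis.
        gx, gy, gz = (np.exp(-0.5 * ((c - mu[k, a]) / sigma[k, a]) ** 2)
                      / (np.sqrt(2 * np.pi) * sigma[k, a]) for a in range(3))
        f += p[k] * gx[:, None, None] * gy[None, :, None] * gz[None, None, :]
    cell = (1.0 / grid_n) ** 3
    return f / (f.sum() * cell)                     # discrete integral equals 1

def slice_mask(z, k, n_slices=128):
    """Select samples with k/128 - 1/256 <= z <= k/128 + 1/256."""
    return np.abs(z - k / n_slices) <= 1.0 / (2 * n_slices)

f = gaussian_mixture_density(grid_n=16)
print("discrete integral:", f.sum() / 16**3)
print(slice_mask(np.array([56 / 128, 0.5]), 56))
```

Normalizing the whole mixture on the grid plays the role of normalizing the weights $p_i$ so that the density restricted to $\Omega$ integrates to one.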

D.2 MORE RESULTS ON VOLUMETRIC MAGNIFIER

In this experiment, we magnify the volumetric MRI image of the aneurysm by different amplification factors. In Fig. 7, we show the original aneurysm viewed from different angles in the first column. The last three columns give the magnified results with different amplification factors from the same viewpoints as those in the first column. We can see that the aneurysm region is successfully magnified by different factors while the rest of the volume remains nearly unchanged.



Figure 1: Convergence Analysis.

Figure 2: 3D density function sampling. (a) The density functions in a slice. The slices in each row come from two different density functions. (b)-(f) The samples obtained by different sampling methods. (b) Rejection sampling. (c) Metropolis-Hastings (MH) algorithm Bishop (2006). (d) Slice sampling Neal (2003). (e) The sampling results by mapping the random samples from the uniform distribution back to the desired distribution with $T^{-1}$. (f) The sampling results by mapping the grid centers back with $T^{-1}$. The scores at the top right give the results of the Chi-square goodness-of-fit test. Smaller means better.

visualizes the generated samples. We randomly pick the k-th slice along the z-direction from the discretized volume, draw the source density function on this slice, and use pixel intensity to represent the density in Fig.2(a). (i) We uniformly generate 100k random samples {y k } ⊂ Ω, and obtain the desired random samples by applying the inverse OT map {T -1 (y k )}. (ii) We also set {y k } as the grid centers of Ω and obtain the corresponding samples of the desired distribution µ. The samples around the k-th slice of both sampling strategies are plotted in Fig. 2(e) and Fig. 2(f). By visual comparison, it is obvious that the distributions of Fig. 2(e) and Fig. 2(f) are consistent with the density function in Fig. 2(a). The consistency of the boundary of Fig. 2(e) and (f) and Fig. 2(a) also verifies the obliqueness boundary condition of the Monge-Ampère equation. To further show the performance of the proposed method, we compare it with the classical sampling methods, namely rejection sampling, the Metropolis-Hastings algorithm Bishop (2006) and the slice sampling Neal (2003), shown in Fig. 2(b), Fig. 2(c) and Fig. 2(d).
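The two sampling strategies (i) and (ii) can be illustrated in one dimension, where the situation is much simpler than in the volumetric case treated in the paper. In 1D, under the quadratic cost, the OT map from a density on $[0,1]$ to the uniform distribution is the cumulative distribution function, so $T^{-1}$ is the quantile function. The sketch below assumes a piecewise-constant toy density; `inverse_map_1d` and all parameter values are hypothetical.

```python
import numpy as np

def inverse_map_1d(f_cells):
    """Return y -> T^{-1}(y) for a positive piecewise-constant density on [0, 1]."""
    n = len(f_cells)
    edges = np.linspace(0.0, 1.0, n + 1)
    cdf = np.concatenate([[0.0], np.cumsum(f_cells)]) / np.sum(f_cells)
    # T (the CDF) is piecewise linear and increasing, so its inverse is
    # linear interpolation with the roles of the two axes swapped.
    return lambda y: np.interp(y, cdf, edges)

n_cells = 64
centers = (np.arange(n_cells) + 0.5) / n_cells
f_cells = 1.0 + 0.9 * np.sin(2 * np.pi * centers)   # positive toy density
T_inv = inverse_map_1d(f_cells)

rng = np.random.default_rng(0)
samples_rand = T_inv(rng.uniform(0.0, 1.0, size=100_000))   # strategy (i)
y_grid = (np.arange(6400) + 0.5) / 6400
samples_grid = T_inv(y_grid)                                 # strategy (ii)

# Compare the empirical cell frequencies with the prescribed cell masses.
mass = f_cells / f_cells.sum()
hist_rand = np.histogram(samples_rand, bins=n_cells, range=(0.0, 1.0))[0] / samples_rand.size
hist_grid = np.histogram(samples_grid, bins=n_cells, range=(0.0, 1.0))[0] / samples_grid.size
print("max deviation, random:", np.abs(hist_rand - mass).max())
print("max deviation, grid:  ", np.abs(hist_grid - mass).max())
```

In this toy setting the grid-center strategy reproduces the cell masses almost exactly, while the random strategy carries the usual Monte Carlo fluctuation.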

Figure 3: The volume magnifier of an aneurysm. The first column shows the original volumetric data, and the last three columns give the magnified data from the same viewpoints with different magnifying ratios. The yellow circle denotes the ROI/aneurysm. To obtain the results, we set σ = σx = σy = σz, and they are 0.83, 0.75 and 0.5 respectively.

Figure 4: The volume magnifier of the knee. The first row gives the original volumetric data with different ROIs denoted by the blue boxes from different viewpoints, and the second row shows the corresponding magnified results. In the experiments we set σx = σy = σz = 0.75.

Algorithm 1: FFT-OT
Input: Domain $\Omega = [-1,1]^3$, the source density function $f > 0$, the target density $g = 1$, step length $\tau$, approximation error threshold $\varepsilon$
Output: Solution $\frac{1}{2}\|x\|^2 + u_n$ to the Monge-Ampère Eqn. (2) with the corresponding boundary condition.
Initialize $u_0(x) = 0$;
while true do
    Compute the Hessian matrix $D^2 u_n(x)$;
    Compute the density function $\rho_n(x) \leftarrow \det(I + D^2 u_n(x))$;
    if $\|f - \rho_n\|_{L^2(\Omega)} < \varepsilon$ then
        Break;
    Compute the adjoint matrix $[H_n^{pq}(x)] \leftarrow \mathrm{Adj}(I + D^2 u_n(x))$;
    Compute the mean adjoint matrix $[\bar{H}_n^{pq}]$ using Eqn. (11);
    Solve the constant coefficient elliptic PDE (12) using the FFT Solver Alg. 2;
    Update the Brenier potential $u_{n+1}(x) \leftarrow u_n + \tau v_n$;
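The per-iteration quantities of the FFT-OT loop (Hessian, density, adjoint matrix) can be sketched with central differences on a periodic grid. This is a minimal illustration rather than the authors' GPU implementation: the function names are hypothetical, the grid is small, and the plain spatial average used for the mean adjoint matrix is only an assumed stand-in for Eqn. (11), which is not reproduced in this appendix.

```python
import numpy as np

def hessian_periodic(u, h):
    """Central-difference Hessian on a periodic 3D grid, shape u.shape + (3, 3)."""
    H = np.empty(u.shape + (3, 3))
    for p in range(3):
        H[..., p, p] = (np.roll(u, -1, p) + np.roll(u, 1, p) - 2 * u) / h[p] ** 2
        for q in range(p + 1, 3):
            mixed = (np.roll(u, (-1, -1), (p, q)) + np.roll(u, (1, 1), (p, q))
                     - np.roll(u, (-1, 1), (p, q)) - np.roll(u, (1, -1), (p, q)))
            H[..., p, q] = H[..., q, p] = mixed / (4 * h[p] * h[q])
    return H

def density_and_adjugate(u, h):
    """Return rho = det(I + D^2 u) and Adj(I + D^2 u) at every grid point."""
    J = np.eye(3) + hessian_periodic(u, h)
    rho = np.linalg.det(J)
    adj = np.empty_like(J)
    for p in range(3):
        for q in range(3):
            # adj[p, q] is the cofactor obtained by deleting row q and column p.
            rows = [r for r in range(3) if r != q]
            cols = [s for s in range(3) if s != p]
            minor = J[..., rows, :][..., :, cols]
            adj[..., p, q] = (-1) ** (p + q) * np.linalg.det(minor)
    return rho, adj

n = 8
h = (2.0 / n, 2.0 / n, 2.0 / n)                    # spacing on [-1, 1]^3
x = -1.0 + h[0] * np.arange(n)
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
u = 0.01 * np.cos(np.pi * X) * np.cos(np.pi * Y) * np.cos(np.pi * Z)

rho, adj = density_and_adjugate(u, h)
adj_mean = adj.mean(axis=(0, 1, 2))                # assumed stand-in for Eqn. (11)
print("density range:", rho.min(), rho.max())
print("mean adjoint matrix:\n", adj_mean)
```

For $u = 0$ the density is identically one and the adjoint matrix is the identity, which matches the initialization $u_0(x) = 0$ against the target density $g = 1$.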

Figure 5: 3D density function sampling. (a) The density functions in different slices of the same model, namely the 40th, 56th, 72nd and 80th. (b)-(f) The samples obtained by different sampling methods. (b) Rejection sampling. (c) Metropolis-Hastings (MH) algorithm Bishop (2006). (d) Slice sampling Neal (2003). (e) The sampling results by mapping the random samples from the uniform distribution back to the desired distribution with $T^{-1}$. (f) The sampling results by mapping the grid centers back with $T^{-1}$. The scores at the top right give the results of the Chi-square goodness-of-fit test. Smaller means better. Zoom in for better visualization.

Algorithm 2: FFT Solver for the Constant Coefficient Elliptic PDE
Input: Domain $\Omega = [-1,1]^3$, $M$, $N$, $L$, $\{a^{pq}\}$, $b^r$, $c$, function $f$ with the periodic boundary condition
Output: Solution $u$ to the elliptic PDE Eqn. (18)
Discretize the domain $\Omega$ to an $M \times N \times L$ grid;
Sample the function $f$ to $f_{i,j,k}$;
Compute the FFT using Eqn. (16), $\{\hat{f}_{m,n,l}\} \leftarrow \mathrm{FFT}(\{f_{i,j,k}\})$;
for $(m,n,l) \in [0, M-1] \times [0, N-1] \times [0, L-1]$ do
    Compute the factor $\lambda_{m,n,l}$ using Eqn. (19);
    if $\lambda_{m,n,l}$ is 0 then
        $\hat{u}_{m,n,l} \leftarrow 0$;
    else
        $\hat{u}_{m,n,l} \leftarrow \hat{f}_{m,n,l} / \lambda_{m,n,l}$;
Compute the inverse FFT using Eqn. (17), $\{u_{i,j,k}\} \leftarrow \mathrm{IFFT}(\{\hat{u}_{m,n,l}\})$;
Return $\{u_{i,j,k}\}$.
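The steps of the FFT solver can be sketched in NumPy. Eqns. (18) and (19) are not reproduced in this appendix, so the sketch assumes the PDE has the form $\sum_{p,q} a^{pq} \partial_p \partial_q u + \sum_r b^r \partial_r u + c\,u = f$, discretized with the central differences of Lemma 9 plus the standard central first difference; the names `fft_elliptic_solve` and `apply_operator`, the zero tolerance, and the test coefficients are hypothetical.

```python
import numpy as np

def apply_operator(u, a, b, c, h):
    """Discrete operator: central differences with periodic wrap-around."""
    out = c * u
    for p in range(3):
        up, um = np.roll(u, -1, p), np.roll(u, 1, p)
        out = out + a[p][p] * (up + um - 2 * u) / h[p] ** 2 + b[p] * (up - um) / (2 * h[p])
        for q in range(p + 1, 3):
            mixed = (np.roll(u, (-1, -1), (p, q)) + np.roll(u, (1, 1), (p, q))
                     - np.roll(u, (-1, 1), (p, q)) - np.roll(u, (1, -1), (p, q)))
            out = out + (a[p][q] + a[q][p]) * mixed / (4 * h[p] * h[q])
    return out

def fft_elliptic_solve(f, a, b, c, h, tol=1e-12):
    """Solve the discrete constant-coefficient PDE by dividing in frequency space."""
    M, N, L = f.shape
    th = [2 * np.pi * np.arange(M)[:, None, None] / M,
          2 * np.pi * np.arange(N)[None, :, None] / N,
          2 * np.pi * np.arange(L)[None, None, :] / L]
    lam = np.full((M, N, L), complex(c))            # the factor lambda_{m,n,l}
    for p in range(3):
        lam = lam + a[p][p] * 2 * (np.cos(th[p]) - 1) / h[p] ** 2
        lam = lam + b[p] * 1j * np.sin(th[p]) / h[p]
        for q in range(p + 1, 3):
            lam = lam - (a[p][q] + a[q][p]) * np.sin(th[p]) * np.sin(th[q]) / (h[p] * h[q])
    f_hat = np.fft.fftn(f)
    zero = np.abs(lam) < tol                        # modes the operator annihilates
    u_hat = np.where(zero, 0.0, f_hat / np.where(zero, 1.0, lam))
    return np.fft.ifftn(u_hat).real

# Usage: recover a known zero-mean field from its image under the operator.
M, N, L = 8, 6, 10
h = (2.0 / M, 2.0 / N, 2.0 / L)
a = [[2.0, 0.3, 0.1], [0.3, 1.5, 0.2], [0.1, 0.2, 1.0]]   # toy coefficients
b = [0.5, 0.0, -0.2]
c = 0.0
rng = np.random.default_rng(0)
u_true = rng.standard_normal((M, N, L))
u_true -= u_true.mean()
u = fft_elliptic_solve(apply_operator(u_true, a, b, c, h), a, b, c, h)
print("max error:", np.abs(u - u_true).max())
```

Setting $\hat{u}_{m,n,l} = 0$ wherever the factor vanishes selects the zero-mean solution when $c = 0$, which is consistent with the solution being unique only up to a constant.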


Figure 6: 3D density function sampling. (a) The density functions in different slices of the same model, namely the 56th, 64th, 80th and 88th. (b)-(f) The samples obtained by different sampling methods. (b) Rejection sampling. (c) Metropolis-Hastings (MH) algorithm Bishop (2006). (d) Slice sampling Neal (2003). (e) The sampling results by mapping the random samples from the uniform distribution back to the desired distribution with $T^{-1}$. (f) The sampling results by mapping the grid centers back with $T^{-1}$. The scores at the top right give the results of the Chi-square goodness-of-fit test. Smaller means better. Zoom in for better visualization.

adds an entropic regularizer to the original LP problem and the regularized problem can be quickly solved by the Sinkhorn algorithm. Recently, various algorithms have been proposed to further accelerate the computation by improving the efficiency of matrix-vector multiplications, including the Greenkhorn Altschuler et al. (2017), Screenkhorn Alaya et al. (2019) and the NYS-SINK Altschuler et al. (2019) algorithms. Dvurechensky et al. (2018) also propose the adaptive primal-dual accelerated gradient descent algorithm (APDAGD) to solve the discrete OT problem. An et al. (2022) compute the approximate OT plan by smoothing the dual Kantorovich problem and solving it with the FISTA method. Methods of this kind have two limitations: (i) they only give transport plans and cannot produce bijective transportation maps; and (ii) their computational complexity is too high for scenarios with a huge number of samples.

apply a similar discretization strategy and solve the Monge-Ampère equation by the conjugate gradient method. Saumier et al. (2013) propose to solve the linearized Monge-Ampère equation using FFT. In each iteration the elliptic PDE with spatially and temporally variant coefficients is converted to a group of linear equations in the frequency domain, which is solved by the GMRES algorithm. Although the GMRES algorithm can be implemented on GPUs Aliaga et al. (2019), there is no available open source code. The work in Saumier et al. (2013) focuses on the periodic boundary condition, whereas our proposed work focuses on the general second boundary condition; the work in Saumier et al. (2013) concerns planar OT maps, whereas ours emphasizes volumetric OT maps, which have higher complexity. The work in Saumier et al.

