A CONVERGENT SINGLE-LOOP ALGORITHM FOR RELAXATION OF GROMOV-WASSERSTEIN IN GRAPH DATA

Abstract

In this work, we present the Bregman Alternating Projected Gradient (BAPG) method, a single-loop algorithm that offers an approximate solution to the Gromov-Wasserstein (GW) distance. We introduce a novel relaxation technique that balances accuracy and computational efficiency, albeit with some compromise in the feasibility of the coupling map. Our analysis rests on the observation that the GW problem satisfies the Luo-Tseng error bound condition, which allows one to estimate the distance from a point to the critical point set of the GW problem in terms of the optimality residual. This observation enables us to provide an approximation bound for the distance between the fixed-point set of BAPG and the critical point set of GW. Moreover, under a mild technical assumption, we show that BAPG converges to its fixed-point set. The effectiveness of BAPG has been validated through comprehensive numerical experiments on graph alignment and partition tasks, where it outperforms existing methods in terms of both solution quality and wall-clock time.

1. INTRODUCTION

The GW distance provides a flexible way to compare and couple probability distributions supported on different metric spaces. This has led to a surge of literature applying the GW distance to various structural data analysis tasks, including 2D/3D shape matching (Peyré et al., 2016; Mémoli & Sapiro, 2004; Mémoli, 2009), molecule analysis (Vayer et al., 2018; 2019a), graph alignment and partition (Chowdhury & Mémoli, 2019; Xu et al., 2019b; a; Chowdhury & Needham, 2021; Gao et al., 2021), graph embedding and classification (Vincent-Cuaz et al., 2021b; Xu et al., 2022), and generative modeling (Bunne et al., 2019; Xu et al., 2021). Although the GW distance has gained considerable attention in the machine learning and data science communities, most existing algorithms for computing it are double-loop algorithms that invoke another iterative algorithm as a subroutine, making them less than ideal for practical use. Recently, an entropy-regularized iterative Sinkhorn projection algorithm called eBPG was proposed by Solomon et al. (2016), which has been proven to converge under the Kurdyka-Łojasiewicz framework. However, eBPG has several limitations. First, it addresses an entropic-regularized GW objective, whose regularization parameter has a major impact on the model's performance. Second, it requires solving an entropic optimal transport problem at each iteration, which is computationally expensive in practice. In an effort to solve the GW problem directly, Xu et al. (2019b) proposed the Bregman projected gradient (BPG) method, which is still a double-loop algorithm that relies on another iterative algorithm as a subroutine; moreover, it suffers from numerical instability due to the lack of an entropic regularizer. Vayer et al. (2019a) and Mémoli (2011) introduced Frank-Wolfe methods to solve the GW problem, but these still rely on linear programming solvers and line-search schemes, making them unsuitable for even medium-sized tasks.
Recently, Xu et al. (2019b) developed a simple heuristic, single-loop method called BPG-S based on BPG that showed good empirical performance on node correspondence tasks. However, its performance in the presence of noise is unknown due to the lack of theoretical support. The main challenge lies in efficiently handling the Birkhoff polytope constraint (i.e., the polytope of doubly stochastic matrices) on the coupling matrix. The key issue is that its Bregman projection admits no closed-form update, which forces current algorithms to rely on computationally expensive or hyperparameter-sensitive iterative methods. To address this difficulty, we propose a single-loop algorithm (BAPG) that solves the GW distance approximately. Our solution incorporates a novel relaxation technique that sacrifices some feasibility of the coupling map in exchange for computational efficiency. This violation is acceptable for certain learning tasks, such as graph alignment and partition, where the quality of the coupling is not the primary concern. We find that BAPG achieves desirable performance on such graph learning tasks, since their performance measure is matching accuracy rather than the sharpness of the probabilistic correspondence. In short, BAPG trades feasibility for both computational efficiency and matching accuracy. In our approach, we decouple the Birkhoff polytope constraint into separate simplex constraints on the rows and columns. Projected gradient descent is then performed on a constructed penalty function in an alternating fashion. By exploiting the closed-form Bregman projection onto the simplex constraint with relative entropy as the base function, BAPG requires only matrix-vector/matrix-matrix multiplications and element-wise matrix operations at each iteration, making it a computationally efficient algorithm.
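The alternating scheme above can be sketched in a few lines of NumPy. The sketch below is illustrative only: the function names, the step-size parameter rho, and the exact form of the mirror-descent step are our assumptions for exposition, not the paper's precise notation; the closed-form KL (relative-entropy) projection onto each simplex constraint reduces to a row or column rescaling, which is the source of BAPG's single-loop efficiency.

```python
import numpy as np

def gw_grad(Dx, Dy, pi):
    """Gradient of the GW objective
    f(pi) = sum_{i,j,k,l} (Dx[i,k] - Dy[j,l])**2 * pi[i,j] * pi[k,l],
    obtained by expanding the square (only matrix products are needed)."""
    p, q = pi.sum(axis=1), pi.sum(axis=0)      # current row/column marginals
    return 2.0 * ((Dx**2) @ p[:, None]         # contribution of Dx[i,k]^2
                  + ((Dy**2) @ q)[None, :]     # contribution of Dy[j,l]^2
                  - 2.0 * Dx @ pi @ Dy.T)      # cross term

def bapg_step(Dx, Dy, pi, mu, nu, rho=5.0):
    """One BAPG-style iteration (illustrative sketch).

    Each block takes an entropic mirror-descent step and then applies the
    closed-form KL projection onto one simplex constraint: projecting onto
    {pi : pi 1 = mu} is a row rescaling; onto {pi : pi^T 1 = nu}, a column
    rescaling. No inner iterative solver is needed."""
    # Row block.
    pi = pi * np.exp(-gw_grad(Dx, Dy, pi) / rho)
    pi *= (mu / pi.sum(axis=1))[:, None]
    # Column block.
    pi = pi * np.exp(-gw_grad(Dx, Dy, pi) / rho)
    pi *= (nu / pi.sum(axis=0))[None, :]
    return pi
```

Note that after the column block the column marginal matches nu exactly while the row marginal only approximately matches mu; this is precisely the controlled infeasibility that the relaxation accepts.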
Thus, BAPG has several convenient properties, such as compatibility with GPU implementation, robustness with respect to the step size (the only hyperparameter), and low memory requirements. Next, we investigate the approximation bound and convergence behavior of BAPG. Surprisingly, we discover that the GW problem satisfies the Luo-Tseng error bound condition (Luo & Tseng, 1992). This fact allows us to bound the distance between the fixed-point set of BAPG and the critical point set of the GW problem, which is a notable departure from the usual use of the Luo-Tseng error bound condition to establish linear convergence rates for structured convex problems (Zhou & So, 2017). With this finding, we are able to quantify the approximation bound for the fixed-point set of BAPG explicitly. Moreover, we establish a subsequence convergence result when the accumulated asymmetric error of the Bregman distance is bounded. Lastly, we present extensive experimental results to validate the effectiveness of BAPG for graph alignment and graph partition. Our results demonstrate that BAPG outperforms other heuristic single-loop and theoretically sound double-loop methods in terms of both computational efficiency and matching accuracy. We also conduct a sensitivity analysis of BAPG and demonstrate the benefits of its GPU acceleration through experiments on both synthetic and real-world datasets. All theoretical insights and results are well corroborated by the experiments.

2. PROPOSED ALGORITHM

In this section, we begin by presenting the GW distance as a nonconvex quadratic problem with Birkhoff polytope constraints. We then delve into the theoretical insights and computational characteristics of our proposed algorithm, BAPG. The Gromov-Wasserstein distance was first introduced in (Mémoli, 2011; 2014; Peyré et al., 2019) as a way to quantify the distance between two probability measures supported on different metric spaces. More precisely:

Definition 2.1 (GW distance). Suppose that we are given two unregistered compact metric spaces (X, d_X), (Y, d_Y) accompanied with Borel probability measures µ, ν respectively. The GW distance between µ and ν is defined as

    inf_{π ∈ Π(µ,ν)} ∫∫ |d_X(x, x′) − d_Y(y, y′)|² dπ(x, y) dπ(x′, y′),

where Π(µ, ν) is the set of all probability measures on X × Y with µ and ν as marginals.
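For finite metric spaces, with distance matrices in place of d_X and d_Y and a coupling matrix in place of π, the objective in Definition 2.1 becomes a quadratic form in π. The sketch below (function name and variable names are ours) evaluates it without materializing the n×m×n×m tensor, using the standard expansion of the squared difference into three matrix-product terms (Peyré et al., 2016):

```python
import numpy as np

def gw_objective(Dx, Dy, pi):
    """Discrete GW objective
    sum_{i,j,k,l} (Dx[i,k] - Dy[j,l])**2 * pi[i,j] * pi[k,l],
    evaluated in O(n^2 m + n m^2) by expanding the square, instead of the
    naive O(n^2 m^2) four-index sum."""
    p, q = pi.sum(axis=1), pi.sum(axis=0)          # marginals of the coupling
    return (p @ (Dx**2) @ p                        # term from Dx[i,k]^2
            + q @ (Dy**2) @ q                      # term from Dy[j,l]^2
            - 2.0 * np.sum(pi * (Dx @ pi @ Dy.T))) # cross term
```

The first two terms depend on π only through its marginals, so for a feasible coupling they are constants; this is why GW solvers effectively minimize the cross term −2⟨D_X π D_Y, π⟩ over the coupling polytope.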

