LEARNING ARBORESCENCE WITH AN EFFICIENT INFERENCE ALGORITHM

Abstract

We consider a class of structured learning problems whose output is an arborescence (i.e., a directed spanning tree) of the input graph. The key step in this problem is predicting the minimum weight arborescence (MWA) under the learned model. In the literature, there are two lines of research for predicting the MWA: the Chu-Liu Edmonds (CLE) method (Chu & Liu, 1965) and the Lovász method (Lovász, 1985). The CLE method is easy to implement but takes O(n) cycle contractions, where n is the graph size. The Lovász method reduces the problem to the multi-pair shortest path (MPSP) problem and takes only O(log n) contractions. Nevertheless, in the CPU setting, MPSP has the same time complexity as finding the MWA, so the Lovász method attains time efficiency only under a sufficient GPU setting. Both of these methods are painfully slow for large-scale learning tasks. In this research, we find that the general MPSP problem can be simplified when working with machine learning models: the learning model predicts edge weights for all pairs of vertices, so the graph we process is always complete. Therefore, we only need to handle those paths that directly enter each weakly connected component (WCC), whereas the classic Lovász method needs to handle all possible paths. This allows us to propose the Lazy Lovász (Lavá) method, which enjoys O(log n) contractions as well as efficient performance in both CPU and GPU settings. In experiments, we consider synthetic datasets and two real-world learning tasks, i.e., graph-based dependency parsing and unsupervised parsing on ListOps. The empirical results exhibit substantial gains of our Lavá method over the classic CLE and Lovász methods: Lavá shortens the training time of arborescence learning tasks.

1. INTRODUCTION

This paper primarily focuses on structured learning problems whose output structure is an arborescence (i.e., a directed spanning tree). Examples of real-world arborescence learning problems include graph-based dependency parsing (Koo et al., 2007) and ListOps parsing (Nangia & Bowman, 2018). At every step of learning, the neural model needs to infer the minimum weight arborescence (MWA). This inference is mainly carried out by the Chu-Liu Edmonds (CLE) algorithm (Chu & Liu, 1965; Edmonds, 1967) or the Lovász method (Lovász, 1985). The CLE method is straightforward to implement (Gabow et al., 1986; Mendelson et al., 2004). However, on larger-scale input graphs, CLE spends the majority of its time on cycle contractions; in fact, we show both theoretically and empirically that CLE takes O(n) rounds of contractions, where n is the number of vertices in the graph. Lovász (1985) transformed the problem of finding the MWA into the multi-pair shortest path (MPSP) problem; the Lovász method needs only O(log n) contractions. Under a sufficient GPU setting, where shortest paths can be computed efficiently, the Lovász algorithm attains a faster running time than the CLE method. Later works marginally improve the classic Lovász method with fewer GPU resources (Amato, 1993) and generalize it to distributed networks (Fischer & Oshman, 2021).

Since the learning model predicts edge weights for all pairs of vertices, the graph we process is always complete. After the edge pre-processing step, all edge weights are non-negative (Remark 1) and the graph is partitioned into several weakly connected components (WCCs). The classic Lovász method computes, for every outside vertex x_i, a shortest path into the cycle of the current WCC; such a path may travel through all the vertices, going into and then out of the current WCC. We observe that this can be simplified by considering only those paths that directly enter the current component and never go outside. Such paths always exist because the graph we process is densely connected. We thus reduce the computational workload by limiting the scope of the shortest path. This observation allows us to attain time efficiency and thereby shortens the training time for large-scale learning problems. Fig. 1 illustrates this idea with one example graph.

In this research, we introduce Lazy Lovász (Lavá), a unified algorithm for finding the MWA. Beyond the above observation, we also introduce a bag of tricks for finding the MWA on large-scale, densely connected graphs. To detect cycles and compute shortest paths, we exploit properties of matrix powers in place of DFS-based algorithms. The whole method is described in the language of matrix computation; it is therefore not only convenient and efficient to implement with NumPy or PyTorch but also transparent to CPU or GPU devices.

In experiments, we show the advantage of our Lavá approach on synthetic graphs and two real-world applications. In the synthetic experiments, we first show that the empirical number of contractions taken by the CLE algorithm increases linearly with the graph size, while our method takes far fewer contractions. We then show that our Lavá method takes much less empirical running time than the CLE approach to find the MWA in a large-scale setting. Furthermore, on the dependency parsing datasets and the unsupervised parsing task over ListOps, we show that the proposed Lavá method attains better running time than the CLE method on real-world data.
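To make the matrix-power idea above concrete, here is an illustrative sketch (not the paper's actual implementation; the function name `cycle_vertices` and the boolean-semiring formulation are our own assumptions) of detecting which vertices lie on a directed cycle using O(log n) boolean matrix products instead of DFS:

```python
import numpy as np

def cycle_vertices(adj):
    """Boolean mask of vertices that lie on some directed cycle.

    adj[i, j] is True iff edge (x_i, x_j) exists. Repeated boolean
    squaring of the reflexive adjacency matrix builds the
    reflexive-transitive closure in O(log n) matrix products (a
    GPU-friendly replacement for DFS). Vertex i is on a cycle iff it
    can reach itself via a path containing at least one real edge.
    """
    n = adj.shape[0]
    closure = adj | np.eye(n, dtype=bool)        # paths of length <= 1
    for _ in range(max(1, int(np.ceil(np.log2(n))))):
        # Boolean "matmul" by broadcasting: exists k with i->k and k->j.
        closure = (closure[:, :, None] & closure[None, :, :]).any(axis=1)
    # Paths of length >= 1: one real edge followed by any closure path.
    nonzero_paths = (adj[:, :, None] & closure[None, :, :]).any(axis=1)
    return np.diag(nonzero_paths)

# Vertices 0 -> 1 -> 2 -> 0 form a cycle; vertex 3 merely points into it.
adj = np.zeros((4, 4), dtype=bool)
adj[0, 1] = adj[1, 2] = adj[2, 0] = adj[3, 0] = True
print(cycle_vertices(adj))  # [ True  True  True False]
```

Because every step is a matrix operation, the same code runs unchanged on a GPU by swapping NumPy arrays for PyTorch tensors, which is the device-transparency property the text describes.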
Our contributions are: (1) We propose an efficient inference algorithm, Lavá, for finding the MWA, which is more compatible with and efficient for large-scale arborescence learning problems. (2) Theoretically, we show that our Lavá method takes O(log n) rounds of contractions while CLE takes O(n), and we further show the correctness of Lavá after adopting our observation on the shortest path. (3) We evaluate our method on synthetic data and two real-world applications, showing that Lavá is faster than CLE across all datasets and tasks.

2. PRELIMINARIES

2.1 LEARNING ARBORESCENCE

Notations. Denote G(V, E) as a weighted, directed graph with a given root r ∈ V. The root r has no incoming edges, and the remaining vertices are strongly connected. The adjacency matrix A representing the connectivity of the graph is defined, for x_i, x_j ∈ V, as

$$A_{i,j} = A(x_i, x_j) = \begin{cases} \phi(x_i, x_j) & \text{if } e(x_i, x_j) \in E \\ \infty & \text{if } e(x_i, x_j) \notin E \end{cases} \tag{1}$$

where ϕ(x_i, x_j) : V × V → R denotes the weight of edge (x_i, x_j) in the graph. We assume all edge weights are finite. All incoming edges of vertex x_i correspond to the i-th column vector A_{:,i}, and all outgoing edges can be found in the i-th row vector A_{i,:}.

Min-plus Product. The min-plus product, denoted "⋆", is defined on the adjacency matrix for computing shortest paths in the graph (Williams & Xu, 2020). Let
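As a concrete illustration of the two definitions above, the following sketch builds an adjacency matrix per Eq. (1), with ∞ marking absent edges, and computes a min-plus product with NumPy broadcasting (the function name `min_plus` is ours, not from the paper):

```python
import numpy as np

def min_plus(A, B):
    """Min-plus (tropical) matrix product: (A * B)[i, j] = min_k A[i, k] + B[k, j].

    Broadcasting forms an (n, n, n) tensor of path costs through every
    intermediate vertex k, then reduces with min over that axis.
    """
    return np.min(A[:, :, None] + B[None, :, :], axis=1)

# Adjacency matrix of a 3-vertex graph; np.inf marks absent edges (Eq. 1),
# and the zero diagonal encodes the trivial path from a vertex to itself.
INF = np.inf
A = np.array([[0.0, 2.0, INF],
              [INF, 0.0, 3.0],
              [1.0, INF, 0.0]])

# One min-plus square yields shortest paths that use at most two edges,
# e.g. the path 0 -> 1 -> 2 of cost 2 + 3 = 5.
A2 = min_plus(A, A)
print(A2[0, 2])  # 5.0
```

Iterating the product (or squaring it log n times) extends the reachable path length, which is exactly how shortest paths enter the MPSP computation described later.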



Figure 1: Our Lavá is more efficient than the classic Lovász method when computing the shortest path from outside vertices to the cycle. (a) The pre-processed graph, where circles denote vertices and arrows (→) denote edges; "WCC" stands for weakly connected component. (b) In the Lovász method, an outside vertex may find a path that wanders over several WCCs and may involve all vertices in the graph. (c) Our Lavá method computes only those paths that directly enter the WCC and never go outside; limiting the scope of the shortest path in this way attains time efficiency. [Figure: three panels contrasting SSSP in the classic method vs. Lavá, with labels "the cycle", "outside vertices", "range of one WCC", and "efficient SSSP in our Lavá: paths visit only inside vertices".]

