LEARNING ARBORESCENCE WITH AN EFFICIENT INFERENCE ALGORITHM

Abstract

We consider a class of structured learning problems whose output is an arborescence (i.e., a directed spanning tree) of the input graph. The key step in these problems is predicting the minimum weight arborescence (MWA) under the learned model. In the literature, there are two lines of research for predicting the MWA: the Chu-Liu/Edmonds (CLE) method (Chu & Liu, 1965) and the Lovász method (Lovász, 1985). The CLE method is easy to implement but takes O(n) cycle contractions, where n is the graph size. The Lovász method reduces the problem to the multi-pair shortest path (MPSP) problem and takes only O(log n) contractions. Nevertheless, in the CPU setting, MPSP has the same time complexity as finding the MWA directly, so the Lovász method attains its time efficiency only with sufficient GPU resources. Both methods are painfully slow for large-scale learning tasks. In this work, we find that the general MPSP problem can be simplified when working with machine learning models: because the model predicts edge weights for all pairs of vertices, the graph we process is always complete. Therefore, we only need to handle the paths that directly enter each weakly connected component (WCC), whereas the classic Lovász method must handle all possible paths. This allows us to propose the Lazy Lovász (Lavá) method, which enjoys O(log n) contractions as well as efficient performance in both CPU and GPU settings. In experiments, we consider synthetic datasets and two real-world learning tasks, namely graph-based dependency parsing and unsupervised parsing on ListOps. The empirical results show substantial gains of our Lavá method over the classic CLE and Lovász methods: Lavá markedly reduces the training time of arborescence learning tasks.

1. INTRODUCTION

This paper focuses on structured learning problems whose output structure is an arborescence (i.e., a directed spanning tree). Examples of real-world arborescence learning problems include graph-based dependency parsing (Koo et al., 2007) and ListOps parsing (Nangia & Bowman, 2018). At every step of learning, the neural model needs to infer the minimum weight arborescence (MWA). This inference step is mainly resolved by the Chu-Liu/Edmonds (CLE) algorithm (Chu & Liu, 1965; Edmonds, 1967) or the Lovász method (Lovász, 1985). The CLE method is straightforward to implement (Gabow et al., 1986; Mendelson et al., 2004). However, on larger input graphs, CLE spends the majority of its time on cycle contractions; in fact, we show both theoretically and empirically that CLE takes O(n) rounds of contractions, where n is the number of vertices in the graph. Lovász (1985) transformed the problem of finding the MWA into the multi-pair shortest path (MPSP) problem, and the resulting Lovász method needs only O(log n) contractions. Given sufficient GPU resources, so that shortest paths can be computed efficiently, the Lovász algorithm attains a faster running time than the CLE method. Later works marginally improve the classic Lovász method using fewer GPU resources (Amato, 1993) and generalize it to distributed networks (Fischer & Oshman, 2021). Since the learning model predicts edge weights for all pairs of vertices, the graph we process is always complete. After the edge pre-processing step, all edge weights are non-negative (see Remark 1) and the graph is partitioned into several weakly connected components (WCCs). The classic Lovász method requires computing the shortest path from every outside vertex x_i into the cycle in the current WCC, which may travel around all the vertices and go inside and then outside
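For reference, the CLE inference step discussed above can be sketched as follows. This is a minimal Python rendering of the textbook O(|V||E|) Chu-Liu/Edmonds procedure, not the paper's implementation; the function name and the edge-list representation are our own. Each round greedily selects the cheapest incoming edge per vertex, then contracts any resulting cycle into a super-vertex and reweights its incoming edges. Each round removes at least one cycle, and in the worst case only O(1) vertices are eliminated per round, which is the source of the O(n) contraction rounds the paper highlights:

```python
def min_arborescence(n, root, edges):
    """Weight of the minimum-weight arborescence rooted at `root`.

    n: number of vertices (labeled 0..n-1)
    edges: list of directed (u, v, w) triples
    Returns None if some vertex is unreachable from the root.
    """
    total = 0
    while True:
        # 1. Cheapest incoming edge for every non-root vertex.
        min_in = [float('inf')] * n
        pre = [-1] * n
        for u, v, w in edges:
            if u != v and w < min_in[v]:
                min_in[v], pre[v] = w, u
        for v in range(n):
            if v != root and min_in[v] == float('inf'):
                return None  # no arborescence exists
        # 2. Detect cycles among the selected edges.
        min_in[root] = 0
        comp = [-1] * n  # super-vertex id after contraction
        vis = [-1] * n   # which walk last visited each vertex
        cnt = 0
        for v in range(n):
            total += min_in[v]
            u = v
            while vis[u] != v and comp[u] == -1 and u != root:
                vis[u] = v
                u = pre[u]
            if u != root and comp[u] == -1:  # walk closed a cycle at u
                w = pre[u]
                while w != u:
                    comp[w] = cnt
                    w = pre[w]
                comp[u] = cnt
                cnt += 1
        if cnt == 0:
            return total  # no cycle: the chosen edges form the MWA
        # 3. Contract cycles; every other vertex keeps its own id.
        for v in range(n):
            if comp[v] == -1:
                comp[v] = cnt
                cnt += 1
        # Reweight so future rounds only pay the cost beyond the
        # already-counted cheapest in-edge of each head vertex.
        edges = [(comp[u], comp[v], w - min_in[v])
                 for u, v, w in edges if comp[u] != comp[v]]
        root = comp[root]
        n = cnt
```

For instance, on a 3-vertex graph where vertices 1 and 2 form a cheap 2-cycle, one contraction round collapses the cycle and a second round attaches it to the root. The Lovász method discussed next avoids this vertex-by-vertex contraction by resolving many cycles per round via shortest-path computations.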

