LEARNING COMBINATORIAL NODE LABELING ALGORITHMS

Abstract

We present the combinatorial node labeling framework, which generalizes many prior approaches to solving hard graph optimization problems by supporting problems where solutions consist of arbitrarily many node labels, such as graph coloring. We then introduce a neural network architecture to implement this framework. Our architecture builds on a graph attention network with several inductive biases to improve solution quality and is trained using policy gradient reinforcement learning. We demonstrate our approach on both graph coloring and minimum vertex cover. Our learned heuristics match or outperform classical hand-crafted greedy heuristics and machine learning approaches while taking only seconds on large graphs. We conduct a detailed analysis of the learned heuristics and architecture choices and show that they successfully adapt to different graph structures.

1. INTRODUCTION

Graph problems have numerous real-world applications, ranging from scheduling problems (Marx, 2004) and register allocation (Chaitin, 1982; Smith et al., 2004) to computational biology (Abu-Khzam et al., 2004). However, many useful graph optimization problems are NP-hard to solve (Karp, 1972). This has spurred a variety of approaches, from greedy heuristics (Brélaz, 1979; Papadimitriou & Steiglitz, 1982; Matula & Beck, 1983; Avis & Imamura, 2007; Delbot & Laforest, 2008) to integer linear programming (Graver, 1975). More recently, machine learning approaches have shown increasing promise (Dai et al., 2017; Kool et al., 2019; Li et al., 2018; Karalias & Loukas, 2020). From a structural point of view, many graph problems fall into one of three classes depending on the type of their solution: problems that ask for (1) subsets of vertices, (2) permutations of vertices, or (3) partitions of vertices into two or more sets. Most work has focused on either the first two (Dai et al., 2017) or just one of the three (Bello et al., 2017; Li et al., 2018; Kool et al., 2019; Karalias & Loukas, 2020; Manchanda et al., 2020; Cappart et al., 2020; Drori et al., 2020; Ma et al., 2020). Existing machine learning methods for the first two types of problems, such as S2V-DQN (Dai et al., 2017), do not easily generalize to cases where the number of labels is not known in advance. Many important and challenging problems, such as graph coloring (Marx, 2004; Myszkowski, 2008; Bandh et al., 2009), require that vertices be partitioned into an unknown number of sets. To address this, we present the combinatorial node labeling framework (§2), which generalizes prior approaches (Fig. 1) and supports many problems, including minimum vertex cover (Onak et al., 2012; Bhattacharya et al., 2017; Ghaffari et al., 2020), the traveling salesman problem (Dantzig et al., 1954; Garey & Johnson, 1990), maximum cut (Karp, 1972), and list coloring (Jensen et al., 1995).
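As a concrete illustration of node labeling (a minimal sketch with illustrative names, not the paper's GAT-CNL implementation): in greedy graph coloring, a policy picks the next node to label, and a fixed label rule assigns it the smallest color unused by its already-labeled neighbors. Here, the learned ordering policy is replaced by a hand-crafted DSATUR-style stand-in.

```python
# Greedy graph coloring cast as combinatorial node labeling: a policy
# chooses the next node, and a fixed label rule assigns the smallest
# color not used by its already-labeled neighbors. The DSATUR-style
# policy below is a hand-crafted stand-in for a learned ordering;
# all names are illustrative, not taken from the paper's code.

def smallest_feasible_color(node, colors, adj):
    """Fixed label rule: first color unused among labeled neighbors."""
    used = {colors[nb] for nb in adj[node] if nb in colors}
    c = 0
    while c in used:
        c += 1
    return c

def dsatur_pick(colors, adj):
    """Stand-in policy: most saturated uncolored node, ties by degree."""
    def saturation(v):
        return len({colors[nb] for nb in adj[v] if nb in colors})
    unlabeled = (v for v in adj if v not in colors)
    return max(unlabeled, key=lambda v: (saturation(v), len(adj[v])))

def greedy_label(adj, pick_next):
    """Label nodes one at a time in the order chosen by pick_next."""
    colors = {}
    while len(colors) < len(adj):
        node = pick_next(colors, adj)
        colors[node] = smallest_feasible_color(node, colors, adj)
    return colors

# A 5-cycle is 3-chromatic; first-fit on a cycle never uses more than 3.
adj = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
coloring = greedy_label(adj, dsatur_pick)
print(len(set(coloring.values())))  # 3
```

Note that for this first-fit label rule an optimal ordering always exists: labeling the color classes of an optimal coloring one class at a time yields a coloring with no more colors, which is why fixing the label rule and learning only the order does not sacrifice optimality in principle.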
These, and many other problems (§D), can all be framed as iteratively assigning a label to nodes, in some order. We then introduce a neural architecture, GAT-CNL, to learn greedy-inspired heuristics for such problems (§3). We use policy gradient reinforcement learning (Sutton & Barto, 2018; Kool et al., 2019) to learn a node ordering and combine it with a fixed label rule that labels each node according to the ordering. We show that for the chosen label rules, there still exists an order that guarantees an optimal solution. By using policy gradients, we can construct both a deterministic greedy policy and a probabilistic policy where sampling boosts solution quality. To improve performance, we incorporate two inductive biases: spatial locality, where labeling a node only impacts the weights of its neighbors; and temporal locality, where node selection is conditioned only on the previously labeled node, a summary of prior labelings, and a global graph context (Figs. 2 and 3). We evaluate our approach (§4), demonstrate significantly improved performance for neural graph coloring (GC), and find near-optimal solutions for minimum vertex cover (MVC). We additionally study the runtime of our models, conduct comprehensive ablation studies, and provide qualitative analyses of the learned heuristics, showing that they adapt to the properties of the input graph.

Related work. We now review key related work. Figure 1 (left) compares node labeling with other frameworks.

Supervised learning. The fundamental downsides of supervised learning for combinatorial optimization are twofold: first, it can be difficult to formulate a problem in a supervised manner, since it might have many optimal solutions (e.g., GC); second, even if the problem admits a direct supervised formulation, we still need labeled data for training, which can be hard to generate and relies on an existing solver.
In particular, supervised learning cannot easily be used for problems that have not been studied before. Its advantages are sample efficiency and the potential for overall better results. Recent supervised approaches obtain good results: Manchanda et al. (2020) for influence maximization (IM) and Joshi et al. (2019) for the traveling salesman problem (TSP). For IM, the approach of Manchanda et al. (2020) shows promising results on graphs much larger than those seen in training. For TSP, the approach of Joshi et al. (2019) is very efficient but does not generalize well to graphs larger than those seen in training. Li et al. (2018) also use supervised learning and produce good results on minimum vertex cover (MVC), maximum independent set, and maximal clique.

Unsupervised learning. To apply unsupervised learning, it is necessary to formulate a differentiable surrogate loss. There have been approaches for several specific combinatorial optimization problems (Nazi et al., 2019; Amizadeh et al., 2019; Tönshoff et al., 2019; Yao et al., 2019), and there has been progress toward a framework for deriving trainable losses (Karalias & Loukas, 2020). Still, significant insight into a problem is required to design suitable loss functions.

Figure 1: Left: Venn diagram of tasks solvable with the set, permutation, and node labeling frameworks. Node labeling generalizes existing frameworks and allows solving additional tasks. Center & right: comparison of our architecture with S2V-DQN (Dai et al., 2017). We add a label assignment step, allowing us to solve new problems. Further, the average time for picking the next vertex is significantly reduced, such that the total number of arithmetic operations is linear in the size of the graph.

Reinforcement learning. Using RL only requires a way to represent partial solutions and a way to score the cost of a (partial or final) solution. Dai et al. (2017) provide S2V-DQN, a general framework for learning problems like MVC and TSP that is trained with RL. It shows good results across different graph sizes for the covered problems, but it is neither fast enough to replace existing approaches nor able to handle arbitrary node labels (see Fig. 1). Kool et al. (2019) focus on routing problems like TSP and the vehicle routing problem. They outperform Dai et al. on TSP instances of the training size; unfortunately, their approach does not seem to generalize to graph sizes that differ substantially from those used for training. Several other RL approaches have been proposed and evaluated for TSP (Bello et al., 2017; Cappart et al., 2020; Drori et al., 2020; Ma et al., 2020). Barrett et al. (2020) consider the maximum cut (MaxCut) problem. Huang et al. (2019) present a Monte Carlo tree search approach specialized only for graph coloring. These methods do not address the general node labeling framework, but instead model the solution as a permutation of vertices (e.g., TSP, vehicle routing) or a set of nodes or edges (e.g., MVC, MaxCut). Instead, we can represent

