SOLVING NP-HARD PROBLEMS ON GRAPHS WITH EXTENDED ALPHAGO ZERO

Abstract

There have been increasing efforts to solve combinatorial optimization problems with machine learning. Khalil et al. (NeurIPS 2017) proposed an end-to-end reinforcement learning framework that automatically learns graph embeddings to construct solutions to a wide range of problems. However, it sometimes performs poorly on graphs whose characteristics differ from those of the training graphs. To improve its ability to generalize to various graphs, we propose a novel learning strategy based on AlphaGo Zero, a Go engine that achieved a superhuman level without domain knowledge of the game. We redesign AlphaGo Zero for combinatorial optimization problems, taking into account several differences from two-player games. In experiments on five NP-hard problems such as MINIMUMVERTEXCOVER and MAXCUT, our method, using only a policy network, generalizes better than the previous method to various instances not used for training, including random graphs, synthetic graphs, and real-world graphs. Furthermore, our method is significantly enhanced by a test-time Monte Carlo Tree Search that makes full use of the policy network and value network. We also compare recently developed graph neural network (GNN) models, offering insight into a suitable choice of GNN model for each task.

1. INTRODUCTION

No polynomial-time algorithm is known for NP-hard problems [7], yet they arise in many real-world optimization tasks. Accordingly, a variety of algorithms have been developed over a long history, including approximation algorithms [2, 14], meta-heuristics based on local search such as simulated annealing and evolutionary computation [15, 10], general-purpose exact solvers such as CPLEX¹ and Gurobi [16], and problem-specific exact solvers [1, 25].

Recently, machine learning approaches have been actively investigated for combinatorial optimization, with the expectation that the combinatorial structure of a problem can be learned automatically, without complicated hand-crafted heuristics. Early approaches focused on solving specific problems [17, 5] such as the traveling salesperson problem (TSP). Khalil et al. [19] proposed a general framework that solves combinatorial problems by combining reinforcement learning and graph embedding, which attracted attention for two reasons: it requires no knowledge of graph algorithms beyond greedy selection based on network outputs, and it learns algorithms without any training dataset. Thanks to these advantages, the framework applies to a diverse range of problems over graphs, and it also performs much better than previous learning-based approaches.

However, we observed poor empirical performance on some graphs having different characteristics (e.g., synthetic graphs and real-world graphs) from the random graphs used for training, possibly because of the limited exploration space of their Q-learning method. In this paper, to overcome this weakness, we propose a novel solver, named CombOpt Zero. CombOpt Zero is inspired by AlphaGo Zero [33], a superhuman engine for Go that conducts Monte Carlo Tree Search (MCTS) to train deep neural networks.
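To make the greedy-selection framework concrete, the following is a minimal sketch (not the authors' implementation) of how such solvers construct a solution: at each step a learned model scores the remaining nodes and the highest-scoring one is added to the partial solution. The function `score_nodes` is a hypothetical stand-in for a trained network; here it simply returns node degrees, which recovers the classic greedy heuristic for MINIMUMVERTEXCOVER.

```python
def score_nodes(edges, nodes):
    """Hypothetical stand-in for a learned policy/Q network.

    Scores each remaining node; here, score = current degree.
    """
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def greedy_vertex_cover(edges):
    """Construct a vertex cover by repeated greedy selection."""
    edges = set(map(tuple, edges))
    nodes = {v for e in edges for v in e}
    cover = []
    while edges:
        scores = score_nodes(edges, nodes)
        best = max(nodes, key=lambda v: scores[v])     # greedy step
        cover.append(best)
        edges = {e for e in edges if best not in e}    # drop covered edges
        nodes.discard(best)
    return cover


cover = greedy_vertex_cover([(0, 1), (1, 2), (2, 3), (1, 3)])
```

Replacing the degree-based scorer with a graph neural network trained by reinforcement learning yields the style of end-to-end solver discussed above.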
AlphaGo Zero was later generalized to AlphaZero [34] so that it can handle other games; however, its range of applications is limited to two-player games whose outcome is win/lose (or possibly a draw). We extend AlphaGo Zero to a broad class of combinatorial



¹ www.cplex.com

