PRACTICAL MASSIVELY PARALLEL MONTE-CARLO TREE SEARCH APPLIED TO MOLECULAR DESIGN

Abstract

Large computational resources are routinely used to train neural networks, as in many reinforcement learning applications. However, while massively parallel computing is often used to train models, it is rarely used to search for solutions to combinatorial optimization problems. This paper proposes a novel massively parallel Monte-Carlo Tree Search (MP-MCTS) algorithm that works efficiently at the scale of 1,000 workers in a distributed-memory environment spanning multiple compute nodes, and applies it to molecular design. This paper is the first work to apply distributed MCTS to a real-world, non-game problem. Existing work on large-scale parallel MCTS shows efficient scalability in terms of the number of rollouts up to 100 workers, but suffers from degradation in the quality of the solutions. MP-MCTS maintains search quality at a larger scale. By running MP-MCTS on 256 CPU cores for only 10 minutes, we obtained candidate molecules with scores similar to those found by non-parallel MCTS running for 42 hours. Moreover, our results based on parallel MCTS (combined with a simple RNN model) significantly outperform existing state-of-the-art work. Our method is generic and is expected to speed up other applications of MCTS¹.

1. INTRODUCTION

Since the invention of Upper Confidence bound applied to Trees (UCT) (Kocsis & Szepesvári, 2006), the most representative MCTS algorithm, MCTS has shown remarkable performance in various problems. A survey paper on MCTS published in 2012 cited 240 papers, including many game and non-game applications (Browne et al., 2012). Recently, the successful combination of MCTS with Deep Neural Networks (DNN) in computer Go by AlphaGo (Silver et al., 2016) has brought MCTS into the spotlight. Combining MCTS and DNN is becoming one of the standard tools for solving decision-making and combinatorial optimization problems. Therefore, there is significant demand for parallel MCTS. However, in contrast to the enormous computing resources invested in training DNN models in many recent studies, MCTS is rarely parallelized at large scale. Parallelizing MCTS/UCT is notoriously challenging. For example, UCT follows four steps: selection, expansion, rollout, and backpropagation. Non-parallel vanilla UCT updates (backpropagates) the values in the tree nodes after each rollout, and the behavior of the subsequent selection steps depends on the results of the previous rollouts and backpropagation. Therefore, there is no apparent parallelism in the algorithm.
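To make the sequential dependency between the four UCT steps concrete, here is a minimal single-threaded sketch of vanilla UCT. It is not the authors' MP-MCTS or their RNN-based molecule generator: it assumes a toy problem of building a fixed-length token sequence, and the names `Node`, `ucb1`, `uct_search`, the alphabet, and the reward function are all hypothetical. Selection uses the standard UCB1 score (mean reward plus an exploration bonus) with an assumed exploration constant of √2, and rollouts complete the sequence uniformly at random.

```python
import math
import random


class Node:
    """Search-tree node; `prefix` is the partial token sequence built so far."""

    def __init__(self, prefix, parent=None):
        self.prefix = prefix
        self.parent = parent
        self.children = {}  # token -> Node
        self.visits = 0
        self.total_reward = 0.0


def ucb1(child, parent_visits, c=math.sqrt(2)):
    """UCB1 score: mean reward plus an exploration bonus (c is an assumed constant)."""
    if child.visits == 0:
        return float("inf")
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def uct_search(alphabet, length, reward_fn, iterations, seed=0):
    """Vanilla, single-threaded UCT over fixed-length token sequences (toy setting)."""
    rng = random.Random(seed)
    root = Node(prefix=())
    best_seq, best_reward = None, float("-inf")
    for _ in range(iterations):
        # 1. Selection: descend through fully expanded nodes, picking max UCB1.
        node = root
        while len(node.prefix) < length and len(node.children) == len(alphabet):
            parent_visits = node.visits
            node = max(node.children.values(),
                       key=lambda ch: ucb1(ch, parent_visits))
        # 2. Expansion: attach one untried token as a new child.
        if len(node.prefix) < length:
            untried = [t for t in alphabet if t not in node.children]
            token = rng.choice(untried)
            node.children[token] = Node(node.prefix + (token,), parent=node)
            node = node.children[token]
        # 3. Rollout: complete the sequence with uniformly random tokens.
        tail = tuple(rng.choice(alphabet) for _ in range(length - len(node.prefix)))
        seq = node.prefix + tail
        reward = reward_fn(seq)
        if reward > best_reward:
            best_seq, best_reward = seq, reward
        # 4. Backpropagation: update statistics from the new node up to the root.
        # The next iteration's selection reads these updated values, so the
        # iterations cannot simply be run concurrently.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return best_seq, best_reward


if __name__ == "__main__":
    # Hypothetical toy objective: fraction of "C" tokens in a length-4 sequence.
    def toy_reward(s):
        return s.count("C") / len(s)

    print(uct_search(("C", "O", "N"), 4, toy_reward, iterations=2000))
```

Because step 4 of one iteration must finish before step 1 of the next reads the updated statistics, running many iterations at once requires changing the algorithm itself rather than simply adding workers.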


This work was done while all the authors were at RIKEN.

