PRACTICAL MASSIVELY PARALLEL MONTE-CARLO TREE SEARCH APPLIED TO MOLECULAR DESIGN

Abstract

It is common practice to use large computational resources to train neural networks, as seen in many examples such as reinforcement learning applications. However, while massively parallel computing is often used for training models, it is rarely used to search for solutions to combinatorial optimization problems. This paper proposes a novel massively parallel Monte-Carlo Tree Search (MP-MCTS) algorithm that works efficiently at a scale of 1,000 workers in a distributed-memory environment using multiple compute nodes, and applies it to molecular design. This paper is the first work that applies distributed MCTS to a real-world, non-game problem. Existing work on large-scale parallel MCTS shows efficient scalability in terms of the number of rollouts up to 100 workers, but suffers from degradation in the quality of the solutions. MP-MCTS maintains the search quality at a larger scale. By running MP-MCTS on 256 CPU cores for only 10 minutes, we obtained candidate molecules with scores similar to those of non-parallel MCTS running for 42 hours. Moreover, our results based on parallel MCTS (combined with a simple RNN model) significantly outperform existing state-of-the-art work. Our method is generic and is expected to speed up other applications of MCTS.

1. INTRODUCTION

A survey paper on MCTS published in 2012 cited 240 papers, including many game and non-game applications (Browne et al., 2012). Since the invention in 2006 of Upper Confidence bound applied to Trees (UCT) (Kocsis & Szepesvári, 2006), the most representative MCTS algorithm, MCTS has shown remarkable performance on various problems. Recently, its successful combination with Deep Neural Networks (DNN) in computer Go by AlphaGo (Silver et al., 2016) has brought MCTS into the spotlight. Combining MCTS and DNN is becoming one of the standard tools for solving decision-making or combinatorial optimization problems, so there is significant demand for parallel MCTS. However, in contrast to the enormous computing resources invested in training DNN models in many recent studies, MCTS is rarely parallelized at large scale.
Parallelizing MCTS/UCT is notoriously challenging. For example, UCT follows four steps: selection, expansion, rollout, and backpropagation. Non-parallel vanilla UCT updates (backpropagates) the values in the tree nodes after each rollout, and the behavior of subsequent selection steps depends on the results of the previous rollouts and backpropagations. Therefore, there is no apparent parallelism in the algorithm. Using the virtual-loss technique (explained in Section 2.3), MCTS has been efficiently parallelized in shared-memory single-machine environments, where the number of CPU cores is limited (Chaslot et al., 2008; Enzenberger & Müller, 2010; Segal, 2010). However, there is limited research on large-scale parallel MCTS in distributed-memory environments. Only two approaches scale efficiently in distributed-memory environments, and these were validated only in terms of the number of rollouts; the actual improvement in solution quality was not validated (Yoshizoe et al., 2011; Graf et al., 2011).
Recently, the combination of (non-parallel) MCTS and DNN has been applied to molecular design problems, which aim to find new chemical compounds with desired properties (Yang et al., 2017; Sumita et al., 2018), utilizing the ability of MCTS to solve single-agent problems. In general, designing novel molecules can be formulated as a combinatorial optimization or planning problem of finding the optimal solutions in a vast chemical space (of $10^{23}$ to $10^{60}$; Polishchuk et al., 2013), and can be tackled with combinations of deep generative models and search (Kusner et al., 2017; Gómez-Bombarelli et al., 2018; Jin et al., 2018; Popova et al., 2018; 2019; Yang et al., 2017). However, there are no previous studies on massively parallel MCTS for molecular design. In this paper, we propose a novel distributed parallel MCTS and apply it to the molecular design problem. This is the first work to explore the viability of distributed parallel MCTS in molecular design. Our experimental results show that a simple RNN model combined with massively parallel MCTS outperforms existing work that uses more complex models combined with Bayesian Optimization or Reinforcement Learning (other than UCT).

2. BACKGROUND

2.1 (NON-PARALLEL) MCTS

In 2006, Kocsis and Szepesvári proposed UCT, based on the Multi-Armed Bandit algorithm UCB1 (Auer et al., 2002), which is the first MCTS algorithm with a proof of convergence to the optimal solution. It has shown good performance on many problems, including the game of Go (Gelly et al., 2006). One round of UCT consists of four steps, as shown in Fig. 1. UCT repeats the rounds a given number of times or until a specified time has elapsed.
Selection: The algorithm starts from the root node and repeatedly selects the child with the highest UCB1 value (Fig. 2, left) until it reaches a leaf node. For each child $i$, $v_i$ is the number of visits, $w_i$ is the cumulative reward, and $V$ is the number of visits at the parent node. The exploration constant $C$ controls the behavior of UCT: the smaller it is, the more selective the search; the greater it is, the more explorative.
Expansion: If the number of visits to the leaf node exceeds a given threshold, the algorithm expands the leaf node (adds the children of the leaf to the tree) and selects one of the new children for simulation. Otherwise, it does nothing.
Simulation: UCT then evaluates the node by a simulation. This step is often called a playout or rollout. A simple example of a rollout is to go down the tree by selecting a random child at each node until a terminal node is reached, and to return the value at the terminal node as the reward $r$ (win or loss for games). Replacing the rollout with a DNN-based evaluation is becoming more popular following the success of AlphaGo.
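The round structure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy problem (building a bit string of length `DEPTH` whose reward is the fraction of ones), the names `Node`, `ucb1`, `uct_round`, and `uct_search`, and the default parameter values are all assumptions made for illustration. The UCB1 score uses the common form $w_i/v_i + C\sqrt{2\ln V / v_i}$; the exact constants in Fig. 2 may differ. In molecular design, the random rollout would instead complete a partial molecule, for example with an RNN.

```python
import math
import random

DEPTH = 8  # hypothetical toy problem: build a bit string of this length


class Node:
    def __init__(self, state, parent=None):
        self.state = state    # tuple of bits chosen so far
        self.parent = parent
        self.children = []
        self.visits = 0       # v_i: number of visits
        self.reward = 0.0     # w_i: cumulative reward


def actions(state):
    """Legal moves; empty at a terminal state."""
    return [] if len(state) == DEPTH else [0, 1]


def ucb1(child, parent_visits, c):
    """Common UCB1 form (assumed): mean reward plus exploration bonus."""
    if child.visits == 0:
        return math.inf  # try unvisited children first
    mean = child.reward / child.visits
    return mean + c * math.sqrt(2.0 * math.log(parent_visits) / child.visits)


def rollout(state, rng):
    """Random simulation to a terminal state; reward is the fraction of ones."""
    bits = list(state)
    while len(bits) < DEPTH:
        bits.append(rng.choice([0, 1]))
    return sum(bits) / DEPTH


def uct_round(root, c, expand_threshold, rng):
    # Selection: follow the highest-UCB1 child until a leaf is reached.
    node = root
    while node.children:
        parent = node
        node = max(parent.children, key=lambda ch: ucb1(ch, parent.visits, c))
    # Expansion: only if the leaf has been visited more than the threshold.
    if node.visits > expand_threshold and actions(node.state):
        node.children = [Node(node.state + (a,), node) for a in actions(node.state)]
        node = rng.choice(node.children)
    # Simulation (rollout) from the selected node.
    r = rollout(node.state, rng)
    # Backpropagation: update visit counts and rewards up to the root.
    while node is not None:
        node.visits += 1
        node.reward += r
        node = node.parent
    return r


def uct_search(n_rounds, c=1.0, expand_threshold=0, seed=0):
    rng = random.Random(seed)
    root = Node(())
    best = 0.0
    for _ in range(n_rounds):
        best = max(best, uct_round(root, c, expand_threshold, rng))
    return root, best


if __name__ == "__main__":
    root, best = uct_search(500)
    print("root visits:", root.visits, "best reward seen:", best)
```

Because each selection step reads the statistics written by the previous backpropagation, the rounds in `uct_search` are inherently sequential, which is the dependency that makes parallelization difficult.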

Figure 1: Four steps of (non-parallel) MCTS, with simulation for molecular design.

This work was done while all the authors were at RIKEN.

