EMBEDDING A RANDOM GRAPH VIA GNN: MEAN-FIELD INFERENCE THEORY AND RL APPLICATIONS TO NP-HARD MULTI-ROBOT/MACHINE SCHEDULING

Abstract

We develop a theory for embedding a random graph using graph neural networks (GNNs) and illustrate its capability to solve NP-hard scheduling problems. We apply the theory to the challenge of developing a near-optimal learning algorithm for the NP-hard problem of scheduling multiple robots/machines with time-varying rewards. In particular, we consider a class of reward collection problems called Multi-Robot Reward Collection (MRRC). MRRC problems model ride-sharing, pickup-and-delivery, and a variety of related problems well. We consider the classic identical parallel machine scheduling problem (IPMS) in the Appendix. For the theory, we first observe that the MRRC system state can be represented as an extension of probabilistic graphical models (PGMs), which we refer to as random PGMs. We then develop a mean-field inference method for random PGMs. We prove that a simple modification of a typical GNN embedding is sufficient to embed a random graph even when the edge presence probabilities are interdependent. Our theory enables a two-step hierarchical inference for precise and transferable Q-function estimation for MRRC and IPMS. For scalable computation, we show that the transferability of Q-function estimation enables us to design a polynomial-time algorithm with a 1 − 1/e optimality bound. Experimental results on solving NP-hard MRRC problems (and IPMS in the Appendix) highlight the near-optimality and transferability of the proposed methods.
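To make the idea of embedding a random graph more concrete, the following minimal sketch runs a structure2vec-style fixed-point iteration in which each node's neighbour messages are weighted by edge presence probabilities. The function name `embed_graph`, the weight shapes, the ReLU nonlinearity, and the fixed iteration count are illustrative assumptions; this is not the authors' construction, which according to the abstract also covers interdependent edge probabilities. With a 0/1 adjacency matrix, the update reduces to the familiar deterministic recursion μ_i ← ReLU(x_i W1 + Σ_{j∈N(i)} μ_j W2).

```python
import numpy as np


def embed_graph(
    x: np.ndarray,          # (n, d_in) node features
    edge_prob: np.ndarray,  # (n, n) edge presence probabilities (0/1 for a fixed graph)
    w1: np.ndarray,         # (d_in, d) feature weights (hypothetical parameterization)
    w2: np.ndarray,         # (d, d) message weights (hypothetical parameterization)
    n_iter: int = 4,        # number of fixed-point iterations (assumed, not tuned)
) -> np.ndarray:
    """structure2vec-style embedding where messages are weighted by edge probability."""
    mu = np.zeros((x.shape[0], w1.shape[1]))
    for _ in range(n_iter):
        # Row i of edge_prob @ mu is sum_j p_ij * mu_j: a probability-weighted neighbour sum.
        messages = edge_prob @ mu
        # ReLU(x_i W1 + messages_i W2), applied to all nodes at once.
        mu = np.maximum(0.0, x @ w1 + messages @ w2)
    return mu


rng = np.random.default_rng(0)
n = 5
x = rng.normal(size=(n, 3))
w1 = rng.normal(size=(3, 8))
w2 = 0.1 * rng.normal(size=(8, 8))
p = rng.random((n, n))
np.fill_diagonal(p, 0.0)  # no self-loops
mu = embed_graph(x, p, w1, w2)
```

Because the aggregation is linear, weighting messages by p_ij is equivalent to running the deterministic update on the expected adjacency matrix. This treats each edge probability as a fixed weight and does not model correlations between edges explicitly.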

1. INTRODUCTION

Consider a set of identical robots seeking to serve a set of spatially distributed tasks. Each task is given an initial age, which then increases linearly in time. When service is complete, a reward is given according to a predetermined reward rule, with greater rewards given to younger tasks. We focus on NP-hard scheduling problems with constraints such as 'no two robots may be assigned to the same task at once'. Such problems are prevalent in operations research, e.g., dispatching vehicles to serve customers in a city or scheduling machines in a factory. Impossibility results for distributed consensus under asynchronous communication¹ (Fischer et al., 1985) make these problems inherently centralized.

Learning-based scheduling methods for single-robot NP-hard problems. structure2vec (Dai et al., 2016) is a popular graph neural network (GNN) derived from the fixed-point iteration of PGM-based mean-field inference. Recently, Dai et al. (2017) showed that structure2vec can construct a solution to the Traveling Salesman Problem (TSP). A partial TSP solution was treated as an intermediate state, and this state was represented using a heuristically constructed probabilistic graphical model (PGM). The GNN was used to infer the Q-function, which was then exploited to select the next assignment. Although their choice of PGM was entirely heuristic, their approach achieved near-optimality, and their trained single-robot scheduling algorithm transferred to new single-robot scheduling problems with an unseen number of tasks. These successes were restricted to single-robot problems, except for special cases in which the problem can be modeled as a variant of single-robot TSP via multiple successive journeys of a single robot, cf. Nazari et al. (2018); Kool et al. (2018).
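A toy simulator makes the reward-collection setting concrete: tasks age linearly, younger tasks pay more, and no task may be assigned to two robots. Every specific choice in this sketch is an assumption for illustration. Robots move at unit speed on a plane, service is instantaneous, and the hypothetical reward rule pays `decay ** age` at completion. The exhaustive search `brute_force_best` (a hypothetical helper) shows why exact optimization does not scale: it enumerates n! · m^n candidate schedules for n tasks and m robots.

```python
import itertools
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def task_reward(age: float, decay: float = 0.9) -> float:
    """Hypothetical reward rule: a smaller age at completion yields a larger reward."""
    return decay ** age


def collected_reward(
    robot_starts: Sequence[Point],
    task_pos: Sequence[Point],
    task_age0: Sequence[float],
    routes: Sequence[Sequence[int]],
    decay: float = 0.9,
) -> float:
    """Total reward when robot k serves the tasks in routes[k], in order."""
    served = [t for route in routes for t in route]
    if len(served) != len(set(served)):
        # Constraint: a task cannot be assigned to more than one robot.
        raise ValueError("a task may be assigned to at most one robot")
    total = 0.0
    for start, route in zip(robot_starts, routes):
        pos, clock = start, 0.0
        for t in route:
            clock += math.dist(pos, task_pos[t])  # unit speed: time equals distance
            pos = task_pos[t]
            # Age grows linearly: initial age plus elapsed time at completion.
            total += task_reward(task_age0[t] + clock, decay)
    return total


def brute_force_best(
    robot_starts: Sequence[Point],
    task_pos: Sequence[Point],
    task_age0: Sequence[float],
    decay: float = 0.9,
) -> Tuple[float, Optional[List[List[int]]]]:
    """Exhaustive search over task orderings and robot labels (tiny instances only)."""
    n_tasks, n_robots = len(task_pos), len(robot_starts)
    best_value, best_routes = -math.inf, None
    for perm in itertools.permutations(range(n_tasks)):
        for labels in itertools.product(range(n_robots), repeat=n_tasks):
            # Robot r serves, in permutation order, the tasks labelled r.
            routes = [[t for t, k in zip(perm, labels) if k == r] for r in range(n_robots)]
            value = collected_reward(robot_starts, task_pos, task_age0, routes, decay)
            if value > best_value:
                best_value, best_routes = value, routes
    return best_value, best_routes


best_value, best_routes = brute_force_best(
    robot_starts=[(0.0, 0.0), (10.0, 0.0)],
    task_pos=[(1.0, 0.0), (9.0, 0.0)],
    task_age0=[0.0, 0.0],
)
```

With robots at (0, 0) and (10, 0) and fresh tasks at (1, 0) and (9, 0), the search assigns each task to its nearby robot and collects 0.9 + 0.9 = 1.8. Learned methods such as the GNN-based Q-function estimation discussed above aim to avoid this exponential enumeration.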



¹ Due to this limitation, multi-agent (decentralized) methods are rarely used in industry (e.g., in factories).

