LEARNING TO CROSS EXCHANGE TO SOLVE MIN-MAX VEHICLE ROUTING PROBLEMS

Abstract

CROSS exchange (CE), a meta-heuristic that solves various vehicle routing problems (VRPs), improves VRP solutions by swapping sub-tours between vehicles. Inspired by CE, we propose Neuro CE (NCE), a fundamental operator of a learned meta-heuristic, to solve various min-max VRPs while overcoming the main limitation of CE, i.e., its expensive O(n^4) search cost. NCE employs a graph neural network to predict the cost-decrements (i.e., the results of CE searches) and uses the predicted cost-decrements to guide the selection of sub-tours for swapping, reducing the search cost to O(n^2). As the learning objective of NCE is to predict the cost-decrement, training can be done in a supervised fashion, and the training samples can be easily collected. Despite the simplicity of NCE, numerical results show that NCE trained on the min-max flexible multi-depot VRP (min-max FMDVRP) outperforms the meta-heuristic baselines. More importantly, it significantly outperforms the neural baselines when solving distinct special cases of min-max FMDVRP (e.g., min-max MDVRP, min-max mTSP, min-max CVRP) without additional training.
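To illustrate where the O(n^4) search cost of CE comes from, the following is a minimal Python sketch of an exhaustive CROSS exchange between two tours under a min-max objective. It is not the authors' implementation: the function names (tour_length, cross_exchange), the single-depot setting, and the toy coordinates are assumptions made for illustration only. The four nested loops enumerate every pair of sub-tours, one from each tour.

```python
import math


def tour_length(tour, coords):
    """Length of a tour given as a list of node ids visited in order."""
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(tour, tour[1:]))


def cross_exchange(tour_a, tour_b, coords):
    """Exhaustive CROSS exchange between two tours (illustrative sketch).

    Each tour starts and ends at a depot that is never moved. Every pair of
    (possibly empty) sub-tours tour_a[i:j] and tour_b[k:l] is tried, and the
    swap with the smallest maximum tour length is returned.
    """
    best_a, best_b = tour_a, tour_b
    best_cost = max(tour_length(tour_a, coords), tour_length(tour_b, coords))
    # Four nested loops over sub-tour boundaries: O(n^4) candidate swaps.
    for i in range(1, len(tour_a)):
        for j in range(i, len(tour_a)):
            for k in range(1, len(tour_b)):
                for l in range(k, len(tour_b)):
                    new_a = tour_a[:i] + tour_b[k:l] + tour_a[j:]
                    new_b = tour_b[:k] + tour_a[i:j] + tour_b[l:]
                    cost = max(tour_length(new_a, coords),
                               tour_length(new_b, coords))
                    if cost < best_cost - 1e-12:
                        best_a, best_b, best_cost = new_a, new_b, cost
    return best_a, best_b, best_cost


# Toy instance (assumed): node 0 is the depot, nodes 1-4 are customers.
coords = {0: (0, 0), 1: (1, 0), 2: (2, 0), 3: (-1, 0), 4: (-2, 0)}
tour_a = [0, 1, 2, 3, 4, 0]   # one vehicle serves every customer
tour_b = [0, 0]               # the other vehicle stays at the depot
new_a, new_b, new_cost = cross_exchange(tour_a, tour_b, coords)
print(new_a, new_b, new_cost)
```

In this toy instance the initial makespan is 8 (one vehicle does all the work), and the best swap splits the customers so that each tour has length 4. The sketch recomputes tour lengths from scratch for every candidate, which adds a further factor on top of the O(n^4) enumeration; a practical implementation would evaluate each swap incrementally.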

1. INTRODUCTION

The field of neural combinatorial optimization (NCO), an emerging research area at the intersection of operations research and artificial intelligence, aims to train effective solvers for various combinatorial optimization (CO) problems, such as the traveling salesman problem (TSP) (Bello et al., 2016; Khalil et al., 2017; Nazari et al., 2018; Kool et al., 2018; Kwon et al., 2020), vehicle routing problems (VRPs) (Bello et al., 2016; Khalil et al., 2017; Nazari et al., 2018; Kool et al., 2018; Kwon et al., 2020; Hottung & Tierney, 2019; Lu et al., 2019; da Costa et al., 2021), and vertex covering problems (Khalil et al., 2017; Li et al., 2018; Guo et al., 2019). As NCO tackles NP-hard problems using various state-of-the-art (SOTA) deep learning techniques, it is considered an important research area in artificial intelligence. At the same time, NCO is an important field from a practical point of view because it can solve complex real-world problems. The current study mainly focuses on VRPs, a class of CO problems. The majority of learning-based VRP solvers learn either to improve a current solution into a better one (i.e., improvement heuristics) (Hottung & Tierney, 2019; Lu et al., 2019; da Costa et al., 2021) or to construct a solution sequentially (i.e., construction heuristics) (Bello et al., 2016; Khalil et al., 2017; Nazari et al., 2018; Kool et al., 2018; Kwon et al., 2020; Park et al., 2021; Cao et al., 2021). To learn such solvers, learning-based methods employ either supervised learning (SL), which imitates the solutions of verified solvers, or reinforcement learning (RL), which learns a solver from the generated routes. Most NCO studies focus on the well-established "min-sum VRP," which aims to minimize the total traveling distance of the vehicles, possibly because the benchmark problems and baseline algorithms are set up for the "min-sum VRP."
On the other hand, VRPs with different objectives have not received much attention from the NCO community, even though they can model various practical scenarios. For example, the "min-max VRP," which aims to minimize the total completion time (i.e., the makespan, the time at which the last vehicle finishes) of various time-critical distributed tasks (e.g., vaccine delivery, grocery delivery), has not been widely considered.

* Equal contribution
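The difference between the min-sum and min-max objectives can be made concrete with a small Python sketch. The helper names (route_length, min_sum_cost, min_max_cost) and the two-customer instance are assumptions chosen for illustration; they do not come from the benchmark problems discussed here.

```python
import math


def route_length(route, coords):
    """Length of a route given as a list of node ids visited in order."""
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(route, route[1:]))


def min_sum_cost(routes, coords):
    """Min-sum objective: total distance traveled by all vehicles."""
    return sum(route_length(r, coords) for r in routes)


def min_max_cost(routes, coords):
    """Min-max objective: length of the longest route (the makespan)."""
    return max(route_length(r, coords) for r in routes)


# Toy instance (assumed): node 0 is the depot, nodes 1 and 2 are customers.
coords = {0: (0, 0), 1: (3, 0), 2: (0, 4)}
single = [[0, 1, 2, 0], [0, 0]]   # one vehicle serves both customers
split = [[0, 1, 0], [0, 2, 0]]    # each vehicle serves one customer

for name, routes in (("single", single), ("split", split)):
    print(name, min_sum_cost(routes, coords), min_max_cost(routes, coords))
```

Here the single-vehicle solution has the smaller total distance (12 versus 14), so a min-sum solver prefers it, whereas the split solution has the smaller makespan (8 versus 12), so a min-max solver prefers that one. The two objectives can therefore rank the same pair of solutions in opposite order.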

