LEARNING TO CROSS EXCHANGE TO SOLVE MIN-MAX VEHICLE ROUTING PROBLEMS

Abstract

CROSS exchange (CE), a meta-heuristic that solves various vehicle routing problems (VRPs), improves VRP solutions by swapping sub-tours between vehicles. Inspired by CE, we propose Neuro CE (NCE), a fundamental operator of a learned meta-heuristic, to solve various min-max VRPs while overcoming the main limitation of CE, i.e., its expensive $O(n^4)$ search cost. NCE employs a graph neural network to predict the cost decrements (i.e., the results of CE searches) and uses the predicted cost decrements to guide the selection of sub-tours for swapping, which reduces the search cost to $O(n^2)$. As the learning objective of NCE is to predict the cost decrement, training can be done in a simple supervised fashion, and the training samples can be collected easily. Despite the simplicity of NCE, numerical results show that NCE trained on the min-max flexible multi-depot VRP (min-max FMDVRP) outperforms the meta-heuristic baselines. More importantly, it significantly outperforms the neural baselines when solving distinctive special cases of min-max FMDVRP (e.g., min-max MDVRP, min-max mTSP, min-max CVRP) without additional training.

1. INTRODUCTION

The field of neural combinatorial optimization (NCO), an emerging research area at the intersection of operations research and artificial intelligence, aims to train effective solvers for various combinatorial optimization (CO) problems, such as the traveling salesman problem (TSP) (Bello et al., 2016; Khalil et al., 2017; Nazari et al., 2018; Kool et al., 2018; Kwon et al., 2020), vehicle routing problems (VRPs) (Bello et al., 2016; Khalil et al., 2017; Nazari et al., 2018; Kool et al., 2018; Kwon et al., 2020; Hottung & Tierney, 2019; Lu et al., 2019; da Costa et al., 2021), and vertex covering problems (Khalil et al., 2017; Li et al., 2018; Guo et al., 2019). As NCO tackles NP-hard problems using various state-of-the-art (SOTA) deep learning techniques, it is considered an important research area in artificial intelligence. At the same time, NCO is important from a practical point of view because it can solve complex real-world problems.

The current study mainly focuses on VRPs, a type of CO problem. The majority of learning-based VRP solvers learn either to improve the current solution to obtain a better one (i.e., improvement heuristics) (Hottung & Tierney, 2019; Lu et al., 2019; da Costa et al., 2021) or to construct a solution sequentially (i.e., construction heuristics) (Bello et al., 2016; Khalil et al., 2017; Nazari et al., 2018; Kool et al., 2018; Kwon et al., 2020; Park et al., 2021; Cao et al., 2021). To learn such solvers, learning-based methods employ either supervised learning (SL), which imitates the solutions of verified solvers, or reinforcement learning (RL), which learns a solver from generated routes. Most NCO studies focus on the well-established "min-sum VRP," which aims to minimize the total traveling distance of the vehicles, possibly because the benchmark problems and baseline algorithms are set up for the "min-sum VRP."
On the other hand, VRPs with different objectives have not received much attention from the NCO community, even though they can model various practical scenarios. For example, the "min-max VRP," which aims to minimize the total completion time (i.e., makespan) of various time-critical distributed tasks (e.g., vaccine delivery, grocery delivery), has not been widely considered.

This study aims to learn a fundamental and universal operator that can effectively solve various practical "min-max VRPs" that have flexible depot constraints. To design a universal and simple, yet powerful operator, we utilize CROSS-exchange (CE) (Taillard et al., 1997), a local search designed to conduct the inter-operation of two routes (i.e., swapping the sub-tours of two selected routes) to reduce the traveling cost. We noticed that the inter-operation of CE is especially effective in improving the solution quality of "min-max VRPs" because it can consider the interaction among multiple vehicles and effectively reduce the differences between the traveling distances of the vehicles. However, the search cost for selecting the sub-tours from two trajectories is $O(n^4)$, where $n$ is the tour length. This makes CE inapplicable to large-scale VRPs.

In this paper, we propose Neuro CE (NCE), which conducts the CE operation effectively with significantly lower computational complexity. NCE amortizes the search for the ending nodes of the sub-tours by employing a graph neural network (GNN) that predicts the best cost decrement given two starting nodes from two given trajectories. Using this prediction, NCE searches over only the promising starting nodes. Hence, the proposed NCE has $O(n^2)$ search complexity. Furthermore, unlike other SL or RL approaches, the prediction targets of NCE are not entire VRP solutions but the cost decrements of the CE operations (true target operator), which makes the collection of training data simple and easy.
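To make the $O(n^4)$ search cost of CE concrete, the following is a minimal Python sketch of an exhaustive CROSS exchange between two tours. It is an illustration under stated assumptions, not the authors' implementation: the function names (`tour_cost`, `cross_exchange`), the Euclidean cost, the convention that the first and last node of each tour are fixed depots, the possibility of empty sub-tours, and the use of the larger of the two tour costs as the pairwise objective are all choices made here for illustration.

```python
import math
from itertools import combinations_with_replacement


def tour_cost(tour, coords):
    """Euclidean length of a tour given as a list of node ids."""
    return sum(math.dist(coords[u], coords[v]) for u, v in zip(tour, tour[1:]))


def cross_exchange(t1, t2, coords):
    """Exhaustively try every sub-tour swap between t1 and t2.

    Returns (best_decrement, new_t1, new_t2). The first and last node of
    each tour are treated as depots and never moved (an assumption of this
    sketch). Two cut points per tour give O(n^4) candidate swaps; each
    candidate is re-scored from scratch here for clarity, which adds
    another factor of n to the running time of this naive version.
    """
    base = max(tour_cost(t1, coords), tour_cost(t2, coords))
    best = (0.0, t1, t2)
    # Segment t[a:b] with 1 <= a <= b <= len(t) - 1; a == b is an empty segment.
    cuts1 = list(combinations_with_replacement(range(1, len(t1)), 2))
    cuts2 = list(combinations_with_replacement(range(1, len(t2)), 2))
    for a1, b1 in cuts1:
        for a2, b2 in cuts2:
            n1 = t1[:a1] + t2[a2:b2] + t1[b1:]
            n2 = t2[:a2] + t1[a1:b1] + t2[b2:]
            dec = base - max(tour_cost(n1, coords), tour_cost(n2, coords))
            if dec > best[0]:
                best = (dec, n1, n2)
    return best


# Hypothetical toy instance: node 0 is the depot, nodes 1..5 are customers.
coords = {0: (0, 0), 1: (1, 0), 2: (2, 0), 3: (-1, 0), 4: (-2, 0), 5: (0, 1)}
t1 = [0, 1, 2, 3, 4, 0]
t2 = [0, 5, 0]
dec, n1, n2 = cross_exchange(t1, t2, coords)
print(dec, n1, n2)
```

In this sketch, the two outer loop variables `a1` and `a2` play the role of the starting nodes, and the inner choices `b1` and `b2` are the ending nodes. As described above, NCE replaces the search over the ending nodes with a learned prediction of the best cost decrement for each pair of starting nodes, which leaves only the $O(n^2)$ enumeration of starting-node pairs.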
The contributions of this study are summarized as follows:

• Generalizability/Transferability: NCE learns a fundamental and universal operator, so it can solve various complex min-max VRPs without retraining for each type of VRP.

• Trainability: The NCE operator is trained in a supervised manner on a dataset composed of tour pairs and cost decrements, which makes the collection of training data easy and simple.

• Practicality/Performance: We evaluate NCE on various types of min-max VRPs, including flexible multi-depot VRP (min-max FMDVRP), multi-depot VRP (min-max MDVRP), multiple TSP (min-max mTSP), and capacitated VRP (min-max CVRP). Extensive numerical experiments validate that NCE outperforms SOTA meta-heuristics and NCO baselines in solving various min-max VRPs, even though NCE is trained only with data from min-max FMDVRP.

2. PRELIMINARIES

This section introduces the target problem, min-max flexible multi-depot VRP (min-max FMDVRP), and the CE operator, a powerful local-search heuristic that solves min-max FMDVRP.

2.1. MIN-MAX FLEXIBLE MULTI-DEPOT VRP

Min-max FMDVRP is a generalization of VRP that aims to find the coordinated routes of multiple vehicles with multiple depots. The flexibility allows vehicles to return to any depot regardless of their starting depots. The min-max FMDVRP is formulated as follows:

$$\min_{\pi \in S(P)} \max_{i \in V} C(\tau_i)$$

where $P$ is the description of the min-max FMDVRP instance that is composed of a set of vehicles $V$, $S(P)$ is the set of solutions that satisfy the constraints of the min-max FMDVRP (i.e., feasible solutions), and $\pi = \{\tau_i\}_{i \in V}$ is a solution of the min-max FMDVRP. The tour $\tau_i = [N_1, N_2, \ldots, N_{l(i)}]$



Figure 1: The overall procedure of the improvement heuristic that uses CE as the inter-operation.
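As a small worked example of the min-max objective in Section 2.1, the sketch below evaluates $\max_{i \in V} C(\tau_i)$ for a fixed solution. The names (`tour_cost`, `makespan`), the Euclidean cost, and the toy instance are assumptions made for illustration and are not taken from the paper. The first vehicle starts at depot "A" and ends at depot "B", which is the kind of route the flexible depot constraint permits.

```python
import math


def tour_cost(tour, coords):
    """Euclidean length of a tour given as a list of node ids."""
    return sum(math.dist(coords[u], coords[v]) for u, v in zip(tour, tour[1:]))


def makespan(solution, coords):
    """Min-max objective value: the cost of the most expensive tour."""
    return max(tour_cost(tour, coords) for tour in solution.values())


# Hypothetical instance: two depots ("A", "B") and three customers (1, 2, 3).
coords = {"A": (0, 0), "B": (4, 0), 1: (1, 0), 2: (3, 0), 3: (4, 3)}
solution = {
    "v1": ["A", 1, 2, "B"],  # starts at depot A, ends at depot B
    "v2": ["B", 3, "B"],     # starts and ends at depot B
}
print(makespan(solution, coords))
```

Here vehicle v1 travels 1 + 2 + 1 = 4 and vehicle v2 travels 3 + 3 = 6, so the objective value of this solution is 6. A min-max solver looks for the feasible solution that makes this maximum as small as possible, rather than minimizing the sum 4 + 6.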

