LEARNING TO CROSS EXCHANGE TO SOLVE MIN-MAX VEHICLE ROUTING PROBLEMS

Abstract

CROSS exchange (CE), a meta-heuristic operator for solving various vehicle routing problems (VRPs), improves VRP solutions by swapping the sub-tours of the vehicles. Inspired by CE, we propose Neuro CE (NCE), a fundamental operator of a learned meta-heuristic, to solve various min-max VRPs while overcoming the main limitation of CE, i.e., the expensive O(n^4) search cost. NCE employs a graph neural network to predict the cost decrements (i.e., the results of CE searches) and uses the predicted cost decrements to guide the selection of sub-tours for swapping, reducing the search cost to O(n^2). As the learning objective of NCE is to predict the cost decrement, training can be done in a simple supervised fashion with easily collected training samples. Despite the simplicity of NCE, numerical results show that NCE trained on the min-max flexible multi-depot VRP (min-max FMDVRP) outperforms the meta-heuristic baselines. More importantly, it significantly outperforms the neural baselines when solving distinctive special cases of the min-max FMDVRP (e.g., min-max MDVRP, min-max mTSP, min-max CVRP) without additional training.

1. INTRODUCTION

The field of neural combinatorial optimization (NCO), an emerging research area intersecting operations research and artificial intelligence, aims to train effective solvers for various combinatorial optimization (CO) problems, such as the traveling salesman problem (TSP) (Bello et al., 2016; Khalil et al., 2017; Nazari et al., 2018; Kool et al., 2018; Kwon et al., 2020), vehicle routing problems (VRPs) (Bello et al., 2016; Khalil et al., 2017; Nazari et al., 2018; Kool et al., 2018; Kwon et al., 2020; Hottung & Tierney, 2019; Lu et al., 2019; da Costa et al., 2021), and vertex covering problems (Khalil et al., 2017; Li et al., 2018; Guo et al., 2019). As NCO tackles NP-hard problems using various state-of-the-art (SOTA) deep learning techniques, it is considered an important research area in artificial intelligence. At the same time, NCO is important from a practical point of view because it can solve complex real-world problems. The current study mainly focuses on VRPs, a class of CO problems. The majority of learning-based VRP solvers either learn to improve the current solution to obtain a better one (i.e., improvement heuristics) (Hottung & Tierney, 2019; Lu et al., 2019; da Costa et al., 2021) or construct a solution sequentially (i.e., construction heuristics) (Bello et al., 2016; Khalil et al., 2017; Nazari et al., 2018; Kool et al., 2018; Kwon et al., 2020; Park et al., 2021; Cao et al., 2021). To learn such solvers, learning-based methods either employ supervised learning (SL), which imitates the solutions of verified solvers, or reinforcement learning (RL), which learns a solver from generated routes. Most NCO studies focus on the well-established "min-sum VRP" that aims to minimize the total traveling distance of the vehicles, possibly because the benchmark problems and baseline algorithms are set up for the "min-sum VRP."
On the other hand, VRPs with different objectives have not received much attention from the NCO community, even though they can model various practical scenarios. For example, the "min-max VRP," which aims to minimize the total completion time (i.e., makespan) of various time-critical distributed tasks (e.g., vaccine delivery, grocery delivery), has not been widely considered. This study aims to learn a fundamental and universal operator that can effectively solve various practical "min-max VRPs" with flexible depot constraints. To design a universal and simple, yet powerful operator, we utilize CROSS exchange (CE) (Taillard et al., 1997), a local search designed to conduct the inter-operation of two routes (i.e., swapping the sub-tours of two selected routes) to reduce the traveling cost. We noticed that the inter-operation of CE is especially effective in improving the solution quality of "min-max VRPs" because it can consider the interaction among multiple vehicles and effectively reduce the differences between the traveling distances of all vehicles. However, the search cost for selecting the sub-tours from two trajectories is O(n^4), where n is the tour length. This makes CE inapplicable to large-scale VRPs. In this paper, we propose Neuro CE (NCE), which effectively conducts the CE operation with significantly less computational complexity. NCE amortizes the search for the ending nodes of the sub-tours by employing a graph neural network (GNN), which predicts the best cost decrement given two starting nodes from the two given trajectories. NCE searches over only promising starting nodes using this prediction; hence, NCE has O(n^2) search complexity. Furthermore, unlike other SL or RL approaches, the prediction target labels of NCE are not entire solutions of the VRPs but the cost decrements of the CE operations (the true target operator), which makes the collection of training data simple and easy.
The contributions of this study are summarized as follows:
• Generalizability/Transferability: NCE learns a fundamental and universal operator that solves various complex min-max VRPs without retraining for each type of VRP.
• Trainability: The NCE operator is trained in a supervised manner on a dataset of tour pairs and cost decrements, which makes the collection of training data easy and simple.
• Practicality/Performance: We evaluate NCE on various types of min-max VRPs, including the flexible multi-depot VRP (min-max FMDVRP), multi-depot VRP (min-max MDVRP), multiple TSP (min-max mTSP), and capacitated VRP (min-max CVRP). Extensive numerical experiments validate that NCE outperforms SOTA meta-heuristics and NCO baselines in solving various min-max VRPs, even though NCE is trained only with data from the min-max FMDVRP.

2. PRELIMINARIES

This section introduces the target problem, the min-max flexible multi-depot VRP (min-max FMDVRP), and the CE operator, a powerful local-search heuristic for solving the min-max FMDVRP.

2.1. MIN-MAX FLEXIBLE MULTI-DEPOT VRP

Min-max FMDVRP is a generalization of the VRP that aims to find coordinated routes for multiple vehicles with multiple depots. The flexibility allows vehicles to return to any depot regardless of their starting depots. The min-max FMDVRP is formulated as follows:

min_{π∈S(P)} max_{i∈V} C(τ_i)    (1)

where P is the description of the min-max FMDVRP instance, which includes a set of vehicles V; S(P) is the set of solutions that satisfy the constraints of the min-max FMDVRP (i.e., feasible solutions); and π = {τ_i}_{i∈V} is a solution of the min-max FMDVRP. The tour τ_i = [N_1, N_2, ..., N_{l(i)}] of vehicle i is the ordered collection of the tasks visited by vehicle v_i, and C(τ_i) is the cost of τ_i. Min-max FMDVRP can be used to formulate the operation of shared vehicles that can be picked up from or delivered to any depot. The mixed integer linear programming (MILP) formulation of the min-max FMDVRP is provided in Appendix A.3. Classical VRPs are special cases of the FMDVRP: TSP is a VRP with a single vehicle and depot, mTSP is a VRP with multiple vehicles and a single depot, and MDVRP is a VRP with multiple vehicles and depots. Since the FMDVRP is a general problem class, we learn a solver for the FMDVRP and employ it to solve the specific problems (i.e., min-max MDVRP, min-max mTSP, and min-max CVRP) without retraining or fine-tuning. We demonstrate in Section 4 that the proposed method solves these special cases almost optimally without retraining.

Algorithm 1: Neuro CROSS exchange (NCE) for solving the VRP family
  (τ_1, τ_2) ← SelectTours({τ_i}_{i∈|V|})
  (τ'_1, τ'_2) ← NeuroCROSS(τ_1, τ_2, f_θ)  // inter-operation
  τ'_i ← IntraOperation(τ_i), i = 1, 2
  τ_1 ← τ'_1, τ_2 ← τ'_2
  if update then {τ*_i}_{i∈|V|} ← {τ_i}_{i∈|V|}
  if C_per = p then break
  C_per ← C_per + 1
  (τ_1, τ_2) ← ChooseRandomTours
  (τ_1, τ_2) ← RandomExchange(τ_1, τ_2)  // escape from local minima
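As a concrete reference, the min-max objective of Eq. (1) can be sketched in a few lines of Python. This is a minimal illustration; the list-of-coordinates tour representation is our assumption, not the paper's implementation.

```python
import math

def tour_length(tour):
    """Traveling distance l(tau) of one tour, given as a list of (x, y)
    coordinates with the depot included at both ends."""
    return sum(math.dist(tour[i], tour[i + 1]) for i in range(len(tour) - 1))

def makespan(solution):
    """Min-max objective of Eq. (1): the cost of the longest tour
    among all vehicles."""
    return max(tour_length(tour) for tour in solution)
```

For instance, two tours of lengths 6 and 2 yield a makespan of 6; the min-max objective rewards balancing the vehicles' workloads rather than shortening their total travel.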

2.2. CROSS EXCHANGE

CE is a solution-updating operator that iteratively improves the solution until it reaches a satisfactory result (Taillard et al., 1997). CE reduces the overall cost by exchanging the sub-tours of two tours. The CE operator is defined as:

τ'_1, τ'_2 = CROSS(a_1, b_1, a_2, b_2; τ_1, τ_2)    (2)
τ'_1 ≜ τ_1[:a_1] ⊕ τ_2[a_2:b_2] ⊕ τ_1[b_1:]    (3)
τ'_2 ≜ τ_2[:a_2] ⊕ τ_1[a_1:b_1] ⊕ τ_2[b_2:]    (4)

where τ_i and τ'_i are the input and updated tours of vehicle i, respectively; τ_i[a:b] denotes the sub-tour of τ_i ranging from node a to node b; and τ ⊕ τ' denotes the concatenation of tours τ and τ'. For brevity, we assume that nodes a_1 and a_2 come earlier than nodes b_1 and b_2 in τ_1 and τ_2, respectively. CE selects the sub-tours τ_1[a_1:b_1] and τ_2[a_2:b_2] from τ_1 and τ_2 and swaps them to generate the new tours τ'_1 and τ'_2. CE seeks the four points (a_1, b_1, a_2, b_2) that reduce the cost of the tours. For min-max VRPs, we define the cost of the two selected tours as C(τ_1, τ_2) = max(l(τ_1), l(τ_2)), where l(τ_i) is the traveling distance of tour τ_i, and apply the CE operator so that C(τ'_1, τ'_2) ≤ C(τ_1, τ_2), in an attempt to minimize the traveling distance of the longest route. When the full search is naively employed, the search cost is O(n^4), where n is the number of nodes in a tour. Fig. 1 illustrates how improvement heuristics utilize CE to solve the min-max FMDVRP. The improvement heuristics start by generating initial feasible tours using simple heuristics. Then, they repeatedly (1) select two tours, (2) apply the inter-operation (CE) to generate improved tours, and (3) apply the intra-operation to improve each tour independently. The improvement heuristics terminate when no further (local) improvement is possible.
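Eqs. (2)-(4) translate directly into list slicing. The sketch below is our own illustration, assuming tours are plain Python lists of node ids:

```python
def cross(a1, b1, a2, b2, tau1, tau2):
    """CROSS exchange of Eqs. (2)-(4): swap the sub-tours tau1[a1:b1] and
    tau2[a2:b2] between the two tours."""
    new_tau1 = tau1[:a1] + tau2[a2:b2] + tau1[b1:]
    new_tau2 = tau2[:a2] + tau1[a1:b1] + tau2[b2:]
    return new_tau1, new_tau2
```

Note that an empty sub-tour (a = b) is allowed, in which case CE degenerates to relocating a sub-tour from one tour into the other.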
Algorithm 2: Reduced search of NeuroCROSS
  a*_1 ← ∅, a*_2 ← ∅, b*_1 ← ∅, b*_2 ← ∅, y* ← 0
  for ((a_1, a_2), ŷ*(a_1, a_2; τ_1, τ_2)) ∈ S_K do
    (b̂_1, b̂_2) ← argmax_{b_1, b_2} (C(τ_1, τ_2) − C(CROSS(a_1, b_1, a_2, b_2; τ_1, τ_2)))
    y(a_1, a_2; τ_1, τ_2) ← C(τ_1, τ_2) − C(CROSS(a_1, b̂_1, a_2, b̂_2; τ_1, τ_2))
    if y(a_1, a_2; τ_1, τ_2) ≥ y* then
      a*_1 ← a_1, a*_2 ← a_2, b*_1 ← b̂_1, b*_2 ← b̂_2
      y* ← y(a_1, a_2; τ_1, τ_2)
  (τ'_1, τ'_2) ← CROSS(a*_1, b*_1, a*_2, b*_2; τ_1, τ_2)

3. NEURO CROSS EXCHANGE

In this section, we introduce Neuro CROSS exchange (NCE) to solve the min-max FMDVRP and its special cases. The overall procedure of NCE is summarized in Algorithm 1. We briefly explain GetInitialSolution, SelectTours, NeuroCROSS, and IntraOperation, and then provide the details of the proposed NeuroCROSS operation in the following subsections. NCE is particularly designed to improve the solution quality and solving speed of CE when solving min-max VRPs. The components of NCE are as follows:
• GetInitialSolution. We use a multi-agent extension of the greedy assignment heuristic (Dell'Amico et al., 1993) to obtain the initial feasible solutions. The heuristic first clusters the cities into |V| clusters and then applies the greedy assignment to each cluster to get the initial solution.
• SelectTours. Following common practice, we select τ_1 and τ_2 as the tours with the largest and smallest traveling distances (i.e., τ_1 = argmax_{τ_i, i∈V} l(τ_i) and τ_2 = argmin_{τ_i, i∈V} l(τ_i)).
• NeuroCROSS. We utilize the cost-decrement prediction model f_θ(•) and a two-stage search to find a cost-improving tour pair (τ'_1, τ'_2) within an O(n^2) budget. The details of the NCE operation are given in Sections 3.1 and 3.2.
• IntraOperation. For our target VRPs, the intra-operation is equivalent to solving a TSP. We utilize elkai (Dimitrovski) to solve the TSP.
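The interplay of these components (Algorithm 1) can be sketched as follows. This is a schematic under our own assumptions: `neuro_cross`, `intra_op`, `select_tours`, `random_exchange`, and `cost` are hypothetical callables standing in for NeuroCROSS, IntraOperation, SelectTours, RandomExchange, and the min-max cost, and the `update` check is read as "the incumbent improved."

```python
import random

def nce_solve(tours, neuro_cross, intra_op, select_tours, random_exchange,
              cost, p=5, n_iters=1000):
    """Sketch of the NCE improvement loop (Algorithm 1). `p` bounds the number
    of non-improving perturbation rounds before termination."""
    best, c_per = [list(t) for t in tours], 0
    for _ in range(n_iters):
        i, j = select_tours(tours)
        t1, t2 = neuro_cross(tours[i], tours[j])         # inter-operation
        tours[i], tours[j] = intra_op(t1), intra_op(t2)  # per-tour improvement
        if cost(tours) < cost(best):                     # keep the incumbent
            best, c_per = [list(t) for t in tours], 0
        else:
            if c_per == p:
                break
            c_per += 1
            i, j = random.sample(range(len(tours)), 2)   # escape local minima
            tours[i], tours[j] = random_exchange(tours[i], tours[j])
    return best
```

With toy callables (e.g., tours as bags of job weights and cost = max load), the loop monotonically improves the incumbent and stops after `p` fruitless perturbations.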

3.1. NEURO CROSS EXCHANGE OPERATION

The CE operation can be viewed as selecting two pairs of nodes (i.e., the pairs a_1/b_1 and a_2/b_2) from the selected tours τ_1 and τ_2, which typically involves O(n^4) searches. To reduce this high search complexity, NCE utilizes the cost-decrement model f_θ(a_1, a_2; τ_1, τ_2), which predicts the maximum cost decrement given τ_1, τ_2 and the starting nodes a_1 and a_2 of their sub-tours. That is, f_θ(a_1, a_2; τ_1, τ_2) amortizes the search for the ending nodes b_1 and b_2 given (τ_1, τ_2, a_1, a_2), and helps to identify the promising (a_1, a_2) pairs that are likely to improve the tours. After selecting the top K promising (a_1, a_2) pairs using f_θ(a_1, a_2; τ_1, τ_2), whose search cost is O(n^2), NCE then searches for (b_1, b_2) only for those pairs. Overall, the entire search can be done in O(n^2). The following paragraphs detail the procedures of NCE.

Predicting cost decrement. We employ f_θ(a_1, a_2; τ_1, τ_2) (explained in Section 3.2) to predict the optimal cost decrement y* defined as:

y*(a_1, a_2; τ_1, τ_2) = max_{b_1, b_2} (C(τ_1, τ_2) − C(CROSS(a_1, b_1, a_2, b_2; τ_1, τ_2)))    (5)
≈ f_θ(a_1, a_2; τ_1, τ_2)    (6)

where C(τ_1, τ_2) = max(l(τ_1), l(τ_2)). In other words, f_θ(•) predicts the best cost decrement of τ_1 and τ_2 given a_1 and a_2. The real cost-decrement labels are obtained from the real search operation.

Constructing the search candidate set. By training f_θ(•), we can amortize the search for b_1 and b_2. However, this amortization bears prediction errors, which can misguide the entire improvement process. To alleviate this problem, we select the K pairs of (a_1, a_2) that have the K largest predicted decrements ŷ* out of all (a_1, a_2) choices. Intuitively, NCE excludes the less promising (a_1, a_2) pairs while utilizing the prediction model f_θ(•), which can have some errors.

Performing reduced search.
NCE finds the best (b_1, b_2) for each (a_1, a_2) in the search candidate set and selects the sub-tours (a_1, a_2, b_1, b_2) that maximize the actual cost decrement (not the prediction). Unlike the full search of CE, NCE only performs the search over (b_1, b_2), which reduces the search cost from O(n^4) to O(n^2). The detailed procedure of NCE is summarized in Algorithm 2.
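Putting the two stages together, the reduced search can be sketched as follows. This is our own illustrative code, not the authors' implementation; `predict` stands for f_θ restricted to the current tour pair, and `cost` for C(τ_1, τ_2).

```python
from itertools import product

def reduced_search(tau1, tau2, predict, cost, K=10):
    """Two-stage NCE search. Stage 1: rank all O(n^2) starting pairs (a1, a2)
    by the predicted cost decrement and keep the top K. Stage 2: exact search
    over the ending nodes (b1, b2) only for those K pairs, keeping the swap
    with the largest actual (not predicted) decrement."""
    pairs = list(product(range(1, len(tau1)), range(1, len(tau2))))
    top_k = sorted(pairs, key=lambda p: predict(*p), reverse=True)[:K]

    base = cost(tau1, tau2)
    best = (0.0, tau1, tau2)  # keep the original tours if nothing improves
    for a1, a2 in top_k:
        for b1 in range(a1, len(tau1)):
            for b2 in range(a2, len(tau2)):
                t1 = tau1[:a1] + tau2[a2:b2] + tau1[b1:]
                t2 = tau2[:a2] + tau1[a1:b1] + tau2[b2:]
                decrement = base - cost(t1, t2)
                if decrement > best[0]:
                    best = (decrement, t1, t2)
    return best[1], best[2]
```

Stage 2 costs O(K·n^2) in total; with a constant K, the overall budget stays O(n^2) up to the constant factor, as claimed in the text.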

3.2. COST-DECREMENT PREDICTION MODEL

NCE saves computation by employing f_θ(a_1, a_2; τ_1, τ_2) to predict y*(•) from a_1, a_2, τ_1, and τ_2. The overall procedure is illustrated in Fig. 2.

Graph representation of (τ_1, τ_2). We represent a pair of tours (τ_1, τ_2) as a directed complete graph G = (N, E), where N = τ_1 ∪ τ_2 (i.e., the i-th node n_i of G is either a city or a depot of the tours, and e_ij is the edge from n_i to n_j). G has the following node and edge features:
• x_i = [coord(n_i), 1_depot(n_i)], where coord(n_i) is the 2D Euclidean coordinate of n_i, and 1_depot(n_i) indicates whether n_i is a depot.
• x_ij = [dist(n_i, n_j)], where dist(n_i, n_j) is the 2D Euclidean distance between n_i and n_j.

Graph embedding with an attentive graph neural network. We employ an attentive variant of the graph-network (GN) block (Battaglia et al., 2018) to embed G. The attentive embedding layer is defined as follows:

h'_ij = ϕ_e(h_i, h_j, h_ij, x_ij)    (7)
z_ij = ϕ_w(h_i, h_j, h_ij, x_ij)    (8)
w_ij = softmax({z_ij}_{j∈N(i)})    (9)
h'_i = ϕ_n(h_i, Σ_{j∈N(i)} w_ij h'_ij)    (10)

where h_i and h_ij are the node and edge embeddings, respectively; ϕ_e, ϕ_w, and ϕ_n are the multilayer perceptron (MLP)-parameterized edge, attention, and node operators, respectively; and N(i) is the neighbor set of n_i. We stack H embedding layers to compute the final node embeddings {h_i^(H) | n_i ∈ N} and edge embeddings {h_ij^(H) | e_ij ∈ E}.

Cost-decrement prediction. Based on the computed embeddings, the cost prediction module ϕ_c predicts y*(a_1, a_2; τ_1, τ_2). The selection of the two starting nodes in τ_1 and τ_2 implies (1) the addition of the two edges (a_1, a_2+1) and (a_2, a_1+1), and (2) the removal of the original two edges (a_1, a_1+1) and (a_2, a_2+1), as shown in the third block of Fig. 2 (we overload the notations a_1+1 and a_2+1 to denote the nodes that follow a_1 and a_2 in τ_1 and τ_2, respectively).
To account for this edge addition and removal in the cost prediction, we design ϕ_c as follows:

ŷ*(a_1, a_2; τ_1, τ_2) = ϕ_c(h_{a_1}^(H), h_{a_1+1}^(H), h_{a_2}^(H), h_{a_2+1}^(H), h_{a_1,a_2+1}^(H), h_{a_2,a_1+1}^(H), h_{a_1,a_1+1}^(H), h_{a_2,a_2+1}^(H))

where h_i^(H) and h_{i,j}^(H) denote the final embeddings of n_i and e_ij, respectively. The quality of the NCE operator highly depends on the accuracy of f_θ. We experimentally confirmed that, when K ≥ 10, the NCE operator finds the argmax (a_1, a_2, b_1, b_2) with high probability. We provide the experimental details and results on the predictions of f_θ in Appendix H.
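For concreteness, the attentive embedding layer of Eqs. (7)-(10) can be sketched densely in NumPy. This is our own illustration: ϕ_e, ϕ_w, and ϕ_n are passed in as callables, and for simplicity the neighbor set N(i) is taken to be all nodes of the complete graph, self-loop included.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def attentive_gn_layer(h, h_e, x_e, phi_e, phi_w, phi_n):
    """One attentive GN embedding layer (Eqs. 7-10).
    h: (n, d) node embeddings; h_e, x_e: (n, n, d_e) edge embeddings/features.
    phi_e and phi_n return vectors; phi_w returns a scalar attention logit."""
    n = h.shape[0]
    h_new, h_e_new = np.empty_like(h), np.empty_like(h_e)
    for i in range(n):
        msgs, logits = [], []
        for j in range(n):
            inp = np.concatenate([h[i], h[j], h_e[i, j], x_e[i, j]])
            h_e_new[i, j] = phi_e(inp)   # Eq. (7): edge update
            msgs.append(h_e_new[i, j])
            logits.append(phi_w(inp))    # Eq. (8): attention logit
        w = softmax(np.array(logits))    # Eq. (9): attention weights
        agg = (w[:, None] * np.array(msgs)).sum(axis=0)
        h_new[i] = phi_n(np.concatenate([h[i], agg]))  # Eq. (10): node update
    return h_new, h_e_new
```

In practice the three operators would be trained MLPs and the two loops would be vectorized, but the message flow is exactly that of Eqs. (7)-(10).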

4. EXPERIMENTS

This section provides the experimental results that validate the effectiveness of the proposed NCE in solving the min-max FMDVRP and various other min-max VRPs. To train f_θ(•), we use input (τ_1, τ_2, a_1, a_2) and output y* pairs obtained from 50,000 random min-max FMDVRP instances. The details of the training data generation are described in Appendix G. The cost-decrement model f_θ(•) is parameterized by a GNN with five attentive embedding layers. The details of the f_θ(•) architecture and the computing infrastructure used to train it are discussed in Appendix G. We emphasize that a single f_θ(•), trained on data obtained from random min-max FMDVRP instances, is used for all experiments. We found that f_θ(•) effectively solves the three special cases (i.e., min-max MDVRP, min-max mTSP, and min-max CVRP) without retraining or fine-tuning, demonstrating the effectiveness of NCE as a universal operator for VRPs.

4.1. MIN-MAX FMDVRP EXPERIMENTS

We evaluate the performance of NCE in solving various sizes of the min-max FMDVRP. We consider 100 random min-max FMDVRP instances for each problem size (N_c, N_d, N_v), where N_c, N_d, and N_v are the numbers of cities, depots, and vehicles, respectively. We report the average makespan and computation time over the 100 instances. For small-sized problems (N_c ≤ 10), we employ CPLEX (Cplex, 2009) (an exact method), OR-tools (Perron & Furnon), CE (full search), ScheduleNet (Park et al., 2021), the greedy heuristic (Dell'Amico et al., 1993), and the greedy + TSP heuristic as baselines. For the larger problems, we exclude CPLEX from the baselines due to its limited scalability. To our knowledge, our method is the first neural approach to solve the min-max FMDVRP. Note that we extend the ScheduleNet algorithm (Park et al., 2021), the best-performing neural baseline for mTSP, and utilize it as the neural baseline for the min-max FMDVRP experiments. Table 1 shows the performance of NCE on the small-sized problems. NCE achieves makespans similar to CPLEX (optimal solutions) within significantly lower computation times. NCE outperforms OR-tools in terms of makespan but has a longer computation time; however, the computation time of NCE becomes much lower than that of OR-tools as the problem size grows. It is noteworthy that NCE exhibits larger computation times than CE on small problems, as the forward propagation of the GNN costs more than an exhaustive search at that scale. Tables 2 and 3 show the performance of NCE on the medium- and large-sized problems, respectively. Applying CPLEX to large min-max FMDVRPs is infeasible, so we exclude it from the baselines; instead, CE serves as an oracle for computing the makespans. For all cases, NCE provides solutions with almost zero gap from CE while being computationally much faster. This validates that NCE successfully amortizes the search operations of CE with significantly lower computation times.
In addition, NCE consistently outperforms OR-tools in both makespan and computation time. The performance gap between NCE and OR-tools becomes more significant as N_c/N_v grows (i.e., as each tour becomes longer). min-max MDVRP results. We also apply NCE with the f_θ trained on FMDVRP to solve min-max MDVRP. As shown in Tables A.1 to A.3 in Appendix B, NCE shows leading performance and is faster than the baselines, similar to the FMDVRP experiments.

4.2. MIN-MAX MTSP EXPERIMENTS

We evaluate the generalization capability of NCE for solving min-max mTSPs. We report the average performance over 100 instances for each (N_c, N_v) pair. As baselines, we consider two meta-heuristics, LKH-3 (Helsgaun, 2017), which is known as one of the best mTSP heuristics, and OR-tools, and two neural baselines, ScheduleNet (Park et al., 2021) and DAN (Cao et al., 2021). As shown in Table 4, NCE achieves performance similar to LKH-3 with significantly shorter computation time. It is noteworthy that LKH-3 employs mTSP-specific heuristics on top of the LKH heuristics, while NCE does not employ any mTSP-specific structure. To validate the effect of task-specific information on NCE, we train NCE with mTSP data (NCE-mTSP) and solve mTSP. The performances of NCE and NCE-mTSP are almost identical, which indicates that NCE is highly generalizable. In addition, NCE consistently outperforms the neural baselines.

4.3. MIN-MAX CVRP EXPERIMENTS

We evaluate the generalization capability of NCE in solving the min-max capacitated VRP. As f_θ(•) is trained on min-max FMDVRPs, it does not consider capacity constraints. However, we can easily enforce such constraints without retraining f_θ(•) by adjusting the search range as follows:

(b_1, b_2) ← argmax_{b_1, b_2 ∈ S_c} (C(τ_1, τ_2) − C(CROSS(a_1, b_1, a_2, b_2; τ_1, τ_2)))

where the search range S_c is the set of nodes that satisfies the capacity constraints. As most NCO studies use min-sum CVRP as the canonical benchmark task, we also employ NCE to solve min-sum CVRP problems. For min-sum CVRP, we train the NCE cost-decrement prediction model with a different cost definition, C(τ_1, τ_2) = l(τ_1) + l(τ_2), in an attempt to minimize the sum of the traveling distances. The min-sum CVRP benchmark results are provided in Appendix E, Table A.7.
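The capacity restriction of the search range S_c amounts to filtering candidate swaps for feasibility before evaluating them. A sketch under our own assumptions (list-based tours and a `demand` mapping from node id to demand):

```python
def capacity_feasible(tau1, tau2, a1, b1, a2, b2, demand, capacity):
    """Return True iff swapping tau1[a1:b1] and tau2[a2:b2] keeps both tours
    within `capacity`; candidates failing this test are excluded from the
    search range S_c."""
    load1 = sum(demand[n] for n in tau1)
    load2 = sum(demand[n] for n in tau2)
    out1 = sum(demand[n] for n in tau1[a1:b1])  # demand leaving tour 1
    out2 = sum(demand[n] for n in tau2[a2:b2])  # demand leaving tour 2
    return (load1 - out1 + out2 <= capacity and
            load2 - out2 + out1 <= capacity)
```

Because the check only inspects the candidate swap, it slots into the (b_1, b_2) search loop without touching the trained model.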

4.4. ABLATION STUDIES

We evaluate the effects of the hyperparameters on NCE. The results are as follows:
• Appendix F.1: the performance of NCE converges when the number of candidates K ≥ 10.
• Appendix F.2: the performance of NCE is largely insensitive to the selection of the intra solver.
• Appendix F.3: the performance of NCE is largely insensitive to the selection of the swapping tours.
• Appendix F.4: the performance of NCE converges when the perturbation parameter p ≥ 5.

5. RELATED WORKS

Supervised learning (SL) approaches to solve VRPs. SL approaches (Joshi et al., 2019; Vinyals et al., 2015; Xin et al., 2021b; Li et al., 2021; 2018) utilize supervision from existing VRP solvers as training labels. (Vinyals et al., 2015) and (Joshi et al., 2019) imitate TSP solvers using PointerNet and a graph convolutional network (GCN), respectively; (Joshi et al., 2019) trains a GCN to predict the edge occurrence probabilities in TSP solutions. Even though SL approaches often offer faster solving speed than the existing solvers, their use is limited to problems for which such solvers are available. This property limits the use of SL for general and realistic VRPs.

Reinforcement learning (RL) approaches to solve VRPs. RL approaches (Bello et al., 2016; Khalil et al., 2017; Nazari et al., 2018; Kool et al., 2018; Kwon et al., 2020; Park et al., 2021; Cao et al., 2021; Guo et al., 2019; Wu et al., 2019; 2021; Falkner & Schmidt-Thieme, 2020; Chen & Tian, 2019) exhibit promising performance comparable to existing solvers, as they learn solvers from problem-solving simulations. (Bello et al., 2016; Nazari et al., 2018; Kool et al., 2018; Guo et al., 2019) utilize an encoder-decoder structure to generate routing schedules sequentially, while (Park et al., 2021; Khalil et al., 2017) use graph-based embeddings to determine the next assignment action. However, RL approaches often require a problem-specific Markov decision process and network design. NCE does not require simulating the entire problem-solving process; it only requires computing the results of the swapping operation (i.e., the results of CE). This property allows NCE to be trained easily to solve various routing problems with one scheme.
Neural network-based (meta-)heuristic approaches. Combining machine learning (ML) components with existing (meta-)heuristics shows strong empirical performance in solving VRPs (Hottung & Tierney, 2019; Xin et al., 2021b; Li et al., 2021; Lu et al., 2019; da Costa et al., 2021; Kool et al., 2021). These methods often employ ML to learn to solve difficult NP-hard sub-problems of VRPs. For example, L2D (Li et al., 2021) learns to predict the objective value of CVRP, NLNS (Hottung & Tierney, 2019) learns a TSP solver used when solving VRPs, and DPDP (Kool et al., 2021) learns to boost a dynamic programming algorithm. To learn such solvers, these methods apply SL or RL. In contrast, NCE learns a fundamental operator of a meta-heuristic rather than predicting or generating a solution. Hence, NCE trained on FMDVRP generalizes well to the special cases of FMDVRP. Furthermore, the training data for NCE can be prepared effortlessly.

6. CONCLUSION

We propose Neuro CROSS exchange (NCE), a neural-network-enhanced CE operator, to learn a fundamental and universal operator for solving various types of min-max VRPs. We validated that NCE can solve various min-max VRPs without retraining for each specific problem, exhibiting strong empirical performance. Although NCE addresses more realistic VRPs (i.e., the min-max FMDVRP) than existing NCO solvers, it does not yet consider complex constraints such as pickup and delivery or time windows. Our future research will focus on solving more complex VRPs by incorporating such constraints into the NCE operation.

Liang Xin, Wen Song, Zhiguang Cao, and Jie Zhang. Multi-decoder attention model with embedding glimpse for solving vehicle routing problems. In Proceedings of the 35th AAAI Conference on Artificial Intelligence, pp. 12042-12049, 2021a.
Liang Xin, Wen Song, Zhiguang Cao, and Jie Zhang. NeuroLKH: Combining deep learning model with Lin-Kernighan-Helsgaun heuristic for solving the traveling salesman problem. Advances in Neural Information Processing Systems, 34, 2021b.

A.2 MDVRP

Multi-depot VRP (MDVRP) is a multi-depot extension of mTSP (Appendix A.1), where each vehicle starts from its own designated depot and returns to that depot. We extend the MILP formulation of mTSP to define the MILP formulation of MDVRP. On top of the mTSP formulation, we define K_i as the set of vehicles assigned to depot i.

minimize Q    (A.10)
subject to
Σ_{i∈V} Σ_{j∈V, j≠i} d_ij x_ijk ≤ Q, ∀k ∈ K    (A.11)
Σ_{j∈V, j≠i} Σ_{k∈K} x_ijk = 1, ∀i ∈ V \ S    (A.12)
Σ_{i∈V, i≠j} Σ_{k∈K} x_ijk = 1, ∀j ∈ V \ S    (A.13)
Σ_{i∈V} x_ijk − Σ_{h∈V} x_jhk = 0, ∀j ∈ V \ S, ∀k ∈ K    (A.14)
u_ik − u_jk + |V| x_ijk ≤ |V| − 1, ∀k ∈ K, ∀i, j ∈ V \ S, i ≠ j    (A.15)
0 ≤ u_ik ≤ |V| − 1, ∀k ∈ K, ∀i ∈ V \ S    (A.16)
x_ijk ∈ {0, 1}, ∀k ∈ K, ∀i, j ∈ V    (A.17)
u_ik ∈ Z, ∀k ∈ K, ∀i ∈ V    (A.18)
Σ_{j∈V\S} x_ijk ≤ 1, ∀k ∈ K_i, ∀i ∈ S    (A.19)
Σ_{i∈V\S} x_ijk ≤ 1, ∀k ∈ K_j, ∀j ∈ S    (A.20)

where Eqs. (A.19) and (A.20) indicate that each vehicle starts from and returns to its own depot at most once.

A.3 FMDVRP

Flexible MDVRP (FMDVRP) is an extension of MDVRP that allows a vehicle to return to any depot. We extend the MDVRP formulation (Appendix A.2) to define the FMDVRP formulation. To account for the flexibility of the returning depot, we introduce a dummy node for each depot; a depot is modeled with a start depot and a return depot. We define S_1 and S_2 as the sets of start and return depots, respectively, and s_k as the start node of vehicle k.
minimize Q    (A.21)
subject to
Σ_{i∈V} Σ_{j∈V, j≠i} d_ij x_ijk ≤ Q, ∀k ∈ K    (A.22)
Σ_{j∈V, j≠i} Σ_{k∈K} x_ijk = 1, ∀i ∈ V \ S    (A.23)
Σ_{i∈V, i≠j} Σ_{k∈K} x_ijk = 1, ∀j ∈ V \ S    (A.24)
Σ_{i∈V} x_ijk − Σ_{h∈V} x_jhk = 0, ∀j ∈ V \ S, ∀k ∈ K    (A.25)
u_ik − u_jk + |V| x_ijk ≤ |V| − 1, ∀k ∈ K, ∀i, j ∈ V \ S, i ≠ j    (A.26)
0 ≤ u_ik ≤ |V| − 1, ∀k ∈ K, ∀i ∈ V \ S    (A.27)
x_ijk ∈ {0, 1}, ∀k ∈ K, ∀i, j ∈ V    (A.28)
u_ik ∈ Z, ∀k ∈ K, ∀i ∈ V    (A.29)
Σ_{j∈V\S} x_{s_k j k} = 1, ∀k ∈ K    (A.30)
Σ_{j∈V\S} x_ijk = 0, ∀k ∈ K, ∀i ∈ S \ {s_k}    (A.31)
Σ_{j∈V\S} x_ijk ≤ 1, ∀k ∈ K_i, ∀i ∈ S_1    (A.32)
Σ_{i∈V\S} x_ijk ≤ 1, ∀k ∈ K_j, ∀j ∈ S_2    (A.33)
Σ_{j∈V\S} x_ijk = 0, ∀k ∈ K, ∀i ∈ S_2    (A.34)
Σ_{i∈V\S} x_ijk = 0, ∀k ∈ K, ∀j ∈ S_1    (A.35)
Σ_{i∈S_1} Σ_{j∈V\S} x_ijk = Σ_{i∈V\S} Σ_{j∈S_2} x_ijk, ∀k ∈ K    (A.36)

B MIN-MAX MDVRP RESULTS

In this section, we provide the experimental results for MDVRP. We apply NCE with the f_θ trained on FMDVRP instances to solve MDVRP. For each (N_c, N_d, N_v) pair, we measure the average makespan over 100 instances. We provide the MDVRP results in Tables A.1 to A.3. Similar to the FMDVRP experiments, NCE shows leading performance while being faster than the baselines. From the results, we conclude that the learned f_θ is transferable to different problem sets. This phenomenon is rare in many ML-based approaches, and it again highlights the effectiveness of learning fundamental operators (i.e., learning what should be cross-exchanged) when solving the VRP family.

E MIN-SUM CVRP BENCHMARK RESULTS

In this section, we provide the results for the min-sum CVRP benchmark instances. To solve min-sum CVRP, we trained NCE to predict the min-sum cost decrement with C(τ_1, τ_2) defined as l(τ_1) + l(τ_2). As shown in Table A.7, NCE is on par with or outperforms the other neural baselines. However, NCE is not efficient in terms of computation speed. This is because the CE operation was originally designed to solve min-max VRPs rather than min-sum VRPs. According to the experimental results, when the CE operation is conducted to reduce the min-sum cost of two tours, it typically requires more iterations to improve the solution to near-optimality. This indicates that a different (local) search operator is required to solve min-sum problems efficiently.

H EVALUATION OF THE COST DECREMENT MODEL

In this section, we evaluate the prediction accuracy of f_θ(•). To evaluate f_θ(•), we randomly generate 1,000 FMDVRP instances by sampling N_c ∼ U(10, 100), N_d ∼ U(2, 9), and city coordinates (x, y) ∼ U(0, 1)^2. From these instances, we measure how often the argmax (a_1, a_2) pair appears in the search candidate set of size K. As shown in Table A.12, when K ≥ 10, NCE finds the argmax pair with probability at least 0.9. We also provide the cost-decrement predictions and their corresponding true costs.

Table A.12: f_θ(•) prediction performance test

We further test scalability by varying N_c over {20, 30, 40, 50, 60, 70, 80, 90, 100} with fixed N_d = 3 and N_v = 3 (Table A.13).

Because NCE is an improvement heuristic, the more computation time is used, the better the outcome. We therefore investigate how performance varies with the allowed computation time: we sample 100 FMDVRP instances with (N_c, N_d, N_v) = (60, 5, 5) and compute the average cost and computation time of NCE and the baselines on these instances.
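The candidate-set hit ratio described above can be computed as follows (a sketch with an assumed data layout: each instance contributes its true argmax pair and a dict mapping every (a_1, a_2) pair to its predicted decrement):

```python
def hit_ratio(true_best_pairs, predicted_scores, K=10):
    """Fraction of instances whose true argmax (a1, a2) pair is contained in
    the top-K candidate set ranked by the predicted cost decrement."""
    hits = 0
    for best_pair, scores in zip(true_best_pairs, predicted_scores):
        top_k = sorted(scores, key=scores.get, reverse=True)[:K]
        hits += best_pair in top_k
    return hits / len(true_best_pairs)
```

A hit ratio near 1 at small K means the reduced search almost always retains the pair that the full O(n^4) search would have chosen.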



Figure 1: The overall procedure of improvement heuristic that uses CE as the inter-operation.

As shown in Fig. A.1, f_θ(•) predicts the general tendency well.

Figure A.1: Predicted cost-decrements vs. true cost-decrements

Figure A.5: MDVRP solutions computed by NCE and OR-tools

Fig. A.8 depicts the run time (computational cost) vs. cost trade-off in the form of a Pareto curve for the case of FMDVRP (N_c, N_d, N_v) = (60, 5, 5). The result shows how the performance of NCE improves with more computation time, and this trade-off is consistently superior to the baselines.

Algorithm 1 (header). Input: VRP instance P, cost-decrement prediction model f_θ, perturbation parameter p. Output: optimized tours {τ*_i}_{i∈|V|}. Initialization: {τ_i}_{i∈|V|} ← GetInitialSolution(P).

FMDVRP results (small-sized instances)

FMDVRP results (medium-sized instances)

FMDVRP results (large-sized instances)

Average makespans of the random mTSPs: DAN and ScheduleNet results are taken from the original papers, † computational time of DAN is measured with the Nvidia RTX 3090.





where Eqs. (A.30) and (A.31) indicate that each vehicle starts at its own depot, Eqs. (A.32) to (A.35) indicate the start- and return-depot constraints, and Eq. (A.36) indicates the balance equation between the start and return depots.



Table A.7: CVRP benchmark results: best in bold; second best underlined. (s.n) indicates the best result of n samplings, (i.n) indicates the best result after n improvement steps, and † indicates that the computation times of the neural baselines are measured with GPU. The run times of Lu et al. (2019) and Kwon et al. (2020) are taken from Kwon et al. (2020). The run times of the other neural baselines are taken from Kim et al. (2021).

As shown in Table A.13, NCE shows nearly identical performance to CE, with significantly faster computation than CE, as shown in Fig. A.2. We also test the mTSP and CVRP cases, as shown in Table A.14.

Table A.13: FMDVRP performance comparison of CE and NCE (K=10, p=0)

Table A.14: mTSP, CVRP performance comparison of CE and NCE (VRP instances include mTSP50 with N v =5, mTSP100 with N v =10, and mTSP200)

A MILP FORMULATIONS FOR MIN-MAX ROUTING PROBLEMS

This section provides the mixed integer linear programming (MILP) formulations of mTSP, MDVRP, and FMDVRP.

A.1 MTSP

mTSP is a multi-vehicle extension of the traveling salesman problem (TSP). mTSP comprises a set of nodes (i.e., cities and depots) V , a set of vehicles K, and a set of depots S. We define d ij as the cost (or travel time) between nodes i and j, and the binary decision variable x ijk denotes whether the edge between nodes i and j is traversed by vehicle k. Following the convention, we consider mTSP with |S| = 1. The MILP formulation of mTSP is given as follows:

minimize Q (A.1)

subject to

\sum_{i \in V} \sum_{j \in V, j \neq i} d_{ij} x_{ijk} \le Q, \forall k \in K, (A.2)

\sum_{j \in V, j \neq i} x_{ijk} = 1, \forall k \in K, \forall i \in S, (A.3)

\sum_{i \in V, i \neq j} \sum_{k \in K} x_{ijk} = 1, \forall j \in V \setminus S, (A.4)

\sum_{i \in V, i \neq j} x_{ijk} - \sum_{h \in V, h \neq j} x_{jhk} = 0, \forall k \in K, \forall j \in V \setminus S, (A.5)

where Q denotes the longest traveling distance among the vehicles (i.e., the makespan). Eq. (A.2) bounds each vehicle's traveling distance by Q, Eq. (A.3) indicates that the vehicles start at the depot, Eq. (A.4) indicates that all cities are visited exactly once, Eq. (A.5) indicates the flow-balance equation for all cities, and Eqs. (A.6) and (A.7) indicate the sub-tour eliminations.
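To make the min-max objective concrete, the following sketch evaluates Q of Eq. (A.1) for a single-depot mTSP solution given as explicit tours. The function names, the toy distance matrix, and the tours are our own illustration, not part of the formulation.

```python
import math

def tour_length(tour, depot, dist):
    """Traveling distance of one vehicle's tour: depot -> cities -> depot."""
    route = [depot] + tour + [depot]
    return sum(dist[a][b] for a, b in zip(route, route[1:]))

def min_max_objective(tours, depot, dist):
    """Q in Eq. (A.1): the longest traveling distance among the vehicles."""
    return max(tour_length(t, depot, dist) for t in tours)

# Toy instance: depot 0 and cities 1..4 on a line with unit spacing.
coords = [(i, 0.0) for i in range(5)]
dist = [[math.dist(p, q) for q in coords] for p in coords]
# Two vehicles partition the cities; every city appears exactly once (Eq. (A.4)).
tours = [[1, 2], [3, 4]]
print(min_max_objective(tours, 0, dist))  # vehicle 2 travels 0-3-4-0 = 8.0
```

Note that, unlike the min-sum objective, shifting work between vehicles changes Q only through the longest tour, which is exactly why CE-style swaps between tour pairs are effective here.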

C MIN-MAX MTSP RESULTS

In this section, we provide additional experiment results for mTSP. We further apply NCE to solve mTSPLib (mTSPLib), which comprises mTSP instances from real cities, and large-scale problems. As reported in

F ABLATION STUDY

In this section, we provide the results of the ablation studies.

F.1 CANDIDATE SET

NCE constructs a search candidate set. To mitigate the prediction error of f θ (•) in finding the argmax of (a 1 , a 2 , b 1 , b 2 ), NCE searches the top K pairs of (a 1 , a 2 ) that have the largest y * out of all (a 1 , a 2 ) choices. We measure how the performance changes as the size K of the candidate set changes. As shown in Table A.8, the performance tends to increase slightly as K increases. When K ≥ 10, the performance of NCE almost converges. Thus, we choose K = 10 as the default hyperparameter of NCE.

NCE repeatedly applies the inter- and intra-operations. In this view, the choice of the intra-operation may affect the performance of NCE. In this subsection, we measure the performance of NCE according to the intra-operation, comparing the results of NCE that uses Elkai, OR-tools, or 2-opt as the intra-operator. In solving TSP, the task the intra-operator has to solve, Elkai, OR-tools, and 2-opt show the best, second-best, and third-best performances, respectively. As shown in Table A.9, the performance of NCE is almost identical regardless of the selected intra-operator. We thus validate that the effect of the intra-operation choice on performance is negligible.

F.4 PERTURBATION

NCE employs perturbation to increase performance. Perturbation is a commonly used strategy for enhancing the performance of meta-heuristics (Polat et al., 2015). It is done by randomly perturbing the solution and continuing the search from the perturbed solution. This technique helps the search escape from local optima. As described in Algorithm 1, when falling into a local optimum, NCE randomly selects two tours and performs a random exchange. We compare the performance of NCE with different numbers of perturbations. As shown in Table A.11, the performance of NCE increases and converges as the number of perturbations p increases. When p = 5, the performance of NCE converges.
Thus, we choose p = 5 as the default hyperparameter of NCE.

We applied the trained NCE to a real-world application: deriving cooperative paths for multiple sterilizing robots to sterilize the open space of a building in the minimum time. Here, the task nodes correspond to the spatial grids composing the entire building floor area, as shown in Fig. A.6. The floor consists of a total of 500 grids, of which 25 are obstacle areas (gray grids) that a robot cannot move into, and 99 are contaminated areas (orange grids) that the robots need to sterilize. We assume the traveling time between two adjacent grids is 1, and the service time is 1 for a general area (green grids) and 2 for a contaminated area. The upper-left corner is the starting point. We formulate finding the cooperative paths for multiple robots that minimize the operation time as a min-max mTSP. Table A.15 shows the time and objective value of solving the problem with each algorithm. The results show that the NCE algorithm trained using a synthetic dataset can produce efficient cooperative paths for multiple robots without retraining or fine-tuning. When the number of robots is 5, NCE reduces the makespan by more than 10% compared to Google OR-tools and by 20% compared to Greedy+TSP. Note that the distribution of tasks is far from uniform; the grids are clustered, and the clusters are separated from each other due to the unique floor plan of the building. Thus, the experiment results verify the generalization capability of NCE to problem instances generated from a different distribution.
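Setting up the sterilization task as an mTSP requires pairwise travel times between grid cells. A minimal sketch of this preprocessing step (our own illustration, not the paper's pipeline) computes them by breadth-first search on the 4-connected grid with unit step cost, treating obstacle cells as impassable:

```python
from collections import deque

def grid_travel_times(rows, cols, obstacles, source):
    """BFS shortest travel times (unit cost per move) from `source` to every
    reachable free cell of a rows x cols grid; obstacle cells are impassable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in obstacles and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist

# 3x3 grid with one obstacle in the middle cell.
d = grid_travel_times(3, 3, obstacles={(1, 1)}, source=(0, 0))
print(d[(2, 2)])  # 4, e.g. (0,0)->(0,1)->(0,2)->(1,2)->(2,2)
```

Running BFS from every task grid yields the d ij matrix for the min-max mTSP; service times can then be added on top of the travel times.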

M GENERALIZATION TO DIFFERENT DISTRIBUTION

We investigate the generalization capability of NCE to problem instances generated from different distributions. We apply the NCE trained on random FMDVRP instances to problems generated from the different distributions introduced in Bi et al. (2022). We use cluster, expansion, explosion, grid, implosion, and mixed distributions as unseen distributions for testing. We follow the definitions in Bossek et al. (2019); Jiang et al. (2022); Bi et al. (2022), and the data are taken from Bi et al. (2022). Table A.16 summarizes the experiment results. As in the case of the uniform distribution, NCE shows leading performance while being faster than OR-Tools and CE. Note that we did not retrain our NCE algorithm but applied the already trained NCE to the new distribution tasks. From the results, we conclude that the learned f θ generalizes to distributions other than the uniform one. This again highlights the effectiveness of learning fundamental operators when solving the VRP families.
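To illustrate what such non-uniform test instances look like, here is a minimal cluster-style coordinate sampler. It is our own sketch of the general idea of clustered instances, not the exact generators of Bossek et al. (2019) or Bi et al. (2022).

```python
import random

def sample_cluster_instance(n_nodes, n_clusters, spread=0.05, seed=0):
    """Sample node coordinates in [0, 1]^2 grouped around random cluster
    centers, as a stand-in for clustered (non-uniform) VRP instances."""
    rng = random.Random(seed)
    centers = [(rng.random(), rng.random()) for _ in range(n_clusters)]
    nodes = []
    for _ in range(n_nodes):
        cx, cy = rng.choice(centers)  # pick a cluster, then jitter around it
        x = min(1.0, max(0.0, cx + rng.gauss(0, spread)))
        y = min(1.0, max(0.0, cy + rng.gauss(0, spread)))
        nodes.append((x, y))
    return nodes

pts = sample_cluster_instance(100, n_clusters=4)
print(len(pts))  # 100 coordinates, concentrated around 4 centers
```

Because NCE learns a local operator (the cost-decrement of a swap) rather than a full solution policy, such shifts in the node distribution affect it less than construction-style solvers.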

