GENERALIZE LEARNED HEURISTICS TO SOLVE LARGE-SCALE VEHICLE ROUTING PROBLEMS IN REAL-TIME

Abstract

Large-scale Vehicle Routing Problems (VRPs) arise widely in logistics, transportation, supply chain, and robotic systems. Recently, data-driven VRP heuristics have been proposed to generate real-time VRP solutions with up to 100 nodes. Despite this progress, current heuristics for large-scale VRPs still face three major challenges: 1) difficulty in generalizing the heuristics learned on small-scale VRPs to large-scale VRPs without retraining; 2) difficulty in generating real-time solutions for large-scale VRPs; 3) difficulty in embedding global constraints into learned heuristics. We contribute in these three directions. We propose a Two-stage Divide Method (TAM) that generates a sub-route sequence rather than a node sequence, in order to generalize the heuristics learned on small-scale VRPs to solve large-scale VRPs in real-time. A two-step reinforcement learning method with new reward and padding techniques is proposed to train our TAM. A global mask function is proposed to keep the global constraints satisfied when dividing a large-scale VRP into several small-scale Traveling Salesman Problems (TSPs). As a result, we can solve the small-scale TSPs quickly and in parallel. Experiments on synthetic and real-world large-scale VRPs show that our method can generalize the learned heuristics trained on datasets of VRP 100 to solve VRPs with over 5000 nodes in real-time while keeping the solution quality better than data-driven heuristics and competitive with traditional heuristics.

1. INTRODUCTION

Vehicle Routing Problems (VRPs) arise widely in logistics, supply chain, transportation, and robotic systems (Toth & Vigo, 2002b; Golden et al., 2008; Bullo et al., 2011). For instance, on e-commerce platforms, hundreds and thousands of goods are sold in real-time and then transported to customers with maximum efficiency, a minimum number of vehicles, and the shortest distance. Therefore, more large-scale VRPs need to be solved in real-time to improve logistics or transportation efficiency (Dong et al., 2021; Duan et al., 2020). Although the VRP is one of the most well-studied combinatorial optimization problems, the large-scale VRP is still challenging because it is NP-hard (Golden et al., 2008). Exact methods and solvers (such as branch and bound (Toth & Vigo, 2002a), branch and cut (Naddef & Rinaldi, 2002), column generation (Chabrier, 2006), Gurobi, and CPLEX) can obtain globally optimal solutions on small-scale VRPs with theoretical guarantees. However, these methods are time-consuming and hard to extend to large-scale VRPs because the number of permutations grows exponentially. Traditional heuristics and solvers can solve small-scale VRPs quickly with near-optimal solutions. Some heuristics can be extended to solve large-scale VRPs, such as OR-Tools (Perron & Furnon), LKH3 (Helsgaun, 2017), HGS (Vidal, 2022; Vidal et al., 2012), and SISRs (Christiaens & Vanden Berghe, 2020). However, massive numbers of iterations are needed to obtain good solutions. Algorithms for solving large-scale VRPs in real-time (within seconds) still lag behind.

Recently, data-driven methods have been proposed to learn heuristics for constructing VRP solutions directly (Vinyals et al., 2015). Deep learning models such as the Transformer (Vaswani et al., 2017; Kool et al., 2019; Peng et al., 2020) and graph neural networks (Kipf & Welling, 2016; Khalil et al., 2017; Joshi et al., 2019) are used to extract hidden states of VRPs and TSPs; this component is called the encoder.
The solution sequences of VRPs are then generated autoregressively from the hidden states; this component is called the decoder. Reinforcement learning techniques are also applied to train the encoder-decoder model (sequence-to-sequence model) to improve its accuracy (Nazari et al., 2018). These learn-to-construct heuristics can outperform or be comparable to traditional VRP heuristics on instances with up to 100 nodes. However, when it comes to large-scale VRPs (over 1000 nodes), the learned heuristics still face three challenges: 1) training a data-driven model on large-scale VRPs is time-consuming and computationally expensive. For instance, the computational complexity and memory footprint of training the Transformer are quadratic in the length of the input sequence (the number of nodes in the VRP) (Kool et al., 2019; Kitaev et al., 2019); 2) a model trained on small-scale VRPs is difficult to generalize to large-scale VRPs because the node distribution of the large-scale VRPs in the test dataset differs from that of the small-scale VRPs in the training dataset; 3) constraints such as the maximum vehicle number are hard to encode in the encoder-decoder model because global constraints become active only at the end of the sequence.

Despite the limitations of traditional and data-driven methods, we ask: can we generalize the learned heuristics to solve large-scale VRPs in real-time by taking advantage of both data-driven and traditional methods? We try to answer this question from the following perspectives:

1) Although traditional heuristic methods are time-consuming when solving large-scale VRPs, they can quickly obtain optimal or near-optimal solutions with some theoretical guarantees when solving small-scale VRPs. We observe that vehicle capacity in real-world large-scale VRPs is limited, so each vehicle serves only a few customers.
If we know the customers that each vehicle needs to serve, then the original large-scale VRP can be divided into several small-scale TSPs, which can be solved by traditional heuristics quickly and in parallel.

2) The generalization of data-driven heuristics to large-scale VRPs is difficult because the sequence-to-sequence model needs to learn the distribution of each node in a long sequence. We observe that if we model only the distribution of sub-routes and ignore the order of nodes inside a sub-route, then the model trained on small-scale VRPs may generalize better to large-scale VRPs.

3) Although the global constraints become active only at the end of the sequence, we can design a global mask function with a theoretical guarantee to rule out infeasible solutions in advance. In addition, the global constraints can carry some prior information, which helps improve the generalization of the learned heuristics. For instance, we observe that the predefined maximum vehicle number provides some global information about the possible range of the optimal vehicle number in the testing dataset, which can help identify the minimum travel length.

Driven by the above analysis, we present a Two-stage Divide Method (TAM) in Figure 1 for generalizing the learned heuristics to solve large-scale VRPs in real-time in a zero-shot manner. Our TAM combines the real-time advantages of data-driven methods and the generalization advantages of traditional heuristics. It first divides a large-scale VRP into several smaller sub-routes by generating



Figure 1: Our TAM framework. In the first stage (green), a learned model divides a large-scale VRP into several small TSPs while satisfying VRP constraints such as capacity and maximum vehicle number. Then, the original TSPs are padded to the same number of nodes at training time. In the second stage (orange), all TSPs or sub-routes with simple constraints are optimized in parallel.
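
The divide-then-solve pipeline in Figure 1 can be illustrated with a minimal Python sketch. This is not the authors' implementation; it rests on several assumptions made only for illustration: the first-stage output is taken to be a node sequence in which the depot (node 0) delimits sub-routes, padding repeats the depot index, and a nearest-neighbor rule stands in for the traditional or learned TSP solver of the second stage. All function names (`split_sub_routes`, `pad_routes`, `nearest_neighbor_tsp`, `solve_all`) are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor

DEPOT = 0  # assumption: node 0 is the depot and also the sub-route delimiter


def split_sub_routes(sequence):
    """Split a depot-delimited node sequence into lists of customers."""
    routes, current = [], []
    for node in sequence:
        if node == DEPOT:
            if current:
                routes.append(current)
                current = []
        else:
            current.append(node)
    if current:
        routes.append(current)
    return routes


def pad_routes(routes, pad_value=DEPOT):
    """Pad every sub-route to the longest length (hypothetical padding scheme)."""
    width = max((len(r) for r in routes), default=0)
    return [r + [pad_value] * (width - len(r)) for r in routes]


def tour_length(tour, coords):
    """Length of the closed tour depot -> customers -> depot."""
    path = [DEPOT] + list(tour) + [DEPOT]
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))


def nearest_neighbor_tsp(customers, coords):
    """Stand-in TSP heuristic: always visit the closest unvisited customer."""
    remaining, tour, here = set(customers), [], DEPOT
    while remaining:
        # Ties are broken by node index so the result is deterministic.
        nxt = min(remaining, key=lambda n: (math.dist(coords[here], coords[n]), n))
        remaining.remove(nxt)
        tour.append(nxt)
        here = nxt
    return tour


def solve_all(sequence, coords):
    """Second stage: order each sub-route independently, one task per TSP."""
    routes = split_sub_routes(sequence)
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda r: nearest_neighbor_tsp(r, coords), routes))


if __name__ == "__main__":
    coords = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0),
              3: (0.0, 1.0), 4: (0.0, 2.0), 5: (0.0, 3.0)}
    first_stage_output = [0, 2, 1, 0, 4, 3, 5, 0]
    for tour in solve_all(first_stage_output, coords):
        print(tour, tour_length(tour, coords))
```

Because the sub-routes share no customers, each small TSP can be handed to a separate worker. A thread pool is used here only to keep the sketch self-contained; a CPU-bound solver would more likely need a process pool or a batched native solver to obtain real speedups.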

