GENERALIZE LEARNED HEURISTICS TO SOLVE LARGE-SCALE VEHICLE ROUTING PROBLEMS IN REAL-TIME

Abstract

Large-scale Vehicle Routing Problems (VRPs) are widely used in logistics, transportation, supply chain, and robotic systems. Recently, data-driven VRP heuristics have been proposed to generate real-time solutions for VRPs with up to 100 nodes. Despite this progress, current heuristics for large-scale VRPs still face three major challenges: 1) difficulty in generalizing heuristics learned on small-scale VRPs to large-scale VRPs without retraining; 2) difficulty in generating real-time solutions for large-scale VRPs; 3) difficulty in embedding global constraints into learned heuristics. We contribute in these three directions. First, we propose a Two-stage Divide Method (TAM) that generates a sub-route sequence rather than a node sequence, so that heuristics learned on small-scale VRPs generalize to large-scale VRPs in real time. Second, we propose a two-step reinforcement learning method with new reward and padding techniques to train TAM. Third, we propose a global mask function that keeps the global constraints satisfied when a large-scale VRP is divided into several small-scale Traveling Salesman Problems (TSPs). As a result, the small-scale TSPs can be solved quickly in parallel. Experiments on synthetic and real-world large-scale VRPs show that our method generalizes heuristics trained on VRP 100 datasets to VRPs with over 5000 nodes in real time, with solution quality better than that of data-driven heuristics and competitive with that of traditional heuristics.
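
The divide-then-solve idea can be illustrated with a minimal Python sketch. It is not the TAM model: the learned policy is replaced by a customer ordering passed in by the caller, the global mask function is reduced to a simple vehicle-capacity check, and each sub-route is ordered with a nearest-neighbor heuristic. All function names and parameters here are hypothetical and chosen only for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def divide_into_subroutes(order, demands, capacity):
    """Greedily cut a customer ordering into capacity-feasible sub-routes.

    `order` stands in for the output of a learned dividing policy (assumption).
    """
    subroutes, current, load = [], [], 0.0
    for node in order:
        if demands[node] > capacity:
            raise ValueError(f"node {node} exceeds the vehicle capacity")
        # Stand-in for a mask: close the sub-route if the node would overload it.
        if load + demands[node] > capacity:
            subroutes.append(current)
            current, load = [], 0.0
        current.append(node)
        load += demands[node]
    if current:
        subroutes.append(current)
    return subroutes


def nearest_neighbor_tour(coords, depot, nodes):
    """Order one sub-route with a nearest-neighbor TSP heuristic (a stand-in solver)."""
    tour, remaining, here = [depot], set(nodes), depot
    while remaining:
        here = min(remaining, key=lambda n: math.dist(coords[here], coords[n]))
        remaining.remove(here)
        tour.append(here)
    return tour + [depot]


def solve(coords, demands, capacity, order, depot=0):
    """Divide the VRP into sub-routes, then solve each small TSP independently."""
    subroutes = divide_into_subroutes(order, demands, capacity)
    # The sub-problems share no state, so they can be dispatched concurrently.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda r: nearest_neighbor_tour(coords, depot, r), subroutes))


if __name__ == "__main__":
    coords = [(0, 0), (1, 0), (2, 0), (0, 1), (0, 3)]  # index 0 is the depot
    demands = [0, 1, 1, 1, 1]
    print(solve(coords, demands, capacity=2, order=[2, 1, 4, 3]))
```

Because each sub-route is an independent TSP once the division is fixed, the per-route solver can be swapped for any small-scale TSP method without changing the dividing step.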

1. INTRODUCTION

Vehicle Routing Problems (VRPs) are widely used in logistics, supply chain, transportation, and robotic systems (Toth & Vigo, 2002b; Golden et al., 2008; Bullo et al., 2011). For instance, on e-commerce platforms, hundreds and thousands of goods are sold in real time and must then be transported to customers with maximum efficiency, the minimum number of vehicles, and the shortest distance. Therefore, more large-scale VRPs need to be solved in real time to improve logistics or transportation efficiency (Dong et al., 2021; Duan et al., 2020). Although the VRP is one of the most well-studied combinatorial optimization problems, the large-scale VRP remains challenging because it is NP-hard (Golden et al., 2008). Exact methods and solvers (such as branch and bound (Toth & Vigo, 2002a), branch and cut (Naddef & Rinaldi, 2002), column generation (Chabrier, 2006), Gurobi, and CPLEX) can obtain globally optimal solutions on small-scale VRPs with theoretical guarantees. However, these methods are time-consuming and hard to extend to large-scale VRPs because the number of permutations grows exponentially with problem size. Traditional heuristics and solvers can solve small-scale VRPs quickly with near-optimal solutions. Some of them, such as OR-Tools (Perron & Furnon), LKH3 (Helsgaun, 2017), HGS (Vidal, 2022; Vidal et al., 2012), and SISRs (Christiaens & Vanden Berghe, 2020), can be extended to solve large-scale VRPs. However, they need a massive number of iterations to obtain good solutions, so algorithms for solving large-scale VRPs in real time (within seconds) still lag behind. Recently, data-driven methods have been proposed to learn heuristics for constructing VRP solutions directly (Vinyals et al., 2015). Deep learning models such as the Transformer (Vaswani et al., 2017; Kool et al., 2019; Peng et al., 2020) and graph neural networks (Kipf & Welling, 2016; Khalil et al., 2017; Joshi et al., 2019) are used to extract hidden states of VRPs and TSPs; this component is called the encoder.
The solution sequences of VRPs are then generated in an autoregressive way from the hidden states,

