REWRITING BY GENERATING: LEARN TO SOLVE LARGE-SCALE VEHICLE ROUTING PROBLEMS

Abstract

Large-scale vehicle routing problems (VRPs) extend the classical VRPs to instances with thousands of customers. Finding efficient, high-quality solutions to such problems is of great importance for real-world applications. However, existing algorithms for VRPs, including non-learning heuristics and RL-based methods, only perform well on small-scale instances, usually with no more than a hundred customers. They are unable to solve large-scale VRPs due to either high computation cost or an explosive solution space that causes model divergence. Inspired by the classical idea of Divide-and-Conquer, we present a novel Rewriting-by-Generating (RBG) framework with hierarchical RL agents to solve large-scale VRPs. RBG consists of a rewriter agent that refines the customer division globally and an elementary generator that infers regional solutions locally. Extensive experiments demonstrate the effectiveness and efficiency of our proposed RBG framework: it outperforms LKH3, the state-of-the-art method for CVRPs, by 2.43% when the customer number is N = 2000 and shortens the inference time by about 100 times.¹

1. INTRODUCTION

Large-scale Vehicle Routing Problems (VRPs) are important combinatorial optimization problems defined over an enormous distribution of customer nodes, usually more than a thousand. An efficient, high-quality solution to large-scale VRPs is critical to many real-world applications. Yet most existing works focus on finding near-optimal solutions for instances with no more than a hundred customers because of the computational complexity (Laporte, 1992; Golden et al., 2008; Braekers et al., 2016). Owing to the NP-hard nature of VRPs, the exponential expansion of the solution space makes a large-scale instance much harder to solve than a small-scale one. Providing effective and efficient solutions for large-scale VRPs is therefore a challenging problem (Fukasawa et al., 2006).

Current algorithms for routing problems can be divided into traditional non-learning heuristics and reinforcement learning (RL) based models. Many routing solvers use heuristics as their core algorithms, for instance ant colony optimization (Gambardella et al., 1999) and LKH3 (Helsgaun, 2017), which find near-optimal solutions by greedy exploration. However, they become inefficient as the problem scale grows. Apart from traditional heuristics, RL-based VRP solvers have been widely studied recently to find more efficient and effective solutions (Dai et al., 2017; Nazari et al., 2018; Bello et al., 2017; Kool et al., 2019; Chen & Tian, 2019; Lu et al., 2020). Because the learning process treats every outcome of a learning attempt as a training signal, RL-based methods rely on few hand-crafted rules and can thus be applied to different customer distributions without human intervention or expert knowledge. Moreover, these methods benefit from a pre-training process that allows them to infer solutions for new instances much faster than traditional heuristics.
However, current RL agents still cannot learn a feasible policy and generate solutions directly on large-scale VRPs due to the vast solution space, which is of order N! for N customers. More specifically, the solution space of a large-scale VRP with 1000 customers is e^2409 times larger than that of a small-scale one with only 100 customers. This complexity makes it difficult for the agent to fully explore the space and for the model to learn useful knowledge on large-scale VRPs.

To avoid this explosion of the solution space, we leverage the classic Divide-and-Conquer idea to decompose the enormous scale of the original problem: we divide the large-scale customer distribution into small-scale regions and then generate individual regional solutions, reducing the problem complexity. However, two challenges remain for our VRP solver: how to obtain a refined region division in which the local VRPs can be handled effectively, and how to coordinate the iterations between global and local optimization efficiently.

To tackle these two challenges, we propose an RL-based framework, named Rewriting-by-Generating (RBG), to solve large-scale VRPs. The framework adopts a hierarchical RL structure consisting of a "Generator" and a "Rewriter". First, we divide customers into regions and use an elementary RL-based VRP solver to solve each region locally, known as the "Generation" process. Then, from a global perspective, a "Rewriting" process revises the previous solution with new divisions and the corresponding new regional VRP results: within each rewriting step, we select and merge two regions into a hyper-region, and then divide it into two new sub-regions according to the hyper-regional VRP solution.
In this way, the problem is decomposed into pieces that can be solved efficiently by regional RL-based solvers, while the rewriter continuously improves solution quality from the global view. Extensive experiments demonstrate that our RBG framework achieves strong performance in a much more efficient manner. It has a clear advantage in solution quality over other RL-based methods, outperforms the state-of-the-art LKH3 (Helsgaun, 2017) by 2.43% at problem size N = 2000, and infers solutions about 100 times faster. Moreover, its superiority over other methods grows as the problem scale increases.
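As a concrete illustration, the generate-and-rewrite loop described above can be sketched in Python. This is our own simplified, capacity-free sketch, not the paper's implementation: `solve_region` stands in for the learned generator (here a greedy nearest-neighbor heuristic), routes ignore demands, and a rewrite is accepted only when it shortens the combined routes. All function names are hypothetical.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def solve_region(depot, customers):
    """Stand-in for the RL-based generator: greedy nearest-neighbor route."""
    route, remaining, cur = [], list(customers), depot
    while remaining:
        nxt = min(remaining, key=lambda c: dist(cur, c))
        remaining.remove(nxt)
        route.append(nxt)
        cur = nxt
    return route

def route_cost(depot, route):
    """Cost of depot -> customers in order -> depot."""
    cost, cur = 0.0, depot
    for c in route:
        cost += dist(cur, c)
        cur = c
    return cost + dist(cur, depot)

def rewrite_step(depot, routes, rng):
    """One rewriting step: merge two regions into a hyper-region, re-solve
    it, split the hyper-regional tour into two new sub-regions, and keep
    the rewrite only if it improves the combined cost."""
    i, j = rng.sample(range(len(routes)), 2)
    old = route_cost(depot, routes[i]) + route_cost(depot, routes[j])
    hyper_tour = solve_region(depot, routes[i] + routes[j])
    half = len(hyper_tour) // 2
    cand_i, cand_j = hyper_tour[:half], hyper_tour[half:]
    new = route_cost(depot, cand_i) + route_cost(depot, cand_j)
    if new < old:
        routes[i], routes[j] = cand_i, cand_j

def total_cost(depot, routes):
    return sum(route_cost(depot, r) for r in routes)
```

Because each rewrite merges and re-partitions exactly two regions, the set of customers is preserved and the total cost is non-increasing over rewriting steps, mirroring the continuous improvement of the rewriter.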

Notations:

We introduce some fundamental notations of large-scale VRPs; the complete formulation is presented in the Appendix. Let G(V, E) denote the entire graph of all customers and the depot. Specifically, V = {v_0, v_1, ..., v_i, ..., v_N}, where v_0 denotes the depot and v_i (1 ≤ i ≤ N) denotes the i-th customer with location (x_i, y_i) and demand d_i. The edge e_{i,j}, also written E(v_i, v_j), represents the traveling distance between v_i and v_j. Within the RBG framework, the generated regional VRP solution π_k = {v_{k,0}, v_{k,1}, v_{k,2}, ..., v_{k,N_k}} of the divided region G_k has a corresponding traveling cost C(π_k) = Σ_{i=0}^{N_k} E(v_{k,i}, v_{k,i+1}), where v_{k,N_k+1} = v_{k,0} so that the route returns to the depot. The entire solution over all customers is denoted by π.
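To make the cost definition concrete, the following is a minimal sketch (our own illustration, not the paper's code) that computes C(π_k) for a route given as a list of (x, y) points starting at the depot, with the final index wrapping back to the start so the vehicle returns to the depot:

```python
import math

def traveling_cost(route):
    """C(pi_k) = sum over i of E(v_{k,i}, v_{k,i+1}), where the route
    starts at the depot v_{k,0} and the last leg wraps back to it."""
    total = 0.0
    for a, b in zip(route, route[1:] + route[:1]):
        total += math.hypot(a[0] - b[0], a[1] - b[1])  # E(v_i, v_j)
    return total
```

For example, a route around the unit square, [(0, 0), (0, 1), (1, 1), (1, 0)], has cost 4.0.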

2. RELATED WORK

We discuss previous works related to our research in the following two directions.

Traditional Heuristics. Since exact methods (Laporte, 1992; Laporte & Nobert, 1987; Holland, 1992; Baldacci et al., 2010) can hardly solve VRPs within a reasonable time due to their high computational complexity, researchers have developed heuristics, i.e., non-exact methods, to find approximate solutions instead. Tabu search is one of the earliest metaheuristics (Glover, 1990b;a; Gendreau et al., 1994; Battiti & Tecchiolli, 1994); it keeps searching for new solutions in the neighborhood of the current solution. Instead of improving merely one solution, genetic algorithms operate on a population of solutions (Goldberg, 1989; Holland, 1992), continuously constructing new structures from parent structures. Rather than optimizing all objectives together, ant colony optimization, another widely accepted solver, uses several ant colonies to optimize different objectives, such as the number of vehicles and the total distance (Gambardella et al., 1999; Dorigo et al., 2006; Dorigo & Di Caro, 1999). Meanwhile, ruin-and-recreate methods repeatedly destroy part of the current solution and reconstruct it to build better solutions (Schrimpf et al., 2000), which expands the exploration space and helps avoid local optima. Among these categories, LKH3 is a state-of-the-art heuristic solver that empirically finds optimal solutions (Helsgaun, 2017). Although these heuristics improve search efficiency over exact methods, they remain far too time-consuming when applied to large-scale VRPs at acceptable performance levels, and may fail to respond to real-time solution requests.

RL-based VRP Solutions.
Since the learning paradigm of reinforcement learning allows an agent to directly infer solutions from a pre-trained model with much shorter computation time, RL has become a compelling direction for solving combinatorial optimization problems, and it has been successfully applied to VRPs in particular (Bello et al., 2017; Nazari et al., 2018; Kool et al., 2019). Vinyals et al. (2015) were the first to adopt deep learning for combinatorial optimization, via the novel Pointer Network model. Inspired by this, Bello et al. (2017) proposed to use RL to learn model parameters as an



¹ Code and data will be released at https://github.com/RBG4VRPs/Rewriting-By-Generating

