REWRITING BY GENERATING: LEARN TO SOLVE LARGE-SCALE VEHICLE ROUTING PROBLEMS

Abstract

Large-scale vehicle routing problems (VRPs) extend classical VRPs to instances with thousands of customers. Finding high-quality solutions efficiently is of great importance for real-world applications. However, existing algorithms for VRPs, including non-learning heuristics and reinforcement learning (RL) based methods, only perform well on small-scale instances, usually with no more than a hundred customers. They are unable to solve large-scale VRPs because of either high computational cost or an explosive solution space that causes model divergence. Inspired by the classical idea of Divide-and-Conquer, we present a novel Rewriting-by-Generating (RBG) framework with hierarchical RL agents to solve large-scale VRPs. RBG consists of a rewriter agent that refines the customer division globally and an elementary generator that infers regional solutions locally. Extensive experiments demonstrate the effectiveness and efficiency of the proposed RBG framework. It outperforms LKH3, the state-of-the-art method for CVRPs, by 2.43% when the number of customers is $N = 2000$ and reduces inference time by a factor of about 100.¹

1. INTRODUCTION

The large-scale vehicle routing problem (VRP) is an important combinatorial optimization problem defined over a very large set of customer nodes, usually more than a thousand. An efficient and high-quality solution to large-scale VRPs is critical to many real-world applications. Nevertheless, most existing works focus on finding near-optimal solutions for instances with no more than a hundred customers because of the computational complexity (Laporte, 1992; Golden et al., 2008; Braekers et al., 2016). Because VRPs are NP-hard, the solution space expands exponentially with the number of customers, which makes a large-scale instance much more difficult to solve than a small-scale one. Therefore, providing effective and efficient solutions for large-scale VRPs is a challenging problem (Fukasawa et al., 2006).

Current algorithms proposed for routing problems can be divided into traditional non-learning heuristics and reinforcement learning (RL) based models. Many routing solvers use heuristics as their core algorithms, for instance ant colony optimization (Gambardella et al., 1999) and LKH3 (Helsgaun, 2017), which can find a near-optimal solution by greedy exploration. However, they become inefficient as the problem scale grows. Apart from traditional heuristics, RL-based VRP solvers have recently been widely studied as a way to find more efficient and effective solutions (Dai et al., 2017; Nazari et al., 2018; Bello et al., 2017; Kool et al., 2019; Chen & Tian, 2019; Lu et al., 2020). Because they treat the feedback from every learning attempt as a training signal, RL-based methods rely on few hand-crafted rules and can therefore be applied to different customer distributions without human intervention or expert knowledge. In addition, these RL methods benefit from a pre-training process that allows them to infer solutions for new instances much faster than traditional heuristics.
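The expansion of the solution space mentioned above can be made concrete with a little arithmetic. The following is only a rough sketch, under the simplifying assumption that an instance with N customers has on the order of N! candidate visiting orders (capacity constraints and depot returns are ignored). The function names `log10_factorial` and `log10_factorial_ratio` are hypothetical helpers introduced for illustration; the code relies only on Python's standard `math.lgamma`.

```python
import math


def log10_factorial(n: int) -> float:
    """Return log10(n!) without computing n! itself, which would overflow a float."""
    # math.lgamma(n + 1) equals ln(n!) for non-negative integers n.
    return math.lgamma(n + 1) / math.log(10)


def log10_factorial_ratio(n_large: int, n_small: int) -> float:
    """Return log10(n_large! / n_small!), i.e. the gap in orders of magnitude."""
    return log10_factorial(n_large) - log10_factorial(n_small)


if __name__ == "__main__":
    # Assumption: the solution space is approximated by N! orderings of N customers.
    for n in (100, 1000, 2000):
        print(f"N = {n:>4}: N! is about 10^{log10_factorial(n):.1f}")
    gap = log10_factorial_ratio(1000, 100)
    print(f"1000! / 100! is about 10^{gap:.1f}")
```

Under this approximation, the gap between N = 1000 and N = 100 comes out at roughly 2409.6 orders of magnitude in base 10. Working in log space is a deliberate choice here, since the factorials themselves are far too large for floating-point arithmetic.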
However, current RL agents still fail to learn a feasible policy and generate solutions directly on large-scale VRPs because of the vast solution space, which is usually $N!$ for $N$ customers. More specifically, the solution space of a large-scale VRP with 1000 customers is roughly $10^{2409}$ times larger than that of a small-scale one with only 100 customers (since $1000!/100! \approx 4.3 \times 10^{2409}$). Consequently, this complexity makes it difficult for the agent to explore fully and for the model to learn useful knowledge in large-scale VRPs. To avoid the explosion of the solution space in large-scale VRPs, we consider leveraging the classic Divide-and-Conquer idea to decompose the enormous scale of the original problem. In particular,



¹ Code and data will be released at https://github.com/RBG4VRPs/Rewriting-By-Generating

