DISTANCE VS. COORDINATE: DISTANCE-BASED EMBEDDING IMPROVES MODEL GENERALIZATION FOR ROUTING PROBLEMS

Abstract

Routing problems, such as the traveling salesman problem (TSP) and the vehicle routing problem, are among the most classic research topics in combinatorial optimization and operations research (OR). In recent years, with the rapid development of online service platforms, there has been renewed interest in applying this line of research to emerging industrial applications, such as food delivery and logistics services. While OR methods remain the mainstream technique, increasing effort has been put into exploiting deep learning (DL) models for tackling routing problems. Existing DL methods often treat the embedding of route point coordinates as a key model input and are capable of delivering competitive performance in synthetic or simplified settings. However, it has been empirically noted that this line of work appears to lack the robustness and generalization ability that are crucial for real-world applications. In this paper, we demonstrate that the coordinates themselves can unexpectedly cause these problems. Two factors make coordinates rather 'poisonous' for DL models: i) the definition of distance between route points is far more complex than what coordinates can depict; ii) the coordinate space can hardly be sufficiently 'traversed' by the training data. To circumvent these limitations, we propose to abandon the coordinates and instead use relative distances for route point embedding. We show, on both synthetic TSP and a real-world food pickup and delivery route prediction problem, that our design can significantly improve the model's generalization ability and deliver performance that is competitive with or better than that of existing models.

1. INTRODUCTION

Inspired by the success of deep models, such as Transformer (Vaswani et al., 2017) in tackling language tasks and graph neural network (GNN) (Scarselli et al., 2008) in dealing with unstructured data, a growing number of researchers have been drawn to explore the potential of deep learning (DL) models for routing problems, a research direction that has been dominated by operations research (OR) methods for decades. Numerous DL models that have achieved success in other research areas have been applied to traditional routing problems, such as the traveling salesman problem (TSP) and the vehicle routing problem (VRP). More recently, driven by urgent requirements from online logistics service platforms, route prediction has also become an emerging research topic. For example, a platform usually needs to predict and evaluate whether a package is 'distance-consuming' if it is dispatched to a given courier. The predicted route, as well as related route properties, can be used in these evaluations and is vital for improving platform performance. These two kinds of problems, namely route optimization and route prediction, are the main focus of this paper.

Routing problems can, to a great extent, be defined by the properties of route points (called nodes in some literature) and the relationships among them. In light of this, it is not surprising that route point characterization plays an irreplaceable role in algorithm design. To the best of our knowledge, almost all existing DL models take the route point coordinates or their corresponding embeddings as the model input. With such coordinate information, competitive performance is achieved in numerical experiments, mostly conducted on synthetic data. However, when it comes to real-world data, we empirically note that the coordinate information turns out to be 'poisonous' rather than informative.
A DL model that employs the coordinate information often delivers less promising results even after training on a large-scale dataset. Moreover, when noise or perturbations are added to the coordinate input, the model performance may drop dramatically. In comparison, classic OR-based methods seldom face these problems. This may explain why OR methods are still the de facto solutions for many industrial-level routing problems: there remains a great practical gap between OR and DL in real-world applications, and the generalization ability of DL models remains a concern.

We demonstrate that it might be better to abandon the coordinates in order to improve the model generalization ability. More specifically, the lack of generalization ability in many existing DL models is closely related to the 'curse of coordinate'. Two main reasons support this finding. First, coordinates may not be what we really need for routing problems. In problems such as TSP, the goal is to minimize the total traveling distance. It suffices to provide the distances between route points, rather than the coordinate information. Moreover, in the real world, the distance information usually relies on a complicated geographic information system and is far more complex than what simple coordinates can depict. Figure 1 provides a simple illustration for a TSP. The distance between points A and B becomes much longer when a barrier (e.g., a mountain or a highway) exists, while the distance stays the same through the lens of the coordinates. The change in distance in turn changes the optimal solution. This information mismatch, which is common in real-world data, may significantly degrade the model performance.

Second, even large-scale data may not be large enough to provide sufficient samples of real-world coordinates. Apart from inferring the distance from the coordinates, a natural idea can be borrowed from language tasks.
Namely, we can treat coordinates like word tokens and try to learn the token embedding through large-scale training. However, unlike coordinates in the synthetic setting, where nearby coordinates are assumed to resemble each other, two nearby coordinates in the real world can be significantly different. Taking Figure 1 again as an example, although points A and B, separated by a barrier, are close in the coordinate space, they should differ drastically because of the actual distance between them. Therefore, the number of possible coordinates explodes in the real world. DL models may not be sufficiently trained and thus lack generalization ability when using the coordinates.

Our remedy for the 'curse of coordinate' is simple. We propose to use the relative distances between points instead of the coordinates themselves. As shown in Figure 2, a common practice in existing deep models is to first project the coordinates into embedding vectors and then feed the embedding vectors into the deep model. We argue that, by simply replacing the coordinate of a point with a distance vector containing the distances from that point to all points (including the point itself), we can achieve significantly better generalization ability and better model performance than existing work. Moreover, our design helps DL models achieve results that are competitive with, or even better than, those of OR methods in real-world applications. This could pave the way for the large-scale application of DL models in industry.

The remainder of the paper is organized as follows. In Section 2, we review the existing DL methods for handling routing problems. In Section 3, we give a detailed discussion of why the distance-based embedding outperforms the coordinate-based embedding. In Section 4, we support our insight via experiments conducted on both synthetic and real-world data.
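
As a minimal sketch of the distance-vector input described in this section, the snippet below builds, for each route point, the vector of distances to all points (including itself). The function name `distance_features` and the use of Euclidean distance are assumptions made for illustration only; in a real-world setting the entries would presumably come from a geographic information system rather than from the coordinates.

```python
import numpy as np


def distance_features(coords):
    """Return an (n, n) array whose i-th row is the distance vector of point i.

    Euclidean distance is an illustrative assumption; any pairwise
    distance matrix (e.g., road distances) could be used instead.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise differences via broadcasting: shape (n, n, 2).
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Three hypothetical route points on a line.
coords = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
features = distance_features(coords)

# Row i replaces the coordinate of point i as the model input;
# entry (i, i) is the zero distance from the point to itself.
shifted = distance_features(coords + 100.0)
```

One property that is easy to check with this sketch is that translating every coordinate by the same offset leaves the distance vectors unchanged, so `features` and `shifted` agree, whereas the raw coordinate input would differ.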

2. RELATED WORK

Recent years have witnessed the active use of DL models, such as Transformer (Vaswani et al., 2017) and GNN (Scarselli et al., 2008), for tackling route optimization and prediction problems.



Figure 1: TSP example: a barrier between points A and B significantly changes the optimal solution.
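
The effect illustrated in Figure 1 can be reproduced on a toy instance. The sketch below is not taken from the paper's experiments: the four points, the barrier cost of 6 units, and the helper names are hypothetical choices, and the exhaustive search is only viable for very small instances. The coordinates are identical in both cases; only the A-B entry of the distance matrix changes.

```python
from itertools import permutations

import numpy as np


def tour_length(dist, tour):
    """Length of the closed tour visiting `tour` in order and returning to the start."""
    n = len(tour)
    return sum(dist[tour[k], tour[(k + 1) % n]] for k in range(n))


def brute_force_tsp(dist):
    """Exhaustively search all tours that start at point 0 (tiny n only)."""
    n = len(dist)
    tours = ((0,) + rest for rest in permutations(range(1, n)))
    best = min(tours, key=lambda t: tour_length(dist, t))
    return best, float(tour_length(dist, best))


def uses_edge(tour, i, j):
    """True if points i and j are adjacent in the closed tour."""
    n = len(tour)
    return any({tour[k], tour[(k + 1) % n]} == {i, j} for k in range(n))


# Hypothetical instance: A=(0,0), B=(1,0), C=(1,3), D=(0,3).
coords = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 3.0], [0.0, 3.0]])
euclid = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)

# Same coordinates, but a barrier makes the A-B trip 6 units instead of 1.
barrier = euclid.copy()
barrier[0, 1] = barrier[1, 0] = 6.0

tour_plain, len_plain = brute_force_tsp(euclid)
tour_barrier, len_barrier = brute_force_tsp(barrier)
```

On this instance the Euclidean optimum is the perimeter tour of length 8, which travels directly between A and B. With the barrier, the best tour avoids the direct A-B leg and has length 6 + 2√10, approximately 12.32. A model that sees only the coordinates receives identical input in both cases.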

