BQ-NCO: BISIMULATION QUOTIENTING FOR GENERALIZABLE NEURAL COMBINATORIAL OPTIMIZATION

Abstract

Despite the success of Neural Combinatorial Optimization methods for end-to-end heuristic learning, out-of-distribution generalization remains a challenge. In this paper, we present a novel formulation of combinatorial optimization (CO) problems as Markov Decision Processes (MDPs) that effectively leverages symmetries of the CO problems to improve out-of-distribution robustness. Starting from the standard MDP formulation of constructive heuristics, we introduce a generic transformation based on bisimulation quotienting (BQ) in MDPs. This transformation reduces the state space by accounting for the intrinsic symmetries of the CO problem and facilitates the MDP solving. We illustrate our approach on the Traveling Salesman, Capacitated Vehicle Routing and Knapsack Problems. We present a BQ reformulation of these problems and introduce a simple attention-based policy network that we train by imitation of (near) optimal solutions for small instances from a single distribution. We obtain new state-of-the-art generalization results for instances with up to 1000 nodes from synthetic and realistic benchmarks that vary both in size and node distributions.

1. INTRODUCTION

Combinatorial Optimization problems are crucial in many application domains such as transportation, energy, logistics, etc. Because they are generally NP-hard (Cook et al., 1997), their resolution at real-life scales is mainly done by heuristics, which are efficient algorithms that generally produce good quality solutions (Boussaïd et al., 2013). However, strong heuristics are generally problem-specific and designed by domain experts. Neural Combinatorial Optimization (NCO) is a relatively recent line of research that focuses on using deep neural networks to learn such heuristics from data, possibly exploiting information on the specific distribution of problem instances of interest (Bengio et al., 2021; Cappart et al., 2021). Despite the impressive progress in this field over the last few years, out-of-distribution generalization, especially to larger instances, remains a major hurdle (Joshi et al., 2022; Manchanda et al., 2022). In this paper, we are interested in constructive NCO methods, which build a solution incrementally, by applying a sequence of elementary steps. These methods are often quite generic, see e.g. the seminal papers by Khalil et al. (2017); Kool et al. (2019). Most CO problems can indeed be represented in this way, although the representation is not unique as the nature of the steps is, to a large extent, a matter of choice. Given a choice of step space, solving the CO problem amounts to computing an optimal policy for sequentially selecting the steps in the construction. This task can typically be performed in the framework of Markov Decision Processes (MDP), through imitation or reinforcement learning. The exponential size of the state space, inherent to the NP-hardness of combinatorial problems, usually precludes other methods such as (tabular) dynamic programming.
Whatever the learning method used to solve the MDP, its efficiency, and in particular its out-of-distribution generalization capabilities, greatly depends on the state representation. The state space is often characterized by deep symmetries which, if they are not adequately identified and leveraged, hinder the training process by forcing it to independently learn the policy at states which are in fact essentially the same (modulo some symmetry). In this work, we investigate a type of symmetry that often occurs in MDP formulations of constructive CO heuristics. We first introduce a generic framework to systematically derive a naive CO problem-specific MDP. We formally demonstrate the equivalence between solving the MDP and solving the CO problem and highlight the flexibility of the MDP formulation, by defining a minimal set of conditions for the equivalence to hold. Our framework is general and easy to specialize to encompass previously proposed learning-based construction heuristics. We then show that the state space of this naive MDP is inefficient because it fails to capture deep symmetries of the CO problem, even though such symmetries are easy to identify. Therefore, we propose a method to transform the naive MDP, based on the concept of bisimulation quotienting (BQ), in order to obtain a reduced state space which is easier for the usual (approximate) MDP solvers to process. We illustrate our approach on three well-known CO problems: the Traveling Salesman Problem (TSP), the Capacitated Vehicle Routing Problem (CVRP) and the Knapsack Problem (KP). Furthermore, we propose a simple transformer-based architecture for these problems, which we train by imitation of expert trajectories derived from (near) optimal solutions. In particular, we show that our model is well-suited to our BQ formulation: it spends a monotonically increasing amount of computation as a function of the subproblem size (and therefore complexity), in contrast to most previous models.
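To make the kind of symmetry at stake concrete, consider the TSP: two partial tours over the same instance that end at the same node and leave the same set of unvisited nodes define exactly the same remaining subproblem, regardless of the order in which the intermediate nodes were visited. The following sketch (our own illustration, not the paper's implementation; the `reduced_state` mapping is a hypothetical name) shows how such partial solutions collapse to a single reduced state:

```python
# Illustrative sketch: in the TSP, a partial tour's future depends only on
# its last visited node, the set of remaining nodes, and the node the tour
# must eventually return to -- not on the order of the visited prefix.

def reduced_state(nodes, partial_tour):
    """Map a partial tour to its 'tail subproblem' representation:
    (current node, unvisited nodes, return node)."""
    remaining = frozenset(nodes) - set(partial_tour)
    return (partial_tour[-1], remaining, partial_tour[0])

nodes = ["a", "b", "c", "d", "e"]
tour1 = ["a", "b", "c", "d"]   # visited a -> b -> c -> d
tour2 = ["a", "c", "b", "d"]   # different prefix order, same end node

# Both partial tours induce the same remaining subproblem, so a policy
# should act identically in both states.
assert reduced_state(nodes, tour1) == reduced_state(nodes, tour2)
```

A naive state representation that keeps the full step sequence would treat `tour1` and `tour2` as distinct states; bisimulation quotienting is precisely about identifying them.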
Finally, extensive experiments confirm the validity of our approach, and in particular its state-of-the-art out-of-distribution generalization capacity. In summary, our contributions are as follows: 1) We present a generic and flexible framework to define a construction heuristic MDP for arbitrary CO problems; 2) We propose a method to simplify commonly used "naive" MDPs for constructive NCO via symmetry-focused bisimulation quotienting; 3) We design an adequate transformer-based architecture for the new MDP, for the TSP, CVRP and KP; 4) We achieve state-of-the-art generalization performance on these three problems.

2. COMBINATORIAL OPTIMIZATION AS A MARKOV DECISION PROCESS

In this section, we present a generic formalization of constructive heuristics which underlies their MDP formulation. A deterministic CO problem is denoted by a pair (F, X), where F is its objective function space and X its (discrete) solution space. A problem instance f ∈ F is a mapping f : X → R ∪ {∞}, with the convention that f(x) = ∞ if x is infeasible for instance f. A solver for problem (F, X) is a functional SOLVE : F → X satisfying SOLVE(f) = arg min_{x∈X} f(x).

Incremental solution construction. Constructive heuristics for CO problems build a solution sequentially, starting from an empty partial solution and expanding it at each step until a finalized solution is reached. Many NCO approaches are based on a formalization of this process as an MDP, e.g. Khalil et al. (2017); Kool et al. (2019); Zhang et al. (2020). Such an MDP can be obtained, for an arbitrary CO problem (F, X), using the following ingredients:

• Steps: T is a set of available steps to construct solutions. A partial solution is a pair (f, t_{1:n}) of a problem instance f ∈ F and a sequence of steps t_{1:n} ∈ T* (the set of sequences of elements of T). Observe that a partial solution (in F × T*) is not a solution (in X), but may represent one.
• Representation: SOL : F × T* → X ∪ {⊥} is a mapping that takes a partial solution and returns either a feasible solution (in which case the partial solution is said to be finalized), or ⊥ otherwise.
• Evaluation: VAL : F × T* → R ∪ {∞} is a mapping that takes a partial solution and returns an estimate of the minimum value of its expansions into finalized solutions. If the returned value is finite, the partial solution is said to be admissible.
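As a concrete instantiation of these ingredients, here is a minimal sketch of a TSP specification ⟨T, SOL, VAL⟩ in Python. The design choices are ours, for illustration only: steps are node indices, a partial solution is finalized once every node appears exactly once, and VAL uses the partial tour length as a crude admissible estimate for non-finalized prefixes.

```python
import math

def sol(coords, steps):
    """SOL: return the tour (a feasible solution) if the partial solution
    is finalized, i.e. the steps form a permutation of all nodes;
    return None (playing the role of the bottom element) otherwise."""
    n = len(coords)
    if len(steps) == n and set(steps) == set(range(n)):
        return tuple(steps)
    return None

def tour_length(coords, tour):
    """Objective f: total length of a closed tour."""
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def val(coords, steps):
    """VAL: exact value for a finalized partial solution; for an admissible
    prefix (no repeated node), the length of the open partial tour serves
    as a simple lower-bound-style estimate; inf marks inadmissible prefixes."""
    tour = sol(coords, steps)
    if tour is not None:
        return tour_length(coords, tour)
    if len(set(steps)) < len(steps):   # repeated node: cannot be expanded
        return math.inf
    return sum(math.dist(coords[steps[i]], coords[steps[i + 1]])
               for i in range(len(steps) - 1))
```

For instance, on the unit square `[(0,0), (0,1), (1,1), (1,0)]`, the step sequence `[0, 1, 2, 3]` is finalized with value 4.0, while `[0, 0]` is inadmissible (VAL = ∞).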
In order to define an MDP using these ingredients, we assume they satisfy the following axioms:

∀f ∈ F, x ∈ X : f(x) < ∞ ⇔ ∃ t_{1:n} ∈ T* such that SOL(f, t_{1:n}) = x, (2a)
∀f ∈ F, t_{1:n} ∈ T* : SOL(f, t_{1:n}) ≠ ⊥ ⇒ ∀m ∈ {1:n−1}, SOL(f, t_{1:m}) = ⊥, (2b)
∀f ∈ F, t_{1:n} ∈ T*, x ∈ X : SOL(f, t_{1:n}) = x ⇒ VAL(f, t_{1:n}) = f(x) and ∀m ∈ {1:n−1}, VAL(f, t_{1:m}) < ∞. (2c)

Equation 2a states that the feasible solutions are exactly those represented by a finalized partial solution; Equation 2b states that if a partial solution is finalized then none of its preceding partial solutions in the construction can be finalized; Equation 2c states that the evaluation of a finalized partial solution is the value of the solution it represents, and that all its preceding partial solutions are admissible. We call a triplet ⟨T, SOL, VAL⟩ satisfying the above axioms a specification of problem (F, X). Note that a specification is not intrinsic to the problem. The step space T results from a choice
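On a small enough instance, axioms (2a)–(2c) can be checked exhaustively. The self-contained sketch below does so for a toy 3-node TSP under a specification of our own design (steps are node indices; a sequence is finalized when it is a permutation of all nodes); all names here are hypothetical, not from the paper.

```python
import itertools
import math

# Toy 3-node instance; f is the tour length, inf for infeasible x.
coords = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
N = len(coords)

def f(x):
    if sorted(x) != list(range(N)):
        return math.inf
    return sum(math.dist(coords[x[i]], coords[x[(i + 1) % N]]) for i in range(N))

def sol_(steps):
    """SOL: finalized iff steps is a permutation of all nodes; None plays bottom."""
    return tuple(steps) if sorted(steps) == list(range(N)) else None

def val_(steps):
    """VAL: exact value when finalized, inf on repeats, 0.0 as a trivial
    admissible lower bound for proper prefixes (all distances are >= 0)."""
    if len(set(steps)) < len(steps):
        return math.inf
    x = sol_(steps)
    return f(x) if x is not None else 0.0

# All step sequences of length <= N (longer ones are never finalized here).
seqs = [s for k in range(N + 1) for s in itertools.product(range(N), repeat=k)]

# (2a): feasible solutions = finalized partial solutions.
feasible = set(itertools.permutations(range(N)))
finalized = {sol_(s) for s in seqs if sol_(s) is not None}
assert feasible == finalized

for s in seqs:
    x = sol_(s)
    if x is not None:
        # (2b): no proper prefix of a finalized sequence is finalized.
        assert all(sol_(s[:m]) is None for m in range(1, len(s)))
        # (2c): finalized value equals f(x), and every proper prefix is admissible.
        assert val_(s) == f(x)
        assert all(val_(s[:m]) < math.inf for m in range(1, len(s)))
```

The constant-zero estimate for prefixes is deliberately crude: the axioms only require admissibility (finiteness), leaving the quality of VAL's estimate as a design choice.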

