TOWARDS ONE-SHOT NEURAL COMBINATORIAL SOLVERS: THEORETICAL AND EMPIRICAL NOTES ON THE CARDINALITY-CONSTRAINED CASE

Abstract

One-shot non-autoregressive neural networks, unlike RL-based ones, have been actively adopted for solving combinatorial optimization (CO) problems and can be trained on the objective score in a self-supervised manner. Such methods have shown superior efficiency (e.g. through parallelization) and potential for tackling predictive CO problems for decision-making under uncertainty. However, discrete constraints often become a bottleneck for gradient-based neural solvers. They are currently handled in three typical ways: 1) adding a soft penalty to the objective, where a bounded violation of the constraints cannot be guaranteed, which is critical in many constraint-sensitive scenarios; 2) perturbing the input to generate an approximate gradient in a black-box manner, where the constraints are exactly obeyed but the approximate gradients can hurt performance on the objective score; 3) a compromise that develops soft algorithms whereby the output of the neural network obeys a relaxed constraint, yet an arbitrary degree of constraint violation can still occur. Towards the ultimate goal of establishing a general framework for neural CO solvers that can control constraint violation to an arbitrarily small degree, in this paper we focus on a more achievable and common setting: cardinality constraints, which can be readily encoded by a differentiable optimal transport (OT) layer. Based on this observation, we propose OT-based cardinality constraint encoding for end-to-end CO problem learning with two variants, Sinkhorn and Gumbel-Sinkhorn, whose violation of the constraints can be exactly characterized and bounded by our theoretical results. On synthetic and real-world CO problem instances, our methods surpass the state-of-the-art CO network and are comparable to (if not superior to) the commercial solver Gurobi. In particular, we further present a case study applying our approach to the predictive portfolio optimization task on real-world asset price data, improving the Sharpe ratio of a strong LSTM+Gurobi baseline under the classic predict-then-optimize paradigm from 1.1 to 2.0.

1. INTRODUCTION

Developing neural networks that can handle combinatorial optimization (CO) problems is a trending research topic (Vinyals et al., 2015; Dai et al., 2016; Yu et al., 2020). A family of recent CO networks (Wang et al., 2019b; Li et al., 2019; Karalias & Loukas, 2020; Bai et al., 2019) improves on the existing reinforcement learning-based auto-regressive CO networks (Dai et al., 2016; Lu et al., 2019) by solving the problem in one shot and relaxing the non-differentiable constraints, resulting in an end-to-end learning pipeline. The advantages of one-shot CO networks are recognized in three aspects: 1) higher efficiency from the GPU-friendly one-shot feed-forward network, compared to CPU-based traditional solvers (Gamrath et al., 2020) and the tedious auto-regressive decoding; 2) self-supervised training on the objective score, which is more practical than supervised learning (Vinyals et al., 2015) and empirically more efficient than reinforcement learning (Schulman et al., 2017); 3) an end-to-end architecture that enables tackling the important predictive CO problems, i.e. decision-making under uncertainty (Wilder et al., 2019; Elmachtoub & Grigas, 2022).

In this paper, we follow the general paradigm of learning to solve CO in one shot presented in the seminal work (Karalias & Loukas, 2020). A neural network CO solver is built upon a problem encoder network, which first accepts raw problem data and predicts the decision variables for the problem. The decision variables are then passed to a differentiable formula to estimate the objective score, and finally the objective score is treated as the self-supervised loss. All modules must be differentiable for end-to-end learning. As a CO solver, the output of the network should obey the constraints of the CO problem while still preserving the gradient. Since the input-output mappings of CO are piecewise constant, where the real gradient is zero almost everywhere or infinite when the output changes, it is notoriously hard to encode CO constraints in neural networks. There are three typical workarounds available: 1) In Karalias & Loukas (2020), the constraints are softly enforced by a penalty term, and the degree of constraint violation can hardly be characterized theoretically or controlled, which limits applicability in many constraint-critical scenarios. Meanwhile, in the obligatory discretization step, adding penalty terms means that the algorithm must search a much larger space than if it were confined to feasible configurations, making the search less efficient and less generalizable (see Table 1). 2) The perturbation-based black-box differentiation methods (Pogančić et al., 2019; Paulus et al., 2021; Berthet et al., 2020) add perturbations to the input-output mapping of discrete functions to estimate an approximate gradient. The strict constraints are thus enforced by brute force, yet the approximate gradients may hurt the learning process. 3) The soft algorithms (Zanfir & Sminchisescu, 2018; Wang et al., 2019a; Sakaue, 2021) encode constraints into neural networks by developing approximate and differentiable algorithms for certain CO problems (graph matching, SAT, submodular). We follow this approach in this paper for its efficiency, yet there remains the possibility of an arbitrary degree of constraint violation.

Towards the ultimate goal of devising a general CO network solver addressing all the above issues, in this paper we focus on developing a more practical paradigm for solving cardinality-constrained CO problems (Buchbinder et al., 2014). The cardinality constraint $\|x\|_0 \le k$ is commonly found in a wide range of applications such as planning facility locations in business operation (Liu, 2009), discovering the most influential seed users in social networks (Chen et al., 2021), and predicting portfolios with controllable operational costs (Chang et al., 2000). Under the cardinality constraint, we aim to find the optimal subset with size k. Like other discrete CO constraints, the cardinality constraint is non-trivial to differentiate through.

In this paper, we propose to encode cardinality constraints into CO networks by a top-k selection over a probability distribution (which is the output of an encoder network). An intuitive approach is to sort all probabilities and select the k largest ones; however, such a process does not offer informative gradients. Inspired by Cuturi et al. (2019); Xie et al. (2020), we develop a soft algorithm by reformulating the top-k selection as an optimal transport problem (Villani, 2009) and efficiently tackle it with the differentiable Sinkhorn algorithm (Sinkhorn, 1964). With a follow-up differentiable computation of the self-supervised loss, we present a CO network whose output is softly cardinality-constrained and capable of end-to-end learning. However, our theoretical characterization of the Sinkhorn-based soft algorithm shows that its violation of the cardinality constraint may grow significantly if the values of the k-th and (k+1)-th probabilities are too close. Being aware of the perturbation-based differentiable methods (Pogančić et al., 2019; Paulus et al., 2021; Berthet et al., 2020) and the Gumbel trick (Jang et al., 2017; Mena et al., 2018; Grover et al., 2019) that can build near-discrete neural networks, in this paper we further incorporate the Gumbel trick, which is crucial for strictly bounding the constraint-violation term to an arbitrary

Table 1: Comparison among CO networks. Both theoretically and empirically, smaller constraint violation (CV) leads to better optimization results. Logarithm terms in CV bounds are ignored.
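To make the idea of top-k selection as optimal transport more concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes a common soft top-k formulation: each of the n scores carries unit mass, which is transported to one of two anchors (the minimum and the maximum score) with total masses n - k and k, and the entropic-regularized problem is solved by alternating row and column normalization. The function names `soft_topk_sinkhorn` and `gumbel_soft_topk`, the choice of anchors, the default values of `tau`, `sigma` and `n_iter`, and the decision to add Gumbel noise directly to the scores are all assumptions made for illustration.

```python
import numpy as np


def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along `axis`, keeping dimensions."""
    m = np.max(a, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))


def soft_topk_sinkhorn(scores, k, tau=0.05, n_iter=200):
    """Soft top-k indicator via entropic OT (illustrative sketch).

    Returns a vector in [0, 1]^n whose sum is approximately k.
    """
    s = np.asarray(scores, dtype=float)
    n = s.shape[0]
    assert 0 < k < n, "k must satisfy 0 < k < n"
    # Row 0 = "not selected" (anchored at the minimum score),
    # row 1 = "selected" (anchored at the maximum score).
    cost = np.stack([s - s.min(), s.max() - s])  # shape (2, n)
    log_T = -cost / tau                          # log of the Gibbs kernel
    log_row_mass = np.log(np.array([n - k, k], dtype=float))[:, None]
    for _ in range(n_iter):
        # Row normalization: rows carry total mass n - k and k.
        log_T = log_T - _logsumexp(log_T, axis=1) + log_row_mass
        # Column normalization: every item carries unit mass.
        log_T = log_T - _logsumexp(log_T, axis=0)
    # Columns sum to 1 exactly; the row sums only approximate (n - k, k)
    # after finitely many iterations.
    return np.exp(log_T[1])


def gumbel_soft_topk(scores, k, tau=0.05, sigma=0.01,
                     n_samples=8, n_iter=200, seed=0):
    """Gumbel-perturbed variant: one soft top-k vector per noise sample.

    Assumption: the Gumbel noise (scale `sigma`) is added to the scores.
    """
    rng = np.random.default_rng(seed)
    s = np.asarray(scores, dtype=float)
    noise = rng.gumbel(loc=0.0, scale=sigma, size=(n_samples, s.shape[0]))
    return np.stack([soft_topk_sinkhorn(s + g, k, tau, n_iter) for g in noise])


if __name__ == "__main__":
    probs = [0.9, 0.1, 0.8, 0.3, 0.2]
    x = soft_topk_sinkhorn(probs, k=2)
    print(np.round(x, 3), x.sum())
```

In this sketch a smaller `tau` pushes the output closer to a 0/1 vector, while scores whose k-th and (k+1)-th largest values nearly tie yield entries far from 0 and 1. This is the regime in which the text above reports that the constraint violation of the Sinkhorn-based soft algorithm may grow.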

funding

* Junchi Yan is the corresponding author. This work was supported in part by the National Key Research and Development Program of China (2020AAA0107600), NSFC (U19B2035, 62222607, 61972250), STCSM (22511105100), and the Shanghai Committee of Science and Technology (21DZ1100100).

