AIA: LEARN TO DESIGN GREEDY ALGORITHM FOR NP-COMPLETE PROBLEMS USING NEURAL NETWORKS

Abstract

Algorithm design is an art that relies heavily on the intuition and expertise of human designers as well as on insight into the problem under consideration. In particular, the design of greedy-selection rules, the core of greedy algorithms, is usually a great challenge: a greedy algorithm is relatively easy to understand, but an effective greedy-selection rule is difficult to find. In this study, we present an approach, called AIA, to learn algorithm design with the aid of neural networks. We consider the minimum weighted set cover problem (WSCP), an NP-hard problem, as a representative example. First, we formulate a given WSCP as a 0-1 integer linear program (ILP): each variable $x_i$ has two options, $x_i = 0$, which denotes abandoning the set $s_i$, and $x_i = 1$, which denotes selecting $s_i$. Each option of a variable leads to a sub-problem of the original ILP. Next, we design a generic search framework to find the optimal solution to the ILP. At each search step, the value of a variable is determined with the aid of a neural network. The key to our neural network is the loss function: the original ILP and the sub-problems generated by assigning a variable $x_i$ should satisfy the Bellman-Ford equation, which enables us to use the violation of the Bellman-Ford equation as the loss function of our neural network. The trained neural network is used as the greedy-selection rule. Experimental results on representative instances suggest that, using the NN-based greedy-selection rule, we can successfully find optimal solutions. More importantly, the NN-based greedy-selection rule outperforms the well-known Chvatal greedy algorithm, which was designed by a human expert. The basic idea of our approach can be readily extended, without significant modification, to design greedy algorithms for other NP-hard problems.

1. INTRODUCTION

NP-complete problems, the hardest problems in the class NP, have solutions that can be verified in polynomial time, but no polynomial-time algorithm has yet been found to solve them. A great variety of practical problems can be formulated as NP-complete problems, such as strategic planning, production planning, and facility location, as well as a variety of scheduling and routing problems. Thus, despite their hardness, efficient algorithms for NP-complete problems are highly desirable. The weighted set cover problem (WSCP) is a classical NP-complete problem that aims to find a subset of columns covering all the rows of a 0-1 matrix at minimal cost (Karp, 1972). Exact algorithms for the WSCP, such as branch-and-bound and branch-and-cut, can only handle instances of limited size. Therefore, considerable effort has been devoted to designing heuristics and metaheuristics that can find optimal or near-optimal solutions to large-scale WSCP instances within a reasonable time. Recent metaheuristic approaches for the WSCP include genetic algorithms, ant colony optimization, simulated annealing, and tabu search.

With the breakthrough of deep learning (DL) in solving practical problems, many researchers have tried to use DL to solve combinatorial optimization problems. Training a neural network end-to-end with supervised learning to solve theoretically complex combinatorial optimization problems is difficult. On the one hand, traditional algorithms and mathematical methods have a relatively rich theoretical foundation, while the neural network, as a black box, lacks one. On the other hand, many practical problems have their own data characteristics, while DL models usually require a large amount of labeled data drawn from the target distribution. Constructing labeled data requires knowing the optimal solution of the original problem, so it is very difficult to build a large-scale dataset like ImageNet (Russakovsky et al., 2015).

Thus, we do not solve this problem directly end-to-end, as is done in other fields. In this study, we present an approach, called AIA, to learn algorithm design with the aid of neural networks. The specific goal of this paper is to use machine learning to find greedy rules for solving the WSCP and then to design an efficient and practical algorithm for the WSCP. Our main contributions are as follows:
1. We propose an approach in which the neural network learns greedy strategies to assist researchers in designing algorithms, instead of using deep learning to solve the problem end-to-end.
2. We propose the NNVal algorithm, which uses a simple neural network to score recursive sub-problems; the scores guide a multi-step decision-making process, and a novel loss function is designed to train the neural network.
3. We propose the NNGreedy algorithm for the WSCP. Experimental results on multiple datasets show that, compared with greedy algorithms designed from human experience, such as the Chvatal algorithm, the NNGreedy algorithm obtains better solutions.

2. RELATED WORKS AND BACKGROUND

There are two traditional approaches to solving combinatorial optimization problems: exact algorithms and approximate/heuristic algorithms. Exact algorithms are guaranteed to find optimal solutions, but they become intractable as the problem scales up. Approximate algorithms trade optimality for computational efficiency. They are problem-specific and are often designed by iteratively applying a simple hand-crafted rule, known as a heuristic. Their complexity is polynomial, and their quality is measured by an approximation ratio that characterizes the worst-case or average-case error with respect to the optimal solution.

For NP-complete or even NP-hard constraint programming problems, exact algorithms usually adopt a divide-and-conquer approach, dividing the solution process into multiple parts and gradually eliminating poor choices until the optimal solution is found. These methods are essentially exhaustive search, and their time complexity is exponential. The core of the divide-and-conquer strategy is how to decompose the problem into sub-problems and which sub-problem to solve first. Taking the Traveling Salesman Problem (TSP) as an example, the solver Concorde (Applegate et al., 2002) combines branch-and-bound with the cutting-plane method. For more general mixed integer linear programming problems, mainstream solvers such as Gurobi (Bixby, 2007), CPLEX (Cplex, 2009), XPRESS (Laundy et al., 2009), and SCIP (Gamrath et al., 2016) also use branch-and-bound combined with cutting-plane and column-generation techniques.

When searching for the optimal solution, heuristic rules play an important role. Taking mixed integer programming as an example, selecting an appropriate branching variable in the branch-and-bound process requires heuristic rules, and so does each selection of the next sub-problem. Choosing appropriate branching variables and sub-problems can significantly reduce the search space. In addition, to obtain a good feasible solution as early as possible and thereby speed up pruning, the algorithm runs a simple and fast primal heuristic on each sub-problem. A good primal heuristic finds better feasible solutions faster. Heuristic rules depend on the specific problem and solution process. In other words, different heuristic rules and different parameters have different effects on different data and at different stages of the solution process. Therefore, data-driven machine learning is a promising way to design heuristic rules.

In the last decade, DL has significantly improved computer vision, natural language processing, and speech recognition by replacing hand-crafted features with features learned from data (LeCun et al., 2015). On the one hand, combinatorial optimization algorithms are often used as a complement to deep learning solutions. DETR (Carion et al., 2020) uses a bipartite graph matching algorithm to replace the NMS post-processing of traditional object detection, which solves a pain point in this field. DeepSORT (Wojke et al., 2017), one of the most popular and general object tracking algorithms, uses the Hungarian algorithm to decide whether an object in the current frame is the same as one in the previous frame. On the other hand, more and more researchers have introduced neural networks into combinatorial optimization, an area called Neural Combinatorial Optimization (Garmendia et al., 2022), which attempts to learn good heuristics for solving a set of problems using neural network models. The use of machine learning techniques in solving NP-complete problems can be divided into two categories: learning from expert knowledge, that is, supervised learning, and learning from experience, that is, reinforcement learning. Both are briefly described below.

Supervised learning is relatively common and easy to implement. In the branch-and-bound framework of mixed integer programming solvers, there are more than 10 heuristic rules for selecting branching variables. The strong branching strategy is recognized as the branching variable selection strategy that minimizes the size of the branch search tree (Achterberg et al., 2005). At each branching, it selects the variable whose child sub-problems yield the best improvement of the lower bound. The cost is that, for each candidate branch, the lower bound of the new sub-problem must be calculated. This step introduces a lot of computation, and the vast majority of it cannot be reused in subsequent steps. This makes strong branching lag behind other rules in solving time. To overcome this shortcoming, Alvarez et al. (2014) used a special kind of decision tree to learn the variable selection of strong branching. Khalil et al. (2016) propose an instance-specific learning framework. For each problem, the choices of the strong branching strategy are recorded on the first several sub-problems, and the features of each variable at each step are extracted at the same time. They train an SVMrank model and use the learned model on the subsequent sub-problems. Gasse et al. (2019) use a graph convolutional neural network to extract deeper information on variables and constraints before each branching in order to learn the choices of the strong branching strategy. He et al. (2014) designed a machine learning algorithm that selects, among all open sub-problems, the sub-problem whose subtree contains the optimal solution. Joshi et al. (2019) train a deep graph network with supervision to predict the probability that an edge belongs to the TSP tour; a feasible tour is then generated by beam search. This approach is more sample-efficient than reinforcement learning.

Reinforcement learning techniques are mostly used in algorithms for specific NP-hard problems. Taking the TSP as an example, Vinyals et al. (2015) proposed the pointer network, which introduces a pointer in the decoder RNN and handles variable-size output dictionaries using a neural attention mechanism. This model removes the restriction that the output size is strictly tied to the input size used during training. Bello et al. (2016) use a similar model structure, set the current total distance as the reward, and train the network with reinforcement learning, which addresses some issues that supervised learning finds difficult to handle, such as non-unique reference answers. Kool et al. (2018) replaced the Recurrent Neural Network (RNN) with a Graph Neural Network (GNN) to process the input. All three works use reinforcement learning to train end-to-end models. The strong results of Transformer (Vaswani et al., 2017) models on NLP and CV tasks (Khan et al., 2021) have led researchers to study their application to the TSP. Bresson & Laurent (2021) propose to adapt the Transformer architecture to the combinatorial TSP. Training is done by reinforcement learning, hence without TSP training solutions, and decoding uses beam search. As for learning specific heuristic rules, Khalil et al. (2017) use a graph neural network to encode graph information and make choices.

In general, reinforcement learning methods are more widely used in solving combinatorial optimization problems because they do not require much prior knowledge, while supervised learning methods are more efficient in sampling and training. All in all, using machine learning to assist the solution of combinatorial optimization problems is a very promising research trend. The main question is whether DL can learn better heuristics from data, i.e., replace human-designed heuristics.

3. STATE-TRANSITION EQUATION

The WSCP is the problem of covering the rows of an $m$-row, $n$-column 0-1 matrix $A$ with a subset of the columns at minimum cost. Problem 1 is the matrix form of the WSCP:

$$\min\; z = c^{T}x \quad \text{s.t.} \quad Ax \ge b, \quad x_j = 0 \text{ or } 1 \;\; (j = 1, 2, \dots, n). \tag{1}$$

We define a state $s_k = [A, b, c; k]$ $(k = 0, 1, \dots, n)$. The case $k = 0$ represents the original problem, and $k > 0$ means that the variables $x_1, x_2, \dots, x_k$ have already been fixed. We treat the problem as a multi-step decision-making process that determines whether a variable $x_k$ is 0 or 1 at each step. If $x_1 = 1$ at the first step, $A$, with $m$ rows and $n$ columns, becomes $A'$, which has $m$ rows and $n-1$ columns, and $b$ becomes $b - \alpha_1$ ($\alpha_j$ is the $j$-th column of $A$). We redefine the problem as follows:

$$\min\; z = c^{T}x \quad \text{s.t.} \quad \begin{cases} Ax \ge b \\ x_1, x_2, \dots, x_k \text{ are fixed} \\ x_{k+1}, x_{k+2}, \dots, x_n = 0 \text{ or } 1 \end{cases} \tag{2}$$

We define $f(s_k)$ as the optimal objective value of sub-problem 2 under the state $s_k$. The state-transition equation then follows, with the parent state on the left-hand side as in the recursion of Section 4.1:

$$f(s_k) = \min_{x_{k+1} = 0 \text{ or } 1} f(s_{k+1}), \quad k = 0, \dots, n-1. \tag{3}$$
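To make the state definition concrete, the following minimal sketch evaluates $f(s_k)$ by brute-force enumeration on a tiny instance and checks that a state's value equals the minimum over its two successor states. The instance (`A`, `b`, `c`) and the helper name `f` with its `fixed` argument are illustrative assumptions, not part of the original experiments, and exhaustive enumeration is only viable for very small $n$.

```python
from itertools import product

# Hypothetical toy instance: 3 rows (elements to cover), 4 columns (sets).
A = [[1, 0, 1, 0],
     [1, 1, 0, 0],
     [0, 1, 0, 1]]
b = [1, 1, 1]          # every row must be covered at least once
c = [3, 2, 1, 1]       # cost of each column


def f(A, b, c, fixed=()):
    """Optimal objective of the sub-problem in which the first
    len(fixed) variables take the values given in `fixed`.
    Returns +inf when no completion satisfies Ax >= b."""
    n = len(c)
    best = float("inf")
    for tail in product((0, 1), repeat=n - len(fixed)):
        x = list(fixed) + list(tail)
        feasible = all(
            sum(a * xi for a, xi in zip(row, x)) >= bi
            for row, bi in zip(A, b)
        )
        if feasible:
            best = min(best, sum(ci * xi for ci, xi in zip(c, x)))
    return best


def transition_holds(fixed):
    """Check that the value of a state equals the minimum over its two successors."""
    return f(A, b, c, fixed) == min(f(A, b, c, fixed + (0,)),
                                    f(A, b, c, fixed + (1,)))
```

Calling `f(A, b, c)` gives the value of the original problem ($k = 0$), while `f(A, b, c, (1,))` gives the value of the state reached after fixing $x_1 = 1$.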

4. METHOD

We use a neural network to score each sub-problem and guide multi-step decision-making according to equation 3, instead of predicting the solution to the sub-problem. We call this algorithm NNVal.

4.1. PROBLEM REFORMULATION

Suppose the optimal objective value of sub-problem 2 is $f(A, b, c)$. We have the following state-transition equation at state $s_k$:

$$f(A', b', c') = \min\big(f(A'', b', c''),\; c_{k+1} + f(A'', b' - \alpha_{k+1}, c'')\big) \tag{4}$$

with

$$\begin{aligned}
A' &= A'_{m \times (n-k)} = (\alpha_{k+1}, \alpha_{k+2}, \dots, \alpha_n) \\
A'' &= A''_{m \times (n-k-1)} = (\alpha_{k+2}, \alpha_{k+3}, \dots, \alpha_n) \\
b' &= b'_{m} = b - (x_1^{*}\alpha_1 + \dots + x_k^{*}\alpha_k) \\
c' &= c'_{n-k} = (c_{k+1}; c_{k+2}; \dots; c_n) \\
c'' &= c''_{n-k-1} = (c_{k+2}; c_{k+3}; \dots; c_n)
\end{aligned}$$

where $x_k^{*}$ is the value of the variable determined at state $s_{k-1}$ and $\alpha_k$ is the $k$-th column of $A$. We define $f(A, b, c) = +\infty$ when $Ax \ge b$ has no feasible solution. Thus, given the function $f$, it is easy to get:

$$x_{k+1} = \begin{cases} 0, & f(A'', b', c'') \le c_{k+1} + f(A'', b' - \alpha_{k+1}, c'') \\ 1, & f(A'', b', c'') > c_{k+1} + f(A'', b' - \alpha_{k+1}, c'') \end{cases} \tag{5}$$
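The recursion and the decision rule above can be sketched directly in code. The sketch below uses an exact recursive evaluation of $f$ on the reduced data (remaining columns, residual right-hand side, remaining costs) and then applies the decision rule one variable at a time. The function names `f` and `decide` and the column-list representation are assumptions made for illustration; the exponential recursion is only meant for tiny instances.

```python
INF = float("inf")


def f(cols, b, c):
    """Exact value of the reduced sub-problem.
    cols: remaining columns alpha_{k+1..n} (each a list of 0/1 entries),
    b: residual right-hand side b', c: remaining costs c'."""
    if not cols:
        # No variables left: feasible only if every residual is already <= 0.
        return 0.0 if all(bi <= 0 for bi in b) else INF
    alpha, rest = cols[0], cols[1:]
    skip = f(rest, b, c[1:])                                   # x_{k+1} = 0
    take = c[0] + f(rest, [bi - ai for bi, ai in zip(b, alpha)], c[1:])  # x_{k+1} = 1
    return min(skip, take)


def decide(cols, b, c):
    """Apply the decision rule step by step and return the full 0/1 vector."""
    x = []
    while cols:
        alpha, rest = cols[0], cols[1:]
        b_take = [bi - ai for bi, ai in zip(b, alpha)]
        skip = f(rest, b, c[1:])
        take = c[0] + f(rest, b_take, c[1:])
        if skip <= take:
            x.append(0)
        else:
            x.append(1)
            b = b_take          # the residual shrinks only when the column is taken
        cols, c = rest, c[1:]
    return x


# Hypothetical toy instance given column by column.
cols = [[1, 1, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1]]
costs = [3, 2, 1, 1]
solution = decide(cols, [1, 1, 1], costs)
```

With an exact $f$, the rule recovers an optimal assignment; the point of NNVal is to replace this expensive exact evaluation with a learned score.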

4.2. NNVAL ALGORITHM

A simple idea is to fit $f$ with a neural network, but this may not be feasible. On the one hand, labeled data are difficult to obtain. On the other hand, a small change in the input may cause a large change in the output, so it is difficult for a neural network to predict the target accurately. We therefore decide to make full use of the state-transition equation. Our model $g_\theta$ does not directly learn the optimal values of the sub-problems, but learns the recursive relationship between the original problem and the sub-problems. In dynamic programming, the recurrence relation is as follows:

$$\mathrm{Objval}(P) = \min\{\mathrm{Objval}(SP_0), \mathrm{Objval}(SP_1)\} \tag{7}$$

where $P$ is the original problem and $SP_0$, $SP_1$ are the sub-problems of $P$. We want our model to learn this recursive relationship as well, so we define $L_{square}$ as follows:

$$L_{square} = \big(g_\theta(P) - \min\{g_\theta(SP_0), g_\theta(SP_1)\}\big)^2 \tag{8}$$

We want the output for the original problem to be as close as possible to the output for the correct sub-problem. However, this is not enough, because sub-problems are sometimes infeasible. Thus, we add the auxiliary term $L_{aux}$:

$$L_{aux} = \mathrm{ReLU}\big(g_\theta(\text{feasible } SP) - g_\theta(\text{infeasible } SP)\big) \tag{9}$$

where $\mathrm{ReLU}(x) = \max(x, 0)$. $L_{aux}$ equals 0 when the model output for the feasible sub-problem is less than the output for the infeasible one; if the infeasible sub-problem would be selected, this term produces a positive penalty. All in all, $L_{square}$ mainly makes the model learn the recursive relationship, and $L_{aux}$ handles the case where a sub-problem is infeasible.
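As a small numerical illustration of the two loss terms, the sketch below evaluates them on plain floating-point scores. The function names `l_square` and `l_aux` are assumptions for illustration; in actual training the scores would be outputs of $g_\theta$ inside an automatic-differentiation framework rather than Python floats.

```python
def relu(x):
    """ReLU(x) = max(x, 0)."""
    return max(x, 0.0)


def l_square(g_p, g_sp0, g_sp1):
    """Squared gap between the parent score and the smaller child score."""
    return (g_p - min(g_sp0, g_sp1)) ** 2


def l_aux(g_feasible, g_infeasible):
    """Positive only when the feasible child is scored at or above the
    infeasible one by some margin, i.e. when the wrong child would win."""
    return relu(g_feasible - g_infeasible)


# Parent scored 3.0, children 3.0 and 4.0: the recursion is satisfied.
consistent = l_square(3.0, 3.0, 4.0)
# Feasible child scored 5.0, infeasible child 2.0: penalty of 3.0.
penalty = l_aux(5.0, 2.0)
```

Note that `l_aux` is zero whenever the feasible sub-problem already has the lower score, so it only shapes the ordering of the two children and leaves the recursion term untouched.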

4.4. IMPROVEMENT

The algorithm discussed above relies on external library functions to judge the feasibility of a sub-problem, so it is suitable for any 0-1 ILP. This section instead exploits the structure of the weighted set cover problem itself, which is more controllable and flexible. To distinguish the two, we call the algorithm discussed above NNVal V1 and the algorithm discussed next NNVal V2. In NNVal V2, we still use a two-layer fully-connected neural network, as in V1. The difference from V1 is that a ReLU layer is added. The structure of the neural network is shown in Figure 2, where $b = \mathrm{ReLU}(b)$.

Consider the sub-problem corresponding to state $s_k = [A, b, c; k]$:

$$\begin{aligned}
\min\;& c_{k+1}x_{k+1} + \dots + c_n x_n + (c_1 x_1^{*} + \dots + c_k x_k^{*}) \\
\text{s.t.}\;& \alpha_{k+1}x_{k+1} + \dots + \alpha_n x_n \ge b - (\alpha_1 x_1^{*} + \dots + \alpha_k x_k^{*}) \\
& x_{k+1}, x_{k+2}, \dots, x_n = 0 \text{ or } 1
\end{aligned} \tag{10}$$

where the constant term on the right-hand side of each constraint takes an integer value not exceeding 1. In fact, for such a 0-1 ILP, whether an entry of $b - (\alpha_1 x_1^{*} + \dots + \alpha_k x_k^{*})$ is 0 or less than 0 has no effect on the solution, so we can use ReLU to reduce the complexity of training. The $i$-th row of the inequality constraint can be written as

$$a_{i,k+1}x_{k+1} + \dots + a_{i,n}x_n \ge b_{i,k+1}$$

where $b_{i,k+1} \in \mathbb{Z} \cap \{x \mid x \le 1\}$ is the $i$-th entry of $b - (\alpha_1 x_1^{*} + \dots + \alpha_k x_k^{*})$, $a_{i,j} \in \{0, 1\}$ is the $i$-th entry of $\alpha_j$, and $x_j \in \{0, 1\}$. If $b_{i,k+1} \le 0$, then $x_{k+1}, \dots, x_n$ can each be 0 or 1 arbitrarily. However, if $b_{i,k+1} = 1$, the row constraint may further restrict the values of $x_{k+1}, \dots, x_n$. In particular, if $b_{i,k+1} = 1$, $a_{i,l} = 1$ and $a_{i,k+1} = \dots = a_{i,l-1} = a_{i,l+1} = \dots = a_{i,n} = 0$, then the row constraint becomes $x_l \ge 1$ and we get $x_l = 1$. Thus, before each state transition, we check whether $x_{k+1}$ needs to be fixed to 1. If so, we directly set $x_{k+1} = 1$; otherwise, we make the state transition according to the output of the model $g_\theta$.
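The two structural observations in this section, clipping the residual right-hand side with ReLU and detecting a variable that must be fixed to 1, can be sketched as follows. The helper names `relu_vec` and `next_var_forced` and the column-list representation are illustrative assumptions; the check is written for the next variable $x_{k+1}$ only, matching the description above.

```python
def relu_vec(b):
    """Clip the residual right-hand side: entries <= 0 are already satisfied."""
    return [max(bi, 0) for bi in b]


def next_var_forced(cols, b):
    """Return True if x_{k+1} must be set to 1.

    cols: remaining columns alpha_{k+1..n}, with cols[0] = alpha_{k+1};
    b: residual right-hand side for the current state.
    x_{k+1} is forced when some still-uncovered row (residual 1) has its
    only remaining 1-entry in column k+1."""
    b = relu_vec(b)
    first, others = cols[0], cols[1:]
    for i, bi in enumerate(b):
        if bi == 1 and first[i] == 1 and all(col[i] == 0 for col in others):
            return True
    return False


# Hypothetical remaining columns: only the first one covers row index 1.
remaining = [[0, 1, 1], [1, 0, 0], [0, 0, 1]]
forced = next_var_forced(remaining, [1, 1, 1])       # row 1 needs cols[0]
not_forced = next_var_forced(remaining, [1, -1, 1])  # row 1 is already covered
```

When the check returns `False`, the decision for $x_{k+1}$ is left to the model score, as described in the text.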

4.5. NNGREEDY ALGORITHM

Based on the NNVal algorithm, we design a greedy rule and the greedy algorithm NNGreedy. The NNGreedy algorithm makes decisions step by step from $x_1$ to $x_n$ and finally obtains a complete solution. Without loss of generality, we denote by $P_k$ the original problem of the $k$-th step. $SP_{k0}$ corresponds to the sub-problem when $x_k = 0$ and $SP_{k1}$ corresponds to the sub-problem when $x_k = 1$. The greedy rule adopted by the NNGreedy algorithm is as follows.

Algorithm 1 NNGreedy algorithm greedy rule
1: if $g_\theta(SP_{k0}) < g_\theta(SP_{k1})$ then
2:     $x_k = 0$
3: else
4:     $x_k = 1$
5: end if

In a specific WSCP, for $S_i \in \mathcal{F}$, $\forall i \in \{1, 2, \dots, n\}$, $x_i = 0$ means that $S_i$ is not chosen and $x_i = 1$ means that $S_i$ is selected. The pseudo-code of the NNGreedy algorithm is as follows.

Algorithm 2 NNGreedy greedy algorithm
1: for $i = 1 \to n$ do
2:     Calculate $x_i$ according to the NNGreedy greedy rule
3: end for
4: Output $x_1, x_2, \dots, x_n$ and $\sum_{i=1}^{n} w_i x_i$
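The greedy loop can be sketched with a pluggable scoring function standing in for the trained network. In the sketch below, `nn_greedy` and `exact_g` are hypothetical names; `exact_g` is an exact brute-force scorer used only so the example runs without a trained model, and the score of $SP_{k1}$ is taken to include the cost of the selected column, following the definition of RHS1 in Appendix A (an assumption about how the two scores are compared).

```python
INF = float("inf")


def exact_g(cols, b, c):
    """Stand-in for g_theta: exact value of the reduced sub-problem.
    A trained network would replace this function."""
    if not cols:
        return 0.0 if all(bi <= 0 for bi in b) else INF
    skip = exact_g(cols[1:], b, c[1:])
    take = c[0] + exact_g(cols[1:], [bi - ai for bi, ai in zip(b, cols[0])], c[1:])
    return min(skip, take)


def nn_greedy(cols, b, c, g):
    """Decide x_1..x_n in order using the greedy rule:
    x_k = 0 if score(SP_k0) < score(SP_k1), else x_k = 1."""
    x, total = [], 0.0
    for k in range(len(cols)):
        rest_cols, rest_c = cols[k + 1:], c[k + 1:]
        b_take = [bi - ai for bi, ai in zip(b, cols[k])]
        score0 = g(rest_cols, b, rest_c)              # sub-problem with x_k = 0
        score1 = c[k] + g(rest_cols, b_take, rest_c)  # sub-problem with x_k = 1
        if score0 < score1:
            x.append(0)
        else:
            x.append(1)
            b = b_take
            total += c[k]
    return x, total


# Hypothetical toy instance given column by column.
cols = [[1, 1, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1]]
weights = [3, 2, 1, 1]
x, total = nn_greedy(cols, [1, 1, 1], weights, exact_g)
```

Because the rule uses a strict inequality, ties are resolved in favor of selecting the set; any callable with the same signature as `exact_g` can be passed as `g`.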

5.1. NNVAL EXPERIMENTS

All experiments use PyTorch 1.6. The CPU is an Intel Core i7-8700K, and the GPU is an NVIDIA GeForce GTX 1080Ti. We randomly generate weighted set cover problems as the dataset, following the method of Balas & Ho (1980). We use the following two scales:
1. Small-scale ILP instances: universe size $|U| = 20$, 20 subsets $S_1, S_2, \dots, S_{20}$, and constraint matrix density 0.1, where the density is the probability that an entry of the constraint matrix equals 1.
2. Large-scale ILP instances: universe size $|U| = 50$, 50 subsets $S_1, S_2, \dots, S_{50}$, and the same constraint matrix density as the small-scale instances.

The NNVal algorithm is mainly used to assist decision-making at every step, and the sub-problems at different stages have different scales. To demonstrate the effectiveness of the algorithm, we conduct experiments on problems of different scales. We evaluate on sub-problem pairs $(SP_0, SP_1)$. Table 1 shows how NNVal estimates the order between the two sub-problems of each pair on small-scale and large-scale instances; the estimation accuracy is 95.4% and 86.5%, respectively.
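For readers who want to reproduce instances of a similar shape, the following is a simplified sketch of a random generator with a given density. It is not the Balas & Ho (1980) procedure itself: the Bernoulli sampling, the repair step that guarantees every row can be covered, and the integer cost range 1 to 100 are all assumptions made for illustration, as is the function name `random_wscp`.

```python
import random


def random_wscp(m, n, density, seed=None):
    """Sample an m x n 0-1 constraint matrix whose entries are 1 with
    probability `density`, plus a cost vector and an all-ones right-hand side."""
    rng = random.Random(seed)
    A = [[1 if rng.random() < density else 0 for _ in range(n)]
         for _ in range(m)]
    # Repair step (assumption): make sure every row is covered by some column,
    # otherwise the instance would be trivially infeasible.
    for row in A:
        if sum(row) == 0:
            row[rng.randrange(n)] = 1
    c = [rng.randint(1, 100) for _ in range(n)]   # assumed cost range
    b = [1] * m
    return A, b, c


# Shapes matching the two scales described above.
small = random_wscp(20, 20, 0.1, seed=0)
large = random_wscp(50, 50, 0.1, seed=0)
```

The repair step slightly raises the effective density above the nominal value on sparse instances, which is worth keeping in mind when comparing against published benchmarks.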

5.2. NNGREEDY EXPERIMENTS

We compare the NNGreedy algorithm with Chvatal's greedy algorithm, which is designed purely from human experience, in terms of time and solution quality.

On 100 small-scale instances, we compare the NNGreedy algorithm and the Chvatal greedy algorithm from two perspectives:
1. Number of instances solved to optimality: NNGreedy obtains the optimal solution on 87 instances, and the Chvatal greedy algorithm obtains the optimal solution on 11 instances.
2. Solution quality: on 87 instances, the solution obtained by NNGreedy is better than the solution obtained by the Chvatal greedy algorithm, and on 3 instances it is inferior.

Table 2 shows the detailed solution quality comparison of the two greedy algorithms on 10 small-scale instances. As shown in Table 2, the solution quality of the NNGreedy algorithm is better than that of the Chvatal greedy algorithm.

On 10 large-scale instances, we also compare the NNGreedy algorithm and the Chvatal greedy algorithm from the same two perspectives:
1. Number of instances solved to optimality: NNGreedy obtains the optimal solution on one instance, and the Chvatal greedy algorithm does not obtain the optimal solution on any instance.
2. Solution quality: on 8 instances, the solution obtained by NNGreedy is better than the solution obtained by the Chvatal greedy algorithm, and on 2 instances it is inferior.

Table 3 shows the detailed solution quality comparison of the two greedy algorithms on 10 large-scale instances. The experimental results show that the solution quality of the NNGreedy algorithm is better than that of the Chvatal greedy algorithm.

Although the NNGreedy algorithm is slow because training is embedded in the solution process, the running time is acceptable: each iteration takes about 0.01s to 0.1s.
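For reference, the baseline can be sketched in a few lines. The sketch below implements the standard Chvatal greedy rule for weighted set cover, which repeatedly selects the set with the smallest cost per newly covered element; the function name `chvatal_greedy`, the column-list input format, and the first-index tie-breaking are assumptions for illustration and may differ from the implementation used in the experiments.

```python
def chvatal_greedy(cols, c):
    """Greedy weighted set cover.

    cols[j][i] == 1 means set j covers element i; c[j] is the cost of set j.
    Returns the list of chosen set indices and their total cost."""
    uncovered = set(range(len(cols[0])))
    chosen = []
    while uncovered:
        def ratio(j):
            # Cost per newly covered element; useless sets get +inf.
            gain = sum(1 for i in uncovered if cols[j][i] == 1)
            return c[j] / gain if gain else float("inf")

        j = min(range(len(cols)), key=ratio)   # ties go to the lowest index
        if ratio(j) == float("inf"):
            raise ValueError("some element cannot be covered by any set")
        chosen.append(j)
        uncovered -= {i for i in uncovered if cols[j][i] == 1}
    return chosen, sum(c[j] for j in chosen)


# Hypothetical toy instance given column by column.
cols = [[1, 1, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1]]
costs = [3, 2, 1, 1]
chosen, cost = chvatal_greedy(cols, costs)
```

Unlike NNGreedy, which fixes variables in index order, this rule is free to pick any set at each step, so the two algorithms can differ both in the sets chosen and in the order in which they are chosen.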

A LOSS FUNCTION FOR ILP

The neural network model is denoted as $g_\theta$, where $\theta$ denotes the parameters of the neural network. We first require $g_\theta$ to satisfy the state-transition equation:

$$g_\theta(A', b', c') = \min\big(g_\theta(A'', b', c''),\; c_{k+1} + g_\theta(A'', b' - \alpha_{k+1}, c'')\big)$$

We define the following variables:

$$\begin{aligned}
\mathrm{LHS} &= g_\theta(s_k) = g_\theta(A', b', c') \\
\mathrm{RHS1} &= g_\theta(s_k, x_{k+1} = 1) = c_{k+1} + g_\theta(A'', b' - \alpha_{k+1}, c'') \\
\mathrm{RHS0} &= g_\theta(s_k, x_{k+1} = 0) = g_\theta(A'', b', c'')
\end{aligned}$$

We want $g_\theta$ to satisfy the state-transition equation 14:

$$\mathrm{LHS} = \begin{cases} \min(\mathrm{RHS1}, \mathrm{RHS0}), & x_{k+1} \text{ is free} \\ \mathrm{RHS1}, & x_{k+1} = 0 \text{ is infeasible} \\ \mathrm{RHS0}, & x_{k+1} = 1 \text{ is infeasible} \end{cases} \tag{14}$$

An important fact is that we have defined $f(A, b, c) = +\infty$ when the problem has no feasible solution with $x_i = 0$ or 1. This constraint should be reflected in the loss function as well. Therefore, the loss function (for a one-step state transition) can be designed as follows:

$$\mathrm{loss}(\theta) = \begin{cases} (\mathrm{LHS} - \min(\mathrm{RHS1}, \mathrm{RHS0}))^2, & x_{k+1} \text{ is free} \\ (\mathrm{LHS} - \mathrm{RHS1})^2 + \mathrm{ReLU}(\mathrm{RHS1} - \mathrm{RHS0}), & x_{k+1} = 0 \text{ is infeasible} \\ (\mathrm{LHS} - \mathrm{RHS0})^2 + \mathrm{ReLU}(\mathrm{RHS0} - \mathrm{RHS1}), & x_{k+1} = 1 \text{ is infeasible} \end{cases} \tag{15}$$

The purpose of the ReLU term is to make the output of $g_\theta$ large enough when the sub-problem is infeasible. In fact, in a one-step state transition, $g_\theta$ yields the correct decision as long as its output for the infeasible state is greater than its output for the feasible state. When training is over, the optimal solution can be obtained according to equation 16 in each step.
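The three-case loss can be written as a single function of the three scores and a feasibility status. The sketch below operates on plain floats for clarity; the function name `one_step_loss` and the string labels for the three cases are illustrative assumptions, and a training implementation would compute the same expression on differentiable tensors.

```python
def relu(x):
    return max(x, 0.0)


def one_step_loss(lhs, rhs1, rhs0, status):
    """One-step state-transition loss.

    lhs  : score of the parent state s_k
    rhs1 : c_{k+1} + score of the child with x_{k+1} = 1
    rhs0 : score of the child with x_{k+1} = 0
    status: 'free', 'x0_infeasible' (x_{k+1} = 0 is infeasible)
            or 'x1_infeasible' (x_{k+1} = 1 is infeasible)."""
    if status == "free":
        return (lhs - min(rhs1, rhs0)) ** 2
    if status == "x0_infeasible":
        # Parent must match the x=1 child; the infeasible x=0 child
        # should be scored at least as high as the feasible one.
        return (lhs - rhs1) ** 2 + relu(rhs1 - rhs0)
    if status == "x1_infeasible":
        return (lhs - rhs0) ** 2 + relu(rhs0 - rhs1)
    raise ValueError(f"unknown status: {status}")


# The infeasible x=0 child is scored below the feasible x=1 child: penalty 2.0.
example = one_step_loss(lhs=4.0, rhs1=4.0, rhs0=2.0, status="x0_infeasible")
```

The ReLU term vanishes as soon as the infeasible child is scored above the feasible one, which is exactly the ordering the greedy rule needs at decision time.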



Appendix B provides the training process, including a flowchart and pseudocode.

More interpretation is given in Appendix A.

This paper is an attempt to use deep learning to assist algorithm design. The greedy algorithm proposed in this paper does not rely on human experience to design greedy rules but learns them instead; it is an intelligent search algorithm that can help overcome the limitations of human experience. We believe that data-driven algorithm design will become a new hot topic.



To preserve all information of the problem and simplify the training procedure, we use a simple two-layer fully-connected neural network to learn the recursive relationship. The input of the model $g_\theta$ is the same as the input of $f$: the input layer has $m \times n + m + n$ nodes, corresponding to $A$, $b$ and $c$ of the original Problem 1. For a sub-problem input, the $\alpha_k$ and $c_k$ corresponding to fixed variables $x_k$ are set to 0. The output layer of the network has 1 node. Between them is 1 hidden layer with $m + n$ nodes. The whole model is shown in Figure 1.
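The input encoding and layer sizes described above can be sketched without any deep learning library. In the sketch below, the row-major flattening order of $A$, the ReLU hidden activation, the random initialization, and the names `encode`, `init_params` and `g_theta` are all assumptions for illustration; only the layer sizes and the zeroing of fixed columns and costs come from the description.

```python
import random


def encode(A, b, c, k):
    """Flatten a state into a vector of length m*n + m + n.
    Columns and costs of the first k (already fixed) variables are zeroed;
    b is expected to be the residual right-hand side of the state."""
    m, n = len(A), len(c)
    flat_A = [0 if j < k else A[i][j] for i in range(m) for j in range(n)]
    flat_c = [0 if j < k else c[j] for j in range(n)]
    return flat_A + list(b) + flat_c


def init_params(m, n, seed=0):
    """Random weights for a network with m*n+m+n inputs, m+n hidden nodes, 1 output."""
    rng = random.Random(seed)
    d_in, d_hid = m * n + m + n, m + n
    W1 = [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_hid)]
    bias1 = [0.0] * d_hid
    W2 = [rng.uniform(-0.1, 0.1) for _ in range(d_hid)]
    bias2 = 0.0
    return W1, bias1, W2, bias2


def g_theta(params, x):
    """Forward pass: linear -> ReLU (assumed activation) -> linear, scalar output."""
    W1, bias1, W2, bias2 = params
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bb)
              for row, bb in zip(W1, bias1)]
    return sum(w * h for w, h in zip(W2, hidden)) + bias2


# Hypothetical 3 x 4 instance with the first variable already fixed (k = 1).
A = [[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 0, 1]]
vec = encode(A, [1, 1, 1], [3, 2, 1, 1], k=1)
score = g_theta(init_params(3, 4), vec)
```

Keeping the input length fixed at $m \times n + m + n$ for every state is what allows one network to score sub-problems from all stages of the decision process.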

Figure 1: Neural Network for NNVal

Figure 2: Neural network for improved version of NNVal

Let us rethink the sub-problem corresponding to state $s_k = [A, b, c; k]$:

$$\begin{aligned}
\min\;& c_{k+1}x_{k+1} + \dots + c_n x_n + (c_1 x_1^{*} + \dots + c_k x_k^{*}) \\
\text{s.t.}\;& \alpha_{k+1}x_{k+1} + \dots + \alpha_n x_n \ge b - (\alpha_1 x_1^{*} + \dots + \alpha_k x_k^{*}) \\
& x_{k+1}, x_{k+2}, \dots, x_n = 0 \text{ or } 1
\end{aligned}$$

Figure 3: Training process used in NNVal V1

The estimation relationship on sub-problem pairs at 2 scales
Columns: Scale; Obj($SP_0$) < Obj($SP_1$); Obj($SP_0$) > Obj($SP_1$); Acc

Comparison of solution quality on small scale instances

Comparison of solution quality on big scale instances

Algorithm 4 Training Algorithm for NNVal V2

 1: Initialize $g_\theta$, data pool $D$, exploration rate $\epsilon$, learning rate $\eta$ and max iteration number $T$, $t \leftarrow 1$
 2: while $t < T$ do
 3:     Trj, sol ← GENERATETRAJECTORY($g_\theta$, $A$, $c$)
    ...
        if label == True then
16:         According to $R_1$, $R_0$, calculate $x_i$ with exploration rate $\epsilon$, update $b_{tmp}$, push $x_i$ into sol
    ...
        return label, fix_idxs
38: end function

