NEURAL LEARNING OF ONE-OF-MANY SOLUTIONS FOR COMBINATORIAL PROBLEMS IN STRUCTURED OUTPUT SPACES

Abstract

Recent research has proposed neural architectures for solving combinatorial problems in structured output spaces. In many such problems, there may exist multiple solutions for a given input, e.g., a partially filled Sudoku puzzle may have many completions satisfying all constraints. Further, we are often interested in finding any one of the possible solutions, without any preference among them. Existing approaches completely ignore this solution multiplicity. In this paper, we argue that being oblivious to the presence of multiple solutions can severely hamper their training ability. Our contribution is twofold. First, we formally define the task of learning one-of-many solutions for combinatorial problems in structured output spaces, which is applicable to several problems of interest, such as N-Queens and Sudoku. Second, we present a generic learning framework that adapts an existing prediction network for a combinatorial problem to handle solution multiplicity. Our framework uses a selection module whose goal is to dynamically determine, for every input, the solution that is most effective for training the network parameters in any given learning iteration. We propose an RL-based approach to jointly train the selection module with the prediction network. Experiments on three different domains, using two different prediction networks, demonstrate that our framework significantly improves accuracy in our setting, obtaining up to a 21-point gain over the baselines.

1. INTRODUCTION

Neural networks have become the de facto standard for solving perceptual tasks over low-level representations, such as pixels in an image or audio signals. Recent research has also explored their application to symbolic reasoning tasks requiring higher-level inferences, such as neural theorem proving (Rocktäschel et al., 2015; Evans & Grefenstette, 2018; Minervini et al., 2020), and playing blocks world (Dong et al., 2019). The advantage of neural models for these tasks is that they create a unified, end-to-end trainable representation for integrated AI systems that combine perceptual and high-level reasoning. Our paper focuses on one such high-level reasoning task: solving combinatorial problems in structured output spaces, e.g., solving a Sudoku or N-Queens puzzle. These can be thought of as Constraint Satisfaction Problems (CSPs) where the underlying constraints are not explicitly available and need to be learned from training data. We focus on learning such constraints with a non-autoregressive neural model, in which the variables in the structured output space are decoded simultaneously (and therefore independently). Notably, most of the current state-of-the-art neural models for solving combinatorial problems, e.g., SATNET (Wang et al., 2019), RRN (Palm et al., 2018), and NLM (Dong et al., 2019), use non-autoregressive architectures because of their high efficiency of training and inference: they do not have to decode the solution sequentially. One of the key characteristics of such problems is solution multiplicity: there could be many correct solutions for any given input, even though we may be interested in finding any one of them. For example, in a game of Sudoku with only 16 digits filled, there are always multiple correct solutions (McGuire et al., 2012), and obtaining any one of them suffices for solving Sudoku.
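As a concrete illustration of non-autoregressive decoding, the toy sketch below (our own illustration, not any of the cited architectures) decodes every output variable in one parallel argmax over independent per-variable score vectors, with no variable conditioning on another:

```python
import numpy as np

def decode_non_autoregressive(logits):
    """Decode all output variables simultaneously (and independently).

    logits: array of shape (r, L) -- one score vector per output variable.
    Returns an r-vector of label indices in a single parallel step.
    """
    return logits.argmax(axis=1)

# Toy example: r = 4 output variables, L = 3 labels each.
logits = np.array([[2.0, 0.1, 0.3],
                   [0.2, 1.5, 0.1],
                   [0.0, 0.2, 3.0],
                   [1.0, 0.9, 0.8]])
print(decode_non_autoregressive(logits))  # -> [0 1 2 0]
```

Because no variable waits on another, decoding costs a single forward pass, which is the efficiency advantage mentioned above; the price is that the network cannot coordinate variables at decode time, which is exactly why solution multiplicity becomes a training issue.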
Unfortunately, existing literature has completely ignored solution multiplicity, resulting in sub-optimally trained networks. Our preliminary analysis of a state-of-the-art neural Sudoku solver (Palm et al., 2018)¹, which trains and tests on instances with single solutions, showed that it achieves a high accuracy of 96% on instances with a single solution, but its accuracy drops to less than 25% when tested on inputs that have multiple solutions. Intuitively, the challenge comes from the fact that (a) there could be a very large number of possible solutions for a given input, and (b) the solutions may be highly varied. For example, a 16-givens Sudoku puzzle could have as many as 10,000 solutions, with a maximum Hamming distance of 61 between any two solutions. Hence, we argue that an explicit modeling effort is required to represent this solution multiplicity. As the first contribution of our work, we formally define the novel problem of One-of-Many Learning (1oML). It is given training data of the form {(x_i, Y_{x_i})}, where Y_{x_i} denotes a subset of all the correct outputs associated with input x_i. The goal of 1oML is to learn a function f such that, for any input x, f(x) = y for some y ∈ Y_x. We show that a naïve strategy that uses a separate loss term for each (x_i, y_{ij}) pair, where y_{ij} ∈ Y_{x_i}, can result in a bad likelihood objective. Next, we introduce a multiplicity-aware loss (CC-LOSS) and demonstrate its limitations for non-autoregressive models over structured output spaces. In response, we present our first-cut approach, MINLOSS, which picks the single y_{ij} closest to the prediction ŷ_i under the current parameters of the prediction network (the base architecture for function f), and uses it to compute and back-propagate the loss for that training sample x_i.
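The MINLOSS selection just described can be sketched as follows. This is a minimal illustration under the simplifying assumptions that the prediction network emits independent per-variable distributions and that the loss is per-variable cross-entropy; the function names are ours, not the paper's:

```python
import numpy as np

def cross_entropy(probs, y):
    """Mean per-variable cross-entropy of a candidate solution y
    (an r-vector of label indices) under predicted distributions
    probs of shape (r, L)."""
    r = len(y)
    return -np.mean(np.log(probs[np.arange(r), y] + 1e-12))

def minloss_target(probs, solutions):
    """MINLOSS: among all correct solutions for this input, pick the
    one closest to the current prediction (lowest loss); only that
    solution's loss would then be back-propagated."""
    losses = [cross_entropy(probs, y) for y in solutions]
    return solutions[int(np.argmin(losses))]

# Toy example: r = 2 variables, L = 2 labels, two valid solutions.
probs = np.array([[0.9, 0.1],
                  [0.6, 0.4]])
sols = [np.array([0, 0]), np.array([1, 1])]
picked = minloss_target(probs, sols)  # -> the solution nearer ŷ, [0, 0]
```

Note that the choice is purely greedy with respect to the current parameters, which is the property the next paragraph shows can be sub-optimal.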
Though significantly better than naïve training, MINLOSS can be sub-optimal in certain scenarios, as we demonstrate through a simple example, due to its inability to pick a y_{ij} based on global characteristics of the solution space. To alleviate these issues, we present two exploration-based techniques, I-EXPLR and SELECTR, which select a y_{ij} in a non-greedy fashion, unlike MINLOSS. Both techniques are generic in the sense that they can work with any prediction network for the given problem. I-EXPLR relies on the prediction network itself for selecting y_{ij}, whereas SELECTR is an RL-based learning framework that uses a selection module to decide which y_{ij} should be picked for a given input x_i for back-propagating the loss in the next iteration. SELECTR's selection module is trained jointly with the prediction network using reinforcement learning, thus allowing us to trade off exploration and exploitation in selecting the optimal y_{ij} by learning a probability distribution over the space of possible y_{ij}'s for any given input x_i. We experiment on three CSPs: N-Queens, Futoshiki, and Sudoku. Our prediction networks for the first two problems are constructed using Neural Logic Machines (Dong et al., 2019), and for Sudoku, we use a state-of-the-art neural solver based on Recurrent Relational Networks (Palm et al., 2018). In all three problems, our experiments demonstrate that SELECTR vastly outperforms naïve baselines by up to 21 points, underscoring the value of explicitly modeling solution multiplicity. SELECTR also consistently improves on other multiplicity-aware methods, viz. CC-LOSS, MINLOSS, and I-EXPLR.
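The exploration idea behind SELECTR can be illustrated with a bare-bones REINFORCE update. In this schematic (our own toy construction, not the paper's actual architecture, which conditions the selection module on the input and the prediction network), the selector keeps a distribution over the candidate solutions of one input, samples a candidate to train on, and reinforces candidates whose prediction loss beats the average:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy selection module: one logit per candidate solution of a single input.
sel_logits = np.zeros(3)

def select_and_update(losses, lr=0.5):
    """Sample a candidate solution to train on, then apply a REINFORCE
    update to the selector: candidates whose prediction-network loss is
    below average get reinforced, so exploration gradually concentrates
    on the targets that are easiest for the network to fit."""
    global sel_logits
    p = softmax(sel_logits)
    k = rng.choice(len(p), p=p)
    advantage = losses.mean() - losses[k]   # lower loss -> positive signal
    grad = -p
    grad[k] += 1.0                          # d log p[k] / d logits
    sel_logits = sel_logits + lr * advantage * grad
    return k

# Suppose candidate 1 consistently yields the lowest prediction loss.
for _ in range(200):
    select_and_update(np.array([2.0, 0.2, 1.5]))
# The selector now concentrates its probability mass on candidate 1.
```

Unlike the greedy MINLOSS pick, sampling from a learned distribution lets the selector keep exploring other candidates early in training before committing.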

2. BACKGROUND AND RELATED WORK

Related ML Models: There are a few learning scenarios within weak supervision that may appear similar to the setting of 1oML but are in fact different from it. We first discuss them briefly. 'Partial Label Learning' (PLL) (Jin & Ghahramani, 2002; Cour et al., 2011; Xu et al., 2019; Feng & An, 2019; Cabannes et al., 2020) involves learning from training data where, for each input, a noisy set of candidate labels is given, amongst which only one label is correct. This is different from 1oML, in which there is no training noise and all the solutions in the solution set Y_x for a given x are correct. Though some of the recent approaches to tackling ambiguity in PLL (Cabannes et al., 2020) may resemble our methods, i.e., MINLOSS, in the way they decide which solution in the target set should be picked next for training, the motivations are quite different. Similarly, in the older work by Jin & Ghahramani (2002), the EM model, where the loss for each candidate is weighted by the probability assigned to that candidate by the model itself, can be seen as a naïve exploration-based approach applied to a very different setting. In PLL, the objective is to select the correct label out of many incorrect ones to reduce training noise, whereas in 1oML, selecting only one label for training provably improves learnability, and there is no question of reducing noise as all the labels are correct. Further, most of the previous work on PLL considers classification over a discrete output space with, say, L labels, whereas in 1oML, we work with structured output spaces, e.g., an r-dimensional vector space where each dimension represents a discrete space of L labels. This



¹ Available at https://data.dgl.ai/models/rrn-sudoku.pkl

