ABDUCTIVE KNOWLEDGE INDUCTION FROM RAW DATA

Abstract

For many reasoning-heavy tasks, it is challenging to find an appropriate end-to-end differentiable approximation to domain-specific inference mechanisms. Neural-Symbolic (NeSy) AI divides the end-to-end pipeline into neural perception and symbolic reasoning, which can directly exploit general domain knowledge such as algorithms and logic rules. However, it suffers from the exponential computational complexity caused by the interface between the two components, where the neural model lacks direct supervision and the symbolic model lacks accurate input facts. As a result, existing methods usually focus on learning the neural model with a sound and complete symbolic knowledge base, avoiding a crucial problem: where does the knowledge come from? In this paper, we present Abductive Meta-Interpretive Learning (MetaAbd), which unites abduction and induction to learn perceptual neural networks and first-order logic theories simultaneously from raw data. Given the same amount of domain knowledge, we demonstrate that MetaAbd not only outperforms the compared end-to-end models in predictive accuracy and data efficiency but also induces logic programs that can be reused as background knowledge in subsequent learning tasks. To the best of our knowledge, MetaAbd is the first system that can jointly learn neural networks and recursive first-order logic theories with predicate invention.

1. INTRODUCTION

Inductive bias, i.e., background knowledge, is an essential component of machine learning. Despite the success of data-driven end-to-end deep learning in many traditional machine learning tasks, it has been shown that incorporating domain knowledge is still necessary for some complex learning problems (Dhingra et al., 2020; Grover et al., 2019; Trask et al., 2018). In order to leverage complex domain knowledge that is discrete and relational, end-to-end learning systems need to represent it with a differentiable module that can be embedded in the deep learning context. For example, graph neural networks (GNN) use relational graphs as an external knowledge base (Zhou et al., 2018); some works even consider more specific domain knowledge such as differentiable primitive programs (Gaunt et al., 2017). However, the design of these modules is usually ad hoc, and sometimes it is not easy to find an appropriate approximation suited for single-model end-to-end learning (Glasmachers, 2017; Garcez et al., 2019). Therefore, many researchers propose to break the end-to-end learning pipeline apart and build a hybrid model consisting of smaller modules, each of which accounts for one specific function (Glasmachers, 2017). A representative branch in this line of research is Neural-Symbolic (NeSy) AI (De Raedt et al., 2020; Garcez et al., 2019), which aims to bridge System 1 and System 2 AI (Kahneman, 2011; Bengio, 2017), i.e., neural-network-based machine learning and symbol-based relational inference. In NeSy models, the neural network extracts high-level symbols from noisy raw data and the symbolic model performs relational inference over the extracted symbols. However, the non-differentiable interface between the neural and symbolic systems (i.e., the facts extracted from raw data and their truth values) leads to high computational complexity in learning.
For example, due to the lack of direct supervision for the neural network and of reliable inputs for the symbolic model, some works have to use Markov Chain Monte Carlo (MCMC) sampling or zero-order optimisation to train the model (Li et al., 2020; Dai et al., 2019), which can be inefficient in practice. Consequently, almost all hybrid models assume the existence of a strong predefined domain knowledge base and focus on using it to train neural networks. This limits the expressive power of the hybrid-structured model and sacrifices many benefits of symbolic learning (e.g., predicate invention, learning recursive theories, and re-using learned models as background knowledge). In this paper, we integrate neural networks with Inductive Logic Programming (ILP) (Muggleton & de Raedt, 1994), a general framework for symbolic machine learning, to enable first-order logic theory induction from raw data. More specifically, we present Abductive Meta-Interpretive Learning (MetaAbd), which extends the Abductive Learning (ABL) framework (Dai et al., 2019; Zhou, 2019) by combining logical induction and abduction (Flach et al., 2000) with neural networks in Meta-Interpretive Learning (MIL) (Muggleton et al., 2014). MetaAbd employs neural networks to extract probabilistic logic facts from raw data, and induces an abductive logic program (Kakas et al., 1992) that can efficiently infer possible truth values of the probabilistic facts to train the neural model. On the one hand, the abductive logic program learned by MetaAbd can largely prune the search space of truth value assignments to the logical facts extracted by an under-trained neural model. On the other hand, the extracted probabilistic facts, although noisy, provide a distribution on the possible worlds (Nilsson, 1986) reflecting the raw data distribution, which helps logical induction to identify the most probable hypothesis.
The two systems in MetaAbd are integrated by a probabilistic model that can be optimised with Expectation Maximisation (EM). To the best of our knowledge, MetaAbd is the first system that can simultaneously (1) train neural models, (2) learn recursive logic theories and (3) perform predicate invention from domains with sub-symbolic representations. In the experiments, we compare MetaAbd to state-of-the-art end-to-end deep learning models on two complex learning tasks. The results show that, given the same amount of background knowledge, MetaAbd outperforms the end-to-end models significantly in predictive accuracy and data efficiency, and learns human-interpretable models that can be re-used in subsequent learning tasks.

2. RELATED WORK

Solving "System 2" problems requires relational and logical reasoning rather than "intuitive and unconscious thinking" (Kahneman, 2011; Bengio, 2017). Due to the complexity of this type of task, many researchers have tried to embed intricate background knowledge in end-to-end deep learning models. For example, Trask et al. (2018) propose the differentiable Neural Arithmetic Logic Units (NALU) to model basic arithmetic functions (e.g., addition, multiplication, etc.) in neural cells; Grover et al. (2019) encode permutation operators with a stochastic matrix and present a continuous and differentiable approximation to the sort operation; Wang et al. (2019) introduce a differentiable SAT solver to enable gradient-based constraint solving. However, most of these specially designed differentiable modules are ad hoc approximations to the original inference mechanisms, which cannot represent the inductive bias in a general form such as formal languages. In order to directly exploit complex background knowledge expressed by formal languages, Statistical Relational AI (StarAI) and Neural-Symbolic (NeSy) AI (De Raedt et al., 2020; Garcez et al., 2019) have been proposed. Some works try to approximate logical inference with continuous functions or use probabilistic logical inference to enable end-to-end training (Cohen et al., 2020; Manhaeve et al., 2018; Donadello et al., 2017); others combine neural networks and pure symbolic reasoning by performing a combinatorial search over the truth values of the output facts of the neural model (Li et al., 2020; Dai et al., 2019). Because of the highly complex statistical relational inference and combinatorial search, it is difficult for these methods to learn first-order logic theories. Therefore, most existing StarAI and NeSy systems focus on utilising a pre-defined symbolic knowledge base to help the parameter learning of the neural and probabilistic models.
One way to learn symbolic models is Inductive Logic Programming (ILP) (Muggleton & de Raedt, 1994). Some early work on combining logical abduction and induction can learn logic theories even when the input data is incomplete (Flach et al., 2000). Recently, ∂ILP was proposed for learning first-order logic theories from noisy data (Evans & Grefenstette, 2018). However, these works are designed for learning from symbolic domains; otherwise, they need a fully trained neural model to extract primitive facts from raw data before symbolic learning. Machine apperception (Evans et al., 2019) unifies reasoning and perception by combining logical inference and binary neural networks in Answer Set Programming, in which the logic hypotheses and the parameters of the neural networks are all represented by logical groundings, making the system hard to optimise; for problems involving noisy inputs like MNIST images, it still requires a fully pre-trained neural net for pre-processing. Different from previous work, our Abductive Meta-Interpretive Learning (MetaAbd) aims to combine symbolic and sub-symbolic learning in a mutually beneficial way: the induced abductive logic program prunes the combinatorial search over the unknown labels for training the neural model, and the probabilistic facts output by the neural model provide a distribution on the possible worlds of the symbolic domain to help logic theory induction.

3.1. PROBLEM FORMULATION

A typical hybrid model bridging sub-symbolic and symbolic learning contains two major parts: a perception model and a reasoning model (Dai et al., 2019). The perception model maps raw inputs x ∈ X, which are usually noisy and represented by sub-symbolic features, to some primitive symbols z ∈ Z, such as digits, objects, ground logical expressions, etc. The reasoning model takes the interpreted z as input and deduces the final output y ∈ Y according to a symbolic background knowledge base B. Because the primitive symbols z are uncertain and not observable from either the training data or the background knowledge, we call them pseudo-labels of x. The perception model is parameterised by θ and outputs the conditional probability P_θ(z|x) = P(z|x, θ); the reasoning model H ∈ H is a set of first-order logical clauses such that B ∪ H ∪ z ⊨ y, where "⊨" means "logically entails". Our target is to learn θ and H simultaneously from training data D = {⟨x_i, y_i⟩}_{i=1}^{n}. For example, if we have one example with x = [ , , ] and y = 6, given background knowledge about adding two numbers, the hybrid model should learn a perception model that recognises z = [1, 2, 3] and induce a program to add all numbers in z recursively. Assuming that D is an i.i.d. sample from the underlying distribution of (x, y), the objective of our learning problem can be represented as follows:

(H*, θ*) = argmax_{H,θ} ∏_{⟨x,y⟩∈D} ∑_{z∈Z} P(y, z | B, x, H, θ),   (1)

where z is a hidden variable in this model. Theoretically, this problem can be solved by the Expectation Maximisation (EM) algorithm. However, even if we can obtain the expectation of the hidden variable z and efficiently estimate the perception model's parameter θ with numerical optimisation, the hypothesis H, which is a first-order logic theory, is still difficult to optimise together with θ.
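As a concrete illustration of this objective, the following sketch (our own toy example, not the paper's implementation) marginalises over z for a single example with three digit images; `p_digit` is a made-up stand-in for the perception model P_θ(z|x), and the two lambdas stand in for candidate hypotheses H.

```python
from itertools import product

# Made-up perception output P_theta(z_i | x_i) for three images over digits 0..9:
# image i is most likely digit i+1 (probability 0.7), the rest uniform.
def p_digit(i, d):
    return 0.7 if d == i + 1 else 0.3 / 9

def likelihood(y, hypothesis):
    """Sum over z of P(y, z | B, x, H, theta) for one example <x, y>."""
    total = 0.0
    for z in product(range(10), repeat=3):
        if hypothesis(z) == y:  # P(y | B, H, z) = 1 iff B, H and z entail y
            total += p_digit(0, z[0]) * p_digit(1, z[1]) * p_digit(2, z[2])
    return total

cumulative_sum = lambda z: z[0] + z[1] + z[2]
cumulative_prod = lambda z: z[0] * z[1] * z[2]
```

Under this toy perception model the cumulative-sum hypothesis accumulates more probability mass on y = 6 than the product hypothesis, mirroring how the marginal likelihood discriminates between candidate programs.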
We propose to solve this problem by treating H, like z, as an extra hidden variable, which gives us:

θ* = argmax_θ ∏_{⟨x,y⟩∈D} ∑_{H∈H} ∑_{z∈Z} P(y, H, z | B, x, θ).   (2)

The hybrid-model learning problem in Equation 1 can thus be split into two EM steps: (1) Expectation: obtain the expected value of H and z by sampling them in their discrete hypothesis space from (H, z) ∼ P(H, z|B, x, y, θ); (2) Maximisation: estimate θ by maximising the likelihood of the training data with efficient numerical optimisation approaches such as gradient descent. As one can imagine, the main challenge is to estimate the expectation of the hidden variables H ∪ z, i.e., we need to search for the most probable H and z given the θ learned in the previous iteration.
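The two steps above can be sketched in code. The instantiation below is deliberately tiny and hypothetical: `theta` is a per-image categorical table rather than a neural network, the hypothesis "space" is a fixed two-element dictionary, and the M-step is a crude point update rather than gradient descent.

```python
import itertools

# Toy EM instantiation for the cumulative-sum setting (all names are ours).
# "Images" are ids; theta maps each id to a distribution over digits 0..9.
HYPOTHESES = {
    "sum": lambda z: sum(z),
    "prod": lambda z: z[0] * z[1] * z[2],  # assumes length-3 inputs in this toy
}

def e_step(theta, x, y):
    """Most probable (H, z) consistent with y under the current theta."""
    best, best_score = None, -1.0
    for name, prog in HYPOTHESES.items():
        for z in itertools.product(range(10), repeat=len(x)):
            if prog(z) != y:
                continue                 # keep only z with B ∪ H ∪ z |= y
            score = 1.0
            for img, d in zip(x, z):
                score *= theta[img][d]   # P_theta(z | x)
            if score > best_score:
                best, best_score = (name, z), score
    return best

def m_step(theta, assignments):
    """Refit theta to the abduced pseudo-labels (a crude point update here)."""
    for img, d in assignments:
        theta[img] = [0.9 if i == d else 0.1 / 9 for i in range(10)]
    return theta
```

In the real system the E-step is performed by the logic-side sampling and abduction described in Section 3.2, and the M-step trains the perceptual network on the abduced pseudo-labels.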

3.2. EFFICIENT HYPOTHESIS SAMPLING BY COMBINING ABDUCTION AND INDUCTION

Inspired by early works on abductive logic programming (Flach et al., 2000), we propose to solve the above challenges by combining logical induction and abduction. The induction learns an abductive logic theory H based on P_θ(z|x), and abduction with H reduces the search space of z. Abduction refers to the process of selectively inferring specific grounded facts and hypotheses that give the best explanation of observations, based on background knowledge of a deductive theory. For example, if we know that H is a cumulative sum program and observe that x = [ , , ] and y = 6, then we can abduce that x must satisfy the constraint Z1+Z2+Z3=6, where [Z1, Z2, Z3] = z are the pseudo-labels of the images in x. This constraint can largely prune the search space of z, from which all assignments with Zi > 6 can be excluded. If the current perception model assigns very high probabilities to Z1 = 2 and Z2 = 3, one can easily infer that Z3 = 1 even when the perception model has relatively low confidence about it, as this is the only solution that satisfies the constraint. An illustrative example of combining abduction and induction is shown in Figure 1. Briefly speaking, instead of directly sampling the pseudo-labels z and H together from the huge joint hypothesis space, our Abductive Meta-Interpretive Learning approach only samples the abductive logic program H, and then uses the abduced relational constraints to prune the search space of z. Meanwhile, the perception model outputs the likelihood of the pseudo-labels P_θ(z|x), which defines a distribution over all possible values of z and helps to find the most probable H ∪ z.
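The pruning argument above can be sketched as follows (all probabilities are made up for illustration): the abduced constraint Z1 + Z2 + Z3 = 6 rules out most assignments, and a confident perception of Z1 and Z2 then pins down Z3.

```python
import itertools

def abduce_sum(y, p_digits):
    """Enumerate digit assignments consistent with sum(z) = y, scored by P_theta(z|x)."""
    consistent = []
    for z in itertools.product(range(10), repeat=len(p_digits)):
        if sum(z) == y:  # keep only assignments satisfying the abduced constraint
            prob = 1.0
            for p, d in zip(p_digits, z):
                prob *= p[d]
            consistent.append((prob, z))
    return sorted(consistent, reverse=True)

# Perception is confident about the first two digits (hypothetical numbers)
# and completely unsure about the third:
p1 = [0.01] * 10; p1[2] = 0.91   # Z1 is almost certainly 2
p2 = [0.01] * 10; p2[3] = 0.91   # Z2 is almost certainly 3
p3 = [0.1] * 10                  # Z3 is anyone's guess
candidates = abduce_sum(6, [p1, p2, p3])
```

The top-ranked candidate fixes Z3 = 1: the only value satisfying 2 + 3 + Z3 = 6, even though the perception model is agnostic about the third image.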
Formally, for each example ⟨x, y⟩, we rewrite the likelihood in Equation 2 as follows:

P(y, H, z | B, x, θ) = P(y, H | B, z) P_θ(z|x)
                     = P(y | B, H, z) P(H | B, z) P_θ(z|x)
                     = P(y | B, H, z) P_{σ*}(H | B) P_θ(z|x),   (3)

where P_{σ*}(H|B) is the Bayesian prior distribution on first-order logic hypotheses, defined by the transitive closure of stochastic refinements σ* given the background knowledge B (Muggleton et al., 2013), where a refinement is a unit modification (e.g., adding/removing a clause or literal) to a logic theory. The equations hold because: (1) the pseudo-label z is conditioned on x and θ since it is the output of the perception model; (2) H follows the prior distribution, so it only depends on B; (3) y ∪ H is independent of x given z because the relations among B, H, y and z are determined by pure logical inference, where:

P(y | B, H, z) = 1 if B ∪ H ∪ z ⊨ y, and 0 otherwise.   (4)

Following Bayes' rule, we have P(H, z | B, x, y, θ) ∝ P(y, H, z | B, x, θ). Therefore, we can sample the most probable H ∪ z in the expectation step according to Equation 3 as follows:

1. Sample an abductive first-order logic hypothesis H ∼ P_{σ*}(H | B);
2. Use H ∪ B and y to abduce possible pseudo-labels z, which are guaranteed to satisfy H ∪ B ∪ z ⊨ y, excluding the values of z such that P(y | B, H, z) = 0;
3. According to Equations 3 and 4, for each sampled H ∪ z calculate its score:
   score(H, z) = P_{σ*}(H|B) P_θ(z|x);   (5)
4. Return the H ∪ z with the highest score and continue with the maximisation step.

By learning an abductive logic theory H, the search space of the pseudo-labels z can be largely pruned thanks to the sparsity of the distribution structured by B ∪ H ∪ z ⊨ y.

3.3. THE MetaAbd IMPLEMENTATION

We implement the above abduction-induction algorithm as Abductive Meta-Interpretive Learning (MetaAbd), whose code is shown in Figure 2. It extends the general meta-interpreter of MIL (Muggleton et al., 2014) by including an abduction procedure (bold fonts in Figure 2) that can abduce relational constraints on the pseudo-labels z for pruning the search space. Meta-Interpretive Learning (MIL) is a form of ILP (Muggleton & de Raedt, 1994). It learns first-order logic programs with a second-order meta-interpreter, which is composed of a definite first-order background knowledge B and meta-rules M. B contains the primitive predicates for constructing first-order hypotheses H; M contains second-order clauses with existentially quantified predicate variables and universally quantified first-order variables that shape the structure of the hypothesis space H. Briefly speaking, MIL attempts to prove the training examples and saves the resulting programs for successful proofs. However, MIL can only learn first-order logic programs from pure symbolic domains, where the examples are deterministic and noise-free. By combining abduction and induction, MetaAbd can learn abductive logic programs from noisy domains where the distribution on possible worlds (Nilsson, 1986) is given by a set of probabilistic facts. A possible world is a truth value assignment to the probabilistic logic facts. For the example in Figure 1, each combination of the possible pseudo-labels of the three input images forms a possible world, whose probability distribution is defined by the probability values output by the perceptual neural net.
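To give a flavour of MIL's search, the sketch below (a drastically simplified, hypothetical rendering, not the actual second-order meta-interpreter) instantiates a single chain meta-rule P(A,B) ← Q(A,C), R(C,B) with primitive predicates represented as Python functions, keeping the instantiation that proves a training example:

```python
# Primitives stand in for the predicates in B; names are ours.
PRIMITIVES = {
    "head": lambda xs: xs[0],
    "tail": lambda xs: xs[1:],
    "sum_all": lambda xs: sum(xs),
}

def learn_chain(example_in, example_out):
    """Find (q, r) instantiating P(A,B) :- Q(A,C), R(C,B) that proves the example."""
    for q_name, q in PRIMITIVES.items():
        for r_name, r in PRIMITIVES.items():
            try:
                if r(q(example_in)) == example_out:
                    return (q_name, r_name)
            except (TypeError, IndexError):
                continue  # ill-typed composition: this instantiation fails
    return None
```

The real meta-interpreter searches over several meta-rules, invents predicates, and backtracks through Prolog proofs; this sketch only conveys the "enumerate instantiations, keep successful proofs" idea.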
As shown in the figure, given the abducible primitives as background knowledge, MetaAbd can construct the hypotheses H while abducing the relational constraints on z. After an abductive hypothesis H has been sampled, the search for z is done by logical abduction. Finally, the score of H ∪ z is calculated by Equation 5, where P_θ(z|x) is the output of the perception model, which in this work is implemented with a neural network ϕ_θ that outputs P_θ(z|x) = softmax(ϕ_θ(x, z)). Meanwhile, we define the prior distribution on H following Hocquette & Muggleton (2018): P_{σ*}(H|B) = 6/(π · c(H))², where c(H) is the complexity of the learned program, e.g., the size of H.
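The prior above can be checked numerically; the snippet below (ours) implements P_{σ*}(H|B) = 6/(π · c(H))² and verifies that it is a proper distribution over program sizes c(H) = 1, 2, 3, …:

```python
import math

# Size prior on hypotheses: P(H) = 6 / (pi * c(H))^2, i.e. proportional to
# 1 / c(H)^2 with normaliser 6 / pi^2, since the sum of 1/c^2 is pi^2 / 6.
def size_prior(c):
    return 6.0 / (math.pi * c) ** 2

# e.g. size_prior(1) is about 0.608 and size_prior(2) about 0.152:
# simpler programs receive most of the prior mass.
```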

4. EXPERIMENTS

This section describes experiments which apply MetaAbd to learn first-order logic programs from images of handwritten digits in two scenarios: (1) cumulative sum/product and (2) sorting. The experiments aim to answer the following two questions: 1. Can the abduction-induction strategy of MetaAbd learn first-order logic programs and train perceptual neural networks jointly? 2. Given the same type and amount of background knowledge (shown in Table 1), is hybrid modelling, which directly leverages the background knowledge in symbolic form, better than end-to-end learning?

4.1. LEARNING CUMULATIVE SUM AND PRODUCT FROM IMAGES

Materials Following the setting of Trask et al. (2018), the inputs of the two tasks are sequences of randomly chosen MNIST digits; the numerical outputs are the sum and product of the digits, respectively. The lengths of the training sequences are 2-5. To verify whether the learned models can extrapolate to longer inputs, we also include test examples with length 10 (both tasks), 15 (in the cumulative product task) and 100 (in the cumulative sum task). In the cumulative product experiments, when the randomly generated sequence is long enough, it is very likely to contain a 0, making the final output equal to 0, so the extrapolation examples with length 15 only contain digits from 1 to 9. The dataset contains 3,000 and 1,000 examples for training and validation, respectively; the test data of each length has 10,000 examples. Since the end-to-end models usually require more training data due to their model complexity, we also ran experiments with 10,000 training examples for them.

Methods We compare MetaAbd with four end-to-end learning baselines: RNN, LSTM, and LSTMs attached to Neural Accumulators (NAC) and Neural Arithmetic Logic Units (NALU) (Trask et al., 2018). The performance is measured by classification accuracy (Acc.) on length-one inputs, mean absolute error (MAE) in the sum task, and mean absolute error on the logarithm (log MAE) of the outputs in the product task, whose error grows exponentially with sequence length. A convnet processes the input images for the recurrent networks, as described by Trask et al. (2018); it also serves as the perception model of MetaAbd to output the probabilistic facts. As shown in Table 1, all models are aware of the same amount of background knowledge: the end-to-end models use LSTM or RNN to handle recurring inputs and use NACs and NALUs to encode basic arithmetic functions, while MetaAbd can exploit them explicitly as primitive predicates in the Prolog language.
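The two error metrics can be written down directly; the helpers below are our own minimal definitions (the exact reduction details are assumptions):

```python
import math

def mae(preds, targets):
    """Mean absolute error, used for the cumulative-sum task."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)

def log_mae(preds, targets):
    """MAE on the logarithm of the outputs, used for the cumulative-product
    task, whose raw error grows exponentially with sequence length."""
    return sum(abs(math.log(p) - math.log(t))
               for p, t in zip(preds, targets)) / len(preds)
```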
Note that MetaAbd uses the same background knowledge for both sum and product tasks. Each experiment is carried out five times, and the average of the results is reported.

Results Our experimental results are shown in Table 2; the learned first-order logic theories are shown in Figure 3a. End-to-end models that do not exploit any background knowledge (LSTM and RNN) perform worst on these tasks. For NALU and NAC, even though they can exploit background knowledge through their specially designed differentiable neural modules, their performance is still significantly worse than MetaAbd's given the same amount of training data or even more. Although MetaAbd achieves the best result among the compared methods, we observe that its EM learning sometimes converges to saddle points or local optima in the cumulative sum task. This phenomenon happens less in the other task because the distribution P(H, z|B, x, y, θ) for learning the cumulative product function is much sparser than for the cumulative sum. Therefore, we also carry out extra experiments with 1-shot pre-trained convnets, which are trained by randomly sampling one example from each class of MNIST data. Although the pre-trained convnet is weak (Acc. 20-35%), it provides a good initialisation for the EM algorithm and improves the learning performance.

4.2. LEARNING TO SORT MNIST IMAGES

Methods In this task, we compare MetaAbd to NeuralSort (Grover et al., 2019), which implements a differentiable relaxation of the sorting operator. Given an input list of scalars, it generates a stochastic permutation matrix by applying the pre-defined deterministic or stochastic sort operator to the inputs, i.e., NeuralSort can be regarded as a differentiable approximation to bogosort (permutation sort).
Although it would be easy for MetaAbd to include stronger background knowledge for learning more efficient sorting algorithms such as quicksort (Cropper & Muggleton, 2019), to make a fair comparison we adapt the same background knowledge as NeuralSort into logic rules and learn bogosort. We did not compare to other baselines such as LSTM/RNN with different activation layers because they are weaker than NeuralSort on this task (Grover et al., 2019). The background knowledge of permutation in MetaAbd is implemented with Prolog's built-in predicate permutation. Meanwhile, instead of providing information about sorting as prior knowledge as NeuralSort does, we try to learn the concept of "sorted" (represented by a monadic predicate s) from data as a sub-task, whose training set is the subset of sorted sequences within the training dataset (< 20 examples). To do this, MetaAbd uses an MLP attached to the same untrained convnet as in the previous experiments to produce dyadic probabilistic facts nn_pred([ , | ]), which indicate whether the first two items in the image sequence satisfy a dyadic relation. Please note that the attached MLP is given no supervision on nn_pred about whether it should learn "greater than" or "less than". Moreover, we do not provide any prior knowledge about total ordering, so nn_pred only learns a dyadic partial order among the MNIST images. As we can see, the background knowledge used by MetaAbd is much weaker than that used by NeuralSort. The sorting task and its sub-task are trained sequentially: in our experiments, the first five epochs of MetaAbd learn the sub-task, and then it re-uses the learned models to learn bogosort.
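A rough sketch of this setup (all names and numbers hypothetical): a pairwise predicate `nn_pred` plays the role of the MLP's dyadic facts, and a bogosort-style search returns the permutation whose adjacent pairs the predicate accepts. In the real system the predicate is a trained network and the search uses Prolog's permutation; here a hand-coded comparator stands in:

```python
import itertools

def nn_pred(a, b):
    # Stand-in for the neural net's probability that a should precede b.
    return 0.9 if a <= b else 0.1

def bogosort(xs, threshold=0.5):
    """Return the first permutation all of whose adjacent pairs nn_pred accepts."""
    for perm in itertools.permutations(xs):
        if all(nn_pred(a, b) > threshold for a, b in zip(perm, perm[1:])):
            return list(perm)
    return None
```

Note that `nn_pred` only needs to be a consistent pairwise (partial) order for this search to recover the sorted sequence; no global notion of "sorted" is supplied, matching the weak supervision described above.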
Results Table 3 shows the average accuracy of the compared methods on the sorting tasks. The learned program of s and the dyadic neural net nn_pred are both successfully re-used in the sorting task, where the learned program of s is consulted as interpreted background knowledge (Cropper et al., 2020), and the neural network that generates the probabilistic facts of nn_pred is directly re-used and continuously trained during the learning of sorting. This experiment also demonstrates MetaAbd's ability to learn recursive logic programs and perform predicate invention (the invented predicate s_1 in Figure 3a).

5. CONCLUSION

In this paper, we present the Abductive Meta-Interpretive Learning (MetaAbd) approach, which can train neural networks and learn recursive first-order logic theories with predicate invention simultaneously. By combining symbolic learning with neural networks, MetaAbd can learn human-interpretable models directly from raw data, and the learned neural models and logic theories can be directly re-used in subsequent learning tasks. MetaAbd is a general framework for combining sub-symbolic perception with logical induction and abduction: the perception model extracts probabilistic facts from sub-symbolic data; the logical induction searches for first-order abductive theories in a relatively small hypothesis space; the logical abduction uses the abductive theory to prune the vast search space of the truth values of the probabilistic facts. The three parts are optimised together in a well-defined probabilistic model. In future work, we would like to apply MetaAbd to more complicated tasks that involve sub-symbolic perception and symbolic induction, such as reinforcement learning. Instead of approximating logical inference with continuous and differentiable functions, MetaAbd uses pure logical inference for reasoning, so it is possible to leverage more advanced symbolic inference/optimisation techniques such as Satisfiability Modulo Theories (SMT) (Barrett & Tinelli, 2018) and Answer Set Programming (ASP) (Lifschitz, 2019), which can perform large-scale inference efficiently.



CLP(Z) is a constraint logic programming package accessible at https://github.com/triska/clpz; for more implementation details, please refer to the Appendix.
The abduction can be naturally accelerated by parallel computing; more details are in the Appendix.
We use the implementation of NAC and NALU from https://github.com/kevinzakka/NALU-pytorch.
We use the implementation of NeuralSort from https://github.com/ermongroup/neuralsort.



This search problem is nontrivial. Sampling the values of the hidden variable z results in a search space growing exponentially with the number of training examples. Even when B is sound and complete, existing hybrid models that do not learn first-order hypotheses still have to use Zero-Order Optimisation (ZOOpt) or Markov Chain Monte Carlo (MCMC) sampling to estimate the expectation of z (Dai et al., 2019; Li et al., 2020), which could be quite inefficient in practice. Furthermore, the size and structure of the hypothesis space H of first-order logic programs makes the search problem even more complicated. For example, given x = [ , , ] and y = 6, when the perception model is accurate enough to output the most probable z = [1, 2, 3], we have at least two choices for H: cumulative sum or cumulative product. When the perception model is under-trained and outputs the most probable z = [2, 2, 3], then H could be a program that only multiplies the last two digits. Hence, H and z are entangled and cannot be treated independently.

Example ⟨x, y⟩: f([ , , ], 15).

Abducible primitives (B):
add([A,B|T], [C|T]) :- C #= A+B.
mult([A,B|T], [C|T]) :- C #= A*B.
eq([A|_], B) :- A #= B.
head([H|_], H).
tail([_|T], T).

Neural probabilistic facts (P_θ(z|x)):
nn( = 0, 0.02). nn( = 1, 0.39). ...
nn( = 0, 0.09). nn( = 1, 0.02). ...
nn( = 0, 0.07). nn( = 1, 0.00). ...

Pseudo-labels (z):

Abductive hypotheses (H):
f(A,B) :- add(A,B).
f(A,B) :- mult(A,B).
f(A,B) :- add(A,C), eq(C,B). ...
f(A,B) :- add(A,C), f(C,B).
f(A,B) :- eq(A,B). ...
f(A,B) :- tail(A,C), f_1(C,B). f_1(A,B) :- mult(A,C), eq(C,B). ...
f(A,B) :- mult(A,C), f_1(C,B). f_1(A,B) :- mult(A,C), eq(C,B). ...

Figure 1: Example of MetaAbd's abduction-induction learning. Given training examples, background knowledge of abducible primitives and probabilistic facts generated by a perceptual neural net, MetaAbd learns an abductive logic program H and abduces relational constraints (implemented with the CLP(Z) predicate "#=") over the input images; it then uses them to efficiently prune the search space of the most probable pseudo-labels z (in grey blocks) for training the neural network.
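For readers less familiar with Prolog, the abducible primitives and the induced recursive sum program from Figure 1 can be mimicked in Python (our illustrative rendering; the real system operates on CLP(Z) constraints over unknown digits, not ground integers):

```python
def add(xs):    # add([A,B|T], [C|T]) :- C #= A+B.
    return [xs[0] + xs[1]] + xs[2:]

def mult(xs):   # mult([A,B|T], [C|T]) :- C #= A*B.
    return [xs[0] * xs[1]] + xs[2:]

def f_sum(xs):  # f(A,B) :- eq(A,B).  /  f(A,B) :- add(A,C), f(C,B).
    return xs[0] if len(xs) == 1 else f_sum(add(xs))
```

Each primitive rewrites the front of the list, and the recursive clause folds the whole sequence down to a single value, which is exactly how the induced program entails an output like 15 from three digit images.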

Figure 3: Learned programs and the time efficiency of abduction.

Figure 3b shows the time efficiency of MetaAbd's abduction-induction strategy on one batch of examples in the cumulative sum task. "z → H" means first sampling pseudo-labels z and then learning H with ILP; "H → z" means first sampling an abductive hypothesis H and then using H to abduce z. The x-axis unit is the average number of Prolog inferences; the number at the end of each bar is the average inference time in seconds. Evidently, the abduction leads to a substantial improvement in the number of Prolog inferences and significantly reduces the search complexity.
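The gap between the two strategies can be illustrated with a quick count (our back-of-envelope sketch): for three digit images, enumerating all pseudo-labels visits 10^3 assignments, while abducing under a constraint such as Z1 + Z2 + Z3 = 6 leaves only a small fraction of them.

```python
import itertools

# All pseudo-label assignments for three digits vs. those surviving the
# abduced sum constraint.
full = list(itertools.product(range(10), repeat=3))
consistent = [z for z in full if sum(z) == 6]
```

Only 28 of the 1,000 assignments survive here; the saving compounds with sequence length and with CLP(Z) constraint propagation inside the Prolog search.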

Figure 3a shows the learned programs. The performance is measured by the average proportion of correct permutations and individual permutations, following Grover et al. (2019). Although using weaker background knowledge, MetaAbd performs significantly better than NeuralSort in both the interpolation (length 5) and extrapolation (length 3 & 7) experiments.

Figure 2: Prolog code for MetaAbd. It recursively proves a series of atomic goals in three ways: (1) deducing them from background knowledge; (2) abducing a possible grounded expression (e.g., a relational constraint) to satisfy them (bold fonts); (3) matching them against the heads of meta-rules to form an augmented program, or proving them with the current program. Finally, the abduced groundings Abd are used to search for the best pseudo-labels z; the probability of Abd is used to calculate the score function in Equation 5.

Table 1: Domain knowledge used by the compared models.

Task         | End-to-end models                               | MetaAbd
Sum/product  | NAC/NALU (Trask et al., 2018)                   | Predicates add, mult and eq
Permutation  | Permutation matrix P_sort (Grover et al., 2019) | Prolog's permutation
Sorting      | sort operator (Grover et al., 2019)             | Predicate s (learned from sub-task)

Table 2: Results on the MNIST cumulative sum/product tasks.

Table 3: Average accuracy on the bogosort task. The first value is the rate of correct permutations; the second value is the rate of correct individual element ranks.

