ABDUCTIVE KNOWLEDGE INDUCTION FROM RAW DATA

Abstract

For many reasoning-heavy tasks, it is challenging to find an appropriate end-to-end differentiable approximation to domain-specific inference mechanisms. Neural-Symbolic (NeSy) AI divides the end-to-end pipeline into neural perception and symbolic reasoning, which can directly exploit general domain knowledge such as algorithms and logic rules. However, it suffers from the exponential computational complexity caused by the interface between the two components, where the neural model lacks direct supervision and the symbolic model lacks accurate input facts. As a result, existing approaches usually focus on learning the neural model with a sound and complete symbolic knowledge base while avoiding a crucial problem: where does the knowledge come from? In this paper, we present Abductive Meta-Interpretive Learning (MetaAbd), which unites abduction and induction to learn perceptual neural networks and first-order logic theories simultaneously from raw data. Given the same amount of domain knowledge, we demonstrate that MetaAbd not only outperforms the compared end-to-end models in predictive accuracy and data efficiency but also induces logic programs that can be reused as background knowledge in subsequent learning tasks. To the best of our knowledge, MetaAbd is the first system that can jointly learn neural networks and recursive first-order logic theories with predicate invention.

1. INTRODUCTION

Inductive bias, i.e., background knowledge, is an essential component in machine learning. Despite the success of data-driven end-to-end deep learning in many traditional machine learning tasks, it has been shown that incorporating domain knowledge is still necessary for some complex learning problems (Dhingra et al., 2020; Grover et al., 2019; Trask et al., 2018). In order to leverage complex domain knowledge that is discrete and relational, end-to-end learning systems need to represent it with a differentiable module that can be embedded in the deep learning context. For example, graph neural networks (GNNs) use relational graphs as an external knowledge base (Zhou et al., 2018); some works even consider more specific domain knowledge such as differentiable primitive programs (Gaunt et al., 2017). However, the design of these modules is usually ad hoc. Sometimes, it is not easy to find an appropriate approximation that is suited for single-model-based end-to-end learning (Glasmachers, 2017; Garcez et al., 2019). Therefore, many researchers propose to break the end-to-end learning pipeline apart and build a hybrid model that consists of smaller modules, where each module only accounts for one specific function (Glasmachers, 2017). A representative branch in this line of research is Neural-Symbolic (NeSy) AI (De Raedt et al., 2020; Garcez et al., 2019), which aims to bridge System 1 and System 2 AI (Kahneman, 2011; Bengio, 2017), i.e., neural-network-based machine learning and symbolic relational inference. In NeSy models, the neural network extracts high-level symbols from noisy raw data and the symbolic model performs relational inference over the extracted symbols. However, the non-differentiable interface between the neural and symbolic systems (i.e., the facts extracted from raw data and their truth values) leads to high computational complexity in learning.
For example, due to the lack of direct supervision to the neural network and reliable inputs to the symbolic model, some works have to use Markov Chain Monte Carlo (MCMC) sampling or zero-order optimisation to train the model (Li et al., 2020; Dai et al., 2019), which could be inefficient in practice. Consequently, almost all hybrid models assume the existence of a very strong predefined domain knowledge base and focus on using it to train neural networks. This limits the expressive power of the hybrid-structured model and sacrifices many benefits of symbolic learning (e.g., predicate invention, learning recursive theories, and re-using learned models as background knowledge). In this paper, we integrate neural networks with Inductive Logic Programming (ILP) (Muggleton & de Raedt, 1994), a general framework for symbolic machine learning, to enable first-order logic theory induction from raw data. More specifically, we present Abductive Meta-Interpretive Learning (MetaAbd), which extends the Abductive Learning (ABL) framework (Dai et al., 2019; Zhou, 2019) by combining logical induction and abduction (Flach et al., 2000) with neural networks in Meta-Interpretive Learning (MIL) (Muggleton et al., 2014). MetaAbd employs neural networks to extract probabilistic logic facts from raw data, and induces an abductive logic program (Kakas et al., 1992) that can efficiently infer possible truth values of the probabilistic facts to train the neural model. On the one hand, the abductive logic program learned by MetaAbd can largely prune the search space of the truth-value assignments to the logical facts extracted by an under-trained neural model. On the other hand, the extracted probabilistic facts, although noisy, provide a distribution on the possible worlds (Nilsson, 1986) reflecting the raw data distribution, which helps logical induction to identify the most probable hypothesis.
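To make the abduction step concrete, consider a toy sum-of-digits task: a perception network outputs a probability vector over digits 0-9 for each image, and the symbolic background knowledge states that the abduced labels must add up to the observed task label. The sketch below (all values and the brute-force enumeration are illustrative; the actual system uses the induced abductive program to prune this search) finds the most probable label assignment consistent with the constraint:

```python
import itertools

def abduce_labels(probs, target_sum):
    """Abduce the most probable digit labels consistent with the
    symbolic constraint that the labels add up to target_sum.

    probs: one probability vector over digits 0-9 per image, as a
           perception network might produce (hypothetical values here).
    Returns the consistent assignment with the highest joint
    probability, or None if no assignment satisfies the constraint.
    """
    best, best_p = None, 0.0
    for labels in itertools.product(range(10), repeat=len(probs)):
        if sum(labels) != target_sum:   # pruned by background knowledge
            continue
        p = 1.0
        for vec, digit in zip(probs, labels):
            p *= vec[digit]
        if p > best_p:
            best, best_p = labels, p
    return best

# Two "images": the network is fairly sure the first is a 3, and torn
# between 5 and 6 for the second; the task label says the sum is 9.
p1 = [0.01] * 10; p1[3] = 0.91
p2 = [0.01] * 10; p2[5] = 0.45; p2[6] = 0.47
print(abduce_labels([p1, p2], 9))  # -> (3, 6)
```

Note how the symbolic constraint resolves the network's uncertainty: although 5 has nearly the same score as 6, only (3, 6) both sums to 9 and has high joint probability, so the abduced labels can supervise the network even before it is well trained.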
The two systems in MetaAbd are integrated by a probabilistic model that can be optimised with Expectation Maximisation (EM). To the best of our knowledge, MetaAbd is the first system that can simultaneously (1) train neural models, (2) learn recursive logic theories and (3) perform predicate invention from domains with sub-symbolic representation. In the experiments we compare MetaAbd to state-of-the-art end-to-end deep learning models on two complex learning tasks. The results show that, given the same amount of background knowledge, MetaAbd outperforms the end-to-end models significantly in terms of predictive accuracy and data efficiency, and learns human-interpretable models that can be re-used in subsequent learning tasks.
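The EM-style integration can be sketched as an alternation between abducing pseudo-labels under the current perception model (E-step) and refitting the model on them (M-step). Everything below is a simplified illustration: `CountModel` is a stand-in counting model, not the neural network, and the enumerative `abduce` stands in for inference with the induced abductive program.

```python
import itertools

def abduce(probs, target):
    """Most probable digit assignment whose sum equals target."""
    best, best_p = None, 0.0
    for labels in itertools.product(range(10), repeat=len(probs)):
        if sum(labels) != target:
            continue
        p = 1.0
        for vec, digit in zip(probs, labels):
            p *= vec[digit]
        if p > best_p:
            best, best_p = labels, p
    return best

class CountModel:
    """Stand-in for the perception network: per-input digit counts
    with Laplace smoothing. Purely illustrative."""
    def __init__(self):
        self.counts = {}
    def predict(self, x):
        c = self.counts.get(x, [0] * 10)
        total = sum(c) + 10
        return [(v + 1) / total for v in c]
    def fit(self, pairs):
        for x, digit in pairs:
            self.counts.setdefault(x, [0] * 10)[digit] += 1

def em_iteration(model, tasks):
    """E-step: abduce pseudo-labels consistent with each task's target
    under the current model; M-step: refit the model on them."""
    pseudo = []
    for xs, target in tasks:
        labels = abduce([model.predict(x) for x in xs], target)
        if labels is not None:
            pseudo.extend(zip(xs, labels))
    model.fit(pseudo)   # M-step: pseudo-labels act as direct supervision
    return pseudo
```

Each abduced pseudo-label set is guaranteed to satisfy the symbolic constraint, so the M-step always trains on logically consistent supervision; iterating sharpens the model's predictions, which in turn makes the abduction more selective.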

2. RELATED WORK

Solving "System 2" problems requires the ability of relational and logical reasoning instead of "intuitive and unconscious thinking" (Kahneman, 2011; Bengio, 2017). Due to the complexity of this type of task, many researchers have tried to embed intricate background knowledge in end-to-end deep learning models. For example, Trask et al. (2018) propose the differentiable Neural Arithmetic Logic Units (NALU) to model basic arithmetic functions (e.g., addition, multiplication, etc.) in neural cells; Grover et al. (2019) encode permutation operators with a stochastic matrix and present a continuous and differentiable approximation to the sort operation; Wang et al. (2019) introduce a differentiable SAT solver to enable gradient-based constraint solving. However, most of these specially designed differentiable modules are ad hoc approximations to the original inference mechanisms, which cannot represent the inductive bias in a general form such as formal languages. In order to directly exploit the complex background knowledge expressed by formal languages, Statistical Relational AI (StarAI) and Neural-Symbolic AI (NeSy) (De Raedt et al., 2020; Garcez et al., 2019) have been proposed. Some works try to approximate logical inference with continuous functions or use probabilistic logical inference to enable end-to-end training (Cohen et al., 2020; Manhaeve et al., 2018; Donadello et al., 2017); others try to combine neural networks and pure symbolic reasoning by performing a combinatorial search over the truth values of the output facts of the neural model (Li et al., 2020; Dai et al., 2019). Because of the highly complex statistical relational inference and combinatorial search, it is difficult for these systems to learn first-order logic theories. Therefore, most existing StarAI and NeSy systems focus on utilising a pre-defined symbolic knowledge base to help the parameter learning of the neural model and probabilistic model.
One way to learn symbolic models is to use Inductive Logic Programming (Muggleton & de Raedt, 1994). Some early work on combining logical abduction and induction can learn logic theories even when the input data is incomplete (Flach et al., 2000). Recently, ∂ILP was proposed for learning first-order logic theories from noisy data (Evans & Grefenstette, 2018). However, these works are designed for learning from symbolic domains; otherwise, they need a fully trained neural model to extract primitive facts from raw data before symbolic learning. Machine apperception (Evans et al., 2019) unifies reasoning and perception by combining logical inference and binary neural networks in Answer Set Programming, in which logic hypotheses and parameters of neural networks are all represented by logical groundings, making the system hard to optimise. For problems involving noisy inputs like MNIST images, it still requires a fully pre-trained neural net for pre-processing.

