NEURAL COMPOSITIONAL RULE LEARNING FOR KNOWLEDGE GRAPH REASONING

Abstract

Learning logical rules is critical to improving reasoning in KGs: rules provide logical, interpretable explanations when used for prediction, and they generalize to other tasks, domains, and data. While recent methods have been proposed to learn logical rules, most are either restricted by their computational complexity and cannot handle the large search space of large-scale KGs, or generalize poorly to data outside the training set. In this paper, we propose an end-to-end neural model for learning compositional logical rules called NCRL. NCRL detects the best compositional structure of a rule body and breaks it into small compositions in order to infer the rule head. By recurrently merging compositions in the rule body with a recurrent attention unit, NCRL finally predicts a single rule head. Experimental results show that NCRL learns high-quality rules and generalizes well. Specifically, we show that NCRL is scalable, efficient, and yields state-of-the-art results for knowledge graph completion on large-scale KGs. Moreover, we test NCRL for systematic generalization by learning to reason on small observed graphs and evaluating on larger unseen ones.

1. INTRODUCTION

Knowledge Graphs (KGs) provide a structured representation of real-world facts (Ji et al., 2021), and they are remarkably useful in various applications (Graupmann et al., 2005; Lukovnikov et al., 2017; Xiong et al., 2017; Yih et al., 2015). Since KGs are usually incomplete, KG reasoning, where the goal is to infer missing knowledge from the observed facts, is a crucial problem. This paper investigates how to learn logical rules for KG reasoning. Learning logical rules is critical for reasoning tasks in KGs and has received recent attention, owing to their ability to: (1) provide interpretable explanations when used for prediction, and (2) generalize to new tasks, domains, and data (Qu et al., 2020; Lu et al., 2022; Cheng et al., 2022). For example, in Fig. 1, the learned rules can be used to infer new facts about objects that are unobserved during training.

Moreover, logical rules naturally have an interesting property called compositionality: the meaning of a whole logical expression is a function of the meanings of its parts and of the way they are combined (Hupkes et al., 2020). To concretely explain compositionality, consider the family relationships shown in Fig. 2. In Fig. 2(a), the rule (hasUncle ← hasMother ∧ hasMother ∧ hasSon) forms a composition of smaller logical expressions, which can be expressed as a hierarchy in which predicates (i.e., relations) are combined and replaced by another single predicate. For example, the predicates hasMother and hasMother can be combined and replaced by the predicate hasGrandma, as shown in Fig. 2(a). As such, by recursively combining predicates into compositions and reducing each composition to a single predicate, we can finally infer the rule head (i.e., hasUncle) from the rule body. While there are various possible hierarchical trees to represent such rules, not all of them are valid given the observed relations in the KG.
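To make the idea of recursive predicate composition concrete, the following is a minimal illustrative sketch, not the NCRL model itself: it reduces a rule body to a head by repeatedly replacing an adjacent pair of predicates with a single predicate, using a hand-written composition table. The predicate names follow the family-relation example; the composition table is an assumption made purely for this sketch (NCRL learns such structure with a recurrent attention unit rather than a fixed table).

```python
# Hypothetical composition table for the family-relation example;
# in NCRL this mapping is learned, not hand-specified.
COMPOSE = {
    ("hasMother", "hasMother"): "hasGrandma",
    ("hasGrandma", "hasSon"): "hasUncle",
}

def reduce_body(body):
    """Greedily merge adjacent predicate pairs until one predicate remains."""
    body = list(body)
    while len(body) > 1:
        for i in range(len(body) - 1):
            pair = (body[i], body[i + 1])
            if pair in COMPOSE:
                body[i:i + 2] = [COMPOSE[pair]]  # replace pair by one predicate
                break
        else:
            return None  # no mergeable pair: this hierarchy is not realizable
    return body[0]

print(reduce_body(["hasMother", "hasMother", "hasSon"]))  # -> hasUncle
print(reduce_body(["hasMother", "hasSon"]))               # -> None
```

The failing second call mirrors the point about invalid hierarchies: combining hasMother and hasSon first yields a composition with no corresponding predicate in the KG.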
In this work, our objective is to learn rules that generalize to large-scale tasks and unseen graphs. Consider the example in Fig. 1. From the training KG, we can extract two rules: rule ①: hasGrandma(x, y) ← hasMother(x, z) ∧ hasMother(z, y), and rule ②: hasUncle(x, y) ← hasGrandma(x, z) ∧ hasSon(z, y). We also observe that the rule necessary to infer the relation between Alice and Bob in the test KG is rule ③: hasUncle(x, y) ← hasMother(x, z1) ∧ hasMother(z1, z2) ∧ hasSon(z2, y), which is not observed in the training KG. However, by using compositionality to combine rules ① and ②, we can successfully learn rule ③, which is necessary for inferring the relation between Alice and Bob in the test KG. This successful prediction on the test KG demonstrates the model's capacity for systematic generalization, i.e., learning to reason on smaller graphs and making predictions on unseen graphs (Sinha et al., 2019).

Although compositionality is crucial for learning logical rules, most existing logical rule learning methods fail to exploit it. In traditional AI, Inductive Logic Programming (ILP) (Muggleton & De Raedt, 1994; Muggleton et al., 1990) is the most representative symbolic approach. Given a collection of positive and negative examples, an ILP system aims to learn logical rules that entail all the positive examples while excluding all the negative ones. However, ILP is difficult to scale beyond small rule sets because of the prohibitive computational complexity of searching the large space of compositional rules. There are also some recent neural-symbolic methods that extend ILP, e.g., neural logic programming methods (Yang et al., 2017; Sadeghian et al., 2019) and principled probabilistic methods (Qu et al., 2020). Neural logic programming simultaneously learns logical rules and their weights in a differentiable way.
Alternatively, principled probabilistic methods separate rule generation and rule weight learning by introducing a rule generator and a reasoning predictor. However, most of these approaches are designed specifically for the KG completion task. Moreover, since they require an enumeration of rules up to a maximum rule length T, their complexity grows exponentially as the maximum rule length increases, which severely limits their systematic generalization capability. To overcome these issues, several works, such as conditional theorem provers (CTPs) (Minervini et al., 2020) and recurrent relational reasoning (R5) (Lu et al., 2022), focus on the model's systematicity instead. CTPs learn an adaptive strategy for selecting subsets of rules to consider at each step of the reasoning via gradient-based optimization, while R5 performs rule extraction and logical reasoning with deep reinforcement learning equipped with a dynamic rule memory. Despite their strong generalizability to larger unseen graphs beyond the training sets (Sinha et al., 2019), they cannot handle KG completion on large-scale KGs due to their high computational complexity.
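The rule combination behind Fig. 1 can be sketched in a few lines. This is an illustrative sketch of chain-rule composition under the paper's example, not the method proposed here: it splices rule 1's body into rule 2 wherever rule 2's body mentions rule 1's head. Rules are represented as hypothetical `(head, body)` pairs of predicate names introduced only for this sketch.

```python
# Compose two chain rules: wherever rule 2's body uses rule 1's head,
# substitute rule 1's body in its place, yielding a longer chain rule.
def compose(rule1, rule2):
    head1, body1 = rule1
    head2, body2 = rule2
    new_body = []
    for pred in body2:
        new_body.extend(body1 if pred == head1 else [pred])
    return (head2, new_body)

rule1 = ("hasGrandma", ["hasMother", "hasMother"])          # rule 1 in Fig. 1
rule2 = ("hasUncle", ["hasGrandma", "hasSon"])              # rule 2 in Fig. 1
print(compose(rule1, rule2))
# -> ('hasUncle', ['hasMother', 'hasMother', 'hasSon'])     # rule 3 in Fig. 1
```

The composed rule is exactly the length-3 rule needed at test time, even though it was never observed during training, which is the systematic-generalization behavior the figure illustrates.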



Figure 1: Illustration of how the compositionality of logical rules helps improve systematic generalization. (a) Logical rule extraction from the observed graph (i.e., training stage) and (b) inference on an unseen graph (i.e., test stage). The train and test graphs have disjoint sets of entities. By combining logical rules ① and ②, we can successfully learn rule ③ for prediction on unseen graphs.

For example, in Fig. 2(b), given a KG that only contains the relations {hasMother, hasSon, hasGrandma, hasUncle}, it is possible to combine hasMother and hasSon first; however, there is no proper predicate in the KG to represent the result. Therefore, learning a high-quality compositional structure for a given logical expression is critical for rule discovery, and it is the focus of our work.

