EM-RBR: A REINFORCED FRAMEWORK FOR KNOWLEDGE GRAPH COMPLETION FROM REASONING PERSPECTIVE

Abstract

Knowledge graph completion aims to predict missing links in a knowledge graph (KG), i.e., to predict the probability that a given triple belongs to the KG. Most mainstream embedding methods focus on the fact triples contained in the given KG while ignoring the rich background information implicitly provided by logic rules derived from the knowledge base. Moreover, because they are limited to a particular algebraic space, embedding models often exhibit contradictions when expressing certain relational patterns, so the resulting representation of the knowledge graph is incomplete and inaccurate. To solve this problem, we propose in this paper a general framework, named EM-RBR (embedding and rule-based reasoning), capable of combining the advantages of rule-based reasoning with state-of-the-art embedding models. EM-RBR utilizes the relational background knowledge contained in rules to conduct multi-relation reasoning for link prediction. In this way, we can find the most reasonable explanation for a given triple and obtain higher prediction accuracy. In experiments, we demonstrate that EM-RBR achieves better performance than previous models on FB15k, WN18 and our new dataset FB15k-R; on the new dataset in particular, our model outperforms the state of the art by an even larger margin. We make the implementation of EM-RBR available at https://github.com/1173710224/link-prediction-with-rule-based-reasoning.

1. INTRODUCTION

A knowledge graph (KG) conveys knowledge about the world in a structured representation. The rich structured information provided by knowledge graphs has become an extremely useful resource for many Artificial-Intelligence-related applications such as query expansion (Graupmann et al., 2005), word sense disambiguation (Wasserman Pritsker et al., 2015), and information extraction (Hoffmann et al., 2011). A typical knowledge representation in a KG is multi-relational data stored in RDF format, e.g. (Paris, Capital-Of, France). However, due to the discrete nature of logic facts (Wang & Cohen, 2016), the knowledge contained in a KG is bound to be incomplete (Sadeghian et al., 2019). Consequently, knowledge graph completion (KGC), which attempts to predict whether a new triple is likely to belong to the KG by leveraging its existing triples, has received more and more attention. Popular embedding-based KGC methods embed the entities and relations of the knowledge graph into a low-dimensional latent feature space; the implicit relationships between entities can then be inferred by comparing their representations in this vector space. Many researchers (Bordes et al., 2013; Mikolov et al., 2013; Wang et al., 2014; Ji et al., 2015; Lin et al., 2015; Nguyen et al., 2017) have contributed more reasonable and competent embeddings. But the overall effect is highly correlated with the density of the knowledge graph, because embedding methods often fail to predict weak, hidden relations that have a low frequency. Since the training set cannot contain all factual triples, the embedding converges to a solution that is unsuitable for triples involving weak relations. Reasoning over such hidden relations, however, can convert the prediction target into an easier one.
For example, given an existing triple (Paul, Leader-Of, SoccerTeam) and a rule Leader-Of(x, y) ⇒ Member-Of(x, y), which states that the leader of a soccer team is also a member of a sports team, we can apply the rule to the triple to obtain a new triple (Paul, Member-Of, SportTeam) even if the relation Member-Of is weak in the knowledge graph. Some innovative models already try to harness rules for better prediction. Joint models (Rocktäschel et al., 2015; Wang et al., 2019; Guo et al., 2016) incorporate rules into the loss functions of translation models and obtain better embedding representations of entities and relations. An optimization based on ProPPR (Wang & Cohen, 2016) embeds rules and then uses the embedding results to calculate the hyper-parameters of ProPPR. These efforts all aim at obtaining better embeddings from rules and triples, rather than solving completion through real rule-based reasoning, which is necessary to address weak-relation prediction as mentioned before. In contrast, EM-RBR performs completion from the reasoning perspective. Moreover, contradictions usually exist in the mathematical spaces of existing embedding-based models. Take transE (Bordes et al., 2013) and RotatE (Sun et al., 2019) as examples. For the relation pattern R(x, y) ⇒ R(y, x), i.e., a symmetric relation, transE cannot model it, as described in Sun et al. (2019). RotatE, in turn, cannot model a transitive relation R, i.e., one satisfying R(x, y) ∧ R(y, z) ⇒ R(x, z). Suppose the embedding of the relation R under RotatE is e^{iθ_R}, abbreviated as r. Formula 1 is necessary and sufficient for transitivity, and it implies r² = r, hence e^{iθ_R} = 1 and θ_R = 0 for θ_R ∈ [0, 2π). But θ_R = 0 means that R is a reflexive relation. Thus the embedding of any transitive relation is trained to behave like a reflexive relation, which is contrary to our expectation.
x ∘ r = y,  y ∘ r = z,  x ∘ r = z    (1)

In addition to the above problems, the transE and RotatE models cannot represent data in which two entities are connected by multiple relations. TransR (Lin et al., 2015) solves this by training each relation as a transformation matrix, but none of these models can handle an entity that has the same relation with multiple other entities. For example, suppose the knowledge graph contains two triples (h, r, t₁) and (h, r, t₂) with t₁ ≠ t₂. Under transE, h + r is a fixed vector, so the model is forced into the wrong equation t₁ = t₂; the other two mathematical models are no exception. These shortcomings can be resolved by declaring the relation pattern in a rule directly. Motivated by this, we propose EM-RBR, a novel framework combining embedding and rule-based reasoning, which is essentially a heuristic search. In developing this joint framework, we faced two challenges. On the one hand, we use AMIE (Galárraga et al., 2013) to automatically mine a large number of rules rather than writing them manually; however, automatically mined rules are not always completely reliable, so a reasonable way to measure rules is needed in order to pick proper ones during reasoning. On the other hand, traditional reasoning-based methods assign a triple only 0 or 1, indicating acceptance or rejection with respect to the given knowledge graph. This conventional qualitative analysis lacks the quantitative information provided by embedding models, so the result of EM-RBR needs to reflect the probability that a triple belongs to the knowledge graph. The three main contributions of EM-RBR are summarized as follows:

• EM-RBR is flexible and general enough to be combined with many embedding models.

• We propose a novel reasoning algorithm that better distinguishes a given triple from corrupted triples.
• We propose a novel rating mechanism for auto-mined reasoning rules, so that each rule is measured properly in our framework.

In the remainder of this paper, we explain how our model works in Section 2, present experiments in Section 3, and discuss related work in Section 4.
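The modeling contradictions discussed above can be checked numerically. The following sketch uses toy 2-dimensional embeddings (all values are illustrative, not trained or taken from the paper): a transE relation vector fitted in one direction scores the symmetric reverse badly, and a RotatE rotation with nonzero angle violates the transitivity constraint of Formula 1.

```python
import numpy as np

# Toy 2-D embeddings; values are illustrative, not from the paper.

# --- transE vs. a symmetric relation R(x,y) => R(y,x) ---
# transE scores a triple (h, R, t) by the distance ||h + r - t||.
# If both a + r = b and b + r = a had to hold, then 2r = 0 and a = b,
# so a nonzero r can fit at most one direction of the relation.
a = np.array([0.3, -0.7])
b = np.array([0.5, 0.1])
r = b - a                                  # fitted to (a, R, b) only
err_forward = np.linalg.norm(a + r - b)    # ~0: (a, R, b) fits
err_backward = np.linalg.norm(b + r - a)   # 2*||r||: (b, R, a) fails
print(err_forward, err_backward)

# --- RotatE vs. a transitive relation R(x,y) ^ R(y,z) => R(x,z) ---
# RotatE composes by unit-modulus rotation r = exp(i*theta_R).
# Formula 1 (x.r = y, y.r = z, x.r = z) forces r**2 = r, so only
# theta_R = 0 (a reflexive relation) satisfies transitivity.
theta_R = 0.9                   # any nonzero angle breaks the constraint
r_rot = np.exp(1j * theta_R)
x = np.exp(1j * 0.2)
y = x * r_rot
z = y * r_rot                   # transitivity also demands z == x * r_rot
print(abs(x * r_rot - z))       # nonzero unless theta_R == 0
```

The residual |x·r − z| equals |1 − e^{iθ_R}|, which vanishes only at θ_R = 0, matching the derivation above.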

2. METHOD

Our motivation is to use rules to make up for the shortcomings of embedding models. When the mathematical space of an embedding model is not expressive enough to represent a certain relation pattern, that pattern can be declared through rules. For example, for a symmetric relation R, we declare the rule R(x, y) ⇒ R(y, x). When scoring a triple (a, R, b) under the transE model, we can then take min(s(a, R, b), s(b, R, a)) as the final score of the triple. This is equivalent
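A minimal sketch of this min-score idea, assuming toy transE embeddings and a hypothetical symmetric relation R (this illustrates only the scoring shortcut above, not the paper's full reasoning procedure):

```python
import numpy as np

# Sketch: a declared rule R(x,y) => R(y,x) lets us score a triple by
# the better of its two directions under transE.  Embeddings are toy
# values, not trained ones.
def s(h, r, t):
    """transE score ||h + r - t|| (lower means more plausible)."""
    return np.linalg.norm(h + r - t)

a = np.array([0.3, -0.7])
b = np.array([0.5, 0.1])
r = b - a                      # transE fitted to (a, R, b) only

plain = s(b, r, a)                        # large: transE alone rejects (b, R, a)
with_rule = min(s(b, r, a), s(a, r, b))   # rule recovers the symmetric triple
print(plain, with_rule)
```

Here the rule rescues (b, R, a): its score falls back to the well-fitted direction even though b + r is far from a.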

