CAUSAL INFERENCE FOR KNOWLEDGE GRAPH COMPLETION

Anonymous

Abstract

Existing knowledge graph completion (KGC) models are built on learning correlations in data, such as the correlations between entities, relations and the scores of triplets. Since correlation is not as reliable as causation, correlation-driven KGC models are weak in interpretability and suffer from data bias. In this paper, we propose causal KGC models that alleviate data bias by leveraging the causal inference framework. Our method is intuitive and interpretable through its use of causal graphs, controllable through intervention techniques, and model-agnostic. Causal graphs allow us to explain the causal relationships between variables and study the data generation process. Under the causal graph, data bias can be viewed as a confounder, and we block the harmful effect of confounders with intervention operators to mitigate the bias. Because randomized data are difficult to obtain, causal KGC models pose unique challenges for evaluation; we therefore present a method that makes evaluation feasible. Finally, we present a group-theoretic view of KGC, which is equivalent to the causal view but further reveals the causal relationships. Experimental results show that our causal KGC models outperform traditional KGC models on three benchmark datasets.

1. INTRODUCTION

A knowledge graph (KG) consists of a large number of triplets of the form (head entity, relation, tail entity). Many KGs suffer from incompleteness. To complete a KG, knowledge graph completion (KGC) models define a scoring function that measures the likelihood of triplets. The core of traditional KGC models is to learn the correlations in data, such as the correlations between entities or relations and the scores of triplets. Since correlation is not as reliable as causation, purely modeling correlation leads to poor interpretability and data bias. For example, by ignoring popularity bias in KG data, KGC models become biased towards popular entities and relations (Mohamed et al., 2020).

In this paper, we propose causal KGC models that alleviate data bias by utilizing causal inference techniques (Pearl, 2009b). Our method is model-agnostic and only needs to add an extra term to a traditional KGC model. Causal inference defines causal graphs to describe the causal relationships between variables, which helps build intuitive, interpretable and controllable KGC models. Traditional KGC models are concerned only with the correlations in data, while ignoring causation and the data generation process, which can lead to incorrect correlations between entities, relations and the scores of triplets. Causal graphs allow us to explain the causal relationships between variables and study the data generation process. Under the causal graph, data bias can be viewed as a confounder: a confounder in KG data is a variable that simultaneously affects entities or relations and the scores of triplets. We utilize intervention operators to eliminate the harmful effect of confounders by removing the path from confounders to entities and relations in the causal graph. We can then estimate the causal effect, i.e., the correct correlations in KG data, via the backdoor adjustment formula (Pearl, 2009b).
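As a toy illustration of the backdoor adjustment (not the paper's model), consider a binary confounder Z, say popularity exposure, that affects both a variable X and an outcome Y. The observational conditional P(Y | X) mixes in P(Z | X) and thus absorbs the confounding, while P(Y | do(X)) weights by the marginal P(Z):

```python
import numpy as np

# Toy joint distribution with a binary confounder Z that affects both
# X and Y, inducing a spurious correlation between them.
p_z = np.array([0.7, 0.3])                # P(Z)
p_x_z = np.array([[0.8, 0.2],             # P(X | Z=0)
                  [0.2, 0.8]])            # P(X | Z=1)
p_y_xz = np.array([[[0.9, 0.1],           # P(Y | X=0, Z=0)
                    [0.6, 0.4]],          # P(Y | X=1, Z=0)
                   [[0.5, 0.5],           # P(Y | X=0, Z=1)
                    [0.1, 0.9]]])         # P(Y | X=1, Z=1); index [z, x, y]

# Observational: P(Y=1 | X=1) weights by P(Z | X=1), letting Z leak in.
p_x1 = (p_x_z[:, 1] * p_z).sum()
p_z_given_x1 = p_x_z[:, 1] * p_z / p_x1
obs = (p_y_xz[:, 1, 1] * p_z_given_x1).sum()

# Interventional: the backdoor adjustment weights by the marginal P(Z),
# blocking the path Z -> X:  P(Y=1 | do(X=1)) = sum_z P(Y=1 | X=1, z) P(z)
do = (p_y_xz[:, 1, 1] * p_z).sum()
print(obs, do)  # the observational estimate is inflated by the confounder
```

Here the observational estimate overstates the effect of X on Y because X=1 occurs more often under the Z=1 regime, where Y=1 is also more likely; the adjustment removes exactly this distortion.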
Causal KGC models present special challenges for evaluation, since they should be evaluated on a randomized test set, which is often difficult or infeasible to obtain. We therefore define a new evaluation metric that measures the performance of causal KGC models based on the popularity of entities and relations. The main feature of causality is invariance, or symmetry (Arjovsky et al., 2020), and group theory is the language of symmetry. Thus, we finally present a group-theoretic view of KGC, which is equivalent to the causal view but further uncovers the causal relationships; this view goes beyond the causal one and suggests further applications. The main contributions of this paper are listed below:

1. To the best of our knowledge, we are the first to show the necessity of introducing causation into KGC and to apply causal inference to KGC.

2. We propose causal KGC models that enhance the interpretability of KGC models and alleviate data bias, and we show a method to evaluate causal KGC models on observational datasets.

3. We present a group-theoretic view of KGC that further reveals the causal relationships.

4. We empirically show that causal KGC models outperform traditional KGC models on three benchmark datasets.
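Since the paper's popularity-based metric is only named here, the sketch below shows one plausible way to make evaluation popularity-aware on an observational test set: bucket test queries by the training-set frequency of the true answer entity and report mean reciprocal rank (MRR) per bucket, so that performance on rare entities is visible rather than drowned out. The function name and bucketing scheme are illustrative assumptions, not the paper's definition.

```python
from collections import Counter

def popularity_buckets(train_triples, test_queries, ranks, n_buckets=3):
    """Report MRR per popularity bucket (illustrative sketch only).

    train_triples: list of (h, r, t) training triplets.
    test_queries:  list of (h, r, t) test triplets; the tail t is the answer.
    ranks[i]:      rank the model assigned to the true tail of test_queries[i].
    Returns a list of MRRs, from the least to the most popular bucket.
    """
    # Entity popularity = frequency in the training set (heads and tails).
    freq = Counter(t for _, _, t in train_triples)
    freq.update(h for h, _, _ in train_triples)
    # Sort test queries by the popularity of their true answer entity.
    order = sorted(range(len(test_queries)),
                   key=lambda i: freq[test_queries[i][2]])
    size = max(1, len(order) // n_buckets)
    mrrs = []
    for b in range(n_buckets):
        idx = order[b * size:(b + 1) * size] if b < n_buckets - 1 else order[b * size:]
        mrrs.append(sum(1.0 / ranks[i] for i in idx) / len(idx))
    return mrrs
```

A debiased model should narrow the gap between the low-popularity and high-popularity buckets, whereas a correlation-driven model typically scores much higher on the popular bucket alone.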

2. BACKGROUND

In this section, we introduce the background relevant to our model: knowledge graph completion and causal inference.

2.1. KNOWLEDGE GRAPH COMPLETION

Let E denote the set of entities and R the set of relations. A KG is composed of a set of triplets D = {(h, r, t)} ⊂ E × R × E, where h is a head entity, r is a relation and t is a tail entity. Lacroix et al. (2018) propose to augment every triplet (h, r, t) in D with its inverse triplet (t, r^-1, h). With this augmentation, KGC can be formulated as predicting the tail entities that satisfy a query (h, r, ?). A KG can also be represented by a 3rd-order binary tensor X ∈ {0, 1}^(|E|×|R|×|E|) with X_{h,r,t} = 1 if (h, r, t) ∈ D and X_{h,r,t} = 0 otherwise. KGC models define a scoring function f(h, r, t) that measures the likelihood of a triplet (h, r, t) based on the corresponding embeddings (h, r, t). A number of KGC models have been proposed (Zhang et al., 2021a); we list the four popular models that we consider in our experiments.

TransE (Bordes et al., 2013), a representative translation-based model, defines the scoring function as the negative distance between h + r and t, i.e., f(h, r, t) = -∥h + r - t∥, where (h, r, t) are the embeddings of (h, r, t), h, r, t ∈ R^n, n is the embedding dimension and ∥·∥ is a vector norm.

RotatE (Sun et al., 2018) generalizes the embeddings from real to complex vector space to model various relation patterns, and defines the scoring function as f(h, r, t) = -∥h ⊙ r - t∥, where h, r, t ∈ C^n and ⊙ is the Hadamard product.

DistMult (Yang et al., 2014), a representative multiplicative model, defines the scoring function as the trilinear product of h, r and t, i.e., f(h, r, t) = ⟨h, r, t⟩ = Σ_i h_i r_i t_i, where h, r, t ∈ R^n.

ComplEx (Trouillon et al., 2017) extends DistMult to complex vector space to handle asymmetric relation patterns, and defines the scoring function as f(h, r, t) = Re(⟨h, r, t*⟩) = Re(Σ_i h_i r_i t*_i), where h, r, t ∈ C^n, t*_i is the complex conjugate of t_i and Re(·) is the real part of a complex number.
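The four scoring functions can be sketched directly in NumPy; the embeddings here are plain arrays standing in for trained parameters:

```python
import numpy as np

def transe(h, r, t):
    """TransE: negative L2 distance between h + r and t."""
    return -np.linalg.norm(h + r - t)

def rotate(h, r, t):
    """RotatE: elementwise (Hadamard) product in complex space;
    r is expected to have unit-modulus entries (rotations)."""
    return -np.linalg.norm(h * r - t)

def distmult(h, r, t):
    """DistMult: trilinear product <h, r, t> = sum_i h_i r_i t_i."""
    return np.sum(h * r * t)

def complex_score(h, r, t):
    """ComplEx: real part of <h, r, conj(t)> for complex embeddings."""
    return np.real(np.sum(h * r * np.conj(t)))
```

Note that distmult(h, r, t) == distmult(t, r, h), which is why DistMult can only model symmetric relations, while the conjugation in ComplEx breaks this symmetry for genuinely complex embeddings.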

