CAUSAL INFERENCE FOR KNOWLEDGE GRAPH COMPLETION

Anonymous

Abstract

Existing knowledge graph completion (KGC) models are built on learning correlations in data, such as the correlation between entities, relations and the scores of triplets. Since correlation is not as reliable as causation, correlation-driven KGC models are weak in interpretability and suffer from data bias. In this paper, we propose causal KGC models that alleviate data bias by leveraging the causal inference framework. Our method is intuitive and interpretable because it uses causal graphs, controllable because it uses intervention techniques, and model-agnostic. Causal graphs allow us to explain the causal relationships between variables and to study the data generation process. Under the causal graph, data bias can be viewed as a confounder, and we block the harmful effect of confounders with intervention operators to mitigate the bias. Because randomized data are difficult to obtain, causal KGC models pose unique challenges for evaluation, so we present a method that makes evaluation feasible. Finally, we present a group-theoretic view of KGC, which is equivalent to the causal view but further reveals the causal relationships. Experimental results show that our causal KGC models achieve better performance than traditional KGC models on three benchmark datasets.

1. INTRODUCTION

A knowledge graph (KG) consists of a large number of triplets of the form (head entity, relation, tail entity). Many KGs are incomplete. To complete them, knowledge graph completion (KGC) models define a scoring function that measures the likelihood of a triplet. The core of traditional KGC models is to learn correlations in the data, such as the correlation between entities or relations and the scores of triplets. Since correlation is not as reliable as causation, modeling correlation alone leads to poor interpretability and to data bias issues. For example, because they ignore popularity bias in KG data, KGC models are biased towards popular entities and relations (Mohamed et al., 2020).

In this paper, we propose causal KGC models that mitigate the data bias issues by utilizing causal inference techniques (Pearl, 2009b). Our method is model-agnostic and only requires adding an extra term to a traditional KGC model.

Causal inference defines causal graphs to describe the causal relationships between variables. Causal graphs help build intuitive, interpretable and controllable KGC models. Traditional KGC models are concerned only with the correlations in the data and ignore both causation and the data generation process, which can lead to incorrect correlations between entities, relations and the scores of triplets. Causal graphs allow us to explain the causal relationships between variables and to study the data generation process. Under the causal graph, data bias can be viewed as a confounder, where confounders in KG data are variables that simultaneously affect entities or relations and the scores of triplets. We use intervention operators to eliminate the harmful effect of confounders; an intervention removes the paths from the confounders to the entities and relations in the causal graph. We can then estimate the causal effect, that is, the correct correlations in KG data, with the backdoor adjustment formula (Pearl, 2009b).
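To make the backdoor adjustment concrete, the following is a minimal sketch on a toy discrete distribution. It is an illustration only, not the model proposed in this paper: the binary variables Z (a confounder, for instance whether an entity is popular), X (the entity or relation being scored) and Y (whether the triplet scores as plausible), as well as all probabilities, are assumptions made up for the example. It contrasts the observational quantity P(Y=1 | X=x) with the interventional quantity P(Y=1 | do(X=x)) = sum_z P(Y=1 | X=x, Z=z) P(Z=z).

```python
# Toy illustration of backdoor adjustment. All numbers are invented.
# Z: confounder, X: treatment, Y: outcome (all binary, all hypothetical).

p_z = {0: 0.5, 1: 0.5}                      # P(Z=z)
p_x1_given_z = {0: 0.2, 1: 0.8}             # P(X=1 | Z=z)
p_y1_given_xz = {(0, 0): 0.1, (1, 0): 0.3,  # P(Y=1 | X=x, Z=z), keyed by (x, z)
                 (0, 1): 0.5, (1, 1): 0.7}


def p_x_given_z(x, z):
    """P(X=x | Z=z) for binary X."""
    return p_x1_given_z[z] if x == 1 else 1.0 - p_x1_given_z[z]


def observational(x):
    """P(Y=1 | X=x): strata are weighted by P(Z=z | X=x), so Z leaks in."""
    num = sum(p_y1_given_xz[(x, z)] * p_x_given_z(x, z) * p_z[z] for z in p_z)
    den = sum(p_x_given_z(x, z) * p_z[z] for z in p_z)
    return num / den


def interventional(x):
    """P(Y=1 | do(X=x)): strata are weighted by the marginal P(Z=z)."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)


naive_effect = observational(1) - observational(0)
causal_effect = interventional(1) - interventional(0)
print(f"naive difference:  {naive_effect:.2f}")
print(f"causal difference: {causal_effect:.2f}")
```

In this toy setting the naive difference is larger than the adjusted one because Z raises both the chance of X=1 and the chance of Y=1, which is the kind of spurious correlation the intervention is meant to remove.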
Causal KGC models pose a particular challenge for evaluation: they should be evaluated on a randomized test set, which is often difficult or infeasible to obtain. We therefore define a new evaluation metric, based on the popularity of entities and relations, to measure the performance of causal KGC models.
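This section does not define the metric, so the sketch below should not be read as the paper's definition. It only illustrates one hypothetical way a popularity-aware metric could work: a mean reciprocal rank in which each test triplet is weighted by the inverse popularity of its tail entity, so that rare entities count more. The function name `popularity_weighted_mrr`, the weighting scheme and the toy data are all assumptions.

```python
from collections import Counter


def popularity_weighted_mrr(test_triplets, ranks, train_triplets):
    """Hypothetical metric: reciprocal rank averaged with inverse-popularity weights."""
    # Popularity of an entity = number of times it appears as head or tail in training.
    counts = Counter()
    for head, _, tail in train_triplets:
        counts[head] += 1
        counts[tail] += 1
    # Rare tail entities get larger weights; +1 avoids division by zero
    # for entities that never appear in the training triplets.
    weights = [1.0 / (counts[tail] + 1) for _, _, tail in test_triplets]
    weighted_sum = sum(w / r for w, r in zip(weights, ranks))
    return weighted_sum / sum(weights)


# Toy data (invented): entity "a" is popular, entity "c" is rare.
train = [("a", "r", "b"), ("a", "r", "c"), ("a", "r", "b")]
test = [("c", "r", "a"), ("a", "r", "c")]
ranks = [1, 2]  # rank the model assigned to the true tail of each test triplet

score = popularity_weighted_mrr(test, ranks, train)
plain_mrr = sum(1.0 / r for r in ranks) / len(ranks)
print(f"plain MRR: {plain_mrr:.3f}, popularity-weighted MRR: {score:.3f}")
```

Here the weighted score falls below the plain MRR because the model ranks the popular tail first but the rare tail only second, and the rare tail carries more weight.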

