LEARNING TO DECEIVE KNOWLEDGE GRAPH AUGMENTED MODELS VIA TARGETED PERTURBATION

Abstract

Knowledge graphs (KGs) have helped neural models improve performance on various knowledge-intensive tasks, like question answering and item recommendation. By using attention over the KG, such KG-augmented models can also "explain" which KG information was most relevant for making a given prediction. In this paper, we question whether these models are really behaving as we expect. We show that, through a reinforcement learning policy (or even simple heuristics), one can produce deceptively perturbed KGs, which maintain the downstream performance of the original KG while significantly deviating from the original KG's semantics and structure. Our findings raise doubts about KG-augmented models' ability to reason about KG information and give sensible explanations.

1. INTRODUCTION

Recently, neural reasoning over knowledge graphs (KGs) has emerged as a popular paradigm in machine learning and natural language processing (NLP). KG-augmented models have improved performance on a number of knowledge-intensive downstream tasks: for question answering (QA), the KG provides context about how a given answer choice is related to the question (Lin et al., 2019; Feng et al., 2020; Lv et al., 2020; Talmor et al., 2018); for item recommendation, the KG mitigates data sparsity and cold start issues (Wang et al., 2018b; 2019a; b; 2018a). Furthermore, by using attention over the KG, such models aim to explain which KG information was most relevant for making a given prediction (Lin et al., 2019; Feng et al., 2020; Wang et al., 2018b; 2019b; Cao et al., 2019; Gao et al., 2019).

Nonetheless, the process by which KG-augmented models reason about KG information is still not well understood. It is assumed that, like humans, KG-augmented models base their predictions on meaningful KG paths, and that this process is responsible for their performance gains (Lin et al., 2019; Feng et al., 2020; Gao et al., 2019; Song et al., 2019). In this paper, we question whether existing KG-augmented models actually use KGs in this human-like manner. We study this question primarily by measuring model performance when the KG's semantics and structure have been perturbed to hinder human comprehension. To perturb the KG, we propose four perturbation heuristics and a reinforcement learning (RL) based perturbation algorithm. Surprisingly, for KG-augmented models on both commonsense QA and item recommendation, we find that the KG can be extensively perturbed with little to no effect on performance. This raises doubts about KG-augmented models' use of KGs and the plausibility of their explanations.
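The specific heuristics and the RL-based algorithm are defined later in the paper; purely as illustration of what "perturbing every edge" can look like, the following is a minimal sketch of one plausible heuristic, random relation replacement, which corrupts each triple's semantics while leaving the graph's connectivity intact. The function and variable names here are hypothetical, not the paper's.

```python
import random

def perturb_relations(edges, relations, seed=0):
    """Illustrative perturbation heuristic (hypothetical, not the paper's
    exact method): replace each edge's relation with a different, randomly
    chosen relation type. Every edge is perturbed, mirroring the setup of
    perturbing the entire KG, while head/tail entities (and thus the graph
    structure) are left unchanged.

    `edges` is a list of (head, relation, tail) triples; `relations` is the
    set of all relation types in the KG.
    """
    rng = random.Random(seed)
    perturbed = []
    for head, rel, tail in edges:
        # Sample a replacement relation distinct from the original one.
        choices = [r for r in relations if r != rel]
        perturbed.append((head, rng.choice(choices), tail))
    return perturbed

# Toy ConceptNet-style KG fragment.
kg = [("dog", "IsA", "animal"), ("dog", "HasA", "tail")]
rels = {"IsA", "HasA", "AtLocation", "UsedFor"}
print(perturb_relations(kg, rels))
```

Under such a perturbation, a triple like ("dog", "IsA", "animal") might become ("dog", "UsedFor", "animal"), which is nonsensical to a human; the paper's finding is that downstream performance nonetheless changes little.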

2. PROBLEM SETTING

Our goal is to investigate whether KG-augmented models and humans use KGs similarly. Since KGs are human-labeled, we assume that they are generally accurate and meaningful to humans. Thus, across different perturbation methods, we measure model performance when every edge in the KG has been perturbed to make less sense to humans. To quantify the extent to which the KG has been perturbed, we also measure both semantic and structural similarity between the original

