MULTIMODAL ANALOGICAL REASONING OVER KNOWLEDGE GRAPHS

Abstract

Analogical reasoning is fundamental to human cognition and holds an important place in various fields. However, previous studies mainly focus on single-modal analogical reasoning and ignore the benefits of structured knowledge. Notably, research in cognitive psychology has demonstrated that information from multimodal sources often yields more powerful cognitive transfer than single-modal sources. To this end, we introduce the new task of multimodal analogical reasoning over knowledge graphs, which requires multimodal reasoning ability with the help of background knowledge. Specifically, we construct a Multimodal Analogical Reasoning dataSet (MARS) and a multimodal knowledge graph MarKG. We evaluate the task with multimodal knowledge graph embedding and pre-trained Transformer baselines, illustrating its potential challenges. We further propose a novel model-agnostic Multimodal analogical reasoning framework with Transformer (MarT) motivated by the structure-mapping theory, which obtains better performance. We hope our work can deliver benefits and inspire future research.

1. INTRODUCTION

Analogical reasoning, the ability to perceive and use relational similarity between two situations or events, holds an important place in human cognition (Johnson-Laird, 2006; Wu et al., 2020; Bengio et al., 2021; Chen et al., 2022a) and can provide back-end support for various fields such as education (Thagard, 1992) and creativity (Goel, 1997), thus appealing to the AI community. Early work in Computer Vision (CV) proposes visual analogical reasoning, aiming to lift machine intelligence by associating vision with relational, structural, and analogical reasoning. Meanwhile, in contrast to symbolic accounts such as structure-mapping theory (Gentner, 1983), researchers in Natural Language Processing (NLP) hold the connectionist assumption of linear analogy (Mikolov et al., 2013b; Gladkova et al., 2016a; Ethayarajh et al., 2019a;b): for example, the relation between two words can be inferred through vector arithmetic over their word embeddings. However, it remains an open question whether artificial neural networks are also capable of recognizing analogies across different modalities.

Note that humans can quickly acquire new abilities by finding a common relational system between two exemplars, situations, or domains. According to Mayer's Cognitive Theory of Multimedia Learning (Hegarty & Just, 1993; Mayer, 2002), human learners often perform better on tests involving analogy when they have learned from multimodal sources rather than single-modal ones. Evolving from recognizing single-modal analogies to exploring multimodal reasoning for neural models, we emphasize the importance of a new kind of analogical reasoning task grounded in Knowledge Graphs (KGs).

In this paper, we introduce the task of multimodal analogical reasoning over knowledge graphs to fill this gap. Unlike the previous multiple-choice QA setting, we directly predict the analogical target and formulate the task as link prediction without explicitly providing relations. Specifically, the task can be formalized as $(e_h, e_t) : (e_q, ?)$ with the help of a background multimodal knowledge graph.
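As a concrete illustration of the linear-analogy assumption above, consider the classical word analogy man : king :: woman : ?. A minimal sketch, assuming word embeddings $\mathbf{v}_w$ over a vocabulary $V$ (the specific words and the cosine-based retrieval rule are illustrative, not part of our task definition):

\[
t^{*} = \operatorname*{arg\,max}_{t \,\in\, V \setminus \{\text{king},\, \text{man},\, \text{woman}\}} \cos\big(\mathbf{v}_{\text{king}} - \mathbf{v}_{\text{man}} + \mathbf{v}_{\text{woman}},\; \mathbf{v}_{t}\big),
\]

which recovers $t^{*} = $ "queen" under well-trained embeddings, i.e., the relation is captured purely by vector arithmetic.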
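The proposed task generalizes this pattern from word vectors to entities in a multimodal KG. As a hedged sketch of the link-prediction formulation, let $\mathcal{E}$ denote the entity set of the background multimodal knowledge graph $\mathcal{G}$ (i.e., MarKG) and let $s(\cdot)$ be an abstract scoring function, e.g., instantiated by a multimodal KG embedding model or a pre-trained Transformer baseline (the symbol $s$ is introduced here only for exposition):

\[
e^{*} = \operatorname*{arg\,max}_{e \,\in\, \mathcal{E}} \; s\big((e_h, e_t), (e_q, e) \mid \mathcal{G}\big),
\]

where the analogy example $(e_h, e_t)$ and the query entity $e_q$ may involve different modalities (e.g., images and text), and the relation underlying $(e_h, e_t)$ is never provided explicitly; the model must induce it and transfer it to answer the query.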

