LOGICAL MESSAGE PASSING NETWORKS WITH ONE-HOP INFERENCE ON ATOMIC FORMULAS

Abstract

Complex Query Answering (CQA) over Knowledge Graphs (KGs) has attracted a lot of attention because of its potential to support many applications. Given that KGs are usually incomplete, neural models have been proposed to answer logical queries by parameterizing set operators with complex neural networks. However, such methods usually train neural set operators together with a large number of entity and relation embeddings from scratch, and whether and how the embeddings or the neural set operators contribute to the performance remains unclear. In this paper, we propose a simple framework for complex query answering that decouples the KG embeddings from the neural set operators. We propose to represent complex queries as query graphs. On top of the query graph, we propose the Logical Message Passing Neural Network (LMPNN), which connects local one-hop inferences on atomic formulas to global logical reasoning for complex query answering. We leverage existing effective KG embeddings to conduct one-hop inferences on atomic formulas, the results of which are regarded as the messages passed in LMPNN. The reasoning process over the overall logical formula is turned into the forward pass of LMPNN, which incrementally aggregates local information to finally predict the answer embedding. The complex logical inference across different types of queries is then learned from training examples based on the LMPNN architecture. Theoretically, our query-graph representation is more general than the prevailing operator-tree formulation, so our approach applies to a broader range of complex KG queries. Empirically, our approach yields a new state-of-the-art neural CQA model. Our research bridges the gap between complex KG query answering tasks and the long-standing achievements of knowledge graph representation learning. Our implementation can be found at https://github.com/HKUST-KnowComp/LMPNN.

1. INTRODUCTION

Knowledge Graphs (KGs) are essential sources of factual knowledge supporting downstream tasks such as question answering (Zhang et al., 2018; Sun et al., 2020; Ren et al., 2021). Answering logical queries is a complex but important task for utilizing the given knowledge (Ren & Leskovec, 2020; Ren et al., 2021). Modern knowledge graphs (Bollacker et al., 2008; Suchanek et al., 2007; Carlson et al., 2010), though of great scale, are usually considered incomplete, an issue known as the Open World Assumption (OWA) (Ji et al., 2021). Representation learning methods mitigate the incompleteness issue by learning representations from the observed KG triples and generalizing them to unseen triples (Bordes et al., 2013; Trouillon et al., 2016; Sun et al., 2018; Zhang et al., 2019; Chami et al., 2020). When considering logical queries over incomplete knowledge graphs, the query answering models are required to not only predict the unseen knowledge but also execute logical operators, such as conjunction, disjunction, and negation (Ren & Leskovec, 2020; Wang et al., 2021b).

Recently, neural models for Complex Query Answering (CQA) have been proposed to complete the unobserved knowledge graph and answer the complex query simultaneously. These models address complex queries that form an important subset of first-order queries. Formally speaking, the complex queries are Existentially quantified First Order queries with a single free variable (EFO-1) (Wang et al., 2021b) containing logical conjunction, disjunction, and negation (Ren & Leskovec, 2020). The EFO-1 queries are transformed into the form of operator trees whose nodes are set operations, e.g., relational set projection, set intersection, set union, and set complement (Wang et al., 2021b). The key idea of these approaches is to represent entity sets in specific embedding spaces (Ren & Leskovec, 2020; Zhang et al., 2021; Chen et al., 2022). Then, the set operators are parameterized by neural networks (Ren & Leskovec, 2020; Amayuelas et al., 2022; Bai et al., 2022), and the strict execution of the set operations is approximated by learning and conducting continuous mappings over the embedding spaces. It has been observed experimentally that a classic KG representation (Trouillon et al., 2016) can easily outperform neural CQA models on one-hop queries, even though the neural CQA models implement the one-hop projection with complex neural networks (Ren & Leskovec, 2020; Amayuelas et al., 2022; Bai et al., 2022). One possible reason is that the neural set projection is sub-optimal in modeling the inherent relational properties, such as symmetry, asymmetry, inversion, and composition, which have been thoroughly discussed in KG completion tasks and addressed by KG representations (Trouillon et al., 2016; Sun et al., 2018). On the other hand, the Continuous Query Decomposition (CQD) method (Arakelyan et al., 2021) searches for the best answers with a pretrained KG representation. The logical inference step is modeled as an optimization problem in which the continuous truth value of an Existential Positive First Order (EPFO) query is maximized by altering the variable embeddings. However, the speed and the performance of inference heavily rely on the optimization algorithm. It also assumes that the embeddings of entities and relations can reflect higher-order logical relations, which is not generally assumed in existing knowledge graph representation models. Moreover, it is unclear whether CQD can achieve good performance on complex queries with negation operators [1].

In this paper, we aim to answer complex EFO-1 queries by equipping pretrained KG representations with logical inference power. First, we formulate EFO-1 KG queries as Disjunctive Normal Form (DNF) formulas and propose to represent the conjunctive queries in the form of query graphs.
In the query graph, each edge is an atomic formula that contains a predicate with a possible negation. For each one-hop atomic formula, we use the pretrained KG representation to infer the intermediate embedding given the neighboring entity embedding, the relation embedding, the direction information, and the negation information. We show that this inference can be derived analytically for common KG representation formulations. The results of one-hop atomic formula inference are interpreted as logical messages passed from one node to another. Based on this mechanism, we propose the Logical Message Passing Neural Network (LMPNN), where node embeddings are updated by one Multi-Layer Perceptron (MLP) based on the aggregated logical messages. LMPNN coordinates the local logical messages computed from pretrained knowledge graph representations and predicts the answer embedding for a complex EFO-1 query. Instead of performing on-the-fly optimization over the query graph as CQD (Arakelyan et al., 2021) does, we parameterize the query answering process as the forward pass of LMPNN, which is trained from query samples over the observed KG. Extensive experiments show that our approach is a new state-of-the-art neural CQA model, in which only one MLP network and two embedding vectors are trained. Interestingly, we show that the optimal number of LMPNN layers is the largest distance between the free variable node and the constant entity nodes, which makes it easy to generalize our approach to complex queries of arbitrary complexity. Hence, our approach bridges the gap between complex KG query answering tasks and the long-standing achievements of knowledge graph representation learning.

2. RELATED WORKS

Knowledge graph representation. Representing relational knowledge is one of the long-standing topics in representation learning. Knowledge graph representations aim to predict unseen relational triples by representing the discrete symbols in continuous spaces. Various algebraic structures (Bordes et al., 2013; Trouillon et al., 2016; Sun et al., 2018; Ebisu & Ichise, 2018; Zhang et al., 2019) are applied to represent the relational patterns (Sun et al., 2018), and different geometric spaces (Chami et al., 2020; Cao et al., 2022) are explored to capture the hierarchical structures in knowledge graphs. As a result, entities and relations in large knowledge graphs can be efficiently represented in a continuous space.

Neural complex query answering. Most existing works treat complex queries as operator trees (Ren et al., 2020; Ren & Leskovec, 2020; Wang et al., 2021b). The query types that can be answered have been extended from existential positive first-order (EPFO) queries (Ren et al., 2020; Choudhary et al., 2021; Arakelyan et al., 2021) to existential first-order queries (Ren & Leskovec, 2020; Zhang et al., 2021; Bai et al., 2022), or more specifically EFO-1 queries (Wang et al., 2021b). In a neural CQA model, the entity sets are represented in various forms, including probabilistic distributions (Ren & Leskovec, 2020; Choudhary et al., 2021; Bai et al., 2022), geometric shapes (Ren et al., 2020; Zhang et al., 2021), and fuzzy-logic-inspired representations (Chen et al., 2022). In contrast to knowledge graph representations, the relation projections between sets are usually modeled by complex neural networks, including multi-layer perceptrons (Ren & Leskovec, 2020), MLP-Mixers (Amayuelas et al., 2022), or even transformers (Bai et al., 2022). However, their performance on one-hop queries has been shown to be worse than that of a simple but state-of-the-art knowledge graph representation (Trouillon et al., 2016). Other works compile the queries into graphs and then solve them with graph neural networks (Daza & Cochez, 2020; Liu et al., 2022). In contrast to this work, those investigations only focus on EPFO queries and require training the entire GNN from scratch.

Notably, knowledge graph representations also provide effective signals for answering complex queries. Specifically, CQD (Arakelyan et al., 2021) uses the KG representation to calculate the continuous truth value of an EPFO logical formula with logical t-norms. Then, the embeddings are optimized to maximize the continuous truth value. The optimization can be applied in the embedding space as well as in the symbolic space. Our experiments show that this method performs poorly on complex queries with logical negation; see Section 4.3.

3. PRELIMINARIES

In this section, we formally introduce the knowledge graph and related model-theoretic concepts. These concepts are helpful when we define the DNF formulation of EFO-1 queries in Section 4. Then, we introduce the abstract interface of knowledge graph representations, which is useful when defining one-hop inference in Section 5.

Model-theoretic concepts for knowledge graphs. A first-order language L is specified by a triple (F, R, C), where F, R, and C are sets of symbols for functions, relations, and constants, respectively. A knowledge graph is specified under the language L_KG, where the function symbol set F = ∅ and the relation symbols in R denote binary relations. A knowledge graph KG is an L_KG-structure over the entity set V, where each constant c ∈ C = V is an entity and each relation r ∈ R is a set r ⊆ V × V. We say r(t_1, t_2) = True when (t_1, t_2) ∈ r. A knowledge graph is usually defined by the relation triple set E = {(h, r, t)}, where h and t are entities such that (h, t) ∈ r. The Open World Assumption (OWA) means that only a subset of E can be observed; the observed knowledge graph is denoted by KG_obs.

A term is either a constant or a variable, and an atomic formula is either r(t_1, t_2) or ¬r(t_1, t_2), where t_1 and t_2 are terms and r is a relation. In the following, we use a to denote an atomic formula. First-order formulas are then inductively defined by combining atomic formulas with connectives (conjunction ∧, disjunction ∨, and negation ¬) and by quantifying variables (existential ∃ and universal ∀). The formal definition of first-order formulas can be found in Marker (2006). A variable is bound when it is associated with a quantifier; otherwise, it is free.

3.1. KNOWLEDGE GRAPH REPRESENTATIONS

Our approach relies upon the following abstract interface of knowledge graphs. Given the head entity embedding h, relation embedding r, and tail entity embedding t, a knowledge graph representation is able to produce a continuous truth value ψ(h, r, t) in [0, 1] for the embedding triple (h, r, t). In the symbolic space, whether (h, t) ∈ r is indicated by the {0, 1} truth value of r(h, t). In the embedding space, ψ(h, r, t) indicates the "probability" that (h, t) ∈ r. Hence, this definition is a continuous relaxation of the {0, 1} truth value. Each knowledge graph representation has a scoring function ϕ(h, r, t), which can be based on a similarity function or a distance function. It is easy to convert such functions into ψ by applying the sigmoid function with the necessary scaling and shift. For example, the scoring function of ComplEx (Trouillon et al., 2016) is

ϕ(h, r, t) = Re(⟨h ⊗ r, t⟩),    (1)

where ⊗ denotes the element-wise complex number multiplication, ⟨x, y⟩ is the complex inner product, and Re extracts the real part of a complex number. The corresponding truth value is computed by

ψ(h, r, t) = σ(ϕ(h, r, t)),    (2)

where σ is the sigmoid function. This truth value function is used in Arakelyan et al. (2021) with logic t-norms. In the context of knowledge graph representation learning, the entity embeddings h and t are usually related to specific entity symbols through a look-up table. In this work, we assume the embedding vectors are related not only to specific entities but also to variables.

Figure 1: The operator tree representation (a) and query graph representation (b) of an exemplar complex query in Ren & Leskovec (2020). The logical formula of this query is r_1(x, c_1) ∧ ¬r_2(c_2, x) ∧ r_3(y, x). For shorthand, this query is denoted as INP. The relation and term symbols are kept unchanged in the query graph representation. In the operator tree representation, c_1 and c_2 are represented by the anchor node operators e_1 and e_2. Relations r_1 and r_3 are represented by the projection nodes p_1 and p_3. Relation r_2 is jointly represented by the projection node p_2 and the negation node n. The fact that x is connected to all other nodes is represented by the intersection operator i.

4. EFO-1 QUERIES AND QUERY GRAPHS

Without loss of generality, we consider logical formulas in disjunctive normal form. We then define the Existential First Order queries with a single free variable (EFO-1).

Definition 1. Given a knowledge graph KG, an EFO-1 query Q is formulated as a first-order formula in the following disjunctive normal form:

Q(y, x_1, ..., x_m) = ∃x_1 ⋯ ∃x_m [a_11 ∧ a_12 ∧ ⋯ ∧ a_1n_1] ∨ ⋯ ∨ [a_p1 ∧ a_p2 ∧ ⋯ ∧ a_pn_p],    (3)

where y is the only free variable and x_i, 1 ≤ i ≤ m, are the m existential variables. The a_ij, 1 ≤ i ≤ p, 1 ≤ j ≤ n_i, are atomic formulas over the constants and the variables y, x_1, ..., x_m; each a_ij can be either negated or not.

To answer an EFO-1 query, one is expected to identify the answer set A[Q, KG], i.e., the set of entities such that a ∈ A[Q, KG] if and only if Q(y = a, x_1, ..., x_m) = True. Moreover, since Q is given in disjunctive normal form, we can write Q(y, x_1, ..., x_m) = CQ_1(y, x_1, ..., x_m) ∨ ⋯ ∨ CQ_p(y, x_1, ..., x_m), where CQ_i = ∃x_1 ⋯ ∃x_m [a_i1 ∧ a_i2 ∧ ⋯ ∧ a_in_i] is a conjunctive query. It is easy to see that A[Q, KG] = ∪_{i=1}^{p} A[CQ_i, KG].
Therefore, solving A[Q, KG] is equivalent to solving the answer sets for all conjunctive queries.

4.1. QUERY GRAPH FOR CONJUNCTIVE QUERIES

For each conjunctive query, the constant entities and variables are closely related by the atomic formulas. To emphasize the dependencies between entities and variables, we propose to use a query graph in which the terms are nodes connected by the atomic formulas. Each node in the query graph is either a constant symbol or a free or existential variable. Each edge represents an atomic formula, carrying both the relation and the negation information. Figure 1 shows our query graph representation and the operator tree representation (Wang et al., 2021b) for a typical query type defined in Ren & Leskovec (2020). The query graph is more concise than the operator tree. Moreover, nodes and edges have different meanings in operator trees and query graphs. In the operator tree representation, each node is an operator denoting a set operation whose output can be fed into other set operators; when answering complex queries with the operator tree, the information flows unidirectionally from leaves to root. For the query graph, in contrast, messages are passed bi-directionally through each edge, as we will show in Section 6. In Figure 1, the central node x receives messages from all its neighboring nodes.
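To make the construction concrete, the following minimal Python sketch encodes the query graph of the INP example in Figure 1 as plain data. The schema (node types, edge tuples, and the helper that lists the incident atomic formulas of a node) is illustrative and is not taken from the released implementation.

```python
# Illustrative encoding of the query graph in Figure 1 for
# r1(x, c1) AND NOT r2(c2, x) AND r3(y, x).  Entity ids are placeholders.
INP_QUERY_GRAPH = {
    "nodes": {
        "c1": {"type": "constant", "entity_id": 101},
        "c2": {"type": "constant", "entity_id": 202},
        "x":  {"type": "existential"},
        "y":  {"type": "free"},
    },
    # Each edge is one atomic formula: (head term, relation, tail term, negated?)
    "edges": [
        ("x",  "r1", "c1", False),   # r1(x, c1)
        ("c2", "r2", "x",  True),    # NOT r2(c2, x)
        ("y",  "r3", "x",  False),   # r3(y, x)
    ],
}

def incident_messages(graph, node):
    """For `node`, list (neighbor, relation, direction, negated) tuples.
    Direction 't2h' means the neighbor is the tail and we infer the head."""
    out = []
    for h, r, t, neg in graph["edges"]:
        if h == node:
            out.append((t, r, "t2h", neg))
        elif t == node:
            out.append((h, r, "h2t", neg))
    return out

print(incident_messages(INP_QUERY_GRAPH, "x"))
# [('c1', 'r1', 't2h', False), ('c2', 'r2', 'h2t', True), ('y', 'r3', 'h2t', False)]
```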

4.2. EXPRESSIVENESS OF DEFINITION 1

Our definition is theoretically broader than those in all existing discussions. The definition in Wang et al. (2021b), though widely adopted in the existing literature, carries implicit assumptions because it was proposed for predicting answers with neural set operators. It assumes that (1) the Skolemization process can always convert the query into a tree of set operators, and (2) all leaves of the operator tree are entities rather than variables. A counterexample that can be expressed by Definition 1 but cannot be represented by operator trees is shown in Appendix A.

4.3. LIMITATION OF OPTIMIZATION-BASED METHODS FOR NEGATED QUERIES

Our definition accepts atomic formulas with negation. Therefore, it can be seen as a natural extension of the definitions in CQD (Arakelyan et al., 2021). Moreover, we extend CQD to negation queries by computing the continuous truth value with a fuzzy logic negator (see Appendix B). The extended method is named CQD(E), and the results are compared in Table 1. We can see that CQD(E) is much less effective on negation queries. We conjecture that the landscape of the objective function, i.e., the continuous truth value of the complex formula with negation, can be non-convex, so the optimization problem is inherently harder. The non-convexity of the objective function is discussed in Appendix B.1.

5. ONE-HOP INFERENCE ON ATOMIC FORMULAS

As shown in Figure 1, each edge in a query graph is an atomic formula containing the information of the neighboring entities, the relation, and the logical negation, which are all crucial for predicting the answers. We propose to encode such entity, relation, and negation information by one-hop inference that maximizes the continuous truth value of the (possibly negated) atomic formula. Let ρ be the logical message encoding function with four input parameters: the neighboring entity embedding, the relation embedding, the direction information (h2t or t2h), and the logical negation information (0 for no negation and 1 for negation). The goal of this section is to properly define ρ. Inference on a one-hop atomic formula is much easier than inference on the entire complex EFO-1 query graph, as discussed in Section 4.3. We also provide closed-form expressions of ρ for the knowledge graph embeddings used in this paper.

5.1. ONE-HOP INFERENCE IN NON-NEGATED ATOMIC FORMULAS

The first situation is to infer the head embedding ĥ given the tail embedding t and the relation embedding r of a non-negated atomic formula. We formulate this inference task as continuous truth value maximization:

ĥ = ρ(t, r, t2h, 0) := argmax_{x ∈ D} ψ(x, r, t),    (5)

where D is the search domain of the embedding. Similarly, the tail embedding t̂ can be inferred given the head embedding h and the relation embedding r:

t̂ = ρ(h, r, h2t, 0) := argmax_{x ∈ D} ψ(h, r, x).    (6)
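These definitions can be read directly as an optimization procedure. The sketch below is only illustrative (the paper instead derives closed-form solutions; see Section 5.2 and Appendices C and D): it performs gradient ascent on the continuous truth value with a hypothetical DistMult-style score, and a small L2 penalty stands in for restricting x to the search domain D.

```python
import torch

def rho_by_optimization(nbr_emb, rel_emb, direction, score_fn, steps=200, lr=0.1):
    """Gradient-ascent sketch of Eqs. (5)-(6): maximize psi = sigmoid(score)
    over the unknown embedding x.  `score_fn(h, r, t)` is any differentiable
    KG scoring function."""
    x = torch.zeros_like(nbr_emb, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        if direction == "t2h":       # neighbor is the tail, infer the head (Eq. 5)
            score = score_fn(x, rel_emb, nbr_emb)
        else:                        # "h2t": neighbor is the head, infer the tail (Eq. 6)
            score = score_fn(nbr_emb, rel_emb, x)
        loss = -torch.sigmoid(score) + 1e-3 * x.pow(2).sum()  # soft domain constraint
        opt.zero_grad(); loss.backward(); opt.step()
    return x.detach()

# Illustrative DistMult-style score <h * r, t>; dimensions are arbitrary.
distmult_score = lambda h, r, t: (h * r * t).sum()
t_emb, r_emb = torch.randn(64), torch.randn(64)
h_hat = rho_by_optimization(t_emb, r_emb, "t2h", distmult_score)
```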

5.2. ONE-HOP INFERENCE IN NEGATED ATOMIC FORMULAS

To extend the definition from non-negated atomic formulas to negated ones, one only needs to compute the continuous truth value of a negated atomic formula with the fuzzy logic negator (Hájek, 2013), that is, ψ(h, ¬r, t) = 1 - ψ(h, r, t). The estimation of head and tail embeddings then corresponds to the following inference problems:

ĥ = ρ(t, r, t2h, 1) := argmax_{x ∈ D} ψ(x, ¬r, t) = argmax_{x ∈ D} [1 - ψ(x, r, t)],    (7)
t̂ = ρ(h, r, h2t, 1) := argmax_{x ∈ D} ψ(h, ¬r, x) = argmax_{x ∈ D} [1 - ψ(h, r, x)].    (8)

This optimization-based formulation is similar to CQD discussed in Section 4.3, but it is more reliable because atomic formulas are exactly what the knowledge graph representation is trained on. Specifically, the objectives in Eq. (5) and Eq. (6) are essentially the likelihood of positive samples, and those in Eq. (7) and Eq. (8) are the likelihood of negative samples; such objectives are widely used to learn representations with negative sampling.

Closed-form message encoding function ρ with pretrained KG representations. We have defined ρ through optimization problems. Moreover, a closed-form expression of ρ can be (approximately) derived in many cases, given two facts about knowledge graph representations: (1) the scoring function is usually as simple as an inner product or a distance (more details about constructing closed-form ρ for these two types of scoring functions are given in Appendix D.1); (2) the sigmoid function outside the scoring function ϕ pushes the final truth value to zero or one whenever the score is sufficiently small or large. We give the closed-form approximation of ρ for ComplEx (Trouillon et al., 2016) and five other KG representations in Appendices C and D, which allows fast computation of the logical messages used in Section 6.
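Under the ComplEx closed form derived in Appendix C (with the regularization-dependent scale set to 1, as done in the paper), the message encoder reduces to a one-line computation. The sketch below uses PyTorch complex tensors and is an illustrative rendering, not the released code.

```python
import torch

def complex_logical_message(nbr_emb, rel_emb, direction, negated):
    """Closed-form logical message for ComplEx (see Appendix C).
    `nbr_emb` and `rel_emb` are complex tensors; the scale factor from the
    N3 regularizer is taken to be 1."""
    if direction == "t2h":            # infer head from tail: conj(r) * t
        msg = torch.conj(rel_emb) * nbr_emb
    else:                             # "h2t": infer tail from head: r * h
        msg = rel_emb * nbr_emb
    return -msg if negated else msg   # the fuzzy negator flips the sign

# toy usage
r = torch.randn(8, dtype=torch.cfloat)
t = torch.randn(8, dtype=torch.cfloat)
head_msg = complex_logical_message(t, r, "t2h", negated=False)
```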

6. LOGICAL MESSAGE PASSING NEURAL NETWORKS

In this section, we propose a Logical Message Passing Neural Network (LMPNN) to bridge the one-hop inference proposed in Section 5 and complex query answering defined in Section 4. As a variation of the message-passing neural network (Gilmer et al., 2017; Xu et al., 2018) , LMPNN has two stages: (1) each node passes a message to all its neighbors; (2) each node aggregates the received messages and updates its latent embedding. Figure 2 illustrates how those two stages work. Then the final layer embedding for the free variable node can be used to predict the answer entities.

6.1. LOGICAL MESSAGE PASSING OVER THE QUERY GRAPH

We use the message encoding function ρ to compute the messages passed from node to node. Figure 2 (a) demonstrates the logical message passing with blue arrows. Each node receives the message from all its neighboring nodes.

6.2. NODE EMBEDDINGS IN QUERY GRAPH AND UPDATING SCHEME

Let n be a node and z_n^(l) be the embedding of n at the l-th layer. We discuss how to compute z_n^(l) from the input layer l = 0 to the latent layers l > 0. When l = 0, z_n^(0) falls into one of three cases: (1) for an entity node e, z_e^(0) is looked up from the pretrained knowledge graph representation; (2) for an existential variable node x_i, we assign a learnable embedding z_{x_i}^(0) = v_x; (3) for the free variable node y, we assign another learnable embedding z_y^(0) = v_y. For simplicity, all existential variables x_i share one v_x.

At the l-th layer, z_n^(l) is computed by updating the aggregated information from the (l-1)-th layer. Specifically, let N(n) be the neighbor set of node n in the query graph. For each neighbor node v ∈ N(n), one can obtain its embedding z_v^(l-1) ∈ D, the relation r_{v→n} ∈ R, the direction D_{v→n} ∈ {h2t, t2h}, and the negation indicator Neg_{v→n} ∈ {0, 1}. Then the embedding z_n^(l), l ≥ 1, is computed by an MLP network applied to the sum of the aggregated information, that is,

z_n^(l) = MLP^(l)( ϵ z_n^(l-1) + Σ_{v ∈ N(n)} ρ(z_v^(l-1), r_{v→n}, D_{v→n}, Neg_{v→n}) ),    (9)

where ϵ is a hyperparameter. To feed the complex vectors of ComplEx (Trouillon et al., 2016) into the MLP network, the real and imaginary parts of a complex embedding are concatenated and treated as one feature vector. The formulation in Eq. (9) is similar to Graph Isomorphism Networks (Xu et al., 2018), except that the passed logical messages are encoded by ρ from the pretrained KG representation. The trainable v_x and v_y are unrelated to any specific entity.
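A compact PyTorch sketch of the update in Eq. (9) is given below. The message encoder rho is assumed to be a closed-form encoder such as the ComplEx messages above (returning real-valued feature vectors after concatenating real and imaginary parts); the two-layer MLP and the default ϵ = 0.1 are illustrative choices rather than the released configuration.

```python
import torch
import torch.nn as nn

class LMPNNLayer(nn.Module):
    """Sketch of the node update in Eq. (9)."""
    def __init__(self, dim, eps=0.1):
        super().__init__()
        self.eps = eps
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, z, routes, rel_emb, rho):
        # z:      (num_nodes, dim) node embeddings at layer l-1
        # routes: directed message routes (src, dst, rel_id, direction, negated),
        #         two per query-graph edge (one for each receiving endpoint)
        # rho:    closed-form message encoder returning real vectors of size dim
        out = self.eps * z
        for src, dst, rel_id, direction, negated in routes:
            msg = rho(z[src], rel_emb[rel_id], direction, negated)
            out = out.index_add(0, torch.tensor([dst]), msg.unsqueeze(0))
        return self.mlp(out)
```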

6.3. LEARNING LMPNN FOR COMPLEX QUERY ANSWERING

To train the neural network, we apply the Noise Contrastive Estimation (NCE) loss for ranking tasks proposed in Ma & Collins (2018). Let {(a_i, q_i)}_{i=1}^{n} be the positive data samples, where a_i ∈ A[q_i, KG]. Our optimization involves K noisy answers uniformly sampled from the entity set. The NCE objective is

L_NCE = - (1/n) Σ_{i=1}^{n} log [ exp(cos(a_i, z(q_i))/T) / ( exp(cos(a_i, z(q_i))/T) + Σ_{k=1}^{K} exp(cos(z_k, z(q_i))/T) ) ],

where a_i denotes the embedding of the positive answer a_i, z_k denotes the embedding of the k-th noisy entity sample, z(q_i) denotes the embedding of the free variable of q_i at the final layer of LMPNN, and T is a temperature hyperparameter. This loss is minimized by stochastic gradient descent.
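The following sketch implements this objective with cosine similarities and a temperature T. Shapes and the T = 0.05 default (one of the settings reported in the ablation) are illustrative.

```python
import torch
import torch.nn.functional as F

def nce_loss(answer_emb, noise_emb, query_emb, temperature=0.05):
    """NCE ranking loss (Ma & Collins, 2018) as used above.
    answer_emb: (B, d)    embeddings of positive answers a_i
    noise_emb:  (B, K, d) embeddings of K uniformly sampled noise entities
    query_emb:  (B, d)    free-variable embeddings z(q_i) from the last layer"""
    pos = F.cosine_similarity(answer_emb, query_emb, dim=-1) / temperature               # (B,)
    neg = F.cosine_similarity(noise_emb, query_emb.unsqueeze(1), dim=-1) / temperature   # (B, K)
    logits = torch.cat([pos.unsqueeze(1), neg], dim=1)                                   # (B, 1+K)
    # negative log-probability of the positive against {positive} U {noise}
    return -torch.log_softmax(logits, dim=1)[:, 0].mean()
```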

6.4. ANSWERING COMPLEX QUERIES WITH LMPNN

We discuss two ways to retrieve answers for general DNF queries in Definition 1. (1) A two-step approach as in previous works (Ren et al., 2020; Ren & Leskovec, 2020): the free variable embedding of each sub conjunctive query is estimated, and the answer entities are then ranked by the minimal distance (or maximal similarity) against the free variable embeddings of the multiple sub conjunctive queries. (2) We transform all disjunctions in the formula into conjunctions, so that one query graph is sufficient for solving the transformed query; the answer set of a transformed query is a subset of the original answer set. For simplicity, we use the second way to solve disjunctive queries in this paper, though it may lead to sub-optimal performance. Next, we discuss how to answer conjunctive queries with LMPNN.

Conjunctive query graphs of arbitrary size. We apply LMPNN to the query graph of a given conjunctive query Q. A sufficient condition for producing a correct answer is that the free variable node has received messages from all the entity nodes after the forward pass through the LMPNN layers. Let the largest distance between the entity nodes and the free variable node be L. Then, we apply the LMPNN layers L times to ensure that all messages from entity nodes are received by the free variable node. The predicted answer embedding z(Q) is the free variable embedding at the final layer, i.e., z(Q) = z_y^(L). We use the cosine similarity between z(Q) and the pretrained entity embeddings to rank the entities and retrieve answers. Since L varies across query types, we let all L layers share the same MLP network. Hence, the only trainable parameters in LMPNN are one MLP network and two embeddings for the existential and free variables. Our experiments on different query types show that the single MLP network generalizes well to LMPNNs of different depths.
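The number of layers L can be computed by a breadth-first search from the free variable node, as in the sketch below. It reuses the illustrative node/edge schema from Section 4.1 and is not part of the released implementation.

```python
from collections import deque

def lmpnn_depth(query_graph):
    """L = largest graph distance between the free variable node and any
    constant entity node, using the schema sketched in Section 4.1."""
    adj = {}
    for h, _, t, _ in query_graph["edges"]:
        adj.setdefault(h, []).append(t)
        adj.setdefault(t, []).append(h)
    free = next(n for n, a in query_graph["nodes"].items() if a["type"] == "free")
    dist, queue = {free: 0}, deque([free])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(d for n, d in dist.items()
               if query_graph["nodes"][n]["type"] == "constant")

# For the INP graph sketched in Section 4.1 (free y, existential x,
# constants c1 and c2), lmpnn_depth returns 2.
```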

7. EXPERIMENTS

In this section, we compare LMPNN with existing neural CQA methods and justify the important features of LMPNN with ablation studies. Our results show that LMPNN is a very strong method for answering complex queries.

7.1. EXPERIMENTAL SETTINGS

Baselines. We consider the neural complex query answering models for EFO-1 queries from the past three years, including BetaE (Ren & Leskovec, 2020), ConE (Zhang et al., 2021), and Q2P (Bai et al., 2022). The baseline results are obtained by training the models with the code released by the authors under the suggested hyperparameters. Comparisons with neural-symbolic ensemble models are deferred to Appendix F. Moreover, we also implement and report CQD (Arakelyan et al., 2021) with the same pretrained knowledge graph representation. We compare against more neural CQA models in Appendix E.

Datasets. We consider the widely used training and evaluation datasets from Ren & Leskovec (2020), which allows us to compare our results with existing methods directly. We compare the results on FB15k (Bordes et al., 2013), FB15k-237 (Toutanova et al., 2015), and NELL (Carlson et al., 2010).

Evaluation. The evaluation metric follows previous works (Ren & Leskovec, 2020). For each query instance, we first rank all entities, except those observed as easy answers, by their cosine similarity with the free variable embedding estimated by LMPNN. The rankings of the hard answers are used to compute MRR for the given query instance, and the metrics are then averaged over all query instances. In this paper, MRR is reported and compared.

LMPNN setting. We use the ComplEx (Trouillon et al., 2016) embedding pretrained in Arakelyan et al. (2021) as the backbone KG representation.

To justify the effect of one-hop inference, we compare against a baseline whose logical messages are computed by a linear transformation of the concatenation of the entity embedding, the relation embedding, a binary indicator for h2t versus t2h, and a binary indicator for negation. For example, for a ComplEx embedding in a 1,000-dimensional complex vector space, there are 2,000 parameters for the entity embedding and 2,000 for the relation embedding; the concatenation produces a feature of 4,002 dimensions, which a linear transformation maps to 2,000 dimensions so that the logical message can be used in Eq. (9). This baseline is denoted as KGE CAT. To justify the effect of the depth of LMPNN, we alter the depth from its original value L to L-1, L+1, L+2, and L+3, where L is the maximal distance between the free variable node and the constant entity nodes in the query graph. Even in the L-1 case, we keep the depth of LMPNN at least 1 to ensure that logical messages are passed between nodes.

Table 2 shows the results of the ablation study, where the setting reported in Table 1 is indicated by BEST CHOICE. BEST CHOICE uses one-hop inference on atomic formulas, L LMPNN layers, ϵ = 0.1, and T = 0.05 for FB15k-237. We find that KGE CAT performs poorly even though it contains the pretrained KG information, which indicates that one-hop inference is essential to answering complex queries. Meanwhile, L-1 performs worse than BEST CHOICE because information is not fully passed to the free variable node, and the worse performance of the L+1, L+2, and L+3 cases indicates that our definition of L is reasonable. Moreover, ϵ and T also matter for reaching the best performance. Overall, one-hop inference on atomic formulas is the most critical factor in the learning and inference process of LMPNN.

8. CONCLUSION

In this paper, we present LMPNN to answer complex queries, especially EFO-1 queries, over knowledge graphs. LMPNN achieves strong performance by training one MLP network to aggregate the logical messages passed over the query graph. In the ablation study, we identify that one-hop inference on atomic formulas based on a pretrained knowledge graph representation is critical to answering complex queries. Our research effectively bridges the gap between EFO-1 query answering tasks and the long-standing achievements of knowledge graph representation learning. In future work, our method can be combined with stronger knowledge graph representation techniques, as well as with neural-symbolic ensembles.

A A COUNTEREXAMPLE FOR THE EXPRESSIVENESS OF OPERATOR TREE REPRESENTATION

Example 1. Given a citation network with authors, papers, and conferences, one query asks for ICLR authors with at least one collaborator. It can be expressed in the format of Definition 1 as

q(a_1) = ∃a_2 ∃p_1 ∃p_2 [IsAuthor(a_1, p_1) ∧ InConf(p_1, ICLR) ∧ IsAuthor(a_1, p_2) ∧ IsAuthor(a_2, p_2) ∧ ¬(a_1 = a_2)].

If we take a_1 as the answer node, then a_2 and p_2 are leaves but not anchor entities. Therefore, this query cannot be represented by an operator tree, whose leaves must be anchor nodes. However, it can be represented as a query graph; see Figure 3.

We now discuss how to answer this query with LMPNN. LMPNN can be applied to the query graph in Figure 3 once ≠ is treated as the combination of the equality predicate eq and the negation ¬. To include eq, we only need to define the logical messages ρ(a_1, eq, 1) and ρ(a_2, eq, 1). According to Proposition 2 in Appendix D.1, these two problems boil down to defining ρ(a, eq, 0) = f(a, eq). By the semantics of equality, equal terms share the same embedding, so the entity embedding equal to a given embedding a is simply f(a, eq) = a. Then ρ(a, eq, 0) = f(a, eq) = a, and ρ(a_1, eq, 1) = -a_1, ρ(a_2, eq, 1) = -a_2. In this way, LMPNN is the first actionable approach to address queries such as Example 1.

B A NATURAL EXTENSION OF COMPLEX QUERY DECOMPOSITION (CQD) TO ANSWER NEGATION QUERIES

In this paper, we compare against the optimization-based approach CQD (Arakelyan et al., 2021) by extending it with a fuzzy logic negator. The extended version is denoted as CQD(E). For example, consider the INP query in Figure 1. We estimate the continuous truth value of the logical formula r_1(x, c_1) ∧ ¬r_2(c_2, x) ∧ r_3(y, x) as

TV_CQD(E)(x, y | INP) = ψ_{r_1}(x, c_1) ⊤ [1 - ψ_{r_2}(c_2, x)] ⊤ ψ_{r_3}(y, x),

where ψ_{r_i} denotes the continuous truth value of relation r_i computed from the KG embedding and ⊤ is a t-norm. CQD(E) then maximizes the continuous truth value TV_CQD(E)(x, y | INP) to obtain the "best" variable embeddings x and y, following Arakelyan et al. (2021).
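A minimal sketch of CQD(E) on the INP query is shown below: the variable embeddings x and y are optimized by gradient ascent on the product-t-norm truth value with the fuzzy negator. The psi_* callables, dimensions, and optimizer settings are placeholders, not the original CQD code.

```python
import torch

def cqde_inp(psi_r1, psi_r2, psi_r3, c1_emb, c2_emb, dim, steps=200, lr=0.1):
    """Optimize x, y to maximize psi_r1(x, c1) * (1 - psi_r2(c2, x)) * psi_r3(y, x).
    Each psi_* callable returns a truth value in [0, 1] from the KG embedding."""
    x = torch.zeros(dim, requires_grad=True)
    y = torch.zeros(dim, requires_grad=True)
    opt = torch.optim.Adam([x, y], lr=lr)
    for _ in range(steps):
        tv = psi_r1(x, c1_emb) * (1.0 - psi_r2(c2_emb, x)) * psi_r3(y, x)
        loss = -tv                      # gradient ascent on the truth value
        opt.zero_grad(); loss.backward(); opt.step()
    return x.detach(), y.detach(), tv.item()
```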

B.1 NON-CONVEX LANDSCAPE OF NEGATED COMPLEX QUERIES

In this part, we show that the negator in fuzzy logic introduces non-convexity. Let x be an optimizable variable in a 1D interval I, and let ϕ_1(x) and ϕ_2(x) be the continuous truth values of two atomic formulas a_1 and a_2, respectively, both concave over I, so that maximizing each atom alone is well behaved. Consider the conjunctive query a_1 ∧ ¬a_2, whose continuous truth value is J(x) = ϕ_1(x) ⊤ [1 - ϕ_2(x)]. Take the product t-norm and ϕ_1(x) = 1 - x^2, ϕ_2(x) = 1 - (x - 0.3)^2 for x ∈ [0, 1]. Then J(x) = (1 - x^2)(x - 0.3)^2 is neither convex nor concave and has multiple local maxima, as shown in Figure 4.
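The mixed curvature can be checked numerically. The short script below evaluates J(x) = (1 - x^2)(x - 0.3)^2 on [0, 1] and verifies that its second derivative changes sign (the corresponding curve is plotted in Figure 4).

```python
import numpy as np

# Truth values of the two atoms (both concave on [0, 1]):
#   phi1(x) = 1 - x**2,   phi2(x) = 1 - (x - 0.3)**2
# Product t-norm with the fuzzy negator gives
#   J(x) = phi1(x) * (1 - phi2(x)) = (1 - x**2) * (x - 0.3)**2.
x = np.linspace(0.0, 1.0, 1001)
J = (1.0 - x**2) * (x - 0.3) ** 2
d2J = np.gradient(np.gradient(J, x), x)      # finite-difference second derivative
print(d2J.min() < 0 < d2J.max())             # True: curvature changes sign
```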

C CLOSED-FORM LOGICAL MESSAGE BY COMPLEX

In this section, we derive the closed-form logical message encoding function for the ComplEx embedding (Trouillon et al., 2016). The scoring function of ComplEx is ϕ(h, r, t) = Re(⟨h ⊗ r, t⟩). We expand the complex embeddings into real vectors h = h_r + i h_i, r = r_r + i r_i, t = t_r + i t_i. Then the scoring function is

ϕ(h, r, t) = Re(⟨h ⊗ r, t⟩)    (14)
= ⟨r_r ⊗ h_r - r_i ⊗ h_i, t_r⟩ + ⟨r_r ⊗ h_i + r_i ⊗ h_r, t_i⟩    (15)
= ⟨r_r ⊗ t_r + r_i ⊗ t_i, h_r⟩ + ⟨r_r ⊗ t_i - r_i ⊗ t_r, h_i⟩.    (16)

Since the imaginary part of the complex conjugate r̄ is r̄_i = -r_i, we have

ϕ(h, r, t) = ⟨r_r ⊗ t_r - r̄_i ⊗ t_i, h_r⟩ + ⟨r_r ⊗ t_i + r̄_i ⊗ t_r, h_i⟩    (17)
= Re(⟨t ⊗ r̄, h⟩).    (18)

Then, we optimize the continuous truth value of ComplEx given in Eq. (2) to derive the closed-form estimation of Eq. (5). We note that the embeddings used in ComplEx are not strictly restricted to a domain set D. Instead, the N3 regularization (Lacroix et al., 2018) is applied to the embeddings as a soft constraint. Therefore, in our derivation of the closed-form solution, we also employ N3 regularization rather than a hard constraint. Our first result is the following proposition.

Proposition 1. For the ComplEx embedding, the logical message encoding function has the following closed form with respect to the complex embeddings r and t:

ρ(t, r, t2h, 0) = (r̄ ⊗ t) / √(3λ ‖r̄ ⊗ t‖).    (19)

Proof. We expand the optimization problem as follows:

ρ(t, r, t2h, 0) = argmax_{x ∈ C^d} Re(⟨r̄ ⊗ t, x⟩) - λ‖x‖^3    (20)
= argmax_{x ∈ C^d} ⟨r_r ⊗ t_r - r̄_i ⊗ t_i, x_r⟩ + ⟨r_r ⊗ t_i + r̄_i ⊗ t_r, x_i⟩ - λ (⟨x_r, x_r⟩ + ⟨x_i, x_i⟩)^{3/2}.    (21)

Notice that r_r ⊗ t_r - r̄_i ⊗ t_i and r_r ⊗ t_i + r̄_i ⊗ t_r are the real and imaginary parts of r̄ ⊗ t. Let s = [r_r ⊗ t_r - r̄_i ⊗ t_i, r_r ⊗ t_i + r̄_i ⊗ t_r] be the real vector concatenating the real and imaginary parts of r̄ ⊗ t, and let x̃ = [x_r, x_i] be the real vector concatenating the real and imaginary parts of x. Then Eq. (21) is equivalent to the following optimization problem in the real space:

max_{x̃ ∈ R^{2d}} ⟨s, x̃⟩ - λ‖x̃‖_2^3 =: J.    (22)

We note that J is a concave function of x̃. To optimize x̃, we optimize the unit direction v and the length η of x̃ by rewriting x̃ = ηv. When η is fixed, the second term is also fixed, and it is easy to see that v* = s/‖s‖_2 maximizes the first term. Then we find the optimal η by maximizing the following objective for η > 0:

J = ‖s‖_2 η - λη^3.    (23)

By setting dJ/dη = ‖s‖_2 - 3λη^2 = 0, we derive the optimal η* = √(‖s‖_2 / (3λ)). Then we have

x̃* = η* v* = s / √(3λ‖s‖_2).    (24)

We then identify the optimal real and imaginary parts of x* from x̃*, and thus recover the optimal x*, which proves Eq. (19).

Similarly, we derive the optimal closed-form expressions of ρ in all other cases:

ρ(h, r, h2t, 0) = (r ⊗ h) / √(3λ‖r ⊗ h‖),   ρ(t, r, t2h, 1) = -(r̄ ⊗ t) / √(3λ‖r̄ ⊗ t‖),   ρ(h, r, h2t, 1) = -(r ⊗ h) / √(3λ‖r ⊗ h‖).

We note that the value of λ is not determined. It could of course be treated as a hyperparameter. In the LMPNN application, we simply let 3λ‖·‖ = 1, so that all denominators in the closed-form expressions are 1.
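As a quick numerical sanity check of the stationarity condition used in the proof (dimension and λ are arbitrary), the closed-form maximizer makes the gradient of ⟨s, x̃⟩ - λ‖x̃‖_2^3 vanish and dominates random perturbations:

```python
import numpy as np

rng = np.random.default_rng(0)
d, lam = 16, 0.1
s = rng.normal(size=2 * d)       # real/imaginary parts of conj(r) * t, concatenated

# Closed-form maximizer from the proof: x* = s / sqrt(3 * lam * ||s||)
x_star = s / np.sqrt(3 * lam * np.linalg.norm(s))

def objective(x):
    return s @ x - lam * np.linalg.norm(x) ** 3

# Gradient of the objective at x*: s - 3*lam*||x*||*x* should vanish.
grad = s - 3 * lam * np.linalg.norm(x_star) * x_star
print(np.allclose(grad, 0.0, atol=1e-8))     # True: stationary point
print(all(objective(x_star) >= objective(x_star + 1e-2 * rng.normal(size=2 * d))
          for _ in range(100)))              # True: x* beats random perturbations
```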

D CLOSED-FORM LOGICAL MESSAGES FOR KG REPRESENTATIONS

We demonstrate two general ways to construct the closed-form logical message function ρ for LMPNN in Appendix D.1. Then, we give six examples to illustrate how our approach constructs the closed-form ρ for various KG representations in Appendix D.2. Specifically, our constructions apply to two types of KG representations characterized by their scoring functions: the first type uses inner-product-based scoring functions, while the second type uses distance-based scoring functions. The six example KG representations are RESCAL (Nickel et al., 2011), TransE (Bordes et al., 2013), DistMult (Yang et al., 2014), ComplEx (Trouillon et al., 2016), ConvE (Dettmers et al., 2018), and RotatE (Sun et al., 2018).

D.1 TWO CONSTRUCTIONS

As discussed in Section 5, the closed-form logical message encoding function ρ results from closed-form solutions of four one-hop inference problems (estimating the head or tail entity embedding, with or without logical negation; see Eqs. (5)-(8)). This leads to four construction tasks. The main result of Appendix D.1 is Proposition 2. It shows that, with our constructions for two types of scoring functions, the closed-form solutions of the four one-hop inference problems are mutually dependent: once one of them is approximately solved in closed form, the other three are also solved approximately in closed form.

Simplification with reciprocal relations. We simplify the four construction tasks into two by introducing reciprocal relations. For each relation r ∈ R, the reciprocal relation r^{-1} ∈ R^{-1} denotes the same relation in the reversed direction. By introducing reciprocal relations r^{-1} and training their embeddings r^{-1}, one-hop inference in the tail-to-head direction can be rewritten in the head-to-tail direction. Specifically, we have

ρ(t, r, t2h, 0) = ρ(t, r^{-1}, h2t, 0),    (28)
ρ(t, r, t2h, 1) = ρ(t, r^{-1}, h2t, 1).    (29)

Introducing reciprocal relations has been shown to improve the performance of link prediction tasks (Ruffinelli et al., 2020). We assume that the reciprocal relation embedding can be obtained, irrespective of whether it is separately trained or analytically derived from the original relation embedding, as for ComplEx discussed in Appendix C. It then suffices to construct closed-form solutions for ρ(h, r, h2t, 0) and ρ(h, r, h2t, 1); the remaining two types of logical messages follow naturally with the reciprocal relation embeddings.

We now construct the closed-form ρ(h, r, h2t, 0) and ρ(h, r, h2t, 1) for two types of KG embeddings, characterized by their scoring functions. We emphasize that the derivations below are only approximate estimations that keep the closed-form expressions as simple as possible. Empirical results show that these simple approximate closed-form solutions can already be used in LMPNN.

Type 1: inner-product-based scoring function. The inner-product-based scoring function for a triple of embeddings (h, r, t) is ⟨f(h, r), t⟩, where f is a binary function of the entity and relation embeddings and ⟨·, ·⟩ is the inner product, defined in a real or complex vector space. This scoring function is used in RESCAL (Nickel et al., 2011), DistMult (Yang et al., 2014), ComplEx (Trouillon et al., 2016), ConvE (Dettmers et al., 2018), etc. When optimizing the embeddings, l_2^q regularizations (q = 2, 3) are usually applied (Ruffinelli et al., 2020). We then consider the following optimization problem:

ρ(h, r, h2t, 0) = argmax_x σ(⟨f(h, r), x⟩) - λ‖x‖_2^q =: J_1,

where the hyperparameter λ > 0 is a regularization coefficient and σ is the sigmoid function. We note that J_1 is just the Lagrangian of the following maximization problem, with λ the Lagrange multiplier:

max_{‖x‖_2^q < δ} σ(⟨f(h, r), x⟩),

where x is restricted inside a δ^{1/q}-ball. We can then conclude that

argmax_{‖x‖_2^q < δ} σ(⟨f(h, r), x⟩) = argmax_{‖x‖_2^q < δ} ⟨f(h, r), x⟩ = δ^{1/q} f(h, r) / ‖f(h, r)‖_2.

By choosing the hyperparameter δ = ‖f(h, r)‖_2^q, we derive a simple result:

argmax_x σ(⟨f(h, r), x⟩) - λ‖x‖_2^q ≈ f(h, r).

Therefore, we define ρ(h, r, h2t, 0) := f(h, r).

Similarly, for ρ(h, r, h2t, 1), we have

ρ(h, r, h2t, 1) = argmax_x [1 - σ(⟨f(h, r), x⟩)] - λ‖x‖_2^q    (35)
= argmax_x σ(⟨-f(h, r), x⟩) - λ‖x‖_2^q    (36)
≈ argmax_{‖x‖_2^q < δ} ⟨-f(h, r), x⟩.

We conclude the closed-form solution as ρ(h, r, h2t, 1) := -f(h, r).

Type 2: distance-based scoring function. Another type of scoring function for a triple of embeddings (h, r, t) is γ - ‖f(h, r) - t‖, where f follows the definition above and γ is a margin. This scoring function is used in TransE (Bordes et al., 2013), RotatE (Sun et al., 2018), etc. Similarly, the ‖x‖_2^q regularization can also be considered (Ruffinelli et al., 2020). ρ(h, r, h2t, 0) can be computed by

ρ(h, r, h2t, 0) = argmax_x σ(γ - ‖f(h, r) - x‖) - λ‖x‖_2^q.

With a similar trick, we transform the "soft" regularization into a "hard" constraint:

argmax_x σ(γ - ‖f(h, r) - x‖) - λ‖x‖_2^q ≈ argmax_{‖x‖_2^q < δ} [γ - ‖f(h, r) - x‖],

where δ is another hyperparameter. We set δ > ‖f(h, r)‖_2^q, so the optimal solution is f(h, r), which gives ρ(h, r, h2t, 0) := f(h, r). For the negated head-to-tail direction, the one-hop inference problem is

argmax_{‖x‖_2^q < δ} [1 - σ(γ - ‖f(h, r) - x‖)] = argmax_{‖x‖_2^q < δ} ‖f(h, r) - x‖ = -δ^{1/q} f(h, r) / ‖f(h, r)‖_2.

For simplicity, we choose ρ(h, r, h2t, 1) := -f(h, r).

Our constructions for the two types of KG representations share a unified closed-form logical message once the function f(h, r) is given. In the following, f is called the "forward" estimation function since it estimates the tail embedding from the head and relation embeddings in the forward direction. We summarize the four types of logical messages used in LMPNN in the following proposition.

Proposition 2. For a KG representation of either Type 1 or Type 2, we can define the four closed-form logical message encoding functions with (1) the relation embedding r and the corresponding reciprocal relation embedding r^{-1} and (2) the forward estimation function f, as follows:

ρ(h, r, h2t, 0) = f(h, r),    (45)
ρ(h, r, h2t, 1) = -f(h, r),    (46)
ρ(t, r, t2h, 0) = f(t, r^{-1}),    (47)
ρ(t, r, t2h, 1) = -f(t, r^{-1}).    (48)
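Proposition 2 can be turned into a single dispatch function once f and the reciprocal relation embeddings are available. The sketch below is illustrative, with two example forward functions.

```python
def logical_message(f, nbr_emb, rel_emb, rel_inv_emb, direction, negated):
    """Unified closed-form messages from Proposition 2.  `f(h, r)` is the
    forward estimation function of the chosen KG representation (Table 3),
    `rel_inv_emb` is the embedding of the reciprocal relation r^{-1}."""
    if direction == "h2t":
        msg = f(nbr_emb, rel_emb)        # estimate the tail from the head
    else:                                # "t2h": rewrite via the reciprocal relation
        msg = f(nbr_emb, rel_inv_emb)
    return -msg if negated else msg

# Example forward functions for the two representation types (illustrative):
transe_f   = lambda h, r: h + r          # distance-based:  gamma - ||f(h, r) - t||
distmult_f = lambda h, r: h * r          # inner-product:   <f(h, r), t>
```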

D.2 SIX KG REPRESENTATION EXAMPLES

We are now ready to apply Proposition 2 to six KG representations. For each KG representation, it is important to state its scoring function for a triple (h, r, t) and its relation parameterization. We assume the reciprocal relation embeddings are already trained. Table 3 summarizes this information for RESCAL (Nickel et al., 2011), TransE (Bordes et al., 2013), DistMult (Yang et al., 2014), ComplEx (Trouillon et al., 2016), ConvE (Dettmers et al., 2018), and RotatE (Sun et al., 2018). We list the relation parameters r, the essential one-hop inference function ρ(h, r, h2t, 0) = f(h, r), and the scoring function for each KG representation. The scoring function of ComplEx (Trouillon et al., 2016) is not exactly an inner product in the complex vector space, but it can be reduced to an inner product in the real vector space, as already discussed in Appendix C. We see that Proposition 2 and Table 3 cover the results in Appendix C by letting the reciprocal embedding of r^{-1} be the complex conjugate r̄ of the original embedding r.



[1] Existing empirical evaluations are all conducted on queries without negation (Arakelyan et al., 2021).



Figure 2: An illustration of the two-stage procedures of logical message passing neural networks: (a) passing the logical messages across the graph; (b) updating the node embedding with the aggregated information with an MLP network.

Figure 3: The query graph for the query in Example 1.

Figure 4: The landscape of continuous truth value becomes non-convex after negation.

Table 1: MRR results of different CQA models on three KGs. A_P denotes the average score on EPFO queries and A_N the average score on queries with negation. Boldface indicates the best result for each KG.

Table 2: MRR results of different hyperparameter settings compared to the best combination.

Interestingly, LMPNN performs much better than CQD on both EPFO and negation queries with the same pretrained knowledge graph representation. Our results show that LMPNN is stronger than CQD on more complex queries, especially those with logical negation. Notably, our approach does not require any optimization at inference time, unlike Arakelyan et al. (2021). This confirms again that LMPNN successfully leverages the representation power of the knowledge graph representation simply by training an MLP.

Table 8: Benchmark comparison with neural CQA models on NELL queries.

Table 9: Comparison between LMPNN and symbolic integration methods. The numbers in brackets indicate the order of magnitude of trainable parameters.

ACKNOWLEDGEMENTS

The authors of this paper were supported by the NSFC Fund (U20B2053) from the NSFC of China, the RIF (R6020-19 and R6021-20) and the GRF (16211520 and 16205322) from RGC of Hong Kong, the MHKJFS (MHP/001/19) from ITC of Hong Kong and the National Key R&D Program of China (2019YFE0198200) with special thanks to HKMAAC and CUSBLT, and the Jiangsu Province Science and Technology Collaboration Fund (BZ2021065). We also thank the support from NVIDIA AI Technology Center (NVAITC) and the UGC Research Matching Grants (RMGS20EG01-D, RMGS20CR11, RMGS20CR12, RMGS20EG19, RMGS20EG21, RMGS23CR05, RMGS23EG08).


Table 3: Closed-form forward estimation function f for six KG representations. Closed-form logical message encoding functions ρ can be easily constructed from the closed-form f.

KG Embedding | r parameters | f(h, r) | Scoring function
RESCAL (Nickel et al., 2011) | W_r | W_r h | ⟨f(h, r), t⟩
TransE (Bordes et al., 2013) | r | r + h | γ - ‖f(h, r) - t‖
DistMult (Yang et al., 2014) | r | r ⊗ h | ⟨f(h, r), t⟩
ComplEx (Trouillon et al., 2016) | r | r ⊗ h | Re⟨f(h, r), t⟩
ConvE (Dettmers et al., 2018) | ω, W | ReLU(vec(ReLU([e_h; e_r] * ω)) W) | ⟨f(h, r), t⟩
RotatE (Sun et al., 2018) | cos θ + i sin θ | (cos θ + i sin θ) ⊗ h | γ - ‖f(h, r) - t‖

The performances of LMPNN with the six backbone KG representations are presented in Table 5. LMPNN is trained with the settings suggested in Section 7.3. The pretrained checkpoints of the six backbone KG representations are obtained from Ruffinelli et al. (2020), and the link prediction performance of each KG representation is listed in Table 4. We find that LMPNN achieves decent performance with simple KG backbones of relatively low dimensions (128D and 256D). ConvE (256D) (Dettmers et al., 2018), DistMult (256D) (Yang et al., 2014), and ComplEx (256D) (Trouillon et al., 2016) outperform BetaE (800D) on both EPFO and negation queries. All KG representations except TransE (128D) (Bordes et al., 2013) outperform BetaE (800D) (Ren & Leskovec, 2020) on negation queries. We note that adjusting the hyperparameters, e.g., the embedding dimensions, to obtain more powerful KG representations could further improve the results; however, this is beyond the scope of this paper.
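To make the forward estimation functions in Table 3 concrete, the sketch below writes several of them in PyTorch (ConvE is omitted since its f depends on the convolution parameters ω and W). The margin γ and all shapes are illustrative.

```python
import torch

# Forward estimation functions f(h, r) from Table 3 (sketch).
FORWARD_FN = {
    "RESCAL":   lambda h, W_r: W_r @ h,        # relation parameterized as a matrix
    "TransE":   lambda h, r: h + r,
    "DistMult": lambda h, r: r * h,
    "ComplEx":  lambda h, r: r * h,            # complex tensors
    "RotatE":   lambda h, theta: torch.polar(torch.ones_like(theta), theta) * h,
}

# Inner-product scoring, e.g. DistMult: <f(h, r), t>
score_distmult = lambda h, r, t: (FORWARD_FN["DistMult"](h, r) * t).sum(-1)
# Distance scoring, e.g. TransE: gamma - ||f(h, r) - t||, with an illustrative margin
score_transe = lambda h, r, t, gamma=12.0: gamma - torch.norm(FORWARD_FN["TransE"](h, r) - t, dim=-1)
```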

E NEURAL CQA BENCHMARK

In this section, we show that LMPNN (with ComplEx 2000D pretrained by Arakelyan et al. (2021)) is the new state-of-the-art method among all neural CQA models. We include the neural CQA baselines that can address EFO-1 queries; models that cannot answer EFO-1 queries are not compared (Ren et al., 2020; Choudhary et al., 2021; Liu et al., 2022). We tried to reproduce the reported results, and we note that different models apply to different knowledge graphs. These baselines include BetaE (Ren & Leskovec, 2020). The results on FB15k-237, FB15k, and NELL are shown in Table 6, Table 7, and Table 8, respectively. We can see that LMPNN achieves the best performance among all neural complex query answering models.

F COMPARE TO SYMBOLIC INTEGRATION METHODS

Contextualized and symbolic information has been shown to be effective for improving neural models for both knowledge graph representation and complex query answering. For knowledge graph representation, neighboring information aggregated by graph neural networks over the KG (Schlichtkrull et al., 2018; Wang et al., 2019; 2021a; Zhu et al., 2021), external information from annotations (Xie et al., 2016a;b), and even information from language models (Petroni et al., 2019; Liu et al., 2020) are leveraged to make the knowledge graph representation more informative and effective. For complex query answering, neural models are enhanced with symbolic reasoning (Zhu et al., 2022; Xu et al., 2022) that heavily searches over the original symbolic space (Zhu et al., 2022) or its approximations (Cohen et al., 2020; Xu et al., 2022). Unlike neural CQA models, whose operations are always in an embedding space of fixed size, the size of the intermediate states for symbolic reasoning grows with the number of entities, such as the fuzzy sets used in (Zhu et al., 2022; Xu et al., 2022) and the beam-search variation of CQD (Arakelyan et al., 2021).

We refer to two methods with symbolic integration. We cannot reproduce their results since the code for these two methods has not been released; however, since symbolic integration could also be applied to improve LMPNN, we list their results to show the potential.

GNN-QE (Zhu et al., 2022): training requires 4 V100 GPUs (32GB), which is 8 times larger than the resources required by LMPNN. The official implementation has not been released.

ENeSy (Xu et al., 2022): the official implementation has not been released.


Table 9 shows that LMPNN is comparable even with the symbolic integration models on EPFO queries, with only 1% of their trainable parameters on NELL and 10% on FB15k-237. For FB15k-237, there are still gaps between the neural CQA models and the models with symbolic integration. These results suggest that neural models can potentially be improved with symbolic integration, at the price of a larger computational cost. We also notice that the task of answering logical queries has been investigated over larger knowledge graphs (Ren et al., 2022). When considering larger knowledge graphs, neural CQA methods (discussed in Appendix E) and symbolic integration methods (discussed in this part) have different scalability: for neural CQA models, the intermediate embeddings are of fixed dimensions, while the sizes of the intermediate fuzzy sets used by the symbolic integration methods grow linearly with the size of the knowledge graph. This difference makes neural-symbolic methods more resource demanding, and they may suffer from scalability issues. The differences between NELL and FB15k-237 can be explained by the quality of the ground knowledge graphs. However, integrating symbolic methods into neural CQA models and investigating the fundamental impact of the ground KGs are beyond the scope of this paper. Our work connects KG representation and neural CQA, and it could also be combined with contextual and symbolic information. These extensions are left for future work and are expected to bring additional improvements.

