COMPLEX QUERY ANSWERING WITH NEURAL LINK PREDICTORS

Abstract

Neural link predictors are immensely useful for identifying missing edges in large scale Knowledge Graphs. However, it is still not clear how to use these models for answering more complex queries that arise in a number of domains, such as queries using logical conjunctions (∧), disjunctions (∨) and existential quantifiers (∃), while accounting for missing edges. In this work, we propose a framework for efficiently answering complex queries on incomplete Knowledge Graphs. We translate each query into an end-to-end differentiable objective, where the truth value of each atom is computed by a pre-trained neural link predictor. We then analyse two solutions to the optimisation problem: gradient-based and combinatorial search. In our experiments, the proposed approach produces more accurate results than state-of-the-art methods (black-box neural models trained on millions of generated queries) without the need to train on a large and diverse set of complex queries. Using orders of magnitude less training data, we obtain relative improvements ranging from 8% up to 40% in Hits@3 across different knowledge graphs containing factual information. Finally, we demonstrate that it is possible to explain the outcome of our model in terms of the intermediate solutions identified for each of the complex query atoms. All our source code and datasets are available online¹.

1. INTRODUCTION

Knowledge Graphs (KGs) are graph-structured knowledge bases, where knowledge about the world is stored in the form of relationships between entities. KGs are an extremely flexible and versatile knowledge representation formalism: examples include general-purpose knowledge bases such as DBpedia (Auer et al., 2007) and YAGO (Suchanek et al., 2007), domain-specific ones such as Bio2RDF (Dumontier et al., 2014) and Hetionet (Himmelstein et al., 2017) for life sciences and WordNet (Miller, 1992) for linguistics, and application-driven graphs such as the Google Knowledge Graph, Microsoft's Bing Knowledge Graph, and Facebook's Social Graph (Noy et al., 2019).

Neural link predictors (Nickel et al., 2016) tackle the problem of identifying missing edges in large KGs. However, in many complex domains, an open challenge is developing techniques for answering complex queries involving multiple and potentially unobserved edges, entities, and variables, rather than just single edges. We focus on First-Order Logical Queries that use conjunctions (∧), disjunctions (∨), and existential quantifiers (∃). A multitude of queries can be expressed using such operators. For instance, the query "Which drugs D interact with proteins associated with diseases t₁ or t₂?" can be rewritten as ?D : ∃P.interacts(D, P) ∧ [assoc(P, t₁) ∨ assoc(P, t₂)], which can be answered via sub-graph matching.

Figure 1: Examples of First-Order Logical Queries using existential quantification (∃), conjunction (∧), and disjunction (∨) operators: "Which drugs interact with proteins associated with diseases t₁ or t₂?" and "Which directors directed actors that won either an Oscar or an Emmy?". Their dependency graphs are D ← P ← {t₁, t₂} and D ← A ← {Oscar, Emmy}, respectively.

However, plain sub-graph matching cannot capture semantic similarities between entities and relations, and cannot deal with missing facts in the KG. One possible solution consists in computing all missing entries via KG completion methods (Getoor & Taskar, 2007; De Raedt, 2008; Nickel et al., 2016), but that would materialise a significantly denser KG and would have intractable space and time complexity requirements (Krompaß et al., 2014).

In this work, we propose a framework for answering First-Order Logic Queries, where the query is compiled into an end-to-end differentiable function modelling the interactions between its atoms. The truth value of each atom is computed by a neural link predictor (Nickel et al., 2016), a differentiable model that, given an atomic query, returns the likelihood that the fact it represents holds true.
We then propose two approaches for identifying the most likely values for the variable nodes in a query, either by continuous or by combinatorial optimisation. Recent work on embedding logical queries on KGs (Hamilton et al., 2018; Daza & Cochez, 2020; Ren et al., 2020) has suggested that, in order to go beyond link prediction, more elaborate architectures and a large and diverse dataset with millions of queries are required. In this work, we show that this is not the case, and demonstrate that an efficient neural link predictor trained for 1-hop query answering can generalise to up to 8 complex query structures. By doing so, we produce more accurate results than state-of-the-art models while using orders of magnitude less training data.

Summarising, in comparison with other approaches in the literature such as Query2Box (Ren et al., 2020), we find that the proposed framework i) achieves significantly better or equivalent predictive accuracy on a wide range of complex queries, ii) is capable of out-of-distribution generalisation, since it is trained on simple queries only and evaluated on complex queries, and iii) is more explainable, since the intermediate results for its sub-queries and variable assignments can be used to explain any given answer.

2. EXISTENTIAL POSITIVE FIRST-ORDER LOGICAL QUERIES

A Knowledge Graph G ⊆ E × R × E can be defined as a set of subject-predicate-object ⟨s, p, o⟩ triples, where each triple encodes a relationship of type p ∈ R between the subject s ∈ E and the object o ∈ E of the triple, and E and R denote the sets of all entities and relation types, respectively. One can think of a Knowledge Graph as a labelled multi-graph, where entities E represent nodes and edges are labelled with relation types R. Without loss of generality, a Knowledge Graph can be represented as a First-Order Logic Knowledge Base, where each triple ⟨s, p, o⟩ denotes an atomic formula p(s, o), with p ∈ R a binary predicate and s, o ∈ E its arguments.

Conjunctive queries are a sub-class of First-Order Logical queries that use existential quantification (∃) and conjunction (∧) operations. We consider conjunctive queries Q of the following form:

$$ \mathcal{Q}[A] \triangleq ?A : \exists V_1, \ldots, V_m . e_1 \wedge \ldots \wedge e_n \tag{1} $$

where e_i = p(c, V), with V ∈ {A, V_1, …, V_m}, c ∈ E, p ∈ R, or e_i = p(V, V′), with V, V′ ∈ {A, V_1, …, V_m}, V ≠ V′, p ∈ R.

In Eq. (1), the variable A is the target of the query, V_1, …, V_m denote the bound variable nodes, and c ∈ E represent the input anchor nodes. Each e_i denotes a logical atom with either one variable (p(c, V)) or two variables (p(V, V′)), and e_1 ∧ … ∧ e_n denotes a conjunction of n atoms.

The goal of answering the logical query Q consists in finding the set of entities ⟦Q⟧ ⊆ E such that a ∈ ⟦Q⟧ iff Q[a] holds true, where ⟦Q⟧ is the answer set of the query Q. As illustrated in Fig. 1, the dependency graph of a conjunctive query Q is a graph representation of Q where nodes correspond to variable or non-variable atom arguments in Q and edges correspond to atom predicates. We follow Hamilton et al. (2018) and focus on valid conjunctive queries, i.e. queries whose dependency graph is a directed acyclic graph where anchor entities correspond to source nodes and the query target A is the unique sink node.

Example 2.1 (Conjunctive Query).
Consider the query "Which drugs interact with proteins associated with the disease t?". This query can be formalised as the conjunctive query ?D : ∃P.interacts(D, P) ∧ assoc(P, t), where t is an input anchor node, the variable D is the target of the query, P is a bound variable node, and the dependency graph is D ← P ← t. The answer set ⟦Q⟧ of Q corresponds to the set of all drugs in E interacting with proteins associated with t.

Handling Disjunctions. So far we have focused on conjunctive queries defined using the existential quantification (∃) and conjunction (∧) logical operators. To also support disjunctions (∨), we consider queries in Disjunctive Normal Form (DNF), i.e. disjunctions of conjunctive queries. In our framework, given a DNF query Q, for each of its conjunctive sub-queries we produce a score for all the entities representing the likelihood that they answer that sub-query. Finally, such scores are aggregated using a t-conorm, a continuous relaxation of the logical disjunction.
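To make the contrast between exact answering and our relaxation concrete, the following is a minimal sketch of plain sub-graph matching for a DNF query such as the one in the introduction. It is not part of CQD; it only illustrates the answer set ⟦Q⟧ on a tiny hypothetical KG. The entity names, the convention that variables are capitalised, and the brute-force enumeration of substitutions are all assumptions made for illustration.

```python
from itertools import product

def is_var(arg):
    # Convention of this sketch: variables are capitalised, entities are lowercase.
    return arg[0].isupper()

def entities(kg):
    return sorted({x for s, _, o in kg for x in (s, o)})

def holds(atom, sub, kg):
    # atom = (predicate, arg1, arg2); anchors map to themselves.
    p, x, y = atom
    return (sub.get(x, x), p, sub.get(y, y)) in kg

def answer_conjunctive(target, atoms, kg):
    """Exact answers of ?target : ∃vars. atom_1 ∧ ... ∧ atom_n via sub-graph matching."""
    variables = sorted({a for _, x, y in atoms for a in (x, y) if is_var(a)})
    answers = set()
    # Brute-force enumeration of all variable-to-entity substitutions (toy sizes only).
    for values in product(entities(kg), repeat=len(variables)):
        sub = dict(zip(variables, values))
        if all(holds(atom, sub, kg) for atom in atoms):
            answers.add(sub[target])
    return answers

def answer_dnf(target, disjuncts, kg):
    # A DNF query is answered by the union of its conjunctive sub-queries' answers.
    return set().union(*(answer_conjunctive(target, atoms, kg) for atoms in disjuncts))

# Hypothetical KG of (subject, predicate, object) triples.
KG = {("d1", "interacts", "p1"), ("d2", "interacts", "p2"), ("d3", "interacts", "p3"),
      ("p1", "assoc", "t1"), ("p2", "assoc", "t2")}
# ?D : ∃P.interacts(D, P) ∧ [assoc(P, t1) ∨ assoc(P, t2)], written in DNF.
q1 = [("interacts", "D", "P"), ("assoc", "P", "t1")]
q2 = [("interacts", "D", "P"), ("assoc", "P", "t2")]
print(sorted(answer_dnf("D", [q1, q2], KG)))  # ['d1', 'd2']
# Dropping a single edge silently removes d2: exact matching cannot recover missing facts.
print(sorted(answer_dnf("D", [q1, q2], KG - {("p2", "assoc", "t2")})))  # ['d1']
```

The second call shows the failure mode motivating this work: once an edge is missing, no substitution satisfies the query, whereas a neural link predictor can still assign the missing atom a high score.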

3. COMPLEX QUERY ANSWERING VIA OPTIMISATION

We propose a framework for answering EPFO logical queries in the presence of missing edges. Given a query Q, we define the score of a target node a ∈ E as a candidate answer for Q as a function of the scores of all atomic queries in Q, given a variable-to-entity substitution for all variables in Q. Each variable is mapped to an embedding vector, which can either correspond to an entity c ∈ E or to a virtual entity. The score of each of the query atoms is determined individually using a neural link predictor (Nickel et al., 2016). Then, the score of the query with respect to a given candidate answer, Q[a], is computed by aggregating all atom scores using t-norms and t-conorms, which are continuous relaxations of the logical conjunction and disjunction operators.

Neural Link Prediction. A neural link predictor is a differentiable model where atom arguments are first mapped into a k-dimensional embedding space, and then used for producing a score for the atom. More formally, given a query atom p(s, o), where p ∈ R and s, o ∈ E, the score for p(s, o) is computed as φ_p(e_s, e_o), where e_s, e_o ∈ ℝ^k are the embedding vectors of s and o, and φ_p : ℝ^k × ℝ^k → [0, 1] is a scoring function computing the likelihood that entities s and o are related by the relationship p. In our experiments, as neural link predictor, we use ComplEx (Trouillon et al., 2016) regularised using a variational approximation of the tensor nuclear p-norm proposed by Lacroix et al. (2018).

T-Norms. A t-norm ⊤ : [0, 1] × [0, 1] → [0, 1] is a generalisation of conjunction in logic (Klement et al., 2000; 2004). Some examples include the Gödel t-norm ⊤_min(x, y) = min{x, y}, the product t-norm ⊤_prod(x, y) = x · y, and the Łukasiewicz t-norm ⊤_Luk(x, y) = max{0, x + y − 1}. Analogously, t-conorms are dual to t-norms for disjunctions: given a t-norm ⊤, the complementary t-conorm is defined by ⊥(x, y) = 1 − ⊤(1 − x, 1 − y).

Continuous Reformulation of Complex Queries. Let Q denote the following DNF query:

$$ \mathcal{Q}[A] \triangleq ?A : \exists V_1, \ldots, V_m . \left(e^1_1 \wedge \ldots \wedge e^1_{n_1}\right) \vee \ldots \vee \left(e^d_1 \wedge \ldots \wedge e^d_{n_d}\right) \tag{2} $$

where e^j_i = p(c, V), with V ∈ {A, V_1, …, V_m}, c ∈ E, p ∈ R, or e^j_i = p(V, V′), with V, V′ ∈ {A, V_1, …, V_m}, V ≠ V′, p ∈ R.

We want to find the variable assignments that render Q true. To achieve this, we cast the problem as an optimisation problem, where the aim is to find a mapping from variables to entities that maximises the score of Q:

$$ \arg\max_{A, V_1, \ldots, V_m \in \mathcal{E}} \left(e^1_1 \top \ldots \top e^1_{n_1}\right) \bot \ldots \bot \left(e^d_1 \top \ldots \top e^d_{n_d}\right) \tag{3} $$

where e^j_i = φ_p(e_c, e_V), with V ∈ {A, V_1, …, V_m}, c ∈ E, p ∈ R, or e^j_i = φ_p(e_V, e_V′), with V, V′ ∈ {A, V_1, …, V_m}, V ≠ V′, p ∈ R; ⊤ and ⊥ denote a t-norm and a t-conorm, continuous generalisations of the logical conjunction and disjunction, respectively; and φ_p(e_s, e_o) ∈ [0, 1] denotes the neural link prediction score for the atom p(s, o). We write t-norms and t-conorms as infix operators since they are both associative.

Note that, in Eq. (3), the bound variable nodes V_1, …, V_m are only used through their embedding vectors: to compute φ_p(e_c, e_V) we only use the embedding representation e_V ∈ ℝ^k of V, and do not need to know which entity the variable V corresponds to. This means that we have two possible strategies for finding the optimal variable embeddings e_V ∈ ℝ^k with V ∈ {A, V_1, …, V_m} for maximising the objective in Eq. (3), namely continuous optimisation, where we optimise e_V using gradient-based optimisation, and combinatorial optimisation, where we search for the optimal variable-to-entity assignment.
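The objective in Eq. (3) can be evaluated for one fixed assignment with only a few lines of code. The sketch below, in plain Python with built-in complex numbers, shows a ComplEx-style atom score φ_p, the three t-norms listed above, the dual t-conorm, and the DNF aggregation. Passing the raw ComplEx score through a logistic sigmoid to land in [0, 1] is an assumption of this sketch, as are the function names; it is illustrative rather than the reference implementation.

```python
import math
from functools import reduce

def complex_score(rel, sub, obj):
    """phi_p(e_s, e_o): ComplEx score Re(sum_i r_i * s_i * conj(o_i)) mapped to [0, 1].

    Squashing with a logistic sigmoid is an assumption of this sketch."""
    raw = sum(r * s * o.conjugate() for r, s, o in zip(rel, sub, obj)).real
    return 1.0 / (1.0 + math.exp(-raw))

def tnorm_godel(x, y):
    return min(x, y)

def tnorm_prod(x, y):
    return x * y

def tnorm_luk(x, y):
    return max(0.0, x + y - 1.0)

def dual_tconorm(tnorm):
    """Complementary t-conorm: bot(x, y) = 1 - top(1 - x, 1 - y)."""
    return lambda x, y: 1.0 - tnorm(1.0 - x, 1.0 - y)

def dnf_score(atom_scores, tnorm=tnorm_prod):
    """Objective of Eq. (3) for one fixed assignment.

    atom_scores[j] holds the link-prediction scores of the atoms in the j-th
    conjunctive sub-query; conjunctions use `tnorm`, the disjunction its dual."""
    conj_scores = [reduce(tnorm, scores) for scores in atom_scores]
    return reduce(dual_tconorm(tnorm), conj_scores)

# Two disjuncts: (0.9 AND 0.8) OR (0.3).
print(dnf_score([[0.9, 0.8], [0.3]], tnorm_prod))   # ≈ 0.804, i.e. 1 - (1 - 0.72) * (1 - 0.3)
print(dnf_score([[0.9, 0.8], [0.3]], tnorm_godel))  # ≈ 0.8, i.e. max(min(0.9, 0.8), 0.3)
```

With the product t-norm the dual t-conorm is the probabilistic sum x + y − xy, while the Gödel pair reduces to min and max; this is why the choice of t-norm is treated as a hyperparameter.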

3.1. COMPLEX QUERY ANSWERING VIA CONTINUOUS OPTIMISATION

One way to solve the optimisation problem in Eq. (3) is to find the variable embeddings that maximise the score of a complex query. This can be formalised as the following continuous optimisation problem:

$$ \arg\max_{\mathbf{e}_A, \mathbf{e}_{V_1}, \ldots, \mathbf{e}_{V_m} \in \mathbb{R}^k} \left(e^1_1 \top \ldots \top e^1_{n_1}\right) \bot \ldots \bot \left(e^d_1 \top \ldots \top e^d_{n_d}\right) \tag{4} $$

where e^j_i = φ_p(e_c, e_V), with V ∈ {A, V_1, …, V_m}, c ∈ E, p ∈ R, or e^j_i = φ_p(e_V, e_V′), with V, V′ ∈ {A, V_1, …, V_m}, V ≠ V′, p ∈ R. In Eq. (4) we directly optimise the embedding representations e_A, e_V1, …, e_Vm ∈ ℝ^k of variables A, V_1, …, V_m, rather than exploring the combinatorial space of variable-to-entity mappings. In this way, we can tackle the maximisation problem in Eq. (4) using gradient-based optimisation methods, such as Adam (Kingma & Ba, 2015). Then, once we have identified the optimal representations for variables A, V_1, …, V_m, we replace the query target embedding e_A with the embedding representations e_c ∈ ℝ^k of all entities c ∈ E, and use the resulting complex query score to compute the likelihood that such entities answer the query.
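As an illustration of the continuous strategy, here is a minimal sketch of CQD-CO for a 2p chain query ?A : ∃V.p(c, V) ∧ q(V, A) with the product t-norm, in plain Python with complex-valued ComplEx-style embeddings. The toy embeddings, the sigmoid squashing, the hand-derived gradients, and the use of plain gradient ascent with a fixed step size (instead of Adam, and without any regularisation) are all assumptions of this sketch, not the authors' implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def complex_raw(rel, sub, obj):
    # Raw ComplEx score Re(sum_i r_i * s_i * conj(o_i)).
    return sum(r * s * o.conjugate() for r, s, o in zip(rel, sub, obj)).real

def query_score(c, r1, r2, e_v, e_a):
    # Product t-norm of the two atom scores of ?A : ∃V. p(c, V) ∧ q(V, A).
    return sigmoid(complex_raw(r1, c, e_v)) * sigmoid(complex_raw(r2, e_v, e_a))

def cqd_co_2p(c, r1, r2, k, steps=300, lr=0.1):
    """Gradient ascent on the variable embeddings e_V and e_A (k complex dims)."""
    e_v = [0.1 + 0j] * k
    e_a = [0.1 + 0j] * k
    for _ in range(steps):
        s1 = sigmoid(complex_raw(r1, c, e_v))
        s2 = sigmoid(complex_raw(r2, e_v, e_a))
        # dS/df1 and dS/df2 for S = s1 * s2, using sigmoid'(f) = s * (1 - s).
        g1 = s1 * (1 - s1) * s2
        g2 = s1 * s2 * (1 - s2)
        # Complex gradients (d/dRe + i d/dIm) of the raw scores: an object-position
        # argument gets r * s, a subject-position argument gets conj(r) * o.
        grad_v = [g1 * (ri * ci) + g2 * (qi.conjugate() * ai)
                  for ri, ci, qi, ai in zip(r1, c, r2, e_a)]
        grad_a = [g2 * (qi * vi) for qi, vi in zip(r2, e_v)]
        e_v = [v + lr * g for v, g in zip(e_v, grad_v)]
        e_a = [a + lr * g for a, g in zip(e_a, grad_a)]
    return e_v, e_a

def rank_entities(c, r1, r2, e_v, entity_embs):
    # Replace e_A with each entity embedding and rank entities by the query score.
    scores = {name: query_score(c, r1, r2, e_v, emb) for name, emb in entity_embs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical anchor, relation, and entity embeddings.
c = [1 + 0j, 1 + 0j]
r1 = [1 + 0j, 1 + 0j]
r2 = [1 + 0j, 1 + 0j]
e_v, e_a = cqd_co_2p(c, r1, r2, k=2)
ents = {"x": [1 + 0j, 1 + 0j], "y": [-1 + 0j, -1 + 0j], "z": [0j, 0j]}
print(rank_entities(c, r1, r2, e_v, ents))  # ['x', 'z', 'y']
```

Because the variable embeddings are unconstrained, their norms keep growing as the score approaches 1; in practice this is one reason to cap the number of iterations or add a regulariser.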

3.2. COMPLEX QUERY ANSWERING VIA COMBINATORIAL OPTIMISATION

Another way to tackle the optimisation problem in Eq. (3) is to greedily search for a set of variable substitutions S = {A ← a, V_1 ← v_1, …, V_m ← v_m}, with a, v_1, …, v_m ∈ E, that maximises the complex query score. We do so by traversing the dependency graph of a query Q: whenever we find an atom of the form p(c, V), where p ∈ R, c is either an entity or a variable for which we already have a substitution, and V is a variable for which we do not have a substitution yet, we replace V with all entities in E and retain the top-k entities t ∈ E that maximise φ_p(e_c, e_t), i.e. the most likely entities to appear as a substitution of V according to the neural link predictor. The procedure is akin to beam search: as we traverse the dependency graph of a query, we keep a beam with the most promising variable-to-entity substitutions identified so far.

Example 3.1 (Combinatorial Optimisation). Consider the query "Which drugs D interact with proteins associated with disease t?", which can be rewritten as ?D : ∃P.interacts(D, P) ∧ assoc(P, t). In order to answer this query via combinatorial optimisation, we first find the top-k proteins p that are most likely to substitute the variable P in assoc(P, t). Then, for each of these proteins, we search for the top-k drugs d that are most likely to substitute D in interacts(D, p), ending up with at most k² candidate drugs. Finally, we rank the candidate drugs d using the query score produced by the t-norm.

Note that scoring all possible entities can be done efficiently and in a single step on a GPU by replacing V with the entity embedding matrix. In our experiments we did not notice any computational bottlenecks due to the branching factor of longer queries; if needed, however, this could be handled by using alternative graph exploration strategies.
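The beam-search procedure can be sketched for a 2p chain query ?A : ∃V.p(c, V) ∧ q(V, A), using the movie-genre query from Section 5.5 so that the anchor sits in the subject position. In this minimal sketch the link predictor is replaced by a hypothetical lookup table `SCORES`, the product t-norm is the default, and keeping only the best-scoring path to each candidate answer is our assumption; returning that path's intermediate substitution (the "witness") is what makes the answers explainable.

```python
import heapq

def cqd_beam_2p(phi, entities, anchor, p, q, k, tnorm=lambda x, y: x * y):
    """Beam search for the chain query ?A : ∃V. p(anchor, V) ∧ q(V, A).

    phi(pred, subj, obj) returns a link-prediction score in [0, 1]. Returns
    (answer, score, witness) triples, where witness is the substitution for V
    that produced the answer's best score."""
    # Step 1: score every entity as a substitution for V; keep the top-k.
    beam = heapq.nlargest(k, ((phi(p, anchor, e), e) for e in entities))
    best = {}
    for s1, v in beam:
        # Step 2: for each kept V, keep the top-k substitutions for A
        # (at most k * k candidates overall).
        for s2, a in heapq.nlargest(k, ((phi(q, v, e), e) for e in entities)):
            score = tnorm(s1, s2)
            # Keep the best-scoring path to each candidate answer (an assumption).
            if a not in best or score > best[a][0]:
                best[a] = (score, v)
    return sorted(((a, s, v) for a, (s, v) in best.items()), key=lambda t: t[1], reverse=True)

# Hypothetical link-prediction scores for ?G : ∃M. perform(ml, M) ∧ genre(M, G).
SCORES = {
    ("perform", "ml", "m1"): 0.9, ("perform", "ml", "m2"): 0.8, ("perform", "ml", "m3"): 0.2,
    ("genre", "m1", "comedy"): 0.9, ("genre", "m1", "drama"): 0.4,
    ("genre", "m2", "drama"): 0.7, ("genre", "m3", "horror"): 0.99,
}
ENTITIES = ["ml", "m1", "m2", "m3", "comedy", "drama", "horror"]

def phi(pred, subj, obj):
    return SCORES.get((pred, subj, obj), 0.0)

for answer, score, witness in cqd_beam_2p(phi, ENTITIES, "ml", "perform", "genre", k=2):
    print(answer, round(score, 3), "via M =", witness)
```

With k = 2, the movie m3 falls outside the beam after the first step, so horror is never considered even though genre(m3, horror) scores highly; this is the trade-off between beam size and recall that the hyperparameter k controls.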

4. RELATED WORK

This work is closely related to approaches for learning to traverse Knowledge Graphs (Guu et al., 2015; Das et al., 2017; 2018), and to more recent work on answering conjunctive queries via black-box neural models trained on generated queries (Hamilton et al., 2018; Daza & Cochez, 2020; Kotnis et al., 2020). The main difference is that we propose a tractable framework for handling a substantially larger subset of First-Order Logic queries.

More recently, Ren et al. (2020) proposed Query2Box, a neural model for Existential Positive First-Order logical queries, where queries are represented via box embeddings (Li et al., 2019). Such approaches for query answering require a dataset with millions of generated queries to generalise well: for instance, on the FB15k-237 dataset, approximately 1.5 × 10⁵ training queries are used for each query type, resulting in approximately 1.2 × 10⁶ training queries. Our framework, on the other hand, only uses a simple, state-of-the-art neural link predictor (Lacroix et al., 2018) trained on a set of 1-hop queries that is orders of magnitude smaller.

There is a large body of work on neural link predictors, which learn embeddings of entities and relations in KGs via a simple link prediction training objective (Bordes et al., 2013; Yang et al., 2015; Trouillon et al., 2016; Lacroix et al., 2018). Due to their design, they are often evaluated on answering 1-hop queries only, as their application to more complex queries does not follow directly from their formulation. Previous work has considered using such embeddings for complex query answering, by partitioning the query graph and using an ad-hoc aggregation function to score candidate answers (Wang et al., 2018), or by using a probabilistic mixture model similar to DistMult (Friedman & Van den Broeck, 2020).
In contrast, our proposed method answers a query in a single pass where aggregation steps are implemented with t-norms and t-conorms, which are continuous relaxations of conjunctions and disjunctions. Such t-norms have been proposed as differentiable formulations of logical operators suitable for gradient-based learning (Serafini & d'Avila Garcez, 2016; Guo et al., 2016; Minervini et al., 2017; van Krieken et al., 2020). Further alternatives for using embeddings from neural link predictors, such as combinatorial optimisation, have been ruled out as infeasible (Hamilton et al., 2018; Daza & Cochez, 2020). We show that combinatorial optimisation can scale well by reducing the set of possible intermediate answers, while outperforming the state of the art in query answering.

The framework proposed in this paper is related to neural theorem provers (Rocktäschel & Riedel, 2017; Weber et al., 2019; Minervini et al., 2020a;b), a differentiable relaxation of the backward-chaining reasoning algorithm where comparison between symbols is replaced by a differentiable similarity function between their embedding vectors. During the reasoning process, neural theorem provers check which rules can be used for proving a given atomic query, and then check whether the premise of such rules, which is a conjunctive query, is satisfied. The procedure they use for answering conjunctions is akin to the combinatorial optimisation procedure we propose in Section 3.2. The main difference is how atomic queries are answered: we use the ComplEx neural link predictor (Trouillon et al., 2016), while neural theorem provers use the maximum similarity value between a given atomic query and all facts in the Knowledge Graph, which has linear complexity in the number of triples in the graph.
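To make the complexity contrast concrete, the sketch below scores an atomic query in the style of a neural theorem prover: the query is softly unified with every fact, and the best match is kept. It is a simplified stand-in, not an actual neural theorem prover. The kernel exp(−‖x − y‖), the min aggregation over the three symbol-wise similarities, and the toy embeddings are assumptions of this sketch; the point is only that the cost grows linearly with the number of facts, whereas a link predictor such as ComplEx scores an atom directly from three embeddings.

```python
import math

def similarity(x, y):
    # Kernel between embeddings; exp(-Euclidean distance) is an assumption here.
    return math.exp(-math.dist(x, y))

def ntp_atom_score(query, facts, emb):
    """NTP-style score of an atomic query (p, s, o): its best soft match to any fact.

    Facts are (predicate, subject, object) tuples. Unifying two atoms takes the
    minimum of the symbol-wise similarities, and the score is the maximum over
    all facts, so the cost is linear in len(facts)."""
    p, s, o = query
    return max(
        min(similarity(emb[p], emb[fp]), similarity(emb[s], emb[fs]), similarity(emb[o], emb[fo]))
        for fp, fs, fo in facts
    )

# Hypothetical 2-dimensional embeddings: d2 is close to d1, so interacts(d2, p1)
# receives a high score by soft-matching the known fact interacts(d1, p1).
emb = {"interacts": (1.0, 0.0), "d1": (1.0, 1.0), "d2": (0.9, 1.0), "p1": (0.0, 0.0)}
facts = [("interacts", "d1", "p1")]
print(ntp_atom_score(("interacts", "d2", "p1"), facts, emb))
```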

5. EXPERIMENTS

We described a method to answer a query by decomposing it into a continuous formulation, which we refer to as Continuous Query Decomposition (CQD). In this section we demonstrate the effectiveness of CQD on the task of answering complex queries that cannot be answered using the incomplete KG, and report experimental results for continuous optimisation (CQD-CO, Section 3.1) and beam search (CQD-Beam, Section 3.2). We also provide a qualitative analysis of how our method can be used to obtain explanations for a given complex query answer. For the sake of comparison, we use the same datasets and evaluation metrics as Ren et al. (2020).

5.1. DATASETS

Following Ren et al. (2020), we evaluate our approach on FB15k (Bordes et al., 2013) and FB15k-237 (Toutanova & Chen, 2015), two subsets of the Freebase knowledge graph, and on NELL995 (Xiong et al., 2017), a KG generated by the NELL system (Mitchell et al., 2015). In order to compare with previous work on query answering, we use the queries generated by Ren et al. (2020) from these datasets. Dataset statistics are detailed in Table 1. We consider a total of 9 query types, including atomic queries and 2 query types that contain disjunctions; the different query types are shown in Fig. 2. In our framework, the neural link predictor is only trained on atomic queries, while the evaluation is carried out on the complete set of query types in Fig. 2. Note that each query in Table 1 can have multiple answers, so the total number of training instances can be higher than the number of queries. For atomic queries (of type 1p), this number is equal to the number of edges in the training graph. Other methods such as GQE (Hamilton et al., 2018) and Q2B (Ren et al., 2020) require a dataset with more query types. As an example, the FB15k dataset contains approximately 960k instances for 1p queries; when adding the 2p, 3p, 2i, and 3i queries employed by GQE and Q2B during training, this number increases to 65 million instances.

5.2. MODEL DETAILS

To obtain embeddings for the query answering task, we use ComplEx (Trouillon et al., 2016) with a variational approximation of the tensor nuclear p-norm for regularisation (Lacroix et al., 2018). We fix a learning rate of 0.1 and use the Adagrad optimiser. We then tune the hyperparameters of ComplEx on the validation set of each dataset via grid search. We consider ranks (embedding sizes) in {100, 200, 500, 1000}, batch sizes in {100, 500, 1000}, and regularisation coefficients in the interval [10⁻⁴, 0.5]. For query answering, we experimented with the Gödel and product t-norms, and select the best t-norm for each query type according to validation accuracy. For CQD-CO, we optimise variable and target embeddings with Adam, using the same initialisation scheme as Lacroix et al. (2018), with an initial learning rate of 0.1 and a maximum of 1,000 iterations. In practice, we observed that the procedure usually converges in fewer than 300 iterations. For CQD-Beam, the beam size k ∈ {2², 2³, …, 2⁸} is selected on a held-out validation set.

5.3. EVALUATION

As in Ren et al. (2020), for each test query we assign a score to every entity in the graph, and use these scores to rank the entities. We then compute the Hits at 3 (H@3) metric, which measures how often the correct answer appears among the top three answers in the ranking. Since a query can have multiple answers, we use the filtered setting (Bordes et al., 2013), where we filter out the other correct answers from the ranking before calculating H@3. As baselines, we use two recent state-of-the-art models for complex query answering, namely Graph Query Embedding (GQE, Hamilton et al., 2018) and Query2Box (Q2B, Ren et al., 2020).
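The filtered setting is easy to get subtly wrong, so a minimal sketch may help. For each correct answer, the other correct answers are removed before computing its rank, and a hit is counted if the rank is at most 3. Breaking ties optimistically (only strictly higher scores count) and averaging hits over the answers of a single query are assumptions of this sketch, as are the toy scores.

```python
def filtered_rank(scores, answer, all_answers):
    """Filtered rank of `answer`: other correct answers are removed before ranking.

    Ties are broken optimistically (only strictly higher scores count), which is
    an assumption of this sketch."""
    target = scores[answer]
    return 1 + sum(1 for e, s in scores.items() if s > target and e not in all_answers)

def hits_at_k(scores, all_answers, k=3):
    # Average filtered Hits@k over the answers of one query (assumed convention).
    hits = [filtered_rank(scores, a, all_answers) <= k for a in all_answers]
    return sum(hits) / len(hits)

# Hypothetical scores for five entities.
scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6, "e": 0.5}
print(hits_at_k(scores, {"a", "e"}))       # 0.5: e is ranked 4th after filtering
print(hits_at_k(scores, {"a", "d", "e"}))  # 1.0: filtering d moves e up to 3rd
```

Without filtering, e would be ranked 5th in both cases; filtering ensures that a model is not penalised for ranking other correct answers above the one being evaluated.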

5.4. RESULTS

We detail the H@3 results for all query types in Table 2. We observe that, on average, CQD produces more accurate results than GQE and Q2B, while using orders of magnitude less training data. In particular, combinatorial optimisation in CQD-Beam consistently outperforms the baselines across all datasets. The results for chained queries (2p and 3p) show that CQD-Beam remains effective even as the length of the chain increases. The most difficult case corresponds to 3p queries, where the number of candidate variable substitutions increases due to the branching factor of the search procedure. We also note that having more variables does not always translate into worse performance for CQD-CO: it yields the best ranking scores for ip queries on FB15k-237, and for ip and pi queries on NELL995, and both of these query types contain two variables.

Figure 3: Intermediate variable assignments and ranks for two example queries, obtained with CQD-Beam. Correctness indicates whether the answer belongs to the ground-truth set of answers.

The results presented in Table 2 were obtained with a rank of 1,000, as it produced the best performance on the validation set. We present results for other values of the rank in Appendix A, where we observe that even with a rank of 100, CQD still outperforms baselines with a larger embedding size. Furthermore, in Appendix B, we report the number of seconds required to answer each query type, showing that CQD-Beam requires less than 50 ms for all considered queries. We also experimented with a variant of CQD-Beam that uses DistMult (Yang et al., 2015) as the link predictor; results are reported in Appendix C. As expected, results when using DistMult are slightly less accurate than when using ComplEx, while still being more accurate than those produced by GQE and Q2B.

5.5. EXPLAINING ANSWERS TO COMPLEX QUERIES

A useful property of our framework is its transparency when computing scores for distinct atoms in a query. Unlike GQE and Q2B, two neural models that encode a query into a vector via a set of non-linear transformations, our framework is able to produce an explanation for a given answer in terms of intermediate variable assignments.

Consider the following test query from the FB15k-237 knowledge graph: "In what genres of movies did Martin Lawrence appear?" This query can be formalised as ?G : ∃M.perform(ML, M) ∧ genre(M, G), where ML is an anchor node representing Martin Lawrence. The ground-truth answers to this query are 7 genres, including Drama, Comedy, and Crime Fiction. In Fig. 3 we show the intermediate assignments to the variable M obtained with CQD-Beam, and the rank of each combination of movie M and genre G. We note that the assignments to the variable M are correct, as these are movies in which Martin Lawrence appeared. Furthermore, these intermediate assignments lead to answers in the first seven positions of the ranking that all belong to the ground-truth set of answers.

In a second example, consider the following query: "What international organisations contain the country of nationality of Thomas Aquinas?" Its conjunctive form is ?O : ∃C.nationality(TA, C) ∧ memberOf(C, O), where TA is an anchor node representing Thomas Aquinas. The ground-truth answers to this query are the Organisation for Economic Co-operation and Development (OECD), the European Union (EU), the North Atlantic Treaty Organisation (NATO), and the World Trade Organisation (WTO). As shown in Fig. 3, CQD-Beam yields the correct answers in the first four positions of the ranking. However, by inspecting the intermediate assignments, we note that such correct answers are produced by an incorrect (although related) intermediate assignment, since the country of nationality of Thomas Aquinas is Italy.
By inspecting these decisions we can thus identify failure modes of our framework, even when it produces seemingly correct answers. This is in contrast with other neural black-box models for complex query answering outlined in Section 4, where such an analysis is not possible.

6. CONCLUSIONS

We proposed a framework, Continuous Query Decomposition (CQD), for answering Existential Positive First-Order logical queries by reasoning over sets of entities in embedding space. In our framework, answering a complex query is reduced to answering each of its sub-queries and aggregating the resulting scores via t-norms and t-conorms. The benefit of the method is that we only need to train a neural link prediction model on atomic queries to use our framework for answering a given complex query, without the need to train on millions of generated complex queries. This comes with the added value that we are able to explain each step of the query answering process regardless of query complexity, instead of using a black-box neural query embedding model. The proposed method is agnostic to the type of query, and is able to generalise without explicitly training on a specific variety of queries. Experimental results show that CQD produces significantly more accurate results than current state-of-the-art complex query answering methods on incomplete Knowledge Graphs.

In Table 3 we report results for CQD-CO (Section 3.1) and CQD-Beam (Section 3.2) for different rank (embedding size) values. We can see that the model produces very accurate results even with significantly fewer parameters.



¹ At https://github.com/uclnlp/cqd
† Equal contribution, alphabetical order.



Figure 1 (second query): ?D : ∃A.directs(D, A) ∧ [prize(A, Oscar) ∨ prize(A, Emmy)]

Figure 2: Query structures considered in our experiments, as proposed by Ren et al. (2020). The naming of each query structure corresponds to projection (p), intersection (i), and union (u), and reflects how they were implemented in the Query2Box model (Ren et al., 2020). An example of a pi query is ?T : ∃V.p(a, V) ∧ q(V, T) ∧ r(b, T), where a and b are anchor nodes, V is a variable node, and T is the query target node.


Table 1: Number of queries in the datasets used to evaluate query answering performance. "Others" indicates the number of queries for each of the remaining types.

Table 2: Complex query answering results (H@3) across all query types; results for Graph Query Embedding (GQE, Hamilton et al., 2018) and Query2Box (Ren et al., 2020) are from Ren et al. (2020).

Figure 3 (first example query): ?G : ∃M.perform(ML, M) ∧ genre(M, G)

Table 3: Complex query answering results (H@3) across all query types, for different rank (embedding size) values; results for Graph Query Embedding (GQE, Hamilton et al., 2018) and Query2Box (Ren et al., 2020) are from Ren et al. (2020).

…     …     0.770 0.585 0.785 0.828 0.373 0.679 0.815 0.357
500   0.912 0.759 0.580 0.772 0.817 0.372 0.650 0.831 0.351
1000  0.918 0.779 0.584 0.796 0.837 0.377 0.658 0.839 0.355
…     …     0.348 0.296 0.406 0.525 0.166 0.291 0.527 0.149
1000  0.667 0.343 0.297 0.410 0.529 0.168 0.283 0.536 0.157

ACKNOWLEDGEMENTS

This research was supported by the European Union's Horizon 2020 research and innovation programme under grant agreement no. 875160. This project was partially funded by Elsevier's Discovery Lab. Finally, we thank NVIDIA for GPU donations.

B TIMING EXPERIMENTS

In Fig. 4 and Fig. 5 we report the time (in seconds) required by Q2B (Ren et al., 2020) and CQD-Beam (Section 3.2) to answer each query type, aggregated over FB15k, FB15k-237, and NELL995. We can see that, in CQD-Beam, the main computational bottleneck is multi-hop queries, since the model has to invoke the neural link prediction model at each step of the chain to obtain the top-k candidates for the next step.

In Table 4 we report the results for CQD-Beam with two different neural link predictors, namely ComplEx (Trouillon et al., 2016) and DistMult (Yang et al., 2015). Both models were trained using the loss and regulariser proposed by Lacroix et al. (2018), and their hyperparameters were tuned according to their performance on the validation set; in both cases, the embedding size is set to 1,000. As expected, CQD-Beam with DistMult produces slightly less accurate results than with ComplEx, while still yielding more accurate results than the Q2B and GQE baselines.

