EXPRESSIVE: A SPATIO-FUNCTIONAL EMBEDDING FOR KNOWLEDGE GRAPH COMPLETION

Abstract

Knowledge graphs are inherently incomplete. Therefore, substantial research has been directed toward knowledge graph completion (KGC), i.e., predicting missing triples from the information represented in the knowledge graph (KG). KG embedding models (KGEs) have yielded promising results for KGC, yet any current KGE is incapable of: (1) fully capturing vital inference patterns (e.g., composition), (2) capturing prominent patterns jointly (e.g., hierarchy and composition), and (3) providing an intuitive interpretation of captured patterns. In this work, we propose ExpressivE, a fully expressive spatio-functional KGE that solves all these challenges simultaneously. ExpressivE embeds pairs of entities as points and relations as hyper-parallelograms in the virtual triple space R^2d. This model design allows ExpressivE not only to capture a rich set of inference patterns jointly but additionally to display any supported inference pattern through the spatial relation of hyper-parallelograms, offering an intuitive and consistent geometric interpretation of ExpressivE embeddings and their captured patterns. Experimental results on standard KGC benchmarks reveal that ExpressivE is competitive with state-of-the-art KGEs and even significantly outperforms them on WN18RR.

1. INTRODUCTION

Knowledge graphs (KGs) are large collections of triples r_i(e_h, e_t) over relations r_i ∈ R and entities e_h, e_t ∈ E used for representing, storing, and processing information. Real-world KGs such as Freebase (Bollacker et al.,



Therefore, capturing general composition is still an open problem. Moreover, composition patterns describe paths, which are fundamental for navigation within a graph; hence, the ability to capture general composition is vital for KGEs. In contrast, approaches such as SimplE (Kazemi & Poole, 2018), ComplEx (Trouillon et al., 2016), and BoxE (Abboud et al., 2020) have managed to capture other vital patterns, such as hierarchy, yet are unable to capture any notion of composition.

Table 1: Inference patterns that several KGEs can capture: ✓ means the pattern is supported and ✗ that it is not. "Comp. def." stands for compositional definition and "Gen. comp." for general composition.

Inference Pattern | ExpressivE | BoxE | RotatE | TransE | DistMult | ComplEx
Symmetry: r1(X, Y) ⇒ r1(Y, X) | ✓ | ✓ | ✓ | ✗ | ✓ | ✓
Anti-symmetry: r1(X, Y) ⇒ ¬r1(Y, X) | ✓ | ✓ | ✓ | ✓ | ✗ | ✓
Inversion: r1(X, Y) ⇔ r2(Y, X) | ✓ | ✓ | ✓ | ✓ | ✗ | ✓
Comp. def.: r1(X, Y) ∧ r2(Y, Z) ⇔ r3(X, Z) | ✓ | ✗ | ✓ | ✓ | ✗ | ✗
Gen. comp.: r1(X, Y) ∧ r2(Y, Z) ⇒ r3(X, Z) | ✓ | ✗ | ✗ | ✗ | ✗ | ✗
Hierarchy: r1(X, Y) ⇒ r2(X, Y) | ✓ | ✓ | ✗ | ✗ | ✓ | ✓
Intersection: r1(X, Y) ∧ r2(X, Y) ⇒ r3(X, Y) | ✓ | ✓ | ✓ | ✓ | ✗ | ✗
Mutual exclusion: r1(X, Y) ∧ r2(X, Y) ⇒ ⊥ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓

Challenge. While the extensive research on composition (Bordes et al., 2013; Sun et al., 2019; Zhang et al., 2019; Lu & Hu, 2020) and hierarchy (Yang et al., 2015a; Trouillon et al., 2016; Kazemi & Poole, 2018; Abboud et al., 2020) highlights their importance, every KGE so far is incapable of: (1) capturing general composition, (2) capturing composition and hierarchy jointly, and (3) providing an intuitive geometric interpretation of captured inference patterns.

Contribution. This paper focuses on solving all the stated limitations simultaneously. In particular:

• We introduce the spatio-functional embedding model ExpressivE. It embeds pairs of entities as points and relations as hyper-parallelograms in the space R^2d, which we call the virtual triple space. The virtual triple space allows ExpressivE to represent patterns through the spatial relationship of hyper-parallelograms, offering an intuitive and consistent geometric interpretation of ExpressivE embeddings and their captured patterns.
• We prove that ExpressivE can capture any pattern listed in Table 1. This makes ExpressivE the first model capable of capturing both general composition and hierarchy jointly.
• We prove that our model is fully expressive, making ExpressivE the first KGE that both supports composition and is fully expressive.
• We evaluate ExpressivE on the two standard KGC benchmarks WN18RR (Dettmers et al., 2018) and FB15k-237 (Toutanova & Chen, 2015), revealing that ExpressivE is competitive with state-of-the-art (SotA) KGEs and even significantly outperforms them on WN18RR.

Organization. Section 2 introduces the KGC problem and methods for evaluating KGEs. Section 3 embeds ExpressivE in the context of related work. Section 4 introduces ExpressivE and the virtual triple space and interprets our model's parameters within it. Section 5 analyzes our model's expressive power and inference capabilities. Section 6 discusses experimental results together with our model's space complexity, and Section 7 summarizes our work. The appendix contains all proofs of theorems.

2. KNOWLEDGE GRAPH COMPLETION

This section introduces the KGC problem and evaluation methods (Abboud et al., 2020). Let us first introduce the triple vocabulary T, consisting of a finite set of entities E and relations R. We call an expression of the form r_i(e_h, e_t) a triple, where r_i ∈ R and e_h, e_t ∈ E. Furthermore, we call e_h the head and e_t the tail of the triple. Now, a KG G is a finite set of triples over T, and KGC is the problem of predicting missing triples.

KGEs can be evaluated by means of: (1) an experimental evaluation on benchmark datasets, (2) an analysis of the model's expressiveness, and (3) an analysis of the inference patterns that the model can capture. We discuss each of these points in what follows.

Experimental Evaluation. The experimental evaluation of KGEs requires a set of true and corrupted triples. True triples r_i(e_h, e_t) ∈ G are corrupted by replacing either e_h or e_t with any e_c ∈ E such that the corrupted triple does not occur in G.

In entity classification, hyper-rectangles (boxes) represent entity classes, capturing class hierarchies naturally through the spatial subsumption of these boxes (Vilnis et al., 2018; Subramanian & Chakrabarti, 2018; Li et al., 2019). Also, query answering systems, such as Query2Box (Ren et al., 2020), have used boxes to represent answer sets due to their intuitive interpretation as sets of entities. Although Query2Box can be used for KGC, entity classification approaches cannot scalably be employed in the general KGC setting, as this would require an embedding for each entity tuple (Abboud et al., 2020). BoxE (Abboud et al., 2020) is the first spatial KGE dedicated to KGC. It embeds relations as a pair of boxes and entities as a set of points and bumps in the embedding space. The usage of boxes enables BoxE to capture any inference pattern that can be described by the intersection of boxes in the embedding space, such as hierarchy. Moreover, boxes enable BoxE to capture 1-N, N-1, and N-N relations naturally. Yet, BoxE cannot capture any notion of composition (Abboud et al., 2020).
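The corruption procedure from the experimental-evaluation paragraph above can be sketched in a few lines. This is a minimal illustration, not the paper's code: the function name `corruptions`, the `(relation, head, tail)` tuple layout, and the toy KG are all assumptions made for the example.

```python
def corruptions(triple, entities, kg):
    """Replace the head or the tail of a true triple with every other entity,
    keeping only candidates that do not already occur in the KG
    (the standard filtered corruption setting)."""
    r, h, t = triple
    corrupted = []
    for e in entities:
        for cand in ((r, e, t), (r, h, e)):  # corrupt head, then tail
            if cand != triple and cand not in kg:
                corrupted.append(cand)
    return corrupted

# Toy example (illustrative data).
kg = {("capital_of", "vienna", "austria"), ("capital_of", "paris", "france")}
entities = ["vienna", "austria", "paris", "france"]
negs = corruptions(("capital_of", "vienna", "austria"), entities, kg)
```

Ranking the true triple against such corrupted candidates is what metrics like MRR and Hits@k are computed from.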
Our Work. These research gaps, namely that no existing KGE can capture general composition and hierarchy jointly, have motivated our work. In contrast to prior work, our model defines a hyper-parallelogram for each relation, allowing us to combine the benefits of both spatial and functional models. Moreover, prior work primarily analyzes the embedding space itself, while we propose the novel virtual triple space that allows us to display any captured inference pattern, including general composition, through the spatial relation of hyper-parallelograms.

4. EXPRESSIVE AND THE VIRTUAL TRIPLE SPACE

This section introduces ExpressivE, a KGE targeted at KGC with the capability of capturing a rich set of inference patterns. ExpressivE embeds entities as points and relations as hyper-parallelograms in the virtual triple space R^2d. More concretely, instead of analyzing our model in the d-dimensional embedding space R^d, we construct the novel virtual triple space that grants ExpressivE's parameters a geometric meaning. Above all, the virtual triple space allows us to intuitively interpret ExpressivE embeddings and their captured patterns, as discussed in Section 5.

Representation. Entities e_j ∈ E are embedded in ExpressivE via a vector e_j ∈ R^d, representing points in the latent embedding space R^d. Relations r_i ∈ R are embedded as hyper-parallelograms in the virtual triple space R^2d. More specifically, ExpressivE assigns to a relation r_i, for each of its arity positions p ∈ {h, t}, the following vectors: (1) a slope vector r_i^p ∈ R^d, (2) a center vector c_i^p ∈ R^d, and (3) a width vector d_i^p ∈ (R≥0)^d. Intuitively, these vectors define the slopes r_i^p of the hyper-parallelogram's boundaries, its center c_i^p, and its width d_i^p. A triple r_i(e_h, e_t) is captured to be true in an ExpressivE model if its relation and entity embeddings satisfy the following inequalities:

(e_h - c_i^h - r_i^t ⊙ e_t)^|.| ⪯ d_i^h    (1)
(e_t - c_i^t - r_i^h ⊙ e_h)^|.| ⪯ d_i^t    (2)

where x^|.| represents the element-wise absolute value of a vector x, ⊙ the Hadamard (i.e., element-wise) product, and ⪯ the element-wise less-or-equal operator. Interpreting this model directly in the embedding space R^d is very complex. Hence, we next construct a virtual triple space in R^2d that eases reasoning about the parameters and inference capabilities of ExpressivE.

Virtual Triple Space. We construct this virtual space by concatenating the head and tail entity embeddings.
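Inequalities (1) and (2) can be checked directly in code. The following is a minimal sketch under assumed names: the dict layout for the six relation vectors is illustrative, not the paper's data structure.

```python
def captured_true(e_h, e_t, rel):
    """A triple r_i(e_h, e_t) is captured as true iff, element-wise,
    |e_h - c_h - r_t * e_t| <= d_h   (Inequality 1) and
    |e_t - c_t - r_h * e_h| <= d_t   (Inequality 2)."""
    d = len(e_h)
    ok1 = all(abs(e_h[j] - rel["c_h"][j] - rel["r_t"][j] * e_t[j]) <= rel["d_h"][j]
              for j in range(d))
    ok2 = all(abs(e_t[j] - rel["c_t"][j] - rel["r_h"][j] * e_h[j]) <= rel["d_t"][j]
              for j in range(d))
    return ok1 and ok2

# A one-dimensional relation embedding (illustrative parameter values).
rel = dict(c_h=[0.0], c_t=[0.0], r_h=[1.0], r_t=[2.0], d_h=[4.0], d_t=[4.0])
```

For example, the entity pair (e_h, e_t) = ([2.0], [1.0]) satisfies both inequalities for this `rel`, while ([10.0], [1.0]) violates Inequality (1).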
In detail, this means that any pair of entities (e_h, e_t) ∈ E × E defines a point in the virtual triple space by concatenating their entity embeddings e_h, e_t ∈ R^d, i.e., (e_h || e_t) ∈ R^2d, where || is the concatenation operator. An important family of sub-spaces of the virtual triple space are the two-dimensional spaces created from the j-th embedding dimension of head entities and the j-th dimension of tail entities, i.e., the j-th and (d + j)-th virtual triple space dimensions. We call them correlation subspaces, as they visualize the captured relation-specific dependencies of head and tail entity embeddings, as discussed below. Moreover, we call the correlation subspace spanned by the j-th and (d + j)-th virtual triple space dimensions the j-th correlation subspace.

Parameter Interpretation. Inequalities 1 and 2 each construct an intersection of two parallel half-spaces in any correlation subspace of the virtual triple space. We call the intersection of two parallel half-spaces a band, as it is limited by two parallel boundaries. Henceforth, we denote by v(j) the j-th dimension of a vector v. For example, (e_h(j) - c_i^h(j) - r_i^t(j) * e_t(j))^|.| ≤ d_i^h(j) defines a band in the j-th correlation subspace. The intersection of two bands results either in a band (if one band subsumes the other) or a parallelogram. Since we are interested in constructing ExpressivE embeddings that capture certain inference patterns, it suffices to consider parallelograms for these constructions. Figure 1a visualizes a relation parallelogram (green solid) and its parameters (orange dashed) in the j-th correlation subspace. In essence, the parallelogram is the result of the intersection of two bands (thick blue and magenta lines), where the boundaries' slopes are defined by r_i^p, the center of the parallelogram by c_i^p, and the widths of the bands by d_i^p.
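The band-and-parallelogram construction above can be made concrete per correlation subspace. The sketch below uses assumed names (`in_band`, `params` as a list of per-dimension parameter dicts); the key point it demonstrates is that membership in the full hyper-parallelogram factorizes into a conjunction over the correlation subspaces.

```python
def in_band(x, y, c, slope, width):
    # One band in a correlation subspace: |x - c - slope*y| <= width.
    return abs(x - c - slope * y) <= width

def in_parallelogram_1d(eh_j, et_j, p):
    # Intersection of the two bands of Inequalities (1) and (2),
    # restricted to the j-th correlation subspace.
    return (in_band(eh_j, et_j, p["c_h"], p["r_t"], p["d_h"])
            and in_band(et_j, eh_j, p["c_t"], p["r_h"], p["d_t"]))

def in_hyper_parallelogram(e_h, e_t, params):
    # Dimensions are independent, so the full membership test is the
    # conjunction of the per-correlation-subspace tests.
    return all(in_parallelogram_1d(e_h[j], e_t[j], params[j])
               for j in range(len(e_h)))

# Two-dimensional example (illustrative parameter values).
params = [dict(c_h=0.0, c_t=0.0, r_h=1.0, r_t=1.0, d_h=1.0, d_t=1.0),
          dict(c_h=0.0, c_t=0.0, r_h=1.0, r_t=1.0, d_h=0.5, d_t=0.5)]
```

A point can lie inside the parallelogram of one correlation subspace and outside another, in which case the full test fails.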
Since Inequalities 1 and 2 solely capture dependencies within the same dimension, any two different dimensions j ≠ k of head and tail entity embeddings are independent. Thus, relations are embedded as hyper-parallelograms in the virtual triple space whose edges are crooked solely within each j-th correlation subspace. Intuitively, the crooked edges represent relation-specific dependencies between head and tail entities and are thus vital for the expressive power of ExpressivE. Note that each correlation subspace represents one dimension of the element-wise Inequalities 1 and 2. Since the correlation subspaces jointly cover all dimensions of Inequalities 1 and 2, it is sufficient to analyze all correlation subspaces to identify the inference patterns captured by an ExpressivE model.

Scoring Function. Let τ_{r_i(h,t)} denote the embedding of a triple r_i(h, t), i.e., τ_{r_i(h,t)} = (e_ht - c_i^ht - r_i^th ⊙ e_th)^|.|, with e_xy = (e_x || e_y) and a_i^xy = (a_i^x || a_i^y) for a ∈ {c, r, d} and x, y ∈ {h, t}. The distance function is:

D(h, r_i, t) = τ_{r_i(h,t)} ⊘ w_i,        if τ_{r_i(h,t)} ⪯ d_i^ht    (3)
D(h, r_i, t) = τ_{r_i(h,t)} ⊙ w_i - k,    otherwise

Equation 3 states the typical distance function of spatial KGEs (Abboud et al., 2020), where w_i = 2 ⊙ d_i^ht + 1 is a width-dependent factor and k = 0.5 ⊙ (w_i - 1) ⊙ (w_i - 1 ⊘ w_i). If a triple r_i(h, t) is captured to be true by an ExpressivE embedding, i.e., if τ_{r_i(h,t)} ⪯ d_i^ht, then the distance correlates inversely with the hyper-parallelogram's width, keeping distances and gradients low within the parallelogram. Otherwise, the distance correlates linearly with the width to penalize points outside larger parallelograms. Appendix J provides further details on the distance function. The scoring function is defined as s(h, r_i, t) = -||D(h, r_i, t)||_2. Following Abboud et al. (2020), we optimize the self-adversarial negative sampling loss (Sun et al., 2019) using the Adam optimizer (Kingma & Ba, 2015). More details on the training setup are provided in Appendix M.
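Equation 3 and the scoring function can be sketched element-wise as follows; the dict layout bundling c_i^ht, r_i^th, and d_i^ht is an illustrative assumption. Note that with w = 2d + 1 and k = 0.5(w - 1)(w - 1/w), the two branches agree at the boundary τ = d (both evaluate to d/w), so D is continuous.

```python
import math

def distance(e_h, e_t, rel):
    """Per-dimension distance of Equation (3) in the virtual triple space:
      tau = |e_ht - c_ht - r_th * e_th|      (element-wise)
      w   = 2*d_ht + 1,  k = 0.5*(w - 1)*(w - 1/w)
      D_j = tau_j / w_j        if tau_j <= d_ht_j   (inside)
          = tau_j * w_j - k_j  otherwise            (outside)"""
    e_ht, e_th = e_h + e_t, e_t + e_h  # concatenations (h||t) and (t||h)
    out = []
    for j in range(len(e_ht)):
        tau = abs(e_ht[j] - rel["c_ht"][j] - rel["r_th"][j] * e_th[j])
        w = 2.0 * rel["d_ht"][j] + 1.0
        k = 0.5 * (w - 1.0) * (w - 1.0 / w)
        out.append(tau / w if tau <= rel["d_ht"][j] else tau * w - k)
    return out

def score(e_h, e_t, rel):
    # s(h, r_i, t) = -||D(h, r_i, t)||_2
    return -math.sqrt(sum(x * x for x in distance(e_h, e_t, rel)))

# One-dimensional relation (2d = 2 virtual dimensions), illustrative values.
rel_i = {"c_ht": [0.0, 0.0], "r_th": [1.0, 1.0], "d_ht": [1.0, 1.0]}
```

With d_ht = 1 we get w = 3 and k = 8/3: a boundary point with τ = 1 scores 1/3 under both branches, while τ = 3 (outside) scores 3·3 - 8/3 = 19/3.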

5. KNOWLEDGE CAPTURING CAPABILITIES

This section analyzes ExpressivE's expressive power and supported patterns. In what follows, we assume the standard definition of capturing patterns (Sun et al., 2019; Abboud et al., 2020) . This means intuitively that a KGE captures a pattern if a set of parameters exists such that the pattern is captured exactly and exclusively. Appendix C formalizes this notion for our model.

5.1. EXPRESSIVENESS

This section analyzes whether ExpressivE is fully expressive (Abboud et al., 2020) , i.e., can capture any graph G over R and E. Theorem 5.1 proves that this is the case by constructing for any graph G an ExpressivE embedding that captures any triple within G to be true and any other triple to be false. Specifically, the proof uses induction, starting with an embedding that captures the complete graph, i.e., any triple over E and R is true. Next, each induction step shows that we can alter the embedding to make an arbitrarily picked triple of the form r i (e j , e k ) with r i ∈ R, e j , e k ∈ E and e j ̸ = e k false. Finally, we add |E| * |R| dimensions to make any self-loop -i.e., any triple of the form r i (e j , e j ) with r i ∈ R and e j ∈ E -false. The full, quite technical proof can be found in Appendix D. 

5.2. INFERENCE PATTERNS

This section proves that ExpressivE can capture any pattern from Table 1. First, we discuss how ExpressivE represents inference patterns with at most two variables. Next, we introduce the notion of compositional definition and identify how this pattern is described in the virtual triple space. Then, we define general composition, building on both compositional definition and hierarchy. Finally, we conclude this section by discussing the key properties of ExpressivE.

Two-Variable Patterns. Figure 1b displays several one-dimensional relation embeddings and their captured patterns in a correlation subspace. Intuitively, ExpressivE represents:

(1) symmetry patterns r1(X, Y) ⇒ r1(Y, X) via symmetric hyper-parallelograms,
(2) anti-symmetry patterns r1(X, Y) ⇒ ¬r1(Y, X) via hyper-parallelograms that do not overlap with their mirror image,
(3) inversion patterns r1(X, Y) ⇔ r2(Y, X) via r2's hyper-parallelogram being the mirror image of r1's,
(4) hierarchy patterns r1(X, Y) ⇒ r2(X, Y) via r2's hyper-parallelogram subsuming r1's,
(5) intersection patterns r1(X, Y) ∧ r2(X, Y) ⇒ r3(X, Y) via r3's hyper-parallelogram subsuming the intersection of r1's and r2's, and
(6) mutual exclusion patterns r1(X, Y) ∧ r2(X, Y) ⇒ ⊥ via mutually exclusive hyper-parallelograms of r1 and r2.

We have formally proven in Theorem 5.2 that ExpressivE can capture any of these two-variable inference patterns (see Appendices F and G).

Compositional Definition. A compositional definition pattern is of the form r1(X, Y) ∧ r2(Y, Z) ⇔ r_d(X, Z), where we call r1 and r2 the composing and r_d the compositionally defined relation. In essence, this pattern defines a relation r_d that describes the start and end entities of a path X -r1-> Y -r2-> Z. Since any two relations r1 and r2 can instantiate the body of a compositional definition pattern, any such pair may produce a new compositionally defined relation r_d.
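The mirror-image intuition behind pattern (1) above can be verified numerically in one dimension: a parallelogram that is its own mirror image across the identity line (equal centers, slopes, and widths for both arity positions) captures symmetry. The helper name `member` and the concrete parameter values are assumptions made for this sketch.

```python
def member(eh, et, c_h, c_t, r_h, r_t, d_h, d_t):
    # One-dimensional instance of Inequalities (1) and (2).
    return abs(eh - c_h - r_t * et) <= d_h and abs(et - c_t - r_h * eh) <= d_t

# Symmetric configuration: c_h = c_t, r_h = r_t, d_h = d_t.  Swapping the
# head and tail coordinates swaps the two conjuncts, so membership of (x, y)
# and of (y, x) coincide, i.e., r1(X, Y) => r1(Y, X) is captured.
sym = dict(c_h=0.0, c_t=0.0, r_h=0.5, r_t=0.5, d_h=1.0, d_t=1.0)
grid = [(-2 + 0.5 * i, -2 + 0.5 * j) for i in range(9) for j in range(9)]
```

Checking every grid point confirms that the region is invariant under reflection across the identity line.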
Interestingly, compositional definition translates analogously into the virtual triple space: intuitively, the embeddings of any two relations r1 and r2 define for r_d a convex region, which we call the compositionally defined region, that captures r1(X, Y) ∧ r2(Y, Z) ⇔ r_d(X, Z), leading to Theorem 5.3 (proven in Appendix E).

Theorem 5.3. Let r1, r2, r_d ∈ R be relations, s1, s2 be their ExpressivE embeddings, and assume r1(X, Y) ∧ r2(Y, Z) ⇔ r_d(X, Z) holds. Then there exists a region s_d in the virtual triple space R^2d such that (i) s1, s2, and s_d capture r1(X, Y) ∧ r2(Y, Z) ⇔ r_d(X, Z) and (ii) s_d is convex.

Based on this insight, ExpressivE captures compositional definition patterns by embedding the compositionally defined relation r_d with the compositionally defined region determined by the relation embeddings of r1 and r2. We have formally proven that ExpressivE can capture compositional definition in Theorem 5.4 (see Appendices F and G).

General Composition. In contrast to compositional definition, general composition r1(X, Y) ∧ r2(Y, Z) ⇒ r3(X, Z) does not specify the composed relation r3 completely. Specifically, general composition allows the relation r3 to include additional entity pairs not described by the start and end entities of the path X -r1-> Y -r2-> Z. Therefore, to capture general composition, we need to combine hierarchy and compositional definition. Formally, this means that we express general composition as: {r1(X, Y) ∧ r2(Y, Z) ⇔ r_d(X, Z), r_d(X, Y) ⇒ r3(X, Y)}. We have proven that ExpressivE can capture general composition in Theorem 5.4 (see Appendices F and G for the full proofs).

Theorem 5.4. ExpressivE captures compositional definition and general composition.

We argue that hierarchy and general composition are tightly connected, as hierarchies are hidden within general composition.
If, for instance, r1 represented the relation that solely captures self-loops, then the general composition r1(X, Y) ∧ r2(Y, Z) ⇒ r3(X, Z) would reduce to the hierarchy r2(X, Y) ⇒ r3(X, Y). This hints at why our model is the first to support general composition: ExpressivE can capture both hierarchy and composition jointly in a single embedding space.

Key Properties. ExpressivE's way of capturing inference patterns has several interesting implications:

1. ExpressivE embeddings offer an intuitive geometric interpretation: there is a natural correspondence between (a) relations in the KG and regions (representing mathematical relations) in the virtual triple space, (b) relation containment, intersection, and disjointness in the KG and region containment, intersection, and disjointness in the virtual triple space, (c) symmetry, anti-symmetry, and inversion in the KG and symmetry, anti-symmetry, and reflection in the virtual triple space, and (d) compositional definition in the KG and the composition of mathematical relations in the virtual triple space.

2. ExpressivE captures a general composition pattern if the hyper-parallelogram of the pattern's head relation subsumes the compositionally defined region of its body relations. Thereby, ExpressivE assigns a novel spatial interpretation to general composition patterns, generalizing the spatial interpretation that is directly provided by set-theoretic patterns such as hierarchy, intersection, and mutual exclusion.

3. Capturing general composition patterns through the subsumption of spatial regions allows ExpressivE to provably capture composition patterns for 1-N, N-1, and N-N relations. We provide further empirical evidence for this in Appendix I.1.

6. EXPERIMENTAL EVALUATION AND SPACE COMPLEXITY

In this section, we evaluate ExpressivE on the standard KGC benchmarks WN18RR (Dettmers et al., 2018) and FB15k-237 (Toutanova & Chen, 2015) and report SotA results, providing strong empirical evidence for the theoretical strengths of ExpressivE. Furthermore, we perform an ablation study on ExpressivE's parameters to quantify the importance of each parameter and finally perform a relation-wise performance comparison on WN18RR to provide an in-depth analysis of our results.

6.1. KNOWLEDGE GRAPH COMPLETION

Experimental Setup. As in Abboud et al. (2020), we compare ExpressivE to the functional models TransE (Bordes et al., 2013) and RotatE (Sun et al., 2019), the spatial model BoxE (Abboud et al., 2020), and the bilinear models DistMult (Yang et al., 2015a), ComplEx (Trouillon et al., 2016), and TuckER (Balazevic et al., 2019). ExpressivE is trained with gradient descent for up to 1000 epochs, stopping training if after 100 epochs the Hits@10 score did not increase by at least 0.5% for WN18RR and 1% for FB15k-237. We use the model of the final epoch for testing. Each experiment was repeated three times to account for small performance fluctuations; in particular, the MRR values fluctuate by less than 0.003 between runs for any dataset. We maintain the fairness of our result comparison by considering KGEs with a dimensionality d ≤ 1000 (Balazevic et al., 2019; Abboud et al., 2020). To allow a direct comparison of ExpressivE's performance and parameter efficiency with its closest functional relative RotatE and spatial relative BoxE, we employ the same embedding dimensionality for the benchmarks as RotatE and BoxE. Appendix M lists further setup details, hyperparameters, libraries (Ali et al., 2021), hardware details, definitions of metrics, and properties of datasets.

Table 2: KGC results of ExpressivE compared with RotatE (Sun et al., 2019), BoxE (Abboud et al., 2020), DistMult and ComplEx (Ruffinelli et al., 2020; Yang et al., 2015b), and TuckER (Balazevic et al., 2019), reporting H@1, H@3, H@10, and MRR on WN18RR and FB15k-237 per model family.
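The early-stopping rule described in the setup above can be sketched as a patience check over the Hits@10 history. This is an illustrative reading, not the paper's code: the function name, history layout, and the interpretation of the threshold as a relative gain are assumptions.

```python
def should_stop(hits_at_10_history, window=100, min_rel_gain=0.005):
    """Stop once the best Hits@10 of the last `window` epochs fails to beat
    the best value seen before that window by at least `min_rel_gain`
    (read here as a relative 0.5% gain for WN18RR; 1% for FB15k-237)."""
    if len(hits_at_10_history) <= window:
        return False  # not enough epochs yet to judge improvement
    before = max(hits_at_10_history[:-window])
    recent = max(hits_at_10_history[-window:])
    return recent < before * (1.0 + min_rel_gain)
```

A flat history triggers the stop once it exceeds the window, while a steadily improving one keeps training.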

Results. Tables 2 and 3 reveal that Functional ExpressivE, with only half the number of parameters of BoxE and RotatE, performs best among spatial and functional models on FB15k-237 and is competitive with TuckER, especially in MRR. Even more, Base ExpressivE significantly outperforms all competing models on WN18RR. The significant performance increase of Base ExpressivE on WN18RR is likely due to WN18RR containing both hierarchy and composition patterns, in contrast to FB15k-237 (similar to the discussion of Abboud et al. (2020)). We empirically investigate the reasons for ExpressivE's performance on FB15k-237 and WN18RR in Sections 6.2 and 6.3.

Discussion. Tables 2 and 3 reveal that ExpressivE is highly parameter-efficient compared to related spatial and functional models while reaching competitive performance on FB15k-237 and even new SotA performance on WN18RR, supporting the extensive theoretical results of our paper.

6.2. ABLATION STUDY

This section analyses how constraints on ExpressivE's parameters impact its benchmark performance. Specifically, we analyze the following constrained ExpressivE versions:

(1) Base ExpressivE, which represents ExpressivE without any parameter constraints;
(2) Functional ExpressivE, where the width parameter d_i^ht of each relation r_i is zero;
(3) EqSlopes ExpressivE, where all slope vectors are constrained to be equal, i.e., r_i^ht = r_k^ht for any relations r_i and r_k;
(4) NoCenter ExpressivE, where the center vector c_i^ht of any relation r_i is zero; and
(5) OneBand ExpressivE, where each relation is embedded by solely one band instead of two, i.e., OneBand ExpressivE captures a triple r_i(e_h, e_t) to be true if its relation and entity embeddings satisfy Inequality 1 alone.

This result hints at FB15k-237 not containing many hierarchy patterns; thus, FB15k-237 cannot exploit the added capabilities of Base ExpressivE, namely the ability to capture general composition and hierarchy. In contrast, the significant performance gain of Base ExpressivE over Functional ExpressivE on WN18RR is likely due to WN18RR containing many composition and hierarchy patterns ((Abboud et al., 2020), cf. Appendix I.2), exploiting Base ExpressivE's added capabilities.
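The parameter constraints of variants (2)-(4) above can be expressed as simple transformations of a relation-parameter table. The sketch below is illustrative only: the function name, the `{name: {"c_ht", "d_ht", "r_ht"}}` layout, and the variant keys are assumptions, not the paper's implementation.

```python
import copy

def apply_variant(params, variant):
    """Return a constrained copy of the relation parameters; "base"
    leaves all parameters free (variant (1))."""
    p = copy.deepcopy(params)
    if variant == "functional":      # (2) zero widths
        for rel in p.values():
            rel["d_ht"] = [0.0] * len(rel["d_ht"])
    elif variant == "eqslopes":      # (3) one shared slope vector
        shared = next(iter(p.values()))["r_ht"]
        for rel in p.values():
            rel["r_ht"] = list(shared)
    elif variant == "nocenter":      # (4) zero centers
        for rel in p.values():
            rel["c_ht"] = [0.0] * len(rel["c_ht"])
    return p

# Two toy relations (illustrative values).
params = {"r1": {"c_ht": [1.0, 2.0], "d_ht": [0.5, 0.5], "r_ht": [1.0, 1.0]},
          "r2": {"c_ht": [0.0, 0.0], "d_ht": [1.0, 2.0], "r_ht": [2.0, 3.0]}}
```

Variant (5), OneBand, is a change to the truth condition rather than the parameters, so it is not modeled here.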

6.3. WN18RR PERFORMANCE ANALYSIS

This section analyses the performance of ExpressivE and its closest spatial relative BoxE (Abboud et al., 2020) and functional relative RotatE (Sun et al., 2019) 

REPRODUCIBILITY STATEMENT

We have made our code publicly available in a GitHub repository. In addition to the code of ExpressivE, it contains a setup file to install the necessary libraries and a ReadMe.md file with library versions and running instructions to facilitate the reproducibility of our results. Furthermore, we have provided all information for reproducing our results, including the concrete hyperparameters, further details of our experimental setup, the used libraries (Ali et al., 2021), hardware details, definitions of metrics, and properties of datasets, in Appendix M. We have provided the complete proofs for our extensive theoretical results in the appendix and stated the complete set of assumptions we made. Specifically, each theorem states any necessary assumption, and each proof starts by listing any property we assume without loss of generality. We have proven Theorem 5.1 in Appendix D, Theorem 5.3 in Appendix E, and Theorems 5.2 and 5.4 in Appendices F and G.

A OVERVIEW OF THE APPENDIX

... m = (M, f_h, f_v).

Definition of Truth.

A triple r_i(e_h, e_t), with r_i ∈ R and e_h, e_t ∈ E, holds in some m iff Inequalities 1 and 2 hold for the embeddings assigned to e_h, e_t, and r_i. More specifically, Inequalities 1 and 2 need to hold for f_v(e_h, e_t) = (f_e(e_h) || f_e(e_t)) = (e_h || e_t) and f_h(r_i) = (c_i^ht, d_i^ht, r_i^th), with c_i^ht = (c_i^h || c_i^t), d_i^ht = (d_i^h || d_i^t), and r_i^th = (r_i^t || r_i^h). At an intuitive level, this means that a triple r_i(e_h, e_t) is true in some complete model configuration m iff the virtual pair embedding f_v(e_h, e_t) of entities e_h and e_t lies within the hyper-parallelogram of relation r_i defined by f_h(r_i).

Simplifying Notations. To simplify the upcoming proofs, we denote by f_v(e_h, e_t) ∈ f_h(r_i) that the virtual pair embedding f_v(e_h, e_t) ∈ R^2d of an entity pair (e_h, e_t) ∈ E × E lies within the hyper-parallelogram of some relation r_i ∈ R, defined by f_h(r_i) ⊆ R^2d × R^2d × R^2d, in the virtual triple space. Accordingly, for sets of virtual pair embeddings P := {f_v(e_h1, e_t1), ..., f_v(e_hn, e_tn)}, we denote by P ⊆ f_h(r_i) that all virtual pair embeddings of P lie within the hyper-parallelogram of the relation r_i. Furthermore, we denote by f_v(e_h, e_t) ∉ f_h(r_i) that a virtual pair embedding f_v(e_h, e_t) does not lie within the hyper-parallelogram of a relation r_i, and by P ⊄ f_h(r_i) that an entire set of virtual pair embeddings P does not lie within it.

Capturing Inference Patterns. Based on the previous definitions, we define capturing patterns formally: A relation configuration m_h captures a pattern ψ exactly if for any ground pattern ϕ_B1 ∧ ... ∧ ϕ_Bm ⇒ ϕ_H within the deductive closure of ψ and for any instantiation of f_e and f_v the following conditions are satisfied:

• if ϕ_H is a triple and m_h captures the body triples to be true, i.e., f_v(args(ϕ_B1)) ∈ f_h(rel(ϕ_B1)), ..., f_v(args(ϕ_Bm)) ∈ f_h(rel(ϕ_Bm)), then m_h also captures the head triple to be true, i.e., f_v(args(ϕ_H)) ∈ f_h(rel(ϕ_H));
• if ϕ_H = ⊥, then m_h captures at least one of the body triples to be false, i.e., there is some j ∈ {1, ..., m} such that f_v(args(ϕ_Bj)) ∉ f_h(rel(ϕ_Bj));

where args() is the function that returns the arguments of a triple and rel() is the function that returns the relation of the triple. Furthermore, a relation configuration m_h captures a pattern ψ exactly and exclusively if (1) m_h exactly captures ψ and (2) m_h does not capture any positive pattern ϕ (i.e., ϕ ∈ {symmetry, inversion, hierarchy, intersection, composition}) with ψ ⊭ ϕ, except where the body of ϕ is not satisfied over m_h.

Discussion. In the following, we provide some intuition for the above definition of capturing a pattern. Capturing a pattern exactly is defined straightforwardly by adhering to the semantics of logical implication ϕ := ϕ_B ⇒ ϕ_H: a relation configuration m_h needs to be found such that for any complete model configuration m over m_h, if the body ϕ_B of the pattern is satisfied, then its head ϕ_H can be inferred. Capturing a pattern exactly and exclusively imposes additional constraints. Here, we do not solely aim at capturing a pattern but additionally at showing that a pattern can be captured independently of any other pattern. Therefore, some notion of minimality/exclusiveness of a pattern is needed. As in Abboud et al. (2020), we define minimality as capturing solely those positive patterns ϕ that directly follow from the deductive closure of the pattern ψ, except for those ϕ that are captured trivially, i.e., those whose body is not satisfied over the constructed m_h. As presented in Section 5, we can express any supported pattern by means of spatial relations of the corresponding relation hyper-parallelograms in the virtual triple space.

Therefore, we formulate exclusiveness intuitively as the ability to limit the intersection of hyper-parallelograms to only those intersections that directly follow from the captured pattern ψ for any known relation r_i ∈ R, which is in accordance with BoxE's notion of exclusiveness (Abboud et al., 2020). Note that our definition of capturing patterns depends solely on relation configurations. This is vital for ExpressivE to capture patterns in a lifted manner, i.e., without the need to ground them first. Capturing patterns in a lifted way is not only efficient but also natural, as we aim at capturing patterns between relations; it would thus be unnatural if constraints on entity embeddings were necessary to capture such relation-specific patterns. As outlined in the previous paragraphs, our definition is in accordance with the literature, focuses on efficiently capturing patterns, and gives us a formal foundation for the upcoming proofs, which show that ExpressivE can capture various logical patterns.

D PROOF OF FULL EXPRESSIVENESS

In this section, we prove Theorem 5.1. We will show by induction that ExpressivE is fully expressive. We will first only consider self-loop-free triples, i.e., triples of the form r i (e j , e k ) with e j , e k ∈ E, r i ∈ R and j ̸ = k and later remove unwanted self-loops from the constructed model configuration. Since our proof is highly technical, we will first give some general intuition and then formally state our proof. In the base case, we consider an ExpressivE model that captures the complete graph G over the entity vocabulary E and the relationship vocabulary R, i.e., the graph that contains all triples from the universe. In the induction step, we prove that we can adjust our ExpressivE model to make any arbitrary self-loop-free triple of G false while maintaining the truth value of any other triple in the universe. In the induction step, we make triples r i (e j , e k ) false by translating the entity embeddings of e j and e k such that a hyper-parallelogram can separate pairs of entity embeddings that shall be true from those that shall be false. Afterward, we translate and shear r i 's hyper-parallelogram to match such a separating shape. Finally, after the induction step, we add a separate dimension for any possible self-loop, i.e., triple of the form r i (e j , e j ) such that we can make any self-loop false. Thereby, we show that ExpressivE can make any triple false and thus that ExpressivE can capture any graph G over R and E. Our proof shares some common ideas with the fully expressiveness proof of BoxE (Abboud et al., 2020) , yet differs dramatically in many aspects. BoxE embeds relations with two axis-aligned boxes and entities with two separate embedding vectors, which greatly simplifies the fully expressiveness proof of BoxE, as the two entity embeddings are independent of each other. 
This grants BoxE some flexibility for adapting model configurations, yet imposes substantial restrictions, such as that BoxE cannot capture any notion of composition patterns. Our model does not have these restrictions and uses only one embedding vector per entity instead, pushing the complexity of our model to the relation embeddings by representing relations as hyper-parallelograms in the virtual triple space. This, however, has the consequence that we cannot easily change entity embeddings without moving and shearing relation embeddings as well when we want to make solely one triple false while preserving the truth value of any other triple. In the following proof, we will explain this complex adjustment of relation embeddings and further novel aspects in more detail. We start our proof by making the following assumptions without loss of generality: 1. Any relation $r_i \in R$ and entity $e_j \in E$ is indexed with $0 \le i \le |R|-1$ and $0 \le j \le |E|-1$, respectively.

2. The dimensionality of each relation and entity embedding vector is $|E| \cdot |R|$.

Furthermore, $v(i,j)$ denotes dimension $i \cdot |E| + j$ of a vector $v$. Intuitively, the dimensions $v(i,0), \dots, v(i,|E|-1)$ are reserved for relation $r_i$.

3. The slope vectors of any relation $r_i \in R$ are positive, i.e., $r^h_i, r^t_i > 0$.

4. Any entity embedding is positive, i.e., for any entity $e_k \in E$ it holds that $e_k > 0$.

5. For any pair of entities $e_{k_1}, e_{k_2} \in E$ it holds that $e_{k_1}(i, k_1) \ge e_{k_2}(i, k_1) + m$, with margin $m > 0$.

Building on these assumptions, we prove full expressiveness by induction as follows. Base case. We initialize the graph G as the whole universe over E and R and construct a complete model configuration $m = (M, f_h, f_v)$ of dimensionality $|E| \cdot |R|$ such that G is captured and all assumptions are satisfied. Concretely, for any dimension $(i, k_1)$ with $0 \le i \le |R|-1$ and $0 \le k_1 \le |E|-1$, we set the entity embedding with index $k_1$ to $e_{k_1}(i, k_1) = 2$ and any entity embedding with index $k_2 \ne k_1$ to $e_{k_2}(i, k_1) = 1$. Furthermore, for any dimension $(i,k)$ with $0 \le i \le |R|-1$ and $0 \le k \le |E|-1$, we set the embedding of relation $r_i$ to $c^h_i(i,k) = c^t_i(i,k) = 0$, $r^h_i(i,k) = 1$, $r^t_i(i,k) = 2$, and $d^h_i(i,k) = d^t_i(i,k) = 4$. It can easily be verified that the constructed complete model configuration satisfies all assumptions and makes every triple over R and E true. Note that, in particular, every self-loop is also captured as true in the constructed complete model configuration. Induction step. In the induction step, we adjust the entity and relation embeddings of the complete model configuration such that a single triple $r_i(e_j, e_k)$ is made false without affecting the truth value of any other triple in the graph G. We denote an adjusted embedding by $v^*$ and its old value by $v$, and perform the following adjustments: 1.
Increase the slope vector $r^{t*}_i(i,k) := r^t_i(i,k) + \Delta r^t_i$ with $\Delta r^t_i > 0$ chosen such that: $e_j(i,k) - r^t_i(i,k)\,e_k(i,k) - c^h_i(i,k) - \Delta r^t_i m \le -d^h_i(i,k)$.

2. Since $e_k(i,k)$ is by assumption the largest value in dimension $(i,k)$, we can define the following two values, where $\Delta r^{ub}_i < \Delta r^{max}_i$: $\Delta r^{max}_i := \Delta r^t_i\, e_k(i,k)$ and $\Delta r^{ub}_i := \Delta r^t_i\,(e_k(i,k) - m)$.

3. Using these definitions, we increase all entity embeddings $e_{j'}$ with $j' \ne j$ in dimension $(i,k)$ by: $e^*_{j'}(i,k) := e_{j'}(i,k) + \Delta r^{max}_i$.

4. Furthermore, we increase the entity embedding $e_j$ in dimension $(i,k)$ by: $e^*_j(i,k) := e_j(i,k) + \Delta r^{ub}_i$, as used in Inequality 6 below.

5. For any relation with index $i' \ne i$, we adjust its head band in dimension $(i,k)$ by moving its center downwards and growing the band upwards. Formally, with $s := r^t_{i'}(i,k)\,\Delta r^t_i m + \Delta r^{max}_i$, we update: $d^{h*}_{i'}(i,k) := d^h_{i'}(i,k) + \frac{s}{2}$ and $c^{h*}_{i'}(i,k) := c^h_{i'}(i,k) - r^t_{i'}(i,k)\,\Delta r^{max}_i + \frac{s}{2}$.

6. We adjust any tail band in dimension $(i,k)$ by moving its center downwards and growing the band upwards. Formally, with $s := r^h_{i'}(i,k)\,\Delta r^t_i m + \Delta r^{max}_i$, we update: $d^{t*}_{i'}(i,k) := d^t_{i'}(i,k) + \frac{s}{2}$ and $c^{t*}_{i'}(i,k) := c^t_{i'}(i,k) - r^h_{i'}(i,k)\,\Delta r^{max}_i + \frac{s}{2}$.

7. For the relation with index $i$, we adjust its head band in dimension $(i,k)$ by moving its center downwards and growing the band upwards. Formally, with $s := (\Delta r^t_i + r^t_i(i,k))\,\Delta r^t_i m + \Delta r^{max}_i$, we update: $d^{h*}_i(i,k) := d^h_i(i,k) + \frac{s}{2}$ and $c^{h*}_i(i,k) := c^h_i(i,k) - \Delta r^t_i\,\Delta r^{max}_i - r^t_i(i,k)\,\Delta r^{max}_i + \frac{s}{2}$.

In the induction step, we thus adjust the slope vector (Step 1), the entity embeddings (Steps 2-4), and the width and center embeddings (Steps 5-7). Intuitively, by changing the slope vector of relation $r_i$'s hyper-parallelogram, we shear the hyper-parallelogram.
Furthermore, we translate every desired entity embedding further than the undesired entity embedding of $e_j$. This allows us to draw a separating hyper-parallelogram between the point defined by $(e_j, e_k)$ and any other pair of entities that shall remain within relation $r_i$. Finally, we move the sheared hyper-parallelogram into the correct position and stretch it to make all desired triples true. Our next goal is to show this behavior formally. We first show that the initially true triple $r_i(e_j, e_k)$ is made false and then continue by showing that the truth value of any other triple is preserved. Since the induction step performs adjustments only in dimension $(i,k)$, we only have to consider dimension $(i,k)$ of any embedding vector in the following inequalities. Please note that, to state the inequalities concisely, we omit the index $(i,k)$ from any embedding vector $v$; for instance, we write $r^t_i$ for $r^t_i(i,k)$ henceforth. Let $s := (\Delta r^t_i + r^t_i)\Delta r^t_i m + \Delta r^{max}_i$. Then we can show that our induction step makes $r_i(e_j, e_k)$ false as follows:

$e_j - r^t_i e_k - c^h_i - \Delta r^t_i m \le -d^h_i$ (4)
$e_j - r^t_i e_k - c^h_i + \Delta r^{ub}_i - \Delta r^{max}_i - \Delta r^t_i \Delta r^{max}_i + \Delta r^t_i \Delta r^{max}_i - r^t_i \Delta r^{max}_i + r^t_i \Delta r^{max}_i + \frac{s}{2} - \frac{s}{2} \le -d^h_i$ (5)
$e_j + \Delta r^{ub}_i - (r^t_i + \Delta r^t_i)(e_k + \Delta r^{max}_i) - (c^h_i - \Delta r^t_i \Delta r^{max}_i - r^t_i \Delta r^{max}_i + \frac{s}{2}) \le -(d^h_i + \frac{s}{2})$ (6)
$e^*_j - r^{t*}_i e^*_k - c^{h*}_i \le -d^{h*}_i$ (7)

Inequality 4 follows directly from Induction Step 1. Next, in Inequality 5, we add terms that cancel each other and apply $\Delta r^{ub}_i - \Delta r^{max}_i = \Delta r^t_i(e_k - m) - \Delta r^t_i e_k = -m\Delta r^t_i$. Finally, in Inequality 6, we restructure the terms such that we can substitute the adjusted embedding vectors defined in Steps 1-7. Through this substitution, we obtain Inequality 7, which reveals that the adjusted embeddings $e^*_j$, $e^*_k$ do not lie within the adjusted hyper-parallelogram of relation $r_i$.
Therefore, we have shown that the adjustments of the complete model configuration listed in Steps 1-7 have made the triple r i (e j , e k ) false, as required. Next, we need to show that the truth value of any other self-loop-free triple r i ′ (e j ′ , e k ′ ) with j ′ ̸ = k ′ is not altered after the induction step. We start by showing that any triple r i ′ (e j ′ , e k ′ ) that is true in m remains true after the induction step. Since what follows is a highly technical proof, we give some intuition now. We make a case distinction of any possible true triple in G and perform the following steps. First, we assume that the triple is true and therefore instantiate Inequalities 1 and 2 with the embeddings prior to the induction step. Note that it is solely necessary to consider Inequality 1 as the proofs work vice versa for Inequality 2. Thus, we solely consider Inequality 1 henceforth. Next, we add terms that eliminate each other and adjustment terms a such that we can substitute our inequality with the adjusted embedding values v * . Finally, we show that Inequality 1 is satisfied for the adjusted embedding values. Note that Inequality 1 defines two inequalities, specifically e h -c h i -r t i ⊙ e t ⪯ d h i and e h -c h i -r t i ⊙ e t ⪰ -d h i . Therefore, we denote with (<) the proof for the first inequality and with (>) the proof for the second inequality. Thereby, we will show that if we assume the triple r i ′ (e j ′ , e k ′ ) to be true in the complete model configuration prior to the induction step, we can follow that r i ′ (e j ′ , e k ′ ) stays true after the adjustments of the induction step. To provide the complete formal side of our proof, we consider the following 12 cases: 1. Case i ′ = i, j ′ = j, k ′ = j, k ′ ̸ = k: (<) Let s := (∆r t i + r t i )∆r t i m + ∆r max i and let a := (∆r max i -∆r ub i )(1 -∆r t i -r t i ∆r ub ). Note that a is positive since a = ∆r t i m + ∆r max i holds. 
Therefore, we can perform the following transformations: e j -r t i e j -c h i ≤ d h i (8) e j -r t i e j -c h i -a + s -s ≤ d h i (9) e j + ∆r ub i -(r t i + ∆r t i )(e j + ∆r ub i ) -(c h i -∆r t i ∆r max i -r t i ∆r max i + s 2 ) ≤ d h i + s 2 (10) e * j -r t * i e * j -c h * i ≤ d h * i (11) (>) Let a := (∆r max i -∆r ub i )(∆r t i + r t i ) + ∆r ub i -∆r max i and let s := (∆r t i + r t i )∆r t i m + ∆r max i . Note that a is positive since (1) a = m∆r t i (∆r t i + r t i -1), (2) we initialize r t i in the base case to 2 in any dimension and (3) any induction step may only increase r t i . Therefore, we can perform the following transformations: e j -r t i e j -c h i ≥ -d h i (12) e j -r t i e j -c h i + a + s 2 - s 2 ≥ -d h i (13) e j + ∆r ub i -(r t i + ∆r t i )(e j + ∆r ub i ) -(c h i -∆r t i ∆r max i -r t i ∆r max i + s 2 ) ≥ -(d h i + s 2 ) (14) e * j -r t * i e * j -c h * i ≥ -d h * i (15) 2. Case i ′ = i, j ′ = j, k ′ ̸ = j, k ′ = k: As can be seen easily this case describes the triple r i (e j , e k ), which shall be made false in the induction step. We have shown that the induction step changes the triples truth value to false in Inequalities 4-7 and therefore omitted the case here.

3. Case

i ′ = i, j ′ = j, k ′ ̸ = j, k ′ ̸ = k: (<) Let s := (∆r t i + r t i )∆r t i m + ∆r max i and let a := ∆r t i e k ′ + s -∆r ub i . Note that a is positive since a = ∆r t i (e k ′ + m(1 + ∆r t i + r t i )) holds. Therefore, we can perform the following transformations: e j -r t i e k ′ -c h i ≤ d h i (16) e j -r t i e k ′ -c h i -a + ∆r t i ∆r max i -∆r t i ∆r max i + r t i ∆r max i -r t i ∆r max i ≤ d h i (17) e j + ∆r ub i -(r t i + ∆r t i )(e k ′ + ∆r max i ) -(c h i -∆r t i ∆r max i -r t i ∆r max i + s 2 ) ≤ d h i + s 2 (18) e * j -r t * i e * k ′ -c h * i ≤ d h * i ( ) (>) Let a := ∆r ub i -∆r t i e k ′ and let s := (∆r t i + r t i )∆r t i m + ∆r max i . Note that a is positive since ∆r ub i ≥ ∆r t i e k ′ holds. Therefore, we can perform the following transformations: e j -r t i e k ′ -c h i ≥ -d h i (20) e j -r t i e k ′ -c h i + a + ∆r t i ∆r max i -∆r t i ∆r max i + r t i ∆r max i -r t i ∆r max i + s 2 - s 2 ≥ -d h i (21) e j + ∆r ub i -(r t i + ∆r t i )(e k ′ + ∆r max i ) -(c h i -∆r t i ∆r max i -r t i ∆r max i + s 2 ) ≥ -(d h i + s 2 ) (22) e * j -r t * i e * k ′ -c h * i ≥ -d h * i (23) 4. Case i ′ = i, j ′ ̸ = j, k ′ = j, k ′ ̸ = k: (<) Let a := ∆r t i e j and let s := (∆r t i + r t i )∆r t i m + ∆r max i . Note that a is trivially positive since we initially assumed e j > 0 and since we assumed in Step 1 ∆r t i > 0. Therefore, we can perform the following transformations: e j ′ -r t i e j -c h i ≤ d h i (24) e j ′ -r t i e j -c h i -a + ∆r t i ∆r max i -∆r t i ∆r max i + r t i ∆r max i -r t i ∆r max i + s -s ≤ d h i (25) e j ′ + ∆r max i -(r t i + ∆r t i )(e j + ∆r ub i ) -(c h i -∆r t i ∆r max i -r t i ∆r max i + s 2 ) ≤ d h i + s 2 (26) e * j ′ -r t * i e * j -c h * i ≤ d h * i (>) Let a := ∆r max i -∆r t i e j + ∆r t i m(∆r t i + r t i ) and let s := (∆r t i + r t i )∆r t i m + ∆r max i . Note that a is positive since ∆r max i -∆r t i e j > 0. 
Therefore, we can perform the following transformations: e j ′ -r t i e j -c h i ≥ -d h i (28) e j ′ -r t i e j -c h i + a + ∆r t i ∆r max i -∆r t i ∆r max i + r t i ∆r max i -r t i ∆r max i + s 2 - s 2 ≥ -d h i (29) e j ′ + ∆r max i -(r t i + ∆r t i )(e j + ∆r ub i ) -(c h i -∆r t i ∆r max i -r t i ∆r max i + s 2 ) ≥ -(d h i + s 2 ) (30) e * j ′ -r t * i e * j -c h * i ≥ -d h * i (31) 5. Case i ′ = i, j ′ ̸ = j, k ′ ̸ = j, k ′ = k: (<) Let s := (∆r t i + r t i )∆r t i m + ∆r max i and let a := s + ∆r t i e k -∆r max i . Note that a is positive since a = ∆r t i (e k + m(∆r t i + r t i )) holds. Therefore, we can perform the following transformations: e j ′ -r t i e k -c h i ≤ d h i (32) e j ′ -r t i e k -c h i -a + ∆r t i ∆r max i -∆r t i ∆r max i + r t i ∆r max i -r t i ∆r max i ≤ d h i (33) e j ′ + ∆r max i -(r t i + ∆r t i )(e k + ∆r max i ) -(c h i -∆r t i ∆r max i -r t i ∆r max i + s 2 ) ≤ d h i + s 2 (34) e * j ′ -r t * i e * k -c h * i ≤ d h * i (35) (>) Let s := (∆r t i + r t i )∆r t i m + ∆r max i . Using this definition, we can perform the following transformations: e j ′ -r t i e k -c h i ≥ -d h i (36) e j ′ -r t i e k -c h i + ∆r max i -∆r max i + ∆r t i ∆r max i -∆r t i ∆r max i +r t i ∆r max i -r t i ∆r max i - s 2 ≥ -d h i - s 2 (37) e j ′ + ∆r max i -(r t i + ∆r t i )(e k + ∆r max i ) -(c h i -∆r t i ∆r max i -r t i ∆r max i + s 2 ) ≥ -d h i - s 2 (38) e * j ′ -r t * i e * k -c h * i ≥ -d h * i (39) 6. Case i ′ = i, j ′ ̸ = j, k ′ ̸ = j, k ′ ̸ = k: (<) Let s := (∆r t i + r t i )∆r t i m + ∆r max i and let a := s -∆r max i + ∆r t i e k ′ . Note that a is positive since a = ∆r t i (e k ′ + m(∆r t i + r t i )) holds. 
Therefore, we can perform the following transformations: e j ′ -r t i e k ′ -c h i ≤ d h i (40) e j ′ -r t i e k ′ -c h i -a + ∆r t i ∆r max i -∆r t i ∆r max i + r t i ∆r max i -r t i ∆r max i ≤ d h i (41) e j ′ + ∆r max i -(r t i + ∆r t i )(e k ′ + ∆r max i ) -(c h i -∆r t i ∆r max i -r t i ∆r max i + s 2 ) ≤ d h i + s 2 (42) e * j ′ -r t * i e * k ′ -c h * i ≤ d h * i (>) Let a := ∆r max i -∆r t i e k ′ and let s := (∆r t i + r t i )∆r t i m + ∆r max i . Therefore, we can perform the following transformations: e j ′ -r t i e k ′ -c h i ≥ -d h i (44) e j ′ -r t i e k ′ -c h i + a + ∆r t i ∆r max i -∆r t i ∆r max i + r t i ∆r max i -r t i ∆r max i + s 2 - s 2 ≥ -d h i (45) e j ′ + ∆r max i -(r t i + ∆r t i )(e k ′ + ∆r max i ) -(c h i -∆r t i ∆r max i -r t i ∆r max i + s 2 ) ≥ -(d h i + s 2 ) (46) e * j ′ -r t * i e * k ′ -c h * i ≥ -d h * i (47) 7. Case i ′ ̸ = i, j ′ = j, k ′ ̸ = j, k ′ = k: (<) Let s := r t i ′ ∆r t i m + ∆r max i and let a := s -∆r ub i . Note that a is positive since a = ∆r t i m(1 + r t i ′ )) holds. Therefore, we can perform the following transformations: e j -r t i ′ e k -c h i ′ ≤ d h i ′ (48) e j -r t i ′ e k -c h i ′ -a + r t i ′ ∆r max i -r t i ′ ∆r max i ≤ d h i ′ (49) e j + ∆r ub i -r t i ′ (e k + ∆r max i ) -(c h i ′ -r t i ′ ∆r max i + s 2 ) ≤ d h i ′ + s 2 (50) e * j -r t * i ′ e * k -c h * i ′ ≤ d h * i ′ (>) Let a := ∆r ub i and let s := r t i ′ ∆r t i m + ∆r max i . Note that a is trivially positive since ∆r ub i is positive. Therefore, we can perform the following transformations: e j -r t i ′ e k -c h i ′ ≥ -d h i ′ (52) e j -r t i ′ e k -c h i ′ + a + r t i ′ ∆r max i -r t i ′ ∆r max i + s 2 - s 2 ≥ -d h i ′ (53) e j + ∆r ub i -r t i ′ (e k + ∆r max i ) -(c h i ′ -r t i ′ ∆r max i + s 2 ) ≥ -(d h i ′ + s 2 ) (54) e * j -r t * i ′ e * k -c h * i ′ ≥ -d h * i ′ (55) 8. Case i ′ ̸ = i, j ′ = j, k ′ ̸ = j, k ′ ̸ = k: As can be seen easily this case generates the same inequalities as the previous case, except that k ′ = k. 
Therefore, no relevant difference has to be considered, which is why we omit this case. 9. Case (i ′ ̸ = i, j ′ ̸ = j, k ′ = j, k ′ ̸ = k): (<) Let s := r t i ∆r t i m + ∆r max i . Using this definition we can make the following transformations: e j ′ -r t i ′ e j -c h i ′ ≤ d h i ′ (56) e j ′ -r t i ′ e j -c h i ′ + s -s ≤ d h i ′ (57) e j ′ + ∆r max i -r t i ′ (e j + ∆r ub i ) -(c h i ′ -r t i ′ ∆r max i + s 2 ) ≤ d h i ′ + s 2 (58) e * j ′ -r t * i ′ e * j -c h * i ′ ≤ d h * i ′ (59) (>) Let a := ∆r max i + r t i (∆r max i -∆r ub i ) and let s := r t i ∆r t i m + ∆r max i . Note that a is positive since ∆r max i > ∆r ub i . Therefore, we can perform the following transformations: e j ′ -r t i ′ e j -c h i ′ ≥ -d h i ′ (60) e j ′ -r t i ′ e j -c h i ′ + a + s 2 - s 2 ≥ -d h i ′ (61) e j ′ + ∆r max i -r t i ′ (e j + ∆r ub i ) -(c h i ′ -r t i ′ ∆r max i + s 2 ) ≥ -(d h i ′ + s 2 ) (62) e * j ′ -r t * i ′ e * j -c h * i ′ ≥ -d h * i ′ (63) 10. Case i ′ ̸ = i, j ′ ̸ = j, k ′ ̸ = j, k ′ = k: (<) Let s := r t i ′ ∆r t i m + ∆r max i and let a := s -∆r max i . Note that a is positive since a = r t i ∆r t i m holds. Therefore, we can perform the following transformations: e j ′ -r t i ′ e k -c h i ′ ≤ d h i ′ (64) e j ′ -r t i ′ e k -c h i ′ -a -∆r max i + ∆r max i -r t i ′ ∆r max i + r t i ′ ∆r max i ≤ d h i ′ (65) e j ′ + ∆r max i -r t i ′ (e k + ∆r max i ) -(c h i ′ -r t i ′ ∆r max i + s 2 ) ≤ d h i ′ + s 2 (66) e * j ′ -r t * i ′ e * k -c h * i ′ ≤ d h * i ′ (67) (>) Let s := r t i ′ ∆r t i m + ∆r max i and a := ∆r max i . Note that a is trivially positive since ∆r max i is positive. Therefore, we can perform the following transformations: e j ′ -r t i ′ e k -c h i ′ ≥ -d h i ′ (68) e j ′ -r t i ′ e k -c h i ′ + a + r t i ′ ∆r max i -r t i ′ ∆r max i + s 2 - s 2 ≥ -d h i ′ (69) e j ′ + ∆r max i -r t i ′ (e k + ∆r max i ) -(c h i ′ -r t i ′ ∆r max i + s 2 ) ≥ -(d h i ′ + s 2 ) (70) e * j ′ -r t * i ′ e * k -c h * i ′ ≥ -d h * i ′ (71) 11. 
Case $i' \ne i$, $j' \ne j$, $k' \ne j$, $k' \ne k$: As can easily be seen, this case generates the same inequalities as the previous case, except that there $k' = k$ held. Therefore, no relevant difference has to be considered, which is why we omit this case. 12. Case $i' \ne i$, $j' = j$, $k' = j$, $k' \ne k$: (<) Let $s := r^t_{i'}\Delta r^t_i m + \Delta r^{max}_i$ and let $a := \Delta r^{max}_i - \Delta r^{ub}_i$. Note that $a$ is positive since $a = \Delta r^t_i m$. Therefore, we can perform the following transformations:

$e_j - r^t_{i'} e_j - c^h_{i'} \le d^h_{i'}$ (72)
$e_j - r^t_{i'} e_j - c^h_{i'} - a + \frac{s}{2} - \frac{s}{2} \le d^h_{i'}$ (73)
$e_j + \Delta r^{ub}_i - r^t_{i'}(e_j + \Delta r^{ub}_i) - (c^h_{i'} - r^t_{i'}\Delta r^{max}_i + \frac{s}{2}) \le d^h_{i'} + \frac{s}{2}$ (74)
$e^*_j - r^{t*}_{i'} e^*_j - c^{h*}_{i'} \le d^{h*}_{i'}$ (75)

(>) Let $s := r^t_{i'}\Delta r^t_i m + \Delta r^{max}_i$ and $a := \Delta r^{ub}_i + \Delta r^t_i m\, r^t_{i'}$. Note that $a$ is trivially positive since we assumed any parameter to be positive. Therefore, we can perform the following transformations:

$e_j - r^t_{i'} e_j - c^h_{i'} \ge -d^h_{i'}$ (76)
$e_j - r^t_{i'} e_j - c^h_{i'} + a + \frac{s}{2} - \frac{s}{2} \ge -d^h_{i'}$ (77)
$e_j + \Delta r^{ub}_i - r^t_{i'}(e_j + \Delta r^{ub}_i) - (c^h_{i'} - r^t_{i'}\Delta r^{max}_i + \frac{s}{2}) \ge -(d^h_{i'} + \frac{s}{2})$ (78)
$e^*_j - r^{t*}_{i'} e^*_j - c^{h*}_{i'} \ge -d^{h*}_{i'}$ (79)

We have shown in each of the twelve discussed cases that if a triple $r_{i'}(e_{j'}, e_{k'})$ with $i' \ne i$ or $j' \ne j$ or $k' \ne k$ was true in the model configuration prior to the induction step, then it is still true in the adjusted model configuration after the induction step. Hence, to show that ExpressivE can capture any self-loop-free graph, it remains to show that any triple that was false remains false after the induction step. To verify that an initially false triple $r_{i'}(e_{j'}, e_{k'})$ remains false, we solely need to show that the embeddings of $r_{i'}$, $e_{j'}$, and $e_{k'}$ violate at least one of Inequalities 1 and 2. We have to consider the following cases: 1. Case $k' \ne k$: Any changes to dimension $v(i,k)$ do not affect dimension $v(i',k')$.
Therefore, if $r_{i'}(e_{j'}, e_{k'})$ with $k' \ne k$ was false before the induction step, it remains false after the induction step, as we solely alter dimension $(i,k)$. 2. Case $k' = k$, $i' = i$: In this case, $j' \ne j$ needs to hold, as the triple $r_i(e_j, e_k)$ was initially assumed to be true. We can easily show that any such triple remains false as follows: Let $s := (\Delta r^t_i + r^t_i)\Delta r^t_i m + \Delta r^{max}_i$. Then our induction step keeps $r_i(e_{j'}, e_k)$ false as follows:

$e_{j'} - r^t_i e_k - c^h_i \le -d^h_i$ (80)
$e_{j'} - r^t_i e_k - c^h_i + \Delta r^{max}_i(1 - 1 + \Delta r^t_i - \Delta r^t_i + r^t_i - r^t_i) - \frac{s}{2} \le -d^h_i - \frac{s}{2}$ (81)
$e_{j'} + \Delta r^{max}_i - (r^t_i + \Delta r^t_i)(e_k + \Delta r^{max}_i) - (c^h_i - \Delta r^t_i\Delta r^{max}_i - r^t_i\Delta r^{max}_i + \frac{s}{2}) \le -d^h_i - \frac{s}{2}$ (82)
$e^*_{j'} - r^{t*}_i e^*_k - c^{h*}_i \le -d^{h*}_i$ (83)

Since we started with the complete graph, any triple that is false was made false by an earlier induction step. We have seen that if we apply our algorithm to make $r_i(e_j, e_k)$ false, then Inequality 7 holds. Since we assume that $r_i(e_{j'}, e_k)$ was false prior to the current induction step and Inequality 7 describes how induction steps make triples false, we can conclude that Inequality 80 needs to hold prior to this induction step. Next, in Inequality 81, we add terms that cancel each other. Finally, in Inequality 82, we restructure the terms such that we can substitute them for the adjusted embedding vectors defined in Steps 1-7. Through this substitution, we obtain Inequality 83, which reveals that the adjusted embeddings of $e^*_{j'}$ and $e^*_k$ do not lie within the adjusted hyper-parallelogram of relation $r_i$. Therefore, we have shown that the adjustments of the complete model configuration stated in Steps 1-7 preserve the falseness of the triples of this case. 3. Case $i' \ne i$: Any changes to dimension $v(i,k)$ do not affect dimension $v(i',k')$.
Therefore, if $r_{i'}(e_{j'}, e_{k'})$ with $i' \ne i$ was false before the induction step, it remains false after the induction step, as we solely alter dimension $(i,k)$. Hence, we have shown that we can make any self-loop-free triple false in the induction step while preserving the truth value of the remaining triples in G. To show full expressiveness, it remains to show that we can capture any graph G even with self-loops. We started our proof in the base case with a complete graph, which means that any self-loop was initially true. Furthermore, we have shown in Inequalities 8-15 and 72-79 that any true self-loop remains true after the induction step and that therefore any constructed complete model configuration captures every self-loop as true. Since there are only $|R| \cdot |E|$ possibilities to generate triples of the form $r_i(e_j, e_j)$ with $r_i \in R$ and $e_j \in E$, and since we require just a single dimension in which the embedding of the entity pair $(e_j, e_j)$ lies outside of $r_i$'s hyper-parallelogram to make the triple $r_i(e_j, e_j)$ false, we can simply add one dimension per self-loop to our embeddings, whose sole purpose is to exclude one undesired self-loop $r_i(e_j, e_j)$.
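The construction above can be sanity-checked numerically. The following is a toy sketch, not the proof itself: it instantiates the base case for a hypothetical setting with one relation and two entities (hence $1 \cdot 2 = 2$ dimensions), applies Steps 1-7 to make $r_0(e_0, e_1)$ false, and asserts that only that triple flips. All variable names and the concrete choice $\Delta r^t_i = 2$ are our own illustrative assumptions.

```python
import itertools

# Membership test from Inequalities 1-2: r(e_h, e_t) is true iff, in every
# dimension d, |e_h - c_h - r_t * e_t| <= d_h and |e_t - c_t - r_h * e_h| <= d_t.
def is_true(eh, et, rel):
    return all(
        abs(eh[d] - rel["c_h"][d] - rel["r_t"][d] * et[d]) <= rel["d_h"][d]
        and abs(et[d] - rel["c_t"][d] - rel["r_h"][d] * eh[d]) <= rel["d_t"][d]
        for d in range(2)
    )

# Base case: e_{k1}(i, k1) = 2, e_{k2}(i, k1) = 1; c = 0, r_h = 1, r_t = 2, d = 4.
E = {0: [2.0, 1.0], 1: [1.0, 2.0]}
rel = {"c_h": [0.0, 0.0], "c_t": [0.0, 0.0], "r_h": [1.0, 1.0],
       "r_t": [2.0, 2.0], "d_h": [4.0, 4.0], "d_t": [4.0, 4.0]}

# The complete model configuration captures every triple, self-loops included.
assert all(is_true(E[h], E[t], rel) for h, t in itertools.product(E, E))

# Induction step: make r(e_0, e_1) false, i.e., j = 0, k = 1, dimension (0, 1).
j, k, d, m = 0, 1, 1, 1.0
dr = 2.0                                     # satisfies e_j - r_t*e_k - c_h - dr*m <= -d_h
dr_max, dr_ub = dr * E[k][d], dr * (E[k][d] - m)   # Step 2
E[1][d] += dr_max                            # Step 3: entities j' != j move by dr_max
E[0][d] += dr_ub                             # Step 4: entity e_j moves by dr_ub
s = (dr + rel["r_t"][d]) * dr * m + dr_max   # Step 7: head band of relation i
rel["c_h"][d] += -dr * dr_max - rel["r_t"][d] * dr_max + s / 2
rel["d_h"][d] += s / 2
rel["r_t"][d] += dr                          # Step 1: shear (applied last so the
                                             # updates above use the old slope)
s_t = rel["r_h"][d] * dr * m + dr_max        # Step 6: tail band
rel["c_t"][d] += -rel["r_h"][d] * dr_max + s_t / 2
rel["d_t"][d] += s_t / 2

# Exactly the targeted triple flipped to false; all other triples survive.
assert not is_true(E[0], E[1], rel)
assert all(is_true(E[h], E[t], rel)
           for h, t in itertools.product(E, E) if (h, t) != (0, 1))
```

In this instance, the head value of $r_0(e_0, e_1)$ in the adjusted dimension lands at $-11$ against a width of $10$, so the triple falls outside the sheared hyper-parallelogram while all other pairs remain inside.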

E PROOF OF COMPOSITIONALLY DEFINED REGION

In this section, we prove Theorem 5.3, which will serve as further machinery for the subsequent appendices. Since we are going to prove Theorem 5.3 by proving a more specific theorem, we first need to extend the notion of when a compositional definition pattern holds in the virtual triple space, such that we can employ it later in our proof. Definition E.1 describes when a compositional definition pattern holds in dependence of the spatial regions of its relations in the virtual triple space. The definition employs the notion of logical implication, i.e., if the body of a pattern is satisfied, then its head can be inferred. Definition E.1 (Truth of Compositional Definition in the Virtual Triple Space) Let $r_1(X,Y) \land r_2(Y,Z) \Leftrightarrow r_d(X,Z)$ be a compositional definition pattern over some relations $r_1, r_2, r_d \in R$ and over arbitrary entities $X, Y, Z \in E$. Furthermore, let $f_h$ be a relation assignment function defined over $r_1$ and $r_2$. Moreover, let $s_d$ be the spatial region of $r_d$ in the virtual triple space. The compositional definition pattern holds for the regions of the relations in the virtual triple space, i.e., for $f_h(r_1)$, $f_h(r_2)$, and $s_d$, if: (⇒) for any entity assignment function $f_e$ and virtual assignment function $f_v$ over $f_e$, if $f_v(X,Y) \in f_h(r_1)$ and $f_v(Y,Z) \in f_h(r_2)$, then $f_v(X,Z) \in s_d$; and (⇐) for any entity assignment function $f_e$ and virtual assignment function $f_v$ over $f_e$, if $f_v(X,Z)$ is within the region $s_d$ of $r_d$, then there exists an entity assignment $f_e(Y)$ such that $f_v(X,Y) \in f_h(r_1)$ and $f_v(Y,Z) \in f_h(r_2)$. Recall that Theorem 5.3 (reformulated in the definitions of Appendix C and Definition E.1) states that if $\phi := r_1(X,Y) \land r_2(Y,Z) \Leftrightarrow r_d(X,Z)$ is a compositional definition pattern defined over relations $r_1, r_2, r_d \in R$ and if $f_h$ is a relation assignment function that is defined over $r_1$ and $r_2$, then there exists a convex region $s_d$ for $r_d$ in the virtual triple space $\mathbb{R}^{2d}$ such that $\phi$ holds for $f_h(r_1)$, $f_h(r_2)$, and $s_d$.
In particular, we are not only interested in proving the existence of the compositionally defined region $s_d$, but we even identify a system of inequalities that describes the shape of $s_d$. Specifically, Theorem E.2 concretely characterizes the shape of $s_d$, which we prove subsequently. Theorem E.2 Let $r_1(X,Y) \land r_2(Y,Z) \Leftrightarrow r_d(X,Z)$ be a compositional definition pattern over some relations $r_1, r_2, r_d \in R$ and over arbitrary entities $X, Y, Z \in E$. Furthermore, let $f_h$ be a relation assignment function that is defined over $r_1$ and $r_2$ such that for any $i \in \{1, 2\}$, $f_h(r_i) = (c^{ht}_i, d^{ht}_i, r^{th}_i)$ with $c^{ht}_i = (c^h_i \,\|\, c^t_i)$ and $d^{ht}_i = (d^h_i \,\|\, d^t_i)$. These substitutions result in a system of inequalities with the same behavior as the initial system of inequalities. We have listed the result of these substitutions in Inequalities 98-107:

$(x - r^t_1 r^t_2 z - r^t_1 c^h_2 - c^h_1)_{|.|} \preceq r^t_1 d^h_2 + d^h_1$ (98)
$(r^t_2 z + c^h_2 - r^h_1 x - c^t_1)_{|.|} \preceq d^t_1 + d^h_2$ (99)
$(z - r^h_1 r^h_2 x - r^h_2 c^t_1 - c^t_2)_{|.|} \preceq r^h_2 d^t_1 + d^t_2$ (100)
$(z + (c^h_1 - x)\, r^h_2 \oslash r^t_1 - c^t_2)_{|.|} \preceq d^h_1 r^h_2 \oslash r^t_1 + d^t_2$ (101)
$(x(1 - r^h_1 r^t_1) - r^t_1 c^t_1 - c^h_1)_{|.|} \preceq r^t_1 d^t_1 + d^h_1$ (102)
$(z(1 - r^h_2 r^t_2) - r^h_2 c^h_2 - c^t_2)_{|.|} \preceq r^h_2 d^h_2 + d^t_2$ (103)
$-d^h_1 \preceq d^h_1$ (104)
$-d^t_1 \preceq d^t_1$ (105)
$-d^h_2 \preceq d^h_2$ (106)
$-d^t_2 \preceq d^t_2$ (107)

The direction (⇒) amounts to showing that the region $s_d$ described by these inequalities contains $f_v(X,Z)$ if $f_v(X,Y) \in f_h(r_1)$ and $f_v(Y,Z) \in f_h(r_2)$. This is trivially true since Inequalities 98-107 directly follow from Inequalities 90-97, which are instantiations of Inequalities 1-2 representing $f_v(X,Y) \in f_h(r_1)$ and $f_v(Y,Z) \in f_h(r_2)$. Reading the proof bottom-up proves the other direction (⇐), i.e., if $f_v(X,Z)$ is in $s_d$, then there exists an entity assignment $f_e(Y) = y$ such that $f_v(X,Y) \in f_h(r_1)$ and $f_v(Y,Z) \in f_h(r_2)$.
Thereby, we have successfully shown that if Inequalities 84-89 describe the region $s_d$ of relation $r_d$ in the virtual triple space, then $r_1(X,Y) \land r_2(Y,Z) \Leftrightarrow r_d(X,Z)$ holds for $f_h(r_1)$, $f_h(r_2)$, and $s_d$ in the virtual triple space. □ We have thus proven Theorem E.2 in this section, i.e., that Inequalities 84-89 define the compositionally defined region for positive slope vectors. The proof works analogously for any other sign of the slope vectors, except that the substitutions of Inequalities 90-97 may vary due to the different signs. Note that by proving Theorem E.2, we have also proven Theorem 5.3, i.e., that there exists a convex region that describes the compositionally defined region $s_d$, since (1) we have characterized the compositionally defined region and thereby implicitly proven its existence, and since (2) Inequalities 84-89 trivially form a convex region.
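The elimination behind Inequality 98 can be spot-checked numerically. The sketch below works in a single dimension with positive slopes and illustrative parameter values of our own choosing: it samples points $(x, y, z)$ satisfying the head bands of $r_1$ and $r_2$ and asserts that the derived bound of Inequality 98 holds, since multiplying $r_2$'s band by $r^t_1 > 0$ and adding it to $r_1$'s band eliminates $y$.

```python
import random

# Illustrative one-dimensional band parameters (our own choices, not the paper's).
rt1, ch1, dh1 = 1.5, 0.3, 0.7   # head band of r1: |x - ch1 - rt1*y| <= dh1
rt2, ch2, dh2 = 0.8, -0.2, 0.5  # head band of r2: |y - ch2 - rt2*z| <= dh2

random.seed(0)
for _ in range(10_000):
    z = random.uniform(-5.0, 5.0)
    y = rt2 * z + ch2 + random.uniform(-dh2, dh2)   # (y, z) satisfies r2's head band
    x = rt1 * y + ch1 + random.uniform(-dh1, dh1)   # (x, y) satisfies r1's head band
    # Eliminating y yields Inequality 98 (tolerance guards float round-off):
    assert abs(x - rt1 * rt2 * z - rt1 * ch2 - ch1) <= rt1 * dh2 + dh1 + 1e-12
```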

F DETAILS ON CAPTURING PATTERNS EXACTLY

Before we prove the inference capabilities of ExpressivE in this section, we formally define the considered patterns in Definition F.1.

Definition F.1 (Inference Patterns)
• Patterns of the form $r_1(X,Y) \Rightarrow r_1(Y,X)$ with $r_1 \in R$ are called symmetry patterns.
• Patterns of the form $r_1(X,Y) \Rightarrow \neg r_1(Y,X)$ with $r_1 \in R$ are called anti-symmetry patterns.
• Patterns of the form $r_1(X,Y) \Leftrightarrow r_2(Y,X)$ with $r_1, r_2 \in R$ and $r_1 \ne r_2$ are called inversion patterns.
• Patterns of the form $r_1(X,Y) \land r_2(Y,Z) \Rightarrow r_3(X,Z)$ with $r_1, r_2, r_3 \in R$ and $r_1 \ne r_2 \ne r_3$ are called general composition patterns.
• Patterns of the form $r_1(X,Y) \land r_2(Y,Z) \Leftrightarrow r_d(X,Z)$ with $r_1, r_2, r_d \in R$ and $r_1 \ne r_2 \ne r_d$ are called compositional definition patterns.
• Patterns of the form $r_1(X,Y) \Rightarrow r_2(X,Y)$ with $r_1, r_2 \in R$ and $r_1 \ne r_2$ are called hierarchy patterns.
• Patterns of the form $r_1(X,Y) \land r_2(X,Y) \Rightarrow r_3(X,Y)$ with $r_1, r_2, r_3 \in R$ and $r_1 \ne r_2 \ne r_3$ are called intersection patterns.
• Patterns of the form $r_1(X,Y) \land r_2(X,Y) \Rightarrow \bot$ with $r_1, r_2 \in R$ and $r_1 \ne r_2$ are called mutual exclusion patterns.

With all definitions in place, we prove the exactness part of Theorems 5.2 and 5.4, i.e., that ExpressivE captures all patterns from Table 1 exactly. Specifically, we do not solely prove that ExpressivE captures the patterns of Table 1 exactly, but that ExpressivE captures these patterns exactly iff its relation hyper-parallelograms meet the properties intuitively described in Section 5. Next, in Section G, we prove that ExpressivE captures patterns exactly and exclusively. For the upcoming proofs, we employ the definitions and formal specifications of Sections C and E: Proposition F.1 (Symmetry (Exactly)) Let $m_h = (M, f_h)$ be a relation configuration and $r_1 \in R$ be a symmetric relation, i.e., $r_1(X,Y) \Rightarrow r_1(Y,X)$ holds for any entities $X, Y \in E$.
Then m h captures r 1 (X, Y ) ⇒ r 1 (Y, X) exactly iff r 1 's relation hyper-parallelogram f h (r 1 ) is symmetric across the identity line of any correlation subspace. Proof ⇒ For the first direction, what is to be shown is that if r 1 's relation hyper-parallelogram f h (r 1 ) is symmetric across the identity line of any correlation subspace, then m h captures r 1 (X, Y ) ⇒ r 1 (Y, X) exactly. We show this by contradiction. Thus, we first assume that r 1 's corresponding relation hyper-parallelogram f h (r 1 ) of m h is symmetric across the identity line for any correlation subspace s i . Now to the contrary, we assume that m h does not capture r 1 (X, Y ) ⇒ r 1 (Y, X) exactly. Then, due to the symmetry of the hyper-parallelogram across the identity line in any correlation subspace s i , for any virtual assignment function f v it holds that if f v (e x , e y ) ∈ f h (r 1 ) for arbitrary entities e x , e y ∈ E, then f v (e y , e x ) ∈ f h (r 1 ). Yet, by the definition of capturing patterns exactly, this means that m h captures r 1 (X, Y ) ⇒ r 1 (Y, X) exactly. This is a contradiction to the initial assumption that m h does not capture r 1 (X, Y ) ⇒ r 1 (Y, X) exactly, proving the ⇒ part of the proposition. ⇐ For the second direction, what is to be shown is that if m h captures r 1 (X, Y ) ⇒ r 1 (Y, X) exactly, then r 1 's relation hyper-parallelogram f h (r 1 ) is symmetric across the identity line of any correlation subspace. We show this by contradiction. Thus, we first assume that m h captures r 1 (X, Y ) ⇒ r 1 (Y, X) exactly, i.e., for any instantiation of f e and f v over f e if f v (e x , e y ) ∈ f h (r 1 ), then f v (e y , e x ) ∈ f h (r 1 ). Now to the contrary, we assume that r 1 's corresponding relation hyper-parallelogram f h (r 1 ) of m h is not symmetric across the identity line in at least one correlation subspace s i . 
Then, since f h (r 1 ) is not symmetric across the identity line in s i , there is an instantiation of f v and f e such that f v (e x , e y ) ∈ f h (r 1 ) and f v (e y , e x ) ̸ ∈ f h (r 1 ) for some entities e x , e y ∈ E. Yet, by the definition of capturing patterns exactly, this means that m h does not capture r 1 (X, Y ) ⇒ r 1 (Y, X) exactly. This is a contradiction to the initial assumption that m h captures r 1 (X, Y ) ⇒ r 1 (Y, X) exactly, proving the ⇐ part of the proposition. □ Proposition F.2 (Anti-Symmetry (Exactly)) Let m h = (M , f h ) be a relation configuration and r 1 ∈ R be an anti-symmetric relation, i.e., r 1 (X, Y ) ⇒ ¬r 1 (Y, X) holds for any entities X, Y ∈ E. Then m h captures r 1 (X, Y ) ⇒ ¬r 1 (Y, X) exactly iff r 1 's relation hyper-parallelogram f h (r 1 ) is not symmetric across the identity line in at least one correlation subspace. Proposition F.2 can be proven analogously to Proposition F.1. Therefore, its proof has been omitted.
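The geometric criterion of Propositions F.1 and F.2 can be illustrated with a small numeric sketch. Assuming illustrative parameters of our own choosing, a parallelogram whose head and tail bands share their parameters is symmetric across the identity line of its correlation subspace, so membership is invariant under swapping the two coordinates:

```python
import random

# One correlation subspace; head and tail bands share their parameters,
# which makes the parallelogram symmetric across the identity line x = y.
# All concrete values below are our own illustrative choices.
ch = ct = 0.4
dh = dt = 0.6
rh = rt = 1.3

def inside(x, y):
    # Band form of Inequalities 1-2 restricted to one correlation subspace.
    return abs(x - ch - rt * y) <= dh and abs(y - ct - rh * x) <= dt

random.seed(1)
for _ in range(10_000):
    x, y = random.uniform(-3.0, 3.0), random.uniform(-3.0, 3.0)
    assert inside(x, y) == inside(y, x)   # symmetry across the identity line
```

With $c^h = c^t$, $d^h = d^t$, and $r^h = r^t$, swapping $x$ and $y$ simply exchanges the two band constraints, which is exactly why membership of $(x, y)$ and $(y, x)$ coincide; breaking any one of these equalities (as in Proposition F.2) destroys the invariance.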

Proposition F.3 (Inversion (Exactly))

Let m h = (M , f h ) be a relation configuration and r 1 , r 2 ∈ R be relations where r 1 (X, Y ) ⇔ r 2 (Y, X) holds for any entities X, Y ∈ E. Then m h captures r 1 (X, Y ) ⇔ r 2 (Y, X) exactly iff f h (r 1 ) is the mirror image across the identity line of f h (r 2 ) for any correlation subspace.

Proof ⇒ For the first direction, what is to be shown is that if the relation hyper-parallelogram f h (r 1 ) is the mirror image across the identity line of f h (r 2 ) for any correlation subspace, then m h captures r 1 (X, Y ) ⇔ r 2 (Y, X) exactly. We show this by contradiction. Thus, we first assume that r 1 's corresponding relation hyper-parallelogram f h (r 1 ) of m h is the mirror image across the identity line of f h (r 2 ) for any correlation subspace s i . Now to the contrary, we assume that m h does not capture r 1 (X, Y ) ⇔ r 2 (Y, X) exactly. Then, due to f h (r 1 ) being the mirror image of f h (r 2 ) in any correlation subspace s i , for any virtual assignment function f v it holds that if f v (e x , e y ) ∈ f h (r 1 ) for arbitrary entities e x , e y ∈ E, then f v (e y , e x ) ∈ f h (r 2 ). Yet, by the definition of capturing patterns exactly, this means that m h captures r 1 (X, Y ) ⇔ r 2 (Y, X) exactly. This is a contradiction to the initial assumption that m h does not capture r 1 (X, Y ) ⇔ r 2 (Y, X) exactly, proving the ⇒ part of the proposition.

⇐ For the second direction, what is to be shown is that if m h captures r 1 (X, Y ) ⇔ r 2 (Y, X) exactly, then the relation hyper-parallelogram f h (r 1 ) is the mirror image across the identity line of f h (r 2 ) for any correlation subspace. We show this by contradiction. Thus, we first assume that m h captures r 1 (X, Y ) ⇔ r 2 (Y, X) exactly, i.e., for any instantiation of f e and f v over f e , if f v (e x , e y ) ∈ f h (r 1 ), then f v (e y , e x ) ∈ f h (r 2 ).
Now to the contrary, we assume that r 1 's corresponding relation hyper-parallelogram f h (r 1 ) of m h is not the mirror image across the identity line of f h (r 2 ) for at least one correlation subspace s i . Then, since f h (r 1 ) is not the mirror image across the identity line of f h (r 2 ) in s i , there is an instantiation of f v and f e such that f v (e x , e y ) ∈ f h (r 1 ) and f v (e y , e x ) ̸ ∈ f h (r 2 ) for some entities e x , e y ∈ E. Yet, by the definition of capturing patterns exactly, this means that m h does not capture r 1 (X, Y ) ⇔ r 2 (Y, X) exactly. This is a contradiction to the initial assumption that m h captures r 1 (X, Y ) ⇔ r 2 (Y, X) exactly, proving the ⇐ part of the proposition. □

Proposition F.4 (Hierarchy (Exactly)) Let m h = (M , f h ) be a relation configuration and r 1 , r 2 ∈ R be relations where r 1 (X, Y ) ⇒ r 2 (X, Y ) holds for any entities X, Y ∈ E. Then m h captures r 1 (X, Y ) ⇒ r 2 (X, Y ) exactly iff f h (r 1 ) is subsumed by f h (r 2 ) for any correlation subspace.

Proof ⇒ For the first direction, what is to be shown is that if the relation hyper-parallelogram f h (r 1 ) is subsumed by f h (r 2 ) for any correlation subspace, then m h captures r 1 (X, Y ) ⇒ r 2 (X, Y ) exactly. We show this by contradiction. Thus, we first assume that r 1 's corresponding relation hyper-parallelogram f h (r 1 ) of m h is subsumed by f h (r 2 ) for any correlation subspace s i . Now to the contrary, we assume that m h does not capture r 1 (X, Y ) ⇒ r 2 (X, Y ) exactly. Then, due to f h (r 1 ) being a subset of f h (r 2 ) in any correlation subspace s i , for any virtual assignment function f v it holds that if f v (e x , e y ) ∈ f h (r 1 ) for arbitrary entities e x , e y ∈ E, then f v (e x , e y ) ∈ f h (r 2 ). Yet, by the definition of capturing patterns exactly, this means that m h captures r 1 (X, Y ) ⇒ r 2 (X, Y ) exactly.
This is a contradiction to the initial assumption that m h does not capture r 1 (X, Y ) ⇒ r 2 (X, Y ) exactly, proving the ⇒ part of the proposition.

⇐ For the second direction, what is to be shown is that if m h captures r 1 (X, Y ) ⇒ r 2 (X, Y ) exactly, then the relation hyper-parallelogram f h (r 1 ) is subsumed by f h (r 2 ) for any correlation subspace. We show this by contradiction. Thus, we first assume that m h captures r 1 (X, Y ) ⇒ r 2 (X, Y ) exactly, i.e., for any instantiation of f e and f v over f e , if f v (e x , e y ) ∈ f h (r 1 ), then f v (e x , e y ) ∈ f h (r 2 ). Now to the contrary, we assume that r 1 's corresponding relation hyper-parallelogram f h (r 1 ) of m h is not subsumed by f h (r 2 ) for at least one correlation subspace s i . Then, since f h (r 1 ) is not subsumed by f h (r 2 ) in s i , there is an instantiation of f v and f e such that f v (e x , e y ) ∈ f h (r 1 ) and f v (e x , e y ) ̸ ∈ f h (r 2 ) for some entities e x , e y ∈ E. Yet, by the definition of capturing patterns exactly, this means that m h does not capture r 1 (X, Y ) ⇒ r 2 (X, Y ) exactly. This is a contradiction to the initial assumption that m h captures r 1 (X, Y ) ⇒ r 2 (X, Y ) exactly, proving the ⇐ part of the proposition. □

Proposition F.5 (Intersection (Exactly)) Let m h = (M , f h ) be a relation configuration and r 1 , r 2 , r 3 ∈ R be relations where r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ r 3 (X, Y ) holds for any entities X, Y ∈ E. Then m h captures r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ r 3 (X, Y ) exactly iff the intersection of f h (r 1 ) and f h (r 2 ) is subsumed by f h (r 3 ) for any correlation subspace.

Proof ⇒ For the first direction, what is to be shown is that if the intersection of f h (r 1 ) and f h (r 2 ) is subsumed by f h (r 3 ) for any correlation subspace, then m h captures r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ r 3 (X, Y ) exactly. We show this by contradiction.
Thus, we first assume that the intersection of f h (r 1 ) and f h (r 2 ) of m h is subsumed by f h (r 3 ) for any correlation subspace s i . Now to the contrary, we assume that m h does not capture r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ r 3 (X, Y ) exactly. Then, due to the intersection of f h (r 1 ) and f h (r 2 ) being a subset of f h (r 3 ) in any correlation subspace s i , for any virtual assignment function f v it holds that if f v (e x , e y ) ∈ f h (r 1 ) and f v (e x , e y ) ∈ f h (r 2 ) for arbitrary entities e x , e y ∈ E, then f v (e x , e y ) ∈ f h (r 3 ). Yet, by the definition of capturing patterns exactly, this means that m h captures r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ r 3 (X, Y ) exactly. This is a contradiction to the initial assumption that m h does not capture r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ r 3 (X, Y ) exactly, proving the ⇒ part of the proposition.

⇐ For the second direction, what is to be shown is that if m h captures r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ r 3 (X, Y ) exactly, then the intersection of f h (r 1 ) and f h (r 2 ) is subsumed by f h (r 3 ) for any correlation subspace. We show this by contradiction. Thus, we first assume that m h captures r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ r 3 (X, Y ) exactly, i.e., for any instantiation of f e and f v over f e , if f v (e x , e y ) ∈ f h (r 1 ) and f v (e x , e y ) ∈ f h (r 2 ), then f v (e x , e y ) ∈ f h (r 3 ). Now to the contrary, we assume that the intersection of f h (r 1 ) and f h (r 2 ) is not subsumed by f h (r 3 ) for at least one correlation subspace s i . Then, since the intersection of f h (r 1 ) and f h (r 2 ) is not subsumed by f h (r 3 ) in s i , there is an instantiation of f v and f e such that f v (e x , e y ) ∈ f h (r 1 ) and f v (e x , e y ) ∈ f h (r 2 ) but f v (e x , e y ) ̸ ∈ f h (r 3 ) for some entities e x , e y ∈ E. Yet, by the definition of capturing patterns exactly, this means that m h does not capture r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ r 3 (X, Y ) exactly.
This is a contradiction to the initial assumption that m h captures r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ r 3 (X, Y ) exactly, proving the ⇐ part of the proposition. □

Proposition F.6 (Mutual Exclusion (Exactly)) Let m h = (M , f h ) be a relation configuration and r 1 , r 2 ∈ R be mutually exclusive relations, i.e., r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ ⊥ holds for any entities X, Y ∈ E. Then m h captures r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ ⊥ exactly iff f h (r 1 ) and f h (r 2 ) do not intersect in at least one correlation subspace.

Proof ⇒ For the first direction, what is to be shown is that if the relation hyper-parallelograms f h (r 1 ) and f h (r 2 ) do not intersect in at least one correlation subspace, then m h captures r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ ⊥ exactly. We show this by contradiction. Thus, we first assume that f h (r 1 ) and f h (r 2 ) of m h do not intersect in at least one correlation subspace s i . Now to the contrary, we assume that m h does not capture r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ ⊥ exactly. Then, since f h (r 1 ) and f h (r 2 ) do not intersect in s i , for any virtual assignment function f v it holds that if f v (e x , e y ) ∈ f h (r 1 ) for arbitrary entities e x , e y ∈ E, then f v (e x , e y ) ̸ ∈ f h (r 2 ). Yet, by the definition of capturing patterns exactly, this means that m h captures r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ ⊥ exactly. This is a contradiction to the initial assumption that m h does not capture r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ ⊥ exactly, proving the ⇒ part of the proposition.

⇐ For the second direction, what is to be shown is that if m h captures r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ ⊥ exactly, then the relation hyper-parallelograms f h (r 1 ) and f h (r 2 ) do not intersect in at least one correlation subspace. We show this by contradiction. Thus, we first assume that m h captures r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ ⊥ exactly, i.e., for any instantiation of f e and f v over f e , if f v (e x , e y ) ∈ f h (r 1 ), then f v (e x , e y ) ̸ ∈ f h (r 2 ), and if f v (e x , e y ) ∈ f h (r 2 ), then f v (e x , e y ) ̸ ∈ f h (r 1 ). Now to the contrary, we assume that r 1 's corresponding relation hyper-parallelogram f h (r 1 ) of m h intersects with f h (r 2 ) in every correlation subspace.
Then, since f h (r 1 ) intersects with f h (r 2 ) in every correlation subspace, there is an instantiation of f v and f e such that f v (e x , e y ) ∈ f h (r 1 ) and f v (e x , e y ) ∈ f h (r 2 ) for some entities e x , e y ∈ E. Yet, by the definition of capturing patterns exactly, this means that m h does not capture r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ ⊥ exactly. This is a contradiction to the initial assumption that m h captures r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ ⊥ exactly, proving the ⇐ part of the proposition. □

Proposition F.7 (General Composition (Exactly)) Let r 1 , r 2 , r 3 ∈ R be relations and let m h = (M , f h ) be a relation configuration, where f h is defined over r 1 , r 2 , and r 3 . Furthermore, let r 3 be the composite relation of r 1 and r 2 , i.e., r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 3 (X, Z) holds for any entities X, Y, Z ∈ E. Then m h captures r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 3 (X, Z) exactly iff the relation hyper-parallelogram f h (r 3 ) subsumes the compositionally defined region s d defined by f h (r 1 ) and f h (r 2 ) for any correlation subspace.

Proof ⇒ For the first direction, assume that the compositionally defined region defined by f h (r 1 ) and f h (r 2 ) is subsumed by f h (r 3 ) for any correlation subspace. What is to be shown is that m h captures r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 3 (X, Z) exactly. Our proof for this direction is based on the following three results:

1. For an auxiliary relation r d ∈ R, there exists a convex region s d in the virtual triple space such that r 1 (X, Y ) ∧ r 2 (Y, Z) ⇔ r d (X, Z) holds for f h (r 1 ), f h (r 2 ), and s d in any correlation subspace (Theorem E.2).

2. f h (r 3 ) subsumes s d iff m h captures r d (X, Y ) ⇒ r 3 (X, Y ) exactly (Proposition F.4).

3. r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 3 (X, Z) logically follows from {r 1 (X, Y ) ∧ r 2 (Y, Z) ⇔ r d (X, Z), r d (X, Y ) ⇒ r 3 (X, Y )}.
For (1), observe that, based on Theorem E.2, we know that we can define an auxiliary relation r d ∈ R with area s d such that r 1 (X, Y ) ∧ r 2 (Y, Z) ⇔ r d (X, Z) holds for f h (r 1 ), f h (r 2 ), and s d , i.e., such that s d is the compositionally defined region of f h (r 1 ) and f h (r 2 ). For (2), as shown in Proposition F.4, m h captures r d (X, Y ) ⇒ r 3 (X, Y ) exactly iff f h (r 3 ) subsumes r d 's area s d . Therefore, we have shown that if f h (r 3 ) subsumes s d , and if s d is the compositionally defined region of f h (r 1 ) and f h (r 2 ), then r d (X, Y ) ⇒ r 3 (X, Y ) and r 1 (X, Y ) ∧ r 2 (Y, Z) ⇔ r d (X, Z) hold for f h (r 1 ), f h (r 2 ), f h (r 3 ), and s d . Together with the fact that f h is only defined over r 1 , r 2 , and r 3 , we can infer that m h exactly captures any pattern, solely consisting of r 1 , r 2 , and r 3 , that follows from ψ = {r 1 (X, Y ) ∧ r 2 (Y, Z) ⇔ r d (X, Z), r d (X, Y ) ⇒ r 3 (X, Y )}. For (3), by logical deduction, the following statement holds: ψ |= r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 3 (X, Z). Since r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 3 (X, Z) (i) solely consists of r 1 , r 2 , and r 3 and (ii) follows from ψ, we have proven that m h captures r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 3 (X, Z) exactly if f h (r 3 ) subsumes s d , proving the ⇒ part of the proposition.

⇐ For the second direction, what is to be shown is that if m h captures r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 3 (X, Z) exactly, then the compositionally defined region defined by f h (r 1 ) and f h (r 2 ) is subsumed by f h (r 3 ) for any correlation subspace. We prove this by contradiction. Thus, assume that m h captures r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 3 (X, Z) exactly, i.e., for any instantiation of f e and f v over f e , if f v (e x , e y ) ∈ f h (r 1 ) and f v (e y , e z ) ∈ f h (r 2 ), then f v (e x , e z ) ∈ f h (r 3 ).
Now to the contrary, we assume that r 3 's corresponding relation hyper-parallelogram f h (r 3 ) of m h does not subsume the compositionally defined region s d in at least one correlation subspace. The following three points will be used to construct a counter-example: (1) we have shown in Theorem E.2 that we can define an auxiliary relation r d ∈ R with area s d such that r 1 (X, Y ) ∧ r 2 (Y, Z) ⇔ r d (X, Z) holds for f h (r 1 ), f h (r 2 ), and s d ; (2) r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 3 (X, Z) logically follows from {r 1 (X, Y ) ∧ r 2 (Y, Z) ⇔ r d (X, Z), r d (X, Y ) ⇒ r 3 (X, Y )}, which, together with Point (1) and Proposition F.4, states that f h (r 3 ) needs to subsume r d 's area s d ; and (3) we have initially assumed that f h (r 3 ) does not subsume s d . From (1)-(3) we can infer that there exists an instantiation of f v and f e such that f v (e x , e y ) ∈ f h (r 1 ) and f v (e y , e z ) ∈ f h (r 2 ) but f v (e x , e z ) ̸ ∈ f h (r 3 ) for some entities e x , e y , e z ∈ E. Yet, by the definition of capturing patterns exactly, this means that m h does not capture r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 3 (X, Z) exactly. This is a contradiction to the initial assumption that m h captures r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 3 (X, Z) exactly, proving the ⇐ part of the proposition. □

Proposition F.8 (Compositional Definition (Exactly)) Let r 1 , r 2 , r d ∈ R be relations and let m h = (M , f h ) be a relation configuration, where f h is defined over r 1 , r 2 , and r d . Furthermore, let r d be the compositionally defined relation of r 1 and r 2 , i.e., r 1 (X, Y ) ∧ r 2 (Y, Z) ⇔ r d (X, Z) holds for any entities X, Y, Z ∈ E. Then m h captures r 1 (X, Y ) ∧ r 2 (Y, Z) ⇔ r d (X, Z) exactly iff the relation hyper-parallelogram f h (r d ) is equal to the compositionally defined region s d defined by f h (r 1 ) and f h (r 2 ) for any correlation subspace.
The proof for Proposition F.8 is straightforward, as Proposition F.8 can be proven analogously to Proposition F.7, with the sole difference that instead of defining a relation embedding f h (r 3 ) that subsumes the compositionally defined region s d , we define the compositionally defined relation r d whose embedding f h (r d ) is equal to the compositionally defined region s d . Propositions F.1, F.2, F.3, F.4, F.5, and F.6 together prove the exactness part of Theorem 5.2, i.e., that ExpressivE can capture symmetry, anti-symmetry, inversion, hierarchy, intersection, and mutual exclusion exactly. Propositions F.7 and F.8 prove the exactness part of Theorem 5.4, i.e., that ExpressivE can capture general composition exactly. It remains to show that ExpressivE can capture all these patterns exactly and exclusively, which is shown in Section G.
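The deduction ψ |= r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 3 (X, Z) used in Propositions F.7 and F.8 is plain relational reasoning: if r d denotes the set-composition of r 1 and r 2 , then any relation r 3 that subsumes r d satisfies the general composition pattern. The following sketch illustrates this deduction on toy relations (hypothetical data, independent of ExpressivE's geometry):

```python
def compose(r1, r2):
    """Set-semantic composition: {(x, z) | there is a y with (x, y) in r1 and (y, z) in r2}."""
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}

# Toy relations over entities a, b, c, d (hypothetical data).
r1 = {("a", "b"), ("c", "b")}
r2 = {("b", "d")}

r_d = compose(r1, r2)    # the compositionally defined relation of r1 and r2
r3 = r_d | {("a", "a")}  # any superset of r_d satisfies the hierarchy r_d(X,Y) => r3(X,Y)

# The deduction: r1 composed with r2 equals r_d, and r_d is a subset of r3,
# hence r1(X,Y) and r2(Y,Z) together entail r3(X,Z).
assert r_d <= r3
assert all((x, z) in r3 for (x, y) in r1 for (y2, z) in r2 if y == y2)
```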

G DETAILS ON CAPTURING PATTERNS EXCLUSIVELY

This section proves that ExpressivE can capture all inference patterns of Theorems 5.2 and 5.4 exactly and exclusively. By the definition of capturing a pattern ψ exactly and exclusively, this means that we need to construct a relation configuration m h such that (1) m h captures ψ and (2) m h does not capture any positive pattern ϕ such that ψ ̸ |= ϕ. Note that we have shown in Propositions F.1-F.7 that we can construct a relation configuration m h that captures the following patterns by constraining the following geometric properties of m h 's relation hyper-parallelograms:

1. For symmetry and inversion patterns, the mirror images across the identity line of hyper-parallelograms in any correlation subspace need to be constrained (Propositions F.1 and F.3).

2. For hierarchy and intersection patterns, the intersections of hyper-parallelograms in any correlation subspace need to be constrained (Propositions F.4 and F.5).

3. For general composition patterns, the compositionally defined region needs to be subsumed in any correlation subspace (Proposition F.7).

Since symmetry, inversion, hierarchy, intersection, and composition are all positive patterns of our considered language of patterns, it suffices to analyze the mirror images (M), intersections (I), and compositionally defined regions (C) of each relation hyper-parallelogram to check which positive patterns have been captured. Furthermore, for the upcoming proofs, Definition G.1 defines head and tail intervals.

Definition G.1 (Head and Tail Intervals) Let r i ∈ R be a relation and m h = (M , f h ) be a relation configuration. We call an interval a head interval H ri,mh and respectively a tail interval T ri,mh of r i and m h if for arbitrary entities e h , e t ∈ E, virtual assignment functions f v , and complete model configurations m over m h and f v the following property holds: if m captures a triple r i (e h , e t ) to be true, then f v (e h ) ∈ H ri,mh and f v (e t ) ∈ T ri,mh .
Using Definition G.1 and the insights provided by (M), (I), and (C), we now prove that ExpressivE captures each considered pattern exactly and exclusively.

Proposition G.1 (Symmetry (Exactly and Exclusively)) Let m h = (M , f h ) be a relation configuration and r 1 ∈ R be a symmetric relation, i.e., r 1 (X, Y ) ⇒ r 1 (Y, X) holds for any entities X, Y ∈ E. Then m h can capture r 1 (X, Y ) ⇒ r 1 (Y, X) exactly and exclusively.

Proposition G.2 (Anti-Symmetry (Exactly and Exclusively)) Let m h = (M , f h ) be a relation configuration and r 1 ∈ R be an anti-symmetric relation, i.e., r 1 (X, Y ) ⇒ ¬r 1 (Y, X) holds for any entities X, Y ∈ E. Then m h can capture r 1 (X, Y ) ⇒ ¬r 1 (Y, X) exactly and exclusively.

The proofs for Propositions G.1 and G.2 are straightforward, as the only positive pattern that contains only one relation is symmetry. Furthermore, since (i) Propositions F.1 and F.2 have shown that there is a relation configuration that can capture symmetry/anti-symmetry exactly and (ii) a hyper-parallelogram cannot be symmetric and anti-symmetric simultaneously, we have shown that there is a relation configuration that captures symmetry/anti-symmetry exactly and exclusively, proving Propositions G.1 and G.2.

Proposition G.3 (Inversion (Exactly and Exclusively))

Let m h = (M , f h ) be a relation configuration and r 1 , r 2 ∈ R be relations where r 1 (X, Y ) ⇔ r 2 (Y, X) holds for any entities X, Y ∈ E. Then m h can capture r 1 (X, Y ) ⇔ r 2 (Y, X) exactly and exclusively.

The proof for Proposition G.3 is straightforward, as the only positive patterns that contain at most two relations are symmetry, hierarchy, and inversion. Furthermore, since (i) Proposition F.3 has shown that there is a relation configuration that can capture inversion exactly and (ii) it is simple to show that a hyper-parallelogram can be the mirror image of another hyper-parallelogram without one of them subsuming the other (hierarchy) or one of them being symmetric across the identity line (symmetry), we have shown that there is a relation configuration that captures inversion exactly and exclusively, proving Proposition G.3.

Proposition G.4 (Hierarchy (Exactly and Exclusively)) Let m h = (M , f h ) be a relation configuration and r 1 , r 2 ∈ R be relations where r 1 (X, Y ) ⇒ r 2 (X, Y ) holds for any entities X, Y ∈ E. Then m h can capture r 1 (X, Y ) ⇒ r 2 (X, Y ) exactly and exclusively.

The proof for Proposition G.4 is straightforward, as the only positive patterns that contain at most two relations are symmetry, hierarchy, and inversion. Furthermore, since (i) Proposition F.4 has shown that there is a relation configuration that can capture hierarchy exactly and (ii) it is simple to show that a hyper-parallelogram can subsume another hyper-parallelogram without one of them being the mirror image across the identity line of the other (inversion) or one of them being symmetric across the identity line (symmetry), we have shown that there is a relation configuration that captures hierarchy exactly and exclusively, proving Proposition G.4.

Proposition G.5 (Intersection (Exactly and Exclusively))

Let m h = (M , f h ) be a relation configuration and r 1 , r 2 , r 3 ∈ R be relations where r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ r 3 (X, Y ) holds for any entities X, Y ∈ E. Then m h can capture r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ r 3 (X, Y ) exactly and exclusively.

Proof What is to be shown is that m h can capture intersection (r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ r 3 (X, Y )) exactly and exclusively. We have already shown in Proposition F.5 that m h can capture r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ r 3 (X, Y ) exactly. Now, to show that m h can capture intersection exactly and exclusively, we construct an instance of m h such that (1) m h captures the intersection pattern r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ r 3 (X, Y ) and (2) m h does not capture any positive pattern ϕ such that r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ r 3 (X, Y ) ̸ |= ϕ.

Table 6: One-dimensional relation embeddings of a relation configuration m h that captures intersection (i.e., r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ r 3 (X, Y )) exactly and exclusively.

        c h     d h     r t     c t     d t     r h
r 1     -6      2       2       8       2       3
r 2     -11.5   3       5       11      3       3
r 3     -9.5    5       5       9       1       3

Figure 2 visualizes the hyper-parallelograms defined by the one-dimensional relation embeddings of Table 6. In particular, it displays the hyper-parallelograms of r 1 , r 2 , and r 3 . As can easily be seen in Figure 2 (and proven using Proposition F.5), the relation configuration m h described by Table 6 captures r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ r 3 (X, Y ) exactly, as f h (r 3 ) subsumes the intersection of f h (r 1 ) and f h (r 2 ). Now it remains to show that m h does not capture any positive pattern ϕ such that r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ r 3 (X, Y ) ̸ |= ϕ.
To show this, we will show that (M) the mirror image of any relation hyper-parallelogram is not subsumed by any relation hyper-parallelogram (i.e., no unwanted symmetry or inversion pattern is captured) and (C) the compositionally defined region defined by any pair of hyper-parallelograms is not subsumed by any relation hyper-parallelogram (i.e., no unwanted composition pattern is captured). We do not need to show that (I) no unwanted relation hyper-parallelograms intersect, as by the nature of the intersection pattern, f h (r 1 ), f h (r 2 ), and f h (r 3 ) should intersect.

For (M), observe in Figure 2 that all hyper-parallelograms f h (r 1 ), f h (r 2 ), and f h (r 3 ) of m h are on the same side of the identity line. Thus, the mirror images of f h (r 1 ), f h (r 2 ), and f h (r 3 ) across the identity line must be on the other side. Therefore, we have shown (M), i.e., that no relation hyper-parallelogram subsumes the mirror image of any relation hyper-parallelogram and thus that m h does not capture any unwanted symmetry or inversion pattern.

For (C), observe in Figure 2 that for the displayed relation configuration m h , the head intervals of any relation hyper-parallelogram of m h contain only negative values and the tail intervals contain only positive values. Thus, for any pair (r i , r j ) ∈ {r 1 , r 2 , r 3 } 2 , there is no virtual assignment function f v such that m over m h and f v captures r i (x, y) and r j (y, z) for arbitrary entities x, y, z ∈ E. Therefore, no pair of relations (r i , r j ) defines a compositionally defined region. Thus, we have shown (C) that no compositionally defined region is subsumed by any relation hyper-parallelogram (as no compositionally defined region exists) and thus that m h does not capture any unwanted general composition pattern.
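The (M) argument rests on a convexity fact: if every vertex of a parallelogram lies strictly on one side of the identity line (say, head coordinate strictly smaller than tail coordinate), then the whole parallelogram does, while its mirror image lies strictly on the other side, so no parallelogram on the first side can subsume any mirror image. A sketch with hypothetical vertex coordinates (illustrative corners, not derived from Table 6):

```python
def strictly_one_side(vertices):
    """True iff every vertex (x, y) satisfies x < y; by convexity the whole
    parallelogram then lies strictly on that side of the identity line y = x."""
    return all(x < y for (x, y) in vertices)

# Hypothetical parallelogram corners (illustrative only).
p = [(-6.0, 5.0), (-2.0, 7.0), (-1.0, 11.0), (-5.0, 9.0)]
# Mirroring across the identity line swaps the coordinates of every vertex.
mirror_p = [(y, x) for (x, y) in p]

assert strictly_one_side(p)
# The mirror image lies strictly on the other side (x > y), so p and any
# parallelogram on p's side are disjoint from mirror_p.
assert all(x > y for (x, y) in mirror_p)
```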
By Proposition F.5 and by proving (M) and (C), we have shown that the constructed relation configuration m h of Table 6 captures the intersection pattern r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ r 3 (X, Y ) and does not capture any positive pattern ϕ such that r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ r 3 (X, Y ) ̸ |= ϕ. This means, by the definition of capturing patterns exactly and exclusively, that m h captures intersection (r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ r 3 (X, Y )) exactly and exclusively, proving the proposition. □

Proposition G.6 (General Composition (Exactly and Exclusively)) Let r 1 , r 2 , r 3 ∈ R be relations and let m h = (M , f h ) be a relation configuration, where f h is defined over r 1 , r 2 , and r 3 . Furthermore, let r 3 be the composite relation of r 1 and r 2 , i.e., r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 3 (X, Z) holds for all entities X, Y, Z ∈ E. Then m h can capture r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 3 (X, Z) exactly and exclusively.

Proof What is to be shown is that m h can capture general composition (r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 3 (X, Z)) exactly and exclusively. We have already shown in Proposition F.7 that m h can capture r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 3 (X, Z) exactly. Now, to show that m h can capture general composition exactly and exclusively, we construct an instance of m h such that (1) m h captures general composition and (2) m h does not capture any positive pattern ϕ such that r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 3 (X, Z) ̸ |= ϕ. Figure 3 visualizes the hyper-parallelograms defined by the one-dimensional relation embeddings of Table 7. In particular, it displays the hyper-parallelograms of r 1 , r 2 , r 3 , and the compositionally defined region s d of the auxiliary relation r d such that r 1 (X, Y ) ∧ r 2 (Y, Z) ⇔ r d (X, Z) holds for f h (r 1 ), f h (r 2 ), and s d .
As can easily be seen in Figure 3 (and proven using Theorem E.2 and Proposition F.7), the relation configuration m h described by Table 7 captures r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 3 (X, Z) exactly, as f h (r 3 ) subsumes the compositionally defined region s d .

For (M), observe in Figure 3 that all hyper-parallelograms f h (r 1 ), f h (r 2 ), and f h (r 3 ) of m h are on the same side of the identity line. Thus, the mirror images of f h (r 1 ), f h (r 2 ), and f h (r 3 ) across the identity line must be on the other side. Therefore, we have shown (M), i.e., that no relation hyper-parallelogram subsumes the mirror image of any other relation hyper-parallelogram and thus that m h does not capture any unwanted symmetry or inversion pattern.

For (I), observe in Figure 3 that no relation hyper-parallelograms f h (r 1 ), f h (r 2 ), and f h (r 3 ) of m h intersect with each other. Thus, we have shown (I), i.e., that m h does not capture any unwanted hierarchy or intersection pattern.

For (C), observe in Figure 3 that for the displayed relation configuration m h , the following head and tail intervals can be defined: (i) H r1,mh = [-4, 0] and T r1,mh = [1, 3], (ii) H r2,mh = [1, 3] and T r2,mh = [6, 9], and (iii) H r3,mh = [-6, -1] and T r3,mh = [4, 10]. The tail intervals solely overlap with the head intervals for T r1,mh and H r2,mh , i.e., T ri,mh ∩ H rj,mh = ∅ for (r i , r j ) ∈ {r 1 , r 2 , r 3 } 2 \ {(r 1 , r 2 )}. Thus, for any pair (r i , r j ) ∈ {r 1 , r 2 , r 3 } 2 \ {(r 1 , r 2 )}, there is no virtual assignment function f v such that m over m h and f v captures r i (x, y) and r j (y, z) for arbitrary entities x, y, z ∈ E. Therefore, (r 1 , r 2 ) is the only pair of relations that defines a compositionally defined region, namely s d , which by Figure 3 is subsumed solely by f h (r 3 ). Thus, we have shown (C), i.e., that m h does not capture any unwanted general composition pattern. By Proposition F.7 and by proving (M), (I), and (C), we have shown that the constructed relation configuration m h captures r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 3 (X, Z) and does not capture any positive pattern ϕ such that r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 3 (X, Z) ̸ |= ϕ, proving the proposition. □

Proposition G.7 (Compositional Definition (Exactly and Exclusively)) Let r 1 , r 2 , r d ∈ R be relations and let m h = (M , f h ) be a relation configuration, where f h is defined over r 1 , r 2 , and r d . Furthermore, let r d be the compositionally defined relation of r 1 and r 2 , i.e., r 1 (X, Y ) ∧ r 2 (Y, Z) ⇔ r d (X, Z) holds for all entities X, Y, Z ∈ E. Then m h can capture r 1 (X, Y ) ∧ r 2 (Y, Z) ⇔ r d (X, Z) exactly and exclusively.
The proof for Proposition G.7 is straightforward, as it can be proven analogously to Proposition G.6, with the only difference that instead of defining a relation embedding f h (r 3 ) that subsumes the compositionally defined region, we define the compositionally defined relation r d whose embedding f h (r d ) is equal to the compositionally defined region s d . We have stated the relation embeddings for r d in Table 7 and also visualized f h (r d ) in Figure 3. Finally, Propositions G.1-G.7 together prove Theorems 5.2 and 5.4. Thus, we have theoretically shown that ExpressivE can capture any pattern from Table 1 exactly and exclusively.
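The (C) step in the proof of Proposition G.6 reduces to a one-dimensional interval check: a pair (r i , r j ) can only yield a compositionally defined region if some entity can simultaneously act as a tail of r i and a head of r j , i.e., if T ri,mh and H rj,mh overlap. The following sketch replays this check with the intervals stated in that proof (an illustration, not the authors' code):

```python
def overlaps(a, b):
    """Closed intervals [a0, a1] and [b0, b1] overlap iff max(a0, b0) <= min(a1, b1)."""
    return max(a[0], b[0]) <= min(a[1], b[1])

# Head and tail intervals of the configuration from the proof of Proposition G.6.
head = {"r1": (-4, 0), "r2": (1, 3), "r3": (-6, -1)}
tail = {"r1": (1, 3), "r2": (6, 9), "r3": (4, 10)}

# A pair (ri, rj) defines a compositionally defined region only if T_ri and H_rj overlap.
composable = {(ri, rj) for ri in tail for rj in head if overlaps(tail[ri], head[rj])}
assert composable == {("r1", "r2")}  # only r1 followed by r2 can compose
```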

H EXTENDED COMPOSITIONS

This section provides theoretical evidence that ExpressivE is not limited to capturing a single composition pattern. Specifically, we prove that ExpressivE can capture more than one application of a composition pattern. This theoretical result is empirically backed up by further experimental results in Appendix I.3.

Proposition H.1 Let r 1 , r 2 , r 3 , r 1,2 , r 1,2,3 ∈ R be relations and let m h = (M , f h ) be a relation configuration, where f h is defined over r 1 , r 2 , r 3 , r 1,2 , and r 1,2,3 . Furthermore, let r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 1,2 (X, Z) and r 1,2 (X, Y ) ∧ r 3 (Y, Z) ⇒ r 1,2,3 (X, Z) hold for all entities X, Y, Z ∈ E. Then m h can capture r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 1,2 (X, Z) and r 1,2 (X, Y ) ∧ r 3 (Y, Z) ⇒ r 1,2,3 (X, Z) exactly and exclusively.

Proof What is to be shown is that m h can capture ϕ 1 := r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 1,2 (X, Z) and ϕ 2 := r 1,2 (X, Y ) ∧ r 3 (Y, Z) ⇒ r 1,2,3 (X, Z) exactly and exclusively. To show that there is an m h that captures ϕ 1 and ϕ 2 exactly and exclusively, we construct an instance of m h such that (1) m h captures ϕ 1 and ϕ 2 exactly, and (2) m h does not capture any positive pattern ψ such that (ϕ 1 ∧ ϕ 2 ) ̸ |= ψ. Figure 4 visualizes the hyper-parallelograms defined by the one-dimensional relation embeddings of Table 8. In particular, it displays the hyper-parallelograms of r 1 , r 2 , r 1,2 , r 3 , and r 1,2,3 , and the compositionally defined regions s d 1,2 , s d 2,3 , s d (1,2),3 , and s d 1,(2,3) of the auxiliary relations r d 1,2 , r d 2,3 , r d (1,2),3 , and r d 1,(2,3) such that r 1 (X, Y ) ∧ r 2 (Y, Z) ⇔ r d 1,2 (X, Z), r 2 (X, Y ) ∧ r 3 (Y, Z) ⇔ r d 2,3 (X, Z), r 1,2 (X, Y ) ∧ r 3 (Y, Z) ⇔ r d (1,2),3 (X, Z), and r 1 (X, Y ) ∧ r d 2,3 (Y, Z) ⇔ r d 1,(2,3) (X, Z) hold for f h (r 1 ), f h (r 2 ), f h (r 3 ), f h (r 1,2 ), f h (r 1,2,3 ), s d 1,2 , s d 2,3 , s d (1,2),3 , and s d 1,(2,3) .
Note that from ϕ 1 and ϕ 2 , together with the auxiliary relations defined above, it follows that r d 1,2 (X, Y ) ⇒ r 1,2 (X, Y ), r d (1,2),3 (X, Y ) ⇒ r 1,2,3 (X, Y ), and r d 1,(2,3) (X, Y ) ⇒ r 1,2,3 (X, Y ) need to be satisfied. Thus, as can easily be seen in Figure 4 (and proven using Theorem E.2 and Proposition F.7), the relation configuration m h described by Table 8 captures ϕ 1 and ϕ 2 exactly, as f h (r 1,2 ) subsumes the compositionally defined region s d 1,2 and f h (r 1,2,3 ) subsumes the compositionally defined regions s d (1,2),3 and s d 1,(2,3) .

For (M), observe in Figure 4 that all hyper-parallelograms f h (r 1 ), f h (r 2 ), f h (r 3 ), f h (r 1,2 ), and f h (r 1,2,3 ) of m h are on the same side of the identity line. Thus, the mirror images of any of these hyper-parallelograms across the identity line must be on the other side. Therefore, we have shown (M), i.e., that no relation hyper-parallelogram subsumes the mirror image of any other relation hyper-parallelogram and thus that m h does not capture any unwanted symmetry or inversion pattern.

For (I), observe in Figure 4 that no relation hyper-parallelograms f h (r 1 ), f h (r 2 ), f h (r 3 ), f h (r 1,2 ), and f h (r 1,2,3 ) of m h intersect with each other. Thus, we have shown (I), i.e., that m h does not capture any unwanted hierarchy or intersection pattern.

For (C), recall Definition G.1, describing head and tail intervals. We observe in Figure 4 that for the displayed relation configuration m h , the following head and tail intervals can be defined: (i) H r1,mh = [-4, 0] and T r1,mh = [1, 3], (ii) H r2,mh = [1, 3] and T r2,mh = [6, 9], (iii) H r1,2,mh = [-6, -1] and T r1,2,mh = [4, 9.7], (iv) H r3,mh = [7, 9] and T r3,mh = [10, 12], (v) H r1,2,3,mh = [-6, 0] and T r1,2,3,mh = [9.8, 12], and (vi) H r d 2,3 ,mh = [1, 3] and T r d 2,3 ,mh = [9.8, 12].
The tail intervals overlap with the head intervals solely for the pairs {(r 1 , r 2 ), (r 2 , r 3 ), (r 1,2 , r 3 ), (r 1 , r d 2,3 )}, i.e., T ri,mh ∩ H rj,mh = ∅ for any pair (r i , r j ) ∈ {r 1 , r 2 , r 3 , r 1,2 , r 1,2,3 , r d 2,3 } 2 \ {(r 1 , r 2 ), (r 2 , r 3 ), (r 1,2 , r 3 ), (r 1 , r d 2,3 )}. Thus, for any such pair (r i , r j ) there is no virtual assignment function f v such that m over m h and f v captures r i (x, y) and r j (y, z) for arbitrary entities x, y, z ∈ E. Therefore, {(r 1 , r 2 ), (r 2 , r 3 ), (r 1,2 , r 3 ), (r 1 , r d 2,3 )} are the only pairs of relations that define a compositionally defined region, i.e., no other pair of relations defines a compositionally defined region. Thus, we have shown that (1) m h captures ϕ 1 and ϕ 2 exactly - since s d 1,2 ⊆ f h (r 1,2 ) and (s d (1,2),3 ∪ s d 1,(2,3) ) ⊆ f h (r 1,2,3 ) - and (2) the only other existing compositionally defined region s d 2,3 is disjoint from all relation hyper-parallelograms. By (1) and (2), we have shown (C), i.e., that no other compositionally defined region (specifically s d 2,3 ) is subsumed by any relation hyper-parallelogram and thus that no unwanted composition pattern is captured by m h . By proving that the constructed m h captures ϕ 1 and ϕ 2 exactly and by (I), (M), and (C), we have shown that the constructed relation configuration m h of Table 8 captures ϕ 1 and ϕ 2 and does not capture any positive pattern ψ such that (ϕ 1 ∧ ϕ 2 ) ̸|= ψ. This means, by the definition of capturing patterns exactly and exclusively, that m h captures ϕ 1 and ϕ 2 exactly and exclusively, proving the proposition. □
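The interval bookkeeping of step (C) can be checked mechanically. Below is a small sketch (interval values copied from items (i)-(vi) above; the abbreviated relation names are ours) that recovers exactly the four pairs with overlapping tail/head intervals:

```python
# Head/tail intervals from the proof of Proposition H.1 (one dimension).
H = {
    "r1": (-4, 0), "r2": (1, 3), "r12": (-6, -1),
    "r3": (7, 9), "r123": (-6, 0), "rd23": (1, 3),
}
T = {
    "r1": (1, 3), "r2": (6, 9), "r12": (4, 9.7),
    "r3": (10, 12), "r123": (9.8, 12), "rd23": (9.8, 12),
}

def overlaps(a, b):
    """Closed intervals a = [a0, a1] and b = [b0, b1] intersect iff a0 <= b1 and b0 <= a1."""
    return a[0] <= b[1] and b[0] <= a[1]

# Pairs (ri, rj) whose tail interval T[ri] meets the head interval H[rj] are
# the only candidates for a compositionally defined region.
pairs = sorted((ri, rj) for ri in T for rj in H if overlaps(T[ri], H[rj]))
print(pairs)  # [('r1', 'r2'), ('r1', 'rd23'), ('r12', 'r3'), ('r2', 'r3')]
```

These are precisely the four pairs named in the proof; every other pair has disjoint tail/head intervals.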

I ADDITIONAL EXPERIMENTS

This section presents additional experiments, providing further empirical evidence for our theoretical results. Specifically, Section I.1 studies the benchmark performance of ExpressivE and its closest relatives on WN18RR, stratified by relation cardinality, providing empirical evidence that ExpressivE performs well on 1-1, 1-N, N-1, and N-N relations. Section I.2 provides empirical evidence that ExpressivE can capture general composition and supports a link between ExpressivE's significant performance gain on WN18RR and its inference capabilities. Finally, Section I.3 discusses empirical results revealing that ExpressivE can reason over more than one step of composition patterns.

I.1 CARDINALITY EXPERIMENTS

This section provides empirical evidence for our theoretical result that ExpressivE performs well on 1-N, N-1, and N-N relations.

Experiment Setup. Following the procedure of Bordes et al. (2013), we have categorized the relations of WN18RR into four cardinality classes: 1-1, 1-N, N-1, and N-N. As in Bordes et al. (2013), we have classified a relation r ∈ R by computing:
• µ rt , the average number of head entities h ∈ E per tail entity t ∈ E, appearing in a triple r(h, t) of WN18RR.
• µ rh , the average number of tail entities t ∈ E per head entity h ∈ E, appearing in a triple r(h, t) of WN18RR.
Following the soft classification of Bordes et al. (2013), a relation is:
• 1-1 if µ rt ≤ 1.5 and µ rh ≤ 1.5
• 1-N if µ rt ≤ 1.5 and µ rh > 1.5
• N-1 if µ rt > 1.5 and µ rh ≤ 1.5
• N-N if µ rt > 1.5 and µ rh > 1.5

Results. In particular, ExpressivE outperforms both RotatE and BoxE consistently on N-N relations, which are often considered the most complex relations to capture in KGC with regard to cardinalities. Thus, Table 9 provides empirical results supporting our theoretical claim that ExpressivE can capture 1-1, 1-N, N-1, and N-N relations well.
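The classification above can be sketched as follows; `classify_relations` and the toy triples are our own illustrative names, while the 1.5 threshold follows Bordes et al. (2013):

```python
from collections import defaultdict

def classify_relations(triples, threshold=1.5):
    """Classify each relation as 1-1 / 1-N / N-1 / N-N following Bordes et al. (2013).

    triples: iterable of (head, relation, tail) tuples.
    """
    heads_per_tail = defaultdict(set)  # (r, t) -> set of heads
    tails_per_head = defaultdict(set)  # (r, h) -> set of tails
    relations = set()
    for h, r, t in triples:
        relations.add(r)
        heads_per_tail[(r, t)].add(h)
        tails_per_head[(r, h)].add(t)

    classes = {}
    for r in relations:
        # mu_rt: average number of distinct heads per tail of r.
        tail_keys = [k for k in heads_per_tail if k[0] == r]
        mu_rt = sum(len(heads_per_tail[k]) for k in tail_keys) / len(tail_keys)
        # mu_rh: average number of distinct tails per head of r.
        head_keys = [k for k in tails_per_head if k[0] == r]
        mu_rh = sum(len(tails_per_head[k]) for k in head_keys) / len(head_keys)
        left = "1" if mu_rt <= threshold else "N"
        right = "1" if mu_rh <= threshold else "N"
        classes[r] = f"{left}-{right}"
    return classes

# Toy example: 'capital_of' behaves 1-1, 'lives_in' behaves N-1.
triples = [("vienna", "capital_of", "austria"),
           ("alice", "lives_in", "vienna"),
           ("bob", "lives_in", "vienna"),
           ("carol", "lives_in", "vienna")]
print(sorted(classify_relations(triples).items()))
# [('capital_of', '1-1'), ('lives_in', 'N-1')]
```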

I.2 GENERAL COMPOSITION AND LINK TO PERFORMANCE GAIN

This section provides empirical evidence for the theoretical result of Appendices F and G that ExpressivE can capture general composition exactly and exclusively. Even more, the experiments of this section give evidence for a direct link between the support of general composition and ExpressivE's performance gain on WN18RR. In the following, we first discuss our experiments' preparation and setup details, followed by the considered hypotheses and final results.

Pattern Identification. Our first goal was to identify patterns occurring in WN18RR. To reach this goal, we have analyzed patterns mined with AMIE+ (Galárraga et al., 2015) from WN18RR by Akrami et al. (2020), provided in a GitHub repository (foot_1). To identify the most relevant patterns, we have - similar to the discussion of Galárraga et al. (2013; 2015) - sorted the patterns ρ = ϕ B1 ∧ · · · ∧ ϕ Bm ⇒ r(X, Y ) by their head coverage h(ρ), which is formally defined as (Galárraga et al., 2013):

h(ρ) = |{(x, y) ∈ E 2 | r(x, y) ∈ G ∧ ∃z 1 . . . z k (ϕ B1 (z 1 , z 2 ) ∈ G ∧ · · · ∧ ϕ Bm (z k-1 , z k ) ∈ G)}| / |{(x, y) ∈ E 2 | r(x, y) ∈ G}|

Intuitively, the head coverage h(ρ) represents the ratio of true triples implied by the pattern ρ on a given knowledge graph (G, E, R).

Pattern Selection. To analyze the most relevant patterns in the following experiments, we have selected all patterns whose head coverage is greater than 15% (inspection of the head coverage reported by AMIE shows that patterns below this threshold infer very few triples contained in the test set). From these patterns, we have left out any pattern with the head relation _similar_to, as ExpressivE, BoxE, and RotatE already reach an MRR of 1 on this relation; further stratifying _similar_to's test triples would thus not reveal novel information.
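Assuming a knowledge graph given as a set of (head, relation, tail) triples, head coverage can be computed along these lines (function names and the composition-shaped body are our own illustration, not the AMIE+ implementation):

```python
def head_coverage(rule_body, head_rel, graph):
    """Head coverage h(rho) of a rule body(X, Z) => head_rel(X, Z).

    rule_body: function (x, z, graph) -> bool, True iff the body is satisfiable for (x, z).
    graph: set of (head, relation, tail) triples.
    """
    head_pairs = {(h, t) for (h, r, t) in graph if r == head_rel}
    if not head_pairs:
        return 0.0
    covered = {p for p in head_pairs if rule_body(p[0], p[1], graph)}
    return len(covered) / len(head_pairs)

# Example body: r1(X, Y) ^ r2(Y, Z), i.e. a composition-shaped rule.
def body(x, z, graph):
    mids = {t for (h, r, t) in graph if r == "r1" and h == x}
    return any((m, "r2", z) in graph for m in mids)

G = {("a", "r1", "b"), ("b", "r2", "c"), ("a", "r3", "c"), ("d", "r3", "e")}
print(head_coverage(body, "r3", G))  # 0.5: only (a, c) of the two r3 pairs is implied
```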
This procedure leads to the following set of patterns, where r -1 denotes the inverse counterpart of a relation r ∈ R:
S 1 := _verb_group(Y, X) ⇒ _verb_group(X, Y )
C 2 := _derivationally_related_form(X, Y ) ∧ _derivationally_related_form(Y, Z) ⇒ _verb_group(X, Z)
C 3 := _derivationally_related_form(X, Y ) ∧ _derivationally_related_form -1 (Y, Z) ⇒ _verb_group(X, Z)
C 4 := _derivationally_related_form -1 (X, Y ) ∧ _derivationally_related_form(Y, Z) ⇒ _verb_group(X, Z)
C 5 := _also_see(X, Y ) ∧ _also_see(Y, Z) ⇒ _also_see(X, Z)
C 6 := _also_see(X, Y ) ∧ _also_see -1 (Y, Z) ⇒ _also_see(X, Z)
S 7 := _also_see(Y, X) ⇒ _also_see(X, Y )
C 8 := _hypernym(X, Y ) ∧ _synset_domain_topic_of (Y, Z) ⇒ _synset_domain_topic_of (X, Z)

Experimental Setup. For each of these patterns ρ, we have computed all triples that (i) can be derived by ρ from the data known to our model and (ii) are known to be true in the KG, yet unseen to our models. Thus, for each pattern ρ, we have computed the set s ρ containing all triples that (i) can be derived with ρ from the training set and (ii) are contained in the test set of WN18RR. We have used each computed set s ρ to evaluate the performance of ExpressivE, BoxE, and RotatE on the corresponding pattern ρ.

Hypotheses. Note that (as discussed in Appendix K.1) compositional definition r 1 (X, Y ) ∧ r 2 (Y, Z) ⇔ r 3 (X, Z) defines the triples of the composite relation r 3 completely, whereas general composition r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 3 (X, Z) allows r 3 to contain more triples than those that the compositional definition pattern can directly infer. Thus, if ExpressivE captures general composition and RotatE captures compositional definition, we expect the following behavior:
• H1. RotatE will perform well solely on relations occurring as the head of at most one composition pattern, as RotatE solely supports compositional definition.
• H2. ExpressivE will perform well even when a relation is defined by multiple composition patterns and/or multiple other patterns, since ExpressivE supports general composition.
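The construction of the evaluation sets s ρ can be sketched for a composition-shaped pattern as follows (a simplified sketch; names are ours, and the actual experiments used the mined AMIE+ patterns):

```python
def derive_composition(train, r1, r2, r_head):
    """All triples r_head(x, z) derivable by one application of
    r1(X, Y) ^ r2(Y, Z) => r_head(X, Z) on the training set."""
    by_head = {}
    for (h, r, t) in train:
        if r == r2:
            by_head.setdefault(h, set()).add(t)
    derived = set()
    for (h, r, t) in train:
        if r == r1:
            for z in by_head.get(t, ()):
                derived.add((h, r_head, z))
    return derived

def s_rho(train, test, r1, r2, r_head):
    """Triples derivable from the training set that also occur in the test set."""
    return derive_composition(train, r1, r2, r_head) & set(test)

train = [("a", "r1", "b"), ("b", "r2", "c"), ("x", "r1", "y")]
test = [("a", "r3", "c"), ("x", "r3", "q")]
print(s_rho(train, test, "r1", "r2", "r3"))  # {('a', 'r3', 'c')}
```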

I.3 MULTIPLE STEPS OF COMPOSITION

In this section, we provide empirical evidence for the theoretical results of Appendix H. To evaluate how well ExpressivE supports more than one step of a composition pattern, our first goal was to identify multi-step patterns (i.e., patterns that can be "chained" over multiple steps) occurring in WN18RR. We now recall parts of Appendix I.2 to keep this section self-contained; readers familiar with that section can skip ahead to the Experimental Setup paragraph. To identify multi-step patterns occurring in WN18RR, we have analyzed patterns mined with AMIE+ (Galárraga et al., 2015) from WN18RR by Akrami et al. (2020), provided in a GitHub repository (foot_2). To identify the most relevant patterns, we have - similar to the discussion of Galárraga et al. (2013; 2015) - sorted the patterns ρ = ϕ B1 ∧ · · · ∧ ϕ Bm ⇒ r(X, Y ) by their head coverage h(ρ), which is formally defined as (Galárraga et al., 2013):

h(ρ) = |{(x, y) ∈ E 2 | r(x, y) ∈ G ∧ ∃z 1 . . . z k (ϕ B1 (z 1 , z 2 ) ∈ G ∧ · · · ∧ ϕ Bm (z k-1 , z k ) ∈ G)}| / |{(x, y) ∈ E 2 | r(x, y) ∈ G}|

Intuitively, the head coverage h(ρ) represents the ratio of true triples implied by the pattern ρ on a given knowledge graph (G, E, R). Next, we consider the four multi-step patterns (R 1 -R 4 ) with a head coverage of at least 15%, as discussed in Appendix I.2.

Functional composition allows functional models to capture composition patterns. Yet, employing functional composition defines the composite relation r d completely and thus represents a more restricted pattern that we call compositional definition r 1 (X, Y ) ∧ r 2 (Y, Z) ⇔ r d (X, Z). In contrast, general composition r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 3 (X, Z) does not completely define its composite relation r 3 . This means that in the case of general composition, the composite relation r 3 may contain more triples than those that are directly inferable by compositional definition patterns.
Due to this notion of extensibility, general composition can be described as a combination of compositional definition and hierarchy: a general composition pattern defines its composite relation r 3 as a superset (hierarchy component) of the compositionally defined relation r d . This explains why no KGE has so far managed to capture general composition: any SotA KGE that supports some notion of composition cannot represent hierarchy, and vice versa (as discussed in Appendix K.2), yet both are essential to support general composition. Therefore, to capture general composition, ExpressivE combines hierarchy and compositional definition patterns, as discussed in more detail in Section 5.2.
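The decomposition into a compositional-definition component plus a hierarchy component can be illustrated with a deliberately simplified one-dimensional sketch (intervals stand in for ExpressivE's hyper-parallelograms; all concrete values are illustrative, not the actual 2d-dimensional construction):

```python
def subsumes(outer, inner):
    """outer = [a, b] subsumes inner = [c, d] iff a <= c and d <= b."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

# s_d is the region fully determined by r1 and r2 (compositional definition);
# a general composition only requires that the composite relation's region
# contains s_d, i.e. a hierarchy over s_d.
s_d = (2.0, 5.0)         # compositionally defined region of r1 and r2
r3_general = (1.0, 7.0)  # strictly larger: r3 may hold extra triples
r3_exact = (2.0, 5.0)    # identical: r3 is completely defined by r1 and r2

print(subsumes(r3_general, s_d))  # True  -> general composition
print(r3_exact == s_d)            # True  -> compositional definition
```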

K.2 ANALYSIS OF SPATIAL MODELS

Spatial models embed a relation r ∈ R via spatial regions in the embedding space. Furthermore, they embed an entity e a ∈ E in the role of a head and tail entity with two independent embeddings e h a ∈ K d and e t a ∈ K d . A triple r(e h , e t ) is true for spatial models if the embeddings of the entities e h and e t lie within the respective spatial regions of the relation r. Thus, spatial models can capture hierarchy patterns via the spatial subsumption of the regions defined by the relations. However, since there is no functional relationship between e h a and e t a , spatial models - such as BoxE (Abboud et al., 2020) - cannot capture composition.

ExpressivE embeds relations as regions (spatial nature). Yet, to achieve the functional nature, it cannot use two independent entity embeddings in the typical embedding space, as discussed above. The solution, and the key difference to BoxE, is to define the virtual triple space, which is formed by concatenating head and tail entity embeddings of the same embedding space (as described in detail in Section 4). More specifically, any line through the virtual triple space defines a function between head and tail entity embeddings of the same space - the key to the functional nature:
• Functional nature. Regions in the virtual triple space establish a mathematical relation between head and tail entities of the same space, by which composition can be captured.
• Spatial nature. At the same time, regions can subsume each other, by which - as is intuitive - hierarchy patterns can be captured.
Finally, it is precisely the combination of the functional and spatial nature that allows ExpressivE to capture general composition, as described in detail in Section 5.2.
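A one-dimensional sketch of this membership test may help: with d = 1, the virtual triple space is R 2 , and a relation's parallelogram is the intersection of two bands. The parameter names below are ours, and this simplification omits the multi-dimensional details of the actual model:

```python
def in_parallelogram(h, t, rel):
    """1D sketch of membership in the virtual triple space R^2: the point (h, t)
    lies in the parallelogram of `rel` iff both band constraints
    |h - r_h * t - c_h| <= d_h and |t - r_t * h - c_t| <= d_t hold.
    Parameter names and this simplification are ours, not the paper's notation."""
    ok_h = abs(h - rel["r_h"] * t - rel["c_h"]) <= rel["d_h"]
    ok_t = abs(t - rel["r_t"] * h - rel["c_t"]) <= rel["d_t"]
    return ok_h and ok_t

# A narrow band of slope 1: roughly the functional relation "t = h + 2".
rel = {"r_h": 1.0, "c_h": -2.0, "d_h": 0.5, "r_t": 1.0, "c_t": 2.0, "d_t": 0.5}
print(in_parallelogram(0.0, 2.0, rel))  # True: t = h + 2 exactly
print(in_parallelogram(0.0, 4.0, rel))  # False: far outside the band
```

Because the band constrains t as a function of h (functional nature) while still being a region that can subsume other regions (spatial nature), both pattern families become expressible.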

L TRADE-OFF: EXPRESSIVE POWER VS. DEGREES OF FREEDOM

This section discusses the trade-off between higher expressive power and lower degrees of freedom, observable in the results of Table 3. Specifically, this trade-off manifests in Table 3's benchmark results in the following way:
• Functional ExpressivE has lower expressive power than Base ExpressivE, as it effectively loses the ability to capture hierarchy patterns. The effect of this reduced expressive power can be seen in Functional ExpressivE's performance drop on WN18RR compared to Base ExpressivE in Table 3. However, since Functional ExpressivE uses fewer parameters than Base ExpressivE, it has fewer degrees of freedom, making it less likely to stop in a local minimum, as can be seen in Functional ExpressivE's performance on FB15k-237 in Table 3.
• Base ExpressivE has the full expressive power, with its high degree of freedom heightening the chance of ending in a local minimum. Table 3 reveals the significant performance increase of Base ExpressivE over Functional ExpressivE on WN18RR, giving evidence that the additional expressive power is helpful. The downside is that its higher degrees of freedom may make it likelier to stop in a local optimum, manifesting in its performance drop compared to Functional ExpressivE on FB15k-237.
Further analyzing this trade-off to establish a link between dataset properties and the necessary expressive power of a KGE is an interesting subject for future work.

M EXPERIMENTAL DETAILS

This section discusses our experiment setup, benchmark datasets, and evaluation metrics in detail. The concrete experiment setups, including details of our implementation, used hardware, learning setup, and chosen hyperparameters, are discussed in Subsection M.1. Subsection M.2 lists properties of the used benchmark datasets and Subsection M.3 lists properties of the used ranking metrics.

M.1 EXPERIMENT SETUP AND EMISSIONS

Implementation Details. We have implemented ExpressivE in PyKEEN 1.7 (Ali et al., 2021), a Python library under the MIT license that supports many benchmark KGs and KGEs. Thereby, we make ExpressivE easily accessible to the community for future benchmarks and experiments. We have made our code publicly available in a GitHub repository (foot_3). In addition to the code of ExpressivE, it contains a setup file to install the necessary libraries and a ReadMe.md file with library versions and running instructions to facilitate the reproducibility of our results.

Training Setup. Each model was trained and evaluated on one of 4 GeForce RTX 2080 GPUs of our internal cluster. The training process uses the Adam optimizer (Kingma & Ba, 2015) to optimize the self-adversarial negative sampling loss (Sun et al., 2019). ExpressivE is trained with gradient descent for up to 1000 epochs with early stopping, finishing the training if the Hits@10 score did not increase within 100 epochs by at least 0.5% for WN18RR and 1% for FB15k-237. We have increased the patience for OneBand ExpressivE to 150 epochs for FB15k-237, as it converges more slowly than the other ablation versions of ExpressivE. We use the model of the final epoch for testing. Each experiment was repeated three times to account for small performance fluctuations. In particular, the MRR values fluctuate by less than 0.003 between runs for Base and Functional ExpressivE on any dataset. We performed hyperparameter tuning over the learning rate λ, embedding dimensionality d, number of negative samples neg, loss margin γ, adversarial temperature α, and minimal denominator D min .
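For reference, the self-adversarial negative sampling loss of Sun et al. (2019) can be sketched as follows (distance-based formulation; the softmax weights over the negatives are treated as constants, as in the original, and the shift by γ cancels inside the softmax):

```python
import math

def self_adversarial_loss(pos_dist, neg_dists, gamma, alpha):
    """Self-adversarial negative sampling loss of Sun et al. (2019):
        L = -log sigma(gamma - d_pos) - sum_i p_i * log sigma(d_neg_i - gamma),
    where p_i is a softmax over -alpha * d_neg_i (weights treated as constants,
    i.e. no gradient flows through them in the original formulation)."""
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    # Softmax weights: harder negatives (smaller distance) get larger weight.
    m = max(-alpha * d for d in neg_dists)
    exps = [math.exp(-alpha * d - m) for d in neg_dists]
    z = sum(exps)
    weights = [e / z for e in exps]

    loss = -math.log(sigmoid(gamma - pos_dist))
    loss -= sum(w * math.log(sigmoid(d - gamma)) for w, d in zip(weights, neg_dists))
    return loss
```

As a sanity check, the loss shrinks as negatives move farther away: `self_adversarial_loss(0.0, [5.0, 5.0], 1.0, 1.0)` is smaller than `self_adversarial_loss(0.0, [1.0, 1.0], 1.0, 1.0)`.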
Specifically, two mechanisms were employed to implicitly regularize the hyper-parallelograms: (1) the hyperbolic tangent function tanh was applied element-wise to each entity embedding e p , slope vector r p i , and center vector c p i , projecting them into the bounded space [-1, 1] d , and (2) the size of each hyper-parallelogram is limited by the novel D min parameter, which we briefly introduce next.

Minimal Denominator D min . As can easily be shown, Equation 108 describes the relation hyper-parallelogram's center and Equations 109-110 its corners in the virtual triple space; for instance, center h i = (c h i + r t i c t i ) / (1 - r h i r t i ). The denominator of each term equals (1 - r h i r t i ). Since a small denominator in Equations 109 and 110 produces large corners and, therefore, a large hyper-parallelogram, we have introduced the hyperparameter D min , allowing ExpressivE to tune the maximal size of its hyper-parallelograms. In particular, D min constrains the relation embeddings such that (1 - r h i r t i ) ⪰ D min , thereby bounding the maximal size of a hyper-parallelogram as required.

Hyperparameter Optimization. Following Abboud et al. (2020), we have varied the learning rate by λ ∈ {a · 10^b | a ∈ {1, 2, 5} ∧ b ∈ {-2, -3, -4, -5, -6}}, the margin γ by integer values between 3 and 24 inclusive, the adversarial temperature by α ∈ {1, 2, 3, 4}, and the number of negative samples by neg ∈ {50, 100, 150}. Furthermore, we have varied the novel minimal denominator parameter by D min ∈ {0, 0.5, 1}. We have tuned the hyperparameters of ExpressivE manually within the specified ranges. Finally, to allow a direct performance comparison of ExpressivE to its closest spatial relative BoxE and its closest functional relative RotatE, we chose for each benchmark the embedding dimensionality and negative sampling strategy of the best-performing RotatE and BoxE models (Abboud et al., 2020; Sun et al., 2019).
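Both regularization mechanisms can be sketched in one dimension as follows (our reading of Equations 108-110; the parameter values and the D min default are illustrative):

```python
import math

def project(v):
    """Mechanism (1): elementwise tanh, bounding raw parameters to [-1, 1]."""
    return [math.tanh(x) for x in v]

def center_h(c_h, c_t, r_h, r_t, d_min=0.5):
    """1D head-center of a relation's parallelogram in the virtual triple space,
    following the shape of Equation 108:
        center^h = (c^h + r^t * c^t) / (1 - r^h * r^t).
    Mechanism (2): keeping the denominator at least D_min bounds how large the
    parallelogram (and hence its corners) can become."""
    denom = 1.0 - r_h * r_t
    if denom < d_min:
        raise ValueError("relation embedding violates the D_min constraint")
    return (c_h + r_t * c_t) / denom

print(center_h(1.0, 1.0, 0.2, 0.5))  # (1 + 0.5) / 0.9, roughly 1.667
```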
Concretely, we chose self-adversarial negative sampling (Sun et al., 2019) and the embedding dimensionalities listed in Table 12. The best-performing hyperparameters for ExpressivE on each benchmark dataset are listed in Table 12. We have used the hyperparameters of Table 12 for every considered version of ExpressivE - namely Base, Functional, EqSlopes, NoCenter, and OneBand ExpressivE - which are described in the ablation study of Section 6.2. Assuming a carbon efficiency of 0.432 kg/kWh (based on the OECD's 2014 yearly carbon efficiency average), 200 GPU hours correspond to a rough CO 2 emission of 18.58 kg CO 2 -eq. The estimations were conducted using the Machine Learning Impact calculator (Lacoste et al., 2019).

M.2 BENCHMARK DATASETS

This section briefly discusses the standard KGC benchmark datasets WN18RR (Dettmers et al., 2018) and FB15k-237 (Toutanova & Chen, 2015). In particular, Table 13 lists their key characteristics. We have not found licenses for FB15k-237 or WN18RR. WN18RR is a subset of WN18 (Bordes et al., 2013), whose license is also unknown, whereas FB15k-237 is a subset of FB15k (Bordes et al., 2013), which uses the CC BY 2.5 license.

M.3 METRICS

We have evaluated ExpressivE by measuring the ranking quality of each test triple r i (e h , e t ) over all possible head and tail corruptions, i.e., r i (e ′ h , e t ) for all e ′ h ∈ E and r i (e h , e ′ t ) for all e ′ t ∈ E. The mean reciprocal rank (MRR) and Hits@k are the standard evaluation metrics for this setting (Bordes et al., 2013). In particular, we report the filtered metrics (Bordes et al., 2013), where all triples that occur in the training, validation, or testing set (except the test triple to be ranked) are removed from the ranking, as ranking these triples highly does not represent a faulty inference. The filtered MRR, Hits@1, Hits@3, and Hits@10 are the most widely used metrics for evaluating KGEs (Sun et al., 2019; Trouillon et al., 2016; Balazevic et al., 2019; Abboud et al., 2020). Finally, we briefly recall their definitions: the MRR is the average of inverse ranks (1/rank), and Hits@k is the proportion of test triples whose rank is at most k.
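The filtered ranking protocol can be sketched as follows (tail-side only; the head side is handled symmetrically, and all names and toy scores are ours):

```python
def filtered_rank(score_fn, test_triple, candidates, known_true):
    """Filtered rank of the true tail: candidates scoring strictly higher than
    the true entity are counted, except those forming other known-true triples."""
    h, r, t = test_triple
    true_score = score_fn(h, r, t)
    rank = 1
    for t_prime in candidates:
        if t_prime == t or (h, r, t_prime) in known_true:
            continue  # filtered setting: skip the true tail and other true triples
        if score_fn(h, r, t_prime) > true_score:
            rank += 1
    return rank

def mrr_and_hits(ranks, k=10):
    """MRR is the mean of 1/rank; Hits@k the fraction of ranks at most k."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = sum(r <= k for r in ranks) / len(ranks)
    return mrr, hits

# Toy scores: the filtered setting skips the other known-true triple ("a","r","c").
scores = {("a", "r", "b"): 0.9, ("a", "r", "c"): 0.95, ("a", "r", "d"): 0.1}
score_fn = lambda h, r, t: scores.get((h, r, t), 0.0)
print(filtered_rank(score_fn, ("a", "r", "b"), ["b", "c", "d"], {("a", "r", "c")}))  # 1
print(mrr_and_hits([1, 2, 4, 20], k=10))  # roughly (0.45, 0.75)
```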



foot_1, foot_2: https://github.com/idirlab/kgcompletion
foot_3: https://github.com/AleksVap/ExpressivE



Figure 1: (a) Interpretation of relation parameters (orange dashed) as a parallelogram (green solid) in the j-th correlation subspace; (b) Multiple relation embeddings with the following properties: Symmetry(r B ), Anti-Symmetry (r A , r D , r E , r F ), Inversion (r D = r -1 A ), Hierarchy r A (X, Y ) ⇒ r C (X, Y ), Intersection r D (X, Y ) ∧ r E (X, Y ) ⇒ r F (X, Y ), MutualExclusion (e.g., r A ∩ r B = ∅).

Theorem 5.1 (Expressive Power) ExpressivE can capture any arbitrary graph G over R and E if the embedding dimensionality d is at least in O(|E| * |R|).

ExpressivE captures (a) symmetry, (b) anti-symmetry, (c) inversion, (d) hierarchy, (e) intersection, and (f) mutual exclusion.


In accordance withSun et al. (2019);Abboud et al. (2020), we define the following inference patterns:

proving the ⇐ part of the proposition. □ Proposition F.6 (Mutual Exclusion (Exactly)) Let m h = (M , f h ) be a relation configuration and r 1 , r 2 ∈ R be relations where r 1 (X, Y ) ∧ r 2 (X, Y ) ⇒ ⊥ holds for any entities X, Y ∈ E.

Figure 2: Visualization of the relation configuration m h described by Table 6.

Figure 3: Visualization of the relation configuration m h described by Table 7.

Now it remains to show that m h does not capture any positive pattern ϕ such that r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 3 (X, Z) ̸|= ϕ. To show this, we will show that (M) the mirror image of any relation hyper-parallelogram is not subsumed by any other relation hyper-parallelogram (i.e., no unwanted symmetry nor inversion pattern is captured), (I) no relation hyper-parallelograms intersect with each other (i.e., no unwanted hierarchy nor intersection pattern is captured), and (C) solely the compositionally defined region s d defined by f h (r 1 ) and f h (r 2 ) is subsumed by f h (r 3 ) and no other compositionally defined region is subsumed by any other relation hyper-parallelogram (i.e., no unwanted composition pattern is captured).

Figure 4: Visualization of the relation configuration m h described by Table 8.

Model sizes of ExpressivE, BoxE, and RotatE models of equal dimensionality. ExpressivE almost halves the number of parameters for a d-dimensional embedding compared to BoxE and RotatE. Table 2 lists the model sizes of trained ExpressivE, BoxE, and RotatE models of the same dimensionality, empirically confirming that ExpressivE almost halves BoxE's and RotatE's sizes.

KGC performance of ExpressivE and SotA KGEs on FB15k-237 and WN18RR. The table shows the best-published results of the competing models per family, specifically: TransE and RotatE

Ablation study on ExpressivE's parameters.

Relation-wise MRR comparison of ExpressivE, RotatE, and BoxE on WN18RR. Table 5 lists the MRR of ExpressivE, RotatE, and BoxE for each of the 11 relations of WN18RR. Bold values represent the best and underlined values the second-best results across the compared models.

Linking Embeddings to KGs. An ExpressivE model and a KG are linked via the following assignment functions: The entity assignment function f e : E → ϵ assigns an entity embedding e h ∈ ϵ to each entity e h ∈ E. Based on f e , the virtual assignment function f v : E × E → R 2d defines for any pair of entities (e h , e t ) ∈ E × E a virtual entity pair embedding f v (e h , e t ) = (f e (e h )||f e (e t )), where || represents the concatenation operator. Furthermore, the relation assignment function f h : R → R 2d × R 2d × R 2d assigns a hyper-parallelogram to each relation r i ∈ R. Intuitively, f h (r i ) defines a hyper-parallelogram in the virtual triple space R 2d as described in Section 4.

Model Configuration. We call an ExpressivE model M together with a concrete relation assignment function f h a relation configuration m h = (M , f h ); if it additionally has a concrete virtual assignment function f v , we call it a complete model configuration m.

One-dimensional relation embeddings of a relation configuration m h that captures general composition (i.e., r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 3 (X, Z)) and that captures compositional definition (i.e., r 1 (X, Y ) ∧ r 2 (Y, Z) ⇔ r d (X, Z)) exactly and exclusively. (Table columns: c h , d h , r t , c t , d t , r h .)

Thus, we have shown (C) that no other compositionally defined region is subsumed by any other relation (as no other compositionally defined region exists) and thus that no unwanted composition pattern is captured by m h . By Proposition F.7 and by proving (I), (M), and (C), we have shown that the constructed relation configuration m h of Table 7 captures the general composition pattern r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 3 (X, Z) and does not capture any positive pattern ϕ such that r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 3 (X, Z) ̸|= ϕ. This means by the definition of capturing patterns exactly and exclusively that m h captures general composition (r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 3 (X, Z)) exactly and exclusively, proving the proposition. □

Proposition G.7 (Compositional Definition (Exactly and Exclusively)) Let r 1 , r 2 , r d ∈ R be relations and let m h = (M , f h ) be a relation configuration, where f h is defined over r 1 , r 2 , and r d . Furthermore, let r d be the compositionally defined relation of r 1 and r 2 , i.e., r 1

One-dimensional relation embeddings of a relation configuration m h that captures two general compositions (i.e., r 1 (X, Y ) ∧ r 2 (Y, Z) ⇒ r 1,2 (X, Z) and r 1,2 (X, Y ) ∧ r 3 (Y, Z) ⇒ r 1,2,3 (X, Z)) exactly and exclusively.

Now it remains to show that m h does not capture any positive pattern ψ such that (ϕ 1 ∧ ϕ 2 ) ̸ |= ψ.To show this, we will show that (M) the mirror image of any relation hyper-parallelogram is not subsumed by any other relation hyper-parallelogram (i.e., no unwanted symmetry nor inversion pattern is captured), (I) no relation hyper-parallelograms intersect with each other (i.e., no unwanted hierarchy nor intersection pattern is captured), and (C) solely that s d 1,2 ⊆ f h (r 1,2 ) and (s d (1,2),3 ∪ s d 1,(2,3) ) ⊆ f h (r 1,2,3 ) are satisfied, and no other compositionally defined region is subsumed by any other relation hyper-parallelogram (i.e., no unwanted composition pattern is captured).

MRR of ExpressivE, RotatE, and BoxE on WN18RR stratified by cardinality classes (1-1, 1-N, N-1, N-N). The best results are bold, and the second-best are underlined. Table 9 summarizes the performance results of ExpressivE and its closest spatial relative BoxE and functional relative RotatE on WN18RR, stratified by the four cardinality classes defined previously. It reveals that ExpressivE almost exclusively reaches a SotA or close-to-SotA performance on 1-N, N-1, and N-N relations.

MRR of ExpressivE, RotatE, and BoxE on WN18RR stratified by patterns S 1 -C 8 . S i represents a [S]ymmetry pattern, C i a [C]omposition pattern (i ∈ {1, . . . , 8}). Table 10 lists for each pattern S 1 to C 8 the performances of BoxE, RotatE, and ExpressivE on s ρ , where ρ ∈ {S 1 , . . . , C 8 }. Table 10 provides evidence for both hypotheses. Evidence for H2. When a relation is defined via multiple patterns, RotatE's performance decreases drastically on most composition patterns compared to ExpressivE's performance, as can be seen for the patterns C 2 , C 3 , C 4 , and C 5 , giving evidence for H2. Conclusion. These experiments provide empirical evidence that (1) ExpressivE can capture general composition, as ExpressivE and RotatE perform as expected by H1 and H2 under the assumption that ExpressivE captures general composition and RotatE captures compositional definition, and that (2) ExpressivE's ability to capture general composition contributes to its performance gain on WN18RR, as ExpressivE consistently outperforms RotatE and BoxE on the predicted triples of composition patterns.

Hyperparameters for the best-performing ExpressivE models on WN18RR and FB15k-237.

CO 2 Emission Related to Experiments. The computation of the reported experiments took below 200 GPU hours on an RTX 2080 (TDP of 215 W).

Table 13 lists the following characteristics of the benchmark datasets: their number of entities |E|, relation types |R|, and training, testing, and validation triples. Both WN18RR and FB15k-237 provide training, testing, and validation splits, which were used directly in our experiments. Benchmark dataset characteristics.

ACKNOWLEDGMENTS

We are grateful to Maximilian Beck for helpful discussions and feedback. This work has been funded by the Vienna Science and Technology Fund (WWTF) [10.47379/VRG18013].


Published as a conference paper at ICLR 2023

Theorem E.2 (continued): let the slope vectors be positive, i.e., r th i ⪰ 0 for i ∈ {1, 2}. If Inequalities 84-89 define the region s d of r d in the virtual triple space, then r 1 (X, Y ) ∧ r 2 (Y, Z) ⇔ r d (X, Z) holds for f h (r 1 ), f h (r 2 ), and s d in the virtual triple space.

Proof. Let r 1 (X, Y ) ∧ r 2 (Y, Z) ⇔ r d (X, Z) be a compositional definition pattern over some relations r 1 , r 2 , r d ∈ R and over arbitrary entities X, Y, Z ∈ E. Furthermore, let f h be a relation assignment function that is defined over r 1 and r 2 such that for any i ∈ {1, 2}, r th i = (r t i ||r h i ). Moreover, let the slope vectors be positive, i.e., r th i ⪰ 0 for i ∈ {1, 2}. What we want to show is that if Inequalities 84-89 define the region of r d in the virtual triple space, then r 1 (X, Y ) ∧ r 2 (Y, Z) ⇔ r d (X, Z) holds in the virtual triple space, i.e., for any entity assignment function f e and virtual assignment function f v , if r 1 (X, Y ) and r 2 (Y, Z) are captured, then f v (X, Z) must be within the region of r d , and vice versa. To prove this, we will first construct a system of inequalities that describes r d and satisfies the compositional definition pattern. Afterward, we will show that the constructed system of inequalities has the same behavior as Inequalities 84-89, proving Theorem E.2.

(⇒) First, we choose an arbitrary entity assignment function f e and virtual assignment function f v over f e . We will henceforth denote the assigned entity embeddings by f e (X) = x, f e (Y ) = y, and f e (Z) = z to state our proofs concisely. Next, we assume that the left part of the pattern holds. This means concretely that we can instantiate the following inequalities from Inequalities 1-2. Our next goal is to construct a system of inequalities that makes r d (X, Z) - the right part of the pattern - true, i.e., that defines the region of r d such that f v (X, Z) lies within it. To reach this goal, we substitute Inequalities 90-97 into each other to obtain a system of inequalities that (1) has the same behavior as the initial set and (2) does not contain the entity embedding y.
Since we have assumed at the beginning that the slope vectors are positive, we can substitute Inequalities 90-97 into each other as follows: 1. 95 in 91 and 94 in 90, leading to 98; 2. 95 in 92 and 94 in 93, leading to 99; 3. 93 in 97 and 92 in 96, leading to 100; 4. 91 in 96 and 90 in 97, leading to 101.

The relation _also_see -1 of R 3 and R 4 represents the inverse relation of _also_see.

Experimental Setup. For each of the selected multi-step patterns ρ ∈ {R 1 , R 2 , R 3 , R 4 }, we have generated three datasets: the 1-Step, 2-Steps, and 3-Steps sets. Specifically, we have generated for each ρ a j-Step(s) set by computing all triples that (i) can be derived by ρ in j steps from the data known to our model and (ii) are known to be true in the KG, yet unseen to our model. Thus, we have computed for each ρ a j-Step(s) set containing all triples that (i) can be derived with ρ by j applications on the training set and (ii) are contained in the test set of WN18RR. The performance of ExpressivE on the computed datasets is summarized in Table 11.

Results. We report the performance of at most two steps of R 1 /R 3 /R 4 , as no new triples are derived after applying R 1 /R 3 /R 4 twice on the training set. Similarly, no new triples are derived after at most three steps of R 2 on the training set. We can see that the performance of ExpressivE increases by a large margin when more than one step of reasoning is considered, depicted by the performance gain of the 2-Steps and 3-Steps sets over the 1-Step set. Interestingly, a small exception to this is R 1 , where we see slightly worse behavior; inspection of the results shows that this is due to a single triple.
In total, Table 11 provides empirical evidence that ExpressivE can capture chained composition patterns and thus perform more than one step of reasoning.

The distance function (Equation 3), measuring the distance of entity pair embeddings (points) to relation embeddings (hyper-parallelograms), is split into two parts:

Intuition. As briefly explained in Section 4, the general idea of splitting the distance function is to assign high scores to entity pair embeddings within a hyper-parallelogram and low scores to entity pair embeddings outside it. Specifically, if a triple r_i(h, t) is captured as true by an ExpressivE embedding, i.e., if τ_{r_i(h,t)} ⪯ d_i^ht, then the distance correlates inversely with the hyper-parallelogram's width, through the width-dependent factor w_i, keeping distances and gradients low for points within the hyper-parallelogram. Otherwise, the distance correlates, again through the width-dependent factor w_i, linearly with the width to penalize points outside larger parallelograms.
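The two-part behavior described above can be sketched numerically. The exact form of ExpressivE's distance is given by Equation 3 in the paper; the factor w_i = 2 d_i + 1 and the outside offset below are assumptions chosen only to make the sketch continuous at the band boundary.

```python
import numpy as np

# Illustrative sketch of a two-part, width-dependent distance (the
# exact ExpressivE formula is Eq. 3 of the paper; the constants here
# are assumed).  Inside the band (|tau| <= d) the distance shrinks
# with the width w, keeping gradients small for captured triples;
# outside it grows linearly with w, penalising points outside larger
# parallelograms more strongly.
def split_distance(tau, d):
    tau, d = np.abs(tau), np.asarray(d)
    w = 2.0 * d + 1.0                 # width-dependent factor w_i
    inside = tau <= d                 # triple captured by the band?
    # the offset d*(w - 1/w) makes the two branches meet at |tau| = d
    dist = np.where(inside, tau / w, tau * w - d * (w - 1.0 / w))
    return dist.sum()
```

At the boundary |tau| = d both branches evaluate to d / w, so the sketch is continuous; inside the band the slope is 1/w (small for wide bands), outside it is w (large for wide bands), matching the intuition above.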

K EXPRESSIVE'S TWO NATURES

In this section, we analyze functional and spatial models in more detail and outline how ExpressivE combines the capabilities of both model families. ExpressivE has two natures, specifically:

• ExpressivE has a functional nature (in the spirit of functional models such as TransE and RotatE), allowing it to capture functional composition, discussed in detail in Appendix K.1.
• ExpressivE has a spatial nature (in the spirit of spatial models such as BoxE), allowing it to capture hierarchy, discussed in detail in Appendix K.2.

The combination of the functional and spatial nature is precisely what allows ExpressivE to capture hierarchy and composition patterns jointly. In the following, we review the inference capabilities of spatial and functional models and discuss how ExpressivE combines both the spatial and functional nature.
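The functional composition mentioned in the first bullet can be made concrete with the simplest functional model, TransE, where each relation is a translation. The vectors below are arbitrary illustrative values.

```python
import numpy as np

# Functional composition in a functional model, using TransE-style
# translations as the simplest concrete instance: each relation r is
# a function f_r(x) = x + v_r, so composing r1 and r2 yields another
# translation with vector v_r1 + v_r2, i.e., f_rd = f_r2 o f_r1 is
# again a relation of the same functional family.
v_r1 = np.array([1.0, -0.5])
v_r2 = np.array([0.25, 2.0])

f_r1 = lambda x: x + v_r1
f_r2 = lambda x: x + v_r2
f_rd = lambda x: f_r2(f_r1(x))        # composed relation function

# if r1(h, y) and r2(y, t) hold, then t = f_rd(h) = h + (v_r1 + v_r2)
e_h = np.array([0.0, 0.0])
assert np.allclose(f_rd(e_h), v_r1 + v_r2)
```

Because the composed function is itself a translation, the composed relation r_d can be embedded in the same family, which is what lets functional models capture this form of composition.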

K.1 ANALYSIS OF FUNCTIONAL MODELS

We recall the definition of functional models provided in Section 3, which states that functional models embed relations as functions f_ri : K^d → K^d and entities as vectors e_j ∈ K^d over some field K. These models represent true triples r_i(e_h, e_t) as e_t = f_ri(e_h) in the embedding space.

Our analysis has revealed that the root cause of functional models' inability to capture general composition patterns lies in their functional nature. In essence, these models mainly employ functions to embed relations. This allows them to employ functional composition f_rd = f_r2 ∘ f_r1 to

