TWO BIRDS, ONE STONE: AN EQUIVALENT TRANSFORMATION FOR HYPER-RELATIONAL KNOWLEDGE GRAPH MODELING

Anonymous authors
Paper under double-blind review

Abstract

By representing knowledge as a primary triple associated with additional attribute-value qualifiers, the hyper-relational knowledge graph (HKG), which generalizes the triple-based knowledge graph (KG), has recently been attracting research attention. Compared with KG, HKG is enriched with the semantic difference between the primary triple and additional qualifiers, as well as the structural connection between entities in the hyper-relational graph structure. However, existing studies on HKG modeling mainly focus on either the semantic information or the structural information therein, failing to capture both simultaneously. To tackle this issue, in this paper we propose an equivalent transformation for HKG modeling, referred to as TransEQ. Specifically, the equivalent transformation transforms a HKG to a KG while considering both semantic and structural characteristics. A generalized encoder-decoder framework is then developed to bridge the modeling research between KG and HKG. In the encoder part, KG-based graph neural networks are leveraged for structural modeling, while in the decoder part, various HKG-based scoring functions are exploited for semantic modeling. In particular, we design a sharing embedding mechanism in the encoder-decoder framework to capture semantic relatedness. We further theoretically prove that TransEQ preserves complete information in the equivalent transformation and also achieves full expressivity. Finally, extensive experiments on three benchmarks demonstrate the superior performance of TransEQ in terms of both effectiveness and efficiency. On the largest benchmark, WikiPeople, TransEQ significantly improves over state-of-the-art models by 15% on MRR.

1. INTRODUCTION

In the past decade, the knowledge graph (KG) has been widely studied in the artificial intelligence area (Ji et al., 2021). By representing facts as triples (s, r, o) with subject entity s, object entity o and relation r, KG stores real-world knowledge in a graph structure. However, recent studies find that KG with simple triples provides incomplete information (Galkin et al., 2020; Rosso et al., 2020). For example, both (Alan Turing, educated at, Cambridge) and (Alan Turing, educated at, Princeton) are true facts in KG, which might be ambiguous when the degree matters. Hence, the hyper-relational KG (HKG) (Galkin et al., 2020; Rosso et al., 2020; Yu & Yang, 2021), a.k.a. knowledge hypergraph (Fatemi et al., 2020; 2021) and n-ary knowledge base (Guan et al., 2019; Liu et al., 2021), is proposed for more generalized knowledge representation. Formally, in HKG, a primary triple is augmented with additional attribute-value qualifiers for rich semantics, forming a hyper-relational fact (Guan et al., 2020). Note that a triple without qualifiers is a special case of hyper-relational facts. Taking Figure 1 as an example, both (Alan Turing, educated at, Cambridge, (degree, Bachelor)) and (Alan Turing, educated at, Princeton, (degree, PhD)) are hyper-relational facts, where (degree, Bachelor) and (degree, PhD) are qualifiers with the degree attribute considered. Such hyper-relational facts are so ubiquitous that over 1/3 of the entities in Freebase (Bollacker et al., 2008) are involved in them (Wen et al., 2016). To learn from HKG and further benefit downstream tasks, HKG modeling learns low-dimensional vector representations (embeddings) of entities and relations (Wang et al., 2021), designing a scoring function (SF) based on the embeddings to measure the plausibility of hyper-relational facts such that valid ones obtain higher scores than invalid ones.
Especially, existing studies mainly consider two aspects of HKG for modeling: semantic information and structural information. The semantic information emphasizes the interaction between entities and relations in a hyper-relational fact. In particular, there is a distinction, a.k.a. semantic difference (Galkin et al., 2020), between the primary triple and attribute-value qualifiers; e.g., the primary triple (Alan Turing, educated at, Cambridge) serves as the fundamental part and preserves the essential knowledge of Alan Turing's education experience at Cambridge, while the attribute-value qualifier (degree, Bachelor) serves as the auxiliary part and enriches the primary triple. To model the semantic information, early studies treat the primary relation and qualifier relations as an n-ary (n≥2) composed relation (Abboud et al., 2020) or as multiple semantically equal attributes (Guan et al., 2019; Liu et al., 2021), largely ignoring the semantic difference. Various SFs are further developed in recent studies (Galkin et al., 2020; Rosso et al., 2020; Yu & Yang, 2021) with the semantic difference considered. On the other hand, the structural information focuses on the topological connection between entities in the hyper-relational graph structure, i.e., an entity's neighboring entities under various hyper-relational links; e.g., in Figure 1, Bachelor and Michelle Obama are neighbors of Alan Turing via degree and alumni, respectively. Only a few studies (Galkin et al., 2020; Yadati, 2020) extend hypergraph neural network (HGNN) based modules to capture the structural information in HKG; however, empirical results in (Yu & Yang, 2021) demonstrate that removing such modules does not degrade performance, i.e., these direct extensions are quite immature for effective structural information capture.
Hence, to the best of our knowledge, none of the existing studies achieves HKG modeling with both semantic information and structural information completely captured, and this remains an open problem. Targeting this open problem, we look back at KG modeling with an interesting observation: recent studies (Vashishth et al., 2019; Yu et al., 2021) leverage an encoder-decoder framework for KG modeling, i.e., a powerful graph neural network (GNN) based encoder and an expressive SF-based decoder on triples are leveraged for structural information and semantic information, respectively. Inspired by this, in this paper we propose an EQuivalent Transformation for HKG modeling, termed TransEQ. Specifically, TransEQ designs an equivalent transformation on the hyper-relational graph structure, transforming a HKG to a KG with the semantic difference considered, based on which a generalized encoder-decoder framework is further developed to capture information. For structural information, TransEQ introduces a GNN-based encoder on the transformed KG with the transformation characteristics combined. For semantic information, to measure the plausibility of a hyper-relational fact, TransEQ exploits various SFs from existing HKG modeling studies as the decoder. A sharing embedding mechanism is further designed to capture the semantic relatedness between hyper-relational facts. In this way, with the equivalent transformation, the encoder-decoder framework in TransEQ captures not only structural information but also semantic information, which is the key innovation of this work, just like killing two birds with one stone. Besides, the flexible choice of SF in the decoder ensures the full expressivity of TransEQ, representing all types of relations. We further theoretically prove that the proposed transformation between a HKG and a KG is equivalent, i.e., incurs no information loss.
Extensive experiments show that TransEQ achieves the state-of-the-art results, obtaining a 15% relative increase of MRR on the largest benchmark WikiPeople.

2. RELATED WORK

As described before, related studies mainly exploit the two aspects of semantic information and structural information for HKG modeling, considering HKG-based SF design and the hyper-relational graph structure, respectively.

Semantic Modeling Studies. Given a hyper-relational fact (Alan Turing, educated at, Cambridge, (degree, Bachelor)), some studies treat all involved relations as an n-ary composed relation educated at_degree (here n is 3), yielding the fact (educated at_degree, Alan Turing, Cambridge, Bachelor). For example, both m-TransH (Wen et al., 2016) and RAE (Zhang et al., 2018) extend the SF of TransH (Wang et al., 2014) to the hyper-relational case. BoxE (Abboud et al., 2020) combines the translational idea with box embeddings. Moreover, GETD (Liu et al., 2020) and S2S (Di et al., 2021) are both generalized from TuckER (Balazevic et al., 2019), where GETD further introduces tensor ring decomposition while S2S applies neural architecture search techniques. The bilinear product is also extended to the multilinear product, with symmetric embeddings in m-DistMult (Yang et al., 2015), convolutional filters in HypE (Fatemi et al., 2020), and relational algebra operations in ReAlE (Fatemi et al., 2021). These studies are directly extended from KG modeling methods without the multiple relational semantics considered. On the other hand, NaLP (Guan et al., 2019) and RAM (Liu et al., 2021) decompose all involved relations into semantically equal attributes, and treat the example fact as a collection of attribute-value pairs, (educated at_head, Alan Turing, educated at_tail, Cambridge, degree, Bachelor). Nevertheless, the models above largely ignore the semantic difference in hyper-relational facts.
To capture the semantic difference between the primary triple and attribute-value qualifiers, NeuInfer (Guan et al., 2020) and HINGE (Rosso et al., 2020) design two sub-modules for HKG modeling, i.e., one for triple modeling and the other for qualifier modeling, where NeuInfer mainly adopts fully connected layers while HINGE resorts to convolutional neural networks. Besides, the recent GRAN (Wang et al., 2021) and Hy-Transformer (Yu & Yang, 2021) leverage Transformers and embedding processing techniques for HKG modeling. However, these neural network based models rely on a tremendous number of parameters for expressivity and are prone to overfitting.

Structural Modeling Studies. G-MPNN (Yadati, 2020) ignores attribute information, treats HKG as a multi-relational ordered hypergraph with n-ary composed relations, and proposes a multi-relational HGNN for modeling. This rough design makes G-MPNN less competitive in practice. StarE (Galkin et al., 2020) first introduces GNNs for HKG modeling with a relation-specific message passing mechanism. However, StarE aggregates hyper-relational fact messages for a specific entity only when the entity is involved in the primary triple, and ignores them when the entity appears in attribute-value qualifiers, i.e., StarE only captures connections among primary triples (Yu & Yang, 2021). Thus, capturing structural information for HKG modeling is still immature and needs further investigation. Overall, existing HKG modeling studies are affected by various limitations in semantics and structure, while our proposed TransEQ elegantly models both aspects with full expressivity achieved, an important property for learning capacity in both KG modeling (Balazevic et al., 2019; Sun et al., 2019) and HKG modeling (Abboud et al., 2020; Liu et al., 2020).
Besides, inductive link prediction and logical queries on HKG are investigated in recent studies (Ali et al., 2021; Alivanistos et al., 2022), which are beyond the scope of this paper.

3. METHOD

Here we first introduce the mathematical definition of HKG as well as the investigated problem.

Definition 1 (Hyper-relational Knowledge Graph). A HKG is defined as G_H = (E, R, F_H), where E and R are the sets of entities and relations, respectively. A hyper-relational fact can be expressed as (s, r, o, {(a_i, v_i)}^n_{i=1}), where (s, r, o) is the primary triple and {(a_i, v_i) | a_i ∈ R, v_i ∈ E}^n_{i=1} is the attribute-value qualifier set. Moreover, F_H ⊆ E × R × E × P denotes the fact set, where P denotes all possible combinations of attribute-value qualifiers. Note that the number of qualifiers can be zero for a hyper-relational fact, i.e., HKG reduces to KG when every qualifier set is empty. In practice, attributes and values are also described by relations and entities, respectively (Galkin et al., 2020; Yu & Yang, 2021).

Then we state our research problem.

Problem 1 (HKG Modeling Problem). Given a HKG G_H = (E, R, F_H), the HKG modeling problem aims to learn representations for the entities and relations in E and R, respectively. Especially, the HKG is always incomplete, which specifies the research problem as the HKG completion problem in practice, i.e., given an incomplete hyper-relational fact with an entity missing in the triple or qualifiers, inferring the missing entity from E with the observable facts F_H.

Moreover, HKG involves the semantic information of the primary triple and attribute-value qualifiers as well as the structural information of the hyper-relational graph structure, both of which should be elegantly considered in modeling. As described before, the encoder-decoder framework has shown superior performance in capturing both structural information and semantic information in KG (Yu et al., 2021; Vashishth et al., 2019), and thus a natural idea is to explore it for HKG modeling. Besides, standard RDF reification in the semantic web (Frey et al., 2019) as well as the compound value type in Freebase (Bollacker et al., 2008) have been investigated to describe triples with metadata by transformation.
These works provide a motivation to our work transforming a HKG to a KG with the encoder-decoder framework combined. Hence, we build TransEQ with such points in mind, which is presented in the following.
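To make Definition 1 concrete, the following is a minimal sketch of how a hyper-relational fact can be represented as a data structure; the class and field names are illustrative, not taken from the paper's implementation.

```python
# Illustrative data structure for a hyper-relational fact (Definition 1):
# a primary triple (s, r, o) plus a (possibly empty) qualifier set.
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class HyperRelationalFact:
    s: str                                        # subject entity
    r: str                                        # primary relation
    o: str                                        # object entity
    qualifiers: Tuple[Tuple[str, str], ...] = ()  # (attribute, value) pairs

# A plain triple is the special case with an empty qualifier set.
plain = HyperRelationalFact("Alan Turing", "educated at", "Cambridge")
hyper = HyperRelationalFact("Alan Turing", "educated at", "Cambridge",
                            (("degree", "Bachelor"),))
```

This mirrors the view that a KG fact is a hyper-relational fact with zero qualifiers.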

3.1. THE TRANSEQ MODEL

We now come to the details of TransEQ, whose architecture is illustrated in Figure 2. TransEQ first introduces the equivalent transformation, which transforms a HKG to a KG, and then develops a generalized encoder-decoder framework, where a GNN-based encoder and an SF-based decoder are leveraged for modeling structural information and semantic information, respectively.

3.1.1. ONE STONE: EQUIVALENT TRANSFORMATION

To clarify the requirement on transformations between HKG and KG, here we first introduce the definition of equivalent transformation.

Definition 2 (Equivalent Transformation). A transformation between HKG and KG is equivalent if the transformation preserves the complete information, i.e., given any HKG and its transformed KG via the transformation, each can be retrieved from the other.

Moreover, a hyper-relational fact (s, r, o, {(a_i, v_i)}^n_{i=1}) can be viewed as a hyper-relational edge, which connects the entities s, o, {v_i}^n_{i=1} with the heterogeneous semantics of the primary relation r and attributes {a_i}^n_{i=1}, as shown in Figure 3(a) for the k-th fact in a HKG. Thus, motivated by star expansion, we propose an equivalent transformation for hyper-relational edges such that the entities and relations in the original HKG are reorganized into the transformed KG with both structural information and semantic information preserved. Specifically, the equivalent transformation in Figure 3(b) introduces a mediator entity b_k to identify the fact, and the primary relation r is extended with two relations r_sub and r_obj for the relational edges between b_k and the subject entity s and object entity o, respectively. The attribute information in the original hyper-relational fact is preserved by the attribute-based edges between b_k and the value entities. Moreover, a relational edge r connects entities s and o for the semantic difference, i.e., this operation leads to a three-node clique motif (Milo et al., 2002), reflecting the primary role of the triple. For better understanding, we present the execution process of the equivalent transformation in Algorithm 1. Especially, in lines 7-8, TransEQ utilizes different transformation operations to model the semantic difference. Besides, the original structure of a triple fact, i.e., a hyper-relational fact without qualifiers, is kept as-is to avoid redundancy.
As proved later, such a transformation brings no information loss, and provides a good basis for the following encoder-decoder framework.

Algorithm 1: The algorithm for equivalent transformation.
Input: HKG G_H = (E, R, F_H)
Init: transformed KG G = (E, R, F) with F ← ∅
1:  Obtain the set R_pri of primary relations for facts in F_H;
2:  for r ∈ R_pri do
3:      Define new relations r_sub and r_obj, and R ← R ∪ {r_sub, r_obj};
4:  end
5:  for the k-th fact (s, r, o, {(a_i, v_i)}^n_{i=1}) ∈ F_H do
6:      Define mediator entity b_k, and E ← E ∪ {b_k};
7:      F ← F ∪ {(s, r, o), (b_k, r_sub, s), (b_k, r_obj, o)};
8:      F ← F ∪ {(b_k, a_i, v_i)}^n_{i=1};    // semantic difference modeling in lines 7-8
9:  end
Output: transformed KG G = (E, R, F)
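Algorithm 1 can be sketched directly in code. The following is a minimal, illustrative implementation assuming facts are given as (s, r, o, qualifier-list) tuples and mediator entities are named by fact index; the naming scheme is our assumption, not the paper's.

```python
# A direct sketch of Algorithm 1: transform hyper-relational facts into triples.
def equivalent_transformation(facts):
    """facts: list of (s, r, o, [(a1, v1), ...]) tuples. Returns KG triples."""
    triples = []
    for k, (s, r, o, quals) in enumerate(facts):
        triples.append((s, r, o))              # clique edge keeps the primary triple
        if not quals:
            # Qualifier-free facts are kept as plain triples to avoid redundancy.
            continue
        b = f"mediator_{k}"                    # mediator entity b_k (illustrative name)
        triples.append((b, f"{r}_sub", s))     # extended relation r_sub
        triples.append((b, f"{r}_obj", o))     # extended relation r_obj
        for a, v in quals:                     # attribute-based edges (b_k, a_i, v_i)
            triples.append((b, a, v))
    return triples

facts = [("Alan Turing", "educated at", "Cambridge", [("degree", "Bachelor")]),
         ("Alan Turing", "educated at", "Princeton", [("degree", "PhD")])]
kg = equivalent_transformation(facts)
```

Each hyper-relational fact with qualifiers yields the three-node clique motif plus one attribute edge per qualifier, so the two example facts above produce eight triples in total.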

3.1.2. TWO BIRDS: ENCODER-DECODER FRAMEWORK

To model both structural information and semantic information in the original HKG, TransEQ further introduces a generalized encoder-decoder framework on the transformed KG.

GNN-based Encoder. A powerful GNN is developed to capture structural information, into which the semantic relatedness in HKG and the mediator entities of the equivalent transformation are also incorporated. The semantic relatedness lies explicitly in the shared primary relations across hyper-relational facts. For example, the hyper-relational facts (Alan Turing, educated at, Cambridge, (degree, Bachelor)) and (Alan Turing, educated at, Princeton, (degree, PhD)) share the same primary relation, which indicates a strong semantic relatedness. On the other hand, in our proposed equivalent transformation, each mediator entity plays the important role of relaying connections among the entities of an original hyper-relational fact, and thus mediator entities aggregate the semantics of the corresponding facts. Hence, we introduce sharing embeddings for mediator entities to capture the semantic relatedness, and combine them with the three steps of the unified multi-relational message passing mechanism in KG-based GNNs (Schlichtkrull et al., 2018; Vashishth et al., 2019).

• Initialized embedding. Given the embedding dimension d, for a mediator entity b, we denote by ψ(b) the mapping from b to its involved primary relation, and initialize its representation as h^0_b = [e_{ψ(b)}; e_b], where e_{ψ(b)} ∈ R^{⌊α·d⌋} and e_b ∈ R^{d−⌊α·d⌋} are the sharing and independent embeddings, respectively, and α is a hyperparameter that tunes the sharing embedding ratio. Thus, mediator entities involved with the same primary relation ψ(b) share the embedding part e_{ψ(b)}.

• Message calculation. Considering the stacked layers of the GNN, we denote by m^{l+1,ent}_{urt} and m^{l+1,rel}_{urt} the messages from a triple (u, r, t) for target entity t and relation r at the (l+1)-th layer, respectively, which are calculated as

    m^{l+1,ent}_{urt} = MSG_ent(h^l_u, h^l_r, h^l_t),    m^{l+1,rel}_{urt} = MSG_rel(h^l_u, h^l_r, h^l_t),

where h^l_u, h^l_t, h^l_r ∈ R^d are the embeddings of the entities and relation at the l-th layer, while MSG_ent and MSG_rel can be the composition function in CompGCN (Vashishth et al., 2019), the relation-specific projection in R-GCN (Schlichtkrull et al., 2018), etc. The entity representations at the input layer are, for x ∈ {u, t},

    h^0_x = [e_{ψ(x)}; e_x] if x is a mediator entity,    h^0_x = e'_x ∈ R^d if x is an original entity.

• Message aggregation. The neighborhood messages M^{l+1}_t and M^{l+1}_r are then aggregated as

    M^{l+1}_t = AGG_ent({m^{l+1,ent}_{urt} | r ∈ R, u ∈ N^r_t}),    M^{l+1}_r = AGG_rel({m^{l+1,rel}_{urt} | (u, t) ∈ N^r}),

where N^r_t denotes the entities linked to t via relation r and N^r denotes the entity pairs linked by relation r. AGG_ent and AGG_rel are aggregation functions such as mean/sum pooling.

• Representation update. Finally, the representations at the (l+1)-th layer are updated with the aggregated messages and the former layer's representations:

    h^{l+1}_t = UPD_ent(M^{l+1}_t, h^l_t),    h^{l+1}_r = UPD_rel(M^{l+1}_r, h^l_r),

where UPD_ent and UPD_rel can be nonlinear activation functions. Owing to the above encoding process, TransEQ fully exploits the topological connections between entities for structural information.

SF-based Decoder. The decoder part exploits various SFs to model semantic information. For each hyper-relational fact, the encoder feeds the representations of the corresponding entities and relations into the SF-based decoder to model the interaction between entities and relations therein.
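Before turning to concrete SFs, the encoder's message passing steps can be sketched as follows. This is a toy numpy sketch of one layer, assuming mean aggregation (AGG), a CompGCN-style subtraction composition for MSG_ent, and a tanh update (UPD); all names and choices are illustrative, and relation updates are omitted for brevity.

```python
import numpy as np

def gnn_layer(h_ent, h_rel, triples):
    """One message-passing step over transformed-KG triples (u, r, t)."""
    msgs = {t: [] for t in h_ent}
    for u, r, t in triples:
        # MSG_ent: compose neighbor and relation embeddings (here: h_u - h_r).
        msgs[t].append(h_ent[u] - h_rel[r])
    new_ent = {}
    for t, vecs in msgs.items():
        if vecs:   # AGG_ent: mean pooling; UPD_ent: tanh nonlinearity
            new_ent[t] = np.tanh(h_ent[t] + np.mean(vecs, axis=0))
        else:      # entities with no incoming messages keep their embedding
            new_ent[t] = h_ent[t]
    return new_ent

d, alpha = 8, 0.5
rng = np.random.default_rng(0)
# Sharing embedding for a mediator entity: the first ⌊α·d⌋ dimensions are
# shared across mediators with the same primary relation.
e_shared = rng.normal(size=int(alpha * d))
h0_b = np.concatenate([e_shared, rng.normal(size=d - int(alpha * d))])
h_ent = {"s": rng.normal(size=d), "o": rng.normal(size=d), "b": h0_b}
h_rel = {"r": rng.normal(size=d), "r_sub": rng.normal(size=d)}
out = gnn_layer(h_ent, h_rel, [("s", "r", "o"), ("b", "r_sub", "s")])
```

In a full implementation, MSG, AGG and UPD would be learned modules as in CompGCN or R-GCN; the sketch only shows the data flow of the three steps.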
Especially, the choice of SF is orthogonal to the encoder, and most existing SFs for HKG modeling can be adapted as the decoder. For example, for a hyper-relational fact x := (s, r, o, {(a_i, v_i)}^n_{i=1}) ∈ F_H, we rewrite m-DistMult's SF as

    ϕ(x) = ⟨φ(h^L_r, h^L_{a_1}, …, h^L_{a_n}), h^L_s, h^L_o, h^L_{v_1}, …, h^L_{v_n}⟩.

Model Training. To learn the model parameters, we adopt the cross-entropy loss for training. For a hyper-relational fact x ∈ F_H with score ϕ(x), the practical loss can be written as

    L = Σ_{x∈F_H} L_x(ϕ) = Σ_{x∈F_H} −log [ e^{ϕ(x)} / ( e^{ϕ(x)} + Σ_{x′∈N_x} e^{ϕ(x′)} ) ],

where N_x denotes the negative samples, i.e., facts in which entities in the triple and attribute-value qualifiers of x are replaced by other entities in E. The training algorithm of TransEQ is presented in Appendix B for better understanding. The overall model is trained in a mini-batch way with batch normalization and dropout utilized for regularization.

Overall, the proposed TransEQ develops an equivalent transformation that transforms a HKG to a KG. A generalized encoder-decoder framework then associates KG modeling research with HKG modeling, where the KG-based GNN encodes structural information while the HKG-based SF in the decoder focuses on semantic information.

Table 1: Comparison of representative HKG modeling approaches in terms of structural module, semantic difference, SF type, full expressivity, and time/parameter complexity (✗/✓ denote absence/presence).

Model | Structural module | Sem. diff. | SF type | Fully expressive | Time | Parameters
— | ✗ | ✗ | Neural | ✗ | O(d²) | O(n_e d + n_r d)
m-DistMult | ✗ | ✗ | Multilinear | ✗ | O(d) | O(n_e d + n_r^pri d)
HypE | ✗ | ✗ | Multilinear | ✓ | O(d) | O(n_e d + n_r^pri d)
HINGE | ✗ | ✓ | Neural | ✗ | O(d²) | O(n_e d + n_r d)
G-MPNN | HGNN | ✗ | Multilinear | ✗ | O(N d²) | O(n_e d + n_r^pri d + n_a d)
StarE | GNN | ✓ | Neural | ✗ | O(N d² + n_a d²) | O(n_e d + n_r d)
TransEQ | Transformation & GNN | ✓ | Arbitrary SF | ✓ | O(N d²) | O(n_e d + n_r d + N d)
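As an illustration, the m-DistMult-style decoder score and the cross-entropy loss above can be sketched numerically. This is an assumption-laden toy: we take φ to be an element-wise product over the relation/attribute embeddings and ⟨·⟩ the multilinear product; shapes and names are illustrative.

```python
import numpy as np

def score(rels, ents):
    """ϕ(x) = ⟨φ(r, a_1..a_n), s, o, v_1..v_n⟩ as a multilinear product."""
    prod = np.ones_like(rels[0])
    for vec in list(rels) + list(ents):
        prod = prod * vec          # element-wise product over all embeddings
    return prod.sum()              # sum to a scalar score

def loss(pos_score, neg_scores):
    """Cross-entropy over one positive fact and its negative samples N_x."""
    logits = np.concatenate([[pos_score], neg_scores])
    logits = logits - logits.max()           # subtract max for numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

d = 4
rng = np.random.default_rng(1)
r, s, o = (rng.normal(size=d) for _ in range(3))
pos = score([r], [s, o])                      # a qualifier-free fact
negs = np.array([score([r], [s, rng.normal(size=d)]) for _ in range(5)])
l = loss(pos, negs)
```

With qualifiers, the attribute embeddings would join `rels` and the value embeddings would join `ents`, matching the rewritten SF.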

3.2. THEORETICAL UNDERSTANDING

Complexity Analysis. Table 1 compares TransEQ with representative baselines in terms of time and parameter complexity.

Information-Preserving Transformation. Following the concern about structural information loss in hyperedge expansion (Arya et al., 2021; Dong et al., 2020; Zhou et al., 2006), here we investigate the information loss problem for our proposed transformation on HKG, which emphasizes preserving both structural information and semantic information. Based on the equivalent transformation in Definition 2 and the proposed transformation in TransEQ, we identify this property with the following theorem.

Theorem 1 In the conversion from a HKG to a KG, the proposed transformation in TransEQ is an equivalent transformation and preserves the complete information.

Full Expressivity. To demonstrate the expressivity of TransEQ, we introduce the full expressivity property (Abboud et al., 2020; Fatemi et al., 2020; Liu et al., 2021). A HKG model is fully expressive if, for any given HKG, the model can separate valid hyper-relational facts from invalid ones by an appropriate parameter configuration. In the encoder-decoder framework of TransEQ, this property is mainly determined by the SF in the decoder, so we establish the expressivity of TransEQ with the following theorem.

Theorem 2 With encoder parameters configured appropriately, the expressivity of TransEQ accords with that of the scoring function it uses in the decoder, i.e., TransEQ is fully expressive if the scoring function used in the decoder is fully expressive.

Thus, with an appropriate choice of SF such as HypE (Fatemi et al., 2020) and of model parameters, a fully expressive TransEQ model has the potential to represent all types of relations in HKG, including symmetric relations, inverse relations, etc. (Liu et al., 2021; Sun et al., 2019), and generally outperforms weaker choices in practice, as validated in Section 4.3.

The proofs of the above theorems are provided in Appendix C.

Manually-designed vs. Learnable Transformations. According to the TransEQ model design, with a theoretical guarantee on preserving information, the manually-designed equivalent transformation paves the way for capturing both semantic information and structural information in HKG. Although such a transformation design could be made learnable, the learning process would be overly complex and without theoretical guarantee, while TransEQ with the manually-designed transformation already achieves state-of-the-art performance, as validated by the results in Section 4.2. Moreover, the simple yet effective manually-designed transformation takes the semantic difference into consideration, which offers valuable insights and rethinking to HKG modeling research.

4. EXPERIMENTS AND RESULTS

4.1. EXPERIMENTAL SETUP

Datasets.

The experiments are conducted on three benchmark HKG datasets, i.e., WikiPeople (Guan et al., 2019), JF17K (Zhang et al., 2018) and FB-AUTO (Fatemi et al., 2020). We follow the standard splits (Guan et al., 2019) of these datasets. Detailed statistics can be found in Appendix D.

Baselines. For performance comparison, we compare with several state-of-the-art HKG modeling approaches, including the semantic modeling ones of BoxE (Abboud et al., 2020), S2S (Di et al., 2021), HypE (Fatemi et al., 2020), NeuInfer (Guan et al., 2020), RAM (Liu et al., 2021), HINGE (Rosso et al., 2020) and m-TransH (Wen et al., 2016), as well as the structural modeling ones of StarE (Galkin et al., 2020) and G-MPNN (Yadati, 2020). Besides, in TransEQ, we mainly adopt CompGCN (Vashishth et al., 2019) as the encoder and m-DistMult (Fatemi et al., 2020) as the decoder.

Task and Evaluation Metrics. Following typical settings (Abboud et al., 2020; Fatemi et al., 2020; Guan et al., 2019; Liu et al., 2020; Wang et al., 2021), we evaluate HKG modeling approaches on the HKG completion task in the transductive setting, and predict the missing entity at each position, including the triple and qualifier parts. Note that this task is more general than only predicting positions in the triple part (Galkin et al., 2020; Rosso et al., 2020; Yu & Yang, 2021). As evaluation metrics, the standard mean reciprocal rank (MRR) and Hit@1,3,10 are utilized in the filtered setting (Bordes et al., 2013; Guan et al., 2019). Code and data are available at: https://anonymous.4open.science/r/TransEQ_Implementation-03FB.

4.2. HKG COMPLETION RESULTS

Table 2: Results of HKG completion on all datasets (MRR, Hit@1 and Hit@10 on WikiPeople, JF17K and FB-AUTO). Results of baselines are collected from the original papers and (Di & Chen, 2022; Fatemi et al., 2020; Liu et al., 2021). Best results are highlighted in bold, and second best results are underlined. "-" denotes missing results.

We present the benchmark comparison of HKG completion in Table 2. According to the results, our proposed TransEQ model achieves state-of-the-art performance on all benchmarks. On the hardest dataset, WikiPeople, which has the most entities and relations, TransEQ significantly improves over the best baseline (BoxE) by 27% and 15% on Hit@1 and MRR, respectively. Considering the hyper-relational connections provided in WikiPeople, this improvement demonstrates that our proposed equivalent transformation preserves complete HKG information. Besides, TransEQ significantly outperforms m-DistMult, its original decoder model without the GNN-based encoder, which indicates the effectiveness and necessity of considering structural information in HKG modeling. Such results also imply that with more powerful SFs like BoxE, TransEQ can obtain even better performance. Moreover, compared with the structural modeling approaches G-MPNN and StarE, the substantial improvement of TransEQ owes to the subtle design of the equivalent transformation as well as the semantic information captured in the decoder part.

4.3. ENCODER-DECODER CHOICE COMPARISON

To further investigate the effects of different GNN-based encoders along with HKG-based SFs as decoders, we compare the performance of different encoder-decoder choices in Table 3, where each result corresponds to a TransEQ model with X as encoder and Y as decoder. According to the results in each row of Table 3, compared with the original models (X = No Encoder), TransEQ models with various GNN-based encoders bring substantial improvement, which again demonstrates the effectiveness of structural information encoding. Since neural network models with tremendous parameters easily overfit, the performance improvement a GNN-based encoder brings to Transformer is much lower than that for other decoders. As for each encoder column, the decoder choice of HypE achieves the best performance, mainly attributed to its linear complexity and full expressivity. Benefiting from the generalized encoder-decoder framework, TransEQ can flexibly adapt to various GNNs and SFs for both superior performance and full expressivity.

4.4. INFORMATION SHARING STUDY

To validate whether the semantic relatedness in HKG is captured by the sharing embeddings on mediator entities, we take the hyper-relational facts of the top ten primary relations and visualize their mediator entity embeddings via t-SNE (Maaten & Hinton, 2008), as shown in Figure 4(a). We select WikiPeople for visualization considering the explicit attribute information therein, and mediator entities belonging to the same primary relation are marked in the same color. From the figure, we observe that mediator entities are neatly clustered according to their mapped primary relations, which is in accord with our sharing embedding design in the GNN-based encoder. We further investigate the effect of the sharing hyperparameter α in Figure 4(b) and (c). An optimal point can be observed for both datasets, which estimates the semantic relatedness in the corresponding dataset. Moreover, a higher sharing ratio α means fewer model parameters; e.g., α = 0 corresponds to the case where each mediator entity has an independent embedding, while α = 1 means all mediator entities with the same primary relation share the same representation. Thus, a tradeoff between model parameter complexity and practical performance can be achieved.
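The parameter tradeoff governed by α can be made concrete with a back-of-the-envelope count. The function below follows the initialized-embedding design (shared chunk of ⌊α·d⌋ dimensions per primary relation, independent chunk per mediator); the counts are illustrative, not taken from the paper.

```python
# Mediator-embedding parameter count as a function of the sharing ratio α.
def mediator_params(num_mediators, num_primary_rels, d, alpha):
    shared = int(alpha * d)
    # One shared chunk per primary relation, one independent chunk per mediator.
    return num_primary_rels * shared + num_mediators * (d - shared)

full = mediator_params(10000, 50, 200, 0.0)   # α = 0: fully independent embeddings
tied = mediator_params(10000, 50, 200, 1.0)   # α = 1: fully shared per relation
```

Since the number of mediators (one per qualified fact) far exceeds the number of primary relations, raising α shrinks the dominant term, which is why high α sharply reduces model size.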

5. CONCLUSION

In this paper, we propose TransEQ for HKG modeling. With the equivalent transformation developed, TransEQ successfully transforms a HKG to a KG without information loss. In particular, TransEQ builds a generalized encoder-decoder framework, which for the first time captures both structural information and semantic information for HKG. Experimental results show that TransEQ obtains state-of-the-art results on benchmark datasets. For future work, we would like to make the transformation design automated, such that each hyper-relational fact can be automatically transformed into a multi-relational subgraph following relation-specific transformations. Moreover, we plan to introduce specific GNN modules on the transformed KG to process the attribute information as well as the primary information attached to the mediator entities.

A.1 KG MODELING

Learning representations for entities and relations in KGs has been investigated thoroughly (Ji et al., 2021; Wang et al., 2017), with various SFs designed to model the semantics in triple knowledge (s, r, o). Based on the translational idea, TransE (Bordes et al., 2013), TransH (Wang et al., 2014) and RotatE (Sun et al., 2019) measure the distance between subject and object entities in a relation-specific latent space. Besides, ConvE (Dettmers et al., 2018) adopts convolutional neural networks for SF design, and TuckER (Balazevic et al., 2019) employs Tucker decomposition. Furthermore, several models combine the bilinear product with various types of embeddings (Cao et al., 2021; Trouillon et al., 2016; Yang et al., 2015). For example, ComplEx (Trouillon et al., 2016) and DualE (Cao et al., 2021) employ complex-valued embeddings and dual quaternion embeddings, respectively. However, the models above ignore the multi-relational graph structure of KGs. Only with the emergence of the message passing mechanism in GNNs did structural information capture become an important topic in KG modeling.
An encoder-decoder framework is developed in recent KG-based GNN studies, where GNNs encode the structural information of the KG and various SFs are combined for semantic information. Specifically, both R-GCN (Schlichtkrull et al., 2018) and SACN (Shang et al., 2019) treat the multi-relational KG as multiple single-relational graphs, and apply a relational graph convolutional network (GCN) to learn entity representations. Moreover, VR-GCN (Ye et al., 2019) combines the translational idea with GNN to learn both entity and relation representations. CompGCN (Vashishth et al., 2019) develops three entity-relation composition operators to update entity representations in the GCN, and KE-GCN (Yu et al., 2021) further incorporates the composition into relation updates. NBFNet (Zhu et al., 2021) and RED-GNN (Zhang & Yao, 2022) also explore GNNs with subgraphs for KG completion. Overall, GNN-based models achieve promising results in KG modeling, which demonstrates the importance of capturing structural information.
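As a concrete illustration of the composition idea mentioned above, CompGCN's three entity-relation composition operators are subtraction (TransE-style), element-wise multiplication (DistMult-style), and circular correlation (HolE-style). A minimal NumPy sketch:

```python
import numpy as np

def comp_sub(e, r):
    # Subtraction composition (TransE-style): phi(e, r) = e - r
    return e - r

def comp_mult(e, r):
    # Element-wise multiplication (DistMult-style): phi(e, r) = e * r
    return e * r

def comp_corr(e, r):
    # Circular correlation (HolE-style), computed via FFT:
    # (e ⋆ r)[k] = sum_i e[i] * r[(i + k) mod d]
    return np.real(np.fft.ifft(np.conj(np.fft.fft(e)) * np.fft.fft(r)))
```

Each operator maps a neighbor entity embedding and the connecting relation embedding to a single message vector, which the GCN layer then aggregates.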

A.2 HYPERGRAPH & HYPEREDGE EXPANSION

A hypergraph is a generalization of a graph in which a hyperedge can join any number of nodes (Ouvrard, 2020). In particular, hyperedge expansion (Agarwal et al., 2006; Dong et al., 2020; Zhou et al., 2006) transforms a hypergraph into a homogeneous graph, such that graph learning methods can work on hypergraphs (Feng et al., 2019; Yadati et al., 2019). Since HKG can be viewed as a multi-relational ordered hypergraph (Yadati, 2020), here we investigate the representative expansion strategy of star expansion for additional insights. For each hyperedge, star expansion introduces a mediator node, which is then connected with all original nodes in the hyperedge. With this elegant transformation, hyperedge expansion has been widely applied in recommender systems (Xia et al., 2021), link prediction (Sun et al., 2021), etc. On the other hand, structural information loss has always been a concern with hyperedge expansion strategies (Arya et al., 2021; Dong et al., 2020; Zhou et al., 2006). To be specific, an expansion strategy suffers from structural information loss if two distinct hypergraphs on the same node set can be reduced to the same graph by the expansion (Dong et al., 2020). According to (Arya et al., 2021; Dong et al., 2020), star expansion preserves the complete structural information. However, such traditional hyperedge expansion strategies cannot handle the hyper-relational semantics of HKG, which guides our research: both structural and semantic information loss should be considered when transforming a HKG to a KG.
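The star expansion described above can be sketched in a few lines; the mediator naming (`m0`, `m1`, ...) is illustrative:

```python
def star_expand(hyperedges):
    """Star-expand a hypergraph: introduce one mediator node per hyperedge
    and connect it to every node the hyperedge contains."""
    edges = []
    for k, nodes in enumerate(hyperedges):
        mediator = f"m{k}"                        # fresh mediator node
        edges.extend((mediator, v) for v in nodes)
    return edges

# The hyperedge {a, b, c, d} becomes a star with mediator m0 at its center.
edges = star_expand([["a", "b", "c", "d"]])
```

The result is a bipartite incidence graph: one side holds the original nodes, the other the mediators, so the hyperedge membership structure is fully preserved.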

B METHOD DETAILS

B.1 OTHER VARIANTS OF TRANSFORMATIONS

To demonstrate the effectiveness of our proposed equivalent transformation in Section 3.1.1, here we further show other variants of transformations in Figure 6. In particular, the plain transformation in Figure 6(a) follows star expansion without attributes considered. In comparison to star expansion, clique expansion is also a popular hyperedge expansion strategy (Dong et al., 2020; Zhou et al., 2006), which transforms a hyperedge into a clique subgraph, i.e., each pair of nodes in the hyperedge is connected in the transformed graph. Thus, we also extend clique expansion to the HKG case, e.g., the clique-based plain transformation in Figure 6(b) with only primary relations considered. To model the attribute information, in Figure 6(c), the clique-based semantic transformation decomposes each attribute a_i into two relations r^sub_{a_i} and r^obj_{a_i}, which connect the value entity with the subject and object entities, respectively. Each pair of value entities is also connected by devised relations between attributes to satisfy the clique structure. However, these variants of transformations bring information loss, while our proposed equivalent one preserves complete information, as validated by both theoretical proof and experimental performance later.

Figure 6: The illustration of other variants of transformations.
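The structural information loss of clique expansion, which the clique-based variants inherit, can be demonstrated concretely: two distinct hypergraphs collapse to the same clique graph. A minimal sketch:

```python
from itertools import combinations

def clique_expand(hyperedges):
    # Clique expansion: connect every pair of nodes inside each hyperedge.
    edges = set()
    for nodes in hyperedges:
        edges.update(combinations(sorted(nodes), 2))
    return edges

# One 3-node hyperedge and three 2-node hyperedges reduce to the SAME graph,
# so the original hypergraph cannot be recovered -> structural information loss.
one_big = clique_expand([["a", "b", "c"]])
three_small = clique_expand([["a", "b"], ["b", "c"], ["a", "c"]])
assert one_big == three_small
```

This is exactly the loss criterion of Dong et al. (2020) quoted in Section A.2: two distinct hypergraphs on the same node set reduced to the same graph.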

B.2 TRAINING PROCEDURE

Algorithm 2: TransEQ training algorithm.
Input: HKG G_H = (E, R, F_H);
Init: E for e ∈ E, R for r ∈ R, θ_Enc for the GNN-based encoder, θ_Dec for the SF-based decoder;
1: Build encoder module Enc() with θ_Enc;
2: Build decoder module Dec() with θ_Dec;
3: Transform HKG G_H to KG G with Algorithm 1;
4: for t = 1, ..., n_iter do
5:     Sample a mini-batch F_batch ⊆ F_H of size m_b, L ← 0;
6:     E, R = Enc(G, E, R, θ_Enc);
7:     for x := (s, r, o, {(a_i, v_i)}_{i=1}^n) ∈ F_batch do
8:         Construct negative samples N_x;
9:         φ(x) = Dec(x, E, R, θ_Dec);
10:        φ(x') = Dec(x', E, R, θ_Dec), ∀x' ∈ N_x;
Output: Embeddings E, R and parameters θ_Enc, θ_Dec.

C.2 COMPLEXITY ANALYSIS ON TRANSFORMATIONS

According to the transformation design, clique-based transformations introduce pairwise edges for relatedness, while star-based ones (the plain transformation and the equivalent transformation) rely on additional mediator entities. Therefore, clique-based transformations keep the node complexity at O(n_e), while star-based ones build O(n_e + N_qua) nodes. On the other hand, the plain transformation keeps the same edge complexity as the original HKG structure, while a relational edge between subject and object entities is added in the equivalent transformation for semantic difference, bringing a complexity increase of O(N_qua). Compared with the relation complexity of about O(n^pri_r + n^qua_r) in the original HKG, the equivalent transformation introduces O(n^pri_r) relations to distinguish links between subject and object entities, which is acceptable in practice.
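The training procedure of Algorithm 2 can be sketched as a plain Python skeleton. All callables here (`encoder`, `decoder`, `transform_to_kg`, `sample_negatives`, `loss_fn`, `step`) are placeholders for the components named in the algorithm, not the paper's actual implementation:

```python
import random

def train_transeq(hkg_facts, encoder, decoder, transform_to_kg,
                  sample_negatives, loss_fn, step, n_iter=1, batch_size=2):
    """Hypothetical skeleton of Algorithm 2 (component names are placeholders)."""
    kg = transform_to_kg(hkg_facts)                  # Algorithm 1: HKG -> KG
    for _ in range(n_iter):
        batch = random.sample(hkg_facts, min(batch_size, len(hkg_facts)))
        ent, rel = encoder(kg)                       # GNN over the transformed KG
        loss = 0.0
        for fact in batch:                           # fact = (s, r, o, [(a_i, v_i), ...])
            pos = decoder(fact, ent, rel)            # plausibility of the true fact
            negs = [decoder(x, ent, rel) for x in sample_negatives(fact)]
            loss += loss_fn(pos, negs)
        step(loss)                                   # gradient update (abstracted)
    return loss
```

Note the key design point: the encoder runs once per mini-batch over the transformed KG, while the decoder scores each hyper-relational fact against its negative samples.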

C.3 PROOF OF INFORMATION PRESERVATION

To demonstrate the zero information loss of the equivalent transformation, Algorithm 3 presents the process that equivalently recovers the original HKG from the transformed KG. Note that N_{b_k} in line 3 is a subgraph, and attribute-value qualifiers can be extracted from the direct relational links to mediator b_k in line 5. Here we consider hyper-relational facts with at least one qualifier; triple facts can be directly added to the recovered HKG since they involve no mediator. Thus, Algorithm 1 and Algorithm 3 form an equivalent conversion between HKG and KG, i.e., the equivalent transformation preserves the complete information. In comparison, the plain transformation and the clique-based plain transformation only keep primary relations in the conversion, so the HKG cannot be recovered due to attribute loss. Besides, the clique-based semantic transformation inherits the structural information loss of clique expansion (Dong et al., 2020).

Algorithm 3: The algorithm for recovering the HKG from the KG produced by the equivalent transformation.
Input: Transformed KG G = (E, R, F);
Init: Recovered HKG G_H = (E_H, R_H, F_H) with E_H ← ∅, R_H ← ∅, F_H ← ∅;
1: Obtain the set of mediator entities E_med from E;
2: for b_k ∈ E_med do
3:     Find b_k's neighbor entities and their connecting relations from F, N_{b_k} = {(r_i, e_i)}_{i=1}^n;
4:     Extract (s, r, o) from N_{b_k} via motif-structure discovery;
5:     Extract {(a_i, v_i)}_{i=1}^{n-2} from the remaining part of N_{b_k}; // the rest of {(r_i, e_i)}_{i=1}^n corresponds to {(a_i, v_i)}_{i=1}^{n-2}
6:     E_H ← E_H ∪ {e_i}_{i=1}^n, R_H ← R_H ∪ {r} ∪ {a_i}_{i=1}^{n-2};
7:     F_H ← F_H ∪ {(s, r, o, {(a_i, v_i)}_{i=1}^{n-2})};
8: end
Output: Recovered HKG G_H = (E_H, R_H, F_H).
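The recovery idea of Algorithm 3 can be sketched on a toy encoding. The relation names `r_sub` and `r_obj` linking the mediator to subject and object are hypothetical simplifications of the paper's motif-structure discovery; the primary relation is read off the direct subject-object edge:

```python
def recover_facts(kg_triples, mediators):
    """Toy sketch of Algorithm 3. Assumes each mediator b is linked to the
    subject via the hypothetical relation 'r_sub', to the object via 'r_obj',
    and to each qualifier value via its attribute name."""
    facts = []
    for b in mediators:
        nbrs = [(r, t) for (h, r, t) in kg_triples if h == b]        # N_{b_k}
        s = next(t for r, t in nbrs if r == "r_sub")
        o = next(t for r, t in nbrs if r == "r_obj")
        # The direct s-o edge added by the equivalent transformation ("motif")
        # carries the primary relation.
        rel = next(r for (h, r, t) in kg_triples if h == s and t == o)
        quals = [(r, t) for r, t in nbrs if r not in ("r_sub", "r_obj")]
        facts.append((s, rel, o, quals))
    return facts

kg = [("b0", "r_sub", "Alan Turing"), ("b0", "r_obj", "Princeton"),
      ("b0", "degree", "PhD"), ("Alan Turing", "educated at", "Princeton")]
```

Running `recover_facts(kg, ["b0"])` on this toy KG yields the hyper-relational fact from Figure 1: (Alan Turing, educated at, Princeton, [(degree, PhD)]).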

C.4 PROOF OF FULL EXPRESSIVITY

Our proposed TransEQ first transforms a HKG to a KG, then develops a GNN-based encoder for representation encoding, and finally calculates plausibility scores with existing SFs from HKG modeling studies, using the entity and relation embeddings from the encoder. Meanwhile, several SFs, e.g., those of HypE (Fatemi et al., 2020), BoxE (Abboud et al., 2020) and RAM (Liu et al., 2021), have been proved fully expressive under some assignment of entity and relation embeddings in their original papers. Hence, with a fully expressive SF in the decoder, TransEQ is fully expressive if the output embeddings of the encoder can realize the assignment required by the SF, which is proved as follows.

Proof. For t ∈ E, r ∈ R, let h^0_t, h^0_r denote their initialized representations, and h^L_t, h^L_r the corresponding embeddings output by the encoder. We also denote by h^SF_t, h^SF_r the input embeddings required by the SF in the decoder. Then, with h^L_t = Enc(h^0_t, θ_Enc) and h^L_r = Enc(h^0_r, θ_Enc), we should prove that h^L_t = h^SF_t and h^L_r = h^SF_r can be achieved with an appropriate choice of encoder parameters θ_Enc and initialized embeddings h^0_t, h^0_r. Taking R-GCN (Schlichtkrull et al., 2018) as the encoder, the message passing process of each GCN layer can be written as

h^{l+1}_t = σ( ∑_{r∈R} ∑_{u∈N^r_t} (1/|N^r_t|) W^l_r h^l_u + W^l_0 h^l_t ),

where σ denotes a nonlinear activation function like ReLU, which is unnecessary and can be removed (Wu et al., 2019). Now, we describe a feasible assignment of encoder parameters: for each layer l ∈ {1, ..., L} and r ∈ R, the relation-specific matrix W^l_r is set to the zero matrix, while W^l_0 is set to the identity matrix, where both W^l_r and W^l_0 belong to the encoder parameters θ_Enc. Under this assignment, we have h^{l+1}_t = h^l_t, i.e., h^L_t = h^0_t. Hence, we can set the values of h^0_t according to h^SF_t. In R-GCN, h^L_r is directly initialized and can be set to h^SF_r.
Overall, the encoder's output embeddings follow the embedding assignment required by the SF under the above assignment of θ_Enc, h^0_t and h^0_r. Thus, the expressivity of TransEQ is in accord with that of the SF it uses in the decoder. Finally, we note that the proof can be trivially extended to other GNN-based encoders like CompGCN (Vashishth et al., 2019) by introducing extra assignments on encoder parameters. □

D EXPERIMENT DETAILS

Here we provide more experiment details to support our claims. Moreover, we further perform experiments on five datasets to validate the robustness of our proposed TransEQ model.

D.1 DATASET DETAILS

We detail the dataset statistics in Table 6. To validate robustness, we further consider the recently developed dataset WD50K and its variant WD50K(100) (Galkin et al., 2020), where all facts contain qualifiers, i.e., there are no simple triple facts therein. We also consider four datasets, WikiPeople-3, JF17K-3, WikiPeople-4 and JF17K-4, developed from (Liu et al., 2020), where facts have a fixed number of qualifiers in accord with the dataset name. Table 7 presents their statistics.

D.2 IMPLEMENTATION DETAILS

Following prior studies (Abboud et al., 2020; Fatemi et al., 2020; Galkin et al., 2020; Wang et al., 2021; Yu & Yang, 2021), the batch size, learning rate and dropout are chosen from {64, 128}, {0.0001, 0.0005, 0.001, 0.005} and [0.1, 0.5] with step 0.1, respectively. Besides, we mainly adopt CompGCN (Vashishth et al., 2019) as the encoder and m-DistMult (Fatemi et al., 2020) as the decoder. For the encoder part, the number of GNN layers and the sharing ratio α are chosen from {1, 2, 3, 4} and [0.0, 1.0] with step 0.2, respectively. The composition operation in the encoder is set to the rotate function (Sun et al., 2019). We tune hyperparameters over the validation set with an early stopping strategy. All experiments are run on an RTX 2080 Ti GPU.

D.3 EFFICIENCY COMPARISON

Furthermore, we compare the learning processes of TransEQ with structural modeling approaches on three datasets in Figure 7. The learning curve of HypE with linear time complexity is also plotted for comparison. It can be observed that TransEQ achieves a convergence speed similar to HypE in practice, which owes to the multilinear product based SF (Liu et al., 2021) and an efficient implementation. With a similar form of SF adopted, G-MPNN achieves a close convergence rate but inferior performance, which demonstrates the strength of the GNN-based encoder compared with HGNN. As for StarE with a Transformer-based SF, its tremendous number of parameters leads to time-consuming training on all datasets.
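The hyperparameter search space described above can be enumerated exhaustively; a small sketch (the setting names are illustrative, not the paper's configuration keys):

```python
from itertools import product

# Hypothetical names for the search dimensions described in the text.
grid = {
    "batch_size": [64, 128],
    "lr": [0.0001, 0.0005, 0.001, 0.005],
    "dropout": [round(0.1 * i, 1) for i in range(1, 6)],   # 0.1 .. 0.5, step 0.1
    "gnn_layers": [1, 2, 3, 4],
    "alpha": [round(0.2 * i, 1) for i in range(6)],        # 0.0 .. 1.0, step 0.2
}
configs = [dict(zip(grid, vals)) for vals in product(*grid.values())]
print(len(configs))   # 2 * 4 * 5 * 4 * 6 = 960 candidate configurations
```

In practice, early stopping on the validation set (as stated above) prunes this grid rather than evaluating all 960 combinations to convergence.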

D.4 TRANSFORMATION COMPARISON

To analyze the effects of various transformations, we present the performance comparison in Table 8. Due to space limitations, results on Hit@3 are omitted; they are consistent with the other metrics. As described in Section 3.1.1, our proposed equivalent transformation connects subject and object entities via a relational edge r to form the motif for semantic difference. Thus, we investigate the effectiveness of this operation by removing the edge from the transformation, referred to as the w/o distinction transformation. From the table, we observe that the equivalent transformation outperforms the other variants, which is in accord with the information loss analysis in Section 3.2, i.e., only the equivalent transformation preserves complete information. Moreover, removing the relational edge from the equivalent transformation leads to a 7% Hit@1 performance drop on JF17K, which demonstrates the effectiveness and necessity of considering semantic difference in the transformation. Besides, since any two entities are connected in the two clique-based transformations, the relatedness between entities is largely captured and thus they obtain close performance, i.e., the clique structure makes these transformations insensitive to semantic information. In comparison, the star structure in the plain and equivalent transformations is quite simple, so additional information, including hyper-relational semantics and semantic difference, should be incorporated in the transformation, which also accounts for the obvious gap between these two transformations. Considering the zero information loss and experimental performance, the equivalent transformation is the best choice for TransEQ in HKG modeling.

D.5 ENCODER-DECODER CHOICE COMPARISON

We also compare the performance of different encoder-decoder choices of TransEQ models on FB-AUTO in Table 9 . These results further validate the observations in Section 4.3.

D.6 ADDITIONAL HKG COMPLETION RESULTS

Since former studies (Galkin et al., 2020; Yu & Yang, 2021) on WD50K and WD50K(100) only predict missing entities in the primary triple, which is not comparable to the HKG completion task in Section 4.1, we select competitive baselines from Table 2 and report their performance on these datasets, as shown in Table 10. Here we evaluate TransEQ models with m-DistMult and Transformer as decoders, denoted by TransEQ-DM and TransEQ-Trf, respectively. According to the table, the TransEQ model with the Transformer-based decoder generally performs well on both datasets, which again demonstrates the effectiveness of the model design. In Table 11, with m-DistMult and HypE(HP) as decoders, we further investigate TransEQ's performance on HKG datasets with a fixed number of qualifiers, compared with HINGE (Rosso et al., 2020),



⟨h_1, h_2, ..., h_n⟩ = ∑_i h_1[i] h_2[i] ⋯ h_n[i]

The expansion strategy is named according to its graph illustration.

Note that here we simplify the expression of the encoder module, which is still in accord with the form in Algorithm 2.
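The multilinear product defined above can be sketched directly in NumPy:

```python
import numpy as np

def multilinear(*vectors):
    # <h1, ..., hn> = sum_i h1[i] * h2[i] * ... * hn[i]
    prod = np.ones_like(vectors[0], dtype=float)
    for v in vectors:
        prod = prod * v
    return float(prod.sum())

# <(1,2), (3,4), (5,6)> = 1*3*5 + 2*4*6 = 63
assert multilinear(np.array([1, 2]), np.array([3, 4]), np.array([5, 6])) == 63.0
```

With exactly three arguments (relation, subject, object) this reduces to the familiar DistMult score, which is why the generalization to n arguments is called m-DistMult.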



Figure 1: An example of a HKG including primary triples and attribute-value qualifiers. The entities/relation in the triple are called primary entities/relation, and the attributes/values in qualifiers are called qualifier relations/entities.

Figure 2: The architecture of our proposed TransEQ model for HKG modeling.

Figure 3: The illustration of the equivalent transformation.

is the plausibility score measured by TransEQ, ⟨•⟩ denotes the multilinear product, and L denotes the number of GNN layers in the encoder part. Since m-DistMult adopts a composed relation for its SF, we introduce a function φ, such as mean/sum pooling, to aggregate the embeddings of the involved primary relation and attributes into the composed relation embedding. Note that the semantic difference in HKG is also modeled by the SF in the decoder. Various SFs are further investigated in the experiments later.
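The decoder described above, an m-DistMult score over a composed relation, can be sketched as follows; the choice of mean pooling for φ is one of the options the text allows:

```python
import numpy as np

def m_distmult(h_r, h_attrs, h_s, h_o, h_vals,
               phi=lambda m: np.mean(m, axis=0)):
    """Sketch of the decoder described above: phi (here mean pooling) composes
    the primary relation and attribute embeddings into one relation embedding,
    then a multilinear product scores the hyper-relational fact."""
    h_comp = phi([h_r] + list(h_attrs))        # composed relation embedding
    prod = h_comp * h_s * h_o                  # <h_comp, h_s, h_o, h_v1, ...>
    for v in h_vals:
        prod = prod * v
    return float(prod.sum())
```

For a triple fact without qualifiers (`h_attrs = h_vals = []`), `phi` returns the primary relation embedding unchanged and the score reduces to plain DistMult.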

Figure 4: (a) The visualization of mediator entity embeddings with top ten primary relations in WikiPeople via t-SNE; The effects of share embedding ratio α on (b) JF17K and (c) FB-AUTO.

Figure 5: The illustration of star expansion on a hyperedge.

Figure 5 presents an example of a hyperedge with four nodes and its transformed graph by star expansion. Specifically, for a hyperedge, star expansion introduces a mediator node (like the blank node in the center of Figure 5(b)), which is then connected with all original nodes in the hyperedge. With this elegant transformation, hyperedge expansion has been widely applied in recommender systems (Xia et al., 2021), link prediction (Sun et al., 2021), etc.

Figure 7: Comparison on clock time of model training vs. testing MRR.

Table 1: A comparison of representative HKG modeling studies. n_e, n_r and n^pri_r denote the numbers of entities, relations and primary relations. d is the embedding dimension. n_a is the maximum number of attribute-value qualifiers per fact, and N = |F_H| is the total number of facts in the HKG. Neural: neural network based SF; Multilinear: multilinear product based SF.

To distinguish our proposed TransEQ model design, in Table 1 we present a comparison of HKG modeling studies in terms of structural modeling, semantic modeling, full expressivity, and time and space complexity. According to the table, structural information is rarely explored in existing studies; the HGNN in G-MPNN is at an early stage and thus fails to model attribute semantics. StarE only captures triple-based connections (Yu & Yang, 2021), while TransEQ combines the equivalent transformation with GNN for structural information. As for semantic modeling, TransEQ not only captures the semantic difference between the primary triple and attribute-value qualifiers, but also applies to arbitrary SFs. Compared with the weak expressive power of most studies, the flexible choice of SF guarantees the full expressivity of TransEQ to model various HKGs, and brings performance improvement. Besides, the message passing mechanism for structural information leads to a time complexity of O(N d^2), and the Transformer module in StarE brings an additional complexity of O(n_a d^2). Since the equivalent transformation introduces a mediator entity for each hyper-relational fact, TransEQ has a space complexity of O(n_e d + n_r d + N d) with the original entities and relations considered. Owing to parallel implementation and GPU acceleration, TransEQ obtains efficiency comparable to the fastest existing studies in our experiments. In this way, TransEQ achieves efficient and expressive HKG modeling with both structural and semantic information captured.

Performance comparison of different encoder-decoder choices on JF17K. "OOM" indicates out of memory. Best results for each encoder are highlighted in bold.

3 The TransEQ Model
3.1.1 One Stone: Equivalent Transformation
3.1.2 Two Birds: Encoder-Decoder Framework
3.2 Theoretical Understanding
C.2 Complexity Analysis on Transformations
C.3 Proof of Information Preservation
C.4 Proof of Full Expressivity
D.1 Dataset Details
D.2 Implementation Details
D.3 Efficiency Comparison


Scoring functions of representative HKG modeling studies (the SF column of Table 1):
m-DistMult (Fatemi et al., 2020): ⟨h_r, h_s, h_o, h_{v_1}, ..., h_{v_n}⟩
HypE (Fatemi et al., 2020): ⟨h_r, Conv(h_s), Conv(h_o), Conv(h_{v_1}), ..., Conv(h_{v_n})⟩
HINGE (Rosso et al., 2020): FCN(min_d([Conv([h_r; h_s; h_o]); Conv([h_r; h_s; h_o; h_{a_i}; h_{v_i}])]))
G-MPNN (Yadati, 2020): ⟨h_r, p_1, ..., p_{n+2}, h_s, h_o, h_{v_1}, ..., h_{v_n}⟩
StarE (Galkin et al., 2020): h_o^⊤ FCN(Mean(Trf(h_r, h_{a_1}, ..., h_{a_n}, h_s, h_{v_1}, ..., h_{v_n})))

The parameter complexity of different transformations, in terms of entities/nodes, relations and edges. n_e = |E| and n_r = |R| are the numbers of entities and relations in the HKG. n^pri_r and n^qua_r are the numbers of primary and qualifier relations, respectively. n_a is the maximum number of attribute-value qualifiers per fact. N_qua and N_pri are the numbers of hyper-relational facts with and without attribute-value qualifiers, such that N_pri + N_qua = |F|.

Dataset statistics.

Dataset statistics.

Performance comparison of different transformations. Best results are highlighted in bold.


NeuInfer (Guan et al., 2020), n-TuckER (Liu et al., 2020) and GETD (Liu et al., 2020). Note that n-TuckER and GETD can only handle datasets with a fixed number of qualifiers, on which they achieve competitive performance. According to the results, TransEQ models obtain state-of-the-art performance on most datasets, indicating the robustness and effectiveness of the proposed equivalent transformation as well as the generalized encoder-decoder framework. Following the settings in Section 4.4, in Figure 8 we compare the visualization results of independent embeddings (α = 0.0) and sharing embeddings (the best setting, α = 0.8), which further validates the effectiveness of TransEQ in capturing semantic relatedness for mediator entities.

