HYPERQUERY: A FRAMEWORK FOR HIGHER ORDER LINK PREDICTION

Abstract

Groups with complex set intersection relations are a natural way to model a wide array of data, from the formation of social groups to the complex protein interactions that form the basis of biological life. While graphs are a natural way to represent complex networks and are well studied, typical approaches to modeling group membership using graphs are lossy. Moreover, a simple graph-based approach cannot be used for prediction and classification over a collection of entities. Hypergraphs are a more natural way to represent such "higher order" relationships, but efforts to apply machine learning techniques to hypergraph-structured datasets have been limited thus far. In this paper, we address the problem of link prediction in knowledge hypergraphs as well as simple hypergraphs and develop a novel, simple, and effective optimization architecture to solve this task. Additionally, we study how integrating data from node-level labels can improve the results of our system. Our self-supervised approach achieves significant improvement over state-of-the-art results on several hyperedge prediction and knowledge hypergraph completion benchmarks.

1. INTRODUCTION

There has been significant demand for applying learning to graph-structured data over the past couple of years. While graphs can accurately model binary relations between entities, they are not a natural representation of n-ary relations between entities. For example, a protein complex network cannot be represented by a graph, since a protein complex might form only in the presence of more than two proteins Giurgiu et al. (2019). In this paper, we set out to answer complex queries that go beyond a simple graph, i.e., queries that current graph learning algorithms cannot solve without major modifications. In particular, we study learning on hypergraphs. Hypergraphs are a generalization of graphs for representing such n-ary relations. Formally, a hypergraph H is a tuple (V, E), where V is a set of nodes and E ⊆ 2^V is a set of nonempty subsets of V called hyperedges. Similarly, a knowledge hypergraph is a generalization of a knowledge graph in which relations hold between any number of entities. Recent research shows that hypergraph models produce more accurate results even in problems in which graphs are used to represent n-ary relations Zhou et al. (2006); Feng et al. (2020); Fatemi et al. (2019).

In this paper, we aim to solve the task of hyperedge prediction on both simple and knowledge hypergraphs. Hyperedge prediction in simple hypergraphs is analogous to link prediction in graphs and can be formally defined as follows: given a hypergraph H = (V, E) and a k-tuple of nodes (v_1, v_2, ..., v_k), predict whether this tuple forms a hyperedge. While link prediction in graphs is a well-studied problem, hyperedge prediction has not received adequate attention in spite of its many applications. For example, it can be used to predict new protein complexes, drug-drug interactions, and new collaborations in citation networks, to discover new chemical reactions in metabolic networks, etc. Yadati et al. (2020); Giurgiu et al. (2019); Piñero et al. (2019). Hyperedge prediction is a more challenging problem than link prediction in graphs. Formulating this problem as a link prediction problem in graphs is a lossy operation, reducing the accuracy of predictions Kirkland (2017); Tu et al. (2018).

In knowledge hypergraphs, it is often necessary to predict not only new hyperedges but also their type. For example, in a protein-drug genomics knowledge hypergraph, it is important to predict not only drug-drug interactions but also the type of these interactions, which describes the side effects Zitnik et al. (2018). This generalized hyperedge prediction problem can be described formally as follows: given a knowledge hypergraph KH = (E, R) and a tuple of entities (e_1, e_2, ..., e_k), we want to predict whether this tuple forms a hyperedge and, if so, what its type is. This problem setting is illustrated in figure 1.

This paper makes the following contributions.
• HyperQuery: We describe HyperQuery, a neural message passing based framework that is designed to find embeddings of hyperedges in a semi-supervised fashion.
• Novel feature extraction: We use clustering to extract global features of nodes and hyperedges.
• Higher order link prediction: We solve hyperedge prediction on simple hypergraphs as well as knowledge hypergraphs.
The pipeline of our system is shown in figure 4.

2. RELATED WORK

Link prediction in graphs: There are two ways to address the link prediction problem in networks. One approach is to learn an embedding of the nodes of a graph and then apply a function to these embeddings to obtain the embedding of an edge. We call this an indirect approach. For example, node2vec Grover & Leskovec (2016) and DeepWalk Perozzi et al. (2014) are random-walk based approaches, and Vashishth et al. (2019); Davidson et al. (2018) are GNN-based approaches for graphs. They learn the embedding of the nodes and then use a binary operator such as average to compute the embedding of an edge; i.e., given two nodes u and v, they apply a binary operator ∘ to generate an embedding g(u, v), where g : V × V → R^d and d is the dimension of the embedding. Direct approaches in graphs learn the embedding of edges directly and apply that to the link prediction problem. Examples of this approach include path-based approaches such as Zhu et al. (2021); Sadeghian et al. (2019). One popular task in knowledge graphs is to predict missing relations between entities. This task is called knowledge graph completion, and one could think of it as a generalization of link prediction in graphs. The problem of knowledge graph completion has been studied extensively, for example in Zhu et al. (2021); Wang et al. (2020); Rossi et al. (2022).

Hyperedge prediction: Link prediction in hypergraphs can also be done indirectly by first computing the embedding of the nodes of the hypergraph and then applying a function g to the tuple (v_1, v_2, ..., v_k) to obtain the embedding of the tuple, i.e., the hyperedge. The difficulty with such models is that, for hypergraphs, the function g must be nonlinear to capture higher-order proximity of nodes in the hypergraph Tu et al. (2018). This means it is not a good design choice to use an operator such as average to obtain the embedding of a hyperedge from its nodes. The related works HyperSAGNN and NHP have used nonlinear functions such as graph neural networks Zhang et al. (2020); Yadati et al. (2020) to compute the embedding of a hyperedge from its nodes. Directly learning the embedding of

Figure 1: The HyperQuery Inference Problem: In this paper, we prefer to work with the star-expansion representation of a hypergraph (left). We are interested in studying hypergraphs and knowledge hypergraphs with categorical labels (types) stored on the edges (center). Our primary objective is to create a system that predicts the existence of a hyperedge and its type.
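As a minimal sketch of the star-expansion representation mentioned in the Figure 1 caption, and of why reducing a hypergraph to an ordinary graph can be lossy, the Python snippet below builds two expansions of a toy hypergraph. The function names and the toy data are illustrative assumptions and are not part of HyperQuery. The clique expansion is used here only as one common graph reduction; the text does not state which reduction it has in mind.

```python
from itertools import combinations


def star_expansion(hyperedges):
    """Bipartite edge set {(node, hyperedge_id)}: one extra vertex per hyperedge."""
    return {(v, f"e{i}") for i, e in enumerate(hyperedges) for v in e}


def clique_expansion(hyperedges):
    """Ordinary graph edge set: every pair of nodes sharing a hyperedge is joined."""
    return {frozenset(p) for e in hyperedges for p in combinations(sorted(e), 2)}


# Two different hypergraphs on the same three nodes (hypothetical toy data).
one_ternary_relation = [{1, 2, 3}]
three_binary_relations = [{1, 2}, {2, 3}, {1, 3}]

# The clique expansion cannot tell the two hypergraphs apart ...
same_clique = clique_expansion(one_ternary_relation) == clique_expansion(three_binary_relations)
# ... while the star expansion keeps them distinct.
same_star = star_expansion(one_ternary_relation) == star_expansion(three_binary_relations)

print(same_clique, same_star)  # True False
```

On this toy input the two hypergraphs are indistinguishable after clique expansion, while their star expansions differ, because the star expansion records which nodes belong to the same hyperedge.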

