HYPERQUERY: A FRAMEWORK FOR HIGHER ORDER LINK PREDICTION

Abstract

Groups with complex set intersection relations are a natural way to model a wide array of data, from the formation of social groups to the complex protein interactions that form the basis of biological life. While graphs are a natural and well-studied way to represent complex networks, typical approaches to modeling group membership using graphs are lossy. Moreover, a simple graph-based approach cannot be used for prediction and classification over a collection of entities. Hypergraphs are a more natural way to represent such "higher order" relationships, but efforts to apply machine learning techniques to hypergraph-structured datasets have been limited thus far. In this paper, we address the problem of link prediction in both knowledge hypergraphs and simple hypergraphs, and develop a novel, simple, and effective optimization architecture to solve this task. Additionally, we study how integrating data from node-level labels can improve the results of our system. Our self-supervised approach achieves significant improvement over state-of-the-art results on several hyperedge prediction and knowledge hypergraph completion benchmarks.

1. INTRODUCTION

The demand for applying machine learning to graph-structured data has grown significantly over the past few years. While graphs can accurately model binary relations between entities, they are not a natural representation of n-ary relations. For example, a protein complex network cannot be represented by a graph, since a protein complex might form only in the presence of more than two proteins Giurgiu et al. (2019). In this paper, we set out to answer complex queries that go beyond simple graphs, i.e., queries that current graph learning algorithms cannot solve without major modifications. In particular, we study learning on hypergraphs. Hypergraphs generalize graphs to represent such n-ary relations. Formally, a hypergraph H is a tuple (V, E), where V is a set of nodes and E ⊆ 2^V is a set of nonempty subsets of V called hyperedges. Similarly, a knowledge hypergraph is a generalization of a knowledge graph in which relations may hold between any number of entities. Recent research shows that hypergraph models produce more accurate results even in problems in which graphs are used to represent n-ary relations Zhou et al. (2006).
In this paper, we aim to solve the task of hyperedge prediction on both simple and knowledge hypergraphs. Hyperedge prediction in simple hypergraphs is analogous to link prediction in graphs, and can be formally defined as follows: given a hypergraph H = (V, E) and a k-tuple of nodes (v_1, v_2, ..., v_k), predict whether this tuple forms a hyperedge or not. While link prediction in graphs is a well-studied problem, hyperedge prediction has not received adequate attention in spite of its many applications. For example, it can be used to predict new protein complexes, drug-drug interactions, and new collaborations in citation networks, and to discover new chemical reactions in metabolic networks Yadati et al. (2020); Giurgiu et al. (2019); Piñero et al. (2019).
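To make the definitions above concrete, the following is a minimal Python sketch, not the method proposed in this paper. It stores a simple hypergraph H = (V, E) as a set of nodes and a set of frozensets, phrases a hyperedge prediction query as a membership question over a k-tuple of nodes, and shows one way in which a pairwise graph fails to represent n-ary relations. All names (`is_valid_hypergraph`, `clique_expansion`, the toy node labels) are hypothetical and chosen only for illustration; the clique expansion is a common graph reduction used here as an assumed example.

```python
from itertools import combinations

# A toy simple hypergraph H = (V, E); node names are illustrative only.
V = {"a", "b", "c", "d"}
E = {frozenset({"a", "b", "c"}), frozenset({"c", "d"})}


def is_valid_hypergraph(nodes, hyperedges):
    """Check that every hyperedge is a nonempty subset of the node set."""
    return all(len(e) > 0 and e <= nodes for e in hyperedges)


def clique_expansion(hyperedges):
    """Reduce a hypergraph to a graph by connecting every pair of nodes
    that co-occur in some hyperedge (one common, lossy reduction)."""
    return {
        frozenset(pair)
        for e in hyperedges
        for pair in combinations(sorted(e), 2)
    }


# A hyperedge prediction query: does this k-tuple of nodes form a hyperedge?
# Here we only look up the observed answer; a model would have to predict it
# for tuples that are not yet observed.
query = ("a", "b", "c")
observed = frozenset(query) in E

# Two different hypergraphs: one three-way group versus three pairwise groups.
H1 = {frozenset({"a", "b", "c"})}
H2 = {frozenset({"a", "b"}), frozenset({"b", "c"}), frozenset({"a", "c"})}

# Both reduce to the same triangle, so the graph cannot tell them apart.
indistinguishable = clique_expansion(H1) == clique_expansion(H2)
```

In this sketch, `H1` and `H2` are distinct hypergraphs that collapse to the same pairwise graph, which is one illustration of why answering the query directly on the hypergraph can preserve information that a graph reduction discards.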
Hyperedge prediction is a more challenging problem than link prediction in graphs. Formulating it as a link prediction problem in graphs is a lossy operation, which reduces the accuracy of predictions Kirkland (2017); Tu et al. (2018). In knowledge hypergraphs, it is often necessary to predict not only new hyperedges but also their type. For example, in a protein-drug genomics knowledge hypergraph, it is important to predict



); Feng et al. (2020); Fatemi et al. (2019).

