OLLIVIER-RICCI CURVATURE FOR HYPERGRAPHS: A UNIFIED FRAMEWORK

Abstract

Bridging geometry and topology, curvature is a powerful and expressive invariant. While the utility of curvature has been theoretically and empirically confirmed in the context of manifolds and graphs, its generalization to the emerging domain of hypergraphs has remained largely unexplored. On graphs, the Ollivier-Ricci curvature measures differences between random walks via Wasserstein distances, thus grounding a geometric concept in ideas from probability theory and optimal transport. We develop ORCHID, a flexible framework generalizing Ollivier-Ricci curvature to hypergraphs, and prove that the resulting curvatures have favorable theoretical properties. Through extensive experiments on synthetic and real-world hypergraphs from different domains, we demonstrate that ORCHID curvatures are both scalable and useful to perform a variety of hypergraph tasks in practice.

which facilitates its generalization; we will also use $\kappa(e)$ for (hyper)edges as a shorthand notation for Eq. (4). When defining probability measures and AGG functions on hypergraphs, we would like to retain as much flexibility as possible while also ensuring the following conditions:
I. Mathematical generalization. For graphs, AGG simplifies to the original ORC on graphs.
II. Permutation invariance. $\mathrm{AGG}(e) = \mathrm{AGG}(\sigma(e))$ for edges $e$ and all node index permutations $\sigma$.
III. Scalability. The probability measures and AGG functions should be efficiently computable.
Beyond these properties, we would also like to have the following interpretability features to ascertain that a hypergraph curvature measure is a conceptual generalization of ORC:
A. Probabilistic intuition. The probability measures assigned to nodes should correspond to a semantically sensible random walk on the hypergraph.
B. Optimal transport intuition. The generalization of the distance metric (AGG) should have a semantically sensible interpretation in terms of optimal transport.
C. Geometric intuition. Edges in hypercliques should have positive curvature, edges in hypergrids should have curvature zero, and edges in hypertrees should have negative curvature.
We now specify probability measures and AGG functions for which the conditions above hold.

1. INTRODUCTION

Hypergraphs generalize graphs by allowing any number of nodes to participate in an edge. They enable us to faithfully represent complex relations, such as co-authorship of scientific papers, multilateral interactions between chemicals, or group conversations, which cannot be adequately captured by graphs. While hypergraphs are more expressive than graphs and other relational objects like simplicial complexes, they are harder to analyze both theoretically and empirically, and many concepts that have proven useful for understanding graphs have yet to be transferred to the hypergraph setting. Curvature has established itself as a powerful characteristic of Riemannian manifolds, as it permits the description of global properties through local measurements by harmonizing ideas from geometry and topology. For graphs, graph curvature measures to what extent the neighborhood of an edge deviates from certain idealized model spaces, such as cliques, grids, or trees. It has proven helpful, for example, in assessing differences between real-world networks (Samal et al., 2018), identifying bottlenecks in real-world networks (Gosztolai & Arnaudon, 2021), and alleviating oversquashing in graph neural networks (Topping et al., 2022). One prominent notion of graph curvature is Ollivier-Ricci curvature (ORC). ORC compares random walks based at specific nodes, revealing differences in the information diffusion behavior in the graph. As the sizes of edges and edge intersections can vary in hypergraphs, there are many ways to generalize ORC to hypergraphs. While some notions of hypergraph ORC have been previously studied in isolation (e.g., Asoodeh et al., 2018; Eidi & Jost, 2020; Leal et al., 2020), a unified framework for their definition and computation is still lacking.

Contributions. We introduce ORCHID, a unified framework for Ollivier-Ricci curvature on hypergraphs. ORCHID integrates and generalizes existing approaches to hypergraph ORC. Our work is the first to identify the individual building blocks shared by all notions of hypergraph ORC, and to perform a rigorous theoretical and empirical analysis of the resulting curvature formulations. We develop hypergraph ORC notions that are aligned with our geometric intuition while still efficient to compute, and we demonstrate the utility of these notions in practice through extensive experiments.

Structure. After providing the necessary background on graphs and hypergraphs and recalling the definition of Ollivier-Ricci curvature for graphs in Section 2, we introduce ORCHID, our framework for hypergraph ORC, and analyze the theoretical properties of ORCHID curvatures in Section 3. We assess the empirical properties and practical utility of ORCHID curvatures through extensive experiments in Section 4, and discuss limitations and potential extensions of ORCHID as well as directions for future work in Section 5. Further materials are provided in Appendices A.1 to A.5.

2. PRELIMINARIES

Graphs and Hypergraphs. A simple graph $G = (V, E)$ is a tuple containing $n$ nodes (vertices) $V = \{v_1, \dots, v_n\}$ and $m$ edges $E = \{e_1, \dots, e_m\}$, with $e_i \in \binom{V}{2}$ for all $i \in [m]$. Here, for a set $S$ and a positive integer $k \le |S|$, $\binom{S}{k}$ denotes the set of all $k$-element subsets of $S$, and for $x \in \mathbb{N}$ with $0 \notin \mathbb{N}$, $[x] = \{i \in \mathbb{N} \mid i \le x\}$. In multi-graphs, edges can occur multiple times, and hence, $E = (e_1, \dots, e_m)$ is an indexed family of sets, with $e_i \in \binom{V}{2}$ for all $i \in [m]$. Generalizing simple graphs, a simple hypergraph $H = (V, E)$ is a tuple containing $n$ nodes $V$ and $m$ hyperedges $E \subseteq \mathcal{P}(V) \setminus \{\emptyset\}$, i.e., in contrast to edges, hyperedges can have any cardinality $r \in [n]$. In a multi-hypergraph, $E = (e_1, \dots, e_m)$ is an indexed family of sets, with $e_i \subseteq V$ for all $i \in [m]$. We assume that all our hypergraphs are multi-hypergraphs, and we drop the prefix hyper from hypergraph and hyperedge where it is clear from context. We denote the degree of node $i$, i.e., the number of edges containing $i$, by $\deg(i) = |\{e \in E \mid i \in e\}|$, write $i \sim j$ if $i$ is adjacent to $j$ (i.e., there exists $e \in E$ such that $\{i, j\} \subseteq e$), and use $\mathcal{N}(i)$ ($\mathcal{N}(e)$) for the neighborhood of $i$ ($e$), i.e., the set of nodes adjacent to $i$ (edges intersecting edge $e$). While $\deg(i) = |\mathcal{N}(i)|$ in simple graphs and $\deg(i) \ge |\mathcal{N}(i)|$ in multi-graphs, these relations do not generally hold for hypergraphs. Two nodes $i \ne j$ are connected in $H$ if there is a sequence of nodes $i = v_1, v_2, \dots, v_{k-1}, v_k = j$ such that $v_l \sim v_{l+1}$ for all $l \in [k-1]$. Every such sequence is a path in $H$, whose length is the cardinality of the set of edges used in the adjacency relation. We refer to the length of a shortest path connecting nodes $i, j$ as the distance between them, denoted as $d(i, j)$. We assume that all (hyper)graphs are connected, i.e., there exists a path between all pairs of nodes. This turns $H$ into a metric space $(H, d)$ with diameter $\mathrm{diam}(H) := \max\{d(i, j) \mid i, j \in V\}$.
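The basic quantities defined so far can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the representation of a multi-hypergraph as a list of node sets and the function names `degree`, `neighbors`, `distances_from`, and `diameter` are assumptions made for this example. The toy hypergraph shows that $\deg(i)$ can be larger or smaller than $|\mathcal{N}(i)|$.

```python
from collections import deque

def degree(edges, i):
    """Number of (multi-)hyperedges containing node i."""
    return sum(1 for e in edges if i in e)

def neighbors(edges, i):
    """Nodes sharing at least one hyperedge with i (i itself excluded)."""
    return {j for e in edges if i in e for j in e if j != i}

def distances_from(edges, source):
    """Hop distances from source via breadth-first search over i ~ j."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in neighbors(edges, u):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(edges):
    """Largest pairwise distance; assumes the hypergraph is connected."""
    nodes = set().union(*edges)
    return max(max(distances_from(edges, i).values()) for i in nodes)

# Hypothetical toy multi-hypergraph: the edge {0, 1} occurs twice.
edges = [{0, 1, 2}, {0, 1}, {0, 1}, {2, 3}]
print(degree(edges, 0), len(neighbors(edges, 0)))  # deg(0) exceeds |N(0)|
print(degree(edges, 2), len(neighbors(edges, 2)))  # deg(2) is below |N(2)|
print(diameter(edges))
```

The list-of-sets representation keeps repeated edges, which matches the multi-hypergraph convention adopted above.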
(Hyper)graphs in which all nodes have the same degree $k$ ($\deg(i) = k$ for all $i \in V$) are called $k$-regular. Three properties of hypergraphs that distinguish them from graphs give rise to additional (ir)regularities. First, hyperedges can vary in cardinality, and a hypergraph in which all hyperedges have the same cardinality $r$ ($|e| = r$ for all $e \in E$) is called $r$-uniform. Second, hyperedge intersections can have cardinality greater than 1, and we call a hypergraph $s$-intersecting if all nonempty edge intersections have the same cardinality $s$ ($e \cap f \ne \emptyset \Leftrightarrow |e \cap f| = s$ for all $e, f \in E$). Third, nodes can cooccur in any number of hyperedges; we call a hypergraph $c$-cooccurrent if each node cooccurs $c$ times with any of its neighbors ($i \sim j \Leftrightarrow |\{e \in E \mid \{i, j\} \subseteq e\}| = c$ for all $i, j \in V$). Using this terminology, simple graphs are 2-uniform, 1-intersecting, 1-cooccurrent hypergraphs. Given a hypergraph $H = (V, E)$, the unweighted clique expansion of $H$ is $G^{\circ} = (V, E^{\circ})$ with $E^{\circ} = \{\{i, j\} \mid \{i, j\} \subseteq e \text{ for some } e \in E\}$, where two nodes are adjacent in $G^{\circ}$ if and only if they are adjacent in $H$. The weighted clique expansion of $H$ is $G^{\circ}$ endowed with a weighting function $w \colon E^{\circ} \to \mathbb{N}$, where $w(\{i, j\}) = |\{e \in E \mid \{i, j\} \subseteq e\}|$ for each $\{i, j\} \in E^{\circ}$, i.e., an edge $\{i, j\}$ is weighted by how often $i$ and $j$ cooccur in edges from $H$. Both of these transformations are lossy, i.e., we cannot uniquely reconstruct $H$ from $G^{\circ}$. The unweighted star expansion of $H$ is the bipartite graph $G' = (V', E')$ with $V' = V \cup E$ and $E' = \{\{i, e\} \mid i \in V, e \in E, i \in e\}$, and we can uniquely reconstruct $H$ from $G'$ if we know which of its parts corresponds to the original node set of $H$.

Ollivier-Ricci Curvature for Graphs. Ollivier-Ricci curvature (ORC) extends the notion of Ricci curvature, defined for Riemannian manifolds, to metric spaces equipped with a probability measure or, equivalently, a random walk (Ollivier, 2007; 2009).
On graphs, which are metric spaces with the shortest-path distance $d(\cdot, \cdot)$, the ORC $\kappa$ of a pair of nodes $\{i, j\}$ is defined as
$$\kappa(i, j) := 1 - \frac{1}{d(i, j)} W_1(\mu_i, \mu_j), \quad \text{and hence,} \quad \kappa(i, j) = 1 - W_1(\mu_i, \mu_j) \ \text{ if } i \sim j, \tag{1}$$
where $\mu_i$ is a probability measure associated with node $i$ that depends measurably on $i$ and has finite first moment, and $W_1$ is the Wasserstein distance of order 1, which captures the amount of work needed to transport the probability mass from $\mu_i$ to $\mu_j$ in an optimal coupling. The use of the shortest-path distance is necessary to ensure that ORC is also well-defined for pairs of non-adjacent nodes. This definition on edges or pairs of nodes alludes to the fact that Ricci curvature is associated to tangent vectors of a manifold. A common strategy to measure curvature at a node $i$ is to average over the curvatures of all edges incident with $i$ (Banerjee, 2021; Jost & Liu, 2014), i.e.,
$$\kappa(i) = \frac{1}{\deg(i)} \sum_{\{i, j\} \in E} \kappa(i, j). \tag{2}$$
A popular probability measure that easily generalizes to weighted graphs and multi-graphs is
$$\mu_i^{\alpha}(j) := \begin{cases} \alpha & j = i \\ (1 - \alpha) \frac{1}{\deg(i)} & i \sim j \\ 0 & \text{otherwise,} \end{cases} \tag{3}$$
where $\alpha$ serves as a smoothing parameter (Lin et al., 2011). With this definition, stacking the probability measures yields the transition matrix of an $\alpha$-lazy random walk.
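To make Eqs. (1) and (3) concrete, the following sketch computes the ORC of a node pair in a small unweighted graph using exact rational arithmetic. It is an illustration under stated assumptions, not the paper's implementation: the adjacency-dict representation and the names `wasserstein1`, `shortest_paths`, `lazy_measure`, and `orc` are hypothetical, and the Wasserstein distance is obtained with a plain successive-shortest-path transport solver that is only practical for tiny inputs.

```python
from collections import deque
from fractions import Fraction

def wasserstein1(mu, nu, dist):
    """Exact W1 between finitely supported measures (dicts node -> mass).

    Because dist is a metric, mass shared by mu and nu can stay in place, so
    only the surplus of mu has to be shipped to the deficit of mu.
    """
    excess = {x: Fraction(mu.get(x, 0)) - Fraction(nu.get(x, 0))
              for x in set(mu) | set(nu)}
    supply = {x: a for x, a in excess.items() if a > 0}
    demand = {y: -a for y, a in excess.items() if a < 0}
    flow = {(x, y): Fraction(0) for x in supply for y in demand}
    while any(a > 0 for a in supply.values()):
        # Bellman-Ford on the residual graph, started at all open sources.
        best = {x: (Fraction(0), None) for x, a in supply.items() if a > 0}
        for _ in range(len(supply) + len(demand)):
            changed = False
            for (x, y), f in flow.items():
                if x in best and (y not in best
                                  or best[x][0] + dist(x, y) < best[y][0]):
                    best[y] = (best[x][0] + dist(x, y), x)
                    changed = True
                if f > 0 and y in best and (x not in best
                                            or best[y][0] - dist(x, y) < best[x][0]):
                    best[x] = (best[y][0] - dist(x, y), y)
                    changed = True
            if not changed:
                break
        sink = min((y for y, b in demand.items() if b > 0),
                   key=lambda y: best[y][0])
        path, node = [], sink
        while best[node][1] is not None:
            path.append((best[node][1], node))  # residual arc parent -> node
            node = best[node][1]
        # Arcs leaving a sink are backward arcs limited by the existing flow.
        amount = min([supply[node], demand[sink]]
                     + [flow[(v, u)] for u, v in path if u in demand])
        for u, v in path:
            if u in supply:
                flow[(u, v)] += amount
            else:
                flow[(v, u)] -= amount
        supply[node] -= amount
        demand[sink] -= amount
    return sum((f * dist(x, y) for (x, y), f in flow.items()), Fraction(0))

def shortest_paths(adj):
    """All-pairs hop distances of a connected graph via repeated BFS."""
    out = {}
    for s in adj:
        seen, queue = {s: 0}, deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        out[s] = seen
    return out

def lazy_measure(adj, i, alpha):
    """Alpha-lazy random walk measure of Eq. (3) on a simple graph."""
    mu = {j: (1 - alpha) / len(adj[i]) for j in adj[i]}
    mu[i] = alpha
    return mu

def orc(adj, i, j, alpha=Fraction(0)):
    """Ollivier-Ricci curvature of the node pair {i, j} as in Eq. (1)."""
    d = shortest_paths(adj)
    w = wasserstein1(lazy_measure(adj, i, alpha), lazy_measure(adj, j, alpha),
                     lambda x, y: d[x][y])
    return 1 - w / d[i][j]

k4 = {i: [j for j in range(4) if j != i] for i in range(4)}  # clique
c6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}       # 6-cycle
print(orc(k4, 0, 1), orc(c6, 0, 1))
```

With $\alpha = 0$, an edge of the 4-clique is positively curved and an edge of the 6-cycle is flat, in line with the model-space intuition from the introduction. For larger graphs, a dedicated optimal transport or linear programming solver would be the more reasonable choice.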

3. THEORY

Having introduced the concept of hypergraphs and the definition of Ollivier-Ricci curvature (ORC) for graphs, we now develop our framework for ORC on hypergraphs, called ORCHID (Ollivier-Ricci Curvature for Hypergraphs In Data). We focus our exposition on undirected, unweighted multi-hypergraphs, but ORCHID straightforwardly generalizes to other hypergraph variants.

3.1. OLLIVIER-RICCI CURVATURES FOR HYPERGRAPHS (ORCHID CURVATURES)

As mentioned in Section 2, hypergraphs differ from graphs in that edges can have any cardinality, and consequently, edges can intersect in more than one node, and nodes can co-occur in more than one edge. When generalizing ORC as defined in Section 2 to hypergraphs, these peculiarities become relevant in two places: (1) in the generalization of the measure $\mu$ for nodes, and (2) in the generalization of the distance metric $W_1$. Construing the distance metric as a function aggregating measures (AGG), with $\mathrm{AGG} \colon V^{+} \to \mathbb{R}$, we can rewrite Eq. (1) for pairs of nodes $\{i, j\}$ as
$$\kappa(i, j) = 1 - \frac{\mathrm{AGG}(\{i, j\})}{d(i, j)}. \tag{4}$$

Probability Measures ($\mu$). In graphs, the most natural probability measures are induced by the $\alpha$-lazy random walk given in Eq. (3): With probability $\alpha$, we stay at the current node $i$, and with probability $(1 - \alpha)/\deg(i)$, we move to each of its neighbors. There are at least three direct extensions of this formulation to hypergraphs that all retain this probabilistic intuition, thus fulfilling the requirement of Feature A. These extensions, illustrated in Fig. 1, differ only in how they distribute the $(1 - \alpha)$ probability mass in Eq. (3) from node $i$ to the nodes in $i$'s neighborhood. Given a hypergraph $H$, for $i$ and $j$ with $i \sim j$, first, we could define
$$\mu_i^{\mathrm{EN}}(j) := (1 - \alpha) \frac{1}{|\mathcal{N}(i)|},$$
by which we pick a neighbor $j$ of node $i$ uniformly at random. We call this the equal-nodes random walk (EN), which is a random walk on the unweighted clique expansion of $H$. Second, we could set
$$\mu_i^{\mathrm{EE}}(j) := (1 - \alpha) \frac{1}{\deg(i) - |\{e \ni i \mid |e| = 1\}|} \sum_{e \supseteq \{i, j\}} \frac{1}{|e| - 1},$$
which first picks an edge $e \ni i$ with $|e| \ge 2$, then picks a node $j \in e \setminus \{i\}$, both uniformly at random. We call this the equal-edges random walk (EE), which is a two-step random walk on the unweighted star expansion of $H$, starting at a node $i \in V$, and non-backtracking in the second step. It underlies the curvatures studied by Asoodeh et al. (2018) and Banerjee (2021). Third, we could define
$$\mu_i^{\mathrm{WE}}(j) := (1 - \alpha) \sum_{e \supseteq \{i, j\}} \frac{|e| - 1}{\sum_{f \ni i} (|f| - 1)} \cdot \frac{1}{|e| - 1} = (1 - \alpha) \frac{|\{e \in E \mid \{i, j\} \subseteq e\}|}{\sum_{f \ni i} (|f| - 1)},$$
first picking an edge $e$ incident with $i$ with probability proportional to $|e| - 1$, i.e., to the number of other nodes it contains, then picking a node $j \in e \setminus \{i\}$ uniformly at random. We call this the weighted-edges random walk (WE): a two-step random walk from a node $i \in V$ on a specific directed weighted star expansion of $H$ whose second step is non-backtracking, or equivalently, a random walk on a weighted clique expansion of $H$.
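The three random walks can be compared side by side on a small example. The sketch below is illustrative only: the list-of-sets hypergraph representation and the function names `mu_en`, `mu_ee`, and `mu_we` are assumptions, and it presumes that every node lies in at least one edge of cardinality two or more (as is the case in a connected hypergraph with more than one node).

```python
from fractions import Fraction
from itertools import combinations

def neighbors(edges, i):
    """Nodes sharing at least one hyperedge with i."""
    return {j for e in edges if i in e for j in e if j != i}

def mu_en(edges, i, alpha=0):
    """Equal-nodes walk: uniform over the neighbors of i."""
    alpha = Fraction(alpha)
    nbrs = neighbors(edges, i)
    mu = {j: (1 - alpha) / len(nbrs) for j in nbrs}
    mu[i] = alpha
    return mu

def mu_ee(edges, i, alpha=0):
    """Equal-edges walk: uniform non-singleton edge, then uniform other node."""
    alpha = Fraction(alpha)
    incident = [e for e in edges if i in e and len(e) >= 2]
    mu = {i: alpha}
    for e in incident:
        for j in e - {i}:
            mu[j] = mu.get(j, 0) + (1 - alpha) / (len(incident) * (len(e) - 1))
    return mu

def mu_we(edges, i, alpha=0):
    """Weighted-edges walk: edge chosen proportionally to |e| - 1."""
    alpha = Fraction(alpha)
    total = sum(len(e) - 1 for e in edges if i in e)
    mu = {i: alpha}
    for e in edges:
        if i in e:
            for j in e - {i}:
                mu[j] = mu.get(j, 0) + (1 - alpha) / total
    return mu

# Hypothetical irregular hypergraph: the three measures at node 0 differ.
edges = [{0, 1, 2}, {0, 3}, {0, 1}]
for measure in (mu_en, mu_ee, mu_we):
    print(measure.__name__, measure(edges, 0))

# On the hyperclique with n = 4 and r = 3, all three measures agree.
clique = [set(c) for c in combinations(range(4), 3)]
print(mu_en(clique, 0) == mu_ee(clique, 0) == mu_we(clique, 0))
```

Exact fractions make it easy to see where the measures diverge: node 1, which cooccurs with node 0 in two edges, receives more mass under EE and WE than under EN.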

Similarity Measures (AGG).

In the original formulation of ORC, i.e., Eq. (1), when determining the curvature of an edge $\{i, j\}$, the Wasserstein distance $W_1$ is used to aggregate the probability measures of $i$ and $j$. There are at least three different extensions of this aggregation scheme to hypergraphs that retain an optimal transport intuition, as required by Feature B. Leveraging that an edge $e \subseteq V$ is simply a set of nodes, the easiest extension is to leave the aggregation function unchanged. We continue determining the curvature for pairs of nodes, and account for the edges in $H$ only in the definition of our probability measure. In this case, we could derive a curvature for an edge $e$ as the average over all curvatures of node pairs contained in $e$, i.e., we could define AGG as
$$\mathrm{AGG}_A(e) := \frac{2}{|e|(|e| - 1)} \sum_{\{i, j\} \subseteq e} W_1(\mu_i, \mu_j).$$
This is equivalent to computing the curvature of $e$ based on the average over all $W_1$ distances of probability measures associated with nodes contained in $e$:
$$\kappa_A(e) := 1 - \mathrm{AGG}_A(e) = 1 - \frac{2}{|e|(|e| - 1)} \sum_{\{i, j\} \subseteq e} W_1(\mu_i, \mu_j) = \frac{2}{|e|(|e| - 1)} \sum_{\{i, j\} \subseteq e} \kappa(i, j). \tag{9}$$
Intuitively, this definition assesses the mean amount of work needed to transport the probability mass from one node in $e$ to another node in $e$. Alternatively, and still keeping with the intuition from optimal transport, we can define AGG as
$$\mathrm{AGG}_B(e) := \frac{1}{|e| - 1} \sum_{i \in e} W_1(\mu_i, \bar{\mu}), \quad \text{and consequently,} \quad \kappa_B(e) := 1 - \mathrm{AGG}_B(e),$$
where $\bar{\mu}$ denotes the Wasserstein barycenter of the probability measures of nodes contained in $e$, and the denominator generalizes the original $d(i, j)$. Asoodeh et al. (2018) use this aggregation function. Intuitively, $\mathrm{AGG}_B$ is proportional to the minimum amount of work needed to transport all probability mass from the probability measures of the nodes to one place, with the caveat that this place need not correspond to a node in the underlying hypergraph. Finally, we can capture the maximum amount of work needed to transport all probability mass from one node in $e$ to another node in $e$ as
$$\mathrm{AGG}_M(e) := \max\{W_1(\mu_i, \mu_j) \mid \{i, j\} \subseteq e\}, \quad \text{and consequently,} \quad \kappa_M(e) := 1 - \mathrm{AGG}_M(e).$$
Independent of the choice of AGG, the curvature at a node $i$ can be defined as the mean of all curvatures of meaningful directions containing $i$, i.e.,
$$\kappa_{\mathcal{N}}(i) := \frac{1}{|\mathcal{N}(i)|} \sum_{j \in \mathcal{N}(i)} \kappa(i, j),$$
or it can be derived as the mean of all curvatures of edges containing $i$, i.e.,
$$\kappa_E(i) := \frac{1}{\deg(i)} \sum_{e \ni i} \kappa(e).$$
More generally, since $H$ is connected, we can define the curvature of an arbitrary subset of nodes $s \subseteq V$ as
$$\kappa(s) := 1 - \frac{\mathrm{AGG}(s)}{d(s)}, \tag{14}$$
where AGG can be any of our aggregation functions, and $d(s) := \max\{d(i, j) \mid \{i, j\} \subseteq s\}$ refers to the extent of the subset $s$. Note that for $s \in E$, $d(s) = 1$, and thus, Eq. (14) is consistent with our previous definitions of hyperedge curvatures.
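Given pairwise Wasserstein distances, the average and maximum aggregations reduce to a few lines. The sketch below assumes the distances have already been computed and are passed in as a dictionary keyed by node pairs; the names `agg_a`, `agg_m`, and `edge_curvature`, as well as the numerical values, are hypothetical. The barycenter-based $\mathrm{AGG}_B$ is omitted because it requires a Wasserstein barycenter solver.

```python
from fractions import Fraction
from itertools import combinations

def agg_a(edge, w1):
    """Mean W1 distance over all node pairs of the edge."""
    pairs = list(combinations(sorted(edge), 2))
    return sum(w1[frozenset(p)] for p in pairs) / len(pairs)

def agg_m(edge, w1):
    """Maximum W1 distance over all node pairs of the edge."""
    return max(w1[frozenset(p)] for p in combinations(sorted(edge), 2))

def edge_curvature(edge, w1, agg):
    """kappa(e) = 1 - AGG(e); the extent d(e) of a hyperedge is 1."""
    return 1 - agg(edge, w1)

# Hypothetical W1 values for the three node pairs of the edge {0, 1, 2}.
w1 = {
    frozenset({0, 1}): Fraction(1, 2),
    frozenset({0, 2}): Fraction(1),
    frozenset({1, 2}): Fraction(3, 4),
}
edge = {0, 1, 2}
kappa_a = edge_curvature(edge, w1, agg_a)
kappa_m = edge_curvature(edge, w1, agg_m)
print(kappa_a, kappa_m)
```

Since a maximum is never smaller than a mean, the max-based curvature can never exceed the average-based one, whatever the input distances are.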

3.2. PROPERTIES OF ORCHID CURVATURES

Having introduced our probability measures ($\mu$) and aggregation functions (AGG), we now analyze their properties and the properties of the resulting curvatures. All proofs are deferred to Appendix A.1. First, we note that $\mu^{\mathrm{EN}}$, $\mu^{\mathrm{EE}}$, and $\mu^{\mathrm{WE}}$ are equivalent for certain hypergraph classes, and all aggregation functions coincide for graphs.

Lemma 1. For graphs and $r$-uniform, $k$-regular, $c$-cooccurrent hypergraphs, $\mu^{\mathrm{EN}} = \mu^{\mathrm{EE}} = \mu^{\mathrm{WE}}$.

Lemma 2. For graphs, i.e., 2-uniform hypergraphs, we have $\mathrm{AGG}_A(e) = \mathrm{AGG}_B(e) = \mathrm{AGG}_M(e)$ for all edges $e \in E$.

Taken together, Lemma 1 and Lemma 2 imply that for graphs, ORCHID simplifies to ORC, regardless of the choice of probability measure and aggregation function. This fulfills Condition I. Moreover, all our aggregation functions are permutation-invariant by construction, thus satisfying Condition II. Concerning Condition III, $\kappa_A$ and $\kappa_M$ exhibit better scalability than $\kappa_B$, as Wasserstein barycenters are harder to compute than individual distances (Cuturi & Doucet, 2014). Another reason to prefer $\kappa_A$ and $\kappa_M$ over $\kappa_B$ is the existence of upper and lower bounds that are easy to calculate. To this end, let $d_{\min}(H) := \min\{d(u, v) \mid u \ne v \in V\}$ be the smallest nonzero distance in $H$, and let $\|\cdot\|_1$ refer to the $L_1$ norm of a vector. We then obtain the following bounds for $\kappa_A$ and $\kappa_M$.

Theorem 3. For any probability measure $\mu$ and $C(e) := 2/(|e|(|e| - 1))$, the curvature $\kappa_A(e)$ of an edge $e \in E$ is bounded by
$$1 - \mathrm{diam}(H)\, C(e) \sum_{\{i, j\} \subseteq e} \|\mu_i - \mu_j\|_1 \le \kappa_A(e) \le 1 - d_{\min}(H)\, C(e) \sum_{\{i, j\} \subseteq e} \|\mu_i - \mu_j\|_1.$$

Theorem 4. For any probability measure $\mu$, the curvature $\kappa_M(e)$ of an edge $e \in E$ is bounded by
$$1 - \mathrm{diam}(H) \max_{\{i, j\} \subseteq e} \|\mu_i - \mu_j\|_1 \le \kappa_M(e) \le 1 - d_{\min}(H) \max_{\{i, j\} \subseteq e} \|\mu_i - \mu_j\|_1.$$

Directly from our definitions, we further obtain the following relationships between $\kappa_A$, $\kappa_B$, and $\kappa_M$, and between ORCHID curvatures on hypergraphs and ORC on their unweighted clique expansions.

Corollary 5. Given a hypergraph $H = (V, E)$, $\kappa_M(e) \le \kappa_A(e)$ and $\kappa_M(e) \le \kappa_B(e)$ for all $e \in E$.

Corollary 6. Given a hypergraph $H = (V, E)$ and its unweighted clique expansion $G^{\circ} = (V, E^{\circ})$, for $\{i, j\} \in E^{\circ}$, the ORC $\kappa(i, j)$ in $G^{\circ}$ equals its ORCHID curvature $\kappa(i, j)$ of direction $\{i, j\} \subseteq V$ in $H$ with $\mu^{\mathrm{EN}}$, and the ORC $\kappa(i)$ of $i \in V$ in $G^{\circ}$ equals its ORCHID curvature $\kappa_{\mathcal{N}}(i)$ in $H$ with $\mu^{\mathrm{EN}}$.

Corollary 6 clarifies that the equal-nodes random walk establishes the connection between ORCHID and ORC on graphs. Moreover, ORCHID curvatures capture relations between global properties and local measurements, similar to the Bonnet-Myers theorem in Riemannian geometry (Myers, 1941).

Theorem 7. Given a subset of nodes $s \subseteq V$ and an arbitrary probability measure $\mu$, let $\delta_i$ denote a Dirac measure at node $i$, and let $J(\mu_i) := W_1(\delta_i, \mu_i)$ denote the jump probability of $\mu_i$. If (i) all curvatures based on $\mu$ are strictly positive, i.e., $\kappa(s) \ge \kappa > 0$ for all $s \subseteq V$, and (ii) $W_1(\mu_i, \mu_j) \le \mathrm{AGG}(s)$ for $\{i, j\} = \operatorname{argmax}(d(s))$, then
$$d(s) \le \frac{J(\mu_i) + J(\mu_j)}{\kappa(s)}.$$

Note that condition (ii) of Theorem 7 is always satisfied by $\mathrm{AGG}_M$. Finally, in Appendix A.1, we generalize the concepts of cliques, grids, and trees (prototypical positively curved, flat, and negatively curved graphs) to hypergraphs, and we prove the following theorems to ensure that ORCHID curvatures respect our geometric intuition, as required by Feature C.

Theorem 8 (Hyperclique curvature). For an edge $e$ in a hyperclique $H = (V, E)$ on $n$ nodes with edges $E = \binom{V}{r}$ for some $r \le n$, with $\alpha = 0$, $\kappa(e) = 1 - \frac{1}{n - 1}$, i.e., $\lim_{n \to \infty} \kappa(e) = 1$, independent of $r$.

Theorem 9 (Hypergrid curvature). For an edge $e$ in an $r$-uniform, $k$-regular hypergrid, with $\alpha = 0$, $\kappa(e) = 0$, independent of $r$ and $k$.

Theorem 10 (Hypertree curvature). For an edge $e$ in an $r$-uniform, $k$-regular, 1-intersecting hypertree, with $\alpha = 0$, $\kappa(e) = 1 - \frac{3(k - 1)}{k} + \frac{1}{(r - 1)k}$, i.e., $\lim_{k \to \infty} \kappa(e) = -2$, independent of $r$.
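The closed-form curvatures of Theorems 8 and 10 are easy to evaluate exactly, which gives a quick plausibility check against the graph case. The helper names `hyperclique_curvature` and `hypertree_curvature` below are hypothetical; the graph values used for comparison, $(n - 2)/(n - 1)$ for an edge of the complete graph on $n$ nodes and $-2 + 4/k$ for an edge of a $k$-regular tree, are the standard ORC values for $\alpha = 0$.

```python
from fractions import Fraction

def hyperclique_curvature(n):
    """Theorem 8: curvature of an edge in a hyperclique on n nodes (alpha = 0)."""
    return 1 - Fraction(1, n - 1)

def hypertree_curvature(r, k):
    """Theorem 10: r-uniform, k-regular, 1-intersecting hypertree (alpha = 0)."""
    return 1 - Fraction(3 * (k - 1), k) + Fraction(1, (r - 1) * k)

# For r = 2, the hypergraph formulas reduce to the graph case.
for n in (3, 4, 10):
    print(n, hyperclique_curvature(n), Fraction(n - 2, n - 1))
for k in (2, 3, 10):
    print(k, hypertree_curvature(2, k), -2 + Fraction(4, k))

# The hypertree curvature approaches -2 as k grows, for any fixed r.
print(float(hypertree_curvature(5, 10**6)))
```

The sign pattern matches Feature C: the hyperclique value is positive for every $n \ge 3$, while the hypertree value is negative as soon as $k \ge 3$.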

4. EXPERIMENTS

To address these questions, we experiment with data from different domains, spanning several orders of magnitude. We investigate four individual real-world hypergraphs in which edges represent co-authorship (aps-a, dblp) and FDA-registered drugs (ndc-ai, ndc-pc), six collections of real-world hypergraphs in which edges represent questions on Stack Exchange Sites (stex), co-authorship by venues (aps-av, dblp-v), co-citation by venues (aps-cv), chords in music pieces (mus), and character cooccurrence on stage in Shakespeare's plays (sha), as well as three collections of synthetic hypergraphs based on different generative models (syn-c, syn-r, syn-s), for a total of 4,321 hypergraphs. We summarize their basic properties in Table 1, and give more details on their statistics, semantics, and provenance in Appendix A.3. We implement ORCHID in Julia and Python. Our experiments are run on AMD EPYC 7702 CPUs with up to 256 cores. We discuss our implementation and results in more detail in Appendices A.4 and A.5, and make all our code, data, and results publicly available.

Q1 Parametrization. To understand how our choices of $\alpha$, $\mu$, and AGG impact ORCHID curvatures, we first compute the pairwise mutual information between ORCHID edge curvatures with 36 different parametrizations. As illustrated in Fig. 2, while changing $\alpha$ for the same combination of $\mu$ and AGG has similar effects across hypergraphs, there is no uniform pattern in the relationships between different combinations of $\mu$ and AGG. This underscores the fact that the various notions of ORCHID curvature are not redundant but rather emphasize distinct aspects of hypergraph structure. For a fine-grained view of the differences between parametrizations, we inspect the distributions of our four curvature types, (i) edge curvature $\kappa(e)$, (ii) edge-averaged node curvature $\kappa_E(i)$, (iii) directional curvature $\kappa(i, j)$ for all $\{i, j\} \subseteq e \in E$, and (iv) direction-averaged node curvature $\kappa_{\mathcal{N}}(i)$, for each of our 36 parametrizations. By construction, directional curvature and direction-averaged node curvature do not vary with the choice of AGG, and $\kappa_M$ lower-bounds $\kappa_A$ for edge curvatures and edge-averaged node curvatures. However, the differences between $\kappa_M$ and $\kappa_A$ vary across graphs, while consistently, the larger $\alpha$, the more concentrated our curvature distributions (Appendix A.5).

Figure 3: Curvatures carry more information than other local features. We show a 2-dimensional embedding of graphs from the stex collection based on kPCA, using an RBF kernel with curvature distributions computed using $\alpha = 0.1$, $\mu^{\mathrm{WE}}$, and $\mathrm{AGG}_A$ (3a) or edge neighborhood size distributions (3b) as input features. We see that only curvatures yield a meaningful and discriminative grouping. Corroborating this finding, we also depict Bonferroni-adjusted p-values of testing for significant differences in feature distributions (i.e., p-values multiplied by the number $h$ of hypothesis tests, as Bonferroni (1936) correction requires $p \le \alpha/h$ for some desired Type I-error rate $\alpha$), using MMD on distributions of edge curvatures computed with the same parameters as for (3a) (upper triangle) or edge cardinality (lower triangle), for the subset of the dblp-v collection corresponding to top conferences grouped by areas of research (3c).

Q2 Hypergraph Exploration. To explore individual graphs, we perform case studies on graphs from the aps-cv collection, leveraging that most nodes in these graphs also occur as edges. We scrutinize the relationships between node and edge curvatures, other local node and edge statistics, and article metadata.
We observe that curvature values span a considerable range even for articles with otherwise comparable statistics, but the curvature distributions of influential papers appear to differ systematically from those of less influential papers (Appendix A.5). Exploring graph collections, we run kernel PCA (kPCA) (Schölkopf et al., 1997) with a radial basis function kernel (RBF kernel) and curvatures or other local features known to be powerful baselines (Cai & Wang, 2018), e.g., node degrees and neighborhood sizes, as inputs to jointly embed graphs from a collection. We statistically bootstrap the maximum mean discrepancy (MMD) (Gretton et al., 2006) to test the null hypothesis that the feature distributions of two graphs are equal. As shown in Fig. 3, ORCHID curvatures result in more interpretable embeddings and more discriminative tests than other local features.

Table 2: ORCHID curvatures lead to better clusterings than other local features. We show $\mathrm{WCC}_{\kappa(i,j)}$ for collection clusterings computed using RBF or exp. Wasserstein kernels with edge curvatures, edge neighborhood sizes, edge-averaged node curvatures, or node neighborhood sizes as inputs.

To explore the utility of curvatures for learning on individual hypergraphs, we perform spectral clustering using either curvatures or other local node features. To evaluate the resulting node clusterings, we leverage that nodes in the aps-cv collection correspond to APS papers, for which we consistently know the titles. Hence, even in the absence of a meaningful ground truth, we can still check the sensibility of a clustering by statistically analyzing the titles grouped together using tools from natural language processing. We find that node clusterings based on curvatures correspond to thematically more coherent groupings (Appendix A.5).
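The MMD-based two-sample comparison and the Bonferroni adjustment used in the exploration experiments can be sketched as follows. This is a simplified stand-in rather than the procedure used in the paper: it uses the biased squared-MMD estimator with an RBF kernel on one-dimensional samples and a permutation test instead of the bootstrap, caps adjusted p-values at 1, and all names (`rbf`, `mmd2`, `permutation_p_value`, `bonferroni`) and sample values are hypothetical.

```python
import math
import random

def rbf(x, y, gamma=1.0):
    """RBF kernel on real numbers."""
    return math.exp(-gamma * (x - y) ** 2)

def mmd2(xs, ys, gamma=1.0):
    """Biased estimate of the squared MMD between two 1-D samples."""
    kxx = sum(rbf(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, gamma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy

def permutation_p_value(xs, ys, trials=200, seed=0):
    """Share of random relabelings with an MMD at least as large as observed."""
    rng = random.Random(seed)
    observed = mmd2(xs, ys)
    pooled = list(xs) + list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        if mmd2(pooled[:len(xs)], pooled[len(xs):]) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)

def bonferroni(p_values):
    """Multiply each p-value by the number of tests (capped at 1 here)."""
    h = len(p_values)
    return [min(1.0, p * h) for p in p_values]

# Hypothetical curvature samples from two clearly different hypergraphs.
curv_x = [-0.9, -0.8, -0.7, -0.85, -0.75]
curv_y = [0.4, 0.5, 0.6, 0.45, 0.55]
p = permutation_p_value(curv_x, curv_y)
print(mmd2(curv_x, curv_y), p)
print(bonferroni([0.25, 0.75]))
```

Reporting adjusted p-values, as in Fig. 3c, lets them be compared directly against the desired Type I-error rate without rescaling the threshold.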
For learning on hypergraph collections, we spectrally cluster the collection using RBF or exponential Wasserstein kernel matrices, $\exp(-\gamma W(\mu_x, \mu_y))$, on node and edge curvatures or other local features (Plaen et al., 2020). Lacking ground-truth labels, we evaluate the clustering quality in an unsupervised manner, using what we call the Wasserstein Clustering Coefficient (WCC). This measure compares averaged intra-cluster Wasserstein distances to averaged inter-cluster Wasserstein distances, such that a lower WCC corresponds to a higher-quality clustering. Given $c$ clusters $\mathcal{X} = \{X_1, \dots, X_c\}$ of hypergraphs $H$ represented by their feature distributions $\vec{\chi}_H$, we define
$$\mathrm{WCC}(\mathcal{X}) := \frac{\sum_{X \in \mathcal{X}} \omega(X)}{1 + \sum_{X \ne Y \in \mathcal{X}} \omega(X, Y)},$$
with
$$\omega(X) := \binom{|X|}{2}^{-1} \sum_{x \ne y \in X} W(\vec{\chi}_x, \vec{\chi}_y), \qquad \omega(X, Y) := (|X||Y|)^{-1} \sum_{(x, y) \in X \times Y} W(\vec{\chi}_x, \vec{\chi}_y).$$
As illustrated in Table 2, when evaluated using WCC with directional curvature distributions as $\vec{\chi}$, i.e., $\mathrm{WCC}_{\kappa(i,j)}$, ORCHID curvatures consistently yield better clusterings than other local features.

[Table 2 columns: RBF $\kappa(e)$, W $\kappa(e)$, RBF $|\mathcal{N}(e)|$, W $|\mathcal{N}(e)|$, RBF $\kappa_E(i)$, W $\kappa_E(i)$, RBF $|\mathcal{N}(i)|$, W $|\mathcal{N}(i)|$; first row: dblp-v.]
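A small sketch may help to read the WCC definition. It rests on several assumptions that the text leaves open: sums over $x \ne y$ and $X \ne Y$ are taken over unordered pairs, singleton clusters contribute zero intra-cluster distance, and each hypergraph is represented by an equally sized one-dimensional sample so that $W_1$ reduces to the mean gap between sorted values. The names `w1_1d` and `wcc` and the toy data are hypothetical.

```python
from itertools import combinations

def w1_1d(xs, ys):
    """W1 between two equally sized 1-D samples (mean gap of sorted values)."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def wcc(clusters, dist):
    """Wasserstein Clustering Coefficient: lower values mean tighter clusters."""
    intra = 0.0
    for cluster in clusters:
        pairs = list(combinations(cluster, 2))
        if pairs:  # singleton clusters contribute nothing (assumption)
            intra += sum(dist(x, y) for x, y in pairs) / len(pairs)
    inter = 0.0
    for left, right in combinations(clusters, 2):
        inter += sum(dist(x, y) for x in left for y in right) / (len(left) * len(right))
    return intra / (1 + inter)

# Four hypothetical hypergraphs, each summarized by three feature values.
a, b = (0, 0, 1), (0, 1, 1)
c, d = (5, 5, 6), (5, 6, 6)
good = wcc([[a, b], [c, d]], w1_1d)  # similar hypergraphs grouped together
bad = wcc([[a, c], [b, d]], w1_1d)   # dissimilar hypergraphs grouped together
print(good, bad)
```

The grouping that keeps similar feature distributions together receives the lower score, as intended by the definition.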

5. DISCUSSION AND CONCLUSION

We introduced ORCHID, the first unified framework for Ollivier-Ricci curvature on hypergraphs that integrates and generalizes existing approaches to hypergraph ORC. ORCHID disentangles the common building blocks of all notions of hypergraph ORC, yielding curvature notions that are provably aligned with our geometric intuition. We performed a rigorous theoretical and empirical analysis of ORCHID curvatures, demonstrating their practical utility and scalability through extensive experiments, covering both hypergraph exploration and hypergraph learning. While our work paves the way toward future work seeking to leverage the power of Ollivier-Ricci curvature for hypergraphs in hypergraph learning algorithms, it still has some limitations to be addressed. First, ORC on graphs is defined for any probability measure, but we only consider measures corresponding to a single step of a random walk. Future work could thus harness higher-order random walks or alternative probability measures, and consider analyzing relationships between such probability measures and other structural hypergraph properties. Second, hyperedge intersections can vary in cardinality, but this variation is not currently reflected in our probability measures. One could thus integrate ORCHID with the s-walk framework proposed by Aksoy et al. (2020), or define persistent ORCHID curvatures based on hypergraph filtrations, extending work on persistent ORC for graphs (Wee & Xia, 2021b). Third, like the original ORC, ORCHID curvatures are static, but many hypergraphs are inherently dynamic, suggesting a need to develop dynamic curvature notions. Fourth, despite its comprehensive scope, our study only scratches the surface regarding the theoretical and empirical analysis of ORCHID curvatures, and we believe that there are many more connections between ORCHID curvatures and other hypergraph descriptors to be uncovered, and many additional use cases to be explored.
For instance, ORCHID generalizes ORC, but not Forman-Ricci curvature (FRC), and we believe that a framework for FRC could help uncover new relations between combinatorial curvature notions and hypergraph structure. Finally, we imagine that incorporating hypergraph curvature into models as an additional inductive bias could prove useful in hypergraph learning more broadly.

ETHICS STATEMENT

Our main contribution is ORCHID, a unified mathematical framework yielding theoretically sound hypergraph descriptors that are also practically useful for hypergraph exploration and hypergraph learning. As such, ORCHID comes with the caveats applicable to hypergraph exploration and hypergraph learning methods more generally. Most importantly, it should be used with caution on data related to people, and its results should not be decontextualized. We adhered to these principles in our experiments, and selected our datasets accordingly.

REPRODUCIBILITY STATEMENT

To facilitate reproducibility, we provide more details on our data, implementation, and results in Appendices A.3 to A.5, and make all our code, data, and results publicly available at https://doi.org/10.5281/zenodo.7624573.

A APPENDIX

In this Appendix, we include the following materials.
A.1 Deferred Proofs. All proofs for Section 3, along with supporting definitions, lemmas and corollaries.
A.2 Related Work. Discussion of related work treating hypergraph curvatures, graph curvatures, or hypergraph analysis.
A.3 Dataset Details. Further information on the provenance, semantics, and statistics of our datasets.
A.4 Implementation Details. Details on our implementation, including proofs showing the correctness of performance shortcuts.
A.5 Further Results. Display and discussion of results not included in the main paper.

A.1 DEFERRED PROOFS

Lemma 1. For graphs and $r$-uniform, $k$-regular, $c$-cooccurrent hypergraphs, $\mu^{\mathrm{EN}} = \mu^{\mathrm{EE}} = \mu^{\mathrm{WE}}$.

Proof. For notational simplicity, w.l.o.g., we assume that $\alpha = 0$. In an $r$-uniform, $k$-regular, $c$-cooccurrent hypergraph $H = (V, E)$, each node $i$ has degree $k$ and $\frac{(r - 1)k}{c}$ neighbors, and each edge has cardinality $r$. Hence, for nodes $i, j \in V$ with $i \sim j$,
$$\mu_i^{\mathrm{EN}}(j) = \frac{1}{|\mathcal{N}(i)|} = \frac{c}{(r - 1)k} = \frac{1}{k} \cdot c \cdot \frac{1}{r - 1} = \frac{1}{\deg(i)} \sum_{e \ni i, j} \frac{1}{|e| - 1} = \mu_i^{\mathrm{EE}}(j) = \frac{c}{k(r - 1)} = \frac{|\{e \in E \mid \{i, j\} \subseteq e\}|}{\sum_{f \ni i} (|f| - 1)} = \mu_i^{\mathrm{WE}}(j).$$
Graphs are 2-uniform and 1-cooccurrent (but not generally regular), and hence, $|\mathcal{N}(i)| = \deg(i)$. Using this to simplify the probability measure expressions, the claim follows.

Lemma 2. For graphs, i.e., 2-uniform hypergraphs, we have $\mathrm{AGG}_A(e) = \mathrm{AGG}_B(e) = \mathrm{AGG}_M(e)$ for all edges $e \in E$.

Proof. Given probability distributions $\mu_1, \mu_2, \dots, \mu_n$, their Wasserstein barycenter is defined as the distribution $\bar{\mu}$ that minimizes $f(\bar{\mu}) := \frac{1}{n} \sum_{i=1}^{n} W_1(\bar{\mu}, \mu_i)$. Since $|e| = 2$, we minimize $W_1(\bar{\mu}, \mu_1) + W_1(\bar{\mu}, \mu_2)$. The Wasserstein distance is a metric, so it satisfies the triangle inequality. Thus, $W_1(\mu_1, \mu_2) \le W_1(\bar{\mu}, \mu_1) + W_1(\bar{\mu}, \mu_2)$ for all choices of $\bar{\mu}$. Hence, $f$ is minimized by either $\mu_1$ or $\mu_2$. Evaluating both cases yields $\mathrm{AGG}_A(e) = \mathrm{AGG}_B(e)$, and observing that $\mathrm{AGG}_M(e) = W_1(\mu_i, \mu_j)$ for $e = \{i, j\}$ by definition, the claim follows.
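The evaluation step at the end of the proof of Lemma 2 can be written out explicitly. The following worked instance is a sketch that assumes an edge $e = \{1, 2\}$ and uses $\bar{\mu} = \mu_1$, one of the two minimizers identified in the proof, together with $W_1(\mu_1, \mu_1) = 0$:

```latex
% Worked instance of Lemma 2 for an edge e = {1, 2}, i.e., |e| = 2.
\begin{align*}
\mathrm{AGG}_A(e) &= \frac{2}{2 \cdot 1}\, W_1(\mu_1, \mu_2) = W_1(\mu_1, \mu_2), \\
\mathrm{AGG}_B(e) &= \frac{1}{2 - 1} \bigl( W_1(\mu_1, \bar{\mu}) + W_1(\mu_2, \bar{\mu}) \bigr)
  = 0 + W_1(\mu_2, \mu_1) = W_1(\mu_1, \mu_2)
  \quad \text{for } \bar{\mu} = \mu_1, \\
\mathrm{AGG}_M(e) &= \max \{ W_1(\mu_1, \mu_2) \} = W_1(\mu_1, \mu_2).
\end{align*}
```

Choosing $\bar{\mu} = \mu_2$ instead gives the same value by symmetry of $W_1$.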
Theorem 3. For any probability measure $\mu$ and $C(e) := \frac{2}{|e|(|e|-1)}$, the curvature $\kappa_A(e)$ of an edge $e \in E$ is bounded by

$$1 - \operatorname{diam}(H)\, C(e) \sum_{\{i,j\} \subseteq e} \|\mu_i - \mu_j\|_1 \;\le\; \kappa_A(e) \;\le\; 1 - d_{\min}(H)\, C(e) \sum_{\{i,j\} \subseteq e} \|\mu_i - \mu_j\|_1 .$$

Proof. We bound each of the summands in the curvature calculation. Given probability measures $\mu_i, \mu_j$, a result by Gibbs & Su (2002, Theorem 4) states that

$$d_{\min}(H)\, d_{TV}(\mu_i, \mu_j) \;\le\; W_1(\mu_i, \mu_j) \;\le\; \operatorname{diam}(H)\, d_{TV}(\mu_i, \mu_j) ,$$

where $d_{TV}$ refers to the total variation distance. The intuition behind this bound is that the total variation distance represents a specific type of transport plan between the two probability measures; the factors arising from the minimum (maximum) distance in a space indicate the minimum (maximum) distance that realizes this transport plan. Since all our measures are defined over a finite space, we have $d_{TV}(\mu_i, \mu_j) = \frac{1}{2} \|\mu_i - \mu_j\|_1$. The claim follows by considering that pairwise distances are being subtracted to calculate our curvature measure.

Theorem 4. For any probability measure $\mu$, the curvature $\kappa_M(e)$ of an edge $e \in E$ is bounded by

$$1 - \operatorname{diam}(H) \max_{\{i,j\} \subseteq e} \|\mu_i - \mu_j\|_1 \;\le\; \kappa_M(e) \;\le\; 1 - d_{\min}(H) \max_{\{i,j\} \subseteq e} \|\mu_i - \mu_j\|_1 .$$

Proof. For $\mathrm{AGG}_M$, Eq. (18) applies for a single pairwise distance only. We thus only obtain a single bound based on the maximum total variation distance between two probability measures.

Theorem 7. Given a subset of nodes $s \subseteq V$ and an arbitrary probability measure $\mu$, let $\delta_i$ denote a Dirac measure at node $i$, and let $J(\mu_i) := W_1(\delta_i, \mu_i)$ denote the jump probability of $\mu_i$. If (i) all curvatures based on $\mu$ are strictly positive, i.e., $\kappa(s) \ge \kappa > 0$ for all $s \subseteq V$, and (ii) $W_1(\mu_i, \mu_j) \le \mathrm{AGG}(s)$ for $\{i, j\} = \operatorname{argmax}(d(s))$, then

$$d(s) \le \frac{J(\mu_i) + J(\mu_j)}{\kappa(s)} .$$

Proof. Let $\{i, j\} = \operatorname{argmax}(d(s))$ as required in the theorem. We then have the following chain of (in)equalities:

$$d(s) = d(i, j) = W_1(\delta_i, \delta_j) \le W_1(\delta_i, \mu_i) + W_1(\mu_i, \mu_j) + W_1(\mu_j, \delta_j) . \tag{19}$$

Rearranging Eq. (14), we have $(1 - \kappa(s))\, d(s) = \mathrm{AGG}(s)$. According to our assumptions, $W_1(\mu_i, \mu_j) \le \mathrm{AGG}(s) = (1 - \kappa(s))\, d(i, j)$. Inserting this into Eq. (19) yields

$$\begin{aligned}
d(i, j) &\le J(\mu_i) + J(\mu_j) + (1 - \kappa(s))\, d(i, j) && (20) \\
\Leftrightarrow\quad d(i, j) - (1 - \kappa(s))\, d(i, j) &\le J(\mu_i) + J(\mu_j) && (21) \\
\Leftrightarrow\quad d(i, j) &\le \frac{J(\mu_i) + J(\mu_j)}{\kappa(s)} ,
\end{aligned}$$

where the last step simplifies the left-hand side of Eq. (21) to $\kappa(s)\, d(i, j)$ and divides by $\kappa(s)$, which is only valid since $\kappa(s) \ge \kappa > 0$ by assumption.

Definition 11 (Hypercliques, hypergrids, hypertrees). A simple, connected hypergraph $H = (V, E)$ is
- a hyperclique if $E = \binom{V}{r}$ for some $r \le |V|$,
- a hypergrid if $H$ is an $r$-uniform hypergraph for which there exists a lattice $L = (V, E_L)$ s.t. $E = \{e \in \binom{V}{r} \mid e \text{ corresponds to a path of length } r \text{ in } L\}$, and
- a hypertree if there exists a tree $T = (V, E_T)$ s.t. each edge $e \in E$ induces a subtree in $T$.

Corollary 12. Cliques are hypercliques, grids are hypergrids, and trees are hypertrees.

Corollary 13. If $H = (V, E)$ is a hyperclique, a hypergrid, or an $r$-uniform, $k$-regular, 1-intersecting hypertree, for $i, j \in V$, the sets $S_i = \{e \in E \mid i \in e\}$ and $S_j = \{e \in E \mid j \in e\}$ are isomorphic, i.e., there exists $\varphi : \mathcal{N}(i) \cup \{i\} \to \mathcal{N}(j) \cup \{j\}$ such that $\{\{\varphi(x) \mid x \in e\} \mid e \in S_i\} = S_j$.

For hypercliques, hypergrids, and hypertrees with certain regularities, $\mathrm{AGG}_A(e)$ and $\mathrm{AGG}_M(e)$ are constants.

Lemma 14 (Hypercliques, hypergrids, hypertrees). If $H = (V, E)$ is a hyperclique, a hypergrid, or an $r$-uniform, $k$-regular, 1-intersecting hypertree, we have $\mathrm{AGG}_A(e) = \mathrm{AGG}_M(e) = W_1(\mu_i, \mu_j) = w$ for $w \in \mathbb{R}$, $e \in E$, and $i, j \in V$ with $i \sim j$.

Proof. By Corollary 13, we have $w := W_1(\mu_i, \mu_j) = W_1(\mu_p, \mu_q)$ for $i, j, p, q \in V$ with $i \sim j$ and $p \sim q$. Hence $\mathrm{AGG}_M(e) = w$, and

$$\mathrm{AGG}_A(e) = \frac{2}{|e|(|e|-1)} \sum_{\{i,j\} \subseteq e} W_1(\mu_i, \mu_j) = \frac{2}{|e|(|e|-1)} \cdot \frac{|e|(|e|-1)}{2} \cdot w = w ,$$

for $e \in E$.

Corollary 15. If $H = (V, E)$ is a hyperclique, a hypergrid, or an $r$-uniform, $k$-regular, 1-intersecting hypertree, $\mathrm{AGG}_A(e) = \mathrm{AGG}_M(e)$.
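The two aggregations appearing in Lemma 14 can be sketched as follows, assuming the pairwise Wasserstein distances are already available through a caller-supplied function. The names `agg_a`, `agg_m`, and `w1` are hypothetical and chosen for illustration only; the mean over all $|e|(|e|-1)/2$ unordered pairs matches the prefactor $\frac{2}{|e|(|e|-1)}$ in the proof.

```python
from fractions import Fraction
from itertools import combinations


def agg_a(edge, w1):
    """Mean of w1(i, j) over all unordered node pairs {i, j} in the edge."""
    pairs = list(combinations(sorted(edge), 2))
    return sum(w1(i, j) for i, j in pairs) / len(pairs)


def agg_m(edge, w1):
    """Maximum of w1(i, j) over all unordered node pairs {i, j} in the edge."""
    return max(w1(i, j) for i, j in combinations(sorted(edge), 2))


# Constant pairwise distances (the situation of Lemma 14): both aggregations agree.
constant = lambda i, j: Fraction(1, 4)
print(agg_a({0, 1, 2}, constant) == agg_m({0, 1, 2}, constant))  # True

# Non-constant toy distances: mean and maximum differ.
table = {(0, 1): Fraction(1, 5), (0, 2): Fraction(2, 5), (1, 2): Fraction(3, 5)}
lookup = lambda i, j: table[(i, j)]
print(agg_a({0, 1, 2}, lookup), agg_m({0, 1, 2}, lookup))  # 2/5 3/5
```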
Using Lemma 14, we now prove that under $\mathrm{AGG}_A$ and $\mathrm{AGG}_M$, hypercliques are positively curved, hypergrids are flat, and hypertrees are negatively curved, as desired.

Theorem 8 (Hyperclique curvature). For an edge $e$ in a hyperclique $H = (V, E)$ on $n$ nodes with edges $E = \binom{V}{r}$ for some $r \le n$, with $\alpha = 0$,

$$\kappa(e) = 1 - \frac{1}{n-1} ,$$

i.e., $\lim_{n \to \infty} \kappa(e) = 1$, independent of $r$.

Proof. A hyperclique is $r$-uniform, $(n-1)$-regular, and $(r-2)$-cooccurrent, so $\mu^{EN}_i = \mu^{EE}_i = \mu^{WE}_i$ for each node $i \in V$ by Lemma 1. Thus, considering $\mu^{EN}_i$, each node $i \in V$ has $n-1$ neighbors to which it distributes its probability mass equally, and we have $W_1(\mu_i, \mu_j) = \frac{1}{n-1}$ for $i, j \in V$ with $i \sim j$. The claim now follows from Lemma 14.

Theorem 9 (Hypergrid curvature). For an edge $e$ in an $r$-uniform, $k$-regular hypergrid, with $\alpha = 0$, $\kappa(e) = 0$, independent of $r$ and $k$.

Proof. By Corollary 13, the sets $S_i = \{e \in E \mid i \in e\}$ and $S_j = \{e \in E \mid j \in e\}$ are isomorphic, and due to the symmetries in the hypergrid, the isomorphism $\varphi : \mathcal{N}(i) \cup \{i\} \to \mathcal{N}(j) \cup \{j\}$ minimizing the cost $\sum_{x \in \mathcal{N}(i) \cup \{i\}} d(x, \varphi(x))$ corresponds to the coupling minimizing $W_1(\mu_i, \mu_j)$. The cost of $\varphi$ equals the minimum cost of an isomorphism in $H$'s underlying lattice $L$ between the inclusive $(r-1)$-hop neighborhoods of two nodes adjacent in $L$, which is $|\mathcal{N}(i) \cup \{i\}|$. Hence, $W_1(\mu_i, \mu_j) = \frac{|\mathcal{N}(i) \cup \{i\}|}{|\mathcal{N}(i) \cup \{i\}|} = 1$ for $i, j \in V$ with $i \sim j$ and all choices of $\mu$, and the claim then follows from Lemma 14.

Theorem 10 (Hypertree curvature). For an edge $e$ in an $r$-uniform, $k$-regular, 1-intersecting hypertree, with $\alpha = 0$,

$$\kappa(e) = 1 - \left( \frac{3(k-1)}{k} + \frac{1}{(r-1)k} \right) ,$$

i.e., $\lim_{k \to \infty} \kappa(e) = -2$, independent of $r$.

Proof. An $r$-uniform, $k$-regular, 1-intersecting hypertree is 1-cooccurrent, so we have $\mu^{EN}_i = \mu^{EE}_i = \mu^{WE}_i$ for each node $i \in V$ by Lemma 1. Each node $i \in V$ has $(r-1)k$ neighbors, such that $\mu^{EN}_i$ distributes a fraction $\frac{1}{(r-1)k}$ of the probability mass to each of $i$'s neighbors. Nodes $i, j \in V$ with $i \sim j$ share $(r-2)$ neighbors (those in the unique edge $e$ satisfying $\{i, j\} \subseteq e$), and the probability mass allocated by $\mu_i$ to $j$ can be matched with the probability mass allocated by $\mu_j$ to $i$ at cost 1. Because $H$ is a hypertree, the remaining probability mass, $\frac{(r-1)(k-1)}{(r-1)k} = \frac{k-1}{k}$, needs to be transported from the neighborhood of $i$ to the neighborhood of $j$ at cost 3. Hence,

$$W_1(\mu_i, \mu_j) = 1 \cdot \frac{1}{(r-1)k} + 3 \cdot \frac{k-1}{k}$$

for $i, j \in V$ with $i \sim j$. Again, the claim follows from Lemma 14.
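As a sanity check of Theorem 8, the hyperclique curvature can be reproduced numerically for small $n$ and $r$. The sketch below is an illustration under stated assumptions rather than the authors' code: it uses $\mu^{EN}$ with $\alpha = 0$, and it exploits that all distinct nodes of a hyperclique are at distance 1, in which case $W_1$ coincides with the total variation distance (both sides of the Gibbs & Su bound in the proof of Theorem 3 are then equal). All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def hyperclique(n, r):
    """All r-subsets of {0, ..., n-1} as edges (requires 2 <= r <= n)."""
    return [frozenset(c) for c in combinations(range(n), r)]


def mu_en(edges, i):
    """Uniform measure on the neighbors of node i (alpha = 0)."""
    nbrs = {j for e in edges if i in e for j in e if j != i}
    return {j: Fraction(1, len(nbrs)) for j in nbrs}


def total_variation(mu, nu):
    """Half the L1 distance between two finitely supported measures."""
    support = set(mu) | set(nu)
    return sum(abs(mu.get(x, 0) - nu.get(x, 0)) for x in support) / 2


def hyperclique_curvature(n, r):
    edges = hyperclique(n, r)
    # All pairwise distances are 1, so W1 equals the total variation distance.
    w1 = total_variation(mu_en(edges, 0), mu_en(edges, 1))
    # Adjacent nodes have d(i, j) = 1, so kappa = 1 - W1 (cf. Lemma 14).
    return 1 - w1


for n, r in [(4, 2), (5, 3), (6, 4)]:
    print(n, r, hyperclique_curvature(n, r) == 1 - Fraction(1, n - 1))  # True in each case
```

The result does not depend on $r$, in line with the statement of the theorem.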

A.2 RELATED WORK

Hypergraph Curvature. Most closely related to our work is the literature on hypergraph curvatures. Much of this literature focuses on defining notions of ORC and Forman-Ricci Curvature (FRC) specifically for directed hypergraphs and studying some of their mathematical and empirical properties (e.g., Leal et al., 2019; 2020; 2021; Saucan & Weber, 2018).

Graph Curvature. Beyond the Ollivier-Ricci concepts, there are also curvature concepts based on the contractivity of operators (Bakry & Émery, 1985), which could be considered a "spiritual precursor" to Ollivier's work. This perspective has been used to provide a predominantly spectral perspective on curvature (Liu et al., 2019; Münch & Rose, 2020), whereas ORC can foremost be seen as a probabilistic concept. Recently, Kempton et al. (2020) defined a hybrid between Ollivier and Bakry-Émery curvature on graphs. A more combinatorial perspective is assumed by FRC, which is motivated by defining equivalent formulations of curvature on structured spaces, such as CW complexes or simplicial complexes. Originally described by Forman (2003), FRC has since been improved in the context of explaining the learning behavior of graph neural networks (Topping et al., 2022), with other recent work focusing on fusing it with topological graph properties (Roy et al., 2020). ORC was first developed for general Markov chains (Ollivier, 2007; 2009), but has quickly been adopted to characterize graphs (Jost & Liu, 2014) and networks (Weber et al., 2017). With numerous follow-up publications elucidating the relationship between structural properties of a graph and ORC (Bauer et al., 2017; Samal et al., 2018), the initial concept has also been substantially updated (Bourne et al., 2018; Lin et al., 2011). As an emerging research direction, we identified the combination of ORC (and FRC) with concepts from computational topology, leading to an inherent multi-scale perspective on data. This has led to promising results for treating biomedical graph data (Wee & Xia, 2021a; b).

Hypergraph Learning. Work tackling certain hypergraph learning tasks such as hypergraph clustering (Amburg et al., 2020; Veldt et al., 2020) has existed for many years (Wachman & Khardon, 2007; Zhou et al., 2006). Some approaches make use of intrinsic structural properties of hypergraphs, leading to hypergraph neural network architectures (Huang & Yang, 2021) and message passing formulations (Gao et al., 2019), whereas others focus on developing similarity measures, i.e., kernels (Bai et al., 2014; Bloch & Bretto, 2013; Martino & Rizzi, 2020). Methods from the rich literature on graph kernels can also be employed to address hypergraph learning tasks, namely, by transforming the hypergraph into a graph, but most popular transformations are lossy and may drastically increase the size of the object under study, such that the practicality and utility of this approach is unclear.

Hypergraph Mining and Analysis. In recent years, there has been a renewed interest in hypergraph mining and analysis. Notably, there is work developing new hypergraph descriptors (Aksoy et al., 2020), extending motif discovery to hypergraphs (Lee & Shin, 2021; Lee et al., 2020), solving classic graph mining tasks in the hypergraph setting (Macgregor & Sun, 2021), or identifying patterns in real-world hypergraphs (Do et al., 2020). However, to the best of our knowledge, none of this work draws on curvature concepts to solve the mining and analysis tasks of interest.

A.3 DATASET DETAILS

At a high level, our workflow to produce and work with the datasets used in our experiments (Section 4) was as follows:
1. Obtain raw data in a variety of different formats, e.g., CSV, JSON, or XML.
2. Transform the raw data into a hypergraph CSV that retains as much of the raw data semantics as possible. This CSV is guaranteed to contain one row per edge, one column with unique edge identifiers, and one column with the nodes contained in each edge. It may also contain additional columns holding further metadata associated with individual edges. Column names may differ between datasets to reflect dataset semantics.
3. Provide a unified loading interface to the datasets in Python.
4. Transform semantics-laden hypergraph CSV files into semantics-free one-based integer edge lists and sparse matrices for curvature computations in Julia, compute curvatures in Julia, and store the results in JSON files.
5. Map results back to original dataset semantics in Python for further examination.

In the following, we give more details on the provenance, semantics, and statistics of our datasets. Unless otherwise noted, we make our datasets publicly available with our online materials, along with the raw data and all preprocessing code.
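Steps 2 and 4 of the workflow above can be sketched in Python as follows. This is a minimal illustration, not the released preprocessing code: the column names `edge_id` and `nodes`, the `;` separator inside the node column, and the function name are all assumptions made for the example (the text only guarantees one row per edge, an edge identifier column, and a node column).

```python
import csv
import io


def load_edge_list(csv_text, edge_col="edge_id", nodes_col="nodes", sep=";"):
    """Map a hypergraph CSV (one row per edge) to one-based integer edge lists."""
    node_ids = {}  # original node label -> one-based integer id
    edge_ids = []  # position k holds the original identifier of edge k + 1
    edges = []     # position k holds the sorted integer node ids of edge k + 1
    for row in csv.DictReader(io.StringIO(csv_text)):
        labels = [x.strip() for x in row[nodes_col].split(sep) if x.strip()]
        members = sorted({node_ids.setdefault(x, len(node_ids) + 1) for x in labels})
        edge_ids.append(row[edge_col])
        edges.append(members)
    return edges, node_ids, edge_ids


# Toy input with made-up identifiers.
toy_csv = "edge_id,nodes\np1,alice;bob\np2,bob;carol;dave\n"
edges, node_ids, edge_ids = load_edge_list(toy_csv)
print(edges)     # [[1, 2], [2, 3, 4]]
print(edge_ids)  # ['p1', 'p2']
```

Keeping `node_ids` and `edge_ids` alongside the integer edge lists is what allows results to be mapped back to the original dataset semantics, as in step 5.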

A.3.1 APS-A, APS-AV, APS-CV: AMERICAN PHYSICAL SOCIETY JOURNAL ARTICLES

The American Physical Society (APS), a nonprofit organization working to advance the knowledge of physics, publishes several peer-reviewed research journals. The APS makes two datasets based on its publications available to researchers: (i) an edge list containing (citing, cited) pairs of articles contained in its collection, and (ii) a JSON dataset containing the metadata for each article in its collection. These datasets are updated on a yearly basis, and researchers can request access by filling out a web form located at https://journals.aps.org/datasets. We made a data access request and were granted access to the 2021 versions of the APS datasets within two weeks. From the APS datasets, we derived the following hypergraphs and hypergraph collections:
(i) aps-a: Each node corresponds to an author who published at least one article in an APS journal. Each edge e corresponds to an article in an APS journal, and it contains as nodes all authors of e. This hypergraph is derived from the JSON data.
(ii) aps-av: aps-a, split up by journal, for a total of 19 hypergraphs. For each journal j, the edge set of aps-a is restricted to articles from j, and the node set of aps-a is restricted to nodes authoring at least one article from j.
(iii) aps-cv: We derive one hypergraph for each of the 19 journals represented in the edge list data. For each journal j, the edge set comprises articles from j citing at least one article in j, and the node set consists of articles in j cited by at least one article in j.

Access. Due to the terms and conditions associated with data access, we cannot make the APS datasets or the hypergraphs derived from them publicly available, and researchers seeking to work with this data will have to request data access from APS directly as outlined above. However, we make our preprocessing code publicly available, such that researchers who have obtained access to the APS datasets can easily reproduce our hypergraphs from the raw data.

Caveats. When doing our case studies on the aps-cv dataset, we observed that some DOIs present in the edge list had no associated metadata in the JSON files provided by APS. This does not affect our curvature computations, but it might constrain the interpretability of results, e.g., when inspecting node clustering results based on article categories present only in the metadata.

Caveats. For about 0.1% of all records, our XML parser failed, which originally resulted in "None" as one of the authors of all problematic records. We then redid the preprocessing (and all subsequent computations) excluding those records, but the records were still counted when determining the venues to include in dblp-v.

A.3.3 NDC-AI, NDC-PC: DRUGS APPROVED BY THE U.S. FOOD & DRUG ADMINISTRATION

The U.S. Food and Drug Administration (FDA) collects information on all drugs manufactured, prepared, propagated, compounded, or processed by registered drug establishments for commercial distribution in the United States. The FDA maintains the National Drug Code (NDC) Directory, which is updated daily and contains the listed NDC numbers and all information submitted as part of a drug listing. We downloaded the NDC data from https://download.open.fda.gov/drug/ndc/drug-ndc-0001-of-0001.json.zip on August 21, 2022, and transformed it into a CSV file, an example record of which is shown in Table 3. From this CSV file, we derived two hypergraphs. In both hypergraphs, edges correspond to FDA-registered drugs. In ndc-ai, nodes correspond to the active ingredients used in these drugs, and in ndc-pc, nodes correspond to the pharmaceutical classes assigned to these drugs. The edge cardinality distributions resulting from both semantics are shown in Fig. 4.
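The derivation of two hypergraphs from the same records can be sketched as follows. The records, the field names `drug`, `ingredients`, and `classes`, and the function name are hypothetical stand-ins chosen for illustration; they are not the actual NDC schema or the authors' preprocessing code.

```python
from collections import Counter

# Toy records with made-up values; one record per registered drug.
records = [
    {"drug": "drug-1", "ingredients": ["ingredient-a"], "classes": ["class-x", "class-y"]},
    {"drug": "drug-2", "ingredients": ["ingredient-a", "ingredient-b"], "classes": ["class-x"]},
    {"drug": "drug-3", "ingredients": ["ingredient-c"], "classes": []},
]


def derive_hypergraph(records, nodes_key, edge_key="drug"):
    """One edge per record; the edge contains the values listed under nodes_key."""
    return {rec[edge_key]: frozenset(rec[nodes_key]) for rec in records}


ndc_ai = derive_hypergraph(records, "ingredients")  # nodes: active ingredients
ndc_pc = derive_hypergraph(records, "classes")      # nodes: pharmaceutical classes

# Edge cardinality distributions under both semantics.
cardinalities_ai = Counter(len(e) for e in ndc_ai.values())
cardinalities_pc = Counter(len(e) for e in ndc_pc.values())
print(sorted(cardinalities_ai.items()))  # [(1, 2), (2, 1)]
print(sorted(cardinalities_pc.items()))  # [(0, 1), (1, 1), (2, 1)]
```

The same edge set (drugs) thus yields different node sets and different cardinality distributions depending on which field supplies the nodes.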

A.3.4 MUS: MUSIC PIECES

music21 is an open-source Python library for computer-aided musicology that comes with a corpus of public-domain music in symbolic notation. Using the music21 library, we extracted a collection of hypergraphs from the music21 corpus. In this collection, each hypergraph corresponds to a music piece, each edge corresponds to a chord sounding for a specific duration at a particular offset from the start of the piece, and each node corresponds to a sound frequency. Note that hypergraphs in the mus collection are node-aligned, which distinguishes them from the hypergraphs in all other collections. In Table 4, we show the cardinality decomposition of selected music hypergraphs that include the largest edges. There, we include edges of cardinality 0 for completeness (they correspond to pauses in the music), but they are discarded in our curvature computations.

Caveats. When constructing our hypergraph collection from the music21 corpus, we excluded pieces that are primarily monophonic. After exploring the corpus manually and evaluating the chord statistics of individual pieces, we decided to use only music with the following prefixes (corresponding to names of composers or collections): bach, beethoven, chopin, haydn, handel, monteverdi, mozart, palestrina, schumann, schubert, verdi, joplin, trecento, weber. Some pieces are included in several editions (e.g., BWV 190.7, the chorale by Johann Sebastian Bach occupying the first two lines of Table 4, which is included in both the original and an instrumental version).
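The cardinality decomposition described above, including the treatment of pauses, can be illustrated with a small stdlib-only sketch. The event list is made up, pitch names stand in for sound frequencies, and the function names are hypothetical; the extraction from the music21 corpus itself is not shown.

```python
from collections import Counter

# Toy events (offset, duration, pitches); values are invented for illustration.
events = [
    (0.0, 1.0, {"C4", "E4", "G4"}),
    (1.0, 0.5, set()),               # a pause: an edge of cardinality 0
    (1.5, 0.5, {"C4", "E4", "G4"}),  # the same chord at a later offset is a separate edge
    (2.0, 2.0, {"D4", "F4"}),
]


def cardinality_decomposition(events):
    """Number of edges per cardinality, keeping cardinality-0 edges for completeness."""
    return Counter(len(pitches) for _, _, pitches in events)


def curvature_edges(events):
    """Edges retained for curvature computations: empty edges are discarded."""
    return [frozenset(pitches) for _, _, pitches in events if pitches]


print(sorted(cardinality_decomposition(events).items()))  # [(0, 1), (2, 1), (3, 2)]
print(len(curvature_edges(events)))                       # 3
```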

A.3.5 STEX: STACKEXCHANGE SITES

StackExchange is a platform hosting Q&A communities also known as sites. Each question is assigned at least one and at most five tags. In the second half of August 2022, we used the StackExchange API to download all questions asked on all StackExchange sites listed on the StackExchange data explorer (https://data.stackexchange.com/), along with their associated tags and other metadata (including question titles and, for smaller sites, also question bodies). From our downloads, we created the stex hypergraph collection, in which each hypergraph corresponds to a StackExchange site, each edge corresponds to a question asked on a site, and each node corresponds to a tag used at least once on a site. Tables 5 to 11 list the basic statistics for each hypergraph from the stex collection.

Caveats. While our curvature computations uniformly include only questions asked no later than August 15, midnight GMT, the metadata associated with these questions stems from snapshots at different times in the second half of August 2022. We also excluded stackoverflow.com and math.stackexchange.com from our downloads because they could not be downloaded within one day due to API quota limitations, and ru.stackoverflow.com because it was large but we would not have been able to interpret our results.

Table 4: Selection of hypergraphs from the mus collection. n is the number of nodes, m is the number of edges, and the columns labeled i for i ∈ {0, 1, . . . , 12} record the number of edges of cardinality i in the hypergraph. Identifiers correspond to abbreviated music21 identifiers and generally have the shape {composer}-{work identifier}-{suffix}, where o stands for opus, m stands for movement, and inst stands for instrumental.

Here, each hypergraph represents one of Shakespeare's plays, which are categorized into three types: comedy, history, and tragedy. In each hypergraph representing a play, nodes correspond to named characters in the play, and edges correspond to groups of characters simultaneously present on stage. These hypergraphs are documented extensively in the paper introducing the HYPERBARD dataset (Coupette et al., 2022).

A.3.7 SYN-C, SYN-R, SYN-S: SYNTHETIC HYPERGRAPHS

To generate synthetic hypergraphs, we wrote hypergraph generators extending three well-known graph models to hypergraphs.
(i) For syn-c, we extended the configuration model, which, for undirected graphs, is specified by a degree sequence. Our hypergraph configuration model is specified by a node degree sequence and an edge cardinality sequence.
(ii) For syn-r, we extended the Erdős-Rényi random graph model, which, for undirected graphs, is specified by a number of nodes $n$ and an edge existence probability $p$. Our Erdős-Rényi random hypergraph model is specified by a number of nodes $n$, a number of edges $m$, and the probability $p$ of a one in any cell of the node-to-edge incidence matrix.
(iii) For syn-s, we extended the stochastic block model which, for undirected graphs, is specified by a vector of $c$ community sizes and a $c \times c$ affinity matrix specifying affiliation probabilities between communities. Our hypergraph stochastic block model is specified by a vector of $c_V$ node community sizes, a vector of $c_E$ edge community sizes, and a $c_V \times c_E$ affinity matrix specifying affiliation probabilities between node communities and edge communities.
We used each of our generators to create 250 hypergraphs with identical node count $n$, edge count $m$, and density $c/nm$, where $c$ is the number of filled cells in the node-to-edge incidence matrix.

Caveats. Our generators work by pairing node and edge indices, and duplicated (node, edge) index pairs are discarded to generate simple hypergraphs, which can lead to small deviations from the input specification in practice.

A.5 FURTHER RESULTS

Here, we showcase further results to support and supplement the exposition in the main paper.

Q1 Parametrization. Expanding the discussion on ORCHID parametrizations, Fig. 5 shows the distributions of edge curvatures and edge-averaged node curvatures for two hypergraphs from the dblp-v collection, representing top conferences in machine learning and theoretical computer science, respectively. The figure highlights once more the consistently concentrating effect of increasing $\alpha$, and it elucidates the differential effects of moving from maximum aggregation (left parts of the split violins) to mean aggregation (right parts of the split violins), from almost no shifts to large shifts in probability mass (compare, e.g., Fig. 5b, top right panel, with Fig. 5b, bottom left panel). Fig. 5 might convey the impression that, other parameters being equal, the distributions of curvatures based on $\mu^{EN}$ and $\mu^{WE}$ are more similar to each other than to $\mu^{EE}$. This does not hold in general, however, as demonstrated for ndc-pc in Fig. 6a, where node curvature distributions based on $\mu^{WE}$ are more similar to those based on $\mu^{EE}$ than to the node curvature distributions based on $\mu^{EN}$. Comparing Fig. 6a to Fig. 6b (ndc-ai), we further observe that rather similar distributions of edge curvature and directional curvature can be accompanied by rather different distributions of edge-averaged and direction-averaged node curvatures, even for hypergraphs originating from the same domain. Finally, when visualizing curvatures for hypergraphs in the same collection or across collections with related semantics (Fig. 7), we can identify several distinct prototypical shapes of curvature distributions and relationships between curvatures based on different probability measures.

and each edge $i$ comprises the nodes $j$ cited by the paper corresponding to $i$. Therefore, the edge curvature of a (citing) paper $i$ can be interpreted as an indicator of its breadth of content: The more positive the edge curvature, the stronger the general tendency of the papers jointly cited by paper $i$ to be cited together, suggesting that these papers are topically related. Similarly, the node curvature of a (cited) paper $j$ can be interpreted as an indicator of its breadth of impact: The more negative the node curvature, the more diversely the paper has been cited in the literature.
With these interpretations in mind, we compute all curvatures for the PRE citation hypergraph, using $\alpha = 0.1$, $\mu^{WE}$, and $\mathrm{AGG}_A$. We find that for all 54 articles with at least 100 citations (top articles), the edge-averaged node curvature is larger than the direction-averaged node curvature, which is always negative, although only 36% of all PRE articles exhibit this feature combination. This matches the intuition that from highly cited articles, the literature should diverge in many different directions. At the same time, we observe that curvatures span a considerable range, even among top articles. In Table 12, we record the top articles with extreme curvature values, and in Fig. 8, we display the pairwise relationships between curvature features and other local features for all PRE articles. In line with the interpretations sketched above, the top article with the largest node curvatures is a classic reference for community detection in the highly integrated field of network science, whereas the articles with the smallest node curvatures address topics relevant to a broader range of approaches to collective phenomena in many-body systems (which are the focus of PRE).

Table 13: ORCHID features lead to node clusterings that are semantically more coherent than node clusterings derived from other local features. We consider two clusterings of the PRE citation hypergraph from the aps-cv collection: a spectral clustering using the sign of directional curvatures as a feature (Table 13a), and a clustering using an RBF kernel with node neighborhood size as a feature (Table 13b). For each clustering, we show the top terms, i.e., the terms associated with each cluster that have a TF-IDF score of at least 0.1, along with their TF-IDF scores and their occurrence frequency across all clusters, in tuples of shape (term, TF-IDF score, global occurrence frequency).
(a) Feature: sign of directional ORCHID curvatures (one cluster per line)

(smectic, 0.51, 1), (liquid, 0.39, 4), (crystals, 0.22, 4), (antiferroelectric, 0.21, 1), (crystal, 0.19, 2), (phase, 0.17, 4), (chiral, 0.17, 1), (cα, 0.15, 1), (paper, 0.15, 1), (rock, 0.15, 1), (scissors, 0.15, 1), (electric, 0.14, 1), (phases, 0.14, 2), (ray, 0.13, 1), (cyclic, 0.13, 1), (species, 0.12, 1), (field, 0.11, 3), (games, 0.1, 2)
(resetting, 0.76, 1), (stochastic, 0.32, 1), (random, 0.24, 2), (walks, 0.18, 1), (diffusion, 0.17, 2), (brownian, 0.15, 1), (processes, 0.11, 1)
(nematic, 0.66, 2), (liquid, 0.41, 4), (crystal, 0.3, 2), (colloidal, 0.26, 1), (colloids, 0.18, 1), (crystals, 0.16, 4), (particles, 0.15, 1), (interaction, 0.14, 1)
(boltzmann, 0.75, 1), (lattice, 0.51, 1), (method, 0.2, 1), (flows, 0.15, 1), (model, 0.11, 5)
(quantum, 0.58, 3), (heat, 0.38, 1), (engine, 0.34, 1), (engines, 0.27, 1), (efficiency, 0.24, 1), (performance, 0.21, 1), (power, 0.17, 1), (maximum, 0.17, 1), (otto, 0.12, 1), (carnot, 0.12, 1), (refrigerators, 0.1, 1)
(granular, 0.85, 2), (gas, 0.17, 1), (gases, 0.16, 1), (inelastic, 0.13, 1), (driven, 0.13, 1)
(chimera, 0.7, 1), (states, 0.35, 1), (oscillators, 0.33, 1), (coupled, 0.31, 2), (networks, 0.2, 3), (nonlocally, 0.13, 1), (chimeras, 0.12, 1), (coupling, 0.1, 1)
(dynamics, 0.19, 1), (model, 0.18, 5), (networks, 0.17, 3), (liquid, 0.16, 4), (diffusion, 0.13, 2), (phase, 0.13, 4), (quantum, 0.13, 3), (dimensional, 0.12, 1), (random, 0.12, 2), (flow, 0.11, 2), (systems, 0.11, 1), (plasma, 0.11, 1), (coupled, 0.1, 2), (time, 0.1, 1)
(dynamic, 0.41, 1), (ising, 0.35, 2), (phase, 0.34, 4), (oscillating, 0.34, 1), (field, 0.32, 3), (transition, 0.24, 1), (kinetic, 0.2, 1), (model, 0.2, 5), (magnetic, 0.15, 1), (nonequilibrium, 0.13, 1), (blume, 0.12, 1), (capel, 0.12, 1), (transitions, 0.11, 1)
(biaxial, 0.53, 1), (nematic, 0.5, 2), (liquid, 0.29, 4), (crystals, 0.19, 4), (phases, 0.19, 2), (bent, 0.17, 1), (phase, 0.16, 4), (molecules, 0.15, 1), (core, 0.14, 1), (simulation, 0.12, 1), (molecular, 0.1, 1), (antinematic, 0.1, 1), (mesogenic, 0.1, 1)
(passive, 0.47, 1), (scalar, 0.41, 1), (anomalous, 0.39, 1), (scaling, 0.29, 1), (advected, 0.24, 1), (turbulence, 0.22, 1), (turbulent, 0.18, 1), (advection, 0.15, 1), (loop, 0.12, 1), (anisotropy, 0.11, 1), (anisotropic, 0.11, 1), (renormalization, 0.11, 1), (vector, 0.11, 1), (field, 0.1, 3)
(quantum, 0.51, 3), (decay, 0.45, 1), (loschmidt, 0.33, 1), (echo, 0.33, 1), (fidelity, 0.25, 1), (chaotic, 0.23, 1), (semiclassical, 0.18, 1), (lyapunov, 0.13, 1), (perturbations, 0.11, 1)
(casimir, 0.69, 1), (critical, 0.37, 1), (forces, 0.27, 1), (films, 0.13, 1), (size, 0.13, 1), (force, 0.13, 1), (finite, 0.12, 1), (free, 0.11, 1), (ising, 0.11, 2), (thermodynamic, 0.1, 1), (model, 0.1, 5)
(traffic, 0.88, 1), (flow, 0.3, 2), (model, 0.13, 5), (car, 0.13, 1), (following, 0.11, 1)
(rogue, 0.62, 1), (schrödinger, 0.34, 1), (waves, 0.31, 2), (wave, 0.29, 2), (equation, 0.25, 1), (nonlinear, 0.21, 2), (solutions, 0.17, 1), (soliton, 0.12, 1), (solitons, 0.11, 1)
(cooperation, 0.6, 1), (dilemma, 0.38, 1), (prisoner, 0.34, 1), (game, 0.25, 1), (games, 0.24, 2), (evolutionary, 0.19, 1), (networks, 0.18, 3), (spatial, 0.17, 1), (social, 0.14, 1), (public, 0.12, 1), (goods, 0.1, 1)
(granular, 0.59, 2), (chains, 0.36, 1), (chain, 0.32, 1), (propagation, 0.22, 1), (waves, 0.21, 2), (nonlinear, 0.2, 2), (solitary, 0.2, 1), (wave, 0.17, 2), (pulse, 0.15, 1), (crystals, 0.14, 4), (strongly, 0.12, 1)



https://doi.org/10.5281/zenodo.7624573



Figure 1: ORCHID's probability measures are based on random walks, depicted for the neighborhood of a node 0. Arrows outgoing from the same node or edge are traversed with uniform probability.

Figure 2: ORCHID curvature notions are non-redundant. We show the Min-Max-Normalized Mutual Information (NMI) between ORCHID edge curvatures with 36 different parametrizations, using probability measures $\mu^{EN}$ (EN), $\mu^{EE}$ (EE), or $\mu^{WE}$ (WE), aggregations $\mathrm{AGG}_M$ (M) or $\mathrm{AGG}_A$ (A), and $\alpha \in \{0.0, 0.1, 0.2, 0.3, 0.4, 0.5\}$ (ordered →, ↓), for two synthetic and two real-world hypergraphs.

(a) kPCA (directional curvature) (b) kPCA (edge neighborhood size) (c) MMD (cardinality vs. curvature)

Figure 4: Edge cardinality distributions for hypergraphs derived from NDC data.

A.3.6 SHA: SHAKESPEARE'S PLAYS

The sha collection is a subset of the HYPERBARD dataset recently introduced by Coupette et al. (2022), based on the TEI-encoded XML files of Shakespeare's plays provided by Folger Digital Texts.

invariant motion in intermittent chaotic systems min κ N (i) 10.1103/PhysRevE.48.R29 -0.241216 -0.704752 0.463536 0 Extended self-similarity in turbulent flows max ∆(κ(i)) 10.1103/PhysRevE.64.056101 -0.131542 -0.668266 0.536724 0.038477 Determining the density of states for classical statistical models: A random walk algorithm to produce a flat histogram min ∆(κ(i)) 10.1103/PhysRevE.74.016118 -0.015495 -0.191193 0.175697 -0.156824 Amorphous systems in athermal, quasistatic shear max κdefects and interactions in nematic emulsions min κ(e) 10.1103/PhysRevE.64.016706 -0.191094 -0.552908 0.361815 -0.644446 Fast Monte Carlo algorithm for site or bond percolation

Figure 8: Highly cited articles have distinct curvature distributions. Pairwise relationships between (left-to-right, top-to-bottom) node neighborhood size, edge-averaged node curvature, direction-averaged node curvature, curvature delta, node expansion := $\deg(i)/|\mathcal{N}(i)|$, edge cardinality, edge neighborhood size, edge curvature, edge expansion := $\deg(e)/|\mathcal{N}(e)|$, and (as an additional metadata feature) publication year, for all PRE articles cited at least once by another PRE article, colored by node degree (number of citations within PRE), where brighter colors signal larger node degrees.

Hypergraphs used in ORCHID experiments cover several domains and orders of magnitude. $n$ and $m$ are node and edge counts, $n/m$ is the aspect ratio, $c$ is the number of filled cells in the node-to-edge incidence matrix, $c/nm$ is the density, and $N$ is the number of hypergraphs in a collection.

Notably, the directed hypergraph ORC introduced by Eidi & Jost (2020) is an instantiation of our framework with $\mu^{EE}$ and $\mathrm{AGG}_A$. Curvature notions for undirected hypergraphs are comparatively less explored, and especially the literature generalizing ORC is almost entirely theoretical. The generalization of ORC proposed by Asoodeh et al. (2018) and the equivalent measure used by Banerjee (2021) are instantiations of our framework using $\mu^{EE}$ and $\mathrm{AGG}_B$. Akamatsu (2022) proposes $(\alpha, h)$-ORC using cost functions based on structured optimal transport, and Ikeda et al. (2021) define $\lambda$-coarse Ricci curvature using a $\lambda$-nonlinear Kantorovich difference based on a submodular hypergraph Laplacian as a generalization of ORC as introduced by Lin et al. (2011). Both of these works define curvature exclusively for pairs of nodes, rather than for hyperedges. Beyond ORC, Yadav et al. (2022) study FRC for undirected hypergraphs defined via poset representations, and Murgas et al. (2022) explore hypergraphs constructed from protein-protein interactions using a different notion of FRC based on the Hodge Laplacian. To the best of our knowledge, with ORCHID, we are the first to introduce a flexible framework generalizing ORC to hypergraphs, and to demonstrate the utility of hypergraph ORC in practice.

Example record from the data underlying the ndc-ai and ndc-pc hypergraphs.

Basic statistics of hypergraphs derived from StackExchange sites. n is the number of nodes, m is the number of edges, and columns labeled i ∈ [5] count edges of cardinality i.

Basic statistics of hypergraphs derived from StackExchange sites (continued). n is the number of nodes, m is the number of edges, and columns labeled i ∈ [5] count edges of cardinality i.


Q2 Hypergraph Exploration. Extending the discussion of individual hypergraph exploration in the main paper, we focus on a case study of the citation hypergraph of the journal Physical Review E (PRE), which regularly publishes, inter alia, interdisciplinary work on graphs and networks. In this hypergraph, which has 45 504 nodes and 52 574 edges, nodes represent PRE articles cited by at least one other PRE article, edges represent PRE articles citing at least one other PRE article,

Top articles display varying relationships between different curvature values. We list the PRE articles that, out of all PRE articles cited at least 100 times, exhibit the most extreme curvature-related values.


Published as a conference paper at ICLR 2023

[Figure: comparison of clusterings. Both axes list the same clustering configurations: SPONGE-unweighted-100-signedadj-17, SPONGE_sym-unweighted-100-signedadj-17, curvature-100-dir_curvature_negative-17, curvature-100-dir_curvature_positive-17, rbfkernel-100-adj-node_curvature_edges-17, rbfkernel-100-adj-node_curvature_neighborhood-17, rbfkernel-100-node_curvature_edges-17, rbfkernel-100-node_curvature_neighborhood-17, rbfkernel-100-node_degree-17, rbfkernel-100-node_neighborhood_size-17, rbfkernel-200-node_degree-17, rbfkernel-200-node_neighborhood_size-17, rbfkernel-50-node_degree-17, rbfkernel-50-node_neighborhood_size-17, rbfkernel-STD-adj-node_curvature_edges-17, rbfkernel-STD-adj-node_curvature_neighborhood-17, rbfkernel-STD-node_curvature_edges-17, rbfkernel-STD-node_curvature_neighborhood-17, rbfkernel-STD-node_degree-17, rbfkernel-STD-node_neighborhood_size-17, spectral_cluster_adjacency_reg-unweighted-100-signedadj-17, spectral_cluster_bnc-unweighted-100-signedadj-17, spectral_cluster_laplacian-unweighted-100-signedadj-17, unweighted-100-adj-17, wassersteinkernel-100-W-17, wassersteinkernel-200-W-17, wassersteinkernel-50-W-17, wassersteinkernel-STD-W-17.]

A.3.2 DBLP, DBLP-V: DBLP JOURNAL ARTICLES AND CONFERENCE PROCEEDINGS

The DBLP computer science library provides high-quality bibliographic information on computer science publications. All DBLP data is released under a CC0 license and freely available in one XML file that is updated regularly. We obtained the XML dump dated September 1, 2022 from https://dblp.org/xml/release/ and preprocessed it into a CSV file containing only entries corresponding to the XML tags article and inproceedings, with one row per entry and the following columns:
- key: unique identifier of the entry, e.g., conf/iclr/XuHLJ19 or journals/cacm/Savage16c.
- tag: XML tag associated with the entry, one of {inproceedings, article}.
- crossref: cross-reference to a venue, e.g., conf/iclr/2019. Sometimes missing although a venue should be present.
- author: semicolon-separated list of DBLP author names, e.g., Keyulu Xu;Weihua Hu;Jure Leskovec;Stefanie Jegelka. Sometimes missing (we discard entries without authors when loading the data).
- year: entry publication year, e.g., 2019.
- title: entry title, e.g., How Powerful are Graph Neural Networks?.
- publtype: if present, the type of publication, e.g., informal. Mostly missing.
- journal: for article entries, the name of the publishing journal, e.g., Commun. ACM.
- booktitle: for inproceedings entries, the name of the publishing venue, e.g., ICLR.
- volume: if present, the publication volume, e.g., 59.
- number: if present, the publication number, e.g., 7.
- pages: if present, the entry pages, e.g., 12-14.
- mdate: modification date, e.g., 2019-07-25.

This constitutes our individual hypergraph dblp, in which each edge represents a paper, and each node represents an author. From this hypergraph, we additionally derived the dblp-v hypergraph collection, which contains different subsets of dblp by venue or group of venues.
More precisely, we distinguish 1 193 hypergraphs as follows:
(i) dblp_journal-all, dblp_inproceedings-all: partition of dblp into entries published in journals and entries published as part of proceedings.
(ii) dblp_journal-{journal}: one hypergraph per journal, for all journals with at least 1 000 articles in the DBLP dataset.
(iii) dblp_proceedings-{venue}: one hypergraph per venue (grouped by booktitle), for all venues with at least 1 000 papers in the DBLP dataset.
(
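The preprocessing described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the sample rows are made up, only a subset of the listed columns is used, the function names are hypothetical, and the size threshold is lowered from 1 000 to 2 so that the toy data exercises the filter.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the column layout described above (not real DBLP data).
SAMPLE_CSV = """key,tag,author,journal,booktitle
journals/x/A1,article,Ann;Bob,J. X,
journals/x/B2,article,Bob;Cy;Dee,J. X,
conf/y/C3,inproceedings,Ann;Dee,,Conf Y
conf/y/D4,inproceedings,,,Conf Y
"""


def load_rows(csv_text):
    """Parse the CSV and discard entries without authors."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        authors = frozenset(a for a in row["author"].split(";") if a)
        if authors:
            rows.append({**row, "authors": authors})
    return rows


def build_hypergraph(rows):
    """Map each paper key to its hyperedge, i.e., its set of author nodes."""
    return {row["key"]: row["authors"] for row in rows}


def split_by_venue(rows, min_entries):
    """Group hyperedges by journal or booktitle and keep large venues only."""
    groups = defaultdict(dict)
    for row in rows:
        if row["tag"] == "article":
            name = "dblp_journal-" + row["journal"]
        else:
            name = "dblp_proceedings-" + row["booktitle"]
        groups[name][row["key"]] = row["authors"]
    return {n: g for n, g in groups.items() if len(g) >= min_entries}


rows = load_rows(SAMPLE_CSV)
dblp = build_hypergraph(rows)            # edges: papers, nodes: authors
nodes = set().union(*dblp.values())
by_venue = split_by_venue(rows, min_entries=2)  # the text uses 1 000
```

On the toy data, the entry without authors is dropped, and only the venue with at least two remaining entries survives the threshold.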

A.4 IMPLEMENTATION DETAILS

To simplify the computation of Wasserstein distances between adjacent nodes, we leverage the following fact about the relevant distances (i.e., transportation costs) between nodes.

Lemma 1. Given a hypergraph H = (V, E) and nodes i, j, k, ℓ ∈ V with i ∼ j as well as µ_i(k) > 0 and µ_j(ℓ) > 0, d(k, ℓ) ≤ 3.

Proof. By the triangle inequality and the definition of our probability measures, which place mass only on a node and its neighbors, we have d(k, ℓ) ≤ d(k, i) + d(i, j) + d(j, ℓ) ≤ 1 + 1 + 1 = 3.

Furthermore, we speed up the computation of Wasserstein distances by exploiting the following observation to reduce each instance to its smallest equivalent instance.

Lemma 2. Given a hypergraph H = (V, E) and nodes i, j [...]

In this case, let C* be an optimal coupling between µ_i and µ_j. If the probability mass allocated to k by µ_i does not get moved at all in C*, it contributes 0 to W_1(µ_i, µ_j), and we are done. Therefore, assume otherwise. Then there exist nodes p, q ∈ V such that probability mass gets moved from p to k and from k to q in C*. By the triangle inequality, d(p, q) ≤ d(p, k) + d(k, q), and as d(k, k) = 0, the cost of moving that mass directly from p to q and keeping all mass at k cannot be larger than the cost of moving the mass from p to k and from k to q. Hence, we can modify C* such that the mass allocated to k by µ_i does not get moved at all without increasing the coupling cost. Thus, there always exists an optimal coupling in which all mass at k remains at k, and the claim follows.

Q3 Hypergraph Learning. Continuing the discussion of node clustering in hypergraphs abridged in the main paper, we again focus on the citation hypergraph corresponding to articles from Physical Review E (PRE). We experiment with a variety of features, clustering methods, and combinations thereof, including both classic and recent clustering methods, such as SPONGE (Cucuringu et al., 2019).
We aim for 17 clusters, which is the number of "disciplines" present in the APS metadata (unfortunately, disciplines are only assigned to more recent articles, and hence, cannot serve as ground truth). As depicted in Fig. 9, we find that clusterings generated using curvatures as features differ radically from clusterings generated using other local features. To evaluate the semantic sensibility of our clusterings in the absence of a suitable ground truth, we leverage the metadata associated with PRE articles. In particular, we concatenate the titles of the articles grouped in each of our clusters into "documents", and consider the set of all clusters as our "document collection", to then identify characteristic terms for each cluster using TF-IDF feature extraction. We observe that clusterings based on ORCHID features tend to be more thematically coherent than clusterings based on other local features. As illustrated in Table 13, ORCHID features tend to separate paper titles well by topic (many frequently occurring terms are associated with only very few clusters, and the terms grouped together characterize specific subfields of the physics of collective phenomena covered by PRE), whereas clusters based on non-ORCHID features are much less topically focused.
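The TF-IDF evaluation described above can be illustrated with a small sketch. It assumes one common weighting (relative term frequency times log(N / df)), since the exact variant is not specified here, and it uses made-up titles and cluster labels; the function names are hypothetical.

```python
import math
from collections import Counter


def cluster_documents(titles, labels):
    """Concatenate the titles of each cluster into one token list."""
    docs = {}
    for title, label in zip(titles, labels):
        docs.setdefault(label, []).extend(title.lower().split())
    return docs


def characteristic_terms(docs, top_k=2):
    """Rank the terms of each cluster by tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many clusters does each term occur?
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    result = {}
    for label, tokens in docs.items():
        tf = Counter(tokens)
        scores = {
            term: count / len(tokens) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        result[label] = ranked[:top_k]
    return result


# Made-up titles and cluster labels, for illustration only.
titles = [
    "synchronization in coupled oscillators",
    "chaotic oscillators in synchronization",
    "granular flow in networks",
    "jamming in granular flow",
]
labels = [0, 0, 1, 1]
terms = characteristic_terms(cluster_documents(titles, labels))
```

Terms that occur in every cluster, such as "in" here, receive weight zero, so the top-ranked terms of a cluster are those concentrated in it. This mirrors the observation that topically focused clusterings associate frequent terms with only a few clusters.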

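Returning to the two lemmas in A.4, the following sketch checks them on a made-up toy hypergraph. It rests on several assumptions: node distances are shortest-path distances where nodes sharing a hyperedge are at distance 1, the measure is uniform over a node and its neighbors (a stand-in for the measures defined in the paper), and the reduction is read off the proof of Lemma 2 as cancelling mass that both measures place on the same node. The brute-force solver is only meant for tiny inputs, and all function names are hypothetical.

```python
from collections import deque
from fractions import Fraction
from itertools import permutations

EDGES = [{0, 1, 2}, {2, 3}, {3, 4, 5}]  # made-up toy hypergraph
NODES = sorted(set().union(*EDGES))


def neighbors(i):
    """Nodes sharing at least one hyperedge with i (excluding i)."""
    return set().union(*(e for e in EDGES if i in e)) - {i}


def distances_from(source):
    """Breadth-first search; adjacent nodes are at distance 1."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in neighbors(u):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


DIST = {u: distances_from(u) for u in NODES}


def measure(i):
    """Assumed measure: uniform over i and its neighbors."""
    support = neighbors(i) | {i}
    return {k: Fraction(1, len(support)) for k in support}


def cancel_shared_mass(mu, nu):
    """Remove the mass that both measures place on the same node."""
    shared = {k: min(m, nu.get(k, 0)) for k, m in mu.items()}
    mu_red = {k: m - shared[k] for k, m in mu.items() if m > shared[k]}
    nu_red = {k: m - shared.get(k, 0) for k, m in nu.items()
              if m > shared.get(k, 0)}
    return mu_red, nu_red


def w1_bruteforce(mu, nu, unit):
    """Exact W_1 for tiny inputs whose masses are multiples of `unit`."""
    src = [k for k, m in mu.items() for _ in range(int(m / unit))]
    dst = [k for k, m in nu.items() for _ in range(int(m / unit))]
    best = min(sum(DIST[a][b] for a, b in zip(src, perm))
               for perm in permutations(dst))
    return best * unit


# Lemma 1: largest transportation cost over all adjacent pairs i ~ j.
worst = max(DIST[k][l]
            for i in NODES for j in neighbors(i)
            for k in measure(i) for l in measure(j))

# Lemma 2 (as assumed above): cancelling shared mass keeps W_1 unchanged.
mu, nu = measure(2), measure(3)
mu_red, nu_red = cancel_shared_mass(mu, nu)
unit = Fraction(1, 4)
full = w1_bruteforce(mu, nu, unit)
reduced = w1_bruteforce(mu_red, nu_red, unit)
```

On this example, the largest cost between the supports of adjacent nodes is exactly 3, and the reduced instance, which moves mass only from nodes 0 and 1 to nodes 4 and 5, has the same Wasserstein distance as the full instance while involving half as many support points per measure.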
