UNVEILING THE SAMPLING DENSITY IN NON-UNIFORM GEOMETRIC GRAPHS

Abstract

A powerful framework for studying graphs is to consider them as geometric graphs: nodes are randomly sampled from an underlying metric space, and any pair of nodes is connected if their distance is less than a specified neighborhood radius. Currently, the literature mostly focuses on uniform sampling and a constant neighborhood radius. However, real-world graphs are likely to be better represented by a model in which the sampling density and the neighborhood radius can both vary over the latent space. For instance, in a social network, communities can be modeled as densely sampled areas, and hubs as nodes with a larger neighborhood radius. In this work, we first perform a rigorous mathematical analysis of this (more general) class of models, including derivations of the resulting graph shift operators. The key insight is that graph shift operators should be corrected in order to avoid potential distortions introduced by the non-uniform sampling. Then, we develop methods to estimate the unknown sampling density in a self-supervised fashion. Finally, we present exemplary applications in which the learned density is used to 1) correct the graph shift operator and improve performance on a variety of tasks, 2) improve pooling, and 3) extract knowledge from networks. Our experimental findings support our theory and provide strong evidence for our model.

1. INTRODUCTION

Graphs are mathematical objects used to represent relationships among entities. Their use is ubiquitous, ranging from social networks to recommender systems, from protein-protein interactions to functional brain networks. Despite their versatility, their non-Euclidean nature makes graphs hard to analyze. For instance, the indexing of the nodes is arbitrary, there is no natural definition of orientation, and neighborhoods can vary in size and topology. Moreover, it is not clear how to compare a general pair of graphs since they can have a different number of nodes. Therefore, the community has developed new ways of thinking about graphs. One approach is proposed in graphon theory (Lovász, 2012): graphs are sampled from continuous graph models called graphons, and any two graphs of any size and topology can be compared using certain metrics defined in the space of graphons. A geometric graph is an important case of a graph sampled from a graphon. In a geometric graph, a set of points is uniformly sampled from a metric-measure space, and every pair of points is linked if their distance is less than a specified neighborhood radius. Therefore, a geometric graph inherits a geometric structure from its latent space that can be leveraged to perform rigorous mathematical analysis and to derive computational methods. Geometric graphs have a long history, dating back to the 1960s (Gilbert, 1961). They have been extensively used to model complex spatial networks (Barthelemy, 2011). One of the first models of geometric graphs is the random geometric graph (Penrose, 2003), where the latent space is a Euclidean unit square. Various generalizations and modifications of this model have been proposed in the literature, such as random rectangular graphs (Estrada & Sheerin, 2015), random spherical graphs (Allen-Perkins, 2018), and random hyperbolic graphs (Krioukov et al., 2010). Geometric graphs are particularly useful since they share properties with real-world networks.
For instance, random hyperbolic graphs are small-world, scale-free, with high clustering (Papadopoulos et al., 2010; Gugelmann et al., 2012). The small-world property asserts that the distance between any two nodes is small, even if the graph is large. The scale-free property is the description of the degree sequence as a heavy-tailed distribution: a small number of nodes have many connections, while the rest have small neighborhoods. These two properties are related to the presence of hubs (nodes with large neighborhoods), while the high clustering is related to the network's community structure. However, standard geometric graph models focus mainly on uniform sampling, which does not describe real-world networks well. For instance, in location-based social networks, the spatial distribution of nodes is rarely uniform because people congregate around the city centers (Cho et al., 2011; Wang & González, 2009). In online communities such as the LiveJournal social network, non-uniformity arises since the probability of befriending a particular person is inversely proportional to the number of closer people (Hu et al., 2011; Liben-Nowell et al., 2005). In a WWW network, there are more pages for popular topics than for obscure ones. In social networks, different demographics (age, gender, ethnicity, etc.) may join a social media platform at different rates. For surface meshes, specific locations may be sampled more finely, depending on the required level of detail. The imbalance caused by non-uniform sampling could affect the analysis and lead to biased results. For instance, Janssen et al. (2016) show that incorrectly assuming a uniform density leads to consistently overestimated node distances, whereas using the (estimated) density gives more accurate results. Therefore, it is essential to assess the sampling density, which is one of the main goals of this paper. Barring a few exceptions, non-uniformity is rarely considered in geometric graphs.
Iyer & Thacker (2012) study a class of non-uniform random geometric graphs where the radii depend on the location. Martínez-Martínez et al. (2022) study non-uniform graphs on the plane with the density functions specified in polar coordinates. Pratt et al. (2018) consider temporal connectivity in finite networks with non-uniform measures. In all of these works, the focus is on (asymptotic) statistical properties of the graphs, such as the average degree and the number of isolated nodes.
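To make the geometric graph construction described above concrete, the following is a minimal illustrative sketch, not the model analyzed in this paper. It samples points from the unit square, links every pair whose Euclidean distance is below a constant radius, and compares the degrees obtained under uniform sampling with those obtained under a non-uniform sampling. The function name geometric_graph, the sample size, the radius, and the Gaussian cluster used to mimic a densely sampled community are all assumptions chosen for illustration.

```python
import numpy as np


def geometric_graph(points, radius):
    """Boolean adjacency matrix linking i and j when ||x_i - x_j|| < radius."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = dist < radius
    np.fill_diagonal(adj, False)  # no self-loops
    return adj


rng = np.random.default_rng(0)
n, radius = 500, 0.1  # illustrative values, not taken from the paper

# Uniform sampling from the unit square (classical random geometric graph).
uniform_pts = rng.uniform(0.0, 1.0, size=(n, 2))

# Non-uniform sampling (assumed for illustration): half of the points form a
# dense Gaussian cluster around the center, the rest are spread uniformly.
cluster = np.clip(rng.normal(0.5, 0.05, size=(n // 2, 2)), 0.0, 1.0)
background = rng.uniform(0.0, 1.0, size=(n - n // 2, 2))
nonuniform_pts = np.vstack([cluster, background])

adj_uniform = geometric_graph(uniform_pts, radius)
adj_nonuniform = geometric_graph(nonuniform_pts, radius)

deg_uniform = adj_uniform.sum(axis=1)
deg_nonuniform = adj_nonuniform.sum(axis=1)

print("uniform     mean/max degree:", deg_uniform.mean(), deg_uniform.max())
print("non-uniform mean/max degree:", deg_nonuniform.mean(), deg_nonuniform.max())
```

With the same radius, nodes inside the dense cluster end up with far more neighbors than nodes in the uniform sample, which is the kind of imbalance that an analysis assuming uniform sampling would overlook.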

1.1. OUR CONTRIBUTION

While traditional Laplacian approximation approaches solve the direct problem (approximating a known continuous Laplacian with a graph Laplacian), in this paper we solve the inverse problem: constructing a graph Laplacian from an observed graph that is guaranteed to approximate an unknown continuous Laplacian. We believe that our approach has high practical significance, since in practical data science on graphs, the graph is typically given, but the underlying continuous model is unknown. To be able to solve this inverse problem, we introduce the non-uniform geometric graph (NuG) model. Unlike the standard geometric graph model, a NuG is generated by a non-uniform sampling density and a non-constant neighborhood radius. In this setting, we propose a class of graph shift operators (GSOs), called non-uniform geometric GSOs, that are computed solely from the topology of the graph and the node/edge features while guaranteeing that these GSOs approximate corresponding latent continuous operators defined on the underlying geometric spaces. Together with Dasoulas et al. (2021) and Sahbi (2021), our work can be listed as a theoretically grounded way to learn the GSO. Justified by formulas grounded in Monte-Carlo analysis, we show how to compensate for the non-uniformity in the sampling when computing non-uniform geometric GSOs. This requires having estimates of both the sampling density and the neighborhood radii. Estimating these by only observing the graph is a hard task. For example, graph quantities like the node degrees are affected by both the density and the radius, and hence, it is hard to decouple the density from the radius by only observing

