

Abstract

In this paper, we introduce InstantEmbedding, an efficient method for generating single-node representations using local PageRank computations. We theoretically prove that our approach produces globally consistent representations in sublinear time. We demonstrate this empirically through extensive experiments on real-world datasets with over a billion edges. Our experiments confirm that InstantEmbedding requires drastically less computation time (over 9,000 times faster) and memory (over 8,000 times less) to produce a single node's embedding than traditional methods, including DeepWalk, node2vec, VERSE, and FastRP. We also show that our method produces high-quality representations, with results that meet or exceed the state of the art for unsupervised representation learning on tasks such as node classification and link prediction.

1. Introduction

Graphs are widely used to represent data in which objects are connected to each other, such as social networks, chemical molecules, and knowledge graphs. A widely used approach to dealing with graphs is learning compact representations of them (Perozzi et al., 2014; Grover & Leskovec, 2016; Abu-El-Haija et al., 2018), i.e. a d-dimensional embedding vector for each node in a given graph. Unsupervised embeddings in particular have shown improvements in many downstream machine learning tasks, such as visualization (Maaten & Hinton, 2008), node classification (Perozzi et al., 2014), and link prediction (Abu-El-Haija et al., 2018). Importantly, since such embeddings are learned solely from the structure of the graph, they can be used across multiple tasks and applications.

However, graph embedding models typically assume that the graph data fits in memory (Perozzi et al., 2014) and require representations to be generated for all nodes. In many real-world applications, graph data is not only large but also scarcely annotated. For example, in the Friendster social graph (Yang & Leskovec, 2015), only 30% of the 65M nodes are assigned to a community. At the same time, many applications of graph embeddings, such as classifying a single data item, require only the representation of the item itself and possibly the representations of labeled nodes. Computing a full graph embedding is therefore at worst infeasible and at best inefficient.

These observations motivate the problem we study in this paper: the Local Node Embedding problem. In this setting, the embedding for a node is restricted to using only local structural information; it cannot access the representations of other nodes in the graph or rely on trained global model state.
In addition, we require that a local method produce embeddings that are consistent with all other nodes' representations, so that the final representations can be used in the same downstream tasks at which graph embeddings have proved adept in the past.

In this work, we introduce InstantEmbedding, an efficient method for generating globally consistent local node embeddings on the fly in sublinear time. Building on previous work that links embedding learning methods to matrix factorization (Tsitsulin et al., 2018; Qiu et al., 2018), our method uses a high-order similarity matrix based on Personalized PageRank (PPR) as the foundation on which local node embeddings are computed via hashing. We offer theoretical guarantees on the locality of the computation, as well as a proof of the global consistency of the generated embeddings. We show empirically that our method produces high-quality representations on par with state-of-the-art methods, with efficiency several orders of magnitude better in wall-clock time and memory consumption: running 9,000 times faster and using 8,000 times less memory on the largest graphs that contenders can process.

2. Preliminaries & Related Work

2.1 Graph Embedding

Let G = (V, E) represent an unweighted graph with a set of nodes V, |V| = n, and edges E ⊆ (V × V), |E| = m. A graph can also be represented as an adjacency matrix A ∈ {0, 1}^{n×n}, where A_{u,v} = 1 iff (u, v) ∈ E. The task of graph embedding is then to learn a d-dimensional node embedding matrix X ∈ R^{n×d}, where X_v serves as the embedding of any node v ∈ V. We note that d ≪ n, i.e. the learned representations are low-dimensional, and the challenge of graph embedding is to best preserve graph properties (such as node similarities) in this space. Following the formalization in Abu-El-Haija et al. (2018), many graph embedding methods can be thought of as minimizing an objective of the general form: min_X L(f(X), g(A)), where f : R^{n×d} → R^{n×n} is a pairwise distance function on the embedding space, g : R^{n×n} → R^{n×n} is a distance function on the (possibly transformed) adjacency matrix, and L is a loss function over all (u, v) ∈ (V × V) pairs.

A number of graph embedding methods have been proposed. One family of methods simply learns X as a lookup dictionary of embeddings and computes the loss via distance (Kruskal, 1964) or matrix factorization (either implicit (Perozzi et al., 2014; Grover & Leskovec, 2016) or explicit (Ou et al., 2016)). Another line of work leverages the graph structure using neighborhood aggregation (Battaglia et al., 2016; Scarselli et al., 2008) or the Laplacian matrix of the graph (Kipf & Welling, 2016). On attributed structured data, Graph Convolutional Networks (Kipf & Welling, 2016) have been successfully applied to both supervised and unsupervised tasks (Veličković et al., 2018). However, in the absence of node-level features, Duong et al. (2019) demonstrated that these methods do not produce meaningful representations.

Graph Embedding via Random Projection. The computational efficiency brought by advances in random projection (Achlioptas, 2003; Dasgupta et al., 2010) paved the way for its adoption in graph embedding, allowing direct construction of the embedding matrix X. Two recent works, RandNE (Zhang et al., 2018) and FastRP (Chen et al., 2019), iteratively project the adjacency matrix to simulate higher-order interactions between nodes. As we show in the experiments, these methods suffer from high memory requirements and are not always competitive with other methods.
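To make the factorization template min_X L(f(X), g(A)) concrete, consider a simplified illustrative instantiation (not any specific published method): take g(A) = A itself, L the Frobenius norm, and let f produce pairwise scores from a pair of factor matrices, f(X, W) = XW^T. The minimizer is then given in closed form by the truncated SVD (Eckart-Young theorem). The toy graph and all variable names below are our own illustration:

```python
import numpy as np

# Toy 6-node graph: two triangles joined by a bridge edge (2, 3).
n, d = 6, 2                       # n nodes, d-dimensional embeddings, d << n
A = np.zeros((n, n))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

# g(A): the identity transform here; real methods substitute transition-matrix
# powers or (log) co-occurrence counts instead.
M = A

# With L the Frobenius norm, the best rank-d approximation of M is obtained
# by keeping the top-d singular pairs of its SVD.
U, S, Vt = np.linalg.svd(M)
X = U[:, :d] * np.sqrt(S[:d])     # left ("node") embeddings, one row per node
W = Vt[:d].T * np.sqrt(S[:d])     # right ("context") embeddings

err = np.linalg.norm(X @ W.T - M)  # rank-d reconstruction error
```

Methods covered by this template differ mainly in the choice of g: DeepWalk-style methods implicitly factorize a log co-occurrence matrix of random walks (Qiu et al., 2018), while explicit methods factorize high-order proximity matrices directly.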

2.2 Local Algorithms on Graphs

Local algorithms on graphs (Suomela, 2013) solve graph problems without accessing the full graph. A well-studied problem in this space is personalized recommendation (Jeh & Widom, 2003), where users are represented as nodes in a graph and the goal is to recommend items to specific users by leveraging the graph structure. Classic solutions to this problem are Personalized PageRank (Gupta et al., 2013) and Collaborative Filtering (Schafer et al., 2007; He et al., 2017). Interestingly, these methods have recently been applied to graph neural networks (Klicpera et al., 2019; He et al., 2020). We now recall the definition of Personalized PageRank, one of the main ingredients of our embedding algorithm.
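As context for the locality claim, a standard way to approximate a Personalized PageRank vector locally is a "push" procedure that only touches nodes near the seed. The sketch below is a simplified illustrative variant (the function name, tolerance scheme, and toy graph are our own, and this is not the exact InstantEmbedding procedure):

```python
from collections import defaultdict

def approximate_ppr(adj, seed, alpha=0.15, eps=1e-4):
    """Approximate the Personalized PageRank vector of `seed` via local pushes.

    adj   : dict mapping each node to a list of its neighbours
    alpha : teleport (restart) probability
    eps   : per-degree residual tolerance; total work is bounded by
            O(1 / (alpha * eps)) pushes, independent of the graph size
    """
    p = defaultdict(float)   # accumulated PageRank mass
    r = defaultdict(float)   # residual mass not yet pushed
    r[seed] = 1.0
    queue = [seed]
    while queue:
        u = queue.pop()
        deg = len(adj[u])
        if deg == 0:                 # dangling node: absorb its mass (simplification)
            p[u] += r[u]
            r[u] = 0.0
            continue
        if r[u] < eps * deg:         # below tolerance, nothing to push
            continue
        mass, r[u] = r[u], 0.0
        p[u] += alpha * mass         # keep the teleport fraction at u
        share = (1.0 - alpha) * mass / deg
        for v in adj[u]:             # spread the rest over the neighbours
            r[v] += share
            if r[v] >= eps * len(adj[v]):
                queue.append(v)
    return dict(p)

# Toy usage: 4-node graph; only nodes reachable from the seed are touched.
adj = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
ppr = approximate_ppr(adj, seed=0)
```

Each push permanently converts at least alpha * eps of residual mass into PageRank mass, so the number of pushes is bounded by 1 / (alpha * eps) regardless of n — precisely the kind of locality that sublinear single-node embedding methods exploit.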



Table 1: Related work in terms of desirable properties and the computational complexity necessary to generate a single node embedding. Note that all existing methods must generate a full graph embedding, and are thus directly dependent on the total graph size, while our method can solve this task in sublinear time. See the analysis in Section 3.2.1.

