QUANTUM-INSPIRED TENSORIZED EMBEDDING WITH APPLICATION TO NODE REPRESENTATION LEARNING

Abstract

Node representation learning, a.k.a. network embedding (NE), is an essential technique for network analysis that represents nodes as vectors. Most existing NE algorithms require O(p) space for each node embedding, where p is the final embedding dimension. Inspired by the vast Hilbert space of quantum systems, we propose a brand-new NE algorithm, node2ket, and its variant node2ket+, which imitate the behavior of quantum systems. Theoretically, we analyze how our approach unifies existing embedding methods, both conventional and tensorized, and prove that our methods use O(p^(1/ln p) ln p) or even less space for each node embedding. To our knowledge, we are the first to successfully conduct node embedding in a super-large Hilbert space. Experiments are conducted on five public real-world networks, where methods are evaluated on the tasks of network reconstruction, link prediction, and node classification. On BlogCatalog, our method outperforms all baselines with 1/32 of the training parameters and 1/16 of the running time on the same machine. On DBLP, the reconstruction precision of node2ket is 3 times higher than that of the best baseline, LouvainNE. The source code will be made publicly available.

"Hilbert space is a big place!" -- Carlton Morris Caves

1. INTRODUCTION

Node representation learning, also known as network embedding (NE), has been well studied for decades. Borrowing ideas from word embedding in NLP, DeepWalk (Perozzi et al., 2014) first proposed to learn node embeddings for downstream tasks by feeding random-walk sequences to word2vec (Mikolov et al., 2013). Afterward, numerous NE methods were proposed for different types of networks (e.g. heterogeneous/multiplex networks) and for different purposes (e.g. preserving structural similarity); see Sec. 2 for more. Despite this great success, existing NE methods still suffer from the high space complexity of storing/training the embeddings, whose size grows linearly with the product of the number of nodes and the embedding dimension (Xiong et al., 2022) (see Appendix A for details). As a result, it is difficult to deploy these methods on memory-limited devices for privacy-sensitive computing, e.g. recommender systems for social networks on mobile phones. Meanwhile, the expressiveness of the embedding vectors can be restricted. Recently, inspired by quantum mechanics, word2ket (Panahi et al., 2020) was proposed for compressive word embedding by imitating the behavior of quantum bits in quantum systems. Through tensor entanglement, word2ket achieves both a high compression ratio and experimental results comparable to those of conventional embeddings. Though it provides many insights into generating embeddings, word2ket still faces difficulties when transferred to embedding problems in other fields such as node representation learning (Hamilton et al., 2017b). These difficulties include: i) word2ket generates embeddings from a fixed complete binary tree, which neglects the latent structure of the input data and hence poses challenges for learning embeddings of data with explicit structure (e.g. networks).
ii) word2ket adopts deep neural networks to decode the embeddings and is trained in a supervised way, which differs from many embedding methods (Tang et al., 2015; Grover & Leskovec, 2016) that decode embeddings via a simple inner-product operation and train them by optimizing an objective based on Noise Contrastive Estimation (NCE) (Gutmann & Hyvärinen, 2010) in a self-supervised (contrastive-learning) style.
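The tensor-entanglement idea behind word2ket-style compression can be sketched as follows: a long embedding is assembled from Kronecker products of small trainable "core" vectors, so the stored parameter count grows with the sum of the core sizes rather than with their product. This is a minimal NumPy sketch under illustrative assumptions (function name, core shapes, and the single-product form are ours, not the authors' implementation; word2ket additionally sums several such products):

```python
import numpy as np

def tensor_product_embedding(cores):
    """Combine small core vectors into one long vector via
    successive Kronecker (tensor) products."""
    v = cores[0]
    for c in cores[1:]:
        v = np.kron(v, c)
    return v

rng = np.random.default_rng(0)
order, d = 4, 4                        # 4 trainable cores, each of dimension 4
cores = [rng.standard_normal(d) for _ in range(order)]

emb = tensor_product_embedding(cores)
print(emb.shape)                       # (256,): effective dimension p = d**order
print(sum(c.size for c in cores))      # 16 stored parameters instead of 256
```

The compression comes from storing only the 16 core entries while decoding into a 256-dimensional vector on the fly; stacking sums of such products recovers more expressive embeddings at a modest parameter cost.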
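The inner-product-plus-NCE training style mentioned in point ii) can be illustrated with a toy negative-sampling objective in the spirit of LINE/node2vec: the score of a node pair is their embeddings' inner product, pushed up for observed edges and down for sampled non-edges. All vectors below are random placeholders, and the function names are ours:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nce_loss(u, v_pos, v_negs):
    """Negative-sampling objective:
    -log sigma(u . v_pos) - sum_k log sigma(-(u . v_k))."""
    loss = -np.log(sigmoid(u @ v_pos))          # pull the linked pair together
    for v_neg in v_negs:
        loss -= np.log(sigmoid(-(u @ v_neg)))   # push sampled noise nodes away
    return loss

rng = np.random.default_rng(1)
u = rng.standard_normal(8)
v_pos = u + 0.1 * rng.standard_normal(8)        # a "linked" node: similar to u
v_negs = [rng.standard_normal(8) for _ in range(5)]
print(nce_loss(u, v_pos, v_negs) > 0)           # prints True: the loss is strictly positive
```

Because decoding is just an inner product, no decoder network is needed at inference time, which is the key contrast with word2ket's supervised, neural-decoder setup.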

