QUANTUM-INSPIRED TENSORIZED EMBEDDING WITH APPLICATION TO NODE REPRESENTATION LEARNING

Abstract

Node representation learning, a.k.a. network embedding (NE), is an essential technique for network analysis that represents nodes as vectors. Most existing NE algorithms require O(p) space for each node embedding, where p is the final embedding vector size. Inspired by the large Hilbert space of quantum systems, we propose a new NE algorithm, node2ket, and its variant node2ket+, which imitate the behavior of quantum systems. Theoretically, we analyze how our approach unifies existing embedding methods, both conventional and tensorized, and prove that our methods use O(p^(1/ln p) ln p) or even less space for each node embedding. We are the first to successfully conduct node embedding in a super-large Hilbert space. Experiments are conducted on five public real-world networks, where methods are evaluated on the tasks of network reconstruction, link prediction, and node classification. On BlogCatalog, our method outperforms all baselines with 1/32 of the training parameters and 1/16 of the running time on the same machine. On DBLP, the reconstruction precision of node2ket is 3 times higher than that of the best baseline, LouvainNE. The source code will be publicly available.

"Hilbert space is a big place!" - Carlton Morris Caves
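To give intuition for the space bound above: since p^(1/ln p) equals Euler's number e for any p > 1, the bound O(p^(1/ln p) ln p) is in fact logarithmic in the target dimension p, versus O(p) for conventional embeddings. A quick numeric check (the choice p = 2^32 is illustrative, matching the Hilbert-space dimension used later in the paper):

```python
import math

# p^(1/ln p) = e for every p > 1, so the per-node space bound
# O(p^(1/ln p) * ln p) = O(e * ln p) grows logarithmically in p.
p = 2 ** 32                        # an illustrative super-large dimension
base = p ** (1.0 / math.log(p))    # evaluates to e regardless of p
print(round(base, 4))              # 2.7183
print(base * math.log(p))          # ~60.3 parameters vs p = 4294967296
```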

1. INTRODUCTION

Node representation learning, also known as network embedding (NE), has been well studied for decades. Borrowing ideas from word embedding in NLP, DeepWalk (Perozzi et al., 2014) first proposed to learn node embeddings for downstream tasks by feeding random-walk sequences to word2vec (Mikolov et al., 2013). Afterward, numerous NE methods were proposed for different types of networks (e.g. heterogeneous/multiplex networks) and for different purposes (e.g. preserving structural similarity) (see Sec. 2 for more). Despite this great success, existing NE methods still suffer from the high space complexity of storing/training the embeddings, whose size grows linearly with the product of the number of nodes and the embedding dimension (Xiong et al., 2022) (see Appendix A for details). As a result, it is difficult to deploy these methods on memory-limited devices for privacy-sensitive computing, e.g. recommender systems in social networks on mobile phones. Meanwhile, the expressiveness of the embedding vectors can be restricted. Recently, inspired by quantum mechanics, word2ket (Panahi et al., 2020) was proposed for compressive word embedding by imitating the behavior of quantum bits in quantum systems. Through tensor entanglement, word2ket achieves both a high compression ratio and experimental results comparable to conventional embeddings. Though it provides many insights into embedding generation, word2ket still faces difficulties when transferred to embedding problems in other fields such as node representation learning (Hamilton et al., 2017b). These difficulties include: i) Word2ket generates embeddings from a fixed complete binary tree, which neglects the latent structure of the input data and hence makes it hard to learn embeddings for data with explicit structure (e.g. networks).
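The core compression idea behind word2ket-style embeddings can be sketched as follows: a p-dimensional embedding is represented as a sum of r terms, each the tensor (Kronecker) product of n small vectors, so the stored parameter count is r*n*d instead of d^n. This is an illustrative sketch only; the names r, n, d and the function below are our own, not the paper's API.

```python
import numpy as np

def tensor_product_embedding(units):
    """Combine small tensor-unit vectors into one high-dimensional
    embedding via Kronecker (tensor) products, word2ket-style.

    units: list of r terms; each term is a list of n small vectors.
    Returns the sum of the r rank-1 tensor-product terms; entanglement
    arises because a sum of products is generally not itself a product.
    """
    terms = []
    for term in units:
        v = term[0]
        for u in term[1:]:
            v = np.kron(v, u)   # dimensions multiply: d * d * ... * d
        terms.append(v)
    return np.sum(terms, axis=0)

rng = np.random.default_rng(0)
r, n, d = 2, 4, 8               # rank 2, 4 units per term, each 8-dim
units = [[rng.normal(size=d) for _ in range(n)] for _ in range(r)]
emb = tensor_product_embedding(units)
print(emb.shape)                # (4096,) = 8**4, from only 2*4*8 = 64 params
```

Here a 4096-dimensional embedding is materialized from 64 stored parameters; in practice the full vector need never be materialized, since inner products of tensor products factorize across the units.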
ii) Word2ket adopts deep neural networks to decode the embeddings and is trained in a supervised way, which differs considerably from many embedding methods (Tang et al., 2015; Grover & Leskovec, 2016) that decode embeddings via a simple inner-product operation and train them by optimizing an objective based on Noise Contrastive Estimation (NCE) (Gutmann & Hyvärinen, 2010) in a self-supervised (contrastive learning) style.

In this paper, we propose a novel network embedding paradigm with special treatments for the above two problems. For embedding generation, we propose an embedding module named Tensorized Embedding Blocks (TEBs), composed of columns of Tensor Units (TUs), each representing an embedding tensor. TEBs unify the embedding layers in word2ket/word2ketXS by changing the way TUs are indexed (see Sec. 4.1 and Sec. 4.5). A good strategy for assigning TU indices to the embedded nodes yields a model with both high expressiveness and space efficiency, as our experiments demonstrate. For embedding learning, we optimize the widely adopted NCE-based objective and succeed in learning node embeddings in a 2^32-dimensional Hilbert space stably and robustly on a single machine (details in Sec. 4.2). Specifically, we propose two variants at different compression levels, named node2ket and node2ket+. The former is a preliminary baseline; the latter achieves higher compression by utilizing graph partition techniques. Furthermore, we give a complexity analysis of the embedding modules in TEBs and prove upper bounds on the minimum space required to store/train embeddings for node2ket and node2ket+ in Sec. 4.3 and Sec. 4.4, respectively. Our contributions are as follows:

1) We are the first to formalize a general embedding architecture, Tensorized Embedding Blocks (TEBs), which generates embeddings in Hilbert space by tensor product and entanglement, imitating the behavior of quantum systems.
In this way, highly expressive high-dimensional embeddings in the Hilbert space can be obtained with a small number of parameters. Our tensorized embedding architecture generalizes the recent word2ket/word2ketXS and also unifies conventional embedding methods.

2) We are, to the best of our knowledge, the first to adopt tensorized node embedding in representation learning based on Noise Contrastive Estimation. Specifically, we propose two variants at different compression levels, node2ket and node2ket+, with bounds on their compressive power proved for each. Compared with the preliminary baseline node2ket, node2ket+ further utilizes graph partition techniques in the design of TEBs, whereby features at different levels of granularity are preserved and the embedding model becomes more compressive.

3) Experiments on network reconstruction and link prediction show the superior performance of node2ket. Notably, in the network reconstruction experiments, it is the top method on BlogCatalog with 1/32 of the training parameters and 1/16 of the running time, and its precision on DBLP is more than 3 times higher than that of the best baseline, LouvainNE (Bhowmick et al., 2020).
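The NCE-based training style referred to above, inner-product decoding plus negative sampling, can be sketched as follows. This is the generic skip-gram-with-negative-sampling objective (Mikolov et al., 2013); the exact loss used by node2ket may differ, and the function and variable names here are illustrative.

```python
import numpy as np

def nce_loss(emb_u, emb_v, emb_negs):
    """Negative-sampling objective for one observed node pair (u, v):
    push sigmoid(<u, v>) toward 1 for the positive pair and
    sigmoid(<u, v'>) toward 0 for each sampled negative node v'.
    Returns the negative log-likelihood to be minimized."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    pos = np.log(sigmoid(emb_u @ emb_v))            # observed edge
    neg = np.sum(np.log(sigmoid(-emb_negs @ emb_u)))  # sampled non-edges
    return -(pos + neg)

rng = np.random.default_rng(0)
u, v = rng.normal(size=16), rng.normal(size=16)
negs = rng.normal(size=(5, 16))     # 5 negative samples
loss = nce_loss(u, v, negs)
print(loss >= 0)                    # True: a negative log-likelihood
```

Because the decoder is just an inner product, this objective applies unchanged when u and v are produced by a tensorized module rather than stored as free vectors, which is the property the proposed method exploits.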

2. RELATED WORKS

Network Embedding. Here we mainly discuss NE methods based on the language model word2vec (Mikolov et al., 2013), which are often designed for different types of networks, e.g. homogeneous networks (Perozzi et al., 2014; Grover & Leskovec, 2016; Tang et al., 2015), heterogeneous networks (Dong et al., 2017), and multiplex networks (Liu et al., 2017; Qu et al., 2017; Xiong et al., 2021). These methods also often aim to preserve specific types of information, e.g. low-order proximity (Tang et al., 2015), structural similarity (Ribeiro et al., 2017), versatile similarity measures (Tsitsulin et al., 2018), and cross-network alignment relations (Du et al., 2022). Apart from NE methods derived from word2vec, methods based on deep neural networks (Wang et al., 2016), especially graph neural networks (Hamilton et al., 2017a), are also influential in the literature; they are, however, technically divergent from this paper.

Tensorized Embedding Model. Tensors are a generalization of vectors and matrices. Thanks to their capability of representing and manipulating high-dimensional objects, tensors have been successfully applied to the study of complex physical systems (Nielsen & Chuang, 2002). The most crucial application is to understand quantum many-body systems and to efficiently describe quantum states residing in exponentially large Hilbert spaces (Bridgeman & Chubb, 2017). In addition, tensor networks composed of multiple tensors can be used to understand many of the foundational results in quantum information (Orús, 2019). Utilizing the tensor-network structure makes it possible to simplify concepts like quantum teleportation, purification, and the church of the larger Hilbert space (Bridgeman & Chubb, 2017).
Inspired by this, many tensorized embedding models take advantage of tensor operations like contraction and decomposition to handle large volumes of data and compress parameters, ranging from image classification (Stoudenmire & Schwab, 2016; Martyn et al., 2020; Selvan & Dam, 2020; Cheng et al., 2021) to natural language processing (Panahi et al., 2020; Ma et al., 2019; Liu et al., 2020; Qiu & Huang, 2015). It

