NODE NUMBER AWARENESS REPRESENTATION FOR GRAPH SIMILARITY LEARNING

Anonymous

Abstract

This work addresses two important issues in graph similarity computation: the Node Number Awareness Issue (N2AI), and how to accelerate the inference speed of graph similarity computation in downstream tasks. We found that existing Graph Neural Network based graph similarity models exhibit large errors when predicting the similarity score of two graphs with a similar number of nodes. Our analysis shows that this is because the global pooling function in graph neural networks maps graphs with a similar number of nodes to similar embedding distributions, reducing the separability of their embeddings; we refer to this as the N2AI. Our motivation is to enhance the difference between the two embeddings to improve their separability, so we leverage our proposed Different Attention (DiffAtt) to construct the Node Number Awareness Graph Similarity Model (N2AGim). In addition, we propose Graph Similarity Learning with Landmarks (GSL2) to accelerate similarity computation. GSL2 uses the trained N2AGim to generate an individual embedding for each graph without any additional learning, and this individual embedding effectively improves the inference speed of GSL2. Experiments demonstrate that N2AGim outperforms the second best approach on Mean Square Error by 24.3% (1.170 vs. 1.546), 43.1% (0.066 vs. 0.116), and 44.3% (0.308 vs. 0.553) on the AIDS700nef, LINUX, and IMDBMulti datasets, respectively. GSL2 is up to 47.7 and 1.36 times faster than N2AGim and the second fastest model, respectively. Our code is publicly available at https://github.com/iclr231312/N2AGim.

1. INTRODUCTION

Graph similarity computation is a fundamental problem for graph-based applications, e.g., graph data mining, graph retrieval, and graph clustering (Kriege et al., 2020; Ok & Korea, 2020). Graph Edit Distance (GED), defined as the minimum number of graph edit operations required to transform graph G_i into graph G_j, is one of the most popular graph similarity metrics (Gao et al., 2010; Neuhaus et al., 2006; Bougleux et al., 2015). The graph edit operations are the insertion or deletion of a node/edge and the relabeling of an edge. Unfortunately, exact GED computation is NP-hard in general (Zeng et al., 2009), which makes it too expensive to leverage in downstream tasks. Recently, many Graph Neural Network (GNN) based graph similarity computation algorithms have been proposed to compute the GED in a faster manner (Bai et al., 2019; 2020; Li et al., 2019; Ling et al., 2021; Bai & Zhao, 2021; Wang et al., 2021). These GNN-based algorithms transform the GED value into a similarity score and use an end-to-end framework to learn to map a given pair of graphs to their similarity score. As a general framework, a Siamese neural network aggregates information within each graph, a feature fusion module captures the similarity between the two graphs, and a Multi-Layer Perceptron (MLP) then performs the regression. However, existing popular graph similarity models become very inaccurate when predicting the similarity of two graphs with a similar number of nodes, as shown in Fig. 1: the MSE of all four models grows as the difference in the number of nodes between the two graphs shrinks. To better understand this issue, Section 3 presents a theoretical analysis of the most widely used modules in graph similarity models from a statistical viewpoint.
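To make the GED-to-similarity transformation concrete, the following is a minimal, self-contained sketch. The brute-force edit distance (edge operations only, equal-size graphs) and the exponential normalization are illustrative assumptions — the normalization follows common practice in GNN-based GED work, not necessarily the exact transform used here.

```python
import math
from itertools import permutations

def edit_distance_same_size(edges1, edges2, n):
    """Brute-force GED for two unlabeled graphs on n nodes, counting only
    edge insertions/deletions. Exponential in n; only for tiny examples."""
    e2 = {frozenset(e) for e in edges2}
    best = None
    for perm in permutations(range(n)):
        mapped = {frozenset((perm[u], perm[v])) for u, v in edges1}
        cost = len(mapped ^ e2)  # edges to delete + edges to insert
        if best is None or cost < best:
            best = cost
    return best

# Path 0-1-2-3 vs. cycle 0-1-2-3-0: one edge insertion suffices.
path = [(0, 1), (1, 2), (2, 3)]
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
ged = edit_distance_same_size(path, cycle, 4)  # -> 1

# A common GED-to-similarity transform (our assumption, following
# SimGNN-style normalization): s = exp(-GED / ((N1 + N2) / 2)).
similarity = math.exp(-ged / ((4 + 4) / 2))
```

The similarity score lies in (0, 1], with identical graphs mapping to 1, which is the regression target a GNN-based model learns to predict.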
As shown in Fig. 2(a)-(e), our conclusion is that all global pooling functions, also called graph readout functions, map graphs with a similar number of nodes to similar embeddings, which reduces the separability between embeddings and leads to a large MSE when predicting the similarity of two graphs with a similar number of nodes. We refer to this issue of indistinguishable embeddings of graphs with a similar number of nodes as the Node Number Awareness Issue (N2AI). Our motivation for addressing the N2AI is to focus more on the differences between two similar embeddings during the learning process, so we propose the Different Attention (DiffAtt) to construct our Node Number Awareness Graph Similarity Model (N2AGim). DiffAtt is architecturally simple and can be added as a plug-and-play module to any global pooling method. Our evaluations on three datasets (Section 5) demonstrate that models with different pooling methods achieve a significant improvement after adding DiffAtt. Moreover, N2AGim achieves state-of-the-art performance compared to popular GNN-based graph similarity models, e.g., about 33.3% (0.515 vs. 0.772) and 51.4% (0.515 vs. 1.059) better on average Mean Square Error (MSE) than EGSCT (Qin et al., 2021) and GraphSim (Bai et al., 2020), respectively.
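The statistical intuition behind the N2AI can be demonstrated with a toy experiment. Here node embeddings are modeled as i.i.d. Gaussian vectors (a stand-in for GNN outputs, not the paper's actual model): under mean pooling, the pooled embedding's variance shrinks like 1/N, so graphs with similar node counts produce pooled embeddings drawn from nearly identical distributions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # embedding dimension

def mean_pool(n_nodes):
    """Mean-pool n_nodes i.i.d. node embeddings (a toy stand-in for a GNN)."""
    return rng.standard_normal((n_nodes, d)).mean(axis=0)

def pooled_distance(n1, n2, trials=1000):
    """Average L2 distance between pooled embeddings of graphs of size n1, n2."""
    return float(np.mean([np.linalg.norm(mean_pool(n1) - mean_pool(n2))
                          for _ in range(trials)]))

# Graphs with similar node counts yield closer, less separable pooled
# embeddings than graphs with very different node counts.
close = pooled_distance(50, 52)  # similar sizes
far = pooled_distance(5, 52)     # dissimilar sizes
```

The gap (`close` is clearly smaller than `far`) mirrors Fig. 1: pairs with small SizeDiff are the hardest to separate after global pooling, which is exactly the regime DiffAtt targets.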



Figure 1: Histogram of the Mean Square Error (MSE) of existing graph similarity models on three datasets at different levels of SizeDiff. SizeDiff represents the percentage difference in the number of nodes and is defined as SizeDiff(G_1, G_2) = |N_1 - N_2| / max(N_1, N_2), where N_i is the number of nodes in G_i. All models have a larger MSE when SizeDiff is smaller, i.e., when the numbers of nodes in the graph pair are similar.
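The SizeDiff measure from the Fig. 1 caption is straightforward to compute; a direct transcription of the formula:

```python
def size_diff(n1: int, n2: int) -> float:
    """SizeDiff(G1, G2) = |N1 - N2| / max(N1, N2), per the Fig. 1 caption."""
    return abs(n1 - n2) / max(n1, n2)

size_diff(10, 10)  # 0.0  -> the most error-prone regime per Fig. 1
size_diff(5, 20)   # 0.75
```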

Figure 2: (a)-(d) Distributions of outputs from different global pooling functions over graphs with N nodes, showing that all global pooling functions map graphs with a similar number of nodes to similar distributions; see Section 3 for details. (e) Illustration of the N2AI, i.e., the distributions of the embeddings of two graphs with a similar number of nodes are indistinguishable. Region A represents where the two distributions overlap, while B is the opposite. Our aim is to enhance the information in B to address the N2AI. (f)-(g) Illustration of the Early Fusion Model (EFM) and the Individual Embedding Model (IEM).

Another issue of interest in the field of graph similarity learning is accelerating the inference speed of graph similarity models in downstream tasks. Qin et al. (2021) divided graph similarity models into two categories: the Early Fusion Model (EFM), shown in Fig. 2(f), which performs feature fusion at an early stage to achieve high accuracy but slow inference, and the Individual Embedding Model (IEM), shown in Fig. 2(g), which generates an individual embedding for each graph and then performs fusion; this model is fast but less accurate. The existing solution (Qin et al., 2021) uses a specially designed Knowledge Distillation (KD) paradigm that leverages an EFM teacher to improve the individual embeddings generated by the IEM student. Motivated by Balcan et al. (2008), we instead propose a faster and more accurate IEM called Graph Similarity Learning with Landmarks (GSL2). In GSL2, a subset of graphs called landmarks, S, is selected, and each graph G is represented as a vector u_G = [GED(G, Ĝ_1), ..., GED(G, Ĝ_m)]^T, where Ĝ_i ∈ S. Finally, an MLP is learned to map the concatenation of the embeddings of the two graphs to their GED target.
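The landmark construction above can be sketched in a few lines. This is a schematic illustration: the GED oracle here is a toy stand-in (graphs are reduced to node counts and "GED" is their absolute difference), whereas in GSL2 it would be the trained N2AGim used as a black box.

```python
import numpy as np

def landmark_embedding(graph, landmarks, ged_fn):
    """Represent a graph by its (predicted) GED to m fixed landmark graphs:
    u_G = [GED(G, L_1), ..., GED(G, L_m)]."""
    return np.array([ged_fn(graph, lm) for lm in landmarks])

# Toy stand-in: "graphs" are node counts, "GED" is their absolute difference.
toy_ged = lambda g1, g2: abs(g1 - g2)
landmarks = [3, 7, 12]

u = landmark_embedding(9, landmarks, toy_ged)  # -> array([6, 2, 3])

# At query time, an MLP maps the concatenation [u_G1 ; u_G2] to the GED
# target; here we only form that concatenated feature vector.
pair_features = np.concatenate([landmark_embedding(9, landmarks, toy_ged),
                                landmark_embedding(4, landmarks, toy_ged)])
```

Because each u_G is computed once per graph, pairwise queries reduce to a cheap MLP forward pass over precomputed vectors, which is the source of the speedup over early-fusion models.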
Instead of learning embeddings from the graph data, GSL2 uses an already trained graph similarity model to directly generate an individual embedding for each graph, and this individual embedding effectively improves the inference speed of GSL2. The contributions of this paper can be summarized as follows:

