ATTRIBUTES RECONSTRUCTION IN HETEROGENEOUS NETWORKS VIA GRAPH AUGMENTATION

Abstract

Heterogeneous Graph Neural Networks (HGNNs), as an effective tool for mining heterogeneous graphs, have achieved remarkable performance on node classification tasks. Yet, HGNNs are limited in their mining power because they require all nodes to have complete and reliable attributes. This is usually unrealistic, since the attributes of many nodes in reality are inevitably missing or defective. Existing methods usually adopt imputation schemes to complete missing attributes, ignoring topology information and leading to suboptimal performance. Some graph augmentation techniques improve the quality of attributes, but few of them are designed for heterogeneous graphs. In this work, we study data augmentation on heterogeneous graphs, tackling missing and defective attributes simultaneously, and propose a novel generic architecture, Attributes Reconstruction in Heterogeneous networks via Graph Augmentation (ARHGA), comprising random sampling, attribute augmentation and consistency training. In graph augmentation, to ensure that the reconstructed attributes are plausible and accurate, an attention mechanism reconstructs attributes under the guidance of the topological relationships between nodes. Our proposed architecture can be easily combined with any GNN-based heterogeneous model and improves its performance. Extensive experiments on three benchmark datasets demonstrate the superior performance of ARHGA over state-of-the-art baselines on semi-supervised node classification.

1. INTRODUCTION

Heterogeneous information networks (HINs) (Yang et al. (2020); Shi et al. (2016); Shen et al. (2017)), which contain multiple types of nodes and edges, have been widely used to model complex systems and solve practical problems. Recently, heterogeneous graph neural networks have emerged as prevalent deep learning architectures for analyzing HINs and have shown superior performance in various graph analytical tasks, such as node classification (Wang et al. (2019a); Yun et al. (2019)) and link prediction (Fu et al. (2020); Zhang et al. (2019)). Most HGNNs follow a message-passing scheme in which each node updates its embedding by aggregating information from its neighbors' attributes. Such a scheme usually requires that all nodes have complete and reliable attributes, which is not always satisfied in practice due to resource limitations and personal privacy, resulting in missing and defective attributes. In general, attribute missing in heterogeneous graphs means that the attributes of some nodes are entirely absent; compared to the homogeneous case, it is more frequent and complex. Take DBLP (Sun & Han (2013)) as an example: the network has four types of nodes (author, paper, term and venue) and three types of links. Only paper nodes have attributes, extracted from the keywords in their titles, while the other node types have none. This impairs the effectiveness of the corresponding graph mining model to a certain extent. On the other hand, the original attributes of nodes are sometimes not ideal, since heterogeneous graphs are extracted from complex systems that are inevitably subject to various forms of contamination, such as mistakes and adversarial attacks, causing error propagation and greatly affecting the message-passing process. This suggests the need for effective approaches able to complete missing attributes and calibrate defective attributes in heterogeneous graphs simultaneously.
To alleviate the effect of missing attributes, existing models usually adopt imputation strategies, such as the neighbors' average or a one-hot vector as done in MAGNN (Fu et al. (2020)). These imputation methods are suboptimal because graph structure information is ignored and little useful information is provided, hampering subsequent analysis. An alternative technique is to consider graph topology information and inject it into the completion model. The work of Jin et al. (2021) and He et al. (2022) has shown a significant boost on node classification tasks, but both methods assume that the original attributes are reliable, which is not easy to satisfy in real-world applications. In another line of research, graph augmentation techniques are adopted to calibrate original attributes to improve their quality and have shown promising performance (Xu et al. (2022); Zhu et al. (2021)). However, these methods are deficient for heterogeneous graphs, as they cannot encode complex interactions. Furthermore, existing methods either only complete missing attributes or only improve attribute quality, while it is worth solving both problems at the same time. In this paper, we deal with missing and defective attributes simultaneously in heterogeneous graphs, and propose a novel framework for Attributes Reconstruction in Heterogeneous networks via Graph Augmentation (ARHGA). ARHGA repeatedly samples nodes to perform attribute augmentation, obtaining multiple augmented attribute matrices, and then uses consistency training (Xie et al. (2020)) to make the outputs of different augmentations as similar as possible. Moreover, to make the augmented attributes more accurate, node topological embeddings are learned through HIN-embedding methods (Dong et al. (2017); Fu et al. (2017); Shang et al. (2016); Wang et al. (2019b)) to capture graph structure information as guidance. In this way, ARHGA effectively enhances the performance of existing GNN-based heterogeneous models with the aid of the reconstructed attributes.

Contributions. In summary, the main contributions of this paper are as follows:
• We propose a generic architecture of graph augmentation on heterogeneous networks for attribute reconstruction, addressing both missing and defective attributes.
• We design an effective attribute-wise augmentation strategy implemented by an attention mechanism, which integrates topology information to increase the reliability of the reconstructed attributes.
• Extensive experimental results on three node classification benchmark datasets demonstrate the effectiveness of our proposed model.

2. RELATED WORK

Heterogeneous graph neural networks. Heterogeneous graphs have been widely used to solve real-world problems due to their diversity of node types and relationships. Recently, many HGNNs have been proposed to analyze HINs. HAN (Wang et al. (2019a)) learns node representations using node-level and meta-path-level attention. However, existing attribute-completion methods only complete the missing attributes, while our proposed method not only completes the missing attributes but also calibrates defective attributes to enhance attribute quality.

Graph data augmentation. Recently, various data augmentation techniques have been explored in deep graph learning and demonstrated remarkable results. Graph data augmentation aims to generate augmented graph(s) by enriching or changing the information of the original graph. It can be categorized into three classes: attribute-wise, structure-wise and label-wise augmentation (Ding et al. (2022)). Among attribute-wise augmentation strategies, a few lines of existing work address attribute calibration/denoising. A straightforward method (Xu et al. (2022)) is to compute the gradient of a specific objective function w.r.t. the node attribute matrix and calibrate the attribute matrix based on the computed gradient. In addition, as a special case of the noisy setting on graph data, the problem of missing attributes has also been studied; representative works include GCNMF and Feature Propagation. GCNMF (Taguchi et al. (2021)) completes the missing data with a Gaussian Mixture Model (GMM); it integrates graph representation learning with attribute completion and can be trained in an end-to-end manner. Feature Propagation (Rossi et al. (2021)) reconstructs the missing node attributes based on a diffusion-type differential equation on the graph.
It is important to note that the above-mentioned graph augmentation methods only work on homogeneous graphs; heterogeneous attribute-wise augmentation remains an under-explored problem.

3. METHODOLOGY

In this section, we introduce the ARHGA framework, of which the general idea is to tackle the missing and defective attributes simultaneously through an effective graph augmentation strategy, and integrate the augmentation process and HGNN module into a unified framework to benefit the performance on node classification tasks. Figure 1 illustrates the proposed framework.

3.1. RANDOM SAMPLING

Random sampling selects a subset of nodes to form the augmentation node set (ANS), in which nodes may have no attributes or have original attributes. Specifically, given a heterogeneous graph G with node set V of n nodes, edge set E and attribute matrix X, we sample a binary mask ε_i ~ Bernoulli(1 - δ) for each node v_i in V to determine whether v_i belongs to the ANS, where δ is the probability that ε_i takes 0. We obtain ANS = {v_i | ε_i = 0}, i.e., if ε_i = 0, then v_i ∈ ANS. The attributes of nodes in the ANS are dropped in this step and reconstructed in the attribute augmentation phase. To reconstruct attributes for all nodes, random sampling is performed repeatedly so that every node is eventually selected. If the number of samplings is M, we can derive the constraint that M should satisfy from a probabilistic view. Let P denote the probability that all nodes are selected at least once. Since each node is sampled independently,

P = \prod_{i=1}^{n} P_i = (P_i)^n, \quad P_i = 1 - P_i^C = 1 - P(k = 0) = 1 - \delta^M,

where P_i and P_i^C are the probabilities that node v_i is selected at least once and never selected, respectively, and k is the number of times v_i is selected. Given a threshold τ close enough to 1, when P exceeds τ we consider all nodes to be covered after M random samplings. It follows that M should satisfy (1 - \delta^M)^n > \tau, i.e., M > \log_\delta(1 - \tau^{1/n}). Note that the sampling procedure is only performed during training. During inference, we directly take the node set V as the ANS.
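As a sanity check on this bound, the minimal number of samplings M can be computed directly from δ, τ and n. The sketch below (the function name is ours, not from the paper) implements M > log_δ(1 - τ^{1/n}):

```python
import math

def min_num_samplings(delta: float, tau: float, n: int) -> int:
    """Smallest integer M such that (1 - delta**M)**n > tau,
    i.e. M > log_delta(1 - tau**(1/n))."""
    bound = math.log(1.0 - tau ** (1.0 / n), delta)
    return math.floor(bound) + 1

# e.g. with delta = 0.5, tau = 0.99 and n = 10000 nodes, the returned M
# guarantees full coverage with probability above tau
M = min_num_samplings(delta=0.5, tau=0.99, n=10000)
```

For instance, with δ = 0.5 the bound grows only logarithmically in n, so full coverage is cheap even for large graphs.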

3.2. ATTRIBUTE AUGMENTATION

In heterogeneous graphs, attribute information and structure information are two crucial characteristics, and they are semantically related due to the homophily of networks (McPherson et al. (2001); Pei et al. (2020); Schlichtkrull et al. (2018)). Based on this principle, we hypothesize that the topological relationship between nodes reflects the relationship between their attributes; briefly, adjacent nodes tend to have similar attributes. Hence, the attributes of nodes in the ANS can be reconstructed from the attribute information of their neighbors. Considering the complex interactions in heterogeneous networks, the node topological embeddings H are learned to capture the underlying structure information and used as guidance to reconstruct attributes.

Topological embedding. In graph G, the node set V is associated with a node-type set F and the edge set E with an edge-type set R. To capture the structure information in G, random walk (Huang et al. (2019)) based on multiple common meta-paths is adopted to generate comprehensive node sequences, which are fed into a heterogeneous skip-gram model to learn node topological embeddings. Given a pre-defined meta-path P: F_1 \xrightarrow{R_1} F_2 \xrightarrow{R_2} \cdots \xrightarrow{R_{l-1}} F_l, for node v_i with type F_i, the transition probability at step i is

p(v_{i+1} \mid v_i, P) = \begin{cases} \frac{1}{|N_{F_{i+1}}(v_i)|}, & (v_{i+1}, v_i) \in E, \ \phi(v_{i+1}) = F_{i+1} \\ 0, & \text{otherwise,} \end{cases}

where N_{F_{i+1}}(v_i) denotes the neighbors of node v_i with type F_{i+1}. The random walk ensures that the underlying semantic information in graph G is properly preserved. Then skip-gram (Mikolov et al. (2013)) with one-hot vectors as input is adopted to learn the topological embeddings by maximizing the probability of the local neighbor structures captured by the random walk:

\max_\theta \sum_{v \in V} \sum_{F \in \mathcal{F}} \sum_{u \in N_F(v)} \log p(u \mid v; \theta),

where N_F(v) denotes the set of neighbors of node v with type F, sampled by the random walk based on the given meta-paths. The learned topological embeddings of the nodes are denoted as H.

Attribute-wise augmentation. From the perspective of network science, the information carried by direct links is most essential, so one-hop neighbors contribute most to attribute augmentation. Moreover, directly connected neighbors have different importance, because the neighbors of a node may have different types and different local structures in heterogeneous graphs. ARHGA adopts an attention mechanism to learn the importance of different one-hop neighbors based on the topological embeddings H. Note that ARHGA only computes attention coefficients for nodes' direct neighbors by performing masked attention, which avoids unnecessary computation and is thus more efficient. Specifically, given a directly connected node pair (v, u) with u ∈ N_v^+, where N_v^+ denotes the one-hop attributed neighbors of node v, the attention layer learns the importance e_{vu}, which indicates the contribution of node u to node v:

e_{vu} = \sigma(h_v^T W h_u),

where h_u and h_v are the topological embeddings of nodes u and v. The attention layer, parametrized by a weight matrix W, is shared across all node pairs, and σ is an activation function. To compare the importance across different nodes, the softmax function is used to obtain the normalized coefficient α_{vu}:

\alpha_{vu} = \mathrm{softmax}(e_{vu}) = \frac{\exp(e_{vu})}{\sum_{s \in N_v^+} \exp(e_{vs})}.

Then ARHGA obtains the attributes of node v by aggregating the attributes of its one-hop neighbors according to the coefficients α_{vu}:

\hat{x}_v = \sum_{u \in N_v^+} \alpha_{vu} x_u.
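The attribute-wise augmentation step above can be sketched in plain Python. This is a minimal single-head illustration under two simplifying assumptions of ours: W is taken as the identity and σ as tanh, whereas the real model learns W end-to-end.

```python
import math

def augment_attributes(h, x, neighbors):
    """Reconstruct each node's attributes as an attention-weighted sum of
    its one-hop attributed neighbors' attributes, with scores computed
    from topological embeddings.
    h: dict node -> topological embedding (list of floats)
    x: dict node -> attribute vector (only attributed nodes appear here)
    neighbors: dict node -> list of its one-hop attributed neighbors N_v^+
    """
    def score(hv, hu):
        # e_vu = sigma(h_v^T W h_u); identity W and tanh sigma (simplified)
        return math.tanh(sum(a * b for a, b in zip(hv, hu)))

    x_hat = {}
    for v, nbrs in neighbors.items():
        e = [score(h[v], h[u]) for u in nbrs]
        m = max(e)                                  # stabilized softmax
        w = [math.exp(s - m) for s in e]
        z = sum(w)
        alpha = [wi / z for wi in w]                # alpha_vu over N_v^+
        d = len(next(iter(x.values())))
        x_hat[v] = [sum(a * x[u][j] for a, u in zip(alpha, nbrs))
                    for j in range(d)]
    return x_hat
```

A node whose topological embedding is closer to a neighbor's receives a larger weight for that neighbor's attributes, matching the homophily intuition in the text.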
In this way, missing attributes are completed and original defective attributes are calibrated. As shown in Figure 2, we take the author node (A1) in DBLP and the paper node (P1) in ACM as examples: node A1 has no attributes and node P1 has original defective attributes, and both are reconstructed effectively by our framework. Importantly, the completed attributes are plausible and the calibrated attributes are more meaningful than the originals, owing to the integration of underlying semantic information in the heterogeneous graph through weighted aggregation. Finally, to stabilize the learning process and reduce the high variance brought by the heterogeneity of graphs, we extend the attention mechanism to multi-head attention, as done in many existing methods (Veličković et al. (2017); Wang et al. (2019a)). Specifically, K independent attention mechanisms are performed and their outputs are averaged to generate the final attributes of node v:

\hat{x}_v = \mathrm{mean}\Big( \sum_{u \in N_v^+} \alpha_{vu}^{k} x_u \Big)_{k=1}^{K},

where mean(·) denotes the element-wise average over the K attention heads. After attribute-wise augmentation, the attributes of all nodes are updated as \tilde{X} = \{ \hat{X}_i, X_j \mid v_i \in ANS, \ v_j \in V \setminus ANS \}.
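The K-head averaging step can be sketched as follows; the function name and input layout are ours for illustration, with each head's output assumed to already be a per-node attribute dict.

```python
def multi_head_mean(head_outputs):
    """Average the reconstructed attributes over K attention heads.
    head_outputs: list of K dicts, each mapping a node to the attribute
    vector produced by one independent attention head. Returns the
    element-wise mean over heads, as in the multi-head formula above."""
    K = len(head_outputs)
    return {
        v: [sum(out[v][j] for out in head_outputs) / K
            for j in range(len(head_outputs[0][v]))]
        for v in head_outputs[0]
    }
```

Averaging (rather than concatenating) keeps the reconstructed attributes in the same dimension as the original attribute matrix, which is what allows them to replace X directly.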

3.3. CONSISTENCY TRAINING

After performing random sampling and attribute augmentation M times, we generate M augmented attribute matrices \{ \tilde{X}^{(m)} \mid 1 \le m \le M \}, each of which, together with the original graph structure A, is fed into the HIN model to get the corresponding output:

\tilde{Y}^{(m)} = \Phi(A, \tilde{X}^{(m)}),

where \tilde{Y}^{(m)} denotes the prediction probabilities for \tilde{X}^{(m)} and Φ denotes a HIN model. With the augmented attributes, ARHGA can be applied to any heterogeneous graph model to enhance its performance; specifically, MAGNN (Fu et al. (2020)) is used when implementing ARHGA.

Supervised Loss. Under the semi-supervised setting with s labeled nodes among n nodes, the supervised loss on node classification is defined as the average cross-entropy over the M augmentations:

L_{sup} = -\frac{1}{M} \sum_{m=1}^{M} \sum_{i=0}^{s-1} Y_i^T \log \tilde{Y}_i^{(m)},

where Y ∈ {0, 1}^{n×C} are the ground-truth labels and C is the number of classes.

Consistency Loss. Valid graph data augmentation changes the input in a way that has relatively little impact on the final node classification. We embed this knowledge into our model by designing a consistency loss over the M augmentations. For each node v_i, we generate a "guessed label" whether or not it originally has a label. Specifically, we first average the model's prediction distributions across the M augmentations of v_i:

\bar{Y}_i = \frac{1}{M} \sum_{m=1}^{M} \tilde{Y}_i^{(m)},

then the guessed label \hat{Y}_i = (\hat{Y}_{i0}, \ldots, \hat{Y}_{i,C-1})^T for node v_i is computed through the sharpening trick (Berthelot et al. (2019)), in which \hat{Y}_{ij} is the guessed probability of v_i belonging to class j:

\hat{Y}_{ij} = \bar{Y}_{ij}^{1/T} \Big/ \sum_{c=0}^{C-1} \bar{Y}_{ic}^{1/T}, \quad 0 \le j \le C-1,

where 0 < T ≤ 1 acts as a "temperature" that controls the sharpness of the distribution. In ARHGA, T is set to a small value to push the guessed label toward a one-hot distribution. We then minimize the distance between \tilde{Y}_i^{(m)} and \hat{Y}_i to form the consistency loss:

L_{con} = \frac{1}{M} \sum_{m=1}^{M} \sum_{i=0}^{n-1} \big\| \hat{Y}_i - \tilde{Y}_i^{(m)} \big\|_2^2.
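The sharpening trick and the consistency loss can be sketched in a few lines of plain Python; this is an illustrative sketch with our own function names, operating on nested lists rather than tensors.

```python
def sharpen(p, T=0.1):
    """Sharpening trick: p is a probability vector (the averaged
    prediction for one node), T in (0, 1] is the temperature.
    Smaller T pushes the distribution toward one-hot."""
    powered = [pi ** (1.0 / T) for pi in p]
    z = sum(powered)
    return [q / z for q in powered]

def consistency_loss(preds, T=0.1):
    """preds: list of M prediction matrices, each a list of per-node
    probability vectors. Averages predictions over the M augmentations,
    sharpens the average per node, and returns the squared-distance
    consistency loss divided by M (matching L_con above)."""
    M = len(preds)
    n = len(preds[0])
    avg = [[sum(p[i][j] for p in preds) / M
            for j in range(len(preds[0][i]))] for i in range(n)]
    guessed = [sharpen(pi, T) for pi in avg]
    loss = 0.0
    for m in range(M):
        for i in range(n):
            loss += sum((g - q) ** 2
                        for g, q in zip(guessed[i], preds[m][i]))
    return loss / M
```

When all M augmented predictions agree, the loss is zero; disagreement between augmentations is penalized toward the sharpened consensus.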
Optimization Objective. The supervised loss and the consistency loss are combined as the final loss of ARHGA: L = L_{sup} + λ L_{con}, where the hyper-parameter λ controls the balance between the two losses. By minimizing the final loss, our model is optimized via back-propagation in an end-to-end manner. Algorithm 1 (presented in Appendix A.1) outlines ARHGA's training process. During inference, we directly take V as the ANS; that is, the attributes of all nodes are reconstructed by performing attribute augmentation once. Hence, the inference formula is \tilde{Y} = \Phi(A, \tilde{X}, \Theta), where \tilde{X} = \{ \hat{X}_i \mid v_i \in V \} and \tilde{Y} denotes the corresponding prediction probabilities.

4. EXPERIMENTS

In this section, we first describe the experimental setup, then evaluate the performance of ARHGA on node classification and report visualization results. We also conduct a deeper analysis of the effectiveness of ARHGA. Finally, we investigate the sensitivity of ARHGA to its hyper-parameters.

Datasets. We conduct experiments on three widely-used HIN datasets, i.e., DBLP, ACM and IMDB, to analyze the effectiveness of ARHGA. Note that only paper nodes in DBLP and ACM, and movie nodes in IMDB, have original attributes.

Baselines. We compare ARHGA with seven state-of-the-art methods from two categories: two traditional homogeneous GNNs, i.e., GCN (Kipf & Welling (2016)) and GAT (Veličković et al. (2017)), and five heterogeneous graph models, i.e., metapath2vec (Dong et al. (2017)), HetGNN (Zhang et al. (2019)), HAN (Wang et al. (2019a)), MAGNN (Fu et al. (2020)) and MAGNN-AC (Jin et al. (2021)). Among the heterogeneous graph models, metapath2vec is a network embedding method and the others are GNN-based. MAGNN-AC completes the missing attributes of nodes having no attributes in heterogeneous graphs.

Parameter Settings. For the baselines, we use the default hyper-parameters suggested in MAGNN-AC. The embedding dimension of all methods is set to 64 for a fair comparison. In ARHGA, the random sampling probability δ is set to 0.5 and the number of augmentations M to 6. The number of attention heads K is set to 8 and the weight λ of the consistency loss is 0.5. ARHGA is trained with the Adam optimizer (Kingma & Ba (2014)) and a learning rate of 0.005. An early stopping strategy with a patience of 5 epochs on the validation set is adopted in our experiments.

4.1. NODE CLASSIFICATION

In this section, the node classification performance of ARHGA is compared with that of the baseline models. We first generate embeddings of labeled nodes (i.e., authors in DBLP, authors in ACM and movies in IMDB), and then feed them into a linear support vector machine (SVM) (Suykens (2001)) classifier with training ratios ranging from 10% to 80%. Table 1 reports the Macro-F1 and Micro-F1 results averaged over 5 runs, with the best results in bold. As shown, ARHGA consistently outperforms the baselines by a significant margin across all datasets. After attribute augmentation, ARHGA achieves 0.9%-5.47% higher accuracy than MAGNN and 0.06%-1.78% higher than MAGNN-AC, the strongest MAGNN-based baseline. Notably, although some methods already achieve high accuracy on the DBLP dataset, ARHGA still yields a nontrivial improvement. The improvement is primarily due to the fact that ARHGA provides a better representation of node attributes. In addition, GNN-based heterogeneous methods perform better than metapath2vec since attribute information is integrated, demonstrating the importance of node attributes. The poor performance of GCN and GAT reveals the importance of encoding heterogeneous semantic information when analyzing HINs. Our proposed method not only captures and preserves the underlying semantic information in heterogeneous graphs but also effectively reconstructs node attributes to augment the graph, which together account for its superiority.

4.2. VISUALIZATION

For a more intuitive comparison, we visualize the embeddings of paper nodes learned by MAGNN, MAGNN-AC and ARHGA on the ACM dataset. The well-known t-SNE (Van der Maaten & Hinton (2008)) is used to project the embeddings into a 2-dimensional space, where nodes with different colors belong to different classes. As shown in Figure 3, with the reconstructed attributes, ARHGA produces the clearest boundaries and densest cluster structure among the three methods. In contrast, MAGNN and MAGNN-AC perform poorly: paper nodes of different classes are noticeably mixed and overlapped. MAGNN adopts imputation methods to fill in the missing attributes, which provide little useful information for node classification, resulting in poor performance. MAGNN-AC only completes the missing attributes, and the defective attributes are not addressed properly. ARHGA not only completes the missing attributes but also enhances the quality of attributes, making nodes more distinguishable.

To verify the effectiveness of attribute completion and attribute calibration in our method, we compare ARHGA with two variants: 1) ARHGA-1, which removes attribute calibration and only completes the missing attributes; 2) ARHGA-2, which removes attribute completion and only enhances the quality of the original defective attributes, with the missing attributes obtained by imputation strategies. In addition, we compare with MAGNN to better analyze the benefit of ARHGA. Figure 4 presents the results of this deeper analysis; more results are provided in Appendix A.2.2.

4.3. A DEEP ANALYSIS OF ARHGA

As shown, ARHGA achieves the best results on all datasets at all label rates. Though ARHGA-1 and ARHGA-2 perform worse than ARHGA, they still outperform MAGNN. This is mainly because ARHGA-1 completes missing attributes accurately and ARHGA-2 calibrates defective attributes effectively, whereas MAGNN has no corresponding strategies for missing or defective attributes. Furthermore, ARHGA-1 consistently outperforms ARHGA-2, suggesting that attribute completion contributes more than attribute calibration to the effectiveness of ARHGA. Overall, the results show that accurate attribute information plays a significant role in learning node representations, and that ARHGA is an effective exploration of attribute reconstruction through graph data augmentation.

4.4. PARAMETER ANALYSIS

In this section, we investigate the sensitivity of ARHGA to the critical hyper-parameters: random sampling probability δ, number of augmentations M and weight of consistency loss λ. We report the results of node classification by varying these parameters on ACM dataset. 

5. CONCLUSIONS

In this paper, we present a novel generic architecture (ARHGA) to reconstruct attributes in heterogeneous graphs. In ARHGA, we design an effective attribute augmentation strategy that not only solves the problem of missing attributes but also enhances the quality of the original defective attributes. The augmentation strategy essentially conducts weighted aggregation through an attention mechanism guided by the topological relationships between nodes. The node classification results show consistent performance superiority over seven state-of-the-art baselines on benchmark datasets. The deeper analysis of ARHGA demonstrates the effectiveness of both the completion of missing attributes and the calibration of defective attributes. We conclude that ARHGA provides a better attribute representation, helpful for improving semi-supervised classification on heterogeneous networks. In future work, we aim to design a structure-wise augmentation for a better graph structure representation in heterogeneous graphs.

A.2 MORE EXPERIMENTAL RESULTS

In this section, we provide dataset details and more experimental results besides the results in the main paper.

A.2.1 ADDITIONAL DATASET DETAILS

In this section, we provide additional dataset details; the statistics of the datasets are shown in Table A1.
• DBLP: This is a subset of DBLP with 4057 authors (A), 14328 papers (P), 20 venues (V) and 8789 terms (T). The authors are divided into four research areas: database, data mining, machine learning and information retrieval. Only paper nodes have attributes, derived from their keywords; other nodes have no raw attributes.
• ACM: This is a subset of ACM; its statistics are listed in Table A1. Only paper nodes have attributes, derived from their keywords; other nodes have no original attributes.
• IMDB: We extract a subset of IMDB with 4278 movies (M), 2081 directors (D) and 5257 actors (A). The movie nodes are labeled by their genres: action, comedy and drama. In this dataset, movies are described by bag-of-words representations of their plot keywords, and other nodes have no original attributes.

A.2.2 A DEEP ANALYSIS OF ARHGA

We also report the Micro-F1 values on the three benchmark datasets to evaluate the effectiveness of attribute completion and attribute calibration in ARHGA. Figure 6 presents the results, which are similar to those on the ACM dataset.



https://dblp.uni-trier.de/
https://dl.acm.org/
https://www.imdb.com/



Figure 1: Overview of the ARHGA framework. ARHGA performs random sampling (a) to produce the ANS. The attributes of nodes in the ANS are completed or calibrated through attribute augmentation (b) to generate multiple graph augmentations (c), which are fed into shared HGNNs to construct the consistency loss (d). The upper box shows an example of missing attributes in one sampling and the lower box an example of defective attributes in another.

Figure 2: (a) An illustration of completing the attributes of node A1 in DBLP dataset. (b) An illustration of calibrating the attributes of node P1 in ACM dataset.

Figure 3: Visualization of embeddings of paper nodes in ACM. Different colors correspond to different research areas in the ground truth.

Figure 4: Comparisons of ARHGA with two variants (ARHGA-1, ARHGA-2) and MAGNN on node classification.

Figure 5: Sensitivity analysis on ACM dataset. We report the average result of the node classification across different training ratios.

A.1 TRAINING ALGORITHM

Algorithm 1: The training process of ARHGA.
Input: node set V, adjacency matrix A, attribute matrix X ∈ R^{n×d}, number of augmentations M per epoch, number of attention heads K, learning rate η, a HIN model Φ(A, X, Θ).
Output: prediction \tilde{Y}.
1: while not converged do
2:   for m = 1 : M do
3:     Randomly select nodes from V to compose the ANS^(m)
4:     Perform attribute augmentation on the ANS^(m): \hat{x}_v^{(m)} = \mathrm{mean}\big(\sum_{u \in N_v^+} \alpha_{vu}^{k} x_u\big)_{k=1}^{K}
5:     Predict using the HIN model: \tilde{Y}^{(m)} = \Phi(A, \tilde{X}^{(m)}, \Theta)
6:   end for
7:   Compute the supervised loss L_{sup} = -\frac{1}{M} \sum_{m=1}^{M} \sum_{i=0}^{s-1} Y_i^T \log \tilde{Y}_i^{(m)} and the consistency loss L_{con} = \frac{1}{M} \sum_{m=1}^{M} \sum_{i=0}^{n-1} \| \hat{Y}_i - \tilde{Y}_i^{(m)} \|_2^2
8:   Update the parameters by gradient descent: \Theta = \Theta - \eta \nabla_\Theta (L_{sup} + \lambda L_{con})
9: end while
10: Output prediction \tilde{Y} = \Phi(A, \tilde{X}, \Theta)

Figure 6: Comparisons of ARHGA with two variants (ARHGA-1, ARHGA-2) and MAGNN on node classification (Micro-F1).

Table 1: Results (%) of node classification on three datasets.

