ATTRIBUTES RECONSTRUCTION IN HETEROGENEOUS NETWORKS VIA GRAPH AUGMENTATION

Abstract

Heterogeneous Graph Neural Networks (HGNNs), an effective tool for mining heterogeneous graphs, have achieved remarkable performance on node classification tasks. Yet HGNNs are limited in their mining power because they require all nodes to have complete and reliable attributes. This requirement is usually unrealistic, since the attributes of many real-world nodes are missing or defective. Existing methods usually complete missing attributes with imputation schemes that ignore topology information, leading to suboptimal performance. Some graph augmentation techniques improve the quality of attributes, but few of them are designed for heterogeneous graphs. In this work, we study data augmentation on heterogeneous graphs, tackling missing and defective attributes simultaneously, and propose a novel generic architecture, Attributes Reconstruction in Heterogeneous networks via Graph Augmentation (ARHGA), which comprises random sampling, attribute augmentation, and consistency training. In graph augmentation, to keep the reconstructed attributes plausible and accurate, an attention mechanism reconstructs attributes under the guidance of the topological relationships between nodes. The proposed architecture can be easily combined with any GNN-based heterogeneous model and improves its performance. Extensive experiments on three benchmark datasets demonstrate the superior performance of ARHGA over state-of-the-art baselines on semi-supervised node classification.

1. INTRODUCTION

Heterogeneous information networks (HINs) (Yang et al. (2020); Shi et al. (2016); Shen et al. (2017)), which contain multiple types of nodes and edges, have been widely used to model complex systems and solve practical problems. Recently, heterogeneous graph neural networks (HGNNs) have emerged as prevalent deep learning architectures for analyzing HINs and have shown superior performance in various graph analytical tasks, such as node classification (Wang et al. (2019a); Yun et al. (2019)) and link prediction (Fu et al. (2020); Zhang et al. (2019)). Most HGNNs follow a message-passing scheme in which each node updates its embedding by aggregating information from its neighbors' attributes. Such a scheme usually requires that all nodes have complete and reliable attributes, a condition that is not always satisfied in practice: resource limitations and personal privacy lead to missing and defective attributes. In heterogeneous graphs, attribute missing generally means that the attributes of some nodes are entirely absent, and it is more frequent and more complex than in homogeneous graphs. Take DBLP (Sun & Han (2013)) as an example: the network has four types of nodes (author, paper, term and venue) and three types of links. Only paper nodes have attributes, which are extracted from the keywords in their titles, while the other types of nodes have no attributes. This impairs the effectiveness of the corresponding graph mining model to a certain extent. In addition, the original attributes of nodes are sometimes not ideal, since heterogeneous graphs are extracted from complex systems that are inevitably subject to various forms of contamination, such as mistakes and adversarial attacks; these cause errors to propagate and greatly affect the message-passing process. This suggests the need for effective approaches that can complete missing attributes and calibrate defective attributes in heterogeneous graphs simultaneously.
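The effect of missing attributes on message passing can be illustrated with a minimal sketch. It assumes a plain mean aggregator (not any specific HGNN), a hypothetical four-node DBLP-like toy graph with made-up attribute values, and zero-filling of missing attributes; all names below are illustrative and do not come from the paper.

```python
# Toy illustration (assumed setup): one mean-aggregation message-passing step
# on a DBLP-like graph in which only paper nodes carry attributes.

ATTR_DIM = 2

# Hypothetical attributes; None marks a node whose attributes are entirely missing.
attributes = {
    "paper_1": [1.0, 0.0],
    "paper_2": [0.0, 1.0],
    "author_1": None,
    "venue_1": None,
}

# Undirected adjacency lists of the toy graph.
neighbors = {
    "paper_1": ["author_1", "venue_1"],
    "paper_2": ["author_1", "venue_1"],
    "author_1": ["paper_1", "paper_2"],
    "venue_1": ["paper_1", "paper_2"],
}


def zero_fill(attrs, dim):
    """Replace missing attribute vectors with all-zero vectors."""
    return {
        node: list(vec) if vec is not None else [0.0] * dim
        for node, vec in attrs.items()
    }


def mean_aggregate(features, adjacency):
    """Average each node's own feature vector with those of its neighbors."""
    updated = {}
    for node, nbrs in adjacency.items():
        group = [features[node]] + [features[n] for n in nbrs]
        updated[node] = [sum(column) / len(group) for column in zip(*group)]
    return updated


features = zero_fill(attributes, ATTR_DIM)
hidden = mean_aggregate(features, neighbors)

for node, vec in hidden.items():
    print(node, vec)
```

In this toy setting the zero-filled author and venue nodes contribute nothing to the papers' updates, so each paper's own signal is scaled down to one third, and the author and the venue end up with identical embeddings even though they are nodes of different types.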
To alleviate the effect of missing attributes, existing models usually adopt an imputation strategy, such as the neighbors' average or a one-hot vector, as done in MAGNN (Fu et al. (2020)). These imputation methods are suboptimal because graph structure information is ignored and little useful information is provided, hampering subsequent analysis. An alternative technique is to consider graph topology information and inject it into the completion model. The works of Jin et al. (2021) and He et al. (2022) have shown a significant boost on node classification tasks. But both methods assume that the original attributes are reliable, which is hard to satisfy in real-world applications. In another line of research, some graph augmentation techniques calibrate the original attributes to improve their quality and have shown promising performance (Xu et al. (2022); Zhu et al. (2021)). However, these methods fall short on heterogeneous graphs because they cannot encode complex interactions. Further, existing methods either only complete missing attributes or only improve the quality of attributes, while it is worth solving both problems at the same time.

In this paper, we attempt to deal with missing and defective attributes simultaneously in heterogeneous graphs, and propose a novel framework for Attributes Reconstruction in Heterogeneous networks via Graph Augmentation (ARHGA). ARHGA repeatedly samples nodes to perform attribute augmentation, obtaining multiple augmented attributes for each node, and then uses consistency training (Xie et al. (2020)) to make the outputs of different augmentations as similar as possible. Moreover, to make the augmented attributes more accurate, node topological embeddings are learned through HIN-embedding methods (Dong et al. (2017); Fu et al. (2017); Shang et al. (2016); Wang et al. (2019b)) to capture graph structure information as guidance. In this way, ARHGA effectively enhances the performance of existing GNN-based heterogeneous models with the aid of the reconstructed attributes.

Contributions. In summary, the main contributions of this paper are as follows:
• We propose a generic architecture of graph augmentation on heterogeneous networks for attribute reconstruction, addressing both missing and defective attributes.
• We design an effective attribute-wise augmentation strategy, implemented with an attention mechanism, which integrates topology information to increase the reliability of the reconstructed attributes.
• Extensive experimental results on three node classification benchmark datasets demonstrate the effectiveness of our proposed model.

2. RELATED WORK

Heterogeneous graph neural networks. Heterogeneous graphs have been widely used to solve real-world problems owing to the diversity of node types and of relationships between nodes. Recently, many HGNNs have been proposed to analyze HINs. HAN (Wang et al. (2019a)) learns node representations using node-level attention and meta-path-level attention. Fu et al. (2020) further consider the intermediate nodes and propose a node-type-specific transformation, then use a hierarchical attention structure similar to that of HAN. Graph Transformer Networks (Yun et al. (2019)) generate a new graph by identifying useful connections between unconnected nodes in the original graph, then perform graph convolution on the new graph. Zhang et al. (2019) sample heterogeneous neighbors for each node by random walk and aggregate node-level and type-level information. Hu et al. (2020) design meta-relation-based mutual attention to handle graph heterogeneity and implicitly learn meta-paths. Nevertheless, these methods require that all nodes have complete and reliable attributes, and they encounter difficulties on heterogeneous graphs with missing and defective attributes.

Learning with missing attributes. Previous handcrafted approaches to filling in missing attributes rarely consider graph topology information, yielding uninformative attributes and compromising model performance. Several deep learning models have been explored. You et al. (2020) design a unified framework in which attribute completion is modeled as an edge-level prediction task and label prediction as a node-level prediction task. Chen et al. (2020) develop a novel GNN framework that performs the link prediction task and the attribute completion task based on distribution matching.
Despite their success, these models are not suitable for HINs with missing attributes because they do not address the complex interactions in such networks. Recent advances in HGNNs have provided new directions for solving the problem. Jin et al. (2021) attempt to complete missing attributes in a learnable manner and propose the framework HGNN-AC, which consists of pre-training of topological embeddings and attribute completion with an attention mechanism. He et al. (2022) design an unsupervised heterogeneous graph contrastive learning approach for analyzing HINs with missing attributes. Both

