ACTIVE SAMPLING FOR NODE ATTRIBUTE COMPLETION ON GRAPHS

Abstract

Node attributes are a crucial kind of information on graphs, but real-world graphs often suffer from the attribute-missing problem, in which the attributes of some nodes are missing while those of the other nodes are available. Restoring the missing attributes is valuable because it benefits downstream graph learning tasks. The popular GNN is not designed for node attribute completion and is not capable of solving it. The recently proposed Structure-attribute Transformer (SAT) framework decouples the input of graph structures and node attributes through a distribution matching technique and can handle this problem properly. However, SAT treats all nodes with observed attributes equally and neglects the different contributions that different nodes make to learning. In this paper, we propose a novel active sampling algorithm (ATS) to utilize the nodes with observed attributes more efficiently and to better restore the missing node attributes. Specifically, ATS contains two metrics that measure the representativeness and uncertainty of each node's information by considering the graph structure, representation similarity, and learning bias. These two metrics are then linearly combined through a weighting scheme controlled by a Beta distribution to determine which nodes are selected into the training set in the next optimization step. ATS can be combined with the SAT framework and is learned in an iterative manner. Through extensive experiments on four public benchmark datasets and two downstream tasks, we show the superiority of ATS in node attribute completion.
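The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the min-max normalization, the choice of which metric receives the sampled weight, the Beta parameters, and the top-k rule are all assumptions made for illustration. The per-node representativeness and uncertainty scores are taken as given inputs here, since their definitions appear later in the paper.

```python
import random
from typing import Dict, List


def min_max_normalize(scores: Dict[int, float]) -> Dict[int, float]:
    """Rescale scores to [0, 1] so the two metrics are comparable (assumed step)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {node: 0.0 for node in scores}
    return {node: (s - lo) / (hi - lo) for node, s in scores.items()}


def select_nodes(
    representativeness: Dict[int, float],
    uncertainty: Dict[int, float],
    k: int,
    alpha: float,
    beta: float,
    rng: random.Random,
) -> List[int]:
    """Pick k candidate nodes by a Beta-weighted linear combination of two metrics.

    Hypothetical helper: gamma ~ Beta(alpha, beta) weights representativeness,
    and (1 - gamma) weights uncertainty. Both dicts must share the same keys.
    """
    rep = min_max_normalize(representativeness)
    unc = min_max_normalize(uncertainty)
    gamma = rng.betavariate(alpha, beta)  # sampled weight in [0, 1]
    combined = {n: gamma * rep[n] + (1.0 - gamma) * unc[n] for n in rep}
    # Highest combined score first; keep the top k nodes.
    return sorted(combined, key=combined.get, reverse=True)[:k]


# Toy scores for three candidate nodes with observed attributes.
rep_scores = {0: 0.9, 1: 0.1, 2: 0.5}
unc_scores = {0: 0.8, 1: 0.2, 2: 0.3}
chosen = select_nodes(rep_scores, unc_scores, k=2, alpha=2.0, beta=2.0,
                      rng=random.Random(0))
```

In an iterative scheme of the kind the abstract outlines, the chosen nodes would be added to the training set, the completion model would be updated, and both metrics would be recomputed before the next selection. How the Beta parameters evolve across iterations is left open in this sketch.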



Node attribute, known as a kind of important information on graphs, plays a vital role in many graph learning tasks. It boosts the performance of Graph Neural Networks (GNNs) Defferrard et al. (2016); Kipf & Welling (2017); Xu et al. (2019b); Veličković et al. (2018) in various domains, e.g., node classification Jin et al. (2021); Xu et al. (2019a) and community detection Sun et al. (2021); Chen et al. (2017). Meanwhile, node attributes provide human-perceptive demonstrations for non-Euclidean structured data Zhang et al. (2019); Li et al. (2021). Despite their indispensability, real-world graphs may have missing node attributes for various reasons Chen et al. (2022). For example, in citation graphs, the key terms or detailed content of some papers may be inaccessible because of copyright protection. In social networks, the profiles of some users may be unavailable due to privacy protection. When the attributes of only some nodes on a graph are observed, it is important to restore the missing attributes of the other nodes so as to benefit downstream graph learning tasks. This is the goal of the node attribute completion task.

Currently, there is limited work on the node attribute completion problem. Recent graph learning algorithms such as network embedding Cui et al. (2018) and GNNs are not targeted at this problem and are limited in solving it. Random walk based methods Perozzi et al. (2014); Tang et al. (2015); Grover & Leskovec (2016) are effective in learning node embeddings on large-scale graphs, but they only take the graph structure into consideration and ignore the rich information from node attributes. Attributed random walk models Huang et al. (2019); Lei Chen & Bronstein (2019) can potentially deal with this problem, but they rely on high-quality random walks and carefully designed sampling strategies, which are hard to guarantee Yang et al. (2019). The popular GNN framework takes graph structures and node attributes as a coupled input and can work on the node attribute completion problem through some attribute-filling tricks, but these tricks introduce noise in learning and degrade performance. In the last few years, researchers have begun to concentrate on the learning

