ACTIVE SAMPLING FOR NODE ATTRIBUTE COMPLETION ON GRAPHS

Abstract

Node attributes are a crucial kind of information on graphs, but real-world graphs often suffer from an attribute-missing problem: the attributes of some nodes are unavailable while those of the remaining nodes are observed. Restoring the missing attributes is meaningful because it benefits downstream graph learning tasks. Popular GNNs are not designed for this node attribute completion problem and cannot solve it directly. The recently proposed Structure-attribute Transformer (SAT) framework decouples the input of graph structures and node attributes with a distribution matching technique and can handle the task properly. However, SAT treats all nodes with observed attributes equally and neglects the different contributions of different nodes to learning. In this paper, we propose a novel active sampling algorithm (ATS) to utilize the attribute-observed nodes more efficiently and better restore the missing node attributes. Specifically, ATS contains two metrics that measure the representativeness and uncertainty of each node's information by considering the graph structures, representation similarity and learning bias. The two metrics are then linearly combined by a Beta distribution controlled weighting scheme to determine which nodes are selected into the train set in the next optimization step. ATS can be combined with the SAT framework and learned in an iterative manner. Through extensive experiments on 4 public benchmark datasets and two downstream tasks, we show the superiority of ATS in node attribute completion.

1. INTRODUCTION

Node attributes, an important kind of information on graphs, play a vital role in many graph learning tasks. They boost the performance of Graph Neural Networks (GNNs) Defferrard et al. (2016); Kipf & Welling (2017); Xu et al. (2019b); Veličković et al. (2018) in various domains, e.g. node classification Jin et al. (2021); Xu et al. (2019a) and community detection Sun et al. (2021); Chen et al. (2017). Meanwhile, node attributes provide human-perceptive demonstrations for the non-Euclidean structured data Zhang et al. (2019); Li et al. (2021). Despite their indispensability, real-world graphs may have missing node attributes for various reasons Chen et al. (2022). For example, in citation graphs, key terms or detailed content of some papers may be inaccessible because of copyright protection. In social networks, profiles of some users may be unavailable due to privacy protection. When only some nodes' attributes are observed, it is significant to restore the missing attributes of the other nodes so as to benefit downstream graph learning tasks. This is the goal of the node attribute completion task. Currently, there are limited works on the node attribute completion problem. Recent graph learning algorithms such as network embedding Cui et al. (2018) and GNNs are not targeted at this problem and are limited in solving it. Random walk based methods Perozzi et al. (2014); Tang et al. (2015); Grover & Leskovec (2016) are effective in learning node embeddings on large-scale graphs. However, they only take the graph structures into consideration and ignore the rich information in node attributes. Attributed random walk models Huang et al. (2019); Lei Chen & Bronstein (2019) can potentially deal with this problem, but they rely on high-quality random walks and carefully designed sampling strategies that are hard to guarantee Yang et al. (2019).
The popular GNN framework takes graph structures and node attributes as a coupled input and can work on the node attribute completion problem through attribute-filling tricks, but these tricks introduce noise into learning and degrade performance. In the last few years, researchers have begun to concentrate on learning on attribute-missing graphs. Chen et al. (2022) propose a novel structure-attribute transformer (SAT) framework that can handle node attribute completion. SAT leverages structures and attributes in a decoupled scheme and achieves joint distribution modeling by matching the latent codes of structures and attributes. Although SAT has shown great promise on the node attribute completion problem, it leverages the nodes with observed attributes in an equally-treated manner and ignores the different contributions of nodes in the learning schedule. Given limited nodes with observed attributes, it is important to notice that different nodes carry different information (e.g. degrees, neighbours, etc.) and should have different importance in the learning process. Importance re-weighting Wang et al. (2017); Fang et al. (2020); Byrd & Lipton (2019) on the optimization objective may come to mind as a potential solution. However, the information of nodes is mutually influenced and exhibits complex patterns: the importance distribution is implicit, intractable and rather complicated, which makes its formulation hard to design. It is challenging to find a more practical way to exert the different importance of the attribute-observed nodes at different learning stages. In this paper, we propose an active sampling algorithm named ATS to better leverage the attribute-observed nodes and help the SAT model converge to a more desirable state.
In particular, ATS measures the representativeness and uncertainty of node information on graphs to adaptively and gradually select nodes from the candidate set into the train set after each training epoch, and thus encourages the model to consider node importance in learning. The representativeness and uncertainty are designed by considering the graph structures, representation similarity and learning bias. Furthermore, it is interesting to find that learning prefers nodes of high representativeness and low uncertainty at the early stage, but low representativeness and high uncertainty at the late stage. Therefore, we propose a Beta distribution controlled weighting scheme to exert adaptive learning weights on representativeness and uncertainty. In this way, the two metrics are linearly combined as the final score to determine which nodes are selected into the train set in the next optimization epoch. The active sampling algorithm (ATS) and the SAT model are learned in an iterative manner until the model converges. Our contributions are summarized as follows:
• In node attribute completion, to better leverage the attribute-observed nodes, we advocate an active sampling algorithm that adaptively and gradually selects samples into the train set in each optimization epoch and helps the model converge to a better state.
• We propose a novel ATS algorithm that measures node importance with designed representativeness and uncertainty metrics. When combining these two metrics into the final score function, we propose a Beta distribution controlled weighting scheme to better exert the power of representativeness and uncertainty in learning.
• We combine ATS with SAT, a recent node attribute completion model, and conduct extensive experiments on 4 public benchmarks.
Through the experimental results, we show that our ATS algorithm helps SAT reach a better optimum and restore higher-quality node attributes that benefit downstream node classification and profiling tasks. GNN performs a message-passing scheme, which is reminiscent of standard convolution as in Graph Convolutional Networks (GCN) Kipf & Welling (2017). GNN can infer the distribution of nodes based on node attributes and edges and achieves impressive results on graph-related tasks. There are also numerous creative modifications of GNN.

2. RELATED WORK

Nevertheless, most of today's popular active sampling algorithms on graphs aim to solve the node classification task and focus on reducing the annotation cost. For the node attribute completion task, since the attribute-observed nodes are limited and the dimension of node attributes is much higher than the number of node classes, we demand a more advanced active sampling algorithm that helps the primary model utilize the attribute-observed nodes more efficiently and learn the complicated attribute distribution better. In addition, current query strategies measure uncertainty in an unsupervised manner, whereas we propose a supervised one to make the sampling closer to the primary model.

3.1. PROBLEM DEFINITION

For the node attribute completion task, we denote G = (V, A, X) as a graph with node set V = {v_1, v_2, ..., v_N}, adjacency matrix A ∈ R^{N×N} and node attribute matrix X ∈ R^{N×F}. V^o = {v^o_1, v^o_2, ..., v^o_{N_o}} is the set of attribute-observed nodes, with attribute information X^o = {x^o_1, x^o_2, ..., x^o_{N_o}} and structural information A^o = {a^o_1, a^o_2, ..., a^o_{N_o}}. V^u = {v^u_1, v^u_2, ..., v^u_{N_u}} is the set of attribute-missing nodes, with attribute information X^u = {x^u_1, x^u_2, ..., x^u_{N_u}} and structural information A^u = {a^u_1, a^u_2, ..., a^u_{N_u}}. More specifically, V = V^u ∪ V^o, V^u ∩ V^o = ∅, and N = N_o + N_u. We aim to complete the missing node attributes X^u based on the observed node attributes X^o and the structural information A.

For the active sampling algorithm, we denote the total training set as T, in which the node attributes are known. The current training set of the SAT model is T_L and the candidate set is T_U, with T = T_L ∪ T_U. We design a sampling strategy named ATS which iteratively transfers the most suitable candidate nodes from T_U to T_L to boost the training efficiency of SAT, until T_U = ∅ and the model converges.

3.2. STRUCTURE-ATTRIBUTE TRANSFORMER

Since we combine SAT with our ATS to demonstrate how ATS works, we briefly introduce SAT here. The general architecture of SAT is shown in Figure 1. SAT inputs structures and attributes in a decoupled manner, and matches the joint distribution of structures and attributes by a paired structure-attribute matching and an adversarial distribution matching. During the paired structure-attribute matching, a structure encoder E_A (a two-layer GNN such as GCN) and an attribute encoder E_X (a two-layer MLP) encode the structural information a_i and the attribute information x_i into z_a and z_x, respectively. Two decoders D_A and D_X then decode z_a and z_x into the structures a_i and attributes x_i in both parallel and cross ways. Encoders and decoders are parameterized by ϕ and θ, respectively. The joint reconstruction loss L_r of SAT can be written as:

$$\min_{\theta_x,\theta_a,\phi_x,\phi_a} \mathcal{L}_r = -\tfrac{1}{2}\,\mathbb{E}_{x_i}\!\big[\mathbb{E}_{q_{\phi_x}(z_x|x_i)}[\log p_{\theta_x}(x_i|z_x)]\big] -\tfrac{1}{2}\,\mathbb{E}_{a_i}\!\big[\mathbb{E}_{q_{\phi_a}(z_a|a_i)}[\log p_{\theta_a}(a_i|z_a)]\big] -\tfrac{1}{2}\lambda_c\,\mathbb{E}_{a_i}\!\big[\mathbb{E}_{q_{\phi_a}(z_a|a_i)}[\log p_{\theta_x}(x_i|z_a)]\big] -\tfrac{1}{2}\lambda_c\,\mathbb{E}_{x_i}\!\big[\mathbb{E}_{q_{\phi_x}(z_x|x_i)}[\log p_{\theta_a}(a_i|z_x)]\big] \quad (1)$$

where q_{ϕ_x} and q_{ϕ_a} are the encoders and p_{θ_x} and p_{θ_a} are the decoders. The first two terms in Eq. 1 represent the self-reconstruction stream: the latent variables z_x, z_a are decoded to X̂^o and Â by the two-layer MLP decoders D_X and D_A, respectively. The last two terms form the cross-reconstruction stream, where z_x and z_a are decoded to Â and X̂^o, respectively. During the adversarial distribution matching, SAT matches the posterior distributions q_{ϕ_x}(z_x|x_i) and q_{ϕ_a}(z_a|a_i) to a Gaussian prior p(z) ∼ N(0, 1). Inspired by Makhzani et al. (2015), SAT adopts an efficient adversarial matching approach between z_x, z_a and samples from the Gaussian distribution. The adversarial distribution matching loss L_adv can be written as a minimax game:

$$\min_{\psi}\max_{\phi_x,\phi_a} \mathcal{L}_{adv} = -\mathbb{E}_{z_p\sim p(z)}[\log D(z_p)] - \mathbb{E}_{z_x\sim q_{\phi_x}(z_x|x_i)}[\log(1-D(z_x))] - \mathbb{E}_{z_p\sim p(z)}[\log D(z_p)] - \mathbb{E}_{z_a\sim q_{\phi_a}(z_a|a_i)}[\log(1-D(z_a))] \quad (2)$$

where ψ denotes the parameters of the shared discriminator D. In summary, the objective function of SAT is:

$$\min_{\Theta}\max_{\Phi} \mathcal{L} = \mathcal{L}_r + \mathcal{L}_{adv} \quad (3)$$

where Θ = {θ_x, θ_a, ϕ_x, ϕ_a, ψ} and Φ = {ϕ_x, ϕ_a}. In the training phase of the node attribute completion task, SAT minimizes the reconstruction loss between Â, X̂^o and A, X^o in Eq. 1, as well as the adversarial loss in Eq. 2. In testing, it encodes the structural information A^u of attribute-missing nodes with the encoder E_A and restores their missing attributes X^u with the decoder D_X.
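To make the structure of Eq. 1 concrete, the following toy sketch computes the four-term reconstruction loss for a single node. It assumes, purely for illustration, Bernoulli decoders given by fixed linear maps; `dec_x`, `dec_a` and the dimensions are hypothetical stand-ins for SAT's trained MLP decoders, not the paper's implementation.

```python
import numpy as np

def bernoulli_log_lik(target, logits):
    # Stable log-likelihood of binary targets under a Bernoulli decoder:
    # sum_d [ target_d * l_d - log(1 + exp(l_d)) ].
    return float(np.sum(target * logits - np.logaddexp(0.0, logits)))

def joint_reconstruction_loss(x, a, z_x, z_a, dec_x, dec_a, lambda_c=0.5):
    # Two self-reconstruction terms (x from z_x, a from z_a) plus two
    # lambda_c-weighted cross-reconstruction terms (x from z_a, a from z_x),
    # mirroring the four terms of Eq. 1 for one node.
    self_terms = bernoulli_log_lik(x, dec_x(z_x)) + bernoulli_log_lik(a, dec_a(z_a))
    cross_terms = bernoulli_log_lik(x, dec_x(z_a)) + bernoulli_log_lik(a, dec_a(z_x))
    return -0.5 * self_terms - 0.5 * lambda_c * cross_terms

# Hypothetical linear decoders sharing a 4-dimensional latent space.
rng = np.random.default_rng(0)
W_x, W_a = rng.normal(size=(4, 8)), rng.normal(size=(4, 6))
dec_x = lambda z: z @ W_x  # logits over F = 8 attribute dimensions
dec_a = lambda z: z @ W_a  # logits over 6 structure dimensions
x = rng.integers(0, 2, size=8).astype(float)
a = rng.integers(0, 2, size=6).astype(float)
z_x, z_a = rng.normal(size=4), rng.normal(size=4)
loss = joint_reconstruction_loss(x, a, z_x, z_a, dec_x, dec_a)
```

The loss is non-negative here because each Bernoulli log-likelihood is non-positive; in SAT the expectation additionally runs over nodes and over the encoders' posteriors.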

4. METHOD

We design the query strategy of ATS by measuring the representativeness and uncertainty of the candidate nodes. We then combine the uncertainty and representativeness scores into a final score with an adaptive re-weighting scheme and select the nodes with the highest scores for the next learning epoch. We explain these components in the following parts.

4.1. QUERY STRATEGY OF ATS

Representativeness: The major and typical patterns among nodes are vital for the model to converge in the right direction. In this section, we introduce representativeness as a sampling metric. It is composed of two parts: 1) information density ϕ_density and 2) structural centrality ϕ_centrality. The former measures the similarity between the latent vectors of attributes and structures; the latter indicates how closely a node is connected to its neighbours on the graph. In other words, the information density is inspired by the good representation learning ability of SAT, while the structural centrality naturally mines the information in the graph structures. Together they offer a comprehensive analysis of representativeness in both implicit and explicit ways. We first focus on the information density. SAT proposes a shared-latent space assumption for node attribute completion, so we can study node similarities through the features learned by the model. If the representation vectors are densely distributed in a local region of the latent space, the corresponding nodes have more similar features and this region contains more mainstream information, so we prefer to train these more representative nodes first. Although both attribute embeddings and structure embeddings live in the shared latent space, ATS only uses the structure embeddings z_{a_i} to calculate ϕ_density in Eq. 4, since we rely on the structural representations to restore the missing node attributes. To find the central nodes located in high-density regions, we run the K-means algorithm in the latent space and calculate the Euclidean distance between each node and its clustering center.
Given d as the Euclidean (l_2) distance and C_{z_{a_i}} as the clustering center of z_{a_i} in the latent space, ϕ_density is written as:

$$\phi_{density}(v_i) = \frac{1}{1 + d(z_{a_i}, C_{z_{a_i}})}, \quad v_i \in T_U \quad (4)$$

The larger ϕ_density is, the more representative the node is: it carries more mainstream features that are worthy of the model's attention. Besides the feature analysis in the latent space, node representativeness can also be inferred from the explicit graph structures. We can study the connections between nodes and design a metric for node centrality based on the structural information. Intuitively, centrality is positively correlated with the number of neighbours. If the model focuses on such nodes at the early stage of training, it learns the approximate data distribution faster and reduces the influence of noisy nodes. The PageRank algorithm Page et al. (1999) is an effective random-walk method to acquire the visiting probabilities of nodes. We use the PageRank score as the structural centrality ϕ_centrality:

$$\phi_{centrality}(v_i) = \rho \sum_j \frac{A_{ij}\,\phi_{centrality}(v_j)}{\sum_k A_{jk}} + \frac{1-\rho}{N_U}, \quad v_i \in T_U \quad (5)$$

where N_U is the number of nodes in T_U and ρ is the damping parameter. The larger ϕ_centrality is, the more representative the node is: it is more closely associated with its neighbours. Uncertainty: Uncertainty reflects the learning state of the current model towards the nodes. When the model is reliable, it is reasonable to pay more attention to the nodes that have not been sufficiently learned. Uncertainty is a commonly used query criterion in active learning. However, as mentioned before, the uncertainty in other sampling algorithms Cai et al. (2017); Caramalau et al. (2021); Zhang et al. (2022a) usually serves node classification and is designed in an unsupervised manner to reduce the annotation cost.
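As a sketch of the two representativeness metrics, the snippet below implements Eq. 4 with a plain Lloyd-style k-means and Eq. 5 by power iteration. The function names and the tiny toy inputs are our own illustrations under stated assumptions, not part of the ATS implementation.

```python
import numpy as np

def density_scores(z, k=2, iters=20, seed=0):
    # phi_density (Eq. 4): cluster the structure embeddings z with k-means,
    # then score each node by 1 / (1 + distance to its cluster centre).
    rng = np.random.default_rng(seed)
    centres = z[rng.choice(len(z), size=k, replace=False)].copy()
    for _ in range(iters):  # plain Lloyd iterations
        assign = ((z[:, None] - centres[None]) ** 2).sum(-1).argmin(axis=1)
        for c in range(k):
            if np.any(assign == c):  # keep the old centre if a cluster empties
                centres[c] = z[assign == c].mean(axis=0)
    dist = np.linalg.norm(z - centres[assign], axis=1)
    return 1.0 / (1.0 + dist)

def centrality_scores(A, rho=0.85, iters=50):
    # phi_centrality (Eq. 5): damped PageRank computed by power iteration.
    n = len(A)
    out_deg = A.sum(axis=1, keepdims=True)
    P = np.divide(A, out_deg, out=np.full_like(A, 1.0 / n), where=out_deg > 0)
    phi = np.full(n, 1.0 / n)
    for _ in range(iters):
        phi = rho * (P.T @ phi) + (1.0 - rho) / n
    return phi

# Toy check: nodes in a dense latent cluster score high on density,
# and the hub of a star graph scores highest on centrality.
z = np.vstack([np.zeros((5, 2)), [[10.0, 10.0], [12.0, 10.0]]])
A = np.array([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]], float)
dens = density_scores(z)
cent = centrality_scores(A)
```

In the toy data, the five coincident points sit exactly on their cluster centre (score 1), while the two outliers sit one unit from theirs (score 0.5); the star's hub receives the largest PageRank mass.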
In this paper, for the node attribute completion task, in order to know the training status of the model more accurately, we use the observed attributes and structures as supervision and take the learning loss of SAT as the uncertainty metric, denoted ϕ_entropy(v_i):

$$\phi_{entropy}(v_i) = \mathcal{L}_r(v_i) + \mathcal{L}_{adv}(v_i), \quad v_i \in T_U \quad (6)$$

We feed the attributes of candidate nodes and the corresponding graph structures into SAT and obtain their loss values. The larger ϕ_entropy(v_i) is, the more uncertain node v_i is. From the perspective of information theory, nodes with greater uncertainty contain more information. Sampling these nodes helps the model acquire information that has not been learned in previous training, thus enhancing the training efficiency.

4.2. SCORE FUNCTION AND BETA DISTRIBUTION CONTROLLED WEIGHTING SCHEME

We have presented three metrics for our query strategy. A question then arises: how do we combine these metrics to score each node? Combining the metrics with a weighted sum is a possible solution but still faces difficulties. First, the values of different metrics are incomparable because of their distinct dimensional units. Second, different metrics may take effect at different learning stages. To solve these issues, we introduce a percentile evaluation and design a Beta distribution controlled re-weighting scheme, since the Beta distribution is a suitable model for the random behavior of percentages and proportions Gupta & Nadarajah (2004). Denote P_ϕ(v_i, T_U) as the percentage of candidate nodes in T_U that have smaller values than node v_i under metric ϕ. For example, if there are 5 candidate nodes and the scores of one metric are [1, 2, 3, 4, 5], the percentiles of the corresponding nodes are [0, 0.2, 0.4, 0.6, 0.8]. We apply the percentile to the three metrics and define the final score function of ATS as:

$$S(v_i) = \alpha \cdot P_{entropy}(v_i, T_U) + \beta \cdot P_{density}(v_i, T_U) + \gamma \cdot P_{centrality}(v_i, T_U) \quad (7)$$

where α + β + γ = 1. It is worth noting that the uncertainty and the information density are determined by the training results returned from SAT. At an early training stage, the model is unstable and the returned results may not be reliable; sampling based on inaccurate model-returned results may lead to undesirable selections. Hence, we make the weights time-sensitive. The structure-related weight γ is more credible and can be larger initially. As the training epoch increases, the model can pay more attention to ϕ_entropy and ϕ_density, while γ decreases gradually. We formalize this by sampling γ from a Beta distribution whose expectation becomes smaller as the training epoch increases.
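The percentile evaluation and the combined score of Eq. 7 can be sketched in a few lines (the function names are ours):

```python
def percentile(scores):
    # P_phi(v_i, T_U): fraction of candidates with a strictly smaller value.
    n = len(scores)
    return [sum(s < v for s in scores) / n for v in scores]

def final_scores(entropy, density, centrality, alpha, beta, gamma):
    # S(v_i) = alpha*P_entropy + beta*P_density + gamma*P_centrality (Eq. 7).
    pe, pd, pc = percentile(entropy), percentile(density), percentile(centrality)
    return [alpha * e + beta * d + gamma * c for e, d, c in zip(pe, pd, pc)]

print(percentile([1, 2, 3, 4, 5]))  # [0.0, 0.2, 0.4, 0.6, 0.8], as in the text
```

Because every metric is mapped to a rank-based percentage in [0, 1), the three terms become directly comparable regardless of their original units.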
The weighting values are defined as:

$$\gamma \sim Beta(1, n_t), \quad n_t = \frac{n_e}{\epsilon}, \quad \alpha = \beta = \frac{1-\gamma}{2} \quad (8)$$

where n_t is one of the shape parameters of the Beta distribution, ϵ controls the expectation of γ, and n_e denotes the current number of epochs. We estimate the expectation by averaging 10,000 random samples.
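Reading Eq. 8 as n_t = n_e/ϵ, so that E[γ] = 1/(1 + n_e/ϵ) stays above 0.5 as long as ϵ exceeds the total number of sampling steps, the weighting scheme needs only the Python standard library. The epoch and ϵ values below are illustrative, not the paper's settings.

```python
import random

def beta_weights(n_e, eps):
    # gamma ~ Beta(1, n_t) with n_t = n_e / eps (n_e >= 1); alpha and beta
    # split the remaining mass so that alpha + beta + gamma = 1.
    gamma = random.betavariate(1.0, n_e / eps)
    alpha = beta = (1.0 - gamma) / 2.0
    return alpha, beta, gamma

# As in the paper, the expectation of gamma can be estimated by averaging
# 10,000 random samples; analytically E[gamma] = 1 / (1 + n_e/eps).
random.seed(0)
draws = [random.betavariate(1.0, 100 / 1500) for _ in range(10_000)]
mean_gamma = sum(draws) / len(draws)  # close to 15/16 = 0.9375 at epoch 100
```

Early in training γ is close to 1, favouring the structure-related centrality term; as n_e grows, more weight shifts to the model-dependent entropy and density terms.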

4.3. ITERATIVE TRAINING AND IMPLEMENTATION

In general, our method consists of two stages: SAT is responsible for the training stage, and ATS is responsible for the sampling stage. Before training, we divide the total training set T into T_U and T_L: we randomly sample 1% of the nodes in T as the initial nodes of T_L, and the rest compose T_U. SAT is trained on the changeable T_L. Once SAT finishes a training epoch, ATS starts the sampling process: the most representative and informative candidate nodes are sampled from T_U according to the query strategy, added to T_L, and removed from T_U. SAT is then trained on the renewed T_L in the next epoch. The training stage and the sampling stage alternate until T_U is empty; then ATS terminates and SAT continues training to convergence. The learning process is clarified in Algorithm 1. Evaluation metrics: In node attribute completion, the restored attributes provide side information for nodes and benefit downstream tasks. Following SAT Chen et al. (2022), we study the effect of ATS on two downstream tasks: node classification at the node level and profiling at the attribute level. In node classification, the restored attributes serve as a kind of data augmentation and supply more information to the downstream classifier. In profiling, we aim to predict the possible profile (e.g. key terms of papers in Cora) in each attribute dimension.
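The alternating schedule of the training and sampling stages can be sketched as the loop below, where `train_epoch` and `score_fn` are placeholders for a SAT training step and the ATS scoring of Eq. 7 (both stubbed out here); the 1% seeding and the renewal of T_L and T_U follow the description above.

```python
import random

def train_with_active_sampling(nodes, train_epoch, score_fn, per_round=5, seed=0):
    # Seed T_L with 1% of T, then alternate one SAT training epoch with one
    # ATS sampling step until the candidate set T_U is empty.
    rng = random.Random(seed)
    t = list(nodes)
    t_l = set(rng.sample(t, max(1, len(t) // 100)))  # initial 1% of T
    t_u = set(t) - t_l
    epoch = 0
    while t_u:
        epoch += 1
        train_epoch(sorted(t_l), epoch)        # SAT trains on the current T_L
        cand = sorted(t_u)
        scores = score_fn(cand, epoch)         # ATS scores the candidates
        ranked = sorted(zip(scores, cand), reverse=True)
        selected = {v for _, v in ranked[:per_round]}
        t_l |= selected                        # T_L <- T_L ∪ T_S
        t_u -= selected                        # T_U <- T_U \ T_S
    return t_l

# Toy run with a stub trainer and random scores.
history = []
final = train_with_active_sampling(
    range(50),
    train_epoch=lambda tl, e: history.append(len(tl)),
    score_fn=lambda cand, e: [random.random() for _ in cand],
)
```

After T_U empties, the real pipeline keeps training SAT on the full T until convergence.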

5. EXPERIMENTS AND ANALYSIS

Parameter settings: In the experiments, we randomly sample 40% of the nodes, with attributes, as training data, 10% of the nodes as validation data, and the rest as test data. The attributes of validation and test nodes are unobserved during training. For the baselines, the parameter settings and the experimental results follow Chen et al. (2022). For our ATS method, SAT's settings (e.g. λ_c) remain the same. We mainly have two hyper-parameters: ϵ in the weighting scheme and the number of clusters in the estimation of ϕ_density. Considering the objective of the Beta distribution weighting scheme, ϵ should be larger than the total number of sampling steps. Hence we set ϵ = 1500 on Cora and Citeseer, and ϵ = 2000 on Amazon Photo and Amazon Computer. In addition, we set the cluster number to 10, 15, 10 and 15 for Cora, Citeseer, Amazon Photo and Amazon Computer, respectively.

5.3.1. NODE CLASSIFICATION

Classification is an effective downstream task to test the quality of the recovered attributes. In the node classification task, the nodes with restored attributes are split into 80% training data and 20% test data. We then conduct five-fold cross-validation 10 times and take the average of the evaluation metrics as the model performance. We use two supervised classifiers: MLP and GCN. The MLP classifier is composed of two fully-connected layers and classifies nodes based on attributes alone. The GCN classifier is an end-to-end graph representation learning model that learns from the structure and attributes simultaneously. Results are shown in Table 1. According to the "X" rows, where only node attributes are used, SAT optimized with our ATS algorithm achieves an obvious improvement over the original SAT model. ATS also adapts to SAT with different GNN backbones (e.g. GCN and GAT) and achieves higher classification accuracy than the original models. For the "A+X" rows, where both structures and node attributes are used by a GCN classifier, our method achieves the highest scores on Citeseer and Amazon Computer, with gains of 0.84% and 0.56% respectively, because ATS contains the density metric and helps the model better learn the inner semantic structures.

5.3.2. PROFILING

The model outputs the restored attributes in different dimensions with probabilities; higher probabilities for the ground-truth attributes signify better performance. In this section, we use two common metrics, Recall@k and NDCG@k, to evaluate profiling performance. The results are shown in Table 2. On top of the advantages that SAT establishes over the other baselines, the combination of the ATS algorithm and the SAT model (ATS+SAT) obtains even higher performance on almost all evaluation metrics and datasets. For example, ATS+SAT(GAT) obtains a relative 13.5% gain in Recall@10 and a relative 13.3% gain in NDCG@10 on Citeseer compared with SAT(GAT). The main reason is that the active sampling algorithm ATS helps the SAT model realize the different importance of different nodes in learning, and thus facilitates better distribution modeling of the high-dimensional node attributes.

5.4. STUDY OF THE WEIGHTING SCHEME

Besides the active sampling metrics, the Beta distribution controlled weighting scheme is also a highlight of the ATS algorithm. We verify the effectiveness of our proposed scheme in comparison with other weighting schemes, namely a fixed weighting scheme and a linear variation weighting scheme. For the fixed scheme, γ takes the values 0.2, 1/3 and 0.6, with α = β = (1-γ)/2. For the linear variation scheme, γ decreases linearly from 1 to 0.5 or from 1 to 0.

Table 2: Profiling of the attribute-level evaluation for node attribute completion.

From Figure 2, we see that our proposed weighting scheme outperforms the other schemes because the Beta distribution changes the weights dynamically during the sampling process while retaining some randomness that improves the robustness of the algorithm.

A APPENDIX

A.1 DETAILS ABOUT THE BASELINES

NeighAggre is an intuitive attribute aggregation algorithm. It completes a node's missing attributes by averaging its neighbour nodes' attributes, a simple but efficient way to exploit the structural information. VAE is a well-known generative model consisting of an encoder and a decoder. For test nodes without attributes, the encoder generates the corresponding latent code through neighbour aggregation, and the decoder then restores the missing attributes. GCN, GraphSage and GAT are three typical graph representation learning methods. In the attribute-missing scenario, only the graph structure is encoded into latent codes, and the missing attributes are recovered by the decoders of these GNN methods from the latent codes generated by the encoders. Hers is a cold-start recommendation method. GraphRNA and ARWMF are two attributed random walk based methods for learning node representations, which can be extended to deal with the missing-attribute problem. They separate the graph structure and node attributes and learn the node embeddings by random walks.
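As an illustration of the NeighAggre baseline described above, the following sketch (our own minimal version, not the baseline's original code) completes a missing attribute vector by averaging the observed attribute vectors of the node's neighbours:

```python
def neigh_aggre(adj, attrs):
    # attrs[i] is None for attribute-missing nodes; each missing vector is
    # replaced by the mean of the observed vectors of its neighbours.
    completed = list(attrs)
    for i, x in enumerate(attrs):
        if x is not None:
            continue
        obs = [attrs[j] for j, e in enumerate(adj[i]) if e and attrs[j] is not None]
        if obs:
            dim = len(obs[0])
            completed[i] = [sum(v[d] for v in obs) / len(obs) for d in range(dim)]
    return completed

adj = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]  # node 0 is linked to nodes 1 and 2
attrs = [None, [1.0, 0.0], [0.0, 1.0]]   # node 0's attributes are missing
print(neigh_aggre(adj, attrs)[0])  # [0.5, 0.5]
```

A node whose neighbours are all attribute-missing keeps `None`, which mirrors the method's inherent limitation on sparsely observed graphs.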

A.2 ABLATION STUDY OF DIFFERENT METRICS IN ATS

In this section, we conduct an ablation study to investigate the effects of the three metrics in ATS. The experimental settings remain the same as in the profiling task, and we use Recall@20 to evaluate the performance of different metric combinations. The results are shown in Figure 3. In Cora, the centrality-only sampling method hurts the profiling performance. Different metrics focus on different aspects, and the results show that they complement each other: the uncertainty metric focuses on the training status of the model, while the representativeness metric focuses on the implicit information from both the structure and attribute aspects. Generally, any subgroup of the sampling criteria is inferior to the complete ATS.

A.3 EMPIRICAL TIME COMPLEXITY ANALYSIS

Our ATS is an active sampling procedure built on the SAT model, so it is critical to study the extra processing time that ATS incurs. We therefore count the running time of the different parts of ATS compared with the original SAT: the forward process, the uncertainty metric and the representativeness metric. The forward process means the forward propagation, which is essential to calculate the uncertainty score. We run the experiment on a machine with one Nvidia 1080Ti GPU. According to the running times shown in Figure 4, the forward propagation in ATS is much faster than a SAT training epoch, which includes the time-consuming back propagation. Although the processing times of the uncertainty and representativeness metrics are relatively high because of the clustering and percentile calculations, they are comparable with the time of SAT. With the addition of the ATS algorithm, the time required for each epoch increases within an acceptable range.

A.4 SENSITIVITY OF THE HYPERPARAMETERS

As mentioned in Section 5.2, the cluster number is a vital hyper-parameter that determines the information density of each node. We conduct experiments on both the profiling and classification tasks with different cluster numbers. The results in Figure 5 show that too large or too small cluster numbers are not conducive to training. If there are not enough cluster centers, the sampling algorithm cannot robustly extract the density of the embedding distribution. On the other hand, if there are too many cluster centers, it will introduce more disturbance and might separate nodes belonging to the same class.



CONCLUSION

In this paper, we propose a novel active sampling algorithm, ATS, to better solve the node attribute completion problem. To distinguish the differences in the amount of information among nodes, ATS uses the proposed uncertainty and representativeness metrics to select the most informative nodes and renew the training set after each training epoch. Further, a Beta distribution controlled weighting scheme is proposed to adjust the metric weights dynamically according to the training status. The sampling process increases the running time of each epoch within an affordable cost, but helps the base model achieve superior performance on profiling and node classification tasks. Therefore, ATS is effective in boosting the quality of restored attributes.



Figure 1: The general architecture of SAT. The attributes and the structure are encoded by E X and E A and reconstructed by D X and D A . Meanwhile, SAT matches the latent codes of structures and attributes to a prior distribution by adversarial distribution matching.

We compare the SAT model combined with ATS against the baselines introduced in Chen et al. (2022): NeighAggre Şimşek & Jensen (2008), VAE Kingma & Welling (2013), GCN Kipf & Welling (2017), GraphSage Hamilton et al. (2017), GAT Veličković et al. (2018), Hers Hu et al. (2019), GraphRNA Huang et al. (2019), ARWMF Lei Chen & Bronstein (2019) and the original SAT. Details about how they work on node attribute completion are given in Appendix A.1.

Figure 2: Visualization of the profiling performance of different weighting schemes on test data during training. We compare our Beta distribution controlled weighting scheme with other weighting schemes (e.g. fixed weight, linear variation).

Figure 4: The comparison among the average processing GPU time per epoch of different model components. 'Forward' indicates the forward propagation that is a part of the calculation in uncertainty metric.

Figure 5: Results with different cluster numbers when calculating the density score in the representativeness metric. (a-c) show the Recall@20 results for profiling task. (d-f) show the attribute-only classification accuracy with the use of MLP classifier. (g-h) show the classification accuracy considering both the structure and attribute information.

GAT Veličković et al. (2018) introduces multi-head attention into GNN. GraphSAGE Hamilton et al. (2017) moves to the inductive learning setting to deal with large-scale graphs.

At the sampling stage, ATS selects one or several nodes with the largest S and adds them to the training set T_L for the next training epoch of SAT. The core sampling step of Algorithm 1 is:

S ← α · P_entropy + β · P_density + γ · P_centrality;
T_S ← activeSample(S, T_U);   // select the nodes with the highest scores
T_L ← T_L ∪ T_S;              // renew the training set of SAT
T_U ← T_U \ T_S;              // renew the candidate set

Table 1: Node classification of the node-level evaluation for node attribute completion. "X" indicates the MLP classifier that only considers the node attributes. "A+X" indicates the GCN classifier that considers both the structures and node attributes.

Figure 3: Ablation study of different metrics in ATS. We show the Recall@20 results of different combinations of the sampling metrics on 3 benchmarks. The horizontal coordinate refers to the different sampling criteria combinations. 'E' indicates the entropy metric; 'D' the density metric; 'C' the centrality metric; 'E+D+C' indicates our full ATS algorithm.


We determine the value of this hyper-parameter based on the Recall@20 results in the profiling task.

