NODE IMPORTANCE SPECIFIC META LEARNING IN GRAPH NEURAL NETWORKS

Abstract

While current node classification methods for graphs have enabled significant progress in many applications, they rely on abundant labeled nodes for training. In many real-world datasets, nodes for some classes are always scarce, so current algorithms are ill-equipped to handle these few-shot node classes. Some meta learning approaches for graphs have demonstrated advantages in tackling such few-shot problems, but they disregard the impact of node importance on a task. Unique to graph data, the dependencies between nodes convey vital information for determining node importance beyond what node features alone provide, which poses distinct challenges here. In this paper, we investigate the effect of node importance in node classification meta learning tasks. We first theoretically analyze the influence of distinguishing node importance on the lower bound of the model accuracy. Then, based on the theoretical conclusion, we propose a novel Node Importance Meta Learning architecture (NIML) that learns and applies the importance score of each node for meta learning. Specifically, after constructing an attention vector based on the interaction between a node and its neighbors, we train an importance predictor in a supervised manner to capture the distance between a node embedding and the expectation of its same-class embedding. Extensive experiments on public datasets demonstrate the state-of-the-art performance of NIML on few-shot node classification problems.

1. INTRODUCTION

Graph structures can model various complicated relationships and systems, such as molecular structures (Subramanian et al., 2005), citation relationships (Tang et al., 2008b), and social media relationships (Ding et al., 2019). The use of various deep learning methods (Hamilton et al., 2017; Kipf & Welling, 2016) to analyze graph-structured data has sparked considerable research interest recently, with node classification being one of the essential problems. Several types of graph neural networks (GNNs) (Veličković et al., 2017; Wu et al., 2020) have been proposed to address the problem by learning high-level feature representations of nodes and solving the classification task end-to-end. Despite their success in various domains, the performance of GNNs drops dramatically in the few-shot scenario (Mandal et al., 2022), where extremely few labeled nodes are available for novel classes. For example, annotating nodes in graph-structured data is challenging when the samples originate from specialist disciplines (Guo et al., 2021) such as biology and medicine. Many meta learning works, including optimization-based methods (Finn et al., 2017) and metric-based methods (Snell et al., 2017; Vinyals et al., 2016), have demonstrated their power to address few-shot problems in diverse applications, such as computer vision and natural language processing (Lee et al., 2022). In meta learning, a meta learner is trained on various tasks with limited labeled data so that it is capable of fast generalization and adaptation to a new task that has never been encountered before. However, it is considerably challenging to generalize these meta learning algorithms, designed for independent and identically distributed (i.i.d.) Euclidean data, to graph data. To address the few-shot node classification problem, some graph meta learning approaches have been proposed (Liu et al., 2021; Ding et al., 2020; Yao et al., 2020). They structure the node classification problem as a collection of tasks.
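The episodic task construction described above can be sketched in a few lines. This is a generic N-way K-shot sampler, not the paper's implementation; the mapping from class labels to node ids (`labels_by_class`) is assumed to be precomputed.

```python
import random

def sample_episode(labels_by_class, n_way, k_shot, q_query, rng):
    """Sample one N-way K-shot node-classification episode.

    labels_by_class: dict mapping class label -> list of node ids
    Returns (support, query), each a list of (node_id, class) pairs.
    """
    classes = rng.sample(sorted(labels_by_class), n_way)
    support, query = [], []
    for c in classes:
        # draw K support and Q query nodes per class, without replacement
        nodes = rng.sample(labels_by_class[c], k_shot + q_query)
        support += [(v, c) for v in nodes[:k_shot]]
        query += [(v, c) for v in nodes[k_shot:]]
    return support, query
```

A meta learner is trained by repeatedly sampling such episodes from the training classes and evaluating on episodes drawn from held-out novel classes.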
The key idea is to learn the class of nodes in the query set by transferring prior knowledge from the limited support nodes in each task. However, most existing approaches simply assume that all labeled nodes are of equal importance in representing the class they belong to. Differences and interdependencies between nodes are not considered in the learning process of the few-shot models. Since only limited data points are sampled to generate tasks in meta learning, each sampled task has high variance; therefore, treating all data points equally may discard the crucial information supplied by central data points and render the model vulnerable to noise or outliers. In particular, the relationship between nodes and their neighbors in a graph is an important factor that carries node information beyond node features, and it can serve as a starting point for investigating node importance. Although some work (Ding et al., 2020) considers the importance of nodes, it lacks a theoretical analysis of that choice. To address the aforementioned challenges, we first explore, in a theoretical manner, the effect of distinguishing nodes of different degrees of importance on the lower bound of the model's accuracy. We analyze ProtoNet (Snell et al., 2017) and conclude that when important nodes are given more weight in computing prototype representations in a task, the prototype moves closer to its own expectation, and thus the lower bound of the accuracy increases. Based on this theoretical result, we propose a Node Importance Meta Learning framework (NIML) for learning and using node importance in a task. Specifically, an attention vector is constructed for each node to describe the relationship distribution between that node and its neighbors.
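The effect of importance weighting on prototypes can be illustrated with a toy sketch (an illustration of the idea, not the paper's implementation): down-weighting an outlier support node pulls the class prototype back toward the class expectation, which is exactly the mechanism the theoretical analysis rewards.

```python
import numpy as np

def weighted_prototype(support_emb, weights):
    """Importance-weighted class prototype: sum_i w_i * z_i / sum_i w_i.
    With uniform weights this reduces to the standard ProtoNet mean."""
    w = np.asarray(weights, dtype=float)
    return (w[:, None] * support_emb).sum(axis=0) / w.sum()

def classify(query_emb, prototypes):
    """Assign each query embedding to the nearest prototype
    (squared Euclidean distance, as in ProtoNet)."""
    d = ((query_emb[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)
```

In a 3-shot support set where one node is an outlier, giving that node a small weight moves the prototype closer to the mean of the inliers than the unweighted average is.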
We then train a supervised model, using this attention vector as input, to learn the distance between the node embedding and the same-class prototype expectation, effectively capturing the importance of that node to its class. The obtained distance is used to calculate a weighted prototype in meta learning. We conduct experiments on three benchmarks, and the results validate the superiority of the proposed NIML framework. To summarize, the main contributions of this paper are as follows: 1) We theoretically explore the influence of node importance on the lower bound of model accuracy and show the benefit of distinguishing between nodes of different importance in a meta learning task. This theoretical conclusion applies to any domain, not only graph data. 2) We design a category-irrelevant predictor to estimate the distance between a node embedding and the approximated prototype expectation, and follow the theoretical conclusion to compute a weighted prototype; as the input, we construct an attention vector that describes the distribution of neighbor relationships for a given node. 3) We perform extensive experiments on various real-world datasets and show the effectiveness of our approach.
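A minimal sketch of the two ingredients just described is given below, under stated assumptions: the attention vector is summarized here as a sorted, padded softmax over node-neighbor similarities, and the distance-to-weight mapping is one plausible choice (`exp(-d)`); the exact constructions in NIML may differ.

```python
import numpy as np

def attention_vector(node_emb, neighbor_embs, k=8):
    """Describe the relationship distribution between a node and its
    neighbors: softmax-normalized similarities, sorted descending and
    padded to a fixed length k so the predictor input size is constant."""
    sims = neighbor_embs @ node_emb
    a = np.exp(sims - sims.max())
    a = np.sort(a / a.sum())[::-1]
    out = np.zeros(k)
    n = min(k, a.size)
    out[:n] = a[:n]
    return out

def importance_weight(pred_distance):
    """Turn a predicted node-to-prototype-expectation distance into a
    prototype weight: smaller distance -> more important node."""
    return np.exp(-pred_distance)
```

The supervised predictor itself (a small network regressing the distance from the attention vector) is category-irrelevant: it sees only the shape of a node's neighborhood-similarity distribution, never the class label.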

2.1. GRAPH NEURAL NETWORKS

Recent efforts to develop deep neural networks for graph-structured data have been largely driven by the phenomenal success of deep learning (Cao et al., 2016; Chang et al., 2015). A large number of graph convolutional networks (GCNs) have been proposed based on graph spectral theory. Spectral CNN (Bruna et al., 2013) mimics the properties of a CNN by defining graph convolution kernels at each layer to form a GCN. Building on this work, research on GCNs has achieved increasing success (Defferrard et al., 2016; Henaff et al., 2015; Kipf & Welling, 2016). Graph Attention Networks (GATs) (Veličković et al., 2017) learn the weights of node neighbors in the aggregation process through an attention mechanism. GraphSAGE (Hamilton et al., 2017) utilizes aggregation schemes to aggregate feature information from local neighborhoods. However, modern GNN models are primarily designed for semi-supervised node classification. We therefore develop a GNN framework to address the few-shot setting in graph data, which remains one of the largest obstacles for these models.
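The neighborhood-aggregation idea shared by the models above can be sketched as a single mean-aggregation layer in the style GraphSAGE popularized. This is a simplified NumPy illustration: the weight matrices are placeholders, and the original method also samples fixed-size neighborhoods rather than using all neighbors.

```python
import numpy as np

def sage_mean_layer(H, adj_list, W_self, W_neigh):
    """One GraphSAGE-style layer with mean aggregation:
    h_v' = ReLU(W_self^T h_v + W_neigh^T mean_{u in N(v)} h_u).

    H: (num_nodes, d_in) node features; adj_list[v]: neighbor ids of v.
    """
    out = np.zeros((H.shape[0], W_self.shape[1]))
    for v, neigh in enumerate(adj_list):
        # isolated nodes contribute a zero neighborhood message
        m = H[neigh].mean(axis=0) if neigh else np.zeros(H.shape[1])
        out[v] = np.maximum(0.0, H[v] @ W_self + m @ W_neigh)
    return out
```

Stacking such layers lets each node's representation absorb information from progressively larger neighborhoods, which is the embedding backbone that few-shot methods on graphs typically build on.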

2.2. META LEARNING

Existing meta learning algorithms mainly fall into two categories (Hospedales et al., 2020): optimization-based meta learning and metric-based meta learning. Optimization-based meta learning (Finn et al., 2017; Li et al., 2017; Mishra et al., 2017; Ravi & Larochelle, 2016) aims to learn an initialization of parameters for a gradient-based network. MAML (Finn et al., 2017) discovers a parameter initialization that is suitable for various few-shot tasks and can be used in any gradient descent model. MetaSGD (Li et al., 2017) advances MAML by learning the initialization of weights, the gradient update direction, and the learning rate in a single step. Metric-based meta learning (Liu et al., 2019; Ren et al., 2018; Snell et al., 2017; Sung et al., 2018; Vinyals et al., 2016) focuses on learning a generalized metric and matching function from training tasks. In partic-

