DISTRIBUTIONAL SIGNALS FOR NODE CLASSIFICATION IN GRAPH NEURAL NETWORKS

Abstract

In graph neural networks (GNNs), both node features and labels are examples of graph signals, a key notion in graph signal processing (GSP). While it is common in GSP to impose signal smoothness constraints in learning and estimation tasks, it is unclear how this can be done for discrete node labels. We bridge this gap by introducing the concept of distributional graph signals. In our framework, we work with the distributions of node labels instead of their values and propose notions of smoothness and non-uniformity of such distributional graph signals. We then propose a general regularization method for GNNs that allows us to encode distributional smoothness and non-uniformity of the model output in semi-supervised node classification tasks. Numerical experiments demonstrate that our method can significantly improve the performance of most base GNN models in different problem settings.

1. INTRODUCTION

We consider the semi-supervised node classification problem (Kipf & Welling, 2017): determining the class labels of nodes in a graph given sample observations and, possibly, node features. Numerous graph neural network (GNN) models have been proposed to tackle this problem. One of the first is the graph convolutional network (GCN) (Defferrard et al., 2016; Kipf & Welling, 2017). Interpreted geometrically, a GCN aggregates information, such as node features, from the neighborhood of each node of the graph. Algebraically, this process is equivalent to applying a graph convolution filter to the node feature vectors. Subsequently, many GNN models with different considerations have been introduced. Popular models include the graph attention network (GAT) (Veličković et al., 2018), which learns weights between pairs of nodes during aggregation, and the hyperbolic graph convolutional neural network (HGCN) (Chami et al., 2019), which embeds the nodes of a graph in a hyperbolic space instead of a Euclidean space. For inductive learning, GraphSAGE (Hamilton et al., 2017) generates low-dimensional vector representations for nodes, which is useful for graphs with rich node attribute information.

While newer models draw inspiration from GCN, GCN itself is built upon the foundation of graph signal processing (GSP), a signal processing framework that handles graph-structured data (Shuman et al., 2013; Ortega et al., 2018; Ji & Tay, 2019). A graph signal is a vector with each component corresponding to a node of a graph; examples include node features and node labels. Moreover, the convolutions used in models such as GCN are special cases of convolution filters in GSP (Shuman et al., 2013). These facts show the close connections between GSP theory and GNNs. In GSP, signal smoothness (over the graph) is widely used to regularize inference tasks. Intuitively, a signal is smooth if its values are similar at each pair of nodes connected by an edge.
One popular way to formally define signal smoothness is via the Laplacian quadratic form, and numerous GSP tools leverage a smoothness prior on graph signals. For example, Laplacian (Tikhonov) regularization is proposed for noise removal in Shuman et al. (2013) and signal interpolation (Narang et al., 2013). In Chen et al. (2015), it is used for graph signal in-painting and anomaly detection, and in Kalofolias (2016) the same technique is used for graph topology inference. For GNNs, however, it is remarked in Yang et al. (2021, Section 4.1.2) that "graph Laplacian regularization can hardly provide extra information that existing GNNs cannot capture". A regularization scheme based on feature propagation is therefore proposed there, and it is demonstrated to be effective in comparison with other methods, such as Feng et al. (2021) and Deng & Zhu (2019), which are based on adversarial learning, and Stretcu et al. (2019), which co-trains GNN models with an additional agreement model that gives the probability that two nodes have the same label. We partially agree with the above assertion regarding graph Laplacian regularization, while reserving judgment on its full correctness.

In this paper, we propose a method inspired by Laplacian regularization. As our main contribution, we introduce the notion of distributional graph signals: instead of working with the values of node labels, we work with their distributions. Analogous to graph signal smoothness defined using the graph Laplacian, we define the smoothness of distributional graph signals. Together with another property known as non-uniformity, we devise a regularization scheme for GNNs in node classification tasks. This approach is easy to implement and can be used as a plug-in regularization term together with any given base GNN model. Its effectiveness is demonstrated with numerical results.
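As background for the classical technique our method draws on, Laplacian (Tikhonov) regularization trades off fidelity to noisy observations against the Laplacian quadratic form. The following minimal numpy sketch illustrates the classical denoising variant on a toy path graph; it is an illustration of the background technique, not of the regularization scheme proposed in this paper:

```python
import numpy as np

# Path graph on 5 nodes: Laplacian L = D - A.
n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# Noisy observation y of a smooth signal; Tikhonov denoising solves
#   min_x ||x - y||^2 + gamma * x^T L x,
# whose closed-form solution is x = (I + gamma * L)^{-1} y.
rng = np.random.default_rng(0)
y = np.linspace(0.0, 1.0, n) + 0.1 * rng.standard_normal(n)
gamma = 1.0
x = np.linalg.solve(np.eye(n) + gamma * L, y)

# The denoised signal has a smaller total variation x^T L x than y:
# in the Fourier domain, each component is shrunk by 1 / (1 + gamma * lambda_i).
tv = lambda s: float(s @ L @ s)
assert tv(x) < tv(y)
```

The same quadratic penalty x^T L x is what a Laplacian regularization term adds to a learning objective; the later sections replace it with a penalty on distributional graph signals.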

2. DISTRIBUTIONAL GRAPH SIGNALS

In this section, we motivate and introduce distributional graph signals based on GSP theory.

2.1. GSP PRELIMINARIES AND SIGNAL SMOOTHNESS

In this subsection, we give a brief overview of GSP theory (Shuman et al., 2013), focusing on graph signal smoothness. Let G = (V, E) be an undirected graph with vertex set V and edge set E, and let n = |V| be the size of the graph. Fix an ordering of V. The space of graph signals can then be identified with the vector space ℝ^n: a graph signal x ∈ ℝ^n assigns its i-th component x(i) to the i-th vertex of G. By convention, signals are column vectors.

A key notion in GSP is the graph shift operator. Though there are several choices for the graph shift operator, in this paper we consider a common one: the Laplacian L_G = D_G − A_G of G, where D_G and A_G are the degree matrix and adjacency matrix of G, respectively. The Laplacian L_G is symmetric and positive semi-definite. By the spectral theorem, it has an eigendecomposition L_G = U_G Λ_G U_G^⊤, where Λ_G is a diagonal matrix whose diagonal entries {λ_1, . . . , λ_n} are the eigenvalues of L_G. They are non-negative, and we assume λ_1 ≤ . . . ≤ λ_n. The associated eigenbasis {u_1, . . . , u_n} consists of the columns of U_G. In GSP, an eigenvector with a small eigenvalue (and hence a small index) is considered smooth: its signal values have small fluctuations across the edges of G.

Given a graph signal x, its graph Fourier transform is x̂ = U_G^⊤ x, or equivalently, x̂(i) = ⟨x, u_i⟩ for 1 ≤ i ≤ n. The components x̂(i) of x̂ are called the frequency components of x. As above, the signal x is smooth if x̂(i) has a small absolute value for large i. Quantitatively, we can define its total variation by

T(x) = ∑_{(v_i, v_j) ∈ E} (x(i) − x(j))² = x^⊤ L_G x.    (1)

It is straightforward to compute that T(u_i) = λ_i. This observation indicates that it is reasonable to use total variation as a measure of smoothness.
Minimizing the total variation of graph signals has many applications in GSP as we have pointed out in Section 1.
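The identities of Section 2.1 can be checked numerically. A small sketch on a toy graph (assuming numpy; the symbols L, u_i, λ_i follow the notation above) verifies both the edge-sum form of the total variation in (1) and the fact that T(u_i) = λ_i:

```python
import numpy as np

# Triangle graph plus a pendant node: build the Laplacian L = D - A.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

# Eigendecomposition L = U diag(lam) U^T; eigh returns eigenvalues ascending,
# matching the convention lambda_1 <= ... <= lambda_n.
lam, U = np.linalg.eigh(L)

# T(u_i) = u_i^T L u_i = lambda_i for every eigenvector u_i.
for i in range(len(lam)):
    assert np.isclose(U[:, i] @ L @ U[:, i], lam[i])

# For a generic signal x, the sum of squared edge differences equals x^T L x.
x = np.array([0.0, 1.0, 2.0, 3.0])
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
tv_edges = sum((x[i] - x[j]) ** 2 for (i, j) in edges)
assert np.isclose(tv_edges, x @ L @ x)
```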

2.2. STEP GRAPH SIGNALS

Let S be a finite set of numbers. A step graph signal with respect to (w.r.t.) S is a graph signal x such that all of its components take values in S, i.e., x ∈ S^n.

Example 1. For the simplest example, consider the classical Heaviside function H on ℝ defined by H(x) = 1 for x > 0 and H(x) = 0 for x ≤ 0. It is a non-smooth function, as it is not even continuous at x = 0. On the other hand, let G be the path graph with 2m + 1 nodes, embedded on the real line by identifying the nodes of G with the integers in the interval [−m, m]. Then H induces a step graph signal h on G. Like the Heaviside function H, the signal h should be considered a non-smooth graph signal.

Step graph signals occur naturally in semi-supervised node classification tasks. In particular, if S is the set of all possible class labels, then the labels of the nodes of G form a step graph signal c w.r.t. S.
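Example 1 can be made concrete with a short numpy sketch (the choice m = 4 is arbitrary): it builds the path graph on 2m + 1 nodes, forms the step signal h induced by the Heaviside function, and evaluates its total variation T(h) = h^⊤ L h:

```python
import numpy as np

# Path graph on the integers -m, ..., m (2m + 1 nodes), Laplacian L = D - A.
m = 4
n = 2 * m + 1
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# h is the step graph signal induced by the Heaviside function: node i
# corresponds to the integer i - m, and H(t) = 1 for t > 0, H(t) = 0 for t <= 0.
h = np.array([1.0 if (i - m) > 0 else 0.0 for i in range(n)])

# h jumps across exactly one edge (between the integers 0 and 1), so its
# total variation T(h) = h^T L h equals 1, regardless of m.
assert np.isclose(h @ L @ h, 1.0)

# By contrast, the smoothest nonconstant Laplacian eigenvector has a smaller
# total variation per unit squared norm than h on this path graph.
lam, U = np.linalg.eigh(L)
assert lam[1] < (h @ L @ h) / (h @ h)
```

The entire total variation of h is concentrated at its single jump, mirroring the discontinuity of H at the origin; this is the sense in which h is non-smooth.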




