GRAPH MIXUP WITH SOFT ALIGNMENTS

Abstract

We study graph data augmentation by mixup, which has been used successfully on images. A key operation of mixup is to compute a convex combination of a pair of inputs. This operation is straightforward for grid-like data, such as images, but challenging for graph data. The key difficulty lies in the fact that different graphs typically have different numbers of nodes, so there is no node-level correspondence between graphs. In this work, we propose a simple yet effective mixup method for graph classification via soft alignments. Specifically, given a pair of graphs, we explicitly obtain a node-level correspondence by computing a soft assignment matrix that matches the nodes of the two graphs. Based on the soft assignments, we transform the adjacency and node feature matrices of one graph so that the transformed graph is aligned with the other. In this way, any pair of graphs can be mixed directly to generate an augmented graph. We conduct systematic experiments showing that our method improves the performance and generalization of graph neural networks (GNNs) on various graph classification tasks. In addition, we show that our method increases the robustness of GNNs against noisy labels.

1. INTRODUCTION

Data augmentations aim at generating new training samples by applying certain transformations to the original samples. For example, applying rotations and flips to images generates new images with the same labels. Many empirical results have shown that data augmentations can improve the invariance, and thus the generalization abilities, of deep learning models. While data augmentations are relatively straightforward for grid-like data, such as images, they are particularly challenging for graph data. A key difficulty lies in the lack of simple graph operations that preserve the original labels, analogous to rotations on images. Most existing graph augmentation methods, such as DropEdge (Rong et al., 2019), DropNode (Feng et al., 2020), and Subgraph (You et al., 2020), assume labels remain the same after simple operations, such as dropping a random node or edge, on training graphs. On one hand, such simple operations may not generate sufficiently diverse new samples. On the other hand, although the operations are simple, they are not guaranteed to preserve the original labels. Recently, mixup (Zhang et al., 2017) has been shown to be an effective method for image data augmentation. In particular, mixup generates new samples and corresponding labels by performing convex combinations of pairs of original samples and labels. A key challenge of applying mixup to graphs lies in the fact that different graphs typically have different numbers of nodes. Even for graphs with the same number of nodes, there is no node-level correspondence of the kind required to perform mixup. Several existing graph mixup methods (Han et al., 2022; Park et al., 2022; Yoo et al., 2022; Guo & Mao, 2021) use various tricks to sidestep this problem. For example, ifMixup (Guo & Mao, 2021) uses a random node order to align graphs and then interpolates node feature matrices and adjacency matrices. Han et al. (2022) propose to learn a graphon for each class and perform mixup in graphon space.
Graph Transplant (Park et al., 2022) and SubMix (Yoo et al., 2022) connect subgraphs from different input graphs to generate new graphs. However, none of these methods explicitly models the node-level correspondence between different graphs and performs mixup as in the case of images. This raises a natural question: Can we conduct image-like mixup for graphs with node-level correspondence to preserve critical information? In this work, we provide an affirmative answer to this question and propose a simple yet effective graph mixup approach via soft alignments. A key design principle of our method is to explicitly and automatically model the node-level correspondence (i.e., a soft alignment matrix) between two graphs when performing mixup, thereby avoiding random matching noise and preserving critical graph components in the augmented data. Given a pair of graphs, we first obtain a node-level correspondence by computing a soft assignment matrix that measures the similarity of nodes across the two graphs based on node features and graph topology. This soft alignment matrix then guides the transformation of one graph, covering both its adjacency matrix and its node feature matrix, to generate an aligned graph with the same number of nodes and the same node order as the other graph. In this way, we can interpolate the adjacency matrices and node feature matrices of any pair of graphs to generate synthetic graphs for training. We conduct comprehensive experiments to evaluate our method. Results show that our method improves the performance and generalization of GNNs on various graph classification tasks. In addition, results show that our method increases the robustness of GNNs against noisy labels.
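The align-then-mix procedure described above can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the paper's exact formulation: here the soft assignment matrix is computed from node-feature similarity alone via a row-wise softmax, whereas the actual method also incorporates graph topology.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def soft_align_mixup(A1, X1, y1, A2, X2, y2, lam):
    """Mix graph 2 into graph 1's node space via a soft assignment matrix.

    A1: (n1, n1), X1: (n1, d), A2: (n2, n2), X2: (n2, d);
    y1, y2 are one-hot label vectors; lam is the mixing coefficient.
    """
    # Simplified assumption: similarity from node features only.
    S = X1 @ X2.T                      # (n1, n2) similarity scores
    M = softmax(S, axis=1)             # row-stochastic soft assignment
    # Transform graph 2 so it is aligned with graph 1's n1 nodes.
    A2_aligned = M @ A2 @ M.T          # (n1, n1)
    X2_aligned = M @ X2                # (n1, d)
    # Convex combination of the aligned graphs and their labels.
    A_mix = lam * A1 + (1 - lam) * A2_aligned
    X_mix = lam * X1 + (1 - lam) * X2_aligned
    y_mix = lam * y1 + (1 - lam) * y2
    return A_mix, X_mix, y_mix
```

Because each row of the soft assignment matrix sums to one, the transformed adjacency matrix stays in [0, 1], so the mixed graph is a weighted graph whose size matches the first input regardless of how many nodes the second input has.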

2. PRELIMINARIES

2.1. GRAPH CLASSIFICATION WITH GRAPH NEURAL NETWORKS

In this work, we study the problem of graph classification. Let G = (A, X) represent a graph with n nodes. Here, A ∈ {0, 1}^{n×n} is the adjacency matrix, and A_{i,j} = 1 if and only if there exists an edge between nodes i and j. X = [x_1, ..., x_n]^T ∈ R^{n×d} is the node feature matrix, where each row x_i ∈ R^d represents the d-dimensional feature of node i. Given a set of labeled graphs, graph classification tasks aim to learn a model that predicts the class label y of each graph G.

Recently, GNNs have shown remarkable performance on various graph classification problems. GNNs usually use a message passing scheme to learn node representations. Let H^{(l)} = [h^{(l)}_1, ..., h^{(l)}_n]^T ∈ R^{n×d_l} denote the node representations at the l-th layer of a message passing GNN model, where each row h^{(l)}_i ∈ R^{d_l} is the d_l-dimensional representation of node i. Formally, one message passing layer can be described as

H^{(l)} = UPDATE(H^{(l-1)}, MSG(H^{(l-1)}, A)),    (1)

where MSG(·) is a message propagation function that aggregates the messages from the neighbors of each node, and UPDATE(·) is a function that updates H^{(l-1)} to H^{(l)} using the aggregated messages. The node representations H^{(0)} are initialized as X. After L layers of such message passing, the graph representation h_G is obtained by applying a global pooling function READOUT over the node representations:

h_G = READOUT(H^{(L)}).    (2)

Given the graph representation h_G, a multi-layer perceptron (MLP) computes the probability that graph G belongs to each class. Despite the success of GNNs, a primary challenge in graph classification tasks is the lack of labeled data due to expensive annotations. In this paper, we focus on designing a pairwise graph data augmentation method to generate more training data, thereby improving the performance of GNNs.
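The message passing scheme in Eqs. (1)-(2) can be sketched as follows. This is an illustrative sketch, not any specific GNN architecture: the MSG step here is mean aggregation over neighbors, the UPDATE step is a residual combination followed by a linear map and ReLU, and READOUT is global mean pooling; all three choices are simplifying assumptions.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def message_passing_layer(H, A, W):
    """One layer of Eq. (1): MSG aggregates neighbor features, UPDATE transforms.

    H: (n, d) node representations, A: (n, n) adjacency, W: (d, d_out) weights.
    """
    deg = A.sum(axis=1, keepdims=True).clip(min=1.0)  # avoid division by zero
    msg = (A @ H) / deg            # MSG: mean of neighbor representations
    return relu((H + msg) @ W)     # UPDATE: combine, transform, nonlinearity

def readout(H):
    """Eq. (2): global mean pooling over node representations."""
    return H.mean(axis=0)
```

A graph-level prediction would then stack L such layers and feed readout(H_L) into an MLP classifier.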

2.2. MIXUP

Mixup (Zhang et al., 2017) is a data augmentation method for regular, grid-like Euclidean data such as images and tabular data. The idea of mixup is to linearly interpolate random pairs of data samples and their corresponding labels. Given a random pair of samples x_i and x_j with corresponding one-hot class labels y_i and y_j, mixup constructs training data as

x̃ = λx_i + (1 - λ)x_j,   ỹ = λy_i + (1 - λ)y_j,    (3)

where λ ∼ Beta(α, α) is a random variable drawn from the Beta distribution parameterized by α. Mixup and its variants (Yang et al., 2020; Yun et al., 2019; Berthelot et al., 2019) have shown great success in improving the generalization and robustness of deep neural networks in image recognition and natural language processing. However, mixing graphs is a challenging problem due to the irregular and non-Euclidean structure of graph data. Specifically, the number of nodes varies across graphs, making it infeasible to apply the mixing rule in Eq. (3) directly. Even if two

