MULTI-TASK SELF-SUPERVISED GRAPH NEURAL NETWORKS ENABLE STRONGER TASK GENERALIZATION

Abstract

Self-supervised learning (SSL) for graph neural networks (GNNs) has attracted increasing attention from the graph machine learning community in recent years, owing to its capability to learn performant node embeddings without costly label information. One weakness of conventional SSL frameworks for GNNs is that they learn through a single philosophy, such as mutual information maximization or generative reconstruction. When applied to various downstream tasks, these frameworks rarely perform equally well on every task, because one philosophy may not span the extensive knowledge required for all of them. To enhance generalization across tasks, as an important first step toward exploring fundamental graph models, we introduce PARETOGNN, a multi-task SSL framework for node representation learning over graphs. Specifically, PARETOGNN is self-supervised by manifold pretext tasks observing multiple philosophies. To reconcile the different philosophies, we explore a multiple-gradient descent algorithm, such that PARETOGNN actively learns from every pretext task while minimizing potential conflicts. We conduct comprehensive experiments over four downstream tasks (i.e., node classification, node clustering, link prediction, and partition prediction), and our proposal achieves the best overall performance across tasks on 11 widely adopted benchmark datasets. Moreover, we observe that learning from multiple philosophies enhances not only task generalization but also single-task performance, demonstrating that PARETOGNN achieves better task generalization via the disjoint yet complementary knowledge learned from different philosophies.

1. INTRODUCTION

Graph-structured data is ubiquitous in the real world (McAuley et al., 2015; Hu et al., 2020). To model the rich underlying knowledge of graphs, graph neural networks (GNNs) have been proposed and have achieved outstanding performance on various tasks, such as node classification (Kipf & Welling, 2016a; Hamilton et al., 2017), link prediction (Zhang & Chen, 2018; Zhao et al., 2022b), and node clustering (Bianchi et al., 2020; You et al., 2020b). These tasks form the archetypes of many practical real-world applications, such as recommendation systems (Ying et al., 2018; Fan et al., 2019) and predictive user behavior models (Pal et al., 2020; Zhao et al., 2021a; Zhang et al., 2021a). Existing graph learning frameworks mostly act as narrow experts, guaranteeing effectiveness on only one or two tasks. However, given a graph learning framework, its promising performance on one task may not (and usually does not) translate into competitive results on other tasks. Consistent generalization across various tasks and datasets is a significant and well-studied research topic in other domains (Wang et al., 2018; Yu et al., 2020). Results from natural language processing (Radford et al., 2019; Sanh et al., 2021) and computer vision (Doersch & Zisserman, 2017; Ni et al., 2021) have shown that models enhanced by self-supervised learning (SSL) over multiple pretext tasks observing diverse philosophies can achieve strong task generalization and learn intrinsic patterns that transfer to multiple downstream tasks. Intuitively, SSL over multiple pretext tasks greatly reduces the risk of overfitting (Baxter, 1997; Ruder, 2017), because learning intrinsic patterns that well address difficult pretext tasks is non-trivial for a single set of parameters.
Moreover, gradients from multiple objectives regularize the model against extracting task-irrelevant information (Ren & Lee, 2018; Ravanelli et al., 2020), so that the model learns multiple views of each training sample. Nonetheless, current state-of-the-art graph SSL frameworks are mostly built around a single pretext task with a single philosophy, such as mutual information maximization (Velickovic et al., 2019; Zhu et al., 2020; Thakoor et al., 2022), whitening decorrelation (Zhang et al., 2021b), or generative reconstruction (Hou et al., 2022). Though these methods achieve promising results in some circumstances, they usually do not retain competitive performance across all downstream tasks and datasets. For example, DGI (Velickovic et al., 2019), grounded in mutual information maximization, excels at partition prediction but underperforms on node classification and link prediction. Likewise, GRAPHMAE (Hou et al., 2022), based on feature reconstruction, achieves strong performance on datasets with powerful node features (e.g., where graph topology can be inferred from node features alone (Zhang et al., 2021d)), but suffers when node features are less informative, as we empirically demonstrate in this work. To bridge this research gap, we ask: how can multiple philosophies be combined to enhance task generalization for SSL-based GNNs? A very recent work, AUTOSSL (Jin et al., 2022), explores this direction by reconciling different pretext tasks through learned weights in a joint loss function, chosen to promote node-level pseudo-homophily. This approach has two major drawbacks. (i) Not all downstream tasks benefit from the homophily assumption. In the experimental results of Jin et al. (2022), we observe key pretext tasks (e.g., DGI, based on mutual information maximization) being assigned zero weight.
However, our empirical study shows that the philosophies behind these neglected pretext tasks are essential to the success of some downstream tasks, and discarding them prevents GNNs from achieving better task generalization. (ii) In reality, many graphs do not follow the homophily assumption (Pei et al., 2019; Ma et al., 2021). Arguably, applying such an inductive bias to heterophilous graphs is contradictory and may yield sub-optimal performance. In this work, we adopt a different perspective: we self-supervise GNNs with multiple pretext tasks without relying on any alignment between the graph or task and the homophily assumption. During self-supervised training of our proposed method, given a single graph encoder, all pretext tasks are simultaneously optimized and dynamically coordinated. We reconcile pretext tasks by dynamically assigning weights that promote Pareto optimality (Désidéri, 2012), such that the graph encoder actively learns knowledge from every pretext task while minimizing conflicts. We call our method PARETOGNN. Overall, our contributions are summarized as follows:

• We investigate the problem of task generalization on graphs in a more rigorous setting, where a good SSL-based GNN should perform well not only over different datasets but also at multiple distinct downstream tasks simultaneously. We evaluate state-of-the-art graph SSL frameworks in this setting and unveil their sub-optimal task generalization.

• To enhance generalization across tasks, as an important first step toward exploring fundamental graph models, we design five simple and scalable pretext tasks according to philosophies proven effective in the SSL literature and propose PARETOGNN, a multi-task SSL framework for GNNs. PARETOGNN is simultaneously self-supervised by these pretext tasks, which are dynamically reconciled to promote Pareto optimality, such that the graph encoder actively learns knowledge from every pretext task while minimizing potential conflicts.

• We evaluate PARETOGNN along with 7 state-of-the-art SSL-based GNNs on 11 acknowledged benchmarks over 4 downstream tasks (i.e., node classification, node clustering, link prediction, and partition prediction). Our experiments show that PARETOGNN improves the overall performance by up to +5.3% over the state-of-the-art SSL-based GNNs. Moreover, we observe that PARETOGNN achieves state-of-the-art single-task performance, demonstrating that it achieves better task generalization via the disjoint yet complementary knowledge learned from different philosophies.
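The reconciliation described above rests on the multiple-gradient descent idea (Désidéri, 2012): find the minimum-norm point in the convex hull of the per-task gradients; its negation is a descent direction common to all tasks. As a minimal, hedged sketch (a standard Frank-Wolfe-style solver on toy arrays, not necessarily the authors' exact implementation):

```python
import numpy as np

def min_norm_2(g1, g2):
    """Closed-form gamma in [0, 1] minimizing ||gamma*g1 + (1-gamma)*g2||^2."""
    diff = g1 - g2
    denom = diff @ diff
    if denom == 0.0:
        return 0.5
    return float(np.clip(((g2 - g1) @ g2) / denom, 0.0, 1.0))

def mgda_weights(grads, iters=50):
    """Frank-Wolfe solver for the min-norm point in the convex hull of the
    per-task gradients. grads: (k, d) array with one flattened gradient per
    pretext task. Returns simplex weights alpha; if the min-norm point is
    nonzero, -sum_i alpha_i * grads[i] descends on every task at once."""
    k = grads.shape[0]
    alpha = np.full(k, 1.0 / k)          # start from the uniform combination
    gram = grads @ grads.T               # pairwise gradient inner products
    for _ in range(iters):
        t = int(np.argmin(gram @ alpha))  # vertex most opposed to current combo
        gamma = min_norm_2(grads[t], alpha @ grads)  # line search toward e_t
        alpha = gamma * np.eye(k)[t] + (1.0 - gamma) * alpha
    return alpha

# Hypothetical per-task gradients (stand-ins for flattened encoder gradients)
g = np.array([[2.0, 0.0], [0.0, 1.0]])
alpha = mgda_weights(g)  # -> approximately [0.2, 0.8]
```

Note how the longer gradient receives the smaller weight, so no single pretext task dominates the shared encoder's update.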

2. MULTI-TASK SELF-SUPERVISED LEARNING VIA PARETOGNN

In this section, we present our proposed multi-task self-supervised learning framework for GNNs, namely PARETOGNN. As Figure 1 illustrates, PARETOGNN is trained with different SSL tasks
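To make the training mechanism concrete before the formal description, the toy sketch below optimizes a single shared parameter vector against two competing objectives, recomputing the reconciling weight at every step. The quadratic losses, learning rate, and iteration count are hypothetical stand-ins for the real pretext tasks and graph encoder, not the paper's implementation:

```python
import numpy as np

# Two hypothetical pretext losses over a shared parameter vector w:
#   L1(w) = ||w - a||^2  and  L2(w) = ||w - b||^2
a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])

def grads(w):
    # Analytic gradients of L1 and L2 with respect to the shared parameters
    return 2.0 * (w - a), 2.0 * (w - b)

def pareto_step(w, lr=0.1):
    g1, g2 = grads(w)
    diff = g1 - g2
    denom = diff @ diff
    # Closed-form two-task min-norm weight (gamma in [0, 1])
    gamma = 0.5 if denom == 0.0 else float(np.clip(((g2 - g1) @ g2) / denom, 0.0, 1.0))
    d = gamma * g1 + (1.0 - gamma) * g2  # common descent direction
    return w - lr * d

w = np.zeros(2)
for _ in range(200):
    w = pareto_step(w)
# w converges to the Pareto-stationary point midway between a and b,
# where the two gradients exactly oppose each other.
```

In PARETOGNN the same pattern applies, except the gradients come from backpropagating each pretext loss through the shared graph encoder.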

AVAILABILITY

Our code is publicly available at https://github.com/jumxglhf/ParetoGNN.

