GRAPH REPRESENTATION LEARNING FOR MULTI-TASK SETTINGS: A META-LEARNING APPROACH

Abstract

Graph Neural Networks (GNNs) have become the state-of-the-art method for many applications on graph-structured data. GNNs are a framework for graph representation learning, where a model learns to generate low-dimensional node embeddings that encapsulate structural and feature-related information. GNNs are usually trained in an end-to-end fashion, leading to highly specialized node embeddings. While this approach achieves great results in the single-task setting, generating node embeddings that can be used to perform multiple tasks (with performance comparable to single-task models) is still an open problem. We propose a novel representation learning strategy, based on meta-learning, capable of producing multi-task node embeddings. Our method avoids the difficulties arising when learning to perform multiple tasks concurrently by, instead, learning to quickly (i.e., with a few steps of gradient descent) adapt to each task individually. We show that the embeddings produced by our method can be used to perform multiple tasks with comparable or higher performance than both single-task and multi-task end-to-end models. Our method is model-agnostic and task-agnostic, and can hence be applied to a wide variety of multi-task domains.

1. INTRODUCTION

Most GNNs follow an encoder-decoder framework (Hamilton et al., 2017; Chami et al., 2020; Wu et al., 2020). The encoder produces node embeddings (low-dimensional vectors capturing relevant structural and feature-related information about each node), while the decoder uses the embeddings to carry out the desired downstream task. The model is then trained in an end-to-end manner, giving rise to highly specialized node embeddings. While this can lead to state-of-the-art performance, it also affects the generalization and reusability of the embeddings. In fact, taking the encoder from a GNN trained on a given task and using its node embeddings to train a decoder for a different task leads to a substantial performance loss, as shown in Figure 1.

The low transferability of node embeddings requires the use of one specialized encoder and one specialized decoder for each considered task. However, many practical machine learning applications operate in resource-constrained environments where being able to share part of the model architecture between tasks is of great importance. Furthermore, the training signal from multiple related tasks can lead to higher generalization. Nevertheless, making sure tasks do not negatively interfere with each other is not trivial (Standley et al., 2020).

The problem of learning models that can perform multiple tasks is known as Multi-Task Learning (MTL), and is an open area of research attracting many researchers in the deep learning community (Vandenhende et al., 2020). MTL on graphs has not received much attention, and no single model capable of performing the three most common graph-related tasks has yet been proposed. In fact, we notice that training a multi-head model with the classical procedure, i.e. by performing multiple tasks concurrently on each graph and updating the parameters with some form of gradient descent to minimize the sum of the single-task losses, can lead to a performance loss with respect to single-task models.
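The transfer experiment described above can be sketched as follows. This is a minimal illustration (not the paper's code, and with assumed names like `GCNEncoder` and toy sizes): an encoder and decoder are trained end-to-end on task A, then the encoder is frozen and a fresh decoder is fitted for task B on the now-fixed embeddings.

```python
# Hedged sketch of the encoder-decoder pattern and the embedding-transfer
# setup behind Figure 1. All names, sizes, and the toy graph are
# illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn

class GCNEncoder(nn.Module):
    """One-layer GCN-style encoder: H = ReLU(A_hat @ X @ W)."""
    def __init__(self, in_dim, emb_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, emb_dim)

    def forward(self, a_hat, x):
        return torch.relu(a_hat @ self.lin(x))  # node embeddings

def train(encoder, decoder, a_hat, x, y, steps=100, freeze_encoder=False):
    # When the encoder is frozen, only the decoder's parameters are updated.
    params = list(decoder.parameters())
    if not freeze_encoder:
        params += list(encoder.parameters())
    opt = torch.optim.Adam(params, lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(decoder(encoder(a_hat, x)), y)
        loss.backward()
        opt.step()
    return loss.item()

# Toy graph: 8 nodes, symmetrically normalized adjacency with self-loops.
torch.manual_seed(0)
n, d = 8, 5
adj = (torch.rand(n, n) > 0.6).float()
adj = ((adj + adj.t() + torch.eye(n)) > 0).float()
deg_inv_sqrt = adj.sum(1).pow(-0.5)
a_hat = deg_inv_sqrt[:, None] * adj * deg_inv_sqrt[None, :]
x = torch.randn(n, d)
y_a = torch.randint(0, 2, (n,))  # labels for task A
y_b = torch.randint(0, 2, (n,))  # labels for task B

enc = GCNEncoder(d, 16)
train(enc, nn.Linear(16, 2), a_hat, x, y_a)  # end-to-end on task A
loss_b = train(enc, nn.Linear(16, 2), a_hat, x, y_b,
               freeze_encoder=True)          # transfer frozen encoder to B
print(f"task-B loss with frozen task-A encoder: {loss_b:.3f}")
```

Comparing `loss_b` against a model trained end-to-end on task B directly reproduces, in miniature, the kind of gap reported in Figure 1.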
Thus, we propose a novel optimization-based meta-learning (Finn et al., 2017) procedure with a focus on representation learning that can generate node embeddings that generalize across tasks. Our procedure produces task-generalizing node embeddings by aiming not at a setting of the parameters that can perform multiple tasks concurrently (like a classical method would), nor at a setting that allows fast multi-task adaptation (like traditional meta-learning), but at a setting that can easily be adapted to perform each of the tasks individually. In fact, our meta-learning procedure aims at a setting of the parameters where a few steps of gradient descent on a given task can lead to good performance on that task, hence removing the burden of directly learning to solve multiple tasks concurrently. We summarize our contributions as follows:

• We propose a novel method for learning representations that can generalize to multiple tasks. We apply it to the challenging setting of graph MTL, and show that a GNN trained with our method produces higher quality node embeddings with respect to classical end-to-end training procedures. Our method is based on meta-learning and is model-agnostic and task-agnostic, which makes it easily applicable to a wide range of multi-task domains.

• To the best of our knowledge, we are the first to propose a GNN model generating a single set of node embeddings that can be used to perform the three most common graph-related tasks (graph classification, node classification, and link prediction). In particular, our embeddings lead to comparable or higher performance with respect to single-task models even when used as input to a simple linear classifier.

• We show that the episodic training strategy at the base of our proposed meta-learning procedure leads to better node embeddings even for models trained on a single task. This unexpected finding provides interesting directions that we believe can be useful to the whole deep representation learning community.
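The optimization pattern described above can be sketched in the spirit of MAML (Finn et al., 2017): each inner loop adapts the shared parameters to a single task with a few gradient steps, and the outer loop updates the shared parameters so that such single-task adaptation succeeds. The function names, the quadratic toy losses, and all hyperparameters below are illustrative assumptions, not the paper's actual training code.

```python
# Hedged sketch of an optimization-based meta-learning loop that adapts to
# each task individually (inner loop) and meta-updates the shared
# initialization (outer loop). Toy quadratic "tasks" stand in for the
# per-task GNN losses; everything here is an illustrative assumption.
import torch

def inner_adapt(theta, task_loss, steps=3, inner_lr=0.1):
    """A few steps of gradient descent on ONE task, starting from theta."""
    phi = [p.clone() for p in theta]
    for _ in range(steps):
        loss = task_loss(phi)
        grads = torch.autograd.grad(loss, phi, create_graph=True)
        phi = [p - inner_lr * g for p, g in zip(phi, grads)]
    return phi

def meta_step(theta, task_losses, meta_lr=0.1):
    """Outer update: minimize the sum of post-adaptation single-task losses."""
    meta_loss = sum(task_loss(inner_adapt(theta, task_loss))
                    for task_loss in task_losses)
    grads = torch.autograd.grad(meta_loss, theta)
    theta = [(p - meta_lr * g).detach().requires_grad_(True)
             for p, g in zip(theta, grads)]
    return theta, meta_loss.item()

# Two toy tasks with conflicting optima (+1 and -1): no single parameter
# value solves both, but one initialization adapts quickly to either.
theta = [torch.tensor([2.0, -1.5], requires_grad=True)]
tasks = [lambda phi: ((phi[0] - 1.0) ** 2).sum(),
         lambda phi: ((phi[0] + 1.0) ** 2).sum()]
for _ in range(50):
    theta, meta_loss = meta_step(theta, tasks)
```

Note how this differs from classical MTL: the parameters converge to a point between the two task optima from which a few inner steps reach either one, rather than to a compromise that performs both tasks concurrently.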

2. RELATED WORK

GNNs, MTL, and meta-learning are very active areas of research. We highlight works that are at the intersections of these subjects, and point the interested reader to comprehensive reviews of each field. To the best of our knowledge, there is no work using meta-learning for graph MTL, or proposing a GNN performing graph classification, node classification, and link prediction concurrently.



Figure 1: Performance drop when transferring node embeddings between tasks on (a) Node Classification (NC), (b) Graph Classification (GC), and (c) Link Prediction (LP) on the ENZYMES dataset. On the horizontal axis, "x -> y" indicates that the embeddings obtained from a model trained on task x are used to train a network for task y.

Graph Neural Networks. GNNs have a long history (Scarselli et al., 2009), but in the past few years the field has grown exponentially; we refer the reader to Chami et al. (2020) and Wu et al. (2020) for a thorough review of the field. The first popular GNN approaches were based on filters in the graph spectral domain (Bronstein et al., 2017), and presented many challenges, including high computational complexity. Defferrard et al. (2016) introduced ChebNet, which uses Chebyshev polynomials to produce localized and efficient filters in the graph spectral domain. Graph Convolutional Networks (Kipf & Welling, 2017) then introduced a localized first-order approximation of spectral graph convolutions, which was later extended to include attention mechanisms (Veličković et al., 2018). Recently, Xu et al. (2019) provided theoretical proofs regarding the expressivity of GNNs.
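The first-order spectral approximation mentioned above reduces, for one layer, to the propagation rule H' = ReLU(D^{-1/2}(A + I)D^{-1/2} X W) of Kipf & Welling (2017). A minimal NumPy sketch (toy graph and feature sizes are assumptions for illustration):

```python
# Illustrative sketch of one GCN layer: symmetric normalization of the
# adjacency with self-loops, followed by a linear transform and ReLU.
# The toy path graph and feature dimensions are assumptions.
import numpy as np

def gcn_layer(adj, x, w):
    """H' = ReLU(D^{-1/2} (A + I) D^{-1/2} X W)."""
    a_tilde = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    a_hat = d_inv_sqrt[:, None] * a_tilde * d_inv_sqrt[None, :]
    return np.maximum(a_hat @ x @ w, 0.0)           # ReLU

# Toy 4-node path graph, 3 input features, 2 output features.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
h = gcn_layer(adj, rng.standard_normal((4, 3)), rng.standard_normal((3, 2)))
print(h.shape)  # (4, 2): one embedding per node
```

Each layer thus mixes a node's features with those of its immediate neighbors; stacking k such layers widens the receptive field to k-hop neighborhoods.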

