ROBUST GRAPH REPRESENTATION LEARNING VIA PREDICTIVE CODING

Abstract

Graph neural networks have recently shown outstanding results in diverse types of tasks in machine learning, providing interdisciplinary state-of-the-art performance on structured data. However, they have been shown to be vulnerable to imperceptible adversarial attacks and unfit for out-of-distribution generalisation. Here, we address this problem by introducing a novel message-passing scheme based on the theory of predictive coding, an energy-based alternative to back-propagation with roots in neuroscience. As both graph convolution and predictive coding can be seen as low-pass filtering mechanisms, we postulate that predictive coding adds a second efficient filter to the message-passing process, enhancing the robustness of the learned representations. Through an extensive set of experiments, we show that the proposed model attains performance comparable to its graph convolutional network counterpart, delivering strictly better performance on inductive tasks. Most importantly, we show that the energy minimisation enhances the robustness of the produced representations, and can be leveraged to further calibrate our models and provide representations that are more robust against advanced graph adversarial attacks.

1. INTRODUCTION

Extracting information from structured data has always been an active area of research in machine learning. This, combined with the rise of deep neural networks as the field's dominant model, has led to the development of graph neural networks (GNNs). These models have achieved state-of-the-art results in diverse types of tasks, with interdisciplinary applications in areas such as e-commerce and financial fraud detection (Zhang et al., 2022; Wang et al., 2019), drug and advanced material discovery (Bongini et al., 2021; Zhao et al., 2021; Xiong et al., 2019), recommender systems (Wu et al., 2021), and social networks (Liao et al., 2018). Their power lies in a message-passing mechanism among the vertices of a graph, performed iteratively at different levels of the hierarchy of a deep network. Popular examples of these models are graph convolutional networks (GCNs) (Welling & Kipf, 2016) and graph attention networks (Veličković et al., 2017). Despite the results and performance obtained in recent years, these models have been shown to lack robustness: they are vulnerable to imperceptible, carefully crafted adversarial attacks (Dai et al., 2018; Zügner et al., 2018; Zügner & Günnemann, 2019; Günnemann, 2022) and unfit for out-of-distribution generalisation (Hu et al., 2020). This prevents GNNs from being used in critical tasks, where misleading predictions may lead to serious consequences, or where maliciously manipulated signals may lead to the loss of large amounts of money. More generally, robustness has always been a problem for deep learning models, highlighted by the famous example of a panda picture being classified as a gibbon with almost perfect confidence after the addition of a small amount of adversarial noise (Akhtar & Mian, 2018).
To address this problem, an influential work has shown that it is possible to treat a classifier as an energy-based generative model and to train the joint distribution of a data point and its label to improve robustness and calibration (Grathwohl et al., 2019). Motivated by this result, this work studies the robustness of GNNs trained using an energy-based training algorithm called predictive coding (PC), originally developed to model information processing in hierarchical generative networks in the neocortex (Rao & Ballard, 1999). Despite not being initially developed for machine learning tasks, recent works have analysed possible applications of PC in deep learning. This is motivated by interesting properties of PC, as well as its surprising similarities to BP: when used to train classifiers, PC is able to approximate the weight update of BP on any neural network (Whittington & Bogacz, 2017; Millidge et al., 2021), and a variation of it is able to exactly replicate the weight update of BP (Song et al., 2020; Salvatori et al., 2022b). It has been shown that PC is able to train powerful image classifiers (He et al., 2016), perform generation tasks (Ororbia & Kifer, 2022), continual learning (Ororbia et al., 2020), associative memories (Salvatori et al., 2021), and reinforcement learning (Ororbia & Mali, 2022), and to train neural networks with any structure (Salvatori et al., 2022a). In this work, we extend the study of PC to structured data, and show that PC naturally trains robust classifiers due to its energy-based formulation. To this end, we first show that PC matches the performance of BP on small and medium tasks, extending the image-classification results of Whittington & Bogacz (2017) to graph data, and then demonstrate the improved calibration and robustness against adversarial attacks of models trained this way.
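To make the energy-based formulation concrete, the following is a minimal toy sketch of the PC principle described above: value nodes predict the layer below, the energy is the sum of squared prediction errors, and inference relaxes the free hidden nodes by gradient descent on that energy (weights held fixed here). The layer sizes, learning rate, and variable names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Toy predictive coding inference on a 3-layer linear chain.
# x0 (data) and x2 (label) are clamped; x1 is a free hidden layer
# whose value nodes are relaxed to minimise the total energy.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # predicts the data layer from x1
W2 = rng.normal(size=(3, 2))   # predicts the hidden layer from x2
x0 = rng.normal(size=4)        # clamped data layer
x2 = rng.normal(size=2)        # clamped label/top layer
x1 = np.zeros(3)               # free hidden layer

def energy(x1):
    e0 = x0 - W1 @ x1          # prediction error at the data layer
    e1 = x1 - W2 @ x2          # prediction error at the hidden layer
    return 0.5 * (e0 @ e0 + e1 @ e1)

for _ in range(200):           # inference: gradient descent on E w.r.t. x1
    e0 = x0 - W1 @ x1
    e1 = x1 - W2 @ x2
    x1 -= 0.02 * (-W1.T @ e0 + e1)   # dE/dx1
```

In a full PC training loop, this inference phase would be followed by a weight update that also descends the same energy, which is the property underlying the BP-approximation results cited above.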
Summarizing, our contributions are briefly as follows:

• We introduce and formalise a new class of message-passing models, which we call graph predictive coding networks (GPCNs). We show that these models achieve performance comparable to equivalent GCNs trained using BP on multiple tasks, and propose a general recipe to train any message-passing GNN with PC.

• We empirically show that GPCNs are less confident in their predictions, and hence produce models that are better calibrated than equivalent GCNs. Our results show large improvements in expected calibration error (ECE) and maximum calibration error (MCE) on the Cora, Citeseer, and Pubmed datasets. This demonstrates the ability of GPCNs to estimate a likelihood close to the true probability of a given data point, and their capacity to better capture uncertainty in their predictions.

• We further conduct an extensive robustness evaluation using advanced graph adversarial attacks along various dimensions: poisoning and evasion, global and targeted, direct and indirect. In these evaluations, GPCNs outperform their GCN counterparts on all kinds of evasion attacks, gain over 10% improvement on poisoning attacks on the most corrupted graph data, and obtain better performance on various datasets than more complex methods that use attention mechanisms (Veličković et al., 2017) or tricks designed to make the model more robust (Zhu et al., 2019).
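The calibration metrics mentioned above can be computed with a standard binning scheme; the sketch below is one common formulation (the bin count and function name are our own choices, not taken from the paper): predictions are grouped into confidence bins, and the gap between each bin's average confidence and its empirical accuracy gives ECE (size-weighted mean gap) and MCE (largest gap).

```python
import numpy as np

def calibration_errors(confidences, correct, n_bins=10):
    """Bin predictions by confidence and compare average confidence to
    empirical accuracy per bin. Returns (ECE, MCE): the size-weighted
    mean gap and the maximum gap over non-empty bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece, mce = 0.0, 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        gap = abs(correct[mask].mean() - confidences[mask].mean())
        ece += mask.mean() * gap   # weight by fraction of samples in bin
        mce = max(mce, gap)
    return ece, mce
```

For example, ten predictions at 95% confidence of which nine are correct yield a gap of 0.05 in their bin, so both ECE and MCE equal 0.05; a perfectly calibrated model drives both metrics to zero.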

2. PRELIMINARIES

In this section, we review the general framework of message-passing neural networks (MPNNs) (Gilmer et al., 2017). Let us assume a graph G = (V, E, X) with a set of nodes V, a set of edges E, and a set of attributes or properties of each node in the graph, described by a matrix X ∈ ℝ^{|V|×d}. The idea behind MPNNs is to begin with certain initial node features and iteratively modify them over the course of k iterations using information gained from the neighbours of each node (hence, message passing), according to a multilayer structure. Let the representation of a node u ∈ V at layer k be h_u^{(k)}. This representation is iteratively updated as follows:

h_u^{(k)} = update^{(k)}( h_u^{(k-1)}, aggregate^{(k)}( { h_v^{(k-1)} | v ∈ N(u) } ) ),

where N(u) is the set of neighbours of node u, and update and aggregate are differentiable functions (i.e., neural networks). The aggregate function has to be permutation-invariant to maintain the symmetries necessary when operating on graph data, such as locality and invariance properties. In this work, we will mainly focus on graph convolutional networks (GCNs). Here, the aggregate function is a weighted combination of neighbour features with predetermined fixed weights, and the update function is a linear transformation. We chose GCNs as they tend to be lightweight and scale conveniently to large graphs.

2.1. PREDICTIVE CODING NETWORKS

Predictive coding networks (PCNs) were first introduced for unsupervised feature learning (Rao & Ballard, 1999) , and later extended to supervised learning (Whittington & Bogacz, 2017) . Here, we

