EMPOWERING GRAPH REPRESENTATION LEARNING WITH TEST-TIME GRAPH TRANSFORMATION

Abstract

As powerful tools for representation learning on graphs, graph neural networks (GNNs) have facilitated various applications from drug discovery to recommender systems. Nevertheless, the effectiveness of GNNs is immensely challenged by issues related to data quality, such as distribution shift, abnormal features and adversarial attacks. Recent efforts have been made toward tackling these issues from a modeling perspective, which incurs the additional cost of changing model architectures or re-training model parameters. In this work, we provide a data-centric view of these issues and propose a graph transformation framework named GTRANS, which adapts and refines graph data at test time to achieve better performance. We provide theoretical analysis on the design of the framework and discuss why adapting graph data works better than adapting the model. Extensive experiments have demonstrated the effectiveness of GTRANS in three distinct scenarios across eight benchmark datasets where suboptimal data is presented. Remarkably, GTRANS performs the best in most cases, with improvements up to 2.8%, 8.2% and 3.8% over the best baselines in the three experimental settings. Code is released at https://github.com/ChandlerBang/GTrans.

1. INTRODUCTION

Graph representation learning has been at the center of various real-world applications, such as drug discovery (Duvenaud et al., 2015; Guo et al., 2022), recommender systems (Ying et al., 2018; Fan et al., 2019; Sankar et al., 2021), forecasting (Tang et al., 2020; Derrow-Pinion et al., 2021) and outlier detection (Zhao et al., 2021a; Deng & Hooi, 2021). In recent years, there has been a surge of interest in developing graph neural networks (GNNs) as powerful tools for graph representation learning (Kipf & Welling, 2016a; Veličković et al., 2018; Hamilton et al., 2017; Wu et al., 2019). Remarkably, GNNs have achieved state-of-the-art performance on numerous graph-related tasks including node classification, graph classification and link prediction (Chien et al., 2021; You et al., 2021; Zhao et al., 2022b). Despite the enormous success of GNNs, recent studies have revealed that their generalization and robustness are immensely challenged by data quality (Jin et al., 2021b; Li et al., 2022). In particular, GNNs can behave unreliably in scenarios where suboptimal data is presented: (1) Distribution shift (Wu et al., 2022a; Zhu et al., 2021a): GNNs tend to yield inferior performance when the distributions of training and test data are not aligned (due to corruption or an inconsistent collection procedure for test data). (2) Abnormal features (Liu et al., 2021a): GNNs suffer from high classification errors when data contains abnormal features, e.g., incorrect user profile information in social networks. (3) Adversarial structure attack (Zügner et al., 2018; Li et al., 2021): GNNs are vulnerable to imperceptible perturbations on the graph structure, which can lead to severe performance degradation. To tackle these problems, significant efforts have been made on developing new techniques from the modeling perspective, e.g., designing new architectures and employing adversarial training strategies (Xu et al., 2019; Wu et al., 2022a).
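The data-centric view sketched above can be illustrated with a toy example (this is only a generic sketch of test-time data adaptation, not the actual GTRANS algorithm): freeze a trained model and update the test graph's node features by gradient descent on a self-supervised surrogate loss, here prediction entropy. The one-layer linear "GNN", the entropy surrogate, and the finite-difference gradient are all illustrative assumptions.

```python
import numpy as np

# Hypothetical frozen one-layer "GNN": softmax(A_hat @ X @ W).
# Test-time graph transformation in spirit: keep the model weights W fixed
# and adapt the test-time node features X to minimize a surrogate loss.
rng = np.random.default_rng(0)
n, d, c = 6, 4, 3                                  # nodes, feature dim, classes
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.maximum(A, A.T)                             # make adjacency symmetric
np.fill_diagonal(A, 1)                             # add self-loops
deg = A.sum(1)
A_hat = A / np.sqrt(np.outer(deg, deg))            # symmetric normalization
W = rng.normal(size=(d, c))                        # frozen (pre-trained) weights
X = rng.normal(size=(n, d))                        # test-time node features

def entropy_loss(X):
    """Self-supervised surrogate: mean entropy of the model's predictions."""
    logits = A_hat @ X @ W
    p = np.exp(logits - logits.max(1, keepdims=True))
    p /= p.sum(1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(1).mean()

def num_grad(f, X, eps=1e-5):
    """Finite-difference gradient of f w.r.t. X (for illustration only)."""
    g = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        Xp, Xm = X.copy(), X.copy()
        Xp[idx] += eps
        Xm[idx] -= eps
        g[idx] = (f(Xp) - f(Xm)) / (2 * eps)
    return g

before = entropy_loss(X)
for _ in range(50):                                # adapt the data, not the model
    X -= 0.5 * num_grad(entropy_loss, X)
after = entropy_loss(X)
print(f"entropy before={before:.3f} after={after:.3f}")
```

The key design choice this sketch mirrors is that the model parameters never change: only the graph data is refined at test time, so the same pre-trained model can be reused across differently corrupted test graphs.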
However, employing these methods in practice may be

