ENERGY-BASED OUT-OF-DISTRIBUTION DETECTION FOR GRAPH NEURAL NETWORKS

Abstract

Learning on graphs, where instance nodes are inter-connected, has become one of the central problems of deep learning, as relational structures are pervasive and induce data inter-dependence that hinders trivial adaptation of existing approaches assuming i.i.d. inputs. However, current models mostly focus on improving testing performance on in-distribution data and largely ignore the potential risk w.r.t. out-of-distribution (OOD) testing samples, which may cause negative outcomes if predictions on them are overconfident. In this paper, we investigate the under-explored problem of OOD detection on graph-structured data and identify a provably effective OOD discriminator based on an energy function directly extracted from graph neural networks trained with a standard classification loss. This paves the way for a simple, powerful, and efficient OOD detection model for GNN-based learning on graphs, which we call GNNSAFE. It also has nice theoretical properties that guarantee an overall distinguishable margin between the detection scores of in-distribution and OOD samples, which, more critically, can be further strengthened by a learning-free energy belief propagation scheme. For comprehensive evaluation, we introduce new benchmark settings that evaluate the model on detecting OOD data from both synthetic and real distribution shifts (cross-domain graph shifts and temporal graph shifts). The results show that GNNSAFE achieves up to 17.0% AUROC improvement over state-of-the-art methods and can serve as a simple yet strong baseline in this under-developed area. The code is available at https://github.com/qitianwu/GraphOOD-GNNSafe.

1. INTRODUCTION

Real-world applications often require machine learning systems to interact with an open world, violating the common assumption that testing and training distributions are identical. This urges the community to devote increasing effort to enhancing models' generalization (Muandet et al., 2013) and reliability (Liang et al., 2018) w.r.t. out-of-distribution (OOD) data. However, most current approaches are built on the hypothesis that data samples are independently generated (e.g., image recognition, where instances have no interaction). Such a premise hinders these models from readily adapting to graph-structured data, where node instances have inter-dependence (Zhao et al., 2020; Ma et al., 2021; Wu et al., 2022a).

Out-of-Distribution Generalization. To fill this research gap, a growing number of recent studies on graph-related tasks move beyond the single target of in-distribution testing performance and turn more attention to how models can generalize to perform well on OOD data. One of the seminal works (Wu et al., 2022a) formulates the graph-based OOD generalization problem and leverages the (causal) invariance principle to devise a new domain-invariant learning approach for graph data. Unlike grid-structured, independently generated images, distribution shifts on graph-structured data can be more complicated and harder to address, often requiring graph-specific technical originality. For instance, Yang et al. (2022c) propose to identify invariant substructures, i.e., subsets of nodes with causal effects on labels, in input graphs to learn stable predictive relations across environments, while Yang et al. (2022b) resort to an analogy with thermodynamic diffusion on graphs to build a principled knowledge distillation model for geometric knowledge transfer and generalization.

Out-of-Distribution Detection.
One critical challenge standing in the way of trustworthy AI systems is how to equip deep learning models with enough reliability to realize what they don't know, i.e., to detect OOD data on which the models are expected to have low confidence (Amodei et al., 2016; Liang et al., 2018). This is particularly important in safety-critical applications such as medical diagnosis (Kukar, 2003) and autonomous driving (Dai & Van Gool, 2018). This problem is called out-of-distribution detection in the literature and aims at discriminating OOD data from in-distribution data. While there has been a surge of recent work exploring effective methods for OOD detection (Hendrycks & Gimpel, 2016; Bevandić et al., 2018; DeVries & Taylor, 2018; Hein et al., 2019; Hsu et al., 2020; Sun et al., 2021; Bitterwolf et al., 2022), these models again tacitly assume inputs to be independently sampled and are hard to apply to graph data.

In this paper, we investigate out-of-distribution detection for learning on graphs, where the model needs to handle the data inter-dependence induced by the graph. Our methodology builds on graph neural networks (GNNs) as an encoder for node representation/prediction that accommodates structural information. As an important step toward enhancing the reliability of GNN models against OOD testing instances, we identify an intrinsic OOD discriminator from a GNN classifier trained with a standard learning objective. The key insight of our work is that standard GNN classifiers possess an inherently strong capability for detecting OOD samples (i.e., what they don't know) among unknown testing observations, as detailed below.
• Simplicity and Generality: The out-of-distribution discriminator in our model is based on an energy function that is directly extracted, through a simple transformation, from the predicted logits of a GNN classifier trained with a standard supervised classification loss on in-distribution data. Our model can therefore be efficiently deployed in practice: it requires neither training a graph generative model for density estimation nor any extra OOD discriminator. The model also keeps a general form: the energy-based detector is agnostic to the GNN's architecture and can in principle enhance the reliability of arbitrary off-the-shelf GNNs (or, more broadly, graph Transformers) against OOD data.

• Theoretical Soundness: Despite its simplicity, our model is provably effective at yielding distinguishable scores for in-distribution and OOD inputs, which can be further reinforced by an energy-based belief propagation scheme, a learning-free approach for boosting the detection consensus over the graph topology. We also discuss how to properly incorporate an auxiliary regularization term when the training data contains additional OOD observations as outlier exposure, with double guarantees for preserving in-distribution learning and enhancing out-of-distribution reliability.

• Practical Efficacy: We apply our model to extensive node classification datasets with different properties and consider various OOD types. When training with the standard cross-entropy loss on pure in-distribution data and testing OOD detection at inference time, our model consistently outperforms SOTA approaches with an improvement of up to 12.9% in average AUROC; when training with auxiliary OOD exposure data as regularization and testing on new unseen OOD data, our model outperforms strong competitors with up to 17.0% improvement in average AUROC.
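As a minimal sketch of the core idea (not the authors' exact implementation), the energy score can be read off a trained classifier's logits via a logsumexp transformation, and then smoothed over the graph by a learning-free propagation step; the temperature T, mixing weight alpha, and iteration count K below are hypothetical choices for illustration:

```python
import numpy as np

def energy_score(logits, T=1.0):
    """Energy of each node from its classification logits:
    E(x) = -T * logsumexp(f(x) / T), computed with the usual max-shift
    for numerical stability. In-distribution inputs (peaked logits)
    tend to receive lower energy than OOD inputs."""
    z = logits / T
    m = z.max(axis=-1, keepdims=True)
    return -T * (m.squeeze(-1) + np.log(np.exp(z - m).sum(axis=-1)))

def propagate_energy(E, A, alpha=0.5, K=2):
    """Learning-free energy belief propagation sketch: each step mixes a
    node's own energy with the mean energy of its neighbors, which
    strengthens the margin between in-distribution and OOD regions."""
    deg = A.sum(axis=1, keepdims=True)
    P = A / np.clip(deg, 1, None)  # row-normalized adjacency D^{-1} A
    for _ in range(K):
        E = alpha * E + (1 - alpha) * P @ E
    return E
```

A node with confidently peaked logits gets low energy, and propagation pulls each node's score toward its neighborhood consensus, so isolated spurious scores are damped.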

2. BACKGROUND

Predictive tasks on graphs. We consider a set of instances with indices i ∈ {1, 2, ..., N} = I whose generation process involves inter-dependence among each other, represented by an observed graph G = (V, E), where V = {i | 1 ≤ i ≤ N} denotes the node set containing all the instances and E = {e_ij} denotes the edge set. The observed edges induce an adjacency matrix A = [a_ij]_{N×N}, where a_ij = 1 if there exists an edge connecting nodes i and j and 0 otherwise. Moreover, each instance i has an input feature vector x_i ∈ R^D and a label y_i ∈ {1, ..., C}, where D is the input dimension and C denotes the number of classes. The N instances are partially labeled, and we define I_s (resp. I_u) as the labeled (resp. unlabeled) node set, i.e., I = I_s ∪ I_u. The goal of standard (semi-)supervised learning on graphs is to train a node-level classifier f with Ŷ = f(X, A), where X = [x_i]_{i∈I} and Ŷ = [ŷ_i]_{i∈I}, that predicts the labels for in-distribution instances in I_u.

Out-of-distribution detection. Besides decent predictive performance on in-distribution testing nodes (sampled from the same distribution as the training data), we expect the learned classifier to be capable
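To make the setup concrete, here is a tiny self-contained sketch (with made-up sizes and randomly initialized weights, not the paper's model) of the objects just defined: an adjacency matrix A built from an edge set, node features X, and a two-layer GCN-style classifier producing per-node logits Ŷ = f(X, A):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, C = 6, 4, 3                        # nodes, feature dim, classes
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]

# adjacency matrix: a_ij = 1 iff an edge connects nodes i and j
A = np.zeros((N, N))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# symmetrically normalized adjacency with self-loops (as in a GCN)
A_tilde = A + np.eye(N)
d = A_tilde.sum(axis=1)
A_norm = A_tilde / np.sqrt(np.outer(d, d))

X = rng.normal(size=(N, D))              # node features x_i in R^D
W1 = rng.normal(size=(D, 8))             # illustrative random weights
W2 = rng.normal(size=(8, C))

H = np.maximum(A_norm @ X @ W1, 0.0)     # hidden layer with ReLU
logits = A_norm @ H @ W2                 # Y_hat: one C-dim score vector per node
print(logits.shape)                      # (6, 3)
```

In a real pipeline the weights would be trained with a cross-entropy loss on the labeled set I_s, and the rows of `logits` for nodes in I_u would give the predictions ŷ_i.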

FUNDING

Junchi Yan is also affiliated with Shanghai AI Lab. This work was in part supported by the National Key Research and Development Program of China (2020AAA0107600), the National Natural Science Foundation of China (62222607), and STCSM (22511105100).

