COPULAGNN: TOWARDS INTEGRATING REPRESENTATIONAL AND CORRELATIONAL ROLES OF GRAPHS IN GRAPH NEURAL NETWORKS

Abstract

Graph-structured data are ubiquitous. However, graphs encode diverse types of information and thus play different roles in data representation. In this paper, we distinguish the representational and the correlational roles played by graphs in node-level prediction tasks, and we investigate how Graph Neural Network (GNN) models can effectively leverage both types of information. Conceptually, the representational information guides the model in constructing better node features, while the correlational information indicates the correlation between node outcomes conditional on node features. Through a simulation study, we find that many popular GNN models are incapable of effectively utilizing the correlational information. By leveraging the idea of the copula, a principled way to describe the dependence among multivariate random variables, we offer a general solution. The proposed Copula Graph Neural Network (CopulaGNN) can take a wide range of GNN models as base models and utilize both the representational and the correlational information stored in the graphs. Experimental results on two types of regression tasks verify the effectiveness of the proposed method.¹

1. INTRODUCTION

Graphs, as flexible data representations that store rich relational information, are commonly used in data science tasks. Machine learning methods on graphs (Chami et al., 2020), especially Graph Neural Networks (GNNs), have attracted increasing interest in the research community. They are widely applied to real-world problems such as recommender systems (Ying et al., 2018), social network analysis (Li et al., 2017), and transportation forecasting (Yu et al., 2017). Among the heterogeneous types of graph-structured data, it is worth noting that graphs usually play diverse roles in different contexts, datasets, and tasks. Some of the roles are relational, as a graph may indicate certain statistical relationships between connected observations; some are representational, as the topological structure of a graph may encode important features or patterns of the data; some are even causal, as a graph may reflect causal relationships specified by domain experts. It is crucial to recognize the distinct roles of a graph in order to correctly utilize the signals in graph-structured data.

In this paper, we distinguish the representational role and the correlational role of graphs in the context of node-level (semi-)supervised learning, and we investigate how to design better GNNs that take advantage of both roles. In a node-level prediction task, the observed graph in the data may relate to the outcomes of interest (e.g., node labels) in multiple ways. Conceptually, we say that the graph plays a representational



¹ The code is available at https://github.com/jiaqima/CopulaGNN.

