LEARNING DISCRETE ADAPTIVE RECEPTIVE FIELDS FOR GRAPH CONVOLUTIONAL NETWORKS

Anonymous

Abstract

Different nodes in a graph neighborhood are generally of different importance. In previous work on Graph Convolutional Networks (GCNs), such differences are typically modeled with attention mechanisms. However, as we prove in this paper, soft attention weights suffer from over-smoothness in large neighborhoods. To address this weakness, we introduce a novel framework for conducting graph convolutions, in which nodes are discretely selected from multi-hop neighborhoods to construct adaptive receptive fields (ARFs). ARFs free GCNs from the over-smoothness of soft attention weights and allow them to efficiently explore long-distance dependencies in graphs. We further propose GRARF (GCN with Reinforced Adaptive Receptive Fields) as an instance, in which an optimal policy for constructing ARFs is learned with reinforcement learning. GRARF achieves or matches state-of-the-art performance on public datasets from different domains. Our further analysis corroborates that GRARF is more robust than attention models against neighborhood noise.

1. INTRODUCTION

After a series of explorations and modifications (Bruna et al., 2014; Kipf & Welling, 2017; Velickovic et al., 2017; Xu et al., 2019; Li et al., 2019; Abu-El-Haija et al., 2019), Graph Convolutional Networks (GCNs)¹ have gained considerable attention in the machine learning community. Typically, a graph convolutional model can be abstracted as a message-passing process (Gilmer et al., 2017): nodes in the neighborhood of a central node are regarded as contexts, which individually pass their messages to the central node via convolutional layers. The central node then weighs and transforms these messages. This process is conducted recursively as the depth of the network increases.² Neighborhood convolutions have proved widely useful on various kinds of graph data. However, current GCNs also have some shortcomings. While different nodes in a neighborhood may be of different importance, early GCNs (Kipf & Welling, 2017; Hamilton et al., 2017) did not discriminate among contexts in their receptive fields. These models either treated contexts equally or used normalized edge weights as the weights of contexts. As a result, such implementations failed to capture critical contexts, i.e., contexts that pose a greater influence on the central node (close friends among acquaintances, for example). Graph Attention Networks (GATs) (Velickovic et al., 2017) resolved this problem with attention mechanisms (Bahdanau et al., 2015; Vaswani et al., 2017). Soft attention weights were used to discriminate the importance of contexts, which allowed the model to better focus on relevant contexts when making decisions. With their impressive performance, GATs became widely used in later generations of GCNs, including (Li et al., 2019; Liu et al., 2019). However, we observe that using soft attention weights in hierarchical convolutions does not fully solve the problem.
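The message-passing abstraction above can be sketched concretely. The following is a minimal illustration of one neighborhood convolution with mean aggregation, in the spirit of the early GCNs discussed above that weigh all contexts equally; the function name and the NumPy formulation are ours, not the paper's:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One message-passing convolution: aggregate context messages, then transform.

    A: (n, n) adjacency matrix, H: (n, d_in) node features, W: (d_in, d_out) weights.
    """
    A_hat = A + np.eye(A.shape[0])           # add self-loops so a node keeps its own message
    deg = A_hat.sum(axis=1, keepdims=True)   # neighborhood sizes
    messages = (A_hat @ H) / deg             # mean of context messages (all contexts weighted equally)
    return np.maximum(messages @ W, 0.0)     # linear transform + ReLU

# A 4-node path graph; stacking layers widens the receptive field by one hop per layer.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))
W1, W2 = rng.normal(size=(8, 8)), rng.normal(size=(8, 4))
out = gcn_layer(A, gcn_layer(A, H, W1), W2)  # two layers: a two-hop receptive field
```

Note that the uniform weights `1 / deg` are exactly the indiscriminate treatment of contexts that attention mechanisms, and later the ARFs proposed here, set out to replace.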
Firstly, we will show as Proposition 1 that, under common conditions, soft attention weights almost surely approach 0 as neighborhood sizes increase. This smoothness³ hinders the discrimination of context importance in large neighborhoods. Secondly, we will show by experiments
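The first point can be illustrated numerically. The toy experiment below (our own sketch, not the paper's proof) draws i.i.d. Gaussian attention logits for a neighborhood of n contexts and averages the largest resulting softmax weight; as n grows, even the most-attended context receives a weight close to 0:

```python
import numpy as np

rng = np.random.default_rng(0)

def avg_max_attention(n, trials=200):
    """Average largest softmax weight over random neighborhoods of size n."""
    vals = []
    for _ in range(trials):
        logits = rng.normal(size=n)        # i.i.d. attention logits for n contexts
        w = np.exp(logits - logits.max())  # stable softmax numerator; the max entry is 1
        vals.append(1.0 / w.sum())         # hence the largest softmax weight is 1 / sum
    return float(np.mean(vals))

for n in (5, 50, 500):
    print(n, round(avg_max_attention(n), 4))  # largest weight shrinks as n grows
```

A discrete selection mechanism such as an ARF avoids this shrinkage by construction: selected contexts are kept with full weight regardless of neighborhood size.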



¹ We use the name GCN for a class of deep learning approaches in which information is convolved among graph neighborhoods, including but not limited to the vanilla GCN (Kipf & Welling, 2017).
² We use the term contexts to denote the neighbor nodes, and receptive field to denote the set of contexts that the convolutions refer to.
³ The smoothness discussed in our paper is different from that in (Li et al., 2018), i.e., the phenomenon that representations of nodes converge in very deep GNNs.

