PARAMETERIZED PSEUDO-DIFFERENTIAL OPERATORS FOR GRAPH CONVOLUTIONAL NEURAL NETWORKS

Abstract

We present a novel graph convolutional layer that is fast, conceptually simple, and provides high accuracy with reduced overfitting. Based on pseudo-differential operators, our layer operates on graphs with relative position information available for each pair of connected nodes. We evaluate our method on a variety of supervised learning tasks: superpixel image classification on the MNIST, CIFAR10, and CIFAR100 superpixel datasets, node correspondence on the FAUST dataset, and shape classification on the ModelNet10 dataset. The new layer outperforms multiple recent architectures on the MNIST and CIFAR100 superpixel datasets and performs comparably with recent results on the CIFAR10 superpixel dataset. To avoid bias toward the test set, we report test accuracy for the model with the best training accuracy. The new layer achieves a test error rate of 0.80% on the MNIST superpixel dataset, improving on the closest reported rate of 0.95% by more than 15% in relative terms. Even after a Delaunay triangulation of the input drops roughly 70% of the edge connections, our model achieves a competitive error rate of 1.04%.

1. INTRODUCTION

Convolutional neural networks have performed incredibly well on tasks such as image classification, segmentation, and object detection (Khan et al., 2020). While there have been diverse architectural design innovations leading to improved accuracies across these tasks, all of these tasks share the common property that they operate on structured Euclidean domain inputs. A growing body of research has followed on how to transfer these successes to non-Euclidean domains, such as manifolds and graphs. We focus on unstructured graphs which represent discretizations of an underlying metric space. These data types are ubiquitous in computational physics, faceted surface meshes, and (with superpixel conversion) images. Previous efforts to extend CNNs to this type of data have involved parameterized function approximations on localized neighborhoods, such as MoNet (Monti et al., 2017) and SplineCNN (Fey et al., 2017). These function approximations (Gaussian mixture models in the case of MoNet and B-spline kernels in the case of SplineCNN) are complex and relatively expensive to calculate compared to CNN kernels. Inspired by earlier work in shape correspondence (Boscaini et al., 2016), image segmentation on the unit sphere (Jiang et al., 2019), and low-dimensional embeddings of computational physics data (Tencer & Potter, 2020), we seek to utilize parameterized differential operators (PDOs) to construct convolution kernels. In contrast to MoNet and SplineCNN, parameterized differential operators are cheap to compute and involve only elementary operations. Boscaini et al. (2016) used anisotropic diffusion kernels while Jiang et al. (2019) included gradient operators in addition to an isotropic diffusion operator. Tencer & Potter (2020) performed an ablation study of the differential operators used and demonstrated that including the gradient operators is broadly beneficial, but that little is gained by including additional terms.
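To make the PDO idea concrete, the following is a minimal sketch of a convolution-like layer that combines precomputed operators (identity, gradient, Laplacian) with learned weights, in the spirit of the prior works cited above. The function name `pdo_conv`, the dense toy operator matrices, and the weight layout are our own assumptions for illustration; real implementations would use sparse operators derived from a specific mesh.

```python
import numpy as np

def pdo_conv(f, ops, weights):
    """Illustrative PDO-style convolution (an assumption, not any paper's exact layer).

    f:       (n_nodes, c_in) node features
    ops:     list of (n_nodes, n_nodes) operators, e.g. [I, Gx, L]
    weights: (n_ops, c_in, c_out) learned parameters
    """
    out = np.zeros((f.shape[0], weights.shape[2]))
    for op, w in zip(ops, weights):
        out += (op @ f) @ w  # apply each operator, then mix channels
    return out

# Toy example: a 1-D chain of 5 nodes with central-difference gradient
# and second-difference Laplacian stencils.
n = 5
I = np.eye(n)
Gx = (np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)) / 2.0
L = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1) - 2.0 * I

f = np.linspace(0.0, 4.0, n).reshape(n, 1)  # linear field with unit slope
w = np.zeros((3, 1, 1))
w[1, 0, 0] = 1.0                            # select only the gradient term
g = pdo_conv(f, [I, Gx, L], w)
# interior nodes recover the slope 1.0 exactly for a linear field
```

Because the operators are plain sparse matrix-vector products, the per-layer cost is a handful of elementary operations, which is the efficiency argument made above relative to Gaussian-mixture or B-spline kernels.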
Prior work (Jiang et al., 2019; Tencer & Potter, 2020) has used differential operators precomputed for specific meshes. This approach has two drawbacks: (1) precomputing operators is not practical for datasets in which the connectivity graph varies between sample points, and (2) differential operators place restrictions on graph connectivity. Differential operators defined for mesh topologies rely on element connectivity information which is unavailable for more general graphs. Superpixel image datasets highlight both of these deficiencies. In contrast to these prior works, we do not precompute any operators and we do not directly use differential operators. Instead, we formulate pseudo-differential operators which are cheaply computed at run-time for a more general class of graphs. While our approach only applies to graphs with relative position information for each node, the set of graphs with the required positional information is large, encompassing nearly all physical systems as well as a significant number of other graphs, such as graph representations derived from image data. Since our method relies on computing approximate spatial derivatives of nodal features, it is also important that these nodal values represent a meaningfully changing field. This criterion is not necessarily met for the node correspondence task on the FAUST dataset or the shape classification task on the ModelNet10 dataset, and a corresponding decrease in performance is observed. In contrast, nodal features are critical to superpixel image classification tasks, and our method is observed to perform well for these datasets. Superpixel representations are popular for a wide range of tasks, particularly tasks in which large data quantities make the direct application of CNNs to the raw data impractical, such as hyperspectral imaging (Hong et al., 2020) and medical diagnostics (Roth et al., 2015).
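As one plausible illustration of a pseudo-differential operator built at run-time from relative positions alone, the sketch below estimates a per-node gradient by a least-squares fit over finite differences along each node's edges, with no mesh connectivity required. This construction and the name `ls_gradient` are our assumptions for exposition; the paper's exact formulation appears later in the text.

```python
import numpy as np

def ls_gradient(pos, edges, f):
    """Assumed illustrative construction: per-node least-squares gradient.

    pos:   (n, 2) node coordinates
    edges: list of undirected (i, j) index pairs
    f:     (n,) scalar nodal field
    Returns (n, 2) gradient estimates, using only relative positions r_ij
    and finite differences f_j - f_i along edges.
    """
    n = pos.shape[0]
    nbrs = {i: [] for i in range(n)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    grad = np.zeros((n, 2))
    for i in range(n):
        R = pos[nbrs[i]] - pos[i]   # relative positions r_ij
        d = f[nbrs[i]] - f[i]       # finite differences f_j - f_i
        grad[i], *_ = np.linalg.lstsq(R, d, rcond=None)
    return grad

# Tiny graph: the corners of a unit square, fully connected. For the linear
# field f = 2x + 3y the fit recovers the gradient (2, 3) exactly at each node.
pos = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
f = 2.0 * pos[:, 0] + 3.0 * pos[:, 1]
g = ls_gradient(pos, edges, f)
```

Because everything is computed from the edge list and relative positions at run-time, the same code applies unchanged when the graph varies between samples, which is exactly the failure mode of precomputed mesh operators described above.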
For these applications, the superpixel representation serves as a sort of context-aware lossy compression. Knyazev et al. (2019) compared GCNs applied to superpixel images to CNNs applied to low-resolution images with approximately the same information content. In those comparisons, the graph methods not only held their own but outperformed the CNNs. While those datasets do not approach the limits of practical image size, the results suggest that superpixel methods may handle high-resolution image data more efficiently, and they motivate developing methods that perform well on superpixel datasets. Our method is especially well-suited for analyzing superpixel image representations, in addition to being applicable to the datasets used by Jiang et al. (2019) and Tencer & Potter (2020) to demonstrate their PDO-based approaches. For regular meshes, such as the icosahedral spherical mesh used by Jiang et al. (2019), our pseudo-differential operators closely approximate the differential operators used in those works.

1.1. OUR CONTRIBUTIONS

We created a novel layer architecture inspired by PDOs:

• We improve upon the static matrix approach of Tencer & Potter (2020) with a dynamic method that supports varying graph structures and eliminates the need to precompute matrices.

• Our method utilizes pseudo-differential operators in contrast to the differential operators used in prior works. Pseudo-differential operators are cheap to compute and apply to a broader class of graphs than differential operators.

• Our novel mixing layer is conceptually simple and easy to code, integrating painlessly with existing graph libraries. (section 3.1)

• The new approach remains accurate for both sparsely and densely connected graphs, achieving state-of-the-art results on the MNIST superpixel 75 dataset both with and without reduced edge-connection input data. (section 4.1)

• The new approach is faster than common approaches for equivalent numbers of features owing to the simpler mathematical functions involved. (section 4.1.2)
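To convey the flavor of the "conceptually simple and easy to code" claim, here is a guess at what a mixing step of this kind might look like: stack the raw nodal features with pseudo-derivative channels, then mix them with a single learned weight matrix and a nonlinearity. The function name `mix_layer`, the channel layout, and the choice of ReLU are our assumptions, not the exact layer defined in section 3.1.

```python
import numpy as np

def mix_layer(f, dfdx, dfdy, W, b):
    """Assumed sketch of a mixing layer over derivative channels.

    f, dfdx, dfdy: (n_nodes, c_in) feature and pseudo-derivative channels
    W:             (3 * c_in, c_out) learned mixing matrix
    b:             (c_out,) learned bias
    """
    z = np.concatenate([f, dfdx, dfdy], axis=1)  # (n_nodes, 3 * c_in)
    return np.maximum(z @ W + b, 0.0)            # linear mix + ReLU

rng = np.random.default_rng(0)
n, c_in, c_out = 6, 4, 8
f, dx, dy = (rng.standard_normal((n, c_in)) for _ in range(3))
W = rng.standard_normal((3 * c_in, c_out)) * 0.1
out = mix_layer(f, dx, dy, W, np.zeros(c_out))
```

Under these assumptions the layer reduces to a concatenation and a dense matrix product, which is why it composes easily with existing graph libraries: the derivative channels can be produced by any message-passing step and fed straight into the mix.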

