LEARNING FROM MULTISCALE WAVELET SUPERPIXELS USING GNN WITH SPATIALLY HETEROGENEOUS POOLING

Abstract

Neural networks have become the standard for image classification tasks. On one hand, convolutional neural networks (CNNs) achieve state-of-the-art performance by learning from a regular grid representation of images. On the other hand, graph neural networks (GNNs) have shown promise in learning image classification from an embedded superpixel graph. However, in the latter, studies have been restricted to SLIC superpixels, where 1) a single target number of superpixels is arbitrarily defined for an entire dataset irrespective of differences across images, and 2) the superpixels in a given image are of similar size despite intrinsic multiscale structure. In this study, we investigate learning from a new principled representation in which individual images are represented by an image-specific number of multiscale superpixels. We propose WaveMesh, a wavelet-based superpixeling algorithm in which the number and sizes of superpixels in an image are systematically computed from the image content. We also present WavePool, a spatially heterogeneous pooling scheme tailored to WaveMesh superpixels. We study the feasibility of learning from the WaveMesh superpixel representation using SplineCNN, a state-of-the-art network for image graph classification. We show that under the same network architecture and training settings, SplineCNN with the original Graclus-based pooling learns from WaveMesh superpixels on par with SLIC superpixels. Additionally, we observe that the best performance is achieved when replacing Graclus-based pooling with WavePool while using WaveMesh superpixels.

1. INTRODUCTION

Convolutional neural networks (CNNs) achieve state-of-the-art performance on a variety of image classification tasks from different domains (Tan & Le, 2019; Gulshan et al., 2016). CNNs learn from a regular pixel-grid representation of the images. Although not all pixels provide an equal amount of new information, by design the filters in the first layer of a CNN operate on each pixel, from top-left to bottom-right, in the same way. Additionally, images are typically resized to a prescribed size before being fed into a CNN. In applications that use standard CNN architectures or pre-trained models on a new image classification dataset, the images are typically uniformly downsampled to meet the input size requirements of the architecture being used. Uniform downsampling may be suboptimal because real data naturally exhibits spatial and multiscale heterogeneity. Few studies have explored the impact of input image resolution on model performance (Sabottke & Spieler, 2020), despite its recognized importance (Lakhani, 2020).

A graph neural network (GNN) is a type of neural network that learns from graph-structured data. Recent studies have demonstrated the performance of GNNs on image graph classification tasks (Monti et al., 2017; Fey et al., 2018; Knyazev et al., 2019; Dwivedi et al., 2020). In this task, a GNN learns to classify images from embedded graphs that represent superpixels in the images. However, prior studies have been restricted to SLIC superpixels (Achanta et al., 2012). In this framework, a single target number of superpixels is arbitrarily defined for an entire dataset irrespective of differences across images, and the superpixels in a given image are of similar size despite intrinsic multiscale structure. Our proposed approach circumvents these limitations, as shown in Figure 1.
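To make the embedded-superpixel-graph setup concrete, the sketch below builds such a graph from a superpixel label map: each superpixel becomes a node carrying its mean intensity and centroid position, and an edge connects any two superpixels whose pixels are 4-adjacent. This is a minimal numpy-only illustration, not the paper's implementation; the label map would in practice come from a superpixeling algorithm such as SLIC, and the toy image, label map, and function name here are assumptions for demonstration.

```python
import numpy as np

def superpixel_graph(image, labels):
    """Build node features and an edge list from a superpixel label map.

    image  : 2-D float array of pixel intensities
    labels : 2-D int array of superpixel ids in 0..n-1 (e.g., from SLIC)
    """
    n = labels.max() + 1
    # Node feature: mean intensity of each superpixel
    feats = np.array([image[labels == k].mean() for k in range(n)])
    # Positional embedding: (row, col) centroid of each superpixel
    rows, cols = np.indices(labels.shape)
    pos = np.array([[rows[labels == k].mean(), cols[labels == k].mean()]
                    for k in range(n)])
    # Edges: distinct labels that touch horizontally or vertically
    h = np.stack([labels[:, :-1].ravel(), labels[:, 1:].ravel()], axis=1)
    v = np.stack([labels[:-1, :].ravel(), labels[1:, :].ravel()], axis=1)
    edges = {(min(a, b), max(a, b))
             for a, b in np.concatenate([h, v]) if a != b}
    return feats, pos, sorted(edges)

# Toy 4x4 image partitioned into four 2x2 "superpixels"
img = np.arange(16, dtype=float).reshape(4, 4)
lab = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [2, 2, 3, 3],
                [2, 2, 3, 3]])
feats, pos, edges = superpixel_graph(img, lab)
# edges -> [(0, 1), (0, 2), (1, 3), (2, 3)]
```

The resulting node features, positions, and edge list are exactly the ingredients a GNN such as SplineCNN consumes; with SLIC, every image in a dataset yields roughly the same number of similarly sized nodes, which is the uniformity the proposed WaveMesh representation relaxes.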

