MODELLING LONG RANGE DEPENDENCIES IN ND: FROM TASK-SPECIFIC TO A GENERAL PURPOSE CNN

Abstract

Performant Convolutional Neural Network (CNN) architectures must be tailored to specific tasks in order to account for the length, resolution, and dimensionality of the input data. In this work, we tackle the need for problem-specific CNN architectures. We present the Continuous Convolutional Neural Network (CCNN): a single CNN able to process data of arbitrary resolution, dimensionality and length without any structural changes. Its key components are its continuous convolutional kernels, which model long-range dependencies at every layer and thus remove the need of current CNN architectures for task-dependent downsampling and depths. We showcase the generality of our method by using the same architecture for tasks on sequential (1D), visual (2D) and point-cloud (3D) data. Our CCNN matches and often outperforms the current state of the art across all tasks considered.

1. INTRODUCTION

The vast popularity of Convolutional Neural Networks (CNNs) (LeCun et al., 1998) is a result of their high performance and efficiency, which has led them to achieve state-of-the-art results in applications across sequential (Abdel-Hamid et al., 2014; Van Den Oord et al., 2016), visual (Krizhevsky et al., 2012; Simonyan & Zisserman, 2014) and high-dimensional data (Schütt et al., 2017; Wu et al., 2019). Nevertheless, a major limitation of CNNs (and of other neural networks) is that their architectures must be tailored to particular applications in order to account for the length, resolution and dimensionality of the input data. This has led to a plethora of task-specific architectures (Oord et al., 2016; Bai et al., 2018; Simonyan & Zisserman, 2014; Szegedy et al., 2015; Ronneberger et al., 2015; He et al., 2016; Qi et al., 2017; Wu et al., 2019), which (i) hampers the selection of the most appropriate architecture for a particular task, and (ii) obscures the transfer and generalization of insights across applications. In this work, we tackle the need for problem-specific CNN architectures and propose a generic CNN architecture that can be used independently of the length, resolution and dimensionality of the data.

CNN architectures are data-dependent. Current CNN architectures are task-specific because they are tied to the length, resolution and dimensionality of the input. The length of the data varies from task to task, e.g., audio fragments may span milliseconds to minutes. This requires carefully chosen



Figure 1: Discrete and continuous convolutional kernels. Discrete convolutional kernels assign a weight w_i out of a discrete set of weights W to a relative offset ∆x_i. This ties the kernel to the length, resolution and dimensionality of the input, limiting the general applicability of CNN architectures. Instead, our Continuous Convolutional Neural Network parameterizes kernel values as a continuous function φ_Kernel over the input domain R^d, which decouples the kernel from these data characteristics.
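To make the idea in Figure 1 concrete, the sketch below parameterizes a 1D kernel as a continuous function φ_Kernel (here a tiny fixed-weight MLP, a hypothetical stand-in for the paper's kernel network) and samples it at as many relative offsets as the input has positions. The same layer therefore yields a global receptive field at any input length or resolution. All names and the MLP shape are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)

# phi_kernel: a tiny fixed-weight MLP mapping a relative offset in [-1, 1]
# to a kernel value. Hypothetical stand-in for the learned kernel network.
W1 = rng.standard_normal((1, 16))
b1 = rng.standard_normal(16)
W2 = rng.standard_normal((16, 1))
b2 = rng.standard_normal(1)

def phi_kernel(pos):
    # pos: (K, 1) array of relative offsets -> (K,) kernel values
    h = np.tanh(pos @ W1 + b1)
    return (h @ W2 + b2).ravel()

def continuous_conv1d(x):
    # Sample the continuous kernel at as many offsets as the input has
    # samples: the kernel is as long as the sequence, so long-range
    # dependencies are modelled without changing the layer per task.
    K = len(x)
    pos = np.linspace(-1.0, 1.0, K).reshape(-1, 1)
    k = phi_kernel(pos)
    return np.convolve(x, k, mode="same")

# The same layer handles inputs of different lengths and resolutions.
for n in (32, 100):
    y = continuous_conv1d(np.sin(np.linspace(0, 4 * np.pi, n)))
    assert y.shape == (n,)
```

A discrete kernel would instead store one weight per offset, fixing both the kernel size and the sampling rate at construction time; evaluating φ_Kernel at arbitrary coordinates is what decouples the layer from those choices.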

