PROBABILISTIC NUMERIC CONVOLUTIONAL NEURAL NETWORKS

Abstract

Continuous input signals like images and time series that are irregularly sampled or have missing values are challenging for existing deep learning methods. Coherently defined feature representations must depend on the values in unobserved regions of the input. Drawing on ideas from probabilistic numerics, we propose Probabilistic Numeric Convolutional Neural Networks, which represent features as Gaussian processes (GPs), providing a probabilistic description of discretization error. We then define a convolutional layer as the evolution of a PDE defined on this GP, followed by a nonlinearity. This approach also naturally admits steerable equivariant convolutions under, e.g., the rotation group. In experiments we show that our approach yields a 3× reduction of error from the previous state of the art on the SuperPixel-MNIST dataset and competitive performance on the medical time series dataset PhysioNet2012.

1. INTRODUCTION

Standard convolutional neural networks are defined on a regular input grid. For continuous signals like time series and images, these grid points correspond to regular samples of an underlying function f defined on a continuous domain. In this case, the standard convolutional layer of a neural network is a numerical approximation of a continuous convolution operator A. Networks that are coherently defined on continuous functions should depend only on the input function f, and not on spurious shortcut features (Geirhos et al., 2020) such as the sampling locations or sampling density, which enable overfitting and reduce robustness to changes in the sampling procedure. Each application of A in a standard neural network incurs some discretization error, determined by the sampling resolution. In some sense this error is unavoidable, because the features f^{(ℓ)} at layer ℓ depend on values of the input function f in regions that have not been observed. For input signals that are sampled at low resolution, or sampled irregularly, such as the sporadic measurements of patient vitals in ICUs or dispersed sensors for measuring ocean currents, this discretization error cannot be neglected. Simply filling in the missing data with zeros or imputing the values is not sufficient, since many different imputations are possible, each of which can affect the output of the network.

Probabilistic numerics is an emerging field that studies discretization errors in numerical algorithms using probability theory (Cockayne et al., 2019). Here we build upon these ideas to quantify the dependence of the network on the unobserved regions of the input, and to integrate this uncertainty into the computation of the network. To do so, we replace the discretely evaluated feature maps {f^{(ℓ)}(x_i)}_{i=1}^{N} with Gaussian processes: distributions over the continuous functions f^{(ℓ)} that track the most likely values as well as the uncertainty.
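As an illustrative sketch (not the paper's implementation), the following NumPy code computes the Gaussian process posterior over a 1-D signal observed at irregular locations: the posterior mean gives the most likely values of the underlying function, while the posterior variance quantifies the uncertainty in unobserved regions. The RBF kernel, lengthscale, noise level, and test signal are all arbitrary choices for the example.

```python
import numpy as np

def rbf(xa, xb, lengthscale=0.3):
    """Squared-exponential kernel k(x, x') = exp(-(x - x')^2 / (2 l^2))."""
    d = xa[:, None] - xb[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-4):
    """Posterior mean and covariance at x_query, given irregular samples (x_obs, y_obs)."""
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf(x_query, x_obs)
    Kss = rbf(x_query, x_query)
    mean = Ks @ np.linalg.solve(K, y_obs)
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, cov

# Irregular samples of an underlying continuous signal f
rng = np.random.default_rng(0)
x_obs = np.sort(rng.uniform(0.0, 1.0, size=15))
y_obs = np.sin(2 * np.pi * x_obs)

x_query = np.linspace(0.0, 1.0, 100)
mean, cov = gp_posterior(x_obs, y_obs, x_query)
# Pointwise uncertainty: larger far from the observed sampling locations
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
```

Because the posterior is a full distribution over functions rather than a single imputation, downstream layers can propagate this uncertainty instead of committing to one filled-in signal.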
On this Gaussian process feature representation we need not resort to discretizing the convolution operator A as in a standard convnet; instead, we can apply the continuous convolution operator directly. If a given feature map is a Gaussian process, then applying a linear operator yields a new Gaussian process with transformed mean and covariance functions. The dependence of Af on regions of f which are not known translates into

