PROBABILISTIC NUMERIC CONVOLUTIONAL NEURAL NETWORKS

Abstract

Continuous input signals like images and time series that are irregularly sampled or have missing values are challenging for existing deep learning methods. Coherently defined feature representations must depend on the values in unobserved regions of the input. Drawing on work in probabilistic numerics, we propose Probabilistic Numeric Convolutional Neural Networks which represent features as Gaussian processes (GPs), providing a probabilistic description of discretization error. We then define a convolutional layer as the evolution of a PDE defined on this GP, followed by a nonlinearity. This approach also naturally admits steerable equivariant convolutions under e.g. the rotation group. In experiments we show that our approach yields a 3× reduction in error over the previous state of the art on the SuperPixel-MNIST dataset and competitive performance on the medical time series dataset PhysioNet2012.

1. INTRODUCTION

Standard convolutional neural networks are defined on a regular input grid. For continuous signals like time series and images, these grid elements correspond to regular samples of an underlying function f defined on a continuous domain. In this case, the standard convolutional layer of a neural network is a numerical approximation of a continuous convolution operator A. Coherently defined networks on continuous functions should depend only on the input function f, and not on spurious shortcut features (Geirhos et al., 2020) such as the sampling locations or sampling density, which enable overfitting and reduce robustness to changes in the sampling procedure. Each application of A in a standard neural network incurs some discretization error, which is determined by the sampling resolution. In some sense this error is unavoidable, because the features f^(ℓ) at layer ℓ depend on the values of the input function f in regions that have not been observed. For input signals which are sampled at low resolution, or even sampled irregularly such as the sporadic measurements of patient vitals in ICUs or dispersed sensors measuring ocean currents, this discretization error cannot be neglected. Simply filling in the missing data with zeros or imputing the values is not sufficient, since many different imputations are possible, each of which can affect the output of the network.

Probabilistic numerics is an emerging field that studies discretization errors in numerical algorithms using probability theory (Cockayne et al., 2019). Here we build upon these ideas to quantify the dependence of the network on the regions of the input which are unknown, and integrate this uncertainty into the computation of the network. To do so, we replace the discretely evaluated feature maps {f^(ℓ)(x_i)}_{i=1}^N with Gaussian processes: distributions over the continuous functions f^(ℓ) that track the most likely values as well as the uncertainty.
On this Gaussian process feature representation, we need not resort to discretizing the convolution operator A as in a standard convnet; instead we can apply the continuous convolution operator directly. If a given feature map is a Gaussian process, then applying a linear operator yields a new Gaussian process with transformed mean and covariance functions. The dependence of Af on regions of f which are not known translates into the uncertainty represented in the transformed covariance function, the analogue of the discretization error in a CNN, which is now tracked explicitly. We call the resulting model a Probabilistic Numeric Convolutional Neural Network (PNCNN).
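The closed-form propagation of uncertainty through a linear map can be illustrated concretely. Below is a minimal NumPy sketch (not the paper's implementation): on a discretized grid, a GP feature with mean μ and covariance K pushed through any linear map A, here a hypothetical smoothing convolution standing in for a learned convolution operator, becomes a Gaussian with mean Aμ and covariance A K Aᵀ.

```python
import numpy as np

# Discretized illustration: if f ~ N(mu, K) on a grid, then for any linear
# map A (e.g. a convolution), A f ~ N(A mu, A K A^T). The uncertainty is
# propagated in closed form rather than discarded.
n = 64
x = np.linspace(0, 1, n)
K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.1) ** 2)  # prior covariance
mu = np.sin(2 * np.pi * x)                                  # mean function on the grid

# A: circular convolution with a small smoothing filter (an illustrative
# stand-in for a learned convolution operator).
filt = np.array([0.25, 0.5, 0.25])
A = np.zeros((n, n))
for i in range(n):
    for j, w in zip([(i - 1) % n, i, (i + 1) % n], filt):
        A[i, j] += w

mu_out = A @ mu       # transformed mean
K_out = A @ K @ A.T   # transformed covariance: discretization error is tracked
```

The transformed covariance K_out remains a valid (symmetric, positive semidefinite) covariance, so the output is again a well-defined Gaussian feature that a subsequent layer can consume.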

2. RELATED WORK

Over the years there have been many successful convolutional approaches for ungridded data, such as GCN (Kipf and Welling, 2016), PointNet (Qi et al., 2017), Transformer (Vaswani et al., 2017), Deep Sets (Zaheer et al., 2017), SplineCNN (Fey et al., 2018), PCNN (Atzmon et al., 2018), PointConv (Wu et al., 2019), KPConv (Thomas et al., 2019) and many others (de Haan et al., 2020; Finzi et al., 2020; Schütt et al., 2017; Wang et al., 2018). However, the target domains of sets, graphs, and point clouds are intrinsically discrete, and for continuous data each of these methods fails to take full advantage of the assumption that the underlying signal is continuous. Furthermore, none of these approaches reasons about the underlying signal probabilistically. In a separate line of work there are several approaches tackling irregularly spaced time series with RNNs (Che et al., 2018), Neural ODEs (Rubanova et al., 2019), imputation to a regular grid (Li and Marlin, 2016; Futoma et al., 2017; Shukla and Marlin, 2019; Fortuin et al., 2020), set functions (Horn et al., 2019) and attention (Shukla and Marlin, 2020). Additionally, there are several works exploring reconstruction of images from incomplete observations for downstream classification (Huijben et al., 2019; Li and Marlin, 2020). Most similar to our method are the end-to-end Gaussian process adapter (Li and Marlin, 2016) and the multi-task Gaussian process RNN classifier (Futoma et al., 2017). In these two works, a Gaussian process is fit to an irregularly spaced time series and sampled imputations from this process are fed into a separate RNN classifier. Unlike our approach, where the classifier operates directly on a continuous and probabilistic signal, in these works the classifier operates on a deterministic signal on a regular grid and cannot reason probabilistically about discretization errors.
Finally, while superficially similar to Deep GPs (Damianou and Lawrence, 2013) or Deep Differential Gaussian Process Flows (Hegde et al., 2018), our PNCNNs tackle fundamentally different kinds of problems like image classification¹, and our GPs represent epistemic uncertainty over the values of the feature maps rather than the parameters of the network.

Probabilistic Numerics:

We draw inspiration for our approach from the community of probabilistic numerics, where the errors in numerical algorithms are modeled probabilistically, typically with a Gaussian process. In this framework, only a finite number of input function calls can be made, and therefore the numerical algorithm can be viewed as an autonomous agent which has epistemic uncertainty over the values of the input. A well-known example is Bayesian Monte Carlo, where a Gaussian process is used to model the error in the numerical estimation of an integral and to optimally select a rule for its computation (Minka, 2000; Rasmussen and Ghahramani, 2003). Probabilistic numerics has been applied widely to numerical problems such as the inversion of a matrix (Hennig, 2015), the solution of an ODE (Schober et al., 2019), a meshless solution to boundary value PDEs (Cockayne et al., 2016), and other numerical problems (Cockayne et al., 2019). To our knowledge, we are the first to construct a probabilistic numeric method for convolutional neural networks.
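To make the Bayesian Monte Carlo idea concrete, the following sketch implements a simple instance: placing an RBF-kernel GP prior on the integrand turns quadrature into Bayesian inference, with the posterior mean of the integral given by zᵀK⁻¹y, where z_i is the kernel integrated against the measure. The kernel, lengthscale, test function, and sample locations are illustrative choices, not taken from the cited works.

```python
import numpy as np
from math import erf, sqrt, pi

# Bayesian Monte Carlo sketch: GP(0, k) prior on the integrand f,
# uniform measure on [0, 1]. All numerical choices below are illustrative.
ell = 0.25  # RBF lengthscale

def k(a, b):
    # Squared-exponential kernel matrix k(a_i, b_j).
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

def kernel_mean(xi):
    # z_i = \int_0^1 k(x, x_i) dx, available in closed form for the RBF kernel.
    c = sqrt(2) * ell
    return ell * sqrt(pi / 2) * (erf((1 - xi) / c) + erf(xi / c))

X = np.linspace(0.05, 0.95, 7)          # quadrature nodes
y = np.sin(np.pi * X)                   # noiseless evaluations of the integrand
z = np.array([kernel_mean(xi) for xi in X])

# Posterior mean of the integral: E[∫f | y] = z^T K^{-1} y
# (small jitter for numerical stability).
estimate = z @ np.linalg.solve(k(X, X) + 1e-9 * np.eye(len(X)), y)
# The true value of ∫_0^1 sin(pi x) dx is 2/pi ≈ 0.6366.
```

The posterior variance of the integral (a double integral of the posterior covariance) quantifies the remaining numerical error and can be used to select nodes optimally, which is the sense in which the numerical algorithm reasons about its own uncertainty.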

Gaussian Processes:

We are interested in operating on the continuous function f(x) underlying the input, but in practice we have access only to the values of that function sampled at a finite number of points {x_i}_{i=1}^N. Classical interpolation theory reconstructs f deterministically by assuming a certain structure of the signal in the frequency domain. Gaussian processes instead give a way of modeling our beliefs about values that have not been observed (Rasmussen et al., 2006), as reviewed in appendix A. These beliefs are encoded into a prior covariance k of the GP f ∼ GP(0, k) and updated upon seeing data with Bayesian inference. Explicitly, given a set of sampling locations X = {x_i}_{i=1}^N and noisy observations y = {y_i}_{i=1}^N with y_i ∼ N(f(x_i), σ_i²), Bayes' rule yields the posterior process

    f | y ∼ GP(μ, Σ),
    μ(x) = k(x, X)(K + D)⁻¹ y,
    Σ(x, x′) = k(x, x′) − k(x, X)(K + D)⁻¹ k(X, x′),

where K_ij = k(x_i, x_j), D = diag(σ_1², …, σ_N²), and k(x, X) denotes the row vector with entries k(x, x_i).
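The GP posterior update above can be written out directly. Below is a minimal NumPy sketch of the computation; the kernel, lengthscale, noise level, and data are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def rbf(x1, x2, lengthscale=0.2):
    # Squared-exponential prior covariance k(x, x').
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

# Irregular sampling locations and noisy observations y_i ~ N(f(x_i), sigma^2).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 8))
y = np.sin(2 * np.pi * X) + 0.05 * rng.standard_normal(8)
D = 0.05 ** 2 * np.eye(8)               # observation noise covariance

# Posterior over f at query points x*:
#   mu(x*)       = k(x*, X) (K + D)^{-1} y
#   Sigma(x*,x*) = k(x*, x*) - k(x*, X) (K + D)^{-1} k(X, x*)
Xq = np.linspace(0, 1, 50)
K = rbf(X, X) + D
Kq = rbf(Xq, X)
mu = Kq @ np.linalg.solve(K, y)
cov = rbf(Xq, Xq) - Kq @ np.linalg.solve(K, Kq.T)

# The posterior variance collapses near observed points and grows in the gaps,
# which is exactly the epistemic uncertainty the network propagates.
```

In the PNCNN this posterior, rather than a single imputed signal, is what the subsequent layers operate on.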
¹ While GPs can be applied directly to image classification, they are not well suited to this task even with convolutional structure baked in, as shown in Kumar et al. (2018).

