RESNET AFTER ALL? NEURAL ODES AND THEIR NUMERICAL SOLUTION

Abstract

A key appeal of the recently proposed Neural Ordinary Differential Equation (ODE) framework is that it seems to provide a continuous-time extension of discrete residual neural networks. As we show herein, though, trained Neural ODE models actually depend on the specific numerical method used during training. If the trained model is supposed to be a flow generated by an ODE, it should be possible to choose another numerical solver with equal or smaller numerical error without loss of performance. We observe that if training relies on a solver with overly coarse discretization, then testing with another solver of equal or smaller numerical error results in a sharp drop in accuracy. In such cases, the combination of vector field and numerical method cannot be interpreted as a flow generated by an ODE, which arguably constitutes a fatal breakdown of the Neural ODE concept. We observe, however, that there exists a critical step size beyond which the training yields a valid ODE vector field. We propose a method that monitors the behavior of the ODE solver during training and adapts its step size, aiming to ensure a valid ODE without unnecessarily increasing computational cost. We verify this adaptation algorithm on a common benchmark dataset as well as a synthetic dataset.

1. INTRODUCTION

The choice of neural network architecture is an important consideration in the deep learning community. Among a plethora of options, Residual Neural Networks (ResNets) (He et al., 2016) have emerged as an important subclass of models, as they mitigate the gradient issues (Balduzzi et al., 2017) that arise when training deep neural networks by adding skip connections between successive layers. Besides architectural advances inspired by the original scheme (Zagoruyko & Komodakis, 2016; Xie et al., 2017), Neural Ordinary Differential Equation (Neural ODE) models (Chen et al., 2018; E, 2017; Lu et al., 2018; Haber & Ruthotto, 2017) have recently been proposed as a continuous-depth analog of ResNets. While Neural ODEs do not necessarily improve upon the sheer predictive performance of ResNets, they allow the vast body of ODE theory to be applied to deep learning research. For instance, Yan et al. (2020) discovered that for specific perturbations, Neural ODEs are more robust than convolutional neural networks. Moreover, inspired by theoretical properties of the solution curves, they proposed a regularizer that improved the robustness of Neural ODE models even further. However, if Neural ODEs are chosen for their theoretical advantages, it is essential that the effective model, i.e., the combination of the ODE problem and its solution via a particular numerical method, is a close approximation of the true analytical, but practically inaccessible, ODE solution.

Code: https://github.com/boschresearch/numerics_independent_neural_odes
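The correspondence between ResNets and Neural ODEs referred to above can be made concrete in a few lines: a residual block computes x + f(x), which is exactly one explicit-Euler step of the ODE dx/dt = f(x) with step size h = 1. The following minimal sketch (the function names and the toy vector field f are illustrative, not from the paper) shows this identity; in a real Neural ODE, f would be a trained neural network and h the solver's step size.

```python
import numpy as np

def f(x, theta):
    # Toy vector field; in a Neural ODE this is a neural network
    # with parameters theta.
    return np.tanh(theta @ x)

def resnet_block(x, theta):
    # One residual block: identity skip connection plus a learned update.
    return x + f(x, theta)

def euler_step(x, theta, h):
    # One explicit-Euler step of dx/dt = f(x, theta) with step size h.
    return x + h * f(x, theta)

theta = np.array([[0.5, -0.2],
                  [0.1, 0.3]])
x = np.array([1.0, -1.0])

# With h = 1, the Euler step coincides exactly with the residual block,
# which is the discrete-to-continuous correspondence Neural ODEs exploit.
assert np.allclose(resnet_block(x, theta), euler_step(x, theta, 1.0))
```

The paper's point is precisely that this correspondence only holds in the limit of a sufficiently fine discretization: when h is too coarse, the trained model is tied to the particular solver rather than to the underlying ODE.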

