OSCILLATION NEURAL ORDINARY DIFFERENTIAL EQUATIONS

Abstract

Neural ordinary differential equations (NODEs) have received much attention in recent years due to their memory efficiency. Unlike traditional deep learning, a NODE defines a continuous deep learning architecture based on the theory of ordinary differential equations (ODEs), which also improves the interpretability of deep learning. However, NODEs have several notable limitations: a NODE is not a universal approximator, it requires a large number of function evaluations (NFEs), and it converges slowly. We address these drawbacks by modeling an oscillator and adding it to the NODE framework. The oscillator enables the trajectories of our model to cross each other, and we prove that our model is a universal approximator, even in the original input space. Because of the oscillator, the flows learned by the model are simpler, so our model needs fewer NFEs and converges faster. We apply our model to various tasks, including classification and time series extrapolation, and compare several metrics, including accuracy, NFEs, and convergence speed. The experiments show that our model achieves better results than the existing baselines.

1. INTRODUCTION

Neural Ordinary Differential Equations (NODEs) (Chen et al., 2018) are the latest continuous deep learning architectures; the idea was first developed in the context of continuous recurrent networks (Cohen & Grossberg, 1983). This continuous architecture provides a new perspective that theoretically bridges the gap between deep learning and dynamical systems. It can be trained efficiently with backpropagation and has shown great promise on several tasks, including modeling continuous-time data, classification, and building normalizing flows. The core idea of a NODE is to use a neural network to parameterize the vector field (Chen et al., 2018; Kidger, 2022). Typically, a simple neural network suffices to represent the vector field, which is optimized during training. Trajectories through the learned vector field then serve as the estimated functions.

However, this architecture has several limitations. First, NODEs cannot learn any mapping whose trajectories would need to cross (Dupont et al., 2019), so they are not universal approximators. Second, optimizing the vector field requires many function evaluations during both the forward evaluation and the backpropagation stages of training. Third, training is relatively slow, both in wall-clock time and in convergence rate. The first limitation stems from the continuity of vector-field-based trajectories: the trajectories of a NODE cannot cross each other at the same time (Massaroli et al., 2020; Norcliffe et al., 2020). This property makes NODEs powerless against certain topologies, such as the concentric circles and intersecting lines discussed by Dupont et al. (2019). We conjecture that the second limitation is caused by the straightforward optimization of the vector field: there is no guarantee that learning the vector field is easier than learning the estimated functions directly.
Optimizing the vector field can require many function evaluations, so the difficulty can exceed that of learning the estimated functions themselves. The third limitation is caused by the trade-off between accuracy and speed in the ordinary differential equation solver (ODE solver). NODEs perform forward evaluation and backpropagation via ODE solvers, which can be treated as black boxes: guaranteeing the accuracy of an ODE solver comes at the cost of speed.
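To make the NFE cost concrete, the following is a minimal sketch (not the paper's model): a toy vector field played by a fixed, randomly initialized two-layer network, integrated with an explicit fixed-step Euler solver. All names and weights here are hypothetical illustrations. Each call to the vector field counts as one function evaluation, so a fixed-step solver with N steps spends exactly N NFEs per forward pass; adaptive solvers instead choose the number of evaluations to meet an accuracy tolerance, which is the accuracy-speed trade-off described above.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((8, 2)) * 0.1   # hypothetical hidden-layer weights
W2 = rng.standard_normal((2, 8)) * 0.1   # hypothetical output-layer weights

nfe = 0  # function-evaluation counter

def f(h, t):
    """Neural-network-parameterized vector field dh/dt = f(h, t)."""
    global nfe
    nfe += 1
    return W2 @ np.tanh(W1 @ h)

def odeint_euler(f, h0, t0, t1, steps):
    """Fixed-step explicit Euler solver: approximate h(t1) from h(t0)."""
    h, t = h0, t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        h = h + dt * f(h, t)  # one NFE per step
        t += dt
    return h

h0 = np.array([1.0, -1.0])
h1 = odeint_euler(f, h0, 0.0, 1.0, steps=20)
print(nfe)  # 20: one evaluation of f per Euler step
```

In a real NODE, backpropagation (e.g., via the adjoint method) solves a second ODE backward in time, roughly doubling the evaluation count per training step, which is why reducing NFEs directly reduces training cost.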

