GO WITH THE FLOW: ADAPTIVE CONTROL FOR NEURAL ODES

Abstract

Despite their elegant formulation and lightweight memory cost, neural ordinary differential equations (NODEs) suffer from known representational limitations. In particular, the single flow learned by a NODE cannot express all homeomorphisms from a given data space to itself, and their static weight parameterization restricts the type of functions they can learn compared to discrete architectures with layer-dependent weights. Here, we describe a new module called the neurally-controlled ODE (N-CODE) designed to improve the expressivity of NODEs. The parameters of N-CODE modules are dynamic variables governed by a trainable map from the initial or current activation state, resulting in forms of open-loop and closed-loop control, respectively. A single module is sufficient for learning a distribution over non-autonomous flows that adaptively drive neural representations. We provide theoretical and empirical evidence that N-CODE circumvents limitations of previous NODE models and show how the increased model expressivity manifests in several supervised and unsupervised learning problems. These favorable empirical results indicate the potential of using data- and activity-dependent plasticity in neural networks across numerous domains.

1. INTRODUCTION

The interpretation of artificial neural networks as continuous-time dynamical systems has led to both theoretical and practical advances in representation learning. According to this interpretation, the separate layers of a deep neural network are understood to be a discretization of a continuous-time operator so that, in effect, the net is infinitely deep. One important class of continuous-time models, neural ordinary differential equations (NODEs) (Chen et al., 2018), have found natural applications in generative variational inference (Grathwohl et al., 2019) and physical modeling (Köhler et al., 2019; Ruthotto et al., 2020) because of their ability to take advantage of black-box differential equation solvers and their correspondence to dynamical systems in nature. Nevertheless, NODEs suffer from known representational limitations, which researchers have tried to alleviate either by lifting the NODE activation space to higher dimensions or by allowing the transition operator to change in time, making the system non-autonomous (Dupont et al., 2019). For example, Zhang et al. (2020) showed that NODEs can arbitrarily approximate maps from R^d to R if the NODE dynamics operate with an additional time dimension in R^{d+1} and the system is followed by an additional linear layer. The same authors showed that NODEs can approximate homeomorphisms from R^d to itself if the dynamics are lifted to R^{2d}. Yet, the set of homeomorphisms from R^d to itself is in fact quite a conservative function space from the perspective of representation learning, since these mappings preserve topological invariants of the data space, preventing them from "disentangling" data classes like those of the annulus data in Fig. 1 (lower left panel). In general, much remains to be understood about the continuous-time framework and its expressive capabilities.
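The topological restriction above can be seen numerically. The following sketch (an illustration we add here, not code from the paper) integrates a one-dimensional autonomous ODE dx/dt = f(x) with a hand-rolled RK4 solver; because trajectories of such a flow cannot cross, the flow preserves the ordering of initial conditions, so a NODE acting in R^1 can never represent an order-reversing map such as x -> -x. The vector field tanh(x) is an arbitrary choice; any Lipschitz f yields the same conclusion.

```python
import math

def f(x):
    # an arbitrary smooth (Lipschitz) vector field; the conclusion below
    # holds for any such choice
    return math.tanh(x)

def rk4_flow(x0, t1=1.0, steps=1000):
    """Integrate dx/dt = f(x) from t=0 to t=t1 with classical RK4,
    returning the flow map applied to the initial state x0."""
    h = t1 / steps
    x = x0
    for _ in range(steps):
        k1 = f(x)
        k2 = f(x + 0.5 * h * k1)
        k3 = f(x + 0.5 * h * k2)
        k4 = f(x + h * k3)
        x += (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

a, b = rk4_flow(-0.5), rk4_flow(0.5)
# Trajectories of a 1-D autonomous flow cannot cross, so the ordering of
# initial states is preserved: the order-swapping map x -> -x is unreachable.
print(a < b)  # True
```

Lifting the state to R^{2d} (as in the augmentation results cited above) sidesteps this obstruction: in higher dimensions, trajectories can pass around one another, so the projection of the flow back to the original space need not be a homeomorphism.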

