GO WITH THE FLOW: ADAPTIVE CONTROL FOR NEURAL ODES

Abstract

Despite their elegant formulation and lightweight memory cost, neural ordinary differential equations (NODEs) suffer from known representational limitations. In particular, the single flow learned by NODEs cannot express all homeomorphisms from a given data space to itself, and their static weight parameterization restricts the type of functions they can learn compared to discrete architectures with layer-dependent weights. Here, we describe a new module called neurally-controlled ODE (N-CODE) designed to improve the expressivity of NODEs. The parameters of N-CODE modules are dynamic variables governed by a trainable map from the initial or current activation state, resulting in forms of open-loop and closed-loop control, respectively. A single module is sufficient for learning a distribution on non-autonomous flows that adaptively drive neural representations. We provide theoretical and empirical evidence that N-CODE circumvents limitations of previous NODE models and show how the increased model expressivity manifests in several supervised and unsupervised learning problems. These favorable empirical results indicate the potential of using data- and activity-dependent plasticity in neural networks across numerous domains.

1. INTRODUCTION

The interpretation of artificial neural networks as continuous-time dynamical systems has led to both theoretical and practical advances in representation learning. Under this interpretation, the separate layers of a deep neural network are understood as a discretization of a continuous-time operator, so that, in effect, the network is infinitely deep. One important class of continuous-time models, neural ordinary differential equations (NODEs) (Chen et al., 2018), have found natural applications in generative variational inference (Grathwohl et al., 2019) and physical modeling (Köhler et al., 2019; Ruthotto et al., 2020) because of their ability to take advantage of black-box differential equation solvers and their correspondence to dynamical systems in nature. Nevertheless, NODEs suffer from known representational limitations, which researchers have tried to alleviate either by lifting the NODE activation space to higher dimensions or by allowing the transition operator to change in time, making the system non-autonomous (Dupont et al., 2019). For example, Zhang et al. (2020) showed that NODEs can approximate maps from R^d to R arbitrarily well if the NODE dynamics operate with an additional time dimension in R^{d+1} and the system is affixed with an additional linear layer. The same authors showed that NODEs can approximate homeomorphisms from R^d to itself if the dynamics are lifted to R^{2d}. Yet the set of homeomorphisms from R^d to itself is in fact quite a conservative function space from the perspective of representation learning, since these mappings preserve topological invariants of the data space, preventing them from "disentangling" data classes like those of the annulus data in Fig. 1 (lower left panel). In general, much remains to be understood about the continuous-time framework and its expressive capabilities.
In this paper, we propose a new approach, neurally-controlled ODEs (N-CODE), designed to increase the expressivity of continuous-time neural networks using tools from control theory. Whereas previous continuous-time methods learn a single, time-varying vector field for the whole input space, our system learns a family of vector fields parameterized by the data. We do so by mapping the input space to a collection of control weights that interact with neural activity to optimally steer the model dynamics. The implications of this formulation are critical for model expressivity. In particular, the transformation of the input space is no longer constrained to be a homeomorphism, since the flow associated with each datum is specifically adapted to that point. Consequently, our system can easily "tear" apart the two annulus classes in Fig. 1 (lower right panel) without directly lifting the data space to a higher dimension. Moreover, when the control weights are allowed to vary in time, they can play the role of fast, plastic synapses that adapt to dynamic model states and inputs. The rest of the paper proceeds as follows. First, we lay out the background for N-CODE and its technical formulation. Then, we demonstrate its efficacy for supervised and unsupervised learning. In the supervised case, we show how N-CODE can classify data by learning to bifurcate its dynamics along class boundaries, as well as memorize high-dimensional patterns in real time using fast synapses. Finally, we show how the flows learned by N-CODE can serve as latent representations in an unsupervised autoencoder, improving image generation over a base model.
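The open-loop idea above can be sketched in a few lines: a trainable map (a hypothetical `controller` below) sends each initial condition to its own set of field weights, so every datum flows under a different vector field. This is a minimal illustrative sketch, not the paper's implementation; the names, shapes, and the tanh field are our own assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)
d = 2                                  # data dimension (illustrative)
W_c = rng.standard_normal((d * d, d))  # controller map: x0 -> flattened field matrix

def controller(x0):
    """Open-loop control: field weights depend only on the initial state x0."""
    return (W_c @ x0).reshape(d, d)

def field(t, x, theta):
    """Datum-specific vector field dx/dt = tanh(theta @ x)."""
    return np.tanh(theta @ x)

def flow(x0, T=1.0):
    """Integrate the datum-specific field from x0 for time T."""
    theta = controller(x0)             # per-datum weights
    sol = solve_ivp(field, (0.0, T), x0, args=(theta,))
    return sol.y[:, -1]

# Two inputs induce two different vector fields, hence different flows,
# unlike a standard NODE where all inputs share one field.
a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(np.allclose(controller(a), controller(b)))  # False: fields differ
```

In the closed-loop variant described above, `controller` would instead read the current state x(t), making the weights fast, activity-dependent variables.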

2. BACKGROUND

Neural ODEs (NODEs) (Chen et al., 2018) are dynamical systems of the form dx/dt = f(x, θ, t), where x ∈ X for a space of features X, θ ∈ Θ is a collection of learnable parameters, and f : X × Θ × R → X is an equation of motion which we take to be differentiable on its whole domain. f defines a flow, i.e., a triple (X, R, Φ_θ) with Φ_θ : X × R → X defined by Φ_θ(x(0), T) = x(0) + ∫_0^T f(x(t), θ, t) dt, which relates an initial point x(0) to an orbit of points {x(t) = Φ_θ(x(0), t), t ∈ R}. For a fixed T, the map x ↦ Φ_θ(x, T) is a homeomorphism from X to itself parameterized by θ. Several properties of such flows make them appealing for machine learning. For example, the ODEs that govern such flows can be solved with off-the-shelf solvers, and they can model data irregularly sampled in time. Moreover, such flows are reversible maps by construction, whose inverse is just the system integrated backward in time, Φ_θ(·, t)^{-1} = Φ_θ(·, -t). This property enables the depth-constant memory cost of training via the adjoint sensitivity method (Pontryagin et al., 1962) and the modeling of continuous-time generative normalizing flow algorithms (Grathwohl et al., 2019).
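The flow map Φ_θ and its reversibility can be checked numerically with an off-the-shelf solver, as the text describes. The sketch below uses a fixed, autonomous field of our own choosing (the tanh of a rotation-like matrix) purely for illustration; integrating forward for time T and then backward for time T recovers the initial point.

```python
import numpy as np
from scipy.integrate import solve_ivp

theta = np.array([[0.0, -1.0],
                  [1.0,  0.0]])        # stand-in for learned parameters

def f(t, x):
    """Equation of motion dx/dt = f(x, theta); autonomous here."""
    return np.tanh(theta @ x)

def Phi(x0, T):
    """Flow map Phi_theta(x0, T) = x0 + integral_0^T f(x(t)) dt."""
    sol = solve_ivp(f, (0.0, T), x0, rtol=1e-10, atol=1e-12)
    return sol.y[:, -1]

x0 = np.array([0.5, -0.3])
xT = Phi(x0, 2.0)          # endpoint of the orbit after time T = 2
x0_rec = Phi(xT, -2.0)     # inverse: integrate the system backward in time
print(np.allclose(x0, x0_rec, atol=1e-6))  # True: Phi(., t)^{-1} = Phi(., -t)
```

Note that `solve_ivp` accepts a decreasing time span, which is exactly the backward integration used both for inversion and for the adjoint sensitivity method.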



Figure 1: Vector fields for continuous-time neural networks. Integral curves with arrows show the trajectories of data points under the influence of the network. Top left: Standard NODEs learn a single time-independent flow (in black) that must account for the whole data space. Top right: N-CODE learns a family of vector fields (red vs yellow vs blue), enabling the system to flexibly adjust the trajectory for every data point. Bottom left: Trained NODE trajectories of initial values in a data set of concentric annuli, colored green and yellow. The NODE transformation is a homeomorphism on the data space and cannot separate the classes as a result. Colored points are the initial state of the dynamics, and black points are the final state. Bottom right: Corresponding flows for N-CODE which easily separate the classes. Transiting from the inner to outer annulus effects a bifurcation which linearly separates the data.

