

Abstract

We introduce Active Tuning, a novel paradigm for optimizing the internal dynamics of recurrent neural networks (RNNs) on the fly. In contrast to the conventional sequence-to-sequence mapping scheme, Active Tuning decouples the RNN's recurrent neural activities from the input stream, using the unfolding temporal gradient signal to tune the internal dynamics into the data stream. As a consequence, the model output depends only on its internal hidden dynamics and the closed-loop feedback of its own predictions; its hidden state is continuously adapted by means of the temporal gradient that results from back-propagating the discrepancy between the signal observations and the model outputs through time. In this way, Active Tuning infers the signal actively but indirectly, based on the originally learned temporal patterns, fitting the most plausible hidden state sequence to the observations. We demonstrate the effectiveness of Active Tuning on several time series prediction benchmarks, including multiple superimposed sine waves, a chaotic double pendulum, and spatiotemporal wave dynamics. Active Tuning consistently improves the robustness, accuracy, and generalization abilities of all evaluated models. Moreover, with the help of Active Tuning, networks trained for signal prediction and denoising can be successfully applied to a much larger range of noise conditions. Thus, given a capable time series predictor, Active Tuning enhances its online signal filtering, denoising, and reconstruction abilities without the need for additional training.

1. INTRODUCTION

Recurrent neural networks (RNNs) are inherently robust against noise only to a limited extent, and they often generate unsuitable predictions when confronted with corrupted or missing data (cf., e.g., Otte et al., 2015). To tackle noise, an explicit noise-aware training procedure can be employed, yielding denoising networks that are targeted at particular noise types and levels. Recurrent oscillators such as echo state networks (ESNs) (Jaeger, 2001; Koryakin et al., 2012; Otte et al., 2016), however, are highly dependent on a clean and accurate target signal when initialized with teacher forcing. Given an overly noisy signal, the system is often not able to tune its neural activities into the desired target dynamics at all. Here, we present a method that can be seen as an alternative to regular teacher forcing and, moreover, as a general tool for more robustly tuning, and thus synchronizing, the dynamics of a generative differentiable temporal forward model, such as a standard RNN, ESN, or LSTM-like RNN (Hochreiter & Schmidhuber, 1997; Otte et al., 2014; Chung et al., 2014; Otte et al., 2016), into an observed data stream.

The proposed method, which we call Active Tuning, uses gradient back-propagation through time (BPTT) (Werbos, 1990), where the back-propagated gradient signal is used to tune the hidden activities of a neural network instead of adapting its weights. The way we utilize the temporal gradient signal is related to learning parametric biases (Sugita et al., 2011) and applying dynamic context inference (Butz et al., 2019). With Active Tuning, two essential aspects apply. First, during signal inference, the model is not driven by the observations directly, but indirectly via prediction-error-induced temporal gradient information, which is used to infer the hidden state activation sequence that best explains the observed signal. Second, the general stabilization ability of propagating signal hypotheses through the network is exploited, effectively washing out activity components (such as noise) that cannot be modeled with the learned temporal structures within the network. As a result, the vulnerable internal dynamics are kept within a system-consistent activity milieu, effectively decoupling them from noise and other unknown distortions present in the defective actual signal.
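To make the inference loop concrete, the following sketch shows one possible realization of the Active Tuning idea for a trained LSTM-based predictor in PyTorch. It is an illustration of the mechanism described above, not the authors' reference implementation; the `Predictor` module, the retrospective window length, the tuning rate `eta`, and the number of tuning `cycles` are assumptions chosen for readability.

```python
import torch
import torch.nn as nn


class Predictor(nn.Module):
    """A trained closed-loop predictor: maps its previous output to the next one."""

    def __init__(self, dim=1, hidden=64):
        super().__init__()
        self.cell = nn.LSTMCell(dim, hidden)
        self.readout = nn.Linear(hidden, dim)

    def step(self, x, h, c):
        h, c = self.cell(x, (h, c))
        return self.readout(h), h, c


def active_tuning(model, observations, window=10, eta=0.05, cycles=20):
    """Filter a noisy signal by tuning hidden states instead of network weights.

    For every time step, the hidden state at the start of a retrospective window
    is adapted via the gradient of the discrepancy between the closed-loop
    rollout and the observations, back-propagated through time.
    """
    dim = observations.shape[-1]
    h = torch.zeros(1, model.cell.hidden_size)
    c = torch.zeros(1, model.cell.hidden_size)
    x = torch.zeros(1, dim)  # closed-loop input seed, tuned alongside the state
    filtered = []

    for t in range(window, observations.shape[0] + 1):
        h0 = h.detach().clone().requires_grad_(True)
        c0 = c.detach().clone().requires_grad_(True)
        x0 = x.detach().clone().requires_grad_(True)

        for _ in range(cycles):
            hi, ci, xi = h0, c0, x0
            preds = []
            for _ in range(window):
                xi, hi, ci = model.step(xi, hi, ci)  # closed-loop rollout
                preds.append(xi)
            loss = (torch.stack(preds).squeeze(1)
                    - observations[t - window:t]).pow(2).mean()
            g_h, g_c, g_x = torch.autograd.grad(loss, (h0, c0, x0))
            with torch.no_grad():  # gradient descent on the state, not the weights
                h0 -= eta * g_h
                c0 -= eta * g_c
                x0 -= eta * g_x

        # Advance the window start by one step using the tuned state; the
        # resulting prediction serves as the filtered output for that step.
        with torch.no_grad():
            x, h, c = model.step(x0, h0, c0)
        filtered.append(x.squeeze(0))

    return torch.stack(filtered)
```

The crucial point is that the gradient is taken with respect to the hidden state and the closed-loop input seed only; the trained weights remain fixed, and the observations enter solely through the prediction-error loss. Given a trained `Predictor` and a noisy sequence of shape `(T, dim)`, calling `active_tuning(model, noisy_signal)` would return a denoised estimate without the observations ever driving the recurrent activities directly.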

