LEARNING REPRESENTATIONS FROM TEMPORALLY SMOOTH DATA

Abstract

Events in the real world are correlated across nearby points in time, and we must learn from this temporally "smooth" data. However, when neural networks are trained to categorize or reconstruct single items, the common practice is to randomize the order of training items. What are the effects of temporally smooth training data on the efficiency of learning? We first tested the effects of smoothness in training data on incremental learning in feedforward nets and found that smoother data slowed learning. Moreover, sampling so as to minimize temporal smoothness produced more efficient learning than sampling randomly. If smoothness generally impairs incremental learning, then how can networks be modified to benefit from smoothness in the training data? We hypothesized that two simple brain-inspired mechanisms, leaky memory in activation units and memory gating, could enable networks to rapidly extract useful representations from smooth data. Across all levels of data smoothness, these brain-inspired architectures achieved more efficient category learning than feedforward networks. This advantage persisted even when leaky memory networks with gating were trained on smooth data and tested on randomly ordered data. Finally, we investigated how these brain-inspired mechanisms altered the internal representations learned by the networks. We found that networks with multi-scale leaky memory and memory gating could learn internal representations that "un-mixed" data sources varying on fast and slow timescales across training samples. Altogether, we identified simple mechanisms enabling neural networks to learn more quickly from temporally smooth data, and to generate internal representations that separate timescales in the training signal.

1. INTRODUCTION

Events in the world are correlated in time: the information that we receive at one moment is usually similar to the information that we receive at the next. For example, when having a conversation with someone, we see multiple samples of the same face from different angles over the course of several seconds. However, when we train neural networks for categorization or reconstruction tasks, we commonly ignore the temporal ordering of samples and use randomly ordered data. Given that humans can learn robustly and efficiently from sequentially correlated data presented incrementally, it is important to examine what kinds of architectures and inductive biases may support such learning (Hadsell et al., 2020). We therefore asked: how does the sequential correlation structure of the data affect learning in neural networks that are performing categorization or reconstruction of one input at a time? Moreover, we asked: which mechanisms can a network employ to exploit the temporal autocorrelation ("smoothness") of data, without needing to perform backpropagation through time (BPTT) (Sutskever, 2013)?

We investigated this question in three stages. In the first stage, we examined the effects of temporally smooth training data on feedforward neural networks performing category learning. Here we confirmed that autocorrelation in training data slows learning in feedforward nets. In the second stage, we investigated conditions under which these classifier networks might take advantage of smooth data. We hypothesized that human brains may possess mechanisms (or inductive biases) that maximize the benefits of learning from temporally smooth data. We therefore tested two network mechanisms inspired by properties of cortical circuits: leaky memory (associated with

