FUZZY TILING ACTIVATIONS: A SIMPLE APPROACH TO LEARNING SPARSE REPRESENTATIONS ONLINE

Abstract

Recent work has shown that sparse representations, where only a small percentage of units are active for any given input, can significantly reduce interference. Those works, however, relied on relatively complex regularization or meta-learning approaches that have only been used offline in a pre-training phase. In this work, we pursue a direction that achieves sparsity by design rather than by learning: we design an activation function that produces sparse representations deterministically by construction, and so is more amenable to online training. The idea relies on the simple approach of binning, but overcomes the two key limitations of binning: zero gradients almost everywhere, because the binning function is flat between bin boundaries, and lost precision (reduced discrimination) due to coarse aggregation. We introduce the Fuzzy Tiling Activation (FTA), which provides non-negligible gradients and produces overlap between bins that improves discrimination. We first show that FTA is robust under covariate shift in a synthetic online supervised learning problem, where we can vary the level of correlation and drift. We then move to the deep reinforcement learning setting and investigate both value-based and policy gradient algorithms that use neural networks with FTAs, in classic discrete control and MuJoCo continuous control environments. We show that algorithms equipped with FTAs are able to learn a stable policy faster, without needing target networks, on most domains.¹
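To make the construction concrete, the following is a minimal NumPy sketch of a fuzzy tiling activation for a scalar input. The tiling bounds, bin width `delta`, and fuzziness parameter `eta` are illustrative choices, and the linear ramp used here is one simple way to fuzzify the bin indicator; it is not necessarily the paper's exact parameterization.

```python
import numpy as np

def fta(z, lower=-1.0, upper=1.0, delta=0.25, eta=0.25):
    """Fuzzy tiling activation for a scalar z (illustrative sketch).

    With eta = 0 this reduces to hard binning: a one-hot indicator of the
    bin containing z, which has zero gradient almost everywhere. With
    eta > 0, activations ramp linearly near bin boundaries, producing
    overlap between neighboring bins and non-negligible gradients.
    """
    c = np.arange(lower, upper, delta)                  # left edges of the bins
    # distance of z from each bin [c, c + delta]; exactly zero inside the bin
    d = np.maximum(c - z, 0.0) + np.maximum(z - delta - c, 0.0)
    if eta == 0.0:
        return (d == 0.0).astype(float)                 # hard binning (one-hot)
    # fuzzy indicator: 1 inside the bin, linear ramp of width eta, else 0
    return np.maximum(1.0 - d / eta, 0.0)

out = fta(0.1)
print(out)  # sparse: only the bin containing z and its neighbors are active
```

Applied element-wise to a layer's pre-activations, this expands each unit into a sparse vector of bin activations, so sparsity holds by construction rather than being encouraged by a regularizer.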

1. INTRODUCTION

Representation learning in online learning systems can strongly impact learning efficiency, both positively due to generalization and negatively due to interference (Liang et al., 2016; Heravi, 2019; Le et al., 2017; Liu et al., 2019; Chandak et al., 2019; Caselles-Dupré et al., 2018; Madjiheurem & Toni, 2019). Neural networks particularly suffer from interference, where updates for some inputs degrade accuracy for others, when training on temporally correlated data (McCloskey & Cohen, 1989; French, 1999; Kemker et al., 2018). Recent work (Liu et al., 2019; Ghiassian et al., 2020; Javed & White, 2019; Rafati & Noelle, 2019; Hernandez-Garcia & Sutton, 2019), as well as older work (McCloskey & Cohen, 1989; French, 1991), has shown that sparse representations can reduce interference from training parameter updates. A sparse representation is one where only a small number of features are active for each input (Cheng et al., 2013). Each update then impacts only a small number of weights, and so is less likely to interfere with many state values. Further, when constrained to learn sparse features, the feature vectors are more likely to be orthogonal (Cover, 1965), which further mitigates interference. The learned features can still be highly expressive, and even more interpretable, as only a small number of attributes are active for a given input. However, learning sparse representations online remains a relatively open problem. Some previous work has relied on representations pre-trained before learning, either with regularizers that encourage sparsity (Tibshirani, 1996; Xiang et al., 2011; Liu et al., 2019) or with meta-learning (Javed & White, 2019). Other work has trained the sparse-representation neural network online, by using sparsity regularizers with replay buffers (Hernandez-Garcia & Sutton, 2019) or by using a winner-take-all strategy where all but the top activations are set to zero (Rafati & Noelle, 2019). Hernandez-Garcia &



¹Code is available at https://github.com/yannickycpan/reproduceRL.git

