Drop-Bottleneck: Learning Discrete Compressed Representation for Noise-Robust Exploration

Abstract

We propose a novel information bottleneck (IB) method named Drop-Bottleneck, which discretely drops features that are irrelevant to the target variable. Drop-Bottleneck not only enjoys a simple and tractable compression objective but also additionally provides a deterministic compressed representation of the input variable, which is useful for inference tasks that require a consistent representation. Moreover, it can jointly learn a feature extractor and select features considering each feature dimension's relevance to the target task, which is unattainable by most neural network-based IB methods. We propose an exploration method based on Drop-Bottleneck for reinforcement learning tasks. In a multitude of noisy and reward-sparse maze navigation tasks in VizDoom (Kempka et al., 2016) and DM-Lab (Beattie et al., 2016), our exploration method achieves state-of-the-art performance. As a new IB framework, we demonstrate that Drop-Bottleneck outperforms Variational Information Bottleneck (VIB) (Alemi et al., 2017) in multiple aspects including adversarial robustness and dimensionality reduction.

1. INTRODUCTION

Data with noise or task-irrelevant information can easily harm the training of a model; for instance, the noisy-TV problem (Burda et al., 2019a) is a well-known example of such phenomena in reinforcement learning. If observations from the environment are modified to contain a TV screen that changes its channel randomly based on the agent's actions, the performance of curiosity-based exploration methods degrades dramatically (Burda et al., 2019a;b; Kim et al., 2019; Savinov et al., 2019). The information bottleneck (IB) theory (Tishby et al., 2000; Tishby & Zaslavsky, 2015) provides a framework for dealing with such task-irrelevant information, and has been actively adopted for exploration in reinforcement learning (Kim et al., 2019; Igl et al., 2019). For an input variable X and a target variable Y, the IB theory introduces another variable Z, which is a compressed representation of X. The IB objective trains Z to contain as little information about X and as much information about Y as possible, where the two quantities are measured by the mutual information terms I(Z; X) and I(Z; Y), respectively. IB methods such as Variational Information Bottleneck (VIB) (Alemi et al., 2017; Chalk et al., 2016) and Information Dropout (Achille & Soatto, 2018) show that the compression of the input variable X can be performed by neural networks.

In this work, we propose a novel information bottleneck method named Drop-Bottleneck, which compresses the input variable by discretely dropping a subset of its input features that are irrelevant to the target variable. Drop-Bottleneck provides the following desirable properties:

• The compression term of Drop-Bottleneck's objective is simple and admits a tractable optimization.
• Drop-Bottleneck provides a deterministic compressed representation that still maintains the majority of the learned indistinguishability, i.e., compression. This is useful for inference tasks that require the input representation to be consistent and stable.
• Drop-Bottleneck jointly trains a feature extractor and performs feature selection, as it learns the feature-wise drop probability taking into account each feature dimension's relevance to the target task. Hence, unlike the compression provided by most neural network-based IB
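To make the discrete drop mechanism concrete, the following is a minimal sketch, not the authors' implementation: the function names, the pure-Python representation, and the 0.5 threshold for the deterministic variant are assumptions for illustration. During training, each feature dimension is independently zeroed out according to a learned per-dimension drop probability; at inference, the deterministic representation keeps only the dimensions that are more likely to be kept than dropped.

```python
import random

def stochastic_drop(x, drop_prob, rng=random):
    """Training-time compression (sketch): independently zero out
    feature dimension i with probability drop_prob[i]."""
    return [0.0 if rng.random() < p else xi for xi, p in zip(x, drop_prob)]

def deterministic_drop(x, drop_prob):
    """Inference-time representation (assumed thresholding at 0.5):
    deterministically keep only dimensions whose learned drop probability
    is below 0.5, so repeated evaluations of the same input agree."""
    return [xi if p < 0.5 else 0.0 for xi, p in zip(x, drop_prob)]

# Hypothetical learned drop probabilities for a 4-dim feature vector;
# here dimensions 0 and 2 were judged mostly task-irrelevant.
p = [0.9, 0.1, 0.8, 0.2]
x = [1.0, 2.0, 3.0, 4.0]

print(deterministic_drop(x, p))  # [0.0, 2.0, 0.0, 4.0]
```

The deterministic variant is what makes the representation usable for downstream inference: unlike a stochastic mask, it yields the same compressed vector for the same input every time.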

