Intrinsic Motivation via Surprise Memory

Abstract

We present a new computational model for intrinsic rewards in reinforcement learning that addresses the limitations of existing surprise-driven exploration. The reward is the novelty of the surprise rather than the surprise norm. We estimate surprise novelty as the retrieval error of a memory network in which the memory stores and reconstructs surprises. Our surprise memory (SM) augments the capability of surprise-based intrinsic motivators, maintaining the agent's interest in exciting exploration while reducing unwanted attraction to unpredictable or noisy observations. Our experiments demonstrate that SM, combined with various surprise predictors, exhibits efficient exploration behavior and significantly boosts final performance in sparse-reward environments, including Noisy-TV, navigation, and challenging Atari games.
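To make "surprise novelty as retrieval error" concrete, here is a minimal toy sketch. It assumes a hypothetical memory that keeps past surprise vectors in FIFO slots and reconstructs a query by softmax attention over them. The class `SurpriseMemory`, its `capacity` and `temperature` parameters, and the attention read are illustrative assumptions, not the SM architecture described in this paper. The point it shows is that a surprise seen before can be reconstructed, so its novelty drops even though its norm is unchanged.

```python
import numpy as np


class SurpriseMemory:
    """Hypothetical toy memory (not the paper's SM): stores past surprise
    vectors and reconstructs a query by softmax attention over them."""

    def __init__(self, capacity: int = 100, temperature: float = 1.0):
        self.capacity = capacity        # assumed maximum number of stored surprises
        self.temperature = temperature  # assumed softmax temperature
        self.slots: list[np.ndarray] = []

    def reconstruct(self, s: np.ndarray) -> np.ndarray:
        if not self.slots:
            return np.zeros_like(s)              # empty memory recalls nothing
        keys = np.stack(self.slots)              # (n, d) stored surprises
        scores = keys @ s / self.temperature     # dot-product similarity to the query
        weights = np.exp(scores - scores.max())  # numerically stable softmax
        weights /= weights.sum()
        return weights @ keys                    # attention-weighted average of stored surprises

    def novelty(self, s: np.ndarray) -> float:
        """Surprise novelty = reconstruction (retrieval) error of the memory."""
        return float(np.linalg.norm(s - self.reconstruct(s)))

    def write(self, s: np.ndarray) -> None:
        self.slots.append(s.copy())
        if len(self.slots) > self.capacity:
            self.slots.pop(0)                    # FIFO eviction of the oldest surprise


mem = SurpriseMemory()
s = np.array([3.0, 4.0])    # a surprise vector with norm 5.0
first = mem.novelty(s)      # empty memory: novelty equals the surprise norm, 5.0
mem.write(s)
second = mem.novelty(s)     # same surprise again: reconstructed exactly, novelty 0.0
```

Under this toy design, repeatedly encountered surprises (for example, the same unpredictable event in a dull room) stop being rewarding once memory can reproduce them, while the surprise norm would keep paying out.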

1. Introduction

What motivates agents to explore? Successfully answering this question would enable agents to learn efficiently in formidable tasks. Random exploration strategies such as ε-greedy are inefficient in high-dimensional cases, failing to learn despite training for hundreds of millions of steps in sparse-reward games (Bellemare et al., 2016). Alternative approaches use intrinsic motivation to aid exploration by adding bonuses to the environment's rewards (Bellemare et al., 2016; Stadie et al., 2015). The intrinsic reward is often proportional to the novelty of the visited state: it is high if the state is novel (e.g., different from past ones (Badia …

Another view of intrinsic motivation comes from surprise, which refers to an experience being unexpected and is determined by the discrepancy between expectation (the agent's prediction) and observed reality (Barto et al., 2013; Schmidhuber, 2010). Technically, surprise is the difference between the prediction and observation representation vectors, and the norm of this residual (i.e., the prediction error) is used as the intrinsic reward.
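The conventional surprise-norm reward described above can be sketched in a few lines. This is a minimal illustration, assuming the prediction and observation are already embedded as vectors of the same dimension; the function names and the bonus weight `beta` are hypothetical choices for the example, not values from this paper.

```python
import numpy as np


def surprise(prediction: np.ndarray, observation: np.ndarray) -> np.ndarray:
    """Surprise vector: residual between observed and predicted representations."""
    return observation - prediction


def surprise_norm_reward(prediction: np.ndarray, observation: np.ndarray) -> float:
    """Conventional intrinsic reward: L2 norm of the surprise (prediction error)."""
    return float(np.linalg.norm(surprise(prediction, observation)))


def total_reward(extrinsic: float, intrinsic: float, beta: float = 0.5) -> float:
    """Environment reward plus a scaled exploration bonus (beta is a hypothetical weight)."""
    return extrinsic + beta * intrinsic


pred = np.array([0.0, 1.0, 2.0])
obs = np.array([3.0, 5.0, 2.0])
r_int = surprise_norm_reward(pred, obs)  # residual [3, 4, 0] has norm 5.0
r = total_reward(0.0, r_int)             # sparse environment reward of 0 gives 2.5
```

Because this reward depends only on the size of the residual, an inherently unpredictable input keeps producing a large bonus no matter how often it is seen, which is the weakness that surprise novelty targets.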



Figure 1: Montezuma's Revenge: surprise novelty reflects the originality of the environment better than surprise norm does. While surprise norm can be significant even for dull events, such as those in the dark room, due to unpredictability, surprise novelty tends to be lower (3rd and 6th images). On the other hand, surprise novelty can be higher in truly vivid states on the first visit to the ladder and island rooms (1st and 2nd images) and reduced on the …

