TEMPORAL CHANGE SENSITIVE REPRESENTATION FOR REINFORCEMENT LEARNING

Abstract

Image-based deep reinforcement learning has improved greatly in recent years by combining state-of-the-art reinforcement learning algorithms with self-supervised representation learning algorithms. However, these self-supervised representation learning algorithms are designed to preserve global visual information, and may therefore miss changes in visual information that are important for performing the task, as illustrated in Figure 1. Resolving this problem requires self-supervised representation learning that is specifically designed to preserve task-relevant information. Following this idea, we introduce Temporal Change Sensitive Representation (TCSR), designed for reinforcement learning algorithms that use a latent dynamics model. TCSR enforces the latent state representation of the agent to put more emphasis on the parts of the observation that could change in the future. Our method achieves state-of-the-art performance on the Atari 100k benchmark.
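The abstract does not specify how the emphasis on changing parts of the observation is implemented. One minimal sketch of the general idea is a reconstruction loss whose per-pixel weights grow with the magnitude of temporal change between consecutive observations; the function names and the particular weighting scheme below are illustrative assumptions, not the authors' actual formulation:

```python
import numpy as np

def temporal_change_weights(obs_t, obs_next, eps=1e-6):
    """Per-pixel weighting mask from the change between consecutive frames.

    obs_t, obs_next: float arrays of shape (H, W, C).
    Returns an (H, W) mask with mean ~1, larger where the scene changed.
    (Illustrative assumption, not the paper's exact mechanism.)
    """
    change = np.abs(obs_next - obs_t).mean(axis=-1)  # average over channels
    return change / (change.mean() + eps)

def change_weighted_recon_loss(recon, target, weights):
    """Squared reconstruction error, up-weighted where the scene changed."""
    err = ((recon - target) ** 2).mean(axis=-1)  # per-pixel error
    return (weights * err).mean()
```

Under this sketch, static background pixels receive low weight while moving or appearing objects (such as the enemies in Figure 1) dominate the objective, pushing the latent representation to encode exactly the parts of the observation that change.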



Figure 1: Ground-truth observations compared with images reconstructed from the latent state representations predicted by TCSR and EfficientZero. TCSR not only predicts the movement of enemies in the short term (yellow box) but also predicts exactly when and where the UFO will release a new enemy, up to the end of the planning horizon (red box). EfficientZero fails to predict both of these changes. This shows that TCSR is more sensitive to changes in the latent state representation; these changes include, but are not limited to, the position, appearance, and disappearance of task-relevant objects, as shown in this figure.

1. INTRODUCTION

Deep reinforcement learning has achieved much success in solving image-based tasks over the last several years. A critical step in solving image-based tasks is learning a good representation of the

