LEARNING TO OBSERVE WITH REINFORCEMENT LEARNING

Abstract

We consider a decision-making problem where an autonomous agent decides which actions to take based on the observations it collects from the environment. We are interested in revealing the information structure of the observation space, i.e., which types of observations are the most important (such as position versus velocity) and how this depends on the state of the agent (such as being at the bottom versus the top of a hill). We approach this problem by associating with each observation a cost that increases with its accuracy. We adopt a reinforcement learning (RL) framework in which the RL agent learns to adjust the accuracy of its observations alongside learning to perform the original task. We consider both the scenario where the accuracy can be adjusted continuously and the scenario where the agent must choose between given preset levels, such as taking a sample perfectly or not taking a sample at all. In contrast to existing work, which mostly focuses on sample efficiency during training, our focus is on the behaviour during the actual task. Our results illustrate that the RL agent can learn to use the observation space efficiently and obtain satisfactory performance on the original task while collecting an effectively smaller amount of data. By uncovering the relative usefulness of different types of observations and the trade-offs between them, these results also provide insights for the further design of active data acquisition schemes.

1. INTRODUCTION

Autonomous decision making relies on collecting data, i.e. observations, from the environment, with actions decided based on these observations. We are interested in revealing the information structure of the observation space, i.e., which types of observations are the most important (such as position versus velocity). Revealing this structure is challenging since the usefulness of the information that an observation can bring is a priori unknown and depends on the environment as well as on the current knowledge state of the decision-maker, for instance, whether the agent is at the bottom or the top of a hill and how sure the agent is about its position. Hence, we are interested in questions such as "Instead of collecting all available observations, is it possible to skip some observations and still obtain satisfactory performance?" and "Which observation components (such as the position or the velocity) are the most useful when the agent is far away from (or close to) the target state?". The primary aim of this work is to reveal this information structure of the observation space within a systematic framework. We approach this problem by associating with each observation a cost that increases with its accuracy. The agent can choose the accuracy level of its observations. Since the cost increases with the accuracy, we expect that the agent will choose to collect only those observations that are most likely to be informative and worth the cost. We adopt a reinforcement learning (RL) framework in which the RL agent learns to adjust the accuracy of the observations alongside learning to perform the original task. We consider both the scenario where the accuracy can be adjusted continuously and the scenario where the agent must choose between given preset levels, such as taking a sample perfectly or not taking a sample at all.
In contrast to existing work, which mostly focuses on sample efficiency during training, our focus is on the behaviour during the actual task. Our results illustrate that the RL agent can learn to use the observation space efficiently and obtain satisfactory performance on the original task while collecting an effectively smaller amount of data.
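The cost-versus-accuracy mechanism described above can be sketched as follows. This is a minimal illustrative model, not the paper's implementation: the linearly shrinking Gaussian noise, the linear cost, the weights, and the function name `costly_observation` are all assumptions for the example. In an RL setting, the returned cost would be subtracted from the task reward, and the accuracy vector would be part of the agent's (learned) action.

```python
import numpy as np

rng = np.random.default_rng(0)

def costly_observation(state, accuracy, max_noise_std=1.0, cost_weight=0.1):
    """Return a noisy observation of `state` and the cost charged for it.

    accuracy[i] in [0, 1] is the requested accuracy for state component i:
    1.0 -> the component is sampled exactly; 0.0 -> maximum noise,
    i.e. the component is effectively skipped. The noise standard
    deviation shrinks linearly with accuracy, while the observation cost
    grows linearly with it (an assumed model for illustration only).
    """
    accuracy = np.clip(np.asarray(accuracy, dtype=float), 0.0, 1.0)
    noise_std = max_noise_std * (1.0 - accuracy)
    obs = state + rng.normal(0.0, 1.0, size=state.shape) * noise_std
    cost = cost_weight * accuracy.sum()
    return obs, cost

# Example with the discrete preset levels (sample perfectly or not at all):
# observe the position exactly, skip the velocity.
state = np.array([0.5, -1.2])                       # [position, velocity]
obs, cost = costly_observation(state, accuracy=[1.0, 0.0])
# obs[0] equals state[0] (zero noise); obs[1] is pure noise around state[1];
# only the perfectly sampled position component is paid for.
```

The continuous-accuracy scenario corresponds to letting the agent emit any accuracy vector in [0, 1]^n, whereas the preset-level scenario restricts it to a finite set such as {0, 1} per component.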

