EVALUATING AGENTS WITHOUT REWARDS

Abstract

Reinforcement learning has enabled agents to solve challenging control tasks from raw image inputs. However, manually crafting reward functions can be time-consuming, expensive, and prone to human error. Competing objectives have been proposed for agents to learn without external supervision, such as artificial input entropy, information gain, and empowerment. Estimating these objectives can be challenging, and it remains unclear how well they reflect task rewards or human behavior. We study these objectives across seven agents and three Atari games. Retrospectively computing the objectives from the agent's lifetime of experience simplifies accurate estimation. We find that all three objectives correlate more strongly with a human behavior similarity metric than with task reward. Moreover, input entropy and information gain both correlate more strongly with human similarity than task reward does.

[Figure: reward correlation by metric]

Introduction

… and game playing (Mnih et al., 2015; Silver et al., 2017). However, many of these successes are built upon rich supervision in the form of manually defined reward functions. Unfortunately, designing informative reward functions is often expensive, time-consuming, and prone to human error (Krakovna et al., 2020), and these difficulties increase with the complexity of the task of interest. In contrast to many RL agents, natural agents generally learn without externally provided tasks, through intrinsic objectives. For example, children explore the world by crawling around and playing with objects they find. Inspired by this, the field of intrinsic motivation (Schmidhuber, 1991; Oudeyer et al., 2007) seeks mathematical objectives for RL agents that do not depend on a specific task and can be applied to any unknown environment. We study three common types of intrinsic motivation:

• Input entropy encourages encountering rare sensory inputs, as measured by a learned density model (Schmidhuber, 1990; Bellemare et al., 2016b; Pathak et al., 2017; Burda et al., 2018b).
• Information gain, or infogain for short, rewards the agent for discovering the rules of its environment (Lindley, 1956; Houthooft et al., 2016; Shyam et al., 2018; Sekar et al., 2020).
• Empowerment measures the influence the agent has over its sensory inputs or environment (Klyubin et al., 2005; Mohamed and Rezende, 2015; Karl et al., 2017).

Despite the empirical success of intrinsic motivation for facilitating exploration (Bellemare et al., 2016b; Burda et al., 2018b), it remains unclear which family of intrinsic objectives is best for a given scenario, for example when task rewards are sparse or unavailable, or when the goal is to behave similarly to human actors. Moreover, it is not clear whether different intrinsic objectives offer …
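The first two objectives can be illustrated with a minimal toy sketch (our own construction, not the estimators used in this work): input entropy scores each input retrospectively by its negative log probability under a count-based density model fit to the whole lifetime of experience, and infogain is approximated by how much each observation shifts the model's predictive distribution. Discrete observations, Laplace smoothing, and the function names are assumptions made for illustration; empowerment is omitted since it additionally requires modeling the influence of actions.

```python
import math
from collections import Counter

def input_entropy_rewards(observations):
    """Retrospective input entropy: fit a count-based density model on the
    agent's lifetime of experience, then score each input by its negative
    log probability, so rare inputs earn high reward."""
    counts = Counter(observations)
    total = len(observations)
    return [-math.log(counts[o] / total) for o in observations]

def infogain_rewards(observations, alphabet, alpha=1.0):
    """Toy infogain surrogate: KL divergence between the model's
    Laplace-smoothed predictive distribution after vs. before each
    observation; repeated inputs teach the model less over time."""
    def predictive(counts):
        total = sum(counts.values()) + alpha * len(alphabet)
        return {s: (counts[s] + alpha) / total for s in alphabet}

    counts, rewards = Counter(), []
    for o in observations:
        before = predictive(counts)
        counts[o] += 1
        after = predictive(counts)
        rewards.append(sum(after[s] * math.log(after[s] / before[s])
                           for s in alphabet))
    return rewards

# Rare inputs score high entropy; repeated inputs yield shrinking infogain.
entropy = input_entropy_rewards(["a", "a", "a", "b"])
infogain = infogain_rewards(["a", "a", "a"], alphabet=["a", "b"])
```

In this toy run the rare input "b" receives a larger entropy reward than the common "a", and the infogain signal shrinks with each repeated observation, matching the intuitions behind both objectives.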

