STAY MORAL AND EXPLORE: LEARN TO BEHAVE MORALLY IN TEXT-BASED GAMES

Abstract

Reinforcement learning (RL) in text-based games has developed rapidly and achieved promising results. However, little effort has been devoted to designing agents that pursue their objectives while behaving morally, which is a critical issue in the field of autonomous agents. In this paper, we propose a general algorithm named Moral Awareness Adaptive Learning (MorAL) that enhances an agent's moral capability through a plug-in moral-aware learning model. The algorithm allows the agent to perform task learning and morality learning adaptively: during task learning, the agent selects trajectories from its past experiences, and these trajectories are then used for self-imitation learning with a moral-enhanced objective. To achieve a trade-off between morality and task progress, the agent combines the task policy and the moral policy for action selection. We evaluate MorAL on the Jiminy Cricket benchmark, a suite of text-based games with diverse scenes and dense morality annotations. Our experiments demonstrate that, compared with strong contemporary value-alignment approaches, the proposed algorithm improves task performance while reducing immoral behaviours across a variety of games.

1. INTRODUCTION

Text-based games have emerged as promising environments in which game agents comprehend situations in language and make language-based decisions (Hausknecht et al., 2020b). These games have proven to be suitable test-beds for studying various natural language processing (NLP) problems, such as question answering (Yuan et al., 2019), dialogue systems (Ammanabrolu et al., 2022a), situated language learning (Shridhar et al., 2020) and commonsense reasoning (Murugesan et al., 2021). Recent years have witnessed a surge of interest in designing reinforcement learning (RL) agents to solve these games (Narasimhan et al., 2015; Hausknecht et al., 2020a). Among the challenges, identifying admissible actions in a large action space is particularly difficult. The majority of existing RL agents use a set of predefined action candidates provided by the environment (He et al., 2015). Recently, CALM used a language model to generate a compact set of action candidates for RL agents to select from, alleviating the combinatorial action space problem (Yao et al., 2020). Unfortunately, actions generated by agents may be immoral, such as stealing or attacking humans. RL agents may select immoral actions, especially when trained in environments that dismiss moral concerns (Ammanabrolu et al., 2022b). Figure 1 provides an example of gameplay from the text-based game "Zork1". Deploying agents with embedded immoral biases in real scenarios raises serious concerns (Russell et al., 2015; Amodei et al., 2016). To our knowledge, however, little effort has been expended to design agents that pursue specific objectives while behaving morally. Recently, the Jiminy Cricket benchmark was introduced, providing a set of text-based games with diverse scenes and dense morality annotations (Hendrycks et al., 2021b). The benchmark evaluates game agents comprehensively by annotating the morality of every action they take.
These annotations cover a wide variety of morally significant circumstances, ranging from bodily injury and theft to altruism. Consequently, an urgent challenge in designing and training RL agents is ensuring that they make decisions consistent with expected human values in a given context.
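As a rough illustration of the trade-off motivated above, the idea of combining a task policy with a moral policy for action selection can be sketched as follows. This is a minimal, hypothetical sketch, not the paper's actual implementation: the function `select_action`, the mixing weight `alpha`, and all scores are illustrative assumptions.

```python
# Hypothetical sketch: choosing an action by mixing a task-policy score
# with a moral-policy score. All names and values are illustrative.

def select_action(candidates, task_scores, moral_scores, alpha=0.5):
    """Pick the candidate maximising a weighted mix of task value and
    moral value. `alpha` trades off morality against task progress:
    alpha=0 ignores morality, alpha=1 ignores the task."""
    assert len(candidates) == len(task_scores) == len(moral_scores)
    best_idx = max(
        range(len(candidates)),
        key=lambda i: (1 - alpha) * task_scores[i] + alpha * moral_scores[i],
    )
    return candidates[best_idx]

# Example: "steal coin" scores highest on the task but lowest morally.
actions = ["steal coin", "open door", "help villager"]
task = [0.9, 0.6, 0.5]    # task-policy scores (illustrative)
moral = [0.0, 0.6, 1.0]   # moral-policy scores (illustrative)
print(select_action(actions, task, moral, alpha=0.5))  # -> help villager
```

With `alpha=0.0` the same agent would pick "steal coin", which is exactly the behaviour the moral-aware component is meant to suppress.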

