STAY MORAL AND EXPLORE: LEARN TO BEHAVE MORALLY IN TEXT-BASED GAMES

Abstract

Reinforcement learning (RL) in text-based games has developed rapidly and achieved promising results. However, little effort has been expended on designing agents that pursue objectives while behaving morally, which is a critical issue in the field of autonomous agents. In this paper, we propose a general algorithm named Moral Awareness Adaptive Learning (MorAL) that enhances an agent's moral capacity using a plugin moral-aware learning model. The algorithm allows the agent to carry out task learning and morality learning adaptively. During task learning, the agent selects high-quality trajectories from past experience; these trajectories are then used for self-imitation learning with a moral-enhanced objective. To achieve a trade-off between morality and task progress, the agent uses a combination of the task policy and the moral policy for action selection. We evaluate MorAL on the Jiminy Cricket benchmark, a set of text-based games with diverse scenes and dense morality annotations. Our experiments demonstrate that, compared with strong contemporary value-alignment approaches, the proposed algorithm improves task performance while reducing immoral behaviours across a variety of games.

1. INTRODUCTION

Text-based games have emerged as promising environments in which game agents comprehend situations in language and make language-based decisions (Hausknecht et al., 2020b). These games have proven to be suitable test-beds for studying various natural language processing (NLP) problems, such as question answering (Yuan et al., 2019), dialogue systems (Ammanabrolu et al., 2022a), situated language learning (Shridhar et al., 2020) and commonsense reasoning (Murugesan et al., 2021). Recent years have witnessed a surge of interest in designing reinforcement learning (RL) agents for these games (Narasimhan et al., 2015; Hausknecht et al., 2020a). Among the open problems, identifying admissible actions in large action spaces is particularly challenging. The majority of existing RL agents use a set of predefined action candidates provided by the environment (He et al., 2015). More recently, CALM used a language model to generate a compact set of action candidates for an RL agent to select from, addressing the combinatorial action space problem (Yao et al., 2020). Unfortunately, the actions generated by agents may be immoral, such as stealing or attacking humans. RL agents may select immoral actions, especially when trained in environments that dismiss moral concerns (Ammanabrolu et al., 2022b). Figure 1 provides an example of gameplay from the text-based game "Zork1". Deploying agents with embedded immoral biases in real-world scenarios raises serious concerns (Russell et al., 2015; Amodei et al., 2016). To our knowledge, however, little effort has been expended on designing agents that pursue specific objectives while behaving morally. Recently, the Jiminy Cricket benchmark was introduced, providing a set of text-based games with diverse scenes and dense morality annotations (Hendrycks et al., 2021b). The benchmark evaluates game agents comprehensively by annotating the morality of each action they take.
These annotations cover a wide variety of morally significant circumstances, ranging from bodily injury and theft to altruism. Consequently, an urgent challenge in designing and training RL agents is ensuring they can make decisions consistent with expected human values in a given context. Prior approaches address this by applying a morality-based correction term to game rewards or Q-values. However, such strategies suffer from at least two drawbacks. First, designing an appropriate correction term for game rewards or Q-values is challenging, especially when game rewards are extremely sparse. Second, some immoral actions are necessary for progressing through the game. For instance, in the game "Zork1", the agent must steal the lantern to reach the next location on the map, as shown in Figure 1. The trade-off between task progress and morality is a dilemma that agents may encounter while making decisions. In this paper, we design a general Moral Awareness Adaptive Learning (MorAL) algorithm that makes an agent pursue its individual goal while behaving morally. Specifically, our MorAL algorithm allows the agent to execute a task policy under moral awareness control. Training proceeds in multiple stages that alternate between task learning and morality learning. For task learning, the agent uses game rewards to learn a value function for the task policy over candidate actions. For morality learning, the agent collects high-quality trajectories from past experience and builds the moral awareness control policy via self-imitation learning with a moral-enhanced objective. To balance morality and game completion, the agent uses a mixture policy that combines the task policy and the moral policy. The algorithm removes the assumption that dense human feedback is required during training, as morality learning is performed using only a limited number of trajectories at specific stages. Experiments indicate that our algorithm significantly increases task performance and decreases the frequency of immoral behaviour in a variety of Jiminy Cricket games.
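The alternating stages described above can be illustrated with a toy sketch. The real agent learns deep policies over text observations; here a tabular Q-table stands in for the task value function, and a user-supplied `moral_cost` callable stands in for the learned moral annotations. All names and hyperparameters (`moral_cost`, `beta`, `top_k`) are illustrative assumptions, not the paper's exact formulation.

```python
from collections import deque

class MorALSketch:
    """Minimal sketch of MorAL's alternating training stages:
    task learning on game rewards, then morality learning via
    self-imitation over high-quality past trajectories."""

    def __init__(self, n_states, n_actions, lr=0.1, gamma=0.9):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.buffer = deque(maxlen=1000)   # past experience
        self.lr, self.gamma = lr, gamma

    def task_update(self, s, a, r, s_next):
        # Task-learning stage: standard Q-learning on the game reward.
        target = r + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.lr * (target - self.q[s][a])
        self.buffer.append((s, a, r))

    def morality_update(self, moral_cost, beta=1.0, top_k=10):
        # Morality-learning stage: replay the highest-reward transitions
        # (self-imitation), but discount each action's return by a moral
        # penalty so immoral actions are imitated less. This stands in
        # for the paper's "moral-enhanced objective".
        best = sorted(self.buffer, key=lambda t: t[2], reverse=True)[:top_k]
        for s, a, r in best:
            shaped = r - beta * moral_cost(s, a)
            # Only imitate transitions whose shaped return exceeds the
            # current estimate, as in self-imitation learning.
            self.q[s][a] += self.lr * max(shaped - self.q[s][a], 0.0)

agent = MorALSketch(n_states=3, n_actions=2)
agent.task_update(0, 1, 1.0, 1)            # task stage on one transition
agent.morality_update(lambda s, a: 0.0)    # morality stage, zero cost here
```

Because morality learning only consumes a small buffer of stored trajectories at specific stages, no dense human feedback is needed during training, matching the property claimed above.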
In summary, our contributions are as follows. First, we provide a general algorithm that enhances an agent's moral capacity using a plugin moral-aware learning model; the algorithm conducts adaptive task learning and morality learning. Second, we develop a mixture policy to resolve the trade-off between morality and task progress in text-based games. Third, compared to value-aligned game agents, our method improves both performance and morality in a variety of games from the Jiminy Cricket benchmark.¹
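The mixture policy for action selection can be sketched as follows. The paper does not spell out the exact combination rule here, so the weighted-logit form below (with a mixing weight `alpha`) is one plausible instantiation, assumed for illustration only.

```python
import math

def mixture_action_probs(task_logits, moral_logits, alpha=0.5):
    """Combine task-policy and moral-policy scores into a single action
    distribution. alpha=1.0 recovers the pure task policy and alpha=0.0
    the pure moral policy; intermediate values trade off task progress
    against morality."""
    mixed = [alpha * t + (1.0 - alpha) * m
             for t, m in zip(task_logits, moral_logits)]
    # Numerically stabilised softmax over the mixed scores.
    peak = max(mixed)
    exps = [math.exp(x - peak) for x in mixed]
    total = sum(exps)
    return [e / total for e in exps]

# Toy candidate actions: the task policy favours action 0 (e.g. "steal
# lantern"), while the moral policy penalises it and favours action 1.
task_logits = [2.0, 0.5, 0.1]
moral_logits = [-3.0, 1.5, 0.5]
probs = mixture_action_probs(task_logits, moral_logits, alpha=0.6)
```

With `alpha=0.6` in this toy example, the morally preferred action becomes the most probable even though the task policy alone would pick the immoral one, illustrating how the mixture mediates the dilemma described above.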

2. RELATED WORKS

RL Agents for Text-based Games. Previous research has explored RL agents with varying architectures and learning schemes for text-based games (He et al., 2015; Narasimhan et al., 2015; Ammanabrolu & Hausknecht, 2020; Xu et al., 2021; Ryu et al., 2022). Innovations include solving



¹ The source code is available at https://github.com/winni18/MorAL.



Figure 1: Excerpt from the text-based game "Zork1". Although the agent receives good rewards, it breaks into a house and steals a lantern from the living room, which is considered immoral and causes harm to others.

