KNOWLEDGE-GROUNDED REINFORCEMENT LEARNING

Abstract

Receiving knowledge, abiding by laws, and being aware of regulations are common behaviors in human society. Motivated by the observation that reinforcement learning (RL) algorithms benefit from mimicking human behaviors, we propose that an RL agent should be able to act on external guidance during both training and deployment, making the agent more socially acceptable. We introduce Knowledge-Grounded RL (KGRL), formally defined as the problem of learning to follow a set of external guidelines while developing one's own policy. As a step toward this goal, we propose a novel actor model with an embedding-based attention mechanism that attends to either a learnable internal policy or external knowledge. The proposed method is orthogonal to the choice of training algorithm, and the external knowledge can be flexibly recomposed, rearranged, and reused in both the training and inference stages. Through experiments on tasks with discrete and continuous action spaces, we show that a KGRL agent is more sample efficient and generalizable, with flexibly rearrangeable knowledge embeddings and interpretable behaviors.

1. INTRODUCTION

Incorporating external guidance into learning is a common behavior among humans. We speed up our learning by referring to useful suggestions, and we heed external regulations for safety and ethical reasons. On top of following external guidance, we learn our own strategies to complete a task, and we can arbitrarily recompose, rearrange, and reuse those strategies and external guidelines to solve new tasks and adapt to environmental changes.

Imitating human behaviors has been shown to benefit reinforcement learning (RL) (Billard et al., 2016; Sutton and Barto, 2018; Zhang et al., 2019). However, how an RL agent can achieve the above capabilities remains challenging. Different approaches have been proposed to develop some of these abilities. One branch of previous work in RL has studied how an agent can learn from demonstrations provided externally as examples of completing a task (Ross et al., 2011; Rajeswaran et al., 2017; Duan et al., 2017; Nair et al., 2018; Goecks et al., 2019; Ding et al., 2019). Another branch has investigated how an agent can learn reusable policies, including (1) transferring knowledge among tasks of similar difficulty (Parisotto et al., 2015; Yin and Pan, 2017; Gupta et al., 2018a; Liu et al., 2019; Tao et al., 2021) and (2) learning multiple reusable skills to solve a complex task in a divide-and-conquer manner (Bacon et al., 2017; Frans et al., 2017; Nachum et al., 2018a; Eysenbach et al., 2018; Kim et al., 2021; Tseng et al., 2021). These methods allow an agent to learn a new task with fewer training samples. However, demonstrations are too task-specific to be reused in a new task, and incorporating external guidelines from different sources into existing knowledge-reuse frameworks is not straightforward. Moreover, current learning-from-demonstration and knowledge-reuse approaches lack the flexibility to rearrange and recompose different demonstrations or pieces of knowledge, so they cannot dynamically adapt to environmental changes.

In this work, we introduce Knowledge-Grounded Reinforcement Learning (KGRL), a novel problem with the following goal: an RL agent learns its own policy (knowledge) while referring to external knowledge, and all knowledge can be arbitrarily recomposed, rearranged, and reused at any time during the learning and inference stages. A KGRL problem simultaneously considers three questions: (1) How can an agent follow a set of external knowledge from different sources? (2) How can an agent efficiently learn new knowledge by referring to external knowledge? (3) What is a proper representation of knowledge that allows it to be flexibly recomposed, rearranged, and reused?
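To make the proposed actor concrete, below is a minimal PyTorch sketch of an embedding-based attention actor in the spirit of the abstract. All module names, network sizes, and the interface of the external knowledge policies (plain callables mapping states to actions) are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class KnowledgeGroundedActor(nn.Module):
    """Illustrative sketch: an actor that attends over one learnable internal
    policy and a set of external knowledge policies via key embeddings."""

    def __init__(self, state_dim, action_dim, external_policies, embed_dim=32):
        super().__init__()
        # External knowledge: callables mapping a state tensor to an action
        # tensor (e.g., scripted controllers or frozen pretrained policies).
        self.external_policies = external_policies
        n_knowledge = len(external_policies) + 1  # +1 for the internal policy
        # One key embedding per knowledge source; external keys can be frozen,
        # swapped, or reordered without retraining the rest of the actor.
        self.keys = nn.Parameter(torch.randn(n_knowledge, embed_dim))
        # Query network: maps the state into the key-embedding space.
        self.query_net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, embed_dim))
        # Learnable internal policy (the agent's own knowledge).
        self.internal_policy = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, action_dim))

    def forward(self, state):
        # Action proposals from the internal policy and every external policy.
        proposals = [self.internal_policy(state)]
        proposals += [p(state) for p in self.external_policies]
        proposals = torch.stack(proposals, dim=-2)   # (..., K, action_dim)
        # Attention weights from query-key similarity.
        query = self.query_net(state)                # (..., embed_dim)
        scores = torch.einsum('...d,kd->...k', query, self.keys)
        weights = torch.softmax(scores, dim=-1)      # (..., K)
        # Convex combination of proposals; for a discrete action space one
        # would instead mix the proposal distributions.
        return torch.einsum('...k,...ka->...a', weights, proposals)


# Hypothetical usage: one external policy that always outputs zero actions.
actor = KnowledgeGroundedActor(
    state_dim=8, action_dim=2,
    external_policies=[lambda s: torch.zeros(*s.shape[:-1], 2)])
action = actor(torch.randn(4, 8))  # batch of 4 states -> (4, 2) actions
```

Because each knowledge source is paired with its own key embedding, entries of `external_policies` and the corresponding rows of `keys` can be added, removed, or reordered together, which is one plausible way to realize the recomposition and rearrangement the problem statement asks for.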

