WATCH-AND-HELP: A CHALLENGE FOR SOCIAL PERCEPTION AND HUMAN-AI COLLABORATION

Abstract

In this paper, we introduce Watch-And-Help (WAH), a challenge for testing social intelligence in agents. In WAH, an AI agent needs to help a human-like agent perform a complex household task efficiently. To succeed, the AI agent needs to i) understand the underlying goal of the task by watching a single demonstration of the human-like agent performing the same task (social perception), and ii) coordinate with the human-like agent to solve the task in an unseen environment as fast as possible (human-AI collaboration). For this challenge, we build VirtualHome-Social, a multi-agent household environment, and provide a benchmark including both planning-based and learning-based baselines. We evaluate the performance of AI agents with the human-like agent as well as with real humans using objective metrics and subjective user ratings. Experimental results demonstrate that the proposed challenge and virtual environment enable a systematic evaluation of the important aspects of machine social intelligence at scale.

1. INTRODUCTION

Humans exhibit altruistic behaviors at an early age (Warneken & Tomasello, 2006). Without much prior experience, children can robustly recognize goals of other people by simply watching them act in an environment, and are able to come up with plans to help them, even in novel scenarios. In contrast, the most advanced AI systems to date still struggle with such basic social skills.

In order to achieve the level of social intelligence required to effectively help humans, an AI agent should acquire two key abilities: i) social perception, i.e., the ability to understand human behavior, and ii) collaborative planning, i.e., the ability to reason about the physical environment and plan its actions to coordinate with humans. In this paper, we are interested in developing AI agents with these two abilities.

Towards this goal, we introduce a new AI challenge, Watch-And-Help (WAH), which focuses on social perception and human-AI collaboration. In this challenge, an AI agent needs to collaborate with a human-like agent to enable it to achieve the goal faster. In particular, we present a 2-stage framework as shown in Figure 1. In the first, Watch stage, an AI agent (Bob) watches a human-like agent (Alice) performing a task once and infers Alice's goal from her actions. In the second, Help stage, Bob helps Alice achieve the same goal in a different environment as quickly as possible (i.e., with the minimum number of environment steps).

This 2-stage framework poses unique challenges for human-AI collaboration. Unlike prior work which provides a common goal a priori or considers a small goal space (Goodrich & Schultz, 2007; Carroll et al., 2019), our AI agent has to reason about what the human-like agent is trying to achieve by watching a single demonstration. Furthermore, the AI agent has to generalize its acquired knowl-



Code and documentation for the VirtualHome-Social environment are available at https://virtual-home.org. Code and data for the WAH challenge are available at https://github.com/xavierpuigf/watch_and_help. A supplementary video can be viewed at https://youtu.be/lrB4K2i8xPI.

