MANIPULATING MULTI-AGENT NAVIGATION TASK VIA EMERGENT COMMUNICATIONS

Abstract

Multi-agent cooperation struggles to efficiently sustain grounded communication toward a specific task goal. Existing approaches are limited to simple task settings and single-turn communication. This work describes a multi-agent communication scenario via emergent language in a navigation task. The task involves two agents with unequal abilities: the tourist (agent A), who can only observe its surroundings, and the guide (agent B), who has a holistic view but does not know the initial position of agent A. They communicate via an emergent language grounded in the environment and a common task goal: helping the tourist find the target place. We release a new dataset of 3000 scenarios involving such visual and language navigation. We also address multi-agent emergent communication by proposing a collaborative learning framework that enables the agents to generate and understand emergent language and to solve the task. The framework is trained end-to-end with reinforcement learning by maximizing the task success rate. Results show that the proposed framework achieves competitive performance in both language-understanding accuracy and task success rate. We also discuss interpretations of the emergent language.

1. INTRODUCTION

Communication is a crucial factor for multiple agents to cooperate. While most recent work focuses on interactions between artificial agents and humans, some research effort has gone into communication between artificial agents. However, most of these works concentrate on single-turn communication with a unidirectional message pass, on the evolution of natural language, or on specific properties of the emergent language such as compositionality and interpretability. Multi-turn conversation, however, is more analogous to human language: in a natural conversation, language generation and understanding should be mutual rather than unidirectional. We therefore provide a framework in which two agents generate multi-turn dialogues. To demonstrate its feasibility, we propose a new task adapted from the vision-language navigation (VLN) task in the human-machine communication area, where the guide communicates with the tourist to give guidance and help it find the target location. Unlike traditional VLN tasks, in our setting the tourist (agent A) and the guide (agent B) are both machines, rather than the original human-machine pairing. Moreover, we assume the guide does not know the tourist's initial position, so the guide must not only give guidance but also first localize the tourist.

Our contributions can be summarized as follows:

1. From the perspective of emergent language, we study language with multiple turns. To provide a suitable scenario for multi-turn conversation, we introduce a navigation task adapted from the vision-language navigation (VLN) task.

2. Compared with methods in which agents speak natural language, ours is cheaper, requiring no expensive annotation, which makes it a more practical way for agents to communicate.

3. To the best of our knowledge, we are the first to propose a VLN-like task in a two-agent cooperation scenario. We also provide a benchmark for it, which offers a possible solution to this kind of task.
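The interaction protocol described above can be illustrated with the following toy loop. This is a minimal sketch, not the paper's actual implementation: all names (NavEnv, Tourist, Guide, run_episode), the 1-D world, the vocabulary size, and the random policy stubs are illustrative assumptions standing in for the learned policies.

```python
import random

VOCAB_SIZE = 10   # size of the discrete emergent-language vocabulary (assumed)
MAX_TURNS = 5     # dialogue length budget per episode (assumed)

class NavEnv:
    """Toy 1-D world: the tourist starts at a random cell and must reach cell 0."""
    def __init__(self, size=4):
        self.pos = random.randrange(size)
    def observe(self):
        return self.pos % 2   # tourist's impoverished local view
    def full_map(self):
        # Guide's holistic view. In the paper's setting the guide does NOT see
        # the tourist's position; here we return it only to keep the toy runnable.
        return self.pos
    def step(self, action):
        self.pos += {"left": -1, "right": 1, "stay": 0}[action]
    def at_target(self):
        return self.pos == 0

class Tourist:
    """Agent A: observes only its surroundings, speaks and moves."""
    def speak(self, observation, last_message):
        return random.randrange(VOCAB_SIZE)   # random stub for a learned speaker
    def act(self, observation, last_message):
        return random.choice(["left", "right", "stay"])

class Guide:
    """Agent B: holistic view, replies with guidance symbols."""
    def speak(self, map_view, last_message):
        return random.randrange(VOCAB_SIZE)   # random stub for a learned speaker

def run_episode(env, tourist, guide):
    msg_b = None
    for _ in range(MAX_TURNS):
        obs = env.observe()
        msg_a = tourist.speak(obs, msg_b)            # tourist describes its view
        msg_b = guide.speak(env.full_map(), msg_a)   # guide answers with guidance
        env.step(tourist.act(obs, msg_b))            # tourist moves
        if env.at_target():
            return True   # shared success reward for both agents
    return False
```

In the actual framework, the two `speak` policies and the tourist's `act` policy would be neural networks trained jointly with reinforcement learning on the binary episode outcome returned by `run_episode`.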

