COMPUTATIONAL LANGUAGE ACQUISITION WITH THEORY OF MIND

Abstract

Unlike current state-of-the-art language models, young children actively acquire language through interactions with their surrounding environment and caretakers. One mechanism that has been argued to be critical to language learning is the ability to infer the mental states of other agents in social environments, a capacity Premack & Woodruff (1978) termed Theory of Mind (ToM). Drawing inspiration from the modern operationalized versions of ToM implemented in Rabinowitz et al. (2018) and Zhu et al. (2021), we build language-learning agents equipped with ToM and measure its effects on the learning process. We model ToM by giving the speaker agent an internal listener model that is trained alongside the speaker and used to rerank potential utterances. We also experiment with varying task difficulty, hypothesizing that models will acquire more complex language to adapt to stronger environmental pressures. We find that training speakers with a highly weighted ToM listener component leads to performance gains in our image referential game setting. We also find some evidence that increasing task difficulty during training results in more fluent and precise utterances at evaluation time. This suggests the potential utility of further incorporating ToM, as well as other insights from child language acquisition, into computational models of language acquisition.¹

1. INTRODUCTION

Human languages are fundamentally shaped by social-communicative goals in the grounded world. Modern theories from developmental psychology often attribute humans' unique ability to quickly acquire and adapt language to their ability to ascribe mental states to other agents (Tomasello, 2005), an ability also known as Theory of Mind (ToM). Some previous studies have attempted to model ToM computationally. For instance, ToM-like mechanisms have been shown to allow models to better predict the behavior of another agent (Rabinowitz et al., 2018), model agents' beliefs in a negotiation (Cao et al., 2018) or a cooperative game (Bard et al., 2020), or choose good utterances based on the listener's linguistic abilities (Zhu et al., 2021). However, the effects of ToM have not yet been studied in the higher-level context of computational language acquisition.

In this paper, we study how an internal ToM mechanism and external environmental pressure contribute to language learning. We use an image referential game setting consisting of a series of training episodes between a speaker, which represents a language learner (Zhu et al., 2022), and a listener, which represents a fluent teacher. When presented with a set of images, one of which is the target referent, the speaker must learn to generate an English utterance that the listener can use to select the target. The speaker is rewarded for generating utterances that lead the listener to correctly guess the target image. Additionally, the speaker may receive feedback reflecting the listener's confidence in its selection. This setting provides an attractive test bed for studying the effects of various reward signals and model designs on the speaker's learned language; previous studies of pragmatics in language acquisition, such as Andreas & Klein (2016), have used similar settings.
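To make the reranking mechanism concrete, the following is a minimal sketch of how a speaker might combine its own generation score with an internal listener model's probability of resolving the target. All function names, scores, and the interpolation scheme here are illustrative assumptions, not the paper's actual architecture or training procedure.

```python
# Toy sketch of ToM-based utterance reranking in a referential game.
# The scoring functions below are hypothetical placeholders standing in
# for a trained speaker (caption generator) and internal listener model.

def rerank_with_tom(candidates, speaker_score, listener_score, tom_weight):
    """Pick the candidate utterance maximizing an interpolation of the
    speaker's own score and the internal listener's probability of
    selecting the target image given that utterance."""
    def combined(utterance):
        return ((1 - tom_weight) * speaker_score(utterance)
                + tom_weight * listener_score(utterance))
    return max(candidates, key=combined)

# Two candidate utterances for a hypothetical target image:
candidates = ["a dog", "a small brown dog on grass"]
# Speaker prefers the shorter, higher-likelihood caption...
speaker_score = {"a dog": 0.9, "a small brown dog on grass": 0.6}.get
# ...but the listener can only resolve the more discriminative one.
listener_score = {"a dog": 0.3, "a small brown dog on grass": 0.95}.get

print(rerank_with_tom(candidates, speaker_score, listener_score, tom_weight=0.0))
# -> "a dog" (no ToM: speaker score alone decides)
print(rerank_with_tom(candidates, speaker_score, listener_score, tom_weight=0.8))
# -> "a small brown dog on grass" (heavily weighted ToM listener wins)
```

This illustrates why a highly weighted listener component can push the speaker toward utterances that are discriminative for the listener rather than merely fluent.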



¹Code and data can be found at https://github.com/neulab/ToM-Language-Acquisition.

