LOCAL INFORMATION OPPONENT MODELLING USING VARIATIONAL AUTOENCODERS

Abstract

Modelling the behaviours of other agents (opponents) is essential for understanding how agents interact and for making effective decisions. Existing methods for opponent modelling commonly assume knowledge of the local observations and chosen actions of the modelled opponents, which can significantly limit their applicability. We propose a new modelling technique based on variational autoencoders, which are trained to reconstruct the local actions and observations of the opponent from embeddings that depend only on the local observations of the modelling agent (its observed world state, chosen actions, and received rewards). The embeddings are used to augment the modelling agent's decision policy, which is trained via deep reinforcement learning; thus the policy does not require access to opponent observations. We provide a comprehensive evaluation and ablation study in diverse multi-agent tasks, showing that our method achieves performance comparable to an ideal baseline with full access to the opponents' information, and significantly higher returns than a baseline that does not use the learned embeddings.

1. INTRODUCTION

An important aspect of autonomous decision-making agents is the ability to reason about the unknown intentions and behaviours of other agents. Much research has been devoted to this opponent modelling problem [2], with recent works focused on the use of deep learning architectures for opponent modelling and reinforcement learning (RL) [20, 34, 16, 33]. A common assumption in existing methods is that the modelling agent has access to the local trajectory of the modelled agents [2], which may include their local observations of the environment state, their past actions, and possibly their received rewards. While it is certainly desirable to observe an agent's local context in order to reason about its past and future decisions, in practice such an assumption may be too restrictive. Agents may have only a limited view of their surroundings, communication with other agents may not be feasible or reliable [40], and knowledge of the perception systems of other agents may not be available [13]. In such cases, an agent must reason with only locally available information.

We consider the question: Can effective opponent modelling be achieved using only the locally available information of the modelling agent during execution? A strength of deep learning techniques is their ability to identify informative features in data. Here, we use deep learning to extract informative features from a stream of local observations for the purpose of opponent modelling. Specifically, we consider multi-agent settings in which we control a single agent that must learn to interact with a set of opponent agents (we use the term "opponent" in a neutral sense). We assume a given set of possible policies for the opponent agents and that these policies are fixed (that is, the other agents do not learn simultaneously, as in multi-agent RL [32]).
We propose an opponent modelling method which extracts a compact yet informative representation of opponents given only the local information of the controlled agent, comprising its local state observations, past actions, and rewards. To this end, we use an encoder-decoder architecture based on variational autoencoders (VAE) [26]. The VAE is trained to reconstruct the opponent's actions and observations from the local information only. During training, the opponent's observations are used as reconstruction targets for the decoder; after training, only the encoder component is retained, which generates embeddings from the local observations of the controlled agent. The learned embeddings condition the policy of the controlled agent, which is trained via deep reinforcement learning.
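To make the encoder-decoder idea concrete, the following is a minimal, illustrative numpy sketch of a single VAE forward pass and ELBO-style loss. It is not the paper's implementation: it uses random untrained weights, hypothetical dimensions, and a one-step MLP encoder in place of a recurrent encoder over the agent's trajectory, and all names (encode, decode, elbo_loss, LOCAL_DIM, OPP_DIM) are invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
LOCAL_DIM = 10  # controlled agent's observation + one-hot action + reward
OPP_DIM = 8     # flattened opponent observation/action reconstruction target
EMB_DIM = 4     # size of the opponent embedding
HID_DIM = 32    # hidden layer width

# Random, untrained weights standing in for learned parameters.
W_h = rng.normal(0.0, 0.1, (LOCAL_DIM, HID_DIM))
W_mu = rng.normal(0.0, 0.1, (HID_DIM, EMB_DIM))
W_lv = rng.normal(0.0, 0.1, (HID_DIM, EMB_DIM))
W_dec = rng.normal(0.0, 0.1, (EMB_DIM, OPP_DIM))

def encode(local):
    """Map the controlled agent's local inputs to embedding mean/log-variance."""
    h = np.tanh(local @ W_h)
    return h @ W_mu, h @ W_lv

def reparameterise(mu, logvar):
    """Sample an embedding z ~ N(mu, sigma^2) via the reparameterisation trick."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    """Reconstruct the opponent's observations/actions from the embedding."""
    return z @ W_dec

def elbo_loss(local, opp_target):
    """Reconstruction error plus a KL term against a standard-normal prior."""
    mu, logvar = encode(local)
    z = reparameterise(mu, logvar)
    recon_err = np.mean((decode(z) - opp_target) ** 2)
    kl = -0.5 * np.mean(1.0 + logvar - mu ** 2 - np.exp(logvar))
    return recon_err + kl, z

# One forward pass on a dummy batch of five timesteps.
local = rng.standard_normal((5, LOCAL_DIM))
opp_target = rng.standard_normal((5, OPP_DIM))
loss, z = elbo_loss(local, opp_target)
```

In a full implementation the encoder would be recurrent over the episode, the decoder heads would output opponent action probabilities and observation reconstructions separately, and the parameters would be trained by gradient descent on the loss; at execution time only encode would be called, so no opponent information is required.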

