DYNAMIC RELATIONAL INFERENCE IN MULTI-AGENT TRAJECTORIES

Abstract

Unsupervised learning of interactions from multi-agent trajectories has broad applications in physics, vision, and robotics. However, existing neural relational inference methods are limited to static relations. In this paper, we consider the more general setting of dynamic relational inference, where interactions change over time. We propose the DYnamic multi-Agent Relational Inference (DYARI) model, a deep generative model that can reason about dynamic relations. Using a simulated physics system, we study various dynamic relation scenarios, including periodic and additive dynamics. We perform a comprehensive study of the trade-off between the dynamics period and the inference period, and of the impact of the training scheme and model architecture on dynamic relational inference accuracy. We also showcase an application of our model that infers coordination and competition patterns from real-world multi-agent basketball trajectories.

1. INTRODUCTION

Particles, friends, and teams are multi-agent relations at different scales. Learning multi-agent interactions is essential to our understanding of the structures and dynamics underlying many systems. Practical examples include understanding social dynamics among pedestrians (Alahi et al., 2016), learning communication protocols in traffic (Sukhbaatar et al., 2016; Lowe et al., 2017), and predicting physical interactions of particles (Mrowca et al., 2018; Li et al., 2018; Sanchez-Gonzalez et al., 2020). Most existing work on modeling relations assumes the interactions are observed and trains models with supervised learning. For multi-agent trajectories, the interactions are hidden and thus need to be inferred from data in an unsupervised fashion. While one could impose an interaction graph structure (Battaglia et al., 2016), it is difficult to find the correct structure because the search space is very large (Grosse et al., 2012). The search is computationally expensive, and the resulting model can suffer from model misspecification (Koopmans & Reiersol, 1950).

The recently proposed neural relational inference (NRI) model (Kipf et al., 2018) simultaneously learns the dynamics from multi-agent trajectories and infers their relations. In particular, NRI builds on the variational auto-encoder (VAE) (Kingma & Welling, 2013) and introduces latent variables to represent the hidden relations. Despite its flexibility, a major limitation of NRI is that it assumes the relations among the agents are static: two agents are either interacting or not interacting regardless of their states at different time steps, which is rather restrictive. In this paper, we study a more realistic setting: dynamic relational inference. For example, in game plays, players can coordinate and compete dynamically depending on the strategy. We propose a novel deep generative model, which we call DYnamic multi-Agent Relational Inference (DYARI). DYARI encodes trajectory interactions at different time steps, utilizing deep temporal CNNs with pyramid pooling to extract rich representations of the interactions.
DYARI infers the relations for each sub-sequence dynamically and jointly decodes the sequence of relations. As relational inference is unsupervised, we use simulated physics systems with known dynamics as ground truth for validation. We find that the performance of the static NRI model deteriorates significantly with shorter output trajectories, making it unsuitable for dynamic relational inference. In contrast, DYARI is able to accurately infer the hidden relations under various dynamics scenarios. We also perform an extensive ablation study to understand the effects of the inference period, training scheme, and model architecture. Finally, we showcase our DYARI model on real-world basketball trajectories. In summary, our contributions are:
• We tackle the challenging problem of unsupervised learning of hidden dynamic relations from multi-agent trajectories.
• We develop a novel deep generative model, DYARI, that handles time-varying interactions and predicts a sequence of hidden relations in an end-to-end fashion.
• We demonstrate the effectiveness of our method on both simulated physical dynamics and a real-world basketball game-play dataset.
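The latent-relation machinery described above can be illustrated with a minimal sketch: an encoder scores each directed pair of agents from a trajectory window, a Gumbel-softmax relaxation samples a (near-)discrete edge type, and the result is a relaxed relation graph. The correlation-based pairwise "encoder" below is a toy stand-in for a learned GNN or temporal CNN encoder; all function names and the two-type edge scheme are illustrative assumptions, not the DYARI architecture.

```python
import numpy as np

def gumbel_softmax(logits, tau=0.5, rng=np.random.default_rng(0)):
    """Sample relaxed one-hot edge types from unnormalized logits."""
    g = -np.log(-np.log(rng.uniform(size=logits.shape) + 1e-20) + 1e-20)
    y = (logits + g) / tau
    e = np.exp(y - y.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)  # softmax over edge types

def infer_edges(window, n_types=2):
    """Toy 'encoder': score each directed agent pair from a trajectory
    window of shape (T, n_agents, dim). A learned model would replace
    the hand-crafted correlation feature used here."""
    T, n, _ = window.shape
    logits = np.zeros((n, n, n_types))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Correlation of the two agents' motion as a stand-in feature.
            vi = np.diff(window[:, i, 0])
            vj = np.diff(window[:, j, 0])
            c = np.corrcoef(vi, vj)[0, 1]
            logits[i, j] = [1.0 - abs(c), abs(c)]  # [no edge, edge]
    return gumbel_softmax(logits)

rng = np.random.default_rng(1)
traj = rng.normal(size=(20, 3, 2))  # T=20 steps, 3 agents, 2-D states
edges = infer_edges(traj)           # (3, 3, n_types) relaxed edge types
```

In a full VAE, the relaxed `edges` tensor would gate the messages in a graph-network decoder, and encoder and decoder would be trained jointly on a reconstruction objective.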

2. RELATED WORK

Deep sequence models Deep sequence models include both deterministic models (Alahi et al., 2016; Li et al., 2019; Mittal et al., 2020) and stochastic models (Chung et al., 2015; Fraccaro et al., 2016; Krishnan et al., 2017; Rangapuram et al., 2018; Chen et al., 2018; Huang et al., 2018; Yoon et al., 2019). Among GAN-like models, Yoon et al. (2019) combine adversarial training with a supervised learning objective for time series forecasting, and Liu et al. (2019) propose a non-autoregressive model for sequence generation. Compared with GANs, VAE-type models provide explicit inference and are preferable for our purpose. For instance, Chung et al. (2015) introduce stochastic layers in recurrent neural networks to model speech and handwriting. Rangapuram et al. (2018) parameterize a linear state-space model for probabilistic time series forecasting. Chen et al. (2018) and Huang et al. (2018) combine normalizing flows with autoregressive models. However, all of these models capture the temporal latent states of individual sequences rather than their interactions.

Relational inference Graph neural networks (GNNs) learn representations over relational data; see recent surveys and the references therein, e.g., (Wu et al., 2019; Goyal & Ferrara, 2018). Unfortunately, most existing work assumes the graph structure is observed and trains with supervised learning. In contrast, relational inference aims to discover the hidden interactions and is unsupervised. Earlier work in relational reasoning (Koller et al., 2007) uses probabilistic graphical models but requires significant feature engineering. The seminal NRI model (Kipf et al., 2018) uses neural networks to reason about interactions in dynamic physical systems. Alet et al. (2019) reformulate NRI as meta-learning and propose simulated annealing to search for graph structures. Relational inference has also been posed as Granger causal inference for sequences (Louizos et al., 2017; Löwe et al., 2020). Nevertheless, all existing work is limited to static relations, whereas we focus on dynamic relations.

Multi-agent learning Multi-agent trajectories arise frequently in reinforcement learning (RL) and imitation learning (IL) (Albrecht & Stone, 2018; Jaderberg et al., 2019). Modeling agent interactions given dynamic observations from the environment remains a central topic. In the RL setting, for example, Sukhbaatar et al. (2016) model the control policy in a fully cooperative multi-agent setting and apply a GNN to represent the communications. Le et al. (2017) model the agents' coordination as a latent variable for imitation learning. Song et al. (2018) generalize GAIL (Ho & Ermon, 2016) to the multi-agent setting through a shared generator. Grover et al. (2018) directly model episodes of interaction data with graph networks (GNs) for learning multi-agent policies. However, these coordination models only capture global interactions implicitly, without an explicit graph structure. Tacchetti et al. (2019) combine a GNN with a forward dynamics model to capture multi-agent coordination, but also require supervision. Our method instantiates the multi-agent imitation learning framework but focuses on relational inference; our approach is also applicable to dynamics modeling in model-based RL.
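In contrast to the static-relation methods surveyed above, dynamic relational inference must re-estimate the relation graph on every sub-sequence of the trajectory. The windowing loop below sketches that idea; the window length and the thresholded-correlation "inference" step are illustrative stand-ins for a learned per-window encoder, not the method proposed in this paper.

```python
import numpy as np

def dynamic_relations(traj, window=10, thresh=0.5):
    """Slide a non-overlapping window over a (T, n_agents) trajectory and
    return one boolean relation matrix per window; the correlation test is
    a stand-in for a learned encoder."""
    T, n = traj.shape
    graphs = []
    for t0 in range(0, T - window + 1, window):
        seg = np.diff(traj[t0:t0 + window], axis=0)  # per-step motion
        c = np.corrcoef(seg.T)                       # (n, n) pairwise correlation
        g = np.abs(c) > thresh
        np.fill_diagonal(g, False)                   # no self-interaction
        graphs.append(g)
    return graphs                                    # one graph per sub-sequence

# Two agents coupled in the first half, independent in the second.
rng = np.random.default_rng(0)
a = rng.normal(size=40).cumsum()
b = np.concatenate([a[:20] + 0.1 * rng.normal(size=20),
                    rng.normal(size=20).cumsum()])
graphs = dynamic_relations(np.stack([a, b], axis=1), window=20)
```

A static model would be forced to summarize both windows with a single graph; the per-window output above is what makes time-varying interactions, such as the coupled-then-independent pair in the example, recoverable at all.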

