INTERACTIVE SEQUENTIAL GENERATIVE MODELS

Abstract

Understanding spatiotemporal relationships among several agents is of considerable relevance for many domains. Team sports represent a particularly interesting real-world proving ground since modeling interacting athletes requires capturing highly dynamic and complex agent-agent dependencies in addition to temporal components. However, existing generative methods in this field either entangle all latent factors into a single variable and are thus constrained in practical applicability, or they focus on uncovering interaction structures, which restricts their generative ability. To address this gap, we propose a framework for multiagent trajectories that augments graph-structured sequential generative models with explicit latent social dependencies. First, we derive a novel objective within the variational autoencoder family using a disentangled latent space that aims to encapsulate inherent data traits. Based on the proposed training criterion, we then present a model architecture that unifies insights from neural interaction inference and graph-structured variational recurrent neural networks for generating collective movements while allocating latent information. We validate our model on data from professional soccer and basketball. Our framework not only improves upon existing state-of-the-art approaches in forecasting trajectories, but also infers semantically meaningful representations that can be used in downstream tasks.

1. INTRODUCTION

The study of agent behavior governed by temporal and spatial dependencies is of great importance in many different fields, such as autonomous driving (Brown et al., 2020; Rasouli & Tsotsos, 2019), robot navigation (Rudenko et al., 2020), or sports analytics (Tuyls et al., 2021). In particular, accurate detection of implicit causal social structures offers several advantages by removing confounding factors for trajectory forecasting tasks and providing practitioners with interpretable dynamics that can in turn be integrated into downstream decision-making processes or applications. Modeling the dynamics of multiplayer sports games (Omidshafiei et al., 2022; Le et al., 2017; Yue et al., 2014; Liu et al., 2020) is particularly challenging since accurate trajectory generation in this environment requires capturing highly dynamic and complex underlying modular structures (Makansi et al., 2022). For example, the roles prescribed in a team formation are a poor indicator of the actual behavior observed in a given situation. Moreover, most of the interacting elements inject noise into the forecasting process because either they are irrelevant (e.g., goalkeepers) or their influence changes as the situation evolves. However, existing methods for modeling sports data rely on graph encoding strategies (Kipf & Welling, 2016; Vaswani et al., 2017) that aggregate social information into single variables that must capture all latent stochasticity (Zhan et al., 2019; Yeh et al., 2019; Sun et al., 2019; Omidshafiei et al., 2022). In recent years, a considerable number of methods have been proposed that aim to infer interactive components in general multiagent systems via discrete latent variables. These methods are usually formulated as some form of variational autoencoder (Kingma & Welling, 2013; Sohn et al., 2015) that learns latent edge categories of an assumed underlying graph structure (Kipf et al., 2018; Graber & Schwing, 2020; Löwe et al., 2022).
However, because interaction categories are the only causal factors they specify, these frameworks neglect other latent characteristics that do not originate in interaction categories but equally affect multimodal agent behavior, which limits their generative capacity. To address these shortcomings, we propose a novel framework for modeling multiagent trajectory data that enhances existing graph-structured latent variable models by explicitly encoding social

