AGENT PRIORITIZATION WITH INTERPRETABLE RELATION FOR TRAJECTORY PREDICTION

Abstract

In this paper, we present a novel multi-agent trajectory prediction model that discovers interpretable relations among agents and prioritizes the agents' motions. Unlike existing approaches, our interpretable design is inspired by the fundamental navigation and motion functions of agent movements, which represent 'where' and 'how' the agents move in the scene. Specifically, the model generates a relation matrix in which each element indicates the motion impact of one agent on another. In addition, in highly interactive scenarios, one agent may implicitly gain higher priority to move, while the motion of other agents may be affected by these prioritized agents (e.g., a vehicle stopping or reducing its speed due to crossing pedestrians). Based on this intuition, we design a novel motion prioritization module that learns the agents' motion priorities from the inferred relation matrix. A decoder then sequentially predicts and iteratively updates the future trajectory of each agent based on the priority order and the learned relation structures. We first demonstrate the effectiveness of our prediction model on the simulated Charged Particles dataset (Kipf et al., 2018). Next, we perform extensive evaluations on commonly used datasets for robot navigation, human-robot interaction, and autonomous agents: real-world NBA basketball (Yue et al., 2014) and INTERACTION (Zhan et al., 2019). Finally, we show that the proposed model outperforms other state-of-the-art relation-based methods and is capable of inferring interpretable, meaningful relations among agents.



Figure 1: Different from the common paradigm of inferring relations for trajectory prediction, our approach aims to learn interpretable relations, prioritize agent motions, and make in-order predictions based on their priorities.

1. INTRODUCTION

Multi-agent trajectory prediction is an essential component in a wide range of applications, from robot navigation to autonomous intelligent systems. While navigating crowded scenes, autonomous agents (e.g., robots and vehicles) not only interact with others themselves but should also be able to observe others' interactions and anticipate where other agents will move in the near future. This ability is crucial for autonomous agents to avoid collisions and plan meaningful machine-human and machine-machine interactions. Designing a robust and accurate trajectory prediction model has therefore attracted considerable recent research effort. Meaningful reasoning about interactions among agents provides valuable cues for improving trajectory prediction accuracy, especially in highly interactive scenarios. However, how to learn or discover meaningful relations among agents from historical motion data in order to improve prediction accuracy remains a challenging task.

A large body of recent research focuses on modeling interactions among agents for future trajectory prediction. Notable works in this field learn interaction features using advanced deep learning techniques such as graph neural networks (Scarselli et al., 2008), social pooling mechanisms (Alahi et al., 2016; Gupta et al., 2018), or attention networks (Kamra et al., 2020). These works follow a common paradigm, shown in Figure 1a, that infers the relations among agents from their historical motions. The main limitation of these approaches is that they lack a mechanism to learn the motion importance of each agent in the scene.
In realistic interactive scenarios, agent movements often imply that one agent has gained higher priority to decide where and when to move, and that its movements will affect the others in the scene. For example, in driving scenarios, vehicles yield to crossing pedestrians (the higher-priority agents). In a basketball game, the other players' movements are likely to be conditioned on (i.e., impacted by) the ball-controlling offensive player.

To address this limitation, we propose a new approach, shown in Figure 1b, that prioritizes each agent's motion based on the agents' interpretable relations. Our model first learns the interactions among agents at each time step. Inspired by the relation learning method of Fujii et al. (2021) for animal movements, we design an inter-agent encoder that consists of two sub-encoders, which represent the innate navigation and movement capacities of agents, respectively. While the navigation encoder captures the agent relations based on moving directions (i.e., where to move), the motion encoder infers the relations based on motion capacity (i.e., how to move). Next, the prioritization module quantifies the importance score (i.e., priority) of each agent by measuring its motion impact on other agents. Predictions are then made sequentially in priority order, so that the predicted future trajectories of higher-priority agents can influence those of lower-priority agents within their relation structures.

In summary, our contributions are:
• We propose a novel prediction pipeline with a motion prioritization module that prioritizes each agent based on its motion impact on other agents within their interpretable relation structures.
• We design an interpretable interaction encoder to capture the agent relations from both navigational and motion perspectives. The relationships among agents are learned interpretably at each observed time step to produce meaningful relation structures for the prioritization task.
• We evaluate our prediction model on several highly interactive datasets: Charged Particles, NBA, and INTERACTION. We show that the proposed model is able to learn meaningful interaction features and outperforms state-of-the-art models on these datasets.
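To make the prioritize-then-predict idea concrete, the following is a minimal sketch rather than the proposed model. It assumes a relation matrix `relation` in which `relation[i][j]` is the impact of agent i on agent j, takes an agent's priority to be its total outgoing impact, and replaces the learned decoder with a toy one-step rule (a constant-velocity guess pushed away from already-predicted agents). The function names, the `strength` parameter, and the repulsion rule are all hypothetical and serve only as illustration.

```python
def priority_scores(relation):
    """Sum each agent's outgoing impact on all other agents.

    Assumption: relation[i][j] is the impact of agent i on agent j.
    """
    n = len(relation)
    return [sum(relation[i][j] for j in range(n) if j != i) for i in range(n)]


def priority_order(relation):
    """Return agent indices sorted from highest to lowest priority."""
    scores = priority_scores(relation)
    # sorted() is stable, so tied agents keep ascending index order.
    return sorted(range(len(scores)), key=lambda i: -scores[i])


def sequential_predict(positions, velocities, relation, strength=0.5):
    """Toy one-step decoder that predicts agents in priority order.

    Each agent starts from a constant-velocity guess and is then pushed
    away from every already-predicted (higher-priority) agent, weighted by
    that agent's impact on it. The repulsion rule is a placeholder for a
    learned decoder, not the method described in the paper.
    """
    predicted = {}
    for i in priority_order(relation):
        x = positions[i][0] + velocities[i][0]
        y = positions[i][1] + velocities[i][1]
        for j, (px, py) in predicted.items():
            w = strength * relation[j][i]
            x += w * (x - px)
            y += w * (y - py)
        predicted[i] = (x, y)
    # Return predictions in the original agent order.
    return [predicted[i] for i in range(len(positions))]


# Three agents; agent 1 has the largest total impact on the others.
relation = [
    [0.0, 0.2, 0.1],
    [0.8, 0.0, 0.7],
    [0.3, 0.1, 0.0],
]
order = priority_order(relation)  # [1, 2, 0]
```

In this toy setting, agent 1 is predicted first, and its predicted position then shifts the predictions of agents 2 and 0. This mirrors the in-order scheme of Figure 1b only at the level of control flow.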

2. RELATED WORKS

Multi-Agent Trajectory Prediction. Multi-agent trajectory prediction is an actively researched problem due to its broad applications in robot planning (Schmerling et al., 2018), traffic prediction (Liao et al., 2018), and sports video analysis (Felsen et al., 2017). Recent research has focused on modeling the relations among agents in order to improve trajectory prediction. In general, existing approaches employ common structures such as graph neural networks, social-GAN (Mohamed et al., 2020), attention networks, and transformers to learn agent interactions from their motions. Notably, Kosaraju et al. (2019) rely on a graph attention network (Veličković et al., 2017) to decide how much information to share between agents. Kamra et al. (2020) developed a dedicated attention mechanism for trajectory prediction based on inductive biases of motion and intent. Kipf et al. (2018) proposed neural relational inference (NRI), which takes the form of a variational auto-encoder. Li et al. (2020) improved NRI by using graphs that evolve over time. Although these approaches can capture some interactions among agents, they lack a mechanism to reason about agent priorities, which are important cues for improving trajectory prediction.

Relation Discovery. Another closely related research theme focuses on discovering Granger-causal (GC) relationships among agents. SENN (Alvarez Melis & Jaakkola, 2018) is the first self-explanatory network, a class of intrinsically interpretable models, which explains the contributions of concepts (i.e., raw inputs) to predictions. SENN was applied to infer GC relationships via the generalized vector autoregression model (GVAR) (Marcinkevičs & Vogt, 2021), which captures the GC relationships via coefficient matrices. These models have shown promising performances to learn

