GROUNDING GRAPH NETWORK SIMULATORS USING PHYSICAL SENSOR OBSERVATIONS

Abstract

Physical simulations that accurately model reality are crucial for many engineering disciplines such as mechanical engineering and robotic motion planning. In recent years, learned Graph Network Simulators have produced accurate mesh-based simulations while requiring only a fraction of the computational cost of traditional simulators. Yet, the resulting predictors are confined to learning from data generated by existing mesh-based simulators and thus cannot include real-world sensory information such as point cloud data. As these predictors have to simulate complex physical systems from only an initial state, they accumulate large errors in long-term predictions. In this work, we integrate sensory information to ground Graph Network Simulators in real-world observations. In particular, we predict the mesh state of deformable objects by utilizing point cloud data. The resulting model allows for accurate predictions over longer time horizons, even under uncertainties in the simulation, such as unknown material properties. Since point clouds are usually not available for every time step, especially in online settings, we employ an imputation-based model. The model can make use of such additional information when it is provided, and otherwise falls back to a standard Graph Network Simulator. We experimentally validate our approach on a suite of prediction tasks for mesh-based interactions between soft and rigid bodies. Our method utilizes the additional point cloud information to accurately predict stable simulations where existing Graph Network Simulators fail.

1. INTRODUCTION

Mesh-based simulation of complex physical systems lies at the heart of many fields in numerical science and engineering (Liu et al., 2022; Reddy, 2019; Rao, 2017; Sabat & Kundu, 2021). Applications include structural mechanics (Zienkiewicz & Taylor, 2005; Stanova et al., 2015), electromagnetics (Jin, 2015; Xiao et al., 2022; Coggon, 1971), fluid dynamics (Chung, 1978; Zawawi et al., 2018; Long et al., 2021) and biomedical engineering (Van Staden et al., 2006; Soro et al., 2018), most of which traditionally depend on highly specialized task-dependent simulators. Recent advancements in deep learning gave rise to more general learned dynamics models such as Graph Network Simulators (GNSs) (Sanchez-Gonzalez et al., 2018; 2020; Pfaff et al., 2021). GNSs learn to predict the dynamics of a system from data by encoding the system state as a graph and then iteratively computing the dynamics for every node in the graph with a Graph Neural Network (GNN) (Scarselli et al., 2009; Battaglia et al., 2018; Wu et al., 2020b). Recent extensions include long-term fluid flow predictions (Han et al., 2022) and dynamics on different scales (Fortunato et al., 2022). Yet, these approaches assume full knowledge of the initial system state, making them ill-suited for applications like model-predictive control (Camacho & Alba, 2013; Schwenzer et al., 2021) and model-based Reinforcement Learning (Polydoros & Nalpantidis, 2017; Moerland et al., 2020), where accurate predictions must be made based on partial initial states and observations.

Figure 1: A robot's end-effector (grey, red) grasps a 3-dimensional deformable cavity. The robot maintains an internal simulated prediction of the cavity (orange) for two consecutive simulation steps (left, right). This prediction can deviate from the true state of the cavity over time due to an accumulation of error. However, the true cavity state can infrequently be observed from point cloud data (blue), which the model can use to correct its prediction. Here, the point cloud is used to contract the simulated cavity at the bottom and extend it at the top, causing the points to better align with the mesh surface. The point cloud from the earlier simulation step is repeated in both images for clarity.

In this work, we present Grounding Graph Network Simulators (GGNSs), a new class of GNS that can process sensory information as input to ground predictions in observations of the scene. More precisely, we extend the graph of the current system state with point cloud data before predicting the system dynamics from it. Since point clouds do not provide correspondences over time, it is difficult to learn dynamics from point clouds alone. Thus, we use mesh-based data to learn the general system dynamics and utilize point clouds to correct the predictions. As the sensory data is not always available, particularly not for future predictions, our architecture is trained with imputed point clouds, i.e., for each time step the model receives a point cloud only with a certain probability. This training scheme allows the model to efficiently integrate the additional information whenever it is provided. During inference, the model iteratively predicts the next system state, using point clouds whenever available to greatly improve the simulation quality, especially for simulations with incomplete initial state information. Furthermore, our architecture addresses a critical research topic for GNSs by alleviating common challenges such as drift and error accumulation during long-term predictions.
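To make this mechanism concrete, the following is a minimal PyTorch sketch of a grounded prediction step and rollout. It is an illustration under simplifying assumptions, not the authors' implementation (see the released code linked below for that): the names GroundedGNS, MessagePassingStep and make_pcd_edges are hypothetical, the encoders and message functions are reduced to small MLPs, and point-cloud nodes are attached to the mesh graph via precomputed proximity edges.

```python
import torch
import torch.nn as nn

class MessagePassingStep(nn.Module):
    """One GNN block: compute per-edge messages, aggregate them at the
    receiving nodes, then update node latents with a residual MLP."""
    def __init__(self, dim: int):
        super().__init__()
        self.msg = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
        self.upd = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())

    def forward(self, x: torch.Tensor, edges: torch.Tensor) -> torch.Tensor:
        src, dst = edges                                   # each of shape (E,)
        m = self.msg(torch.cat([x[src], x[dst]], dim=-1))  # per-edge messages
        agg = torch.zeros_like(x).index_add_(0, dst, m)    # sum at receivers
        return x + self.upd(torch.cat([x, agg], dim=-1))   # residual update

class GroundedGNS(nn.Module):
    """A GNS-style predictor whose input graph can optionally be extended
    with point-cloud nodes that ground the prediction in observations."""
    def __init__(self, dim: int = 64, num_steps: int = 5):
        super().__init__()
        self.enc_mesh = nn.Linear(3, dim)  # mesh node positions -> latents
        self.enc_pcd = nn.Linear(3, dim)   # observed points -> latents
        self.blocks = nn.ModuleList(MessagePassingStep(dim) for _ in range(num_steps))
        self.dec = nn.Linear(dim, 3)       # latents -> predicted displacement

    def forward(self, mesh_pos, mesh_edges, pcd=None, pcd_edges=None):
        x = self.enc_mesh(mesh_pos)
        edges = mesh_edges                 # shape (2, E): row 0 = src, row 1 = dst
        if pcd is not None:
            # Extend the graph: point-cloud nodes are appended after the mesh
            # nodes, so pcd_edges must index them offset by len(mesh_pos).
            x = torch.cat([x, self.enc_pcd(pcd)], dim=0)
            edges = torch.cat([mesh_edges, pcd_edges], dim=1)
        for block in self.blocks:
            x = block(x, edges)
        n = mesh_pos.shape[0]
        return mesh_pos + self.dec(x[:n])  # next positions of mesh nodes only

def rollout(model, mesh_pos, mesh_edges, observations, make_pcd_edges):
    """Iterative prediction that grounds on a point cloud whenever one is
    available; observations[t] is None for unobserved time steps."""
    trajectory = [mesh_pos]
    for pcd in observations:
        pcd_edges = make_pcd_edges(trajectory[-1], pcd) if pcd is not None else None
        trajectory.append(model(trajectory[-1], mesh_edges, pcd, pcd_edges))
    return trajectory
```

Under this sketch, the imputation-based training described above would amount to sampling pcd = None with some probability at each training step, so that the same network learns to predict both with and without sensory input.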
As a practical example, consider a robot grasping a deformable object. For optimal planning of the grasp, the robot needs to model the state of the deformable object over time and predict the influence of interactions between object and gripper. This prediction not only depends on the initial shape of the object, but also on the forces the robot applies, the kind of material to grasp and external factors such as the temperature, making it difficult to accurately predict how the material will deform over time. However, once the robot starts deforming the object, it may easily observe the deformations in the form of, e.g., point clouds. These observations can then be integrated into the state prediction, i.e., they can ground the simulation whenever new information becomes available. An example is given in Figure 1. Such observation-aided prediction is similar in nature to, e.g., Kalman Filters (Kalman, 1960; Jazwinski, 1970; Becker et al., 2019), as the belief of the system state is updated based on partial observations about the system. However, while Kalman Filters explicitly integrate novel information into the belief in a mathematical fashion, we instead simply provide this information to a learned model as additional unstructured sensor input.

We evaluate GGNS on a suite of 2D and 3D deformation prediction tasks created in the Simulation Open Framework Architecture (SOFA) (Faure et al., 2012). Comparing our approach to an existing GNS (Pfaff et al., 2021), we find that adding sensory information in the form of point clouds to our model improves the simulation quality for all tasks. We investigate this behavior through extensive ablation studies, showing the importance of different parameter choices and design decisions. Code and data can be found at https://github.com/jlinki/GGNS. Our contributions are as follows: (I) We extend the GNS framework to include sensory information to ground predicted simulations in observations of the system state, allowing for accurate predictions over longer time horizons.

