LEARNING RIGID DYNAMICS WITH FACE INTERACTION GRAPH NETWORKS

Abstract

Simulating rigid collisions among arbitrary shapes is notoriously difficult due to complex geometry and the strong non-linearity of the interactions. While graph neural network (GNN)-based models are effective at learning to simulate complex physical dynamics, such as fluids, cloth and articulated bodies, they have been less effective and efficient on rigid-body physics, except with very simple shapes. Existing methods that model collisions through the meshes' nodes are often inaccurate because they struggle when collisions occur on faces far from nodes. Alternative approaches that represent the geometry densely with many particles are prohibitively expensive for complex shapes. Here we introduce the "Face Interaction Graph Network" (FIGNet), which extends beyond GNN-based methods and computes interactions between mesh faces rather than nodes. Compared to learned node- and particle-based methods, FIGNet is around 4x more accurate in simulating complex shape interactions, while also being 8x more computationally efficient on sparse, rigid meshes. Moreover, FIGNet can learn frictional dynamics directly from real-world data, and can be more accurate than analytical solvers given modest amounts of training data. FIGNet represents a key step forward in one of the few remaining physical domains which have seen little competition from learned simulators, and offers allied fields such as robotics, graphics and mechanical design a new tool for simulation and model-based planning.

1. INTRODUCTION

Simulating rigid bodies accurately is vital in a wide variety of disciplines, from robotics to graphics to mechanical design. While popular general-purpose tools like Bullet (Coumans, 2015), MuJoCo (Todorov et al., 2012) and Drake (Tedrake, 2019) can generate plausible predictions, producing predictions that accurately match real-world observations is notoriously difficult (Wieber et al., 2016; Anitescu & Potra, 1997; Stewart & Trinkle, 1996; Fazeli et al., 2017; Lan et al., 2022). Numerical approximations necessary for efficiency are often inaccurate and unstable. Collision, contact and friction are challenging to model accurately, and their parameters are hard to estimate. The dynamics are non-smooth and nearly discontinuous (Pfrommer et al., 2020; Parmar et al., 2021), and influenced heavily by the fine-grained structure of colliding objects' surfaces (Bauza & Rodriguez, 2017). Slight errors in the physical model or state estimates can thus lead to large errors in objects' predicted trajectories. This underpins the well-known sim-to-real gap between results from analytical solvers and real-world experiments.

Learned simulators can potentially fill the sim-to-real gap. They can be trained to correct imperfect state estimation, and can learn physical dynamics directly from observations, potentially producing more accurate predictions than analytical solvers (Allen et al., 2022; Kloss et al., 2022). Graph neural network (GNN)-based models, in particular, are effective at simulating liquids, sand, soft materials and simple rigids (Sanchez-Gonzalez et al., 2020; Mrowca et al., 2018; Li et al., 2019b; Pfaff et al., 2021; Li et al., 2019a).

Many GNN-based models are node-based: they detect and resolve potential collisions based on whether two mesh nodes or particles are within a local neighborhood. However, collisions between objects do not happen only at nodes. For example, two cubes may collide by one corner hitting the other's face, or one edge hitting the other's edge (in fact, corner-to-corner collisions are vanishingly rare). For larger meshes, fewer collisions occur within the local neighborhood around nodes, and thus collisions may be missed. Some prior work therefore restricts collisions to simple scenarios (e.g. a single object colliding with a floor (Allen et al., 2022; Pfrommer et al., 2020)). Alternative approaches represent the object densely with particle nodes (Li et al., 2019a; Sanchez-Gonzalez et al., 2020), but this leads to a quadratic increase in node-node collision tests, which is computationally prohibitive for nontrivial scenes.

Here we introduce a novel mesh-based approach to collision handling, Face Interaction Graph Networks (FIGNet), which extends message passing from graphs with directed edges between nodes to graphs with directed hyper-edges between faces. This allows FIGNet to compute interactions between mesh faces (whose representations are informed by their associated nodes) instead of between nodes directly, enabling accurate and efficient collision handling on sparse meshes without missing collisions. Relative to prior node-based models, we show that for simulated multi-rigid interaction datasets like Kubric (Greff et al., 2022), FIGNet is 8x more efficient at modeling interactions between sparse, simple rigid meshes, and around 4x more accurate in translation and rotation error when predicting the dynamics of more complex shapes with hundreds or thousands of nodes. We additionally show that FIGNet even outperforms analytical solvers on challenging real-world robotic pushing experiments (Yu et al., 2016). To our knowledge, FIGNet is the first fully learned simulator that can accurately model collision interactions of multiple rigid objects with complex shapes.
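The corner-to-face failure mode of node-based collision detection described above can be illustrated with a small geometric sketch. This is not FIGNet's implementation: the triangle, the corner position, the collision radius and all function names below are hypothetical choices made for illustration only. The sketch compares a node-to-node proximity test against a point-to-face distance test, using the standard closest-point-on-triangle construction.

```python
import math


def sub(u, v):
    return tuple(ui - vi for ui, vi in zip(u, v))


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def axpy(a, s, d):
    """Return a + s * d for 3-vectors stored as tuples."""
    return tuple(ai + s * di for ai, di in zip(a, d))


def closest_point_on_triangle(p, a, b, c):
    """Closest point to p on triangle (a, b, c), by testing Voronoi regions."""
    ab, ac, ap = sub(b, a), sub(c, a), sub(p, a)
    d1, d2 = dot(ab, ap), dot(ac, ap)
    if d1 <= 0 and d2 <= 0:
        return a  # vertex region of a
    bp = sub(p, b)
    d3, d4 = dot(ab, bp), dot(ac, bp)
    if d3 >= 0 and d4 <= d3:
        return b  # vertex region of b
    vc = d1 * d4 - d3 * d2
    if vc <= 0 and d1 >= 0 and d3 <= 0:
        return axpy(a, d1 / (d1 - d3), ab)  # edge ab
    cp = sub(p, c)
    d5, d6 = dot(ab, cp), dot(ac, cp)
    if d6 >= 0 and d5 <= d6:
        return c  # vertex region of c
    vb = d5 * d2 - d1 * d6
    if vb <= 0 and d2 >= 0 and d6 <= 0:
        return axpy(a, d2 / (d2 - d6), ac)  # edge ac
    va = d3 * d6 - d5 * d4
    if va <= 0 and (d4 - d3) >= 0 and (d5 - d6) >= 0:
        w = (d4 - d3) / ((d4 - d3) + (d5 - d6))
        return axpy(b, w, sub(c, b))  # edge bc
    denom = 1.0 / (va + vb + vc)
    return axpy(axpy(a, vb * denom, ab), vc * denom, ac)  # face interior


# Hypothetical scene: one large triangular face of object A, and a single
# corner node of object B hovering just above the interior of that face.
FACE = ((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0))
CORNER = (0.5, 0.5, 0.01)
RADIUS = 0.1  # assumed collision radius, not a value from the paper

# Node-based test: is any node of the face within RADIUS of the corner?
node_dists = [math.dist(CORNER, v) for v in FACE]
node_hit = min(node_dists) < RADIUS

# Face-based test: is the corner within RADIUS of the face itself?
face_dist = math.dist(CORNER, closest_point_on_triangle(CORNER, *FACE))
face_hit = face_dist < RADIUS

print(f"nearest node distance: {min(node_dists):.3f}, detected: {node_hit}")
print(f"point-to-face distance: {face_dist:.3f}, detected: {face_hit}")
```

In this toy configuration the corner is nearly touching the face, yet it lies well outside the radius of every node, so only the face-based test registers the contact. Shrinking the face (or sampling it densely with particles) would let a node-based test catch the contact, at the cost of many more pairwise tests.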

2. RELATED WORK

To address the weaknesses of traditional analytic simulation methods, a variety of hybrid approaches that combine machine learning with analytic physics simulators have been proposed. Learned simulation models can be used to correct analytic models for a variety of real-world domains (Fazeli et al., 2017; Ajay et al., 2018; Kloss et al., 2017; Golemo et al., 2018; Zeng et al., 2020; Hwangbo et al., 2019; Heiden et al., 2021). Analytic equations can also be embedded into neural solvers that exactly preserve physical laws (Pfrommer et al., 2020; Jiang et al., 2022). While hybrid approaches can improve over analytic models in matching simulated and real object trajectories, they still struggle in cases where the analytic simulator is not a good model for the dynamics, and cannot be trivially extended to non-rigid systems.

More purely learning-centric approaches to simulation have been proposed in recent years to support general physical dynamics, and GNN-based methods are among the most effective. GNNs represent entities and their relations with graphs, and compute their interactions using flexible neural network function approximators. Such methods can capture the dynamics of fluids and deformable meshes (Li et al., 2019a; Sanchez-Gonzalez et al., 2020; Pfaff et al., 2021), robotic systems (Pathak et al., 2019; Sanchez-Gonzalez et al., 2018; Wang et al., 2018), and simple rigids (Rubanova et al., 2021; Mrowca et al., 2018; Li et al., 2019b; Battaglia et al., 2016; Allen et al., 2022; Bear et al., 2021). Despite these models' successes in many areas, the rigid dynamics settings they have been applied to generally use very simple shapes like spheres and cubes. Furthermore, most models are tested only in simulation, which may be a poor proxy for real-world rigid dynamics (Bauza & Rodriguez, 2017; Fazeli et al., 2017; Acosta et al., 2022). It is therefore unclear whether end-to-end deep learning models, including GNNs, are generally a good fit for learning rigid-body interactions, particularly if the rigid bodies are complex, volumetric objects like those we experience in the real world.

In this work, we demonstrate that using insights from graphics to represent and resolve collisions as face-to-face interactions improves rigid-body dynamics learning enough not only to capture simulated dynamics very accurately, but also to outperform analytic simulators (Todorov et al., 2012; Coumans, 2015; Lynch, 1992) on real-world rigid-body dynamics data. We conduct a series of ablations and experiments which showcase how face-to-face collision representations dramatically improve rigid-body dynamics prediction.

3. METHOD

For the purposes of rigid-body simulation, objects are represented as meshes $M$, consisting of a set of node positions $\{x_i\}_{i=1..N}$ and a set of faces $\{F\}$ that describe how nodes are connected to

