LEARNING INTERPRETABLE DYNAMICS FROM IMAGES OF A FREELY ROTATING 3D RIGID BODY

Abstract

In many real-world settings, image observations of freely rotating 3D rigid bodies, such as satellites, may be available when low-dimensional measurements are not. However, the high dimensionality of image data precludes the use of classical estimation techniques to learn the dynamics, and a lack of interpretability reduces the usefulness of standard deep learning methods. In this work, we present a physics-informed neural network model to estimate and predict 3D rotational dynamics from image sequences. We achieve this with a multi-stage prediction pipeline that maps individual images to a latent representation homeomorphic to SO(3), computes angular velocities from latent pairs, and predicts future latent states using the Hamiltonian equations of motion with a learned representation of the Hamiltonian. We demonstrate the efficacy of our approach on a new rotating rigid-body dataset containing image sequences of cubes and rectangular prisms with uniform and non-uniform density.

1. INTRODUCTION

Images of 3D rigid bodies in motion are available across a range of application areas and can give insight into system dynamics. Learning dynamics from images has applications to planning, navigation, prediction, and control of robotic systems. Resident space objects (RSOs) are natural or man-made objects that orbit a planet or moon and are examples of commonly studied, freely rotating rigid bodies. When planning proximity operation missions with RSOs, such as collecting samples from an asteroid (Williams et al., 2018), servicing a malfunctioning satellite (Flores-Abad et al., 2014), or active space debris removal (Mark and Kamath, 2019), it is critical to correctly estimate the RSO dynamics in order to avoid mission failure. Space robotic systems typically have access to onboard cameras, which makes learning dynamics from images a compelling approach for vision-based navigation and control. The combination of deep learning with physics-based models allows models to learn dynamics from high-dimensional data such as images (Allen-Blanchette et al., 2020; Zhong and Leonard, 2020; Toth et al., 2020). However, as far as we know, our method is the first to use the Hamiltonian formalism to learn 3D rigid-body dynamics from images.

Kinematics and dynamics of 3D rigid-body rotation are both fundamental to accomplishing the goals of this paper. The kinematics describe the rate of change of rigid-body orientation as a function of the orientation and the angular velocity. Our method integrates the kinematic equations to compute the orientation trajectory in latent space using the latent angular velocity. The dynamics describe the rate of change of the angular velocity as a function of the angular velocity and the moment-of-inertia matrix J, which depends on the distribution of mass over the rigid-body volume.
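The kinematic and dynamic equations described above can be sketched concretely as follows. This is a minimal NumPy illustration of the underlying torque-free rigid-body physics (function names are ours), not the paper's learned model:

```python
import numpy as np

def hat(w):
    """Map an angular-velocity vector to its 3x3 skew-symmetric matrix."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def expm_so3(v):
    """Rodrigues' formula: exact matrix exponential of hat(v)."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return np.eye(3)
    K = hat(v / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def step(R, w, J, dt):
    """One explicit-Euler step of the torque-free rigid-body equations:
    kinematics dR/dt = R hat(w), dynamics J dw/dt = (J w) x w,
    with w the body-frame angular velocity and J the moment of inertia."""
    w_next = w + dt * np.linalg.solve(J, np.cross(J @ w, w))
    R_next = R @ expm_so3(dt * w)  # exponential map keeps R on SO(3)
    return R_next, w_next
```

Note that integrating via the exponential map keeps the orientation exactly on SO(3), which is the same constraint our latent representation must respect.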
J is unknown and cannot be computed from knowledge of the external geometry of the rigid body, except in the special case in which the mass is known and is uniformly distributed over the rigid-body volume. Since this special case is not practical, in our framework we learn the dynamics from the motion of the rigid body, not from its external geometry. Importantly, the difference between the dynamics of a uniformly distributed mass and a non-uniformly distributed mass inside the same external geometry is significant. We show that we can learn these very different dynamics even when the external geometry is the same. By integrating the learned dynamics, we can predict future latent angular velocity. Works such as [1, 2] estimate the transformation between image pairs, which can then be used to generate image trajectories. However, this approach implicitly assumes the object evolves with a constant velocity, an assumption that does not hold in the general case. Our work addresses this limitation by estimating the angular acceleration and mass distribution of the object in addition to its angular velocity.

In this paper we introduce a model, with architecture depicted in Figure 1, that is capable of (1) learning 3D rigid-body dynamics from images, (2) predicting future image sequences in time, and (3) providing a low-dimensional, interpretable representation of the latent state. Our model incorporates the Hamiltonian formulation of the dynamics as an inductive bias to facilitate learning the moment-of-inertia tensor J ∈ R^{3×3} and an auto-encoding map between images and SO(3). The efficacy of our approach is demonstrated through long-term image prediction and the interpretability of its latent space.
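To make the Hamiltonian inductive bias concrete: for a free rigid body, the body-frame angular momentum pi = J w evolves as dpi/dt = pi x (J^{-1} pi), with Hamiltonian H(pi) = (1/2) pi^T J^{-1} pi. The sketch below illustrates these equations and one common positive-definite parameterization of a learned J; the function names and the specific parameterization are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def hamiltonian(pi, J_inv):
    """Rotational kinetic energy H(pi) = 0.5 * pi^T J^{-1} pi."""
    return 0.5 * pi @ J_inv @ pi

def momentum_step(pi, J_inv, dt):
    """One explicit-Euler step of the Lie-Poisson equation for the free
    rigid body, d(pi)/dt = pi x dH/dpi with dH/dpi = J^{-1} pi.
    H is conserved exactly along continuous-time trajectories."""
    return pi + dt * np.cross(pi, J_inv @ pi)

# One way to keep a learned J positive-definite: J = L L^T + eps*I with
# L lower-triangular. Here L is a fixed illustrative value rather than
# a learned parameter.
L = np.array([[1.0, 0.0, 0.0],
              [0.2, 1.5, 0.0],
              [0.1, -0.3, 2.0]])
J = L @ L.T + 1e-6 * np.eye(3)
```

Because dH/dpi is orthogonal to pi x dH/dpi, the continuous-time dynamics conserve H for any positive-definite J; this structure is what the learned Hamiltonian inherits by construction.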

2.1. HAMILTONIAN AND LAGRANGIAN PRIORS IN LEARNING DYNAMICS

Physics-guided machine learning approaches with Lagrangian and Hamiltonian priors often use state-control (e.g. 1D position and control) trajectories (Ahmadi and Khadir, 2020; Chen et al., 2020; Cranmer et al., 2020a; Duong and Atanasov, 2021; Finzi et al., 2020; Greydanus et al., 2019, among others).
Previous work (Allen-Blanchette et al., 2020; Zhong and Leonard, 2020; Toth et al., 2020) has made significant progress in learning dynamics from images of planar rigid bodies. Learning dynamics of 3D rigid-body motion has also been explored with a variety of input data types (Duong and Atanasov, 2021; Byravan and Fox, 2017; Peretroukhin et al., 2020). Duong and Atanasov (2021) use state measurement data (i.e., rotation matrices and angular momenta), while Peretroukhin et al. (2020) learn the underlying dynamics in an overparameterized black-box model.

Figure 1: Schematic of model architecture (top). The architecture combines an auto-encoding neural network with a Hamiltonian dynamics model for 3D rigid bodies (bottom-right). The encoder maps a sequence of images to a sequence of latent states in SO(3). We estimate angular velocity and momentum, then predict future orientation and momentum using the learned Hamiltonian (bottom-left). Each future latent state is decoded into an image using only the predicted orientation.
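A standard way to constrain an encoder's latent output to SO(3), as the architecture in Figure 1 requires, is to project an unconstrained 3x3 network output onto the nearest rotation matrix via the special orthogonal Procrustes solution. The sketch below shows this common construction as an assumption; the paper's exact map from images to SO(3) may differ:

```python
import numpy as np

def project_to_so3(M):
    """Nearest rotation matrix to an arbitrary 3x3 matrix M in the
    Frobenius sense (special orthogonal Procrustes projection via SVD).
    The sign correction on the last singular direction keeps det = +1."""
    U, _, Vt = np.linalg.svd(M)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt
```

This projection is differentiable almost everywhere, which makes it usable as the final layer of an encoder trained end-to-end.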

