AQUAMAM: AN AUTOREGRESSIVE, QUATERNION MANIFOLD MODEL FOR RAPIDLY ESTIMATING COMPLEX SO(3) DISTRIBUTIONS

Abstract

Accurately modeling complex, multimodal distributions is necessary for optimal decision-making, but doing so for rotations in three dimensions, i.e., the SO(3) group, is challenging due to the curvature of the rotation manifold. The recently described implicit-PDF (IPDF) is a simple, elegant, and effective approach for learning arbitrary distributions on SO(3) up to a given precision. However, inference with IPDF requires N forward passes through the network's final multilayer perceptron (where N places an upper bound on the likelihood that the model can calculate), which is prohibitively slow for those without the computational resources needed to parallelize the queries. In this paper, I introduce AQuaMaM,¹,² a neural network capable of both learning complex distributions on the rotation manifold and calculating exact likelihoods for query rotations in a single forward pass. Specifically, AQuaMaM autoregressively models the projected components of unit quaternions as mixtures of uniform distributions that partition their geometrically restricted domain of values. On an "infinite" toy dataset with ambiguous viewpoints, AQuaMaM rapidly converges to a sampling distribution closely matching the true data distribution. In contrast, the sampling distribution for IPDF dramatically diverges from the true data distribution, despite IPDF approaching its theoretical minimum evaluation loss during training. On a constructed dataset of 500,000 renders of a die in different rotations, an AQuaMaM model trained from scratch reaches a log-likelihood 14% higher than that of an IPDF model using a pretrained ResNet-50. Further, compared to IPDF, AQuaMaM uses 24% fewer parameters, has 52× higher prediction throughput on a single GPU, and converges in a similar amount of training time.
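The claim that N caps the likelihood IPDF can report follows from how its density is normalized. Below is a minimal Python sketch that assumes IPDF-style normalization. One unnormalized score per grid rotation (one MLP pass each) is softmax-normalized over N equal-volume grid cells, then divided by the cell volume π²/N. Here π² is the volume of SO(3) when it is identified with unit quaternions modulo sign. The helper names `grid_log_density` and `max_log_density` are hypothetical. Because no cell's softmax probability can exceed one, the density can never exceed N/π².

```python
import math
from typing import Sequence

# Assumed convention: volume of SO(3) viewed as unit quaternions modulo sign,
# i.e., half of the 2*pi^2 volume of the unit 3-sphere.
SO3_VOLUME = math.pi ** 2


def grid_log_density(scores: Sequence[float], query_index: int) -> float:
    """Hypothetical helper: log-density at one grid rotation.

    `scores` holds one unnormalized score per grid cell (N cells, assumed to
    have equal volume). The softmax probability is divided by SO3_VOLUME / N.
    """
    n = len(scores)
    m = max(scores)
    # Numerically stable log-sum-exp for the softmax normalizer.
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    log_cell_volume = math.log(SO3_VOLUME / n)
    return scores[query_index] - log_z - log_cell_volume


def max_log_density(n: int) -> float:
    """Ceiling reached when all softmax mass sits on a single grid cell."""
    return math.log(n / SO3_VOLUME)


if __name__ == "__main__":
    # The ceiling grows only like log(N), while inference cost grows like N.
    for n in (4, 4096, 2 ** 21):
        print(n, round(max_log_density(n), 3))
```

Raising N lifts the ceiling only logarithmically, while every additional grid rotation costs another MLP evaluation at inference time.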
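The abstract's description of AQuaMaM can be illustrated with a small likelihood computation. The Python sketch below is not the author's implementation. It rests on three assumptions:

1. A unit quaternion (w, x, y, z) is moved to the w ≥ 0 hemisphere. This is allowed because q and -q encode the same rotation, so (x, y, z) lies in the unit ball and determines w.
2. The density factorizes autoregressively as p(x) p(y | x) p(z | x, y).
3. Each conditional is a mixture of uniforms over equal-width bins that tile the interval left open by the unit-norm constraint.

`uniform_bins` is a hypothetical placeholder for the network head that would output bin probabilities. The sketch stops at the density over (x, y, z) and omits any change-of-variables term needed to express it as a density on SO(3).

```python
import math
from typing import List, Sequence


def uniform_bins(prev: Sequence[float], num_bins: int) -> List[float]:
    """Hypothetical stand-in for a network head: equal mass per bin, ignoring `prev`."""
    return [1.0 / num_bins] * num_bins


def mixture_of_uniforms_logpdf(v: float, low: float, high: float, probs: Sequence[float]) -> float:
    """Log-density of v under a mixture of uniforms on equal-width bins tiling [low, high].

    Assumes high > low; inside bin k the density is probs[k] / bin_width.
    """
    num_bins = len(probs)
    width = (high - low) / num_bins
    # Locate v's bin, clamping so that v == high lands in the last bin.
    k = min(max(int((v - low) / width), 0), num_bins - 1)
    return math.log(probs[k]) - math.log(width)


def projected_quaternion_logpdf(q: Sequence[float], bin_probs_fn=uniform_bins, num_bins: int = 16) -> float:
    """Autoregressive log-density of the projected components (x, y, z) of a unit quaternion."""
    w, x, y, z = q
    # q and -q are the same rotation, so move to the w >= 0 hemisphere; (x, y, z) then fixes w.
    if w < 0:
        x, y, z = -x, -y, -z
    # p(x): x ranges over [-1, 1].
    log_p = mixture_of_uniforms_logpdf(x, -1.0, 1.0, bin_probs_fn([], num_bins))
    # p(y | x): the unit norm forces x^2 + y^2 <= 1.
    y_max = math.sqrt(max(1.0 - x * x, 0.0))
    log_p += mixture_of_uniforms_logpdf(y, -y_max, y_max, bin_probs_fn([x], num_bins))
    # p(z | x, y): likewise x^2 + y^2 + z^2 <= 1.
    z_max = math.sqrt(max(1.0 - x * x - y * y, 0.0))
    log_p += mixture_of_uniforms_logpdf(z, -z_max, z_max, bin_probs_fn([x, y], num_bins))
    return log_p
```

With `uniform_bins`, each factor reduces to the reciprocal of its interval length, so the identity rotation (1, 0, 0, 0) gets density 1/8 over the projected ball. A trained head would instead concentrate bin mass. Either way, the likelihood comes from one pass over three components rather than from a grid of query rotations.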

1. INTRODUCTION AND RELATED WORK

In many robotics applications, e.g., robotic weed control (Wu et al., 2020), the ability to accurately estimate the poses of objects is a prerequisite for successful deployment. However, compared to other automation tasks, which primarily involve either classification or regression in $\mathbb{R}^n$, pose estimation is particularly challenging because the 3D rotation group SO(3)³ forms a curved manifold. As a result, standard probability distributions (e.g., the multivariate Gaussian) are not well-suited for modeling elements of SO(3). Further, the steps for interacting with an object in the "mean" pose between two possible poses (Figure 1) will often fail when applied to the object in one of the non-mean poses. Accounting for multimodality is therefore essential when modeling rotations. The recently described implicit-PDF (IPDF) (Murphy et al., 2021) is a simple,



¹ Pronounced "aqua ma'am".
² All code to generate the datasets, train and evaluate the models, and generate the figures can be found at: <anonymized for review>.
³ SO(3) stands for "special orthogonal group in three dimensions", with "special" referring to the restriction to orthogonal matrices with a determinant of one, which is exactly the set of rotation matrices. See https://blogs.scientificamerican.com/roots-of-unity/a-few-of-my-favorite-spaces-so-3/ for a popular science introduction to SO(3).

