A LAPLACE-INSPIRED DISTRIBUTION ON SO(3) FOR PROBABILISTIC ROTATION ESTIMATION

Abstract

Estimating the 3DoF rotation from a single RGB image is an important yet challenging problem. Probabilistic rotation regression has attracted increasing attention for its ability to express uncertainty information along with the prediction. Though modeling noise with the Gaussian-resembling Bingham distribution and matrix Fisher distribution is natural, these distributions are sensitive to outliers due to their quadratic penalty on deviations. In this paper, we draw inspiration from the multivariate Laplace distribution and propose a novel Rotation Laplace distribution on SO(3). The Rotation Laplace distribution is robust to the disturbance of outliers and concentrates much of its gradient on the low-error region, resulting in better convergence. Our extensive experiments show that the proposed distribution achieves state-of-the-art performance on rotation regression tasks over both probabilistic and non-probabilistic baselines. Our project page is at pkuepic.github.io/RotationLaplace.

1. INTRODUCTION

Incorporating neural networks to perform rotation regression is of great importance in the fields of computer vision, computer graphics and robotics (Wang et al., 2019b; Yin et al., 2022; Dong et al., 2021; Breyer et al., 2021). To close the gap between the SO(3) manifold and the Euclidean space where neural network outputs live, one popular line of research develops learning-friendly rotation representations, including the 6D continuous representation (Zhou et al., 2019), the 9D matrix representation with SVD orthogonalization (Levinson et al., 2020), etc. Recently, Chen et al. (2022) focus on the gradient backpropagation process and replace vanilla auto-differentiation with an SO(3) manifold-aware gradient layer, which sets a new state of the art in rotation regression tasks. Reasoning about uncertainty information along with the predicted rotation is also attracting more and more attention, enabling many applications in aerospace (Crassidis & Markley, 2003), autonomous driving (McAllister et al., 2017) and localization (Fang et al., 2020). On this front, recent efforts model the uncertainty of rotation regression via probabilistic modeling of the rotation space. The most commonly used distributions are the Bingham distribution (Bingham, 1974) on S^3 for unit quaternions and the matrix Fisher distribution (Khatri & Mardia, 1977) on SO(3) for rotation matrices. These two distributions are equivalent to each other (Prentice, 1986) and resemble the Gaussian distribution in Euclidean space (Bingham, 1974; Khatri & Mardia, 1977). While modeling noise with Gaussian-like distributions is well motivated by the Central Limit Theorem, the Gaussian distribution is well known to be sensitive to outliers in probabilistic regression models (Murphy, 2012).
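To illustrate how such representations bridge Euclidean network outputs and SO(3), the following NumPy sketch (our own illustration, not code from the cited work) implements the SVD orthogonalization idea of Levinson et al. (2020), projecting an arbitrary 9D (3 × 3) output onto a valid rotation matrix:

```python
import numpy as np

def svd_orthogonalize(m):
    """Project an unconstrained 3x3 matrix onto SO(3) via SVD
    (sketch of the 9D representation of Levinson et al., 2020)."""
    u, _, vt = np.linalg.svd(m)
    # Flip the sign of the last singular vector if needed so that det(R) = +1
    # (a proper rotation rather than a reflection).
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt

# Any real 3x3 "network output" maps to a valid rotation matrix.
rng = np.random.default_rng(0)
R = svd_orthogonalize(rng.normal(size=(3, 3)))
```

Because the projection is defined for every real 3 × 3 input, the network head can remain an ordinary linear layer with nine outputs.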
This is because the Gaussian distribution penalizes deviations quadratically, so predictions with larger errors weigh much more heavily in learning than low-error ones, potentially resulting in suboptimal convergence when a certain amount of outliers is present. Unfortunately, in rotation regression tasks we fairly often encounter large prediction errors, e.g., 180° errors, due to either the (near-)symmetric nature of the objects or severe occlusions (Murphy et al., 2021). In Fig. 1 (left), using training on single-image rotation regression as an example, we show the statistics of predictions after convergence, assuming a matrix Fisher distribution (as done in Mohlin et al. (2020)). The blue histogram shows the population of predictions at different errors, and the red dots show the impact of these predictions on learning, evaluated by computing the sum of their gradient magnitudes ∥∂L/∂(distribution param.)∥ within each bin and then normalizing across bins. It is clear that the 180° outliers dominate the gradient, and hence the network training, even though their population is tiny, while the vast majority of low-error predictions are deprioritized. Arguably, at convergence, the gradient should focus more on refining the low errors rather than fixing the inevitable large errors (e.g., those arising from symmetry). This motivates us to find a better probabilistic model for rotation. As pointed out by Murphy (2012), the Laplace distribution, with its heavy tails, is a better option for robust probabilistic modeling. The Laplace distribution drops sharply around its mode and thus allocates most of its probability density to a small region around the mode; meanwhile, it tolerates and assigns higher likelihoods to outliers compared to the Gaussian distribution.
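The effect described above can be seen in a one-dimensional toy calculation (our own illustration, not from the paper): with unit scale, the Gaussian negative log-likelihood gradient with respect to the mean has magnitude |x − μ|, while the Laplace counterpart has constant magnitude, so a single large outlier dominates the former but not the latter.

```python
import numpy as np

# Residuals x - mu: three well-fit points and one large outlier.
errors = np.array([1.0, 2.0, 5.0, 180.0])

gauss_grad = np.abs(errors)          # |d/dmu of (x-mu)^2 / 2| grows with error
laplace_grad = np.ones_like(errors)  # |d/dmu of |x-mu||   is constant

# Share of the total gradient contributed by the single outlier:
gauss_share = gauss_grad[-1] / gauss_grad.sum()      # 180/188, ~0.957
laplace_share = laplace_grad[-1] / laplace_grad.sum()  # 1/4 = 0.25
```

Under the quadratic loss, the outlier contributes about 96% of the total gradient; under the absolute-value loss, it contributes no more than any other point.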
Consequently, it encourages predictions near its mode to be even closer, fitting well sparse data whose points are mostly close to the mean with the exception of a few outliers (Mitianoudis, 2012); this makes the Laplace distribution favored in the context of deep learning (Goodfellow et al., 2016). In this work, we propose a novel Laplace-inspired distribution on SO(3) for rotation matrices, namely the Rotation Laplace distribution, for probabilistic rotation regression. We devise the Rotation Laplace distribution as an approximation of the multivariate Laplace distribution in the tangent space of its mode. As shown in the visualization in Fig. 1 (right), our Rotation Laplace distribution is robust to the disturbance of outliers, with most of its gradient contributed by the low-error region, and thus leads to better convergence along with significantly higher accuracy. Moreover, the Rotation Laplace distribution is parameterized by a single unconstrained 3 × 3 matrix and thus accommodates the Euclidean output of neural networks with ease. This network-friendly distribution requires neither complex functions to fulfill parameterization constraints nor any normalization process from Euclidean space to the rotation manifold, which has been shown to be harmful for learning (Chen et al., 2022). For completeness of the derivations, we also propose a Laplace-inspired distribution on S^3 for quaternions. We show that the Rotation Laplace distribution is equivalent to the Quaternion Laplace distribution, similar to the equivalence of the matrix Fisher distribution and the Bingham distribution. We extensively compare our Rotation Laplace distribution to methods that parameterize distributions on SO(3) for pose estimation, as well as to non-probabilistic approaches including multiple rotation representations and the recent SO(3)-aware gradient layer (Chen et al., 2022).
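The tangent space mentioned above is reached through the SO(3) logarithm map. The NumPy sketch below (our own illustration, not code from the paper; the degenerate case of rotations by exactly π is omitted for brevity) shows the map and how the norm of the tangent vector recovers the geodesic distance used as the error metric in Fig. 1.

```python
import numpy as np

def so3_log(R):
    """Log map of SO(3): rotation matrix -> axis-angle vector in the
    tangent space at the identity."""
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if angle < 1e-8:
        return np.zeros(3)  # near the identity, the map is ~0
    # Rotation axis (unnormalized) from the skew-symmetric part of R.
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return angle / (2.0 * np.sin(angle)) * w

# 90-degree rotation about the z-axis.
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
v = so3_log(Rz)
# The norm of the tangent vector is the geodesic distance from the identity,
# i.e., the rotation error; here it is pi/2 radians = 90 degrees.
```

Concentrating probability mass as a function of this tangent-space coordinate, rather than of a quadratic form, is what gives the distribution its Laplace-like heavy tails.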
On common benchmark datasets of rotation estimation from RGB images, we achieve a significant and consistent performance improvement over all baselines. 



† He Wang and Baoquan Chen are the corresponding authors ({hewang, baoquan}@pku.edu.cn).



Figure 1: Visualization of the results of the matrix Fisher distribution and the Rotation Laplace distribution after convergence. The horizontal axis is the geodesic distance between the prediction and the ground truth. The blue bins count the number of data points within the corresponding errors (2° per bin). The red dots illustrate the percentage of the sum of the gradient magnitudes ∥∂L/∂(dist. param.)∥ within each bin. The experiment is done on all categories of the ModelNet10-SO3 dataset.

2. RELATED WORK

Probabilistic regression. Nix & Weigend (1994) first propose to model the output of a neural network as a Gaussian distribution and learn the Gaussian parameters with a negative log-likelihood loss, through which one obtains not only the target but also a measure of prediction uncertainty. More recently, Kendall & Gal (2017) offer more understanding and analysis of the underlying uncertainties. Lakshminarayanan et al. (2017) further improve the performance of uncertainty estimation with network ensembling and adversarial training. Makansi et al. (2019) stabilize training with winner-takes-all and iterative grouping strategies. Probabilistic regression for

