A LAPLACE-INSPIRED DISTRIBUTION ON SO(3) FOR PROBABILISTIC ROTATION ESTIMATION

Abstract

Estimating the 3DoF rotation from a single RGB image is an important yet challenging problem. Probabilistic rotation regression has attracted increasing attention for its ability to express uncertainty information along with the prediction. Though it is natural to model noise using the Gaussian-resembling Bingham distribution and matrix Fisher distribution, they are shown to be sensitive to outliers owing to their quadratic penalty on deviations. In this paper, we draw inspiration from the multivariate Laplace distribution and propose a novel Rotation Laplace distribution on SO(3). Rotation Laplace distribution is robust to the disturbance of outliers and concentrates much of its gradient on the low-error region, resulting in better convergence. Our extensive experiments show that our proposed distribution achieves state-of-the-art performance on rotation regression tasks over both probabilistic and non-probabilistic baselines. Our project page is at pkuepic.github.io/RotationLaplace.

1. INTRODUCTION

Incorporating neural networks into rotation regression is of great importance in computer vision, computer graphics and robotics (Wang et al., 2019b; Yin et al., 2022; Dong et al., 2021; Breyer et al., 2021). To close the gap between the SO(3) manifold and the Euclidean space where neural network outputs live, one popular line of research develops learning-friendly rotation representations, including the 6D continuous representation (Zhou et al., 2019), the 9D matrix representation with SVD orthogonalization (Levinson et al., 2020), etc. Recently, Chen et al. (2022) focused on the gradient backpropagation process and replaced vanilla automatic differentiation with an SO(3) manifold-aware gradient layer, setting a new state of the art in rotation regression tasks. Reasoning about uncertainty along with the predicted rotation is also attracting increasing attention, as it enables many applications in aerospace (Crassidis & Markley, 2003), autonomous driving (McAllister et al., 2017) and localization (Fang et al., 2020). On this front, recent efforts model the uncertainty of rotation regression via probabilistic modeling of the rotation space. The most commonly used distributions are Bingham distribution (Bingham, 1974) on S^3 for unit quaternions and matrix Fisher distribution (Khatri & Mardia, 1977) on SO(3) for rotation matrices. These two distributions are equivalent to each other (Prentice, 1986) and resemble the Gaussian distribution in Euclidean space (Bingham, 1974; Khatri & Mardia, 1977). While modeling noise using Gaussian-like distributions is well motivated by the central limit theorem, the Gaussian distribution is well known to be sensitive to outliers in probabilistic regression models (Murphy, 2012).
This is because the Gaussian distribution penalizes deviations quadratically, so predictions with larger errors weigh much more heavily in learning than low-error ones, potentially resulting in suboptimal convergence when a certain number of outliers exist. Unfortunately, in rotation regression tasks we fairly often encounter large prediction errors, e.g. 180° errors, due either to the (near-)symmetric nature of the objects or to severe occlusions (Murphy et al., 2021). In Fig. 1 (left), using training on single-image rotation regression as an example, we show the statistics of predictions after convergence, assuming matrix Fisher distribution (as done in Mohlin et al. (2020)). The blue histogram shows the population of predictions at different errors, and the red dots show the impact of these predictions on learning, evaluated by computing the sum of their gradient magnitudes ‖∂L/∂(distribution param.)‖ within each bin and then normalizing across bins. It is clear that the 180° outliers dominate the gradient, and hence the network training, even though their population is tiny, while the vast majority of low-error predictions are deprioritized. Arguably, at convergence, the gradient should focus more on refining the low errors rather than fixing the inevitable large errors (e.g. those arising from symmetry). This motivates us to find a better probabilistic model for rotation.

Figure 1: Visualization of the results of matrix Fisher distribution and Rotation Laplace distribution after convergence. The horizontal axis is the geodesic distance between the prediction and the ground truth. The blue bins count the number of data points within corresponding errors (2° each bin). The red dots illustrate the percentage of the sum of the gradient magnitude ‖∂L/∂(dist. param.)‖ within each bin. The experiment is done on all categories of ModelNet10-SO3 dataset.
As pointed out by Murphy (2012), the Laplace distribution, with its heavy tails, is a better option for robust probabilistic modeling. The Laplace distribution drops sharply around its mode and thus allocates most of its probability density to a small region around the mode; meanwhile, it also tolerates and assigns higher likelihoods to outliers than the Gaussian distribution does. Consequently, it encourages predictions near its mode to move even closer, and thus fits well data whose points mostly lie close to the mean except for several outliers (Mitianoudis, 2012), making the Laplace distribution favored in the context of deep learning (Goodfellow et al., 2016). In this work, we propose a novel Laplace-inspired distribution on SO(3) for rotation matrices, namely Rotation Laplace distribution, for probabilistic rotation regression. We devise Rotation Laplace distribution to be an approximation of the multivariate Laplace distribution in the tangent space of its mode. As shown in the visualization in Fig. 1 (right), our Rotation Laplace distribution is robust to the disturbance of outliers, with most of its gradient contributed by the low-error region, and thus leads to better convergence along with significantly higher accuracy. Moreover, our Rotation Laplace distribution is simply parameterized by an unconstrained 3 × 3 matrix and thus accommodates the Euclidean output of neural networks with ease. This network-friendly distribution requires neither complex functions to fulfill parameterization constraints nor any normalization process from Euclidean space to the rotation manifold, which has been shown to be harmful for learning (Chen et al., 2022). For completeness of the derivations, we also propose the Laplace-inspired distribution on S^3 for quaternions. We show that Rotation Laplace distribution is equivalent to Quaternion Laplace distribution, similar to the equivalence of matrix Fisher distribution and Bingham distribution.
We extensively compare our Rotation Laplace distributions to methods that parameterize distributions on SO(3) for pose estimation, and also non-probabilistic approaches including multiple rotation representations and recent SO(3)-aware gradient layer (Chen et al., 2022) . On common benchmark datasets of rotation estimation from RGB images, we achieve a significant and consistent performance improvement over all baselines.

2. RELATED WORK

Probabilistic regression Nix & Weigend (1994) first propose to model the output of a neural network as a Gaussian distribution and to learn the Gaussian parameters with a negative log-likelihood loss, through which one obtains not only the target but also a measure of prediction uncertainty. More recently, Kendall & Gal (2017) offer more understanding and analysis of the underlying uncertainties. Lakshminarayanan et al. (2017) further improve uncertainty estimation with network ensembling and adversarial training. Makansi et al. (2019) stabilize training with winner-takes-all and iterative grouping strategies. Probabilistic regression for uncertainty prediction has been widely used in various applications, including optical flow estimation (Ilg et al., 2018), depth estimation (Poggi et al., 2020), weather forecasting (Wang et al., 2019a), etc. Across decades of this literature, the majority of probabilistic regression works model the network output with a Gaussian-like distribution, while the Laplace distribution is less explored. Li et al. (2021) empirically find that assuming a Laplace distribution during maximum likelihood estimation yields better performance than a Gaussian distribution in the field of 3D human pose estimation. Recent work (Nair et al., 2022) likewise studies likelihoods that are robust to outliers. Probabilistic rotation regression For rotations, prior works mainly adopt Bingham distribution (Bingham, 1974) on S^3 over unit quaternions and matrix Fisher distribution (Khatri & Mardia, 1977) on SO(3) over rotation matrices for deep rotation regression. Though both bear similar properties with the Gaussian distribution in Euclidean space, matrix Fisher distribution benefits from the continuous rotation representation and unconstrained distribution parameters, which yields better performance (Murphy et al., 2021). Recently, Murphy et al. (2021) introduce a non-parametric implicit pdf over SO(3), with the distribution properties modeled by the network parameters; implicit-PDF is particularly well suited to modeling rotations of symmetric objects.

Non-probabilistic rotation regression

The choice of rotation representation is one of the core issues in rotation regression. Commonly used representations include Euler angles (Kundu et al., 2018; Tulsiani & Malik, 2015), unit quaternions (Kendall & Cipolla, 2017; Kendall et al., 2015; Xiang et al., 2017) and axis-angle (Do et al., 2018; Gao et al., 2018; Ummenhofer et al., 2017).

3.1. MATRIX FISHER DISTRIBUTION

Matrix Fisher distribution over rotation matrices R ∈ SO(3) is defined as
$$p(R; A) = \frac{1}{F(A)} \exp\left(\mathrm{tr}(A^T R)\right), \qquad (1)$$
where A ∈ R^{3×3} is an unconstrained matrix and F(A) ∈ R is the normalization factor. Without further clarification, we denote F as the normalization factor of the corresponding distribution in the remainder of this paper. We also denote matrix Fisher distribution as R ∼ MF(A). Suppose the singular value decomposition of A is given by A = U′S′(V′)^T; the proper SVD is defined as A = USV^T, where
$$U = U'\,\mathrm{diag}(1, 1, \det(U')), \quad V = V'\,\mathrm{diag}(1, 1, \det(V')), \quad S = \mathrm{diag}(s_1, s_2, s_3) = \mathrm{diag}(s_1', s_2', \det(U'V')\,s_3').$$
The definition of U and V ensures that det(U) = det(V) = 1 and U, V ∈ SO(3).
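As a concrete illustration, the proper SVD above can be computed from a standard SVD by absorbing any reflections into the last singular value. The following is a minimal NumPy sketch (function names are ours, not from the paper):

```python
import numpy as np

def proper_svd(A):
    """Decompose an unconstrained 3x3 matrix A as A = U S V^T with
    U, V in SO(3) (det = +1): any reflection in the standard SVD is
    absorbed into the last (proper) singular value."""
    U0, s0, Vt0 = np.linalg.svd(A)   # standard SVD: A = U0 diag(s0) Vt0
    V0 = Vt0.T
    U = U0 @ np.diag([1.0, 1.0, np.linalg.det(U0)])
    V = V0 @ np.diag([1.0, 1.0, np.linalg.det(V0)])
    s = np.array([s0[0], s0[1], np.linalg.det(U0) * np.linalg.det(V0) * s0[2]])
    return U, s, V

# Sanity check on a random matrix: exact reconstruction, det(U) = det(V) = +1.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
U, s, V = proper_svd(A)
assert np.allclose(U @ np.diag(s) @ V.T, A)
assert np.isclose(np.linalg.det(U), 1.0) and np.isclose(np.linalg.det(V), 1.0)
```

The sign flips cancel in the product, so the reconstruction is exact while both factors land in SO(3).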

3.2. RELATIONSHIP BETWEEN MATRIX FISHER DISTRIBUTION IN SO(3) AND GAUSSIAN DISTRIBUTION IN R 3

It has been shown that matrix Fisher distribution is closely related to a zero-mean Gaussian distribution near its mode (Lee, 2018a;b). Denote R₀ as the mode of matrix Fisher distribution and define R̃ = R₀^T R; the relationship is as follows. Please refer to the supplementary for the proof.

Proposition 1. Let Φ = log R̃ ∈ so(3) and ϕ = Φ^∨ ∈ R^3. For a rotation matrix R ∈ SO(3) following matrix Fisher distribution, when ‖R − R₀‖ → 0, ϕ follows a zero-mean multivariate Gaussian distribution.

4. PROBABILISTIC ROTATION ESTIMATION WITH ROTATION LAPLACE DISTRIBUTION

4.1. ROTATION LAPLACE DISTRIBUTION

We draw inspiration from the multivariate Laplace distribution (Eltoft et al., 2006; Kozubowski et al., 2013), defined as follows.

Definition 2 (Multivariate Laplace distribution). With mean µ = 0, the d-dimensional multivariate Laplace distribution with covariance matrix Σ is defined as
$$p(\mathbf{x}; \Sigma) = \frac{1}{F} \left(\mathbf{x}^T \Sigma^{-1} \mathbf{x}\right)^{v/2} K_v\!\left(\sqrt{2\,\mathbf{x}^T \Sigma^{-1} \mathbf{x}}\right),$$
where v = (2 − d)/2 and K_v is the modified Bessel function of the second kind.

We consider the three-dimensional Laplace distribution of x ∈ R^3 (i.e., d = 3 and v = −1/2). Given the property $K_{-\frac12}(\xi) \propto \xi^{-\frac12}\exp(-\xi)$, the three-dimensional Laplace distribution takes the form
$$p(\mathbf{x}; \Sigma) = \frac{1}{F}\, \frac{\exp\left(-\sqrt{2\,\mathbf{x}^T\Sigma^{-1}\mathbf{x}}\right)}{\sqrt{\mathbf{x}^T\Sigma^{-1}\mathbf{x}}}.$$

In this section, we first give the definition of our proposed Rotation Laplace distribution and then show its relationship with the multivariate Laplace distribution.

Definition 3 (Rotation Laplace distribution). The random variable R ∈ SO(3) follows Rotation Laplace distribution with parameter A if its probability density function is defined as
$$p(R; A) = \frac{1}{F(A)}\, \frac{\exp\left(-\sqrt{\mathrm{tr}(S - A^T R)}\right)}{\sqrt{\mathrm{tr}(S - A^T R)}}, \qquad (2)$$
where A ∈ R^{3×3} is an unconstrained matrix, and S is the diagonal matrix composed of the proper singular values of A, i.e., A = USV^T. We also denote Rotation Laplace distribution as R ∼ RL(A).

Denote R₀ as the mode of Rotation Laplace distribution and define R̃ = R₀^T R; the relationship between Rotation Laplace distribution and the multivariate Laplace distribution is as follows.

Proposition 2. Let Φ = log R̃ ∈ so(3) and ϕ = Φ^∨ ∈ R^3. For a rotation matrix R ∈ SO(3) following Rotation Laplace distribution, when ‖R − R₀‖ → 0, ϕ follows a zero-mean multivariate Laplace distribution.

Proof. Apply the proper SVD to matrix A as A = USV^T. For R ∼ RL(A), we have
$$p(R)\,dR \propto \frac{\exp\left(-\sqrt{\mathrm{tr}(S - A^T R)}\right)}{\sqrt{\mathrm{tr}(S - A^T R)}}\,dR = \frac{\exp\left(-\sqrt{\mathrm{tr}(S - S V^T \tilde R V)}\right)}{\sqrt{\mathrm{tr}(S - S V^T \tilde R V)}}\,d\tilde R. \qquad (3)$$
With ϕ = (log R̃)^∨ ∈ R^3, R̃ can be parameterized as
$$\tilde R(\phi) = \exp(\hat\phi) = I + \frac{\sin\|\phi\|}{\|\phi\|}\,\hat\phi + \frac{1-\cos\|\phi\|}{\|\phi\|^2}\,\hat\phi^2.$$
We follow the common practice (Mohlin et al., 2020; Lee, 2018a) that the Haar measure dR is scaled such that $\int_{SO(3)} dR = 1$, and thus the Haar measure is given by
$$d\tilde R = \frac{1-\cos\|\phi\|}{4\pi^2\|\phi\|^2}\,d\phi = \left(\frac{1}{8\pi^2} + O(\|\phi\|^2)\right) d\phi. \qquad (4)$$
Also, R̃ expanded at ϕ = 0 is computed as $\tilde R = I + \hat\phi + \frac12\hat\phi^2 + O(\|\phi\|^3)$, so
$$V^T \tilde R V = I + V^T\hat\phi V + \tfrac12 V^T \hat\phi^2 V + O(\|\phi\|^3) = I + \widehat{V^T\phi} + \tfrac12 \widehat{V^T\phi}^2 + O(\|\phi\|^3)$$
$$= \begin{pmatrix} 1 - \frac12(\mu_2^2+\mu_3^2) & \frac12\mu_1\mu_2 - \mu_3 & \frac12\mu_1\mu_3 + \mu_2 \\ \frac12\mu_1\mu_2 + \mu_3 & 1 - \frac12(\mu_3^2+\mu_1^2) & \frac12\mu_2\mu_3 - \mu_1 \\ \frac12\mu_1\mu_3 - \mu_2 & \frac12\mu_2\mu_3 + \mu_1 & 1 - \frac12(\mu_1^2+\mu_2^2) \end{pmatrix} + O(\|\phi\|^3), \qquad (5)$$
where $(\mu_1, \mu_2, \mu_3)^T = V^T\phi$, and
$$\mathrm{tr}(S - S V^T\tilde R V) = \sum_{(i,j,k)\in I} \tfrac12 (s_j+s_k)\,\mu_i^2 + O(\|\phi\|^3) = \tfrac12\, \phi^T V\, \mathrm{diag}(s_2+s_3,\; s_1+s_3,\; s_1+s_2)\, V^T \phi + O(\|\phi\|^3), \qquad (6)$$
where I denotes the cyclic permutations of (1, 2, 3). Combining Eqs. 3, 4 and 6, we have
$$p(\tilde R)\,d\tilde R \propto \frac{1}{8\pi^2}\, \frac{\exp\left(-\sqrt{2\,\phi^T\Sigma^{-1}\phi}\right)}{\sqrt{2\,\phi^T\Sigma^{-1}\phi}} \left(1 + O(\|\phi\|^2)\right) d\phi. \qquad (7)$$
When ‖R − R₀‖ → 0, we have ‖R̃ − I‖ → 0 and ϕ → 0, so Eq. 7 follows the multivariate Laplace distribution with covariance matrix $\Sigma = 4V\,\mathrm{diag}\!\left(\frac{1}{s_2+s_3}, \frac{1}{s_1+s_3}, \frac{1}{s_1+s_2}\right)V^T$. □

Rotation Laplace distribution bears similar properties to matrix Fisher distribution. Its mode is computed as UV^T. The columns of U and the proper singular values S describe the orientation and the strength of dispersion, respectively.
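To make the definition concrete, the NumPy sketch below evaluates the unnormalized density of Eq. 2 and the mode UV^T. The lower clipping of tr(S − A^T R) mirrors the numerical-stability trick described later in Sec. 5.5; helper names are ours, not from the paper.

```python
import numpy as np

def proper_singular_values(A):
    """Proper singular values of A: any reflection is absorbed into the last one."""
    U0, s0, Vt0 = np.linalg.svd(A)
    sign = np.linalg.det(U0) * np.linalg.det(Vt0.T)
    return np.array([s0[0], s0[1], sign * s0[2]])

def rotation_laplace_unnorm(R, A):
    """Unnormalized Rotation Laplace density of Eq. 2:
    exp(-sqrt(t)) / sqrt(t) with t = tr(S - A^T R) >= 0,
    clipped below for numerical stability near the mode."""
    t = np.sum(proper_singular_values(A)) - np.trace(A.T @ R)
    t = max(t, 1e-8)
    return np.exp(-np.sqrt(t)) / np.sqrt(t)

def rotation_laplace_mode(A):
    """Mode of RL(A): U V^T from the proper SVD of A."""
    U0, _, Vt0 = np.linalg.svd(A)
    U = U0 @ np.diag([1.0, 1.0, np.linalg.det(U0)])
    V = Vt0.T @ np.diag([1.0, 1.0, np.linalg.det(Vt0.T)])
    return U @ V.T
```

For a diagonal A with positive entries the mode is the identity, and the density decreases as R rotates away from it, consistent with Proposition 2.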

4.2. NEGATIVE LOG-LIKELIHOOD LOSS

Given a collection of observations X = {x_i} and the associated ground-truth rotations R = {R_i}, we aim to train the network to best estimate the parameter A of Rotation Laplace distribution. This is achieved by maximizing a likelihood function so that, under our probabilistic model, the observed data is most probable, which is known as maximum likelihood estimation (MLE). We use the negative log-likelihood of R_x as the loss function:
$$\mathcal{L}(x, R_x) = -\log p(R_x; A_x).$$
The normalization factor is approximated by discretizing SO(3) into a finite set of equivolumetric grids G:
$$F(A) = \int_{SO(3)} \frac{\exp\left(-\sqrt{\mathrm{tr}(S - A^T R)}\right)}{\sqrt{\mathrm{tr}(S - A^T R)}}\,dR \approx \sum_{R_i\in G} \frac{\exp\left(-\sqrt{\mathrm{tr}(S - A^T R_i)}\right)}{\sqrt{\mathrm{tr}(S - A^T R_i)}}\,\Delta R_i,$$
where $\Delta R_i = \frac{\int_{SO(3)} dR}{|G|} = \frac{1}{|G|}$. In experiments, we discretize SO(3) space into about 37k points. Please refer to the supplementary for an analysis of the effect of different numbers of samples.
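A runnable sketch of this loss: the normalization factor is approximated by averaging the unnormalized density over a set of rotations with ΔR = 1/|G|. For simplicity we use random uniform rotations as a stand-in for the ~37k-point equivolumetric Hopf-fibration grid used in the paper; function names are ours.

```python
import numpy as np

def random_rotations(n, rng):
    """Uniform rotations on SO(3) via normalized Gaussian quaternions
    (a stand-in for the equivolumetric Hopf-fibration grid of the paper)."""
    q = rng.normal(size=(n, 4))
    q /= np.linalg.norm(q, axis=1, keepdims=True)
    w, x, y, z = q.T
    return np.stack([
        1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y),
        2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x),
        2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y),
    ], axis=-1).reshape(n, 3, 3)

def nll_rotation_laplace(A, R_gt, grid):
    """-log p(R_gt; A), with F(A) approximated on `grid` (Delta R = 1/|G|)."""
    U0, s0, Vt0 = np.linalg.svd(A)
    s_sum = s0[0] + s0[1] + np.linalg.det(U0) * np.linalg.det(Vt0.T) * s0[2]
    def log_unnorm(R):
        # log of exp(-sqrt(t))/sqrt(t), t = tr(S - A^T R), clipped for stability
        t = np.maximum(s_sum - np.einsum('ij,...ij->...', A, R), 1e-8)
        return -np.sqrt(t) - 0.5 * np.log(t)
    log_F = np.log(np.mean(np.exp(log_unnorm(grid))))   # F(A) ~ sum u_i / |G|
    return log_F - log_unnorm(R_gt)
```

With a concentrated A, the loss is much lower when the ground truth coincides with the mode than when it is 90° away, which is the behavior MLE training exploits.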

4.4. QUATERNION LAPLACE DISTRIBUTION

In this section, we introduce our extension of the Laplace-inspired distribution to quaternions, namely, Quaternion Laplace distribution.

Definition 4 (Quaternion Laplace distribution). The random variable q ∈ S^3 follows Quaternion Laplace distribution with parameters M and Z if its probability density function is defined as
$$p(q; M, Z) = \frac{1}{F(Z)}\, \frac{\exp\left(-\sqrt{-q^T M Z M^T q}\right)}{\sqrt{-q^T M Z M^T q}}, \qquad (8)$$
where M ∈ O(4) is a 4 × 4 orthogonal matrix, and Z = diag(0, z₁, z₂, z₃) is a 4 × 4 diagonal matrix with 0 ≥ z₁ ≥ z₂ ≥ z₃. We also denote Quaternion Laplace distribution as q ∼ QL(M, Z).

Proposition 3. Denote q₀ as the mode of Quaternion Laplace distribution. Let π be the tangent space of S^3 at q₀, and π(x) ∈ R^4 be the projection of x ∈ R^4 onto π. For a quaternion q ∈ S^3 following Bingham distribution / Quaternion Laplace distribution, when q → q₀, π(q) follows a zero-mean multivariate Gaussian distribution / zero-mean multivariate Laplace distribution.

Both Bingham distribution and Quaternion Laplace distribution exhibit antipodal symmetry on S^3, i.e., p(q) = p(−q), which captures the fact that the quaternions q and −q represent the same rotation in SO(3).

Proposition 4. Denote γ as the standard transformation from unit quaternions to the corresponding rotation matrices. For a rotation matrix R ∈ SO(3) following Rotation Laplace distribution, q = γ⁻¹(R) ∈ S^3 follows Quaternion Laplace distribution.

Prop. 4 shows that our proposed Rotation Laplace distribution is equivalent to Quaternion Laplace distribution, similar to the equivalence of matrix Fisher distribution and Bingham distribution (Prentice, 1986), demonstrating the consistency of our derivations. Please see the supplementary for the proofs of the above propositions.
The normalization factor of Quaternion Laplace distribution is also approximated by dense discretization, as follows:
$$F(Z) = \int_{S^3} \frac{\exp\left(-\sqrt{-q^T M Z M^T q}\right)}{\sqrt{-q^T M Z M^T q}}\,dq \approx \sum_{q_i\in G_q} \frac{\exp\left(-\sqrt{-q_i^T M Z M^T q_i}\right)}{\sqrt{-q_i^T M Z M^T q_i}}\,\Delta q_i,$$
where G_q = {q | q ∈ S^3} denotes the set of equivolumetric grids and $\Delta q_i = \frac{\int_{S^3} dq}{|G_q|} = \frac{2\pi^2}{|G_q|}$.
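The antipodal symmetry p(q) = p(−q) and the discretized normalization can be checked numerically. The sketch below uses Monte Carlo samples on S^3 (total area 2π²) instead of an equivolumetric grid, and all names are ours:

```python
import numpy as np

def ql_unnorm(q, M, Z):
    """Unnormalized Quaternion Laplace density of Eq. 8:
    exp(-sqrt(t)) / sqrt(t) with t = -q^T M Z M^T q >= 0 (entries of Z are <= 0)."""
    t = np.maximum(-np.einsum('...i,ij,...j->...', q @ M, Z, q @ M), 1e-12)
    return np.exp(-np.sqrt(t)) / np.sqrt(t)

rng = np.random.default_rng(0)
M = np.eye(4)                          # orthogonal; identity for simplicity
Z = np.diag([0.0, -2.0, -5.0, -9.0])   # 0 >= z1 >= z2 >= z3

# Antipodal symmetry: q and -q have exactly the same density.
q = rng.normal(size=4)
q /= np.linalg.norm(q)
assert np.isclose(ql_unnorm(q, M, Z), ql_unnorm(-q, M, Z))

# Normalization factor by Monte Carlo over S^3 (area 2*pi^2),
# a stand-in for the equivolumetric grid G_q of the paper.
qs = rng.normal(size=(100_000, 4))
qs /= np.linalg.norm(qs, axis=1, keepdims=True)
F = 2 * np.pi**2 * np.mean(ql_unnorm(qs, M, Z))
```

The symmetry holds exactly because the density depends on q only through the quadratic form q^T M Z M^T q.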

5. EXPERIMENT

Following the previous state of the art (Murphy et al., 2021; Mohlin et al., 2020), we evaluate our method on the task of object rotation estimation from single RGB images, where the object rotation is the relative rotation between the input object and the object in the canonical pose. Concerning this task, we find two independent research tracks with slightly different evaluation settings. One line of research focuses on probabilistic rotation regression with different parametric or non-parametric distributions on SO(3) (Prokudin et al., 2018; Gilitschenski et al., 2019; Deng et al., 2022; Mohlin et al., 2020; Murphy et al., 2021), and the other, non-probabilistic track proposes multiple rotation representations (Zhou et al., 2019; Levinson et al., 2020; Peretroukhin et al., 2020) or improves the gradient of backpropagation (Chen et al., 2022). To fully demonstrate the capacity of our Rotation Laplace distribution, we leave the baselines in their original optimal states and adapt our method to follow the common experimental settings of each track, respectively.

Table 1 (caption): Results reported as Acc@3°↑, Acc@5°↑, Acc@10°↑, Acc@15°↑ and Acc@30°↑. Numbers in parentheses (•) are our reproduced results. Please refer to the supplementary for per-category comparisons.

Evaluation metrics We evaluate our experiments with the geodesic distance between the network prediction and the ground truth. This metric returns the angular error, which we measure in degrees. In addition, we report the prediction accuracy within a given error threshold.
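The evaluation metrics can be stated compactly in code. The following sketch (function names are ours) computes the geodesic angular error in degrees and the accuracy at a threshold:

```python
import numpy as np

def geodesic_error_deg(R_pred, R_gt):
    """Geodesic (angular) distance in degrees between two rotation matrices:
    theta = arccos((tr(R_pred^T R_gt) - 1) / 2)."""
    cos = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def accuracy_at(errors_deg, threshold_deg):
    """Fraction of predictions with angular error below the threshold,
    e.g. Acc@30 for threshold_deg = 30."""
    return float(np.mean(np.asarray(errors_deg) < threshold_deg))
```

The clip guards against arccos receiving values slightly outside [−1, 1] due to floating-point error.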

5.1. DATASETS & EVALUATION METRICS

Table 2 (caption): Results reported as Acc@3°↑, Acc@5°↑, Acc@10°↑, Acc@15°↑, Acc@30°↑ and Med.(°)↓.

5.2.1. EVALUATION SETUP

Settings In this section, we follow the experiment settings of the latest work (Murphy et al., 2021) and quote its reported numbers for baselines. Specifically, we train one single model for all categories of each dataset. For Pascal3D+ dataset, we follow Murphy et al. (2021) in using the (more challenging) PascalVOC val as the test set. Note that Murphy et al. (2021) only measure coarse-scale accuracy (e.g., Acc@30°), which may not adequately satisfy the downstream tasks (Wang et al., 2019b; Fang et al., 2020). To facilitate finer-scale comparisons (e.g., Acc@5°), we further re-run several recent baselines and report the reproduced results in parentheses (•). Baselines We compare our method to recent works which utilize probabilistic distributions on SO(3).

5.2.2. RESULTS

Table 1 shows the quantitative comparison of our method and the baselines on ModelNet10-SO3 dataset. Across the multiple evaluation metrics, we can see that maximum likelihood estimation under the assumption of Rotation Laplace distribution significantly outperforms the other distributions for rotation, including matrix Fisher distribution (Mohlin et al., 2020), Bingham distribution (Do et al., 2018) and von Mises distribution (Prokudin et al., 2018). Our method also achieves superior performance to the non-parametric implicit-PDF (Murphy et al., 2021). In particular, our method improves the fine-scale Acc@3° and Acc@5° accuracy by a large margin, showing its capacity to precisely model the target distribution. The experiments on Pascal3D+ dataset are shown in Table 2, where our Rotation Laplace distribution outperforms all the baselines. While our method achieves reasonably good performance on the median error and coarser-scale accuracy, we do not find a similarly impressive improvement on fine-scale metrics as on ModelNet10-SO3 dataset. We suspect this is because the imperfect human annotations of real-world images lead to comparatively noisy ground truths, making it difficult for networks to produce predictions very close to the GT labels. Nevertheless, our method still obtains superior performance, which illustrates the robustness of our Rotation Laplace distribution.

5.3.1. EVALUATION SETUP

Settings For comparisons with non-probabilistic methods, we follow the latest work of Chen et al. (2022) to learn a network for each category. For Pascal3D+ dataset, we follow Chen et al. (2022) to use ImageNet val as our test set. We use the same evaluation metrics as in Chen et al. (2022) and quote its reported numbers for baselines. Baselines We compare to multiple baselines that leverage different rotation representations to directly regress the prediction given input images, including 6D (Zhou et al., 2019), 9D / 9D-Inf (Levinson et al., 2020) and 10D (Peretroukhin et al., 2020). We also include the regularized projective manifold gradient (RPMG) series of methods (Chen et al., 2022).

5.3.2. RESULTS

We report the numerical results of our method and non-probabilistic baselines on ModelNet10-SO3 dataset in Table 3. Our method obtains clearly superior performance to the best competitor under all metrics and across all categories. Note that we train one model for each category (as do all the baselines).

5.4. QUALITATIVE RESULTS

We visualize the predicted distributions in Figure 2 with the visualization method of Mohlin et al. (2020). As shown in the figure, the predicted distributions can exhibit high uncertainty when the object has rotational symmetry, leading to near-180° errors (a-c), or when the input image has low resolution (d). Subfigures (e-f) show cases with high certainty and reasonably low errors. Please refer to the supplementary for more visual results.

5.5. IMPLEMENTATION DETAILS

For fair comparisons, we follow the implementation designs of Mohlin et al. (2020) and merely change the distribution from matrix Fisher distribution to our Rotation Laplace distribution. For numerical stability, we clip tr(S − A^T R) in Eq. 2 to max(10⁻⁸, tr(S − A^T R)). Please refer to the supplementary for more details.

5.6. COMPARISONS OF ROTATION LAPLACE DISTRIBUTION AND QUATERNION LAPLACE DISTRIBUTION

For the completeness of the experiments, we also compare our proposed Quaternion Laplace distribution with Bingham distribution and report the performance in Table 5. As shown in the table, Quaternion Laplace distribution consistently achieves superior performance to its competitor, which validates the effectiveness of our Laplace-inspired derivations. However, its rotation error is in general larger than that of Rotation Laplace distribution, since its rotation representation, the quaternion, is not a continuous representation, as pointed out in Zhou et al. (2019), leading to inferior performance.

6. CONCLUSION

In this paper, we draw inspiration from the multivariate Laplace distribution and derive two novel distributions for probabilistic rotation regression, namely, Rotation Laplace distribution for rotation matrices on SO(3) and Quaternion Laplace distribution for quaternions on S^3. Extensive comparisons with both probabilistic and non-probabilistic baselines on ModelNet10-SO3 and Pascal3D+ datasets demonstrate the effectiveness and advantages of our proposed distributions.

A.1 LIE GROUP SO(3) AND LIE ALGEBRA so(3)

We briefly review the relevant background (Lee, 2018a; Teed & Deng, 2021; Sola et al., 2018). The three-dimensional special orthogonal group is defined as
$$SO(3) = \{R \in \mathbb{R}^{3\times3} \mid RR^T = I,\ \det(R) = 1\}.$$
The Lie algebra of SO(3), denoted by so(3), is the tangent space of SO(3) at I, given by
$$\mathfrak{so}(3) = \{\Phi \in \mathbb{R}^{3\times3} \mid \Phi = -\Phi^T\}.$$
so(3) is identified with (R^3, ×) by the hat (∧) and vee (∨) maps:
$$\hat\phi = \begin{pmatrix} 0 & -\phi_z & \phi_y \\ \phi_z & 0 & -\phi_x \\ -\phi_y & \phi_x & 0 \end{pmatrix} \in \mathfrak{so}(3) \;\underset{\wedge}{\overset{\vee}{\rightleftarrows}}\; \phi = \begin{pmatrix}\phi_x \\ \phi_y \\ \phi_z\end{pmatrix} \in \mathbb{R}^3.$$
The exponential map, taking skew-symmetric matrices to rotation matrices, is given by
$$\exp(\hat\phi) = \sum_{k=0}^{\infty} \frac{\hat\phi^k}{k!} = I + \frac{\sin\theta}{\theta}\hat\phi + \frac{1-\cos\theta}{\theta^2}\hat\phi^2, \qquad \theta = \|\phi\|.$$
The exponential map can be inverted by the logarithm map, going from SO(3) to so(3):
$$\log(R) = \frac{\theta}{2\sin\theta}(R - R^T), \qquad \theta = \arccos\left(\frac{\mathrm{tr}(R)-1}{2}\right).$$
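The hat/vee maps and the exponential/logarithm maps above translate directly into code; a minimal NumPy sketch (function names are ours):

```python
import numpy as np

def hat(phi):
    """so(3) hat map: vector in R^3 -> 3x3 skew-symmetric matrix."""
    x, y, z = phi
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def vee(Phi):
    """Vee map: 3x3 skew-symmetric matrix -> vector in R^3."""
    return np.array([Phi[2, 1], Phi[0, 2], Phi[1, 0]])

def so3_exp(phi):
    """Rodrigues' formula: exponential map R^3 -> SO(3)."""
    theta = np.linalg.norm(phi)
    if theta < 1e-8:                       # first-order fallback near zero
        return np.eye(3) + hat(phi)
    K = hat(phi)
    return (np.eye(3) + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta**2 * K @ K)

def so3_log(R):
    """Logarithm map SO(3) -> R^3 (axis-angle vector), valid for theta < pi."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if theta < 1e-8:
        return vee(R - R.T) / 2.0
    return vee(theta / (2.0 * np.sin(theta)) * (R - R.T))
```

A quick round-trip check, `so3_log(so3_exp(phi)) == phi` for ‖phi‖ < π, confirms the two maps are mutually inverse on that domain.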

A.2 HAAR MEASURE

To evaluate the normalization factors and therefore the probability density functions, the measure dR on SO(3) needs to be defined. For the Lie group SO(3), the commonly used bi-invariant measure is referred to as the Haar measure (Haar, 1933; James, 1999). The Haar measure is unique up to scalar multiples (Chirikjian, 2000), and we follow the common practice (Mohlin et al., 2020; Lee, 2018a) of scaling the Haar measure dR such that $\int_{SO(3)} dR = 1$.

B MORE ANALYSIS ON GRADIENT W.R.T. OUTLIERS

In the task of rotation regression, predictions with very large errors (e.g., 180° error) are fairly common due to rotational ambiguity or a lack of discriminative visual features. Properly handling these outliers during training is one of the keys to successful probabilistic modeling of rotations. In Figure 3, for matrix Fisher distribution and Rotation Laplace distribution, we visualize the gradient magnitudes ‖∂L/∂(distribution param.)‖ w.r.t. the prediction errors on ModelNet10-SO3 dataset after convergence, where each point is a data point in the test set. As shown in the figure, for matrix Fisher distribution, predictions with larger errors clearly yield larger gradient magnitudes, and those with near-180° errors (the outliers) have the biggest impact. Given that outliers may be inevitable and hard to fix, they can severely disturb the training process, and this sensitivity to outliers can result in a poor fit (Murphy, 2012; Nair et al., 2022). In contrast, for our Rotation Laplace distribution, the gradient magnitudes are not much affected by the prediction errors, leading to a stable learning process. Consistent results can also be seen in Figure 1 of the main paper, where the red dots illustrate the sum of the gradient magnitude over the population within each interval of prediction errors. We argue that, at convergence, the gradient should focus more on the large population with low errors rather than on fixing the unavoidable large errors.

C UNCERTAINTY QUANTIFICATION MEASURED BY DISTRIBUTION ENTROPY

Probabilistic modeling of rotation naturally provides uncertainty information about the rotation regression. Yin et al. (2022) propose to use the entropy of the distribution as an uncertainty measure. We adopt it as the uncertainty indicator of Rotation Laplace distribution and plot the relationship between the prediction error and the corresponding distribution entropy on the test sets of ModelNet10-SO3 and Pascal3D+ datasets in Figure 4. As shown in the figure, predictions with lower entropies (i.e., lower uncertainty) clearly achieve higher accuracy than predictions with large entropies, demonstrating the uncertainty-estimation ability of our Rotation Laplace distribution. We compute the entropy via discretization, where SO(3) space is quantized into a finite set of equivolumetric grids G = {R | R ∈ SO(3)}, and
$$H(p) = -\int_{SO(3)} p\log p\,dR \approx -\sum_{R_i\in G} p_i \log p_i\,\Delta R_i.$$
We use about 0.3M grids to discretize SO(3) space.
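A sketch of this entropy computation, assuming the log-density has been evaluated at each of the |G| equivolumetric grid points (the renormalization step is our addition, so the discrete density integrates to 1 under the unit-volume Haar measure):

```python
import numpy as np

def entropy_on_grid(log_p):
    """Entropy H = -sum_i p_i log(p_i) * DeltaR of a density on SO(3),
    from its log-density on an equivolumetric grid (DeltaR = 1/|G|).
    Values are renormalized so that mean(p) = 1, i.e. the density
    integrates to 1 under the Haar measure scaled to total volume 1."""
    n = log_p.shape[0]
    log_p = log_p - (np.log(np.sum(np.exp(log_p))) - np.log(n))
    return float(-np.mean(np.exp(log_p) * log_p))
```

Under this convention the uniform density has entropy 0, and concentrated (low-uncertainty) densities have negative entropy, so lower values indicate more confident predictions.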

D EFFECT OF DIFFERENT NUMBERS OF DISCRETIZATION SAMPLES

To compute the normalization factor of our distribution, we discretize SO(3) space into a finite set of equivolumetric grids using the Hopf fibration. Here we compare different numbers of samples. We experiment with the ModelNet10-SO3 toilet category on a single 3090 GPU. As shown in Table 6, approximation with too few samples leads to inferior performance, and increasing the number of samples yields better performance at the cost of a longer runtime. The performance improvement saturates once the number of samples is sufficient. We choose to use 37k samples in our experiments.

E.1 ADDITIONAL NUMERICAL RESULTS

Table 7 and 8 extend the results on ModelNet10-SO3 dataset and Pascal3D+ dataset in the main paper and show the per-category results. Our prediction with Rotation Laplace distribution is at or near state-of-the-art on many categories. The numbers for baselines are quoted from Murphy et al. (2021) .

E.2 ADDITIONAL VISUAL RESULTS

We show additional visual results on ModelNet10-SO3 dataset in Figure 5 and on Pascal3D+ dataset in Figure 6. As shown in the figures, our distribution provides rich information about the rotation estimations. To visualize the predicted distributions, we adopt two popular visualization methods, used in Mohlin et al. (2020) and Murphy et al. (2021) respectively. The visualization in Mohlin et al. (2020) sums the three marginal distributions over the standard basis of R^3 and displays them on the sphere with color coding. Murphy et al. (2021) introduce a visualization based on discretization over SO(3): it projects a great circle of points on SO(3) to each point on the 2-sphere, then uses a color wheel to indicate the location on the great circle, with the probability density shown by the size of the points on the plot. See the corresponding papers for more details.

F PROOFS

Proof of Proposition 1. Considering Eq. 5 in the main paper, we have
$$\mathrm{tr}(S V^T\tilde R V) = \mathrm{tr}(S) + \sum_{(i,j,k)\in I} -\tfrac12(s_j+s_k)\,\mu_i^2 + O(\|\phi\|^3) = \mathrm{tr}(S) - \tfrac12\,\phi^T V\,\mathrm{diag}(s_2+s_3,\; s_1+s_3,\; s_1+s_2)\,V^T\phi + O(\|\phi\|^3). \qquad (10)$$
Thus
$$p(R)\,dR \propto \exp\left(\mathrm{tr}(A^T R)\right)dR = \frac{\exp(\mathrm{tr}(S))}{8\pi^2}\exp\left(-\tfrac12\,\phi^T\Sigma^{-1}\phi\right)\left(1+O(\|\phi\|^2)\right)d\phi. \qquad (11)$$
When ‖R − R₀‖ → 0, we have ‖R̃ − I‖ → 0 and ϕ → 0, so Eq. 11 follows the multivariate Gaussian distribution with covariance matrix $\Sigma = V\,\mathrm{diag}\!\left(\frac{1}{s_2+s_3}, \frac{1}{s_1+s_3}, \frac{1}{s_1+s_2}\right)V^T$. □

Proposition 3 in the main paper. Denote q₀ as the mode of Quaternion Laplace distribution. Let π be the tangent space of S^3 at q₀, and π(x) ∈ R^4 be the projection of x ∈ R^4 onto π. For a quaternion q ∈ S^3 following Bingham distribution / Quaternion Laplace distribution, when q → q₀, π(q) follows a zero-mean multivariate Gaussian distribution / zero-mean multivariate Laplace distribution.

Proof. Denote q_I = (1, 0, 0, 0)^T as the identity quaternion. Define M as an orthogonal matrix such that M^T q₀ = q_I.
Given π(q) = q − (q · q₀)q₀, we have
$$M^T\pi(q) = M^Tq - \left((M^Tq)\cdot(M^Tq_0)\right)q_I = M^Tq - w\,q_I,$$
where $M^Tq = (w, x, y, z)^T$. Let (e₀, e₁, e₂, e₃) be the column vectors of I₄ₓ₄; then $(Me_i)\cdot q_0 = e_i\cdot q_I = 0$ for i = 1, 2, 3, so Me_i (i = 1, 2, 3) form an orthogonal basis of π. Given $M^Tq = we_0 + xe_1 + ye_2 + ze_3$, we have
$$q = w(Me_0) + x(Me_1) + y(Me_2) + z(Me_3).$$
Therefore, η = (x, y, z) is the coordinate of π(q) in π under the basis Me_i, and the Jacobian of the transformation q → η contributes only higher-order terms near the mode, i.e., $dq = (1 + O(\|\eta\|^2))\,d\eta$. Define $\tilde Z = \mathrm{diag}(z_1, z_2, z_3)$. For Bingham distribution, we have
$$p(q)\,dq \propto \exp\left(q^T M Z M^T q\right)dq = \exp\left(\eta^T \tilde Z \eta\right)\left(1+O(\|\eta\|^2)\right)d\eta = \exp\left(-\eta^T\Sigma^{-1}\eta\right)\left(1+O(\|\eta\|^2)\right)d\eta, \qquad (18)$$
which follows the multivariate Gaussian distribution with covariance matrix $\Sigma = -\,\mathrm{diag}\!\left(\frac1{z_1}, \frac1{z_2}, \frac1{z_3}\right)$. For Quaternion Laplace distribution, we have
$$p(q)\,dq \propto \frac{\exp\left(-\sqrt{-q^TMZM^Tq}\right)}{\sqrt{-q^TMZM^Tq}}\,dq = \frac{1}{\sqrt2}\,\frac{\exp\left(-\sqrt{-\eta^T\tilde Z\eta}\right)}{\sqrt{-\eta^T\tilde Z\eta}}\left(1+O(\|\eta\|^2)\right)d\eta = \frac{1}{\sqrt2}\,\frac{\exp\left(-\sqrt{2\,\eta^T\Sigma^{-1}\eta}\right)}{\sqrt{2\,\eta^T\Sigma^{-1}\eta}}\left(1+O(\|\eta\|^2)\right)d\eta,$$
which follows the multivariate Laplace distribution with covariance matrix $\Sigma = -2\,\mathrm{diag}\!\left(\frac1{z_1}, \frac1{z_2}, \frac1{z_3}\right)$. □

Proposition 4 in the main paper. Denote γ as the standard transformation from unit quaternions to the corresponding rotation matrices. For a rotation matrix R ∈ SO(3) following Rotation Laplace distribution, q = γ⁻¹(R) ∈ S^3 follows Quaternion Laplace distribution.

Proof. For a quaternion q = (q₀, q₁, q₂, q₃), we use the standard transformation γ to compute its corresponding rotation matrix:
$$\gamma(q) = \begin{pmatrix} 1-2q_2^2-2q_3^2 & 2q_1q_2-2q_0q_3 & 2q_1q_3+2q_0q_2 \\ 2q_1q_2+2q_0q_3 & 1-2q_1^2-2q_3^2 & 2q_2q_3-2q_0q_1 \\ 2q_1q_3-2q_0q_2 & 2q_2q_3+2q_0q_1 & 1-2q_1^2-2q_2^2 \end{pmatrix}.$$
Let $u = \gamma^{-1}(U)$, $v = \gamma^{-1}(V)$ and
$$\tilde q = (\tilde q_0, \tilde q_1, \tilde q_2, \tilde q_3)^T = \gamma^{-1}(U^T R V) = \bar u q v. \qquad (21)$$
Note that the transformation $q \mapsto \bar u q v$ is an orthogonal transformation on S^3.
Therefore, there exists an orthogonal matrix $M$ such that
$$M^T q = \bar{u} q v = \hat{q}. \tag{22}$$
Since $\gamma$ maps the uniform distribution on $S^3$ to the uniform distribution on SO(3), the scaling factor from quaternions to rotation matrices is a constant, i.e., $dR \propto dq$. Moreover, substituting Eq. 20 into $\mathrm{tr}(A^T R) = \mathrm{tr}\!\left(S\, \gamma(\hat{q})\right)$ gives
$$\mathrm{tr}(S) - \mathrm{tr}(A^T R) = 2\left[(s_2+s_3)\hat{q}_1^2 + (s_1+s_3)\hat{q}_2^2 + (s_1+s_2)\hat{q}_3^2\right] = -\hat{q}^T Z \hat{q}.$$
Therefore,
$$p(q)\,dq = \frac{1}{2\pi^2 F} \exp\!\left(-\sqrt{-q^T M Z M^T q}\right) dq, \tag{26}$$
where $M$ is an orthogonal matrix and $Z = -2\,\mathrm{diag}(0,\; s_2+s_3,\; s_1+s_3,\; s_1+s_2)$ is a $4 \times 4$ diagonal matrix.
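The key identity behind Eq. 26 can be checked numerically. The following sketch (our own verification code, with arbitrary hypothetical singular values $s_1 \ge s_2 \ge s_3$, not values from the paper) confirms that $\mathrm{tr}(S) - \mathrm{tr}(S\,\gamma(\hat{q})) = -\hat{q}^T Z \hat{q}$ for a random unit quaternion:

```python
import numpy as np

def gamma(q):
    """Standard quaternion-to-rotation-matrix map (Eq. 20 above)."""
    q0, q1, q2, q3 = q
    return np.array([
        [1 - 2*q2**2 - 2*q3**2, 2*q1*q2 - 2*q0*q3,     2*q1*q3 + 2*q0*q2],
        [2*q1*q2 + 2*q0*q3,     1 - 2*q1**2 - 2*q3**2, 2*q2*q3 - 2*q0*q1],
        [2*q1*q3 - 2*q0*q2,     2*q2*q3 + 2*q0*q1,     1 - 2*q1**2 - 2*q2**2]])

rng = np.random.default_rng(0)
s = np.array([3.0, 2.0, 1.0])               # hypothetical singular values
S = np.diag(s)

qhat = rng.standard_normal(4)
qhat /= np.linalg.norm(qhat)                # random unit quaternion
Rhat = gamma(qhat)                          # corresponding rotation matrix

# Z = -2 diag(0, s2+s3, s1+s3, s1+s2) as in Eq. 26
Z = -2.0 * np.diag([0.0, s[1] + s[2], s[0] + s[2], s[0] + s[1]])
lhs = np.trace(S) - np.trace(S @ Rhat)      # tr(S) - tr(S gamma(qhat))
rhs = -qhat @ Z @ qhat                      # -qhat^T Z qhat
print(abs(lhs - rhs))                       # agreement up to floating point
```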

Elaboration of Eq. 3 in the main paper

Given $R_0 = UV^T$ and $\hat{R} = R_0^T R$,

G MORE IMPLEMENTATION DETAILS

For fair comparisons, we follow the implementation designs of Mohlin et al. (2020) and merely change the distribution from matrix Fisher distribution to our Rotation Laplace distribution. We use a pretrained ResNet-101 as our backbone, and encode the object class information (for single-model-all-category experiments) by an embedding layer that produces a 32-dim vector. We apply a 512-512-9 MLP as the output layer. The batch size is set to 32. We use the SGD optimizer with an initial learning rate of 0.01. For the ModelNet10-SO3 dataset, we train for 50 epochs, decaying the learning rate by a factor of 10 at epochs 30, 40 and 45. For the Pascal3D+ dataset, we train for 120 epochs with the same learning rate decay at epochs 30, 60 and 90.
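A minimal PyTorch sketch of this setup is given below. The class and layer names are ours, and a tiny convolutional stand-in replaces the pretrained ResNet-101 (which produces 2048-dim features) so the snippet stays self-contained; only the class embedding, the 512-512-9 MLP head, and the SGD schedule follow the description above:

```python
import torch
import torch.nn as nn

class RotationHead(nn.Module):
    """Image features + 32-dim class embedding -> 512-512-9 MLP that
    outputs the unconstrained distribution parameter A (reshaped 3x3)."""
    def __init__(self, num_classes=10, feat_dim=64, embed_dim=32):
        super().__init__()
        # Stand-in backbone; the paper uses a pretrained ResNet-101.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(), nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.class_embed = nn.Embedding(num_classes, embed_dim)
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + embed_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, 9))

    def forward(self, img, cls_id):
        feat = torch.cat([self.backbone(img), self.class_embed(cls_id)], dim=1)
        return self.mlp(feat).view(-1, 3, 3)

model = RotationHead()
# SGD at lr 0.01, decayed 10x at the ModelNet10-SO3 milestone epochs
opt = torch.optim.SGD(model.parameters(), lr=0.01)
sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[30, 40, 45], gamma=0.1)

A = model(torch.randn(2, 3, 64, 64), torch.tensor([0, 3]))
print(A.shape)  # torch.Size([2, 3, 3])
```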



† He Wang and Baoquan Chen are the corresponding authors ({hewang, baoquan}@pku.edu.cn).



.3 DISCRETE APPROXIMATION OF THE NORMALIZATION FACTOR

Efficiently and accurately estimating the normalization factor of distributions over SO(3) is nontrivial. Inspired by Murphy et al. (2021), we approximate the normalization factor of Rotation Laplace distribution through equivolumetric discretization of the SO(3) manifold. We employ the discretization method introduced in Yershova et al. (2010), which starts with the equal-area grids on the 2-sphere (Gorski et al., 2005) and covers SO(3) by threading a great circle through each point on the surface of the 2-sphere with the Hopf fibration. Concretely, we discretize SO(3) into a finite set of equivolumetric grids $\mathcal{G} = \{R \mid R \in SO(3)\}$, and the normalization factor of Rotation Laplace distribution is computed as
$$F \approx \frac{1}{|\mathcal{G}|} \sum_{R \in \mathcal{G}} \exp\!\left(-\sqrt{\mathrm{tr}(S) - \mathrm{tr}(A^T R)}\right).$$
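The computation can be sketched as follows. For self-containment, this illustration substitutes a Haar-uniform Monte-Carlo sample for the equivolumetric Hopf-fibration grid (the grid itself is outside the scope of this sketch), and the function names are ours:

```python
import numpy as np

def uniform_rotations(n, rng):
    """Haar-uniform rotations via QR decomposition: a stand-in for the
    equivolumetric Hopf-fibration grid of Yershova et al. (2010)."""
    out = np.empty((n, 3, 3))
    for i in range(n):
        Q, R = np.linalg.qr(rng.standard_normal((3, 3)))
        Q = Q * np.sign(np.diag(R))     # fix QR sign ambiguity
        if np.linalg.det(Q) < 0:
            Q[:, 0] = -Q[:, 0]          # force det = +1
        out[i] = Q
    return out

def normalization_factor(A, n=20_000, seed=0):
    """Approximate F as the average of exp(-sqrt(tr(S) - tr(A^T R)))
    over (approximately) equivolumetric samples of SO(3)."""
    U, s, Vt = np.linalg.svd(A)
    s = s.copy()
    s[-1] *= np.linalg.det(U) * np.linalg.det(Vt)   # proper SVD
    Rs = uniform_rotations(n, np.random.default_rng(seed))
    # tr(A^T R) = sum_ij A_ij R_ij; clamp tiny negatives from round-off
    expo = np.maximum(s.sum() - np.einsum("ij,nij->n", A, Rs), 0.0)
    return np.exp(-np.sqrt(expo)).mean()

F = normalization_factor(5.0 * np.eye(3))
```

Since the integrand attains its maximum value of 1 only at the mode, the estimate lies strictly between 0 and 1.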

ModelNet10-SO3 (Liao et al., 2019) is a commonly used synthetic dataset for single-image rotation estimation containing 10 object classes. It is synthesized by rendering the CAD models of the ModelNet-10 dataset (Wu et al., 2015), rotated by uniformly sampled rotations in SO(3). Pascal3D+ (Xiang et al., 2014) is a popular benchmark on real-world images for pose estimation. It covers 12 common daily object categories. The images in the Pascal3D+ dataset are sourced from the Pascal VOC and ImageNet datasets, and are split into ImageNet train, ImageNet val, PascalVOC train, and PascalVOC val sets.

3) for the purpose of pose estimation. Concretely, the baselines use a mixture of von Mises distributions (Prokudin et al., 2018), Bingham distribution (Gilitschenski et al., 2019; Deng et al., 2022), matrix Fisher distribution (Mohlin et al., 2020) and Implicit-PDF (Murphy et al., 2021). We also compare to the spherical regression work of Liao et al. (2019), as Murphy et al. (2021) does.

Figure 2: Visualizations of the predicted distributions. The top row displays example images with the projected axes of predictions (thick lines) and ground truths (thin lines) of the object. The bottom row shows the visualization of the corresponding predicted distributions of the image. For clarity we have aligned the predicted poses with the standard axes.

Figure 4: Visualization of the indication ability of the distribution entropy w.r.t. the performance. The horizontal axis is the distribution entropy and the vertical axis is the number of data points (in log scale), color coded by the errors (in degrees). The experiments are done on the test set of ModelNet10-SO3 dataset (left) and Pascal3D+ dataset (right).

Figure 5: Visual results on ModelNet10-SO3 dataset. We adopt the distribution visualization methods in Mohlin et al. (2020) and Murphy et al. (2021). For input images and visualizations with Mohlin et al. (2020), predicted rotations are shown with thick lines and the ground truths are with thin lines. For visualizations with Murphy et al. (2021), ground truths are shown by solid circles.

makes use of the Laplace distribution to improve the robustness of maximum likelihood-based uncertainty estimation. Due to the heavy-tailed property of the Laplace distribution, outlier data produce comparatively less loss and have an insubstantial impact on training. Beyond Euclidean space, Mitianoudis (2012) develops a Generalized Directional Laplacian distribution on $S^d$ for the application of audio separation.
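As a one-dimensional toy illustration of this robustness argument (our own example, not the paper's SO(3) losses): the gradient of the Gaussian negative log-likelihood grows linearly with the residual, so one gross outlier dominates the parameter update, whereas the Laplace gradient magnitude is bounded:

```python
import numpy as np

# Residuals from a batch of predictions, including one gross outlier
r = np.array([0.1, -0.2, 0.05, 8.0])

# Gradients of the unit-scale NLLs with respect to the residual:
grad_gauss = r                 # d/dr of 0.5 * r^2: grows with the error
grad_laplace = np.sign(r)      # d/dr of |r|: bounded magnitude

# Under the Gaussian loss the outlier's gradient is 80x an inlier's;
# under the Laplace loss every sample contributes equally.
print(np.abs(grad_gauss))
print(np.abs(grad_laplace))
```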

Numerical comparisons with probabilistic baselines on ModelNet10-SO3 dataset, averaged over all categories. Numbers in parentheses (•) are our reproduced results. Please refer to the supplementary material for per-category comparisons.

Numerical comparisons with probabilistic baselines on Pascal3D+ dataset averaged on all categories.

Numerical comparisons with non-probabilistic baselines on ModelNet10-SO3 dataset. One model is trained for each category.

Numerical comparisons with non-probabilistic baselines on Pascal3D+ dataset. One model is trained for each category.

Numerical comparisons with our proposed Quaternion & Rotation Laplace distribution and baselines on ModelNet10-SO3 dataset. One model is trained for each category. Quaternion Laplace distribution clearly outperforms Bingham distribution (Deng et al., 2022).

Yu Xiang, Tanner Schmidt, Venkatraman Narayanan, and Dieter Fox. PoseCNN: A convolutional neural network for 6D object pose estimation in cluttered scenes. arXiv preprint arXiv:1711.00199, 2017.

Yi Zhou, Connelly Barnes, Jingwan Lu, Jimei Yang, and Hao Li. On the continuity of rotation representations in neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5745-5753, 2019.

Comparison of different numbers of discretization samples. The experiment is done on the toilet category of the ModelNet10-SO3 dataset on a single 3090 GPU.

Per-category results on Pascal3D+ dataset.

ACKNOWLEDGEMENT

We thank Haoran Liu from Peking University for the help with experiments. This work is supported in part by the National Key R&D Program of China (2022ZD0160801).

