LEARNING TO REGISTER UNBALANCED POINT PAIRS

Abstract

Point cloud registration methods can effectively handle large-scale, partially overlapping point cloud pairs. However, matching pairs that are unbalanced in spatial extent and density has been overlooked and rarely studied, despite its practical importance. We present a novel method, dubbed UPPNet, for Unbalanced Point cloud Pair registration. We propose a hierarchical framework that effectively finds inlier correspondences by gradually reducing the search space. The proposed method first predicts subregions within the target point cloud that are likely to overlap with the query. The subsequent super-point matching and fine-grained refinement modules then predict accurate inlier correspondences between the target and the query. Additional geometric constraints are applied to retain only the correspondences that satisfy spatial compatibility. The proposed network can be trained in an end-to-end manner and predicts an accurate rigid transformation in a single forward pass. To validate the efficacy of the proposed method, we create a carefully designed benchmark, named the KITTI-UPP dataset, by augmenting the KITTI odometry dataset. Extensive experiments reveal that the proposed method not only outperforms state-of-the-art point cloud registration methods by large margins on the KITTI-UPP benchmark, but also achieves competitive results on standard pairwise registration benchmarks including 3DMatch, 3DLoMatch, ScanNet, and KITTI, thus showing the applicability of our method to various datasets. The source code and dataset will be publicly released.

1. INTRODUCTION

Point cloud registration is a task that aims to recover the 3D rigid transformation between two possibly overlapping point cloud fragments. The rapid advance of commodity 3D sensors gives rise to the need for efficient point cloud registration algorithms in numerous real-world applications, including 3D reconstruction for virtual-, augmented-, and mixed-reality applications, and the navigation systems of autonomous vehicles or robotic agents. Recent work has made remarkable progress in developing learning-based point cloud registration algorithms for tackling real-world 3D scans (Geiger et al., 2012; Zeng et al., 2017) with high-resolution feature extraction (Choy et al., 2019b; Bai et al., 2020) in the presence of a low ratio of inlier correspondences (Choy et al., 2020b; Bai et al., 2021; Lee et al., 2021) or a small overlap region between point pairs (Huang et al., 2021). However, the imbalance in spatial extent and point density between the input point clouds is often overlooked, despite its practical relevance to problems such as incremental mapping or the registration of partial observations against a holistic environment. For instance, there are sensible solutions for registering a pair of 3D LiDAR scans, but registering a single LiDAR scan against a large-scale 3D map remains challenging. A viable solution is to apply a global localization approach (Uy & Lee, 2018; Komorowski, 2021; Du et al., 2020; Zhang & Xiao, 2019; Liu et al., 2019), but the existing methods cast the problem as a retrieval task and assume that the 3D map is given as a set of overlapping 3D scans rather than a holistic map, which is not generally applicable to unbalanced point pairs.
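For reference, the rigid transformation that registration seeks is commonly written as a least-squares problem over a set of correspondences. The notation below (a correspondence set C of matched points p_i in the query and q_i in the target) is assumed here for illustration and is not taken from the paper.

```latex
(R^{*}, t^{*}) \;=\; \operatorname*{arg\,min}_{R \in SO(3),\; t \in \mathbb{R}^{3}}
\sum_{(p_i,\, q_i) \in \mathcal{C}} \left\lVert R\,p_i + t - q_i \right\rVert_2^{2}
```

Under this formulation, the quality of the estimate depends on how many pairs in C are true inliers, which is why a larger and denser target point cloud, with more candidate matches per query point, makes the problem harder.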
Recent feature-based pairwise point cloud registration methods are equipped with matchability detection (Bai et al., 2020), overlap detection (Huang et al., 2021), or hierarchical correspondence prediction (Yu et al., 2021), which are potentially advantageous for registering unbalanced point clouds. However, we empirically found that they break down on unbalanced point clouds. Point cloud description and matching in modern feature-based registration methods tend to be distracted by similar geometric structures that often appear in the larger point cloud. To address this, we propose UPPNet, the first neural architecture that is designed to be efficient for large-scale Unbalanced Point cloud Pair registration. UPPNet is a hierarchical framework that effectively finds inlier correspondences by gradually reducing the search space. At the coarsest level, a submap proposal module proposes the subregions that are likely to overlap with the query by utilizing global geometric context. Then, the coarse-to-fine matching module predicts accurate point-level correspondences by utilizing attention-based context aggregation and solving optimal transport problems. The subsequent structured matching module filters out outlier correspondences that violate spatial compatibility. To evaluate our method, we create a carefully designed benchmark, namely the KITTI-UPP dataset, for matching point cloud pairs under diverse imbalances in spatial extent and point density, by augmenting the KITTI odometry dataset (Geiger et al., 2012). The experiments show that our method improves the Registration Recall on the KITTI-UPP dataset by over 19.6% compared with state-of-the-art registration pipelines when the target point cloud is 11.1 times spatially larger and 11.7 times denser than the query point cloud.
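One common reading of spatial compatibility is that a rigid transformation preserves pairwise distances, so two correct correspondences must agree on the distance between their endpoints. The following is a minimal sketch of that check under stated assumptions, not UPPNet's structured matching module: the function name, the tolerance `tau`, and the voting threshold `min_support` are all hypothetical.

```python
import numpy as np


def spatial_compatibility_filter(src, tgt, tau=0.1, min_support=0.5):
    """Return a boolean mask over putative correspondences.

    src, tgt: (N, 3) arrays; row i of src is putatively matched to row i of tgt.
    tau: hypothetical distance tolerance, in the same units as the points.
    min_support: hypothetical fraction of the other correspondences that a
        kept correspondence must be distance-compatible with.
    """
    src = np.asarray(src, dtype=float)
    tgt = np.asarray(tgt, dtype=float)

    # Pairwise Euclidean distances within each point set, shape (N, N).
    d_src = np.linalg.norm(src[:, None, :] - src[None, :, :], axis=-1)
    d_tgt = np.linalg.norm(tgt[:, None, :] - tgt[None, :, :], axis=-1)

    # Correspondences i and j are compatible if both sides agree on the
    # distance between their endpoints (rigid motions preserve length).
    compatible = np.abs(d_src - d_tgt) < tau
    np.fill_diagonal(compatible, False)  # ignore self-compatibility

    # Fraction of the other correspondences that each one agrees with.
    support = compatible.sum(axis=1) / max(len(src) - 1, 1)
    return support >= min_support
```

The pairwise matrices cost O(N^2) memory, so a check of this kind is practical only after the search space has already been reduced to a modest number of candidate correspondences, which is consistent with placing it at the end of a hierarchical pipeline.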
Furthermore, we evaluate the proposed method in unbalanced indoor environments using the ScanNet (Dai et al., 2017) dataset and show that it generalizes to indoor RGB-D scans. Finally, to demonstrate the applicability of the proposed method to partially overlapping point cloud pairs, we evaluate it on the standard pairwise registration benchmarks: the 3DMatch, 3DLoMatch (Zeng et al., 2017; Huang et al., 2021), and KITTI odometry (Geiger et al., 2012) datasets. The proposed method achieves registration accuracy competitive with modern pairwise registration methods (Bai et al., 2020; Huang et al., 2021; Yu et al., 2021). An overview of our method can be found in Figure 1. Our main contributions are summarized as follows:
• We propose a novel hierarchical framework that gradually reduces the search space via a submap proposal module and coarse-to-fine matching modules, which can effectively handle unbalanced point cloud registration tasks.
• We introduce a new benchmark, namely the KITTI-UPP dataset, by carefully augmenting the large-scale outdoor LiDAR dataset (Geiger et al., 2012).
• Our method can be trained in an end-to-end manner and demonstrates strong generalization ability across a wide range of spatial extents and point densities. It outperforms the previous state-of-the-art methods by 19.6% Registration Recall on a challenging benchmark.
• Our method achieves competitive registration accuracy in both unbalanced indoor RGB-D registration (Dai et al., 2017) and the standard pairwise registration benchmarks (Zeng et al., 2017; Huang et al., 2021; Geiger et al., 2012), showing the applicability of the method.

2. RELATED WORK

Point cloud registration. Given partially overlapping point cloud pairs, point cloud registration aims to estimate the rigid transformation parameters that align the input point clouds. Typical pipelines start with a feature extraction and matching stage that produces a set of putative correspondences, followed by a robust model fitting algorithm that estimates the pose parameters from the given correspondences. Traditional local descriptors for point clouds (Johnson & Hebert, 1999; Rusu et al., 2008; 2009; Tombari et al., 2010; Salti et al., 2014) encode local geometry using hand-crafted features such as surface normals or curvature. Recent learnable feature descriptors train a network to extract geometric features in a data-driven manner. Qi et al. (2017) incorporate shared MLPs and permutation-invariant aggregation operations to process unordered and irregular point cloud data. Deng et al. (2018a;b) extended Qi et al. (2017) by incorporating point pair features to extract global context-aware feature descriptors. Choy et al. (2019b) adopted fully convolutional networks to process point cloud data by utilizing sparse convolution (Choy et al., 2019a) and achieved state-of-the-art feature matching accuracy on large-scale real-world datasets. Bai et al. (2020) and Huang et al. (2021) incorporate keypoint and overlap confidence prediction for robust feature matching between low-overlap point pairs.

The set of putative correspondences built by matching local feature descriptors tends to contain a high proportion of outliers. Hence, the pose estimation algorithm should be robust to outliers. RANdom SAmple Consensus (RANSAC) (Fischler & Bolles, 1981) and

