BIPOINTNET: BINARY NEURAL NETWORK FOR POINT CLOUDS

Abstract

To alleviate the resource constraint for real-time point cloud applications that run on edge devices, in this paper we present BiPointNet, the first model binarization approach for efficient deep learning on point clouds. We discover that the immense performance drop of binarized models for point clouds mainly stems from two challenges: aggregation-induced feature homogenization that leads to a degradation of information entropy, and scale distortion that hinders optimization and invalidates scale-sensitive structures. With theoretical justifications and in-depth analysis, our BiPointNet introduces Entropy-Maximizing Aggregation (EMA) to modulate the distribution before aggregation for the maximum information entropy, and Layer-wise Scale Recovery (LSR) to efficiently restore feature representation capacity. Extensive experiments show that BiPointNet outperforms existing binarization methods by convincing margins, at the level even comparable with the full precision counterpart. We highlight that our techniques are generic, guaranteeing significant improvements on various fundamental tasks and mainstream backbones. Moreover, BiPointNet gives an impressive 14.7× speedup and 18.9× storage saving on real-world resource-constrained devices.

1. INTRODUCTION

With the advent of deep neural networks that directly process raw point clouds (with PointNet (Qi et al., 2017a) as the pioneering work), great success has been achieved in learning on point clouds (Qi et al., 2017b; Li et al., 2018; Wang et al., 2019a; Wu et al., 2019; Thomas et al., 2019; Liu et al., 2019b; Zhang et al., 2019b). Point cloud applications, such as autonomous driving and augmented reality, often require real-time interaction and fast response. However, computation for such applications is usually deployed on resource-constrained edge devices. To address the challenge, novel algorithms, such as Grid-GCN (Xu et al., 2020b), RandLA-Net (Hu et al., 2020), and PointVoxel (Liu et al., 2019d), have been proposed to accelerate point cloud processing networks. While significant speedup and memory footprint reduction have been achieved, these works still rely on expensive floating-point operations, leaving room for further optimization from the model quantization perspective.

Model binarization (Rastegari et al., 2016; Bulat & Tzimiropoulos, 2019; Hubara et al., 2016; Wang et al., 2020; Zhu et al., 2019; Xu et al., 2019) has emerged as one of the most promising approaches to optimize neural networks for better computational and memory usage efficiency. Binary Neural Networks (BNNs) leverage 1) compact binarized parameters that take little memory, and 2) highly efficient bitwise operations that are far less costly than their floating-point counterparts.

Although binarization in 2D vision tasks (Krizhevsky et al., 2012; Simonyan & Zisserman, 2014; Szegedy et al., 2015; Girshick et al., 2014; Girshick, 2015; Russakovsky et al., 2015; Wang et al., 2019b; Zhang et al., 2021) has been studied extensively by the model binarization community, the methods developed are not readily transferable to 3D point cloud networks due to the fundamental differences between 2D images and 3D point clouds.

First, to gain efficiency in processing unordered 3D points, many point cloud learning methods rely heavily on pooling layers with large receptive fields to aggregate point-wise features. As shown in PointNet (Qi et al., 2017a), global pooling provides a strong recognition capability. However, this practice poses challenges for binarization. Our analyses show that the degradation of feature diversity, a persistent problem with binarization (Liu et al.), is amplified by the global aggregation function (Figure 2), leading to homogenization of global features with limited discriminability.

Second, binarization causes immense scale distortion at the point-wise feature extraction stage, which is detrimental to model performance in two ways: the saturation of forward-propagated features and backward-propagated gradients hinders optimization, and the disruption of scale-sensitive structures (Figure 3) invalidates their designated functionality.

In this paper, we provide theoretical formulations of the above-mentioned phenomena and obtain insights through in-depth analysis. Such understanding allows us to propose a method that turns full-precision point cloud networks into extremely efficient yet strong binarized models (see the overview in Figure 1). To tackle the homogenization of the binarized features after the aggregation function, we study the correlation between the information entropy of binarized features and the performance of point cloud aggregation functions. We thus propose Entropy-Maximizing Aggregation (EMA), which shifts the feature distribution towards the statistical optimum, effectively improving the expression capability of the global features. Moreover, given maximized information entropy, we further develop Layer-wise Scale Recovery (LSR) to efficiently restore the output scale, which enhances optimization and allows scale-sensitive structures to function properly. LSR uses only one learnable parameter per layer, leading to negligible storage increment and computation overhead.

Figure 1: Overview of our BiPointNet on the PointNet base model, applying Entropy-Maximizing Aggregation (EMA) and Layer-wise Scale Recovery (LSR). EMA consists of the transformation unit and the aggregation unit for maximizing the information entropy of features after binarization. LSR with the learnable layer-wise scaling factor α is applied to address the scale distortion of bi-linear layers (which form the BiMLPs), flexibly restoring the distorted output to reasonable values.

Our BiPointNet is the first binarization approach to deep learning on point clouds, and it outperforms existing binarization algorithms for 2D vision by convincing margins. It is even almost on par (within ∼1-2%) with the full-precision counterpart. Although we conduct most analysis on the PointNet baseline, we show that our methods are generic and readily extendable to other popular backbones, such as PointNet++ (Qi et al., 2017b), PointCNN (Li et al., 2018), DGCNN (Wang et al., 2019a), and PointConv (Wu et al., 2019), which are representatives of the mainstream categories of point cloud feature extractors. Moreover, extensive experiments on multiple fundamental point cloud tasks, such as classification, part segmentation, and semantic segmentation, highlight that our BiPointNet is task-agnostic. Besides, we highlight that our EMA and LSR are efficient and easy to implement in practice: in actual tests on popular edge devices, BiPointNet achieves a 14.7× speedup and 18.9× storage savings compared to the full-precision PointNet. Our code is released at https://github.com/htqin/BiPointNet.
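To make the two ingredients discussed above concrete, the following is a minimal illustrative sketch, not the released implementation, of a binarized linear layer: activations and weights are reduced to signs, their dot product is computed with a bitwise XNOR followed by a bit count, and a single scalar per layer rescales the output, in the spirit of the layer-wise scaling factor α. All function names here (`binarize`, `pack_bits`, `xnor_popcount_dot`, `binary_linear`) are hypothetical, and α is passed in as a fixed number; how it is initialized and learned is outside the scope of this sketch.

```python
from typing import List


def binarize(values: List[float]) -> List[int]:
    """Map each real value to +1 or -1 with the sign function (0 maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs: List[int]) -> int:
    """Pack a +1/-1 vector into an integer: bit i is set where signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two packed +1/-1 vectors of length n.

    XNOR marks the positions where the signs agree. With `agree` such
    positions, the dot product is agree - (n - agree) = 2 * agree - n.
    """
    mask = (1 << n) - 1
    agree = bin(~(a_bits ^ w_bits) & mask).count("1")
    return 2 * agree - n


def binary_linear(x: List[float], weight_rows: List[List[float]], alpha: float) -> List[float]:
    """Binarized linear layer with one scalar scale for the whole layer.

    `alpha` stands in for a layer-wise scaling factor; it is a plain
    argument here (an assumption of this sketch), not a trained parameter.
    """
    n = len(x)
    x_bits = pack_bits(binarize(x))
    return [
        alpha * xnor_popcount_dot(x_bits, pack_bits(binarize(row)), n)
        for row in weight_rows
    ]


# Hypothetical toy input: 4 input features, 2 output channels.
x = [0.3, -1.2, 0.7, -0.1]
weights = [[0.5, -0.4, 0.9, 0.2], [-0.3, 0.8, -0.6, 0.1]]
print(binary_linear(x, weights, alpha=0.25))  # prints [0.5, -1.0]
```

Without the scale, each output of such a layer is an integer in [-n, n] whose magnitude grows with the input width n, which gives some intuition for why the output scale of binarized layers needs to be restored. The single multiplication by α adds one stored value per layer.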

