DIMENSIONLESS INSTANCE SEGMENTATION BY LEARNING GRAPH REPRESENTATIONS OF POINT CLOUDS

Anonymous authors
Paper under double-blind review

Abstract

Point clouds are an increasingly common spatial data modality, being produced by sensors used in robotics and self-driving cars, and as natural intermediate representations of objects in microscopy and other bioimaging domains (e.g., cell locations over time, or filaments, membranes, or organelle boundaries in cryo-electron micrographs or tomograms). However, semantic and instance segmentation of this data remains challenging due to the complex nature of objects in point clouds, especially in bioimaging domains where objects are often large and can intersect or overlap. Furthermore, methods for operating on point clouds should not be sensitive to the specific orientation or translation of the point cloud, which is often arbitrary. Here, we frame the point cloud instance segmentation problem as a graph learning problem in which we seek to learn a function that accepts the point cloud as input and outputs a probability distribution over neighbor graphs in which connected components of the graph correspond to individual object instances. To solve this point cloud-to-graph problem, we introduce the Dimensionless Instance Segmentation Transformer (DIST), a deep neural network for spatially invariant instance segmentation of point clouds. DIST uses an SO(n)-invariant transformer layer architecture to operate on point clouds of arbitrary dimension and outputs, for each pair of points, the probability that an edge exists between them in the instance graph. We then decode the most likely set of instances using a graph cut. We demonstrate the power of DIST for the segmentation of biomolecules in cryo-electron micrographs and tomograms, far surpassing existing methods for membrane and filament segmentation in empirical evaluation. DIST also applies to scene and object understanding, performing competitively on the ScanNetV2 3D instance segmentation challenge.
We anticipate that DIST will underpin a new generation of methods for point cloud segmentation in bioimaging and that our general model and approach will provide useful insights for point cloud segmentation methods in other domains. †

1. INTRODUCTION

Point clouds are a common way to represent objects or scenes in a computer, and are widely used in computer vision, augmented and virtual reality, and imaging. Point clouds of locations are often subsequently processed to classify points semantically, known as semantic segmentation, or to segment individual objects and instances, known as instance segmentation (Figure 1). Unlike 2D or 3D images, point clouds are unordered, unstructured, and may have noisy point locations, making it difficult to design algorithms or machine learning models to process them. Deep learning methods for processing point clouds have attracted increasing interest as more and more point cloud data are generated by sensors in robotics and as a representation of objects in physics engines, natural images, and bioimaging. Many recent methods have been developed to segment point clouds using deep learning (Lai et al., 2022; Qi et al., 2017; Wang, 2020; Zanjani et al., 2021; Guo et al., 2020; Hong and Pavlic, 2021; Pan et al., 2018; Yuan, 2021), which address the instance, scene, or part segmentation problems using various architectures or training schemes. However, existing instance segmentation methods require prior information about the number of instances present or assume some fixed number of instances. Furthermore, many methods convert point clouds into pixel or voxel grids to process them with convolutional layers, or otherwise incorporate point coordinates directly into the network, so their outputs are not invariant to rotation and translation of the point cloud.

† Code available at redacted.

In cryo-electron microscopy, an increasingly common task is to segment individual filaments, membranes, organelles, or other biological structures in 2D micrographs or 3D tomograms. These objects are often large and can intersect or overlap, making instance segmentation difficult even if a semantic segmentation mask is known.
Experienced scientists or technicians often spend weeks to months painstakingly labeling these datasets by hand for downstream analysis, limiting throughput. Current state-of-the-art methods offer little help. For filament instance segmentation, for example, these methods use algorithms custom-tailored to curve tracing (Chai et al., 2022) but have such high error rates that scientists still spend days manually correcting annotations, if the algorithms work at all (Redemann et al., 2014; Stalling et al., 2005). Faster and more accurate instance segmentation methods are urgently needed to facilitate large-scale analysis of these datasets as imaging technology improves.

To address these problems, we propose the Dimensionless Instance Segmentation Transformer (DIST). DIST performs SO(n)-invariant instance segmentation of point clouds using only geometric features. We accomplish this by framing instance segmentation as a graph prediction problem. Given a point cloud as input, DIST outputs a probability distribution over graphs, parameterized by the probability, for each pair of points, that those points are neighbors in the sub-graphs defining each instance. Instances are therefore defined by connected components of the full point cloud graph. With the output of DIST, we find the most likely instance segmentation using a graph cut. This allows us to perform instance segmentation on any number of underlying instances, without any built-in restriction on the maximum number of instances. Furthermore, DIST is invariant to rotations and translations of the point cloud, because it operates on point-point pairwise representations initially defined by the distances between the points. This also makes DIST dimensionless, because distances between points are defined in the same way for any number of dimensions.
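As a minimal illustration of these two ideas (not the DIST implementation), the Python sketch below checks that a pairwise distance matrix is unchanged by a rotation and translation of a toy 2D point cloud, and then recovers instances as connected components of a thresholded edge-probability matrix. All function names are hypothetical. The edge probabilities are a hand-made stand-in, `exp(-distance)`, in place of network output, and the decoding step is plain thresholding with union-find, a simplification assumed here in place of the graph cut.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def pairwise_distances(points: Sequence[Point]) -> List[List[float]]:
    """Euclidean distance between every pair of points, in any dimension."""
    return [[math.dist(p, q) for q in points] for p in points]


def rotate_and_shift_2d(
    points: Sequence[Tuple[float, float]], angle: float, shift: Tuple[float, float]
) -> List[Tuple[float, float]]:
    """Rotate 2D points by `angle` (radians), then translate them by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1]) for x, y in points]


def connected_components(edge_prob: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Give points the same label when a chain of edges above `threshold` joins them."""
    n = len(edge_prob)
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if edge_prob[i][j] > threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    # Relabel roots as consecutive integers in order of first appearance.
    label_of_root = {}
    return [label_of_root.setdefault(find(i), len(label_of_root)) for i in range(n)]


# Toy point cloud: a three-point chain and a separate two-point object.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (10.0, 1.0)]
dist = pairwise_distances(points)

# The distance matrix is unchanged by a rigid motion of the point cloud.
moved = rotate_and_shift_2d(points, angle=0.7, shift=(3.0, -2.0))
dist_moved = pairwise_distances(moved)
max_diff = max(
    abs(a - b) for row_a, row_b in zip(dist, dist_moved) for a, b in zip(row_a, row_b)
)

# Stand-in edge probabilities (an assumption, not model output).
edge_prob = [[math.exp(-d) for d in row] for row in dist]
labels = connected_components(edge_prob, threshold=0.3)
print(max_diff < 1e-9, labels)
```

Because `pairwise_distances` only measures distances between points, the same function accepts 2D, 3D, or higher-dimensional coordinates without modification, and the number of instances is never specified in advance.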
The DIST layers update edge representations using geometrically inspired operations: axial attention updates over the source and destination nodes, and a triangular multiplicative update inspired by Jumper et al. (2021). DIST can optionally accept additional, non-spatial point features, incorporated via a traditional transformer layer in which the attention updates include a bias term derived from the learned edge features.

Empirically, we find that DIST improves on current state-of-the-art solutions for membrane instance segmentation in 2D micrographs and microtubule (MT) segmentation in 3D tomograms by a large margin (from 0.539 mCov for Amira to 0.955 mCov with DIST). DIST also applies to instance segmentation of other point clouds, outperforming other geometric methods for instance segmentation on ScanNetV2 (Dai et al., 2017) and showing results competitive with current state-of-the-art models.

In this work, we make the following contributions:

• We frame instance segmentation as a graph learning problem, where we learn a function that maps point clouds to distributions over neighbor graphs in which instances are connected components.
• We introduce the Dimensionless Instance Segmentation Transformer (DIST) to perform SO(n)-invariant inference on the instance neighbor graph using the point cloud as input.
• DIST can operate on point clouds with only geometric features and can, optionally, incorporate additional point features.
• DIST does not require prior knowledge about the number of instances in a point cloud and has no built-in limitations on the number of instances that can be segmented simultaneously.
• Empirical results show that DIST dramatically outperforms previous methods for membrane and microtubule segmentation in cryo-electron microscopy data and that DIST outperforms other geometric methods for instance segmentation in natural 3D scene scans.

2. RELATED WORK

Recently, interest in point cloud segmentation methods has increased significantly, partly enabled by benchmarks such as ScanNetV2 (Appendix Table B) (Dai et al., 2017). Point cloud segmentation tasks are generally divided into semantic, part, and instance segmentation. Deep learning methods for semantic and part segmentation, such as PointNet++ (Qi et al., 2017) and Point Cloud Transformer (Guo et al., 2020), have achieved significant improvements. Instance segmentation methods have

