LEARNING HYPERBOLIC REPRESENTATIONS FOR UNSUPERVISED 3D SEGMENTATION

Abstract

There is a need for unsupervised 3D segmentation of complex volumetric data, particularly when annotation capacity is limited or the discovery of new categories is desired. Motivated by the observation that much 3D volumetric data is innately hierarchical, we propose learning effective representations of 3D patches for unsupervised segmentation through a variational autoencoder (VAE) with a hyperbolic latent space and a proposed gyroplane convolutional layer, which better model the underlying hierarchical structure within a 3D image. We also introduce a hierarchical triplet loss and a multi-scale patch sampling scheme to embed relationships across varying levels of granularity. We demonstrate the effectiveness of our hyperbolic representations for unsupervised 3D segmentation on a hierarchical toy dataset, the BraTS whole tumor dataset, and cryogenic electron microscopy data.

1. INTRODUCTION

Recent advances in technology have greatly increased both the availability of 3D data and the need to process and learn from it. In particular, technologies such as magnetic resonance imaging and cryogenic electron microscopy (cryo-EM) have led to greater availability of 3D voxel data. Deep learning is a promising approach to learning from such data, but producing annotations for 3D data can be extremely expensive, especially for richer tasks such as segmentation in dense voxel grids. In some cases, labels may also be impossible to produce due to the limitations of current knowledge, or may introduce bias when the goal is scientific discovery. Unsupervised learning, which does not require annotations, is a promising way to overcome these limitations.

In this work, we tackle the challenging problem of unsupervised segmentation of complex 3D voxel data by addressing the essential underlying challenge of representation learning. We extend prior work on hyperbolic representations, which has addressed classification of simple data, to the task of segmentation in 3D images, which demands significantly more discriminative representations. To learn effective representations, we must capture the structure of the input data. We observe that 3D images often have inherent hierarchical structure: as a biomedical example, a cryo-EM tomogram of a cell has a hierarchy that at the highest level comprises the entire cell; at a finer level, organelles such as the mitochondria and nucleus; and at an even finer level, sub-structures such as the nucleolus of a nucleus or proteins within organelles. For downstream analysis, we are typically interested in the unsupervised discovery and segmentation of structures spanning multiple levels of this hierarchy. However, prior work on representation learning for unsupervised 3D segmentation does not explicitly model hierarchical structure between different regions of a 3D image.
We argue that this hampers the ability to leverage hierarchical relationships to improve segmentation in complex 3D images. Our key insight is that we can utilize a hyperbolic embedding space to learn effective hierarchical representations of voxel regions in 3D images. Hyperbolic representations have been proposed as a continuous way to represent hierarchical data, since trees can be embedded in hyperbolic space with arbitrarily low error (Sarkar, 2011). These methods have shown promise for modeling data types such as natural language word taxonomies (Nickel & Kiela, 2017; 2018), graphs (Nickel & Kiela, 2017; Mathieu et al., 2019; Ovinnikov, 2019; Chami et al., 2019), as well as simple MNIST (LeCun et al., 2010) image data for classification (Mathieu et al., 2019). To the best of our knowledge, our work is the first to introduce learning hyperbolic representations to capture hierarchical structure
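To give intuition for why hyperbolic space accommodates trees so well, consider the geodesic distance in the Poincaré ball model, one standard model of hyperbolic space (this sketch is our own background illustration and does not reproduce the paper's implementation): distances grow without bound as points approach the boundary of the unit ball, providing the exponentially growing "room" that lets tree branches embed with low distortion.

```python
import math

# Background sketch (not the paper's code): geodesic distance in the
# Poincare ball model of hyperbolic space.
def poincare_distance(x, y):
    """Geodesic distance between points x and y inside the open unit ball."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    denom = (1 - sum(a * a for a in x)) * (1 - sum(b * b for b in y))
    return math.acosh(1 + 2 * sq_diff / denom)

# The same Euclidean step costs far more hyperbolic distance near the
# boundary than near the origin, mirroring the exponential branching of
# a tree whose root sits at the center of the ball.
near_origin = poincare_distance([0.0, 0.0], [0.45, 0.0])    # step of 0.45 from the origin
near_boundary = poincare_distance([0.45, 0.0], [0.9, 0.0])  # same-length step near the boundary
```

In such an embedding, coarse structures (e.g., a whole cell) naturally sit near the origin while fine-grained sub-structures are pushed toward the boundary.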

