DIFFEOMORPHIC TEMPLATE TRANSFORMERS

Abstract

In this paper we propose a spatial transformer network in which the spatial transformations are limited to the group of diffeomorphisms. Diffeomorphic transformations are a kind of homeomorphism and therefore, by definition, preserve topology, a compelling property in certain applications. We apply this diffeomorphic spatial transformer to model the output of a neural network as a topology-preserving mapping of a prior shape. By carefully choosing the prior shape we can enforce properties on the output of the network, such as smooth boundaries and a hard constraint on the number of connected components, without requiring any changes to the loss function. The diffeomorphic transformer networks outperform their non-diffeomorphic precursors when applied to learn data invariances in classification tasks. On a breast tissue segmentation task, we show that the approach is robust and flexible enough to deform simple artificial priors, such as Gaussian-shaped prior energies, into high-quality predictive probability densities. In addition to desirable topological properties, the segmentation maps have competitive quantitative fidelity compared to those obtained by direct estimation (i.e. a plain U-Net).

1. INTRODUCTION

The success of Convolutional Neural Networks (CNNs) in many modeling tasks is often attributed to their depth and inductive bias. An important inductive bias in CNNs is spatial symmetry (e.g. translational equivariance), which is embedded in the architecture through weight-sharing constraints. Alternatively, spatial transformers constrain networks through predicted affine or thin-plate spline spatial transformations. In this work, we investigate a special type of spatial transformer in which the transformations are limited to flexible diffeomorphisms. Diffeomorphisms belong to the group of homeomorphisms, which preserve topology by design and thereby guarantee that relations between structures remain intact, i.e. connected (sub-)regions stay connected. We propose to use such a diffeomorphic spatial transformer in a template transformer setting (Lee et al., 2019), where a prior shape is deformed into the output of the model. Here a neural network is used to predict the deformation of the shape, rather than the output itself. By introducing a diffeomorphic mapping of a prior shape, and carefully choosing properties of the prior shape, we can enforce desirable properties on the output, such as a smooth decision boundary or a constraint on the number of connected components. To obtain flexible diffeomorphic transformations, we use a technique known as scaling-and-squaring, which has been successfully applied in the context of image registration in prior work (Dalca et al., 2018) but has received relatively little attention in other areas of machine learning. In an attempt to increase the flexibility of the flow, we approximate a time-dependent parameterisation using the Baker-Campbell-Hausdorff (BCH) formula, rather than a stationary field. In this way, diffeomorphic constraints are built directly into the architecture itself and require no changes to the loss function.
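To make the scaling-and-squaring idea concrete, the following minimal numpy sketch (not the authors' implementation; function name and shapes are illustrative) integrates a stationary 2D velocity field into a displacement field. The field is first scaled down by 2^N so that the initial step is nearly the identity, then composed with itself N times:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def scaling_and_squaring(v, num_steps=6):
    """Integrate a stationary velocity field v of shape (2, H, W)
    into a displacement field via scaling and squaring."""
    # Scale down: phi_{1/2^N}(x) ~ x + v(x) / 2^N
    disp = v / (2 ** num_steps)
    H, W = v.shape[1:]
    grid = np.mgrid[0:H, 0:W].astype(float)  # identity sampling grid (2, H, W)
    for _ in range(num_steps):
        # Compose the deformation with itself:
        # new displacement d <- d + d o (id + d)
        warped = np.stack([
            map_coordinates(disp[c], grid + disp, order=1, mode='nearest')
            for c in range(2)
        ])
        disp = disp + warped
    return disp
```

For a spatially constant velocity field the flow reduces to a pure translation, so the integrated displacement should equal the velocity itself, which makes for a quick sanity check of the composition step.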
Experimentally, we first validate the diffeomorphic spatial transformer by learning data invariances on the MNIST handwritten digit classification task, as proposed by Jaderberg et al. (2015) to evaluate the original spatial transformer. The results show that better performance can be achieved by employing diffeomorphic transformations. Additionally, we explore the use of diffeomorphic mappings in a spatial template transformer set-up for 3D medical breast tissue segmentation. We find that the diffeomorphic spatial transformer is able to deform simple prior shapes, such as a normally distributed energy, into high-quality predictive probability densities. We succeed in limiting the number of connected components in the output and achieve competitive performance on quantitative metrics compared to direct estimation of class probabilities.

2. RELATED WORK

Spatial Transformers were introduced by Jaderberg et al. (2015) as a learnable module that deforms an input image and can be incorporated into CNNs for various tasks. In Spatial Transformer Networks (STNs), the module is used to learn data invariances in order to improve performance on image classification tasks. The work focuses on simple linear transformations (e.g. translations, rotations, affine maps) but also allows for more flexible mappings such as thin-plate spline (TPS) transformations. The use of spatial transformations in a template transformer setting was first proposed by Lee et al. (2019), but that work does not use diffeomorphisms and requires defining a discrete image as shape prior. In the field of image registration, diffeomorphisms have been actively studied and successfully applied in a variety of methods, including LDDMM by Beg et al. (2005), Diffeomorphic Demons by Vercauteren et al. (2009), and SyN by Avants et al. (2008). More recently, efforts have been made to fuse such diffeomorphic image registration approaches with neural networks (Dalca et al. (2018), Haskins et al. (2020)). It is well known that although these models mathematically describe diffeomorphisms, the resulting transformations are not always diffeomorphic in practice: negative Jacobian determinants can still occur due to approximation errors.
To reduce such errors, additional regularisation is often applied (Bro-Nielsen and Gramkow (1996), Ashburner (2007), Dalca et al. (2018)), but this typically requires careful tuning. Image registration has also been applied to perform segmentation by deforming a base template, commonly referred to as an 'atlas', onto a target image (Rohlfing et al. (2005), Fortunati et al. (2013)), for instance by combining (e.g. averaging) manually labelled training annotations (Gee et al., 1993). Some studies have investigated how to obtain smoother segmentation boundaries in neural-based image registration. For instance, Monteiro et al. (2020) proposed to capture spatial correlation by modelling joint distributions over entire label maps, in contrast to pixel-wise estimates. In other work, post-processing steps have been applied to smooth predictions or to enforce topological constraints (Chlebus et al. (2018), Jafari et al. (2016)). Other studies try to enforce more consistent topology during training of a neural network, but often use a soft constraint that requires altering the loss function, as in Hu et al. (2019), or GAN-based approaches that additionally require a separately trained discriminator model (Sekuboyina et al. (2018)). Lastly, some studies have investigated diffeomorphisms in the context of spatial transformer networks. In Skafte Detlefsen et al. (2018), stacked spatial transformer layers with continuous piecewise-affine based (CPAB) transformations were used to construct a diffeomorphic neural network, but this requires a tessellation strategy (Freifeld et al. (2015), Freifeld et al. (2017)). In Deep Diffeomorphic Normalizing Flows (Salman et al. (2018)), a neural network is used to predict diffeomorphic transformations as a normalizing flow, albeit with the aim of obtaining more expressive posteriors for variational inference.
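The non-diffeomorphic behaviour discussed above is commonly diagnosed by checking the sign of the Jacobian determinant of the predicted deformation: a fold corresponds to a non-positive determinant. A minimal 2D numpy sketch (function names are illustrative, not from the paper) using finite differences:

```python
import numpy as np

def jacobian_determinant(disp):
    """Finite-difference Jacobian determinant of the deformation
    phi(x) = x + disp(x), for disp of shape (2, H, W)."""
    dy_y, dy_x = np.gradient(disp[0])  # partials of the y-displacement
    dx_y, dx_x = np.gradient(disp[1])  # partials of the x-displacement
    # Jacobian of phi is I + grad(disp); expand the 2x2 determinant
    return (1 + dy_y) * (1 + dx_x) - dy_x * dx_y

def folding_fraction(disp):
    """Fraction of grid locations where the deformation folds
    (non-positive Jacobian determinant, i.e. not diffeomorphic)."""
    return np.mean(jacobian_determinant(disp) <= 0)
```

The zero displacement gives a determinant of 1 everywhere, while a displacement that reverses orientation along one axis drives the determinant negative.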

3. DIFFEOMORPHIC SPATIAL TRANSFORMERS

The Spatial Transformer is a learnable module which explicitly allows for spatial manipulation of data within a neural network. The module passes an input feature map U through a learnable function which regresses the transformation parameters θ. A spatial grid G over the output is transformed to a sampling grid T_θ(G), which is applied to the input U to produce the output O. In the original spatial transformer, θ could represent arbitrary parameterised mappings such as simple rotation, translation, or affine transformation matrices. We propose flexible transformations in the group of diffeomorphisms, T_θ ∈ D, which preserve topology because they are continuous with a continuous inverse. In Section 4, we describe how a diffeomorphic spatial transformer can be used to warp a shape prior, as illustrated in Figure 1, in a template transformer setting illustrated in Figure 2.
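The grid-sampling step described above can be sketched in a few lines of numpy for the affine case (a simplified illustration of the general STN mechanism, not the authors' code): the output grid G is mapped through T_θ to sampling locations in the input, and U is bilinearly interpolated at those locations.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def spatial_transform(U, theta):
    """Warp a 2D image U (H, W) with a 2x3 affine matrix theta by
    sampling U at the transformed output grid T_theta(G)."""
    H, W = U.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Homogeneous coordinates of the regular output grid G, shape (3, H*W)
    G = np.stack([ys.ravel().astype(float), xs.ravel().astype(float),
                  np.ones(H * W)])
    coords = theta @ G  # sampling locations T_theta(G) in the input, (2, H*W)
    # Bilinear sampling of U at the transformed grid
    return map_coordinates(U, coords.reshape(2, H, W), order=1, mode='constant')
```

With the identity parameters theta = [[1, 0, 0], [0, 1, 0]] the sampling grid coincides with G and the input is reproduced unchanged; a diffeomorphic transformer replaces the affine map by a dense diffeomorphic coordinate field while keeping this same sampling step.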




