SHAPE-TAILORED DEEP NEURAL NETWORKS

Abstract

We present Shape-Tailored Deep Neural Networks (ST-DNN). ST-DNN extend convolutional neural networks (CNNs), which aggregate data from fixed-shape (square) neighborhoods, to compute descriptors defined on arbitrarily shaped regions. This is natural for segmentation, where descriptors should describe regions (e.g., of objects) that have diverse shapes. We formulate these descriptors through the Poisson partial differential equation (PDE), which can be used to generalize convolution to arbitrary regions. We stack multiple PDE layers to generalize a deep CNN to arbitrary regions, and apply it to segmentation. We show that ST-DNN are covariant to translations and rotations and robust to domain deformations, properties that are natural for segmentation and that existing CNN-based methods lack. ST-DNN are 3-4 orders of magnitude smaller than CNNs used for segmentation. We show that they exceed the segmentation performance of state-of-the-art CNN-based descriptors while using training sets 2-3 orders of magnitude smaller on the texture segmentation problem.

1. INTRODUCTION

Convolutional neural networks (CNNs) have been used extensively for segmentation problems in computer vision (He et al., 2017; He et al., 2016; Chen et al., 2017; Xie & Tu, 2015). CNNs provide a framework for learning descriptors that are able to discriminate different textured or semantic regions within images. Much progress has been made in segmentation with CNNs, but results are still far from human performance. Also, significant engineering must be performed to adapt CNNs to segmentation problems. A basic component of segmentation architectures is the labeling or grouping of dense descriptors returned by a backbone CNN. Grouping these descriptors is difficult, especially near the boundaries of segmentation regions, because CNN descriptors aggregate data from fixed-shape (square) neighborhoods at each pixel and may thus aggregate data from different regions. A descriptor that mixes data from several regions cannot be assigned cleanly to a single region, which often results in grouping errors. In segmentation problems (e.g., semantic segmentation), current methods attempt to mitigate these errors by adding post-processing layers that aim to group simultaneously the (coarse-scale) descriptors from the CNN backbone and the fine-level pixel data. However, the errors introduced might not always be fixed. A more natural approach is to consider the coarse and fine structure together and avoid aggregation across boundaries, so that such errors are prevented at the outset. This calls for descriptors that aggregate data only within boundaries. To this end, Khan et al. (2015) introduced "shape-tailored" descriptors that aggregate data within a region of interest, and used these descriptors for segmentation. However, these descriptors are hand-crafted and do not perform on par with learned approaches.
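To make the idea of aggregating data only within a region concrete, the following is a minimal sketch, not the implementation used in this work. It assumes a screened-Poisson form u - alpha * Laplacian(u) = I inside the region, with zero-flux conditions on the region boundary, and solves it on the pixel grid by Jacobi iteration. The function name `shape_tailored_smooth`, the parameter `alpha`, the iteration count, and the solver choice are all illustrative assumptions.

```python
import numpy as np


def _neighbors(a):
    """Return the up, down, left and right neighbors of every pixel.

    Pixels outside the image are treated as 0 (or False for a boolean mask).
    """
    p = np.pad(a, 1)  # constant zero padding
    return [p[:-2, 1:-1], p[2:, 1:-1], p[1:-1, :-2], p[1:-1, 2:]]


def shape_tailored_smooth(image, mask, alpha=5.0, iters=500):
    """Approximate u - alpha * Laplacian(u) = image inside `mask`.

    Only neighbors that lie inside the mask contribute, which is a discrete
    zero-flux boundary condition: no data is aggregated from outside the
    region. Values outside the mask are returned as 0.
    """
    image = np.asarray(image, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    # Number of in-region neighbors of each pixel (fixed across iterations).
    count = sum(n.astype(float) for n in _neighbors(mask))
    u = np.where(mask, image, 0.0)
    for _ in range(iters):
        # u is zero outside the mask, so this sums in-region neighbors only.
        nb_sum = sum(_neighbors(u))
        u = np.where(mask, (image + alpha * nb_sum) / (1.0 + alpha * count), 0.0)
    return u


# Toy image with two regions: left half is 0, right half is 1.
img = np.zeros((8, 16))
img[:, 8:] = 1.0
left_region = np.zeros((8, 16), dtype=bool)
left_region[:, :8] = True

u_region = shape_tailored_smooth(img, left_region)           # restricted to the left region
u_square = shape_tailored_smooth(img, np.ones_like(left_region))  # whole image, ignores the boundary

print("pixel next to the boundary, region-restricted:", u_region[4, 7])
print("pixel next to the boundary, unrestricted:     ", u_square[4, 7])
```

In this toy case the region-restricted response stays at the left region's own value right up to the boundary, whereas the unrestricted response at the same pixel is pulled toward the neighboring region, which is the mixing effect described above.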
Khan & Sundaramoorthi (2018) introduced learned shape-tailored descriptors by training a neural network that operates along the channel dimension of hand-crafted shape-tailored descriptors for segmentation. However, these networks, though deep in the channel dimension, did not filter data spatially within layers. Since an advantage of CNNs comes from exploiting spatial filtering at each depth of the network, in this work we design shape-tailored networks that are deep and perform shape-tailored filtering in space at each layer using solutions of the Poisson PDE. This results in shape-tailored networks that provide more discriminative descriptors than a single shape-tailored kernel. This extension requires techniques to back-propagate through PDEs, which we derive in this work. Specifically, our contributions are: 1. We construct and show how to train ST-DNN, deep networks that perform shape-tailored spatial filtering via the Poisson PDE at each depth so as to generalize a CNN to arbitrarily shaped regions. 2. We show analytically and empirically that ST-DNNs are covariant to translations and rotations, as they inherit this property from the Poisson PDE. In segmentation, covariance (a.k.a., equivariance)

