SPATIAL DEPENDENCY NETWORKS: NEURAL LAYERS FOR IMPROVED GENERATIVE IMAGE MODELING

Abstract

How can generative modeling be improved by better exploiting the spatial regularities and coherence of images? We introduce a novel neural network for building image generators (decoders) and apply it to variational autoencoders (VAEs). In our spatial dependency networks (SDNs), feature maps at each level of a deep neural net are computed in a spatially coherent way, using a sequential gating-based mechanism that distributes contextual information across 2-D space. We show that augmenting the decoder of a hierarchical VAE with spatial dependency layers considerably improves density estimation over baseline convolutional architectures, achieving state-of-the-art results among models of the same class. Furthermore, we demonstrate that SDNs scale to large images by synthesizing samples of high quality and coherence. In a vanilla VAE setting, we find that a powerful SDN decoder also improves the learning of disentangled representations, indicating that neural architectures play an important role in this task. Our results suggest favoring spatial dependency layers over convolutional layers in various VAE settings.

1. INTRODUCTION

The abundance of data and computation is often identified as a core facilitator of the deep learning revolution. In addition to this technological leap, historically speaking, most major algorithmic advances critically hinged on the existence of inductive biases incorporating prior knowledge in different ways. The main breakthroughs in image recognition (Cireşan et al., 2012; Krizhevsky et al., 2012) were preceded by the long-standing pursuit of shift-invariant pattern recognition (Fukushima & Miyake, 1982), which catalyzed the ideas of weight sharing and convolutions (Waibel, 1987; LeCun et al., 1989). Recurrent networks (exploiting temporal recurrence) and transformers (modeling the "attention" bias) revolutionized the field of natural language processing (Mikolov et al., 2011; Vaswani et al., 2017). Visual representation learning is also often based on priors, e.g., the independence of latent factors (Schmidhuber, 1992; Bengio et al., 2013) or invariance to input transformations (Becker & Hinton, 1992; Chen et al., 2020). Clearly, one promising strategy to move forward is to introduce more structure into learning algorithms, and more knowledge of the problems and data.

Along this line of thought, we explore a way to improve the architecture of deep neural networks that generate images, here referred to as (deep) image generators, by incorporating prior assumptions based on topological image structure. More specifically, we aim to integrate priors on spatial dependencies in images. We would like to enforce these priors on all intermediate image representations produced by an image generator, including the last one, from which the final image is synthesized. To that end, we introduce a class of neural networks designed specifically for building image generators: the spatial dependency network (SDN). Concretely, spatial dependency layers of
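To build intuition for the sequential gating-based mechanism mentioned above, the following is a minimal NumPy sketch of one directional gated sweep over a feature map: a sigmoid gate mixes each row's candidate features with state carried from the previous row, so contextual information propagates coherently across the 2-D grid. This is an illustrative simplification under assumed shapes and parameter names (`W_g`, `W_c`, `b_g`, `b_c` are hypothetical), not the paper's exact SDN layer.

```python
import numpy as np

def sdn_sweep(x, W_g, W_c, b_g, b_c):
    """One top-to-bottom gated sweep over a feature map.

    x : (H, W, C) feature map.
    W_g, W_c : (C, C) gate and candidate projections (hypothetical names).
    b_g, b_c : (C,) biases.

    At each row, a sigmoid gate interpolates between the state carried
    from the row above and a tanh candidate computed from the current
    row, distributing context down the spatial grid.
    """
    H, Wd, C = x.shape
    out = np.zeros_like(x)
    state = np.zeros((Wd, C))  # running state, one vector per column
    for i in range(H):
        gate = 1.0 / (1.0 + np.exp(-(x[i] @ W_g + b_g)))  # sigmoid gate in (0, 1)
        cand = np.tanh(x[i] @ W_c + b_c)                   # candidate features
        state = gate * state + (1.0 - gate) * cand         # gated convex update
        out[i] = state
    return out
```

A full layer of this kind would typically combine several such sweeps (e.g., over all four directions) so that every spatial position can receive context from the whole map; here a single direction suffices to show the gating idea.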

Availability: https://github.com/djordjemila

