GRADIENT ORIGIN NETWORKS

Abstract

This paper proposes a new type of generative model that is able to quickly learn a latent representation without an encoder. This is achieved using empirical Bayes to calculate the expectation of the posterior, which is implemented by initialising a latent vector with zeros, then using the gradient of the log-likelihood of the data with respect to this zero vector as new latent points. The approach has similar characteristics to autoencoders, but with a simpler architecture, and is demonstrated in a variational autoencoder equivalent that permits sampling. This also allows implicit representation networks to learn a space of implicit functions without requiring a hypernetwork, retaining their representation advantages across datasets. The experiments show that the proposed method converges faster, with significantly lower reconstruction error than autoencoders, while requiring half the parameters.
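The encoding step described above can be sketched numerically. The following is a minimal sketch, assuming a toy linear decoder F(z) = zW + b and a mean squared error as a surrogate for the negative log-likelihood; all names and dimensions here are illustrative, not from the paper, and in practice the gradient would be computed by automatic differentiation through a deep network:

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, data_dim, batch = 32, 784, 8     # illustrative sizes

# Toy linear decoder F(z) = z @ W + b, standing in for the single network F.
W = rng.normal(scale=0.01, size=(latent_dim, data_dim))
b = np.zeros(data_dim)

def gon_latents(x):
    """Encode without an encoder: initialise the latent at the origin and use
    the negative gradient of the reconstruction loss with respect to it.
    For F(z) = z @ W + b with a per-feature mean squared error, the gradient
    at z0 = 0 is 2 * (b - x) @ W.T / data_dim, so we can write it in closed form."""
    z0 = np.zeros((x.shape[0], latent_dim))      # latent vector initialised to zeros
    residual = (z0 @ W + b) - x                  # F(z0) - x
    grad_z0 = 2.0 * residual @ W.T / data_dim    # d(loss)/d(z0)
    return -grad_z0                              # the gradient itself is the latent code

x = rng.normal(size=(batch, data_dim))           # stand-in data batch
z = gon_latents(x)                               # latents from gradients at the origin
x_rec = z @ W + b                                # reconstruct with the same network F
```

Training then minimises the reconstruction error of `x_rec` with respect to the parameters of F alone, which is why the method needs no separate encoder network.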

1. INTRODUCTION

Observable data in nature has some parameters which are known, such as local coordinates, but also some unknown parameters such as how the data is related to other examples. Generative models, which learn a distribution over observables, are central to our understanding of patterns in nature and allow new, unseen examples to be queried efficiently. Recently, deep generative models have received interest due to their ability to capture a broad set of features when modelling data distributions. As such, they offer direct applications such as synthesising high fidelity images (Karras et al., 2020), super-resolution (Dai et al., 2019), speech synthesis (Li et al., 2019), and drug discovery (Segler et al., 2018), as well as benefits for downstream tasks like semi-supervised learning (Chen et al., 2020). A number of methods have been proposed, such as Variational Autoencoders (VAEs, Figure 1a), which learn to encode the data to a latent space that follows a normal distribution, permitting sampling (Kingma & Welling, 2014). Generative Adversarial Networks (GANs) have two competing networks, one which generates data and another which discriminates generated samples from real data (Goodfellow et al., 2014). Approaches that approximate the posterior using gradient descent (Lipton & Tripathi, 2017) or short-run MCMC (Nijkamp et al., 2020) have also been proposed, but they require iterative gradient updates to obtain a latent vector for a sample. Autoregressive Models (Van Den Oord et al., 2016) decompose the data distribution as the product of conditional distributions, and Normalizing Flows (Rezende & Mohamed, 2015) chain together invertible functions; both methods allow exact likelihood inference. Energy-Based Models (EBMs) map data points to scalar energy values that are inversely related to their likelihood.



Figure 1: Gradient Origin Networks (GONs; b) use gradients (dashed lines) as encodings, so only a single network F is required, which can be an implicit representation network (c). Unlike VAEs (a), which use two networks, E and D, variational GONs (d) permit sampling with only one network.
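In the implicit variant (Figure 1c), the same single network takes a coordinate c concatenated with the latent, and one latent per example is recovered from the gradient, at the origin, of the loss summed over that example's coordinates. A minimal sketch under the same illustrative assumptions as before (a toy linear network with a closed-form gradient; all names and sizes are hypothetical, and a real implicit GON would use a nonlinear coordinate network with automatic differentiation):

```python
import numpy as np

rng = np.random.default_rng(1)
coord_dim, latent_dim, n_points = 2, 16, 64   # e.g. 2-D pixel coordinates

# Toy linear implicit network: maps one (coordinate, latent) pair to one value.
W = rng.normal(scale=0.05, size=(coord_dim + latent_dim, 1))
b = np.zeros(1)

def implicit_gon_latent(coords, values):
    """One latent per example, shared across all of that example's coordinates.
    For out = [c, z] @ W + b, d(out)/dz is the latent rows W_z of W, so the
    gradient at z0 = 0 of the summed squared error is 2 * sum(residual) * W_z."""
    z0 = np.zeros(latent_dim)
    inputs = np.concatenate([coords, np.tile(z0, (len(coords), 1))], axis=1)
    residual = (inputs @ W + b) - values              # (n_points, 1)
    grad_z0 = 2.0 * residual.sum() * W[coord_dim:, 0]
    return -grad_z0                                   # gradient at the origin as the code

coords = rng.uniform(-1, 1, size=(n_points, coord_dim))   # sampled coordinates
values = rng.normal(size=(n_points, 1))                   # stand-in signal values
z = implicit_gon_latent(coords, values)
# The same network then reconstructs the signal at any queried coordinates:
recon = np.concatenate([coords, np.tile(z, (n_points, 1))], axis=1) @ W + b
```

Because the latent is shared across coordinates while the network itself is shared across examples, this recovers a space of implicit functions without a hypernetwork, as the abstract describes.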

