GRADIENT ORIGIN NETWORKS

Abstract

This paper proposes a new type of generative model that is able to quickly learn a latent representation without an encoder. This is achieved using empirical Bayes to calculate the expectation of the posterior, which is implemented by initialising a latent vector with zeros, then using the gradient of the log-likelihood of the data with respect to this zero vector as new latent points. The approach has similar characteristics to autoencoders, but with a simpler architecture, and is demonstrated in a variational autoencoder equivalent that permits sampling. This also allows implicit representation networks to learn a space of implicit functions without requiring a hypernetwork, retaining their representation advantages across datasets. The experiments show that the proposed method converges faster, with significantly lower reconstruction error than autoencoders, while requiring half the parameters.

1. INTRODUCTION

Observable data in nature has some parameters which are known, such as local coordinates, but also some unknown parameters, such as how the data is related to other examples. Generative models, which learn a distribution over observables, are central to our understanding of patterns in nature and allow for efficient querying of new unseen examples. Recently, deep generative models have received interest due to their ability to capture a broad set of features when modelling data distributions. As such, they offer direct applications such as synthesising high-fidelity images (Karras et al., 2020), super-resolution (Dai et al., 2019), speech synthesis (Li et al., 2019), and drug discovery (Segler et al., 2018), as well as benefits for downstream tasks like semi-supervised learning (Chen et al., 2020). A number of methods have been proposed, such as Variational Autoencoders (VAEs, Figure 1a), which learn to encode the data to a latent space that follows a normal distribution, permitting sampling (Kingma & Welling, 2014). Generative Adversarial Networks (GANs) have two competing networks, one which generates data and another which discriminates generated samples from implausible results (Goodfellow et al., 2014). Variational approaches that approximate the posterior with gradient descent (Lipton & Tripathi, 2017) or short-run MCMC (Nijkamp et al., 2020) have been proposed, but obtaining a latent vector for a sample requires iterative gradient updates. Autoregressive models (Van Den Oord et al., 2016) decompose the data distribution as a product of conditional distributions, and Normalizing Flows (Rezende & Mohamed, 2015) chain together invertible functions; both methods allow exact likelihood inference. Energy-Based Models (EBMs) map data points to energy values proportional to likelihood, thereby permitting sampling through the use of Markov chain Monte Carlo (Du & Mordatch, 2019).
In general, to support encoding, these approaches require separate encoder networks, are limited to invertible functions, or require multiple sampling steps.

Implicit representation learning (Park et al., 2019; Tancik et al., 2020), where a network is trained on data parameterised continuously rather than in discrete grid form, has seen a surge of interest due to its small number of parameters, speed of convergence, and ability to model fine details. In particular, sinusoidal representation networks (SIRENs) (Sitzmann et al., 2020b) achieve impressive results, modelling many signals with high precision, thanks to their use of periodic activations paired with carefully initialised MLPs. So far, however, these models have been limited to modelling single data samples, or have used an additional hypernetwork or meta-learning (Sitzmann et al., 2020a) to estimate the weights of a simple implicit model, adding significant complexity.

This paper proposes Gradient Origin Networks (GONs), a new type of generative model (Figure 1b) that requires neither encoders nor hypernetworks. This is achieved by initialising latent points at the origin, then using the gradient of the log-likelihood of the data with respect to these points as the latent space. At inference, latent vectors can be obtained in a single step without requiring iteration. GONs are shown to have similar characteristics to convolutional autoencoders and variational autoencoders while using approximately half the parameters, and can be applied to implicit representation networks (such as SIRENs), allowing a space of implicit functions to be learned with a simpler overall architecture.
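As a minimal illustration of the GON encoding step, consider a hypothetical linear decoder F(z) = Wz with a squared-error loss (the paper uses deep convolutional and implicit networks; this toy example is only a sketch of the mechanism). The gradient of the loss at the origin is available in closed form, so the single-step latent z = -∇_z L|_{z=0} can be checked analytically: it equals Wᵀx, i.e. a tied-weight linear autoencoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear decoder F(z) = W z (illustrative only; the paper's
# decoders are deep networks whose gradients are obtained by autodiff).
d_x, d_z = 8, 3
W = rng.standard_normal((d_x, d_z))
x = rng.standard_normal(d_x)

# Reconstruction loss L(z) = 0.5 * ||x - W z||^2 has gradient
# grad_z L = -W^T (x - W z); evaluate it at the origin z = 0.
z0 = np.zeros(d_z)
grad_at_origin = -W.T @ (x - W @ z0)

# GON latent: the negative gradient of the loss at the origin,
# obtained in a single step without an encoder network.
z = -grad_at_origin
assert np.allclose(z, W.T @ x)  # reduces to a tied-weight autoencoder

# Decode with the same network that produced the gradient.
x_hat = W @ z
```

In the nonlinear case the same two passes apply: one backward pass through F at z = 0 to obtain the latent, then one forward pass through F to reconstruct, so only a single network is trained.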

2. PRELIMINARIES

We first introduce some background context that will be used to derive our proposed approach.

2.1. EMPIRICAL BAYES

The concept of empirical Bayes (Robbins, 1956; Saremi & Hyvärinen, 2019), for a random variable z ∼ p_z and a particular observation z_0 ∼ p_{z_0}, provides an estimator of z expressed purely in terms of p(z_0) that minimises the expected squared error. This estimator can be written as a conditional mean:

$$\hat{z}(z_0) = \int z\, p(z|z_0)\, dz = \int z\, \frac{p(z, z_0)}{p(z_0)}\, dz. \tag{1}$$

Of particular relevance is the case where z_0 is a noisy observation of z with covariance Σ. In this case p(z_0) can be represented by marginalising out z:

$$p(z_0) = \int \frac{1}{(2\pi)^{d/2} |\det(\Sigma)|^{1/2}} \exp\!\left(-(z_0 - z)^T \Sigma^{-1} (z_0 - z)/2\right) p(z)\, dz. \tag{2}$$

Differentiating this with respect to z_0 and multiplying both sides by Σ gives:

$$\Sigma \nabla_{z_0} p(z_0) = \int (z - z_0)\, p(z, z_0)\, dz = \int z\, p(z, z_0)\, dz - z_0\, p(z_0). \tag{3}$$

After dividing through by p(z_0) and combining with Equation 1, we obtain a closed-form estimator of z (Miyasawa, 1961) written in terms of the score function ∇ log p(z_0) (Hyvärinen, 2005):

$$\hat{z}(z_0) = z_0 + \Sigma \nabla_{z_0} \log p(z_0). \tag{4}$$

This optimal procedure is achieved in what can be interpreted as a single gradient descent step, with no knowledge of the prior p(z). By rearranging Equation 4, a definition of ∇ log p(z_0) can be derived; this can be used to train models that approximate the score function (Song & Ermon, 2019).

2.2. VARIATIONAL AUTOENCODERS

Variational Autoencoders (VAEs; Kingma & Welling, 2014) are a probabilistic take on standard autoencoders that permits sampling. A latent-variable generative model p_θ(x|z) is defined with a normally distributed prior over the latent variables, p_θ(z) = N(z; 0, I_d). p_θ(x|z) is typically parameterised as a Bernoulli, Gaussian, or multinomial distribution, or a mixture of logits. In this case, the true posterior p_θ(z|x) is intractable, so a secondary encoding network q_φ(z|x) is used to approximate it; the pair of networks thus resembles a traditional autoencoder. This allows VAEs to approximate p_θ(x) by maximising the evidence lower bound (ELBO), defined as:

$$\log p_\theta(x) \geq \mathcal{L}_{\text{VAE}} = -D_{KL}\!\left(q_\phi(z|x)\,\|\,\mathcal{N}(0, I_d)\right) + \mathbb{E}_{q_\phi(z|x)}\!\left[\log p_\theta(x|z)\right]. \tag{5}$$
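When q_φ(z|x) is a diagonal Gaussian N(μ, diag(σ²)), the KL term in Equation 5 has a well-known closed form, 0.5 Σᵢ (μᵢ² + σᵢ² − log σᵢ² − 1). A quick numerical check, using hypothetical encoder outputs rather than values from the paper:

```python
import numpy as np

# Closed-form KL divergence between a diagonal Gaussian N(mu, diag(sigma^2))
# and the standard normal prior N(0, I), as appears in the VAE ELBO.
def kl_diag_gaussian(mu, log_var):
    return 0.5 * np.sum(mu**2 + np.exp(log_var) - log_var - 1.0)

# Hypothetical encoder outputs for one sample (illustrative values).
mu = np.array([0.5, -1.0])
log_var = np.array([0.0, 0.2])

kl = kl_diag_gaussian(mu, log_var)

# Sanity checks: KL is non-negative, and zero exactly when q equals the prior.
assert kl > 0
assert np.isclose(kl_diag_gaussian(np.zeros(2), np.zeros(2)), 0.0)
```

The same closed-form term is what a variational GON (Figure 1d) maximises, the difference being that μ and σ are produced from the origin gradient rather than by a separate encoder network.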



Figure 1: Gradient Origin Networks (GONs; b) use gradients (dashed lines) as encodings thus only a single network F is required, which can be an implicit representation network (c). Unlike VAEs (a) which use two networks, E and D, variational GONs (d) permit sampling with only one network.

