CAN KERNEL TRANSFER OPERATORS HELP FLOW BASED GENERATIVE MODELS?

Abstract

Flow-based generative models refer to deep generative models with tractable likelihoods, and offer several attractive properties including efficient density estimation and sampling. Despite many advantages, current formulations (e.g., normalizing flow) often have an expensive memory/runtime footprint, which hinders their use in a number of applications. In this paper, we consider the setting where we have access to an autoencoder, which is suitably effective for the dataset of interest. Under some mild conditions, we show that we can calculate a mapping to an RKHS which subsequently enables deploying mature ideas from the kernel methods literature for flow-based generative models. Specifically, we can explicitly map the RKHS distribution (i.e., approximate the flow) to match or align with a template/well-characterized distribution via kernel transfer operators. This leads to a direct and resource-efficient approximation that avoids iterative optimization. We empirically show that this simple idea yields competitive results on popular datasets such as CelebA, as well as promising results on a public 3D brain imaging dataset where the sample sizes are much smaller.

1. INTRODUCTION

A flow-based generative model refers to a deep generative model composed using a set of invertible transformations. While GANs and VAEs remain the two dominant generative models in the community, flow-based formulations have continually evolved and now offer competitive performance in applications including audio/speech synthesis Kim et al. (2019; 2020), text to speech Miao et al. (2020), photo-realistic image generation Kingma & Dhariwal (2018), and learning cross-domain mappings Mahajan et al. (2020). An important property of such models is the explicit use of a tractable likelihood function, which enables leveraging maximum likelihood principles during training as well as efficient/exact density estimation and sampling. The formulation is invertible by design, but this involves higher memory requirements: permitting the bijective mapping to be expressive enough increases the memory footprint Lee et al. (2020); Kim et al. (2019), an issue that is a focus of several recent results Jacobsen et al. (2018); Chen et al. (2016). Moreover, in these models we need to calculate the inverse and backpropagate through all invertible transformations during training. Calculating the inverse incurs a multiplicative increase in cost, usually as a function of the feature dimension, relative to the calculation of the likelihood, an issue addressed to some extent in Dinh et al. (2017); Kingma & Dhariwal (2018).

A flow-based model learns a bijective mapping from the (unknown) input data distribution to a known distribution, estimated from the training samples. In the generation step, we need the inverse of this mapping (given such an inverse exists) to map a sample drawn from the known distribution back to the input (data) space. When the Jacobian of the transformation mapping can be efficiently computed or estimated (e.g., having a lower triangular form), directly optimizing the likelihood of the training samples is possible. However, in training flow-based generative models, we must either restrict the expressiveness at each layer or fall back on more numerically heavy solutions, see (Chen et al., 2018). Next, we discuss how several existing results may provide a simplification strategy.

Figure 1: Representative generated images (of resolution 128 × 128) using our proposed algorithm.

1.1 RELATED WORKS AND RATIONALE

Our starting point is the existing literature on Koopman and Perron-Frobenius operators Song et al. (2009); Fukumizu et al. (2013); Klus et al. (2020), which offers an arguably easier, optionally linear, procedure that can be used to analyze non-linear dynamics of measurements that evolve temporally. For instance, as described in Arbabi (2018); Lusch et al. (2018), if we view the data/measurements as evaluations of functions of the state of a dynamical system (such functions are also called observables), then the entire set of such functions forms a linear vector space. Transfer operators on this space describe a linear evolution of the dynamics, i.e., finite-dimensional nonlinear dynamics are replaced by infinite-dimensional linear dynamics Brunton et al. (2017), perfectly evolving one set of measurements to another over time if the space can be well characterized. Of course, this is not practically beneficial on its own because constructing such infinite-dimensional spaces can be intractable. Nonetheless, results in optimal control demonstrate that the idea can still be effective in specific cases, using approximations based either on spectral analysis of a large but finite number of functions Williams et al. (2015) or on a search for potential eigenfunctions of the operators using neural networks Li et al. (2017); Lusch et al. (2018). Within the last year, several results describe the potential benefits of such operators in machine learning problems as well Li et al. (2020); Azencot et al. (2020).

If we consider the transformations that flow-based generative models learn as non-linear dynamics, a view also used in (Chen et al., 2018), a data-driven approximation strategy one can consider is to map the given data (or distribution) into an infinite-dimensional space of functions through the kernel trick, which may allow the use of well-known results based on kernel methods, including old and new results on powerful neural kernels Neal (1994); Jacot et al. (2018); Arora et al. (2019). Utilizing these results, a mean embedding in the corresponding Reproducing Kernel Hilbert Space (RKHS) would correspond to the distribution in the input space (the distribution from which input samples are drawn). Therefore, the problem of identifying a nonlinear mapping (or dynamics) in the input space (going from an intractable distribution to a known distribution or vice-versa) reduces to estimating a linear operator between two empirical kernel mean embeddings, where recent results on kernel transfer operators Klus et al. (2020) could be relevant or applicable. However, due to the high variability of the data, estimating the distribution directly in the input space can, as we will see shortly, be difficult. But if the input space is low-dimensional or otherwise structured, this problem could be mitigated. Fortunately, for many image datasets, one can identify a low-dimensional latent space such that, in theory, the above pipeline could be instantiated, enabling us to learn a transfer operator.

Conceptually, it is not difficult to see how the foregoing idea could potentially help (or simplify) flow-based generative models. In principle, using a transfer operator, one could push forward the input data distribution to a target distribution of our choice, if both have already been mapped to a sufficiently high dimensional space. If, additionally, the operator could also be inverted, this strategy may, at least, be viable. Of course, several key components are missing. We need to (a) assess if setting up a suitable infinite dimensional space is possible, (b) identify if we can estimate the transfer operator, and finally, (c) check if the procedure works at all. In the following sections of the paper, we will verify these key components and show that using only a linear operator yields surprisingly competitive results on several image generation tasks.

2. PRELIMINARIES

Auto-encoders. Images often lie close to an (unknown) lower dimensional manifold M ⊂ R^m such that dim(M) ≪ m, and operating with densities in a lower dimensional setting is often much easier.
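As a reminder of the background assumed above, the tractable likelihood that flow-based models exploit is the standard change-of-variables formula (textbook material, not specific to this paper): for an invertible transformation f mapping data x to z = f(x) with a simple base density p_Z (e.g., a standard Gaussian),

```latex
\log p_X(x) \;=\; \log p_Z\big(f(x)\big) \;+\; \log \left| \det J_f(x) \right| ,
```

so training maximizes this quantity over the training samples. A lower triangular J_f makes the log-determinant cheap (a sum over diagonal entries), which is precisely the expressiveness-versus-cost trade-off discussed in the introduction.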

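To make the kernel mean embedding idea concrete, the following is a minimal numerical sketch (illustrative only; the Gaussian kernel, the bandwidth sigma, and the toy distributions are our choices, not taken from this paper). It computes the squared RKHS distance between two empirical mean embeddings, i.e., the (biased) MMD estimate, showing that the embedding characterizes the underlying distribution.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # k(a, b) = exp(-||a - b||^2 / (2 sigma^2)), computed for all pairs of rows
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    # Squared MMD: ||mu_X - mu_Y||^2 in the RKHS, via empirical mean embeddings
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

rng = np.random.default_rng(0)
X1 = rng.normal(0.0, 1.0, size=(500, 2))   # samples from N(0, I)
X2 = rng.normal(0.0, 1.0, size=(500, 2))   # independent samples from the same law
Y  = rng.normal(3.0, 1.0, size=(500, 2))   # samples from a shifted Gaussian

print(mmd2(X1, X2))  # near zero: the two embeddings nearly coincide
print(mmd2(X1, Y))   # clearly positive: the distributions differ
```

Matching the RKHS embedding of the data to that of a template distribution, as the abstract describes, amounts to driving such a distance to zero.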

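The claim that a linear operator between embeddings can stand in for a nonlinear map can likewise be sketched. The toy below estimates an embedded transfer (conditional mean embedding) operator in the standard kernelized form; the map f = tanh, the sample sizes, and the regularizer lam are hypothetical choices for illustration and do not come from this paper.

```python
import numpy as np

def k(A, B, sigma=1.0):
    # Gaussian kernel Gram matrix between two 1-D point sets
    return np.exp(-((A[:, None] - B[None, :]) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(1)
f = np.tanh                # hypothetical nonlinear "flow" to be linearized in the RKHS
X = rng.normal(size=200)   # training inputs
Y = f(X)                   # their images under the dynamics
lam = 1e-3                 # ridge regularizer

# Fresh samples from the source distribution, and the coefficients of their
# empirical mean embedding (uniform weights over phi(Z_j))
Z = rng.normal(size=300)
alpha = np.full(len(Z), 1.0 / len(Z))

# Embedded transfer-operator estimate: coefficients beta such that
# sum_i beta_i phi(Y_i) approximates the pushforward of the embedding of Z
beta = np.linalg.solve(k(X, X) + lam * np.eye(len(X)), k(X, Z) @ alpha)

def emb_dist2(c1, P1, c2, P2):
    # ||sum_i c1_i phi(P1_i) - sum_j c2_j phi(P2_j)||^2 in the RKHS
    return c1 @ k(P1, P1) @ c1 + c2 @ k(P2, P2) @ c2 - 2 * c1 @ k(P1, P2) @ c2

gamma = np.full(len(Z), 1.0 / len(Z))
err_push = emb_dist2(beta, Y, gamma, f(Z))    # predicted vs. true target embedding
err_naive = emb_dist2(alpha, Z, gamma, f(Z))  # source embedding vs. true target embedding
print(err_push, err_naive)                    # err_push should be much smaller
```

The operator here is estimated from paired samples of a known map, so it is only a caricature of the paper's setting (aligning a data distribution with a template), but it illustrates why a purely linear, closed-form solve in the RKHS can replace iterative optimization of a nonlinear flow.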