CAN KERNEL TRANSFER OPERATORS HELP FLOW-BASED GENERATIVE MODELS?

Abstract

Flow-based generative models are deep generative models with tractable likelihoods, and offer several attractive properties including efficient density estimation and sampling. Despite many advantages, current formulations (e.g., normalizing flows) often have an expensive memory/runtime footprint, which hinders their use in a number of applications. In this paper, we consider the setting where we have access to an autoencoder that is suitably effective for the dataset of interest. Under some mild conditions, we show that we can calculate a mapping to an RKHS, which subsequently enables deploying mature ideas from the kernel methods literature for flow-based generative models. Specifically, via kernel transfer operators, we can explicitly map the distribution in the RKHS (i.e., approximate the flow) to match or align with a well-characterized template distribution. This leads to a direct and resource-efficient approximation that avoids iterative optimization. We empirically show that this simple idea yields competitive results on popular datasets such as CelebA, as well as promising results on a public 3D brain imaging dataset where the sample sizes are much smaller.
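To make the core idea concrete, the following is a toy 1-D sketch (our own illustration, not the paper's actual algorithm) of aligning a source distribution with a Gaussian template through a regularized kernel estimate of a transfer map. Pairing the two sample sets by sorting stands in for a proper alignment step, and all names (`rbf`, `transfer`, the bandwidth and ridge values) are hypothetical choices for illustration.

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    """RBF Gram matrix between 1-D sample vectors A and B."""
    return np.exp(-gamma * (A[:, None] - B[None, :]) ** 2)

rng = np.random.default_rng(0)
n = 400
x = np.sort(rng.exponential(size=n))     # "latent" samples from the source density
z = np.sort(rng.standard_normal(n))      # template: standard Gaussian samples

# Sorting pairs the two sample sets monotonically (the 1-D optimal-transport
# coupling); ridge-regularized kernel regression on these pairs is then a
# finite-sample estimate of a map that transfers the source distribution
# onto the template, expressed with RKHS functions.
lam = 1e-2
K = rbf(x, x)
alpha = np.linalg.solve(K + lam * np.eye(n), z)

def transfer(x_new):
    """Push new source samples toward the Gaussian template."""
    return rbf(x_new, x) @ alpha

x_test = rng.exponential(size=1000)
z_hat = transfer(x_test)
print(z_hat.mean(), z_hat.std())  # roughly 0 and 1 if the map is learned well
```

Note that the map is estimated in closed form from one linear solve; there is no iterative optimization loop, which mirrors the resource-efficiency argument made in the abstract.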



An important property of such models is the explicit use of a tractable likelihood function, which enables leveraging maximum likelihood principles during training as well as efficient/exact density estimation and sampling. The formulation is invertible by design, but this comes with higher memory requirements: for example, making the bijective mapping sufficiently expressive increases the memory footprint Lee et al. (2020).
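The tractable likelihood mentioned above comes from the change-of-variables formula: for an invertible map f with base density p_Z, log p_X(x) = log p_Z(f(x)) + log |det ∂f/∂x|. A minimal sketch (our own, not from the paper) with a 1-D affine flow, where the Jacobian term is just log|a|:

```python
import numpy as np

a, b = 2.0, -1.0               # scale and shift of a toy invertible affine flow
f = lambda x: a * x + b        # forward map x -> z
f_inv = lambda z: (z - b) / a  # exact inverse, available by design

def log_px(x):
    """Exact log-density of x under the flow, via change of variables."""
    z = f(x)
    log_pz = -0.5 * (z ** 2 + np.log(2 * np.pi))  # standard normal base density
    return log_pz + np.log(abs(a))                # log|det Jacobian| = log|a|

# Exact sampling: draw from the base density and invert the flow.
rng = np.random.default_rng(0)
samples = f_inv(rng.standard_normal(5))
print(log_px(0.0), samples)
```

Real flows stack many such invertible layers; keeping every layer expressive enough while preserving a cheap Jacobian is precisely where the memory cost noted above arises.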



A flow-based generative model refers to a deep generative model composed of a set of invertible transformations. While GANs and VAEs remain the two dominant generative models in the community, flow-based formulations have continually evolved and now offer competitive performance in applications including audio/speech synthesis Kim et al. (2019; 2020), text-to-speech Miao et al. (2020), photo-realistic image generation Kingma & Dhariwal (2018), and learning cross-domain mappings Mahajan et al.

Figure 1: Representative generated images (of resolution 128 × 128) using our proposed algorithm.

