DEEP GENERATIVE MODELING ON LIMITED DATA WITH REGULARIZATION BY NONTRANSFERABLE PRE-TRAINED MODELS

Abstract

Deep generative models (DGMs) are data-hungry: learning a complex model from limited data suffers from high variance and easily overfits. Inspired by the classical bias-variance tradeoff, we propose the regularized deep generative model (Reg-DGM), which leverages a nontransferable pre-trained model to reduce the variance of generative modeling with limited data. Formally, Reg-DGM optimizes a weighted sum of a divergence between the data and model distributions and the expectation of an energy function under the model distribution, where the energy function is defined by the pre-trained model. We analyze a simple yet representative Gaussian-fitting case to demonstrate how the weighting hyperparameter trades off bias and variance. Theoretically, we characterize the existence and uniqueness of the global minimum of Reg-DGM in a non-parametric setting and prove its convergence with neural networks trained by gradient-based methods. Empirically, with various pre-trained feature extractors and a data-dependent energy function, Reg-DGM consistently improves the generation performance of strong DGMs with limited data and achieves results competitive with state-of-the-art methods. Our implementation is available at //github.com/ML
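In symbols, the objective described above can be sketched as follows (the notation is ours, not fixed by the abstract: $\mathcal{D}$ a divergence, $p_{\mathrm{data}}$ and $p_\theta$ the data and model distributions, $f$ the energy function defined by the pre-trained model, and $\lambda \ge 0$ the weighting hyperparameter):

```latex
\min_{\theta}\;
\mathcal{D}\big(p_{\mathrm{data}},\, p_{\theta}\big)
\;+\;
\lambda\, \mathbb{E}_{x \sim p_{\theta}}\big[f(x)\big]
```

Setting $\lambda = 0$ recovers the unregularized DGM, while a larger $\lambda$ pulls $p_\theta$ toward low-energy regions of the pre-trained model, trading some bias for a reduction in variance.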

1. INTRODUCTION

Deep generative models (DGMs) (Kingma & Welling, 2013; Goodfellow et al., 2014; Sohl-Dickstein et al., 2015; Van den Oord et al., 2016; Dinh et al., 2016; Hinton & Salakhutdinov, 2006) employ neural networks to capture the underlying distribution of high-dimensional data and find applications in various learning tasks (Kingma et al., 2014; Zhu et al., 2017; Razavi et al., 2019; Ramesh et al., 2021; 2022; Ho et al., 2022). Such models are often data-hungry (Li et al., 2021; Wang et al., 2018) because of the complexity of their function classes. Recent work (Karras et al., 2020a) found that classical variants of generative adversarial networks (GANs) (Goodfellow et al., 2014; Karras et al., 2020b) produce poor samples with limited data, a problem shared in principle by other DGMs. Thus, improving sample efficiency is a common challenge for DGMs. The root cause of the problem is that learning a model from a complex class on limited data suffers from high variance and easily overfits the training data (Mohri et al., 2018). To alleviate the problem, previous work either employed sophisticated data augmentation strategies (Zhao et al., 2020a; Karras et al., 2020a; Jiang et al., 2021), designed new losses for the discriminator in GANs (Cui et al., 2021; Yang et al., 2021), or transferred a pre-trained DGM (Wang et al., 2018; Noguchi & Harada, 2019; Mo et al., 2020). Although, to our knowledge, this has not been pointed out in the literature, such prior work can be understood as implicitly reducing the variance of the estimate (Mohri et al., 2018). In
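The bias-variance tradeoff invoked above can be made concrete with a toy experiment in the spirit of the Gaussian-fitting case (this sketch is ours, not the paper's exact setting): shrinking a sample mean toward a biased "pre-trained" prior mean adds bias but cuts variance, and a moderate regularization weight minimizes the overall error.

```python
import numpy as np

def bias_var_mse(lam, n=5, trials=10000, true_mean=0.0, prior_mean=0.5, seed=0):
    """Monte-Carlo estimate of bias^2, variance, and MSE for a mean estimator
    shrunk toward a (biased) prior mean with regularization weight lam."""
    rng = np.random.default_rng(seed)
    w = 1.0 / (1.0 + lam)  # lam = 0 recovers the unregularized sample mean
    est = np.array([
        w * rng.normal(true_mean, 1.0, n).mean() + (1 - w) * prior_mean
        for _ in range(trials)
    ])
    bias2 = (est.mean() - true_mean) ** 2
    var = est.var()
    return bias2, var, bias2 + var

for lam in (0.0, 0.5, 2.0):
    b2, v, mse = bias_var_mse(lam)
    print(f"lam={lam}: bias^2={b2:.4f} var={v:.4f} mse={mse:.4f}")
```

With only n = 5 samples per trial, the unregularized estimator (lam = 0) is nearly unbiased but has high variance; a moderate lam lowers the MSE despite the prior being wrong, mirroring the role of the weighting hyperparameter in Reg-DGM.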


