FUNCTION-SPACE REGULARIZED RÉNYI DIVERGENCES

Abstract

We propose a new family of regularized Rényi divergences parametrized not only by the order α but also by a variational function space. These new objects are defined by taking the infimal convolution of the standard Rényi divergence with the integral probability metric (IPM) associated with the chosen function space. We derive a novel dual variational representation that can be used to construct numerically tractable divergence estimators. This representation avoids risk-sensitive terms and therefore exhibits lower variance, making it well-behaved when α > 1; this addresses a notable weakness of prior approaches. We prove several properties of these new divergences, showing that they interpolate between the classical Rényi divergences and IPMs. We also study the α → ∞ limit, which leads to a regularized worst-case regret and a new variational representation in the classical case. Moreover, we show that the proposed regularized Rényi divergences inherit features from IPMs, such as the ability to compare distributions that are not absolutely continuous with respect to one another, e.g., empirical measures and distributions with low-dimensional support. We present numerical results on both synthetic and real datasets, showing the utility of these new divergences in both estimation and GAN training applications; in particular, we demonstrate significantly reduced variance and improved training performance.

1. INTRODUCTION

Rényi divergence, introduced in Rényi (1961), is a significant extension of the Kullback-Leibler (KL) divergence with numerous applications; see, e.g., Van Erven & Harremos (2014). The recent neural-based estimators for divergences Belghazi et al. (2018), together with generative adversarial networks (GANs) Goodfellow et al. (2014), have accelerated the use of divergences in deep learning. Neural-based divergence estimators are made feasible by variational representation formulas, which are essentially lower bounds (and, occasionally, upper bounds) approximated by tractable statistical averages. Estimating a divergence from such variational formulas is a notoriously difficult problem. Challenges include potentially high bias, which may require an exponential number of samples McAllester & Stratos (2020), and exponential statistical variance for certain variational estimators Song & Ermon (2019), rendering divergence estimation both data-inefficient and computationally expensive. These issues are especially prominent for Rényi divergences of order larger than 1. Indeed, numerical simulations have shown that, unless the distributions P and Q are very close to one another, the Rényi divergence R_α(P∥Q) is almost intractable to estimate when α > 1 due to the high variance of the statistically approximated risk-sensitive observables Birrell et al. (2021); see also the recent analysis in Lee & Shin (2022) and the illustrative sketch below. A similar issue has also been observed for the KL divergence Song & Ermon (2019). Overall, the lack of low-variance estimators for Rényi divergences has prevented widespread and accessible experimentation with this class of information-theoretic tools, except in very special cases. We hope our results here will provide a suitable set of tools to address this gap in the methodology.

One approach to variance reduction is the development of new variational formulas. This direction has been especially fruitful for the estimation of mutual information van den Oord et al. (2018); Cheng et al. (2020). Another approach is to regularize the divergence by restricting the function space of the variational formula. Indeed, instead of directly attacking the variance issue, the function space of the variational formula can be restricted, for instance, by bounding the test functions or, more appropriately, by bounding the derivatives of the test functions. The latter regularization leads to Lipschitz-constrained test functions and hence to IPMs such as the Wasserstein metric.
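To make the variance issue concrete, the following is a minimal illustrative sketch, not the estimator developed in this paper, of how a Rényi divergence can be estimated from samples via a variational formula. It uses a simple lower bound, valid for α > 1 and provable with Hölder's inequality, R_α(P∥Q) ≥ α/(α−1) log E_P[e^{(α−1)g}] − log E_Q[e^{αg}], with equality at g = log(dP/dQ). The Gaussian setup, the linear test-function family, and all function names are illustrative assumptions; in practice g would be parametrized by a neural network.

```python
# Minimal illustrative sketch (not the estimator proposed in this paper):
# variational estimation of a Renyi divergence from samples, for alpha > 1.
import numpy as np

def log_mean_exp(v):
    # Numerically stable log(mean(exp(v))).
    m = v.max()
    return m + np.log(np.mean(np.exp(v - m)))

def variational_objective(alpha, g_on_P, g_on_Q):
    # Sample-average version of the lower bound (valid for alpha > 1):
    #   alpha/(alpha-1) * log E_P[exp((alpha-1) g)] - log E_Q[exp(alpha g)].
    # The log-mean-exp terms are the risk-sensitive observables whose
    # sample fluctuations grow quickly with alpha.
    return (alpha / (alpha - 1.0)) * log_mean_exp((alpha - 1.0) * g_on_P) \
        - log_mean_exp(alpha * g_on_Q)

def estimate_renyi(alpha, x_P, x_Q, slopes=np.linspace(-2.0, 2.0, 81)):
    # Maximize over a toy family of linear test functions g_c(x) = c * x
    # (a stand-in for a neural network) by grid search over the slope c.
    return max(variational_objective(alpha, c * x_P, c * x_Q) for c in slopes)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu, n, trials = 1.0, 5000, 20
    for alpha in (1.5, 2.0, 5.0):
        estimates = []
        for _ in range(trials):
            x_P = rng.normal(mu, 1.0, size=n)   # samples from P = N(mu, 1)
            x_Q = rng.normal(0.0, 1.0, size=n)  # samples from Q = N(0, 1)
            estimates.append(estimate_renyi(alpha, x_P, x_Q))
        # Closed form for unit-variance Gaussians: R_alpha(P||Q) = alpha * mu^2 / 2.
        print(f"alpha={alpha}: true={alpha * mu ** 2 / 2:.3f}, "
              f"mean={np.mean(estimates):.3f}, std={np.std(estimates):.3f}")
```

Running such a sketch typically shows the spread of the estimates growing with α: the log-mean-exp terms become dominated by a few extreme samples, which is precisely the risk-sensitive behavior that motivates the function-space regularization developed in this paper.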

