QUANTITATIVE UNIVERSAL APPROXIMATION BOUNDS FOR DEEP BELIEF NETWORKS

Abstract

We show that deep belief networks with binary hidden units can approximate any multivariate probability density under very mild integrability requirements on the parental density of the visible nodes. The approximation is measured in the $L^q$-norm for $q \in [1, \infty]$ ($q = \infty$ corresponding to the supremum norm) and in Kullback-Leibler divergence. Furthermore, we establish sharp quantitative bounds on the approximation error in terms of the number of hidden units.

1. INTRODUCTION

Deep belief networks (DBNs) are a class of generative probabilistic models obtained by stacking several restricted Boltzmann machines (RBMs, Smolensky (1986)). For a brief introduction to RBMs and DBNs we refer the reader to the survey articles Fischer & Igel (2012; 2014); Montúfar (2016); Ghojogh et al. (2021). Since their introduction, see Hinton et al. (2006); Hinton & Salakhutdinov (2006), DBNs have been successfully applied to a variety of problems in the domains of natural language processing (Hinton, 2009; Jiang et al., 2018), bioinformatics (Wang & Zeng, 2013; Liang et al., 2014; Cao et al., 2016; Luo et al., 2019), financial markets (Shen et al., 2015), and computer vision (Abdel-Zaher & Eldeib, 2016; Kamada & Ichimura, 2016; 2019; Huang et al., 2019). However, our theoretical understanding of the class of continuous probability distributions that can be approximated by them is limited. The ability to approximate a broad class of probability distributions, usually referred to as the universal approximation property, is still an open problem for DBNs with real-valued visible units. As a measure of proximity between two real-valued probability density functions, one typically considers the $L^q$-distance or the Kullback-Leibler divergence.
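To make the stacked architecture concrete, the following minimal sketch draws a sample from a DBN with two binary hidden layers and a real-valued visible layer. It assumes a Gaussian parental density and a location-shift coupling between the first hidden layer and the visible units; this is one common modeling choice, not the construction used in this paper, and all names (`sample_dbn`, `W_top`, `W_vis`, etc.) are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_dbn(W_top, b1, b2, W_vis, b_vis, sigma=1.0, gibbs_steps=200):
    """Draw one sample x in R^d from a toy binary-binary DBN.

    Top RBM over binary layers h1 (size m) and h2 (size m + 1) with
    energy E(h1, h2) = -h1 @ W_top @ h2 - b1 @ h1 - b2 @ h2.
    Visible layer (Gaussian parental density -- an assumption of this
    sketch): x | h1 ~ N(b_vis + W_vis @ h1, sigma^2 * I).
    """
    h1 = (rng.random(b1.shape[0]) < 0.5).astype(float)  # arbitrary start
    # Block Gibbs sweeps in the top RBM (h1, h2).
    for _ in range(gibbs_steps):
        h2 = (rng.random(b2.shape[0]) < sigmoid(h1 @ W_top + b2)).astype(float)
        h1 = (rng.random(b1.shape[0]) < sigmoid(W_top @ h2 + b1)).astype(float)
    # One directed pass down to the real-valued visible layer.
    return b_vis + W_vis @ h1 + sigma * rng.standard_normal(b_vis.shape[0])

# Tiny example: hidden layers of sizes m = 3 and m + 1 = 4, visible dimension d = 2.
m, d = 3, 2
W_top = 0.1 * rng.standard_normal((m, m + 1))
W_vis = rng.standard_normal((d, m))
x = sample_dbn(W_top, np.zeros(m), np.zeros(m + 1), W_vis, np.zeros(d))
print(x.shape)  # (2,)
```

The block Gibbs chain in the top RBM only approaches the RBM's stationary distribution as the number of sweeps grows; in practice one runs a fixed number of steps, as here.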

Contributions.

In this article we study the approximation properties of deep belief networks for multivariate continuous probability distributions which have a density with respect to the Lebesgue measure. We show that, as $m \to \infty$, the universal approximation property holds for binary-binary DBNs with two hidden layers of sizes $m$ and $m + 1$, respectively. Furthermore, we provide an explicit quantitative bound on the approximation error in terms of $m$. More specifically, the main contributions of this article are:

• For each $q \in [1, \infty)$ we show that DBNs with two binary hidden layers and parental density $\phi : \mathbb{R}^d \to \mathbb{R}_+$ can approximate any probability density $f : \mathbb{R}^d \to \mathbb{R}_+$ in the $L^q$-norm, solely under the condition that $f, \phi \in L^q(\mathbb{R}^d)$, where
$$L^q(\mathbb{R}^d) = \bigg\{ f : \mathbb{R}^d \to \mathbb{R} \;:\; \|f\|_{L^q} = \Big( \int_{\mathbb{R}^d} |f(x)|^q \, dx \Big)^{\frac{1}{q}} < \infty \bigg\}.$$
In addition, we prove that the error admits a bound of order $O\big(m^{\frac{1}{\min(q,2)} - 1}\big)$ for each $q \in (1, \infty)$, where $m$ is the number of hidden neurons (the exponent is evaluated case by case in the display after this list).

• If the target density $f$ is uniformly continuous and the parental density $\phi$ is bounded, we provide an approximation result in the $L^\infty$-norm (also known as the supremum or uniform norm), where
$$L^\infty(\mathbb{R}^d) = \Big\{ f : \mathbb{R}^d \to \mathbb{R} \;:\; \|f\|_{L^\infty} = \sup_{x \in \mathbb{R}^d} |f(x)| < \infty \Big\}.$$
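For orientation, the rate exponent from the first bullet can be evaluated explicitly. The display below is a direct specialization of the stated bound $O\big(m^{\frac{1}{\min(q,2)} - 1}\big)$; the case split is ours:

```latex
m^{\frac{1}{\min(q,2)} - 1}
  = \begin{cases}
      m^{\frac{1}{q} - 1}, & q \in (1, 2),\\[2pt]
      m^{-\frac{1}{2}},    & q \in [2, \infty).
    \end{cases}
```

Thus the rate saturates at $O(m^{-1/2})$ for all $q \ge 2$, while as $q \downarrow 1$ the exponent tends to $0$, consistent with the $q = 1$ result above being qualitative (no rate is claimed there).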




