EXACT MANIFOLD GAUSSIAN VARIATIONAL BAYES

Abstract

We propose an optimization algorithm for Variational Inference (VI) in complex models. Our approach relies on natural gradient updates where the variational space is a Riemannian manifold. We develop an efficient algorithm for Gaussian Variational Inference that implicitly satisfies the positive-definiteness constraint on the variational covariance matrix. Our Exact manifold Gaussian Variational Bayes (EMGVB) provides exact yet simple update rules and is straightforward to implement. Due to its black-box nature, EMGVB stands as a ready-to-use solution for VI in complex models. Over five datasets, we empirically validate our feasible approach on different statistical and econometric models, discussing its performance with respect to baseline methods.

1. INTRODUCTION

Although Bayesian principles are not new to Machine Learning (ML) (e.g., Mackay, 1992; 1995; Lampinen & Vehtari, 2001), only recently have feasible methods driven a growing use of Bayesian methods within the field (e.g., Zhang et al., 2018; Trusheim et al., 2018; Osawa et al., 2019; Khan et al., 2018b; Khan & Nielsen, 2018). In typical ML settings, the applicability of sampling methods for the challenging computation of the posterior is prohibitive; however, approximate methods such as Variational Inference (VI) have proved suitable and successful (Saul et al., 1996; Wainwright & Jordan, 2008; Hoffman et al., 2013; Blei et al., 2017). VI is generally performed with Stochastic Gradient Descent (SGD) methods (Robbins & Monro, 1951; Hoffman et al., 2013; Salimans & Knowles, 2014), boosted by the use of natural gradients (Hoffman et al., 2013; Wierstra et al., 2014; Khan et al., 2018b), and the updates often take a simple form (Khan & Nielsen, 2018; Osawa et al., 2019; Magris et al., 2022). The majority of VI algorithms rely on extensive use of the model's gradients, and the form of the variational posterior implies additional model-specific derivations that are not easy to adapt into a general, plug-and-play optimizer. Black-box methods (Ranganath et al., 2014) are straightforward to implement and versatile to use, as they avoid model-specific derivations by relying on stochastic sampling (Salimans & Knowles, 2014; Paisley et al., 2012; Kingma & Welling, 2013). The increased variance in the gradient estimates, as opposed to, e.g., methods relying on the Reparametrization Trick (RT) (Blundell et al., 2015; Xu et al., 2019), can be alleviated with variance reduction techniques (e.g., Magris et al., 2022). Furthermore, the majority of existing algorithms do not directly address parameter constraints.
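As a rough illustration of the contrast drawn above between black-box (score-function) gradients and reparametrization-trick gradients, the following sketch estimates the gradient of the ELBO with respect to the variational mean for a univariate Gaussian posterior and a toy standard-normal log-joint. All function names and the toy model are illustrative, not part of the paper; the point is that both estimators are unbiased for the same gradient, while the score-function one needs no model derivatives but typically exhibits higher variance.

```python
import numpy as np

rng = np.random.default_rng(0)


def log_joint(theta):
    # Toy unnormalized log-posterior: a standard normal stand-in for log p(theta, data).
    return -0.5 * theta**2


def log_q(theta, mu, log_sigma):
    # Log-density of the Gaussian variational posterior q(theta) = N(mu, sigma^2).
    sigma = np.exp(log_sigma)
    return -0.5 * np.log(2 * np.pi) - log_sigma - 0.5 * ((theta - mu) / sigma) ** 2


def score_function_grad(mu, log_sigma, n_samples=10_000):
    # Black-box estimator: E_q[ grad_mu log q(theta) * (log p(theta) - log q(theta)) ].
    # Requires only samples from q and evaluations of the log-densities.
    sigma = np.exp(log_sigma)
    theta = rng.normal(mu, sigma, size=n_samples)
    f = log_joint(theta) - log_q(theta, mu, log_sigma)
    return np.mean((theta - mu) / sigma**2 * f)


def reparam_grad(mu, log_sigma, n_samples=10_000):
    # Reparametrization-trick estimator: theta = mu + sigma * eps, eps ~ N(0, 1),
    # so grad_mu ELBO = E_eps[ d log_joint / d theta ] (the entropy term is mu-free).
    sigma = np.exp(log_sigma)
    theta = mu + sigma * rng.normal(size=n_samples)
    return np.mean(-theta)  # d log_joint / d theta = -theta for the toy model
```

For this toy model the exact gradient at (mu, sigma) = (1, 1) is -1; both estimators recover it in expectation, with the reparametrized one concentrating much more tightly around it for the same sample budget.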
Under the typical Gaussian variational assumption, guaranteeing the positive-definiteness of the covariance matrix is an acknowledged problem (e.g., Tran et al., 2021a; Khan et al., 2018b; Lin et al., 2020). Only a few algorithms directly tackle the problem (Osawa et al., 2019; Lin et al., 2020); see Section 3. A recent approximate approach based on manifold optimization is provided by Tran et al. (2021a). Building on the theoretical results of Khan & Lin (2017) and Khan et al. (2018a), we develop an exact version of Tran et al. (2021a), resulting in an algorithm that explicitly tackles the positive-definiteness constraint on the variational covariance matrix and resembles the readily applicable natural-gradient black-box framework of Magris et al. (2022). For its implementation, we discuss recommendations and practicalities, show that EMGVB is simple to implement, and demonstrate its feasibility in extensive experiments over four datasets, 12 models, and three competing VI optimizers. In Section 2 we review the basics of VI, in Section 3 we review the Manifold Gaussian Variational Bayes approach and other related works, and in Section 4 we discuss our proposed approach. Experi-
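The positive-definiteness issue discussed above can be made concrete with a small sketch: a plain Euclidean gradient step on the covariance matrix can exit the cone of symmetric positive-definite (SPD) matrices, whereas a step followed by a retraction onto the SPD manifold stays inside it by construction. The second-order retraction below, Sigma + X + 0.5 X Sigma^{-1} X, is one common choice in manifold optimization and is shown purely as an illustration under assumed notation, not as the EMGVB update itself.

```python
import numpy as np


def is_pd(A):
    # Check symmetric positive-definiteness via a Cholesky attempt.
    try:
        np.linalg.cholesky((A + A.T) / 2)
        return True
    except np.linalg.LinAlgError:
        return False


def naive_update(Sigma, G, step):
    # Plain Euclidean step on the covariance: Sigma - step * G.
    # For a large enough step this can leave the SPD cone.
    return Sigma - step * G


def retraction_update(Sigma, G, step):
    # Second-order SPD retraction: Sigma_+ = Sigma + X + 0.5 * X Sigma^{-1} X,
    # with X = -step * G (G assumed symmetric). Writing
    # Sigma_+ = Sigma^{1/2} (I + Y + 0.5 Y^2) Sigma^{1/2}, Y = Sigma^{-1/2} X Sigma^{-1/2},
    # every eigenvalue of I + Y + 0.5 Y^2 equals 0.5 (lam + 1)^2 + 0.5 > 0,
    # so Sigma_+ is SPD for any symmetric X.
    X = -step * G
    return Sigma + X + 0.5 * X @ np.linalg.solve(Sigma, X)
```

For instance, starting from Sigma = I with gradient G = I and step 1.5, the naive update yields -0.5 I (not a valid covariance), while the retracted update remains SPD.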

