DEEP VARIATIONAL IMPLICIT PROCESSES

Abstract

Implicit processes (IPs) are a generalization of Gaussian processes (GPs). IPs may lack a closed-form expression but are easy to sample from. Examples include, among others, Bayesian neural networks and neural samplers. IPs can be used as priors over functions, resulting in flexible models with well-calibrated prediction uncertainty estimates. Methods based on IPs usually carry out function-space approximate inference, which overcomes some of the difficulties of parameter-space approximate inference. Nevertheless, the approximations employed often limit the expressiveness of the final model, resulting, e.g., in a Gaussian predictive distribution, which can be restrictive. We propose here a multi-layer generalization of IPs called the deep variational implicit process (DVIP). This generalization is similar to that of deep GPs over GPs, but it is more flexible due to the use of IPs as the prior distribution over the latent functions. We describe a scalable variational inference algorithm for training DVIP and show that it outperforms previous IP-based methods and also deep GPs. We support these claims via extensive regression and classification experiments. We also evaluate DVIP on large datasets with up to several million data instances to illustrate its good scalability and performance.

1. INTRODUCTION

The Bayesian approach has become popular for capturing the uncertainty associated with the predictions of models that otherwise provide only point-wise estimates, such as neural networks (NNs) (Gelman et al., 2013; Gal, 2016; Murphy, 2012). However, when carrying out Bayesian inference, obtaining the posterior distribution in the space of parameters can become a limiting factor, since it is often intractable. Symmetries and strong dependencies between parameters make the approximate inference problem much more complex. This is precisely the case in large deep NNs. These issues can be alleviated by carrying out approximate inference in the space of functions, a simpler problem that yields more precise approximations than those obtained in parameter space, as shown in the literature (Ma et al., 2019; Sun et al., 2019; Rodríguez Santana et al., 2022; Ma and Hernández-Lobato, 2021). A recent method for function-space approximate inference is the variational implicit process (VIP) (Ma et al., 2019). VIP considers an implicit process (IP) as the prior distribution over the target function. IPs constitute a very flexible family of priors over functions that generalize Gaussian processes (Ma et al., 2019). Specifically, IPs are processes that may lack a closed-form expression but are easy to sample from. Examples include Bayesian neural networks (BNNs), neural samplers and warped GPs, among others (Rodríguez Santana et al., 2022). Figure 1 (left) shows a BNN, which is a particular case of an IP. Nevertheless, the posterior process of an IP is intractable in most cases (except in the particular case of GPs). VIP addresses this issue by approximating the posterior with that of a GP whose mean and covariances match those of the prior IP. Thus, the approximation used in VIP results in a Gaussian predictive distribution, which may be too restrictive.
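To make the notion of an IP concrete, the following is a minimal sketch (not the paper's implementation; the function name, network architecture, and parameters are illustrative assumptions) of sampling functions from a BNN prior, a process with no closed-form marginal but trivial to sample from, and then computing the empirical mean and covariances that a VIP-style GP approximation would match:

```python
import numpy as np

def sample_bnn_prior(x, n_samples=200, width=50, rng=None):
    """Draw function samples from a one-hidden-layer BNN prior,
    f(x) = W2 @ tanh(W1 x + b1) + b2, with Gaussian random weights.
    This is an implicit process: easy to sample, no closed-form density.
    (Illustrative sketch; architecture and scales are assumptions.)"""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float).reshape(-1, 1)  # (n_points, 1)
    samples = []
    for _ in range(n_samples):
        W1 = rng.normal(0.0, 1.0, size=(1, width))
        b1 = rng.normal(0.0, 1.0, size=(width,))
        W2 = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, 1))
        b2 = rng.normal(0.0, 1.0)
        h = np.tanh(x @ W1 + b1)            # hidden activations
        samples.append((h @ W2 + b2).ravel())
    return np.stack(samples)                # (n_samples, n_points)

x = np.linspace(-3.0, 3.0, 20)
fs = sample_bnn_prior(x, n_samples=200, rng=0)
# VIP-style moments: empirical mean and covariance across prior samples,
# which define the approximating GP's mean and kernel at these inputs.
mean = fs.mean(axis=0)                      # (20,)
cov = np.cov(fs, rowvar=False)              # (20, 20)
```

A Gaussian predictive distribution built from `mean` and `cov` is exactly the restriction the paper points out: the IP prior itself can be non-Gaussian, but the VIP approximation is not.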
Recently, the concatenation of random processes has been used to produce models of increased flexibility. An example is deep GPs (DGPs), in which the output of one GP is systematically used as the input of another (Damianou and Lawrence, 2013). Given the success of DGPs, it is natural to concatenate IPs to extend their capabilities in a similar fashion. We therefore introduce in this paper deep VIPs (DVIPs), a multi-layer extension of VIP that provides increased expressive power, enables more accurate predictions, gives better-calibrated uncertainty estimates, and captures more complex patterns in the data. Figure 1 (right) shows the architecture considered in

