JOINT LEARNING OF FULL-STRUCTURE NOISE IN HIERARCHICAL BAYESIAN REGRESSION MODELS

Abstract

We consider hierarchical Bayesian (type-II maximum likelihood) models for observations with latent variables for source and noise, where both hyperparameters need to be estimated jointly from data. This problem has applications in many domains, including biomagnetic inverse problems in imaging. Crucial factors influencing the accuracy of source estimation are not only the noise level but also its correlation structure; however, existing approaches have not addressed the estimation of noise covariance matrices with full structure. Here, we consider the reconstruction of brain activity from electroencephalography (EEG). This inverse problem can be formulated as a linear regression with independent Gaussian scale mixture priors for both the source and noise components. As a departure from classical sparse Bayesian learning (SBL) models, where across-sensor observations are assumed to be independent and identically distributed, we consider Gaussian noise with full covariance structure. Using Riemannian geometry, we derive an efficient algorithm for updating both source and noise covariances along the manifold of positive definite matrices. Using the majorization-maximization framework, we demonstrate that our algorithm has guaranteed and fast convergence. We validate the algorithm both in simulations and with real data. Our results demonstrate that the novel framework significantly improves upon state-of-the-art techniques in the real-world scenario where the noise is indeed non-diagonal and fully structured.

1. INTRODUCTION

Having precise knowledge of the noise distribution is a fundamental requirement for obtaining accurate solutions in many regression problems (Bungert et al., 2020). In many applications, however, it is impossible to estimate this noise distribution separately, as distinct "noise-only" (baseline) measurements are not feasible. An alternative, therefore, is to design estimators that jointly optimize over the regression coefficients as well as over parameters of the noise distribution. This has been pursued both in (penalized) maximum-likelihood settings (here referred to as Type-I approaches) (Petersen & Jung, 2020; Bertrand et al., 2019; Massias et al., 2018) as well as in hierarchical Bayesian settings (referred to as Type-II) (Wipf & Rao, 2007; Zhang & Rao, 2011; Hashemi et al., 2020; Cai et al., 2020a). Most contributions in the literature are, however, limited to the estimation of a diagonal noise covariance (i.e., noise that is independent across measurements) (Daye et al., 2012; Van de Geer et al., 2013; Dalalyan et al., 2013; Lederer & Muller, 2015). A diagonal noise covariance is a limiting assumption in practice, as the noise interference in many realistic scenarios is highly correlated across measurements and thus has non-trivial off-diagonal elements. This paper develops an efficient optimization algorithm for jointly estimating the posterior of the regression parameters as well as the noise distribution. More specifically, we consider linear regression with Gaussian scale mixture priors on the parameters and full-structure multivariate Gaussian noise. We cast the problem as a hierarchical Bayesian (Type-II maximum-likelihood) regression problem, in which the variance hyperparameters and the noise covariance matrix are optimized by maximizing the Bayesian evidence of the model.
Using Riemannian geometry, we derive an efficient algorithm for jointly estimating the source and noise covariances along the manifold of positive definite (P.D.) matrices. To highlight the benefits of our proposed method in practical scenarios, we consider the problem of electromagnetic brain source imaging (BSI). The goal of BSI is to reconstruct brain activity from magneto- or electroencephalography (M/EEG), which can be formulated as a sparse Bayesian learning (SBL) problem. Specifically, it can be cast as a linear Bayesian regression model with independent Gaussian scale mixture priors on the parameters and noise. As a departure from classical SBL approaches, here we specifically consider Gaussian noise with full covariance structure. Prominent sources of correlated noise in this context are, for example, eye blinks, heartbeats, muscular artifacts, and line noise. Other realistic examples of the need for such full-structure noise modeling can be found in the areas of array processing (Li & Nehorai, 2010) and direction-of-arrival (DOA) estimation (Chen et al., 2008). Algorithms that can accurately estimate noise with full covariance structure are expected to achieve more accurate regression models and predictions in this setting.
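To give a concrete sense of what operating "along the manifold of P.D. matrices" can look like, the sketch below implements the geodesic (weighted geometric mean) between two symmetric positive definite matrices, a standard Riemannian construction on this manifold. It is only an illustration of the geometry involved, not the update rule derived in this paper, and the function names are our own.

```python
import numpy as np

def spd_power(A, p):
    """Matrix power A^p of a symmetric positive definite matrix
    via its eigendecomposition (eigenvalues are raised to p)."""
    w, V = np.linalg.eigh(A)
    return (V * w**p) @ V.T

def spd_geodesic(A, B, t=0.5):
    """Point at fraction t on the geodesic from A to B on the SPD manifold:
    A #_t B = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}.
    t = 0 returns A, t = 1 returns B, t = 0.5 is the geometric mean."""
    A_half = spd_power(A, 0.5)
    A_inv_half = spd_power(A, -0.5)
    return A_half @ spd_power(A_inv_half @ B @ A_inv_half, t) @ A_half
```

Unlike a naive convex combination (1 - t)A + tB, the geodesic respects the curved geometry of the P.D. cone, which is why manifold-aware covariance updates of this kind remain positive definite by construction.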

2. TYPE-II BAYESIAN REGRESSION

We consider the linear model Y = LX + E, in which a forward or design matrix, L ∈ R^{M×N}, is mapped to the measurements, Y, by a set of coefficients or source components, X. Depending on the setting, the problem of estimating X given L and Y is called an inverse problem in physics, a multi-task regression problem in machine learning, or a multiple measurement vector (MMV) recovery problem in signal processing (Cotter et al., 2005). Adopting signal processing terminology, the measurement matrix Y ∈ R^{M×T} captures the activity of M sensors at T time instants, y(t) ∈ R^{M×1}, t = 1, …, T, while the source matrix, X ∈ R^{N×T}, consists of the unknown activity of N sources at the same time instants, x(t) ∈ R^{N×1}, t = 1, …, T. The matrix E = [e(1), …, e(T)] ∈ R^{M×T} represents T time instances of zero-mean Gaussian noise with full covariance Λ, e(t) ∈ R^{M×1} ∼ N(0, Λ), t = 1, …, T, which is assumed to be independent of the source activations. In this paper, we focus on M/EEG-based brain source imaging (BSI), but the proposed algorithm can be used in general regression settings, in particular for sparse signal recovery (Candès et al., 2006; Donoho, 2006) with a wide range of applications (Malioutov et al., 2005). The goal of BSI is to infer the underlying brain activity X from the EEG/MEG measurements Y given a known forward operator, called the lead field matrix L. As the number of sensors is typically much smaller than the number of candidate brain source locations, this inverse problem is highly ill-posed. It is addressed by imposing prior distributions on the model parameters and adopting a Bayesian treatment.
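As a minimal illustration of this generative model, the following sketch simulates Y = LX + E with a sparse source matrix and full-structure (non-diagonal) Gaussian noise. All dimensions, the random lead field, and the constructed noise covariance are hypothetical choices for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, T = 20, 100, 50  # sensors, candidate sources, time samples

# Hypothetical forward/lead field matrix with unit-norm columns
L = rng.standard_normal((M, N))
L /= np.linalg.norm(L, axis=0)

# Sparse source matrix X: only a few sources are active over time
X = np.zeros((N, T))
active = rng.choice(N, size=3, replace=False)
X[active] = rng.standard_normal((3, T))

# Full-structure noise covariance Lam: a random SPD matrix with
# non-trivial off-diagonal entries (correlated noise across sensors)
A = rng.standard_normal((M, M))
Lam = A @ A.T / M + 0.1 * np.eye(M)

# Draw correlated zero-mean Gaussian noise via the Cholesky factor of Lam
E = np.linalg.cholesky(Lam) @ rng.standard_normal((M, T))

Y = L @ X + E  # the observed measurement matrix
```

Drawing E through the Cholesky factor of Lam is what makes the noise correlated across sensors; setting Lam to a diagonal matrix instead would recover the classical i.i.d.-across-sensors SBL noise model.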
This can be performed either through Maximum-a-Posteriori (MAP) estimation (Type-I Bayesian learning) (Pascual-Marqui et al., 1994; Gorodnitsky et al., 1995; Haufe et al., 2008; Gramfort et al., 2012; Castaño-Candamil et al., 2015) or, when the model has unknown hyperparameters, through Type-II Maximum-Likelihood estimation (Type-II Bayesian learning) (Mika et al., 2000; Tipping, 2001; Wipf & Nagarajan, 2009; Seeger & Wipf, 2010; Wu et al., 2016). In this paper, we focus on Type-II Bayesian learning, which assumes a family of prior distributions p(X|Θ) parameterized by a set of hyperparameters Θ. These hyperparameters can be learned from the data along with the model parameters using a hierarchical Bayesian approach (Tipping, 2001; Wipf & Rao, 2004) through the maximum-likelihood principle:

Θ^II := arg max_Θ p(Y|Θ) = arg max_Θ ∫ p(Y|X, Θ) p(X|Θ) dX .    (1)

Here we assume a zero-mean Gaussian prior with full covariance Γ for the underlying source distribution, x(t) ∈ R^{N×1} ∼ N(0, Γ), t = 1, …, T. Like most other approaches, Type-II Bayesian learning makes the simplifying assumption of statistical independence between time samples. This leads to the following expressions for the distributions of the sources and measurements:

p(X|Γ) = ∏_{t=1}^{T} p(x(t)|Γ) = ∏_{t=1}^{T} N(0, Γ) ,    (2)

p(Y|X) = ∏_{t=1}^{T} p(y(t)|x(t)) = ∏_{t=1}^{T} N(Lx(t), Λ) .    (3)

The parameters of the Type-II model, Θ, are the unknown source and noise covariances, i.e., Θ = {Γ, Λ}. These are optimized based on the current estimates of the source and noise covariances in an alternating iterative process. Given initial estimates of Γ and Λ,
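Under the Gaussian assumptions above, each measurement is marginally distributed as y(t) ∼ N(0, Σ_y) with Σ_y = Λ + LΓLᵀ, so that, up to constants, maximizing the evidence in (1) amounts to minimizing log|Σ_y| + tr(Σ_y⁻¹ C_y), where C_y = YYᵀ/T is the empirical sensor covariance. The sketch below, with illustrative function names, evaluates this Type-II objective and the corresponding Gaussian posterior mean of the sources for given (Γ, Λ); it is a naive reference implementation of these standard identities, not the Riemannian update scheme derived in this paper.

```python
import numpy as np

def type2_nll(Y, L, Gamma, Lam):
    """Negative log marginal likelihood (up to constants and scaling) of
    the Type-II model: y(t) ~ N(0, Sigma_y), Sigma_y = Lam + L Gamma L^T."""
    T = Y.shape[1]
    Sigma_y = Lam + L @ Gamma @ L.T
    C_y = Y @ Y.T / T  # empirical sensor covariance
    sign, logdet = np.linalg.slogdet(Sigma_y)
    assert sign > 0, "Sigma_y must be positive definite"
    return logdet + np.trace(np.linalg.solve(Sigma_y, C_y))

def posterior_mean(Y, L, Gamma, Lam):
    """Posterior mean of the sources given hyperparameters (Gamma, Lam):
    X_hat = Gamma L^T (Lam + L Gamma L^T)^{-1} Y."""
    Sigma_y = Lam + L @ Gamma @ L.T
    return Gamma @ L.T @ np.linalg.solve(Sigma_y, Y)
```

An alternating scheme of the kind described above would repeatedly re-estimate Γ and Λ so as to decrease this objective and then refresh the posterior mean, with the particular covariance updates being the subject of the following sections.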

