HIERARCHICAL PROBABILISTIC MODEL FOR BLIND SOURCE SEPARATION VIA LEGENDRE TRANSFORMATION

Abstract

We present a novel blind source separation (BSS) method called information geometric blind source separation (IGBSS). Our formulation is based on the log-linear model equipped with a hierarchically structured sample space, which theoretically guarantees unique recovery of a set of source signals by minimizing the KL divergence from a set of mixed signals. Source signals, received signals, and mixing matrices are realized as different layers in our hierarchical sample space. Our empirical results on images and time-series data demonstrate that our approach outperforms well-established techniques and is able to separate signals with complex interactions.

1. INTRODUCTION

The objective of blind source separation (BSS) is to identify a set of source signals from a set of multivariate mixed signals.¹ BSS is widely used for applications that can be framed as the "cocktail party problem". Examples include image/signal processing (Isomura & Toyoizumi, 2016), artifact removal in medical imaging (Vigário et al., 1998), and electroencephalogram (EEG) signal separation (Congedo et al., 2008). Currently, there are a number of solutions to the BSS problem. The most widely used approaches are variations of principal component analysis (PCA) (Pearson, 1901; Murphy, 2012) and independent component analysis (ICA) (Comon, 1994; Murphy, 2012). However, each of these approaches has limitations. PCA and its modern variations, such as sparse PCA (SPCA) (Zou et al., 2006), non-linear PCA (NLPCA) (Scholz et al., 2005), and robust PCA (Xu et al., 2010), extract a specified number of components with the largest variance under an orthogonality constraint, where each component is a linear combination of the observed variables. They create a set of uncorrelated orthogonal basis vectors that represent the source signals. The basis vectors with the N largest variances are called the principal components and constitute the output of the model. PCA has been shown to be effective for many applications, such as dimensionality reduction and feature extraction. However, for BSS, PCA assumes that the source signals are orthogonal, which rarely holds in practical applications. Similarly, ICA attempts to find the N components with the largest variance, but relaxes the orthogonality constraint. All variations of ICA, such as Infomax (Bell & Sejnowski, 1995), FastICA (Hyvärinen & Oja, 2000), and JADE (Cardoso, 1999), separate a multivariate signal into additive subcomponents by maximizing the statistical independence of each component.
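As an illustration of the contrast above (and not of the IGBSS method proposed in this paper), the following sketch mixes two non-orthogonal toy sources and applies PCA and FastICA from scikit-learn; the signals and mixing matrix are hypothetical choices for demonstration only:

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                        # sinusoidal source
s2 = np.sign(np.sin(3 * t))               # square-wave source (non-Gaussian)
S = np.c_[s1, s2]                         # true source signals
A = np.array([[1.0, 0.5], [0.5, 1.0]])    # hypothetical mixing matrix
X = S @ A.T                               # observed (mixed/received) signals

# PCA extracts orthogonal directions of maximal variance; because the true
# sources are correlated after mixing, its components need not match them.
pca_est = PCA(n_components=2).fit_transform(X)

# FastICA instead maximizes statistical independence, assuming non-Gaussian
# sources and affine mixing, and recovers the sources up to scale and order.
ica_est = FastICA(n_components=2, whiten="unit-variance",
                  random_state=0).fit_transform(X)
```

On this toy problem the ICA estimates correlate strongly with the true sources (up to permutation and sign), whereas the orthogonal principal components generally do not, matching the limitation of PCA for BSS noted above.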
ICA assumes that each component is non-Gaussian and that the relationship between the source signals and the mixed signals is an affine transformation. In addition to these assumptions, ICA is sensitive to the initialization of the weights: the optimization is non-convex and is likely to converge to a local optimum. Other methods that can perform BSS include non-negative matrix factorization (NMF) (Lee & Seung, 2001; Berne et al., 2007), dictionary learning (DL) (Olshausen & Field, 1997), and reconstruction ICA (RICA) (Le et al., 2011). NMF, DL, and RICA are degenerate approaches to recovering the source signal from the mixed signal, and are more typically used for feature extraction. NMF factorizes a matrix into two matrices with non-negative elements representing weights and features. The features extracted by NMF can be used to recover the source



¹ Mixed signals and received signals are used interchangeably throughout this article.

