CORRELATIVE INFORMATION MAXIMIZATION BASED BIOLOGICALLY PLAUSIBLE NEURAL NETWORKS FOR CORRELATED SOURCE SEPARATION

Abstract

The brain effortlessly extracts latent causes of stimuli, but how it does this at the network level remains unknown. Most prior attempts at this problem proposed neural networks that implement independent component analysis, which works under the limitation that latent causes are mutually independent. Here, we relax this limitation and propose a biologically plausible neural network that extracts correlated latent sources by exploiting information about their domains. To derive this network, we choose maximum correlative information transfer from inputs to outputs as the separation objective, under the constraint that the output vectors are restricted to the set where the source vectors are assumed to lie. The online formulation of this optimization problem naturally leads to neural networks with local learning rules. Our framework incorporates infinitely many set choices for the source domain and can flexibly model complex latent structures. Choices of simplex or polytopic source domains result in networks with piecewise linear activation functions. We provide numerical examples to demonstrate the superior correlated source separation capability for both synthetic and natural sources.

1. INTRODUCTION

Extraction of latent causes, or sources, of complex stimuli sensed by sensory organs is essential for survival. Due to the absence of supervision in most circumstances, this extraction must be performed in an unsupervised manner, a process known as blind source separation (BSS) (Comon & Jutten, 2010; Cichocki et al., 2009). How BSS may be achieved in visual, auditory, or olfactory cortical circuits has attracted the attention of many researchers, e.g. (Bell & Sejnowski, 1995; Olshausen & Field, 1996; Bronkhorst, 2000; Lewicki, 2002; Asari et al., 2006; Narayan et al., 2007; Bee & Micheyl, 2008; McDermott, 2009; Mesgarani & Chang, 2012; Golumbic et al., 2013; Isomura et al., 2015). Influential papers showed that visual and auditory cortical receptive fields could arise from performing BSS on natural scenes (Bell & Sejnowski, 1995; Olshausen & Field, 1996) and sounds (Lewicki, 2002). The potential ubiquity of BSS in the brain suggests that there exist generic neural circuit motifs for BSS (Sharma et al., 2000). Motivated by these observations, here we present a set of novel biologically plausible neural network algorithms for BSS.

BSS algorithms typically derive from normative principles. The most prominent is the information maximization principle, which aims to maximize the information transferred from input mixtures to separator outputs under the restriction that the outputs satisfy a specific generative assumption about the sources. However, Shannon mutual information is a challenging choice for quantifying information transfer, especially for data-driven adaptive applications, due to its reliance on the joint and conditional densities of the input and output components. This challenge is eased by the independent component analysis (ICA) framework, which reduces joint densities to separable forms based on the assumption of source independence (Bell & Sejnowski, 1995).
In particular scenarios, the mutual independence of the latent causes of real observations may not be a plausible assumption (Träuble et al., 2021). To address potential dependence among latent components, Erdogan (2022) recently proposed second-order-statistics-based correlative (log-determinant) mutual information maximization for BSS, eliminating the need for the independence assumption and allowing for correlated source separation. In this article, we propose an online correlative information maximization-based biologically plausible neural network framework (CorInfoMax) for the BSS problem. Our motivations for the proposed framework are as follows:

• The correlative mutual information objective depends only on the second-order statistics of the inputs and outputs. Therefore, its use avoids the need for costly higher-order statistics or joint pdf estimates.

• The corresponding optimization is equivalent to maximizing the correlation, or linear dependence, between input and output, a natural fit for the linear inverse problem.

• The framework relies only on source domain information, eliminating the need for the source independence assumption. Therefore, neural networks constructed with this framework are capable of separating correlated sources. Furthermore, the CorInfoMax framework can generate neural networks for infinitely many source domains corresponding to combinations of different attributes such as sparsity and nonnegativity.

• The optimization of the proposed objective inherently leads to learning with local update rules.
• CorInfoMax acts as a unifying framework to generate biologically plausible neural networks for various unsupervised data decomposition methods that obtain structured latent representations, such as nonnegative matrix factorization (NMF) (Fu et al., 2019), sparse component analysis (SCA) (Babatas & Erdogan, 2018), bounded component analysis (BCA) (Erdogan, 2013; Inan & Erdogan, 2014), and polytopic matrix factorization (PMF) (Tatli & Erdogan, 2021).

Figure 1 (a, b): CorInfoMax BSS neural networks for two different canonical source domain representations. x_i's and y_i's represent inputs (mixtures) and (separator) outputs, respectively; W are the feedforward weights; e_i's are the errors between transformed inputs and outputs; B_y, the inverse of the output autocorrelation matrix, represents the lateral weights at the output. For the canonical form (a), λ_i's are Lagrangian interneurons imposing source domain constraints, and A_P (A_P^T) represents the feedforward (feedback) connections between outputs and interneurons. For the canonical form (b), the interneurons on the right impose sparsity constraints on subsets of the outputs.

Figure 1 illustrates CorInfoMax neural networks for two different source domain representation choices; both are three-layer networks with piecewise linear activation functions. We note that the proposed CorInfoMax framework, beyond solving the BSS problem, can be used to learn structured and potentially correlated representations from data through maximum correlative information transfer from the inputs to the chosen structured domain at the output.

1.1.1. BIOLOGICALLY PLAUSIBLE NEURAL NETWORKS FOR BSS

There are different methods to solve the BSS problem through neural networks with local learning rules. These methods are differentiated by the observation models they assume and the normative approaches they propose. Biologically plausible ICA networks form one example category; they exploit the presumed mutual independence of the sources (Isomura & Toyoizumi, 2018; Bahroun et al., 2021; Lipshutz et al., 2022). Alternative approaches exploit different properties of the data model to replace the independence assumption with a weaker one. For example, Pehlevan et al. (2017a) uses the nonnegativity property along with the biologically inspired similarity matching (SM) framework (Pehlevan et al., 2017b) to derive biologically plausible neural networks capable of separating uncorrelated but potentially dependent sources. Similarly, Erdogan & Pehlevan (2020) proposes bounded similarity matching (BSM), an alternative approach that takes advantage of magnitude boundedness for uncorrelated source separation. More recently, Bozkurt et al. (2022) introduced a generic biologically plausible neural network framework based on the weighted similarity matching (WSM) of Erdogan & Pehlevan (2020) and the maximization of the output correlation determinant criterion used in the NMF, SCA, BCA and PMF methods. This framework exploits the domain structure of the sources to generate two-/three-layer biologically plausible networks able to separate potentially correlated sources. Another example of a biologically plausible neural network with correlated source separation capability is offered in Simsek & Erdogan (2019), which also uses the determinant maximization criterion for the separation of antisparse sources.

Our proposed framework differs significantly from Bozkurt et al. (2022): that work uses the similarity matching criterion, which is not employed in our framework, as the main tool for generating biologically plausible networks. Therefore, the resulting network structure and learning rules are completely different. For example, the lateral connections of the outputs in Bozkurt et al. (2022) are based on the output autocorrelation matrix, while the lateral connections in our framework are based on the inverse of the output autocorrelation matrix. Unlike our framework, neurons in Bozkurt et al. (2022) have learnable gains. The feedforward weights of the networks in Bozkurt et al. (2022) correspond to a cross-correlation matrix between the inputs and outputs of a layer, whereas in our framework the feedforward connections correspond to the linear predictor of the output from the input. The feedback connection matrix in Bozkurt et al. (2022) is the transpose of the feedforward matrix, which is not the case in our framework. Compared to Simsek & Erdogan (2019), our framework is derived from information-theoretic grounds, and its scope is not limited to antisparse sources but covers infinitely many source domains.

1.1.2. INFORMATION MAXIMIZATION FOR UNSUPERVISED LEARNING

The use of Shannon's mutual information maximization for various unsupervised learning tasks dates back a few decades. Among the pioneering applications is Linsker's work on self-organizing networks, which proposes maximizing the mutual information between an input and its latent representation as a normative approach (Linsker, 1988). Under a Gaussian assumption, the corresponding objective simplifies to determinant maximization of the output covariance matrix. Becker & Hinton (1992) suggested maximizing the mutual information between alternative latent vectors derived from the same input source as a self-supervised method for learning representations. The best-known application of the information maximization criterion to the BSS problem is the ICA-Infomax approach of Bell & Sejnowski (1995). The corresponding algorithm maximizes the information transferred from input to output under the constraint that the output components are mutually independent. For potentially correlated sources, Erdogan (2022) proposed a second-order-statistics-based correlative (or log-determinant) mutual information measure for the BSS problem. This approach replaces the mutual independence assumption of the ICA framework with source domain information, enabling the separation of both independent and dependent sources. Furthermore, it provides an information-theoretic interpretation of the determinant maximization criterion used in several unsupervised structured matrix factorization frameworks such as NMF (or simplex structured matrix factorization (SSMF)) (Chan et al., 2011; Lin et al., 2015; Fu et al., 2018; 2019), SCA, BCA, and PMF. More recently, Ozsoy et al. (2022) proposed maximizing the correlative information among latent representations corresponding to different augmentations of the same input as a self-supervised learning method.
The current article offers an online optimization formulation for the batch correlative information maximization method of Erdogan (2022) that leads to a general biologically plausible neural network generation framework for the unsupervised unmixing of potentially dependent/correlated sources.

2. PRELIMINARIES

This section aims to provide background information for the CorInfoMax-based neural network framework introduced in Section 3. For this purpose, we first describe the BSS setting assumed throughout the article in Section 2.1. Then, in Section 2.2, we provide an essential summary of the batch CorInfoMax-based BSS approach introduced in (Erdogan, 2022).

2.1. BSS SETTING

Sources: We assume a BSS setting with a finite number of n-dimensional source vectors, represented by the set S = {s(1), s(2), . . . , s(N)} ⊂ P, where P is a particular subset of R^n. The choice of the source domain P determines the identifiability of the sources from their mixtures, the properties of the individual sources, and their mutual relations. Structured unsupervised matrix factorization methods are usually defined by the source/latent domain, for example:

i. Normalized nonnegative sources in the NMF (SSMF) framework: ∆ = {s | s ≥ 0, 1^T s = 1}, with signal processing and machine learning applications such as hyperspectral unmixing and text mining (Abdolali & Gillis, 2021; Fu et al., 2016),

ii. Bounded antisparse sources in the BCA framework: B_ℓ∞ = {s | ∥s∥_∞ ≤ 1},

iii. Nonnegative bounded antisparse sources: B_ℓ∞,+ = {s | ∥s∥_∞ ≤ 1, s ≥ 0},

iv. Sparse sources in the SCA framework: B_ℓ1 = {s | ∥s∥_1 ≤ 1},

v. Nonnegative sparse sources: B_ℓ1,+ = {s | ∥s∥_1 ≤ 1, s ≥ 0}.

Note that the sets in (ii)-(v) above are special cases of (convex) polytopes. Recently, Tatli & Erdogan (2021) showed that infinitely many polytopes with a certain symmetry restriction enable identifiability for the BSS problem. Each identifiable polytope choice corresponds to different structural assumptions on the source components. A common canonical form to describe polytopes is the H-representation (Grünbaum et al., 1967):

P = {y ∈ R^n | A_P y ≼ b_P},    (1)

which corresponds to an intersection of half-spaces. Alternatively, similar to Bozkurt et al. (2022), we can consider a subset of polytopes, which we refer to as feature-based polytopes, defined in terms of attributes (such as nonnegativity and sparsity) assigned to subsets of the components:

P = {s ∈ R^n | s_i ∈ [-1, 1] ∀i ∈ I_s, s_i ∈ [0, 1] ∀i ∈ I_+, ∥s_{J_l}∥_1 ≤ 1, J_l ⊆ Z_+^n, l ∈ Z_+^L},    (2)

where I_s ⊆ Z_+^n is the set of indices of signed sources, I_+ is its complement, s_{J_l} is the subvector constructed from the elements with indices in J_l, and L is the number of sparsity constraints imposed at the subvector level. In this article, we consider both polytope representations above.

Mixing: We assume a linear generative model; that is, the source vectors are mixed through an unknown matrix A ∈ R^{m×n}: x(i) = A s(i), ∀i = 1, . . . , N, where we consider the overdetermined case, i.e., m ≥ n and rank(A) = n. We define X = [x(1) . . . x(N)]. The goal of the source separation setting is to recover the original source matrix S from the mixture matrix X up to scaling and/or permutation ambiguities; that is, the separator output vectors {y(i)} satisfy y(i) = W x(i) = ΠΛ s(i) for all i = 1, . . . , N, where W ∈ R^{n×m} is the learned separator matrix, Π is a permutation matrix, Λ is a full-rank diagonal matrix, and y(i) is the source estimate for sample index i.
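As a concrete illustration of the source domains listed above, the following sketch checks membership numerically. All helper names are ours, not from the paper; the tolerances are arbitrary.

```python
# Sketch: membership checks for the canonical source domains and the
# H-representation of a polytope. Helper names are illustrative only.
import numpy as np

def in_simplex(s, tol=1e-9):           # Delta: s >= 0, 1^T s = 1
    return bool(np.all(s >= -tol) and abs(s.sum() - 1.0) <= tol)

def in_antisparse(s):                  # B_linf: ||s||_inf <= 1
    return bool(np.max(np.abs(s)) <= 1.0)

def in_sparse(s):                      # B_l1: ||s||_1 <= 1
    return bool(np.sum(np.abs(s)) <= 1.0)

def in_polytope(s, A_P, b_P, tol=1e-9):  # H-representation: A_P s <= b_P
    return bool(np.all(A_P @ s <= b_P + tol))

# B_linf written as an H-representation: [I; -I] s <= 1
n = 3
A_P = np.vstack([np.eye(n), -np.eye(n)])
b_P = np.ones(2 * n)
s = np.array([0.5, -0.2, 0.9])
assert in_antisparse(s) and in_polytope(s, A_P, b_P)
```

The H-representation check makes the role of the interneurons in Figure 1a concrete: each row of A_P corresponds to one half-space constraint, hence one Lagrangian interneuron.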

2.2. CORRELATIVE MUTUAL INFORMATION MAXIMIZATION FOR BSS

Erdogan (2022) proposes maximizing the (correlative) information flow from the mixtures to the separator outputs while the outputs are restricted to lie in their presumed domain P. The corresponding batch optimization problem is given by

maximize_{Y ∈ R^{n×N}}  I^(ϵ)_LD(X, Y) = (1/2) log det(R_y + ϵI) − (1/2) log det(R_e + ϵI)    (3a)
subject to  Y_{:,i} ∈ P, i = 1, . . . , N,    (3b)

where the objective I^(ϵ)_LD(X, Y) is the log-determinant (LD) mutual information between the mixture and separator output vectors (see Appendix A.1 and Erdogan (2022) for more information), R_y is the sample autocorrelation matrix, i.e., R_y = (1/N) Y Y^T (or the autocovariance, i.e., R_y = (1/N) Y (I_N − (1/N) 1_N 1_N^T) Y^T), R_xy is the sample cross-correlation matrix between the mixture and output vectors, i.e., R_xy = (1/N) X Y^T (or the cross-covariance R_xy = (1/N) X (I_N − (1/N) 1_N 1_N^T) Y^T), and R_x is the sample autocorrelation (or autocovariance) matrix of the mixtures. As discussed in Appendix A.1, for sufficiently small ϵ, R_e is the sample autocorrelation (covariance) matrix of the error vector corresponding to the best linear (affine) minimum mean square error (MMSE) estimate of the separator output y from the mixture x. Under the assumption that the original source samples are sufficiently scattered in P (Fu et al., 2019; Tatli & Erdogan, 2021), i.e., they form a maximal LD-entropy subset of P (Erdogan, 2022), the optimal solution of (3) recovers the original sources up to permutation and sign ambiguities, for sufficiently small ϵ. The biologically plausible CorInfoMax BSS neural network framework proposed in this article is obtained by replacing the batch optimization in (3) with its online counterpart, as described in Section 3.
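The batch objective (3a) can be evaluated directly from sample statistics. Below is a minimal numpy sketch; the function name and the autocorrelation (rather than autocovariance) convention are our choices, and evaluating the objective is of course much simpler than solving the constrained problem (3).

```python
# Sketch of the batch LD-mutual-information objective (3a) from sample
# autocorrelation statistics. Illustrative only.
import numpy as np

def ld_mutual_information(X, Y, eps=1e-6):
    N = X.shape[1]
    Ry  = (Y @ Y.T) / N                  # output autocorrelation
    Rx  = (X @ X.T) / N                  # mixture autocorrelation
    Rxy = (X @ Y.T) / N                  # cross-correlation
    # Error covariance of the best linear MMSE estimate of y from x
    Re = Ry - Rxy.T @ np.linalg.solve(Rx + eps * np.eye(Rx.shape[0]), Rxy)
    _, logdet_y = np.linalg.slogdet(Ry + eps * np.eye(Ry.shape[0]))
    _, logdet_e = np.linalg.slogdet(Re + eps * np.eye(Re.shape[0]))
    return 0.5 * logdet_y - 0.5 * logdet_e

rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, size=(3, 1000))   # sources
A = rng.standard_normal((5, 3))          # mixing matrix (overdetermined)
X = A @ S
# The information transferred from the mixtures to the true sources is
# large and positive: the linear MMSE error of estimating S from X is ~0.
assert ld_mutual_information(X, S) > 0.0
```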

3.1. ONLINE OPTIMIZATION SETTING FOR LD-MUTUAL INFORMATION MAXIMIZATION

We start our online optimization formulation for CorInfoMax by replacing the output and error sample autocorrelation matrices in (3a) with their exponentially weighted versions

R^ζy_y(k) = ((1 − ζ_y)/(1 − ζ_y^k)) Σ_{i=1}^{k} ζ_y^{k−i} y(i) y(i)^T,    (4a)
R^ζe_e(k) = ((1 − ζ_e)/(1 − ζ_e^k)) Σ_{i=1}^{k} ζ_e^{k−i} e(i) e(i)^T,    (4b)

where 0 ≪ ζ_y < 1 is the forgetting factor, W(i) is the best linear MMSE estimator matrix (to estimate y from x), and e(i) = y(i) − W(i) x(i) is the corresponding error vector. Therefore, we can define the corresponding online CorInfoMax optimization problem as

maximize_{y(k) ∈ R^n}  J(y(k)) = (1/2) log det(R^ζy_y(k) + ϵI) − (1/2) log det(R^ζe_e(k) + ϵI)    (5a)
subject to  y(k) ∈ P.    (5b)

Note that the above formulation assumes knowledge of the best linear MMSE matrix W(i), whose update is formulated as the solution of an online regularized least squares problem:

minimize_{W(i) ∈ R^{n×m}}  µ_W ∥y(i) − W(i) x(i)∥_2^2 + ∥W(i) − W(i − 1)∥_F^2.    (6)
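The per-sample recursions implied by this section can be sketched as follows. This is a hedged illustration, not the actual network: we substitute the true source for the converged network output (y = s) so that the LMS update arising from (6) and the weighted recursion behind (4a) can be run in isolation; all step sizes are ours.

```python
# Sketch: exponentially weighted output statistics and the LMS-style update
# of W derived from the regularized least-squares problem (6).
# The stand-in y = s isolates the W recursion; illustrative only.
import numpy as np

rng = np.random.default_rng(1)
n, m, zeta_y, mu_W = 3, 5, 0.99, 0.05
A = rng.standard_normal((m, n))      # unknown mixing matrix
W = np.zeros((n, m))                 # linear predictor of y from x
Ry = np.eye(n)                       # weighted output autocorrelation

for k in range(5000):
    s = rng.uniform(0, 1, n)         # a source sample (unknown to the learner)
    x = A @ s                        # observed mixture
    y = s                            # stand-in for the converged output
    e = y - W @ x                    # error e(i) = y(i) - W(i) x(i)
    W = W + mu_W * np.outer(e, x)    # LMS update solving (6)
    Ry = zeta_y * Ry + (1 - zeta_y) * np.outer(y, y)  # weighted recursion

# W A moves from 0 toward I (initial distance ||I|| = sqrt(3) ~ 1.73)
assert np.linalg.norm(W @ A - np.eye(n)) < 1.0
```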

3.2. DESCRIPTION OF THE NETWORK DYNAMICS FOR SPARSE SOURCES

We now show that gradient-ascent-based maximization of the online CorInfoMax objective in (5) corresponds to the neural dynamics of a multilayer recurrent neural network with local learning rules. Furthermore, the presumed source domain P determines the output activation functions and the additional inhibitory neurons. As an illustrative example, in this section we concentrate on the sparse special case of Section 2.1, that is, P = B_ℓ1. We can write the corresponding Lagrangian optimization setting as

minimize_{λ ≥ 0} maximize_{y(k) ∈ R^n}  L(y(k), λ(k)) = J(y(k)) − λ(k)(∥y(k)∥_1 − 1).    (7)

To derive the network dynamics, we use the proximal gradient update (Parikh et al., 2014) for y(k) with the expression (A.14) for ∇_{y(k)} J(y(k)), derived in Appendix B, and the projected gradient descent update for λ(k) using ∇_λ L(y(k), λ(k)) = 1 − ∥y(k; ν + 1)∥_1, leading to the following iterations:

e(k; ν) = y(k; ν) − W(k) x(k),    (8)
∇_{y(k)} J(y(k; ν)) = γ_y(k) B^ζy_y(k − 1) y(k; ν) − γ_e(k) B^ζe_e(k − 1) e(k; ν),    (9)
y(k; ν + 1) = ST_{λ(k;ν)}(y(k; ν) + η_y(ν) ∇_{y(k)} J(y(k; ν))),    (10)
λ(k; ν + 1) = ReLU(λ(k; ν) − η_λ(ν)(1 − ∥y(k; ν + 1)∥_1)),    (11)

where B^ζy_y(k) and B^ζe_e(k) are the inverses of R^ζy_y(k) and R^ζe_e(k), respectively, γ_y(k) and γ_e(k) are given in (A.11) and (A.13), ν ∈ N is the iteration index, η_y(ν) and η_λ(ν) are the learning rates for the output and λ, respectively, at iteration ν, ReLU(·) is the rectified linear unit, and ST_λ(·) is the soft-thresholding nonlinearity defined as

ST_λ(y)_i = { 0, |y_i| ≤ λ;  y_i − sign(y_i) λ, otherwise }.

We represent the values at the final iteration ν_final, with some abuse of notation, by y(k) = y(k; ν_final) and e(k) = e(k; ν_final). The neural dynamic iterations in (8)-(11) define the recurrent output dynamics of the network.

Lateral synaptic connections B^ζy_y(k) and B^ζe_e(k): applying the matrix inversion lemma to the recursions for R^ζy_y(k) and R^ζe_e(k) yields the updates

B^ζy_y(k + 1) = ((1 − ζ_y^k)/(ζ_y − ζ_y^k)) (B^ζy_y(k) − γ_y(k) B^ζy_y(k) y(k) y(k)^T B^ζy_y(k)),    (12)
B^ζe_e(k + 1) = ((1 − ζ_e^k)/(ζ_e − ζ_e^k)) (B^ζe_e(k) − γ_e(k) B^ζe_e(k) e(k) e(k)^T B^ζe_e(k)).    (13)
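The rank-one inverse updates of this form can be verified numerically via the Sherman-Morrison identity. The check below uses a simplified recursion R' = ζR + (1 − ζ) y yᵀ (the paper's γ factors additionally carry the (1 − ζ^k) normalization, which we omit here); the scalar gamma below is the exact Sherman-Morrison coefficient, not the paper's γ_y(k).

```python
# Numerical check (Sherman-Morrison) of the rank-one inverse update
# underlying recursions of the form R' = zeta*R + (1-zeta) y y^T.
import numpy as np

rng = np.random.default_rng(2)
n, zeta = 4, 0.99
R = np.cov(rng.standard_normal((n, 100)))      # a generic SPD matrix
B = np.linalg.inv(R)
y = rng.standard_normal(n)

R_new = zeta * R + (1 - zeta) * np.outer(y, y)
By = B @ y
gamma = ((1 - zeta) / zeta) / (1 + ((1 - zeta) / zeta) * (y @ By))
B_new = (B - gamma * np.outer(By, By)) / zeta  # rank-one inverse update

assert np.allclose(B_new, np.linalg.inv(R_new))
```

This confirms that the inverse can be maintained with an outer-product correction instead of a matrix inversion, which is what makes the lateral-weight updates cheap.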
However, note that (12) and (13) violate biological plausibility, since the multipliers γ_y(k) and γ_e(k) depend on all output and error components, contradicting locality. Furthermore, the update in (13) is not local: the update of [B^ζe_e]_ij, the synaptic connection between output neuron i and error neuron j, requires [z_e]_j, where z_e(k) = B^ζe_e(k) e(k) is the feedforward signal entering the output neurons, i.e., a signal input to output neuron j rather than to neuron i. To modify the updates (12) and (13) into a biologically plausible form, we make the following observations and assumptions:

• R^ζe_e(k) + ϵI ≈ ϵI ⇒ B^ζe_e(k + 1) ≈ (1/ϵ) I, which is a reasonable assumption, as we expect the error e(k) to converge near zero in the noiseless linear observation model;

• if ζ_y is close enough to 1 and the time step k is large enough, γ_y(k) is approximately (1 − ζ_y)/ζ_y.

Therefore, we modify the update of B^ζy_y(k + 1) in (12) as

B^ζy_y(k + 1) = (1/ζ_y) (B^ζy_y(k) − ((1 − ζ_y)/ζ_y) B^ζy_y(k) y(k) y(k)^T B^ζy_y(k)).    (14)

Feedforward synaptic connections W(k): the solution of the online optimization in (6) is given by

W(k + 1) = W(k) + µ_W(k) e(k) x(k)^T,    (15)

where µ_W(k) is the step size of the adaptive least-mean-squares (LMS) update based on the MMSE criterion (Sayed, 2003). Algorithm 1 below summarizes the Sparse CorInfoMax output and learning dynamics.

Algorithm 1: Sparse CorInfoMax Algorithm
Input: streaming data {x(k) ∈ R^m}_{k=1}^{N}. Output: {y(k) ∈ R^n}_{k=1}^{N}.
1: Initialize ζ_y, ζ_e, µ_W(1), W(1), B^ζy_y(1), B^ζe_e(1).
2: for k = 1, 2, . . . , N do
3:   Run the neural output dynamics until convergence:
       e(k; ν) = y(k; ν) − W(k) x(k),
       ∇_{y(k)} J(y(k; ν)) = γ_y(k) B^ζy_y(k − 1) y(k; ν) − γ_e(k) B^ζe_e(k − 1) e(k; ν),
       y(k; ν + 1) = ST_{λ(k;ν)}(y(k; ν) + η_y(ν) ∇_{y(k)} J(y(k; ν))),
       λ(k; ν + 1) = ReLU(λ(k; ν) − η_λ(ν)(1 − ∥y(k; ν + 1)∥_1)).
4:   Update the feedforward synapses: W(k + 1) = W(k) + µ_W(k) e(k) x(k)^T.
5:   Update the lateral synapses using (14).
6: end for
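The steps of Algorithm 1 can be sketched end-to-end as follows. This is an illustrative, untuned simulation under our own assumptions: hyperparameter values are ours, B_e is frozen at (1/ε)I as suggested by the first observation above, a fixed number of inner iterations replaces "until convergence", and no separation quality is claimed.

```python
# Illustrative end-to-end sketch of Algorithm 1 (Sparse CorInfoMax).
# Simplifications (ours): B_e ~ (1/eps) I, fixed inner-loop length,
# untuned step sizes. Demonstrates the flow, not separation performance.
import numpy as np

def soft_threshold(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

rng = np.random.default_rng(3)
n, m, N = 3, 6, 200
eps, zeta_y, mu_W, eta_y, eta_lam = 1e-2, 0.995, 0.02, 5e-3, 1e-2
A = rng.standard_normal((m, n))          # unknown mixing matrix
W = 0.1 * rng.standard_normal((n, m))    # feedforward weights
By = np.eye(n)                           # lateral weights (inverse autocorr.)

for k in range(N):
    s = rng.dirichlet(np.ones(n)) * rng.uniform(0, 1)  # sparse-ish, in B_l1
    x = A @ s
    y, lam = np.zeros(n), 0.0
    for _ in range(50):                  # output dynamics (8)-(11)
        e = y - W @ x
        grad = By @ y - (1.0 / eps) * e  # B_e frozen at (1/eps) I
        y = soft_threshold(y + eta_y * grad, lam)
        lam = max(0.0, lam - eta_lam * (1.0 - np.abs(y).sum()))
    e = y - W @ x
    W = W + mu_W * np.outer(e, x)        # feedforward LMS update (15)
    g = By @ y
    By = (By - ((1 - zeta_y) / zeta_y) * np.outer(g, g)) / zeta_y  # (14)

assert np.all(np.isfinite(W)) and np.all(np.isfinite(By))
```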

3.3. DESCRIPTION OF THE NETWORK DYNAMICS FOR A CANONICAL POLYTOPE REPRESENTATION

In this section, we consider the optimization problem in (5) for a generic polytope with the H-representation in (1). We can write the corresponding online optimization setting in Lagrangian form as

minimize_{λ(k) ≽ 0} maximize_{y(k) ∈ R^n}  L(y(k), λ(k)) = J(y(k)) − λ(k)^T (A_P y(k) − b_P),    (16)

which is a min-max problem. For the recursive update dynamics of the network output and the Lagrangian variable, we obtain the gradients of the objective in (16) with respect to y(k) and λ(k) as

∇_{y(k)} L(y(k; ν), λ(k; ν)) = γ_y B^ζy_y(k − 1) y(k; ν) − γ_e B^ζe_e(k − 1) e(k; ν) − A_P^T λ(k; ν),    (17)
∇_{λ(k)} L(y(k; ν)) = −A_P y(k; ν) + b_P.    (18)

Recursive update dynamics for y(k) and λ(k): to solve the optimization problem in (16), projected gradient updates on y(k) and λ(k) lead to the following neural dynamic iterations:

y(k; ν + 1) = y(k; ν) + η_y(ν) ∇_{y(k)} L(y(k; ν), λ(k; ν)),    (19)
λ(k; ν + 1) = ReLU(λ(k; ν) − η_λ(ν)(b_P − A_P y(k; ν))),    (20)

where η_λ(ν) denotes the learning rate for λ at iteration ν. These iterations correspond to a recurrent neural network for which we can make the following observations: i) the output neurons use linear activation functions, since y(k) is unconstrained in (16); ii) the network contains f interneurons corresponding to the Lagrangian vector λ, where f is the number of rows of A_P in (16), i.e., the number of (n − 1)-faces of the corresponding polytope; iii) the nonnegativity of λ implies ReLU activation functions for the interneurons. The neural network architecture corresponding to the neural dynamics in (19)-(20) is shown in Figure 1a, which has a layer of f interneurons to impose the polytopic constraint in (1). The updates of W(k) and B^ζy_y(k) follow the equations provided in Section 3.2. Although the architecture in Figure 1a allows the implementation of arbitrary polytopic source domains, f can be a large number.
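To see iterations (19)-(20) in action, the following sketch runs them with a stand-in strongly concave objective J(y) = −(1/2)∥y − t∥² on a box-shaped polytope (the choice of J, the target t, and all constants are ours). At the saddle point, the interneurons enforce the constraints, so the output is the projection of t onto the polytope.

```python
# Sketch of the primal-dual iterations (19)-(20) on a box polytope,
# with a stand-in concave objective J(y) = -0.5 ||y - t||^2 (ours).
import numpy as np

n = 3
A_P = np.vstack([np.eye(n), -np.eye(n)])   # box [-1, 1]^3 in H-representation
b_P = np.ones(2 * n)
t = np.array([2.0, -0.5, 3.0])             # target lying outside the box
y = np.zeros(n)                            # linear output neurons
lam = np.zeros(2 * n)                      # ReLU interneurons (one per face)
eta = 0.05

for _ in range(5000):
    grad_y = (t - y) - A_P.T @ lam         # grad J(y) minus A_P^T lambda, (17)
    y = y + eta * grad_y                   # (19)
    lam = np.maximum(0.0, lam - eta * (b_P - A_P @ y))  # (20)

# The saddle point projects t onto the polytope: y -> clip(t, -1, 1)
assert np.allclose(y, np.clip(t, -1.0, 1.0), atol=0.05)
```

Note that the number of interneurons equals the number of rows of A_P (here 2n = 6 faces), matching observation ii) above.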
Alternatively, it is possible to consider the subset of polytopes in (2), which are described by individual properties and the relations of source components. Appendix C.5 derives the network dynamics for this feature-based polytope representation, and Figure 1b illustrates its particular realization. The number of interneurons in this case is equivalent to the number of sparsity constraints in (2), which can be much less than the number of faces of the polytope.

4. NUMERICAL EXPERIMENTS

In this section, we illustrate different domain selections for sources and compare the proposed Cor-InfoMax framework with existing batch algorithms and online biologically plausible neural network approaches. We demonstrate the correlated source separation capability of the proposed framework for both synthetic and natural sources. Additional experiments and details about their implementations are available in Appendix D.

4.1. SYNTHETICALLY CORRELATED SOURCE SEPARATION WITH ANTISPARSE SOURCES

To illustrate the correlated source separation capability of the online CorInfoMax framework for both nonnegative and signed antisparse sources, i.e., s(i) ∈ B_ℓ∞,+ ∀i and s(i) ∈ B_ℓ∞ ∀i, respectively, we consider a BSS setting with n = 5 sources and m = 10 mixtures. The 5-dimensional sources are generated using the Copula-T distribution with 4 degrees of freedom. We control the correlation level of the sources by adjusting a Toeplitz distribution parameter matrix whose first row is [1 ρ ρ ρ ρ] for ρ ∈ [0, 0.8]. In each realization, we generate N = 5 × 10^5 samples for each source and mix them through a random matrix A ∈ R^{10×5} whose entries are drawn i.i.d. from a standard normal distribution. Furthermore, we corrupt the mixtures with i.i.d. white Gaussian noise (WGN) at a signal-to-noise ratio (SNR) of 30dB. We use the antisparse CorInfoMax network of Appendix C.1 and the nonnegative CorInfoMax network of Appendix C.2 for these experiments. For comparison, we also ran these experiments with the biologically plausible algorithms online BCA (Simsek & Erdogan, 2019), WSM (Bozkurt et al., 2022), NSM (Pehlevan et al., 2017a), and BSM (Erdogan & Pehlevan, 2020), and the batch algorithms ICA-Infomax (Bell & Sejnowski, 1995), LD-InfoMax (Erdogan, 2022), and PMF (Tatli & Erdogan, 2021). Figure 3 shows the signal-to-interference-plus-noise ratio (SINR) versus correlation level ρ curves of the different algorithms for the nonnegative antisparse and antisparse source separation experiments. We observe that the proposed CorInfoMax approach achieves relatively high SINR results despite increasing ρ in both cases. Although the WSM curve has a similar characteristic, its performance falls behind that of CorInfoMax. Moreover, the LD-InfoMax and PMF algorithms typically achieve the best results, as expected, due to their batch learning settings.
Furthermore, the performance of NSM, BSM, and ICA-InfoMax degrades with increasing source correlation, because these approaches assume uncorrelated or independent sources.

We also demonstrate separation of natural video sources, which are correlated, with pairwise correlation coefficients as high as 0.5173. We use a random mixing matrix A ∈ R^{5×3} with positive entries (to ensure nonnegative mixtures so that they can be displayed as proper images without loss of generality), which is provided in Appendix D.3.3. Since the image pixels lie in [0, 1], we use the nonnegative antisparse CorInfoMax network to separate the original videos. The demo video (available in the supplementary files) visually demonstrates the separation process of the proposed approach over time. The first and second rows of the demo show the 3 source videos and 3 of the 5 mixture videos, respectively. The last row contains the source estimates obtained by the CorInfoMax network during its unsupervised learning process. We observe that the output frames become visually better as time progresses and start to represent the individual sources. In the end, the CorInfoMax network is trained to a stage of near-perfect separation, with peak signal-to-noise ratio (PSNR) levels of 35.60dB, 48.07dB, and 44.58dB for the three sources, respectively. Further details of this experiment can be found in Appendix D.3.3.
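The SINR metric reported in these experiments can be computed as follows once the permutation ambiguity is resolved, with the scale ambiguity handled by an optimal per-source scaling. This helper is our own minimal version, not the paper's evaluation code.

```python
# Minimal SINR (dB) computation, assuming row i of Y estimates row i of S
# (permutation already resolved); scale is fitted per source. Ours.
import numpy as np

def sinr_db(S, Y):
    out = []
    for s, y in zip(S, Y):
        alpha = (y @ s) / (s @ s)        # optimal scale for this source
        signal = alpha * s
        noise = y - signal               # interference + noise residual
        out.append(10 * np.log10((signal @ signal) / (noise @ noise)))
    return np.array(out)

rng = np.random.default_rng(4)
S = rng.uniform(-1, 1, (3, 1000))
Y = 2.0 * S + 0.01 * rng.standard_normal((3, 1000))  # near-perfect estimate
assert np.all(sinr_db(S, Y) > 30)        # well above 30 dB
```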

5. CONCLUSION

In this article, we propose an information-theoretic framework for generating biologically plausible neural networks capable of separating both independent and correlated sources. The proposed CorInfoMax framework can be applied to infinitely many source domains, enabling a diverse set of source characterizations. In addition to solving unsupervised linear inverse problems, CorInfoMax networks have the potential to generate structured embeddings of observations based on the choice of source domain. In fact, as a future extension, we envision representation frameworks that learn desirable source domain representations by adapting the output-interneuron connections in Figure 1a. Finally, the proposed unsupervised framework and its potential supervised extensions can be useful for neuromorphic systems that are bound to use local learning rules. In terms of limitations, we can list the computational complexity of simulating such networks on conventional computers, mainly due to the loop-based recurrent output computation. However, as described in Appendix D.7, the networks generated by the proposed framework have computational loads similar to existing biologically plausible BSS neural networks.

6. REPRODUCIBILITY

To ensure the reproducibility of our results, we provide i. Detailed mathematical description of the algorithms for different source domains and their neural network implementations in Section 3. 

A APPENDIX A.1 INFORMATION THEORETIC DEFINITIONS

In this section, we review the logarithm-determinant (LD) entropy measure and mutual information for the BSS setting introduced in Section 2.1, based on Erdogan (2022). Note that in this article we refer to LD-mutual information synonymously as correlative mutual information. For a finite set of vectors X = {x(1), x(2), . . . , x(N)} ⊂ R^m with sample covariance matrix R_x = (1/N) X X^T − (1/N^2) X 1 1^T X^T, where X = [x(1) x(2) . . . x(N)], the deterministic LD-entropy is defined in Erdogan (2022) as

H^(ϵ)_LD(X) = (1/2) log det(R_x + ϵI) + (m/2) log(2πe),    (A.1)

where ϵ > 0 is a small number that keeps the expression away from −∞. If the sample covariance R_x in this expression is replaced with the true covariance, then H^(0)_LD(x) coincides with the Shannon differential entropy of a Gaussian vector x. Moreover, a deterministic joint LD-entropy of two sets of vectors X ⊂ R^m and Y ⊂ R^n can be defined as

H^(ϵ)_LD(X, Y) = (1/2) log det([R_x + ϵI, R_xy; R_yx, R_y + ϵI]) + ((m + n)/2) log(2πe)
             = (1/2) log det(R_x + ϵI) + (1/2) log det(R_y + ϵI − R_yx (R_x + ϵI)^{−1} R_xy) + ((m + n)/2) log(2πe)
             = H^(ϵ)_LD(X) + H^(ϵ)_LD(Y |_L X),    (A.2)

where R_e = R_y − R_yx (R_x + ϵI)^{−1} R_xy, R_xy = (1/N) X Y^T = R_yx^T, and H^(ϵ)_LD(Y |_L X) = (1/2) log det(R_e + ϵI) + (n/2) log(2πe). In (A.2), the notation Y |_L X is used to signify that H^(ϵ)_LD(Y |_L X) ≠ H^(ϵ)_LD(Y | X), as the latter would require the use of R_{y|x} instead of R_e. Moreover, H^(ϵ)_LD(Y |_L X) corresponds to the LD-entropy of the error of the best linear minimum mean square error (MMSE) estimate of y from x. To verify this in the zero-mean, noiseless case, consider the linear MMSE estimate ŷ = W x, whose solution is W = R_yx R_x^{−1} = R_xy^T R_x^{−1} (Kailath et al., 2000). Then

R_ŷ = E[ŷ ŷ^T] = R_yx R_x^{−1} R_x R_x^{−1} R_xy = R_yx R_x^{−1} R_xy.

Therefore, if the error is defined as e = y − ŷ, its covariance matrix is, as desired,

R_e = R_y − R_ŷ = R_y − R_yx R_x^{−1} R_xy.

The LD-mutual information between X and Y can then be defined from (A.1) and (A.2) as

I^(ϵ)_LD(X, Y) = H^(ϵ)_LD(Y) − H^(ϵ)_LD(Y |_L X) = H^(ϵ)_LD(X) − H^(ϵ)_LD(X |_L Y)
             = (1/2) log det(R_y + ϵI) − (1/2) log det(R_y − R_yx (R_x + ϵI)^{−1} R_xy + ϵI)
             = (1/2) log det(R_x + ϵI) − (1/2) log det(R_x − R_xy (R_y + ϵI)^{−1} R_yx + ϵI).    (A.3)
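The identity R_e = R_y − R_yx R_x^{−1} R_xy can be confirmed numerically. The sketch below (zero-mean sample statistics, ε → 0, synthetic data of our choosing) compares the error covariance computed from residual samples against the closed form; the equality is exact at the sample level, not just asymptotically.

```python
# Numerical check of the identity behind (A.2)-(A.3): the sample error
# covariance of the best linear MMSE estimate of y from x equals
# Re = Ry - Ryx Rx^{-1} Rxy (zero-mean case, eps -> 0).
import numpy as np

rng = np.random.default_rng(5)
m, n, N = 4, 3, 2000
X = rng.standard_normal((m, N))
Y = rng.standard_normal((n, m)) @ X + 0.1 * rng.standard_normal((n, N))

Rx, Ry = X @ X.T / N, Y @ Y.T / N
Rxy = X @ Y.T / N                  # so Ryx = Rxy.T
W = Rxy.T @ np.linalg.inv(Rx)      # best linear MMSE estimator of y from x
E = Y - W @ X                      # error samples e = y - W x
Re_direct = E @ E.T / N
Re_formula = Ry - Rxy.T @ np.linalg.inv(Rx) @ Rxy
assert np.allclose(Re_direct, Re_formula)
```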

B GRADIENT DERIVATIONS FOR ONLINE OPTIMIZATION OBJECTIVE

Assuming that the mapping W(k) changes slowly over time, the current output y(k) can be obtained by projected gradient ascent implemented through the neural dynamics. To derive the corresponding neural dynamics for the output y(k), we need to calculate the gradient of the objective (5a) with respect to y(k).

The nonnegative clipping function is defined as

σ_+(y)_i = { 0, y_i ≤ 0;  y_i, 0 ≤ y_i ≤ 1;  1, y_i ≥ 1 }.

The corresponding neural network realization is shown in Figure 5.

C.5 DESCRIPTION OF THE NETWORK DYNAMICS FOR FEATURE BASED SPECIFIED POLYTOPES

In this section, we consider the source separation setting where the source samples come from a polytope represented in the form of (2). We expand the derivation in Bozkurt et al. (2022) (see Appendix D.6 in the reference) to obtain a neural network solution to the BSS problem for any identifiable polytope that can be expressed in the form of (2). Accordingly, we consider the following optimization problem:

$$\begin{aligned}
\underset{y(k)\,\in\,\mathbb{R}^n}{\text{maximize}} \quad & \mathcal{J}(y(k)) && \text{(A.16a)} \\
\text{subject to} \quad & -1 \preceq y(k)_{I_s} \preceq 1, && \text{(A.16b)} \\
& 0 \preceq y(k)_{I_+} \preceq 1, && \text{(A.16c)} \\
& \|y(k)_{J_l}\|_1 \leq 1, \quad \forall l = 1, \ldots, L. && \text{(A.16d)}
\end{aligned}$$

We write the online optimization setting in a Lagrangian min-max form as follows:

$$\underset{\lambda_l(k) \geq 0}{\text{minimize}}\;\; \underset{\substack{y(k) \in \mathbb{R}^n,\; -1 \preceq y(k)_{I_s} \preceq 1, \\ 0 \preceq y(k)_{I_+} \preceq 1}}{\text{maximize}}\;\; \mathcal{L}(y(k), \lambda_1(k), \ldots, \lambda_L(k)) = \mathcal{J}(y(k)) - \sum_{l=1}^{L} \lambda_l(k)\big(\|y(k)_{J_l}\|_1 - 1\big).$$

The proximal operator corresponding to the Lagrangian term can be written as

$$\mathrm{prox}_\lambda(y) = \underset{q \,:\, q_{I_+} \succeq 0}{\operatorname{argmin}}\; \frac{1}{2}\|y - q\|_2^2 + \sum_{l=1}^{L} \lambda_l \|q_{J_l}\|_1. \tag{A.17}$$

Let $q^*$ be the output of the proximal operator defined in (A.17). From the first-order optimality condition:

- If $j \notin I_+$, then $q^*_j - y_j + \sum_{l:\, j \in J_l} \lambda_l \,\mathrm{sign}(y_j) = 0$. Therefore, $q^*_j = y_j - \sum_{l:\, j \in J_l} \lambda_l \,\mathrm{sign}(y_j)$.
- If $j \in I_+$, then $q^*_j = y_j - \sum_{l:\, j \in J_l} \lambda_l$.

As a result, defining $I_a = (\bigcup_l J_l)^\complement$ as the set of dimension indices that do not appear in any sparsity constraint, we can write the corresponding output dynamics as

$$\begin{aligned}
\nabla_{y(k)} \mathcal{J}(y(k;\nu)) &= \gamma_y B^{\zeta_y}_y(k-1)\, y(k;\nu) - \gamma_e B^{\zeta_e}_e(k-1)\, e(k;\nu), \\
\bar{y}(k;\nu+1) &= y(k;\nu) + \eta_y(\nu)\, \nabla_{y(k)} \mathcal{J}(y(k;\nu)), \\
y_j(k;\nu+1) &= \mathrm{ST}_{\alpha_j(k;\nu)}\big(\bar{y}_j(k;\nu+1)\big), \;\; \alpha_j(k;\nu) = \textstyle\sum_{l:\, j \in J_l} \lambda_l(k;\nu), \quad \forall j \in I_s \cap I_a^\complement, \\
y_j(k;\nu+1) &= \mathrm{ReLU}\Big(\bar{y}_j(k;\nu+1) - \textstyle\sum_{l:\, j \in J_l} \lambda_l(k;\nu)\Big), \quad \forall j \in I_+ \cap I_a^\complement, \\
y_j(k;\nu+1) &= \sigma_1\big(\bar{y}_j(k;\nu+1)\big), \quad \forall j \in I_s \cap I_a, \\
y_j(k;\nu+1) &= \sigma_+\big(\bar{y}_j(k;\nu+1)\big), \quad \forall j \in I_+ \cap I_a.
\end{aligned} \tag{A.18}$$

For the inhibitory neurons corresponding to the Lagrangian variables $\lambda_1, \ldots, \lambda_L$, we obtain the update dynamics from the derivative of $\mathcal{L}(y(k;\nu), \lambda_1(k;\nu), \ldots, \lambda_L(k;\nu))$ as

$$\begin{aligned}
\frac{d\mathcal{L}(y(k), \lambda_1(k), \ldots, \lambda_L(k))}{d\lambda_l(k)}\bigg|_{\lambda_l(k;\nu)} &= 1 - \big\|[y(k;\nu+1)]_{J_l}\big\|_1, \quad \forall l, \\
\bar{\lambda}_l(k;\nu+1) &= \lambda_l(k;\nu) - \eta_{\lambda_l}(\nu)\, \frac{d\mathcal{L}(y(k), \lambda_1(k), \ldots, \lambda_L(k))}{d\lambda_l(k)}\bigg|_{\lambda_l(k;\nu)}, \\
\lambda_l(k;\nu+1) &= \mathrm{ReLU}\big(\bar{\lambda}_l(k;\nu+1)\big).
\end{aligned}$$

In Appendix D.2.4, we demonstrate an example setting in which the underlying domain is defined as

$$\mathcal{P}_{ex} = \left\{ s \in \mathbb{R}^5 \;\middle|\; s_1, s_2, s_4 \in [-1,1],\; s_3, s_5 \in [0,1],\; \left\|\begin{bmatrix} s_1 \\ s_2 \\ s_5 \end{bmatrix}\right\|_1 \le 1,\; \left\|\begin{bmatrix} s_2 \\ s_3 \\ s_4 \end{bmatrix}\right\|_1 \le 1 \right\}.$$

We summarize the neural dynamics for this specific example:

$$\begin{aligned}
y_1(k;\nu+1) &= \mathrm{ST}_{\lambda_1(k;\nu)}\big(\bar{y}_1(k;\nu+1)\big), \\
y_2(k;\nu+1) &= \mathrm{ST}_{\lambda_1(k;\nu)+\lambda_2(k;\nu)}\big(\bar{y}_2(k;\nu+1)\big), \\
y_3(k;\nu+1) &= \mathrm{ReLU}\big(\bar{y}_3(k;\nu+1) - \lambda_2(k;\nu)\big), \\
y_4(k;\nu+1) &= \mathrm{ST}_{\lambda_2(k;\nu)}\big(\bar{y}_4(k;\nu+1)\big), \\
y_5(k;\nu+1) &= \mathrm{ReLU}\big(\bar{y}_5(k;\nu+1) - \lambda_1(k;\nu)\big), \\
\lambda_1(k;\nu+1) &= \mathrm{ReLU}\Big(\lambda_1(k;\nu) - \eta_{\lambda_1}(\nu)\big(1 - |y_1(k;\nu+1)| - |y_2(k;\nu+1)| - y_5(k;\nu+1)\big)\Big), \\
\lambda_2(k;\nu+1) &= \mathrm{ReLU}\Big(\lambda_2(k;\nu) - \eta_{\lambda_2}(\nu)\big(1 - |y_2(k;\nu+1)| - y_3(k;\nu+1) - |y_4(k;\nu+1)|\big)\Big),
\end{aligned}$$

where $\bar{y}(k;\nu)$ is defined as in (A.18).
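Since the penalty $\sum_l \lambda_l \|q_{J_l}\|_1$ decomposes per coordinate into $\alpha_j |q_j|$ with $\alpha_j = \sum_{l:\, j \in J_l} \lambda_l$, the proximal operator (A.17) reduces to coordinatewise soft thresholding (with a ReLU branch for the nonnegative indices). A minimal NumPy sketch following the optimality conditions above; the function names and argument layout are illustrative assumptions:

```python
import numpy as np

def soft_threshold(y, alpha):
    """ST_alpha(y) = sign(y) * max(|y| - alpha, 0), elementwise."""
    return np.sign(y) * np.maximum(np.abs(y) - alpha, 0.0)

def prox_group_l1(y, lambdas, groups, nonneg_idx):
    """Prox of sum_l lambda_l * ||q_{J_l}||_1 subject to q_{I+} >= 0.
    `groups` is a list of index arrays J_l, `lambdas` the multipliers,
    `nonneg_idx` the indices in I+."""
    alpha = np.zeros_like(y)
    for lam, J in zip(lambdas, groups):
        alpha[J] += lam                          # total shrinkage alpha_j per coordinate
    q = soft_threshold(y, alpha)                 # j not in I+: soft threshold
    q[nonneg_idx] = np.maximum(y[nonneg_idx] - alpha[nonneg_idx], 0.0)  # j in I+: ReLU
    return q
```

For instance, with one group covering both coordinates and $\lambda_1 = 0.5$, a vector $[-2, 2]$ with the second coordinate nonnegative maps to $[-1.5, 1.5]$.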

D SUPPLEMENTARY ON NUMERICAL EXPERIMENTS

In this section, we provide more details on the algorithmic formulation of the proposed approach and on the numerical experiments presented in the main text. In addition, we provide further examples.

D.1 ONLINE CORINFOMAX ALGORITHM IMPLEMENTATIONS FOR SPECIAL SOURCE DOMAINS

Algorithm 2 summarizes the dynamics of the CorInfoMax network and its learning rules. For each of the domain choices, the recurrent and feedforward weight updates follow (14) and (15), respectively, and the learning step is indicated in the 4th and 5th lines of the pseudo-code. The 3rd line expresses the recursive neural dynamics that produce the output of the network, and its implementation differs across domain choices. Based on the derivations in Section 3 and Appendix C, Algorithms 3, 4, 5, and 6 summarize the neural dynamic iterations for some example domains. For example, Algorithm 5 describes the procedure for obtaining the output $y(k)$ of the antisparse CorInfoMax network at time step $k$ for the mixture vector $x(k)$. As this is an optimization process, we introduce two variables for implementation in digital hardware: 1) a numerical convergence tolerance $\epsilon_t$, and 2) a maximum number of (neural dynamic) iterations $\nu_{max}$. We run the proposed neural dynamic iterations until either convergence occurs, i.e., $\|y(k;\nu) - y(k;\nu-1)\|/\|y(k;\nu)\| \le \epsilon_t$, or the loop counter reaches the predetermined maximum number of iterations, i.e., $\nu = \nu_{max}$. Unlike the antisparse and nonnegative antisparse networks, the other CorInfoMax neural networks include additional inhibitory neurons due to the Lagrangian min-max settings, and the activations of these neurons are coupled with the network's output. Therefore, the inhibitory neurons are updated in the neural dynamics based on the gradient of the Lagrangian objective.

D.2.1 SPARSE SOURCE SEPARATION

In this section, we illustrate the blind separation of sparse sources, i.e., $s(i) \in \mathcal{B}_{\ell_1}\ \forall i$. We consider $n = 5$ sources and $m = 10$ mixtures. For each source, we generate $5 \times 10^5$ samples in each realization of the experiments.
We examine two experimental factors: 1) output SINR performance as a function of the mixture SNR level, and 2) output SINR performance for different distribution choices for the entries of the mixing matrix.

For the first scenario with different mixture SNR levels: the sources are mixed through a random matrix $A \in \mathbb{R}^{10\times 5}$ whose entries are drawn i.i.d. from the standard normal distribution, and the mixtures are corrupted by WGN at the corresponding SNR level. We compare our approach with the WSM (Bozkurt et al., 2022), LD-InfoMax (Erdogan, 2022), and PMF (Tatli & Erdogan, 2021) algorithms and visualize the results in Figure 8. We also use a portion of the mixtures to train the batch LD-InfoMax and PMF algorithms. Figure 8a illustrates the SINR performances of these algorithms for different input noise levels. The SINR results of CorInfoMax, LD-InfoMax, and PMF are noticeably close to each other and almost equal to the input SNR. Figure 8b illustrates the SINR convergence of the sparse CorInfoMax network at the 30dB mixture SNR level as a function of update iterations. Based on this figure, we conclude that the proposed CorInfoMax network converges robustly and smoothly.
In the second experimental setting, we examine the effect of the distribution used to generate the random mixing matrix. Figure 9 illustrates the box plots of the SINR results of CorInfoMax, LD-InfoMax, and PMF for different distribution selections for generating the mixing matrix, namely $\mathcal{N}(0,1)$, $\mathcal{U}[-1,1]$, $\mathcal{U}[-2,2]$, and $\mathcal{L}(0,1)$, where $\mathcal{N}$ denotes the normal distribution, $\mathcal{U}$ the uniform distribution, and $\mathcal{L}$ the Laplace distribution. We observe that the performance of CorInfoMax is robust to the choice of distribution for the unknown mixing matrix, and its performance is on par with the batch algorithms LD-InfoMax and PMF.


D.2.2 NONNEGATIVE SPARSE SOURCE SEPARATION

We replicate the first experimental setup of Appendix D.2.1 for nonnegative sparse source separation: we evaluate the SINR performance of the nonnegative sparse CorInfoMax network for different levels of mixture SNR, compared to the batch LD-InfoMax algorithm and the biologically plausible WSM neural network. In these experiments, $n = 5$ uniform sources in $\mathcal{B}_{\ell_1,+}$ are randomly mixed to generate $m = 10$ mixtures. Figure 10a illustrates the averaged output SINR performance of each algorithm, with a standard deviation envelope, for different input noise levels. In Figure 10b, we observe the SINR convergence behavior of nonnegative sparse CorInfoMax as a function of update iterations. Note that CorInfoMax outperforms the biologically plausible WSM network for all input SNR levels, and its convergence is noticeably stable.

D.2.3 SIMPLEX SOURCE SEPARATION

We repeat both experimental settings in Appendix D.2.1 for the blind separation of simplex sources using the CorInfoMax network in Figure 7. Figure 11a shows the output SINR results of both the online CorInfoMax and the batch LD-InfoMax approaches for different mixture SNR levels. Even though simplex CorInfoMax is not as successful as the other examples (e.g., sparse CorInfoMax) in terms of the closeness of its performance to the batch algorithms, it still has satisfactory source separation capability. Similarly to the sparse network examples, its SINR convergence is fast and smooth, as illustrated in Figure 11b. Figure 12 shows the box plots of the SINR performances for both the CorInfoMax and LD-InfoMax approaches with respect to different distribution selections for generating the random mixing matrix. Based on this figure, we conclude that the simplex CorInfoMax network largely maintains its performance across different distributions for the entries of the mixing matrix.

D.2.4 SOURCE SEPARATION FOR POLYTOPES WITH MIXED LATENT ATTRIBUTES

In this section, we demonstrate the source separation capability of CorInfoMax on an identifiable polytope with mixed features, which is a special case of the feature-based polytopes in (2). We focus on the polytope

$$\mathcal{P}_{ex} = \left\{ s \in \mathbb{R}^5 \;\middle|\; s_1, s_2, s_4 \in [-1,1],\; s_3, s_5 \in [0,1],\; \left\|\begin{bmatrix} s_1 \\ s_2 \\ s_5 \end{bmatrix}\right\|_1 \le 1,\; \left\|\begin{bmatrix} s_2 \\ s_3 \\ s_4 \end{bmatrix}\right\|_1 \le 1 \right\}, \tag{A.19}$$

whose identifiability is verified by the identifiable polytope characterization algorithm presented in Bozkurt & Erdogan (2022). We experiment with both approaches discussed in Section 3.3 and Appendix C.5. For the feature-based polytope setting introduced in Appendix C.5, the output dynamics corresponding to $\mathcal{P}_{ex}$ are also summarized as an example. This polytope can also be represented as the intersection of 10 half-spaces, that is, $\mathcal{P}_{ex} = \{s \in \mathbb{R}^5 \,|\, A_P s \preceq b_P\}$, where $A_P \in \mathbb{R}^{10 \times 5}$ and $b_P \in \mathbb{R}^{10}$.
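The 10 half-spaces can be enumerated explicitly: each weighted-$\ell_1$ constraint expands into four sign patterns (since $s_5 \ge 0$ and $s_3 \ge 0$ fix those signs), plus the two nonnegativity constraints themselves. The sketch below is a hypothetical reconstruction of $A_P$ and $b_P$ for $\mathcal{P}_{ex}$; the row ordering and the membership helper are our own choices.

```python
import numpy as np
from itertools import product

# Hypothetical H-representation of P_ex (A.19): 4 + 4 sign patterns for the
# two weighted-l1 constraints, plus s3 >= 0 and s5 >= 0. The box constraints
# on s1, s2, s4 and the upper bounds on s3, s5 are implied by these rows.
rows, rhs = [], []
for sgn1, sgn2 in product([1, -1], repeat=2):
    rows.append([sgn1, sgn2, 0, 0, 1]); rhs.append(1.0)   # |s1| + |s2| + s5 <= 1
for sgn2, sgn4 in product([1, -1], repeat=2):
    rows.append([0, sgn2, 1, sgn4, 0]); rhs.append(1.0)   # |s2| + s3 + |s4| <= 1
rows.append([0, 0, -1, 0, 0]); rhs.append(0.0)            # -s3 <= 0
rows.append([0, 0, 0, 0, -1]); rhs.append(0.0)            # -s5 <= 0
A_P, b_P = np.array(rows, dtype=float), np.array(rhs)

def in_Pex(s, tol=1e-9):
    """Check membership s in P_ex via the half-space representation."""
    return bool(np.all(A_P @ s <= b_P + tol))
```

This representation is what the canonical-form network of Figure 1a (with 10 inhibitory neurons) would consume as $A_P$ and $b_P$.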
Therefore, using $A_P$ and $b_P$, we can also employ the neural network illustrated in Figure 1a with 10 inhibitory neurons. For this BSS setting, we generate the sources uniformly within the 5-dimensional polytope $\mathcal{P}_{ex}$, with a sample size of $5 \times 10^5$. The source vectors are mixed through a random matrix $A \in \mathbb{R}^{10\times 5}$ with standard normal entries. Figures 13a and 13b show the SINR convergence of the CorInfoMax networks based on the feature-based polytope representation in (2) and the H-representation in (1), respectively, for a mixture SNR level of 30dB. Moreover, Figures 13c and 13d illustrate their SINR convergence curves for an SNR level of 40dB. In Table 2, we compare both approaches with the batch algorithms LD-InfoMax and PMF.

D.3.1 SPARSE DICTIONARY LEARNING

We consider sparse dictionary learning for natural images, to model receptive fields in the early stages of visual processing (Olshausen & Field, 1997). In this experiment, we use $12 \times 12$ prewhitened image patches as input to the sparse CorInfoMax network. The image patches are obtained from the website http://www.rctn.org/bruno/sparsenet. The inputs are vectorized to the shape $144 \times 1$ before being fed to the neural network illustrated in Figure 2. Figure 14a illustrates the dictionary learned by the sparse CorInfoMax network.

D.3.2 DIGITAL COMMUNICATION EXAMPLE: 4-PAM MODULATION SCHEME

One successful application of antisparse source modeling in BSS problems is digital communication systems (Cruces, 2010; Erdogan, 2013). In this section, we verify that the antisparse CorInfoMax network in Figure 4 can separate 4-pulse-amplitude-modulation (4-PAM) signals with alphabet $\{-3, -1, 1, 3\}$ (with a uniform probability distribution). We consider 5 digital communication (4-PAM) sources with $10^5$ samples, transmitted and then mixed through a Gaussian channel to produce 10 mixtures. The mixtures can represent signals received at base station antennas in a multipath propagation environment. Furthermore, the mixtures are corrupted by WGN corresponding to an SNR level of 30dB. We feed the mixtures as input to the antisparse CorInfoMax network. For 100 different realizations of the experimental setup, Figure 14b illustrates the SINR convergence as a function of update iterations. We note that the proposed approach converges notably fast, and each realization of the experiment resulted in a zero symbol error rate.

D.3.3 VIDEO SEPARATION

We provide more details on the video separation experiment discussed in Section 4.2. The three source videos used in this experiment are from the website https://www.pexels.com/, and are free to download and use. The videos are mixed linearly through a randomly selected nonnegative $5 \times 3$ matrix $A$. For the separation of the videos, we followed the procedure below:

- In each iteration of the algorithm, we randomly choose a pixel location and select the pixels from one of the color channels of all mixture videos to form a mixture vector of size $5 \times 1$.
- We sample 20 mixture vectors from each frame and perform 20 algorithm iterations per frame using these samples.

The demonstration video contains three rows of frames: the first row contains the source frames, the second row contains three of the five mixture frames, and the last row contains the network outputs for these mixture frames. The demo video is located at https://figshare.com/s/a3fb926f273235068053 and is also included in the supplementary files. If we use the separator matrix of the CorInfoMax network, which is an estimate of the left inverse of $A$, to predict the original videos after training, we obtain PSNR values of 35.60dB, 48.07dB, and 44.58dB for the three videos, calculated as the average PSNR over frames. Figure 15c illustrates the final output frames of CorInfoMax. We experiment with the proposed antisparse CorInfoMax network, the biologically plausible NSM and WSM networks, and the batch ICA-InfoMax and LD-InfoMax frameworks to separate the original sources; Figures 16c-16g show the corresponding outputs. The residual interference in the output images of the ICA-InfoMax algorithm is clearly perceivable, with resulting PSNR values of 18.56dB, 20.52dB, and 21.06dB. The NSM algorithm's outputs are visually better, although some interference effects are still noticeable, with PSNR values of 25.30dB, 26.49dB, and 26.45dB. The visual interference effects in the WSM algorithm's outputs are barely visible, and they accordingly achieve higher PSNR values of 27.99dB, 29.71dB, and 31.92dB. The batch LD-InfoMax algorithm achieves the best PSNR performance: 33.60dB, 31.99dB, and 33.62dB. Finally, our proposed CorInfoMax method outperforms the other biologically plausible neural networks and the batch ICA-InfoMax algorithm, while its performance is on par with the batch LD-InfoMax algorithm, with output PSNR values of 32.45dB, 29.72dB, and 32.37dB. We note that the ICA and NSM algorithms assume independent and uncorrelated sources, respectively, so their performance is strongly affected by the correlation level of the sources. Typically, the best performance is obtained by LD-InfoMax due to its batch nature and its dependent source separation capability. Finally, we observe that the biologically plausible WSM network is able to separate correlated sources, whereas CorInfoMax performs better both visually and in terms of the PSNR metric. The hyperparameters used in this experiment for CorInfoMax are listed in Appendix D.4, and the code to reproduce each output is included in our supplementary material.

D.4 HYPERPARAMETER SELECTIONS

Hyperparameter selection has a critical impact on the performance of the neural networks proposed in this article. In this section, we provide the list of our hyperparameter selections for the experiments presented in the article. These parameters were selected through ablation studies and trials, which are discussed in Appendix D.5. Table 3 summarizes the hyperparameter selections for the special domain choices introduced in Section 2.1. Based on this table, we observe that the hyperparameter sets for the different special source domains generally resemble each other. However, there are some noticeable domain-specific changes in certain parameters, such as the initial learning rate for the neural dynamics ($\eta_y(1)$) and the learning rate for the Lagrangian variable ($\eta_\lambda(\nu)$). For the other experiments, presented in Appendices D.2.4 and D.3, Table 4 summarizes the corresponding hyperparameter selections.

D.5 ABLATION STUDIES

Our ablation studies focus on the learning rate of the feedforward weights $\mu_W$ and the initialization of the inverse error correlation matrix $B^{\zeta_e}_e$. For these hyperparameters, we consider the following selections:

- $\mu_W \in \{5 \times 10^{-3}, 10^{-2}, 3 \times 10^{-2}, 5 \times 10^{-2}\}$,
- $B^{\zeta_e}_e(1) \in \{10^3, 2 \times 10^3, 5 \times 10^3, 10^4\} \cdot I$.

We consider the experimental setup in Section 4.1 for both uncorrelated sources and correlated sources with correlation parameter $\rho = 0.6$. For the ablation study on $\mu_W$, we fix $B^{\zeta_e}_e$ as indicated in Table 3, and vice versa. Figures 17a and 17b illustrate the mean SINR results, with standard deviation envelopes, with respect to $\mu_W$ for the uncorrelated and correlated source separation settings, respectively. Note that although the selection $\mu_W = 5 \times 10^{-2}$ appears better for uncorrelated sources, its performance degrades for correlated sources because the algorithm diverges on some realizations. We conclude that the selection $\mu_W = 3 \times 10^{-2}$ is near optimal in this setting and obtains good SINR results for both correlated and uncorrelated sources.
The number of mixtures relative to the number of sources can be crucial in blind source separation and can affect the overall performance of the proposed method. To explore the impact of the number of mixtures on the SINR performance of our method, we performed experiments with a varying number of mixtures and a fixed number of sources for the nonnegative antisparse ($\mathcal{P} = \mathcal{B}_{\ell_\infty,+}$) and sparse ($\mathcal{P} = \mathcal{B}_{\ell_1}$) source separation settings. In these experimental settings, we consider 5 sources and gradually change the number of mixtures from 5 to 10. Figures 18a and 18b illustrate the overall SINR performances, with standard deviation envelopes and averaged over 50 realizations, of the nonnegative antisparse and sparse CorInfoMax networks with respect to the number of mixtures. For each realization, we randomly generate a mixing matrix with i.i.d. standard normal entries. We observe that the performance of the CorInfoMax networks monotonically improves as the number of mixtures increases. This aligns with the theoretical expectation that the conditioning of the random mixing matrix improves with an increasing number of mixtures (Chen & Dongarra, 2005), which positively impacts the algorithm's numerical performance.

Neural Dynamics' Complexity: $\nu_{max}$ neural dynamics iterations require $\nu_{max}(mn + n^2 + n) \approx \nu_{max} mn$ multiplications per output computation.

Weight Updates' Complexity: Counting the operations of the weight updates, learning requires $2mn + \frac{7n^2 + n + 2}{2}$ multiplications per input sample.

Taking both components into account, the complexity is dominated by the neural dynamics iterations, which require approximately $O(\nu_{max} mn)$ operations. This is of the same order as the complexity reported in Bozkurt et al. (2022) for the biologically plausible WSM, NSM, and BSM networks.






Applications include digital communication signals (Erdogan, 2013).
iii. Bounded sparse sources in the SCA framework: $\mathcal{B}_{\ell_1} = \{s \,|\, \|s\|_1 \leq 1\}$. Applications: modeling efficient representations of stimuli such as vision (Olshausen & Field, 1997) and sound (Smith & Lewicki, 2006).
iv. Nonnegative bounded antisparse sources in the nonnegative-BCA framework: $\mathcal{B}_{\ell_\infty,+} = \mathcal{B}_{\ell_\infty} \cap \mathbb{R}^n_+$. Applications include natural images (Erdogan, 2013).
v. Nonnegative bounded sparse sources in the nonnegative-SCA framework: $\mathcal{B}_{\ell_1,+} = \mathcal{B}_{\ell_1} \cap \mathbb{R}^n_+$. Potential applications are similar to those of $\Delta$ in (i).

The network dynamics above define a recurrent neural network, where $W(k)$ represents feedforward synaptic connections from the input $x(k)$ to the error $e(k)$, $B^{\zeta_e}_e(k)$ represents feedforward connections from the error $e(k)$ to the output $y(k)$, and $B^{\zeta_y}_y(k)$ corresponds to lateral synaptic connections among the output components. Next, we examine the learning rules for this network. Update of the inverse correlation matrices $B^{\zeta_y}_y(k)$ and $B^{\zeta_e}_e(k)$: we obtain the update expressions by applying the matrix inversion lemma to $(\hat{R}^{\zeta_y}_y(k) + \epsilon I)^{-1}$ and $(\hat{R}^{\zeta_e}_e(k) + \epsilon I)^{-1}$, as derived in Appendix B, to obtain (A.10) and (A.12).
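The matrix inversion lemma (Sherman-Morrison identity) turns each rank-one correlation update into an $O(n^2)$ inverse update, avoiding explicit matrix inversion. The sketch below is illustrative: it assumes a recursion of the form $\hat{R}(k) = \zeta \hat{R}(k-1) + (1-\zeta)\, y y^T$; the exact recursion used by the network and the handling of the $\epsilon I$ offset follow (A.9)-(A.12) in the text.

```python
import numpy as np

def sherman_morrison_update(B, y, zeta):
    """Given B = inverse of R(k-1), return the inverse of
    R(k) = zeta * R(k-1) + (1 - zeta) * y y^T  (assumed recursion),
    using the Sherman-Morrison rank-one identity."""
    Bs = B / zeta                          # inverse of zeta * R(k-1)
    By = Bs @ y
    denom = 1.0 + (1.0 - zeta) * (y @ By)  # scalar correction term
    return Bs - (1.0 - zeta) * np.outer(By, By) / denom
```

This is the same device used in the derivation of the RLS algorithm (Kailath et al., 2000): the updated inverse is obtained from the previous one with only matrix-vector products.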

Neural Network Realizations: Figure 2a shows the three-layer realization of the sparse CorInfoMax neural network based on the network dynamics expressions in (8)-(11) and the approximation $B^{\zeta_e}_e(k) \approx \frac{1}{\epsilon}I$. The first layer corresponds to the error ($e(k)$) neurons, and the second layer corresponds to the output ($y(k)$) neurons with soft-thresholding activation functions. If we substitute (8) and $B^{\zeta_e}_e(k+1) = \frac{1}{\epsilon}I$ in (9), we obtain

$$\nabla_{y(k)} J(y(k;\nu)) = M^{\zeta_y}_y(k)\, y(k;\nu) + \frac{\gamma_e}{\epsilon} W(k) x(k), \quad \text{where } M^{\zeta_y}_y(k) = \gamma_y B^{\zeta_y}_y(k) - \frac{\gamma_e}{\epsilon} I.$$

Therefore, this gradient expression together with (10)-(11) corresponds to the two-layer network shown in Figure 2b.
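The algebraic equivalence of the three-layer and two-layer gradient forms can be checked numerically. The sketch below uses arbitrary dimensions and random weights, and assumes the error definition $e = y - Wx$ (consistent with the weight update $W(k+1) = W(k) + \mu_W(k)e(k)x(k)^T$):

```python
import numpy as np

# With B_e = (1/eps) I and e = y - W x, the three-layer gradient
#   gamma_y * B_y @ y - gamma_e * B_e @ e
# equals the two-layer form
#   M_y @ y + (gamma_e / eps) * W @ x,   M_y = gamma_y * B_y - (gamma_e / eps) * I.
rng = np.random.default_rng(0)
n, m, eps, gamma_y, gamma_e = 4, 6, 1e-2, 1.0, 2.0
W = rng.standard_normal((n, m))
By = rng.standard_normal((n, n))
x = rng.standard_normal(m)
y = rng.standard_normal(n)

e = y - W @ x
grad_three_layer = gamma_y * (By @ y) - gamma_e * (1.0 / eps) * e
My = gamma_y * By - (gamma_e / eps) * np.eye(n)
grad_two_layer = My @ y + (gamma_e / eps) * (W @ x)

assert np.allclose(grad_three_layer, grad_two_layer)
```

The two-layer form trades the explicit error neurons for a diagonal modification of the lateral weights, which is why Figures 2a and 2b realize the same dynamics.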

Figure 2: Sparse CorInfoMax network: (a) three-layer, (b) two-layer. $x_i$'s and $y_i$'s represent the inputs (mixtures) and (separator) outputs, respectively; $W$ are the feedforward weights; the $e_i$'s in the three-layer implementation (on the left) are the errors between the transformed inputs and the outputs. $B_Y$ in (a), the inverse of the output autocorrelation matrix, represents the lateral weights at the output. $M_Y$ in (b) represents the output lateral weights, a diagonally modified form of the inverse of the output correlation matrix. In both representations, the rightmost interneuron imposes the sparsity constraint. Neural network derivation examples for other source domains are provided in Appendix C.

Figure 3: The SINR performances of CorInfoMax (ours), LD-InfoMax, PMF, ICA-InfoMax, NSM, and BSM, averaged over 100 realizations, (y-axis) with respect to the correlation factor ρ (x-axis). SINR vs. ρ curves for (a) nonnegative antisparse (B ℓ∞,+ ), (b) antisparse (B ℓ∞ ) source domains.

Figure 4: Two-layer antisparse CorInfoMax network. $x_i$'s and $y_i$'s represent the inputs (mixtures) and (separator) outputs, respectively; $W$ represents the feedforward weights; the $e_i$'s are the errors between the transformed inputs and the outputs; $B_Y$, the inverse of the output autocorrelation matrix, represents the lateral weights at the output. The output nonlinearities are clipping functions.

Figure 5: Two-layer nonnegative antisparse CorInfoMax network. $x_i$'s and $y_i$'s represent the inputs (mixtures) and (separator) outputs, respectively; $W$ represents the feedforward weights; the $e_i$'s are the errors between the transformed inputs and the outputs; $B_Y$, the inverse of the output autocorrelation matrix, represents the lateral weights at the output. The output nonlinearities are nonnegative clipping functions.

Figure 7: Three-layer CorInfoMax network for unit simplex sources. $x_i$'s and $y_i$'s represent the inputs (mixtures) and (separator) outputs, respectively; $W$ represents the feedforward weights; the $e_i$'s are the errors between the transformed inputs and the outputs; $B_Y$, the inverse of the output autocorrelation matrix, represents the lateral weights at the output. The output nonlinearities are ReLU functions. The leftmost interneuron imposes the sparsity constraint on the outputs through inhibition.

Algorithm 2 Online CorInfoMax pseudo-code
Input: streaming data $\{x(k) \in \mathbb{R}^m\}_{k=1}^N$. Output: $\{y(k) \in \mathbb{R}^n\}_{k=1}^N$.
1: Initialize $\zeta_y$, $\zeta_e$, $\mu_W(1)$, $W(1)$, $B^{\zeta_y}_y(1)$, $B^{\zeta_e}_e(1)$, and select $\mathcal{P}$.
2: for $k = 1, 2, \ldots, N$ do
3: run neural dynamics (in Algorithms 3 to 6 below, according to $\mathcal{P}$) until convergence
4: $W(k+1) = W(k) + \mu_W(k)\, e(k)\, x(k)^T$
5: update $B^{\zeta_y}_y(k+1)$ and $B^{\zeta_e}_e(k+1)$ via (14) and (15)
6: end for

Algorithm 3 Online CorInfoMax neural dynamic iterations: sources in the unit simplex
1: Initialize $\nu_{max}$, $\epsilon_t$, $\eta_y(1)$, $\eta_\lambda(1)$, $\lambda(1)$ ($= 0$ in general), and set $\nu = 1$
2: while $\big(\|y(k;\nu) - y(k;\nu-1)\|/\|y(k;\nu)\| > \epsilon_t\big)$ and $\nu < \nu_{max}$ do
3: $\nabla_{y(k)} \mathcal{J}(y(k;\nu)) = \gamma_y B^{\zeta_y}_y(k)\, y(k;\nu) - \gamma_e B^{\zeta_e}_e(k)\, e(k;\nu)$
4: $y(k;\nu+1) = \mathrm{ReLU}\big(y(k;\nu) + \eta_y(\nu)\nabla_{y(k)}\mathcal{J}(y(k;\nu)) - \lambda(\nu)\big)$
5: $\nabla_{\lambda(k)} \mathcal{L}(y(k;\nu)) = 1 - \|y(k;\nu+1)\|_1$
6: $\lambda(k;\nu+1) = \lambda(k;\nu) - \eta_\lambda(\nu)\nabla_{\lambda(k)}\mathcal{L}(y(k;\nu))$
7: $\nu = \nu + 1$,

and adjust $\eta_y(\nu)$, $\eta_\lambda(\nu)$ if necessary.
8: end while

Algorithm 4 Online CorInfoMax neural dynamic iterations: sparse sources
1: Initialize $\nu_{max}$, $\epsilon_t$, $\eta_y(1)$, $\eta_\lambda(1)$, $\lambda(1)$ ($= 0$ in general), and set $\nu = 1$
2: while $\big(\|y(k;\nu) - y(k;\nu-1)\|/\|y(k;\nu)\| > \epsilon_t\big)$ and $\nu < \nu_{max}$ do
3: $\nabla_{y(k)} \mathcal{J}(y(k;\nu)) = \gamma_y B^{\zeta_y}_y(k)\, y(k;\nu) - \gamma_e B^{\zeta_e}_e(k)\, e(k;\nu)$
4: $y(k;\nu+1) = \mathrm{ST}_{\lambda(\nu)}\big(y(k;\nu) + \eta_y(\nu)\nabla_{y(k)}\mathcal{J}(y(k;\nu))\big)$
5: $\nabla_{\lambda(k)} \mathcal{L}(y(k;\nu)) = 1 - \|y(k;\nu+1)\|_1$
6: $\lambda(k;\nu+1) = \mathrm{ReLU}\big(\lambda(k;\nu) - \eta_\lambda(\nu)\nabla_{\lambda(k)}\mathcal{L}(y(k;\nu))\big)$
7: $\nu = \nu + 1$,

and adjust $\eta_y(\nu)$, $\eta_\lambda(\nu)$ if necessary.
8: end while

Algorithm 5 Online CorInfoMax neural dynamic iterations: antisparse sources
1: Initialize $\nu_{max}$, $\epsilon_t$, $\eta_y(1)$, and set $\nu = 1$
2: while $\big(\|y(k;\nu) - y(k;\nu-1)\|/\|y(k;\nu)\| > \epsilon_t\big)$ and $\nu < \nu_{max}$ do
3: $\nabla_{y(k)} \mathcal{J}(y(k;\nu)) = \gamma_y B^{\zeta_y}_y(k)\, y(k;\nu) - \gamma_e B^{\zeta_e}_e(k)\, e(k;\nu)$
4: $y(k;\nu+1) = \sigma_1\big(y(k;\nu) + \eta_y(\nu)\nabla_{y(k)}\mathcal{J}(y(k;\nu))\big)$
5: $\nu = \nu + 1$, and adjust $\eta_y(\nu)$ if necessary.
6: end while

Algorithm 6 Online CorInfoMax neural dynamic iterations for the canonical form
1: Initialize $\nu_{max}$, $\epsilon_t$, $\eta_y(1)$, $\eta_\lambda(1)$, $\lambda(1)$ ($= 0$ in general), and set $\nu = 1$
2: while $\big(\|y(k;\nu) - y(k;\nu-1)\|/\|y(k;\nu)\| > \epsilon_t\big)$ and $\nu < \nu_{max}$ do
3: $\nabla_{y(k)} \mathcal{J}(y(k;\nu)) = \gamma_y B^{\zeta_y}_y(k)\, y(k;\nu) - \gamma_e B^{\zeta_e}_e(k)\, e(k;\nu) - A_P^T \lambda(\nu)$
4: $y(k;\nu+1) = y(k;\nu) + \eta_y(\nu)\nabla_{y(k)}\mathcal{J}(y(k;\nu))$
5: $\nabla_{\lambda(k)} \mathcal{L}(y(k;\nu)) = -A_P\, y(k;\nu) + b_P$
6: $\lambda(k;\nu+1) = \mathrm{ReLU}\big(\lambda(k;\nu) - \eta_\lambda(\nu)\nabla_{\lambda(k)}\mathcal{L}(y(k;\nu))\big)$
7: $\nu = \nu + 1$, and adjust $\eta_y(\nu)$, $\eta_\lambda(\nu)$ if necessary.
8: end while
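The per-sample neural dynamics loop can be sketched compactly in NumPy. The sketch below follows the sparse case (Algorithm 4); the step sizes, the zero initialization of $y$ and $\lambda$, and the error definition $e = y - Wx$ are our own placeholder assumptions, not tuned values from the paper.

```python
import numpy as np

def soft_threshold(v, lam):
    """ST_lam(v) = sign(v) * max(|v| - lam, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def corinfomax_dynamics_sparse(x, W, By, Be, gamma_y=0.5, gamma_e=1.0,
                               eta_y=0.05, eta_lam=0.01,
                               nu_max=500, eps_t=1e-6):
    """Illustrative sketch of the sparse-source neural dynamics loop."""
    n = W.shape[0]
    y = np.zeros(n)
    lam = 0.0                                                 # inhibitory interneuron
    for _ in range(nu_max):
        e = y - W @ x                                         # prediction error
        grad = gamma_y * (By @ y) - gamma_e * (Be @ e)        # line 3
        y_new = soft_threshold(y + eta_y * grad, lam)         # line 4
        dL = 1.0 - np.abs(y_new).sum()                        # line 5
        lam = max(lam - eta_lam * dL, 0.0)                    # line 6 (ReLU)
        if np.linalg.norm(y_new - y) <= eps_t * (np.linalg.norm(y_new) + 1e-12):
            y = y_new
            break                                             # convergence tolerance
        y = y_new                                             # line 7
    return y
```

The antisparse variants replace the soft threshold by $\sigma_1$ or $\sigma_+$ and drop the interneuron, and the canonical form adds the $-A_P^T\lambda$ term to the gradient, as in Algorithms 5 and 6.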

Figure 8: SINR performances of CorInfoMax (ours), LD-InfoMax, PMF, and WSM (averaged over 50 realizations) for sparse sources: (a) output SINR results (vertical axis) with respect to input SNR levels (horizontal axis), (b) SINR (vertical axis) convergence plot as a function of iterations (horizontal axis) of sparse CorInfoMax for the 30 dB SNR level (mean solid line and standard deviation envelopes).

Figure 9: SINR performances of CorInfoMax (ours), LD-InfoMax, and PMF with different distribution selections for the mixing matrix entries and for sparse sources. The horizontal axis represents different distribution choices, and the vertical axis represents the SINR levels.

Figure 10: SINR performances of CorInfoMax (ours), LD-InfoMax, and WSM for nonnegative sparse sources: (a) the output SINR (vertical axis) results with respect to the input SNR levels (horizontal axis), (b) the SINR (vertical axis) convergence plot as a function of iterations (horizontal axis) of nonnegative sparse CorInfoMax for the 30dB SNR level (mean solid line and standard deviation envelopes).

Figure 11: SINR performances of CorInfoMax (ours) and LD-InfoMax (averaged over 50 realizations) for unit simplex sources: (a) the output SINR (vertical axis) results with respect to the input SNR levels (horizontal axis), (b) SINR (vertical axis) convergence plot as a function of iterations (horizontal axis) of simplex-CorInfoMax for 30dB SNR level (mean solid line and standard deviation envelopes).

Figure 13: SINR performances of CorInfoMax networks on P ex with mean-solid line with standard deviation envelope (averaged over 50 realization) for the polytope with mixed latent attributes: (a) SINR convergence curve for CorInfoMax feature-based polytope formulation with 30dB mixture SNR, (b) SINR convergence curve for CorInfoMax canonical formulation with 30dB mixture SNR, (c) SINR convergence curve for CorInfoMax feature-based polytope formulation with 40dB mixture SNR, (d) SINR convergence curve for CorInfoMax canonical formulation with 40dB SNR.

Figure 14: (a) Sparse dictionary learned by sparse CorInfoMax from natural image patches, (b) The SINR (vertical axis) convergence curve for the 4-PAM digital communication example as a function of iterations (horizontal axis) which is averaged over 100 realizations: mean-solid line with standard deviation envelope.

Figure 15: Video separation example: the final frames of (a) sources, (b) mixtures, and (c) the outputs of the nonnegative antisparse CorInfoMax network.

Figure 16: Image separation example: (a) Original RGB images, (b) mixture RGB images, (c) ICA outputs.

For the initialization of $B^{\zeta_e}_e$, Figures 17c and 17d illustrate the SINR performances of CorInfoMax for the uncorrelated and correlated source separation experiments, respectively. Note that the selections $B^{\zeta_e}_e = 2000I$ and $B^{\zeta_e}_e = 5000I$ are suitable considering both settings.

Figure 18: CorInfoMax ablation studies on the impact of number of mixtures with fixed number of sources (5 sources). (a) illustrates the SINR performances of nonnegative antisparse CorInfoMax network as a function of number of mixtures, (b) illustrates the SINR performances of sparse Cor-InfoMax network as a function of number of mixtures.

- $W(k+1) = W(k) + \mu_W(k)\, e(k)\, x(k)^T$ requires $2mn$ multiplications,


6 REPRODUCIBILITY STATEMENT

i. Derivations of the neural network dynamics in Section 3.2, Appendix C.1, Appendix C.2, Appendix C.3, Appendix C.4, and Appendix C.5,
ii. Detailed information on the simulation settings of the experiments in Section 4 of the main article and in Appendix D,
iii. Full list of the hyperparameter sets used in these experiments in Table 3 and Table 4 in Appendix D.4,
iv. Ablation studies on hyperparameters in Appendix D.5,
v. Algorithm descriptions for special source domains in pseudo-code format in Appendix D.1,
vi. Python scripts and notebooks for the individual experiments to replicate the reported results, in the supplementary zip file as well as at https://github.com/BariscanBozkurt/Bio-Plausible-CorrInfoMax.

7 ETHICS STATEMENT

Regarding the algorithmic framework we propose in this article, we see no immediate ethical concerns. In addition, the datasets that we use have no known or reported ethical issues, to the best of our knowledge.

Figure 12: SINR performances of CorInfoMax (ours) and LD-InfoMax with different distribution selections for the mixing matrix entries (averaged over 50 realizations) and unit simplex sources. The horizontal axis represents the different distribution choices, and the vertical axis represents the SINR levels.

Table 2: Source separation averaged SINR results on $\mathcal{P}_{ex}$ for the CorInfoMax (ours), CorInfoMax Canonical (ours), LD-InfoMax, and PMF algorithms (averaged over 50 realizations).

We present several potential applications of the proposed approach, illustrating the use of the antisparse, nonnegative antisparse, and sparse CorInfoMax neural networks. This section demonstrates sparse dictionary learning, source separation of digital communication signals with the 4-PAM modulation scheme, and video separation.

8 ACKNOWLEDGMENTS AND DISCLOSURE OF FUNDING

This work was supported by a KUIS AI Center Research Award. CP was supported by an NSF Award (DMS-2134157) and by the Intel Corporation through the Intel Neuromorphic Research Community.

APPENDIX

to $y(k)$. First, we consider the derivatives of both $\log\det(R_{\zeta_y y}(k) + \epsilon I)$ and $\log\det(R_{\zeta_e e}(k) + \epsilon I)$ with respect to $y(k)$. Using the recursive definition of the correlation matrix, the partial derivative of $R_{\zeta_y y}(k)$ with respect to the $i$-th component of $y(k)$ is

$$\frac{\partial R_{\zeta_y y}(k)}{\partial y_i(k)} = (1-\zeta_y)\left( y(k)\, {e^{(i)}}^T + e^{(i)}\, y(k)^T \right). \tag{A.5}$$

In (A.5), $e^{(i)}$ denotes the standard basis vector with a 1 at position $i$ and should not be confused with the error vector $e(k)$. Combining (A.4) and (A.5), we obtain

$$\nabla_{y(k)} \log\det(R_{\zeta_y y}(k) + \epsilon I) = 2(1-\zeta_y)\,(R_{\zeta_y y}(k) + \epsilon I)^{-1} y(k). \tag{A.6}$$

If we apply the same procedure to obtain the gradient of $\log\det(R_{\zeta_e e}(k) + \epsilon I)$ with respect to $e(k)$, we obtain

$$\nabla_{e(k)} \log\det(R_{\zeta_e e}(k) + \epsilon I) = 2(1-\zeta_e)\,(R_{\zeta_e e}(k) + \epsilon I)^{-1} e(k),$$

and, using the composition rule, the gradient of $\log\det(R_{\zeta_e e}(k)+\epsilon I)$ with respect to $y(k)$ is

$$\nabla_{y(k)} \log\det(R_{\zeta_e e}(k) + \epsilon I) = 2(1-\zeta_e)\,(R_{\zeta_e e}(k) + \epsilon I)^{-1} e(k). \tag{A.7}$$

Finally, combining the results from (A.6) and (A.7), we obtain the derivative of the objective function $J(y(k))$ with respect to $y(k)$:

$$\nabla_{y(k)} J(y(k)) = 2(1-\zeta_y)(R_{\zeta_y y}(k)+\epsilon I)^{-1} y(k) - 2(1-\zeta_e)(R_{\zeta_e e}(k)+\epsilon I)^{-1} e(k). \tag{A.8}$$

For further simplification of (A.8), we define recursions for $(R_{\zeta_y y}(k)+\epsilon I)^{-1}$ and $(R_{\zeta_e e}(k)+\epsilon I)^{-1}$ based on the recursive definitions of the corresponding correlation matrices. Based on the definition in (4), we can write

$$R_{\zeta_y y}(k) + \epsilon I \approx \zeta_y \left( R_{\zeta_y y}(k-1) + \epsilon I \right) + (1-\zeta_y)\, y(k) y(k)^T. \tag{A.9}$$

Published as a conference paper at ICLR 2023

Using the assumption in (A.9), we take the inverse of both sides and apply the matrix inversion lemma (similar to its use in the derivation of the RLS algorithm, Kailath et al. (2000)) to obtain

$$B_y(k) = (R_{\zeta_y y}(k)+\epsilon I)^{-1} = \frac{1}{\zeta_y}\left( B_y(k-1) - \gamma_y(k)\, z_y(k) z_y(k)^T \right), \tag{A.10}$$

where

$$z_y(k) = B_y(k-1)\, y(k), \qquad \gamma_y(k) = \frac{1-\zeta_y}{\zeta_y + (1-\zeta_y)\, y(k)^T z_y(k)}. \tag{A.11}$$

We apply the same procedure to obtain $(R_{\zeta_e e}(k) + \epsilon I)^{-1}$:

$$B_e(k) = (R_{\zeta_e e}(k)+\epsilon I)^{-1} = \frac{1}{\zeta_e}\left( B_e(k-1) - \gamma_e(k)\, z_e(k) z_e(k)^T \right), \tag{A.12}$$

where

$$z_e(k) = B_e(k-1)\, e(k), \qquad \gamma_e(k) = \frac{1-\zeta_e}{\zeta_e + (1-\zeta_e)\, e(k)^T z_e(k)}. \tag{A.13}$$

Note that plugging (A.10) into the first part of (A.8) yields the following simplification:

$$2(1-\zeta_y)\, B_y(k)\, y(k) = 2\gamma_y(k)\, B_y(k-1)\, y(k).$$

A similar simplification can be obtained for (A.12), and incorporating these simplifications into (A.8) yields

$$\nabla_{y(k)} J(y(k)) = 2\gamma_y(k)\, B_y(k-1)\, y(k) - 2\gamma_e(k)\, B_e(k-1)\, e(k). \tag{A.14}$$
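As a numerical sanity check on the recursions above, the following sketch (illustrative, not from the released implementation) tracks $M(k) = R(k) + \epsilon I$ directly through the exponentially weighted recursion and verifies that the matrix-inversion-lemma update of (A.10)-(A.11) reproduces the directly computed inverse:

```python
import numpy as np

def inverse_update(B_prev, y, zeta):
    """Rank-one inverse update via the matrix inversion lemma, for
    M_new = zeta * M_prev + (1 - zeta) * y y^T  (cf. (A.9)-(A.11)):
      M_new^{-1} = (1/zeta) * (B_prev - gamma * z z^T),  z = B_prev @ y."""
    z = B_prev @ y
    gamma = (1.0 - zeta) / (zeta + (1.0 - zeta) * (y @ z))
    return (B_prev - gamma * np.outer(z, z)) / zeta

rng = np.random.default_rng(0)
n, zeta, eps = 4, 0.99, 1e-3
M = np.eye(n) + eps * np.eye(n)      # M(k) stands for R(k) + eps*I
B = np.linalg.inv(M)                 # B(k) tracks M(k)^{-1}

for _ in range(200):
    y = rng.standard_normal(n)
    M = zeta * M + (1.0 - zeta) * np.outer(y, y)   # correlation recursion
    B = inverse_update(B, y, zeta)                 # recursive inverse

# The recursive inverse should match direct inversion to numerical precision.
assert np.allclose(B, np.linalg.inv(M), atol=1e-6)
```

The recursive update costs O(n^2) per sample instead of the O(n^3) of direct inversion, which is what makes the online network dynamics practical.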

CORINFOMAX NETWORKS FOR OTHER SOURCE DOMAINS

We can generalize the procedure for obtaining CorInfoMax BSS networks in Section 3.2 to other source domains. The choice of source domain affects the structure of the output layer and the potential inclusion of additional interneurons. Table 1 summarizes the output dynamics for the special source domains provided in Section 2.1, for which the derivations are given in the following subsections.

Recursive update dynamics for y(k): Based on the gradient of (5a) with respect to y(k) in (A.14), derived in Appendix B, we can write the corresponding projected gradient ascent iterations for (5). The corresponding network structure is illustrated in Figure 6. In this context, contrary to the optimization problem defined in (7), we do not require the Lagrangian variable λ to be nonnegative, due to the equality constraint in (A.15b). Hence, we write the network dynamics for a simplex source with λ unconstrained in sign.

The associated hyperparameter settings are $\zeta_e = 1 - 10^{-1}/3$, $\mu_W = 3 \times 10^{-2}$, $\nu_{\max} = 500$, $\eta_y(\nu) = \max\{0.9/\nu, 10^{-3}\}$, and $\epsilon = 10^{-6}$.
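The projected gradient ascent iterations for a simplex source domain can be sketched as follows. This is a minimal illustration, not the paper's neural dynamics: it uses an explicit Euclidean projection onto the unit simplex in place of the recurrent network with interneurons, `gy`, `ge`, `By`, `Be` stand in for $\gamma_y(k)$, $\gamma_e(k)$, $B_y(k-1)$, $B_e(k-1)$ from (A.14), and all numerical values are placeholders. The step-size schedule $\eta_y(\nu) = \max\{0.9/\nu, 10^{-3}\}$ and $\nu_{\max} = 500$ follow the hyperparameters quoted above.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto the unit simplex {y : y >= 0, sum(y) = 1},
    via the standard sort-based algorithm."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u > css / np.arange(1, len(v) + 1))[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def output_iterations(x, W, By, Be, gy, ge, nu_max=500):
    """Projected gradient ascent for y(k), with the gradient of the
    simplified objective as in (A.14): 2*gy*By@y - 2*ge*Be@e, e = y - W@x."""
    y = project_simplex(W @ x)                   # feasible initialization
    for nu in range(1, nu_max + 1):
        eta = max(0.9 / nu, 1e-3)                # eta_y(nu) schedule
        e = y - W @ x                            # prediction error vector
        grad = 2.0 * gy * (By @ y) - 2.0 * ge * (Be @ e)
        y = project_simplex(y + eta * grad)      # ascent step + projection
    return y

rng = np.random.default_rng(1)
n, m = 3, 5
W = rng.standard_normal((n, m))                  # placeholder separator
x = rng.standard_normal(m)                       # one mixture sample
y = output_iterations(x, W, np.eye(n), np.eye(n), 0.05, 0.05)
assert y.min() >= 0.0 and abs(y.sum() - 1.0) < 1e-6
```

For other source domains from Table 1 (e.g. antisparse sources on the hypercube), only the projection step would change, which is precisely why the choice of domain shapes the output-layer activation function.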

