SIGNAL CODING AND RECONSTRUCTION USING SPIKE TRAINS

Abstract

In many animal sensory pathways, the transformation from external stimuli to spike trains is essentially deterministic. In this context, a new mathematical framework for signal coding and reconstruction, based on a biologically plausible model of the spiking neuron, is presented. The framework considers encoding of a signal through spike trains generated by an ensemble of neurons via a standard convolve-then-threshold mechanism, albeit with a wide variety of convolution kernels. Neurons are distinguished by their convolution kernels and threshold values. Reconstruction is posited as a convex optimization that minimizes signal energy. Formal conditions under which perfect and approximate reconstruction of the signal from the spike trains is possible are then identified. Coding experiments on a large audio dataset are presented to demonstrate the strength of the framework.

1. INTRODUCTION

In biological systems, sensory stimuli are communicated to the brain primarily via ensembles of discrete events known as spikes: spatiotemporally compact electrical disturbances generated by neurons. Spike train representations of signals, when sparse, are not only intrinsically energy efficient, but can also facilitate downstream computation (6; 10). In their seminal work, Olshausen and Field (13) showed how efficient codes can arise from learning sparse representations of natural stimulus statistics, resulting in striking similarities with observed biological receptive fields. (19) developed a biophysically motivated spiking neural network that, for the first time, predicted the full diversity of V1 simple cell receptive field shapes when trained on natural images. Although these results signify substantial progress, an effective end-to-end signal processing framework that deterministically represents signals via spike train ensembles is yet to be laid out. Here we present a new framework for coding and reconstruction leveraging a biologically plausible coding mechanism that is a superset of the standard leaky integrate-and-fire neuron model (5). Our proposed framework identifies reconstruction guarantees for a very general class of signals, namely those with finite rate of innovation (18), as shown in our perfect and approximate reconstruction theorems. Most other classes, e.g. bandlimited signals, are subsets of this class. The proposed technique first formulates reconstruction as an optimization that minimizes the energy of the reconstructed signal subject to consistency with the spike trains, and then solves it in closed form. We then identify a general class of signals for which reconstruction is provably perfect under certain ideal conditions. Subsequently, we present a mathematical bound on the error of an approximate reconstruction when the model deviates from those ideal conditions.
Finally, we present simulation experiments coding for a large dataset of audio signals that demonstrate the efficacy of the framework. In a separate set of experiments on a smaller subset of audio signals we compare our framework with existing sparse coding algorithms, viz. matching pursuit and orthogonal matching pursuit, establishing the strength of our technique. The remainder of the paper is structured as follows. In Sections 2 and 3 we introduce the coding and decoding frameworks. Section 4 identifies the class of signals for which perfect reconstruction is achievable if certain ideal conditions are met. In Section 5 we discuss how in practice those ideal conditions can be approached and provide a mathematical bound for approximate reconstruction. Simulation results are presented in Section 6, Section 7 relates our work to prior approaches, and we conclude in Section 8.

2. CODING

The general class of deterministic mappings (i.e., the set of all nonlinear operators) from continuous time signals to spike trains is difficult to characterize because the space of all spike trains does not lend itself to a natural topology that is universally embraced. The result is that simple characterizations, such as the set of all continuous operators, cannot be posited in a manner that has general consensus. To resolve this issue, we take a cue from biological systems. In most animal sensory pathways, the external stimulus passes through a series of transformations before being turned into spike trains (17). For example, the visual signal in the retina is processed by multiple layers of non-spiking horizontal, amacrine and bipolar cells before being converted into spike trains by the retinal ganglion cells. Accordingly, we consider the set of transformations that pass via an intermediate continuous time signal, which is then transformed into a spike train through a stereotyped mapping where spikes mark threshold crossings. The complexity of the operator now lies in the mapping from the continuous time input signal to the continuous time intermediate signal. Since any time invariant, continuous, nonlinear operator with fading memory can be approximated by a finite Volterra series operator (2), this general class of nonlinear operators from continuous time signals to spike trains can be modeled as the composition of a finite Volterra series operator and a neuronal thresholding operation that generates a spike train. Here, the simplest subclass of these transformations is considered: the case where the Volterra series operator has a single causal, bounded-time, linear term, the output of which is composed with a thresholding operation with a potentially time-varying threshold. The overall operator from the input signal to the spike train remains nonlinear due to the thresholding operation.
The code generated by an ensemble of such transformations, corresponding to an ensemble of spike trains, is explored. Formally, we assume the input signal $X(t)$ to be a bounded square integrable function over the compact interval $[0, T]$ for some $T \in \mathbb{R}^+$, i.e., we are interested in the class of input signals $\mathcal{F} = \{X(t) \mid X(t) \in L^2[0, T]\}$. Since the framework involves signal snippets of arbitrary length, this choice of $T$ is without loss of generality. We assume an ensemble of convolution kernels $\mathcal{K} = \{K_j \mid j \in \mathbb{Z}^+, j \le n\}$, consisting of $n$ kernels $K_j$, $j = 1, \dots, n$. We assume that $K_j(t)$ is a continuous function on a bounded time interval $[0, T]$, i.e. $\forall j \in \{1, \dots, n\},\ K_j(t) \in C[0, T]$, $T \in \mathbb{R}^+$. Finally, we assume that $K_j$ has a time-varying threshold denoted by $T_j(t)$. The ensemble of convolution kernels $\mathcal{K}$ encodes a given input signal $X(t)$ into a sequence of spikes $\{(t_i, K_{j_i})\}$, where the $i$-th spike is produced by the $j_i$-th kernel $K_{j_i}$ at time $t_i$ if and only if

$$\int X(\tau)\, K_{j_i}(t_i - \tau)\, d\tau = T_{j_i}(t_i).$$

In our experiments a specific threshold function is assumed in which the time-varying threshold $T_j(t)$ of the $j$-th kernel remains constant at $C_j$ until that kernel produces a spike, at which time an after-hyperpolarization potential (ahp) is introduced to raise the threshold to a high value $M_j \gg C_j$, which then drops back linearly to its original value within a refractory period $\delta_j$. Stated formally,

$$T_j(t) = \begin{cases} C_j, & t - \delta_j > t^j_l(t) \\[4pt] M_j - \dfrac{(t - t^j_l(t))(M_j - C_j)}{\delta_j}, & t - \delta_j \le t^j_l(t) \end{cases} \tag{1}$$

where $t^j_l(t)$ denotes the time of the last spike generated by $K_j$ prior to time $t$.
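A discrete-time sketch of this convolve-then-threshold encoder, with the linear-decay ahp of Equation (1), might look as follows; the function name and argument layout are assumptions for illustration, not the code used in our experiments:

```python
import numpy as np

def encode(x, kernels, dt, C, M, delta):
    """Discrete-time sketch of the convolve-then-threshold encoder.

    x       : input signal samples on [0, T]
    kernels : list of kernel sample arrays K_j on their support
    dt      : sampling step (seconds)
    C, M    : per-kernel baseline and post-spike (ahp) threshold values
    delta   : per-kernel refractory period (seconds)
    Returns a list of spikes (t_i, j_i, threshold_at_spike).
    """
    spikes = []
    for j, K in enumerate(kernels):
        # u_j(t) = convolution of X with kernel K_j, discretized
        u = np.convolve(x, K) * dt
        t_last = -np.inf                      # time of this kernel's last spike
        for n in range(len(u)):
            t = n * dt
            # Time-varying threshold of Equation (1)
            if t - t_last > delta[j]:
                thr = C[j]
            else:
                thr = M[j] - (t - t_last) * (M[j] - C[j]) / delta[j]
            if u[n] >= thr:                   # threshold crossing -> spike
                spikes.append((t, j, thr))
                t_last = t
    return spikes
```

Successive spikes of the same kernel are separated by roughly the refractory period, since the ahp keeps the threshold elevated until it decays back to the baseline.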

3. DECODING

How rich is the coding mechanism just described? We can investigate this question formally by positing a decoding module. The objective of the decoding module is to reconstruct the original signal from the encoded ensemble of spike trains. Note that for the proposed framework to communicate signals properly, the decoding module must operate solely on the spike train data handed over by the encoding module, without explicit access to the input signal itself. Considering the prospect of the invertibility of the coding scheme, we seek a signal that satisfies the same set of constraints as the original signal when generating all spikes apropos the set of kernels in ensemble $\mathcal{K}$. Recognizing that such a signal might not be unique, we choose the reconstructed signal as the one with minimum $L^2$-norm. Formally, the reconstruction $X^*(t)$ of the input signal $X(t)$ is formulated as the solution to the optimization problem

$$X^*(t) = \arg\min_{\tilde X(t)} \|\tilde X(t)\|_2^2 \quad \text{s.t.} \quad \int \tilde X(\tau)\, K_{j_i}(t_i - \tau)\, d\tau = T_{j_i}(t_i), \quad 1 \le i \le N \tag{2}$$

where $\{(t_i, K_{j_i}) \mid i \in \{1, \dots, N\}\}$ is the set of all spikes generated by the encoder. The choice of $L^2$ minimization as the objective of the reconstruction problem, the linchpin of our framework, can only be weakly justified at the current juncture; the perfect reconstruction theorem that follows provides the strong justification. As it stands, the $L^2$ minimization objective is in congruence with the dictum of energy efficiency in biological systems: of all signals consistent with the spike trains, the one with the minimum energy is deemed desirable. Additionally, the $L^2$ objective in (2) reduces the convex optimization problem to a solvable linear system of equations, as shown in Lemmas 1 and 3.
Later we shall show that $L^2$-minimization has the surprising benefit of recovering the original signal perfectly under certain conditions.

4. SIGNAL CLASS FOR PERFECT RECONSTRUCTION

To establish the effectiveness of the described coding-decoding model, we have to evaluate the accuracy of reconstruction over a class of input signals. We observe that, in general, the encoding of square integrable signals into spike trains is not a one-to-one map; the same set of spikes can be generated by different signals that result in the same convolved values at the spike times. Naturally, with a finite and fixed ensemble of kernels $\mathcal{K}$, one cannot achieve perfect reconstruction for the general class of signals $\mathcal{F}$ as defined in Section 2. We now restrict ourselves to a subset $\mathcal{G}$ of the original class $\mathcal{F}$, defined as

$$\mathcal{G} = \Big\{X(t) \,\Big|\, X(t) \in \mathcal{F},\ X(t) = \sum_{p=1}^{N} \alpha_p K_{j_p}(t_p - t),\ j_p \in \{1, \dots, n\},\ \alpha_p \in \mathbb{R},\ t_p \in \mathbb{R}^+,\ N \in \mathbb{Z}^+\Big\}$$

and address the question of reconstruction accuracy. Essentially $\mathcal{G}$ consists of all linear combinations of arbitrarily shifted kernel functions, where $N$ is bounded above by the total number of spikes that the ensemble $\mathcal{K}$ can generate over $[0, T]$. In the parlance of signal processing, $\mathcal{G}$ constitutes finite rate of innovation signals (18). For the class $\mathcal{G}$ the perfect reconstruction theorem is presented below. The theorem is proved with the help of three lemmas.

Perfect Reconstruction Theorem: Let $X(t) \in \mathcal{G}$ be an input signal. Then, for appropriately chosen time-varying thresholds of the kernels, the reconstruction $X^*(t)$ resulting from the proposed coding-decoding framework is accurate with respect to the $L^2$ metric, i.e., $\|X^*(t) - X(t)\|_2 = 0$.

Lemma 1: The solution $X^*(t)$ to the reconstruction problem given by (2) can be written as

$$X^*(t) = \sum_{i=1}^{N} \alpha_i K_{j_i}(t_i - t) \tag{3}$$

where the coefficients $\alpha_i \in \mathbb{R}$ can be solved from a system of linear equations. Proof: An approach analogous to the Representer Theorem (15), splitting a putative solution to (2) into a component within the span of the shifted kernels and an orthogonal remnant, yields equation (3).
In essence, the reconstructed signal $X^*(t)$ becomes a summation of the kernels, shifted to their respective spike times and scaled by appropriate coefficients. Plugging (3) into the constraints of (2) gives

$$\sum_{k=1}^{N} \alpha_k \int K_{j_k}(t_k - \tau)\, K_{j_i}(t_i - \tau)\, d\tau = T_{j_i}(t_i), \quad 1 \le i \le N.$$

Setting $b_i = T_{j_i}(t_i)$ and $P_{ik} = \int K_{j_k}(t_k - \tau)\, K_{j_i}(t_i - \tau)\, d\tau$ results in

$$\sum_{k=1}^{N} P_{ik}\, \alpha_k = b_i, \quad 1 \le i \le N. \tag{4}$$

Equation (4) defines a system of $N$ equations in $N$ unknowns of the form

$$P\alpha = T \tag{5}$$

where $\alpha = [\alpha_1, \dots, \alpha_N]^T$, $T = [T_{j_1}(t_1), \dots, T_{j_N}(t_N)]^T$ and $P$ is the $N \times N$ matrix with elements $P_{ik}$. Clearly $P$ is the Gramian matrix of the shifted kernels $\{K_{j_i}(t_i - t) \mid i \in \{1, \dots, N\}\}$ in the Hilbert space with the standard inner product. It is well known that $P$ is invertible if and only if $\{K_{j_i}(t_i - t) \mid i \in \{1, \dots, N\}\}$ is a linearly independent set. If $P$ is invertible, $\alpha$ has a unique solution. If, on the other hand, $P$ is not invertible, $\alpha$ has multiple solutions. However, as the next lemma shows, every such solution leads to the same reconstruction $X^*(t)$, and hence any value of $\alpha$ that satisfies (5) can be chosen. We note in passing that in our experiments we used the least squares solution. Import: The goal of the optimization problem is to find the best object in the feasible set. The application of the Representer Theorem, however, converts the constraints into a determined system of equations and unknowns, effectively reducing the optimization problem to a solvable linear system with a closed-form solution for the $\alpha_i$'s. This implies that instead of solving (2) directly, we can compute the reconstruction from $X^*(t) = \sum_{i=1}^{N} \alpha_i K_{j_i}(t_i - t)$, where $\alpha_i$ is the $i$-th element of $\alpha = P^{-1}T$. Here, $P^{-1}$ represents either the inverse or the Moore-Penrose pseudoinverse, as the case may be. Lemma 2: Let equation (5), resulting from the optimization problem (2), have multiple solutions.
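A minimal discretized version of this closed-form decoder, assuming spikes arrive as hypothetical (time, kernel index, threshold) triples, can be sketched as:

```python
import numpy as np

def reconstruct(spikes, kernels, dt, n_samples):
    """Closed-form decoder sketch: X*(t) = sum_i alpha_i K_{j_i}(t_i - t),
    with alpha solving P alpha = T (Moore-Penrose inverse when P is singular).
    The spike format (t_i, j_i, T_i) and the function name are assumptions."""
    N = len(spikes)
    grid = np.arange(n_samples) * dt
    # Rows of Phi are the shifted, time-reversed kernels K_{j_i}(t_i - t)
    Phi = np.zeros((N, n_samples))
    for i, (t_i, j_i, _) in enumerate(spikes):
        K = kernels[j_i]
        arg = t_i - grid                      # argument of K_{j_i}(t_i - t)
        idx = (arg >= -1e-12) & (arg <= (len(K) - 1) * dt + 1e-12)
        Phi[i, idx] = K[np.round(arg[idx] / dt).astype(int)]
    P = Phi @ Phi.T * dt                      # Gram matrix of Equation (5)
    T = np.array([s[2] for s in spikes])      # thresholds at spike times
    alpha = np.linalg.pinv(P) @ T             # pinv handles rank deficiency
    return alpha @ Phi                        # X*(t) sampled on the grid
```

When the input is itself a shifted kernel and the threshold equals the corresponding inner product, the decoder returns that kernel exactly, as the perfect reconstruction theorem predicts.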
Consider any two different solutions for $\alpha$, namely $\alpha^1$ and $\alpha^2$, with corresponding reconstructions $X^1(t) = \sum_{i=1}^{N} \alpha^1_i K_{j_i}(t_i - t)$ and $X^2(t) = \sum_{i=1}^{N} \alpha^2_i K_{j_i}(t_i - t)$, respectively. Then $X^1 = X^2$. Proof: The proof follows from the existence of a unique function in the Hilbert space spanned by $\{K_{j_i}(t_i - t) \mid i \in \{1, \dots, N\}\}$ that satisfies the constraints of equation (2). The details of the proof are furnished in appendix A. Import: Lemma 2 establishes the uniqueness of the solution to the optimization problem (2) obtained as any solution of equation (5). The proof follows from the fact that the reconstruction lies in the span $S$ of the shifted kernels $\{K_{j_i}(t_i - t)\}$ and the inner products of the reconstruction with each $K_{j_i}(t_i - t)$ are given (by the spike constraints of (2)). Such a reconstruction must be unique in the subspace $S$.

Lemma 3: Let $X^*(t)$ be the reconstruction of an input signal $X(t)$ and $\{(t_i, K_{j_i})\}_{i=1}^{N}$ be the set of spikes generated. Then, for any signal $\tilde X(t)$ within the span of the shifted kernels $\{K_{j_i}(t_i - t) \mid i \in \{1, \dots, N\}\}$, given by $\tilde X(t) = \sum_{i=1}^{N} a_i K_{j_i}(t_i - t)$, the following inequality holds:

$$\|X(t) - X^*(t)\| \le \|X(t) - \tilde X(t)\|.$$

Proof: Write

$$X(t) - \tilde X(t) = \underbrace{X(t) - X^*(t)}_{A} + \underbrace{X^*(t) - \tilde X(t)}_{B}.$$

First, for all $i \in \{1, \dots, N\}$,

$$\langle A, K_{j_i}(t_i - t)\rangle = \langle X(t), K_{j_i}(t_i - t)\rangle - \langle X^*(t), K_{j_i}(t_i - t)\rangle = T_{j_i}(t_i) - T_{j_i}(t_i) = 0,$$

using the constraints in (2). Second, since $X^*(t) = \sum_{i=1}^{N} \alpha_i K_{j_i}(t_i - t)$ by Lemma 1,

$$\langle A, B\rangle = \Big\langle A, \sum_{i=1}^{N} (\alpha_i - a_i) K_{j_i}(t_i - t)\Big\rangle = \sum_{i=1}^{N} (\alpha_i - a_i)\, \langle A, K_{j_i}(t_i - t)\rangle = 0.$$

Therefore

$$\|X(t) - \tilde X(t)\|^2 = \|A + B\|^2 = \|A\|^2 + \|B\|^2 \ge \|A\|^2 = \|X(t) - X^*(t)\|^2,$$

which gives $\|X(t) - \tilde X(t)\| \ge \|X(t) - X^*(t)\|$. Import: The implication of the above lemma is quite remarkable. The objective defined in (2) chooses a signal with minimum energy satisfying the constraints, deemed the reconstructed signal.
However, as the lemma demonstrates, this signal also has the minimum error with respect to the input signal among all signals in the span of the shifted kernels. This signifies that our choice of the objective in the decoding module not only draws from biologically motivated energy optimization principles, but also performs optimally in terms of reconstructing the original input signal within the span of the appropriately shifted spike-generating kernels. Corollary: An important consequence of Lemma 3 is that additional spikes in the system do not worsen the reconstruction. For a given input signal $X(t)$, if $S_1$ and $S_2$ are two sets of spike trains with $S_1 \subset S_2$, then Lemma 3 implies that the reconstruction due to $S_2$ is at least as good as the reconstruction due to $S_1$, because the reconstruction due to $S_1$ lies in the span of the shifted kernel functions of $S_2$. This immediately leads to the conclusion that, for a given input signal, the more kernels we add to the ensemble the better the reconstruction. Proof of the Theorem: The proof follows directly from Lemma 3. Since the input signal $X(t) \in \mathcal{G}$, let $X(t)$ be given by

$$X(t) = \sum_{p=1}^{N} \alpha_p K_{j_p}(t_p - t), \quad \alpha_p \in \mathbb{R},\ t_p \in \mathbb{R}^+,\ N \in \mathbb{Z}^+.$$

Assume that the time-varying thresholds of the kernels in our ensemble $\mathcal{K}$ are set in such a manner that the following conditions are satisfied:

$$\langle X(t), K_{j_p}(t_p - t)\rangle = T_{j_p}(t_p) \quad \forall p \in \{1, \dots, N\},$$

i.e., each kernel $K_{j_p}$ at the very least produces a spike at time $t_p$ against $X(t)$ (regardless of other spikes at other times). Clearly then $X(t)$ lies in the span of the appropriately shifted response functions of the spike-generating kernels.
Applying Lemma 3 (with $\tilde X = X$) it follows that

$$\|X(t) - X^*(t)\|_2 \le \|X(t) - X(t)\|_2 = 0.$$

Import: In addition to demonstrating the potency of the coding-decoding scheme, this theorem frames Barlow's efficient coding hypothesis (1), that the coding strategy of sensory neurons be adapted to the statistics of the stimuli, in mathematically concrete terms. By the theorem, the spike based encoding requires the signals to be in the span of the encoding kernels for perfect reconstruction. Inverting the argument, kernels must learn to adapt to the basis elements that generate the signal corpora for superior reconstruction.
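The geometry behind the theorem, namely that the minimum-energy consistent signal is the orthogonal projection of the input onto the span of the shifted kernels, can be checked numerically; in this sketch, random vectors stand in for discretized shifted kernels:

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.normal(size=(4, 200))          # rows: discretized shifted kernels
x = rng.normal(size=200)                 # arbitrary input signal
T = Phi @ x                              # spike constraints <x, K_i> = T_i
alpha = np.linalg.solve(Phi @ Phi.T, T)  # Gram system P alpha = T
x_star = alpha @ Phi                     # minimum-energy consistent signal

# Lemma 3: x_star beats any other element of the span at approximating x
x_other = rng.normal(size=4) @ Phi
assert np.linalg.norm(x - x_star) <= np.linalg.norm(x - x_other)

# Perfect reconstruction: if x already lies in the span, recovery is exact
x_in = np.array([1.0, -2.0, 0.5, 3.0]) @ Phi
alpha_in = np.linalg.solve(Phi @ Phi.T, Phi @ x_in)
assert np.allclose(alpha_in @ Phi, x_in)
```

The first assertion is the content of Lemma 3; the second is the perfect reconstruction theorem in its discretized form.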

5. APPROXIMATE RECONSTRUCTION AND THE EFFECT OF AHP

The perfect reconstruction theorem stipulates the conditions under which exact recovery of a signal is feasible in the proposed framework. At first glance, it may seem challenging to meet these conditions for an arbitrary class of natural signals. The concern stems from two difficulties: first, given a fixed set of kernels, the input signal may not lie in the span of their arbitrary shifts; second, we may not be able to generate spikes at the desired locations as postulated in the proof of the theorem. To address these issues, we observe that our decoding model is a continuous transformation from the space of spike trains to $L^2$-functions, in the sense that small changes in spike times, or a slight mismatch of the spiking kernels from the components of the signal, bring about only small changes in the reconstruction. In what follows, we furnish an Approximate Reconstruction Theorem (proved in appendix C) that provides a bound on the reconstruction error under such deviations. To address the first problem, it is important to choose kernel functions appropriately so that they can represent the input signals reasonably well. One can leverage biological knowledge; for example, it is well known that auditory filters are effectively modeled using gammatones (14). Hence our experiments in Section 6 on auditory signals were coded using gammatone kernels, and, not surprisingly, the reconstructions were excellent. To alleviate the second problem, we observe that spikes can be produced reasonably close to the desired locations by setting a low baseline threshold and a small refractory period of the after-hyperpolarization potential (ahp) for each kernel, a technique that gives good results, as confirmed by our experiments in Section 6. The following lemma formalizes how lowering the threshold and the refractory period of a kernel helps in generating spikes at the desired locations. The lemma is followed by the Approximate Reconstruction Theorem.
Lemma 4: Let $X(t)$ be an input signal and let $K_p$ be a kernel for which we want to generate a spike at time $t_p$. Let $\langle X(t), K_p(t_p - t)\rangle = I_p$. Then, if the baseline threshold of the kernel $K_p$ is $C_p \le I_p$ and the absolute refractory period is $\delta$, as modeled in Equation (1), the kernel $K_p$ must produce a spike in the interval $[t_p - \delta, t_p]$. Proof: The proof follows directly from the intermediate value theorem and is detailed in appendix B.

Approximate Reconstruction Theorem: Let the following assumptions hold:

• $X(t)$, the input signal to the proposed framework, can be written as a linear combination of component functions, $X(t) = \sum_{i=1}^{N} \alpha_i f_{p_i}(t_i - t)$, where the $\alpha_i$ are bounded real coefficients, the component functions $f_{p_i}(t)$ are chosen from a possibly infinite set $\mathcal{H} = \{f_i(t) \mid i \in \mathbb{Z}^+, \|f_i(t)\| = 1\}$ of functions of unit $L^2$ norm with compact support, and the corresponding $t_i \in \mathbb{R}^+$ are bounded arbitrary time shifts of the component functions, so that the overall signal has compact support in $[0, T]$ for some $T \in \mathbb{R}^+$ and thus belongs to the class of signals $\mathcal{F}$ defined in Section 2.

• For each $i$ there is at least one kernel $K_{j_i}$ in the ensemble of encoding kernels $\mathcal{K}$ whose $L^2$-distance from $f_{p_i}(t)$ is bounded. Formally, $\exists\, \delta \in \mathbb{R}^+$ s.t. $\|f_{p_i}(t) - K_{j_i}(t)\|_2 < \delta\ \forall i \in \{1, \dots, N\}$.

• When $X(t)$ is encoded by the proposed framework, each of these kernels $K_{j_i}$ produces a spike at some time $\tilde t_i$ at threshold $T_i$ such that $|\tilde t_i - t_i| < \Delta$ for all $i$, for some $\Delta \in \mathbb{R}^+$.

• Each kernel $K_j \in \mathcal{K}$ satisfies a Lipschitz-type condition: $\exists\, C \in \mathbb{R}$ s.t. $\|K_j(t) - K_j(t - \Delta t)\|_2 \le C|\Delta t|$, $\forall \Delta t \in \mathbb{R}$, $\forall j$.

• Lastly, the shifted component functions satisfy a frame-bound-type condition:

$$\sum_{k \ne i} \big|\langle f_{p_i}(t - t_i), f_{p_k}(t - t_k)\rangle\big| \le \eta \quad \forall i \in \{1, \dots, N\}.$$

Then the reconstruction $X^*(t)$ resulting from the proposed framework has a bounded noise-to-signal ratio. Specifically, the following inequality is satisfied:

$$\frac{\|X(t) - X^*(t)\|_2^2}{\|X(t)\|_2^2} \le \frac{(\delta + C\Delta)(1 + x_{\max})}{1 - \eta}$$

where $x_{\max} \in [0, N-1]$ is a positive number that depends on the maximum overlap of the supports of the component functions $f_{p_i}(t - t_i)$. Proof: A detailed proof of this theorem is provided in appendix C.

6. EXPERIMENTS ON REAL SIGNALS

The proposed framework is general enough to apply to any class of signals. However, the computational resources necessary to code and reconstruct video signals (functions of three variables: x, y, t) would be substantially larger than for audio signals (functions of a single variable t). To demonstrate that the proposed framework can indeed be adopted in real engineering applications as a novel encoding scheme, we therefore ran experiments on a repository of audio signals.

6.1. DATASET

We chose the Freesound Dataset Kaggle 2018 (or FSDKaggle2018 for short), an audio dataset of natural sounds posted on Kaggle and introduced in (7), containing 18,873 audio files annotated with labels from Google's AudioSet Ontology (9). For the purpose of the experiments, we ignored the labels and focused only on the sound data, since we were only interested in encoding and decoding the input signals. All audio samples in this dataset are provided as uncompressed PCM 16-bit, 44.1 kHz, mono audio files, with each file consisting of sound snippets of duration ranging from 300 ms to 30 s. In the experiment, we ran our proposed methodology over at least 1000 randomly chosen sound snippets from the samples in the dataset. For ease of computation, we kept the length of the input audio snippets relatively small (typically less than 50 ms), splitting longer signals. This choice of considering small snippets as input made the computation feasible on limited-resource machines within reasonable time bounds by reducing the size of the P-matrix referred to in Equation (4). This choice is without loss of generality, since for encoding signals of greater length reconstruction using this framework can be done piecewise: splitting a longer signal into smaller pieces, reconstructing piecewise, and finally stitching the reconstructed pieces together.
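The piecewise strategy above can be sketched as follows; the helper name and the `codec` callback (standing for one round of encode-and-reconstruct on a single snippet) are assumptions for illustration:

```python
import numpy as np

def code_piecewise(x, fs, codec, snippet_ms=50.0):
    """Split a long signal into short snippets, code and reconstruct each
    independently, and stitch the results back together (hypothetical helper;
    `codec` maps one snippet to its reconstruction)."""
    n = max(1, int(fs * snippet_ms / 1000.0))      # samples per snippet
    pieces = [x[i:i + n] for i in range(0, len(x), n)]
    return np.concatenate([codec(p) for p in pieces])
```

With an identity codec the round trip returns the input unchanged, which makes the splitting and stitching easy to verify in isolation.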

6.2. SET OF KERNELS

The proposed encoding technique operates on a set of kernels, as described in Section 2. The first order of business was therefore the choice of a suitable set of kernels for our experiments. Since gammatone filters are widely used as a reasonable model of cochlear filters in auditory systems (14), and mathematically are fairly simple to represent, $a\, t^{\,n-1} e^{-2\pi b t} \cos(2\pi f t + \phi)$, in our experiments we chose a set of gammatone filters as our kernels (Figure 1). The implementation of the filterbank is similar to (16), and we used up to 2000 gammatone kernels whose center frequencies were uniformly spaced on the ERB scale between 20 Hz and 20 kHz. In all experiments, the kernels were normalized, and the baseline thresholds and the ahp parameters were kept the same across all kernels.
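A sketch of such a gammatone filterbank, using the standard Glasberg-Moore ERB-rate formulas, is given below; the bandwidth constant, filter order, and kernel duration are illustrative assumptions rather than the exact values used in our experiments:

```python
import numpy as np

def gammatone(f_c, fs, n=4, b=None, dur=0.025):
    """Sampled gammatone kernel a*t^(n-1)*exp(-2*pi*b*t)*cos(2*pi*f_c*t),
    normalized to unit L2 norm. Defaults (order 4, b = 1.019*ERB(f_c),
    25 ms support) are conventional choices, not prescriptions."""
    if b is None:
        b = 1.019 * 24.7 * (4.37 * f_c / 1000.0 + 1.0)   # 1.019 * ERB(f_c)
    t = np.arange(int(dur * fs)) / fs
    g = t ** (n - 1) * np.exp(-2.0 * np.pi * b * t) * np.cos(2.0 * np.pi * f_c * t)
    return g / np.linalg.norm(g)

def erb_space(f_lo, f_hi, num):
    """Center frequencies uniformly spaced on the ERB-rate scale."""
    erb = lambda f: 21.4 * np.log10(4.37e-3 * f + 1.0)       # ERB-rate of f (Hz)
    erb_inv = lambda e: (10.0 ** (e / 21.4) - 1.0) / 4.37e-3 # inverse map
    return erb_inv(np.linspace(erb(f_lo), erb(f_hi), num))
```

A filterbank is then simply `[gammatone(fc, 44100) for fc in erb_space(20.0, 20000.0, 2000)]`, with every kernel already of unit norm as assumed throughout the analysis.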

6.3. RESULTS

Following the assertion of Lemma 4, in all experiments the baseline threshold and the absolute refractory period were kept low enough that near perfect reconstructions could be obtained for each sound snippet at a high spike rate. A typical value of the refractory period was ≈ 5 ms and the baseline threshold value was kept as low as $10^{-3}$. As a consequence of the corollary to Lemma 3, additional spikes did not hurt reconstruction. Experiments were conducted with varying numbers of kernels. Once a reconstruction at a high spike rate was attained, a greedy technique that removed spikes in order of their impact on the reconstruction was instituted to obtain a compressed code for each snippet; reconstructions were then recomputed with the fewer spikes as constraints. We should emphasize that as soon as spikes are removed to get a compressed representation, the signal is no longer encoded via a simple spike train representation, which would ideally communicate only spike times and not their corresponding threshold values. In other words, a compressed signal representation in this scheme needs to communicate both the spike times and the threshold values, because once spikes are culled the decoder can no longer infer the threshold values from the spike times and the given threshold function of Equation (1). In that sense, a compressed representation of a signal in this approach is realized through marked spike trains that carry both time and threshold information, rather than through a true spike train based representation. Figure 2 demonstrates this process applied to a sample sound snippet through several stages of spike removal. This process was repeated over 1000 randomly chosen sound snippets from the dataset.
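The greedy culling step can be sketched as follows, with rows of a matrix standing in for the discretized shifted kernels; the function and argument names are assumptions made for illustration:

```python
import numpy as np

def greedy_cull(Phi, T, x, n_keep):
    """Greedy spike-removal sketch: repeatedly drop the spike whose removal
    degrades the reconstruction of x the least, recomputing the closed-form
    solution each time.

    Phi    : N x M matrix whose rows are the shifted kernels K_{j_i}(t_i - t)
    T      : length-N vector of thresholds at the spike times
    x      : the input signal on the same grid (used only to score removals)
    n_keep : number of spikes to retain
    """
    def recon(idx):
        P = Phi[idx] @ Phi[idx].T                 # Gram matrix of kept spikes
        alpha = np.linalg.pinv(P) @ T[idx]
        return alpha @ Phi[idx]

    keep = list(range(len(T)))
    while len(keep) > n_keep:
        errs = [np.linalg.norm(x - recon(keep[:i] + keep[i + 1:]))
                for i in range(len(keep))]
        keep.pop(int(np.argmin(errs)))            # drop least important spike
    return keep, recon(keep)
```

When the input is an exact combination of a few of the kernels, the greedy pass retains precisely those spikes and the compressed reconstruction stays exact, mirroring the behavior observed at high compression in Figure 2.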
As the figure demonstrates, at high spike rates nearly perfect reconstructions were obtained consistently, and even though lowering the spike rate gradually increased noise, reasonable reconstructions could be obtained at low spike rates (≈ 15 dB at 25 kHz on average). Since each reconstruction is calculated by solving a system of linear equations involving a P-matrix with $O(N^2)$ entries, where $N$ is the number of spikes under consideration, computation is fairly time consuming; the choice of parameters, such as the length of the input snippets, the number of kernels and the threshold parameters, was made to ensure feasibility of computation with the available resources while maintaining the efficacy of the overall reconstruction process. Since the proposed framework approximates a signal with a sparse linear combination of shifted kernels (as shown by the signal class $\mathcal{G}$ in Section 4) and therefore has similarities to compressed sensing, another set of experiments was designed to compare the proposed framework with existing sparse coding techniques, viz. Convolutional Matching Pursuit (CMP) and Convolutional Orthogonal Matching Pursuit (COMP). Since these sparse coding techniques are computationally intensive over continuous signals, this set of experiments was restricted to only 10 gammatone kernels (for CMP and COMP all possible shifts of these 10 kernels were considered) and the experiments were run over 30 sound snippets. Our technique was applied as before, starting at a high spike rate with spikes culled gradually to achieve better compression. The average SNR values obtained by the techniques are compared in figure 4. As is evident, our technique in its simplest form does slightly better than COMP up until ≈ 50 kHz, beyond which COMP performs better.
Our technique was ≈ 1.2 times faster than COMP on an i7 hexa-core processor for these experiments and should naturally scale much better than COMP since the proposed technique does not involve repeated computation of inner products. The implementation details of the experiment can be found in our simulation code available at: http://bitbucket.org/crystalonix/oldsensorycoding.git.

7. RELATION TO PRIOR WORK

The problem of representing continuous time signals using ensembles of spike trains has a rich history in the neuromorphic computing community as well as in computational neuroscience. Most such work relies on classical Nyquist-Shannon sampling theory, wherein signals are assumed to be band-limited and reconstruction is realized through sinc filters, albeit via the spike trains. Among existing spike based coding techniques, (4) explored the spike generating mechanism of the neuron as an oversampling, noise shaping analog-to-digital converter, and (11) proposed an integrate-threshold-reset framework that results in spike trains, leveraging differential pulse-code modulation (DPCM) at its core. In our case the input signals considered are elements of $L^2(\mathbb{R})$ with finite rate of innovation, for which the reconstruction error tends to zero as the signal approaches the span of appropriately chosen kernel functions, themselves a generic class of continuous functions. Our work differs from existing approaches in that signal reconstruction under our scheme is realized via a sparse set of idealized spikes, whereas in the former case signals must be sampled at a rate higher than the Nyquist rate and reconstruction implicitly relies on sinc interpolation. Since the class of signals $\mathcal{G}$ considered in our analysis takes the form $X(t) = \sum_{p=1}^{N} \alpha_p K_{j_p}(t_p - t)$, our problem formulation is comparable to that of convolutional sparse coding or compressive sensing deconvolution, which, in general, is a hard problem and hence is solved under certain relaxed criteria (3) or by using certain greedy heuristics (12). Our framework provides a corresponding approximate solution to the general problem by leveraging a biological thresholding scheme to first produce spikes at a high rate and then gradually remove unimportant spikes. The proposed technique, therefore, is a novel alternative to existing solutions.

8. CONCLUSION

We have proposed a framework that codes for continuous time signals using an ensemble of spike trains in a manner that is very different from the pulse-density paradigm. The framework applies to all finite rate of innovation signals, a very large class that includes bandlimited signals. Although approximate reconstruction is computationally more expensive than interpolation with a sinc kernel (as in Nyquist-Shannon), it is feasible, unlike in the case of compressed sensing where the generic problem is NP-hard. Fortuitously, the system of linear equations is best solved using the conjugate gradient method, since $P$ is a symmetric positive semidefinite matrix. The excellent reconstruction results we have obtained with 2000 kernels, with no parameter tuning, are a testament to the potential of the technique. The human cochlear nerve, in comparison, contains axons of ≈ 50,000 spiral ganglion cells (corresponding, therefore, to 50,000 kernels). As our theorems show, reconstruction with such a large set of kernels is guaranteed to be even better, albeit at a higher computational cost.
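Since $P$ is symmetric positive (semi)definite, the system $P\alpha = T$ lends itself to the conjugate gradient method; a minimal sketch (not the implementation used in our experiments, which in practice might call a library routine such as `scipy.sparse.linalg.cg`) is:

```python
import numpy as np

def conjugate_gradient(P, b, tol=1e-10, max_iter=None):
    """Plain conjugate gradient for the SPD system P alpha = b."""
    x = np.zeros_like(b, dtype=float)
    r = b - P @ x                     # initial residual
    d = r.copy()                      # initial search direction
    rs = r @ r
    for _ in range(max_iter or 10 * len(b)):
        if np.sqrt(rs) < tol:
            break                     # residual small enough: converged
        Pd = P @ d
        a = rs / (d @ Pd)             # step length along direction d
        x = x + a * d
        r = r - a * Pd
        rs_new = r @ r
        d = r + (rs_new / rs) * d     # P-conjugate direction update
        rs = rs_new
    return x
```

In exact arithmetic the method terminates in at most $N$ iterations, and each iteration needs only one matrix-vector product with $P$, which is what makes large kernel ensembles tractable.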
Specifically, the following inequality is satisfied:
$$\frac{\|X(t) - X^*(t)\|_2^2}{\|X(t)\|_2^2} \le \frac{(\delta + C\Delta)^2 (1 + x_{max})}{1 - \eta}.$$
To establish it, first write equation 9 in matrix form as $\|X(t) - X_{hyp}(t)\|_2^2 = \alpha^T F \alpha - \alpha^T F_K a$, where we denote $a = [a_1, a_2, \dots, a_N]^T$, $\alpha = [\alpha_1, \alpha_2, \dots, \alpha_N]^T$, $F = [F_{ik}]_{N \times N}$, an $N \times N$ matrix with $F_{ik} = \langle f_i(t - t_i), f_k(t - t_k)\rangle$, and $F_K = [(F_K)_{ik}]_{N \times N}$ with $(F_K)_{ik} = \langle f_i(t - t_i), K_{j_k}(t - \hat{t}_k)\rangle$. Using the results of Lemma 1, $a$ can be written as $a = P^{-1} T$, where $P = [P_{ik}]_{N \times N}$ with $P_{ik} = \langle K_{j_i}(t - \hat{t}_i), K_{j_k}(t - \hat{t}_k)\rangle$, and $T = [T_i]_{N \times 1}$ with
$$T_i = \langle X(t), K_{j_i}(t - \hat{t}_i)\rangle = \sum_{k=1}^{N} \alpha_k \langle f_k(t - t_k), K_{j_i}(t - \hat{t}_i)\rangle = (F_K^T \alpha)_i \implies a = P^{-1} F_K^T \alpha.$$
Plugging this expression for $a$ into equation 9 we get
$$\|X(t) - X_{hyp}(t)\|_2^2 = \alpha^T F \alpha - \alpha^T F_K P^{-1} F_K^T \alpha. \quad (10)$$
But
$$(F_K)_{ik} = \langle f_i(t - t_i), K_{j_k}(t - \hat{t}_k)\rangle = \langle K_{j_i}(t - \hat{t}_i), K_{j_k}(t - \hat{t}_k)\rangle - \langle K_{j_i}(t - \hat{t}_i) - f_i(t - t_i), K_{j_k}(t - \hat{t}_k)\rangle = (P)_{ik} - (E_K)_{ik}, \quad (11)$$
denoting $E_K = [(E_K)_{ik}]_{N \times N}$, where $(E_K)_{ik} = \langle K_{j_i}(t - \hat{t}_i) - f_i(t - t_i), K_{j_k}(t - \hat{t}_k)\rangle$. Also,
$$(F)_{ik} = \langle f_i(t - t_i), f_k(t - t_k)\rangle = \langle f_i(t - t_i) - K_{j_i}(t - \hat{t}_i) + K_{j_i}(t - \hat{t}_i),\; f_k(t - t_k) - K_{j_k}(t - \hat{t}_k) + K_{j_k}(t - \hat{t}_k)\rangle = (E)_{ik} - (E_K)_{ik} - (E_K)_{ki} + (P)_{ik}, \quad (12)$$
denoting $E = [(E)_{ik}]_{N \times N}$, where $(E)_{ik} = \langle f_i(t - t_i) - K_{j_i}(t - \hat{t}_i),\; f_k(t - t_k) - K_{j_k}(t - \hat{t}_k)\rangle$. Combining 10, 11 and 12 we get
$$\begin{aligned} \|X(t) - X_{hyp}(t)\|_2^2 &= \alpha^T F \alpha - \alpha^T F_K P^{-1} F_K^T \alpha \\ &= \alpha^T E \alpha - \alpha^T E_K \alpha - \alpha^T E_K^T \alpha + \alpha^T P \alpha - \alpha^T P \alpha + \alpha^T E_K \alpha + \alpha^T E_K^T \alpha - \alpha^T E_K P^{-1} E_K^T \alpha \\ &= \alpha^T E \alpha - \alpha^T E_K P^{-1} E_K^T \alpha \le \alpha^T E \alpha, \end{aligned} \quad (13)$$
since $P$ is an SPD matrix, so that $\alpha^T E_K P^{-1} E_K^T \alpha \ge 0$. We now seek a bound on this expression. For that we observe the following:
$$|(E)_{ik}| = |\langle f_i(t - t_i) - K_{j_i}(t - \hat{t}_i),\; f_k(t - t_k) - K_{j_k}(t - \hat{t}_k)\rangle| = \|f_i(t - t_i) - K_{j_i}(t - \hat{t}_i)\|_2 \, \|f_k(t - t_k) - K_{j_k}(t - \hat{t}_k)\|_2 \; x_{ik},$$
where $x_{ik} \in [0, 1]$. We also note that $x_{ik}$ is close to 0 when there is little overlap in the supports of the two components and their corresponding fitting kernels.



$$\frac{\|X(t) - X^*(t)\|_2^2}{\|X(t)\|_2^2} \le \frac{(\delta + C\Delta)^2 (1 + x_{max})}{1 - \eta},$$
where $x_{max} \in [0, N - 1]$ is a positive number that depends on the maximum overlap of the supports of the component functions $f_{p_i}(t - t_i)$.
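The linear system $a = P^{-1} T$ appearing in the derivation involves the Gram matrix $P$ of shifted kernels, which is symmetric positive (semi)definite and hence, as noted in the conclusion, well suited to the conjugate gradient method. A minimal sketch, with random stand-ins for the shifted kernels:

```python
import numpy as np

# Sketch: recover reconstruction coefficients a from P a = T by conjugate
# gradient instead of explicit inversion. The "kernels" here are random
# unit vectors standing in for the shifted kernel functions.
rng = np.random.default_rng(1)
N, L = 8, 256
K = rng.standard_normal((N, L))
K /= np.linalg.norm(K, axis=1, keepdims=True)   # unit-norm stand-in kernels

P = K @ K.T                  # P_ik = <K_i, K_k>, SPD for independent kernels
X = rng.standard_normal(L)   # arbitrary signal on the same grid
T = K @ X                    # T_i = <X, K_i>, the threshold measurements

def conjugate_gradient(P, T, tol=1e-10, max_iter=1000):
    a = np.zeros_like(T)
    r = T - P @ a            # residual
    d = r.copy()             # search direction
    rs = r @ r
    for _ in range(max_iter):
        Pd = P @ d
        step = rs / (d @ Pd)
        a += step * d
        r -= step * Pd
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        d = r + (rs_new / rs) * d
        rs = rs_new
    return a

a = conjugate_gradient(P, T)
print(np.linalg.norm(P @ a - T))   # residual close to zero
```

In exact arithmetic CG converges in at most $N$ iterations for an SPD system, and it never forms $P^{-1}$ explicitly, which matters when the number of spikes is large.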



Figure 1: Five sample gammatone filters used as kernels, with center frequencies located at approximately (a) 82 Hz, (b) 118 Hz, (c) 158 Hz, (d) 203 Hz, and (e) 253 Hz, respectively.

Figure 2: Reconstruction of a sample snippet in an experiment with 2000 kernels. (a) The input snippet, extended with zero padding to accommodate future spikes; (b) spike trains of all kernels, obtained at a low threshold and refractory period, displayed as a raster plot with time (y-axis) against the index of the gammatone kernels in increasing order of center frequency (x-axis); and (c) the resulting reconstruction. This is an almost perfect reconstruction, with a 32.7 dB SNR at a 1146 kHz ensemble spike rate. Subsequently, spikes were deleted greedily (see text) to obtain reconstructions at lower spike rates. (d) Resulting spike pattern of the ensemble at 88.4 kHz and (e) the resulting reconstruction, with an SNR of 19.7 dB. Likewise, (f) spike pattern of the ensemble at 17.64 kHz and (g) the resulting reconstruction, with an SNR of 9 dB. The time scale on the left applies to all plots. It is noteworthy that in (d) and (f) the spikes of the higher-frequency kernels ended up deleted in the culling process.

Figure 3: Comprehensive results of an experiment with 2000 kernels reconstructing 1000 sound snippets. The graphs show scatter plots of reconstructions (each dot represents a reconstruction), with the spike rate of the ensemble (x-axis) plotted against the corresponding SNR of the reconstruction (y-axis). (a) Scatter plot of reconstructions up to a 441 kHz spike rate and (b) a zoomed-in view of the same graph up to 89 kHz. The solid line represents the average SNR of all reconstructions at a given spike rate.

Figure 4: Average SNR values plotted as functions of sampling rate (spike rate for our framework; number of atoms chosen per second of snippet duration in the case of COMP/CMP).

By the triangle inequality, together with the fitting and Lipschitz assumptions,
$$|(E)_{ik}| \le \big(\|f_i(t - t_i) - K_{j_i}(t - t_i)\|_2 + \|K_{j_i}(t - t_i) - K_{j_i}(t - \hat{t}_i)\|_2\big)\, \big(\|f_k(t - t_k) - K_{j_k}(t - t_k)\|_2 + \|K_{j_k}(t - t_k) - K_{j_k}(t - \hat{t}_k)\|_2\big)\, x_{ik} \implies (E)_{ik} \le x_{ik}\, (\delta + C\Delta)^2. \quad (14)$$
Using the Gershgorin circle theorem, the maximum eigenvalue of $E$ satisfies
$$\Lambda_{max}(E) \le \max_i \Big( (E)_{ii} + \sum_{k \ne i} |(E)_{ik}| \Big) \le (\delta + C\Delta)^2 (x_{max} + 1) \quad \text{(using 14)}, \quad (15)$$
where $x_{max} \in [0, N - 1]$ is a positive number that depends on the maximum overlap of the supports of the component signals and their fitting kernels. Similarly, the minimum eigenvalue of $F$ satisfies
$$\Lambda_{min}(F) \ge \min_i \Big( (F)_{ii} - \sum_{k \ne i} |\langle f_{p_i}(t - t_i), f_{p_k}(t - t_k)\rangle| \Big) \ge 1 - \eta \quad (16)$$
(by assumption, $\sum_{k \ne i} |\langle f_{p_i}(t - t_i), f_{p_k}(t - t_k)\rangle| \le \eta$). Combining the results from 13, 15 and 16 we get
$$\frac{\|X(t) - X_{hyp}(t)\|_2^2}{\|X(t)\|_2^2} \le \frac{\alpha^T E \alpha}{\alpha^T F \alpha} \le \frac{\Lambda_{max}(E)}{\Lambda_{min}(F)} \le \frac{(\delta + C\Delta)^2 (x_{max} + 1)}{1 - \eta}. \quad (17)$$
Finally, using 8 we conclude
$$\frac{\|X(t) - X^*(t)\|_2^2}{\|X(t)\|_2^2} \le \frac{\|X(t) - X_{hyp}(t)\|_2^2}{\|X(t)\|_2^2} \le \frac{(\delta + C\Delta)^2 (x_{max} + 1)}{1 - \eta}.$$
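Two steps of the bound derivation lend themselves to a quick numerical sanity check: dropping the quadratic term $\alpha^T E_K P^{-1} E_K^T \alpha$ is valid because it is nonnegative whenever $P$ is SPD, and the Gershgorin circle theorem bounds $\Lambda_{max}(E)$ by its largest row sum. The matrices below are random stand-ins, not the actual Gram matrices:

```python
import numpy as np

# Numerical sanity check of two steps in the error-bound derivation:
# (i) alpha^T E_K P^{-1} E_K^T alpha >= 0 for SPD P, so dropping it only
#     loosens the bound; (ii) Gershgorin: Lambda_max(E) <= max_i
#     (E_ii + sum_{k != i} |E_ik|). Random stand-in matrices throughout.
rng = np.random.default_rng(2)
N = 6
A = rng.standard_normal((N, N))
P = A @ A.T + N * np.eye(N)          # SPD by construction
E_K = rng.standard_normal((N, N))
B = rng.standard_normal((N, N))
E = B @ B.T                          # symmetric PSD, like a Gram matrix of errors
alpha = rng.standard_normal(N)

quad = alpha @ E_K @ np.linalg.solve(P, E_K.T @ alpha)
lam_max = np.linalg.eigvalsh(E).max()
gersh = max(E[i, i] + sum(abs(E[i, k]) for k in range(N) if k != i)
            for i in range(N))
print(quad, lam_max, gersh)          # quad >= 0 and lam_max <= gersh
```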

A PROOF OF LEMMA 2

Lemma 2: Let equation 5, resulting from the optimization problem 2, have multiple solutions. Consider any two different solutions for $\alpha$, namely $\alpha_1$ and $\alpha_2$, with corresponding reconstructions $X_1(t) = \sum_{i=1}^{N} \alpha_{1i} K_{j_i}(t_i - t)$ and $X_2(t) = \sum_{i=1}^{N} \alpha_{2i} K_{j_i}(t_i - t)$, respectively. Then $X_1 = X_2$.

Proof: Let $S$ be the subspace of $L_2$-functions spanned by $\{K_{j_i}(t_i - t) \mid i \in \{1, 2, \dots, N\}\}$ with the standard inner product (by assumption each of the $K_{j_i}(t_i - t)$ is an $L_2$-function, and hence $S$ is a subspace of the larger space of all $L_2$-functions). Clearly $S$ is a Hilbert space with $\dim(S) \le N$. Hence there exists an orthonormal basis $\{e_1, \dots, e_M\}$ of $S$ (where $M \le N$). Assume that the hypothesis is false, i.e. $X_1 \ne X_2$. This implies that there exist $a_i$'s and $b_i$'s such that $X_1(t) = \sum_{i=1}^{M} a_i e_i(t)$ and $X_2(t) = \sum_{i=1}^{M} b_i e_i(t)$, where not all $a_i$'s are the same as the corresponding $b_i$'s. However, since $\alpha_1$ and $\alpha_2$ both solve equation 5, $\langle X_1(t), K_{j_i}(t_i - t)\rangle = \langle X_2(t), K_{j_i}(t_i - t)\rangle$ for all $i$, so $X_1 - X_2$ is orthogonal to every $K_{j_i}(t_i - t)$ and hence to all of $S$. But $X_1 - X_2 \in S$, so $X_1 - X_2 = 0$ and $a_i = b_i$ for all $i$, which contradicts the hypothesis.
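Lemma 2 can be illustrated numerically: with linearly dependent shifted kernels the Gram system is singular and admits many coefficient solutions, yet all of them synthesize the same signal, because any null vector of the Gram matrix is also a null vector of the synthesis operator. A small sketch with random stand-in kernels:

```python
import numpy as np

# Illustration of Lemma 2: when the Gram system P a = T is degenerate
# (linearly dependent kernels), coefficient solutions differ, yet every
# solution yields the same reconstructed signal.
rng = np.random.default_rng(3)
L = 128
k1 = rng.standard_normal(L)
k2 = rng.standard_normal(L)
k3 = k1 + k2                       # dependent kernel -> singular Gram matrix
K = np.vstack([k1, k2, k3])        # rows: stand-in shifted kernels
P = K @ K.T
X = 2.0 * k1 - 0.5 * k2            # signal in the span of the kernels
T = K @ X

# Two distinct solutions of P a = T: the least-squares solution plus a
# null-space perturbation (K.T @ null = k1 + k2 - k3 = 0).
a1 = np.linalg.lstsq(P, T, rcond=None)[0]
null = np.array([1.0, 1.0, -1.0])
a2 = a1 + null

X1, X2 = K.T @ a1, K.T @ a2
print(np.linalg.norm(a1 - a2), np.linalg.norm(X1 - X2))
# coefficients differ, reconstructions agree
```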

B PROOF OF LEMMA 4

Lemma 4: Let $X(t)$ be an input signal and let $K_p$ be a kernel for which we want to generate a spike at time $t_p$. Let the inner product $\langle X(t), K_p(t_p - t)\rangle = I_p$. Then, if the baseline threshold of the kernel $K_p$ is $C_p \le I_p$ and the absolute refractory period is $\delta$, as modeled in Equation 1, the kernel $K_p$ must produce a spike in the interval $[t_p - \delta, t_p]$ according to the threshold model defined in Equation 1.

Proof: The lemma is easily proved by contradiction. Assume that, prior to and including time $t_p$, the last spike produced by kernel $K_p$ was at time $t_l$. Also assume that $t_l < t_p - \delta$, so that there is no spike in the interval
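A discrete-time sketch of the behavior Lemma 4 asserts. The threshold model of Equation 1 is assumed here to be "fire when the running inner product reaches the baseline threshold, then remain silent for the refractory period", a simplification of the paper's model; values are illustrative.

```python
import numpy as np

# Sketch of Lemma 4: if C_p <= I_p = <X, K_p(t_p - .)> and the refractory
# period is delta, some spike must land in [t_p - delta, t_p].
rng = np.random.default_rng(4)
L, n, delta = 64, 512, 32
Kp = rng.standard_normal(L)
Kp /= np.linalg.norm(Kp)

t_p = 300
X = np.zeros(n)
X[t_p:t_p + L] = 0.8 * Kp           # ensures the inner product at t_p is 0.8

# Running inner product of X with the kernel at every shift.
c = np.array([X[j:j + L] @ Kp for j in range(n - L + 1)])
I_p, C_p = c[t_p], 0.5              # baseline threshold C_p <= I_p

spikes, last = [], -delta
for j, v in enumerate(c):
    if v >= C_p and j - last >= delta:
        spikes.append(j)
        last = j

print(spikes)                       # at least one spike in [t_p - delta, t_p]
```

Either the kernel is free at $t_p$ and fires there (since $c[t_p] = I_p \ge C_p$), or an earlier spike already fell inside the window, which is exactly the dichotomy the proof by contradiction exploits.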

C PROOF OF APPROXIMATE RECONSTRUCTION THEOREM

Approximate Reconstruction Theorem: Let the following assumptions be true:

• $X(t)$, the input signal to the proposed framework, can be written as a linear combination of component functions, $X(t) = \sum_{i=1}^{N} \alpha_i f_{p_i}(t - t_i)$, where the $\alpha_i$ are bounded real coefficients, the component functions $f_{p_i}(t)$ are chosen from a possibly infinite set $H = \{f_i(t) \mid i \in \mathbb{Z}^+, \|f_i(t)\| = 1\}$ of functions of unit $L_2$ norm with compact support, and the corresponding $t_i \in \mathbb{R}^+$ are chosen to be bounded arbitrary time shifts of the component functions, so that the overall signal has compact support in $[0, T]$ for some $T \in \mathbb{R}^+$ and thus the input signal still belongs to the same class of signals $\mathcal{F}$, as defined in section 2.

Proof of the Theorem: By hypothesis, each kernel $K_{j_i}$ produces a spike at time $\hat{t}_i$, $\forall i \in \{1, \dots, N\}$. Call these spikes fitting spikes. The coding model might generate some other spikes against $X(t)$ too. Other than the set of fitting spikes $\{(\hat{t}_i, K_{j_i}) \mid i \in \{1, \dots, N\}\}$, let $\{(\tilde{t}_k, K_{\tilde{j}_k}) \mid k \in \{1, \dots, M\}\}$ denote the extra spikes that the coding model produces for input $X(t)$ against the kernel bag $\mathcal{K}$; call these extra spikes spurious spikes. Here, $M$ is the number of spurious spikes. By Lemma 1, $X^*(t)$ can be represented as
$$X^*(t) = \sum_{i=1}^{N} a_i K_{j_i}(t - \hat{t}_i) + \sum_{k=1}^{M} \tilde{a}_k K_{\tilde{j}_k}(t - \tilde{t}_k),$$
where the $a_i$ and $\tilde{a}_k$ are real coefficients whose values can be formulated again from Lemma 1. Let $T_i$ be the threshold at which kernel $K_{j_i}$ produced the spike at time $\hat{t}_i$, as given in the hypothesis. Hence, for the generation of the fitting spikes the following condition must be satisfied:
$$\langle X(t), K_{j_i}(t - \hat{t}_i)\rangle = T_i \quad \forall i \in \{1, \dots, N\}.$$
Consider a hypothetical signal $X_{hyp}(t)$ defined by
$$X_{hyp}(t) = \sum_{i=1}^{N} a_i K_{j_i}(t - \hat{t}_i), \quad \text{with } \langle X_{hyp}(t), K_{j_i}(t - \hat{t}_i)\rangle = T_i \quad \forall i \in \{1, \dots, N\}.$$
Clearly this hypothetical signal $X_{hyp}(t)$ can be viewed as the reconstructed signal obtained by considering only the fitting spikes and ignoring all spurious spikes.
Since $X_{hyp}(t)$ lies in the span of the shifted kernels used in the reconstruction of $X(t)$, using Lemma 3 we may now write:
$$\begin{aligned} \|X(t) - X_{hyp}(t)\|_2^2 &= \langle X(t) - X_{hyp}(t),\, X(t) - X_{hyp}(t)\rangle \\ &= \langle X(t) - X_{hyp}(t),\, X(t)\rangle - \langle X(t) - X_{hyp}(t),\, X_{hyp}(t)\rangle \\ &= \langle X(t) - X_{hyp}(t),\, X(t)\rangle - \sum_{i=1}^{N} a_i \langle X(t) - X_{hyp}(t),\, K_{j_i}(t - \hat{t}_i)\rangle \\ &= \|X(t)\|_2^2 - \langle X(t), X_{hyp}(t)\rangle \end{aligned}$$
(since by construction $\langle X_{hyp}(t), K_{j_i}(t - \hat{t}_i)\rangle = T_i = \langle X(t), K_{j_i}(t - \hat{t}_i)\rangle \ \forall i \in \{1, \dots, N\}$, the second term vanishes).
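The identity above holds because the cross term $\langle X - X_{hyp}, X_{hyp}\rangle$ vanishes whenever $X_{hyp}$ matches the inner products $T_i$. A finite-dimensional check, with vectors standing in for $L_2$ functions and kernels:

```python
import numpy as np

# Check: if X_hyp = sum_i a_i K_i with <X_hyp, K_i> = <X, K_i> for all i,
# then ||X - X_hyp||^2 = ||X||^2 - <X, X_hyp>.
rng = np.random.default_rng(5)
N, L = 5, 200
K = rng.standard_normal((N, L))    # rows: stand-in kernels
X = rng.standard_normal(L)

T = K @ X                          # inner products of X with the kernels
a = np.linalg.solve(K @ K.T, T)    # enforces <X_hyp, K_i> = T_i for all i
X_hyp = K.T @ a

lhs = np.sum((X - X_hyp) ** 2)
rhs = X @ X - X @ X_hyp
print(lhs, rhs)                    # the two sides agree
```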

