A PROBABILISTIC FRAMEWORK FOR TASK-ALIGNED INTRA- AND INTER-AREA NEURAL MANIFOLD ESTIMATION

Abstract

Latent manifolds provide a compact characterization of neural population activity and of shared co-variability across brain areas. Nonetheless, existing statistical tools for extracting neural manifolds are limited in how interpretable their latents are with respect to task variables, and can be hard to apply to datasets with no trial repeats. Here we propose a novel probabilistic framework that allows for interpretable partitioning of population variability within and across areas in the context of naturalistic behavior. Our approach for task-aligned manifold estimation (TAME-GP) explicitly partitions variability into private and shared sources, which can themselves be subdivided into task-relevant and task-irrelevant components, uses a realistic Poisson noise model, and introduces temporal smoothing of latent trajectories in the form of a Gaussian Process prior. This TAME-GP graphical model allows for robust estimation of task-relevant variability in local population responses, and of shared co-variability between brain areas. We demonstrate the efficiency of our estimator on within-model and biologically motivated simulated data. We also apply it to several datasets of neural population recordings during behavior. Overall, our results demonstrate the capacity of TAME-GP to capture meaningful intra- and inter-area neural variability with single-trial resolution.

1. INTRODUCTION

Systems neuroscience is gradually shifting from relatively simple and controlled tasks to studying naturalistic closed-loop behaviors where no two observations (i.e., "trials") are alike (Michaiel et al., 2020; Noel et al., 2021). Concurrently, neurophysiological techniques are advancing rapidly (Stevenson & Kording, 2011; Angotzi et al., 2019; Boi et al., 2020) to allow recording from an ever-increasing number of simultaneously recorded neurons (i.e., "neural populations") and across multiple brain areas. These trends lead to a pressing need for statistical tools that compactly characterize the statistics of neural activity within and across brain regions. Dimensionality reduction techniques are a popular tool for interrogating the structure of neural responses (Cunningham & Byron, 2014). However, as neural responses are driven by increasingly complex task features, the main axes of variability extracted using these techniques often intermix task and nuisance variables, making them hard to interpret. Alternatively, dimensionality reduction techniques that do allow for estimating task-aligned axes of variability (Brendel et al., 2011; Semedo et al., 2019; Keeley et al., 2020; Glaser et al., 2020; Hurwitz et al., 2021) do not apply to communication between brain areas, and/or necessitate trial repeat structure that does not occur in natural behavior. Here, we introduce a probabilistic approach for learning interpretable task-relevant neural manifolds that capture both intra- and inter-area neural variability with single-trial resolution. Task Aligned Manifold Estimation with Gaussian Process priors (TAME-GP) incorporates elements of demixed PCA (dPCA; Machens (2010); Kobak et al. (2016)) and probabilistic canonical correlation analysis (pCCA; Bach & Jordan (2005)) into a graphical model that additionally includes biologically relevant Poisson noise.
The model uses a Gaussian Process (GP) prior to enforce temporal smoothness, which allows for robust reconstruction of single-trial latent dynamics (see Damianou et al. (2016) for a similar approach using Gaussian observation noise). We demonstrate the robustness and flexibility of TAME-GP in comparison to alternative approaches using synthetic data and neural recordings from rodents and primates during naturalistic tasks. This reveals TAME-GP as a valuable tool for dissecting sources of variability within and across brain areas during behavior.

Related work. Dimensionality reduction is usually achieved by unsupervised methods that identify axes of maximal variability in the data, such as PCA. In neuroscience, this is often accompanied by additional smoothing over time reflecting the underlying neural dynamics (e.g., Gaussian process factor analysis (GPFA) (Yu et al., 2008); see GP-LVM (Ek & Lawrence, 2009) for similar approaches outside of neuroscience). This low-dimensional projection is followed by a post hoc interpretation of latents in the context of behavioral variables, often by visualization. Alternative approaches such as dPCA (Machens, 2010; Kobak et al., 2016) explicitly look for axes of neural variability that correlate with task variables of interest (see also Zhou & Wei (2020) for a nonlinear version). However, these require partitioning trials into relatively few categories, based on experimental conditions or behavioral choices, and averaging within conditions. This makes them unusable in naturalistic tasks where a single-trial treatment is needed. Similarly, SNP-GPFA (Keeley et al., 2020) can partition (multi-region) neural activity into 'shared signal' and 'private noise' components, but only using data with stimulus repeats.
Under 'no-repeat' conditions, pCCA (Bach & Jordan, 2005) can find subspaces of maximal cross-correlation between linear projections of task variables and neural responses (under Gaussian noise assumptions), without the need for a priori grouping of trials by experimental condition or choice. This approach can also be applied for determining shared axes of co-variability across areas, an analog for communication subspaces (Semedo et al., 2019). Nonetheless, its noise model assumptions are mismatched to neural data. More fundamentally, pCCA only considers pairwise relationships, preventing a joint analysis of multiple areas and task variables. Overall, existing approaches come with practical limitations and do not directly address the routing of task-relevant information across brain areas.

2. TASK-ALIGNED MANIFOLD ESTIMATION WITH GP PRIORS (TAME-GP)

In its most general form, the graphical model of TAME-GP models a set of spike-count population responses $x^{(j)}$ from up to $n$ different areas, together with task variables of interest $y$ (Fig. 1A). The neural responses are driven by a set of $n + 1$ low-dimensional latent variables $z^{(j)}$. Specifically, the responses of neuron $i$ in area $j$ arise as a linear combination of private latent variability $z^{(j)}$ and shared latents $z^{(0)}$, which reflect task-interpretable aspects of the underlying dynamics, with Poisson noise and an exponential link function:

$$p\left(x_i^{(j)} \mid z^{(0:n)}\right) = \mathrm{Poisson}\left(\exp\left(W_i^{(0,j)} z^{(0)} + W_i^{(j,j)} z^{(j)} + h_i^{(j)}\right)\right), \tag{1}$$

with parameters $W^{(0/j,j)}$ and $h^{(j)}$. To make latents interpretable with respect to task variables $y$, we adapt a probabilistic framing of CCA (Bach & Jordan, 2005) to introduce dependencies between any of the latents $z^{(k)}$, which could be private or shared across areas, and $y$:

$$p\left(y \mid z^{(0)}\right) = \mathcal{N}\left(y;\, C z^{(0)} + d,\, \Psi\right), \tag{2}$$

with parameters $C$, $d$, $\Psi$.



See Appendix A.1 for background on probabilistic PCA, CCA and their relation to TAME-GP. Variables $x^{(j)}$, $y$ are tensors with dimensions corresponding to 1) an area-specific number of neurons / task variable dimension, 2) time within trial, and 3) trial index. We make indices explicit only where strictly needed.
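
To make the generative structure concrete, the following is a minimal NumPy sketch that draws synthetic data from the Poisson observation model and the Gaussian task-variable model described above, using the (units, time, trial) tensor layout. It is an illustration under stated assumptions, not the authors' implementation: the squared-exponential GP kernel, the independence of latent dimensions under the prior, all dimensionalities, the parameter scales, and the names `rbf_kernel` and `sample_tame_gp` are hypothetical choices made only for this example.

```python
import numpy as np


def rbf_kernel(T, length_scale):
    """Squared-exponential kernel over T time bins (assumed kernel form)."""
    t = np.arange(T)[:, None]
    return np.exp(-0.5 * (t - t.T) ** 2 / length_scale ** 2)


def sample_tame_gp(n_neurons=(30, 20), d_shared=2, d_private=(3, 3),
                   d_task=2, T=50, n_trials=4, length_scale=5.0, seed=0):
    """Draw (x, y, z0) from a TAME-GP-like generative model.

    All sizes and parameter scales are illustrative assumptions.
    Returns a list of spike-count tensors x[j] (one per area), the task
    variable tensor y, and the shared latent z0, each laid out as
    (units, time, trials).
    """
    rng = np.random.default_rng(seed)

    # Cholesky factor of the temporal covariance; jitter keeps it positive definite.
    L = np.linalg.cholesky(rbf_kernel(T, length_scale) + 1e-6 * np.eye(T))

    def gp_latent(dim):
        # Independent, temporally smooth trajectories: shape (dim, T, n_trials).
        eps = rng.standard_normal((dim, T, n_trials))
        return np.einsum("ts,dsk->dtk", L, eps)

    # Shared latent z^(0), coupled to both the task variables and every area.
    z0 = gp_latent(d_shared)

    # Task variables: y ~ N(C z0 + d, Psi).
    C = rng.standard_normal((d_task, d_shared))
    d = rng.standard_normal(d_task)
    Psi = 0.1 * np.eye(d_task)
    noise = rng.multivariate_normal(np.zeros(d_task), Psi, size=(T, n_trials))
    y = (np.einsum("ad,dtk->atk", C, z0)
         + d[:, None, None]
         + np.moveaxis(noise, -1, 0))

    # Spike counts: x^(j) ~ Poisson(exp(W0j z0 + Wjj zj + hj)) for each area j.
    x = []
    for N_j, d_j in zip(n_neurons, d_private):
        zj = gp_latent(d_j)                       # private latent z^(j)
        W0j = 0.3 * rng.standard_normal((N_j, d_shared))
        Wjj = 0.3 * rng.standard_normal((N_j, d_j))
        hj = np.full(N_j, -1.0)                   # baseline log-rate
        log_rate = (np.einsum("nd,dtk->ntk", W0j, z0)
                    + np.einsum("nd,dtk->ntk", Wjj, zj)
                    + hj[:, None, None])
        x.append(rng.poisson(np.exp(log_rate)))
    return x, y, z0
```

Calling `x, y, z0 = sample_tame_gp()` with these defaults returns two count tensors of shapes (30, 50, 4) and (20, 50, 4), and `y` and `z0` of shape (2, 50, 4). Only the shared latent enters both the task-variable model and the neural log-rates, which is what ties the shared subspace to the task in this sketch.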

