SYNTACTIC REPRESENTATIONS IN THE HUMAN BRAIN: BEYOND EFFORT-BASED METRICS

Abstract

We are far from having a complete mechanistic understanding of the brain computations involved in language processing and of the role that syntax plays in those computations. Most language studies do not computationally model syntactic structure and most studies that do model syntactic processing use effort-based metrics. These metrics capture the effort needed to process the syntactic information given by every word (Brennan et al., 2012; Hale et al., 2018; Brennan et al., 2016). They can reveal where in the brain syntactic processing occurs, but not what features of syntax are processed by different brain regions. Here, we move beyond effort-based metrics and propose explicit features capturing the syntactic structure that is incrementally built while a sentence is being read. Using these features and functional Magnetic Resonance Imaging (fMRI) recordings of participants reading a natural text, we study the brain representation of syntax. We find that our syntactic structure-based features are better than effort-based metrics at predicting brain activity in various parts of the language system. We show evidence of the brain representation of complex syntactic information such as phrase and clause structures. We see that regions well-predicted by syntactic features are distributed in the language system and are not distinguishable from those processing semantics. Our results call for a shift in the approach used for studying syntactic processing.

1. INTRODUCTION

Neuroscientists have long been interested in how the brain processes syntax. To date, there is no consensus on which brain regions are involved in processing it. Classically, only a small number of regions in the left hemisphere were thought to be involved in language processing. More recently, the language system was proposed to involve a set of brain regions spanning the left and right hemispheres (Fedorenko & Thompson-Schill, 2014). Similarly, some findings show that syntax is constrained to specific brain regions (Grodzinsky & Friederici, 2006; Friederici, 2011), while other findings show syntax is distributed throughout the language system (Blank et al., 2016; Fedorenko et al., 2012; 2020). The biological basis of syntax was first explored through studies of the impact of brain lesions on language comprehension or production (Grodzinsky, 2000), and later through non-invasive neuroimaging experiments that record brain activity while subjects perform language tasks, using methods such as functional Magnetic Resonance Imaging (fMRI) or electroencephalography (EEG). These experiments usually isolate syntactic processing by contrasting the activity between a difficult syntactic condition and an easier one, identifying regions whose activity increases with syntactic effort (Friederici, 2011). For example, reading a sentence with an object-relative clause (e.g. "The rat that the cat chased was tired") is more taxing than reading a sentence with a subject-relative clause (e.g. "The cat that chased the rat was tired"). In the past decade, this approach was extended to study syntactic processing in naturalistic settings, such as when reading or listening to a story (Brennan et al., 2012; Hale et al., 2018; Willems et al., 2015).
Because such complex material is not organized into controlled conditions, neuroscientists have instead devised effort-based metrics capturing the word-by-word evolving syntactic demands required to understand the material. Brain regions whose activity correlates with these metrics are suggested to be involved in processing syntax. Many of these metrics build a sentence's syntactic representation and estimate the number of syntactic operations performed at each word. Node Count is one popular such metric. It relies on constituency trees (structures that capture the hierarchical grammatical relationships between the words in a sentence). As the words of a sentence are traversed in order, subtrees of the constituency tree are completed; Node Count refers to the number of such subtrees completed at each word, effectively capturing syntactic load or effort. Brennan et al. (2012) use Node Count to support the theory that the Anterior Temporal Lobe (ATL) is involved in syntactic processing. Another example of an effort-based metric comes from an EEG study by Hale et al. (2018). They show that parser action count (the number of possible actions a parser can take at each word) is predictive of the P600, a positive peak in the brain's electrical activity occurring around 600 ms after word onset. The P600 is hypothesized to be driven by syntactic processing (to resolve incongruencies), and the results of Hale et al. (2018) align with this hypothesis. Though effort-based metrics are a good proposal for capturing the effort involved in integrating a word into the syntactic structure of a sentence, they do not reflect the full syntactic information in play. Hence, these metrics cannot be used to study the brain representation of syntactic constructs such as nouns, verbs, relationships and dependencies between words, and the complex hierarchical structure of phrases and sentences. Constituency trees and dependency trees are the two main structures that capture a sentence's syntactic structure.
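As an illustration, one common operationalization of Node Count can be computed directly from a bracketed constituency parse: a constituent closes immediately after its rightmost word, so the run of closing brackets following each word gives the number of subtrees completed there. The sketch below is a minimal reading of this idea, not the exact parser-derived procedure of Brennan et al. (2012):

```python
import re

def node_counts(bracketed):
    """Compute a simple Node Count metric from a bracketed parse string,
    e.g. "(S (NP (DT The) (NN cat)) (VP (VBD slept)))".

    For each word, return the number of constituents (subtrees) that are
    completed when that word is read: each ')' immediately following a
    word closes one constituent whose rightmost leaf is that word.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", bracketed)
    words, counts = [], []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        # A terminal word is a non-paren token not directly preceded by '('
        # (tokens directly after '(' are constituent labels like NP or DT).
        if tok not in "()" and i > 0 and tokens[i - 1] != "(":
            j, closed = i + 1, 0
            while j < len(tokens) and tokens[j] == ")":
                closed += 1
                j += 1
            words.append(tok)
            counts.append(closed)
            i = j
        else:
            i += 1
    return words, counts

# "slept" completes the VBD, VP, and S subtrees, so its count is 3.
words, counts = node_counts("(S (NP (DT The) (NN cat)) (VP (VBD slept)))")
```

With this convention, the metric rises at words that close many constituents at once, which is exactly the word-by-word syntactic load signal that effort-based studies regress against brain activity.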
Constituency trees are derived using phrase structure grammars that encode valid phrase and clause structure (see Figure 1 (A) for an example). Dependency trees encode relations between pairs of words, such as subject-verb relationships. We use representations derived from both types of trees. We derive word-level dependency (DEP) labels from dependency trees, and we focus on encoding the structural information given by constituency trees, since we want to analyze whether the brain builds hierarchical representations of phrase structure. We characterize the syntactic structure inherent in sentence constituency trees by computing an evolving vector representation of the syntactic structure processed at each word, using the subgraph embedding algorithm of Adhikari et al. (2018). We show that our syntactic structure embeddings, along with other simpler syntactic structure embeddings built using conventional syntactic features such as part-of-speech (POS) tags and DEP tags, are better than effort-based metrics at predicting the fMRI data of subjects reading text. This indicates that representations of syntax, and not just syntactic effort, can be observed in fMRI. We also address the important question of whether regions that are predicted by syntactic features are selective for syntax, meaning they respond only to syntax and not to other language properties such as semantics. To answer this question, we model the semantic properties of words using a contextual word embedding space (Devlin et al., 2018). We find that regions predicted by syntactic features are also predicted by semantic features and thus are not selective for syntax.
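The encoding-model setup described above can be sketched with deliberately simplified ingredients. The one-hot POS features and closed-form ridge regression below are illustrative assumptions (the study itself uses richer subgraph embeddings of the evolving constituency tree and fMRI-specific preprocessing such as hemodynamic-response modeling and cross-validation); the point is only the general recipe of mapping word-level syntactic features to brain responses:

```python
import numpy as np

def pos_onehot_features(pos_tags, tagset):
    """One-hot encode each word's POS tag: a simple word-level syntactic
    feature vector (a stand-in for the paper's structure embeddings)."""
    index = {tag: k for k, tag in enumerate(tagset)}
    X = np.zeros((len(pos_tags), len(tagset)))
    for row, tag in enumerate(pos_tags):
        X[row, index[tag]] = 1.0
    return X

def ridge_fit(X, Y, lam=1.0):
    """Closed-form ridge regression W = (X'X + lam*I)^-1 X'Y, mapping
    word features X (words x features) to responses Y (words x voxels)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

# Toy usage with simulated "voxel" responses (all names hypothetical).
tagset = ["DT", "NN", "VBD"]
X = pos_onehot_features(["DT", "NN", "VBD", "DT", "NN"], tagset)
rng = np.random.default_rng(0)
true_W = rng.normal(size=(len(tagset), 2))        # 2 simulated voxels
Y = X @ true_W + 0.01 * rng.normal(size=(5, 2))   # responses + noise
W = ridge_fit(X, Y, lam=0.1)
Y_hat = X @ W   # predictions, to be scored against held-out data
```

In the real analysis, prediction accuracy on held-out data per voxel (rather than training fit) is what licenses the claim that a feature set "predicts" a brain region.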

Scientific questions

We ask three main questions:
• How can scientists construct syntactic structure embeddings that capture the syntactic structure inherent in phrases and sentences?
• Are these embeddings better at predicting brain activity than effort-based metrics when used as inputs to encoding models?
• Which brain regions are involved in processing complex syntactic structure, and are they different from regions involved in semantic processing?

Contributions

We make four main contributions:
• We propose a subgraph embeddings-based method to model the syntactic structure inherent in phrases and sentences.
• We show that effort-based metrics can be complemented by syntactic structure embeddings, which predict brain activity to a larger extent than effort-based metrics.
• Using our syntactic structure embeddings, we find some evidence supporting the hypothesis that the brain processes and represents complex syntactic information such as phrase and clause structure.
• We find evidence supporting the existing hypothesis that syntactic processing appears to be distributed in the language network, in regions that are not selective for syntax.

