TRAINING LANGUAGE MODELS TO SUMMARIZE NARRATIVES IMPROVES BRAIN ALIGNMENT

Abstract

Building systems that achieve a deeper understanding of language is one of the central goals of natural language processing (NLP). Towards this goal, recent works have begun to train language models on narrative datasets that require extracting the most critical information by integrating across long contexts. However, it remains an open question whether these models are learning a deeper understanding of the text, or whether they are simply learning a heuristic to complete the task. This work investigates this question by turning to the one language processing system that truly understands complex language: the human brain. We show that training language models for deeper narrative understanding results in richer representations that have improved alignment to human brain activity. We further find that the improvements in brain alignment are larger for character names than for other discourse features, which indicates that these models are learning important narrative elements. Taken together, these results suggest that this type of training can indeed lead to deeper language understanding. These findings have consequences both for cognitive neuroscience, by revealing some of the significant factors behind brain-NLP alignment, and for NLP, by highlighting that understanding of long-range context can be improved beyond language modeling.

1. INTRODUCTION

Language models trained to predict the next word over millions of text documents have led to large improvements on a range of benchmarks in natural language processing (NLP). However, researchers have shown that NLP models may rely on shallow heuristics to perform these tasks rather than on a deeper language understanding (McCoy et al., 2019; Min et al., 2020; Linzen, 2020). To build systems with deeper language understanding, recent work proposed training language models on narrative datasets that require extracting the most critical information by integrating across long contexts (Kryscinski et al., 2021; Sang et al., 2022; Kočiskỳ et al., 2018). Does this approach truly lead to a deeper understanding of language? We investigate this question by turning to the one language processing system that truly understands complex language: the human brain. Prior work has used human brain recordings to interpret the representations of pretrained language models (LMs) (Søgaard, 2016; Toneva & Wehbe, 2019; Abdou et al., 2021). These studies evaluate how well representations obtained from a language model can predict representations sampled from the human brain during language comprehension via a brain imaging device, such as functional magnetic resonance imaging (fMRI). If this prediction performance, also known as brain alignment, is determined to be significant via a statistical test, then the LM and a specific brain location or timepoint are thought to significantly align in their representations of language. Using these methods, researchers have shown that pretrained language models significantly predict activity in large parts of the brain regions thought to underlie language comprehension (Wehbe et al., 2014b; Jain & Huth, 2018; Toneva & Wehbe, 2019; Caucheteux & King, 2022; Schrimpf et al., 2021; Goldstein et al., 2022). In this work, we draw insights from the human brain to study whether language models are truly learning deeper language understanding.
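The brain-alignment procedure described above can be sketched as a simple encoding model: a ridge regression maps LM-layer features to each voxel's fMRI activity, and per-voxel prediction performance on held-out data serves as the alignment score. The sketch below uses synthetic stand-in data (no real fMRI recordings or LM activations), and the specific linear model, fold count, and correlation metric are illustrative assumptions rather than the paper's exact pipeline.

```python
# Minimal encoding-model sketch: predict each fMRI voxel from LM-layer
# features via ridge regression; held-out Pearson correlation per voxel
# is the brain-alignment score. All data here are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n_trs, n_feats, n_voxels = 300, 64, 50        # fMRI samples, LM features, voxels
lm_features = rng.standard_normal((n_trs, n_feats))   # stand-in for layer activations
true_weights = rng.standard_normal((n_feats, n_voxels))
brain = lm_features @ true_weights + 0.5 * rng.standard_normal((n_trs, n_voxels))

n_splits = 4
scores = np.zeros(n_voxels)
for train_idx, test_idx in KFold(n_splits=n_splits).split(lm_features):
    model = Ridge(alpha=1.0).fit(lm_features[train_idx], brain[train_idx])
    pred = model.predict(lm_features[test_idx])
    # accumulate per-voxel correlation between predicted and held-out activity
    for v in range(n_voxels):
        scores[v] += np.corrcoef(pred[:, v], brain[test_idx, v])[0, 1] / n_splits

print(f"mean held-out voxel correlation: {scores.mean():.2f}")
```

In the actual experiments, a significance test over these per-voxel scores would then decide which brain locations align with the model.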
We analyze the effect of training LMs for narrative summarization on their alignment with fMRI recordings of human subjects reading a book chapter. We specifically investigate 4 pretrained language models (i.e., "base models") and 4 corresponding models obtained by training the base models on the BookSum dataset (Kryscinski et al., 2021) to improve the base language model's narrative understanding (i.e., "booksum models"). The BookSum dataset was selected because it is a summarization dataset that requires understanding complex interactions across long narratives. The 4 models were selected because their architectures were designed to integrate information across long contexts. We evaluate the alignment of the base and booksum models with fMRI recordings of 8 participants reading a chapter of a popular book word-by-word, made publicly available by Wehbe et al. (2014a). This dataset was chosen because it is one of the largest datasets of participants processing a narrative story (5176 words, corresponding to approximately 1300 samples of fMRI recordings per participant). Our main contributions are as follows:

1. In Section 4, we show that training language models for deeper narrative understanding improves alignment to human brain activity. Brain alignment also increases as the number of words fed to the models grows, up to 500 words. Lastly, for each model, we identify the layers where these improvements in brain alignment occur.

2. In Section 5, we show that the improved brain alignment in Section 4 is not due to improved language modeling (LM) ability, a possible confounding factor. By disentangling LM ability's contribution to brain alignment, we present evidence that BookSum-trained models develop deeper language understanding.

3. In Section 6, we present a simple interpretability approach to study what brain-relevant information is gained by language models after training for deeper language understanding.
Our results reveal that these models learn richer representations across all tested discourse features (Characters, Emotions, Motions), and that they learn more about Characters than about Emotions and Motions. This indicates that discourse features are a promising dimension along which to study brain alignment and deep language understanding. Combined, our contributions from Sections 4, 5, and 6 present evidence that models trained to summarize narratives indeed develop deeper language understanding. First, improved alignment with the human brain's deep understanding of characters, emotions, and motions suggests the models have developed richer representations of these entities and concepts. Second, we focus on brain regions suggested by previous research to underlie language comprehension in humans, so the improved brain alignment is not spuriously related to non-language brain activity. Third, we show that brain alignment improves only when we provide longer input contexts (20 to 1000 words) to the LMs, which may be important for deep contextual understanding.
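The varying-context-length setup mentioned above, where each model receives a word together with up to 20 to 1000 preceding words, can be illustrated with a small windowing helper. The function name `context_windows` and the toy sentence are our own illustration, not the authors' code; the real pipeline would pass each window through a tokenizer and LM to extract features.

```python
# Hypothetical sketch of fixed-length context construction: for each word
# position i, the model input is that word plus up to (context_len - 1)
# preceding words, mirroring the 20-to-1000-word contexts in the text.
def context_windows(words, context_len):
    """Return one window per word: the word plus its preceding context."""
    return [words[max(0, i + 1 - context_len): i + 1]
            for i in range(len(words))]

words = "the boy who lived had a scar".split()
wins = context_windows(words, context_len=3)
print(wins[0])  # ['the']
print(wins[4])  # ['who', 'lived', 'had']
```

Sweeping `context_len` over values such as 20, 100, 500, and 1000 while holding the model fixed isolates the effect of context length on the extracted representations, and hence on brain alignment.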

2. RELATED WORK ON BRAINS AND LANGUAGE

Our work relates to a growing body of research on disentangling the contributions of different types of information towards brain alignment. Toneva et al. (2022a) present an approach to disentangle supra-word meaning from lexical meaning in language models (LMs) and show that supra-word meaning is predictive of fMRI recordings in two language regions (anterior and posterior temporal lobes). Caucheteux et al. (2021) and Reddy & Wehbe (2021) disentangle alignment due to syntactic and semantic processing. Toneva et al. (2022b) examine whether LM representations align with different language processing regions in different ways. Researchers have also suggested that one contributor to alignment is the LM's ability to predict the next word, with a positive relationship between next-word prediction ability and brain alignment across LMs (Schrimpf et al., 2021; Goldstein et al., 2022). However, more recent work shows that no simple relationship exists, and that language modeling loss is not a perfect predictor of brain alignment (Pasquiou et al., 2022; Antonello & Huth, 2022). Merlin & Toneva (2022) introduced perturbations to disentangle alignment due to next-word prediction from alignment due to semantic knowledge. Our work contributes to this research area by disentangling the contributions of LM ability from deep language understanding towards brain-NLP alignment. Other works have investigated the alignment of fine-tuned language models with brain recordings. Schwartz et al. (2019) fine-tuned a pretrained BERT model to predict fMRI and MEG recordings of people reading a book chapter, leading to improved prediction of previously unseen brain recordings, specifically in regions known to support language processing. However, it is not clear what information induced in the fine-tuned BERT model contributed to the improved brain alignment.



Code available at https://github.com/awwkl/brain_language_summarization.

