TRAINING LANGUAGE MODELS TO SUMMARIZE NARRATIVES IMPROVES BRAIN ALIGNMENT

Abstract

Building systems that achieve a deeper understanding of language is one of the central goals of natural language processing (NLP). Towards this goal, recent works have begun to train language models on narrative datasets which require extracting the most critical information by integrating across long contexts. However, it is still an open question whether these models are learning a deeper understanding of the text, or simply a heuristic that completes the task. We investigate this question by turning to the one language processing system that truly understands complex language: the human brain. We show that training language models for deeper narrative understanding results in richer representations that are better aligned with human brain activity. We further find that the improvements in brain alignment are larger for character names than for other discourse features, which indicates that these models are learning important narrative elements. Taken together, these results suggest that this type of training can indeed lead to deeper language understanding. These findings have consequences both for cognitive neuroscience, by revealing some of the significant factors behind brain-NLP alignment, and for NLP, by highlighting that understanding of long-range context can be improved beyond language modeling.

1. INTRODUCTION

Language models trained to predict the next word over millions of text documents have led to large improvements on a range of benchmarks in natural language processing (NLP). However, researchers have shown that NLP models may rely on shallow heuristics to perform these tasks rather than on a deeper language understanding (McCoy et al., 2019; Min et al., 2020; Linzen, 2020). To build systems with deeper language understanding, recent work proposed to train language models on narrative datasets which require extracting the most critical information by integrating across long contexts (Kryscinski et al., 2021; Sang et al., 2022; Kočiský et al., 2018). Does this approach truly lead to a deeper understanding of language? We investigate this by turning to the one language processing system that truly understands complex language: the human brain. Prior work has used human brain recordings to interpret representations of pretrained language models (LMs) (Søgaard, 2016; Toneva & Wehbe, 2019; Abdou et al., 2021). These works evaluate how well representations obtained from a language model can predict recordings of human brain activity during language comprehension, measured with an imaging device such as functional magnetic resonance imaging (fMRI). If this prediction performance, also known as brain alignment, is determined to be significant via a statistical test, then the LM and the specific brain location or timepoint are thought to align significantly in their representations of language. Using these methods, researchers have shown that pretrained language models significantly predict large parts of the brain regions that are thought to underlie language comprehension (Wehbe et al., 2014b; Jain & Huth, 2018; Toneva & Wehbe, 2019; Caucheteux & King, 2022; Schrimpf et al., 2021; Goldstein et al., 2022). In this work, we draw insights from the human brain to study whether language models are truly learning deeper language understanding.
We analyze the effect of training LMs for narrative summarization on their alignment with fMRI recordings of human subjects reading a book chapter. Our code is available at https://github.com/awwkl/brain_language_summarization.
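To make the brain-alignment procedure described above concrete, the following is a minimal sketch of an encoding analysis, not the authors' exact pipeline: a ridge regression maps feature vectors (standing in for LM representations) to voxel responses (standing in for fMRI recordings), and alignment is scored as the held-out Pearson correlation per voxel. All data here are synthetic, and the feature and voxel dimensions are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_trs, n_feats, n_voxels = 200, 32, 10  # fMRI timepoints, LM feature dim, voxels

# Synthetic stand-ins: X would be LM representations of the stimulus text,
# Y the recorded fMRI responses at matched timepoints.
X = rng.standard_normal((n_trs, n_feats))
W_true = rng.standard_normal((n_feats, n_voxels))
Y = X @ W_true + 0.5 * rng.standard_normal((n_trs, n_voxels))

# Fit the linear encoding model on training timepoints, predict held-out ones.
X_tr, X_te = X[:150], X[150:]
Y_tr, Y_te = Y[:150], Y[150:]
model = Ridge(alpha=1.0).fit(X_tr, Y_tr)
Y_pred = model.predict(X_te)

def voxelwise_corr(a, b):
    """Pearson correlation between columns of a and b (one value per voxel)."""
    a = (a - a.mean(0)) / a.std(0)
    b = (b - b.mean(0)) / b.std(0)
    return (a * b).mean(0)

# Brain alignment: how well the model predicts each voxel's held-out activity.
r = voxelwise_corr(Y_pred, Y_te)
print(f"mean held-out correlation: {r.mean():.3f}")
```

In the actual literature, the per-voxel scores would additionally be tested for significance (e.g. against a permutation-based null distribution) before a voxel is declared significantly aligned with the LM.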

