ABSTRACTING INFLUENCE PATHS FOR EXPLAINING (CONTEXTUALIZATION OF) BERT MODELS

Abstract

While "attention is all you need" may be proving true, we do not yet know why: attention-based transformer models such as BERT achieve superior performance, but how they contextualize information even for simple grammatical rules such as subject-verb number agreement (SVA) is uncertain. We introduce multi-partite patterns, abstractions of sets of paths through a neural network model. Patterns quantify and localize the effect of an input concept (e.g., a subject's number) on an output concept (e.g., the corresponding verb's number) to paths passing through a sequence of model components, thus surfacing how BERT contextualizes information. We describe guided pattern refinement, an efficient search procedure for finding sufficient and sparse patterns representative of concept-critical paths. We discover that patterns generate succinct and meaningful explanations for BERT, highlighted by "copy" and "transfer" operations implemented by skip connections and attention heads, respectively. We also show how pattern visualizations help us understand how BERT contextualizes various grammatical concepts, such as SVA across clauses, and why it makes errors in some cases while succeeding in others.

1. INTRODUCTION

Recent advances in NLP have been spurred by contextualized representations created in deep neural models such as BERT (Devlin et al., 2019). These contextualized representations, which are designed to be sensitive to the context in which they appear (Ethayarajh, 2019), are also shown to capture many grammatical concepts (Lin et al., 2019; Tenney et al., 2019a), including subject-verb agreement (SVA) and reflexive anaphora (RA) (Goldberg, 2019). However, the exact mechanism of contextualization in BERT, i.e., the process of developing contextualized representations from representations of individual input words in the sentence context, remains unclear. For example, in the sentence the pilots that the architect likes is/are short, choosing the correct verb are over is to agree with the plural subject requires contextualizing the verb with the plurality information of the subject. In this paper, we answer the central question: how is contextualization realized in BERT for grammatical concepts such as SVA and RA? Specifically, can we identify sub-components of BERT that are (a) sufficient for representing those concepts but also (b) sparse enough to legibly show how BERT contextualizes the concepts across layers and whether the contextualization follows correct grammatical rules?

Prior work on explaining contextualization in BERT relies on the analysis of layer representations and attention components. Representation analyses, either by training a probing classifier (Lin et al., 2019; Tenney et al., 2019a) or by finding parse trees embedded in the representations (Hewitt & Manning, 2019; Reif et al., 2019), demonstrate that relevant linguistic concepts are associated with the activations of BERT components (e.g., the subject's number with the activations of a certain head at a certain layer, or the subject's representation lying closer to the verb's under certain transformations), but do not tell us how these representations come about inside the model.
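To make the probing-classifier approach concrete, the following minimal sketch trains a linear probe to recover a binary concept (e.g., the subject's number) from layer representations. The data here is synthetic: the `direction` vector and the way the concept is injected are illustrative assumptions standing in for actual BERT hidden states, not the setup of the cited works.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for layer activations: d-dimensional vectors in which
# the concept label (singular = 0, plural = 1) is linearly encoded along one
# random direction, plus Gaussian noise. A real probe would use the hidden
# states of a chosen BERT layer at a chosen token position instead.
d, n = 64, 1000
labels = rng.integers(0, 2, size=n)
direction = rng.normal(size=d)                      # assumed concept direction
reps = rng.normal(size=(n, d)) + np.outer(labels * 2 - 1, direction)

# A linear "probing classifier": if it can recover the concept from the
# representation, the concept is (linearly) decodable at that layer.
probe = LogisticRegression(max_iter=1000).fit(reps[:800], labels[:800])
acc = probe.score(reps[800:], labels[800:])
print(f"probe accuracy: {acc:.2f}")
```

High probe accuracy shows the concept is *present* in the representation, but, as argued above, not *how* the model computed or propagated it, which is the gap patterns aim to fill.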
Meanwhile, inspecting attention weights as indicators of the flow of information between BERT layers (Clark et al., 2019) requires subjective inference of the relevant function (e.g., inferring that a certain head is involved because of high attention weights between positions at the subject and at the verb), an approach found to be problematic in other contexts (Brunner et al., 2020; Jain & Wallace, 2019). Attention analysis further disregards the role of skip connections, which do not involve attention at all. Neither approach allows us to track a concept as a causal chain from input to output or to distinguish

