INTERPRETABLE SEQUENCE CLASSIFICATION VIA PROTOTYPE TRAJECTORY

Abstract

We propose a novel interpretable recurrent neural network (RNN) model, called ProtoryNet, in which we introduce a new concept of prototype trajectories. Motivated by prototype theory in modern linguistics, ProtoryNet makes a prediction by finding the most similar prototype for each sentence in a text sequence and feeding an RNN backbone with the proximity of each sentence to the prototypes. The RNN backbone then captures the temporal pattern of the prototypes, which we refer to as prototype trajectories. The prototype trajectories enable an intuitive, fine-grained interpretation of how the model reached the final prediction, resembling the process of how humans analyze paragraphs. Experiments conducted on multiple public data sets reveal that the proposed method is not only more interpretable but also more accurate than the current state-of-the-art prototype-based method. Furthermore, we report a survey result indicating that human users find ProtoryNet more intuitive and easier to understand than the other prototype-based methods.

1. INTRODUCTION

Figure 1: Prototype trajectory-based explanation.

Recurrent neural networks (RNNs) have been widely adopted in natural language processing. RNNs achieve state-of-the-art performance by utilizing contextual information through a "memory" mechanism modeled via hidden/cell states. Despite this benefit, however, the memory mechanism obstructs the interpretation of model decisions: as hidden states are carried over time, various pieces of information get intertwined across time steps, making RNN models inherently "black boxes." The black-box nature of RNNs has motivated a body of research aiming to achieve interpretability. One approach is to leverage certain architectural designs in DNNs, such as attention-based methods. As will be discussed in Section 2, attention-based approaches (Karpathy et al., 2015; Strobelt et al., 2017; Choi et al., 2016; Guo et al., 2018) visualize the RNN using the attention mechanism, which weighs the importance of each hidden state element. However, while a few of these visualizations can be quite illuminating, the attention weights are not always intelligible. Rather, they often turn out to be a collection of numbers with little sensible interpretation. Indeed, recent research has argued that attention weights do not constitute explanations (Jain & Wallace, 2019). Furthermore, analyzing attention weights requires a certain level of understanding of how RNNs work in theory. Hence, a novice user may find them difficult to understand, limiting their broader use in real-world applications. The other approach is prototype-based (Ming et al., 2019), which uses prototypes to explain decisions more intuitively.
The process is analogous to how, for example, human doctors and judges make decisions on a new case by referring to similar previous cases: for a given sequence, a prototype-based approach looks up a few representative examples, or prototypes, from the data set and deduces a decision. From the interpretability standpoint, such prototypes provide intuitive clues and evidence of how the model has reached a conclusion, in a form that even a layperson can understand. However, the existing prototype-based methods find prototypes at the whole-paragraph level, making it difficult to break the analysis down to the individual sentence level, e.g., the connections and flows of the individual sentences constituting a paragraph. Moreover, there may not be a suitable prototype when a sequence is too long (e.g., a long paragraph): longer sequences have greater degrees of freedom and are thus harder to match to a prototype, as evidenced later in Section 4. Here, we advocate the idea that sentence-level prototyping (as opposed to the paragraph-level prototyping in the previous literature) produces more desirable outcomes, namely better interpretability and higher prediction accuracy. We propose a novel architecture, called ProtoryNet, in which we introduce a new concept of the prototype trajectory. Given one or more paragraphs, ProtoryNet looks up the nearest prototype for each sentence and computes the proximity. The prototype proximity values are then fed into an RNN backbone, which captures the latent patterns across sentences by means of the trajectory of prototypes. The prototype trajectory therefore illuminates the semantic structure of a text sequence and the logical flow therein, and hence provides a highly intuitive, useful interpretation of how the model has predicted an output, as can be witnessed in Figure 1.
In fact, prototype theory in modern linguistics supplies a strong justification for the proposed idea of the prototype trajectory. In prototype theory, linguists view "a sentence as the smallest linguistic unit that can be used to perform a complete action" (Alston, 1964) and analyze texts with individual sentences as building blocks. Linguists assume that the sentences of a category are distributed along a continuum: at one extreme of this continuum are sentences having a maximal number of common properties, while at the other extreme are sentences that have only one or very few of such properties (Panther & Köpcke, 2008). Here, the "ideal" sentence that possesses the maximally shared common properties can be considered a prototypical sentence, or a prototype, of the category. Thus, in some sense, this paper takes a meaningful first step towards mathematically formalizing prototype theory in modern linguistics and its analysis methods, by incorporating the above view in a computational framework and emulating how linguists analyze a text. As such, ProtoryNet permits a fine-grained understanding of sequence data alongside an intuitive explanation of the dynamics of the subsequences. In addition, since the technical details are hidden in the prototypes, a non-technical user can comprehend the interpretation. When necessary, however, technical users, i.e., those more knowledgeable about RNNs, can still inspect the coefficients in the RNN, similar to how attention approaches visualize RNNs; since the proximity vectors feeding the RNN backbone are essentially one-hot encoded (i.e., zero everywhere except the k-th position for prototype k), it is convenient to trace how the coefficients relate to each prototype.
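The mechanism described above, i.e., nearest-prototype lookup per sentence followed by an RNN over the resulting proximity vectors, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: it substitutes a plain tanh RNN for the LSTM backbone, uses a normalized negative-distance similarity as the proximity (which approaches one-hot as the hypothetical sharpness parameter `beta` grows), assumes sentence embeddings are precomputed, and uses hypothetical names throughout (`protorynet_forward`, `W_h`, `W_x`, `w_out`).

```python
import numpy as np

def protorynet_forward(sent_embs, prototypes, W_h, W_x, w_out, beta=10.0):
    """Sketch of a ProtoryNet-style forward pass (hypothetical, simplified).

    sent_embs  : (T, d) precomputed sentence embeddings for one paragraph
    prototypes : (K, d) learnable prototype vectors
    W_h, W_x   : (H, H) and (H, K) weights of a plain tanh RNN (stand-in
                 for the LSTM backbone in the paper)
    w_out      : (H,) weights of a sigmoid output layer
    beta       : sharpness; larger beta -> proximity closer to one-hot
    """
    # 1) Distance of every sentence to every prototype: (T, K)
    dists = np.linalg.norm(sent_embs[:, None, :] - prototypes[None, :, :], axis=-1)

    # 2) Proximity vectors: normalized similarities, near one-hot for large beta
    sims = np.exp(-beta * dists)
    prox = sims / sims.sum(axis=1, keepdims=True)          # (T, K)

    # The prototype trajectory: index of the nearest prototype per sentence
    traj = dists.argmin(axis=1)                            # (T,)

    # 3) RNN backbone over the proximity sequence
    h = np.zeros(W_h.shape[0])
    for x in prox:
        h = np.tanh(W_h @ h + W_x @ x)

    # 4) Prediction (e.g., sentiment probability) from the final hidden state
    p = 1.0 / (1.0 + np.exp(-(w_out @ h)))
    return p, traj
```

The trajectory `traj` is what a user would read as the explanation (one prototype per sentence), while `p` is the class probability produced by the backbone.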

2. RELATED WORK

In addition to model-agnostic black-box explainers such as LIME (Ribeiro et al., 2016) and SHAP (Lundberg & Lee, 2017), various post hoc explanation methods have been proposed for DNN models, such as Integrated Gradients (Sundararajan et al., 2017), DeepLift (Shrikumar et al., 2017), and NeuroX (Dalvi et al., 2019). Specifically, to understand RNN models, Tsang et al. (2018) propose hierarchical explanations for neural networks to capture interactions, and Jin et al. (2019) adapt the idea to text classification to quantify the importance of each word and phrase. For sentiment analysis, Murdoch et al. (2018) propose a contextual decomposition method for analyzing individual predictions made by standard LSTMs; the method is able to reliably identify words and phrases of contrasting sentiment, and how they are combined to yield the LSTM's final prediction. Beyond such external explanation methods, prior efforts to bring interpretability to RNNs can be categorized into attention-based and prototype-based approaches. Bahdanau, Cho, and Bengio (Bahdanau et al., 2014) proposed an encoder-decoder machine translation algorithm in which they implemented an attention mechanism in the decoder. By means of the alignment probabilities and the association energy, which reflect the importance of a given word in predicting a translated word, they let the attention mechanism weigh which part of the source sentence the model needs to pay attention to. This not only improved model performance by relieving the encoder of the burden of compressing all information about the source sentence into a fixed-length vector, but also inherently visualized how the translation was conducted, through the alignment matrix (e.g., Figure 3 of Bahdanau et al. (2014)). Similarly, Rocktäschel et al. (2015) analyzed word-to-word attention weights to gain insight into how a long short-term memory (LSTM) classifier reasons about entailment. Zhang et al. (2017) proposed a language model to read and explore discriminative

