INTERPRETABLE SEQUENCE CLASSIFICATION VIA PROTOTYPE TRAJECTORY

Abstract

We propose a novel interpretable recurrent neural network (RNN) model, called ProtoryNet, in which we introduce the new concept of prototype trajectories. Motivated by prototype theory in modern linguistics, ProtoryNet makes a prediction by finding the most similar prototype for each sentence in a text sequence and feeding an RNN backbone with the proximities of the sentences to the prototypes. The RNN backbone then captures the temporal pattern of the prototypes, which we refer to as the prototype trajectory. Prototype trajectories enable intuitive, fine-grained interpretation of how the model reached its final prediction, resembling the process by which humans analyze paragraphs. Experiments on multiple public data sets reveal that the proposed method is not only more interpretable but also more accurate than the current state-of-the-art prototype-based method. Furthermore, we report survey results indicating that human users find ProtoryNet more intuitive and easier to understand than other prototype-based methods.

1. INTRODUCTION

Figure 1: Prototype trajectory-based explanation.

Recurrent neural networks (RNNs) have been widely adopted in natural language processing. RNNs achieve state-of-the-art performance by exploiting contextual information through a "memory" mechanism modeled via hidden/cell states. This memory mechanism, however, obstructs the interpretation of model decisions: as hidden states are carried over time, various pieces of information become intertwined across time steps, making RNN models inherently black boxes. The black-box nature of RNNs has motivated a body of research aiming to achieve interpretability. One approach leverages specific architectural designs, such as attention mechanisms. As discussed in Section 2, attention-based approaches (Karpathy et al., 2015; Strobelt et al., 2017; Choi et al., 2016; Guo et al., 2018) visualize the RNN via attention weights, which quantify the importance of each hidden state element. While some of these visualizations can be illuminating, the attention weights are not always intelligible; they often amount to a collection of numbers without a sensible interpretation. Indeed, recent research has argued that attention weights are not explanations (Jain & Wallace, 2019). Furthermore, analyzing attention weights requires some understanding of how RNNs work internally, so novice users may find them difficult to interpret, which limits their broader use in real-world applications. The other approach is prototype-based (Ming et al., 2019), which uses prototypes to explain decisions more intuitively.
The process is analogous to how, for example, doctors and judges decide a new case by referring to similar previous cases: for a given sequence, a prototype-based approach looks up a few representative examples, or prototypes, from the data set and deduces a decision from them. From an interpretability standpoint, these prototypes provide intuitive clues and evidence of how the model reached a conclusion, in a form that even a layperson can understand.
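The sentence-to-prototype matching described above can be illustrated with a minimal NumPy sketch. All dimensions, weights, and the RBF similarity kernel below are illustrative assumptions, not the paper's actual implementation: each sentence embedding is compared to every prototype, the resulting similarity vectors form the sequence fed to an RNN backbone, and the per-sentence argmax gives the prototype trajectory.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 sentences per document, 3 learned prototypes,
# 8-dimensional sentence embeddings (placeholder values only).
n_sentences, n_prototypes, d = 4, 3, 8

sentence_embs = rng.normal(size=(n_sentences, d))  # sentence-encoder outputs
prototypes = rng.normal(size=(n_prototypes, d))    # learned prototype vectors

# Similarity of each sentence to each prototype, here an RBF kernel on
# squared Euclidean distance (one common choice; the exact kernel is a
# modeling decision).
dists = ((sentence_embs[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
similarities = np.exp(-dists)  # shape: (n_sentences, n_prototypes), values in (0, 1]

# The prototype trajectory: the closest prototype for each sentence, in order.
trajectory = similarities.argmax(axis=1)

# A minimal RNN backbone over the similarity vectors (plain tanh cell with
# random weights, for illustration only).
hidden_dim = 5
W_h = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
W_x = rng.normal(size=(hidden_dim, n_prototypes)) * 0.1
h = np.zeros(hidden_dim)
for s in similarities:
    h = np.tanh(W_h @ h + W_x @ s)

# A linear classification head on the final hidden state h would then
# produce the prediction; the trajectory is what a user inspects.
print(trajectory, h.shape)
```

The trajectory is a sequence of prototype indices, one per sentence, which is what makes the explanation readable: each index can be displayed as the prototype sentence it stands for.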

