A MULTI-GRAINED SELF-INTERPRETABLE SYMBOLIC-NEURAL MODEL FOR SINGLE/MULTI-LABELED TEXT CLASSIFICATION

Abstract

Deep neural networks built on layer-stacking architectures have historically suffered from poor inherent interpretability. Symbolic probabilistic models, in contrast, are clearly interpretable, but how to combine them with neural networks to improve performance remains underexplored. In this paper, we marry these two systems for text classification via a structured language model. We propose a Symbolic-Neural model that learns to explicitly predict class labels of text spans from a constituency tree without requiring any access to span-level gold labels. Because the structured language model learns to predict constituency trees in a self-supervised manner, only raw texts and sentence-level labels are required as training data, which makes our approach essentially a general constituent-level self-interpretable classification model. Our experiments demonstrate that the approach achieves good prediction accuracy on downstream tasks, while the predicted span labels are consistent with human rationales to a certain degree.

1. INTRODUCTION

Lack of interpretability is an intrinsic problem of layer-stacking deep neural networks for text classification. Many methods have been proposed to provide post-hoc explanations for neural networks (Lipton, 2018; Lundberg & Lee, 2017; Sundararajan et al., 2017). However, these methods have several drawbacks. First, they offer only word-level attribution, with no higher-level attribution over phrases and clauses. Take sentiment analysis as an example: in addition to recognizing the sentiment of a sentence, an ideal interpretable model should be able to identify sentiment and polarity reversal at the level of words, phrases, and clauses. Second, as argued by Rudin (2019), models should be inherently interpretable rather than explained by a post-hoc model.

A widely accepted property of natural languages is that "the meaning of a whole is a function of the meanings of the parts and of the way they are syntactically combined" (Partee, 1995). Compared with the sequential outputs of layer-stacked model architectures, syntactic tree structures naturally capture features at various levels, because each node in a tree represents a constituent span. This characteristic motivates us to ask whether the representations of these internal nodes could be leveraged to design an inherently constituent-level interpretable model. One challenge for this idea is that traditional syntactic parsers require supervised training and degrade on out-of-domain data. Fortunately, with the development of structured language models (Tu et al., 2013; Maillard et al., 2017; Choi et al., 2018; Kim et al., 2019), we can now learn hierarchical syntactic structures in an unsupervised manner from any raw text. In this paper, we propose a general self-interpretable text classification model that learns to predict span-level labels without span-level supervision, as shown in Figure 1.
Specifically, we propose a novel label extraction framework based on a simple inductive bias for inference. During training, we maximize, via dynamic programming, the total probability of all candidate trees whose extracted labels are consistent with the gold label set.
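Summing the probabilities of all candidate trees is a classic inside-algorithm computation. The sketch below illustrates the underlying dynamic program in a simplified setting: it sums the scores of all binary trees over a sentence, where a tree's score is the product of its span scores. The `span_score` table and the omission of the label-consistency constraint are illustrative assumptions, not the paper's actual model.

```python
def inside_sum(span_score, n):
    """Inside algorithm: total score of all binary trees over n words.

    span_score[i][j] is an (unnormalized) score for span [i, j);
    a tree's score is the product of the scores of its spans.
    """
    # inside[i][j] = summed score of all binary trees covering words [i, j)
    inside = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        inside[i][i + 1] = span_score[i][i + 1]  # single-word spans
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # sum over all split points k: left subtree * right subtree
            total = sum(inside[i][k] * inside[k][j] for k in range(i + 1, j))
            inside[i][j] = span_score[i][j] * total
    return inside[0][n]

# With all span scores equal to 1, the result counts binary trees,
# i.e. the Catalan number C(n-1); for n = 4 words there are 5 trees.
n = 4
uniform = [[1.0] * (n + 1) for _ in range(n + 1)]
print(inside_sum(uniform, n))  # 5.0
```

In the paper's setting, the same recursion would additionally restrict the summation to trees whose extracted labels match the gold label set, which adds a label dimension to the chart but leaves the O(n^3) structure of the dynamic program unchanged.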



Figure 1: Our model can learn to predict span-level labels without access to span-level gold labels during training. In examples (a) and (b), only raw texts and sentence-level gold labels {request address, navigate} and {negative} are given.

