SAGE: SEMANTIC-AWARE GLOBAL EXPLANATIONS FOR NAMED ENTITY RECOGNITION

Abstract

In recent decades, deep learning approaches have achieved impressive results in many research fields, such as Computer Vision and Natural Language Processing (NLP). NLP in particular has greatly benefited from unsupervised methods that allow learning distributed representations of language. In the race for better performance, Language Models have now reached hundreds of billions of parameters. Despite the remarkable results, deep models are still far from being fully exploited in real-world applications. Indeed, these approaches are black-boxes, i.e. they are neither interpretable by design nor explainable, which is often crucial for making decisions in business. Several task-agnostic methods have been proposed in the literature to explain models' decisions. Most techniques rely on the "local" assumption, i.e. explanations are produced example-wise. In this paper, instead, we present a post-hoc method to produce highly interpretable global rules to explain NLP classifiers. Rules are extracted with a data mining approach on a semantically enriched input representation, rather than words/wordpieces alone. Semantic information yields more abstract and general rules that are both more explanatory and less complex, while also better reflecting the model behaviour. In the experiments we focus on Named Entity Recognition, an NLP task where explainability is under-investigated. We explain the predictions of BERT NER classifiers trained on two popular benchmarks, CoNLL03 and Ontonotes, and compare our model against LIME (Ribeiro et al., 2016) and Decision Trees.

1. INTRODUCTION

In recent years, Artificial Intelligence (AI) algorithms, especially deep learning models, have emerged in many applications, reporting state-of-the-art performance in many fields. In NLP, for example, the use of Large Language Models (LLMs) based on huge deep neural networks has achieved impressive results in many linguistic tasks. However, despite the remarkable results, deep approaches are still far from being fully exploited in real-world applications. One major issue is the lack of interpretability and control over the models' predictions. This is an important requirement for many industrial applications, especially in domains like medicine, defense, finance and law, where it is crucial to understand the decisions and build trust in the algorithms. The increasing need to address the problem of interpretability and improve model transparency has made "Explainable Artificial Intelligence" a very popular research area in Computer Science. Explainable AI (XAI), also called Interpretable AI or Explainable Machine Learning (XML) (Guidotti et al., 2021), is a broad area of research that studies and proposes AI approaches where humans can understand the causes underlying the decisions and predictions made by the machine (Vilone & Longo, 2021b). AI algorithms can usually be grouped into two families (Vilone & Longo, 2021a): (a) white-box models, which include algorithms whose interpretation is given by design, and (b) black-box approaches where, on the other hand, the decision-making process is "opaque" and hard to understand. White-box models such as linear regression, probabilistic classifiers or decision trees are significantly easier to explain and interpret but often provide lower predictive capacity and are not always capable of modeling the inherent complexity of the task. In black-box models, on the other hand, very little knowledge is available on how the input variables influence the final decision.
The relationship between input and output is often the result of a complex composition of mathematical functions and is not directly interpretable. Although classic ML approaches are still widespread, almost all modern complex AI techniques, such as deep neural networks, are naturally opaque (Lipton, 2018). Thus, many new methods aimed at making models more explainable and interpretable have been proposed or are under investigation. Most explainability techniques rely on the "local" assumption, where descriptions are provided for each example, and few approaches aim to provide an interpretable description of the model as a whole (glass-box). In this paper, we present SAGE (Semantic-Aware Global Explanations), a method to produce highly interpretable global rules to explain NLP classifiers. Rules are extracted using a data mining algorithm which exploits a semantically enriched input representation. Semantic information yields more abstract and general rules that are both more explanatory and less complex, while also better reflecting the model behaviour. In the experiments, we focus on Named Entity Recognition, an NLP task where explainability is under-investigated. In particular, we aim to explain the predictions of a BERT-based (Devlin et al., 2018) NER classifier trained and tested on two popular benchmarks, CoNLL03 (Tjong Kim Sang & De Meulder, 2003) and Ontonotes (Pradhan et al., 2013). We compare the proposed model against LIME (Ribeiro et al., 2016), currently one of the most popular local explanation algorithms in NLP, and Decision Tree classifiers, which are classic explainable-by-design global models. For the assessment, we exploit two commonly used metrics: fidelity and complexity. The results show that the proposed approach infers a set of rules that reproduces the behavior of the model more accurately than both LIME and Decision Trees. The paper is organized as follows.
In Section 2, we summarize related work, while in Section 3 the proposed algorithm is described in detail. Experiments are reported in Section 4, and finally conclusions and future work are drawn in Section 5.
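To make the assessment concrete, the following is a minimal sketch of how a global rule set over semantically enriched token features could be applied and scored with the fidelity and complexity metrics mentioned above. The rule format, feature names, and helper functions are illustrative assumptions, not the paper's implementation.

```python
def rule_matches(rule, features):
    """A rule fires if all of its conditions appear among the token's features."""
    return rule["conditions"].issubset(features)

def apply_rules(rules, features, default="O"):
    """Return the label of the first matching rule (rules assumed ordered)."""
    for rule in rules:
        if rule_matches(rule, features):
            return rule["label"]
    return default

def fidelity(rules, dataset, blackbox_predictions):
    """Fraction of tokens where the rule set reproduces the black-box label."""
    agree = sum(
        apply_rules(rules, feats) == bb
        for feats, bb in zip(dataset, blackbox_predictions)
    )
    return agree / len(dataset)

def complexity(rules):
    """Total number of conditions across all rules (smaller = simpler)."""
    return sum(len(rule["conditions"]) for rule in rules)

# Toy example: semantically enriched token features instead of raw words.
rules = [
    {"conditions": {"POS=PROPN", "gazetteer=city"}, "label": "LOC"},
    {"conditions": {"POS=PROPN"}, "label": "PER"},
]
dataset = [
    {"POS=PROPN", "gazetteer=city", "cap=True"},  # e.g. "Paris"
    {"POS=PROPN", "cap=True"},                    # e.g. "Alice"
    {"POS=NOUN"},                                 # e.g. "table"
]
blackbox = ["LOC", "PER", "O"]
print(fidelity(rules, dataset, blackbox))  # → 1.0
print(complexity(rules))                   # → 3
```

Note how conditions on abstract features such as part-of-speech or gazetteer membership cover many surface forms at once, which is why semantic enrichment tends to yield shorter, more general rules than conditions on individual words.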

2. RELATED WORK

As deep learning models have become more complex, many methods have been proposed to interpret and explain the predictions of a model. Two main groups of XAI techniques exist: (1) local approaches, which aim to provide an interpretable explanation for each single prediction; (2) global approaches, which try to build a "white-box" version of the black-box model (thus interpretable by design). Local Explanations. Local algorithms focus on finding an interpretable explanation of the prediction returned by the ML model for a given input example. Several approaches focus on interpreting the internal components of a black-box model with the intent of shedding some light on its decision-making process. In (Csiszár et al., 2020), the authors propose the use of fuzzy logic to "explain" each Artificial Neural Network unit. Similar techniques try to identify which features in a particular input vector contribute the most to a neural network's decision. Layer-wise relevance propagation (LRP), for example, exploits a back-propagation-like algorithm to build a heat-map over the input features (Binder et al., 2016; Montavon et al., 2019). This method proved to be very effective in Computer Vision, highlighting which pixels of an image contributed most to the final classification. Other methods such as (Lundberg & Lee, 2017) and (Simonyan et al., 2013) follow the same principles. Such techniques, however, are strongly tied to the family of ML models used and can be employed only with neural networks. Furthermore, they usually do not produce high-level explanations. Some model-agnostic approaches were proposed in (Ribeiro et al., 2016; 2018; Strumbelj & Kononenko, 2010). In particular, in (Ribeiro et al., 2016), LIME (Local Interpretable Model-agnostic Explanations) is presented. LIME trains a white-box classifier that learns the black-box output distribution on a neighborhood of a given example. Neighbors are obtained by perturbing the input features (usually randomly).
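The perturbation loop just described can be sketched as follows. This is a simplified, dependency-free illustration, not the actual `lime` library: instead of LIME's weighted ridge-regression surrogate, each word is scored by the proximity-weighted difference in the black-box output when that word is kept versus masked, and the `predict` black box is an illustrative assumption.

```python
import random

def lime_style_explain(tokens, predict, n_samples=500, seed=0):
    """Score each token by its effect on a black-box text classifier,
    estimated from random word-masking perturbations of the input."""
    rng = random.Random(seed)
    kept_sum = [0.0] * len(tokens); kept_w = [0.0] * len(tokens)
    drop_sum = [0.0] * len(tokens); drop_w = [0.0] * len(tokens)
    for _ in range(n_samples):
        mask = [rng.random() < 0.5 for _ in tokens]   # randomly keep/drop each word
        text = " ".join(t for t, k in zip(tokens, mask) if k)
        score = predict(text)                         # query the black box
        w = sum(mask) / len(tokens)                   # proximity to the original text
        for i, kept in enumerate(mask):
            if kept:
                kept_sum[i] += w * score; kept_w[i] += w
            else:
                drop_sum[i] += w * score; drop_w[i] += w
    # Effect of word i = weighted mean score when kept minus when dropped.
    effect = [
        kept_sum[i] / max(kept_w[i], 1e-9) - drop_sum[i] / max(drop_w[i], 1e-9)
        for i in range(len(tokens))
    ]
    return sorted(zip(tokens, effect), key=lambda p: -abs(p[1]))

# Toy black box: high score only when "Paris" survives the perturbation.
predict = lambda text: 0.9 if "Paris" in text else 0.1
print(lime_style_explain(["Alice", "visited", "Paris"], predict)[0][0])  # → Paris
```

The sketch also makes the NER-specific pitfalls discussed below easy to see: masking acts on whole space-separated words, whereas a wordpiece tokenizer would split rare words into several units, and deleting words produces unnatural contexts for a contextual model.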
The explanation can then be constructed by selecting the input features that most affect the model prediction. Although these models have been widely applied in explainability problems, including NLP, some limitations occur when dealing with Large Language Models (LLMs) and Named Entity Recognition. LLMs tokenize text into subwords, like wordpieces (Schuster & Nakajima, 2012) in BERT, which requires particular attention in designing the text perturbation step. Masking and perturbation of the input introduce another issue for LLMs, which are contextual. Indeed, random masking of tokens can produce artificial and inconsistent contexts, which

