EMPIRICAL ANALYSIS OF UNLABELED ENTITY PROBLEM IN NAMED ENTITY RECOGNITION

Abstract

In many scenarios, named entity recognition (NER) models severely suffer from the unlabeled entity problem, where the entities of a sentence may not be fully annotated. Through empirical studies performed on synthetic datasets, we find two causes of performance degradation. One is the reduction of annotated entities and the other is treating unlabeled entities as negative instances. The first cause has less impact than the second and can be mitigated by adopting pretrained language models. The second cause seriously misguides a model in training and greatly affects its performance. Based on the above observations, we propose a general approach that can almost eliminate the misguidance brought by unlabeled entities. The key idea is to use negative sampling that, to a large extent, avoids training NER models with unlabeled entities. Experiments on synthetic datasets and real-world datasets show that our model is robust to the unlabeled entity problem and surpasses prior baselines. On well-annotated datasets, our model is competitive with the state-of-the-art method.¹

1. INTRODUCTION

Named entity recognition (NER) is an important task in information extraction. Previous methods typically cast it as a sequence labeling problem by adopting the IOB tagging scheme (Mesnil et al., 2015; Huang et al., 2015; Ma & Hovy, 2016; Akbik et al., 2018; Qin et al., 2019). A representative model is Bi-LSTM CRF (Lample et al., 2016). The great success achieved by these methods benefits from massive correctly labeled data. However, in some real scenarios, not all the entities in the training corpus are annotated. For example, in some NER tasks (Ling & Weld, 2012), the datasets contain too many entity types or a mention may be associated with multiple labels. Since manual annotation under these conditions is too hard, some entities are inevitably neglected by human annotators. Situations in distantly supervised NER (Ren et al., 2015; Fries et al., 2017) are even more serious. To reduce handcrafted annotation, distant supervision (Mintz et al., 2009) is applied to automatically produce labeled data. As a result, large amounts of entities in the corpus are missed due to the limited coverage of knowledge resources. We refer to this as the unlabeled entity problem, which largely degrades the performance of NER models.

Several approaches have been used in prior work to alleviate this problem. Fuzzy CRF and AutoNER (Shang et al., 2018b) allow models to learn from phrases that may be potential entities. However, since these phrases are obtained through a distantly supervised phrase mining method (Shang et al., 2018a), many unlabeled entities in the training data may still not be recalled. In the context of only resorting to unlabeled corpora and an entity ontology, Mayhew et al. (2019) and Peng et al. (2019) employ positive-unlabeled (PU) learning (Li & Liu, 2005) to unbiasedly and consistently estimate the task loss. In their implementations, they build distinct binary classifiers for different labels. Nevertheless, the unlabeled entities still impact the classifiers of the corresponding entity types and, importantly, the model cannot disambiguate neighboring entities. Partial CRF (Tsuboi et al., 2008) is an extension of the commonly used CRF (Lafferty et al., 2001) that supports learning from incomplete annotations. Yang et al. (2018), Nooralahzadeh et al. (2019), and Jie et al. (2019) use it to circumvent training with false negatives. However, as fully annotated corpora are still required to obtain ground-truth training negatives, this approach is not applicable to situations where little or even no high-quality data is available.
In this work, our goal is to study the impacts of the unlabeled entity problem on NER models and how to effectively eliminate them. Initially, we construct synthetic datasets and introduce degradation rates. The datasets are constructed by randomly removing the annotated named entities of well-annotated datasets, e.g., CoNLL-2003 (Sang & De Meulder, 2003), with different probabilities. The degradation rates measure how severely the unlabeled entity problem degrades the performance of models. Extensive studies are conducted on the synthetic datasets. We find two causes of degradation: the reduction of annotated entities and treating unlabeled entities as negative instances. The first cause is obvious but has far less influence than the second; moreover, it can be mitigated well by using a pretrained language model, such as BERT (Devlin et al., 2019), as the sentence encoder. The second cause seriously misleads models in training and exerts a great negative impact on their performance: even in mild cases, it can sharply reduce the F1 score by about 20%. Based on the above observations, we propose a novel method that is capable of eliminating the misguidance of unlabeled entities in training. The core idea is to apply negative sampling that avoids training NER models with unlabeled entities. Extensive experiments have been conducted to verify the effectiveness of our approach. Studies on synthetic datasets and real-world datasets (e.g., EC) show that our model handles unlabeled entities well and notably surpasses prior baselines. On well-annotated datasets (e.g., CoNLL-2003), our model is competitive with the state-of-the-art method.
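The synthetic-dataset construction described above, randomly dropping annotated entities with some probability, can be sketched as follows. This is a minimal illustration, not the authors' released code; the function name and data format are assumptions.

```python
import random

def mask_entities(dataset, p, seed=0):
    """Simulate the unlabeled entity problem: independently drop each
    annotated entity with probability p, leaving the sentences unchanged.

    `dataset` is a list of (sentence, entities) pairs, where `entities`
    is a set of (start, end, label) span tuples.
    """
    rng = random.Random(seed)  # fixed seed so each synthetic set is reproducible
    masked = []
    for sentence, entities in dataset:
        kept = {e for e in entities if rng.random() >= p}
        masked.append((sentence, kept))
    return masked

corpus = [(["Jack", "and", "Mary"], {(1, 1, "PER"), (3, 3, "PER")})]
assert mask_entities(corpus, p=0.0) == corpus        # p = 0: nothing removed
assert mask_entities(corpus, p=1.0)[0][1] == set()   # p = 1: all entities removed
```

Varying `p` over a grid yields the family of synthetic datasets on which degradation rates can be measured.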

2. PRELIMINARIES

In this section, we formally define the unlabeled entity problem and briefly describe a strong baseline, BERT Tagging (Devlin et al., 2019), used in our empirical studies.

2.1. UNLABELED ENTITY PROBLEM

We denote an input sentence as x = [x_1, x_2, ..., x_n] and the annotated named entity set as y = {y_1, y_2, ..., y_m}, where n is the sentence length and m is the number of entities. Each member y_k of the set y is a tuple (i_k, j_k, l_k), in which (i_k, j_k) is the span of an entity corresponding to the phrase x_{i_k:j_k} = [x_{i_k}, x_{i_k+1}, ..., x_{j_k}] and l_k is its label. The unlabeled entity problem arises when, due to the limited coverage of a machine annotator or the negligence of human annotators, some ground-truth entities of the sentence x are not covered by the annotated entity set y. For instance, given a sentence x = [Jack, and, Mary, are, from, New, York] and a labeled entity set y = {(1, 1, PER)}, the unlabeled entity problem is that some entities, like (6, 7, LOC), are neglected by annotators. These unlabeled entities are denoted as ŷ = {(3, 3, PER), (6, 7, LOC)}.
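The notation above can be made concrete with a small example. The variable names below are illustrative; the paper itself only defines the mathematical objects.

```python
# Entities are (start, end, label) tuples with 1-based, inclusive spans,
# mirroring the (i_k, j_k, l_k) notation of Section 2.1.

x = ["Jack", "and", "Mary", "are", "from", "New", "York"]

y_labeled = {(1, 1, "PER")}                # annotated entity set y
y_missed = {(3, 3, "PER"), (6, 7, "LOC")}  # unlabeled entities ŷ

def span_text(sentence, start, end):
    """Recover the phrase x_{i:j} for a 1-based inclusive span (i, j)."""
    return " ".join(sentence[start - 1:end])

assert span_text(x, 1, 1) == "Jack"
assert span_text(x, 6, 7) == "New York"

# The ground truth is the union; the model only ever sees y_labeled.
y_truth = y_labeled | y_missed
```

Treating every span outside `y_labeled` as a negative instance would wrongly penalize "Mary" and "New York", which is exactly the second cause of degradation studied in this paper.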

2.2. BERT TAGGING

BERT Tagging, presented in Devlin et al. (2019), adopts the IOB tagging scheme, where each token x_i in a sentence x is labeled with a fine-grained tag, such as B-ORG, I-LOC, or O. Formally, its output is an n-length label sequence z = [z_1, z_2, ..., z_n]. BERT Tagging first uses BERT to get the representation h_i for every token x_i:

[h_1, h_2, ..., h_n] = BERT(x).    (1)

Then, the label distribution q_i is computed as Softmax(W h_i). In training, the loss is induced as Σ_{1≤i≤n} −log q_i[z_i]. At test time, the label for each token x_i is obtained by arg max over q_i.
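The tagging head and its loss can be sketched in a few lines. This is a minimal NumPy illustration in which a random projection stands in for the BERT encoder; the shapes and names are assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    """Row-wise softmax, numerically stabilized."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Stand-in for Eq. (1): random per-token representations h_i
# instead of [h_1, ..., h_n] = BERT(x).
n, hidden, num_tags = 3, 16, 5
H = rng.normal(size=(n, hidden))
W = rng.normal(size=(hidden, num_tags))

Q = softmax(H @ W)                            # q_i = Softmax(W h_i)
gold = np.array([0, 1, 2])                    # gold IOB tag indices z_i

loss = -np.log(Q[np.arange(n), gold]).sum()   # sum_i -log q_i[z_i]
pred = Q.argmax(axis=-1)                      # arg max decoding at test time

assert Q.shape == (n, num_tags) and loss > 0
```

Under the unlabeled entity problem, the gold indices for missed entities are O tags, so this cross-entropy loss actively pushes the model away from the correct entity labels.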

3. EMPIRICAL STUDIES

To understand the impacts of the unlabeled entity problem, we conduct empirical studies over multiple synthetic datasets, different methods, and various metrics.



¹Our source code is available at https://github.com/LeePleased/NegSampling-NER.




