OPTIMIZING BI-ENCODER FOR NAMED ENTITY RECOGNITION VIA CONTRASTIVE LEARNING

Abstract

We present a bi-encoder framework for named entity recognition (NER), which applies contrastive learning to map candidate text spans and entity types into the same vector representation space. Prior work predominantly approaches NER as sequence labeling or span classification. We instead frame NER as a representation learning problem that maximizes the similarity between the vector representations of an entity mention and its type. This makes it easy to handle nested and flat NER alike, and can better leverage noisy self-supervision signals. A major challenge to this bi-encoder formulation for NER lies in separating non-entity spans from entity mentions. Instead of explicitly labeling all non-entity spans as the same class Outside (O) as in most prior methods, we introduce a novel dynamic thresholding loss, learned in conjunction with the standard contrastive loss. Experiments show that our method performs well in both supervised and distantly supervised settings, for nested and flat NER alike, establishing new state of the art across standard datasets in the general domain (e.g., ACE2004, ACE2005, CoNLL2003) and high-value verticals such as biomedicine (e.g., GENIA, NCBI, BC5CDR, JNLPBA). We release the code at github.com/microsoft/binder.

1. INTRODUCTION

Named entity recognition (NER) is the task of identifying text spans that refer to named entities and classifying them into a predefined set of entity types such as person, location, etc. As a fundamental component of information extraction systems (Nadeau & Sekine, 2007), NER has been shown to benefit various downstream tasks such as relation extraction (Mintz et al., 2009), coreference resolution (Chang et al., 2013), and fine-grained opinion mining (Choi et al., 2006).

Inspired by recent success in open-domain question answering (Karpukhin et al., 2020) and entity linking (Wu et al., 2020; Zhang et al., 2021a), we propose an efficient BI-encoder for NameD Entity Recognition (BINDER). Our model employs two encoders to separately map text and entity types into the same vector space, and it can reuse the vector representations of text across different entity types (and vice versa), resulting in faster training and inference. Based on the bi-encoder representations, we propose a unified contrastive learning framework for NER, which overcomes the limitations of popular NER formulations (shown in Figure 1), such as the difficulty of handling nested NER with sequence labeling (Chiu & Nichols, 2016; Ma & Hovy, 2016), the complex learning and inference of span-based classification (Yu et al., 2020; Fu et al., 2021), and the challenge of learning from noisy supervision (Straková et al., 2019; Yan et al., 2021).¹ Through contrastive learning, we encourage the representation of each entity type to be similar to the representations of its entity spans and dissimilar to those of other text spans. Additionally, existing work labels all non-entity tokens or spans as the same class Outside (O), which can introduce false negatives when the training data is partially annotated (Das et al., 2022; Aly et al., 2021).
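The contrastive objective can be sketched as follows. This is a minimal illustration using NumPy arrays as stand-ins for the two encoders' outputs; the function and variable names are ours for exposition, not BINDER's actual API:

```python
import numpy as np

def l2_normalize(x):
    """Unit-normalize rows so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def contrastive_loss(span_vecs, type_vecs, gold_types, temperature=0.07):
    """InfoNCE-style loss: each entity span should score its gold
    entity type higher than every other entity type."""
    sims = l2_normalize(span_vecs) @ l2_normalize(type_vecs).T / temperature
    sims = sims - sims.max(axis=1, keepdims=True)  # numerical stability
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    # negative log-likelihood of each span's gold type, averaged
    return -log_probs[np.arange(len(gold_types)), gold_types].mean()

# Toy stand-ins: 3 candidate spans, 2 entity types, 8-dim vectors.
rng = np.random.default_rng(0)
span_vecs = rng.normal(size=(3, 8))
type_vecs = rng.normal(size=(2, 8))
gold_types = np.array([0, 1, 0])  # gold type index per span
loss = contrastive_loss(span_vecs, type_vecs, gold_types)
```

Minimizing this loss pulls each span representation toward its gold type's representation and pushes it away from the other types, which is the behavior the paragraph above describes.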
We instead introduce a novel dynamic thresholding loss for contrastive learning, which learns candidate-specific thresholds to distinguish entity spans from non-entity ones. To the best of our knowledge, we are the first to optimize a bi-encoder for NER via contrastive learning. We conduct extensive experiments to evaluate our method in both supervised and distantly supervised settings.
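At inference time, the learned candidate-specific threshold takes the place of an explicit O class: a span is emitted only when its best type score clears that span's own threshold. A minimal sketch, in which the `thresholds` array stands in for the model-produced per-candidate threshold scores (the names are illustrative, not the paper's API):

```python
import numpy as np

def predict_entities(span_vecs, type_vecs, thresholds):
    """Return (span_index, type_index) pairs for spans whose best
    type similarity exceeds that span's own learned threshold.
    Spans below their threshold are treated as non-entities, so no
    explicit Outside (O) class is needed."""
    sims = span_vecs @ type_vecs.T             # (num_spans, num_types)
    best_type = sims.argmax(axis=1)
    best_score = sims[np.arange(len(sims)), best_type]
    return [(i, int(t))
            for i, (t, s) in enumerate(zip(best_type, best_score))
            if s > thresholds[i]]

# Toy example: span 0 clears its threshold, span 1 does not.
span_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])
type_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])
thresholds = np.array([0.5, 1.5])
preds = predict_entities(span_vecs, type_vecs, thresholds)  # → [(0, 0)]
```

Because the threshold is per-candidate rather than a single global cutoff, ambiguous spans can be held to a stricter standard than clear-cut mentions, which is what makes the approach robust to partially annotated training data.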
¹ Das et al. (2022) apply contrastive learning to NER in the few-shot setting. In this paper, we focus on supervised NER and distantly supervised NER.

