DISTANTLY SUPERVISED RELATION EXTRACTION IN FEDERATED SETTINGS

Abstract

Distant supervision is widely used in relation extraction to create a large-scale training dataset by aligning a knowledge base with unstructured text. Most existing studies in this field have assumed that a great deal of centralized unstructured text is available. In practice, however, text may be distributed on different platforms and cannot be centralized due to privacy restrictions. Therefore, it is worthwhile to investigate distant supervision in the federated learning paradigm, which decouples the training of the model from the need for direct access to the raw text. However, overcoming the label noise of distant supervision becomes more difficult in federated settings, because the sentences containing the same entity pair are scattered across different platforms. In this paper, we propose a federated denoising framework to suppress label noise in federated settings. The core of this framework is a multiple instance learning based denoising method that selects reliable sentences via cross-platform collaboration. Various experimental results on the New York Times dataset and the miRNA gene regulation relation dataset demonstrate the effectiveness of the proposed method.

1. INTRODUCTION

Relation extraction (RE) aims to mine factual knowledge from free text by labeling relations between entity mentions, which is a crucial step in knowledge base (KB) construction. For example, given the sentence "[Steve Jobs] e1 and Wozniak co-founded [Apple] e2 in 1976", a relation extractor should identify that "Steve Jobs" and "Apple" are in a "Founder" relationship. Most existing supervised RE systems, such as Zeng et al. (2014); Zhang & Wang (2015); Wang et al. (2016); Zhou et al. (2016), rely on a large-scale manually annotated training dataset, which is extremely expensive to build and cannot cover all domains. To ease the reliance on annotated data, Mintz et al. (2009) proposed distant supervision, which automatically generates training data by heuristically aligning a KB with unstructured text. The key assumption of distant supervision is that if two entities have a relation in the KB, then all sentences that mention these two entities will express this relation. Since then, a rich literature has been devoted to this topic, such as Riedel et al. (2010); Hoffmann et al. (2011); Zeng et al. (2015); Lin et al. (2016); Ye & Ling (2019); Yuan et al. (2019).

Though this progress is exciting, distant supervision approaches have so far been limited to the centralized learning paradigm, which assumes that a great deal of text is easily accessible. In practice, however, text may be distributed across different platforms and heavily entangled with sensitive personal information, especially in the healthcare and financial fields (Yang et al., 2019; Zerka et al., 2020; Chamikara et al., 2020). Due to privacy restrictions, it is almost impossible or cost-prohibitive to centralize text from multiple platforms. Recently, federated learning (McMahan et al., 2016) has emerged as a compelling solution for learning a model from decentralized and privacy-sensitive data.
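The heuristic alignment described above can be sketched in a few lines; the function and variable names (`label_sentences`, `kb_facts`) are illustrative, not from the paper.

```python
def label_sentences(sentences, kb_facts):
    """Label each sentence by aligning its entity pair with the KB.

    sentences: list of (text, head_entity, tail_entity) tuples.
    kb_facts:  dict mapping (head, tail) -> relation name.
    Any sentence whose entity pair is absent from the KB is labeled "NA".
    """
    labeled = []
    for text, head, tail in sentences:
        relation = kb_facts.get((head, tail), "NA")
        labeled.append((text, head, tail, relation))
    return labeled

kb = {("Steve Jobs", "Apple"): "Founder"}
corpus = [
    ("Steve Jobs and Wozniak co-founded Apple in 1976.", "Steve Jobs", "Apple"),
    ("Steve Jobs unveiled the Apple iPhone on stage.", "Steve Jobs", "Apple"),
]
data = label_sentences(corpus, kb)
# Both sentences receive the "Founder" label, although only the first
# actually expresses the relation -- this is exactly the label noise
# discussed below.
```

This also makes the source of label noise concrete: the second sentence mentions the entity pair without expressing the "Founder" relation, yet distant supervision labels it positive anyway.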
The main idea behind federated learning is that each platform trains a local model on its own local data, and a master server coordinates many platforms to collaboratively train a global model by aggregating these local model updates. Unfortunately, directly applying federated learning to decentralized distantly supervised data fails, because conventional federated learning requires the local data to come with noise-free labels (Tuor et al., 2020). In distant supervision, however, automatic labeling is inevitably accompanied by label noise (Riedel et al., 2010; Hoffmann et al., 2011; Zeng et al., 2015; Lin et al., 2016), which means not all sentences that mention an entity pair express the relation between them. Training on such noisy data substantially hinders the performance of the RE model.

Moreover, even applying previous denoising methods, such as Zeng et al. (2015); Lin et al. (2016); Ye & Ling (2019), cannot handle label noise well in federated settings. This point can be illustrated by the example in Figure 1. S1 and S2 contain the same entity pair ("Steve Jobs", "Apple") but are distributed on two platforms. S1 is a true positive instance, while S2 is a false positive instance that does not express the "Founder" relation. In centralized training, there is no barrier between Platform 1 and Platform 2; therefore, considering S1 and S2 simultaneously can easily filter out the noise by selecting only S1 (Zeng et al., 2015) or placing a small weight on S2 (Lin et al., 2016; Ye & Ling, 2019). However, raw data exchange between platforms is prohibited in federated settings. Lacking the comparison with S1, previous denoising methods would mistakenly regard S2 as a true positive instance. As a result, S2 is retained and poisons the local model on Platform 2, which in turn degrades the global model. To suppress label noise in federated settings, we propose a federated denoising framework in this paper.
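The aggregation step described above is typically realized by federated averaging (McMahan et al., 2016); a minimal sketch, with illustrative names, weighting each platform's update by its local data size:

```python
import numpy as np

def fed_avg(local_weights, num_examples):
    """Aggregate local model weights on the master server.

    local_weights: list of dicts mapping parameter name -> np.ndarray,
                   one dict per platform.
    num_examples:  list of local dataset sizes, used as averaging weights.
    """
    total = sum(num_examples)
    aggregated = {}
    for name in local_weights[0]:
        aggregated[name] = sum(
            w[name] * (n / total) for w, n in zip(local_weights, num_examples)
        )
    return aggregated

# Two platforms with a single parameter vector "w"; platform 2 holds
# three times as much data, so its update dominates the average.
local = [{"w": np.array([1.0, 2.0])}, {"w": np.array([3.0, 4.0])}]
global_w = fed_avg(local, num_examples=[1, 3])
# global_w["w"] == [2.5, 3.5]
```

Note that only model parameters cross the platform boundary; the raw sentences never leave their platform, which is precisely why the server cannot directly compare S1 with S2.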
The core of this framework is a multiple instance learning (MIL) (Dietterich et al., 1997; Maron & Lozano-Pérez, 1998) based denoising algorithm, called Lazy MIL, which is executed only at the beginning of each communication round and then rests until the next round. Since the sentences containing the same entity pair are scattered across different platforms, the Lazy MIL algorithm coordinates multiple platforms to jointly select reliable sentences. Once sentences have been selected, they are used repeatedly to train local models until the end of the round. In summary, the contributions of this paper are:

• Considering data decentralization and privacy protection, we investigate distant supervision under the federated learning paradigm, which decouples model training from the need for direct access to the raw data. To the best of our knowledge, combining federated learning with distant supervision is still unexplored territory, and it is the main focus of this paper.

• Since automatic labeling in distant supervision is inevitably accompanied by label noise, we present a multiple instance learning based denoising method, which can select reliable instances via cross-platform collaboration.

• The proposed method yields promising results on two benchmark datasets, and we perform various experiments to verify its effectiveness. The code will be released at http://anonymized.
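The cross-platform selection idea can be sketched as follows, assuming each platform shares only per-sentence confidence scores (never raw text) with the server, which then keeps the globally best sentence per bag. All names are illustrative and the paper's exact protocol may differ.

```python
def select_reliable(platform_scores):
    """Coordinate platforms to select one reliable sentence per bag.

    platform_scores: {platform_id: {bag_id: [score per local sentence]}},
                     where a bag groups sentences sharing an entity pair.
    Returns {platform_id: {bag_id: local index of the kept sentence}};
    a platform keeps a bag's sentence only if it holds the global maximum.
    """
    best = {}  # bag_id -> (score, platform_id, sentence_index)
    for pid, bags in platform_scores.items():
        for bag_id, scores in bags.items():
            for idx, score in enumerate(scores):
                if bag_id not in best or score > best[bag_id][0]:
                    best[bag_id] = (score, pid, idx)
    selected = {pid: {} for pid in platform_scores}
    for bag_id, (_, pid, idx) in best.items():
        selected[pid][bag_id] = idx
    return selected

# Figure 1's scenario: S1 (true positive) scores higher than S2 (false
# positive), so only Platform 1 retains a sentence for this bag.
scores = {
    "platform1": {("Steve Jobs", "Apple"): [0.9]},  # S1
    "platform2": {("Steve Jobs", "Apple"): [0.3]},  # S2
}
selected = select_reliable(scores)
```

Because only scalar scores are exchanged, this collaboration stays within the federated constraint that raw sentences never leave their platform.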

2. RELATED WORK

In this section, we will briefly review the recent progress in distant supervision and some existing studies in federated learning.

Distant supervision.

Relation extraction is the task of mining factual knowledge from free text by labeling relations between entity mentions. To alleviate the dependence of supervised methods on annotated data, Mintz et al. (2009) proposed distant supervision, which uses a knowledge base to annotate a large-scale dataset automatically. However, automatic labeling is inevitably accompanied by label noise. To deal with this noise, most distantly supervised approaches (Riedel et al., 2010; Hoffmann et al., 2011; Surdeanu et al., 2012; Zeng et al., 2015; Lin et al., 2016; Luo et al., 2017; Ye & Ling, 2019; Yuan et al., 2019) focus on reducing label noise at the bag level, where a "bag" is the set of sentences containing the same entity pair. These studies fall under the multiple instance learning framework, which assumes that at least one sentence in a bag expresses the relation. Another line of work aims to reduce label noise at the sentence level.
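In centralized settings, the at-least-one assumption is often operationalized by keeping only the sentence that the current model scores highest within a bag (as in Zeng et al., 2015); a minimal illustrative sketch:

```python
def select_max_sentence(bag_scores):
    """Return the index of the sentence assumed to express the relation.

    bag_scores: list of model confidences, one per sentence in the bag.
    Under the at-least-one assumption, the most confident sentence is
    taken to represent the whole bag during training.
    """
    return max(range(len(bag_scores)), key=bag_scores.__getitem__)

# The third sentence is the most confident, so it represents the bag.
idx = select_max_sentence([0.2, 0.1, 0.8])
# idx == 2
```

This works only when the whole bag is visible to one trainer; in federated settings a bag's sentences may be split across platforms, which is the gap the proposed method addresses.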







Figure 1: An example of sentences that contain the same entity pair distributed on two platforms. The triple (Steve Jobs, Founder, Apple) is a fact in the KB.

