LEARNING UNCERTAINTY FOR UNKNOWN DOMAINS WITH ZERO-TARGET-ASSUMPTION

Abstract

We introduce Maximum-Entropy Rewarded Reinforcement Learning (MERRL), a framework that selects training data for more accurate Natural Language Processing (NLP). Because conventional data selection methods choose training samples based on knowledge of the test domain rather than on real-life data, they frequently fail in unseen domains such as patents and Twitter. Our approach selects training samples that maximize information uncertainty measured by entropy, including observation entropy such as empirical Shannon entropy, min-entropy, and Rényi entropy, as well as prediction entropy using mutual information, in order to cover more of the possible queries that may appear in unknown domains. MERRL, implemented with regularized A2C and SAC, achieves a perplexity reduction of up to 99.7 points (43.4% relative) in language modeling, an accuracy gain of up to 25.0 points (40.0% relative) in sentiment analysis, and an F1 gain of up to 5.0 points (30.8% relative) in named entity recognition across various domains, demonstrating strong generalization on unseen test sets.

1. INTRODUCTION

We introduce a novel training-set selection method that improves out-of-domain Natural Language Processing (NLP) model accuracy without requiring target-domain information. Machine learning is a data-driven process whose success relies heavily on the data in use. System performance is typically measured on a specific test set; in reality, however, the test domain is often unknown during model training, resulting in a critical performance gap between laboratory findings and language use in the real world. For example, systems reporting human-parity results often make surprising errors in real-life scenarios. Some work has addressed this discrepancy by augmenting or selecting data (Wang et al., 2022), but data optimization can be expensive and error-prone for general domains (Jha et al., 2020). Conventional approaches therefore choose critical in-domain data that may work well for a pre-defined target domain (Moore & Lewis, 2010; Kirchhoff & Bilmes, 2014; van der Wees et al., 2017; Fan et al., 2017; Qu et al., 2019; Liu et al., 2019; Kang et al., 2020). However, domain-specific data selection has two problems. First, shifting data toward one target domain may hurt performance on the source and other domains. Second, when target domains are unknown, as in most real-world applications, we do not know what data the model will receive after launch.

In our study, we select training data without using target-domain information in order to achieve learning generalization. Our data selection objective is to maximize the uncertainty of the training data. Specifically, we measure uncertainty with entropy, following the principle of maximum entropy, which states that, subject to known constraints, the probability distribution that best represents the current state of knowledge is the one with the largest entropy (Jaynes, 1957; Katz, 1967; Hernando et al., 2012).
Therefore, a system with the largest remaining uncertainty contains the fewest extra biases or uncalled-for assumptions and is ideal for modeling distributions of unknown test domains. To that end, we propose to measure the amount of uncertainty both in our observational data and in our model's prediction output. As observation entropy, we compute Shannon entropy, Rényi entropy, and min-entropy on the n-gram relative frequencies of all sentences in the dataset, rather than of a single sentence, to model the dependency among sentences. As prediction entropy, we compute the mutual information
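As a rough illustration (not the paper's implementation), the three observation-entropy measures can be computed over the relative frequencies of n-grams pooled across all sentences in a dataset. The toy corpus, whitespace tokenization, and Rényi order α = 2 below are illustrative assumptions:

```python
import math
from collections import Counter

def ngram_distribution(sentences, n=2):
    """Relative frequencies of n-grams pooled over all sentences in the dataset."""
    counts = Counter()
    for sent in sentences:
        tokens = sent.split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    total = sum(counts.values())
    return [c / total for c in counts.values()]

def shannon_entropy(p):
    """H(p) = -sum_i p_i log2 p_i."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def renyi_entropy(p, alpha=2.0):
    """Rényi entropy of order alpha (alpha != 1); recovers Shannon as alpha -> 1."""
    return math.log2(sum(pi ** alpha for pi in p)) / (1.0 - alpha)

def min_entropy(p):
    """Min-entropy, the limit of Rényi entropy as alpha -> infinity."""
    return -math.log2(max(p))

# Toy corpus: entropy is computed on the pooled bigram distribution,
# not per sentence.
corpus = ["the cat sat on the mat", "a dog ran in the park"]
p = ngram_distribution(corpus, n=2)
```

For any distribution, Shannon entropy upper-bounds the order-2 Rényi entropy, which in turn upper-bounds min-entropy, so the three measures offer progressively more conservative estimates of the uncertainty that a selected training set covers.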

