K-PLUG: KNOWLEDGE-INJECTED PRE-TRAINED LANGUAGE MODEL FOR NATURAL LANGUAGE UNDERSTANDING AND GENERATION

Abstract

Existing pre-trained language models (PLMs) have demonstrated the effectiveness of self-supervised learning for a broad range of natural language processing (NLP) tasks. However, most of them are not explicitly aware of domain-specific knowledge, which is essential for downstream tasks in many domains, such as tasks in e-commerce scenarios. In this paper, we propose K-PLUG, a knowledge-injected pre-trained language model based on the encoder-decoder transformer that can be transferred to both natural language understanding and generation tasks. We verify our method in a diverse range of e-commerce scenarios that require domain-specific knowledge. Specifically, we propose five knowledge-aware self-supervised pre-training objectives to formulate the learning of domain-specific knowledge, including e-commerce domain-specific knowledge bases, aspects of product entities, categories of product entities, and unique selling propositions of product entities. K-PLUG achieves new state-of-the-art results on a suite of domain-specific NLP tasks, including product knowledge base completion, abstractive product summarization, and multi-turn dialogue, significantly outperforming baselines across the board, which demonstrates that the proposed method effectively learns a diverse set of domain-specific knowledge for both language understanding and generation tasks. The code, data, and models will be publicly available.¹

1. INTRODUCTION

Pre-trained language models (PLMs), such as ELMo (Peters et al., 2018), GPT (Radford et al., 2018), BERT (Devlin et al., 2019), RoBERTa (Liu et al., 2019), and XLNet (Yang et al., 2019), have made remarkable breakthroughs in many natural language understanding (NLU) tasks, including text classification, reading comprehension, and natural language inference. These models are trained on large-scale text corpora with self-supervision based on either bi-directional or auto-regressive pre-training. Equally promising performances have been achieved in natural language generation (NLG) tasks, such as machine translation and text summarization, by MASS (Song et al., 2019), UniLM (Dong et al., 2019), BART (Lewis et al., 2020), T5 (Raffel et al., 2019), PEGASUS (Zhang et al., 2020), and ProphetNet (Yan et al., 2020). In contrast to the encoder-only or decoder-only models above, these approaches adopt Transformer-based sequence-to-sequence models to jointly pre-train both the encoder and the decoder. While these PLMs can learn rich semantic patterns from raw text data and thereby enhance downstream NLP applications, many of them do not explicitly model domain-specific knowledge. As a result, they may be insufficient for capturing human-curated or domain-specific knowledge that is necessary for tasks in a certain domain, such as tasks in e-commerce scenarios. In order to overcome this limitation, several recent studies have proposed to enrich PLMs with explicit knowledge, including knowledge bases (KBs) (Zhang et al., 2019; Peters et al., 2019; Xiong et al., 2020; Wang et al., 2019; 2020), lexical relations (Lauscher et al., 2019; Wang et al., 2020), word senses (Levine et al., 2020), part-of-speech tags (Ke et al., 2019), and sentiment polarity (Ke et al., 2019; Tian et al., 2020). However, these methods only integrate domain-specific knowledge into the encoder, and the decoding process in many NLG tasks benefits little from this knowledge.
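To give intuition for how a knowledge-aware self-supervised objective differs from ordinary token-level masking, the sketch below masks whole knowledge-bearing spans (e.g., product-aspect mentions) so the model must recover them from context. This is a minimal illustration of the general span-masking idea, not K-PLUG's actual implementation; the function name, toy tokens, and span format are all assumptions made for the example.

```python
import random

MASK = "[MASK]"

def mask_knowledge_spans(tokens, spans, mask_prob=1.0, seed=0):
    """Mask entire knowledge spans rather than independent random tokens.

    tokens: list of token strings.
    spans:  list of (start, end) index pairs (end exclusive) marking
            knowledge-bearing spans, e.g. product aspects.
    Returns (masked_tokens, targets), where targets maps each masked
    position back to its original token for the reconstruction loss.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    targets = {}
    for start, end in spans:
        # Mask the span as a unit, so partial mentions never leak through.
        if rng.random() < mask_prob:
            for i in range(start, end):
                targets[i] = tokens[i]
                masked[i] = MASK
    return masked, targets

# Toy example: the aspect mention "wind ##proof" is one knowledge span.
tokens = ["this", "jacket", "is", "wind", "##proof", "and", "light"]
spans = [(3, 5)]
masked, targets = mask_knowledge_spans(tokens, spans)
```

Masking the full span forces the model to predict the aspect from the surrounding product description, instead of trivially completing one subword from its neighbor.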



¹Our code is available at https://github.com/ICLR21Anonymous/knowledge_pretrain.

