K-PLUG: KNOWLEDGE-INJECTED PRE-TRAINED LANGUAGE MODEL FOR NATURAL LANGUAGE UNDERSTANDING AND GENERATION

Abstract

Existing pre-trained language models (PLMs) have demonstrated the effectiveness of self-supervised learning for a broad range of natural language processing (NLP) tasks. However, most of them are not explicitly aware of domain-specific knowledge, which is essential for downstream tasks in many domains, such as tasks in e-commerce scenarios. In this paper, we propose K-PLUG, a knowledge-injected pre-trained language model based on the encoder-decoder transformer that can be transferred to both natural language understanding and generation tasks. We verify our method in a diverse range of e-commerce scenarios that require domain-specific knowledge. Specifically, we propose five knowledge-aware self-supervised pre-training objectives to formulate the learning of domain-specific knowledge, covering e-commerce domain-specific knowledge-bases, aspects of product entities, categories of product entities, and unique selling propositions of product entities. K-PLUG achieves new state-of-the-art results on a suite of domain-specific NLP tasks, including product knowledge base completion, abstractive product summarization, and multi-turn dialogue, significantly outperforming baselines across the board, which demonstrates that the proposed method effectively learns a diverse set of domain-specific knowledge for both language understanding and generation tasks. The code, data, and models will be publicly available.¹

1 INTRODUCTION

Pre-trained language models (PLMs), such as ELMo (Peters et al., 2018), GPT (Radford et al., 2018), BERT (Devlin et al., 2019), RoBERTa (Liu et al., 2019), and XLNet (Yang et al., 2019), have made remarkable breakthroughs in many natural language understanding (NLU) tasks, including text classification, reading comprehension, and natural language inference. These models are trained on large-scale text corpora with self-supervision based on either bi-directional or auto-regressive pre-training. Equally promising performance has been achieved in natural language generation (NLG) tasks, such as machine translation and text summarization, by MASS (Song et al., 2019), UniLM (Dong et al., 2019), BART (Lewis et al., 2020), T5 (Raffel et al., 2019), PEGASUS (Zhang et al., 2020), and ProphetNet (Yan et al., 2020). In contrast to encoder-only models, these approaches adopt Transformer-based sequence-to-sequence models that jointly pre-train both the encoder and the decoder. While these PLMs can learn rich semantic patterns from raw text data and thereby enhance downstream NLP applications, many of them do not explicitly model domain-specific knowledge. As a result, they may be insufficient for capturing the human-curated or domain-specific knowledge that is necessary for tasks in a particular domain, such as tasks in e-commerce scenarios. To overcome this limitation, several recent studies have proposed enriching PLMs with explicit knowledge, including knowledge bases (KBs) (Zhang et al., 2019; Peters et al., 2019; Xiong et al., 2020; Wang et al., 2019; 2020), lexical relations (Lauscher et al., 2019; Wang et al., 2020), word senses (Levine et al., 2020), part-of-speech tags (Ke et al., 2019), and sentiment polarity (Ke et al., 2019; Tian et al., 2020). However, these methods integrate domain-specific knowledge only into the encoder, and the decoding process in many NLG tasks benefits little from such knowledge.
To mitigate this problem, we propose a Knowledge-injected Pre-trained Language model that is suitable for both Natural Language Understanding and Generation (K-PLUG). Different from existing knowledge-injected PLMs, K-PLUG integrates knowledge into the pre-training of both the encoder and the decoder, and can thus be applied to both downstream knowledge-driven NLU and NLG tasks. We verify the performance of the proposed method in various e-commerce scenarios. In K-PLUG, we formulate the learning of four types of domain-specific knowledge: e-commerce domain-specific knowledge-bases, aspects of product entities, categories of product entities, and unique selling propositions (USPs) (Garrett, 1961) of product entities. Specifically, the e-commerce KB stores standardized product attribute information, product aspects are features that play a crucial role in understanding product information, product categories are the backbones for constructing taxonomies for organization, and USPs are the essence of what differentiates a product from its competitors. K-PLUG learns these types of knowledge within a unified PLM, enhancing performance on various language understanding and generation tasks. To effectively learn these four types of valuable domain-specific knowledge, we propose five new pre-training objectives: knowledge-aware masked language model (KMLM), knowledge-aware masked sequence-to-sequence (KMS2S), product entity aspect boundary detection (PEABD), product entity category classification (PECC), and product entity aspect summary generation (PEASG).
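To make the knowledge-aware masking idea behind KMLM and KMS2S concrete, the sketch below prioritizes masking whole knowledge-bearing phrases (e.g., product-attribute spans) over random tokens. The paper does not publish this routine; the function name `knowledge_aware_mask`, its interface, and the `[MASK]` placeholder are illustrative assumptions, not the authors' implementation.

```python
import random


def knowledge_aware_mask(tokens, knowledge_spans, mask_token="[MASK]",
                         mask_ratio=0.15, seed=0):
    """Mask whole knowledge spans first, then fall back to random tokens.

    tokens: list of token strings.
    knowledge_spans: list of (start, end) half-open index ranges marking
        knowledge phrases (e.g., attribute values) in the token list.
    Returns the masked token list and a {position: original_token} dict
    to serve as prediction targets.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    budget = max(1, int(len(tokens) * mask_ratio))
    targets = []
    # Prefer positions covered by knowledge spans (masked as whole phrases).
    spans = list(knowledge_spans)
    rng.shuffle(spans)
    for start, end in spans:
        if len(targets) >= budget:
            break
        targets.extend(range(start, end))
    # Fill any remaining budget with random positions outside those spans.
    rest = [i for i in range(len(tokens)) if i not in set(targets)]
    rng.shuffle(rest)
    while len(targets) < budget and rest:
        targets.append(rest.pop())
    labels = {i: tokens[i] for i in targets}
    for i in targets:
        masked[i] = mask_token
    return masked, labels
```

Masking entire knowledge phrases, rather than isolated subword tokens, forces the model to reconstruct them from surrounding context, which is the intuition behind predicting knowledge-associated tokens rather than general ones.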
Among these objectives, KMLM and KMS2S learn to predict masked single and multiple tokens, respectively, that are associated with domain-specific knowledge rather than general information; PEABD detects the boundaries between descriptions of different product aspects given the full product information; PECC identifies the product category that each product belongs to; and PEASG generates a summary for each individual product aspect based on the entire product description. After pre-training K-PLUG, we fine-tune it on three domain-specific NLP tasks, namely, e-commerce knowledge base completion, abstractive product summarization, and multi-turn dialogue. The results show that K-PLUG significantly outperforms comparative models on all these tasks. Our main contributions can be summarized as follows:

• We present K-PLUG, which learns domain-specific knowledge for both the encoder and the decoder in a pre-trained language model framework, benefiting both NLU and NLG tasks.

• We formulate the learning of four types of domain-specific knowledge in e-commerce scenarios: e-commerce domain-specific knowledge-bases, aspects of product entities, categories of product entities, and unique selling propositions of product entities, which provide critical information for many applications in the e-commerce domain. Specifically, five self-supervised objectives are proposed to learn these four types of knowledge in a unified PLM.

• Our proposed model exhibits clear effectiveness on many downstream tasks in the e-commerce scenario, including e-commerce KB completion, abstractive product summarization, and multi-turn dialogue.

2 RELATED WORK

MASS (Song et al., 2019) pre-trains a sequence-to-sequence LM to recover a span of masked tokens. UniLM (Dong et al., 2019) combines bidirectional, unidirectional, and sequence-to-sequence LMs. T5 (Raffel et al., 2019) and BART (Lewis et al., 2020) present denoising sequence-to-sequence pre-training. PEGASUS (Zhang et al., 2020) pre-trains with a gap-sentence generation objective. While human-curated or domain-specific knowledge is essential for downstream knowledge-driven tasks, these methods do not explicitly consider external knowledge, unlike our proposed K-PLUG.

¹Our code is available at https://github.com/ICLR21Anonymous/knowledge_pretrain.
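As a concrete illustration of the PEABD objective described earlier, the sketch below derives per-token boundary labels from aspect-segmented product text: the first token of each aspect description receives label 1 and all others 0, giving a self-supervised tagging target. The function `aspect_boundary_labels` and its input format are hypothetical, assuming aspect-segmented product descriptions are available as in the pre-training corpus.

```python
def aspect_boundary_labels(segments):
    """Build PEABD-style training pairs from aspect-segmented product text.

    segments: list of aspect descriptions, each a list of token strings.
    Returns the flattened token sequence and a parallel label list where
    1 marks the first token of each aspect description and 0 marks the rest.
    """
    tokens, labels = [], []
    for seg in segments:
        for j, tok in enumerate(seg):
            tokens.append(tok)
            labels.append(1 if j == 0 else 0)
    return tokens, labels
```

Because the segmentation is already present in the source data, the labels come for free, which is what makes PEABD a self-supervised rather than annotated objective.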

