K-PLUG: KNOWLEDGE-INJECTED PRE-TRAINED LANGUAGE MODEL FOR NATURAL LANGUAGE UNDERSTANDING AND GENERATION

Abstract

Existing pre-trained language models (PLMs) have demonstrated the effectiveness of self-supervised learning for a broad range of natural language processing (NLP) tasks. However, most of them are not explicitly aware of domain-specific knowledge, which is essential for downstream tasks in many domains, such as e-commerce scenarios. In this paper, we propose K-PLUG, a knowledge-injected pre-trained language model based on the encoder-decoder transformer that can be transferred to both natural language understanding and generation tasks. We verify our method in a diverse range of e-commerce scenarios that require domain-specific knowledge. Specifically, we propose five knowledge-aware self-supervised pre-training objectives to formulate the learning of domain-specific knowledge, covering e-commerce domain-specific knowledge bases, aspects of product entities, categories of product entities, and unique selling propositions of product entities. K-PLUG achieves new state-of-the-art results on a suite of domain-specific NLP tasks, including product knowledge base completion, abstractive product summarization, and multi-turn dialogue, significantly outperforming baselines across the board. This demonstrates that the proposed method effectively learns a diverse set of domain-specific knowledge for both language understanding and generation tasks. The code, data, and models will be publicly available.¹



To mitigate this problem, we propose a Knowledge-injected Pre-trained Language model that is suitable for both Natural Language Understanding and Generation (K-PLUG). Different from existing knowledge-injected PLMs, K-PLUG integrates knowledge into pre-training for both the encoder and the decoder, so it can be applied to both downstream knowledge-driven NLU and NLG tasks. We verify the performance of the proposed method in various e-commerce scenarios. In K-PLUG, we formulate the learning of four types of domain-specific knowledge: e-commerce domain-specific knowledge bases, aspects of product entities, categories of product entities, and unique selling propositions (USPs) (Garrett, 1961) of product entities. Specifically, the e-commerce KB stores standardized product attribute information, product aspects are features that play a crucial role in understanding product information, product categories are the backbones for constructing taxonomies for organization, and USPs are the essence of what differentiates a product from its competitors. K-PLUG learns these types of knowledge in a unified PLM, enhancing performance on various language understanding and generation tasks. To effectively learn these four types of valuable domain-specific knowledge, we propose five new pre-training objectives: knowledge-aware masked language model (KMLM), knowledge-aware masked sequence-to-sequence (KMS2S), product entity aspect boundary detection (PEABD), product entity category classification (PECC), and product entity aspect summary generation (PEASG).
Among these objectives, KMLM and KMS2S learn to predict masked single and multiple tokens, respectively, that are associated with domain-specific knowledge rather than general information; PEABD determines the boundaries between descriptions of different product aspects given the full product information; PECC identifies the product category that each product belongs to; and PEASG generates a summary for each individual product aspect based on the entire product description. After pre-training K-PLUG, we fine-tune it on three domain-specific NLP tasks, namely, e-commerce knowledge base completion, abstractive product summarization, and multi-turn dialogue. The results show that K-PLUG significantly outperforms comparative models on all these tasks. Our main contributions can be summarized as follows:
• We present K-PLUG, which learns domain-specific knowledge for both the encoder and the decoder in a pre-trained language model framework, benefiting both NLU and NLG tasks.
• We formulate the learning of four types of domain-specific knowledge in e-commerce scenarios: e-commerce domain-specific knowledge bases, aspects of product entities, categories of product entities, and unique selling propositions of product entities, which provide critical information for many applications in the e-commerce domain. Specifically, five self-supervised objectives are proposed to learn these four types of knowledge in a unified PLM.
• Our proposed model exhibits clear effectiveness on many downstream tasks in the e-commerce scenario, including e-commerce KB completion, abstractive product summarization, and multi-turn dialogue.

2. RELATED WORK

2.1. PLMS IN GENERAL

Unsupervised pre-trained language models have been successfully applied to many NLP tasks. ELMo (Peters et al., 2018) learns contextual representations based on a bidirectional LM. GPT (Radford et al., 2018) predicts tokens based on the context on the left-hand side. BERT (Devlin et al., 2019) adopts a bidirectional LM to predict masked tokens. XLNet (Yang et al., 2019) predicts masked tokens in a permuted order through an autoregressive method. MASS (Song et al., 2019) pre-trains a sequence-to-sequence LM to recover a span of masked tokens. UniLM (Dong et al., 2019) combines bidirectional, unidirectional, and sequence-to-sequence LMs. T5 (Raffel et al., 2019) and BART (Lewis et al., 2020) present denoising sequence-to-sequence pre-training. PEGASUS (Zhang et al., 2020) pre-trains with a gap-sentence generation objective. While human-curated or domain-specific knowledge is essential for downstream knowledge-driven tasks, these methods do not explicitly consider external knowledge like our proposed K-PLUG.

2.2. INJECTING KNOWLEDGE INTO PLMS

Recent work investigates how to incorporate knowledge into PLMs for NLU. ERNIE (Sun et al., 2019) enhances language representation with entity/phrase-level masking. ERNIE (Zhang et al., 2019) identifies and links entity mentions in texts to their corresponding entities in a KB. Similar to ERNIE (Zhang et al., 2019), KnowBERT (Peters et al., 2019) injects KBs into a PLM. Xiong et al. (2020) leverage an entity replacement pre-training objective to learn better representations for entities. KEPLER (Wang et al., 2019) adopts a knowledge embedding objective in pre-training. Besides, SKEP (Tian et al., 2020), SenseBERT (Levine et al., 2020), SentiLR (Ke et al., 2019), and K-ADAPTER (Wang et al., 2020) propose to integrate sentiment knowledge, word sense, sentiment polarity, and lexical relations into PLMs, respectively. However, most of these studies focus on integrating knowledge for language understanding tasks; work utilizing domain-specific knowledge in pre-training for language generation tasks remains limited. Inspired by these works, we construct K-PLUG, which learns domain-specific knowledge in a PLM for both NLU and NLG tasks.

3. KNOWLEDGE-INJECTED PRE-TRAINING

In this section, we explain the data used to pre-train K-PLUG, its model architecture, and our pretraining objectives.

3.1. DATA PREPARATION

We collect the pre-training data from a mainstream Chinese e-commerce platform², which contains approximately 25 million textual product descriptions and covers 40 product categories. With an average length of 405 tokens, these product descriptions constitute a corpus of 10B Chinese characters. Each product description covers 10.7 product aspects on average, and each product aspect is accompanied by a summary highlighting its prominent features, as shown in Figure 1(a). Additionally, the e-commerce KB and USPs (further explained below) used in our pre-training data are as specified by the e-commerce platform and its online stores.

3.2. MODEL ARCHITECTURE

K-PLUG adopts the standard sequence-to-sequence Transformer architecture (Vaswani et al., 2017), consisting of a 6-layer encoder and a 6-layer decoder, following Song et al. (2019). We set the size of the hidden vectors to 768 and the number of self-attention heads to 12. More details about the experimental settings are given in the appendix.
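As a rough sanity check on the model's size, the backbone described above can be counted layer by layer. The sketch below is an estimate only: the feed-forward width (3072 = 4 × d_model) is an assumption, since the paper's excerpt does not state it, and embedding parameters are excluded.

```python
# Rough parameter count for a 6-layer encoder / 6-layer decoder
# Transformer with d_model = 768 (as described above). The FFN width
# of 3072 is an assumed value, not one given in the paper.

def attention_params(d):
    # Q, K, V, and output projections: each d x d plus a bias of size d.
    return 4 * (d * d + d)

def layer_norm_params(d):
    return 2 * d  # scale + shift vectors

def encoder_layer_params(d, ffn):
    return (attention_params(d)
            + (d * ffn + ffn) + (ffn * d + d)   # two FFN projections
            + 2 * layer_norm_params(d))         # two LayerNorms

def decoder_layer_params(d, ffn):
    # A decoder layer adds cross-attention and a third LayerNorm.
    return (encoder_layer_params(d, ffn)
            + attention_params(d) + layer_norm_params(d))

def backbone_params(layers=6, d=768, ffn=3072):
    return (layers * encoder_layer_params(d, ffn)
            + layers * decoder_layer_params(d, ffn))

print(f"{backbone_params() / 1e6:.1f}M parameters (excluding embeddings)")
```

Under these assumptions the backbone comes to roughly 99M parameters before embeddings, i.e., the same order of magnitude as other base-size encoder-decoder PLMs.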

3.3. KNOWLEDGE FORMULATION AND PRE-TRAINING OBJECTIVES

We formulate the learning of four types of knowledge in a unified PLM: the e-commerce KB, aspects of product entities, categories of product entities, and USPs of product entities. Specifically, the e-commerce KB stores standardized product attribute information, e.g., (Material: Cotton) and (Collar Type: Pointed Collar), providing details about the products (Logan IV et al., 2017). Aspects of product entities are features of a product, such as the sound quality of a stereo speaker (Li et al., 2020). Categories of product entities, such as Clothing and Food, are widely used by e-commerce platforms to organize their products so as to present structured offerings to their customers (Luo et al., 2020; Dong et al., 2020). USPs of product entities are the essence of what differentiates a product from its competitors (Garrett, 1961). For example, a stereo speaker's USP exhibiting its supreme sound quality could be "crystal clear stereo sound". An effective USP immediately motivates the purchasing behavior of potential buyers. We propose and evaluate five novel self-supervised pre-training objectives to learn the above-mentioned four types of knowledge in the K-PLUG model (see Figure 1).

Knowledge-aware Masked Language Model (KMLM)

Inspired by BERT (Devlin et al., 2019), we adopt the masked language model (MLM) to train the Transformer encoder as one of our pre-training objectives, which learns to predict masked tokens in the source sequence (e.g., "The company is [MASK] at the foot of a hill."). Similar to BERT, we mask 15% of all tokens in a text sequence; 80% of the masked tokens are replaced with the [MASK] token, 10% with a random token, and 10% are left unchanged. Formally, given an original text sequence x = (x_1, ..., x_m, ..., x_M) with M tokens, a masked sequence x̂ is produced by masking x_m in one of the three ways explained above, e.g., replacing x_m with [MASK] to create x̂ = (x_1, ..., [MASK], ..., x_M). MLM models the conditional likelihood P(x_m | x̂), and the loss function is:

L_MLM = log P(x_m | x̂)    (1)

The major difference from BERT is that our KMLM prioritizes knowledge tokens, which carry knowledge about product attributes and USPs, when selecting positions to mask; in the case that the knowledge tokens make up less than 15% of all tokens, it randomly picks non-knowledge tokens to complete the masking.

Knowledge-aware Masked Sequence-to-Sequence (KMS2S)

K-PLUG inherits its strong language generation ability from the masked sequence-to-sequence (MS2S) objective. The encoder takes a sentence with a masked fragment (several consecutive tokens) as input, and the decoder predicts this masked fragment conditioned on the encoder representations. Given a masked fragment x_{u:v} spanning positions u through v, the loss function is:

L_MS2S = Σ_{t=u}^{v} log P(x_t | x̂, x_{u:t-1})    (2)

We set the length of the masked span to 30% of the length of the original text sequence. Similar to KMLM, KMS2S prioritizes the masking of text spans that cover knowledge tokens.
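The knowledge-prioritized masking used by KMLM can be sketched as follows. This is a minimal token-level illustration, assuming that `knowledge_positions` (the indices of knowledge tokens, e.g., from matching against the e-commerce KB and USP lists) and `vocab` (the token vocabulary for random replacement) are provided by the caller; both names are hypothetical, not from the paper.

```python
import random

def knowledge_aware_mask(tokens, knowledge_positions, vocab,
                         mask_rate=0.15, rng=None):
    """Pick ~15% of positions to mask, taking knowledge tokens first and
    topping up with random non-knowledge tokens (KMLM-style)."""
    rng = rng or random.Random()
    n_mask = max(1, round(len(tokens) * mask_rate))
    knowledge = [i for i in knowledge_positions if i < len(tokens)]
    rng.shuffle(knowledge)
    chosen = knowledge[:n_mask]
    if len(chosen) < n_mask:  # knowledge tokens < 15%: top up randomly
        rest = [i for i in range(len(tokens)) if i not in set(chosen)]
        chosen += rng.sample(rest, n_mask - len(chosen))
    masked = list(tokens)
    for i in chosen:  # BERT's 80/10/10 replacement scheme
        r = rng.random()
        if r < 0.8:
            masked[i] = "[MASK]"
        elif r < 0.9:
            masked[i] = rng.choice(vocab)
        # else: leave the token unchanged
    return masked, sorted(chosen)
```

KMS2S would apply the same priority at the span level, choosing a contiguous 30%-length span that covers as many knowledge tokens as possible.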

Product Entity Aspect Boundary Detection (PEABD)

A product description usually contains multiple product entity aspects. Existing work (Li et al., 2020) shows that product aspects influence the quality of product summaries in terms of importance, non-redundancy, and readability, which are not directly taken into account in language modeling. To train a model that understands product aspects, we leverage the PEABD objective to detect boundaries between product entity aspects. It is essentially a sequence labeling task based on the representations of K-PLUG's top encoder layer. Given a text sequence x = (x_1, ..., x_M), the encoder of K-PLUG outputs a sequence h = (h_1, ..., h_M), which is fed into a softmax layer to generate a probability sequence y. The loss function is:

L_PEABD = -Σ_t ŷ_t log y_t    (3)

where ŷ_t ∈ {0, 1} are the ground-truth labels for the aspect boundary detection task.
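A minimal sketch of the PEABD targets and loss, under the assumption (not stated explicitly in the paper) that the first token of each aspect segment is labeled 1 and all other tokens 0; the two-class softmax of Eq. (3) then reduces to a per-token binary cross-entropy:

```python
import math

def aspect_boundary_labels(aspect_lengths):
    """Tag the first token of every aspect segment with 1, the rest with 0."""
    labels = []
    for length in aspect_lengths:
        labels += [1] + [0] * (length - 1)
    return labels

def peabd_loss(boundary_probs, labels):
    """L_PEABD = -sum_t y_hat_t * log y_t, written as binary cross-entropy
    where boundary_probs[t] is the softmax probability of a boundary at t."""
    loss = 0.0
    for p, y in zip(boundary_probs, labels):
        loss -= math.log(p) if y == 1 else math.log(1.0 - p)
    return loss
```

For example, a description with two aspects of 3 and 2 tokens yields the label sequence [1, 0, 0, 1, 0], and a model that puts high probability on exactly those boundaries incurs a near-zero loss.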

Product Entity Category Classification (PECC)

Product entity categories are the backbones for constructing taxonomies (Luo et al., 2020; Dong et al., 2020). Each product description document corresponds to one of the 40 categories included in our corpus, such as Clothing, Bags, Home Appliances, Shoes, and Foods. Identifying the product entity category accurately is a prerequisite for creating an output that is consistent with the input. Given a text sequence x = (x_1, ..., x_M), a softmax layer outputs the classification score, y, based on the representation of the encoder classification token, [CLS]. The loss function maximizes the model's probability of outputting the true product entity category:

L_PECC = -ŷ · log y    (4)

where ŷ is the one-hot ground-truth product category.
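With a one-hot ground truth, Eq. (4) reduces to the negative log-probability of the true category; a plain-Python sketch (the `cls_logits` input stands in for the linear projection of the [CLS] representation, a name assumed here for illustration):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pecc_loss(cls_logits, gold_category):
    """L_PECC = -y_hat . log y with one-hot y_hat, i.e. the negative
    log-probability assigned to the true category index."""
    return -math.log(softmax(cls_logits)[gold_category])
```

In K-PLUG's setting, `cls_logits` would have 40 entries, one per product category in the corpus.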

Product Entity Aspect Summary Generation (PEASG)

Inspired by PEGASUS (Zhang et al., 2020), which shows that a pre-training objective that more closely resembles the downstream task leads to better and faster fine-tuning performance, we propose the PEASG objective to generate a summary from the description of a product entity aspect. Unlike extracted gap-sentence generation in PEGASUS, our method constructs a more realistic summary generation task because the aspect summary naturally exists in our pre-training data. Given an aspect description sequence x = (x_1, ..., x_M) and an aspect summary sequence y = (y_1, ..., y_T), PEASG models the conditional likelihood P(y | x). The loss function is:

L_PEASG = Σ_t log P(y_t | x, y_<t)    (5)

4. EXPERIMENTS AND RESULTS
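Eq. (5) is the standard teacher-forced log-likelihood; a small sketch of how it is evaluated, assuming `step_probs[t]` holds the probability the decoder assigns to the reference token y_t given the source and the gold prefix (a hypothetical input, since the decoder itself is not shown here):

```python
import math

def peasg_log_likelihood(step_probs):
    """L_PEASG = sum_t log P(y_t | x, y_<t), with step_probs[t] the
    decoder's probability for reference token y_t under teacher forcing."""
    return sum(math.log(p) for p in step_probs)

def perplexity(step_probs):
    """Per-token perplexity derived from the same quantities."""
    return math.exp(-peasg_log_likelihood(step_probs) / len(step_probs))
```

A model that assigns probability 0.25 to every reference token, for instance, has a per-token perplexity of 4.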

4.1. PRE-TRAINED MODEL VARIANTS

To separately evaluate the effectiveness of pre-training with domain-specific data and with domain-specific knowledge, we run pre-training experiments with two model variants, C-PLUG and E-PLUG, whose configurations are the same as that of K-PLUG.
• C-PLUG is a pre-trained language model with the original MLM and MS2S objectives, trained on a general pre-training corpus, CLUE (Xu et al., 2020), which contains 30GB of raw text with around 8B Chinese words.
• E-PLUG is a pre-trained language model with the original MLM and MS2S objectives, trained on our collected e-commerce domain-specific corpus.

4.2. DOWNSTREAM TASKS

We fine-tune K-PLUG on three downstream tasks: e-commerce KB completion, abstractive product summarization, and multi-turn dialogue. The e-commerce KB completion task involves predicting product attributes and values given product information. The abstractive product summarization task requires the model to generate a product summary from a textual product description. The multi-turn dialogue task aims to generate a response given a multi-turn dialogue context. The domain-specific knowledge we define in this paper is essential for these tasks.

4.2.1. E-COMMERCE KB COMPLETION

Task Definition. The e-commerce KB provides abundant product information in the form of (product entity, product attribute, attribute value) triples, such as (pid#133443, Material, Copper Aluminum). For the e-commerce KB completion task, the input is a textual product description for a given product, and the output is the product attribute values.

Dataset. We conduct experiments on the MEPAVE dataset (Zhu et al., 2020). This dataset is collected from a major Chinese e-commerce platform and consists of 87,194 instances annotated with the positions of the attribute values mentioned in the product descriptions. There are 26 types of product attributes in total, such as Material, Collar Type, and Color. The training, validation, and testing sets contain 71,194, 8,000, and 8,000 instances, respectively.

Model. We cast the e-commerce KB completion task as a sequence labeling task that tags the input word sequence x = (x_1, ..., x_N) with a label sequence y = (y_1, ..., y_N) in the BIO format. For example, for the input sentence "A bright yellow collar", the labels for "bright" and "yellow" are Color-B and Color-I, respectively, and O for the other tokens. For an input sequence, K-PLUG outputs a sequence of encoder representations, and a linear classification layer with a softmax predicts the label for each input token based on its representation.

Result. Table 1 shows the experimental results. We observe that K-PLUG performs better than the baselines. C-PLUG achieves significantly better performance than BERT, which indicates that MS2S can also benefit NLU tasks. E-PLUG outperforms C-PLUG, showing that training on a domain-specific corpus is helpful. K-PLUG further exhibits a 2.51% improvement over E-PLUG. In short, we conclude that the improvement is due to both the domain-specific pre-training data and the knowledge-injected pre-training objectives.
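Recovering (attribute, value) pairs from the predicted BIO tags can be sketched as follows, using the "bright yellow" Color example above; the decoding convention (an inconsistent I-tag closes the current span) is an assumption of this sketch:

```python
def decode_bio(tokens, tags):
    """Turn BIO tags like 'Color-B'/'Color-I'/'O' into
    (attribute, value) pairs for KB completion."""
    pairs, attr, span = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.endswith("-B"):                    # a new value starts here
            if attr:
                pairs.append((attr, " ".join(span)))
            attr, span = tag[:-2], [token]
        elif tag.endswith("-I") and attr == tag[:-2]:
            span.append(token)                    # the value continues
        else:                                     # 'O' or inconsistent tag
            if attr:
                pairs.append((attr, " ".join(span)))
            attr, span = None, []
    if attr:                                      # flush a trailing span
        pairs.append((attr, " ".join(span)))
    return pairs
```

For "A bright yellow collar" with tags O, Color-B, Color-I, O, this yields the single pair (Color, "bright yellow"), which completes the KB triple for the given product entity.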

4.2.2. ABSTRACTIVE PRODUCT SUMMARIZATION

Task Definition. The abstractive product summarization task aims to capture the most attractive information of a product that resonates with potential purchasers. The input is a product description, and the output is a condensed product summary.

Dataset. We perform experiments on the dataset of Li et al. (2020), which contains 1.4 million instances collected from a major Chinese e-commerce platform, covering three product categories: Home Appliances, Clothing, and Cases & Bags. Each instance in the dataset is a (product information, product summary) pair, where the product information contains an image, a title, and other product descriptions. In our work, we do not consider the visual information of products. Note that abstractive product summarization and product entity aspect summary generation (PEASG) are partly different: abstractive product summarization aims to generate a complete and cohesive product summary given a detailed product description, whereas PEASG produces an aspect summary, which basically consists of condensed USPs, given a product aspect description. In addition, for the abstractive product summarization task, the average length of the product summaries is 79 tokens, while the product aspect summaries are generally shorter than 10 tokens.

Model. Abstractive product summarization is an NLG task that takes the product description as input and the product summary as output.

Baselines.
• LexRank (Erkan & Radev, 2004) is a graph-based extractive summarization method.
• Seq2seq (Bahdanau et al., 2015) is a standard seq2seq model with an attention mechanism.
• Pointer-Generator (PG) (See et al., 2017) is a seq2seq model with a copying mechanism.
• Aspect MMPG (Li et al., 2020) is the state-of-the-art method for abstractive product summarization, taking both textual and visual product information as input.

Result.
Table 2 shows the experimental results, reporting ROUGE-1 (RG-1), ROUGE-2 (RG-2), and ROUGE-L (RG-L) F1 scores (Lin & Hovy, 2003). K-PLUG clearly performs better than the other text-based methods. E-commerce knowledge plays a significant role in the abstractive product summarization task, and the domain-specific pre-training data and knowledge-injected pre-training objectives both enhance the model. K-PLUG achieves results comparable to the multimodal model, Aspect MMPG. The work of Li et al. (2020) suggests that product images are essential for this task, and we plan to extend K-PLUG with multimodal information in the future.
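The ROUGE-L F1 metric used above is based on the longest common subsequence (LCS) between candidate and reference; a minimal token-level sketch (no stemming, uniform precision/recall weighting, which may differ from the exact evaluation script used in the paper):

```python
def lcs_len(a, b):
    """Length of the longest common subsequence via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if x == y
                                else max(dp[i][j + 1], dp[i + 1][j]))
    return dp[len(a)][len(b)]

def rouge_l_f1(candidate, reference):
    """Token-level ROUGE-L F1: harmonic mean of LCS-based
    precision (vs. candidate length) and recall (vs. reference length)."""
    lcs = lcs_len(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)
```

For example, the candidate "the cat" against the reference "the cat sat" has LCS 2, precision 1.0, recall 2/3, and thus F1 = 0.8.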

4.2.3. MULTI-TURN DIALOGUE

Task Definition. The multi-turn dialogue task aims to output a response based on a multi-turn dialogue context (Shum et al., 2018). The input is the dialogue context consisting of the previous question-answer turns, and the output is the response to the last question.

Dataset. We conduct experiments on two datasets, JDDC (Chen et al., 2020) and ECD (Zhang et al., 2018). JDDC is collected from conversations between users and customer service staff on a popular e-commerce website in China and contains 289 different intents, i.e., the goals of a dialogue in after-sales assistance, such as updating addresses and inquiring about prices. There are 1,024,196 multi-turn sessions and 20,451,337 utterances in total. The average number of turns per session is 20, and the average number of tokens per utterance is about 7.4. After pre-processing, the training, validation, and testing sets include 1,522,859, 5,000, and 5,000 (dialogue context, response) pairs, respectively. ECD is collected from another popular e-commerce website in China and covers over 5 types of conversations based on 20 commodities. Additionally, for each ground-truth response, negative responses are provided for discriminative learning. The training, validation, and testing sets include 1,000,000, 10,000, and 10,000 (dialogue context, response) pairs, respectively.

Model. We test two variants of K-PLUG: a retrieval-based K-PLUG on the ECD dataset and a generative K-PLUG on the JDDC dataset. For the retrieval-based approach, we concatenate the dialogue context and use the [SEP] token to separate context and response; the [CLS] representation is fed into the output layer for classification. The generative approach is a sequence-to-sequence model, the same as the model adopted in the abstractive product summarization task.

Baselines. The baselines include both retrieval-based (BM25, CNN, BiLSTM, and BERT) and generative approaches. The other baselines are as follows.
• SMN (Wu et al., 2017) matches a response with each utterance in the context.
• DUA (Zhang et al., 2018) is a deep utterance aggregation model based on fine-grained context representations.
• DAM (Zhou et al., 2018) matches a response with the context using dependency information captured by self-attention and cross-attention.
• IoI (Tao et al., 2019) is a deep matching model that stacks multiple interaction blocks between utterance and response.
• MSN (Yuan et al., 2019) selects relevant context and generates better context representations with the selected context.

Result. Tables 3 and 4 show the experimental results on the JDDC and ECD datasets, respectively. We report ROUGE-L (RG-L) F1, BLEU, and recall at position k in n candidates (R_n@k). We observe that, on both the retrieval-based and generative tasks, K-PLUG achieves new state-of-the-art results, and e-commerce knowledge yields consistent improvements. K-PLUG is evidently superior to BERT, possibly because BERT lacks domain-specific knowledge, having been pre-trained with the general MLM objective. We further perform a human evaluation on the JDDC dataset. We randomly choose 100 samples from the test set, and three experienced annotators judge whether K-PLUG outperforms E-PLUG with respect to (1) the relevance between the response and the context and (2) the readability of the response. The results are shown in Table 5. The percentage of "Win", denoting that the result of K-PLUG is better than that of E-PLUG, is significantly larger than that of "Lose" (p-value < 0.01, t-test). Kappa values (Fleiss, 1971) confirm the consistency across annotators.
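The retrieval metric R_n@k reported above can be computed per example as follows; a minimal sketch assuming the ground-truth response sits at a known index among the n scored candidates:

```python
def recall_at_k(candidate_scores, k, positive_index=0):
    """R_n@k for one example: 1.0 if the ground-truth response
    (at positive_index) is ranked within the top k of the n candidates
    by model score, else 0.0."""
    ranked = sorted(range(len(candidate_scores)),
                    key=lambda i: candidate_scores[i], reverse=True)
    return 1.0 if positive_index in ranked[:k] else 0.0
```

The reported R_n@k values are then the mean of this indicator over the test set; e.g., R_10@1 checks whether the model scores the true response above all nine negatives.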

4.3. ABLATION STUDIES

To better understand our model, we perform ablation experiments to study the effects of the different pre-training objectives.

Result. The ablation results are shown in Table 6. We conclude that removing any pre-training objective hurts performance across all the tasks. KMS2S is the most effective objective for the abstractive product summarization and generative conversation tasks, since it most closely matches the nature of NLG. The product-aspect-related objectives, i.e., PEABD and PEASG, contribute substantially to the abstractive product summarization task, showing that this task requires a comprehensive understanding of the product description from the view of product aspects, beyond individual tokens.

5. CONCLUSION

We present a knowledge-injected pre-trained model (K-PLUG), a powerful domain-specific language model trained on a large-scale e-commerce corpus and designed to capture e-commerce knowledge, including the e-commerce KB, product aspects, product categories, and USPs. The pre-training framework combines masked language modeling and masked seq2seq with novel objectives formulated as product aspect boundary detection, product aspect summary generation, and product category classification tasks. Our proposed model demonstrates strong performance on both natural language understanding and generation downstream tasks, including e-commerce KB completion, abstractive product summarization, and multi-turn dialogue.

Table 9: Case study for the abstractive product summarization task (Clothing category). The K-PLUG model generates summaries describing more information about e-commerce knowledge bases and unique selling propositions of product entities.



¹ Our code is available at https://github.com/ICLR21Anonymous/knowledge_pretrain.
² https://www.jd.com/



Figure 1: Pre-training data consists of 25 million textual product descriptions depicting multiple product aspects. We define knowledge as e-commerce knowledge bases, aspects of product entities, categories of product entities, and unique selling propositions of product entities. Pre-training objectives include the knowledge-aware masked language model (KMLM), knowledge-aware masked sequence-to-sequence (KMS2S), product entity aspect boundary detection (PEABD), product entity category classification (PECC), and product entity aspect summary generation (PEASG).

Table 10: Case study for the abstractive product summarization task (Cases & Bags category). The K-PLUG model generates summaries describing more information about e-commerce knowledge bases and unique selling propositions of product entities.

Table 11: Case study for the multi-turn dialogue task on the ECD dataset. The K-PLUG model produces more accurate responses to questions related to e-commerce knowledge bases.

Table 1: Experimental results (F1 score) for the e-commerce KB completion task. The results in the first block are taken from Zhu et al. (2020).

Table 2: Experimental results (ROUGE scores) for the abstractive product summarization task. Bold results are the best performances among the models taking only text as input, and * denotes the model taking both product images and text as input. The results in the first and second blocks are taken from Li et al. (2020).

Table 3: Experimental results for the multi-turn dialogue task on the JDDC dataset. The results in the first block are taken from Chen et al. (2020).

Table 4: Experimental results for the multi-turn dialogue task on the ECD dataset. The results in the first block are taken from Zhang et al. (2018).

Table 6: Experimental results for the ablation studies.


A.2 CASE STUDIES

We present examples from the test set of each task, comparing the ground-truth result with the outputs produced by the E-PLUG and K-PLUG models.

Table 7: Case study for the e-commerce KB completion task. The product attribute and the corresponding attribute value are presented as [attribute value]_{product attribute}. The K-PLUG model accurately completes the e-commerce KB, while the E-PLUG model sometimes fails.

