KETG: A KNOWLEDGE ENHANCED TEXT GENERATION FRAMEWORK

Abstract

Embedding logical knowledge into text generation is a challenging NLP task. In this paper, we propose a knowledge enhanced text generation (KETG) framework, which incorporates both knowledge tuples and their associated text corpus to address logicality and diversity in text generation. Specifically, we validate our framework on rhetorical text generation using our newly built rhetoric knowledge graph. Experiments show that our framework outperforms baseline models such as Transformer and GPT-2 on rhetorical type control, semantic comprehensibility, and diversity.

1. INTRODUCTION

Recent pre-trained language models such as GPT-2 capture rich semantic and syntactic features (Radford, 2018), performing well in tasks such as machine translation and summarization (Li et al., 2016; Wang et al., 2016). However, the application of language models to open-ended text generation still needs to be explored. The logic underlying text generation, and literary creation in particular, is often obscure: logically related expressions are usually low-frequency, which makes them difficult for current language models to capture. On the other hand, imposing too many constraints from prior information leads to homogeneous generated texts. To address these issues, Guan et al. (2020) propose a knowledge-enhanced pre-training model for commonsense story generation, which transforms commonsense triples into sentences using a template-based method. However, the template-based sentences transformed from commonsense triples for post-training are rather homogeneous.

In this paper, we introduce a knowledge enhanced text generation (KETG) framework, which incorporates knowledge tuples and their associated sentences in training, so that the logical relations encoded in the knowledge tuples can be effectively learned. The sentences associated with the knowledge tuples could be generated from the tuples by a template-based method as in Guan et al. (2020); however, incorporating real corpus sentences, when available, is more beneficial, as they generally exhibit more diversity than sentences generated from templates. In this way, the generation model can learn both the logicality in the knowledge tuples and the diversity in the sentences. We validate our KETG framework on rhetorical text generation, which is an important and essential part of modern literature (Tu et al., 2013).
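The pairing of a knowledge tuple with its real corpus sentence can be sketched as below. This is a minimal illustration, not the paper's exact data format: the tuple fields (head, relation, tail), the serialization tokens, and the example sentence are all hypothetical.

```python
# Sketch: pair a knowledge tuple with the real corpus sentence it was
# extracted from, forming a (source, target) training example for a
# sequence-to-sequence generation model. The field names and special
# tokens below are illustrative assumptions, not the paper's format.

def make_training_example(tuple_, sentence):
    """Serialize a knowledge tuple as the source sequence; use the real
    corpus sentence (rather than a templated one) as the target."""
    head, relation, tail = tuple_
    source = f"<head> {head} <rel> {relation} <tail> {tail}"
    return {"source": source, "target": sentence}

# Hypothetical metaphor tuple and its originating corpus sentence.
example = make_training_example(
    ("moon", "is_like", "silver plate"),
    "The moon hung in the sky like a silver plate.",
)
print(example["source"])
# → <head> moon <rel> is_like <tail> silver plate
```

Using the real sentence as the target (instead of a template-filled one) is what lets the model absorb the diversity of the corpus while the serialized tuple supplies the logical relation.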
Rhetoric is quite obscure and requires strong logical correlation, so a rhetoric knowledge graph with explicit logical information (rather than a commonsense knowledge graph) would be helpful for rhetorical text generation. Unfortunately, to the best of our knowledge, no such rhetoric knowledge graph exists. Hence, using relation extraction methods, we build a rhetoric (specifically, metaphor and personification) knowledge graph from a collection of Chinese poems and compositions. With the newly built rhetoric knowledge graph and the corpus from which it is extracted, we train a rhetorical text generation model. Both automatic and manual evaluations show that our KETG model outperforms baseline models on rhetorical type control, semantic comprehensibility, and diversity. Experiments also show that incorporating template-generated sentences in training yields generated text closely resembling the templates, while incorporating real corpus sentences brings more diversity to text generation. To sum up, the main contributions of this paper are as follows:
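The relation-extraction step that builds the knowledge graph can be illustrated with a toy pattern matcher. This sketch uses an English "X ... like a Y" simile pattern purely for readability; the paper's corpus is Chinese poems and compositions, and its actual extraction method is not specified here, so the pattern, function name, and relation label are all hypothetical.

```python
import re

# Toy pattern-based extractor for simile-style rhetoric: find a tenor
# ("X") and a vehicle ("Y") in sentences of the form "The X ... like a Y."
# This English pattern is an illustrative stand-in for the real method,
# which operates on Chinese text.
SIMILE = re.compile(r"(?:The|A)\s+(\w+).*?\blike a\s+([\w ]+?)[.!?]")

def extract_tuples(sentences):
    """Return (tenor, relation, vehicle) tuples for matching sentences."""
    tuples = []
    for s in sentences:
        m = SIMILE.search(s)
        if m:
            tuples.append((m.group(1).lower(), "is_like", m.group(2)))
    return tuples

print(extract_tuples(["The moon hung in the sky like a silver plate."]))
# → [('moon', 'is_like', 'silver plate')]
```

Each extracted tuple becomes a node-edge-node entry in the rhetoric knowledge graph, while the source sentence is kept alongside it as the associated training text.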

