STRUCTURE CONTROLLABLE TEXT GENERATION

Abstract

Controlling the presented form (or structure) of generated text is as important as controlling the generated content during neural text generation. It helps to reduce the uncertainty and improve the interpretability of generated text. However, structures and contents are entangled together and realized simultaneously during text generation, which makes structure control challenging. In this paper, we propose an efficient, straightforward generation framework to control the structure of generated text. A structure-aware transformer (SAT) is proposed to explicitly incorporate multiple types of multi-granularity structure information to guide the generation of text with the corresponding structure. The structure information is extracted from a given sequence template by an auxiliary model, so that the type of structure in the template can be learned, represented and imitated. Extensive experiments have been conducted on both a Chinese lyrics corpus and the English Penn Treebank dataset. Both automatic evaluation metrics and human judgement demonstrate the superior capability of our model in controlling the structure of generated text, and the quality (e.g., fluency and meaningfulness) of the generated text is even better than that of state-of-the-art models.

1. INTRODUCTION

Natural language is not just a collection of tokens but a well-organized, structured sequence expressing understandable information. The structure of language usually obeys a set of grammatical rules, which helps beginners grasp the language with less effort. Similarly, incorporating structure into a neural language model can yield increasingly abstract levels of representation and improve generalization, which may potentially reduce the need for large amounts of training data (Shen et al., 2019b). Incorporating structure information has demonstrated considerable improvements in many language understanding tasks (Zhang et al., 2019; Hao et al., 2019; Wang et al., 2019). Text generation concerns not only the generated contents (i.e., what to say) but also the presented structural form (i.e., how to say it) (Peng et al., 2019). Similar contents or meanings can be presented in different structural forms. The structures and contents can therefore be considered and planned separately to achieve highly informative generated text. From an empirical view, controlling or planning the generated structure may be helpful in several aspects: i) reducing the uncertainty of the generated contents under specific structure conditions, which may contribute to higher-quality generated text; ii) enhancing the interpretability of the generated text, since more controlling attributes are realized during generation; iii) improving structure, format or style consistency in structure-constrained generation tasks or domain-specific generation with particular formats, such as style or paraphrase generation (Chen et al., 2019; Ficler & Goldberg, 2017), poetry generation (Deng et al., 2020; Li et al., 2020), and lyric generation (Watanabe et al., 2018; Lu et al., 2019).
The language structures determined by grammatical rules vary across granularity levels: for example, participial construction (pc) is character-level, part of speech (POS) is word/phrase-level, and sequence length is sentence-level. These kinds of structure are coupled and nested together, and are realized simultaneously with the contents during most token-by-token generation. It is difficult to disentangle the contents from the text structure, and even harder to discriminate and control the different granularity levels of structure during text generation. Individually controlling specific types of structure, such as sequence length (Kikuchi et al., 2016) or verbal predicates (Tu et al., 2019), has been investigated in text generation. These works design structure representations tailored to one type and are inappropriate for controlling other types of structure, let alone controlling multiple
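To make the granularity levels above concrete, the following sketch extracts a toy structure template from a sentence: sequence length at the sentence level, a POS-like tag per word at the word level, and token lengths at the character level. The tiny `TOY_POS` lookup is an illustrative stand-in for a real tagger (e.g., one trained on the Penn Treebank) and is our assumption, not the paper's auxiliary model.

```python
# Toy extraction of multi-granularity structure information from a
# template sentence. The POS lookup is a placeholder for a real tagger;
# unknown words fall back to the tag "UNK".

TOY_POS = {"the": "DT", "cat": "NN", "sat": "VB", "on": "IN", "mat": "NN"}

def extract_structure(sentence):
    tokens = sentence.lower().split()
    return {
        "length": len(tokens),                           # sentence-level structure
        "pos": [TOY_POS.get(t, "UNK") for t in tokens],  # word/phrase-level structure
        "char_lens": [len(t) for t in tokens],           # character-level structure
    }

template = extract_structure("The cat sat on the mat")
```

A generator conditioned on such a template could then be asked to produce text that imitates these structure attributes while realizing different content.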

