STRUCTURE CONTROLLABLE TEXT GENERATION

Abstract

Controlling the presented form (or structure) of generated text is as important as controlling the generated content in neural text generation: it reduces uncertainty and improves the interpretability of the generated text. However, structure and content are entangled and realized simultaneously during generation, which makes controlling the structure challenging. In this paper, we propose an efficient, straightforward generation framework to control the structure of generated text. A structure-aware transformer (SAT) is proposed to explicitly incorporate multiple types of multi-granularity structure information and thereby guide the generation of text with the corresponding structure. The structure information is extracted from a given sequence template by an auxiliary model, so that the type of structure exhibited by the template can be learned, represented and imitated. Extensive experiments have been conducted on both a Chinese lyrics corpus and the English Penn Treebank dataset. Both automatic evaluation metrics and human judgement demonstrate the superior capability of our model in controlling the structure of generated text, while the quality (e.g., fluency and meaningfulness) of the generated text is even better than that of state-of-the-art models.

1. INTRODUCTION

Natural language is not just a collection of tokens but a well-organized sequence expressing understandable information. The structure of language usually obeys a set of grammatical rules, which helps beginners grasp the language with less effort. Similarly, incorporating structure into a neural language model can yield increasingly abstract levels of representation and improve generalization, which may potentially reduce the need for large amounts of training data (Shen et al., 2019b). Incorporating structure information has demonstrated considerable improvements in many language understanding tasks (Zhang et al., 2019; Hao et al., 2019; Wang et al., 2019). Text generation concerns not only the generated content (i.e., what to say) but also the presented structural form (i.e., how to say it) (Peng et al., 2019). Similar content or meaning can be presented in different structural forms, so structure and content can be considered and planned separately to produce highly informative generated text. From an empirical view, controlling or planning the generated structure may help in several respects: i) reducing the uncertainty of the generated content given specific structure conditions, which may contribute to higher-quality generated text; ii) enhancing the interpretability of the generated text, since more controlling attributes can be realized during generation; iii) improving structure, format or style consistency in structure-constrained generation tasks or domain-specific generation with particular formats, such as style or paraphrase generation (Chen et al., 2019; Ficler & Goldberg, 2017), poetry generation (Deng et al., 2020; Li et al., 2020), and lyric generation (Watanabe et al., 2018; Lu et al., 2019).
Language structures determined by grammatical rules vary across granularity levels: for example, participial construction (pc) is character-level, part of speech (pos) is word/phrase-level, and sequence length is sentence-level. These kinds of structure are coupled and nested together, and in most token-by-token generation they are realized together with the content. It is difficult to disentangle the content from the structure, and even harder to discriminate and control structures at different granularity levels during text generation. Individually controlling specific types of structure, such as sequence length (Kikuchi et al., 2016) or verbal predicates (Tu et al., 2019), has been investigated in text generation. These works design task-specific structure representations and are inappropriate for controlling other types of structure, let alone multiple types of structure simultaneously. Directly embedding the structure and adding it to the word embeddings can achieve considerable control over character-level structure during text generation, such as tone-level and rhyme control in Chinese poetry generation (Deng et al., 2020). However, this method may fail when the controlled structure (e.g., phrase-level or sentence-level) requires awareness of subsequent structure during the generation process. In addition to summing structure embeddings and word embeddings, SongNet (Li et al., 2020) designs a second set of structure embeddings that are queried globally by the summed embeddings to renew the representation. With pre-training and fine-tuning, SongNet can also achieve good controllability over tailor-designed formats (sentence-level structure; this format mainly concerns the length of each sentence within one paragraph or passage). However, the symbol sets for this format are specially designed and may not be applicable to other types of structure.
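The granularity levels above can be made concrete with a small sketch. Assuming an auxiliary tagger is available for each level (here replaced by a toy lookup table purely for illustration; the names and labels are hypothetical, not the paper's actual implementation), extracting one aligned label sequence per granularity from a template might look like:

```python
# Sketch of multi-granularity structure extraction from a template.
# TOY_POS is a stand-in for a real auxiliary POS tagger.

TOY_POS = {"the": "DT", "cat": "NN", "sat": "VBD",
           "on": "IN", "mat": "NN", ".": "PUNCT"}

def extract_structure(tokens):
    """Return one label sequence per granularity level, each
    aligned one-to-one with the template tokens."""
    n = len(tokens)
    # Character/token level: crude punctuation-vs-word distinction
    # (tone or rhyme labels would play this role in Chinese poetry).
    char_level = ["P" if not t.isalnum() else "W" for t in tokens]
    # Word/phrase level: part-of-speech labels from the auxiliary model.
    word_level = [TOY_POS.get(t.lower(), "UNK") for t in tokens]
    # Sentence level: the target length, broadcast to every position so
    # each generation step can condition on the global constraint.
    sent_level = [f"LEN_{n}"] * n
    return {"char": char_level, "pos": word_level, "len": sent_level}

labels = extract_structure(["the", "cat", "sat", "on", "the", "mat", "."])
```

Because every label sequence has the same length as the token sequence, the different granularity levels can be embedded and combined position-wise, which is what makes the simple alignment convenient for conditioning a decoder.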
In contrast to the above works, in this paper we do not focus on controlling one specific type of structure or format; instead, we propose a framework for controlling more general types of structure in text generation. This framework allows controlling an individual type of structure, or multiple and multi-granularity types of structure, during text generation. The controlled types of structure are extracted from sequence templates (any valid sentence is a valid template) by one or several auxiliary models. The extracted structure information is treated as a set of conditions, and the auxiliary model can be any credible model or tool that extracts sound structure information from the template. Since we want the generation of the current token or word to be aware of the global structure, a bi-directional transformer encoder is adopted for structure representation and learning. The learned structure representations are further incorporated into the decoder to guide the realization of the controlled structure. The main contributions of this work are summarized as follows:

• A straightforward, interpretable structure-controllable text generation framework is proposed, which is capable of controlling multi-granularity sequence structure from character level to sentence level by explicitly incorporating the corresponding structure information.

• A simple alignment method together with structure embedding, representation and learning methods are proposed, which are utilized for representing multi-granularity and multiple types of structure.

• A structure-aware transformer language model is proposed, in which structure representations and token representations are learned simultaneously. The structure information is queried globally and incorporated into the token representations with an attention mechanism, which contributes to controlling the generated structure.
• Extensive experiments on controlling different individual types of structure and multi-granularity types of structure have been conducted on a Chinese lyrics corpus. The structure controllability is effective and the quality of the generated lyrics is favorable. We also conduct controlling experiments on the English Penn Treebank dataset, which demonstrate a similar structure controlling capability of the proposed framework.

2. RELATED WORK

Controllable text generation has received much attention recently. Many efforts are devoted to controlling the content of the generated text (Kiddon et al., 2016; Lebret et al., 2016; Shen et al., 2019a). Based on a conditioned RNN language model, stylistic parameters have been further incorporated as conditioning context to control stylistic aspects of the generated text (Ficler & Goldberg, 2017). Building the generator on VAEs, Hu et al. (2017) propose a generative model to generate plausible sentences with designated semantics. A simple plug-and-play language model is proposed in Dathathri et al. (2019) to guide controlling attributes (e.g., topic or sentiment) in text generation, without further training of the pre-trained language model. None of these works attempts to control the structure of the generated text. A similar approach, exemplar-based text generation, is proposed in Peng et al. (2019), where for each input text an exemplar text is retrieved from the training data and then used to construct a customized decoder for outputting a target. It is ambiguous how much the exemplar contributes to the generated structure as opposed to the generated content. Another similar work is SongNet (Li et al., 2020), which is proposed to control so-called rigid formats. The rigid for-

