Topic Aware Transformer: Domain Shift for Unconditional Text Generation Models

Abstract

Our goal is to guide pre-trained language models (PLMs) toward unconditional text generation tasks while resolving the domain gap and avoiding catastrophic forgetting. Because Transformer-based models are pre-trained on corpora far larger and more heterogeneous than any specific target corpus, the gap between these corpora and the target corpus raises the question of whether PLMs will actually benefit this task even after fine-tuning. As domain adaptation of PLMs must bridge this gap, we propose a framework, Topic Aware Transformer (TAT), that adapts PLMs for target-aware text generation while alleviating catastrophic forgetting. The motivation of TAT is to distill target-specific knowledge as topics and to steer PLMs toward these topics. This requirement and motivation lead us to introduce a topic steering layer (TSL) as an additional layer, and Topic Distribution Modeling (TDM) as a training task. Experiments show that these components resolve the gap through domain shift and can tailor PLMs to generate text that better reflects a given small fine-tuning corpus.

1. INTRODUCTION

Our goal is to adapt pre-trained language models (PLMs) to unconditional text generation toward a target domain. The success of Transformer-based PLMs motivates us to explore how to fine-tune them so that they well reflect a given target corpus, thereby generating more personalized text with very little specialization. The target corpus is generally much smaller than existing pre-training corpora, which may lead to catastrophic forgetting (Ramasesh et al., 2021). For example, the popular pre-training data sets English Gigaword (Parker et al., 2011) and ClueWeb 2012-B[1] occupy 16 GB and 25 TB, respectively. PLMs can become biased toward the patterns of language used in the training data (Keskar et al., 2019). Given the rapid diversification of applications, an approach is needed that effectively achieves domain shift without catastrophic forgetting. Toward this domain shift, we propose a framework, Topic Aware Transformer (TAT), that adapts PLMs to unconditional generative tasks while alleviating catastrophic forgetting. As domain knowledge consists of global (e.g., linguistic) and specific (e.g., semantic) knowledge, our intuition is that knowledge can be represented as a distribution of words, and that the gap between the source and target domains can be taken to be the difference between these distributions. These intuitions motivate TAT to detect these distributions via topics, and to steer PLMs toward them to highlight target-specific knowledge. Concretely, TAT introduces a topic steering layer (TSL) as an additional layer that detects topics and guides the training of PLMs, and Topic Distribution Modeling (TDM) as a training task that aligns text with topic representations of the target domain. To prevent catastrophic forgetting, TAT fine-tunes PLMs while bridging the domain gap without updating the PLM parameters.
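To make the steering idea concrete, the following is a minimal NumPy sketch of an add-on layer in the spirit of TSL: the PLM's hidden state is treated as frozen input, a small trainable module infers a topic mixture from it, and the topic-weighted embedding is added back to the hidden state. All names, shapes, and the exact mixing scheme are illustrative assumptions for this sketch, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, n_topics = 16, 4

class TopicSteeringLayer:
    """Hypothetical sketch of a topic steering layer.

    Only this layer's parameters (topic embeddings and a projection)
    would be trained; the PLM producing the hidden states stays frozen,
    which is how catastrophic forgetting is avoided in this setup.
    """

    def __init__(self):
        # trainable parameters of the add-on layer (assumed shapes)
        self.topic_emb = rng.normal(size=(n_topics, hidden_dim)) * 0.1
        self.proj = rng.normal(size=(hidden_dim, n_topics)) * 0.1

    def __call__(self, h):
        # infer a topic mixture from the frozen PLM hidden state h
        logits = h @ self.proj
        mix = np.exp(logits - logits.max())
        mix /= mix.sum()                      # softmax over topics
        # steer: shift h toward the topic-weighted embedding
        return h + mix @ self.topic_emb, mix

h = rng.normal(size=hidden_dim)  # stand-in for a frozen PLM hidden state
steered, mix = TopicSteeringLayer()(h)
print(steered.shape, round(mix.sum(), 6))
```

In this sketch the topic mixture plays the role of an unsupervised label over word distributions; a training task like TDM would then supervise `mix` against a topic distribution inferred from the target corpus.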
Experiments confirm that TAT supports PLMs and verify its advantages as follows:
• Theoretical contributions: TSL allows topics to act as unsupervised labels that represent global and target-specific word distributions as domain knowledge, and adapts PLMs to



[1] https://www.lemurproject.org/clueweb09.php/

