LEARNING LOCALITY AND ISOTROPY IN DIALOGUE MODELING

Abstract

Existing dialogue modeling methods have achieved promising performance on various dialogue tasks with the aid of Transformer and large-scale pre-trained language models. However, some recent studies revealed that the context representations produced by these methods suffer from the problem of anisotropy. In this paper, we find that the generated representations are also not conversational: they lose conversational structure information during the context modeling stage. To this end, we identify two properties in dialogue modeling, i.e., locality and isotropy, and present a simple method for dialogue representation calibration, namely SimDRC, to build isotropic and conversational feature spaces. Experimental results show that our approach significantly outperforms current state-of-the-art models on three open-domain dialogue tasks with eight benchmarks. More in-depth analyses further confirm the effectiveness of our proposed approach. We release the code at https://github.com/hahahawu/SimDRC.

1. INTRODUCTION

Dialogue modeling (Serban et al., 2016; Mehri et al., 2019; Liu et al., 2021) aims to encode the raw text of an input dialogue into contextual representations. Although Transformer-based dialogue modeling methods (Hosseini-Asl et al., 2020; Liu et al., 2021) have achieved great success on various dialogue tasks, some impediments in these methods remain underexplored. Specifically, recent studies (Ethayarajh, 2019; Su et al., 2022) have revealed that, on dialogue generation tasks, the representations produced by existing dialogue modeling methods are anisotropic, i.e., the features occupy a narrow cone in the vector space, which leads to the problem of degeneration. To alleviate this problem, previous solutions (e.g., SimCTG) (Su et al., 2021; 2022) encourage the model to learn isotropic token embeddings by pushing apart the representations of distinct tokens. While these methods build a more discriminative and isotropic feature space, they still ignore dialogue-specific features, such as inter-speaker correlations and conversational structure information, in the dialogue modeling stage. This naturally raises a question: are the representations produced by existing dialogue modeling methods really conversational? To answer this question, Figure 1 (a) shows the cosine similarity matrix of token representations produced by BART (Lewis et al., 2020) fully trained on the response generation task. First, anisotropy is easy to observe in the heatmap: the similarities between distinct tokens are relatively high, exceeding 0.5 for most token pairs. Figure 1 (b) then shows the similarity heatmap of token representations produced by SimCTG, whose colors are faded overall, suggesting that the anisotropy problem is alleviated.
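To make the quantities discussed above concrete, the following is a minimal NumPy sketch, not the authors' released code. It computes a token cosine-similarity matrix of the kind visualized in Figure 1, summarizes anisotropy as the mean similarity over distinct token pairs, and implements a token-level contrastive loss in the spirit of SimCTG. The function names, the random toy embeddings, and the margin value of 0.5 are illustrative assumptions; the loss follows our reading of the objective in Su et al. (2022), i.e., the average over distinct pairs (i, j) of max(0, ρ − s(h_i, h_i) + s(h_i, h_j)), where s is cosine similarity and ρ is a margin.

```python
import numpy as np


def cosine_similarity_matrix(h: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarities between the rows of h (num_tokens x dim)."""
    norms = np.linalg.norm(h, axis=1, keepdims=True)
    unit = h / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def mean_offdiag_similarity(sim: np.ndarray) -> float:
    """Average similarity over distinct token pairs (i != j).

    High values indicate an anisotropic space in which token
    representations crowd into a narrow cone.
    """
    n = sim.shape[0]
    mask = ~np.eye(n, dtype=bool)  # drop the trivial self-similarities
    return float(sim[mask].mean())


def token_contrastive_loss(h: np.ndarray, margin: float = 0.5) -> float:
    """SimCTG-style token-level contrastive loss (illustrative sketch).

    Averages max(0, margin - s(h_i, h_i) + s(h_i, h_j)) over all i != j,
    with s the cosine similarity; the margin default is an assumption.
    """
    sim = cosine_similarity_matrix(h)
    n = sim.shape[0]
    self_sim = np.diag(sim)[:, None]  # s(h_i, h_i), equal to 1 for nonzero rows
    hinge = np.maximum(0.0, margin - self_sim + sim)
    mask = ~np.eye(n, dtype=bool)
    return float(hinge[mask].sum() / (n * (n - 1)))


# Usage on hypothetical toy embeddings (not real model outputs).
rng = np.random.default_rng(0)
shared = rng.normal(size=16)
aniso = shared + 0.3 * rng.normal(size=(8, 16))  # tokens share a dominant direction
iso = rng.normal(size=(8, 16))                   # roughly isotropic tokens
print("anisotropic:", mean_offdiag_similarity(cosine_similarity_matrix(aniso)))
print("isotropic:  ", mean_offdiag_similarity(cosine_similarity_matrix(iso)))
print("contrastive loss (aniso):", token_contrastive_loss(aniso))
```

Because s(h_i, h_i) = 1 for nonzero vectors, each hinge term vanishes once the similarity between two distinct tokens drops below 1 − ρ, so minimizing this loss pushes distinct tokens apart and lowers the mean off-diagonal similarity. The objective makes no reference to utterance boundaries or speakers.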
However, another critical problem remains: the representations of tokens in different utterances are close to each other, making utterances indistinguishable from their token representations. This is undesirable: although the matrix is produced by a "dialogue modeling" method trained on a dialogue task with dialogue data, no conversational features can be captured from it. Ideally, we expect that the

