NEURAL TOPIC MODEL VIA OPTIMAL TRANSPORT

Abstract

Recently, Neural Topic Models (NTMs) inspired by variational autoencoders have attracted increasing research interest due to their promising results on text analysis. However, it is usually hard for existing NTMs to achieve good document representation and coherent/diverse topics at the same time. Moreover, their performance often degrades severely on short documents. The requirement of reparameterisation can also compromise their training quality and model flexibility. To address these shortcomings, we present a new neural topic model via the theory of optimal transport (OT). Specifically, we propose to learn the topic distribution of a document by directly minimising its OT distance to the document's word distribution. Importantly, the cost matrix of the OT distance models the weights between topics and words, and is constructed from the distances between topics and words in an embedding space. Our proposed model can be trained efficiently with a differentiable loss. Extensive experiments show that our framework significantly outperforms state-of-the-art NTMs on discovering more coherent and diverse topics and on deriving better document representations for both regular and short texts.
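The core mechanism in the abstract — measuring an OT distance between a document's topic distribution and its word distribution under a cost matrix built from topic/word embedding distances — can be sketched with entropic-regularised (Sinkhorn) OT. The sketch below is illustrative only: the cosine-based cost, the function names, and the regularisation settings are assumptions, not the paper's exact formulation.

```python
import numpy as np

def embedding_cost(topic_emb, word_emb):
    """Cost matrix from topic and word embeddings: 1 - cosine similarity.
    (One plausible choice of embedding distance; the paper's exact cost may differ.)"""
    t = topic_emb / np.linalg.norm(topic_emb, axis=1, keepdims=True)
    w = word_emb / np.linalg.norm(word_emb, axis=1, keepdims=True)
    return 1.0 - t @ w.T                      # shape: (num_topics, vocab_size)

def sinkhorn_ot(a, b, M, eps=0.1, n_iters=300):
    """Entropic-regularised OT distance between a topic distribution `a`
    and a word distribution `b` under cost matrix `M`, via Sinkhorn iterations."""
    K = np.exp(-M / eps)                      # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):                  # alternating marginal scaling
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]           # (approximate) transport plan
    return float((P * M).sum())               # transport cost under the plan
```

Because every step above is differentiable in the cost matrix and the input distributions, such a loss can be minimised with standard gradient-based training, which is what makes the OT objective attractive for a neural model.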

1. INTRODUCTION

As an unsupervised approach, topic modelling has enjoyed great success in automatic text analysis. In general, a topic model aims to discover a set of latent topics from a collection of documents, each of which describes an interpretable semantic concept. Topic models like Latent Dirichlet Allocation (LDA) (Blei et al., 2003) and its hierarchical/Bayesian extensions, e.g., in Blei et al. (2010); Paisley et al. (2015); Gan et al. (2015); Zhou et al. (2016), have achieved impressive performance for document analysis. Recently, the development of Variational AutoEncoders (VAEs) and Autoencoding Variational Inference (AVI) (Kingma & Welling, 2013; Rezende et al., 2014) has facilitated the proposal of Neural Topic Models (NTMs), e.g., in Miao et al. (2016); Srivastava & Sutton (2017); Krishnan et al. (2018); Burkhardt & Kramer (2019). Inspired by VAEs, many NTMs use an encoder that takes the Bag-of-Words (BoW) representation of a document as input and approximates the posterior distribution of the latent topics. The posterior samples are then fed into a decoder to reconstruct the BoW representation. Compared with conventional topic models, NTMs usually enjoy better flexibility and scalability, which are important for applications on large-scale data.

Despite their promising performance and recent popularity, existing NTMs have several shortcomings that could hinder their usefulness and further extension. i) The training and inference processes of NTMs are typically complex due to the constructions of the prior and posterior over latent topics. To encourage topic sparsity and smoothness, Dirichlet (Burkhardt & Kramer, 2019) or gamma (Zhang et al., 2018) distributions are usually used as the prior and posterior of topics, but the reparameterisation trick is inapplicable to them; complex sampling schemes or approximations must be used instead, which can limit model flexibility. ii) A desideratum of a topic model is to produce good topical representations of documents together with coherent and diverse topics, but for many existing NTMs it is hard to achieve both at the same time. This is because the objective of NTMs is to lower the reconstruction error, which usually makes topics less coherent and diverse, as observed and analysed in Srivastava & Sutton (2017); Burkhardt & Kramer (2019). iii) It is well known that topic models degrade severely on short documents such as tweets, news headlines and product reviews, as each individual document contains insufficient word co-occurrence information. This issue can be exacerbated for NTMs because of their encoder and decoder networks, which are usually more vulnerable to data sparsity.
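The encode–sample–decode pipeline of a VAE-based NTM described above can be sketched in a few lines of NumPy. This is a forward-pass sketch only, under stated assumptions: the layer sizes, the tanh activation, the Gaussian latent with a softmax mapping to topic proportions (as in, e.g., ProdLDA of Srivastava & Sutton (2017)), and the randomly initialised weights standing in for trained parameters are all illustrative, not any particular published architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
V, K, H = 1000, 20, 64                           # vocab size, topics, hidden units (illustrative)

# Randomly initialised weights stand in for trained parameters.
W_h  = rng.normal(scale=0.02, size=(V, H))       # encoder hidden layer
W_mu = rng.normal(scale=0.02, size=(H, K))       # posterior mean head
W_lv = rng.normal(scale=0.02, size=(H, K))       # posterior log-variance head
beta = rng.normal(scale=0.02, size=(K, V))       # topic-word decoder matrix

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def encode(bow):
    """Encoder: BoW vector -> parameters of the Gaussian posterior."""
    h = np.tanh(bow @ W_h)
    return h @ W_mu, h @ W_lv                    # mean, log-variance

def forward(bow):
    """One encode -> reparameterised sample -> decode pass."""
    mu, logvar = encode(bow)
    z = mu + np.exp(0.5 * logvar) * rng.normal(size=K)   # reparameterisation trick
    theta = softmax(z)                           # document-topic proportions
    recon = softmax(theta @ beta)                # reconstructed word distribution
    return theta, recon

bow = rng.integers(0, 3, size=V).astype(float)   # toy BoW count vector
theta, recon = forward(bow)
```

Note that the reparameterisation step above is straightforward only because the posterior is Gaussian; for Dirichlet or gamma posteriors, as the text points out, no such simple transform of a parameter-free noise sample exists, which is precisely shortcoming i).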

