BT-CHAIN: BIDIRECTIONAL TRANSPORT CHAIN FOR TOPIC HIERARCHY DISCOVERY

Abstract

Topic modeling has long been an important tool for text analysis. Traditionally, the topics discovered by a model are assumed to be independent. However, as a semantic representation of a concept, a topic is naturally related to others, which motivates learning hierarchical topic structures. Most existing models for hierarchy learning are Bayesian and require nontrivial posterior inference. Although recent transport-based topic models bypass posterior inference, none of them considers deep topic structures. In this paper, we represent each document by its word embeddings and propose a novel bidirectional transport chain to discover multi-level topic structures, where each layer learns a set of topic embeddings and the hierarchical representations of a document are defined as a series of empirical distributions determined by the topic proportions and the corresponding topic embeddings. To fit such hierarchies, we develop an upward-downward optimization strategy under the recent conditional transport theory: document information is first transported along the upward path, and the hierarchical representations are then refined layer by layer along the downward path according to the adjacent upper and lower layers. Extensive experiments on text corpora show that our approach enjoys superior modeling accuracy and interpretability. Moreover, experiments on learning hierarchical visual topics from images demonstrate the adaptability and flexibility of our method.

1. INTRODUCTION

Topic models (TMs) such as latent Dirichlet allocation (LDA) (Blei et al., 2003), Poisson factor analysis (PFA) (Zhou et al., 2012), and their various extensions (Teh et al., 2006; Hoffman et al., 2010; Blei, 2012; Zhou et al., 2016) are a family of popular techniques for discovering the hidden semantic structure of a collection of documents in an unsupervised manner. Beyond learning flat topics, mining latent hierarchical topic structures has attracted much research effort, since hierarchies are ubiquitous in large text corpora (Meng et al., 2020; Lee et al., 2022) and support a wide range of applications (Grimmer, 2010; Zhang et al.; Guo et al., 2020). Hierarchical Bayesian probabilistic models have been commonly used to learn topic structures (Blei et al., 2010; Paisley et al., 2014; Gan et al., 2015; Henao et al., 2015; Zhou et al., 2016), where a hierarchy of topics is learned and the topics in the higher layers serve as priors for the topics in the lower layers. Despite the success of Bayesian models in topic structure mining, most of them rely on Bayesian posterior inference to optimize their parameters, e.g., Markov chain Monte Carlo (MCMC) or variational inference (VI), which is usually nontrivial to derive and can be inflexible and inefficient for large text corpora (Zhang et al., 2018). Recent developments in autoencoding variational inference (AVI) (Kingma & Welling, 2013; Rezende et al., 2014) provide stronger inference tools for Bayesian models and have inspired several neural topic models (Zhang et al., 2018; Duan et al., 2021a), resulting in improved efficiency and flexibility. However, applying AVI to neural topic models still has limitations. First, unlike asymptotically exact methods such as MCMC, estimating the variational posterior always involves a trade-off between accuracy and efficiency (Salimans et al., 2015).
Besides, the latent distributions are required to be reparameterizable and the KL divergence is expected to have an analytical form, both of which are hard to satisfy for topic models since they usually rely on the Dirichlet or Gamma distribution (Blei et al., 2003; Zhou et al., 2015). Another concern comes from likelihood maximization, in which the inference of the topic structure relies on word


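To make the reparameterization requirement discussed above concrete, the following sketch (illustrative only, not part of the proposed model; all function names are ours) shows the standard Gaussian reparameterization trick used in AVI, together with the common logistic-normal workaround that several neural topic models adopt because a Dirichlet draw admits no comparably simple reparameterization:

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_reparameterize(mu, log_var):
    """Gaussian reparameterization: z = mu + sigma * eps, eps ~ N(0, I).
    The sample is a deterministic, differentiable function of (mu, log_var),
    so gradients can flow through it."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def logistic_normal_topic_proportions(mu, log_var):
    """Dirichlet samples lack a simple reparameterization, so neural topic
    models often approximate topic proportions with a softmax-transformed
    Gaussian (a logistic-normal distribution)."""
    z = gaussian_reparameterize(mu, log_var)
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=-1, keepdims=True)

# Toy batch: 2 documents, 5 topics (shapes are illustrative).
mu = np.zeros((2, 5))
log_var = np.zeros((2, 5))
theta = logistic_normal_topic_proportions(mu, log_var)
print(theta.shape)                            # (2, 5)
print(np.allclose(theta.sum(axis=-1), 1.0))   # True: valid topic proportions
```

The resulting `theta` lies on the probability simplex, mimicking Dirichlet-distributed topic proportions while keeping the sampling path differentiable; the approximation gap of such surrogates is one of the AVI limitations the paragraph above points to.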