NEURAL NONNEGATIVE CP DECOMPOSITION FOR HIERARCHICAL TENSOR ANALYSIS

Abstract

There is a significant demand for topic modeling on large-scale data with complex multi-modal structure in applications such as multi-layer network analysis, temporal document classification, and video data analysis; frequently this multi-modal data has latent hierarchical structure. We propose a new hierarchical nonnegative CANDECOMP/PARAFAC (CP) decomposition (hierarchical NCPD) model and a training method, Neural NCPD, for performing hierarchical topic modeling on multi-modal tensor data. Neural NCPD utilizes a neural network architecture and backpropagation to mitigate error propagation through hierarchical NCPD.

1. INTRODUCTION

The recent explosion in the collection and availability of data has led to an unprecedented demand for scalable data analysis techniques. Furthermore, data with a multi-modal tensor format has become ubiquitous across numerous fields (Cichocki et al., 2009). The need to reduce redundant dimensions (across modes) and to identify meaningful latent trends within data has rightly become an integral focus of research within signal processing and computer science. An important application of these dimension-reduction techniques is topic modeling, the task of identifying latent topics and themes of a dataset in an unsupervised or partially supervised manner. A popular topic modeling approach for matrix data is the dimension-reduction technique nonnegative matrix factorization (NMF) (Lee & Seung, 1999), which is generalized to multi-modal tensor data by the nonnegative CP decomposition (NCPD) (Carroll & Chang, 1970; Harshman, 1970). These models identify r latent topics within the data; here the rank r is a user-defined parameter that can be challenging to select without a priori knowledge or a heuristic selection procedure. In topic modeling applications, one often additionally wishes to understand the hierarchical topic structure (i.e., how the topics are naturally related and combine into supertopics). For matrices (tensors), a naive approach is to apply NMF (NCPD) first with rank r and then again with rank j < r, and simply identify the j supertopics as linear (multilinear) combinations of the original r subtopics. However, due to the nonconvexity of the NMF (NCPD) objective function, the supertopics identified in this way need not be linearly (multilinearly) related to the subtopics. For this reason, hierarchical models that enforce these relationships between subtopics and supertopics have become a popular direction of research.
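The naive two-stage approach above can be sketched concretely: factor a matrix at rank r, factor it again independently at rank j < r, and measure how well the supertopics are explained as combinations of the subtopics. The multiplicative-update solver and all sizes, ranks, and seeds below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def nmf_mu(X, r, iters=300, seed=0, eps=1e-10):
    """Multiplicative updates (Lee & Seung, 1999) for X ≈ A S with A, S ≥ 0."""
    rng = np.random.default_rng(seed)
    A = rng.random((X.shape[0], r))
    S = rng.random((r, X.shape[1]))
    for _ in range(iters):
        S *= (A.T @ X) / (A.T @ A @ S + eps)
        A *= (X @ S.T) / (A @ S @ S.T + eps)
    return A, S

rng = np.random.default_rng(1)
X = rng.random((60, 40))              # nonnegative data matrix (hypothetical)

A_sub, _ = nmf_mu(X, r=8)             # r = 8 subtopics
A_sup, _ = nmf_mu(X, r=3, seed=7)     # j = 3 supertopics, fit independently

# Least-squares fit A_sup ≈ A_sub @ C. Because the two factorizations are
# solved independently (and each problem is nonconvex), nothing forces the
# supertopics into the span of the subtopics, so the residual is generally
# nonzero.
C, *_ = np.linalg.lstsq(A_sub, A_sup, rcond=None)
rel_residual = np.linalg.norm(A_sup - A_sub @ C) / np.linalg.norm(A_sup)
```

Hierarchical models instead constrain the supertopic dictionary to be exactly a product of the subtopic dictionary with a nonnegative mixing matrix.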
A challenge of these models is that the nonconvexity of the model at each layer of the hierarchy can yield cascading error through the layers; several works have proposed techniques for mitigating this cascade of error (Flenner & Hunter, 2018; Trigeorgis et al., 2016; Le Roux et al., 2015; Sun et al., 2017; Gao et al., 2019). In this work, we propose a hierarchical NCPD model and Neural NCPD, an algorithm for training this model that exploits backpropagation techniques to mitigate the effects of error introduced at earlier (subtopic) layers of the hierarchy propagating downstream to later (supertopic) layers. This approach allows us to (1) explore the topics learned at different ranks simultaneously, and (2) illustrate the hierarchical relationship of topics learned at different tensor decomposition ranks.

Notation. We follow the notational conventions of Goodfellow et al. (2016); e.g., tensor X, matrix X, vector x, and (integer or real) scalar x. In all models, we use the variable r (with superscripts denoting the layer of hierarchical models) to denote model rank and use j when indexing rank-one components. In all tensor decomposition models, we use k to denote the order (number of modes) of the tensor and use i when indexing modes of the tensor. In all hierarchical models, we use L to denote the number of layers in the model and use ℓ to index layers. We let ⊗ denote the vector outer product and adopt the CP decomposition notation $[\![X_1, X_2, \dots, X_k]\!] \equiv \sum_{j=1}^{r} x_j^{(1)} \otimes x_j^{(2)} \otimes \cdots \otimes x_j^{(k)}$, where $x_j^{(i)}$ is the jth column of the ith factor matrix $X_i$ (Kolda & Bader, 2009).

Contributions. Our main contributions are two-fold. First, we propose a novel hierarchical nonnegative tensor decomposition model that we denote hierarchical NCPD (HNCPD).
Our model treats all tensor modes alike, and the output is not affected by the order of the modes in the tensor representation; this is a property not shared by other hierarchical tensor decomposition models such as that of Cichocki et al. (2007a). Second, we propose an effective neural network-inspired training method that we call Neural NCPD. This method builds upon the Neural NMF method proposed in Gao et al. (2019), but is not a direct extension; Neural NCPD consists of a branch of Neural NMF for each tensor mode, but the backpropagation scheme must be adapted for factorization information flow between branches.

Organization. In the remainder of Section 1, we present related work on tensor decompositions and training methods. In Section 2, we present our main contributions, hierarchical NCPD and the Neural NCPD method. In Section 3, we test Neural NCPD on real and synthetic data, and offer some brief conclusions in Section 4. We include justification of several computational details of our method and further experimental results in Appendix A.
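The CP reconstruction notation defined above can be sketched directly in numpy: sum, over j, the outer products of the jth columns of the factor matrices. The tensor shape (5, 6, 7) and rank 4 below are arbitrary illustrative choices.

```python
import numpy as np

def cp_reconstruct(factors):
    """Return sum_j x_j^(1) ⊗ x_j^(2) ⊗ ... ⊗ x_j^(k), where x_j^(i) is the
    jth column of the ith factor matrix; all factors share the same rank r."""
    r = factors[0].shape[1]
    shape = tuple(X.shape[0] for X in factors)
    T = np.zeros(shape)
    for j in range(r):
        comp = factors[0][:, j]
        for X in factors[1:]:
            # vector outer product, building up a rank-one k-mode tensor
            comp = np.multiply.outer(comp, X[:, j])
        T += comp
    return T

rng = np.random.default_rng(0)
factors = [rng.random((n, 4)) for n in (5, 6, 7)]   # k = 3 modes, rank r = 4
T = cp_reconstruct(factors)                          # tensor of shape (5, 6, 7)
```

For a fixed order k, the same reconstruction can be written as a single einsum, e.g. `np.einsum('ir,jr,kr->ijk', *factors)` for k = 3; the loop form above works for any number of modes.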

1.1. RELATED WORK

In this section, we introduce NMF, hierarchical NMF, the Neural NMF method, and NCPD, and then summarize some relevant work.

Nonnegative Matrix Factorization (NMF). Given a nonnegative matrix $X \in \mathbb{R}^{n_1 \times n_2}_{\geq 0}$ and a desired dimension $r \in \mathbb{N}$, NMF seeks to decompose X into a product of two low-dimensional nonnegative matrices: a dictionary matrix $A \in \mathbb{R}^{n_1 \times r}_{\geq 0}$ and a representation matrix $S \in \mathbb{R}^{r \times n_2}_{\geq 0}$, so that $X \approx AS = \sum_{j=1}^{r} a_j \otimes s_j$, where $a_j$ is a column (topic) of A and $s_j$ is a row of S. Typically, r is chosen such that $r < \min\{n_1, n_2\}$ to reduce the dimension of the original data matrix or reveal latent themes in the data. Each column of S provides the approximation of the respective column of X in the lower-dimensional space spanned by the columns of A. The nonnegativity of the NMF factor matrices yields clear interpretability; thus, NMF has found application in document clustering (Xu et al., 2003; Gaussier & Goutte, 2005; Shahnaz et al., 2006) and in image processing and computer vision (Lee & Seung, 1999; Guillamet & Vitria, 2002; Hoyer, 2002), among others. Popular training methods include multiplicative updates (Lee & Seung, 1999; 2001; Lee et al., 2009), projected gradient descent (Lin, 2007), and alternating least squares (Kim et al., 2008; Kim & Park, 2008).

Hierarchical NMF (HNMF). HNMF seeks to illuminate hierarchical structure by recursively factorizing the NMF S matrices; see, e.g., Cichocki et al. (2009). We first apply NMF with rank $r^{(0)}$ and then apply NMF with rank $r^{(1)}$ to the resulting S matrix, collecting the $r^{(0)}$ subtopics into $r^{(1)}$ supertopics. HNMF with L layers approximately factors the data matrix as

$X \approx A^{(0)} S^{(0)} \approx A^{(0)} A^{(1)} S^{(1)} \approx \cdots \approx A^{(0)} A^{(1)} \cdots A^{(L-1)} S^{(L-1)}.$ (3)

Here the $A^{(i)}$ matrix represents how the subtopics at layer i collect into the supertopics at layer i + 1. Note that as L increases, the error $\|X - A^{(0)} A^{(1)} \cdots A^{(L-1)} S^{(L-1)}\|_F$ necessarily increases as error propagates with each step. As a result, significant error is introduced when L is large. Choosing $r^{(0)}, r^{(1)}, \dots, r^{(L-1)}$ in practice proves difficult, as the number of possibilities grows combinatorially.
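The recursive HNMF scheme above can be sketched as follows: factor X at rank r^(0), then refactor each successive S matrix at the next rank, and track the Frobenius reconstruction error at each layer. The plain multiplicative-update solver and the ranks (8, 4, 2) are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def nmf_mu(X, r, iters=300, seed=0, eps=1e-10):
    """Multiplicative updates (Lee & Seung, 1999) for X ≈ A S with A, S ≥ 0."""
    rng = np.random.default_rng(seed)
    A = rng.random((X.shape[0], r))
    S = rng.random((r, X.shape[1]))
    for _ in range(iters):
        S *= (A.T @ X) / (A.T @ A @ S + eps)
        A *= (X @ S.T) / (A @ S @ S.T + eps)
    return A, S

def hnmf(X, ranks):
    """Recursively factor X ≈ A(0) S(0), S(0) ≈ A(1) S(1), ...;
    returns the dictionaries [A(0), ..., A(L-1)] and representations S(i)."""
    As, Ss, M = [], [], X
    for r in ranks:
        A, M = nmf_mu(M, r)   # M becomes S(i), refactored at the next layer
        As.append(A)
        Ss.append(M)
    return As, Ss

rng = np.random.default_rng(1)
X = rng.random((50, 40))                  # hypothetical nonnegative data
As, Ss = hnmf(X, ranks=[8, 4, 2])

# Layer-i reconstruction X ≈ A(0)···A(i) S(i); the error accumulates with
# depth, illustrating the cascading error the paper aims to mitigate.
P = np.eye(X.shape[0])
errs = []
for A, S in zip(As, Ss):
    P = P @ A
    errs.append(np.linalg.norm(X - P @ S))
```

Each layer here is fit greedily with no feedback to earlier layers; Neural NCPD's backpropagation scheme is designed precisely to revisit the earlier factors after the later ones are trained.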

