GENERALIZED MATRIX LOCAL LOW RANK REPRESEN-TATION BY RANDOM PROJECTION AND SUBMATRIX PROPAGATION

Abstract

Detecting distinct submatrices of low rank property is a highly desirable matrix representation learning technique for the ease of data interpretation, called the matrix local low rank representation (MLLRR). Based on different mathematical assumptions of the local pattern, the MLLRR problem could be categorized into two sub-problems, namely local constant variation (LCV) and local linear low rank (LLR). Existing solutions on MLLRR only focused on the LCV problem, which misses a substantial amount of true and interesting patterns. In this work, we develop a novel matrix computational framework called RPSP (Random Probing based submatrix Propagation) that provides an effective solution for both of the LCV and LLR problems. RPSP detects local low rank patterns that grow from small submatrices of low rank property, which are determined by a random projection approach. RPSP is supported by theories of random projection. Experiments on synthetic data demonstrate that RPSP outperforms all state-of-the-art methods, with the capacity to robustly and correctly identify the low rank matrices under both LCV and LLR settings. On real-world datasets, RPSP also demonstrates its effectiveness in identifying interpretable local low rank matrices.

1. INTRODUCTION

Matrix approximation has found wide-range utilities in recommendation systems, computer vision and text mining. Traditional matrix low rank approximation methods, such as truncated singular value decomposition (SVD) and rank minimization, assumes that the observed matrix has a global low-rank, indicating that the low rank components are dense. This becomes challenging in the phase of data interpretation. In real world data, both features and incidences may form sparse subspace structures. As illustrated in 



Fig 1, a matrix can be generated as the sum of a series of local low rank matrices, each consists of a sparse set of features and incidences. One example of such 'locality' property is the purchase history data, where a subset of items were purchased under a common reason by a subset of customers, while neither the items bought together or the users sharing a common purchase reason is known Cheng et al. (2014). Similarly, in biological single cell RNA-sequencing data, a subgroup of genes may be regulated by an unknown signal that is activated only in a subset of cells, which forms a local low rank gene co-regulation module Xia et al. (2017); Wan et al. (2019a); Chang et al. (2020). In addition, shapes, numbers and words in imaging data are also local low rankLee et al. (2016). In these situations, Matrix Local Low Rank Representation (MLLRR) is more advantageous with its locality assumptions to uncover more interpretable patterns hidden in the data.

Figure 1: One example of the Matrix Local Low Rank Representation (MLLRR) Problem.

