INVERTIBLE MANIFOLD LEARNING FOR DIMENSION REDUCTION

Anonymous

Abstract

It is widely believed that a dimension reduction (DR) process inevitably drops information in most practical scenarios. Thus, most methods, including manifold-based DR methods, try to preserve some essential information of the data after DR. However, they usually fail to yield satisfying results, especially in high-dimensional cases. In the context of manifold learning, we argue that a good low-dimensional representation should preserve the topological and geometric properties of the data manifold, which together carry the entire information of the manifold. In this paper, we define the problem of information-lossless NLDR under the manifold assumption and propose a novel two-stage NLDR method, called invertible manifold learning (inv-ML), to tackle it. A local isometry constraint that preserves local geometry is applied under this assumption in inv-ML. First, a homeomorphic sparse coordinate transformation is learned to find a low-dimensional representation without losing topological information. Second, a linear compression is performed on the learned sparse coding, trading off the target dimension against the incurred information loss. Experiments are conducted on seven datasets with a neural network implementation of inv-ML, called i-ML-Enc, which demonstrate that the proposed inv-ML not only achieves invertible NLDR in comparison with typical existing methods, but also reveals the characteristics of the learned manifolds through linear interpolation in the latent space. Moreover, we find that the reliability of the tangent space approximated by local neighborhoods on real-world datasets is key to the success of manifold-based DR algorithms. The code will be made available soon.

1. INTRODUCTION

In real-world scenarios, it is widely believed that loss of information is inevitable after dimension reduction (DR), even though the goal of DR is to preserve as much information as possible in the low-dimensional space. In the case of linear DR, compressed sensing (Donoho, 2006) challenges this belief under practical sparsity conditions on the given data. In the case of nonlinear dimension reduction (NLDR), however, the question has not been clearly discussed: what is the structure within the data, and how can it be maintained after NLDR? From the perspective of manifold learning, the manifold assumption is widely adopted, but classical manifold-based DR methods usually fail to yield good results in many practical cases. What, then, is the gap between theoretical and real-world applications of manifold-based DR? Here, we give the first detailed discussion of these two problems in the context of manifold learning. We argue that a good low-dimensional representation should preserve the topology and geometry of the input data, which requires the NLDR transformation to be homeomorphic. Thus, we propose an invertible NLDR process, called inv-ML, which combines a sparse coordinate transformation with a local isometry constraint to preserve topological and geometric properties, and which explains information-lossless NLDR in manifold learning theoretically. We instantiate inv-ML as a neural network, called i-ML-Enc, via a cascade of equidimensional layers followed by a linear transformation layer. Extensive experiments are conducted to validate the invertible NLDR ability of i-ML-Enc and to analyze the learned representations, revealing inherent difficulties of classical manifold learning.

Topology-preserving dimension reduction. To start, we first give a theoretical definition of information-lossless DR on a manifold. A topological property is one that is invariant under a homeomorphism, and thus what we want to achieve is to construct a homeomorphism for dimension
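The two-stage structure described above can be sketched numerically: an invertible, equidimensional map that loses no information, followed by a linear compression whose loss depends on the target dimension. The sketch below is only an illustration of this decomposition, not the paper's i-ML-Enc implementation: the orthogonal map with a leaky-ReLU nonlinearity stands in for the learned homeomorphic coordinate transformation, PCA stands in for the learned linear compression, and the data, dimensions, and choices of k are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def leaky_relu(x, a=0.2):
    # Elementwise, strictly monotone, hence invertible nonlinearity.
    return np.where(x > 0, x, a * x)

def leaky_relu_inv(y, a=0.2):
    return np.where(y > 0, y, y / a)

# Stage 1: an equidimensional homeomorphism R^D -> R^D.
# Here: random orthogonal map Q composed with leaky ReLU (illustrative).
D = 10
Q, _ = np.linalg.qr(rng.standard_normal((D, D)))

def f(x):        # forward transform
    return leaky_relu(x @ Q)

def f_inv(z):    # exact inverse, since Q^{-1} = Q^T
    return leaky_relu_inv(z) @ Q.T

# Synthetic data lying on a low-dimensional (d = 3) linear subspace of R^10,
# a toy stand-in for a data manifold.
d = 3
X = rng.standard_normal((200, d)) @ rng.standard_normal((d, D))

Z = f(X)
assert np.allclose(f_inv(Z), X)  # stage 1 drops no information

# Stage 2: linear compression (here PCA via SVD); the target dimension k
# trades off against the incurred reconstruction loss.
Zc = Z - Z.mean(0)
U, S, Vt = np.linalg.svd(Zc, full_matrices=False)
for k in (2, 5, 10):
    Y = Zc @ Vt[:k].T            # k-dimensional representation
    Z_rec = Y @ Vt[:k]           # linear reconstruction
    err = np.linalg.norm(Z_rec - Zc) / np.linalg.norm(Zc)
    print(f"k={k}: relative reconstruction loss {err:.3f}")
```

At k = D the compression is lossless, and the loss grows as k shrinks, which is the dimension/information trade-off the second stage refers to; the actual method additionally trains stage 1 with a local isometry constraint so that local geometry survives the transformation.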

