A CRITIQUE OF SELF-EXPRESSIVE DEEP SUBSPACE CLUSTERING

Abstract

Subspace clustering is an unsupervised clustering technique designed for data supported on a union of linear subspaces, where each subspace defines a cluster of dimension lower than the ambient space. Many existing formulations of this problem exploit the self-expressive property of linear subspaces: any point within a subspace can be represented as a linear combination of other points within the subspace. To extend this approach to data supported on a union of non-linear manifolds, numerous studies have proposed learning an embedding of the original data with a neural network that is regularized by a self-expressive loss on the embedded data, so as to encourage a union-of-linear-subspaces prior in the embedded space. Here we show that there are a number of potential flaws with this approach which have not been adequately addressed in prior work. In particular, we show that the model formulation is often ill-posed in that it can lead to a degenerate embedding of the data, which need not correspond to a union of subspaces at all and is poorly suited for clustering. We validate our theoretical results experimentally and also repeat prior experiments reported in the literature, where we conclude that a significant portion of the previously claimed performance benefits can be attributed to an ad-hoc post-processing step rather than to the deep subspace clustering model.

1. INTRODUCTION AND BACKGROUND

Subspace clustering is a classical unsupervised learning problem, where one wishes to segment a given dataset into a prescribed number of clusters, with each cluster defined as a linear (or affine) subspace of dimension lower than the ambient space. A wide variety of approaches to this problem have been proposed in the literature (Vidal et al., 2016), but a large family of state-of-the-art methods is based on exploiting the self-expressive property of linear subspaces. That is, if a point lies in a linear subspace, then it can be represented as a linear combination of other points within the subspace. Based on this fact, a wide variety of methods have been proposed which, given a dataset $Z \in \mathbb{R}^{d \times N}$ of $N$ $d$-dimensional points, find a matrix of coefficients $C \in \mathbb{R}^{N \times N}$ by solving the problem:

$$\min_{C \in \mathbb{R}^{N \times N}} F(Z, C) \equiv \tfrac{1}{2}\|ZC - Z\|_F^2 + \lambda \theta(C) = \tfrac{1}{2}\langle Z^\top Z, (C - I)(C - I)^\top \rangle + \lambda \theta(C). \qquad (1)$$

Here, the first term $\|ZC - Z\|_F^2$ captures the self-expressive property by requiring every data point to represent itself as an approximate linear combination of other points, i.e., $Z_i \approx Z C_i$, where $Z_i$ and $C_i$ are the $i$-th columns of $Z$ and $C$, respectively. The second term, $\theta(C)$, is some regularization function designed to encourage each data point to select only other points within the same subspace.
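As a concrete illustration of problem (1), the following sketch solves the self-expressive program with the ridge regularizer $\theta(C) = \tfrac{1}{2}\|C\|_F^2$, which admits a closed-form minimizer $C = (Z^\top Z + \lambda I)^{-1} Z^\top Z$. The synthetic orthogonal-subspace data and the weight `lam` are illustrative assumptions, not part of any particular method from the literature:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic union of two mutually orthogonal 2-D subspaces in R^6
# (an illustrative assumption; real data is rarely this clean).
d, n_per = 6, 20
Q = np.linalg.qr(rng.standard_normal((d, 4)))[0]
U1, U2 = Q[:, :2], Q[:, 2:]                         # orthonormal bases
Z = np.hstack([U1 @ rng.standard_normal((2, n_per)),
               U2 @ rng.standard_normal((2, n_per))])  # d x N, N = 2 * n_per

# With ridge regularization theta(C) = (1/2) ||C||_F^2, problem (1) has the
# closed-form minimizer C = (Z^T Z + lam * I)^{-1} Z^T Z.
lam = 0.1                                           # illustrative weight
N = Z.shape[1]
G = Z.T @ Z                                         # Gram matrix
C = np.linalg.solve(G + lam * np.eye(N), G)

# Symmetrized affinity typically fed to spectral clustering; because the two
# subspaces are orthogonal here, coefficients concentrate within each cluster.
A = np.abs(C) + np.abs(C).T
within = 0.5 * (A[:n_per, :n_per].mean() + A[n_per:, n_per:].mean())
across = A[:n_per, n_per:].mean()
print(f"within-cluster mean {within:.3f} vs cross-cluster mean {across:.2e}")
```

In practice $\theta$ is more commonly an $\ell_1$ or nuclear-norm penalty (to promote sparse or low-rank connections), and the final segmentation is obtained by running spectral clustering on the affinity $|C| + |C|^\top$.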

