DIMENSION REDUCTION AS AN OPTIMIZATION PROBLEM OVER A SET OF GENERALIZED FUNCTIONS

Abstract

We reformulate the unsupervised dimension reduction (UDR) problem in the language of tempered distributions, i.e., as the problem of approximating an empirical probability density function p_emp(x) by another tempered distribution q(x) whose support lies in a k-dimensional subspace. Our problem thus reduces to the minimization of the distance between q and p_emp, D(q, p_emp), over a pertinent set of generalized functions. This infinite-dimensional formulation allows us to establish a connection with another classical problem of data science, the sufficient dimension reduction (SDR) problem: an algorithm for the first problem induces an algorithm for the second and vice versa. To reduce an optimization problem over distributions to an optimization problem over ordinary functions, we introduce a nonnegative penalty function R(f) that "forces" the support of f to be k-dimensional. We then present an algorithm for the minimization of I(f) + λR(f), based on the idea of a two-step iterative computation, briefly described as a) adaptation to real data and to fake data sampled around the k-dimensional subspace found at the previous iteration, and b) calculation of a new k-dimensional subspace. We demonstrate the method on 4 examples (3 UDR and 1 SDR) using synthetic data and standard datasets.

1. INTRODUCTION

Linear dimension reduction (LDR) is a family of problems in data science that includes principal component analysis, factor analysis, linear multidimensional scaling, Fisher's linear discriminant analysis, canonical correlation analysis, sufficient dimension reduction (SDR), maximum autocorrelation factors, slow feature analysis, and more. In unsupervised dimension reduction (UDR) we are given a finite number of points in R^n (sampled according to some unknown distribution), and the goal is to find a "low-dimensional" affine (or linear) subspace that approximates "the support" of the distribution. The field has reached a level of maturity at which unifying frameworks for the problem become of special interest Cunningham & Ghahramani (2015). The approach we present in this paper is based on the theory of generalized functions, or tempered distributions Soboleff (1936); Schwartz (1949). An important generalized function that cannot be represented as an ordinary function is the Dirac delta function, denoted δ; its n-dimensional version is denoted δ_n.
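For concreteness, δ_n is defined not pointwise but by its action on test functions: pairing the shifted delta with a Schwartz function amounts to evaluating that function at the shift point,

\[
\langle \delta_n(\cdot - x_0), \varphi \rangle = \varphi(x_0),
\qquad \varphi \in \mathcal{S}(\mathbb{R}^n),
\]

which is the sense in which all distributional identities below should be read.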

Any dataset {x_i}_{i=1}^N ⊆ R^n naturally corresponds to the distribution p_emp(x) = (1/N) Σ_{i=1}^N δ_n(x − x_i), which, with some abuse of terminology, can be called the empirical probability density function. Based on that, UDR can be understood as the task of approximating p_emp(x) by q(x), where q(x) is a distribution supported in a k-dimensional affine subspace A ⊆ R^n. Note that a distribution supported in a low-dimensional subset of R^n is not an ordinary function; exact definitions of such distributions can be found in Section 3. To formulate an optimization task we additionally need a loss D(p_emp, q) that measures the distance between the ground truth p_emp and the distribution q that we search for. Thus, in our approach, the UDR problem is defined as

I(q) = D(p_emp, q) → min_q    (1)

under the condition that q(x) has a k-dimensional support. 1
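As a point of reference for problem (1), note that one particular choice of D, the mean squared distance from the samples to the subspace A carrying q, recovers classical PCA. The sketch below (an illustrative assumption, not the algorithm proposed in this paper) finds the optimal k-dimensional affine subspace for that choice via the SVD; the names `best_affine_subspace` and `distance_to_subspace` are hypothetical helpers introduced here.

```python
import numpy as np

def best_affine_subspace(X, k):
    """Return (offset, basis) of the k-dim affine subspace minimizing
    the mean squared distance to the rows of X (classical PCA)."""
    mu = X.mean(axis=0)                       # optimal offset of A
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]                         # top-k right singular vectors

def distance_to_subspace(X, mu, basis):
    """Mean squared distance of the rows of X to the affine subspace."""
    R = X - mu
    proj = R @ basis.T @ basis                # orthogonal projection onto span(basis)
    return float(np.mean(np.sum((R - proj) ** 2, axis=1)))

rng = np.random.default_rng(0)
# Synthetic data: points near a 2-dimensional plane embedded in R^5
Z = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5))
X = Z + 0.01 * rng.normal(size=(200, 5))

mu, B = best_affine_subspace(X, k=2)
err = distance_to_subspace(X, mu, B)          # small: data is nearly planar
```

The general formulation (1) departs from this special case precisely because q is allowed to carry an arbitrary density within A, and D can be a distance between distributions rather than a geometric residual.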

