EFFICIENT COVARIANCE ESTIMATION FOR SPARSIFIED FUNCTIONAL DATA

Abstract

To avoid the prohibitive computational cost of sending entire data, we propose four sparsification schemes, RANDOM-KNOTS, RANDOM-KNOTS-SPATIAL, B-SPLINE and BSPLINE-SPATIAL, and present the corresponding nonparametric estimators of the covariance function. The covariance estimators are asymptotically equivalent to the sample covariance computed directly from the original data, and the estimated functional principal components effectively approximate the infeasible principal components under regularity conditions. The convergence rate shows that leveraging spatial correlation and B-spline interpolation helps to reduce information loss. A data-driven selection method is further applied to determine the number of eigenfunctions in the model. Extensive numerical experiments are conducted to illustrate the theoretical results. [1]

1. INTRODUCTION

Dimension reduction has received increasing attention as a way to avoid expensive and slow computation. Stich et al. (2018) investigated the convergence rate of Stochastic Gradient Descent after sparsification. Jhunjhunwala et al. (2021) focused on estimating the mean of vectors when only a subset of the elements of each original vector is available. The goal of this paper is to estimate the covariance function of sparsified functional data, which is a set of sparsified vectors collected from a distributed system of nodes.

Functional data analysis (FDA) has become an important research area due to its wide applications. Classical FDA requires a large number of regularly spaced measurements per subject. The data take the form $\{(x_{ij}, j/d),\ 1 \le i \le n,\ 1 \le j \le d\}$, in which $x_i(\cdot)$ is a latent smooth trajectory,

$$x_i(\cdot) = m(\cdot) + Z_i(\cdot). \qquad (1)$$

The deterministic function $m(\cdot)$ denotes the common population mean, and the random $Z_i(\cdot)$ are subject-specific small variations with $E Z_i(\cdot) = 0$. The standard process is $x(\cdot) = m(\cdot) + Z(\cdot)$ with $E Z(t) = 0$. The true covariance function is $G(t, t') = \mathrm{Cov}\{Z(t), Z(t')\}$. Let the sequences $\{\lambda_k\}_{k=1}^{\infty}$ and $\{\psi_k\}_{k=1}^{\infty}$ be the eigenvalues and eigenfunctions of $G(t, t')$, respectively, in which $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$, $\sum_{k=1}^{\infty} \lambda_k < \infty$, and $\{\psi_k\}_{k=1}^{\infty}$ form an orthonormal basis of $L^2[0, 1]$; see Hsing & Eubank (2015). Mercer Lemma entails that the $\psi_k$'s are continuous and

$$G(t, t') = \sum_{k=1}^{\infty} \lambda_k \psi_k(t) \psi_k(t'), \qquad \int G(t, t') \psi_k(t')\, dt' = \lambda_k \psi_k(t).$$

The standard process $x(\cdot)$ allows the Karhunen-Loève $L^2$ representation $x(\cdot) = m(\cdot) + \sum_{k=1}^{\infty} \xi_k \phi_k(\cdot)$, in which the random coefficients $\xi_k$, called functional principal component (FPC) scores, are uncorrelated with mean 0 and variance 1. The rescaled eigenfunctions $\phi_k$, called FPCs, satisfy $\phi_k = \sqrt{\lambda_k}\, \psi_k$ and $\int \{x(t) - m(t)\} \phi_k(t)\, dt = \lambda_k \xi_k$ for $k \ge 1$. Although the sequences $\{\lambda_k\}_{k=1}^{\infty}$, $\{\phi_k\}_{k=1}^{\infty}$ and $\{\xi_{ik}\}_{i=1,k=1}^{n,\infty}$ exist mathematically, they are either unknown or unobservable.
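To make the objects above concrete, the following minimal sketch simulates densely observed trajectories from a truncated Karhunen-Loève expansion on the grid $t_j = j/d$, forms the sample covariance from the full data, and recovers approximate eigenvalues and eigenfunctions by an eigendecomposition. The sine mean, the cosine eigenfunctions, the Gaussian scores, the Riemann-sum rescaling and all function names are assumptions made for illustration only; none of them is taken from the paper.

```python
import numpy as np


def simulate_trajectories(n, d, lambdas, rng):
    """Simulate x_i(j/d) = m(j/d) + sum_k xi_ik * phi_k(j/d) on a regular grid.

    Assumptions (illustrative only): m(t) = sin(2*pi*t), eigenfunctions
    psi_k(t) = sqrt(2) * cos(k*pi*t), and standard normal FPC scores.
    """
    lambdas = np.asarray(lambdas, dtype=float)
    t = np.arange(1, d + 1) / d
    m = np.sin(2 * np.pi * t)
    psi = np.stack([np.sqrt(2) * np.cos((k + 1) * np.pi * t)
                    for k in range(len(lambdas))])
    phi = np.sqrt(lambdas)[:, None] * psi       # rescaled eigenfunctions (FPCs)
    xi = rng.standard_normal((n, len(lambdas)))  # scores: mean 0, variance 1
    return t, m + xi @ phi


def sample_covariance(x):
    """Sample covariance matrix (d x d) computed from the full n x d data."""
    centered = x - x.mean(axis=0)
    return centered.T @ centered / x.shape[0]


def eigen_decomposition(G_bar, n_components):
    """Approximate eigenvalues/eigenfunctions of the covariance operator.

    The integral over [0, 1] is replaced by a Riemann sum with weight 1/d,
    so matrix eigenvalues are divided by d and unit eigenvectors are
    multiplied by sqrt(d) to have (approximately) unit L2 norm.
    """
    d = G_bar.shape[0]
    evals, evecs = np.linalg.eigh(G_bar)         # ascending order
    order = np.argsort(evals)[::-1][:n_components]
    lam = evals[order] / d
    psi = evecs[:, order].T * np.sqrt(d)
    return lam, psi


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t, x = simulate_trajectories(n=2000, d=100, lambdas=[2.0, 0.5], rng=rng)
    lam_hat, psi_hat = eigen_decomposition(sample_covariance(x), 2)
    print("estimated eigenvalues:", lam_hat)
```

The estimated eigenvalues should be close to the values used in the simulation when $n$ and $d$ are large, and the estimated eigenfunctions are determined only up to sign.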

1.1. MAIN CONTRIBUTION

In FDA, covariance estimation plays a critical role in FPC analysis (Ramsay & Silverman (2005), Li & Hsing (2010)), functional generalized linear models and other nonlinear models (Yao et al. (2005b)). We propose four sparsification schemes. RANDOM-KNOTS and RANDOM-KNOTS-SPATIAL are classified as RANDOM-SPARSIFICATION, where knots are selected uniformly at random from all measurement points. B-SPLINE and BSPLINE-SPATIAL are called FIXED-SPARSIFICATION, taking knots at fixed positions in each dimension of the vector. For all sparsification modes, we construct a two-step covariance estimator $\hat{G}(\cdot, \cdot)$: the first step involves the sparsified trajectories, and the second step is a plug-in covariance estimator that uses the estimated trajectories in place of the latent true trajectories. The statistic is further multiplied by an appropriate constant to ensure unbiasedness. The covariance estimator $\hat{G}(\cdot, \cdot)$ enjoys good properties and effectively approximates the sample covariance function $\bar{G}(\cdot, \cdot)$ computed directly from the original data.

This paper improves the performance of the statistics in the following two aspects, requiring little or no side information and additional local computation.

• SPATIAL CORRELATION: We replace the fixed weights assigned to different vectors with data-driven parameters that represent the amount of spatial correlation. This family of statistics naturally takes the influence of correlation among nodes into account, which can be viewed as a spatial factor [2]. Theoretical derivation reveals that the estimation error can be drastically reduced when spatial factors among subjects are considered.

• B-SPLINE INTERPOLATION: To fill in the gaps between equispaced knots, we introduce a spline interpolation method to characterize the temporally ordered trajectories of the functional data. The estimated trajectory obtained by B-spline smoothing is as efficient as the original trajectory. The proposed covariance estimator has a globally consistent convergence rate and enjoys better theoretical properties than the estimator without interpolation. Unlike covariance estimation based on tensor-product B-splines, our two-step estimators are guaranteed to be positive semi-definite.

In sum, the main advantage of our methods is their computational efficiency and feasibility for large-scale dense functional data. This is practically relevant since curves or images measured using new technology are usually of much higher resolution than those of the previous generation. This directly leads to a doubling of the amount of data recorded in each node, which is also the motivation of this paper to propose sparsification before feature extraction, modeling, or other downstream steps.

The paper is organized as follows. Section 2 introduces four sparsification schemes and the corresponding unbiased covariance estimators. We also deduce the convergence rates of the covariance estimators and the related FPCs. Simulation studies are presented in Section 3 and an application to domain clustering is given in Section 4. All technical proofs are given in the Appendix.
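The idea of random sparsification followed by a rescaling constant can be illustrated with a simplified sketch. It is not the estimator of this paper: it assumes a known zero mean, skips the trajectory-estimation step, and uses rescaling constants derived here from the inclusion probabilities of uniform sampling without replacement, namely $K/d$ for a diagonal entry and $K(K-1)/\{d(d-1)\}$ for an off-diagonal entry when each node keeps $K$ of its $d$ coordinates. The function names and the parameter `K` are hypothetical.

```python
import numpy as np


def random_knots_mask(n, d, K, rng):
    """Boolean n x d mask: each node keeps K of its d coordinates,
    chosen uniformly at random without replacement."""
    if not 2 <= K <= d:
        raise ValueError("K must satisfy 2 <= K <= d")
    # Sorting i.i.d. uniforms gives a uniformly random permutation per row;
    # the first K positions of that permutation form a uniform K-subset.
    idx = np.argsort(rng.random((n, d)), axis=1)[:, :K]
    mask = np.zeros((n, d), dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    return mask


def sparsified_second_moment(x, mask):
    """Rescaled second-moment matrix built only from the retained entries.

    Assumption: the mean is known to be zero, so (1/n) * sum_i x_i x_i^T is
    the target. Each entry is divided by the probability that both
    coordinates are retained, which makes the result unbiased for the
    full-data target with respect to the random masks.
    """
    n, d = x.shape
    K = int(mask[0].sum())
    z = np.where(mask, x, 0.0)            # dropped coordinates contribute 0
    raw = z.T @ z / n
    p_diag = K / d                        # P(coordinate j retained)
    p_off = K * (K - 1) / (d * (d - 1))   # P(j and j' both retained), j != j'
    scale = np.full((d, d), 1.0 / p_off)
    np.fill_diagonal(scale, 1.0 / p_diag)
    return raw * scale


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.standard_normal((500, 8))
    mask = random_knots_mask(500, 8, K=4, rng=rng)
    print(sparsified_second_moment(x, mask).round(2))
```

Because the entrywise rescaling differs between diagonal and off-diagonal entries, this simplified matrix is not guaranteed to be positive semi-definite, which is one reason a plug-in construction from estimated trajectories is attractive.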

2. MAIN RESULTS

We consider $n$ geographically distributed nodes; each node generates a $d$-dimensional vector $x_i = (x_{i1}, \ldots, x_{id})^{\top}$ for $i \in \{1, 2, \ldots, n\}$. The mean and covariance functions could be estimated



[1] The code is attached to the supplementary material and will be made publicly available once the paper is accepted.

[2] The superscript SPAT is used to indicate that a statistic takes the spatial factor into account.



RELATED WORK

Considerable efforts have been made to analyze the first-order structure of function-valued random elements, i.e., the functional mean $m(\cdot)$. Estimation of the mean function has been investigated in Jhunjhunwala et al. (2021), Garg et al. (2014), Suresh et al. (2017), Mayekar et al. (2021) and Brown et al. (2021). Cao et al. (2012) and Huang et al. (2022) considered empirical mean estimation using B-spline smoothing. The second-order structure of random functions, the covariance function $G(\cdot, \cdot)$, is the next object of interest. To the best of our knowledge, spatial correlation across nodes has not yet been considered in the context of sparsified covariance estimation.

Research on sparsification has received wide attention recently, for instance Alistarh et al. (2018), Stich et al. (2018) and Sahu et al. (2021). Sparsification methods mainly focus on sending only a subset of the elements of the vectors, yet no existing method combines sparsification with B-spline fitting.

Moreover, there has been striking improvement in sparse PCA. Berthet & Rigollet (2013b) and Choo & d'Orsi (2021) analyzed the complexity of sparse PCA; Berthet & Rigollet (2013a) and Deshpande & Montanari (2014) obtained sparse principal components for particular data models. Since our estimation methods are new, the related study of PCA is proposed here for the first time.

