ONE-STEP ESTIMATOR FOR PERMUTED SPARSE RECOVERY

Abstract

This paper considers unlabeled sparse recovery under multiple measurements, i.e., Y = ΠXB + W, where Y, Π, X, B, and W represent the observations, the missing (or incomplete) correspondence information, the sensing matrix, the sparse signals, and the additive sensing noise, respectively. Unlike previous works on multiple measurements (m > 1), which all focus on the sufficient-samples regime, namely n > p, we consider a sparse matrix B and investigate the insufficient-samples regime (i.e., n ≪ p) for the first time. To begin with, we establish lower bounds on the sample number and the signal-to-noise ratio (SNR) required for correct permutation recovery. Moreover, we present a simple yet effective estimator. Under mild conditions, we show that our estimator can restore the correct correspondence information with high probability. Numerical experiments are presented to corroborate our theoretical claims.
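
As a concrete illustration of the measurement model Y = ΠXB + W with a column-wise k-sparse B, the following Python sketch draws one synthetic instance in the insufficient-samples regime (n smaller than p). It is not part of the paper's method: the function name `simulate_permuted_sparse_model`, the Gaussian choices for X, for the nonzero entries of B, and for W, and all parameter values are assumptions made only for illustration.

```python
import numpy as np


def simulate_permuted_sparse_model(n, p, m, k, noise_std, seed=0):
    """Draw one instance of Y = Pi @ X @ B + W (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)

    # Assumption: i.i.d. standard Gaussian sensing matrix X of shape (n, p).
    X = rng.standard_normal((n, p))

    # Each column of B is k-sparse: k nonzero entries at random positions.
    B = np.zeros((p, m))
    for j in range(m):
        support = rng.choice(p, size=k, replace=False)
        B[support, j] = rng.standard_normal(k)

    # Permutation matrix Pi of shape (n, n): row i has its single 1 in column perm[i].
    perm = rng.permutation(n)
    Pi = np.eye(n)[perm]

    # Assumption: i.i.d. Gaussian noise with standard deviation noise_std.
    W = noise_std * rng.standard_normal((n, m))

    Y = Pi @ X @ B + W
    return Y, Pi, X, B, W, perm


# Hypothetical sizes chosen so that n < p (insufficient samples) and m > 1.
Y, Pi, X, B, W, perm = simulate_permuted_sparse_model(
    n=50, p=200, m=20, k=5, noise_std=0.1
)
```

Here `perm` records the hidden correspondence: row i of Y is a noisy copy of row perm[i] of XB, which is what a permutation-recovery estimator would try to infer from Y and X alone.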

1. INTRODUCTION

In recent years, linear regression with permuted correspondence has received increasing attention due to its wide applications in machine learning, signal processing, and statistics. Among these applications, the two most prominent examples are (i) record linkage, which merges two datasets pertaining to the same objects into one comprehensive dataset; and (ii) data de-anonymization, which infers the hidden labels of private data from public datasets. Other applications include pose and correspondence estimation in graphics; time-domain sampling in the presence of clock jitter; multi-target tracking; unsupervised data alignment, etc. (Pananjady et al., 2018; Slawski & Ben-David, 2019; Slawski et al., 2020; Zhang et al., 2018).

In this paper, we consider the canonical model, i.e., a linear sensing relation with permuted labels: Y = ΠXB + W, where Y ∈ R^{n×m} is the sensing result, Π ∈ R^{n×n} is an unknown permutation matrix, X ∈ R^{n×p} is the design (sensing) matrix, B ∈ R^{p×m} represents the sparse signals of interest, and W ∈ R^{n×m} denotes the additive noise. Assuming that the signal B is sparse, or more specifically that each column of B is k-sparse, we would like to (i) study the statistical limits of permutation recovery under this scenario, e.g., the minimum sample number n and signal-to-noise ratio (SNR); and (ii) propose a practical estimator that can efficiently recover the permutation once the minimum requirements are met. To begin with, we briefly review the previous works.

Related Works. The study of permuted linear regression has a long history that dates back at least to DeGroot & Goel (1976; 1980); Goel (1975); Bai & Hsing (2005). Recent interest in this area starts with Unnikrishnan et al. (2015). Focusing on the noiseless case W = 0 with a single measurement (m = 1), Unnikrishnan et al.
(2015) establish the necessary condition n ≥ 2p for permutation recovery when B is an arbitrary vector in the linear space R^p. Later, Pananjady et al. (2018) extend the analysis to the noisy scenario. They show that the minimum SNR should be at least of order Ω(n^c), where c > 0 is some constant. Numerical experiments suggest that c lies within the interval [4, 5]. Other works such as Hsu et al. (2017); Abid et al. (2017); Slawski & Ben-David (2019); Tsakiris et al. (2020); Haghighatshoar & Caire (2018) also focus on this regime and obtain the same answer. In Emiya et al. (2014), the setting with a sparse signal B is first studied. However, only an empirical investigation is conducted, without rigorous theoretical analysis. In the first work with theoretical analysis (Zhang & Li, 2021), both the statistical limits and practical estimators with almost optimal performance are presented for permutation recovery. Peng et al. (2021) studies the

