IDENTIFYING PHASE TRANSITION THRESHOLDS OF PERMUTED LINEAR REGRESSION VIA MESSAGE PASSING

Abstract

This paper considers permuted linear regression, i.e., $Y = \Pi^{\natural} X B^{\natural} + W$, where $Y \in \mathbb{R}^{n \times m}$, $\Pi^{\natural} \in \mathbb{R}^{n \times n}$, $X \in \mathbb{R}^{n \times p}$, $B^{\natural} \in \mathbb{R}^{p \times m}$, and $W \in \mathbb{R}^{n \times m}$ represent the observations, the missing (or incomplete) information about ordering, the sensing matrix, the signal of interest, and the additive sensing noise, respectively. As shown in previous work, phase transition phenomena exist in terms of the signal-to-noise ratio (snr), the number of permuted rows, etc. While existing works concern only the convergence rates without specifying the associated constants, we give a precise identification of the phase transition thresholds via the message passing algorithm. Depending on whether the signal $B^{\natural}$ is known or not, we separately identify the corresponding critical points around the phase transition regimes. Moreover, we provide numerical experiments and show that the empirical phase transition points are well aligned with the theoretical predictions.

1. INTRODUCTION

This paper considers the permuted linear regression $Y = \Pi^{\natural} X B^{\natural} + W$, where $Y \in \mathbb{R}^{n \times m}$ denotes the sensing result, $\Pi^{\natural} \in \mathbb{R}^{n \times n}$ represents the permutation matrix, $X \in \mathbb{R}^{n \times p}$ is the sensing matrix, $B^{\natural} \in \mathbb{R}^{p \times m}$ is the signal of interest, $W \in \mathbb{R}^{n \times m}$ denotes the additive noise, and $\sigma^2$ is the noise variance. Research on this problem dates back at least to the 1970s under the name 'broken sample problem' (Goel, 1975; Bai & Hsing, 2005; DeGroot et al., 1971; DeGroot & Goel, 1976; 1980). In recent years, we have witnessed a revival of this problem due to its broad spectrum of applications in privacy protection, data integration, etc. (Pananjady et al., 2018; Unnikrishnan et al., 2015; Slawski et al., 2020; Slawski & Ben-David, 2019; Pananjady et al., 2017; Zhang et al., 2022; Zhang & Li, 2020).

Associated with this problem comes a phase transition phenomenon: the error rate for the permutation recovery suddenly drops to zero once some parameters reach certain thresholds. Although previous works such as Slawski et al. (2020); Slawski & Ben-David (2019); Pananjady et al. (2017); Zhang et al. (2022); Zhang & Li (2020) can all explain this phenomenon, they study only the statistical order of the phase transition thresholds, never their precise positions. In this work, we leverage the message passing (MP) algorithm to identify their precise locations. As a byproduct, we also come up with an algorithm to partially recover the permutation matrix.
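
To make the observation model from the start of this section concrete, the following is a minimal simulation sketch. It assumes Gaussian entries for the sensing matrix, the signal, and the noise, which the text here does not specify. The function name `simulate_permuted_regression` and its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_permuted_regression(n, p, m, sigma, seed=0):
    """Draw one instance of Y = Pi X B + W (illustrative sketch only).

    Assumptions not stated in the text: X and B have i.i.d. N(0, 1)
    entries, and W has i.i.d. N(0, sigma^2) entries.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))          # sensing matrix, n x p
    B = rng.standard_normal((p, m))          # signal of interest, p x m
    W = sigma * rng.standard_normal((n, m))  # additive noise, variance sigma^2
    perm = rng.permutation(n)                # row i of Y observes row perm[i] of X B
    Pi = np.eye(n)[perm]                     # n x n permutation matrix
    Y = Pi @ X @ B + W                       # permuted, noisy observations, n x m
    return Y, Pi, X, B, W


if __name__ == "__main__":
    Y, Pi, X, B, W = simulate_permuted_regression(n=6, p=3, m=2, sigma=0.1)
    print(Y.shape, Pi.shape)
```

Each row and each column of `Pi` contains exactly one 1, so left-multiplying by `Pi` only reorders the rows of `X @ B`. That reordering is the information the recovery problem tries to restore.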

Related work. This line of research starts with the literature on permuted linear regression. Among the works mentioned above, the most closely related are Slawski et al. (2020); Slawski & Ben-David (2019); Pananjady et al. (2017); Zhang et al. (2022); Zhang & Li (2020), which use almost the same settings as ours. Pananjady et al. (2018); Slawski & Ben-David (2019) consider the single observation model ($m = 1$) and prove that the snr required for correct permutation recovery is $O_P(n^c)$, where $c > 0$ is some constant. Later, Slawski et al. (2020); Zhang et al. (2022); Zhang & Li (2020) investigate the multiple observations model ($m > 1$) and suggest that the snr requirement can be significantly decreased, more specifically, from $O_P(n^c)$ to $O_P(n^{c/m})$. In particular, Zhang & Li (2020) develop an estimator which we will analyze as part of our contributions. Although they obtained the correct convergence rate to restore the correspondence,

