IDENTIFYING PHASE TRANSITION THRESHOLDS OF PERMUTED LINEAR REGRESSION VIA MESSAGE PASSING

Abstract

This paper considers permuted linear regression, i.e., $Y = \Pi^{\natural} X B^{\natural} + \sigma W$, where $Y \in \mathbb{R}^{n\times m}$, $\Pi^{\natural} \in \mathbb{R}^{n\times n}$, $X \in \mathbb{R}^{n\times p}$, $B^{\natural} \in \mathbb{R}^{p\times m}$, and $W \in \mathbb{R}^{n\times m}$ represent the observations, the missing (or incomplete) information about ordering, the sensing matrix, the signal of interest, and the additive sensing noise, respectively. As shown in previous work, phase transition phenomena arise in terms of the signal-to-noise ratio (snr), the number of permuted rows, etc. While all existing works only concern the convergence rates without specifying the associated constants, we give a precise identification of the phase transition thresholds via the message passing algorithm. Depending on whether the signal $B^{\natural}$ is known or not, we separately identify the corresponding critical points around the phase transition regimes. Moreover, we provide numerical experiments and show that the empirical phase transition points are well aligned with the theoretical predictions.

1. INTRODUCTION

This paper considers the permuted linear regression $Y = \Pi^{\natural} X B^{\natural} + \sigma W$, where $Y \in \mathbb{R}^{n\times m}$ denotes the sensing result, $\Pi^{\natural} \in \mathbb{R}^{n\times n}$ represents the permutation matrix, $X \in \mathbb{R}^{n\times p}$ is the sensing matrix, $B^{\natural}$ is the signal of interest, $W$ denotes the additive noise, and $\sigma^2$ is the noise variance. The research on this problem dates back at least to the 1970s under the name 'broken sample problem' (Goel, 1975; Bai & Hsing, 2005; DeGroot et al., 1971; DeGroot & Goel, 1976; 1980). In recent years, we have witnessed a revival of this problem due to its broad spectrum of applications in privacy protection, data integration, etc. (Pananjady et al., 2018; Unnikrishnan et al., 2015; Slawski et al., 2020; Slawski & Ben-David, 2019; Pananjady et al., 2017; Zhang et al., 2022; Zhang & Li, 2020). Associated with this problem is a phase transition phenomenon: the error rate of the permutation recovery suddenly drops to zero once some parameters reach certain thresholds. Although previous works such as Slawski et al. (2020); Slawski & Ben-David (2019); Pananjady et al. (2017); Zhang et al. (2022); Zhang & Li (2020) can all explain this phenomenon, they only characterize the statistical order of the phase transition thresholds and never their precise positions. In this work, we leverage the message passing (MP) algorithm to identify their precise locations. As a byproduct, we also obtain an algorithm that partially recovers the permutation matrix.

Related work.

The first line of research concerns permuted linear regression itself. Among the works mentioned above, the most closely related ones are Slawski et al. (2020); Slawski & Ben-David (2019); Pananjady et al. (2017); Zhang et al. (2022); Zhang & Li (2020), which use almost the same settings as ours. Pananjady et al. (2018); Slawski & Ben-David (2019) consider the single observation model ($m = 1$) and prove that the snr required for correct permutation recovery is $O_P(n^{c})$, where $c > 0$ is some positive constant. Later, Slawski et al. (2020); Zhang et al. (2022); Zhang & Li (2020) investigate the multiple observations model ($m > 1$) and suggest that the snr requirement can be significantly decreased, more specifically, from $O_P(n^{c})$ to $O_P(n^{c/m})$. In particular, Zhang & Li (2020) develop an estimator which we will analyze as part of our contributions. Although they obtain the correct convergence rates for restoring the correspondence, which are minimax-optimal in certain regimes, their results fail to specify the leading coefficients, or equivalently, the precise location of the phase transition threshold. Moreover, their analysis does not consider the intertwined influence among the parameters $n$, $p$, $m$, etc. One example is the impact of the ratio $p/n$ on the maximum allowed number of permuted rows, which has not been studied before this work.

Another line of research comes from the field of statistical physics and begins with Mézard & Parisi (1986; 1985). Using the replica method, they study the linear assignment problem (LAP), i.e., $\min_{\Pi} \sum_{i,j} \Pi_{ij} E_{ij}$, where $\Pi$ denotes a permutation matrix and the $E_{ij}$ are i.i.d. random variables uniformly distributed on $[0, 1]$. Martin et al. (2005) then generalize the LAP to multi-index matching and present an investigation based on the MP algorithm. Caracciolo et al. (2017); Malatesta et al. (2019) extend the distribution of $E_{ij}$ to a broader class. However, all the above works exhibit no phase transition. In Chertkov et al. (2010), this method is extended to the particle tracking problem, where a phase transition phenomenon is first observed. Later, Semerjian et al. (2020) modify it to fit the graph matching problem, which paves the way for our work on permuted linear regression.

Our technical contributions are summarized as follows.
• We propose the first framework that can identify the precise location of the phase transition thresholds associated with permuted linear regression. In the oracle case where $B^{\natural}$ is known, our scheme is able to determine the phase transition snr. In the non-oracle case where $B^{\natural}$ is not given, our scheme can further predict the maximum allowed number of permuted rows and uncover its dependence on the ratio $n/p$.
• We generalize the full permutation estimator and obtain the first partial permutation estimator. Consider the example where the correspondence for a single index is desired. By removing all function nodes except the one corresponding to that index, we exploit the MP algorithm and design an algorithm that converges in one step. Moreover, we show that its performance almost matches that of the estimator for the full permutation recovery.

In addition, we would like to briefly mention the technical challenges. Compared with the previous works (Mezard & Montanari, 2009; Talagrand, 2010; Linusson & Wästlund, 2004; Mézard & Parisi, 1987; 1986; Parisi & Ratiéville, 2002; Semerjian et al., 2020), where the edge weights are relatively simple, our edge weights usually involve high-order interactions across Gaussian random variables and are densely correlated. To tackle this issue, our proposed approximation method for computing the phase transition thresholds consists of three parts: (i) perform a Taylor expansion; (ii) modify the leave-one-out technique; and (iii) apply a size correction scheme. A detailed explanation can be found in Section 5.
Hopefully, these techniques will be of independent technical interest to researchers in the machine learning community.

Notations. We write $a \xrightarrow{\text{a.s.}} b$ to indicate that $a$ converges almost surely to $b$. We write $f(n) \simeq g(n)$ when $\lim_{n\to\infty} f(n)/g(n) = 1$. We write $f(n) = O_P(g(n))$ if the sequence $f(n)/g(n)$ is bounded in probability, and $f(n) = o_P(g(n))$ if the sequence $f(n)/g(n)$ converges to zero in probability. The inner product between two vectors (resp. matrices) is denoted as $\langle \cdot, \cdot \rangle$. In addition, for two distributions $d_1$ and $d_2$, we write $d_1 \cong d_2$ if they are equal up to some normalization. Moreover, we define $\mathcal{P}_n$ as the set of all possible permutation matrices, i.e., $\mathcal{P}_n \triangleq \{\Pi \in \{0, 1\}^{n\times n} : \sum_i \Pi_{ij} = 1, \sum_j \Pi_{ij} = 1\}$, and associate each permutation matrix $\Pi \in \mathcal{P}_n$ with a mapping $\pi$ of $\{1, 2, \ldots, n\}$, where $\pi(i)$ denotes the correspondence of index $i$ permuted by $\Pi$, $1 \le i \le n$. The signal-to-noise ratio (snr) is written as $|||B^{\natural}|||_F^2/(m \cdot \sigma^2)$, where $|||\cdot|||_F$ is the Frobenius norm and $\sigma^2$ denotes the variance of the sensing noise.

2. PROBLEM SETTING

In this paper, we consider the linear regression with permuted labels, which reads as

$$Y = \Pi^{\natural} X B^{\natural} + \sigma W,$$

where $Y \in \mathbb{R}^{n\times m}$ represents the sensing result, $\Pi^{\natural} \in \mathcal{P}_n$ denotes the permutation matrix to be reconstructed, $X \in \mathbb{R}^{n\times p}$ is the sensing matrix whose entries $X_{ij}$ are i.i.d. standard normal random variables, $B^{\natural} \in \mathbb{R}^{p\times m}$ is the signal of interest, and $W \in \mathbb{R}^{n\times m}$ represents the additive sensing noise whose entries $W_{ij}$ are i.i.d. standard normal random variables. In addition, we denote by $h$ the Hamming distance between the identity matrix and the permutation matrix $\Pi^{\natural}$, i.e., $h \triangleq \sum_i \mathbb{1}(\pi^{\natural}(i) \neq i)$. The goal is to reconstruct the permutation matrix $\Pi^{\natural}$ from the pair $(Y, X)$.

As is well known from the previous works (Zhang & Li, 2020; Pananjady et al., 2018; Zhang et al., 2022), there exists a phase transition phenomenon inherent in this problem. However, all these works only present the statistical order without specifying the constants. In this work, we identify the precise position of the phase transition points in the large-system limit, i.e., $n$, $m$, $p$, and $h$ all approach infinity with $m/n \to \tau_m$, $p/n \to \tau_p$, and $h/n \to \tau_h$ (see footnote 1). Inspired by Mezard & Montanari (2009); Semerjian et al. (2020); Chertkov et al. (2010), we borrow tools from statistical physics to identify the precise location of the phase transition threshold. In the following, we separately study the phase transition phenomenon in (i) the oracle case where $B^{\natural}$ is given a priori and (ii) the non-oracle case where $B^{\natural}$ is unknown.

3. BACKGROUND KNOWLEDGE ON GRAPHICAL MODELS

To begin with, we briefly review the linear assignment problem (LAP), which is defined as

$$\hat{\Pi} = \operatorname{argmin}_{\Pi \in \mathcal{P}_n} \langle \Pi, E \rangle, \tag{1}$$

where $E \in \mathbb{R}^{n\times n}$ is a fixed matrix and $\mathcal{P}_n$ denotes the set of all possible permutation matrices. In what follows, we investigate the behavior of $\hat{\Pi}$ with the message-passing (MP) algorithm. First, we follow the approach in Semerjian et al. (2020); Mezard & Montanari (2009) and introduce a probability measure over the permutation matrix $\Pi$, which reads as

$$\mu(\Pi) = \frac{1}{Z} \prod_i \mathbb{1}\Big(\sum_j \Pi_{ij} = 1\Big) \prod_j \mathbb{1}\Big(\sum_i \Pi_{ij} = 1\Big)\, e^{-\beta \sum_{i,j} \Pi_{ij} E_{ij}}, \tag{2}$$

where $\mathbb{1}(\cdot)$ is the indicator function, $Z$ is the normalization constant of the probability measure $\mu(\Pi)$, and $\beta > 0$ is an auxiliary parameter. We can verify that the solution $\hat{\Pi}$ (the ML estimator in our setting) maximizes the probability measure $\mu(\Pi)$, which means we can study the properties of $\hat{\Pi}$ via the configuration $\operatorname{argmax}_{\Pi} \mu(\Pi)$. In addition, the probability measure $\mu(\Pi)$ concentrates on $\hat{\Pi}$ when letting $\beta \to \infty$. We then identify the phase transition thresholds by studying the marginals of $\mu(\Pi)$.

3.1. CONSTRUCTION OF GRAPHICAL MODEL

To start with, we construct the factor graph associated with the probability measure in (2). Adopting the same strategy as in Chapter 16 of Mezard & Montanari (2009), we (i) associate each variable $\Pi_{ij}$ with a variable node $v_{ij}$; (ii) connect the variable node $v_{ij}$ to a function node representing the term $e^{-\beta \Pi_{ij} E_{ij}}$; and (iii) associate each constraint $\sum_i \Pi_{ij} = 1$ with one function node, and similarly for each constraint $\sum_j \Pi_{ij} = 1$. A graphical representation is given in Figure 1.

Then we briefly review the MP algorithm. Informally speaking, MP is a local algorithm for computing the marginal probabilities over the graphical model. In each iteration, the variable node $v$ transmits a message to its incident function node $f$ by multiplying all incoming messages except the one along the edge $(v, f)$. The function node $f$ in turn transmits a message to its incident variable node $v$ by computing the weighted summary of all incoming messages except the one along the edge $(f, v)$. For a detailed introduction to MP, we refer readers to Kschischang et al. (2001); Mezard & Montanari (2009). MP obtains the exact marginals for tree-like graphical models (Mezard & Montanari, 2009). For graphs with many short loops, which happens to be our case, this claim may no longer hold. However, past works suggest that MP can still yield meaningful results when applied to the LAP (Mezard & Montanari, 2009; Semerjian et al., 2020).
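The relation between the LAP (1) and the measure (2) can be checked by brute force on a tiny instance. The following is a minimal sketch, not part of the original analysis: the function names `lap_brute_force` and `gibbs_measure`, the problem size, and the uniform random costs are all illustrative assumptions. The two indicator constraints in (2) are enforced simply by enumerating permutations only.

```python
import itertools
import math
import random


def lap_brute_force(E):
    """Return (best_perm, best_cost) minimizing sum_i E[i][perm[i]]."""
    n = len(E)
    best_perm, best_cost = None, math.inf
    for perm in itertools.permutations(range(n)):
        cost = sum(E[i][perm[i]] for i in range(n))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost


def gibbs_measure(E, beta):
    """Return {perm: mu(perm)} with mu proportional to exp(-beta * cost)."""
    n = len(E)
    costs = {perm: sum(E[i][perm[i]] for i in range(n))
             for perm in itertools.permutations(range(n))}
    c_min = min(costs.values())  # shift by the minimum for numerical stability
    weights = {perm: math.exp(-beta * (c - c_min)) for perm, c in costs.items()}
    Z = sum(weights.values())    # normalization constant
    return {perm: w / Z for perm, w in weights.items()}


random.seed(0)
n = 5  # hypothetical toy size; enumeration costs n! evaluations
E = [[random.random() for _ in range(n)] for _ in range(n)]

pi_hat, _ = lap_brute_force(E)
mu_small = gibbs_measure(E, beta=1.0)
mu_large = gibbs_measure(E, beta=200.0)
print("LAP minimizer:", pi_hat)
print("mass on minimizer, beta=1:  ", mu_small[pi_hat])
print("mass on minimizer, beta=200:", mu_large[pi_hat])
```

The mass that the measure places on the LAP minimizer grows with $\beta$, which is the concentration property used above. Enumeration is only feasible for very small $n$, which is why the analysis relies on MP instead.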

3.2. MESSAGE PASSING (MP) ALGORITHM

Next, we turn to the permutation recovery via MP. The following derivation follows the standard procedure, which can be found in previous works (Semerjian et al., 2020; Mezard & Montanari, 2009). Denote the message flow from the function node $i^L$ to the variable node $(i^L, j^R)$ as $\hat{m}_{i^L \to (i^L, j^R)}(\cdot)$ and that from the edge $(i^L, j^R)$ to the node $i^L$ as $m_{(i^L, j^R) \to i^L}(\cdot)$. Similarly, we define $\hat{m}_{j^R \to (i^L, j^R)}(\cdot)$ and $m_{(i^L, j^R) \to j^R}(\cdot)$ as the message flows transmitted between the function node $j^R$ and the variable node $(i^L, j^R)$. Here the superscripts $L$ and $R$ indicate the position of the node (left or right). Roughly speaking, these messages can be viewed as (unnormalized) conditional probabilities $P(\Pi_{i,j} \in \{0, 1\} \mid \cdot)$ with the joint distribution defined in (2), and the message transmission process iteratively computes these conditional probabilities. For a more detailed introduction to the MP algorithm, we refer to Mezard & Montanari (2009); MacKay et al. (2003).

First, we consider the message flows transmitted between the function node $i^L$ and the variable node $(i^L, j^R)$, which are written as

$$m_{(i^L, j^R) \to i^L}(\pi) \cong \hat{m}_{j^R \to (i^L, j^R)}(\pi)\, e^{-\beta \pi E_{i^L, j^R}}; \qquad \hat{m}_{i^L \to (i^L, j^R)}(\pi) \cong \sum_{\pi_{i^L, k^R}} \prod_{k^R \neq j^R} \hat{m}_{k^R \to (i^L, k^R)}(\pi_{i^L, k^R})\, e^{-\beta \pi_{i^L, k^R} E_{i^L, k^R}}\, \mathbb{1}\Big(\pi + \sum_k \pi_{i^L, k^R} = 1\Big), \tag{3}$$

where $\pi \in \{0, 1\}$ is a binary variable. Similarly, we can write the message flows between the function node $j^R$ and the variable node $(i^L, j^R)$, which are denoted as $m_{(i^L, j^R) \to j^R}(\pi)$ and $\hat{m}_{j^R \to (i^L, j^R)}(\pi)$, respectively. With the parametrization method, we define

$$h_{i^L \to (i^L, j^R)} \triangleq \frac{1}{\beta} \log \frac{\hat{m}_{i^L \to (i^L, j^R)}(1)}{\hat{m}_{i^L \to (i^L, j^R)}(0)}; \qquad h_{j^R \to (i^L, j^R)} \triangleq \frac{1}{\beta} \log \frac{\hat{m}_{j^R \to (i^L, j^R)}(1)}{\hat{m}_{j^R \to (i^L, j^R)}(0)}.$$

Defining $\zeta_{i^L, j^R}$ as $h_{i^L \to (i^L, j^R)} + h_{j^R \to (i^L, j^R)} - E_{i^L, j^R}$, we select the edge $(i^L, j^R)$ according to the probability

$$m_{(i^L, j^R)}(\pi) \triangleq \frac{\exp(\beta \pi \zeta_{i^L, j^R})}{1 + \exp(\beta \zeta_{i^L, j^R})}, \qquad \pi \in \{0, 1\}.$$

Provided $m_{(i^L, j^R)}(1) > m_{(i^L, j^R)}(0)$, or equivalently,

$$\zeta_{i^L, j^R} > 0, \tag{4}$$

we pick $\hat{\pi}(i^L) = j^R$; otherwise, we have $\hat{\pi}(i^L) \neq j^R$. Since $\mu(\Pi)$ concentrates on $\hat{\Pi}$ when $\beta$ is sufficiently large, we can simplify the MP update equation to

$$h_{i^L \to (i^L, j^R)} = \min_{k^R \neq j^R} \big(E_{i^L, k^R} - h_{k^R \to (i^L, k^R)}\big); \qquad h_{j^R \to (i^L, j^R)} = \min_{k^L \neq i^L} \big(E_{k^L, j^R} - h_{k^L \to (k^L, j^R)}\big), \tag{5}$$

which is obtained by letting $\beta \to \infty$.
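The zero-temperature updates (5) and the selection rule (4) can be illustrated on a small synthetic cost matrix. The sketch below is an assumption-laden toy, not the authors' implementation: the function name `min_sum_matching`, the synchronous update schedule, the zero initialization, and the planted cost matrix (weight 0 on planted edges, weights in $[1, 2]$ elsewhere) are all chosen here for illustration.

```python
import random


def min_sum_matching(E, iters=20):
    """Run the beta -> infinity MP updates (5) on an n x n cost matrix E.

    hL[i][j] plays the role of h_{i^L -> (i^L, j^R)} and
    hR[i][j] plays the role of h_{j^R -> (i^L, j^R)}.
    Returns the edges (i, j) whose field zeta_ij is positive, cf. (4).
    """
    n = len(E)
    hL = [[0.0] * n for _ in range(n)]
    hR = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        # Left-node update: minimize over all right nodes k except j.
        new_hL = [[min(E[i][k] - hR[i][k] for k in range(n) if k != j)
                   for j in range(n)] for i in range(n)]
        # Right-node update: minimize over all left nodes k except i.
        new_hR = [[min(E[k][j] - hL[k][j] for k in range(n) if k != i)
                   for j in range(n)] for i in range(n)]
        hL, hR = new_hL, new_hR
    return [(i, j) for i in range(n) for j in range(n)
            if hL[i][j] + hR[i][j] - E[i][j] > 0]


rng = random.Random(0)
n = 6  # hypothetical toy size
perm = list(range(n))
rng.shuffle(perm)
# Planted instance: cost 0 on the planted edge, cost in [1, 2] otherwise.
E = [[0.0 if j == perm[i] else rng.uniform(1.0, 2.0) for j in range(n)]
     for i in range(n)]

selected = min_sum_matching(E)
print("planted :", [(i, perm[i]) for i in range(n)])
print("selected:", selected)
```

On this well-separated instance the selected edges coincide with the planted permutation. For the edge weights studied in the following sections the separation is random, which is what gives rise to the phase transition.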

4. ANALYSIS OF ORACLE CASE

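As a warm-up example, we first consider the oracle scenario, where $B^{\natural}$ is given a priori. To reconstruct the permutation matrix $\Pi^{\natural}$, we adopt the maximum-likelihood (ML) estimator, which reads as

$$\hat{\Pi}_{\text{oracle}} = \operatorname{argmax}_{\Pi} \big\langle \Pi, Y B^{\natural\top} X^{\top} \big\rangle, \quad \text{s.t.}\ \sum_i \Pi_{ij} = 1,\ \sum_j \Pi_{ij} = 1,\ \Pi \in \{0, 1\}^{n\times n}. \tag{6}$$

Denoting the variable $E^{\text{oracle}}_{ij}$ as $-X_{\pi^{\natural}(i)}^{\top} B^{\natural} B^{\natural\top} X_j - \sigma W_i^{\top} B^{\natural\top} X_j$ $(1 \le i, j \le n)$, where the sign flip turns the maximization in (6) into a minimization, we can transform the objective function in (6) into the canonical form of the LAP, i.e., $\sum_{i,j} \Pi_{ij} E^{\text{oracle}}_{ij}$.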

4.1. IDENTIFYING THE PHASE TRANSITION THRESHOLD

This subsection studies the phase transition phenomenon inherent in the MP update equation (5). Following the same strategy as in Semerjian et al. (2020), we divide all edges $(i^L, j^R)$ into two categories based on whether the edge corresponds to the ground-truth permutation matrix $\Pi^{\natural}$ or not. Within each category, we assume the edges' weights and the message flows along them are independent and identically distributed. For the edge $(i^L, \pi^{\natural}(i^L))$ corresponding to the ground-truth correspondence, we represent its weight as a random variable $\Omega$ and the associated message flow as a random variable $H$ (for both $h_{i^L \to (i^L, j^R)}$ and $h_{j^R \to (i^L, j^R)}$). Similarly, we define random variables $\hat{\Omega}$ and $\hat{H}$ for the other edges. Then we can rewrite (5) as

$$\hat{H}^{(t+1)} = \min\big(\Omega - H^{(t)},\ H'^{(t)}\big), \qquad H^{(t+1)} = \min_{1 \le i \le n-1} \big(\hat{\Omega}_i - \hat{H}^{(t)}_i\big), \tag{7}$$

where $(\cdot)^{(t)}$ denotes the update in the $t$-th iteration, $H'$ is an independent copy of $H$, and $\{\hat{H}^{(t)}_i\}_{1 \le i \le n-1}$ and $\{\hat{\Omega}_i\}_{1 \le i \le n-1}$ denote i.i.d. copies of the random variables $\hat{H}^{(t)}$ and $\hat{\Omega}$.

Then we turn to computing the critical point where the permutation matrix can be perfectly reconstructed. According to (4), this means the event $H + H' > \Omega$ holds with probability one. Conditional on this event, we can simplify (7) to

$$H^{(t+1)} = \min_{1 \le i \le n-1} \big(H^{(t)}_i + \Xi_i\big), \tag{8}$$

where the random variable $\Xi$ is defined as the difference between $\hat{\Omega}$ and $\Omega$, i.e., $\Xi \triangleq \hat{\Omega} - \Omega$, and $\{H^{(t)}_i\}_{1 \le i \le n-1}$ and $\{\Xi_i\}_{1 \le i \le n-1}$ denote i.i.d. copies of the random variables $H^{(t)}$ and $\Xi$. This equation can be viewed as an analogue of density evolution and state evolution, which are used to analyze the convergence of the message passing and approximate message passing algorithms, respectively (Chung, 2000; Richardson & Urbanke, 2001; 2008; Maleki, 2010; Donoho et al., 2009; Bayati & Montanari, 2011; Rangan, 2011). Adopting the same viewpoint as Semerjian et al. (2020), we treat (8) as a branching random walk (BRW) process, which satisfies the following.

Theorem 1 (Biggins, 1977; Hammersley, 1974; Kingman, 1975; Semerjian et al., 2020). Consider the recursive distributional equation $K^{(t+1)} = \min_{1 \le i \le n} \big(K^{(t)}_i + \Xi_i\big)$, where $K^{(t)}_i$ and $\Xi_i$ are i.i.d. copies of the random variables $K^{(t)}$ and $\Xi$. Then we have

$$\frac{K^{(t+1)}}{t} \xrightarrow{\text{a.s.}} -\inf_{\theta > 0} \frac{1}{\theta} \log\Big[\sum_{i=1}^{n} \mathbb{E} e^{-\theta \Xi_i}\Big],$$

conditional on the event $\lim_{t \to \infty} K^{(t)} \neq \infty$.

With Theorem 1, we conclude that the critical point for the correct permutation recovery, i.e., $H + H' > \Omega$, can be computed by letting $\inf_{\theta > 0} \frac{1}{\theta} \log\big[\sum_{i=1}^{n} \mathbb{E} e^{-\theta \Xi_i}\big] \le 0$, since otherwise the condition in (4) will be violated. In the oracle case where $B^{\natural}$ is known, the random variable $\Xi$ can be written as

$$\Xi = x^{\top} B^{\natural} B^{\natural\top} (x - y) + \sigma w B^{\natural\top} (x - y), \tag{9}$$

where $x$ and $y$ follow the distribution $N(0, I_{p\times p})$, and $w$ follows the distribution $N(0, I_{m\times m})$. For convenience of computation, we consider the simple case where $B^{\natural}$ is $\lambda I_{p\times p}$ ($m = p$).

Proposition 1. Consider the case where $B^{\natural}$ is a re-scaled version of the identity matrix, i.e., $\lambda I_{p\times p}$. Then the expectation $\mathbb{E} e^{-\theta \Xi}$, with $\Xi$ defined in (9), can be written as

$$\mathbb{E} e^{-\theta \Xi} = \big[1 + 2\theta\lambda^2 - \theta^2 \lambda^2 (\lambda^2 + 2\sigma^2)\big]^{-\frac{m}{2}}, \tag{10}$$

provided that

$$\theta^2 \sigma^2 \lambda^2 < 1, \quad \text{and} \quad \theta^2 \lambda^2 (\lambda^2 + 2\sigma^2) \le 1 + 2\theta\lambda^2. \tag{11}$$

If the conditions in (11) are violated, the expectation $\mathbb{E} e^{-\theta \Xi}$ diverges to infinity, which means the optimal $\theta^{*}$ for $\inf_{\theta > 0} \log(n \mathbb{E} e^{-\theta \Xi})/\theta$ cannot be attained there. Using (10), we can compute the optimal $\theta^{*}$ as

$$\theta^{*} \simeq \sqrt{\frac{2 \log n}{|||B^{\natural\top} B^{\natural}|||_F^2 + 2\sigma^2 |||B^{\natural}|||_F^2}}.$$

The corresponding $\text{snr}_{\text{oracle}}$ is written as

$$\text{snr}_{\text{oracle}} \simeq \frac{4 \log n}{m - 2 \log n}. \tag{12}$$

The comparison between the theoretical and the numerical values of the phase transition threshold is given in Table 1, from which we conclude that the phase transition threshold snr can be predicted to a good extent. As $m$ increases, we believe the gap between the theoretical and the numerical values will keep shrinking.
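The closed form (10) can be sanity-checked by simulation. The sketch below is illustrative only: the function names `mgf_closed_form` and `mgf_monte_carlo`, the parameter values, and the number of trials are assumptions made here, and the sampler draws $\Xi$ from (9) with $B^{\natural} = \lambda I$ so that the coordinates decouple.

```python
import math
import random


def mgf_closed_form(theta, lam, sigma, m):
    """Closed form (10) of E exp(-theta * Xi) for B = lam * I (p = m)."""
    base = 1 + 2 * theta * lam**2 - theta**2 * lam**2 * (lam**2 + 2 * sigma**2)
    if theta**2 * sigma**2 * lam**2 >= 1 or base <= 0:
        return math.inf  # outside the region (11) the expectation diverges
    return base ** (-m / 2)


def mgf_monte_carlo(theta, lam, sigma, m, trials, rng):
    """Monte Carlo estimate of E exp(-theta * Xi) with Xi sampled from (9)."""
    total = 0.0
    for _ in range(trials):
        xi = 0.0
        for _ in range(m):
            x, y, w = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
            # Per-coordinate contribution of x^T B B^T (x - y) + sigma w B^T (x - y).
            xi += lam**2 * x * (x - y) + sigma * lam * w * (x - y)
        total += math.exp(-theta * xi)
    return total / trials


rng = random.Random(0)
theta, lam, sigma, m = 0.2, 1.0, 1.0, 2  # hypothetical small test values
exact = mgf_closed_form(theta, lam, sigma, m)
estimate = mgf_monte_carlo(theta, lam, sigma, m, trials=50_000, rng=rng)
print("closed form:", exact)
print("simulation :", estimate)
```

The two numbers should agree up to Monte Carlo error. Small values of $\theta$ are used on purpose: near the boundary of (11) the integrand becomes heavy-tailed and a naive average converges very slowly.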
The phase transition threshold is computed by setting $\inf_{\theta > 0} \log(n \cdot \mathbb{E} e^{-\theta \Xi})/\theta$ to zero. However, in certain scenarios, it can be extremely difficult or even impossible to obtain a closed-form expression for $\mathbb{E} e^{-\theta \Xi}$, let alone the optimal solution $\theta^{*}$. To handle such difficulties, we propose to approximate $\mathbb{E} e^{-\theta \Xi}$ with a Taylor expansion, which proceeds as

$$\mathbb{E} e^{-\theta \Xi} = e^{-\theta \mathbb{E}\Xi} \cdot \mathbb{E} e^{-\theta(\Xi - \mathbb{E}\Xi)} \overset{①}{=} e^{-\theta \mathbb{E}\Xi} \cdot \Big[1 + \frac{\theta^2}{2} \cdot \mathbb{E}(\Xi - \mathbb{E}\Xi)^2 + O_P\big(\theta^4\, \mathbb{E}(\Xi - \mathbb{E}\Xi)^4\big)\Big],$$

where ① is due to the fact that $\mathbb{E}(\Xi - \mathbb{E}\Xi)^3 = 0$. To simplify the computation, we adopt one widely used assumption, stated as follows.

Assumption 1. We assume $\theta^4 \cdot \mathbb{E}(\Xi - \mathbb{E}\Xi)^4$ to be negligible.

Then we obtain

$$\mathbb{E} e^{-\theta \Xi} \simeq e^{-\theta \mathbb{E}\Xi} \cdot \Big(1 + \frac{\theta^2}{2} \text{Var}\,\Xi\Big) \overset{②}{\simeq} \exp\Big(-\theta \mathbb{E}\Xi + \frac{\theta^2}{2} \text{Var}\,\Xi\Big), \tag{13}$$

where in ② we use the approximation $1 + x \simeq e^{x}$ when $x$ is near zero. In this way, the rather complicated computation of $\mathbb{E} e^{-\theta \Xi}$ is replaced by the computation of the mean $\mathbb{E}\Xi$ and the variance $\text{Var}\,\Xi$, which is still complex but manageable. With this approximation, the optimal $\theta^{*}$ for $\log(n \mathbb{E} e^{-\theta \Xi})/\theta$ is computed as $\sqrt{2 \log n / \text{Var}\,\Xi}$, and hence the critical point corresponding to the phase transition is

$$2 (\log n)\, \text{Var}\,\Xi = (\mathbb{E}\Xi)^2. \tag{14}$$

To verify that this approximation yields meaningful results, we revisit the oracle case and have

$$\mathbb{E}\Xi = |||B^{\natural}|||_F^2; \qquad \text{Var}\,\Xi = 3\, |||B^{\natural} B^{\natural\top}|||_F^2 + 2\sigma^2 |||B^{\natural}|||_F^2. \tag{15}$$

Plugging (15) into (14) then yields the relation $6 \log n\, |||B^{\natural} B^{\natural\top}|||_F^2 + 4\sigma^2 (\log n)\, |||B^{\natural}|||_F^2 = |||B^{\natural}|||_F^4$, from which we can determine the critical point of snr.

Discussion. As a comparison, we first revisit the simple case where $B^{\natural}$ is $\lambda I_{p\times p}$ ($p = m$). We can compute the phase transition snr as $\widetilde{\text{snr}}_{\text{oracle}} = 4 \log n/(m - 6 \log n)$. This solution is almost identical to (12) in the large-system limit, since $\text{snr}_{\text{oracle}} \simeq \widetilde{\text{snr}}_{\text{oracle}}$ and both expressions behave as $4 \log n/m$ once $m$ is much larger than $\log n$. Moreover, we should stress that (i) our approximation method applies to other types of matrices as well, rather than being limited to the identity matrix; and (ii) our approximation method can predict the phase transition thresholds even when the entries $X_{ij}$ are sub-gaussian. An illustration is given in Table 2: in (Case I), half of the eigenvalues are Ener while the other half are Ener/2; in (Case II), half of the eigenvalues are Ener while the other half are 3Ener/4.
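The step from the Gaussian-type approximation (13) to the critical point (14) can be spelled out as follows. This is a sketch under Assumption 1, treating the $n$ summands in Theorem 1 as i.i.d. so that their sum equals $n\,\mathbb{E} e^{-\theta\Xi}$; the last line specializes to $B^{\natural} = \lambda I_{p\times p}$ with $m = p$ and $\text{snr} = \lambda^2/\sigma^2$.

```latex
\begin{aligned}
\frac{1}{\theta}\log\!\big(n\,\mathbb{E}e^{-\theta\Xi}\big)
  &\simeq \frac{\log n}{\theta} - \mathbb{E}\Xi + \frac{\theta}{2}\,\mathrm{Var}\,\Xi
  && \text{(plug in (13))} \\
0 = \frac{\partial}{\partial\theta}\Big[\frac{\log n}{\theta} + \frac{\theta}{2}\,\mathrm{Var}\,\Xi\Big]
  &= -\frac{\log n}{\theta^{2}} + \frac{\mathrm{Var}\,\Xi}{2}
  \;\Longrightarrow\; \theta^{*} = \sqrt{\frac{2\log n}{\mathrm{Var}\,\Xi}} \\
\inf_{\theta>0}\frac{1}{\theta}\log\!\big(n\,\mathbb{E}e^{-\theta\Xi}\big)
  &\simeq \sqrt{2(\log n)\,\mathrm{Var}\,\Xi} - \mathbb{E}\Xi = 0
  \;\Longleftrightarrow\; 2(\log n)\,\mathrm{Var}\,\Xi = (\mathbb{E}\Xi)^{2} \\
B^{\natural} = \lambda I:\quad
  2\log n\,\big(3m\lambda^{4} + 2\sigma^{2}m\lambda^{2}\big) &= m^{2}\lambda^{4}
  \;\Longrightarrow\; \frac{\lambda^{2}}{\sigma^{2}} = \frac{4\log n}{m - 6\log n}
\end{aligned}
```

The last line reproduces the approximate oracle threshold quoted in the discussion, and it differs from (12) only through the constant multiplying $\log n$ in the denominator.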

5. ANALYSIS OF NON-ORACLE CASE

Having presented the oracle case as a warm-up example, we now extend the analysis to the non-oracle case, where the value of $B^{\natural}$ is not given a priori. To begin with, we need to recast the permutation recovery problem as a LAP. As shown in Zhang et al. (2022), the ML estimator yields a quadratic assignment problem (QAP), which is NP-hard to solve and fails to meet this requirement. Fortunately, the estimator in Zhang & Li (2020) is able to fill the gap. Defining the edge weight $E^{\text{non-oracle}}_{ij}$ as $E^{\text{non-oracle}}_{ij} \triangleq -Y_i^{\top} Y^{\top} X X_j$, we can reconstruct the permutation matrix $\Pi^{\natural}$ via (1). Before proceeding, we would like to justify using the estimator in Zhang & Li (2020) to analyze permuted linear regression in the non-oracle case: first, this estimator is proved to achieve both computational and statistical optimality; second, this estimator also exhibits a phase transition phenomenon, which behaves similarly to that in the oracle case. Naturally, we expect this estimator to incorporate some inherent properties of the permutation recovery problem (with unknown $B^{\natural}$), from which we can gain meaningful insights.

5.1. ANALYSIS OF NON-ORACLE CASE

Having illustrated the soundness of the approximation method in (13), we apply it to the non-oracle case, where the random variable $\Xi$ is written as $\Xi = \Xi_1 + \sigma(\Xi_2 + \Xi_3) + \sigma^2 \Xi_4$, where $\Xi_i$ $(1 \le i \le 4)$ are defined as

$$\begin{aligned}
\Xi_1 &\triangleq X_{\pi^{\natural}(i)}^{\top} B^{\natural} B^{\natural\top} X^{\top} \Pi^{\natural\top} X \big(X_{\pi^{\natural}(i)} - X_j\big); &
\Xi_2 &\triangleq X_{\pi^{\natural}(i)}^{\top} B^{\natural} W^{\top} X \big(X_{\pi^{\natural}(i)} - X_j\big); \\
\Xi_3 &\triangleq W_i^{\top} B^{\natural\top} X^{\top} \Pi^{\natural\top} X \big(X_{\pi^{\natural}(i)} - X_j\big); &
\Xi_4 &\triangleq W_i^{\top} W^{\top} X \big(X_{\pi^{\natural}(i)} - X_j\big),
\end{aligned}$$

respectively. Then we conclude the following.

Theorem 2. The mean $\mathbb{E}\Xi$ of $\Xi$ in (39) and its variance $\text{Var}\,\Xi$ are computed as

$$\mathbb{E}\Xi \simeq n(1 - \tau_h)\big[(1 + \tau_p)\, |||B^{\natural}|||_F^2 + n \tau_m \tau_p \sigma^2\big];$$

$$\text{Var}\,\Xi \simeq n^2 \tau_h (1 - \tau_h) \tau_p^2 \big[|||B^{\natural}|||_F^2 + m\sigma^2\big]^2 + n^2 \big[2\tau_p + 3(1 - \tau_h)^2\big]\, |||B^{\natural\top} B^{\natural}|||_F^2 + n^2 \big[6\tau_p (1 - \tau_h)^2 + (3 - \tau_h) \tau_p^2\big]\, |||B^{\natural\top} B^{\natural}|||_F^2,$$

respectively, where the definitions of $\tau_p$, $\tau_m$, and $\tau_h$ can be found in Section 2.

For clarity of presentation, we only present the proof outline.

Computation of the mean $\mathbb{E}\Xi$. We can easily verify that $\mathbb{E}\Xi_2$ and $\mathbb{E}\Xi_3$ are both zero, which is due to the independence between $X$ and $W$. Regarding the computation of $\mathbb{E}\Xi_1$ and $\mathbb{E}\Xi_4$, we adopt Wick's theorem and obtain

$$\mathbb{E}\Xi_1 = n(1 - \tau_h)(1 + \tau_p)\,[1 + o_P(1)]\, |||B^{\natural}|||_F^2; \qquad \mathbb{E}\Xi_4 = n^2 \tau_m \tau_p (1 - \tau_h)\,(1 + o_P(1)).$$

Computation of the variance $\text{Var}\,\Xi$. With the relation $\text{Var}(\Xi) = \mathbb{E}\Xi^2 - (\mathbb{E}\Xi)^2$, our goal becomes computing $\mathbb{E}\Xi^2$, which consists of the following six terms:

$$\mathbb{E}\Xi^2 = \mathbb{E}\Xi_1^2 + \sigma^2 \mathbb{E}\Xi_2^2 + \sigma^2 \mathbb{E}\Xi_3^2 + \sigma^4 \mathbb{E}\Xi_4^2 + 2\sigma^2 \mathbb{E}\Xi_1 \Xi_4 + 2\sigma^2 \mathbb{E}\Xi_2 \Xi_3.$$

The computation of the above terms turns out to be quite complex due to the high-order Gaussian chaos. For example, the term $\mathbb{E}\Xi_1^2$ involves eighth-order Gaussian chaos; the terms $\mathbb{E}\Xi_2^2$, $\mathbb{E}\Xi_3^2$, $\mathbb{E}\Xi_1 \Xi_4$, and $\mathbb{E}\Xi_2 \Xi_3$ all involve sixth-order Gaussian variables. To alleviate the computational burden, we compute the expectation $\mathbb{E}\Xi^2$ in the following three phases.

• Phase I. The solution in this phase comes from a modification of the so-called leave-one-out technique (Sur et al., 2019; El Karoui, 2013; 2018; Bai & Silverstein, 2010). Notice that the major technical difficulty comes from the correlation between the product $X^{\top} \Pi^{\natural} X$ and the difference $X_{\pi^{\natural}(i)} - X_j$. We decompose this correlation by first rewriting the matrix $X^{\top} \Pi^{\natural} X$ as the sum $\sum_{\ell} X_{\ell} X_{\pi^{\natural}(\ell)}^{\top}$. Then we collect all terms $X_{\ell} X_{\pi^{\natural}(\ell)}^{\top}$ independent of $X_{\pi^{\natural}(i)}$ and $X_j$ in the matrix $\Sigma$ and leave the remaining terms to the matrix $\Delta$, which means $\Delta \triangleq X^{\top} \Pi^{\natural} X - \Sigma$. This decomposition is in the same spirit as the leave-one-out technique. With this method, we divide all terms involved in the computation of $\mathbb{E}\Xi^2$ into three categories: (i) those containing only the matrix $\Sigma$; (ii) those containing both $\Sigma$ and $\Delta$; and (iii) those containing only $\Delta$. We can easily see that the first two categories contain most of the vectors' outer products while the last category only contains a finite number of such terms.
• Phase II. Concerning the terms in the first two categories, which contain the majority of terms, we can exploit the independence among the rows of the sensing matrix $X$ and reduce the order of the Gaussian random variables by separately taking the expectation w.r.t. $\Sigma$ and w.r.t. the vectors $X_{\pi^{\natural}(i)}$ and $X_j$.
• Phase III. For the few terms in the third category, which contain high-order Gaussian chaos, we compute their expectations by iteratively applying Wick's theorem and Stein's lemma, which filters out the zero terms and reduces higher-order interactions between Gaussian random variables to lower-order interactions.

For more technical details, we refer the interested readers to the supplementary material.

Table 3: Comparison between the predicted value of the phase transition threshold $\tau_h$ and its numerical value when $n = 500$. P denotes the predicted value while N denotes the numerical value. (The numerical value of $\tau_h$ is the minimum $\tau_h$ at which the correct permutation rate drops below 0.05.)

| p | 75 | 100 | 125 | 150 | 175 | 200 |
| P | 0.82 | 0.73 | 0.68 | 0.62 | 0.56 | 0.52 |
| N | 0.77 | 0.74 | 0.7 | 0.66 | 0.61 | 0.57 |
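As a small worked check of the mean computation, the leading-order value of $\mathbb{E}\Xi_4$ can be obtained by conditioning on $X$ first. This is a sketch under two simplifying assumptions made here: the competing index satisfies $j \neq i$, and the index $i$ is treated as unpermuted with probability $1 - h/n \to 1 - \tau_h$.

```latex
\begin{aligned}
\mathbb{E}_{W}\big[W_i^{\top} W^{\top}\big]
  &= m\, e_i^{\top}
  && \text{since } \mathbb{E}\langle W_i, W_k\rangle = m\,\mathbb{1}(k = i), \\
\mathbb{E}_{W}\,\Xi_4
  &= m\, X_i^{\top}\big(X_{\pi^{\natural}(i)} - X_j\big)
  && \text{since } e_i^{\top} X = X_i^{\top}, \\
\mathbb{E}\,\Xi_4
  &= m\,p\;\mathbb{P}\big(\pi^{\natural}(i) = i\big)
  && \text{since } \mathbb{E}\,X_i^{\top}X_k = p\,\mathbb{1}(k = i), \\
  &\simeq m\,p\,(1 - \tau_h) = n^{2}\,\tau_m\,\tau_p\,(1 - \tau_h).
\end{aligned}
```

Multiplying by $\sigma^2$ gives the noise contribution $n(1 - \tau_h) \cdot n \tau_m \tau_p \sigma^2$ appearing in the expression for $\mathbb{E}\Xi$ in Theorem 2.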

5.2. IDENTIFYING THE PHASE TRANSITION THRESHOLD

Having explained the computation of $\mathbb{E}\Xi$ and $\text{Var}\,\Xi$, we turn to identifying the phase transition threshold. Different from the oracle case, we notice that the edge weights $E_{ij}$ are strongly correlated in this case, especially when $j = \pi^{\natural}(j)$, which corresponds to the non-permuted rows. To factor out this dependence, we only take the permuted rows into account and correct the sample size from $n$ to $\tau_h n$. Thus, we have the following.

Proposition 2. The critical point for the phase transition phenomenon transforms from (14) to

$$2 \log(\tau_h n)\, \text{Var}\,\Xi = (\mathbb{E}\Xi)^2.$$

Example 1. We consider the case where $B^{\natural} = \lambda I_{p\times p}$ as an illustration. With Theorem 2 and Proposition 2, we obtain the solution

$$\text{snr}_{\text{non-oracle}} = \eta_1/\eta_2, \tag{17}$$

where $\eta_1$ and $\eta_2$ are defined as

$$\eta_1 \triangleq 2\tau_h \tau_p^2 \log(n\tau_h) - \tau_p(\tau_p + 1)(1 - \tau_h) + \sqrt{2}\,\tau_p \sqrt{(1 - \tau_h)\,\tau_h \log(n\tau_h)}; \qquad \eta_2 \triangleq (1 - \tau_h)(\tau_p + 1)^2 - 2\tau_h \tau_p^2 \log(n\tau_h).$$

Notice that the negative solution has been discarded due to the non-negativity requirement on the snr.

Discussion. Regarding the accuracy of the predicted phase transition $\text{snr}_{\text{non-oracle}}$, we notice a larger gap between the theoretical and the numerical values than in the oracle case. Possible reasons include the strong correlation across the edge weights $\{E_{ij}\}_{1 \le i, j \le n}$ and the error within the approximation relation $\mathbb{E} e^{-\theta \Xi} \approx \exp\big(-\theta \mathbb{E}\Xi + \theta^2\, \text{Var}\,\Xi/2\big)$. In addition, we observe a singularity point, i.e., $\tau_h$ is approximately 0.73 in Figure 2, which suggests a phase transition phenomenon. To validate the predicted phenomenon, we consider the noiseless case, i.e., $\text{snr} = \infty$, and reconstruct the permutation matrix $\Pi^{\natural}$ with (1). Numerical experiments confirm our prediction by showing that the correct rate of the permutation recovery exceeds 0.2 when $h/n \le 0.73$ and drops below 0.05 when $h/n > 0.74$. Additional experiments are reported in Table 3, from which we conclude that the solution (17) can predict the critical points w.r.t. $\tau_h$ to a good extent.

Remark 1. Compared with the prior work (Zhang & Li, 2020), which only yields the statistical order, i.e., $h/n \le c$ ($c$ is a positive constant), our framework can (i) specify the positive constant $c$, and (ii) uncover the dependence of $\tau_h$ on the ratio $\tau_p$. In addition, Zhang & Li (2020) requires $n \gg p$ while our work allows $n$ to be of the same order as $p$, i.e., $p/n = \tau_p$ as $n, p \to \infty$. Notice that this is consistent with numerical experiments, which show that $n = O_P(p)$ is sufficient for the permutation recovery.

Apart from the phase transition thresholds, we would like to exploit MP to design algorithms for partial permutation recovery. As an illustration, we consider recovering the correspondence $\pi^{\natural}(i)$ for a single index $i$. The approach generalizes directly to recovering the correspondences for multiple indices by iteratively applying the procedure of Section 6.
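Returning to Example 1 of Section 5.2, the threshold $\eta_1/\eta_2$ can be evaluated numerically and checked against the critical condition of Proposition 2. The sketch below is a reading of the example rather than the authors' code: the function names are hypothetical, and `critical_gap` keeps only the dominant variance term of Theorem 2 (the one proportional to $[|||B^{\natural}|||_F^2 + m\sigma^2]^2$) with $B^{\natural} = \lambda I$, $m = p$, and $\text{snr} = \lambda^2/\sigma^2$, which is the regime in which the quadratic in the snr reduces to $\eta_1/\eta_2$.

```python
import math


def snr_non_oracle(n, tau_h, tau_p):
    """Threshold eta1 / eta2 of Example 1 (B = lam * I, m = p)."""
    log_term = math.log(n * tau_h)
    a = 2 * tau_h * tau_p**2 * log_term
    b = 1 - tau_h
    eta1 = (a - tau_p * (tau_p + 1) * b
            + math.sqrt(2) * tau_p * math.sqrt(b * tau_h * log_term))
    eta2 = b * (tau_p + 1) ** 2 - a
    return eta1 / eta2


def critical_gap(n, tau_h, tau_p, snr):
    """2 log(tau_h n) Var(Xi) - (E Xi)^2, divided by a common positive factor.

    Assumption: only the dominant variance term of Theorem 2 is retained.
    """
    log_term = math.log(n * tau_h)
    lhs = 2 * log_term * tau_h * tau_p**2 * (snr + 1) ** 2
    rhs = (1 - tau_h) * ((1 + tau_p) * snr + tau_p) ** 2
    return lhs - rhs


n, p = 500, 100          # sizes used in Figure 2
tau_p = p / n
for tau_h in (0.3, 0.5, 0.7):
    s = snr_non_oracle(n, tau_h, tau_p)
    print(f"tau_h={tau_h:.1f}  snr={s:.4f}  gap={critical_gap(n, tau_h, tau_p, s):.2e}")
```

The predicted snr is positive and increases with $\tau_h$ on this grid, and the gap vanishes up to rounding, which confirms that $\eta_1/\eta_2$ is the non-negative root of the quadratic. The ratio blows up as $\eta_2$ approaches zero, which is the singularity discussed above.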

6. PARTIAL PERMUTATION RECOVERY

To start with, we modify the graphical model in Figure 1 by removing the unnecessary function nodes and their incident edges. Then we can rewrite the MP update equation as

$$\hat{m}_{i^L \to (i^L, j^R)}(\pi) \simeq \sum_{\pi_{i^L, k^R}} \prod_{k^R \neq j^R} \exp\big(-\beta \pi_{i^L, k^R} E_{i^L, k^R}\big)\, \mathbb{1}\Big(\pi + \sum_k \pi_{i^L, k^R} = 1\Big).$$

Compared with the MP for the full permutation recovery, the MP for the partial permutation recovery reaches convergence in one iteration. Letting $\beta \to \infty$, we obtain the edge selection criterion $\hat{\pi}(i) = \operatorname{argmin}_j E_{ij}$, which turns out to be a greedy selection scheme. Following the same procedure as in Sections 4 and 5, we can analyze its statistical properties, which are similar (although degraded) to those of the estimator for the full permutation recovery (in both the oracle case and the non-oracle case). This claim is also confirmed by the numerical experiments in Figure 3.
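The greedy rule $\hat{\pi}(i) = \operatorname{argmin}_j E_{ij}$ can be tried out on synthetic data. The following is a minimal sketch for the oracle case with $B^{\natural} = \lambda I$ (so $m = p$), using the oracle edge weight of Section 4; the function name `greedy_partial_recovery`, the problem sizes, and the very small noise level are assumptions chosen here to keep the toy instance far above the threshold.

```python
import random


def greedy_partial_recovery(Y, X, lam, i):
    """Estimate pi(i) = argmin_j E_ij for a single index i.

    Uses the oracle weight E_ij = -<Y_i, B^T X_j> with B = lam * I,
    so that B^T X_j = lam * X_j.
    """
    n = len(X)
    weights = [-lam * sum(y * x for y, x in zip(Y[i], X[j])) for j in range(n)]
    return min(range(n), key=weights.__getitem__)


rng = random.Random(0)
n, p, lam, sigma = 30, 40, 1.0, 0.01   # hypothetical high-snr toy setting
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
perm = list(range(n))
rng.shuffle(perm)
# Row i of Y is the permuted row of X B plus noise: Y_i = lam * X_{perm[i]} + sigma * W_i.
Y = [[lam * X[perm[i]][k] + sigma * rng.gauss(0, 1) for k in range(p)]
     for i in range(n)]

recovered = [greedy_partial_recovery(Y, X, lam, i) for i in range(n)]
print("fraction recovered:", sum(r == t for r, t in zip(recovered, perm)) / n)
```

Each index is handled independently, so recovering a single correspondence costs one pass over the $n$ candidate rows. Unlike the full LAP solution, nothing prevents two indices from selecting the same row, which is one way to understand the degraded performance mentioned above.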

7. CONCLUSION

This is the first work that can identify the precise location of the phase transition thresholds of permuted linear regression. For the oracle case where the signal $B^{\natural}$ is given a priori, our analysis can predict the phase transition threshold $\text{snr}_{\text{oracle}}$ to a good extent. For the non-oracle case where $B^{\natural}$ is not given, we modified the leave-one-out technique to approximately compute the critical value $\text{snr}_{\text{non-oracle}}$ of the phase transition, since the precise computation becomes significantly more complicated once high-order interactions between Gaussian random variables are involved. Moreover, we associated the singularity point in $\text{snr}_{\text{non-oracle}}$ with a phase transition point w.r.t. the maximum allowed number of permuted rows. Finally, we generalized the full permutation recovery and obtained a partial permutation recovery algorithm. Following the same analytical procedure, we argued that it has a similar, although degraded, performance compared with the full permutation recovery algorithm, which is confirmed by our numerical experiments. In the future, we will incorporate the replica symmetry breaking scheme into our framework and extend this framework to broader areas.



Footnote 1 (to the large-system limit of the problem setting $Y = \Pi^{\natural} X B^{\natural} + \sigma W$, where $Y \in \mathbb{R}^{n\times m}$ is the sensing result, $\Pi^{\natural} \in \mathcal{P}_n$ is the permutation matrix to be reconstructed, and $X \in \mathbb{R}^{n\times p}$ is the sensing matrix with i.i.d. standard normal entries): Although our analysis concerns the large-system limit, the numerical results match our predictions to a good extent even when $n$, $m$, $p$, and $h$ are only a few hundred.



Figure 1: Illustration of the constructed graphical model. The circle icons represent the variable nodes, while the square icons represent the function nodes: the blue square icons represent the constraints on the rows of $\Pi$, the green square icons represent the constraints on the columns of $\Pi$, and the red square icons denote the function $e^{-\beta \pi E_{ij}}$.


Figure 2: $\text{snr}_{\text{non-oracle}}$ when $n = 500$ and $p = 100$.

Figure 3: Comparison between the full permutation recovery and the partial permutation recovery. We set $n = 500$, $p = 100$, and $m = 75$. The partial permutation recovery estimator also exhibits a phase transition, similar to the full permutation recovery estimator. However, its phase transition points are larger than the corresponding points for the full permutation recovery.

Table 1: Comparison between the predicted value of the phase transition threshold $\text{snr}_{\text{oracle}}$ and its numerical value when $n = 500$. P denotes the predicted value while N denotes the numerical value. (The N value corresponds to the $\text{snr}_{\text{oracle}}$ at which the error rate drops below 0.05.)

Table 2: Comparison between the predicted value of the phase transition threshold $\text{snr}_{\text{oracle}}$ and its numerical value when $n = 800$. Gauss refers to $X_{ij} \overset{\text{i.i.d.}}{\sim} N(0, 1)$ while Unif refers to $X_{ij} \overset{\text{i.i.d.}}{\sim} \text{Unif}[-1, 1]$. We averaged over 20 experiments.

