A HYPERGRADIENT APPROACH TO ROBUST REGRESSION WITHOUT CORRESPONDENCE

Abstract

We consider a regression problem where the correspondence between input and output data is not available. Such shuffled data are commonly observed in many real-world problems. Taking flow cytometry as an example, the measuring instruments are unable to preserve the correspondence between the samples and the measurements. Due to the combinatorial nature of the problem, most existing methods are only applicable when the sample size is small, and are limited to linear regression models. To overcome such bottlenecks, we propose a new computational framework, ROBOT, for the shuffled regression problem, which is applicable to large data and complex models. Specifically, we formulate regression without correspondence as a continuous optimization problem. Then, by exploiting the interaction between the regression model and the data correspondence, we develop a hypergradient approach based on differentiable programming techniques. The hypergradient approach views the data correspondence as an operator of the regression, and therefore allows us to find a better descent direction for the model parameters by differentiating through the data correspondence. ROBOT is quite general, and can be further extended to the inexact correspondence setting, where the input and output data are not necessarily exactly aligned. Thorough numerical experiments show that ROBOT achieves better performance than existing methods in both linear and nonlinear regression tasks, including real-world applications such as flow cytometry and multi-object tracking.

1. INTRODUCTION

Regression analysis has been widely used in various machine learning applications to infer the relationship between an explanatory random variable (i.e., the input) X ∈ R^d and a response random variable (i.e., the output) Y ∈ R^o (Stanton, 2001). In the classical setting, regression is applied to labeled datasets that contain paired samples {(x_i, y_i)}_{i=1}^n, where x_i and y_i are realizations of X and Y, respectively. Unfortunately, such an input-output correspondence is not always available. One example is flow cytometry, a physical experiment for measuring properties of cells, e.g., affinity to a particular target (Abid & Zou, 2018). In this process, cells are suspended in a fluid and injected into the flow cytometer, where measurements are taken using the scattering of a laser. However, the instruments are unable to differentiate the cells passing through the laser, so the correspondence between the cell properties (i.e., the measurements) and the cells is unknown. This prevents us from analyzing the relationship between the samples and the measurements using classical regression analysis, due to the missing correspondence. Another example is multi-object tracking, where we need to infer the motion of objects given consecutive frames in a video. This requires us to find the correspondence between the objects in the current frame and those in the next frame.

The two examples above can be formulated as a shuffled regression problem. Specifically, we consider a multivariate regression model Y = f(X, Z; w) + ε, where X ∈ R^d and Z ∈ R^e are two input vectors, Y ∈ R^o is an output vector, f : R^{d+e} → R^o is the unknown regression model with parameters w, and ε is random noise independent of X and Z. When we sample realizations from such a regression model, the correspondence between (X, Y) and Z is not available.
Accordingly, we collect two datasets D_1 = {(x_i, y_i)}_{i=1}^n and D_2 = {z_j}_{j=1}^n, and there exists a permutation π* such that (x_i, z_{π*(i)}) corresponds to y_i in the regression model. Our goal is to recover the unknown model parameter w. Existing literature also refers to the shuffled regression problem as unlabeled sensing, homomorphic sensing, and regression with an unknown permutation (Unnikrishnan et al., 2018). Throughout the rest of the paper, we refer to it as Regression WithOut Correspondence (RWOC). A natural choice of objective for RWOC is to minimize the sum of squared residuals with respect to the regression model parameter w up to the permutation π(·) over the training data, i.e.,

\min_{w, \pi} L(w, \pi) = \sum_{i=1}^{n} \| y_i - f(x_i, z_{\pi(i)}; w) \|_2^2.   (1)

Existing works on RWOC mostly focus on theoretical properties of the global optima of equation 1 for estimating w and π (Pananjady et al., 2016; 2017b; Abid et al., 2017; Elhami et al., 2017; Hsu et al., 2017; Unnikrishnan et al., 2018; Tsakiris & Peng, 2019). The development of practical algorithms, however, falls far behind in the following three aspects:

• Most of the works are only applicable to linear regression models.

• Some of the existing algorithms have very high computational complexity, and can only handle a small number of data points in low dimensions (Elhami et al., 2017; Pananjady et al., 2017a; Tsakiris et al., 2018; Peng & Tsakiris, 2020). For example, Abid & Zou (2018) adopt an Expectation Maximization (EM) method where Metropolis-Hastings sampling is needed, which is not scalable. Other algorithms optimize with respect to w and π in an alternating manner, e.g., alternating minimization in Abid et al. (2017). However, as there exists a strong interaction between w and π, the optimization landscape of equation 1 is ill-conditioned. Therefore, these algorithms are not effective and often get stuck in local optima.
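To make the objective in equation 1 concrete, the following sketch evaluates L(w, π) for a candidate permutation on toy data. The linear choice f(x, z; w) = [x, z]^T w and all helper names (`rwoc_loss`, the toy dimensions) are illustrative assumptions, not the paper's implementation; the framework allows any differentiable f.

```python
import numpy as np

def rwoc_loss(w, perm, X, Z, Y):
    """Sum of squared residuals, equation (1), under candidate correspondence `perm`,
    where perm[i] pairs z_{perm[i]} with (x_i, y_i). Illustrative linear model
    f(x, z; w) = [x, z] @ w; any differentiable f could be substituted."""
    inputs = np.hstack([X, Z[perm]])      # row i is (x_i, z_{perm[i]})
    residuals = Y - inputs @ w
    return np.sum(residuals ** 2)

# Toy data: n = 5 samples, inputs X in R^2, Z in R^3, scalar response.
rng = np.random.default_rng(0)
n, d, e = 5, 2, 3
X, Z = rng.normal(size=(n, d)), rng.normal(size=(n, e))
w_true = rng.normal(size=(d + e, 1))
pi_star = rng.permutation(n)              # unknown ground-truth correspondence
Y = np.hstack([X, Z[pi_star]]) @ w_true   # noiseless responses

# The loss vanishes at the true (w, pi) and is positive elsewhere.
assert np.isclose(rwoc_loss(w_true, pi_star, X, Z, Y), 0.0)
```

Note that without knowing π*, minimizing (1) requires searching over both w and the n! permutations, which is exactly the combinatorial difficulty discussed above.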
• Most of the works only consider the case where there exists an exact one-to-one correspondence between D_1 and D_2. In many scenarios, however, these two datasets are not necessarily well aligned. For example, consider D_1 and D_2 collected from two separate databases, where the users overlap but are not identical. As a result, there exists only a partial one-to-one correspondence. A similar situation also arises in multi-object tracking: some objects may leave the scene in one frame, and new objects may enter the scene in subsequent frames. Therefore, not all objects in different frames can be perfectly matched. The RWOC problem with partial correspondence is known as robust-RWOC, or rRWOC (Varol & Nejatbakhsh, 2019), and is much less studied in the existing literature.

To address these concerns, we propose a new computational framework, ROBOT (Regression withOut correspondence using Bilevel OptimizaTion). Specifically, we formulate regression without correspondence as a continuous optimization problem. Then, by exploiting the interaction between the regression model and the data correspondence, we develop a hypergradient approach based on differentiable programming techniques (Duchi et al., 2008; Luise et al., 2018). Our hypergradient approach views the data correspondence as an operator of the regression, i.e., for a given w, the optimal correspondence is

\pi(w) = \arg\min_{\pi} L(w, \pi).   (2)

Accordingly, when applying gradient descent to equation 1, we need to find the gradient with respect to w by differentiating through both the objective function L and the data correspondence π(w). For simplicity, we refer to such a gradient as the "hypergradient". Note that due to its discrete nature, π(w) is not continuous in w, and therefore such a hypergradient does not exist.
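For the squared loss in equation 1, the inner problem (2) is a linear assignment problem with cost C[i, j] = ||y_i - f(x_i, z_j; w)||^2, solvable exactly by the Hungarian algorithm. The sketch below illustrates this with SciPy's `linear_sum_assignment`; the linear f and the function name are illustrative assumptions, not the paper's code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def optimal_correspondence(w, X, Z, Y):
    """Solve (2) exactly for fixed w: build the assignment cost
    C[i, j] = ||y_i - f(x_i, z_j; w)||^2 and run the Hungarian algorithm.
    Illustrative linear model f(x, z; w) = [x, z] @ w."""
    n = X.shape[0]
    # preds[i, j] = f(x_i, z_j; w), shape (n, n, o)
    preds = np.stack(
        [np.hstack([X, np.tile(Z[j], (n, 1))]) @ w for j in range(n)], axis=1
    )
    C = np.sum((Y[:, None, :] - preds) ** 2, axis=2)
    _, cols = linear_sum_assignment(C)    # cols[i] = pi(i)
    return cols

# Noiseless toy data: the exact solver recovers the true permutation at w_true.
rng = np.random.default_rng(0)
n, d, e = 5, 2, 3
X, Z = rng.normal(size=(n, d)), rng.normal(size=(n, e))
w_true = rng.normal(size=(d + e, 1))
pi_star = rng.permutation(n)
Y = np.hstack([X, Z[pi_star]]) @ w_true
assert np.array_equal(optimal_correspondence(w_true, X, Z, Y), pi_star)
```

The solver returns a hard permutation, which is piecewise constant in w; this is precisely why π(w) admits no gradient and a smooth surrogate is needed.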
To address this issue, we further propose to construct a smooth approximation of π(w) by adding a regularizer to equation 2, and we then replace π(w) with this smooth surrogate when computing the hypergradient of w. Moreover, we also propose an efficient and scalable implementation of
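One standard way to smooth an assignment problem like (2), shown here only as an illustration (the paper's specific regularizer may differ), is entropic regularization: the hard permutation matrix is relaxed to a doubly stochastic matrix computed by Sinkhorn iterations, which is differentiable in the cost matrix and hence in w.

```python
import numpy as np

def sinkhorn_plan(C, eps=0.1, n_iter=300):
    """Entropy-regularized relaxation of an assignment with cost C:
    returns the doubly stochastic P minimizing <P, C> + eps * sum(P log P)
    with unit row/column sums, via Sinkhorn matrix scaling.
    For large C / eps, a log-domain implementation would be needed for stability."""
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones(C.shape[0])
    v = np.ones(C.shape[1])
    for _ in range(n_iter):               # alternating marginal rescaling
        u = 1.0 / (K @ v)
        v = 1.0 / (K.T @ u)
    return u[:, None] * K * v[None, :]    # P = diag(u) K diag(v)

rng = np.random.default_rng(1)
C = rng.random((6, 6))
P = sinkhorn_plan(C)
# P is (approximately) doubly stochastic; as eps -> 0 it concentrates
# on the optimal hard assignment.
assert np.allclose(P.sum(axis=0), 1.0, atol=1e-4)
assert np.allclose(P.sum(axis=1), 1.0, atol=1e-4)
```

Because every entry of P is a smooth function of C, gradients can flow through the correspondence, which is the mechanism the hypergradient approach relies on.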

