NEAR OPTIMAL PRIVATE AND ROBUST LINEAR REGRESSION

Abstract

We study the canonical statistical estimation problem of linear regression from n i.i.d. examples under (ε, δ)-differential privacy when a fraction of the response variables are adversarially corrupted. We propose a variant of the popular differentially private stochastic gradient descent (DP-SGD) algorithm with two innovations: full-batch gradient descent to improve sample complexity, and a novel adaptive clipping to guarantee robustness. When there is no adversarial corruption, this algorithm improves upon the existing state of the art and achieves near optimal sample complexity. Under label corruption, this is the first efficient linear regression algorithm to provably guarantee both (ε, δ)-DP and robustness. Synthetic experiments confirm the superiority of our approach.
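To fix intuition for the two ingredients named above, the following is a minimal caricature of one DP full-batch gradient descent step with per-example gradient clipping. It is a hedged sketch, not the paper's Algorithm 1: the quantile-based clipping threshold and the lumped noise multiplier `eps_noise` are illustrative assumptions (in particular, a real DP algorithm must also choose the threshold privately and calibrate the Gaussian noise to (ε, δ) via the sensitivity C/n).

```python
import numpy as np

def dp_full_batch_gd(X, y, eps_noise, clip_quantile=0.9, steps=50, lr=0.1):
    """Caricature of DP full-batch GD with adaptive clipping for linear regression.

    Per-example gradients of the squared loss are clipped to an adaptive
    threshold C (here a quantile of the current gradient norms -- an
    illustrative choice, not the paper's rule; note a private algorithm
    would also need to select C privately), averaged, and perturbed with
    Gaussian noise whose scale is proportional to the sensitivity C/n.
    """
    n, d = X.shape
    w = np.zeros(d)
    rng = np.random.default_rng(0)
    for _ in range(steps):
        residual = X @ w - y                   # shape (n,)
        grads = residual[:, None] * X          # per-example gradients, shape (n, d)
        norms = np.linalg.norm(grads, axis=1)
        C = np.quantile(norms, clip_quantile)  # adaptive clipping threshold
        scale = np.minimum(1.0, C / np.maximum(norms, 1e-12))
        clipped = grads * scale[:, None]
        # Gaussian mechanism on the mean gradient (noise multiplier lumped
        # into eps_noise for illustration, rather than derived from (eps, delta)).
        noisy_grad = clipped.mean(axis=0) + (C / n) * eps_noise * rng.standard_normal(d)
        w = w - lr * noisy_grad
    return w
```

With well-conditioned covariates and a small noise multiplier, the iterates contract toward w* while each step's output depends on any single example only through a clipped, noised average.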

1. INTRODUCTION

Differential Privacy (DP) is a widely accepted notion of privacy introduced in Dwork et al. (2006), and is now standard in industry and government (Tang et al., 2017; Erlingsson et al., 2014; Fanti et al., 2016; Abowd, 2018). A query to a database is said to be (ε, δ)-differentially private if a strong adversary who knows all other entries cannot identify with high confidence whether you participated in the database or not. The parameters ε and δ restrict the Type-I and Type-II errors achievable by the adversary (Kairouz et al., 2015); smaller ε > 0 and δ ∈ [0, 1] imply stronger privacy guarantees. Significant advances have been made recently in understanding the utility-privacy trade-offs in canonical statistical tasks; we provide a survey in App. A. However, several important questions remain open, some of which we address below.

In the canonical statistical task of linear regression, n i.i.d. samples {(x_i ∈ R^d, y_i ∈ R)}_{i=1}^n are drawn from x_i ~ N(0, Σ) and y_i = x_i^⊤ w* + z_i with z_i ~ N(0, σ²). The error is measured in ‖ŵ − w*‖_Σ := ‖Σ^{1/2}(ŵ − w*)‖, which correctly accounts for the signal-to-noise ratio in each direction: in the direction of a large eigenvalue of Σ, we have a larger signal in x_i and the same noise in z_i, and hence expect a smaller error.

When computational complexity is not a concern, the best known algorithm is High-dimensional Propose-Test-Release (HPTR), introduced by Liu et al. (2022b), which can be flexibly applied to a variety of statistical tasks to achieve the optimal sample complexity under (ε, δ)-DP. For linear regression, n = O(d/α² + d/(εα)) samples are sufficient for HPTR to achieve an error of (1/σ)‖ŵ − w*‖_Σ = α with high probability. This is optimal, matching known information-theoretic lower bounds. It remains an important open question whether this can be achieved with an efficient algorithm. After a series of works surveyed in App. A, Varshney et al.
(2022) achieves the best known sample complexity for an efficient algorithm: n = Õ(κ²d/ε + d/α² + κd/(εα)). The last term is suboptimal by a factor of κ, the condition number of the covariance Σ of the covariates, and the first term is unnecessary. We further close this gap in the following.

Theorem 1 (informal version of Theorem 3 with no adversary). Under the (Σ, σ², w*, K, a)-model in Assumption 1, n = Õ(d/α² + κ^{1/2}d/(εα)) samples are sufficient for Algorithm 1 to achieve an error rate of (1/σ)‖ŵ − w*‖_Σ = Õ(α) and (ε, δ)-DP, where κ := λ_max(Σ)/λ_min(Σ).

Perhaps surprisingly, we show that the same algorithm is also robust against label corruption, where an adversary selects an arbitrary α_corrupt-fraction of the data points and changes their response variables arbitrarily. When computational complexity is not a concern, the best known algorithm is HPTR by Liu et al. (2022b), which also provides optimal robustness and (ε, δ)-DP simultaneously, i.e., n =
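To make the data model and the Σ-norm error metric from this section concrete, here is a minimal non-private sketch that samples from the (Σ, σ², w*)-model and evaluates (1/σ)‖ŵ − w*‖_Σ. The ordinary least-squares estimator is only a stand-in for illustration; it is neither private nor robust, and is not the paper's Algorithm 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 2000, 5, 0.5

# Covariance Sigma with distinct eigenvalues, so the condition number kappa > 1.
Sigma = np.diag(np.linspace(1.0, 4.0, d))
w_star = rng.standard_normal(d)

# Draw covariates x_i ~ N(0, Sigma) and responses y_i = x_i^T w* + z_i.
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
y = X @ w_star + sigma * rng.standard_normal(n)

# Stand-in estimator: ordinary least squares (not private, not robust).
w_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# Sigma-norm error: ||w_hat - w*||_Sigma = ||Sigma^{1/2}(w_hat - w*)||,
# computed via the Cholesky factor L with Sigma = L L^T.
err = np.linalg.norm(np.linalg.cholesky(Sigma).T @ (w_hat - w_star))
print(err / sigma)  # for OLS this scales like sqrt(d/n)
```

The quantity `err / sigma` is exactly the error rate (1/σ)‖ŵ − w*‖_Σ that the sample-complexity bounds above are stated in.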

