FAST NONLINEAR VECTOR QUANTILE REGRESSION

Abstract

Quantile regression (QR) is a powerful tool for estimating one or more conditional quantiles of a target variable Y given explanatory features X. A limitation of QR is that it is only defined for scalar target variables, due to the formulation of its objective function, and since the notion of quantiles has no standard definition for multivariate distributions. Recently, vector quantile regression (VQR) was proposed as an extension of QR for vector-valued target variables, thanks to a meaningful generalization of the notion of quantiles to multivariate distributions via optimal transport. Despite its elegance, VQR is arguably not applicable in practice due to several limitations: (i) it assumes a linear model for the quantiles of the target Y given the features X; (ii) its exact formulation is intractable even for modestly-sized problems in terms of target dimensions, number of regressed quantile levels, or number of features, and its relaxed dual formulation may violate the monotonicity of the estimated quantiles; (iii) no fast or scalable solvers for VQR currently exist. In this work we fully address these limitations, namely: (i) We extend VQR to the nonlinear case, showing substantial improvement over linear VQR; (ii) We propose vector monotone rearrangement, a method which ensures that the quantile functions estimated by VQR are monotone functions; (iii) We provide fast, GPU-accelerated solvers for linear and nonlinear VQR which maintain a fixed memory footprint, and demonstrate that they scale to millions of samples and thousands of quantile levels; (iv) We release an optimized Python package of our solvers, in order to facilitate the use of VQR in real-world applications.

1. INTRODUCTION

Quantile regression (QR) (Koenker & Bassett, 1978) is a well-known method which estimates a conditional quantile of a target variable Y, given covariates X. A major limitation of QR is that it deals with a scalar-valued target variable, while many important applications require estimation of vector-valued responses. A trivial approach is to estimate conditional quantiles separately for each component of the vector-valued target. However, this assumes statistical independence between the target components, a very strong assumption that rarely holds in practice. Extending QR to high-dimensional responses is not straightforward because (i) the notion of quantiles is not trivial to define for high-dimensional variables, and in fact multiple definitions of multivariate quantiles exist (Carlier et al., 2016); (ii) quantile regression is performed by minimizing the pinball loss, which is not defined for high-dimensional responses. Carlier et al. (2016) and Chernozhukov et al. (2017) introduced a notion of quantiles for vector-valued random variables, termed vector quantiles. Key to their approach is extending the notions of monotonicity and strong representation of scalar quantile functions to high dimensions, i.e.
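To make the scalar starting point concrete: in ordinary QR, the τ-quantile is the minimizer of the pinball (check) loss over predictors. The following short sketch (our own illustration; the function and variable names are not from the paper) verifies numerically that minimizing the pinball loss over a constant predictor recovers the empirical τ-quantile:

```python
import numpy as np

def pinball_loss(y, q, tau):
    """Pinball (check) loss of a constant predictor q at quantile level tau."""
    diff = y - q
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))

rng = np.random.default_rng(0)
y = rng.normal(size=10_000)
tau = 0.9

# The minimizer of the empirical pinball loss over constants is the empirical
# tau-quantile of y; verify by brute-force search over a fine grid.
grid = np.linspace(-3.0, 3.0, 601)
losses = [pinball_loss(y, q, tau) for q in grid]
best = grid[int(np.argmin(losses))]
print(best, np.quantile(y, tau))  # the two values should nearly coincide
```

This loss has no canonical multivariate analogue, which is precisely the obstacle that the optimal-transport formulation of vector quantiles removes.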

Co-monotonicity:

(Q_Y(u) - Q_Y(u'))^T (u - u') ≥ 0,  ∀ u, u' ∈ [0, 1]^d   (1)

Strong representation:

Y = Q_Y(U),  U ∼ U[0, 1]^d   (2)

where Y is a d-dimensional variable, and Q_Y : [0, 1]^d → R^d is its vector quantile function (VQF). Moreover, Carlier et al. (2016) extended QR to vector-valued targets, leading to vector quantile regression (VQR). VQR estimates the conditional vector quantile function (CVQF) Q_{Y|X} from samples drawn from P_{(X,Y)}, where Y is a d-dimensional target variable and X are k-dimensional covariates. They show that a function Q_{Y|X} which obeys co-monotonicity (1) and strong representation (2) exists and is unique, as a consequence of Brenier's polar factorization theorem (Brenier, 1991). Figure 1 provides a visualization of these notions for a two-dimensional target variable.

Assuming a linear specification Q_{Y|X}(u; x) = B(u)x + a(u), VQR can be formulated as an optimal transport problem between the measures of Y|X and U, with the additional mean-independence constraint E[U|X] = E[U]. The primal formulation of this problem is a large-scale linear program and is thus intractable even for modestly-sized problems. A relaxed dual formulation which is amenable to gradient-based solvers exists, but leads to co-monotonicity violations. The first goal of our work is to address the following limitations of Carlier et al. (2016; 2020): (i) the linear specification assumption on the CVQF, and (ii) the violation of co-monotonicity when solving the inexact formulation of the VQR problem.

The second goal of this work is to make VQR an accessible tool for off-the-shelf usage on large-scale, high-dimensional datasets. Currently there are no available software packages for estimating VQFs and CVQFs that scale beyond toy problems. We aim to provide accurate, fast, and distribution-free estimation of these fundamental statistical quantities. This is relevant for innumerable applications requiring statistical inference, such as distribution-free uncertainty estimation for vector-valued variables (Feldman et al., 2021), hypothesis testing with a vector-valued test statistic (Shi et al., 2022), causal inference with multiple interventions (Williams & Crespi, 2020), outlier detection (Zhao et al., 2019), and others. Below we list our contributions.

Scalable VQR. We introduce a highly-scalable solver for VQR in Section 3. Our approach, inspired by Genevay et al. (2016) and Carlier et al. (2020), relies on solving a new relaxed dual formulation of the VQR problem. We propose custom stochastic-gradient-based solvers which maintain a constant memory footprint regardless of problem size. We demonstrate that our approach scales to millions of samples and thousands of quantile levels, and allows for GPU acceleration.

Nonlinear VQR. To address the limitation of linear specification, in Section 4 we propose nonlinear vector quantile regression (NL-VQR). The key idea is to fit a nonlinear embedding function of the input features jointly with the regression coefficients. This is made possible by leveraging the relaxed dual formulation and solver introduced in Section 3. We demonstrate, through synthetic and real-data experiments, that nonlinear VQR can model complex conditional quantile functions substantially better than linear VQR and separable QR approaches.

Vector monotone rearrangement (VMR). In Section 5 we propose VMR, which resolves the co-monotonicity violations in estimated CVQFs. We solve an optimal transport problem to rearrange

Figure 1: (a) Visualization of the vector quantile function (VQF) and its α-contours, a high-dimensional generalization of α-confidence intervals. Data was drawn from a 2d star-shaped distribution. Vector quantiles (colored dots) are overlaid on the data (middle). Different colors correspond to α-contours, each containing 100 · (1 - 2α)^2 percent of the data. The VQF Q_Y(u) = [Q_1(u), Q_2(u)] is co-monotonic with u = (u_1, u_2); Q_1, Q_2 are depicted as surfaces (left, right) with the corresponding vector quantiles overlaid. On Q_1, increasing u_1 for a fixed u_2 produces a monotonically increasing curve, and vice versa for Q_2. (b) Visualization of conditional vector quantile functions (CVQFs) via α-contours. Data was drawn from a joint distribution of (X, Y) where Y|X = x has a star-shaped distribution rotated by x degrees. The true CVQF Q_{Y|X} changes non-linearly with the covariates X, while E[Y|X] remains the same. This demonstrates the challenge of estimating CVQFs from samples of the joint distribution. Appendix C provides further intuitions regarding VQFs and CVQFs, and details how the α-contours are constructed from them.
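To make the optimal-transport view of vector quantiles concrete, the following small sketch (our own construction, not the paper's solver) computes an empirical VQF for a finite 2-D sample by solving a discrete assignment problem between a uniform grid of quantile levels and the data under a correlation cost; optimality of the assignment directly implies the pairwise co-monotonicity property (1):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n_grid = 15                            # grid points per dimension
n = n_grid ** 2
y = rng.normal(size=(n, 2))            # target samples, d = 2

# Uniform grid of quantile levels u in [0, 1]^2, one level per sample.
ticks = (np.arange(n_grid) + 0.5) / n_grid
u1, u2 = np.meshgrid(ticks, ticks)
u = np.stack([u1.ravel(), u2.ravel()], axis=1)

# Discrete OT with cost -<u, y>: the optimal assignment maximizes total
# correlation and yields a cyclically monotone (Brenier) map, i.e. an
# empirical vector quantile function u -> Q(u).
cost = -u @ y.T
_, col = linear_sum_assignment(cost)
q = y[col]                             # q[i] is the vector quantile at level u[i]

# Pairwise co-monotonicity follows from optimality: swapping the targets of
# any two levels cannot increase total correlation, hence
# (Q(u) - Q(u'))^T (u - u') >= 0 for every pair of assigned points.
gram = q @ u.T                         # gram[i, j] = <q_i, u_j>
pairwise = (np.diag(gram)[:, None] + np.diag(gram)[None, :]
            - gram - gram.T)           # pairwise[i, j] = (q_i - q_j) @ (u_i - u_j)
violations = int((pairwise < -1e-9).sum())
print("co-monotonicity violations:", violations)
```

In one dimension this reduces to sorting the data, recovering the ordinary quantile function; the exact VQR primal scales this linear program with covariate constraints, which is why it quickly becomes intractable and motivates the relaxed dual formulation used in this work.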

