DISTRIBUTED LEAST SQUARE RANKING WITH RANDOM FEATURES

Abstract

In this paper, we study the statistical properties of pairwise ranking with distributed learning and random features (called DRank-RF) and establish its convergence in probability. Theoretical analysis shows that DRank-RF remarkably reduces the computational requirements while preserving a satisfactory convergence rate. Extensive experiments verify the effectiveness of DRank-RF. Furthermore, to improve the learning performance of DRank-RF, we propose an effective communication strategy for it and demonstrate the power of communication via theoretical assessments and numerical experiments.

1. INTRODUCTION

Distributed learning has attracted much attention in the literature and has been widely used for kernel learning in large-scale scenarios (Zhang et al., 2013; Chang et al., 2017; Lin et al., 2020b). Distributed kernel learning has three main ingredients: (i) processing a data subset on each local kernel machine to produce a local estimator; (ii) communicating selected information, such as data (Bellet et al., 2015), gradients (Zeng & Yin, 2018), or local estimators (Huang & Huo, 2019), between the local processors and the global processor; and (iii) synthesizing the local estimators and the communicated information on the global processor to produce a global estimator. Note that in divide-and-conquer learning the second ingredient, communication, is not required. In terms of both practical challenges and theoretical analysis, distributed learning has made significant breakthroughs in multi-penalty regularization (Guo et al., 2019), coefficient-based regularization (Pang & Sun, 2018), spectral algorithms (Mücke & Blanchard, 2018; Lin et al., 2020a), kernel ridge regression (Yin et al., 2020; 2021), and semi-supervised regression (Li et al., 2022). All of the above are restricted to pointwise kernel learning; distributed learning for pairwise kernel learning, by contrast, still has a long way to go. Existing distributed pairwise learning methods (Chen et al., 2019; 2021) have high computational requirements, which motivates us to explore theoretical foundations and efficient methods for pairwise ranking kernel methods under distributed learning. Random features methods (Rahimi & Recht, 2007; Carratino et al., 2018; Liu et al., 2021) have a long and distinguished history: they embed the non-linear feature space (i.e., the reproducing kernel Hilbert space associated with the kernel) into a low-dimensional Euclidean space while incurring an arbitrarily small additive distortion in the inner-product values.
This enables one to overcome the high computational requirements of kernel learning, since one can now work in an explicit low-dimensional space with an explicit representation whose complexity depends only on the dimensionality of that space. Random features have made rapid progress in reducing the complexity of kernel ridge regression (Liu et al., 2021) and semi-supervised regression (Li et al., 2022). However, complexity reduction and learning-theory analysis remain unexplored for distributed pairwise ranking kernel learning. In this paper, to reduce the computational requirements of pairwise ranking kernel learning, we combine distributed learning and random features into a method we call distributed least square ranking with random features (DRank-RF), designed for large-scale applications, and study its statistical properties in probability within the integral operator framework. To further improve the performance of DRank-RF, we consider communications among different local processors. The main contributions of this paper are as follows: 1) We construct a novel method, DRank-RF, to improve on the existing state-of-the-art performance of distributed pairwise ranking kernel learning. This work is the first to apply random features to least square
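The combination described above can be sketched in a few lines of NumPy. The following is a minimal illustration under assumed design choices, not the authors' exact DRank-RF algorithm: it uses Gaussian random Fourier features (Rahimi & Recht, 2007) shared across all local machines, least-squares regression on pairwise label differences as the ranking objective, naive averaging of the local estimators (divide-and-conquer, no communication step), and synthetic data; the function names, regularization parameter `lam`, and data sizes are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_map(X, W, b):
    """Random Fourier feature map: z(x)^T z(y) approximates the
    Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

def local_rank_rf(X, y, W, b, lam=1e-2):
    """Least-squares ranking on one local machine: regress pairwise
    label differences y_i - y_j on feature differences z_i - z_j."""
    Z = rff_map(X, W, b)
    i, j = np.triu_indices(len(y), k=1)          # all pairs on this machine
    dZ, dy = Z[i] - Z[j], y[i] - y[j]
    D = Z.shape[1]
    return np.linalg.solve(dZ.T @ dZ + lam * np.eye(D), dZ.T @ dy)

# Synthetic data: true scores are a smooth function of the inputs.
n, d, D, sigma, m = 600, 5, 200, 1.0, 4
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=n)

# The same random features (W, b) are shared by every local processor,
# so the local estimators live in a common low-dimensional space.
W = rng.normal(scale=1.0 / sigma, size=(d, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

# Divide-and-conquer: fit on m disjoint subsets, average the local estimators.
w_bar = np.mean([local_rank_rf(Xk, yk, W, b)
                 for Xk, yk in zip(np.split(X, m), np.split(y, m))], axis=0)

# Evaluate the global estimator by its pairwise ordering accuracy.
pred = rff_map(X, W, b) @ w_bar
i, j = np.triu_indices(n, k=1)
acc = np.mean(np.sign(pred[i] - pred[j]) == np.sign(y[i] - y[j]))
print(f"pairwise ordering accuracy: {acc:.3f}")
```

Each local solve costs only O(D^2) memory in the feature dimension D rather than scaling with the number of pairs, which is the computational saving the random features bring; the averaging step is the simplest synthesis rule, and the communication strategies discussed later would refine it.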

