EF21-P AND FRIENDS: IMPROVED THEORETICAL COMMUNICATION COMPLEXITY FOR DISTRIBUTED OPTIMIZATION WITH BIDIRECTIONAL COMPRESSION

Abstract

The starting point of this paper is the discovery of a novel and simple error-feedback mechanism, which we call EF21-P, for dealing with the error introduced by a contractive compressor. Unlike all prior works on error feedback, where compression and correction operate in the dual space of gradients, our mechanism operates in the primal space of models. While we believe that EF21-P may be of interest in many situations where it is often advantageous to perturb the model before computing the gradient (e.g., randomized smoothing and generalization), in this work we focus on its use as a key building block in the design of communication-efficient distributed optimization methods supporting bidirectional compression. In particular, we employ EF21-P as the mechanism for compressing and subsequently error-correcting the model broadcast by the server to the workers. By combining EF21-P with suitable methods performing worker-to-server compression, we obtain novel methods that support bidirectional compression and enjoy new state-of-the-art theoretical communication complexity for convex and nonconvex problems. For example, our bounds are the first to decouple the variance/error coming from the workers-to-server and server-to-workers compression, transforming a multiplicative dependence into an additive one. In the convex regime, we obtain the first bounds that match the theoretical communication complexity of gradient descent. Even in this convex regime, our algorithms work with biased gradient estimators, which is nonstandard and requires new proof techniques that may be of independent interest. Finally, our theoretical results are corroborated by suitable experiments.

1. INTRODUCTION: ERROR FEEDBACK IN THE PRIMAL SPACE

The key moment that ultimately enabled the main results of this paper was our discovery of a new and simple error-feedback technique, which we call EF21-P, that operates in the primal space of the iterates/models. This contrasts with the prevalent approach to error feedback (Stich & Karimireddy, 2019; Karimireddy et al., 2019; Gorbunov et al., 2020b; Beznosikov et al., 2020; Richtárik et al., 2021), which operates in the dual space of gradients. To describe EF21-P, consider solving the optimization problem

$$\min_{x \in \mathbb{R}^d} f(x), \tag{1}$$

where $f : \mathbb{R}^d \to \mathbb{R}$ is a smooth but not necessarily convex function. Given a contractive compression operator $\mathcal{C} : \mathbb{R}^d \to \mathbb{R}^d$, i.e., a (possibly) randomized mapping satisfying the inequality

$$\mathbb{E}\left[\|\mathcal{C}(x) - x\|^2\right] \le (1 - \alpha)\|x\|^2, \quad \forall x \in \mathbb{R}^d, \tag{2}$$

for some constant $\alpha \in (0, 1]$, our EF21-P method aims to solve (1) via the iterative process

$$x^{t+1} = x^t - \gamma \nabla f(w^t), \qquad w^{t+1} = w^t + \mathcal{C}^t(x^{t+1} - w^t).$$
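To make the iteration concrete, here is a minimal single-node sketch in Python with NumPy. It assumes a Top-K sparsifier as the contractive compressor (a standard example that satisfies (2) deterministically with α = K/d; the method itself allows any compressor obeying (2)), the initialization w^0 = x^0, and a toy diagonal quadratic. The names `top_k` and `ef21_p`, the stepsize and all other parameter values are illustrative choices, not taken from the paper. The point to notice is that the gradient is evaluated at the compressed model w^t, and w is only ever updated by a compressed correction; in the server-to-workers reading from the abstract, `top_k(x - w, k)` would play the role of the broadcast message.

```python
import numpy as np


def top_k(v: np.ndarray, k: int) -> np.ndarray:
    """Top-K sparsifier: keep the k largest-magnitude entries of v, zero the rest.

    It is deterministic and satisfies (2) with alpha = k / d, because the
    d - k discarded entries are the smallest ones in magnitude.
    """
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]  # indices of the k largest |v_i|
    out[idx] = v[idx]
    return out


def ef21_p(grad, x0: np.ndarray, gamma: float, k: int, num_iters: int):
    """Single-node EF21-P with a Top-K compressor (illustrative sketch).

    x^{t+1} = x^t - gamma * grad(w^t)
    w^{t+1} = w^t + C(x^{t+1} - w^t)
    """
    x = x0.copy()
    w = x0.copy()  # assumption: the compressed model starts at w^0 = x^0
    for _ in range(num_iters):
        x = x - gamma * grad(w)  # gradient step uses the compressed model w^t
        w = w + top_k(x - w, k)  # w moves only by the compressed correction
    return x, w


# Usage: strongly convex quadratic f(x) = 0.5 * x^T A x - b^T x, A = diag(a).
d = 10
a = np.linspace(1.0, 2.0, d)  # eigenvalues of A lie in [1, 2]
b = np.ones(d)
x_star = b / a  # unique minimizer of f


def grad_f(z: np.ndarray) -> np.ndarray:
    return a * z - b


x_final, w_final = ef21_p(grad_f, np.zeros(d), gamma=0.02, k=3, num_iters=3000)
print(np.linalg.norm(x_final - x_star))  # distance to the minimizer
```

The stepsize γ = 0.02 is a conservative, hand-picked value for this toy problem (small enough that both the optimization error and the compression error x^t - w^t contract), not a stepsize rule from the paper's analysis.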



Our method is inspired by the recently proposed error-feedback mechanism EF21 of Richtárik et al. (2021), which compresses dual vectors, i.e., gradients. EF21 is currently the state-of-the-art error-feedback mechanism in terms of both its theoretical properties and its practical performance (Fatkhullin et al., 2021). To highlight its dual nature explicitly, one could more descriptively call their method EF21-D.
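For contrast with the primal-space update of EF21-P, here is a minimal single-node sketch of the dual-space mechanism (EF21-D), written in the form we understand from Richtárik et al. (2021) with a single worker: the iterate steps along a maintained gradient estimate g^t, and the compressor is applied to gradient differences rather than to model differences. The Top-K compressor, the initialization g^0 = ∇f(x^0), the function names and the toy quadratic are illustrative assumptions, not the paper's setup.

```python
import numpy as np


def top_k(v: np.ndarray, k: int) -> np.ndarray:
    """Top-K sparsifier: contractive with alpha = k / d."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]  # indices of the k largest |v_i|
    out[idx] = v[idx]
    return out


def ef21_d(grad, x0: np.ndarray, gamma: float, k: int, num_iters: int):
    """Single-node EF21 (dual space), illustrative sketch.

    x^{t+1} = x^t - gamma * g^t
    g^{t+1} = g^t + C(grad(x^{t+1}) - g^t)
    """
    x = x0.copy()
    g = grad(x0)  # assumption: g^0 = grad f(x^0), sent uncompressed once
    for _ in range(num_iters):
        x = x - gamma * g  # step along the maintained gradient estimate
        g = g + top_k(grad(x) - g, k)  # only the compressed gradient correction is used
    return x, g


# Usage on the same kind of toy quadratic f(x) = 0.5 * x^T A x - b^T x.
d = 10
a = np.linspace(1.0, 2.0, d)
b = np.ones(d)
x_star = b / a


def grad_f(z: np.ndarray) -> np.ndarray:
    return a * z - b


x_final, g_final = ef21_d(grad_f, np.zeros(d), gamma=0.02, k=3, num_iters=3000)
print(np.linalg.norm(x_final - x_star))
```

Side by side, the two sketches differ in where the compressor sits: EF21-D compresses ∇f(x^{t+1}) - g^t and evaluates gradients at the exact iterate, whereas EF21-P compresses x^{t+1} - w^t and evaluates gradients at the compressed model.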

