PREDICTOR-CORRECTOR ALGORITHMS FOR STOCHASTIC OPTIMIZATION UNDER GRADUAL DISTRIBUTION SHIFT

Abstract

Time-varying stochastic optimization problems frequently arise in machine learning practice (e.g., gradual domain shift, object tracking, strategic classification). Often, the underlying process that drives the distribution shift is continuous in nature. We exploit this underlying continuity by developing predictor-corrector algorithms for time-varying stochastic optimization that anticipate changes in the underlying data generating process through a predictor-corrector term in the update rule. The key challenge is the estimation of the predictor-corrector term; a naive approach based on sample-average approximation may fail to converge. We develop a general moving-average based method to estimate the predictor-corrector term and provide error bounds for the iterates, under both exact and noisy access to queries of the relevant derivatives of the loss function. Furthermore, we show (theoretically, and empirically in several examples) that our method outperforms non-predictor-corrector methods that do not anticipate changes in the data generating process.

1. INTRODUCTION

Stochastic optimization is a basic problem in modern machine learning (ML) theory and practice. Although there is a voluminous literature on stochastic optimization (Agarwal et al., 2014; Moulines & Bach, 2011; Bottou, 2003; 2012; Bottou & Bousquet, 2007), most prior works consider a time-invariant stochastic optimization problem in which the data generating distribution does not change over time. However, there is an abundance of real examples in which the underlying optimization problem is time-varying, and these can be broadly divided into two categories. The first kind arises due to exogenous variation in the data generating process. A concrete example is the object tracking problem, in which an observer receives noisy signals regarding the position of a moving object, and the goal is to infer the trajectory of the object. The second kind of time-varying optimization problem arises due to endogenous variation in the data generating process. Examples here include strategic classification (Dong et al., 2018; Hardt et al., 2016) and performative prediction (Perdomo et al., 2020; Mendler-Dünner et al., 2020; Brown et al., 2022). Although there are a few recent papers on time-varying stochastic optimization (e.g., Cutler et al. (2021); Nonhoff & Müller (2020); Dixit et al. (2019; 2018)), they model the temporal drift as discrete, precluding them from exploiting the smoothness in the drift. This leads to worse asymptotic tracking error that depends on the magnitude of the temporal drift of the optimal solution (see, e.g., Popkov (2005), Zavlanos et al. (2012), Zhang et al. (2009), Ling & Ribeiro (2013), and references therein).
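To make the idea concrete, the following is a minimal sketch (not the paper's exact algorithm) of a gradient-tracking loop augmented with a predictor term: the drift of the iterates is estimated with a moving average of successive updates and added to the iterate to anticipate the next distribution. The oracle `grad`, the step size `eta`, and the averaging weight `beta` are illustrative assumptions.

```python
import numpy as np

def predictor_corrector_track(grad, theta0, T, eta=0.5, beta=0.9):
    """Track the minimizer of a time-varying loss f_t.

    grad(theta, t): returns a (possibly noisy) gradient of f_t at theta.
    The temporal drift of the optimum is estimated by a moving average of
    successive iterate differences (the "predictor-corrector" term).
    """
    theta = np.asarray(theta0, dtype=float).copy()
    drift = np.zeros_like(theta)
    prev = theta.copy()
    traj = [theta.copy()]
    for t in range(T):
        # Corrector: a standard stochastic gradient step on the current loss.
        theta = theta - eta * grad(theta, t)
        # Predictor: moving-average estimate of the optimum's per-step drift.
        drift = beta * drift + (1.0 - beta) * (theta - prev)
        prev = theta.copy()
        # Anticipate the shift of the next distribution.
        theta = theta + drift
        traj.append(theta.copy())
    return np.array(traj)

# Usage: track a quadratic loss whose minimizer moves linearly in time,
# f_t(theta) = 0.5 * ||theta - v * t||^2, so grad = theta - v * t.
v = 0.01
traj = predictor_corrector_track(lambda th, t: th - v * t,
                                 theta0=np.zeros(1), T=500)
```

Because the drift estimate converges to the per-step motion of the optimum, the anticipatory step removes most of the lag that a plain gradient method would exhibit against a smoothly moving target.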



Code: https://github.com/smaityumich/concept-drift

