ON THE DYNAMIC REGRET OF ONLINE MULTIPLE MIRROR DESCENT Anonymous

Abstract

We study the problem of online convex optimization, where a learner makes sequential decisions to minimize an accumulation of strongly convex costs over time. The quality of decisions is measured by the dynamic regret, which compares the performance of the learner against a sequence of dynamic minimizers. Prior works on gradient descent and mirror descent have shown that the dynamic regret can be upper bounded using the path length, which depends on the differences between successive minimizers, and an upper bound using the squared path length has also been shown when multiple gradient queries are allowed per round. However, these works all require the cost functions to be Lipschitz continuous, which imposes a strong requirement, especially when the cost functions are also strongly convex. In this work, we consider Online Multiple Mirror Descent (OMMD), which is based on mirror descent but applies multiple mirror descent steps per online round. Without requiring the cost functions to be Lipschitz continuous, we derive two upper bounds on the dynamic regret based on the path length and the squared path length. We further derive a third upper bound that relies on the gradients of the cost functions, which can be much smaller than the path length or squared path length, especially when the cost functions are smooth but fluctuate over time. Thus, we show that the dynamic regret of OMMD scales linearly with the minimum among the path length, the squared path length, and the sum of squared gradients. Our experimental results further show a substantial improvement in the dynamic regret compared with existing alternatives.

1. INTRODUCTION

Online optimization refers to the design of sequential decisions where system parameters and cost functions vary with time. It has applications to various classes of problems, such as object tracking (Shahrampour & Jadbabaie, 2017), networking (Shi et al., 2018), cloud computing (Lin et al., 2012), and classification (Crammer et al., 2006). It is also an important tool in the development of algorithms for reinforcement learning (Yuan & Lamperski, 2017) and deep learning (Mnih et al., 2015).

In this work, we consider online convex optimization, which can be formulated as a discrete-time sequential learning process as follows. At each round t, the learner first makes a decision x_t ∈ X, where X is a convex set representing the solution space. The learner then receives a convex cost function f_t : X → R and suffers the corresponding cost f_t(x_t) associated with the submitted decision. The goal of the online learner is to minimize the total accrued cost over a finite number of rounds, denoted by T.

For performance evaluation, prior studies on online learning often focus on the static regret, defined as the difference between the learner's accumulated cost and that of an optimal fixed offline decision, which is made in hindsight with knowledge of f_t(·) for all t: Reg^s_T = \sum_{t=1}^{T} f_t(x_t) - \min_{x ∈ X} \sum_{t=1}^{T} f_t(x). A successful online algorithm closes the gap between the online decisions and the offline counterpart when normalized by T, i.e., it sustains static regret that is sublinear in T. In the literature, there are various online algorithms (Zinkevich, 2003; Cesa-Bianchi & Lugosi, 2006; Hazan et al., 2006; Duchi et al., 2010; Shalev-Shwartz, 2012) that guarantee a sublinear bound on the static regret.
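The online learning protocol and static regret defined above can be illustrated with a minimal sketch. This is not the paper's OMMD algorithm: it runs plain projected online gradient descent (mirror descent with the Euclidean mirror map) on illustrative strongly convex quadratic costs; the cost sequence, the step size eta, and all helper names are assumptions made here for concreteness.

```python
import numpy as np

def online_gradient_descent(grads, x0, eta, proj, T):
    """Projected online gradient descent: commit x_t, observe f_t,
    then take one gradient step followed by projection onto X."""
    x = x0
    decisions = []
    for t in range(T):
        decisions.append(x)           # commit decision x_t
        g = grads[t](x)               # gradient of the revealed cost f_t
        x = proj(x - eta * g)         # descent step, projected back onto X
    return decisions

def static_regret(costs, decisions, x_star):
    """Reg^s_T = sum_t f_t(x_t) - sum_t f_t(x*), where x* is the best
    fixed decision in hindsight (supplied by the caller)."""
    online = sum(f(x) for f, x in zip(costs, decisions))
    offline = sum(f(x_star) for f in costs)
    return online - offline

# Illustrative strongly convex costs f_t(x) = (x - b_t)^2 on X = [-1, 1].
rng = np.random.default_rng(0)
b = rng.uniform(-1.0, 1.0, size=200)
costs = [lambda x, bt=bt: (x - bt) ** 2 for bt in b]
grads = [lambda x, bt=bt: 2.0 * (x - bt) for bt in b]
proj = lambda x: float(np.clip(x, -1.0, 1.0))

decisions = online_gradient_descent(grads, x0=0.0, eta=0.1, proj=proj, T=len(b))
x_star = float(np.clip(b.mean(), -1.0, 1.0))  # offline minimizer of the summed cost
reg = static_regret(costs, decisions, x_star)
```

Since x_star minimizes the summed cost over X in hindsight, the computed static regret is nonnegative; running the loop for larger T and examining reg / T gives a direct empirical check of sublinearity.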

