ON THE DYNAMIC REGRET OF ONLINE MULTIPLE MIRROR DESCENT

Anonymous

Abstract

We study the problem of online convex optimization, where a learner makes sequential decisions to minimize an accumulation of strongly convex costs over time. The quality of decisions is given in terms of the dynamic regret, which measures the performance of the learner relative to a sequence of dynamic minimizers. Prior works on gradient descent and mirror descent have shown that the dynamic regret can be upper bounded using the path length, which depends on the differences between successive minimizers, and an upper bound using the squared path length has also been shown when multiple gradient queries are allowed per round. However, they all require the cost functions to be Lipschitz continuous, which imposes a strong requirement, especially when the cost functions are also strongly convex. In this work, we consider Online Multiple Mirror Descent (OMMD), which is based on mirror descent but uses multiple mirror descent steps per online round. Without requiring the cost functions to be Lipschitz continuous, we derive two upper bounds on the dynamic regret based on the path length and squared path length. We further derive a third upper bound that relies on the gradients of the cost functions, which can be much smaller than the path length or squared path length, especially when the cost functions are smooth but fluctuate over time. Thus, we show that the dynamic regret of OMMD scales linearly with the minimum among the path length, squared path length, and sum of squared gradients. Our experimental results further show substantial improvement on the dynamic regret compared with existing alternatives.

1. INTRODUCTION

Online optimization refers to the design of sequential decisions where system parameters and cost functions vary with time. It has applications to various classes of problems, such as object tracking (Shahrampour & Jadbabaie, 2017), networking (Shi et al., 2018), cloud computing (Lin et al., 2012), and classification (Crammer et al., 2006). It is also an important tool in the development of algorithms for reinforcement learning (Yuan & Lamperski, 2017) and deep learning (Mnih et al., 2015). In this work, we consider online convex optimization, which can be formulated as a discrete-time sequential learning process as follows. At each round $t$, the learner first makes a decision $x_t \in \mathcal{X}$, where $\mathcal{X}$ is a convex set representing the solution space. The learner then receives a convex cost function $f_t : \mathcal{X} \to \mathbb{R}$ and suffers the corresponding cost $f_t(x_t)$ associated with the submitted decision. The goal of the online learner is to minimize the total accrued cost over a finite number of rounds, denoted by $T$. For performance evaluation, prior studies on online learning often focus on the static regret, defined as the difference between the learner's accumulated cost and that of an optimal fixed offline decision, which is made in hindsight with knowledge of $f_t(\cdot)$ for all $t$: $\mathrm{Reg}^s_T = \sum_{t=1}^T f_t(x_t) - \min_{x \in \mathcal{X}} \sum_{t=1}^T f_t(x)$. A successful online algorithm closes the gap between the online decisions and the offline counterpart when normalized by $T$, i.e., it sustains static regret that is sublinear in $T$. In the literature, there are various online algorithms (Zinkevich, 2003; Cesa-Bianchi & Lugosi, 2006; Hazan et al., 2006; Duchi et al., 2010; Shalev-Shwartz, 2012) that guarantee a sublinear bound on the static regret. However, algorithms that guarantee performance close to that of a static decision may still perform poorly in dynamic settings. Consequently, the static regret fails to accurately reflect the quality of decisions in many practical scenarios.
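The online protocol and static regret above can be sketched in a few lines. The quadratic cost $f_t(x) = (x - \theta_t)^2$, the step size, and the box constraint below are illustrative assumptions, not choices made in this paper; the sketch uses projected online gradient descent as the learner.

```python
import numpy as np

def online_gradient_descent(thetas, eta=0.1, x0=0.0, lo=-1.0, hi=1.0):
    """Sketch of the online protocol with illustrative quadratic costs
    f_t(x) = (x - theta_t)^2 (an assumption for this example).
    Returns the static regret: online cost minus the cost of the best
    fixed decision in hindsight."""
    x = x0
    online_cost = 0.0
    for theta in thetas:
        online_cost += (x - theta) ** 2      # suffer f_t(x_t) after deciding
        grad = 2.0 * (x - theta)             # gradient revealed post-decision
        x = np.clip(x - eta * grad, lo, hi)  # projected gradient step
    # best fixed decision in hindsight minimizes sum_t (x - theta_t)^2
    x_star = np.clip(np.mean(thetas), lo, hi)
    offline_cost = sum((x_star - th) ** 2 for th in thetas)
    return online_cost - offline_cost

# slowly drifting environment: the minimizer theta_t moves each round
regret = online_gradient_descent(np.sin(0.01 * np.arange(1000)))
```

Note that when the environment drifts, a tracking learner can accumulate less cost than any fixed decision, so the static regret can even be negative; this is precisely why the static regret is a weak yardstick in dynamic settings.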
Therefore, the dynamic regret has become a popular metric in recent works (Besbes et al., 2015; Mokhtari et al., 2016; Yang et al., 2016; Zhang et al., 2017), which allows a dynamic sequence of comparison targets and is defined by $\mathrm{Reg}^d_T = \sum_{t=1}^T f_t(x_t) - \sum_{t=1}^T f_t(x^*_t)$, where $x^*_t = \mathrm{argmin}_{x \in \mathcal{X}} f_t(x)$ is a minimizer of the cost at round $t$. It is well known that the online optimization problem may be intractable in a dynamic setting, due to arbitrary fluctuation in the cost functions. Hence, achieving a sublinear bound on the dynamic regret may be impossible. However, it is possible to upper bound the dynamic regret in terms of certain regularity measures. One such measure is the path length, defined by $C_T = \sum_{t=2}^T \|x^*_t - x^*_{t-1}\|$, which captures the cumulative variation of the minimizer sequence. For instance, the dynamic regret of online gradient descent for convex cost functions can be bounded by $O(\sqrt{T}(1 + C_T))$ (Zinkevich, 2003).¹ For strongly convex functions, the dynamic regret of online gradient descent can be reduced to $O(C_T)$ (Mokhtari et al., 2016). When the cost functions are smooth and strongly convex, by allowing the learner to make multiple queries to the gradient of the cost functions, the regret bound can be further improved to $O(\min\{C_T, S_T\})$, where $S_T$ is the squared path length, defined by $S_T = \sum_{t=2}^T \|x^*_t - x^*_{t-1}\|^2$, which can be smaller than the path length when the distance between successive minimizers is small. All the aforementioned studies require the cost functions to be Lipschitz continuous. However, many commonly used cost functions, e.g., quadratic functions, do not satisfy the Lipschitz condition. In addition, the above works rely on measuring distances using Euclidean norms, which hinders the projection step of the gradient descent update for some constraint sets, e.g., the probability simplex (Duchi, 2018).
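The two regularity measures are straightforward to compute from a minimizer sequence; the sketch below (with an assumed drifting sequence for illustration) also shows why $S_T$ can be much smaller than $C_T$ when successive minimizers are close.

```python
import numpy as np

def regularity_measures(minimizers):
    """Compute the path length C_T = sum_{t>=2} ||x*_t - x*_{t-1}||
    and the squared path length S_T = sum_{t>=2} ||x*_t - x*_{t-1}||^2
    from a T x d array whose rows are the per-round minimizers."""
    step_norms = np.linalg.norm(np.diff(minimizers, axis=0), axis=1)
    return step_norms.sum(), (step_norms ** 2).sum()

# illustrative minimizer sequence drifting by 0.01 per coordinate per round
xs = np.cumsum(np.full((100, 2), 0.01), axis=0)
C_T, S_T = regularity_measures(xs)
```

Since each step has norm well below 1, squaring shrinks it, so $S_T \ll C_T$ here; conversely, if minimizers jump by more than unit distance per round, $S_T$ can exceed $C_T$.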
Besides gradient descent, mirror descent is another well-known technique for online convex optimization (Hall & Willett, 2015; Jadbabaie et al., 2015). Mirror descent uses the Bregman divergence, which generalizes the Euclidean norm used in the projection step of gradient descent, extending its applicability to a broader range of problems. In addition, the Bregman divergence depends only mildly on the dimension of the decision variables (Beck & Teboulle, 2003; Nemirovsky & Yudin, 1983), so that mirror descent is optimal among first-order methods when the decision variables are high-dimensional (Duchi et al., 2010). In this work, we focus on the mirror descent approach. In previous works on online mirror descent, the learner queries the gradient of each cost function only once and performs one mirror descent step to update its decision (Hall & Willett, 2015; Shahrampour & Jadbabaie, 2017). In this case, the dynamic regret has an upper bound of order $O(\sqrt{T}(1 + C_T))$, which is the same as that of online gradient descent in (Zinkevich, 2003). In this work, we investigate whether it is possible to improve the dynamic regret when the learner performs multiple mirror descent steps in each online round, while relaxing the Lipschitz continuity condition on the cost functions. To this end, we analyze the performance of the Online Multiple Mirror Descent (OMMD) algorithm, which uses multiple steps of mirror descent per online round. When the cost functions are smooth and strongly convex, we show that the upper bound on the dynamic regret can be reduced from



¹ A more general definition of the dynamic regret was introduced in (Zinkevich, 2003), which allows comparison against an arbitrary sequence $\{u_t\}_{t=1}^T$. We note that the regret bounds developed in (Zinkevich, 2003) also hold for the specific case of $u_t = x^*_t$.
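To make the multiple-step update concrete, the following sketch instantiates mirror descent on the probability simplex with the negative-entropy mirror map, whose Bregman divergence is the KL divergence; the update then becomes the classical exponentiated-gradient step, and the "projection" reduces to a renormalization. The step size, the number of inner steps K, and the toy cost are illustrative assumptions, not the paper's specific parameter choices.

```python
import numpy as np

def md_step(x, grad, eta):
    """One mirror descent step under the negative-entropy mirror map
    (exponentiated gradient): multiplicative update, then renormalize
    back onto the probability simplex."""
    y = x * np.exp(-eta * grad)
    return y / y.sum()

def ommd_round(x, grad_fn, eta=0.1, K=3):
    """Sketch of one online round in the spirit of OMMD: the learner
    queries the current cost's gradient K times, taking a mirror
    descent step after each query (K and eta are illustrative)."""
    for _ in range(K):
        x = md_step(x, grad_fn(x), eta)
    return x

# toy smooth, strongly convex cost f_t(x) = 0.5 * ||x - p||^2 on the simplex
p = np.array([0.7, 0.2, 0.1])
x0 = np.full(3, 1.0 / 3.0)
x = ommd_round(x0, lambda z: z - p)
```

Because the multiplicative update keeps all coordinates strictly positive and sums renormalized, the iterate never leaves the simplex, unlike a Euclidean gradient step, which would require an explicit simplex projection.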

