MULTI-OBJECTIVE ONLINE LEARNING

Abstract

This paper presents a systematic study of multi-objective online learning. We first formulate the framework of Multi-Objective Online Convex Optimization, which encompasses a novel multi-objective regret. This regret is built upon a sequence-wise extension of the Pareto suboptimality gap, a discrepancy metric commonly used in zero-order multi-objective bandits. We then derive an equivalent form of the regret, making it amenable to optimization via first-order iterative methods. To motivate the algorithm design, we give an explicit example in which equipping OMD with the vanilla min-norm solver for gradient composition incurs a linear regret, showing that merely regularizing the iterates, as in single-objective online learning, is not enough to guarantee sublinear regret in the multi-objective setting. To resolve this issue, we propose a novel min-regularized-norm solver that regularizes the composite weights. Combining min-regularized-norm with OMD yields the Doubly Regularized Online Mirror Multiple Descent (DR-OMMD) algorithm. We further derive a multi-objective regret bound for the proposed algorithm, which matches the optimal bound in the single-objective setting. Extensive experiments on several real-world datasets verify the effectiveness of the proposed algorithm.

1. INTRODUCTION

Traditional optimization methods for machine learning are usually designed to optimize a single objective. However, many real-world applications require optimizing multiple correlated objectives concurrently. For example, in autonomous driving (Huang et al., 2019; Lu et al., 2019b), self-driving vehicles need to solve multiple tasks such as self-localization and object identification at the same time. In online advertising (Ma et al., 2018a;b), advertising systems need to decide on the exposure of items to different users to maximize both the Click-Through Rate (CTR) and the Post-Click Conversion Rate (CVR). In most multi-objective scenarios, the objectives may conflict with each other (Kendall et al., 2018). Hence, there may not exist any single solution that optimizes all the objectives simultaneously. For example, merely optimizing CTR or CVR degrades the performance of the other (Ma et al., 2018a;b). Multi-objective optimization (MOO) (Marler & Arora, 2004; Deb, 2014) is concerned with optimizing multiple conflicting objectives simultaneously. It seeks Pareto optimality, where no single objective can be improved without hurting the performance of the others. Many methods for MOO have been proposed, including evolutionary methods (Murata et al., 1995; Zitzler & Thiele, 1999), scalarization methods (Fliege & Svaiter, 2000), and gradient-based iterative methods (Désidéri, 2012). Recently, the Multiple Gradient Descent Algorithm (MGDA) and its variants have been introduced to the training of multi-task deep neural networks and achieved great empirical success (Sener & Koltun, 2018), renewing significant research interest in this family of methods (Lin et al., 2019; Yu et al., 2020; Liu et al., 2021). These methods compute a composite gradient based on the gradient information of all the individual objectives and then apply the composite gradient to update the model parameters.
The composite weights are determined by a min-norm solver (Désidéri, 2012), which yields a common descent direction for all the objectives. However, compared with their increasingly wide application prospects, gradient-based iterative algorithms are relatively understudied, especially in the online learning setting. Multi-objective online learning is of essential importance for two reasons. First, due to the data explosion in many real-world scenarios such as web applications, making in-time predictions requires performing online learning. Second, the theoretical investigation of multi-objective online learning will lay a solid foundation for the design of new optimizers for multi-task deep learning. This is analogous to the single-objective setting, where nearly all the optimizers for training DNNs were initially analyzed in the online setting, such as AdaGrad (Duchi et al., 2011), Adam (Kingma & Ba, 2015), and AMSGrad (Reddi et al., 2018). In this paper, we give a systematic study of multi-objective online learning. To begin with, we formulate the framework of Multi-Objective Online Convex Optimization (MO-OCO). One major challenge in deriving MO-OCO is the lack of a proper regret definition. In the multi-objective setting, in general, no single decision can optimize all the objectives simultaneously. Thus, to devise the multi-objective regret, we first need to extend the single fixed comparator used in the single-objective regret, i.e., the fixed optimal decision, to the entire Pareto optimal set. Then we need an appropriate discrepancy metric to evaluate the gap between vector-valued losses. Intuitively, the Pareto suboptimality gap (PSG) metric, which is frequently used in zero-order multi-objective bandits (Turgay et al., 2018; Lu et al., 2019a), is a very promising candidate: PSG yields scalarized measurements from any vector-valued loss to a given comparator set.
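For two objectives, the min-norm weights of Désidéri (2012) admit a simple closed form: the convex combination of the two gradients with minimal Euclidean norm. The sketch below (an illustrative NumPy rendering, not the paper's code) shows this gradient composition scheme.

```python
import numpy as np

def min_norm_two(g1, g2):
    """Min-norm composite weight for two gradients (Desideri, 2012).

    Returns lam in [0, 1] minimizing ||lam*g1 + (1-lam)*g2||; the
    resulting composite gradient is a common descent direction.
    """
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom == 0.0:               # identical gradients: any weight works
        return 0.5
    lam = float((g2 - g1) @ g2) / denom
    return min(max(lam, 0.0), 1.0)

# Conflicting gradients: the composite direction trades off both tasks.
g1 = np.array([1.0, 0.0])
g2 = np.array([0.0, 1.0])
lam = min_norm_two(g1, g2)         # -> 0.5 by symmetry
g = lam * g1 + (1 - lam) * g2      # composite gradient [0.5, 0.5]
```

When the gradients conflict, the solver balances them; when they are aligned, the clipping at the simplex boundary simply selects the shorter gradient.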
However, we find that vanilla PSG is unsuitable for our setting, since it always yields non-negative values and may be too loose. In a concrete example, we show that the naive PSG-based regret R^I(T) can even be linear in T when the decisions are already optimal, which disqualifies it as a regret metric. To overcome this failure of vanilla PSG, we propose a sequence-wise variant termed S-PSG, which measures the suboptimality of the whole decision sequence with respect to the Pareto optimal set of the cumulative loss function. Optimizing the resulting regret R^II(T) drives the cumulative loss toward the Pareto front. However, since S-PSG is a zero-order metric motivated geometrically, it is difficult to design first-order algorithms that directly optimize it. To resolve this issue, we derive a more intuitive equivalent form of R^II(T) via a highly non-trivial transformation. Based on the MO-OCO framework, we develop a novel multi-objective online algorithm termed Doubly Regularized Online Mirror Multiple Descent (DR-OMMD). The key module of the algorithm is its gradient composition scheme, which calculates a composite gradient in the form of a convex combination of the gradients of all objectives. Intuitively, the most direct way to determine the composite weights is to apply the min-norm solver (Désidéri, 2012) commonly used in offline multi-objective optimization. However, directly applying min-norm does not work in the online setting. Specifically, the composite weights in min-norm are determined solely by the gradients at the current round. Since the gradients in the online setting are adversarial, they may result in undesired composite weights, which in turn produce a composite gradient that optimizes the loss in reverse. To rigorously verify this point, we give an example where equipping OMD with vanilla min-norm incurs a linear regret, showing that regularizing only the iterate, as in OMD, is not enough to guarantee sublinear regret in our setting.
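The algorithmic skeleton that equips OMD with a weight solver can be pictured as follows. This is a minimal sketch with the squared-Euclidean mirror map (so the mirror step reduces to projected gradient descent) on an assumed L2-ball decision set; the function names and setup are illustrative, not the paper's implementation, and only the plugged-in weight solver distinguishes vanilla min-norm from the regularized variant.

```python
import numpy as np

def omd_round(x, grads, weight_solver, eta, radius=1.0):
    """One round of OMD with gradient composition (illustrative sketch).

    `weight_solver` maps the per-objective gradients to simplex weights;
    with the squared-Euclidean mirror map, the mirror step is a plain
    gradient step followed by projection onto an L2 ball.
    """
    lams = weight_solver(grads)                    # composite weights on the simplex
    g = sum(l * gi for l, gi in zip(lams, grads))  # composite gradient
    x_new = x - eta * g                            # (mirror) descent step
    norm = np.linalg.norm(x_new)
    return x_new if norm <= radius else x_new * (radius / norm)

# Uniform weights stand in for a concrete solver in this toy round.
uniform = lambda grads: [1.0 / len(grads)] * len(grads)
x = omd_round(np.zeros(2), [np.array([1.0, 0.0]), np.array([0.0, 1.0])],
              uniform, eta=0.1)                    # -> [-0.05, -0.05]
```

The failure mode described above lives entirely inside `weight_solver`: with vanilla min-norm, adversarial gradients can swing the weights arbitrarily between rounds.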
To fix this issue, we devise a novel min-regularized-norm solver with an explicit regularization on the composite weights. Equipping OMD with this solver yields our proposed algorithm. In theory, we derive a regret bound of O(√T) for DR-OMMD, which matches the optimal bound in the single-objective setting (Hazan et al., 2016) and is tight w.r.t. the number of objectives. Our analysis also shows that DR-OMMD attains a smaller regret bound than linearization with fixed composite weights. We show that, in the two-objective setting with linear losses, the margin between the regret bounds depends on the difference between the composite weights yielded by the two algorithms and the difference between the gradients of the two underlying objectives. To evaluate the effectiveness of DR-OMMD, we conduct extensive experiments on several large-scale real-world datasets. We first realize adaptive regularization via multi-objective optimization, and find that adaptive regularization with DR-OMMD significantly outperforms fixed regularization with linearization, which verifies the effectiveness of DR-OMMD over linearization in the convex setting. We then apply DR-OMMD to deep online multi-task learning; the results show that DR-OMMD is also effective in the non-convex setting.
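One way to picture a solver that regularizes the composite weights, in the two-objective case, is to add a proximal penalty anchoring the weights to those of the previous round, so adversarial gradients cannot swing them arbitrarily. The exact min-regularized-norm formulation is given later in the paper; the closed form below is a hypothetical illustration of the idea (the penalty form, the anchor `lam_prev`, and the strength `alpha` are all assumptions), not the paper's definition.

```python
import numpy as np

def min_reg_norm_two(g1, g2, lam_prev, alpha):
    """Hypothetical two-objective regularized min-norm weight.

    Minimizes ||lam*g1 + (1-lam)*g2||^2 + alpha*||(lam, 1-lam) -
    (lam_prev, 1-lam_prev)||^2 over lam in [0, 1]; the proximal term
    keeps the weights near the previous round's weights.
    """
    diff = g1 - g2
    num = float((g2 - g1) @ g2) + 2.0 * alpha * lam_prev
    den = float(diff @ diff) + 2.0 * alpha
    return min(max(num / den, 0.0), 1.0)

# As alpha -> 0 we recover plain min-norm; as alpha grows, the weights
# stay close to lam_prev even under adversarial gradients.
g1, g2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
lam_free = min_reg_norm_two(g1, g2, lam_prev=0.9, alpha=0.0)    # 0.5 (min-norm)
lam_reg = min_reg_norm_two(g1, g2, lam_prev=0.9, alpha=10.0)    # ~0.86, near 0.9
```

The point of the sketch is the interpolation: the penalty trades off the common-descent property of min-norm against stability of the weights across rounds.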

