MULTI-OBJECTIVE ONLINE LEARNING

Abstract

This paper presents a systematic study of multi-objective online learning. We first formulate the framework of Multi-Objective Online Convex Optimization, which encompasses a novel multi-objective regret. This regret is built upon a sequence-wise extension of the Pareto suboptimality gap, a discrepancy metric commonly used in zero-order multi-objective bandits. We then derive an equivalent form of the regret, making it amenable to optimization via first-order iterative methods. To motivate the algorithm design, we give an explicit example in which equipping Online Mirror Descent (OMD) with the vanilla min-norm solver for gradient composition incurs linear regret, which shows that merely regularizing the iterates, as in single-objective online learning, is not enough to guarantee sublinear regret in the multi-objective setting. To resolve this issue, we propose a novel min-regularized-norm solver that regularizes the composite weights. Combining min-regularized-norm with OMD yields the Doubly Regularized Online Mirror Multiple Descent algorithm. We further derive the multi-objective regret bound for the proposed algorithm, which matches the optimal bound in the single-objective setting. Extensive experiments on several real-world datasets verify the effectiveness of the proposed algorithm.

1. INTRODUCTION

Traditional optimization methods for machine learning are usually designed to optimize a single objective. However, in many real-world applications, we are often required to optimize multiple correlated objectives concurrently. For example, in autonomous driving (Huang et al., 2019; Lu et al., 2019b), self-driving vehicles need to solve multiple tasks such as self-localization and object identification at the same time. In online advertising (Ma et al., 2018a; b), advertising systems need to decide on the exposure of items to different users to maximize both the Click-Through Rate (CTR) and the Post-Click Conversion Rate (CVR). In most multi-objective scenarios, the objectives may conflict with each other (Kendall et al., 2018). Hence, there may not exist any single solution that optimizes all the objectives simultaneously. For example, merely optimizing CTR or CVR will degrade the performance of the other (Ma et al., 2018a; b). Multi-objective optimization (MOO) (Marler & Arora, 2004; Deb, 2014) is concerned with optimizing multiple conflicting objectives simultaneously. It seeks Pareto optimality, where no single objective can be improved without hurting the performance of the others. Many different methods for MOO have been proposed, including evolutionary methods (Murata et al., 1995; Zitzler & Thiele, 1999), scalarization methods (Fliege & Svaiter, 2000), and gradient-based iterative methods (Désidéri, 2012). Recently, the Multiple Gradient Descent Algorithm (MGDA) and its variants have been introduced to the training of multi-task deep neural networks and achieved great empirical success (Sener & Koltun, 2018), renewing significant research interest in this family of methods (Lin et al., 2019; Yu et al., 2020; Liu et al., 2021). These methods compute a composite gradient based on
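To make the min-norm gradient composition underlying MGDA-style methods concrete, the following is a minimal sketch of the two-objective special case, where the optimal weight has a closed form. This illustrates the vanilla min-norm solver mentioned in the abstract, not the paper's proposed min-regularized-norm solver; the function name is my own.

```python
import numpy as np

def min_norm_weight(g1, g2):
    """Closed-form solution of min_a ||a*g1 + (1-a)*g2||^2 over a in [0, 1].

    Two-objective special case of the min-norm problem solved by
    MGDA-style gradient composition (cf. Sener & Koltun, 2018).
    """
    diff = g1 - g2
    denom = np.dot(diff, diff)
    if denom == 0.0:  # gradients coincide; any weight gives the same composite
        return 0.5
    alpha = np.dot(g2 - g1, g2) / denom
    return float(np.clip(alpha, 0.0, 1.0))  # project onto the simplex [0, 1]

# Two orthogonal unit gradients: the min-norm composite averages them,
# yielding a common descent direction for both objectives.
g1, g2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
a = min_norm_weight(g1, g2)
composite = a * g1 + (1 - a) * g2
print(a, composite)  # 0.5 [0.5 0.5]
```

With more than two objectives the same problem becomes a quadratic program over the probability simplex, typically solved by Frank-Wolfe-style iterations.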

