LEARNING MULTIOBJECTIVE PROGRAM THROUGH ONLINE LEARNING

Abstract

We investigate the problem of learning the parameters (i.e., objective functions or constraints) of a multiobjective decision making model from a set of sequentially arriving decisions. In particular, these decisions might not be exact: they may carry measurement noise or be generated under the bounded rationality of decision makers. In this paper, we propose a general online learning framework for this learning problem based on inverse multiobjective optimization, and prove that this framework converges at a rate of O(1/√T) under certain regularity conditions. More precisely, we develop two online learning algorithms with implicit update rules that can handle noisy data. Numerical results on both synthetic and real-world datasets show that both algorithms learn the parameters of a multiobjective program with high accuracy and are robust to noise.

1. INTRODUCTION

In this paper, we aim to learn the parameters (i.e., constraints and a set of objective functions) of a decision making problem with multiple objectives, instead of solving for its efficient (or Pareto) optimal solutions, which is the typical scenario. More precisely, we seek to learn θ given observations {y_i}_{i∈[N]} of efficient solutions of the multiobjective optimization problem (MOP): min_x {f_1(x, θ), f_2(x, θ), ..., f_p(x, θ)} s.t. x ∈ X(θ), where θ is the true but unknown parameter of the MOP. In particular, we consider such learning problems in an online fashion, noting that observations are unveiled sequentially in practical scenarios. Specifically, we cast this learning problem as an inverse multiobjective optimization problem (IMOP) that deals with noisy data, develop online learning algorithms to derive the parameters of each objective function and constraint, and finally output an estimate of the distribution of weights (which, together with the objective functions, define individuals' utility functions) among human subjects. Learning human participants' decision making schemes is critical for an organization in designing and providing services or products. Nevertheless, in most scenarios we can only observe their decisions or behaviors and cannot directly access their decision making schemes. Indeed, participants probably do not have exact information regarding their own decision making process (Keshavarz et al., 2011). To bridge this discrepancy, we leverage the idea of inverse optimization, which has been proposed and has received significant attention in the optimization community: inferring the missing information of the underlying decision model from observed data, assuming that human decision makers make optimal decisions (Ahuja & Orlin, 2001; Iyengar & Kang, 2005; Schaefer, 2009; Wang, 2009; Keshavarz et al., 2011; Chan et al., 2014; Bertsimas et al., 2015; Aswani et al., 2018; Esfahani et al., 2018; Tan et al., 2020).
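As a concrete toy instance of this setup, consider a bi-objective problem whose efficient solutions can be written in closed form. The parameters a and b below are hypothetical stand-ins for the unknown θ; the sketch only illustrates how the noisy observations {y_i} in the text could be generated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true parameters of a toy bi-objective problem
#   min_x { ||x - a||^2 , ||x - b||^2 }   over x in R^2.
a = np.array([0.0, 0.0])
b = np.array([1.0, 1.0])

def efficient_solution(lam):
    """Minimizer of the weighted sum lam*f1 + (1-lam)*f2.

    Setting the gradient 2*lam*(x - a) + 2*(1-lam)*(x - b) to zero
    gives x = lam*a + (1-lam)*b, so the Pareto set is the segment
    between a and b.
    """
    return lam * a + (1 - lam) * b

# Observations: efficient solutions for random weights, plus
# measurement noise (the "noisy decisions" discussed in the text).
weights = rng.uniform(0.0, 1.0, size=100)
noise = 0.05 * rng.standard_normal((100, 2))
observations = np.array([efficient_solution(l) for l in weights]) + noise
```

The learning problem studied in the paper is the reverse direction: recovering the parameters (here a and b) from such noisy observations.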
This subject embodies the data-driven concept and becomes more applicable as large amounts of data are generated and become readily available, especially from digital devices and online transactions. Nevertheless, a particular challenge, almost unavoidable for any large dataset, is that the data could be inconsistent due to measurement errors or decision makers' sub-optimality. To address this challenge, the assumption of the observations' optimality is weakened to integrate those noisy data, and the KKT conditions or strong duality are relaxed to incorporate inexactness. Our work is most related to the subject of inverse multiobjective optimization, whose goal is to find multiple objective functions or constraints that explain the observed efficient solutions well. There are several recent studies related to the presented research. One is Chan et al. (2014), which considers a single observation that is assumed to be an exact optimal solution; given a set of well-defined linear functions, an inverse optimization problem is formulated to learn their weights. Another is Dong & Zeng (2020), which proposes a batch learning framework to infer utility functions or constraints from multiple noisy decisions through inverse multiobjective optimization; that work can be categorized as inverse multiobjective optimization in a batch setting. Recently, Dong & Zeng (2021) extended Dong & Zeng (2020) with distributionally robust optimization by leveraging the prominent Wasserstein metric. In contrast, we perform inverse multiobjective optimization in an online setting, and the proposed online learning algorithms significantly accelerate the learning process with performance guarantees, allowing us to deal with more realistic and complex preference inference problems. Also related to our work is the line of research by Bärmann et al. (2017) and Dong et al. (2018), which develops online learning methods to infer a utility function or constraints from sequentially arriving observations. However, their approaches can only handle inverse optimization with a single objective; more specifically, their methods apply to situations where observations are generated by decision making problems with only one objective function. In contrast, our approach does not make the single-objective assumption and only requires convexity of the underlying decision making problem with multiple objectives. Hence, we believe that our work generalizes their methods and extends the applicability of online learning from learning a single-objective program to learning a multiobjective program.
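The implicit-update idea behind such online methods can be sketched on a toy one-dimensional problem: at each round the new estimate solves a small proximal problem that balances staying close to the current estimate against fitting the newly observed decision. The squared loss, step-size schedule, and data model below are illustrative assumptions, not the algorithms developed in this paper.

```python
import numpy as np

rng = np.random.default_rng(1)

theta_star = 2.0   # unknown parameter generating the decisions
theta = 0.0        # initial estimate

for t in range(2000):
    y = theta_star + 0.1 * rng.standard_normal()  # noisy decision at round t
    eta = 1.0 / np.sqrt(t + 1)                    # decaying step size
    # Implicit (proximal) update with squared loss l(z; y) = (z - y)^2 / 2:
    #   theta <- argmin_z  0.5*(z - theta)^2 + eta * l(z; y),
    # whose first-order condition (z - theta) + eta*(z - y) = 0 gives:
    theta = (theta + eta * y) / (1 + eta)
```

Because the loss is evaluated at the *new* iterate rather than the current one, the update is stable even for large step sizes, which is one motivation for implicit rules when decisions are noisy.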

1.2. OUR CONTRIBUTIONS

To the best of our knowledge, we propose the first general online learning framework for inferring decision makers' objective functions or constraints using inverse multiobjective optimization. This framework can learn the parameters of any convex decision making problem and can explicitly handle noisy decisions. Moreover, we show that the online learning approach, which adopts an implicit update rule, achieves O(√T) regret under suitable regularity conditions when using the ideal loss function. We finally illustrate the performance of the two algorithms on both a multiobjective quadratic programming problem and a portfolio optimization problem. Results show that both algorithms learn the parameters with high accuracy and are robust to noise, while the second algorithm significantly accelerates the learning process over the first.

2.1. DECISION MAKING PROBLEM WITH MULTIPLE OBJECTIVES

We consider a family of parametrized multiobjective decision making problems of the form

min_{x∈R^n} {f_1(x, θ), f_2(x, θ), ..., f_p(x, θ)} s.t. x ∈ X(θ), (DMP)

where p ≥ 2 and f_l(x, θ) : R^n × R^{n_θ} → R for each l ∈ [p]. Assume the parameter θ ∈ Θ ⊆ R^{n_θ}. We denote the vector of objective functions by f(x, θ) = (f_1(x, θ), f_2(x, θ), ..., f_p(x, θ))^T. Assume X(θ) = {x ∈ R^n : g(x, θ) ≤ 0, x ∈ R^n_+}, where g(x, θ) = (g_1(x, θ), ..., g_q(x, θ))^T is another vector-valued function with g_k(x, θ) : R^n × R^{n_θ} → R for each k ∈ [q].
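A small numerical instance of (DMP) may help fix ideas. The sketch below assumes a hypothetical parametrization with p = 2 quadratic objectives and a linear budget constraint, and computes one efficient solution by minimizing a weighted sum of the objectives over X(θ); the specific functions and the use of SciPy's SLSQP solver are illustrative choices, not part of the paper's formulation.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical instance of (DMP) with p = 2, n = 2:
#   f1(x, th) = (x1 - th1)^2 + x2^2
#   f2(x, th) = x1^2 + (x2 - th2)^2
#   X(th)     = { x >= 0 : x1 + x2 <= th3 }
theta = np.array([1.0, 1.0, 2.0])

def weighted_sum_solution(w, theta):
    """Return an efficient solution of the toy (DMP) by minimizing
    the scalarization w1*f1 + w2*f2 over X(theta)."""
    f = lambda x: (w[0] * ((x[0] - theta[0])**2 + x[1]**2)
                   + w[1] * (x[0]**2 + (x[1] - theta[1])**2))
    cons = [{"type": "ineq", "fun": lambda x: theta[2] - x[0] - x[1]}]
    res = minimize(f, x0=np.zeros(2), bounds=[(0, None)] * 2,
                   constraints=cons, method="SLSQP")
    return res.x

x_eff = weighted_sum_solution(np.array([0.5, 0.5]), theta)
```

For equal weights the unconstrained stationarity conditions give x = (θ1/2, θ2/2) = (0.5, 0.5), which satisfies the budget constraint, so the solver recovers that point.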



1.1 RELATED WORK

Our work draws inspiration from inverse optimization with a single objective, which seeks particular values for the parameters such that the difference between the actual observation and the expected solution of the optimization model (populated with those inferred values) is minimized. Although complicated, an inverse optimization model can often be simplified for computation by using the KKT conditions or strong duality of the decision making model, provided that it is convex. Extending from its initial form that considers only a single observation (Ahuja & Orlin, 2001; Iyengar & Kang, 2005; Schaefer, 2009; Wang, 2009), inverse optimization has been further developed and applied to handle many observations (Keshavarz et al., 2011; Bertsimas et al., 2015; Aswani et al., 2018; Esfahani et al., 2018).
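The KKT-based simplification mentioned above can be sketched on a toy single-objective instance: for an unconstrained quadratic program min_x 0.5·x'Qx − c'x with known Q, stationarity reads Qx = c, and relaxing it to a least-squares residual over noisy observed decisions yields a closed-form estimate of c. The problem data below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Forward model (known Q, unknown c):  min_x  0.5*x'Qx - c'x,
# whose KKT stationarity condition is  Qx = c.
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])
c_true = np.array([1.0, -0.5])
x_opt = np.linalg.solve(Q, c_true)

# Noisy observed decisions scattered around the true optimum.
Y = x_opt + 0.05 * rng.standard_normal((200, 2))

# Inverse optimization with relaxed KKT conditions: choose c
# minimizing the total stationarity residual sum_i ||Q y_i - c||^2,
# whose least-squares minimizer is the mean of the Q y_i.
c_hat = (Y @ Q.T).mean(axis=0)
```

Replacing exact optimality with a residual to be minimized is precisely how the works cited above accommodate inexact observations.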

