LEARNING MULTIOBJECTIVE PROGRAM THROUGH ONLINE LEARNING

Abstract

We investigate the problem of learning the parameters (i.e., objective functions or constraints) of a multiobjective decision making model from a set of sequentially arriving decisions. These decisions might not be exact: they may carry measurement noise or reflect the bounded rationality of decision makers. We propose a general online learning framework that addresses this learning problem using inverse multiobjective optimization, and prove that the framework converges at a rate of $O(1/\sqrt{T})$ under certain regularity conditions. More precisely, we develop two online learning algorithms with implicit update rules that can handle noisy data. Numerical results on both synthetic and real-world datasets show that both algorithms can learn the parameters of a multiobjective program with great accuracy and are robust to noise.

1. INTRODUCTION

In this paper, we aim to learn the parameters (i.e., constraints and a set of objective functions) of a decision making problem with multiple objectives, rather than to solve for its efficient (or Pareto optimal) solutions, which is the typical scenario. More precisely, we seek to learn $\theta$ given $\{y_i\}_{i \in [N]}$, which are observations of the efficient solutions of the multiobjective optimization problem (MOP)

$$\min_{x} \ \{f_1(x, \theta), f_2(x, \theta), \ldots, f_p(x, \theta)\} \quad \text{s.t.} \quad x \in X(\theta),$$

where $\theta$ is the true but unknown parameter of the MOP. In particular, we consider such learning problems in an online fashion, since observations are unveiled sequentially in practical scenarios. Specifically, we study this learning problem as an inverse multiobjective optimization problem (IMOP) with noisy data, develop online learning algorithms to derive the parameters of each objective function and constraint, and finally output an estimate of the distribution of weights (which, together with the objective functions, define individuals' utility functions) among human subjects.

Learning human participants' decision making schemes is critical for an organization in designing and providing services or products. Nevertheless, in most scenarios we can only observe their decisions or behaviors and cannot directly access their decision making schemes. Indeed, participants probably do not have exact information regarding their own decision making process (Keshavarz et al., 2011). To bridge this gap, we leverage the idea of inverse optimization, which has received significant attention in the optimization community: infer the missing information of the underlying decision models from observed data, assuming that human decision makers are making optimal decisions (Ahuja & Orlin, 2001; Iyengar & Kang, 2005; Schaefer, 2009; Wang, 2009; Keshavarz et al., 2011; Chan et al., 2014; Bertsimas et al., 2015; Aswani et al., 2018; Esfahani et al., 2018; Tan et al., 2020). This subject carries the data-driven concept and becomes more applicable as large amounts of data are generated and become readily available, especially from digital devices and online transactions.
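To make the setting concrete, the following minimal sketch illustrates the MOP and the role of the weights on a toy problem. Everything in it is an assumption made for illustration and is not taken from the paper: the scalar bi-objective program $f_1(x, \theta) = (x - \theta_1)^2$, $f_2(x, \theta) = (x - \theta_2)^2$ with $X(\theta) = \mathbb{R}$, the weighted-sum scalarization used to generate efficient solutions, and the hypothetical helper names `efficient_solution`, `sample_efficient_set`, `loss`, and `total_loss`. The sketch only shows how a candidate $\theta$ can be scored against observed decisions; it is not one of the online algorithms developed in this paper.

```python
# Illustrative sketch only: a toy bi-objective program, not the paper's model.
# Assumed objectives: f1(x, theta) = (x - theta[0])**2,
#                     f2(x, theta) = (x - theta[1])**2, with X(theta) = R.


def efficient_solution(theta, w):
    """Minimizer of w*f1 + (1 - w)*f2 for a weight w in [0, 1].

    Setting the derivative 2*w*(x - a) + 2*(1 - w)*(x - b) to zero
    gives the closed form x = w*a + (1 - w)*b.
    """
    a, b = theta
    return w * a + (1.0 - w) * b


def sample_efficient_set(theta, num_weights=11):
    """Approximate the efficient set using evenly spaced weights."""
    return [
        efficient_solution(theta, k / (num_weights - 1))
        for k in range(num_weights)
    ]


def loss(theta, y, num_weights=11):
    """Squared distance from one observation y to the sampled efficient set."""
    return min((y - x) ** 2 for x in sample_efficient_set(theta, num_weights))


def total_loss(theta, observations, num_weights=11):
    """Sum of per-observation losses for a candidate parameter theta."""
    return sum(loss(theta, y, num_weights) for y in observations)


if __name__ == "__main__":
    observations = [1.0, 3.0]   # hypothetical observed decisions
    good_theta = (0.0, 4.0)     # candidate whose efficient set covers the data
    poor_theta = (1.0, 2.0)     # candidate whose efficient set misses y = 3.0
    print(total_loss(good_theta, observations, num_weights=5))
    print(total_loss(poor_theta, observations, num_weights=5))
```

In this toy setting, a candidate $\theta$ whose efficient set contains the observations incurs a smaller total loss than one whose efficient set misses them, which is the intuition behind inferring $\theta$ from observed decisions.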

1.1. RELATED WORK

Our work draws inspiration from the inverse optimization problem with a single objective, which seeks parameter values such that the difference between the actual observation and the expected solution to the optimization model (populated with those inferred values) is minimized. Although complicated, an inverse optimization model can often be simplified for computation through

