LEARNING FOR EDGE-WEIGHTED ONLINE BIPARTITE MATCHING WITH ROBUSTNESS GUARANTEES

Abstract

Many real-world problems, such as online ad display, can be formulated as online bipartite matching. The crucial challenge lies in the nature of sequentially-revealed online item information, based on which we make irreversible matching decisions at each step. While numerous expert online algorithms have been proposed with bounded worst-case competitive ratios, they may not offer satisfactory performance in average cases. On the other hand, reinforcement learning (RL) has been applied to improve the average performance, but they lack robustness and can perform arbitrarily badly. In this paper, we propose a novel RL-based approach to edgeweighted online bipartite matching with robustness guarantees (LOMAR), achieving both good average-case and good worst-case performance. The key novelty of LOMAR is a new online switching operation which, based on a judiciously-designed condition to hedge against future uncertainties, decides whether to follow the expert's decision or the RL decision for each online item arrival. We prove that for any ρ ∈ [0, 1], LOMAR is ρ-competitive against any given expert online algorithm. To improve the average performance, we train the RL policy by explicitly considering the online switching operation. Finally, we run empirical experiments to demonstrate the advantages of LOMAR compared to existing baselines.

1. INTRODUCTION

Online bipartite matching is a classic online problem of practical importance (Mehta, 2013; Kim & Moon, 2020; Fahrbach et al., 2020; Antoniadis et al., 2020b; Huang & Shu, 2021; Gupta & Roughgarden, 2020) . In a nutshell, online bipartite matching assigns online items to offline items in two separate sets: when an online item arrives, we need to match it to an offline item given applicable constraints (e.g., capacity constraint), with the goal of maximizing the total rewards collected (Mehta, 2013) . For example, numerous applications, including scheduling tasks to servers, displaying advertisements to online users, recommending articles/movies/products, among many others, can all be modeled as online bipartite matching or its variants. The practical importance, along with substantial algorithmic challenges, of online bipartite matching has received extensive attention in the last few decades (Karp et al., 1990; Fahrbach et al., 2020) . Concretely, many algorithms have been proposed and studied for various settings of online bipartite matching, ranging from simple yet effective greedy algorithms to sophisticated ranking-based algorithms (Karp et al., 1990; Kim & Moon, 2020; Fahrbach et al., 2020; Aggarwal et al., 2011; Devanur et al., 2013) . These expert algorithms typically have robustness guarantees in terms of the competitive ratio -the ratio of the total reward obtained by an online algorithm to the reward of another baseline algorithm (commonly the optimal offline algorithm) -even under adversarial settings given arbitrarily bad problem inputs (Karp et al., 1990; Huang & Shu, 2021) . In some settings, even the optimal competitive ratio for adversarial inputs has been derived (readers are referred to (Mehta, 2013) for an excellent tutorial). The abundance of competitive online algorithms has clearly demonstrated the importance of performance robustness in terms of the competitive ratio, especially in safety-sensitive applications such as matching mission-critical items or under contractual obligations (Fahrbach et al., 2020) . Nonetheless, as commonly known in the literature, the necessity of conservativeness to address the worst-case adversarial input means that the average performance is typically not optimal (see, e.g., (Christianson et al., 2022; Zeynali et al., 2021) for discussions in other general online problems). 1

