OFFLINE CONGESTION GAMES: HOW FEEDBACK TYPE AFFECTS DATA COVERAGE REQUIREMENT

Abstract

This paper investigates when one can efficiently recover an approximate Nash Equilibrium (NE) in offline congestion games. The existing dataset coverage assumption in offline general-sum games inevitably incurs a dependency on the number of actions, which can be exponentially large in congestion games. We consider three different types of feedback with decreasing revealed information. Starting from facility-level (a.k.a., semi-bandit) feedback, we propose a novel one-unit deviation coverage condition and show that a pessimism-type algorithm can recover an approximate NE. For the agent-level (a.k.a., bandit) feedback setting, interestingly, we show the one-unit deviation coverage condition is not sufficient. On the other hand, we convert the game to a multi-agent linear bandit problem and show that, under a generalized data coverage assumption from offline linear bandits, we can efficiently recover the approximate NE. Lastly, we consider a novel type of feedback, game-level feedback, where only the total reward from all agents is revealed. Again, we show the coverage assumption for the agent-level feedback setting is insufficient in the game-level feedback setting, and with a stronger version of the data coverage assumption for linear bandits, we can recover an approximate NE. Together, our results constitute the first study of offline congestion games and imply formal separations between different types of feedback.

1. INTRODUCTION

Congestion games are a special class of general-sum matrix games that model the interaction of players with shared facilities (Rosenthal, 1973). Each player chooses some facilities to utilize, and each facility incurs a different reward depending on how congested it is. For instance, in the routing game (Koutsoupias & Papadimitriou, 1999), each player chooses a path to travel from a starting point to a destination point in a traffic graph. The facilities are the edges, and the joint decision of all the players determines the congestion in the graph. The more players utilize one edge, the longer the travel time on that edge will be. As one of the most well-known classes of games, congestion games have been successfully deployed in numerous real-world applications such as resource allocation (Johari & Tsitsiklis, 2003), electrical grids (Ibars et al., 2010), and the cryptocurrency ecosystem (Altman et al., 2019). Nash equilibrium (NE), one of the most important concepts in game theory (Nash Jr, 1950), characterizes the emerging behavior in a multi-agent system with selfish players. It is commonly known that solving for the NE is computationally efficient in congestion games, as they are isomorphic to potential games (Monderer & Shapley, 1996). Assuming full information access, classic dynamics such as best response dynamics (Fanelli et al., 2008), replicator dynamics (Drighes et al., 2014), and no-regret dynamics (Kleinberg et al., 2009) provably converge to NE in congestion games. Recently, Heliou et al. (2017) and Cui et al. (2022) relaxed the full information setting to the online (semi-) bandit feedback setting, achieving asymptotic and non-asymptotic convergence, respectively. It is worth noting that Cui et al. (2022) proposed the first algorithm with sample complexity independent of the number of actions.

Offline reinforcement learning has been studied in many real-world applications (Levine et al., 2020).
From the theoretical perspective, a line of work provides an understanding of offline single-agent decision making, including bandits and Markov Decision Processes (MDPs), where researchers derived favorable sample complexity under single policy coverage (Rashidinejad et al., 2021; Xie et al., 2021b). However, how to learn in multi-agent games from offline data is still far from clear. Recently, the unilateral coverage assumption has been proposed as the minimal assumption for offline zero-sum games and offline general-sum games, with corresponding algorithms to learn the NE (Cui & Du, 2022a; b; Zhong et al., 2022). Though this coverage assumption and the accompanying algorithms apply to the most general class of normal-form games, when specialized to congestion games the sample complexity scales with the number of actions, which can be exponentially large. Since congestion games admit specific structures, one may hope to find specialized data coverage assumptions that permit sample-efficient offline learning.
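The convergence of best response dynamics mentioned above follows from the potential-game structure: sequential unilateral improvements must terminate at a pure NE. The following sketch illustrates this on a hypothetical toy congestion game (the player count, facilities, and reward numbers are invented for illustration and do not come from the paper).

```python
# Hypothetical toy congestion game: 3 players, 2 facilities.
# Each player picks exactly one facility; a facility's reward
# decreases with the number of players using it (congestion).
REWARD = {  # REWARD[facility][load], load = number of users
    0: {1: 10.0, 2: 6.0, 3: 3.0},
    1: {1: 8.0, 2: 5.0, 3: 2.0},
}
N_PLAYERS, FACILITIES = 3, (0, 1)

def payoff(profile, i):
    """Reward of player i under the joint action `profile`."""
    f = profile[i]
    load = sum(1 for a in profile if a == f)
    return REWARD[f][load]

def best_response_dynamics(profile):
    # Congestion games are potential games (Monderer & Shapley, 1996),
    # so sequential best responses reach a pure NE in finitely many steps.
    profile = list(profile)
    improved = True
    while improved:
        improved = False
        for i in range(N_PLAYERS):
            current = payoff(profile, i)
            for f in FACILITIES:
                candidate = profile[:i] + [f] + profile[i + 1:]
                if payoff(candidate, i) > current:
                    profile, improved = candidate, True
                    break
    return tuple(profile)

ne = best_response_dynamics((0, 0, 0))
# Verify: no unilateral deviation improves any player's payoff.
assert all(
    payoff(ne, i) >= payoff(ne[:i] + (f,) + ne[i + 1:], i)
    for i in range(N_PLAYERS) for f in FACILITIES
)
```

With full information this procedure is straightforward; the difficulty studied in this paper is that an offline dataset reveals only partial, noisy reward information under limited coverage.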

                  One-Unit Deviation   Weak Covariance Domination   Strong Covariance Domination
  Facility-Level  ✔                    ✔                            ✔
  Agent-Level     ✘                    ✔                            ✔
  Game-Level      ✘                    ✘                            ✔

Table 1: A summary of how data coverage assumptions affect offline learnability. A ✔ indicates that, under this pair of feedback type and assumption, an NE can be learned given a sufficient amount of data; a ✘ indicates that there exist instances in which an NE cannot be learned no matter how much data is collected.

In different applications, the type of feedback, i.e., the revealed reward information, in the offline dataset can differ. For instance, the dataset may include the reward of each facility, the reward of each player, or the total reward of the game. With decreasing information contained in the dataset, different coverage assumptions and algorithms are necessary. In addition, the main challenge in solving congestion games lies in the curse of the exponentially large action set, as the number of actions can be exponential in the number of facilities. In this work, we aim to answer the following question: When can we find an approximate NE in offline congestion games with different types of feedback, without suffering from the curse of the large action set? We provide an answer that reveals striking differences between different types of feedback.

1.1. MAIN CONTRIBUTIONS

We provide both positive and negative results for each type of feedback. See Table 1 for a summary.

1. Three types of feedback and corresponding data coverage assumptions. We consider three types of feedback: facility-level feedback, agent-level feedback, and game-level feedback, which model different real-world applications, and we study what dataset coverage assumptions permit finding an approximate NE. In offline general-sum games, Cui & Du (2022b) propose the unilateral coverage assumption. Although their result can be applied to offline congestion games with agent-level feedback, their unilateral coverage coefficient is at least as large as the number of actions and thus has an exponential dependence on the number of facilities. Therefore, for each type of feedback, we propose a corresponding data coverage assumption to escape the curse of the large action set. Specifically:

• Facility-Level Feedback: the reward incurred at each facility is provided in the offline dataset. This type of feedback carries the strongest signal. We propose the One-Unit Deviation coverage assumption (cf. Assumption 2) for this feedback.

• Agent-Level Feedback: only the sum of the chosen facilities' rewards for each agent is observed. This type of feedback carries a weaker signal than facility-level feedback does, and we therefore require a stronger data coverage assumption (cf. Assumption 4).

• Game-Level Feedback: only the sum of the agents' rewards is observed. This type of feedback carries the weakest signal, and we require the strongest data coverage assumption (cf. Assumption 5).

Notably, for the latter two types of feedback, we leverage the connections between congestion games and linear bandits.
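The three feedback granularities above can be made concrete with a small sketch of how a single logged joint action would appear under each. The facility names, base rewards, and the 1/load congestion rule below are hypothetical choices for illustration, not the paper's model.

```python
# Hypothetical illustration: one logged joint action, three feedback types.
def facility_rewards(profile):
    """profile[i] = set of facilities chosen by agent i.
    Toy reward model: a facility's base reward divided by its load."""
    BASE = {"e1": 6.0, "e2": 4.0, "e3": 2.0}  # invented numbers
    loads = {}
    for chosen in profile:
        for f in chosen:
            loads[f] = loads.get(f, 0) + 1
    return {f: BASE[f] / k for f, k in loads.items()}

profile = [{"e1", "e2"}, {"e2"}, {"e3"}]
r = facility_rewards(profile)

# Facility-level (semi-bandit): the reward of each chosen facility is logged.
facility_fb = [{f: r[f] for f in chosen} for chosen in profile]
# Agent-level (bandit): only each agent's summed reward is logged.
agent_fb = [sum(r[f] for f in chosen) for chosen in profile]
# Game-level: only the total reward across all agents is logged.
game_fb = sum(agent_fb)
```

Each level is a deterministic coarsening of the one above it, which is why progressively stronger coverage assumptions are needed as the dataset reveals less.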

