ROBUST LEARNING FOR CONGESTION-AWARE ROUTING

Abstract

We consider the problem of routing users through a network with unknown congestion functions over an infinite time horizon. On each time step $t$, the algorithm receives a routing request and must select a valid path. For each edge $e$ in the selected path, the algorithm incurs a cost $c_e^t = f_e(x_e^t) + \eta_e^t$, where $x_e^t$ is the flow on edge $e$ at time $t$, $f_e$ is the congestion function, and $\eta_e^t$ is a noise sample drawn from an unknown distribution. The algorithm observes $c_e^t$ and can use this observation in future routing decisions. The routing requests are supplied adversarially. We present an algorithm with cumulative regret $\tilde{O}(|E| t^{2/3})$, where the regret on each time step is defined as the difference between the total cost incurred by our chosen path and the minimum cost among all valid paths. Our algorithm has space complexity $O(|E| t^{1/3})$ and time complexity $O(|E| \log t)$. We also validate our algorithm empirically using graphs from New York City road networks.

1. INTRODUCTION

Modern navigation applications such as Google Maps and Apple Maps are critical tools in large-scale mobility solutions that route billions of users from their source to their destination. In order to be effective, the application should be able to accurately estimate the time required to traverse each road segment (edge) along the user's route in the road network (graph): we call this the cost of the edge. In general, the cost of an edge also depends on the current traffic level (the flow) on the edge. Furthermore, costs may scale differently on different edges: a highway can tolerate more traffic than a small residential street. We model this using congestion functions that map traffic flows to edge costs.

Usually, cost information is not readily available to the routing engine and can only be inferred indirectly. For example, this can be done using location pings from vehicles, or with loop detectors that record when vehicles cross a particular marker. All realistic methods of measuring the cost of an edge (such as those above) require the presence of a vehicle that reports this information back to the routing platform. In this paper, we assume that whenever a vehicle traverses an edge, we observe the time spent on the edge. We can then use this observation in future routing decisions. This induces a natural exploration/exploitation trade-off: we may wish to send vehicles on underexplored routes, even if those routes currently seem suboptimal.

In this paper, we propose a learning model for congestion-aware routing, and present an algorithm which seeks to minimize the total driving time across all vehicles. Our algorithm applies to arbitrary networks and arbitrary (Lipschitz-continuous) congestion functions, even when observations are noisy.
The algorithm is also robust to changes in traffic conditions in a strong sense: we show that even when request endpoints are chosen adversarially and even when traffic congestion on the edges is updated adversarially between requests, our algorithm learns an optimal routing policy.

1.1. MODEL

Consider a directed graph $(V, E)$. Each edge $e$ has a deterministic and fixed (but unknown) congestion function $f_e : \mathbb{R}_{\geq 0} \to \mathbb{R}_{\geq 0}$. We assume each $f_e$ is $L$-Lipschitz continuous and nondecreasing, with $L$ known. For simplicity, we also assume that $f_e(0) = 0$ for all $e \in E$ (if not, we can simply translate and extend the function appropriately). We consider an infinite horizon of time steps, starting from $t = 1$. At each time $t$, a new car arrives. An adversary tells us the current amount of flow on each edge, and the source and destination of the new car. Let $x_e^t$ be the flow on edge $e$ at time $t$ and let $P_t$ be the set of paths between the source and destination for the time-$t$ arrival. Let $x_{\max}^t = \max_{e,\, r \leq t} x_e^r$ denote the maximum flow on any edge up to time $t$. We must choose how to route the car, i.e., we must select a path $p_t \in P_t$. Based on our choice of $p_t$, we incur a cost determined by the flows and the congestion functions on the edges of our chosen path:

$$c_t = \sum_{e \in p_t} f_e(x_e^t).$$

For each $e \in p_t$, we observe $c_e^t = f_e(x_e^t) + \eta_e^t$, where $\eta_e^t$ is a random variable with expectation 0. The distribution of $\eta_e^t$ is unknown, and can vary between edges and time steps. The distributions can be correlated across edges, as long as for a given edge, all the individual samples (i.e., $\eta_e^1, \eta_e^2, \ldots$) are independent. We assume that there exists $\beta$ such that $\eta_e^t \in [-\beta/2, \beta/2]$ for all edges $e$ and times $t$, and that $\beta$ (or an upper bound on $\beta$) is known. The optimal cost at time $t$ is

$$c_t^* = \min_{p \in P_t} \sum_{e \in p} f_e(x_e^t),$$

so the regret of our algorithm over the first $t$ time steps is

$$R_t = \sum_{r=1}^{t} \mathbb{E}[c_r - c_r^*],$$

where the expectation is over the randomness in the noise samples. Note also that we do not include the noise we observe in the objective function, since the noise has expectation 0.
Any algorithm with sublinear regret can be said to learn an optimal routing policy: if $R_t = o(t)$, then $\mathbb{E}[c_t - c_t^*]$ must shrink as $t$ goes to infinity, i.e., the difference between our algorithm's cost and the optimal cost must go to 0 on average.¹ In this paper, we give an algorithm with regret $\tilde{O}(t^{2/3})$.
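To make the model concrete, the following sketch simulates the bookkeeping of a single time step: the incurred (noiseless) cost of the chosen path, the noisy per-edge observations, and the per-step regret against the best path. The helper name `step_regret`, the linear congestion functions, and the toy two-edge instance are all hypothetical illustrations, not the paper's algorithm.

```python
import random

def step_regret(congestion, flows, paths, chosen, beta=0.1):
    """congestion: dict edge -> f_e; flows: dict edge -> x_e^t;
    paths: the candidate set P_t; chosen: our selected path p_t."""
    def cost(path):
        # Noiseless cost: sum over e in p of f_e(x_e^t).
        return sum(congestion[e](flows[e]) for e in path)

    # Noisy per-edge observations c_e^t = f_e(x_e^t) + eta_e^t,
    # with eta_e^t bounded in [-beta/2, beta/2].
    observations = {
        e: congestion[e](flows[e]) + random.uniform(-beta / 2, beta / 2)
        for e in chosen
    }
    c_t = cost(chosen)
    c_star = min(cost(p) for p in paths)  # optimal cost at time t
    return c_t - c_star, observations

# Toy instance: two parallel edges between the same endpoints.
# Both congestion functions are L-Lipschitz with f(0) = 0, as required.
f = {"e1": lambda x: 2.0 * x, "e2": lambda x: 0.5 * x}
x = {"e1": 1.0, "e2": 1.0}
regret, obs = step_regret(f, x, paths=[["e1"], ["e2"]], chosen=["e1"])
print(regret)  # 2.0 - 0.5 = 1.5
```

A learning algorithm only ever sees the `observations` dictionary, never the congestion functions themselves; the regret is computable here only because the toy instance is fully specified.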

1.2. OUR CONTRIBUTION

Our main result is the following theorem:

Theorem 1.1. The expected regret of Algorithm 1 after $t$ time steps is

$$R_t = O\left(t^{2/3} \cdot |E| \log x_{\max}^t \left(\beta \log t + x_{\max}^t L\right)\right).$$

The space complexity and time complexity on time step $t$ are $O(|E| t^{1/3} \log x_{\max}^t)$ and $O\left(|E|(\log t + \log\log x_{\max}^t) + \mathrm{SP}(|E|, |V|)\right)$, respectively. Here $\mathrm{SP}(|E|, |V|)$ denotes the time complexity of computing the shortest path between two vertices in a graph with nonnegative weights; for example, this can be done by Dijkstra's algorithm in time $O(|E| + |V| \log |V|)$.

We also validate our algorithm's performance using graphs from New York City road networks.
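The shortest-path subroutine $\mathrm{SP}(|E|, |V|)$ can be instantiated as a standard Dijkstra search; a minimal binary-heap version is sketched below. (The $O(|E| + |V| \log |V|)$ bound quoted above requires a Fibonacci heap; the binary-heap variant runs in $O((|E| + |V|) \log |V|)$, which is typically sufficient in practice. The function name and graph representation are our own illustration.)

```python
import heapq

def dijkstra(adj, source, target):
    """adj: dict u -> list of (v, nonnegative weight).
    Returns (distance, path) from source to target."""
    dist = {source: 0.0}
    prev = {}
    pq = [(0.0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(pq, (nd, v))
    # Reconstruct the path by walking predecessors back from the target.
    path, node = [], target
    while node != source:
        path.append(node)
        node = prev[node]
    path.append(source)
    return dist[target], path[::-1]

adj = {"s": [("a", 1.0), ("b", 4.0)],
       "a": [("b", 1.0), ("t", 5.0)],
       "b": [("t", 1.0)]}
d, p = dijkstra(adj, "s", "t")
print(d, p)  # 3.0 ['s', 'a', 'b', 't']
```

In the routing setting, the edge weights passed to this subroutine would be the algorithm's current cost estimates rather than the true congestion values.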

1.3. RELATED WORK

Comparison with multi-armed bandits. Perhaps the simplest model of exploration vs. exploitation is the classical multi-armed bandit (MAB) problem (Slivkins, 2019). A MAB instance consists of $n$ arms, each with an unknown but fixed reward distribution. At each time step, the algorithm selects an arm and observes a reward drawn randomly from that arm's distribution. Several algorithms obtaining regret $O(\sqrt{t})$ are known, and this is also known to be the best possible, up to logarithmic factors.
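For illustration of the MAB baseline (not the paper's routing algorithm), a standard UCB1-style strategy plays each arm once and then picks the arm maximizing its empirical mean plus a confidence bonus; the function name and the toy arms below are hypothetical.

```python
import math
import random

def ucb1(reward_fns, horizon):
    """reward_fns: list of zero-argument functions returning rewards in [0, 1]."""
    n = len(reward_fns)
    counts = [0] * n    # number of pulls per arm
    sums = [0.0] * n    # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= n:
            arm = t - 1  # initialization: pull every arm once
        else:
            # Empirical mean plus the UCB1 confidence bonus.
            arm = max(range(n),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        r = reward_fns[arm]()
        counts[arm] += 1
        sums[arm] += r
    return counts

random.seed(0)
arms = [lambda: random.random() * 0.4,   # mean reward 0.2
        lambda: random.random() * 0.9]   # mean reward 0.45
counts = ucb1(arms, 2000)
print(counts)  # the better arm should be pulled far more often
```

The routing problem studied here is harder than a vanilla MAB: the "arms" are paths whose costs are coupled through shared edges and depend on adversarially chosen flows, which is why the regret guarantee is $\tilde{O}(t^{2/3})$ rather than $O(\sqrt{t})$.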



¹ Note that $\mathbb{E}[c_t - c_t^*]$ may not monotonically decrease: a sublinear-regret algorithm can still incur large errors sporadically, as long as the frequency of the large errors goes to 0.




