Fluid models of congestion collapse in overloaded switched networks

Devavrat Shah and Damon Wischik. Submitted [preprint pdf]

Abstract.

We consider a switched network (i.e. a queueing network in which there are constraints on which queues may be served simultaneously), in a state of overload. We analyse the behaviour of two scheduling algorithms for multihop switched networks: a generalized version of max-weight, and the α-fair policy. We show that queue sizes grow linearly with time, under either algorithm, and we characterize the growth rates. We use this characterization to demonstrate examples of congestion collapse, i.e. cases in which throughput drops as the switched network becomes more overloaded. We further show that the loss of throughput can be made arbitrarily small by the max-weight algorithm with weight function f(q) = qα as α→0.
In a data center, jobs may have to be served by several database machines, and the database machines may have a choice of which jobs to prioritize. Here is a simple example with three job classes, each of which earns a different revenue on completion, and one of which has to use two machines in tandem. The revenue-maximizing policy is for the database machines to prioritize £6 jobs, to let through as many £5 jobs as will get through, and to give the rest of service to £4 jobs. But this policy requires full knowledge of the network structure and of arrival rates. On the other hand, the greedy policy of prioritizing high-revenue jobs ends up wasting effort on £5 jobs which will just be dropped downstream. Is it possible to do better with a simple myopic algorithm?

The plot shows what happens when machines use the max-weight scheduling policy with weight function f(qi)=revenuei qiα. (This is a myopic policy.) The plot shows the rate at which revenue is earned, as a function of load. As load increases, revenue drops. But as α→0, revenue approaches optimal.

See section 4.1 of the paper. Detailed calculations [pdf]

An input-queued switch is the piece of silicon in the heart of high-end Internet routers. This diagram shows an input-queued switch with two inputs and two outputs. At each clock tick, the switch chooses which inputs to connect to which outputs, and it sends packets along the connections: it either serves q1 1 and q2 2, or it serves q1 2 and q2 1. How should it choose which of these to actions to perform, given the queue sizes?

The plot shows total throughput (i.e. total departure rate) as a function of load, for two scheduling algorithms: α-fair scheduling and max-weight scheduling. In both cases there is congestion collapse, i.e. throughput falls as load increases beyond critical. As α→0, throughput comes close to optimal.

See section 4.3 of the paper. Detailed calculations [pdf]