Large deviations and Internet congestion

If written five years ago, the title of this dissertation would probably have been Large Deviations and Queueing Theory. However, the Internet is one of the most important queueing networks there is today. This year it will be directly involved in hundreds of billions of dollars of economic activity, and each year its size more than doubles. It should be of interest to mathematicians because it raises interesting mathematical questions, and because good and timely answers to those questions can feed back into better design.

To give a sense of the scale and speed of the Internet, here are some figures, obtained from transmitting this dissertation across the Atlantic to www.wischik.com in July 1999. There are roughly 215000 characters to transmit, grouped into packets of 1500 characters. Each packet takes 50-60 milliseconds to reach its destination, and is forwarded through 20 different way-stations or routers. In the evening, when the Internet is lightly loaded, the entire transfer takes just under 2 seconds. In the afternoon, when there is congestion, it can take over 40 seconds. The reason it takes so long is that the source computer waits between sending packets, so as not to overload the network.
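
To put these figures together, here is a rough back-of-envelope calculation (mine, not the dissertation's), written as a short Python sketch:

    # Back-of-envelope arithmetic using the transfer figures quoted above.
    chars_total = 215000       # size of the dissertation, in characters
    chars_per_packet = 1500    # characters carried in each packet

    packets = chars_total / chars_per_packet   # about 143 packets in all

    # Effective transfer rate in the lightly loaded evening (about 2 seconds)
    # versus the congested afternoon (over 40 seconds).
    evening_rate = chars_total / 2.0      # roughly 107,500 characters per second
    afternoon_rate = chars_total / 40.0   # roughly 5,400 characters per second

    print(round(packets), round(evening_rate), round(afternoon_rate))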

Such variability is unacceptable even for something as simple as a telephone call. If the Internet is to evolve into a high-performance network, suitable for communication that is richer than simple file-transfers, we must understand how congestion arises and find ways to keep the network operating within its capacity. These are the topics of my dissertation, and my main tool is the mathematical theory of large deviations.

Internet Congestion

The basic building block of the Internet is the router, connected by cables to other routers and to users. Each packet of data from a user is labelled with its destination and sent out to the first router on its path; at a router, each incoming packet is examined and sent out on the appropriate cable, either to the next router in its path or to its final destination.

When too many packets arrive at a router they are queued until they can be processed. But a router only has enough buffer space for a limited number of packets to queue, and when the buffer is full further incoming packets are dropped, that is, discarded. An end-system will eventually detect the drop, and will typically respond by reducing its transmission rate (and by resending the dropped packet). The frequency of packet drops is thus the primary measure of congestion.
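
To make the mechanism concrete, here is a minimal sketch of such a drop-tail buffer (an illustration only, with hypothetical names, not a model of any particular router):

    from collections import deque

    class DropTailQueue:
        """A router's output queue with finite buffer space: packets arriving
        when the buffer is full are dropped."""

        def __init__(self, buffer_size):
            self.buffer = deque()
            self.buffer_size = buffer_size
            self.drops = 0

        def arrive(self, packet):
            if len(self.buffer) < self.buffer_size:
                self.buffer.append(packet)   # room left: queue the packet
            else:
                self.drops += 1              # buffer full: discard the packet

        def serve(self):
            """Forward one queued packet, if there is one, on the outgoing cable."""
            return self.buffer.popleft() if self.buffer else None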

For the past decade Internet congestion has been controlled in this way, relying on users' computers to detect congestion and to delay sending further packets. As the Internet becomes more commercially important, this consensus arrangement is likely to fail. Engineers have recently proposed new mechanisms for signalling congestion, and economists have begun to look at usage-sensitive pricing schemes. But without a clear mathematical understanding of the phenomenon of congestion, it is hard to see how these approaches can be reconciled.

Large Deviations

Since the Internet operates as a network of queues, the tools of queueing theory can be used to study it. This is not simply a matter of applying well-understood mathematical results, getting an answer, and rewording it to refer to the Internet. Rather, there is an ongoing development of the mathematics, driven by the particular needs of this application.

In this dissertation I develop the large deviations theory of queueing networks. Large deviations theory is a modern branch of probability, concerned with estimating the probabilities of rare events. This makes it well-suited to studying high-performance communications networks, in which dropping a packet should be a rare event.

More precisely, large deviations theory is concerned with limiting regimes. In queueing problems, while precise equations can be written down, it is very rare that they can be solved exactly: so instead one seeks limiting results. A typical result would be that for a router used by n traffic flows of a specified type, with capacity to serve Cn packets a millisecond, the probability of dropping a packet is roughly 10^(-θn), where θ can be calculated, and where the approximation is accurate in the limit as n tends to infinity. (In other words, its accuracy improves as the number of flows increases.)
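
In symbols, the estimate just described takes the following schematic form (my notation, which may differ in detail from that of the dissertation):

    \[
      \frac{1}{n}\,\log_{10} \mathbb{P}(\text{a packet is dropped}) \;\to\; -\theta
      \quad \text{as } n \to \infty,
      \qquad \text{so that } \mathbb{P}(\text{a packet is dropped}) \approx 10^{-\theta n}
      \text{ for large } n.
    \]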

Large deviations theory has been widely studied, and much work has been done on large deviations in queueing theory. What makes my work different is the limiting regime I look at. I consider the many flows limiting regime, exemplified above, in which the number of traffic flows increases. This limiting regime is well-suited to the Internet, which has many thousands of simultaneous traffic flows in its core.

Furthermore, this limiting regime yields significant new results in queueing theory. For example, I show that, under the many flows limiting regime, the essential characteristics of a flow of traffic do not change as it travels through a network. The limiting regime was prompted by the structure of the Internet, but the conclusion stands on its own as an important and unexpected result in queueing theory. In other areas to which queueing theory has been applied, such as job scheduling in factories, the many flows limiting regime is not appropriate and such clean results are hard to find.

My Results

My study of large deviations in queueing networks has three parts. In the first I use large deviations to model traffic coming into a router and to find the way in which the router's buffer overflows. In the second I examine how the characteristics of traffic are altered as it passes through a router, and from this obtain a model for traffic through the entire network. In the third I analyse the fairness and efficiency of algorithms for signalling and controlling congestion.

Modelling Traffic and Queues

Before one can begin to analyse congestion, one must be able to model traffic; and I do this using large deviations theory and the many flows limiting regime. This is the most mathematically involved part of my dissertation. But while my results are new, they are not inherently different to those that have been found before.

Others have already developed a comprehensive theory to describe large deviations of traffic flows, under the large buffer limiting regime, in which the buffer size of a router increases. Under the many flows limiting regime, however, there have only been isolated results: so I first develop a full theory for this regime. Formally speaking, I establish a Large Deviations Principle for random processes under the many flows regime.

This Principle gives estimates for the probability of any event associated with the aggregate of many traffic flows. For example, it can be used to estimate the probability that a router's buffer overflows, and to calculate the manner in which overflow is most likely to occur.
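
Schematically, a Large Deviations Principle of this kind says the following (stated in the standard form, which may differ in detail from the precise formulation in the dissertation):

    \[
      \mathbb{P}\Bigl(\tfrac{1}{n}(X_1 + \dots + X_n) \in B\Bigr)
      \;\approx\; \exp\Bigl(-n \inf_{x \in B} I(x)\Bigr),
    \]

where X_1, ..., X_n are the individual traffic flows, B is the event of interest (such as buffer overflow), and I is a rate function determined by the statistics of a single flow. The most likely manner of overflow corresponds to the point x in B at which the infimum is attained.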

I also compare results from the large buffer and many flows regimes. It turns out that large buffer results arise simply as a special case of the corresponding many flows results. Also, the many flows results are more applicable to flows which exhibit characteristics seen in real Internet traffic.

Networks of Queues

The preceding part describes how to model traffic as it enters a network—but what really matters is how traffic behaves as it travels through the network. In this part I prove the new and surprising result that the statistical characteristics of a flow of traffic are essentially unchanged as it passes through a router. This makes it meaningful to talk about the intrinsic characteristics of, say, video traffic as opposed to audio traffic: it is not necessary to consider either the routers a flow passes through, or the other flows it interacts with.
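
One way to render the result schematically (my rendering, in the LDP language introduced above, under the assumptions spelt out in the next two paragraphs):

    \[
      \text{input flow satisfies an LDP with rate function } I
      \;\Longrightarrow\;
      \text{output flow satisfies an LDP with the same rate function } I,
    \]

in the many flows limit. In other words, the characteristics governing a flow's large deviations behaviour are preserved from a router's input to its output.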

Earlier work on networks has reached different conclusions. The reason for the discrepancy is that these are all limiting results, and that earlier work considered other limiting regimes. For example, under the large buffer regime, the characteristics of a flow of traffic change along its route in ways which are complicated and do not lend themselves to general principles.

It is my choice of limiting regime that allows a much cleaner conclusion. I consider the many flows limiting regime, in which the number of flows of traffic increases, and I make the additional assumption that the different flows follow diverse routes through the network.

My result has a straightforward mathematical formulation, but its implications are significant. It dramatically simplifies the analysis of networks.

Signalling Congestion

While the results of the first two parts of the dissertation are framed with the Internet in mind, they apply in principle to any queueing network. In the last part, however, I address a question which has arisen specifically from the needs of the Internet engineering community: How should routers signal congestion to users? There are actually two parts to this question: what should the goals of a congestion-signalling algorithm be, and what sort of algorithm can achieve those goals?

Both of these questions have been looked at, though mainly in isolation: economists have considered the first, and engineers the second. But without a mathematical model of the phenomenon of congestion, economists cannot devise pricing structures to prevent it; and without a theory which explains how congestion occurs, engineers cannot analyse their algorithms. Only recently have mathematicians begun to study these issues. I address both questions, using the large deviations tools developed in the preceding parts.

First, I define what it means for a router to signal congestion fairly and efficiently. This involves a large deviations analysis of the impact of a user on the network. It also involves economic modelling of how users behave—economically speaking, a congestion-signalling algorithm has the same purposes as a pricing scheme: to convey information and to direct consumption.

I then analyse the performance of various congestion-signalling algorithms that have been proposed, including one which has been implemented in commercial routers. This is, as far as I am aware, the first theoretical analysis of these algorithms. It turns out that they are all unfair and economically inefficient. I go on to suggest improvements based on principles from large deviations theory.
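
The algorithm implemented in commercial routers is presumably of the RED (Random Early Detection) family, in which a router marks or drops arriving packets with a probability that grows with a smoothed estimate of its queue length. Here is a minimal sketch of a RED-style rule (an illustration with standard textbook parameters, not necessarily the variant analysed in the dissertation):

    import random

    class REDStyleSignaller:
        """Mark (or drop) arriving packets with a probability that increases
        with an exponentially weighted moving average of the queue length."""

        def __init__(self, min_th=5, max_th=15, max_p=0.1, weight=0.002):
            self.min_th = min_th    # below this average queue length, never mark
            self.max_th = max_th    # above this average queue length, always mark
            self.max_p = max_p      # marking probability just below max_th
            self.weight = weight    # smoothing weight for the queue average
            self.avg = 0.0

        def on_arrival(self, queue_length):
            """Return True if the arriving packet should carry a congestion signal."""
            self.avg = (1 - self.weight) * self.avg + self.weight * queue_length
            if self.avg < self.min_th:
                return False
            if self.avg >= self.max_th:
                return True
            p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
            return random.random() < p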

This part of the dissertation is more discursive, as it takes some effort to frame an appropriate mathematical question. Once that is done, the tools of queueing theory can be powerfully applied.

Relevance

The Internet raises interesting mathematical issues. Using a limiting regime suggested by the structure of the Internet, I have been able to prove a result which significantly simplifies the analysis of networks of queues.

Mathematical study is also of benefit to the Internet. If the Internet is to fulfil its promise of revolutionising the way we communicate, it needs to evolve new ways of coping with congestion. The first step must be to understand the nature of congestion—how it occurs and how it affects traffic—and I have extended mathematical tools for describing these things.

Then there must be some way to signal and control congestion. Signalling mechanisms are just beginning to be developed and built into commercial routers. I have used large deviations theory to explain what they should do and to analyse how they perform. A good signalling mechanism will be fundamental to congestion control, whether this takes the form of usage-sensitive pricing schemes or of operator-imposed restrictions.

Further Work

I believe there is considerable scope for further mathematical investigation of the Internet. Such work can bring interesting and challenging problems to queueing theory—and also to dynamical systems theory, control theory, microeconomics, and other disciplines. The insights gained from studying these problems will assist in the evolving design of the Internet.