To give a sense of the scale and speed of the Internet, here are some
figures, obtained from transmitting this dissertation across the
Atlantic to www.wischik.com
in July 1999. There are roughly
215,000 characters to transmit, grouped into packets of 1500
characters. Each packet takes 50-60 milliseconds to reach its
destination, and is forwarded through 20 different way-stations or
routers. In the evening, when the Internet is lightly loaded,
the entire transfer takes just under 2 seconds. In the afternoon,
when there is congestion, it can take over 40 seconds. The reason it
takes so long is that the source computer waits between sending
packets, so as not to overload the network.
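A quick back-of-the-envelope check of these figures (dissertation size and timings are those quoted above; packet headers are ignored for simplicity):

```python
import math

# Figures quoted in the text above.
chars_total = 215_000      # size of the dissertation, in characters
chars_per_packet = 1_500   # characters carried per packet

packets = math.ceil(chars_total / chars_per_packet)   # 144 packets

# Effective throughput under the two conditions reported in the text.
for label, secs in [("evening", 2.0), ("afternoon", 40.0)]:
    rate_kB_s = chars_total / secs / 1000
    print(f"{label}: {packets} packets in {secs:g} s -> {rate_kB_s:.1f} kB/s")
```

The twenty-fold gap between the two rates comes entirely from the source pausing between packets, not from the packets travelling more slowly.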
Such variability is unacceptable even for something as simple as a telephone call. If the Internet is to evolve into a high-performance network, suitable for communication that is richer than simple file-transfers, we must understand how congestion arises and find ways to keep the network operating within its capacity. These are the topics of my dissertation, and my main tool is the mathematical theory of large deviations.
When too many packets arrive at a router they are queued until they can be processed. But a router only has enough buffer space for a limited number of packets to queue, and when the buffer is full further incoming packets are dropped, that is, discarded. An end-system will eventually detect the drop, and typically responds by reducing its transmission rate (and by resending the dropped packet). The frequency of packet drops is thus the primary measure of congestion.
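The drop-tail behaviour just described can be sketched in a few lines; the buffer size and arrival pattern here are illustrative assumptions, not figures from the text:

```python
from collections import deque

BUFFER_SIZE = 8    # illustrative buffer capacity, in packets

buffer = deque()
dropped = 0

def arrive(packet):
    """A packet arrives at the router: queue it, or drop it if the buffer is full."""
    global dropped
    if len(buffer) < BUFFER_SIZE:
        buffer.append(packet)
    else:
        dropped += 1       # buffer full: the packet is discarded

def serve():
    """The router forwards the packet at the head of the queue, if any."""
    if buffer:
        buffer.popleft()

# A burst of 12 arrivals with no service in between overflows the buffer:
# 8 packets are queued and 4 are dropped.
for p in range(12):
    arrive(p)
print(len(buffer), dropped)

serve()
serve()
print(len(buffer))   # two packets forwarded, 6 remain queued
```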
For the past decade Internet congestion has been controlled in this way, relying on users' computers to detect congestion and to delay sending the next packets. As the Internet becomes more commercially important, this consensus arrangement is likely to fail. Engineers have recently proposed new mechanisms for signalling congestion, and economists have begun to look at usage-sensitive pricing schemes. But, without a clear mathematical understanding of the phenomenon of congestion, it is hard to see how these approaches can be reconciled.
In this dissertation I develop the large deviations theory of queueing networks. Large deviations theory is a modern branch of probability, concerned with estimating the probabilities of rare events. This makes it well-suited to studying high-performance communications networks, in which dropping a packet should be a rare event.
More precisely, large deviations theory is concerned with limiting regimes. In queueing problems, precise equations can be written down but can very rarely be solved exactly, so instead one seeks limiting results. A typical result would be that for a router used by n traffic flows of a specified type, with capacity to serve Cn packets per millisecond, the probability of dropping a packet is roughly 10^(-θn), where θ can be calculated, and where the approximation is accurate in the limit as n tends to infinity. (In other words, its accuracy improves as the number of flows increases.)
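In the standard notation of large deviations theory, an estimate of this kind takes the following generic shape (this is the textbook form, not a formula quoted from the dissertation; writing it base e rather than base 10 changes θ only by a factor of log 10):

```latex
% Many-flows asymptotic for a router serving n flows with total
% capacity Cn: the drop probability decays exponentially in n.
\[
  \frac{1}{n} \log \mathbb{P}(\text{packet drop}) \longrightarrow -\theta
  \quad\text{as } n \to \infty,
  \qquad\text{i.e.}\qquad
  \mathbb{P}(\text{packet drop}) \approx e^{-n\theta}.
\]
```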
Large deviations theory has been widely studied, and much work has been done on large deviations in queueing theory. What makes my work different is the limiting regime I look at. I consider the many flows limiting regime, exemplified above, in which the number of traffic flows increases. This limiting regime is well-suited to the Internet, which has many thousands of simultaneous traffic flows in its core.
Furthermore, this limiting regime yields significant new results in queueing theory. For example, I show that, under the many flows limiting regime, the essential characteristics of a flow of traffic do not change as it travels through a network. The limiting regime was prompted by the structure of the Internet, but the result stands on its own as an important and unexpected contribution to queueing theory. In other areas to which queueing theory has been applied, such as job scheduling in factories, the many flows limiting regime is not appropriate and such clean results are hard to find.
My study of large deviations in queueing networks has three parts. In the first I use large deviations to model traffic coming into a router and to find the way in which the router's buffer overflows. In the second I examine how the characteristics of traffic are altered as it passes through a router, and from this obtain a model for traffic through the entire network. In the third I analyse the fairness and efficiency of algorithms for signalling and controlling congestion.
Others have already developed a comprehensive theory to describe large deviations of traffic flows, under the large buffer limiting regime, in which the buffer size of a router increases. Under the many flows limiting regime, however, there have only been isolated results: so I first develop a full theory for this regime. Formally speaking, I establish a Large Deviations Principle for random processes under the many flows regime.
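For readers familiar with the area, a Large Deviations Principle has a standard shape; the generic textbook form is shown below, with scaled processes X_n standing in for the aggregate of n flows and I for the rate function (the notation here is mine, not quoted from the dissertation):

```latex
% Generic Large Deviations Principle with rate function I:
% for a set A with interior A° and closure Ā,
\[
  -\inf_{x \in A^{\circ}} I(x)
  \;\le\; \liminf_{n\to\infty} \frac{1}{n}\log \mathbb{P}(X_n \in A^{\circ})
  \;\le\; \limsup_{n\to\infty} \frac{1}{n}\log \mathbb{P}(X_n \in \bar{A})
  \;\le\; -\inf_{x \in \bar{A}} I(x).
\]
```

When the two infima agree, the probability of the event A decays like e^(-nθ) with θ equal to their common value, which is the shape of estimate used throughout this introduction.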
This Principle gives estimates for the probability of any event associated with the aggregate of many traffic flows. For example, it can be used to estimate the probability that a router's buffer overflows, and to calculate the manner in which overflow is most likely to occur.
I also compare results from the large buffer and many flows regimes. It turns out that large buffer results arise simply as a special case of the corresponding many flows results. Also, the many flows results are more applicable to flows which exhibit characteristics seen in real Internet traffic.
Earlier work on networks has reached different conclusions. The reason for the difference is that these are all limiting results, and earlier work has considered different limiting regimes. For example, under the large buffer regime, the characteristics of a flow of traffic change along its route in ways which are complicated and do not lend themselves to general principles.
It is my choice of limiting regime that allows a much cleaner conclusion. I consider the many flows limiting regime, in which the number of flows of traffic increases, and I make the additional assumption that the different flows follow diverse routes through the network.
My result has a straightforward mathematical formulation, but its implications are significant. It dramatically simplifies the analysis of networks.
Both of these questions (how network usage should be priced, and how routers should signal congestion) have been looked at, though mainly in isolation: economists have considered the first, and engineers the second. But without a mathematical model of the phenomenon of congestion, economists cannot devise pricing structures to prevent it; and without a theory which explains how congestion occurs, engineers cannot analyse their algorithms. Only recently have mathematicians begun to study these issues. I address both questions, using the large deviations tools developed in the preceding parts.
First, I define what it means for a router to signal congestion fairly and efficiently. This involves a large deviations analysis of the impact of a user on the network. It also involves economic modelling of how users behave—economically speaking, a congestion-signalling algorithm has the same purposes as a pricing scheme: to convey information and to direct consumption.
I then analyse the performance of various congestion-signalling algorithms that have been proposed, including one which has been implemented in commercial routers. This is, as far as I am aware, the first theoretical analysis of these algorithms. It turns out that they are all unfair and economically inefficient. I go on to suggest improvements based on principles from large deviations theory.
This part of the dissertation is more discursive, as it takes some effort to frame an appropriate mathematical question. Once that is done, the tools of queueing theory can be powerfully applied.
Mathematical study is also of benefit to the Internet. If the Internet is to fulfil its promise of revolutionising the way we communicate, it needs to evolve new ways of coping with congestion. The first step must be to understand the nature of congestion—how it occurs and how it affects traffic—and I have extended mathematical tools for describing these things.
Then there must be some way to signal and control congestion. Signalling mechanisms are just beginning to be developed and built into commercial routers. I have used large deviations theory to explain what they should do and to analyse how they perform. A good signalling mechanism will be fundamental to congestion control, whether this takes the form of usage-sensitive pricing schemes or of operator-imposed restrictions.