The scaling hypothesis: simplifying the prediction of network performance using scaled-down simulations

Konstantinos Psounis, Rong Pan, Balaji Prabhakar, Damon Wischik. ACM Computer Communication Review, Vol. 33, January 2003. [pdf] [journal] [group website]

Abstract.

As the Internet grows, so do the complexity and computational requirements of network simulations. This leads to simulation experiments that are either unrealistic or prohibitively expensive.

We explore a way to side-step this problem, by combining simulation with sampling and analysis. Our hypothesis is this: if we take a sample of the traffic, and feed it into a suitably scaled version of the system, we can extrapolate from the performance of the scaled system to that of the original.
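The sampling-and-scaling step of the hypothesis can be illustrated with a toy sketch. It is not the authors' simulation setup. It assumes a single link, Bernoulli sampling of flows with a hypothetical factor `alpha`, and a link capacity scaled by the same factor. The workload, the names `sample_flows` and `utilization`, and all parameter values are illustrative assumptions. The sketch checks only that the offered load relative to capacity is roughly preserved; it does not model TCP dynamics or active queue management.

```python
import random

def sample_flows(flows, alpha, seed=0):
    """Keep each flow independently with probability alpha (Bernoulli sampling)."""
    rng = random.Random(seed)
    return [f for f in flows if rng.random() < alpha]

def utilization(flows, capacity_bps, duration_s):
    """Total offered bits divided by what the link can carry in the window."""
    total_bits = sum(size_bits for _, size_bits in flows)
    return total_bits / (capacity_bps * duration_s)

# Hypothetical workload (an assumption, not data from the paper):
# 20,000 flows, each a (start time in s, size in bits) pair.
rng = random.Random(1)
duration_s = 100.0
flows = [
    (rng.uniform(0, duration_s), rng.expovariate(1 / 80_000))
    for _ in range(20_000)
]

capacity_bps = 20e6   # original link capacity (assumed)
alpha = 0.1           # sampling and scaling factor (assumed)

sampled = sample_flows(flows, alpha)

rho_full = utilization(flows, capacity_bps, duration_s)
rho_scaled = utilization(sampled, alpha * capacity_bps, duration_s)

print(f"flows: {len(flows)} original, {len(sampled)} sampled")
print(f"utilization: {rho_full:.3f} original, {rho_scaled:.3f} scaled")
```

In this toy, the scaled system handles roughly a tenth of the flows while operating at about the same utilization, which is the sense in which a scaled-down replica is cheaper to simulate. Whether a given performance measure carries over unchanged or needs rescaling depends on the system being modelled.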

We find that when we scale a network which is shared by TCP-like flows, and which is controlled by a variety of active queue management schemes, then performance measures such as queueing delay and the distribution of flow transfer times are left virtually unchanged. Hence, the computational requirements of network simulations and the cost of experiments can decrease dramatically.

An earlier version appeared in Proceedings of SIGCOMM HotNets-I, 2002. [ps]

SHRINK: a method for scalable performance prediction and efficient network simulation

Rong Pan, Balaji Prabhakar, Konstantinos Psounis, Damon Wischik. Proceedings of IEEE Infocom 2003. [pdf] [conference] [group website]

Abstract.

In networks and in web-server farms, it is useful to collect performance measurements, to monitor the state of the system, and to perform simulations. However, the sheer volume of traffic in large high-speed network systems makes it hard to monitor their performance or to simulate them efficiently. And the heterogeneity of the Internet means it is time-consuming and difficult to devise the traffic models and analytic tools which would allow us to work with summary statistics.

We explore a method to side-step these problems by combining sampling, modeling and simulation. Our hypothesis is this: if we take a sample of the input traffic, and feed it into a suitably scaled version of the system, we can extrapolate from the performance of the scaled system to that of the original.

Our main finding is this: when we scale an IP network that is shared by TCP-like, UDP and web flows, and that is controlled by a variety of active queue management schemes, performance measures such as queueing delay and drop probability are left virtually unchanged. We show this in theory and in simulations. This makes it possible to capture the performance of large networks quite faithfully using smaller-scale replicas.