In this section we present performance measurements for the implementation of the ATM protocol. The performance of the control path was investigated by measuring the time taken to set up connections over the ATM network and the Ethernet. In addition, the manager was profiled to determine which parts of the code were critical in determining its performance.
The implementation of the data path in the kernel is crucial to the performance of the ATM protocol. Comparison of the achievable performance with that for the micro-kernel demonstrates that it is limited by the current structure of the BSD 4.3 protocol stack. Nevertheless, high data transfer rates can be achieved. Performance measurements using a very simple cell-based interface with little buffering [7] indicate that a raw ATM throughput of 20 Mb/s can be achieved between two DS5000/25 hosts. Using an MTU of KB, a throughput of 15 Mb/s was achieved for a TCP connection over IP on ATM.
The performance of the control path is affected by several factors: scheduling of the manager as a user space process, concurrency within the manager, and communication between the manager and the kernel. Table 1 gives typical connection setup and teardown times over the Ethernet and a direct ATM link between two lightly loaded DS5000/25 hosts running the kernel and user space manager, and for local IPC on a single host. Measurements are also given when the hosts are connected through an ATM switch. The performance measurements were made using etp [4].
Table 1: Connection setup and teardown times
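To illustrate how setup times of this kind can be obtained, the sketch below times repeated connection setups with gettimeofday(). It is written against ordinary BSD TCP sockets rather than the ATM service interface, and the echo-port target is an arbitrary choice; it should be read as a hedged sketch of the measurement technique, not as the etp tool [4] itself.

/*
 * Hedged sketch of a setup-time measurement loop, in the spirit of the
 * etp measurements but written against ordinary BSD TCP sockets rather
 * than the ATM service interface described in this paper.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

static double elapsed_ms(struct timeval *start, struct timeval *end)
{
    return (end->tv_sec - start->tv_sec) * 1000.0 +
           (end->tv_usec - start->tv_usec) / 1000.0;
}

int main(int argc, char **argv)
{
    struct sockaddr_in addr;
    struct timeval t0, t1;
    double total = 0.0;
    int i, s, trials = 100;

    if (argc != 2) {
        fprintf(stderr, "usage: %s server-address\n", argv[0]);
        return 1;
    }

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(7);                   /* TCP echo port, arbitrary */
    addr.sin_addr.s_addr = inet_addr(argv[1]);

    for (i = 0; i < trials; i++) {
        gettimeofday(&t0, NULL);
        s = socket(AF_INET, SOCK_STREAM, 0);
        if (s < 0 || connect(s, (struct sockaddr *)&addr, sizeof addr) < 0) {
            perror("connect");
            return 1;
        }
        gettimeofday(&t1, NULL);                /* setup complete */
        total += elapsed_ms(&t0, &t1);
        close(s);                               /* teardown not timed here */
    }
    printf("mean setup time over %d trials: %.2f ms\n", trials, total / trials);
    return 0;
}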
The measurements indicate that our implementation performs worse than an earlier implementation of a (simpler) version of the control plane within the Unix kernel, for which average connection setup times of 3.5 ms were measured between two Unix hosts over the Ethernet during periods of low network activity [5]. Although it would be unreasonable to expect our implementation to perform as well as a more mature, though less functional, kernel implementation, we might have hoped for better results. The results can only be partially explained by competition between the client and the connection manager (and, in the local case, the server), all of which were running at the same default user priority level. To determine why connection setup took longer than expected, we profiled the connection manager.
The profile results showed that the manager typically spends just over half of its active time (i.e. time not blocked in the select() call) executing code in the Pthreads library. In all of our measurements, the 10 most frequently called functions were in the Pthreads library. Most of these functions are concerned with thread context switching and concurrency control. Table 2 lists these functions, their associated frequencies, and the percentage of the total active time of the manager for which they account, for a typical profile during which 20 connections were set up. Clearly the concurrency control primitives have a high cost. The manager could be made more efficient by implementing it as a single-threaded process. On an operating system platform which supports lightweight threads, such as OSF/1, we would hope that the multi-threaded design of the manager would not impact its performance, and that the connection setup time would be proportionately reduced. We believe that this, in conjunction with some further optimisations we have identified, would reduce the connection setup time by approximately 50 %, to the order of 10 ms.
Table 2: Most frequently called functions
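To make the single-threaded alternative concrete, the following sketch shows a select()-based dispatch loop of the kind alluded to above. The descriptor and handler names (signalling_fd, kernel_fd, handle_signalling(), handle_kernel_request()) are invented for illustration and do not correspond to the actual manager code.

/*
 * Sketch of a single-threaded, select()-based dispatch loop.  The
 * handlers are hypothetical placeholders for the manager's real
 * processing; because everything runs in one thread, no mutexes or
 * condition variables are needed.
 */
#include <sys/types.h>
#include <sys/time.h>
#include <unistd.h>

extern int signalling_fd;        /* network signalling channel (assumed) */
extern int kernel_fd;            /* kernel <-> manager channel (assumed) */
extern void handle_signalling(int fd);
extern void handle_kernel_request(int fd);

void manager_loop(void)
{
    fd_set readfds;
    int maxfd = (signalling_fd > kernel_fd ? signalling_fd : kernel_fd) + 1;

    for (;;) {
        FD_ZERO(&readfds);
        FD_SET(signalling_fd, &readfds);
        FD_SET(kernel_fd, &readfds);

        /* Block until work arrives; no thread context switches or
         * concurrency control primitives are involved. */
        if (select(maxfd, &readfds, NULL, NULL, NULL) < 0)
            continue;

        if (FD_ISSET(signalling_fd, &readfds))
            handle_signalling(signalling_fd);
        if (FD_ISSET(kernel_fd, &readfds))
            handle_kernel_request(kernel_fd);
    }
}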
In the following sets of results, the influence of system load on connection setup time is studied. Three sets of measurements were taken. In each set the system load was varied by running a number of CPU-bound processes in competition with the manager. The competing processes each consisted of a shell pipeline which required both user space and kernel processing. The competing pipelines were run at the default user priority of 0. The number of interfering pipelines was varied from 0 to 3. For a given system load the performance of the manager was measured by profiling a client which repeatedly set up a connection, exchanged a single message with a remote server, and then tore down the connection. In each experiment, the client set up 100 connections with a random interval between each. In all of the experiments, connections were set up from a DS5000/25 over the ATM network, via 2 Fairisle ATM switch ports, to a second DS5000/25 also running our code. Both workstations were fitted with a simple, cell-based ATM interface [7].
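The structure of the measurement client can be sketched as follows; atm_connect(), atm_exchange() and atm_disconnect() are hypothetical placeholders for the connection management interface, not the names used in our implementation.

/*
 * Sketch of the measurement client described above: 100 connections,
 * a single message exchange per connection, and a random interval
 * between connections.  The atm_* functions are hypothetical
 * placeholders for the real connection management interface.
 */
#include <stdlib.h>
#include <unistd.h>

extern int  atm_connect(const char *remote_host);      /* placeholder */
extern int  atm_exchange(int conn);                     /* placeholder */
extern void atm_disconnect(int conn);                   /* placeholder */

void run_experiment(const char *remote_host)
{
    int i, conn;

    for (i = 0; i < 100; i++) {
        conn = atm_connect(remote_host);   /* connection setup */
        if (conn >= 0) {
            atm_exchange(conn);            /* single message exchange */
            atm_disconnect(conn);          /* connection teardown */
        }
        /* random interval (0-1 s here) before the next connection */
        usleep((unsigned)(rand() % 1000000));
    }
}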
In the first set of measurements, the manager was run at a priority of 0. As expected, the connection setup time degraded as the system load increased. Figure 2 shows an initial sharp increase in connection setup time with 1 competing pipeline, followed by a linear increase with increasing load. In the second set, the priority of the manager was increased to -1. As can be seen from the graph, the connection setup time initially increased rapidly with increasing load, but the rate of increase slowed thereafter. In the third set of experiments the priority of the manager was set at -6, higher than that of the automount daemon. This served to stabilise the performance, in spite of an initial sharp increase with a single competing pipeline.
Figure 2: Influence of system load on mean connection setup time
The results suggest that the manager should be run at a high priority, to ensure that an increase in system load does not affect its performance excessively.
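On a BSD-style system, the manager's priority can be raised at startup along the following lines; the value -6 matches the third set of experiments, while the surrounding code is only a sketch and requires superuser privilege to set a negative priority.

/*
 * Sketch: raising the manager's scheduling priority at startup on a
 * BSD-style system.  A negative value raises priority and requires
 * superuser privilege; -6 corresponds to the setting used in the
 * third set of experiments.
 */
#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>

int raise_priority(void)
{
    if (setpriority(PRIO_PROCESS, 0 /* this process */, -6) < 0) {
        perror("setpriority");
        return -1;
    }
    return 0;
}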
An important consequence of our design is that the complexity of the system, in terms of code size, has been significantly reduced. The connection manager compiles to approximately 0.5 MB, almost all of which is due to the Pthreads library. The size of the kernel ATM code has been reduced by about 30 % to a mere 27 KB of text segment including the ATM device driver.