Performance




In this section we present performance measurements for our implementation of the ATM protocol. The data path was evaluated by measuring raw ATM and TCP throughput between hosts, and the control path by measuring the time taken to set up connections over the ATM network and the Ethernet. In addition, the manager was profiled to determine which parts of its code dominate its performance.

Data Path

The implementation of the data path in the kernel is crucial to the performance of the ATM protocol. Comparison with the performance achievable on the micro-kernel demonstrates that the Unix implementation is limited by the current structure of the BSD 4.3 protocol stack. Nevertheless, high data transfer rates can be achieved. Performance measurements using a very simple cell-based interface with little buffering [7] indicate that a raw ATM throughput of 20 Mb/s can be achieved between two DS5000/25 hosts. Using an MTU of KB, a throughput of 15 Mb/s was achieved for a TCP connection over IP on ATM.
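
The TCP figure gives a feel for what a simple user-space throughput measurement involves. The sketch below is not the tool used for the measurements reported here; it times a bulk transfer over an ordinary TCP socket to a receiver that reads and discards its input, and the host name, port and transfer size are assumptions chosen for illustration.

    /* A minimal bulk-transfer throughput sketch (not the measurement tool
     * used above).  Assumes a receiver at HOST:PORT that discards data;
     * HOST, PORT, TOTAL and CHUNK are illustrative values. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/time.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netdb.h>

    #define HOST  "ds5000-b"        /* hypothetical peer host     */
    #define PORT  5001              /* hypothetical receiver port */
    #define TOTAL (8 * 1024 * 1024) /* bytes to send              */
    #define CHUNK (8 * 1024)        /* bytes per write()          */

    int main(void)
    {
        char buf[CHUNK];
        struct hostent *hp = gethostbyname(HOST);
        struct sockaddr_in sin;
        struct timeval t0, t1;
        int s, sent = 0;
        double secs;

        if (hp == NULL) { fprintf(stderr, "unknown host\n"); exit(1); }
        s = socket(AF_INET, SOCK_STREAM, 0);
        if (s < 0) { perror("socket"); exit(1); }
        memset(buf, 0, sizeof buf);
        memset(&sin, 0, sizeof sin);
        sin.sin_family = AF_INET;
        sin.sin_port = htons(PORT);
        memcpy(&sin.sin_addr, hp->h_addr, hp->h_length);
        if (connect(s, (struct sockaddr *)&sin, sizeof sin) < 0) {
            perror("connect"); exit(1);
        }

        gettimeofday(&t0, NULL);
        while (sent < TOTAL) {
            if (write(s, buf, CHUNK) != CHUNK) { perror("write"); exit(1); }
            sent += CHUNK;
        }
        gettimeofday(&t1, NULL);

        secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
        printf("%d bytes in %.3f s = %.2f Mb/s\n", sent, secs,
               sent * 8.0 / secs / 1e6);
        close(s);
        return 0;
    }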

Control Path

The performance of the control path is affected by several factors: scheduling of the manager as a user space process, concurrency within the manager, and communication between the manager and the kernel. Table 1 gives typical connection setup and teardown times over the Ethernet and a direct ATM link between two lightly loaded DS5000/25 hosts running the kernel and user space manager, and for local IPC on a single host. Measurements are also given when the hosts are connected through an ATM switch. The performance measurements were made using etp [4].

  
Table 1: Connection setup and teardown times
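
etp itself is not described here. As a rough illustration of how per-connection setup and teardown figures of this kind can be gathered, the sketch below times repeated TCP connect()/close() pairs against a listening server; it does not drive the ATM control plane directly, and the host name, port and iteration count are assumptions.

    /* Sketch of one way to gather per-connection latency figures.  This is
     * not etp and does not exercise the ATM control plane; it times TCP
     * connect()/close() pairs purely to illustrate the measurement method.
     * close() returns without waiting for the remote side, so the teardown
     * component is only approximate. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/time.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netdb.h>

    #define HOST  "ds5000-b"   /* hypothetical peer host     */
    #define PORT  5001         /* hypothetical listener port */
    #define TRIES 100          /* connections to average     */

    int main(void)
    {
        struct hostent *hp = gethostbyname(HOST);
        struct sockaddr_in sin;
        struct timeval t0, t1;
        double total_ms = 0.0;
        int i;

        if (hp == NULL) { fprintf(stderr, "unknown host\n"); exit(1); }
        memset(&sin, 0, sizeof sin);
        sin.sin_family = AF_INET;
        sin.sin_port = htons(PORT);
        memcpy(&sin.sin_addr, hp->h_addr, hp->h_length);

        for (i = 0; i < TRIES; i++) {
            int s = socket(AF_INET, SOCK_STREAM, 0);
            if (s < 0) { perror("socket"); exit(1); }
            gettimeofday(&t0, NULL);
            if (connect(s, (struct sockaddr *)&sin, sizeof sin) < 0) {
                perror("connect"); exit(1);
            }
            close(s);                               /* teardown */
            gettimeofday(&t1, NULL);
            total_ms += (t1.tv_sec - t0.tv_sec) * 1e3
                      + (t1.tv_usec - t0.tv_usec) / 1e3;
        }
        printf("mean setup+teardown: %.3f ms over %d connections\n",
               total_ms / TRIES, TRIES);
        return 0;
    }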

The measurements indicate that our implementation performs worse than an earlier, simpler version of the control plane implemented within the Unix kernel, for which average connection setup times of 3.5 ms were measured between two Unix hosts over the Ethernet during periods of low network activity [5]. Although it would be unreasonable to expect our implementation to perform as well as a more mature, though less functional, kernel implementation, we might have hoped for better results. The results can only partly be explained by scheduling competition between the client and the connection manager (and, in the local case, the server), all of which ran at the same default user priority. To determine why connection setup took longer than expected, we profiled the connection manager.

The profile results showed that the manager typically spends just over half of its active time (i.e. time not blocked in the select() call) executing code in the Pthreads library. In all of our measurements, the 10 most frequently called functions were in the Pthreads library; most of them are concerned with thread context switching and concurrency control. Table 2 lists these functions, their call frequencies, and the percentage of the manager's total active time for which they account, for a typical profile during which 20 connections were set up. Clearly the concurrency control primitives have a high cost, and the manager could be made more efficient by implementing it as a single-threaded process. On an operating system platform which supports lightweight threads, such as OSF/1, we would hope that the multi-threaded design of the manager would not hurt its performance, and that the connection setup time would be correspondingly reduced. We believe that this, in conjunction with some further optimisations we have identified, would reduce the connection setup time by approximately 50%, to the order of 10 ms.

  
Table 2: Most frequently called functions
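
As a rough illustration of why frequently called concurrency-control primitives can dominate a profile even when uncontended, the following microbenchmark (not taken from the manager, and written against the modern POSIX threads API rather than the Pthreads library used here) times a tight loop of lock/unlock pairs; the iteration count is arbitrary.

    /* Microbenchmark (not from the manager): cost of uncontended
     * lock/unlock pairs using the POSIX threads API.  Compile with
     * -lpthread; ITERS is arbitrary. */
    #include <stdio.h>
    #include <pthread.h>
    #include <sys/time.h>

    #define ITERS 1000000

    int main(void)
    {
        pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
        struct timeval t0, t1;
        double usecs;
        int i;

        gettimeofday(&t0, NULL);
        for (i = 0; i < ITERS; i++) {
            pthread_mutex_lock(&m);
            pthread_mutex_unlock(&m);
        }
        gettimeofday(&t1, NULL);

        usecs = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
        printf("%.3f us per lock/unlock pair\n", usecs / ITERS);
        return 0;
    }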

Influence of System Load

In the following sets of results, the influence of system load on connection setup time is studied. Three sets of measurements were taken. In each set the system load was varied by running a number of CPU-bound processes in competition with the manager. Each competing process consisted of a shell pipeline which required both user space and kernel processing, run at the default user priority of 0. The number of interfering pipelines was varied from 0 to 3. For a given system load, the performance of the manager was measured by profiling a client which repeatedly set up a connection, exchanged a single message with a remote server, and then tore down the connection. In each experiment the client set up 100 connections with a random interval between each. In all of the experiments, connections were set up from a DS5000/25 over the ATM network, via two Fairisle ATM switch ports, to a second DS5000/25 also running our code. Both workstations were fitted with a simple, cell-based ATM interface [7].
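
A competing load of this kind can be reproduced in several ways. The sketch below is an assumption rather than the pipeline actually used in the experiments: it forks a configurable number of children at the default priority, each of which mixes user-space work with system calls by writing continuously to /dev/null.

    /* Sketch of a load generator (the experiments actually used shell
     * pipelines): fork N children at the default priority, each of which
     * mixes user-space work with kernel work by writing to /dev/null. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <fcntl.h>

    int main(int argc, char **argv)
    {
        int n = (argc > 1) ? atoi(argv[1]) : 1;   /* number of competitors */
        int i;

        for (i = 0; i < n; i++) {
            if (fork() == 0) {
                char buf[4096];
                int fd = open("/dev/null", O_WRONLY);
                memset(buf, 0, sizeof buf);
                for (;;)
                    write(fd, buf, sizeof buf);   /* user + kernel work */
            }
        }
        pause();   /* parent blocks; interrupt the process group to stop */
        return 0;
    }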

In the first set of measurements, the manager was run at a priority of 0. As expected, the connection setup time degraded as the system load increased. Figure 2 shows an initial sharp increase in connection setup time with 1 competing pipeline, followed by a linear increase with increasing load. In the second set, the priority of the manager was increased to -1. As can be seen from the graph, the connection setup time initially increased rapidly with increasing load, but the rate of increase slowed thereafter. In the third set of experiments the priority of the manager was set at -6, higher than that of the automount daemon. This served to stabilise the performance, in spite of an initial sharp increase with a single competing pipeline.

  
Figure 2: Influence of system load on mean connection setup time

The results suggest that the manager should be run at a high priority, to ensure that an increase in system load does not affect its performance excessively.
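
Raising the manager's priority requires no structural change: it can be done externally with nice or renice, or at startup as in the minimal sketch below (negative nice values require superuser privilege; the -6 value is the one used in the experiments above).

    /* Raise the calling process's priority at startup; negative nice
     * values require superuser privilege. */
    #include <stdio.h>
    #include <sys/time.h>
    #include <sys/resource.h>

    int main(void)
    {
        /* -6 was the highest priority used in the experiments above */
        if (setpriority(PRIO_PROCESS, 0, -6) < 0) {
            perror("setpriority");
            return 1;
        }
        /* ... continue with the manager's normal initialisation ... */
        return 0;
    }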

Code Size

An important consequence of our design is that the complexity of the system, measured by code size, has been significantly reduced. The connection manager compiles to approximately 0.5 MB, almost all of which is accounted for by the Pthreads library. The size of the kernel ATM code has been reduced by about 30%, to just 27 KB of text segment including the ATM device driver.





