Performance




 

The performance measurements presented in this section were aimed at verifying that our switch design (which included, at some considerable complexity, the ability to easily reprogram and experiment with its major components) provides a realistic platform on which to perform further ATM switch and network experiments.

The first results represent the raw throughput of the port controller for ATM traffic. In the first iterations of the switch, throughput was found to be a problem: the firmware was redesigned to reduce the number of non-cached memory reads, and the processor was upgraded to one with a write buffer to reduce the impact of writes (the ARM cache is write-through, so this was relevant for all memory writes). Full line rates are now achievable, with sufficient processor power left over to implement more complex queueing algorithms.

The next two experiments demonstrate the performance of the port controller when acting as a gateway from Fairisle to both Ethernet and the Cambridge Fast Ring (CFR). Both are found to be limited by the respective network interface, rather than by the forwarding process to Fairisle.

Port controller throughput

Initial throughput measurements for the Fairisle port controller (version 2) were reported in [Orange93], from which the FPC2 data is taken. The data rates are presented in terms of the data portion of the cells only - the 100 Mbit/sec transmission line rate is equivalent to a maximum data rate of 87.3 Mbit/sec.
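The 87.3 Mbit/sec figure follows from the per-cell framing overhead. As a hedged sketch (the paper does not spell out the exact cell layout), the number is consistent with each cell carrying 48 payload bytes in 55 bytes on the line - an assumption, e.g. a standard 53-byte ATM cell plus a 2-byte fabric routing tag:

```python
# Hedged sketch: derive the maximum data rate from the line rate and an
# ASSUMED cell framing of 48 payload bytes per 55 bytes on the wire.
LINE_RATE_MBIT = 100.0
PAYLOAD_BYTES = 48    # data portion of each cell
WIRE_BYTES = 55       # assumed total bytes per cell on the line

data_rate = LINE_RATE_MBIT * PAYLOAD_BYTES / WIRE_BYTES
print(f"{data_rate:.1f} Mbit/sec")  # -> 87.3 Mbit/sec
```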

Figure 5 shows the performance for various burst sizes. The first plot indicates the sustainable data rate at which the processor first spends all its time in the FIQ routine; the second, the absolute maximum achievable rate.

  
Figure 5: FPC2 throughput performance

As expected, for small bursts the overhead of entering and leaving the interrupt routine reduces the rate at which the CPU first becomes always busy. For larger bursts the interrupt overhead is amortised and the full rate is obtained. For small bursts at the maximum rate, the artifacts are caused by beating between the arriving cells, the cell-synchronous switching fabric and the arbitration of accesses to the cell buffer memory.
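The amortisation effect can be captured by a simple illustrative model (not the paper's own analysis): each burst pays a fixed interrupt entry/exit cost, plus a per-cell forwarding cost, so the saturation rate rises with burst size and asymptotes at the per-cell limit. The cost figures below are made-up round numbers for illustration only:

```python
# Illustrative model: saturation rate vs burst size, assuming a fixed
# interrupt entry/exit overhead per burst plus a per-cell cost.
# Both timing constants are ASSUMED round numbers, not measured values.
T_IRQ_US = 10.0    # assumed interrupt entry/exit overhead per burst (microseconds)
T_CELL_US = 4.0    # assumed per-cell processing time (microseconds)
PAYLOAD_BITS = 48 * 8

def saturation_rate_mbit(burst_cells: int) -> float:
    """Data rate at which the CPU becomes 100% busy in the FIQ routine."""
    per_cell_us = T_CELL_US + T_IRQ_US / burst_cells
    return PAYLOAD_BITS / per_cell_us  # bits per microsecond == Mbit/sec

for b in (1, 4, 16, 64):
    print(f"burst {b:3d}: {saturation_rate_mbit(b):.1f} Mbit/sec")
```

The model reproduces the qualitative shape of figure 5: the saturation rate climbs steeply with burst size and flattens towards the per-cell limit.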

The measurements show that the early port controllers (using the ARM 3 processor) were capable of a sustained throughput of 73.4 Mbit/sec. One of these port controllers, which was equipped with an alpha-silicon ARM 600 processor (8 entry write buffer), was observed to be limited in throughput at 82.5 Mbit/sec by a firmware restriction in ``Xi3'', the CPU having cycles to spare.

  
Figure 6: FPC3 processor usage

The issue three port controllers (ARM 610 processor) are capable of forwarding at the full line rate (87.3 Mbit/sec). Figure 6 shows, for various loadings and burst sizes, the percentage of time the CPU is idle; idle time is defined as that which is available for Wanda threads once interrupt dispatching and other kernel activities have been taken into consideration.

Depending on the burst size, the CPU starts to be entirely consumed by the FIQ routines in the range 40 to 50 Mbit/sec. Hence, although the port controller can handle the full line rate, above 40 to 50 Mbit/sec the interrupt dispatching overhead prevents any time from being made available to Wanda processes.

Routeing to other networks

Using the standard I/O bus for the ARM chip set it is possible to interface to other networks; in particular we have used Ethernet and the CFR.

Ethernet

Forwarding between the Ethernet and Fairisle is performed by a user level Wanda process dealing in packets. All data movements between the user level buffers and the Ethernet or ATM interface are performed in software. In the case of packets to and from the ATM interface, the segmentation and reassembly functions are also performed by software.

The sustainable throughput achievable is approximately 6.7 Mbit/sec; previous experiments indicate that the limiting factor is the bandwidth available for the copy between memory and the Ethernet card, which provides only a 16-bit wide interface.

Cambridge Fast Ring

As part of the Super-JANET demonstration phase, a Pandora system situated at University College London was connected to the CFR based Pandora infrastructure at Cambridge via a pair of Fairisle switches and an intermediate 34Mbit/sec PDH circuit carrying ATM cells. The experimental setup is shown in figure 7.

  
Figure 7: Pandora over Super-JANET

Again the performance was limited by the I/O throughput of the network interface, at about 3 Mbit/sec. However this was sufficient to run the Pandora video applications between the sites.

While the CFR is also an ATM network, with concepts of virtual channels and signalling identical to those of Fairisle, the CFR predates B-ISDN standardisation and uses a 38 byte cell (32 payload, 4 header, 2 CRC). In this case forwarding is performed by mapping the payloads of 3 CFR cells to those of 2 Fairisle cells in the obvious manner. To be implemented effectively this requires some form of ``push'' indication - irrespective of adaptation layer. For this we used the AAL5 user-user indication for all adaptation layers.
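The 3:2 mapping works because three 32-byte CFR payloads carry exactly the same 96 bytes as two 48-byte Fairisle cell payloads, so the forwarder can repack with no padding. A minimal sketch of the payload repacking (cell headers, CRCs and the AAL5 user-user ``push'' bit are omitted; the function name is ours, not the paper's):

```python
# Sketch of the 3:2 payload mapping: 3 x 32 = 2 x 48 = 96 bytes,
# so CFR payloads repack into ATM payloads with no padding.
CFR_PAYLOAD = 32   # CFR cell payload bytes
ATM_PAYLOAD = 48   # Fairisle/ATM cell payload bytes

def cfr_to_atm(cfr_cells: list) -> list:
    """Repack CFR cell payloads into ATM cell payloads (3 cells -> 2 cells)."""
    assert len(cfr_cells) % 3 == 0, "forward in groups of three CFR cells"
    stream = b"".join(cfr_cells)
    return [stream[i:i + ATM_PAYLOAD] for i in range(0, len(stream), ATM_PAYLOAD)]

cells = [bytes([i]) * CFR_PAYLOAD for i in range(3)]  # three dummy CFR payloads
out = cfr_to_atm(cells)
print(len(out), len(out[0]))  # -> 2 48
```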

Hence, the bit was found to be useful in implementing a buffering strategy, but at a point in the network which has (and should have) no comprehension of adaptation layers. The bit was simply used as a generic indication of the ``recovery unit''; in this sense, it can be used both as an aid to implementing timely buffering strategies, and more importantly to implement more effective discard algorithms in switches [Ramanathan93].






Richard Black, Ian Leslie et al.