Chapter 6

Evaluation

Earlier chapters have described the design and implementation of mechanisms which can be used within an operating system to provide QOS contracts between the system and applications. In this chapter, these mechanisms are evaluated with respect to their ability to be used by applications. The work presented is then compared with other, related work.

6.1 Experimental Assessment

Two important areas of the system have to be evaluated: the suitability of the VPI for providing applications with the information they need to meet their own QOS requirements, and the QOS mechanisms themselves. The application which is used to test the system is a variant of the JPEG video decoder which was described in section 3.1.

6.1.1 Application Use of the System VPI

The aim of this section is to evaluate the usefulness of the VPI in terms of the ease or otherwise with which applications may make use of it. Such an assessment is necessarily subjective but does serve to demonstrate basic functionality.

When the system is built, the activate field of the decoder's VPI is set to point to the address of the decoder's first instruction. The code at this address initialises the application's environment and installs the initial activation handler. The decoder runs in two phases. While initialising its internal data structures, it arranges to have its saved context resumed by installing a default activation handler. When initialisation is completed, the decoder arranges to be informed of the resources which are allocated to it by installing a second activation handler which decodes and displays video frames.

    /*
     * Default activation vectoring code
     */
            .globl  jd_avec_default
            .ent    jd_avec_default
    jd_avec_default:
            la      gp,_gp          /* Init global pointer */
            la      sp,jd_astack    /* Init stack pointer */
            jal     jd_act_default  /* Call handler */
            .end    jd_avec_default

Figure 6.1: Assembler part of default activation handler.

Activation handlers typically consist of an assembler stub and a body which is written in C.
Figure 6.1 shows the default activation vectoring code used while the decoder is initialising itself and figure 6.2 shows the code which implements the body of the handler.

    vp_t *my_vpp;   /* Pointer to application's VPI */

    void jd_act_init()
    {
        if (my_vpp->status & VP_STS_EVENT) {
            ev_handle();
        }
        sc_rfar(&my_vpp->ecx);
    }

Figure 6.2: Body of default activation handler.

Whenever the decoder loses the processor, its context is saved in the ecx field of its VPI by the NTSC code. When the decoder regains the processor, the vectoring code initialises the minimum amount of environment required then calls the body of the handler. gp is reserved by the MIPS R3000 register usage convention and is used by the compiler to refer to data held in a global data segment. Since, in this design, activation handlers are not reentrant, an execution stack can be statically allocated and its address used to initialise the stack pointer with two machine instructions (footnote 1). The default activation handler checks for the arrival of new events and handles any which are pending then resumes the context saved in its VPI's ecx field. Using this default activation handler, the decoder executes in the same manner as a conventional process, resuming its context from where it was saved when the processor was taken from it.

There is an important distinction to be made between using the default activation handler to restore the state of a process and having the system do it; use of the activation handler allows an application to decide whether or not it wants to restore its saved state, rather than having this action always forced upon it by the system.

An example of when an application sometimes does not want to resume its saved context is provided by the activation handler which is used by the decoder when its initialisation has completed and it is decoding and displaying frames. Processor cycles are allocated to the decoder periodically at the frame display rate.
At the start of each period, the decoder displays the results of its attempt to decode the previous frame and then begins decoding the next frame. This strategy allows the decoder to maintain temporal correctness by sacrificing the logical correctness of its results. Logically incorrect results manifest themselves as frames of video which are displayed on time but which are not completely decoded.

The vectoring code for the decoding activation handler is similar to that of the default activation handler shown in figure 6.1. Figure 6.3 shows the body of the handler. When the handler is entered, the decoder's previous context is saved in the VPI's ecx field. After detecting and processing any pending events, the handler determines the reason for the current activation by examining the bits in the status field of its VPI.

If the VP_STS_ALLOC bit in the VPI status field is set, the decoder has just been allocated some processor cycles to decode the next frame. sc_rfa is called to leave activation mode and discard any context which was saved in the VPI's ecx field. The results (possibly incomplete) of the attempt to decode the previous frame are presented on the frame buffer by a call to jd_show and jd_frame is called to initiate the decoding of the next frame. If the current contract affords sufficient time to decode the frame completely, the kernel is informed that the decoder wants to give up the processor until the next time cycles are allocated. This is done by marshalling the argument KC_WAIT_ALLOC into process/kernel shared memory and executing the NTSC call sc_kernel which reflects the call in the kernel process as an event. This call does not return to the activation handler as would the invocation of a normal C function.

Footnote 1: The la (load address) opcode is expanded by the assembler into a lui (load upper immediate) followed by an addiu (add immediate unsigned).
    char *fr;       /* Buffer for one decoded frame */
    vp_t *my_vpp;   /* Pointer to application's VP */
    vd_t *vd;       /* Video stream descriptor */
    pk_t *pkp;      /* Decoder/kernel shared memory */

    void jd_act_display()
    {
        if (my_vpp->status & VP_STS_EVENT) {
            ev_handle();
        }
        switch (my_vpp->status & VP_STS_ARMSK) {
        case VP_STS_ALLOC:
            sc_rfa();
            jd_show(fr, vd->wp, vd->width, vd->height);
            jd_frame(vd);
            pkp->op = KC_WAIT_ALLOC;
            sc_kernel();            /* Does not return */
        case VP_STS_PRMPT:
        case VP_STS_EXTRA:
            sc_rfar(&my_vpp->ecx);
        }
    }

Figure 6.3: Body of the decode/display activation handler.

If the activation was caused by another process preempting the decoder, then VP_STS_PRMPT will be set, causing the decoder to resume its previous context from ecx. If, having used its contracted cycles, the decoder is allocated extra cycles by the system, VP_STS_EXTRA will be set, which also causes the decoder to resume its previous context.

The overall effect of these actions on execution of the decoder is that jd_frame is called whenever the application is allocated its share of the processor and continues to execute until either the application is next allocated resources or until it completes decoding of the current frame and calls sc_kernel to return any unused resources to the system. This allows the decoder to maintain the timeliness with which it delivers frames even when it does not have sufficient resources to decode them completely. In more general terms, it also demonstrates that the system can provide applications with both QOS contracts and sufficient information to enable them to control their behaviour when they do not receive all of the resources which they require.

6.1.2 Interaction With QOS Mechanisms

To demonstrate the behaviour of the basic QOS mechanisms, the decoder is allocated a processor bandwidth of (40, 100) milliseconds and executed on an otherwise unloaded system.
The decoder is written so that, should it not require all 40 milliseconds to decode a frame, it returns any unused processor cycles to the system. Should the system find that it has any idle time, it will choose a process which is capable of using more cycles and offer the time to that process.

The kernel was instrumented with code to log the occurrence of events during a run and dump the log after completion of the run. During execution, the cycles accumulated by the decoder during the 100 millisecond period assigned for decoding frame $f$ were logged at times $t^p_f$, when the policing function was activated, and $t^s_f$, when the decoder surrendered the processor to the kernel to wait for the next allocation of resources. Figure 6.4 shows a summary of this data.

Figure 6.4: Contracted and additional processor times versus frame number. [Plot of decode time (milliseconds) against frame number, with points A and B marked.]

The solid line plots $\min(t^s_f, t^p_f)$ versus frame number for 2000 frames of a video sequence. This represents the decoder's use of its contracted processor cycles and the fact that this line never exceeds 40 milliseconds demonstrates the actions of the system policing mechanism. The stippled line plots the values of $t^s_f$ recorded in the same run versus frame number. This represents the total time required by the decoder to display the previous frame and completely decode the next frame.

At point A on the graph, the decoder has decoded the frame within its contracted 40 milliseconds and surrendered the processor before it was policed. With reference to the activation handler shown in figure 6.3, execution of the handler has reached the sc_kernel call within the contracted amount of processor time.

At point B on the graph, the decoder was unable to decode all of the next frame within its contracted time, causing the policing mechanism to intervene while the decoder was executing the jd_frame routine.
The system saved the current context in the ecx field of the decoder's VPI and tried to give the processor to another, contracted process. Since, in this experiment, there were no other processes to execute, the system allocated extra time to the decoder. This resulted in the decoder being activated with VP_STS_EXTRA set in its VPI's status field. The handler resumed the saved context and the decoder continued executing until the call to jd_frame completed, when it surrendered the processor to wait for the start of the next resource allocation period.

This graph demonstrates the nature of resource allocation within the system. Applications will always obtain a minimum amount of processor time, and may receive additional time if the system has nothing better to do.

In the case where multiple contracted processes are able to make use of additional resources, some algorithm for choosing the processes most eligible to receive extra processor time is required. The current implementation simply gives this time to processes in a fixed order, but it is anticipated that within a workstation environment, the user will be provided with an interface which enables this order to be modified according to their directions.

Figure 6.5 shows the processor times in milliseconds which were accumulated by ten decoders, each decoding the same video stream, plotted against the frame number within the stream. The stream consists of the first 100 frames of the stream used to obtain the graph of figure 6.4 and each of the decoders was allocated a processor bandwidth of (30, 400) milliseconds, giving a total contracted processor bandwidth of (30 × 10, 400) = (300, 400). The maximum processor time which can be allocated to each of the decoders before the system is overloaded is 400/10 = 40 milliseconds. Three sections of this graph are of interest.
Point A marks a frame which requires less than 30 milliseconds to decode, so all applications return their unused processor time to the system and the system idles. Point B marks a frame which requires a processing time which is between 30 and 40 milliseconds; each of the decoders has consumed its contracted 30 milliseconds and then been flagged as a candidate for extra resources should they become available. The system then offers idle time to the decoders in order starting with number 9 and working down to number 0. For this frame, the system has sufficient idle time for all of the decoders to complete decoding the frame. Point C marks a frame which requires more than 40 milliseconds to decode, so that there is insufficient processor bandwidth in the system to completely decode all 10 copies of this frame and the system experiences transient overload. Each decoder obtains its contracted 30 milliseconds, then additional time is offered in order of descending number. This frame requires so much time to decode completely that only four of the decoders succeed in doing so; the rest can only decode part of the frame.

Figure 6.5: Processor time obtained by ten decoder applications. [Plot of processor time (milliseconds) against frame number and stream number, with points A, B and C marked.]

At point C, this graph also demonstrates a number of important properties which the system exhibits as a result of using QOS mechanisms to allocate resources:

- even though the system is overloaded, all of the decoders still receive their minimum contracted PB on time; and

- not only is the system degrading gracefully, but the manner in which it degrades can be directly controlled by allowing the user to specify the algorithm used for allocating extra processing time to contracted processes.
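The allocation behaviour seen at points A, B and C can be reproduced with a small model. The following is a minimal sketch, not the Nemo implementation: the names decoder_t and allocate_period are hypothetical, all times are whole milliseconds, and scheduling overheads are ignored. Each decoder is first policed at its contracted slice, then the remaining idle time in the period is offered in fixed order from the highest numbered decoder downwards.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one decoder; all times are in milliseconds. */
typedef struct {
    int slice;       /* contracted processor time per period */
    int demand;      /* time needed to decode the frame completely */
    int contracted;  /* out: time consumed under contract (policed) */
    int extra;       /* out: additional time received from idle time */
} decoder_t;

/*
 * Share one allocation period among n decoders.
 * Pass 1 gives each decoder its contracted time, never more than its slice.
 * Pass 2 offers the remaining idle time in fixed order, highest index first.
 * Returns the number of decoders which decoded their frame completely.
 */
int allocate_period(decoder_t *d, size_t n, int period)
{
    int idle = period;
    int complete = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        d[i].contracted = d[i].demand < d[i].slice ? d[i].demand : d[i].slice;
        d[i].extra = 0;
        idle -= d[i].contracted;
    }
    /* Assumed: admission control never contracts more than the period. */
    assert(idle >= 0);

    for (i = n; i-- > 0; ) {
        int want = d[i].demand - d[i].contracted;
        int give = want < idle ? want : idle;

        d[i].extra = give;
        idle -= give;
        if (give == want)
            complete++;
    }
    return complete;
}
```

With ten decoders contracted at (30, 400), a frame needing 25 or 35 milliseconds is completed by all ten decoders in this model. A frame needing an assumed 52 milliseconds leaves 100 milliseconds of idle time to cover ten requests for 22 milliseconds each, so decoders 9 to 6 complete, decoder 5 receives the last 12 milliseconds, and decoders 4 to 0 receive only their contracted time. The figure of 52 milliseconds is illustrative only; the text above states only that the frame at point C needs more than 40 milliseconds.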

6.1.3 Varying Application QOS

It has been shown how the system can control the resources used by applications, and also how applications can detect when they are not receiving enough resources. The usefulness of such a system depends on the ability to construct applications which are capable of producing acceptable results when resources are scarce. This section describes an implementation of the decoder application which has these properties.

6.1.3.1 Layered Processing

The video decoder uses the JPEG algorithm, an outline of which is given in section 3.1. At the centre of this algorithm is a loop which converts the incoming compressed bit stream into a sequence of 8 × 8 tiles of DCT coefficients. An IDCT is performed on each of these tiles to obtain an 8 × 8 array of pels. When the decoder is sequentially processing tiles, a shortage of processor time during a sequence of frames causes areas at the bottom of the picture not to be updated. For many video presentation applications, this form of degradation may not be acceptable.

An alternative method of processing the sequence of tiles representing one image uses layered processing. As each coefficient tile is reconstructed from the incoming bit stream, its individual coefficients are recorded and an approximation to the picture elements represented by the tile is generated and stored into the resultant array of image pels. When all of the tiles have been received, the decoder has an initial approximation to the image which it can further refine as more cycles are allocated to it.

A version of the decoder was constructed in which the first layer of processing generates its approximation by taking the first non-zero coefficient in the tile (footnote 2), setting all other coefficients in the approximation tile to zero and calculating the IDCT of this. An optimisation in the IDCT implementation means that calculating the IDCT of a tile containing only one non-zero coefficient can be done reasonably cheaply.
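The selection step of the first layer can be sketched as follows. This is an illustration under stated assumptions rather than the decoder's actual code: the function name first_layer_approx, the use of short for coefficients and the row-major tile layout are all hypothetical, and the single-coefficient IDCT itself is omitted. The scan table is the standard JPEG zigzag sequence.

```c
#include <assert.h>
#include <string.h>

#define TILE_COEFFS 64  /* an 8 x 8 tile of DCT coefficients */

/* Standard JPEG zigzag scan: zigzag[k] is the row-major index of the
 * k-th coefficient visited. */
static const unsigned char zigzag[TILE_COEFFS] = {
     0,  1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

/*
 * Build the first layer approximation of a tile: keep only the first
 * non-zero coefficient in zigzag order and set every other entry to zero.
 * Returns the number of non-zero coefficients in the original tile, so
 * a result greater than one means the tile still needs refining.
 */
int first_layer_approx(const short *coef, short *approx)
{
    int k, nonzero = 0;

    assert(coef != approx);  /* input and output tiles must be distinct */
    memset(approx, 0, TILE_COEFFS * sizeof approx[0]);

    for (k = 0; k < TILE_COEFFS; k++) {
        int idx = zigzag[k];

        if (coef[idx] != 0) {
            if (nonzero == 0)
                approx[idx] = coef[idx];
            nonzero++;
        }
    }
    return nonzero;
}
```

Returning the count alongside the approximation is a design choice made for this sketch: it lets a caller tell, without a second pass over the coefficients, which tiles were decoded exactly by the first layer.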
The second layer of processing refines the picture by recalling all of the tiles which contain more than one non-zero coefficient (footnote 3) and calculating their corresponding picture elements exactly. Figure 6.6 shows an image in which the first layer processing has completed and the decoder has been given enough time to refine roughly half of the tiles in the second layer. In this image, there are a total of 64 × 64 = 4096 tiles, 347 of which contain a single non-zero coefficient. In terms of the types of imprecise computations discussed in section 3.11.3 this version of the decoder is using the milestone method, each layer in the processing representing a milestone.

Figure 6.6: Partially decoded frame: milestone version.

This is to some extent an improvement on only processing part of the picture completely, but the lower areas of the image still need refining. The picture quality could be improved by increasing the number of DCT coefficients processed in the first layer; this would increase the cost of computing the first layer and, correspondingly, increase the minimum amount of processor time below which the decoder can do little useful work.

Footnote 2: In zigzag order.

Footnote 3: Tiles containing only one non-zero coefficient will have been processed completely by the first layer; the "approximation" in these cases being the desired result.

The fundamental problem with the second layer processing as just described is that it refines the picture from left to right and from top to bottom, so some of the extra processing time is spent refining "uninteresting" areas of the image. Recognising this, a second version of the decoder was produced in which the first layer processing sorts tiles into a number of classes according to the number of non-zero coefficients they contain. At the same time, the coordinates of the tile within the resultant image are also recorded.
Second layer processing then refines the tiles starting with those which have the most non-zero coefficients and working down towards those which have the least (footnote 4). Figure 6.7 shows the image which results when the decoder has spent the same amount of time in second layer processing as was spent in figure 6.6.

Figure 6.7: Partially decoded frame: milestone/monotone version.

Footnote 4: These will be the tiles with two non-zero coefficients.

Ordering the processing of the second layer has converted a computation with two milestones (layers) into one which has one milestone (the first layer) and after reaching this milestone becomes monotone. This type of computational requirement is easy to schedule within Nemo by allocating a processor bandwidth which ensures that the first milestone will be met (or exceeded by some required amount), then using any spare processor time to refine the remaining monotone part of the computation.

The aim in choosing the second layer heuristic was to process first those parts of the picture which contain the most information, thus focussing the remaining processor time on those areas of the picture which will benefit from it the most. This is a rather arbitrary heuristic which is seen to work well in practice when applied to a number of different video streams, but there is clearly scope for further work in the development of such heuristics for use in layered processing applications. This is especially the case if the video is stored. While the decoder applies its simple heuristic in real-time as tiles are reconstructed from the input stream, the ability to preprocess a stored video stream offline allows more complex algorithms to determine a (possibly optimal) processing order for the bits which comprise a frame and to present these bits to applications in this order.
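The ordering used by the second version of the decoder can be sketched with a counting sort. The code below is an assumption-laden illustration, not the decoder's implementation: the names tile_info_t and refine_order are hypothetical, and it uses one class per possible coefficient count, whereas the text says only that tiles are sorted into "a number of classes". Tiles with fewer than two non-zero coefficients are left out because the first layer has already decoded them exactly.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NONZERO 64  /* an 8 x 8 tile has at most 64 coefficients */

/* Hypothetical record kept for each tile by the first layer. */
typedef struct {
    unsigned short x, y;    /* tile coordinates within the image */
    unsigned char nonzero;  /* number of non-zero coefficients, 0..64 */
} tile_info_t;

/*
 * Fill order[] with the indices of the tiles which need second layer
 * processing, those with the most non-zero coefficients first.
 * Tiles in the same class keep their original relative order.
 * order[] must have room for n entries.
 * Returns the number of entries written.
 */
size_t refine_order(const tile_info_t *tiles, size_t n, size_t *order)
{
    size_t start[MAX_NONZERO + 1] = {0};
    size_t i, total = 0;
    int c;

    /* Count the tiles in each class. */
    for (i = 0; i < n; i++) {
        assert(tiles[i].nonzero <= MAX_NONZERO);
        start[tiles[i].nonzero]++;
    }

    /* Turn counts into output offsets, richest class first. */
    for (c = MAX_NONZERO; c >= 2; c--) {
        size_t count = start[c];

        start[c] = total;
        total += count;
    }

    /* Place each tile which needs refining at its class offset. */
    for (i = 0; i < n; i++) {
        if (tiles[i].nonzero >= 2)
            order[start[tiles[i].nonzero]++] = i;
    }
    return total;
}
```

A refinement loop would then walk order[] from the front, performing the exact IDCT for each listed tile until its processor time runs out, which gives the monotone behaviour described above: every additional tile processed improves the picture, and the most detailed tiles are improved first.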

6.1.4 User Level Threads

The decoder application does not make full use of the activation mechanism; a more complex application might want to make use of a user-level thread package. In such a case, the activation handler can be written to call into the user level thread scheduler, passing it a pointer to the context which was saved at the time the process last lost the processor. The thread scheduler can decide, based on which external events have been received, whether to resume the context saved in ecx or to save it and resume that of another thread. Whatever the decision, the chosen context can be resumed at the end of the handler by calling sc_rfar.

6.2 Comparison With Related Work

Support for continuous media applications is an active area of research and there is a considerable corpus of related work. The work selected for comparison in the following sections was chosen for its direct relevance to the work presented in previous chapters.

6.2.1 Scheduling

[Coulson93] describes work which aims to provide a set of low level abstractions for programming distributed CM applications and provide them with pre-specified, guaranteed QOS constraints. The basic system support for these abstractions is the Chorus [Bricker91] microkernel, which provides a number of real-time facilities such as page locking, preemptive scheduling, system call timeouts and scheduling classes. It is noted that, while the Chorus microkernel is in use within a number of real-time systems, it does not provide facilities for controlling application QOS or reserving resources to meet QOS guarantees and that, while it is possible to specify thread scheduling constraints relative to other threads, there is no way to specify absolute thread scheduling requirements. Within the system described, thread scheduling uses an EDF policy and does not guarantee that deadlines will be met, so QOS guarantees may be violated when the system becomes overloaded.
The suggestion is made that this could be avoided by using a suitable admission control algorithm. The success of such an algorithm will depend on the amount of information available about the resource requirements of the threads being scheduled. The incorporation of a policing mechanism could help prevent QOS guarantees from being violated when thread resource requirements are underestimated.

[Mercer93] describes the processor capacity reserve mechanism. A reserve represents access to a certain processor capacity, expressed as a computation time and a reservation period. Processes with reserves are given the processor in preference to time sharing processes. Processes present the system with requests for reservations and the system determines whether it can accommodate their requests using a rate monotonic admission test. At the beginning of every reservation period, a process with a reservation is allocated its reserved processor capacity. When this has been consumed, the process is scheduled under the time sharing policy until it receives its next allocation. Accounting for the use of reserves by server threads is performed by passing the client's current reserve to the server which charges its computation time to the client. This resource allocation strategy is similar to that used by Nemo, with processor capacity equating to PB. Relegation of processes which have consumed their reserved capacity to the time sharing scheduling policy loses the ability to focus additional processing time on a favoured process as might be required to take advantage of any of the benefits of statistical multiplexing.

6.2.2 Virtual Processor Interface

Viewing an operating system as a provider of a VPI is a well established concept in computing systems work; [Leffler89] describes the development of the UNIX virtual machine and the modifications which were required to remove race conditions in the signal delivery mechanism and provide signal masking facilities.
This version of UNIX also makes certain kernel information available to applications by mapping their u areas into their address spaces and allowing them read only access to it. The use of threads as a means of obtaining concurrency and clarity has motivated further changes in the VPI, primarily to facilitate the implementation of user level threads.

[Anderson92] describes scheduler activations, which closely resemble kernel threads. Typically, there is one active scheduler activation per physical processor; user level threads are multiplexed on a scheduler activation by the user level thread scheduler. The kernel informs the user level scheduler of scheduling events by allocating a new scheduler activation and upcalling the user level code at a fixed address, passing the context of any blocked activations as arguments. The user level scheduler informs the kernel of user level events by calling into the system. Scheduler activations were implemented on a uniform memory architecture multiprocessor, so among the scheduling events are indications of when a process loses or gains processors and of when a process has idle processors. A process which has a single processor and loses it does not find out that it lost the processor until it is next given another processor, whereupon the kernel allocates a scheduler activation and upcalls the user level scheduler. Nemo provides the same information to a process by activating the process and informing the process that it has been activated because it has just obtained the processor. In the scheduler activation scheme, when a user level scheduler finds itself with an idle processor, it is obliged to surrender that processor by calling into the system; failure to do this results in an unfair usage of system resources by the process. No direct mechanism is provided for limiting the impact of such selfish processes on the rest of the system.
Instead, the reactions of the multilevel feedback scheduler are relied upon to identify the process as computationally bound and prevent it from interfering with more interactive processes. Nemo's policing mechanism can be used to limit the impact of such processes on the rest of the system.

Psyche [Marsh91] provides another example of a virtual processor interface which has been augmented to provide support for large scale user level parallelism. Kernel threads are used to implement virtual processors of which typically one is allocated per physical processor. User level schedulers multiplex threads on top of these virtual processors and are informed of scheduling events by the kernel via virtual processor interrupts. These are generated in response to kernel events including: virtual processor initialisation; threads blocking and unblocking in the kernel; signals from other virtual processors; and an interrupt warning of imminent preemption. User level schedulers and the kernel communicate via a piece of shared memory which is part of the virtual processor interface. The use of shared memory to interface the user and kernel schedulers reduces the number of protection domain crossings and advantage is also taken of this mechanism in Nemo. Of particular interest in Psyche is a "two-minute warning" interrupt which alerts a process that it is about to lose the processor. This aims to provide a virtual processor with a hint so that it can clean up when it is about to lose a physical processor. This is not a guarantee that there will be enough time to complete the clean up, but it is intended to minimise the likelihood of inopportune preemption.

[Govindan91] describes the ACME continuous media I/O server as a typical CM application along with its performance when running under a typical workstation operating system.
The observations are that the application's behaviour suffers from timing errors and lost data when running concurrently with other system activity, and cannot meet the low delay requirements of even moderate audio data formats. It is claimed that these problems are due in part to the overhead of the user/kernel interaction mechanisms by which user level programs invoke system functions such as CPU scheduling and I/O. Split Level Scheduling (SLS) is proposed as an operating system mechanism for supporting CM applications. SLS presents to applications a virtual processor which is implemented as a kernel thread and incorporates time in the form of deadlines into the interface between the Kernel Level Scheduler (KLS) and the User Level Scheduler (ULS). Incorporation of thread deadlines into the virtual processor interface means that it is possible for the kernel to allocate the processor to the address space which contains a runnable thread with the earliest deadline and similar functionality is achieved within Nemo. The SLS interface is quite complex; for example, it allows the kernel to examine the contents of thread descriptor queues and I/O descriptors. This gives rise to synchronisation requirements which are met by enabling the application to disable its preemption from user level code. Interrupts presented by the SLS to the ULS include INT_RESUME which occurs when the address space is given the processor and INT_TIMER which occurs when the address space's software timer expires. The means by which interrupted thread context is made available to the ULS is not specified.

Scheduler activations, Psyche's virtual processors and the Split Level Scheduler all have the common goal of providing an efficient means by which a kernel thread scheduler can communicate with a user level thread scheduler so that the overall cost of providing user level parallelism is reduced.
The incorporation of deadlines into the SLS interface makes this goal more applicable to use within CM applications. While Nemo's virtual machine interface is similar in many respects to these, the reasons for it being so are in many respects quite different. The Nemo kernel does not know about a process's threads; its sole responsibility is to apportion the available processor cycles to processes in the manner dictated by the QOS manager. I/O is performed by device driver processes, and applications wishing to perform I/O communicate directly with those processes via IPC rather than through the kernel. This obviates the problem of what to do when a thread blocks in the kernel because it is waiting for slow I/O to complete. Nemo threads do not enter the kernel as threads do in the usual kernel thread implementations. Threads are scheduled at user level within processes and when a user level thread scheduler decides that it can do no more useful work because all of its threads are blocked or because it must wait until the right time, it surrenders the processor to the kernel using the sc_kernel NTSC call. Execution within the kernel occurs only as part of the kernel itself and is not done as part of a user process's execution. Consequently there is no need for a process to have an associated kernel stack and all the NTSC requires of a process is a place to store the process's context when that process is deactivated. In contrast to the systems reviewed which use the scheduling information provided by virtual processor interrupts solely to implement efficient user level threads, Nemo applications can additionally make direct use of this information to control their QOS as illustrated by the example of section 6.1.1.

6.2.3 QOS

[Coulson93] describes abstractions for providing and maintaining QOS guarantees to distributed CM applications. These mechanisms include rtports which are end points for CM communications and handlers.
Whenever data is sent to an rtport, any associated handler is invoked. It is claimed that real-time programming is simplified by structuring applications to react to events which are generated by the system, and that the use of handlers reduces the number of protection domain crossings required to deliver data to an application; the requirement of having the application call into the system to request data is removed. The functionality provided by rtports and handlers can be implemented directly from the shared memory and event mechanisms provided by Nemo. Since rtports which are bound between address spaces (footnote 5) use the standard Chorus IPC mechanisms, they incur the overhead of copying or remapping their data. Shared memory IPC and an appropriate event mechanism can obviate these. The QOS of data associated with an rtport can be specified as a vector of parameters including: guarantee, the desired degree of certainty with which the requested QOS is to be provided; delay and jitter, which specify the temporal requirements of the rtport's data, and determine the scheduling requirements of the handler. These QOS specifications correspond to the low level QOS parameters described in section 3.2. The discussion does not contain any information on the exact effects of the guarantee parameter or on how the guarantees it offers may be quantified.

[Tokuda92] presents the Capacity Based Session Reservation Protocol (CBSRP), which reserves system resources for CM applications in order to guarantee their QOS. Qualities of CM services are expressed in terms of temporal and spatial resolution: temporal resolution may be mapped to frames per second of video or samples per second for audio and spatial resolution may be mapped to bits per display pixel or maximum spatial frequency resulting from a video compression algorithm.
In their terminology, spatial and temporal resolution are QOS parameters, and are chosen so that they may easily be mapped onto a reasonable set of lower level system attributes such as processor and memory allocations. As a result of this, the user is presented with a selection of QOS classes from which to choose. This definition of a QOS parameter is equivalent to what is referred to as an application or high level QOS parameter in Nemo, and the low level system attributes correspond to system or low level QOS parameters. A possible advantage of identifying system QOS parameters as such is that, if there is no mapping from an application to a system QOS parameter, then QOS may be specified directly in terms of system QOS parameters. The system entities which are involved in the provision of QOS guarantees are the Session Manager (SM), which handles creation, termination and reconfiguration requests from users and renegotiates with remote session managers, and the System Resource Manager (SRM) and Network Resource Manager (NRM), which handle admission control and resource management. The session manager performs the equivalent function to Nemo's QOS manager, the functions of SRM and NRM being performed by equivalent management entities within the system and network domains. [Nicolaou91] makes a strong case for the construction of QOS management facilities as a collection of subsystem management domains such as SRM and NRM which engage in negotiation in terms of QOS parameters. The establishment of interfaces which are well defined in terms of QOS parameters and classes between QOS management subsystems can improve the modularity of the resulting system and, in the case of the pseudo code presented for the SM, would remove the need for a high level entity to have to calculate subsystem specific resource requirements such as MAC layer bandwidth requirements.

Footnote 5: Actors in Chorus terminology.

6.3 Summary

Nemo is evaluated with respect to both its ability to provide the resource management facilities required to support application QOS and the usability of the VPI which it presents to applications. Correct behaviour of the resource management mechanisms is demonstrated, and a video decoder application is presented as an example of how an application can make use of the resource availability information provided to it by the system via the VPI. The work presented in this dissertation is then compared with other, related work. It is shown that, while the facilities provided by the Nemo VPI are similar to those provided in other systems, Nemo differs in its use of the VPI to provide resource availability information for applications to use in maintaining their application-level QOS.