Kernel Structure

Next: Scheduling Up: Structural Overview Previous: Time

Kernel Structure

The Nemesis kernel consists almost entirely of interrupt and trap handlers; there are no kernel threads. When the kernel is entered from a domain due to a system call, a new kernel stack frame is constructed in a fixed (per processor) area of memory; likewise when fielding an interrupt.

Kernel traps are provided to send() events, yield() the processor (with and without timeout) and a set of three variations of a ``return from activation'' system call, rfa(). The return from activation system calls perform various forms of atomic context switches for the user level schedulers. Privileged domains can also register interrupt handlers, mask interrupts and if necessary (on some processors) ask to be placed in a particular processor privileged modes (e.g. to access TLB etc).

On the Alpha AXP implementation, the above calls are all implemented as PAL calls with only the scheduler written in C (for reasons of comprehension).

Nemesis aims to schedule domains with a clear allocation of CPU time according to specifications. In most existing operating systems, the arrival of an interrupt usually causes a task to be scheduled immediately to handle the interrupt, preempting whatever is running. The scheduler itself is usually not involved in this decision; the new task runs as an interrupt service routine.

The for a high interrupt rate device can therefore hog the processor for long periods, since the scheduler itself hardly gets a chance to run, let alone a user process. Such high frequency interruptions can be counter productive; [18] describes a situation where careful prioritising of interrupts led to high throughput, but with most interrupts disabled for a high proportion of the time.

Sensible design of hardware interfaces can alleviate this problem, but devices designed with this behaviour in mind are still rare, and moreover they do not address the fundamental problem: scheduling decisions are being made by the interrupting device and interrupt dispatching code, and not by the system scheduler, effectively bypassing the policing mechanism.

The solution adopted in Nemesis decouples the interrupt itself from the domain which is handling the interrupt source. Device drivers are implemented as privileged domains - they can register an interrupt handler with the system, which is called by the interrupt dispatch code with a minimum of registers saved. This typically clears the condition, masks the source of the interrupt, and sends an event to the domain responsible. This sequence is sufficiently short that it can be ignored from an accounting point of view. For example, the ISR for the LANCE Ethernet driver on the Sandpiper is 12 instructions long.

Sometimes the recipient domain will actually be a specific application domain (e.g. an application which has exclusive access to a device). However, where the recipient domain of the event is a device driver domain providing a (de-)multiplexing function (e.g. demultiplexing Ethernet frame types), this domain is under the control of the based scheduler like any other and can (and does) have resource limits attached.

A significant benefit of the single virtual address space approach for ISRs is that virtual addresses are valid regardless of which domain is currently scheduled. The maintenance of scatter-gather maps to enable devices to DMA data directly to and from virtual memory addresses in client domains is thus greatly simplified.

Next: Scheduling Up: Structural Overview Previous: Time

I. Leslie, D. McAuley, R. Black, T. Roscoe, P. Barham, D. Evers, R. Fairbairns & E. Hyden