Chapter 16 Distributed IPC

Exercises

16-1 Compare and contrast a client-server model of distributed computation with an object model.

Could a pipeline be said to adhere to either model? How would you implement a pipelined compiler in a world based on client-server interactions or object invocations?

Chapter 16 starts with a discussion of client-server and object models. A pipeline does not naturally adhere to either model, since both assume a request-response flow of control rather than a stream. One can implement each stage of a pipeline as a server whose client is the previous stage. Each server component must be programmed to take a request and acknowledge it immediately in a reply (assuming that a reply is mandatory, as in most RPC systems) so as not to delay the previous component while the request is processed.
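The scheme above can be sketched with threads standing in for server processes and a method call standing in for the RPC. All names here (PipelineStage, send_request) are illustrative, not taken from any particular RPC system; the point is only that the reply is sent before, not after, the stage does its work.

```python
# Sketch: each pipeline stage is a "server" that acknowledges a request
# immediately and processes it later, so the upstream stage never waits.
import queue
import threading

class PipelineStage:
    def __init__(self, process, next_stage=None):
        self.process = process          # this stage's transformation
        self.next_stage = next_stage    # downstream server this stage calls
        self.pending = queue.Queue()    # requests accepted but not yet processed
        threading.Thread(target=self._worker, daemon=True).start()

    def send_request(self, block):
        """RPC entry point: acknowledge at once, process asynchronously."""
        self.pending.put(block)
        return "ACK"                    # mandatory reply, sent before processing

    def _worker(self):
        while True:
            block = self.pending.get()
            result = self.process(block)
            if self.next_stage is not None:
                self.next_stage.send_request(result)   # act as client of next stage

# Usage: a two-stage pipeline driven by a client that is never delayed.
sink = []
stage2 = PipelineStage(process=sink.append)
stage1 = PipelineStage(process=str.upper, next_stage=stage2)
assert stage1.send_request("token") == "ACK"   # returns before any processing
```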

16-2 How can the components of a distributed system agree on a value to use for system time? Under what circumstances might such a value be needed?

The nature of distributed systems is that there is no universal time. The components may use a clock synchronisation protocol to keep their clocks reasonably close together. Such a value may be needed when a decision such as "who asked first?" must be made; an example is the distributed mutual exclusion protocol outlined in the appendix. There is no single correct answer to the question, but for the purposes of algorithm design all we have to do is ensure that every component takes the same decision about the order of events. For example, each component uses a timestamp generated from its local clock. There is system-wide agreement on the ordering of the components. No two timestamps from the same component can be the same (by definition). If timestamps deriving from different components are the same, they are resolved on the basis of the agreed component ordering.
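The ordering rule just described amounts to comparing (timestamp, component-id) pairs, a minimal sketch of which follows; the component identifiers stand for the agreed system-wide component ordering.

```python
# Sketch: a total order on events using local timestamps, with ties
# between different components broken by the agreed component ordering.
def event_key(timestamp, component_id):
    """Compare timestamps first; on a tie, compare component ids."""
    return (timestamp, component_id)

# Two requests carry the same timestamp (17) from components 1 and 2:
requests = [(17, 2), (17, 1), (16, 3)]
requests.sort(key=lambda tc: event_key(*tc))

# Every component sorting the same (timestamp, id) pairs takes the same
# decision about "who asked first".
assert requests == [(16, 3), (17, 1), (17, 2)]
```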

16-3 A process on node A invokes an operation on node B in a distributed system.

How might the fact that the nodes, and the network connecting them, may fail independently of each other be taken into account in the design of the distributed invocation at the language level and at the communications level?

Can you think of examples where it is desirable that the clocks at node A and node B should be kept in synchronisation?

At the communications level we use timeouts. A protocol may retry a few times on timeout expiry. On repeated failure a failure return is made to the application.

At the language level we have to be prepared for a failure return.

The language may allow user-defined exceptions and exception handlers.
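The two levels described above can be sketched together; the names (remote_call, RPCFailure) are invented for illustration. The communications level retries a few times on timeout expiry, and on repeated failure the language level sees a user-definable exception for which the application must be prepared.

```python
# Sketch: retry on timeout at the communications level; surface repeated
# failure to the language level as an exception.
import socket

class RPCFailure(Exception):
    """Failure return made to the application after repeated timeouts."""

def remote_call(send_once, retries=3):
    """Communications level: retry a few times on timeout expiry."""
    for _ in range(retries):
        try:
            return send_once()
        except socket.timeout:
            continue                    # timeout expired: retry the call
    raise RPCFailure("no reply after %d attempts" % retries)

# Language level: the caller must be prepared for the failure return.
def always_times_out():
    raise socket.timeout("simulated lost reply")

try:
    result = remote_call(always_times_out)
except RPCFailure:
    result = None                       # exception handler: degrade or abort
assert result is None
assert remote_call(lambda: "reply") == "reply"
```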

Suppose clients and servers synchronise their clocks on service invocation and that an RPC contains a timestamp. If the server crashes and restarts it can distinguish between a new, post-crash, call from a client and a repeat of a pre-crash call. See also exercise 16-10.

16-4 For the styles of IPC studied in Part II, explain how the mechanism might be extended for inter-node IPC.

Consider how the network communication software and the local IPC support might be integrated.

Focus on the naming schemes that are used by the centralised IPC mechanisms. Consider the problem of naming when the mechanisms are extended for distributed IPC.

When might the name of a remote operation be bound to a network address? How could this binding be supported at kernel, local service, or dedicated remote service level?

One approach is to detect a non-local destination in IPC and invoke communications services. This might take place within the kernel or through a user-level process or server which stands in place of all non-local destinations. See, for example, the Accent and Mach kernels.

Another approach is to make local IPC a special case of network communication, as in UNIX BSD, see Section 23.16.

A global naming scheme for IPC destinations is required if IPC is to be distributed. Section 16.9 covers some approaches to providing a naming, registration and binding service and gives the method used by ANSA as an example.
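The first approach above, in which a non-local destination is detected and handed to the communications services, can be sketched under an invented global naming scheme in which a destination name is a (node, port) pair; the names and data structures here are illustrative only.

```python
# Sketch: IPC send that delivers locally when it can and otherwise hands
# the message to a stub standing in place of all non-local destinations.
LOCAL_NODE = "A"
local_ports = {}          # port name -> local message queue (ordinary IPC)
network_log = []          # stand-in for the network communication software

def send(dest, message):
    node, port = dest                  # global name: (node, port)
    if node == LOCAL_NODE:
        local_ports.setdefault(port, []).append(message)   # local delivery
    else:
        network_log.append((dest, message))   # invoke communications service

send(("A", "printer"), "job1")         # local destination
send(("B", "printer"), "job2")         # remote destination, same naming scheme
assert local_ports["printer"] == ["job1"]
assert network_log == [(("B", "printer"), "job2")]
```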

16-5 Security involves both authentication and access control: when you invoke a remote operation you should be able to prove that you are who you say you are (authentication), and there should be a check that you are authorised to invoke that operation on that object (access control). Consider how both of these functions are supported in a centralised, timesharing system and in a single-user workstation. What infrastructure would be needed to support these functions in a distributed system?

In a centralised system you typically authenticate yourself by password when you log in to the system. This is potentially insecure if your password is transmitted as plaintext (without encryption) from your terminal to the host. Also, if you are using a terminal emulator rather than a dumb terminal, someone may have inserted a password-grabbing program.

The method used for authorisation (access control) depends on the type of object you wish to use. Access control lists associated with an object indicate WHO may use it and in what way (and you have proved who you are in a once-for-all authentication at login). Certain objects, such as memory segments, may be created dynamically and associated with a process you have created to run a program. Access control for these is set up in the memory management hardware.

A secure distributed system would have an authentication service with encryption key management and encrypted communication would be used. A full discussion is beyond the scope of this book but a point to emphasise is that most distributed systems are not highly secure. In the context of RPC a danger is that an erroneous or malicious client could waste server time, even if the requests for service were rejected.

Students could be asked to design a registration and "login" service for users of a small localised distributed system.
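A starting point for such a registration and "login" service might be challenge-response authentication against a shared key per user, so the key itself never crosses the network. This is a sketch only; the service and method names are invented, and a real design would add key distribution and encrypted communication as noted above.

```python
# Sketch: a registration service holding one shared key per user, with
# challenge-response login so the key is never transmitted.
import hashlib
import hmac
import os

class AuthService:
    def __init__(self):
        self.keys = {}                  # user -> shared secret key

    def register(self, user, key):
        self.keys[user] = key

    def challenge(self):
        return os.urandom(16)           # fresh nonce per login attempt

    def verify(self, user, challenge, response):
        expected = hmac.new(self.keys[user], challenge, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)

service = AuthService()
service.register("alice", b"alice-secret")

nonce = service.challenge()
# Client side: prove knowledge of the key without sending it.
proof = hmac.new(b"alice-secret", nonce, hashlib.sha256).digest()
assert service.verify("alice", nonce, proof)
assert not service.verify("alice", nonce, b"x" * 32)   # wrong response
```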

16-6 How can compile time type checking be supported in an RPC system?

How can an RPC system check that you are not attempting to call a remote procedure that has been changed and recompiled since your program was compiled and type checked against it?

Interface specifications of remote procedures must be available for compile time type checking. This might be through shared libraries of public service interfaces and those defined by components of a distributed application.

One method is to associate the identifier of the remote interface with the call expression on compilation (see Section 16.7.2), and to pass this identifier with each call at run-time so that it may be checked.

The approach taken by ANSA is given in 16.9.3. Here, the provider of a service exports its interface to a system server (interface trader) and is returned an identifier. Clients import the interface and are given the same identifier which may be checked on an RPC. The provider may later withdraw or change the interface and the client’s identifier will become out of date.
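The export-import-check cycle described above can be sketched as follows; the class and method names are invented stand-ins for the interface trader, not the ANSA interfaces themselves.

```python
# Sketch: a trader issues an identifier when an interface is exported;
# the same identifier is given to importing clients and checked on each
# RPC, so a client bound to a withdrawn or changed interface is rejected.
import itertools

class Trader:
    def __init__(self):
        self._next = itertools.count(1)
        self.current = {}               # interface name -> valid identifier

    def export(self, interface):
        """Provider registers (or re-registers) an interface."""
        ident = next(self._next)
        self.current[interface] = ident
        return ident

    def import_(self, interface):
        """Client obtains the current identifier for the interface."""
        return self.current[interface]

    def check(self, interface, ident):
        """Per-call check that the client's identifier is still current."""
        return self.current.get(interface) == ident

trader = Trader()
trader.export("fileservice")
client_id = trader.import_("fileservice")
assert trader.check("fileservice", client_id)       # call accepted

trader.export("fileservice")                        # interface recompiled
assert not trader.check("fileservice", client_id)   # stale identifier rejected
```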

16-7 Can RPC be used in a heterogeneous environment?

An aim of standards organisations is to make interworking of heterogeneous systems possible. There are standard protocols for remote execution, for example, the ISO REX protocol, and standards for external data representation, for example, ASN.1 (ISO, 1986).

16-8 How could you contrive to program an application in which massive amounts of data are to be transferred according to a "stream" paradigm in a system which supports only RPC for distributed IPC? You may assume that multi-threaded processes and dynamic thread creation are available.

A naive use of an RPC scheme would send a block of data and wait for its acknowledgement, then send another block, and so on. The disadvantage is that the sender may not want to wait for an acknowledgement before sending the next block in sequence. This disadvantage is minimised if very large amounts of data are sent in each RPC, but then the receiver may be able to start work as soon as a small amount of data arrives, and large packets occupy too much buffer space.

If the RPC system allows many threads of a process to use the service at the same time, a number of threads could be used to send a sequence of data blocks. As soon as an acknowledgement arrives for a thread, that thread could send off the next block of data.
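The threaded scheme above can be sketched as follows, where rpc_send stands for the blocking RPC that carries one block and returns its acknowledgement (all names are illustrative). Up to nthreads calls are outstanding at once, and each thread sends its next block as soon as its own acknowledgement arrives.

```python
# Sketch: a pool of sender threads streams a sequence of blocks over a
# blocking RPC, keeping several calls in flight at a time.
import queue
import threading

def stream_via_rpc(blocks, rpc_send, nthreads=4):
    work = queue.Queue()
    for seq, block in enumerate(blocks):
        work.put((seq, block))          # sequence numbers let the receiver
                                        # reassemble the stream in order
    def sender():
        while True:
            try:
                seq, block = work.get_nowait()
            except queue.Empty:
                return                  # all blocks sent
            rpc_send(seq, block)        # blocks only for this call's ack

    threads = [threading.Thread(target=sender) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

# Usage: the "remote" side collects blocks by sequence number.
received = {}
lock = threading.Lock()
def fake_rpc(seq, block):
    with lock:
        received[seq] = block

stream_via_rpc([b"a", b"b", b"c"], fake_rpc, nthreads=2)
assert [received[i] for i in sorted(received)] == [b"a", b"b", b"c"]
```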

16-9 Is distribution transparency fully achievable in an RPC system? What software components can be used to provide this illusion? How could the independent failure modes of system components be handled in an RPC system where distribution transparency was a design aim?

Suppose that at the language level the program is written without distinguishing between local and remote procedure calls. A pre-processor could detect unknown local procedures and, if all is well, insert calls to library routines to invoke remote procedures. A certain amount of exception handling could be done at this level and a call retried, for example, but if a failure has occurred the main program cannot continue. That is, in the absence of failures the scheme as presented will work. If there are failures the program has to deal with them and transparency is violated.

As discussed in 16.7.1 certain arguments do not make sense for a remote call. An RPC design aiming for transparency has to decide what to do about reference parameters and so on.
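The reference-parameter difficulty can be illustrated with a small sketch (the function names are invented): marshalling copies the argument into the request message, so an in-place update at the server is invisible to the caller unless the stub copies the reply back, which is copy-in/copy-out rather than a true reference.

```python
# Sketch: why a reference parameter does not make sense for a remote call.
def remote_increment(marshalled):
    data = list(marshalled)     # server unmarshals into its own copy
    data[0] += 1                # in-place update happens at the server only
    return data                 # copy-out: the result travels in the reply

client_list = [41]
reply = remote_increment(tuple(client_list))   # copy-in: a value crosses, not a reference
assert client_list == [41]     # the caller's variable is unchanged...
client_list[:] = reply         # ...until the client stub copies the reply back
assert client_list == [42]
```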

16-10 Is it desirable that an RPC system should provide support for client, network and server crash and restart? Distinguish between "exactly once" semantics in the absence of crashes and restarts and in their presence. What are the implications at the application level of providing universal "exactly once" semantics? What are the likely implications for the performance of all RPCs?

In general, it is not desirable to attempt to handle crash and restart transparently. One would be setting up a transaction system, supporting atomic RPCs, which is not appropriate at this level.

Exactly-once semantics in the absence of crashes indicates that the RPC communications level will retry on timeout, attempting to make the call succeed in spite of server load or network congestion. Eventually, failure is reported to the application. If the server has crashed, this might have been part-way through a call. Exactly-once semantics in the presence of crashes would require the server, on restarting, to return its state to that before such a call started. This would require the server to support atomic operations by a method such as those described in Chapter 14. It would also require the RPC system to persistently record the calls in progress at all times, in case of crashes.
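Exactly-once in the absence of crashes can be sketched as follows, with invented names: the client tags every retry of a call with the same call identifier, and the server filters duplicates by replaying the saved reply instead of re-executing. The reply table here is volatile, which is precisely why crashes demand the heavier atomic-operation machinery described above.

```python
# Sketch: server-side duplicate filtering gives exactly-once execution
# under client retries, but only while the server stays up.
class Server:
    def __init__(self, operation):
        self.operation = operation
        self.replies = {}               # call id -> saved reply (lost on crash)

    def handle(self, call_id, arg):
        if call_id not in self.replies:         # first attempt: execute
            self.replies[call_id] = self.operation(arg)
        return self.replies[call_id]            # retry: replay saved reply

calls = []
server = Server(operation=lambda x: calls.append(x) or len(calls))

assert server.handle("c1", "deposit") == 1      # original call executes
assert server.handle("c1", "deposit") == 1      # retried duplicate replayed
assert calls == ["deposit"]                     # operation ran exactly once
```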