There are two aspects of load sharing: when to migrate processes to idle nodes
and how to find which nodes are idle. The former is simple to decide; if the
computational capacity of the node is exceeded, then the load sharing algorithm
attempts to migrate processes elsewhere. The latter is rather more involved,
and there are various techniques:
- Centralised approaches
- A central node collects load statistics and
details of processes and decides which should migrate to where at what time. As
with all centralised approaches this is subject to three major drawbacks: the
state information is always out of date, it is a potential bottleneck, and it
is a single point of failure. This is not as critical as for some other
distributed systems tasks, since load sharing is not a correctness criterion
(so long as we have no timeliness constraints),
merely a method of performance enhancement. In addition, techniques have been
proposed for monitoring the state of servers and recovering them to other nodes
in event of failure [#LamANAFI##1#].
- Distributed approaches
- Workstations exchange information and cooperate
in deciding which processes to move. Since there is true concurrency in
distributed systems, there is a need for synchronisation between different
nodes. In view of the additional communications overhead and suboptimality this
is less efficient than the centralised approach. However, it is rather more
robust.
In the distributed approach, various different techniques exist for finding
idle processors. Amongst these are [#GoscinskiDOSTL##1#]:
- A logical hierarchy of processors, as in MICROS [#WittieMADOS##1#].
- A logical ring of processors.
- No particular structure, as in Worms [#ShochTWPEE##1#]
- The Condor system [#LitzkowCAHOI##1#]