Computer Laboratory

Technical reports

Efficient data sharing

Michael Burrows

December 1988, 99 pages

This technical report is based on a dissertation submitted September 1988 by the author for the degree of Doctor of Philosophy to the University of Cambridge, Churchill College.

Abstract

As distributed computing systems become widespread, the sharing of data between people using a large number of computers becomes more important. One of the most popular ways to facilitate this sharing is to provide a common file system, accessible by all the machines on the network. This approach is simple and reasonably effective, but the performance of the system can degrade significantly if the number of machines is increased. By using a hierarchical network, and arranging that machines typically access files stored in the same section of the network it is possible to build very large systems. However, there is still a limit on the number of machines that can share a single file server and a single network effectively.

A good way to decrease network and server load is to cache file data on client machines, so that data need not be fetched from the centralized server each time it is accessed. This technique can improve the performance of a distributed file system and is used in a number of working systems. However, caching brings with it the overhead of maintaining consistency, or cache coherence. That is, each machine in the network must see the same data in its cache, even though one machine may be modifying the data as others are reading it. The problem is to maintain consistency without dramatically increasing the number of messages that must be passed between machines on the network.

Some existing file systems take a probabilistic approach to consistency, some explicitly prevent the activities that can cause inconsistency, while others provide consistency only at the some cost in functionality or performance. In this dissertation, I examine how distributed file systems are typically used, and the degree to which caching might be expected to improve performance. I then describe a new file system that attempts to cache significantly more data than other systems, provides strong consistency guarantees, yet requires few additional messages for cache management.

This new file-system provides fine-grain sharing of a file concurrently open on multiple machines on the network, at the granularity of a single byte. It uses a simple system of multiple-reader, single writer locks held in a centralized server to ensure cache consistency. The problem of maintaining client state in a centralized server are solved by using efficient data structures and crash recovery techniques.

Full text

PDF (6.0 MB)

BibTeX record

@TechReport{UCAM-CL-TR-153,
  author =	 {Burrows, Michael},
  title = 	 {{Efficient data sharing}},
  year = 	 1988,
  month = 	 dec,
  url = 	 {http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-153.pdf},
  institution =  {University of Cambridge, Computer Laboratory},
  number = 	 {UCAM-CL-TR-153}
}