Department of Computer Science and Technology

Technical reports

Management of replicated data in large scale systems

Noha Adly

November 1995, 182 pages

This technical report is based on a dissertation submitted August 1995 by the author for the degree of Doctor of Philosophy to the University of Cambridge, Corpus Christi College.

DOI: https://doi.org/10.48456/tr-383

Abstract

Data is replicated in distributed systems to improve system availability and performance. In recent years, the growth of internetworks and distributed applications has increased the need for large scale replicated systems. However, existing replication protocols do not address scale and autonomy issues adequately. Further, current applications require different degrees of consistency, and therefore they should be given the ability to choose the level of consistency that is appropriate for their particular semantics. This dissertation presents a new scalable replication protocol (HARP) that is based on organising the replicas into a logical hierarchy. It is argued that adopting a hierarchical structure allows localised communication to be exploited, which is taken as the key to achieving scalability. Moreover, it gives the ability to provide different degrees of consistency.

HARP provides an efficient and scalable propagation scheme where each node needs to communicate with a few nodes only while ensuring reliable delivery. A new service interface is proposed that gives the application the flexibility to choose between strong and weak consistency. Further the scheme provides the ability to offer different levels of staleness, depending on the needs of various applications. Dynamic restructuring operations are presented which allow the hierarchy to be built and reconfigured, including the restarting of failed nodes and re-merging partitioned networks. The operations produce low message traffic by exploiting localised communication, and do not disturb normal operations. This is achieved while ensuring no loss of messages.
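The propagation scheme described above can be illustrated with a minimal sketch. This is not the HARP wire protocol (which the full report specifies); it only shows the structural idea that each replica forwards updates to its immediate neighbours in the hierarchy, so no node ever needs to contact the whole system:

```python
# Illustrative sketch of hierarchical propagation (not the actual HARP
# protocol): each node communicates only with its parent and children,
# yet an update originating anywhere reaches every replica.

class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.log = []                     # updates delivered at this replica
        if parent is not None:
            parent.children.append(self)

    def receive(self, update, sender=None):
        if update in self.log:            # already delivered; stop forwarding
            return
        self.log.append(update)
        # Forward only to immediate neighbours, never system-wide.
        for peer in [self.parent, *self.children]:
            if peer is not None and peer is not sender:
                peer.receive(update, sender=self)

# Build a small three-level hierarchy and inject an update at a leaf.
root = Node("root")
mid = Node("mid", parent=root)
leaf_a = Node("a", parent=mid)
leaf_b = Node("b", parent=mid)
leaf_a.receive("update-1")
```

After the call, every node's log contains `update-1`, although each node exchanged messages with at most its parent and children.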

Reconciliation methods based on delivery order mechanisms are provided to resolve temporary inconsistencies, and an application can choose among them. A new algorithm that supports causal order delivery is proposed. The desirable characteristic of the algorithm is that, by relying on the hierarchical propagation of HARP, it significantly cuts down the size of the timestamp required to verify causality, and thus enhances scalability.
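For context, the standard causal-delivery test on vector clocks is sketched below; this is the textbook mechanism whose timestamp grows with the number of replicas, which is the cost HARP's hierarchical algorithm reduces by keeping vectors scoped to a small neighbourhood rather than the whole system. The sketch shows the flat baseline, not HARP's reduced-timestamp algorithm itself:

```python
# Textbook causal-delivery check on vector clocks: a message is deliverable
# when it is the next expected message from its sender and everything the
# sender had previously seen has already been delivered locally. In a flat
# scheme the vector carries one counter per replica in the system.

def can_deliver(msg_vc, sender, local_vc):
    for node, count in msg_vc.items():
        if node == sender:
            # Must be exactly the next message from the sender.
            if count != local_vc.get(node, 0) + 1:
                return False
        elif count > local_vc.get(node, 0):
            # Depends on a message from another node not yet delivered here.
            return False
    return True

# A receiver that has delivered one message from "a" and none from "b":
local = {"a": 1, "b": 0}
can_deliver({"a": 2, "b": 0}, "a", local)   # next from a: deliverable
can_deliver({"a": 2, "b": 1}, "a", local)   # depends on an unseen b message
```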

A detailed simulation study was carried out to evaluate the performance of HARP and to quantify the benefits and losses resulting from moving from strong consistency to weak consistency under different system configurations and load mixes. Further, a simulation study was conducted to compare the performance of HARP with that of another weak consistency replication protocol, Timestamped Anti-Entropy.

An alternative hierarchical propagation protocol, called HPP, is proposed as an optimisation of HARP. The main difference between HPP and HARP is that HPP avoids the exchange of global state information when reconfiguration or failures occur. Therefore HPP is more scalable; however, it can tolerate only special patterns of failure. The protocol is presented in detail and its strengths and limitations are analysed.

Full text

PDF (18.2 MB)

BibTeX record

@TechReport{UCAM-CL-TR-383,
  author =	 {Adly, Noha},
  title = 	 {{Management of replicated data in large scale systems}},
  year = 	 1995,
  month = 	 nov,
  url = 	 {https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-383.pdf},
  institution =  {University of Cambridge, Computer Laboratory},
  doi = 	 {10.48456/tr-383},
  number = 	 {UCAM-CL-TR-383}
}