Computer Laboratory - 17th June 2004



Alternatives for Detecting Redundancy in Storage Systems Data

Calicrates Policroniades
Abstract

Storage systems frequently maintain identical copies of data. Identifying these copies can help in designing solutions that optimise data storage, transmission, and management. This talk presents an evaluation of three methods used to discover data redundancy: whole-file content hashing, fixed-size blocking, and a chunking strategy that uses Rabin fingerprints to delimit content-defined data chunks. The data sets analysed include a mirrored section of sunsite.org.uk, different data profiles in the file system infrastructure of the Computer Laboratory, and source code distributions. Experimental results and a comparative analysis of these methods will be presented.
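
The three approaches named in the abstract can be illustrated with a minimal Python sketch. This is not the speaker's implementation: the function names, the SHA-1 digest, the block size, the window size, and the boundary mask are all assumptions chosen for illustration. In particular, the chunker uses a simple polynomial rolling hash as a stand-in for a true Rabin fingerprint, which is computed with polynomial arithmetic over GF(2).

```python
import hashlib


def whole_file_digest(data: bytes) -> str:
    """Whole-file hashing: one digest per file; only exact copies match."""
    return hashlib.sha1(data).hexdigest()


def fixed_blocks(data: bytes, size: int = 4096) -> list:
    """Fixed-size blocking: cut at every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data: bytes, window: int = 48, mask: int = 0x3FF,
                           min_size: int = 64, max_size: int = 8192) -> list:
    """Content-defined chunking with a rolling hash over the last `window` bytes.

    Assumption: a polynomial rolling hash stands in for a Rabin fingerprint.
    A boundary is declared where the low bits of the hash equal `mask`,
    subject to minimum and maximum chunk sizes.
    """
    base, mod = 257, (1 << 31) - 1
    top = pow(base, window - 1, mod)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        offset = i - start
        if offset >= window:
            # Slide the window: drop the oldest byte's contribution.
            h = (h - data[i - window] * top) % mod
        h = (h * base + byte) % mod
        length = offset + 1
        if (length >= min_size and (h & mask) == mask) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def digests(pieces) -> set:
    """Set of SHA-1 digests, used to count pieces shared between two inputs."""
    return {hashlib.sha1(p).hexdigest() for p in pieces}
```

Comparing `digests(fixed_blocks(a)) & digests(fixed_blocks(b))` with the equivalent expression for `content_defined_chunks` gives a rough measure of how much redundancy each method detects between two inputs `a` and `b`. Because content-defined boundaries depend on the bytes themselves rather than on absolute offsets, an insertion near the start of a file tends to disturb only the nearby chunks, whereas it shifts every subsequent fixed-size block.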



© 2004 University of Cambridge Computer Laboratory
Please send any comments to julian.chesterfield@cl.cam.ac.uk
Page last updated on 12-Jul-2004 at 13:04 by Julian Chesterfield