17th June 2004
NetOS Seminar, Systems Research Group, Computer Laboratory, University of Cambridge

Alternatives for Detecting Redundancy in Storage Systems Data

Calicrates Policroniades
Abstract

Storage systems frequently maintain identical copies of data. Identifying this redundant data can assist in designing solutions that optimise data storage, transmission, and management. This talk presents an evaluation of three methods for discovering data redundancy: whole-file content hashing, fixed-size blocking, and a chunking strategy that uses Rabin fingerprints to delimit content-defined data chunks. The data sets analysed include a mirrored section of sunsite.org.uk, different data profiles in the file system infrastructure of the Computer Laboratory, and source code distributions. Experimental results and a comparative analysis of these methods will be presented.
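
The three methods named in the abstract can be illustrated with a minimal Python sketch. It is not the speaker's implementation: all function names, the choice of SHA-1 as the content hash, and the window, mask, and size parameters are assumptions made for illustration. In particular, the chunker uses a simple polynomial rolling hash as a stand-in for a true Rabin fingerprint, which is computed with polynomial arithmetic over GF(2).

```python
import hashlib

BASE = 257            # assumed rolling-hash base (illustrative)
MOD = (1 << 31) - 1   # assumed rolling-hash modulus (illustrative)


def whole_file_digest(data: bytes) -> str:
    """Whole-file content hashing: one digest for the entire file."""
    return hashlib.sha1(data).hexdigest()


def fixed_blocks(data: bytes, block_size: int = 4096) -> list:
    """Fixed-size blocking: cut the data at fixed offsets."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def content_defined_chunks(data: bytes, window: int = 16, mask: int = 0x3F,
                           min_size: int = 32, max_size: int = 1024) -> list:
    """Content-defined chunking: end a chunk where the hash of the last
    `window` bytes matches a bit pattern, so boundaries follow content
    rather than file offsets. Simplified stand-in for Rabin fingerprints."""
    top = pow(BASE, window - 1, MOD)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i in range(len(data)):
        if i >= window:
            # Remove the contribution of the byte that slides out of the window.
            h = (h - data[i - window] * top) % MOD
        h = (h * BASE + data[i]) % MOD
        size = i + 1 - start
        if size >= max_size or (size >= min_size and (h & mask) == mask):
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def duplicate_fraction(pieces: list) -> float:
    """Fraction of bytes that sit in pieces whose digest was already seen."""
    seen, duplicate, total = set(), 0, 0
    for piece in pieces:
        total += len(piece)
        digest = hashlib.sha1(piece).digest()
        if digest in seen:
            duplicate += len(piece)
        else:
            seen.add(digest)
    return duplicate / total if total else 0.0
```

Passing the output of `fixed_blocks` or `content_defined_chunks` for a collection of files to `duplicate_fraction` gives a rough redundancy estimate for each method. Whole-file hashing only detects files that are identical in full, fixed-size blocking tends to lose matches once data shifts by a few bytes, and content-defined boundaries are intended to realign after such shifts.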