Alternatives for Detecting Redundancy in Storage Systems Data
Abstract: Storage systems frequently maintain identical copies of data.
Identifying this redundancy can assist in designing solutions that optimise
data storage, transmission, and management. This talk presents the
evaluation of three methods used to discover data redundancy: whole-file
content hashing, fixed-size blocking, and a chunking strategy that uses
Rabin fingerprints to delimit content-defined data chunks. Data sets such
as a mirrored section of sunsite.org.uk, different data profiles in the
file system infrastructure of the Computer Laboratory, and source code
distributions were analysed. Experimental results and a comparative
analysis of these methods will be presented.
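
As a rough illustration of the three methods named above, the following Python sketch shows how each one splits data into units whose digests can be compared. It is not the implementation evaluated in the talk. The function names, the 4096-byte block size, the 16-byte window, and the boundary mask are all assumptions chosen for illustration, and a simple polynomial rolling hash stands in for a true Rabin fingerprint (which uses polynomial arithmetic over GF(2)).

```python
import hashlib
from typing import Iterator, Set


def whole_file_digest(data: bytes) -> str:
    """Whole-file content hashing: one digest per file."""
    return hashlib.sha1(data).hexdigest()


def fixed_size_blocks(data: bytes, block_size: int = 4096) -> Iterator[bytes]:
    """Fixed-size blocking: cut at every multiple of block_size (assumed value)."""
    for offset in range(0, len(data), block_size):
        yield data[offset:offset + block_size]


def content_defined_chunks(
    data: bytes,
    window: int = 16,        # assumed sliding-window width in bytes
    mask: int = 0x3F,        # assumed mask: a boundary roughly every 64 bytes
    base: int = 257,
    mod: int = (1 << 31) - 1,
) -> Iterator[bytes]:
    """Content-defined chunking with a rolling hash over the last `window` bytes.

    A polynomial rolling hash is used here as a stand-in for Rabin
    fingerprints. A chunk ends wherever the low bits of the hash are zero,
    so boundaries depend on local content rather than on byte offsets.
    """
    top = pow(base, window - 1, mod)  # weight of the oldest byte in the window
    start = 0
    h = 0
    for i, byte in enumerate(data):
        if i - start >= window:
            # The window is full: drop the byte that is sliding out.
            h = (h - data[i - window] * top) % mod
        h = (h * base + byte) % mod
        if i - start + 1 >= window and (h & mask) == 0:
            yield data[start:i + 1]
            start = i + 1
            h = 0
    if start < len(data):
        yield data[start:]  # trailing bytes that never hit a boundary


def digests(units: Iterator[bytes]) -> Set[str]:
    """Set of SHA-1 digests, one per unit; equal digests indicate redundancy."""
    return {hashlib.sha1(u).hexdigest() for u in units}
```

Comparing the digest sets of two inputs gives a crude measure of shared content. Under these assumptions, inserting a single byte at the start of a file changes its whole-file digest and shifts every fixed-size block, whereas most content-defined chunks after the insertion point keep their boundaries and therefore their digests.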