Dictionary Approaches to Compression

Next: Continuous Data: Sample Rates Up: Lossless Data Compression Previous: Huffman Compression

Dictionary Approaches to Compression

A completely different approach is to look at the data as it arrives and form a dictionary on the fly. As the dictionary is formed, it can be used to look up new input, dynamically, and if the new input existed earlier in the stream, the dictionary position can be transmitted instead of the new input codes. These schemes are known as ``substitutional'' compression algorithms, and there are two patented families of schemes invented by J Ziv and A Lempel in the 1970s that cover a broad class of the ideas here.

Essentially, the dictionary is constructed as a data structure that holds strings of symbols that are found in the input data, together with short bitstring entry numbers. Whenever an entry is not found, it is added to the dictionary in a new position, and the new position and string sent. This means that the dictionary is constructed at the receiver dynamically, so that there is no need to carry out statistics or share a table separately.

A second family of Lempel-Ziv dictionary based compression schems is based on the idea of a sliding window over the input text. The compression algorithm consists of searching for substrings ahead in the text, in the current window. This approach constrains the size of the dictionary, which could otherwise grow in an unbounded way.

Next: Continuous Data: Sample Rates Up: Lossless Data Compression Previous: Huffman Compression

Jon CROWCROFT
1998-12-03