Download (direct link):
There is a mistaken impression that the LZ coding works for text compression only. In lossless mode, compressing images can be the same as compressing text. The popular image format, TIFF (Tag Image File Format), supports LZ coding. LZ coding techniques are widely used in dithered binary images as well.
2.5.1 The LZ77 Algorithm
LZ77 is the first form of Ziv-Lempel coding proposed by Ziv and Lempel in 1977 . In this approach, a fixed-size buffer containing a previously
ZIV-LEMPEL CODING 45
encoded character sequence that precedes the current coding position can be considered as a dictionary. The encoder matches the input sequence through a sliding window as shown in Figure 2.7. The window is divided into two parts: a search window that consists of an already encoded character sequence and a lookahead buffer that contains the character sequence to be encoded.
Search Window Lookahead Buffer
/r 7 6 5 4 3 2 rs /
b a a b I a c b; a a c b c d b c d b c a c ...
a c b a a c b c d b c d b c a c ...
c b a a c : b c —~ b C; d l b c i a c ...
Fig. 2.7 \JL11 coding: An example with sliding window.
To encode the sequence in the lookahead buffer, the search window is searched to find the longest match with a prefix of the lookahead buffer. The match can overlap with the lookahead buffer, and obviously cannot be the buffer itself. Once the longest match is found, it is coded into a triple <offset, length, C(char) >, where offset is the distance of the first character of the longest match in the search window from the lookahead buffer, length is the length of the match, and C(char) is the codeword of the symbol char that follows the match in the lookahead buffer. The window is shifted left by length + 1 symbols to begin the next search.
184.108.40.206 Example—LZ77 Algorithm: Consider the sequence to be encoded is baabacbaacbcdbcdbcac •••. We assume that the size of the search window is 8 and that of the lookahead buffer is 6. Assume that the substring baabacba in the search window has already been encoded and the substring acbcdb in the lookahead buffer is to be encoded as shown in Figure 2.7(a). After searching the search window, we find that the longest match found is the substring acb of length 3 at distance 4 from the lookahead buffer. The character following the prefix acb in lookahead buffer is c. Hence the triple to output is < 4, 3, C(c) >, where C(c) is the codeword for the character c. Since the match length is 3, we shift the window left by 4 characters.
46 SOURCE CODING ALGORITHMS
Now the first character in the lookahead buffer is d as shown in Figure 2.7(b) and there is no match for d in the search window. Hence we output the triple < 0, 0, C(d) > and shift the sliding window by one.
Now the longest match in the sliding window is the substring bcdbc as shown in Figure 2.7(c). It should be noted that the matching substring starts in character position 3 in the search window and overlaps with the first two characters bcdbc in the lookahead buffer. Hence we output the triple <
3, 5, C(a) > and shift the sliding window left by 6 characters and continue.
There are many variations of LZ77 coding. The popular compression softwares Zip and PKZip both use a variation of the LZ77 coding scheme called the LZSS coding scheme . Use of the codeword for an explicit character in a triple in LZ77 is wasteful in practice because it could often be included as part of the next pointer. This inefficiency has been reduced by the LZSS coding scheme simply by adding a flag bit to indicate whether what follows the pointer is the codeword of a single character. As a result, we need to send only a pair of values corresponding to the offset and the length of the match instead of the triple.
2.5.2 The LZ78 Algorithm
LZ78 is the other key algorithm in the LZ family proposed by Ziv and Lempel in 1978 . Instead of using the previously encoded sequence of symbols (or string) in the sliding window as the implicit dictionary, the LZ78 algorithm explicitly builds a dictionary of patterns dynamically at both the encoder and the decoder. The encoder searches this dictionary to find the longest pattern that matches with the prefix of the input string and encodes it as a pair
< i, C(S) >, where i is the index of the matched pattern in the dictionary and C(S) is the codeword of the first symbol following the matched portion of the input. A new entry is then added in the dictionary. The new entry is the matched pattern concatenated by the symbol S. The codeword C(S) is usually a Huffman-type variable-length code of the symbol S.
In order to achieve further compression, the index i in the output pair can be encoded using some Huffman-type variable-length binary encoding in order to achieve better compression by exploiting the statistics of the indexes. But for the sake of simplicity of explanation, we avoid it here.
220.127.116.11 Example—LZ78 Encoding: Consider the following sequence: