Download (direct link):
Note that none of the codes in column A or in column B is a prefix of any other code in the corresponding column. The codes formed using either column A or column B are uniquely decipherable. On the other hand, binary code of a in column C is a prefix of both the binary codes of 7 and 5.
Some of the popular variable-length coding techniques are Shannon-Fano Coding , Huffman Coding , Elias Coding , Arithmetic Coding , etc. It should be noted that the fixed-length codes can be treated as a special case of uniquely decipherable variable-length code.
1.4 CLASSIFICATION OF COMPRESSION ALGORITHMS
In an abstract sense, we can describe data compression as a method that takes an input data D and generates a shorter representation of the data c(D) with a fewer number of bits compared to that of D. The reverse process is called decompression, which takes the compressed data c(D) and generates or reconstructs the data D' as shown in Figure 1.1. Sometimes the compression (coding) and decompression (decoding) systems together are called a “CODEC,” as shown in the broken box in Figure 1.1.
Input Data ; Compression Compressed Data Decompression ! Data
D : System c(D) System i D'
Fig. 1.1 CODEC.
10 INTRODUCTION TO DATA COMPRESSION
The reconstructed data D' could be identical to the original data D or it could be an approximation of the original data D, depending on the reconstruction requirements. If the reconstructed data D' is an exact replica of the original data D, we call the algorithm applied to compress D and decompress c(D) to be lossless. On the other hand, we say the algorithms are lossy when D' is not an exact replica of D. Hence as far as the reversibility of the original data is concerned, the data compression algorithms can be broadly classified in two categories — lossless and lossy . Usually we need to apply lossless data compression techniques on text data or scientific data. For example, we cannot afford to compress the electronic copy of this text book using a lossy compression technique. It is expected that we shall reconstruct the same text after the decompression process. A small error in the reconstructed text can have a completely different meaning. We do not expect the sentence “You should not delete this file” in a text to change to “You should now delete this file” as a result of an error introduced by a lossy compression or decompression algorithm. Similarly, if we compress a huge ASCII file containing a program written in C language, for example, we expect to get back the same C code after decompression because of obvious reasons. The lossy compression techniques are usually applicable to data where high fidelity of reconstructed data is not required for perception by the human perceptual system. Examples of such types of data are image, video, graphics, speech, audio, etc. Some image compression applications may require the compression scheme to be lossless (i.e., each pixel of the decompressed image should be exactly identical to the original one). Medical imaging is an example of such an application where compressing digital radiographs with a lossy scheme could be a disaster if it has to make any compromises with the diagnostic accuracy. Similar observations are true for astronomical images for galaxies and stars.
Sometimes we talk about perceptual lossless compression schemes when we can compromise with introducing some amount of loss into the reconstructed image as long as there is no perceptual difference between the reconstructed data and the original data, if the human perceptual system is the ultimate judge of the fidelity of the reconstructed data. For example, it is hardly noticeable by human eyes if there is any small relative change among the neighboring pixel values in a smooth non-edge region in a natural image.
In this context, we need to mention that sometimes data compression is referred as coding in the literature. The terms noiseless and noisy coding, in the literature, usually refer to lossless and lossy compression techniques respectively. The term “noise” here is the “error of reconstruction” in the lossy compression techniques because the reconstructed data item is not identical to the original one. Throughout this book we shall use lossless and lossy compression in place of noiseless and noisy coding respectively.
Data compression schemes could be static or dynamic. In static methods, the mapping from a set of messages (data or signal) to the corresponding set of compressed codes is always fixed. In dynamic methods, the mapping from the set of messages to the set of compressed codes changes over time. A
A DATA COMPRESSION MODEL 11
dynamic method is called adaptive if the codes adapt to changes in ensemble characteristics over time. For example, if the probabilities of occurrences of the symbols from the source are not fixed over time, we can adaptively formulate the binary codewords of the symbols, so that the compressed file size can adaptively change for better compression efficiency.