# Acharya T.

ISBN 0-471-48422-9


2.2 HUFFMAN CODING

From Shannon's source coding theorem, we know that a source can be coded with an average code length close to the entropy of the source. In 1952, D. A. Huffman [1] invented a coding technique that produces the shortest possible average code length given the source symbol set and the associated probabilities of occurrence of the symbols. Codes generated using this technique are popularly known as Huffman codes. The Huffman coding technique is based on the following two observations regarding optimum prefix codes.


• The more frequently occurring symbols are assigned shorter codewords than the less frequently occurring symbols.

• The two least frequently occurring symbols have codewords of the same length, and they differ only in the least significant bit.

The average length of these codes is close to the entropy of the source.
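To make this claim concrete, the sketch below computes the entropy of the eight-symbol source used in Example 1 later in this section (the probability values are taken from that example). Entropy is a lower bound on the average code length of any uniquely decodable code:

```python
import math

# Symbol probabilities of the eight-symbol source from Example 1.
probs = {'a': 0.30, 'b': 0.10, 'c': 0.20, 'd': 0.06,
         'e': 0.09, 'f': 0.07, 'g': 0.03, 'h': 0.15}

# Shannon entropy H = -sum(p * log2 p), in bits per symbol.
entropy = -sum(p * math.log2(p) for p in probs.values())
print(f"H = {entropy:.4f} bits/symbol")  # about 2.70 bits/symbol
```

The entropy comes out to about 2.70 bits per symbol, very close to the 2.75 bits per symbol that the Huffman code of Example 1 achieves.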

Let us assume that there are m source symbols {s1, s2, ..., sm} with associated probabilities of occurrence {p1, p2, ..., pm}. Using these probability values, we can generate a set of Huffman codes of the source symbols. The Huffman codes can be mapped into a binary tree, popularly known as the Huffman tree. We describe below the algorithm to generate the Huffman tree and hence the Huffman codes of the source symbols. We show a Huffman tree in Figure 2.1.

1. Produce a set N = {N1, N2, ..., Nm} of m nodes as leaves of a binary tree. Assign node Ni to the source symbol si, i = 1, 2, ..., m, and label the node with the associated probability pi.

(Example: As shown in Figure 2.1, we start with eight nodes N0, N1, N2, N3, N4, N5, N6, N7 corresponding to the eight source symbols a, b, c, d, e, f, g, h, respectively. The probability of occurrence of each symbol is indicated in the associated parentheses.)

2. Find the two nodes with the two lowest probability symbols from the current node set, and produce a new node as a parent of these two nodes.

(Example: From Figure 2.1 we find that the two lowest probability symbols g and d are associated with nodes N6 and N3 respectively. The new node N8 becomes the parent of N3 and N6.)

3. Label the probability of this new parent node as the sum of the probabilities of its two child nodes.

(Example: The new node N8 is now labeled with probability 0.09, which is the sum of the probabilities 0.06 and 0.03 of the symbols d and g associated with the nodes N3 and N6 respectively.)

4. Label the branch of one child node of the new parent node as 1 and the branch of the other child node as 0.

(Example: The branch from N3 to N8 is labeled 1 and the branch from N6 to N8 is labeled 0.)

5. Update the node set by replacing the two child nodes with the smallest probabilities by the newly generated parent node. If the number of nodes remaining in the node set is greater than 1, go to Step 2.

(Example: The new node set now contains the nodes N0, N1, N2, N4, N5, N7, N8 with associated probabilities 0.30, 0.10, 0.20, 0.09, 0.07, 0.15, 0.09, respectively. Since there is more than one node in the node set, Steps 2 to 5 are repeated and the nodes N9, N10, N11, N12, N13, N14 are generated in the next six iterations, until the node set consists only of N14.)

6. Traverse the generated binary tree from the root node to each leaf node Ni, i = 1, 2, ..., m, to produce the codeword of the corresponding symbol si, which is the concatenation of the binary labels (0 or 1) of the branches from the root to the leaf node.

(Example: The Huffman code of symbol h is 110, formed by concatenating the binary labels of the branches N14 to N13, N13 to N11, and N11 to N7.)

Any ensemble of binary codes that can be mapped into a binary tree consists of prefix codes; hence the Huffman code is also a prefix code. The Huffman code generation process described above is a bottom-up approach, since we begin the code construction process from the two symbols with the least probabilities.
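Steps 1 through 6 above can be sketched in code. The version below (a minimal illustration, not the book's own implementation) uses a priority queue to repeatedly merge the two lowest-probability nodes, prepending a 0 or 1 branch label at each merge; exact codewords depend on how ties between equal probabilities are broken, but any valid Huffman code achieves the same minimum average length:

```python
import heapq
import itertools

def huffman_codes(probs):
    """Build Huffman codes by the bottom-up merging of Steps 1-5.

    probs: dict mapping symbol -> probability of occurrence.
    Returns a dict mapping symbol -> binary codeword string.
    """
    counter = itertools.count()  # tie-breaker so heap tuples always compare
    # Each heap entry: (probability, tie_breaker, subtree), where a
    # subtree is a list of (symbol, codeword-so-far) pairs (Step 1).
    heap = [(p, next(counter), [(sym, '')]) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Step 2: extract the two lowest-probability nodes.
        p0, _, left = heapq.heappop(heap)
        p1, _, right = heapq.heappop(heap)
        # Step 4: prepend the branch label 0/1 to each side's codewords.
        merged = ([(sym, '0' + code) for sym, code in left] +
                  [(sym, '1' + code) for sym, code in right])
        # Steps 3 and 5: the new parent carries the summed probability.
        heapq.heappush(heap, (p0 + p1, next(counter), merged))
    return dict(heap[0][2])  # Step 6: codewords read root-to-leaf

probs = {'a': 0.30, 'b': 0.10, 'c': 0.20, 'd': 0.06,
         'e': 0.09, 'f': 0.07, 'g': 0.03, 'h': 0.15}
codes = huffman_codes(probs)
for sym in sorted(codes):
    print(sym, codes[sym])
```

Running this on the probabilities of Example 1 yields codeword lengths of 2 bits for a and c, 3 bits for b and h, and 4 bits for d, e, f, and g, giving an average length of 2.75 bits per symbol; since every codeword is a root-to-leaf path, no codeword is a prefix of another.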

2.2.0.1 Example 1: Assume the alphabet S = {a, b, c, d, e, f, g, h} with 8 source symbols whose corresponding probabilities are p(a) = 0.30, p(b) = 0.10, p(c) = 0.20, p(d) = 0.06, p(e) = 0.09, p(f) = 0.07, p(g) = 0.03, and p(h) = 0.15, respectively. The Huffman tree generated by the Huffman coding algorithm is shown in Figure 2.1 and the corresponding Huffman code table is shown in Table 2.1.

Let us consider a string M of 200 symbols generated from the above source, where the numbers of occurrences of a, b, c, d, e, f, g, and h in M are 60, 20, 40, 12, 18, 14, 6, and 30 respectively. The size of message M encoded using the Huffman codes in Table 2.1 will be 550 bits, which requires 2.75 bits per symbol on average. On the other hand, the length of the encoded message M will be 600 bits if it is encoded with a fixed-length code of 3 bits for each symbol. This simple example demonstrates how we can achieve compression using variable-length coding or source coding techniques.
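The bit counts above are easy to verify. The sketch below uses codeword lengths consistent with the Huffman tree of Figure 2.1 (the lengths are an assumption here, since Table 2.1 itself is not reproduced in this excerpt):

```python
# Occurrence counts in the 200-symbol message M (from the text).
counts = {'a': 60, 'b': 20, 'c': 40, 'd': 12,
          'e': 18, 'f': 14, 'g': 6, 'h': 30}

# Codeword lengths consistent with the Huffman tree of Figure 2.1
# (assumed, as Table 2.1 is not reproduced in this excerpt).
code_len = {'a': 2, 'b': 3, 'c': 2, 'd': 4,
            'e': 4, 'f': 4, 'g': 4, 'h': 3}

huffman_bits = sum(counts[s] * code_len[s] for s in counts)
fixed_bits = sum(counts.values()) * 3  # 3-bit fixed-length code

print(huffman_bits)        # 550 bits
print(huffman_bits / 200)  # 2.75 bits per symbol on average
print(fixed_bits)          # 600 bits
```

The Huffman-encoded message saves 50 bits over the fixed-length encoding, matching the figures quoted in the text.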
