# - Acharya T.

ISBN 0-471-48422-9


$$N + \frac{N}{2} + \frac{N}{4} + \cdots + \frac{N}{2^{L-1}} < 2N,$$

where N is the number of samples in the input signal.
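The bound can be checked numerically. The following is a minimal sketch, assuming that N is divisible by every power of two involved; the function name `total_samples` and the parameter `levels` are illustrative and do not come from the text.

```python
def total_samples(n: int, levels: int) -> int:
    """Samples processed over all levels: N + N/2 + ... + N/2^(levels-1).

    Assumes n is divisible by 2^(levels-1), so each shift is an exact halving.
    """
    return sum(n >> i for i in range(levels))


if __name__ == "__main__":
    n = 1024
    for levels in range(1, 6):
        total = total_samples(n, levels)
        # The geometric series never reaches 2N, however many levels are used.
        print(levels, total, total < 2 * n)
```

Because each level handles half as many samples as the one before it, the running total approaches 2N but never reaches it.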

Most traditional DWT architectures compute the second level of decomposition only after the first level has completed, and so on. Hence the ith level of decomposition is performed at stage i of the recursion, after completion of the (i - 1)th level. However, the number of samples to be processed at each level is always half the number processed at the previous level. As a result, it is possible to process multiple levels of decomposition simultaneously. This is the basic principle of the recursive architecture for DWT computation, which was first proposed for a convolution-based DWT in [18]. Later the same principle was applied by Liao, Cockburn, and Mandal [34, 35] to develop a recursive architecture for the lifting-based DWT. Here computations at higher levels of decomposition are initiated as soon as enough intermediate data in the low-frequency subband are available. The architecture proposed by Liao et al. for a three-level decomposition of an input signal using the Daubechies-4 DWT is shown in Figure 5.11. The same principle can be extended to other wavelet filters as well.
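The interleaving idea can be illustrated with a simple slot schedule. The sketch below is an assumption made for illustration, not the schedule used by Liao et al.: it gives level 1 every second slot, level 2 every fourth slot, and so on, which mirrors the halving of the workload from level to level. The names `level_for_slot` and `schedule` are hypothetical.

```python
from typing import List, Optional


def level_for_slot(t: int, max_level: int) -> Optional[int]:
    """Decomposition level served in slot t (t >= 1), or None if the slot is idle.

    The level is 1 plus the number of trailing zero bits of t, so level i
    is served once every 2^i slots.
    """
    level = (t & -t).bit_length()
    return level if level <= max_level else None


def schedule(num_slots: int, max_level: int) -> List[Optional[int]]:
    """Levels served in slots 1..num_slots."""
    return [level_for_slot(t, max_level) for t in range(1, num_slots + 1)]


if __name__ == "__main__":
    print(schedule(8, 3))
```

With three levels and eight slots, level 1 occupies four slots, level 2 two slots, and level 3 one slot, leaving one slot idle. This is why the higher levels fit into the time left over by the first level.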


Fig. 5.11 Recursive architecture for lifting.

The basic circuit elements used in this architecture for arithmetic computation are delay elements, multipliers, and multiply-accumulators (MACs). Each MAC is built from a multiplier, an adder, and two shifters. The multiplexers M1 and M2 select the even and odd samples of the input data as needed by the lifting scheme. S1, S2, and S3 are the control signals for the data flow of the architecture. For the first level of computation the select signal (S1) of each multiplexer is set to 0, and it is set to 1 during the second or third level of computation. The switches S2 and S3 select the input data for the second and third levels of computation. The multiplexer M3 selects the delayed samples for each level of decomposition based on the clocked signals shown in Figure 5.11. The total time required by this recursive architecture to compute an L-level DWT is

$$T = N + T_d + 2\sum_{i=1}^{L-1} 2^{\,i-1} = N + T_d + 2^{L} - 2,$$

where $T_d$ is the circuit delay from input to output.
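As a quick numerical check, the timing expression can be evaluated directly. This sketch assumes the closed form reads N + T_d + 2^L - 2, with the extra term accumulated as twice the sum of 2^(i-1) over the levels above the first. The function names and the sample values of N, L, and T_d are illustrative only.

```python
def extra_cycles(levels: int) -> int:
    """Extra cycles beyond N + T_d: 2 * (1 + 2 + ... + 2^(levels-2)).

    Assumed reading of the timing formula; evaluates to 2^levels - 2.
    """
    return 2 * sum(2 ** (i - 1) for i in range(1, levels))


def recursive_dwt_cycles(n: int, levels: int, t_d: int) -> int:
    """Total cycles for an L-level DWT under the assumed formula."""
    return n + t_d + extra_cycles(levels)


if __name__ == "__main__":
    # Hypothetical values: 1024 samples, 3 levels, circuit delay of 10 cycles.
    print(recursive_dwt_cycles(1024, 3, 10))
```

Under this reading, a single-level transform costs N + T_d cycles, and each further level adds only a small number of cycles compared with N.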

5.3.6 A DSP-Type Architecture for Lifting

A filter-independent DSP-type parallel architecture has been proposed by Martina, Masera, Piccinini, and Zamboni in [37]. The architecture consists of $N_t = \max_i\{k_{s_i}, k_{t_i}\}$ MAC (multiply-accumulate) units, where $k_{s_i}$ and $k_{t_i}$ are the lengths of the primal and dual lifting filters $s_i$ and $t_i$, respectively, in step $i$ of the lifting factorization. The architecture is shown in Figure 5.12.


Fig. 5.12 Parallel MAC architecture for lifting.
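The number of MAC units follows directly from the filter lengths of the lifting factorization. The sketch below is a minimal illustration; the function name `mac_units` and the example filter lengths are hypothetical and are not taken from [37].

```python
from typing import Iterable, Tuple


def mac_units(filter_lengths: Iterable[Tuple[int, int]]) -> int:
    """Number of MAC units: the longest primal or dual lifting filter.

    filter_lengths holds one (k_s, k_t) pair per lifting step i.
    """
    return max(max(k_s, k_t) for k_s, k_t in filter_lengths)


if __name__ == "__main__":
    # Hypothetical factorization with two lifting steps.
    print(mac_units([(2, 1), (4, 3)]))
```

Sizing the array for the longest filter lets every lifting step run on the same hardware, which is what makes the architecture filter independent.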

The architecture essentially computes the following two streams in each lifting step.

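$$a_{\text{out}}[j] = a_{\text{in}}[j] - \left\lfloor \sum_k d_{\text{in}}[j-k]\, s_i[k] + \frac{1}{2} \right\rfloor,$$

$$d_{\text{out}}[j] = d_{\text{in}}[j] - \left\lfloor \sum_k a_{\text{in}}[j-k]\, t_i[k] + \frac{1}{2} \right\rfloor,$$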


where $a_{\text{in}}$ and $d_{\text{in}}$ are the two input substreams formed by the even and odd samples of the original input signal stream $x$. The streams $a_{\text{in}}$ and $d_{\text{in}}$ are not processed together in this architecture; while one is processed, the other has to be delayed long enough to guarantee a consistent subtraction at the end of the lifting step. The architecture is designed to compute $n_t$ simultaneous partial convolution products selected by the multiplexer (MUX), where $n_t$ is the number of filter taps in the lifting step currently being executed. After $n_t$ clock cycles, the first filtered sample is available for the rounding operation at the output of the first MAC unit (MAC1), and subsequent samples are obtained in consecutive clock cycles from the subsequent MAC units (MAC2, ..., MAC$_{n_t}$). The "programmable delay" is a buffer that guarantees subtraction consistency, so that the corresponding $a_{\text{out}}[j]$ and $d_{\text{out}}[j]$ samples are produced at the output. The ROUND unit in Figure 5.12 computes the floor function shown in the lifting equations, and the SUB unit performs the corresponding subtraction. For a 2-D lifting DWT implementation, the input samples (a two-dimensional image) are stored in a RAM as four sub-sampled blocks so that both row-wise and column-wise processing of the image can be addressed properly. The memory addressing scheme and its access patterns are discussed in detail in [37].
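One lifting step of this kind (filter, round, subtract) can be sketched in software. The sketch below is only an illustration under stated assumptions: taps are indexed from k = 0, samples outside the stream are treated as zero, and the names `lifting_step`, `target`, `source`, and `taps` are hypothetical. It models the arithmetic, not the MAC array or the programmable delay.

```python
import math
from typing import List, Sequence


def lifting_step(target: Sequence[int],
                 source: Sequence[int],
                 taps: Sequence[float]) -> List[int]:
    """Compute target[j] - floor(sum_k source[j-k] * taps[k] + 1/2) for every j.

    Assumptions: taps start at k = 0; out-of-range samples count as zero.
    """
    out = []
    for j in range(len(target)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if 0 <= j - k < len(source):
                acc += source[j - k] * tap      # partial convolution product (MAC)
        out.append(target[j] - math.floor(acc + 0.5))  # ROUND, then SUB
    return out


if __name__ == "__main__":
    a_in = [10, 12, 14, 16]
    d_in = [1, 2, 3, 4]
    print(lifting_step(a_in, d_in, [0.5, 0.5]))
```

The same function covers both streams: the stream being updated is passed as `target` and the stream being filtered as `source`, with the taps of the current lifting filter.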

5.3.7 A Generalized and Highly Programmable Architecture for Lifting

The architecture proposed by Andra, Chakrabarti, and Acharya [25, 26, 27] is an example of a highly programmable architecture that can support a large set of wavelet filters. These include the (5,3), (9,7), C(13,7), S(13,7), (2,6), (2,10), and (6,10) filters. In this architecture, each stage of the data dependency diagram in Figure 5.6 is assigned to a processor. For wavelet filters requiring only two lifting stages, such as the (5,3) filter, this maps to a two-processor architecture. For wavelet filters with four lifting stages, such as the (9,7) filter, this maps to a four-processor architecture. Figure 5.13 describes the assignment of computation to processors P1 and P2 for the (5,3) wavelet filter.
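The two-stage structure of the (5,3) filter can be mimicked in software with one function per lifting stage. This is a sketch under assumptions, not a reproduction of Figure 5.13: it uses the integer (5,3) lifting equations of the JPEG2000 reversible transform, an even-length input, and symmetric extension at the boundaries. The function names `p1_predict` and `p2_update` are hypothetical stand-ins for the work of P1 and P2.

```python
from typing import List, Sequence


def p1_predict(x: Sequence[int]) -> List[int]:
    """Stage 1: d[n] = x[2n+1] - floor((x[2n] + x[2n+2]) / 2)."""
    even, odd = x[0::2], x[1::2]
    d = []
    for n in range(len(odd)):
        # Symmetric extension: reuse the last even sample past the end.
        right = even[n + 1] if n + 1 < len(even) else even[n]
        d.append(odd[n] - ((even[n] + right) >> 1))
    return d


def p2_update(x: Sequence[int], d: Sequence[int]) -> List[int]:
    """Stage 2: s[n] = x[2n] + floor((d[n-1] + d[n] + 2) / 4)."""
    even = x[0::2]
    s = []
    for n in range(len(even)):
        # Symmetric extension: d[-1] is taken to be d[0].
        left = d[n - 1] if n > 0 else d[0]
        s.append(even[n] + ((left + d[n] + 2) >> 2))
    return s


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]   # even-length input assumed
    d = p1_predict(x)              # high-pass output of the first stage
    s = p2_update(x, d)            # low-pass output of the second stage
    print(d, s)
```

Because the second stage needs only the outputs of the first, the two stages can run as a pipeline, which is the property the two-processor mapping relies on.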
