# - Acharya T.

ISBN 0-471-48422-9

**Download**(direct link)

**:**

**50**> 51 52 53 54 55 56 .. 100 >> Next

Fig. 5.8 A reconfigurable folded architecture [30].

Since the datapath architecture of the last two pipeline stages of the architecture in Figure 5.7 and their data dependency behavior is exactly identical to the first two stages, the last two pipeline stages can be folded into the first

VLSI ARCHITECTURES FOR LIFTING-BASED DWT

121

two. This can be accomplished by appropriately scheduling the necessary data for operation in the architecture. This folding architecture was proposed by Lian, Chen, Chen, and Chen in [30]. The architecture has two pipeline stages, with three pipeline registers, Rl, R2, and R3. For the wavelet filters requiring computation of four lifting factors, such as the (9, 7) filter, intermediate data (R3) after first two lifting steps (phase 1) are folded back to Rl as shown in Figure 5.8 for computation of the last two lifting steps (phase 2). The architecture can be reconfigured so that computation of the two phases can be interleaved by selecting proper data by the multiplexers. As a result, we need two delay registers (D) in each lifting step in order to properly schedule the data for each phase. Based on the phase of interleaved computation, the coefficient for multiplier Ml is chosen as either a or c and similarly b or d for multiplier M2. As a result, the hardware utilization is always 100%. For wavelet filters requiring only two lifting steps, such as (5, 3) type wavelet filters, the folding is not required.

5.3.3 Flipping Architecture

While conventional lifting-based architectures require fewer arithmetic operations compared to the convolution-based approach for DWT, they sometimes have long critical paths. For instance, the critical path of the lifting-based architecture for the (9, 7) filter is 4Tm + 8Ta while that of the convolution implementation is Tm -I- 2Ta. One way of improving this is by pipelining, as has been demonstrated in [28, 29, 30]. However, this results in the number of registers increasing significantly. For instance, to pipeline the lifting-based (9, 7) filter such that the critical path is tm + Ta, six additional registers are required.

Recently Huang, Tseng, and Chen [36] proposed a very efficient way of solving the timing accumulation problem. The basic idea is to remove the multiplications along the critical path by scaling the remaining paths by the inverse of the multiplier coefficients. Figures 5.9(a)-(c) describes how scaling at each level can reduce the multiplications in the critical path. Figure 5.9(d) further splits the three input addition nodes into two 2-input adders. The critical path is now Tm -I- 5Ta. Note that the flipping transformation changes the round-off noise considerably. Techniques to address precision and noise problems have also been addressed in [36].

5.3.4 A Register Allocation Scheme for Lifting

Chang, Lee, Peng, and Lee [31] proposed a programmable architecture to map the data dependency diagram of lifting-based DWT using four 3-input MAC (multiply-adder calculator), nine registers, and a register allocation scheme. The algorithm consists of two phases as shown in Figure 5.10. We explain

122

VLSI ARCHITECTURES FOR DISCRETE WAVELET TRANSFORMS

Fig. 5.9 A flipping architecture proposed in [36]; (a) original architecture, (b)-(c) scaling the coefficients to reduce the number of multiplications, (d) splitting the three-input addition nodes to two-input adders.

VLSI ARCHITECTURES FOR LIFTING-BASED DWT

123

below the data-flow principle of the architecture in terms of the register allocation of the nodes in the data dependency diagram as proposed in [31].

Phase 1 Phase 2 Phase 1 Phase 2

Input

First stage

Second stage

HP output LP output

Fig. 5.10 Data-flow and register allocation of the data dependency diagram of lifting.

From the data-flow in Figure 5.10, it is obvious that the architecture has two phases (Phase 1 and Phase 2). These two phases operate in alternate fashion. The sequential computation and register allocation in phase 1 of the data dependency diagram shown in Figure 5.10 are in the following order:

R0 < X2i—\\ i?2 <— x2u R3 <- R0 + a{Rl +R2);

RA <- Rl + b(R5 + R3);

R8 <— R5 + c(R6 + -R4);

OutputLP <— R6 + d(R7 + i?8); OutputHP *— R8.

Similarly, the sequential computation and register allocation in phase 2 of the data dependency diagram of lifting are as follows:

R0 <— X2i + 1‘, Rl <— %2i + 2\

Rb <- RQ + a(R2 + Rl)\

R6 <- R2 + b(R3 + Rb)~,

R7 <— R3 + c(R4 + R6)\

OutputLP <— RA + d(R8 + R7)\ OutputHP R7.

As explained above, two samples are input in each phase and two samples (LP and HP) are output at the end of every phase until the end of input data. The output samples are also stored into a temporary buffer for usage in the vertical filtering for the two-dimensional implementation of lifting-based discrete wavelet transform.

124

VLSI ARCHITECTURES FOR DISCRETE WAVELET TRANSFORMS

5.3.5 A Recursive Architecture for Lifting

According to the multiresolution decomposition principle of DWT, in every stage the low-pass subband is further decomposed recursively by applying the same analysis filters. The total number of the output samples to be processed for an L-level of DWT is

**50**> 51 52 53 54 55 56 .. 100 >> Next