stance, if the pre-defined reference model represents the shape of a human head-and-shoulders scene, then the video coder would not produce optimal results if it were used to code sequences featuring, for example, a car-racing scene, thereby limiting its versatility. To enhance the versatility of a model-based coder, the pre-defined model must be applicable to a wide range of video scenes. The first dynamic model generation technique (Siu and Chan, 1999) builds a model and dynamically modifies it during the encoding process in accordance with each new video frame scanned. The model generation is therefore content-based, and hence more flexible. This approach does not assume any prior orientation of a video object, since the model is built according to the position and orientation of the object itself. Significant improvements in the flexibility and compression efficiency of the encoder were achieved by dynamically adapting the generic model to the shape of the object of interest whenever new video information became available. Figure 2.6(b) shows frame 60 of the sequence Claire coded using the 3-D pre-defined model depicted in Figure 2.6(a). The average bit rate generated by the model-aided coder was almost 19 kbit/s at a frame rate of 25 frames/s and CIF (352 × 288) picture resolution. With a 3-D model of 316 vertices (control points), the coder compressed the 60th frame with a luminance PSNR of 35.05 dB.
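As a quick cross-check of the quoted figures, the average bit budget per frame follows directly from the bit rate and the frame rate:

```python
# Sanity check of the quoted figures (rounded values from the text).
bit_rate = 19_000        # bit/s ("almost 19 kbit/s")
frame_rate = 25          # frames/s
bits_per_frame = bit_rate / frame_rate
# 760 bits on average per frame, consistent with the 716 bits
# reported for the 60th frame of Claire
assert bits_per_frame == 760.0
```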
2.4.3 Sub-band coding
Sub-band coding is a form of frequency decomposition. The video signal is decomposed into a number of frequency bands using a filter bank. The high-frequency components usually contribute little to perceived video quality, so they can either be dropped or coarsely quantised. Following the filtering process, the coefficients describing the resulting frequency bands are transformed and quantised according to their importance and their contribution to reconstructed video quality. At the decoder, the sub-band signals are up-sampled by zero insertion, filtered and recombined to restore the original video signal.
OVERVIEW OF DIGITAL VIDEO COMPRESSION ALGORITHMS
Figure 2.6 (a) 3-D model composed of 316 control points; (b) 60th frame of CIF-resolution Claire, model-based coded using 716 bits
Figure 2.7 shows a basic two-channel filtering structure for sub-band coding.
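To make the two-channel structure concrete, the sketch below uses the two-tap Haar filter pair, an illustrative assumption since the text does not specify the filters of Figure 2.7. Analysis filters each band and down-samples by 2; synthesis up-samples by zero insertion and filters, which for these two-tap filters collapses to the pairwise reconstruction shown:

```python
import math

SQRT2 = math.sqrt(2.0)

def analyse(x):
    """Two-channel analysis: Haar low/high filtering followed by
    2:1 down-sampling (keep every second filtered sample)."""
    low  = [(x[2*i] + x[2*i + 1]) / SQRT2 for i in range(len(x) // 2)]
    high = [(x[2*i] - x[2*i + 1]) / SQRT2 for i in range(len(x) // 2)]
    return low, high

def synthesise(low, high):
    """Two-channel synthesis: up-sampling by zero insertion plus
    filtering, reduced to its pairwise form for the Haar pair."""
    x = []
    for a, d in zip(low, high):
        x.append((a + d) / SQRT2)
        x.append((a - d) / SQRT2)
    return x

signal = [10.0, 12.0, 14.0, 200.0, 202.0, 15.0, 13.0, 11.0]
low, high = analyse(signal)
restored = synthesise(low, high)
# the analysis/synthesis pair is perfectly reconstructing
assert all(abs(a - b) < 1e-9 for a, b in zip(signal, restored))
```

Note how the high band is near zero wherever neighbouring samples are similar, which is why it can be coarsely quantised or dropped with little visible loss.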
Since each input video frame is a two-dimensional matrix of pixels, the sub-band coder processes it in two dimensions. Therefore, when the frame is split into two bands horizontally and vertically, respectively, four frequency bands are obtained: low-low, low-high, high-low and high-high. The DCT transform is then applied to the lowest sub-band, followed by quantisation and variable-length coding (entropy coding). The remaining sub-bands are coarsely quantised. This unequal decomposition was employed for High Definition TV (HDTV) coding (Fleisher, Lan and Lucas, 1991) as shown in Figure 2.8.
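The horizontal-then-vertical splitting can be sketched as follows, again with Haar filters standing in for the unspecified filter bank; the four returned bands correspond to low-low, low-high, high-low and high-high:

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_split(x):
    """1-D two-channel split: low and high band, each half length."""
    low  = [(x[2*i] + x[2*i + 1]) / SQRT2 for i in range(len(x) // 2)]
    high = [(x[2*i] - x[2*i + 1]) / SQRT2 for i in range(len(x) // 2)]
    return low, high

def transpose(m):
    return [list(col) for col in zip(*m)]

def four_band_split(frame):
    """Split a frame (list of rows, even dimensions) into the four
    sub-bands: low-low, low-high, high-low and high-high."""
    lo_rows, hi_rows = [], []
    for row in frame:                      # horizontal split of each row
        l, h = haar_split(row)
        lo_rows.append(l)
        hi_rows.append(h)

    def vertical_split(img):               # vertical split of each column
        lows, highs = zip(*(haar_split(col) for col in transpose(img)))
        return transpose(list(lows)), transpose(list(highs))

    ll, lh = vertical_split(lo_rows)
    hl, hh = vertical_split(hi_rows)
    return ll, lh, hl, hh

frame = [[100, 102, 101,  99],
         [101, 103, 100,  98],
         [ 99, 101, 102, 100],
         [100, 102, 101, 101]]
ll, lh, hl, hh = four_band_split(frame)
# each band is half the width and half the height of the input
assert len(ll) == 2 and len(ll[0]) == 2
```

The low-low band is a scaled-down version of the frame, which is why it carries most of the energy and is the one worth transforming with the DCT and coding finely.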
The lowest band is predictively coded, and the remaining bands are coarsely quantised and run-length coded. Sub-band coding is naturally a scalable compression algorithm, since different quantisation schemes can be used for the various frequency bands. The properties of the HVS can also be incorporated into the sub-band compression algorithm to improve coding efficiency, by taking into account the non-uniform sensitivity of the human eye in the spatial frequency domain. Alternatively, improvement can be achieved during the filtering process through the use of a special filter structure (Lookabaugh and Perkins, 1990), or by allocating more bits to the eye-sensitive portion of the frame's spatial frequency band.

Figure 2.7 Basic two-channel filter structure for sub-band coding

Figure 2.8 Adaptive sub-band predictive-DCT HDTV coding
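The band-dependent treatment can be sketched as uniform scalar quantisation with a step size chosen per band; the step sizes below are arbitrary illustrative values:

```python
def quantise(band, step):
    """Uniform scalar quantisation of one sub-band."""
    return [round(v / step) for v in band]

def dequantise(indices, step):
    return [i * step for i in indices]

# a fine step for the perceptually important low band and a much
# coarser one for the high band (illustrative step sizes)
low  = [203.0, 201.5, 198.0, 204.5]
high = [1.2, -0.8, 0.4, -1.5]

low_q  = quantise(low, step=2)
high_q = quantise(high, step=8)
# the small high-band values collapse to zero, so they cost almost
# nothing after run-length/entropy coding
assert high_q == [0, 0, 0, 0]
```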
2.4.4 Codebook vector-based coding
A vector in video can be composed of prediction errors, transform coefficients or sub-band samples. The concept of vector coding consists of identifying a vector in a video frame and representing it by an element of a codebook according to some criterion such as minimum distance, minimum bit rate or minimum mean-squared error. When the best-match codebook entry is identified, its corresponding index is sent to the decoder. Using this index, the decoder can restore the vector from its own codebook, which is identical to that used by the encoder. The codebook design is therefore the most important part of a vector-based video coding scheme. One popular design procedure is the Linde-Buzo-Gray (LBG) algorithm (Linde, Buzo and Gray, 1980), which consists of an iterative search for an optimum decomposition of the vector space
into subspaces. One criterion for the optimality of the codebook design process is the smallest achievable distortion with respect to other codebooks of the same size. A replica of the optimally trained codebook must also exist in the decoder. The codebook is normally transmitted to the decoder out-of-band, i.e. using a separate segment of the available bandwidth. With dynamic codebook structures, keeping the decoder codebook up to date becomes an important aspect of the coding system, hence the codebook update must be made a periodic process.

In block-based video coders, each macroblock of a frame is mapped to the codebook vector that best represents it. If the objective is the highest coding efficiency, then the vector selection must yield the lowest output bit rate; alternatively, if quality is the ultimate concern, then the vector must be selected on the basis of the lowest distortion. The decoder uses the received index to find the corresponding vector in the codebook and reconstruct the block. Figure 2.9 depicts the block diagram of a vector coding scheme.

The output bit rate of a vector-based video encoder can be controlled by the design parameters of the codebook. The size M of the codebook (the number of vectors) and the vector dimension K (the number of samples per vector) are the major factors affecting the bit rate. However, increasing M entails quantisation complexities such as large storage requirements and added search complexity. For quality/rate optimisation purposes, the vectors in the codebook are variable-length coded.
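A minimal sketch of LBG codebook design and index-based encoding, assuming a squared-error criterion, a fixed number of Lloyd iterations, a ±1% splitting perturbation and a toy 2-D training set; all of these are illustrative choices, not parameters fixed by the published algorithm:

```python
def nearest(vec, codebook):
    """Index of the codebook vector with the minimum squared error."""
    best, best_d = 0, float("inf")
    for i, c in enumerate(codebook):
        d = sum((a - b) ** 2 for a, b in zip(vec, c))
        if d < best_d:
            best, best_d = i, d
    return best

def lbg(training, size, iterations=20, eps=0.01):
    """LBG (generalised Lloyd) codebook design by centroid splitting."""
    dim = len(training[0])
    # start from the centroid of the whole training set
    codebook = [[sum(v[k] for v in training) / len(training)
                 for k in range(dim)]]
    while len(codebook) < size:
        # split every centroid into a slightly perturbed pair
        codebook = [c for cw in codebook
                    for c in ([x * (1 + eps) for x in cw],
                              [x * (1 - eps) for x in cw])]
        for _ in range(iterations):
            # partition the training set, then move each centroid
            # to the mean of its cell (one Lloyd iteration)
            cells = [[] for _ in codebook]
            for v in training:
                cells[nearest(v, codebook)].append(v)
            for i, cell in enumerate(cells):
                if cell:
                    codebook[i] = [sum(v[k] for v in cell) / len(cell)
                                   for k in range(dim)]
    return codebook

# toy training set: 2-D "blocks" clustered around two points
training = [[0, 0], [1, 0], [0, 1], [9, 9], [10, 9], [9, 10]]
codebook = lbg(training, size=2)
# encoding a block = transmitting the index of its best-match vector;
# the decoder looks the vector up in its replica of the codebook
index = nearest([9.5, 9.5], codebook)
reconstructed = codebook[index]
```

Doubling M adds one bit to every transmitted index (log2 M bits per vector), which is how the codebook size trades rate against distortion.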