Thư viện tri thức trực tuyến
Kho tài liệu với 50,000+ tài liệu học thuật
© 2023 Siêu thị PDF - Kho tài liệu học thuật hàng đầu Việt Nam

Tài liệu Image and Videl Comoression P10 ppt
Nội dung xem thử
Mô tả chi tiết
11
© 2000 by CRC Press LLC
Block Matching
As mentioned in the previous chapter, displacement vector measurement and its usage in motion
compensation in interframe coding for a TV signal can be traced back to the 1970s. Netravali and
Robbins (1979) developed a pel-recursive technique, which estimates the displacement vector for
each pixel recursively from its neighboring pixels using an optimization method. Limb and Murphy
(1975), Rocca and Zanoletti (1972), Cafforio and Rocca (1976), and Brofferio and Rocca (1977)
developed techniques for the estimation of displacement vectors of a block of pixels. In the latter
approach, an image is first segmented into areas with each having an approximately uniform
translation. Then the motion vector is estimated for each area. The segmentation and motion
estimation associated with these arbitrarily shaped blocks are very difficult. When there are multiple
moving areas in images, the situation becomes more challenging. In addition to motion vectors,
the shape information of these areas needs to be coded. Hence, when moving areas have various
complicated shapes, both computational complexity and coding load will increase remarkably.
In contrast, the block matching technique, which is the focus of this chapter, is simple,
straightforward, and yet very efficient. It has been by far the most popularly utilized motion
estimation technique in video coding. In fact, it has been adopted by all the international video
coding standards: ISO, MPEG-1 and MPEG-2, and ITU H.261, and H.263. These standards will
be introduced in detail in Chapters 16, 17, and 19, respectively.
It is interesting to note that even nowadays, with the tremendous advancements in multimedia
engineering, object-based and/or content-based manipulation of audiovisual information is still very
demanding, particularly in audiovisual data storage, retrieval, and distribution. The applications
include digital library, video on demand, audiovisual databases, and so on. Therefore, the coding
of arbitrarily shaped objects has attracted great research attention these days. It has been included
in the MPEG-4 activities (Brailean, 1997), and will be discussed in Chapter 18.
In this chapter various aspects of block matching are addressed. They include the concept and
algorithm, matching criteria, searching strategies, limitations, and new improvements.
11.1 NONOVERLAPPED, EQUALLY SPACED, FIXED SIZE,
SMALL RECTANGULAR BLOCK MATCHING
To avoid the kind of difficulties encountered in motion estimation and motion compensation with
arbitrarily shaped blocks, the block matching technique was proposed by Jain and Jain (1981) based
on the following simple motion model.
An image is partitioned into a set of nonoverlapped, equally spaced, fixed size, small rectangular
blocks; and the translation motion within each block is assumed to be uniform. Although this simple
model considers translation motion only, other types of motions, such as rotation and zooming of
large objects, may be closely approximated by the piecewise translation of these small blocks
provided that these blocks are small enough. This observation, originally made by Jain and Jain,
has been confirmed again and again since then.
Displacement vectors for these blocks are estimated by finding their best matched counterparts
in the previous frame. In this manner, motion estimation is significantly easier than that for
arbitrarily shaped blocks. Since the motion of each block is described by only one displacement
vector, the side information on motion vectors decreases. Furthermore, the rectangular shape
© 2000 by CRC Press LLC
information is known to both the encoder and the decoder, and hence does not need to be encoded,
which saves both computation load and side information.
The block size needs to be chosen properly. In general, the smaller the block size, the more
accurate is the approximation. It is apparent, however, that the smaller block size leads to more
motion vectors being estimated and encoded, which means an increase in both computation and
side information. As a compromise, a size of 16 ¥ 16 is considered to be a good choice. (This has
been specified in international video coding standards such as H.261, H.263, and MPEG-1 and
MPEG-2.) Note that for finer estimation a block size of 8 ¥ 8 is sometimes used.
Figure 11.1 is utilized to illustrate the block matching technique. In Figure 11.1(a) an image
frame at moment tn is segmented into nonoverlapped p ¥ q rectangular blocks. As mentioned above,
in common practice, square blocks of p = q = 16 are used most often. Consider one of the blocks
centered at (x, y). It is assumed that the block is translated as a whole. Consequently, only one
displacement vector needs to be estimated for this block. Figure 11.1(b) shows the previous frame:
the frame at moment tn-1. In order to estimate the displacement vector, a rectangular search window
is opened in the frame tn-1 and centered at the pixel (x, y). Consider a pixel in the search window,
a rectangular correlation window of the same size p ¥ q is opened with the pixel located in its
center. A certain type of similarity measure (correlation) is calculated. After this matching process
has been completed for all candidate pixels in the search window, the correlation window corresponding to the largest similarity becomes the best match of the block under consideration in frame
tn. The relative position between these two blocks (the block and its best match) gives the displacement vector. This is shown in Figure 11.1(b).
The size of the search window is determined by the size of the correlation window and the
maximum possible displacement along four directions: upward, downward, rightward, and leftward.
In Figure 11.2 these four quantities are assumed to be the same and are denoted by d. Note that d
is estimated from a priori knowledge about the translation motion, which includes the largest
possible motion speed and the temporal interval between two consecutive frames, i.e., tn – tn-1.
11.2 MATCHING CRITERIA
Block matching belongs to image matching and can be viewed from a wider perspective. In many
image processing tasks, we need to examine two images or two portions of images on a pixel-by-pixel
FIGURE 11.1 Block matching.