The present invention relates to a method for motion estimation in images, and particularly to a method for motion estimation in multimedia images.
In image coding, motion prediction is mainly achieved by two different methods. One method uses the origin as the center of the search window. This architecture accesses reference data in a highly regular pattern and hence facilitates data reuse. However, it needs a larger search region to achieve accurate motion prediction. In particular, when the motion in a video is large or irregular, the search region must grow significantly, which increases the amount of computation. The other method uses a predict motion vector as the center of the search window. With this architecture, the search range is reduced by approximately 75% in each dimension compared with the first architecture, so the search area, and thereby the number of computations, is roughly 25% × 25% = 6.25% of that of the first architecture. However, because the predict motion vector is used as the center of the search window, there is no guarantee that the search windows of two adjacent macroblocks can be reused. Consequently, the search window of each macroblock must be read from the external memory, which increases the frequency of data access and significantly increases the bandwidth requirement on the memory.
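The 6.25% figure follows from squaring the per-dimension reduction. The sketch below counts full-search candidate positions for a hypothetical ±64-pixel window around the origin and a ±16-pixel (75% smaller) window around the predict motion vector; both ranges are illustrative assumptions, not values given in this document.

```python
def search_positions(search_range: int) -> int:
    """Candidate positions in a full search covering +/-search_range pixels in x and y."""
    side = 2 * search_range + 1
    return side * side


def relative_cost(full_range: int, reduced_range: int) -> float:
    """Fraction of full-search SAD evaluations left after shrinking the range."""
    return search_positions(reduced_range) / search_positions(full_range)


# Hypothetical ranges: +/-64 around the origin vs. +/-16 around the PMV.
print(search_positions(64), search_positions(16))  # 16641 1089
print(f"{relative_cost(64, 16):.4f}")               # 0.0654
```

The ratio of about 6.5% approaches (1/4)² = 6.25% as the ranges grow, because the extra center row and column count for less.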
Besides, for small frames such as QCIF and CIF, the performance differences between the two architectures described above are not significant. However, as the resolution increases, for example to D1, HD720, Full HD 1080, or even QFHD, the differences in their characteristics become pronounced. With the first architecture, motion prediction occupies a large share of the coding time; however, because data can be reused efficiently, the bandwidth requirement on the external memory is effectively reduced. With the second architecture, on the other hand, the required search region is substantially smaller, so the time for motion prediction is significantly reduced; however, because the data of adjacent macroblocks cannot be shared, frequent access to the external memory substantially increases the bandwidth requirement.
Accordingly, the present invention provides a novel method for motion estimation in multimedia images that not only avoids the algorithmic drawbacks described above but also combines the advantages of the two architectures, thereby solving the problems mentioned above.
An objective of the present invention is to provide a method for motion estimation in multimedia images, which groups a plurality of macroblocks so that they share one predict motion vector, thereby reducing the computations in coding and enhancing the coding efficiency.
Another objective of the present invention is to provide a method for motion estimation in multimedia images, which combines the advantages of using the origin and of using the predict motion vector as the center of the search window through an architecture that controls how frequently the search-window center is updated, thereby shrinking the search region while maintaining excellent data sharing.
Still another objective of the present invention is to provide a method for motion estimation in multimedia images, which dynamically and automatically changes the search region according to the motion characteristics of a video while coding the video, and thus effectively reduces memory demand.
The method for motion estimation in multimedia images comprises steps of: dividing a predict image frame into a plurality of groups of macroblocks, each of the groups of macroblocks including a plurality of macroblocks; predicting a motion vector of each of the groups of macroblocks and producing a predict motion vector; producing one or more search windows according to the predict motion vector; and comparing a plurality of pixels in each macroblock of each group of macroblocks with a plurality of pixels in the search window to produce an actual motion vector for each macroblock. Thereby, by grouping a plurality of macroblocks, a shared predict motion vector is produced, which reduces the computations in coding and enhances the coding efficiency.
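A minimal sketch of the grouping step and the shared search window, assuming 16×16 macroblocks, groups of two vertically adjacent macroblocks (the pairing that the hardware embodiment processes in parallel), and one rectangular window that covers every macroblock of the group displaced by the shared PMV plus or minus a search range. The function names and the window shape are illustrative assumptions, not the claimed method itself.

```python
from typing import List, Tuple

MB_SIZE = 16  # assumed luma macroblock size in pixels
Group = List[Tuple[int, int]]  # (column, row) indices of the macroblocks in one group


def group_macroblocks(mb_cols: int, mb_rows: int, group_height: int = 2) -> List[Group]:
    """Divide a frame of mb_cols x mb_rows macroblocks into groups of
    group_height vertically adjacent macroblocks (hypothetical grouping)."""
    groups = []
    for top in range(0, mb_rows, group_height):
        for col in range(mb_cols):
            rows = range(top, min(top + group_height, mb_rows))
            groups.append([(col, row) for row in rows])
    return groups


def shared_search_window(group: Group, pmv: Tuple[int, int], search_range: int,
                         frame_w: int, frame_h: int) -> Tuple[int, int, int, int]:
    """One window (x0, y0, x1, y1), end-exclusive, covering every macroblock of the
    group displaced by the shared PMV +/- search_range, clipped to the frame."""
    xs = [col * MB_SIZE for col, _ in group]
    ys = [row * MB_SIZE for _, row in group]
    x0 = min(xs) + pmv[0] - search_range
    y0 = min(ys) + pmv[1] - search_range
    x1 = max(xs) + MB_SIZE + pmv[0] + search_range
    y1 = max(ys) + MB_SIZE + pmv[1] + search_range
    return max(0, x0), max(0, y0), min(frame_w, x1), min(frame_h, y1)


groups = group_macroblocks(mb_cols=4, mb_rows=4)
print(len(groups), groups[0])  # 8 [(0, 0), (0, 1)]
print(shared_search_window([(1, 0), (1, 1)], pmv=(2, -3), search_range=8,
                           frame_w=64, frame_h=64))  # (10, 0, 42, 37)
```

Because both macroblocks of a group read the same window, its reference pixels need to be fetched from external memory only once per group.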
In addition, the method for motion estimation in multimedia images according to the present invention further comprises steps of: producing an update window located within the search window; and determining, according to whether the actual motion vector of the macroblock falls into the update window, whether to reuse the predict motion vector when predicting the corresponding motion vector of the next group of macroblocks. Thereby, the present invention combines the advantages of using the origin and of using the predict motion vector as the center of the search window by controlling how frequently the search-window center is updated, which shrinks the search region while maintaining excellent data sharing.
Furthermore, after the step of producing the update window, the method for motion estimation in multimedia images according to the present invention further determines the region of the search window for the next predict image frame according to the ratio of the actual motion vectors of the plurality of groups of macroblocks of the current predict image frame that fall into the update window. Thereby, the present invention dynamically and automatically changes the search region according to the motion characteristics of a video while coding the video, and thus effectively reduces memory demand.
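The reuse decision can be sketched as follows. This assumes a square update window of half-width `half` centered on the current search-window center (the PMV), and assumes the PMV is reused only when every actual motion vector of the group falls inside; the text does not fix either choice, and `recompute` stands in for a hypothetical fresh prediction.

```python
from typing import Callable, Iterable, Tuple

MV = Tuple[int, int]


def in_update_window(mv: MV, center: MV, half: int) -> bool:
    """True if mv lies inside a square update window of half-width `half` around center."""
    return abs(mv[0] - center[0]) <= half and abs(mv[1] - center[1]) <= half


def next_group_pmv(current_pmv: MV, actual_mvs: Iterable[MV], half: int,
                   recompute: Callable[[], MV]) -> MV:
    """Reuse the current PMV for the next group when every actual MV of the
    current group stayed inside the update window; otherwise predict a new one."""
    if all(in_update_window(mv, current_pmv, half) for mv in actual_mvs):
        return current_pmv  # window center unchanged, so reference data can be shared
    return recompute()


print(next_group_pmv((0, 0), [(1, 1), (-2, 0)], 2, lambda: (9, 9)))  # (0, 0)
print(next_group_pmv((0, 0), [(3, 0)], 2, lambda: (9, 9)))            # (9, 9)
```

A small update window forces frequent re-centering (closer to the PMV-centered architecture); a large one keeps the center fixed longer (closer to the origin-centered architecture with its data reuse).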
To further clarify the structure, characteristics, and effectiveness of the present invention, a detailed description is provided below along with embodiments and accompanying figures.
Next, step S102 is executed to predict a motion vector (MV) of each of the groups of macroblocks and produce a predict motion vector (PMV). That is, the motion vector of one macroblock MB among the plurality of macroblocks MB in the group of macroblocks GOMB is predicted, and the result is used as the predict motion vector PMV. As shown in
Regarding how the motion vector MV of the first macroblock 12 of the group of macroblocks GOMB is predicted to produce the predict motion vector PMV, the present invention provides the following method. As shown in
However, if any of the motion vectors of the macroblocks MB(A), MB(B), MB(C), and MB(D) does not exist, for example at the boundaries of a frame, the system processes them as follows: (1) if the motion vectors MV of the macroblocks MB(A) and MB(D) do not exist, the system presets them to zero; (2) if the motion vectors MV of the macroblocks MB(B), MB(C), and MB(D) do not exist, the system uses the motion vector MV of the macroblock MB(A) as the predict motion vector PMV; (3) if the motion vector MV of the macroblock MB(C) does not exist, the system replaces the macroblock MB(C) with the macroblock MB(D). This is only a preferred embodiment of the present invention and is not intended to limit its scope.
Afterwards, step S104 is executed to produce one or more search windows SW of a reference image frame (not shown in the figure) according to the predict motion vector PMV. Then, step S106 is executed to compare a plurality of pixels in each macroblock MB of each group of macroblocks GOMB with a plurality of pixels in each candidate macroblock of the search window SW and to produce an actual motion vector for each macroblock. Thereby, by grouping the plurality of macroblocks MB, a shared predict motion vector PMV is produced, which reduces the computations in coding and enhances the coding efficiency.
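The boundary rules can be sketched as below. Because the figure describing the normal case is not reproduced here, the sketch assumes the H.264/AVC convention: MB(A), MB(B), MB(C), and MB(D) are the left, upper, upper-right, and upper-left neighbors, and the PMV is the component-wise median of A, B, and C. It also reads rule (1) as applying to MB(A) and MB(D) individually and treats any other missing MB(B) as zero. Rule (2) is checked first, because applying rule (1) first would turn a missing MB(D) into zero before rule (2) could apply.

```python
from typing import Optional, Tuple

MV = Tuple[int, int]


def median3(x: int, y: int, z: int) -> int:
    return sorted((x, y, z))[1]


def predict_mv(a: Optional[MV], b: Optional[MV], c: Optional[MV], d: Optional[MV]) -> MV:
    """Predict motion vector from neighbors A, B, C, D; None marks a missing MV."""
    # Rule (2): B, C and D all missing (e.g. top row) -> use A directly.
    if b is None and c is None and d is None:
        return a if a is not None else (0, 0)  # A missing too -> zero, per rule (1)
    # Rule (1): a missing A or D is preset to zero.
    a = a if a is not None else (0, 0)
    d = d if d is not None else (0, 0)
    # Rule (3): a missing C is replaced by D.
    c = c if c is not None else d
    # Assumption: any other missing B is treated as zero.
    b = b if b is not None else (0, 0)
    # Assumption: component-wise median, as in H.264/AVC.
    return median3(a[0], b[0], c[0]), median3(a[1], b[1], c[1])


print(predict_mv((1, 2), (3, -1), (2, 5), (0, 0)))  # (2, 2)
print(predict_mv((4, 4), None, None, None))         # (4, 4)
```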
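Step S106 can be illustrated with an exhaustive (full-search) block match that minimizes the sum of absolute differences (SAD) around the shared PMV. This is a sketch under assumptions: frames are plain lists of luma samples, candidates that fall outside the reference frame are skipped, and the synthetic frames, 8×8 block size, and function names are hypothetical and chosen only to keep the example small.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # luma samples, indexed [y][x]
MV = Tuple[int, int]


def sad(cur: Frame, ref: Frame, bx: int, by: int, rx: int, ry: int, n: int) -> int:
    """SAD between the n x n block at (bx, by) in cur and the candidate at (rx, ry) in ref."""
    return sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))


def full_search(cur: Frame, ref: Frame, bx: int, by: int, pmv: MV,
                search_range: int, n: int = 16) -> Tuple[MV, Optional[int]]:
    """Try every candidate within +/-search_range of the PMV and return
    (actual motion vector, its SAD); candidates outside ref are skipped."""
    h, w = len(ref), len(ref[0])
    best_mv: MV = pmv
    best_sad: Optional[int] = None
    for dy in range(pmv[1] - search_range, pmv[1] + search_range + 1):
        for dx in range(pmv[0] - search_range, pmv[0] + search_range + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + n > w or ry + n > h:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if best_sad is None or cost < best_sad:
                best_mv, best_sad = (dx, dy), cost
    return best_mv, best_sad


# Synthetic 24x24 frames: cur is ref shifted so that the true motion is (2, -1).
ref = [[7 * x + 13 * y for x in range(24)] for y in range(24)]
cur = [[7 * (x + 2) + 13 * (y - 1) for x in range(24)] for y in range(24)]
# Two vertically adjacent 8x8 blocks of one group, both searched around the shared PMV.
for by in (4, 12):
    print(full_search(cur, ref, bx=4, by=by, pmv=(0, 0), search_range=3, n=8))
```

Both blocks report `((2, -1), 0)`: each macroblock still gets its own actual motion vector even though the PMV and the window center are shared.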
Owing to the drawback described above, the present invention produces a shared search-window center 26 (as shown in
To adjust the location of the search window 40, the present invention moves the search-window center to the closest initial access address of the memory. Continuing the example above, the initial access addresses of the memory are multiples of 4, such as 4, 8, 12, and 16. If the search-window center 42 is located at address 7, the closest initial access address is 8, which corresponds to the search-window center 43 in the figure; thereby, the search window 40 is moved to the location of the search window 41 in the figure. On the other hand, if the search-window center 42 is located at address 5, the closest initial access address is 4, so the search-window center 42 is moved to the initial access address 4. After these adjustments, invalid accesses of pixel data caused by misalignment between the initial access address of the pixel data in the search window 40 and the initial access address of the memory are avoided.
Afterwards, according to the method for motion estimation in multimedia images of the present invention, before step S208 is executed to compare a plurality of pixels in each macroblock MB of each group of macroblocks GOMB with a plurality of pixels in each candidate macroblock MB of the search window SW and produce an actual motion vector for each macroblock, step S206 is first executed to calculate a predict motion vector for each of the plurality of macroblocks. After the pixels in each macroblock of each group of macroblocks have been compared with the pixels in the search window and the actual motion vectors have been produced, step S206 can calculate a difference motion vector (DMV) from the actual motion vector and the predict motion vector for subsequent circuit calculations. Referring back to
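The alignment rule amounts to rounding the center address to the nearest multiple of the memory's access granularity (4 in the example above). The text does not say where an equidistant address such as 6 should go; the sketch below rounds such ties up, which is an assumption.

```python
def align_center(address: int, alignment: int = 4) -> int:
    """Move a search-window center to the nearest multiple of `alignment`
    (the memory's initial access address). Ties round up, as an assumption."""
    return ((address + alignment // 2) // alignment) * alignment


for addr in (5, 7, 8):
    print(addr, "->", align_center(addr))  # 5 -> 4, 7 -> 8, 8 -> 8
```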
Refer back to
According to the present preferred embodiment, the ratio of actual motion vectors 46 falling into the update window 40 is divided into a first ratio range from 0% to level_1%, a second ratio range from level_1% to level_2%, and a third ratio range from level_2% to 100%, which correspond to a first search-window region SR1, a second search-window region SR2, and a third search-window region SR3, respectively. When the ratio of the actual motion vectors 46 of a row of the plurality of groups of macroblocks GOMB falling into the update window 40 lies in the first, second, or third ratio range, the region of the search window is adjusted to the corresponding first search-window region SR1, second search-window region SR2, or third search-window region SR3. Thereby, the present invention dynamically and automatically changes the search region according to the motion characteristics of a video while coding the video, and thus effectively reduces memory demand.
Furthermore, the present invention includes a memory 55 for storing the search-window region SR1, SR2, or SR3 corresponding to each row of the plurality of groups of macroblocks GOMB of the next predict image frame 60. The search-window region stored in the memory 55 for each row is determined according to the ratio of the actual motion vectors 46 of the same row of the plurality of groups of macroblocks GOMB of the predict image frame 10 that fall into the update window 40.
The memory 55 comprises a plurality of storage locations 550, 552, 554, which correspond to the rows of the plurality of groups of macroblocks 11, 13, 15, respectively. Namely, the first-row groups of macroblocks 11, the second-row groups of macroblocks 13, through the Nth-row groups of macroblocks 15 correspond to a first storage location 550, a second storage location 552, through an Nth storage location 554, respectively, which in turn correspond to the rows of the plurality of groups of macroblocks GOMB of the next predict image frame 60. In other words, the first, second, through Nth storage locations 550, 552, 554 correspond to the groups of macroblocks of the first row 61, the second row 63, through the Nth row 65, respectively.
For each row, whichever of the first, second, and third search-window regions SR1, SR2, SR3 corresponds to the measured ratio (the first ratio 0% to level_1%, the second ratio level_1% to level_2%, or the third ratio level_2% to 100%) is stored in the corresponding one of the first, second, through Nth storage locations 550, 552, 554. Then, the groups of macroblocks of the first row 61, the second row 63, through the Nth row 65 of the next predict image frame 60 use the search-window regions stored in the first, second, through Nth storage locations 550, 552, 554, respectively, so that the region of the search window can be adjusted accordingly.
For example, when the ratio of the actual motion vectors 46 of the groups of macroblocks of the first row 11 of the predict image frame 10 falling into the update window 40 lies in the third ratio range (level_2% to 100%), the first storage location 550 of the memory 55 stores the third search-window region SR3. Thereby, when the groups of macroblocks of the first row 61 of the next predict image frame 60 perform searches, the third search-window region SR3 is read from the first storage location 550 as their search-window region, and the search window is produced accordingly. Accordingly, the search-window region of the groups of macroblocks of the first row 11 of the predict image frame 10 can differ from that of the groups of macroblocks of the first row 61 of the next predict image frame 60. In this way, the present invention dynamically and automatically changes the search region according to the motion characteristics of a video while coding the video, and thus effectively reduces memory demand. According to the present preferred embodiment, the second and Nth storage locations 552, 554 of the memory 55 store the second and first search-window regions SR2 and SR1, respectively. Hence, the search-window regions of the groups of macroblocks of the second row 63 and of the Nth row 65 are the second search-window region SR2 and the first search-window region SR1, respectively.
According to the description above, each row of the plurality of groups of macroblocks GOMB of the predict image frame 10 corresponds to the same row of groups of macroblocks of the next predict image frame 60. The region of the search window for each row of groups of macroblocks GOMB of the next predict image frame 60 is determined according to the ratio of the actual motion vectors of the same row of groups of macroblocks GOMB of the predict image frame 10 that fall into the update window 40.
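A sketch of measuring the ratio and mapping it to a region, assuming a square update window of half-width `half` around each group's window center and thresholds `level_1 < level_2` given as fractions. The document does not state the sizes of SR1, SR2, and SR3 or which range an exact boundary value belongs to, so the regions are returned as labels and boundary values are assigned to the higher range by assumption.

```python
from typing import Iterable, Tuple

MV = Tuple[int, int]


def inside_ratio(pairs: Iterable[Tuple[MV, MV]], half: int) -> float:
    """Fraction of (actual MV, window center) pairs whose MV falls inside a square
    update window of half-width `half` around that center (pairs must be non-empty)."""
    pairs = list(pairs)
    inside = sum(1 for mv, c in pairs
                 if abs(mv[0] - c[0]) <= half and abs(mv[1] - c[1]) <= half)
    return inside / len(pairs)


def region_for_ratio(ratio: float, level_1: float, level_2: float) -> str:
    """Map a ratio in [0, 1] to SR1, SR2 or SR3 using thresholds level_1 < level_2."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    if ratio < level_1:
        return "SR1"
    if ratio < level_2:
        return "SR2"
    return "SR3"


row = [((1, 0), (0, 0)), ((5, 0), (0, 0)), ((0, -2), (0, 0)), ((0, 9), (0, 0))]
r = inside_ratio(row, half=2)
print(r, region_for_ratio(r, level_1=0.3, level_2=0.7))  # 0.5 SR2
```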
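The per-row bookkeeping in the memory 55 can be modeled as a list with one entry per row of groups of macroblocks: entry k is written while coding the current frame and read by row k of the next frame. The ratios and thresholds below are hypothetical values chosen to reproduce the example above (first row SR3, second row SR2, Nth row SR1).

```python
from typing import List


def build_row_memory(row_ratios: List[float], level_1: float, level_2: float) -> List[str]:
    """Model of the row memory: entry k holds the search-window region that
    row k of the next predict image frame will use (hypothetical helper)."""
    memory = []
    for ratio in row_ratios:
        if ratio < level_1:
            memory.append("SR1")
        elif ratio < level_2:
            memory.append("SR2")
        else:
            memory.append("SR3")
    return memory


# Ratios measured on the current frame's rows (hypothetical values).
memory = build_row_memory([0.85, 0.5, 0.1], level_1=0.3, level_2=0.7)
print(memory)  # ['SR3', 'SR2', 'SR1']
```

Keeping one entry per row, rather than one per frame, lets a frame with a fast-moving region in only some rows enlarge the search window there while the remaining rows keep smaller windows.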
The memory module 80 stores a reference macroblock 82, a first macroblock 84, and a second macroblock 86. The control unit 74 reads the reference pixel data from the second memory unit 76 via the address generator 72 and stores it into the memory module 80 as the reference macroblock 82. The first and second macroblocks 84, 86 contain the pixel data of macroblocks in the current predict image frame and are vertically (top/bottom) adjacent. The operational module 90 includes a first operational unit matrix 92 and a second operational unit matrix 94. The first operational unit matrix 92 receives the reference pixel data of the reference macroblock 82 and the pixel data of the first macroblock 84 and calculates a first sum of absolute differences (SAD). Likewise, the second operational unit matrix 94 receives the reference pixel data of the reference macroblock 82 and the pixel data of the second macroblock 86 and calculates a second sum of absolute differences.
After receiving the first sum of absolute differences, the first mode generator 100 performs 7 modes of combination. Taking a 16×16 reference macroblock as an example, because the throughput of the first and second operational unit matrices 92, 94 is limited, only part of the pixel data can be processed at a time; in the present preferred embodiment, 4×4 pixels are processed at a time. After the first and second operational unit matrices 92, 94 process the 4×4 pixel data, the first mode generator 100 performs the 7 modes of combination to rebuild the 16×16 macroblock, produces a corresponding 16×16 first sum of absolute differences, and transmits it to the first comparison unit 104. Likewise, the second mode generator 102 produces a 16×16 second sum of absolute differences and transmits it to the second comparison unit 106.
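A common way to realize such a combination, assumed here rather than confirmed by the text, is to compute the sixteen 4×4 SADs of a 16×16 macroblock once and add them up into the seven H.264/AVC partition sizes 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4 (width × height), giving 41 SAD values in total. The function names below are hypothetical.

```python
from typing import Dict, List

Block = List[List[int]]  # 16x16 luma samples, indexed [y][x]


def sad_4x4_grid(cur: Block, ref: Block) -> List[List[int]]:
    """SAD of each 4x4 sub-block: grid[r][c] covers rows 4r..4r+3, cols 4c..4c+3."""
    return [[sum(abs(cur[4 * r + j][4 * c + i] - ref[4 * r + j][4 * c + i])
                 for j in range(4) for i in range(4))
             for c in range(4)] for r in range(4)]


def combine_modes(grid: List[List[int]]) -> Dict[str, List[int]]:
    """Build SADs for seven partition sizes (width x height) from 4x4 SADs."""
    def block(r0: int, c0: int, h: int, w: int) -> int:
        # Sum of h x w units of 4x4 SADs starting at unit (r0, c0).
        return sum(grid[r][c] for r in range(r0, r0 + h) for c in range(c0, c0 + w))

    return {
        "16x16": [block(0, 0, 4, 4)],
        "16x8": [block(r, 0, 2, 4) for r in (0, 2)],
        "8x16": [block(0, c, 4, 2) for c in (0, 2)],
        "8x8": [block(r, c, 2, 2) for r in (0, 2) for c in (0, 2)],
        "8x4": [block(r, c, 1, 2) for r in range(4) for c in (0, 2)],
        "4x8": [block(r, c, 2, 1) for r in (0, 2) for c in range(4)],
        "4x4": [block(r, c, 1, 1) for r in range(4) for c in range(4)],
    }


grid = [[4 * r + c for c in range(4)] for r in range(4)]  # stand-in 4x4 SADs 0..15
modes = combine_modes(grid)
print(modes["16x16"], modes["16x8"], modes["8x16"])  # [120] [28, 92] [52, 68]
```

The "16x16" entry corresponds to the 16×16 sum of absolute differences passed to the comparison unit; the other entries are what the smaller partitions would be compared with.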
The rate distortion cost calculating unit 108 receives a motion vector, a reference signal (λ factor), and a predict motion vector to produce a rate-distortion-cost signal. According to the rate-distortion-cost signal and the first sum of absolute differences, the first comparison unit 104 gives the best motion vector of the first macroblock 84 and the mode corresponding to that motion vector for subsequent operations performed by the coding circuit. Likewise, according to the rate-distortion-cost signal and the second sum of absolute differences, the second comparison unit 106 gives the best motion vector of the second macroblock 86 and the corresponding mode for subsequent operations performed by the coding circuit. The first operational unit matrix 92, the second operational unit matrix 94, the first mode generator 100, the second mode generator 102, the first comparison unit 104, the second comparison unit 106, and the rate distortion cost calculating unit 108 described above are existing technologies and hence are not explained in detail.
In addition, the best motion vectors of the first and second macroblocks 84, 86 are transmitted to the third memory unit 110. The control unit 74 acquires these best motion vectors by accessing the third memory unit 110 and then performs the method for motion estimation in multimedia images according to the present invention to change the search region dynamically and to determine whether to reuse the predict motion vector PMV when predicting the corresponding motion vector of the next group of macroblocks. Thereby, the amount of computation can be reduced.
Moreover, according to the present invention, because vertically adjacent (top/bottom) macroblocks can use the same search window, the operational module 90 contains both the first and second operational unit matrices 92, 94 to calculate the pixel data of two macroblocks simultaneously, which enhances the operational efficiency.
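One widely used form of this cost, assumed here and not specified by the document, is J = SAD + λ·R, where R is the number of bits needed to code the difference between the candidate motion vector and the PMV. The sketch approximates R with signed Exp-Golomb code lengths, as in H.264/AVC, and picks the candidate with the lowest J; the function names are hypothetical and mode selection is left out.

```python
from typing import Iterable, Tuple

MV = Tuple[int, int]


def se_bits(v: int) -> int:
    """Length in bits of the signed Exp-Golomb code se(v)."""
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return 2 * (code_num + 1).bit_length() - 1


def rd_cost(sad: int, mv: MV, pmv: MV, lam: float) -> float:
    """J = SAD + lambda * (bits to code MV - PMV)."""
    rate = se_bits(mv[0] - pmv[0]) + se_bits(mv[1] - pmv[1])
    return sad + lam * rate


def best_motion_vector(candidates: Iterable[Tuple[MV, int]], pmv: MV, lam: float) -> MV:
    """Pick the candidate (mv, sad) with the lowest rate-distortion cost."""
    return min(candidates, key=lambda cand: rd_cost(cand[1], cand[0], pmv, lam))[0]


cands = [((0, 0), 100), ((5, 5), 96)]
print(best_motion_vector(cands, pmv=(0, 0), lam=4))  # (0, 0): small SAD gain does not pay for the bits
print(best_motion_vector(cands, pmv=(0, 0), lam=0))  # (5, 5): without a rate term, lowest SAD wins
```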
To sum up, the method for motion estimation in multimedia images according to the present invention comprises steps of: dividing a predict image frame into a plurality of groups of macroblocks, and each of the groups of macroblocks including a plurality of macroblocks; predicting a motion vector of each of the groups of macroblocks, and producing a predict motion vector; producing one or more search windows according to the predict motion vector; and comparing a plurality of pixels in each macroblock of each group of macroblocks to a plurality of pixels in the search window, and producing an actual motion vector, respectively. Thereby, by gathering a plurality of macroblocks, a shared predict motion vector is produced for reducing computations in coding. Hence, the coding efficiency can be enhanced.
Accordingly, the present invention meets the legal requirements of novelty, nonobviousness, and utility. However, the foregoing description presents only embodiments of the present invention and is not intended to limit its scope. All equivalent changes or modifications made according to the shape, structure, features, or spirit described in the claims of the present invention are included in the appended claims of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
098138386 | Nov 2009 | TW | national |