There are no related applications.
The claimed invention relates generally to image and video signal processing, and in particular to motion estimation for video encoding and motion detection. The motion estimation in the claimed invention refers to multiple reference picture motion estimation, and the claimed invention further relates to the efficient use of data for multiple reference picture motion estimation. The claimed invention is applicable to motion estimation algorithms with a fixed search range as well as motion estimation algorithms with a non-fixed search range.
For transmission or other purposes, a digital video is encoded to reduce its size and thus the bandwidth required. At the receiver side, the encoded digital video is decoded to reproduce the digital video.
Motion estimation is a common technique used in various video coding standards. Motion estimation exploits temporal redundancy to achieve bandwidth reduction: a video is simply a series of pictures (also known as frames), and the content of these pictures is repetitive because the pictures share similar scenes or objects. Therefore, the data required for transmission can be reduced if the pattern of how the pictures repeat themselves in subsequent pictures is known.
In order to know how data repeat themselves in subsequent pictures, pictures need to be compared with one another to see how well they match. For example, in order to encode a picture, another picture which immediately precedes the picture to be encoded is used for comparison. To enhance the accuracy of the comparison, more than one neighboring picture of the picture to be encoded may be used. For example, an object in a picture may fail to appear in the immediately subsequent picture because it is blocked by a moving car, and therefore cannot be matched; however, it will reappear in the following pictures after the moving car has gone, so that matching can be done. The pictures used for comparison with the picture to be encoded are known as reference pictures; motion estimation which makes use of multiple reference pictures is therefore generally known as multiple reference picture motion estimation. The picture to be encoded is generally known as the current picture.
Processing those pictures in a picture-by-picture manner requires huge computational power or memory. Therefore, pictures are further divided into smaller units known as blocks (a macroblock is one kind of block) for processing. A picture block is a block in a picture, and the terms “picture block” and “block” are used interchangeably hereinafter.
For motion estimation involving multiple reference pictures, multiple reference blocks of multiple reference pictures are required to be loaded into internal memory (for example, a cache) from external memory (for example, RAM, random access memory) in order to process a current picture. However, two problems arise. Firstly, loading data from external memory to internal memory takes time, so it is time-consuming if each block in a current picture must wait for multiple reference blocks of multiple reference pictures to be loaded before performing multiple reference picture motion estimation. Secondly, storing multiple reference blocks of multiple reference pictures for each block in a current picture requires a large internal memory.
Therefore, instead of waiting for multiple reference blocks of multiple reference pictures to be loaded into internal memory, the claimed invention provides a solution which saves time and reduces the size requirement of the internal memory.
In order to achieve such efficiency improvement, the claimed invention adopts three approaches. Firstly, at each time instance, one or more current pictures are compared with a single reference picture concurrently. That means multiple current pictures may reference a single reference picture concurrently, and processing such as encoding and motion estimation is performed for each of these multiple current pictures in parallel. Therefore, the reference blocks of each reference picture need not be loaded into, or kept in, the internal memory for multiple time instances, because all current pictures which need to reference that reference picture have done so within a single time instance.
Secondly, instead of waiting for all the multiple reference pictures to be available for processing, each current picture is processed with one reference picture at a time. Therefore, there is no need to wait for the loading of multiple blocks of multiple reference pictures, and no huge memory is required to hold them all.
Thirdly, the claimed invention does not limit the reference picture type; the reference picture can be the original raw picture, a reconstructed picture, a synthesized picture and so on.
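The first two approaches amount to inverting the usual loop order: instead of looping over reference pictures for one current picture, the encoder loads one reference picture and loops over all the current pictures that reference it. The following is a minimal sketch of this loop ordering; the type picture_t and the helpers load_reference and me_search are hypothetical placeholders for illustration, not part of the claimed invention.

    /* Minimal sketch of the loop ordering described above: each reference
     * picture is loaded into internal memory exactly once, and every current
     * picture that needs it references it within that single time instance.
     * All names below are illustrative placeholders. */
    #define N 5                          /* number of reference pictures supported */

    typedef struct {
        unsigned char *pixels;           /* picture data, details omitted */
        int width, height;
    } picture_t;

    static void load_reference(const picture_t *ref)
    {
        (void)ref;                       /* stub: DMA ref_pic into internal memory */
    }

    static void me_search(picture_t *curr, const picture_t *ref)
    {
        (void)curr; (void)ref;           /* stub: motion estimation for one picture */
    }

    void multi_ref_me(picture_t curr_pics[N], const picture_t ref_pics[N])
    {
        for (int r = 0; r < N; r++) {            /* one reference picture at a time */
            load_reference(&ref_pics[r]);        /* loaded once, reused by all      */
            for (int c = 0; c < N; c++)
                me_search(&curr_pics[c], &ref_pics[r]); /* may run in parallel      */
        }
    }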
In this document, the terms “frame” and “picture” are used interchangeably hereinafter to represent a picture in a video. For multiple reference picture motion estimation, a current picture, which may be under encoding, references one or more reference pictures. The claimed invention allows multiple current pictures which are under processing to reference a single reference picture. The claimed invention reuses the overlapping search region without any shift operation.
Consequently, at any time instance, not all the multiple blocks of multiple reference pictures are required to be loaded into the internal memory. The size of the internal memory required by the claimed invention is thus reduced, and the idle time spent waiting for the internal memory to be loaded with multiple blocks of multiple reference pictures before processing a current picture is reduced. Search window data for temporally adjacent reference blocks, i.e. blocks of the reference pictures, are thus reused. Memory bandwidth is reduced because not all the multiple reference pictures are required to be loaded at a time.
The claimed invention is suitable for FPGA and DSP implementation among others.
It is an object of the claimed invention to consider not only single reference picture motion estimation data reuse and internal memory reduction, but also the multiple reference picture motion estimation algorithm. The claimed invention also overcomes the limitation on the parallel operation of direct memory access (DMA) and motion estimation (ME), as well as some limitations on motion estimation precision. The claimed invention enables multiple reference search modules to run in parallel so that searches are performed for one or more current pictures simultaneously.
It is a further object of the claimed invention to simplify the control logic of reference block loading in order to support multiple reference picture motion estimation with different block types (such as 16×16, 16×8, 8×16, 8×8 and others). The claimed invention supports multiple inter block types as well as multiple block types, and only needs to run the interpolation module once to encode M block types, rather than running the interpolation module N×M times to support N reference pictures and M block types (M: 0-9), so that the calculation overlap problem can be overcome.
It is a further object to further decrease the bandwidth requirement by incorporating data reuse for block matching motion estimation into the claimed invention. The claimed invention fulfills low bandwidth requirements.
It is a further object of the claimed invention to enable data reuse for multiple reference picture motion estimation. The claimed invention can be combined with certain single picture data reuse methods, such as Level C and Level C+, to enhance performance.
It is a further object of the claimed invention to enhance the coding efficiency. The claimed invention decreases the algorithm control logic complexity.
It is a further object of the claimed invention to be applicable to motion estimation algorithms with a fixed search range as well as motion estimation algorithms with a non-fixed search range.
It is a further object of the claimed invention to decrease the bus bandwidth and internal memory requirement.
It is a further object of the claimed invention to decrease the algorithm control logic complexity.
Other aspects of the claimed invention are also disclosed.
These and other objects, aspects and embodiments of this claimed invention will be described hereinafter in more detail with reference to the following drawings, in which:
In an embodiment, the following assumptions are made; the data/parameters in use are for illustrative purposes, and the method as illustrated can easily be adapted to any other data/parameters:
1. The encoder supports N reference pictures, where n ranges from 0 to N−1.
2. The size of the motion information (sizeof(blk_info)) is equal to 64 bytes: sizeof(blk_info)=64;
3. Block width (blk_width) is equal to 16: blk_width=16;
4. Block height (blk_height) is equal to 16: blk_height=16;
5. Search range (SR) is from −127 to 128, i.e., [−127, 128]: SR=128;
6. Reference block width is equal to ((SR<<1)+blk_width);
7. Picture_width is the horizontal size of the picture;
8. Picture_height is the vertical size of the picture;
9. Frame_rate is the frame rate per second of the input video sequence.
Furthermore, the following are defined for external memory organization:
1. Original video sequences, which include the current encoding picture, are curr_pic[n], where 0≦n<N.
2. The reference picture is one of the previously reconstructed pictures: ref_pic.
3. Predict pictures are pred_pic[n], where 0≦n<N.
4. Best information pictures are info_pic[n], where 0≦n<N.
In addition to the external data organization, the following are defined for internal data organization:
1. Memory for the current block of curr_pic[n] is curr_blk[n], where 0≦n<N;
2. Memory for the reference block of ref_pic is ref_blk;
3. Memory for the motion information of curr_blk[n] is blk_info[n], where 0≦n<N;
4. Memory for the predict block data of curr_blk[n] is pred_blk[n], where 0≦n<N;
5. Memory for the reconstructed block data of curr_pic[0] is recon_blk;
6. 1 set of half (½) pixel arrays and 1 set of quarter (¼) pixel arrays for ref_blk; different fractional search algorithms lead to different fractional array sizes;
7. Memory for the neighbor blocks' motion information is neigh_info.
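Expressed as C declarations, the external and internal data organization above may be sketched as follows. This is a minimal sketch under the stated assumptions: the picture dimensions and the per-block layout of info_pic are hypothetical placeholders, while the block, search range and motion information sizes follow the assumptions listed earlier.

    /* Sketch of the data organization defined above. PIC_WIDTH and PIC_HEIGHT
     * are hypothetical; all other sizes follow the stated assumptions. */
    #define N           5                      /* reference pictures           */
    #define PIC_WIDTH   1920                   /* hypothetical picture_width   */
    #define PIC_HEIGHT  1088                   /* hypothetical picture_height  */
    #define BLK_W       16
    #define BLK_H       16
    #define SR          128
    #define REF_BLK_W   ((SR << 1) + BLK_W)    /* 272 */
    #define REF_BLK_H   ((SR << 1) + BLK_H)    /* 272 */
    #define BLOCKS      ((PIC_WIDTH / BLK_W) * (PIC_HEIGHT / BLK_H))

    /* External memory organization */
    unsigned char curr_pic[N][PIC_HEIGHT][PIC_WIDTH]; /* original video sequence   */
    unsigned char ref_pic[PIC_HEIGHT][PIC_WIDTH];     /* one reconstructed picture */
    unsigned char pred_pic[N][PIC_HEIGHT][PIC_WIDTH]; /* predict pictures          */
    unsigned char info_pic[N][BLOCKS][64];            /* best information pictures */

    /* Internal memory organization */
    unsigned char curr_blk[N][BLK_H][BLK_W];      /* N current blocks               */
    unsigned char ref_blk[REF_BLK_H][REF_BLK_W];  /* one shared reference block     */
    unsigned char blk_info[N][64];                /* motion information, 64 B each  */
    unsigned char pred_blk[N][BLK_H][BLK_W];      /* predict blocks                 */
    unsigned char recon_blk[BLK_H][BLK_W];        /* reconstructed block            */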
According to the above definitions, the encoding flow is defined as follows:
Step 1: In a current picture loading step 101, start with a new current picture (curr_pic[0]), for which the encoder is initialized. Correspondingly, the picture coding type and other related header information are determined before proceeding to the block encoding process.
Step 2: Begin the block encoding process. For example, the encoder supports N reference frames. N current blocks (curr_blk[n]) are loaded from the subsequent N encoding current pictures, i.e. from the original video sequence in external memory, into internal memory. The amount of data loaded is:
N×16×16=256N bytes
With N=5: 256×5=1280 bytes
Step 3: In a reference block loading step 102, load one reference block (ref_blk), shared by all current blocks (curr_blk[n]), from a reference picture (ref_pic) in external memory into internal memory according to the search range. The internal memory required for the reference block is:
(SR×2+blk_width)×(SR×2+blk_height)=(128×2+16)×(128×2+16)=73,984 bytes
The corresponding memory bandwidth for loading the reference block is:
(SR×2+blk_width)×(SR×2+blk_height)×total_block_number×frame_rate=(128×2+16)×(128×2+16)×total_block_number×frame_rate=73,984×total_block_number×frame_rate (bytes/second)
The internal memory required for the motion information (blk_info[n]) of the N current blocks is:
sizeof(blk_info)×N=64N bytes
With N=5: 64×5=320 bytes
The internal memory for the reference block and the motion information together is therefore:
73,984+320=74,304 bytes
sizeof(blk_info)×N×total_block_number×frame_rate=64×N×total_block_number×frame_rate bytes/second
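As a worked check of the figures above, the following minimal C sketch reproduces the memory sizes for N=5; the variable names are illustrative only.

    /* Worked check of the memory figures above, with N = 5. */
    #include <stdio.h>

    int main(void)
    {
        const int N = 5, blk_w = 16, blk_h = 16, SR = 128, info_sz = 64;

        int curr = N * blk_w * blk_h;                   /* current blocks: 1280 bytes */
        int ref  = (SR * 2 + blk_w) * (SR * 2 + blk_h); /* reference block: 73984     */
        int info = info_sz * N;                         /* motion information: 320    */

        printf("curr_blk %d, ref_blk %d, blk_info %d, ref_blk+blk_info %d\n",
               curr, ref, info, ref + info);            /* 1280, 73984, 320, 74304    */
        return 0;
    }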
Step 4: In an integer pixel motion estimation step 103, perform integer pixel motion estimation for all the current blocks (curr_blk[n]) by using the reference block (ref_blk). Find the best integer motion information (blk_info[n]), such as motion vectors, and the best integer matching blocks (pred_blk[n]) in the reference block (ref_blk) of the reference picture (ref_pic) for all the current blocks (curr_blk[n]). Each encoder can decide which motion estimation algorithm to use; a minimal full-search sketch is given after the list below. In general, motion estimation algorithms can be classified into three types:
1. Fixed search center and fixed search range. This type of motion estimation algorithm is hardware friendly, and most hardware designs use this kind of motion estimation implementation.
2. Non-fixed search center but fixed search range, which is not good for hardware implementation.
3. Non-fixed search center and non-fixed search range, which is bad for hardware implementation.
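The following is a minimal sketch of type 1 above (fixed search center and fixed search range), using the sum of absolute differences (SAD) as the matching cost, which is an assumption for illustration; the array shapes follow the definitions of this embodiment, and the function itself is an illustrative placeholder, not the claimed implementation.

    /* Minimal full-search sketch: exhaustively match one current block against
     * every candidate position inside the loaded reference block ref_blk. */
    #include <stdlib.h>
    #include <limits.h>

    #define BLK_W  16
    #define BLK_H  16
    #define SR     128
    #define REF_W  ((SR << 1) + BLK_W)   /* 272 */
    #define REF_H  ((SR << 1) + BLK_H)   /* 272 */

    /* SAD between a current block and one candidate position in the reference block. */
    int sad(const unsigned char cur[BLK_H][BLK_W],
            const unsigned char ref[REF_H][REF_W], int y, int x)
    {
        int s = 0;
        for (int i = 0; i < BLK_H; i++)
            for (int j = 0; j < BLK_W; j++)
                s += abs((int)cur[i][j] - (int)ref[y + i][x + j]);
        return s;
    }

    /* Search every candidate position and return the best integer motion vector. */
    void full_search(const unsigned char cur[BLK_H][BLK_W],
                     const unsigned char ref[REF_H][REF_W],
                     int *best_mx, int *best_my)
    {
        int best = INT_MAX;
        for (int y = 0; y + BLK_H <= REF_H; y++) {
            for (int x = 0; x + BLK_W <= REF_W; x++) {
                int cost = sad(cur, ref, y, x);
                if (cost < best) {
                    best = cost;
                    *best_my = y - SR;   /* displacement relative to the search center */
                    *best_mx = x - SR;
                }
            }
        }
    }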
Step 5: In an interpolation step 104, prepare the data for the fractional search. Interpolate the half and quarter pixel arrays for the reference block (ref_blk):
Interpolate the horizontal, vertical and cross half pixel arrays for the reference block (ref_blk);
Interpolate the horizontal, vertical and cross quarter pixel arrays for the reference block (ref_blk).
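A minimal sketch of the half pixel interpolation follows. It uses a simple bilinear filter as an assumption for illustration (standards such as H.264 instead specify a 6-tap filter for half pixel positions), and the quarter pixel arrays can be generated analogously from the integer and half pixel samples.

    /* Sketch: horizontal, vertical and cross half pixel arrays for ref_blk,
     * generated with a bilinear filter (illustrative assumption only). */
    #define REF_W 272
    #define REF_H 272

    unsigned char half_h[REF_H][REF_W - 1];     /* horizontal half pixels       */
    unsigned char half_v[REF_H - 1][REF_W];     /* vertical half pixels         */
    unsigned char half_c[REF_H - 1][REF_W - 1]; /* cross (diagonal) half pixels */

    void interpolate_half(const unsigned char ref[REF_H][REF_W])
    {
        for (int y = 0; y < REF_H; y++)         /* average of horizontal neighbors */
            for (int x = 0; x + 1 < REF_W; x++)
                half_h[y][x] = (unsigned char)((ref[y][x] + ref[y][x + 1] + 1) >> 1);

        for (int y = 0; y + 1 < REF_H; y++)     /* average of vertical neighbors   */
            for (int x = 0; x < REF_W; x++)
                half_v[y][x] = (unsigned char)((ref[y][x] + ref[y + 1][x] + 1) >> 1);

        for (int y = 0; y + 1 < REF_H; y++)     /* average of the four corners     */
            for (int x = 0; x + 1 < REF_W; x++)
                half_c[y][x] = (unsigned char)((ref[y][x] + ref[y][x + 1] +
                                                ref[y + 1][x] + ref[y + 1][x + 1] + 2) >> 2);
    }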
Step 6: In a fractional pixel search step 105, do a fractional pixel search for all current blocks (curr_blk[n]) by using the half pixel and quarter pixel reference arrays, and obtain the best matching blocks (pred_blk[n]), i.e. the predict blocks in a predict picture, and the motion information (blk_info[n]) corresponding to the best matching blocks (pred_blk[n]) after the fractional search has finished.
In a comparing step 106, compare the results with the motion information obtained from the integer pixel motion estimation in Step 4 for all the current blocks (curr_blk[n]), and update the best results to the motion information (blk_info[n]) and the best matching blocks (pred_blk[n]).
Step 7: In a best result updating step 107, store the updated best matching blocks (pred_blk[n]) and the corresponding motion information (blk_info[n]) for all the current blocks (curr_blk[n]) back to external memory if necessary. If the best matching block (pred_blk[n]) and the corresponding motion information (blk_info[n]) have not been updated, they do not need to be stored back to external memory again.
So the maximum bandwidth for pred_blk[n] and blk_info[n] which are stored back to the external memory is:
N×(sizeof(blk_info)+(blk_width×blk_height))×total_block_number×frame_rate=N×(64+16×16)×total_block_number×frame_rate=320N×total_block_number×frame_rate bytes/second
With N=5: 320×5×total_block_number×frame_rate=1600×total_block_number×frame_rate bytes/second
Step 8: In a reference block checking step 108, if the best matching block (pred_blk[0]) of the current coding block (curr_blk[0]) does not come from the reference block (ref_blk), the encoder needs to load the best matching block (pred_blk[0]) from external memory in a best matching block loading step 118; otherwise, do nothing. The bandwidth required in this case is:
blk_width×blk_height×total_block_number×frame_rate=16×16×total_block_number×frame_rate=256×total_block_number×frame_rate (bytes/second)
Step 9: In a difference block generating step 109, obtain a difference block by subtracting the best matching block (pred_blk[0]) from the current coding block (curr_blk[0]).
Step 10: In a processing step 110, implement DCT/Quant/VLC/De-Quant/IDCT based on the difference block obtained from the difference block generating step 109.
Step 11: In a reconstructing step 111, reconstruct the current block to generate the reconstructed block (recon_blk).
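A minimal block-level sketch of Steps 9 to 11 follows. The transform and coding routines (dct, quant, vlc_encode, dequant, idct) are placeholders for whatever DCT, quantization and VLC modules the encoder uses, and are assumptions for illustration only.

    /* Sketch of Steps 9-11: form the difference block, run the forward path
     * (DCT, quantization, entropy coding) and the inverse path (de-quantization,
     * IDCT), then add the prediction back to obtain the reconstructed block. */
    #define BLK_W 16
    #define BLK_H 16

    extern void dct(const short in[BLK_H][BLK_W], short out[BLK_H][BLK_W]);
    extern void quant(short coef[BLK_H][BLK_W]);
    extern void vlc_encode(const short coef[BLK_H][BLK_W]);
    extern void dequant(short coef[BLK_H][BLK_W]);
    extern void idct(const short in[BLK_H][BLK_W], short out[BLK_H][BLK_W]);

    void encode_block(const unsigned char curr_blk[BLK_H][BLK_W],
                      const unsigned char pred_blk[BLK_H][BLK_W],
                      unsigned char recon_blk[BLK_H][BLK_W])
    {
        short diff[BLK_H][BLK_W], coef[BLK_H][BLK_W], resid[BLK_H][BLK_W];

        /* Step 9: difference block = current block - best matching block */
        for (int i = 0; i < BLK_H; i++)
            for (int j = 0; j < BLK_W; j++)
                diff[i][j] = (short)(curr_blk[i][j] - pred_blk[i][j]);

        /* Step 10: DCT / Quant / VLC / De-Quant / IDCT */
        dct(diff, coef);
        quant(coef);
        vlc_encode(coef);
        dequant(coef);
        idct(coef, resid);

        /* Step 11: reconstruct by adding the prediction back, with clipping */
        for (int i = 0; i < BLK_H; i++)
            for (int j = 0; j < BLK_W; j++) {
                int v = pred_blk[i][j] + resid[i][j];
                recon_blk[i][j] = (unsigned char)(v < 0 ? 0 : v > 255 ? 255 : v);
            }
    }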
Step 12: Store the reconstructed block (recon_blk) back to external memory. If the current picture (curr_pic[0]) can be used as a reference picture according to a reference picture checking step 122, the reconstructed block (recon_blk) will be saved as part of the reference picture (ref_pic) in the reference picture list for the next encoding picture; otherwise, only store it to the display picture buffer in a reconstructed block storing step 123. The bandwidth required is:
blk_width×blk_height×total_block_number×frame_rate=(16×16)×total_block_number×frame_rate=256×total_block_number×frame_rate bytes/second
Step 13: In a next block looping step 113, if all the blocks in the current picture (curr_pic[0]) have been processed, go to Step 1 and begin to process the next encoding picture, until all the pictures have been processed, then exit in an ending step 120. Otherwise, go to Step 2 and continue to process the next block in the current picture (curr_pic[0]).
Addr_pic(n+0)=Start_Addr+sizeof(pic_info)×(n % 5)
Addr_pic(n+1)=Start_Addr+sizeof(pic_info)×(n % 5+1)
Addr_pic(n+2)=Start_Addr+sizeof(pic_info)×(n % 5+2)
Addr_pic(n+3)=Start_Addr+sizeof(pic_info)×(n % 5+3)
Addr_pic(n+4)=Start_Addr+sizeof(pic_info)×(n % 5+4)
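These five address locations may be computed as in the following minimal sketch, a direct transcription of the formulas above; Start_Addr and sizeof(pic_info) are placeholders supplied by the surrounding memory layout.

    /* Direct transcription of the address formulas above: the k-th of the
     * five address locations for picture n. start_addr and pic_info_size
     * are placeholders. */
    #include <stddef.h>

    size_t addr_pic(size_t start_addr, size_t pic_info_size, int n, int k)
    {
        return start_addr + pic_info_size * (size_t)(n % 5 + k);  /* k = 0..4 */
    }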
The best motion information (blk_info) and the predict block (pred_blk) for all blocks of the current picture curr_pic[0] are computed from the block from the current picture curr_pic[0] in the internal memory at the first time instance 210. Then the best motion information (blk_info) and the predict block (pred_blk) for the current picture curr_pic[0] are stored in the first address location 201 in the external memory. Starting from the second time instance, the first address location 201 in the external memory will be used to store the best motion information (blk_info) and the predict block (pred_blk) for the current picture curr_pic[5], which are to be computed from the block from the current picture curr_pic[5] in the internal memory at the second time instance 220, the third time instance 230, and the fourth, fifth and sixth time instances (not shown).
The best motion information (blk_info) and the predict block (pred_blk) for the current picture curr_pic[1] are computed from the block from the current picture curr_pic[1] in the internal memory at the first time instance 210 and the second time instance 220. Then the best motion information (blk_info) and the predict block (pred_blk) for the current picture curr_pic[1] are stored in the second address location 202 in the external memory. Starting from the third time instance, the second address location 202 in the external memory will be used to store the best motion information (blk_info) and the predict block (pred_blk) for the current picture curr_pic[6], which are to be computed from the block from the current picture curr_pic[6] in the internal memory at the third time instance 230 and at the four subsequent time instances (not shown). The best motion information (blk_info) and the predict block (pred_blk) for the current picture curr_pic[2] are computed from the block from the current picture curr_pic[2] in the internal memory at the first time instance 210, the second time instance 220 and the third time instance 230. Then the best motion information (blk_info) and the predict block (pred_blk) for the current picture curr_pic[2] are stored in the third address location 203 in the external memory.
The best motion information (blk_info) and the predict block (pred_blk) for the current picture curr_pic[3] are computed from the block from the current picture curr_pic[3] in the internal memory at the first time instance 210, the second time instance 220, the third time instance 230 and the fourth time instance (not shown). Then the best motion information (blk_info) and the predict block (pred_blk) for the current picture curr_pic[3] are stored in the fourth address location 204 in the external memory.
The best motion information (blk_info) and the predict block (pred_blk) for the current picture curr_pic[4] are computed from the block from the current picture curr_pic[4] in the internal memory at the first time instance 210, the second time instance 220, the third time instance 230 and the two subsequent time instances (not shown). Then the best motion information (blk_info) and the predict block (pred_blk) for the current picture curr_pic[4] are stored in the fifth address location 205 in the external memory.
In this embodiment, there are 5 reference pictures. Furthermore, half pixel interpolation and quarter pixel interpolation may also be supported. In the claimed invention, only one interpolation operation is required for each coding block for the horizontal half pixel and horizontal quarter pixel interpolation, only one for the vertical half pixel and vertical quarter pixel interpolation, and only one for the cross half pixel and cross quarter pixel interpolation. This is much more efficient than any method which requires 5 interpolation operations for each coding block for each of the half and quarter pixel interpolations in the horizontal, vertical and cross directions.
Therefore, in this embodiment, partially parallel and pipelined operation is supported at some time instances. For example, the motion estimation process can be pipelined with the coding and reconstruct operations, and multiple motion estimation processes can run in parallel or in serial depending on the hardware implementation.
I0601 has no reference frame and is encoded into a reconstructed frame recon_I0602. The reconstructed frame recon_I0602 is used as the reference frame for P0611, B0612 and P1613. The input pictures P0611, B0612 and P1613 do motion estimation. P0611 is encoded into a reconstructed frame recon_P0610. The reconstructed frame recon_P0610 is the reference frame for B0621, P1622, B1623 and P2624. The input pictures B0621, P1622, B1623 and P2624 do motion estimation. B0621 is encoded into a reconstructed frame recon_B0620 and P1622 is encoded into a reconstructed frame recon_P1629. The reconstructed frame recon_P1629 is the reference frame for B1631, P2632, B2633 and P3634. The input pictures B1631, P2632, B2633 and P3634 do motion estimation. B1631 is encoded into a reconstructed frame recon_B1630 and P2632 is encoded into a reconstructed frame recon_P2639. The reconstructed frame recon_P2639 is the reference frame for B2641, P3642, B3643 and P4644. The input pictures B2641, P3642, B3643 and P4644 do motion estimation. B2641 is encoded into a reconstructed frame recon_B2640 and P3642 is encoded into a reconstructed frame recon_P3649. The reconstructed frame recon_P3649 is the reference frame for B3651, P4652, B4653 and P5654. The input pictures B3651, P4652, B4653 and P5654 do motion estimation. B3651 is encoded into a reconstructed frame recon_B3650 and P4652 is encoded into a reconstructed frame recon_P4659, and so on and so forth. The process continues until all N input frames are encoded, assuming there are N frames in the video to be encoded. In this embodiment of the IBPBPBPBP coding pattern, at the B frame coding and reconstruct stage, there is a parallel P frame coding and reconstruct stage.
Therefore, in this embodiment, at each time instance, the following operations run in parallel and in a pipeline for different input frames: motion estimation, coding and the reconstruct operation. For example, while blocks in B0 are encoded and reconstructed, motion estimation is applied to blocks in P1 in parallel, and while blocks in P1 are encoded and reconstructed, motion estimation is applied to blocks in B1 in parallel. There is no need to store the best matching block of P1 back into external memory, and there is no need to reload the original P1 when it is encoded and reconstructed; the bandwidth is thus further reduced.
For example, the input frame I0801 has no reference frame and is encoded and reconstructed into a reconstructed frame recon_I0802. The reconstructed frame recon_I0802 is the reference frame of the input frames P0811, P1812, b1813 and B0814. The input frames P0811, P1812, b1813 and B0814 do motion estimation. The input frame P0811 is encoded and reconstructed into a reconstructed frame recon_P0803. The reconstructed frame recon_P0803 is the reference frame of the input frames P1821, b1822, B2823, P2824, b4825 and B3826. The input frames P1821, b1822, B2823, P2824, b4825 and B3826 do motion estimation. The input frame P1 (812 and 821) is encoded and reconstructed into a reconstructed frame recon_P1810. The input frame b1 (813 and 822) is encoded and reconstructed into a reconstructed frame recon_b1819. The reconstructed frame recon_b1819 is the reference frame of the input frames B0831 and B2832. The input frames B0831 and B2832 do motion estimation. The input frame B0 (814 and 831) is encoded and reconstructed into a reconstructed frame recon_B0820. The input frame B2 (823 and 832) is encoded and reconstructed into a reconstructed frame recon_B2829. The reconstructed frame recon_P1810 is the reference frame of the input frames P2841, b4842, B5843, P3844, b7845 and B6846. The input frames P2841, b4842, B5843, P3844, b7845 and B6846 do motion estimation. The input frame P2 (824 and 841) is encoded and reconstructed into a reconstructed frame recon_P2830. The input frame b4 (825 and 842) is encoded and reconstructed into a reconstructed frame recon_b4839. The reconstructed frame recon_b4839 is the reference frame of the input frames B3851 and B5852 for motion estimation. The input frames B3851 and B5852 do motion estimation. The input frame B3 (826 and 851) is encoded and reconstructed into a reconstructed frame recon_B3840. The input frame B5 (843 and 852) is encoded and reconstructed into a reconstructed frame recon_B5849. The reconstructed frame recon_P2830 is the reference frame of the input frames P3861, b7862, B8863, P4864, b10865 and B9866 for motion estimation. The input frames P3861, b7862, B8863, P4864, b10865 and B9866 do motion estimation. The input frame P3 (844 and 861) is encoded and reconstructed into a reconstructed frame recon_P3850. The input frame b7 (845 and 862) is encoded and reconstructed into a reconstructed frame recon_b7859. The reconstructed frame recon_b7859 is the reference frame of the input frames B6871 and B8872. The input frames B6871 and B8872 do motion estimation. The input frame B6 (846 and 871) is encoded and reconstructed into a reconstructed frame recon_B6860. The input frame B8 (863 and 872) is encoded and reconstructed into a reconstructed frame recon_B8869. The process continues until all N input frames are encoded, assuming there are N frames in the video to be encoded.
The description of preferred embodiments of this claimed invention is not exhaustive, and any update or modification to them is obvious to those skilled in the art; therefore, reference is made to the appended claims for determining the scope of this claimed invention.
The claimed invention has industrial applicability in consumer electronics, in particular in video applications. The claimed invention can be used in a video encoder, and in particular, in a multi-standard video encoder. The multi-standard video encoder implements various standards such as H.263, H.263+, H.263++, H.264, MPEG-1, MPEG-2, MPEG-4, AVS (Audio Video Standard) and the like. More particularly, the claimed invention is implemented for a multiple video standard encoder which supports multiple reference picture motion estimation. The claimed invention can be used not only for software implementation but also for hardware implementation. For example, the claimed invention can be implemented in a DSP (digital signal processor) video encoder, a Xilinx FPGA chip or an SoC ASIC chip.