1. Field
Digital image data motion estimation and prediction.
2. Background
Video data motion estimation and prediction is used in video or image processing, encoding, and/or display. For example, predicting the motion of objects in images included in an input stream of video may provide better overall quality display, such as by providing a display of video and/or images that is smooth and appealing to a viewer. Specifically, the motion of objects, which are present in a current frame of image or video data, can be computed based on the previous frame, in a sequence of frames of the data by a motion estimation unit (MEU). An MEU may be used to estimate the motion in video data formatted in Moving Picture Experts Group (MPEG) (e.g., such as MPEG2 or MPEG4).
Various features, aspects and advantages will become more thoroughly apparent from the following detailed description, the claims, and accompanying drawings in which:
Motion Estimation is a process of predicting the motion of objects. In this process, the motion of objects, which are present in a current frame or image of a stream of video data, is computed based on the previous frame, in a sequence of frames of the data. Specifically, according to embodiments, a motion estimation (ME) unit or “MEU” may produce motion vectors based on comparisons of reference blocks and search window areas of images from a sequence of presumably temporally and spatially related images or material, such as a stream of video data having frames of pixel or image data. Note that the ME unit need not necessarily provide true motion vectors, but may instead provide the locations of the best matches of a reference block against an image in a particular search window. It is entirely possible that the true motion carried an object partially or fully out of the search range. Even so, the ME unit may still give an answer that represents the best match, based on a sum of the absolute differences for example, within the search window. As an alternative, the best match may be determined by computing a sum of squared differences (SSD) or other appropriate comparison or difference for each pair of current and previous blocks.
It can be appreciated that the apparatus, systems, and processes described herein may also be applied to compare or determine a difference between a reference block of data of a previous image and search window data of a current image. Furthermore, the apparatus, systems, and processes described herein may be applied to compare or determine a difference between a reference block and a search window of data of any two frames of a stream of video data, such as a stream of data having a sequence of frames of pixel data being transmitted, received, or having the capability to be displayed such that the frames appear to be in constant motion.
For example, a sum of absolute differences (SAD) may be a function applied by an ME unit or during an ME process or calculation, which indicates the difference between a block of data in the current frame and another block in the previous frame. The lower the SAD, the better the match and thus the better the overall quality of the motion estimation, image processing, encoding, and/or display. A SAD value may be calculated as:
$$\mathrm{SAD}(x,y)=\sum_{i}\sum_{j}\left|C(i,j)-P(x+i,\,y+j)\right| \qquad \text{(a)}$$
where “C(i, j)” stands for the current frame, “P(x, y)” stands for the previous frame, and “i” and “j” define the search window region (e.g., for either a 4×4 pixel block or an 8×8 pixel block).
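By way of illustration only, equation “a” and the SSD alternative mentioned above can be sketched in software as follows. This is a minimal sketch and not the described hardware: the function names `sad` and `ssd`, the list-of-lists layout, and the choice to treat the first list index as “i” and the second as “j” are assumptions made for the example.

```python
def sad(current, previous, x, y):
    """Sum of absolute differences per equation (a).

    current  -- reference block C, indexed as current[i][j] (assumed layout)
    previous -- previous-frame data P, indexed as previous[x + i][y + j]
    x, y     -- candidate offset of the block within the previous frame
    """
    total = 0
    for i, row in enumerate(current):
        for j, c in enumerate(row):
            total += abs(c - previous[x + i][y + j])
    return total


def ssd(current, previous, x, y):
    """Sum of squared differences, the alternative comparison named in the text."""
    total = 0
    for i, row in enumerate(current):
        for j, c in enumerate(row):
            diff = c - previous[x + i][y + j]
            total += diff * diff
    return total


if __name__ == "__main__":
    C = [[1, 2], [3, 4]]
    P = [[0, 0, 0], [0, 1, 2], [0, 3, 5]]
    print(sad(C, P, 1, 1), sad(C, P, 0, 0))
```

In this toy case the offset (1, 1) yields the lower value and would therefore be the better match under either measure.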
In accordance with embodiments, a MEU (e.g., such as PE4224 and/or PE6226 as described below with respect to
For example,
Pixel source 320 is for providing a source of pixel input data of a previous image to search memory 322, which may store a total search region and may send portions of the total search region to SAD engine 330 via data path 326 to form search windows (e.g., such as by providing portions thereof). Moreover, search memory 322 may provide a write address to store or write a total search region of data into search memory 322 from pixel source 320, and may also provide a read address to retrieve or read a search window from search memory 322 to SAD engine 330. Specifically, search memory 322 may provide portions (e.g., such as columns) of one or more search windows of data from a total search region of a previous image to SAD engine 330, such as according to instructions, addresses, data, or information received by search memory 322 from address generator 324. Thus, search memory 322 may be configured or described as a search region memory to store a total search region of data, pixels, or pixel blocks of a previous image, including a number of search windows of data and portions thereof. It is contemplated that search memory 322 may be a random access memory (RAM) (e.g., such as an 8 kilobyte (KB) RAM memory), a static RAM (SRAM), a dynamic RAM (DRAM), an MCH memory, a programmable memory, a local memory, a cache memory, or another appropriate memory to temporarily store data, pixels, or pixel blocks.
Similarly, reference source 310 is for providing a source of reference input data of a current image, such as reference block 312, to SAD engine 330 via data path 316. More particularly, reference source 310 may provide a write address to store or write a reference block of data into SAD engine 330.
According to embodiments, reference source 310, such as including a current image, and pixel source 320, such as including a previous image, may be part of a digital data stream of pixels, video, source input, and/or image data. For example, the digital data stream may include frames of data pixels, and/or images, such as a current frame or image and previous frame or image, from video data of related images, frames, data, pixels, etc.
It is also considered that pixel source 320 may be or may be provided by one or more registers, cluster communication registers (CCRs), general purpose registers (GPRs), data paths, or couplings (e.g., such as described herein with respect to couplings 230 through 237 and 260 of
SAD engine 330 may access or obtain search windows of image pixel data from search memory 322 and a reference block of image pixel data from the reference storage device within SAD engine 330 to determine SAD values between the reference block of data from the current image and the plurality of search windows of data from the previous image. Moreover, each search window may include a first part or portion of a previous search window already compared (e.g., such as previously compared with respect to time) with the reference block by the SAD engine, and another part or portion of a subsequent different search window adjacent to the previous search window. For example, reference block 312 and search memory 322 may include a number of pixels of video or image data, such as from a data stream as described herein. Thus SAD engine 330 may be a comparison, difference, SAD, or SSD engine, array, unit, comparison unit, processor, signal processor, digital signal processor, or other computing entity as described herein that compares one or more pixels of reference block 312 with one or more pixels of search memory 322. Specifically, SAD engine 330 may calculate a SAD value equal to a sum of absolute values which are the value of a pixel of the reference block less a value of a pixel of the search window (e.g., such as described by equation “a” above).
In addition, search memory 322 may provide data to temporary registers within SAD engine 330. Similarly, reference block 312 may be stored in a reference storage device within SAD engine 330. More particularly, SAD engine 330 may calculate, be configured to calculate, and/or be programmed to calculate a SAD value for various sized pixel blocks. Specifically, SAD engine 330 may calculate a SAD value for an 8×8 and/or 4×4 pixel block within reference block 312 as compared to the search window temporary registers.
For instance,
For example, temporary register 1-532 may receive pixels from search memory 322 via data path 521, such as where data path 521 may be part of data path 326. Thus, a temporary register herein, such as temporary register 1-532, may be considered or described as a search memory to store search window data, pixels, pixel blocks, and portions thereof from a previous image. In addition, search memory 322 may be described as a search memory as well.
Likewise, reference register 1-534 may be a part of a reference storage device as described above, such as to store a pixel of data of reference block 312 (e.g., part of reference block 312 stored in SAD engine 330). Reference registers, such as reference register 1-534, may be considered or described as a reference storage device to store reference block data, pixels, pixel blocks, and portions thereof from a current image. Thus, reference register 1-534 may receive a pixel of reference block data via data path 511, such as where data path 511 may be part of data path 316. Absolute difference unit 1-538 receives the search window data stored in temporary register 1-532 and the reference block data stored in reference register 1-534 and produces an absolute difference between the data, such as by producing an absolute difference value for the value of the pixel of search window data as compared to the pixel of reference block data. The absolute difference calculated may then be output via data path 533.
SAD engine 330 may include 64 pairs of registers coupled to absolute difference units, such as to process an 8×8 pixel block reference block of data as compared to an 8×8 search window. Thus, SAD engine 330 may include register pairs and absolute difference units 1 through 64. Specifically,
Specifically, as shown in
It may be appreciated that the structure shown in
SAD engine 330 may also produce a motion vector or vectors providing the location of the best match of a reference block against an image in a particular search window. For example, SAD engine 330 may produce, identify, or generate a motion vector corresponding to any SAD value as described above, such as a motion vector corresponding to a 4×4 pixel block SAD value, an 8×8 pixel block SAD value, an 8×16 pixel block SAD value, a 16×8 pixel block SAD value, a 16×16 pixel block SAD value, etc., as mentioned herein. Specifically, the motion vector may be a vector for a best match based on the SAD value, equal to the subtraction, comparison, or difference between a location of a reference block in a current image and a location of a corresponding block of search data (e.g., such as a block of search data for which the SAD value or values have been calculated by SAD engine 330) of a total search region (e.g., such as search region 420 of
According to one embodiment, SAD engine 330 may calculate a motion vector as described above for each of four different 4×4 pixel blocks within a reference block as compared to one or each search window, as well as for one 8×8 pixel block within the reference block as compared to one or each search window of data. In one instance, SAD engine 330 may implement an 8×8 pixel block SAD, and optionally four 4×4 pixel block SADs within the 8×8 pixel block SAD, using a pipelined implementation with a throughput of 1 SAD calculation per clock cycle.
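As an illustrative sketch only, the relationship between the four 4×4 pixel block SADs and the enclosing 8×8 pixel block SAD can be shown in software. Because the four 4×4 quadrants partition the 8×8 block, the 8×8 SAD equals the sum of the four quadrant SADs. The function name `block_sads`, the row-major list layout, and the quadrant ordering (top-left, top-right, bottom-left, bottom-right) are assumptions for the example; the described SAD engine need not be organized this way.

```python
def block_sads(ref, win):
    """Return ([four 4x4 quadrant SADs], 8x8 SAD) for two 8x8 blocks.

    ref, win -- 8x8 lists of rows (reference block and search window).
    Quadrant order (assumed): top-left, top-right, bottom-left, bottom-right.
    """
    quads = [0, 0, 0, 0]
    for r in range(8):
        for c in range(8):
            q = (r // 4) * 2 + (c // 4)  # which 4x4 quadrant this pixel is in
            quads[q] += abs(ref[r][c] - win[r][c])
    # The quadrants partition the block, so their sum is the 8x8 SAD.
    return quads, sum(quads)


if __name__ == "__main__":
    ref = [[0] * 8 for _ in range(8)]
    win = [[(r // 4) * 2 + (c // 4) + 1 for c in range(8)] for r in range(8)]
    print(block_sads(ref, win))
```

Sharing the per-pixel absolute differences in this way is one reason a single pass can yield both block sizes.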
For instance,
It can be noted that reference block 312 may have a size similar to that described above with respect to search windows for
Address generator 324 may select or identify a total search region or portion thereof of data, pixels, or pixel blocks of a previous image to be stored in search memory 322. Address generator 324 may send a write address or addresses of search memory 322 identifying an address or addresses of search memory 322 to which a total search region or portion thereof is to be written (e.g., such as the address to temporary register 1-532 and temporary register 64-542). In one example, the write address would correspond to the addresses in search memory 322 to which total search region 420 is to be written.
Also, according to embodiments, address generator 324 may select or identify the search window or portion thereof to be compared with reference block 312. More particularly, generator 324 may generate a read address or addresses corresponding to an address or addresses in search memory 322, where the address or addresses correspond to or are the address of a portion of data, pixels, or pixel blocks of a previous image to be stored in temporary registers of SAD engine 330 (e.g., such as to be stored in temporary register 1-532 and temporary register 64-542 to form a search window). In fact, address generator 324 may select one or more of portions 430 through 438, such as by selecting portions 430 and 432 to form a first search window, and then appending portion 434 to portion 432 to form a second search window, as described above and as shown in
Specifically, for example, address generator 324 may generate a read address corresponding to a 1×8 column of data, such as portion 434 so that when search memory 322 receives that address it sends portion 434 to append portion 434 to portion 432 (e.g., such as where portion 432 is an “old” portion of data included in a search window for which SAD values have previously been calculated) to form a search window at the temporary registers of SAD engine 330, as described above with respect to
Moreover, SAD engine 330 may include one or more adders to add portions 430 to 438 of total search region 420, to form search windows by adding or combining data, pixels, or pixel blocks of a previous image, and/or as described above with respect to
Once enough search window data is present and the reference data is stored in the SAD engine 330, a command can be provided to the SAD engine along with start and end addresses, to do the SAD computation(s). The start and end addresses could be the same in which case the SAD computation may be performed at single pixel position.
In this architecture, a column of 8 pixels may be sent from search memory 322 to temporary registers of SAD engine 330 every clock cycle. As such, at the end of 8 cycles, the entire 8×8 search window data would reside or be stored in SAD engine 330. SAD engine 330 can then compute the SAD value and send the SAD value out to downstream stages (e.g., such as motion estimation of image processing or encoding post-processing, a motion estimation threshold stage, threshold unit 340, expansion unit 350, and/or SAD memory 352 as described below).
According to embodiments, during the next clock cycle, another column of 8-pixels may be sent to temporary registers of SAD engine 330 and the resulting SAD computation can be the value at the position offset by 1 in the x-direction. This processing may continue until the column of 8-pixels at the end of the row is sent and the SAD value including that row is calculated and processed. Moreover, SAD engine 330 may compute SAD values at both a 4×4 pixel block level as well as an 8×8 pixel block level. Thus, the SAD engine may produce one set of SAD value output(s) and motion vector(s) every clock cycle once the pipeline is full of columns of 8-pixels.
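The column-per-cycle behavior described above can be modeled, purely for illustration, with a small software simulation. Each loop iteration stands in for one clock cycle in which one column of 8 pixels is shifted in; once 8 columns are present, one SAD value is produced per iteration at an x offset that advances by 1. The function name `sliding_sads` and the use of a `deque` to stand in for the temporary registers are assumptions of this sketch.

```python
from collections import deque


def sliding_sads(ref, region):
    """Model one row of the search: shift in one 8-pixel column per 'cycle'.

    ref    -- 8x8 reference block as a list of rows
    region -- 8 rows by W columns of previous-image data (W >= 8)
    Returns a list of (x_offset, sad) pairs, one per cycle once 8 columns
    have been loaded.
    """
    window = deque(maxlen=8)  # the 8 most recent columns (oldest drops out)
    results = []
    for col in range(len(region[0])):
        window.append([region[r][col] for r in range(8)])
        if len(window) == 8:  # pipeline is full
            s = sum(abs(ref[r][c] - window[c][r])
                    for c in range(8) for r in range(8))
            results.append((col - 7, s))
    return results


if __name__ == "__main__":
    ref = [[c for c in range(8)] for _ in range(8)]
    region = [[c for c in range(10)] for _ in range(8)]
    print(sliding_sads(ref, region))
```

Only the newly arrived column changes between consecutive windows, which is why a single column transfer per cycle is enough to sustain one result per cycle.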
Moreover, MEU 300 may be programmed to handle various ME search window selection algorithms such as a full search, a logarithmic search, a three-tier search, a diamond search, etc. For instance, it is contemplated that address generator 324 may be programmable, such as by including a memory to store a program, configuration registers to be configured, or other known programmable means, to select the portions or search windows of data from total search region 420 according to various programmable patterns and for motion estimation selection algorithms. For example, address generator 324 may select portions of search windows or search windows according to a full search pattern, a logarithmic search pattern, a diamond search pattern, or another search pattern as known in the art. A full search pattern may include appending portions 430 through 438 as described above to form consecutive search windows moving in direction D1 as shown in
Referring to
Thus, after SAD engine 330 produces SAD value(s) and motion vector(s), threshold unit 340 may then receive the SAD value(s) and compare them against one or more corresponding threshold value(s). If a threshold value is met (e.g., such as by a SAD value being less than, or less than or equal to, the threshold value), or the end of the search region is reached, then threshold unit 340 may send out the motion vector(s) and the corresponding SAD value(s). Specifically, threshold unit 340 may send out both 4×4 and 8×8 pixel block motion vectors and SAD values for an 8×8 pixel block reference block as compared to 8×8 pixel block search windows. In addition, once a threshold value is met, threshold unit 340 may send out a termination or halt signal to cause early termination or halting of the motion vector search algorithm.
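As a hedged illustration of the early-termination behavior just described, the following sketch consumes a sequence of (motion vector, SAD value) pairs and stops as soon as the threshold is met, or otherwise returns the best pair found when the sequence ends. The function name `search_with_threshold`, the tuple representation of motion vectors, and the choice of “less than or equal to” as the test are assumptions; the text permits either “less than” or “less than or equal to”.

```python
def search_with_threshold(candidates, threshold):
    """Return (best_mv, best_sad, terminated_early).

    candidates -- iterable of (motion_vector, sad_value) pairs, in search order
    threshold  -- SAD value at or below which the search halts (assumed <=)
    """
    best_mv, best_sad = None, None
    for mv, s in candidates:
        if best_sad is None or s < best_sad:
            best_mv, best_sad = mv, s
        if best_sad <= threshold:
            # Threshold met: emit the result and signal early termination.
            return best_mv, best_sad, True
    # End of the search region reached without meeting the threshold.
    return best_mv, best_sad, False


if __name__ == "__main__":
    stream = [((0, 0), 50), ((1, 0), 20), ((2, 0), 5), ((3, 0), 1)]
    print(search_with_threshold(stream, 10))
    print(search_with_threshold(stream, 0))
```

With a threshold of 10 the search stops at the third candidate and never evaluates the fourth, which is the saving that early termination is meant to provide.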
According to embodiments, threshold unit 340 may be a programmable architecture and/or post data processing unit to SAD engine 330 having at least one threshold memory block or threshold cell. Thus, threshold unit 340 may include one or more threshold cells for determining whether or not one or more SAD values satisfy, meet, are less than, are less than or equal to, or exceed a threshold value, such as a threshold value selected, entered, programmed, chosen, or input to the threshold unit from or by an apparatus, a PE, and/or a person or user. For example,
For example,
Correspondingly, register 622 may hold a SAD value that is the best SAD value determined for the cell so far or thus far according to calculations performed by the threshold cell. For instance, register 622 may contain, store, hold, or otherwise maintain temporarily or permanently a value of a best SAD value for the 4×4 pixel block, or 8×8 pixel block. Likewise, register 620 may hold the motion vector corresponding to the SAD value held at register 622, such as a motion vector corresponding to a SAD value as described above with respect to SAD engine 330 of
Moreover,
Specifically, subtractor 630 may be a subtractor or comparator to compare a progression or sequence of SAD values, for a reference block as compared to a progression or sequence of search windows of a total search region (e.g., such as for 4×4 or 8×8 pixel blocks), with a best SAD value (e.g., such as a best SAD value determined thus far for the progression or sequence of search windows as compared to that specific reference block). It may do so by comparing the scalar SAD values received and temporarily stored at register 612 with whatever current best SAD value is stored at register 622, and updating the best SAD value at register 622 with any value temporarily stored at register 612 that is better than, such as by being less than, the value stored at register 622.
Correspondingly, each time a SAD value stored at register 612 is determined to be better than the best SAD value stored at register 622, the motion vector stored at register 610 is also identified as, stored at, or used to replace the motion vector stored at register 620.
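The register behavior described for the cell can be sketched, for illustration only, as a small class whose attributes are named after the registers in the text: register 612 holds the incoming SAD value, register 610 the incoming motion vector, register 622 the best SAD value so far, and register 620 the corresponding best motion vector. The class name `BestSadCell`, the method name `clock`, and the use of `None` for an empty register are assumptions of the sketch.

```python
class BestSadCell:
    """Software stand-in for the best-SAD tracking portion of a threshold cell."""

    def __init__(self):
        self.reg_610 = None  # incoming motion vector
        self.reg_612 = None  # incoming SAD value
        self.reg_620 = None  # best motion vector so far
        self.reg_622 = None  # best SAD value so far

    def clock(self, motion_vector, sad_value):
        """Latch one (motion vector, SAD) pair and update the best pair."""
        self.reg_610, self.reg_612 = motion_vector, sad_value
        # Comparison step: "better" is taken here to mean strictly less than.
        if self.reg_622 is None or self.reg_612 < self.reg_622:
            self.reg_622 = self.reg_612
            self.reg_620 = self.reg_610
        return self.reg_620, self.reg_622


if __name__ == "__main__":
    cell = BestSadCell()
    for mv, s in [((0, 0), 30), ((1, 0), 40), ((2, 1), 12)]:
        print(cell.clock(mv, s))
```

Updating both registers under the same condition keeps the stored motion vector paired with the stored SAD value at all times.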
In addition, cell 600 may include threshold comparator 650, as shown in
For instance, an active signal transmitted on termination line 660 may cause termination, halting, or discontinuation of, or may otherwise stop, SAD value calculations by SAD engine 330, address generation by address generator 324, search window determination by search memory 322, threshold value determination by threshold unit 340, and/or determinations described for cell 600 as described herein. Moreover, upon determining that a SAD value satisfies or is better than the threshold SAD value stored in register 654, the SAD value better than the threshold value and the motion vector corresponding to that SAD value may be stored and/or output, transmitted, or sent to downstream processing upon or after termination related to the active signal on termination line 660.
In one embodiment, a cell similar to cell 600 (e.g., such as a cell including threshold comparator 650) exists for each of threshold cells 342 through 348. Thus, after generation of each set of SAD values and corresponding motion vectors by SAD engine 330 for each search window compared to the reference block, a best SAD value and associated motion vector is determined for four 4×4 pixel blocks and an 8×8 pixel block, and the best SAD value is compared to the threshold value for each of the four 4×4 pixel blocks and the 8×8 pixel block.
It is contemplated that the processing described above with respect to threshold unit 340 and/or cell 600 may occur once per clock cycle. In other words, during a first clock cycle, SAD engine 330 may determine four 4×4 SAD values and/or an 8×8 SAD value and corresponding motion vectors for a reference block of a current image as compared to a search window of a previous image and transmit those values and vectors to threshold unit 340. Then, during a subsequent clock cycle, the SAD engine may determine another set of SAD values and vectors, while threshold unit 340 compares the SAD values received to current best SAD values to make a best SAD value determination and determines whether any of the SAD values and/or best SAD values is better than a threshold value.
Thus, threshold unit 340 and/or cells 600 may output a best SAD value and/or corresponding motion vector for each best SAD value for four 4×4 pixel blocks and an 8×8 pixel block prior to, upon, or after transmitting an active signal on one or more termination lines, similar to line 660, or upon completion of SAD value calculations for a total search region, such as search region 420. In other words, as shown in
In addition, an MEU as described above, such as MEU 300, may be programmable to handle SAD computations at 4×4 and 8×8, and can also be extended to handle reference block sizes greater than 8×8 (e.g., 8×16, 16×8, 16×16, etc.). For instance, embodiments of MEU 300 can include programmable logic circuits and registers to allow a user to change a pixel block size of a reference block of data and a plurality of search windows of data that the comparison unit is to ultimately compare. Thus, MEU 300 may have the capability to send out the SAD value computed at every pixel to the destination. In one case, this feature may be used to extend this architecture to support 16×16 pixel block SAD values. In this case, an 8×8 pixel block SAD computation may be done using the reference block from the left quadrant 8×8 reference block, and the resulting SAD value at every pixel is sent out to the destination, where it is stored temporarily.
For example, according to embodiments, as shown in
Therefore, according to embodiments, expansion unit 350 of
Thus, adder 354 may be used to add SAD values and/or motion vectors for a set of search windows of a total search region as compared to a first reference block of data (e.g., such as a reference block of data of a first 8×8 pixel block quadrant of a 16×16 total reference block) stored in SAD memory 352 to corresponding SAD values and motion vectors for the same set of search windows as compared to a second reference block of data (e.g., such as a second 8×8 pixel block reference block of data of a 16×16 total reference block) for the same total search region, such as by adding the SAD value and motion vector calculated at each pixel of the total search region for both of the reference blocks. Furthermore, the added SAD values and motion vectors output by adder 354 may be subsequently stored in SAD memory 352 or may replace the values previously stored there (e.g., such as by replacing the SAD values and motion vectors stored in SAD memory 352 for the first reference block with the SAD values and motion vectors added at adder 354 for the first and second reference blocks). Using this architecture or process, it is possible to add together SAD values and motion vectors for subsequent reference blocks (e.g., such as four 8×8 reference blocks of a 16×16 total reference block of data, where the four 8×8 reference blocks represent the four quadrants of the 16×16 total reference block) to determine a set of total SAD values and/or total motion vectors for a total reference block of data greater than 8×8 (e.g., such as an 8×16, 16×8, 16×16, 32×32, etc. total reference block of data).
It is appreciated that the SAD values and motion vectors added by adder 354 for more than one reference block of data will have to take into consideration the locations of the reference blocks of data as compared to each other in the current image. For example, adder 354 may add SAD values for a second 8×8 pixel block reference block of data of a 16×16 total reference block as compared to a total search region to SAD values for a first 8×8 pixel block reference block of data of the 16×16 total reference block as compared to the same total search region, where the first reference block is a first 8×8 pixel block of a current image and the second reference block is the subsequent or next 8×8 reference block of data of the current image (e.g., such as where the first reference block is rows 0-7 and columns 0-7 of pixels of the current image and the second reference block is rows 0-7 and columns 8-15 of the pixel blocks of the current image). In this case, an appropriate offset of the first set of SAD values and motion vectors from SAD memory 352 as compared to the second set of SAD values and motion vectors generated by SAD engine 330 for the second reference block must be considered. An appropriate offset will cause adder 354 to add the first set and second set of SAD values and motion vectors that correspond to the appropriate pixel location within the total search region (e.g., such as by adding to the SAD value and motion vector calculated for each pixel of the first reference block stored in SAD memory 352 with the SAD value and motion vector calculated for a pixel 8 pixels to the right, or 8 columns over but in the same row, of the second reference block determined by SAD engine 330).
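The offset handling described above can be illustrated with a short sketch that accumulates per-position 8×8 SAD values for the four quadrants of a 16×16 total reference block, reading each quadrant's values at a position shifted by that quadrant's location (8 columns to the right, 8 rows down, or both). The function names `sad_map` and `accumulate_16x16`, the row-major layout, and the list named `sad_memory` (standing in for SAD memory 352) are assumptions of the sketch, which accumulates SAD values only.

```python
def sad_map(ref, region):
    """SAD of an n-by-n block `ref` at every valid (row, col) of `region`."""
    n = len(ref)
    rows = len(region) - n + 1
    cols = len(region[0]) - n + 1
    return [[sum(abs(ref[r][c] - region[y + r][x + c])
                 for r in range(n) for c in range(n))
             for x in range(cols)]
            for y in range(rows)]


def accumulate_16x16(ref16, region):
    """Build 16x16 SAD values from four 8x8 quadrant passes with offsets."""
    rows = len(region) - 16 + 1
    cols = len(region[0]) - 16 + 1
    sad_memory = [[0] * cols for _ in range(rows)]  # running totals
    # (row, col) position of each quadrant within the 16x16 reference block.
    for dr, dc in [(0, 0), (0, 8), (8, 0), (8, 8)]:
        quad = [row[dc:dc + 8] for row in ref16[dr:dr + 8]]
        qmap = sad_map(quad, region)  # one full 8x8 pass over the region
        for y in range(rows):
            for x in range(cols):
                # Add the quadrant's SAD taken at the offset position.
                sad_memory[y][x] += qmap[y + dr][x + dc]
    return sad_memory


if __name__ == "__main__":
    region = [[(r * 7 + c * 13) % 31 for c in range(18)] for r in range(18)]
    ref16 = [[(r * 5 + c * 3) % 17 for c in range(16)] for r in range(16)]
    print(accumulate_16x16(ref16, region) == sad_map(ref16, region))
```

Comparing the accumulated result with a direct 16×16 computation, as the usage lines do, is a simple way to check that the offsets have been applied consistently.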
Moreover, once a total search region is completed, the above process may be repeated using the 2nd quadrant 8×8 reference block, while at the same time the SAD values from the 1st quadrant are sent to adder 354 from SAD memory 352. At adder 354, the SAD value computed at every pixel for the 2nd quadrant is then added to the SAD value from the corresponding pixel in the 1st quadrant and sent out to SAD memory 352, where it is stored temporarily again. This procedure is repeated for a 3rd and 4th quadrant to get the entire 16×16 total reference block's SAD value. This approach allows computation of SAD for blocks greater than 8×8 (16×8, 8×16, 16×16, etc.) using external temporary storage (e.g., SAD memory 352 and/or a MCH as described for
It is also contemplated that a SAD value compared to the threshold value of register 654 may be a SAD value received from a SAD value stored in a memory. Hence, for embodiments using expansion unit 350, threshold unit 340 may store a threshold value, such as a selected value as described above with respect to threshold register 654 for the total reference block (e.g., such as a total reference block having a size greater than an 8×8 pixel block, such as a total reference block of 8×16, 16×8, 16×16, 32×32, etc. pixel blocks). Thus, it is contemplated that threshold unit 340 may include a threshold value to compare to the total SAD value for each pixel generated by adder 354 upon completion of adding the values at each pixel for all of the reference blocks of data for the total reference block region (e.g., such as by comparing the total SAD value at each pixel of the total search region, after the SAD values for each of the four 8×8 reference block quadrants of a total 16×16 reference block region have been added together at each of the pixels, to the threshold value). In cases of SAD values for pixel blocks greater than 8×8 (e.g., such as 16×16 pixel block reference blocks), threshold unit 340 may simply compare SAD values received with the threshold value stored in register 654.
It is to be appreciated that SAD values and motion vectors for various other locations or quadrants of reference blocks of data as compared to the total search region may also be considered when adding SAD values and motion vectors for a third, fourth, etc. reference block of data to the SAD values and motion vectors of the first and second; first, second, and third; etc. reference blocks stored in SAD memory 352.
Thus, the various reference blocks of data to be compared to the total search region may be related, corner to corner, adjacent, super adjacent, or otherwise associated in location within the current image. More particularly, the SAD values and motion vectors for a third quadrant may be offset by considering pixels or loads of pixels that are down or below the first quadrant pixel by eight pixels or eight loads and are in the same first eight columns, or in the same eight columns as the first quadrant, to form a third quadrant of an 8×8 pixel block reference block of data for a four 8×8 pixel block quadrant 16×16 total reference block of current image data.
More particularly, according to one embodiment, where the total reference block is a 16×16 pixel block separated into four 8×8 reference blocks having SAD values and motion vectors added by adder 354, threshold unit 340 (e.g., such as including a cell 600 having a threshold value stored in threshold register 654 for a 16×16 total reference block) may wait until SAD values and motion vectors for all four 8×8 reference blocks of data have been added together via adder 354 before determining whether the threshold value is satisfied. Thus, in this case, as the SAD values and motion vectors for the fourth quadrant 8×8 pixel block reference block of data are added to the first 3 quadrants of SAD values and motion vectors (e.g., such as by adder 354 adding the SAD values and motion vectors for quadrants 1, 2, and 3 added together and stored in SAD memory 352 to the SAD values and motion vectors being calculated by SAD engine 330 for the fourth 8×8 pixel block reference block of data stored at reference block 312) threshold unit 340 may determine whether the threshold value stored at threshold register 654 is met for each pixel of the total search region. In other words, during one clock cycle, SAD engine 330 may be determining SAD values and motion vectors for the fourth quadrant reference block of data, and during that or a subsequent clock cycle, adder 354 may be adding the SAD values and motion vectors for the fourth quadrant to those of the first three quadrants, and during that subsequent or another subsequent clock cycle threshold unit 340 may be determining whether the SAD value and/or threshold value for a pixel for all four quadrants of reference block data satisfy the threshold value at that pixel. 
Thus, if the SAD value of all four quadrants added together for a certain pixel location of the total search region satisfies or is less than the threshold value for the total 16×16 reference block, threshold comparator 650 and multiplexors 652 and 653 may output an active signal on termination line 660 and the best SAD value and best motion vector via lines 658 through 660, as described above with respect to
Also, according to embodiments, MEU 300 may exclude or not use expansion unit 350, such as by not including or using adder 354 or SAD memory 352, but instead having data path 353 equal to data path 333.
At block 720, total search region “Y” is stored. Block 720 may correspond to storing a total search region of pixel data of a previous image such as described above with respect to pixel source 320 of
At block 730, one or more threshold values “Th” are stored. Block 730 may correspond to storing threshold values such as described above with respect to threshold unit 340, cell 600, and/or threshold register 654.
According to embodiments, the process as described above with respect to blocks 710, 720, 730, and/or 740 may be performed in various orders. Specifically, according to one embodiment, the order of occurrence may be block 720, block 710, block 730, and then block 740.
At block 740, search window “Z” is stored. Block 740 may correspond to storing or generating a search window of data from a total search region as described above with respect to search memory 322, address generator 324, SAD engine 330, and/or temporary register 532. Specifically, at block 740, consecutive 1×8 pixel blocks or columns of pixel data may be sent to SAD engine 330 to create a consecutive search window for each consecutive block or column of data as described with respect to
At block 750, a current one or more SAD values (e.g., such as a set of SAD values for four 4×4 pixel blocks and an 8×8 pixel block and motion vectors corresponding thereto) may be calculated for reference block X as compared to search window Z. Block 750 may correspond to calculating one or more SAD values and determining one or more motion vectors corresponding to those SAD values as described above with respect to SAD engine 330, and data path 333.
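As an illustration only, the set of SAD values described for block 750 can be sketched in software. The following Python sketch assumes 8×8 blocks given as lists of rows of integer pixel values; the function names `sad`, `sub_block`, and `sad_set_8x8` are hypothetical and do not correspond to any element of SAD engine 330. It relies on the fact that the SAD of an 8×8 block equals the sum of the SADs of its four 4×4 sub-blocks.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def sub_block(block, top, left, size):
    """Return the size x size sub-block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in block[top:top + size]]


def sad_set_8x8(ref, cand):
    """Return (four 4x4 SADs in raster order, 8x8 SAD) for two 8x8 blocks.

    Assumption: the 8x8 value is formed by summing the four 4x4 values,
    which is mathematically equal to computing the 8x8 SAD directly.
    """
    quads = [sad(sub_block(ref, top, left, 4), sub_block(cand, top, left, 4))
             for top in (0, 4) for left in (0, 4)]
    return quads, sum(quads)
```

One pass over the pixel data therefore yields five SAD values, which is one plausible reading of "a set of SAD values for four 4×4 pixel blocks and an 8×8 pixel block".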
At block 760, the current SAD values and motion vectors are stored. Block 760 may correspond to storing one or more SAD values and motion vectors as described above with respect to threshold unit 340, register 610, and register 612.
At decision block 770, it is determined whether any of the current SAD values are better than a best SAD value. For example, block 770 may represent comparing a SAD value to a best SAD value as described above with respect to threshold unit 340, cell 600, register 622, subtractor 630, and/or multiplexor 632. If at decision block 770 no current SAD value is better than a best SAD value, the process continues on to decision block 785.
On the other hand, if at decision block 770 a current SAD value is better than a best SAD value, then the process proceeds to block 780. At block 780, any current SAD value(s) determined to be better than a best SAD value, and the vectors corresponding to those SAD values, are stored as, written over, or replace the current best SAD value(s) and corresponding vector(s). Block 780 may correspond to storing a best SAD value and corresponding motion vector as described above with respect to threshold unit 340, cell 600, register 620, register 622, subtractor 630, and/or multiplexor 632.
At decision block 785 it is determined whether any best SAD value satisfies a threshold value. Block 785 may correspond to comparing a SAD value or a best SAD value as described above with respect to threshold unit 340, cell 600, threshold comparator 650, threshold register 654, subtractor 651, multiplexors 652 and 653, termination line 660, best SAD line 658, and/or best motion vector line 659. If at block 785 any best SAD value satisfies or is less than a corresponding threshold value, the process continues on to block 795.
At block 795 calculating is halted or terminated. Block 795 may correspond to the description above with respect to threshold unit 340, cell 600, threshold comparator 650, threshold register 654, subtractor 651, multiplexors 652 and 653, and termination line 660.
At block 796, the best SAD value or values and corresponding motion vector or vectors are sent or transmitted to downstream processing. Block 796 may correspond to the description above with respect to threshold unit 340, cell 600, threshold comparator 650, threshold register 654, subtractor 651, multiplexors 652 and 653, best motion vector line 659, and best SAD line 658.
If at block 785, no best value satisfies or is less than a corresponding threshold value, the process continues to decision block 790. At decision block 790 it is determined whether the total search region is exhausted, such as by determining whether all search windows of a total search region have been processed by the motion estimation unit. For example, block 790 may correspond to determining whether all search windows of total search region 420 have been processed as described above with respect to SAD engine 330, threshold unit 340, cell 600, threshold comparator 650, and/or expansion unit 350. If at block 790 the total search region has not been exhausted or processed then the process continues to block 792 where “Z” is incremented by 1. After block 792, the process continues back to block 740 where another search window is loaded and the process continues.
If at block 790 the total search region is exhausted, the process continues to block 796, where the best SAD value or values and corresponding motion vector or vectors are sent, as described above.
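The flow of blocks 740 through 796 can be sketched in software as a search loop with early termination. This is a minimal sketch under stated assumptions, not the hardware behavior: the function name `search_with_early_termination` is hypothetical, search windows are visited in raster order, a "better" SAD value is taken to mean a smaller one, and the motion vector is represented as the (dx, dy) position of the window within the total search region.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def search_with_early_termination(ref, region, threshold):
    """Return (best_sad, best_mv, terminated_early).

    ref is an n x n reference block X, region is the total search region Y.
    Assumption: windows are scanned in raster order and (dx, dy) is the
    window's top-left position inside the region.
    """
    n = len(ref)
    best_sad, best_mv = None, None
    for dy in range(len(region) - n + 1):
        for dx in range(len(region[0]) - n + 1):
            # Block 740: load search window Z.
            window = [row[dx:dx + n] for row in region[dy:dy + n]]
            # Blocks 750 and 760: calculate and hold the current SAD value.
            current = sad(ref, window)
            # Blocks 770 and 780: keep the current value if it is better.
            if best_sad is None or current < best_sad:
                best_sad, best_mv = current, (dx, dy)
            # Block 785: halt (block 795) and send (block 796) on a hit.
            if best_sad <= threshold:
                return best_sad, best_mv, True
    # Block 790: region exhausted, send the best match found (block 796).
    return best_sad, best_mv, False
```

With a threshold of zero the loop stops only on an exact match; a larger threshold trades match quality for fewer SAD computations.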
At block 820, total search region “Y” is stored. Block 820 may correspond to the description above for block 720.
At block 830 reference block “X” is stored or loaded. Reference block X may be a total or a subdivision of total reference region W. For example, reference block X may be an 8×8 pixel block of data that is a portion or quadrant of total reference region W of a current image (e.g., such as where W is a 16×16 pixel block total reference block). In addition, block 830 may correspond to the description above with respect to block 710.
At block 840, one or more threshold values “Th” are stored. Block 840 may correspond to descriptions above with respect to block 730, threshold unit 340, cell 600, threshold register 654, threshold comparator 650, and/or expansion unit 350. Specifically, block 840 may correspond to storing a threshold value for a block of pixel data having a size greater than an 8×8 pixel block, such as for a 16×16 pixel block.
It is contemplated that blocks 810, 820, 830, 840 and/or 850 may occur in various orders. For example, block 820 may occur before any of the other blocks and/or block 840 may occur before any of blocks 810 through 830. Similarly, the order of block 810 and block 820, or block 830 and block 840 may be reversed. In addition, block 830 may occur before block 820. Finally, block 850 may occur prior to block 840 or block 810, so long as block 850 occurs after block 820.
At block 850, search window “Z” is stored. Block 850 may correspond to the description above with respect to block 740.
At block 860, the SAD value or values and motion vectors for block X and search window Z are calculated. Block 860 may correspond to the description above with respect to block 750, SAD engine 330, expansion unit 350, adder 354, and/or SAD memory 352.
At block 870, the SAD values and motion vectors calculated at block 860 are added to SAD values and motion vectors currently stored in the SAD memory. Block 870 may correspond to the descriptions above with respect to expansion unit 350, SAD memory 352, adder 354, threshold comparator 650, subtractor 651, and/or threshold register 654. It may be appreciated that if the current SAD values and motion vector values stored in the SAD memory are zero, do not exist, or are for a previous total search region (e.g., such as being for total search region 420 while current SAD value calculations are being performed for total search region 422) then the SAD values and motion vectors calculated at block 860 may replace, or become, the total value stored in the SAD memory. For example, the SAD values and motion vectors calculated at block 860 may replace any current zero or non-zero SAD values and motion vector values stored in the SAD memory, such as when the SAD values calculated at block 860 are for a first portion or quadrant of a total reference block.
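The accumulate-or-replace behavior described for block 870 can be illustrated with a short Python sketch. It is an assumption-laden model only: the SAD memory is represented as a dictionary keyed by a (dx, dy) search position, the function name `accumulate` and the `first_quadrant` flag are hypothetical, and only SAD values (not motion vectors) are tracked.

```python
def accumulate(sad_memory, new_sads, first_quadrant):
    """Model of block 870: combine new SAD values with the SAD memory.

    sad_memory and new_sads map a (dx, dy) search position to a SAD value.
    Assumption: when the new values belong to the first quadrant of a total
    reference block, or the memory is empty, whatever is stored is treated
    as stale and replaced instead of being added to.
    """
    if first_quadrant or not sad_memory:
        return dict(new_sads)
    return {pos: sad_memory.get(pos, 0) + value
            for pos, value in new_sads.items()}
```

For example, starting from an empty memory, accumulating `{(0, 0): 5}` and then `{(0, 0): 1}` yields a running total of 6 at position (0, 0).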
At decision block 880, it is determined whether search window Z is the end of or exhausts total search region Y. Block 880 may correspond to the description above with respect to block 790. If at block 880 it is determined that total search region Y is not exhausted, processing continues to block 887 where “Z” is incremented by one. From block 887 processing continues to block 850 where the next search window is stored or loaded, and the process continues.
If at block 880, it is determined that total search region Y is exhausted, then the process continues to block 884 where “X” is incremented by one. After block 884, processing continues to block 885.
At block 885 it is determined whether reference block X is the last block of total reference region W, such as by determining whether the total reference region has been exhausted so that the current block X is the last reference block of region W. Block 885 may correspond to the description above with respect to calculating SAD values and motion vectors for multiple reference blocks, such as described with respect to expansion unit 350, SAD memory 352, adder 354, threshold unit 340, threshold comparator 650, and/or threshold register 654.
If at block 885 it is determined that reference block X is not the end of total reference region W, then processing continues to block 830 where a subsequent, next, additional, associated, or other reference block of total reference region W is stored or loaded for consideration and the process continues. For example, loading a subsequent or next reference block X of total reference region W may correspond to descriptions above with respect to expansion unit 350, SAD memory 352, adder 354, reference source 310, reference block 312, threshold unit 340, threshold comparator 650 and/or threshold register 654.
If at block 885 it is determined that reference block X is the last block of total reference region W, then the process continues to block 889. At block 889, the last reference block “X” for region W is stored or loaded. Block 889 may correspond to the description above for block 830 and block 885. For example, at block 889, a subsequent or additional reference block of total reference region W may be stored or loaded, where that block is the last or final reference block of total reference region W, thus completing the consideration of total reference region W as compared to the total search region Y. After block 889, processing continues to block 890.
At block 890, search window “Z” is stored. Block 890 may correspond to the description above with respect to block 850. At block 891, the SAD value or values and motion vector or vectors for block X and search window Z are calculated. Block 891 may correspond to the description above with respect to block 860.
At block 892, the SAD values and motion vectors calculated at block 891 are added to SAD values and motion vectors currently stored in the SAD memory. Block 892 may correspond to the description above with respect to block 870. It is noted that since the current block X is the last block of region W, the SAD value and motion vector sums at block 892 may be the total SAD values and total motion vectors for the total reference region W as compared to total search region Y (e.g., such as where block 892 provides a pixel by pixel total SAD value and motion vector for each pixel of total search region Y as compared to total reference region W).
At decision block 893 it is determined whether one or more SAD values summed at block 892 (e.g., such as the sum of SAD values calculated at block 891 and appropriate corresponding SAD values currently stored in the SAD memory as described above with respect to expansion unit 350 of
On the other hand, if at block 893 one or more SAD values summed at block 892 do satisfy (e.g., such as by being less than, or less than or equal to) a threshold value, then the process continues to block 895. At block 895, calculations or processing is halted. Block 895 may correspond to descriptions above with respect to block 795, threshold unit 340, cell 600, threshold comparator 650, termination line 660, and/or expansion unit 350 (e.g., such as the description thereof as appropriate for motion estimation of a reference block having a size greater than an 8×8 pixel block). After block 895, the process continues to block 896.
At decision block 894, it is determined whether search window Z is the end of or exhausts total search region Y. Block 894 may correspond to the description above with respect to block 880. If at block 894 it is determined that total search region Y is not exhausted, processing continues to block 897 where “Z” is incremented by 1. From block 897, processing continues to block 890 where the next search window is stored or loaded, and the process continues.
If at block 894 it is determined that total search region Y is exhausted, processing continues to block 896.
At block 896, the current best SAD value or values for the total reference block and corresponding motion vector or vectors are sent or transmitted to downstream processing. Block 896 may correspond to the description above with respect to block 796, threshold unit 340, cell 600, threshold comparator 650, best motion vector line 659, best SAD line 658, and/or expansion unit 350.
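The overall flow of blocks 820 through 896 can be sketched in Python as follows. This is a sketch under stated assumptions rather than a description of MEU 300: the function names `estimate_total_block` and `crop` are hypothetical, the total reference region W is square and split into q×q reference blocks taken in raster order, search positions are visited in raster order, and the threshold test uses "less than or equal to". Partial SAD sums for every block except the last are accumulated over the whole search region; the threshold is tested only while the last block is processed, when each sum becomes the total for region W.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def crop(image, top, left, size):
    """Return the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def estimate_total_block(ref, region, threshold, q=8):
    """Return (best_sad, best_mv, terminated_early) for total region W."""
    n = len(ref)
    offsets = [(t, l) for t in range(0, n, q) for l in range(0, n, q)]
    positions = [(dx, dy)
                 for dy in range(len(region) - n + 1)
                 for dx in range(len(region[0]) - n + 1)]
    sad_memory = {}
    # Blocks 830-887: every reference block X except the last one.
    for t, l in offsets[:-1]:
        block_x = crop(ref, t, l, q)
        for dx, dy in positions:
            partial = sad(block_x, crop(region, dy + t, dx + l, q))
            sad_memory[(dx, dy)] = sad_memory.get((dx, dy), 0) + partial
    # Blocks 889-897: last reference block, totals become available.
    t, l = offsets[-1]
    block_x = crop(ref, t, l, q)
    best_sad, best_mv = None, None
    for dx, dy in positions:
        total = (sad_memory.get((dx, dy), 0)
                 + sad(block_x, crop(region, dy + t, dx + l, q)))
        if best_sad is None or total < best_sad:
            best_sad, best_mv = total, (dx, dy)
        if total <= threshold:  # Block 893: halt and send (895, 896).
            return best_sad, best_mv, True
    return best_sad, best_mv, False  # Block 894: region exhausted (896).
```

Calling `estimate_total_block(ref, region, threshold)` with a 16×16 `ref` and the default `q=8` models the four-quadrant case; a smaller `q` is convenient for experimenting with small inputs.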
It is contemplated that a ME unit as described herein (e.g., such as MEU 300) may be part of a larger and/or more complex image signal processor or processing element. For instance,
Moreover, signal processor 200 may be coupled to one or more similar signal processors, where each signal processor may also be coupled to one or more memories and/or other signal processors (e.g., such as in a “cluster”). Also, each cluster may be coupled to one or more other clusters. For instance, signal processor 200 may be connected in a cluster of eight or nine digital signal processors in a mesh configuration using quad-ports. The quad-ports can be configured (statically) to connect various ISP's to other ISP's or to double data rate (DDR) random access memory (RAM), such as a “main memory” using direct memory access (DMA) channels. For example, signal processor 200 may be or may be part of a programmable multi-instruction multiple data stream (MIMD) digital image processing device. More particularly, signal processor 200, whether coupled or not coupled to another signal processor, can be used for image processing related to a copier, a scanner, a printer, or other image processing device, including processing of a raster image, a Moving Picture Experts Group (MPEG) image, or other digital image data.
In addition, signal processor 200 can use several PE's connected together through CCRs 210 (e.g., such as where CCRs 210 is a register file switch) to provide a fast and efficient interconnection mechanism and to maximize performance for data-driven applications by mapping individual threads to PE's in such a way as to minimize communication overhead. Moreover, a programming model of the ISP's can be implemented such that each PE implements a part of a data processing algorithm and data flows from one PE to another and from one ISP to another until the data is completely processed.
Moreover, in embodiments, a PE may be one of various types of processing elements, digital signal processors, comparison units, video and/or image signal processors for processing digital data. Similarly, a PE may be an input from one or more other ISP's, an output to one or more other ISP's, a hardware accelerator (HWA), a MEU (e.g., such as MEU 300), memory controller, and/or a memory command handler (MCH). For example, one of the PE's (e.g., PE0 220) may be an input from another ISP, one of the PE's (e.g., PE1 221) may be an output to another ISP, from one to three of the PEs (e.g., PE4, PE5 and PE6) may be configured as HWAs, at least one of the PEs (e.g., PE4) may be configured as a MEU (e.g., such as a HWA MEU, such as MEU 300), and one of the PEs (e.g., PE7 227) may be configured as a MCH functioning as a special HWA to manage the data flow for the other PE's in and out of a local memory. Thus, for example, an embodiment may include a cluster of PEs interconnected through CCRs 210, where CCRs 210 is a shared memory core of up to sixteen CCRs and each CCR is coupled to and mapped to the local address space of each PE.
According to embodiments, memory 270 may be a static RAM (SRAM) type memory, or memory 270 may be a type of memory other than SRAM. Memory 270 may be a local signal processor memory used for storing portions of images and/or for storing data temporarily, such as sum of absolute differences (SAD) values between pixels of a current data image and a prior data image. Specifically, memory 270 may provide the function of search memory 322, SAD memory 352, and/or block 870 as described above. Thus, memory 270 may serve as SAD memory 352 by being an SRAM MCH memory, similar to a cache memory, used to temporarily store portions of images or complete image data that may originate from a DDR and may be staged in MCH 227.
Within signal processor 200, or a cluster of such signal processors (e.g., ISPs), Input PE and Output PE may be the gateways to the rest of the ISPs and can also be programmed to some level of processing. Other PEs within an ISP may also provide special processing capabilities. For instance, PE's acting as MEU's (e.g., such as MEU 300) of signal processor 200 (e.g. such as PE 4 and/or other PE's as shown in
Thus, the design of the MEU may consider and/or place emphasis on throughput and area (gate count), such as to achieve the highest performance at the lowest possible gate count. In one case, a MEU as described above may produce one sum of absolute differences (SAD) value every clock cycle. Moreover, as described above, such an MEU can be programmed to handle various ME search window selection algorithms (e.g., full search, logarithmic search, etc.). Also, as described above, such an MEU may be programmable to handle SAD computations at 4×4 and 8×8 block sizes, and can also be extended to handle reference block sizes greater than 8×8 (e.g., 8×16, 16×8, 16×16, etc.). For instance, embodiments described herein provide motion estimation capabilities that can be very useful for MPEG2 and MPEG4 encoding applications.
It is considered that the couplings, connections, lines, or data paths connecting devices, apparatus, systems, modules or components herein (e.g., such as those shown and described with respect to
In the foregoing specification, specific embodiments are described. However, various modifications and changes may be made thereto without departing from the broader spirit and scope of embodiments as set forth in the claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.