This is the first application filed for the present invention.
Not Applicable.
The present invention relates to digital video image processing systems, and in particular to a method of motion compensation using shared resources of a graphics processor unit.
As is well known in the art, digital video image processing systems commonly rely on Moving Picture Experts Group (MPEG) standards for compressing and transmitting digital video images. These MPEG standards use the discrete cosine transform (DCT) for encoding/compressing video images. DCT encoding produces a stream of coefficients, which are then quantized to produce digitized video image data that can be stored or transmitted. This process of discrete cosine transformation and quantization greatly reduces the amount of information needed to represent each frame of video. In order to reconstruct the original video image (e.g. for display), the encoding process is reversed. That is, the video image data is inverse quantized, and then processed using an inverse discrete cosine transform (IDCT).
However, during reconstruction of the video image, if there is not enough processing power to perform a full IDCT while maintaining the frame rate, a simplified IDCT is performed. This is typically accomplished by simply ignoring certain portions of the transform altogether, resulting in imperfect replication of the original image. Image quality will suffer, typically by loss of detail, and “blocking” (that is, square areas lacking in detail) within the displayed image.
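The transform, quantization and inverse steps described above, together with the simplified IDCT, can be sketched in Python as follows. This is a minimal, unoptimized illustration that assumes an orthonormal 8×8 DCT-II and a single uniform quantizer step `q`; actual MPEG codecs use quantization matrices and scale factors, and all function names here are hypothetical. The `keep` parameter imitates a simplified IDCT by ignoring all but the lowest-frequency coefficients.

```python
import math

N = 8  # MPEG transforms 8x8 blocks of samples


def _scale(k):
    """Orthonormal DCT-II normalization factor for frequency index k."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def _basis(pos, freq):
    """Cosine basis value for sample position `pos` at frequency `freq`."""
    return math.cos((2 * pos + 1) * freq * math.pi / (2 * N))


def dct2(block):
    """Forward 2-D DCT of an 8x8 block (list of rows); result is coeffs[u][v]."""
    return [[_scale(u) * _scale(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                         for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeffs, keep=N):
    """Inverse 2-D DCT. With keep < N, only the keep x keep lowest-frequency
    coefficients are used, imitating a simplified IDCT."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(keep) for v in range(keep))
             for y in range(N)] for x in range(N)]


def quantize(coeffs, q):
    """Uniform quantization with a single step size q (a simplification)."""
    return [[round(c / q) for c in row] for row in coeffs]


def dequantize(levels, q):
    """Inverse quantization: scale the integer levels back up by q."""
    return [[level * q for level in row] for row in levels]
```

With `keep=1` only the DC coefficient survives, so every pixel of the block collapses to the block average, which is exactly the loss of detail that shows up as blocking.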
A known method of reducing the amount of IDCT data that must be processed to reconstruct a video image is to implement a motion compensation technique during the encoding process. For example, consider an area of an image that moves, but does not change in detail. For such an area, much of the IDCT data can be replaced by a motion vector. During subsequent reconstruction of the video image, a motion compensation algorithm applies the motion vector to a previous and/or next image to estimate the correct image data for the relevant portion of the display image. Digital image processing systems commonly rely on a host processor (e.g. the main CPU of a personal computer) to perform motion compensation processing when needed.
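How a motion vector stands in for IDCT data can be illustrated with a short sketch. It assumes frames stored as lists of rows, an integer-pel motion vector that keeps the displaced block inside the reference frame, and hypothetical helper names. The block is copied from the previous frame at the displaced position, and whatever residual remains is added to it.

```python
def predict_block(ref, bx, by, mvx, mvy, size=16):
    """Copy a size x size block from reference frame `ref` (list of rows).

    (bx, by) is the block's position in the current frame and (mvx, mvy) an
    integer-pel motion vector; the displaced block is assumed to lie inside ref.
    """
    return [[ref[by + mvy + j][bx + mvx + i] for i in range(size)]
            for j in range(size)]


def reconstruct(pred, residual):
    """Add the (IDCT-decoded) residual to the motion-compensated prediction."""
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, residual)]
```

For an area that merely moves, the residual is close to zero, so the motion vector alone carries almost all of the information.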
A difficulty with this technique is that it requires processing of multiple image frames (or fields) during reconstruction of the video image. This operation is computationally intensive and can impose an unacceptable burden on the resources of the host processor. If sufficient processing power is unavailable, various motion-compensation “artifacts” (e.g. choppy movements within the frame) become apparent to a person viewing the reconstructed image.
An alternative solution is to embed a single-purpose motion compensation engine within the graphics processor unit (GPU). This approach ensures that all video image processing can be performed independently of the host CPU. However, this solution consumes valuable “real estate” on the GPU, which increases costs. In view of the limited use of the motion compensation engine, this increased cost is frequently considered unacceptable.
Accordingly, a system enabling shared resources of a GPU to perform motion compensation processing during the reconstruction of video images remains highly desirable.
It is therefore an object of the present invention to provide a method and system for controlling shared resources of a GPU to perform motion compensation processing during the reconstruction of a video image.
An aspect of the present invention provides a method of motion compensation within a displayable video stream using shared resources of a Graphics Processor Unit (GPU). In accordance with the present invention, image data including a sequential series of image frames is received. Each frame includes any one or more of: frame-type information; image texture; and motion vector information. At least a current image frame is analyzed, and the shared resources of the GPU are controlled, using one or more GPU commands, to generate a motion compensated image frame corresponding to the current image frame.
Further features and advantages of the present invention will become apparent from the following detailed description, taken in combination with the appended drawings, in which:
a–d show principal steps in a motion compensation process for I, P and B blocks in accordance with an embodiment of the present invention;
It will be noted that throughout the appended drawings, like features are identified by like reference numerals.
The present invention provides a method and system for performing motion compensation using shared resources of a graphics processor unit (GPU) of a digital image processing system. In order to facilitate understanding of the present invention, a description of conventional motion compensation in accordance with MPEG standards is presented below with reference to
As is well known in the art, three main types of frames are defined in the MPEG standard, namely: Intra coded frames (I-Frames); Predictive coded frames (P-Frames); and Bi-directionally predictive frames (B-Frames).
I-Frames are coded/decoded without reference to any other frame. This means that motion vectors do not play a role in the reconstruction of an I-Frame, which is accomplished solely on the basis of its IDCT data.
P-Frames are coded/decoded with reference to a previous frame. This means that the IDCT data of the frame, in combination with motion vectors referencing a previous Intra or Predictive coded frame, are used in the reconstruction of a P-Frame.
B-Frames are coded/decoded with reference to both previous and following reference frames, using motion vectors that point into each of them. This type of frame, however, does not serve as a reference frame for any other future or past frame.
At step S2, a macroblock (e.g. representing a 16×16 block of pixels) of the frame is extracted from the encoded video bit stream, and decrypted (at S4). The type of macroblock is then determined (at S6), based on the frame type.
All macroblocks are decompressed (following the left-hand path in
For P- and B-type macroblocks, the processing path on the right hand side of
The motion references obtained from P(t−1) (and possibly P(t+1)) are then modified by respective half-pel prediction averaging (at S20). The resulting predictions are then combined (at S22), and the combination summed (at S24) with the IDCT coefficients calculated in step S14. If the current macroblock in question is of I-type, the summation result is saturated (e.g. by adding 128) at step S26. This result is then clipped (at S26) to obtain a value within an appropriate value range for display.
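The conventional reconstruction path of steps S20 to S26 can be sketched per macroblock as below. This is an illustration only: it assumes motion vectors expressed in half-pel units, MPEG-style integer rounding both for half-pel averaging and for combining forward and backward predictions, an offset of 128 for I-type blocks as in the example above, and clipping to an 8-bit range of 0 to 255. All names are hypothetical.

```python
def half_pel_predict(ref, x, y, mvx2, mvy2, size):
    """Prediction block at (x, y) displaced by a motion vector in half-pel units.

    Half-pel positions are averaged with MPEG-style rounding:
    (a + b + 1) // 2 along one axis and (a + b + c + d + 2) // 4 diagonally.
    """
    ix, fx = x + (mvx2 >> 1), mvx2 & 1   # integer part and half-pel flag
    iy, fy = y + (mvy2 >> 1), mvy2 & 1
    block = []
    for j in range(size):
        row = []
        for i in range(size):
            a = ref[iy + j][ix + i]
            b = ref[iy + j][ix + i + fx]
            c = ref[iy + j + fy][ix + i]
            d = ref[iy + j + fy][ix + i + fx]
            # When a flag is 0 the corresponding neighbours coincide, so this one
            # formula covers the full-pel, horizontal, vertical and diagonal cases.
            row.append((a + b + c + d + 2) // 4)
        block.append(row)
    return block


def reconstruct_macroblock(residual, fwd=None, bwd=None, intra=False):
    """Combine predictions (S22), add the residual (S24), offset I-blocks and clip."""
    if intra:
        pred = None
    elif fwd is not None and bwd is not None:
        # Bidirectional macroblock: average the forward and backward predictions.
        pred = [[(f + b + 1) // 2 for f, b in zip(fr, br)]
                for fr, br in zip(fwd, bwd)]
    else:
        pred = fwd if fwd is not None else bwd
    out = []
    for j, rrow in enumerate(residual):
        row = []
        for i, r in enumerate(rrow):
            value = r + (128 if intra else pred[j][i])
            row.append(max(0, min(255, value)))  # clip to an assumed 8-bit range
        out.append(row)
    return out
```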
If the entire frame has been processed, then the frame reconstruction operation is complete. Otherwise, processing of the frame continues with the next macroblock.
The present invention provides a method and system for performing efficient motion compensation using shared resources of a Graphics Processor Unit (GPU). A preferred embodiment of the present invention is described below with reference to
It is estimated that utilization of shared resources of the GPU 8 for motion compensation, in accordance with the present invention, will consume approximately 25% of the processing bandwidth of a typical CPU. Thus no modifications need be made to implement the methods of the present invention. However, in some cases it may be considered desirable to increase the processing bandwidth of the CPU and/or GPU 8, to optimize performance. It is considered that any such modifications are well within the purview of those skilled in the art, and thus are not described in detail herein.
In general, the framer 12 is designed to receive decrypted frame data from a conventional CODEC (not shown). The framer 12 operates to convert the image data of the frame into normalized spatial coordinates (i.e. u, v) appropriate for display, and to analyse the frame to determine its type (i.e. I, P or B) and field format (if any). The command generator 14 uses this information to generate appropriate GPU commands for controlling the operation of one or more shared resources of the GPU 8 to perform motion compensation.
As shown in
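$$x = \operatorname{int}\left(N \bmod \frac{\text{frame\_width}}{16}\right) \times 16 \tag{1}$$
$$y = \operatorname{int}\left(\frac{N}{\text{frame\_width}/16}\right) \times 16 \tag{2}$$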
These x,y values provide the coordinate values of the involved macroblock 28a in both the current (IDCT) frame 20 and the reconstructed frame 26. Based on these x,y coordinate values, normalized coordinates (u,v) within the image (frame) can be calculated for the current and reconstructed frames 20 and 26 as follows:
$$u_{\text{offset}} = \frac{x}{\text{frame\_width}} \tag{3}$$
$$v_{\text{offset}} = \frac{y}{\text{frame\_height}} \tag{4}$$
The rectangle coordinates for the current and reconstructed frames 20 and 26 can then be calculated using equations 5–8 below:
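$$\text{upper left} = (u,\ v) \tag{5}$$
$$\text{upper right} = \left(u + \frac{\text{macroblock\_width}}{\text{frame\_width}},\ v\right) \tag{6}$$
$$\text{lower left} = \left(u,\ v + \frac{\text{macroblock\_height}}{\text{frame\_height}}\right) \tag{7}$$
$$\text{lower right} = \left(u + \frac{\text{macroblock\_width}}{\text{frame\_width}},\ v + \frac{\text{macroblock\_height}}{\text{frame\_height}}\right) \tag{8}$$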
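Equations 1 to 8 translate directly into code. The sketch below assumes 16×16 macroblocks and a frame width that is a multiple of 16 (so the integer division is exact), and it normalizes vertical extents by `frame_height`, as equation 4 does for the v offset. The function names are hypothetical.

```python
MB = 16  # assumed macroblock width and height, in pixels


def macroblock_origin(n, frame_width):
    """Equations 1 and 2: pixel origin (x, y) of macroblock number n."""
    per_row = frame_width // MB
    return (n % per_row) * MB, (n // per_row) * MB


def macroblock_rect(n, frame_width, frame_height):
    """Equations 3 to 8: normalized (u, v) corners of macroblock n."""
    x, y = macroblock_origin(n, frame_width)
    u, v = x / frame_width, y / frame_height        # eqs. 3 and 4
    du, dv = MB / frame_width, MB / frame_height    # normalized block extent
    return {
        "upper_left": (u, v),
        "upper_right": (u + du, v),
        "lower_left": (u, v + dv),
        "lower_right": (u + du, v + dv),
    }
```

For a 720×480 frame there are 45 macroblocks per row, so macroblock 46 starts at pixel (16, 16).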
These coordinates define the rectangle within which the image data of the frame 20 is sampled to obtain the content of the involved macroblock. Following calculation of rectangle coordinates, the IDCT (image texture) information of the current frame 20 is sampled using a “nearest” filter (at step S32), which by definition has single-pel precision. The macroblock 28a (now containing sampled image data) is forwarded (at S34) to the pixel shader 18 for further processing.
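The single-pel behavior of “nearest” sampling at step S32 can be illustrated as follows. The sketch assumes the common GPU convention that each destination pixel is evaluated at its center, (i + 0.5)/width, and that out-of-range coordinates are clamped to the edge; under those assumptions, sampling the rectangle of a macroblock returns an exact copy of its texels. The names are hypothetical.

```python
import math


def sample_nearest(tex, u, v):
    """'Nearest' filtering: return the texel containing normalized point (u, v)."""
    h, w = len(tex), len(tex[0])
    i = min(max(math.floor(u * w), 0), w - 1)   # clamp-to-edge addressing
    j = min(max(math.floor(v * h), 0), h - 1)
    return tex[j][i]


def sample_macroblock_nearest(tex, u0, v0, size=16):
    """Sample a size x size block whose upper-left corner is (u0, v0),
    evaluating each destination pixel at its centre."""
    h, w = len(tex), len(tex[0])
    return [[sample_nearest(tex, u0 + (i + 0.5) / w, v0 + (j + 0.5) / h)
             for i in range(size)] for j in range(size)]
```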
The macroblock 28a (forwarded from the texture engine 16 at S34) is saturated (at S48) in a conventional manner. The saturated macroblock data result is then clipped (at S50), again in a conventional manner, to provide a valid color range (e.g. of 16–240) of the reconstructed macroblock 28b. The pixel shader 18 then writes the reconstructed macroblock 28b to memory 30 (at S50) using the x,y coordinates calculated above to assemble the reconstructed frame 26, in displayable format, within memory 30.
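Steps S48 and S50 for an I-block can be sketched as below. The sketch assumes, following the conventional example given earlier, that saturation adds an offset of 128, and it clips to the 16 to 240 range mentioned above before writing the block at its x,y position; the function name and its defaults are hypothetical.

```python
def saturate_clip_write(block, frame, x, y, offset=128, lo=16, hi=240):
    """Saturate (add offset), clip to [lo, hi] and store the block at (x, y)."""
    for j, row in enumerate(block):
        for i, value in enumerate(row):
            frame[y + j][x + i] = min(hi, max(lo, value + offset))
    return frame
```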
As is well known in the art, the input frame 20 need not be an x,y (two-dimensional) texture; it may also be in the form of a linear texture. In this case the u,v coordinates are calculated using the following equations:
$$u_{\text{offset}} = \frac{\text{linear\_address}}{\text{surface\_length}} \tag{9}$$
$$v_{\text{offset}} = 0 \tag{10}$$
The rectangle coordinates for the current frame 20 are then calculated using equations 11–14 below:
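$$\text{upper left} = (u,\ v) \tag{11}$$
$$\text{upper right} = \left(u + \frac{\text{macroblock\_width}}{\text{surface\_length}},\ v\right) \tag{12}$$
$$\text{lower left} = \left(u + \frac{\text{macroblock\_width} \cdot \text{macroblock\_height}}{\text{surface\_length}},\ v\right) \tag{13}$$
$$\text{lower right} = \left(u + \frac{\text{macroblock\_width}}{\text{surface\_length}} + \frac{\text{macroblock\_width} \cdot \text{macroblock\_height}}{\text{surface\_length}},\ v\right) \tag{14}$$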
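For a linear texture, equations 9 to 14 can be evaluated as in the following sketch, which simply transcribes them. It assumes that `linear_address` is the texel offset of the macroblock's first sample within the surface and that macroblocks are 16×16; the function name is hypothetical.

```python
def linear_rect(linear_address, surface_length, mb_width=16, mb_height=16):
    """Equations 9 to 14: rectangle corners for a macroblock in a linear texture."""
    u = linear_address / surface_length            # eq. 9
    v = 0.0                                        # eq. 10
    du = mb_width / surface_length                 # one macroblock row
    dblk = mb_width * mb_height / surface_length   # the whole macroblock
    return {
        "upper_left": (u, v),                      # eq. 11
        "upper_right": (u + du, v),                # eq. 12
        "lower_left": (u + dblk, v),               # eq. 13
        "lower_right": (u + du + dblk, v),         # eq. 14
    }
```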
As shown in
Rectangle coordinates are calculated for the current frame 20 and reconstructed frame 26 (step S30); the current frame 20 sampled (step S32); and the sampled macroblock 28a forwarded to the pixel shader 18 (step S34), all as described above with reference to
$$u_{\text{offset}} = \frac{x + x_{1/2}}{\text{frame\_width}} \tag{15}$$
$$v_{\text{offset}} = \frac{y + y_{1/2}}{\text{frame\_height}} \tag{16}$$
These offsets can be used to calculate the appropriate rectangle coordinates for each macroblock of the previous frame 22, in accordance with equations 17–20 below. These rectangle coordinates define the square area within which the image data of the frame 22 is sampled to obtain the motion vector information of the involved macroblocks 28c.
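$$\text{upper left} = (u,\ v) \tag{17}$$
$$\text{upper right} = \left(u + \frac{\text{macroblock\_width}}{\text{picture\_width}},\ v\right) \tag{18}$$
$$\text{lower left} = \left(u,\ v + \frac{\text{macroblock\_height}}{\text{picture\_height}}\right) \tag{19}$$
$$\text{lower right} = \left(u + \frac{\text{macroblock\_width}}{\text{picture\_width}},\ v + \frac{\text{macroblock\_height}}{\text{picture\_height}}\right) \tag{20}$$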
Following calculation of the rectangle coordinates, the motion vector information of the previous frame 22 is sampled using a “Bilinear” filter (at step S44), which provides half-pel precision. The sampled macroblocks 28c (now containing sampled image data) are forwarded (at S46) to the pixel shader 18 for further processing.
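Bilinear sampling at a half-pel offset averages neighboring texels, which is why it can take the place of the half-pel averaging of a conventional decoder. The sketch below assumes texel centers at half-integer positions, clamp-to-edge addressing, and that the half-pel motion terms of equations 15 and 16 are displacements in pixels restricted to multiples of 0.5 (`dx`, `dy`); all names are hypothetical.

```python
import math


def sample_bilinear(tex, u, v):
    """'Bilinear' filtering at normalized (u, v); texel centres sit at i + 0.5."""
    h, w = len(tex), len(tex[0])
    sx, sy = u * w - 0.5, v * h - 0.5
    x0, y0 = math.floor(sx), math.floor(sy)
    fx, fy = sx - x0, sy - y0

    def texel(i, j):  # clamp-to-edge addressing
        return tex[min(max(j, 0), h - 1)][min(max(i, 0), w - 1)]

    top = texel(x0, y0) * (1 - fx) + texel(x0 + 1, y0) * fx
    bottom = texel(x0, y0 + 1) * (1 - fx) + texel(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy


def sample_reference_block(tex, x, y, dx, dy, size=16):
    """Sample the previous-frame block at (x + dx, y + dy), as in eqs. 15 and 16."""
    h, w = len(tex), len(tex[0])
    u0, v0 = (x + dx) / w, (y + dy) / h
    return [[sample_bilinear(tex, u0 + (i + 0.5) / w, v0 + (j + 0.5) / h)
             for i in range(size)] for j in range(size)]
```

The filter yields the exact mean (a + b)/2, whereas MPEG-style integer averaging rounds to (a + b + 1)//2, so the two approaches may differ by one unit of rounding.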
A conventional pixel shader 18 is designed to blend a given number of surfaces (frames) in parallel. Since only two surfaces need to be blended to generate the reconstructed frame 26, the macroblocks 28 of both frames 20, 22 can be processed simultaneously. As shown in
A P-block can also be constructed only from the previous frame information. In this case the error surface is not used, and the pixel shader operation (S48) is given by the following equation:
$$\text{Result} = \text{Source}_2 \tag{21}$$
As specified in
d shows the case for a bi-directional B-block. The main difference from the previous cases is that the frame 30 is generated from the input frames 20, 22 and 24. The macroblocks 28c,d must include motion vector information (step S42), e.g. in the form of (x1,y1) for the previous frame 22 and (x2,y2) for the following frame 24. The pixel shader operation (S48) consists of blending both the previous and following macroblock pixels using equation 22.
$$\text{Result} = \text{Source}_1 + \frac{\text{Source}_2 + \text{Source}_3}{2} \tag{22}$$
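The pixel shader blends of equations 21 and 22 can be modeled per pixel. In this sketch Source1 is taken to be the IDCT error surface, Source2 the previous-frame macroblock and Source3 the following-frame macroblock. The P-block-with-error case `blend_p` (Source1 + Source2) is an assumption modeled on equation 22, not an equation given in the text, and all names are hypothetical.

```python
def blend_p_no_error(src2):
    """Equation 21: P-block built only from the previous frame."""
    return src2


def blend_p(src1, src2):
    """Assumed P-block with an error surface: Source1 + Source2."""
    return src1 + src2


def blend_b(src1, src2, src3):
    """Equation 22: error term plus the mean of previous and following pixels."""
    return src1 + (src2 + src3) / 2


def apply(op, *blocks):
    """Run a per-pixel blend over equally sized blocks, as a pixel shader would."""
    return [[op(*px) for px in zip(*rows)] for rows in zip(*blocks)]
```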
As is well known in the art, an interlaced formatted frame uses two fields (e.g. a top field and a bottom field) to convey a complete image frame. For a non-interlaced field format, both fields can be processed independently, following the same steps described above with reference to
For an interlaced field format, a three-pass process can be used, in which both fields are first extracted from the input frame, these fields are then processed independently, and finally both results are combined into a final interlaced field format.
The extraction of the top interlaced field consists of sampling the provided input frame using “Nearest” filtering, scaling by half in the vertical direction, and forwarding the result to a first memory location. The extraction of the bottom interlaced field consists of offsetting the input frame by one line, sampling the frame using “Nearest” filtering, scaling by half in the vertical direction, and forwarding the result to a second memory location.
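The field extraction just described can be sketched as follows. It assumes an even frame height and that the “Nearest” half-scale lands exactly on even source lines (on real hardware this depends on the texel-center convention and sample offsets), so the top field keeps lines 0, 2, 4, … and the one-line offset yields lines 1, 3, 5, …; the function name is hypothetical.

```python
def extract_field(frame, bottom=False):
    """Return the top (or bottom) field of an interlaced frame (list of rows).

    Halving the height with 'Nearest' sampling is modeled as keeping every
    second line; the bottom field first offsets the frame by one line.
    """
    src = frame[1:] if bottom else frame
    return [list(src[2 * r]) for r in range(len(frame) // 2)]
```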
Motion compensation of the top and bottom fields can be performed separately using the method outlined in
The two resulting motion compensated fields are interlaced into a single output frame with the help of a “key”. This key can be a 3D command, an additional texture, a pixel shader color index counter, or a specific pixel shader operation.
For the purpose of merging the top and bottom fields, the key 36 is provided as a texture, with the value of each bit indicating whether a value should be sampled from source 1 or source 2. To ensure that an appropriate amount of key information is available to complete the merge, the key 36 is sampled using “Nearest” filtering (at S60), scaled to the size of the frame 26 (at S62), and then forwarded to the pixel shader 18 (at S48).
Within the pixel shader 18, the two sources are merged (at S48) by sampling pixels from each source in accordance with the key 36. In this case, the key 36 operates to select lines 1, 3, 5 . . . etc. (containing the top field information) from source 1, and lines 2, 4, 6 . . . etc. (containing the bottom field information) from source 2. This selection sequence yields a single merged frame 26, containing both fields in an interlaced manner.
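The key-driven merge of steps S60, S62 and S48 can be modeled as below. The sketch assumes a key texture one texel wide with alternating 1/0 lines, and assumes that each field is stretched to full frame height with “Nearest” filtering before the per-pixel selection; rows are 0-based, so rows 0, 2, 4 correspond to lines 1, 3, 5 above. `scale_nearest`, `make_key` and `merge_fields` are hypothetical names.

```python
def scale_nearest(tex, out_h, out_w):
    """'Nearest' resize: output texel (j, i) reads source (j*h//out_h, i*w//out_w)."""
    h, w = len(tex), len(tex[0])
    return [[tex[j * h // out_h][i * w // out_w] for i in range(out_w)]
            for j in range(out_h)]


def make_key(height):
    """One-texel-wide key: 1 on even rows (source 1, top field), 0 on odd rows."""
    return [[1 if j % 2 == 0 else 0] for j in range(height)]


def merge_fields(top, bottom, key):
    """Interleave two fields into one frame, choosing the source per pixel by key."""
    h, w = 2 * len(top), len(top[0])
    k = scale_nearest(key, h, w)      # key sampled with 'Nearest' and scaled (S60, S62)
    s1 = scale_nearest(top, h, w)     # fields stretched to frame height
    s2 = scale_nearest(bottom, h, w)
    return [[s1[j][i] if k[j][i] else s2[j][i] for i in range(w)]
            for j in range(h)]
```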
Another method for reconstructing a complete image frame from two interlaced fields consists of using a cascading merge (or blending) operation.
An important point of the cascading merge concerns vertical half-pel sampling, which is performed by a pixel shader instruction rather than by a filtering mode. This is necessary because the texture filter would interpolate between consecutive lines of the frame, which belong alternately to the top and bottom fields, instead of between lines of the top field only. The rectangle offsets for the top field are calculated using equations 23 and 24.
$$u_{\text{offset}} = \frac{x + x_{1/2}}{\text{frame\_width}} \tag{23}$$
$$v_{\text{offset}} = \frac{y + \operatorname{int}(y_{1/2})}{\text{frame\_height}} \tag{24}$$
The bottom field, in turn, is processed using a macroblock having the same rectangle values offset by 2 lines, with the u,v coordinates found using equations 25 and 26.
$$u_{\text{offset}} = \frac{x + x_{1/2}}{\text{frame\_width}} \tag{25}$$
$$v_{\text{offset}} = \frac{y + \operatorname{int}(y_{1/2}) + 2}{\text{frame\_height}} \tag{26}$$
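The field offsets of equations 23 to 26 and the shader-side vertical half-pel step can be sketched as follows. The sketch assumes the half-pel motion terms are pixel displacements in multiples of 0.5 (`dx`, `dy`), that `int()` truncates as in C, and that the shader interpolates between a line and the line two frame lines below it, which is the next line of the same field. The names are hypothetical.

```python
def field_offsets(x, y, dx, dy, frame_width, frame_height, bottom=False):
    """Equations 23 to 26: normalized (u, v) for the top or bottom field.

    The vertical half-pel part of dy is dropped here by int() and is applied
    later by the shader instead of by the texture filter.
    """
    u = (x + dx) / frame_width
    v = (y + int(dy) + (2 if bottom else 0)) / frame_height
    return u, v


def field_vertical_half_pel(frame, x, line, dy):
    """Shader-side vertical half-pel: if dy has a half-pel part, average the
    sample with the line two frame lines below (the next line of the same field)."""
    a = frame[line][x]
    if dy == int(dy):
        return a
    return (a + frame[line + 2][x]) / 2
```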
For the purposes of the present invention, a “rectangle” can be represented as a sprite, a quad, or two triangles.
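Representing a rectangle as two triangles amounts to splitting it along a diagonal. The sketch below, with hypothetical names, splits the four corners of a rectangle such as those given by equations 5 to 8 into two triangles that share the same winding, so that face culling, if enabled, treats both halves alike.

```python
def rect_to_triangles(upper_left, upper_right, lower_left, lower_right):
    """Split a rectangle, given by its four corners, into two triangles."""
    return [(upper_left, upper_right, lower_left),
            (upper_right, lower_right, lower_left)]


def signed_area(a, b, c):
    """Signed area of triangle abc; its sign encodes the winding order."""
    return ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2
```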
In general, the number of steps required to generate the reconstructed frame 26 is directly related to the number of surfaces (i.e. frames or fields) that must be processed by the pixel shader 18. In principle, the whole cascading process can be done in one pass, if the pixel shader 18 is designed to process at least seven surfaces (one for the key 36, and two surfaces for each of the current 20, previous 22 and following frames 24) in parallel.
The embodiment(s) of the invention described above is (are) intended to be exemplary only. The scope of the invention is therefore intended to be limited solely by the scope of the appended claims.