This invention relates generally to bandwidth reduction in the transmittal of images using digital communications techniques.
The MPEG video compression scheme has become the worldwide standard for video compression, and is used in digital satellite broadcast, digital cable distribution, digital terrestrial broadcast, and DVD video encoding. MPEG takes advantage of both spatial and temporal redundancy in conventional video content to achieve high compression ratios while maintaining quality at reasonable data rates.
Temporal redundancy is exploited in MPEG video compression through the use of predictive frames. Once a frame has been encoded, transmitted and decoded, the frame content can be used as a prediction for other frames. One clever feature of the MPEG standard is the ability to use both a past reference frame (one which has already been displayed) and a future reference frame (one which has not yet been displayed). A reference frame can be created either by encoding the entire contents of the frame at once (an intra-coded or I-frame), or by coding the difference from a previous reference frame (a predictive or P-frame). An I-frame encompasses a relatively large amount of data, since every 16×16 pel region of the video frame must be encoded in a self-contained manner, that is, as an intra-coded macroblock. On the other hand, a P-frame can use one of two methods for each macroblock: Either the content can be predicted from a portion of the previous reference frame (by specifying a motion vector to a given position in the previous reference frame) with an optional differential correction applied (a motion-compensated predictive macroblock); or the content can be fully specified (an intra-coded macroblock).
A third type of frame can also be used in an encoded sequence. This frame type, a bi-directionally-predicted or B-frame, allows a flexible combination of a motion-compensated macroblock from a past reference frame and/or a motion-compensated macroblock from a future reference frame, with an optional differential correction applied (a bi-directional motion-compensated predictive macroblock). Alternatively, macroblocks in a B-frame can be encoded using intra-coding.
One common technique used in video production and in computer interfaces is the gradual transition from one image to another—a fade. Fades are used to enliven a video presentation, or for special effects in applications, particularly in games. By definition, a fade takes more than one frame to accomplish—a complete change of visual content in a single frame is considered a cut, not a fade. The MPEG encoding standard allows a simple and efficient technique for achieving a two-step fade through the use of P- and B-frames. Suppose that a first reference frame contains the visual content before the fade. A second reference frame can be encoded to contain the visual content after the fade. The two reference frames can be encoded as either I- or P-frames as desired. A single intermediate state can then be created by constructing a B-frame that simply averages the contents of the past and future reference frames, providing a two-frame fade. This procedure produces a two-step fade, but there is no simple extension of this technique to accomplish a multi-frame fade. To do this using conventional coding techniques requires the generation of multiple B-, P- or I-frames, each of which encodes part of the transition between the old and new visual content.
MPEG video image content is often used in contexts other than conventional linear video broadcast. For instance, many interactive television (iTV) applications use MPEG video encoding to produce full-color still frame images, which can then be decoded by MPEG decoding hardware during playout of the application. In such applications, memory and broadcast bandwidth both limit the amount of data that can be transmitted to and used on the set-top box (STB) by the application. Producing a fade effect in an iTV application through the use of conventional MPEG encoding thus requires a series of MPEG-encoded frames that must be broadcast to and decoded by the application.
Therefore, there exists a need for systems and methods that produce multi-frame fade effects in an iTV application and that are memory efficient while providing for flexible use in the application.
The present invention provides methods and systems of using a single MPEG frame to produce a fade effect that extends over more than one frame period.
An example system includes a computer-based device that includes a receiver that receives an MPEG formatted image from a source system over a network, a component that modifies a sequence header of the received MPEG formatted image based on a pre-determined fade event, and a decoder that decodes the MPEG formatted image with the modified sequence header. Also, the system includes a display device that displays the decoded image.
The received MPEG formatted image may be a P- or B-frame formatted image.
Preferred and alternative embodiments of the present invention are described in detail below with reference to the following drawings.
FIGS. 5A-D illustrate fade effects for various levels of fades in accordance with embodiments of the present invention.
The current invention defines methods and systems that produce a fade effect that extends over more than one frame period. Because the invention is particularly useful in the context of broadcast systems, the preferred embodiment is described as such a system.
The STB 36 receives input from the network 32 via an input/output controller 218, which directs signals to and from a video controller 220, an audio controller 224, and a central processing unit (CPU) 226. In one embodiment, the input/output controller 218 is a demultiplexer for routing video data blocks received from the network 32 to a video controller 220 in the nature of a video decoder, routing audio data blocks to an audio controller 224 in the nature of an audio decoder, and routing other data blocks to a CPU 226 for processing. In turn, the CPU 226 communicates through a system controller 228 with input and storage devices such as ROM 230, system memory 232, system storage 234, and input device controller 236.
The system 36 thus can receive incoming data files of various kinds. The system 36 can react to the files by receiving and processing changed data files received from the network 32.
While a set-top box is preferred, the same functionality may be implemented within a television, computer device, or other configuration.
At a decision block 308, the STB 36, which has a processing device, receives the transmission and determines whether a fade of the received P-frame encoded image is to occur. The request for presentation of the P-frame encoded image may result from the occurrence of a particular frame within a video sequence, from the passage of time, from viewer interaction with the STB 36 via the user keypad 217, or from other means. The determination of whether a fade is to occur can be implemented, for example, by an automatic setting stored within the STB 36 or by a user fade request. The STB 36 receives the user request by any of a number of means; for example, a fade request signal is transmitted from an interface device, such as the user keypad 217, or from any of a number of different data input means. If no manual or automatic fade request is detected at the decision block 308, then the received encoded P-frame formatted image is decoded at a block 310 and sent to the display device 38 for display, see block 312. If, however, a fade request was present, as determined at the decision block 308, the STB 36 determines the number of fade frames required in accordance with the fade request, see block 320. The sequence header of the P-frame formatted image is modified based on the fade request (determined number of fade frames), see block 324. At a block 326, the STB 36 decodes the modified P-frame image and, at a block 328, sends the decoded image to the display device 38 to be presented to a user. At a decision block 332, the STB 36 determines whether the determined number of fade frames has been reached. If the determined number of fade frames has been reached, then the fade process is complete. If the number of fade frames has not been reached, then the process returns to the block 326 and decodes the modified P-frame image again, repeating until the fade process is complete.
By the repeated decoding of the modified P-frame image (updating of reference frame), a fade effect occurs.
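The decode loop of blocks 320 through 332 can be illustrated with a minimal sketch. The function name run_fade and the three callables passed to it (modify_header, decode, display) are hypothetical placeholders for STB facilities that this description does not name; the sketch only shows the control flow, not an actual decoder interface.

```python
from typing import Callable


def run_fade(p_frame: bytes,
             fade_steps: int,
             modify_header: Callable[[bytes, int], bytes],
             decode: Callable[[bytes], object],
             display: Callable[[object], None]) -> int:
    """Decode the same modified P-frame once per fade step.

    All three callables are hypothetical stand-ins for set-top box
    facilities; they are assumptions made for illustration only.
    """
    if fade_steps < 1:
        raise ValueError("fade_steps must be at least 1")

    # Block 324: modify the sequence header once, based on the step count.
    modified = modify_header(p_frame, fade_steps)

    shown = 0
    while shown < fade_steps:        # decision block 332
        picture = decode(modified)   # block 326: also updates the reference frame
        display(picture)             # block 328
        shown += 1
    return shown
```

Because each decode updates the reference frame, presenting the identical data again applies a further increment of the change, which is what produces the gradual transition.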
In MPEG video encoding, each macroblock in a P- or B-frame is either coded or skipped. If the macroblock is skipped, the content of the previous reference frame is copied into the current frame without modification. If the macroblock is coded, several options are available for the coding method:
All of these coding techniques except ‘Intra-coded’ and ‘Intra-coded using new quantizer’ result in non-intra encoding. The present invention requires encoding of each macroblock in a P- or B-frame as a non-intra macroblock with zero motion vectors, meaning that the final content for the macroblock is created by combining a prediction from a past and/or future reference frame, plus a correction encoded in the current frame data. The MPEG standard specifies default quantizers for each coefficient in both intra and non-intra encoding. The MPEG standard also allows for the specification of new quantizer matrices for either or both cases. The current invention takes advantage of this latter capability to accomplish the task of producing a fade effect from a single frame.
For convenience in what follows, the invention will be described through the use of P-frame encoding. However, the same approach can be used with B-frame encoding.
In the MPEG-1 video compression standard, the non-intra quantizer matrix can be specified in the sequence header element. This element must occur at the beginning of a video sequence, and can be repeated before any I-frame or P-frame in the sequence. Each repetition of the video sequence header can specify new content for either or both of the intra and non-intra quantizer matrices. In MPEG-1 video, the same quantizer matrix is used for luminance and chrominance components of the image.
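As an illustration of how a sequence header might be rewritten, the following sketch installs a uniform non-intra quantizer matrix. It assumes the MPEG-1 sequence header layout in which the 4-byte start code is followed by 62 bits of fixed fields and then the two load-matrix flags, and it assumes that no custom intra matrix is present, so that the non-intra matrix begins on a byte boundary. The function name set_uniform_non_intra_matrix is hypothetical. Because every matrix element is given the same value, the zig-zag ordering of the 64 entries does not matter here.

```python
SEQUENCE_HEADER_CODE = b"\x00\x00\x01\xb3"
FIXED_HEADER_BYTES = 12   # 4-byte start code + 64 bits of fields and flags
MATRIX_BYTES = 64         # one 8-bit entry per coefficient


def set_uniform_non_intra_matrix(header: bytes, value: int) -> bytes:
    """Return a copy of the header carrying a uniform non-intra matrix.

    Assumption: load_intra_quantizer_matrix is 0, so the non-intra
    matrix (if any) starts exactly at byte 12.
    """
    if not header.startswith(SEQUENCE_HEADER_CODE):
        raise ValueError("data does not start with a sequence header code")
    if len(header) < FIXED_HEADER_BYTES:
        raise ValueError("sequence header is truncated")
    if not 1 <= value <= 255:
        raise ValueError("matrix value must fit in 8 bits and be non-zero")

    flags = header[FIXED_HEADER_BYTES - 1]
    if flags & 0x02:
        # A custom intra matrix shifts everything by one bit; not handled.
        raise NotImplementedError("custom intra matrix present")

    # Skip an existing non-intra matrix, but keep any bytes that follow it.
    tail_start = FIXED_HEADER_BYTES
    if flags & 0x01:
        tail_start += MATRIX_BYTES

    fixed = bytearray(header[:FIXED_HEADER_BYTES])
    fixed[-1] |= 0x01   # load_non_intra_quantizer_matrix = 1
    return bytes(fixed) + bytes([value]) * MATRIX_BYTES + header[tail_start:]
```

For example, a matrix value of 4 would correspond to one quarter of the default value of 16.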
In the MPEG-1 standard, the value for the DCT coefficient of a given row m and column n in a non-intra 8×8 coefficient matrix is given by Equation (1):
dct_recon[m][n] = (2 * dct_zz[i] * quantizer_scale * non_intra_quant[m][n]) / 16    (1)
where dct_recon[m][n] is the reconstructed coefficient for row m, column n; dct_zz[i] is the i-th coefficient in zig-zag order; quantizer_scale is the overall quantizer for the slice; and non_intra_quant[m][n] is the non-intra quantizer matrix element for row m, column n. The reconstruction process requires that any even non-zero value is decremented by one if greater than zero, or incremented by one if less than zero. The default non-intra quantizer matrix value is 16 for every element, so Equation (1) reduces to Equation (2):
dct_recon[m][n] = 2 * dct_zz[i] * quantizer_scale    (2)
which always yields an even value, and is thus always decremented by one. Thus, for any coefficient value k, the reconstructed coefficient value is (2*k*quantizer_scale−1).
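The arithmetic of Equations (1) and (2) can be checked with a short sketch. It follows Equation (1) exactly as written in this description, together with the stated adjustment of even non-zero values; it is not a complete decoder, and the function name reconstruct_coefficient is hypothetical. Division is assumed to truncate toward zero.

```python
def reconstruct_coefficient(level: int,
                            quantizer_scale: int,
                            non_intra_quant: int = 16) -> int:
    """Apply Equation (1), then move even non-zero results toward zero."""
    product = 2 * level * quantizer_scale * non_intra_quant

    # Integer division by 16, truncating toward zero (assumption).
    value = abs(product) // 16
    if product < 0:
        value = -value

    # Even non-zero values are decremented if positive, incremented if negative.
    if value != 0 and value % 2 == 0:
        value += -1 if value > 0 else 1
    return value


# With the default matrix value of 16, a coefficient k reconstructs
# to 2*k*quantizer_scale - 1, as stated in the text.
print(reconstruct_coefficient(3, 2))   # 11
```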
The adjustment of even non-zero reconstructed coefficients limits the accuracy of the fade technique described above. The conversion from the reconstructed DCT coefficients to the luminance or chrominance adjustment is linear (except for round-off error), so applying a difference twice is equivalent to applying twice the difference. Consider the case where a P-frame is created with a quantizer_scale value of 4, and the resulting data is used to produce a fade effect according to the method described above. Suppose that, for a given encoded macroblock, the coefficient value k is 1. In this case, the reconstructed coefficient is 7 (2*4−1) for the original non-intra quantizer matrix value of 16, but the reconstructed coefficient is 3 (2*2−1) when a two-step fade is performed (non-intra quantizer matrix value of 8). The difference introduces a modest error: applying the fade step twice yields a final value of 3+3=6, which is smaller than the original value of 7 by about 14%. However, if a four-step fade is performed, the reconstructed coefficient for the fade frame (using a non-intra quantizer matrix value of 4) is 1 (2*1−1), so applying the fade four times yields a final value of 4, which is only 57% of the desired value. In practice, this means that when creating a fade, the quantizer should be at least as large as the number of fade steps, and preferably twice as large.
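The accumulated error can be reproduced numerically with a minimal sketch. It assumes that an N-step fade uses a uniform non-intra quantizer matrix value of 16/N, consistent with the values 8 and 4 used in the example above, and it applies Equation (1) as written in this description. The function names are hypothetical.

```python
def _reconstruct(level: int, quantizer_scale: int, matrix_value: int) -> int:
    """Equation (1) for a non-negative level, with the even-value adjustment."""
    value = (2 * level * quantizer_scale * matrix_value) // 16
    if value > 0 and value % 2 == 0:
        value -= 1
    return value


def fade_totals(level: int, quantizer_scale: int, steps: int) -> tuple:
    """Return (value reached after all fade steps, value of a direct decode)."""
    if steps < 1 or 16 % steps != 0:
        raise ValueError("this sketch assumes steps divides the default value 16")
    per_step = _reconstruct(level, quantizer_scale, 16 // steps)
    target = _reconstruct(level, quantizer_scale, 16)
    return per_step * steps, target


print(fade_totals(1, 4, 2))   # (6, 7)
print(fade_totals(1, 4, 4))   # (4, 7)
```

Raising quantizer_scale relative to the number of steps narrows the gap between the two returned values, which is the basis for the guideline that the quantizer be at least as large as the number of fade steps.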
Note that at each step in any given fade, the identical P-frame encoded data content is presented to the decoder, resulting in an increment of the total change from the first frame to the second frame. Note that display time codes contained in a picture header of each P-frame may need to be modified so that time code for each presentation of the P-frame data corresponds to its linear position in time.
Unequal Fade Steps
FIGS. 5B-D have the advantage that the same P-frame content is decoded at each step (except for the temporal reference in the header). As an alternative, the P-frame content could be modified at each step to have a different fraction of the initial differential content. Thus for instance a three-step fade could be created by using non-intra quantizer matrix values of 3, 5, and 8 (3+5+8=16).
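A brief sketch shows how unequal matrix values translate into the fraction of the total change that is visible after each step. The function name fade_fractions is hypothetical, and the sketch ignores the rounding effects discussed earlier.

```python
from itertools import accumulate


def fade_fractions(matrix_values, default=16):
    """Cumulative fraction of the full change after each fade step."""
    if sum(matrix_values) != default:
        raise ValueError("matrix values must sum to the default value")
    return [total / default for total in accumulate(matrix_values)]


print(fade_fractions([3, 5, 8]))   # [0.1875, 0.5, 1.0]
```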
Extension to MPEG-2
In another embodiment, the MPEG-2 video encoding standard is used. In the MPEG-2 standard, video color formats other than 4:2:0 Y:Cb:Cr are permitted. The 4:2:2 and 4:4:4 color formats require the use of two non-intra quantizer matrices, which are defined in the Quant Matrix Extension header. In this case, the matrix values in the Quant Matrix Extension header would be modified according to the scheme described above.
B-Frame Fade Effect
An alternative embodiment of this invention would employ the use of B-frame encoding rather than P-frame encoding. The quantizer values for each macroblock are modified to change the magnitude of change applied for each non-intra macroblock. Rather than using the default non-intra quantizer matrix, the values of the non-intra quantizer matrix are reduced to one-half, one-quarter, or one-eighth of the default value, with the quantizer scale value correspondingly multiplied by two, four, or eight. The new non-intra quantizer matrix is used to encode both the first and second frames of the fade, and the non-intra quantizer matrix is incorporated into the sequence header for the first reference I- or P-frame.
The first reference frame is encoded as an I- or P-frame, using the new non-intra quantizer matrix as required. The second frame is then encoded as a B-frame, using only the Fwd/Coded and Fwd/Not Coded macroblock types, which encode the differences between the reference frame and the second frame. In the resulting B-frame MPEG data, quantizer values are given in each successive Slice header. Decoding of this B-frame results in a new picture which is constructed relative to the past reference frame, and the new picture is displayed at the output. However, the new frame does not become the new reference frame or modify the existing reference frame. Thus, if the quantizer is gradually increased in successive presentations, the image content differences will be gradually applied to the reference image, yielding the desired fade effect. Thus, for instance, if a four-step fade is desired, the quantizer value q for each slice would be set successively to q/4, q/2, 3q/4, and q. Because slice headers present a unique byte pattern, they can be located in the encoded data with relative ease. In the preferred embodiment, the encoded data is contained in an alternate form. The data starts with a slice table header, which denotes the number of slices in the data. The slice table header is followed by a series of slice offsets, which give the offset in bytes from the beginning of the data to each corresponding slice. Following the slice table is the conventional MPEG picture header, and the slice data. The presence of the slice table allows for rapid location and modification of the quantizer values supplied in each slice header. The data configuration for this preferred data format is shown in
When this alternative is used, the quantizer value can be modified from frame to frame according to any desired sequence, including non-monotonic sequences, so that for instance an image fading from black could appear to fade in, then fade out, then fade back in again. Note that with the B-frame technique, no error accumulation occurs from step to step, so the number of steps in the fade sequence is essentially unlimited.
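The B-frame approach can be sketched as follows. The sketch locates slices by scanning for slice start codes (the bytes 00 00 01 followed by a value from 0x01 to 0xAF) rather than by using the slice table, whose field sizes are not specified here, and it assumes the MPEG-1 slice header layout in which the 5-bit quantizer scale immediately follows the start code. All function names are hypothetical.

```python
START_CODE_PREFIX = b"\x00\x00\x01"


def find_slice_offsets(data: bytes) -> list:
    """Byte offsets of every slice start code that is followed by data."""
    offsets = []
    pos = data.find(START_CODE_PREFIX)
    while pos != -1 and pos + 4 < len(data):
        if 0x01 <= data[pos + 3] <= 0xAF:      # slice start code range
            offsets.append(pos)
        pos = data.find(START_CODE_PREFIX, pos + 3)
    return offsets


def set_slice_quantizer(data: bytes, quantizer_scale: int) -> bytes:
    """Rewrite the 5-bit quantizer scale at the start of each slice header."""
    if not 1 <= quantizer_scale <= 31:
        raise ValueError("quantizer scale must be in the range 1..31")
    out = bytearray(data)
    for offset in find_slice_offsets(data):
        first = offset + 4                      # byte after the start code
        out[first] = (quantizer_scale << 3) | (out[first] & 0x07)
    return bytes(out)


def fade_schedule(q: int, steps: int) -> list:
    """Quantizer values q/steps, 2q/steps, ..., q, rounded down, at least 1."""
    return [max(1, (q * i) // steps) for i in range(1, steps + 1)]


print(fade_schedule(8, 4))   # [2, 4, 6, 8]
```

A non-monotonic sequence, such as fading in, out, and in again, would simply use a different list of quantizer values with the same patching function.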
While the preferred embodiment of the invention has been illustrated and described, as noted above, many changes can be made without departing from the spirit and scope of the invention. Accordingly, the scope of the invention is not limited by the disclosure of the preferred embodiment. Instead, the invention should be determined entirely by reference to the claims that follow.
This application claims priority to provisional patent application Ser. No. 60/682,025, filed May 16, 2005, which is incorporated herein by reference.
Number | Date | Country
---|---|---
60682025 | May 2005 | US