A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
1. Field of the Invention
The present invention relates generally to the field of digital video encoding and, more particularly in one exemplary aspect, to methods and systems for changing the bitrate of a digital video bitstream.
2. Description of the Related Technology
Since the advent of the Moving Picture Experts Group (MPEG) digital audio/video encoding specifications, digital video has become ubiquitous in today's information and entertainment networks. Example networks include satellite broadcast networks, digital cable networks, over-the-air television broadcasting networks, and the Internet.
Furthermore, several consumer electronics products that utilize digital audio/video have been introduced in recent years. Examples include digital versatile disk (DVD) players, MP3 audio players, and digital video cameras.
Such proliferation of digital video networks and consumer products has led to an increased need for a variety of products and methods that perform storage or processing of digital video. One such example of video processing is changing the bitrate of a compressed video bitstream. Such processing may be used, for example, to change the bitrate of a digital video program stored on a personal video recorder (PVR) from the bitrate at which it was received from a broadcast video network to the bitrate of the home network over which the program is being sent. Changing the bitrate of a video program is also performed in other video distribution networks, such as digital cable networks or Internet protocol television (IPTV) distribution networks.
In conventional approaches, one simple way to change the bitrate is to decode the received video bitstream into an uncompressed video stream, and then re-encode the uncompressed video at the desired output rate. While conceptually easy, this method is inefficient in practice because it requires implementing a computationally expensive video encoder to perform the bitrate change, i.e., transrating.
Several transrating techniques have been proposed for the MPEG-2 video compression format. With the recent introduction of advanced video codecs such as VC-1, also known as the SMPTE 421M video coding standard of the Society of Motion Picture and Television Engineers (SMPTE), and H.264, the problem of transrating has become even more complex. Broadly speaking, encoding video with one of the advanced video codecs requires substantially more computation. Similarly, decoding an advanced video codec bitstream is computationally more intensive than decoding a bitstream produced under a first-generation video encoding standard. As a result of this increased complexity, transrating also requires more computation. Furthermore, due to the wide-scale proliferation of multiple video encoding schemes (e.g., VC-1 and H.264), seamless operation of consumer video equipment requires transcoding from one encoding standard to another, in addition to transrating to an appropriate bitrate.
While computational complexity has increased due to more sophisticated video compression techniques, the need for less complex and more efficient transrating solutions has also increased due to the proliferation of digital video deployments and the growing number of applications in which transrating is employed in a digital video system. Many consumer devices, which are traditionally cost-sensitive, also require transrating.
Hence, there is a salient need for improved methods and apparatus that enable lower-complexity transrating of digital video streams in an efficient and cost-effective manner. Such improved methods and apparatus would also ideally be compatible with extant (legacy) processing platforms and protocols, as well as with newer and future implementations.
The present invention satisfies the foregoing needs by providing improved methods and apparatus for video transrating and transcoding.
In a first aspect of the invention, a video transrating method is disclosed. In one embodiment, the method comprises: (1) receiving an input compressed video bitstream having a first format and having a first bitrate, (2) parsing the input compressed video bitstream to generate a pass-through syntax bitstream, (3) decompressing the input compressed video bitstream to produce an intermediate format video signal, (4) processing the intermediate format video signal, and (5) recompressing the intermediate format video signal to produce an output compressed video bitstream having a second format and having a second bitrate. In one variant, the recompressing is responsive to information in the pass-through syntax bitstream, and the intermediate format video signal comprises a plurality of decoded macroblocks and mode refinement information for each of the plurality of decoded macroblocks.
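As a purely illustrative sketch of the foregoing flow (not an implementation of the claimed method), the fragment below strings the five steps together; the stage functions are supplied by the caller and every name is hypothetical:

```python
# Hypothetical top-level flow of the transrating method summarized above.
# The stage callables (parse, decompress, process, recompress) are
# placeholders supplied by the caller, not functions of this disclosure.
def transrate(input_bitstream, target_bitrate,
              parse, decompress, process, recompress):
    pass_through_syntax = parse(input_bitstream)          # step (2)
    intermediate = decompress(input_bitstream)            # step (3): decoded
                                                          # MBs + mode info
    intermediate = process(intermediate, target_bitrate)  # step (4)
    # Step (5): recompression is responsive to the pass-through syntax and
    # produces the output bitstream at the second (target) bitrate.
    return recompress(intermediate, pass_through_syntax, target_bitrate)
```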
In a second aspect of the invention, a video transcoding apparatus is disclosed. In one embodiment, the apparatus comprises: a processor; a data bus; and a computer-readable memory. The processor is configured to: (1) receive an input compressed video bitstream having a first format and having a first bitrate; (2) parse the input compressed video bitstream to generate a second bitstream; (3) decompress the input compressed video bitstream to produce an intermediate format video signal; (4) process the intermediate format video signal; and (5) compress the intermediate format video signal to produce an output compressed video bitstream having a second format and having a second bitrate, said compression being responsive at least to information in said second bitstream, and said second bitrate being responsive to said first bitrate and a target transrating bitrate.
In one variant, the intermediate format video signal comprises a plurality of decoded macroblocks and mode information for each of the plurality of decoded macroblocks. In another variant, the mode information comprises intra encoding modes for at least some of the plurality of decoded macroblocks. In a further variant, the compressing preserves an encoding mode of substantially all macroblocks in the input compressed video bitstream. In still another variant, the decompressing is performed without performing a deblocking operation on any macroblock in the input compressed video bitstream. In yet another variant, the compressing is performed without performing a deblocking operation on any macroblock in the output compressed video bitstream.
In a third aspect of the invention, a video transrating method is disclosed. In one embodiment, the method comprises: decoding an input video bitstream to generate a first residual signal having a first temporal location; generating a second residual signal responsive to the first residual signal and a value of a first intermediate signal having a temporal location earlier in time than the first temporal location; requantizing and retransforming the second residual signal to form a second intermediate signal; filtering the second intermediate signal to generate a third intermediate signal; and reconstructing and motion compensating the third intermediate signal to re-generate a value of the first intermediate signal corresponding to the first temporal location.
In one variant, said reconstructing is responsive to mode refinement information extracted from the input video bitstream. In another variant, said motion compensating is responsive to an intra-predicted signal generated from said second intermediate signal.
In a fourth aspect, a video transrating method is disclosed. In one embodiment, the method comprises: (1) decoding, dequantizing, and detransforming an input video bitstream to generate a first residual signal; (2) generating a second residual signal responsive to the first residual signal and a first reconstructed intermediate signal; (3) requantizing and transforming the second residual signal to form a second reconstructed intermediate signal; (4) filtering the second reconstructed intermediate signal to generate a third intermediate signal; and (5) reconstructing and motion compensating the third intermediate signal to generate the first reconstructed intermediate signal. In one variant, the aforementioned decoding comprises an entropy decoding.
In another embodiment, the method comprises: providing a video bitstream having a first bitrate associated therewith; processing said video bitstream utilizing temporal and spatial correlation; and generating an output bitstream having a second bitrate different from said first bitrate. The processing of said video bitstream utilizing temporal and spatial correlation decodes a plurality of macroblocks comprising the video bitstream to a partially decoded intermediate format.
In one variant, the method further comprises: extracting a plurality of header bits from the video bitstream; and inserting the plurality of header bits in the output bitstream. In another variant, the partially decoded intermediate format is calculated without performing motion compensation on the plurality of macroblocks.
In a fifth aspect, a video transrating apparatus is disclosed. In one embodiment, the apparatus comprises: (1) a decoding module for decoding an input video bitstream to produce a decoded bitstream; (2) a dequantizing and detransforming module for dequantizing and detransforming the decoded bitstream to generate a first residual signal; (3) a residual signal generation module for generating a second residual signal responsive to the first residual signal and a first reconstructed intermediate signal; (4) a requantizing module for requantizing the second residual signal to a second reconstructed intermediate signal; (5) a filtering module for filtering the second reconstructed intermediate signal to generate a third intermediate signal; and (6) a reconstructing module for reconstructing and a motion compensation module for motion compensating the third intermediate signal to generate the first reconstructed intermediate signal. In one variant, the aforementioned decoding comprises an entropy decoding.
In a sixth aspect of the invention, a video processor is disclosed. In one embodiment, the video processor comprises a digital processor such as a DSP or microprocessor having one or more video transcoding and/or transrating algorithms running thereon in the form of computer code.
These and other features, aspects, and advantages of the present invention will be better understood with reference to the following drawings, description and claims.
The following detailed description is of the best currently contemplated modes of carrying out the invention. The description is not to be taken in a limiting sense, but is made merely for the purpose of illustrating the general principles of the invention, since the scope of the invention is best defined by the appended claims.
As used herein, “video bitstream” refers without limitation to a digital format representation of a video signal that may include related or unrelated audio and data signals.
As used herein, “transrating” refers without limitation to the process of bit-rate transformation. Transrating changes the input bit-rate to a new bit-rate, which can be constant, or which can vary as a function of time or so as to satisfy certain criteria. The new bit-rate can be user-defined, or automatically determined by a computational process such as statistical multiplexing or rate control.
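As a minimal, self-contained sketch of such a computational process (illustrative only, and not the rate-control method of this disclosure), the following fragment adjusts a quantization parameter from frame to frame so that the output converges toward a user-defined target bit-rate:

```python
# Minimal rate-control sketch: nudge the quantization parameter (QP) up when
# a frame overshoots its per-frame bit budget, and down when it undershoots.
# The 10% dead-band and +/-1 step size are arbitrary illustrative choices.
def next_qp(qp, bits_used, target_bitrate, frame_rate, qp_min=1, qp_max=51):
    budget = target_bitrate / frame_rate       # per-frame bit budget
    error = (bits_used - budget) / budget      # relative over/undershoot
    step = 1 if error > 0.1 else (-1 if error < -0.1 else 0)
    return min(qp_max, max(qp_min, qp + step))

# Example: a frame spent 260,000 bits against a 200,000-bit budget,
# so the quantizer is coarsened from 30 to 31.
print(next_qp(qp=30, bits_used=260_000, target_bitrate=6_000_000, frame_rate=30))
```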
As used herein, “transcoding” refers without limitation to the conversion of a video bitstream (including audio, video and ancillary data such as closed captioning, user data and teletext data) from one coded representation to another coded representation. The conversion may change one or more attributes of the multimedia stream such as the bitrate, resolution, frame rate, color space representation, and other well-known attributes.
As used herein, the term macroblock (MB) refers without limitation to a two-dimensional subset of pixels representing a video signal. A macroblock may or may not be comprised of contiguous pixels from the video, and may or may not include an equal number of lines and samples per line. A preferred embodiment of a macroblock comprises an area of 16 lines by 16 samples per line.
In one salient aspect, the present invention takes advantage of the temporal and spatial correlation of video signals to reduce the complexity of transrating a video bitstream. The video signal underlying a video bitstream has the notion of time-sequenced video frames. For example, the National Television System Committee (NTSC) signal broadcast in analog television networks in the United States comprises a video signal of 29.97 (i.e., 30/1.001) frames per second. Furthermore, each video picture is made up of two-dimensional arrays of pixels. In one embodiment, the present invention contemplates processing video bitstreams representing smaller units of a frame; these smaller units are referred to herein as macroblocks (MB), although other nomenclature may be used. An MB may comprise, for example, a rectangular area of 16×16 pixels, each pixel being represented by a value or a set of values. For instance, a pixel may have a luminance value and two color values (Cb and Cr). Other implementations are possible and will be recognized by those of ordinary skill in the video processing field given the present disclosure.
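By way of a concrete and purely illustrative example, the following Python/NumPy fragment partitions a 1920×1088 luma plane into 16×16 macroblocks; the plane height is the padded value commonly used for 1080-line video, giving the 8160 macroblocks per frame referenced later in this disclosure:

```python
import numpy as np

# Illustrative only: partition a padded 1080-line luma plane into 16x16 MBs.
height, width, mb = 1088, 1920, 16
luma = np.zeros((height, width), dtype=np.uint8)   # placeholder pixel data

# Reshape into a grid of macroblocks: (MB rows, MB columns, 16, 16).
mbs = luma.reshape(height // mb, mb, width // mb, mb).swapaxes(1, 2)
print(mbs.shape)          # (68, 120, 16, 16) -> 68 * 120 = 8160 MBs per frame
```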
In a video bitstream representing a video signal as a sequence of video pictures, each grouped into a sequence of MBs, the present invention applies transrating techniques that exploit correlations among MBs that are spatially near each other, and among video pictures that are temporally near each other. In particular, the present invention in one embodiment uses MB-level encoding decisions from spatially nearby MBs, and picture-level encoding decisions from temporal neighbors, to trade off the complexity of transrating.
The present invention further exploits characteristics of many advanced video encoders, whose processing techniques include both a “lossless” part, such as run-length encoding or filtering, and a “lossy” part, such as quantization and rounding. The present invention may handle the lossless computational steps and the lossy computational steps separately, exploiting them for, inter alia, trading off the complexity and quality of the resulting transrated video.
The intermediate format processing in the illustrated transrater 300 comprises an MB decision module 350. For processing in module 350, the transrater 300 may have most or substantially all pixels of a picture available in decompressed form. In one embodiment, the transrater 300 may make decisions regarding how to code each MB by processing the decompressed video. In another embodiment, the transrater 300 may preserve the MB modes as encoded in the incoming video bitstream.
In yet another embodiment, the transrater 300 may change MB decisions to help maintain video quality at the output of the transrater 300. This change in MB decisions may also be responsive to the target output bitrate. For example, to reduce the number of bits generated by encoding an MB in the output video bitstream, the transrater 300 may favor encoding more MBs as inter-MBs instead of intra-MBs.
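One way such a rate-biased decision could be expressed is sketched below; this is an assumption-laden illustration rather than the transrater's actual algorithm, and the cost estimators passed in (estimate_distortion, estimate_bits) are hypothetical:

```python
# Hypothetical rate-biased MB mode decision. A larger Lagrange multiplier
# (used when the target bitrate is low) penalizes bit cost more heavily,
# which tends to favor cheaper inter modes over intra modes.
def choose_mb_mode(mb, candidate_modes, target_bitrate, input_bitrate,
                   estimate_distortion, estimate_bits):
    lam = 0.85 * (input_bitrate / max(target_bitrate, 1))
    best_mode, best_cost = None, float("inf")
    for mode in candidate_modes:            # e.g. ["intra", "inter"]
        cost = estimate_distortion(mb, mode) + lam * estimate_bits(mb, mode)
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode
```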
The recompression module 322 re-encodes the uncompressed video back into a compressed video bitstream by performing a recompression operation. The recompression may be performed such that the output video bitstream 354 has a format compliant with an advanced video encoding standard such as, e.g., H.264/MPEG-4 AVC or VC-1. Because the input video bitstream is converted into an intermediate uncompressed video format, the transrater 300 may advantageously also be used to change the bitstream standard. For example, the input video bitstream 102 may be in the H.264 compression format and the output video bitstream 104 may be in the VC-1 compression format, or vice versa.

The recompression module 322 includes a module 324 for processing decoded macroblocks, and a forward quantizer and forward transformer 326 that quantizes and transforms the residual output e2(i) generated by subtracting the predicted signal p2(i) from the output of the decoded MB module 324. Another forward quantizer and forward transformer module 326 is used to quantize and transform the coded residual signal for the decoder loop inside the recompression module 322. The decoder loop also includes an add/clip module 332, and a deblocking module 346 that provides input to the reconstruction module 340. The output predicted pictures from the reconstruction module 340 are used by a motion estimation module 338. The motion estimation module 338 receives motion vector information from the entropy decoder 308 (i.e., via the mode refinement module 352) to help speed up the estimation of accurate motion vectors. A motion compensation module 336 is used to perform motion compensation in the recompression module 322. The motion compensation module 336 can be functionally different from the motion compensation module 304: the latter performs a single motion compensation for the mode specified in the compressed bitstream, whereas the motion compensation module 336 of the recompression module 322 performs motion compensation for one or more modes and passes the results to the mode decision engine 334, which decides which mode to choose among the many tried. The output of the motion compensation module 336 is fed into the mode decision module 334, along with the output of an intra prediction module 342. The mode decision module 334, in turn, drives the inputs to the add/clip module 332.
In
While conceptually easy to understand, the system shown in
When the transcoding system 100 is implemented in hardware, firmware or software, or a combination thereof, a designer may make several tradeoffs regarding timing of circuits used, bus bandwidth, available data and instruction storage memory, complexity of software instructions, and so on. When processing high pixel resolution data, one salient consideration for implementation is the amount of bus bandwidth required for reading the input bitstream, reconstructing the pixels, performing motion search, and storing intermediate results. In particular, the intermediate format processing module 108 (
The compression subsystem 422 of the illustrated embodiment comprises a decoded MB processing module 412 that receives decisions from the MB decision module 450 and produces decoded MB pixel values. A residual signal e2(i) is generated by subtracting the predicted pixel values p2(i) from the output of the decoded MB processing module 412. The residual signal e2(i) is then quantized and transformed in module 426 to produce the signal v2(i) used for entropy encoding to generate the output video bitstream 104. A quantizer and transformer module 430 is used to re-quantize pixel values of the signal v2(i). The output of the quantizer and transformer module 430 is then processed through an add/clip module 432 to produce a signal x2(i) that is input to a deblocking module 446. The reconstruction module 440 is used to reconstruct pixels in uncompressed video format from the output of the deblocking module 446. The uncompressed video is processed in a motion compensation module MC2 436.
As previously noted, the apparatus 400 of
The intra decision module 344, motion estimation module 338, and mode decision module 334 used in the transrater 300 of
Table 1 shows exemplary pass-through syntax that may be processed in the module 400:
If the video bitstream processed by a transrater represents interlaced high definition video at 1920 pixels×1080 lines resolution at 30 frames per second, the bus bandwidth required for data read/writes may include for example the values shown in Table 2 below:
As shown in
Advantages of the A1p 500 embodiment over the A1 400 embodiment include: (i) less logic due to the absence of deblocking at the decoder and partial encoder stages, (ii) less bus bandwidth out of the device to external memory (e.g., by approximately 62 megabytes per second in one implementation), (iii) less bus bandwidth into the device from external memory (e.g., by approximately 62 megabytes per second), and (iv) less use of internal memory (e.g., by approximately 2 megabytes).
Using the notations in
e1(i) = iQ1T(v1(i))   (1)
x1(i) = e1(i) + p1(i) = iQ1T(v1(i)) + p1(i)   (2)
Here, x1(i) is the pre-deblocked reconstructed picture for the decoder, p1(i) is the intra- or inter-prediction, and iQ1T(·) is the inverse quantization and transform with quantization step Q1. The prediction p1(i) is given by:
where I1(·) is the intra-prediction operation, D1(·) is the deblocking operation, and MC1(·) is the motion compensated prediction operation. Intra-prediction I1(·) may require un-deblocked pixels from the current reconstructed picture x1(i), whereas the inter-prediction MC1(·) may require pixels from the deblocked stored reconstructed picture x1(i−j), j≠0.
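Based on this description, the prediction of Equation (3) presumably takes roughly the following form (a reconstruction offered here as an assumption, not a quotation of the original):

\[
p_1(i) =
\begin{cases}
I_1\bigl(x_1(i)\bigr), & \text{intra-predicted MBs},\\[2pt]
MC_1\bigl(D_1(x_1(i-j))\bigr),\ j \neq 0, & \text{inter-predicted MBs}.
\end{cases}
\tag{3}
\]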
The recompression operation may be mathematically represented by the following equations:
e2(i) = x1(i) − p2(i) = e1(i) + p1(i) − p2(i)   (4)
v2(i) = fTQ2(e2(i)) = fTQ2(iQ1T(v1(i)) + p1(i) − p2(i))   (5)
x2(i) = iQ2T(v2(i)) + p2(i) = iQ2T(fTQ2(iQ1T(v1(i)) + p1(i) − p2(i))) + p2(i)   (6)
For a given MB in frame i, x2(i) is the pre-deblocked reconstructed picture for the encoder, v2(i) is the output of the transcoder before the entropy encoder, and p2(i) is the intra- or inter-prediction:
where iQ2T(·) and fTQ2(·) are the inverse and forward transforms with quantization Q2, I2(·) is the intra-prediction, D2(·) is the deblocking operation, and MC2(·) is the motion compensated prediction function for the encoder stage. Using Equations (4) through (6), Equation (5) may be simplified to:
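The encoder-stage prediction (Equation (7)) and the combined expression (Equation (8)) are presumed, by analogy with the decoder-stage expressions above, to take roughly the following form (again a reconstruction, not a quotation):

\[
p_2(i) =
\begin{cases}
I_2\bigl(x_2(i)\bigr), & \text{intra},\\[2pt]
MC_2\bigl(D_2(x_2(i-j))\bigr),\ j \neq 0, & \text{inter},
\end{cases}
\tag{7}
\]
\[
v_2(i) =
\begin{cases}
fTQ_2\bigl(iQ_1T(v_1(i)) + I_1(x_1(i)) - I_2(x_2(i))\bigr), & \text{intra},\\[2pt]
fTQ_2\bigl(iQ_1T(v_1(i)) + MC_1(D_1(x_1(i-j))) - MC_2(D_2(x_2(i-j)))\bigr), & \text{inter},
\end{cases}
\tag{8}
\]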
where j≠0. Equation (8) may be considered to be a general expression for the full decode and full encode as embodied in the transrater A0 300 in
I1(·) = I2(·), and MC1(·) = MC2(·).   (9)
Substituting this into Equation (8), we get:
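A plausible form of the resulting Equation (10), under the assumption of Equation (9) (with the common operators written simply as I(·) and MC(·)), is:

\[
v_2(i) =
\begin{cases}
fTQ_2\bigl(iQ_1T(v_1(i)) + I(x_1(i)) - I(x_2(i))\bigr), & \text{intra},\\[2pt]
fTQ_2\bigl(iQ_1T(v_1(i)) + MC(D_1(x_1(i-j))) - MC(D_2(x_2(i-j)))\bigr), & \text{inter}.
\end{cases}
\tag{10}
\]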
From Equation (4), we get the following approximate equation for the transrating architecture A2:
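The approximate A2 expression (Equation (11)) presumably applies the prediction operators to picture differences, consistent with the discussion that follows (a reconstruction, not a quotation):

\[
v_2(i) \approx
\begin{cases}
fTQ_2\bigl(iQ_1T(v_1(i)) + I(x_1(i) - x_2(i))\bigr), & \text{intra},\\[2pt]
fTQ_2\bigl(iQ_1T(v_1(i)) + MC(D_1(x_1(i-j)) - D_2(x_2(i-j)))\bigr), & \text{inter}.
\end{cases}
\tag{11}
\]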
Equation (4) indicates that the intra-prediction uses the difference of the current un-deblocked reconstructed pictures, x1(i)−x2(i), whereas the motion compensated prediction uses the difference of the deblocked reference pictures, D1(x1(i−j))−D2(x2(i−j)). If deblocking is absent, i.e., D1(·)=D2(·)=identity function, then the inter-prediction also uses the difference of the reference pictures, x1(i−j)−x2(i−j), for j≠0. However, if deblocking is present, the inter-prediction can be modified as:
D1(x1(i−j)) − D2(x2(i−j)) ≈ XF(x1(i−j) − x2(i−j)),   (12)
where XF(·) is the new X-Filter used instead of the deblocking filter defined in the H.264 standard. Thus, the new equation for the output v2 is:
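Applying Equation (12) to Equation (11) suggests the following form for Equation (13) (reconstructed here under the same caveat):

\[
v_2(i) \approx
\begin{cases}
fTQ_2\bigl(iQ_1T(v_1(i)) + I(x_1(i) - x_2(i))\bigr), & \text{intra},\\[2pt]
fTQ_2\bigl(iQ_1T(v_1(i)) + MC(XF(x_1(i-j) - x_2(i-j)))\bigr), & \text{inter}.
\end{cases}
\tag{13}
\]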
Following Equation (13), instead of the individual reconstructed pictures, we use the difference of the reconstructed pictures.
For the architecture A1p referenced above, the deblocking functions D1(·) and D2(·) are removed from Equation (8) as:
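Removing the deblocking operators from the presumed Equation (8) would give Equation (14) roughly as (an assumption consistent with the surrounding text):

\[
v_2(i) =
\begin{cases}
fTQ_2\bigl(iQ_1T(v_1(i)) + I_1(x_1(i)) - I_2(x_2(i))\bigr), & \text{intra},\\[2pt]
fTQ_2\bigl(iQ_1T(v_1(i)) + MC_1(x_1(i-j)) - MC_2(x_2(i-j))\bigr), & \text{inter}.
\end{cases}
\tag{14}
\]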
This equation can be further approximated as in Equation (13) as:
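The further approximation of Equation (15), paralleling Equation (13) but without the X-Filter, would then be approximately:

\[
v_2(i) \approx
\begin{cases}
fTQ_2\bigl(iQ_1T(v_1(i)) + I(x_1(i) - x_2(i))\bigr), & \text{intra},\\[2pt]
fTQ_2\bigl(iQ_1T(v_1(i)) + MC(x_1(i-j) - x_2(i-j))\bigr), & \text{inter}.
\end{cases}
\tag{15}
\]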
The algorithm embodiment A2 may in certain cases have several advantages over the embodiments A1 400 and A1p 500 previously described. These may include:
The bandwidth numbers for 1080i HD video at 30 frames/sec are:
1. Reference out: 1920 wide × 1088 high × 1.5 color × 30 frames/sec = 94,003,200 B/sec.
2. Reference in: 16 parts × (9×9 Y + 2×3×3 Cb/Cr) support × 8160 MBs/frame × 30 frames/sec × 2 refs = 775,526,400 B/sec.
3. Co-located samples out: 8160 MBs × 160 B × 30 frames/sec = 39,168,000 B/sec.
4. Co-located samples in: 8160 MBs × 160 B × 30 frames/sec = 39,168,000 B/sec.
5. Intra prediction: 2 in/out × 1920 wide × 2 color × 30 frames/sec = 230,400 B/sec.
Total bandwidth = 948,096,000 B/sec.
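The arithmetic behind these figures can be checked directly; the following self-contained Python fragment reproduces each line item and the total (8160 MBs/frame corresponds to 120 × 68 macroblocks of a 1920 × 1088 padded frame):

```python
# Reproduces the bandwidth line items above for 1080i HD video at 30 frames/sec.
FPS = 30
MBS = (1920 // 16) * (1088 // 16)                  # 8160 macroblocks per frame

reference_out = 1920 * 1088 * 1.5 * FPS            # 4:2:0 => 1.5 bytes/pixel
reference_in  = 16 * (9 * 9 + 2 * 3 * 3) * MBS * FPS * 2   # 16 parts, 2 refs
colocated_out = MBS * 160 * FPS
colocated_in  = MBS * 160 * FPS
intra_pred    = 2 * 1920 * 2 * FPS                 # in/out, 2 color components

total = reference_out + reference_in + colocated_out + colocated_in + intra_pred
print(int(total))                                  # 948096000 B/sec
```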
Compared to the architectures A1 or A1p, architectures A2a or A2b each use one-half the bandwidth.
Returning to equation (1), A3 assumes that for a given picture i, the decoder and encoder stage predictions are the same, i.e., p1(i)=p2(i). Therefore, from equations (1) or (6) we have
v2(i) ≈ fTQ2(iQ1T(v1(i)))   (16)
Clearly, the assumption is invalid for bit-rate changes beyond a small amount (e.g., less than 5%). Thus, larger bit-rate changes during transrating cause significant drift; that is, increasing degradation in video quality in successive video frames until refreshed by a corrective frame. As will be seen, this algorithm has far lower logic and bandwidth requirements than the A2 embodiment described above; practically all of the bandwidth calculations discussed above are eliminated.
In one exemplary software implementation, methods of the present invention are implemented as a computer program that is stored on a computer-usable medium, such as a memory card, a digital versatile disk (DVD), a compact disc (CD), a USB key, flash memory, an optical disk, and so on. The computer-readable program, when loaded on a computer or other processing device, implements the transcoding and/or transrating methodologies of the present invention.
It will be recognized by those skilled in the art that the invention described herein can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment containing both hardware and software elements. In an exemplary embodiment, the invention may be implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
In this case, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer-readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
It will also be appreciated that while the above description of the various aspects of the invention is rendered in the context of particular architectures or configurations of hardware, software, and/or firmware, these are merely exemplary and for purposes of illustration, and are in no way limiting on the various implementations or forms the invention may take. For example, the functions of two or more “blocks” or modules may be integrated or combined, or conversely the functions of a single block or module may be divided into two or more components. Moreover, it will be recognized that certain of the functions of each configuration may be optional (or may be substituted for by other processes or functions) depending on the particular application.
It should be understood, of course, that the foregoing relates to exemplary embodiments of the invention and that modifications may be made without departing from the spirit and scope of the invention as set forth in the following claims.
This application claims priority to co-owned and co-pending U.S. provisional patent application Serial No. 61/197,216 filed Oct. 24, 2008 entitled “Method And Apparatus For Transrating Compressed Digital Video”, which is incorporated herein by reference in its entirety.