The present invention relates to an apparatus and a related method for transcoding a previously coded digital video data stream into a layered stream consisting of a base layer having a lower data rate than the original source stream and an enhancement layer encoded using Fine-Granular Scalability (FGS) techniques. The present invention comprises an efficient means for re-encoding existing digital video into FGS multilayer video to provide variable levels of displayed picture quality under varying bandwidth conditions in wireless and/or wireline networks.
Digital streaming video may be transmitted using a video coding standard, such as MPEG, over a channel in which the available bandwidth is time-varying and location dependent. This frequently occurs in wireless networks, but may also occur in wireline networks in which bandwidth is limited. When the available bandwidth is less than the minimum level required for the data rate of the video stream being sent over the network, degradation of the displayed video results.
This problem may be solved by changing the data rate of the pre-coded video content according to channel conditions. This technique is known as trans-rating. However, trans-rating requires fast and accurate predictions of channel capacity, which are difficult to obtain. Consequently, there still are occasions when a mismatch between channel capacity and the video source data rate occurs, which results in a loss of video packets.
Prioritized streaming technologies can better adapt to varying channel capacity. In prioritized streaming, the essential (or base layer) information is encoded with higher priority than the less essential enhancement information.
Streaming video transmitter 110 comprises video frame source 112, video encoder 114, storage 115, and encoder buffer 116. Video frame source 112 may be any device capable of generating a sequence of uncompressed video frames, including a television antenna and receiver unit, a video cassette player, a video camera, a disk storage device capable of storing a video clip, and the like. The uncompressed video frames enter video encoder 114 at a given picture rate (or streaming rate) and are compressed according to any known compression algorithm or device, such as an MPEG-4 encoder. Video encoder 114 then transmits the compressed video frames to encoder buffer 116 for buffering in preparation for transmission across data network 120.
Data network 120 may be any suitable network and may include portions of both public data networks, such as the Internet, and private data networks, such as an enterprise-owned local area network (LAN) or a wide area network (WAN). In an advantageous embodiment of the present invention, data network 120 comprises a wireless network. In particular, data network 120 may be a wireless home network.
Streaming video receiver 130 comprises decoder buffer 132, video decoder 134, storage 135, and video display 136. Depending on the application, streaming video receiver 130 may be any one of a wide variety of receivers of video frames, including a television receiver, a desktop personal computer (PC), a video cassette recorder (VCR), or the like. Decoder buffer 132 receives and stores streaming compressed video frames from data network 120. Decoder buffer 132 then transmits the compressed video frames to video decoder 134 as required. Video decoder 134 decompresses the video frames at the same rate (ideally) at which the video frames were compressed by video encoder 114. Video decoder 134 sends the decompressed frames to video display 136 for play-back on the screen of video display 136.
In an advantageous embodiment of the present invention, video encoder 114 may represent a standard MPEG encoder implemented using any hardware, software, firmware, or combination thereof, such as a software program executed by a conventional data processor. In such an implementation, video encoder 114 may comprise a plurality of computer executable instructions stored in storage 115. Storage 115 may comprise any type of computer storage medium, including a fixed magnetic disk, a removable magnetic disk, a CD-ROM, magnetic tape, video disk, and the like. Furthermore, in an advantageous embodiment of the present invention, video decoder 134 also may represent a conventional MPEG decoder implemented using any hardware, software, firmware, or combination thereof, such as a software program executed by a conventional data processor. In such an implementation, video decoder 134 may comprise a plurality of computer executable instructions stored in storage 135. Storage 135 also may comprise any type of computer storage medium, including a fixed magnetic disk, a removable magnetic disk, a CD-ROM, magnetic tape, video disk, and the like.
Due to variations in the available bandwidth in data network 120, it is necessary to transcode video data in video encoder 114 using fine granular scalability (FGS) according to the principles of the present invention. Trans-rating and FGS are briefly described herein. Trans-rating consists of the direct re-encoding of an existing (original) video stream to a new video stream having a lower data rate than the original. The new lower-rate video stream may be correctly decoded and displayed with only a reduction in image quality relative to that of the original stream. This is a widely-used scheme for reducing the data rate of a video stream when the available transmission bandwidth is less than the full data rate of the original stream.
Re-quantization coefficients block 230 determines new (or re-quantization) coefficients suited to the new, lower video data rate (i.e., video data rate conversion ratio). Quantization circuit 215 uses the re-quantization coefficients to re-quantize the output of inverse quantization circuit 210, thereby producing a stream of re-quantized DCT coefficients. Variable-length coder (VLC) 220 then encodes the re-quantized DCT coefficients to produce the desired low-rate video stream.
Transrater 200 decodes the original video stream to the extent necessary to identify and evaluate the quantized DCT coefficients, along with the associated quantization factors, so that the original coefficient values can be readily computed. Given the data rate of the original stream and the desired rate of the trans-rated video stream, re-quantization coefficients block 230 computes a new quantization factor for each coefficient. Quantization circuit 215 then scales the de-quantized DCT stream by this factor. In this manner, a video stream having the same content as the original stream, but a lower data rate and a correspondingly lower image quality, is generated for transmission under network bandwidth conditions that correspond to the lower rate. However, due to the complexity of the trans-rating algorithm, it is typically implemented using a special-purpose processor.
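The re-quantization performed by the trans-rater can be sketched as follows. This is an illustrative Python sketch only, not the circuit of the invention; the quantization step sizes and the block of coefficient levels are hypothetical values chosen for the example.

```python
# Illustrative sketch of trans-rating by re-quantization: de-quantize
# the coefficient levels with the original step size, then quantize
# again with a coarser step size suited to the lower data rate.
# Uniform scalar quantization is assumed for simplicity.

def dequantize(levels, q):
    """Recover approximate DCT coefficients from quantized levels."""
    return [lvl * q for lvl in levels]

def quantize(coeffs, q):
    """Quantize coefficients with step size q (round to nearest)."""
    return [int(round(c / q)) for c in coeffs]

def trans_rate(levels, q_orig, q_new):
    """Re-quantize a block: de-quantize with the original step size,
    then quantize again with a coarser step size for a lower rate."""
    return quantize(dequantize(levels, q_orig), q_new)

# Example: a block quantized with step 4, re-quantized with step 8.
original_levels = [30, -12, 5, 1, 0, 0]
requantized = trans_rate(original_levels, q_orig=4, q_new=8)
```

The coarser step size shortens the coded levels (and hence the variable-length codes that follow), which is the source of the rate reduction.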
Motion estimation circuit 330 receives the original video signal and estimates the amount of motion between a reference frame and the current video frame as represented by changes in pixel characteristics. For example, the MPEG standard specifies that motion information may be represented by one to four spatial motion vectors per 16×16 sub-block of the frame. Motion compensation circuit 325 receives the motion estimates from motion estimation circuit 330 and generates motion compensation factors that are subtracted from the original input video signal by adder (or combiner) 305.
DCT circuit 310 receives the resultant output from adder 305 and transforms it from a spatial domain to a frequency domain using known techniques such as discrete cosine transform (DCT). Quantization circuit 315 receives the original DCT coefficient outputs from DCT circuit 310 and further compresses the motion compensation prediction information using well-known quantization techniques. Quantization circuit 315 determines a division factor to be applied for quantization of the transform output.
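The transform step can be illustrated with a one-dimensional DCT-II; an actual encoder applies a separable two-dimensional transform to 8×8 pixel blocks, so the following is a simplified sketch with hypothetical sample values.

```python
import math

# Illustrative sketch of the forward transform: a 1-D DCT-II, which
# concentrates a block's energy into a few low-frequency coefficients.
# (Real encoders use a separable 8x8 2-D DCT; one dimension is shown.)

def dct_1d(x):
    """Orthonormal DCT-II of a sequence x."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out

# A flat block maps almost entirely to the DC (k = 0) coefficient,
# which is why smooth image regions compress well.
coeffs = dct_1d([10, 10, 10, 10])
```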
Variable length coder (VLC) 320, which may be, for example, an entropy coding circuit, receives the quantized DCT coefficients from quantization circuit 315 and further compresses the data using variable-length coding techniques that represent areas with a high probability of occurrence with a relatively short code and that represent areas of lower probability of occurrence with a relatively long code. The output of VLC 320 comprises the base-layer video stream.
Inverse quantization circuit 335 de-quantizes the output of quantization circuit 315 to produce a signal that represents the transform input to quantization circuit 315. This signal comprises the reconstructed base layer DCT coefficients. As is well known, the inverse quantization process is a “lossy” process, since the bits lost in the division performed by quantization circuit 315 are not recovered. Inverse discrete cosine transform (IDCT) circuit 340 decodes the output of inverse quantization circuit 335 to produce a signal which provides a frame representation of the original video signal, as modified by the transform and quantization processes.
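The lossy nature of the quantization and inverse quantization pair can be seen in a small worked example; the step size and sample value are hypothetical.

```python
# Sketch of why quantization is lossy: the division discards the
# remainder, so de-quantization cannot recover the original value.
q = 16                          # hypothetical quantization step size
original = 103
level = original // q           # quantization: 103 // 16 = 6
reconstructed = level * q       # inverse quantization: 6 * 16 = 96
error = original - reconstructed
# The remainder discarded in the division is lost permanently.
```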
Adder (or combiner) 345 combines the output of motion compensation circuit 325 with the output of IDCT circuit 340. The output of adder 345 is one of the inputs to motion compensation circuit 325. Motion compensation circuit 325 uses the frame data from adder 345 as the input reference signal for determining motion changes in the original input video signal.
Adder (or combiner) 350 receives the original video signal and subtracts from it the reconstructed base layer frame information provided by adder 345. This gives difference data that represents the enhancement layer information. Discrete cosine transform (DCT) circuit 355 receives the resultant output from adder 350 and transforms it from a spatial domain to a frequency domain. The DCT outputs are shifted by bitplane shift circuit 360. Finally, VLC 365 receives the shifted DCT coefficients and further compresses the data using variable-length coding techniques. The output of VLC 365 comprises the enhancement-layer video stream.
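The enhancement-layer path described above can be sketched as follows. The pixel values and shift amount are hypothetical, and the DCT between the residual computation and the bitplane shift is omitted for brevity.

```python
# Illustrative sketch of the enhancement-layer path: subtract the
# reconstructed base-layer frame from the original to get a residual,
# then apply a bitplane shift that boosts selected values so their
# bits land in earlier (more significant) bitplanes of the FGS scan.
# (The intermediate DCT of the residual is omitted here.)

def residual(original, reconstructed):
    """Per-pixel difference between original and base-layer frames."""
    return [o - r for o, r in zip(original, reconstructed)]

def bitplane_shift(values, shift_bits):
    """Left-shift values so favored coefficients are transmitted
    earlier in the most-significant-bit-first scan."""
    return [v << shift_bits for v in values]

orig_block = [103, 98, 110, 95]
base_block = [96, 96, 104, 96]        # after lossy base-layer coding
res = residual(orig_block, base_block)
shifted = bitplane_shift(res, 2)
```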
VLD 405 receives the transmitted base layer video stream. VLD 405, inverse quantization circuit 410, inverse discrete cosine transform (IDCT) 415, adder 420 and motion compensation circuit 425 essentially reverse the processing performed by adder 305, DCT 310, quantization circuit 315, VLC 320 and motion compensation circuit 325 in FGS encoder 300.
VLD 430 receives the transmitted enhancement layer video stream. VLD 430, bitplane shift circuit 435 and inverse discrete cosine transform (IDCT) circuit 440 essentially reverse the processing performed by DCT circuit 355, bitplane shift circuit 360, and VLC 365 in FGS encoder 300.
In conventional FGS encoder 300, an input video sequence is encoded such that the base layer has a specified data rate at which the quality of the decoded video is lower than that of the original source. Nevertheless, the base layer conforms to a digital video coding standard (such as MPEG-4) and can thereby be independently decoded and displayed. The enhancement layer data is encoded such that the residual information (i.e., the difference between the original video and the decoded base layer) is transmitted in order of decreasing bit significance. In other words, the most significant bit of this residual data is transmitted for an entire video image, followed by the second-most significant, followed by the third-most significant bit, and so forth.
This allows the enhancement layer to be truncated at any point within a video image, depending upon the available network bandwidth. Less transmitted data results in lower video quality. However, all of the data that is actually transmitted may be used for improving video quality above that of the base layer alone.
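The bitplane-ordered transmission described above can be sketched as follows; the residual magnitudes are hypothetical example values.

```python
# Illustrative sketch of bitplane-ordered transmission: the most
# significant bit of every residual value is sent first, then the
# next plane, and so on, so the stream can be truncated anywhere and
# the received planes still refine every value in the image.

def bitplanes(values, num_planes):
    """Return the bitplanes of the magnitudes, most significant first."""
    planes = []
    for p in range(num_planes - 1, -1, -1):
        planes.append([(abs(v) >> p) & 1 for v in values])
    return planes

residuals = [5, 2, 7, 1]       # example residual magnitudes (3 bits)
planes = bitplanes(residuals, 3)
# Truncating after planes[0] (the MSB plane) still gives a coarse
# approximation of every residual; each later plane refines it.
```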
Conventional FGS coding is performed in conjunction with the digital encoding of a source video sequence according to the standard (e.g., MPEG-4) used for the base layer. The residual video is encoded in the spatial frequency domain using the Discrete Cosine Transform (DCT) and is subsequently arranged in order of decreasing bit-plane significance. Such encoding requires the base-layer data rate to be specified and is thereby performed as part of the source sequence encoding. FGS coding of pre-existing digital video, such as video on a DVD or transmitted over a satellite or digital cable service, requires trans-coding: the digital video must be at least partially decoded and then re-encoded at a lower data rate for the base layer, while the residual video is simultaneously coded for the enhancement layer. This procedure often proves difficult to perform in real time.
A layered video scheme, such as fine granular scalability (FGS), offers the advantage of always providing the full quality of the original video whenever sufficient bandwidth is available to transmit and receive all of the base layer information and the enhancement layer information. FGS only degrades when the full enhancement layer cannot be transmitted. Consequently, the trans-rating of a first video stream having a higher data rate to a second video stream (which serves as a base layer) having a lower rate and the simultaneous coding of the residual between the higher-rate and lower-rate streams permit the methods of trans-rating and FGS layered coding to be combined. This also allows taking advantage of prioritized streaming technologies to leverage MAC layer QoS support defined in IEEE 802.11e to achieve better and faster adaptation to the varying channel conditions.
In the present invention, the trans-coded video stream and the original stream are both decoded to generate the FGS layer stream in such a manner that no additional encoding is required beyond the FGS layer itself (i.e., no re-encoding of the base layer is necessary). In a digital video coding method where motion estimation and compensation are used in the video compression, inaccurate decoding can result in prediction drift, since a video image can serve as a reference for decoding a subsequently-transmitted image.
In conventional FGS encoding, the residual video for the enhancement layer is computed after the base-layer coding, which includes motion prediction. This allows the base layer to be decoded with no prediction drift in the absence of the enhancement layer. However, trans-rating of a video stream results in a video stream whose DCT coefficients have been re-quantized. When decoded, the DCT coefficients could have different values than were used for the original motion encoding and thereby cause prediction drift.
If a video stream is trans-rated to a reduced-rate stream that serves as the base layer for an FGS layered stream, the original stream must be fully decoded, along with the trans-coded stream, before the FGS enhancement layer can be encoded. However, the FGS base layer has some prediction drift when decoded without an enhancement layer. When the latter is fully present, however, its encoding relative to the original stream ensures that the quality of the decoded images is identical to that obtained by decoding the original video stream. In particular, the effects of prediction drift introduced by the trans-rating will not be present.
This method has the advantage of requiring only standard decoders and no encoders, which are much more complicated and, depending upon the encoding method and parameters, may result in lower image quality. This is a particular benefit in applications where an inexpensive encoder would otherwise be desired. Another advantage is that this method can work with any trans-rating scheme, so that any conventional trans-rater may be used.
Since FGS enhancement-layer coding is fairly straightforward, the present invention permits effective and economical real-time trans-rating of a digital video stream into a base-layer of a desired data rate and a corresponding FGS enhancement layer. If a trans-rater that accepts analog or pixel domain input is used, MPEG decoder 505 for the original video stream is not required and may be replaced by the appropriate converter to the video format required by FGS enhancement layer encoder 510.
Although FGS encoding is conventionally performed such that the residual is computed in the picture domain and relative to the prediction-coded base layer, it has been demonstrated that, in an FGS encoder, the residual may instead be computed in the DCT coefficient domain using the pre-quantized DCT and the subsequently de-quantized DCT in the motion prediction loop of the base-layer encoder. This eliminates the DCT operation otherwise required for the FGS enhancement-layer encoding. The decoded video that results from a stream encoded in this manner differs very slightly in the picture domain from that of one encoded using the conventional FGS method of FGS encoder 300.
This result may be used to simplify the FGS trans-coding method, as described below.
Re-quantization coefficients block 650 determines new (or re-quantization) coefficients suited to the new, lower video data rate (i.e., video data rate conversion ratio). Quantization circuit 615 uses the re-quantization coefficients to re-quantize the output of inverse quantization circuit 610 at the new data rate R2, thereby producing a stream of re-quantized DCT coefficients at rate R2. VLC 620 then encodes the re-quantized DCT coefficients to produce a base layer video stream at the desired low-rate, R2.
Inverse quantization circuit 635 receives the re-quantized DCT coefficients from quantization circuit 615 and produces de-quantized DCT coefficients at rate R2. Adder (or combiner) 630 subtracts the output of inverse quantization circuit 635 from the output of inverse quantization circuit 610, thereby producing a residual signal. The residual signal is shifted by bitplane shift circuit 640 and then encoded by VLC 645. The coded output of VLC 645 comprises the FGS enhancement layer video stream.
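The DCT-domain residual computation of this simplified trans-rater can be sketched as follows; the step sizes and coefficient levels are hypothetical, and uniform scalar quantization is assumed.

```python
# Illustrative sketch of the simplified DCT-domain residual: the
# enhancement residual is the difference between the de-quantized
# original coefficients and the de-quantized re-quantized (base-layer)
# coefficients, so no pixel-domain decoding or extra DCT is needed.

def dequant(levels, q):
    """Inverse quantization with step size q."""
    return [lvl * q for lvl in levels]

def quant(coeffs, q):
    """Quantization with step size q (round to nearest)."""
    return [int(round(c / q)) for c in coeffs]

q1, q2 = 4, 8                  # original and reduced-rate step sizes
orig_levels = [30, -12, 5, 1]  # quantized coefficients at rate R1

coeffs_r1 = dequant(orig_levels, q1)     # de-quantized original
base_levels = quant(coeffs_r1, q2)       # base layer at rate R2
coeffs_r2 = dequant(base_levels, q2)     # de-quantized base layer
fgs_residual = [a - b for a, b in zip(coeffs_r1, coeffs_r2)]
# The residual captures exactly the information that the coarser
# base-layer quantization discarded.
```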
In this arrangement, the residual is computed directly as the difference between the de-quantized coefficients of the original stream and the de-quantized versions of the re-quantized coefficients, both of which are available within the base-layer trans-rater. Such a scheme eliminates the need for both decoders, requiring only a base-layer trans-coder of the type described above and an FGS enhancement-layer coder that operates in the DCT coefficient domain and thereby eliminates the need for its own DCT computation.
Unlike the prior art methods, the present invention introduces prediction drift into both the base and enhancement layers due to the effects of trans-rating and of performing the FGS residual computation in the DCT domain. Consequently, it is best suited for applications in which the number of pictures and especially the number of reference pictures (MPEG I or P pictures) in a Group of Pictures (GOP) is always small enough that the accumulated prediction error will be imperceptible or at least not objectionable.
While this disclosure has described certain embodiments and generally associated methods, alterations and permutations of these embodiments and methods will be apparent to those skilled in the art. Accordingly, the above description of example embodiments does not define or constrain this disclosure. Other changes, substitutions, and alterations are also possible without departing from the spirit and scope of this disclosure, as defined by the following claims.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB05/54131 | 12/8/2005 | WO | 00 | 6/8/2007 |
Number | Date | Country |
---|---|---|
60635212 | Dec 2004 | US |