Method and apparatus for accomplishing multiple description coding for video

Abstract
A method and apparatus for utilizing temporal prediction and motion compensated prediction to accomplish multiple description video coding is disclosed. An encoder receives a sequence of video frames and divides each frame into non-overlapping macroblocks. Each macroblock is then encoded using either an intraframe mode (I-mode) or a prediction mode (P-mode) technique. Both the I-mode and the P-mode encoding techniques produce an output for each of n channels used to transmit the encoded video data to a decoder. The P-mode technique generates at least n+1 prediction error signals for each macroblock. One of the at least n+1 P-mode prediction error signals is encoded such that it may be utilized to reconstruct the original sequence of video frames regardless of the number of channels received by the decoder. A component of this one prediction error signal is sent on each of the n channels. Each of the remaining at least n prediction error signals is sent on a separate one of the n channels (along with the above-mentioned component). These remaining prediction error signals are encoded such that, when combined with the component of the one P-mode prediction error signal that was sent on the same channel, a reasonably good reconstruction of the original sequence of video frames may be obtained when the number of received channels is between 1 and n−1.
Description




FIELD OF THE INVENTION




The present invention relates to video coding. More particularly, the present invention relates to a method for utilizing temporal prediction and motion compensated prediction to accomplish multiple description video coding.




BACKGROUND




Most of today's video coder standards use block-based motion compensated prediction because of its success in achieving a good balance between coding efficiency and implementation complexity.




Multiple Description Coding (“MDC”) is a source coding method that increases the reliability of a communication system by decomposing a source into multiple bitstreams and then transmitting the bitstreams over separate, independent channels. An MDC system is designed so that, if all channels are received, a very good reconstruction can be made; if some channels are not received, a reasonably good reconstruction can still be obtained. In commonly assigned U.S. patent application Ser. No. 08/179,416, a generic method for MDC using a pairwise correlating transform, referred to as “MDTC,” is described. This generic method is designed by assuming the inputs are a set of Gaussian random variables. A method for applying this method to image coding is also described. A subsequent and similarly commonly assigned U.S. Provisional Application Ser. No. 60/145,937 describes a generalized MDTC method. Papers describing MDC-related work include: Y. Wang et al., “Multiple Description Image Coding for Noisy Channels by Pairing Transform Coefficients,” in Proc. IEEE 1997 First Workshop on Multimedia Signal Processing, (Princeton, N.J.), June 1997; M. T. Orchard et al., “Redundancy Rate Distortion Analysis of Multiple Description Image Coding Using Pairwise Correlating Transforms,” in Proc. ICIP 97, (Santa Barbara, Calif.), October 1997; Y. Wang et al., “Optimal Pairwise Correlating Transforms for Multiple Description Coding,” in Proc. ICIP 98, (Chicago, Ill.), October 1998; and V. A. Vaishampayan, “Design of Multiple Description Scalar Quantizers,” in IEEE Trans. Inform. Theory, vol. 39, pp. 821-834, May 1993.




Unfortunately, in existing video coding systems, when not all of the bitstream data sent over the separate channels is received, the quality of the reconstructed video sequence suffers. Moreover, as the amount of bitstream data that is not received increases, the quality of the reconstruction that can be obtained from the received bitstream decreases rapidly.




Accordingly, there is a need in the art for a new approach for coding a video sequence into two descriptions using temporal prediction and motion compensated prediction to improve the quality of the reconstructions that can be achieved when only one of the two descriptions is received.




SUMMARY OF THE INVENTION




Embodiments of the present invention provide a block-based motion-compensated predictive coding framework for realizing MDC, which includes two working modes: Intraframe Mode (I-mode) and Prediction Mode (P-mode). Coding in the P-mode involves the coding of the prediction errors and estimation/coding of motion. In addition, for both the I-mode and P-mode, the MDTC scheme has been adapted to code a block of Discrete Cosine Transform (“DCT”) coefficients.




Embodiments of the present invention provide a system and method for encoding a sequence of video frames. The system and method receive the sequence of video frames and divide each video frame into a plurality of macroblocks. Each macroblock is then encoded using at least one of the I-mode technique and the P-mode technique, where, for n channels, the P-mode technique generates at least n+1 prediction error signals for each macroblock. The system and method then divide the I-mode encoded data and the at least n+1 P-mode prediction error signals among the n channels being used to transmit the encoded video frame data.











BRIEF DESCRIPTION OF THE DRAWINGS





FIG. 1 provides a block diagram of the overall framework for Multiple Description Coding (“MDC”) of video using motion compensated prediction.

FIG. 2 provides a block diagram of the framework for MDC in P-mode.

FIG. 3A provides a block diagram of the general framework for the MDC Prediction Error (“MDCPE”) encoder of FIG. 2.

FIG. 3B provides a block diagram of the general framework for the MDCPE decoder of FIG. 2.

FIG. 4 provides a block diagram of an embodiment of the MDCPE encoder of FIG. 3A.

FIG. 5 provides a block diagram of another embodiment of the MDCPE encoder of FIG. 3A.

FIG. 6 provides a block diagram of another embodiment of the MDCPE encoder of FIG. 3A.

FIG. 7 provides a block diagram of an embodiment of multiple description motion estimation and coding (“MDMEC”) using spatial interleaving of the present invention.

FIG. 8 is a block diagram of an embodiment of an odd-even block encoding of a macroblock in the present invention.

FIG. 9 is a flow diagram representation of an embodiment of the encoder operations of the present invention.

FIG. 10 is a flow diagram representation of an embodiment of the decoder operations of the present invention when the decoder receives two coded descriptions of a video frame.

FIG. 11 is a flow diagram representation of another embodiment of the decoder operations of the present invention when the decoder only receives one coded description of a video frame.











DETAILED DESCRIPTION




The Overall Coding Framework




In accordance with an embodiment of the present invention, a multiple description (“MD”) video coder is developed using conventional block-based motion compensated prediction. In this embodiment, each video frame is divided into non-overlapping macroblocks which are then coded in either the I-mode or the P-mode. In the I-mode, the color values of each macroblock are directly transformed using a Discrete Cosine Transform (“DCT”), and the resultant DCT coefficients are quantized and then entropy coded. In the P-mode, a motion vector, which describes the displacement between the spatial position of the current macroblock and the best matching macroblock, is first found and coded. Then the prediction error is coded using the DCT. Additional side information describing the coding mode and relevant coding parameters is also coded.
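For illustration only, the macroblock division described above can be sketched in a few lines of Python (assuming NumPy and a luminance frame whose dimensions are multiples of 16; the function name is hypothetical):

```python
import numpy as np

MB = 16  # macroblock size used in the described embodiment

def macroblocks(frame: np.ndarray):
    """Yield ((row, col), block) for the non-overlapping 16x16 macroblocks.

    Assumes `frame` is a 2-D luminance array whose dimensions are
    multiples of MB, matching the uniform-size division of FIG. 8.
    """
    rows, cols = frame.shape
    for r in range(0, rows, MB):
        for c in range(0, cols, MB):
            yield (r, c), frame[r:r + MB, c:c + MB]
```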




An embodiment of an overall MDC framework of the present invention is shown in FIG. 1 and is similar to the conventional video coding scheme using block-based motion compensated predictive coding. In FIG. 1, an input analog video signal is received in an analog-to-digital (“A/D”) converter (not shown) and each frame from the input analog video signal is digitized and divided into non-overlapping blocks of approximately uniform size as illustrated in FIG. 8. Although shown as such in FIG. 8, the use of non-overlapping macroblocks of approximately uniform size is not required by the present invention, and alternative embodiments of the present invention are contemplated in which non-overlapping macroblocks of approximately uniform size are not used. For example, in one contemplated alternative embodiment, each digitized video frame is divided into overlapping macroblocks having non-uniform sizes. Returning to FIG. 1, each input macroblock X 100 is input to a mode selector 110, and the mode selector selectively routes the input macroblock X 100 for coding in one of the two modes using switch 112 by selecting either channel 113 or channel 114. Connecting switch 112 to channel 113 enables I-mode coding in an I-mode MDC 120, and connecting switch 112 with channel 114 enables P-mode coding in a P-mode MDC 130. In the I-mode MDC 120, the color values of the macroblock are coded directly into two descriptions, description 1 122 and description 2 124, using either the MDTC method; the generalized MDTC method described in co-pending U.S. patent application Ser. No. 08/179,416; Vaishampayan's Multiple Description Scalar Quantizer (“MDSQ”); or any other multiple description coding technique. In the P-mode MDC 130, the macroblock is first predicted from previously coded frames and two descriptions are produced, description 1 132 and description 2 134. Although shown as being output on separate channels, embodiments of the present invention are contemplated in which the I-mode description 1 122 and the P-mode description 1 132 are output to a single channel. Similarly, embodiments are contemplated in which the I-mode description 2 124 and the P-mode description 2 134 are output to a single channel.




In FIG. 1, the mode selector 110 is connected to a redundancy allocation unit 140, and the redundancy allocation unit 140 communicates signals to the mode selector 110 to control the switching of switch 112 between channel 113 for the I-mode MDC 120 and channel 114 for the P-mode MDC 130. The redundancy allocation unit 140 is also connected to the I-mode MDC 120 and the P-mode MDC 130 to provide inputs to control the redundancy allocation between motion and prediction error. A rate control unit 150 is connected to the redundancy allocation unit 140, the mode selector 110, the I-mode MDC 120 and the P-mode MDC 130. A set of frame buffers 160 is also connected to the mode selector 110 for storing previously reconstructed frames from the P-mode MDC 130 and for providing macroblocks from the previously reconstructed frames back to the P-mode MDC 130 for use in encoding and decoding the subsequent macroblocks.




In an embodiment of the present invention, a block-based uni-directional motion estimation method is used, in which the prediction macroblock is determined from a previously decoded frame. Two types of information are coded: i) the error between the prediction macroblock and the actual macroblock, and ii) the motion vector, which describes the displacement between the spatial position of the current macroblock and the best matching macroblock. Both are coded into two descriptions. Because the decoder may have either both descriptions or only one of the two descriptions, the encoder must take this into account in coding the prediction error. The proposed framework for realizing MDC in the P-mode is described in more detail below.




Note that the use of I-mode coding enables the system to recover from an accumulated error due to the mismatch between the reference frames used in the encoder for prediction and those available at the decoder. The extra number of bits used for coding in the I-mode, compared to using the P-mode, is a form of redundancy that is intentionally introduced by the coder to improve the reconstruction quality when only a single description is available at the decoder. In conventional block-based video coders, such as an H.263 coder, described in ITU-T, “Recommendation H.263 Video Coding for Low Bitrate Communication,” July 1995, the choice between I-mode and P-mode depends on which mode uses fewer bits to produce the same image reconstruction quality. For error-resilience purposes, I-mode macroblocks are also inserted periodically, but very sparsely; for example, in accordance with an embodiment of the present invention, one I-mode macroblock is inserted after approximately ten to fifteen P-mode macroblocks. The rate at which the I-mode macroblocks are inserted is highly dependent on the video being encoded and, therefore, is variably controlled by the redundancy allocation unit 140 for each video input stream. In applications requiring a constant output rate, the rate control component 150 regulates the total number of bits that can be used on a frame-by-frame basis. As a result, the rate control component 150 influences the choice between the I-mode and the P-mode. In an embodiment of the present invention, the proposed switching between I-mode and P-mode depends not only on the target bit rate and coding efficiency but also on the desired redundancy. Because of this redundancy dependence, the redundancy allocation unit 140, together with the rate control unit 150, determines: i) on the global level, the redundancy allocation between I-mode and P-mode; and ii) for every macroblock, which mode to use.
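The mode decision described above can be pictured with a small, hypothetical cost function. The actual rule used by the redundancy allocation unit 140 and the rate control unit 150 is not specified here, so the weighting below is purely illustrative:

```python
def choose_mode(i_bits, p_bits, i_single_dist, p_single_dist, rho):
    """Illustrative I/P-mode decision.

    i_bits, p_bits:  bits needed to code the macroblock in each mode
    *_single_dist:   expected distortion when only one description arrives
    rho:             weight on single-description quality; rho = 0 reduces
                     to the conventional fewest-bits rule (assumption)
    """
    i_cost = i_bits + rho * i_single_dist
    p_cost = p_bits + rho * p_single_dist
    return "I" if i_cost <= p_cost else "P"
```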




P-mode Coding. In general, the MDC coder in the P-mode will generate two descriptions of the motion information and two descriptions of the prediction error. A general framework for implementing MDC in the P-mode is shown in FIG. 2. In FIG. 2, the encoder has three separate frame buffers (“FB”), FB0 270, FB1 280 and FB2 290, for storing previously reconstructed frames from both descriptions (ψ0,k−m), from description one (ψ1,k−m), and from description two (ψ2,k−m), respectively. Here, k represents the current frame time and k−m, m=1, 2, . . . , k, the previous frames up to frame 0. In this embodiment, prediction from more than one of the previously coded frames is permitted. In FIG. 2, a Multiple Description Motion Estimation and Coding (“MDMEC”) unit 210 receives as input the macroblock X 100 to be coded at frame k. The MDMEC 210 is connected to the three frame buffers FB0 270, FB1 280 and FB2 290, and the MDMEC 210 receives macroblocks from the previously reconstructed frames stored in each frame buffer. In addition, the MDMEC 210 is connected to a redundancy allocation unit 260, which provides a motion and prediction error redundancy allocation that the MDMEC 210 uses to generate and output two coded descriptions of the motion information, m̃1 and m̃2. The MDMEC 210 is also connected to a first Motion Compensated Predictor 0 (“MCP0”) 240, a second Motion Compensated Predictor 1 (“MCP1”) 220 and a third Motion Compensated Predictor 2 (“MCP2”) 230. The two coded descriptions of the motion information, m̃1 and m̃2, are transmitted to the MCP0 240, which generates and outputs a predicted macroblock P0 based on m̃1, m̃2 and macroblocks from the previously reconstructed frames from the descriptions ψi,k−m, where i=0,1,2, which are provided by frame buffers FB0 270, FB1 280 and FB2 290. Similarly, MCP1 220 generates and outputs a predicted macroblock P1 based on m̃1 from the MDMEC 210 and a macroblock from the previously reconstructed frame from description one (ψ1,k−m) from FB1 280. Likewise, MCP2 230 generates and outputs a predicted macroblock P2 based on m̃2 from the MDMEC 210 and a macroblock from the previously reconstructed frame from description two (ψ2,k−m) from FB2 290. In this general framework, MCP0 240 can make use of ψ1,k−m and ψ2,k−m in addition to ψ0,k−m. MCP0 240, MCP1 220 and MCP2 230 are each connected to a multiple description coding of prediction error (“MDCPE”) unit 250 and provide predicted macroblocks P0, P1 and P2, respectively, to the MDCPE 250. The MDCPE 250 is also connected to the redundancy allocation unit 260 and receives as input the motion and prediction error redundancy allocation. In addition, the MDCPE 250 also receives the original input macroblock X 100. The MDCPE 250 generates two coded descriptions of the prediction error, Ẽ1 and Ẽ2, based on input macroblock X 100, P0, P1, P2 and the motion and prediction error redundancy allocation. Description one 132, in FIG. 1, of the coded video consists of m̃1 and Ẽ1 for all the macroblocks. Likewise, description two 134, in FIG. 1, consists of m̃2 and Ẽ2 for all the macroblocks. Exemplary embodiments of the MDMEC 210 and MDCPE 250 are described in the following sections.
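In code form, the predictions produced by MCP0 240, MCP1 220 and MCP2 230 give rise to the prediction errors consumed by the MDCPE 250; a minimal sketch (NumPy arrays, hypothetical function name):

```python
def prediction_errors(x, p0, p1, p2):
    """Return (E0, E1, E2) with E_i = X - P_i, as defined above.

    E0 (the error against the both-description prediction, called F in
    FIG. 3A) is coded by the EMDC encoder; E1 and E2 feed the side
    encoders ENC1 and ENC2 described in the following sections.
    """
    return x - p0, x - p1, x - p2
```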




Multiple Description Coding of Prediction Error (MDCPE)




The general framework of a MDCPE encoder implementation is shown in FIG. 3A. First, the prediction error in the case when both descriptions are available, F=X−P0, is coded into two descriptions, F̃1 and F̃2. In FIG. 3A, predicted macroblock P0 is subtracted from input macroblock X 100 in an adder 306 and the both-description prediction error F is input to an Error Multiple Description Coding (“EMDC”) Encoder 330. The encoding is accomplished in the EMDC Encoder 330 using, for example, MDTC or MDSQ. To deal with the case when only the i-th description is received (that is, where i=1 or 2), an encoder unit one (“ENC1”) 320 or an encoder unit two (“ENC2”) 310 takes the pre-run-length-coded coefficients, ΔC̃n or ΔD̃n, respectively, and a description i side prediction error Ei, where Ei=X−Pi, and produces a description i enhancement stream G̃i. G̃i together with F̃i form description i. Embodiments of the encoders ENC1 320 and ENC2 310 are described in reference to FIGS. 3A, 4, 5, 6 and 7. As shown in FIG. 3A, P2 is subtracted from input macroblock X 100 by an adder 304 and a description two side prediction error E2 is output. E2 and ΔD̃n are then input to ENC2 310 and a description two enhancement stream G̃2 is output. Similarly, P1 is subtracted from input macroblock X 100 in an adder 302 and a description one side prediction error E1 is output. E1 and ΔC̃n are then input to ENC1 320 and a description one enhancement stream G̃1 322 is output. In an alternate embodiment (not shown), ΔC̃n and ΔD̃n are determined from F̃1 and F̃2 by branching both of the F̃1 and F̃2 output channels to connect with ENC1 320 and ENC2 310, respectively. Before the branches connect to ENC1 320 and ENC2 310, they each pass through separate run length decoder units to produce ΔC̃n and ΔD̃n, respectively. As will be seen in the description referring to FIG. 4, this alternate embodiment requires two additional run length decoders to decode F̃1 and F̃2 to obtain ΔC̃n and ΔD̃n, which had just been encoded into F̃1 and F̃2 in the EMDC encoder 330.
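A structural sketch of the FIG. 3A data flow; `emdc_encode`, `enc1` and `enc2` are placeholder callables standing in for the EMDC Encoder 330, ENC1 320 and ENC2 310, not the patent's exact interfaces:

```python
def mdcpe_encode(x, p0, p1, p2, emdc_encode, enc1, enc2):
    """Sketch of FIG. 3A: code F = X - P0 into two descriptions, then add
    per-description enhancement streams for the single-description case."""
    f = x - p0                            # both-description prediction error F
    f1, f2, dc_n, dd_n = emdc_encode(f)   # F~1, F~2 plus the pre-run-length
                                          # coefficient sets (delta C, delta D)
    g1 = enc1(x - p1, dc_n)               # description one enhancement G~1
    g2 = enc2(x - p2, dd_n)               # description two enhancement G~2
    return (f1, g1), (f2, g2)             # description one, description two
```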




In the decoder, shown in FIG. 3B, if both descriptions, that is, F̃1 and F̃2, are available, an EMDC decoder unit 360 generates F̂0 from inputs F̃1 and F̃2, where F̂0 represents the reconstructed F from both F̃1 and F̃2. F̂0 is then added to P0 in an adder 363 to generate a both-description recovered macroblock X̂0, defined as X̂0=P0+F̂0. When both descriptions are available, enhancement streams G̃1 and G̃2 are not used. When only description one is received, a first side decoder (“DEC1”) 370 produces Ê1 from inputs ΔC̃n and G̃1, and then Ê1 is added to P1 in an adder 373 to generate a description one recovered macroblock X̂1. The description one recovered macroblock is defined as X̂1=P1+Ê1. When only description two is received, a second side decoder (“DEC2”) 380 produces Ê2 from inputs ΔD̃n and G̃2, and then Ê2 is added to P2 in an adder 383 to generate a description two recovered macroblock X̂2. The description two recovered macroblock, X̂2, is defined as X̂2=P2+Ê2. Embodiments of the decoders DEC1 370 and DEC2 380 are described in reference to FIGS. 3B, 4, 5, 6 and 7. As with the encoder in FIG. 3A, in an alternate embodiment of the decoder (not shown), ΔC̃n and ΔD̃n are determined from F̃1 and F̃2 by branching both of the F̃1 and F̃2 input channels to connect with DEC1 370 and DEC2 380, respectively. Before the branches connect to DEC1 370 and DEC2 380, they each pass through separate run length decoder units (not shown) to produce ΔC̃n and ΔD̃n, respectively. As with the alternate embodiment for the encoder described above, this decoder alternative embodiment requires additional run length decoder hardware to extract ΔC̃n and ΔD̃n from F̃1 and F̃2 just before F̃1 and F̃2 are decoded in the EMDC decoder 360.
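The decoder dispatch of FIG. 3B, sketched under the same assumptions (placeholder callables; `None` marks a lost channel):

```python
def mdcpe_decode(desc1, desc2, p0, p1, p2, emdc_decode, dec1, dec2):
    """Reconstruct a macroblock from whichever descriptions arrived."""
    if desc1 is not None and desc2 is not None:
        f0_hat = emdc_decode(desc1[0], desc2[0])  # G~1, G~2 are unused here
        return p0 + f0_hat                        # X^0 = P0 + F^0
    if desc1 is not None:
        f1_tilde, g1_tilde = desc1
        return p1 + dec1(f1_tilde, g1_tilde)      # X^1 = P1 + E^1
    if desc2 is not None:
        f2_tilde, g2_tilde = desc2
        return p2 + dec2(f2_tilde, g2_tilde)      # X^2 = P2 + E^2
    return None                                   # nothing received
```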




Note that in this framework, the bits used for Gi, i=1,2, are purely redundancy bits, because they do not contribute to the reconstruction quality when both descriptions are received. This portion of the total redundancy, denoted by ρe,2, can be controlled directly by varying the quantization accuracy when generating Gi. The other portion of the total redundancy, denoted by ρe,1, is introduced when coding F using the MDTC coder. Using the MDTC coder enables this redundancy to be controlled easily by varying the transform parameters. The redundancy allocation unit 260 manages the redundancy allocation between ρe,2 and ρe,1 for a given total redundancy in coding the prediction errors.




Based on this framework, alternate embodiments have been developed, which differ in the operations of ENC1 320/DEC1 370 and ENC2 310/DEC2 380. While the same type of EMDC encoder 330 and EMDC decoder 360 described in FIGS. 3A and 3B are used, the way in which G̃i is generated by ENC1 320 and ENC2 310 is different in each of the alternate embodiments. These alternate embodiments are described below in reference to FIGS. 4, 5 and 6.




Implementation of the EMDC ENC1 and ENC2 Encoders





FIG. 4 provides a block diagram of an embodiment of multiple description coding of prediction error in the present invention. In FIG. 4, an MDTC coder is used to implement the EMDC encoder 330 in FIG. 3A. In FIG. 4, for each 8×8 block, the corresponding 8×8 block of P0 is subtracted from the corresponding 8×8 block of input macroblock X 100 in an adder 306 to produce the central prediction error E0, and then E0 is input to the DCT unit 425, which performs the DCT and outputs N≤64 DCT coefficients. A pairing unit 430 receives the N≤64 DCT coefficients from the DCT unit 425 and organizes the DCT coefficients into N/2 pairs (Ãn, B̃n) using a fixed pairing scheme for all frames. The N/2 pairs are then input, together with a rate-controlling input from a rate and redundancy allocation unit 420, to a first quantizer one (“Q1”) unit 435 and a second Q1 unit 440. The Q1 units 435 and 440, in combination, produce quantized pairs (ΔÃn, ΔB̃n). It should be noted that both N and the pairing strategy are determined based on the statistics of the DCT coefficients, and the k-th largest coefficient is paired with the (N−k)-th largest coefficient. Each quantized pair (ΔÃn, ΔB̃n) is then input, together with a transform parameter βn from the rate and redundancy allocation unit 420 which controls a first part of the redundancy, to a Pairwise Correlating Transform (“PCT”) unit 445 to produce the coefficients (ΔC̃n, ΔD̃n), which are then split into two sets. The unpaired coefficients are split even/odd and appended to the PCT coefficients. The coefficients in each set, ΔC̃n and ΔD̃n, are then run length and Huffman coded in run length coding units 450 and 455, respectively, to produce F̃1 and F̃2. Thus, F̃1 contains ΔC̃n in coded run length representation, and F̃2 contains ΔD̃n in coded run length representation. In the following, three different embodiments for obtaining G̃1 from FIG. 3A are described. For ease of description, in the descriptions related to the detailed operation of the ENC1 320 and ENC2 310 in FIGS. 4, 5 and 6, components in ENC2 310 which are analogous to components in ENC1 320 are denoted as primes. For example, in FIG. 4, ENC1 320 has a DCT component 405 for calculating G̃1 and ENC2 310 has an analogous DCT component 405′ for calculating G̃2.
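A sketch of the pairing and PCT steps just described. The rotation-style transform below is one common parameterization from the MDTC literature and is an assumption; the patent's exact matrix for βn is not reproduced here. The even/odd split of the unpaired coefficients follows the text:

```python
import numpy as np

def pct_pair(a, b, beta):
    """One possible pairwise correlating transform of a coefficient pair.

    beta controls the correlation (redundancy) between the two outputs;
    this rotation form is an assumption, not the patent's exact matrix.
    """
    c = np.cos(beta) * a + np.sin(beta) * b
    d = np.sin(beta) * a - np.cos(beta) * b
    return c, d

def split_descriptions(coeffs, n, beta):
    """Pair the first n coefficients (k-th with (n-1-k)-th, 0-based),
    apply the PCT, and split the rest even/odd as described above."""
    c_set, d_set = [], []
    for k in range(n // 2):
        c, d = pct_pair(coeffs[k], coeffs[n - 1 - k], beta)
        c_set.append(c)
        d_set.append(d)
    c_set.extend(coeffs[n::2])      # remaining even-indexed coefficients
    d_set.extend(coeffs[n + 1::2])  # remaining odd-indexed coefficients
    return c_set, d_set
```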




In accordance with an embodiment of the present invention, shown in FIG. 4, the central prediction error reconstruction F̂1 is obtained from ΔC̃n, and ΔC̃n is also used to generate G̃1. To generate G̃1, ΔC̃n from PCT unit 445 is input to an inverse quantizer (“Q1⁻¹”) 460 and dequantized C coefficients ΔĈn are output. A linear estimator 465 receives the ΔĈn and outputs estimated DCT coefficients ΔÂn¹ and ΔB̂n¹, which are then input to inverse pairing unit 470. The inverse pairing unit 470 converts the N/2 pairs into DCT coefficients and outputs the DCT coefficients to an inverse DCT unit 475, which outputs F̂1 to an adder 403. P1 is subtracted from each corresponding 8×8 block from input macroblock X 100 in the adder 302 and the adder 302 outputs E1 to the adder 403. F̂1 is subtracted from E1 in the adder 403 and G1 is output. In the absence of any additional information, the reconstruction from description one alone would be P1+F̂1. To allow for a more accurate reconstruction, G1 is defined as G1=X−P1−F̂1, and G1 is coded into G̃1 using conventional DCT coding. That is, G1 is DCT transformed in a DCT coder 405 to produce DCT coefficients for G1. The DCT coefficients are then input to a quantizer two (“Q2”) 410 and quantized in Q2 410 with an input, which controls a second part of the redundancy, from the rate and redundancy unit 420, and the quantized coefficients are output from Q2 410 to a run length coding unit 415. The quantized coefficients are then run length coded in run length coding unit 415 to produce the description one enhancement stream G̃1.
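The mismatch coding of FIG. 4 in miniature; `dctn` is SciPy's multidimensional DCT, and the uniform quantizer is a stand-in for Q2 (an assumption, since the exact quantizer is not specified):

```python
import numpy as np
from scipy.fft import dctn

def enhancement_g1(x_blk, p1_blk, f1_hat_blk, q_step):
    """G1 = X - P1 - F^1, DCT-coded and quantized (FIG. 4 scheme).

    q_step plays the role of the Q2 parameter that sets the second
    part of the redundancy, rho_e2.
    """
    g1 = x_blk - p1_blk - f1_hat_blk
    coeffs = dctn(g1, norm="ortho")               # 8x8 DCT of the mismatch
    return np.round(coeffs / q_step).astype(int)  # run-length coding follows
```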




Also shown in FIG. 4, the central prediction error reconstruction F̂2 is obtained from ΔD̃n, and ΔD̃n is also used to generate G̃2. To generate G̃2, ΔD̃n from PCT unit 445′ is input to Q1⁻¹ 460′ and dequantized D coefficients ΔD̂n are output. A linear estimator 465′ receives the ΔD̂n and outputs estimated DCT coefficients ΔÂn² and ΔB̂n², which are then input to inverse pairing unit 470′. The inverse pairing unit 470′ converts the N/2 pairs into DCT coefficients and outputs the DCT coefficients to an inverse DCT unit 475′, which outputs F̂2 to an adder 403′. P2 is subtracted from each corresponding 8×8 block from input macroblock X 100 in the adder 304 and the adder 304 outputs E2 to the adder 403′. F̂2 is subtracted from E2 in the adder 403′ and G2 is output. In the absence of any additional information, the reconstruction from description two alone would be P2+F̂2. To allow for a more accurate reconstruction, G2 is defined as G2=X−P2−F̂2, and G2 is coded into G̃2 using conventional DCT coding. That is, G2 is DCT transformed in a DCT coder 405′ to produce DCT coefficients for G2. The DCT coefficients are then input to Q2 410′, quantized in Q2 410′ with an input from the rate and redundancy unit 420, and the quantized coefficients are output from Q2 410′ to a run length coding unit 415′. The quantized coefficients are then run length coded in run length coding unit 415′ to produce the description two enhancement stream G̃2.




In accordance with the current embodiment of the present invention, the EMDC decoder 360 in FIG. 3B is implemented as an inverse circuit of the EMDC encoder 330 described in FIG. 4. With the exception of the rate and redundancy unit 420, all of the other components described have analogous inverse components implemented in the decoder. For example, in the EMDC decoder, if only description one is received, the same operation as described above for the encoder is used to generate F̂1 from ΔC̃n. In addition, by inverse quantization and inverse DCT, the quantized version of G1, denoted by Ĝ1, is recovered from G̃1. The finally recovered macroblock in this side decoder is X̂1, which is defined as X̂1=P1+F̂1+Ĝ1.




In the embodiment of FIG. 4, more than 64 coefficients need to be coded in the EMDC 330 and ENC1 320 together. While the use of the 64 coefficients completely codes the mismatch error, G1, subject to quantization errors, it requires too many bits. Therefore, in accordance with another embodiment of the present invention, only 32 coefficients are coded when generating G̃1, by only including the error for the D coefficients. Likewise, only 32 coefficients are coded when generating G̃2, by only including the error for the C coefficients. Specifically, as shown in FIG. 5, the DCT is applied to side prediction error E1 in the DCT coder 405, where E1=X−P1, and the same pairing scheme as in the central coder is applied to generate N pairs of DCT coefficients in pairing unit 510.




As in FIG. 4, in FIG. 5, an MDTC coder is used to implement the EMDC encoder 330. For each 8×8 block, the corresponding 8×8 block of P0 is subtracted from the corresponding 8×8 block of input macroblock X 100 in the adder 306 to produce the central prediction error E0, and then E0 is input to the DCT unit 425, which performs the DCT on E0 and outputs N≤64 DCT coefficients. In pairing unit 430, the coder takes the N≤64 DCT coefficients from the DCT unit 425 and organizes them into N/2 pairs (Ãn, B̃n) using a fixed pairing scheme for all frames. The N/2 pairs are then input, together with a rate-controlling input from the rate and redundancy allocation unit 420, to the Q1 quantizer units 435 and 440, which produce the quantized pairs (ΔÃn, ΔB̃n). It should be noted that both N and the pairing strategy are determined based on the statistics of the DCT coefficients, and the k-th largest coefficient is paired with the (N−k)-th largest coefficient. Each quantized pair (ΔÃn, ΔB̃n) is input, with an input from the rate and redundancy allocation unit 420, to a PCT unit 445 with the transform parameter βn to produce the coefficients (ΔC̃n, ΔD̃n), which are then split into two sets. The unpaired coefficients are split even/odd and appended to the PCT coefficients.




In accordance with an embodiment of the present invention, shown in FIG. 5, an estimate of the central prediction error is reconstructed from ΔC̃n, and ΔC̃n is also used to generate G̃1. To generate G̃1, ΔC̃n from PCT unit 445 is input to Q1⁻¹ 460 and dequantized C coefficients ΔĈn are output to a linear estimator 530. The linear estimator 530 receives the ΔĈn and outputs an estimated DCT coefficient D̂n¹, which is input to an adder 520. P1 is subtracted from each corresponding 8×8 block from input macroblock X 100 in the adder 302 to produce side prediction error E1, which is then input to conventional DCT coder 405, where the DCT is applied to E1. The output of the DCT coder 405 is input to pairing unit 510, and the same pairing scheme as described above for pairing unit 430 is applied to generate N pairs of DCT coefficients. The N pairs of DCT coefficients are then input to a PCT unit 515 with transform parameter βn, which generates only the D component, Dn¹. Then, Dn¹ is input to the adder 520, D̂n¹ is subtracted from Dn¹, and an error Cn is output. The error Cn, which is defined as Cn=Dn¹−D̂n¹, is input, with an input from the rate and redundancy allocation unit 420, to Q2 525 and quantized to produce a quantized error, Ĉn. The ΔC̃n coefficients from the central PCT unit 445 and the quantized error Ĉn are then together subjected to run-length coding in run length coding unit 450 to produce a resulting bitstream {F̃1, G̃1}, which constitutes F̃1 and G̃1 from FIG. 3A.




Likewise, an estimate of the central prediction error is reconstructed from ΔD̃n, and ΔD̃n is also used to generate G̃2. To generate G̃2, ΔD̃n from PCT unit 445′ is input to Q1⁻¹ 460′ and dequantized D coefficients ΔD̂n are output to a linear estimator 530′. The linear estimator 530′ receives the ΔD̂n and outputs an estimated DCT coefficient Ĉn¹, which is input to an adder 520′. P2 is subtracted from each corresponding 8×8 block from input macroblock X 100 in the adder 304 to produce side prediction error E2, which is then input to conventional DCT coder 405′, where the DCT is applied to E2. The output of the DCT coder 405′ is input to pairing unit 510′, and the same pairing scheme as described above for pairing unit 430 is applied to generate N pairs of DCT coefficients. The N pairs of DCT coefficients are then input to a PCT unit 515′ with transform parameter βn, which generates only the C component, Cn¹. Then, Cn¹ is input to the adder 520′, Ĉn¹ is subtracted from Cn¹, and an error Dn is output. The error Dn, which is defined as Dn=Cn¹−Ĉn¹, is input, with an input from the rate and redundancy allocation unit 420, to Q2 525′ and quantized to produce a quantized error, D̂n. The ΔD̃n coefficients from the central PCT unit 445′ and the quantized error D̂n are then together subjected to run-length coding in run length coding unit 450′ to produce a resulting bitstream {F̃2, G̃2}, which constitutes F̃2 and G̃2 from FIG. 3A.




In accordance with the current embodiment of the present invention, the DEC1 370 from FIG. 3B is implemented as an inverse circuit of the ENC1 320 described in FIG. 5. With the exception of the rate and redundancy unit 420, all of the other components described have analogous inverse components implemented in the decoder. For example, in the DEC1 370, if only description one is received, the received data includes, after run length decoding and dequantization, the C coefficients ΔĈn and the quantized estimation error Ĉn, and the PCT coefficients corresponding to the side prediction error can be estimated by Ĉn¹=ΔĈn and D̂n¹=D̂n¹(ΔĈn)+Ĉn, where D̂n¹(ΔĈn) is the output of the linear estimator 530. Then the inverse PCT can be performed on Ĉn¹ and D̂n¹, followed by the inverse DCT, to arrive at the quantized prediction error Ê1. The finally recovered macroblock, X̂1, is reconstructed by adding P1 and Ê1 together, such that X̂1=P1+Ê1.
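DEC1's estimation step for the FIG. 5 scheme, written out; `estimate_d` stands for the linear estimator 530, whose coefficients (derived from the pair statistics) are not given here:

```python
def side_estimate_pair(c_deq, c_err_deq, estimate_d):
    """FIG. 5 side decoder: the C coefficient is taken as received, and
    the D coefficient is the linear estimate from C plus the coded
    estimation error recovered from G~1."""
    c1 = c_deq
    d1 = estimate_d(c_deq) + c_err_deq
    return c1, d1  # inverse PCT and inverse DCT then yield E^1
```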




In another embodiment of the present invention, the strategy is to ignore the error in the side predictor and use some additional redundancy to improve the reconstruction accuracy for the Dn coefficients in the central predictor. This is accomplished by quantizing and coding the estimation error Cn=ΔD̂n−D̂n(ΔĈn), as shown in FIG. 6. This scheme is the same as the generalized PCT, where four variables are used to represent the initial pair of two coefficients.
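The extra redundancy of the FIG. 6 scheme is the gap between the dequantized D coefficient and its estimate from the C coefficient; a sketch, with a scalar quantizer again standing in for Q2 (assumption):

```python
def fig6_extra_redundancy(d_deq, c_deq, estimate_d, q2_step):
    """Quantize C_n = (dequantized D) - (estimate of D from C), the
    estimation error coded as additional redundancy in FIG. 6."""
    c_err = d_deq - estimate_d(c_deq)
    return round(c_err / q2_step)  # run-length coded together with the C set
```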




As in the previously described embodiments, in FIG. 6, an MDTC coder is used to implement the EMDC encoder 330. For each 8×8 block, the corresponding 8×8 block of P0 is subtracted from the corresponding 8×8 block of input macroblock X 100 in the adder 306 to produce the central prediction error E0, and then E0 is input to the DCT unit 425, which performs the DCT on E0 and outputs N≤64 DCT coefficients. A pairing unit 430 receives the N≤64 DCT coefficients from the DCT unit 425 and organizes them into N/2 pairs (Ãn, B̃n) using a fixed pairing scheme for all frames. The N/2 pairs are then input, together with a rate-controlling input from the rate and redundancy allocation unit 420, to the Q1 quantizer units 435 and 440, which produce the quantized pairs (ΔÃn, ΔB̃n). It should be noted that both N and the pairing strategy are determined based on the statistics of the DCT coefficients, and the k-th largest coefficient is paired with the (N−k)-th largest coefficient. Each quantized pair (ΔÃn, ΔB̃n) is input, with an input from the rate and redundancy allocation unit 420, to the PCT unit 445 with the transform parameter βn to produce the PCT coefficients (ΔC̃n, ΔD̃n), which are then split into two sets. The unpaired coefficients are split even/odd and appended to the PCT coefficients.




In accordance with an embodiment of the present invention, shown in FIG. 6, ΔC̃n is input to inverse quantizer Q1⁻¹ 460 and dequantized C coefficients ΔĈn are output to a linear estimator 610. The linear estimator 610 is applied to ΔĈn to produce an estimated DCT coefficient D̂n, which is output to an adder 630. Similarly, ΔD̃n is input to a second inverse quantizer Q1⁻¹ 620 and dequantized D coefficients ΔD̂n are also output to the adder 630. Then, D̂n is subtracted from ΔD̂n in the adder 630 and the error Cn is output. The error Cn=ΔD̂n−D̂n(ΔĈn) is input, with an input from the rate and redundancy allocation unit 420, to quantizer Q2 640 and quantized to produce the quantized error Ĉn. The ΔC̃n coefficients and the quantized error Ĉn are then together subjected to run-length coding in run length coding unit 650 to produce the resulting bitstream {F̃1, G̃1}, which constitutes F̃1 and G̃1 from FIG. 3A.




Similarly, in FIG. 6, ΔD̃n is input to inverse quantizer Q1⁻¹ 460′ and dequantized D coefficients ΔD̂n are output to a linear estimator 610′. The linear estimator 610′ is applied to ΔD̂n to produce an estimated DCT coefficient Ĉn, which is output to an adder 630′. Similarly, ΔC̃n is input to a second inverse quantizer Q1⁻¹ 620′ and dequantized C coefficients ΔĈn are also output to the adder 630′. Then, Ĉn is subtracted from ΔĈn in the adder 630′ and the error Dn is output. The error Dn is input, with an input from the rate and redundancy allocation unit 420, to quantizer Q2 640′ and quantized to produce the quantized error D̂n. The ΔD̃n coefficients and the quantized error D̂n are then together subjected to run-length coding in run length coding unit 650′ to produce the resulting bitstream {F̃2, G̃2}, which constitutes F̃2 and G̃2 from FIG. 3A.




In accordance with the current embodiment of the present invention, the DEC2 decoder 380 from FIG. 3B is implemented as an inverse circuit of the ENC2 encoder 310 described in FIG. 6. With the exception of the rate and redundancy unit 420, all of the other components described have analogous inverse components implemented in the decoder. The DEC2 decoder 380 operation is the same as in the DEC1 decoder 370 embodiment: the recovered prediction error is actually a quantized version of F, so that X̂2=P2+F̂. Therefore, in this implementation, the mismatch between P0 and the side predictions P1 and P2 is left as is and allowed to accumulate over time in successive P-frames. However, the effect of this mismatch is eliminated upon each new I-frame.




In all of the above embodiments, the quantization parameter in Q1 controls the rate, the transform parameters βn control the first part of the redundancy, ρe,1, and the quantization parameter in Q2 controls the second part of the redundancy, ρe,2. In each embodiment, these parameters are controlled by the rate and redundancy allocation component 420. This allocation is performed based on a theoretical analysis of the trade-off between rate, redundancy, and distortion associated with each implementation. In addition to the redundancy allocation between ρe,1 and ρe,2 for a given P-frame, the total redundancy, ρ, must be allocated among successive frames. This is accomplished by treating coefficients from different frames as different coefficient pairs.




Multiple Description Motion Estimation and Coding (MDMEC)




In accordance with an embodiment of the present invention, illustrated in FIG. 7, in a motion estimation component 710, conventional motion estimation is performed to find the best motion vector for each input macroblock X 100. In an alternate embodiment (not shown), a simplified method for performing motion estimation is used in which the motion vectors from the input macroblock X 100 are duplicated on both channels. FIG. 8 shows an arrangement of odd and even macroblocks within each digitized frame in accordance with an embodiment of the present invention. Returning to FIG. 7, the motion estimation component 710 is connected to a video input unit (not shown) for receiving the input macroblocks and to FB0 270 (not shown) for receiving reconstructed macroblocks from previously reconstructed frames from both descriptions, ψ0,k−1. The motion estimation component 710 is also connected to a motion-encoder-1 730, an adder 715 and an adder 718. Motion-encoder-1 730 is connected to a motion-interpolator-1 725 and the motion-interpolator-1 725 is connected to the adder 715. The adder 715 is connected to a motion-encoder-3 720. Similarly, motion-encoder-2 735 is connected to a motion-interpolator-2 740 and the motion-interpolator-2 740 is connected to the adder 718. The adder 718 is connected to a motion-encoder-4 745.




In FIG. 7, the motion vectors for the even macroblocks output from the motion estimation unit 710, denoted by m1, are input to Motion-Encoder-1 730 and coded to yield coded bits m̃1,1 and reconstructed motions m̂1,1. The reconstructed motions, m̂1,1, are input to motion-interpolator-1 725, which interpolates the motions in odd macroblocks from the coded ones in even macroblocks and outputs m2,p to adder 715. In adder 715, m2,p is subtracted from m2 and m1,2 is output, where m2 was received from motion estimation unit 710. m1,2 is then input to motion-encoder-3 720 and m̃1,2 is output. Similarly, the motion vectors for the odd macroblocks, m2, are input to and coded by Motion-Encoder-2 735, and the coded bits and reconstructed motions, denoted by m̃2,1 and m̂2,1, respectively, are output. The reconstructed motions, m̂2,1, are input to motion-interpolator-2 740, which interpolates the motions in even macroblocks from the coded ones in odd macroblocks and outputs m1,p to adder 718. In adder 718, m1,p is subtracted from m1 and m2,2 is output, where m1 was received from motion estimation unit 710. m2,2 is then input to motion-encoder-4 745 and m̃2,2 is output.
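A sketch of the odd-from-even interpolation performed by motion-interpolator-1 725; averaging the horizontally adjacent coded neighbors is an assumption, since the patent does not fix the interpolation kernel:

```python
import numpy as np

def interpolate_odd_motions(even_mvs):
    """Predict odd-macroblock motion vectors from coded even ones.

    even_mvs: array of shape (num_even, 2) holding (mh, mv) pairs laid
    out left to right; each odd macroblock is predicted as the mean of
    its two even neighbors (simple kernel chosen for illustration).
    """
    left = even_mvs[:-1]
    right = even_mvs[1:]
    return (left + right) / 2.0  # m_{2,p}; residual m2 - m_{2,p} is coded
```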




For a lossless description of motion, all four of the encoders involved should be lossless. An encoder is “lossless” when the decoder can create an exact reconstruction of the encoded signal, and an encoder is “lossy” when the decoder cannot create an exact reconstruction of the encoded signal. In accordance with an embodiment of the present invention, lossless coding is used for m1 and m2, and lossy coding is used for m1,2 and m2,2.




The bits used for coding m1,2 and m2,2 are ignored when both descriptions are received and, therefore, are purely redundancy bits. This part of the redundancy for motion coding is denoted by ρm,2. The extra bits in independent coding of m1 and m2, compared to joint coding, contribute to the other portion of the redundancy, denoted by ρm,1.




In another embodiment of the present invention, conventional motion estimation is first performed to find the best motion vector for each macroblock. Then, the horizontal and vertical components of each motion vector are treated as two independent variables (a pre-whitening transform can be applied to reduce the correlation between the two components) and the generalized MDTC method is applied to each motion vector. Let mh and mv represent the horizontal and vertical components of a motion vector. Using a pairing transform, T, the transformed coefficients are obtained from Equation (1):

  [mc, md]ᵀ = T [mh, mv]ᵀ    (1)

Here, m̃i,1, i=1,2, represents the bits used to code mc and md, respectively, and m̃i,2, i=1,2, represents the bits used to code the estimation error for md from mc and the estimation error for mc from md, respectively. The transform parameters in T are controlled based on the desired redundancy.
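Equation (1) in code, with a rotation-style T whose angle sets the redundancy; the exact parameterization of T is not given in the text, so this choice is illustrative:

```python
import numpy as np

def motion_pct(mh, mv, theta):
    """Apply the pairing transform T of Equation (1) to one motion vector."""
    T = np.array([[np.cos(theta), np.sin(theta)],
                  [np.sin(theta), -np.cos(theta)]])
    mc, md = T @ np.array([mh, mv], dtype=float)
    return mc, md  # coded into description one and description two
```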




In another embodiment of the present invention (not shown), each horizontal or vertical motion component is quantized using MDSQ to produce two bit streams for all the motion vectors.




Application of MDTC to Block DCT Coding




The MDTC approach was originally developed and analyzed for an ordered set of N Gaussian variables with zero means and decreasing variances. When applying this approach to the DCT coefficients of a macroblock (either an original or a prediction error macroblock), which are not statistically stationary and are inherently two-dimensional, there are many possibilities in terms of how to select and order coefficients to pair. In the conventional run length coding approach for macroblock DCT coefficients, used in all of the current video coding standards, each element of the two-dimensional DCT coefficient array is first quantized using a predefined quantization matrix and a scaling parameter. The quantized coefficient indices are then converted into a one-dimensional array using a predefined ordering, for example, the zigzag order. For image macroblocks, consecutive high frequency DCT coefficients tend to be zero and, as a result, the run length coding method, which counts how many zeros occur before a non-zero coefficient, has been devised. A pair of symbols, consisting of a run length value and the non-zero value, is then entropy coded.
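The conventional run length step referred to above, as a sketch: quantized indices in zigzag order become (run-of-zeros, value) symbols for entropy coding.

```python
def run_length_symbols(indices):
    """Convert a 1-D array of quantized DCT indices (zigzag order) into
    (run, value) symbols; an end-of-block marker would follow in practice."""
    symbols, run = [], 0
    for v in indices:
        if v == 0:
            run += 1
        else:
            symbols.append((run, int(v)))
            run = 0
    return symbols
```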




In an embodiment of the present invention, to overcome the non-stationarity of the DCT coefficients as described above, each image is divided into macroblocks that are classified into a few classes so that the DCT coefficients in each class are approximately stationary. For each class, the variances of the DCT coefficients are collected, and based on the variances, the number of coefficients to pair, N, the pairing mechanism and the redundancy allocation are determined. These are determined based on a theoretical analysis of the redundancy-rate-distortion performance of MDTC. Specifically, the k-th largest coefficient in variance is always paired with the (N−k)-th largest, with a fixed transform parameter prescribed by the optimal redundancy allocation. The operation for macroblocks in each class is the same as that described above for the implementation of EMDC. A given macroblock is first transformed into DCT coefficients, quantized, and classified into one of the predefined classes. Then, depending on the determined class, the first N DCT coefficients are paired and transformed using the PCT, while the rest are split even/odd and appended to the PCT coefficients. The coefficients in each description (C coefficients and remaining even coefficients, or D coefficients and remaining odd coefficients) usually have many zeros. Therefore, the run length coding scheme is separately applied to the two coefficient streams.




In an alternative embodiment of the present invention (not shown), instead of using a fixed pairing scheme for each macroblock in the same class, which could end up pairing zero coefficients, a second option is to first determine the non-zero coefficients (after quantization) and then apply MDTC only to the non-zero coefficients. In this embodiment, both the locations and the values of the non-zero coefficients need to be specified in both descriptions. One implementation strategy is to duplicate the information characterizing the locations of the two coefficients in each pair in both descriptions, but split the two coefficient values using MDTC. A suitable pairing scheme is needed for the non-zero coefficients. An alternative implementation strategy is to duplicate some of the non-zero coefficients, while splitting the remaining ones in an even/odd manner.





FIG. 9 is a flow diagram representation of an embodiment of an encoder operation in accordance with the present invention. In FIG. 9, in block 905 a sequence of video frames is received and in block 910 the frame index value k is initialized to zero. In block 915 the next frame in the sequence of video frames is divided into a macroblock representation of the video frame. In an embodiment of the present invention, each macroblock is a 16×16 macroblock. Then, in block 920, for a first macroblock a decision is made on which mode will be used to code the macroblock. If the I-mode is selected in block 920, then in block 925 the 16×16 macroblock representation is divided into 8×8 blocks, and in block 930 the DCT is applied to each of the 8×8 blocks and the resulting DCT coefficients are passed to block 935. In an embodiment of the present invention, four 8×8 blocks are created to represent the luminance characteristics and two 8×8 blocks are created to represent the chrominance characteristics of the macroblock. In block 935, a four-variable transform is applied to the DCT coefficients to produce 128 coefficients, which, in block 940, are decomposed into two sets of 64 coefficients. The two sets of 64 coefficients are each run-length coded to form two separate descriptions in block 945. Then, in block 950, each description is output to one of two channels. In block 952, a check is made to determine whether there are any more macroblocks in the current video frame to be coded. If there are more macroblocks to be coded, the encoder returns to block 920 and continues with the next macroblock. If there are no more macroblocks to be coded in block 952, then in block 954 a check is made to determine whether there are any more frames to be coded; if there are no more frames to be coded in block 954, the encoder operation ends. If, in block 954, it is determined that there are more frames to be coded, then in block 955 the frame index k is incremented by 1 and operation returns to block 915 to begin coding the next video frame.
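The I-mode path of blocks 925 through 945 can be sketched as follows for the four luminance blocks. This is an illustration rather than the encoder itself: quantization and run-length coding (block 945) are omitted, pct4 is a hypothetical 4×4 pairing-transform matrix whose entries would come from the redundancy allocation, and routing two of the four transform outputs per frequency position to each description is one assumed reading of the 128-coefficient decomposition described above.

    import numpy as np
    from scipy.fft import dctn

    def encode_i_mode(macroblock, pct4):
        # macroblock: 16x16 luminance array.
        blocks = [macroblock[r:r + 8, c:c + 8]           # block 925
                  for r in (0, 8) for c in (0, 8)]
        coeffs = [dctn(b, norm='ortho').ravel() for b in blocks]  # block 930
        desc1, desc2 = [], []
        for quad in zip(*coeffs):                # 4 co-located DCT coefficients
            out = pct4 @ np.asarray(quad)        # four-variable transform (block 935)
            desc1.extend(out[:2])                # two outputs per description
            desc2.extend(out[2:])                # (block 940)
        return np.array(desc1), np.array(desc2)

With an orthonormal pct4 (for example a 4×4 rotation matrix), the two descriptions carry balanced halves of the transformed coefficients.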




If, in block 920, the P-mode is selected, then, in block 960, the three best prediction macroblocks, with their corresponding motion vectors and prediction errors, are determined using the reconstructed previous frame from both descriptions together with zero, one, or two of the reconstructed previous frames from description one and description two. Then, in block 965, for the three best macroblocks a decision is made on which mode will be used to code the macroblocks. If the I-mode is selected in block 965, the macroblocks are coded using the method described above for blocks 925 through 955. If the P-mode is selected in block 965, then in block 970 each of the three prediction error macroblocks is divided into a set of 8×8 blocks. In block 975, the DCT is applied to each of the three sets of 8×8 blocks to produce three sets of DCT coefficients for each block, and then, in block 980, a four-variable pairing transform is applied to each of the three sets of DCT coefficients for each block to produce three sets of 128 coefficients. Each of the three sets of 128 coefficients from block 980 is decomposed into two sets of 64 coefficients in block 985 and the results are provided to block 990. In block 990, up to two motion vectors and each of the two sets of 64 coefficients are encoded using run-length coding to form two descriptions. Then, in block 950, each description is output to one of two channels. In block 952, a check is made to determine whether there are any more macroblocks in the current video frame to be coded. If there are more macroblocks to be coded, the encoder returns to block 920 and continues with the next macroblock. If there are no more macroblocks to be coded in block 952, then in block 954 a check is made to determine whether there are any more frames to be coded; if there are no more frames to be coded in block 954, the encoder operation ends. If, in block 954, it is determined that there are more frames to be coded, then in block 955 the frame index k is incremented by 1 and operation returns to block 915 to begin coding the next video frame.
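A minimal sketch of the prediction step in block 960 follows. The exhaustive SAD search, the ±7 pixel range, and the names rec_both, rec_one, and rec_two for the three reference reconstructions are assumptions made for the example; the text does not prescribe a particular motion search.

    import numpy as np

    def motion_search(target, ref, top, left, search=7):
        # Exhaustive search minimizing the sum of absolute differences
        # (SAD) for one 16x16 macroblock against one reference frame.
        best = (0, 0, np.inf)
        rows, cols = ref.shape
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                r, c = top + dy, left + dx
                if 0 <= r <= rows - 16 and 0 <= c <= cols - 16:
                    sad = np.abs(target - ref[r:r + 16, c:c + 16]).sum()
                    if sad < best[2]:
                        best = (dy, dx, sad)
        return best

    def three_predictions(target, top, left, rec_both, rec_one, rec_two):
        # Block 960: one prediction from the previous frame reconstructed
        # from both descriptions, and one from each single-description
        # reconstruction; returns (motion vector, prediction error) pairs.
        results = []
        for ref in (rec_both, rec_one, rec_two):
            dy, dx, _ = motion_search(target, ref, top, left)
            pred = ref[top + dy:top + dy + 16, left + dx:left + dx + 16]
            results.append(((dy, dx), target - pred))
        return results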





FIG. 10 is a flow diagram representation of the operations performed by a decoder when the decoder is receiving both descriptions, in accordance with an embodiment of the present invention. In FIG. 10, in block 1005 the frame index k is initialized to zero. Then, in block 1010, the decoder receives bitstreams from both channels and in block 1015 the bitstreams are decoded to the macroblock level for each frame in the bitstreams. In block 1020, the mode to be used for a decoded macroblock is determined. If, in block 1020, the mode to be used for the macroblock is determined to be the I-mode, then in block 1025 the macroblock is decoded to the block level. In block 1030, each block from the macroblock is decoded into two sets of 64 coefficients, and in block 1035 an inverse four-variable pairing transform is applied to each of the two sets of 64 coefficients to produce the DCT coefficients for each block. In block 1040, an inverse 8×8 DCT is applied to the DCT coefficients for each block to produce four 8×8 blocks. Then, in block 1045, the four 8×8 blocks are assembled into one 16×16 macroblock.

If, in block 1020, the mode to be used for the macroblock is determined to be the P-mode, then in block 1065 the motion vectors are decoded and a prediction macroblock is formed from a reconstructed previous frame from both descriptions. In block 1070 the prediction error macroblock is decoded to the block level. Then, in block 1075, each block from the prediction error macroblock is decoded into two sets of 64 coefficients, and in block 1080 an inverse four-variable pairing transform is applied to each of the two sets of coefficients to produce the DCT coefficients for each block. In block 1085, an inverse 8×8 DCT is applied to the DCT coefficients for each block to produce four 8×8 blocks. Then, in block 1090, the four 8×8 blocks are assembled into one 16×16 macroblock, and in block 1095 the 16×16 macroblock from block 1090 is added to the prediction macroblock which was formed in block 1065.

Regardless of whether I-mode or P-mode decoding is used, after either block 1045 or block 1095, in block 1050 the macroblocks from block 1045 and block 1095 are assembled into a frame. Then, in block 1052, a check is made to determine whether there are any more macroblocks in the current video frame to be decoded. If there are more macroblocks to be decoded, the decoder returns to block 1020 and continues with the next macroblock. If there are no more macroblocks to be decoded in block 1052, then in block 1055 the frame is sent to the buffer for reconstructed frames from both descriptions. In block 1057 a check is made to determine whether there are any more frames to decode; if there are no more frames to decode in block 1057, the decoder operation ends. If, in block 1057, it is determined that there are more frames to decode, then in block 1060 the frame index k is incremented by one and the operation returns to block 1010 to continue decoding the bitstreams as described above.
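Assuming the encode_i_mode sketch given earlier, the two-description I-mode decoding path of blocks 1030 through 1045 inverts it as follows. Again this is a sketch, with quantization and entropy decoding omitted; pct4 is the same hypothetical 4×4 pairing-transform matrix.

    import numpy as np
    from scipy.fft import idctn

    def decode_i_mode(desc1, desc2, pct4):
        # Rebuild the four transform outputs per frequency position
        # from the two received coefficient sets (block 1030).
        inv = np.linalg.inv(pct4)
        quads = np.stack([desc1.reshape(-1, 2),
                          desc2.reshape(-1, 2)], axis=1).reshape(-1, 4)
        coeffs = (inv @ quads.T).T               # inverse transform (block 1035)
        mb = np.empty((16, 16))
        for k, (r, c) in enumerate([(0, 0), (0, 8), (8, 0), (8, 8)]):
            block = idctn(coeffs[:, k].reshape(8, 8), norm='ortho')  # block 1040
            mb[r:r + 8, c:c + 8] = block         # assemble the macroblock (block 1045)
        return mb

Round-tripping encode_i_mode and decode_i_mode with any invertible pct4 reproduces the macroblock up to floating-point error, since both sketches omit quantization.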





FIG. 11 is a flow diagram representation of the operations performed by a decoder when the decoder is receiving only description one, in accordance with an embodiment of the present invention. In FIG. 11, in block 1105 the frame index k is initialized to zero. Then, in block 1110, the decoder receives a single bitstream from channel one and in block 1115 the bitstream is decoded to the macroblock level for each frame in the video bitstream. In block 1120, the mode used for a decoded macroblock is determined. If, in block 1120, the mode of the macroblock is determined to be the I-mode, then in block 1125 the macroblock is decoded to the block level. In block 1130, each block from the macroblock is decoded into two sets of 64 coefficients, and in block 1132 an estimate of the two sets of 64 coefficients for the description on channel two, which was not received, is produced for each block. In block 1135 an inverse four-variable pairing transform is applied to each of the two sets of 64 coefficients to produce the DCT coefficients for each block. In block 1140, an inverse 8×8 DCT is applied to the DCT coefficients for each block to produce four 8×8 blocks. Then, in block 1145, the four 8×8 blocks are assembled into a 16×16 macroblock.

If, in block 1120, the mode of the macroblock is determined to be the P-mode, then, in block 1165, up to two motion vectors are decoded and a prediction macroblock is formed from a reconstructed previous frame from description one. In block 1170 the prediction error macroblock is decoded to the block level. Then, in block 1175, each block from the prediction error macroblock is decoded into two sets of 64 coefficients, and in block 1177 an estimate of the two sets of 64 coefficients for the description on channel two, which was not received, is produced for each block. In block 1180 an inverse four-variable pairing transform is applied to each of the two sets of 64 coefficients to produce the DCT coefficients for each block. In block 1185, an inverse 8×8 DCT is applied to the DCT coefficients for each block to produce four 8×8 blocks. Then, in block 1190, the four 8×8 blocks are assembled into a 16×16 macroblock, and in block 1195 the macroblock from block 1190 is added to the prediction macroblock formed in block 1165.

Regardless of whether I-mode or P-mode decoding is used, after either block 1145 or block 1195, in block 1150 the macroblocks from block 1145 and block 1195 are assembled into a frame. In block 1152, a check is made to determine whether there are any more macroblocks in the current video frame to be decoded. If there are more macroblocks to be decoded, the decoder returns to block 1120 and continues with the next macroblock. If there are no more macroblocks to be decoded in block 1152, then in block 1155 the frame is sent to the buffer for reconstructed frames from description one. In block 1157 a check is made to determine whether there are any more frames to decode; if there are no more frames to decode in block 1157, the decoder operation ends. If, in block 1157, it is determined that there are more frames to decode, then in block 1160 the frame index k is incremented by one and the operation returns to block 1110 to continue decoding the bitstream as described above.
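Blocks 1132 and 1177 are the point where the single-description decoder departs from FIG. 10. The text does not specify the estimator; a per-coefficient linear estimate with a regression factor derived from the coefficient statistics and the transform parameter is one standard MDTC choice, and rho below is a hypothetical stand-in for that factor. decode_i_mode is the sketch given earlier.

    def decode_single_description(desc1, pct4, rho=0.0):
        # FIG. 11 I-mode path: only description one was received.
        # Block 1132: estimate the channel-two coefficients from the
        # received ones; rho = 0 falls back to a zero estimate.
        desc2_hat = rho * desc1
        return decode_i_mode(desc1, desc2_hat, pct4)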




While the decoder method of operations shown in FIG. 11, and described above, is directed to an embodiment in which the decoder is only receiving description one, the method is equally applicable when only description two is being received. The modifications that are required merely involve changing block 1110 to receive the bitstream from channel two; changing block 1165 to form the prediction macroblock from a reconstructed previous frame from description two; and changing blocks 1132 and 1177 to estimate the coefficients sent on channel one.




In the foregoing detailed description and figures, several embodiments of the present invention are specifically illustrated and described. Accordingly, it will be appreciated that modifications and variations of the present invention are covered by the above teachings and within the purview of the appended claims without departing from the spirit and intended scope of the invention.



Claims
  • 1. A method for encoding a sequence of video frames, said method comprising: receiving the sequence of video frames; for each video frame from the sequence of video frames: dividing the video frame into a plurality of macroblocks; encoding each of the plurality of macroblocks using at least one of an intraframe mode (I-mode) technique and a prediction mode (P-mode) technique, and for said video frame, at least one macroblock is encoded using the intraframe mode technique and at least one other macroblock is encoded using the prediction mode technique, wherein the P-mode technique generates at least n+1 prediction error signals for each of the plurality of macroblocks, where n represents a number of channels; and providing both the I-mode technique encoded data and the at least n+1 P-mode technique prediction error signals to each of the n channels, wherein each of the n channels is used to transmit the encoded plurality of macroblocks.
  • 2. The method of claim 1, wherein encoding each of the plurality of macroblocks using at least one of an intraframe mode (I-mode) technique and a prediction mode (P-mode) technique comprises: selecting either the I-mode technique or the P-mode technique based on which technique uses fewer bits to produce the same image reconstruction quality; and encoding each of the plurality of macroblocks using either the I-mode technique or the P-mode technique.
  • 3. The method of claim 1, wherein encoding each of the plurality of macroblocks using at least one of an intraframe mode (I-mode) technique and a prediction mode (P-mode) technique comprises: selecting either the I-mode technique or the P-mode technique based on a target bit rate, a coding efficiency and a redundancy rate, wherein the target bit rate and coding efficiency are defined by a predetermined number of bits that are to be used for each frame of video, and the redundancy rate is determined from a predefined redundancy rate; and encoding each of the plurality of macroblocks using either the I-mode technique or the P-mode technique.
  • 4. The method of claim 1, wherein, for each macroblock to be encoded using the I-mode technique, the I-mode technique comprises: dividing the macroblock into a plurality of eight-by-eight blocks; applying an eight-by-eight discrete cosine transform (DCT) to produce a set of DCT coefficients for each of the plurality of eight-by-eight blocks; applying a four-variable pairing transform to all of said sets of DCT coefficients for each of the plurality of eight-by-eight blocks to produce a plurality of coefficients; decomposing said plurality of coefficients into two sets of coefficients; encoding each of said two sets of coefficients to form two descriptions; and outputting said two descriptions over separate channels.
  • 5. The method of claim 1, wherein, for each macroblock to be encoded using the P-mode technique, the P-mode technique comprises: determining three best prediction error macroblocks with corresponding motion vectors and prediction errors from the macroblock; determining whether to code the prediction error macroblocks using the I-mode technique or the P-mode technique; if using the P-mode technique, then: dividing each of said three prediction error macroblocks into a plurality of eight-by-eight blocks; applying an eight-by-eight discrete cosine transform (DCT) to each of the plurality of eight-by-eight blocks to produce three sets of DCT coefficients; applying a four-variable pairing transform to each of said three sets of DCT coefficients to produce a plurality of coefficients; decomposing said plurality of coefficients into two sets of coefficients; encoding each of said two sets of coefficients to form two descriptions; and outputting said two descriptions over separate channels.
  • 6. The method of claim 5, wherein determining three best prediction macroblocks with corresponding motion vectors and prediction errors from the macroblock comprises: using a reconstructed previous frame from both descriptions.
  • 7. The method of claim 5, wherein determining three best prediction macroblocks with corresponding motion vectors and prediction errors from the macroblock comprises: using a reconstructed previous frame from both descriptions; and at least one selected from the group of: a reconstructed previous frame from said first description; and a reconstructed previous frame from said second description.
  • 8. A computer-readable medium having stored therein a computer program for encoding a sequence of video frames, said computer program comprising: receiving the sequence of video frames; for each video frame from the sequence of video frames: dividing the video frame into a plurality of macroblocks; encoding each of the plurality of macroblocks using at least one of an intraframe mode (I-mode) technique and a prediction mode (P-mode) technique, and for said video frame, at least one macroblock is encoded using the intraframe mode technique and at least one other macroblock is encoded using the prediction mode technique, wherein the P-mode technique generates at least n+1 prediction error signals for each of the plurality of macroblocks, where n represents a number of channels; and providing both the I-mode technique encoded data and the at least n+1 P-mode technique prediction error signals to each of the n channels, wherein each of the n channels is used to transmit the encoded plurality of macroblocks.
  • 9. The computer-readable medium of claim 8, wherein encoding each of the plurality of macroblocks using at least one of an intraframe mode (I-mode) technique and a prediction mode (P-mode) technique comprises: selecting either the I-mode technique or the P-mode technique based on which technique uses fewer bits to produce the same image reconstruction quality; and encoding each of the plurality of macroblocks using either the I-mode technique or the P-mode technique.
  • 10. The computer-readable medium of claim 8, wherein encoding each of the plurality of macroblocks using at least one of an intraframe mode (I-mode) technique and a prediction mode (P-mode) technique comprises: selecting either the I-mode technique or the P-mode technique based on a target bit rate, a coding efficiency and a redundancy rate, wherein the target bit rate and coding efficiency are defined by a predetermined number of bits that are to be used for each frame of video, and the redundancy rate is determined from a predefined redundancy rate; and encoding each of the plurality of macroblocks using either the I-mode technique or the P-mode technique.
  • 11. The computer-readable medium of claim 8, wherein, for each macroblock to be encoded using the I-mode technique, the I-mode technique comprises: dividing the macroblock into a plurality of eight-by-eight blocks; applying an eight-by-eight discrete cosine transform (DCT) to produce a set of DCT coefficients for each of the plurality of eight-by-eight blocks; applying a four-variable pairing transform to all of said sets of DCT coefficients for each of the plurality of eight-by-eight blocks to produce a plurality of coefficients; decomposing said plurality of coefficients into two sets of coefficients; encoding each of said two sets of coefficients to form two descriptions; and outputting said two descriptions over separate channels.
  • 12. The computer-readable medium of claim 8, wherein, for each macroblock to be encoded using the P-mode technique, the P-mode technique comprises: determining three best prediction error macroblocks with corresponding motion vectors and prediction errors from the macroblock; determining whether to code the prediction error macroblocks using the I-mode technique or the P-mode technique; if using the P-mode technique, then: dividing each of said three prediction error macroblocks into a plurality of eight-by-eight blocks; applying an eight-by-eight discrete cosine transform (DCT) to each of the plurality of eight-by-eight blocks to produce three sets of DCT coefficients; applying a four-variable pairing transform to each of said three sets of DCT coefficients to produce a plurality of coefficients; decomposing said plurality of coefficients into two sets of coefficients; encoding each of said two sets of coefficients to form two descriptions; and outputting said two descriptions over separate channels.
  • 13. The computer-readable medium of claim 12, wherein determining three best prediction macroblocks with corresponding motion vectors and prediction errors from the macroblock comprises: using a reconstructed previous frame from both descriptions.
  • 14. The computer-readable medium of claim 12, wherein determining three best prediction macroblocks with corresponding motion vectors and prediction errors from the macroblock comprises: using a reconstructed previous frame from both descriptions; and at least one selected from the group of: a reconstructed previous frame from said first description; and a reconstructed previous frame from said second description.
CROSS-REFERENCE TO RELATED PATENT APPLICATIONS

This patent application claims the benefit of U.S. Provisional Application Serial No. 60/145,852, entitled Method and Apparatus for Accomplishing Multiple Description Coding for Video, filed Jul. 27, 1999. This patent application is related to the following commonly assigned U.S. Provisional Application: Multiple Description Coding Communication System, Ser. No. 60/145,937, filed Jul. 28, 1999. This patent application is also related to the following commonly assigned U.S. Patent Applications: Multiple Description Communications System, Ser. No. 08/740,416, filed Jan. 30, 1997, now abandoned, and Multiple Description Coding Communications System, Ser. No. 09/511,367, filed Feb. 23, 2000.

US Referenced Citations (7)
Number Name Date Kind
5990955 Koz Nov 1999 A
6122314 Bruls et al. Sep 2000 A
6151360 Kato et al. Nov 2000 A
6181743 Bailleul Jan 2001 B1
6330370 Goyal et al. Dec 2001 B2
6345125 Goyal et al. Feb 2002 B2
6460153 Chou et al. Oct 2002 B1
Foreign Referenced Citations (1)
Number Date Country
1 202 463 Jan 2002 EP
Non-Patent Literature Citations (2)
Entry
Y. Wang et al., "Multiple Description Image Coding for Noisy Channels by Pairing Transform Coefficients," 1997, IEEE, pp. 419-423.*
V. A. Vaishampayan, "Design of Multiple Description Scalar Quantizers," 1993, IEEE, pp. 821-834.
Provisional Applications (1)
Number Date Country
60/145852 Jul 1999 US