This invention relates to video decoding and in particular to methods and apparatus for detecting, isolating and repairing errors within a video bitstream.
A video sequence consists of a series of still pictures or frames. Video compression methods are based on reducing the redundant and the perceptually irrelevant parts of video sequences. The redundancy in video sequences can be categorised into spectral, spatial and temporal redundancy. Spectral redundancy refers to the similarity between the different colour components of the same picture. Spatial redundancy results from the similarity between neighbouring pixels in a picture. Temporal redundancy exists because objects appearing in a previous image are also likely to appear in the current image. Compression can be achieved by taking advantage of this temporal redundancy and predicting the current picture from another picture, termed an anchor or reference picture. Further compression may be achieved by generating motion compensation data that describes the displacement between areas of the current picture and similar areas of the reference picture.
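By way of illustration only, the following is a minimal sketch of translational motion-compensated prediction of a single block; the array names, block size and picture dimensions are illustrative and not taken from any standard.

```python
import numpy as np

def predict_block(reference, top, left, mv, size=16):
    """Copy the block of the reference picture displaced by the motion
    vector mv = (dy, dx) from position (top, left) in the current picture."""
    dy, dx = mv
    return reference[top + dy:top + dy + size, left + dx:left + dx + size]

reference = np.random.randint(0, 256, (144, 176)).astype(np.int16)
current = np.random.randint(0, 256, (144, 176)).astype(np.int16)

# The encoder transmits the motion vector plus the (ideally small)
# prediction error; the decoder rebuilds the block as prediction + error.
mv = (2, -3)
prediction = predict_block(reference, 32, 48, mv)
prediction_error = current[32:48, 48:64] - prediction
reconstructed = prediction + prediction_error  # equals the original block
```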
Frames coded without reference to another frame are known as intra-frames (also known as I-frames). Pictures that are compressed using temporal redundancy techniques are generally referred to as inter-pictures or inter-frames (also known as P-frames). Parts of an inter-picture can also be encoded without reference to another frame (known as intra-refresh).
Sufficient compression cannot usually be achieved by only reducing the inherent redundancy of a sequence. The redundancy of the encoded bit stream is usually therefore further reduced by means of efficient lossless coding of compression parameters. The main technique is to use variable length codes.
Compressed video is easily corrupted by transmission errors, mainly for two reasons. Firstly, due to the utilisation of temporal predictive differential coding (inter-frame coding), an error is propagated both spatially and temporally. In practice this means that, once an error occurs, it is usually visible to the human eye for a relatively long time. Transmissions at low bit rates, where there are only a few intra-coded frames, are especially susceptible, since temporal error propagation is not stopped for some time. Secondly, the use of variable length codes increases susceptibility to errors. When a bit error alters a code word, the decoder loses code word synchronisation and also decodes subsequent error-free code words (comprising several bits) incorrectly until the next synchronisation (or start) code. A synchronisation code is a bit pattern which cannot be generated from any legal combination of other code words; such start codes are added to the bit stream at intervals to enable resynchronisation. In addition, errors occur when data is lost during transmission. For example, for video applications using an unreliable transport protocol such as UDP in IP networks, network elements may discard parts of the encoded bit stream.
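The loss of code word synchronisation may be illustrated with a toy variable length code (invented for illustration; it is not the code of any standard). All code words below end in 0, so no legal sequence contains four consecutive 1s and the pattern 1111 can serve as a synchronisation code:

```python
CODE = {'a': '0', 'b': '10', 'c': '110'}
DECODE = {v: k for k, v in CODE.items()}
SYNC = '1111'

def decode(bits):
    out, word = [], ''
    for bit in bits:
        word += bit
        if word.endswith(SYNC):     # synchronisation code found: realign
            out.append('|sync|')
            word = ''
        elif word in DECODE:
            out.append(DECODE[word])
            word = ''
    return out

stream = CODE['b'] + CODE['a'] + CODE['c'] + SYNC + CODE['a']
print(decode(stream))                      # ['b', 'a', 'c', '|sync|', 'a']

corrupted = stream[:2] + '1' + stream[3:]  # a single bit error in 'a'
print(decode(corrupted))                   # ['b', '|sync|', 'a']: the
# error-free code word 'c' is also lost, and only the synchronisation
# code restores alignment
```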
The transmission of video data over networks prone to transmission errors (for instance mobile networks) is subject to channel errors and channel congestion. Even a low Bit Error Rate (BER) can produce a significant degradation of video quality. Whilst channel errors may cause significant visual impairments, it is undesirable to request a transmitting device to retransmit the corrupted data: any retransmitted information is likely to be subject to similar channel degradation, and processing and transmission resources may be occupied unnecessarily when other data is waiting to be transmitted. Thus techniques have been developed to detect, isolate and/or conceal errors at a decoder.
There are many ways for the receiver to address the corruption introduced in the transmission path. In general, on receipt of the signal, transmission errors are first detected and then corrected or concealed by the receiver. Error correction refers to the process of recovering the erroneous data preferably as if no errors had been introduced in the first place. Error concealment refers to the process of concealing the effects of transmission errors so that they are hardly visible in the reconstructed video sequence. Typically an amount of redundancy is added by the source transport coding in order to help error detection, correction and concealment.
Current video coding standards define a syntax for a self-sufficient video bit stream. The most popular standards at the time of writing are ITU-T Recommendation H.263, “Video coding for low bit rate communication”, February 1998; ISO/IEC 14496-2, “Coding of Audio-Visual Objects. Part 2: Visual”, 1999 (known as MPEG-4); and ITU-T Recommendation H.262 (ISO/IEC 13818-2) (known as MPEG-2). These standards define a hierarchy for bit streams and correspondingly for image sequences and images.
In accordance with the invention there is provided a method of decoding encoded video data, the encoded video data being arranged as a plurality of video picture segments, the data of the video picture segments comprising header data and motion vector data for the segment, the method comprising: decoding the motion vector data of an encoded video segment irrespective of whether the header data of the segment has been decoded successfully; and, when the motion vector data of the encoded video segment is decoded successfully, using the decoded motion vector data in the decoding of the encoded video data.
Preferably the encoded video data is arranged as a plurality of video picture segments, the data of the video picture segments being arranged so that all header data for the segment are transmitted together and all motion vector data for the segment are transmitted together, the header data and motion vector data being separated by markers. Such a data format is defined in the international standards H.263 Annex V and ISO/IEC 14496-2 (MPEG-4).
Preferably the step of decoding the motion vector data of the encoded video segment comprises decoding a first portion of the motion vector data that represents the motion vector data for the segment, decoding a second portion of the motion vector data that represents the sum of the motion vector data in the encoded video segment, and comparing the two portions, the comparison indicating whether the decoding of the motion vector data is successful. The decoding of the motion vector data may be deemed successful if the second portion equals the cumulative effect of the first portion. For instance, where the encoded video conforms to Annex V of H.263, the first portion represents the Motion Vector Difference (MVD) data and the second portion represents the Last Motion Vector Value (LMVV) of the segment.
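By way of illustration only, the comparison may be sketched as follows; this is a minimal sketch with illustrative names, each motion vector being a (horizontal, vertical) pair and the predictor for the first vector of a segment being (0, 0).

```python
def motion_data_consistent(mvds, lmvv):
    """Return True when the decoded Motion Vector Differences for a segment
    accumulate to the decoded Last Motion Vector Value."""
    total = (sum(dx for dx, dy in mvds), sum(dy for dx, dy in mvds))
    return total == lmvv

# Three coded macroblocks whose differences sum to (4, -1):
assert motion_data_consistent([(3, 0), (-1, -2), (2, 1)], (4, -1))
```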
The method may further comprise attempting to decode header data of an encoded video segment; and when an attempt to decode all of the header data of an encoded video segment is unsuccessful, ignoring the motion vector data associated with those parts of the encoded video data for which the decoding of the associated header data was unsuccessful. The method may also ignore the motion vector data associated with those parts of the encoded video data that occur subsequent in the video segment to those parts of the encoded video data for which the decoding of the associated header data was unsuccessful.
When the successfully decoded motion vector data corresponds to a number of valid code words that equals the maximum number of valid code words allowed for the segment, the whole segment may be decoded in an inter-frame manner.
Preferably, when all the motion vector data for the encoded video segment have not been successfully decoded, missing motion vector data is interpolated from any successfully decoded motion vector data for the segment.
The invention is particularly suitable for use with encoded video which conforms to H.263 (in particular Annex V of H.263), MPEG-2 or MPEG-4.
In a further aspect of the invention there is provided a video decoder for decoding encoded video data, the encoded video data being arranged as a plurality of video picture segments, the data of the video picture segments comprising header data and motion vector data for the segment, the decoder being arranged to decode the motion vector data of an encoded video segment irrespective of whether the header data of the segment has been decoded successfully and, when the motion vector data of the encoded video segment is decoded successfully, to use the decoded motion vector data in the decoding of the encoded video data.
Preferably the encoded video data is arranged as a plurality of video picture segments, the data of the video picture segments being arranged so that all header data for the segment are transmitted together and all motion vector data for the segment are transmitted together, the header data and motion vector data being separated by markers.
Preferably the decoder is arranged to decode the motion vector data of the encoded video segment by decoding a first portion of the motion vector data that represents the motion vector data for the segment and decoding a second portion of the motion vector data that represents the sum of the motion vector data in the encoded video segment, the decoder being arranged to compare the two portions, the comparison indicating whether the decoding of the motion vector data is successful. The decoder may be arranged to determine that the decoding of the motion vector data is successful if the second portion equals the cumulative effect of the first portion.
The decoder may be arranged to decode header data of an encoded video segment; and, when an attempt to decode all of the header data of an encoded video segment is unsuccessful, to ignore the motion vector data associated with those parts of the encoded video data for which the decoding of the associated header data was unsuccessful. The decoder may also ignore those parts of the encoded video data that occur subsequent in the video segment to those parts of the encoded video data for which the decoding of the associated header data was unsuccessful.
The decoder is particularly suitable for use with encoded video which conforms to H.263 (in particular Annex V of H.263), MPEG-2 or MPEG-4.
The invention will now be described, by way of example only, with reference to the accompanying drawings.
The video codec 10 receives signals for coding from a video capture or storage device of the terminal (not shown) (e.g. a camera) and receives signals for decoding from a remote terminal 2 for display by the terminal 1 on a display 70. The audio codec 20 receives signals for coding from the microphone (not shown) of the terminal 1 and receives signals for decoding from a remote terminal 2 for reproduction by a speaker (not shown) of the terminal 1. The terminal may be a portable radio communications device, such as a radio telephone.
The control manager 40 controls the operation of the video codec 10, the audio codec 20 and the data protocols manager 30. However, since the invention is concerned with the operation of the video codec 10, no further discussion of the audio codec 20 and protocol manager 30 will be provided.
The video codec comprises an encoder part 100 and a decoder part 200. The encoder part 100 comprises an input 101 for receiving a video signal from a camera or video source of the terminal 1. A switch 102 switches the encoder between an INTRA-mode of coding and an INTER-mode. The encoder part 100 of the video codec 10 comprises a DCT transformer 103, a quantiser 104, an inverse quantiser 108, an inverse DCT transformer 109, an adder 110, a plurality of picture stores 107, a subtractor 106, a motion estimator 111 and an encoding control manager 105.
The operation of an encoder according to the invention will now be described. The video codec 10 receives a video signal to be encoded. The encoder 100 of the video codec encodes the video signal by performing DCT transformation, quantisation and motion compensation. The encoded video data is then output to the multiplexer 50. The multiplexer 50 multiplexes the video data from the video codec 10 and control data from the control 40 (as well as other signals as appropriate) into a multimedia signal. The terminal 1 outputs this multimedia signal to the receiving terminal 2 via the modem 60 (if required).
In INTRA-mode, the video signal from the input 101 is transformed to DCT coefficients by a DCT transformer 103. The DCT coefficients are then passed to the quantiser 104 that quantises the coefficients. Both the switch 102 and the quantiser 104 are controlled by the encoding control manager 105 of the video codec, which may also receive feedback control from the receiving terminal 2 by means of the control manager 40. A decoded picture is then formed by passing the data output by the quantiser through the inverse quantiser 108 and applying an inverse DCT transform 109 to the inverse-quantised data. The resulting data is added to the contents of the picture store 107 by the adder 110.
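The INTRA-mode loop may be sketched as follows; the orthonormal 8×8 DCT matrix and the uniform quantiser step size are illustrative, and the reference numerals in the comments correspond to the elements described above.

```python
import numpy as np

N = 8
# Orthonormal DCT-II basis matrix: the forward transform of a block B
# is C @ B @ C.T and the inverse is C.T @ (.) @ C.
C = np.array([[np.sqrt((1 if u == 0 else 2) / N)
               * np.cos((2 * x + 1) * u * np.pi / (2 * N))
               for x in range(N)] for u in range(N)])

def encode_block(block, qstep=8):
    coeffs = C @ block @ C.T          # DCT transformer 103
    return np.round(coeffs / qstep)   # quantiser 104

def local_decode(levels, qstep=8):
    coeffs = levels * qstep           # inverse quantiser 108
    return C.T @ coeffs @ C           # inverse DCT transformer 109

block = np.random.randint(0, 256, (N, N)).astype(float)
levels = encode_block(block)
# The locally decoded block (including quantisation error) is what is added
# to the picture store 107, so that encoder and decoder predict from the
# same reference data.
decoded = local_decode(levels)
```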
In INTER mode, the switch 102 is operated to accept from the subtractor 106 the difference between the signal from the input 101 and a reference picture which is stored in a picture store 107. The difference data output from the subtractor 106 represents the prediction error between the current picture and the reference picture stored in the picture store 107. A motion estimator 111 may generate motion compensation data from the data in the picture store 107 in a conventional manner.
The encoding control manager 105 decides whether to apply INTRA or INTER coding, or whether to code the frame at all, on the basis of either the output of the subtractor 106 or feedback control data from a receiving decoder. The encoding control manager may decide not to code a received frame at all, either when the similarity between the current frame and the reference frame is so high that coding it is unnecessary, or when there is insufficient time to code the frame. The encoding control manager operates the switch 102 accordingly.
When not responding to feedback control data, the encoder typically encodes a frame as an INTRA-frame either only at the start of coding (all other frames being inter-frames), or at regular periods e.g. every 5 s, or when the output of the subtractor exceeds a threshold i.e. when the current picture and that stored in the picture store 107 are judged to be too dissimilar. The encoder may also be programmed to encode frames in a particular regular sequence e.g. I P P P P I P etc.
The video codec outputs the quantised DCT coefficients 112a, the quantising index 112b (i.e. the details of the quantising used), an INTRA/INTER flag 112c to indicate the mode of coding performed (I or P), a transmit flag 112d to indicate the number of the frame being coded and the motion vectors 112e for the picture being coded. These are multiplexed together by the multiplexer 50 together with other multimedia signals.
The decoder part 200 of the video codec 10 comprises an inverse quantiser 220, an inverse DCT transformer 221, a motion compensator 222, one or more picture stores 223 and a controller 224. The controller 224 receives video codec control signals demultiplexed from the encoded multimedia stream by the demultiplexer 50. In practice the controller 105 of the encoder and the controller 224 of the decoder may be the same processor.
Considering the terminal 1 as receiving coded video data from terminal 2, the operation of the video codec 10 will now be described with reference to its decoding role. The terminal 1 receives a multimedia signal from the transmitting terminal 2. The demultiplexer 50 demultiplexes the multimedia signal and passes the video data to the video codec 10 and the control data to the control manager 40. The decoder 200 of the video codec decodes the encoded video data by inverse quantising, inverse DCT transforming and motion compensating the data. The controller 224 of the decoder checks the integrity of the received data and, if an error is detected, attempts to correct or conceal the error in a manner to be described below. The decoded, corrected and concealed video data is then stored in one of the picture stores 223 and output for reproduction on a display 70 of the receiving terminal 1.
In H.263, the bit stream hierarchy has four layers: block, macroblock, picture segment and picture layer. A block relates to 8×8 pixels of luminance or chrominance. Block layer data consist of uniformly quantised discrete cosine transform coefficients, which are scanned in zigzag order, processed with a run-length encoder and coded with variable length codes.
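The zigzag scan and the run-length step may be sketched as follows; the (zero-run, level) pair representation is a simplification of the (LAST, RUN, LEVEL) events actually coded by the standard.

```python
import numpy as np

def zigzag_indices(n=8):
    """Return the (row, column) pairs of an n x n block in zigzag order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def run_length(block):
    """Scan quantised coefficients in zigzag order and code them as
    (zero-run, level) pairs, stopping after the last non-zero coefficient."""
    scanned = [block[r][c] for r, c in zigzag_indices(len(block))]
    while scanned and scanned[-1] == 0:   # the standard signals the last
        scanned.pop()                     # coefficient, not trailing zeros
    pairs, run = [], 0
    for v in scanned:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs

block = np.zeros((8, 8), dtype=int)
block[0, 0], block[0, 1], block[2, 0] = 12, 5, -3
print(run_length(block))   # [(0, 12), (0, 5), (1, -3)]
```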
A macroblock relates to 16×16 pixels (or 2×2 blocks) of luminance and the spatially corresponding 8×8 pixels (or block) of chrominance components.
The picture segment layer can either be a group of blocks (GOB) layer or a slice layer. Each GOB or slice is divided into macroblocks. Data for each GOB consists of an optional GOB header followed by data for macroblocks. If the optional slice structured mode is used, each picture is divided into slices instead of GOBs. A slice contains a number of macroblocks but has a more flexible shape and use than GOBs. Slices may appear in the bit stream in any order. Data for each slice consists of a slice header followed by data for the macroblocks.
The picture layer data contain parameters affecting the whole picture area and the decoding of the picture data. Most of this data is arranged in a so-called picture header.
MPEG-2 and MPEG-4 layer hierarchies resemble the one in H.263.
Errors in video data may occur at any level and error checking may be carried out at any or each of these levels.
The invention has particular application in situations in which the encoded video data is arranged as video picture segments and the macroblocks in the segment are arranged so that the header information for all the macroblocks in the segment is transmitted together, followed by the motion vectors for all the macroblocks in the segment and then by the DCT coefficients for the macroblocks in the segment. The header, motion vector and DCT partitions are separated by markers, allowing for resynchronisation at the end of a partition in which an error occurred. Each segment contains the data for an integer number of macroblocks.
One example of such a data structure occurs when the picture segment is a slice and the data partitioned slice (DPS) mode (Annex V of H.263) is implemented. The data structure for this DPS mode is described below.
The macroblock data comprises the following fields. HD (Header Data) contains the COD and MCBPC information for all the macroblocks in the slice. The COD is set to 0 when the macroblock is coded and set to 1 if no further information is transmitted for that macroblock. MCBPC is a code word giving information about the macroblock type and the coded block pattern for chrominance. In Annex V of H.263, a reversible variable length code (RVLC) is used to combine the COD and the MCBPC for all the macroblocks in the packet. A header marker (HM), which is a fixed code word of 9 bits, terminates the header partition. When reverse coding is used by a decoder, the decoder searches for this marker to decode the header data in the reverse direction.
The macroblock data further comprises motion vector data (MVD) which is included for all INTER macroblocks and consists of a variable length codeword for the horizontal component followed by a variable length codeword for the vertical component. In DPS mode, the motion vector data represents the difference between the motion vector for the previous macroblock and the current one. That is to say, the first motion vector of a segment is coded using a predictor value of 0 for both the horizontal and the vertical component and the motion vectors for the subsequent coded macroblocks of the segment are coded predictively using the motion vector difference. The last motion vector value (LMVV) contains the last motion vector in the packet or segment. It is coded using a predictor value of 0 for both the horizontal and vertical components i.e. it represents the sum of all the MVD for the segment. If there are no motion vectors or only one motion vector in the segment or packet, LMVV is not present. The motion vector marker (MVM) is a code word of 10 bits having a fixed non-symmetrical value. The MVM terminates the motion vector partition. When reverse coding is used in a decoder, the decoder searches for this marker.
The coefficient data comprises various optional fields (INTRA_MODE, CBPB, CBPC, DQUANT), CBPY, and the DCT coefficients for the macroblocks of the segment. The DCT coefficients comprise INTRA DC, an 8-bit word representing the DC coefficient for INTRA blocks, and TCOEF, the DCT coefficient(s) for the block. TCOEF has a value from 1 to 102, which value indicates (a) whether the coefficient is the last non-zero coefficient in the macroblock, (b) the number of zeros preceding the coded coefficient and (c) the level of the coefficient. TCOEF is coded using a variable length code.
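By way of illustration, a block's scanned coefficient sequence may be rebuilt from decoded TCOEF events as sketched below; the (last, run, level) tuple representation is illustrative and the variable length code itself is omitted.

```python
def rebuild_coefficients(events, block_size=64):
    """Expand decoded TCOEF events into the scanned coefficient sequence.
    Each event is (last, run, level): `run` zeros, then a coefficient of
    value `level`; `last` marks the final non-zero coefficient."""
    coeffs = []
    for last, run, level in events:
        coeffs.extend([0] * run)
        coeffs.append(level)
        if last:
            break
    coeffs.extend([0] * (block_size - len(coeffs)))  # the rest are zero
    return coeffs

# 12, then 5, then one zero followed by -3 (the last non-zero coefficient):
events = [(0, 0, 12), (0, 0, 5), (1, 1, -3)]
print(rebuild_coefficients(events)[:6])   # [12, 5, 0, -3, 0, 0]
```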
A slice comprises N×M macroblocks, where N and M are integers. Say there are 11 macroblocks in a slice, with N=1 and M=11. Thus, in the macroblock data, the HD field should include header data for all 11 macroblocks of the slice followed by the header marker HM. A receiving decoder therefore tries to decode 11 headers and checks that the next data received is the header marker HM. If an error is detected in the header data, or the decoder manages to decode the header data for 11 macroblocks but this data is not followed by a header marker, the data is deemed to be corrupted. The decoder then starts from the header marker HM and decodes in the reverse direction until a point in the data is reached at which another error is detected. Thus an intermediate portion of the data, labelled X in the accompanying drawings, may be identified as corrupted and isolated.
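The two-way decoding may be sketched with a toy reversible code (invented for illustration: each code word is a palindrome, so the same table decodes in both directions, as a genuine RVLC does). The sketch isolates the portion X that neither decoding pass can account for:

```python
# Toy reversible VLC, invented for illustration: each code word is a
# palindrome, so the same table decodes forwards and backwards, as the
# genuine RVLC used in Annex V for the combined COD/MCBPC data does.
RVLC = {'0': 'A', '101': 'B', '111': 'C'}
MAXLEN = max(map(len, RVLC))

def decode_greedy(bits):
    """Greedy VLC decode; returns (symbols, number of bits consumed)."""
    symbols, word, used = [], '', 0
    for i, bit in enumerate(bits, 1):
        word += bit
        if word in RVLC:
            symbols.append(RVLC[word])
            word, used = '', i
        elif len(word) >= MAXLEN:
            break                    # no code word matches: error detected
    return symbols, used

# Header partition located between the slice start and the header marker
# HM; the middle six bits are corrupted and decode in neither direction.
header = '001011110' + '110011' + '1010111'

fwd_syms, fwd_used = decode_greedy(header)
rev_syms, rev_used = decode_greedy(header[::-1])

# The bits that neither pass could account for form the isolated region X.
x_region = header[fwd_used:len(header) - rev_used]
print(fwd_syms, rev_syms[::-1], x_region)
# ['A', 'A', 'B', 'C', 'A'] ['B', 'A', 'C'] 110011
```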
In the invention, even if an error is detected in the header data HD of the segment, the decoder attempts to decode the motion vector data MVD for the segment. The decoder therefore looks for the Header Marker HM to detect the end of the Header Data and decodes the data between the Header Marker HM and the Motion Vector Marker MVM. In Annex V of H.263, Reversible Variable Length Codes (RVLC) are used, and it is possible to decode a received bit stream in both a forward and a reverse direction. Thus the decoder decodes the data between HM and MVM in a forward direction, decodes the data in a reverse direction from MVM towards HM, and then determines whether the MVD decoded is equal to the LMVV. If so, the motion vector data is deemed to be uncorrupted.
This can be illustrated with reference to the accompanying drawings.
If a valid LMVV codeword is found in the reverse direction but the reverse decoding does not terminate at the same point as the termination of the decoding in the forward direction, the motion vector data is deemed to be corrupted.
Uncorrupted motion vector data may be used in association with any uncorrupted macroblock header data to decode the associated macroblock. In addition, the uncorrupted motion vector data may be used in subsequent error concealment, for instance in the case described below in which the header data for some of the macroblocks of a segment is corrupted.
As mentioned above, the macroblock header data includes information (MCBPC) as to how a macroblock is coded, i.e. I or P. If a macroblock is coded in an intra-frame manner, then no motion vector data will be associated with the macroblock. If the macroblock is coded in an inter-frame manner, then motion vector data will be associated with the macroblock. Thus, when decoding in either direction, it is known whether there is any motion vector data associated with the macroblock.
This may be illustrated for a slice having 11 macroblocks MB1 to MB11. Say that the header data for macroblocks MB1 to MB5 is successfully decoded in the forward direction before an error is detected, and that the header data for macroblocks MB9 to MB11 is successfully decoded in the reverse direction from the header marker HM, so that the header data for macroblocks MB6 to MB8 is deemed corrupted.
Thus, say that the last macroblock of the segment, MB11, is inter-coded: the decoder decodes this macroblock by decoding the first code word (the LMVV) of the motion vector data in the reverse direction. Next, say MB10 is intra-coded: the decoder decodes MB10 without reference to the motion vector data. Say MB9 is inter-coded: the decoder then uses the next code word in the reverse direction in the motion vector data to decode MB9. As the header data for MB8 was corrupted, the decoder may be unable to use the motion vector data (if any) occurring between the motion vector data associated with MB1-5 and the motion vector data associated with MB9-11.
However it may be possible to do so. For instance, in the above example, say that the motion vector data occurring between the motion vector data associated with MB1-5 and the motion vector data associated with MB9-11 corresponds to three valid motion vector data code words. As the number of valid code words equals the number of skipped macroblocks, there is therefore a good probability that the three macroblocks that have been skipped MB6-8 were inter-coded. The remaining motion vector data may therefore be used to predict the data for MB6-8 and hence the skipped macroblocks are reconstructed by the decoder. Additionally or alternatively, the decoder may assess whether the number of valid code words in the motion vector data as a whole equals the number of macroblocks in the segment. If so, the decoder may assume that all the macroblocks of the segment are coded in an inter-frame manner and use the motion vector data to predict the data for each macroblock, whether or not the macroblock header is corrupted.
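This counting heuristic may be sketched as follows; the names are illustrative.

```python
def assign_leftover_vectors(skipped_mbs, leftover_mvds):
    """If the number of valid motion vector code words not claimed by the
    trusted macroblocks equals the number of macroblocks whose headers were
    lost, it is likely that all the skipped macroblocks were inter-coded,
    and the vectors can be assigned to them in order."""
    if len(leftover_mvds) != len(skipped_mbs):
        return None                 # counts differ: do not reconstruct
    return dict(zip(skipped_mbs, leftover_mvds))

# MB6-MB8 lost their headers; three valid MVD code words remain unclaimed.
print(assign_leftover_vectors(['MB6', 'MB7', 'MB8'],
                              [(1, 0), (1, -1), (0, -1)]))
```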
Thus, whether or not the header data is decoded successfully, the motion vector data is decoded and a check is made to see whether the motion vector data is corrupted. This check involves comparing the incremental sum of the MVD data with the LMVV. If the decoding of the header data is successful, then the header data is examined to determine the number of motion vectors that should exist for the macroblock.
In a preferred implementation of the invention, if the attempt to decode motion vector data is only partially successful, resulting in the successful decoding of some of the motion vector data and the unsuccessful decoding of the remaining motion vector data of the segment, then the missing motion vector data for the segment is interpolated from the MVD for other macroblocks within the same segment of video data.
If the motion vector data for less than a predetermined proportion of the macroblocks in a segment are lost, the lost motion vector data is interpolated from the motion vector data for the other macroblocks in the segment. For instance, say the motion vector data for 50% of the macroblocks within a segment is successfully decoded; the remaining motion vector data is then interpolated from the successfully decoded data.
In the accompanying example, ✓ indicates that the MVD for a macroblock was decoded successfully and X indicates that it was not. As can be seen, less than 50% of the motion vector data has been lost; the decoder therefore interpolates the lost motion vectors from those that have been decoded successfully.
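A sketch of such concealment is given below. No particular interpolation method is prescribed above; linear interpolation between the nearest successfully decoded neighbours is assumed here for illustration, together with the 50% threshold mentioned earlier.

```python
def conceal_motion_vectors(mvs):
    """mvs: per-macroblock (x, y) motion vectors, or None where decoding
    failed. Proceed only if at least half the vectors were decoded."""
    known = [i for i, v in enumerate(mvs) if v is not None]
    if not known or len(known) < len(mvs) / 2:
        return None          # too much loss: conceal the segment otherwise
    out = list(mvs)
    for i, v in enumerate(out):
        if v is None:
            left = max((k for k in known if k < i), default=None)
            right = min((k for k in known if k > i), default=None)
            if left is None:
                out[i] = mvs[right]          # extrapolate at the start
            elif right is None:
                out[i] = mvs[left]           # extrapolate at the end
            else:                            # interpolate linearly
                t = (i - left) / (right - left)
                out[i] = tuple(round(a + t * (b - a))
                               for a, b in zip(mvs[left], mvs[right]))
    return out

mvs = [(2, 0), None, (4, 2), (5, 2), None, None,
       (8, 2), (9, 3), (9, 3), (10, 4), (11, 4)]
print(conceal_motion_vectors(mvs))
```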
In a second embodiment of the invention, the picture segment layer is a group of blocks (GOB), the data for each GOB consisting of a GOB header followed by data for the macroblocks of the GOB.
According to H.263, data for each macroblock consists of a header followed by data for the blocks. The fields of the header are as set out in H.263 and comprise COD, MCBPC, CBPY and DQUANT. The data for the macroblock comprises the motion vector data MVD (for inter macroblocks) followed by the data for the blocks. For simplicity, only the fields relevant to the invention are described here.
The data for the block consists of INTRA DC, an 8-bit word representing the DC coefficient for INTRA blocks, and TCOEF, the DCT coefficient(s) for the block, as described above with reference to the first embodiment.
Say a GOB includes 11 macroblocks. When a new GOB is received, the decoder decodes the data for the first macroblock MB1 of the segment. This is achieved by reading COD and MCBPC to determine the type of macroblock (I or P) and the coded chrominance block pattern, CBPY to determine the coded luminance block pattern and DQUANT to determine if the quantiser to be used is altered.
The decoder then reads the MVD and the block data and decodes the information as described above with reference to the decoder part 200 of the video codec.
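The per-macroblock decode order of this embodiment may be sketched as follows; the header representation and the reader callbacks are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class MacroblockHeader:
    coded: bool       # from COD: False means the macroblock is skipped
    mb_type: str      # from MCBPC: 'I' (intra) or 'P' (inter)
    cbp_luma: int     # CBPY: coded block pattern for luminance
    cbp_chroma: int   # from MCBPC: coded block pattern for chrominance
    dquant: int       # DQUANT: change to the quantiser, if any

def decode_macroblock(header: MacroblockHeader,
                      read_mvd: Callable[[], tuple],
                      read_blocks: Callable[[int, int], list]):
    """Only inter ('P') macroblocks carry motion vector data, so the header
    determines whether an MVD field follows, as noted above."""
    if not header.coded:
        return None                   # skipped: repeat the previous picture
    mv = read_mvd() if header.mb_type == 'P' else None
    blocks = read_blocks(header.cbp_luma, header.cbp_chroma)
    return mv, blocks
```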
Since the data structure of this second embodiment does not separate the header, motion vector and coefficient data of the macroblocks into partitions, an error detected in the data for one macroblock typically prevents reliable decoding of the subsequent macroblocks of the GOB until the next synchronisation code is reached.
The invention is also applicable to a video bit stream that complies with ISO/IEC 14496-2, “Coding of Audio-Visual Objects. Part 2: Visual”, 1999 (known as MPEG-4). MPEG-4 adopts a video packet approach having periodic resynchronisation markers throughout the bit stream. In Part E 1.2 of Annex E of this coding scheme, data partitioning similar to that adopted in Annex V of H.263 is described. The data structure adopted in MPEG-4 therefore resembles that described above, and the invention may be applied to it in a corresponding manner.
The invention is not intended to be limited to the video coding protocols discussed above: these are intended to be merely exemplary. The invention is applicable to any video coding protocol using motion compensation techniques. The operation of the decoder as discussed above allows a receiving decoder to determine the best course of action if a picture is corrupted.