The present invention relates to the concealment of data loss when decoding a video signal received over a lossy medium such as a packet-switched network.
The transmission of video over an unreliable channel naturally leads to some frames being lost, or some data within a frame being lost (e.g. certain blocks or macroblocks). For example, the loss could be due to packet-loss if the channel traverses a packet-switched network, or could be due to corruption of data caused by noise or interference.
An encoded video stream typically comprises two types of video frames: key frames and inter-frames. A key frame, also sometimes referred to as an intra-frame, is compressed using information from within only the current video frame itself (i.e. using intra-frame prediction, similarly to static image coding). An inter-frame, on the other hand, is compressed using knowledge of a preceding frame within the stream, and allows much more efficient compression when the scene has relatively few changes, because only the differences between frames need to be encoded and transmitted. Inter-frame coding is particularly efficient for a situation such as a talking head against a static background, typical in video conferencing. Depending on the resolution, frame rate, bit rate and scene, an intra-frame can be 20 to 100 times larger than an inter-frame. On the other hand, an inter-frame imposes a dependency on preceding frames: if any of those frames is lost, none of the subsequent inter-frames can be properly decoded.
For these reasons, as illustrated schematically in
Other techniques for recovering the decoding state may also be used, e.g. by the decoder or receiver requesting retransmission of the lost data from the encoder or transmitter.
However, in the meantime while the decoder is waiting for the lost state to be recovered, it is desirable for the decoder to be able to continue decoding and playing out some video based on an approximation of the current decoding state.
A simple approach is to freeze the last successfully decoded frame until recovery is possible. However, modern video decoders employ a more sophisticated approach by applying a concealment algorithm. When a portion of image data is lost from a video signal, a typical concealment algorithm works by using preceding and/or adjacent portions of image data to extrapolate or interpolate the lost portion. Interpolation here means generating new, replacement data between received data points; whilst extrapolation means generating new, replacement data extending beyond received data points. Either way, the concealment algorithm is thus able to regenerate an estimate of the lost image data from one or more other received portions of image data.
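By way of illustration only, the two regeneration strategies can be sketched in their simplest form as follows in Python; the function names are illustrative assumptions, and practical concealment algorithms (e.g. motion-compensated ones) are considerably more elaborate:

```python
import numpy as np

def conceal_temporal(prev_frame: np.ndarray) -> np.ndarray:
    # Extrapolation: regenerate the lost region by copying the
    # co-located pixels of the last successfully decoded frame
    # (zero-motion temporal concealment).
    return prev_frame.copy()

def conceal_spatial(frame: np.ndarray, y0: int, y1: int) -> np.ndarray:
    # Interpolation: regenerate lost rows y0..y1-1 by linearly
    # blending the last good row above (y0 - 1) with the first
    # good row below (y1).
    out = frame.astype(float).copy()
    above, below = out[y0 - 1], out[y1]
    n = y1 - y0
    for i in range(n):
        w = (i + 1) / (n + 1)   # weight grows toward the lower good row
        out[y0 + i] = (1 - w) * above + w * below
    return out
```

Either function regenerates an estimate of the lost data purely from other received data, rather than leaving the last frame frozen.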
Concealment algorithms may operate on a temporal and/or spatial basis. A temporal concealment algorithm is illustrated schematically in
There is a problem with existing concealment algorithms in that the perceived efficacy of concealment varies greatly depending on the content of the video. However, regardless of perceived efficacy, the algorithm still incurs a significant processing burden in terms of the number of processing cycles required to execute. In fact, the inventors have recognized that in some circumstances it may be preferable to simply freeze a preceding frame or image portion rather than incurring the processing cost of attempting concealment.
The present invention addresses this problem by estimating the efficacy of the concealment process in the event of one or more particular frames or portions of image data being lost. After the loss event (or events), this estimate is then used to make a decision about whether a particular frame or image portion should be decoded using concealment or whether the decoding process should instead be frozen, e.g. until recovery is possible.
According to one aspect of the present invention, there is provided a system comprising: a receiver for receiving a video signal over a lossy medium; a decoder coupled to the receiver and arranged to decode the video signal for output to a display, the decoder including a concealment module for regenerating a portion of image data lost from the video signal over said medium, by interpolating or extrapolating from other image data of the video signal received over said medium; and wherein the decoder comprises a controller configured to select, based on a measure of loss effect estimated for said portion of image data, whether (i) to apply the concealment module to regenerate said portion of image data, or alternatively (ii) to freeze preceding image data of the video signal in place of said portion of image data.
In embodiments the system may comprise an encoder arranged to encode the video signal; a transmitter arranged to transmit the video signal to the receiver over said medium; and an encoder-side estimation module configured to estimate said measure of loss effect based on knowledge of said portion of image data before transmission and loss over said medium.
The encoder may be configured to signal the loss effect to the decoder for use by the controller in making said selection.
The encoder may be configured to determine, based on the measure of loss effect, a concealment decision as to whether to apply the concealment module, and to signal the concealment decision to the decoder for use by the controller in making said selection.
The encoder-side estimation module may be configured to estimate the measure of loss effect based on an unencoded version of said portion of image data.
The encoder-side estimation module may comprise a parallel instance of the decoder, and may be configured to estimate said measure of loss effect by comparing an unencoded version of said portion of image data with a simulated regeneration of said portion of image data by an instance of the concealment module.
The encoder-side estimation module may comprise multiple parallel instances of the decoder arranged for execution on different respective parallel execution units of a processor, each configured to estimate said measure of loss effect by comparing a respective unencoded portion of image data with a simulated regeneration of that portion by a respective instance of the concealment module.
The encoder-side estimation module may be configured to estimate said measure of loss effect based on a difference between an unencoded version of said portion of image data and preceding unencoded image data of the video signal.
The encoder-side estimation module may be configured to estimate said measure of loss effect based on coding parameters used to encode said portion of image data.
The coding parameters may comprise motion vectors of said portion of image data.
The encoder-side estimation module may be configured to estimate said measure of loss effect based on an encoded version of said portion of image data, after encoding but before transmission.
The encoder-side estimation module may be configured to estimate said measure of loss effect based on one of: a size of the portion of image data, a Q parameter of the portion of image data, a ratio of an amount of motion estimation data to residual data in the portion of image data, a position of motion vectors in the portion of image data, and a size of motion vectors in the portion of image data.
In embodiments, the system may comprise a decoder-side estimation module configured to estimate said measure of loss effect based on the video signal after reception by the receiver.
The decoder-side estimation module may be configured to determine said measure of loss effect based on the video signal before decoding by the decoder.
The decoder-side estimation module may be configured to estimate said measure of loss effect based on a size of the portion of image data lost over said medium.
Said portion of image data may comprise a number of areas within a frame of the video signal which reference another, lost frame, and the decoder-side estimation module may be configured to determine said measure of loss effect based on said number of areas.
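Purely as a non-limiting sketch (in Python, with hypothetical field names), such a decoder-side measure may be as simple as counting the areas of the current frame whose reference frame never arrived:

```python
def count_lost_references(areas, lost_frames):
    # Each area (e.g. a macroblock) is assumed to carry the index of the
    # frame it references; the measure of loss effect is simply how many
    # areas point at a frame that was lost over the medium.
    return sum(1 for a in areas if a["ref_frame"] in lost_frames)
```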
In embodiments, the decoder may be configured to instigate a recovery operation, and to apply the selected one of concealment or freezing during recovery.
In one application of the invention, said lossy medium may comprise a packet-switched network.
According to a further aspect of the present invention, there may be provided a system comprising: an encoder arranged to encode a video signal for transmission; a transmitter for transmitting the video signal, over a lossy medium, to a decoder having a concealment module for regenerating a portion of image data lost from the video signal over said medium by interpolating or extrapolating from other image data of the video signal received over said medium; and an encoder-side estimation module configured to estimate a measure of loss effect for said portion of image data based on knowledge of said portion of image data before transmission and loss over said medium; wherein the transmitter is arranged to signal information relating to said loss effect to the decoder, thereby enabling the decoder to select whether (i) to apply the concealment module to regenerate said portion of image data, or alternatively (ii) to freeze preceding image data of the video signal in place of said portion of image data.
According to another aspect of the present invention, there may be provided a method comprising: receiving a video signal over a lossy medium; and using a decoder to decode the video signal for output to a display, the decoder including a concealment module for regenerating a portion of image data lost from the video signal over said medium, by interpolating or extrapolating from other image data of the video signal received over said medium; wherein the decoding comprises selecting, based on a measure of loss effect estimated for said portion of image data, whether (i) to apply the concealment module to regenerate said portion of image data, or alternatively (ii) to freeze preceding image data of the video signal in place of said portion of image data.
According to another aspect of the present invention, there may be provided a method comprising: operating an encoder to encode a video signal for transmission; transmitting the video signal, over a lossy medium, to a decoder having a concealment module for regenerating a portion of image data lost from the video signal over said medium by interpolating or extrapolating from other image data of the video signal received over said medium; operating an encoder-side estimation module to estimate a measure of loss effect for said portion of image data based on knowledge of said portion of image data before transmission and loss over said medium; and signaling information relating to said loss effect to the decoder, thereby enabling the decoder to select whether (i) to apply the concealment module to regenerate said portion of image data, or alternatively (ii) to freeze preceding image data of the video signal in place of said portion of image data.
According to another aspect of the present invention, there may be provided a computer program product comprising code embodied on a non-transient computer-readable medium and configured so as, when executed on a processing apparatus, to perform operations in accordance with any of the above system or method features.
For a better understanding of the present invention and to show how it may be put into effect, reference is made by way of example to the accompanying drawings in which:
a is a schematic illustration of a stream of encoded video,
b is a schematic illustration of a temporal concealment algorithm,
c is a schematic illustration of a spatial concealment algorithm,
a is a schematic block diagram of a decoder-side apparatus,
b is a schematic block diagram of an encoder-side apparatus, and
c is a schematic block diagram of an encoder.
As discussed, current systems may employ one of two techniques.
(Recovery may for example comprise the receiver requesting a lost packet or frame from the transmitter, or simply waiting for the next key frame.)
The first technique (i) can improve the perceived quality of the video in certain situations, but also incurs a high processing burden and is not always effective. The second technique (ii) incurs relatively little processing cost but is unlikely to go unnoticed by the viewer. On the other hand, in situations where the first technique (i) is ineffective, the second technique (ii) may appear no worse from the perspective of the viewer, or may even appear better.
In the following embodiments, the present invention achieves the benefits of both techniques by estimating the loss effect that will be experienced if a portion of data is lost and regenerated by concealment, and then using that estimate to switch between concealment and simple freezing. This advantageously presents fluid motion to the viewer, using concealment when possible, yet freezes the video when it is estimated that the decoding errors on screen would actually be perceived as worse than just freezing the last good frame.
The estimate of loss effect for a particular portion of image data is a measure of the effect that the loss of that portion will have on the viewer's perception if the video signal is decoded through a decoder employing a concealment algorithm. As will be discussed in more detail below, the inventors have developed a variety of different options that can be used to estimate the loss effect, either at the encoder side or decoder side of the system. These will be described shortly, but first an exemplary architecture for a decoder-side apparatus and encoder-side apparatus is described with reference to
As shown in
The output of the receiver 104 is coupled to the input of a receive buffer 106 (sometimes referred to as a jitter buffer) for buffering blocks or macroblocks of one or more frames of the video signal. The output of this receive buffer 106 is then coupled to the input of a decoder 108, arranged to decode the video signal. The decoder may for example be implemented in accordance with the H.264 standard or others. The decoder 108 comprises a concealment module in the form of a concealment algorithm, which is arranged to regenerate lost data during the decoding process as described above. The output of the decoder 108 is in turn coupled to the input of a playout buffer 110 to buffer the frames for playout. Finally, the output of the playout buffer 110 is coupled to the input of a display screen 112 for displaying the decoded video.
The encoder-side apparatus is shown in
Returning to
In accordance with embodiments of the present invention, the controller 107 is also configured to select, based on a measure of loss effect, whether (i) to operate in a first mode whereby it applies the concealment algorithm in order to attempt to regenerate the lost data, or instead whether (ii) to operate in an alternative second mode in which it simply freezes the preceding frame until a recovery is possible. That is to say, the invention does not just detect whether or not there is a loss, but also determines some information about the extent to which that loss will be perceived, and based on that determination the controller 107 selects the best way in which to mitigate the loss.
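The selection itself can be reduced to a simple threshold test. The following Python fragment is a minimal, non-limiting sketch of the decision made by the controller 107; the threshold value and the names used are illustrative assumptions only:

```python
def select_mode(loss_effect: float, threshold: float = 0.5) -> str:
    # Mode (i): the estimated loss effect is small enough that the
    # concealment algorithm should produce an acceptable result.
    if loss_effect < threshold:
        return "conceal"
    # Mode (ii): concealment would be perceived as worse than a still
    # image, so freeze the preceding frame until recovery is possible.
    return "freeze"
```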
There are several places in the transmit and/or receive chain where the module for estimating the loss effect can be implemented, represented by components 105, 109, 111, 205, and/or 207 in
The most effective place to locate an estimation module is at the encoder 206, shown by component 207 in
One such technique is achieved by running a parallel instance of the decoder within the encoder 206 (including a parallel instance of the concealment algorithm). This parallel instance can be used to simulate the loss of a particular one or more frames, or slices of frames. The encoder-side estimation module 207 then computes the loss effect by comparing the simulated result with the original unencoded data, thereby measuring the loss effect in terms of a difference value (e.g. based on MSE, PSNR or another error measure). The greater the difference, the less effective concealment would be if that frame or slice were lost. The encoder 206 may signal this difference value to the controller 107 on the receive side, e.g. transmitted in the video stream in the form of side information, which may be considered an additional part of the meta-information of the encoded video. Note that the encoder 206 may be configured to ensure that the side information indicating the loss effect is transmitted in a different packet of the network-layer protocol than the frame, block or portion of video data to which it relates, so that if packet loss is encountered over the network, the required indication regarding the estimated loss effect is not itself lost and thereby rendered useless. However, in an embodiment intended only to deal with corruption of some of the data within a packet, or indeed in the case of a non-packet-based medium, this is not absolutely required.
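As a non-limiting illustration, the comparison performed by the encoder-side estimation module 207 might be sketched as follows in Python, using MSE and PSNR as the error measures named above; the 30 dB threshold is an assumed example value, not one prescribed by the invention:

```python
import numpy as np

def mse(a: np.ndarray, b: np.ndarray) -> float:
    # Mean squared error between two images.
    return float(np.mean((a.astype(float) - b.astype(float)) ** 2))

def psnr(a: np.ndarray, b: np.ndarray, peak: float = 255.0) -> float:
    # Peak signal-to-noise ratio in dB (infinite for identical images).
    m = mse(a, b)
    return float("inf") if m == 0 else float(10.0 * np.log10(peak * peak / m))

def freeze_decision(original: np.ndarray, simulated: np.ndarray,
                    threshold_db: float = 30.0) -> bool:
    # `simulated` is the output of the parallel decoder instance after a
    # simulated loss plus concealment.  A low PSNR against the unencoded
    # original means concealment would reproduce the lost data poorly,
    # so the decision signalled to the receive side is to freeze.
    return psnr(original, simulated) < threshold_db
```

Signalling only the boolean decision, rather than the raw difference value, keeps the side-information overhead to a minimum, as noted below.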
Having received the relevant side information in event of loss, the controller 107 may then select to apply the concealment algorithm on condition that the difference is below a certain threshold. Alternatively (and more preferably from a point of view of signaling overhead), the encoder-side estimation module 207 may decide whether the difference value is above or below a threshold and signal only the decision to the controller 107 as side information. Line 105 between the received video stream and the controller 107 is shown in
This technique gives the most precise estimate of the loss effect. The history of loss can span from 1 frame to N frames, and any possible loss pattern can be evaluated. The only limits are the available computational power and memory.
In a particularly preferred embodiment multiple instances of the decoder may be run in parallel at the encoding side, so that the estimation of effect of different or multiple losses can be calculated in parallel, e.g. where multiple processors or processor cores are available.
This idea is illustrated schematically in
In a preferred implementation, in a case where the encoder 206 is arranged for execution on a processor having an architecture which supports parallel execution, each of the decoder instances 224 may be implemented on a separate respective parallel execution unit of the processor.
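By way of illustration only, such parallel evaluation might be sketched in Python as below, with a thread pool standing in for the parallel execution units; the "decoder instance" here is a deliberately trivial stand-in (it simply blanks the rows named by the loss pattern), and all names are illustrative assumptions:

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def zero_rows(frame: np.ndarray, rows) -> np.ndarray:
    # Stand-in for "decoder instance + concealment": the simulated
    # concealment here simply blanks the rows named by the loss pattern.
    out = frame.copy()
    out[list(rows)] = 0.0
    return out

def score(frame: np.ndarray, rows) -> float:
    # One instance: simulate one loss pattern and score the concealed
    # result against the unencoded original (MSE).
    return float(np.mean((frame - zero_rows(frame, rows)) ** 2))

def estimate_all(frame: np.ndarray, patterns, workers: int = 4) -> dict:
    # Each hypothesised loss pattern is evaluated by its own worker,
    # mirroring one decoder instance per parallel execution unit.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = pool.map(lambda p: score(frame, p), patterns)
        return dict(zip([tuple(p) for p in patterns], scores))
```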
Note also that the instance or instances 224 of the decoder implemented in the encoder 206 for the purpose of simulating loss effects need not necessarily be exact replicas of the actual decoder 108 at the decoder side. In alternative embodiments, lower-complexity approximations of the decoder 108 may be used in the duplicate instance(s) 224 so as to reduce the processing burden incurred by the encoder 206. Furthermore, note that there may also be another, lossless encoding stage involved, such as an entropy encoder, not shown in
Returning to the general diagram of
In the case of option (a), a measure of the amount of motion in a frame (e.g. in terms of the magnitude, position and/or number of one or more motion vectors) may be used as the measure of loss effect and compared to a threshold to determine whether the concealment algorithm would be worthwhile. In the case of options (b) to (d), the difference value is used. The more motion, or the bigger the difference, the less effective the concealment algorithm is likely to be compared with simply freezing the preceding frame. Again, the encoder 206 may signal the coding parameters or difference values to the controller 107 to compare to a threshold, or more preferably the encoder-side estimation module 207 may make the threshold decision at the encoding end and signal only the decision to the receive-side controller 107.
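A non-limiting Python sketch of the motion-based measure of option (a) follows; the threshold value is an assumed example, and the measure shown (the sum of motion-vector magnitudes) is only one of the possibilities named above:

```python
import math

def motion_measure(motion_vectors) -> float:
    # Sum of motion-vector magnitudes over the frame: more (or larger)
    # vectors imply motion that concealment from the preceding frame
    # is unlikely to reproduce convincingly.
    return sum(math.hypot(dx, dy) for dx, dy in motion_vectors)

def concealment_worthwhile(motion_vectors, threshold: float = 64.0) -> bool:
    # Below the threshold, concealment is expected to beat freezing.
    return motion_measure(motion_vectors) < threshold
```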
The next most effective place to locate the estimation module is right after the encoder 206, shown by component 205 in
Again, the encoder 206 may signal these parameters or difference values to the controller 107 to compare to a threshold, or more preferably the encoder-side estimation module 205 may make the threshold decision at the encoding end and signal the decision to the receive-side controller 107.
Any or all of options (a) to (g) can be used together to give an overall measure of loss effect.
An alternative place to locate an estimation module is right before the decoder 108 on the decode side, shown by component 109 in
Another possible location for an estimation module is inside the decoder 108 itself, shown by component 111 in
For instance, none of the options for measuring loss effect described above are exclusive of one another, and any or all of these options can be used together to give an overall measure of loss effect. Further, the invention is not limited to these options, and other measures of loss effect can be used as long as they reflect a perceived efficacy of concealment. The thresholds or other conditions used to select whether to conceal or freeze are not fixed and may vary depending on the particular system design or application. Further, as discussed, the estimation of loss effect can be performed at the transmit side and/or receive side, and the concealment decision itself can be made at the transmit side and/or receive side.
Each of the receive buffer 106 and playout buffer 110 may be implemented as a region of a general purpose memory unit or as a dedicated register. Each of the decoder 108 (including concealment module), controller 107, encoder 206 and estimation modules 109, 111, 205 and 207 is preferably implemented in software stored on a storage medium such as a flash memory or hard drive and arranged for execution on a processor. However, the option of one or more of these components being implemented wholly or partially in dedicated hardware is not excluded.
The encoder 206 and decoder 108 may be implemented according to the H.264 standard, but are not limited to such implementations, and other suitable encoders and decoders will be known to a person skilled in the art. Similarly, the invention is not limited to any particular concealment algorithm, and various suitable examples will be known to a person skilled in the art.
Whilst the above may have been described in relation to frames or macroblocks of a frame, these terms are not intended to limit the applicability of the present invention. More generally, the portion of image data in question may be a frame or a subdivision within a frame, and the subdivision may be a slice, macroblock or block, or any other division or subdivision of a video signal.
The terms “interpolate” and “extrapolate” are not intended to limit to any specific mathematical operation, but generally can refer to any technique for regenerating lost data by approximating from other spatially and/or temporally nearby image data (as opposed to just freezing past data).
Further, “lost” need not necessarily mean completely absent, but could instead mean that only some of the data of a frame, macroblock, block or other image portion is lost; and/or that the image data in question is corrupted rather than absent.
Other variants may be apparent to a person skilled in the art given the disclosure herein. The invention is not limited by the described embodiments, but only by the appendant claims.
This application claims the benefit of U.S. Provisional Application No. 61/428,645, filed on Dec. 30, 2010. The entire teachings of the above application are incorporated herein by reference.