This disclosure relates to video decoding.
With the rapid development of wired and wireless networks, more and more users are seeking video services, including video streaming and video conferencing over the Internet. However, the Internet does not provide guaranteed quality of service (QoS). Traffic congestion usually results in the loss of data packets. A lost packet or frame is detected when a subsequent data packet or image frame is received while the preceding data packet or image frame has not arrived within a certain time. In wireless networks, packet losses happen frequently due to multi-path fading, shadowing, and noise disturbance of wireless channels. Although existing error concealment techniques can typically deal with the loss of macroblocks, they cannot adequately reconstruct or handle the loss of an entire frame.
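The following is a minimal sketch of such loss detection, assuming each packet carries a sequence number (as in RTP); the class, parameter names, and the 100 ms default are hypothetical illustrations rather than part of this disclosure.

import time

class LossDetector:
    def __init__(self, timeout_s=0.1):
        self.expected_seq = 0        # next sequence number expected
        self.timeout_s = timeout_s   # how long to wait for a late packet
        self.gap_since = None        # when a gap was first observed

    def on_packet(self, seq):
        # Returns the list of sequence numbers declared lost, if any.
        lost = []
        if seq == self.expected_seq:
            self.expected_seq = seq + 1
            self.gap_since = None
        elif seq > self.expected_seq:
            if self.gap_since is None:
                # A later packet arrived first; start the timeout clock.
                self.gap_since = time.monotonic()
            elif time.monotonic() - self.gap_since > self.timeout_s:
                # The earlier packet(s) never arrived within the window.
                lost = list(range(self.expected_seq, seq))
                self.expected_seq = seq + 1
                self.gap_since = None
        return lost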
Frame loss is commonplace in video transmission, typically resulting in severe distortions to decoded/reconstructed video data. To save transmission overhead, one data packet may carry information for an entire video frame. Thus, loss of a single packet in a low bit rate application may result in loss of an entire frame. Additionally, in high bit rate applications, traffic congestion may cause a burst of packet/frame losses. Moreover, if a spatial-temporal predictive coding scheme is utilized to achieve high compression efficiency, an erroneously recovered block (due to packet/frame loss) may not only lead to errors in subsequent blocks in the same frame, but also propagate errors to subsequent frames.
Systems and methods for bi-directional temporal error concealment are described. In one aspect, a lost frame is detected during encoded video decoding operations. Bi-directional estimations for each pixel of the lost frame are calculated to generate a current frame for bi-directional temporal error concealment of the lost frame.
In the Figures, the left-most digit of a component reference number identifies the particular Figure in which the component first appears.
Overview
To control errors in video transmission, many video encoding techniques and decoder error concealment methods have been developed. One such error concealment technique is temporal error concealment, which assumes that motion in the video is smooth or continuous. In general, temporal error concealment replaces a damaged block with content of a previous frame at a motion-compensated location. The limitation of such an approach is that it relies on knowledge of motion information that may not be available in all situations, especially when a whole frame is lost.
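A minimal sketch of this classical block-based replacement follows, assuming numpy image arrays; the function and parameter names are hypothetical, and the motion vector would in practice come from neighboring blocks or the bitstream.

import numpy as np

def conceal_block(curr, prev, x0, y0, size, mv):
    # Copy a motion-compensated block from the previous frame into the
    # current frame at (x0, y0).
    h, w = prev.shape[:2]
    mvx, mvy = mv
    # Clamp the source block so it stays inside the previous frame.
    sx = int(np.clip(x0 + mvx, 0, w - size))
    sy = int(np.clip(y0 + mvy, 0, h - size))
    curr[y0:y0 + size, x0:x0 + size] = prev[sy:sy + size, sx:sx + size]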
Motion vector extrapolation (MVE) and multi-frame motion averaging (MMA) are other error concealment techniques that attempt to accurately estimate lost motion information. In MVE, motion vectors are extrapolated from the last received frame. For each 8×8 block in the current frame, its motion vector is determined by the extrapolated macroblock that has the largest overlapping area with it. In stationary and slow motion scenes, where the last received frame is highly temporally correlated with the current frame, MVE can yield relatively satisfactory results. However, due to the coarse, 8×8-pixel granularity of the motion vectors and the absence of residual information, this method usually introduces obvious block artifacts. In particular, the method may give erroneously estimated motions in large motion scenes, as illustrated in
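The following is a minimal sketch of block-based MVE, under the assumption that the motion vector of each macroblock in the last received frame points to its reference in the frame before it, so continuing that motion projects the macroblock into the lost frame; all names are hypothetical.

import numpy as np

MB, BLK = 16, 8  # macroblock and block sizes in pixels

def extrapolate_mvs(last_mvs, width, height):
    # last_mvs maps macroblock coordinates (mb_x, mb_y) to a motion
    # vector (mvx, mvy). Returns per-8x8-block estimated motion vectors.
    overlap = np.zeros((height // BLK, width // BLK))
    block_mv = np.zeros((height // BLK, width // BLK, 2), dtype=int)
    for (mx, my), (mvx, mvy) in last_mvs.items():
        # Project the macroblock into the lost frame by continuing its motion.
        px, py = mx * MB - mvx, my * MB - mvy
        for by in range(height // BLK):
            for bx in range(width // BLK):
                # Intersection of the projected MB with this 8x8 block.
                ox = max(0, min(px + MB, bx * BLK + BLK) - max(px, bx * BLK))
                oy = max(0, min(py + MB, by * BLK + BLK) - max(py, by * BLK))
                if ox * oy > overlap[by, bx]:
                    overlap[by, bx] = ox * oy
                    block_mv[by, bx] = (mvx, mvy)
    return block_mv  # blocks covered by no macroblock keep zero motion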
MMA is a pixel-based temporal error concealment method. Starting from the last received frame, MMA inversely tracks the motion of each pixel through a few past frames, and then averages the motion vectors along the trace to estimate the forward motion vector of the last received frame. This method can smooth block boundaries in stationary areas, but fails in areas with motion.
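A minimal sketch of MMA for one pixel follows; mv_fields is assumed to be a list of per-pixel motion vector fields, where mv_fields[k][y, x] gives the motion of past frame k toward frame k−1 at that pixel (all names are hypothetical).

import numpy as np

def mma_forward_mv(x, y, mv_fields, n_frames=3):
    # Estimate the forward motion vector of pixel (x, y) by tracing its
    # motion back through n_frames past frames and averaging the trace.
    trace = []
    for field in mv_fields[:n_frames]:
        mvx, mvy = field[y, x]
        trace.append((mvx, mvy))
        # Follow the motion backward to the pixel's position in the
        # prior frame, clamped to the frame boundary.
        h, w = field.shape[:2]
        x = int(np.clip(x + mvx, 0, w - 1))
        y = int(np.clip(y + mvy, 0, h - 1))
    return np.mean(trace, axis=0)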
In contrast to the above-described techniques, the following systems and methods provide bi-directional temporal error concealment when decoding video data. These systems and methods recover from the loss of an entire frame. This is in stark contrast to existing error concealment techniques, which generally cannot recover an entire missing frame. More particularly, bi-directional temporal error concealment, for each pixel in a lost frame of video data, extrapolates two motion vectors: one from the motion vectors of the previous reconstructed frame and one from those of the next frame. The lost pixel is then reconstructed using multi-hypothesis motion compensation. Such bi-directional temporal error concealment increases neither bit rate nor delay. In this implementation, bi-directional temporal error concealment is implemented with respect to low bit rate real-time video communications.
These and other aspects of the systems and methods for bi-directional temporal error concealment are now described in greater detail.
An Exemplary System
In this implementation, client computing device 202 receives encoded video data 210 from server 206. In another implementation, the encoded video data 210 is received from another entity such as a CD-ROM, DVD, etc. The server may encode the video data using any one or more video encoding techniques such as those performed by an H.261, H.263, H.264, or MPEG 1/2/4 video encoder, and/or the like. Client computing device 202 decodes encoded video data 210 and recovers any lost frame(s) with bi-directional error concealment operations. To this end, client computing device 202 includes program modules 212 and program data 214. The program modules include, for example, video decoding module 216. Video decoding module 216 decodes encoded video data 210 to generate decoded video data 218. Video decoding module 216 implements operations of bi-directional temporal error concealment module 220 to recover a lost video frame when data for the frame is missing from the encoded video data 210. The operations of program module 220 are bi-directional because they are forward and backward estimating, as described below.
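A hypothetical decoding loop illustrating where such concealment fits is sketched below; the packet fields and function names are assumptions for illustration, since concealment of a missing frame can only run once the next frame's data has arrived.

def decode_stream(packets, decoder, conceal):
    frames, last_no = [], None
    for pkt in packets:
        if last_no is not None and pkt.frame_no > last_no + 1:
            # A frame is missing: conceal it from the last reconstructed
            # frame and the motion vectors carried by the next (this) packet.
            frames.append(conceal(frames[-1], pkt.motion_vectors))
        frames.append(decoder.decode(pkt))
        last_no = pkt.frame_no
    return frames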
Exemplary Forward Estimation
To recover a first aspect of a lost video frame, bi-directional temporal error concealment module 220 implements forward estimating pixel-based MVE operations. These operations differ from block-based MVE in at least two respects, as described below.
In this implementation, in stationary or low-motion video scenes, video decoding module 216 repeats the motion vectors 222 of a previous frame. A pixel that is covered by more than one extrapolated MB is called a multi-covered pixel. In general, video scenes that possess large motion also have a large number of multi-covered pixels. Thus, bi-directional temporal error concealment module 220, for a frame (a respective portion of decoded video data 218) whose number of multi-covered pixels is smaller than a threshold T, directly duplicates the frame's motion vectors 222 from the corresponding motion vectors 222 of its previous frame. For purposes of illustration, threshold T is shown as a respective portion of "other data" 224.
In this implementation, T for QCIF size videos is 2,000. In other implementations, T is an arbitrary value selected as a function of the image size of the associated decoded video data 218.
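The following is a minimal sketch of this threshold test: count the pixels of the lost frame covered by more than one extrapolated macroblock and, if the count falls below T, reuse the previous frame's motion vectors directly. T = 2000 follows the QCIF value given above; the other names are hypothetical.

import numpy as np

def choose_mv_strategy(extrapolated_mbs, width, height, T=2000):
    # extrapolated_mbs holds the top-left corners of 16x16 macroblocks
    # projected into the lost frame.
    cover = np.zeros((height, width), dtype=int)
    for (px, py) in extrapolated_mbs:
        x0, y0 = max(px, 0), max(py, 0)
        x1, y1 = min(px + 16, width), min(py + 16, height)
        cover[y0:y1, x0:x1] += 1
    multi_covered = int(np.sum(cover > 1))
    # Few multi-covered pixels suggest little motion, so copy the
    # previous frame's motion vectors instead of extrapolating.
    return "copy_previous_mvs" if multi_covered < T else "extrapolate"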
If an estimated motion vector 222 is represented as ƒ = (ƒx, ƒy), bi-directional temporal error concealment module 220 recovers each lost pixel pƒ(x, y) as follows:
pƒ(x, y) = pr(x + ƒx, y + ƒy)   (1),
wherein pr(x, y) refers to pixels in the previous frame.
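A minimal sketch of equation (1) follows, assuming grayscale numpy frames and a per-pixel field of forward-estimated motion vectors; source coordinates falling outside the frame are clamped here, a boundary-handling choice not specified above.

import numpy as np

def forward_conceal(prev, fwd_mv):
    # prev is the previous reconstructed frame; fwd_mv[y, x] = (fx, fy).
    h, w = prev.shape[:2]
    out = np.empty_like(prev)
    for y in range(h):
        for x in range(w):
            fx, fy = fwd_mv[y, x]
            # Equation (1): p_f(x, y) = p_r(x + fx, y + fy).
            sx = min(max(x + int(fx), 0), w - 1)
            sy = min(max(y + int(fy), 0), h - 1)
            out[y, x] = prev[sy, sx]
    return out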
Exemplary Backward Estimation
Most existing temporal error concealment techniques utilize only the information of past frames. However, the information of a next frame is usually also available, because the loss of the current frame is typically not detected until the next frame is received. In addition to the forward estimations of the pixel-based MVE described above, bi-directional temporal error concealment module 220 implements backward estimation by extrapolating motion vectors 222 from the next frame after a missing frame. However, pixel values of the next frame are not available until the current frame is recovered. Bi-directional temporal error concealment module 220 solves this problem by compensating each pixel pb(x, y) of the lost frame on the last reconstructed frame using the backward estimated motion vector (bx, by):
pb(x, y) = pr(x + bx, y + by)   (2),
wherein pr(x, y) refers to pixels in the last reconstructed frame (part of decoded video data 218).
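Because equation (2) has the same form as equation (1) and also compensates on the last reconstructed frame, the forward_conceal sketch above can simply be reused with the backward-estimated vector field:

def backward_conceal(prev, bwd_mv):
    # Equation (2): p_b(x, y) = p_r(x + bx, y + by), reusing the
    # forward_conceal sketch with backward-estimated motion vectors.
    return forward_conceal(prev, bwd_mv)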
Error concealment performance of the forward and backward pixel-based MVE operations is shown in Table 1. The Foreman, Suzie, and Miss-am sequences, which respectively represent large, moderate, and small motion scenes, are standard test sequences in the video coding industry. As indicated, the backward method is more effective than the forward operations.
Exemplary Bi-Directional Compensation
Bi-directional temporal error concealment module 220 obtains two estimates of the current frame (the missing frame) from the described forward and backward estimations. Bi-directional temporal error concealment module 220 combines these forward and backward pixel-based MVE results for each pixel (x, y) in the lost frame. More particularly, pixel value p(x, y) is estimated as follows:
p(x, y) = w × pƒ(x, y) + (1 − w) × pb(x, y)   (3),
wherein the pixel-based weight w = w(x, y) adjusts the relative contributions of the forward and backward estimations.
In this implementation, w(x, y) = 0.5 to simply average the two candidate concealments. In another implementation, the multi-hypothesis weights are adaptively adjusted. For example, the weights can be adjusted in terms of the correlation between adjacent frames, with additional weight being provided to the candidate that has higher correlation to the lost frame.
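A minimal sketch of equation (3) follows, assuming grayscale numpy frames; w may be the constant 0.5 described above or, as a hypothetical extension point, a per-pixel weight map w(x, y) for the adaptive variant.

import numpy as np

def bidirectional_conceal(p_f, p_b, w=0.5):
    # Blend the forward and backward candidates per pixel: equation (3).
    w = np.asarray(w, dtype=float)  # scalar, or an (h, w) map w(x, y)
    blended = w * p_f + (1.0 - w) * p_b
    return blended.astype(p_f.dtype)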
An Exemplary Procedure
An Exemplary Operating Environment
Although not required, the systems and methods for bi-directional temporal error concealment have been described in the general context of computer-executable instructions (program modules) being executed by a computing device such as a personal computer. Program modules generally include routines, programs, objects, components, data structures, etc., that perform particular tasks or implement particular abstract data types. While the systems and methods are described in the foregoing context, acts and operations described herein may also be implemented in hardware.
The methods and systems described herein are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use include, but are not limited to, mobile computing devices such as mobile phones and personal digital assistants, personal computers, server computers, multiprocessor systems, microprocessor-based systems, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and so on. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.
With reference to
A computer 1010 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by computer 1010 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 1010.
Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and includes any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example and not limitation, communication media includes wired media such as a wired network or a direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
System memory 1030 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 1031 and random access memory (RAM) 1032. A basic input/output system 1033 (BIOS), containing the basic routines that help to transfer information between elements within computer 1010, such as during start-up, is typically stored in ROM 1031. RAM 1032 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 1020. By way of example and not limitation,
The computer 1010 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only,
The drives and their associated computer storage media discussed above and illustrated in
A user may enter commands and information into the computer 1010 through input devices such as a keyboard 1062 and pointing device 1061, commonly referred to as a mouse, trackball or touch pad. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 1020 through a user input interface 1060 that is coupled to the system bus 1021, but may be connected by other interface and bus structures, such as a parallel port, game port, or a universal serial bus (USB).
A monitor 1091 or other type of display device is also connected to the system bus 1021 via an interface, such as a video interface 1090. In addition to the monitor, computers may also include other peripheral output devices such as speakers 1098 and printer 1096, which may be connected through an output peripheral interface 1095.
The computer 1010 operates in a networked environment using logical connections to one or more remote computers, such as a remote computer 1080. In one implementation, remote computer 1080 represents server 206 of
When used in a LAN networking environment, the computer 1010 is connected to the LAN 1081 through a network interface or adapter 1070. When used in a WAN networking environment, the computer 1010 typically includes a modem 1082 or other means for establishing communications over the WAN 1083, such as the Internet. The modem 1082, which may be internal or external, may be connected to the system bus 1021 via the user input interface 1060, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 1010, or portions thereof, may be stored in the remote memory storage device. By way of example and not limitation,
Although the systems and methods for bi-directional temporal error concealment have been described in language specific to structural features and/or methodological operations or actions, it is understood that the implementations defined in the appended claims are not necessarily limited to the specific features or actions described. For example, although client computing device 202 of