Predictive coding of video data can improve coding efficiency. However, predictive coding can cause “drift” when some video data is lost in transmission (such as by not arriving at all or arriving too late). “Drift” refers to the propagation of errors from missing data into subsequent frames. For example, when a first video frame (sometimes referred to as a picture) is lost, a second frame that follows the first frame may be coded using prediction that references that first frame. Accordingly, the decoding computer system may be unable to correctly decode that second frame. A third frame may be coded using prediction that references that second frame, and so forth. Indeed, the error from the lost frame (i.e., a frame where at least a portion of the data for the frame was lost) may get worse as subsequent frames are decoded, due to the reliance of the predictive coding on the lost frame. In a conferencing system, intra-coded frames may be inserted in the bitstream to combat this drift problem. For example, intra-coded frames may be periodically inserted in the bitstream. As another example, a coding computer system may dynamically insert an intra-coded frame when that coding computer system is informed that data from the bitstream has been lost.
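As a non-limiting illustration, the drift behavior described above can be sketched in a few lines of Python. The sketch assumes a simplified chain in which every predicted frame references only the immediately preceding frame and an intra-coded frame references no other frame. The function name `frames_affected_by_loss` and the "I"/"P" labels are hypothetical and are not taken from any codec.

```python
def frames_affected_by_loss(frame_types, lost_index):
    """Return indices of frames that cannot be decoded correctly.

    frame_types is a list of "I" (intra-coded) or "P" (predicted from the
    immediately preceding frame). The single-reference chain is an
    assumption made purely for illustration.
    """
    affected = [lost_index]
    for i in range(lost_index + 1, len(frame_types)):
        if frame_types[i] == "I":
            break  # an intra-coded frame references no earlier frame, so drift stops
        affected.append(i)  # this frame depends, directly or indirectly, on the lost frame
    return affected


# Intra-coded frames inserted periodically (every 5 frames); frame 2 is lost.
types = ["I", "P", "P", "P", "P", "I", "P", "P", "P", "P"]
print(frames_affected_by_loss(types, 2))  # [2, 3, 4]
```

In this simplified model the error persists until the next intra-coded frame, which is why periodic or dynamically inserted intra-coded frames limit drift.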
The disclosure relates to dynamically inserting synchronization predicted video frames. As used herein, dynamically inserted synchronization video frames are video frames that are inserted dynamically and avoid having predictions that rely on specified data, such as lost data. Because these dynamically inserted frames can be predictively coded with reference to previous frames, the frames may be more efficient than comparable intra-coded frames. However, the synchronization predicted video frames can allow for synchronization to cut off drift by avoiding predictions that reference lost data.
In one embodiment, the tools and techniques can include an encoding computer system encoding and sending a video bitstream over a computer network to a decoding computer system. The bitstream can follow a regular prediction structure when the encoding computer system is not notified of lost data from the bitstream. The encoding computer system can receive a notification of lost data in the bitstream. The lost data can include at least a portion of a reference frame of the bitstream. Also, the encoding computer system can respond to the notification by dynamically encoding a synchronization predicted frame with a prediction that references one or more other previously-sent frames in the bitstream and that does not reference the lost data. The encoding computer system can insert the synchronization predicted frame in the bitstream in a position where the regular prediction structure would have dictated inserting a different predicted frame with a prediction that would have referenced the lost data according to the regular prediction structure.
This Summary is provided to introduce a selection of concepts in a simplified form. The concepts are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Similarly, the invention is not limited to implementations that address the particular techniques, tools, environments, disadvantages, or advantages discussed in the Background, the Detailed Description, or the attached drawings.
Embodiments described herein are directed to techniques and tools for improved encoding of video bitstreams when a coding computer system is informed that data from a bitstream has been lost. Such improvements may result from the use of various techniques and tools separately or in combination.
Such techniques and tools may include dynamically inserting different types of synchronization predicted video frames for different types of regular prediction structures. For example, in a bitstream with periodic key frames, which have predictions that are limited to referring to other key frames, a key frame can be dynamically inserted when a coding computer system is notified of lost data from the bitstream. As another example, in a bitstream without periodic key frames but that allows long term reference key frames, a long term reference key frame can be dynamically inserted when a coding computer system is notified of lost data from the bitstream. Long term reference key frames are key frames that are kept in an active frame window (the window of frames that are to be kept in a decoder frame buffer) for longer than regular frames. For example, a long term reference key frame may be kept in the active frame window until a coding computer system sends an explicit notification to remove that key frame from the active frame window. As another example, in a bitstream with a base layer and an enhancement layer, the base layer of a frame can be coded using prediction that references a previous frame's base layer but does not reference the previous frame's enhancement layer, and the enhancement layer of a frame can be coded using prediction that references a previous frame's enhancement layer but does not reference the previous frame's base layer. With this regular prediction structure, when a coding computer system is informed that data from a frame's enhancement layer has been lost but data from the frame's base layer has not been lost, the coding computer system can dynamically insert an anchor frame.
As used herein, an anchor frame is a frame where the base layer is predictively coded with prediction that references a previous frame's base layer and an enhancement layer is intra-coded so that the enhancement layer only references other layers (e.g., the base layer) within that same frame.
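By way of example, and not limitation, the selection among the frame types described above might be sketched as follows. The enumeration `Structure`, the function `choose_sync_frame`, and the returned string labels are all hypothetical names introduced for illustration. The fallback to an IDR-type key frame when base layer data is lost is an assumption drawn from the base-layer loss example elsewhere in this description.

```python
from enum import Enum


class Structure(Enum):
    """Hypothetical labels for the regular prediction structures discussed above."""
    PERIODIC_KEY_FRAMES = 1      # periodic key frames that reference only other key frames
    LONG_TERM_KEY_FRAMES = 2     # no periodic key frames, long term reference key frames allowed
    SEPARATE_LAYER_CHAINS = 3    # base references base, enhancement references enhancement


def choose_sync_frame(structure, base_lost=False, enhancement_lost=False):
    """Pick a type of synchronization frame to insert after a loss notice."""
    if structure is Structure.PERIODIC_KEY_FRAMES:
        return "key_frame"
    if structure is Structure.LONG_TERM_KEY_FRAMES:
        return "long_term_reference_key_frame"
    if structure is Structure.SEPARATE_LAYER_CHAINS:
        if enhancement_lost and not base_lost:
            # Base layer chain is intact, so only the enhancement layer is re-anchored.
            return "anchor_frame"
        # Assumed fallback when base layer data is lost.
        return "idr_key_frame"
    raise ValueError(f"unknown structure: {structure!r}")


print(choose_sync_frame(Structure.SEPARATE_LAYER_CHAINS, enhancement_lost=True))
# anchor_frame
```

An actual coding computer system could combine these choices or use other criteria; the sketch only mirrors the three examples given above.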
Accordingly, one or more benefits may be realized from the tools and techniques described herein. For example, the dynamic insertion of synchronization predicted video frames can allow for synchronization to cut off drift, while preserving some efficiency by using some predictive coding where the prediction references data from previous frames.
The subject matter defined in the appended claims is not necessarily limited to the benefits described herein. A particular implementation of the invention may provide all, some, or none of the benefits described herein. Although operations for the various techniques are described herein in a particular, sequential order for the sake of presentation, it should be understood that this manner of description encompasses rearrangements in the order of operations, unless a particular ordering is required. For example, operations described sequentially may in some cases be rearranged or performed concurrently. Moreover, for the sake of simplicity, flowcharts may not show the various ways in which particular techniques can be used in conjunction with other techniques.
Techniques described herein may be used with one or more of the systems described herein and/or with one or more other systems. For example, the various procedures described herein may be implemented with hardware or software, or a combination of both. For example, dedicated hardware implementations, such as application specific integrated circuits, programmable logic arrays and other hardware devices, can be constructed to implement at least a portion of one or more of the techniques described herein. Applications that may include the apparatus and systems of various embodiments can broadly include a variety of electronic and computer systems. Techniques may be implemented using two or more specific interconnected hardware modules or devices with related control and data signals that can be communicated between and through the modules, or as portions of an application-specific integrated circuit. Additionally, the techniques described herein may be implemented by software programs executable by a computer system. As an example, implementations can include distributed processing, component/object distributed processing, and parallel processing. Moreover, virtual computer system processing can be constructed to implement one or more of the techniques or functionality, as described herein.
The computing environment (100) is not intended to suggest any limitation as to scope of use or functionality of the invention, as the present invention may be implemented in diverse general-purpose or special-purpose computing environments.
With reference to
Although the various blocks of
A computing environment (100) may have additional features. In
The storage (140) may be removable or non-removable, and may include computer-readable storage media such as magnetic disks, magnetic tapes or cassettes, CD-ROMs, CD-RWs, DVDs, or any other medium which can be used to store information and which can be accessed within the computing environment (100). The storage (140) stores instructions for the software (180).
The input device(s) (150) may be a touch input device such as a keyboard, mouse, pen, or trackball; a voice input device; a scanning device; a network adapter; a CD/DVD reader; or another device that provides input to the computing environment (100). The output device(s) (160) may be a display, printer, speaker, CD/DVD-writer, network adapter, or another device that provides output from the computing environment (100).
The communication connection(s) (170) enable communication over a communication medium to another computing entity. Thus, the computing environment (100) may operate in a networked environment using logical connections to one or more remote computing devices, such as a personal computer, a server, a router, a network PC, a peer device or another common network node. The communication medium conveys information such as data or computer-executable instructions or requests in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired or wireless techniques implemented with an electrical, optical, RF, infrared, acoustic, or other carrier.
The tools and techniques can be described in the general context of computer-readable media, which may be storage media or communication media. Computer-readable storage media are any available storage media that can be accessed within a computing environment, but the term computer-readable storage media does not refer to propagated signals per se. By way of example, and not limitation, with the computing environment (100), computer-readable storage media include memory (120), storage (140), and combinations of the above.
The tools and techniques can be described in the general context of computer-executable instructions, such as those included in program modules, being executed in a computing environment on a target real or virtual processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Computer-executable instructions for program modules may be executed within a local or distributed computing environment. In a distributed computing environment, program modules may be located in both local and remote computer storage media.
For the sake of presentation, the detailed description uses terms like “determine,” “choose,” “adjust,” and “operate” to describe computer operations in a computing environment. These and other similar terms are high-level abstractions for operations performed by a computer, and should not be confused with acts performed by a human being, unless performance of an act by a human being (such as a “user”) is explicitly noted. The actual computer operations corresponding to these terms vary depending on the implementation.
It is possible that data in the bitstream (220) can be lost during transmission to the decoding computer system(s) (230). For example, the lost data may be delayed so that the data arrives at the decoding computer system(s) too late to be used, or the data may never arrive at the decoding computer system(s). Either way, the data can be considered to be lost. The decoding computer system(s) (230) and/or the transmission server (250) may send one or more loss notices (260) to the coding computer system (210) to identify the lost data (e.g., identifying which frames and/or frame layers included all or part of the lost data). The loss notices (260) may be sent in transmissions using the same protocol as the bitstream (220), or in one or more out-of-band communications. Upon receiving such a loss notice (260), the coding computer system (210) can code and insert in the bitstream (220) a synchronization predicted video frame (270) that is coded using prediction, but where the prediction does not directly or indirectly reference the data identified in the loss notice (260) (the prediction does not reference the data identified in the loss notice (260), does not reference data that itself references the data identified in the loss notice (260), etc.). Such a video frame (270) can be used to synchronize the decoding computer system(s) (230) with the coding computer system (210), which may have been out of synchronization due to the lost data.
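The requirement that the prediction not reference the lost data "directly or indirectly" can be illustrated with a short sketch. It assumes that frames are identified by integers in sending order and that a prediction can reference only earlier frames, so a single pass in that order finds every frame that depends on the lost data. The names `tainted_frames` and `clean_reference_candidates` are hypothetical.

```python
def tainted_frames(references, lost):
    """Return frames that are lost or that reference lost data, directly or indirectly.

    references maps a frame id to the set of frame ids that its prediction
    references directly. Frame ids are assumed to increase in sending order,
    and references are assumed to point only to earlier frames.
    """
    tainted = set(lost)
    for frame in sorted(references):
        if references[frame] & tainted:
            tainted.add(frame)  # depends on a frame already known to be affected
    return tainted


def clean_reference_candidates(references, lost):
    """Previously sent frames that a synchronization predicted frame may reference."""
    tainted = tainted_frames(references, lost)
    return [frame for frame in sorted(references) if frame not in tainted]


# Frames 0..4 were already sent; each predicted frame references its predecessor.
sent = {0: set(), 1: {0}, 2: {1}, 3: {2}, 4: {3}}
print(sorted(tainted_frames(sent, {2})))       # [2, 3, 4]
print(clean_reference_candidates(sent, {2}))   # [0, 1]
```

In this example a loss notice for frame 2 arrives after frames 3 and 4 have been sent, so a synchronization predicted frame inserted as frame 5 could reference frame 0 or frame 1, but not frames 2 through 4.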
Accordingly, the video frame (270) can cut off drift that may have occurred due to the lost data. The synchronization predicted video frame (270) may be the next frame after a frame that includes the lost data, or it may be some later frame. For example, the coding computer system (210) may not receive the loss notice (260) until the coding computer system (210) has already coded and sent one or more subsequent frames in the bitstream (220). For the intervening frames between the frame that includes the lost data and the synchronization predicted video frame (270), the decoding computer system (230) may take measures to avoid or decrease the adverse effects from the reliance on lost data, such as by dropping or concealing those intervening frames.
The coding computer system (210) may use different techniques and/or different types of inserted synchronization predicted video frames (270) to allow for synchronization. For scalable coded video such as H.264 SVC, performance may be improved by analyzing the location of the loss (such as by receiving a notice of data loss and a location (e.g., which frame and/or which layer) of the data loss) and inserting appropriate synchronization information based on the inter-layer dependency and predictive coding structure. In the following description, dynamic synchronization video frame insertion will be discussed with reference to some predictive coding structures from H.264 SVC as an example, although the tools and techniques can be applied to other standards as well. Some examples of such techniques and tools will now be discussed with reference to
Referring now to
The regular prediction structure (300) can start with an instantaneous decoding refresh (IDR) type key frame (310) (frame 0), which is an intra-coded key frame. A key frame, as used herein, is a frame that is limited to having no inter-frame predictions or only having inter-frame predictions that reference other key frames. The IDR-type key frame (310) can have intra-frame prediction, with an enhancement layer (304) (such as a quality enhancement layer) being coded with a prediction that references (directly or indirectly) the base layer of that frame and possibly lower enhancement layers of that frame. Additionally, an IDR-type key frame (310) can signal that subsequent frames should not include prediction references to frames prior to the IDR-type key frame (310).
The IDR-type key frame (310) can be followed by regular predicted frames (330) (frames 1, 2, 3, 4, 6, 7, 8, and 9). The regular predicted frames (330) can each include a base layer (302) and an enhancement layer (304). Each base layer (302) of a regular predicted frame (330) can be coded with a prediction that references a highest enhancement layer of the previous frame. Each enhancement layer (304) of a regular predicted frame (330) can be coded with a prediction that also references the highest enhancement layer of the previous frame, and that references the base layer (302) and/or one or more lower enhancement layers of that same frame.
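The references of a regular predicted frame (330) described above can be written out as a small sketch. It assumes a single enhancement layer per frame, so that the "highest enhancement layer" is simply that one layer; the layer labels and the function name `regular_predicted_refs` are hypothetical.

```python
BASE, ENH = "base", "enhancement"


def regular_predicted_refs(n):
    """Direct prediction references for the layers of regular predicted frame n.

    Each layer is identified as a (frame number, layer) pair. A single
    enhancement layer per frame is assumed for illustration.
    """
    if n < 1:
        raise ValueError("a regular predicted frame needs a previous frame")
    return {
        # The base layer references the highest enhancement layer of the previous frame.
        (n, BASE): {(n - 1, ENH)},
        # The enhancement layer references the same previous-frame layer
        # and the base layer of its own frame.
        (n, ENH): {(n - 1, ENH), (n, BASE)},
    }


refs = regular_predicted_refs(3)
print(sorted(refs[(3, BASE)]))  # [(2, 'enhancement')]
print(sorted(refs[(3, ENH)]))   # [(2, 'enhancement'), (3, 'base')]
```

Under this structure, losing the enhancement layer of one frame affects both layers of the next regular predicted frame, because both reference that enhancement layer.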
The regular prediction structure (300) of
Referring still to
Referring still to
Referring now to
The bottom of
Referring now to
The bottom of
In response to receiving a notification of lost data (560) in the base layer (502) of frame 9, the coding computer system can code and insert an IDR-type key frame (510) as frame 10 to cut off drift from the lost data (560) in frame 9.
Note that combinations of the above types of synchronization predicted frames could be used. For example, combinations of anchor predicted frames (520) and long term predicted key frames (420) could be used to deal with losses in the same bitstream (e.g., in an H.264 CGS bitstream).
Several techniques for dynamic insertion of synchronization predicted video frames will now be discussed. Each of these techniques can be performed in a computing environment. For example, each technique may be performed in a computer system that includes at least one processor and memory including instructions stored thereon that when executed by at least one processor cause at least one processor to perform the technique (memory stores instructions (e.g., object code), and when processor(s) execute(s) those instructions, processor(s) perform(s) the technique). Similarly, one or more computer-readable storage media may have computer-executable instructions embodied thereon that, when executed by at least one processor, cause at least one processor to perform the technique.
Referring to
The synchronization predicted frame can include a predicted key frame whose prediction is limited to referencing one or more other key frames prior to the reference frame. The prediction of the predicted key frame can reference one or more intra-coded key frames (e.g. IDR frame(s)) and/or one or more predicted key frames.
The synchronization predicted frame can be a long term predicted key frame whose prediction is limited to referencing one or more other key frames prior to the lost data. The long term predicted key frame can reference one or more other long term key frames.
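One possible way to choose the reference for such a predicted key frame is sketched below. The sketch assumes that frame numbers increase in sending order and that every listed key frame is still held in the active frame window of the decoder. Choosing the most recent earlier key frame is an illustrative design choice, and the name `pick_key_frame_reference` is hypothetical.

```python
def pick_key_frame_reference(key_frames, lost_frame):
    """Return the latest key frame sent before the frame with lost data.

    key_frames is an iterable of frame numbers of key frames (or long term
    key frames) assumed to still be available to the decoder. Returns None
    when no earlier key frame exists, in which case an intra-coded frame
    could be used instead.
    """
    earlier = [k for k in key_frames if k < lost_frame]
    return max(earlier) if earlier else None


print(pick_key_frame_reference([0, 5], lost_frame=7))  # 5
print(pick_key_frame_reference([0, 5], lost_frame=3))  # 0
print(pick_key_frame_reference([0, 5], lost_frame=0))  # None
```

Because key frames reference only other key frames, a key frame sent before the lost data does not depend on lost data in a later non-key frame.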
The different predicted frame that would have been coded and sent under the regular structure may have been a frame that would have referenced an enhancement layer of the reference frame. The synchronization predicted frame may be a key frame whose prediction references one or more key frames prior to the lost data. The lost data may include at least a portion of the enhancement layer of the reference frame and/or at least a portion of a base layer of the reference frame. The base layer can be referenced by prediction of the enhancement layer.
The lost data can include at least a portion of a lost enhancement layer, and the synchronization predicted frame can include an enhancement layer that references a base layer of the synchronization predicted frame. A prediction of the enhancement layer of the synchronization predicted frame can avoid referencing the lost enhancement layer. Additionally, an enhancement layer of the different predicted frame would have referenced the enhancement layer that includes at least a portion of the lost data. A prediction of the base layer of the synchronization predicted frame may reference the base layer of a frame that includes at least a portion of the lost data.
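The layer references of such a synchronization predicted frame can be sketched as follows. The sketch assumes one base layer and one enhancement layer per frame, and it checks direct references only, on the assumption that base layers reference only earlier base layers so that the base layer chain is unaffected by an enhancement layer loss. The names `anchor_frame_refs` and `references_lost_data` are hypothetical.

```python
BASE, ENH = "base", "enhancement"


def anchor_frame_refs(n):
    """Direct prediction references for the layers of an anchor frame n."""
    return {
        (n, BASE): {(n - 1, BASE)},  # base layer predicted from the previous base layer
        (n, ENH): {(n, BASE)},       # enhancement layer references only its own frame
    }


def references_lost_data(refs, lost):
    """True if any layer directly references a (frame, layer) pair in lost."""
    return any(targets & lost for targets in refs.values())


lost = {(4, ENH)}  # enhancement layer of frame 4 was lost, base layer arrived
print(references_lost_data(anchor_frame_refs(5), lost))  # False

# A frame whose enhancement layer references the previous enhancement layer
# would have depended on the lost data.
regular = {(5, BASE): {(4, BASE)}, (5, ENH): {(4, ENH)}}
print(references_lost_data(regular, lost))  # True
```

The anchor frame still benefits from inter-frame prediction in its base layer, while its enhancement layer restarts from data the decoding computer system is known to have.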
Referring now to
Inserting the synchronization predicted frame can include inserting the synchronization predicted frame in the bitstream in a position where the regular prediction structure would have dictated inserting a different predicted frame with a prediction that would have referenced the lost data according to the regular prediction structure. The different frame can be a frame that would have referenced an enhancement layer of the reference frame. For example, the enhancement layer of the reference frame may be a quality enhancement layer or a spatial enhancement layer. The lost data may include at least a portion of the enhancement layer, and the prediction of the synchronization predicted frame may reference a base layer below the enhancement layer without referencing the enhancement layer. The synchronization predicted frame may include a key frame whose prediction references one or more key frames prior to the lost data in the bitstream.
Referring now to
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.