The present invention is directed to video processing techniques and devices. In particular, the present invention is directed to a video encoding system that builds a hierarchy of long term reference frames and adjusts the hierarchy adaptively.
In a video coding system, such as that illustrated in
The resulting compressed sequence (bitstream) is transmitted to a decoder 120 via a channel 130, which can be a transmission medium or a storage device such as an electrical, magnetic or optical storage medium. To recover the video data, the bitstream is decompressed at the decoder 120, which inverts coding processes performed by the encoder and yields a decoded video sequence.
The compressed video data may be transmitted in packets when transmitted over a network. The communication conditions of the network may cause packets of one or more frames to be lost. Lost packets can cause visible errors, and the errors can propagate to subsequent frames if the subsequent frames depend on the frames that suffered packet loss. One solution is for the encoder/decoder to keep the reference frames in a buffer and start using another reference frame (e.g., an earlier reference frame) if a packet loss for the current reference frame is detected. However, due to constraints in buffer sizes, the encoder/decoder is not able to save all the reference frames in the buffer. For error resilience purposes, the encoder can mark certain frames in the bit stream and signal the decoder to store these frames in the buffer until the encoder signals to discard them. Such frames are called long term reference (LTR) frames.
For example, as shown in
Accordingly, there is a need in the art for adjusting the designations of the LTRs adaptively based on channel conditions and quickly stopping the error propagation.
a) is a simplified block diagram of an exemplary encoding system according to an embodiment of the present invention.
b) is a hierarchy of coded frames encoded by an exemplary encoding system according to an embodiment of the present invention.
Embodiments of the present invention provide an encoder that may build a hierarchy of coded frames in the bit stream to improve the video quality and viewing experience when transmitting video data in a channel that is subject to transmission errors. The hierarchy may include “long term reference” (LTR) frames and frames coded to depend from the LTR frames. LTR frames may be provided in the channel on a regular basis (e.g., 1 frame in every 10 frames). The hierarchy, including the frequency of the LTR frames, can be adjusted adaptively based on the channel conditions (e.g., the error rate, error pattern and delay), in order to provide effective error protection at a reasonably small cost. If a channel error does occur and transmitted frames are lost, use of the LTR frames permits the decoder to recover from the transmission error even before the encoder can be notified of the problem.
a) illustrates a simplified block diagram of a video coding/decoding system 200, in which an encoder 210 and decoder 220 are provided in communication via a forward channel 230 and a back channel 240. The encoder 210 may encode video data into a stream of coded frames. The coded frames may be transmitted via the forward channel 230 to the decoder 220, which may decode the coded frames. The coded frames may include LTR frames and frames encoded using LTR frames as prediction references (“LTRP frames”). The coded frames may also include frames that are neither LTR nor LTRP (e.g., frames that are coded using a preceding non-LTR frame as a reference). The decoder 220 may send acknowledgement messages to the encoder 210 via the back channel 240 when LTR frames are received and decoded successfully.
In one embodiment, the encoder 210 may encode source video frames as LTR or LTRP frames at a predetermined rate (e.g., one LTR frame every 10 frames, with the remaining nine frames being LTRP frames encoded using the LTR frame as a reference frame). In a further embodiment, some of the LTRP frames may also be marked as LTR frames (e.g., secondary LTR frames), and each secondary LTR frame may be encoded with reference to a preceding acknowledged LTR frame. The encoder 210 may encode frames subsequent to a secondary LTR frame using the secondary LTR frame as a reference. The decoder 220 may retain each LTR frame (including the secondary LTR frames) in a buffer until instructed to discard it, decode the subsequently received frames according to each frame's reference frame, and report packet losses. The encoder 210 may periodically send instructions to the decoder 220 to manage the decoder's roster of LTR frames, e.g., by identifying a specific LTR frame for eviction from the decoder's cache, or by sending a generic message that causes eviction of all reference frames that occur in coding order prior to a designated frame.
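The two kinds of eviction instruction can be illustrated with a minimal Python sketch of a decoder-side LTR store. The class name `LtrBuffer`, its methods, and the assumption that frame numbers increase in coding order without wraparound are all hypothetical and chosen only for illustration; a real decoder would signal these operations through its bitstream syntax.

```python
from dataclasses import dataclass, field


@dataclass
class LtrBuffer:
    """Hypothetical decoder-side store of long term reference (LTR) frames.

    Frames are keyed by frame number, assumed to increase in coding order
    without wraparound; the decoded picture is kept as an opaque object.
    """
    frames: dict = field(default_factory=dict)

    def store(self, frame_no, picture):
        # Retain the LTR frame until the encoder says to discard it.
        self.frames[frame_no] = picture

    def get(self, frame_no):
        # None means the reference was never received or was evicted.
        return self.frames.get(frame_no)

    def evict(self, frame_no):
        # Instruction 1: discard one specific LTR frame (no-op if absent).
        self.frames.pop(frame_no, None)

    def evict_before(self, frame_no):
        # Instruction 2: discard every LTR frame that precedes the
        # designated frame in coding order.
        for n in [n for n in self.frames if n < frame_no]:
            del self.frames[n]


buf = LtrBuffer()
for n in (80, 101, 106):
    buf.store(n, f"picture-{n}")
buf.evict(101)             # frames now held: 80, 106
buf.evict_before(106)      # frames now held: 106
print(sorted(buf.frames))  # [106]
```

Keeping both instructions lets the encoder choose between a precise, per-frame message and a single compact message that clears a whole prefix of the roster.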
The channels 230, 240 may be provided as respective communication channels in a packet-oriented network. The channel may be provided in a wired communication network (e.g., by fiber optical or electrical physical channels), may be provided in a wireless communication network (e.g., by cellular or satellite communication channels) or by a combination thereof. The channel may be unreliable and packets may be lost. The channel conditions (e.g., the delay time, error rate, error pattern, etc.) may be detected by other service layers (not shown) of the communication network between the encoder 210 and decoder 220.
a) also illustrates a sequence of events for the communication between the encoder 210 and decoder 220 in communication via the channel. As shown in
Upon receipt of the acknowledgement that the LTR frame 80 has been correctly received by the decoder 220, the encoder 210 may encode a subsequent frame 101 using the LTR frame 80 as a reference. Thus, the frame 101 may be an LTRP frame. The encoder 210 may also mark the frame 101 as an LTR frame (e.g., a secondary LTR frame) and transmit it to the decoder 220. Subsequently, the encoder 210 may code a segment of frames using the secondary LTR frame 101 as a reference. The segment may contain a predetermined number of frames, for example, 4 frames.
Thereafter, the encoder 210 may code the next frame (e.g., frame 106) using the LTR frame 80 as a reference. Thus, the frame 106 may be another LTRP frame. The encoder 210 may also mark the frame 106 as a secondary LTR frame and, subsequently, code a segment of frames using the secondary LTR frame 106 as a reference. The segment may contain the predetermined number of frames discussed above, for example, 4 frames.
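The reference assignment described for frames 101 through 110 can be sketched as a small Python function. The function name and its parameters (`top_ltr`, `first_frame`, `segment_len`, `num_segments`) are illustrative assumptions, and the sketch assumes the acknowledgement for LTR frame 80 has already arrived, so every secondary LTR frame is predicted from it.

```python
def three_level_references(top_ltr, first_frame, segment_len, num_segments):
    """Return {frame_number: (reference_frame, is_secondary_ltr)}.

    Each segment starts with a secondary LTR frame predicted from the
    acknowledged top-tier LTR frame; the following `segment_len` frames
    are LTRP frames predicted from that secondary LTR frame.
    """
    refs = {}
    frame = first_frame
    for _ in range(num_segments):
        secondary = frame
        refs[secondary] = (top_ltr, True)  # secondary LTR, from top-tier LTR
        for offset in range(1, segment_len + 1):
            refs[secondary + offset] = (secondary, False)  # LTRP frame
        frame = secondary + segment_len + 1  # next secondary LTR frame
    return refs


refs = three_level_references(top_ltr=80, first_frame=101,
                              segment_len=4, num_segments=2)
for frame, (ref, is_secondary) in sorted(refs.items()):
    print(frame, "->", ref, "(secondary LTR)" if is_secondary else "(LTRP)")
```

With a segment of 4 frames, the sketch reproduces the example above: frames 101 and 106 reference frame 80, frames 102 to 105 reference frame 101, and frames 107 to 110 reference frame 106.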
In one or more embodiments, the decoder 220 may send acknowledgements of successful receipt of subsequent LTR frames (e.g., frames 101, 106) to the encoder 210. If the acknowledgements are received by the encoder 210, the encoder 210 may update its record and start using the most recently acknowledged LTR frame as a reference to code subsequent frames as described above. However, as shown in
The secondary LTR frames 101 and 106 may stop the propagation of any errors that occurred before their arrival, because each is predicted from the acknowledged LTR frame 80 rather than from a recent frame that may have been corrupted. For example, if frame 101 is received correctly, frames 102, 103, 104 and 105 may be decoded correctly as long as no packet loss occurs in any of these frames.
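How a secondary LTR frame contains a loss can be checked with a toy propagation model in Python. The model is an assumption for illustration: a frame is treated as either wholly received or wholly lost, each frame has a single reference, and the acknowledged LTR frame 80 is assumed to be held intact by the decoder. The function name `decodable_frames` is hypothetical.

```python
def decodable_frames(refs, lost, acknowledged=(80,)):
    """Return the frames in `refs` that decode correctly.

    refs maps each frame number to its single reference frame number.
    A frame decodes correctly when it was received (not in `lost`) and
    its reference decoded correctly. Acknowledged LTR frames are assumed
    to be held intact by the decoder.
    """
    ok = set(acknowledged)
    # Visit frames in increasing order; in these hierarchies every
    # reference precedes the frame that uses it.
    for frame in sorted(refs):
        if frame not in lost and refs[frame] in ok:
            ok.add(frame)
    return ok - set(acknowledged)


# Three-level hierarchy: 101 and 106 are secondary LTR frames coded from 80.
refs = {101: 80, 106: 80}
refs.update({n: 101 for n in range(102, 106)})
refs.update({n: 106 for n in range(107, 111)})

print(sorted(decodable_frames(refs, lost={101})))  # [106, 107, 108, 109, 110]
```

Losing frame 101 corrupts only its own segment; frame 106 is predicted from the acknowledged frame 80, so decoding recovers there without any request to the encoder.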
b) illustrates a stream of coded frames, encoded according to a three-level hierarchy 200, to be transmitted from the encoder 210 to the decoder 220. In one or more embodiments, the encoder 210 may adjust the number of hierarchy levels and/or the number of frames in a segment (e.g., by adjusting the predetermined number to change the frequency of secondary LTR frames) according to the channel conditions (e.g., the delay time, error rate, error pattern, etc.). The three-level hierarchy 200 may include a top-tier LTR frame 80. The top-tier LTR frame 80 may be an acknowledged LTR frame (e.g., acknowledgement received by the encoder 210 as shown in
In one or more embodiments, the predetermined number (which determines the frequency of the LTR frames) may be adjusted as needed. For example, if it is nine (9), then there will be a secondary LTR frame based on an acknowledged LTR frame every 10 frames; if it is fourteen (14), then there will be a secondary LTR frame based on an acknowledged LTR frame every 15 frames. The predetermined number thus determines the span of frames without an LTR frame, and this span may be adjusted based on the channel conditions.
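The relationship between the predetermined number and the spacing of secondary LTR frames is simple arithmetic: a predetermined number N yields one secondary LTR frame every N + 1 frames. A short Python sketch, with a hypothetical function name and frame 101 assumed as the first secondary LTR frame:

```python
def secondary_ltr_positions(first_frame, predetermined_number, count):
    """Frame numbers of the first `count` secondary LTR frames when each is
    followed by `predetermined_number` frames that are not LTR frames."""
    period = predetermined_number + 1  # one secondary LTR frame per period
    return [first_frame + i * period for i in range(count)]


print(secondary_ltr_positions(101, 9, 3))   # [101, 111, 121]: every 10 frames
print(secondary_ltr_positions(101, 14, 3))  # [101, 116, 131]: every 15 frames
```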
As described with respect to
In one embodiment, after an acknowledgement is received for a secondary LTR frame, the acknowledged secondary LTR frame may be designated as a new top-tier LTR frame for subsequent coding, and the above hierarchy may be repeated based on the new top-tier LTR frame. Further, the encoder (e.g., encoder 210) may send an instruction to the decoder (e.g., decoder 220) to clear from the decoder's buffer all LTR frames received prior to the new top-tier LTR frame. Alternatively, the encoder need not send such an instruction. As long as the buffer is large enough, keeping multiple top-tier LTR frames gives the encoder the option, when time allows, of choosing the one that may give the best quality.
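Both options can be captured in a small encoder-side record, sketched below in Python. The class `LtrTracker`, the `flush_on_promote` switch, and the `("evict_before", n)` instruction tuple are hypothetical names; the sketch also assumes monotonically increasing frame numbers, so an acknowledgement for a frame older than the current top tier is ignored.

```python
class LtrTracker:
    """Hypothetical encoder-side record of top-tier LTR frames."""

    def __init__(self, top_tier, flush_on_promote=True):
        self.top_tier = top_tier          # current top-tier LTR frame number
        self.retained = [top_tier]        # top-tier frames the decoder still holds
        self.flush_on_promote = flush_on_promote

    def on_ack(self, secondary_ltr):
        """Promote an acknowledged secondary LTR frame to the top tier.

        Returns the instruction to send to the decoder, or None.
        """
        if secondary_ltr <= self.top_tier:
            return None  # stale acknowledgement: keep the newer top tier
        self.top_tier = secondary_ltr
        if self.flush_on_promote:
            # Option 1: tell the decoder to drop all earlier LTR frames.
            self.retained = [secondary_ltr]
            return ("evict_before", secondary_ltr)
        # Option 2: keep older top-tier frames as extra candidates.
        self.retained.append(secondary_ltr)
        return None


tracker = LtrTracker(top_tier=80)
print(tracker.on_ack(101))  # ('evict_before', 101)
```

The flushing variant minimizes decoder buffer use; the retaining variant spends buffer space to keep several acknowledged candidates available.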
As shown in
For the example shown in
In one embodiment, the period may be a number other than 2. For example, if the period is three frames, then each three-frame period under a secondary LTR frame will contain one LTRP frame at the third level and two frames at the fourth level. In this configuration, the 1st and 4th frames after a secondary LTR frame may be coded as LTRP frames using the preceding secondary LTR frame as a reference, the 2nd frame may be coded using the 1st frame as a reference, and the 3rd frame may be coded using the 2nd frame as a reference. An error occurring in any frame after the secondary LTR frame may then propagate from one frame to the next until the next LTRP frame.
In another embodiment, the predetermined number may also be a number other than 2. For example, if it is three (3), then there may be three LTRP frames under each secondary LTR frame. In the embodiments described above, the predetermined number may determine the span of frames without an LTR frame, and this span may be adjusted based on the channel conditions.
At the fourth level, frames are coded using a preceding frame as a reference; thus, fourth-level frames are not LTRP frames. For example, frames 103, 105, 108 and 110 are coded using LTRP frames 102, 104, 107 and 109 as references, respectively. Although the hierarchy 300 shows three tiers of LTR frames, in one or more embodiments, an encoder according to the present invention may encode the video data in more tiers according to the channel conditions.
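The four-level assignment, with a configurable period and number of LTRP frames, can be sketched as follows. The function name and parameters (`period`, `num_ltrp`) are hypothetical; the defaults of 2 and 2 correspond to the hierarchy 300 example, in which four frames follow each secondary LTR frame.

```python
def four_level_segment(secondary_ltr, period=2, num_ltrp=2):
    """Map each frame following a secondary LTR frame to its reference.

    The 1st, (1 + period)-th, (1 + 2 * period)-th, ... frames are LTRP
    frames predicted from the secondary LTR frame (third level); the
    frames in between are fourth-level frames predicted from the
    immediately preceding frame.
    """
    refs = {}
    for i in range(period * num_ltrp):
        frame = secondary_ltr + 1 + i
        if i % period == 0:
            refs[frame] = secondary_ltr  # third level: LTRP frame
        else:
            refs[frame] = frame - 1      # fourth level: chained to previous frame
    return refs


print(four_level_segment(101))            # {102: 101, 103: 102, 104: 101, 105: 104}
print(four_level_segment(101, period=3))  # 1st and 4th frames are LTRP frames
```

With `period=3`, frames 102 and 105 reference frame 101, while 103, 104, 106 and 107 each reference the frame before them, matching the three-frame-period example; a loss inside a chain is contained at the next LTRP frame.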
Adjustment of the Hierarchy According to Channel Conditions
In an embodiment of the present invention, the number of hierarchy levels and the number and distribution of frames in each hierarchy level may be adjusted according to channel conditions, including the delay time, error rate, error pattern, etc., in order to achieve different trade-offs between error resilience and frame quality. For example, with respect to the four-level hierarchy 300 described above, the number of frames at the fourth level may be increased or decreased based on channel conditions. Further, the frequency of the LTR frames may be adjusted (e.g., one LTR frame in every 5 frames, or one in every 10 frames). In addition, the number of LTR tiers may be adjusted (e.g., more tiers of LTR frames may be added, beyond the top tier and second tier described above, when needed).
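One way such an adjustment might look is a lookup from a measured packet loss rate to hierarchy parameters. The sketch below is purely hypothetical: the `HierarchyConfig` type, the `choose_config` function, and especially the 1% and 5% thresholds are illustrative placeholders, not values taken from the description. It only shows the direction of the trade-off: cleaner channels get longer prediction chains (fewer bits), lossier channels get more frames predicted directly from LTR frames.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HierarchyConfig:
    period: int    # 1 means every frame is an LTRP frame (no fourth level)
    num_ltrp: int  # LTRP frames under each secondary LTR frame

    @property
    def segment_len(self):
        # Frames between consecutive secondary LTR frames.
        return self.period * self.num_ltrp


def choose_config(loss_rate):
    """Hypothetical policy mapping a packet loss rate in [0, 1] to hierarchy
    parameters. The thresholds are illustrative placeholders only."""
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be in [0, 1]")
    if loss_rate < 0.01:
        return HierarchyConfig(period=3, num_ltrp=3)  # long chains, sparse LTRs
    if loss_rate < 0.05:
        return HierarchyConfig(period=2, num_ltrp=2)  # the hierarchy 300 example
    return HierarchyConfig(period=1, num_ltrp=2)      # every frame is LTRP


for loss in (0.001, 0.02, 0.10):
    cfg = choose_config(loss)
    print(loss, cfg.period, cfg.num_ltrp, cfg.segment_len)
```

A practical policy would also weigh delay and error pattern, which this sketch ignores.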
In another embodiment, the distance between two secondary LTR frames may be kept shorter than the channel round-trip delay, in order to recover from packet loss faster than the “refresh frame request” mechanism, in which the receiver requests a refresh frame upon packet loss and the encoder, after receiving the request, sends a refresh frame (for example, an instantaneous decoding refresh (IDR) frame) to stop the error propagation. Because a refresh frame cannot arrive until at least one round trip after the loss is detected, a secondary LTR frame spaced more closely than the round-trip delay will typically arrive first.
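Converting the round-trip constraint into a frame count is a one-line calculation: the spacing in frames must be strictly less than RTT × frame rate. A Python sketch, where the function name and the millisecond/frames-per-second units are assumptions for illustration:

```python
import math


def max_secondary_ltr_spacing(rtt_ms, fps):
    """Largest spacing, in frames, between secondary LTR frames that keeps
    the spacing strictly shorter than one channel round-trip time.

    Never returns less than 1; if the RTT is under one frame interval the
    constraint cannot be met and every frame would be a secondary LTR frame.
    """
    rtt_frames = rtt_ms * fps / 1000.0  # round trip expressed in frames
    return max(1, math.ceil(rtt_frames) - 1)


print(max_secondary_ltr_spacing(200, 30))  # 5
```

For example, a 200 ms round trip at 30 frames per second spans 6 frame intervals, so a secondary LTR frame at most every 5 frames (about 167 ms) satisfies the constraint.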
Stopping Error Propagation
In both of the hierarchies 200 and 300 shown in
Hierarchy 200 may have more overhead (more cost for coding, transmission and/or decoding) than hierarchy 300. In hierarchy 200, for example, each of frames 102, 103, 104 and 105 may be coded with reference to the LTR frame 101. Frames 103, 104 and 105 are farther away from the reference frame 101 and thus may need more bits to code. In hierarchy 300, however, frames 103 and 105 are coded using the immediately preceding frame as a reference frame and thus may need fewer bits to code.
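A rough way to compare the two hierarchies is to sum the temporal distance between each frame and its reference. This is only a proxy assumed for illustration: actual bit cost depends on content and motion, not on frame distance alone. The Python sketch below uses the segment under secondary LTR frame 101 from both hierarchies.

```python
def total_reference_distance(refs):
    """Sum of (frame - reference) over all frames: a rough proxy for how far,
    in time, each frame must predict from."""
    return sum(frame - ref for frame, ref in refs.items())


# Frames 102-105 under secondary LTR frame 101.
hierarchy_200 = {n: 101 for n in range(102, 106)}
hierarchy_300 = {102: 101, 103: 102, 104: 101, 105: 104}

print(total_reference_distance(hierarchy_200))  # 10
print(total_reference_distance(hierarchy_300))  # 6
```

Hierarchy 300 halves the prediction distance for frames 103 and 105 at the cost of letting a loss in frame 102 or 104 propagate to the following frame.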
As shown in
The video decoding system 650 may include a decoding engine 660, a reference frame cache 670 and a post-processor 690. The decoding engine 660 may parse coded video data received from the encoder and perform decoding operations that recover a replica of the source video sequence. The reference frame cache 670 may store decoded data of reference frames previously decoded by the decoding engine 660, which may be used as prediction references for other frames to be recovered from later-received coded video data. The post-processor 690 may condition the recovered video data for rendering on a display device.
The stream of coded frames may be a stream representing the hierarchy 200 shown in
During operation, the coding engine 620 may dynamically select coding parameters for video (e.g., selecting reference frames, computing motion vectors and selecting quantization parameters), which are transmitted to the decoding engine 660 as part of channel data; selection of coding parameters may be performed by a coding controller (not shown). Similarly, selection of pre-processing operation(s) to be performed on the source video may change dynamically in response to changes in the source video. Such selection of pre-processing operations may also be administered by the coding controller.
As noted, in the video coding system 600, the reference frame cache 630 may store decoded video data of a predetermined number n of reference frames (for example, n=16). The reference frames may have been previously coded by the coding engine 620 then decoded and stored in the reference frame cache 630. Many coding operations are lossy processes, which cause decoded frames to be imperfect replicas of the source frames that they represent. By storing decoded reference frames in the reference frame cache, the video coding system 600 may store recovered video as it will be obtained by the decoding engine 660 when the channel data is decoded; for this purpose, the coding engine 620 may include a video decoder (not shown) to generate recovered video data from coded reference frame data. As illustrated in
In the video decoding system 650, the reference frame cache 670 may store decoded video data of frames identified in the channel data as reference frames. For example,
The post-processor 690 may perform additional video processing to condition the recovered video data for rendering, commonly at a display device. Typical post-processing operations may include applying deblocking filters, edge detection filters, ringing filters and the like. The post-processor 690 may output a recovered video sequence that may be rendered on a display device or stored to memory for later retrieval and display.
As discussed above, the foregoing embodiments provide a coding/decoding system that builds a hierarchy of coded frames in the bit stream to protect the bit stream against transmission errors. The techniques described above find application in both software- and hardware-based coders. In a software-based coder, the functional units may be implemented on a computer system (commonly, a server, personal computer or mobile computing platform) executing program instructions corresponding to the functional blocks and methods described in the foregoing figures. The program instructions themselves may be stored in a storage device, such as an electrical, optical or magnetic storage medium, and executed by a processor of the computer system. In a hardware-based coder, the functional blocks illustrated hereinabove may be provided in dedicated functional units of processing hardware, for example, digital signal processors, application specific integrated circuits, field programmable logic arrays and the like. The processing hardware may include state machines that perform the methods described in the foregoing discussion. The principles of the present invention also find application in hybrid systems of mixed hardware and software designs.
In an embodiment, the channel may be a wired communication channel as may be provided by a communication network or computer network. Alternatively, the communication channel may be a wireless communication channel exchanged by, for example, satellite communication or a cellular communication network. Still further, the channel may be embodied as a storage medium including, for example, magnetic, optical or electrical storage devices.
Those skilled in the art may appreciate from the foregoing description that the present invention may be implemented in a variety of forms, and that the various embodiments may be implemented alone or in combination. Therefore, while the embodiments of the present invention have been described in connection with particular examples thereof, the true scope of the embodiments and/or methods of the present invention should not be so limited since other modifications will become apparent to the skilled practitioner upon a study of the drawings, specification, and following claims.
The present application claims the benefit of US Provisional application, Ser. No. 61/321,811, filed Apr. 7, 2010, entitled “ERROR RESILIENT HIERARCHICAL LONG TERM REFERENCE FRAMES,” the disclosure of which is incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
61321811 | Apr 2010 | US