Video conferencing over high-latency, high-jitter, and low-bandwidth networks is a challenging problem, especially when network conditions change dynamically. To smooth out the impact of network condition changes, a buffer can be implemented between the network layer and the codec layer. Based on an estimate of current network conditions, the network layer sets buffer parameters for the encoder. The encoder then may calculate buffer status based on these parameters and encode a source video sequence using coding parameters that are based in part on the buffer condition. Once the encoder codes a frame, it outputs the frame to a transmit buffer for use by the network layer. The encoder employs predictive coding techniques to reduce the bandwidth of the coded video signal. These predictive techniques rely on an implicit guarantee that all coded data, once generated by the encoder, will be transmitted by the network layer.
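By way of illustration only, the following Python sketch shows how an encoder might model buffer status from network-supplied buffer parameters and coarsen its coding parameters as the modeled buffer fills. All class names, fields, and thresholds here are hypothetical; this is a simplified leaky-bucket style model, not the method of any particular codec.

```python
# Hypothetical sketch: encoder-side rate control driven by buffer parameters
# supplied by the network layer. Names and thresholds are illustrative.

from dataclasses import dataclass

@dataclass
class BufferParams:
    capacity_bits: int      # buffer size derived from the network estimate (assumed)
    drain_rate_bps: int     # channel rate at which the buffer is assumed to drain

class BufferModel:
    """Leaky-bucket style model of the transmit buffer as seen by the encoder."""
    def __init__(self, params: BufferParams):
        self.params = params
        self.fullness_bits = 0

    def update(self, coded_frame_bits: int, frame_interval_s: float) -> float:
        # Add the bits just produced, drain at the assumed channel rate,
        # and return fullness as a fraction of capacity.
        self.fullness_bits += coded_frame_bits
        self.fullness_bits -= int(self.params.drain_rate_bps * frame_interval_s)
        self.fullness_bits = max(0, min(self.fullness_bits, self.params.capacity_bits))
        return self.fullness_bits / self.params.capacity_bits

def pick_quantizer(fullness: float, base_qp: int = 26) -> int:
    """Coarsen quantization as the modeled buffer fills (illustrative policy)."""
    if fullness > 0.8:
        return min(51, base_qp + 8)
    if fullness > 0.5:
        return min(51, base_qp + 4)
    return base_qp

if __name__ == "__main__":
    model = BufferModel(BufferParams(capacity_bits=2_000_000, drain_rate_bps=1_000_000))
    fullness = model.update(coded_frame_bits=120_000, frame_interval_s=1 / 30)
    print("fullness:", round(fullness, 3), "qp:", pick_quantizer(fullness))
```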
However, due to potentially rapid changes in network conditions, e.g., link failure, bandwidth reduction, and high feedback latency, the buffer condition may not match the instantaneous network condition. For example, during a video conference session, bandwidth may drop significantly in a short period of time. In this case, conventional coding systems require the network layer to transmit all frames generated by the encoder even though network bandwidth has dropped materially. This operation may contribute to degraded performance precisely when network conditions worsen. Accordingly, there is a need for a video coder and control system that responds dynamically to changes in network conditions.
Embodiments of the present invention provide techniques for adapting buffered video to network condition changes. Video data may be coded as reference data and non-reference data. According to the embodiments, non-reference frames may be detected in buffered video while awaiting transmission to a network. When network degradation is detected, one or more of the buffered non-reference frames may be dropped. Information about the dropped frames may be passed to an encoder for updating buffer parameters for future encoding. In this manner, a video coding system may provide faster responses to changing network conditions.
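The following sketch illustrates one possible realization of this flow, with hypothetical types and names: non-reference frames are identified in the transmit queue, dropped when degradation is signaled, and a summary of the dropped data is handed back to the encoder so it can update its buffer model.

```python
# Illustrative sketch of dropping buffered non-reference frames on network
# degradation; class and field names are hypothetical.

from dataclasses import dataclass
from typing import List

@dataclass
class CodedFrame:
    frame_id: int
    size_bits: int
    is_reference: bool   # reference frames must survive; non-reference may be dropped

class TransmitBuffer:
    def __init__(self):
        self.queue: List[CodedFrame] = []

    def push(self, frame: CodedFrame) -> None:
        self.queue.append(frame)

    def drop_non_reference(self) -> List[CodedFrame]:
        """Remove non-reference frames awaiting transmission and return them."""
        dropped = [f for f in self.queue if not f.is_reference]
        self.queue = [f for f in self.queue if f.is_reference]
        return dropped

class Encoder:
    def __init__(self):
        self.buffer_fullness_bits = 0

    def on_frames_dropped(self, dropped: List[CodedFrame]) -> None:
        # Credit the dropped bits back to the buffer model so future coding
        # decisions reflect what will actually be transmitted.
        self.buffer_fullness_bits -= sum(f.size_bits for f in dropped)
        self.buffer_fullness_bits = max(0, self.buffer_fullness_bits)

def on_network_degraded(buf: TransmitBuffer, enc: Encoder) -> None:
    dropped = buf.drop_non_reference()
    enc.on_frames_dropped(dropped)
```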
The buffer stages 130.1, 130.2 may include respective transmit buffers 132.1, 132.2, receive buffers 134.1, 134.2 and buffer controllers 136.1, 136.2. The transmit buffers 132.1, 132.2 may receive coded video data from the respective video coders 122.1, 122.2 and hold it in queue until needed by the respective network layers 140.1, 140.2. Similarly, the receive buffers 134.1, 134.2 may receive coded video data provided by the respective network layers 140.1, 140.2 and hold it in queue until consumed by the video decoders 124.1, 124.2. The buffer controllers 136.1, 136.2 may manage operations of the transmit buffers 132.1, 132.2 and may perform queue decimation as needed to accommodate network degradation events.
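One way to organize such a buffer stage is sketched below (Python, hypothetical names): FIFO transmit and receive queues, plus a controller that owns the transmit side and exposes a decimation hook. This is a structural sketch only, not a complete buffer stage.

```python
# Structural sketch of a buffer stage: transmit/receive FIFOs plus a
# controller that can decimate the transmit queue. Names are illustrative.

from collections import deque
from typing import Callable, List, Optional

class TransmitBuffer:
    def __init__(self):
        self._q = deque()          # coded frames awaiting the network layer

    def push(self, coded_frame) -> None:
        self._q.append(coded_frame)

    def pop(self) -> Optional[object]:
        return self._q.popleft() if self._q else None

class ReceiveBuffer:
    def __init__(self):
        self._q = deque()          # coded frames awaiting the decoder

    def push(self, coded_frame) -> None:
        self._q.append(coded_frame)

    def pop(self) -> Optional[object]:
        return self._q.popleft() if self._q else None

class BufferController:
    """Manages the transmit queue and performs decimation on demand."""
    def __init__(self, tx: TransmitBuffer):
        self.tx = tx

    def decimate(self, keep: Callable[[object], bool]) -> List[object]:
        """Drop queued frames for which keep(frame) is False; return them."""
        dropped = [f for f in self.tx._q if not keep(f)]
        self.tx._q = deque(f for f in self.tx._q if keep(f))
        return dropped
```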
The network layers 140.1, 140.2 may include respective transceivers 142.1, 142.2 and network monitors 144.1, 144.2. The transceivers 142.1, 142.2 may receive data from the transmit buffers 132.1, 132.2, format it for transmission over the communication network 120 and transmit the data. The transceivers 142.1, 142.2 also may receive data from the communication network 120 and process the data to format it for consumption at the terminal. In so doing, the transceivers 142.1, 142.2 may perform error recovery processes to recover from data transmission errors that may have been induced by the communication network 120. The network monitors 144.1, 144.2 may monitor execution of these error recovery processes and estimate other network performance metrics to determine the operational state of the network 120. For example, the network monitors 144.1, 144.2 may estimate transmitted packet loss rate from negative acknowledgment messages (commonly, “NACKs”) received from far-end transmitters. The network monitors 144.1, 144.2 may estimate packet arrival time jitter based on received packets. They may estimate round trip communication latency or one-way latency based on packets delivered to the network and packets received therefrom. The network monitors 144.1, 144.2 also may exchange messages with one another, transmitted by the transceivers 142.1, 142.2, identifying to each other the packet transmission rates and/or packet arrival rates at the respective transceivers 142.1, 142.2. In an embodiment of the present invention, the network monitors 144.1, 144.2 may estimate the operational state of the network 120 and report indicators of the operational state to the buffer controllers 136.1, 136.2 and the codec controllers 126.1, 126.2. The codec controllers 126.1, 126.2 and buffer controllers 136.1, 136.2 may adjust their operation based on the operational state as determined by the network monitors 144.1, 144.2.
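A network monitor of this kind might be sketched as follows. The names and thresholds are assumptions, not part of any particular protocol stack: loss rate is estimated from NACK counts, jitter from packet inter-arrival variation, and the two are folded into a coarse operational-state estimate.

```python
# Illustrative network monitor: loss rate from NACKs, jitter from
# inter-arrival times, and a coarse state estimate. Thresholds are assumed.

import statistics
from typing import List

class NetworkMonitor:
    def __init__(self):
        self.packets_sent = 0
        self.nacks_received = 0
        self.arrival_times: List[float] = []

    def on_packet_sent(self) -> None:
        self.packets_sent += 1

    def on_nack(self) -> None:
        self.nacks_received += 1

    def on_packet_received(self, arrival_time_s: float) -> None:
        self.arrival_times.append(arrival_time_s)

    def loss_rate(self) -> float:
        return self.nacks_received / self.packets_sent if self.packets_sent else 0.0

    def jitter_s(self) -> float:
        # Standard deviation of packet inter-arrival gaps as a jitter proxy.
        gaps = [b - a for a, b in zip(self.arrival_times, self.arrival_times[1:])]
        return statistics.pstdev(gaps) if len(gaps) >= 2 else 0.0

    def operational_state(self) -> str:
        """Map the metrics to stable / unstable / diminished (illustrative)."""
        loss, jitter = self.loss_rate(), self.jitter_s()
        if loss > 0.05 or jitter > 0.050:
            return "diminished"
        if loss > 0.01 or jitter > 0.020:
            return "unstable"
        return "stable"
```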
As noted, the transmit buffer 220 may store coded video data until it is read out by the transmitter 230 for transmission to the network. The transmit buffer 220 may operate under control of a buffer controller 250, which receives data representing the network operating point from the network monitor 240. In an embodiment, when the buffer controller 250 receives revised operating point data from the network monitor 240 indicating diminished bandwidth available from the network, the buffer controller 250 may selectively decimate coded video data in the buffer. If the buffer controller 250 decimates data in the buffer 220, it may report the decimation to the codec controller 260.
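The controller's reaction to a revised operating point could look like the following sketch. The OperatingPoint fields, the decimation policy, the drain window, and the codec-controller callback are illustrative assumptions: when the reported bandwidth falls below what the queued data requires, the controller decimates non-reference data and notifies the codec controller.

```python
# Sketch of a buffer controller reacting to a revised network operating
# point. Fields, policy, and callback are illustrative assumptions.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class OperatingPoint:
    bandwidth_bps: int

@dataclass
class CodedFrame:
    frame_id: int
    size_bits: int
    is_reference: bool

class BufferController:
    def __init__(self, queue: List[CodedFrame],
                 report_to_codec: Callable[[List[CodedFrame]], None],
                 drain_window_s: float = 0.5):
        self.queue = queue
        self.report_to_codec = report_to_codec
        self.drain_window_s = drain_window_s   # how long the queue may take to drain

    def on_operating_point(self, op: OperatingPoint) -> None:
        queued_bits = sum(f.size_bits for f in self.queue)
        budget_bits = int(op.bandwidth_bps * self.drain_window_s)
        if queued_bits <= budget_bits:
            return                              # queue drains in time; nothing to do
        # Diminished bandwidth: drop non-reference frames first.
        dropped = [f for f in self.queue if not f.is_reference]
        self.queue[:] = [f for f in self.queue if f.is_reference]
        if dropped:
            self.report_to_codec(dropped)       # let the encoder update its model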
State 320 illustrates control operations that may occur when a network monitor determines the network is in an unstable operating condition. The “unstable” state 320 may be one in which network statistics indicate a greater number of communication errors than are expected in stable operation but the network statistics do not clearly show that the network is not capable of carrying data at the currently assigned channel rate. In this case, the coder controller may revise coding parameters to decrease the rate of data dependencies among portions of coded video data but need not revise the channel rate at which the coder currently is working. The coder control chain may exit the unstable state 320 by returning to the stable state 310 if communication errors return to expected levels over time. Alternatively, the coder control chain may exit the unstable state 320 by advancing to a network diminished state 330.
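One minimal encoding of this control chain is a three-state machine, sketched below. The transition criteria and the coder hooks invoked on each transition (reducing prediction dependencies, lowering the channel rate) are placeholders standing in for the behavior described above and below, not a definitive implementation.

```python
# Minimal sketch of the stable / unstable / diminished control chain.
# Transition criteria and the attached coder hooks are illustrative.

STABLE, UNSTABLE, DIMINISHED = "stable", "unstable", "diminished"

class CoderControlChain:
    def __init__(self, coder):
        self.state = STABLE
        self.coder = coder        # object exposing the hooks used below (assumed)

    def on_network_report(self, error_rate: float, sustained: bool) -> None:
        if self.state == STABLE:
            if error_rate > 0.01:
                self.state = UNSTABLE
                # More errors than expected: reduce data dependencies
                # (e.g., more frequent non-reference frames) but keep the rate.
                self.coder.reduce_prediction_dependencies()
        elif self.state == UNSTABLE:
            if error_rate <= 0.01:
                # Errors returned to expected levels: back to stable operation.
                self.state = STABLE
                self.coder.restore_default_prediction()
            elif sustained:
                # The network evidently cannot carry the current channel rate.
                self.state = DIMINISHED
                self.coder.lower_channel_rate()
        elif self.state == DIMINISHED:
            if error_rate <= 0.01:
                self.state = STABLE
                self.coder.restore_default_prediction()
```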
The network diminished state 330 may represent a state in which network conditions generate unacceptable transmission errors at a currently-assigned channel rate. In the network diminished state 330, the buffer controller 250 may engage in selective decimation of coded video data awaiting transmission in the transmit buffer 220.
In one embodiment, selective decimation may occur immediately when the system enters the network diminished state 330. In such an embodiment, the system may identify and delete coded video data present in the transmit buffer 220 in a single atomic act.
Alternatively, the system may perform selective decimation by scheduling different elements of coded video data for prioritized transmission. In such an embodiment, the system may schedule coded reference data for prioritized transmission and may schedule data elements that are candidates for deletion at lower priority. The system may transmit the high priority data first and, if network resources permit transmission of lower priority elements, the network may transmit the lower priority elements as well. The transmit buffer 220 may operate on a pipelined basis, receiving new coded video data as other video data is read out and transmitted by the network layer. Thus, the transmit buffer 220 may receive other elements of video data that are higher priority than lower priority elements already stored by the transmit buffer 220. Again, the higher priority elements may be scheduled for transmission over the lower priority elements. At some point, various lower priority elements may expire within the transmit buffer 220 and be deleted prior to transmission.
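A priority-based scheduler of this kind might be sketched as follows; the structure and field names are assumptions. Reference data is queued at high priority, deletion candidates at low priority, and low-priority items that exceed a deadline expire in the buffer without being sent.

```python
# Illustrative priority scheduler for the transmit buffer: reference data
# is sent first; deletion candidates are sent only if capacity allows and
# expire past a deadline. Structure and field names are assumptions.

import heapq
import itertools
import time
from dataclasses import dataclass, field

HIGH, LOW = 0, 1   # lower number = higher priority

@dataclass(order=True)
class QueuedItem:
    priority: int
    seq: int
    payload: bytes = field(compare=False)
    deadline_s: float = field(compare=False)

class PrioritizedTransmitBuffer:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()   # preserves FIFO order within a priority

    def push(self, payload: bytes, is_reference: bool, lifetime_s: float = 0.5) -> None:
        item = QueuedItem(
            priority=HIGH if is_reference else LOW,
            seq=next(self._counter),
            payload=payload,
            deadline_s=time.monotonic() + lifetime_s,
        )
        heapq.heappush(self._heap, item)

    def pop_for_transmission(self):
        """Return the next payload to send, silently expiring stale low-priority data."""
        while self._heap:
            item = heapq.heappop(self._heap)
            if item.priority == LOW and time.monotonic() > item.deadline_s:
                continue                     # expired candidate: deleted, never sent
            return item.payload
        return None
```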
a) illustrates P frame F5 relying on I frame F1 as a prediction reference and P frame F9 relying on P frame F5 as a prediction reference. Moreover, B frames F2-F4 and F6-F8 rely on P frame F5 as a prediction reference.
b) also illustrates a second enhancement layer with coded video data of the B frames. The B frames of the second enhancement layer may refer to the reference frames of the base layer as prediction references. Decoding of the coded B frames from the second enhancement layer provides an increased display frame rate as compared to decoding of the base layer and first enhancement layer alone.
Switching to scalable coding techniques when network instability is observed permits decimation of data in a transmit buffer if the state advances to a network diminished state. Thus, data of the first and second enhancement layers may be deleted from the transmit buffer prior to transmission when the system advances to the network diminished state.
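As a simplified sketch of this idea, the frames of the example above might be partitioned into a base layer of reference frames and a single enhancement layer of B frames, with the enhancement layer deleted from the transmit buffer once the diminished state is entered. This collapses the multi-layer arrangement of the figures into two layers for illustration; the names are hypothetical.

```python
# Illustrative layering of the example GOP: I/P reference frames form the
# base layer, B frames form an enhancement layer that may be deleted from
# the transmit buffer in a network diminished state. Names are assumptions.

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class CodedFrame:
    name: str
    frame_type: str     # "I", "P", or "B"

def assign_layers(frames: List[CodedFrame]) -> Dict[str, List[CodedFrame]]:
    layers = {"base": [], "enhancement": []}
    for f in frames:
        # Reference frames (I, P) go to the base layer; B frames to enhancement.
        layers["base" if f.frame_type in ("I", "P") else "enhancement"].append(f)
    return layers

def decimate_for_diminished_network(buffered: List[CodedFrame]) -> List[CodedFrame]:
    """Keep only base-layer data when the diminished state is entered."""
    return assign_layers(buffered)["base"]

if __name__ == "__main__":
    gop = [CodedFrame("F1", "I")] + \
          [CodedFrame(f"F{i}", "B") for i in (2, 3, 4)] + \
          [CodedFrame("F5", "P")] + \
          [CodedFrame(f"F{i}", "B") for i in (6, 7, 8)] + \
          [CodedFrame("F9", "P")]
    kept = decimate_for_diminished_network(gop)
    print([f.name for f in kept])   # ['F1', 'F5', 'F9']
```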
In another embodiment, in a network diminished state, the system may select and delete coded reference frames from the transmit buffer. In such an embodiment, the codec controller may cause the video coder to recode source video corresponding to the deleted coded reference frame as a non-reference frame. The codec controller also may cause the video coder to recode source video corresponding to coded data that depends on the deleted reference frame. Such embodiments find application in systems in which the codec operates with sufficient throughput to repopulate the transmit buffer before the network layer would have transmitted the deleted coded reference data and coded dependent data.
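A hedged sketch of this variant follows, assuming a hypothetical encoder interface: a queued reference frame and the frames that depend on it are deleted, and the corresponding source pictures are resubmitted for coding as non-reference frames.

```python
# Sketch of deleting a buffered reference frame plus its dependents and
# recoding the corresponding source pictures as non-reference frames.
# The encoder interface (encode_non_reference) is an assumed hook.

from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class CodedFrame:
    frame_id: int
    is_reference: bool
    ref_id: Optional[int]      # frame this one predicts from, if any

def drop_and_recode(queue: List[CodedFrame], victim_id: int,
                    source_frames: Dict[int, object], encoder) -> List[CodedFrame]:
    """Delete the victim reference frame and its dependents from the queue,
    then re-encode their source pictures as non-reference frames."""
    doomed = {victim_id}
    # Collect frames that (transitively) depend on the victim.
    changed = True
    while changed:
        changed = False
        for f in queue:
            if f.ref_id in doomed and f.frame_id not in doomed:
                doomed.add(f.frame_id)
                changed = True
    remaining = [f for f in queue if f.frame_id not in doomed]
    # Recode the affected source pictures without reference status, assuming
    # the encoder is fast enough to refill the buffer before transmission.
    recoded = [encoder.encode_non_reference(source_frames[fid])
               for fid in sorted(doomed) if fid in source_frames]
    return remaining + recoded
```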
The foregoing embodiments of the present invention provide several techniques for video coders to adapt quickly to bandwidth changes in transmission networks. They provide techniques to adapt video coder performance to changing network environments and also to adapt transmission of already-coded data. By adapting transmission of already-coded data in response to changing network conditions, it is expected that video coding systems of the present invention will respond more quickly to changing network conditions than conventional systems.
The foregoing embodiments provide a coding/control system that estimates network characteristics and adapts performance of an encoder and a transmit buffer to respond quickly to changing network characteristics. The techniques described above find application in both software- and hardware-based control systems. In a software-based control system, the functional units described hereinabove may be implemented on a computer system (commonly, a server, personal computer or mobile computing platform) executing program instructions corresponding to the functional blocks and methods listed above. The program instructions themselves may be stored in a storage device, such as an electrical, optical or magnetic storage medium, and executed by a processor of the computer system. In a hardware-based system, the functional blocks illustrated above may be provided in dedicated functional units of processing hardware, for example, digital signal processors, application specific integrated circuits, field programmable logic arrays and the like. The processing hardware may include state machines that perform the methods described in the foregoing discussion. The principles of the present invention also find application in hybrid systems of mixed hardware and software designs.
Several embodiments of the invention are specifically illustrated and/or described herein. However, it will be appreciated that modifications and variations of the invention are covered by the above teachings and within the purview of the appended claims without departing from the spirit and intended scope of the invention. For example,
The present application claims the benefit of U.S. Provisional Application Ser. No. 61/317,625, filed Mar. 25, 2010, entitled “Frame Dropping Algorithm for Fast Adaptation of Buffered Compressed Video to Network Condition Changes,” the disclosure of which is incorporated herein by reference in its entirety.