The present disclosure relates to video communication systems and techniques.
Real-time video is sensitive to latency. As a result, lost video packets in a video stream are usually not retransmitted, because by the time a retransmitted packet arrives, the decoder at the destination device can no longer use it to correct for the loss. A packet is normally useful only for reconstructing the “current” video frame for display of a picture.
In some cases, it is unavoidable to transmit the video streams over a network that has a relatively high error rate. Above a certain level of packet loss, the probability that a frame is delivered completely error-free to all of the destination devices is very low. It is nevertheless desirable to guarantee that a certain reference frame is received and decoded without error by all intended destination devices. The error resilience of a video decoding process can be improved by decoding a late packet if that packet can be used to repair an error in a frame that will be used as a reference frame in the future, that is, for display of a video frame yet to be displayed with respect to the current playout time.
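As a rough illustration of why error-free delivery to every destination becomes unlikely, the sketch below assumes that each packet is lost independently, with the same probability p, on the path to each destination. This is a simplification: real losses are often bursty, and a shared upstream link correlates them. Under that assumption, a frame of n packets reaches all D destinations intact with probability (1 - p)^(nD). The function name and the example figures are hypothetical.

```python
def prob_frame_intact_everywhere(loss_rate: float, packets_per_frame: int,
                                 destinations: int) -> float:
    """Probability that every packet of one frame reaches every destination.

    Assumes each packet is lost independently, with probability `loss_rate`,
    on each destination's path (an illustrative simplification).
    """
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0 and 1")
    # Every one of the n * D packet deliveries must succeed.
    return (1.0 - loss_rate) ** (packets_per_frame * destinations)


if __name__ == "__main__":
    # Hypothetical figures: 2% packet loss, 10 packets per frame, 8 destinations.
    print(round(prob_frame_intact_everywhere(0.02, 10, 8), 3))
```

With these hypothetical figures the result is roughly 0.2, so about four frames in five would be damaged at one or more destinations, even though each individual destination loses only 2% of its packets.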
Overview
Techniques are provided for video communication between multiple devices. Each of a plurality of video packets is designated as being part of a required reference frame that is subsequently to be used for a repair process. A stream of video packets that includes the packets for the required reference frame is transmitted from a source device over a communication medium for reception by a plurality of destination devices. A determination is made that at least one of the plurality of destination devices did not receive at least one packet of the required reference frame, and the at least one packet is retransmitted to the at least one of the plurality of destination devices. When the retransmitted packet is received at the at least one destination device, it is decoded and stored without using it for generating a picture for display at the time that the at least one packet is received.
Referring first to
The system 5 comprises a plurality of endpoint devices 100(1), 100(2), . . . , 100(N) each of which can simultaneously serve as both a source and a destination of a video stream (containing video and audio information). Each endpoint device, generically referred to by reference numeral 100(i), comprises at least one video camera 110, at least one display 120, an encoder 130, a decoder 140 and a network interface and control unit 150. The video camera 110 captures video and supplies video signals to the encoder 130. The encoder 130 encodes the video signals into packets for further processing by the network interface and control unit 150 that transmits the packets to one or more other endpoint devices. Conversely, the network interface and control unit 150 receives packets sent from another endpoint device and supplies them to the decoder 140. The decoder 140 decodes the packets into a format for display of picture information on the display 120. Audio is also captured by one or more microphones and encoded into the stream of packets passed between endpoint devices.
A video conference may be established between any two or more endpoint devices via a network 50. In particular, when there are more than two endpoint devices involved in a video conference, it is advantageous to have a third device that manages the distribution of information to all of the intended destination endpoint devices. To this end, a multipoint control unit (MCU) 200 is provided that also connects to the network 50 and forwards packets of information (video packets) from one endpoint device, referred to as a source device, to each of the other endpoint devices involved in the video conference, referred to herein as destination devices.
The endpoint devices and MCU 200 are configured to perform a retransmission process for packets that are part of a certain type of video frame, called a reference frame, and more particularly, part of a certain type of reference frame, called a required or “must-have” reference frame. When an endpoint device is acting as a source for a video stream, the endpoint device generates and includes in the video stream packets associated with a required reference frame. For example, the endpoint device 100(1) is acting as a source device with respect to a video stream that is being transmitted to a plurality of intended destination devices 100(2)-100(N) as part of a video conference. The endpoint device 100(1) designates, labels or marks the packets that are part of a required reference frame (e.g., in an appropriate header or other field) to indicate that they belong to a required reference frame. Normally, when a device successfully receives a packet, it transmits an acknowledgement (ACK) message for that packet. When an intended destination device, such as device 100(2) in
Turning now to
Turning to
The logic for performing the functions of processes 300, 400 and 500 may be embodied by computer software instructions stored or encoded in a computer processor readable memory medium that, when executed by a computer processor, cause the computer processor to perform the process functions described herein. Alternatively, these processes may be embodied in appropriately configured digital logic gates, in programmable or fixed form, such as in an application specific integrated circuit with programmable and/or fixed logic. Thus, in general, these processes may be embodied in fixed or programmable logic, in hardware or computer software form. Furthermore, the functions of the encoder 130 and decoder 140 in the endpoint devices may also be performed using logic of the same form as that used to perform the processes 300, 400 and 500.
According to the techniques described herein, certain packets of video are designated or “marked” to be retransmitted if they are lost, because these packets are part of valuable reference frames, the aforementioned required reference frames, which will be referenced by many future frames. The source device of the video stream relies on the guarantee that these frames are received correctly at the destination devices. Thus, all of the packets associated with a required reference frame should be correctly received by all of the intended destination devices for that video stream.
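The disclosure leaves the marking format open (“an appropriate header or other field”). Purely as an illustration, the sketch below assumes a hypothetical 9-byte header (32-bit sequence number, 32-bit frame number, 8-bit flags, network byte order) in which one flag bit marks a required reference frame packet. None of these field names, sizes or bit assignments come from the disclosure or from any standard.

```python
import struct
from dataclasses import dataclass

# Hypothetical header layout: sequence number, frame number, flags.
_HEADER = struct.Struct("!IIB")
REQUIRED_REF_FLAG = 0x01  # hypothetical bit: packet belongs to a required reference frame


@dataclass(frozen=True)
class VideoPacket:
    seq: int
    frame: int
    required_ref: bool
    payload: bytes = b""

    def to_bytes(self) -> bytes:
        # Set the marking bit only for required reference frame packets.
        flags = REQUIRED_REF_FLAG if self.required_ref else 0
        return _HEADER.pack(self.seq, self.frame, flags) + self.payload

    @classmethod
    def from_bytes(cls, data: bytes) -> "VideoPacket":
        # A receiver or MCU reads the flag to decide whether a loss must be repaired
        # by retransmission.
        seq, frame, flags = _HEADER.unpack_from(data)
        return cls(seq, frame, bool(flags & REQUIRED_REF_FLAG), data[_HEADER.size:])
```

Keeping the mark in a fixed header field lets the MCU classify a packet without parsing the video payload.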
Turning now to
Thus, at 60, a source device, e.g., 100(1), sends packets of a required reference frame K to the MCU 200 for distribution to all of the intended destination devices. In this example, one destination device 100(2) is shown. At 62, the MCU stores a copy of the packets for frame K, and at 64 it transmits the packets of frame K to all of the intended destination devices, including device 100(2). At 66, device 100(2) receives all of the packets of frame K without error, decodes frame K for storage, and uses frame K to display a picture at the appropriate time. Since frame K is now complete at all of the destination devices, the source device 100(1) knows that it can, in the future, send repair frames that reference frame K to any and all of the destination devices.
At 70, the source device 100(1) sends packets of a new required reference frame, this time called reference frame N. At 72, the MCU stores a copy of the packets for frame N and, because frame N is a new required reference frame, also deletes its copy of the previously received required reference frame, frame K. At 74, the MCU sends the packets for frame N to all of the intended destination devices. At 76, the destination device 100(2) fails to receive a packet of frame N without error (i.e., loses it), and accordingly at 78 sends a NACK message to the MCU for the lost packet. Nevertheless, the destination device 100(2) uses frame N to display a picture at the appropriate time, albeit with the packet error. At 80, the MCU retransmits the lost packet of frame N to destination device 100(2). At 82, the destination device 100(2) receives the lost packet and now has a complete frame N, which it decodes and stores as a required reference frame. The retransmitted packet is decoded and stored in memory but is not used for displaying a picture, since the time at which it would have been used for display has passed. Thus, the required reference frame N retains its value as a reference even though, because of the lost packet, it does not become completely available (error-free) at the destination device until after the time at which its picture data was to be displayed.
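The MCU's handling of its stored copy in this sequence (store at 62 and 72, discard the old frame at 72, retransmit at 80) can be sketched as a small cache. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, packets are keyed by sequence number, and frame numbers are assumed to increase over time.

```python
from typing import Dict, Optional


class RequiredFrameCache:
    """MCU-side copy of the most recent required reference frame (hypothetical helper)."""

    def __init__(self) -> None:
        self.frame: Optional[int] = None
        self.packets: Dict[int, bytes] = {}  # sequence number -> packet bytes

    def store(self, frame: int, seq: int, data: bytes) -> None:
        if self.frame is not None and frame < self.frame:
            return  # stale packet of an already-superseded required frame
        if frame != self.frame:
            # A new required reference frame: discard the copy of the old one (step 72).
            self.frame = frame
            self.packets = {}
        self.packets[seq] = data

    def copy_for_retransmission(self, seq: int) -> Optional[bytes]:
        # Looked up when a destination NACKs a required packet (step 80);
        # None means the MCU holds no copy, e.g. because the frame was superseded.
        return self.packets.get(seq)
```

Holding only the latest required reference frame bounds the MCU's memory to roughly one frame per source stream.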
As described above, the required reference frame contains picture data associated with a “live” video stream and is therefore intended to be used for generating a picture at the appropriate time when received and decoded by a destination device. In this case, if a required reference frame packet is lost and needs to be retransmitted, the retransmitted packet is not intended for use in displaying picture data at the time that it is received and decoded by the destination device.
According to one variation, the required reference frame may itself be a repair frame. In this case, it does not use the previous frame for prediction and therefore does not propagate any errors in the image from previous frames. A repair frame may be an intra-coded frame (I-frame), described hereinafter. A repair frame may also be a P frame that is motion predicted with reference to a prior (older) reference frame that has been acknowledged by all of the destination devices.
According to another variation, the required reference frame may contain data that is not part of a live-encoded video stream and as such is not intended for use in displaying a picture when it is received (from an initial transmission).
The processes performed at the source device, destination device and MCU are now described in greater detail with reference to
At 340, the source device receives ACK and NACK messages from the destination devices for packets of past frames that it transmitted, both for normal video frames transmitted at 330 and for required reference frames transmitted at 320. As explained hereinafter in conjunction with
At 350, the source device determines whether all of the destination devices have correctly received and decoded (ACK'd) all packets of a previous required reference frame transmitted at 320. When it is determined that they have, then at 360 the source device “promotes” the required reference frame by designating it the best required reference frame for use in error correction when generating repair frames. In addition, the source device deletes the older required reference frame that was previously transmitted. When at 350 it is determined that one or more destination devices have not received all of the required reference frame packets, no further action is taken on that frame and the process continues to 370. Eventually the required reference frame will be promoted, since the MCU 200 retransmits its packets until the frame is completely received by all destination devices. The test at 350 is repeated on each pass through the process 300, at every frame time, so that the source device knows when to promote the frame to be the new best required reference frame.
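The promotion test at 350 and 360 can be sketched as bookkeeping over outstanding ACKs. This minimal sketch assumes, as in the MCU process, that the source receives a single ACK per packet only after every destination has ACK'd it, and that frame numbers increase over time; the class and method names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


class SourceReferenceTracker:
    """Source-side bookkeeping for steps 340-360 (hypothetical names)."""

    def __init__(self) -> None:
        self.best_required: Optional[int] = None
        self.unacked: Dict[int, Set[int]] = {}  # required frame -> seqs awaiting ACK

    def sent_required_frame(self, frame: int, seqs: Iterable[int]) -> None:
        # Step 320: remember which packets of this required frame must be confirmed.
        self.unacked[frame] = set(seqs)

    def on_ack(self, frame: int, seq: int) -> None:
        # Step 340: an aggregated ACK from the MCU for one packet.
        if frame in self.unacked:
            self.unacked[frame].discard(seq)

    def check_promotion(self) -> Optional[int]:
        """Steps 350/360, run once per frame time; returns the best required frame."""
        complete = [f for f, pending in self.unacked.items() if not pending]
        if complete:
            newest = max(complete)
            if self.best_required is None or newest > self.best_required:
                self.best_required = newest  # promote to best required reference frame
            # Older required frames are no longer needed as repair references.
            stale = [f for f in self.unacked if f <= newest]
            for f in stale:
                del self.unacked[f]
        return self.best_required
```

A frame that is still missing ACKs simply stays in `unacked` and is re-tested at the next frame time, matching the repeated test at 350.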
Next, at 370, based on the ACK and NACK messages received at 340, the source device determines whether any recent frame was received in error. An error in a frame is caused by any lost packet from that frame. If a destination device received a recent frame in error (i.e., lost a packet of it), then at 380 the source device generates a repair frame using the most recent best required reference frame as the reference picture, and transmits that repair frame to the MCU 200, which sends it to the requesting destination device. As explained above, the repair frame is predictively encoded with reference to the required reference frame and without reference to the most recent video frame. The process 300 then repeats at 310, either after 380 or after 370 if it is determined that no recent frame was received in error or lost by a destination device.
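The choice of reference picture at 370 and 380 reduces to a small decision. In this minimal sketch the function name is hypothetical, and the fallback to intra coding when no required reference frame has yet been promoted is an assumption; the disclosure does not specify that case here.

```python
from typing import Optional


def reference_for_next_frame(recent_frame_in_error: bool,
                             best_required: Optional[int],
                             previous_frame: int) -> Optional[int]:
    """Pick the reference picture for the next encoded frame (steps 370/380).

    Returns a frame number to predict from, or None to request intra coding.
    """
    if not recent_frame_in_error:
        return previous_frame  # ordinary P frame predicted from the latest frame
    if best_required is not None:
        return best_required   # repair frame predicted from the best required reference
    return None                # assumption: nothing promoted yet, so fall back to an I-frame
```

Because the best required reference frame is known to be intact at every destination, a repair frame predicted from it decodes cleanly everywhere, whichever destination reported the loss.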
Turning now to
When a new ACK or NACK is received, the process continues with ACK/NACK processing at 430. It proceeds in a loop, as shown by the arrow from function 475 back to 430, repeating the following steps for each recently transmitted packet. At 430, the MCU determines, for each recently transmitted packet, whether the packet is part of a required reference frame.
If a packet is determined to be part of a required reference frame, then at 440 the MCU determines whether all destination devices have sent an ACK message for that packet. At 450, the MCU transmits an ACK to the source device when all destination devices have ACK'd the packet. When at 440 the MCU determines that not all destination devices have ACK'd the packet, then at 460 the MCU determines, for each destination device that did not ACK that packet (i.e., NACK'd it), whether the packet is stored in MCU memory and, if so, transmits a copy of the packet to the appropriate destination device(s). When that copy is ACK'd or NACK'd, the function at 440 again evaluates whether all devices have ACK'd it. Eventually the packet is received, the function at 440 evaluates to “yes” and the process proceeds to 450. The source device is waiting for a confirmation that all of the destination devices have received all packets of the required reference frame, as indicated at function 350 of the process 300 shown in the flowchart of
When at 430 it is determined that the packet is not a required reference frame packet, then at 470, the MCU determines if all destination devices ACK'd the packet and if so, sends an ACK to the source device, and otherwise sends a NACK with a packet sequence identifier for that packet to the source device. The source device uses information contained in the NACK message to generate a repair frame (at 380 in
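The per-packet ACK/NACK handling at the MCU (430 through 470) can be sketched as an aggregator that returns the actions to take. This is a minimal sketch, not the disclosed implementation: the class, the action tuples and the destination identifiers are hypothetical, and it assumes the MCU still holds a copy of every required packet that is NACK'd.

```python
from typing import Dict, Iterable, List, Set, Tuple

Action = Tuple[object, ...]  # e.g. ("ack_source", seq) or ("retransmit", seq, dest)


class McuAckAggregator:
    """Per-packet ACK/NACK handling at the MCU for steps 430-470 (hypothetical API)."""

    def __init__(self, destinations: Iterable[str]) -> None:
        self.destinations: Set[str] = set(destinations)
        self.acked: Dict[int, Set[str]] = {}   # seq -> destinations that ACK'd it
        self.required: Dict[int, bool] = {}    # seq -> required reference frame packet?

    def track(self, seq: int, required_ref: bool) -> None:
        self.acked[seq] = set()
        self.required[seq] = required_ref

    def on_ack(self, seq: int, dest: str) -> List[Action]:
        if seq not in self.acked:
            return []  # unknown or already completed packet
        self.acked[seq].add(dest)
        if self.acked[seq] >= self.destinations:
            # Steps 450 / 470: every destination ACK'd, so send one ACK to the source.
            del self.acked[seq]
            del self.required[seq]
            return [("ack_source", seq)]
        return []

    def on_nack(self, seq: int, dest: str) -> List[Action]:
        if seq not in self.required:
            return []
        if self.required[seq]:
            return [("retransmit", seq, dest)]  # step 460: resend the stored copy
        return [("nack_source", seq)]           # step 470: source may send a repair frame
```

Only the destination that lost a required packet receives the retransmission, and the source sees a single ACK per packet regardless of the number of destinations.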
Turning now to
When at 520 it is determined that the packet is not part of a future video frame that has yet to be displayed, the process proceeds to 550, where it is determined whether the packet is part of a required reference frame. This is the case when the device receives a retransmitted required reference frame packet. When the packet is part of a required reference frame, then at 560 the packet is decoded and used to change or update the pixel data of the previously decoded and stored packets of that required reference frame in the required reference frame storage 158 of the memory 154. Thus, even though a retransmitted packet for a required reference frame is received too late to be used in generating a picture, it is used to update picture data in the reference frame storage 158, and the required reference frame is available for a frame repair at a later time. As explained above, when generating a new required reference frame, the source device may encode the packets for the new required reference frame based solely on one or more previously transmitted required reference frames (such as the most recent reference frame) that have been successfully decoded and stored by the destination devices. This reduces complexity in the system because the reference frame needed for image reconstruction is guaranteed to be in memory and to have been received and decoded without error.
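The routing of a received packet at 520 through 560 can be sketched with a toy receiver. This is a minimal sketch under stated assumptions: “decoding” is reduced to storing each packet's payload under its index within the frame, the class and method names are hypothetical, and dropping late packets that are not part of a required reference frame is an assumption rather than something stated above.

```python
from typing import Dict


class DestinationReceiver:
    """Toy model of the packet routing in process 500 (hypothetical names)."""

    def __init__(self) -> None:
        self.playout_frame = 0  # next frame number due for display
        self.display_buffer: Dict[int, Dict[int, bytes]] = {}
        self.required_ref_store: Dict[int, Dict[int, bytes]] = {}  # cf. storage 158

    def receive(self, frame: int, index: int, payload: bytes, required_ref: bool) -> str:
        if required_ref:
            # Required reference data is kept whether it arrives on time or late.
            self.required_ref_store.setdefault(frame, {})[index] = payload
        if frame >= self.playout_frame:
            # Step 520: the frame has not been displayed yet, so decode it for display.
            self.display_buffer.setdefault(frame, {})[index] = payload
            return "display"
        if required_ref:
            # Step 560: too late to display, but it repairs the stored reference.
            return "reference_only"
        return "discarded"  # assumption: late non-reference packets are dropped

    def advance_playout(self) -> None:
        # The current frame has been shown; later packets for it are "late".
        self.display_buffer.pop(self.playout_frame, None)
        self.playout_frame += 1
```

A late required packet never reaches the display buffer, yet it completes the stored reference frame that later repair frames will predict from.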
Next, at 570, the endpoint device transmits an ACK message to the MCU for the packet. Then, at 580, the device examines the packet sequence number of the packet to determine whether it indicates that a prior packet (based on that prior packet's sequence number) has been lost, and if so, at 590 the device transmits a NACK to the source (via the MCU) together with a packet number/identifier for the lost prior packet. The message transmitted at 590 may be referred to as a packet loss message. After that, the process repeats at 510. The NACK message will identify a lost required reference frame packet by packet sequence number and the MCU will respond to this NACK message by retransmitting that required reference frame packet (see functions 440 and 460 in
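The loss check at 580 and 590 amounts to detecting gaps in the sequence numbers. This minimal sketch uses a hypothetical class name and assumes sequence numbers increase by one per packet without wrapping around; a practical implementation would also have to handle wraparound and reordering windows.

```python
from typing import List, Optional


class LossDetector:
    """Infer lost packets from gaps in sequence numbers (steps 580/590)."""

    def __init__(self) -> None:
        self.highest_seen: Optional[int] = None

    def on_packet(self, seq: int) -> List[int]:
        """Return the sequence numbers to report in a packet loss (NACK) message."""
        if self.highest_seen is None:
            self.highest_seen = seq
            return []
        if seq <= self.highest_seen:
            return []  # retransmitted or reordered packet fills an old gap
        # Every number skipped between the previous highest and this one is missing.
        lost = list(range(self.highest_seen + 1, seq))
        self.highest_seen = seq
        return lost
```

For example, receiving packets 1, 2 and then 5 reports packets 3 and 4 as lost; a later retransmitted packet 3 produces no further report.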
It is evident from the description of the process 500 that a device decodes a received repair frame and displays a picture from the repair frame, wherein the repair frame is predictively encoded with reference to the required reference frame and without reference to a most recent video frame.
While the foregoing description and figures indicate that the MCU serves as an intermediary between endpoint devices, it should be understood that the MCU is not required. That is, there may be circumstances or implementations in which each endpoint device also performs the MCU functions depicted in
As is known in the art, a group of pictures (GOP) sequence formatted according to the MPEG standards begins with an intra-coded (I) picture or frame that serves as an anchor. All of the frames after an I frame are part of a GOP sequence. Within the GOP sequence there are a number of forward predicted or P frames. The first P frame is decoded using the I frame as a reference using motion compensation and adding difference data. The next and subsequent P frames are decoded using the previous P frame as a reference. When a new endpoint joins a communication session, it will need to receive an I-frame to begin decoding the stream, but thereafter it may not need to receive another I-frame if the techniques described herein for a required reference frame are used. In fact, the I-frame may be encoded so as to serve as both an I-frame and as a required reference frame.
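The dependency chain described above explains why a single loss is so damaging and why a repair frame helps. The sketch below treats a frame as decodable only if it arrived intact and its single reference frame is decodable; this simplification ignores B frames, partial decoding and error concealment, and the function name is hypothetical.

```python
from typing import Dict, Optional, Set


def decodable_frames(refs: Dict[int, Optional[int]], lost: Set[int]) -> Set[int]:
    """Return the frames that decode cleanly.

    `refs` maps each frame number to the frame it is predicted from
    (None for an I-frame); references must point to earlier frames.
    """
    ok: Set[int] = set()
    for frame in sorted(refs):
        ref = refs[frame]
        # Intact, and either intra-coded or predicted from a clean frame.
        if frame not in lost and (ref is None or ref in ok):
            ok.add(frame)
    return ok
```

With an I-frame 0 followed by a plain P chain 1 through 5 and frame 2 lost, only frames 0 and 1 decode. If frame 3 is instead sent as a repair frame predicted from frame 0 (standing in for a required reference frame), frames 3, 4 and 5 decode again without another I-frame.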
There are numerous advantages of the scheme described above and depicted in FIGS. 1 and 4-7. First, because this mechanism ensures that all destination devices receive a required reference frame, a reference frame-based repair mechanism becomes more viable when a large number of endpoint devices (i.e., a large number of destination devices) is involved. Moreover, the reference frame-based repair technique can be used over networks that exhibit higher packet loss because, again, certain reference frames are designated as required reference frames whose error-free reception is in essence guaranteed by the retransmission techniques described herein. Further, the source device avoids having to send numerous I-frames to repair past errors experienced by a destination device. Instead, the source device can use more predictive coding and therefore provide higher quality video to the destination devices. The delivery mechanism described herein retransmits packets of a required reference frame in response to packet loss, rather than using additional bandwidth to add redundancy for required reference frames.
The required reference frame packet retransmission techniques described herein achieve improved picture quality even when the network introduces packet loss. Prior retransmission schemes cause delays. Forward error correction (FEC) increases payload size unconditionally, and it also adds latency when data is redistributed over multiple packets.
The techniques described herein provide some of the quality improvement of retransmission, but without an increase in video latency. They guarantee that a certain reference frame is ultimately received and decoded without error by all intended destination devices. If a retransmission is necessary to achieve this, only the one or more packets that were lost or not decodable are retransmitted. Furthermore, the retransmission occurs only between the MCU and the destination endpoint device that experienced the loss. Consequently, a greater number of destination endpoint devices can be accommodated without burdening the source endpoint device.
Although the apparatus, system, and method are illustrated and described herein as embodied in one or more specific examples, it is nevertheless not intended to be limited to the details shown, since various modifications and structural changes may be made therein without departing from the scope of the apparatus, system, and method and within the scope and range of equivalents of the claims. Accordingly, it is appropriate that the appended claims be construed broadly and in a manner consistent with the scope of the apparatus, system, and method, as set forth in the following claims.