The present invention is related to a copending U.S. patent application Ser. No. 10/281,395, filed Oct. 23, 2002, assigned to the assignee of the present invention. The present invention is also related to, and may have been claimed in part in, a copending patent application No. PCT/IB02/02193, application date Jun. 14, 2002, assigned to the assignee of the present invention.
The present invention relates generally to error concealment and, more particularly, to packet loss recovery for the concealment of transmission errors occurring in digital audio streaming applications.
If a streaming medium is available in a mobile device, a user can use the mobile device to listen to music, for example. For music listening applications, audio signals are generally compressed into digital packet formats for transmission. The transmission of compressed digital audio, such as MP3 (MPEG-1/2 Layer 3), over the Internet has already had a profound effect on the traditional process of music distribution. Recent developments in the audio signal compression field have made it possible to stream digital audio to mobile terminals. With the increase in network traffic, audio packets are likely to be lost because of traffic congestion or excessive delay in the packet network. Moreover, the wireless channel is another source of errors that can also lead to packet losses. Under such conditions, it is crucial to improve the quality of service (QoS) in order to achieve widespread acceptance of music streaming applications.
To mitigate the degradation of sound quality due to packet loss, various prior art techniques and their combinations have been proposed. UEP (unequal error protection), a subclass of forward error correction (FEC), is one of the important concepts in this regard. UEP has been proven to be a very effective tool for protecting compressed domain audio bitstreams, such as MPEG AAC (Advanced Audio Coding), where bits are divided into different classes according to their bit error sensitivities. Using UEP for error concealment of percussive sound has been disclosed in U.S. patent application Ser. No. 10/281,395.
In another approach, Korhonen ("Error Robustness Scheme for Perceptually Coded Audio Based on Interframe Shuffling of Samples", Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing 2002, Orlando, Fla., pp. 2053–2056, May 2002) separates an audio frame into two parts: a critical data part and a less critical data part. The payload containing the critical data part is transported by a reliable means, such as TCP (Transmission Control Protocol), while the less critical data part is transported by a best-effort means, such as UDP (User Datagram Protocol).
However, owing to the error characteristics of mobile IP networks and to latency constraints, packet delivery in the various UEP schemes and selective retransmission schemes is still not very reliable, especially when errors are caused by packet losses in congested IP networks, bit errors on wireless air interfaces, and handovers in cellular networks. Thus, it is advantageous and desirable to provide a robust method and system for high-quality audio streaming over packet networks, such as mobile IP networks, 2.5G and 3G networks, and Bluetooth. Such a method and system must take into account the required computational complexity and the memory and power consumption.
MPEG-2/MPEG-4 AAC coders and their related data structure are known in the art. The data structure of an AAC frame is shown in
The present invention provides a method and device for concealing transmission errors occurring in digital audio streaming. More specifically, packet losses that occur during transmission are recovered in the compressed domain. Error concealment is carried out separately on three data parts of the AAC frames: the critical data part, which includes the header and the global gain; the QMDCT data; and the scale factors. These data parts are stored in a plurality of buffers so that, if one or more of the data parts in the current frame are lost or corrupted, the corresponding data parts in the neighboring frames can be used to conceal the errors.
Thus, according to the first aspect of the present invention, there is provided a method of error concealment in a bitstream indicative of audio signals, wherein the bitstream comprises a current frame and at least one neighboring frame, each frame having a plurality of data parts in a compressed domain. The method is characterized by
storing said plurality of data parts in the compressed domain in said at least one neighboring frame,
determining whether the current frame is defective,
detecting at least one defective data part in the current frame if the current frame is defective, and
recovering said at least one defective data part in the current frame based on at least one of the stored data parts in said at least one neighboring frame.
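The four method steps above can be pictured as a small store-and-patch loop. The following is a minimal Python sketch under stated assumptions: FrameParts and ConcealmentBuffer are hypothetical names, a lost or corrupted data part is represented as None, only previous frames are buffered, and a plain copy of the neighboring frame's part stands in for the part-specific recovery rules set out in the following paragraphs.

```python
import copy
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FrameParts:
    """Hypothetical compressed-domain data parts of one AAC frame.

    A part that was lost or failed its integrity check is None.
    """
    header: Optional[dict]
    global_gain: Optional[int]
    scale_factors: Optional[List[int]]
    qmdct: Optional[List[int]]


class ConcealmentBuffer:
    """Stores the data parts of recent frames and patches defective frames."""

    def __init__(self, depth: int = 2) -> None:
        self.depth = depth
        self.history: List[FrameParts] = []

    def store(self, frame: FrameParts) -> None:
        # Keep only the `depth` most recent frames.
        self.history.append(frame)
        if len(self.history) > self.depth:
            self.history.pop(0)

    def process(self, frame: FrameParts) -> FrameParts:
        # Determine whether the frame is defective and which parts are missing.
        defective = [name for name, value in vars(frame).items() if value is None]
        if defective and self.history:
            neighbor = self.history[-1]
            # Recover each defective part from the stored neighboring frame.
            # A plain copy stands in for the part-specific recovery rules.
            for name in defective:
                setattr(frame, name, copy.deepcopy(getattr(neighbor, name)))
        # The (possibly concealed) frame becomes a neighbor for later frames.
        self.store(frame)
        return frame
```

Storing concealed frames as neighbors is itself a design choice; an implementation could instead keep only intact frames in the buffer.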
If the defective data part in the current frame is a header, the defective header is recovered based on a statistical characteristic associated with the header of said at least one of the stored data parts in said at least one neighboring frame.
If the defective data part in the current frame is the global gain value, the defective data part is recovered based on the global gain in said at least one neighboring frame.
Preferably, when said at least one neighboring frame includes a first frame having a first global gain value and a second frame having a second global gain value smaller than the first global gain value, the defective data part in the current frame is recovered based on the second global gain value.
If the defective data parts in the current frame include one or more scale factors, the defective data parts are recovered based on the scale factors in said at least one neighboring frame.
If the defective data parts in the current frame include QMDCT coefficients, the defective data parts are recovered based on the QMDCT coefficients in said at least one neighboring frame, especially for coefficients in the lower frequency region. Alternatively, the lost QMDCT coefficients in the current frame can be replaced by zeros.
According to the second aspect of the present invention, there is provided an audio decoder for decoding a bitstream indicative of audio signals for providing audio data in a modulation domain, wherein the bitstream comprises a current frame and at least one neighboring frame, each frame having a plurality of data parts, said decoder comprising a first module for decoding said each frame for providing a signal indicative of the plurality of data parts in a compressed domain. The decoder is characterized by
a second module, responsive to the signal, for storing said plurality of data parts in the compressed domain in said at least one neighboring frame, and by
a third module for detecting at least one defective data part in the compressed domain if the current frame is defective, so as to recover said at least one defective data part in the current frame based on at least one of the stored data parts in said at least one neighboring frame.
According to the third aspect of the present invention, there is provided an audio receiver adapted to receive packet data in audio streaming, said receiver comprising an unpacking module for unpacking the received packet data into a bitstream indicative of audio signals, wherein the bitstream comprises a current frame and at least one neighboring frame, each frame having a plurality of data parts. The receiver is characterized by
a decoding module, for decoding said each frame for providing a signal indicative of the plurality of data parts in a compressed domain, by
a storage module, responsive to the signal, for storing said plurality of data parts in the compressed domain in said at least one neighboring frame, and by
an error concealing module for detecting at least one defective data part in the current frame if the current frame is defective so as to recover said at least one defective data part in the current frame based on at least one of the stored data parts in said at least one neighboring frame.
According to the fourth aspect of the present invention, there is provided a telecommunication device, such as a mobile terminal. The telecommunication device comprises:
an antenna, and
an audio receiver connected to the antenna for receiving packet data in audio streaming, wherein the receiver comprises an unpacking module for unpacking the received packet data into a bitstream indicative of audio signals, wherein the bitstream comprises a current frame and at least one neighboring frame, each frame having a plurality of data parts, and wherein the receiver further comprises:
a decoding module, for decoding said each frame for providing a signal indicative of the plurality of data parts in a compressed domain,
a storage module, responsive to the signal, for storing said plurality of data parts in the compressed domain in said at least one neighboring frame, and
an error concealing module for detecting at least one defective data part in the current frame if the current frame is defective so as to recover said at least one defective data part in the current frame based on at least one of the stored data parts in said at least one neighboring frame.
The present invention will become apparent upon reading the description taken in conjunction with
a is a plot showing QMDCT coefficients in one of the stereo channels of an AAC frame.
b is a plot showing QMDCT coefficients in the other stereo channel of the AAC frame.
After various UEP (unequal error protection) schemes have been applied, the situation on the receiver side is likely to be that most packet losses occur in the QMDCT (Quantized Modified Discrete Cosine Transform) data of an AAC frame. Some packet losses occur in the AAC scale factors. In rare situations, packet loss can occur in the critical data, that is, the AAC header and global_gain. If the critical data is lost, it is very difficult to decode the rest of that AAC frame.
Thus, the present invention carries out error concealment directly in the compressed domain. More particularly, the present invention conceals errors in three separate parts of the AAC frame: the critical data part including the header and the global_gain, the QMDCT data and the scale factors. The error concealment method, according to the present invention, is illustrated in the flowchart 500 of
For concealing errors in data 110, 112 and 114 in a current AAC frame, it is preferred that corresponding data in at least one previous frame is stored in a buffer. A receiver capable of carrying out the present invention is shown in
Because the data indicative of the AAC header and global_gain is the most critical data in error concealment, the protection of this critical data must be emphasized. The protection can be achieved in a number of ways, as described below.
1) The critical data can be transmitted in advance, before the streaming starts. In this way, packet loss is most likely to occur in the QMDCT data and the scale factors.
2) The critical data is protected by a selective re-transmission scheme. Because the critical data occupies less than 10% of the bits in most AAC bitstreams, a network-based re-transmission scheme will not consume a significant share of the transmission bandwidth.
3) The critical data is embedded in multiple packets as ancillary data on the sender side.
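Method 3) can be illustrated with a hypothetical packet layout in which each packet repeats the critical data of a few preceding frames as ancillary data. The sketch below assumes that layout and a redundancy of two frames; the functions build_packets and collect_critical and the dictionary fields are illustrative only and do not correspond to any AAC or RTP payload format.

```python
from typing import Dict, List, Optional


def build_packets(critical: List[bytes], payload: List[bytes],
                  redundancy: int = 2) -> List[dict]:
    """Hypothetical layout: packet k carries frame k and, as ancillary data,
    copies of the critical data of the `redundancy` preceding frames."""
    packets = []
    for k in range(len(payload)):
        ancillary: Dict[int, bytes] = {
            j: critical[j] for j in range(max(0, k - redundancy), k)
        }
        packets.append({"frame": k, "critical": critical[k],
                        "payload": payload[k], "ancillary": ancillary})
    return packets


def collect_critical(received: List[dict], n_frames: int) -> List[Optional[bytes]]:
    """Gather the critical data of every frame from any received copy."""
    out: List[Optional[bytes]] = [None] * n_frames
    for pkt in received:
        out[pkt["frame"]] = pkt["critical"]
        for j, data in pkt["ancillary"].items():
            if out[j] is None:
                out[j] = data
    return out
```

If the packet of frame 1 is lost, its critical data can still be taken from the ancillary copies in the packets of frames 2 and 3, so only the QMDCT data and scale factors of frame 1 need concealment.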
With any one of these methods, the critical data of one or more frames can be stored on the receiver side. If the packet loss is in the critical data, at least part of the critical data can be derived from neighboring frames based on their statistical characteristics and data structures. For example, the MDCT window_sequence of a frame n can be determined from the corresponding data in frames n−1 and n+1. Likewise, the window_shape can be reliably estimated from the neighboring frames. Regarding the global_gain, it is preferred that the smaller of the global_gain values in the neighboring frames n−1 and n+1 be used to replace the missing value in frame n. This criterion reflects the psychoacoustic observation that a fill-in sound segment that results in a dip is perceptually more pleasant than one that results in a surge. The critical data buffer for error concealment in the critical data is shown in
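The window_sequence and global_gain rules can be sketched as follows. This is an illustration under an assumption, not the method itself: it infers the window_sequence of frame n from the requirement that adjacent AAC windows overlap consistently (frame n's left half, long or short, must match frame n−1's right half, and its right half must match frame n+1's left half), and it applies the smaller-gain rule stated above. window_shape estimation is not shown, and frame n+1 must already have been received, which implies one frame of decoding delay. The function names are hypothetical.

```python
from enum import Enum


class WindowSequence(Enum):
    # Value = (left half, right half) of the window, long or short.
    ONLY_LONG_SEQUENCE = ("long", "long")
    LONG_START_SEQUENCE = ("long", "short")
    EIGHT_SHORT_SEQUENCE = ("short", "short")
    LONG_STOP_SEQUENCE = ("short", "long")


def estimate_window_sequence(prev: WindowSequence, nxt: WindowSequence) -> WindowSequence:
    # Assumes a valid sequence: frame n's left half matches the right half of
    # frame n-1, and its right half matches the left half of frame n+1.
    return WindowSequence((prev.value[1], nxt.value[0]))


def estimate_global_gain(prev_gain: int, next_gain: int) -> int:
    # The smaller neighbor gain is preferred: a dip is less annoying than a surge.
    return min(prev_gain, next_gain)
```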
After the critical data in the corrupted frame n is derived based on the critical data in frame n−1 and frame n+1 and the derived critical data is stored, there are at least two ways to generate the fill-in:
1. Estimate the missing scale factors and QMDCT data for frame n from neighboring frames as described later herein.
2. Mute the entire frame n in the compressed domain by setting the scale factors and the QMDCT coefficients in the frame to zero, and conceal the errors in the MDCT domain or PCM domain (see
If the packet loss is in the AAC scale factors only (i.e., the AAC header and the global_gain in the same frame are available), then the global_gain and the Huffman table can be used to code the individual scale factors. Furthermore, the sections with zero scale factors can be obtained from the section_data and the maximum value in each data section. As such, it is possible to estimate the individual DPCM (differential pulse code modulation) scale factors and even the entire set of scale factors in the AAC frame. The basic methodology for estimating the missing data is a partial pattern matching approach.
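The role of section_data and global_gain can be sketched in Python. The sketch assumes that sections are available as (sect_cb, start_sfb, end_sfb) triples and that the Huffman-decoded DPCM differences have already had their codebook offset removed. Bands coded with ZERO_HCB (codebook 0) carry no scale factor, so their positions are known even when the scale factors themselves are lost. The function names are hypothetical, and the separate DPCM chains that AAC uses for noise and intensity bands are not modeled.

```python
from typing import List, Tuple

ZERO_HCB = 0  # codebook index of bands whose spectral values are all zero


def bands_needing_scale_factors(sections: List[Tuple[int, int, int]],
                                max_sfb: int) -> List[bool]:
    """Flag the bands that carry a transmitted scale factor.

    `sections` holds (sect_cb, start_sfb, end_sfb) triples from section_data.
    """
    needs = [False] * max_sfb
    for cb, start, end in sections:
        for sfb in range(start, end):
            needs[sfb] = cb != ZERO_HCB
    return needs


def decode_dpcm_scale_factors(global_gain: int, diffs: List[int],
                              needs: List[bool]) -> List[int]:
    """Rebuild absolute scale factors from DPCM differences.

    The chain starts at global_gain; bands without a scale factor get 0.
    """
    scale_factors = []
    current = global_gain
    remaining = iter(diffs)
    for need in needs:
        if need:
            current += next(remaining)
            scale_factors.append(current)
        else:
            scale_factors.append(0)
    return scale_factors
```

When the scale factors are lost, the flags from bands_needing_scale_factors tell the concealment stage which positions must be estimated and which are known to be zero.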
The errors in the scale factors can occur in different ways: 1) all scale factors in an AAC frame are lost; 2) a section of the scale factors in the AAC frame is lost; or 3) an individual scale factor in the AAC frame is lost. When all scale factors in an AAC frame are lost, the missing scale factors can be calculated based on one or more neighboring frames, as shown in
Excluding the first scale factor, which is the global_gain, we calculate the partial Euclidean distance dx,y between two channels x and y as follows:
where N is the number of scale factors in a channel, SCF is an individual scale factor, w is a perceptual weighting factor, and c = Gx − Gy, where Gx and Gy are the global_gain values of channels x and y. In a more sophisticated implementation, c can be derived with a search method that yields the minimum distance between the two channels.
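The equation for dx,y is not reproduced in this text. The sketch below therefore assumes a weighted squared-difference form consistent with the definitions above, dx,y = sqrt(sum over i = 1..N−1 of w(i)·(SCFx(i) − SCFy(i) − c)²), where index 0 holds the global_gain. Positions whose scale factors are lost are simply skipped; treating "partial" this way is an assumption. best_offset illustrates the search variant for c. All names are hypothetical.

```python
import math
from typing import Optional, Sequence


def partial_distance(scf_x: Sequence[Optional[int]], scf_y: Sequence[Optional[int]],
                     w: Sequence[float], c: Optional[float] = None) -> float:
    """Weighted distance between two channels' scale factors (assumed form).

    Index 0 holds the global_gain and is excluded from the sum; if `c` is not
    given it defaults to Gx - Gy. Positions that are None (lost) are skipped.
    """
    if c is None:
        c = scf_x[0] - scf_y[0]
    total = 0.0
    for i in range(1, min(len(scf_x), len(scf_y))):
        if scf_x[i] is None or scf_y[i] is None:
            continue
        total += w[i] * (scf_x[i] - scf_y[i] - c) ** 2
    return math.sqrt(total)


def best_offset(scf_x, scf_y, w, candidates=range(-30, 31)) -> int:
    # Search variant: pick the integer offset c that minimizes the distance.
    return min(candidates, key=lambda c: partial_distance(scf_x, scf_y, w, c))
```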
For example, if a section or all of the scale factors for the right channel of frame n are lost, the partial Euclidean distance d1 between the left and right channels of frame n−1 and the partial Euclidean distance d2 between the left channel of frame n−1 and the left channel of frame n are computed in order to decide whether inter-channel correlation or inter-frame correlation is used for error concealment. If d1>d2 (lag=2), inter-frame correlation should be used, and the lost scale factors in the right channel of frame n should be recovered based on the scale factors in the right channel of frame n−1. If d1<d2 (lag=1), inter-channel correlation should be used, and the lost scale factors in the right channel of frame n should be recovered based on the scale factors in the left channel of frame n. Before the missing scale factors are replaced with the stored ones, some adjustments may be necessary in order to prevent a false energy surge or to avoid creating false salient frequency components. For example, the global_gain offset c between the two channels should be taken into account.
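The lag decision for a lost right channel can be sketched as follows, using an assumed weighted form of the partial distance (global_gain excluded, c = Gx − Gy). The function names, the tie-breaking toward inter-channel concealment when d1 equals d2, and the choice to shift the borrowed scale factors so that they are anchored to the right channel's intact global_gain are all assumptions made for illustration. For simplicity the whole channel is replaced, even when only a section was lost.

```python
import math
from typing import List, Sequence, Tuple


def partial_distance(x: Sequence[int], y: Sequence[int], w: Sequence[float]) -> float:
    # Assumed form: weighted distance, global_gain (index 0) excluded,
    # with the gain offset c = Gx - Gy removed.
    c = x[0] - y[0]
    return math.sqrt(sum(w[i] * (x[i] - y[i] - c) ** 2
                         for i in range(1, min(len(x), len(y)))))


def conceal_right_scale_factors(left_prev: Sequence[int], right_prev: Sequence[int],
                                left_cur: Sequence[int], right_gain_cur: int,
                                w: Sequence[float]) -> Tuple[int, List[int]]:
    d1 = partial_distance(left_prev, right_prev, w)  # inter-channel, frame n-1
    d2 = partial_distance(left_prev, left_cur, w)    # inter-frame, left channel
    if d1 > d2:
        lag, source = 2, right_prev  # inter-frame: right channel of frame n-1
    else:
        lag, source = 1, left_cur    # inter-channel: left channel of frame n
    # Shift the borrowed scale factors by the global_gain offset so that the
    # fill-in is anchored to the right channel's own (intact) global_gain.
    offset = source[0] - right_gain_cur
    return lag, [right_gain_cur] + [s - offset for s in source[1:]]
```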
If an individual scale factor in an AAC frame is lost and its position is known, it is possible to estimate the missing DPCM-coded scale factor if the scale factors in one or more neighboring frames are not corrupted. Without loss of generality, we assume that two individual scale factors are missing, as shown in
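One way to estimate missing DPCM differences by partial pattern matching is a least-squares fit against a neighboring frame: each lost difference is chosen so that the reconstructed scale factors up to the next gap lie as close as possible to the neighbor's. The sketch below is a hypothetical estimator under that assumption, not necessarily the procedure of the referenced figure. diffs[i] is the difference for band i relative to band i−1 (band 0 relative to the global_gain), with None marking a lost value, and neighbor_sf holds the absolute scale factors of an intact neighboring frame.

```python
from typing import List, Optional


def estimate_missing_dpcm(global_gain: int, diffs: List[Optional[int]],
                          neighbor_sf: List[int]) -> List[int]:
    """Fill lost DPCM differences, then rebuild absolute scale factors."""
    n = len(diffs)
    missing = [i for i, d in enumerate(diffs) if d is None]
    est = list(diffs)
    for idx, k in enumerate(missing):
        # Bands k..end-1 are shifted by the unknown difference at band k.
        end = missing[idx + 1] if idx + 1 < len(missing) else n
        acc = global_gain + sum(est[:k])  # earlier gaps are already filled
        partial = []
        for j in range(k, end):
            if j != k:
                acc += est[j]
            partial.append(acc)  # value with the unknown difference set to 0
        # Least squares: the best shift is the mean residual to the neighbor.
        residual = [neighbor_sf[j] - p for j, p in zip(range(k, end), partial)]
        est[k] = round(sum(residual) / len(residual))
    out, acc = [], global_gain
    for d in est:
        acc += d
        out.append(acc)
    return out
```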
The most frequent situation in packet loss is that the QMDCT coefficients are corrupted or lost, but the header and the scale factors are available. In this situation, the partial pattern matching approach can also be used to recover the lost QMDCT coefficients. An example of QMDCT coefficients of an AAC frame is shown in
In order to recover the QMDCT coefficients in the high frequency region, two situations are considered. If all the QMDCT coefficients of a frame are lost (up to 1024), it is preferred that the buffered information alone be used to recover the missing QMDCT coefficients. The lag value (1 or 2) is calculated from the autocorrelation of the FVs in the previous frame in order to determine whether inter-channel or inter-frame correlation should be used, that is, whether a different channel of the same frame or the same channel of a different frame is used. With lag values calculated over several frames, it is also possible to determine which previous frame should replace the missing one. In order to prevent the fill-in QMDCT coefficients from exceeding the maximum value defined by the Huffman codebook being used, the fill-in QMDCT coefficients should be clipped. All the fill-in QMDCT coefficients can also be decreased, for example by a constant, so that there is no energy surge in the fill-in frame.
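The clipping and attenuation steps for a completely lost frame can be sketched as follows. The lag is taken as an input, since the computation of the FVs is not detailed here. The per-codebook limits reflect the largest absolute values of the AAC spectral Huffman codebooks as we understand them (ISO/IEC 13818-7), and reducing each magnitude by a constant of 1 is an assumption; fill_lost_qmdct and its parameters are hypothetical.

```python
from typing import List, Sequence

# Largest magnitude representable by each AAC spectral Huffman codebook;
# codebook 11 reaches 8191 through escape sequences. Codebook 0 (ZERO_HCB)
# and the noise/intensity codebooks are treated as carrying no spectral values.
CODEBOOK_LAV = {0: 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 4, 6: 4,
                7: 7, 8: 7, 9: 12, 10: 12, 11: 8191}


def fill_lost_qmdct(lag: int, other_channel_same_frame: Sequence[int],
                    same_channel_prev_frame: Sequence[int],
                    band_codebooks: Sequence[int], band_offsets: Sequence[int],
                    reduction: int = 1) -> List[int]:
    # lag == 1 selects inter-channel correlation, lag == 2 inter-frame correlation.
    source = other_channel_same_frame if lag == 1 else same_channel_prev_frame
    fill = [0] * len(source)
    for band, cb in enumerate(band_codebooks):
        lav = CODEBOOK_LAV.get(cb, 0)
        for i in range(band_offsets[band], band_offsets[band + 1]):
            # Clip to the codebook range, then shrink the magnitude toward zero
            # by a constant to avoid an energy surge.
            v = max(-lav, min(lav, source[i]))
            mag = max(abs(v) - reduction, 0)
            fill[i] = mag if v >= 0 else -mag
    return fill
```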
If only an isolated cluster of QMDCT coefficients (a cluster of 2 or 4, for example) in the high frequency region is lost, the simplest way to conceal the errors is to replace all the missing QMDCT coefficients with zeros.
In a situation where only an isolated cluster of QMDCT coefficients in the low frequency region is lost, inter-frame correlation can be exploited by checking the partial Euclidean distance to neighboring frames, and the fill-in coefficients are then modified by a decreasing factor in order to prevent a false energy surge.
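The two cluster cases can be combined into one hypothetical routine. The sketch assumes that lost coefficients are marked with None, that a hypothetical index split separates the low and high frequency regions, and that the decreasing factor is 0.5. The neighboring frame with the smallest partial distance over the intact coefficients supplies the fill-in.

```python
import math
from typing import List, Optional, Sequence


def partial_distance(cur: Sequence[Optional[int]], ref: Sequence[int]) -> float:
    # Compare only the coefficients that were received intact.
    return math.sqrt(sum((c - r) ** 2 for c, r in zip(cur, ref) if c is not None))


def conceal_cluster(cur: List[Optional[int]], neighbors: List[Sequence[int]],
                    split: int, decay: float = 0.5) -> List[int]:
    lost = [i for i, c in enumerate(cur) if c is None]
    out = [0 if c is None else c for c in cur]
    if all(i >= split for i in lost):
        return out  # high-frequency cluster (or nothing lost): zero fill
    # Low-frequency cluster: borrow from the best-matching neighboring frame.
    best = min(neighbors, key=lambda ref: partial_distance(cur, ref))
    for i in lost:
        # int() truncates toward zero, so magnitudes never grow.
        out[i] = int(best[i] * decay)
    return out
```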
It should be noted that the receiver 5, as described above, also includes error concealment modules and buffers to reconstruct the corrupted or missing percussive sounds in an audio bitstream. The details of percussive sound recovery have been disclosed in the copending U.S. patent application Ser. No. 10/281,395. However, the method and device for compressed-domain packet loss concealment, according to the present invention, can be implemented without the percussive sound recovery scheme.
The error concealment method and device can be used in a mobile terminal, as shown in
Thus, although the invention has been described with respect to a preferred embodiment thereof, it will be understood by those skilled in the art that the foregoing and various other changes, omissions and deviations in the form and detail thereof may be made without departing from the scope of this invention.
Number | Date | Country | |
---|---|---|---|
20040128128 A1 | Jul 2004 | US |