This application claims the benefit under 35 U.S.C. §119(a) of Korean Patent Application No. 10-2006-0037247, filed in the Korean Intellectual Property Office on Apr. 25, 2006, the entire disclosure of which is incorporated herein by reference.
1. Field of the Invention
The present invention relates generally to an apparatus and method for recovering (concealing the loss of) voice packets. More particularly, the present invention relates to an apparatus and method for determining whether a voice packet received at a relay is lost and if the voice packet is lost, recovering (concealing the loss of) the voice packet before relaying the voice packet to a receiver.
2. Description of the Related Art
Voice over Internet Protocol (VoIP) is Internet telephony that sends voice packets over a packet network designed for data communications. By converting voice data into IP packets, VoIP enables calls to be made as on a regular phone.
In a VoIP communication network, a transmitter converts a voice Pulse Code Modulation (PCM) signal into compressed voice parameter information using a voice encoder constructed based on a human voice generation model and stores the voice parameter information in packets, prior to transmission to a receiver. The receiver extracts the voice parameter information from the voice packets and reproduces the PCM signal using the extracted information. Since packet transmission is carried out asynchronously in the VoIP communication network, the voice packets do not arrive at the receiver at regular intervals. If a large number of packets arrive around a particular time, some of them may be lost. Also, in a mobile communication environment, a bad channel status can lead to packet loss.
Accordingly, recovering of lost packets is a necessary task to be performed by the receiver. Generally, Packet Loss Concealment (PLC) is used to recover the lost packets. Existing PLC techniques basically use the voice information of adjacent normal packets. If only voice information of a previous normal packet is used, a lost packet can be recovered to a certain extent without adding to a packet transmission delay. However, the use of the voice information of both previous and following packets more effectively recovers the lost packet. Unfortunately, the packet transmission delay increases with the number of following packets used to recover the lost packet and thus, the number of packets used needs to be controlled according to a packet transmission time and service requirements.
For further illustration, a conventional voice packet communication environment in the case where the transmitter and the receiver use the same type of audio Coder-Decoder (CODEC) will be described in detail below.
The transmitter 110 has a voice packet generator 112 and outputs a voice packet generated from the voice packet generator 112 on a first channel 120. The relay 130, which can be a gateway or a Base Station (BS), outputs the voice packet received from the transmitter 110 on a second channel 140 using a bypass block 132. When the transmitter 110 and the receiver 150 use the same type of audio CODEC, the bypass block 132 outputs the voice packet without any additional processing, to the receiver 150. The receiver 150 recovers the voice packets that may be lost during transmission on the channels 120 and 140, and converts the voice packets to an analog voice signal using a voice packet recovery and output block 152.
As described above, in the case where the transmitter and the receiver use the same kind of audio CODEC and thus, the relay simply bypasses a received voice packet, it is difficult to improve the performance of the packet recovery block in the receiver unless the receiver is a new product, such as a terminal or an IP phone. However, even though voice packet recovery devices and methods have been improved, their features are not applicable to conventional receivers in real implementation. Consequently, users of conventional receivers may not receive services with better voice quality.
Voice packet recovery in the voice packet recovery block 152 has a number of drawbacks, including the following.
First, a lost voiced packet may be recovered using a previous unvoiced packet. For example, when voiced packets follow an unvoiced packet, that is, when an unvoiced-voiced transient area exists, and the first of the voiced packets is lost, that packet is recovered using the previous unvoiced packet, which can introduce noise.
Second, with the conventional voice packet recovery, when a plurality of packets are contiguously lost, buzz may be output because they are recovered using the voice information of a previous packet.
Third, the amplitude of the voice waveform may increase in the process of deriving a lost packet from a previous packet.
Accordingly, a need exists for a system and method for more effectively and efficiently recovering voice packets.
An object of embodiments of the present invention is to substantially solve at least the above problems and/or disadvantages and provide at least the advantages below. Accordingly, an object of embodiments of the present invention is to provide a voice packet recovery apparatus and method for determining whether a voice packet received at a relay is lost and if the voice packet is lost, recovering the voice packet prior to relaying the voice packet to a receiver.
Another object of embodiments of the present invention is to provide a voice packet recovery apparatus and method for correcting voiced information which has been recovered using unvoiced information of a previous packet.
A further object of embodiments of the present invention is to provide a voice packet recovery apparatus and method for reducing buzz that may be created when a plurality of voice packets are contiguously lost.
Still another object of embodiments of the present invention is to provide a voice packet recovery apparatus and method for restricting voice information with a large waveform that can be created during voice packet recovery.
According to one aspect of embodiments of the present invention, an apparatus for recovering a lost voice packet is provided, in which a packet loss detector determines whether a received packet has been lost, packet information storage stores voice information of previous voice packets and voice information of the received voice packet, a packet error corrector measures the voice information of the received voice packet, stores the measured voice information in the packet information storage, corrects the voice information when necessary, and generates a corrected voice packet, if the received voice packet is normal, and a packet loss recoverer recovers the voice information of the received voice packet using the voice information of previous voice packets stored in the packet information storage and generates a recovered voice packet, if the received voice packet has been lost.
According to another aspect of embodiments of the present invention, a method for recovering a lost voice packet is provided, in which a voice packet is received and it is determined whether the received packet has been lost, and voice information of the received voice packet is recovered using voice information of previous voice packets if the received voice packet has been lost.
The above and other objects, features and advantages of embodiments of the present invention will become more apparent from the following detailed description when taken in conjunction with the accompanying drawings in which:
Throughout the drawings, like reference numerals will be understood to refer to like parts, components and structures.
Exemplary embodiments of the present invention will be described herein below with reference to the accompanying drawings. Descriptions of well-known functions or constructions are omitted for clarity and conciseness.
Embodiments of the present invention provide an apparatus and method for determining whether a voice packet received at a relay is lost and if the voice packet is lost, recovering the voice packet before relaying the voice packet to a receiver.
The packet loss detector 310 determines whether a received voice packet has been lost. The packet loss detector 310 determines that the packet has been lost if the packet rate of the received voice packet is not one of the predetermined packet rates that indicate the existence of a packet in the communication system. If the voice packet is normal, the packet loss detector 310 sends the voice packet to the packet error corrector 320. If the voice packet has been lost, the packet loss detector 310 sends the voice packet to the packet loss recoverer 340.
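The rate-based loss check described above can be sketched as follows. This is a minimal illustration only: the rate labels in `VALID_PACKET_RATES` and the function names are hypothetical placeholders, since the text says only that the rates are "predetermined".

```python
# Hypothetical labels for the predetermined packet rates; the actual set
# depends on the codec and communication system and is not given in the text.
VALID_PACKET_RATES = frozenset({"full", "half", "eighth"})


def is_packet_lost(packet_rate, valid_rates=VALID_PACKET_RATES):
    """A packet is treated as lost if its rate is not a predetermined rate."""
    return packet_rate not in valid_rates


def route_packet(packet_rate):
    """Return the name of the block that should handle the packet."""
    if is_packet_lost(packet_rate):
        return "packet_loss_recoverer"   # block 340 in the description
    return "packet_error_corrector"      # block 320 in the description
```

For example, `route_packet("erasure")` selects the loss recoverer because `"erasure"` is not in the assumed set of valid rates.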
The packet error corrector 320 measures voice information of the voice packet, stores the measurements in the packet information storage 330, corrects the voice information when needed, and outputs the corrected voice packet.
The packet loss recoverer 340 recovers the lost voice packet using voice information of previous voice packets stored in the packet information storage 330.
The packet information storage 330 stores information with which to determine whether correction is required, the data rate of the latest received valid voice packet, and the voice information of voice packets.
The voice information extractor 422 measures the data rate of a received voice packet and extracts voice information from the voice packet according to the data rate. If the data rate is equal to or lower than a threshold, the voice information extractor 422 determines that the voice packet is an unvoiced one and calculates a Line Spectrum Pair (LSP) and a gain representing a voice amplitude from the voice packet.
If the data rate is higher than the threshold, the voice information extractor 422 determines that the voice packet is a voiced one which includes unvoiced information and voiced information and calculates an LSP, a pitch, an Adaptive CodeBook (ACB) gain, and a Fixed CodeBook (FCB) gain from the voice packet. The LSP is voiced information indicating the spectral energy of voice. The pitch is the interval between voiced sounds and the ACB gain is the gain of voiced sound. The FCB gain is the gain of unvoiced sound. Additional details of the voice information are described in publication “TIA/EIA/IS-127, Enhanced Variable Rate Codec, Speech Service Option 3 for Wideband Spread Spectrum Digital Systems, 1997”, the entire disclosure of which is incorporated herein by reference.
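The threshold rule used by the voice information extractor 422 can be sketched as below. The numeric representation of the data rate and the function names are assumptions made for illustration; only the comparison rule and the parameter lists come from the description.

```python
def classify_packet(data_rate, threshold):
    """Classify a packet as 'unvoiced' or 'voiced' from its data rate."""
    return "unvoiced" if data_rate <= threshold else "voiced"


def parameters_to_extract(data_rate, threshold):
    """Return the names of the voice parameters calculated for the packet."""
    if classify_packet(data_rate, threshold) == "unvoiced":
        # Unvoiced packet: LSP and a gain representing the voice amplitude.
        return ("lsp", "gain")
    # Voiced packet: LSP, pitch, ACB gain, and FCB gain.
    return ("lsp", "pitch", "acb_gain", "fcb_gain")
```

A data rate equal to the threshold counts as unvoiced, matching the "equal to or lower than" wording.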
The voice information extractor 422 extracts voice information. The extracted voice information is stored in the packet information storage 330. Also, the extracted voice information is transferred to the voice information calculator 424.
The voice information calculator 424 receives the data rate and the voice information from the voice information extractor 422. If the data rate is equal to or less than the threshold, which means that the received voice packet is an unvoiced one, the voice information calculator 424 checks the average LSP and average gain of unvoiced packets among packets up to the previous packet, and calculates the average LSP and the average gain of the previous unvoiced packets and the received packet by using Equation (1) below.
If the data rate is higher than the threshold, which means that the received packet is a voiced one, the voice information calculator 424 calculates the average ACB gain and average FCB gain of the received voice packet by using Equation (2) below. Equation (1) applies to an unvoiced packet, and Equation (2) applies to a voiced packet.
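$$
\begin{aligned}
\mathrm{LSP}_{avg}[i] &= (1-\alpha)\cdot \mathrm{LSP}_{avg,pre}[i] + \alpha\cdot \mathrm{LSP}[i], \quad i \in [0,\ \mathrm{MAX}_{index}-1] \\
\mathrm{GAIN}_{avg} &= (1-\alpha)\cdot \mathrm{GAIN}_{avg,pre} + \alpha\cdot \mathrm{GAIN}
\end{aligned}
\qquad (1)
$$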
In the above Equation (1), LSPavg,pre represents the average LSP of the previous unvoiced packets among received packets up to the previous packet, LSP represents the LSP of the received unvoiced packet, LSPavg represents the average LSP of the previous unvoiced packets and the received unvoiced packet, MAXindex represents the number of elements in an LSP set, GAINavg,pre represents the average gain of the previous unvoiced packets, GAIN represents the gain of the received unvoiced packet, GAINavg represents the average gain of the previous unvoiced packets and the received unvoiced packet, and α is a weight.
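Equation (1) is a weighted running average applied element by element to the LSP set and once to the gain. A minimal sketch follows; the function name and the use of plain Python lists are assumptions, and the value of the weight α is left to the caller because the text does not specify it.

```python
def update_unvoiced_averages(lsp_avg_pre, gain_avg_pre, lsp, gain, alpha):
    """Apply Equation (1) to the stored unvoiced averages.

    lsp_avg_pre and lsp must each hold MAXindex elements.
    """
    if len(lsp_avg_pre) != len(lsp):
        raise ValueError("LSP sets must have the same number of elements")
    lsp_avg = [(1 - alpha) * prev + alpha * cur
               for prev, cur in zip(lsp_avg_pre, lsp)]
    gain_avg = (1 - alpha) * gain_avg_pre + alpha * gain
    return lsp_avg, gain_avg
```

With α close to 0 the stored average changes slowly; with α close to 1 it tracks the most recently received unvoiced packet.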
In the above Equation (2), ACB[i] represents the ACB gain of an ith subframe in the received voice packet, ACBavg represents the average ACB gain of the subframes of the received packet, and subframe_num represents the number of the subframes of the received voice packet.
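Read together, the symbol definitions given for Equation (2) describe an arithmetic mean of the per-subframe ACB gains. One form consistent with those definitions is shown below; the zero-based summation bounds are an assumption, and the average FCB gain would presumably be computed in the same way over the per-subframe FCB gains.

$$
\mathrm{ACB}_{avg} = \frac{1}{\mathrm{subframe\_num}} \sum_{i=0}^{\mathrm{subframe\_num}-1} \mathrm{ACB}[i]
$$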
The corrector 426 receives the data rate, the voice information, and the average voice information of the received voice packet and determines whether to correct the voice packet, referring to a correction flag stored in the packet information storage 330. If correction is not required, the corrector 426 initializes the correction flag and outputs the voice information to the packetizer 428. If correction is required, the corrector 426 sets the ACB gain to 0, thereby reducing noise that may be produced, initializes the correction flag in the packet information storage 330, stores the corrected voice information in the packet information storage 330, and outputs it to the packetizer 428.
The corrector 426 determines that correction is required if the data rate is high and the correction flag is set by the packet loss recoverer 340. That is, the corrector 426 compensates for possible noise if a previous recovered packet was recovered using an unvoiced packet and the received packet is a voiced one with a high data rate.
The packetizer 428 then generates a corrected voice packet using the voice information received from the corrector 426.
Upon receipt of a lost voice packet, the rate decider 542 checks the data rate of the latest valid packet stored in the packet information storage 330 and provides the data rate to the voice information recoverer 544. The data rate of the latest valid packet is that of the latest received voice packet which is not damaged among received voice packets.
The voice information recoverer 544 recovers voice information of the received voice packet using voice information stored in the packet information storage 330 according to the data rate of the latest valid packet, stores the recovered voice information in the packet information storage 330, and sends it to the packetizer 546. If the data rate of the latest valid packet is low and thus, the voice information is recovered using unvoiced information stored in the packet information storage 330, the voice information recoverer 544 sets the correction flag in the packet information storage 330.
The packetizer 546 then generates a recovered voice packet using the voice information received from the voice information recoverer 544.
The correction flag storage 632 stores the correction flag. The correction flag is set when the last recovered voice packet was recovered using unvoiced information. The correction flag is set by the packet loss recoverer 340 and initialized by the packet error corrector 320.
The rate storage 634 stores the data rate of the latest valid packet, which is the last of data rates received from the packet error corrector 320 and provides it to the packet loss recoverer 340 upon request from the packet loss recoverer 340.
The voice information storage 636 stores the voice information of normal voice packets and lost voice packets.
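The three storage elements can be pictured as one small record. The sketch below is illustrative only: the class name, field names, method names, and the use of dictionaries for voice information are assumptions, not part of the description.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class PacketInformationStorage:
    # Correction flag storage 632: set when the last recovered packet
    # was recovered using unvoiced information.
    correction_flag: bool = False
    # Rate storage 634: data rate of the latest valid packet.
    latest_valid_rate: Optional[str] = None
    # Voice information storage 636: voice information of normal and lost packets.
    voice_info: List[Dict[str, Any]] = field(default_factory=list)

    def set_correction_flag(self) -> None:
        """Called by the packet loss recoverer."""
        self.correction_flag = True

    def initialize_correction_flag(self) -> None:
        """Called by the packet error corrector."""
        self.correction_flag = False

    def store_valid_packet(self, rate: str, info: Dict[str, Any]) -> None:
        """Record a normal packet and remember its rate as the latest valid rate."""
        self.latest_valid_rate = rate
        self.voice_info.append(info)

    def store_recovered_packet(self, info: Dict[str, Any]) -> None:
        """Record recovered voice information without changing the valid rate."""
        self.voice_info.append(info)
```

Keeping the latest valid rate separate from the history means a run of lost packets does not overwrite the rate used to choose the recovery path.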
If the voice packet has been lost, the relay recovers the voice packet in step 704 and outputs the recovered voice packet in step 708. The voice packet recovery of step 704 will be described in detail below with reference to
If the voice packet is normal, the relay corrects the voice packet in step 706 and outputs the corrected voice packet in step 708. The voice packet correction of step 706 will be described in great detail below with reference to
In case of a low data rate, the packet error corrector 320 measures unvoiced information of the voice packet in step 810 and initializes the correction flag in the packet information storage 330 in step 812. The packet error corrector 320 stores the unvoiced information in the packet information storage 330 in step 814 and generates a voice packet using the unvoiced information in step 816.
In case of a high data rate, the packet error corrector 320 measures unvoiced information and voiced information of the voice packet in step 804 and determines whether voice correction is required according to the correction flag stored in the packet information storage 330 in step 806. If the correction is required, which means that the correction flag is set, the packet error corrector 320 changes the ACB gain of the voice packet to 0, for correcting the voice information of the voice packet in step 808. The packet error corrector 320 initializes the correction flag in step 812, stores the unvoiced information, the voiced information, and the corrected voice information in the packet information storage 330 in step 814, and generates a voice packet using the unvoiced information, the voiced information, and the corrected voice information in step 816.
If the correction is not required in step 806, which means that the correction flag is not set, the packet error corrector 320 stores the unvoiced information and the voiced information in the packet information storage 330 in step 814, and generates a voice packet using the unvoiced information and the voiced information in step 816.
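The correction flow of steps 804 to 816 can be sketched as follows. The dictionary layout, the key names, and the choice to zero the ACB gain of every subframe are assumptions made for illustration; the text states only that the ACB gain of the voice packet is changed to 0.

```python
def correct_packet(is_high_rate, voice_info, storage):
    """Sketch of the packet error corrector flow.

    storage is a dict with keys 'correction_flag' (bool) and 'history' (list).
    voice_info is a dict; high-rate packets carry an 'acb_gains' list.
    """
    info = dict(voice_info)  # leave the caller's measurements untouched
    if is_high_rate:
        # Step 806: correction is required only when the flag is set.
        if storage["correction_flag"]:
            # Step 808: suppress the voiced contribution to reduce noise.
            info["acb_gains"] = [0.0] * len(info["acb_gains"])
            # Step 812: initialize the correction flag.
            storage["correction_flag"] = False
    else:
        # Low data rate: step 812 initializes the flag after measurement.
        storage["correction_flag"] = False
    # Step 814: store the (possibly corrected) voice information.
    storage["history"].append(info)
    # Step 816: the returned information would be handed to the packetizer.
    return info
```

When the flag is clear and the packet is high-rate, the information passes through unchanged, which corresponds to the branch from step 806 directly to step 814.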
If the data rate is not low, the packet loss recoverer 340 recovers voiced information and unvoiced information of the voice packet using voice information stored in the packet information storage 330 in step 904 and stores the recovered voice information in the packet information storage 330 in step 908, and generates a voice packet using the recovered voice information in step 910.
If the data rate is low, the packet loss recoverer 340 recovers unvoiced information of the voice packet using voice information stored in the packet information storage 330 in step 906, stores the recovered voice information in the packet information storage 330 in step 908, and generates a voice packet using the recovered voice information in step 910. The recovery of unvoiced information will be described in detail below with reference to
After the pitch recovery, the voice information recoverer 544 calculates a pitch difference by subtracting the pitch of the previous voice packet from the recovered pitch in step 1004.
The voice information recoverer 544 checks the average ACB gain of the previous packet stored in the packet information storage 330 and if the average ACB gain is larger than an ACB gain limit, changes the average ACB gain to the ACB gain limit in step 1006. The ACB gain restriction is for preventing a sudden increase in voice waveform during the voice packet recovery.
In step 1008, the voice information recoverer 544 recovers the ACB gain of the received voice packet to be equal to the average ACB gain. If a voice packet does not have a low data rate, it includes both voiced information and unvoiced information, and one packet comprises three subframes, each having voiced information. Therefore, the voice information recoverer 544 recovers the ACB gain of each subframe to be equal to the average ACB gain in step 1008.
The voice information recoverer 544 compares the average ACB gain with an ACB gain threshold in step 1010. If the average ACB gain is larger than the ACB gain threshold, the voice information recoverer 544 sets the FCB gain of the received voice packet to a predetermined value in step 1012 and randomly sets an FCB index in step 1016. The predetermined value is an empirically obtained small value or 0. If the average ACB gain of the previous voice packet is larger than the ACB gain threshold, this means that it is highly probable that the received voice packet is a voiced packet. Therefore, the effects of unvoiced sound are reduced by setting the FCB gain representing the amplitude of the unvoiced sound to the predetermined small value in step 1012.
If the average ACB gain is equal to or less than the ACB gain threshold, the voice information recoverer 544 recovers the FCB gain of the received voice packet to be equal to the FCB gain of the previous packet in step 1014 and randomly sets the FCB index of the received voice packet in step 1016. Since the FCB index is not related to the previous voice packet, it is randomly generated.
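Steps 1006 to 1016 can be sketched as below. The function name, the default of 0.0 for the small FCB gain, the size of the FCB index range, and the use of the clamped average in the threshold comparison are all assumptions; the description does not give numeric values for the limit or the threshold.

```python
import random


def recover_voiced_gains(avg_acb_gain, prev_fcb_gain, acb_gain_limit,
                         acb_gain_threshold, small_fcb_gain=0.0,
                         subframes=3, fcb_index_count=256, rng=None):
    """Recover ACB gains, FCB gain, and FCB index for a lost voiced packet."""
    rng = rng or random.Random()
    # Step 1006: restrict the average ACB gain to avoid a sudden waveform increase.
    acb = min(avg_acb_gain, acb_gain_limit)
    # Step 1008: every subframe receives the (restricted) average ACB gain.
    acb_gains = [acb] * subframes
    # Step 1010: a large average ACB gain suggests the lost packet was voiced.
    if acb > acb_gain_threshold:
        fcb_gain = small_fcb_gain    # step 1012: reduce the unvoiced contribution
    else:
        fcb_gain = prev_fcb_gain     # step 1014: reuse the previous FCB gain
    # Step 1016: the FCB index is unrelated to the previous packet, so draw it randomly.
    fcb_index = rng.randrange(fcb_index_count)
    return acb_gains, fcb_gain, fcb_index
```

Passing a seeded `random.Random` as `rng` makes the index reproducible for testing, while the default draws a fresh index on each call.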
$$
\mathrm{GAIN} = (1-\beta)\cdot \mathrm{GAIN}_{prev} + \beta\cdot \mathrm{GAIN}_{avg} \qquad (3)
$$
In the above Equation (3), GAIN represents the recovered gain of the received unvoiced packet, GAINprev represents the gain of the latest unvoiced packet, GAINavg represents the average gain of the unvoiced packets, and β is a weight.
In step 1104, the voice information recoverer 544 sets the correction flag in the packet information storage 330. The correction flag is set when a lost packet is recovered using unvoiced information stored in the packet information storage 330.
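The unvoiced recovery path, Equation (3) followed by setting the correction flag in step 1104, can be sketched as follows. The function names and dictionary keys are hypothetical, and the weight β is supplied by the caller because its value is not stated.

```python
def recover_unvoiced_gain(gain_prev, gain_avg, beta):
    """Equation (3): blend the latest unvoiced gain with the unvoiced average."""
    return (1 - beta) * gain_prev + beta * gain_avg


def recover_unvoiced_packet(storage, beta):
    """Recover the gain of a lost packet from stored unvoiced information.

    storage is a dict with keys 'gain_prev', 'gain_avg', and 'correction_flag'.
    """
    gain = recover_unvoiced_gain(storage["gain_prev"], storage["gain_avg"], beta)
    # Step 1104: mark that the packet was recovered using unvoiced information,
    # so that a following high-rate packet can be corrected.
    storage["correction_flag"] = True
    return gain
```

Because the recovered gain is pulled toward the long-term average, repeated losses drift toward the average rather than repeating the last gain exactly.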
As is apparent from the above description, the voice packet recovery apparatus and method according to embodiments of the present invention provide a voice packet receiver with a voice service that has less noise, suppressed buzz, and fewer abrupt loud sounds. Also, applying embodiments of the present invention to a relay reduces cost.
While the present invention has been shown and described with reference to certain exemplary embodiments thereof, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
10-2006-0037247 | Apr 2006 | KR | national |
Number | Date | Country |
---|---|---|
10-2005-0066477 | Jun 2005 | KR |
10-2006-0002569 | Jan 2006 | KR |
Entry |
---|
Translation of Korean Published Patent Application, KR 10-2006-0002569 A, published on Jan. 9, 2006, Do Hoon Lee. |
Variable Rate Multimodal Speech coder with Gain Matched Analysis by Synthesis, by Erdal Paksoy, 1997. |
ITU-T Standard, G728 Annex J, released Sep. 1999. |
3GPP2-WG of Association of Radio Industries and Businesses (ARIB), “ARIB STD-T64-C.S0014-0 v1.0, Enhanced Variable Rate Codec (EVRC), Speech Service Option 3 for Wideband Spread Spectrum Digital Systems”, 1997, pp. 1-141, 3rd Generation Partnership Project 2. |
Number | Date | Country | |
---|---|---|---|
20070258385 A1 | Nov 2007 | US |