This application is the U.S. national phase of international application PCT/SE01/02798, filed 14 Dec. 2001, which designates the U.S.
The invention relates to a method for controlling a jitter buffer in a node of a communication system and to a communication apparatus implementing said method in a communication system.
Currently, there is a strong trend in the telecommunication business to merge data and voice traffic into one network using packet switched transmission technology. This trend, often referred to as “Voice over IP” or “IP-telephony”, is now also moving into the world of cellular radio communications.
One problem associated with IP-telephony communication systems is that the individual speech packets in a stream, generated and transmitted from an originating node to a receiving node in the communication system, experience stochastic transmission delays, which may even cause speech packets to arrive at the receiving node in a different order than they were transmitted. In order to cope with these variable transmission delays, which cause so-called jitter in the arrival times of the speech packets at the receiving node and may even reorder them, the receiving node is typically provided with a jitter buffer. The jitter buffer is used for sorting the speech packets into the correct sequence and for delaying the packets as needed to compensate for transmission delay variations, i.e. the packets are not played back immediately upon arrival.
Another problem that is present in “IP-telephony”, as opposed to traditional circuit switched telephony, is that the clock that controls the sampling frequency, and thereby the rate at which speech packets are produced by the originating node, is not locked to, or synchronized with, the clock controlling the sample playout rate at the receiving node. In an “IP-telephony” call involving two personal computers (PCs), the respective sampling rates are typically controlled by the sound board clocks of the PCs, which is known to cause problems. As a result of the difference in clock rates at the originating node and the receiving node, so-called clock skew, the receiving node may experience either buffer overflow or buffer underflow in the jitter buffer. If the clock at the originating node is faster than the clock at the receiving node, the delay in the jitter buffer will increase and eventually cause buffer overflow, while if the clock at the originating node is slower than the clock at the receiving node, the receiving node will eventually experience buffer underflow.
One way of handling clock skew has been to perform a crude correction whenever needed. Thus, upon encountering buffer overflow of the jitter buffer, packets may be discarded. Conversely, upon encountering buffer underflow of the jitter buffer, certain packets may be replayed to avoid pausing. If the clock skew is not too severe, such corrections may be needed only once every few minutes, which may be perceptually acceptable. However, if the clock skew is severe, corrections may be needed more frequently, up to once every few seconds. In this case, crude correction will create perceptually unacceptable artefacts.
U.S. Pat. No. 5,699,481 teaches a timing recovery scheme for packet speech in a communication system comprising a controller, a speech decoder and a common buffer for exchanging coded speech packets (CSP) between the controller and the speech decoder. The coded speech packets are generated by and transmitted from another communication system to the communication system via a communication channel, such as a telephone line. The received coded speech packets are entered into the common buffer by the controller. Whenever the speech decoder detects excessive or missing speech packets in the common buffer, the speech decoder switches to a special corrective mode. If excessive speech data is detected, it is played out faster than usual, while if missing data is detected, the available data is played out slower than usual. Faster playout of data is effected by the speech decoder discarding some speech information, while slower playout of data is effected by the speech decoder synthesizing some speech-like information. The speech decoder may modify either the synthesized output speech signal, i.e. the signal after complete speech decoding, or, in the preferred embodiment, the intermediate excitation signal, i.e. the intermediate speech signal prior to LPC filtering. In either case, manipulating shorter-duration units and silence or unvoiced units results in better quality of the modified speech.
The article “Priority Discarding of Speech in Integrated Packet Networks”, Petr et al., IEEE Journal on Selected Areas in Communications, vol. 7, no. 5, June 1989, discloses an integrated packet network (IPN) in which overload control is accomplished by exploiting the inherent structure of the speech signal. A delivery priority is assigned to each speech packet at the transmitter. The delivery priority assigned to a certain speech packet depends on how important the content of the speech packet is to the communication. In response to an overload situation occurring in the network, i.e. when, for a short period of time, the arrival rate at a packet multiplexer exceeds its service rate and causes short-term congestion at the packet multiplexer, speech packets are discarded according to their assigned delivery priority.
U.S. Pat. No. 5,659,541 discloses methods of reducing accumulated gross delay in packet switched voice transmission. According to U.S. Pat. No. 5,659,541, an A/D converter provides voice samples by sampling the analog voice signal at a predetermined rate. The voice samples are assembled into packets and transmitted through a packet switching network to a receiving party, where the digitized speech samples are inserted into an arrival buffer. The depth of the arrival buffer is monitored, and if the arrival buffer contains more than a threshold number of digital samples, a leaky filter is actuated to discard a predetermined number of the samples in the arrival buffer. The leaky filter may, e.g., discard one out of every “X” samples, where “X” is a predetermined fixed number; compute a random number “Y” and discard the “Yth” sample in the arrival buffer; or detect a group of low-energy samples in the buffer (e.g. samples corresponding to a period of silence) and discard the samples of this group.
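For illustration, the threshold-triggered leaky filter summarized above for U.S. Pat. No. 5,659,541 can be sketched as follows. This is a minimal Python sketch of the behaviour as described here, not the patented implementation; the function names, the list-of-integers buffer and the frame-based silence test are assumptions made for the example.

```python
import random
from typing import Callable, List

Samples = List[int]


def discard_every_xth(samples: Samples, x: int) -> Samples:
    """Drop every x-th sample (positions x, 2x, 3x, ... counting from 1)."""
    return [s for i, s in enumerate(samples, start=1) if i % x != 0]


def discard_random(samples: Samples, rng: random.Random) -> Samples:
    """Draw a random position Y and drop the sample at that position."""
    y = rng.randrange(len(samples))
    return samples[:y] + samples[y + 1:]


def discard_silence(samples: Samples, frame_len: int, energy_floor: float) -> Samples:
    """Drop the first frame whose mean-square energy is below energy_floor."""
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if sum(s * s for s in frame) / frame_len < energy_floor:
            return samples[:start] + samples[start + frame_len:]
    return samples  # no quiet frame found: leave the buffer unchanged


def leaky_filter(samples: Samples, threshold: int,
                 discard: Callable[[Samples], Samples]) -> Samples:
    """Apply a discard strategy only when the buffer holds more than threshold samples."""
    if len(samples) <= threshold:
        return samples
    return discard(samples)


buffer = list(range(1, 11))
print(leaky_filter(buffer, 5, lambda s: discard_every_xth(s, 5)))  # [1, 2, 3, 4, 6, 7, 8, 9]
```

Passing the discard strategy as a callable keeps the depth check separate from the three alternative ways of choosing which samples to drop.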
The problem dealt with by the present invention is providing an improved way of controlling buffering delay in a jitter buffer for buffering blocks of compressed speech information in a first node of a communication system.
An improved way of controlling buffering delay in a jitter buffer for buffering blocks of compressed speech information in a first node of a communication system is provided.
Speech quality is improved by enabling actions aiming at reducing buffering delay in the jitter buffer to be selectively applied to blocks of compressed speech information which are of less importance to perceived voice quality as compared to other blocks of compressed speech information.
The technology advantageously affords an improved way of controlling buffering delay in a jitter buffer for buffering blocks of compressed speech information.
Speech quality may be improved by selectively applying actions aiming at reducing buffering delay in the jitter buffer to blocks of compressed speech information which are of less importance to perceived voice quality as compared to other blocks of compressed speech information.
The invention will now be described in more detail with reference to exemplary embodiments thereof and also with reference to the accompanying drawings.
Thus, in an exemplary scenario of a voice communication session, i.e. a phone call, involving a user at the fixed terminal TE1 and a user at the mobile station MS1, voice information is communicated between the fixed terminal TE1 and the base station BS1 using a packet switched mode of communication. The well-known Real-time Transport Protocol (RTP), User Datagram Protocol (UDP) and Internet Protocol (IP), specified by the IETF, are used to convey speech packets, including blocks of compressed speech information, between the fixed terminal TE1 and the base station BS1. At the base station BS1, the RTP, UDP and IP protocols are terminated, and the blocks of compressed speech information are transported between the base station BS1 and the mobile station MS1 over a circuit switched radio channel CH1 assigned to serve the phone call. Since the radio channel CH1 is circuit switched, it is dedicated to transporting blocks of speech information associated with the call at a fixed bandwidth.
In order to manage variations in transmission delay, which individual packets experience when being transmitted through the packet switched network NET1 from the fixed terminal TE1 to the base station BS1, the base station BS1 includes a jitter buffer JB1 associated with the radio channel CH1.
In the exemplary communication system SYS1 of
$$44.1\ \text{kHz} \times \frac{10}{55} \approx 8.018\ \text{kHz} \qquad (1)$$
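Assuming, as equation (1) indicates, that the originating side delivers samples at about 8.018 kHz while the base station consumes them at a nominal 8 kHz (the nominal rate is an assumption here), and that each block carries 20 ms of speech (the block duration used in the delay-threshold example later in this description), the rate at which the buffering delay grows can be worked out directly. The minimal Python sketch below does that arithmetic; the function name `skew_drift` is hypothetical.

```python
def skew_drift(sender_hz: float, receiver_hz: float, block_ms: float = 20.0):
    """Return (buffer growth in ms per second, seconds until one extra block accumulates).

    A negative growth means the buffer drains instead (underflow risk).
    """
    excess_samples_per_s = sender_hz - receiver_hz           # produced but not consumed
    drift_ms_per_s = excess_samples_per_s / receiver_hz * 1000.0
    if drift_ms_per_s == 0:
        return 0.0, float("inf")                              # clocks agree: no drift
    seconds_per_block = block_ms / drift_ms_per_s
    return drift_ms_per_s, seconds_per_block


sender_hz = 44_100 * 10 / 55   # about 8018.18 Hz, as in equation (1)
drift, period = skew_drift(sender_hz, 8_000.0)
print(f"{drift:.3f} ms/s, one extra 20 ms block every {period:.1f} s")
# prints: 2.273 ms/s, one extra 20 ms block every 8.8 s
```

Under these assumptions the buffer gains a full block roughly every nine seconds, which places this skew in the "severe" category where crude corrections become audible.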
Thus the problem of clock skew between a fixed terminal and the base station BS1 may occur frequently, causing a significant risk that a jitter buffer in the base station BS1, e.g. jitter buffer JB1, experiences an ever-increasing buffering delay which eventually causes buffer overflow.
The present invention provides a way of controlling buffering delay and avoiding overflow of a jitter buffer in a node of a communications system.
At step 201, blocks in said stream of blocks of compressed speech information are selected according to a predetermined rule based on the current buffering delay of the jitter buffer and at least one characteristic of said blocks of compressed speech information.
At step 202, predetermined actions aiming at reducing buffering delay in the jitter buffer are applied to the selected blocks of compressed speech information. The predetermined actions may e.g. include removing selected blocks from the jitter buffer and/or refraining from inserting selected blocks into the jitter buffer.
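Steps 201 and 202 can be sketched generically as a selection predicate (the predetermined rule) combined with one of the two actions mentioned: refraining from inserting a block, or removing a stored block. In this minimal Python sketch, the `SpeechBlock` type, its `energy` attribute and the 40 ms / energy-below-10 rule are hypothetical placeholders chosen only to make the example concrete.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SpeechBlock:
    payload: bytes   # compressed speech information (opaque here)
    energy: float    # hypothetical per-block characteristic used by the rule


Rule = Callable[[float, SpeechBlock], bool]  # step 201: (current delay, block) -> selected?


def example_rule(delay_ms: float, block: SpeechBlock) -> bool:
    """Hypothetical rule: select quiet blocks once the delay reaches 40 ms."""
    return delay_ms >= 40.0 and block.energy < 10.0


def on_arrival(buffer: List[SpeechBlock], block: SpeechBlock,
               delay_ms: float, select: Rule) -> bool:
    """Step 202, variant 1: refrain from inserting a selected block. Returns True if inserted."""
    if select(delay_ms, block):
        return False
    buffer.append(block)
    return True


def prune(buffer: List[SpeechBlock], delay_ms: float, select: Rule,
          max_remove: int = 1) -> int:
    """Step 202, variant 2: remove up to max_remove selected blocks already in the buffer.

    The delay is treated as fixed for the duration of one call.
    """
    removed, i = 0, 0
    while i < len(buffer) and removed < max_remove:
        if select(delay_ms, buffer[i]):
            del buffer[i]
            removed += 1
        else:
            i += 1
    return removed
```

Limiting removals per call (here one by default) avoids cutting the delay far below what is needed to absorb normal jitter.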
The base station BS1 includes a plurality of jitter buffers including the jitter buffer JB1, a buffer manager 301, a plurality of transceivers including a transceiver 302 and a network interface 303.
The network interface 303 connects the base station BS1 to the packet switched network NET1 and enables the base station BS1 to transmit and receive packets of both user data, including speech packets 304, and signalling data via the packet switched network NET1. In the exemplary communication system SYS1 of
The transceiver 302 handles transmission and reception of radio signals on the radio channel CH1. At regular intervals, the transceiver is provided with blocks 305 of compressed speech information from the jitter buffer JB1 for transmission over the radio channel CH1 to the mobile station MS1.
The jitter buffer JB1 is provided to delay the individual blocks 305 as needed to compensate for transmission delay variations occurring when transmitting said blocks 305 of compressed speech information from the fixed terminal TE1 to the base station BS1 via the packet switched network NET1. In the exemplary first embodiment, each entry 306 in the jitter buffer JB1 includes one block 305 of compressed speech information together with a timestamp 307 indicating when the entry 306 was inserted into the jitter buffer JB1.
The buffer manager 301 controls operation of the jitter buffer JB1. It is responsible for arranging the blocks 305 of compressed speech information in the jitter buffer JB1 in the correct sequence, for providing the transceiver 302 with blocks 305 of compressed speech information for transmission over the radio channel CH1 at regular intervals, and also for controlling the buffering delay of the jitter buffer JB1.
At step 401, the buffer manager 301 determines the current buffering delay in the jitter buffer JB1. In this exemplary embodiment, the current delay is calculated by comparing the current time with the time stamp 307 of the oldest jitter buffer entry 306. At step 402, the current delay is compared to a first delay threshold. If the current delay is less than the first delay threshold (an alternative YES at step 402), processing continues at step 408. Otherwise (an alternative NO at step 402), a check is made at step 403 whether there is a jitter buffer entry 306 including a block 305 of compressed speech information representing a speech segment having an energy level below a first energy level threshold. If such a jitter buffer entry 306 is found (an alternative YES at step 403), processing continues at step 407. Otherwise (an alternative NO at step 403), the current delay is compared at step 404 to a second delay threshold, which is greater than the first delay threshold. If the current delay is less than the second delay threshold (an alternative YES at step 404), processing continues at step 408.
Otherwise (an alternative NO at step 404), a check is made at step 405 whether there is a jitter buffer entry 306 including a block 305 of compressed speech information representing a speech segment having an energy level below a second energy level threshold, which is greater than the first energy level threshold. If such a jitter buffer entry 306 is found (an alternative YES at step 405), processing continues at step 407. Otherwise (an alternative NO at step 405), the current delay is compared at step 406 to a third delay threshold, which is greater than the second delay threshold. If the current delay is less than the third delay threshold (an alternative YES at step 406), processing continues at step 408. Otherwise (an alternative NO at step 406), processing continues at step 407, where the buffer manager 301 drops, i.e. removes from the jitter buffer JB1 and discards, the block 305 representing the speech segment with the lowest energy level among the speech segments represented by all the blocks 305 currently stored in the jitter buffer JB1. After step 407, and after an alternative YES at steps 402, 404 and 406, processing continues at step 408, where the buffer manager 301 fetches the next jitter buffer entry 306, i.e. the oldest jitter buffer entry, from the jitter buffer JB1 and delivers the block 305 of compressed speech information included in said jitter buffer entry 306 to the transceiver 302 for transmission over the radio channel CH1.
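The decision logic of steps 401 to 408 can be condensed into the following minimal Python sketch. It assumes that each entry already carries the energy level of its speech segment (how that level may be obtained is discussed later), that times are expressed in milliseconds, and that the threshold values are supplied by the caller; the class, field and method names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass
class JitterEntry:
    """One jitter buffer entry (cf. entry 306): a block plus its insertion time."""
    block: bytes        # block 305 of compressed speech information
    energy: float       # segment energy level, assumed already extracted
    inserted_ms: float  # time stamp 307, in milliseconds


class BufferManager:
    """Sketch of the delay/energy decision of steps 401-408."""

    def __init__(self, dth1: float, dth2: float, dth3: float, eth1: float, eth2: float):
        if not (dth1 < dth2 < dth3 and eth1 < eth2):
            raise ValueError("thresholds must be strictly increasing")
        self.dth = (dth1, dth2, dth3)
        self.eth = (eth1, eth2)
        self.entries: Deque[JitterEntry] = deque()

    def insert(self, block: bytes, energy: float, now_ms: float) -> None:
        self.entries.append(JitterEntry(block, energy, now_ms))

    def current_delay(self, now_ms: float) -> float:
        # Step 401: age of the oldest entry.
        return now_ms - self.entries[0].inserted_ms if self.entries else 0.0

    def _should_drop(self, delay: float, min_energy: float) -> bool:
        if delay < self.dth[0]:          # step 402: YES -> step 408
            return False
        if min_energy < self.eth[0]:     # step 403: YES -> step 407
            return True
        if delay < self.dth[1]:          # step 404: YES -> step 408
            return False
        if min_energy < self.eth[1]:     # step 405: YES -> step 407
            return True
        return delay >= self.dth[2]      # step 406: NO -> step 407

    def fetch(self, now_ms: float) -> Optional[bytes]:
        """Called once per transmission interval; returns the block for the transceiver."""
        if not self.entries:
            return None                  # underflow: nothing to deliver
        delay = self.current_delay(now_ms)
        # Lowest-energy entry: the candidate dropped at step 407.
        i_min = min(range(len(self.entries)), key=lambda i: self.entries[i].energy)
        if self._should_drop(delay, self.entries[i_min].energy):
            del self.entries[i_min]      # step 407: drop it
        if not self.entries:
            return None
        return self.entries.popleft().block  # step 408: deliver the oldest entry
```

A single minimum search serves both the test and the drop: "some block is below the energy level threshold" (steps 403 and 405) holds exactly when the lowest-energy block is below it.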
Thus, as illustrated by
If the current delay of the jitter buffer JB1 exceeds the second delay threshold, which is greater than the first delay threshold, the buffer manager 301 selects and drops blocks 305 representing speech segments having energy levels below the second energy level threshold, which is higher than the first energy level threshold.
Each time the current delay of the jitter buffer JB1 exceeds the third delay threshold DTH3, which is greater than the second delay threshold DTH2, the buffer manager 301 selects and drops a block 305 which represents a speech segment having the lowest energy level among the speech segments represented by all the blocks 305 currently stored in the jitter buffer JB1.
The first, second and third delay thresholds, as well as the first and second energy level thresholds, are selected according to the general principle that a particular block should be dropped if dropping it achieves a net speech quality improvement. The net speech quality results from the quality improvement due to the reduced buffering delay obtained by dropping a block of compressed speech information and the quality degradation due to the information loss that dropping the same block would cause. The exact threshold values may be determined by performing informal listening tests, i.e. evaluating the opinions of a plurality of people on which combinations of delay and energy level thresholds provide the best speech quality. The jitter buffer JB1 is designed to compensate for the normal variations in transmission delay of speech packets that can be expected in the packet switched network NET1, and the first delay threshold is thus selected such that normal variations in transmission delay do not cause blocks of compressed speech information to be dropped. As an example, assuming the normal variations in transmission delay could be up to 10 ms, the first delay threshold would be set to at least 30 ms to ensure that if a block of compressed speech information is dropped, reducing the buffering delay of the jitter buffer JB1 by 20 ms, the jitter buffer JB1 is still capable of providing at least 10 ms of delay if necessary to compensate for variations in transmission delay.
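The sizing argument in this example can be written as a single inequality. Here $J_{\max}$ denotes the largest normal variation in transmission delay and $T_{\text{block}}$ the delay removed by dropping one block; both symbols are introduced only for this illustration. The delay remaining after a drop must still cover $J_{\max}$:

$$D_{TH1} - T_{\text{block}} \ge J_{\max} \quad\Longrightarrow\quad D_{TH1} \ge J_{\max} + T_{\text{block}} = 10\ \text{ms} + 20\ \text{ms} = 30\ \text{ms}$$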
Apart from the exemplary first embodiment disclosed above, there are numerous ways of providing rearrangements, modifications and substitutions of the first embodiment resulting in additional embodiments.
Thus, as an alternative, or a complement, to monitoring the current delay of the jitter buffer JB1 and trying to perform actions aiming at reducing excessive buffering delay each time the transceiver 302 needs to be provided with a new block of compressed speech information, the buffer manager 301 could perform such processing each time a new block of compressed speech information has been received and is about to be inserted into the jitter buffer JB1. According to one embodiment, the buffer manager 301 could be adapted to first insert the new block into the jitter buffer JB1 and then perform all the processing steps of
There are alternative ways of defining and determining the current delay. One example would be to estimate when the last block of compressed speech information added to the jitter buffer will be fetched for transmission, and to define the current delay based on this estimate and the time when that block was added to the jitter buffer. In this alternative embodiment, it is not necessary to include in each jitter buffer entry a time stamp specifying when the respective entry was added to the jitter buffer. Thus, each jitter buffer entry could be reduced to a single block of compressed speech information, and a single variable associated with the jitter buffer could instead be used for storing the time when the last jitter buffer entry was added. In other embodiments, preferably where each block of compressed speech information represents a shorter speech segment of e.g. 5 ms, it may suffice to use the number of blocks currently stored in the jitter buffer as a measure of the current delay.
As already indicated, in different embodiments of the technology, each jitter buffer entry may consist of one block of compressed speech information or may include additional data associated with said block. One example of such additional data is the time stamp 307, which in the first exemplary embodiment indicates when a jitter buffer entry 306 was inserted into the jitter buffer JB1. In other embodiments, the additional data may include other data contained in the packet in which a block of compressed speech information was received, and may even include the complete packet. Including complete speech packets in the jitter buffer may e.g. be desirable in a modified version of the exemplary communication system of
In the exemplary first embodiment, the buffer manager 301 selects blocks according to a predetermined rule based on the current delay of the jitter buffer JB1 and the energy levels of the speech segments represented by the blocks 305 currently stored in the jitter buffer JB1. Depending on which speech coding algorithm has been used, and thus on the format of the blocks 305 of compressed speech information, the speech segment (frame) energy levels may be directly available in the blocks 305, or only implicitly available, in which case partial decoding of the blocks has to be performed to obtain the speech segment energy levels. An alternative to obtaining the speech segment energy levels through partial decoding would be for the node generating the blocks of compressed speech information, e.g. the fixed terminal TE1, to include the corresponding speech segment energy levels in a separate field of each speech packet. Each speech packet would then include both a block of compressed speech information and an additional field specifying the energy of the speech segment represented by said block. In such an arrangement, it would be preferable to store both the blocks of compressed speech information and the associated speech segment energy levels as entries in the jitter buffer JB1.
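The two alternative delay measures can be sketched as follows. This minimal Python sketch assumes that exactly one block is fetched per block interval, starting at a known next fetch time; the function names and default block durations are hypothetical.

```python
def delay_from_count(num_blocks: int, block_ms: float = 5.0) -> float:
    """Occupancy-based estimate: each stored block contributes block_ms of delay."""
    return num_blocks * block_ms


def delay_from_last_insert(last_insert_ms: float, next_fetch_ms: float,
                           num_blocks: int, block_ms: float = 20.0) -> float:
    """Estimate how long the most recently added block will wait in the buffer.

    Assumes one block is fetched every block_ms starting at next_fetch_ms, so the
    newest of num_blocks stored blocks leaves at next_fetch_ms + (num_blocks - 1) * block_ms.
    """
    if num_blocks == 0:
        return 0.0
    estimated_fetch_ms = next_fetch_ms + (num_blocks - 1) * block_ms
    return estimated_fetch_ms - last_insert_ms


print(delay_from_count(8))                      # 40.0
print(delay_from_last_insert(100.0, 105.0, 3))  # 45.0
```

Only `last_insert_ms` has to be kept alongside the buffer, which corresponds to the single variable mentioned above, so individual entries need no time stamps.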
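The sketch below shows one plausible way to compute a segment energy level from decoded 16-bit PCM samples and to carry it in a separate packet field. Both the dB-relative-to-full-scale definition and the 4-byte big-endian float header are assumptions for illustration; the description specifies neither an energy measure nor a packet layout, and codecs that expose energy directly in their parameters would not need the PCM step.

```python
import math
import struct
from typing import Sequence, Tuple

FULL_SCALE = 32768  # magnitude of the most negative 16-bit PCM value


def segment_energy_db(samples: Sequence[int]) -> float:
    """Mean-square energy of one decoded segment in dB relative to 16-bit full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10.0 * math.log10(mean_square / FULL_SCALE ** 2)


# Hypothetical layout: 4-byte big-endian float energy field, followed by the block.
def build_packet(block: bytes, energy_db: float) -> bytes:
    return struct.pack(">f", energy_db) + block


def parse_packet(packet: bytes) -> Tuple[bytes, float]:
    (energy_db,) = struct.unpack(">f", packet[:4])
    return packet[4:], energy_db


# A 20 ms segment at 8 kHz has 160 samples; a constant half-scale signal gives about -6.02 dB.
e = segment_energy_db([16384] * 160)
block, energy = parse_packet(build_packet(b"\x01\x02\x03", e))
```

Storing the parsed energy next to the block in each jitter buffer entry lets the buffer manager compare energy levels without touching the codec.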
The packet switched network NET1 in
The speech segment energy levels represented by blocks of compressed speech information are but one example of a characteristic of said blocks which could be used in combination with the current delay of the jitter buffer as a basis for selecting blocks of compressed speech information. Another example would be to consider whether the blocks of compressed speech information represent segments of voiced or unvoiced speech. Blocks representing voiced speech segments are generally more important to the perceived speech quality, and the predetermined rule for selecting blocks could thus specify that, e.g., primarily blocks representing unvoiced speech segments should be selected. An estimate of whether a block represents a voiced or unvoiced speech segment could e.g. be derived from parameters representing the LPC-prediction gain and/or the LTP-prediction gain, according to methods well known to a person skilled in the art. Yet another example of a suitable block characteristic is the so-called AMR-codec mode used to encode blocks of compressed speech information when applying Adaptive Multi-Rate (AMR) coding. AMR-coding provides different coding modes, each using a different number of bits to represent a speech segment, thus enabling a trade-off between the number of information source bits used to represent a speech segment and the level of error protection that can be provided for those information bits. The lowest AMR-codec mode, i.e. the coding mode using the minimum number of bits to represent a speech segment, is used when radio channel conditions are relatively bad, and the distortion caused by dropping a block of compressed speech information coded using the lowest AMR-codec mode will be less audible than if a higher-rate codec mode had been used under better radio conditions.
Thus when the lowest AMR-codec mode is used, it may become advantageous to drop a block of compressed speech information at a lower delay threshold as compared to if the same speech segment was coded using a higher rate AMR-codec mode.
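As an illustration of such a codec-mode rule, the following hypothetical Python sketch lowers the delay threshold for blocks coded in the lowest AMR mode and raises it for voiced blocks. The mode list is the set of AMR-NB bit rates (4.75 to 12.2 kbit/s); the 30 ms base value echoes the example given for the first delay threshold, while the 10 ms and 20 ms adjustments are arbitrary placeholders that would in practice come from listening tests.

```python
AMR_NB_MODES_KBPS = (4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2)


def drop_threshold_ms(mode_kbps: float, voiced: bool,
                      base_ms: float = 30.0,
                      lowest_mode_relief_ms: float = 10.0,
                      voiced_penalty_ms: float = 20.0) -> float:
    """Hypothetical per-block delay threshold above which the block may be dropped."""
    if mode_kbps not in AMR_NB_MODES_KBPS:
        raise ValueError(f"not an AMR-NB mode: {mode_kbps}")
    threshold = base_ms
    if mode_kbps == AMR_NB_MODES_KBPS[0]:
        threshold -= lowest_mode_relief_ms   # lowest mode: a dropped block is less audible
    if voiced:
        threshold += voiced_penalty_ms       # voiced speech matters more to perceived quality
    return threshold


def may_drop(delay_ms: float, mode_kbps: float, voiced: bool) -> bool:
    return delay_ms >= drop_threshold_ms(mode_kbps, voiced)


print(may_drop(25.0, 4.75, False), may_drop(25.0, 12.2, False))  # True False
```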
Another way of reducing the delay of the jitter buffer JB1, apart from discarding selected blocks removed from the jitter buffer, would be to select, based on the current delay and at least one characteristic of the blocks of compressed speech information, pairs of consecutive blocks and to transmit the most important parts of both blocks in a pair using the same bandwidth as one complete block of compressed speech information, while discarding the remaining parts of the blocks in said pair. At the mobile station, the missing information could be randomly created. The mobile station would need to be able to distinguish between receipt of ordinary complete blocks of compressed speech information and combinations of the most important parts of two consecutive blocks. Preferably, a channel coding scheme providing fair protection against bit errors for all bits in a combined block would also be implemented.
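The pair-combining idea can be sketched as follows, under the assumption (not stated in the description) that each block is ordered with its most important bits first, so that "the most important part" is simply a leading prefix. The receiver fills the missing tails with random bytes, as suggested above. How the mobile station learns that a block is a combined one is left out; a hypothetical frame-type flag would be one option. The function names are illustrative only.

```python
import os
from typing import Tuple


def combine_pair(first: bytes, second: bytes) -> bytes:
    """Pack the leading (assumed most important) parts of two blocks into one block-sized payload."""
    if len(first) != len(second):
        raise ValueError("blocks in a pair must have equal length")
    n = len(first)
    head = n // 2
    return first[:head] + second[:n - head]


def split_pair(combined: bytes) -> Tuple[bytes, bytes]:
    """Receiver side: rebuild two full-length blocks, filling the missing tails randomly."""
    n = len(combined)
    head = n // 2
    first = combined[:head] + os.urandom(n - head)
    second = combined[head:] + os.urandom(head)
    return first, second


c = combine_pair(b"ABCDEFGH", b"abcdefgh")  # b"ABCDabcd": same size as one block
x, y = split_pair(c)
```

Because the combined payload has exactly one block's length, it fits the fixed bandwidth of the circuit switched channel while removing one block interval of buffering delay.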
As a person skilled in the art appreciates, the invention is applicable in basically all situations where a jitter buffer is used for buffering blocks of compressed speech information.
Number | Date | Country | Kind |
---|---|---|---|
0004839 | Dec 2000 | SE | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/SE01/02798 | 12/14/2001 | WO | 00 | 12/2/2003 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO02/052399 | 7/4/2002 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
5157728 | Schorman et al. | Oct 1992 | A |
5371787 | Hamilton | Dec 1994 | A |
5450410 | Hiuchyj et al. | Sep 1995 | A |
5544324 | Edem et al. | Aug 1996 | A |
5553071 | Aranguren et al. | Sep 1996 | A |
5566169 | Rangan et al. | Oct 1996 | A |
5594732 | Bell et al. | Jan 1997 | A |
5594734 | Worsley et al. | Jan 1997 | A |
5606562 | Landguth | Feb 1997 | A |
5617418 | Shirani et al. | Apr 1997 | A |
5659541 | Chan | Aug 1997 | A |
5668811 | Worsley et al. | Sep 1997 | A |
5687174 | Edem et al. | Nov 1997 | A |
5699481 | Shlomot et al. | Dec 1997 | A |
5805597 | Edem et al. | Sep 1998 | A |
5862343 | Landguth et al. | Jan 1999 | A |
5999525 | Krishnaswamy et al. | Dec 1999 | A |
6064673 | Anderson | May 2000 | A |
6163535 | Jordan et al. | Dec 2000 | A |
6215797 | Fellman et al. | Apr 2001 | B1 |
6246702 | Fellman et al. | Jun 2001 | B1 |
6335927 | Elliott et al. | Jan 2002 | B1 |
6360271 | Schuster et al. | Mar 2002 | B1 |
6434606 | Boella et al. | Aug 2002 | B1 |
6438702 | Hodge | Aug 2002 | B1 |
6452950 | Ohlsson et al. | Sep 2002 | B1 |
6556820 | Le et al. | Apr 2003 | B1 |
6577872 | Lundh et al. | Jun 2003 | B1 |
6590876 | Brent | Jul 2003 | B1 |
6658027 | Kramer et al. | Dec 2003 | B1 |
6661810 | Skelly et al. | Dec 2003 | B1 |
6683889 | Shaffer et al. | Jan 2004 | B1 |
6684273 | Boulandet et al. | Jan 2004 | B2 |
6747999 | Grosberg et al. | Jun 2004 | B1 |
6862298 | Smith et al. | Mar 2005 | B1 |
6983161 | Wesby et al. | Jan 2006 | B2 |
7130368 | Aweya et al. | Oct 2006 | B1 |
20020007429 | Boulandet et al. | Jan 2002 | A1 |
20020026568 | Jeon | Feb 2002 | A1 |
20020075857 | LeBlanc | Jun 2002 | A1 |
20020101885 | Progrebinsky et al. | Aug 2002 | A1 |
20020120749 | Widegren et al. | Aug 2002 | A1 |
20020141452 | Mauritz et al. | Oct 2002 | A1 |
20020167911 | Hickey | Nov 2002 | A1 |
20020181438 | McGibney | Dec 2002 | A1 |
20030031210 | Harris | Feb 2003 | A1 |
20030112758 | Pang et al. | Jun 2003 | A1 |
20030152093 | Gupta et al. | Aug 2003 | A1 |
20030152094 | Colavito et al. | Aug 2003 | A1 |
20030169755 | Ternovsky | Sep 2003 | A1 |
20030185222 | Goldstein | Oct 2003 | A1 |
20030202528 | Eckberg | Oct 2003 | A1 |
20040022262 | Vinnakota et al. | Feb 2004 | A1 |
20040037320 | Dickson | Feb 2004 | A1 |
20040057445 | LeBlanc | Mar 2004 | A1 |
20040062252 | Dowdal et al. | Apr 2004 | A1 |
20040062260 | Raetz et al. | Apr 2004 | A1 |
20040073692 | Gentle et al. | Apr 2004 | A1 |
20040076190 | Goel et al. | Apr 2004 | A1 |
20040120309 | Kurittu et al. | Jun 2004 | A1 |
20040156622 | Kent, Jr. et al. | Aug 2004 | A1 |
20040258099 | Scott et al. | Dec 2004 | A1 |
20050007952 | Scott | Jan 2005 | A1 |
20050041692 | Kallstenius | Feb 2005 | A1 |
20050220240 | Lesso | Oct 2005 | A1 |
20050276411 | LeBlanc | Dec 2005 | A1 |
20060088000 | Hannu et al. | Apr 2006 | A1 |
20070150264 | Tackin et al. | Jun 2007 | A1 |
Number | Date | Country |
---|---|---|
1032165 | Aug 2000 | EP |
12155559 | Jun 2002 | EP |
1427121 | Jun 2004 | EP |
0016509 | Mar 2000 | WO |
0042728 | Jul 2000 | WO |
0120828 | Mar 2001 | WO |
0150657 | Jul 2001 | WO |
0213421 | Feb 2002 | WO |
02054662 | Jul 2002 | WO |
Number | Date | Country | Kind |
---|---|---|---|
20040076191 | Apr 2004 | US | A1 |