This invention relates to signal processing, and in particular to processing of an audio signal in a communications network.
In a mobile telecommunications network (such as a GSM or 3G network), a user terminal typically communicates with at least one base station in the network, so that signals can be sent between the user terminal and the base station(s). Each base station in the network is associated with a geographical region, known as a cell, and is used to communicate with user terminals within that cell. When the user terminal is moved from one cell to another, a handover is performed in which the user terminal stops communicating with a first base station and starts communicating with a second base station.
During a voice call over the network there is a need to maintain continuous communication between the user terminal and a base station to ensure that the voice call is not interrupted. If a handover occurs during a voice call the audio stream can be interrupted for a short duration while the handover process is performed. This interruption can cause sounds that are undesirable from the user's perspective and give an impression of bad audio quality.
Efforts have been made in the prior art to address the problem of interrupting a voice call during handover. For example, in WO 1998/009454 by Khawand et al., handovers between base stations are performed, where possible, during periods in which there is no voice activity in the signal. In this way, the handover is performed when the users in the voice call are not talking. Similar systems are described in WO 99/65266 by Cerwall and in GB 2330484 by Frandsen. In these systems the detection of voice pauses to trigger the handover can be complex, requiring significant use of processing resources. Furthermore, these systems rely on there being a period of speech inactivity at or near the time when handover is required.
Other prior art systems use artificial comfort noise synthesis, in which a handover period is filled with artificially created noise. Such systems are described in US 2008/0002620A1 by Anderton et al. and in U.S. Pat. No. 5,974,374 by Wake. However, the use of comfort noise is not always appropriate, in particular when voiced speech, such as a vowel, is interrupted by the handover.
Another method employed in the prior art is to repeat and fade out buffered received speech at the user terminal to cover the interruption caused by the handover. However, this method typically creates audible clicks in the signal due to signal discontinuity as the speech is repeated. The human ear is particularly sensitive to signal discontinuities in a speech signal. A sudden discontinuity in the speech signal (such as an artificial jump in the signal between one speech sample and the next or a sudden mute) often creates a “click” sound, which may be perceived by the user as bad audio quality in the signal.
There is therefore a problem in the prior art of how to improve the quality of an audio signal when the audio signal is interrupted during handover between base stations in a communications network.
According to a first aspect of the invention there is provided a method of processing an audio signal in a communications network, the method comprising: receiving, at a speech buffer, a first portion of the audio signal over the network from a base station of the network, the speech buffer being configured to store and subsequently output the first portion of the audio signal; determining the presence of an interruption to the received audio signal, the interruption being such that a subsequent portion of the audio signal which is intended to be output from the speech buffer immediately following the output of the first portion is not stored in the speech buffer at the time that the subsequent portion is intended to be output from the speech buffer; in the event that the presence of the interruption has been determined, appending a second portion of the audio signal to the first portion in such a way as to form an output audio signal having no signal discontinuities in the time domain, the second portion having a predetermined duration and having a pitch matching that of the first portion over the predetermined duration; applying a fade out envelope to the second portion to gradually reduce the amplitude of the second portion over the predetermined duration; and outputting the output audio signal.
According to a second aspect of the invention there is provided an apparatus for processing an audio signal in a communications network, the apparatus comprising: a speech buffer for receiving a first portion of the audio signal over the network from a base station of the network, the speech buffer being configured to store and subsequently output the first portion of the audio signal; means for determining the presence of an interruption to the received audio signal, the interruption being such that a subsequent portion of the audio signal which is intended to be output from the speech buffer immediately following the output of the first portion is not stored in the speech buffer at the time that the subsequent portion is intended to be output from the speech buffer; means for appending a second portion of the audio signal to the first portion in the event that the presence of the interruption has been determined, in such a way as to form an output audio signal having no signal discontinuities in the time domain, the second portion having a predetermined duration and having a pitch matching that of the first portion over the predetermined duration; means for applying a fade out envelope to the second portion to gradually reduce the amplitude of the second portion over the predetermined duration; and means for outputting the output audio signal.
According to a third aspect of the invention there is provided a system for processing an audio signal, the system comprising: a communications network comprising a base station for transmitting the audio signal; and an apparatus as described above for receiving and processing the audio signal.
In a fourth aspect of the invention there is provided a computer program product comprising computer readable instructions for performing a method as described above.
Prior art systems require advance notification that a handover will happen shortly. This allows the systems to prepare for the interruption to the audio signal caused by the handover. The prior art systems are not adapted for use where there is no advance notification that the audio signal will be interrupted. For example, these prior art systems cannot handle an unexpected speech underflow, in which the speech buffer at the user terminal does not receive the audio signal quickly enough and therefore runs out of audio signal to output. This may be because the system does not transmit the signal for a period of time, or because synchronization between the user terminal and the base station is lost without notification.
In preferred embodiments, a recovery buffer stores a copy of a portion of the most recently received speech frame of the audio signal. The pitch period of the frame is determined so that the copied portion in the recovery buffer can be time shifted to ensure continuity of the signal characteristics with the most recently received speech frame. When the audio signal is unvoiced, any reasonable time shift, or alternatively no time shift, can be applied to the copied portion in the recovery buffer. The copied portion in the recovery buffer can then be appended to the most recently received frame in the speech buffer to create a continuous signal. Since the copied portion is copied from the most recently received speech frame in the speech buffer, the copied portion has a matching spectral profile to that of the frame in the speech buffer. Consequently, the evolution over time of important characteristics of the speech signal (such as the signal in the time domain, the signal level, the pitch and the spectral shape) is ensured to be continuous from the most recently received frame in the speech buffer onward to the end of the recovery buffer, without any sudden changes.
Therefore when the copied portion is appended to the frame in the speech buffer the result is a natural sounding continuous audio signal. By using the recovery buffer it can be ensured that there is sufficient continuous audio signal available to be output for a predetermined duration D. A fade out pattern can be applied to the audio signal for the predetermined duration D to fade out the audio signal in a natural sounding way.
In preferred embodiments, audio stream interruption situations (such as handover or sudden underflow) are handled quickly and seamlessly. A natural sounding fading out of the audio stream is provided even when the speech buffer is empty. As stated above, the human ear is particularly sensitive to signal discontinuities and fading-out speed in a speech signal. The smooth and progressive fading out of the audio signal provided by preferred embodiments is comfortable for the user. Preferably the audio signal is faded out over a duration on the order of 3-20 ms, which is comfortable for the user and is sufficiently short to allow the system to resume quickly from the interruption. Thus, the present invention produces a continuous, quickly faded-out speech signal without any artefacts. Longer durations, such as 20-200 ms, are possible, but increasing the fade-out duration D into this longer range does not significantly improve the quality of the audio signal and may give the impression of muted transmission.
The present invention offers a solution that improves the perception of speech quality in the case of underflow or handover. The solution is cheap and efficient in terms of processing power. It does not create signal artefacts, so the audio signal sounds natural to the user, and it does not add delay to the system.
For a better understanding of the present invention and to show how the same may be put into effect, reference will now be made, by way of example, to the following drawings in which:
FIGS. 4a and 4b are diagrams showing the copying of a portion of the audio signal according to two different embodiments;
FIGS. 5a to 5c are diagrams showing the selection of a portion of the audio signal in three different conditions;
FIGS. 8a to 8c represent the audio signal according to three different prior art methods;
With reference to
The operation of the communications system 100 in a preferred embodiment will now be described with reference to
The audio signal typically comprises a plurality of speech frames. In this example, the speech frame and the speech buffer have the same duration (20 ms), which corresponds to the frame length most commonly used in current communication standards. However, different speech frame lengths can be used depending on the communication standard. If speech frames are shorter than this, successive frames can be appended to one another to obtain a speech buffer of the desired length. Similarly, if the frame and the speech buffer are longer, only the last portion of the speech buffer need be used to obtain the desired length. In step S204 a speech frame received at the user terminal 104 is analysed to determine the pitch period of the speech frame. An example of a speech frame is shown in
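The frame-length handling described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `analysis_window` is hypothetical, and an 8 kHz sampling rate (so that 20 ms corresponds to 160 samples) is assumed.

```python
from typing import List, Sequence


def analysis_window(frames: Sequence[Sequence[float]], length: int) -> List[float]:
    """Return the most recent `length` samples from consecutive speech frames.

    Shorter frames are concatenated in arrival order; when the result is longer
    than `length`, only its last portion is kept.
    """
    if length <= 0:
        return []
    samples: List[float] = []
    for frame in frames:
        samples.extend(frame)
    return samples[-length:]


# Hypothetical 10 ms frames at 8 kHz (80 samples each) build one 20 ms window.
window = analysis_window([[0.0] * 80, [1.0] * 80], 160)
```

If fewer samples than `length` are available, the sketch simply returns everything it has; a real terminal might instead wait for more frames.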
A method to determine the pitch period is illustrated in
As in the example shown in
In an alternative embodiment the signal received at the user terminal 104 from the base station 102 comprises a pitch period parameter which identifies the pitch period of the frame of the audio signal. Therefore in step S204 the pitch period is determined by using the pitch period parameter received in the audio signal, rather than by performing any signal analysis on the speech frame.
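For the signal-analysis route of step S204, one common choice is normalized autocorrelation. The sketch below uses that technique as an assumption; it is not necessarily the method illustrated in the drawings. The function name, the lag limits and the 0.5 voicing threshold are hypothetical. At an assumed 8 kHz sampling rate, lags of 20 to 160 samples cover pitch frequencies from 400 Hz down to 50 Hz.

```python
import math
from typing import Optional, Sequence


def estimate_pitch_period(frame: Sequence[float], min_lag: int, max_lag: int,
                          voicing_threshold: float = 0.5) -> Optional[int]:
    """Estimate the pitch period in samples by normalized autocorrelation.

    Returns None when no lag in [min_lag, max_lag] correlates above
    `voicing_threshold`, i.e. the frame is treated as unvoiced.
    """
    n = len(frame)
    best_lag: Optional[int] = None
    best_score = voicing_threshold
    for lag in range(min_lag, min(max_lag, n - 1) + 1):
        a, b = frame[lag:], frame[:n - lag]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        if den == 0.0:
            continue  # silent segment: no usable correlation at this lag
        score = num / den
        # The small margin keeps the shorter lag when two peaks are numerically
        # tied, e.g. at the true period and at twice the period of a strictly
        # periodic input.
        if score > best_score + 1e-9:
            best_lag, best_score = lag, score
    return best_lag
```

When a pitch period parameter is carried in the received signal, as in the alternative embodiment, this analysis can be skipped entirely.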
In step S206 a portion of the speech frame is copied. In step S208 the copied portion is time shifted in dependence upon the pitch period determined in step S204. The time shift is selected such that the copied portion can be appended to the speech frame in the speech buffer 108 in such a way that the resulting signal has no discontinuities (i.e. the evolution of the most important signal characteristics is continuous as described below with reference to
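A minimal sketch of steps S206 and S208, assuming the copied portion is simply the last pitch period of the frame, repeated as often as needed (a pitch-synchronous periodic extension). The function name and the handling of unvoiced frames are illustrative assumptions.

```python
from typing import List, Optional, Sequence


def fill_recovery_buffer(frame: Sequence[float], pitch: Optional[int],
                         length: int) -> List[float]:
    """Return `length` samples that continue `frame` without a jump at the join.

    Voiced case: the copy is time shifted by one pitch period P, so recovery
    sample i is frame[n - P + (i mod P)]. Because a voiced signal approximately
    repeats every P samples, frame[n - P] is a good stand-in for the missing
    sample frame[n], so the join introduces no jump.
    Unvoiced case (pitch is None): any reasonable shift is acceptable; here the
    shift is simply the frame length, an arbitrary choice.
    """
    n = len(frame)
    if n == 0:
        raise ValueError("empty frame")
    period = pitch if pitch is not None and 0 < pitch <= n else n
    return [frame[n - period + (i % period)] for i in range(length)]
```

For example, a frame ending part-way through a cycle, `[0, 1, 2, 3, 4] * 4 + [0, 1]` with a pitch of 5, is continued with `2, 3, 4, 0, 1, ...`, exactly where the cycle left off.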
Returning to the method shown in
The first method for storing the copied portion in the recovery buffer 110 is shown in
The second method for storing the copied portion in the recovery buffer 110 is shown in
It can be seen that the signal stored in the recovery buffer 110 as a result of either of the methods shown in
In step S212 the presence of an interruption in the audio flow between the base station 102 and the terminal equipment speaker 112 is determined. For example, the interruption may be due to a handover between base stations in the communications network, or due to underflow in the receipt of the audio signal from the base station 102 (whether attributable to the base station 102, to the terminal equipment 104 or to the radio link between them). The interruption is such that a first portion of the audio signal is output from the speech buffer 108 before a subsequent portion of the audio signal, which is intended to be output from the speech buffer immediately following the output of the first portion, is stored in the speech buffer. In other words, the speech buffer 108 runs out of audio signal to output due to the interruption.
According to the preferred embodiment, when the interruption occurs, a second portion of audio signal of duration D is output from the speaker 112 and the second portion is faded out over the duration D. In order for this to be achieved, in step S214 the second portion of the audio signal is appended to the audio signal already output from the speech buffer 108. This second portion of the audio signal may be obtained from different sources as explained below with reference to
When the interruption occurs, if the speech buffer 108 has enough audio signal still waiting to be output then the second portion of the audio signal can be obtained entirely from the speech buffer 108. This is shown in
In other situations, when the interruption occurs the speech buffer 108 may not have enough samples waiting to be output to create the second portion of duration D. In these cases the recovery buffer 110 is used to compensate for the lack of audio signal in the speech buffer 108. For example,
In the situation shown in
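Assembling the second portion of duration D from the two sources might look like the following sketch. It assumes, as described above, that the recovery buffer continues seamlessly from the end of the last received frame, so the samples still waiting in the speech buffer and the recovery samples can simply be concatenated. The function name is hypothetical.

```python
from typing import List, Sequence


def second_portion(pending: Sequence[float], recovery: Sequence[float],
                   d_samples: int) -> List[float]:
    """Assemble the D-sample portion that is faded out after an interruption.

    `pending` holds the samples still waiting in the speech buffer; they are
    used first, and any shortfall is taken from the start of the recovery
    buffer.
    """
    out = list(pending[:d_samples])
    shortfall = d_samples - len(out)
    if shortfall > len(recovery):
        raise ValueError("recovery buffer shorter than the required shortfall")
    out.extend(recovery[:shortfall])
    return out


# Hypothetical numbers: D = 80 samples (10 ms at 8 kHz), 30 samples still queued.
portion = second_portion([0.1] * 30, [0.2] * 160, 80)
```

With enough samples queued, the recovery buffer is not touched at all; with an empty speech buffer, the whole portion comes from the recovery buffer.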
In step S216 a fade-out envelope is applied to the second portion. The fade-out envelope has a duration D.
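The shape of the fade-out envelope is not specified here. The sketch below assumes a half raised-cosine curve that reaches zero on the last of the D samples; a linear ramp would also be a reasonable choice. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def apply_fade_out(portion: Sequence[float]) -> List[float]:
    """Scale `portion` by a half raised-cosine envelope of the same length D.

    The gain falls smoothly from just below 1 on the first sample to 0 on the
    last one, so the output ends in silence without a step.
    """
    d = len(portion)
    return [x * 0.5 * (1.0 + math.cos(math.pi * (k + 1) / d))
            for k, x in enumerate(portion)]
```

Because the envelope changes slowly at both ends, it adds no abrupt change of slope at the start of the fade or at the point where the output reaches silence.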
In some embodiments, the faded out signal which is output over the duration D is mixed with a noise signal, e.g. comfort noise generated at the user terminal 104. This can give a more natural sounding faded out signal.
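A sketch of the optional noise mixing, assuming a plain white-noise stand-in at a fixed low level. Real comfort noise generators (for example those driven by silence descriptor frames in GSM and AMR) shape the noise spectrally, which this illustration does not attempt. The function name, level and seed are hypothetical.

```python
import random
from typing import List, Sequence


def mix_with_comfort_noise(faded: Sequence[float], noise_level: float,
                           seed: int = 0) -> List[float]:
    """Add low-level uniform white noise to the faded-out signal.

    A seeded generator keeps the output reproducible; `noise_level` is the
    peak noise amplitude in the same units as the signal.
    """
    rng = random.Random(seed)
    return [x + noise_level * rng.uniform(-1.0, 1.0) for x in faded]
```

The effect is that the fade settles into a quiet background rather than digital silence.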
The duration D can be a fixed quantity. Alternatively, the duration D can be variable in dependence on, for example, characteristics of the audio signal such as the speech signal content, or characteristics of the user terminal 104 such as the user terminal recovery time capability after an underflow event.
The method described above will create a smooth fading out of the audio signal, in which there are no signal discontinuities in the audio signal.
FIG. 8a shows a method in which the last received speech frame before the interruption is repeated. It can be seen that where the original speech frame joins the repeated speech frame there is a discontinuity in the signal, which creates an audible clicking artefact in the output signal and can even create a rattling noise if the frame is repeated several times.
FIG. 8b shows a method in which a silence frame is added after the last received speech frame. This creates a signal discontinuity which can create audible artefacts in the audio signal.
The present invention time shifts the audio signal in the recovery buffer 110 according to the pitch period of the audio signal to ensure that there is no signal discontinuity such as that shown in
FIG. 8c shows a method in which the amplitude of the audio signal is smoothly brought down to zero following an interruption. This is an improvement on the method shown in
The present invention is advantageous over the method shown in
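The difference between repeating the last frame and the pitch-aligned continuation can be checked numerically. This illustrative script uses a synthetic sinusoid with an assumed period of 37 samples (chosen so that the period does not divide the 160-sample frame) and compares the jump at the join with the largest ordinary step inside the frame. The helper names are hypothetical.

```python
import math
from typing import Sequence


def join_jump(head: Sequence[float], tail: Sequence[float]) -> float:
    """Absolute sample-to-sample jump where `tail` is appended to `head`."""
    return abs(tail[0] - head[-1])


def max_step(x: Sequence[float]) -> float:
    """Largest absolute difference between neighbouring samples of `x`."""
    return max(abs(b - a) for a, b in zip(x, x[1:]))


PERIOD = 37  # assumed pitch period in samples
frame = [math.sin(2 * math.pi * i / PERIOD) for i in range(160)]
naive = list(frame)  # prior-art style: repeat the whole frame
aligned = [frame[len(frame) - PERIOD + (i % PERIOD)] for i in range(80)]

print(f"largest step inside frame: {max_step(frame):.3f}")
print(f"jump when repeating frame: {join_jump(frame, naive):.3f}")
print(f"jump with pitch alignment: {join_jump(frame, aligned):.3f}")
```

Repeating the whole frame produces a jump of roughly 0.96 at the join, several times the largest ordinary step of about 0.17, whereas the pitch-aligned copy joins with a step of about 0.06, no larger than the steps already present in the signal.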
The fading out duration D is preferably in the range 3-20 ms. This is long enough to avoid creating an audible clicking sound in the audio signal, whilst being short enough to allow the system to react quickly to subsequent changes in the network conditions. For example, if the interruption is caused by a handover, the user terminal 104 needs to quickly resume normal operation when audio signals are received from the new base station after handover is complete. Similarly, when an underflow condition is resolved, the user terminal 104 needs to quickly resume normal operation when audio signals are next received.
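The arithmetic relating D to a number of samples is straightforward. The helper below assumes an 8 kHz sampling rate by default (16 kHz would apply to wideband speech) and clamps D to the preferred 3-20 ms range; both defaults are illustrative, and a caller wanting the longer 20-200 ms range could raise `hi_ms`. The function name is hypothetical.

```python
def fade_length_samples(d_ms: float, sample_rate_hz: int = 8000,
                        lo_ms: float = 3.0, hi_ms: float = 20.0) -> int:
    """Convert a fade duration D in milliseconds to a whole number of samples.

    D is first clamped to [lo_ms, hi_ms], the preferred 3-20 ms range by default.
    """
    d_ms = min(max(d_ms, lo_ms), hi_ms)
    return round(d_ms * sample_rate_hz / 1000)
```

At 8 kHz, the preferred range corresponds to 24 to 160 samples; at 16 kHz it doubles to 48 to 320 samples.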
In the embodiment described above, a copied portion of each speech frame that is received at the speech buffer 108 is stored in the recovery buffer 110. This allows the recovery buffer 110 to be prepared in advance of an interruption, such that when an interruption occurs (even one with no advance notification, such as a sudden underflow) the recovery buffer is already prepared to be used in fading out the audio signal as described above. This avoids the need for extra processing when the interruption occurs.
In alternative embodiments copied portions of received speech frames are only stored in the recovery buffer 110 when an interruption occurs. This is particularly useful when interruptions occur with some advance warning, such as in the case of a network programmed hand-over in which the modem indicates that an audio stream rupture or underflow is about to occur before the underflow actually occurs. In this alternative embodiment, when advance warning of an interruption is received, the step of determining the presence of an interruption (step S212 in
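The two preparation strategies described in the two preceding paragraphs differ only in when the recovery buffer is built. The class below is a hypothetical sketch of that choice: the pitch period is passed in by the caller (from an estimator or from a received pitch parameter), the pitch-aligned copy repeats the last period of the latest frame, and all names are assumptions.

```python
from typing import List, Optional, Sequence


class RecoveryBufferManager:
    """Hypothetical sketch of when the recovery buffer is prepared.

    eager=True:  rebuild on every received frame, so the buffer is already
                 prepared if an unannounced underflow occurs.
    eager=False: keep only the latest frame and pitch; start building the
                 buffer once advance warning of an interruption is received.
    """

    def __init__(self, length: int, eager: bool = True) -> None:
        self.length = length
        self.eager = eager
        self.armed = False  # set by an advance warning in on-demand mode
        self.samples: List[float] = []
        self._frame: List[float] = []
        self._pitch: Optional[int] = None

    def _build(self) -> None:
        # Pitch-aligned copy of the latest frame (last period repeated);
        # an unvoiced or invalid pitch falls back to a shift of one frame length.
        n = len(self._frame)
        if n == 0:
            self.samples = []
            return
        p = self._pitch
        period = p if p is not None and 0 < p <= n else n
        self.samples = [self._frame[n - period + (i % period)]
                        for i in range(self.length)]

    def on_frame(self, frame: Sequence[float], pitch: Optional[int]) -> None:
        self._frame, self._pitch = list(frame), pitch
        if self.eager or self.armed:
            self._build()

    def on_interruption_warning(self) -> None:
        self.armed = True
        self._build()
```

The eager mode spends a little work on every frame in exchange for having nothing to do at interruption time; the on-demand mode saves that per-frame work but relies on the warning arriving before the speech buffer empties.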
The present invention avoids audible artefacts in the speech stream without needing to rerun a speech decoder.
The method described above can be split conceptually into three different steps:
Where an interruption occurs causing the signal to be faded out as described above, when the next audio signals are received at the user terminal 104 the amplitude of the output audio signal can be faded in over a duration D_in (which can be the same as, or different from, the fade-out duration D). By fading in the audio signal, a sudden change in amplitude is avoided, which can improve the user's perception of the audio quality.
While this invention has been particularly shown and described with reference to preferred embodiments, it will be understood by those skilled in the art that various changes in form and detail may be made without departing from the scope of the invention as defined by the appended claims.
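A matching fade-in over D_in samples could be sketched as follows, again assuming a raised-cosine shape that starts at zero gain and reaches full gain after D_in samples. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def apply_fade_in(resumed: Sequence[float], d_in: int) -> List[float]:
    """Fade in the first `d_in` samples of the resumed signal.

    The gain rises from 0 on the first sample towards 1, and samples from
    index d_in onward are passed through unchanged.
    """
    out = list(resumed)
    for k in range(min(d_in, len(out))):
        out[k] *= 0.5 * (1.0 - math.cos(math.pi * k / d_in))
    return out
```

Choosing d_in equal to D gives a symmetric fade-out and fade-in; the description leaves this open.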
Number | Date | Country | Kind |
---|---|---|---|
0920729.1 | Nov 2009 | GB | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP2010/066069 | 10/25/2010 | WO | 00 | 5/24/2012 |