The present invention relates generally to the field of comfort noise generation and more particularly to a method and system for comfort noise generation in communication networks with discontinuous transmission, or as artificial background noise to be used by echo canceller systems or by communication systems that implement a mute function.
Comfort Noise (CN) is an artificial background noise that is used in a variety of audio applications. One application that uses comfort noise is a communication network with discontinuous transmission (DTX), such as VoIP, GSM or DECT, where the CN is used to fill silence intervals/periods (also known as transmission gaps) at the receiver end when the silence is not transmitted explicitly. Silence intervals are common in speech applications such as phone call conversations. It is known that speech gaps in transmission should be filled with some kind of noise to prevent complete silence at the receiver end, which the listener finds uncomfortable.
Other types of applications that make use of CN are echo cancellers and suppressors. In these applications CN is used as a non-linear processing (NLP) stage that replaces residual echo. These applications refer to a situation where a far-end user and a near-end user are conducting a conversation and the generation of an artificial background noise is required in order to provide the far-end with a background noise, instead of complete silence, when only the far-end speaks.
Yet another type of application where CN could be used is a system that implements a mute functionality, such as a telephone system that enables a first participant (near-end) to disable its microphone and turn into a listen-only participant (muted user). In this mode it may be desired to provide CN for the far-end listener to avoid the feeling of complete silence at the far-end participant side.
Producing CN usually consists of two steps: first the background noise is learned and then it is generated. There are several methods for implementing Comfort Noise Generation (CNG), including:
An aspect of an embodiment of the invention relates to a method and system for comfort noise generation (CNG) that provides good spectral and level matching with the background noise (BGN), is simple to implement, and requires very limited hardware and software resources.
An aspect of an embodiment of the invention relates to a method and system for CNG that is based on two phases: recording actual BGN and estimating its level in a first learning phase, and applying coefficients that are extracted from the recorded BGN to White Noise (WN) samples in a second phase, wherein the Comfort Noise (CN) is adjusted according to the BGN level estimation of the learning phase.
An aspect of an embodiment of the invention relates to a method and system for CNG that can be implemented in communication networks with discontinuous transmission, or in an echo canceller system, or in a communication system that implements a mute function by a muted user.
In an exemplary embodiment in accordance with the disclosed subject matter there is disclosed a method for Comfort Noise Generation (CNG) comprising the steps of recording information of Background Noise (BGN); generating white noise samples; and generating Comfort Noise (CN) by applying coefficients that are extracted from the information of BGN on White Noise (WN) samples.
In an exemplary embodiment in accordance with the disclosed subject matter the step of recording information of Background Noise includes estimation of actual BGN level, and the step of generating Comfort Noise (CN) by applying coefficients that are extracted from the information of BGN on White Noise (WN) samples includes level adjustment according to the estimation of actual BGN level.
In an exemplary embodiment in accordance with the disclosed subject matter applying coefficients that are extracted from the information of BGN on White Noise (WN) for generating the n'th sample of CN is performed by implementing a formula that is basically
$$CN[n] = \sum_{i=0}^{N-1} C[i] \cdot X[n-i]$$
wherein i goes from 0 to N−1, where N is the number of coefficients applied for each sample, C[i] is the i'th sample of the recorded information of BGN, and X[n] is the n'th sample of the WN.
In an exemplary embodiment in accordance with the disclosed subject matter the CNG is used in a communication network with discontinuous transmission in order to fill silence periods, the communication network comprising a transmitter and a receiver, wherein the information of BGN is recorded during a predefined period that starts at the beginning of a silence period and during which the transmitter continues transmitting in order to enable the receiver to collect information on the BGN.
In an exemplary embodiment in accordance with the disclosed subject matter the CNG is used during silence periods, when the transmitter does not transmit data to the receiver.
In an exemplary embodiment in accordance with the disclosed subject matter the CNG is used in an echo canceller system having a near-end and a far-end; wherein the Background noise (BGN) is recorded during periods when both far-end and near-end are inactive.
In an exemplary embodiment in accordance with the disclosed subject matter the Comfort Noise (CN) replaces residual echo at times when only far end is active.
In an exemplary embodiment in accordance with the disclosed subject matter the CNG is used in a communication system that implements a mute function by a muted user, for providing a listener to the muted user with Comfort Noise during periods of the mute function activation.
In an exemplary embodiment in accordance with the disclosed subject matter BGN is recorded during periods when the mute function is inactive and both the muted user and the listener are inactive; and wherein the CNG is activated during periods when the mute function is activated.
In an exemplary embodiment in accordance with the disclosed subject matter recording information of Background Noise is implemented by a cyclic buffer with a pointer that tracks the most updated background noise information.
In an exemplary embodiment in accordance with the disclosed subject matter the generation of Comfort Noise is implemented by software.
In an exemplary embodiment in accordance with the disclosed subject matter the generation of Comfort Noise is implemented by hardware.
In an exemplary embodiment in accordance with the disclosed subject matter the generation of Comfort Noise is implemented by a combination of software and hardware elements.
In an exemplary embodiment in accordance with the disclosed subject matter there is disclosed a system for Comfort Noise Generation, comprising: a unit for recording information of Background Noise (BGN) during periods when only BGN is present; a White-Noise generation unit; a unit for generating Comfort Noise (CN); wherein the CN is generated by applying coefficients that are extracted from the information of BGN on White Noise (WN) samples that were generated by the White-Noise generation unit.
In an exemplary embodiment in accordance with the disclosed subject matter the unit for recording information of Background Noise (BGN) includes a functionality of estimation of actual BGN level, and wherein the unit for generating CN includes level adjustment according to the estimation of actual BGN level.
In an exemplary embodiment in accordance with the disclosed subject matter the unit for generating Comfort Noise implements a function that is basically described by the formula
$$CN[n] = \sum_{i=0}^{N-1} C[i] \cdot X[n-i]$$
The present invention will be understood and appreciated more fully from the following detailed description taken in conjunction with the drawings. Identical structures, elements or parts, which appear in more than one figure, are generally labeled with a same or similar number in all the figures in which they appear, wherein:
The present disclosure refers to one of four possible cases of the systems as shown in
(a) Only far-end speaks;
(b) Only near-end speaks;
(c) Both ends speaking simultaneously;
(d) Nobody speaks.
Comfort noise generation is required only in the first case, when only the far-end (FE) speaks. In this case, which is identified by echo canceller control unit 214, only BGN needs to be provided to the FE: the only signal that is desired at the FE side is BGN, which prevents the discomfort of complete silence at the FE speaker. On the other hand, the CNG learn phase is preferably applied during the last case, when nobody speaks and only actual background noise exists at the output of the echo canceller 214 and CNG 240.
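By way of a non-limiting illustration, the mapping from the four talk states to the CNG behaviour described above may be sketched in Python as follows. The function name `select_action` and the returned labels are hypothetical and are not taken from the disclosure; the pass-through handling of cases (b) and (c) is an assumption based on the frame being played out as is when it is neither BGN nor far-end-only speech.

```python
def select_action(far_end_active, near_end_active):
    """Map the talk state to a CNG action (labels are illustrative only)."""
    if far_end_active and not near_end_active:
        return "generate"      # case (a): only far-end speaks, play comfort noise
    if not far_end_active and not near_end_active:
        return "learn"         # case (d): nobody speaks, record the actual BGN
    return "pass_through"      # cases (b) and (c): play the frame out as is


actions = {
    "a": select_action(True, False),
    "b": select_action(False, True),
    "c": select_action(True, True),
    "d": select_action(False, False),
}
```

In such a sketch the two flags would be supplied by whatever activity detection the echo canceller control unit already performs.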
It should be noted that while in an exemplary embodiment according to the disclosed subject matter, recording BGN is performed when detecting end of data transmission during TL period, recording BGN may be performed continuously in a cyclic buffer, while usage of the recorded BGN will be controlled by pointers to the relevant sections in the buffer.
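By way of a non-limiting illustration, continuous recording into a cyclic buffer with a pointer that tracks the most recent data may be sketched in Python as follows. The class name `CyclicNoiseBuffer`, its methods and the tiny buffer size are hypothetical and chosen only for readability; a practical buffer would be considerably longer.

```python
class CyclicNoiseBuffer:
    """Fixed-size cyclic buffer that keeps the most recent BGN samples."""

    def __init__(self, size):
        self.data = [0.0] * size   # storage for the samples used as C[i]
        self.write_pos = 0         # pointer to the next slot to overwrite
        self.filled = 0            # number of valid samples stored so far

    def record(self, frame):
        """Store a frame of samples, overwriting the oldest ones when full."""
        for sample in frame:
            self.data[self.write_pos] = sample
            self.write_pos = (self.write_pos + 1) % len(self.data)
            self.filled = min(self.filled + 1, len(self.data))

    def coefficients(self):
        """Return the stored samples ordered from oldest to newest."""
        if self.filled < len(self.data):
            return self.data[:self.filled]
        return self.data[self.write_pos:] + self.data[:self.write_pos]


buf = CyclicNoiseBuffer(size=4)
buf.record([1.0, 2.0, 3.0])
buf.record([4.0, 5.0, 6.0])
latest = buf.coefficients()   # the four most recent samples
```

Only the write pointer moves; the stored samples are never shifted, which keeps the per-frame cost of recording low.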
When referring to
When referring to
In order to enable comfort noise generation that has good level matching with the background, a BGN level estimation is performed (310) and level information of the BGN is recorded.
The generate phase of CNG 304 is performed at a later stage, when using or playing artificial BGN is required. The generate phase (312) is applied by implementing the following convolution formula:
$$CN[n] = \sum_{i=0}^{N-1} C[i] \cdot X[n-i] \qquad (1)$$
where X[n] is the n-th sample of a white-noise signal and C[i] is the i-th sample of the BGN that was recorded at the learn phase. In order to get an artificial BGN that meets the basic requirements of spectral matching, the sampled BGN that is recorded at the learn phase 302 should be of a minimal predefined length. In the frequency domain the convolution is transformed into a multiplication of the spectra of the two signals; since the spectrum of white noise is flat, the spectrum of the result is similar to the spectrum of the BGN, and the generated Comfort Noise is therefore spectrally matched to the BGN. It should be noted that in order to guarantee good matching, it is required to use a relatively big buffer that supports the storage of enough coefficients C[i]. White noise can be generated by many methods that are well known to persons skilled in the art, and this disclosure therefore does not refer to the techniques of generating white noise. It is assumed that white noise is generated by any method and that the samples X[n] of the white noise are stored and available for use as described above.
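By way of a non-limiting illustration, the convolution of white-noise samples with the recorded BGN samples may be sketched in Python as follows. The function name `generate_comfort_noise`, the indexing X[n−i], the treatment of white-noise samples before the start of the sequence as zero, and the use of uniformly distributed samples as the white-noise source are all assumptions made for this sketch.

```python
import random


def generate_comfort_noise(coeffs, white_noise):
    """Convolve white-noise samples X with recorded BGN samples C.

    Computes CN[n] = sum over i of C[i] * X[n - i], taking X[k] = 0 for k < 0.
    """
    n_coeffs = len(coeffs)
    out = []
    for n in range(len(white_noise)):
        acc = 0.0
        for i in range(min(n_coeffs, n + 1)):
            acc += coeffs[i] * white_noise[n - i]
        out.append(acc)
    return out


rng = random.Random(0)                            # fixed seed for repeatability
bgn = [0.5, 0.25, -0.25]                          # stand-in for recorded samples C[i]
wn = [rng.uniform(-1.0, 1.0) for _ in range(8)]   # stand-in white noise X[n]
cn = generate_comfort_noise(bgn, wn)
impulse_response = generate_comfort_noise([1.0, 2.0], [1.0, 0.0, 0.0])
```

Feeding a unit impulse instead of white noise simply reproduces the stored samples, which is a convenient check that the coefficients are applied in the intended order.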
This method of generating artificial background noise is very simple: it requires only a buffer for storing background noise and a simple circuit that implements equation (1) as described above. Since this method uses real background data for generating comfort noise, the generated comfort noise has very good spectral matching with the actual BGN and precise spectral shaping, and it tracks changes in the actual BGN because it is continuously updated according to the actual BGN. In addition, it does not suffer from stability problems, there is no need to estimate an excitation signal, and there is no need to model the spectral envelope. Furthermore, the white-noise input signal avoids unnatural-sounding and repetitive output.
It is readily understood by persons skilled in the art, that many variations of equation (1) will still yield a good CN. Therefore it should be noted that while equation (1) describes a single formula for generating comfort noise, the invention is not limited to the specific equation as shown by equation (1) and includes any variation on equation (1) that is based on combinations of white noise and samples of real BGN.
Before playing the CN a level adjustment is performed (314) by estimating the actual level of the BGN and adjusting the CN level accordingly. Finally CN is played by the system (316). It should be noted that while level adjustment (314) is shown in
In an exemplary embodiment, according to the disclosed subject matter, the status of the input frame 502 is checked 504 to determine its VAD (Voice Activity Detection) status. If a voice transmission is detected (VAD1), the input frame enters a CNG LEARN block 516. In the CNG LEARN block the input frame is stored in a cyclic buffer and a start pointer is updated to point to the most recently stored frame 518. It should be noted that it is not necessary to use a cyclic buffer. In another embodiment, where the VAD state is explicitly transmitted to the receiver or VAD is implemented in the receiver, the buffer can alternatively be filled only at times when a VAD1 to VAD0 transition is detected.
The input frame is then played out as it was received 522 (during VAD1 the output is not influenced by the CNG circuit). The CNG LEARN block also includes a unit for actual BGN level estimation 520, whose output is used in the CNG generator 506. BGN level estimation can be done continuously during any step of CNG LEARN or, alternatively, only during a VAD1 to VAD0 transition, using the last updated buffer.
When VAD0 is detected, input frame 502 is ignored and the circuit generates white noise (WN) 508 with a known level. Since WN generation is known in the art and WN may be created by various methods and circuits, the process of creating WN is not described in this disclosure. The WN that was generated in block 508, together with coefficients C[i] that are samples of BGN from the stored input frame 518, is used to produce a Comfort Noise (CN) 510 using the formula:
$$CN[n] = \sum_{i=0}^{N-1} C[i] \cdot X[n-i]$$
where i goes from 0 to (N−1), N is the size of the buffer that stores the samples C[i], and X[n] are white noise samples. As a person skilled in the art readily understands, in order to produce CN that has good spectral matching characteristics it is necessary to use a relatively long buffer to store the incoming frames. The buffer's length determines the number of coefficients that are used for producing each sample of the CN. (A certain buffer size is required to prevent the stream from repeating itself, to provide naturalness, and to represent the frequency response of the actual background noise well.)
After the above formula is applied, a level adjustment block 512 adjusts the level of the CN according to the estimated actual BGN level, as provided by the estimate actual BGN level block 520. This is important for providing a CN that has good spectral matching and also good level matching.
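By way of a non-limiting illustration, the level estimation and level adjustment steps may be sketched in Python as follows. The disclosure does not specify how the level is measured; the root-mean-square (RMS) measure used here, and the function names `rms_level` and `adjust_level`, are assumptions made for this sketch.

```python
import math


def rms_level(samples):
    """Root-mean-square level of a block of samples (0.0 for an empty block)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def adjust_level(cn, target_level):
    """Scale the comfort noise so that its RMS level equals target_level."""
    current = rms_level(cn)
    if current == 0.0:
        return list(cn)            # an all-zero block cannot be scaled
    gain = target_level / current
    return [gain * s for s in cn]


# Level estimated from (stand-in) recorded BGN during the learn phase.
bgn_level = rms_level([0.02, -0.02, 0.02, -0.02])
# Unadjusted (stand-in) comfort noise scaled to the estimated BGN level.
adjusted = adjust_level([0.5, -0.5, 0.5, -0.5], bgn_level)
```

A single gain per block leaves the spectral shape of the comfort noise untouched and changes only its level.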
Finally, the generated CN is played out 514 during the VAD0 period.
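By way of a non-limiting illustration, the per-frame receiver flow described above (learn and play out during VAD1, generate level-adjusted comfort noise during VAD0) may be sketched in Python as follows. The class name `ComfortNoiseReceiver`, the RMS level measure, the estimation of the level from the last received frame, and the uniform white-noise source are assumptions made for this sketch and are not taken from the disclosure.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a block of samples (0.0 if empty)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class ComfortNoiseReceiver:
    def __init__(self, buffer_size, seed=0):
        self.coeffs = [0.0] * buffer_size    # most recent received samples, used as C[i]
        self.level = 0.0                     # estimated BGN level
        self.history = [0.0] * buffer_size   # recent white-noise samples, newest first
        self.rng = random.Random(seed)

    def process(self, frame, vad):
        if vad == 1:
            # Learn: keep the newest samples, update the level, play the frame as received.
            self.coeffs = (self.coeffs + list(frame))[-len(self.coeffs):]
            self.level = rms(frame)
            return list(frame)
        # VAD0: ignore the input frame and generate comfort noise instead.
        raw = []
        for _ in frame:
            self.history = [self.rng.uniform(-1.0, 1.0)] + self.history[:-1]
            raw.append(sum(c * x for c, x in zip(self.coeffs, self.history)))
        raw_level = rms(raw)
        gain = self.level / raw_level if raw_level > 0.0 else 0.0
        return [gain * s for s in raw]       # level-adjusted comfort noise


rx = ComfortNoiseReceiver(buffer_size=4)
played = rx.process([0.1, -0.1, 0.1, -0.1], vad=1)
comfort = rx.process([0.0, 0.0, 0.0, 0.0], vad=0)
```

Because the transmitter keeps transmitting for a period after speech ends, the frames received last before the VAD1 to VAD0 transition are expected to contain background noise only, which is why this sketch simply reuses the latest buffer contents and the latest level.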
It should be noted that while
In an exemplary embodiment in accordance with disclosed subject matter, a residual frame after Acoustic Echo Cancelling (AEC) or Echo Cancelling (EC) (as shown in
If the input frame is not a BGN (BGN=0) it is checked whether it is double-talk (DT) or not. If it is DT (Not case one) the input frame is played out as is. (In this case when BGN=0 there is no reason to store the frame as the frame storage is performed in order to record BGN and extract C[i] coefficients of BGN).
If the input frame is found to be not a DT (This case of both BGN=0 and DT=0 is indication that the system is in state one where only FE is speaking) it goes into the CNG generator block 606 and undergoes the same path as was described with reference to
When the Mute function is active, a White Noise Generator 708 is activated (white noise generation is known in the art and could easily be implemented by a person skilled in the art, hence its implementation is not described in the present disclosure). The white noise is then processed 710, 712 in the same way as was described in
While
It should be appreciated that the above described methods and systems may be varied in many ways, including omitting or adding steps, changing the order of steps and the type of devices used. It should be appreciated that different features may be combined in different ways. In particular, not all the features shown above in a particular embodiment are necessary in every embodiment of the invention. Further combinations of the above features are also considered to be within the scope of some embodiments of the invention.
Section headings are provided for assistance in navigation and should not be considered as necessarily limiting the contents of the section.
It will be appreciated by persons skilled in the art, that the present invention is not limited to what has been particularly shown and described hereinabove. Rather the scope of the present invention is defined only by the claims, which follow.