Embodiments of the present subject matter relate to speech processing. More particularly, embodiments of the present subject matter relate to a method and system for peak limiting of speech signals for delay sensitive voice communication.
Generally, speech processing systems deal with a variety of signals with varying intensity levels. Exemplary speech processing systems may include mobile phones, audio recorders, Voice over Internet Protocol (VoIP) systems, etc. A person using a speech processing system may speak at different audible levels at different instants in time. The variation in audio/speech signals may occur when the person changes position with respect to the microphone of the speech processing system or when there is a sudden and transient increase in the audio level. Such a transient increase in the audio level may exceed the dynamic range of the audio processing system, thereby producing distorted audio output.
The term “peak limiting”, commonly used in signal processing, refers to handling such signal bursts or transients in audio signals. The signal level is maintained below a predefined threshold, particularly during such transients. This has been a common practice in audio signal processing for audio content production and listening requirements.
In existing methods, the focus has been on reducing the distortions introduced in the audio quality during the peak limiting process. One generic approach to handling the transients is to delay the signals sufficiently such that future transients are anticipated and attenuated in time. For audio signals used for entertainment, there was less focus on reducing the processing delay of the signals. However, for a voice communication system, to preserve interactivity and reduce the impact of acoustic echo feedback, it is desired that the signal processing delay be minimal or, preferably, that no delay be introduced.
Further, a major portion of voice communication systems use packet based communication, such as Voice over IP (VoIP) systems. In packet based communication, the speech signal is processed at block level or frame level. Hence, there is a need for a method that can handle the speech signal transients without introducing any delay and with minimal distortion in signal quality, while processing at frame level as desired in the existing signal flow in voice communication systems.
Various embodiments are described herein with reference to the drawings, wherein:
The systems and methods disclosed herein may be implemented in any means for achieving various aspects. Other features will be apparent from the accompanying drawings and from the detailed description that follows.
A method and system for peak limiting of speech signals for delay sensitive voice communication are disclosed. In the following detailed description of the embodiments of the present subject matter, references are made to the accompanying drawings that form a part hereof, and in which are shown by way of illustration specific embodiments in which the present subject matter may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the present subject matter, and it is to be understood that other embodiments may be utilized and that changes may be made without departing from the scope of the present subject matter. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present subject matter is defined by the appended claims.
The terms “frame” and “block” are used interchangeably throughout the document. Further, the terms “speech” and “audio” are used interchangeably throughout the document.
The peak limit process generally refers to handling the transient part of the signal, mainly the peak level. The idea is to apply quick attenuation to the transient and peak level of the signal to bring it below a predefined threshold. This is expected to avoid possible distortions later in the signal path. In the case of an audio signal, it refers to avoiding distortions in the audio recording and audio reproduction parts.
Once the transient part of the signal is reduced below the predefined threshold to avoid distortions, the applied attenuation should also be gradually removed so that the overall signal level goes back to the original signal level. This step of reducing or releasing the applied attenuation is referred to as the peak-release process in the following description. In the current description, the peak-release process is described as an integral part of the overall peak limiting process. However, in the literature and in implementations, the peak-release process may be independent of the peak limiting process.
For the peak limiting process, the power level of the signal can be any function of the signal. One representation can be a smoothed power level of the signal. In the current description, the individual samples are taken as representative of the signal power level. Accordingly, the pre-defined threshold is taken as a limit at the sample level.
One major challenge of the peak limiting process is to quickly identify the point at which the signal level crosses the pre-defined threshold and to attenuate the signal such that it does not cross the pre-defined threshold. One of the key features of the current invention is that it identifies the samples crossing the pre-defined threshold and attenuates the signal without adding any processing delay. Another aspect of this invention is that it describes the peak limit process with respect to block level processing, which is commonly used in Voice over IP (VoIP) communication.
peak gain=(predetermined peak threshold)/(highest magnitude in block).
At step 106, a gain delta by which an old gain is updated (i.e., reduced) to the peak gain is computed. Note that in this case, peak gain may be less than the old gain and this means reducing the gain. In one example embodiment, the gain delta is computed using the following equation:
gain delta=peak gain/old gain,
where, the old gain is a gain at an end of the previous block of samples.
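As an illustrative sketch only, the peak gain and gain delta computations of the preceding equations may be written as follows in Python. The function name, argument names, sample values and the handling of an all-zero block are assumptions made for illustration and are not part of the described embodiments.

```python
def peak_gain_and_delta(block, peak_threshold, old_gain):
    """Return (peak_index, peak_gain, gain_delta) for one block of samples."""
    # Position of the sample with the highest magnitude in the block.
    peak_index = max(range(len(block)), key=lambda i: abs(block[i]))
    highest = abs(block[peak_index])
    # peak gain = (predetermined peak threshold) / (highest magnitude in block).
    # ASSUMPTION: an all-zero block is given unity peak gain to avoid dividing by zero.
    peak_gain = peak_threshold / highest if highest > 0 else 1.0
    # gain delta = peak gain / old gain, where old gain is the gain at the
    # end of the previous block of samples.
    gain_delta = peak_gain / old_gain
    return peak_index, peak_gain, gain_delta


# Hypothetical sample values and threshold, chosen only for illustration.
peak_index, peak_gain, gain_delta = peak_gain_and_delta(
    [1000, -4000, 20000, -16000], peak_threshold=10000, old_gain=1.0
)
```

A gain delta below 1, as in this example, indicates that the gain has to be reduced within the block.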
At step 108, a gain update rate or a gain factor is computed for the current block of samples based on the position of the sample with the highest magnitude and the gain delta. The gain factor is computed using the equation:
where, the gain factor refers to a factor by which gain values get updated, the gain delta refers to a fractional change in gain, and the peak index is an index value of the position of the sample with highest magnitude in current block of samples.
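The following Python sketch assumes one form of the gain factor that is consistent with a multiplicative per-sample update (updated gain = gain * gain factor): the gain delta is spread evenly, in the logarithmic domain, over the samples up to and including the peak. The exponent 1/(peak index + 1), which assumes zero-based indexing with one update per sample, and all names are assumptions for illustration; an embodiment may use a different expression.

```python
def gain_factor_for_block(gain_delta, peak_index):
    """Per-sample multiplicative factor that reaches the peak gain at the peak sample."""
    # ASSUMPTION: the gain is multiplied by the factor once per sample,
    # starting at sample 0, so peak_index + 1 updates occur up to the peak.
    return gain_delta ** (1.0 / (peak_index + 1))


# Hypothetical case: the gain must halve by the sample at index 3.
factor = gain_factor_for_block(gain_delta=0.5, peak_index=3)

gain = 1.0  # old gain at the end of the previous block
for _ in range(4):  # samples 0..3
    gain *= factor
```

After the four updates the gain is, up to rounding, the old gain multiplied by the gain delta.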
At step 110, the gain factor is set to a predetermined minimum gain factor only when the computed gain factor is less than the predetermined minimum gain factor. The predetermined minimum gain factor corresponds to the maximum rate at which gain values are decreased at a sample level. If the computed gain factor is below the predetermined minimum gain factor, the gain factor is limited to the predetermined minimum gain factor to avoid distortion. In an exemplary scenario, the value of the predetermined minimum gain factor is −0.5 dB/ms.
The predetermined minimum gain factor is the highest attenuation (i.e., gain reduction) rate that causes no distortion or only acceptable distortion. For speech processing, a pitch period characteristic can be used to derive the minimum gain factor. It is observed that over one pitch period, the maximum increase in sample level is about 1 dB. Also, the minimum pitch period is about 2 ms. Hence, for handling the transients, the gain should decrease at the rate of 1 dB per 2 ms. This effectively means a gain change rate of −0.5 dB/ms. For different use cases, the minimum gain factor can be determined considering signal characteristics and acceptable quality distortion. Similarly, for the peak release process, the acceptable maximum rate of gain increase considered is +0.1 dB/ms.
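As a worked illustration, a rate expressed in dB/ms can be converted into a per-sample multiplicative factor and then used to limit a computed gain factor. The sketch below is in Python and rests on two assumptions that are not stated in the description: the dB values refer to amplitude gain (20·log10), and the sampling rate is 8000 Hz. All function and constant names are hypothetical.

```python
def db_per_ms_to_factor(rate_db_per_ms, sample_rate_hz):
    """Convert a gain rate in dB/ms into a per-sample multiplicative factor."""
    # ASSUMPTION: amplitude dB, i.e. gain_db = 20 * log10(gain).
    samples_per_ms = sample_rate_hz / 1000.0
    return 10.0 ** (rate_db_per_ms / (20.0 * samples_per_ms))


def limit_gain_factor(gain_factor, min_gain_factor):
    """Set the factor to the minimum only when it falls below that minimum."""
    return max(gain_factor, min_gain_factor)


SAMPLE_RATE_HZ = 8000  # ASSUMPTION: narrowband speech
MIN_GAIN_FACTOR = db_per_ms_to_factor(-0.5, SAMPLE_RATE_HZ)     # fastest attenuation
MAX_RELEASE_FACTOR = db_per_ms_to_factor(+0.1, SAMPLE_RATE_HZ)  # fastest release

too_fast = limit_gain_factor(0.9, MIN_GAIN_FACTOR)      # limited to the minimum
acceptable = limit_gain_factor(0.999, MIN_GAIN_FACTOR)  # left unchanged
```

Under these assumptions the minimum gain factor is slightly below 1 (about 0.9928 per sample), so eight consecutive updates, that is 1 ms at 8000 Hz, reduce the gain by 0.5 dB.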
At step 112, gain is applied to the current block of samples using the gain factor. Before applying the gain factor, the process determines whether a peak gain for the current block of samples is reached. If the peak gain is not reached, a gain is updated at each sample in the current block of samples based on the gain factor using the equation:
updated gain=gain*gain factor,
where, the gain is the gain applied to the previous sample, and the gain factor is the same as the gain factor computed at step 110. Further, the updated gain is applied to the current sample. Furthermore, the steps of determining, updating and applying gain are repeated until the peak gain is reached. When the peak gain is reached, the gain (at which the peak gain is reached) is applied to the remaining samples in the current block of samples. In other words, the gain update is stopped once the peak gain is reached. The step 112 is explained in detail in
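The per-sample gain update described for step 112 can be sketched as follows in Python. This is a minimal illustration, not a definitive implementation: the function name and example values are hypothetical, and "reached" is taken to mean that the gain is at or below the peak gain, which is an assumption.

```python
def apply_peak_limit_gain(block, old_gain, gain_factor, peak_gain):
    """Return (output samples, gain at the end of the block)."""
    out = []
    gain = old_gain
    peak_gain_reached = False
    for sample in block:
        if not peak_gain_reached:
            # updated gain = gain * gain factor
            gain *= gain_factor
            # ASSUMPTION: the peak gain counts as reached once gain <= peak gain.
            peak_gain_reached = gain <= peak_gain
        # After the peak gain is reached, the same gain is held for the
        # remaining samples in the block.
        out.append(sample * gain)
    return out, gain


# Hypothetical values chosen so that the arithmetic is exact.
out, end_gain = apply_peak_limit_gain(
    [8000, 8000, 8000, 8000], old_gain=1.0, gain_factor=0.5, peak_gain=0.25
)
```

The gain returned at the end of the block would serve as the old gain for the next block.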
Referring now to
At step 206, a gain delta is computed. At step 208, a check is performed to determine if the computed gain delta is greater than 1. If the gain delta is greater than 1, at step 210, a check is performed to determine if a hangover count value is greater than a hangover threshold value. The hangover count is used to determine whether a hangover wait period is over. Only after the hangover period is an increase in gain allowed as part of the peak-release process. If an increase in gain is allowed just after the sample where the peak gain is reached (during peak limiting), there is a higher probability of some subsequent samples crossing the peak threshold, especially on the rising edge of a signal burst. Once the samples cross the peak threshold, peak limiting needs to be performed again. Therefore, the hangover wait is performed to make sure that there are no undue gain fluctuations. If the hangover count value is greater than the hangover threshold value, at step 212, a check is performed to determine if the old gain is equal to 1. If the old gain is equal to 1, at step 218, the input samples are reproduced as output samples. The reason for this step is that the gain cannot be increased beyond 1, that is, unity gain. With unity gain, the output samples will be the same as the input samples. If the old gain is not equal to 1, at step 216, a peak release process is performed to increase the gain towards unity. The peak release process is explained in detail with reference to
S_out=S_in*g_old,
where, S_out is the output sample, S_in is the input sample and g_old is the old gain.
At step 224, the hangover count is incremented. If the gain delta is not greater than 1 (at step 208), then a peak limit process is performed at step 220 and the hangover count is then reset to zero at step 222. The peak limit process is explained in detail with reference to
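The block-level decisions of steps 208 through 224 can be summarized by the following Python sketch. The action labels and function name are hypothetical. It is also an assumption that, while the hangover wait is still in progress, the old gain is applied unchanged and the hangover count is incremented once per block.

```python
def choose_block_action(gain_delta, old_gain, hangover_count, hangover_threshold):
    """Return (action, updated hangover count) for the current block."""
    if gain_delta > 1.0:                          # step 208
        if hangover_count > hangover_threshold:   # step 210
            if old_gain == 1.0:                   # step 212
                return "pass_through", hangover_count   # step 218
            return "peak_release", hangover_count       # step 216
        # ASSUMPTION: hangover wait not over, so the old gain is held
        # (S_out = S_in * g_old) and the count is incremented (step 224).
        return "hold_old_gain", hangover_count + 1
    # Gain delta not greater than 1: peak limit (step 220), then the
    # hangover count is set to zero (step 222).
    return "peak_limit", 0


# Hypothetical walk through a few blocks with a hangover threshold of 2.
limit = choose_block_action(0.5, 1.0, 5, 2)
wait = choose_block_action(1.5, 0.5, 0, 2)
release = choose_block_action(1.5, 0.5, 3, 2)
unity = choose_block_action(1.5, 1.0, 3, 2)
```

Resetting the count after every peak limit means that a new burst always restarts the wait before any gain increase is allowed.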
Referring now to
Process explained in flowchart 300B (
In this case, only the samples up to the peak index need to be considered. The reason is that the gain factor of the samples beyond the peak index is always more than the gain factor of the peak sample.
At step 356, a gain factor is computed for each of the identified samples. In these embodiments, the process starts with an initial count n equal to 0 and increments the count by 1. The index of each sample crossing the predetermined threshold is identified using the equation:
index=peak cross index[n],
Further, the gain factor of each of the identified samples is computed based on the respective magnitudes of the identified samples using the following equations. Here, peak gain[n] and gain delta[n] have their usual meaning with respect to the nth identified sample. Also, Sin[index] is the value of the identified input sample.
peak gain[n]=(predetermined peak threshold)/(magnitude of Sin[index]), and
gain delta[n]=(peak gain[n])/(old gain).
Subsequently, the gain factor is computed for each of the identified samples based on the computed gain delta and respective positions of the identified samples using the equation below:
At step 358, a minimum gain factor is determined from the computed gain factors associated with the identified samples. At step 360, the minimum gain factor is set to the predetermined minimum gain factor when the minimum gain factor is less than the predetermined minimum gain factor. At step 362, the gain factor is applied to the current block of samples.
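Steps 356 through 360 can be illustrated with the Python sketch below. The per-sample gain factor expression, gain delta[n] raised to 1/(index + 1), is an assumption chosen to be consistent with a multiplicative per-sample gain update and zero-based indexing; an embodiment may use a different expression. The function name and the example values are likewise hypothetical.

```python
def refine_gain_factor(block, peak_cross_index, peak_threshold,
                       old_gain, predetermined_min_gain_factor):
    """Return the refined gain factor from the samples that crossed the threshold.

    peak_cross_index must hold at least one index (samples up to the peak index).
    """
    factors = []
    for index in peak_cross_index:
        # peak gain[n] = threshold / |Sin[index]|, gain delta[n] = peak gain[n] / old gain
        peak_gain_n = peak_threshold / abs(block[index])
        gain_delta_n = peak_gain_n / old_gain
        # ASSUMPTION: one multiplicative update per sample, starting at sample 0.
        factors.append(gain_delta_n ** (1.0 / (index + 1)))
    minimum = min(factors)                               # step 358
    return max(minimum, predetermined_min_gain_factor)   # step 360


# Hypothetical block: with a factor aimed only at the peak (index 3), the
# sample at index 1 would still exceed the threshold of 10000.
block = [9000, 16000, 4000, 20000]
refined = refine_gain_factor(block, [1], 10000, 1.0, 0.5)
```

In this example the refined factor is smaller than the factor derived from the peak sample alone, so the gain falls quickly enough to also bring the earlier sample within the threshold.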
Referring now to
Referring now to
At step 504, the value of n is incremented. At step 506, a check is performed to determine if a peak gain is reached for a sample in the current block of samples. If the peak gain is not reached, at step 508, a gain at the sample in the current block of samples is updated based on the gain factor and subsequently the “peak gain reached” flag is updated at step 510. The gain update process is explained in detail in
At step 514, a check is performed to determine whether the value of n (i.e., count value) is equal to the length of the current block of samples. If n is not equal to the length of the current block of samples, the process increments the value of n at step 504 and goes to step 506 until the value of n is equal to the length of the current block of samples.
Referring now to
If the gain factor (at step 602) is greater than 1, the peak gain is checked for the peak release process: at step 608, a check is made to determine whether the gain is greater than the peak gain. If the gain is greater than the peak gain, at step 606, the “peak gain reached” flag is set to TRUE. If the gain is not greater than the peak gain, the “peak gain reached” flag is retained as FALSE at step 610.
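The flag update can be sketched in Python as follows. The release branch follows the check described above. The branch for a gain factor that is not greater than 1 (peak limiting) is an assumption: it treats the peak gain as reached once the gain is at or below it. The function name is hypothetical.

```python
def update_peak_gain_reached(gain, peak_gain, gain_factor):
    """Return the new value of the "peak gain reached" flag."""
    if gain_factor > 1.0:
        # Peak release: the gain is rising, so the flag becomes TRUE once
        # the gain is greater than the peak gain.
        return gain > peak_gain
    # ASSUMPTION (peak limiting): the gain is falling, so the flag becomes
    # TRUE once the gain is at or below the peak gain.
    return gain <= peak_gain


release_done = update_peak_gain_reached(gain=0.6, peak_gain=0.5, gain_factor=1.01)
release_pending = update_peak_gain_reached(gain=0.4, peak_gain=0.5, gain_factor=1.01)
limit_done = update_peak_gain_reached(gain=0.4, peak_gain=0.5, gain_factor=0.99)
limit_pending = update_peak_gain_reached(gain=0.6, peak_gain=0.5, gain_factor=0.99)
```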
output sample=input sample*gain,
where, the gain refers to computed gain corresponding to the input sample at consideration.
At step 654, a check is made to determine whether the magnitude of a computed output sample is greater than the peak threshold in the current block of samples. When the magnitude of the output sample is greater than the peak threshold, the peak cross count is incremented at step 656 and the output sample crossing the peak threshold is identified by noting the index of the output sample, at step 658. At step 660, a check is made to determine whether the magnitude of the output sample is greater than a hard peak limit. If the magnitude of the output sample is greater than the hard peak limit, then the output sample (Sout (n)) is set to the hard peak limit, with the sign of the value being the same as the sign of the output sample, at step 662. Note that the predetermined peak threshold is also referred to as the soft peak limit. To avoid quality distortions, it is desired that output sample values be within the soft peak limit. However, a few samples can cross the soft peak limit without seriously impacting the quality. In comparison, all output sample values must be within the hard peak limit; otherwise, audio quality will be seriously impacted. The number of samples that cross the hard peak limit can also be found in a manner similar to the way the number of samples crossing the soft peak limit (peak cross count) is found.
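The soft and hard limit checks of steps 654 through 662 can be illustrated by the following Python sketch. The function name, the representation of the per-sample gains as a list, and the numeric limits are assumptions made for illustration only.

```python
import math


def apply_gains_with_limits(block, gains, soft_limit, hard_limit):
    """Apply per-sample gains, note soft-limit crossings, clip at the hard limit."""
    out = []
    peak_cross_index = []
    for n, (sample, gain) in enumerate(zip(block, gains)):
        y = sample * gain                      # output sample = input sample * gain
        if abs(y) > soft_limit:                # step 654
            peak_cross_index.append(n)         # steps 656 and 658
        if abs(y) > hard_limit:                # step 660
            y = math.copysign(hard_limit, y)   # step 662: keep the sign of the sample
        out.append(y)
    return out, peak_cross_index


# Hypothetical limits and samples.
out, crossed = apply_gains_with_limits(
    [5000, -30000, 12000], [1.0, 1.0, 1.0], soft_limit=10000, hard_limit=20000
)
peak_cross_count = len(crossed)
```

Here the second sample is clipped to the hard limit with its sign preserved, while the third sample exceeds only the soft limit and is therefore recorded but left unchanged.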
Referring now to
In operation, the peak detection module 704 is configured to determine a position of a sample with highest magnitude within a current block of samples. The gain factor computation module 706 is configured to determine a peak gain to be applied for the block of samples for bringing the highest magnitude to a predetermined threshold value. Further, the gain factor computation module 706 computes a gain delta by which an old gain is updated to the peak gain. The old gain is a gain at an end of the previous block of samples. Further, the gain factor computation module 706 computes a gain factor for the current block of samples based on the position of the sample with highest magnitude and the gain delta. After computing the gain factor, the gain factor computation module 706 sets the gain factor to a predetermined minimum gain factor when the computed gain factor is less than the predetermined minimum gain factor.
The gain application module 708 is configured to update the gain at the sample level and apply the gain to the corresponding sample within the block of samples. In one example embodiment, the gain application module 708 determines whether a peak gain is reached in the current block of samples. If the peak gain is not reached, the gain application module 708 updates a gain at each sample in the current block of samples based on the gain factor. Further, the updated gain is applied to each sample in the current block of samples. Furthermore, the gain application module 708 repeats the steps of determining, updating and applying gain until the peak gain is reached.
The gain factor refinement module 710 is configured to identify a plurality of samples up to a peak index that cross the predetermined threshold value upon applying the gain to the current block of samples. Further, the gain factor refinement module 710 computes a gain delta for each of the identified samples based on the respective magnitudes of the identified samples. Furthermore, the gain factor refinement module 710 computes a gain factor for each of the identified samples based on the computed gain delta and the respective positions of the identified samples. Further, the gain factor refinement module 710 determines a minimum gain factor from the computed gain factors associated with the identified samples. Also, the gain factor refinement module 710 sets the minimum gain factor to the predetermined minimum gain factor when the minimum gain factor is less than the predetermined minimum gain factor. In addition, the gain application module 708 applies the computed gain to the current block of samples.
Referring now to
At the transmitter end, speech is captured by the audio input device 802. The audio input device may convert the analog speech signal into its digital counterpart using analog to digital conversion circuitry. Further, the speech signal is amplified by the amplifier 804. Further, the speech signal is processed by the peak limiting module 806 to mitigate any transients in the speech signal by varying the gain factor applied to the samples in the speech signal. The peak limiting module 806 performs the peak limiting process (as explained in
In one example embodiment, the processed speech signal is recorded at a recording device 808. The recording device 808 may include, for example, a voice recorder, a mobile phone, or a music system. In another example embodiment, the processed speech is transmitted over the network 810 to the receiver system 811, which can also perform the peak limit process. The peak limiting modules 806 and 814 are similar to the peak limiting module 702 as explained with reference to
Referring now to
The speech processing system 902 includes a processor 904, memory 906, a removable storage 918, and a non-removable storage 920. The speech processing system 902 additionally includes a bus 914 and a network interface 916. As shown in
Exemplary audio input devices 922 include a microphone, a Musical Instrument Digital Interface (MIDI) keyboard and the like. Exemplary audio output devices 924 include speakers, earphones, headphones and the like. Exemplary communication connections 926 include a local area network, a wide area network, and/or other networks.
The memory 906 further includes volatile memory 908 and non-volatile memory 910. A variety of computer-readable storage media are stored in and accessed from the memory elements of the speech processing system 902, such as the volatile memory 908 and the non-volatile memory 910, the removable storage 918 and the non-removable storage 920. The memory elements include any suitable memory device(s) for storing data and machine-readable instructions, such as read only memory, random access memory, erasable programmable read only memory, electrically erasable programmable read only memory, hard drive, removable media drive for handling compact disks, digital video disks, diskettes, magnetic tape cartridges, memory cards, Memory Sticks™, and the like.
The processor 904, as used herein, means any type of computational circuit, such as, but not limited to, a microprocessor, a microcontroller, a complex instruction set computing microprocessor, a reduced instruction set computing microprocessor, a very long instruction word microprocessor, an explicitly parallel instruction computing microprocessor, a graphics processor, a digital signal processor, or any other type of processing circuit. The processor 904 also includes embedded controllers, such as generic or programmable logic devices or arrays, application specific integrated circuits, single-chip computers, smart cards, and the like.
Embodiments of the present subject matter may be implemented in conjunction with program modules, including functions, procedures, data structures, and application programs, for performing tasks, or defining abstract data types or low-level hardware contexts. Machine-readable instructions stored on any of the above-mentioned storage media may be executable by the processor 904 of the speech processing system 902. For example, the memory 906 includes machine-readable instructions capable of peak limiting speech signals for delay sensitive voice communication generated in the speech processing system 902, according to the teachings and herein described embodiments of the present subject matter. In one embodiment, the machine-readable instructions may be stored on a compact disk-read only memory (CD-ROM) and loaded from the CD-ROM to a hard drive in the non-volatile memory 910. Machine-readable instructions in the memory 906 cause the peak limiting module 702 to operate according to the various embodiments of the present subject matter.
As shown, the memory 906 includes a peak limiting module 702. For example, the peak limiting module 702 can be in the form of instructions stored on a non-transitory computer-readable storage medium. When executed by a computing device, the instructions in the non-transitory computer-readable storage medium cause the speech processing system 902 to perform the one or more methods and systems described with reference to
Thus, the described method and system provide peak limiting of speech signals for delay sensitive voice communication. The described method and system apply fast attenuation in order to prevent samples from going beyond a peak threshold. The described method and system are implemented in a block processing manner, which is advantageous in digital domains like Voice over Internet Protocol (VoIP) applications. Further, the method and system provide a peak release process where the gain is increased back to unity level. Furthermore, the method and system do not introduce the additional delay caused by incorporating a look-ahead feature. Additionally, the method and system use pitch information to determine the rate of attenuation (gain factor), which is effective for speech and certain musical content.
Although certain methods, systems, apparatus, and articles of manufacture have been described herein, the scope of coverage of this patent is not limited thereto. To the contrary, this patent covers all methods, apparatus, and articles of manufacture fairly falling within the scope of the appended claims either literally or under the doctrine of equivalents.