This application claims priority to the Chinese Patent Application titled, “System and Method for Audio Limiting in Call Downlink Algorithm,” filed Oct. 17, 2023, and having Application No. CN 202311347262.0. The subject matter of this related application is hereby incorporated herein by reference.
The present disclosure relates generally to audio processing, and particularly to a system and a method for audio limiting in a downlink algorithm.
Using earphones to make phone calls has become a common practice in daily life. However, when using an earphone for a call, a near-end earphone user is generally unable to control the behavior of a far-end user. For example, sometimes the far-end user may be too far away from the phone or microphone, so that it is difficult for the near-end earphone user to hear the other person clearly. Conversely, a person at the far end may suddenly speak loudly, which causes a sudden surge in the volume of the near-end earphone, resulting in a poor experience at the near-end earphone user side.
Therefore, to improve the call experience, an audio limiter device is needed that maintains a consistent volume for the near-end earphone user regardless of how softly or loudly the far-end person speaks.
According to one aspect of the present disclosure, a method for audio limiting in a call downlink is provided. The method includes receiving far-end sound from a far-end device through an audio limiting module configured at a near-end device, and performing voice activity detection and adjusting a current volume for each audio frame of a plurality of audio frames in the received far-end sound. The method further includes playing far-end sound with the adjusted volume through a speaker of the near-end device. The step of adjusting the current volume in the method further includes calculating a difference between the current volume and a target volume and regarding it as a gain, and applying the gain to a current audio frame.
According to another aspect of the present disclosure, a system for audio limiting in a call downlink is provided. The system includes an audio limiting module, which is configured to receive far-end sound from a far-end device at a near-end device, and perform voice activity detection and adjust a current volume for each audio frame of a plurality of audio frames in the received far-end sound. The system further includes a speaker in the near-end device, and the speaker is configured to play far-end sound with the adjusted volume in the near-end device. Adjusting the current volume in the system further includes calculating a difference between the current volume and a target volume and regarding it as a gain, and applying the gain to a current audio frame.
The audio limiting system and method provided by the present disclosure act only on the detected speech portions, and can simultaneously suppress noise in the non-speech portions.
These and/or other features, aspects, and advantages of the present disclosure will be better understood upon reading the following detailed description with reference to the accompanying drawings, where the same characters represent the same components throughout these accompanying drawings:
Description of various embodiments is provided below for the purpose of illustration, but it is not intended to be exhaustive or to limit the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments.
In existing call products, a user must use a button on the product (for example, on a mobile phone or an earphone) to control the volume at which the other party is heard during a call. For example, if the other person whispers and is difficult to hear, the user must press the button to turn up the volume; however, doing so also increases the background noise. Therefore, to improve the call experience, it is necessary to maintain a consistent volume for the near-end earphone user no matter how softly or loudly the far-end person speaks.
To achieve this, the present disclosure introduces an audio limiter technique into a call downlink algorithm for a near-end device, such as an earphone. To ensure that an earphone wearer can hear the far-end sound more comfortably during a call, an audio limiting method is adopted in the downlink of the call algorithm to limit the audio adaptively. Here, the downlink refers to the path that enables the person at the near end to hear the voice of the person at the far end, and the audio limiting method provided by the present disclosure acts only on the detected speech portions. The speech portions are processed according to their volume: speech louder than a target volume is attenuated, and speech quieter than the target volume is amplified. If the speech volume is very low, an upper threshold on the gain limits the maximum gain and prevents the background noise in the speech segment from being excessively amplified, thereby improving the user experience. In this way, the user no longer needs to adjust the volume by hand, and the earphone adaptively keeps the listening volume at an appropriate level. At the same time, noise can be suppressed when no speech is being transmitted (that is, when no speech is detected).
Referring to
Then, a current volume is obtained. When the determination result for the current frame is VAD_T=1, that is, speech is present, the current volume is acquired in a block 260 as the maximum absolute amplitude of the audio frame. Specifically, the volume of the frame is taken as the maximum absolute amplitude among all audio samples in the frame. For example, with a speech sampling rate of 16 kHz and 16-bit samples, each audio frame covers 16 ms of audio, corresponding to 256 samples, and the maximum absolute amplitude among these 256 samples is selected as the current volume of the frame.
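The frame-size arithmetic and the peak-volume rule described above can be checked with a short sketch. The constant names and the function name `frame_peak_volume` are illustrative assumptions, not identifiers from the disclosure.

```python
# Example parameters from the text: 16 kHz sampling, 16-bit samples, 16 ms frames.
SAMPLE_RATE_HZ = 16_000
FRAME_MS = 16
FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 16000 * 16 / 1000 = 256 samples


def frame_peak_volume(frame):
    """Current volume of a speech frame: the largest absolute sample amplitude."""
    if not frame:
        raise ValueError("empty frame")
    return max(abs(s) for s in frame)


frame = [0] * FRAME_LEN
frame[10] = 1200
frame[200] = -3000  # negative peaks count by their magnitude
print(FRAME_LEN, frame_peak_volume(frame))  # prints: 256 3000
```

Using the absolute value means a strongly negative excursion is treated the same as a positive one, which matches "maximum absolute amplitude".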
In addition, when the determination result for the current frame is VAD_T=0, that is, no speech is present, the current volume is acquired in a block 270 as the average of the maximum absolute amplitudes of several speech presence frames preceding the audio frame. Specifically, the maximum absolute amplitude obtained for each speech presence frame is saved in a buffer, and the average of these values over the previous speech presence frames, for example 128 frames, is regarded as the current volume.
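A minimal sketch of the two volume cases, assuming the VAD flag is already available as a boolean (`is_speech` standing in for VAD_T=1). The class name `VolumeTracker` is hypothetical, and so is the behavior before any speech frame has been buffered (returning a configurable `initial_volume`); the text does not specify that case.

```python
from collections import deque

# Number of past speech-presence frames whose peaks are averaged (128 in the example).
HISTORY_FRAMES = 128


class VolumeTracker:
    """Per-frame current volume following the two cases described above.

    Speech frame: current volume = peak absolute amplitude of the frame,
    and that peak is stored in a bounded buffer.
    Non-speech frame: current volume = mean of the buffered peaks of
    previous speech frames.
    """

    def __init__(self, history=HISTORY_FRAMES, initial_volume=0.0):
        self._peaks = deque(maxlen=history)  # oldest peak is dropped automatically
        self._initial = initial_volume       # assumed fallback before any speech is seen

    def current_volume(self, frame, is_speech):
        if is_speech:
            peak = max(abs(s) for s in frame)
            self._peaks.append(peak)
            return float(peak)
        if not self._peaks:
            return self._initial
        return sum(self._peaks) / len(self._peaks)
```

For example, after speech frames with peaks 400 and 2000, a following non-speech frame is assigned a current volume of 1200.0, independent of its own (noise) samples. A `deque` with `maxlen` keeps the buffer bounded without manual index bookkeeping.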
Next, the gain for adjusting the current volume to the target volume is calculated. The target volume can be set empirically and can be adjustable within a range of target values. For each frame, the difference between the current volume and the target volume is calculated and regarded as the gain for adjusting the current volume to the target volume. The gain should be limited to a certain range, for example, a range having an upper threshold and a lower threshold; the upper threshold, for example, ensures that the speech heard is not amplified too much. In addition, the gain can be smoothed. For example, the gain is smoothed sample by sample within the frame, so as to prevent excessive fluctuation of the gain from introducing abnormal noise.
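The gain step might be sketched as follows. This assumes the volumes are compared in dB relative to 16-bit full scale (dBFS), so the gain is taken as the target level minus the current level (a positive value means amplification). The function names, the threshold values in the example, and the one-pole per-sample smoothing are hypothetical choices; the text specifies only that the gain is clamped between two thresholds and smoothed sample by sample.

```python
import math

FULL_SCALE = 32768.0  # 16-bit full scale (assumed reference level)


def to_dbfs(volume, floor_db=-96.0):
    """Convert a linear peak amplitude to dBFS, clamped at a floor to avoid log(0)."""
    if volume <= 0:
        return floor_db
    return max(floor_db, 20.0 * math.log10(volume / FULL_SCALE))


def frame_gain_db(current_volume, target_db, min_gain_db, max_gain_db):
    """Gain moving the current volume to the target, limited to [min_gain_db, max_gain_db]."""
    gain = target_db - to_dbfs(current_volume)
    return min(max(gain, min_gain_db), max_gain_db)


def smooth_gains(prev_gain_db, new_gain_db, n, alpha=0.99):
    """One-pole smoothing, one value per sample, from the previous gain toward the new one."""
    gains = []
    g = prev_gain_db
    for _ in range(n):
        g = alpha * g + (1.0 - alpha) * new_gain_db
        gains.append(g)
    return gains
```

With a target of -12 dBFS and thresholds of -20 dB and +12 dB, a frame peak of 3000 yields roughly 8.8 dB of gain, while a peak of 100 would call for about 38 dB and is capped at 12 dB. The cap is what keeps the background noise of very quiet speech from being boosted excessively.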
Then, after the above smoothing, the volume difference is compensated at the near-end earphone by applying the smoothed gain sample by sample within the frame.
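Applying per-sample gains (in dB) to a 16-bit frame could look like the following sketch. The function name `apply_gains` is hypothetical, and the saturation to the int16 range is an added assumption intended to avoid wrap-around if a gain pushes a sample past full scale.

```python
def apply_gains(frame, gains_db):
    """Apply one gain (in dB) per sample and saturate the result to the 16-bit range."""
    if len(frame) != len(gains_db):
        raise ValueError("need exactly one gain per sample")
    out = []
    for sample, g_db in zip(frame, gains_db):
        y = round(sample * 10.0 ** (g_db / 20.0))  # dB -> linear amplitude factor
        out.append(max(-32768, min(32767, y)))     # clip instead of overflowing
    return out
```

For instance, a +20 dB gain multiplies a sample by 10, so 1000 becomes 10000, while 20000 would exceed full scale and is held at 32767.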
As can be seen, voice activity detection (VAD) is necessary to avoid processing noise or silent segments, because amplifying the noise as well would degrade the listening experience. Preprocessing of the gain used to adjust the volume, such as smoothing, is also necessary. Therefore, the algorithm of the audio limiter can be as follows:
As mentioned above, the audio limiting system and method provided by the present disclosure adopt different processing strategies depending on whether speech is detected: when speech is detected, the speech is amplified or compressed to the target volume according to the current speech amplitude, and a detected pure-noise segment is suppressed. If the speech volume is very low, the upper threshold on the gain limits the maximum gain and prevents the background noise in the speech segment from being excessively amplified, thereby improving the user experience.
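Putting these steps together, a per-frame limiter that follows the strategy summarized above might be sketched as below. It is an illustrative sketch rather than the disclosed implementation: the class name `DownlinkLimiter`, all default parameter values, the dBFS gain convention, the one-pole smoothing, and the extra fixed attenuation (`noise_atten_db`) used to suppress non-speech frames are assumptions. The VAD decision is passed in as a boolean.

```python
import math
from collections import deque

FULL_SCALE = 32768.0  # 16-bit full scale


class DownlinkLimiter:
    """Per-frame audio limiter sketch (all parameter values are hypothetical).

    Speech frames are pulled toward target_db; the gain is clamped to
    [min_gain_db, max_gain_db], so quiet speech (and its background noise)
    is never boosted by more than max_gain_db. Non-speech frames reuse the
    average peak of recent speech frames and get an extra fixed attenuation.
    """

    def __init__(self, target_db=-12.0, min_gain_db=-20.0, max_gain_db=12.0,
                 noise_atten_db=-20.0, history=128, alpha=0.99):
        self.target_db = target_db
        self.min_gain_db = min_gain_db
        self.max_gain_db = max_gain_db
        self.noise_atten_db = noise_atten_db  # assumed form of noise suppression
        self.alpha = alpha                    # per-sample smoothing coefficient
        self.peaks = deque(maxlen=history)    # peaks of recent speech frames
        self.gain_db = 0.0                    # smoothed gain carried across frames

    def _frame_gain(self, frame, is_speech):
        if is_speech:
            volume = max(abs(s) for s in frame)
            self.peaks.append(volume)
        elif self.peaks:
            volume = sum(self.peaks) / len(self.peaks)
        else:
            volume = 0.0
        if volume <= 0:
            gain = 0.0  # nothing measurable: leave the level unchanged
        else:
            gain = self.target_db - 20.0 * math.log10(volume / FULL_SCALE)
        gain = min(max(gain, self.min_gain_db), self.max_gain_db)
        if not is_speech:
            gain += self.noise_atten_db
        return gain

    def process(self, frame, is_speech):
        target = self._frame_gain(frame, is_speech)
        out = []
        for s in frame:
            # Smooth the gain sample by sample, then apply it with 16-bit saturation.
            self.gain_db = self.alpha * self.gain_db + (1.0 - self.alpha) * target
            y = round(s * 10.0 ** (self.gain_db / 20.0))
            out.append(max(-32768, min(32767, y)))
        return out
```

With the default smoothing coefficient, the gain moves toward each new frame's value gradually; constructing the limiter with `alpha=0.0` disables smoothing, which makes the arithmetic easy to check (a 16384-amplitude speech frame comes out at about 8231, that is, -12 dBFS).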
The audio limiter is used in the call downlink algorithm of the earphone to automatically amplify low-volume speech, limit high-volume speech, and prevent distortion of high-volume speech.
Specifically, in
In
Similarly, in
Moreover, in
As can be seen by comparing
The audio limiter is used in a call downlink algorithm for earphones to automatically amplify low-volume speech, limit high-volume speech, and prevent distortion of high-volume speech. For example, by adjusting each frequency point, clipping of the amplitude can be avoided, thereby preventing a cracking voice. This method behaves more like volume control, always keeping the speech volume at a target level, so that the earphone user can listen at a stable volume. In this process, according to the result of the voice activity detection, the volume is adjusted only when speech is present, and the noise is suppressed when no speech is present.
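The speech/non-speech decision that drives the limiter is described in clause 2 as comparing a per-frame signal-to-noise ratio, obtained through noise estimation, with a predetermined threshold. A minimal sketch of that idea follows; the class name `SnrVad`, the 6 dB threshold, and the simple noise-floor tracker (follow drops in energy immediately, rise by at most 1% per frame) are assumptions rather than the estimator used in the disclosure.

```python
import math


class SnrVad:
    """Frame-level VAD: SNR against a running noise estimate, compared with a threshold."""

    def __init__(self, threshold_db=6.0, rise=1.01, eps=1e-9):
        self.threshold_db = threshold_db  # hypothetical predetermined threshold
        self.rise = rise                  # hypothetical max upward drift of the noise floor per frame
        self.eps = eps                    # avoids division by zero on silent frames
        self.noise = None                 # running noise-energy estimate

    def is_speech(self, frame):
        energy = sum(s * s for s in frame) / len(frame) + self.eps
        if self.noise is None:
            self.noise = energy  # assume the first frame contains only noise
        snr_db = 10.0 * math.log10(energy / self.noise)
        speech = snr_db > self.threshold_db
        # Noise estimate: track decreases at once, allow only slow growth during speech.
        self.noise = min(energy, self.noise * self.rise)
        return speech
```

The slow upward drift lets the noise estimate adapt to a genuinely louder background over time while a burst of speech does not immediately raise the floor and mask itself.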
Examples of one or more embodiments of the present disclosure are described in the following clauses:
Clause 1. A method for audio limiting in a call downlink, including: receiving far-end sound from a far-end device through an audio limiting module at a near-end device; performing voice activity detection and adjusting a current volume for each audio frame of a plurality of audio frames in the received far-end sound; and playing far-end sound with the adjusted volume through a speaker of the near-end device, wherein the adjusting a current volume includes: calculating a difference between the current volume and a target volume and regarding it as a gain; and applying the gain to a current audio frame.
Clause 2. The method according to clause 1, wherein the performing voice activity detection includes: calculating a signal-to-noise ratio of each audio frame of the plurality of audio frames in the received far-end sound through noise estimation; determining the current audio frame as an audio frame containing a speech when the signal-to-noise ratio is higher than a predetermined threshold; and determining the current audio frame as an audio frame not containing a speech when the signal-to-noise ratio is not higher than the predetermined threshold.
Clause 3. The method according to clause 1 or clause 2, wherein the adjusting a current volume further includes: for the current audio frame being an audio frame containing a speech, taking the current volume as a maximum absolute value among amplitudes of all samples in the current audio frame.
Clause 4. The method according to any one of clause 1 to clause 3, wherein the adjusting a current volume further includes: for the current audio frame being an audio frame not containing a speech, taking the current volume as an average value of maximum absolute values of amplitudes obtained per frame for the previous several audio frames containing a speech.
Clause 5. The method according to any one of clause 1 to clause 4, wherein applying the gain to the current audio frame further includes: first limiting the gain to a range between an upper threshold and a lower threshold, and smoothing the gain sample by sample within the current audio frame.
Clause 6. The method according to any one of clause 1 to clause 5, wherein the target volume is a value taken based on experience.
Clause 7. The method according to any one of clause 1 to clause 6, wherein the near-end device is an earphone, the far-end device is a mobile phone, and the near-end device is connected to the far-end device via a Bluetooth connection.
Clause 8. A system for audio limiting in a call downlink, including: an audio limiting module configured to receive far-end sound from a far-end device at a near-end device, and perform voice activity detection and adjust a current volume for each audio frame of a plurality of audio frames in the received far-end sound; a speaker configured to play far-end sound with the adjusted volume in the near-end device, wherein the adjusting a current volume includes: calculating a difference between the current volume and a target volume and regarding it as a gain; and applying the gain to a current audio frame.
Clause 9. The system according to clause 8, wherein the performing voice activity detection includes: calculating a signal-to-noise ratio of each audio frame of the plurality of audio frames in the received far-end sound through noise estimation; determining the current audio frame as an audio frame containing a speech when the signal-to-noise ratio is higher than a predetermined threshold; and determining the current audio frame as an audio frame not containing a speech when the signal-to-noise ratio is not higher than the predetermined threshold.
Clause 10. The system according to clause 8 or clause 9, wherein the adjusting a current volume further includes: for the current audio frame being an audio frame containing a speech, taking the current volume as a maximum absolute value among amplitudes of all samples in the current audio frame.
Clause 11. The system according to any one of clause 8 to clause 10, wherein the adjusting a current volume further includes: for the current audio frame being an audio frame not containing a speech, taking the current volume as an average value of maximum absolute values of amplitudes obtained per frame for the previous several audio frames containing a speech.
Clause 12. The system according to any one of clause 8 to clause 11, wherein applying the gain to the current audio frame further includes: first limiting the gain to a range between an upper threshold and a lower threshold, and smoothing the gain sample by sample within the current audio frame.
Clause 13. The system according to any one of clause 8 to clause 12, wherein the target volume is a value taken based on experience.
Clause 14. The system according to any one of clause 8 to clause 13, wherein the near-end device is an earphone, the far-end device is a mobile phone, and the near-end device is connected to the far-end device via a Bluetooth connection.
Clause 15. A non-transitory computer-readable medium storing instructions thereon, wherein the instructions, when executed by one or more processors, cause the one or more processors to perform the method for audio limiting in a call downlink according to any one of clauses 1 to 7.
The elements of various implementations of the modules, elements, and components of the methods provided in the present disclosure may be fabricated as one or more electronic devices residing on the same chip or in a chipset, including but not limited to an array of fixed or programmable logic elements (such as transistors or gates). One or more elements of various implementations of the device described herein may also be fully or partially implemented as one or more instruction sets, which may be arranged to execute on one or more fixed or programmable logic element arrays (such as microprocessors, embedded processors, IP cores, digital signal processors, FPGAs, ASSPs, and ASICs).
The selection of terms used herein is intended to best explain the principles and practical applications of the embodiments or the improvements to technologies found in the market, or to enable other persons of ordinary skill in the art to understand the embodiments disclosed herein.
In the above, reference is made to the embodiments presented in the present disclosure. However, the scope of the present disclosure is not limited to the specifically described embodiments. Rather, any combination of the above features and elements, whether related to different embodiments or not, is contemplated as implementing and practicing the contemplated embodiments.
Furthermore, although the embodiments disclosed herein can achieve advantages better than other possible solutions or better than the prior art, whether a given embodiment achieves specific advantages does not limit the scope of the present disclosure. Therefore, the above aspects, features, embodiments and advantages are merely illustrative and are not considered as elements or limitations of the appended claims unless explicitly recited in the claims.
Although the above content is directed to the embodiments of the present disclosure, other and further embodiments of the present disclosure can be designed without departing from the basic scope of the present disclosure, and the scope of the present disclosure is determined by the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
202311347262.0 | Oct 2023 | CN | national |