The present disclosure relates to switching frequency response and directivity of a microphone, and more particularly to accurately switching frequency response and directivity of a microphone based on background noise conditions.
For hands-free communication applications, and particularly for hands-free communication in a vehicle cabin, the types of microphones chosen for a design depend heavily on use cases targeted by the design. A hands-free communication use case, for example in a vehicle cabin, may be characterized by background noise conditions. Depending on the background noise conditions, some types of microphones used in hands-free communication applications include an omnidirectional microphone with a flat frequency response (FR), an omnidirectional microphone with a rising FR, a unidirectional microphone, or a microphone array that provides directional sensing by strategically combining signals coming from multiple (i.e., ≥2) microphone elements. For an in-vehicle hands-free application, the microphones detect speech of a speaker for a hands-free communication system inside the vehicle cabin. The specific type of microphone used in the hands-free communication system should be designed to provide the highest possible speech intelligibility (SI) and speech quality (SQ) for primary targeted use cases by design, and, ideally, for all use cases.
Noise inside the vehicle cabin is complex and diverse, thereby presenting unique challenges for hands-free communication. Noise from the engine, noise from wind, noise from the heating, ventilation, and air conditioning (HVAC) system, and noise from other passengers in the vehicle can all interfere with the microphone response characteristics, making it difficult to apply only one type of microphone, and/or only one frequency response, and/or only one directivity.
For example, the dominant energy of engine, wind, and HVAC noise is typically concentrated in a low frequency (e.g., <500 Hz) range, whereas the frequency content that is critical to understanding human speech is generally above 500 Hz. Based on this knowledge, it is common practice to employ a microphone with a rising frequency response for hands-free speech or voice communication. The rising response microphone has a predefined cut-off frequency (e.g., between 300 Hz and 500 Hz) below which its sensitivity decreases monotonically with decreasing frequency to purposely filter out low frequency noise. However, the signal below the cut-off frequency also includes speech content, so the rising response microphone removes that speech content as well, making speech sound unnatural in low noise conditions such as, for example, when the vehicle is idling, stationary, or driving at low speeds. For low noise driving conditions, a microphone with a flat frequency response in the common speech band (e.g., between 50 Hz and 14 kHz) may be preferred. A flat frequency response means that the sensitivity of the microphone (i.e., amplitude of the microphone signal output per unit acoustic pressure input) is substantially the same over the entire frequency band of interest.
Alternatively, a unidirectional microphone, or a microphone array, may be used because it is able to focus on sound coming from a direction of the speaker. This improves the signal-to-noise ratio (SNR) by spatially filtering out unwanted noise coming from directions other than the direction of the speaker. However, its noise rejecting performance is significantly degraded under wind turbulence-induced noise conditions such as mid-to-high speed driving with open windows.
Because there is not one type of microphone that can provide optimal SI and SQ for all possible noise sources in the vehicle cabin, engineers designing the hands-free communication system must choose a specific type of microphone for the specific vehicle design. A major disadvantage is that performance is satisfactory only for some cases and performance is unsatisfactory for other cases.
A possible solution is to use multiple microphones of different frequency response and/or directivity characteristics, or a microphone with multiple modes, and to switch the microphone type and/or mode based on an overall noise condition in the vehicle cabin. The microphone output is monitored and compared to a predetermined threshold value. Based on the comparison, the microphone may switch, for example, between an omnidirectional response and a unidirectional response. Unfortunately, a single predetermined threshold comparison does not accurately capture noise vs. speech characteristics, resulting in unwanted switching decisions.
There is a need to adaptively alter frequency response and directivity characteristics of a hands-free microphone to optimize SI and SQ when the hands-free microphone is used for a hands-free communication system in a vehicle cabin.
A handsfree system and method to modify at least one microphone output based on a linearized correlation between a modified Speech Intelligibility Index (mSII) and a Mean Opinion Score (MOS). The at least one microphone output signal is compared to predetermined thresholds for the mSII that correspond to a noise condition and the microphone output signal is modified to optimize Speech Intelligibility and Sound Quality for the noise condition.
In one or more embodiments, a handsfree system has a primary microphone having a first frequency response (FR) shape. The primary microphone outputs a measure of noise in a cabin of the vehicle. A modified Speech Intelligibility Index (mSII) is determined by multiplying a standard SII with a weighting coefficient having a value between zero and one. The mSII is linearly correlated with a Mean Opinion Score (MOS) to determine predetermined thresholds for the mSII that correspond to noise conditions. Predetermined filter coefficients are then selected from a lookup table and applied to the primary microphone output based on a comparison between the mSII determined by the primary microphone output and the predetermined mSII thresholds. A filter applies the predetermined coefficients to modify a first FR shape of the primary microphone to a second FR shape that optimizes Speech Intelligibility (SI) and Sound Quality (SQ) for the noise condition.
In one or more embodiments the primary microphone is an omnidirectional microphone, the first FR shape is flat and the second FR shape is rising.
In one or more embodiments, a plurality of predetermined threshold stages further resolves the determination of the noise condition for first and second predetermined thresholds. The further resolution provides a more refined selection and application of predetermined filter coefficients to the microphone output signal to optimize SI and SQ.
In one or more embodiments the handsfree system has a microphone module having primary and secondary microphones. A beam forming algorithm is applied to the primary and secondary microphone output signals to modify a directivity from omnidirectional to unidirectional.
Elements and steps in the figures are illustrated for simplicity and clarity and have not necessarily been rendered according to any sequence. For example, steps that may be performed concurrently or in different order are illustrated in the figures to help to improve understanding of embodiments of the present disclosure.
While various aspects of the present disclosure are described with reference to
As discussed above, a unidirectional microphone typically provides higher SNR, and therefore better SI, under most driving conditions that do not involve direct wind turbulence noise. An omnidirectional microphone, on the other hand, performs better when driving conditions include wind noise. Furthermore, an omnidirectional microphone with a rising frequency response (FR) may perform better than one with a flat FR, yet the flat FR has a wider bandwidth, making it more appropriate for natural sounding speech. The inventive subject matter adaptively alters the FR and directivity characteristics of a microphone to optimize speech intelligibility (SI) and speech quality (SQ) for multiple driving conditions. To accomplish this, a switching algorithm is used that accurately differentiates multiple driving conditions, including, but not limited to, direct wind turbulence and non-wind noise, directional noise and non-directional noise, low/mid/high-level noise, and noise with different spectrum characteristics, also known as spectrum coloration.
Many factors, such as signal SNR, distortion, speech spectrum, noise spectrum, etc., are related to SI and SQ performance. And there are multiple known methods for evaluating SI and SQ performance of a communication system. Many are standardized and adopted for scientific evaluation and/or product development. For example, ANSI S3.5-1997 describes an objective method to calculate a SI index (SII). This method provides reasonable correlation using intelligibility data obtained from human subjects. However, SII does not necessarily consider all important psychoacoustical effects of noise (human perceptions of noise). ANSI S3.5-1997 purposely focuses more on the intelligibility factor of the speech signal and less on the quality factor.
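The general idea behind an SII-style index can be illustrated with a short sketch. This is a simplified illustration only, not the full ANSI S3.5-1997 procedure: it assumes the per-band SNR is already known, maps each band SNR to an audibility value between 0 and 1 over a 30 dB range (from -15 dB to +15 dB), and sums the audibilities using band-importance weights. Masking, hearing threshold, and level distortion terms are omitted, and the function name, band count, and weights are hypothetical.

```python
def simplified_sii(band_snr_db, band_importance):
    """Importance-weighted sum of band audibility (simplified SII-style index).

    Assumption: each band SNR in dB is mapped linearly to audibility,
    0 at -15 dB and 1 at +15 dB, and clipped outside that range.
    """
    if len(band_snr_db) != len(band_importance):
        raise ValueError("one importance weight is required per band")
    total = sum(band_importance)
    if total <= 0:
        raise ValueError("importance weights must sum to a positive value")

    index = 0.0
    for snr_db, importance in zip(band_snr_db, band_importance):
        audibility = min(1.0, max(0.0, (snr_db + 15.0) / 30.0))
        # Normalize the weights so the index stays between 0 and 1.
        index += (importance / total) * audibility
    return index


# Hypothetical three-band example with equal importance weights.
example_index = simplified_sii([15.0, 0.0, -15.0], [1.0, 1.0, 1.0])
```

In this sketch a band at or above +15 dB SNR contributes fully, a band at or below -15 dB contributes nothing, and the result is a single number between 0 and 1 that a later stage can weight and compare against thresholds.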
Another example method, ITU-T P.862.2, provides a perceptual evaluation of a SQ model to calculate a Mean Opinion Score (MOS) that corresponds to human subjective evaluation. The MOS prediction is generally accepted as a better approach than SII because it inherently considers both the intelligibility factor of the speech signal and the quality factor. However, a MOS evaluation method that is based on, or similar to, the ITU-T P.862.2 process is difficult to implement in real-world product designs due to its computational complexity.
Although SI and SQ are sometimes used interchangeably, there are distinct differences. SI emphasizes the degree to which speech may be understood by a listener. SQ, on the other hand, may be considered a measure more representative of human perception. In addition to the degree to which speech may be understood by the listener, SQ includes the degree of satisfaction of the listener as well as the naturalness of the speech and the listening effort required to understand it. For this reason, SQ is typically only evaluated using human subjects, which is why it is primarily used in laboratory evaluations. ITU-T has recommendations P.862, P.862.2 and P.863 that describe a mathematical process to derive the MOS based on objective measurements only, removing the need for actual human subjects. However, the process is still complex and costly to implement in real-world product applications.
Further analysis showed that SII changes rapidly around SNR of 0 dB.
In one or more embodiments, the SII is mathematically modified by weighting SII data with an overall signal SNR. Referring back to
f(snr)=1−c*Γpdf(snr+s,A,B) (1)
mSII=f(snr)*SII (2)
In Equation (1), f(snr) is a weighting function based on an unweighted SNR. Γpdf(snr+s, A, B) is a Gamma probability density function, Γpdf, with scale parameters A and B. Since a negative input variable will result in a Γpdf value of 0, the input SNR (denoted by snr) value is shifted by a positive s dB to ensure negative SNR values are considered. Depending on the values of A and B, a proper coefficient, c, is determined to guarantee the resulting f(snr) value is between 0 and 1. Referring now to Equation (2), the modified SII value, mSII, is obtained by multiplying the weighting function value f(snr) with the original SII value.
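Equations (1) and (2) can be sketched in a few lines of Python. The sketch assumes the Gamma density uses the shape-and-scale convention common in numerical libraries (A as shape, B as scale), which the text does not confirm. The values chosen for A, B, s, and c are hypothetical and only illustrate how c can be bounded so that f(snr) stays between 0 and 1.

```python
import math


def gamma_pdf(x, a, b):
    """Gamma density with shape a and scale b (assumed convention).

    Returns 0 for x <= 0; the sketch assumes a > 1, so 0 is also the
    correct density value at x = 0.
    """
    if x <= 0.0:
        return 0.0
    return x ** (a - 1.0) * math.exp(-x / b) / (math.gamma(a) * b ** a)


def max_coefficient(a, b):
    """Largest c that keeps f(snr) >= 0: reciprocal of the density peak.

    For a > 1 the density peaks at x = (a - 1) * b.
    """
    return 1.0 / gamma_pdf((a - 1.0) * b, a, b)


def snr_weight(snr_db, s, a, b, c):
    """Equation (1): f(snr) = 1 - c * gamma_pdf(snr + s, A, B)."""
    return 1.0 - c * gamma_pdf(snr_db + s, a, b)


def modified_sii(sii, snr_db, s, a, b, c):
    """Equation (2): mSII = f(snr) * SII."""
    return snr_weight(snr_db, s, a, b, c) * sii


# Hypothetical parameter values, chosen only for illustration.
A, B, S = 2.0, 10.0, 20.0
C = 0.5 * max_coefficient(A, B)  # weight never drops below 0.5

example_msii = modified_sii(0.8, -10.0, S, A, B, C)
```

Because the density is never negative, any c between 0 and the value returned by `max_coefficient` keeps the weight within the range from 0 to 1, so mSII never exceeds the standard SII.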
The modified SII, mSII, is used as an indication of SI/SQ performance measured with MOS score. The correlation between the mSII and MOS, as shown in
Once the standard SII is calculated, at step 604 a weighting coefficient related to the total signal SNR, f(snr), is determined using Equation (1) above. In Equation (1) the Γpdf function is:
At step 606, mSII is calculated using Equation (2) above, which multiplies the weighting coefficient and the standard SII. The modified SII, mSII, is compared to one or more predetermined threshold values. The threshold values correspond to background noise conditions and are used to switch microphone characteristics, including FR and directivity, as needed to optimize SI/SQ.
The modified SII, mSII, is an indication of SI/SQ performance measured with the Mean Opinion Score (MOS). The linearized correlation between mSII and MOS allows mSII to be used to adjust microphone output characteristics (FR and directivity) to optimize microphone performance under differing automotive driving conditions. According to the inventive subject matter, the mSII criteria may be applied to switch an output between a flat FR and a rising FR. Alternatively, or additionally, it may be applied to switch between an omnidirectional microphone and a unidirectional microphone, which may also include switching between the flat and rising FRs for the omnidirectional microphone.
The standard SII block 808 outputs an SII 812 and an SNR 814 which, along with a weighting function 816, are fed into block 818 to calculate the modified SII (mSII) using Equations (1) and (2) described earlier herein. At decision block 820 the mSII value is compared to the first predetermined threshold value, V1. When mSII is greater than or equal to V1, 822, this indicates low noise conditions: no change 824 is made to the operation of the microphone, and the microphone maintains 826 an omnidirectional signal with a flat FR. The original microphone signal 803 is output with no filtering. Under low noise conditions, the microphone with a flat FR will provide sufficient SI and optimal SQ.
A high noise condition is determined when mSII is less than the first predetermined threshold, V1, 828. For high noise conditions, a high pass filter 830 is applied to the original microphone signal 803. Design coefficients for the high pass filter 830 may be stored in a lookup table 832 and are selected from the lookup table based on the mSII. Application of the high pass filter 830 to the microphone signal 803 results in a microphone output that has a rising FR shape 834. The rising FR shape helps to remove noise that is typically dominant in the low frequency range, thereby improving SI and SQ.
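The single-threshold decision and filtering described above can be sketched as follows. The sketch substitutes a first-order RC-style high-pass filter for the unspecified filter design, and the sample rate, the value of V1, the table bounds, and the cut-off frequencies are all hypothetical. The lookup table here stores one precomputed coefficient per mSII range.

```python
import math


def highpass_alpha(cutoff_hz, sample_rate_hz):
    """Coefficient of a first-order RC-style high-pass filter."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    return rc / (rc + dt)


def apply_highpass(samples, alpha):
    """y[n] = alpha * (y[n-1] + x[n] - x[n-1]); attenuates low frequencies."""
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


# Hypothetical values: sample rate, threshold V1, and a lookup table of
# (mSII upper bound, filter coefficient) sorted by bound. A lower mSII
# selects a higher cut-off frequency.
SAMPLE_RATE_HZ = 16000.0
V1 = 0.6
LOOKUP_TABLE = [
    (0.3, highpass_alpha(500.0, SAMPLE_RATE_HZ)),
    (V1, highpass_alpha(300.0, SAMPLE_RATE_HZ)),
]


def process_single_mic(samples, msii, v1=V1, table=LOOKUP_TABLE):
    """Return the flat-FR signal for low noise, else a high-passed signal."""
    if msii >= v1:
        return list(samples)  # low noise: output the signal unfiltered
    for upper_bound, alpha in table:
        if msii < upper_bound:
            return apply_highpass(samples, alpha)
    return list(samples)  # not reached when the last bound equals v1
```

A constant (0 Hz) input decays toward zero once filtering is active, while a rapidly alternating input passes almost unchanged, which is the rising FR behavior described above.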
The signal processing blocks and steps may be carried out within the microphone element 802a itself or within the microphone module 802 if the microphone element 802a or the microphone module 802 includes a built-in digital signal processor (DSP). Alternatively, a DSP in another system element, such as an amplifier or a head unit, may carry out the processing blocks and steps.
With only a single omnidirectional microphone element, the directivity of microphone module 802 cannot be altered to be unidirectional. Therefore, the entire range when mSII<V1, as depicted in
V1_2 is a second predetermined threshold stage for mSII that is less than the first predetermined threshold, V1, but is larger than the second predetermined threshold, V2. For the present example, V1_2 has a value between 0.5 and 0.6. An mSII falling between the first predetermined threshold stage V1_1 and the second predetermined threshold stage V1_2 indicates a slightly increased noise condition. Under the slightly increased noise condition, high pass filtering the signal with a low cut-off frequency (e.g., 100 Hz), as represented by curve 907, helps improve SNR while conserving low frequency speech content. The microphone output is processed using the high pass filter with predetermined coefficients selected from a look up table 904. The filtered signal will be output as an omnidirectional microphone output with a rising frequency response that is optimal for the noise condition relevant to the calculated modified SII, mSII, for values between V1_1 and V1_2.
A third additional predetermined threshold stage V1_3 has a value lower than or equal to the second predetermined threshold V2. At high noise conditions, when mSII falls below V1_3, the system will apply the filter with coefficients selected from a lookup table 904 to generate an omnidirectional output with a rising FR having a high cut-off frequency (e.g., 500 Hz) as represented by curve 906. This is optimal for achieving the best SI/SQ at high noise conditions as predicted by the modified SII, mSII.
Similarly, for a modified SII, mSII, that falls between the second predetermined threshold stage, V1_2, and the third predetermined threshold stage, V1_3, the system will apply a high pass filter with coefficients selected from the lookup table 904 that is less rigorous than curve 906, but more rigorous than curve 907, resulting in curve 908. This will generate an omnidirectional microphone output with a rising frequency response that is optimal for achieving the best SI/SQ for the noise condition corresponding to the calculated modified SII, mSII, values between V1_2 and V1_3.
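The staged selection can be sketched as a simple mapping from mSII to a high-pass cut-off frequency. Only the 100 Hz and 500 Hz cut-offs and the 0.5 to 0.6 range for V1_2 come from the description; the values used for V1_1 and V1_3, the 300 Hz intermediate cut-off, the treatment of values exactly on a boundary, and the assumption that no filter is applied at or above V1_1 are hypothetical.

```python
def select_cutoff_hz(msii, v1_1=0.7, v1_2=0.55, v1_3=0.4):
    """Pick a high-pass cut-off frequency from staged mSII thresholds.

    Returns None when no filtering is applied (assumed for msii >= v1_1).
    """
    if not v1_1 > v1_2 > v1_3:
        raise ValueError("threshold stages must satisfy V1_1 > V1_2 > V1_3")
    if msii >= v1_1:
        return None    # lowest noise: keep the flat response
    if msii >= v1_2:
        return 100.0   # slightly increased noise: low cut-off
    if msii >= v1_3:
        return 300.0   # intermediate stage (hypothetical value)
    return 500.0       # high noise: high cut-off


# Cut-offs chosen for a sweep of hypothetical mSII values.
stage_examples = [select_cutoff_hz(v) for v in (0.9, 0.6, 0.45, 0.1)]
```

The returned frequency would then index the lookup table of predetermined filter coefficients, so adding a stage only requires one more threshold and one more table entry.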
It should be noted that the example presented in
The standard SII block 1008 outputs an SII 1012 and an SNR 1014 which, along with a weighting function 1016, is fed into block 1018 to calculate the modified SII (mSII) using Equations (1) and (2) described earlier herein. At decision block 1020, the mSII value is compared to the first predetermined threshold value, V1 and the second predetermined threshold value, V2.
Low noise conditions may be indicated when mSII is greater than or equal to V1, 1022. For low noise conditions 1022, the primary microphone signal 1003a is output 1024 without any processing as omnidirectional with flat FR 1026. This scenario is for low noise use cases when the microphone with flat FR is sufficient for SI and SQ.
High noise conditions, such as noise typically caused by wind turbulence, may be indicated when mSII is less than or equal to V2, 1028. In this scenario a high pass filter 1030 is applied to one of the primary or secondary microphones 1002a, 1002b to generate a microphone output 1032 having a rising FR shape. The high pass filter 1030 has design coefficients that may be stored in a lookup table. The microphone signal having a rising FR shape 1032 will provide relatively optimal SI and SQ for high noise conditions in the vehicle cabin.
Medium to high noise conditions may be indicated when mSII is somewhere between V1 and V2, 1034. For medium to high noise conditions, a unidirectional microphone output is the preferred design choice. When mSII falls between the first and second predetermined thresholds, V1 and V2, 1034, the outputs 1003a, 1003b of the primary microphone 1002a and the secondary microphone 1002b are combined 1036 in an algorithm to form an array (a two-element array in the present example) that produces a unidirectional output 1038.
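The three-way decision and the two-element combination can be sketched as below. The description does not specify the array algorithm, so the sketch substitutes a first-order differential (delay-and-subtract) pair, which places a null toward the rear of the array. The function names, the mode labels, and the integer sample delay are hypothetical.

```python
def select_mode(msii, v1, v2):
    """Map mSII to an output mode using two thresholds, with V1 > V2."""
    if v2 >= v1:
        raise ValueError("V1 must be greater than V2")
    if msii >= v1:
        return "omni_flat"       # low noise: unprocessed primary signal
    if msii <= v2:
        return "omni_rising"     # high noise: high-pass filtered signal
    return "unidirectional"      # in between: combine both elements


def delay_and_subtract(front, rear, delay_samples):
    """Differential pair: y[n] = front[n] - rear[n - delay_samples].

    The delay is assumed to equal the acoustic travel time between the
    two elements, so sound arriving from the rear cancels.
    """
    if len(front) != len(rear):
        raise ValueError("both signals must have the same length")
    out = []
    for n in range(len(front)):
        delayed_rear = rear[n - delay_samples] if n >= delay_samples else 0.0
        out.append(front[n] - delayed_rear)
    return out


def delayed(signal, delay_samples):
    """Helper for the example: shift a signal later in time, zero padded."""
    return [0.0] * delay_samples + signal[: len(signal) - delay_samples]


# Hypothetical test signal and a two-sample spacing between elements.
source = [0.0, 1.0, 0.0, -1.0, 2.0, 0.0, 0.0, 0.0]
rear_arrival = delay_and_subtract(delayed(source, 2), source, 2)
front_arrival = delay_and_subtract(source, delayed(source, 2), 2)
```

In this example a source behind the array reaches the rear element first and is cancelled, while a source in front of the array is preserved, which is the spatial filtering that makes the unidirectional output preferable for medium to high noise conditions without wind turbulence.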
The signal processing blocks and steps may be carried out within the microphone elements 1002a, 1002b, or the microphone module 1002 if the microphone elements 1002a and 1002b or the microphone module 1002 include a built-in digital signal processor (DSP). Alternatively, a DSP in another system element, such as an amplifier or a head unit, may carry out the processing blocks and steps.
The inventive subject matter adaptively adjusts frequency response and directivity for one or more microphones to optimize SI and SQ performance for a variety of driving conditions. A switching algorithm establishes a linear correlation between standard SII and a MOS score to calculate a modified SII. The modified SII is compared to predetermined threshold values and a determination is made whether the microphone output, FR, and directivity should be adjusted to optimize SI and SQ performance in the presence of varying noise sources. More specifically, the inventive subject matter may apply to an automotive hands-free microphone and its ability to detect, and compensate for, noise that occurs in the vehicle cabin under varying driving conditions and that interferes with the hands-free microphone's ability to detect speech.
The inventive subject matter references national and international standards on SI and SQ evaluations. The inventive subject matter uses overall noise spectrum characteristics and applies a weighting characteristic to accurately differentiate driving conditions. The inventive subject matter determines, based on a driving condition, whether to process the microphone output in a manner that optimizes SI and SQ performance, even as driving conditions change.
In the foregoing specification, the present disclosure has been described with reference to specific exemplary embodiments. The specification and figures are illustrative, rather than restrictive, and modifications are intended to be included within the scope of the present disclosure. Accordingly, the scope of the present disclosure should be determined by the claims and their legal equivalents rather than by merely the examples described.
For example, the steps recited in any method or process claims may be executed in any order, may be executed repeatedly, and are not limited to the specific order presented in the claims. Additionally, the components and/or elements recited in any apparatus claims may be assembled or otherwise operationally configured in a variety of permutations and are accordingly not limited to the specific configuration recited in the claims. Any method or process described may be carried out by executing instructions with one or more devices, such as a processor or controller, memory (including non-transitory), sensors, network interfaces, antennas, switches, actuators to name just a few examples.
Benefits, other advantages, and solutions to problems have been described above regarding embodiments; however, any benefit, advantage, solution to problem or any element that may cause any particular benefit, advantage, or solution to occur or to become more pronounced are not to be construed as critical, required, or essential features or components of any or all the claims.
The terms “comprise”, “comprises”, “comprising”, “having”, “including”, “includes” or any variation thereof, are intended to reference a non-exclusive inclusion, such that a process, method, article, composition, or apparatus that comprises a list of elements does not include only those elements recited but may also include other elements not expressly listed or inherent to such process, method, article, composition, or apparatus. Other combinations and/or modifications of the above-described structures, arrangements, applications, proportions, elements, materials, or components used in the practice of the present disclosure, in addition to those not specifically recited, may be varied, or otherwise particularly adapted to specific environments, manufacturing specifications, design parameters or other operating requirements without departing from the general principles of the same.
Number | Date | Country |
---|---|---|
20240144950 A1 | May 2024 | US |