This application claims priority to Chinese Application No. 202010636777.2 filed Jul. 3, 2020, the disclosure of which is hereby incorporated in its entirety by reference herein.
The present invention relates to the field of voice enhancement and voice separation, and more particularly, to a method and system for compensating a frequency response of a microphone in a vehicle.
As vehicles become increasingly intelligent, more and more attention has been paid to the field of on-board voice enhancement and multi-voice separation. Generally, multiple on-board microphones are calibrated to be completely consistent before delivery to ensure the accuracy of execution results of various voice-related algorithms (e.g., a blind source separation algorithm). However, the microphones degrade differently in use due to various factors such as use time, temperature, and humidity, and thus the characteristics of the microphones diverge. This brings many disadvantages. For example, when more than one passenger is talking at the same time in a vehicle, a host system of the vehicle often cannot perform voice separation well, because the predetermined conditions required by a voice separation algorithm can no longer be satisfied due to the inconsistent degradation of microphone performance. This results in a poor user experience. In addition, if a worn microphone needs to be replaced or recalibrated, users need to return the vehicle to the factory for repair. Such repairs are inconvenient for users.
Therefore, it is necessary to develop a method and system capable of compensating a frequency response of an on-board microphone to improve the accuracy of voice processing algorithms (e.g., blind source separation), so as to bring better user experience to users.
One or more embodiments of the present invention provide a method for compensating a frequency response of a microphone. The method includes receiving a compensation signal at multiple microphones in a microphone array from a calibration speaker and outputting multiple output signals. A uniform frequency response of the multiple microphones is determined based on the multiple output signals. A compensation gain is calculated for each microphone in the microphone array according to the uniform frequency response, and the calculated compensation gain of each microphone is stored.
One or more embodiments of the present invention provide a system for compensating a frequency response of a microphone. The system includes a calibration speaker, a microphone array, a processor, and a memory. The calibration speaker is configured to send out a compensation signal to the microphone array. Multiple microphones in the microphone array receive the compensation signal sent from the calibration speaker and output multiple microphone output signals. The processor is configured to: determine a uniform frequency response of the multiple microphones based on the multiple output signals, and calculate a compensation gain for each of the multiple microphones according to the uniform frequency response. The memory is configured to store the calculated compensation gain for each microphone.
One or more embodiments of the present invention provide a computer-readable medium configured to perform steps of the method described above.
Advantageously, the method and system for compensating a frequency response disclosed in the present invention can conveniently and flexibly improve the accuracy of blind source separation, thereby bringing better user experience to users.
The system may be better understood with reference to the following description and in conjunction with the accompanying drawings. Parts in the figures are not to scale, but the focus is placed on explaining the principle of the present invention. In addition, in the figures, similar or identical reference numerals refer to similar or identical elements.
As required, detailed embodiments of the present invention are disclosed herein; however, it is to be understood that the disclosed embodiments are merely exemplary of the invention that may be embodied in various and alternative forms. The figures are not necessarily to scale; some features may be exaggerated or minimized to show details of particular components. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a representative basis for teaching one skilled in the art to variously employ the present invention.
It should be understood that the following description of the embodiments is given for illustrative purposes only, and is not restrictive. The division of examples into the functional blocks, modules, or units shown in the drawings should not be construed as meaning that these functional blocks, modules, or units must be implemented as physically separate units. The functional blocks, modules, or units shown or described may be implemented as individual units, circuits, chips, functions, modules, or circuit elements. One or more functional blocks or units may also be implemented in a common circuit, chip, circuit element, or unit.
When multiple people are talking in a vehicle at the same time, some voice enhancement processing, such as a blind source separation (BSS) algorithm, is required to isolate clean voices in the voice recognition process.
$$x_1(t)=\sum_{j=1}^{N}s_j(t)$$
$$x_2(t)=\sum_{j=1}^{N}\alpha_j\,s_j(t-\delta_j)$$
where N represents the number of sources, δj represents an arrival delay between the microphones (i.e., a time difference between sound from a source to two microphones), and αj is a relative attenuation factor, which corresponds to an attenuation ratio of a path between the source and the microphones.
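The two-microphone mixing model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `mix_two_mics` is hypothetical, delays are restricted to whole, non-negative sample counts, and samples before the start of a source are treated as silence.

```python
def mix_two_mics(sources, alphas, delays):
    """Return (x1, x2) for the two-microphone mixing model.

    sources: list of N equal-length sample lists s_j
    alphas:  relative attenuation factor alpha_j for each source
    delays:  arrival delay delta_j for each source, in whole samples (>= 0)
    """
    length = len(sources[0])
    x1 = [0.0] * length
    x2 = [0.0] * length
    for s, alpha, delay in zip(sources, alphas, delays):
        for t in range(length):
            x1[t] += s[t]                      # x1(t) = sum_j s_j(t)
            if t - delay >= 0:                 # earlier samples are taken as silence
                x2[t] += alpha * s[t - delay]  # x2(t) = sum_j alpha_j * s_j(t - delta_j)
    return x1, x2


# Two illustrative sources; the second reaches microphone 2 one sample late
# and at half amplitude.
s1 = [1.0, 2.0, 3.0, 4.0]
s2 = [0.5, 0.0, -0.5, 0.0]
x1, x2 = mix_two_mics([s1, s2], alphas=[1.0, 0.5], delays=[0, 1])
```

With a single source, `alphas=[1.0]`, and `delays=[0]`, the two returned signals are identical, which is the ideal equal-response case the model assumes.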
The above expression assumes that the microphones have the same frequency response. In other words, when a single person speaks to the microphones (a single source, j=1), if the distances between the person and the microphones are the same (α=1, δ=0), the microphone inputs are the same (s(t)), so the microphone outputs should be the same, i.e., x1(t)=x2(t). However, in use, attributes of the microphones change with time, temperature, and humidity, and the changes may be inconsistent. Even microphones with the same initial specifications may have inconsistent attributes after being used for a period of time. That is, even though the inputs are both s(t), the output signals x1(t) and x2(t) are not the same.
After receiving a recalibration control signal sent by a user to the system, the system is set in a recalibration mode. The control signal controls the speaker to send out a compensation signal, and microphone 1 and microphone 2 are started to record the compensation signal and respectively output microphone output signals x1(t) and x2(t) to the vehicle host system. The compensation signal may be a chirp signal with a broadband frequency and a uniform amplitude. For example, the chirp signal may sweep linearly from a frequency of 0.1 kHz to a frequency of 4 kHz over a duration of 5 s, and the recording duration of the microphones may be about 7-8 s. However, those skilled in the art can understand that the frequency range of the chirp signal, the duration, the microphone recording time, and other parameters are described here as examples only, and are not intended to be limiting. The above parameters may be changed according to specific requirements.
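A linear chirp with the example parameters above (0.1 kHz to 4 kHz over 5 s, uniform amplitude) can be generated as in the following sketch. The function name `linear_chirp` and the 16 kHz sample rate are assumptions made for illustration; the description does not specify a sample rate.

```python
import math


def linear_chirp(f_start_hz, f_end_hz, duration_s, sample_rate_hz, amplitude=1.0):
    """Return samples of a constant-amplitude linear frequency sweep."""
    n_samples = int(round(duration_s * sample_rate_hz))
    sweep_rate = (f_end_hz - f_start_hz) / duration_s  # Hz per second
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz
        # The instantaneous frequency is f_start + sweep_rate * t, so the phase
        # is its integral: 2*pi*(f_start*t + sweep_rate*t^2/2).
        phase = 2.0 * math.pi * (f_start_hz * t + 0.5 * sweep_rate * t * t)
        samples.append(amplitude * math.sin(phase))
    return samples


# Example parameters from the description; 16 kHz is an assumed sample rate.
chirp = linear_chirp(100.0, 4000.0, 5.0, 16000)
```

The assumed 16 kHz rate keeps the 4 kHz upper sweep frequency well below the Nyquist limit of 8 kHz.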
The vehicle host system receives the microphone output signals x1(t) and x2(t), and converts the output signals into frequency domain signals X1(jω) and X2(jω). Then, a uniform frequency response (UFR) of microphone 1 and microphone 2 may be calculated based on the frequency domain signals. Next, gains of the two microphones may be calculated based on the uniform frequency response. For example, a compensation gain of microphone 1 is gain1 and a compensation gain of microphone 2 is gain2. Finally, the calculated compensation gains gain1 and gain2 of the two microphones may be saved in the system for use in algorithms such as BSS. For example, once the BSS algorithm is invoked, the calibrated and updated microphone gains are first retrieved from a memory to compensate the frequency responses of the microphone output signals, and then the compensated output signals are used as inputs to the BSS algorithm. For example, a frequency spectrum of an audio signal received from microphone 1 is multiplied by gain1, and a frequency spectrum of an audio signal received from microphone 2 is multiplied by gain2. In this way, the frequency responses of the output signals of the two microphones are compensated through the stored corresponding compensation gains, and the accuracy of subsequent voice processing algorithms (such as the BSS algorithm) is further improved.
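The dual-microphone flow described above can be sketched as follows, under several assumptions: microphone 1 is chosen as the reference (so its spectrum magnitude serves as the UFR and gain1 is 1), a naive discrete Fourier transform stands in for whatever transform the host system actually uses, and bins with negligible energy are given a gain of 1. The names `dft`, `dual_mic_gains`, `compensate`, and the `floor` parameter are hypothetical.

```python
import cmath
import math


def dft(samples):
    """Naive O(n^2) discrete Fourier transform, adequate for a short sketch."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def dual_mic_gains(x1, x2, floor=1e-9):
    """Per-bin compensation gains with microphone 1 taken as the reference."""
    mag1 = [abs(v) for v in dft(x1)]
    mag2 = [abs(v) for v in dft(x2)]
    ufr = mag1                       # assumption: UFR = |X1(jw)|
    gain1 = [1.0] * len(ufr)         # the reference microphone is left unchanged
    # gain2 = UFR / |X2(jw)|; bins with almost no energy keep a gain of 1
    gain2 = [u / m if m > floor else 1.0 for u, m in zip(ufr, mag2)]
    return gain1, gain2


def compensate(spectrum, gain):
    """Multiply a spectrum by the stored per-bin gain."""
    return [g * v for g, v in zip(gain, spectrum)]


# Illustrative test signals: microphone 2 has lost half its sensitivity.
n = 32
x1 = [math.sin(2 * math.pi * 3 * t / n) + 0.5 * math.sin(2 * math.pi * 5 * t / n)
      for t in range(n)]
x2 = [0.5 * v for v in x1]
gain1, gain2 = dual_mic_gains(x1, x2)
compensated2 = compensate(dft(x2), gain2)
```

In this toy case the gain for microphone 2 comes out close to 2 in the bins that carry signal, so its compensated spectrum magnitude matches that of microphone 1.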
For the purpose of briefly explaining the principle,
For the microphone array in
Alternatively, frequency response amplitudes of output signals of all or part of the microphones in the microphone array may be calculated respectively, and a weighted sum of the frequency response amplitudes may be calculated, thereby calculating the UFR of all or part of the microphones. For example,
$$\mathrm{UFR}=a\,|X_1(j\omega)|+b\,|X_2(j\omega)|+\dots+q\,|X_p(j\omega)|,$$
where p≤N, a+b+ . . . +q=1, N represents the total number of microphones in the microphone array, p represents the number of microphones used in the calculation (all or part of the microphone array), and a, b, . . . , q are the weight coefficients of the corresponding microphones. For example, the weight coefficients may each be equal to 1/p, or may be set according to the importance of the microphones: if the output of a certain microphone is more important, its weight coefficient is larger.
Then, a ratio of the UFR to a frequency response amplitude of an output signal of each microphone may be calculated, thereby calculating the compensation gain of each microphone in the microphone array. For example,
$$\mathrm{gain}_i=\frac{\mathrm{UFR}}{|X_i(j\omega)|},\quad i=1,\dots,N.$$
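The amplitude-weighted scheme can be illustrated with a short sketch that operates on precomputed per-bin magnitudes. The function names, the `floor` guard against near-empty bins, and the sample values are assumptions for illustration only.

```python
def amplitude_ufr(magnitudes, weights):
    """Weighted sum of per-microphone magnitudes |X_i(jw)|, bin by bin.

    magnitudes: one list of per-bin magnitudes for each microphone used
    weights:    one weight per microphone; the weights must sum to 1
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    n_bins = len(magnitudes[0])
    return [sum(w * mag[k] for w, mag in zip(weights, magnitudes))
            for k in range(n_bins)]


def gains_from_ufr(ufr, magnitudes, floor=1e-9):
    """Gain of each microphone as UFR / |X_i(jw)|; near-empty bins keep gain 1."""
    return [[u / m if m > floor else 1.0 for u, m in zip(ufr, mag)]
            for mag in magnitudes]


# Three microphones, two frequency bins, equal weights of 1/p.
mags = [[1.0, 2.0], [2.0, 2.0], [3.0, 2.0]]
ufr = amplitude_ufr(mags, [1 / 3, 1 / 3, 1 / 3])
gains = gains_from_ufr(ufr, mags)
```

After multiplying each microphone's magnitude by its gain, every microphone reports the same value in each bin, namely the UFR.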
Alternatively, frequency response energy values of the output signals of all or part of the microphones in the microphone array may be calculated respectively, and a UFR of all or part of microphones in the microphone array is obtained by calculating a weighted sum of the frequency response energy values. For example,
$$\mathrm{UFR}=\left(a\,|X_1(j\omega)|^2+b\,|X_2(j\omega)|^2+\dots+q\,|X_p(j\omega)|^2\right)^{1/2},$$
where p≤N, a+b+ . . . +q=1, N represents the total number of microphones in the microphone array, p represents the number of microphones used in the calculation (all or part of the microphone array), and a, b, . . . , q are the weight coefficients of the corresponding microphones.
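The energy-weighted UFR can be sketched in the same style. Only the UFR itself is computed here; the function name `energy_ufr` and the sample magnitudes are illustrative assumptions.

```python
import math


def energy_ufr(magnitudes, weights):
    """Square root of the weighted sum of per-microphone energies |X_i(jw)|^2."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    n_bins = len(magnitudes[0])
    return [math.sqrt(sum(w * mag[k] ** 2 for w, mag in zip(weights, magnitudes)))
            for k in range(n_bins)]


# Two microphones, two frequency bins, equal weights.
mags = [[1.0, 2.0], [3.0, 2.0]]
ufr = energy_ufr(mags, [0.5, 0.5])
```

Where the microphones disagree (the first bin), the energy-based value is larger than the plain weighted average of the amplitudes; where they agree (the second bin), the two definitions coincide.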
Then, the compensation gain of each microphone may be calculated by calculating a ratio of the UFR to a frequency response energy of an output signal of the microphone. For example,
The microphone array arranged in a linear array shown in
For each group of microphones, the gain may be calculated according to a dual-microphone frequency response compensation scheme shown in
By analogy, the gains of the microphones numbered N/2 and N/2+1 in the N/2th group of microphones are calculated, so that compensation gains of all the microphones are finally obtained.
Alternatively, frequency response amplitudes of output signals of two microphones in each group of microphones may be calculated, and a weighted sum of the frequency response amplitudes may be taken as a UFR of the group. By calculating a ratio of the UFR of each group to the frequency response amplitude of the output signal of each microphone in the group, a compensation gain of each microphone in the group of microphones is obtained.
For example, the UFR of the first group of microphones may be calculated by the following formula:
$$\mathrm{UFR}=a\,|X_1(j\omega)|+q\,|X_N(j\omega)|,$$
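Then, the gains of two microphones in the first group of microphones are respectively calculated as:
$$\mathrm{gain}_1=\frac{\mathrm{UFR}}{|X_1(j\omega)|},\qquad \mathrm{gain}_N=\frac{\mathrm{UFR}}{|X_N(j\omega)|}.$$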
By analogy, the gains of the microphones numbered N/2 and N/2+1 in the N/2th group of microphones are calculated, so that the gains of all the microphones are calculated.
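The group-wise amplitude scheme can be sketched as follows for an even number of microphones, pairing microphone 1 with microphone N, microphone 2 with microphone N-1, and so on. The function name, the default weights of 0.5, and the `floor` guard are assumptions; indices in the code are zero-based.

```python
def group_gains_amplitude(magnitudes, a=0.5, q=0.5, floor=1e-9):
    """Per-microphone gains, computed pair by pair from a group UFR.

    magnitudes: per-bin magnitudes |X_i(jw)| for each microphone, in array order
    a, q:       weights of the two microphones in a pair (a + q = 1)
    """
    n_mics = len(magnitudes)
    if n_mics % 2 != 0:
        raise ValueError("this sketch handles an even number of microphones")
    gains = [None] * n_mics
    for i in range(n_mics // 2):
        left, right = i, n_mics - 1 - i          # mirrored pair of the array
        ufr = [a * l + q * r for l, r in zip(magnitudes[left], magnitudes[right])]
        gains[left] = [u / m if m > floor else 1.0
                       for u, m in zip(ufr, magnitudes[left])]
        gains[right] = [u / m if m > floor else 1.0
                        for u, m in zip(ufr, magnitudes[right])]
    return gains


# Four microphones, one frequency bin each.
mags = [[4.0], [1.0], [3.0], [2.0]]
gains = group_gains_amplitude(mags)
```

Each pair is equalized to its own group UFR, so the two microphones in a pair match each other after compensation, while different pairs may still differ.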
Alternatively, frequency response energies of output signals of two microphones in each group of microphones may be calculated, and a weighted sum of the frequency response energies may be taken as a UFR of the group. Then, the compensation gain of each microphone in each group is calculated by calculating a ratio of the UFR of the group to a frequency response energy of an output signal of the microphone.
For example, the UFR of the first group of microphones may be calculated by the following formula:
$$\mathrm{UFR}=\left(a\,|X_1(j\omega)|^2+q\,|X_N(j\omega)|^2\right)^{1/2},$$
The gains of two microphones in the first group of microphones are respectively calculated as:
By analogy, the gains of the microphones numbered N/2 and N/2+1 in the N/2th group of microphones are calculated, so that the gains of all the microphones are calculated.
The microphone array arranged in a linear array shown in
Processor 703 is also configured to determine whether the calibration speaker is positioned to be equally spaced from the microphones in the microphone array. When the processor determines that the calibration speaker is positioned to be equally spaced from the microphones in the microphone array, a frequency response of an output signal of one microphone in the microphone array may be selected as a uniform frequency response UFR, a compensation gain of the selected microphone is set to 1, and a compensation gain of each microphone in the remaining microphones of the microphone array is calculated as a ratio of the uniform frequency response UFR to a frequency response amplitude of an output signal of the microphone.
Further, processor 703 is also configured to calculate, when it is determined that the calibration speaker is positioned to be equally spaced from the microphones in the microphone array, frequency response amplitudes of all or part of the multiple output signals, take a weighted sum of the frequency response amplitudes as the uniform frequency response UFR, and calculate the compensation gain of each microphone in the microphone array as a ratio of the uniform frequency response UFR to a frequency response amplitude of an output signal of the microphone.
Processor 703 is further configured to calculate, when it is determined that the calibration speaker is positioned to be equally spaced from the microphones in the microphone array, frequency response energies of all or part of the multiple output signals, take a weighted sum of the frequency response energies as the uniform frequency response UFR, and set the compensation gain of each of the multiple microphones as a ratio of the uniform frequency response UFR to a frequency response energy of an output signal of the microphone.
Processor 703 is further configured to determine, when the calibration speaker is not positioned to be equally spaced from the microphones in the microphone array, whether the calibration speaker is located on a central symmetry axis of the microphone array. If the calibration speaker is located on the central symmetry axis of the microphone array and the number of microphones in the microphone array is an even number, the microphones are grouped in pairs such that the two microphones in each pair are equally spaced from the calibration speaker. If the calibration speaker is located on the central symmetry axis of the microphone array and the number of microphones in the microphone array is an odd number, the microphones other than the microphone located on the central symmetry axis are grouped in pairs in the same manner.
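The positioning checks and the pairing rule can be illustrated with a small geometric sketch. Everything about the representation is an assumption made for illustration: positions are 2-D coordinates, microphones are listed in array order along a symmetric linear array, a distance tolerance `tol` decides equality, and the function name `plan_groups` and its return format are hypothetical.

```python
import math


def distance(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def plan_groups(speaker, mics, tol=1e-6):
    """Decide how microphones are grouped for calibration.

    Returns (mode, groups, center):
      ("all", [indices], None)           speaker equally spaced from all microphones
      ("pairs", [(i, j), ...], center)   speaker on the symmetry axis; center is the
                                         index of the on-axis microphone, or None
      ("unsupported", [], None)          neither condition holds
    """
    d = [distance(speaker, m) for m in mics]
    if max(d) - min(d) <= tol:
        return ("all", list(range(len(mics))), None)
    n = len(mics)
    pairs = [(i, n - 1 - i) for i in range(n // 2)]   # mirrored pairs
    # For a symmetric linear array, the speaker is on the symmetry axis exactly
    # when both microphones of every mirrored pair are equally far from it.
    if all(abs(d[i] - d[j]) <= tol for i, j in pairs):
        center = n // 2 if n % 2 == 1 else None
        return ("pairs", pairs, center)
    return ("unsupported", [], None)


four = [(-1.5, 0.0), (-0.5, 0.0), (0.5, 0.0), (1.5, 0.0)]
three = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]
two = [(-1.0, 0.0), (1.0, 0.0)]
plan_even = plan_groups((0.0, 1.0), four)
plan_odd = plan_groups((0.0, 1.0), three)
```

With the speaker at (0, 1), the four-microphone array yields the pairs (0, 3) and (1, 2), and the three-microphone array yields the pair (0, 2) with microphone 1 left as the on-axis microphone.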
Processor 703 is further configured to select, when the number of microphones in the microphone array is an even number, a frequency response of an output signal of one microphone in each group of microphones as a uniform frequency response of the group, set a compensation gain of the selected microphone to 1, and calculate a compensation gain of the other microphone in the group of microphones as a ratio of the uniform frequency response of the group to a frequency response amplitude of an output signal of the other microphone in the group of microphones.
Processor 703 is further configured to calculate, when the number of microphones in the microphone array is an even number, frequency response amplitudes of output signals of each group of microphones, take a weighted sum of the frequency response amplitudes of the output signals as a uniform frequency response of the group, and calculate a compensation gain of each microphone in the group of microphones as a ratio of the uniform frequency response of the group to a frequency response amplitude of an output signal of the microphone.
Processor 703 is further configured to calculate, when the number of microphones in the microphone array is an even number, frequency response energies of multiple output signals of each group of microphones, take a weighted sum of the frequency response energies as a uniform frequency response of the group, and calculate a compensation gain of each microphone in the group of microphones as a ratio of the uniform frequency response of the group to a frequency response energy of an output signal of the microphone.
Processor 703 is further configured to select, when the number of microphones in the microphone array is an odd number, a frequency response of an output signal of one microphone in each group of microphones as a uniform frequency response of the group, set a compensation gain of the selected microphone to 1, calculate a compensation gain of the other microphone in the group of microphones as a ratio of the uniform frequency response of the group to a frequency response amplitude of an output signal of the other microphone in the group of microphones, and set a compensation gain of the microphone located on the central symmetry axis to 1.
Processor 703 is further configured to calculate, when the number of microphones in the microphone array is an odd number, frequency response amplitudes of output signals of each group of microphones, take a weighted sum of the frequency response amplitudes of the output signals as a uniform frequency response of the group, calculate a compensation gain of each microphone in the group of microphones as a ratio of the uniform frequency response of the group to a frequency response amplitude of an output signal of the microphone, and set a compensation gain of the microphone located on the central symmetry axis to 1.
Processor 703 is further configured to calculate, when the number of microphones in the microphone array is an odd number, frequency response energies of output signals of each group of microphones, take a weighted sum of the frequency response energies of the output signals as a uniform frequency response of the group, calculate a compensation gain of each microphone in the group of microphones as a ratio of the uniform frequency response of the group to a frequency response energy of an output signal of the microphone, and set a compensation gain of the microphone located on the central symmetry axis to 1.
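As an illustration of the odd-count case in which one microphone of each group is selected as the reference, the following sketch assigns a gain of 1 to each selected microphone and to the microphone on the central symmetry axis. Choosing the first microphone of each mirrored pair as the reference is an arbitrary assumption, as are the function name and the `floor` guard; the same code also covers an even count, where no center microphone exists.

```python
def reference_gains_with_center(magnitudes, floor=1e-9):
    """Gains when one microphone per mirrored pair is selected as the reference.

    magnitudes: per-bin magnitudes |X_i(jw)| for each microphone, in array order
    """
    n_mics = len(magnitudes)
    n_bins = len(magnitudes[0])
    # Start from gain 1 everywhere: the selected reference microphones and the
    # microphone on the central symmetry axis (odd count) keep this value.
    gains = [[1.0] * n_bins for _ in range(n_mics)]
    for i in range(n_mics // 2):
        ref, other = i, n_mics - 1 - i
        # Group UFR = |X_ref(jw)|, so the gain of the other microphone is UFR / |X_other(jw)|.
        gains[other] = [r / m if m > floor else 1.0
                        for r, m in zip(magnitudes[ref], magnitudes[other])]
    return gains


# Three microphones, one frequency bin; microphone 1 is on the symmetry axis.
mags = [[2.0], [5.0], [4.0]]
gains = reference_gains_with_center(mags)
```

Here microphone 2 receives a gain of 0.5, so its compensated magnitude matches that of microphone 0, while microphones 0 and 1 are left unchanged.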
The processor of the present invention as a whole may be a microprocessor, an application specific integrated circuit (ASIC), a system on chip (SoC), a mobile computing device (e.g., a tablet computer or a mobile phone), a media player, etc.
Any one or more of the processor, memory, or system described herein includes computer-executable instructions that may be compiled or interpreted from computer programs created using various programming languages and/or technologies. Generally speaking, a processor (such as a microprocessor) receives and executes instructions, for example, from a memory, a computer-readable medium, etc. The processor includes a non-transitory computer-readable storage medium capable of executing instructions of a software program. The computer-readable medium can be, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination thereof.
The description of the implementations has been presented for the purposes of illustration and description. Appropriate modifications and changes of the implementations can be implemented in view of the above description or can be obtained through practical methods. For example, unless otherwise indicated, one or more of the methods described may be performed by a combination of suitable devices and/or systems. The method can be performed in the following manner: using one or more logic devices (for example, processors) in combination with one or more additional hardware elements (such as storage devices, memories, circuits, hardware network interfaces, etc.) to perform stored instructions. The method and associated actions can also be executed in parallel and/or simultaneously in various orders other than the order described in this application. The system is illustrative in nature, and may include additional elements and/or omit elements. The subject matter of the present disclosure includes all novel and non-obvious combinations of the disclosed various methods and system configurations and other features, functions, and/or properties.
As used in this application, an element or step listed in the singular form and preceded by the word “one/a” should be understood as not excluding a plurality of said elements or steps, unless such exclusion is indicated. Furthermore, references to “one implementation” or “an example” of the present disclosure are not intended to be interpreted as excluding the existence of additional implementations that also incorporate the recited features. The present invention has been described above with reference to specific embodiments. However, those of ordinary skill in the art will understand that various modifications and changes can be made without departing from the broader spirit and scope of the present invention as set forth in the appended claims.
| Number | Date | Country | Kind |
|---|---|---|---|
| 202010636777.2 | Jul 2020 | CN | national |

| Number | Date | Country |
|---|---|---|
| 20220007110 A1 | Jan 2022 | US |