Speech enhancement technology is an indispensable part of many far-field sound capturing devices operating in adverse environments. Both shotgun microphones (usually a super-cardioid capsule with a long, hollow, slotted interference tube) and microphone arrays are capable of attenuating ambient noise or interference due to their high directionality. Shotgun microphones are commonly used in applications requiring low noise, such as camera-specific, conference-only, or interview-specific situations. Although a shotgun microphone can pick up sound from a certain direction in a noisy environment, making the picked-up sound clearer and less noisy, it has fixed beamforming properties and is not tunable. Additionally, the cost associated with designing and producing such microphones is relatively high. In comparison, a microphone array with an appropriate signal processing algorithm can provide a more flexible solution.
Differential microphone arrays (DMAs), among all microphone arrays, have been gaining attention recently. As one type of DMA, the linear differential microphone array (LDMA) has been extensively studied; however, many of the published LDMA designs appear to assume the use of omni-directional microphones. Although a robust LDMA design can improve the white noise gain (WNG) with a minimum-norm solution by using more microphone elements than the order of the LDMA, the WNG may still be relatively low, especially at low frequencies, causing the well-known white noise amplification problem in practical implementations. Additionally, the directivity factor (DF) of a conventional LDMA usually degrades as the frequency increases, and the beampattern also tends to deform at high frequencies.
The detailed description is set forth with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items or features.
A design method for a linear differential directional microphone array (LDDMA), which takes into account the directionality of the array elements, is provided. Some directional microphone elements have an inherent, unique property that may be advantageous over omni-directional elements. The LDDMA may be implemented as a high-performance shotgun sound capturing device.
Omni-directional and directional microphone elements are commonly used in the industry. An omni-directional microphone picks up sound with an equal gain from all directions, while a directional microphone picks up sound predominantly from some specific direction(s). Mathematically, the beampattern of a directional microphone can be expressed as u(p, θ, α)=p+(1−p)cos(θ−α), where θ is the sound incident angle, α is the steering direction of the microphone element, and p defines the property of the directional microphone. For instance, p=0.5 yields the well-known cardioid beampattern and p=0 yields a dipole. The directional microphones may be any type of directional microphones, including omni-directional, cardioid, and dipole microphones, and the like.
Two approaches may be utilized to implement a directional microphone: a dedicated directional microphone using a single microphone cartridge with two sound inlets, and a two-omnidirectional-element system with appropriate digital signal processing. The dedicated directional microphone approach is known to yield a much better directional microphone in terms of signal-to-noise ratio (SNR) than the two-omnidirectional-element system approach. This performance advantage is mainly due to the signal processing that creates the directivity being performed acoustically by the front and rear sound inlets. This unique property of the dedicated directional microphone may be utilized to achieve better performance than the conventional LDMA. The dedicated directional microphone may come in the form of either an Electret Condenser Microphone (ECM) or a Micro-Electro-Mechanical System (MEMS) microphone.
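The first-order element pattern above can be evaluated numerically. The following is a minimal Python sketch, assuming NumPy is available; the function name `element_pattern` and the three probe angles are illustrative choices and do not appear in the description.

```python
import numpy as np

def element_pattern(p, theta, alpha=0.0):
    """Return u(p, theta, alpha) = p + (1 - p) * cos(theta - alpha)."""
    theta = np.asarray(theta, dtype=float)
    return p + (1.0 - p) * np.cos(theta - alpha)

# Evaluate front (0), side (pi/2) and rear (pi) incidence for three element types.
angles = np.array([0.0, np.pi / 2.0, np.pi])
omni = element_pattern(1.0, angles)      # p = 1: equal gain in all directions
cardioid = element_pattern(0.5, angles)  # p = 0.5: null at the rear
dipole = element_pattern(0.0, angles)    # p = 0: null at the side, sign flip at the rear
```

With α=0, the cardioid element has its null at θ=π and the dipole element at θ=π/2, which matters later when constraint angles are chosen.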
For the ULA 102 in
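$$\mathbf{d}(\omega,\theta) = \left[p + (1-p)\cos\theta\right] \begin{bmatrix} 1 & e^{-j\omega\delta\cos\theta/c} & \cdots & e^{-j\omega(M-1)\delta\cos\theta/c} \end{bmatrix}^T, \tag{1}$$
where the superscript T is the transpose operator, j=√(−1) is the imaginary unit, ω=2πf is the angular frequency, and f is the temporal frequency. For comparison, the steering vector for a conventional ULA with omni-directional microphones may be expressed as:
$$\mathbf{a}(\omega,\theta) = \begin{bmatrix} 1 & e^{-j\omega\delta\cos\theta/c} & \cdots & e^{-j\omega(M-1)\delta\cos\theta/c} \end{bmatrix}^T. \tag{2}$$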
By combining the beampattern of a directional microphone, u(p, θ)=p+(1−p)cos θ (i.e., with the steering direction α=0), with the steering vector (2) of a conventional ULA with omni-directional microphones, the steering vector d(ω, θ) may be expressed as:
$$\mathbf{d}(\omega,\theta) = u(p,\theta)\,\mathbf{a}(\omega,\theta). \tag{3}$$
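Equations (1) to (3) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the speed of sound of 343 m/s is an assumed value that is not specified in the description.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature

def omni_steering_vector(freq_hz, theta, num_mics, spacing_m, c=SPEED_OF_SOUND):
    """a(omega, theta) of equation (2) for a ULA of omni-directional elements."""
    omega = 2.0 * np.pi * freq_hz
    delays = np.arange(num_mics) * spacing_m * np.cos(theta) / c
    return np.exp(-1j * omega * delays)

def directional_steering_vector(freq_hz, theta, num_mics, spacing_m, p,
                                c=SPEED_OF_SOUND):
    """d(omega, theta) = u(p, theta) * a(omega, theta), equations (1) and (3)."""
    u = p + (1.0 - p) * np.cos(theta)
    return u * omni_steering_vector(freq_hz, theta, num_mics, spacing_m, c)

# Cardioid elements (p = 0.5), M = 6, delta = 1 cm, evaluated at 1 kHz.
d_front = directional_steering_vector(1000.0, 0.0, num_mics=6, spacing_m=0.01, p=0.5)
d_rear = directional_steering_vector(1000.0, np.pi, num_mics=6, spacing_m=0.01, p=0.5)
```

In the look direction θ=0 the element gain is u=1, so d and a coincide; for cardioid elements the rear steering vector vanishes entirely because u(0.5, π)=0.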
Beamforming may be interpreted as spatial filtering that estimates the signal from the desired look direction and suppresses signals from undesired directions by applying a complex weight vector:
$$\mathbf{h}(\omega) = \begin{bmatrix} H_1(\omega) & H_2(\omega) & \cdots & H_M(\omega) \end{bmatrix}^T. \tag{4}$$
Given the signal model, in the desired look direction θ=0, the beamformer exhibits a distortionless response, i.e., $\mathbf{d}^H(\omega, 0)\,\mathbf{h}(\omega)=1$, where the superscript H is the conjugate-transpose operator. In other directions, the response of the beamformer is attenuated, i.e., $|\mathbf{d}^H(\omega, \theta)\,\mathbf{h}(\omega)|<1$.
The mathematical definitions of three widely used performance measures, i.e., white noise gain (WNG), beampattern, and directivity factor (DF), are provided as follows. WNG shows the ability of a beamformer to suppress spatially uncorrelated noise, and is also the most convenient way to evaluate the sensitivity of a beamformer to some of its imperfections, such as sensor noise, position errors, etc. WNG is defined as: $\mathcal{W}[\mathbf{h}(\omega)] = 1/[\mathbf{h}^H(\omega)\,\mathbf{h}(\omega)]$. A beampattern illustrates the directional sensitivity of a beamformer to a plane wave 108 impinging on the array 102 from the incident angle θ as illustrated in
Directivity factor (DF) is defined as the ratio between the array output power in the desired steering direction and the power averaged over all directions, i.e., DF is computed as
$$\mathrm{DF}[\mathbf{h}(\omega)] = \frac{1}{\dfrac{1}{4\pi}\displaystyle\int_0^{\pi} d\phi \int_0^{2\pi} d\theta\, \sin\phi\, |B(\omega,\phi,\theta)|^2},$$
where |B(ω, ϕ, θ)| is the beampattern in the spherical coordinate system, θ is the azimuth angle, and ϕ is the elevation angle. Directivity index (DI) is defined as $\mathrm{DI}[\mathbf{h}(\omega)] = 10\log_{10}(\mathrm{DF}[\mathbf{h}(\omega)])$.
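The WNG, DF, and DI measures can be sketched numerically as follows. The sketch rests on two assumptions that are not stated in the description: the angular average is normalised by 4π, and the beampattern is rotationally symmetric about the array axis, so that the double integral collapses to DF = 2/∫|B(θ)|² sin θ dθ over θ from 0 to π. Function names are illustrative.

```python
import numpy as np

def white_noise_gain(h):
    """WNG = 1 / (h^H h) for a distortionless beamformer h."""
    return 1.0 / np.real(np.vdot(h, h))

def directivity_factor(pattern_fn, num_points=2001):
    """DF for an axisymmetric pattern B(theta), theta measured from the array axis.

    Assumes the look-direction response is 1 and integrates with the
    trapezoidal rule: DF = 2 / integral_0^pi |B(theta)|^2 sin(theta) dtheta.
    """
    theta = np.linspace(0.0, np.pi, num_points)
    power = np.abs(pattern_fn(theta)) ** 2 * np.sin(theta)
    integral = np.sum(0.5 * (power[1:] + power[:-1]) * np.diff(theta))
    return 2.0 / integral

def directivity_index(df):
    """DI in dB."""
    return 10.0 * np.log10(df)

# Sanity checks on known cases: uniform weights over 4 sensors, a single cardioid element.
wng_uniform = white_noise_gain(np.ones(4) / 4.0)
df_cardioid = directivity_factor(lambda t: 0.5 + 0.5 * np.cos(t))
di_cardioid = directivity_index(df_cardioid)
```

For a single cardioid element the integral evaluates to 2/3 analytically, giving DF = 3, which is a convenient check on the numerical integration.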
To design an Nth-order differential beamformer for a ULA with directional microphones, the problem may be formulated as the system of linear equations shown below:
$$\mathbf{R}(\omega,\boldsymbol{\theta})\,\mathbf{h}(\omega) = \mathbf{c}, \tag{5}$$
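where the constraint matrix R(ω, θ), of size (N+1)×M, is given by:
$$\mathbf{R}(\omega,\boldsymbol{\theta}) = \begin{bmatrix} \mathbf{d}^H(\omega,0) \\ \mathbf{d}^H(\omega,\theta_1) \\ \vdots \\ \mathbf{d}^H(\omega,\theta_N) \end{bmatrix}, \tag{6}$$
where $\mathbf{d}^H(\omega, \theta_n)$, n=1, 2, . . . , N, is the conjugate transpose of the steering vector of length M defined in the equation (1), and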
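$$\boldsymbol{\theta} = \begin{bmatrix} 0 & \theta_1 & \cdots & \theta_N \end{bmatrix}^T, \tag{7}$$
$$\mathbf{c} = \begin{bmatrix} 1 & c_1 & \cdots & c_N \end{bmatrix}^T, \tag{8}$$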
are vectors of size (N+1) containing the design parameters of the beamformer. The bold-face θ denotes the null-position constraint vector defined in the equation (7), in which θ1 . . . θN usually define the desired null directions, and c1 . . . cN are the corresponding responses in these directions, i.e., 0 for a null or a small value if only some attenuation is desired.
Combining the equations (3) and (6) yields:
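$$\mathbf{R}(\omega,\boldsymbol{\theta}) = \mathbf{U}(p,\boldsymbol{\theta})\,\mathbf{A}(\omega,\boldsymbol{\theta}), \tag{9}$$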
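where the steering matrix A(ω, θ) is constructed from the steering vectors a(ω, θ) as shown below:
$$\mathbf{A}(\omega,\boldsymbol{\theta}) = \begin{bmatrix} \mathbf{a}^H(\omega,0) \\ \mathbf{a}^H(\omega,\theta_1) \\ \vdots \\ \mathbf{a}^H(\omega,\theta_N) \end{bmatrix}, \tag{10}$$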
and U(p, θ) is called a microphone response matrix and expressed as a diagonal matrix:
$$\mathbf{U}(p,\boldsymbol{\theta}) = \mathrm{diag}\big(1,\ u(p,\theta_1),\ \ldots,\ u(p,\theta_N)\big). \tag{11}$$
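The factorisation in the equation (9) can be checked numerically. The sketch below assumes that each row of the constraint matrix is the conjugate-transposed steering vector evaluated at one constraint angle, which is what the distortionless constraint implies; the constraint angles, the speed of sound of 343 m/s, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

C_SOUND = 343.0  # m/s, assumed

def steering_matrix(freq_hz, thetas, num_mics, spacing_m):
    """A(omega, theta): row n is a^H(omega, theta_n) for omni-directional elements."""
    phase = 2.0 * np.pi * freq_hz * spacing_m / C_SOUND
    # a has entries exp(-j*phase*m*cos(theta)); the conjugate flips the sign.
    return np.exp(1j * phase * np.outer(np.cos(thetas), np.arange(num_mics)))

def response_matrix(p, thetas):
    """U(p, theta) = diag(u(p, theta_0), ..., u(p, theta_N)), with u(p, 0) = 1."""
    return np.diag(p + (1.0 - p) * np.cos(thetas))

def directional_steering_vector(freq_hz, theta, num_mics, spacing_m, p):
    """d(omega, theta) of equation (1)."""
    phase = 2.0 * np.pi * freq_hz * spacing_m / C_SOUND
    u = p + (1.0 - p) * np.cos(theta)
    return u * np.exp(-1j * phase * np.arange(num_mics) * np.cos(theta))

thetas = np.deg2rad([0.0, 100.0, 140.0])  # hypothetical constraint angles, N = 2
M, delta, p, f = 6, 0.01, 0.5, 1000.0

# R built from the factorisation U A ...
R = response_matrix(p, thetas) @ steering_matrix(f, thetas, M, delta)
# ... and built row by row from the directional steering vectors.
R_rows = np.array([np.conj(directional_steering_vector(f, t, M, delta, p))
                   for t in thetas])
```

Both constructions yield the same (N+1)×M matrix, since u(p, θ) is real and simply scales each row of A.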
To maximize the WNG of the array 102 and solve the linear system equations of (5), a minimum-norm solution may be utilized to obtain an LDDMA beamformer as:
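$$\mathbf{h}(\omega) = \mathbf{R}^H(\omega,\boldsymbol{\theta})\left[\mathbf{R}(\omega,\boldsymbol{\theta})\,\mathbf{R}^H(\omega,\boldsymbol{\theta})\right]^{-1}\mathbf{c}, \tag{12}$$
where the LDDMA beamformer with the minimum-norm solution has the same form as that of the LDMA.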
The difference is reflected in R(ω, θ) which consists of the conventional far-field steering vectors for omnidirectional microphones and the proposed directional microphone response vectors, as shown in the equation (9). Combining the equations (9) and (12), the LDDMA beamformer may be reformulated as:
$$\mathbf{h}(\omega) = \mathbf{A}^H(\omega,\boldsymbol{\theta})\,\mathbf{U}^H(p,\boldsymbol{\theta})\left[\mathbf{U}(p,\boldsymbol{\theta})\,\mathbf{A}(\omega,\boldsymbol{\theta})\,\mathbf{A}^H(\omega,\boldsymbol{\theta})\,\mathbf{U}^H(p,\boldsymbol{\theta})\right]^{-1}\mathbf{c}. \tag{13}$$
This equation shows the relationship between the solutions of a conventional LDMA and the proposed LDDMA, which extends the LDMA by introducing another degree of freedom, U(p, θ). In other words, the LDMA is the special case of the LDDMA in which p=1 for all microphones in the equation (11), so that the microphone response matrix U(p, θ) reduces to an identity matrix; i.e., the LDDMA may be used as a more general framework for designing an LDMA.
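The minimum-norm solution and its omni-directional special case can be sketched as follows. The constraint angles, the speed of sound of 343 m/s, and the function names are assumptions made for illustration; the angles are deliberately chosen away from the element nulls, because a constraint angle that coincides with an element null (for example θ=π for p=0.5) produces an all-zero row in R, making R Rᴴ singular so that the explicit inverse no longer exists.

```python
import numpy as np

C_SOUND = 343.0  # m/s, assumed

def factors(freq_hz, thetas, num_mics, spacing_m, p):
    """Return (U, A): the response matrix and the steering matrix at the constraint angles."""
    thetas = np.asarray(thetas, dtype=float)
    phase = 2.0 * np.pi * freq_hz * spacing_m / C_SOUND
    A = np.exp(1j * phase * np.outer(np.cos(thetas), np.arange(num_mics)))
    U = np.diag(p + (1.0 - p) * np.cos(thetas))
    return U, A

def min_norm_beamformer(R, c_vec):
    """h = R^H (R R^H)^{-1} c, computed with a linear solve instead of an explicit inverse."""
    return R.conj().T @ np.linalg.solve(R @ R.conj().T, c_vec)

def wng_db(h):
    """White noise gain in dB, 10*log10(1 / (h^H h))."""
    return -10.0 * np.log10(np.real(np.vdot(h, h)))

thetas = np.deg2rad([0.0, 100.0, 140.0])  # hypothetical constraint angles, N = 2
c_vec = np.array([1.0, 0.0, 0.0])

results = {}
for p in (1.0, 0.5):  # p = 1 reproduces the conventional LDMA
    U, A = factors(1000.0, thetas, num_mics=6, spacing_m=0.01, p=p)
    h = min_norm_beamformer(U @ A, c_vec)
    results[p] = {"U": U, "A": A, "h": h, "wng_db": wng_db(h)}
```

For p=1 the matrix U is the identity, so the same routine yields the LDMA weights; comparing `results[1.0]["wng_db"]` with `results[0.5]["wng_db"]` gives the WNG of the two designs for this particular, hypothetical configuration only.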
To evaluate the effects of different types of directional microphones, i.e., p, on the performance of an LDDMA beamformer, three types of commonly used microphone elements, omnidirectional (p=1), cardioid (p=0.5), and dipole (p=0), are used to form a ULA with the array configuration of δ=1 cm and M=6. The comparison of their beampatterns at frequencies of 1 kHz, 3 kHz and 6 kHz for two designs, i.e., a second-order cardioid with
and c=[1 0 0]T and a third-order pattern with
and c=[1 0 0 0]T are illustrated.
Thus, the WNG and DI for the 3rd-order design behave similarly to those for the 2nd-order design; that is, given the same constraints, directional microphones are better suited than omni-directional microphones, in terms of WNG and DI performance, for constructing an LDMA.
At block 802, a steering vector d(ω, θ) for the proposed apparatus, an LDDMA, may be generated. That is, desired parameters of the LDDMA, including δ, p, θ, N, and M, may be preselected for generating the steering vector d(ω, θ). At block 804, a constraint matrix R(ω, θ) may be generated based on the steering vector d(ω, θ). The constraint matrix R(ω, θ) may be reformulated, as shown in the equation (9), based on a steering matrix and a microphone response matrix, given by the equations (10) and (11), respectively, and is a matrix of size (N+1)×M, where N is the order of the differential beamforming for the ULA and M is the number of microphones. The microphone response matrix may be derived based on the beampattern of a directional microphone with a sound incident angle θ, a steering direction α, and a property p of the directional microphone, as described above. For example, p=1 indicates omni-directional microphones, p=0.5 indicates cardioid microphones, and p=0 indicates dipole microphones. Although omni-directional, cardioid, and dipole microphones are described, the directional microphones may be any type of directional microphones.
Based on a minimum-norm solution, such as the equation (12), which maximizes the white noise gain (WNG), an LDDMA beamformer, such as h(ω) of the equation (13), may be obtained at block 806. As can be seen, the beamformer h(ω) is a vector of frequency-dependent, complex-valued weights.
At block 808, the LDDMA beamformer for a desired direction at a desired frequency may be calculated and stored in memory, and time-domain, frame-by-frame sensor signals may be obtained through the LDDMA at block 810. At block 812, all of the time-domain sensor signals may be transformed into frequency-domain sensor values. For each frame, the real-valued time-domain signal becomes complex-valued in the frequency domain. The transformation may be a short-time Fourier transform (STFT), a filter bank, a wavelet transform, or the like. In the frequency domain, the complex-valued LDDMA beamformer weights may be loaded in vector form (the LDDMA beamformer vector), and a dot product of the frequency-domain sensor values and the LDDMA beamformer vector may be obtained at block 814. The result of the dot product is a single complex value in the frequency domain, which may be transformed back into a real-valued time-domain signal by the corresponding inverse transform.
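Because the weights are frequency dependent, one possible realisation is to precompute one weight vector per transform bin. The sketch below is an assumption-laden illustration rather than the described method: the sample rate, frame length, constraint angles, and speed of sound are hypothetical, `np.linalg.pinv` is swapped in for the explicit inverse of the equation (12) (the two agree whenever R has full row rank), and the DC bin is left at zero because every entry of the steering matrix equals 1 at ω=0, so R has rank one there.

```python
import numpy as np

C_SOUND = 343.0  # m/s, assumed

def constraint_matrix(freq_hz, thetas, num_mics, spacing_m, p):
    """R = U A, with rows u(p, theta_n) * a^H(omega, theta_n)."""
    thetas = np.asarray(thetas, dtype=float)
    phase = 2.0 * np.pi * freq_hz * spacing_m / C_SOUND
    A = np.exp(1j * phase * np.outer(np.cos(thetas), np.arange(num_mics)))
    return (p + (1.0 - p) * np.cos(thetas))[:, None] * A

def weight_table(sample_rate, frame_len, thetas, c_vec, num_mics, spacing_m, p):
    """Return (bin frequencies, weights) with one weight vector per rfft bin."""
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    table = np.zeros((freqs.size, num_mics), dtype=complex)
    for k, f in enumerate(freqs):
        if f == 0.0:
            continue  # rank-deficient at DC; zero weights are an assumption
        R = constraint_matrix(f, thetas, num_mics, spacing_m, p)
        table[k] = np.linalg.pinv(R) @ c_vec  # minimum-norm solution of R h = c
    return freqs, table

thetas = np.deg2rad([0.0, 100.0, 140.0])  # hypothetical constraint angles
c_vec = np.array([1.0, 0.0, 0.0])
freqs, table = weight_table(16000, 64, thetas, c_vec,
                            num_mics=6, spacing_m=0.01, p=0.5)
```

The resulting `table` has one row per bin and can be stored once and reused for every frame.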
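The frame-by-frame processing of blocks 810 to 814 can be sketched with a simple STFT. The following is one possible realisation under stated assumptions: a square-root Hann window with 50% overlap is chosen for analysis and synthesis, the per-bin output is formed as hᴴ(ω)y(ω) to be consistent with the distortionless constraint, and the function name, frame length, and pass-through weights used in the example are hypothetical.

```python
import numpy as np

def apply_beamformer(signals, weights, frame_len=256):
    """Filter-and-sum in the STFT domain.

    signals: real array of shape (num_mics, num_samples).
    weights: complex array of shape (frame_len // 2 + 1, num_mics), one
             beamformer vector h per rfft bin.
    Returns the real single-channel output of length num_samples.
    """
    signals = np.asarray(signals, dtype=float)
    num_mics, num_samples = signals.shape
    if weights.shape != (frame_len // 2 + 1, num_mics):
        raise ValueError("weights must have shape (frame_len // 2 + 1, num_mics)")
    hop = frame_len // 2
    n = np.arange(frame_len)
    # Square-root periodic Hann: analysis * synthesis windows overlap-add to 1.
    window = np.sqrt(0.5 - 0.5 * np.cos(2.0 * np.pi * n / frame_len))
    output = np.zeros(num_samples)
    for start in range(0, num_samples - frame_len + 1, hop):
        frames = signals[:, start:start + frame_len] * window
        spectra = np.fft.rfft(frames, axis=1)             # shape (num_mics, bins)
        z = np.sum(np.conj(weights).T * spectra, axis=0)  # h^H y for every bin
        output[start:start + frame_len] += np.fft.irfft(z, n=frame_len) * window
    return output

# Example: identical signals on 4 channels and uniform weights of 1/4 per bin,
# so the output should reproduce the input away from the signal edges.
rng = np.random.default_rng(0)
source = rng.standard_normal(2048)
mics = np.tile(source, (4, 1))
passthrough = np.full((129, 4), 0.25, dtype=complex)
enhanced = apply_beamformer(mics, passthrough)
```

In practice the `weights` argument would hold the stored LDDMA beamformer vectors, one per bin; the first and last half frame are not fully reconstructed because only one window covers them.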
As discussed above, the effects of different types of directional microphones to form a ULA, for example, omnidirectional (p=1), cardioid (p=0.5), and dipole (p=0), on the performance of the LDDMA beamformer, may be used with different array configurations having various inter-element spacing δ and number of elements M at different frequencies for different order patterns, to evaluate beampatterns as illustrated in
Some or all operations of the methods described above can be performed by execution of computer-readable instructions stored on a computer-readable storage medium, as defined below. The term “computer-readable instructions,” as used in the description and claims, includes routines, applications, application modules, program modules, programs, components, data structures, algorithms, and the like. Computer-readable instructions can be implemented on various system configurations, including single-processor or multiprocessor systems, minicomputers, mainframe computers, personal computers, hand-held computing devices, microprocessor-based programmable consumer electronics, combinations thereof, and the like.
The computer-readable storage media may include volatile memory (such as random-access memory (RAM)) and/or non-volatile memory (such as read-only memory (ROM), flash memory, etc.). The computer-readable storage media may also include additional removable storage and/or non-removable storage including, but not limited to, flash memory, magnetic storage, optical storage, and/or tape storage that may provide non-volatile storage of computer-readable instructions, data structures, program modules, and the like.
A non-transient computer-readable storage medium is an example of computer-readable media. Computer-readable media includes at least two types of computer-readable media, namely computer-readable storage media and communications media. Computer-readable storage media includes volatile and non-volatile, removable and non-removable media implemented in any process or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer-readable storage media includes, but is not limited to, phase change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disk read-only memory (CD-ROM), digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information for access by a computing device. In contrast, communication media may embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or other transmission mechanism. As defined herein, computer-readable storage media do not include communication media.
The computer-readable instructions stored on one or more non-transitory computer-readable storage media that, when executed by one or more processors, may perform operations described above with reference to
A. A method for constructing a linear array (LA) of microphones comprising: generating a steering vector for the LA having preselected parameters; generating a constraint matrix based on the steering vector; reformulating the constraint matrix based on a microphone response matrix and a steering matrix; obtaining a beamformer by applying a minimum norm solution in terms of the constraint matrix; verifying a desired characteristic of the LA by calculating the beamformer for a desired direction; and constructing the LA based on the preselected parameters and the beamformer.
B. The method as paragraph A recites, wherein the microphones of the LA are directional microphones and the LA is a linear differential directional microphone array (LDDMA).
C. The method as paragraph B recites, wherein the LDDMA is one of a uniform LDDMA or a non-uniform LDDMA.
D. The method as paragraph A recites, wherein the constraint matrix is a matrix of a size (N+1)×M, where N is an order of differential beam forming for the LA and M is a number of microphones.
E. The method as paragraph A recites, wherein the microphone response matrix is derived based on a beampattern of a directional microphone with a sound incident angle, a steering direction, and property of the directional microphone.
F. The method as paragraph E recites, wherein the property of the directional microphone includes omni-directional, cardioid, and dipole.
G. The method as paragraph A recites, wherein obtaining the beamformer by applying the minimum norm solution in terms of the constraint matrix includes maximizing a white noise gain (WNG).
H. The method as paragraph A recites, wherein calculating the beamformer for the desired direction includes calculating the beamformer for the desired direction at a desired frequency.
I. The method as paragraph H recites, wherein calculating the beamformer for the desired direction is based on time domain frame-by-frame sensor signals received through the LA.
J. The method as paragraph I recites, further comprising: transforming all of the time domain frame-by-frame sensor signals into frequency domain sensor values.
K. The method as paragraph J recites, further comprising: calculating a dot product of the frequency domain sensor values and a beamformer vector associated with complex value weights of the beamformer.
L. The method as paragraph K recites, wherein constructing the LA based on the preselected parameters and the beamformer includes constructing the LA based on the dot product.
M. A linear array (LA) comprising: a desired number of microphones linearly disposed and spaced with desired inter-microphone distances, the desired number of microphones and the desired inter-microphone distances verified by: generating a steering vector for the LA having preselected parameters; generating a constraint matrix based on the steering vector; reformulating the constraint matrix based on a microphone response matrix and a steering matrix; obtaining a beamformer by applying a minimum norm solution in terms of the constraint matrix; verifying a desired characteristic of the LA by calculating the beamformer for a desired direction; and constructing the LA based on the preselected parameters and the beamformer.
N. The LA as paragraph M recites, wherein the microphones of the LA are directional microphones and the LA is a linear differential directional microphone array (LDDMA).
O. The LA as paragraph N recites, wherein the LDDMA is one of a uniform LDDMA or a non-uniform LDDMA.
P. The LA as paragraph M recites, wherein the constraint matrix is a matrix of a size (N+1)×M, where N is an order of differential beam forming for the LA and M is a number of microphones.
Q. The LA as paragraph M recites, wherein the microphone response matrix is derived based on a beampattern of a directional microphone with a sound incident angle, a steering direction, and property of the directional microphone.
R. The LA as paragraph Q recites, wherein the property of the directional microphone includes omni-directional, cardioid, and dipole.
S. The LA as paragraph M recites, wherein obtaining the beamformer by applying the minimum norm solution in terms of the constraint matrix includes maximizing a white noise gain (WNG).
T. The LA as paragraph M recites, wherein calculating the beamformer for the desired direction includes calculating the beamformer for the desired direction at a desired frequency.
U. The LA as paragraph T recites, wherein calculating the beamformer for the desired direction is based on time domain frame-by-frame sensor signals received through the LA.
V. The LA as paragraph U recites, further comprising: transforming all of the time domain frame-by-frame sensor signals into frequency domain sensor values.
W. The LA as paragraph V recites, further comprising: calculating a dot product of the frequency domain sensor values and a beamformer vector associated with complex value weights of the beamformer.
X. The LA as paragraph W recites, wherein constructing the LA based on the preselected parameters and the beamformer includes constructing the LA based on the dot product.
Y. A computer-readable storage medium storing computer-readable instructions executable by one or more processors, that when executed by the one or more processors, cause the one or more processors to perform operations comprising: generating a steering vector for the LA having preselected parameters; generating a constraint matrix based on the steering vector; reformulating the constraint matrix based on a microphone response matrix and a steering matrix; obtaining a beamformer by applying a minimum norm solution in terms of the constraint matrix; verifying a desired characteristic of the LA by calculating the beamformer for a desired direction; and constructing the LA based on the preselected parameters and the beamformer.
Z. The computer-readable storage medium as paragraph Y recites, wherein the microphones of the LA are directional microphones and the LA is a linear differential directional microphone array (LDDMA).
AA. The computer-readable storage medium as paragraph Z recites, wherein the LDDMA is one of a uniform LDDMA or a non-uniform LDDMA.
AB. The computer-readable storage medium as paragraph Y recites, wherein the constraint matrix is a matrix of a size (N+1)×M, where N is an order of differential beam forming for the LA and M is a number of microphones.
AC. The computer-readable storage medium as paragraph Y recites, wherein the microphone response matrix is derived based on a beampattern of a directional microphone with a sound incident angle, a steering direction, and property of the directional microphone.
AD. The computer-readable storage medium as paragraph AC recites, wherein the property of the directional microphone includes omni-directional, cardioid, and dipole.
AE. The computer-readable storage medium as paragraph Y recites, wherein obtaining the beamformer by applying the minimum norm solution in terms of the constraint matrix includes maximizing a white noise gain (WNG).
AF. The computer-readable storage medium as paragraph Y recites, wherein calculating the beamformer for the desired direction includes calculating the beamformer for the desired direction at a desired frequency.
AG. The computer-readable storage medium as paragraph AF recites, wherein calculating the beamformer for the desired direction is based on time domain frame-by-frame sensor signals received through the LA.
AH. The computer-readable storage medium as paragraph AG recites, further comprising: transforming all of the time domain frame-by-frame sensor signals into frequency domain sensor values.
AI. The computer-readable storage medium as paragraph AH recites, further comprising: calculating a dot product of the frequency domain sensor values and a beamformer vector associated with complex value weights of the beamformer.
AJ. The computer-readable storage medium as paragraph AI recites, wherein constructing the LA based on the preselected parameters and the beamformer includes constructing the LA based on the dot product.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as exemplary forms of implementing the claims.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2019/117371 | 11/12/2019 | WO |