Embodiments of the present invention relate to a device, to a corresponding method, and to an apparatus for estimating Direction of Arrival (DOA) from Q≥1 sound sources. In particular, the device and method perform a post-processing on a phase difference matrix, which is obtained, for instance, from a sound receiver adapted to receive the sound from the sound sources.
Most multichannel sound source DOA estimation algorithms suffer from spatial aliasing problems. As a consequence of spatial aliasing, interchannel phase differences are wrapped beyond the spatial aliasing frequency. A common solution for addressing this problem is to adjust a distance between microphones or microphone arrays receiving the sound generated by the sound sources, in order to obtain a suitable minimum aliasing frequency, and to then use only the frequency band below that minimum aliasing frequency for localizing the sound sources.
A conventional method for localizing sound sources using microphones is to estimate a Time Difference of Arrival (TDOA, Δt) from each sound source to the microphones. For narrow band localization algorithms, the TDOA can be estimated from the interchannel phase differences μi in each frequency band. The relationship between these phase differences and the TDOA is
μi=2πfiΔt (1)
where fi denotes the frequency of the narrowband. Under a far-field assumption, the DOA can further be estimated from the TDOA. The relationship between the phase differences μi and the DOA, as expressed by an angle θ of the sound source to the microphones, is shown in the below equation (2). In this equation, c denotes the speed of sound in the recording environment, and Δd denotes a distance between the microphones.
μi=2πfiΔd sin(θ)/c (2)
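By way of illustration only, the far-field relation above can be evaluated numerically. The following minimal sketch assumes the standard far-field model and a nominal speed of sound of 343 m/s; the function and variable names are illustrative and not part of the description above.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s

def narrowband_doa(phase_diff, freq, mic_distance, c=C):
    """Estimate the DOA angle (in radians) from an unwrapped narrowband
    interchannel phase difference, using mu = 2*pi*f*dd*sin(theta)/c."""
    sin_theta = phase_diff * c / (2.0 * np.pi * freq * mic_distance)
    return np.arcsin(sin_theta)

# Example: a source at 30 degrees, f = 500 Hz, 10 cm microphone spacing
theta_true = np.deg2rad(30.0)
mu = 2.0 * np.pi * 500.0 * 0.1 * np.sin(theta_true) / C
print(np.rad2deg(narrowband_doa(mu, 500.0, 0.1)))  # ~30.0
```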
Conventional algorithms estimate narrowband DOA by estimating a phase difference in each frequency band. However, when |μi|>π, the estimated phase difference will be wrapped into [−π, π], which can be seen in
The frequency at the boundary of the spatial aliasing problem is called the aliasing frequency fa. From the below equation (3), it can be seen that fa is related to the angle θ, which is unfortunately unknown, so that the wrapped phase difference matrix cannot directly be unwrapped for frequencies above fa.
The minimum aliasing frequency for a certain scenario, i.e. the lowest frequency at which spatial aliasing can occur for any angle θ, is defined as fa0.
Conventionally, when a sound source is broad in frequency band, a Discrete Fourier Transform (DFT) is applied. Then, the narrowband localization algorithm is repeated for each frequency. Thus, a “raw” phase difference vector is obtained for a single sound source scenario, and a “raw” phase difference matrix is obtained for a multisource scenario, which is defined as μ0. This phase difference matrix includes correct phase difference values only at fi≤fa0.
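By way of illustration only, such a raw wrapped phase difference matrix μ0 could be computed from a two-channel recording with an STFT front end as sketched below; the window, hop size, sampling rate, and all names are assumptions made for this sketch.

```python
import numpy as np

def raw_phase_differences(x1, x2, n_fft=1024, hop=512, fs=16000):
    """Wrapped interchannel phase differences per frequency bin and frame.

    Returns an (n_bins, n_frames) matrix with values in [-pi, pi] and the
    centre frequency of each bin.
    """
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x1) - n_fft) // hop
    mu0 = np.empty((n_fft // 2 + 1, n_frames))
    for t in range(n_frames):
        seg1 = x1[t * hop:t * hop + n_fft] * win
        seg2 = x2[t * hop:t * hop + n_fft] * win
        X1, X2 = np.fft.rfft(seg1), np.fft.rfft(seg2)
        # np.angle of the cross spectrum is already wrapped into [-pi, pi]
        mu0[:, t] = np.angle(X1 * np.conj(X2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return mu0, freqs
```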
Therefore, conventionally only the lower frequency bands (fi≤fa0) of the sound are taken into account for the localization, in order to avoid the spatial aliasing problem. This is a significant disadvantage of the conventional algorithms.
In view of the above-mentioned problems and disadvantages, embodiments of the present invention aim at improving devices and methods that operate based on conventional localization algorithms, that is, for estimating DOA. Embodiments of the invention have the object of also utilizing the higher frequency bands fi>fa0 for the localization.
An object of embodiments of the invention is achieved by the solution provided in the description which follows.
In particular, embodiments of the invention use a replication of phase difference values in the phase difference matrix μ0, in order to reduce the impact of the spatial aliasing problem for single- as well as multi-sound-source localization, and apply further post-processing that can make the technique more robust, e.g., in noisy scenarios. In particular, embodiments of the invention reconstruct the phase difference matrix μ0 by exploiting certain relationships between its phase difference values, thereby also utilizing the higher frequency bands fi>fa0.
A first aspect of embodiments of the invention provides a device for estimating DOA of sound from Q≥1 sound sources, the device being configured to obtain a phase difference matrix, which includes measured phase difference values, each of the measured phase difference values being a measured value of a phase difference between two microphone units for a frequency bin in a range of frequencies of the sound, generate a replicated phase difference matrix by replicating the measured phase difference values to other potential sinusoidal periods, calculate a DOA value for each phase difference value in the replicated phase difference matrix, and determine, as Q DOA results, the Q most prominent peak values in a histogram generated based on the calculated DOA values.
Generating the replicated phase difference matrix enables a localization based on not only low frequencies but also high frequencies of wide-band sound sources, for greater robustness and accuracy, i.e. an improved estimation of the DOA. Specifically, the higher frequency bands fi>fa0 can also be exploited for the localization.
In addition, selecting the DOA results based on most prominent peaks in a histogram is both efficient and accurate, and allows further post-processing steps that additionally improve the DOA estimation.
A frequency bin may be a subrange of the range of frequencies of the sound, or may be a single frequency in the range of frequencies of the sound.
In an implementation form of the first aspect, the device is configured to generate the replicated phase difference matrix by replicating the measured phase difference values based on the minimum aliasing frequency defined by
fa0=c/(2Δd)
wherein Δd denotes a distance between the two microphone units and c is the speed of the sound.
Thereby, all of the potential aliasing frequencies are covered for the replication. Below the minimum aliasing frequency fa0, phase differences cannot be wrapped, only above fa0.
In a further implementation form of the first aspect, the measured phase difference values in the phase difference matrix are wrapped into [−π, π], and the device is configured to generate the replicated phase difference matrix according to
wherein μ0 denotes the phase difference matrix, μ denotes the replicated phase difference matrix, i is a frequency bin index corresponding to frequency fi, j is a replication index, and ┌*┐ denotes the ceiling function.
In this manner, a replicated phase difference matrix can be generated efficiently, which covers all of the potentially correct sinusoidal periods of interchannel phase differences.
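The exact replication rule of this implementation form is given by the equation referenced above. Purely as an illustrative sketch, one plausible way to enumerate candidate phase values in all potential sinusoidal periods is to add every multiple of 2π that still corresponds to a physically possible DOA, i.e. |sin θ| ≤ 1; the enumeration rule and names below are assumptions of this sketch, not the equation of the implementation form.

```python
import numpy as np

def replicate_phase(mu0_i, f_i, mic_distance, c=343.0):
    """Return all candidate phase values for one wrapped measurement mu0_i
    at frequency f_i.

    A candidate mu0_i + 2*pi*j is kept only if |mu| <= 2*pi*f_i*dd/c, i.e.
    only if it can correspond to a real DOA with |sin(theta)| <= 1.
    """
    mu_max = 2.0 * np.pi * f_i * mic_distance / c
    j_max = int(np.ceil(mu_max / (2.0 * np.pi)))  # periods worth testing
    candidates = [mu0_i + 2.0 * np.pi * j for j in range(-j_max, j_max + 1)]
    return [mu for mu in candidates if abs(mu) <= mu_max]

# Below the minimum aliasing frequency fa0 = c / (2 * dd) only the measured
# value itself survives; above fa0, additional candidate periods appear.
print(replicate_phase(0.4, 500.0, 0.1))   # one candidate (500 Hz < fa0)
print(replicate_phase(0.4, 6000.0, 0.1))  # several candidates (6 kHz > fa0)
```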
In a further implementation form of the first aspect, the device is configured to calculate the DOA values based on the formula
θ(i,j)=arcsin(μ(i,j)·c/(2πfiΔd))
wherein θ(i,j) denotes the DOA value for frequency bin index i and replication index j, μ denotes the replicated phase difference matrix, c is the speed of the sound, and Δd denotes a distance between the two microphone units.
As explained above, the replicated phase difference matrix contains for each frequency bin a set of one or more candidate values of the correct phase difference for that frequency bin. Transforming each element of the replicated phase difference matrix into a DOA value generates a DOA matrix which contains for each frequency bin a set of one or more candidate values of the correct DOA for that frequency bin, including the actual correct DOA value.
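A minimal sketch of this transformation, assuming the arcsine relation given above and a replicated matrix stored as one candidate list per frequency bin (all names are illustrative), could look as follows.

```python
import numpy as np

def doa_values_from_replicated(mu_repl, freqs, mic_distance, c=343.0):
    """Flatten a replicated phase difference structure into DOA values.

    mu_repl: list with one entry per frequency bin, each entry holding the
    candidate phase values of that bin. Candidates whose arcsine argument
    lies outside [-1, 1] would yield a complex DOA and are skipped.
    """
    doas = []
    for f_i, candidates in zip(freqs, mu_repl):
        if f_i <= 0.0:
            continue  # skip the DC bin
        for mu in candidates:
            s = mu * c / (2.0 * np.pi * f_i * mic_distance)
            if abs(s) <= 1.0:
                doas.append(np.degrees(np.arcsin(s)))
    return np.asarray(doas)
```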
In a further implementation form of the first aspect, the device is configured to generate a first histogram from the calculated DOA values, and determine, as the Q DOA results, the Q most prominent peak values in the first histogram.
This implementation form is particularly advantageous if the sound sources provide broadband signals and/or if the scenario is clean, i.e. if noise in the scenario is low. The selection of the peaks from the first histogram is a fast and simple way to obtain the DOA results, and leads to more robust and accurate results than using only low frequencies.
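As an illustrative sketch of this simple variant, the histogram and the prominence-based peak selection could be realised as below; the 1° bin width and the use of SciPy's peak prominences are assumptions of the sketch.

```python
import numpy as np
from scipy.signal import find_peaks

def most_prominent_doas(doa_values, q_sources, bin_width_deg=1.0):
    """Histogram the DOA values and return the q_sources most prominent
    peak positions (in degrees)."""
    edges = np.arange(-90.0, 90.0 + bin_width_deg, bin_width_deg)
    hist, _ = np.histogram(doa_values, bins=edges)
    centres = 0.5 * (edges[:-1] + edges[1:])
    peaks, props = find_peaks(hist, prominence=0)  # compute all prominences
    order = np.argsort(props["prominences"])[::-1]
    return centres[peaks[order][:q_sources]]
```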
In a further implementation form of the first aspect, the device is configured to generate a first histogram from the calculated DOA values, select, as Q+q DOA candidates, the Q+q most prominent peak values in the first histogram, wherein preferably q=2, generate a second histogram based on the selected Q+q DOA candidates, and determine, as the Q DOA results, the Q most prominent peak values in the second histogram.
This implementation form is particularly advantageous if the scenario is noisy and/or if some of the sound sources are weak. In this case, these weak sound sources may contribute peaks to the first histogram which are likely to be less prominent than peaks resulting from spatial aliasing. Accordingly, the selection of q additional peaks, which are taken as candidates from the first histogram, makes the DOA estimation even more robust and accurate.
In a further implementation form of the first aspect, the device is configured to remove complex-valued calculated DOA values before generating the first histogram.
Thereby, the DOA estimation becomes computationally simpler while retaining a high accuracy. DOA values are complex-valued only if the interchannel phase differences are in the wrong sinusoidal periods.
In a further implementation form of the first aspect, for generating the second histogram, the device is configured to determine, for each selected DOA candidate, its related DOA values from the calculated DOA values, generate third histograms from each selected DOA candidate and its related DOA values, and generate the second histogram by merging the third histograms of all selected DOA candidates.
By selecting the related DOA values for each candidate, and analyzing the third histograms individually, the interference between sources is reduced. Therefore the accuracy of the DOA estimation is further improved.
In a further implementation form of the first aspect, the device is configured to merge the third histograms of all selected DOA candidates to generate the second histogram by, for each histogram index, using the maximum value from all the third histograms as the value of the second histogram for that histogram index.
After merging, the correct peaks are clearer in the second histogram than in the first histogram. This implementation of the merging of the histograms therefore leads to the desired result of an accurate and robust DOA estimation. Using the mean for merging instead of the maximum would lead to error accumulation across the different histograms. The merging rule based on the maximum does not have this problem.
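This merging rule is straightforward to express; the following helper is a direct sketch of the element-wise maximum over the third histograms (the name is illustrative).

```python
import numpy as np

def merge_histograms_max(third_histograms):
    """Merge the per-candidate (third) histograms into the second histogram
    by taking, for each histogram index, the maximum over all of them.
    Using the mean instead would accumulate errors across the histograms."""
    return np.max(np.stack(third_histograms, axis=0), axis=0)
```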
In a further implementation form of the first aspect, the device is configured to determine the related DOA values of a DOA candidate by determining, as its related phase difference values, the phase difference values in the replicated phase difference matrix that are in supposed correct sinusoidal periods, and calculating its related DOA values from its related phase difference values.
Thereby, values corresponding to incorrect sinusoidal periods are removed. A supposed correct sinusoidal period is the sinusoidal period that would result from unwrapping based on the aliasing frequency determined from the DOA of the candidate peak. With this determination, the height of the peak is conserved in the third histogram if the peak is correct.
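The precise selection rule belongs to the implementation form itself; purely as an illustrative sketch, one plausible reading is that, for a candidate DOA pk, each frequency bin contributes the replicated phase value lying in the sinusoidal period predicted by pk, which is then converted back into a DOA value. All names below are assumptions of the sketch.

```python
import numpy as np

def related_doa_values(p_k_deg, mu_repl, freqs, mic_distance, c=343.0):
    """Sketch of selecting the related DOA values of one DOA candidate.

    For candidate p_k, keep in each frequency bin only the replicated phase
    value closest to the phase predicted by p_k (its supposed correct
    sinusoidal period) and convert that value to a DOA."""
    sin_pk = np.sin(np.radians(p_k_deg))
    related = []
    for f_i, candidates in zip(freqs, mu_repl):
        if f_i <= 0.0 or not candidates:
            continue
        predicted = 2.0 * np.pi * f_i * mic_distance * sin_pk / c
        mu = min(candidates, key=lambda m: abs(m - predicted))
        s = mu * c / (2.0 * np.pi * f_i * mic_distance)
        if abs(s) <= 1.0:
            related.append(np.degrees(np.arcsin(s)))
    return np.asarray(related)
```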
In a further implementation form of the first aspect, the device is configured to apply a soft mask to the peak values in each of the third histograms, before merging the third histograms into the second histogram, wherein the soft mask is designed as a peak filter with a smaller width at a DOA of 0° and larger widths at DOAs of ±90°.
The soft masking of the peak values improves the accuracy of the peaks selected from the third histogram as DOA results. Theoretically, the widths of the aliasing peaks are large, whereas the widths of the correct peaks are narrow at 0° and increase as the peaks get closer to ±90°. Therefore, using the soft mask in this way can help to detect the correct peaks more reliably.
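The exact mask shape is defined by the implementation form; as a rough illustrative stand-in, a Gaussian peak filter whose width grows towards ±90° could be used. The base and maximum widths below are arbitrary choices of this sketch.

```python
import numpy as np

def soft_mask(p_k_deg, bin_centres_deg, base_width_deg=2.0, max_width_deg=10.0):
    """Illustrative soft mask: a Gaussian peak filter centred on the DOA
    candidate p_k, narrow near 0 degrees and wider towards +/-90 degrees."""
    width = base_width_deg + (max_width_deg - base_width_deg) * abs(
        np.sin(np.radians(p_k_deg)))
    return np.exp(-0.5 * ((np.asarray(bin_centres_deg) - p_k_deg) / width) ** 2)
```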
In a further implementation form of the first aspect, the device is configured to apply a low-pass filter to the second histogram, before determining the Q DOA results, preferably a Gaussian filter with a standard deviation σ according to
wherein fs denotes the sampling rate.
By use of such a Gaussian filter, the height of wide and narrow peaks can be balanced, leading to better estimation results. This filter can help to sharpen the wide correct peaks closer to ±90°, and flatten the narrow and sharp peaks around 0°.
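The filtering step itself can be sketched as follows, with the standard deviation σ (here expressed in histogram bins) supplied from the formula of this implementation form; the helper name is illustrative.

```python
from scipy.ndimage import gaussian_filter1d

def smooth_histogram(second_histogram, sigma_bins):
    """Apply the optional Gaussian low-pass filter to the second histogram
    before the final peak selection; sigma_bins is the standard deviation
    expressed in histogram bins."""
    return gaussian_filter1d(second_histogram.astype(float), sigma=sigma_bins)
```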
In a further implementation form of the first aspect, each microphone unit includes an array of one or more microphones, and the one or more measured phase difference values of the phase difference matrix have been obtained from measured phase differences between the one or more microphones of one of the microphone units and the one or more microphones of the other one of the microphone units.
A second aspect of embodiments of the invention provides an apparatus for determining DOA of sound from Q≥1 sound sources, the apparatus comprising a device according to the first aspect as such or any of its implementation forms, and a sound receiver including the two microphone units, which is configured to receive the sound, generate the phase difference matrix, and provide the phase difference matrix to the device.
The apparatus of the second aspect achieves all the advantages and effects of the device of the first aspect and its implementation forms, respectively.
A third aspect of embodiments of the invention provides a method of estimating DOA of sound from Q≥1 sound sources, the method comprising obtaining a phase difference matrix, which includes measured phase difference values, each of the measured phase difference values being a measured value of a phase difference between two microphone units for a frequency bin in a range of frequencies of the sound, generating a replicated phase difference matrix by replicating the measured phase difference values to other potential sinusoidal periods, calculating a DOA value for each phase difference value in the replicated phase difference matrix, and determining, as Q DOA results, the Q most prominent peak values in a histogram generated based on the calculated DOA values.
The method of the third aspect can be provided with implementation forms adding further method steps, which correspond to the actions taken by the device according to the implementation forms of the first aspect.
Accordingly, the method of the third aspect achieves all advantages and effects of the device of the first aspect and its implementation forms, respectively.
It will be noted that all devices, elements, units and means described in the present application could be implemented in software or hardware elements or any combination thereof. All steps which are performed by the various entities described in the present application as well as the functionalities described to be performed by the various entities are intended to mean that the respective entity is adapted to or configured to perform the respective steps and functionalities.
Even if, in the following description of specific embodiments, a specific functionality or step to be performed by external entities is not reflected in the description of a specific detailed element of that entity which performs that specific step or functionality, it should be clear for a skilled person that these methods and functionalities can be implemented in respective software or hardware elements, or any kind of combination thereof.
The above described aspects and implementation forms of embodiments of the present invention will be explained in the following description of specific embodiments in relation to the enclosed drawings, in which:
The device 100 of
The device 100 is further configured to generate a replicated phase difference matrix μ by replicating the measured phase difference values in the obtained phase difference matrix μ0 to other potential sinusoidal periods.
Then, the device 100 is configured to calculate a DOA value for each phase difference value in the replicated phase difference matrix μ, i.e. it calculates a DOA matrix θ. Finally, the device 100 is configured to determine, as Q DOA results, the Q most prominent peak values in a histogram generated based on the calculated DOA values θ.
The device 100 is thereby configured to carry out a method according to an embodiment of the invention. As shown in
The position of the device 100 in the sound source localization is shown in
A more detailed overview of a device 100 according to an embodiment of the invention, which builds on the embodiment of the device 100 in
In box 301, the phase difference matrix μ0 is obtained, and the replicated phase difference matrix μ is generated by replicating the measured phase difference values to other potential sinusoidal periods. In box 302, DOA values θ are calculated from the replicated phase difference matrix μ. That is, a DOA value θ is calculated for each phase difference value in the replicated phase difference matrix μ.
In box 303, a DOA histogram h (denoted as first histogram) is generated from the calculated DOA values θ. In a simple implementation form of the device 100, the Q most prominent peak values in the first histogram h may be selected already at this point as Q DOA results. In an implementation form of the device 100, for improved robustness, more peaks in the histogram h are detected at box 304. In particular, here the Q+q most prominent peak values in the first histogram h may be detected as DOA candidates. q is preferably 2.
In box 305, a binary masking may be applied, wherein the binary masking takes as input the Q+q peaks detected at box 304 and the DOA values θ calculated at box 302. Thus, in box 305, in particular the related DOA values θ1, θ2 . . . θi are determined and output. At box 306, further histograms (denoted as third histograms) are produced from each selected DOA candidate and its related DOA values, and are output as h1, h2 . . . hi. At box 307, soft masking is applied to these histograms to output soft-masked histograms H1, H2 . . . Hi. That is, a soft mask is applied to the peak values in each of the third histograms. At box 308, these histograms H1, H2 . . . Hi are then merged into one histogram H (denoted as second histogram). The third histograms are particularly merged to generate the second histogram by, for each histogram index, using the maximum value from all the third histograms as the value of the second histogram for that histogram index (denoted by “maximum”).
At box 309, an optional low-pass filtering is applied to the histogram H; specifically, a Gaussian filter may be applied. Then, the Q most prominent peak values in the second histogram are determined as the Q estimated DOA results, and are output.
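Purely for orientation, the processing chain of boxes 301 to 309 can be summarised by the following sketch, which reuses the illustrative helpers introduced above; the 1° histogram grid and the smoothing width are arbitrary choices of the sketch, and all function names are assumptions.

```python
import numpy as np
from scipy.signal import find_peaks

def estimate_doas(mu0, freqs, mic_distance, n_sources, q_extra=2):
    """Illustrative end-to-end flow following boxes 301 to 309.

    mu0: one wrapped phase difference per frequency bin (e.g. a single time
    frame, or a per-bin average over frames)."""
    # Box 301: replicate the wrapped phase differences to all candidate periods
    mu_repl = [replicate_phase(mu0[i], freqs[i], mic_distance)
               for i in range(len(freqs))]
    # Box 302: transform every candidate phase value into a DOA value
    doas = doa_values_from_replicated(mu_repl, freqs, mic_distance)
    # Boxes 303/304: first histogram and Q+q candidate peaks
    candidates = most_prominent_doas(doas, n_sources + q_extra)
    edges = np.arange(-90.0, 91.0, 1.0)
    centres = 0.5 * (edges[:-1] + edges[1:])
    masked = []
    for p_k in candidates:
        # Boxes 305/306: related DOA values and third histogram per candidate
        rel = related_doa_values(p_k, mu_repl, freqs, mic_distance)
        h_k, _ = np.histogram(rel, bins=edges)
        # Box 307: soft masking
        masked.append(h_k * soft_mask(p_k, centres))
    # Box 308: merge by element-wise maximum
    H = merge_histograms_max(masked)
    # Box 309: optional Gaussian smoothing, then pick the Q most prominent peaks
    H = smooth_histogram(H, sigma_bins=2.0)
    peaks, props = find_peaks(H, prominence=0)
    order = np.argsort(props["prominences"])[::-1]
    return centres[peaks[order][:n_sources]]
```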
The purpose of this step is to obtain a (replicated) phase difference matrix μ in all of the potential sinusoidal periods. Frequency bands below fa0 cannot contain wrapped phase differences, so only the bands above fa0 yield additional candidate periods. The replication may be expressed as
where └*┘ denotes the floor function, and μ is the replicated matrix. μ now contains μ0 in the correct sinusoidal period, but it also contains some erroneous values introduced by this step.
Each phase difference value in the replicated phase difference matrix μ has a single corresponding DOA value θ. μ is transformed into a DOA matrix θ containing these values as
θ(i,j)=arcsin(μ(i,j)·c/(2πfiΔd))
where θ(i,j) denotes the DOA value for frequency bin index i and replication index j, and Δd denotes the distance between the two microphone units 203.
Now, μ̇ may define the phase differences in the correct sinusoidal periods, and the corresponding transformed DOA values may be defined as θ̇. It is known that θ̇ is theoretically constant in clean (low noise) scenarios. This property can be expressed as
By simplifying the above equation (6), the relationship of μ̇ between different frequencies can be determined as
When the phase difference is in a wrong sinusoidal period, μ̈(i)=μ̇(i)+2nπ, (n≠0, n∈ℤ). The resulting wrongly estimated DOA is defined as θ̈(i). θ̈(i) is a complex number when the condition
|μ̈(i)·c/(2πfiΔd)|>1
is met. For this reason, all of the complex values are preferably removed from θ.
By taking the above equation (6) and the mentioned simplifications, the relationship of the θ̈ differences between different frequencies is obtained as
This proves that θ̈ varies monotonically along the frequency axis. Together with the constancy of θ̇, this implies that, when θ is transformed into the histogram h, the amplitudes of the correct peaks are higher than the amplitudes of the peaks resulting from θ̈.
If the sound sources 202 are broadband and the scenario is clean, the DOA results can be estimated as the positions of the Q peaks with the highest prominence. If the scenario is noisy, and/or some of the sound sources 202 are weak, the corresponding peaks may have less prominence than the peaks resulting from θ̈.
To make the estimation carried out by the device 100 even more robust, in such a case, Q′=Q+q peaks may be taken from the histogram h as DOA candidates (practically, q is taken as 2, but it may also be another integer value, like 3 or higher).
This is shown in
To evaluate whether the chosen peaks (DOA candidates) correspond to actual sound sources 202, and not to aliasing peaks, each of the peaks is processed individually. The position of a kth peak is denoted as pk, and from equation (3), the corresponding aliasing frequency can be determined as fak.
With these frequency indexes, binary masks can be applied to select the DOA values of the phases in supposed correct sinusoidal periods for the corresponding peaks from θ. The process of selecting the related DOA values for a peak value may be described as
where θk includes the kth peak and its related DOA values.
θk of each peak is then transformed into a histogram hk. That is, a histogram hk is generated for the kth selected DOA candidate and its related DOA values, as is shown in
A soft mask Mk may now be applied to the histogram hk related to the kth peak, in order to highlight the correct peaks. The mask may be the same or different for each peak.
Theoretically, the width of an aliasing peak is large. In contrast, the width of a correct peak pk is narrow at 0°, and increases when the peak gets closer to ±90°. With this property, the soft mask may be designed as a peak filter with a small width at 0° and a large width at ±90°. A practical soft mask with respect to the kth selected DOA candidate can preferably be designed as
where fnh denotes the considered highest frequency.
The soft masking is preferably applied by the Schur product (∘) according to
Hk=hk∘Mk (12)
The masked histograms from the peak candidates are merged into H by the “maximum” rule according to
H(i)=max(H1(i), . . . , Hk(i), . . . , HQ′(i)) (13)
A low-pass filter is preferably further applied to this histogram H, more preferably a Gaussian filter. Even more preferably, a Gaussian filter is applied with a standard deviation σ equal to the lowest localization resolution of the microphone setup. The reason to set this deviation is to balance the height of the peaks closer to 0° and ±90°. Theoretically, the widths of the aliasing peaks are large, while the widths of the correct peaks are narrow at 0° and increase when the peaks get closer to ±90°. Therefore, using the soft mask in this way can help to detect the correct peaks more reliably. A simplified equation to obtain the lowest resolution is given as
where fs denotes the sampling rate.
Finally, Q peaks are selected by their peak prominence from the (optionally low-pass filtered) histogram H. The positions of these peaks are the DOA results output by the device 100.
As a consequence, the device 100 of embodiments of the invention enhances the robustness and accuracy of sound source localization that uses microphones or microphone arrays, especially when the distance between the microphones is large. A potential application for such a device 100 or for the apparatus 200 is, for example, in a distant speech pick-up device, in a tablet, in a mobile phone, or in a teleconference device. In each application, the invention specifically reduces or eliminates the negative effects of spatial aliasing.
The invention has been described in conjunction with various embodiments as examples as well as implementations. However, other variations can be understood and effected by persons skilled in the art in practicing the claimed invention, from a study of the drawings, this disclosure, and the independent claims. In the claims as well as in the description, the word “comprising” does not exclude other elements or steps, and the indefinite article “a” or “an” does not exclude a plurality. A single element or other unit may fulfill the functions of several entities or items described.
This application is a continuation of International Application No. PCT/EP2017/059732 filed on Apr. 25, 2017, the disclosure of which is hereby incorporated by reference in its entirety.