This disclosure relates generally to audio systems and methods, and more particularly to audio acquisition systems and methods.
In various audio acquisition systems, such as voice recording systems, voice recognition systems, audio and video recording systems, and video-conferencing systems, one or more microphones having fixed directivity may be used to acquire audio information. In general, more than one audio source may be present, which may be located at different distances and angles relative to the one or more microphones. Accordingly, it may be desirable to control the directivity of the microphones to improve the quality of an audio recording.
Audio acquisition systems and methods to determine a direction of arrival of an audio signal are disclosed. In an aspect, an apparatus includes a continuous sampling stage configured to receive audio information and to generate one or more correlations from the received audio information, and a processing stage configured to receive the one or more correlations and to generate direction of arrival information for the audio information. In another aspect, a method includes generating audio signals from an ambient acoustic environment, and performing beamforming on the generated audio signals. The method further includes calculating signal-to-interference ratios from the beamformed signals, forming correlations between the signal-to-interference ratios and audio sampling angles, selecting at least one correlation based upon predetermined selection criteria, and determining a direction of arrival for the audio signals.
Various embodiments are described in detail in the discussion below and with reference to the following drawings.
Audio acquisition systems and methods that may be configured to determine a direction of arrival of an audio signal are disclosed. Briefly, and in general terms, the various embodiments may be configured to control the directivity of one or more microphones associated with the audio acquisition system by determining a direction of arrival of a selected audio signal. In the various embodiments, an audio acquisition system may also be configured to direct one or more video devices towards an audio source identified by the system.
The continuous sampling stage 12 may be coupled to a processing stage 14, which may be configured to output a result after a predetermined number of samples from the continuous sampling stage 12 have been received. For example, the processing stage 14 may be configured to generate a result based upon at least one thousand samples received from the continuous sampling stage 12. In accordance with the various embodiments, between approximately one thousand and approximately two thousand samples may be processed by the processing stage 14, although other sampling ranges or sampling limits may be selected. The continuous sampling stage 12 may include a microphone apparatus 16 that may be removably coupleable. The microphone apparatus 16 may include a single microphone or, alternatively, a plurality of microphone devices that may be positioned at a variety of selected locations remote from the system 10. In accordance with the various embodiments, the microphone apparatus 16 may therefore include a uniform linear microphone array, a uniform circular array, or a uniform square array, among other suitable arrangements that may, in general, be configured to detect acoustical disturbances in an ambient acoustic environment. In the various embodiments, the maximum number of microphones in the microphone apparatus 16 may be limited only by processing capabilities of the system 10.
The continuous sampling stage 12 may also include one or more beamforming modules 18₁ through 18ₖ that may be operably coupled to the microphone apparatus 16. Briefly, and in general terms, the beamforming modules 18₁ through 18ₖ may be configured to alter an audio directionality of the microphone apparatus 16 by combining audio information received from the one or more microphones in the microphone apparatus 16. Accordingly, the beamforming modules 18₁ through 18ₖ may be configured to process received audio signals to produce a main signal lobe that may vary from approximately +90 degrees to approximately −90 degrees, where the angle may be measured relative to a line extending perpendicularly from the microphone apparatus 16. In addition to the main signal lobe, various signal nulls and signal side lobes may also be generated by the beamforming modules 18₁ through 18ₖ. A position of the signal nulls may be important, for example, in suppressing selected undesired audio signals that may be received by the microphone apparatus 16. The beamforming modules 18₁ through 18ₖ may be structured using an all-pass infinite impulse response (IIR) filter that may be configured with appropriate delays. For example, and in accordance with the various embodiments, a Thiran all-pass filter may be used. Suitable delay values may be selected as disclosed in “Fractional Delay Filter Based on the B-Spline Transform”, J. T. Olkkonen and H. Olkkonen, IEEE Signal Processing Letters, vol. 14, No. 2, February 2007, which reference is incorporated herein by reference in its entirety. The beamforming modules 18₁ through 18ₖ may also be configured to implement various algorithms, which may include a delay-and-sum beamforming algorithm, a linearly-constrained minimum variance beamforming algorithm, a time-domain generalized sidelobe canceller, and a robust generalized sidelobe canceller, as well as other suitable algorithms.
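By way of illustration only, the relationship between a steering angle and the delays applied by a beamforming module may be sketched as follows. The sketch assumes a uniform linear array, a speed of sound of 343 m/s, and the commonly used closed-form expression for Thiran all-pass coefficients; it does not reproduce the B-spline method of the cited reference. The function names, parameter names, and numerical values are hypothetical and are not taken from the disclosure.

```python
import math


def steering_delays(num_mics, spacing_m, angle_deg, sample_rate_hz,
                    speed_of_sound=343.0):
    """Per-microphone delays, in samples, that steer a uniform linear array.

    The angle is measured from a line perpendicular to the array, so that
    0 degrees is broadside and +/-90 degrees lies along the array axis.
    The array geometry and the speed of sound are assumptions.
    """
    # Extra travel time between adjacent microphones for a plane wave.
    tau = spacing_m * math.sin(math.radians(angle_deg)) / speed_of_sound
    delays = [i * tau * sample_rate_hz for i in range(num_mics)]
    # Shift so that every delay is non-negative (causal).
    offset = min(delays)
    return [d - offset for d in delays]


def thiran_coefficients(order, delay):
    """Denominator coefficients a[0..order] of a Thiran all-pass filter.

    `delay` is the total delay in samples and should exceed order - 1.
    The numerator of the all-pass filter is the same list reversed.
    """
    coeffs = [1.0]  # a[0] is always 1
    for k in range(1, order + 1):
        prod = 1.0
        for n in range(order + 1):
            prod *= (delay - order + n) / (delay - order + k + n)
        coeffs.append((-1) ** k * math.comb(order, k) * prod)
    return coeffs


if __name__ == "__main__":
    # Hypothetical example: 4 microphones, 5 cm apart, 16 kHz, steered to 30 degrees.
    for d in steering_delays(4, 0.05, 30.0, 16000.0):
        whole = int(d)
        frac = d - whole
        # A first-order Thiran section is usually designed for a delay near
        # one sample, so one whole sample is moved into the fractional part.
        if whole >= 1:
            whole, frac = whole - 1, frac + 1.0
        print(whole, thiran_coefficients(1, frac) if frac > 0 else [1.0, 0.0])
```

In such a sketch, the integer part of each delay would be realized with a simple sample buffer and only the remaining fractional part with the all-pass section.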
The continuous sampling stage 12 may also include signal-to-interference ratio (SIR) modules 20₁ through 20ₖ suitably coupled to the beamforming modules 18₁ through 18ₖ. The SIR modules 20₁ through 20ₖ may be configured to continuously receive information from the beamforming modules 18₁ through 18ₖ and to process the information to continuously generate an SIR. The determination of the SIR will be discussed in greater detail below. The continuous sampling stage 12 may also include a curve module 22 that may be configured to receive information from the SIR modules 20₁ through 20ₖ and to process the received information to generate a selected correlation between the SIR and an audio sampling angle.
Still referring to
The processing stage 14 may also include a curve selection module 26 configured to receive the distributions processed by the filter module 24, and to further process selected correlations. For example, the curve selection module 26 may be configured to select a distribution having a suitable global minimum point. As a further example, the curve selection module 26 may be configured to select a single distribution having one or more predetermined characteristics. In accordance with the various embodiments, however, the curve selection module 26 may select more than one distribution. The curve selection module 26 will also be discussed in greater detail below.
The audio acquisition system 10 may also include an angle determination module 28 that may be configured to receive the one or more distributions selected by the curve selection module 26. The angle determination module 28 may accordingly generate direction-of-arrival (DOA) information DOA₁ through DOAₖ for audio signals detected by the microphone apparatus 16. For example, DOA₁ through DOAₖ may include an angle of a source of audio signals relative to a position of each of the microphones included in the microphone apparatus 16. In accordance with the various embodiments, DOA₁ through DOAₖ may also be expressed in other forms that indicate a direction of the audio signals received by the microphone apparatus 16.
The determination of the signal-to-interference ratio (SIR) will now be discussed in detail. A signal output from a selected microphone in the microphone apparatus 16 may be expressed as m(i,n), where i represents a selected microphone, and n represents a time or a sample value. Accordingly, an average value f(n) for the microphone response may be readily determined by summing the signal outputs over the microphone index i and dividing by the number of microphones in the microphone apparatus 16:
f(n)=(1/(number of microphones))Σm(i,n)
For example, if the microphone apparatus 16 includes four microphones, then the average value f(n) becomes:
f(n)=0.25Σm(i,n)
where the index i is summed from one to four. Still assuming that the microphone apparatus 16 includes four microphones, a difference b(n) may be defined as:
b(n)=m(2,n)−m(3,n)
Accordingly, the following expressions for the microphone power may be formed:
P_f(n) = α P_f(n−1) + (1−α) f(n) f(n)
P_b(n) = α P_b(n−1) + (1−α) b(n) b(n)
The signal-to-interference ratio (SIR) may therefore be defined in terms of the foregoing expressions:
SIR_i(n) = P_b(n) / P_f(n)
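The foregoing expressions may be illustrated with a minimal sketch for the four-microphone case. The sketch treats α as a smoothing factor between zero and one, which is an assumption, as are the value 0.9, the zero initial powers, and the small guard value used to avoid division by zero. The function and variable names are hypothetical.

```python
def sir_sequence(mics, alpha=0.9, guard=1e-12):
    """Compute SIR(n) for four microphone signals of equal length.

    mics[0] through mics[3] hold m(1,n) through m(4,n). The smoothing
    factor alpha, the zero initial powers, and the guard value are
    assumptions made for illustration.
    """
    p_f = 0.0  # recursively smoothed power of the average f(n)
    p_b = 0.0  # recursively smoothed power of the difference b(n)
    sir = []
    for n in range(len(mics[0])):
        f = sum(m[n] for m in mics) / len(mics)  # f(n): average over microphones
        b = mics[1][n] - mics[2][n]              # b(n) = m(2,n) - m(3,n)
        p_f = alpha * p_f + (1.0 - alpha) * f * f
        p_b = alpha * p_b + (1.0 - alpha) * b * b
        sir.append(p_b / max(p_f, guard))        # SIR(n) = P_b(n) / P_f(n)
    return sir


if __name__ == "__main__":
    # Identical signals on every microphone give b(n) = 0 and hence SIR = 0.
    same = [1.0, 2.0, 3.0]
    print(sir_sequence([same, same, same, same]))
```

With this definition, a small ratio indicates that the difference signal carries little power relative to the average signal.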
Referring now to
Referring now to
With continued reference to
The minimum points of the various groups may be determined by a variety of methods. For example, the minimum points may be located by progressively calculating a slope for lines tangent to the correlation, and finding a location on the correlation at which the calculated slope is less than or equal to a selected numerical criterion ε, where ε may be a selected numerical value that is close to zero.
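One possible discrete form of this slope test may be sketched as follows. The sketch assumes that the correlation is available as sampled values at known angles, approximates the tangent slope with a central difference, compares the magnitude of the slope against ε, and additionally requires that the sample be no greater than its neighbors so that maxima are not reported. These choices, along with the function name and the default value of ε, are assumptions and are not taken from the disclosure.

```python
def find_minimum_points(angles, values, eps=1e-3):
    """Return the angles at which a sampled correlation has a minimum.

    The slope is approximated with a central difference. The comparison
    on the slope magnitude and the neighbor check are assumptions.
    """
    minima = []
    for i in range(1, len(angles) - 1):
        slope = (values[i + 1] - values[i - 1]) / (angles[i + 1] - angles[i - 1])
        is_valley = values[i] <= values[i - 1] and values[i] <= values[i + 1]
        if abs(slope) <= eps and is_valley:
            minima.append(angles[i])
    return minima


if __name__ == "__main__":
    # Hypothetical correlation with a single minimum at 20 degrees.
    angles = list(range(-90, 91, 10))
    values = [((a - 20) / 100) ** 2 for a in angles]
    print(find_minimum_points(angles, values))
```

A suitable value of ε depends on the angular spacing of the samples and on the scale of the correlation values.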
From the foregoing it will be appreciated that, although various embodiments have been described for purposes of illustration, various modifications may be made without deviating from the spirit and scope of the disclosure. Moreover, although the functional description of the various embodiments may be associated with the various described modules, it is understood that the disclosed functionality may be associated with fewer modules, or even a greater number of modules without deviating from the scope of the various embodiments. The various disclosed modules may also be implemented exclusively in hardware or in software, or even in a combination of hardware and software. Where an alternative may be disclosed for a particular embodiment, this alternative may also apply to other of the various embodiments even if not specifically stated.