This invention relates to angle of arrival and source direction determination and identification.
Driving an automobile is a dangerous endeavor. In the United States alone, there were approximately 6.4 million auto accidents in 2005, resulting in $230 billion of damage, 2.9 million injuries and 42,636 deaths. Safe driving requires skill and the ability to detect and avoid dangerous situations. The detection aspect requires visual and auditory acuity. Interestingly, a minimum vision requirement must be met to obtain a driver's license, but there is no corresponding auditory requirement. This is reasonable, since vision is clearly the more important of the two senses for driving and it would be unfair to deprive those with hearing impairments of a driver's license. Nonetheless, the ability to hear sirens, screeching tires, collisions and horns is clearly a benefit to safe driving. In fact, the inventor has interviewed a handful of legally deaf drivers who have expressed that driving can be frightening without the ability to hear oncoming sounds.
In the United States alone, there are as many as 600,000 deaf people and 6,000,000 with hearing impairment. Thus, a fairly large group of drivers is lacking the full sensory perception needed for safe driving. Some may not even be willing to drive because of their hearing impairment. The present invention is intended to compensate for this deficit by creating a visual indication of what would otherwise be audibly detected. Specifically, through use of microphones and signal processing algorithms, an embodiment of the present invention will detect sounds, determine the direction of the sound source and visually indicate that direction. Additionally, an embodiment can also indicate the type of sound that has been detected.
Direction of arrival technology is well-known in the art. Radar systems, and more recently cellular direction finding systems, have used various methods such as TDOA (time difference of arrival), monopulse, triangulation and other methods, to locate the direction, or actual location, of a signal source. These systems determine location based on measurements of radio signals. These techniques have also been adopted for sound source location. In most applications, accuracy is the paramount requirement.
The disclosed embodiments of the present invention provide sound direction information in a visual form that can, for instance, be used to assist the hearing impaired while driving in an automobile. Although it is possible to achieve high accuracy with the approach discussed herein, accuracy is not the primary goal. Instead, a rough idea of the source direction is sufficient and this can be achieved with a simplification of the general approach. Thus, one aspect of the described embodiments is oriented towards a simple implementation.
The described embodiments are composed of a detection mechanism and a display mechanism. The detection mechanism uses a microphone array to sample sound over a spatial area. In one embodiment, time differences of arrivals of the sound to the various microphones are computed. From the time differences, the angle of arrival is determined.
Many existing methods for sound location determination are based on pairs of microphones oriented with a common origin. The time differences between the pairs of microphones give rise to a simultaneous set of equations which can be solved for the source location. Typically, least-squares is used to obtain a solution. Some approaches suffer the defect of having “blind spots”. One embodiment of the method described herein also uses time differences but accomplishes the desired results with three microphones equally spaced around a unit circle with high accuracy and low implementation complexity. The method can use more microphones for increased accuracy (or only two microphones if localizing the sound source to one of two half-planes is sufficient). The novel solution is given in closed form and does not have blind spots. A simplified, less precise, implementation is also developed.
Another embodiment specifically applies sound direction determination for use in an automobile to provide a visual indication to the driver of the sound direction.
Another purpose of the described embodiments is to provide an indication of the type of sound. For instance, the device might indicate, through a visual icon, that the sound was produced by screeching tires, a horn or a siren.
An apparatus which can detect sounds and provide a visual indication of the direction of a sound source is highly useful for hearing impaired drivers. Even for drivers without hearing impairment, such a product could be highly useful when sounds outside the car are hard to detect inside the car. Very little in the way of products is available to assist the hearing impaired driver. An example of one product is called the AutoMinder. The AutoMinder monitors the vehicle's built-in sound warning systems (such as low fuel, fasten seat belt, door ajar, etc.) and warns the driver with a loud tone and a flashing light when these warning systems go off.
The apparatus should be able to distinguish, and ignore, sounds at the ambient sound level. Having the additional ability to recognize and indicate the type of sound (siren, screeching tires, horn, collision, etc.) is also desirable. An example of a situation in which such an apparatus would be useful is depicted in
More generally, an embodiment includes a plurality of separately located microphones attached to an automobile for receiving sound from a sound source. A processor receives signals generated from the microphones that are electronic representations of the sound. The processor determines time difference of arrival of that sound between pairs of the microphones. From the time differences, the direction of the sound source is determined. A display controllable by the processor is used to provide a visual indication of the direction of the sound source.
There are many methods to determine angle of arrival. Possibly the simplest approach is to use relative sound levels: the microphone receiving the highest sound level indicates the direction of the sound. This approach has two problems: it requires as many microphones as there are directions to be indicated, and any sound reflections (reverb) or path-dependent attenuation will cause accuracy to degrade significantly. Using directional microphones can help to some extent.
A much more reliable approach is to measure the time difference of arrival at different microphones. In one embodiment of the invention, three microphones are equally spaced around a circle (that is, they are separated by 120°). A simple closed-form solution for the angle of arrival based on the time differences is herewith derived based on one approximation. It will be clear that this configuration is used for simplicity of implementation but that other configurations using a different number of microphones or a different arrangement of microphones do not alter the nature of the invention.
di² = a² + b² − 2ab cos(φi − θ), i = 1, 2, 3,
where a is the radius of the microphone circle, b is the distance from the center of the circle to the sound source, θ is the angle of the source, di is the distance from the source to microphone i, and φ1 = 0, φ2 = 2π/3 and φ3 = −2π/3. Note that the differences in squared distance are given by
δij ≡ di² − dj² = 2ab(cos(φj − θ) − cos(φi − θ)).
These differences can be determined based on the time differences of arrival of the sound at the various microphones. This will be discussed later.
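As a numeric illustration of the squared-distance differences above (using hypothetical values for the radius a, source distance b and bearing θ), the following sketch computes the di² directly and checks them against the closed form for δij:

```python
import math

PHIS = [0.0, 2 * math.pi / 3, -2 * math.pi / 3]  # microphone angles phi_1..phi_3

def squared_distances(a, b, theta):
    """Law-of-cosines squared distances d_i^2 = a^2 + b^2 - 2ab cos(phi_i - theta)."""
    return [a * a + b * b - 2 * a * b * math.cos(phi - theta) for phi in PHIS]

def delta(a, b, theta, i, j):
    """delta_ij = d_i^2 - d_j^2 (1-based microphone indices)."""
    d2 = squared_distances(a, b, theta)
    return d2[i - 1] - d2[j - 1]

# Hypothetical geometry: mic circle radius a, source at distance b, bearing theta.
a, b, theta = 1.0, 25.0, 0.7
lhs = delta(a, b, theta, 1, 2)
# Closed form from the text: delta_ij = 2ab (cos(phi_j - theta) - cos(phi_i - theta))
rhs = 2 * a * b * (math.cos(PHIS[1] - theta) - math.cos(PHIS[0] - theta))
```

The two expressions agree for any a, b and θ, since the a² + b² terms cancel in the difference.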
More useful than the individual differences, δij, are the ratios of these differences:
Δklij ≡ δij / δkl = (cos(φj − θ) − cos(φi − θ)) / (cos(φl − θ) − cos(φk − θ)).
Note that these ratios do not depend on the unknown quantities a and b. The only remaining unknown is θ. Before making the simplifying variable substitution introduced below, a direct solution is provided for the sake of completeness:
When implementing this formulation with finite-precision arithmetic, it is important to take steps to keep the argument of the inverse tangent as close to zero as possible to maintain precision. Doing so requires careful choice of the i, j and k indices based on the δij quantities. There are several strategies that work. The easiest approach is to choose i to correspond to the microphone closest to the sound source, as determined by which microphone receives the signal first. However, this determination can be difficult in a realistic environment with reverb and noise. One can also determine i based on examination of the set of {δij}.
Instead of pursuing this further, a simpler implementation is achieved by making the angle substitutions ψ1=θ−π/3, or ψ2=θ−π, or ψ3=θ+π/3. Making this substitution into the equations for Δklij, solving for ψ and then converting back to θ results in the simpler expression:
where the ± depends in a simple way on j and k and
where the choice of the first or second rotation depends on the sign of δik (or, equivalently, on the sign of δij). As a specific example, if |δ12| < |δ23| < |δ31|, then
The benefit of this formulation is that it is straightforward to choose the indices i,j,k to keep the argument of the arc tangent between −0.5 and 0.5. Thus, the argument is kept in the sweet spot of the arc tangent and finite-precision arithmetic can give very accurate results.
If high accuracy in the determination of θ is not needed, the above equation can be greatly simplified. If it is sufficient to indicate which of six equally spaced 60° sectors the sound originated from, then the signs of the differences δij provide enough information. This is shown in
The sign vector S (303) indicates which of the six sectors the sound originated from; each of the six possible values of S corresponds to one of the six sectors. Determining the location of the sound source to within one of a set of different regions, or sectors, is referred to as localizing the sound source. For example, in this case of three microphones, the sound source location can be localized to one of six sectors as shown in
Notice that the three signs of S encode the order in which the sound was received at the microphones. For instance, the sign vector
indicates that the sound arrived first at microphone 1, then microphone 2 and finally at microphone 3. Each quantity δij > 0 thus defines the half-plane containing microphone j, while δij < 0 defines the other half-plane, containing microphone i. For each pairing of microphones, the boundary between the associated half-planes is the perpendicular bisector of the line connecting the two microphones. For example,
The intersection of the half-planes defined by the sign vector localizes the sound source. This is depicted in
From this description, it should be clear that this sound localization technique can be generalized to N non-equally spaced microphones. Specifically, if there are N microphones, then the sound can be localized to one of 2N sectors (except in the case N=2 for which the sound can be localized to only one of two half-planes separating the microphones). The sectors are defined by the intersection of the half-planes separating the microphones.
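The sign-vector localization described above can be sketched in a few lines. This is a minimal illustration, assuming the three-microphone geometry from the text; the function name and the test bearings are hypothetical:

```python
import math

PHIS = [0.0, 2 * math.pi / 3, -2 * math.pi / 3]  # three mics, 120 degrees apart

def sign_vector(a, b, theta):
    """S = (sgn delta_12, sgn delta_23, sgn delta_31). Each sign selects a
    half-plane; the intersection of the three localizes the source to one
    of six 60-degree sectors."""
    d2 = [a * a + b * b - 2 * a * b * math.cos(phi - theta) for phi in PHIS]
    pairs = [(0, 1), (1, 2), (2, 0)]  # (1,2), (2,3), (3,1) in 1-based indexing
    return tuple(1 if d2[i] > d2[j] else -1 for i, j in pairs)

# A source near microphone 1 (theta close to 0) reaches mic 1 first,
# then mic 2, then mic 3, so d1 < d2 < d3:
S = sign_vector(1.0, 20.0, 0.05)
```

A source just past microphone 2 (θ slightly above 2π/3) flips the first and last signs, encoding the new arrival order, which is exactly the ordering property noted above.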
Finally, note that if different sets of indices are used to construct S then the signs will be different. However, the solution is unique. The present invention is, of course, not restricted to any particular choice of indices.
In the preceding development, it is assumed that the differences δij are known. We now show how to determine the δij based on the time differences of arrival of the sound at the microphones. Because of differences in the signal received at each microphone due to noise, reverb and amplitude variations, it is not accurate to simply time the arrival of s(t) at each microphone and compare the arrivals to form the time differences. Instead, it is known in the art that performing a cross-correlation of the signals si(t), i = 1, 2, 3, where si(t) is the signal arriving at microphone i, yields accurate results. The signals arriving at the microphones are
si(t)=hi(t)*s(t−τi)+ni(t) i=1,2,3
where hi(t) is the channel between the sound source and microphone i, ni(t) is the ambient noise at microphone i and the τi, i=1, 2, 3, represent the arrival time at microphone i of the sound s(t) from the sound source. Let ζij≡(τi−τj) represent the time difference of arrivals between microphone pair i and j. Then, if c is the speed of sound, the difference in distance between the source and the microphones is di−dj=cζij. It remains to determine the time difference of arrivals ζij. The cross-correlation is defined as
Rij(τ) = ∫ si(t) sj(t − τ) dt,
where the integration is over a sufficiently long interval T to capture most of the signal energy. An equivalent representation in the frequency domain is
Rij(τ) = F⁻¹{Ψij(f) Si(f) Sj*(f)}
with Ψij(f) = 1. The time difference of arrivals is then simply the location of the correlation peak: ζij = arg maxτ Rij(τ).
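The plain cross-correlation peak-picking rule (Ψ = 1) can be sketched as follows. The sample rate, the delay and the white-noise stand-in for s(t) are all hypothetical; note that NumPy's `correlate` lag convention matches the Rij(τ) definition above, so the peak lands at ζij = τi − τj:

```python
import numpy as np

fs = 8000                       # sample rate in Hz (hypothetical)
rng = np.random.default_rng(0)
s = rng.standard_normal(4096)   # stand-in for the source sound s(t)

D = 25                          # mic 2 hears the sound D samples after mic 1
s1 = s
s2 = np.concatenate([np.zeros(D), s[:-D]])   # delayed copy at microphone 2

# R_12(tau) = sum_t s1(t) s2(t - tau); 'full' mode gives lags -(N-1)..N-1
R = np.correlate(s1, s2, mode="full")
lags = np.arange(-len(s2) + 1, len(s1))
zeta_12 = lags[np.argmax(R)]    # mic 1 leads, so zeta_12 = tau_1 - tau_2 = -D
# distance difference d_1 - d_2 = c * zeta_12 / fs, with c ~ 343 m/s
d_diff = 343.0 * zeta_12 / fs
```

The negative peak location reflects that microphone 1 is closer to the source, i.e., d1 − d2 = cζ12 < 0.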
Various improvements on this formula are obtained by choosing a filter Ψ(f) other than unity. Some common choices include
where Ni(f) is the Fourier transform of ni(t). PHAT is known as a whitening filter. It can be understood as flattening the magnitude of the spectrum which leads to a sharper impulse for Rij(τ). The maximum-likelihood filter, ML, can be understood as giving more weight to frequencies which possess a higher signal-to-noise ratio. As a third example, a simple filter which can be applied separately to each si(t), is
The parameter, q, trades off spectral whitening for noise filtering. A further simplification occurs if the signal and noise spectra are similar at each microphone. Then, a single filter can be applied to each si(t):
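A minimal sketch of the PHAT-weighted frequency-domain correlation of the preceding equations follows. The function name and signal lengths are illustrative; the weighting Ψ(f) = 1/|Si(f)Sj*(f)| keeps only the phase of the cross-spectrum, which is what sharpens the correlation peak:

```python
import numpy as np

def gcc_phat(si, sj):
    """R_ij(tau) = F^-1{ Psi(f) S_i(f) S_j*(f) } with the PHAT weight.
    Returns the peak lag in samples, estimating zeta_ij = tau_i - tau_j."""
    n = len(si) + len(sj)                # zero-pad to avoid circular wrap-around
    Si = np.fft.rfft(si, n)
    Sj = np.fft.rfft(sj, n)
    X = Si * np.conj(Sj)
    X /= np.maximum(np.abs(X), 1e-12)    # PHAT: flatten magnitude, keep phase
    r = np.fft.irfft(X, n)
    lag = int(np.argmax(r))
    if lag > n // 2:                     # map wrapped indices to negative lags
        lag -= n
    return lag

rng = np.random.default_rng(1)
s = rng.standard_normal(4096)
sj = np.concatenate([np.zeros(30), s])   # mic j hears the sound 30 samples late
zeta = gcc_phat(s, sj)                   # tau_i - tau_j, so a negative lag
```

Because the whitened cross-spectrum is a pure phase ramp, the inverse transform is nearly an impulse at the true delay, even for broadband signals.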
Note that the filter Ψ(f) can either be applied in the frequency-domain or its inverse-Fourier transform, ψ(t), can be convolved with the microphone signals in the time-domain. In other words,
Rij(τ) = ∫ (ψ * si)(t) (ψ * sj)(t − τ) dt.
One advantage of this formulation is that it incorporates the possibility of matched filtering to specific sound types. Specifically, rather than having a single filter, Ψ(f), a bank of filters, Ψj(f), j = 1, . . . , J, can be applied, where J is the number of different sound types being considered. Then, for each j the energy
Ej = ∫ (ψj(t) * si(t))² dt
(for one or all of the microphones, i) can be computed and the type of sound can be inferred from the j that gives the maximum Ej. From the determination of the sound type, an icon which graphically presents the type of sound can be displayed. Alternatively, text can be used to state the type of sound.
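The filter-bank classification step can be sketched as below, interpreting the energy Ej as the integral of the squared filter output. The narrowband sinusoid templates are toy stand-ins; real templates would model the spectra of sirens, horns, screeching tires and collisions:

```python
import numpy as np

fs = 8000
t = np.arange(256) / fs   # short matched-filter templates (hypothetical length)

# Toy filter bank psi_j: one narrowband template per sound type.
templates = {
    "horn":  np.sin(2 * np.pi * 400.0 * t),
    "siren": np.sin(2 * np.pi * 1200.0 * t),
}

def classify(mic_signal):
    """Pick the sound type j maximizing E_j = integral of (psi_j * s_i)^2 dt."""
    energies = {name: np.sum(np.convolve(mic_signal, psi, mode="same") ** 2)
                for name, psi in templates.items()}
    return max(energies, key=energies.get)

tone = np.sin(2 * np.pi * 1200.0 * np.arange(4000) / fs)   # a siren-like tone
label = classify(tone)
```

The winning index j then selects the icon or text shown to the driver, as described above.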
Finally, note that the cross-correlation yields the time differences ζij, which give di − dj = cζij (where c is the speed of sound). However, the quantity of interest in the formulation above is δij ≡ di² − dj². At this point, the only approximation used in this derivation is introduced. If b >> a (that is, if the distance to the sound source is much greater than the distance between the microphones), then
δij = (di − dj)(di + dj) ≈ 2b(di − dj) = 2bcζij.
This approximation is quite benign. For instance, if b=10a then the maximum error in the determination of θ is only 1.4°.
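A quick numeric sanity check of this far-field step (with the hypothetical values a = 1 and b = 10a, matching the example above) shows the relative error in δij is only a few percent:

```python
import math

a, b, theta = 1.0, 10.0, 0.4    # b = 10a, as in the text; bearing is arbitrary
phis = [0.0, 2 * math.pi / 3, -2 * math.pi / 3]
d = [math.sqrt(a * a + b * b - 2 * a * b * math.cos(phi - theta)) for phi in phis]

# delta_12 = (d1 - d2)(d1 + d2); the far-field step replaces (d1 + d2) by 2b
exact = d[0] ** 2 - d[1] ** 2
approx = 2 * b * (d[0] - d[1])
rel_err = abs(exact - approx) / abs(exact)
```

Since d1 + d2 always lies within 2b ± 2a, the replacement error shrinks in proportion to a/b as the source moves farther away.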
Once the direction of the sound source is determined, it is indicated visually.
An embodiment of the invention proceeds as shown in
An alternative embodiment is shown in
A third embodiment of the invention is shown in
is determined (901). If a signal is detected whose amplitude or energy exceeds a pre-determined (or adaptively determined) margin, the system applies filtering to the received signals (902). Time differences of arrivals are computed from the cross-correlation of the filtered signals (903). From the time difference, the direction of the sound is determined (904). A visual indication corresponding to the direction of the source is then provided (905). Optionally, the type of sound (which might include, for instance, the sounds of screeching tires, horns, sirens, or collisions) is determined (906) and visually indicated (907). After the sound ends, the system returns to state (900).
Number | Name | Date | Kind |
---|---|---|---|
3859621 | Foreman | Jan 1975 | A |
5465302 | Lazzari et al. | Nov 1995 | A |
6266014 | Fattouche et al. | Jul 2001 | B1 |
6912178 | Chu et al. | Jun 2005 | B2 |
7117149 | Zakarauskas | Oct 2006 | B1 |
7162043 | Sugiyama et al. | Jan 2007 | B2 |
20020181723 | Kataoka | Dec 2002 | A1 |
20050078833 | Hess et al. | Apr 2005 | A1 |
20050207591 | Tsuji et al. | Sep 2005 | A1 |
20070025562 | Zalewski et al. | Feb 2007 | A1 |
Number | Date | Country | |
---|---|---|---|
20090052687 A1 | Feb 2009 | US |