The present disclosure is directed to a system and method for detecting multi-tone sirens, and particularly for detecting multi-tone sirens despite environmental noises that may be present.
Automated vehicles that are capable of sensing their environment and operating with little to no human effort are being rapidly developed and deployed. Automated vehicles include autonomous vehicles, semi-autonomous vehicles and vehicles with automated safety systems. These vehicles provide full or partly automated control features that keep the vehicle within its lane, perform a lane change, regulate speed and engage the vehicle brakes, for example.
A well-known classification system is promulgated by The Society of Automotive Engineers (SAE International) and classifies vehicles according to six increasing levels of vehicle automation, from “Level 0” to “Level 5”. These levels feature, in increasing order, warning systems but no automation, driver assistance, partial automation, conditional automation, high automation and full automation. Level 0 vehicles have automated warning systems, but the driver has full control. Level 5 vehicles require no human intervention. The term “automated vehicle” as used herein includes Level 0 to Level 5 autonomous and semi-autonomous vehicles.
In most cities and countries, laws require that vehicles pull over and yield to approaching emergency vehicles. Emergency vehicles utilize multi-tone sirens that cycle through a sequence of tones having a predefined duration. Recognition of approaching emergency vehicles is critical to public safety in general and especially in systems for automated vehicles.
Present siren detection methods lack robustness in real world operating conditions because of environmental noise. As used herein, environmental noises include sounds produced by vehicles and vehicular traffic, speech, music, and the like.
The present disclosure provides a system and method for detecting a multi-tone siren by accounting for a Doppler shift attributable to a relative speed between an emergency vehicle and an automated vehicle.
The present disclosure provides such a system and method that uses an explicit model of the multi-tone siren signal, which model describes the siren as a sequence of tones that are specified by their fundamental frequency and duration.
The present disclosure further provides such a system and method that factors in and/or models the change of the tones' fundamental frequencies and durations due to the Doppler shift.
The present disclosure still further provides such a system and method that uses integral signal representations to efficiently detect tone duration patterns.
The present disclosure still further provides such a system and method that considers the effect on upper harmonics.
The present disclosure still yet further provides such a system and method that detects tones over their entire duration period so that unwanted perturbation by interfering tonal signals such as speech and music is minimized.
The system and method of the present disclosure can advantageously detect the siren signals at very low signal-to-noise ratios (SNR), regardless of whether the siren signal is overlaid by tonal signals, such as speech or music.
The accompanying drawings illustrate aspects of the present disclosure, and together with the general description given above and the detailed description given below, explain the principles of the present disclosure. As shown throughout the drawings, like reference numerals designate like or corresponding parts.
Referring to the drawings and, in particular to
Referring again to
Siren 42 produces sound waves 46. In example embodiments, siren 42 is a multi-tone siren. As used herein, a “multi-tone siren” is a loud noise-making device that generates two or more alternating tones such as alternating “hi-lo” signals. Unless otherwise specified in this disclosure, a sound is a vibration that typically propagates as an audible wave of pressure, through a transmission medium. A tone is sound at one specific frequency.
Although it can be seen that the time/frequency pattern of a siren signal is clearly defined, in real operating conditions there is significant variability.
Referring back to
For example, if automated vehicle 10 drives at a velocity of v1=150 km/h and emergency vehicle 40 drives at a velocity of v2=50 km/h, there is a speed difference of v1−v2=100 km/h, which needs to be added to the speed of sound. Hence, the effective speed of sound changes from c=1235 km/h to c+v1−v2=1335 km/h, which corresponds to a factor of 1335/1235=1.081. This 8% increase in the effective speed of sound changes the duration of tones by a factor of 1/1.081=0.925 (time stretching factor) and increases the frequency of tones by 8%, i.e. a 1000 Hz tone becomes a 1080 Hz tone, a 3000 Hz tone becomes a 3240 Hz tone, and so on.
In general, if automated vehicle 10 approaches emergency vehicle 40 with a relative speed Δv, the time stretching factor α is defined as:
$$\alpha = \mathrm{tsf}(\Delta v) = \frac{c}{c + \Delta v}$$
where c denotes the speed of sound in km/h. The duration of all tones of the siren pattern needs to be multiplied by this value. The factor by which each tone frequency is to be multiplied due to the Doppler shift is:
$$\frac{1}{\alpha} = \frac{c + \Delta v}{c}$$
Note that Δv becomes negative if automated vehicle 10 is driving away from emergency vehicle 40. In this case, the time stretching factor α becomes greater than 1 and the siren tone frequencies decrease, as 1/α is smaller than 1.
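The relations above can be sketched in a few lines of Python. This is a minimal illustration only, assuming the time stretching factor follows α = c/(c + Δv) as implied by the worked example (1235/1335 ≈ 0.925). The function names `time_stretching_factor` and `doppler_shift` are hypothetical and not part of the disclosure.

```python
SPEED_OF_SOUND_KMH = 1235.0  # value used in the worked example


def time_stretching_factor(delta_v_kmh, c_kmh=SPEED_OF_SOUND_KMH):
    """Return alpha = c / (c + delta_v); delta_v > 0 means the vehicles approach."""
    if c_kmh + delta_v_kmh <= 0:
        raise ValueError("relative speed must be greater than -c")
    return c_kmh / (c_kmh + delta_v_kmh)


def doppler_shift(frequency_hz, duration_s, delta_v_kmh, c_kmh=SPEED_OF_SOUND_KMH):
    """Return the (frequency, duration) of a tone as observed at relative speed delta_v."""
    alpha = time_stretching_factor(delta_v_kmh, c_kmh)
    # Frequencies are multiplied by 1/alpha, durations by alpha.
    return frequency_hz / alpha, duration_s * alpha


# Worked example: v1 = 150 km/h, v2 = 50 km/h, i.e. delta_v = 100 km/h.
shifted_freq, shifted_dur = doppler_shift(1000.0, 0.7, 150.0 - 50.0)
print(round(shifted_freq, 1), round(shifted_dur, 3))
```

With Δv = 100 km/h, the sketch yields roughly 1081 Hz for a 1000 Hz tone, which the example above rounds to 1080 Hz.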
Noises 22 also exist within environment 20. Noises 22 are environmental noises that include sounds produced by vehicles and vehicular traffic, speech, music, and the like. Noises 22 are generally dynamic with respect to one or more of pitch, intensity, and quality.
Referring to
System 100 includes the following exemplary components that are electrically and/or communicatively connected: a microphone 110 and a computing device 200.
Microphone 110 is a transducer that converts sound into an electrical signal. Typically, a microphone utilizes a diaphragm that converts sound to mechanical motion that is in turn converted to an electrical signal. Several types of microphones exist that use different techniques to convert, for example, air pressure variations of a sound wave into an electrical signal. Nonlimiting examples include: dynamic microphones that use a coil of wire suspended in a magnetic field; condenser microphones that use a vibrating diaphragm as a capacitor plate; and piezoelectric microphones that use a crystal made of piezoelectric material. A microphone according to the present disclosure can also include a radio transmitter and receiver for wireless applications.
Microphone 110 can be a directional microphone (e.g. a cardioid microphone), so that sound from a particular direction is emphasized, or an omnidirectional microphone. Microphone 110 can be one or more microphones or microphone arrays.
Computing device 200 can include the following: a detection unit 210; a control unit 240, which can be configured to include a controller 242, a processing unit 244 and/or a non-transitory memory 246; a power source 250 (e.g., battery or AC-DC converter); an interface unit 260, which can be configured as an interface for external power connection and/or external data connection such as with microphone 110; a transceiver unit 270 for wireless communication; and antenna(s) 272. The components of computing device 200 can be implemented in a distributed manner.
Detection unit 210 performs the multi-tone siren detection in example embodiments discussed below.
At step 510 a relevant range of the relative speed between vehicles is specified, e.g. a set of speeds such as {137 km/h, 65 km/h, 0 km/h, −59 km/h, −112 km/h} is considered, possibly with a higher resolution.
At step 520, the Doppler effect is considered by determining a set of relevant time stretching factors, e.g. {0.9, 0.95, 1.0, 1.05, 1.1}, which is derived from the above set of relevant relative speeds according to tsƒ(Δv), as specified before.
At step 530, relevant combinations of duration and frequency for the detection of siren tonal components are determined and siren pattern model 540 is applied. As used here, “relevant combinations” means that durations specified in the siren pattern model are translated through multiplication by all applicable time stretching factors tsƒ(Δv). Frequencies specified in the siren pattern model are translated through multiplication by 1/tsƒ(Δv) for all applicable time stretching factors tsƒ(Δv).
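Steps 510 to 530 can be illustrated with a short Python sketch. It assumes tsƒ(Δv) = c/(c + Δv) with c = 1235 km/h, which reproduces the example factors {0.9, 0.95, 1.0, 1.05, 1.1} from the example speeds. The `Tone` class, the `relevant_combinations` function and the hi-lo tone values are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass

SPEED_OF_SOUND_KMH = 1235.0  # assumed speed of sound, as in the earlier example


@dataclass(frozen=True)
class Tone:
    frequency_hz: float
    duration_s: float


def relevant_combinations(model, relative_speeds_kmh, c_kmh=SPEED_OF_SOUND_KMH):
    """Map each relative speed to the Doppler-translated tones of the siren model."""
    combos = {}
    for dv in relative_speeds_kmh:
        tsf = c_kmh / (c_kmh + dv)  # time stretching factor for this speed
        combos[dv] = [Tone(t.frequency_hz / tsf, t.duration_s * tsf) for t in model]
    return combos


# Hypothetical two-tone model; the frequencies are illustrative assumptions.
hi_lo_model = [Tone(960.0, 0.7), Tone(770.0, 0.7)]
speeds = [137.0, 65.0, 0.0, -59.0, -112.0]  # example set from step 510

combos = relevant_combinations(hi_lo_model, speeds)
for dv, tones in combos.items():
    print(dv, [(round(t.frequency_hz, 1), round(t.duration_s, 3)) for t in tones])
```

Each translated tone keeps the product of frequency and duration unchanged, since one is divided and the other multiplied by the same factor.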
Advantageously, using an explicit model 540 yields a robust result. For example, an explicit model allows for a distant siren signal to be detected in loud driving noise. An explicit model allows for better discrimination of the siren signal from local signals in the car, such as media playback from smart phones and tablets or cell phone ring tones.
At step 550, microphone 110 acquires a signal from siren 42.
It is noted that step 550 can occur prior to step 510. Steps 510, 520, and 530 can be performed independent of steps 550 and 560. Likewise, steps 550 and 560 can be performed independent of steps 510, 520, and 530.
At step 560, a time-frequency representation of the microphone input signal is obtained by applying, in real time, a time-frequency analysis. In this example, Short-Time Fourier Transform (STFT) calculations are performed and energy values for each time-frequency bin are determined by detection unit 210.
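The time-frequency analysis of step 560 can be sketched as follows. This is a minimal, non-real-time illustration using only the Python standard library. The frame length, hop size, Hann window and sample rate are assumptions, and the direct DFT is used only for readability (a practical implementation would typically use an FFT). The name `stft_energy` is hypothetical.

```python
import cmath
import math


def stft_energy(samples, frame_len=64, hop=32):
    """Return energy[frame][bin] = |X(t, w)|^2 using a Hann window and a direct DFT."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        energies = []
        for k in range(n_bins):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            energies.append(abs(acc) ** 2)
        frames.append(energies)
    return frames


# Synthetic 1000 Hz tone sampled at an assumed rate of 8000 Hz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 1000.0 * n / sample_rate) for n in range(512)]
energy = stft_energy(signal)
peak_bin = max(range(len(energy[0])), key=lambda k: energy[0][k])
print(peak_bin * sample_rate / 64)  # centre frequency of the strongest bin in Hz
```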
At step 570, for all relevant combinations of duration and frequency, as determined in step 530, the following steps are iteratively performed: steps 575, 580, 585 and 595.
At step 575, detection unit 210 detects tone duration patterns for each given frequency.
At step 580, detection unit 210 checks for common onsets of the detected tone duration patterns for harmonics of the same fundamental frequency to generate detected segments.
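One possible way to express the common-onset check of step 580 is sketched below. The disclosure does not specify a tolerance or a minimum number of agreeing harmonics, so `tolerance_s`, `min_harmonics` and the function name `common_onsets` are assumptions made for illustration.

```python
def common_onsets(onsets_by_harmonic, tolerance_s=0.05, min_harmonics=2):
    """Return onsets of the fundamental that are shared by enough harmonics.

    onsets_by_harmonic maps the harmonic number (1 = fundamental) to a list of
    detected pattern onset times in seconds.
    """
    confirmed = []
    for onset in onsets_by_harmonic.get(1, []):
        # Count harmonics (including the fundamental) with an onset close to this one.
        support = sum(
            any(abs(onset - other) <= tolerance_s for other in onsets)
            for onsets in onsets_by_harmonic.values()
        )
        if support >= min_harmonics:
            confirmed.append(onset)
    return confirmed


detected = {1: [0.70, 2.10], 2: [0.72, 3.00], 3: [0.69]}
segments = common_onsets(detected)
print(segments)
```

In this toy input, only the onset near 0.7 seconds is supported by more than one harmonic, so the isolated onset at 2.1 seconds is discarded.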
At step 585, detection unit 210 matches the detected segments to given siren pattern models, which specify valid sequences of segments for siren signals.
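The matching of step 585 can be illustrated with a small sketch, assuming that a siren pattern model is given as a cyclic sequence of tone labels and that each detected segment carries an onset time and a label. The representation, the `min_segments` parameter and the name `matches_siren_model` are hypothetical.

```python
def matches_siren_model(segments, valid_sequence, min_segments=4):
    """Check whether detected segments follow the cyclic sequence of the model.

    segments is a list of (onset_time_s, tone_label) tuples.
    """
    labels = [label for _, label in sorted(segments)]
    if len(labels) < min_segments or labels[0] not in valid_sequence:
        return False
    offset = valid_sequence.index(labels[0])
    period = len(valid_sequence)
    return all(
        label == valid_sequence[(offset + i) % period]
        for i, label in enumerate(labels)
    )


hi_lo = ["hi", "lo"]
observed = [(0.0, "hi"), (0.7, "lo"), (1.4, "hi"), (2.1, "lo")]
print(matches_siren_model(observed, hi_lo))
```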
Finally, at step 590, detection unit 210 generates a detection result.
The detection result can be used as input in automated safety systems of automated vehicle 10.
where a “+1” refers to tone presence, a “−1” refers to tone absence (e.g. because the siren switched to a different frequency) and a “0” refers to areas that are ignored. In the above example, it is assumed that a siren tone of fundamental frequency ω1 is active for a duration of 0.7 seconds, with a leading and a trailing tone absence of 0.7 seconds each.
This creates an alternating duration pattern for the second siren tone with fundamental frequency ω2. In this example, the multi-tone model consists of the two tone-duration patterns.
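The two tone-duration patterns can be written down as sequences of +1 and −1 values, one value per analysis frame. The sketch below assumes a frame period of 0.1 seconds and assumes that the pattern for ω2 is the sign-inverted pattern for ω1. Both are illustrative assumptions, as is the helper name `duration_pattern`.

```python
def duration_pattern(segments, frame_period_s):
    """Expand [(value, duration_s), ...] into one pattern value per frame."""
    pattern = []
    for value, duration_s in segments:
        pattern.extend([value] * round(duration_s / frame_period_s))
    return pattern


frame_period_s = 0.1  # assumed hop between STFT frames

# Tone at w1: absent for 0.7 s, present for 0.7 s, absent for 0.7 s.
p1 = duration_pattern([(-1, 0.7), (+1, 0.7), (-1, 0.7)], frame_period_s)
# Tone at w2: assumed to alternate with the first tone.
p2 = duration_pattern([(+1, 0.7), (-1, 0.7), (+1, 0.7)], frame_period_s)

print(p1)
print(p2)
```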
An example algorithm 800 performed by detection unit 210 for detecting tone duration patterns based on integral signal representations as in step 575 is summarized in
At step 810, detection unit 210 acquires an integral signal representation in time direction over spectral magnitude values or other values that are calculated based on the spectrogram.
At step 820, for each frequency/duration pattern and for each time stretching factor corresponding to a relevant Doppler shift, detection unit 210 calculates the cross-correlation of the tone duration pattern using the integral signal representation.
At step 830, detection unit 210 determines the presence of the duration pattern by post-processing the result of the cross-correlation.
As explained above, the Doppler-shifted frequencies ωi(α) and duration patterns Pi(α) need to be considered for all relevant time stretching factors α. This is achieved by translating the frequencies ωi and patterns Pi as follows:
$$\omega_i(\alpha) = \frac{\omega_i}{\alpha}, \qquad P_i(\alpha)\colon\ t \mapsto P_i\!\left(\frac{t}{\alpha}\right)$$
Let X(t, ω) denote the short-time Fourier transform (STFT) of the microphone input signal x(t), where t denotes time and ω denotes frequency. Furthermore, let X̃(t, ω) denote the magnitude spectrogram X̃(t, ω)=|X(t, ω)|. Then a straightforward detection δ(t, ωi, Pi) of a tone duration pattern Pi at frequency ωi can be achieved by first cross-correlating Pi(t) with X̃(t, ωi), t=0, …, ∞, through convolution with Pi(−t), i.e.
$$\tilde{X}(t,\omega_i) * P_i(-t) = \int_0^{\infty} \tilde{X}(\tau,\omega_i)\, P_i(\tau - t)\, d\tau,$$
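and then applying a threshold Γ on the result:
$$\delta(t,\omega_i,P_i) = \begin{cases} 1, & \text{if } \tilde{X}(t,\omega_i) * P_i(-t) > \Gamma \\ 0, & \text{otherwise} \end{cases}$$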
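A discrete-time version of this straightforward detection can be sketched as follows. The sketch assumes that the detection is 1 where the correlation exceeds Γ and 0 otherwise, works on one frequency track of the magnitude spectrogram, and uses a shortened pattern of three frames per segment. The names `cross_correlate` and `detect` and all numeric values are illustrative assumptions.

```python
def cross_correlate(magnitudes, pattern):
    """Return c[t] = sum_j magnitudes[t + j] * pattern[j] for every full overlap."""
    n = len(magnitudes) - len(pattern) + 1
    return [
        sum(magnitudes[t + j] * p for j, p in enumerate(pattern))
        for t in range(max(n, 0))
    ]


def detect(magnitudes, pattern, threshold):
    """Return 1 where the cross-correlation exceeds the threshold, else 0."""
    return [1 if c > threshold else 0 for c in cross_correlate(magnitudes, pattern)]


pattern = [-1] * 3 + [+1] * 3 + [-1] * 3   # absence, presence, absence
track = [0.1] * 5 + [1.0] * 3 + [0.1] * 5  # tone present in frames 5 to 7
detections = detect(track, pattern, threshold=2.0)
print(detections)
```

Each correlation value costs one multiplication per pattern sample, which is what makes this direct approach expensive when it is repeated for many frequencies, patterns and Doppler variants.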
The above cross-correlations become prohibitively expensive if they need to be performed for all possible tone frequencies and duration patterns in all Doppler shifted variants. Advantageously an integral signal representation can be used to efficiently detect the duration patterns Pi. For this, the integral signal representation
In one example implementation, the integral signal representation can be calculated over the magnitude spectrogram X̃(t, ωi), in the direction of t:
With this representation, the cross-correlation of X̃(t, ωi) and Pi(t) is easily obtained, as the patterns Pi always consist of segments that assume a value αk=−1 or αk=+1 on a corresponding time interval tk,start≤t<tk,stop:
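As a hedged sketch, one formulation consistent with this description writes the integral signal representation as I(t, ωi), which is an assumed symbol not taken from the disclosure, and expresses the cross-correlation as a sum over the K segments of the pattern:

```latex
I(t,\omega_i) = \int_0^{t} \tilde{X}(\tau,\omega_i)\, d\tau

\int_0^{\infty} \tilde{X}(\tau,\omega_i)\, P_i(\tau - t)\, d\tau
  = \sum_{k=1}^{K} \alpha_k \left[ I(t + t_{k,\mathrm{stop}},\omega_i) - I(t + t_{k,\mathrm{start}},\omega_i) \right]
```

The second line follows because Pi(τ − t) equals αk exactly when τ lies between t + tk,start and t + tk,stop, so each segment contributes αk times the integral of the magnitude over that interval.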
The calculation includes one multiplication and one subtraction for each segment in the duration pattern. The value K denotes the number of segments, i.e. K=3 in the example P1(t) from above, for which the cross-correlation with X̃(t, ω1) is calculated as:
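The per-segment calculation can be sketched in Python with a running sum standing in for the integral. The sketch assumes a discrete-time track at one frequency, a K=3 pattern of absence, presence and absence with three frames per segment, and hypothetical names `integral_representation` and `correlate_with_segments`.

```python
from itertools import accumulate


def integral_representation(magnitudes):
    """Return I with I[t] = sum of magnitudes[0:t]; I has len(magnitudes) + 1 entries."""
    return [0.0] + list(accumulate(magnitudes))


def correlate_with_segments(integral, segments, t):
    """One multiplication and one subtraction per segment.

    segments is [(a_k, start_k, stop_k), ...] in frames relative to t, stop exclusive.
    """
    return sum(a * (integral[t + stop] - integral[t + start]) for a, start, stop in segments)


segments = [(-1, 0, 3), (+1, 3, 6), (-1, 6, 9)]  # K = 3
track = [0.1] * 5 + [1.0] * 3 + [0.1] * 5        # tone present in frames 5 to 7
integral = integral_representation(track)

pattern_len = segments[-1][2]
fast = [
    correlate_with_segments(integral, segments, t)
    for t in range(len(track) - pattern_len + 1)
]
best_t = max(range(len(fast)), key=lambda t: fast[t])
print(best_t)
```

The cost per correlation value depends only on the number of segments K and not on the pattern length in frames, so longer (time-stretched) patterns are no more expensive to evaluate than shorter ones.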
The actual detection of the duration pattern Pi at frequency ωi and time t is eventually determined according to δ(t, ωi, Pi).
In another example implementation, the integral signal representation can be calculated over a local signal detector Λ(t, ω):
A simple local signal detector Λ(t, ω) can detect signal presence, i.e. assume a value of one, if the spectral magnitude value X̃(t, ω) exceeds a specified SNR threshold ΓSNR, whereas it can be zero otherwise:
$$\Lambda(t,\omega) = \begin{cases} 1, & \text{if } \tilde{X}(t,\omega) / \tilde{N}(t,\omega) > \Gamma_{\mathrm{SNR}} \\ 0, & \text{otherwise} \end{cases}$$
where Ñ(t, ω) denotes a noise spectral magnitude estimate at time t and frequency ω.
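A minimal sketch of this variant is shown below. It assumes that the SNR is the ratio of the spectral magnitude to the noise magnitude estimate and that the threshold is a linear ratio (not a value in dB). The function names and the numeric values are hypothetical.

```python
from itertools import accumulate


def local_signal_detector(magnitude, noise_estimate, snr_threshold):
    """Return 1 if magnitude / noise_estimate exceeds snr_threshold, else 0."""
    return 1 if magnitude > snr_threshold * noise_estimate else 0


def integral_over_detector(magnitudes, noise_estimates, snr_threshold):
    """Running sum over time of the binary detector output for one frequency."""
    flags = [
        local_signal_detector(x, n, snr_threshold)
        for x, n in zip(magnitudes, noise_estimates)
    ]
    return [0] + list(accumulate(flags))


track = [0.1, 0.1, 1.0, 1.0, 0.1]
noise = [0.1] * len(track)
integral = integral_over_detector(track, noise, snr_threshold=3.0)
print(integral)
```

Because the detector output is binary, the difference of two entries of this running sum counts the frames in which a signal was detected within the corresponding interval.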
It is envisioned that a more sophisticated local signal detector can use a tone, peak or harmonics detector based on more complex functions of spectral magnitude values.
It should be apparent that integral signal representations can also be two-sided, i.e. the integral signal representation may be calculated as a two-sided integral if this is suitable:
It should be apparent that the integral signal representations are calculated in time direction and can be calculated for individual frequency bins of the spectrogram, power ratios of values in the spectrogram or more general functions of the spectrogram, such as a local tone detection measure.
It should be understood that elements or functions of the present invention as described above can be implemented in the form of control logic using computer software in a modular or integrated manner. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will know and appreciate other ways and/or methods to implement the present invention using hardware and a combination of hardware and software.
While the present disclosure has been described with reference to one or more exemplary embodiments, it will be understood by those skilled in the art, that various changes can be made, and equivalents can be substituted for elements thereof without departing from the scope of the present disclosure. In addition, many modifications can be made to adapt a particular situation or material to the teachings of the present disclosure without departing from the scope thereof. Therefore, it is intended that the present disclosure will not be limited to the particular embodiments disclosed herein, but that the disclosure will include all aspects falling within the scope of a fair reading of appended claims.
This application claims the benefit of U.S. Provisional Application No. 62/701,169, filed Jul. 20, 2018.
Number | Date | Country
---|---|---
20200025904 A1 | Jan 2020 | US

Number | Date | Country
---|---|---
62701169 | Jul 2018 | US