The present application relates to sound localization from analysis of acoustic signals gathered by distributed sensor units, and more specifically to the use of acoustic data that has been timestamped via GPS or a similar satellite system.
Sound source location identification has a wide range of potential applications. One example is the pinpointing and tracking of the direction and location of animals (wolves, elephants, dolphins, etc.) in the wild over distances extending from a few hundred meters to many kilometers. Another would be to detect and quickly locate the source of gunshots by law enforcement or by soldiers in a combat environment. It could also be applied to aircraft engine or propeller noise or sonic booms over longer distances, or even to sonar detection and location of underwater vehicles. Yet another application is the location of very low frequency sounds (e.g., from earthquakes, volcanic eruptions, nuclear weapon explosions) at global distances.
In “Sound Source Localization in Wide-range Outdoor Environment Using Distributed Sensor Network”, IEEE Sensors PP (99): 1-1 (October 2019), Faraji et al. discuss a GPS-based sound source identification data acquisition (DAQ) system that is of great interest. Each device is equipped with an array of multiple microphones, an ADC (Analog to Digital Converter), a microprocessor, and an SD card to store the measurement data. The GPS is used for two purposes: to determine the location of each DAQ unit, and to provide a 1 pps (pulse per second) signal to the microprocessor so that measured acoustic signals can be time stamped. When the technology is deployed, multiple DAQ units (sensor nodes) are spread over a range of interest. The algorithm is based on time domain detection and statistical analysis with a method called fuzzy beliefs. Specifically, each DAQ unit or node determines a sound source direction using a delay and sum beamforming (DSB) method; the directions obtained from multiple DAQ units or nodes are then “fuzzified” and fused to determine a most probable sound source location. This technology can successfully detect the source location in an outdoor range of about 240×160×80 m³, with a mean distance error of 6.0 meters.
The technology proposed above has a couple of drawbacks. First, the whole process is based on time domain signal processing and statistical analysis. The accuracy of time domain signal processing is much lower than that of frequency domain processing, because time domain signals are more easily contaminated by noise or unexpected interference. To increase measurement accuracy, the authors adopted multiple microphones on each DAQ and multiple DAQs in the field. In a typical example, 8 sensors are used on each DAQ and 8 DAQs are deployed to construct the complete measurement system, so 64 microphones in total are used to make the complete measurement. With that many redundant measurements and statistical processing, the system is able to achieve high accuracy of location determination; however, the total cost and complexity of the system is very high.
In U.S. Pat. Nos. 6,847,587; 7,586,812; 7,710,278; 7,750,814; and 8,063,773, Patterson, Baxter, Holmes, and Fisher of ShotSpotter, Inc. described versions of a system comprising multiple radio-networked DAQs, each equipped with a microphone, ADC, CPU, GPS time source, power supply, and network interface. A time-domain processing method using envelope analysis is disclosed. The signal processing is applied to the data of each microphone separately; there is no signal processing dealing with the cross terms between multiple channels, such as cross-spectral analysis. A drawback of this approach is a potentially high false alarm rate. Processing methods based on the envelope of a time domain signal have less capability to differentiate sounds with different signatures compared to a frequency domain processing method such as FRF (Frequency Response Function) or cross spectra. U.S. Pat. Nos. 7,266,045 and 8,036,065 were granted to Baxter and Fisher for a wearable gunshot DAQ configuration for targeting in a battlefield or combat environment. The configuration is a distributed network like that in the first set of patents, with radio wireless communication among the wearable DAQs in the field.
In U.S. Pat. No. 5,973,998, Showen and Dunham disclose a method that simply uses cross-correlation to determine the location of a sound source. The algorithm has appeared in textbooks for many years: cross-correlation is a reliable way to detect the time delay between two response signals coming from the same source, and based on the calculated delay and the estimated speed of sound, the location of the sound source can be estimated. This method has much higher accuracy than those described by Fisher and Baxter. The data acquisition system, however, is a centralized structure based on a computer: multiple sensors are connected to the computer, where the data are digitized. The cable length for each sensor can be as long as a few thousand feet, but no more. Showen and Dunham also teach that cross-correlation between pairs of sounds on different sensors should be used because the cross-correlation technique helps to weed out false (i.e., non-correlating) noise.
In U.S. Patent Application Publication 2021/0289168 of Glückert et al., a combined visual and acoustic monitoring system provides event detection, localization, and classification. An event source is spatially localized (at least directionally and preferably with distance information) by the system using time information, then associated with corresponding video information, and the type of event is classified. For the acoustic localization, a multi-channel microphone array continuously captures audio information and provides a timestamp. An acoustic localization algorithm determines the localization of a sound event by determining differences in arrival times at the microphone array (both primary signals and any corresponding secondary echo signals, i.e., reflections, multi-reflections, or resonance effects).
There are roughly two main categories of sound localization hardware. One involves a distributed network of multiple separate DAQs that are geographically spread over the range of interest. Each DAQ has its own microphones, ADCs, CPU, display panel, control panel, data storage, and communication interface. They communicate with each other through a wireless network. Since each DAQ has its own sampling clock, which will necessarily deviate from each other, the localization algorithms are based on single channel signal processing in the time domain, such as envelope analysis, edge detection or statistics analysis. The second category of DAQ hardware configuration involves a centralized data acquisition system of physically connected sensors. The microphones or hydrophones or accelerometer sensors are connected to the input channels of the centralized data acquisition system, where multiple ADCs are installed. The ADCs of all input channels are sampled simultaneously on the same clock with synchronous sampling rate. Because of this, cross-correlation signal processing can be used. The cross-correlation function can be derived from the cross-spectrum in the frequency domain. The time delays can be calculated based on that cross-correlation of pairwise inputs. Based on multiple time delays between sensors, the location can be accurately estimated. However, this configuration is possible only where reliable hardwired connections of the sensors can be constructed. From a cost and deployment standpoint, the distributed architecture, even if somewhat less accurate, is more economical and practical.
Ideally, it would be desirable to have a distributed sensor network architecture in which cross terms such as a cross-correlation function or cross-spectrum can nevertheless be calculated. An ideal hardware system would consist of multiple DAQs that are not hardwired together. Each DAQ would be equipped with a microphone and a data processing unit. The hardware should be ruggedized, easy to install, small in size, and battery powered. When a real-time result is needed instantaneously, the measured data should be transmitted wirelessly to a designated processor to compute the cross terms between each pair of measurements or an array of measurements. The sound source location should appear on a 2D or 3D map within seconds of the event.
The present invention estimates sound source location by associating accurate time stamps with groups of sampled acoustic data recorded by multiple separate data acquisition (DAQ) units, then performing frequency domain cross-spectra calculations and noise removal before transforming back to the time domain for pairwise delay estimations. DAQ microphones measure analog sound signals that are converted to digital format with analog-to-digital converters (ADCs). Both the DAQ locations and a reference time base are derived from the satellite-based Global Positioning System (GPS) or a related reference, which allows time stamps to be applied to blocks of ADC-sampled acoustic data. The recorded digital acoustic data and time stamps are transferred to a central computer for processing. The acoustic data is transformed into the frequency domain, then multiple pairs of the transformed data are subject to cross-spectra computation. Any noise is removed or substantially reduced in the frequency domain. The cross-spectrum computations for the multiple pairs of data are transformed back into the time domain for estimation of time delays between measurements of acoustic signals. From those estimated time delays and the relative positions of the pairs of DAQ microphones, the sound source location can be estimated to greater accuracy than previously possible. The location can be displayed on a map, or geographic coordinates can be otherwise communicated to an interested user.
With reference to
rxy(τ) = ∫−∞+∞ x(t)·y(t−τ) dt
The correlation function rxy(τ) quantifies the conformity of x(t) to the time-shifted signal y(t−τ). The variable τ represents the temporal shift between the two signals. The higher the value of the correlation function, the more similar the two signals are to each other. After the cross-correlation function between two signals is calculated, the time delay between them can be determined using peak or edge detection.
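The peak-picking step just described can be sketched as follows (a minimal illustration in Python/NumPy, not the invention's actual implementation; a fielded system would add windowing and noise handling):

```python
import numpy as np

def estimate_delay(x, y, fs):
    """Estimate the delay of y relative to x, in seconds, by locating
    the peak of their cross-correlation function."""
    n = len(x)
    # Lag axis of the full correlation runs from -(n-1) to +(n-1) samples.
    r = np.correlate(y, x, mode="full")
    lag = int(np.argmax(r)) - (n - 1)
    return lag / fs

# A signal and a copy delayed by 25 samples should yield 25/fs seconds.
fs = 51200.0
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
y = np.roll(x, 25)
delay = estimate_delay(x, y, fs)
```

Note that the resolvable delay here is quantized to one sampling interval; interpolation around the peak can refine it further.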
The data plot in
The data plots in
There is a direct transform relationship between the cross-spectrum Gxy(f) and the correlation function:
Gxy(f) = F(rxy(τ)); and
rxy(τ) = F−1(Gxy(f)),
where F( ) is the Fourier transform and F−1( ) is the inverse Fourier transform. While the correlation function can be obtained in the time domain by an integral operation, there is an advantage to computing it first in the frequency domain through the cross-spectrum and then transforming back to the time domain to generate the correlation function. This is because in the frequency domain there are more tools to remove the noise, i.e., unwanted signals. For example, if a gunshot happens in a very noisy environment while a loud machine is running, or while an airplane or a helicopter is approaching, the gunshot signal can be submerged in the noise. In signal processing theory, noise refers to unwanted signals, and the noise can be louder than the signal of interest.
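The frequency-domain route can be sketched as below (an illustrative assumption: the band-limiting step merely stands in for whatever frequency-domain noise-removal tools are applied in practice):

```python
import numpy as np

def delay_via_cross_spectrum(x, y, fs, band=None):
    """Compute the cross-spectrum Gxy(f) = conj(X(f))*Y(f), optionally
    zero out-of-band bins as a crude frequency-domain noise removal,
    then inverse-transform to the cross-correlation and pick its peak."""
    n = len(x)
    X = np.fft.rfft(x)
    Y = np.fft.rfft(y)
    G = np.conj(X) * Y                      # cross-spectrum
    if band is not None:                    # keep only [f_lo, f_hi]
        f = np.fft.rfftfreq(n, d=1.0 / fs)
        G[(f < band[0]) | (f > band[1])] = 0.0
    r = np.fft.irfft(G, n)                  # back to time domain: rxy(tau)
    lag = int(np.argmax(r))
    if lag > n // 2:                        # wrap large indices to negative lags
        lag -= n
    return lag / fs

fs = 51200.0
rng = np.random.default_rng(1)
x = rng.standard_normal(8192)
y = np.roll(x, 40)                          # y delayed by 40 samples
d_full = delay_via_cross_spectrum(x, y, fs)
d_band = delay_via_cross_spectrum(x, y, fs, band=(500.0, 20000.0))
```

Both the full-band and the band-limited path recover the same 40-sample delay; the band limiting becomes valuable when broadband interference occupies frequencies outside the signal of interest.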
In a concept plot,
However, in the frequency domain, seen in
Another distinct advantage of working in the cross-spectrum domain is the capability of reducing noise by averaging. Multiple frames of the cross-spectrum can be averaged to achieve higher confidence in the estimation. The rule of thumb is that the random error of the source location estimate decreases by a factor of the square root of the number of averages. For example, averaging 4 frames of data reduces the random error by a factor of 2.
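A minimal sketch of such frame averaging (illustrative only; frame length and count are arbitrary example values): the coherent, delayed-signal component adds in phase across frames while uncorrelated noise averages toward zero.

```python
import numpy as np

def averaged_cross_spectrum(x, y, frame_len):
    """Average the cross-spectrum over successive frames of the two records."""
    n_frames = len(x) // frame_len
    G = np.zeros(frame_len // 2 + 1, dtype=complex)
    for k in range(n_frames):
        seg = slice(k * frame_len, (k + 1) * frame_len)
        G += np.conj(np.fft.rfft(x[seg])) * np.fft.rfft(y[seg])
    return G / n_frames

frame_len = 1024
rng = np.random.default_rng(2)
x = rng.standard_normal(8 * frame_len)
# y: x delayed by 10 samples, buried in independent sensor noise.
y = np.roll(x, 10) + 2.0 * rng.standard_normal(x.size)
G_avg = averaged_cross_spectrum(x, y, frame_len)
lag = int(np.argmax(np.fft.irfft(G_avg, frame_len)))
```

Even with the noise power four times the signal power, the averaged cross-spectrum still yields the correct 10-sample lag.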
The cross-correlation function calculation, or any cross-term spectrum calculation, requires that all signals in the computation be digitized simultaneously. We have not found any existing technology that can easily compute the cross-correlation function or cross-spectra when the digitized signals come from ADCs driven by different sampling clocks, as is the case when multiple unconnected DAQs are involved. When the cross-correlation function cannot be computed, one must use time domain methods on each individual sensor to compute the time delays between them. Various edge detection and peak detection methods have been developed based on the envelope of time domain signals, and various statistical methods have been developed to increase the accuracy of estimation. All these methods lack the frequency domain advantages mentioned above.
Regardless of whether cross-correlation techniques are used, the basic principle of estimating the location of a sound source requires as input parameters the speed of sound, the locations of the microphones, and the measured time delays between all pairs of microphones in use.
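Given those three inputs, one simple (hypothetical) way to combine them is a least-squares search over candidate positions; the patent does not prescribe a particular solver, so the grid search below is only an illustration of the principle:

```python
import numpy as np
from itertools import combinations

def locate_source(mic_xy, delays, c, grid):
    """Pick the grid point whose predicted pairwise time-difference-of-
    arrival values best match the measured delays (least squares).
    delays[(i, j)] = arrival time at mic j minus arrival time at mic i."""
    best, best_err = None, np.inf
    for p in grid:
        err = 0.0
        for (i, j), d_meas in delays.items():
            d_pred = (np.linalg.norm(p - mic_xy[j])
                      - np.linalg.norm(p - mic_xy[i])) / c
            err += (d_pred - d_meas) ** 2
        if err < best_err:
            best, best_err = p, err
    return best

# Example: four microphones on the corners of a 100 m square, source at (30, 60).
c = 343.0                                   # speed of sound, m/s
mic_xy = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0], [100.0, 100.0]])
true_src = np.array([30.0, 60.0])
delays = {(i, j): (np.linalg.norm(true_src - mic_xy[j])
                   - np.linalg.norm(true_src - mic_xy[i])) / c
          for i, j in combinations(range(4), 2)}
xs = np.arange(0.0, 101.0, 1.0)
grid = np.array([[gx, gy] for gx in xs for gy in xs])
est = locate_source(mic_xy, delays, c, grid)
```

In practice a closed-form or gradient-based multilateration solver would replace the brute-force grid, but the residual being minimized is the same.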
In U.S. Patent Application Publications 2022/0322259 and 2022/0322260, “Sampling Synchronization through GPS Signals”, a hardware-based method is disclosed to time stamp the digitized data after the ADC transforms the analog data into the digital domain. In my pending U.S. patent application Ser. No. 17/564,654, a hardware and software solution is disclosed whereby the cross-spectrum can be computed based on such time stamped signals. The present invention discusses further development in this domain, targeted at identifying the sound source location.
With reference to
With reference to
Each DAQ unit 71 can be as large as a brick or as small as a mobile phone. In fact, a mobile phone customized with the time stamping function described below would be an ideal wearable DAQ that soldiers could use in the field.
After the measurement data and their corresponding time stamps are obtained, they are saved in the storage media 82 on each DAQ system 71. Since the cross-term calculation requires data from multiple devices, the measurement data must be gathered at a central location to conduct the calculation. The first method, represented by
With reference to
The next processing steps are done at the central computer (or in the cloud). Each of the digital signals is first transformed into the frequency domain 105. Then cross-spectra are computed 106 for multiple pairs of those transformed signals. Additionally, processing 107 in the frequency domain (filtering, averaging, etc.) can be performed to remove noise. The cross-spectra for each signal pair are then transformed 108 back to the time domain. In the time domain, the time delays between the multiple pairs of measurements can be estimated 109, and from those time delays and the respective DAQ positions the sound source is localized 110. Finally, the sound source location can be displayed 111 on a map, or the location coordinates otherwise communicated to the interested users.
In U.S. Patent Application Publications 2022/0322259 and 2022/0322260, the present inventor disclosed a method to accurately apply time stamps to the digital signals that come from each ADC in each DAQ unit. The time base comes from a GPS receiver (or one for a similar satellite system) or an equivalent time source, such as IRIG-B. A time register is configured in an FPGA or CPLD. When a buffer of the data is full (in the FPGA or CPLD), the hardware automatically transfers the value of this time register to a location where the processor can read it. The value in this register is the time stamp corresponding to the last point in the data buffer. Since the whole process is deterministic in timing, the accuracy of the time stamps is guaranteed to 100 ns or better.
The sampling interval, dT, is the inverse of the sampling frequency of the ADC. It is driven by an onboard clock on the DAQ unit. For example, assuming a nominal sampling frequency of 100 kHz, the sampling interval dT will be 1/(100,000 Hz)=10 μs, and 1% of the sampling interval will be 100 ns. Since the sampling clock has a small drift, usually less than 50 ppm, the sampling interval may change slightly over time; for example, at a 10 ppm drift it may change from 10 μs to 10.0001 μs, or to 9.9999 μs. When the sampling intervals on different DAQs drift in this way, the exact sampling times (i.e., the ADC conversion times) will not line up with each other. However, it is possible to extract the time of each ADC conversion; this extracted time is called a time stamp.
In this applicant's pending U.S. patent application Ser. No. 17/564,654, it is mentioned that to compute the cross-spectra of measured signals, each point of the sampled data is time stamped. However, if we generate a time stamp for each sampled data point right after the ADC conversion, and if all those time stamps are transmitted along with their measurement data, the total data quantity will be tremendous. Take as an example a signal sampled at 51.2 kHz, with the time stamps stored in UTC format, which takes roughly 128 bits, i.e., 16 bytes per value. Every second this requires a memory space of 51,200×16 = 819,200 bytes, about 0.8 MB; one hour of recording comes to roughly 2.9 GB just to store the time stamps. The time stamps would take even more space than the actual measurement data. Not only do they take a tremendous amount of space, they also consume CPU resources to move the data around, not to mention the time needed to write them to media such as an SD card. Obviously, this is not desirable.
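The storage arithmetic above can be checked directly, using the example's 51.2 kHz rate and 16-byte UTC stamps:

```python
fs = 51_200                   # samples per second
stamp_bytes = 16              # 128-bit UTC time stamp per sample
per_second = fs * stamp_bytes # bytes of time stamps generated each second
per_hour = per_second * 3600  # bytes of time stamps per hour of recording
# Per-sample stamping costs about 0.8 MB of stamps every second,
# i.e. roughly 2.9 GB per hour, before any measurement data is stored.
```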
Fortunately, we have determined that storing an original time stamp for each data sample is not necessary. A criterion can be established, based on the maximum possible clock drift and the nominal data sampling rate, under which time stamps can be stored at a much longer period than the data sampling interval. In a typical acoustic frequency range, P, the period for storing the original time stamps, can be as long as a few seconds to a few tens of seconds. The stored time stamps are interpolated or extrapolated to calculate the time stamp of each digital point right before the cross-spectrum is computed. This strategy significantly reduces the storage of original time stamps in memory or in the storage media. In the example above, with the sampling rate set to 51.2 kHz and P set to 5 seconds, the storage needed for time stamps is reduced by a factor of 256,000.
The flow chart in
We use the calculated P value as the time stamping interval 127 for storing time stamps on the DAQ. Blocks of measurement data are transmitted 128 with their respective time stamps, one time stamp per block representing P seconds' worth of sampled data. Then, before the cross-spectra are calculated, knowing the nominal data sampling rate and the time stamp for the last data sample in each block, one can reconstruct 129 (via simple interpolation) a corresponding time stamp for each data sample in that block.
Besides calculating P from the input parameters, we can also take measurements to verify whether the selected interval P generates accurate time stamps, by comparing the reconstructed time stamps to the originals. The goal is to store the minimum number of original time stamps while still meeting the accuracy requirement after reconstruction. The reconstruction algorithm can be any simple interpolation algorithm, such as a two-point straight-line interpolation. For example, if two original time stamps are available for 5 seconds of measurement data sampled at 51.2 kHz, the time stamp of any of the 256,000 samples can be calculated.
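The two-point straight-line reconstruction can be sketched as follows (a hypothetical helper illustrating the scheme, using the 51.2 kHz / 5-second example with an assumed 20 ppm clock drift):

```python
import numpy as np

def reconstruct_timestamps(t_first, t_last, n):
    """Two-point straight-line interpolation: rebuild a time stamp for
    each of n samples from just the stamps of the first and last sample
    in the block."""
    return np.linspace(t_first, t_last, n)

# 5 s of data at a nominal 51.2 kHz whose clock runs 20 ppm fast:
fs_nominal = 51200.0
n = 256_000
dt_actual = (1.0 / fs_nominal) * (1.0 - 20e-6)   # true (drifted) interval
true_times = np.arange(n) * dt_actual            # true ADC conversion times
recon = reconstruct_timestamps(true_times[0], true_times[-1], n)
worst_error = np.max(np.abs(recon - true_times))
```

Because clock drift over a few seconds is essentially linear, the reconstruction error here stays far below the 100 ns accuracy of the original hardware time stamps, while only two stamps are stored per block.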
The reduction in the size of the time stamp data described above plays a crucial role in our hardware architecture. When the DAQs communicate with each other or with the cloud via wireless links, the bandwidth is limited and the data rate can be low. While transmitting the measurement data itself is unavoidable, it is highly desirable to make the time stamps as small as possible and transmit them with minimal resources, which the method described here achieves.
During the process of sound source identification, many known signal processing techniques will be used, including envelope analysis, background noise removal using either time or frequency domain methods, signature detection, determining the detection threshold, loudness and sharpness analysis, using redundant measurements to improve the accuracy of estimation, and so on. This invention focuses on how to compute the cross-term functions between any pair of measurement data sets using accurate time stamping technology. Once all the time delays are obtained using cross-correlation functions, with the known coordinate location of each DAQ, the source location can be estimated using methods described in the previous disclosures and in textbooks.
The disclosed technology concerns multiple DAQ units that are separately distributed without hardwired connections between them. The accuracy of estimating the cross-spectrum and cross-correlation functions is as high as that obtained in a centralized data acquisition system with wired connections. Hence, the accuracy of the sound source location estimated using the technology disclosed in this invention is the same as that from a centralized data acquisition system. The main factors influencing the accuracy of estimation will no longer be the instruments; instead, other physical factors, such as wind speed, become dominant.
One drawback of GPS-based technology is that the DAQ must have access to the GPS satellite signals. In our testing, the maximum usable cable length for a GPS receiver antenna is 50 meters. Beyond that, the time base accuracy is degraded enough that cross-spectra calculations cannot be guaranteed.
This application claims priority under 35 U.S.C. 119 (e) from prior U.S. Provisional Application 63/428,186, filed Nov. 28, 2022.