The present invention relates to audio signal processing applications where the direction of arrival of the audio signal(s) is the primary parameter for signal processing. The invention can be used in any application that requires the input audio signal(s) to be processed based on the spatial direction from which the signal arrives.
Applications of this invention include, but are not limited to, audio surveillance systems, hearing aids, voice-command systems, portable communication devices, speech recognition/transcription systems, and any application where it is desirable to process signal(s) based on the direction of arrival.
Directional processing can be used to solve a multitude of audio signal processing problems. In hearing aid applications, for example, directional processing can be used to reduce the environmental noise that originates from spatial directions different from the desired speech or sound, thereby improving the listening comfort and speech perception of the hearing aid user. In audio surveillance, voice-command and portable communication systems, directional processing can be used to enhance the reception of sound originating from a specific direction, thereby enabling these systems to focus on the desired sound. In other systems, directional processing can be used to reject interfering signal(s) originating from specific direction(s), while maintaining the perception of signal(s) originating from all other directions, thereby insulating the systems from the detrimental effect of interfering signal(s). Beamforming is the term used to describe a technique which uses a mathematical model to maximise the directionality of an input device. In such a technique filtering weights may be adjusted in real time or adapted to react to changes in the environment of either the user or the signal source, or both.
Traditionally, directional processing for audio signals has been implemented in the time domain using Finite Impulse Response (FIR) filters and/or simple time-delay elements. For applications dealing with simple narrowband signals these approaches are generally sufficient. To deal with complex broadband signals such as speech, however, these time-domain approaches generally provide poor performance unless significant extra resources, such as large microphone arrays, lengthy filters, complex post-filtering, and high processing power, are committed to the application. Examples of these technologies are described in “Analysis of Noise Reduction and Dereverberation Techniques Based on Microphone Arrays with Postfiltering,” C. Marro, Y. Mahieux and K. U. Simmer, IEEE Trans. Speech and Audio Processing, vol. 6, no. 3, 1998, and in “A Microphone Array for Hearing Aids,” B. Widrow, IEEE Adaptive Systems for Signal Processing, Communications and Control Symposium, pp. 7-11, 2000.
In any directional processing algorithm, an array of two or more sensors is required. For audio directional processing, either omni-directional or directional microphones are used as the sensors.
There are two common types of directional processing algorithms: adaptive beamforming and fixed beamforming. In fixed beamforming, the spatial response—or beampattern—of the algorithm does not change with time, as opposed to a time varying beampattern in adaptive beamforming. A beampattern is a polar graph that illustrates the gain response of the beamforming system at a particular signal frequency over different directions of arrival.
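To make the notion of a beampattern concrete, the sketch below computes the gain of a hypothetical two-microphone end-fire delay-and-sum beamformer at a single frequency over a set of arrival angles, i.e. the data one would plot on the polar graph. The microphone spacing, the speed of sound, the far-field plane-wave model and the delay-and-sum structure are all assumptions chosen for illustration; the function name `beampattern` is hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def beampattern(freq_hz, spacing_m, steer_deg, angles_deg):
    """Gain of a two-microphone delay-and-sum beamformer versus direction of arrival.

    Assumes both microphones lie on the 0-degree axis (end-fire) and far-field plane waves.
    """
    omega = 2.0 * math.pi * freq_hz
    # delay inserted in one channel so that waves from steer_deg add in phase
    tau_steer = spacing_m * math.cos(math.radians(steer_deg)) / SPEED_OF_SOUND
    gains = []
    for theta in angles_deg:
        # inter-microphone delay of a plane wave arriving from angle theta
        tau = spacing_m * math.cos(math.radians(theta)) / SPEED_OF_SOUND
        # residual phase difference between the two channels after steering
        phase = omega * (tau - tau_steer)
        # |(1 + e^{-j*phase}) / 2| simplifies to |cos(phase / 2)|
        gains.append(abs(math.cos(phase / 2.0)))
    return gains


angles = list(range(0, 361, 30))
gains = beampattern(4000.0, 0.02, 0.0, angles)
for a, g in zip(angles, gains):
    print(f"{a:3d} deg  gain {g:.3f}")
```

Evaluating the same function at several frequencies shows that the beampattern of a fixed array changes with frequency, which is the frequency dependence the sub-band approach described later addresses directly.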
More recent Fast Fourier Transform (FFT)-based approaches attempt to improve upon the traditional time-domain approaches by implementing directional processing in the frequency domain. However, many of these FFT-based approaches suffer from wide, highly overlapped sub-bands and therefore provide poor frequency resolution. They also incur longer group delays and require more processing power to compute the FFT.
Accordingly, there is a need to solve the problems noted above and also a need for an innovative approach to enhance and/or replace the current technologies.
The invention described herein is applicable to both the end-fire and broadside microphone configurations in solving the problems found in conventional beamforming solutions. It is also possible to apply the invention to other geometric configurations of the microphone array, as the underlying processing architecture is flexible enough to accommodate a wide range of array configurations. For example, more complex directional systems based on two- or three-dimensional arrays, used to produce three-dimensional beampatterns, are known and are suitable for use with this invention.
In accordance with an aspect of the present invention, there is provided a directional signal processing system for beamforming a plurality of information signals, which includes: a plurality of microphones; an oversampled filterbank comprising at least one analysis filterbank for transforming a plurality of information signals in the time domain from the microphones into a plurality of channel signals in a transform domain, and one synthesis filterbank; and a signal processor for processing the outputs of said analysis filterbank for beamforming said information signals. The synthesis filterbank transforms the outputs of said signal processor into a single information signal in the time domain.
In accordance with a further aspect of the present invention, there is provided a method of processing a plurality of channel signals for achieving approximately linear phase response within the channel, which includes a step of performing filtering by applying more than one filter to at least one channel signal.
In accordance with a further aspect of the present invention, there is provided a method of processing at least one information signal in time domain for achieving approximately linear phase response which includes a step of performing an oversampling using at least one oversampled analysis filterbank. The oversampled analysis filterbank applies at least one fractional delay impulse response to at least one filterbank prototype window time.
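One possible reading of this step, offered only as a sketch: build a short fractional-delay filter and convolve its impulse response with the filterbank prototype window, so that the resulting analysis window carries a sub-sample delay. The Hamming-windowed sinc design, the tap count, the unity-DC-gain normalisation and the helper names (`fractional_delay_taps`, `delayed_prototype`) are assumptions, not the claimed method.

```python
import math


def fractional_delay_taps(num_taps, delay):
    """Hamming-windowed sinc approximating a delay of `delay` samples (may be fractional)."""
    taps = []
    for n in range(num_taps):
        x = n - delay
        sinc = 1.0 if abs(x) < 1e-12 else math.sin(math.pi * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalise to unity gain at DC


def convolve(a, b):
    """Full linear convolution of two real sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def delayed_prototype(prototype, num_taps, delay):
    """Apply a fractional-delay impulse response to a filterbank prototype window."""
    return convolve(prototype, fractional_delay_taps(num_taps, delay))


# Example prototype: a 16-point periodic Hann window (assumed for illustration)
proto = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / 16) for n in range(16)]
shifted = delayed_prototype(proto, num_taps=10, delay=4.5)  # 4.5-sample delay
```

With a centred delay such as 4.5 samples on 10 taps the filter is symmetric, which keeps its own phase response linear; the half-sample offset is what the prototype window inherits.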
The directional processing system of the invention takes advantage of oversampled analysis/synthesis filterbanks to transform the input audio signals from the time domain to a transform domain. Examples of common transformation methods include the GDFT (Generalized Discrete Fourier Transform), FFT, DCT (Discrete Cosine Transform), Wavelet Transform and other generalized transforms. The emphasis of the invention described herein is on a directional processing system employing oversampled filterbanks, with the FFT method being one possible embodiment of said filterbanks. An example of the oversampled, FFT-based filterbanks is described in U.S. Pat. No. 6,236,731 “Filterbank Structure and Method for Filtering and Separating an Information Signal into Different Bands, Particularly for Audio Signal in Hearing Aids” by R. Brennan and T. Schneider, incorporated herein by reference. An example of a hearing aid apparatus employing said oversampled filterbanks is described in U.S. Pat. No. 6,240,192 “Apparatus for and Method for Filtering in an Digital Hearing Aid, Including an Application Specific Integrated Circuit and a Programmable Digital Signal Processor” by R. Brennan and T. Schneider, incorporated herein by reference. However, the use of oversampled analysis/synthesis filterbanks within the general framework of the directional processing system disclosed herein has not been reported before.
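The following is a minimal sketch of an oversampled DFT analysis/synthesis filterbank, included only to show what "oversampled" means in practice: each frame yields `n_fft` complex sub-band samples, but frames advance by only `hop` samples, so the sub-bands are oversampled by `n_fft / hop`. It assumes a window length equal to the FFT size and square-root periodic Hann windows; the filterbank of U.S. Pat. No. 6,236,731 may differ (for example in window length and band stacking), and all names here are hypothetical.

```python
import cmath
import math


def sqrt_hann(n_fft):
    # Square-root periodic Hann: analysis window x synthesis window = periodic Hann
    return [math.sqrt(0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_fft)) for n in range(n_fft)]


def dft(frame):
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spec):
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def analyze(x, n_fft=16, hop=4):
    """Split x into n_fft sub-bands decimated by hop (n_fft must be a multiple of hop, ratio >= 2)."""
    w = sqrt_hann(n_fft)
    padded = [0.0] * n_fft + list(x) + [0.0] * n_fft
    frames = []
    for start in range(0, len(padded) - n_fft + 1, hop):
        frames.append(dft([padded[start + t] * w[t] for t in range(n_fft)]))
    return frames  # frames[m][k] = complex sample of sub-band k at decimated time m


def synthesize(frames, length, n_fft=16, hop=4):
    """Weighted overlap-add; shifted periodic Hann windows sum to n_fft / (2 * hop)."""
    w = sqrt_hann(n_fft)
    out = [0.0] * (length + 2 * n_fft)
    scale = n_fft / (2.0 * hop)
    for m, spec in enumerate(frames):
        seg = idft(spec)
        for t in range(n_fft):
            out[m * hop + t] += seg[t].real * w[t] / scale
    return out[n_fft:n_fft + length]


x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(64)]
y = synthesize(analyze(x), len(x))
print(max(abs(a - b) for a, b in zip(x, y)))  # near zero: the unmodified bank reconstructs x
```

In a directional system, each microphone would be passed through `analyze`, the per-band beamforming would combine the resulting frames, and a single `synthesize` call would return the time-domain output.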
The sub-band signal processing approach described henceforth, with its corresponding FFT-based method being one possible embodiment of the oversampled filterbanks employed in the invention disclosed herein, has the advantage of directly addressing the frequency-dependent characteristics in the directional processing of broadband signals. Compared to traditional time-domain and FFT-based approaches, the advantages of using an oversampled filterbank in sub-band signal processing according to the present invention are as follows:
1) Equal or greater signal processing capability at a fraction of the processing power,
2) Orthogonalization effect of the subband signals in the different frequency bins due to the FFT of the oversampled filterbank,
3) Improved high frequency resolution,
4) Better spatial filtering,
5) Wide range of gain adjustment at a very low cost of processing power, and
6) Ease of integration with other algorithms.
As a result, the sub-band directional processing approach with an oversampled filterbank allows powerful directional processing capability to be implemented on miniature low-power devices. For applications employing the invention, this means:
1) Better listening comfort and speech perception (particularly important for hearing aids),
2) More accurate recognition for speech and speaker recognition systems,
3) Better directionality and higher SNR,
4) Low group delay, and
5) Lower power consumption.
Thus, the present invention is applicable for audio applications that require a high fidelity and ultra low-power processing platform.
A further understanding of the other features, aspects, and advantages of the present invention will be realized by reference to the following description, appended claims, and accompanying drawings.
Embodiments of the invention will now be described with reference to the accompanying drawings, in which:
Turning now to
Oversampled filterbanks offer the general advantages explained in the summary above by virtue of their flexibility and the fabrication technology. Further advantages of their use for the adaptive beamformer application of the present invention are:
1) Directional processing using prior art techniques requires very long adaptive filters, particularly in reverberant environments, as reported by other researchers (see J. E. Greenberg, “Improved Design of Microphone-Array Hearing Aids,” Ph.D. Thesis, MIT, September 1994). Sub-band adaptation using the oversampled filterbank can efficiently implement the equivalent of a long filter through parallel sub-band processing.
2) In frequency-domain beamforming (both adaptive and fixed), there is a need to weight the Fast Fourier Transform (FFT) coefficients in a highly unconstrained way. A typical adaptive post-filtering operation is multiple-microphone Wiener filtering, in which the frequency response is adapted depending on the Signal-to-Noise Ratio (SNR) of the received signal. This process requires unconstrained gain adjustments across the frequency bands. The oversampled filterbank implementation allows a wide range of gain adjustments without creating the so-called “time-aliasing” problem that occurs in critically sampled filterbanks. It has been observed that the operating cost is not much higher than that of critically sampled filterbanks and is much lower than that of undecimated filterbanks. For more information see U.S. Pat. No. 6,236,731 “Filterbank Structure and Method for Filtering and Separating an Information Signal into Different Bands, Particularly for Audio Signal in Hearing Aids,” R. Brennan and T. Schneider, and “A Flexible Filterbank Structure for Extensive Signal Manipulations in Digital Hearing Aids,” R. Brennan and T. Schneider, Proc. IEEE Int. Symp. Circuits and Systems, pp. 569-572, 1998.
3) The so-called “Misadjustment” error, where there is excessive Mean Square Error when compared to an optimal Wiener filter, is typically present in adaptive systems. It is well known and understood that sub-band and orthogonal decomposition reduces this problem. The oversampled filterbank used in the invention employs such decomposition in at least one preferred embodiment.
4) Estimation of Target-to-Jammer Ratio (TJR) usually requires the cross-correlation of two or more microphone outputs (as described in “Improved Design of Microphone-Array Hearing Aids,” J. E. Greenberg, Ph.D Thesis, MIT, September 1994). The frequency domain implementation of the process using the oversampled filterbank is much faster and more efficient than the time-domain methods previously used.
5) By using the side process outputs of the Voice Activity Detector (VAD), the Target-to-Jammer Ratio (TJR) estimator, and the Signal-to-Noise Ratio (SNR) estimator, the adaptation process can be slowed down or totally inhibited when a strong target (such as speech) is present. This enables the system to work in reverberant environments. There are enough pauses in a speech signal to ensure that the inhibition process does not disturb the system performance. A suitable efficient frequency-domain VAD that uses the oversampled filterbank is described in a copending patent application “Sub-band Adaptive Signal Processing in an Oversampled Filterbank,” K. Tam et al., Canadian Patent Application Serial 2,354,808, August 2001, U.S. application Ser. No. ______, incorporated herein by reference.
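Items 1) and 5) above can be illustrated together with a small sketch: an independent complex NLMS canceller runs in each sub-band (a short tapped delay line per band standing in for one long time-domain filter), and adaptation is frozen in frames where a target detector reports the target as present. The NLMS algorithm, the noise-reference structure, the step size and the `target_present` flags are assumptions for illustration; this is not the patent's adaptive beamformer.

```python
import cmath


def subband_nlms(primary, reference, target_present, num_taps=2, mu=0.5, eps=1e-8):
    """Per-band complex NLMS noise canceller with adaptation gated by a target detector.

    primary, reference: lists of frames; frames[m][k] is the complex sample of band k
    at decimated time m. target_present: one boolean per frame (e.g. from a VAD).
    """
    num_bands = len(primary[0])
    weights = [[0j] * num_taps for _ in range(num_bands)]
    history = [[0j] * num_taps for _ in range(num_bands)]  # newest reference sample first
    output = []
    for m, frame in enumerate(primary):
        out_frame = []
        for k in range(num_bands):
            hist = history[k]
            hist.insert(0, reference[m][k])
            hist.pop()
            estimate = sum(w * r for w, r in zip(weights[k], hist))
            err = frame[k] - estimate
            if not target_present[m]:  # adapt only during target pauses
                norm = eps + sum(abs(r) ** 2 for r in hist)
                weights[k] = [w + mu * err * r.conjugate() / norm
                              for w, r in zip(weights[k], hist)]
            out_frame.append(err)
        output.append(out_frame)
    return output, weights


# Synthetic demo: band k of the primary channel is c[k] times the reference
frames = 400
c = [0.7, -0.3 + 0.4j, 0.2j]
ref = [[cmath.exp(1j * (0.37 * m * m + k)) for k in range(3)] for m in range(frames)]
prim = [[c[k] * ref[m][k] for k in range(3)] for m in range(frames)]
out, w = subband_nlms(prim, ref, [False] * frames)
print([round(abs(w[k][0] - c[k]), 6) for k in range(3)])
```

Because each band adapts on its own short, roughly whitened signal, the per-band filters stay small; freezing the update during target activity prevents the canceller from learning to remove the target itself.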
According to a further preferred embodiment of the invention, shown in
$$E_t(I) = E_{tot}(I) - E_n(I), \qquad I = 1, 2, \ldots, B$$
$$\mathrm{SNR}(I) = \frac{E_t(I)}{E_n(I)}$$
If the noise statistics, and noise and target directions do not change much from one target signal pause to the next pause, the SNR(I) for each beam can be used to make a weighted sum of the beams. However, if the noise is highly non-stationary, or if the noise and/or target sources are moving quickly, an adaptive processor should be employed to adjust the weights. For improved performance, the fixed beamformer can be designed with a set of narrow beams covering the azimuth and elevation angles of interest for a particular application.
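The per-beam SNR above and the SNR-weighted sum of beams can be sketched as follows. The sketch assumes the noise energy En(I) of each beam has already been measured during target pauses, floors the target energy at zero when a noisy estimate makes Etot(I) − En(I) negative, and normalises the weights to sum to one; the floor, the normalisation and the helper names are assumptions rather than details from the text.

```python
def beam_snr(total_energy, noise_energy, floor=1e-12):
    """SNR(I) = Et(I) / En(I) with Et(I) = Etot(I) - En(I), for beams I = 1..B."""
    snr = []
    for e_tot, e_n in zip(total_energy, noise_energy):
        e_t = max(e_tot - e_n, 0.0)  # target energy, floored at zero
        snr.append(e_t / max(e_n, floor))
    return snr


def combine_beams(beam_outputs, snr):
    """Weighted sum of beam output samples, with weights proportional to SNR(I)."""
    total = sum(snr)
    if total == 0.0:
        weights = [1.0 / len(snr)] * len(snr)  # no information: equal weights
    else:
        weights = [s / total for s in snr]
    length = len(beam_outputs[0])
    combined = [sum(wt * beam[n] for wt, beam in zip(weights, beam_outputs)) for n in range(length)]
    return combined, weights


snr = beam_snr([2.0, 5.0, 1.0], [1.0, 1.0, 1.0])
combined, weights = combine_beams([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]], snr)
print(snr, weights, combined)
```

As the paragraph notes, such fixed weights are only appropriate when noise statistics and source directions change slowly between target pauses; otherwise an adaptive processor would update them.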
A further embodiment of the invention in a fixed beamforming application will now be discussed. The classical method of implementing a fixed beamformer is the delay-and-sum method. Because of the physical spacing of the microphones in the array, there is an inherent time delay between the signals received at each microphone. Hence, the delay-and-sum method uses a simple time-delay element to align the received signals so that signals arriving from certain directions are maximally in phase and contribute coherently to the summed output signal. Any signal arriving from other directions then contributes incoherently to the output signal, so that its power is reduced at the output.
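A minimal time-domain delay-and-sum sketch, assuming the look-direction delays are whole numbers of samples (real arrays usually need fractional delays, which is part of why the text moves to filter-based and sub-band methods). The function name and signal layout are hypothetical.

```python
import math


def delay_and_sum(mic_signals, delays):
    """Delay each microphone signal by an integer number of samples and average them.

    mic_signals: equal-length lists of samples; delays: non-negative integer delays chosen
    so that a wavefront from the look direction lines up across all channels.
    """
    length = len(mic_signals[0])
    out = [0.0] * length
    for signal, d in zip(mic_signals, delays):
        for n in range(d, length):
            out[n] += signal[n - d] / len(mic_signals)
    return out


# The wavefront reaches microphone A first and microphone B three samples later
s = [math.sin(0.2 * n) for n in range(50)]
mic_a = s
mic_b = [0.0] * 3 + s[:-3]
out = delay_and_sum([mic_a, mic_b], [3, 0])  # delay A by 3 samples to align it with B
```

After alignment the two copies add coherently, so `out[n]` equals `s[n - 3]`; a source at another angle would produce a different inter-microphone delay and would be partially cancelled.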
With the FIR-filter method, the FIR filters are generally designed so that their phase responses take on the role of aligning the received signals to create the desired beampattern. These filters can be designed using transformation from analogue filters or direct FIR filter design approaches. When complex broadband signals are involved, such time-domain filter designs generally require the availability of a significant amount of computation power. For comparison,
Further preferred embodiments of the invention described herein perform a series of narrowband processing steps to solve the more complex broadband problem. The use of the oversampled filterbank allows the narrowband processing to be done in an efficient and practical manner.
The complex-valued gain factors of the beamforming filter can be derived in a number of ways. For example, if an analogue filter has been designed, it can be implemented directly in sub-bands by simply using the centre frequency of each sub-band to look up the corresponding complex response of the analogue filter (frequency sampling). With sufficiently narrow sub-bands, this method can create a close digital equivalent of the analogue filter. In a further embodiment of the invention, to closely approximate the ideal phase and amplitude responses for wider sub-bands, a narrowband filter is applied to each sub-band output, as will now be described in relation to
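The frequency-sampling approach can be sketched as follows. The analogue prototype here is a hypothetical first-order low-pass H(s) = 1/(1 + s/ωc), standing in for whatever analogue beamforming filter has actually been designed; band centres are assumed to follow a DFT filterbank with band 0 at DC, and bands above N/2 are mapped to negative frequencies so that mirrored bands receive conjugate gains and a real input stays real.

```python
import cmath
import math


def analogue_response(freq_hz, cutoff_hz=1000.0):
    # Hypothetical analogue prototype: first-order low-pass H(s) = 1 / (1 + s / wc)
    s = 2j * math.pi * freq_hz
    return 1.0 / (1.0 + s / (2.0 * math.pi * cutoff_hz))


def subband_gains(num_bands, sample_rate_hz, response):
    """Frequency sampling: one complex gain per band, taken at the band centre frequency."""
    gains = []
    for k in range(num_bands):
        # bands above num_bands/2 represent negative frequencies in a DFT filterbank
        centre = k if k <= num_bands // 2 else k - num_bands
        gains.append(response(centre * sample_rate_hz / num_bands))
    return gains


def apply_gains(frame, gains):
    # Multiply each complex sub-band sample by the gain of its band
    return [x * g for x, g in zip(frame, gains)]


gains = subband_gains(16, 16000.0, analogue_response)
print(abs(gains[1]), cmath.phase(gains[1]))  # band centred at 1 kHz, the assumed cut-off
```

The approximation is exact only at each band centre; across a wide band the analogue response varies, which is the limitation the per-band narrowband filter described above is meant to address.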
Almost all implementations of beamformers suffer from a low-frequency roll-off effect. To compensate for this effect, most systems, including the proposed system, introduce low-frequency amplification. However, because of unavoidable internal microphone noise, this inherently leads to a high level of output noise at very low frequencies. As is well known, the result is that the desired beampattern can only be obtained for frequencies above some cut-off value (usually around 1 kHz, depending on the microphone separation distance). In a further embodiment, shown in
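The trade-off between roll-off compensation and noise amplification can be illustrated with a per-band compensation gain that is capped. The sketch assumes a two-microphone end-fire delay-and-subtract (first-order differential) pair, whose on-axis magnitude is |1 − e^{−j2ωd/c}| = 2|sin(ωd/c)|, and a hypothetical maximum gain; below the frequency where the cap is reached, the roll-off is no longer fully equalised, which corresponds to the cut-off behaviour described above.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def differential_response(freq_hz, spacing_m):
    # On-axis magnitude of an assumed end-fire delay-and-subtract pair: 2|sin(w d / c)|
    return abs(2.0 * math.sin(2.0 * math.pi * freq_hz * spacing_m / SPEED_OF_SOUND))


def compensation_gain(freq_hz, spacing_m, max_gain=10.0):
    """Equalise the low-frequency roll-off, but never amplify by more than max_gain."""
    resp = differential_response(freq_hz, spacing_m)
    if resp == 0.0:
        return max_gain
    return min(1.0 / resp, max_gain)


for f in (100.0, 250.0, 500.0, 1000.0, 2000.0):
    print(f, round(compensation_gain(f, 0.01), 2))
```

Raising `max_gain` lowers the frequency down to which the beampattern is equalised, at the cost of amplifying microphone self-noise by the same factor.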
Besides the conventional digital filter design methods, the beamformer filter 710 in
The Cascaded Hybrid Neural Network (CHNN), designed specifically for sub-band signal processing, can be used to implement a beamforming filter. The CHNN consists of two classical neural networks, the Self-Organising Map (SOM) and the Radial Basis Function Network (RBFN), connected in a tapped-delay line structure (for example, see “Adaptive Noise Reduction Using a Cascaded Hybrid Neural Network,” E. Chau, M.Sc. Thesis, School of Engineering, University of Guelph, 2001). The neural network can also be used to provide integrated functions of the ANC, the beamforming filter and other signal processing algorithms in the sub-band signal processing system.
While the present invention has been described with reference to specific embodiments, the description is illustrative of the invention and is not to be construed as limiting the invention. Various modifications may occur to those skilled in the art without departing from the true spirit and scope of the invention as defined by the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
2354858 | Aug 2001 | CA | national |
This application claims priority to copending Canadian Patent Application entitled “Directional Audio Signal Processing Using an Oversampled Filterbank,” having serial number 2,354,858, filed Aug. 8, 2001, which is entirely incorporated herein by reference. This application is a Continuation-In-Part application of U.S. patent application Ser. No. 10/214,350 filed on Aug. 7, 2002, which has been allowed and which is incorporated by reference herein in its entirety.
Relation | Number | Date | Country |
---|---|---|---|
Parent | 10214350 | Aug 2002 | US |
Child | 12013818 | Jan 2008 | US |