1. Field of the Invention
The present invention relates generally to signal and speech processing for coding strategies in medical devices, and more particularly, to hearing prostheses such as cochlear implants.
2. Related Art
There are several electrical stimulation devices that use an electrical signal to stimulate nerve, tissue or muscle fibers in a user. Cochlear implants and similar hearing devices apply a stimulating signal to the cochlea of the ear to stimulate a percept of hearing. More particularly, these systems include a microphone that receives ambient sounds, a signal processor that converts selected sounds according to a speech coding strategy into corresponding stimulating signals, and an implanted electrode array for delivering stimuli to the recipient. The recipient (also referred to as a patient herein) receives a perception of hearing based on the nerve stimulation.
Although hearing implants have been widely used, there is an ongoing need to improve the fidelity of the speech and sound percepts experienced by users.
According to a first aspect of the present invention, there is provided a method for processing sound signals for use in a hearing prosthesis, the method comprising: receiving a signal representative of a sound signal over a frequency range; applying a first filter bank, having a relatively higher spectral resolution, to a first selected region or regions of said frequency range to produce a first set of a plurality of substantially equally spaced channel outputs; applying a second filter bank, having a relatively lower spectral resolution, to a second selected region or regions of said frequency range to produce a second set of a plurality of substantially equally spaced channel outputs; and combining the first and second sets of channel outputs, and processing the combined outputs so as to produce a set of stimulation signals for said hearing prosthesis; wherein the spacing of the second set of channel outputs is greater than the spacing of the first set of channel outputs.
According to another aspect of the present invention, there is provided a hearing prosthesis comprising: a receiver configured to receive a signal representative of a sound signal over a frequency range; a first filter bank, having a relatively higher resolution, adapted to process said received signal and produce a first set of a plurality of substantially equally spaced channel outputs relating to a first selected region or regions of said frequency range; a second filter bank, having a relatively lower resolution, adapted to process said received signal and produce a second set of a plurality of substantially equally spaced channel outputs relating to at least a second region or regions of said frequency range; a combination unit configured to combine the first and second sets of channel outputs; and a processor configured to produce a set of stimulation signals for said hearing prosthesis using the combined outputs; wherein the spacing of the second set of channel outputs is greater than the spacing of the first set of channel outputs.
According to yet another aspect of the present invention, there is provided a system for processing sound signals, the system comprising: means for receiving a signal representative of a sound signal over a frequency range; first means for filtering a first selected region or regions of said frequency range to produce a first set of a plurality of substantially equally spaced channel outputs, wherein the first means for filtering has a relatively higher spectral resolution; second means for filtering a second selected region or regions of said frequency range to produce a second set of a plurality of substantially equally spaced channel outputs, wherein the second means for filtering has a relatively lower spectral resolution; means for combining the first and second sets of channel outputs; and means for processing the combined outputs so as to produce a set of stimulation signals for said hearing prosthesis; wherein the spacing of the second set of channel outputs is greater than the spacing of the first set of channel outputs.
The invention will be described in conjunction with the accompanying drawings, in which:
Embodiments of the present invention recognize that certain areas of the hearing frequency range are of more significance than others to speech perception. Accordingly, instead of employing a conventional approach of generally equally-spaced analysis channels, aspects of the present invention provide more closely spaced analysis channels in one or more regions of the hearing frequency range, thereby providing higher spectral resolution in those selected regions.
In one exemplary embodiment of the present invention, there is provided a new filter bank specification that is to be implemented with speech coding strategies and that may emphasize, with high spectral resolution, the speech fundamental or speech harmonics over a specific region or regions. One advantage of such an embodiment may be to increase spectral cues in one or more parts of the processed audio spectrum. In addition, such a filter bank of the present invention may specify the region or regions in which more spectral harmonics of speech signals are resolved, allowing a prosthetic hearing implant patient to better distinguish different harmonic structures in speech by providing cues to voice-pitch perception, and thus aiding tasks such as identification of a male or female talker, perception of tonal languages and appreciation of music.
Although an exemplary embodiment will be described in use with prosthetic hearing devices, the present invention may also be used in other stimulating applications that require emphasizing particular regions of a spectrum. For example, embodiments may also be applied to other neural stimulation applications, so that higher spectral resolution is provided in some regions of interest than in the broader frequency range of interest.
Examples of prosthetic hearing device systems are shown in U.S. Pat. Nos. 6,537,200, 6,575,894, and 6,697,674, and PCT Published Application No. WO 02/17679, the entire contents and disclosures of which are hereby incorporated by reference herein. In typical prosthetic hearing implant devices, there may be as many as 22-24 electrodes. Depending on the strategy used, a portion of the 22-24 electrodes may carry a transmitted stimulating signal to the nerves in a cochlea.
Embodiments of the present invention may be used in combination with any speech strategy now or later developed, including but not limited to, Continuous Interleaved Sampling (CIS), Spectral PEAK Extraction (SPEAK), and Advanced Combination Encoders (ACE™). An example of such speech strategies is described in U.S. Pat. No. 5,271,397, the entire contents and disclosures of which are hereby incorporated by reference herein. Embodiments of the present invention may also be used with other speech coding strategies. Preferably, the present invention may be used on Cochlear Limited's Nucleus™ implant system, which uses a range of coding strategy alternatives, including SPEAK, ACE™, and CIS. Among other things, these strategies offer a trade-off between temporal and spectral resolution of the coded audio signal by changing the number of frequency channels chosen in the signal path. A typical ACE™ signal path is shown in
Specifically, a signal is received by a microphone (not shown) and is multiplied by a smoothing window and passed through a filter bank process 112 using a Fast Fourier Transform (FFT) to produce 64 signals for channel combination unit 114 to process. In conventional systems, channel combination unit 114 may be limited by the number of electrodes available in the system, e.g. 22 electrodes. Once channel combination unit 114 combines the number of channels to match the number of electrodes, the processed signal is sent to an equalizer 116 and a maxima extractor unit 118. Maxima extractor unit 118 may extract the largest amplitude channels for stimulating the electrodes according to the speech strategy employed. Once the electrodes are chosen, a mapping unit 120 arranges the signals for stimulating the corresponding electrodes.
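The maxima extraction step in the signal path described above can be illustrated with a minimal sketch. It assumes the channel combination stage has already reduced the FFT output to one amplitude per channel. The function name `extract_maxima` and the example amplitudes are hypothetical and are not taken from any actual speech processor implementation.

```python
def extract_maxima(channel_amplitudes, num_maxima):
    """Pick the num_maxima largest-amplitude channels.

    Returns (channel_index, amplitude) pairs sorted by channel index,
    so that a later mapping stage can address electrodes in order.
    """
    indexed = list(enumerate(channel_amplitudes))
    # Sort by amplitude, largest first, and keep the top num_maxima entries.
    largest = sorted(indexed, key=lambda pair: pair[1], reverse=True)[:num_maxima]
    # Restore channel order for the selected entries.
    return sorted(largest)


# Hypothetical amplitudes for six combined channels.
amplitudes = [0.1, 0.9, 0.3, 0.7, 0.2, 0.05]
selected = extract_maxima(amplitudes, 3)
print(selected)
```

With more analysis channels feeding this step, the selection has more candidates to choose from, which is the property the later paragraphs rely on.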
For example, with ACE™ on the commercially available SPrint™ speech processor from Cochlear Limited, the number of analysis filter channels may be varied between 6 and 22, depending on the number of electrodes available and the overall requirements for the filter bank. If the frequency range over which these channels are formed remains constant, e.g. 80 Hz-8000 Hz, then a setting of 6 will consist of a set of 6 wide filters while a setting of 22 will consist of a set of 22 considerably narrower filters. In some cases, overlapping filters may also be desirable, such that more filters does not necessarily mean they will be narrower, but “more overlapped” with other filters. It is known that prosthetic hearing implant patients may be able to make use of both spectral and temporal cues with the stimuli presented to their cochlea, and thus the use of wider filters may provide more temporal information.
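The relationship between the number of channels and the filter width over a fixed range can be sketched numerically. The example below assumes contiguous, non-overlapping bands with logarithmically spaced edges over 80 Hz to 8000 Hz; that spacing rule and the helper names are illustrative assumptions, and real filter banks may instead overlap, as noted above.

```python
def log_band_edges(f_low, f_high, num_filters):
    """Return num_filters + 1 logarithmically spaced band edges in Hz."""
    ratio = f_high / f_low
    return [f_low * ratio ** (i / num_filters) for i in range(num_filters + 1)]


def band_widths(edges):
    """Width in Hz of each band between consecutive edges."""
    return [hi - lo for lo, hi in zip(edges, edges[1:])]


# Same 80 Hz to 8000 Hz range, split into 6 or 22 analysis channels.
widths_6 = band_widths(log_band_edges(80.0, 8000.0, 6))
widths_22 = band_widths(log_band_edges(80.0, 8000.0, 22))

print(f"6 channels: lowest band {widths_6[0]:.1f} Hz wide")
print(f"22 channels: lowest band {widths_22[0]:.1f} Hz wide")
```

Under these assumptions every band in the 22-channel setting is narrower than the band covering the same frequency in the 6-channel setting, which trades temporal information for spectral detail.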
Certain embodiments of the present invention provide a filter bank that may increase the number of channels to enhance any region of the spectrum where finer spectral detail might be required via many narrow filters. Currently, filters with approximately logarithmically spaced center frequencies are typically used in prosthetic hearing implants. An embodiment of the present invention may include a region of high spectral resolution filters within an otherwise logarithmically spaced filter bank. An advantage of the present invention may be to provide more channels in the filter bank path, so that more channels would become available for selection in the following stages of processing, such as maxima extraction. Channel combination unit 114 may be able to increase the number of available channels for selection by post processing modules 106.
The number of channels used in embodiments of the present invention may be more than the number of electrodes present in the system. An additional channel may be placed between each existing electrode channel to emphasize certain regions. For example, an electrode array with 10 electrodes may use 19 channels in processing the audio signal. An increase in the number of channels may allow such embodiments of the present invention to easily accommodate prosthetic hearing implants that have increased numbers of electrodes without any major modifications to the implants.
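The channel count in this example follows from placing one intermediate channel between each pair of adjacent electrodes. A minimal sketch, with hypothetical function names, is:

```python
def interleaved_channel_count(num_electrodes):
    """One channel per electrode plus one between each adjacent pair."""
    return 2 * num_electrodes - 1


def channel_positions(num_electrodes):
    """Channel positions in electrode units.

    Whole numbers sit on an electrode; halves sit between two electrodes.
    """
    return [i / 2 for i in range(interleaved_channel_count(num_electrodes))]


print(interleaved_channel_count(10))  # 10 electrodes, 9 gaps
print(channel_positions(3))
```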
Alternatively, embodiments of the present invention may use any number of filters and are not limited to the number of electrodes in the system, since any number of intermediate stimulation sites may be created via mechanisms such as those described in U.S. Pat. No. 5,649,970, the entire contents and disclosures of which are hereby incorporated by reference.
A filter bank of the present invention may be designed to select a particular harmonic region of the speech spectrum. Any portion of the sound range captured by a prosthetic hearing implant, i.e., approximately 0 Hz to 16000 Hz, may be selected by embodiments of the present invention. The selected portion of the speech spectrum may be divided according to formants, i.e., large concentrations of energy in speech which together determine the characteristic quality of a vowel sound. Examples of regions to select may be the F1 region of speech, approximately 300 Hz to 1000 Hz, or a subset of this region, e.g., 400 Hz to 800 Hz. Another region to select may be the F2 region of speech, approximately 850 Hz to 2500 Hz. Additionally, embodiments of the present invention may be extended to the fundamental frequency range that would target the F0 region of speech, approximately 80 Hz to 400 Hz. In addition, multiple portions or non-consecutive ranges, e.g., 400 Hz to 700 Hz and 1000 Hz to 1500 Hz, may be selected.
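One possible way to represent such a selection is as a list of frequency ranges, which handles non-consecutive regions naturally. The sketch below uses the approximate region limits given above; the names `SPEECH_REGIONS_HZ`, `regions_containing` and `in_selected_regions` are hypothetical.

```python
# Approximate speech regions in Hz, as given in the text.
SPEECH_REGIONS_HZ = {
    "F0": (80.0, 400.0),
    "F1": (300.0, 1000.0),
    "F2": (850.0, 2500.0),
}


def regions_containing(freq_hz):
    """Names of the speech regions whose range includes freq_hz."""
    return [name for name, (lo, hi) in SPEECH_REGIONS_HZ.items()
            if lo <= freq_hz <= hi]


def in_selected_regions(freq_hz, selected):
    """True if freq_hz falls in any of the selected (low, high) ranges."""
    return any(lo <= freq_hz <= hi for lo, hi in selected)


# Two non-consecutive ranges selected for higher resolution.
selected = [(400.0, 700.0), (1000.0, 1500.0)]
print(in_selected_regions(500.0, selected))
print(in_selected_regions(850.0, selected))
print(regions_containing(900.0))
```

Note that the approximate F0, F1 and F2 ranges overlap, so a single frequency can belong to more than one named region.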
Any type of filter bank construction now or later developed may be used, such as FIR, IIR or FFT if implemented in a Digital Signal Processor (DSP). With increasing numbers of channels, it often becomes more efficient to use an FFT. In addition, a dual FFT structure may be used where the high resolution FFT covers the 400 Hz-800 Hz frequency region and a low resolution FFT covers the remaining spectrum.
A filter bank of the present invention may be based on a dual FFT filter bank. The first FFT, low resolution, may have a wide filter (128 pt) and operate over the full audio input bandwidth, which is 0-8 kHz. The second FFT, high resolution, may have a narrower filter (256 pt) and operate over the 0-4 kHz band. The second FFT provides a fourfold increase in resolution for low frequencies compared to standard ACE™ based on a single 128 pt FFT, assuming a 16 kHz sample rate.
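The fourfold figure can be checked from the bin spacing of each FFT, which is the sample rate divided by the FFT length. The sketch below assumes the high resolution path runs at 8 kHz, half the 16 kHz input rate, which is consistent with its 0-4 kHz band. The function names are illustrative only.

```python
import math


def bin_spacing_hz(sample_rate_hz, fft_size):
    """Frequency spacing between adjacent FFT bins."""
    return sample_rate_hz / fft_size


def bins_in_region(sample_rate_hz, fft_size, f_low, f_high):
    """Indices of the bins whose center frequency lies in [f_low, f_high]."""
    spacing = bin_spacing_hz(sample_rate_hz, fft_size)
    first = math.ceil(f_low / spacing)
    last = math.floor(f_high / spacing)
    return list(range(first, last + 1))


low_res = bin_spacing_hz(16000, 128)   # 128 pt FFT at 16 kHz
high_res = bin_spacing_hz(8000, 256)   # 256 pt FFT at 8 kHz (assumed rate)
print(low_res, high_res, low_res / high_res)

# Number of bin centers available inside a 400 Hz to 800 Hz region.
print(len(bins_in_region(16000, 128, 400.0, 800.0)))
print(len(bins_in_region(8000, 256, 400.0, 800.0)))
```

Under these assumptions the high resolution FFT places roughly four times as many bin centers in a given low-frequency region as the low resolution FFT does.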
Because the high resolution 256 pt FFT 206 filter bank requires twice as many samples at half the sample rate Fs, it may have a processing latency four times that of the low resolution 128 pt FFT 204 filter bank. To align the low resolution FFT 204 and the high resolution FFT 206 in time, a FIFO delay 211 (or other similar buffering operation) may be used before the low resolution 128 pt FFT 204 window function, since the high resolution 256 pt FFT 206 will be approximately 12 ms behind. The 12 ms delay results from the processing delay through the high resolution path, which in this example is 16 ms, less the processing delay through the low resolution path, which is 4 ms. The exact length of the FIFO is dependent on the implementation, including the delay through the down sampling low pass filter (LPF) 220. This filter could be an IIR or FIR.
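The delay figures in this example can be reproduced with simple arithmetic. The sketch below models each path's delay as half the duration of its analysis window; this model is an assumption that happens to match the 16 ms and 4 ms values stated above, and it ignores the delay of the down sampling low pass filter, which would lengthen the FIFO in practice.

```python
def half_window_delay_ms(fft_size, sample_rate_hz):
    """Assumed path delay: half the analysis window duration, in ms."""
    return 1000.0 * (fft_size / 2) / sample_rate_hz


INPUT_RATE_HZ = 16000

low_delay_ms = half_window_delay_ms(128, INPUT_RATE_HZ)        # low resolution path
high_delay_ms = half_window_delay_ms(256, INPUT_RATE_HZ // 2)  # high resolution path
fifo_delay_ms = high_delay_ms - low_delay_ms

# FIFO length in samples if the buffer sits in the 16 kHz low resolution path.
fifo_samples = round(fifo_delay_ms * INPUT_RATE_HZ / 1000)

print(low_delay_ms, high_delay_ms, fifo_delay_ms, fifo_samples)
```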
It is illustrative to look at the spectra of two synthetic vowels that are identical except for fundamental frequency. The two vowels, both “a”, with the first having a fundamental frequency of 130 Hz and the second having a fundamental frequency of 180 Hz, both typical speech fundamentals, are shown in
In conventional speech strategy processing, such as ACE™ on the SPrint™ speech processor, available from Cochlear Limited, filters are typically of the order of 180 Hz wide, spaced for example at center frequencies of 250 Hz, 375 Hz, 500 Hz, etc. The 180 Hz width and the overlap between filters mean that a 50 Hz change in the vowel fundamental, and the resulting change in harmonic spacing, produces little change in the energy coming out of each ACE™ filter, and it is this energy that results in audio stimulation. This is shown in
In
Embodiments of the present invention may provide an improved spectral resolution by providing many narrow filters in regions of high harmonic energy. In general, for segments of voiced speech, one or more filters in this region will have a relatively large amount of energy in them, while one or more other nearby filters will have relatively little energy in them. Using more filters in regions of relatively large amounts of energy allows the present invention to give an emphasized cue of the spectral content of a particular region of the speech spectrum.
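The effect of filter width on harmonic cues can be illustrated with a deliberately simplified model. The sketch below treats each filter as an ideal rectangular band and simply counts the harmonics of a 130 Hz or 180 Hz fundamental that fall inside it. The rectangular bands, equal-amplitude harmonics, 31.25 Hz narrow band width and function names are all assumptions made for illustration; real filters have sloped, overlapping responses.

```python
def harmonics(f0, f_max):
    """Harmonic frequencies of f0 up to f_max, in Hz."""
    return [f0 * k for k in range(1, int(f_max // f0) + 1)]


def band_counts(f0, centers, width, f_max=1000.0):
    """Number of harmonics of f0 inside each band [c - w/2, c + w/2)."""
    hs = harmonics(f0, f_max)
    return [sum(1 for h in hs if c - width / 2 <= h < c + width / 2)
            for c in centers]


# Wide filters: about 180 Hz wide at 250, 375 and 500 Hz.
wide_centers = [250.0, 375.0, 500.0]
wide_130 = band_counts(130.0, wide_centers, 180.0)
wide_180 = band_counts(180.0, wide_centers, 180.0)

# Narrow filters: 31.25 Hz wide, centered on multiples of 31.25 Hz
# between 400 Hz and 800 Hz (assumed high resolution region).
narrow_centers = [k * 31.25 for k in range(13, 26)]
narrow_130 = band_counts(130.0, narrow_centers, 31.25)
narrow_180 = band_counts(180.0, narrow_centers, 31.25)

print(wide_130, wide_180)
print(narrow_130)
print(narrow_180)
```

In this idealized model the wide filters each capture one harmonic for either fundamental, so their outputs look alike, whereas the narrow filters light up at different positions for the two fundamentals.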
Using the Nucleus™ Matlab™ Toolbox (NMT), it is possible to examine what happens when spectral resolution is increased with a common prosthetic implant processing strategy, such as ACE™.
The same two vowels “a” with different fundamentals, as shown in
The greatest spectral discrimination between fundamental frequencies for each vowel is given by the last filter bank, as shown in
Embodiments of the present invention enhance the spectral cues, such as those shown in
One example of an implementation of a filter bank for a prosthetic implant speech processor may define a region for the analysis of spectral harmonics with channel spacing equal to or finer than that of the 512 pt FFT, as shown in
A specific implementation of the concept is defined for use in a cochlear implant system using a defined region of 400 Hz to 800 Hz as the target region for the increased resolution. This region carries considerable F1 (first) formant energy for typical voiced speech. The total number of filters used, including the high resolution ones, is 43, i.e., one additional channel in between each existing electrode channel in the Nucleus® 24 system (22+21 in between=43). Because the higher frequency resolution is wanted only in a particular region of the spectrum (400 Hz to 800 Hz), wider filters can be used above and below this region, for example in the logarithmically spaced fashion that is normal with ACE™. Two wider filters are chosen to cover the F0 region, below 400 Hz, and approximately log spaced filters following a shifted version of the natural characteristic cochlear filters are chosen above 800 Hz.
A center frequency plot of an embodiment of the present invention, namely a Harmonic Emphasis Filter bank (HEF), is compared to a SPrint™ ACE™ filter bank as shown in
As shown in
Computer simulation software, such as Simulink™, was used to represent an example of a dual FFT 1002 constructed in accordance with the present invention. As shown in
The magnitude output from the low and high resolution FFTs may be made available as a dual buffer of values representing the energy in each bin of each FFT. A Frequency Allocation Table (FAT) may be arbitrarily constructed to make use of any bins (either single or combined) for the required filter bank.
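A Frequency Allocation Table of this kind can be sketched as a list of entries, each naming which FFT buffer to read and which bins to combine into one channel. The table contents, the buffer lengths, the choice of summing bin values and the name `apply_fat` are hypothetical illustrations and do not reproduce the allocation table of any actual embodiment.

```python
def apply_fat(fat, low_res_bins, high_res_bins):
    """Combine FFT bins into filter bank channels.

    Each FAT entry is (source, first_bin, last_bin), where source is
    "low" or "high" and the bin range is inclusive. The channel value
    is taken here as the sum of the selected bin values (an assumption).
    """
    buffers = {"low": low_res_bins, "high": high_res_bins}
    channels = []
    for source, first_bin, last_bin in fat:
        channels.append(sum(buffers[source][first_bin:last_bin + 1]))
    return channels


# Hypothetical table: a wide low-frequency channel from the low resolution
# FFT, two single-bin channels from the high resolution FFT, and a wider
# channel above the high resolution region from the low resolution FFT.
example_fat = [
    ("low", 1, 2),
    ("high", 13, 13),
    ("high", 14, 14),
    ("low", 7, 8),
]

# Stand-in bin values so the arithmetic is easy to follow.
low_bins = [float(i) for i in range(64)]
high_bins = [0.5 * i for i in range(128)]

channel_values = apply_fat(example_fat, low_bins, high_bins)
print(channel_values)
```

Because the table is plain data, the same combining routine can serve any layout of single and combined bins drawn from either FFT.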
The following example used a bin allocation table for the use of the two FFT outputs as shown in the table illustrated in
Although the present invention has been fully described in conjunction with certain embodiments thereof with reference to the accompanying drawings, it is to be understood that various changes and modifications may be apparent to those skilled in the art. For example, embodiments of the present invention have been described in connection with a prosthetic hearing device. As noted, the present invention may be implemented in any electrical stimulating device now or later developed.
This application is a continuation of U.S. Utility patent application Ser. No. 11/167,283, filed Jun. 28, 2005 entitled “Selective Resolution Speech Processing” and makes reference to and claims the priority of U.S. Provisional Patent Application No. 60/583,013, entitled, “Harmonic Emphasis Filter bank,” filed Jun. 28, 2004. The entire disclosure and contents of the above applications are hereby incorporated by reference.
Number | Date | Country
---|---|---
60583013 | Jun 2004 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 11167283 | Jun 2005 | US
Child | 12772823 | | US