Studies have shown that only about one-third of individuals who have hearing loss utilize a hearing aid. Among those individuals, around one-third do not use their hearing aids regularly. The main reason for this disuse is often the dissatisfaction with the speech quality offered by modern hearing aids, especially in noisy environments where hearing-impaired individuals need them the most. Achieving music appreciation with hearing aids is an even greater challenge.
One highly effective approach for improving the audibility of sound for hearing-impaired users is Wide Dynamic Range Compression (WDRC), which amplifies an audio signal while reducing its dynamic range, or volume swing. WDRC amplifies quiet signals to improve audibility and simultaneously decreases the volume of loud signals to reduce discomfort to a hearing-impaired user.
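By way of illustration only, the static behavior described above can be sketched as a two-segment input-output curve. The function name and every default value below (linear gain, knee point, compression ratio) are hypothetical and do not come from any prescription; the sketch only shows that gain stays constant for quiet inputs and shrinks as the input level rises.

```python
def wdrc_gain_db(level_db, gain_db=30.0, knee_db=45.0, ratio=2.0):
    """Static WDRC gain in dB for an input level in dB SPL.

    All defaults are hypothetical illustration values.
    """
    if level_db <= knee_db:
        # Linear region: quiet sounds receive the full gain.
        output_db = level_db + gain_db
    else:
        # Compression region: output grows by 1/ratio dB per input dB.
        output_db = knee_db + gain_db + (level_db - knee_db) / ratio
    return output_db - level_db


for level in (40, 65, 85):
    print(level, "dB in ->", wdrc_gain_db(level), "dB of gain")
```

With these assumed values a 40 dB input receives the full 30 dB of gain, while an 85 dB input receives only 10 dB.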
Human hearing, however, is inherently frequency-dependent. The human cochlea perceives finer pitch variation at lower frequencies than at higher frequencies. Additionally, hearing loss is also typically frequency dependent, affecting certain frequency ranges more than others. For this reason, the compression gains needed to compensate for hearing loss vary across different frequency bands, necessitating a multiband approach to WDRC. Studies have shown that a greater number of frequency bins increases researchers' flexibility, especially for unusual hearing loss patterns.
In one aspect, a Real-time Multirate Multiband Amplification system is presented herein which addresses the need for finer, more precise gain control in a hearing aid device. The system design provides higher flexibility and accuracy than is currently available on open-source platforms. In one implementation, the system includes:
1) A Multirate Audiometric Filter Bank, offering highly accurate low-latency subband decomposition which can be used for a variety of hearing enhancement algorithms. In this paper, we present a half-octave realization with eleven bands, centered at the standard audiometric frequencies of 250, 375, 500, 750, 1000, 1500, 2000, 3000, 4000, 6000, and 8000 Hz.
2) A Multirate Automatic Gain Control system for WDRC that accurately fulfills the static and dynamic properties specified by audiologists, which include steady state Gains, as well as the dynamics of the Gains realized as the attack and release times of the said Gains in each subband.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.
In one particular implementation presented for illustrative purposes and not as a limitation on the systems and techniques described herein, the multirate amplification system is implemented and tested on the Open Speech Platform (OSP)—an open source suite of software and hardware tools for performing research on emerging hearing aids and hearables. The OSP suite includes a wearable hearing aid, a wireless interface, and a set of hearing enhancement algorithms.
The structure of an audiometric filter bank reflects the spectral nature of the human cochlea, which is inherently logarithmic. The American Speech-Language-Hearing Association (ASHA) defines a set of ten audiometric frequencies used for pure-tone audiometry, which are 0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4, 6, and 8 kHz. These frequencies closely resemble a half-octave logarithmic sequence, and are commonly targeted for audiometric filter banks. However, every other frequency is not a true half-octave frequency, but rather a simplified integer approximation. The audiometric filter bank is a true half-octave channelizer, making it uniformly distributed on the logarithmic scale, as seen from
The American National Standards Institute (ANSI) S1.11 defines specifications for Half-Octave Acoustic filters. The standard includes three classes of filters (class 0, 1, and 2), where class 0 has the strictest tolerances and class 2 has the most lax tolerances. The filter bank meets class 0 standards, the strictest of the three. Accordingly, each band of the filter bank has −75 dB sidelobe attenuation, and the in-band ripple is within ±0.15 dB. The ripple of the composite response of the channelizer is also within ±0.15 dB. It should be noted that, as used herein, ANSI generally refers to the ANSI S3.22 standard, unless otherwise stated.
We ensure that our filter bank has perfect reconstruction by employing complementary filter design. Complementary filters are two filters the sum of which is an all-pass filter. For any highpass or lowpass filter, its complement can be found by subtracting it from an all-pass filter, which is simply an impulse in the time domain. We designed all neighboring filter edges to be complements of each other, ensuring that their sum is an all-pass filter, which guarantees signal reconstruction. The channelizer offers perfect reconstruction within ±0.15 dB.
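The complementary-filter idea can be sketched as follows, for illustration only. The windowed-sinc design, the tap count of 63, and the cutoff are hypothetical choices and are not the filters of the described channelizer. The sketch builds a linear-phase lowpass filter, subtracts it from a delayed unit impulse to obtain its complement, and confirms that the two sum to a pure delay, which is an all-pass response.

```python
import numpy as np


def windowed_sinc_lowpass(num_taps, cutoff):
    """Linear-phase lowpass FIR; cutoff is a fraction of the sampling rate."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(num_taps)
    return h / h.sum()  # normalize to unity gain at DC


def complement(h):
    """Delayed unit impulse minus h (odd length keeps the delay an integer)."""
    assert len(h) % 2 == 1
    delta = np.zeros(len(h))
    delta[(len(h) - 1) // 2] = 1.0
    return delta - h


low = windowed_sinc_lowpass(63, 0.125)   # hypothetical design values
high = complement(low)
total = low + high                       # a unit impulse delayed by 31 samples

magnitude = np.abs(np.fft.fft(total, 256))
print("max deviation from flat response:", np.max(np.abs(magnitude - 1.0)))
```

Because the sum is an impulse, its magnitude response is flat at every frequency, which is the property that lets neighboring band edges add back to the original signal.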
It is well known in the signal processing community that the sharper a digital filter is, the more coefficients it requires. As seen from
The multirate filter bank dramatically reduces both power consumption and latency by employing multirate signal processing. Compared to a single-rate implementation, multirate processing reduces the power consumption by a factor of 13.7, and reduces latency from 32 ms down to 5.4 ms.
The motivation behind multirate processing is to decrease the complexity of a filter by reducing the sampling rate. Table 1 lists the number of taps needed to implement the filters shown in
However, the complexity of a filter can be decreased by reducing the sampling rate. For a given bandpass filter, the relative bandwidth is narrower at a higher sampling rate and wider at a lower sampling rate. Thus, a filter spanning a fixed range of frequencies becomes relatively wider as the sampling rate decreases. As the relative filter bandwidth increases, the number of taps decreases proportionately. For example, when the sampling rate of a filter is halved, the relative bandwidth of the filter doubles, and the number of taps needed to implement it is halved.
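The proportionality between sampling rate and filter length can be illustrated with a common rule of thumb for FIR length, N ≈ (fs / Δf) · (A / 22), where Δf is the transition bandwidth in Hz and A is the stopband attenuation in dB. This estimate is an approximation brought in for illustration and is not the design procedure of the described filter bank. The 100 Hz transition width is hypothetical; the 75 dB attenuation matches the sidelobe figure stated earlier.

```python
import math


def estimate_taps(fs_hz, transition_hz, atten_db):
    """Rule-of-thumb FIR length: (fs / transition width) * (attenuation / 22)."""
    return math.ceil(fs_hz / transition_hz * atten_db / 22.0)


# Same filter edges in Hz, evaluated at successively halved sampling rates.
for fs in (32000, 16000, 8000):
    print(fs, "Hz ->", estimate_taps(fs, 100.0, 75.0), "taps")
```

Each halving of the sampling rate roughly halves the estimated tap count, up to rounding.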
We exploit the unique structure of the multirate, audiometric filter bank to map each frequency octave to a sampling rate. The audiometric channelizer is a half-octave filter bank spanning a frequency range of about 5 octaves, from 250 Hz to 8000 Hz. An octave is a logarithmic unit defined as the difference between two frequencies separated by a factor of two, and a half-octave is the difference between two frequencies separated by a factor of √2. Thus, a half-octave filter bank is spaced on a binary-logarithmic scale, and the bandwidths of any two filters an octave apart differ by a factor of two.
As such, we are able to map each octave of the channelizer to a different sampling rate. We start by designing two bandpass filters at the original sampling rate that span one octave. The next two filters are one octave below, are half as wide, and would require double the number of taps. However, if we lower the sampling rate of the lower octave, the number of taps would decrease by half, resulting in filters of the same length as the ones we started with. Following this pattern, we are able to design all the filters in the audiometric channelizer using the same number of coefficients for each filter.
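For illustration only, the sketch below computes true half-octave center frequencies from 250 Hz to 8000 Hz and assigns each band a sampling rate that halves with every whole octave below the top band. The 32 kHz top rate is the processing rate mentioned for OSP elsewhere in this description; the grouping rule in `octave_rate` is a hypothetical assumption and may differ from the actual band-to-rate assignment.

```python
import math


def half_octave_centers(f_low=250.0, bands=11):
    """Center frequencies spaced by a factor of sqrt(2)."""
    return [f_low * 2 ** (k / 2) for k in range(bands)]


def octave_rate(center_hz, fs_top=32000.0, f_top=8000.0):
    """Hypothetical rule: halve the rate for each whole octave below f_top."""
    octaves_below = math.floor(math.log2(f_top / center_hz) + 1e-9)
    return fs_top / 2 ** octaves_below


centers = half_octave_centers()
for fc in centers:
    print(f"{fc:8.1f} Hz band -> {octave_rate(fc):8.0f} Hz sampling rate")
```

The odd-indexed centers (about 354, 707, 1414, 2828, and 5657 Hz) are the true half-octave values that the familiar integer audiometric frequencies approximate.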
Table 1 compares a single-rate versus a multirate implementation of the channelizer. In the single-rate case, as the bandwidth of the filters is halved for every octave, the number of filter coefficients doubles for every octave. However, in the multirate implementation, we do not increase the filter complexity because the decrease in a filter's bandwidth is compensated by a decrease in the sampling rate. (The 8 kHz band is an exception because it is a highpass rather than a bandpass filter.)
Conventionally, downsampling is performed by passing a signal through an antialiasing filter, and then decimating it. Similarly, conventional upsampling is performed by zero-packing a signal, and then passing it through an interpolating filter. As such, the complexity of conventional resamplers strongly depends on their resampling ratio: a high-ratio downsampler would require a sharp antialiasing filter to remove all unwanted frequencies, and a high-ratio upsampler would require a sharp interpolating filter to remove spectral signal copies. As before, sharp antialiasing and interpolating filters would require many taps, negating the power and latency benefits of multirate processing.
We combat this issue by performing resampling in multiple stages. Since all of our resampling ratios are powers of two, we cascade multiple 1:2 or 2:1 resamplers to achieve the desired resampling ratio. 1:2 and 2:1 resamplers require only a short half-band filter for anti-aliasing and interpolating, which allows us to achieve high reductions of complexity.
We further reduce the complexity of the resamplers by employing polyphase filtering. Conventional resamplers perform many redundant computations, such as computing samples which will be discarded, or computing samples which are known to be zero. Polyphase filtering eliminates these redundant computations by splitting a single filter into multiple paths and employing the Noble identity to rearrange filtering and resampling.
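As an illustrative sketch, and not the implementation used in the described system, the following compares a conventional 2:1 downsampler (filter at the high rate, then discard every second output) with a two-branch polyphase form in which both branches run at the low rate. The function names and the random test data are hypothetical.

```python
import numpy as np


def decimate_direct(x, h):
    """Filter at the high rate, then keep every second output sample."""
    return np.convolve(x, h)[::2]


def decimate_polyphase(x, h):
    """Same output, computed by two half-length filters at the low rate."""
    h_even, h_odd = h[0::2], h[1::2]
    x_even = x[0::2]                            # x[2m]
    x_odd = np.concatenate(([0.0], x[1::2]))    # x[2m - 1], zero before start
    n_out = (len(x) + len(h)) // 2              # length of the decimated output
    out = np.zeros(n_out)
    for branch in (np.convolve(x_even, h_even), np.convolve(x_odd, h_odd)):
        m = min(n_out, len(branch))
        out[:m] += branch[:m]
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal(64)
h = rng.standard_normal(11)
print(np.allclose(decimate_direct(x, h), decimate_polyphase(x, h)))
```

The direct form spends one full filter evaluation per input sample, half of which are thrown away; the polyphase form spends one full filter evaluation per output sample, which halves the multiply count for a 2:1 stage.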
We estimate the cumulative power consumption of the filter bank by computing the total number of multiply-and-accumulate operations per one output sample. For a filter running at a single sampling rate, the number of operations per sample is simply equal to the number of filter taps. However, in a multirate system, samples are continuously removed and added, which makes it impossible to match an input sample to a single output sample. As such, we compute the number of operations per sample of the multirate channelizer by calculating the total number of operations per input frame, and then normalizing by the input frame size. For each stage of the filter bank, we track the current frame size and the cumulative operations count. Due to the multirate structure of the channelizer, normalization by frame size results in a fractional number of operations per sample.
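The frame-based accounting described above can be sketched as follows. The tap counts and rate divisors are hypothetical placeholders chosen only to show the bookkeeping; they are not the values of Table 1 or Table 2.

```python
def ops_per_input_sample(stages, frame_size):
    """Multiply-accumulates per input sample.

    stages: list of (taps, rate_divisor) pairs, where rate_divisor is the
    factor by which that stage's sampling rate is below the input rate.
    """
    total_ops = 0
    for taps, divisor in stages:
        samples_in_frame = frame_size // divisor   # frame shrinks at low rates
        total_ops += taps * samples_in_frame
    return total_ops / frame_size                  # normalize by input frame


# Hypothetical four-octave example: equal-length filters at halved rates
# versus filters that double in length at a single rate.
multirate = [(40, 1), (40, 2), (40, 4), (40, 8)]
single_rate = [(40, 1), (80, 1), (160, 1), (320, 1)]

print(ops_per_input_sample(multirate, 32))    # 75.0
print(ops_per_input_sample(single_rate, 32))  # 600.0
```

A complete estimate would also add the resampler stages to the list, as the multirate figures in this description do.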
Table 2 compares the total number of multiply-and-accumulate operations per sample for a single-rate and multirate implementation of the channelizer. The multirate operations estimate accounts for all filters and resamplers. Our evaluations show that, compared to a conventional approach, the multirate filter bank offers a 13.7× improvement in complexity. For a wearable battery-operated system, power consumption and processing capabilities are of critical importance. Reducing the number of operations improves battery life and frees processing power for other tasks.
As seen from
In order to eliminate this latency disparity, we realign the bands by inserting delays into the signals' paths, as seen in
WDRC is a type of automatic gain control (AGC) system which reduces the dynamic range of audio by applying varying gain to a signal depending on the instantaneous input magnitude. For any instantaneous input magnitude, the WDRC curve, shown in
It is insufficient, however, to set the gain of each audio sample independently. Studies in acoustics and speech intelligibility have shown that the rate of change of WDRC gain has a strong effect on speech clarity and intelligibility. The rate of change of gain is measured using the attack and release times, which play a key role in the performance of WDRC. However, to the best of our knowledge, currently available hearing aids do not have an accurate mechanism for setting attack and release times independently of other parameters. For example, the attack and release times of the Kates system depend on the user-defined compression ratio, which gives rise to major inaccuracies.
In the following we discuss the complex relationship between the attack and release times of WDRC and the parameters defining a WDRC curve. We also propose a multirate compression algorithm which yields precise response times for the dynamics of the WDRC gains, in accordance with ANSI standards for any user-defined WDRC parameters.
Wide Dynamic Range Compression calculates compression gains based on the instantaneous input magnitude. However, sound is a modulating signal, meaning the magnitude of the signal is contained in the envelope. Common approaches to finding the envelope of a modulating signal include peak detection, per-frame total power, sliding RMS windows, and more. However, all these approaches introduce inaccuracies into the envelope estimate, such as ripple or excessive smoothing. We estimate the signal envelope by employing the Hilbert Transform. The Hilbert Transform accepts a real signal and computes a 90-degree phase shifted imaginary component.
The magnitude of the input signal is then found as the absolute value of the complex signal formed by the real and imaginary components.
The accuracy of the Hilbert Transform depends on the accuracy of the underlying Hilbert Filter, which is a filter that cuts off the negative frequencies of the signal spectrum. If the transition bandwidth of the Hilbert Filter overlaps with signal content, then the computed envelope becomes distorted.
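For illustration only, envelope estimation with the Hilbert Transform can be sketched with an FFT-based analytic signal. This differs from the Hilbert Filter discussed above: the FFT version processes a whole block at once and is shown only to make the idea concrete, not as the real-time method. The function name, sampling rate, and test tone are hypothetical.

```python
import numpy as np


def analytic_signal(x):
    """Zero the negative frequencies and double the positive ones."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC as is
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin as is
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)


fs = 4000                                  # hypothetical subband sampling rate
t = np.arange(400) / fs
tone = 0.5 * np.cos(2 * np.pi * 500 * t)   # 50 whole cycles in the block

z = analytic_signal(tone)                  # real part: input; imag: 90-degree shift
envelope = np.abs(z)
print(envelope.min(), envelope.max())
```

For a steady tone the envelope is constant at the tone amplitude, with none of the ripple that a peak detector or short sliding window would leave.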
As seen from
The multirate Hilbert Transform produces highly accurate signal envelopes for all frequency channels of the filter bank.
The ANSI S3.22 Specification of Hearing Aid Characteristics defines the attack and release times for hearing aid devices. Given a step input which changes magnitude from 55 dB to 90 dB, as shown in
The general concept of Automatic Gain Control for WDRC, illustrated in
We derived a closed-form relationship between user compression parameters (compression ratio) and the attack and release times of a hearing aid, and designed an Automatic Gain Control (AGC) loop which yields exact attack and release values for any user-defined compression parameters. Our design builds upon prior work by adapting radio AGC to Wide Dynamic Range Compression. The block diagram of the AGC algorithm is shown in
In this section, we derive the relationship between α and WDRC parameters such that the system yields exact attack and release times in any configuration. The behavior of the system above is described by the equation below.
Consider the ANSI test signal, which is a step input which changes magnitude from 55 dB to 90 dB at time n=0. Let us define G0 as the initial steady state gain before the step change. For n<0, R[n]=A1, X[n]=55, so G0=R[n]−X[n]=A1-55.
Let us define G∞ as the final steady state gain after the step change. For all times n≥0, R[n]=A2, X[n]=90, so G∞=R[n]−X[n]=A2-90. Using these definitions, for all n≥0, equation 1 can be rewritten as:
In order to gain insight into the behavior of the system, let us write out the gains of the first few samples:
As seen from the pattern formed in equation 3, the gain of the n'th sample is found as a geometric series, shown in equation 4a and simplified in equation 4b.
This important result provides us with an equation for gain as a function of time and α. As expected, at time n=0 the gain is equal to G0, and as n reaches infinity the gain approaches G∞.
Using the equation above, we can use known values of n to solve for α. As explained earlier, α is the only parameter which sets the attack and release times of the AGC system. Let AT represent the attack time. From the ANSI definition of attack time, we know that at time n=AT, the gain needs to be within 3 dB of steady state, which is G∞+3. Substituting these values into equation 4b yields:
The equation above contains only one unknown variable, allowing us to solve for αattack:
Following similar steps and using the ANSI definition for release time, we can find a similar expression for αrelease:
Equations 6 and 7 provide us with values for αattack and αrelease that guarantee exact attack and release times for the AGC loop. It is important to note that in equations 6 and 7, the units for AT and RT are samples. Samples and milliseconds are related to each other through the sampling rate, which, as described earlier, varies between the different subbands.
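The numbered equations are not reproduced in this text, so the following is only a sketch under stated assumptions. It assumes a first-order loop in which the gain moves toward its steady-state value by a fraction α per sample, G[n+1] = G[n] + α(G∞ − G[n]), which gives G[n] = G∞ + (G0 − G∞)(1 − α)^n and matches the behavior described above (G0 at n=0, approaching G∞ as n grows). Solving for the sample at which the gain error equals a tolerance yields α = 1 − (tolerance / |G0 − G∞|)^(1/N). The 3 dB attack tolerance is from the text; the 4 dB release tolerance, the gain values, and the 32 kHz rate are assumptions.

```python
def alpha_for_time(overshoot_db, tolerance_db, time_samples):
    """Coefficient that shrinks the gain error from overshoot to tolerance."""
    return 1.0 - (tolerance_db / abs(overshoot_db)) ** (1.0 / time_samples)


def simulate_gain(g0_db, g_inf_db, alpha, num_samples):
    """Assumed first-order loop: step the gain toward g_inf_db each sample."""
    gains = [g0_db]
    for _ in range(num_samples):
        g = gains[-1]
        gains.append(g + alpha * (g_inf_db - g))
    return gains


# Hypothetical gains: 30 dB at a 55 dB input, 12.5 dB at a 90 dB input.
g_low_input, g_high_input = 30.0, 12.5
attack_samples = 320     # 10 ms at an assumed 32 kHz rate
release_samples = 640    # 20 ms at an assumed 32 kHz rate

a_attack = alpha_for_time(g_low_input - g_high_input, 3.0, attack_samples)
a_release = alpha_for_time(g_low_input - g_high_input, 4.0, release_samples)

attack = simulate_gain(g_low_input, g_high_input, a_attack, attack_samples)
release = simulate_gain(g_high_input, g_low_input, a_release, release_samples)
print(attack[-1], release[-1])   # about 15.5 dB and 26.0 dB
```

Because AT and RT enter in samples, a subband running at a lower sampling rate needs the same time in milliseconds converted to fewer samples before α is computed.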
It can be noted that the difference G0−G∞ is none other than the Overshoot pictured in
Another feature of the AGC loop, shown in
For purposes of illustrating the systems and techniques described herein and not as limitation thereon, the audiometric filter bank has been integrated into the Open Speech Platform (OSP), which is an open-source suite of hardware and software tools for conducting research into many aspects of hearing loss both in the lab and the field. The hardware system consists of a battery-operated wearable device running a Qualcomm 410c processor, similar to those in cellphones, with two ear-level assemblies attached, one for each ear.
At the core of OSP software is the real-time Master Hearing Aid (RT-MHA) reference design. Initially, the incoming audio signal from the microphones is sampled at 48 kHz and is then downsampled to 32 kHz (not to be confused with the resamplers present in the channelizer). The audio signal is then routed to the channelizer.
The outputs of the channelizer then pass through the WDRC unit to compensate for the user's hearing loss. Then the amplified outputs are recombined and passed through a Global Maximum Power Output (MPO) controller in order to limit the power output by the speaker. Finally, the audio is upsampled from 32 kHz back to 48 kHz and output through the speakers. Additionally, the RT-MHA reference design contains Adaptive Feedback Cancellation (AFC) in order to compensate for the feedback arising from the close proximity of the microphone and the speaker. More detailed explanations of the RT-MHA components can be found in L. Pisha et al., “A wearable, extensible, open-source platform for hearing healthcare research,” IEEE Access, vol. 7, 2019, and D. Sengupta et al., “Open speech platform: Democratizing hearing aid research,” in Proceedings of the 14th EAI International Conference on Pervasive Computing Technologies for Healthcare, 2020.
We evaluated the design using the widely accepted Audioscan Verifit 2 Professional Verification system. Verifit 2 is a verification tool consisting of a soundproof binaural audio chamber, a display unit, and a set of powerful testing procedures, such as speech map, ANSI tests, and distortion.
We conducted steady state input-output measurements to evaluate the multirate amplification system running on Open Speech Platform hardware. The purpose of this test is to compare the experimentally measured input-output curve of our device to the ideal target curve specified by a hearing loss prescription. In this experiment, the hearing aid device is placed into the soundproof audio chamber. The Verifit's reference speaker plays calibrated audio signals with known acoustical properties into the hearing aid microphone, which becomes the input signal for the hearing aid. The processed output signal of the hearing aid is then collected by the Verifit's coupler microphone and is compared to the input signal to identify the measured gain.
We verified our system using seven of the ten standard pure tone audiograms developed by the International Standard for Measuring Advanced Digital Hearing Aids (ISMADHA) group, which represent a broad class of hearing loss patterns, from very mild to profound. We obtained compression parameters for a subset of ISMADHA using the NAL-NL2 Prescription Procedure, which is a widely accepted algorithm for generating hearing aid prescriptions from pure tone audiograms.
We performed steady state measurements at the eleven half-octave frequencies offered by the audiometric filter bank. For each frequency, we obtained the target compression curves, such as the ones shown in
Comparison with Other Work
We compared the (i) Multirate Audiometric Filter Bank and (ii) Multirate Wide Dynamic Range Compression System with the Kates Digital Hearing Aid, one of the most popular open-source tools for hearing aid research.
In one aspect, the systems and techniques described herein improve the spectral resolution of hearing aids.
We also used the Verifit's input-output curve feature to compare the prescription accuracy of the multirate eleven-band system versus the Kates system.
Table 4 compares the complexity and latency of the Kates filter bank and the eleven-band filter bank. In addition to offering almost twice the number of bands compared to Kates's filter bank, the proposed filter bank achieves about 3.5 times reduction in complexity, with a comparable algorithmic latency of 5.43 ms.
We also compared our Multirate Multiband Automatic Gain Control with Kates's approach. As described above, the relationship between WDRC parameters and AGC response times is not explored in previous works. In the Kates approach, the AGC response times are controlled by the coefficients of the peak detector used to estimate the signal magnitude. The resulting coefficients are approximated to meet ANSI attack and release time standards, but diverge from target values significantly.
As a test case,
In this example, the measured attack and release times of the Multirate system are 10.2 ms and 20.5 ms respectively, which deviate from the target values by 0.2 ms (2%) and 0.5 ms (2.5%). On the other hand, the measured attack and release times of the Kates system are 4.4 ms and 37.3 ms respectively, which is a 5.6 ms (56%) and 17.3 ms (87%) deviation from the target values. This experiment shows that the Multirate system described herein satisfies attack and release times within 0.5 ms of the target value. However, the Kates system yields attack and release time values that diverge significantly (by more than 50%) from the target. Furthermore, this error is unpredictable because the internal coefficients responsible for attack and release times of the Kates system are designed to be “fudge” factors.
The Multirate systems described herein offer very accurate fulfillment of user (e.g., audiologist) designated attack and release times. However, neither the current standards nor popular HA prescription tools provide guidance for the dynamic aspects of dynamic range compression.
In summary, a real-time multirate, multiband amplification system for hearing aids has been described herein. The system improves upon the prescription accuracy of hearing aids and provides an open-source tool for hearing loss research.
We designed a channelizer offering eleven frequency sub-bands centered at the standard frequencies used in pure-tone audiometry, with high side-lobe attenuation and low ripple. This high frequency resolution allows our hearing aid system to accurately satisfy hearing aid prescriptions, even for complex and unusual hearing loss patterns. The channelizer uses multirate processing to reduce the complexity by about 14× compared to a single-rate implementation. By employing minimum-phase filters, we decreased the latency of our filter bank to 5.43 ms, which is within the conventional threshold for modern hearing aids.
We also designed an automatic gain control (AGC) system which provides accurate control of the steady state and dynamic behavior of dynamic range compression. We use the Hilbert Transform to find the instantaneous signal magnitude, which provides higher accuracy than conventional instantaneous power estimation methods. Furthermore, we derived the closed-form relationship between the compression parameters of our AGC loop, and the attack and release times at the output. The accurate fulfilment of attack and release times in dynamic range compression opens new opportunities for exploring the relationship between response times and hearing impaired users' satisfaction.
In one example, the Multirate Multiband Amplification System was implemented on Open Speech Platform—an open-source suite of hardware and software tools for hearing loss research. The system runs in real-time on a wearable device and is suited for hearing loss research both in the lab and in the field.
The particular systems and methods described above have been presented for illustrative purposes and not as a limitation on the subject matter described herein. More generally, in one aspect, a method is presented for performing frequency sub-channelization. In accordance with the method, a digital signal is received at an original sampling rate. A plurality of multirate frequency channels is produced by dividing the digital signal into an integer number of multirate frequency channels such that a sampling rate of each of the multirate frequency channels is proportional to a center frequency of the frequency channel. Signal processing is performed on each of the multirate frequency channels. The original sampling rate is reconstructed using the multirate frequency channels.
In some embodiments the digital signal is a digital audio signal and dividing the digital audio signal into an integer number of multirate frequency channels includes dividing the digital audio signal into an integer number of multirate frequency channels per octave.
In some embodiments the method further includes recombining the upsampled multirate frequency channels.
In some embodiments the signal processing performed on each of the multirate frequency sub-bands includes automatic gain control (AGC) for wide dynamic range compression (WDRC).
In some embodiments the AGC for WDRC uses an algorithm that has a closed form relationship between user compression parameters and compression gains and compression attack and release times.
In some embodiments each respective multirate frequency channel is sampled at a rate that is proportional to a frequency of an octave to which the multirate frequency channel belongs.
In another aspect, a hearing aid device is presented. The hearing aid includes a microphone, a multi-band hearing aid processing circuit, and a speaker. The microphone is configured to receive an audible input signal from an environment and convert the audible input signal to an electrical audio input signal. The multi-band hearing aid processing circuit is configured for processing the electrical audio input signal. The multi-band hearing aid processing circuit is further configured to: receive the electrical audio input signal and produce a digital signal at an original sampling rate; produce a plurality of multirate frequency channels that divide the digital signal into an integer number of multirate frequency channels per octave; perform envelope detection on each of the multirate frequency channels; perform automatic gain control (AGC) for WDRC using the detected envelope of each of the multirate frequency channels using an algorithm that has a closed form relationship between user compression parameters and compression gains and compression attack and release times; upsample the multirate frequency channels to the original sampling rate; and recombine the upsampled multirate frequency channels to produce an electrical audio output signal. The speaker is configured to receive the electrical audio output signal from the multi-band hearing aid processing circuit and emit an audible output signal into an ear of a user.
In some embodiments the envelope detection is performed using a Hilbert Transform.
In some embodiments the Hilbert Filter utilized in the Hilbert Transform is a minimum phase Hilbert Filter.
In some embodiments the envelope detection is performed using a peak detector.
In some embodiments the envelope detection is performed using a frame-based power estimation technique.
The claimed subject matter may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer to implement the disclosed subject matter. For instance, the claimed subject matter may be implemented as a computer-readable storage medium embedded with a computer executable program, which encompasses a computer program accessible from any computer-readable storage device or storage media. For example, computer readable storage media can include but are not limited to magnetic storage devices (e.g., hard disk, floppy disk, magnetic strips . . . ), optical disks (e.g., compact disk (CD), digital versatile disk (DVD) . . . ), smart cards, and flash memory devices (e.g., card, stick, key drive . . . ). However, computer readable storage media do not include transitory forms of storage such as propagating signals, for example. Of course, those skilled in the art will recognize many modifications may be made to this configuration without departing from the scope or spirit of the claimed subject matter.
Some of the elements described in the disclosed embodiments may be implemented as modules that define an isolatable element that performs a defined function and has a defined interface to other elements. The blocks described in this disclosure may be implemented as modules in hardware, a combination of hardware and software, firmware, or a combination thereof. For example, modules may be implemented using computer hardware in combination with software routine(s) written in a computer language (MATLAB, Java, HTML, XML, PHP, Python, ActionScript, JavaScript, Ruby, Prolog, SQL, VBScript, Visual Basic, Perl, C, C++, Objective-C or the like). Additionally, it may be possible to implement modules using physical hardware that incorporates discrete or programmable analog, digital and/or quantum hardware. Examples of programmable hardware include: computers, microcontrollers, microprocessors, application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), and complex programmable logic devices (CPLDs). Computers, microcontrollers and microprocessors are programmed using languages such as assembly, C, C++ or the like, FPGAs, ASICs and CPLDs are often programmed using hardware description languages (HDL) such as VHSIC hardware description language (VHDL) or Verilog that configure connections between internal hardware modules with lesser functionality on a programmable device. Finally, it needs to be emphasized that the above mentioned technologies may be used in combination to achieve the result of a functional module.
In addition, it should be understood that any figures that highlight any functionality and/or advantages, are presented for example purposes only. The disclosed architecture is sufficiently flexible and configurable, such that it may be utilized in ways other than that shown. For example, instructions listed in any block may be re-ordered, combined with other instructions, or only optionally used in some embodiments.
The foregoing description, for the purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the embodiments and its practical applications, to thereby enable others skilled in the art to best utilize the embodiments and various modifications as may be suited to the particular use contemplated. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein but may be modified within the scope and equivalent of the appended claims.
This application claims the benefit of U.S. Provisional Application Ser. No. 63/273,512, filed Oct. 29, 2021, the contents of which are incorporated herein by reference.
This invention was made with government support under DC015046 and DC015436 awarded by the National Institutes of Health, and under 1838830 awarded by the National Science Foundation. The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2022/048465 | 10/31/2022 | WO |
Number | Date | Country | |
---|---|---|---|
63273512 | Oct 2021 | US |