This disclosure relates generally to hearing assistance devices, and more particularly to frequency translation by high-frequency spectral envelope warping in hearing assistance devices.
Hearing assistance devices, such as hearing aids, include, but are not limited to, devices for use in the ear, in the ear canal, completely in the canal, and behind the ear. Such devices have been developed to ameliorate the effects of hearing loss. Hearing deficiencies range from deafness to losses in which the individual has impaired sensitivity at particular frequencies or difficulty distinguishing sounds that occur simultaneously. In its most elementary form, a hearing assistance device provides auditory correction by amplifying and filtering environmental sound so that the individual hears better than without amplification.
In order for the individual to benefit from amplification and filtering, they must have residual hearing in the frequency regions where the amplification will occur. If they have lost all hearing in those regions, then amplification and filtering will not benefit the patient at those frequencies, and they will be unable to receive speech cues that occur in those frequency regions. Frequency translation processing recodes high-frequency sounds at lower frequencies where the individual's hearing loss is less severe, allowing them to receive auditory cues that cannot be made audible by amplification.
One way of enhancing hearing for a hearing-impaired person was proposed by Hermansen, Fink, Hartmann, and Hansen in 1993 (“Hearing Aids for Profoundly Deaf People Based on a New Parametric Concept,” Hermansen, K.; Fink, F. K.; Hartmann, U.; Hansen, V. M.; Final Program and Paper Summaries, 1993 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 17-20 Oct. 1993, pp. 89-92). They proposed constructing a vocal tract (formant) model by linear predictive analysis of the speech signal and decomposing the prediction filter coefficients into formant parameters (frequency, magnitude, and bandwidth). A speech signal was then synthesized by filtering the linear prediction residual with a modified vocal tract model in which any high-frequency formants outside the hearing range of the hearing-impaired person were transposed to lower frequencies at which they could be heard. They also suggested that formants in low-frequency regions need not be transposed. However, this approach limits the amount of transposition that can be performed without distorting the low-frequency portion of the spectrum (e.g., the part containing the first two formants). Because the entire signal is represented by a formant model and resynthesized from the modified (transposed) model, the entire signal may be considerably altered, especially when large transposition factors are used for patients having severe hearing loss at mid and high frequencies. In such cases, even the part of the signal that was originally audible to the patient is significantly distorted by the transposition process.
In U.S. Pat. No. 5,571,299, Melanson presented an extension of the work of Hermansen et al. in which the prediction filter is modified directly to warp the spectral envelope, avoiding the computationally expensive conversion of the filter coefficients into formant parameters. Allpass filters are inserted between the stages of a lattice implementation of the prediction filter, and the fractional-sample delays introduced by the allpass filters determine the warping applied to the spectral envelope. One drawback of this approach is that it does not provide direct and complete control over the shape of the warping function, that is, over the relationship between input frequency and transposed output frequency. Only certain input-output frequency relationships are available using this method.
In U.S. Pat. No. 5,014,319, Leibman describes a frequency transposition hearing aid that classifies incoming sound according to its frequency content and selects an appropriate transposition factor on the basis of that classification. The transposition is implemented using a variable-rate playback mechanism (the sound is played back at a slower rate to transpose it to lower frequencies) in conjunction with a selective discard algorithm that minimizes loss of information while keeping latency low. This scheme was implemented in the AVR TranSonic™ and ImpaCt™ hearing aids. However, in at least one study, this variable-rate playback approach has been shown to lack effectiveness in increasing speech intelligibility. See, for example, “Preliminary results with the AVR ImpaCt Frequency-Transposing Hearing Aid,” McDermott, H. J.; Knight, M. R.; J. Am. Acad. Audiol., 2001 March; 12(3):121-7, and “Improvements in Speech Perception with use of the AVR TranSonic Frequency-Transposing Hearing Aid,” McDermott, H. J.; Dorkos, V. P.; Dean, M. R.; Ching, T. Y.; J. Speech Lang. Hear. Res., 1999 December; 42(6):1323-35. A disadvantage of this approach is that the entire spectrum of the signal is transposed, so the pitch of the signal is altered. To address this deficiency, the method uses a switching system that enables transposition only when the spectrum is dominated by high-frequency energy, as during consonants. This switching system may introduce errors, especially in noisy or complex acoustic environments, and may disable transposition for some signals that could benefit from it.
In U.S. Patent Application Publication 2004 0264721 (issued as U.S. Pat. No. 7,248,711), Allegro et al. describe a method for frequency transposition in a hearing aid in which a nonlinear frequency transposition function is applied to the spectrum. In contrast to Leibman, this algorithm involves no classification or switching; instead, it transposes low frequencies weakly and linearly and high frequencies more strongly. One drawback of this method is that it may introduce distortion when transposing pitched signals having significant energy at high frequencies. Because the transposition function (the input-output frequency relationship) is nonlinear, transposed harmonic structures become inharmonic. This artifact is especially noticeable when the inharmonic transposed signal overlaps the spectrum of the non-transposed harmonic structure at lower frequencies.
The Allegro algorithm is described as a frequency-domain algorithm, and resynthesis may be performed using a vocoder-like algorithm or an inverse Fourier transform. Frequency-domain transposition algorithms (in which the transposition processing is applied to the Fourier transform of the input signal) are the most often cited in the patent and scholarly literature (see, for example, Simpson et al., 2005; Turner and Hurtig, 1999; U.S. Pat. No. 6,577,739; U.S. Patent Application Publication 2004 0264721 (issued as U.S. Pat. No. 7,248,711); and PCT Patent Application WO 0075920): “Improvements in speech perception with an experimental nonlinear frequency compression hearing device,” Simpson, A.; Hersbach, A. A.; McDermott, H. J.; Int J Audiol. 2005 May;44(5):281-92; and “Proportional frequency compression of speech for listeners with sensorineural hearing loss,” Turner, C. W.; Hurtig, R. R.; J Acoust Soc Am. 1999 August;106(2):877-86. Not all of these methods render transposed harmonic structures inharmonic, but they all share the drawback that the pitch of transposed harmonic signals is altered.
Kuk et al. (2006) discuss a frequency transposition algorithm implemented in the Widex Inteo hearing aid, in which energy in the one-octave neighborhood of the highest-energy peak above a threshold frequency is transposed downward by one or two octaves (a factor of two or four) and mixed with the original unprocessed signal (“Linear Frequency Transposition: Extending the Audibility of High-Frequency Information,” Francis Kuk; Petri Korhonen; Heidi Peeters; Denise Keenan; Anders Jessen; and Henning Andersen; Hearing Review, October 2006). As in other frequency-domain methods, one drawback of this approach is that high frequencies are transposed to lower frequencies, resulting in unnatural pitch shifts of the sound. Additional artifacts are introduced when the harmonic structure of the transposed signal overlaps the spectrum of the non-transposed harmonic structure at lower frequencies.
Therefore, there is a need for a system that improves intelligibility in hearing assistance devices without degrading natural sound quality.
Disclosed herein, among other things, is a system for frequency translation by high-frequency spectral envelope warping in a hearing assistance device for a wearer. According to various embodiments, the present subject matter includes a method for processing an audio signal received by a hearing assistance device, including: filtering the audio signal at a splitting frequency to generate a high-frequency filtered signal; transposing at least a portion of an audio spectrum of the filtered signal to a lower frequency range by a transposition process to produce a transposed audio signal; and summing the transposed audio signal with the audio signal to generate an output signal. The transposition process includes: estimating an all-pole spectral envelope of the filtered signal from a plurality of line spectral frequencies; applying a warping function to the all-pole spectral envelope of the filtered signal to translate the poles above a specified knee frequency to lower frequencies, thereby producing a warped spectral envelope; and exciting the warped spectral envelope with an excitation signal to synthesize the transposed audio signal. In various embodiments, the line spectral frequencies are estimated from a set of linear prediction coefficients, and the warping function is applied to the line spectral frequencies. The transposed audio signal may also be scaled before it is summed with the audio signal. The filtering includes, but is not limited to, high-pass filtering or band-pass filtering of the high-frequency region. In various embodiments, the estimating includes performing linear prediction, and the estimating may be done in the frequency domain or in the time domain.
In various embodiments, the pole frequencies are translated toward the knee frequency, either linearly using a warping factor or non-linearly, for example using a logarithmic or other nonlinear function. Such translations may be limited to poles above the knee frequency.
In various embodiments, the excitation signal is a prediction error signal, produced by filtering the high-pass signal with the inverse of the estimated all-pole spectral envelope. In various embodiments, the phase of the prediction error signal is randomized by: transforming the prediction error signal to the frequency domain using a discrete Fourier transform (DFT); randomizing the phases of components below the Nyquist frequency; replacing components above the Nyquist frequency with the complex conjugates of the corresponding components below the Nyquist frequency, producing a valid spectrum of a purely real time-domain signal; inverting the DFT to produce a time-domain signal; and using that time-domain signal as the excitation signal. In various embodiments, the prediction error signal is processed using, among other things, a compressor, a peak limiter, or another nonlinear distortion to reduce the peak dynamic range of the excitation signal. In various embodiments, the excitation signal is a spectrally shaped (filtered) noise signal.
In various embodiments the system includes combining the transposed signal with a low-pass filtered version of the audio signal to produce a combined output signal, and in some embodiments the transposed signal is adjusted by a gain factor prior to combining.
The system also provides the ability to modify pole magnitudes and frequencies.
In various embodiments, the system uses line spectral frequencies in several ways to simplify the computations of the frequency translation process.
This Summary is an overview of some of the teachings of the present application and not intended to be an exclusive or exhaustive treatment of the present subject matter. Further details about the present subject matter are found in the detailed description and appended claims. The scope of the present invention is defined by the appended claims and their legal equivalents.
The following detailed description of the present subject matter refers to subject matter in the accompanying drawings which show, by way of illustration, specific aspects and embodiments in which the present subject matter may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the present subject matter. References to “an”, “one”, or “various” embodiments in this disclosure are not necessarily to the same embodiment, and such references contemplate more than one embodiment. The following detailed description is demonstrative and not to be taken in a limiting sense. The scope of the present subject matter is defined by the appended claims, along with the full scope of legal equivalents to which such claims are entitled.
The present subject matter relates to improved speech intelligibility in a hearing assistance device using frequency translation by high-frequency spectral envelope warping. The system described herein implements an algorithm for performing frequency translation in an audio signal processing device for the purpose of improving perceived sound quality and speech intelligibility in an audio signal when presented using a system having reduced bandwidth relative to the original signal, or when presented to a hearing-impaired listener sensitive to only a reduced range of acoustic frequencies.
One goal of the proposed system is to improve speech intelligibility in the reduced-bandwidth presentation of the processed signal, without compromising the overall sound quality, that is, without introducing undesirable perceptual artifacts in the processed signal. In embodiments implemented in a real-time listening device, such as a hearing aid, the system must conform to the computation, latency, and storage constraints of such real-time signal processing systems.
Hearing Assistance Device Application
In one application, the present frequency translation system is incorporated into a hearing assistance device to provide improved speech intelligibility without undesirable perceptual artifacts in the processed signal.
Frequency Translation System Example
The system of
In one embodiment, the filter 210 in the lower block is omitted. In one embodiment the filter 210 is replaced by a simple delay compensating for the delay incurred by filtering in the upper processing branch.
In one embodiment the frequency translation processor 320 is programmed to perform a piecewise linear frequency warping function. Greater detail of one embodiment is provided in
The three algorithm parameters described above, the splitting frequency, the warping function knee frequency, and the warping ratio, determine which parts of the spectral envelope are processed by frequency translation, and the amount of translation that occurs.
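The roles of the knee frequency and the warping ratio can be illustrated with a small sketch. It assumes NumPy and a piecewise-linear warp whose slope above the knee is 1/ratio; the function name `warp_frequency` and this exact parameterization of the warping ratio are hypothetical, since the figure defining the function is not reproduced here. The splitting frequency does not appear in the warp itself; it only selects which band is analyzed and translated.

```python
import numpy as np


def warp_frequency(f_hz, knee_hz, ratio):
    """Piecewise-linear frequency warping function (assumed form).

    Frequencies at or below knee_hz are returned unchanged; frequencies above
    it are mapped to knee_hz + (f - knee_hz) / ratio, so ratio > 1 moves them
    down toward the knee and ratio == 1 is the identity.
    """
    f_hz = np.asarray(f_hz, dtype=float)
    return np.where(f_hz <= knee_hz, f_hz, knee_hz + (f_hz - knee_hz) / ratio)


# Knee at 2 kHz and warping ratio 2: 4 kHz maps to 3 kHz and 6 kHz to 4 kHz.
print(warp_frequency([1000, 2000, 4000, 6000], knee_hz=2000, ratio=2.0))
# -> [1000. 2000. 3000. 4000.]
```

Because the map is continuous and strictly increasing, the ordering of peaks is preserved; a logarithmic or other nonlinear warp could be substituted in the same place.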
The frequency warping function governs the behavior of the frequency translation processor, whose function is to alter the shape of the spectral envelope of the processed signal. In such embodiments, the pitch of the signal is not changed, because the frequency translation process affects the spectral envelope and not the fine structure. This process is depicted in
In one embodiment of the present system a whitened excitation signal, derived from linear predictive analysis, is processed using a warped spectral envelope filter to construct a new signal whose spectral envelope is a warped version of the envelope of the input signal, having peaks above the knee frequency translated to lower frequencies. In one embodiment, the peak frequencies are computed directly from the values of the complex poles in the filter derived by linear prediction. In one embodiment the peak frequencies are estimated by examination of the frequency response of the filter. Other approaches for determining the peak frequencies are possible without departing from the scope of the present subject matter.
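Computing peak frequencies directly from the complex poles of the linear predictor can be sketched as below. The sketch assumes NumPy, linear prediction by the autocorrelation method with the Levinson-Durbin recursion (one common choice; the text does not specify the method), and the convention A(z) = 1 + a1 z^-1 + ... + aM z^-M. The names `lpc` and `pole_frequencies` are hypothetical.

```python
import numpy as np


def lpc(x, order):
    """Autocorrelation-method linear prediction (Levinson-Durbin recursion).

    Returns a = [1, a1, ..., a_order], the coefficients of A(z), so that
    1/A(z) models the all-pole spectral envelope of x.
    """
    x = np.asarray(x, dtype=float)
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err   # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def pole_frequencies(a, fs):
    """Frequencies (Hz) of the complex poles of 1/A(z), one per conjugate pair."""
    poles = np.roots(a)
    upper = poles[poles.imag > 1e-9]   # real poles (e.g. at Nyquist) are ignored
    return np.sort(np.angle(upper) * fs / (2 * np.pi))
```

For a second-order predictor fitted to a 3 kHz tone sampled at 16 kHz, `pole_frequencies(lpc(x, 2), 16000)` returns a single value close to 3000.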
By translating the peak frequencies according to the frequency warping function described above, a new warped spectral envelope is specified which is used to determine the coefficients of the warped spectral envelope filter. In one embodiment, the filter pole frequencies can be modified directly, so that the spectral envelope described by the filter is warped, and peak frequencies above the knee frequency (such as 2 kHz shown in
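Modifying the filter pole frequencies directly can be sketched as follows, assuming NumPy and a piecewise-linear warp defined by a knee frequency and a warping ratio. Each complex pole above the knee keeps its radius (and hence, approximately, its peak bandwidth) while its angle is moved; real poles and poles at or below the knee are left untouched. `warp_poles` is a hypothetical name.

```python
import numpy as np


def warp_poles(a, fs, knee_hz, ratio):
    """Move complex poles of 1/A(z) above knee_hz toward the knee, keeping radii.

    The piecewise-linear map f -> knee_hz + (f - knee_hz) / ratio is an
    assumed warping function. Returns the warped polynomial A_w(z).
    """
    warped = []
    for p in np.roots(a):
        f = abs(np.angle(p)) * fs / (2 * np.pi)
        if abs(p.imag) > 1e-9 and f > knee_hz:
            f_new = knee_hz + (f - knee_hz) / ratio
            # The sign of the angle is kept so conjugate pairs stay conjugate.
            warped.append(abs(p) * np.exp(1j * np.sign(p.imag) * 2 * np.pi * f_new / fs))
        else:
            warped.append(p)   # real poles and poles below the knee are unchanged
    return np.real(np.poly(warped))
```

With poles at 1 kHz and 5 kHz (fs = 16 kHz), a 2 kHz knee, and a ratio of 2, the 1 kHz pole stays where it is and the 5 kHz pole moves to 3.5 kHz with the same radius.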
The whitened excitation signal, derived from linear predictive analysis, may be further processed to mitigate artifacts that are introduced when the high-frequency part of the input signal contains very strong tonal or sinusoidal components. For example, the excitation signal may be made maximally noise-like (and less impulsive) by a phase randomization process. This can be achieved in the frequency domain by computing the discrete Fourier transform (DFT) of the excitation signal and expressing the complex spectrum in polar form (magnitude and phase, or angle). The phases of the components below the Nyquist frequency (half the sampling frequency) are replaced by random values, and the components above the Nyquist frequency are made equal to the complex conjugates of the corresponding components below the Nyquist frequency (mirrored about the Nyquist component), so that the representation corresponds to a real time-domain signal; for the same reason, the DC and Nyquist components must remain real-valued. This frequency-domain representation is then inverted to obtain a new excitation signal.
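The phase randomization can be sketched with NumPy's full-length FFT so that the mirroring step is explicit. The function name is hypothetical. Note that the DC bin, and for even lengths the Nyquist bin, must stay real for the result to be a real signal, so the sketch keeps those two bins real (phase 0 or π) and randomizes only the bins strictly between them.

```python
import numpy as np


def randomize_phase(e, rng):
    """Replace the phase spectrum of e with random values, keeping |DFT| and realness.

    Bins strictly between DC and Nyquist get uniform random phases; bins above
    Nyquist are set to the complex conjugates of their mirror images, and the
    DC (and, for even lengths, Nyquist) bins are kept real.
    """
    e = np.asarray(e, dtype=float)
    n = len(e)
    spec = np.fft.fft(e)
    out = np.zeros(n, dtype=complex)
    out[0] = spec[0].real                        # DC bin stays real
    k = np.arange(1, (n + 1) // 2)               # bins strictly below Nyquist
    out[k] = np.abs(spec[k]) * np.exp(1j * rng.uniform(-np.pi, np.pi, k.size))
    if n % 2 == 0:
        out[n // 2] = spec[n // 2].real          # Nyquist bin stays real
    out[n - k] = np.conj(out[k])                 # mirror about the Nyquist bin
    return np.fft.ifft(out).real
```

Applied to a single impulse, whose DFT magnitude is flat, the output keeps the same magnitude spectrum but spreads its energy over the whole frame, which is the reduction in impulsiveness the text describes.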
In various alternative embodiments, the excitation signal may be replaced by a shaped (filtered) noise signal. The noise may be shaped to have a speech-like spectrum, or it may be shaped by a high-pass filter, possibly the same splitting filter used to isolate the high-frequency part of the input signal. In such an implementation, it is generally not necessary to compute the excitation (prediction error) signal in the linear predictive analysis stage.
In other alternative embodiments, the excitation signal may be subjected to dynamics processing, such as dynamic range compression or limiting, or to nonlinear waveform distortion, to reduce its impulsiveness and thereby the artifacts associated with frequency transposition of signals having strongly tonal high-frequency components.
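As one illustration of reducing the impulsiveness of the excitation, the sketch below applies a memoryless tanh soft clipper whose ceiling is set relative to the signal's RMS level. This is an assumed stand-in for the compressor, limiter, or waveform distortion mentioned above, not a specified design; `soft_limit`, `crest_factor`, and the `ceiling_rms` parameter are hypothetical.

```python
import numpy as np


def crest_factor(x):
    """Peak-to-RMS ratio, a simple measure of impulsiveness."""
    x = np.asarray(x, dtype=float)
    return np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2))


def soft_limit(e, ceiling_rms=3.0):
    """Memoryless tanh soft clipper with its ceiling set relative to the RMS level.

    Samples well below the ceiling pass almost unchanged; peaks are squashed
    so that |output| < ceiling_rms * RMS(e).
    """
    e = np.asarray(e, dtype=float)
    ceiling = ceiling_rms * np.sqrt(np.mean(e ** 2))
    return ceiling * np.tanh(e / ceiling)
```

A noise-like excitation containing an isolated large spike comes out with a much lower crest factor, while its low-level samples are nearly unchanged.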
The output of the frequency translation processor, consisting of the high-frequency part of the input signal having its spectral envelope warped so that peaks in the envelope are translated to lower frequencies, and optionally scaled by a gain control, is combined with the original, unmodified signal to produce the output of the algorithm.
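The processing chain can be assembled into a minimal block-based sketch. It assumes NumPy, a single block of samples, a brick-wall FFT mask as a stand-in for the splitting filter, linear prediction by the Levinson-Durbin recursion, and a piecewise-linear pole warp. All function names are hypothetical, and a real-time implementation would instead run frame by frame with carried filter states.

```python
import numpy as np


def lpc(x, order):
    """Autocorrelation-method LPC; returns a = [1, a1, ..., a_order]."""
    x = np.asarray(x, dtype=float)
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def warp_poles(a, fs, knee_hz, ratio):
    """Piecewise-linear warp of complex pole frequencies above the knee."""
    warped = []
    for p in np.roots(a):
        f = abs(np.angle(p)) * fs / (2 * np.pi)
        if abs(p.imag) > 1e-9 and f > knee_hz:
            f_new = knee_hz + (f - knee_hz) / ratio
            warped.append(abs(p) * np.exp(1j * np.sign(p.imag) * 2 * np.pi * f_new / fs))
        else:
            warped.append(p)
    return np.real(np.poly(warped))


def all_pole_filter(a, e):
    """y[n] = e[n] - sum_{k>=1} a[k] y[n-k], assuming a[0] == 1."""
    y = np.zeros(len(e))
    for n in range(len(e)):
        acc = e[n]
        for k in range(1, min(len(a), n + 1)):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y


def split_high(x, fs, split_hz):
    """Brick-wall high-pass by FFT masking: a stand-in for the splitting filter."""
    X = np.fft.rfft(x)
    X[np.fft.rfftfreq(len(x), 1.0 / fs) < split_hz] = 0.0
    return np.fft.irfft(X, n=len(x))


def translate_block(x, fs, split_hz, knee_hz, ratio, order=8, gain=1.0):
    high = split_high(x, fs, split_hz)
    a = lpc(high, order)
    e = np.convolve(high, a)[: len(high)]                      # whitening FIR -> excitation
    t = all_pole_filter(warp_poles(a, fs, knee_hz, ratio), e)  # warped envelope synthesis
    return x + gain * t                                        # mix with unmodified signal
```

With `ratio=1.0` the warped envelope equals the original one, so the synthesis filter exactly undoes the whitening and the translated branch reduces to the high-pass signal itself, which makes a convenient sanity check.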
The present system provides precise control over the energy injected at lower frequencies as a function of the energy present at higher frequencies.
Time Domain Spectral Envelope Warping Example
In the time domain process of
It is understood that variations in process order and particular filters may be substituted in systems without departing from the scope of the present subject matter.
Frequency Domain Spectral Envelope Warping Example
In the frequency domain process of
The complex sub-band excitation signal {E(wk)} and the complex frequency response {H(wk)} are multiplied 1010 to provide a sampled warped spectral envelope signal in the frequency domain, {X(wk)}. According to one embodiment of the present subject matter, this signal can be further processed in the frequency domain by other processes and ultimately converted to the time domain for output as processed sound.
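The frequency-domain multiplication can be sketched as follows, assuming NumPy, a warped predictor polynomial `a_w` with `a_w[0] = 1`, and one FFT frame of excitation. H(wk) is sampled by taking the zero-padded FFT of the polynomial coefficients and inverting it. Multiplying the spectra of a single unwindowed frame corresponds to circular convolution, so a practical system would use windowed, overlapped frames or a filter bank. The function names are hypothetical.

```python
import numpy as np


def warped_envelope_response(a_w, n_fft, gain=1.0):
    """Sample H(w_k) = gain / A_w(e^{j w_k}) at w_k = 2*pi*k/n_fft, k = 0..n_fft//2.

    np.fft.rfft zero-pads a_w to n_fft points and evaluates
    sum_m a_w[m] * exp(-1j * w_k * m), i.e. A_w on the unit circle.
    """
    return gain / np.fft.rfft(a_w, n=n_fft)


def synthesize_frame(e_frame, a_w, gain=1.0):
    """X(w_k) = E(w_k) * H(w_k) for one frame of excitation."""
    E = np.fft.rfft(e_frame)
    return E * warped_envelope_response(a_w, len(e_frame), gain)
```

After any further frequency-domain processing, `np.fft.irfft(X, n=len(e_frame))` returns the frame to the time domain.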
Examples of Combined Whitening and Shaping Filters
In some embodiments, computational savings can be achieved by combining the application of the all-zero FIR filter, to generate the prediction error signal, and the application of the all-pole warped spectral envelope filter to the excitation signal, into a single filtering step.
The all-pole spectral envelope filter is normally implemented as a cascade (or sequence) of second-order filter sections, so-called biquad sections or biquads. Those skilled in the art will recognize that, for reasons of numerical stability and accuracy, as well as efficiency, high-order recursive filters should be implemented as a cascade of low-order filter sections. In an all-pole filter, each biquad section has only two poles in its transfer function and no (non-trivial) zeros. However, the zeros of the FIR filter can be implemented in the biquad sections along with the spectral envelope poles, in which case the FIR filtering step of the original frequency translation algorithm can be eliminated entirely. An example is provided by the system 1100 in
In
In one embodiment, the zeros corresponding to (unwarped) roots of the predictor polynomial are paired in a single biquad section with their counterpart warped poles in the frequency translation algorithm. Since not all poles in the spectral envelope are transformed by the frequency translation algorithm (only complex poles above a specified knee frequency), some of the biquad sections that result from this pairing will have unity transfer functions (the zeros and unwarped poles coincide). Because these sections have no effect on the signal, they can be omitted entirely, saving computation and improving filter stability.
In the present frequency translation algorithm, the high-pass splitting filter makes poles on the positive real axis uncommon, but poles are frequently found on the negative real axis (that is, at the Nyquist frequency, half the sampling frequency). These poles should not be warped; they should remain real poles at the Nyquist frequency in the warped spectral envelope. Moreover, a pole may lie below the knee frequency of the warping function, and such a pole need not be warped. Poles whose frequencies are not warped can be omitted entirely from the filter design. For example, for a predictor of order 8 (four second-order sections), if one pole pair is found on the negative real axis, omitting one second-order section saves 25% of the filtering cost. If, in addition, one complex pole pair lies below the knee frequency, the savings increase to 50%.
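The pairing of whitening-filter zeros with warped poles, and the omission of sections that would be unity, can be sketched as below. It assumes NumPy, real predictor coefficients, and a piecewise-linear warp; `combined_sections` and `sections_response` are hypothetical names, and `sections_response` exists only to check the cascade against A(z)/A_w(z).

```python
import numpy as np


def combined_sections(a, fs, knee_hz, ratio):
    """Biquads realizing H(z) = A(z) / A_w(z), one per warped complex pole pair.

    Each section pairs a zero pair of the whitening filter A(z) with the pole
    pair it is warped into. Roots that are not warped (real roots, such as a
    pole at the Nyquist frequency, and complex roots at or below the knee)
    would produce unity sections, so they are skipped.
    """
    roots = np.roots(a)
    sections = []
    for z in roots[roots.imag > 1e-9]:                # one root per conjugate pair
        f = np.angle(z) * fs / (2 * np.pi)
        if f <= knee_hz:
            continue                                  # zero cancels pole: omit
        f_w = knee_hz + (f - knee_hz) / ratio         # assumed piecewise-linear warp
        p = abs(z) * np.exp(2j * np.pi * f_w / fs)
        b = np.real(np.poly([z, np.conj(z)]))         # numerator: original zeros
        den = np.real(np.poly([p, np.conj(p)]))       # denominator: warped poles
        sections.append((b, den))
    return sections


def sections_response(sections, w):
    """Frequency response of the cascade at radian frequencies w (for checking)."""
    zinv = np.exp(-1j * np.asarray(w, dtype=float))
    h = np.ones_like(zinv)
    for b, den in sections:
        h *= np.polyval(b[::-1], zinv) / np.polyval(den[::-1], zinv)
    return h
```

For an order-8 predictor with two real roots on the negative real axis and one complex pair below the knee, only two of the four possible sections remain, which matches the 50% saving described above.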
In addition to achieving some computational savings, this modification may make the biquad filter sections more numerically stable. In some embodiments, for reasons of numerical stability and accuracy, filter sections including both poles and zeros are implemented, rather than only poles.
It is understood that the system of
In various embodiments, the processes for performing frequency translation depicted in the block 122 of
One approach that eases computational complexity is to find the line spectral frequencies that describe the predictor polynomial A(k). They are the angles of the roots of the palindromic and anti-palindromic polynomials defined by:
P(m)=A(m)+A(M+1−m), and
Q(m)=A(m)−A(M+1−m)
for m=0, ..., M+1, where M is the order of the polynomial A(k) and A(M+1) is equal to 0. When A(k) is minimum phase, as it is when the prediction coefficients are computed by the autocorrelation method, the roots of these polynomials are guaranteed to lie on the unit circle in the complex plane, and therefore can be found using one-dimensional search techniques (rather than the two-dimensional search needed to find the roots of A(k)). The original polynomial can be reconstructed as:
A(m)=(P(m)+Q(m))/2
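These definitions translate directly into a short sketch, assuming NumPy and coefficients a = [A(0), ..., A(M)] with A(0) = 1. Appending A(M+1) = 0 and reversing the array gives A(M+1-m), so P and Q (each with M+2 coefficients, m = 0, ..., M+1) are a sum and a difference. Root angles are found here with `np.roots` for clarity, and the trivial roots at z = 1 and z = -1 are discarded. The function names are hypothetical.

```python
import numpy as np


def lsf_polynomials(a):
    """P(m) = A(m) + A(M+1-m) and Q(m) = A(m) - A(M+1-m), m = 0..M+1, A(M+1) = 0."""
    a_ext = np.append(np.asarray(a, dtype=float), 0.0)
    return a_ext + a_ext[::-1], a_ext - a_ext[::-1]


def line_spectral_frequencies(a, tol=1e-6):
    """Angles in (0, pi) of the roots of P and Q, excluding trivial roots at z = 1, -1."""
    p, q = lsf_polynomials(a)

    def angles(c):
        w = np.angle(np.roots(c))
        return np.sort(w[(w > tol) & (w < np.pi - tol)])

    return angles(p), angles(q)
```

For a minimum-phase A of order M, this yields M line spectral frequencies in total, and the P and Q frequencies alternate (interlace) along the unit circle; (p + q) / 2 recovers A with a trailing zero.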
The polynomials P and Q have at least two advantages over the predictor polynomial A. First, they are less sensitive to quantization errors. The corruption of the coefficients that occurs in quantization has little effect on the stability or shape of the resulting spectral envelope, whereas small errors in the coefficients of A may introduce large distortions in the spectral envelope and may make the all-pole filter unstable (by moving a pole outside the unit circle). Moreover, all the coefficients of P and Q are approximately equally sensitive to errors, whereas in the polynomial A the higher-order coefficients are much more sensitive to errors.
Another advantage, which motivates their use in spectral envelope warping, is that all of the roots of both P and Q lie on the unit circle in the Z-plane. For speech coding, this is an advantage because only the root frequencies need to be stored and transmitted (hence the term “line spectral frequencies”); the magnitudes are always unity. In the present application, this property means that the roots of these polynomials are much easier to find than those of A itself. For example, the roots can be identified as the zeros (minima) in the magnitude of the discrete Fourier transform (or its efficient implementation, the FFT) of the polynomial coefficients. In this way, the precision with which the roots are found can easily be traded against computational cost through the length of the DFT (a longer DFT gives more precise root frequencies at the cost of more computation). Other one-dimensional search techniques can also be used to find the roots of P and Q, since the roots are known to lie on the unit circle in the complex plane. Such techniques for estimating line spectral frequencies have been shown to be very efficient, and for low-order polynomials, well-known closed-form solutions exist for computing the roots (such as the quadratic formula for the roots of a second-order polynomial).
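The DFT-based search can be sketched as follows, assuming NumPy. The magnitude of a zero-padded FFT of the coefficients of P or Q is scanned for local minima strictly between the DC and Nyquist bins; because all roots lie on the unit circle, each such minimum falls within one bin of a root, so `n_fft` sets the precision. The names are hypothetical.

```python
import numpy as np


def lsf_polynomials(a):
    """P(m) = A(m) + A(M+1-m) and Q(m) = A(m) - A(M+1-m), with A(M+1) = 0."""
    a_ext = np.append(np.asarray(a, dtype=float), 0.0)
    return a_ext + a_ext[::-1], a_ext - a_ext[::-1]


def lsf_by_dft(c, n_fft=1024):
    """Root angles of P or Q in (0, pi), located as local minima of |DFT|.

    Each estimate is within 2*pi/n_fft of a true root angle; a longer DFT
    gives finer resolution at higher cost. Bins 0 and n_fft/2 (the trivial
    roots at z = 1 and z = -1) are not searched.
    """
    mag = np.abs(np.fft.fft(c, n=n_fft))
    k = np.arange(1, n_fft // 2)
    is_min = (mag[k] < mag[k - 1]) & (mag[k] <= mag[k + 1])
    return k[is_min] * 2.0 * np.pi / n_fft
```

Doubling `n_fft` halves the worst-case error of each estimate; a refinement step (for example, interpolation between the bracketing bins) could be added if more precision is needed.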
In this approach the process of spectral envelope warping is carried out in the line spectral domain, by transforming the line spectral frequencies, rather than the predictor polynomial root frequencies.
and after warping are
before warping, and
after warping. Clearly, the frequencies of the roots of P(k) are quite closely related to the frequencies of the poles of A(k), and therefore they undergo a very similar transformation. Thus, if a suitable transformation of the root frequencies of Q(k) can be identified, then spectral envelope warping can be performed on the line spectral pairs, which are easy to find, rather than the poles of the predictor polynomial itself.
Since the frequencies of the roots of P(k) correspond to the frequencies of the roots of A(k), it follows that the frequencies of the roots of Q(k) must correspond in some way to the magnitudes of the roots of A(k) (recall that the magnitudes of the roots of both P(k) and Q(k) are always unity). This relationship is found through the so-called “difference parameters,” the difference between the frequencies of the roots of P(k) and the nearest (in frequency) root of Q(k). The difference parameters for the example polynomials can be found to be:
before warping, and
after warping. It is known that smaller values of the difference parameters correspond to sharper peaks in the spectral envelope, and larger values to broader peaks. (The peaks in this example were all chosen to be fairly sharp to make them easier to see.) Note that the difference parameters are not much affected by the warping process.
In order to preserve the bandwidth of the spectral peaks, one could attempt to preserve, as nearly as possible, the difference parameters in the warping process, transforming only the frequencies of the roots of P(k), and re-computing the frequencies of the roots of Q(k) from the difference parameters. In some applications, it may not be considered necessary to preserve the original peak bandwidths, and in such cases, suitable difference parameters can be chosen arbitrarily, or chosen to satisfy some other properties of the warped spectral envelope (for example, they may be chosen to avoid unnaturally sharp peaks in the spectral envelope).
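Warping that preserves the difference parameters can be sketched as below, assuming NumPy, an even predictor order M, and strictly interlaced line spectral frequencies p1 < q1 < p2 < q2 < .... Each P-frequency is paired with the Q-frequency just above it, an assumption that matches the nearest-neighbor definition when the peaks are sharp. The P-frequencies are warped with a piecewise-linear function, each Q-frequency is placed at its warped partner plus the original difference, and a Q-frequency that would overtake the next P-frequency is moved to the midpoint, a hypothetical safeguard that keeps the sets interlaced and the reconstructed all-pole filter stable. P and Q are rebuilt as (1 ± z^-1) times second-order factors, and A = (P + Q)/2. All names are hypothetical.

```python
import numpy as np


def lsf_polynomials(a):
    """P(m) = A(m) + A(M+1-m) and Q(m) = A(m) - A(M+1-m), with A(M+1) = 0."""
    a_ext = np.append(np.asarray(a, dtype=float), 0.0)
    return a_ext + a_ext[::-1], a_ext - a_ext[::-1]


def nontrivial_angles(c, tol=1e-6):
    """Root angles in (0, pi), excluding the trivial roots at z = 1 and z = -1."""
    w = np.angle(np.roots(c))
    return np.sort(w[(w > tol) & (w < np.pi - tol)])


def poly_from_lsf(w, trivial_root):
    """(1 - trivial_root z^-1) * prod_i (1 - 2 cos(w_i) z^-1 + z^-2)."""
    c = np.array([1.0, -trivial_root])
    for wi in w:
        c = np.convolve(c, [1.0, -2.0 * np.cos(wi), 1.0])
    return c


def warp_lsf_keep_differences(a, fs, knee_hz, ratio):
    """Warp P-frequencies; rebuild Q-frequencies from the difference parameters."""
    p, q = lsf_polynomials(a)
    wp, wq = nontrivial_angles(p), nontrivial_angles(q)
    knee = 2.0 * np.pi * knee_hz / fs
    wp_new = np.where(wp > knee, knee + (wp - knee) / ratio, wp)
    d = wq - wp                                  # difference parameters
    wq_new = wp_new + d
    upper = np.append(wp_new[1:], np.pi)         # next P-frequency (or pi)
    cross = wq_new >= upper
    wq_new[cross] = 0.5 * (wp_new[cross] + upper[cross])
    p_new = poly_from_lsf(wp_new, -1.0)          # P keeps its root at z = -1
    q_new = poly_from_lsf(wq_new, 1.0)           # Q keeps its root at z = +1
    return (0.5 * (p_new + q_new))[:-1]          # A = (P + Q) / 2, drop the zero term
```

With a ratio of 1 the original predictor is recovered, and with a ratio of 2 a peak at 5 kHz (fs = 16 kHz, 2 kHz knee) moves down to roughly 3.5 kHz while keeping a similar bandwidth.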
which is in good agreement with the frequencies of the poles obtained through the original warping procedure.
Various warping approaches are possible without departing from the scope of the present subject matter. In one approach, the line spectral frequencies are warped in the same way as the pole frequencies of the linear predictor. This has the effect of sharpening all of the poles of the reconstructed polynomial (moving them closer to the unit circle). In an alternative approach, the difference between the line spectral frequencies that bracket a pole is preserved in the warping. This tends to preserve the shape of the peaks in the spectral envelope, but can cause conflicts with neighboring line spectral frequencies. This method highlights the added benefit of omitting extra line spectral frequencies from the warped set.
Another variation includes implementing only the spectral envelope peak finding function in the line spectral frequency domain. This can be done by computing the line spectral frequencies from B(n), estimating poles or biquad coefficients from the line spectral frequencies, and performing warping of the poles or biquad coefficients as set forth in the earlier embodiments.
Computing line spectral frequencies is quick and efficient compared with the earlier methods of finding the roots of the LPC polynomial. The line spectral frequencies are not themselves the roots or poles of the spectral envelope, but pairs of line spectral frequencies bracket the spectral envelope poles, and poles of larger magnitude are more tightly bracketed. In various applications, spectral envelope peaks are translated by translating the corresponding line spectral frequencies, and peaks can be sharpened by moving the corresponding line spectral frequencies closer together. In various applications, line spectral frequencies that do not bracket a pole can be eliminated.
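Estimating envelope peaks from the line spectral frequencies can be sketched as below, assuming NumPy. Adjacent line spectral frequencies closer than a threshold are treated as a pair bracketing a pole, the midpoint of the pair is taken as the peak frequency, and the gap is returned as a sharpness indicator (a smaller gap means a sharper peak). The threshold `max_gap_hz` and the midpoint rule are assumptions, not values from the text, and the function names are hypothetical.

```python
import numpy as np


def lsf_polynomials(a):
    """P(m) = A(m) + A(M+1-m) and Q(m) = A(m) - A(M+1-m), with A(M+1) = 0."""
    a_ext = np.append(np.asarray(a, dtype=float), 0.0)
    return a_ext + a_ext[::-1], a_ext - a_ext[::-1]


def nontrivial_angles(c, tol=1e-6):
    """Root angles in (0, pi), excluding the trivial roots at z = 1 and z = -1."""
    w = np.angle(np.roots(c))
    return np.sort(w[(w > tol) & (w < np.pi - tol)])


def peaks_from_lsf(a, fs, max_gap_hz):
    """Midpoints (Hz) and gaps (Hz) of adjacent LSF pairs closer than max_gap_hz.

    LSFs that are not part of such a close pair are treated as not bracketing
    a pole and are ignored.
    """
    p, q = lsf_polynomials(a)
    w = np.sort(np.concatenate([nontrivial_angles(p), nontrivial_angles(q)]))
    f = w * fs / (2 * np.pi)
    gaps = np.diff(f)
    close = np.flatnonzero(gaps < max_gap_hz)
    return 0.5 * (f[close] + f[close + 1]), gaps[close]
```

The midpoint rule is only an approximation: depending on the phase of the rest of the envelope, a pole may sit closer to one line spectral frequency of its pair than to the other, so the midpoint can be biased.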
It is understood that one variation of the present process includes, but is not limited to:
performing linear prediction on the input signal to obtain coefficients hK;
obtaining line spectral frequencies from the coefficients hK;
obtaining from the line spectral frequencies an estimate of the roots of the predictor polynomial described by the coefficients hK;
warping the resulting estimated roots; and
filtering the resulting input signal with a filter having the transfer function H(n)=B(n)/A(n),
where B(n) are the coefficients of a polynomial having roots equal to those estimated from the line spectral frequencies and A(n) are coefficients of a polynomial having roots equal to the warped estimated roots (found at, for example, block 908 of
It is understood that one variation of the present process includes, but is not limited to:
performing linear prediction on the input signal to obtain coefficients hK;
obtaining line spectral frequencies from the coefficients hK;
warping the line spectral frequencies; and
filtering the resulting input signal with a filter having the transfer function H(n)=B(n)/A(n),
where B(n) are the coefficients of the predictor polynomial (the coefficients hK found at, for example, block 904 of
In this variation, an Nth-order ARMA filter can be implemented directly, without conversion to biquad sections. In a variation of this approach, some of the line spectral frequencies that do not correspond to poles can optionally be eliminated when constructing the warped line spectral frequencies. This yields an A(n) of lower order than B(n). Further variations can remove the corresponding line spectral frequencies from the non-warped set to reduce the order of B(n).
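This variation can be sketched as follows, assuming NumPy and an even predictor order. Every line spectral frequency is passed through the same piecewise-linear warp; because that warp is strictly increasing and keeps frequencies inside (0, π), the interlacing of the P and Q frequencies, and with it the stability of 1/A(n), is preserved. The ARMA filter H(n) = B(n)/A(n) is then applied in direct form, with B(n) the unmodified predictor coefficients. All function names are hypothetical.

```python
import numpy as np


def lsf_polynomials(a):
    """P(m) = A(m) + A(M+1-m) and Q(m) = A(m) - A(M+1-m), with A(M+1) = 0."""
    a_ext = np.append(np.asarray(a, dtype=float), 0.0)
    return a_ext + a_ext[::-1], a_ext - a_ext[::-1]


def nontrivial_angles(c, tol=1e-6):
    """Root angles in (0, pi), excluding the trivial roots at z = 1 and z = -1."""
    w = np.angle(np.roots(c))
    return np.sort(w[(w > tol) & (w < np.pi - tol)])


def poly_from_lsf(w, trivial_root):
    """(1 - trivial_root z^-1) * prod_i (1 - 2 cos(w_i) z^-1 + z^-2)."""
    c = np.array([1.0, -trivial_root])
    for wi in w:
        c = np.convolve(c, [1.0, -2.0 * np.cos(wi), 1.0])
    return c


def warp_all_lsf(a, fs, knee_hz, ratio):
    """Apply the same piecewise-linear warp to every LSF (even order M assumed)."""
    knee = 2.0 * np.pi * knee_hz / fs

    def warp(w):
        return np.where(w > knee, knee + (w - knee) / ratio, w)

    p, q = lsf_polynomials(a)
    p_new = poly_from_lsf(warp(nontrivial_angles(p)), -1.0)  # P keeps root at z = -1
    q_new = poly_from_lsf(warp(nontrivial_angles(q)), 1.0)   # Q keeps root at z = +1
    return (0.5 * (p_new + q_new))[:-1]                      # A = (P + Q) / 2


def arma_filter(b, a, x):
    """y = (B/A) x in direct form: FIR part b, then the recursion with a (a[0] == 1)."""
    v = np.convolve(x, b)[: len(x)]
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = v[n]
        for k in range(1, min(len(a), n + 1)):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y
```

Usage: with `b` the predictor coefficients and `a_w = warp_all_lsf(b, fs, knee_hz, ratio)`, `arma_filter(b, a_w, x)` whitens and reshapes the signal in one pass; with a ratio of 1 it returns the input unchanged.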
It is understood that one variation of the present process includes a hybrid approach, which includes, but is not limited to:
performing linear prediction on the input signal to get coefficients hK;
obtaining line spectral frequencies from the coefficients hK;
warping the line spectral frequencies;
filtering the input signal with a FIR filter having coefficients hK (as shown, for example, in block 904 in
filtering the whitened excitation signal (for example, e(t) in
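In the hybrid variation, the FIR whitening stage and the all-pole synthesis stage form a cascade of linear time-invariant filters. With zero initial state, the cascade therefore produces the same output as the single ARMA filter of the previous variation. The sketch below assumes SciPy and assumes that `h` holds the inverse-filter coefficients [1, a1, ..., ap]. It makes the two stages explicit, and exposing e(t) between them is what allows the excitation to be processed separately, for example before it excites the warped envelope.

```python
import numpy as np
from scipy.signal import lfilter

def hybrid_filter(x, h, a_w):
    """Two-stage hybrid: FIR whitening by hK, then all-pole synthesis by the warped envelope."""
    e = lfilter(h, [1.0], x)         # whitened excitation e(t) (prediction error)
    return lfilter([1.0], a_w, e)    # excite the warped all-pole envelope 1/A_w(z)

# Hypothetical example coefficients; in practice a_w comes from the warped LSFs.
x = np.random.default_rng(0).standard_normal(256)
h = np.array([1.0, -0.9, 0.2])
a_w = np.array([1.0, -0.5, 0.3])
same = np.allclose(hybrid_filter(x, h, a_w), lfilter(h, a_w, x))  # LTI cascade equivalence
```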
It is understood that variations in process order and particular conversions may be substituted in systems without departing from the scope of the present subject matter.
The present subject matter includes a method for processing an audio signal received by a hearing assistance device, including: filtering the audio signal at a splitting frequency to generate a high frequency filtered signal; transposing at least a portion of an audio spectrum of the filtered signal to a lower frequency range by a transposition process to produce a transposed audio signal; and summing the transposed audio signal with the audio signal to generate an output signal, wherein the transposition process includes: estimating an all-pole spectral envelope of the filtered signal from a plurality of line spectral frequencies; applying a warping function to the all-pole spectral envelope of the filtered signal to translate the poles above a specified knee frequency to lower frequencies, thereby producing a warped spectral envelope; and exciting the warped spectral envelope with an excitation signal to synthesize the transposed audio signal. It also provides for estimating the line spectral frequencies from a set of linear prediction coefficients, and for applying warping functions to the line spectral frequencies. It also provides for scaling the transposed audio signal and summing the scaled transposed audio signal with the audio signal. It is contemplated that the filtering includes, but is not limited to, high pass filtering or high bandpass filtering. In various embodiments, the estimating includes performing linear prediction. In various embodiments, the estimating is done in the frequency domain; in other embodiments, it is done in the time domain.
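Since the estimating step can be carried out by linear prediction, a minimal time-domain sketch (NumPy/SciPy assumed) of the Levinson-Durbin recursion follows. It uses the autocorrelation method without windowing, a simplification, since practical systems typically window each block. Filtering with the resulting FIR polynomial A(z) yields the prediction error signal, which can serve as the excitation. The function names are hypothetical.

```python
import numpy as np
from scipy.signal import lfilter

def autocorr(x, order):
    """Biased autocorrelation r[0..order] (autocorrelation method, no window applied)."""
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])

def levinson_durbin(r, order):
    """Return A(z) coefficients [1, a1, ..., ap] and the final prediction error power."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                          # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]     # right-hand side is built before assignment
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def prediction_error(x, order):
    """Estimate the all-pole envelope 1/A(z) and whiten x with the FIR filter A(z)."""
    a, _ = levinson_durbin(autocorr(x, order), order)
    return a, lfilter(a, [1.0], x)
```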
In various embodiments, the pole frequencies are translated toward the knee frequency, either linearly using a warping factor or nonlinearly, for example using a logarithmic or other nonlinear function. Such translations may be limited to poles above the knee frequency.
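Both translation rules can be sketched as frequency-warping functions, assuming NumPy, frequencies in hertz, and a hypothetical warping factor `alpha` in (0, 1]. The logarithmic form shown here is a power law, that is, linear compression on a log-frequency axis. It is one possible nonlinear choice, not necessarily the one used in any particular embodiment. Both maps leave frequencies at or below the knee unchanged, are continuous at the knee, and are monotonic, so translated poles keep their order.

```python
import numpy as np

def warp_linear(f, knee, alpha):
    """Frequencies above `knee` move toward it: f' = knee + alpha * (f - knee)."""
    f = np.asarray(f, dtype=float)
    return np.where(f > knee, knee + alpha * (f - knee), f)

def warp_log(f, knee, alpha):
    """Log-frequency compression above `knee`: f' = knee * (f / knee) ** alpha."""
    f = np.asarray(f, dtype=float)
    ratio = np.maximum(f, knee) / knee      # >= 1, so the power is always well defined
    return np.where(f > knee, knee * ratio ** alpha, f)
```

With a 2 kHz knee and `alpha = 0.5`, the linear rule maps 6 kHz to 4 kHz, and the logarithmic rule maps 8 kHz to 4 kHz.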
In various embodiments, the excitation signal is a prediction error signal, produced by filtering the high-pass signal with an inverse of the estimated all-pole spectral envelope. The present subject matter in various embodiments includes randomizing the phase of the prediction error signal, including transforming the prediction error signal to the frequency domain using a discrete Fourier transform (DFT); randomizing the phase of components below the Nyquist frequency; replacing components above the Nyquist frequency by the complex conjugates of the corresponding components below the Nyquist frequency, to produce a valid spectrum of a purely real time domain signal; inverting the DFT to produce a time domain signal; and using the time domain signal as the excitation signal. It is understood that in various embodiments the prediction error signal is processed using, among other things, a compressor, peak limiter, or other nonlinear distortion to reduce the peak dynamic range of the excitation signal. In various embodiments, the excitation signal is a spectrally shaped or filtered noise signal.
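The phase-randomization steps can be sketched with NumPy's FFT as follows. The DC bin and, for even lengths, the Nyquist bin are left unchanged because they must stay real. The function name and the use of uniformly distributed phases are assumptions.

```python
import numpy as np

def randomize_phase(e, rng):
    """Replace the phases of e's spectrum below Nyquist with random ones, keeping magnitudes."""
    n = len(e)
    E = np.fft.fft(e)
    half = (n - 1) // 2                          # bins strictly between DC and Nyquist
    new = E.copy()
    phase = rng.uniform(-np.pi, np.pi, half)
    new[1:half + 1] = np.abs(E[1:half + 1]) * np.exp(1j * phase)
    # Bins above Nyquist become complex conjugates of their mirror bins below it,
    # giving the Hermitian symmetry of a purely real signal.
    new[n - half:] = np.conj(new[1:half + 1][::-1])
    return np.fft.ifft(new).real                 # imaginary part is only rounding error
```

A block of prediction error would be passed in as, for example, `randomize_phase(block, np.random.default_rng())`, and the result used as the excitation.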
In various embodiments, the system includes combining the transposed signal with a low-pass filtered version of the audio signal to produce a combined output signal, and in some embodiments the transposed signal is adjusted by a gain factor prior to combining.
The system also provides the ability to modify pole magnitudes and frequencies.
In various embodiments, the system includes different uses of line spectral frequencies to simplify computations of the frequency translation process.
In various embodiments, a system and method of frequency translation uses additive synthesis of frequency translation spectra. This approach enhances frequency translation processing to provide greater variation in the translated spectral envelope with different parameter settings, and more distinct and less confusable translated spectra for different translated sounds. The method includes, but is not limited to, one or more of the following aspects: additive synthesis of spectral peaks by means of a single, modulated prototype spectrum having fixed spectral shape; fine control of the translated spectral envelope; and/or independence of the spectral envelope excitation signal from the input signal, specifically, from the low frequency input spectrum.
In various embodiments, a process from musical sound synthesis known as bandwidth-enhanced additive synthesis (see, for example, K. Fitz and L. Haken, “A new algorithm for bandwidth association in bandwidth-enhanced additive sound modeling,” in Proc. ICMC, 2000, which is incorporated herein by reference in its entirety, and see also K. Fitz, “The reassigned bandwidth-enhanced method of additive synthesis,” PhD thesis, University of Illinois at Urbana-Champaign, 2000, which is incorporated herein by reference in its entirety) is adapted for the present system to synthesize translated spectral features. In bandwidth-enhanced additive synthesis, the envelopes of individual tones in a sinusoidal synthesis model are modulated by narrowband noise to produce noisy sinusoids. This allows noisy sounds, such as those of flutes and clarinets, to be synthesized using a simple additive algorithm.
In this approach, the narrowband noise is modulated by a tone to translate it to a desired spectral region. The frequency of the tone becomes the center frequency of a synthesized noise band, and the amplitude of the tone becomes the peak spectral magnitude of the noise band. A noise signal is filtered to obtain a narrowband, lowpass noise signal. This narrowband noise can be considered a “prototype” spectral peak that will be replicated at the desired feature center frequencies. In various embodiments, the approach is performed in the time domain. In various embodiments, the approach is performed in the frequency domain or subband domain. In limited-bandwidth implementations of the present subject matter, some computation can be saved in generating this noise: in various embodiments, random samples are generated at a highly decimated rate and upsampled by simple interpolation or smoothing. Some embodiments require only a single fixed-coefficient prototype lowpass filter (rather than updating filter coefficients at each block for each spectral peak, as in other approaches), and since the excitation for that filter is already lowpass (being generated at a lower rate), the stopband constraints on the filter can be relaxed, allowing the use of a cheaper, lower-order filter. An example prototype spectral peak is shown in
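A minimal sketch of generating the prototype narrowband noise, assuming NumPy/SciPy and hypothetical parameters: 16 kHz sampling, 16x decimation, and a 250 Hz second-order Butterworth lowpass. Random samples are drawn at the decimated rate, upsampled by linear interpolation, and smoothed by a single fixed-coefficient lowpass filter. A 250 Hz lowpass prototype becomes a band roughly 500 Hz wide once it is modulated by a tone.

```python
import numpy as np
from scipy.signal import butter, lfilter

def prototype_noise(n, fs=16000.0, decim=16, cutoff_hz=250.0, seed=0):
    """Narrowband lowpass 'prototype' noise of length n (all parameters hypothetical)."""
    rng = np.random.default_rng(seed)
    n_low = n // decim + 2
    coarse = rng.standard_normal(n_low)                               # decimated-rate samples
    fine = np.interp(np.arange(n), np.arange(n_low) * decim, coarse)  # linear interpolation
    b, a = butter(2, cutoff_hz / (fs / 2))                            # one fixed, low-order lowpass
    return lfilter(b, a, fine)
```

Because the interpolated noise already has little energy above the decimated rate, the fixed lowpass only needs to shape the passband, which is why a second-order filter is used in this sketch.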
In addition to potential computational efficiencies achieved by synthesizing a single prototype spectral peak, the present subject matter offers other potential advantages. In various embodiments, the synthesized spectrum does not depend on the presence of low frequency energy in the input signal. In other approaches, the algorithm suppresses low frequency energy in the input with the splitting filter, and then boosts it again with the translated spectral envelope filter. This has consequences for the sound quality of the translated sound, since one can consider that the suppressed low frequency noise was represented in a small number of bits before being amplified by the spectral envelope filter. Under such other approaches, this situation can be aggravated when the input signal has little low frequency energy to begin with. Some dependence on the input signal level is useful, but an /s/ sound, for example, may have very little low frequency energy to excite the spectral envelope filter. In contrast, the embodiments of the present subject matter allow more precise and reliable control over the frequency at which energy is introduced, and the amount of energy introduced, because the prototype spectral peak is synthesized, and does not need to be generated from the input signal.
Another consequence is that the dominant spectral effect appears just below the splitting filter cutoff frequency. Although the algorithm produces audible changes, its effect is mostly insensitive to input or algorithm parameters because the synthesis is confined mainly to the rising part of the splitting filter response. This can make the algorithm somewhat difficult and confusing to fit.
In such embodiments of the present approach, the bandwidth and shape of the rendered spectral peaks do not vary with the magnitude of the translated peak. Two (or more) peaks at different and arbitrary magnitudes can be produced without changing the shape of either (as shown in
Relation to Bandwidth-enhanced Additive Synthesis
In bandwidth-enhanced additive synthesis (see, for example, K. Fitz and L. Haken, “A new algorithm for bandwidth association in bandwidth-enhanced additive sound modeling,” in Proc. ICMC, 2000, which is incorporated herein by reference in its entirety, and see also K. Fitz, “The reassigned bandwidth-enhanced method of additive synthesis,” PhD thesis, University of Illinois at Urbana-Champaign, 2000, which is incorporated herein by reference in its entirety), sinewaves in an additive synthesizer are amplitude-modulated by narrowband noise to add noise energy to the synthesized sound in the frequency neighborhood of each sinewave, using a synthesis equation of the form
$$y_t = (A_t + I_t \zeta_t)\cos(\omega_t t + \theta)$$
in which $A_t$ is the time-varying carrier amplitude, $I_t$ is the time-varying modulation index, $\zeta_t$ is the bandlimited noise modulator, and $\omega_t$ is the time-varying carrier (center) frequency. The amount of noise added to the spectrum is governed by the modulation index; when $I_t$ is zero, the synthesized component is a pure sinusoid at frequency $\omega_t$ (no noise is introduced at that component's frequency). The amplitude of the pure tone is governed by the carrier amplitude $A_t$. The effect is demonstrated in
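The synthesis equation above can be evaluated sample by sample as in the sketch below, assuming NumPy, angular frequency in radians per sample, and parameters given either as constants or as per-sample arrays. The function name is hypothetical. The sketch implements the equation literally; an oscillator with a rapidly time-varying frequency would normally accumulate phase instead. Setting the modulation index to zero reduces the output to a pure sinusoid, as described above.

```python
import numpy as np

def bandwidth_enhanced_partial(amp, index, zeta, omega, theta=0.0):
    """y_t = (A_t + I_t * zeta_t) * cos(omega_t * t + theta); omega in radians per sample.

    amp, index and omega may be scalars or arrays with the same length as zeta.
    """
    zeta = np.asarray(zeta, dtype=float)
    t = np.arange(len(zeta))
    return (amp + index * zeta) * np.cos(omega * t + theta)
```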
In the present approach, the intended application and purpose are different, and these differences motivate a somewhat different implementation. In the music synthesis application, stochastic modulation was used to add a little noisiness to a sound that was otherwise composed of pure sinusoids. In the present method, the tone at frequency $\omega_t$ (see the above equation) is not desired in the frequency translation output. Because the goal is not to produce noisy sinewaves, the implementation does not need to balance noise energy against tonal energy. Instead, the present method in various embodiments uses modulation to place noise energy at a specific desired center frequency. Therefore, the modulation index is always unity and the carrier is always suppressed. The synthesis equation for each component therefore reduces to modulation (multiplication) of narrowband noise by a tone at the desired center frequency, as in
$$y_t = M_n \zeta_t \cos(\omega_n t)$$
Here, $M_n$ and $\omega_n$ are the magnitude and center frequency of the translated spectral envelope peak, which in various embodiments are estimated and updated once per block (rather than once per sample), and $\zeta_t$ is the prototype bandlimited noise. The final synthesized signal, the output of the frequency translation algorithm, is a sum of these modulated noise components, as in
$$y_t = \sum_{n=1}^{K} M_n \zeta_t \cos(\omega_n t)$$
where $K$ is the number of estimated spectral envelope peaks (for example, 2 in certain other frequency translation algorithms).
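The summed synthesis can be sketched as follows, assuming NumPy, a precomputed prototype noise `zeta`, and per-block arrays of peak magnitudes and center frequencies in hertz. All names are hypothetical. A single modulator `zeta` is shared by all K peaks. The carriers are evaluated against absolute time, so a peak whose frequency does not change between blocks keeps a continuous phase.

```python
import numpy as np

def synthesize_translated(zeta, mags, freqs_hz, fs, block=64):
    """y_t = sum_n M_n * zeta_t * cos(w_n t), with (M_n, w_n) held fixed within each block.

    mags, freqs_hz: shape (num_blocks, K) per-block peak magnitudes and center frequencies (Hz).
    """
    zeta = np.asarray(zeta, dtype=float)
    mags = np.asarray(mags, dtype=float)
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    t = np.arange(len(zeta)) / fs                     # absolute time in seconds
    y = np.zeros_like(zeta)
    for b in range(mags.shape[0]):
        s = slice(b * block, min((b + 1) * block, len(zeta)))
        tones = np.cos(2 * np.pi * np.outer(t[s], freqs_hz[b]))  # (block_len, K) carriers
        y[s] = zeta[s] * (tones @ mags[b])            # one shared modulator for all K peaks
    return y
```

Generating each noise band with its own time-varying bandpass filter would instead require recomputing filter coefficients for every peak at every block, which is the cost that modulation avoids.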
In such embodiments of the present method, modulation is an efficient way to synthesize a noise spectrum, by adding up bandlimited noise components. If, instead, those bandlimited noise components were generated using bandpass filters with varying center frequencies, far more computation would be required just to update the time-varying filter coefficients. In such methods, a single pre-calculated and optimized filter may be used for all synthesized spectral envelope peaks, and the output of that single filter is multiplied by sinewaves at the component center frequencies (and having the desired component amplitudes).
In various embodiments, variations include algorithms where a noise-like excitation signal that is independent of the input signal spectrum is used to excite the spectral envelope filter used in the earlier-described frequency translation algorithm approaches. One disadvantage of this approach, relative to the present approach, is that the excitation signal needs to have sufficient bandwidth to excite all parts of the spectral envelope filter that might have significant magnitude, so the synthesis of the noise sequence cannot be decimated to the same extent. The prototype noise in the proposed algorithm could be as narrow as, say, 500 Hz, but the full bandwidth of the spectral envelope filter could be 2 to 3 kHz. Using a narrower prototype noise also controls the spread of spectral energy.
Composition of the translated spectral envelope can be achieved by modifying the earlier-described algorithms to use fixed bandwidth second order (biquad) sections and following each one with a gain scale. In this scheme, the biquad filter center frequencies and the gain scales would be updated, but the bandwidth (or Q factor) would remain constant.
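One way to sketch this fixed-bandwidth alternative uses SciPy and the band-pass biquad from Robert Bristow-Johnson's Audio EQ Cookbook, in its constant 0 dB peak gain form. The sketch assumes a parallel bank of sections; the text does not say whether the sections are cascaded or summed. Only the center frequency and the following gain scale change between updates, while the Q factor (here a hypothetical 4.0) stays constant. Because each section has unit gain at its center frequency, its gain scale directly sets the peak magnitude of that spectral peak.

```python
import numpy as np
from scipy.signal import lfilter

def fixed_q_bandpass(f0, fs, q):
    """RBJ cookbook band-pass biquad with 0 dB peak gain at f0; Q is held fixed."""
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([alpha, 0.0, -alpha])
    a = np.array([1 + alpha, -2 * np.cos(w0), 1 - alpha])
    return b / a[0], a / a[0]

def translated_envelope(excitation, centers_hz, gains, fs, q=4.0):
    """Parallel bank: each fixed-Q section is followed by its own gain scale."""
    y = np.zeros(len(excitation))
    for f0, g in zip(centers_hz, gains):
        b, a = fixed_q_bandpass(f0, fs, q)
        y += g * lfilter(b, a, excitation)
    return y
```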
Alternatively, the excitation signal could be constructed using a flat magnitude spectrum and the phase spectrum of the input signal, thereby preserving the fine structure of the input signal. This is a more aggressive whitening of the input signal than is achieved in the linear prediction decomposition currently used. This whitened signal could then excite a bank of fixed-bandwidth filters to produce the translated signal.
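A minimal NumPy sketch of this phase-only excitation normalizes each spectral bin of the input to unit magnitude while keeping its phase, and sets zero-magnitude bins to zero. For simplicity it operates on a whole signal; a block-based system would presumably apply it per analysis frame, which is an assumption not specified above. The function name is hypothetical.

```python
import numpy as np

def phase_only_excitation(x):
    """Flat (unit) magnitude spectrum carrying the phase spectrum of x."""
    X = np.fft.rfft(x)
    mag = np.abs(X)
    safe = np.where(mag > 0, mag, 1.0)        # avoid dividing by zero
    E = np.where(mag > 0, X / safe, 0.0)      # unit-magnitude bins with the input's phase
    return np.fft.irfft(E, n=len(x))
```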
The present subject matter includes hearing assistance devices, including, but not limited to, cochlear implant type hearing devices and hearing aids, such as behind-the-ear (BTE), in-the-ear (ITE), in-the-canal (ITC), or completely-in-the-canal (CIC) type hearing aids. It is understood that behind-the-ear type hearing aids may include devices that reside substantially behind the ear or over the ear. Such devices may include hearing aids with receivers associated with the electronics portion of the behind-the-ear device, or hearing aids of the type having a receiver in-the-canal. Such devices may also be referred to as receiver-in-the-canal (RIC) or receiver-in-the-ear (RITE) devices. It is understood that other hearing assistance devices not expressly stated herein may fall within the scope of the present subject matter.
It is understood that one of skill in the art, upon reading and understanding the present application, will appreciate that variations of order, information or connections are possible without departing from the present teachings. This application is intended to cover adaptations or variations of the present subject matter. It is to be understood that the above description is intended to be illustrative, and not restrictive. The scope of the present subject matter should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.
The present application claims the benefit under 35 U.S.C. §119(e) of U.S. Provisional Patent Application 61/660,466, filed Jun. 15, 2012, and claims the benefit under 35 U.S.C. §119(e) of U.S. Provisional Patent Application 61/662,738, filed Jun. 21, 2012, the disclosures of which are both hereby incorporated by reference herein in their entirety. This application is related to U.S. Provisional Patent Application Ser. No. 61/175,993, filed on May 6, 2009, which is incorporated herein by reference in its entirety. This application is related to U.S. patent application Ser. No. 12/774,356, filed May 5, 2010, which is incorporated herein by reference in its entirety. This application is related to U.S. patent application Ser. No. 12/043,827, filed on Mar. 6, 2008, (now U.S. Pat. No. 8,000,487) which is incorporated herein by reference in its entirety. This application is related to U.S. patent application Ser. No. 13/208,023, filed on Aug. 11, 2011, which is incorporated herein by reference in its entirety.
Number | Name | Date | Kind |
---|---|---|---|
5014319 | Leibman | May 1991 | A |
5771299 | Melanson | Jun 1998 | A |
6169813 | Richardson et al. | Jan 2001 | B1 |
6577739 | Hurtig et al. | Jun 2003 | B1 |
6980665 | Kates | Dec 2005 | B2 |
7248711 | Allegro et al. | Jul 2007 | B2 |
7580536 | Carlile et al. | Aug 2009 | B2 |
8000487 | Fitz et al. | Aug 2011 | B2 |
8526650 | Fitz | Sep 2013 | B2 |
20040264721 | Allegro et al. | Dec 2004 | A1 |
20060247922 | Hetherington et al. | Nov 2006 | A1 |
20060247992 | Hetherington et al. | Nov 2006 | A1 |
20060253209 | Hersbach et al. | Nov 2006 | A1 |
20080215330 | Harma et al. | Sep 2008 | A1 |
20090226016 | Fitz | Sep 2009 | A1 |
20100284557 | Fitz | Nov 2010 | A1 |
20120177236 | Fitz et al. | Jul 2012 | A1 |
Number | Date | Country |
---|---|---|
WO-0075920 | Dec 2000 | WO |
WO-2007010479 | Jan 2007 | WO |
WO-2007135198 | Nov 2007 | WO |
Entry |
---|
U.S. Appl. No. 12/043,827, Notice of Allowance mailed Jun. 10, 2011, 6 pgs. |
U.S. Appl. No. 12/774,356, Non Final Office Action mailed Aug. 16, 2012, 6 pgs. |
U.S. Appl. No. 12/774,356, Notice of Allowance mailed Jan. 8, 2013, 5 pgs. |
U.S. Appl. No. 12/774,356, Notice of Allowance mailed May 1, 2013, 6 pgs. |
U.S. Appl. No. 12/774,356, Response filed Dec. 17, 2012 to Non Final Office Action mailed Aug. 16, 2012, 8 pgs. |
U.S. Appl. No. 13/208,023, Final Office Action mailed Nov. 25, 2013, 5 pgs. |
U.S. Appl. No. 13/208,023, Non Final Office Action mailed May 29, 2013, 5 pgs. |
U.S. Appl. No. 13/208,023, Notice of Allowance mailed Feb. 10, 2014, 5 pgs. |
U.S. Appl. No. 13/208,023, Response filed Jan. 27, 2014 to Final Office Action mailed Nov. 25, 2013, 7 pgs. |
U.S. Appl. No. 13/208,023, Response filed Sep. 30, 2013 to Non Final Office Action mailed May 29, 2013, 6 pgs. |
European Application Serial No. 09250638.5, Extended Search Report mailed Jan. 20, 2012, 8 pgs. |
European Application Serial No. 10250883.5, Office Action mailed Jan. 23, 2012, 8 pgs. |
Fitz, Kelly, et al., “A New Algorithm for Bandwidth Association in Bandwidth-Enhanced Additive Sound Modeling”, International Computer Music Conference Proceedings, (2000), 4 pgs. |
Fitz, Kelly Raymond, “The Reassigned Bandwidth-Enhanced Method of Additive Synthesis”, (1999), 163 pgs. |
Hermansen, K., et al., “Hearing aids for profoundly deaf people based on a new parametric concept”, Applications of Signal Processing to Audio and Acoustics, 1993; Final Program and Paper Summaries, 1993, IEEE Workshop on, vol., Iss. Oct. 17-20, 1993, 89-92. |
Kong, Ying-Yee, et al., “On the development of a frequency-lowering system that enhances place-of-articulation perception”, Speech Commun., 54(1), (Jan. 1, 2012), 147-160. |
Kuk, F., et al., “Linear Frequency Transposition: Extending the Audibility of High-Frequency Information”, Hearing Review, (Oct. 2006), 5 pgs. |
Makhoul, John, “Linear Prediction: A Tutorial Review”, Proceedings of the IEEE, 63, (Apr. 1975), 561-580. |
McDermott, H., et al., “Preliminary results with the AVR ImpaCt frequency-transposing hearing aid”, J Am Acad Audiol., 12(3), (Mar. 2001), pp. 121-127. |
McDermott, H., et al., “Improvements in speech perception with use of the AVR TranSonic frequency-transposing hearing aid”, Journal of Speech, Language, and Hearing Research, 42(6), (Dec. 1999), 1323-1335. |
McLoughlin, Ian Vince, et al., “Line spectral pairs”, Signal Processing, Elsevier Science Publisher B.V. Amsterdam, NL, vol. 88, No. 3, (Nov. 14, 2007), 448-467. |
Posen, M. P, et al., “Intelligibility of frequency-lowered speech produced by a channel vocoder”, J Rehabil Res Dev., 30(1), (1993), 26-38. |
Sekimoto, Sotaro, et al., “Frequency Compression Techniques of Speech Using Linear Prediction Analysis-Synthesis Scheme”, Ann Bull RILP, vol. 13, (Jan. 1, 1979), 133-136. |
Simpson, A., et al., “Improvements in speech perception with an experimental nonlinear frequency compression hearing device”, Int J Audiol., vol. 44(5), (May 2005), 281-292. |
Turner, C. W., et al., “Proportional frequency compression of speech for listeners with sensorineural hearing loss”, J Acoust Soc Am., vol. 106(2), (Aug. 1999), 877-86. |
Number | Date | Country | |
---|---|---|---|
20130336509 A1 | Dec 2013 | US |
Number | Date | Country | |
---|---|---|---|
61662738 | Jun 2012 | US | |
61660466 | Jun 2012 | US |