1. Field of the Invention
The present invention relates to a signal processing method, and more particularly to a signal processing method for a cochlear implant.
2. Description of Related Art
A cochlear implant is a surgically implanted electronic device that provides a sense of sound to patients with hearing loss. Tremendous progress in cochlear implant technology has enabled many such patients to achieve a high level of speech understanding.
Noise reduction and signal compression are critical stages in a cochlear implant. For example, a conventional cochlear implant with multiple microphones can increase the volume of the sensed speech. However, the noise in the sensed speech is amplified and compressed along with the speech, which degrades speech understanding for the patient. In addition, the multiple microphones increase hardware cost.
An objective of the present invention is to provide a signal processing method for a cochlear implant that computationally compresses an input speech signal into a predetermined amplitude range. The signal processing method is performed by a speech processor and comprises a noise reduction stage and a signal compression stage.
The noise reduction stage comprises:
The signal compression stage comprises:
Another objective of the present invention is to provide a signal processing method for a cochlear implant. The signal processing method is performed by a speech processor having a noise reduction unit and a signal compressor. The signal compressor has a compression unit, a boundary calculation unit, and a compression-factor-providing unit. The signal processing method comprises a noise reduction stage and a signal compression stage.
The noise reduction stage is performed by the noise reduction unit and comprises:
The signal compression stage is performed by the signal compressor and comprises:
Based on the signal processing method of the present invention, the noise reduction stage efficiently reduces noise in the electrical speech signal converted from normal speech, and the signal compression stage compresses the resulting signal effectively so that it can stimulate the cochlear nerves of a patient with hearing loss, allowing the patient to understand the normal speech well. The present invention relies on the noise reduction stage and the signal compression stage, rather than on multiple microphones, to improve the performance of the cochlear implant. Therefore, compared with a conventional cochlear implant with multiple microphones, the present invention does not increase the hardware cost.
Embodiments of the present invention are described in detail as follows.
With reference to
The microphone 11 is an acoustic-to-electric transducer that converts normal speech in air into an electrical speech signal. The speech processor 12 receives the electrical speech signal and converts it into multiple output sub-speech signals in different frequency bands. The transmitter 13 receives the output sub-speech signals from the speech processor 12 and wirelessly sends them to the receiver 14. The pulse generator 15 receives the output sub-speech signals from the receiver 14, generates electrical pulses based on the output sub-speech signals, and delivers the pulses to the electrode array 16. The electrode array 16 has multiple electrodes 161, each electrically connected to a different cochlear nerve in the patient's inner ear. The electrodes 161 thus output the electrical pulses to stimulate the respective cochlear nerves, so that the patient perceives a sound approximating the normal speech.
In more detail, with reference to
Based on the above configuration, the band-pass filter 121 of each channel sequentially receives the frames of the electrical speech signal from the noise reduction unit 126. The band-pass filter 121 of each channel preserves the components of each frame that lie within a specific frequency band and removes the components outside that band. The specific frequency bands of the band-pass filters 121 differ from channel to channel. The envelope detection units 122 then detect the amplitude envelopes of the frames and provide them to the respective signal compressors 123.
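The per-channel processing just described (band-pass filter 121 followed by envelope detection unit 122) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the speech processor's actual implementation: the band-pass filter is assumed to be a second-order RBJ-cookbook biquad, and the envelope detector is assumed to be full-wave rectification followed by a one-pole low-pass smoother. The helper names, the Q of 2, the 200 Hz smoothing cutoff, and the example center frequencies are all hypothetical choices.

```python
import math

def bandpass_coeffs(f_center, q, fs):
    """Assumed filter: RBJ-cookbook band-pass biquad (0 dB gain at f_center)."""
    w0 = 2.0 * math.pi * f_center / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)                      # feed-forward taps
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)      # feedback taps a1, a2
    return b, a

def biquad(x, b, a):
    """Direct-form I filtering of the sample sequence x."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def envelope(x, fs, cutoff=200.0):
    """Assumed detector: full-wave rectification, then a one-pole low-pass."""
    k = math.exp(-2.0 * math.pi * cutoff / fs)
    env, state = [], 0.0
    for xn in x:
        state = (1.0 - k) * abs(xn) + k * state
        env.append(state)
    return env

def channel_envelopes(frame, fs, centers, q=2.0):
    """One amplitude envelope per channel: band-pass filter, then envelope detection."""
    return [envelope(biquad(frame, *bandpass_coeffs(fc, q, fs)), fs) for fc in centers]
```

For a 1 kHz tone sampled at 16 kHz, the channel centered at 1 kHz ends up with a much larger envelope than channels centered at 250 Hz and 4 kHz, which is the frequency separation the channels are meant to provide.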
The present invention relates to a noise reduction stage performed by the noise reduction unit 126 and a signal compression stage performed by the signal compressor 123. The noise reduction stage and the signal compression stage are respectively described below.
1. Noise Reduction Stage
The noise reduction unit 126 can be implemented with a DDAE (deep denoising autoencoder)-based NR (noise reduction) structure. Deep denoising autoencoders are widely used to build deep neural architectures for robust feature extraction and classification. In brief, with reference to
The input layer 21 receives an electrical speech signal y from the microphone 11 and segments it into a first noisy frame y1, a second noisy frame y2, . . . , a t-th noisy frame yt, . . . , and a T-th noisy frame yT, wherein T is the number of frames in the current utterance. In other words, the present invention may segment an input speech signal, such as the electrical speech signal y, into a plurality of time-sequenced frames, such as the noisy frames y1, y2, . . . , and yT. For the elements in the t-th noisy frame yt, the noise reduction unit 126 reduces the noise in yt to form a t-th clean frame xt. Afterwards, the output layer 23 sends the t-th clean frame xt to each of the channels of the speech processor 12.
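Segmenting the electrical speech signal y into noisy frames y1, . . . , yT can be illustrated with the short helper below. It is a sketch only: the frame length, the optional hop size, and the zero-padding of the final frame are assumptions, since the text does not specify how frames are formed, and the function name is hypothetical.

```python
def segment_frames(y, frame_len, hop=None):
    """Split signal y into time-sequenced frames y_1..y_T.

    frame_len and hop (default: non-overlapping) are hypothetical parameters;
    the last frame is zero-padded to frame_len.
    """
    hop = frame_len if hop is None else hop
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    for start in range(0, len(y), hop):
        frame = list(y[start:start + frame_len])
        frame += [0.0] * (frame_len - len(frame))  # zero-pad a short final frame
        frames.append(frame)
        if start + frame_len >= len(y):            # this frame reached the end of y
            break
    return frames
```

With 10 samples and a frame length of 4, the helper returns T = 3 frames, the last of which is zero-padded.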
A relationship between the t-th noisy frame yt and the t-th clean frame xt can be represented as:
$$x_t = W_2\,h(y_t) + b_2 \tag{1}$$
where
h(yt) is a hidden-layer function parameterized by W1 and b1 in the time domain;
W1 and W2 are preset connection weights in the time domain; and
b1 and b2 are preset bias vectors of the hidden layers 22 of the DDAE-based NR structure in the time domain.
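Equation (1) can be sketched as a single-hidden-layer forward pass. Because the exact form of h(yt) (equation (3)) is not reproduced in this text, the sketch assumes a logistic sigmoid, h(yt) = σ(W1 yt + b1), a common choice for autoencoder hidden layers. The function names are hypothetical, and in practice the weights would be the preset, trained values.

```python
import math

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]

def hidden(y_t, W1, b1):
    """Assumed form of h(y_t): logistic sigmoid of W1 y_t + b1."""
    return [1.0 / (1.0 + math.exp(-(z + b))) for z, b in zip(matvec(W1, y_t), b1)]

def ddae_time_domain(y_t, W1, b1, W2, b2):
    """Equation (1): clean frame x_t = W2 h(y_t) + b2."""
    return [z + b for z, b in zip(matvec(W2, hidden(y_t, W1, b1)), b2)]
```

For example, with W1 and b1 set to zero every hidden unit outputs 0.5, so xt reduces to W2 applied to a vector of 0.5 values, plus b2.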
In another embodiment, the relationship between the t-th noisy frame yt and the t-th clean frame xt can be represented as:
$$x_t = \operatorname{InvF}\left\{ W_2'\,h'\big(F\{y_t\}\big) + b_2' \right\} \tag{2}$$
where
F{ } is the Fourier transform, which converts the t-th noisy frame yt from the time domain to the frequency domain;
h′( ) is a hidden-layer function parameterized by W1′ and b1′;
W1′ and W2′ are preset connection weights in the frequency domain;
b1′ and b2′ are preset bias vectors of the hidden layers 22 of the DDAE-based NR structure in the frequency domain; and
InvF{ } is the inverse Fourier transform, which returns the result to the time domain to obtain the t-th clean frame xt.
According to experimental results, the t-th clean frame xt obtained with the Fourier transform and inverse Fourier transform described above performs better than one obtained without them.
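Equation (2) wraps the network between a Fourier transform and an inverse Fourier transform. The sketch below uses a naive O(n²) DFT built on the standard `cmath` module to stay self-contained. The text does not say how a complex spectrum enters real-valued weights, so it is assumed here that F{yt} is presented as the stacked vector of its real and imaginary parts, that the network output is split back the same way, that h′ is a sigmoid layer, and that the real part of the inverse transform is kept as xt. All helper names are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform, standing in for F{ }."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
            for m in range(n)]

def idft(X):
    """Naive inverse DFT, standing in for InvF{ }."""
    n = len(X)
    return [sum(X[m] * cmath.exp(2j * math.pi * m * k / n) for m in range(n)) / n
            for k in range(n)]

def ddae_freq_domain(y_t, W1p, b1p, W2p, b2p):
    """Sketch of equation (2): x_t = InvF{W2' h'(F{y_t}) + b2'}.

    Assumptions: the spectrum is fed as [Re(F{y_t}); Im(F{y_t})] (length 2n),
    h' is a logistic sigmoid layer, and W2' has 2n output rows.
    """
    Y = dft(y_t)
    feat = [c.real for c in Y] + [c.imag for c in Y]
    h = [1.0 / (1.0 + math.exp(-(sum(w * f for w, f in zip(row, feat)) + b)))
         for row, b in zip(W1p, b1p)]
    out = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(W2p, b2p)]
    n = len(y_t)
    X = [complex(out[m], out[n + m]) for m in range(n)]  # reassemble the spectrum
    return [v.real for v in idft(X)]                      # keep the real part as x_t
```

In practice an FFT would replace the naive DFT; the quadratic version is used only to avoid external dependencies.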
For the time-domain method of equation (1), h(yt) can be represented as:
For the frequency-domain method of equation (2), h′(F{yt}) can be represented as:
The parameters W1, W2, b1, and b2 in the time domain, or W1′, W2′, b1′, and b2′ in the frequency domain, are preset in the speech processor 12.
The parameters W1, W2, b1, and b2 of equation (1) and equation (3) are optimized based on the following objective function:
In equation (5), θ is the parameter set {W1, W2, b1, b2}, T′ is the total number of clean frames u1, u2, . . . , uT′, and η is a constant that controls the tradeoff between reconstruction accuracy and regularization of the connection weights (for example, η can be set to 0.0002). The training data, including the clean frames u1, u2, . . . , uT′, and the training parameters W1-test, W2-test, b1-test, and b2-test can be substituted into equation (1) and equation (3) to obtain a reference frame ūt. The training parameters W1-test, W2-test, b1-test, and b2-test that make the reference frames ūt most closely approximate the clean frames ut are taken as the parameters W1, W2, b1, and b2 of equation (1) and equation (3). In addition, the training result for W1, W2, b1, and b2 is better when the noisy speech sample v approximates the electrical speech signal y. The optimization of equation (5) can be solved with any unconstrained optimization algorithm; for example, a Hessian-free algorithm can be applied in the present invention.
After training, the optimized parameters W1, W2, b1, and b2 are applied to equation (1) and equation (3) for actual noise reduction.
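Because equation (5) is not reproduced here, the sketch below assumes an objective consistent with the description: the mean over the T′ training frames of the squared error between each clean frame ut and its reconstruction ūt (computed from the corresponding noisy frame vt through equations (1) and (3)), plus η times the sum of squared connection weights in W1 and W2. It also assumes a sigmoid hidden layer. Instead of a Hessian-free optimizer, it simply keeps, among a few candidate (W1-test, b1-test, W2-test, b2-test) sets, the one with the lowest objective, mirroring the selection described above; it is a toy stand-in, not a training procedure, and the function names are hypothetical.

```python
import math

def forward(v_t, params):
    """Equations (1) and (3) with an assumed sigmoid hidden layer."""
    W1, b1, W2, b2 = params
    h = [1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, v_t)) + b)))
         for row, b in zip(W1, b1)]
    return [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(W2, b2)]

def objective(params, noisy, clean, eta=0.0002):
    """Assumed form of equation (5): mean squared error + eta * weight penalty."""
    W1, _, W2, _ = params
    t_prime = len(clean)
    err = sum(sum((u - ub) ** 2 for u, ub in zip(u_t, forward(v_t, params)))
              for v_t, u_t in zip(noisy, clean)) / t_prime
    penalty = sum(w * w for W in (W1, W2) for row in W for w in row)
    return err + eta * penalty

def pick_best(candidates, noisy, clean, eta=0.0002):
    """Keep the candidate parameter set with the lowest objective value."""
    return min(candidates, key=lambda p: objective(p, noisy, clean, eta))
```

With clean frames equal to 1, a candidate whose bias b2 alone reproduces them scores zero, while a candidate that reaches the same output through a nonzero W2 pays the η penalty and loses.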
Similarly, in the frequency domain, the parameters W1′, W2′, b1′, and b2′ of equation (2) and equation (4) are optimized based on the following objective function:
In equation (6), θ is the parameter set {W1′, W2′, b1′, b2′}, T′ is the total number of clean frames u1, u2, . . . , uT′, and η is a constant that controls the tradeoff between reconstruction accuracy and regularization of the connection weights (for example, η can be set to 0.0002). The training data, including the clean frames u1, u2, . . . , uT′, and the training parameters W1-test′, W2-test′, b1-test′, and b2-test′ can be substituted into equation (2) and equation (4) to obtain a reference frame ūt. The training parameters W1-test′, W2-test′, b1-test′, and b2-test′ that make the reference frames ūt most closely approximate the clean frames ut are taken as the parameters W1′, W2′, b1′, and b2′ of equation (2) and equation (4). In addition, the training result for W1′, W2′, b1′, and b2′ is better when the noisy speech sample v approximates the electrical speech signal y. The optimization of equation (6) can be solved with any unconstrained optimization algorithm; for example, a Hessian-free algorithm can be applied in the present invention.
After training, the optimized parameters W1′, W2′, b1′, and b2′ are applied to equation (2) and equation (4) for actual noise reduction.
With reference to
2. Signal Compression Stage
With reference to
The signal compressor 123 of the present invention comprises a compression unit 127, a boundary calculation unit 128, and a compression-factor-providing unit 129. The compression unit 127 and the boundary calculation unit 128 are connected to the envelope detection unit 122 to receive the amplitude envelope 30 of the t-th clean frame xt in real time. With reference to
UB=
LB=
where α0 is an initial value.
The compression unit 127 receives the amplitude envelope 30 of the t-th clean frame xt and outputs a t-th output frame zt. The inputs of the compression-factor-providing unit 129 are connected to an input of the compression unit 127, an output of the compression unit 127, and an output of the boundary calculation unit 128 to receive the calculated upper boundary UB and lower boundary LB and the t-th output frame zt. An output of the compression-factor-providing unit 129 is connected to the input of the compression unit 127, such that the compression-factor-providing unit 129 provides a compression factor αt to the compression unit 127. The compression factor αt is determined according to the previous compression factor αt−1, the upper boundary UB, the lower boundary LB, and the t-th output frame zt. In brief, the present invention may determine the compression factor αt for a frame based on the frame's amplitude upper boundary UB and lower boundary LB. When the t-th output frame zt is within a monitoring range between the upper boundary UB and the lower boundary LB, the compression factor αt is expressed as:
$$\alpha_t = \alpha_{t-1} + \Delta\alpha_1 \tag{9}$$
where Δα1 is a positive value (e.g., Δα1 = 1).
In contrast, when the t-th output frame zt is beyond the monitoring range, the compression factor αt is expressed as:
$$\alpha_t = \alpha_{t-1} + \Delta\alpha_2 \tag{10}$$
where Δα2 is a negative value (e.g., Δα2 = −0.1).
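The update rule of equations (9) and (10) is simple enough to state directly in code. The sketch assumes that an output frame zt counts as being within the monitoring range only when every sample lies between LB and UB; the text does not say how a multi-sample frame is compared with the boundaries, and the function names are hypothetical. The step sizes default to the example values Δα1 = 1 and Δα2 = −0.1.

```python
def in_monitoring_range(z_t, lb, ub):
    """Assumption: a frame is in range only if every sample lies within [LB, UB]."""
    return all(lb <= z <= ub for z in z_t)

def update_alpha(alpha_prev, z_t, lb, ub, d_alpha1=1.0, d_alpha2=-0.1):
    """Equations (9) and (10): step alpha up inside the range, down outside it."""
    if in_monitoring_range(z_t, lb, ub):
        return alpha_prev + d_alpha1   # equation (9), d_alpha1 > 0
    return alpha_prev + d_alpha2       # equation (10), d_alpha2 < 0
```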
The t-th output frame zt can be expressed as:
zt=αt×(xt−
where
According to equations (9) and (10), the present compression factor αt is obtained from the previous compression factor αt−1. It is to be understood that the present invention may modify the compression factor αt for the next frame based on the next frame's amplitude upper boundary UB and lower boundary LB. According to equation (11), the t-th output frame zt is repeatedly adjusted based on the t-th clean frame xt and the calculated UB, LB, and αt. According to experimental results, the signal compression capability is good. As illustrated in
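The frame-by-frame feedback between the compression unit 127 and the compression-factor-providing unit 129 can be sketched as follows. Several assumptions are needed because parts of the text are not recoverable: the per-frame boundaries (LB, UB) are supplied by the caller rather than computed by equations (7) and (8); equation (11) is replaced by the hypothetical mapping zt = αt(xt − x̄t) + x̄t, where x̄t is the frame mean; and, since αt depends on zt, a provisional output computed with αt−1 decides between equations (9) and (10) before the final zt is produced with αt. The function name is hypothetical.

```python
def compress_frames(envelopes, bounds, alpha0=1.0, d_alpha1=1.0, d_alpha2=-0.1):
    """Frame-by-frame adaptive compression (sketch under stated assumptions).

    envelopes: amplitude envelopes of x_1..x_T (lists of floats)
    bounds:    per-frame (LB, UB) pairs supplied by the caller
    Hypothetical stand-in for eq. (11): z_t = alpha_t * (x_t - mean) + mean.
    """
    alpha, outputs, alphas = alpha0, [], []
    for x_t, (lb, ub) in zip(envelopes, bounds):
        m = sum(x_t) / len(x_t)
        trial = [alpha * (x - m) + m for x in x_t]  # provisional output with alpha_{t-1}
        if all(lb <= z <= ub for z in trial):
            alpha += d_alpha1                        # equation (9)
        else:
            alpha += d_alpha2                        # equation (10)
        outputs.append([alpha * (x - m) + m for x in x_t])  # output frame z_t with alpha_t
        alphas.append(alpha)
    return outputs, alphas
```

With envelopes that stay inside the monitoring range, αt rises by Δα1 each frame (2, 3, 4 starting from α0 = 1), widening the output swing around the frame mean; once a provisional output crosses a boundary, αt falls by Δα2 instead.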
Number | Date | Country
---|---|---
20170056654 A1 | Mar 2017 | US