This document relates to speech enhancement in a breathing apparatus.
Numerous situations require the use of a breathing apparatus, such as the absence of a breathable atmosphere or the potential for this condition. An exemplary breathing apparatus consists of a face mask with a regulator that supplies air from a high pressure hose on demand from the user. The high pressure hose is usually connected to an air tank. When the pressure in the air tank falls below a set level, a low air alarm is generated to warn the user. A common low air alarm is generated by a valve in the regulator that releases pulses of air that the user can easily sense. These pulses of air can produce pressure levels inside the mask that exceed the user's voice pressure levels. These high pressure levels can act as interfering noise that makes tasks such as communication or automatic speech recognition more difficult.
A second source of interfering noise results from the turbulence of the air or gas released into the breathing mask by the regulator during inhalation. Inhalation noise may be reduced by turning a microphone off when the pressure drops.
Inhalation noise may be detected and attenuated by measuring the frequency response of a breathing mask to determine resonances and antiresonances, and by acting on this information.
In one aspect, generally, a breathing apparatus speech enhancement system includes a breathing mask, a primary sensor which produces a primary signal, and at least one reference sensor which produces a reference signal. A processor combines the sensor signals to produce an output signal with an enhanced speech component.
Implementations may include one or more of the following features. For example, each of the primary sensor and the reference sensor may be a microphone, such as a microphone of the noise canceling or gradient type.
The primary sensor may be mounted on the breathing mask so as to be near the mouth of a user wearing the breathing mask. When the breathing mask includes a voice port, the primary sensor may be mounted externally to the mask near the voice port.
A reference sensor may be mounted near a noise source, such as the regulator. The breathing mask may include a breath screen to shield at least one reference sensor to reduce the impact of air flow from the user's mouth.
The system may include a wireless transmitter connected to transmit the primary signal and/or the reference signal wirelessly.
The system may be incorporated in a communication system and may further include a speech recognition system configured to process the output signal with the enhanced speech component.
The processor may employ a filter to filter the reference signal, and may subtract the filtered reference signal from the primary signal to produce the output signal. The processor may update the filter based on the output signal and the reference signal. The processor may do so in a transform domain to improve a convergence rate of the filter.
The system may employ techniques for detecting the exclusive presence of an alarm signal. For example, the processor may detect the exclusive presence of an alarm signal by receiving the primary signal, determining the energy of the primary signal, determining a peak count of the number of consecutive energy samples below a first threshold, and determining a valley count of the number of consecutive energy samples above a second threshold. The processor then determines an alarm count of the number of consecutive samples for which the peak count and valley count are below a third threshold, and declares the exclusive presence of the alarm signal when the alarm count exceeds a fourth threshold. The processor may be configured to only update the filter upon detecting the exclusive presence of an alarm signal.
More general systems and techniques for detecting the exclusive presence of an alarm signal may be provided. For example, a method for such detection may include receiving a digitized audio signal, determining the energy of the digitized audio signal, determining a peak count of the number of consecutive energy samples below a first threshold, determining a valley count of the number of consecutive energy samples above a second threshold, determining an alarm count of the number of consecutive samples for which the peak count and valley count are below a third threshold, and declaring the exclusive presence of the alarm signal when the alarm count exceeds a fourth threshold. A system for such detection may include a processor configured to perform the method described above.
The system also may employ triple filter noise cancellation techniques to achieve improved noise cancellation performance through reduction of filter maladaptation. For example, the processor may filter the reference signal with an output filter to produce an output filtered reference signal and subtract the output filtered reference signal from the primary signal to produce an output signal. The processor also may filter the reference signal with an evaluation filter to produce an evaluation filtered reference signal, and subtract the evaluation filtered reference signal from the primary signal to produce an evaluation signal. Finally, the processor may filter the reference signal with an update filter to produce an update filtered reference signal, subtract the update filtered reference signal from the primary signal to produce an update signal, modify the update filter based on the reference signal and the update signal, modify the evaluation filter based on the update filter, and modify the output filter based on the output signal and the evaluation signal.
More general systems and techniques for triple filter noise cancellation may be provided. For example, a method for such noise cancellation may include receiving a digitized primary audio signal, receiving at least one digitized reference audio signal, filtering the at least one reference signal with an output filter to produce an output filtered reference signal, subtracting the output filtered reference signal from the primary signal to produce an output signal, filtering the at least one reference signal with an evaluation filter to produce an evaluation filtered reference signal, subtracting the evaluation filtered reference signal from the primary signal to produce an evaluation signal, filtering the at least one reference signal with an update filter to produce an update filtered reference signal, subtracting the update filtered reference signal from the primary signal to produce an update signal, modifying the update filter based on the reference signal and the update signal, modifying the evaluation filter based on the update filter, and modifying the output filter based on the output signal and the evaluation signal.
The update filter may be modified only when the exclusive presence of a noise signal is declared, such as by using the techniques above.
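The three signal paths described above share one reference signal and differ only in the filter applied to it. A minimal sketch of that structure follows, assuming finite impulse response filters stored as Python lists. The function names `fir_residual` and `triple_residuals` are hypothetical and are introduced only for illustration.

```python
def fir_residual(primary, reference, h):
    """Return y(n) - sum_m h[m] * x(n - m), treating x(n) as zero for n < 0."""
    out = []
    for n, y in enumerate(primary):
        # Filtered reference: convolution of the coefficients with past reference samples.
        est = sum(h[m] * reference[n - m] for m in range(len(h)) if n - m >= 0)
        out.append(y - est)
    return out


def triple_residuals(primary, reference, h_output, h_evaluation, h_update):
    """Return the output, evaluation, and update signals for the same inputs."""
    return tuple(
        fir_residual(primary, reference, h)
        for h in (h_output, h_evaluation, h_update)
    )
```

Only the first returned signal would be delivered as the enhanced output; the other two exist so that a candidate filter can be assessed before it is allowed to affect that output.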
The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features will be apparent from the description and drawings, and from the claims.
In some applications, such as retrofitting an existing breathing mask with sensors, it may be desirable to avoid penetration of the mask by cable 18. One method of achieving this objective is to connect the sensors to a wireless transmitter mounted interior to the mask. The primary and reference signals are then transmitted to a wireless receiver external to the mask which is connected to a processor.
Another method of avoiding mask penetration is to mount the sensors external to the mask. An exemplary location for the primary sensor 13 is near the external portion of voice port 19. An exemplary location for the reference sensor 15 is near demand regulator 12.
The filtered reference signal produced by the filter 51 is then removed from the primary signal using subtraction unit 52 to produce output signal e(n).
Filter update unit 53 updates the filter coefficients h(n, m) based on the primary signal y(n), the reference signal x(n), and the output signal e(n). A simple normalized least mean squares (NLMS) filter update is given by
where μ is the step size with an exemplary value of
is an estimate of the variance of x(n). An estimate for σx(n) is
σx(n)=max(
where the function max(a, b) returns the maximum of a or b, σmin has an exemplary value of 0.01, and
where α has an exemplary value of 0.01 and β has an exemplary value of 0.0625. Estimating σx(n) rather than σx²(n) reduces the dynamic range of the estimated parameter and leads to reduced computation or better performance for a fixed word length implementation.
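As a rough illustration of the filter, subtraction, and filter update steps, the sketch below implements a textbook NLMS canceller in plain Python. The update form h(n+1, m) = h(n, m) + μ e(n) x(n−m) / (M σx²(n)), the filter length M, the step size, the one-pole tracker of |x(n)| used for σx(n), and its starting value are all assumptions chosen for the example and are not taken from the equations of this description. Only the floor σmin = 0.01 and the smoothing constant 0.01 reuse the exemplary values given above.

```python
def nlms_cancel(primary, reference, num_taps=8, mu=0.2,
                sigma_min=0.01, alpha=0.01, sigma_init=1.0):
    """Return (e, h): the output signal e(n) and the final filter coefficients."""
    h = [0.0] * num_taps          # filter coefficients h(n, m)
    history = [0.0] * num_taps    # x(n), x(n-1), ..., x(n-M+1)
    sigma = sigma_init            # assumed: start high so early steps stay small
    output = []
    for y, x in zip(primary, reference):
        history = [x] + history[:-1]
        # Assumed tracker: one-pole smoothing of |x(n)|, floored at sigma_min.
        sigma = max((1.0 - alpha) * sigma + alpha * abs(x), sigma_min)
        # Subtract the filtered reference from the primary signal.
        e = y - sum(c * v for c, v in zip(h, history))
        # Normalized gradient step on every coefficient.
        scale = mu * e / (num_taps * sigma * sigma)
        h = [c + scale * v for c, v in zip(h, history)]
        output.append(e)
    return output, h
```

When the primary signal contains only a filtered copy of the reference, the output of this sketch decays toward zero as the coefficients approach the unknown filter. Tracking σx(n) and squaring it once per sample mirrors the point made above about dynamic range.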
In order to prevent maladaptation of the filter when speech is present, a detector is necessary for the condition where only noise is present. A Low Air Alarm Only (LAAO) detector operates by first computing the energy in the reference signal
where an exemplary value for the block size L is 80 samples. An example of the energy γ(n) is shown in
The energy γ(n) is compared to a threshold Tp and a peak count Np(n) of the number of consecutive samples below threshold is maintained
where S1 is the update interval with an exemplary value of 10 samples. Because the energy estimate in Equation 5 applies a rectangular low pass filter of length L, the update interval S1 may be larger than 1 without loss. The threshold Tp has an exemplary value of 2.0.
The energy γ(n) is compared to a threshold Tv and a valley count Nv(n) of the number of consecutive samples above threshold is maintained
The threshold Tv has an exemplary value of 0.1.
The counts Np(n) and Nv(n) are compared to threshold Tn to update LAAO count Na(n)
where the threshold Tn has an exemplary value of 500.
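The counting logic of the LAAO detector can be sketched as follows. This is an illustration under stated assumptions rather than the exact equations of this description: the energy is taken as the plain sum of squares over the last L samples, each counter advances by S1 at every update and resets to zero otherwise, and the final decision threshold `t_alarm` of 1000 is a hypothetical value. The block size, update interval, and thresholds Tp, Tv, and Tn use the exemplary values given above.

```python
def laao_detect(x, block=80, step=10, t_peak=2.0, t_valley=0.1,
                t_count=500, t_alarm=1000):
    """Return one boolean per update: True when only the alarm appears present."""
    n_peak = n_valley = n_alarm = 0
    flags = []
    for n in range(block - 1, len(x), step):
        # Assumed energy estimate: sum of squares over the last `block` samples.
        energy = sum(v * v for v in x[n - block + 1:n + 1])
        # Samples elapsed since the energy last reached a peak (>= t_peak).
        n_peak = n_peak + step if energy < t_peak else 0
        # Samples elapsed since the energy last fell into a valley (<= t_valley).
        n_valley = n_valley + step if energy > t_valley else 0
        # The pulsed alarm keeps both counters small; anything else lets one grow.
        if n_peak < t_count and n_valley < t_count:
            n_alarm += step
        else:
            n_alarm = 0
        flags.append(n_alarm > t_alarm)
    return flags
```

A pulsed signal that alternates between loud and quiet segments shorter than Tn keeps both counters low, so the alarm count grows until the detector fires. A steadily loud signal lets the valley count exceed Tn, and silence lets the peak count exceed Tn, so in both cases the alarm count is reset.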
The convergence rate for the NLMS filter update depends on the eigenvalue spread of the covariance matrix of x(n). When x(n) is white noise, the eigenvalue spread is minimal and convergence is rapid. However, the internal reflections of the acoustic signals within the breathing mask produce resonances and antiresonances, or poles and zeros, in the frequency response. These can produce a large spread in the eigenvalues and a consequent slow convergence rate.
One method of improving the convergence rate is to transform the signals to the frequency domain using the Discrete Fourier Transform (DFT) before updating the filter. This allows normalization by the variance estimate at each DFT frequency, which effectively reduces the eigenvalue spread and increases the convergence rate. The filter update is computed by
h(n+S,m)=h(n,m)+μ1g(n,m) (9)
where S is an update block size with an exemplary value of 80 samples, μ1 is a step size with an exemplary value of 0.1, and g(n, m) is the inverse DFT of G(n, k) computed by
where K, the DFT length, has an exemplary value of 256.
The frequency domain update G(n, k) is computed by
where X(n,k) is a Short Time Fourier Transform (STFT) of x(n)
and E*(n, k) is the complex conjugate of a STFT of e(n)
The variance σx²(n, k) may be estimated as follows
σx(n,k)=max((|Xr(n,k)|+|Xi(n,k)|),σmin) (14)
where Xr(n, k) and Xi(n, k) denote the real and imaginary parts of X(n, k). Estimating σx(n, k) rather than σx²(n, k) reduces the dynamic range of the estimated parameter and leads to reduced computation or better performance for a fixed word length implementation.
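Equation 14 avoids both a square and a square root by summing absolute values. Taking Xr and Xi to be the real and imaginary parts of X(n, k), which is an interpretation of the notation, the sum lies between the true magnitude |X| and √2·|X|. The short sketch below computes the per-bin estimate; the function name `sigma_estimate` is hypothetical.

```python
def sigma_estimate(spectrum, sigma_min=0.01):
    """Per-bin estimate max(|Xr| + |Xi|, sigma_min) for a list of complex bins."""
    return [max(abs(c.real) + abs(c.imag), sigma_min) for c in spectrum]


# Example bins: a 3-4-5 triangle, a purely real bin, an empty bin, a diagonal bin.
bins = [3 + 4j, -1 + 0j, 0j, 2 - 2j]
estimates = sigma_estimate(bins)
```

For the bin 3 + 4j the estimate is 7 against a true magnitude of 5, and the empty bin is held at the floor σmin so that a later division by the estimate stays bounded.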
When low amplitude speech is present, such as at the start of a phrase, the LAAO detector may not properly indicate that filter adaptation should be disabled. This can lead to small maladaptations of the filter, which reduce noise cancellation performance.
Filter update unit 107 monitors signals e0(n), e1(n), e2(n), x(n), and y(n) to decide how to update filters h0(n, k), h1(n, k), and h2(n, k). First, the estimated standard deviations σe
Then, filter update unit 107 updates h2(n, m) in a manner similar to the single filter ANC discussed above with reference to Equation 9:
h2(n+S,m)=h2(n,m)+μ1g(n,m) (18)
The other filters are updated based on the estimated standard deviations σe
The filter update unit 107 starts the triple filter update at step 111 and executes the triple filter update at an interval of T samples, where T has an exemplary value of 2000. It should be noted that if a filter update is not explicitly encountered in the flow chart, then the new value hp(n, m) should be set to the previous value hp(n−T, m). At step 112, the unit 107 compares the LAAO count Na(n) to the threshold Ta. If the LAAO count is greater than the threshold, the unit 107 executes step 113. Otherwise, the unit 107 proceeds to step 117.
At step 113, the unit 107 compares the estimated standard deviations σe
At step 114, the unit 107 sets the coefficients of the output filter h0(n, m) to the coefficients of the previous version of the evaluation filter h1(n−T, m) since h1(n−T, m) produces a lower estimated standard deviation. At step 114, the unit 107 also sets σe
At step 115, the unit 107 sets the coefficients of the evaluation filter h1(n, m) to the coefficients of the update filter h2(n, m) so that the most recent filter update may be evaluated. Step 116 signifies the end of this update. At step 117, the unit 107 sets all of the filters to the previous value of the output filter h0(n−T, m) to prevent maladaptations in h1(n, m) and h2(n, m) from reaching the output filter h0(n, m). The unit 107 also updates the estimated standard deviations appropriately.
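The decision flow of steps 112 through 117 can be summarized in a few lines. The sketch below is an interpretation, not the flow chart itself: it assumes that step 113 tests whether the evaluation signal has a lower estimated standard deviation than the output signal, and it omits the bookkeeping of the standard deviation estimates. The function name and its parameters are hypothetical.

```python
def triple_filter_step(h0, h1, h2, sigma_e0, sigma_e1, laao_count, t_alarm):
    """Return the new (h0, h1, h2) after one update interval of T samples.

    h0, h1, h2 are the previous output, evaluation, and update filters.
    """
    if laao_count > t_alarm:          # step 112: alarm-only condition holds
        if sigma_e1 < sigma_e0:       # step 113 (assumed form of the comparison)
            h0 = list(h1)             # step 114: promote the evaluation filter
        h1 = list(h2)                 # step 115: evaluate the most recent update
    else:                             # step 117: fall back to the output filter
        h1 = list(h0)
        h2 = list(h0)
    return h0, h1, h2
```

Because a new update must first pass through the evaluation filter for a full interval, a maladapted update filter needs two consecutive favorable decisions before it can reach the output filter.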
Other implementations are within the scope of the following claims.