1. Technical Field
This invention relates to signal processing systems. In particular, this invention relates to a signal processing system which imparts a measure of robustness against tonal noise to other signal processing systems.
2. Related Art
Most if not all signal processing systems must intelligently handle input signal noise. The input signal noise may mask, corrupt, distort or otherwise detrimentally affect desired components of the input signal. Input signal noise also may mimic desired input signal components and increase the difficulty of identifying, removing, or compensating for the input signal noise, regardless of the signal processing system or its purpose.
Tonal noise is one form of noise which mimics desired input signal components in some applications. For example, speech processing systems commonly detect and process voice signal components which contain harmonic activity. Vowel sounds and certain consonants exhibit characteristic tonal content which the processing system employs to determine when an individual is speaking, what they are saying, or other characteristics of the speech.
A speech processing system which examines an input signal for desired signal content may interpret the tonal noise as speech, may isolate a segment of the input signal with the tonal noise, and may attempt to process the tonal noise. The speech processing system consumes valuable computational resources not only to isolate the segment, but also to process the segment and take action based on the result of the processing. In a speech recognition system, the system may interpret the tonal noise as a voice command, execute the spurious command, and responsively take actions that were never intended.
There is a need for a system that provides tonal noise robustness for signal processing systems.
This invention provides a pre-processing system which mitigates or eliminates detection of tonal noise as a signal component for further processing. The pre-processing system produces an output signal which may be more reliably analyzed by any downstream processing system. The output signal suppresses tonal noise, while maintaining desired signal content. Downstream processing systems are less likely to mistake tonal input signal noise for desired signal content, to needlessly consume computational resources, and to take actions that are not called for by the input signal content.
A pre-processing system includes a memory and a processor coupled to the memory. The memory stores a smoothing program, a background noise estimate, and a blending program. The smoothing program applies an attenuation to signal peaks in an input signal to generate a smoothed signal. The blending program combines the smoothed signal with the input signal, based on the background noise estimate, to generate an output signal. The processor executes the smoothing program and the blending program.
The attenuation may be a multi-pass windowed average on the input signal. The attenuation may smooth the noise peaks, such as tonal noise peaks, as well as desired signal peaks in the input signal. Other attenuations may be employed.
The blending program determines output signal components based on input signal components and smoothed signal components. The output signal component may depend in part on the signal-to-noise ratio of the input signal, or other noise measure. Depending on the SNR, the output signal component may be the input signal component, the smoothed signal component, or may be a mix of both the input signal component and the smoothed signal component. Mixtures of fewer or additional signals in other amounts also may be employed.
Other systems, methods, features and advantages of the invention will be, or will become, apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention, and be protected by the following claims.
The invention can be better understood with reference to the following drawings and description. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. Moreover, in the figures, like referenced numerals designate corresponding parts throughout the different views.
A signal processing system reduces the likelihood of detecting tonal noise as a signal component of interest for further processing. The signal processing system provides an output signal for subsequent processing circuitry or logic. The output signal includes desired signal content present in the input signal, while reducing or eliminating tonal noise. The subsequent processing stages may avoid spending time or computational resources to process noise which has been mistaken as a signal of interest.
The input signal ‘x’ 106 includes desired signal components and undesired signal components. The discussion below describes a pre-processing system for a voice recognition system in a vehicle. However, the processing system 100 may be used in any other application which processes an input signal.
The undesired signal sources 114 include a tonal noise source 116. The tonal noise source 116 generates a signal which may corrupt, mask, or distort the voice 112. The tonal noise source 116 produces a signal with periodic components. Tonal noise sources may include engine hum or whine or other electromagnetic interference; vehicle tires (e.g., as the tires run over pavement grooves or raised pavement markers such as rumble strips) or other mechanical noise sources; audio output, including noise, from vehicle audio/visual systems; other voices in the vehicle; or other tonal noise sources.
The microphone 118 captures the sound produced by desired signal sources 110 and the undesired signal sources 114. The microphone 118 may be part of the voice recognition system in the vehicle, part of a hands free phone system, or part of any other system in the vehicle. The microphone 118 captures the sound and provides a corresponding electrical signal to the automatic gain controller 108. The automatic gain controller 108 adjusts the input signal level according to the dynamic range of the analog-to-digital converter 109.
Tonal noise may couple directly into the input signal before or after the microphone 118 and/or the automatic gain controller 108. Thus, tonal noise need not be audible and need not be captured by the microphone 118 in order to be present in the input signal ‘x’ 106. For example, electromagnetic noise generated by engine electronics may produce tonal noise that couples directly into the input signal.
The processor 102 executes the noise estimator 120, the smoothing program 122, and the blending program 124. The noise estimator 120 may be circuitry or logic that provides a background noise estimate. The noise estimator 120 may measure input signal levels during periods of time when there is no voice activity to form a background noise estimate. Alternatively, or additionally, the noise estimator 120 may form an average or other statistical measure of the input signal ‘x’ 106 in time or frequency content over a window of time (e.g., 1-500 ms, 1-5 s, or other window) regardless of whether voice is present to obtain the background noise estimate. Other noise estimation techniques based on signal magnitude, frequency content, or other characteristics also may be employed.
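As a rough illustration of the first option described above (measuring levels while there is no voice activity), the following sketch keeps a per-bin running average of non-speech frames. The class name `NoiseEstimator`, the smoothing factor `alpha`, the externally supplied `voice_active` flag, and the choice of an exponential moving average over dB values are all assumptions made for the example; the description does not prescribe a particular averaging method.

```python
from typing import List, Optional


class NoiseEstimator:
    """Per-bin running average of frames judged to contain no voice (hypothetical)."""

    def __init__(self, alpha: float = 0.1) -> None:
        self.alpha = alpha  # assumed smoothing factor in (0, 1]
        self.estimate: Optional[List[float]] = None

    def update(self, frame_db: List[float], voice_active: bool) -> List[float]:
        """Fold one frame of per-bin levels (in dB) into the noise estimate."""
        if self.estimate is None:
            # Seed with the first frame so an estimate is always available.
            self.estimate = list(frame_db)
        elif not voice_active:
            # Exponential moving average over non-speech frames only.
            self.estimate = [
                (1.0 - self.alpha) * n + self.alpha * f
                for n, f in zip(self.estimate, frame_db)
            ]
        return list(self.estimate)
```

Frames flagged as containing voice leave the estimate unchanged, so speech energy does not leak into the background noise estimate.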
The smoothing program 122 reduces or eliminates peaks in the input signal ‘x’ 106. The peaks may be tonal noise peaks, desired signal peaks, or both types of peaks. The smoothing program 122 generates a smoothed signal 126.
The smoothing parameters 128 establish configuration options for the smoothing program 122. The smoothing parameters 128 may select between multiple smoothing techniques which may be applied to the input signal, may provide parameters for any of the smoothing techniques, or may otherwise establish configuration options for the smoothing program 122. Alternatively, the smoothing program 122 may be pre-configured for any desired smoothing technique.
In one implementation, the smoothing parameters 128 select a windowed average smoothing technique. The smoothing parameters 128 may further specify whether the smoothing program 122 will apply a one-pass windowed average, two-pass windowed average, or other multi-pass windowed average. Additionally, the smoothing parameters 128 may specify the window size for each pass of the windowed average, how the average is calculated, whether to discard outlying samples, the outlying sample threshold, which passes may discard outlying samples, or other smoothing parameters.
The blending program 124 implements the blending rules 132 to generate the output signal ‘y’ 130. The blending parameters 134 may establish operating parameters for the blending program 124. The blending parameters 134 establish a lower SNR threshold 136, an upper SNR threshold 138, and may include a blending function specifier 140. Alternatively, the blending program 124 may implement a pre-configured technique for generating the output signal ‘y’ 130.
The processor 102 employs the background noise estimate to form a signal-to-noise ratio (SNR) spectrum estimate for the input signal ‘x’ 106. The SNR estimate may be updated on a sample by sample basis, periodically, when discrete events occur, prior to execution of the blending program 124, or at any other time. The SNR estimate influences the operation of the blending program 124.
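A minimal sketch of forming a per-bin SNR estimate from the input spectrum and the background noise estimate might look as follows. It assumes both are available as power values per frequency bin; the function name `snr_db_from_power` and the `floor` guard against zero power are illustrative choices, not part of the description.

```python
import math
from typing import List


def snr_db_from_power(signal_power: List[float],
                      noise_power: List[float],
                      floor: float = 1e-12) -> List[float]:
    """Return the per-bin SNR in dB given per-bin signal and noise power."""
    if len(signal_power) != len(noise_power):
        raise ValueError("spectra must have the same number of bins")
    return [
        # The floor keeps the ratio and the logarithm well defined.
        10.0 * math.log10(max(p, floor) / max(n, floor))
        for p, n in zip(signal_power, noise_power)
    ]
```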
The blending program 124 takes into consideration the spectra of the input signal, background noise estimate, and smoothed signal. The processor 102 may apply a time-to-frequency transform such as a Fast Fourier Transform to obtain the spectra. The time-to-frequency transform may have a length of 256, 512, or any other length which reveals tonal peaks in the input signal ‘x’ 106.
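To show what a time-to-frequency transform of length 256 yields, the sketch below computes a one-sided magnitude spectrum in dB. It uses a direct discrete Fourier transform rather than a Fast Fourier Transform so that it needs only the standard library; the bins are the same, only the speed differs. The function name `magnitude_spectrum_db` and the `floor_db` clamp are assumptions for the example.

```python
import cmath
import math
from typing import List


def magnitude_spectrum_db(frame: List[float], floor_db: float = -120.0) -> List[float]:
    """One-sided magnitude spectrum of a frame, in dB, via a direct DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, sample in enumerate(frame):
            acc += sample * cmath.exp(-2j * math.pi * k * t / n)
        mag = abs(acc) / n
        level = 20.0 * math.log10(mag) if mag > 0.0 else floor_db
        # Clamp very small values so the spectrum stays finite.
        spectrum.append(max(level, floor_db))
    return spectrum


# Usage: a pure tone centered on bin 16 of a 256-point frame shows up as one peak.
tone = [math.sin(2.0 * math.pi * 16 * t / 256) for t in range(256)]
tone_spectrum = magnitude_spectrum_db(tone)
```

A tonal noise source appears in such a spectrum as a narrow peak of this kind, which is what the smoothing program later attenuates.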
The time-to-frequency transform generates discrete signal components representative of frequency content in the input signal and background noise estimate. The smoothed signal 126 obtained from the input signal may also be represented as discrete frequency signal components. The blending program 124 determines one or more output signal components based on the input signal components, smoothed signal components, and SNR estimate.
Any other rule or set of rules may be established to direct the operation of the blending program 124.
The lower SNR threshold 136 determines when the blending program 124 uses a smoothed signal component as an output signal spectrum component. As the blending program 124 creates the output signal, the blending rule 144 directs the blending program 124 to use the smoothed signal component for the current output signal ‘y’ 130 component, when the SNR estimate is less than the lower SNR threshold 136. The upper SNR threshold 138 may determine when the blending program 124 uses an input signal component as an output signal spectrum component. As the blending program 124 creates the output signal ‘y’ 130, the blending rule 142 directs the blending program 124 to use the input signal component for the current output signal component, when the SNR estimate is greater than the upper SNR threshold 138.
The SNR estimate may also lie between the upper SNR threshold 138 and the lower SNR threshold 136. In that case, the blending rule 146 directs the blending program 124 to determine the current output signal component by evaluating a blending function of the input signal component and the smoothed signal component. The blending function specifier 140 may direct the blending program 124 to determine a weighted average of the input signal component and the smoothed signal component. Other blending functions may be used and may take into consideration different, additional or fewer signals.
The weighted average may be a linear SNR weighted average:
$$y = s + (x - s)\,\frac{\mathrm{SNR} - \mathrm{lower}}{\mathrm{upper} - \mathrm{lower}}$$
where ‘y’ is the output signal component, ‘s’ is the smoothed signal component, ‘x’ is the input signal component, ‘upper’ is the upper SNR threshold 138, ‘lower’ is the lower SNR threshold 136, and ‘SNR’ is the SNR estimate. Thus, if the SNR estimate is 80% of the way from the lower SNR threshold 136 to the upper SNR threshold 138, the output signal component is set to 20% of the smoothed signal component and 80% of the input signal component. Other linear and/or non-linear weightings may also be employed.
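The 80%/20% example can be checked with a short sketch. The function name `linear_snr_mix` and the sample values are hypothetical; the weighting follows the description, with the weight on the input component growing linearly from 0 at the lower threshold to 1 at the upper threshold.

```python
def linear_snr_mix(x: float, s: float, snr: float,
                   lower: float, upper: float) -> float:
    """Linear SNR weighted mix of input component x and smoothed component s."""
    if not lower < upper:
        raise ValueError("lower threshold must be below upper threshold")
    if not lower <= snr <= upper:
        raise ValueError("the mix applies only between the two thresholds")
    weight = (snr - lower) / (upper - lower)  # fraction of the way to 'upper'
    return weight * x + (1.0 - weight) * s


# Usage: thresholds of 0 dB and 5 dB with an SNR of 4 dB put the estimate
# 80% of the way to the upper threshold, so the result is 0.8 * x + 0.2 * s.
mixed = linear_snr_mix(x=30.0, s=10.0, snr=4.0, lower=0.0, upper=5.0)
```

With these sample values the mix comes out at approximately 26, that is, 80% of 30 plus 20% of 10.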
The blending program 124 may determine the output signal spectral components in decibels (dB), based on input signal and smoothed signal components also expressed in dB. Alternatively, the blending program 124 may determine the output signal components based on the power or amplitude of the input signal or smoothed signal components. The processor 102 may also convert the output signal ‘y’ 130 into another representation such as power or amplitude prior to providing the output signal ‘y’ to another processing stage.
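The conversions between dB, power, and amplitude mentioned above follow the usual definitions, sketched here with hypothetical helper names. The `floor` argument is an added guard so that a zero-power bin does not produce an undefined logarithm.

```python
import math


def db_to_power(level_db: float) -> float:
    """Convert a level in dB to linear power."""
    return 10.0 ** (level_db / 10.0)


def db_to_amplitude(level_db: float) -> float:
    """Convert a level in dB to linear amplitude."""
    return 10.0 ** (level_db / 20.0)


def power_to_db(power: float, floor: float = 1e-12) -> float:
    """Convert linear power to dB, clamping very small values."""
    return 10.0 * math.log10(max(power, floor))
```

Power is the square of amplitude, so `db_to_power(v)` and `db_to_amplitude(v) ** 2` agree up to rounding.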
The broadband increase in signal energy may cause a signal detector or other processing logic to determine that the input signal should be analyzed for voice commands to the vehicle voice recognition system. The voice recognition system may employ a pitch detector, endpointer, or other signal processing system to examine the input signal ‘x’ 106 in response to the signal detection. The tonal noise mimics characteristics of speech (e.g., vowel sounds) and may result in a false identification of speech content in the input signal. The processing system 100 smooths and blends the input signal ‘x’ 106 to reduce or eliminate false identifications.
The smoothing program 122 first applies the averaging window 508 to the input signal components. The smoothing program 122 generates a first windowed average of the input signal components inside the window 508. The smoothing program 122 moves the averaging window 508 index position by index position along the input signal components. At each index position, the smoothing program 122 determines a new spectral component of the first windowed average signal.
During the second pass, the smoothing program 122 applies the second pass averaging window 510 to the input signal components. The second pass averaging window 510 may be the same size, larger, or smaller than the first pass averaging window 508. The smoothing program 122 generates smoothed spectral signal components based on the first windowed averaged components and the input signal components inside the window 510. The smoothing program 122 moves the second averaging window 510 index position by index position along the input signal components. At each index position, the smoothing program 122 determines a new signal component of the smoothed signal spectrum.
During the second pass of the windowed average, the smoothing program 122 may discard or otherwise eliminate from consideration outlying signal components for any given index position.
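The two passes can be sketched as follows, under several assumptions that the description leaves open: windows are centered and shrink at the spectrum edges, an input component counts as outlying when it exceeds the first-pass average at the current index position by more than `outlier_db`, and the first-pass value is used if every component in a window is discarded. The function names and default values are hypothetical.

```python
from typing import List


def windowed_average(values: List[float], half_width: int) -> List[float]:
    """First pass: centered moving average, truncated at the edges."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        window = values[lo:hi]
        out.append(sum(window) / len(window))
    return out


def two_pass_smooth(spectrum_db: List[float],
                    first_half_width: int = 4,
                    second_half_width: int = 4,
                    outlier_db: float = 6.0) -> List[float]:
    """Second pass re-averages the input while skipping outlying components."""
    first = windowed_average(spectrum_db, first_half_width)
    smoothed = []
    for i in range(len(spectrum_db)):
        lo = max(0, i - second_half_width)
        hi = min(len(spectrum_db), i + second_half_width + 1)
        # Keep only components that do not stand far above the first-pass level.
        kept = [v for v in spectrum_db[lo:hi] if v - first[i] <= outlier_db]
        smoothed.append(sum(kept) / len(kept) if kept else first[i])
    return smoothed
```

With a flat 10 dB spectrum containing a single 40 dB peak, the first pass spreads the peak over its neighbors, while the second pass discards the peak and returns a flat 10 dB result.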
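The blending program 124 generates the output signal spectrum 802 as a mix of the input signal spectrum 302 and the smoothed signal spectrum 402. The blending program 124 performs the mix based in part on the background noise estimate 804. The mix may follow the blending rules 132 or other rules. In one implementation, an output signal component ‘y’ at each spectral index position is given by:
$$y = \begin{cases} x, & \mathrm{SNR} > \mathrm{upper} \\ s, & \mathrm{SNR} < \mathrm{lower} \\ s + (x - s)\,\dfrac{\mathrm{SNR} - \mathrm{lower}}{\mathrm{upper} - \mathrm{lower}}, & \text{otherwise} \end{cases}$$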
where ‘x’ is the input signal component at that index position, ‘s’ is the smoothed input signal component at that index position, SNR is the SNR estimate, ‘upper’ is the upper SNR threshold 138 and ‘lower’ is the lower SNR threshold 136.
The upper SNR threshold 138 may be 1-10 dB, 2-8 dB, 4-6 dB, or any other upper threshold. The lower SNR threshold 136 may be 0-1 dB, less than 0 dB, or any other lower threshold. The thresholds 136 and 138 may be dynamically set or adapted during operation of the processing system 100.
The smoothing program 122 generates the smoothed signal spectrum 908 from the input signal spectrum 902. The smoothed signal spectrum 908 significantly reduces or eliminates peaks in the input signal spectrum 902 while retaining attenuated characteristics of the input signal. Both the tonal noise and voice content peaks are smoothed or eliminated in the smoothed signal spectrum 908.
The output signal spectrum 910 reproduces the components of the input signal spectrum 902 with relatively high SNR. The output signal spectrum 910 thus includes spectral components 912 representing the voice content 904. In addition, the output signal spectrum 910 significantly reduces or eliminates the tonal noise peaks 806-814 by using the smoothed signal components when the input signal SNR is low.
In generating an output signal component, the blending program 124 uses the input signal component when the SNR exceeds the upper threshold 138. The output signal spectrum 910 thereby captures the desired signal content in the input signal spectrum 902. The blending program 124 uses the smoothed signal components when the SNR is less than the lower threshold 136. The output signal spectrum 910 thereby reflects the significant attenuation of the peaks originally present in the input signal spectrum 902.
The output signal spectrum 910 may be provided to subsequent processing systems, such as a pitch detector, voice recognition system, or other system. The processor 102 may provide the output signal ‘y’ 130 in the form of spectral samples, in terms of amplitude or power (e.g., as the square of the amplitude), or in any other form based on the output signal spectrum 910. The output signal ‘y’ 130 has significantly reduced or eliminated the tonal noise components 206-214, but has retained the desired signal content 904. The subsequent processing system may reliably detect and process the voice content originally present in the input signal ‘x’ 106, without false triggers caused by the tonal noise components 206-214 which may otherwise mimic the voice content or other desired signal content.
In preparation for smoothing the input signal spectrum 902, the smoothing program 122 reads the smoothing parameters 128 in the memory 104 (Act 1004). The smoothing parameters 128 may specify a smoothing algorithm, parameters for the smoothing algorithm such as window sizes for one or more windowed average passes, or other parameters. For a two-pass windowed average smoothing technique, the smoothing program 122 applies a first averaging window 508 to the input signal spectrum 902, position by position, to generate a first windowed averaged signal (Act 1006).
In the second pass, the smoothing program 122 applies a second averaging window 608 to the input signal (Act 1008). During the second pass, the smoothing program 122 may determine whether signal components in the current averaging window are outlying signal components. The smoothing program 122 may discard or attenuate the outlying signal components so that they do not contribute, or do not contribute as much, to the windowed average (Act 1010).
The smoothing program 122 generates a smoothed signal component based on the input signal components remaining in the window (Act 1010). When there are no further components in the input signal, the smoothing program 122 ends. Otherwise, the smoothing program 122 moves the second averaging window 608 to the next position (Act 1012) and continues. A smoothed signal spectrum 908 results.
The blending program 124 generates individual output signal spectrum components. For each component, the blending program 124 obtains the next input signal spectrum component, smoothed signal spectrum component, and SNR estimate (Act 1106). The blending program 124 applies the blending rules 132 to generate the next output signal spectrum component.
When the SNR is between the upper SNR threshold 138 and lower SNR threshold 136, the blending program 124 determines the output signal component to be a mix of the input signal component and the smoothed signal component (Act 1116). The mix may be a SNR weighted mix. Alternatively, other mixes of the same or different signals may also be employed to form the output signal component.
The blending program 124 may produce an output signal component for each input signal component. When there are no more input signal components (Act 1118), the blending program 124 ends. The output signal spectrum 910 results.
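The component-by-component loop can be sketched as below. The function name `blend_spectrum` and the default thresholds (0 dB and 6 dB, picked from within the ranges the description mentions) are assumptions; the three branches follow the blending rules, using the input component above the upper threshold, the smoothed component below the lower threshold, and a linear SNR weighted mix in between.

```python
from typing import List


def blend_spectrum(input_db: List[float],
                   smoothed_db: List[float],
                   snr_db: List[float],
                   lower: float = 0.0,
                   upper: float = 6.0) -> List[float]:
    """Produce one output component per input component."""
    if not (len(input_db) == len(smoothed_db) == len(snr_db)):
        raise ValueError("all spectra must have the same number of bins")
    if not lower < upper:
        raise ValueError("lower threshold must be below upper threshold")
    output = []
    for x, s, snr in zip(input_db, smoothed_db, snr_db):
        if snr > upper:
            y = x  # high SNR: keep the input component
        elif snr < lower:
            y = s  # low SNR: use the smoothed component
        else:
            weight = (snr - lower) / (upper - lower)
            y = weight * x + (1.0 - weight) * s  # SNR weighted mix
        output.append(y)
    return output
```

The loop ends when the input components are exhausted, which mirrors the flow described above.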
The pre-processing system 1200 may accept input from the input sources 1212 directly, or after initial processing by the signal processing systems 1214. The signal processing systems 1214 may accept digital or analog input from the input sources 1212, apply any desired processing to the signals, and provide an output signal to the pre-processing system 1200.
The input sources 1212 may include digital signal sources or analog signal sources such as analog sensors 1216. The input sources may include a microphone 1218 or other acoustic sensor. The microphone 1218 may capture voice commands to a voice recognition system in a vehicle, on a home computer, or in any other application. Other systems may employ other types of sensors 1220 which are also susceptible to tonal noise sources. The sensors 1220 may include touch, force, or motion sensors, inductive displacement sensors, proximity detectors, or other types of sensors.
The digital signal sources may include a communication interface 1222, memory, or other circuitry or logic in the system in which the pre-processing system 1200 is implemented. When the input source 1212 is a digital signal source, the signal processing systems 1214 may process the digital signal samples and generate an analog output signal. The pre-processing system 1200 may process the analog output signal or the digital signal samples.
The pre-processing system 1200 also connects to post-processing logic 1204. The post-processing logic 1204 may include an audio reproduction system 1224, digital and/or analog data transmission systems 1226, a pitch estimator 1228, a voice recognition system 1230, or other systems. The pre-processing system 1200 may provide a tonal noise robust output signal to any other type of post-processing logic 1204.
The voice recognition system 1230 may operate in conjunction with the pitch estimator 1228. The pitch estimator 1228 may include discrete cosine transform circuitry or logic and may process a power or amplitude based representation of the output signal spectrum 910. The voice recognition system 1230 may include circuitry and/or logic that interprets, takes direction from, records, or otherwise processes voice. The voice recognition system 1230 may process voice as part of a handsfree car phone, desktop or portable computer system, entertainment device, or any other system. In a handsfree car phone, the pre-processing system 1200 removes tonal noise and provides an output signal to the voice recognition system that is
The transmission system 1226 may provide a network connection, digital or analog transmitter, or other transmission circuitry and/or logic. The transmission system 1226 may communicate the tonal noise robust output signal generated by the pre-processing system 1200 to other devices. In a car phone, for example, the transmission system 1226 may communicate enhanced signals from the car phone to a base station or other receiver through a wireless connection such as a ZigBee, Mobile-Fi, Ultrawideband, Wi-Fi, or WiMax network.
The audio reproduction system 1224 may include digital to analog converters, filters, amplifiers, and other circuitry or logic. The audio reproduction system 1224 may be a speech and/or music reproduction system. The audio reproduction system 1224 may be implemented in a cellular phone, car phone, digital media player/recorder, radio, stereo, portable gaming device, or other devices employing sound reproduction.
The processing systems 100 and/or 1200 may be implemented in hardware and/or software. The processing systems 100 and/or 1200 may include a digital signal processor (DSP), microcontroller, or other processor. The processing systems 100 and/or 1200 may include discrete logic or circuitry, a mix of discrete logic and a processor, or may be distributed over multiple processors or programs. Additionally, or alternatively, the processing systems 100 and/or 1200 may take the form of instructions stored on a machine readable medium such as a disk, EPROM, flash card, or other memory.
The processing system 100 maintains desired signal content in the output signal ‘y’ 130, while suppressing tonal noise. The processing system 100 may remove strong tonal noise, allowing even subtle voice content to be detected in the output signal. The output signal ‘y’ 130 reduces the likelihood that subsequent processing circuitry or logic will interpret noise as a signal warranting further processing. Limited computational resources may be saved and the subsequent processing logic may avoid taking spurious actions, issuing incorrect commands, or responding in other ways which are not called for by the input signal.
While various embodiments of the invention have been described, it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the invention. Accordingly, the invention is not to be restricted except in light of the attached claims and their equivalents.