1. Technical Field
This invention relates to acoustics, and more particularly, to a system that enhances the perceptual quality of a processed voice.
2. Related Art
Many communication devices acquire, assimilate, and transfer a voice signal. Voice signals pass from one system to another through a communication medium. In some systems, including some systems used in vehicles, the clarity of the voice signal does not depend only on the quality of the communication system or the quality of the communication medium. The clarity of the voice signal may also depend on the amount of noise which accompanies the voice signal. When noise occurs near a source or a receiver, distortion garbles the voice signal, destroys information, and in some instances, masks the voice signal so that it is not recognized by a listener or a voice recognition system.
Noise, which may be annoying or distracting or may result in a loss of information, may come from many sources. Noise from a vehicle may be created by the engine, the road, the tires, or by the movement of air. When a vehicle is in motion on a paved road, a significant amount of the noise it produces may be generated by the contact between the tire and the road: a whooshing or hissing sound one hears as the car passes by. This sound may be particularly noticeable to others driving on the highway with their windows down. The noise may originate from an air pumping effect caused by the compression and expansion of air between the tires of the passing car and the road. This sound may be amplified by the sideless horn shape formed by the tire and the road. The short-term, or transient, whooshing or hissing sound as a vehicle passes by a communication device may cause the communication device to suffer voice quality and intelligibility loss, and may also cause speech recognition failure.
Noise estimation techniques may have temporal smoothing parameters to ensure that they do not incorporate speech and temporally short events into their estimates. Because passing tire hiss noise may have a duration similar to that of speech sounds, many conventional noise estimation techniques are unsuitable for identifying passing tire hiss as noise. Instead, passing tire hiss noise may be misinterpreted as signal content and augmented in noise reduction algorithms or misclassified as an utterance in speech recognition applications.
Therefore there is a need for a system that counteracts passing tire hiss noise.
A voice enhancement logic improves the perceptual quality of a processed voice. The system detects and dampens some noises associated with moving tires. The system includes a passing tire hiss noise detector and a passing tire hiss noise attenuator. The passing tire hiss noise detector may detect a passing tire hiss noise by comparing the input signal to a passing tire hiss model. The passing tire hiss noise attenuator then dampens the passing tire hiss. The system may also detect, dampen and/or attenuate continuous noise or other transient noises.
Alternative voice enhancement logic includes time frequency transform logic, a background noise estimator, a passing tire hiss noise detector, and a passing tire hiss noise attenuator. The time frequency transform logic converts a time varying input signal into a frequency domain output signal. The background noise estimator measures the continuous noise that may accompany the input signal. The passing tire hiss noise detector automatically identifies and models passing tire hiss noise, which may then be dampened by the passing tire hiss noise attenuator.
Other systems, methods, features, and advantages of the invention will be, or will become, apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description, be within the scope of the invention, and be protected by the following claims.
The invention can be better understood with reference to the following drawings and description. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. Moreover, in the figures, like referenced numerals designate corresponding parts throughout the different views.
A voice enhancement logic improves the perceptual quality of a processed voice. The logic may automatically detect the shape and form of the noise associated with the hiss of tires of vehicles passing the receiver in a real or a delayed time. By tracking selected attributes, the logic may eliminate or dampen passing tire hiss noise using a limited memory that temporarily stores the selected attributes of the noise. The passing tire hiss noise can be detected and attenuated in the presence or absence of speech. The passing tire hiss noise may be detected and attenuated with some time buffering (e.g., 300-500 ms), or alternatively, the presence of passing tire hiss noise may be predicted based on modeled passing tire hiss noise and attenuated in real time. Alternatively or additionally, the logic may also dampen a continuous noise and/or the “musical noise,” squeaks, squawks, chirps, clicks, drips, pops, tones, or other sound artifacts that may be generated by some voice enhancement systems.
In
Noise can be broadly divided into two categories: (1a) non-periodic noises, which include sounds such as passing tire hiss, rain, and wind, and which usually occur at non-periodic intervals, lack a harmonic frequency structure, and have a transient, short duration; and (1b) periodic noises, which include repetitive sounds such as turn indicator clicks, engine or drive train noise, and windshield wiper swooshes, and which may have some harmonic frequency structure due to their periodic nature. Speech can also be broadly divided into two categories: (2a) unvoiced speech, such as consonants, without harmonic or formant structure; and (2b) voiced speech, such as vowel sounds, which exhibits a regular harmonic structure, or harmonic peaks weighted by the spectral envelope that may describe the formant structure. Noise plus speech may comprise any mixture of non-periodic noises, periodic noises, unvoiced speech, and/or voiced speech.
The passing tire hiss noise detector 102 may separate the noise-like segments from the remaining signal in a real or in a delayed time no matter how complex or how loud an incoming segment may be. The separated noise-like segments are analyzed to detect the occurrence of passing tire hiss noise, and in some instances, the presence of a continuous underlying noise. When passing tire hiss noise is detected, the spectrum is modeled, and the resulting passing tire hiss model is retained in a memory for use by the passing tire hiss noise attenuator 104. While the passing tire hiss noise detector 102 may store an entire model of a passing tire hiss noise signal, it also may store selected attributes in a memory. The stored passing tire hiss models may be used to create an average passing tire hiss model, or otherwise combined for future use by the passing tire hiss noise detector 102 or the passing tire hiss noise attenuator 104.
To overcome the effects of passing tire hiss noise, the passing tire hiss noise attenuator 104 substantially removes or dampens the passing tire hiss noise from the input signal. The voice enhancement logic 100 encompasses any system that substantially removes or dampens passing tire hiss noise. Examples of systems that may dampen or remove passing tire hiss noise include systems that use a signal and a passing tire hiss noise model, such as (1) systems that use a neural network mapping of a noisy signal and a passing tire hiss model to a noise-reduced signal, (2) systems that subtract the passing tire hiss model from a noisy signal, (3) systems that use the noisy signal and the passing tire hiss model to select a noise-reduced signal from a code-book, and (4) systems that in any other way use the noisy signal and the passing tire hiss model to create a noise-reduced signal based on a reconstruction or reduction of the masked signal. These systems may attenuate passing tire hiss noise, and in some instances, attenuate the continuous noise that may be part of the short-term spectra. The passing tire hiss noise attenuator 104 may also interface with or include an optional residual attenuator that removes or dampens artifacts that may appear in the processed signal. The residual attenuator may remove the “musical noise,” squeaks, squawks, chirps, clicks, drips, pops, tones, or other sound artifacts.
To detect a passing tire hiss, modeler 508 may fit a smoothly-varying function to a selected portion of the signal in the time-frequency domain. The smoothly-varying function may be a log-Lorentzian function, with a width determined by the speed of the passing vehicle generating the passing tire hiss noise, and a sharpness determined by the lateral distance of the passing vehicle from the receiver. A correlation between a smoothly-varying function and the signal envelope in the time domain over one or several frequency bands may identify a passing tire hiss. The correlation threshold at which a portion of the signal is identified as a passing tire hiss noise may depend on a desired clarity of a processed voice and the variations in width and sharpness of the passing tire hiss noise. Alternatively or additionally, the system may determine a probability that the signal includes passing tire hiss noise, and may identify a passing tire hiss noise when that probability exceeds a probability threshold. The correlation and probability thresholds may depend on various factors, including the presence of other noises or speech in the input signal. When the passing tire hiss noise detector 102 detects a passing tire hiss, the characteristics of the detected passing tire hiss may be provided to the passing tire hiss noise attenuator 104 for removal of the passing tire hiss noise.
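The correlation test described above may be illustrated with a minimal sketch. The description does not give a formula for the log-Lorentzian function, so the parametrization below (a Lorentzian bump expressed in decibels, with hypothetical parameters `center`, `width`, and `peak_db`) and the default threshold of 0.9 are assumptions made only for illustration. The envelope is taken to be a list of per-frame levels in decibels for one frequency band.

```python
import math

def log_lorentzian(n_frames, center, width, peak_db=0.0):
    """Lorentzian bump expressed in dB (assumed form), peaking at frame `center`."""
    return [peak_db - 10.0 * math.log10(1.0 + ((i - center) / width) ** 2)
            for i in range(n_frames)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # a constant sequence carries no shape information
    return cov / math.sqrt(var_x * var_y)

def looks_like_tire_hiss(envelope_db, center, width, threshold=0.9):
    """Flag the band envelope when it correlates strongly with the template."""
    template = log_lorentzian(len(envelope_db), center, width)
    return pearson(envelope_db, template) >= threshold

# A level-shifted copy of the template correlates perfectly with it,
# while an alternating envelope does not.
hiss_like = [v + 20.0 for v in log_lorentzian(41, 20, 5.0)]
alternating = [float(i % 2) for i in range(41)]
```

In practice the center and width would be searched over or fitted, and the threshold would be tuned as the description suggests; the sketch fixes them for brevity.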
As more windows of sound are processed, the passing tire hiss noise detector 102 may derive average noise models for the passing tire hiss. A time-smoothed or weighted average may be used to model the passing tire hiss and continuous noise estimates for each frequency bin. The average model may be updated when a passing tire hiss noise is detected in the absence of speech. Fully bounding a passing tire hiss noise when updating the average model may increase the probability of accurate detection.
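One way the average model update may be sketched is as a per-bin weighted average that is skipped whenever speech is present. The function name `update_average_model` and the smoothing weight `alpha` are hypothetical; the description only states that a time-smoothed or weighted average may be used.

```python
def update_average_model(avg_model, new_model, hiss_detected, speech_present,
                         alpha=0.9):
    """Blend a newly detected hiss model into the running per-bin average.

    The average is left unchanged unless a hiss was detected in the absence
    of speech. `alpha` (assumed value) weights the existing average.
    """
    if not hiss_detected or speech_present:
        return list(avg_model)
    return [alpha * a + (1.0 - alpha) * n
            for a, n in zip(avg_model, new_model)]
```

A larger `alpha` makes the average model change more slowly, which may reduce the influence of any single misdetected event.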
To limit a masking of voice, the fitting of the smoothly-varying function to a suspected passing tire hiss noise may be constrained by rules. For example, a spectral flatness measure may be used to differentiate passing tire hiss noise from voiced signals, and may improve the accuracy of passing tire hiss noise detection, since passing tire hiss is broad spectrum noise and has a fairly smooth spectral shape, unlike voiced signals. Alternatively or additionally, in a vehicle equipped with MOST bus or similar technology, the voice enhancement logic 100 may be provided with information about whether or not the windows are open and passing tire hiss noise detection may be disabled or constrained when the windows are closed.
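A spectral flatness measure is commonly computed as the ratio of the geometric mean to the arithmetic mean of the power spectrum. The sketch below uses that common definition, which the description does not spell out; the small `floor` value guarding the logarithm is an assumption. Values near 1 indicate a flat, noise-like spectrum, and values near 0 indicate a peaky spectrum such as voiced speech.

```python
import math

def spectral_flatness(power_spectrum, floor=1e-12):
    """Geometric mean divided by arithmetic mean of the power spectrum."""
    values = [max(p, floor) for p in power_spectrum]  # avoid log(0)
    log_mean = sum(math.log(p) for p in values) / len(values)
    arithmetic_mean = sum(values) / len(values)
    return math.exp(log_mean) / arithmetic_mean
```

A detector might accept a hiss candidate only when the flatness exceeds some tuned value; that value is application-dependent and is not specified here.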
To overcome the effects of passing tire hiss noise, a passing tire hiss noise attenuator 104 may substantially remove or dampen the passing tire hiss noise from the signal by any method. One method may add the passing tire hiss model to a recorded or estimated continuous noise. In the power spectrum, the passing tire hiss model and continuous noise may then be subtracted from the unmodified signal. If an underlying speech signal is masked by a passing tire hiss or continuous noise, a conventional or modified interpolation method may be used to reconstruct the speech signal. A linear or step-wise interpolator may be used to reconstruct the missing part of the signal. An inverse FFT may then be used to convert the signal power to the time domain, which provides a reconstructed speech signal.
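The subtraction and interpolation steps may be sketched as follows. The function names, the `floor` parameter, and the choice to interpolate linearly between the nearest unmasked neighbors in a sequence of values are assumptions; the description leaves the interpolation method open.

```python
def subtract_noise(signal_power, hiss_model, continuous_noise, floor=0.0):
    """Subtract the combined hiss and continuous noise power from each bin."""
    return [max(s - (h + c), floor)
            for s, h, c in zip(signal_power, hiss_model, continuous_noise)]

def interpolate_masked(values, masked):
    """Linearly interpolate entries flagged in `masked` from unmasked neighbors."""
    out = list(values)
    known = [i for i, m in enumerate(masked) if not m]
    if not known:
        return out  # nothing reliable to interpolate from
    for i, is_masked in enumerate(masked):
        if not is_masked:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = values[right]   # extend the first known value backward
        elif right is None:
            out[i] = values[left]    # extend the last known value forward
        else:
            t = (i - left) / (right - left)
            out[i] = values[left] + t * (values[right] - values[left])
    return out
```

Clamping at `floor` prevents negative power after subtraction. The cleaned power spectrum would then be recombined with phase and passed to an inverse transform, which is outside the scope of this sketch.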
To minimize the “musical noise,” squeaks, squawks, chirps, clicks, drips, pops, or other sound artifacts, an optional residual attenuator may also condition the voice signal before it is converted to the time domain. The residual attenuator may be combined with a passing tire hiss noise attenuator 104, combined with one or more other elements, or comprise a separate element.
The residual attenuator may track the power spectrum within a mid to high frequency range (e.g., from about 400 Hz up to about the Nyquist frequency, which is about one half the sample rate). When a large increase in signal power is detected an improvement may be obtained by limiting or dampening the transmitted power in the mid to high frequency range to a predetermined or calculated threshold. A calculated threshold may be equal to, or based on, the average spectral power of that same mid to high frequency range at an earlier period in time.
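The limiting behavior may be sketched as below. The 400 Hz lower edge comes from the description; the `jump_ratio` used to decide what counts as a large increase, and the choice to clamp each bin to the earlier average power, are assumptions for illustration.

```python
import math

def limit_residual(power, bin_hz, earlier_avg_power, low_hz=400.0,
                   jump_ratio=4.0):
    """Clamp mid to high frequency bins when their average power jumps.

    `power` holds one frame of per-bin power, with bin k centered at
    k * bin_hz. `earlier_avg_power` is the average power of the same band at
    an earlier time. `jump_ratio` (assumed) defines a "large increase".
    """
    start = int(math.ceil(low_hz / bin_hz))  # first bin at or above low_hz
    band = power[start:]
    if not band:
        return list(power)
    band_avg = sum(band) / len(band)
    if band_avg <= jump_ratio * earlier_avg_power:
        return list(power)  # no large increase, leave the frame untouched
    return list(power[:start]) + [min(p, earlier_avg_power) for p in band]
```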
Further improvements to voice quality may be achieved by pre-conditioning the input signal before it is processed by the passing tire hiss noise detector 102. One pre-processing system may exploit the lag time caused by a signal arriving at different detectors that are positioned apart as shown in
Alternatively, passing tire hiss noise detection may be performed on each of the channels. A mixing of one or more channels may occur by switching between the outputs of the microphones 602. Alternatively or additionally, the controller 604 may include a comparator, and a direction of the signal may be detected from differences in the amplitude or timing of signals received from the microphones 602. Direction detection may be improved by pointing the microphones 602 in different directions. The passing tire hiss noise detection may be made more sensitive for signals originating outside of the vehicle.
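Detecting direction from timing differences may be illustrated by finding the lag that best aligns the signals from two microphones. The brute-force cross-correlation below is only a sketch; the function name `best_lag` and the convention that a positive lag means the sound reached the first microphone earlier are assumptions.

```python
def best_lag(a, b, max_lag):
    """Return the lag (in samples) maximizing the sum of a[n] * b[n + lag].

    A positive result means `b` is a delayed copy of `a`, i.e. the sound
    reached the microphone producing `a` first.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n in range(len(a)):
            m = n + lag
            if 0 <= m < len(b):
                score += a[n] * b[m]
        if score > best_score:
            best, best_score = lag, score
    return best
```

A comparator such as the one described could map the sign and size of the lag, together with the known microphone spacing, to a coarse direction estimate.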
The signals may be evaluated only at frequencies above a certain threshold (for example, by using a high-pass filter), which may be the frequencies of interest in certain applications. The threshold frequency may be updated over time as the average passing tire hiss model learns the expected frequencies of passing tire hiss noises. For example, when passing vehicles are traveling at high speeds, the threshold frequency for passing tire hiss noise detection may be set relatively high, since the maximum frequency of passing tire hiss noise increases with vehicle speed. Alternatively, controller 604 may combine the output signals of multiple microphones 602 at a specific frequency or frequency range through a weighting function.
To prevent biased background noise estimations at transients, a transient detector 706 may disable or modulate the background noise estimation process during abnormal or unpredictable increases in power. In
$B(f,i) > B(f)_{\mathrm{Ave}} + c$ (Equation 1)
Alternatively or additionally, the average background noise may be updated depending on the signal to noise ratio (SNR). An example closed algorithm is one which adapts a leaky integrator depending on the SNR:
$B(f)_{\mathrm{Ave}}' = a\,B(f)_{\mathrm{Ave}} + (1-a)\,S$ (Equation 2)
where a is a function of the SNR and S is the instantaneous signal. In this example, the higher the SNR, the slower the average background noise is adapted.
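Equation 2 may be sketched for a single frequency bin as follows. The description says only that a is a function of the SNR and that a higher SNR slows adaptation; the linear mapping from SNR to a, and the parameters `a_min`, `a_max`, and `snr_max_db`, are assumptions.

```python
def smoothing_coefficient(snr_db, a_min=0.5, a_max=0.99, snr_max_db=30.0):
    """Map SNR to the coefficient a of Equation 2 (assumed linear mapping).

    Higher SNR gives a value closer to a_max, so the average adapts slowly.
    """
    fraction = min(max(snr_db / snr_max_db, 0.0), 1.0)
    return a_min + (a_max - a_min) * fraction

def update_background(average, instantaneous, snr_db):
    """Apply Equation 2: new average = a * average + (1 - a) * S."""
    a = smoothing_coefficient(snr_db)
    return a * average + (1.0 - a) * instantaneous
```

With these assumed values, an SNR of 0 dB moves the average halfway toward the instantaneous signal, while an SNR of 30 dB or more moves it by only one percent of the difference.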
To detect a passing tire hiss, passing tire hiss noise detector 708 may fit a smoothly-varying function to a selected portion of the signal in the time-frequency domain. The smoothly-varying function may be a log-Lorentzian function, with a width determined by the speed of the passing vehicle generating the passing tire hiss noise, and a sharpness determined by the lateral distance of the passing vehicle from the receiver. A correlation between a smoothly-varying function and the signal envelope in the time domain over one or more frequency bands may identify a passing tire hiss. The correlation threshold at which a portion of the signal is identified as a passing tire hiss noise may depend on a desired clarity of a processed voice and the variations in width and sharpness of the passing tire hiss noise. Alternatively or additionally, the system may determine a probability that the signal includes passing tire hiss noise, and may identify a passing tire hiss noise when that probability exceeds a probability threshold. The correlation and probability thresholds may depend on various factors, including the presence of other noises or speech in the input signal. When the noise detector 708 detects a passing tire hiss, the characteristics of the detected passing tire hiss may be provided to the noise attenuator 712 for removal of the passing tire hiss noise.
A signal discriminator 710 may mark the voice and noise of the spectrum in real or delayed time. Any method may be used to distinguish voice from noise. Spoken signals may be identified by (1) the narrow widths of their bands or peaks; (2) the broad resonances, which are also known as formants, which may be created by the vocal tract shape of the person speaking; (3) the rate at which certain characteristics change with time (i.e., a time-frequency model can be developed to identify spoken signals based on how they change with time); and when multiple detectors or microphones are used, (4) the correlation, differences, or similarities of the output signals of the detectors or microphones.
At act 806, a continuous or ambient noise is measured. The background noise estimate may comprise an average of the acoustic power in each frequency bin. To prevent biased noise estimations at transients, the noise estimation process may be disabled during abnormal or unpredictable increases in power at act 808. The transient detection act 808 disables the background noise estimate when an instantaneous background noise exceeds an average background noise by more than a predetermined decibel level.
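Acts 806 and 808 may be sketched together for one frequency bin: the running average is updated frame by frame, and the update is skipped whenever the instantaneous level exceeds the average by more than a set number of decibels. The 6 dB margin, the smoothing weight, and the initialization from the first frame are assumptions.

```python
def estimate_background(frames_db, c_db=6.0, a=0.9):
    """Running background level (dB) for one bin, frozen during transients.

    `frames_db` is a non-empty sequence of per-frame levels in dB. A frame
    is treated as a transient and skipped when it exceeds the current
    average by more than `c_db` (assumed margin).
    """
    average = frames_db[0]  # assumed initialization from the first frame
    for level in frames_db[1:]:
        if level > average + c_db:
            continue  # transient detected, estimation disabled for this frame
        average = a * average + (1.0 - a) * level
    return average
```

Without the gating, the 40 dB frame in a sequence such as 10, 10, 40, 10 would pull the estimate upward; with it, the estimate stays at about 10 dB.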
At act 810, a passing tire hiss noise may be detected when a high correlation exists between a smoothly varying function and the temporal and/or spectral characteristics of the input signal in the time and/or frequency domains. The detection of a passing tire hiss noise may be constrained by one or more optional acts. For example, if a vowel or another harmonic structure is detected, the passing tire hiss noise detection method may limit the passing tire hiss noise correction to values less than or equal to average values. An additional optional act may allow the average passing tire hiss model or attributes to be updated only during unvoiced segments. If a speech or speech mixed with noise segment is detected, the average passing tire hiss model or attributes are not updated under this act. If no speech is detected, the passing tire hiss model or each attribute may be updated through many means, such as through a weighted average or a leaky integrator. Many other optional acts may also be applied to the model.
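The first optional constraint may be sketched as a per-bin cap. The function name `constrain_correction` and the representation of the correction and the average model as per-bin lists are assumptions.

```python
def constrain_correction(correction, average_model, harmonic_detected):
    """Cap the per-bin hiss correction at the average model when a vowel or
    other harmonic structure is detected; otherwise leave it unchanged."""
    if not harmonic_detected:
        return list(correction)
    return [min(c, a) for c, a in zip(correction, average_model)]
```

Capping the correction in this way may limit how much voiced speech energy is removed when a hiss and a vowel overlap.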
If passing tire hiss noise is detected at act 810, at act 814, a signal analysis may discriminate or mark the spoken signal from the noise-like segments. Spoken signals may be identified by (1) the narrow widths of their bands or peaks; (2) the broad resonances, which are also known as formants, which may be created by the vocal tract shape of the person speaking; (3) the rate at which certain characteristics change with time (i.e., a time-frequency model can be developed to identify spoken signals based on how they change with time); and when multiple detectors or microphones are used, (4) the correlation, differences, or similarities of the output signals of the detectors or microphones.
To overcome the effects of passing tire hiss noise, a passing tire hiss noise is substantially removed or dampened from the noisy spectrum by any act. One exemplary act 816 adds the smoothly varying passing tire hiss model to a recorded or modeled continuous noise. In the power spectrum, the modeled noise may then be substantially removed from the unmodified spectrum by the methods and systems described above. If an underlying speech signal is masked by a passing tire hiss noise, or masked by a continuous noise, a conventional or modified interpolation method may be used to reconstruct the speech signal at act 818. A time series synthesis may then be used to convert the signal power to the time domain at act 820, which provides a reconstructed speech signal. If no passing tire hiss noise is detected at act 810, at act 820 the signal is converted into the time domain to provide the reconstructed speech signal.
Alternatively, a passing tire hiss noise attenuator may substantially remove or dampen the passing tire hiss from the signal by any method. One method may add the passing tire hiss model to a recorded or estimated continuous noise. In the power spectrum, the passing tire hiss model and the continuous noise may then be subtracted from the unmodified signal.
If an underlying speech signal is masked by passing tire hiss or continuous noise, a conventional or modified interpolation method may be used to reconstruct the speech signal.
The method shown in
A “computer-readable medium,” “machine-readable medium,” “propagated-signal” medium, and/or “signal-bearing medium” may comprise any means that contains, stores, communicates, propagates, or transports software for use by or in connection with an instruction executable system, apparatus, or device. The machine-readable medium may selectively be, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. A non-exhaustive list of examples of a machine-readable medium would include: an electrical connection (electronic) having one or more wires, a portable magnetic or optical disk, a volatile memory such as a Random Access Memory “RAM” (electronic), a Read-Only Memory “ROM” (electronic), an Erasable Programmable Read-Only Memory (EPROM or Flash memory) (electronic), or an optical fiber (optical). A machine-readable medium may also include a tangible medium upon which software is printed, as the software may be electronically stored as an image or in another format (e.g., through an optical scan), then compiled, and/or interpreted or otherwise processed. The processed medium may then be stored in a computer and/or machine memory.
The above-described systems may condition signals received from one or more than one microphone or detector. Many combinations of systems may be used to identify and track passing tire hiss noises. Besides the fitting of a smoothly varying function to a suspected passing tire hiss, a system may detect and isolate any parts of the signal having greater energy than the modeled passing tire hiss. One or more of the systems described above may also be used in alternative voice enhancement logic.
Other alternative voice enhancement systems include combinations of the structure and functions described above. These voice enhancement systems are formed from any combination of structure and function described above or illustrated within the attached figures. The logic may be implemented in software or hardware. The term “logic” is intended to broadly encompass a hardware device or circuit, software, or a combination. The hardware may include a processor or a controller having volatile and/or non-volatile memory and may also include interfaces to peripheral devices through wireless and/or hardwire mediums.
The voice enhancement logic is easily adaptable to any technology or device. Some voice enhancement systems or components interface or couple with vehicles as shown in
The voice enhancement logic improves the perceptual quality of a processed voice. The logic may automatically learn and encode the shape and form of the noise associated with passing tire hiss in a real or a delayed time. By tracking selected attributes, the logic may eliminate, substantially eliminate, or dampen passing tire hiss noise using a limited memory that temporarily or permanently stores selected attributes of the passing tire hiss noise. The voice enhancement logic may also dampen a continuous noise and/or the squeaks, squawks, chirps, clicks, drips, pops, tones, or other sound artifacts that may be generated within some voice enhancement systems and may reconstruct voice when needed.
While various embodiments of the invention have been described, it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the invention. Accordingly, the invention is not to be restricted except in light of the attached claims and their equivalents.