1. Technical Field
The invention relates to communication systems, and more particularly, to systems that improve the intelligibility of speech.
2. Related Art
Many communication devices acquire, assimilate, and transfer speech signals. Speech signals pass from one system to another through a communication medium. All communication systems, especially wireless communication systems, suffer bandwidth limitations. In some systems, including some telephone systems, the clarity of the voice signals depends on the system's ability to pass high and low frequencies. While many low frequencies may lie in a pass band of a communication system, the system may block or attenuate high frequency signals, including the high frequency components found in some unvoiced consonants.
Some communication devices may overcome this high frequency attenuation by processing the spectrum. These systems may use a speech/silence switch and a voiced/unvoiced switch to identify and process unvoiced speech. Since transitions between voiced and unvoiced segments may be difficult to detect, some systems are not reliable and may not be used with real-time processes, especially systems susceptible to noise or reverberation. In some systems, the switches are expensive and they create artifacts that distort the perception of speech.
Therefore, there is a need for a system that improves the perceptible sound of speech in a limited frequency range.
A speech enhancement system improves the intelligibility of a speech signal. The system includes a frequency transformer and a spectral compressor. The frequency transformer converts speech signals from time domain into frequency domain. The spectral compressor compresses a pre-selected portion of the high frequency band and maps the compressed high frequency band to a lower band limited frequency range.
Other systems, methods, features, and advantages of the invention will be, or will become, apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description, be within the scope of the invention, and be protected by the following claims.
The invention can be better understood with reference to the following drawings and description. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. Moreover, in the figures, like referenced numerals designate corresponding parts throughout the different views.
Enhancement logic improves the intelligibility of processed speech. The logic may identify and compress speech segments to be processed. Selected voiced and/or unvoiced segments may be processed and shifted to one or more frequency bands. To improve perceptual quality, adaptive gain adjustments may be made in the time or frequency domains. The system may adjust the gain of some or all of the speech segments. The versatility of the system allows the logic to enhance speech before it is passed to a second system in some applications. Speech and audio may be passed to an Automatic Speech Recognition (ASR) engine wirelessly or through a communication bus that may capture and extract voice in the time and/or frequency domains.
Any bandlimited device may benefit from these systems. The systems may be built into, may be a unitary part of, or may be configured to interface any bandlimited device. The systems may be a part of or interface radio applications such as air traffic control devices (which may have similar bandlimited pass bands), radio intercoms (mobile or fixed systems for crews or users communicating with each other), and Bluetooth enabled devices, such as headsets, that may have a limited bandwidth across one or more Bluetooth links. The system may also be a part of other personal or commercial limited bandwidth communication systems that may interface vehicles, commercial applications, or devices that may control users' homes (e.g., a voice control).
In some alternatives, the systems may precede other processes or systems. Some systems may use adaptive filters, other circuitry or programming that may disrupt the behavior of the enhancement logic. In some systems the enhancement logic precedes and may be coupled to an echo canceller (e.g., a system or process that attenuates or substantially attenuates an unwanted sound). When an echo is detected or processed, the enhancement logic may be automatically disabled or mitigated and later enabled to prevent the compression and mapping, and in some instances, a gain adjustment of the echo. When the system precedes or is coupled to a beamformer, a controller or the beamformer (e.g., a signal combiner) may control the operation of the enhancement logic (e.g., automatically enabling, disabling, or mitigating the enhancement logic). In some systems, this control may further suppress distortion such as multi-path distortion and/or co-channel interference. In other systems or applications, the enhancement logic is coupled to a post adaptive system or process. In some applications, the enhancement logic is controlled or interfaced to a controller that prevents or minimizes the enhancement of an undesirable signal.
The compression logic comprises a spectral compression device or spectral compressor 104. The spectral compressor 104 maps a wide range of frequency components within a high frequency range to a lower, and in some enhancement systems, narrower frequency range.
One frequency compression scheme used by some enhancement systems combines a frequency compression with a frequency transposition. In these enhancement systems, an enhancement controller may be programmed to derive a compressed high frequency component. In some enhancement systems, equation 1 is used, where Cm is the amplitude of the compressed high frequency component, gm is a gain factor, Sk is the frequency component of the original speech signal, φm(k) is the compression basis function, and k is the discrete frequency index. Any shape of window function may be used as the non-linear compression basis function φm(k), including triangular, Hanning, Hamming, Gaussian, Gabor, or wavelet windows.
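Equation 1 itself is not reproduced in this text. As a minimal sketch, assume it takes the weighted-sum form Cm = gm · Σk |Sk| · φm(k), which is consistent with the symbol definitions above but remains an assumption. The function names `triangular_basis` and `compress`, the 0-based bin indices, and the choice of triangular windows that reach zero at the neighbouring band centers are all hypothetical choices made for illustration.

```python
def triangular_basis(centers, num_bins):
    """Build one triangular window phi_m(k) per compressed band.

    centers: strictly increasing bin indices. Window m peaks (value 1.0) at
    centers[m] and falls linearly to zero at the neighbouring centers.
    This overlap pattern is an assumption, not taken from the source.
    """
    # Pad with one virtual center on each side so the edge windows are defined.
    padded = [centers[0] - 1] + list(centers) + [centers[-1] + 1]
    basis = []
    for m in range(1, len(padded) - 1):
        lo, mid, hi = padded[m - 1], padded[m], padded[m + 1]
        row = [0.0] * num_bins
        for k in range(max(lo, 0), min(hi, num_bins - 1) + 1):
            if k <= mid:
                row[k] = (k - lo) / (mid - lo)   # rising edge
            else:
                row[k] = (hi - k) / (hi - mid)   # falling edge
        basis.append(row)
    return basis


def compress(magnitudes, basis, gains):
    """Assumed form of equation 1: C_m = g_m * sum_k |S_k| * phi_m(k)."""
    return [g * sum(s * w for s, w in zip(magnitudes, row))
            for g, row in zip(gains, basis)]


# Eight magnitude bins, two compressed bands centered on bins 4 and 6.
spectrum = [1.0] * 8
phi = triangular_basis([4, 6], num_bins=8)
compressed = compress(spectrum, phi, gains=[1.0, 1.0])
```

Spacing the centers farther apart at higher bins would make each window span more input bins, which is one way to obtain the stronger compression of higher frequencies that the description mentions.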
The frequency components are then mapped to a lower frequency range. In some enhancement systems, an enhancement controller may be programmed or configured to map the frequencies to the functions shown in equation 2. In equation 2, Ŝk is the frequency component of the compressed speech signal and fo is the cutoff frequency index. Based on this compression scheme, all frequency components of the original speech below the cutoff frequency index fo remain unchanged or substantially unchanged. Frequency components from cutoff frequency “A” to the Nyquist frequency are compressed and shifted to a lower frequency range. This lower frequency range extends from the lower cutoff frequency “A” to the upper cutoff frequency “B,” which also may comprise the upper limit of a telephone or communication pass-band. In this enhancement system, higher frequency components have a higher compression ratio and larger frequency shifts than the frequencies closer to upper cutoff frequency “B.” These enhancement systems improve the intelligibility and/or perceptual quality of a speech signal because those frequencies above cutoff frequency “B” carry significant consonant information, which may be critical for accurate speech recognition.
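Equation 2 is likewise not reproduced here. The sketch below assumes a piecewise mapping in which bins below the cutoff index fo pass through unchanged and the compressed components Cm are placed in consecutive bins starting at fo. The function name `map_to_lower_band`, the 0-based indexing, and the zeroing of bins above the mapped range are illustrative assumptions.

```python
def map_to_lower_band(magnitudes, compressed, cutoff_index):
    """Assumed form of equation 2 with 0-based bins.

    Bins [0, cutoff_index) keep the original magnitudes; compressed
    component m is written to bin cutoff_index + m; all higher bins are
    left at zero, as they fall outside the band-limited range.
    """
    if cutoff_index + len(compressed) > len(magnitudes):
        raise ValueError("compressed components do not fit in the spectrum")
    out = [0.0] * len(magnitudes)
    out[:cutoff_index] = magnitudes[:cutoff_index]
    for m, c in enumerate(compressed):
        out[cutoff_index + m] = c
    return out


original = [5.0, 4.0, 3.0, 2.0, 1.0, 1.0, 1.0, 1.0]
mapped = map_to_lower_band(original, [1.5, 1.5], cutoff_index=4)
```

In this toy example, cutoff “A” corresponds to bin 4 and cutoff “B” to bin 5, so the four original bins from “A” to the top of the spectrum are represented by two bins after mapping.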
To maintain a substantially smooth and/or a substantially constant auditory background, an adaptive high frequency gain adjustment may be applied to the compressed signal.
The gain controller 106 may be programmed to amplify and/or attenuate only the compressed spectral signal that in some applications includes noise according to the function shown in equation 3. In equation 3, the output gain gm is derived by:
where Nk is the frequency component of input background noise. By tracking gain to a measured or estimated noise level, some enhancement systems maintain a noise floor across a compressed and uncompressed bandwidth. If noise is sloped down as frequency increases in the compressed frequency band, as shown in
To overcome the effects of an increasing background noise in the compressed signal band shown in
When background noise is equal or almost equal across all frequencies of a desired bandwidth, as shown in
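Equation 3 is not reproduced in this text. One gain rule that matches the stated goal of maintaining a noise floor across the compressed and uncompressed bandwidth is to scale each compressed band so that compressed noise equals the noise originally present at the destination bin: gm = |N(fo+m)| / Σk |Nk| · φm(k). This form is an assumption, as are the function name `noise_tracking_gains`, the 0-based indexing, and the small `floor` guard against division by zero.

```python
def noise_tracking_gains(noise, basis, cutoff_index, floor=1e-12):
    """Assumed form of equation 3 with 0-based bins.

    noise: background-noise magnitude estimate |N_k| for every bin.
    basis: list of rows phi_m(k), one row per compressed band.
    Returns one gain per band so that g_m * (compressed noise in band m)
    equals the noise magnitude at the destination bin cutoff_index + m.
    """
    gains = []
    for m, row in enumerate(basis):
        compressed_noise = sum(n * w for n, w in zip(noise, row))
        target = noise[cutoff_index + m]
        gains.append(target / max(compressed_noise, floor))
    return gains


# Two hand-written triangular rows over eight bins (illustrative values).
phi = [
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 0.0],
]
flat_noise = [1.0] * 8
gains = noise_tracking_gains(flat_noise, phi, cutoff_index=4)
```

With flat noise, each band sums 1.5 units of noise, so each gain comes out near 2/3 and the compressed band ends up at the same level as the uncompressed noise floor. Noise that slopes down with frequency would yield larger gains under the same rule, and rising noise would yield smaller ones.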
To minimize speech loss in a band limited frequency range, the cutoff frequencies of the enhancement system may vary with the bandwidth of the communication systems. In some telephone systems having a bandwidth up to approximately 3,600 Hz, the cutoff frequency may lie between about 2,500 Hz and about 3,600 Hz. In these systems, little or no compression occurs below the lowest cutoff frequency, while higher frequencies are compressed and transposed more strongly. As a result, lower harmonic relations that impart pitch and may be perceived by the human ear are preserved.
Further alternatives to the voice enhancement system may be achieved by analyzing a signal-to-noise ratio (SNR) of the compressed and uncompressed signals. This alternative recognizes that the second formant peaks of vowels are predominantly located below about 3,200 Hz and that their energy decays quickly at higher frequencies. This may not be the case for some unvoiced consonants, such as /s/, /f/, /t/, and /tʃ/. The energy that represents the consonants may cover a higher range of frequencies. In some systems, the consonants may lie between about 3,000 Hz and about 12,000 Hz. When high background noise is detected, as may occur in a vehicle such as a car, consonants are likely to have a higher SNR in the higher frequency band than in the lower frequency band. In this alternative, a controller compares the average SNR in the uncompressed range, SNRA-B uncompressed, lying between cutoff frequencies “A” and “B,” to the average SNR in the would-be-compressed frequency range, SNRA-B compressed, lying between cutoff frequencies “A” and “B.” If the average SNRA-B uncompressed is higher than or equal to the average SNRA-B compressed, then no compression occurs. If the average SNRA-B uncompressed is less than the average SNRA-B compressed, a compression, and in some cases a gain adjustment, occurs. In this alternative, A-B represents a frequency band. A controller in this alternative may comprise a processor that may regulate the spectral compressor 104 through a wireless or tangible communication medium such as a communication bus.
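The SNR comparison above can be sketched as a small decision function. The text does not say how the average SNR is computed, so the per-bin decibel average used here is an assumption, as are the function names `average_snr_db` and `should_compress` and the `floor` guard.

```python
import math


def average_snr_db(signal, noise, floor=1e-12):
    """Mean per-bin SNR in dB over one band (averaging in dB is an assumption)."""
    ratios = [20.0 * math.log10(max(s, floor) / max(n, floor))
              for s, n in zip(signal, noise)]
    return sum(ratios) / len(ratios)


def should_compress(sig_unc, noise_unc, sig_cmp, noise_cmp):
    """Compress only when the would-be-compressed band A-B has the higher SNR.

    sig_unc / noise_unc: magnitudes in band A-B of the uncompressed signal.
    sig_cmp / noise_cmp: magnitudes in band A-B after hypothetical compression.
    A tie means no compression, matching the description.
    """
    return average_snr_db(sig_unc, noise_unc) < average_snr_db(sig_cmp, noise_cmp)


# Consonant energy dominates after compression, so compression is enabled.
enable = should_compress([1.0, 1.0], [1.0, 1.0], [4.0, 4.0], [1.0, 1.0])
# Equal average SNR in both cases, so compression stays off.
tie = should_compress([2.0, 2.0], [1.0, 1.0], [2.0, 2.0], [1.0, 1.0])
```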
Another alternative speech enhancement system and method compares the amplitude of each frequency component of the input signal with a corresponding amplitude of the compressed signal that would lie within the same frequency band through a second controller coupled to the spectral compressor. In this alternative, shown in Equation 4, the amplitude of each frequency bin lying between cutoff frequencies “A” and “B” is chosen to be the amplitude of the compressed or uncompressed spectrum, whichever is higher.
|Ŝk output| = max(|Sk|, |Ŝk|) (Equation 4)
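Equation 4 translates directly into a per-bin maximum over the band between cutoffs “A” and “B.” The sketch below is a minimal illustration; the function name `select_max` and the inclusive 0-based bin indices `a_index` and `b_index` are hypothetical.

```python
def select_max(original, compressed_mapped, a_index, b_index):
    """Apply Equation 4 on bins a_index..b_index (inclusive, 0-based).

    original: magnitudes |S_k| of the input spectrum.
    compressed_mapped: magnitudes of the compressed spectrum after mapping.
    Bins outside the band are copied from the original unchanged.
    """
    out = list(original)
    for k in range(a_index, b_index + 1):
        out[k] = max(original[k], compressed_mapped[k])
    return out


original = [5.0, 4.0, 3.0, 2.0, 1.0, 3.0]
mapped = [5.0, 4.0, 3.0, 2.0, 1.5, 1.5]
output = select_max(original, mapped, a_index=4, b_index=5)
```

In bin 4 the compressed value wins (1.5 over 1.0), while in bin 5 the original value is kept (3.0 over 1.5), so existing in-band energy is never reduced by the compression.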
Each of the controllers, systems, and methods described above may be encoded in a signal bearing medium, a computer readable medium such as a memory, programmed within a device such as one or more integrated circuits, or processed by a controller or a computer. If the methods are performed by software, the software may reside in a memory resident to or interfaced to the spectral compressor 104, noise detector 108, gain adjuster 106, frequency to time transformer 110, or any other type of non-volatile or volatile memory interfaced or resident to the speech enhancement logic. The memory may include an ordered listing of executable instructions for implementing logical functions. A logical function may be implemented through digital circuitry, through source code, through analog circuitry, or through an analog source such as an analog electrical or optical signal. The software may be embodied in any computer-readable or signal-bearing medium, for use by, or in connection with, an instruction executable system, apparatus, or device. Such a system may include a computer-based system, a processor-containing system, or another system that may selectively fetch instructions from an instruction executable system, apparatus, or device that may also execute instructions.
A “computer-readable medium,” “machine-readable medium,” “propagated-signal” medium, and/or “signal-bearing medium” may comprise any apparatus that contains, stores, communicates, propagates, or transports software for use by or in connection with an instruction executable system, apparatus, or device. The machine-readable medium may be, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. A non-exhaustive list of examples of a machine-readable medium would include: an electrical connection “electronic” having one or more wires, a portable magnetic or optical disk, a volatile memory such as a Random Access Memory “RAM” (electronic), a Read-Only Memory “ROM” (electronic), an Erasable Programmable Read-Only Memory (EPROM or Flash memory) (electronic), or an optical fiber (optical). A machine-readable medium may also include a tangible medium upon which software is printed, as the software may be electronically stored as an image or in another format (e.g., through an optical scan), then compiled, and/or interpreted or otherwise processed. The processed medium may then be stored in a computer and/or machine memory.
The speech enhancement logic 100 is adaptable to any technology or devices. Some speech enhancement systems interface or are coupled to a frequency to time transformer 110 as shown in
The speech enhancement logic is also adaptable and may interface systems that detect and/or monitor sound wirelessly or by an electrical or optical connection. When certain sounds are detected in a high frequency band, the system may disable or otherwise mitigate the enhancement logic to prevent the compression, mapping, and in some instances, the gain adjustment of these signals. Through a bus, such as a communication bus, a noise detector may send an interrupt (hardware or software interrupt) or message to prevent or mitigate the enhancement of these sounds. In these applications, the enhancement logic may interface or be incorporated within one or more circuits, logic, systems or methods described in “System for Suppressing Rain Noise,” U.S. Ser. No. 11/006,935, which is incorporated herein by reference.
The speech enhancement logic improves the intelligibility of speech signals. The logic may automatically identify and compress speech segments to be processed. Selected voiced and/or unvoiced segments may be processed and shifted to one or more frequency bands. To improve perceptual quality, adaptive gain adjustments may be made in the time or frequency domains. The system may adjust the gain of only some or all of the speech segments, with some adjustments based on a sensed or estimated signal. The versatility of the system allows the logic to enhance speech before it is passed to or processed by a second system. In some applications, speech or other audio signals may be passed to a remote, local, or mobile ASR engine that may capture and extract voice in the time and/or frequency domains. Some speech enhancement systems do not switch between speech and silence or voiced and unvoiced segments and thus are less susceptible to the squeaks, squawks, chirps, clicks, drips, pops, low frequency tones, or other sound artifacts that may be generated within some speech systems that capture or reconstruct speech.
While various embodiments of the invention have been described, it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the invention. Accordingly, the invention is not to be restricted except in light of the attached claims and their equivalents.
This application is a continuation-in-part of U.S. application Ser. No. 11/110,556 “System for Improving Speech Quality and Intelligibility,” filed Apr. 20, 2005. The disclosure of the above application is incorporated herein by reference.
Number | Date | Country |
---|---|---|
20060241938 A1 | Oct 2006 | US |

Relation | Number | Date | Country |
---|---|---|---|
Parent | 11110556 | Apr 2005 | US |
Child | 11298053 | | US |