1. Technical Field
The invention relates to communication systems, and more particularly, to systems that extend audio bandwidths.
2. Related Art
Some telecommunication systems transmit speech across a limited frequency range. The receivers, transmitters, and intermediary devices that make up a telecommunication network may be bandlimited. These devices may limit speech to a bandwidth that significantly reduces intelligibility and introduces perceptually significant distortion that may corrupt speech. In many telephone systems, bandwidth limitations result in the characteristic sounds that may be associated with telephone speech.
While users may prefer listening to wideband speech, the transmission of such signals may require the building of new telecommunication networks that support larger bandwidths. New networks may be expensive and will likely take time to become established. Since many established networks support narrow band speech, there is a need for systems that extend signal bandwidths at receiving ends.
Bandwidth extension may be problematic. While some bandwidth extension methods reconstruct speech under ideal conditions, these methods cannot extend speech in noisy environments. Since it is difficult to model the effects of noise, the accuracy of these methods may decline in the presence of noise. Therefore, there is also a need for a system that improves the perceived quality of speech in a noisy environment.
A system extends the bandwidth of a narrowband speech signal into a wideband spectrum. The system includes a high-band generator that generates a high frequency spectrum based on a narrowband spectrum. A background noise generator generates a high frequency background noise spectrum based on a background noise within the narrowband spectrum. A summing circuit linked to the high-band generator and background noise generator combines the high frequency band and narrowband spectrum with the high frequency background noise spectrum.
Other systems, methods, features, and advantages of the invention will be, or will become, apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention, and be protected by the following claims.
The invention can be better understood with reference to the following drawings and description. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. Moreover, in the figures, like referenced numerals designate corresponding parts throughout the different views.
Bandwidth extension logic generates more natural sounding speech. When processing narrowband speech, the bandwidth extension logic combines a portion of the narrowband speech with a high-band extension. The bandwidth extension logic may generate a wideband spectrum based on a correlation between the narrowband and the high-band extension. Some bandwidth extension logic works in real-time or near real-time to minimize noticeable or perceived communication delays.
When a portion of the extended narrowband spectrum falls below a predetermined threshold (e.g., a dynamic or a static noise floor), the associated phase of that portion of the spectrum is randomized through a phase adjuster 112 before the envelope is adjusted. The extended spectral envelope may be generated by a predefined transformation.
To ensure that the energy in the extended narrowband spectrum (that may be referred to as the high-band extension in this system) is adjusted to the energy in the original narrowband signal, the amplitudes of the harmonics in the extended narrowband spectrum are adjusted to the extended spectral envelope through a gain adjuster or a harmonic adjuster 118. When the parameter detector detects a consonant, a phase adjuster 120 then randomizes the phases of the portions of the extended narrowband spectrum that correspond to the consonant. Separate power spectral density masks filter the narrowband signal and high frequency bandwidth extension before they are combined.
To ensure that the combined narrowband and high-band extension is more natural sounding, a background noise spectrum may be added to the combined signal.
A real time or near real time convolver 204 convolves the new speech spectrum with itself to generate a high-band or extended spectrum SExt(f). The systems and methods described in U.S. application Ser. No. 11/168,654, entitled “Frequency Extension Harmonic Signals,” filed Jun. 28, 2005, which is incorporated herein by reference, may be used.
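The self-convolution step may be illustrated with a minimal sketch. It assumes the speech spectrum is available as a list of complex frequency bins; the function name `self_convolve` is hypothetical, and the direct double loop stands in for whatever real-time convolver an implementation would actually use.

```python
def self_convolve(spectrum):
    """Linearly convolve a list of (complex) frequency bins with itself.

    Assumption: bins are uniformly spaced, so bin i convolved with bin j
    contributes to bin i + j of the output.
    """
    n = len(spectrum)
    out = [0j] * (2 * n - 1)
    for i, a in enumerate(spectrum):
        for j, b in enumerate(spectrum):
            out[i + j] += a * b
    return out


# Harmonics at bins 2 and 4 produce components at bins 4, 6, and 8,
# that is, at frequencies above the original band.
narrowband = [0, 0, 1, 0, 1]
extended = self_convolve(narrowband)
nonzero_bins = [k for k, v in enumerate(extended) if abs(v) > 0]
```

Because each pair of harmonics contributes at the sum of their bin indices, the output of this sketch carries harmonic structure into bins beyond the input band.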
To generate a more natural sounding speech, when the magnitude of the extended spectrum lies below a predetermined level or factor of the background noise spectrum, the phases of those portions of the extended spectrum are made random by a phase adjuster 206. This relation may be expressed in equation 2 where m lies between about 1 and about 5.
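A minimal sketch of this phase randomization follows. Equation 2 is not reproduced in this text, so the comparison used here (bin magnitude less than m times the background noise magnitude) is an assumed form of that relation. The function name `randomize_weak_phases` and the default m of 2 are hypothetical.

```python
import cmath
import math
import random


def randomize_weak_phases(extended, noise_mag, m=2.0, rng=None):
    """Randomize the phase of bins whose magnitude is below m * noise.

    extended  : list of complex bins of the extended spectrum
    noise_mag : list of background noise magnitudes, one per bin
    m         : assumed threshold factor (the text gives about 1 to about 5)
    """
    rng = rng or random.Random()
    out = []
    for s, n in zip(extended, noise_mag):
        mag = abs(s)
        if mag < m * n:
            # Keep the magnitude, replace the phase with a uniform draw.
            s = cmath.rect(mag, rng.uniform(0.0, 2.0 * math.pi))
        out.append(s)
    return out
```

Bins that lie well above the noise keep their original phase, so only the portions of the extension that are comparable to the noise are altered.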
To adjust the envelope of the extended spectrum, the envelope of narrowband speech is extracted through an envelope extractor 208. The narrowband spectral envelope may be derived, mapped, or estimated from the narrowband signal. A spectral envelope generator 210 then estimates or derives the high-band or extended spectral envelope.
w = T(f) = α(f − f_L)(w_H − w_L)/(f_H − f_L) + w_L    (Equation 3)
The parameter α may be adjusted empirically or programmed to a predetermined value depending on whether the portion of the narrowband spectral envelope to be extended corresponds to a vowel, a consonant, or a background noise.
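The linear transformation of Equation 3 may be sketched as follows. The symbols are not defined in this text, so the sketch assumes that f_L and f_H bound the source frequency range and that w_L and w_H bound the target range. The function name `map_frequency` and the example band edges are hypothetical.

```python
def map_frequency(f, f_lo, f_hi, w_lo, w_hi, alpha=1.0):
    """Evaluate w = alpha * (f - f_lo) * (w_hi - w_lo) / (f_hi - f_lo) + w_lo.

    Assumption: [f_lo, f_hi] is the source band and [w_lo, w_hi] is the
    target band; alpha is the tunable parameter from Equation 3.
    """
    if f_hi == f_lo:
        raise ValueError("f_hi and f_lo must differ")
    return alpha * (f - f_lo) * (w_hi - w_lo) / (f_hi - f_lo) + w_lo


# Hypothetical band edges in Hz, chosen only for illustration.
low_edge = map_frequency(1000.0, 1000.0, 3500.0, 3500.0, 5500.0)
midpoint = map_frequency(2250.0, 1000.0, 3500.0, 3500.0, 5500.0)
high_edge = map_frequency(3500.0, 1000.0, 3500.0, 3500.0, 5500.0)
```

With α equal to 1 the source band maps exactly onto the target band. Other values of α stretch or compress the mapping, which is how a per-class setting for vowels, consonants, or background noise could be applied.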
To ensure that the energy in the extended spectrum matches the energy in the narrowband spectrum, the harmonics in the extended narrowband spectrum are adjusted to the extended spectral envelope through a gain adjuster 214. Adjustment may occur by scaling the extended narrowband spectrum so that the energy in a portion of the extended spectrum is almost equal or substantially equal to the energy in a portion of the narrowband speech spectrum. When the consonant/vowel/no-speech detector detects a consonant, a phase adjuster 216 then randomizes the phases of the portions of the extended narrowband signal that correspond to the consonant. Separate power spectral density masks filter the narrowband speech signal and the extended narrowband signal before the signals are combined through combining logic or a summer 250.
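The energy-matching adjustment may be sketched as a single scale factor. This is an illustration only: the choice of which bins make up each "portion" is an assumption, and the names `band_energy` and `match_energy` are hypothetical.

```python
import math


def band_energy(spectrum, start, stop):
    """Sum of squared magnitudes over bins start..stop-1."""
    return sum(abs(x) ** 2 for x in spectrum[start:stop])


def match_energy(extended, narrowband, ext_band, nb_band):
    """Scale `extended` so its energy in ext_band equals the narrowband
    energy in nb_band. Bands are (start, stop) bin index pairs and are
    assumptions of this sketch."""
    e_ext = band_energy(extended, *ext_band)
    e_nb = band_energy(narrowband, *nb_band)
    if e_ext == 0.0:
        return list(extended)  # nothing to scale
    gain = math.sqrt(e_nb / e_ext)
    return [gain * x for x in extended]


narrowband = [1, 1, 2, 2]
extended = [0, 0, 4, 4, 1]
scaled = match_energy(extended, narrowband, (2, 4), (2, 4))
```

In this example the reference portion of the extension carries four times the narrowband energy, so every bin of the extension is scaled by one half.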
To make the bandwidth of the extended spectrum sound more natural, a background noise may be extended separately and then added to the combined bandwidth extended and narrowband speech spectrum. In some systems the extended background noise spectrum has random phases with a consistent envelope slope.
w = T(f) = α(f − f_L)(w_H − w_L)/(f_H − f_L) + w_L    (Equation 3)
The parameter α may be adjusted empirically or may be programmed to a predetermined value.
Random phases consisting of uniformly distributed numbers between about 0 and about 2π are introduced into the extended background noise spectrum through a phase adjuster 224 before it is filtered by a power spectral density mask 226. The power spectral density mask 226 selectively passes portions of the extended background noise spectrum that are above a predetermined frequency before it is combined through combining logic or a summer 228 with the narrowband speech and extended spectrum. In those systems having an upper break frequency near about 5,500 Hz, the power spectral density mask may generate the frequency response shown in
Some consonant/vowel/no-speech detectors 212 may detect a vowel or a consonant when a measured or an estimated EL and/or γ lies above or below a predetermined threshold or within a predetermined range. Some bandwidth extension systems recognize that some vowels have a greater value of EL and a smaller value of γ than consonants. The spectral estimates or measures and decisions made on previous frames may also be used to facilitate the consonant/vowel decision in the current frame. Some bandwidth extension systems detect no-speech regions when energy is not detected above a measured or derived background noise floor.
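A threshold-based decision of this kind may be sketched as follows. The definitions of EL and γ are not given in this text, so the sketch treats both as values already measured for the current frame. The function name `classify_frame` and the threshold values are hypothetical, and the use of prior-frame decisions is omitted.

```python
def classify_frame(e_low, gamma, frame_energy, noise_floor,
                   e_low_min=1.0, gamma_max=0.5):
    """Return "no-speech", "vowel", or "consonant" for one frame.

    e_low, gamma : stand-ins for the EL and gamma measures (assumed to be
                   precomputed; their definitions are not part of this sketch)
    e_low_min, gamma_max : illustrative thresholds, not values from the text
    """
    if frame_energy <= noise_floor:
        return "no-speech"
    # Vowels are assumed to show a larger EL and a smaller gamma.
    if e_low >= e_low_min and gamma <= gamma_max:
        return "vowel"
    return "consonant"
```

A practical detector would likely smooth these decisions across frames. The sketch shows only the single-frame comparison against thresholds.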
When a portion of the extended narrowband spectrum falls below a predetermined threshold (e.g., a dynamic or a static noise floor), the associated phase of that portion of the spectrum is randomized at act 1006 before the extended envelope is adjusted.
To ensure that the energy in the extended narrowband spectrum (that may be referred to as the high-band extension) is adjusted to the energy in the original narrowband signal, the amplitude or gain of the harmonics in the extended narrowband spectrum is adjusted to the extended spectral envelope at act 1014. When a consonant is detected, the phases of the portions of the extended narrowband spectrum that correspond to the consonant are then randomized at acts 1012 and 1016. Separate power spectral density masks filter the narrowband signal and high frequency bandwidth extension before they are combined.
To ensure that the combined narrowband and high-band extension is more natural sounding, a background noise spectrum may be added to the combined signal. At act 1020, a background noise envelope is extracted, and at act 1022 it is extended through an envelope extension. Envelope extension may occur through a linear transformation, a mapping, or other methods. Random phases are then introduced into the extended background noise spectrum at act 1024. At act 1026, a second power spectral density mask selectively passes portions of the extended background noise spectrum that are above a predetermined frequency before the spectrum is combined with the narrowband signal and high-band extension signal at act 1032.
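The background noise path may be illustrated with a minimal sketch. It assumes the extended noise envelope is already available as one magnitude per frequency bin, and it uses a hard pass/stop mask in place of a shaped power spectral density mask. The names `extend_background_noise` and `combine`, the bin spacing, and the cutoff value are hypothetical.

```python
import cmath
import math
import random


def extend_background_noise(noise_envelope, bin_hz, cutoff_hz, rng=None):
    """Attach uniform random phases to an extended noise envelope and keep
    only the bins above cutoff_hz (a simplified, hard-edged mask)."""
    rng = rng or random.Random()
    out = []
    for k, mag in enumerate(noise_envelope):
        phase = rng.uniform(0.0, 2.0 * math.pi)
        mask = 1.0 if k * bin_hz > cutoff_hz else 0.0
        out.append(mask * cmath.rect(mag, phase))
    return out


def combine(speech, noise):
    """Bin-by-bin sum of the speech spectrum and the extended noise."""
    return [s + n for s, n in zip(speech, noise)]


# Eight bins spaced 1000 Hz apart; bins above 3500 Hz receive noise.
noise = extend_background_noise([1.0] * 8, bin_hz=1000.0, cutoff_hz=3500.0,
                                rng=random.Random(1))
wideband = combine([2.0 + 0j] * 8, noise)
```

Because the mask removes the noise below the cutoff, the narrowband portion of the combined spectrum is left unchanged and only the high band gains the synthetic noise floor.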
Each of the systems and methods described above may be encoded in a signal bearing medium, a computer readable medium such as a memory, programmed within a device such as one or more integrated circuits, or processed by a controller or a computer. If the methods are performed by software, the software may reside in a memory resident to or interfaced to the high-band generator 102, the background noise generator 104, and/or the parameter detector 106, or in any other type of non-volatile or volatile memory interfaced or resident to the speech enhancement logic. The memory may include an ordered listing of executable instructions for implementing logical functions. A logical function may be implemented through digital circuitry, through source code, through analog circuitry, or through an analog source such as an analog electrical or optical signal. The software may be embodied in any computer-readable or signal-bearing medium, for use by, or in connection with, an instruction executable system, apparatus, or device. Such a system may include a computer-based system, a processor-containing system, or another system that may selectively fetch instructions from an instruction executable system, apparatus, or device that may also execute instructions.
A “computer-readable medium,” “machine-readable medium,” “propagated-signal” medium, and/or “signal-bearing medium” may comprise any apparatus that contains, stores, communicates, propagates, or transports software for use by or in connection with an instruction executable system, apparatus, or device. The machine-readable medium may be, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. A non-exhaustive list of examples of a machine-readable medium would include: an electrical connection “electronic” having one or more wires, a portable magnetic or optical disk, a volatile memory such as a Random Access Memory “RAM” (electronic), a Read-Only Memory “ROM” (electronic), an Erasable Programmable Read-Only Memory (EPROM or Flash memory) (electronic), or an optical fiber (optical). A machine-readable medium may also include a tangible medium upon which software is printed, as the software may be electronically stored as an image or in another format (e.g., through an optical scan), then compiled, and/or interpreted or otherwise processed. The processed medium may then be stored in a computer and/or machine memory.
While some systems extend or map narrowband spectra to wideband spectra, alternate systems may extend or map a portion or a variable amount of a spectrum that may lie anywhere at or between a low and a high frequency to frequency spectra at or near a high frequency. Some systems extend encoded signals. Information may be encoded using a carrier wave of constant or almost constant frequency but of varying amplitude (e.g., amplitude modulation, AM). Information may also be encoded by varying signal frequency. In these systems, FM radio bands, audio portions of broadcast television signals, or other frequency modulated signals or bands may be extended. Some systems may extend AM or FM radio signals by a fixed or a variable amount at or near a high frequency range or limit.
Some other alternate systems may also be used to extend or map high frequency spectra to narrow frequency spectra to create a wideband spectrum. Some systems and methods may also include harmonic recovery systems or acts. In these systems and/or acts, harmonics attenuated by a pass band or hidden by noise, such as a background noise, may be reconstructed before a signal is extended. These systems and/or acts may use a pitch analysis, code books, linear mapping, or other methods to reconstruct missing harmonics before or during the bandwidth extension. The recovered harmonics may then be scaled. Some systems and/or acts may scale the harmonics based on a correlation between the adjacent frequencies within adjacent or prior frequency bands.
Some bandwidth extension systems extend the spectrum of a narrowband speech signal into wideband spectra. The bandwidth extension is done in the frequency domain by taking a short-time Fourier transform of the narrowband speech signal. The system combines an extended spectrum with the narrowband spectrum with few or no artifacts. The bandwidth extension enhances the quality and intelligibility of speech signals by reconstructing missing bands that may make speech sound more natural and robust in different levels of background noise. Some systems are robust to variations in the amplitude response of a transmission channel or medium.
While various embodiments of the invention have been described, it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the invention. Accordingly, the invention is not to be restricted except in light of the attached claims and their equivalents.
Number | Date | Country |
---|---|---|
20070150269 A1 | Jun 2007 | US |