1. Field of the Invention
The present invention relates generally to audio processing, and more particularly to audio signal analysis.
2. Description of Related Art
Audio communication networks often have bandwidth limitations that affect the quality of the audio transmitted over the network. For example, telephone channel networks limit the bandwidth received by the receiver to 300 Hz to 3500 Hz. As a result, speech transmitted using only this limited bandwidth sounds thin and dull due to the lack of low and high frequency content in the audio signal.
Previous systems have approached the issue in different ways. Some systems attempt to improve audio quality in a narrow bandwidth system by dividing the received signal into an envelope portion and an excitation portion. The system then analyzes and attempts to extend the bandwidth of both the envelope and excitation signals independently. This approach requires substantial resources and introduces latency in the processing.
Some previous systems attempt to remedy a narrow band audio signal by determining a mapping of the signal frequency components and reconstructing missing frequencies using an algorithm based on the mapped signal frequency components. This system also is not practical for use with audio applications due to introduced latency effects.
Therefore, there is a need for systems and methods that quickly and efficiently improve audio quality over bandwidth limited networks.
The present technology may provide audio signal bandwidth expansion for a narrow bandwidth signal received from a far end source. The far end source may transmit the signal over the audio communication network. The narrow band signal bandwidth is then expanded such that the bandwidth is greater than that of the audio communication network. The signal may be expanded by performing frequency folding on the signal. One or more features are then determined for the narrow bandwidth signal, and the expanded signal is modified based on a feature. The feature may be signal band energy slope, narrow band signal energy, or some other feature. The modification may be performed by a shelf filter selected based on the feature. The modified signals are then provided for additional processing. In some embodiments, a noise component is added to the narrow band signal prior to folding to create an excitation that reduces the appearance of a fully harmonic signal characteristic. An embodiment may process an audio signal to expand the bandwidth of the signal. A feature may be determined for a received signal. A bandwidth expansion module stored in memory may be executed to expand the spectrum of the received signal to create an expanded signal. The expanded signal may be modified based on the feature of the received signal.
An embodiment may include a system that expands the bandwidth of an audio signal. The system may include a processor as well as a signal fold module, a feature extraction module, and a signal shaping module stored in memory and executable by the processor. The signal fold module may be executed to receive an audio signal and provide an expanded signal having an expanded spectrum. The feature extraction module may be executed to receive the audio signal and provide a feature based on the audio signal. The signal shaping module may be executed to modify the expanded signal based on the feature.
A computer readable storage medium as described herein has embodied thereon a program executable by a processor to perform a method for expanding the bandwidth of an audio signal as described above.
Other aspects and advantages of the present invention can be seen on review of the drawings, the detailed description, and the claims which follow.
The present technology expands the bandwidth of an audio signal received over an audio communication network. The bandwidth expansion is simple, efficient, and of low complexity, such that it minimizes the resources and time required to expand signal bandwidth. This allows additional processing to be performed in near real time on the expanded audio signal without any discernible delay in the output signal.
Audio signal bandwidth expansion may begin with receiving a narrow bandwidth signal from a far end source. The far end source may transmit the signal over the audio communication network. The narrow band signal bandwidth is then expanded such that the bandwidth is greater than that of the audio communication network. The signal may be expanded by performing frequency folding on the signal. One or more features are then determined for the narrow bandwidth signal, and the expanded signal is modified based on a feature. The feature may be signal band energy slope, narrow band signal energy, or some other feature. The modification may be performed by a shelf filter selected based on the feature. The modified signals are then provided for additional processing. In some embodiments, a noise component may be added to the narrow band signal prior to folding to reduce the appearance of a fully harmonic signal characteristic.
Processor 220 may execute instructions and modules stored in a memory (not illustrated in
The exemplary receiver 210 is an acoustic sensor configured to receive a signal from a communications network. In some embodiments, the receiver 210 may include an antenna device. The signal may then be forwarded to the audio processing system 250 to reduce noise using the techniques described herein, and provide an audio signal to the output device 260. The present technology may be used in one or both of the transmit and receive paths of the audio device 110.
The audio processing system 250 is configured to receive the acoustic signals from an acoustic source via the primary microphone 230 and secondary microphone 240 and process the acoustic signals. Processing may include generating sub-band signals from one or more received acoustic signals, performing noise reduction on the sub-band signals, and reconstructing the noise-reduced (i.e., modified) sub-band signals. The audio processing system 250 is discussed in more detail below.
The primary and secondary microphones 230 and 240 may be spaced a distance apart in order to allow for detection of an energy level difference between them. The acoustic signals received by primary microphone 230 and secondary microphone 240 may be converted into electrical signals (i.e., a primary electrical signal and a secondary electrical signal). The electrical signals may themselves be converted by an analog-to-digital converter (not shown) into digital signals for processing in accordance with some embodiments. In order to differentiate the acoustic signals for clarity purposes, the acoustic signal received by the primary microphone 230 is herein referred to as the primary acoustic signal, while the acoustic signal received by the secondary microphone 240 is herein referred to as the secondary acoustic signal. The primary acoustic signal and the secondary acoustic signal may be processed by the audio processing system 250 to produce a signal with an improved signal-to-noise ratio. It should be noted that embodiments of the technology described herein may be practiced utilizing only the primary microphone 230.
The output device 260 is any device which provides an audio output to the user. For example, the output device 260 may include a speaker, an earpiece of a headset or handset, or a speaker on a conference device.
In various embodiments, where the primary and secondary microphones are omni-directional microphones that are closely-spaced (e.g., 1-2 cm apart), a beamforming technique may be used to simulate forwards-facing and backwards-facing directional microphones. The level difference may be used to discriminate speech and noise in the time-frequency domain which can be used in noise reduction.
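By way of illustration only, the simulation of forwards-facing and backwards-facing directional microphones may be sketched as a delay-and-subtract operation. The sketch assumes that the inter-microphone travel time equals a whole number of samples `d`; the function names `delay` and `cardioid_pair` are hypothetical, and a practical system would likely need fractional delays for spacings of 1-2 cm.

```python
import numpy as np

def delay(x, d):
    """Delay x by d whole samples, padding the start with zeros."""
    if d == 0:
        return x.copy()
    return np.concatenate([np.zeros(d), x[:-d]])

def cardioid_pair(primary, secondary, d):
    """Delay-and-subtract sketch of two opposite-facing patterns.

    d is the assumed acoustic travel time between the microphones,
    expressed in whole samples (an assumption of this sketch).
    """
    front = primary - delay(secondary, d)  # cancels sound arriving from the rear
    back = secondary - delay(primary, d)   # cancels sound arriving from the front
    return front, back
```

For a source directly in front, the secondary signal is the primary signal delayed by `d` samples, so the backwards-facing output cancels while the forwards-facing output retains energy. The level difference between the two outputs may then serve as the cue described above.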
Noise reduction module 310 may receive the narrow band signal and provide a noise reduced version to bandwidth expansion module 320. An audio processing system suitable for performing noise reduction by noise reduction module 310 is discussed in more detail in U.S. patent application Ser. No. 12/832,901, titled “Method for Jointly Optimizing Noise Reduction and Voice Quality in a Mono or Multi-Microphone System,” filed on Jul. 8, 2010, the disclosure of which is incorporated herein by reference.
Bandwidth expansion module 320 may process the noise reduced narrow band signal to expand the bandwidth of the signal. Bandwidth expansion module 320 is discussed in more detail below with respect to
Noise generator module 415 may generate a noise signal. The noise signal may be described as a function N, which may be expressed as
Where 1≤a≤2. For example, if a=1, a “pink” noise signal may be generated, while if a=2, a “Brownian” noise signal may be generated. The generated noise signal is provided to modulator 420.
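As a non-limiting sketch, a noise signal of this kind may be generated by shaping white noise in the frequency domain. The sketch assumes the conventional power-law form in which noise power falls off as 1/f^a, which is consistent with the pink (a=1) and Brownian (a=2) examples but is an assumption rather than a statement of the function N. The name `power_law_noise` and its parameters are hypothetical.

```python
import numpy as np

def power_law_noise(n_samples, a, rng=None):
    """Return noise whose power spectrum falls off as 1/f**a (assumed form)."""
    rng = np.random.default_rng() if rng is None else rng
    white = rng.standard_normal(n_samples)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n_samples)
    scale = np.zeros_like(freqs)           # the DC bin stays at zero
    # Amplitude scales as f**(-a/2) so that power scales as f**(-a).
    scale[1:] = freqs[1:] ** (-a / 2.0)
    shaped = np.fft.irfft(spectrum * scale, n=n_samples)
    return shaped / np.max(np.abs(shaped))  # normalise to a peak of 1
```

Larger values of `a` concentrate more of the noise energy at low frequencies.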
Modulator 420 combines the generated noise signal and the narrow band envelope into a single modulated signal. Hence, the noise signal is modulated to provide greater energy at frequencies having higher energy within the narrow band signal. The modulated signal is then provided to gain module 430, where a gain is applied to the modulated signal, and the gain-adjusted signal is then provided to combiner 435.
The narrow band signal received by bandwidth expansion module 320 may be provided to gain module 425. The output of gain module 425 is then applied to combiner 435, which combines the modulated noise signal and the narrow band signal output by gain module 425. The combined noise and narrow band signal is then provided to signal fold module 440.
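The modulation, gain, and combining operations may be sketched as follows for a single frame. The sketch assumes that the narrow band envelope is a smoothed magnitude spectrum of the frame and that the noise and narrow band frames have equal length; the function names, the smoothing width, and the default gains are hypothetical.

```python
import numpy as np

def modulate_noise(noise, narrow_band, smooth_bins=5):
    """Shape noise so it has more energy where the narrow band signal does."""
    magnitude = np.abs(np.fft.rfft(narrow_band))
    kernel = np.ones(smooth_bins) / smooth_bins
    envelope = np.convolve(magnitude, kernel, mode="same")  # assumed envelope
    envelope = envelope / (np.max(envelope) + 1e-12)        # peak of 1
    return np.fft.irfft(np.fft.rfft(noise) * envelope, n=len(noise))

def combine_excitation(narrow_band, shaped_noise, signal_gain=1.0, noise_gain=0.1):
    """Apply a gain to each branch and sum the results."""
    return signal_gain * narrow_band + noise_gain * shaped_noise
```

With a single tone as the narrow band frame, the shaped noise is confined to the bins around that tone, so the added noise follows the spectral shape of the signal.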
Signal fold module 440 receives the combined signal and “folds” the signal. To fold the signal, the sampling rate of the signal is doubled by inserting a sample having a magnitude of zero (0.0) between each pair of adjacent samples. The combined signal is thus up-sampled by two, resulting in a signal with twice the initial sampling rate and a spectrum symmetrical about the half band. The upper half of the spectrum, at high frequencies, is a mirror image of the lower half of the spectrum, at lower frequencies. By folding a signal, the signal frequencies appear as a mirror image about the upper frequency of the original combined signal.
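The folding operation may be illustrated with a short sketch. The function name `fold` is hypothetical; the operation shown is plain zero insertion as described above.

```python
import numpy as np

def fold(signal):
    """Up-sample by two by inserting a zero after every input sample."""
    folded = np.zeros(2 * len(signal), dtype=float)
    folded[::2] = signal   # original samples land on the even indices
    return folded
```

For a real input of N samples, the 2N-point spectrum of the folded signal repeats the N-point spectrum of the input, so its magnitude over the range from zero to the new Nyquist frequency is symmetrical about the original Nyquist frequency.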
Feature extraction module 445 receives the narrow bandwidth signal and extracts a feature from the signal. The feature may include pitch estimation, pitch energy, energy ratio, or some other feature. For example, the feature may include a ratio of energy in a first portion of the narrow bandwidth signal to the energy in a second portion of the narrow bandwidth signal. The ratio of the energy in the first portion of the spectrum to the energy of the second portion of the spectrum may be determined per frame of the far-end signal. The one or more features may be sent to signal shaping module 450.
Signal shaping module 450 receives the signal with the folded spectrum from signal fold module 440 and one or more features from feature extraction module 445. Signal shaping module 450 then applies a filter to the expanded signal based on one or more features.
Signal shaping module 450 may shape the expanded signal to help the expanded portion of the signal comply with the characteristics and pattern of the narrow band signal. For example, if the narrow band signal is characteristic of speech, the signal shaping module 450 may shape the expanded portion of the signal to better match the spectrum of a speech model. In one embodiment, signal shaping module 450 may shape the expanded signal based on a feature of the narrow band signal. Signal shaping module 450 may select a filter, such as a shelf filter, based on a feature received from feature extraction module 445 and apply the selected filter to the expanded signal received from signal fold module 440.
Once a shelf filter is applied to the folded signal, signal shaping module 450 provides the filtered signal to a high pass filter module 455. High pass filter module 455 applies a high pass filter to the filtered signal in order to retain only the expanded portion of the signal.
The narrow band signal received by bandwidth expansion module 320 may be expanded at signal fold module 465 and filtered by low pass filter 470. The high pass filtered signal and the low pass filtered signal are combined at combiner 460 and provided as output by bandwidth expansion module 320.
The all-pass subsystems 620 and 630 are generated by factoring a low pass prototype filter 610, which may be designed using several methods including but not limited to odd order elliptic, Butterworth, and Chebyshev filter design methods, into power complementary all-pass subsystems. The all-pass subsystems A0(z) 620 and A1(z) 630 can then form complementary high pass and low pass filters. The outputs of all-pass subsystems 620 and 630 are then summed at summing modules 640 and 650. The high pass branch associated with summing module 650 may be scaled by a gain G 660, which produces a shelving equalizer filter 670. The prototype filter can be of any order, such as an odd order, allowing an arbitrary slope in the transition region.
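A minimal sketch of such a shelving structure is given below for the simplest odd order case, a first-order Butterworth prototype, for which A0(z) = 1 and A1(z) = (c + z^-1)/(1 + c z^-1). The low pass branch is (A0 + A1)/2, the high pass branch is (A0 - A1)/2, and the shelf is the low pass branch plus G times the high pass branch. The choice of a first-order prototype and the names `shelf_coefficients` and `response` are assumptions made for illustration.

```python
import numpy as np

def shelf_coefficients(cutoff, gain):
    """First-order high shelf built from an all-pass section.

    cutoff is in radians per sample (0 < cutoff < pi); gain is the linear
    gain G applied to the high pass branch.
    """
    t = np.tan(cutoff / 2.0)
    c = (t - 1.0) / (t + 1.0)   # all-pass coefficient from the bilinear transform
    # H(z) = (1 + G)/2 + (1 - G)/2 * A1(z), written as one first-order section.
    b0 = 0.5 * (1.0 + gain) + 0.5 * (1.0 - gain) * c
    b1 = 0.5 * (1.0 + gain) * c + 0.5 * (1.0 - gain)
    return np.array([b0, b1]), np.array([1.0, c])

def response(b, a, w):
    """Frequency response of the first-order section at w radians per sample."""
    z1 = np.exp(-1j * w)
    return (b[0] + b[1] * z1) / (a[0] + a[1] * z1)
```

In this sketch the response is unity at zero frequency and equal to G at the Nyquist frequency; setting G to zero leaves only the low pass branch, whose power is one half at the cutoff. A higher odd order prototype would give a steeper transition at the cost of additional all-pass sections.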
The narrow band signal may be processed to reduce noise at step 720. Reducing noise may include steps such as detecting a noise component and an echo component, reducing the noise by subtractive cancellation or multiplicative noise suppression, and other processing. Processing a signal to reduce noise is described in more detail in U.S. patent application Ser. No. 12/832,901, titled “Method for Jointly Optimizing Noise Reduction and Voice Quality in a Mono or Multi-Microphone System,” filed on Jul. 8, 2010, the disclosure of which is incorporated herein by reference.
The noise reduced narrow band signal may be expanded to create an expanded signal at step 730. The narrow band signal may be expanded in a simple and efficient manner. The expansion may involve signal spectrum folding as well as signal shaping to form an expanded signal. Expanding a narrow band acoustic signal is discussed in more detail below with respect to
After expanding the narrow band acoustic signal, the expanded signal is output at step 740. The signal may be output via a speaker or some other output device.
A feature of an acoustic signal may be determined at step 820. The feature can be a measured property of a signal or derived from the signal. For example, the feature may be a ratio of energy in different portions of the narrow band acoustic signal. Determining a feature of an acoustic signal is discussed in more detail below with respect to
An expanded signal may be modified based on a feature at step 840. A feature may be used to select a filter model which may then be applied to the expanded signal spectrum, for example by signal shaping module 450. Modifying the expanded signal based on a feature is described in more detail below with respect to
A high pass filter may be applied to a modified expanded signal at step 850. A high pass filter may be applied to select only the upper frequency portion, or the extended portion, of an expanded signal. A low pass filter may be applied to the original received narrow band signal at step 860. The low pass filter may be applied to ensure that only the original signal is used in generating an output signal. The high pass filtered signal and the low pass filtered signal may be combined at step 870. Combining the signals may be performed by a simple combiner, but may also involve smoothing of the signals to avoid any distortion.
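The high pass, low pass, and combining steps may be sketched end to end as follows. For brevity the sketch substitutes ideal frequency domain masks for the high pass and low pass filters and a scalar `high_gain` for the feature-based shaping filter; these substitutions, and the names `fold`, `half_band`, and `expand`, are assumptions of the sketch and not a description of any particular embodiment.

```python
import numpy as np

def fold(signal):
    """Up-sample by two by inserting a zero after every input sample."""
    folded = np.zeros(2 * len(signal), dtype=float)
    folded[::2] = signal
    return folded

def half_band(x, keep_high):
    """Ideal half-band filter applied with a frequency domain mask."""
    spectrum = np.fft.rfft(x)
    half = len(spectrum) // 2
    mask = np.zeros(len(spectrum))
    if keep_high:
        mask[half + 1:] = 1.0   # keep only the expanded (upper) portion
    else:
        mask[:half + 1] = 1.0   # keep only the original (lower) portion
    return np.fft.irfft(spectrum * mask, n=len(x))

def expand(narrow_band, high_gain=0.5):
    """Fold, split into low and high branches, and recombine."""
    folded = fold(narrow_band)
    high = half_band(high_gain * folded, keep_high=True)
    # The factor of two restores the level lost to zero insertion.
    low = half_band(2.0 * folded, keep_high=False)
    return low + high
```

For a tone in the narrow band, the output contains the original tone at full level plus a mirrored component in the upper band at a level set by `high_gain`.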
An energy level of a higher frequency portion of the narrow band signal may be determined at step 1020. The energy level of the higher frequency portion may be determined in the same way as the energy level of the lower frequency portion, but is determined for a different portion of the narrow band signal. For example, the energy may be the energy of the frequency components greater than R2, the energy of frequency component R2, or the energy of some other frequencies.
A ratio of the lower frequency portion energy to the higher frequency portion energy is determined at step 1030. The ratio is determined to identify whether a narrow band signal can be characterized as speech, noise, or some other type of signal. For example, in a voice signal, the lower frequency portions will have more energy than the higher frequency portions. Thus, in voice signals, the ratio of the energy of the lower frequency components to the energy of the higher frequency components will be greater than 1.
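By way of example, the ratio may be computed per frame from the magnitude spectrum. The sketch below assumes a single split frequency separating the lower and higher portions; the function name `band_energy_ratio` and the parameter `split_hz` are hypothetical, and other band definitions may be used.

```python
import numpy as np

def band_energy_ratio(frame, sample_rate, split_hz):
    """Ratio of energy below split_hz to energy at or above split_hz."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    low_energy = power[freqs < split_hz].sum()
    high_energy = power[freqs >= split_hz].sum()
    return low_energy / (high_energy + 1e-12)  # guard against division by zero
```

A frame dominated by low frequency content yields a ratio greater than 1, consistent with the voice signal example above, whereas a frame dominated by high frequency content yields a ratio less than 1.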
The above described modules, including those discussed with respect to
While the present invention is disclosed by reference to the preferred embodiments and examples detailed above, it is to be understood that these examples are intended in an illustrative rather than a limiting sense. It is contemplated that modifications and combinations will readily occur to those skilled in the art, which modifications and combinations will be within the spirit of the invention and the scope of the following claims.
This application claims the benefit of U.S. Provisional Application No. 61/319,881, filed on Apr. 1, 2010, entitled “Low Complexity Bandwidth Expansion of Speech,” the disclosure of which is incorporated herein by reference.