Having thus described embodiments of the invention in general terms, reference will now be made to the accompanying drawings, which are not necessarily drawn to scale, and wherein:
Embodiments of the present invention will now be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all embodiments of the invention are shown. Indeed, embodiments of the invention may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements. Like reference numerals refer to like elements throughout.
In addition, while several embodiments of the method of the present invention are performed or used by a mobile terminal 10, the method may be employed by devices other than a mobile terminal. Moreover, the system and method of embodiments of the present invention will be primarily described in conjunction with mobile communications applications. It should be understood, however, that the system and method of embodiments of the present invention can be utilized in conjunction with a variety of other applications, both in the mobile communications industries and outside of the mobile communications industries.
The mobile terminal 10 includes an antenna 12 in operable communication with a transmitter 14 and a receiver 16. The mobile terminal 10 further includes a controller 20 or other processing element that provides signals to and receives signals from the transmitter 14 and receiver 16, respectively. The signals include signaling information in accordance with the air interface standard of the applicable cellular system, and also user speech and/or user generated data. In this regard, the mobile terminal 10 is capable of operating with one or more air interface standards, communication protocols, modulation types, and access types. By way of illustration, the mobile terminal 10 is capable of operating in accordance with any of a number of first, second and/or third-generation communication protocols or the like. For example, the mobile terminal 10 may be capable of operating in accordance with second-generation (2G) wireless communication protocols IS-136 (TDMA), GSM, and IS-95 (CDMA), or with third-generation (3G) wireless communication protocols, such as UMTS, CDMA2000, and TD-SCDMA.
It is understood that the controller 20 includes circuitry required for implementing audio and logic functions of the mobile terminal 10. For example, the controller 20 may be comprised of a digital signal processor device, a microprocessor device, and various analog-to-digital converters, digital-to-analog converters, and other support circuits. Control and signal processing functions of the mobile terminal 10 are allocated between these devices according to their respective capabilities. The controller 20 thus may also include the functionality to convolutionally encode and interleave messages and data prior to modulation and transmission. The controller 20 can additionally include an internal voice coder, and may include an internal data modem. Further, the controller 20 may include functionality to operate one or more software programs, which may be stored in memory. For example, the controller 20 may be capable of operating a connectivity program, such as a conventional Web browser. The connectivity program may then allow the mobile terminal 10 to transmit and receive Web content, such as location-based content, according to a Wireless Application Protocol (WAP), for example.
The mobile terminal 10 also comprises a user interface including an output device such as a conventional earphone or speaker 24, a ringer 22, a microphone 26, a display 28, and a user input interface, all of which are coupled to the controller 20. The user input interface, which allows the mobile terminal 10 to receive data, may include any of a number of devices allowing the mobile terminal 10 to receive data, such as a keypad 30, a touch display (not shown) or other input device. In embodiments including the keypad 30, the keypad 30 may include the conventional numeric (0-9) and related keys (#, *), and other keys used for operating the mobile terminal 10. Alternatively, the keypad 30 may include a conventional QWERTY keypad arrangement. The mobile terminal 10 further includes a battery 34, such as a vibrating battery pack, for powering various circuits that are required to operate the mobile terminal 10, as well as optionally providing mechanical vibration as a detectable output.
The mobile terminal 10 may further include a universal identity module (UIM) 38. The UIM 38 is typically a memory device having a processor built in. The UIM 38 may include, for example, a subscriber identity module (SIM), a universal integrated circuit card (UICC), a universal subscriber identity module (USIM), a removable user identity module (R-UIM), etc. The UIM 38 typically stores information elements related to a mobile subscriber. In addition to the UIM 38, the mobile terminal 10 may be equipped with memory. For example, the mobile terminal 10 may include volatile memory 40, such as volatile Random Access Memory (RAM) including a cache area for the temporary storage of data. The mobile terminal 10 may also include other non-volatile memory 42, which can be embedded and/or may be removable. The non-volatile memory 42 can additionally or alternatively comprise an EEPROM, flash memory or the like, such as that available from the SanDisk Corporation of Sunnyvale, Calif., or Lexar Media Inc. of Fremont, Calif. The memories can store any of a number of pieces of information, and data, used by the mobile terminal 10 to implement the functions of the mobile terminal 10. For example, the memories can include an identifier, such as an international mobile equipment identification (IMEI) code, capable of uniquely identifying the mobile terminal 10.
Referring now to
The MSC 46 can be coupled to a data network, such as a local area network (LAN), a metropolitan area network (MAN), and/or a wide area network (WAN). The MSC 46 can be directly coupled to the data network. In one typical embodiment, however, the MSC 46 is coupled to a GTW 48, and the GTW 48 is coupled to a WAN, such as the Internet 50. In turn, devices such as processing elements (e.g., personal computers, server computers or the like) can be coupled to the mobile terminal 10 via the Internet 50. For example, as explained below, the processing elements can include one or more processing elements associated with a computing system 52 (two shown in
The BS 44 can also be coupled to a serving GPRS (General Packet Radio Service) support node (SGSN) 56. As known to those skilled in the art, the SGSN 56 is typically capable of performing functions similar to the MSC 46 for packet switched services. The SGSN 56, like the MSC 46, can be coupled to a data network, such as the Internet 50. The SGSN 56 can be directly coupled to the data network. In a more typical embodiment, however, the SGSN 56 is coupled to a packet-switched core network, such as a GPRS core network 58. The packet-switched core network is then coupled to another GTW 48, such as a GTW GPRS support node (GGSN) 60, and the GGSN 60 is coupled to the Internet 50. In addition to the GGSN 60, the packet-switched core network can also be coupled to a GTW 48. Also, the GGSN 60 can be coupled to a messaging center. In this regard, the GGSN 60 and the SGSN 56, like the MSC 46, may be capable of controlling the forwarding of messages, such as MMS messages. The GGSN 60 and SGSN 56 may also be capable of controlling the forwarding of messages for the mobile terminal 10 to and from the messaging center.
In addition, by coupling the SGSN 56 to the GPRS core network 58 and the GGSN 60, devices such as a computing system 52 and/or origin server 54 may be coupled to the mobile terminal 10 via the Internet 50, SGSN 56 and GGSN 60. In this regard, devices such as the computing system 52 and/or origin server 54 may communicate with the mobile terminal 10 across the SGSN 56, GPRS core network 58 and the GGSN 60. By directly or indirectly connecting mobile terminals 10 and the other devices (e.g., computing system 52, origin server 54, etc.) to the Internet 50, the mobile terminals 10 may communicate with the other devices and with one another, such as according to the Hypertext Transfer Protocol (HTTP), to thereby carry out various functions of the mobile terminals 10.
Although not every element of every possible mobile network is shown and described herein, it should be appreciated that the mobile terminal 10 may be coupled to one or more of any of a number of different networks through the BS 44. In this regard, the network(s) can be capable of supporting communication in accordance with any one or more of a number of first-generation (1G), second-generation (2G), 2.5G and/or third-generation (3G) mobile communication protocols or the like. For example, one or more of the network(s) can be capable of supporting communication in accordance with 2G wireless communication protocols IS-136 (TDMA), GSM, and IS-95 (CDMA). Also, for example, one or more of the network(s) can be capable of supporting communication in accordance with 2.5G wireless communication protocols GPRS, Enhanced Data GSM Environment (EDGE), or the like. Further, for example, one or more of the network(s) can be capable of supporting communication in accordance with 3G wireless communication protocols such as a Universal Mobile Telecommunications System (UMTS) network employing Wideband Code Division Multiple Access (WCDMA) radio access technology. Some narrow-band AMPS (NAMPS), as well as TACS, network(s) may also benefit from embodiments of the present invention, as should dual or higher mode mobile stations (e.g., digital/analog or TDMA/CDMA/analog phones).
The mobile terminal 10 can further be coupled to one or more wireless access points (APs) 62. The APs 62 may comprise access points configured to communicate with the mobile terminal 10 in accordance with techniques such as, for example, radio frequency (RF), Bluetooth (BT), infrared (IrDA) or any of a number of different wireless networking techniques, including wireless LAN (WLAN) techniques such as IEEE 802.11 (e.g., 802.11a, 802.11b, 802.11g, 802.11n, etc.), WiMAX techniques such as IEEE 802.16, and/or ultra wideband (UWB) techniques such as IEEE 802.15 or the like. The APs 62 may be coupled to the Internet 50. As with the MSC 46, the APs 62 can be directly coupled to the Internet 50. In one embodiment, however, the APs 62 are indirectly coupled to the Internet 50 via a GTW 48. Furthermore, in one embodiment, the BS 44 may be considered as another AP 62. As will be appreciated, by directly or indirectly connecting the mobile terminals 10 and the computing system 52, the origin server 54, and/or any of a number of other devices, to the Internet 50, the mobile terminals 10 can communicate with one another, the computing system, etc., to thereby carry out various functions of the mobile terminals 10, such as to transmit data, content or the like to, and/or receive content, data or the like from, the computing system 52. As used herein, the terms “data,” “content,” “information” and similar terms may be used interchangeably to refer to data capable of being transmitted, received and/or stored in accordance with embodiments of the present invention. Thus, use of any such terms should not be taken to limit the spirit and scope of the present invention.
Although not shown in
An exemplary embodiment of the invention will now be described with reference to
Referring now to
The first filtered signal 86 may be communicated to the non-linear function 72 which is in communication with the first band-pass filter 70. The non-linear function 72 creates low frequency components at harmonics below those included in the input speech signal 84. In this regard, the non-linear function 72 may create either or both of the fundamental frequency and other low frequency harmonics. For example, if the first band-pass filter 70 includes a pass band that passes the first and second harmonics of a particular input speech signal, the non-linear function 72 may produce the fundamental frequency and other harmonics as an output as shown in
As stated above, the non-linear function 72 is employed to recreate missing and/or attenuated harmonic components from the input speech signal 84 using the existing harmonics from the input speech signal 84. The missing and/or attenuated harmonic components are recoverable using the non-linear function 72 since, when a non-linear function is applied to a signal with two or more sine components (i.e., harmonics), the non-linear function produces some upper harmonic components and intermodulation components at sum and difference frequencies of the two or more sine components. As shown in
In an exemplary embodiment as shown in
where ω0 = 2πf0, ω1 = 2πf1, ω2 = 2πf2, etc., and f1 = 2f0, f2 = 3f0, etc. Thus, a non-linear function output 88 from the non-linear function 72 would contain the lost fundamental frequency and the 3rd, 4th, and 5th harmonic components as shown in
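By way of illustration only, the following Python sketch demonstrates the sum-and-difference mechanism described above. The text does not commit to a particular non-linear function 72; squaring is assumed here purely for demonstration, and with this choice the output contains the lost fundamental together with the 4th, 5th, and 6th harmonics (a rectifying non-linearity would yield a richer harmonic set, closer to the 3rd-5th harmonics mentioned above). All signal parameters are illustrative.

```python
import numpy as np

fs = 8000                     # assumed narrowband sampling rate (Hz)
f0 = 100.0                    # fundamental missing from the input
t = np.arange(fs) / fs        # one second of signal

# Narrowband input holding only the 2nd and 3rd harmonics (200 Hz, 300 Hz);
# the fundamental has been removed, e.g., by the telephone band limit.
s_bp1 = np.cos(2 * np.pi * 2 * f0 * t) + np.cos(2 * np.pi * 3 * f0 * t)

# Squaring creates components at the sums and differences of the inputs:
# |300 - 200| = 100 Hz (the lost fundamental), plus 400, 500, and 600 Hz.
s_nl = s_bp1 ** 2

spectrum = np.abs(np.fft.rfft(s_nl)) / len(s_nl)
freqs = np.fft.rfftfreq(len(s_nl), 1.0 / fs)
print(freqs[spectrum > 0.1])  # -> [  0. 100. 400. 500. 600.]; the 0 Hz term
                              # is a DC by-product that Hbp2(z) would remove
```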
As stated above, the first filtered signal 86 which is input to the non-linear function 72 may be a band-pass filtered version of the signal to be expanded (i.e., the input speech signal 84). The pass band Hbp1(z) of the first band-pass filter 70 may be fixed or dependent on the fundamental frequency of the input speech signal 84. In other words, filters employed in embodiments of the present invention may be either signal dependent or signal independent. For example, if the pass band of the first band-pass filter 70 is fixed (i.e., signal independent), the pass band should be such that at least two harmonics are always preserved, e.g., roughly 100-600 Hz. Meanwhile, if the pass band of the first band-pass filter 70 is dependent on the fundamental frequency of the input speech signal 84 (i.e., signal dependent), the higher cutoff frequency may be selected to be about 2-4 times an estimate of the fundamental frequency.
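A minimal sketch of the two variants of the first band-pass filter 70, assuming linear-phase FIR designs in a scipy environment; the tap count and the 100 Hz lower edge of the signal-dependent variant are assumptions, while the 100-600 Hz band and the 2-4x factor come from the text above.

```python
from scipy import signal

fs = 8000  # assumed narrowband sampling rate (Hz)

# Signal-independent variant: a fixed pass band of roughly 100-600 Hz, so
# that at least two harmonics of a typical speech fundamental are preserved.
h_bp1_fixed = signal.firwin(257, [100.0, 600.0], pass_zero=False, fs=fs)

def make_h_bp1(f0_estimate, factor=3.0, numtaps=257):
    """Signal-dependent variant: the higher cutoff tracks 2-4x an estimate
    of the fundamental frequency. The 100 Hz lower edge is carried over
    from the fixed case as an assumption, and f0_estimate would come from
    a pitch estimator outside the scope of this sketch."""
    return signal.firwin(numtaps, [100.0, factor * f0_estimate],
                         pass_zero=False, fs=fs)
```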
As shown in
The second filtered signal 90 may then be gain adjusted by the amplifying element 76, a gain of which is controlled by the level control element 80 as described in greater detail below. An output of the amplifying element 76 is a gain adjusted low frequency signal 92 which is delayed with respect to the input speech signal 84 due to delays introduced, for example, in the first and second band-pass filters 70 and 74 and the non-linear function 72. The delays introduced may be compensated for before summation of the gain adjusted low frequency signal 92 with the input speech signal 84 at the summing element 78. In this regard, the delay element 82 may be employed to compensate for the delays introduced into the gain adjusted low frequency signal 92 by delaying the input speech signal 84 to produce a delayed input speech signal 96. The delays should be substantially the same throughout the pass band of the second band-pass filter 74, such that generated low-frequency components are summed in-phase with original signal components of the input speech signal 84 that have the same frequencies. In other words, components in the gain adjusted low frequency signal 92 must be summed in phase with corresponding components from the input speech signal 84. If the delay is frequency-dependent, a separate phase equalizer may be employed. If the first and second band-pass filters 70 and 74 are implemented as finite impulse response (FIR) filters and the non-linear function 72 preserves the phase, no phase equalizer may be needed and a constant delay may be used. If infinite impulse response (IIR) filters are used, the phase of the delayed input speech signal 96 may be equalized with an all-pass filter. In any case, the delayed input speech signal 96 may be summed with the gain adjusted low frequency signal 92 to produce an enhanced or expanded output signal 98 (senh(n) in
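The delay bookkeeping above can be made concrete with a short sketch. Assuming symmetric (linear-phase) FIR realizations of Hbp1(z) and Hbp2(z) and a memoryless non-linearity such as squaring, each filter contributes a constant group delay of (numtaps - 1)/2 samples, so the delay element 82 reduces to an integer shift; the filter bands and lengths below are assumptions.

```python
import numpy as np
from scipy import signal

fs = 8000
ntaps = 257
h_bp1 = signal.firwin(ntaps, [100.0, 600.0], pass_zero=False, fs=fs)
h_bp2 = signal.firwin(ntaps, [50.0, 300.0], pass_zero=False, fs=fs)  # assumed band

def expand_low(x, gain):
    # Low-frequency branch: band-pass, memoryless non-linearity, band-pass.
    s_low = np.convolve(np.convolve(x, h_bp1)[:len(x)] ** 2, h_bp2)[:len(x)]
    # Each linear-phase FIR delays the branch by (ntaps - 1) / 2 samples; the
    # squaring adds no delay of its own, so a plain integer shift keeps the
    # summation in phase across the pass band of h_bp2.
    delay = 2 * ((ntaps - 1) // 2)
    x_delayed = np.concatenate([np.zeros(delay), x])[:len(x)]
    return x_delayed + gain * s_low        # the expanded output s_enh(n)
```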
As stated above, the amplifying element 76 adjusts a gain of the second filtered signal 90 to produce the gain adjusted low frequency signal 92. The gain of the amplifying element 76 is controlled by the level control element 80. An exemplary embodiment of the level control element 80 is shown in
The level control element 80 is employed to provide an adjustment to low frequency content prior to summing the low frequency content with the delayed input speech signal 96 to produce the expanded output signal 98. Accordingly, the level control element 80 adjusts the gain of the amplifying element 76 in response to a feature of the input speech signal 84. In this regard, a feature vector may be extracted from the input speech signal 84 using a feature extraction element 100. The feature vector may be used as an indicator of how much energy is missing from the input speech signal in the lowest frequencies (i.e., an estimate of the energy of the missing and/or attenuated harmonic components). In an exemplary embodiment, the feature vector may represent a tilt (or slope) of the narrowband spectrum. However, other features, such as the zero crossing rate, may be selected for use as the feature vector. The tilt may be estimated from a fast Fourier transform (FFT) spectrum. Alternatively, a first order auto-regressive coefficient may be used.
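A hedged sketch of the feature extraction element 100 follows, showing the FFT-based tilt estimate and the first-order auto-regressive alternative. The frame windowing, the fitted band, and the use of a least-squares slope are assumptions, as the text does not fix these details.

```python
import numpy as np

def spectral_tilt(frame, fs=8000):
    """Tilt of the narrowband spectrum: slope of a straight line fitted to
    the log-magnitude FFT over the nominal 300-3400 Hz telephone band."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    band = (freqs >= 300.0) & (freqs <= 3400.0)
    slope, _ = np.polyfit(freqs[band], 20 * np.log10(mag[band] + 1e-12), 1)
    return slope                  # dB per Hz; more negative = steeper tilt

def ar1_coefficient(frame):
    """Alternative feature: a first-order auto-regressive coefficient,
    estimated here from the lag-one autocorrelation."""
    return np.dot(frame[1:], frame[:-1]) / (np.dot(frame, frame) + 1e-12)
```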
The level control element 80 calculates signal energies or amplitude levels of three different signals. Two of the three different signals are produced by processing the input speech signal 84 at the first and second low-pass filters 102 and 104. Cutoff frequencies of the first and second low-pass filters 102 and 104 having pass bands Hlp1(z) and Hlp2(z), respectively, may be about 300-500 Hz and 500-800 Hz, respectively. Furthermore, the cutoff frequency of the first low-pass filter 102 may be selected to be substantially equal to a higher cutoff frequency of the second low-pass filter 104. Outputs of the first and second low-pass filters 102 and 104 (i.e., slp1(n) and slp2(n), respectively) are communicated to the first and second level estimation elements 106 and 108, respectively, which determine respective levels of slp1(n) and slp2(n). A third level estimate for determining a gain signal 114 to be applied to the amplifying element 76 may be a level of the second filtered signal 90 (i.e., slow(n)) that is output from the third level estimation element 110 and is based on low-frequency component regeneration parts generated by the expansion algorithm as provided by the system described with reference to
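The three level estimates can be sketched as follows, assuming RMS levels over a frame and cutoffs of 400 Hz and 600 Hz picked from within the ranges quoted above; the text does not commit to a particular level estimator.

```python
import numpy as np
from scipy import signal

fs = 8000
h_lp1 = signal.firwin(129, 400.0, fs=fs)  # Hlp1(z), cutoff in the 300-500 Hz range
h_lp2 = signal.firwin(129, 600.0, fs=fs)  # Hlp2(z), cutoff in the 500-800 Hz range

def rms(x):
    return np.sqrt(np.mean(x ** 2))

def level_estimates(x, s_low):
    """x is a frame of the input speech signal 84; s_low is the matching
    frame of the second filtered signal 90 (slow(n))."""
    L_lp1 = rms(np.convolve(x, h_lp1, mode='same'))   # level of slp1(n)
    L_lp2 = rms(np.convolve(x, h_lp2, mode='same'))   # level of slp2(n)
    L_low = rms(s_low)                                # level of slow(n)
    return L_lp1, L_lp2, L_low
```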
The level control element 80 produces the gain signal 114 based on an approximation that describes a relationship between sub-band amplitude levels calculated from a direct narrowband signal (e.g., a signal with original low-frequency components such as the second filtered signal 90), and a feature vector extracted from the corresponding low-frequency limited narrowband signal (e.g., the input speech signal):
where L1 is the amplitude level of a direct signal in the frequency band defined by the first low-pass filter 102, L2 is the amplitude level of a direct signal in the frequency band defined by the second low-pass filter 104, fL is a function that has been previously defined using direct training samples, and a is the feature vector extracted from a corresponding low-frequency limited signal.
Based on the approximation above, the gain to be applied to the second filtered signal 90 at the amplifying element 76 may be calculated as:
where Llp1 is the amplitude level of the bandlimited signal slp1(n) (i.e., the output of the first level estimation element 106), Llp2 is the amplitude level of a bandlimited signal slp2(n) (i.e., the output of the second level estimation element 108), and Llow is the amplitude level of signal slow(n) (i.e., the output of the third level estimation element 110).
It should be noted that although
where E1 is the energy of a direct signal in the frequency band defined by the first low-pass filter 102, E2 is the energy of the direct signal in the frequency band defined by the second low-pass filter 104, and fE is a function of the feature vector a. The gain to be applied to the second filtered signal 90 at the amplifying element 76 may be calculated as:
where E[slp1(n)] is the energy of the bandlimited signal slp1(n), E[slp2(n)] is the energy of the bandlimited signal slp2(n) and E[slow(n)] is the energy of slow(n) (i.e., the energy of the second filtered signal 90).
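Because the gain formulas themselves are not reproduced above, the following sketch is a hedged reconstruction of the energy-based variant from its verbal description only: fE, trained offline on direct (unfiltered) speech, predicts from the neighbouring band how much energy the lowest band should contain, and the gain scales slow(n) to make up the deficit. The subtraction of the energy already present and the floor at zero are assumptions, not necessarily the exact rule intended here.

```python
import numpy as np

def expansion_gain(E_lp1, E_lp2, E_low, a, f_E):
    """E_lp1, E_lp2: energies of slp1(n) and slp2(n); E_low: energy of
    slow(n); a: feature vector; f_E: trained mapping from a to the E1/E2
    ratio expected in direct (unfiltered) speech."""
    E_target = f_E(a) * E_lp2             # energy the lowest band should hold
    deficit = max(E_target - E_lp1, 0.0)  # energy actually missing
    return np.sqrt(deficit / (E_low + 1e-12))

# f_E would in practice be a regression model fitted on training samples;
# a constant stands in here only so the sketch runs.
g = expansion_gain(E_lp1=0.02, E_lp2=0.05, E_low=0.01, a=None,
                   f_E=lambda a: 1.5)
```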
The feature vector may contain several features useful in defining an optimal level adjustment. All of the features can be extracted inside the level control element 80 by the feature extraction element 100, in exemplary embodiments in which a level control algorithm embodying the level control element 80 includes the feature extraction element 100 as shown in
In an exemplary embodiment of the invention, an apparatus may be configured to execute the low frequency expansion described above for each input speech signal without regard to other factors. However, in an alternative exemplary embodiment, the low frequency expansion described above may be applied selectively based on information related to device capabilities for devices receiving an input from an apparatus or computer program product capable of providing low frequency expansion as described above. For example, accessory information could be utilized so that low frequency expansion as described above is enabled only when it is determined that speaker elements being used are able to reproduce the generated low-frequency components. Additionally or alternatively, volume information could also be useful in determining whether the low frequency expansion as described above should be employed due to potential limited power tolerance of earpiece elements. Alternatively, an amount of expansion towards low frequencies could be programmed to decrease gradually as the volume increases. In addition, a noise level of the input speech signal 84 may affect performance. Thus, when the signal-to-noise ratio (SNR) is poor, less content may be added to the low frequencies, because intelligibility may suffer if the noise components are also expanded.
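A small sketch of this discriminating logic follows; every threshold and ramp below is an illustrative assumption rather than a value from the text.

```python
def expansion_scale(speaker_reproduces_low, volume, max_volume, snr_db):
    """Returns a factor in [0, 1] by which to scale the low-frequency gain."""
    if not speaker_reproduces_low:
        return 0.0                  # accessory cannot reproduce the band
    scale = 1.0
    # Decrease the expansion gradually as the volume increases, protecting
    # earpiece elements with limited power tolerance.
    scale *= max(0.0, 1.0 - volume / max_volume)
    # Add less content at poor SNR so that noise is not expanded as well.
    if snr_db < 20.0:
        scale *= max(0.0, snr_db / 20.0)
    return scale
```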
It should also be noted that it is possible to directly control the properties of filter elements rather than providing a separate gain control for the output of the filter elements. For example, as shown in
Processes described above for providing low frequency expansion of an input speech signal may also be employed in a downsampled (or decimated) time domain. A low frequency expansion algorithm, such as that described above, is characterized in that an output of the algorithm includes the input speech signal 84 relatively unchanged except that an expanded low frequency component is added to the input speech signal 84. As such, low frequency expansion is a good candidate for processing using multi-rate signal processing techniques. In this regard, it is conceivable that significant computational savings could be achieved by splitting the input speech signal 84 into two or more downsampled signals and then implementing low frequency expansion only on the lowest frequency region.
Downsampled time domain processing helps reduce computational complexity in two main ways. First, all processing operations can be done at a lower sampling rate (i.e., less frequently). Accordingly, there is a savings in processor cycles which is linearly related to the downsampling factor. Second, without downsampling, the digital filters required in this application have fairly low cutoff frequencies and sharp transition bands, which require fairly high order, computationally accurate filters. Because the relative cutoff frequencies and transition bands increase with decreasing sampling rate, lower order filters can be used in a downsampled implementation. If filters are implemented as FIR filters, the filter length normally has a direct relation to the transition bandwidth. Additionally, when processing decimated signals, issues related to computational accuracy pertinent to IIR filter implementations are much less critical. As a result, downsampling may yield savings in computational complexity that grow as the sampling rate decreases. However, consideration must also be given to the overhead added by the analysis and synthesis filterbanks.
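The filter-order point can be checked with the Kaiser window design estimate: for a fixed transition band in hertz, the required FIR length is tied to the transition width relative to the sampling rate, so it roughly halves each time the rate halves. The 60 dB attenuation and 100 Hz transition band below are illustrative assumptions.

```python
from scipy import signal

for fs in (8000, 4000, 2000):
    # width is the transition bandwidth normalized to the Nyquist frequency.
    numtaps, beta = signal.kaiserord(ripple=60.0, width=100.0 / (fs / 2.0))
    print(fs, numtaps)   # the tap count roughly halves each time fs halves
```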
An exemplary implementation of decimation may be accomplished using quadrature mirror filters (QMF) as shown in
A more detailed example showing the QMF analysis element 140 and the QMF synthesis element 142 is illustrated in
The QMF analysis element 140 splits the input speech signal 84 into a low-frequency portion (i.e., out0) and a high-frequency portion (i.e., out1) which undergo respective low-frequency branch processing 150 and high-frequency branch processing 152 as shown in
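A hedged sketch of a two-channel QMF pair in the spirit of the QMF analysis element 140 and QMF synthesis element 142; a production implementation would use a purpose-designed prototype filter (the firwin prototype here yields only approximate reconstruction), and the branch processing itself is omitted.

```python
import numpy as np
from scipy import signal

h0 = signal.firwin(64, 0.5)             # low-pass prototype, cutoff at fs/4
h1 = h0 * (-1.0) ** np.arange(len(h0))  # mirror high-pass: h1[n] = (-1)^n h0[n]

def qmf_analysis(x):
    out0 = np.convolve(x, h0)[::2]      # low-frequency portion, decimated by 2
    out1 = np.convolve(x, h1)[::2]      # high-frequency portion, decimated by 2
    return out0, out1

def qmf_synthesis(out0, out1):
    up0 = np.zeros(2 * len(out0))
    up0[::2] = out0
    up1 = np.zeros(2 * len(out1))
    up1[::2] = out1
    # With g0 = 2*h0 and g1 = -2*h1 the aliasing terms cancel structurally;
    # residual distortion depends on the prototype design.
    return np.convolve(up0, 2 * h0) + np.convolve(up1, -2 * h1)

# Low-frequency branch processing 150 (the expansion itself) would operate on
# out0 at half the original rate; out1 would only be delay-compensated.
```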
It should be noted that both the low and high-frequency branch processing 150 and 152 may also include use of the low and high-frequency portions (out0 and out1, respectively) in level control operations. More specifically, inputs to the level control element 80 may be modified as shown in
Both the low and high-frequency portions represent critically downsampled data. Because filters can never have infinitely sharp transition bands and infinite stopband attenuation, the analysis process will always produce aliased signal components (i.e., original components in the higher frequency band will cause attenuated signal components in the low-frequency output). However, the framework shown in
Of course, when the low-frequency band from the QMF analysis element 140 is processed for low-frequency extension, the phase and magnitude responses in the two branches will not be the same. Adding energy to the low-frequency signal components will create spurious high-frequency components when signals are reconstructed in the QMF synthesis element 142. However, this is not a problem in practice as long as the responses can be matched for the QMF transition band frequency region, where the aliasing is the strongest. For low-frequency extension of speech signals, this is easily achieved, as the low-frequency region where energy is added is sufficiently far from a typical QMF transition band edge. In such a case, a magnitude of generated aliased high-frequency components is determined by a stopband attenuation in the QMF synthesis element 142.
If an original sampling rate of the input speech signal 84 is, for example, 8 kHz, applying QMF downsampling once enables running time-domain processing at a 4 kHz sampling rate with an effective frequency range between about 0 and 2 kHz. Considering the frequency ranges of the filters employed, it may be possible to process data decimated by an additional factor of two. Such an implementation may be achieved by wrapping the implementation described with respect to
Accordingly, in the case of dual downsampling as shown in
As stated above, embodiments of the present invention may be employed in numerous fixed and mobile devices. It should be noted, however, that when embodiments are implemented in mobile telephone networks, such embodiments may be implemented in either mobile terminals or network side devices. For example, embodiments of the present invention may be implemented in a mobile terminal with a digital signal processor (DSP) together with other speech enhancement algorithms. Meanwhile, embodiments implemented in a network side device may be used on decoded speech signals. As such, input may be received from terminals which transmit narrowband signals and signals having low frequency expansion may be provided to mobile terminals in communication with the network side device. In this regard, low frequency expansion services may be provided in conjunction with high frequency expansion services or any other service either to every customer or to particular customers.
Accordingly, blocks or steps of the flowcharts support combinations of means for performing the specified functions, combinations of steps for performing the specified functions and program instruction means for performing the specified functions. It will also be understood that one or more blocks or steps of the flowcharts, and combinations of blocks or steps in the flowcharts, can be implemented by special purpose hardware-based computer systems which perform the specified functions or steps, or combinations of special purpose hardware and computer instructions.
In this regard, one embodiment of a method of providing low frequency expansion of speech, as shown in
The above described functions may be carried out in many ways. For example, any suitable means for carrying out each of the functions described above may be employed to carry out embodiments of the invention. In one embodiment, all or a portion of the elements of the invention generally operate under control of a computer program product. The computer program product for performing the methods of embodiments of the invention includes a computer-readable storage medium, such as the non-volatile storage medium, and computer-readable program code portions, such as a series of computer instructions, embodied in the computer-readable storage medium.
Many modifications and other embodiments of the inventions set forth herein will come to mind to one skilled in the art to which these embodiments pertain having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the inventions are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.