1. Field of the Invention
This invention relates in general to methods and systems that transmit and receive audio and, more particularly, to methods and systems that rely on multiband excitation vocoders to do so.
2. Description of the Related Art
In recent years, portable electronic devices, such as cellular telephones and personal digital assistants, have become commonplace. Many of these devices include a vocoder, such as a multiband excitation (MBE) vocoder. An MBE vocoder is a device that converts analog speech waveforms from various individuals into digital signals. These digital signals are then typically transmitted to another portable electronic device, where they are decoded and broadcast through a speaker to a user of the receiving portable electronic device.
Many MBE vocoders, however, have a limited encoding range. For example, most MBE vocoders are only able to encode speech waveforms that have pitch values between 80 Hz and 500 Hz. The range is limited because the vocoder is provided with a relatively small number of bits to cover the whole spectrum of pitch values generated by the different types of user voices (only a small number of bits are provided to preserve bandwidth).
Generally, the limited range is suitable for encoding the many different types of user voices. The pitch values of certain voice types, however, may exceed the encoding range of the vocoder. For example, the pitch values of the voice of a woman or a small child may surpass this range, particularly if the woman or small child is in an excited state. That is, the pitch inflections of certain individuals may exceed an allowable pitch range. In this instance, the vocoder cannot properly encode the speech waveforms, which will result in a degradation of voice quality.
The present invention concerns a method for improving voice quality of a vocoder. The method includes the steps of monitoring a pitch of a voice signal; at a transmitting unit, when the pitch of the voice signal reaches a predetermined threshold, shifting the pitch of the voice signal to at least a portion of a predetermined range; transmitting the pitch-shifted voice signal to a receiving unit; and at the receiving unit, reshifting the pitch-shifted voice signal to a level that compensates for the shifting of the pitch of the voice signal at the transmitting unit.
As an example, the voice signal can comprise a plurality of time-based frames. In one arrangement, the step of monitoring the pitch includes the steps of estimating the pitch of the voice signal for at least a portion of the time-based frames of the voice signal and, based on the estimating step, generating a pitch contour of the voice signal. In another arrangement, the voice signal can comprise voiced and unvoiced portions. Additionally, the step of generating the pitch contour can include the step of interpolating the pitch contour for the unvoiced portions of the voice signal.
The method can also include the steps of, in the transmitting unit, detecting speech on the voice signal and, when speech is detected, determining whether the speech comprises voiced and unvoiced portions. Also, if no speech is detected on the voice signal, the method can further include the step of inserting silence frames into the voice signal. The method can also include the step of converting at least a portion of the silence frames to pitch frames. The pitch frames can signal the receiving unit that the pitch-shifted voice signal was pitch shifted. The pitch frames can also inform the receiving unit of the magnitude by which the pitch-shifted voice signal was shifted. As an alternative step, the pitch frames can be added to the voice signal.
The pitch of the voice signal can be shifted by either increasing or decreasing the pitch of the voice signal. The method can further include the steps of encoding the pitch-shifted voice signal at the transmitting unit, decoding the pitch-shifted voice signal at the receiving unit and detecting a voiced or an unvoiced condition on the voice signal. As an example, the predetermined threshold can be a compression window, and the predetermined range can be between the maximum encoding pitch level and the minimum encoding pitch level of the vocoder. As another example, the pitch of the voice signal can be shifted from a first level to the portion of the predetermined range. The pitch-shifted voice signal can be reshifted at the receiving unit to a second level that is at least substantially equal to the first level.
The present invention also concerns a system for improving voice quality of a vocoder. The system includes a pitch analysis section for monitoring a pitch of a voice signal, a pitch shifter coupled to the pitch analysis section, an encoding section coupled to the pitch shifter and a transmission section coupled to the encoding section. When the pitch analysis section determines that the pitch of the voice signal has reached a predetermined threshold, the pitch shifter shifts the pitch of the voice signal to at least a portion of a predetermined range. In addition, the encoding section encodes the voice signal and provides pitch-shifting information in the voice signal, and the transmission section transmits the pitch-shifted voice signal to a receiving unit. The receiving unit uses the pitch-shifting information to reshift the pitch-shifted voice signal to a level that compensates for the pitch shifting performed by the pitch shifter. The system can also include suitable software and/or circuitry to carry out the processes described above.
The present invention also concerns a machine readable storage, having stored thereon a computer program having a plurality of code sections executable by a portable computing device. The code sections cause the portable computing device to perform the steps of monitoring a pitch of a voice signal; at a transmitting unit, when the pitch of the voice signal reaches a predetermined threshold, shifting the pitch of the voice signal to at least a portion of a predetermined range; and transmitting the pitch-shifted voice signal to a receiving unit. At the receiving unit, the pitch-shifted voice signal is reshifted to a level that compensates for the shifting of the pitch of the voice signal at the transmitting unit. The code sections can also cause the portable computing device to perform the steps described above.
The features of the present invention, which are believed to be novel, are set forth with particularity in the appended claims. The invention, together with further objects and advantages thereof, may best be understood by reference to the following description, taken in conjunction with the accompanying drawings, in the several figures of which like reference numerals identify like elements, and in which:
While the specification concludes with claims defining the features of the invention that are regarded as novel, it is believed that the invention will be better understood from a consideration of the following description in conjunction with the drawing figures, in which like reference numerals are carried forward.
As required, detailed embodiments of the present invention are disclosed herein; however, it is to be understood that the disclosed embodiments are merely exemplary of the invention, which can be embodied in various forms. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ the present invention in virtually any appropriately detailed structure. Further, the terms and phrases used herein are not intended to be limiting but rather to provide an understandable description of the invention.
The terms a or an, as used herein, are defined as one or more than one. The term plurality, as used herein, is defined as two or more than two. The term another, as used herein, is defined as at least a second or more. The terms including and/or having, as used herein, are defined as comprising (i.e., open language). The term coupled, as used herein, is defined as connected, although not necessarily directly, and not necessarily mechanically. The terms program, software application, and the like as used herein, are defined as a sequence of instructions designed for execution on a computer system. A program, computer program, or software application may include a subroutine, a function, a procedure, an object method, an object implementation, an executable application, an applet, a servlet, a source code, an object code, a shared library/dynamic load library and/or other sequence of instructions designed for execution on a computer system.
This invention presents a method and system for improving voice quality of a vocoder. For example, a transmitting unit can transmit a voice signal to a receiving unit. In the transmitting unit, a pitch analysis section can monitor the pitch of the voice signal, and when the pitch reaches a predetermined threshold, a pitch shifter can shift the pitch of the voice signal to at least a portion of a predetermined range. The predetermined threshold can be a compression window. The pitch-shifted voice signal can be transmitted to the receiving unit. In the receiving unit, a decoding section can reshift the pitch-shifted voice signal to compensate for the pitch shifting that occurred in the transmitting unit.
Referring to
It should also be noted that the transmitting unit 110 is not limited to transmitting signals and that the receiving unit 112 is not limited to receiving signals. These terms are merely meant to distinguish the transmitting unit 110 from the receiving unit 112. As such, the transmitting unit 110 can receive any suitable type of communications signals. Similarly, the receiving unit 112 can transmit any suitable type of communications signals. As an example, the transmitting unit 110 and the receiving unit 112 can be mobile communication units, such as cellular telephones, personal digital assistants, two-way radios, etc. Of course, the transmitting unit 110 can be any electronic device that is capable of at least encoding speech, and the receiving unit 112 can be any electronic device that is capable of at least decoding speech.
The transmitting unit 110 and the receiving unit 112 can also be referred to as portable computing devices, both of which can be loaded with a computer program having a plurality of code sections. These code sections can be executable by the portable computing devices 110, 112 for causing the portable computing devices 110, 112 to perform the inventive methods that will be described below.
In one arrangement, the transmitting unit 110 can include a pitch analysis section 118, a pitch shifter 120, an encoding section 122 and a transmission section 124. The pitch analysis section 118 can be coupled to the pitch shifter 120, which can be coupled to the encoding section 122. Additionally, the encoding section 122 can be coupled to the transmission section 124. The receiving unit 112 can include a receiving section 126 and a decoding section 128 in which the receiving section 126 can be coupled to the decoding section 128.
Briefly, the pitch analysis section 118 can monitor the pitch of a voice signal in the transmitting unit 110. A voice signal may or may not contain speech. When the pitch analysis section 118 determines that the pitch of the voice signal has reached a predetermined threshold, the pitch shifter 120 can shift the pitch of the voice signal to at least a portion of a predetermined range. The encoding section 122 can encode the voice signal, and the transmission section 124 can transmit the voice signal to the receiving unit 112.
At the receiving unit 112, the receiving section 126 can receive the voice signal. Additionally, the decoding section 128 can reshift the pitch-shifted voice signal to a level that compensates for the pitch shifting performed by the pitch shifter 120. The decoding section 128 can also decode the voice signal. Those of skill in the art will appreciate, however, that the transmitting unit 110 and the receiving unit 112 can include other suitable components for performing many other functions.
Referring to
The voiced/unvoiced detector 134 can be coupled to the pitch contour block 135 and can also have a signaling path to the pitch contour block 135. The speech activity detector 130 can also have a signaling path to the voiced/unvoiced detector 134. In one arrangement, the voiced/unvoiced detector 134 can detect voiced and unvoiced portions of speech that are on the voice signal, and the pitch contour block 135, based on the pitch estimation, can determine a pitch contour for the voice signal.
The pitch contour block 135 can be coupled to the range test control block 136, and the range test control block 136 can be coupled to the pitch shifter 120. The range test control block 136 can also have a signaling path to the pitch shifter 120. In one embodiment of the invention, the range test control block 136 can determine when the pitch contour of the voice signal reaches a predetermined threshold. When the pitch contour does so, the range test control block 136 can signal the pitch shifter 120. As will be explained later, the pitch shifter 120 can shift the pitch of the voice signal into at least a portion of a predetermined range.
The encoding section 122 can include a vocoder 138, a frame type detector 140 and a silent frame block 142. The pitch shifter 120 can be coupled to the vocoder 138, and the vocoder 138 can be coupled to the frame type detector 140. The vocoder 138 can encode the voice signal, such as by generating frames. The frame type detector 140 can be coupled to the silent frame block 142, and the frame type detector 140 can also have a signaling path to the silent frame block 142. As an example, the frame type detector 140 can detect the frames that the vocoder 138 generates and can selectively signal the silent frame block 142 based on the presence of certain frames. The range test control block 136 can also have a signaling path to the silent frame block 142 to permit the range test control block 136 to signal the silent frame block 142 when the range test control block 136 determines that the pitch contour of the voice signal has reached the predetermined threshold.
In one arrangement, when signaled by the range test control block 136 and the frame type detector 140, the silent frame block 142 can convert silent frames in the voice signal to pitch frames. Alternatively, when the silent frame block 142 is signaled, the silent frame block 142 can add pitch frames to the voice signal. These processes will be explained further below.
The transmission section 124 can include a transmitter 144 and an antenna 146, in which the transmitter 144 is coupled to the antenna 146. The silent frame block 142 can also be coupled to the transmitter 144. The transmission section 124, as those of skill in the art will appreciate, can transmit the voice signal to another communication device, such as the receiving unit 112.
Turning to the receiving unit 112, the receiving section 126 can include a receiver 148 and an antenna 150, in which the receiver 148 is coupled to the antenna 150. The antenna 150 can capture any voice signals transmitted from the transmitting unit 110, and the receiver 148 can process the voice signal in accordance with well-known principles. In one arrangement, the decoding section 128 can include a frame type detector 152, a pitch value block 154, a vocoder 156 and a pitch shifter 158. The frame type detector 152 can detect the type of frames that are in the incoming voice signal and can be coupled to the receiver 148 and the pitch value block 154. The frame type detector 152 can also have a signaling path to the pitch value block 154. The pitch value block 154, when signaled by the frame type detector 152, can determine the magnitude of the pitch shifting that occurred in the transmitting unit 110. The pitch value block 154 can also be coupled to the vocoder 156 and can include a signaling path to the pitch shifter 158.
The vocoder 156 can be coupled to the pitch shifter 158 and can decode the pitch-shifted voice signal. When signaled with the pitch-shifting information by the pitch value block 154, the pitch shifter 158 can reshift the pitch of the voice signal to compensate for the pitch shifting that occurred in the transmitting unit 110. The pitch shifter 158 can also output the voice signal to any other suitable components in the receiving unit 112.
Referring to
At step 310, the method 300 can start. At step 312, a pitch of a voice signal can be monitored. One way to monitor the pitch of the voice signal is shown in steps 314–324. For example, at decision block 314, in a transmitting unit, it can be determined whether speech is present on the voice signal. If speech is not present, then the method 300 can resume at step 312. If speech is present, at step 316, the pitch of the voice signal can be estimated for at least a portion of the time-based frames of which the voice signal is comprised. At decision block 318, it can be determined whether the speech on the voice signal is comprised of a voiced portion. If it is, a pitch contour can be generated for the voice signal based on the pitch estimating step 316, as shown at step 320. If unvoiced portions are present in the speech, then a pitch contour for the unvoiced portions of the voice signal can be generated by interpolation, as shown at step 322. At decision block 324, it can then be determined whether the generated pitch contour of the voice signal has reached a predetermined threshold.
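The contour-generation part of steps 316 to 322 can be illustrated with a minimal sketch. It assumes a hypothetical convention in which each frame carries either a pitch estimate in Hz (voiced) or `None` (unvoiced), and it assumes linear interpolation, which the description does not specify. The function name `build_pitch_contour` is illustrative only.

```python
from typing import List, Optional


def build_pitch_contour(frame_pitch_hz: List[Optional[float]]) -> List[float]:
    """Return a per-frame pitch contour.

    Voiced frames carry an estimated pitch in Hz; unvoiced frames are None
    (an assumed convention). Unvoiced gaps are filled by linear interpolation
    between the nearest voiced frames; gaps at either end are held at the
    nearest voiced value.
    """
    voiced = [i for i, p in enumerate(frame_pitch_hz) if p is not None]
    if not voiced:
        return []  # no voiced frame, so there is nothing to interpolate from

    contour: List[float] = []
    for i, pitch in enumerate(frame_pitch_hz):
        if pitch is not None:
            contour.append(float(pitch))
            continue
        before = [j for j in voiced if j < i]
        after = [j for j in voiced if j > i]
        if not before:
            contour.append(float(frame_pitch_hz[after[0]]))
        elif not after:
            contour.append(float(frame_pitch_hz[before[-1]]))
        else:
            j0, j1 = before[-1], after[0]
            p0, p1 = frame_pitch_hz[j0], frame_pitch_hz[j1]
            contour.append(p0 + (p1 - p0) * (i - j0) / (j1 - j0))
    return contour


estimates = [None, 200.0, None, None, 260.0, None]
print(build_pitch_contour(estimates))
```

Holding the end gaps at the nearest voiced value is one possible choice; an implementation could equally leave those frames undefined until a voiced frame arrives.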
For example, referring to
The pitch estimating block 132 (see
The pitch estimating block 132 (see
Referring to
Using the pitch estimate 500, the pitch contour block 135 can generate a pitch contour 510 (see
The range test control block 136 can determine when a pitch contour of a voice signal reaches a predetermined threshold. Determining when a pitch contour reaches a predetermined threshold can also be referred to as determining when the pitch itself reaches the predetermined threshold. Referring to
In this example, the maximum encoding pitch level 820 of the vocoder 138 can be 500 Hz, and the minimum encoding pitch level 830 of the vocoder 138 can be 80 Hz. It is understood, however, that the above values are merely examples, as the vocoder 138 can have any other suitable maximum and minimum encoding pitch levels. In any event, for this example, it can be seen that the pitch contour 510 has exceeded the maximum encoding pitch level 820, which can lead to degradation in voice quality. This result may be caused by, for example, the speech of a woman or child with high pitch.
As an example, the predetermined threshold can be a compression window 840, a range of frequencies where compression of the pitch of a voice signal may occur. In this particular example, the compression window 840 can have a range from 250 Hz to 750 Hz. In accordance with an embodiment of the inventive arrangements, when the pitch contour 510 reaches the compression window 840, the range test control block 136 can determine that the pitch has reached the predetermined threshold. Of course, other values can be used for the compression window 840.
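The range test can be sketched as follows, using the example values given above (an 80 Hz to 500 Hz encoding range and a 250 Hz to 750 Hz compression window). The sketch assumes that a contour value "reaches" the window when it lies inside it; the function names are hypothetical and are not taken from the description.

```python
from typing import List, Tuple

ENCODING_RANGE_HZ: Tuple[float, float] = (80.0, 500.0)       # example vocoder limits from the text
COMPRESSION_WINDOW_HZ: Tuple[float, float] = (250.0, 750.0)  # example window from the text


def frames_in_window(contour_hz: List[float],
                     window: Tuple[float, float] = COMPRESSION_WINDOW_HZ) -> List[int]:
    """Indices of frames whose pitch lies inside the compression window (assumed test)."""
    low, high = window
    return [i for i, p in enumerate(contour_hz) if low <= p <= high]


def frames_outside_encoding_range(contour_hz: List[float],
                                  limits: Tuple[float, float] = ENCODING_RANGE_HZ) -> List[int]:
    """Indices of frames the vocoder could not encode without pitch shifting."""
    low, high = limits
    return [i for i, p in enumerate(contour_hz) if p < low or p > high]


contour = [180.0, 240.0, 310.0, 475.0, 520.0]
print(frames_in_window(contour))               # frames that would trigger the pitch shifter
print(frames_outside_encoding_range(contour))  # frames beyond the encoding range
```

Because the window begins below the maximum encoding pitch level, the test can fire before the contour actually leaves the encoding range.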
In one arrangement, the range test control block 136 (see
Referring back to the method 300 of
For example, referring once again to
To shift the pitch of the voice signal 400 (and hence the pitch contour 510), the pitch shifter 120 can use any suitable compression algorithm. One particular example of a mapping function compression table 900 that the pitch shifter 120 can utilize to shift the pitch is shown in
Referring to
Continuing with the example, the range test control block 136 can determine that the pitch contour 510 has a pitch value of about 475 Hz (see frame 20 in
When the range test control block 136 checks the pitch contour 510 at frame 50 of
It must be noted that the description above is merely one example of how to do pitch shifting. Those of skill in the art will appreciate that there are many different ways to modify the pitch of a voice signal. Moreover, it must be stressed that pitch shifting a voice signal is not limited to decreasing the pitch; that is, the pitch of a voice signal may also be increased in accordance with the example above to help keep the voice signal within the encoding range of a vocoder. It is also understood that the compression shown above is not limited to being performed in a linear fashion, as non-linear pitch shifting can be employed in accordance with the inventive arrangements. Once the voice signal 400 has been shifted, the vocoder 138 can encode the pitch-shifted voice signal 400. The process of encoding a voice signal is well known in the art, and a description here is not necessary. At this point, the voice signal 400 may be considered an audio signal, although it will continue to be referred to as a voice signal for purposes of clarity.
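As one hedged illustration of linear compression, the sketch below maps the example compression window (250 Hz to 750 Hz) onto the span between the window's lower edge and the example maximum encoding pitch level (500 Hz). This is an assumed mapping chosen for illustration and is not the mapping function compression table 900; the function name and the clipping of values above the window are likewise assumptions.

```python
from typing import Tuple


def compress_pitch(pitch_hz: float,
                   window: Tuple[float, float] = (250.0, 750.0),
                   max_encoding_hz: float = 500.0) -> float:
    """Linearly map [w_lo, w_hi] onto [w_lo, max_encoding_hz] (assumed mapping).

    Pitches at or below the window's lower edge pass through unchanged;
    pitches above the window are clipped to its upper edge before mapping.
    """
    w_lo, w_hi = window
    if pitch_hz <= w_lo:
        return pitch_hz
    clipped = min(pitch_hz, w_hi)
    scale = (max_encoding_hz - w_lo) / (w_hi - w_lo)
    return w_lo + (clipped - w_lo) * scale


for hz in (200.0, 475.0, 750.0):
    print(hz, "->", compress_pitch(hz))
```

A non-linear variant would replace the single `scale` factor with a curve or a lookup table, and an upward shift for very low voices would mirror the same idea at the minimum encoding pitch level.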
Referring back to the method 300 of
For example, referring to
As noted earlier, when the range test control block 136 determines that the pitch of the voice signal 400 has reached the predetermined threshold, the range test control block 136 can also signal the silent frame block 142. Based on this signaling, the silent frame block 142 can determine the amount of pitch shifting to be performed by the pitch shifter 120. This signaling can also be received from the pitch shifter 120, if so desired.
After receiving these signals, the silent frame block 142 can, for example, convert one or more of the silent frames in the voice signal 400 to pitch frames. Alternatively, the silent frame block 142 can add one or more pitch frames to the voice signal, leaving the silent frames in place. The pitch frames can include pitch-shifting information, such as data that can inform the receiving unit 112 that the incoming voice signal 400 has been pitch shifted. The data can also inform the receiving unit 112 of the magnitude of the pitch shifting that was performed in the transmitting unit 110. Once the pitch frames have been inserted in the voice signal 400, the transmitter 144 can transmit the voice signal 400 through the antenna 146 to the receiving unit 112.
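The two options described above, converting a silent frame or adding a pitch frame, can be sketched with a hypothetical frame record. The `Frame` class, its `kind` labels and the `shift_ratio` field are assumptions made for illustration; the description does not define a frame layout or how the shift magnitude is encoded.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Frame:
    kind: str                            # "voice", "silent" or "pitch" (assumed labels)
    shift_ratio: Optional[float] = None  # shift magnitude, only set on "pitch" frames


def convert_silent_to_pitch(frames: List[Frame], shift_ratio: float) -> List[Frame]:
    """Replace the first silent frame with a pitch frame carrying the shift magnitude."""
    out = list(frames)
    for i, frame in enumerate(out):
        if frame.kind == "silent":
            out[i] = Frame(kind="pitch", shift_ratio=shift_ratio)
            break
    return out


def add_pitch_frame(frames: List[Frame], shift_ratio: float) -> List[Frame]:
    """Alternative: prepend a pitch frame and leave the silent frames in place."""
    return [Frame(kind="pitch", shift_ratio=shift_ratio)] + list(frames)


stream = [Frame("voice"), Frame("silent"), Frame("silent")]
print([f.kind for f in convert_silent_to_pitch(stream, 0.5)])
print([f.kind for f in add_pitch_frame(stream, 0.5)])
```

Converting an existing silent frame keeps the frame count unchanged, whereas adding a frame lengthens the stream by one frame.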
Sending the pitch-shifting information in the fashion described above can minimize any interruption to the voice signal 400 without seriously affecting the amount of data that must be transmitted. Even so, the invention is not limited in this regard, as the pitch-shifting information can be transmitted to a receiving unit at any other suitable time. In addition, other scenarios for inserting the pitch-shifting information into the voice signal 400 are within contemplation of the inventive arrangements.
Referring once again to the method 300 of
As an example, referring to
The vocoder 156 can decode the incoming voice signal 400. Because the voice signal 400 can remain pitch-shifted at this point, the pitch of the voice signal 400 can be within the decoding parameters of the vocoder 156. As a result, the vocoder 156 can efficiently decode the voice signal 400. Once the voice signal 400 is decoded, the pitch shifter 158—because it is signaled with the pitch-shifting information from the pitch value block 154—can reshift the pitch of the voice signal 400 to compensate for the pitch shifting that occurred in the transmitting unit 110.
As an example, the pitch shifter 158 can reshift the pitch of the voice signal 400 to a second level, and the second level can be at least substantially equal to the first level from which the pitch was originally shifted. For purposes of the invention, the phrase “substantially equal to” can include exact equality or even slight or moderate deviations therefrom. Of course, the invention is not limited in this regard, as the pitch shifter 158 can reshift the pitch of the voice signal 400 to any suitable lower or even higher pitch value. Following pitch shifting, the voice signal 400 can be transferred to any other suitable components in the receiving unit 112.
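The receiver-side reshifting can be sketched under the assumption that a pitch frame carries the scaling ratio applied at the transmitting unit and that decoded voice frames can be summarized by a pitch value in Hz. The tuple layout and the function name `reshift` are hypothetical, and a real pitch shifter would operate on the decoded audio rather than on a single number per frame.

```python
import math
from typing import List, Optional, Tuple

Frame = Tuple[str, Optional[float]]  # (kind, value): assumed frame layout


def reshift(frames: List[Frame]) -> List[float]:
    """Undo a transmitter-side pitch scaling using the ratio carried in a pitch frame.

    "pitch" frames carry the ratio applied at the transmitter; "voice" frames
    carry a decoded pitch in Hz. Voice frames seen before any pitch frame are
    treated as unshifted (ratio 1.0). Other frame kinds are ignored.
    """
    ratio = 1.0
    restored: List[float] = []
    for kind, value in frames:
        if kind == "pitch":
            ratio = value
        elif kind == "voice":
            restored.append(value / ratio)
    return restored


original_hz = 600.0
ratio = 0.75
received = [("pitch", ratio), ("voice", original_hz * ratio), ("silent", None)]
restored = reshift(received)
print(math.isclose(restored[0], original_hz, rel_tol=1e-9))
```

Comparing with a tolerance rather than exact equality mirrors the "substantially equal" language above.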
While the preferred embodiments of the invention have been illustrated and described, it will be clear that the invention is not so limited. Numerous modifications, changes, variations, substitutions and equivalents will occur to those skilled in the art without departing from the spirit and scope of the present invention as defined by the appended claims.
Number | Date | Country | |
---|---|---|---|
20060025990 A1 | Feb 2006 | US |