The invention relates to the field of communications, and more particularly to techniques for generating clearer and more reliable speakerphone operation in a cellular telephone or other communications device.
Convenient and effective speakerphone operation has become a desirable feature in cellular handsets and other communications devices. Communities concerned with traffic safety have in some instances banned the handheld operation of cellular phones while driving. Handsets and other devices equipped with a speakerphone feature permit users to place the device in a resting position in a car or other location while still carrying out normal conversations and other telephone access.
However, equipping a cellular telephone with an effective speakerphone capability is not a trivial integration task. One practical difficulty is that many cellular telephones are small devices which contain both an earpiece speaker and an integrated microphone within a few inches of each other, to make the unit more compact. Therefore, duplex-type operation where both the speaker path and microphone path are active at the same time may generate unwanted feedback, since the output of the speaker leaks into the microphone via air and case vibration. This feedback problem worsens as speaker volume is increased, as it might be in a noisy car or room.
Echo canceling circuits are known which can be connected to the microphone path on a cellular phone or other device, and remove a portion of the feedback energy emanating from the speaker. Unfortunately, echo canceling circuits are currently only capable of about 35 dB of cancellation, and the energy from the speaker may be more than 35 dB greater than the energy delivered by the embedded microphone so that echo and feedback still occur, even when echo cancellation circuits are included.
One solution to the speakerphone problem is to attempt to physically isolate the speaker and microphone from each other in the handset. For instance, one may place the speaker used for speakerphone operation in a rear-facing part of the handset so that less sound impinges directly on the microphone from the speaker. However, this placement makes the sound harder to hear for a user from whom the speaker faces away, and some amount of speaker energy will still leak through the cellular or other case to the microphone.
Another solution to feedback is to prevent the speaker path and microphone path from operating at the same time. This simplex-type of operation makes direct feedback impossible but results in one-way communication only, which requires users at both ends to signal the end of their speech, and wait for a response. More effective and natural speakerphone operation is desirable. Other problems exist.
The invention overcoming these and other problems in the art relates in one regard to a system and method for speakerphone operation in a communications device, in which built-in intelligence simultaneously manages both the speaker path and the microphone path of the device to reduce unwanted echo and feedback while still preserving a perceived quality of conversational speech. In an embodiment of the invention, a communications device such as a cellular telephone handset or other device may incorporate dual voice activity detection circuits to simultaneously monitor the signal energy and other characteristics in both speaker and microphone paths, and award control to one or the other path based on dynamic thresholds or other adaptive or other criteria. In other embodiments, problems such as premature dropouts caused by greater than average background noise may be prevented by applying hangtime parameters which keep the speaker path open until a minimum interval has passed, before transferring control to the microphone path. The criteria applied to trigger a change in control from speaker path to microphone path and vice versa may also be adapted in embodiments of the invention, including to eliminate a lower threshold below which the speaker path switches out and passes control to the microphone path, automatically.
The invention will be described with reference to the accompanying drawings, in which like elements are referenced with like numbers, and in which:
FIGS. 2(A)-2(C) illustrate processing of inbound and outbound speech in different regards, according to an embodiment of the invention.
FIGS. 4(A) and 4(B) illustrate processing of inbound and outbound speech in different regards, according to an embodiment of the invention.
FIGS. 9(A) and 9(B) illustrate processing of inbound and outbound speech in different regards, according to an embodiment of the invention.
FIGS. 10(A) and 10(B) illustrate outbound and inbound path control including an interposed hangtime, according to an embodiment of the invention.
FIGS. 12(A) and 12(B) illustrate processing of inbound and outbound speech in different regards, according to an embodiment of the invention.
FIGS. 14(A) and 14(B) illustrate speaker path activation during noisy conditions, according to an embodiment of the invention.
The microphone 102 in the microphone path 128 may be connected to a microphone gain control 104, to boost or attenuate the output of microphone 102 as appropriate. The output of the microphone gain control 104 may be communicated to an echo canceller 106 to remove a portion of any feedback, including echo, leaking from speaker 120 to microphone 102. Echo canceller 106 may for example be implemented in hardware, software, firmware or a combination thereof. Echo canceller 106 may for instance be implemented using commercially available parts such as dedicated integrated circuits manufactured by Oki Semiconductor or others, or using software modules such as echo canceller modules available for digital signal processors such as the DSP 56000 family manufactured by Motorola Corp., digital signal processors made by Texas Instruments Inc., or others. In embodiments, the echo canceller 106 may incorporate or implement known echo cancellation algorithms, for instance algorithms related to or incorporated in International Telecommunications Union (ITU) standard G.165 or other cancellation algorithms or techniques. In embodiments, the echo canceller 106 may reduce the echo or other feedback by as much as 35 dB or more, but may typically not eliminate the full degree of feedback present in the signal generated by the microphone 102.
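By way of illustration only, the general idea of subtracting an adaptive estimate of the speaker signal from the microphone signal may be sketched as follows. The sketch uses a normalized least-mean-squares (NLMS) adaptive filter, a common textbook technique chosen here as an assumption; the description does not state which algorithm echo canceller 106 uses. The function names, the tap count, the step size and the simulated single-reflection acoustic path are all hypothetical.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the far-end echo from the mic signal.

    far_end: samples driving the speaker; mic: samples picked up by the microphone.
    Returns the residual (echo-reduced) microphone samples.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate
        # Normalized update: step size scaled by the energy of the reference history.
        norm = eps + sum(h * h for h in history)
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual


def simulate_echo(far_end, gain=0.5, delay=2):
    """Hypothetical acoustic path: one attenuated, delayed copy of the speaker signal."""
    return [gain * far_end[i - delay] if i >= delay else 0.0
            for i in range(len(far_end))]


def power(samples):
    """Mean squared amplitude of a block of samples."""
    return sum(s * s for s in samples) / len(samples)


# Demonstration with seeded pseudo-random "speech" and no near-end talker.
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = simulate_echo(far_end)
residual = nlms_echo_cancel(far_end, mic)
```

In this idealized, noise-free setting the residual decays toward zero once the filter has adapted. A real acoustic path is longer and time-varying and near-end speech disturbs adaptation, which is consistent with the observation above that cancellation is typically incomplete.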
The output of the echo canceller 106 may be communicated to a speech encoder 108, which compresses or otherwise processes speech input for purposes of wireless or other transmission. The speech encoder 108 may be implemented using known speech compression or other algorithms, for instance algorithms related to or incorporated in ITU standards such as ITU G.711, G.723, G.726, G.729, or other protocols. Those standards or protocols may incorporate or implement for example the Low-Delay Code-Excited Linear Prediction (LD-CELP) speech coding algorithm, which encodes 2.5 ms frames of digitized, telephone bandwidth speech or audio signals sampled at 8 kHz, or other digitizing or other techniques. Other speech compression/decompression (codec) algorithms, software or standards may be used. The speech encoder 108 may likewise be implemented in hardware, software, firmware or a combination thereof, including using programmable digital signal processors or other components.
After a user's speech input is encoded by the speech encoder 108, the encoded speech may be communicated to the modem transmit module 110. The modem transmit module 110 may prepare the encoded signal for wireless or other transmission via an antenna or other air or other interface, for instance generating wireless transmission in the 800/900 MHz, 1.9 GHz or other cellular, PCS or other frequency spectra for voice or other communications.
On the receiver side, a modem receive module 126 may likewise be coupled to a cellular antenna or other source of radio frequency (RF) or other wireless or other energy to capture, downconvert and/or demodulate wireless carrier signals. The modem receive module 126 may communicate the demodulated received signal to a speech decoder 124. The speech decoder 124 may in general perform the reverse type of operation from the speech encoder 108, for example to decompress far-end speech from a remote user of another cellular handset or other device. The output of speech decoder 124 may be communicated to the speaker gain control 122, providing amplification or attenuation of the decoded speech for driving the speaker 120, such as the earpiece speaker in a cellular handset or other transducer. The output of the speech decoder 124 may also be communicated to the echo canceller 106 to perform echo detection and cancellation processing.
In embodiments of the invention such as that illustrated in
The output of each of the inbound VAD 114 and the outbound VAD 118 may in turn be communicated to a duplex arbiter 116. Duplex arbiter 116 may also be implemented using hardware such as a microprocessor or digital signal processor, in software, firmware or a combination thereof to perform supervisory tasks to arbitrate and manage the activation of the microphone path 128, speaker path 130 and other resources to enhance speakerphone and other operation. The duplex arbiter 116 may, for instance, determine instances in time when the inbound (near-end, or handheld user of the communications device) speech energy is significant while the outbound (far-end, or remote user) speech energy is negligible so that the duplex arbiter 116 may activate the microphone path 128 to capture that local speech, while deactivating or muting the speaker path 130 since the far-end user is interpreted as not speaking or communicating.
Conversely, in instances when the inbound speech energy detected by the inbound VAD 114 is negligible while the outbound speech energy detected by the outbound VAD 118 is significant, the duplex arbiter 116 may activate the speaker path 130 while deactivating the microphone path 128, so that the far-end user's speech may be heard over the speaker 120.
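The two uncontested cases described above may be sketched as a small decision function. This is a minimal sketch under stated assumptions: the names Path and arbitrate_clear_cases are hypothetical, and the boolean inputs stand in for the outputs of inbound VAD 114 and outbound VAD 118. Cases in which both or neither detector reports significant energy are deliberately left undecided here.

```python
from enum import Enum


class Path(Enum):
    MICROPHONE = "microphone path 128"
    SPEAKER = "speaker path 130"


def arbitrate_clear_cases(inbound_active, outbound_active):
    """Return the path to activate, or None when further criteria are needed.

    inbound_active:  near-end speech energy judged significant (inbound VAD 114)
    outbound_active: far-end speech energy judged significant (outbound VAD 118)
    """
    if inbound_active and not outbound_active:
        return Path.MICROPHONE  # capture local speech, mute the speaker path
    if outbound_active and not inbound_active:
        return Path.SPEAKER     # play far-end speech, mute the microphone path
    return None                 # contested or silent interval: not decided here
```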
On the other hand, during those intervals of time in which both the inbound VAD 114 and outbound VAD 118 detect significant speech energy in their respective paths, the duplex arbiter 116 may apply selective criteria to decide which path to activate. As illustrated for instance in FIGS. 2(A)-2(C), intervals may occur when both the inbound VAD 114 (
As illustrated in
Operation of this type may permit seamless transitions between the near-end and far-end user's speech in conversation, and prevent artifacts such as channel lockouts. In embodiments, as illustrated, the duplex arbiter 116 may also communicate with a comfort noise generation and substitution module 112, likewise capable of being implemented in hardware, software, firmware or a combination thereof. The comfort noise generation and substitution module 112 may in turn also communicate with the microphone gain control 104 and the speaker gain control 122, to output white noise or other comparatively pleasant or innocuous sounds during path transitions, dead spots such as when both the microphone path 128 and speaker path 130 may be muted, or at other times. In other embodiments or under other conditions, the duplex arbiter 116 may award control to the microphone path 128 or the speaker path 130 under different fixed or dynamic criteria used for decision processing.
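As one hedged illustration of comfort noise substitution, a muted frame may be replaced by low-level white noise. The function names, the 160-sample frame length and the noise level below are assumptions made for the example and are not taken from the description of module 112.

```python
import random


def comfort_noise_frame(num_samples=160, rms_level=0.01, seed=None):
    """Generate one frame of low-level Gaussian white noise.

    num_samples and rms_level are illustrative defaults, not specified values.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, rms_level) for _ in range(num_samples)]


def fill_muted_frame(frame, path_muted, noise_rms=0.01, seed=None):
    """Substitute comfort noise when the path is muted; otherwise pass the frame through."""
    if path_muted:
        return comfort_noise_frame(len(frame), noise_rms, seed)
    return list(frame)
```

For example, fill_muted_frame(frame, path_muted=True) returns a noise frame of the same length as the input, so the listener hears a faint background rather than abrupt silence.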
In an embodiment illustrated in
Where:
ob_r0(n) = outbound speech energy for a frame n;
n = current speech frame;
β = an energy scalar; and
α = decay rate.
In step 310, the output of the speech encoder 108 may also be communicated to an inbound speech envelope generator 132, which may in embodiments be integrated with or interface to inbound VAD 114. Inbound speech envelope generator 132 may generate a moving envelope representing speech energy, such as a moving average or other representation of speech energy of the signal in the microphone path 128. Outbound speech envelope generator 134, which also may be integrated with or interface to outbound VAD 118, may similarly generate an envelope output based on the signal in the speaker path 130.
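One way such an envelope generator might be realized is as a moving average of per-frame energy. The sketch below is illustrative only: the class name SpeechEnvelope, the window length and the use of mean squared amplitude as the energy measure are assumptions, and the same class could serve for either the inbound or the outbound path.

```python
from collections import deque


class SpeechEnvelope:
    """Moving average of per-frame energy over the most recent frames."""

    def __init__(self, window_frames=4):
        # window_frames is an illustrative choice, not a specified value.
        self._energies = deque(maxlen=window_frames)

    def update(self, frame):
        """Add one frame of samples and return the current envelope value."""
        energy = sum(s * s for s in frame) / len(frame) if frame else 0.0
        self._energies.append(energy)
        return sum(self._energies) / len(self._energies)
```

A longer window smooths out short bursts at the cost of reacting more slowly to the onset of speech.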
In step 312, the resulting speech envelope may be compared to the current inbound break-in threshold (ib_break_in_thresh). If the envelope of the inbound speech exceeds that threshold, processing proceeds to step 314 where the duplex arbiter 116 may mute the speaker path 130 and activate or unmute the microphone path 128, thus allowing the near-end user's speech to be captured and communicated to the far-end user. If the envelope of the inbound speech does not exceed the inbound break-in threshold (ib_break_in_thresh), processing proceeds to step 316 where processing for the current frame of time may end, following which processing may repeat, proceed to other tasks or end.
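The comparison of steps 312 through 316 may be sketched as follows. The PathState type and the function name are hypothetical, and the threshold is simply passed in as a parameter because its computation is not reproduced in this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathState:
    microphone_on: bool
    speaker_on: bool


def inbound_break_in(state, ib_envelope, ib_break_in_thresh):
    """Apply the inbound break-in test for one frame and return the new state."""
    if ib_envelope > ib_break_in_thresh:                         # step 312
        return PathState(microphone_on=True, speaker_on=False)   # step 314
    return state                                                 # step 316: unchanged
```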
FIGS. 4(A) and 4(B) illustrate speaker samples and echo-cancelled microphone samples, respectively, generated according to the embodiment illustrated in
When encoded speech is choppy or contains large swings in amplitude or other artifacts, those inputs may in some cases cause rapid switching between microphone path 128 and speaker path 130, or other “race” or other undesirable conditions. In an embodiment of the invention illustrated in
As shown in
In step 812, an inbound break-in threshold (ib_break_in_threshold) and outbound break-in threshold (ob_break_in_threshold) may be generated, for instance according to the embodiment illustrated in
If the microphone path 128 is not activated, processing may proceed to step 822 where the microphone path 128 may be activated or unmuted, while the speaker path 130 may be deactivated or muted. After step 822, control may proceed to step 840 where processing for the current frame may end, following which processing may repeat, proceed to other tasks or end.
If the determination at step 818 is that the microphone path 128 is on, processing may proceed to step 820 where a determination may be made whether the outbound speech envelope (ob_env) is greater than the outbound break-in threshold (ob_break_in_threshold). If the outbound speech envelope (ob_env) is greater than the outbound break-in threshold (ob_break_in_threshold), processing may proceed to step 824 where a determination may be made whether the inbound hangtime (ib_hangtime) has expired. If the inbound hangtime (ib_hangtime) has not expired, processing may proceed to step 822 where again the microphone path 128 may be activated or unmuted, while the speaker path 130 may be deactivated or muted.
If at step 824 the inbound hangtime (ib_hangtime) has expired, processing may proceed to step 826 where an outbound hangtime (ob_hangtime) may be set to begin a hangtime period for the speaker path 130. The outbound hangtime (ob_hangtime) may for instance be set to a fixed amount of time, such as 4 seconds or another value according to implementation. In embodiments, the outbound hangtime may be computed or set on a dynamic basis, for instance as a function of prior inbound or outbound hangtimes, detected speech energy in the inbound or outbound paths or other variables. In step 828, the microphone path 128 may be deactivated or muted, while the speaker path 130 may be activated or unmuted, after which control may proceed to step 840 where processing for the current frame of time may end, following which processing may repeat, proceed to other tasks or end.
If at step 820 the outbound speech envelope (ob_env) is determined to not exceed the outbound break-in threshold (ob_break_in_threshold), processing may proceed to step 822 where again the microphone path 128 may be activated or unmuted, while the speaker path 130 may be deactivated or muted. Control may then also proceed to step 840 where processing for the current frame of time may end, following which processing may repeat, proceed to other tasks or end.
If at step 816 a determination is made that the speaker path 130 is on, processing may proceed to step 830 in which a determination may be made whether the inbound envelope (ib_envelope) exceeds the inbound break-in threshold (ib_break_in_threshold). If the inbound envelope (ib_envelope) does not exceed the inbound break-in threshold (ib_break_in_threshold), processing may proceed to step 832 where the speaker path 130 may be activated or unmuted while the microphone path 128 may be deactivated or muted. Following that step, control may then proceed to step 840 where processing for the current frame of time may end, following which processing may repeat, proceed to other tasks or end.
If at step 830 a determination is made that the inbound envelope (ib_envelope) exceeds the inbound break-in threshold (ib_break_in_threshold), processing may proceed to step 834 where a determination may be made whether the outbound hangtime (ob_hangtime) has expired. If the outbound hangtime (ob_hangtime) has not expired, processing may likewise proceed to step 832 where the speaker path 130 may be activated or unmuted while the microphone path 128 may be deactivated or muted.
If at step 834 a determination is made that the outbound hangtime (ob_hangtime) has expired, processing may proceed to step 836 where the inbound hangtime may be set to a fixed amount of time, such as 4 seconds or another value according to implementation. In embodiments, the inbound hangtime may be computed or set on a dynamic basis, for instance as a function of prior inbound or outbound hangtimes, detected speech energy in the inbound or outbound paths or other variables. Processing may then proceed to step 838, where the speaker path 130 may be deactivated or muted while the microphone path 128 may be activated or unmuted. Following that step, control may then proceed to step 840 where processing for the current frame of time may end, following which processing may repeat, proceed to other tasks or end.
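The per-frame decision flow of steps 816 through 838 may be summarized in a short sketch. The class name HangtimeArbiter, the use of a caller-supplied timestamp in seconds, and the ordering of the speaker-path test before the microphone-path test are assumptions made for illustration; the 4-second default follows the example value given above, and the thresholds are passed in rather than computed.

```python
from dataclasses import dataclass


@dataclass
class HangtimeArbiter:
    hangtime_s: float = 4.0      # fixed hangtime; 4 seconds is the example value
    speaker_on: bool = False
    microphone_on: bool = False
    ib_hang_expiry: float = 0.0  # time at which the inbound hangtime expires
    ob_hang_expiry: float = 0.0  # time at which the outbound hangtime expires

    def _select(self, microphone):
        self.microphone_on = microphone
        self.speaker_on = not microphone

    def step(self, now, ib_env, ob_env, ib_thresh, ob_thresh):
        """Process one frame at time `now` and return the active path name."""
        if self.speaker_on:                                          # step 816
            if ib_env > ib_thresh and now >= self.ob_hang_expiry:    # steps 830, 834
                self.ib_hang_expiry = now + self.hangtime_s          # step 836
                self._select(microphone=True)                        # step 838
            else:
                self._select(microphone=False)                       # step 832
        elif self.microphone_on:                                     # step 818
            if ob_env > ob_thresh and now >= self.ib_hang_expiry:    # steps 820, 824
                self.ob_hang_expiry = now + self.hangtime_s          # step 826
                self._select(microphone=False)                       # step 828
            else:
                self._select(microphone=True)                        # step 822
        else:
            self._select(microphone=True)                            # step 822
        return "speaker" if self.speaker_on else "microphone"        # step 840
```

In this sketch a path that has just gained control cannot be interrupted until its hangtime expires, which is what suppresses rapid back-and-forth switching on choppy input.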
In the embodiment of the invention illustrated in
In particularly noisy environments, such as for example in urban areas, when an automobile window may be open, during playback of a noisy voice message or at other times, the fricatives and other signal components may tend to trigger the speaker path 130 to be muted, even when still-intelligible speech is present. This may in one regard be due to the crossing of an outbound muting threshold ordinarily intended to switch the speaker path 130 off when the far-end user input has degraded into noise. In an embodiment of the invention illustrated in
As shown in that figure, processing may begin in step 1102. In step 1104, near-end samples from the microphone 102 may be processed by the speech encoder 108. In step 1106, outbound speech from the far-end user may be processed by speech decoder 124. In step 1108, the echo canceller 106 may receive the outputs of the speech encoder 108 and the speech decoder 124 to suppress echo and other feedback artifacts. In step 1110, the echo-cancelled inbound speech and the decoded outbound speech may be communicated to inbound speech envelope generator 132 and outbound speech envelope generator 134, respectively, to generate speech energy envelopes or other functions.
In step 1112, an inbound on threshold (ib_on_threshold) and outbound on threshold (ob_on_threshold) may be generated, for instance similarly to the embodiment illustrated in
In step 1116, a determination may be made whether the outbound envelope (ob_env) exceeds the outbound on threshold (ob_on_threshold). If the outbound envelope (ob_env) does not exceed the outbound on threshold (ob_on_threshold), processing may proceed to step 1118 where a determination may be made whether the inbound envelope (ib_env) exceeds the inbound on threshold (ib_on_threshold). If the inbound envelope (ib_env) exceeds the inbound on threshold, processing may proceed to step 1120 where a determination may be made whether the speaker path 130 is locked, that is, currently has control of the communications channel, such as a wireless cellular or other connection. If the speaker path 130 is locked, the state of the microphone path 128 and speaker path 130 may remain unchanged from the start of processing at step 1102 and control may proceed to step 1128 where processing for the current frame may end, following which processing may repeat, proceed to other tasks or end.
If the determination at step 1120 is that the speaker path 130 is not locked, processing may proceed to step 1122 where the speaker path 130 may be deactivated or muted, while the microphone path 128 may be activated or unmuted. Processing then may likewise proceed to step 1128 to repeat, proceed to other tasks or end.
If the determination at step 1118 is that the inbound envelope (ib_env) does not exceed the inbound on threshold (ib_on_threshold), processing may proceed to step 1128 to repeat, proceed to other tasks or end.
If the determination at step 1116 is that the outbound envelope (ob_env) exceeds the outbound on threshold (ob_on_threshold), processing may proceed to step 1124 where a determination may be made whether the microphone path 128 is locked. If the microphone path 128 is not locked, control may proceed to step 1126 where the speaker path 130 may be activated or unmuted while the microphone path 128 may be deactivated or muted. Processing then may proceed to step 1128 to repeat, proceed to other tasks or end. Likewise, if the determination at step 1124 is that the microphone path 128 is locked, the state of the microphone path 128 and speaker path 130 may remain unchanged from the start of processing at step 1102, and control may proceed to step 1128 to repeat, proceed to other tasks or end.
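The branch structure of steps 1116 through 1128 may be sketched as a single function. The PathState type and the function name are hypothetical, and the lock flags are taken as inputs because this sketch does not model how a path becomes locked or unlocked.

```python
from typing import NamedTuple


class PathState(NamedTuple):
    speaker_on: bool
    microphone_on: bool


def on_threshold_step(state, ib_env, ob_env, ib_on_threshold, ob_on_threshold,
                      speaker_locked, microphone_locked):
    """Apply one frame of the on-threshold decision flow and return the new state."""
    if ob_env > ob_on_threshold:                                     # step 1116
        if not microphone_locked:                                    # step 1124
            return PathState(speaker_on=True, microphone_on=False)   # step 1126
        return state                                                 # unchanged
    if ib_env > ib_on_threshold:                                     # step 1118
        if not speaker_locked:                                       # step 1120
            return PathState(speaker_on=False, microphone_on=True)   # step 1122
    return state                                                     # step 1128
```

Note that no branch mutes the speaker path merely because the outbound envelope is low; the speaker path is switched out only when the inbound envelope exceeds its on threshold and the speaker path is not locked.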
The foregoing description of the system and method for speakerphone operation according to the invention is illustrative, and variations in configuration and implementation will occur to persons skilled in the art. For instance, while the invention has generally been described as containing discrete voice detectors in the form of inbound VAD 114 and outbound VAD 118, in embodiments the functions or parts of the functions of the two voice activity detectors could be combined in one part, or in one software module. More than two paths could also be managed according to the invention. Similarly, while the invention has been described with respect to an inbound path including an echo canceller 106, in embodiments other types of noise suppressors could be implemented, or in embodiments that component could be omitted or modified.
It has likewise been noted that the communications device in which the invention may operate may be or include a cellular telephone, but could consist of other communications platforms such as wired or wireless telephones, two-way radios, base stations for wireless telephones, network-enabled wireless communications devices such as 802.11a, 802.11b, 802.11g or other short or long-range telephony or other units, or other equipment as well.
Yet further, while the invention has generally been described in terms of a speakerphone architecture in which the electronic intelligence governing the speakerphone operation is integral with the cellular telephone or other communications device, in other embodiments the intelligence may be embedded or shared in an attachment coupled to the communications device. For instance, the intelligence may be embedded or shared in a detachable battery, a headphone device, a tabletop or other fixed or non-wearable speakerphone unit, or in other accessories or parts. For example, the intelligence may enable a speakerphone operation through a car audio system coupled to a cellular telephone.
In the case of a detachable or coupleable unit which adds or enhances speakerphone capability in a communications device, the intelligence embedded in the add-on device may communicate with the electronics of the communications device through interfaces such as a serial port such as an RS-232, a universal serial bus (USB) or a universal asynchronous receiver/transmitter (UART) connection, an infrared data (IrDA) port, a radio frequency link, or other serial, parallel or other data ports or other connections. The scope of the invention is accordingly intended to be limited only by the following claims.