The present invention relates to methods of improving the quality of telephonic or radio duplex communications, in particular to communications having marginal comprehension quality due to background noise at the speaker or listener, signal degradation or distortions, and combinations thereof.
Speech coding is widely deployed in modern communication devices, although its evolution was originally driven by the development of mobile phones. Because bandwidth is limited, speech coding is a key element of “over air transmissions”. However, encoded data sent through the air by radio frequencies is exposed to a sensitive transmission link that is very likely to be affected by errors. Such errors may corrupt the transmitted data, and due to the lack of redundancy, it may be difficult if not impossible to reconstruct the speech signal. Due to the interaction of speech coding and transmission, errors in a mobile transmission can cause heavy distortions that sound quite different from traditional “analog” distortions, making it difficult for the receiver to understand the information. In fact, the speaker is frequently unaware that the listener is having trouble hearing or understanding such speech, and may be constantly inquiring, “Do you hear me now?”
Further, the portability of devices, such as phones and radios, that permit “over air transmissions” naturally leads to their use in environments that are substantially noisier than those typical of traditional PSTN (Public Switched Telephone Network) or POTS (Plain Old Telephone System) usage. Such noisy environments include public social environments (restaurants, bars), automobile environments, public transportation environments (subways, trains, airports) and other common situations (shopping locations and city/traffic noise).
However, real-world use of cell phones is problematic even if system performance is flawless. Portability in effect guarantees use in marginal environments, with attendant problems: background noise picked up at the speaker, background noise at the listener, a tendency to speak loudly and the social consequences thereof, confounding of these effects with system noise, and low confidence of full comprehension.
It is therefore a first object of the present invention to provide a method for the user to select their speaking environment and voice volume in proportion to the listener's requirements, as affected by the listener's own listening environment as well as by degradation from the system quality of service.
Another object of the invention is to provide one or more speakers with the ability to know whether their voice is being heard at a high enough quality for the listener to comprehend it.
In the present invention, the first object is achieved by measuring background noise at the speaker, calculating the signal to noise ratio and indicating it to the speaker.
Other objects are achieved by measuring background noise at the receiver and transmitting the results of a signal to noise ratio analysis to at least one of the sender and receiver.
Another object is achieved by communicating instructions to one or more parties to the conversation that indicate whether the other party is likely to comprehend their communications, and that suggest or signal behavior that would either improve the quality of the received communication such that it can be comprehended, or indicate that the voice may be lowered without detriment to the communication quality, thus avoiding the social consequences of speaking louder than necessary.
Accordingly, the inventive voice based communication methods and device disclosed herein have a signal processing module operative to receive, analyze and display the speech quality as received or transmitted via a metric such as a signal to noise ratio. The speaker has the option of adjusting their voice level and/or reducing background noise to improve the other party's quality of reception. Likewise, in preferred embodiments at least one speaker can also receive a signal indicating the voice or sound quality as received, which in comparison to the quality metric for transmission indicates whether the other party is likely to comprehend the speech, or whether the poor quality is due to transmission related factors independent of the speaking or listening environment. In the latter case, the metric would indicate that transmission quality is unlikely to improve and that the connection should be repeated by other means, including at a different time and place.
A further aspect of the invention is notifying the user of the results. The method of notification may be by visual indicators, such as a digital or analog meter, text messages and the like. Other methods of notification include vibration, sounds and patterns of the same.
The above and other objects, effects, features, and advantages of the present invention will become more apparent from the following description of the embodiments thereof taken in conjunction with the accompanying drawings.
As used in this disclosure, VQP is a figure of merit indicating deviations from perfect speech. Some threshold value of VQP correlates with the listener's ability to readily and comfortably comprehend the speaker. Numerous methods have been developed to analyze voice signals to quantify their quality as it relates to the final perception of the sound generated therefrom. Such methods, including the associated hardware and software, are in fact used in the configuration, testing and maintenance of telecommunication systems. The VQP may be determined simply by determining the ratio of signal (voice) to background noise, or SNR for signal to noise ratio, and optionally takes into account the background noise at the listener as well as at the speaker. Mathematically speaking, the signal to noise ratio as received is equal to the spoken signal divided by the sum of the noise due to the speaker's background, the listener's background and the system noise, it being understood that the SNR is a transient property having temporal variation. The VQP may also take into account noise and distortion related to the quality of transmission of the signal, or deploy a combination of parameters. More sophisticated methods of determining a VQP have been developed primarily to measure the performance of a telecommunication system, or of the analog to digital conversion codecs used therein, as disclosed in the following U.S. patents, which are incorporated herein by reference: U.S. Pat. Nos. 6,628,453; 6,330,428; 6,418,196; 6,609,092; 6,275,797; 5,987,320.
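The received signal to noise ratio described above can be illustrated with a minimal sketch. It assumes that the signal and the three noise contributions are available as linear power values in the same units; the function and parameter names are hypothetical and are not part of the disclosure.

```python
import math


def received_snr(signal_power, speaker_noise, listener_noise, system_noise):
    """Return the as-received SNR as a linear power ratio.

    Assumption: all four arguments are powers in the same linear units,
    and the three noise contributions simply add.
    """
    total_noise = speaker_noise + listener_noise + system_noise
    if total_noise <= 0:
        raise ValueError("total noise power must be positive")
    return signal_power / total_noise


def to_decibels(power_ratio):
    """Convert a linear power ratio to decibels."""
    return 10.0 * math.log10(power_ratio)


# Illustrative values only: a unit-power voice signal with three noise sources.
snr = received_snr(signal_power=1.0, speaker_noise=0.05,
                   listener_noise=0.03, system_noise=0.02)
snr_db = to_decibels(snr)
```

Because the SNR varies over time, such a calculation would in practice be repeated over short successive intervals of the signal.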
In accordance with the present invention,
However, transmitting both voice signals and characteristics of the speech quality as transmitted, along with other signals depends on the specific embodiment of the invention. As will be further discussed with respect to
A currently favored method of determining a VQP is the PESQ method. PESQ is described in an IEEE publication entitled “Perceptual Evaluation of Speech Quality (PESQ) - A New Method for Speech Quality Assessment of Telephone Networks and Codecs”, by A. W. Rix et al., ICASSP, 7-11 May 2001, which is incorporated herein by reference. PESQ is not used on real speech, but rather on known test waveforms (or test waveforms plus real noise); a second point of novelty, more fully described with respect to
However, real world communications suffer from background noise picked up by microphone 110 (e.g. other conversations in a restaurant, air conditioner and fan noise in a car, or traffic noises), noise and distortion in transmission, as well as the difficulty the listener has in discerning the sound generated by speaker 280 due to background noise in their environment. Below some level, VQPmin., a listener is simply unable to consistently understand what is being said. This value of VQPmin. is indicated by the arrow pointing to a minimum speaker threshold on the y-axis of the graph in
It should be appreciated from the foregoing discussion that, as VQP-Rx is the critical parameter for the listener, greater degradation of signal quality and/or noise from the communication system, represented by a lower value of R, requires a higher value of VQP-Tx. However, VQP-Tx cannot be increased beyond a certain level, which is reached at some level of R, defined as Rcrit. A vertical line extending downward from Rcrit. defines region 302 to the right of this line. To the extent that the analysis characterizes the total quality parameter (TQP) as within region 302, communication is not possible, as VQP-Tx is less than VQPmin.
When the TQP is within region 305, defined as lying above diagonal line 307, VQP-Rx is within the comprehensible range. Although regions 301 and 302 overlap, it should be appreciated that the sub-portion of region 301 that does not fall within 302, denoted as 303, is significant, as it is possible to obtain an acceptable VQP-Rx, that is to move into region 305, by first increasing VQP-Tx to some value above VQPmin., that is placing the TQP beyond region 304, that is, crossing line 307 as one increases VQP-Tx.
Thus, in a preferred embodiment, the user is instructed on the options to reach region 305, based on either having the speaker go to a quieter environment, that is decreasing noise, speaking louder to increase signal, or having the listener go to a quieter environment. Therefore, in the most preferred embodiment, both the speaker and listener receive notification and recommendations to improve the TQP such that VQP-Rx is within the comprehensible range.
However, it should be recognized that at some point, increasing VQP-Tx by the user becomes fruitless, as the noise or degradation to VQP by the communication system cannot be overcome, such as when one is on the edge of a cell phone coverage area, or transmission is blocked or distorted by buildings or geography. In one zone, the worst case, the ratio of signal sound to background noise arising from the system is so low that no increase in the signal volume improves comprehension, as the signal is in effect distorted by the system.
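The choice among these outcomes can be sketched as a simple decision routine. The disclosure does not specify how VQP-Rx depends on VQP-Tx and R, so the sketch below assumes, purely for illustration, a multiplicative model VQP-Rx = R x VQP-Tx, under which Rcrit. equals VQPmin. divided by the largest achievable VQP-Tx. The function name, the parameter `vqp_tx_max` and the returned messages are hypothetical.

```python
def recommend(vqp_tx, vqp_rx, vqp_min, vqp_tx_max):
    """Suggest an action from transmitted and received quality values.

    Assumption (not from the disclosure): VQP-Rx = R * VQP-Tx, so R can be
    estimated from one measurement of each and R_crit = vqp_min / vqp_tx_max.
    """
    if vqp_tx <= 0 or vqp_tx_max <= 0:
        raise ValueError("vqp_tx and vqp_tx_max must be positive")
    if vqp_rx >= vqp_min:
        # Received quality is already within the comprehensible range.
        return "comprehensible"
    r = vqp_rx / vqp_tx            # estimated system quality factor R
    r_crit = vqp_min / vqp_tx_max  # below this R, no achievable VQP-Tx suffices
    if r < r_crit:
        # System degradation cannot be overcome by the speaker.
        return "retry by other means"
    return "speak louder or move to a quieter environment"


# Illustrative values on an arbitrary quality scale.
ok = recommend(vqp_tx=4.0, vqp_rx=3.5, vqp_min=3.0, vqp_tx_max=4.5)
fixable = recommend(vqp_tx=3.0, vqp_rx=2.4, vqp_min=3.0, vqp_tx_max=4.5)
hopeless = recommend(vqp_tx=4.0, vqp_rx=2.0, vqp_min=3.0, vqp_tx_max=4.5)
```

A different relationship between R and the received quality would change only the two lines that estimate `r` and `r_crit`; the three-way outcome would remain the same.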
In yet another aspect of the invention, algorithms or methods could be used continuously on the real speech to interpolate between the test waveform packets. As to commercial acceptance, one issue (among no doubt many) is the typical delay and frequency of update available, which must be within seconds to be of practical use to the consumer. This will depend on the sample waveform length, the amount of time needed to send the sample waveform in an interlaced format, and the time needed to process the PESQ and/or other algorithms. A further advantage of the invention in its various embodiments is that it alerts users to the need and opportunity to change equipment, or to speech or system/network quality problems unlikely to be overcome by speaking louder or by changing the speaking or listening environment. For example, the VQP information transmitted to either the speaking or receiving party may be used to select a quality of service level from the communication carrier, and hence bandwidth allocation, as taught in U.S. Pat. No. 6,418,196. Alternatively, the user may choose to hold or complete a call using POTS as opposed to VoIP or a mobile communication system.
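The dependence of the update delay on sample waveform length, interlaced sending time and processing time noted above can be illustrated with a rough estimate. The model is an assumption made only for this sketch: a test waveform interlaced into a fraction of the channel time is taken to need its own duration divided by that fraction to be sent, after which a fixed processing time is added. All names are hypothetical.

```python
def update_interval_seconds(waveform_s, interlace_fraction, processing_s):
    """Estimate the seconds between quality updates.

    Assumption: the test waveform occupies `interlace_fraction` of the
    channel time, so sending it takes waveform_s / interlace_fraction.
    """
    if not 0.0 < interlace_fraction <= 1.0:
        raise ValueError("interlace_fraction must be in (0, 1]")
    send_time_s = waveform_s / interlace_fraction
    return send_time_s + processing_s


# Illustrative values: a 2 s waveform using a quarter of the channel time,
# plus 0.5 s of processing.
interval = update_interval_seconds(2.0, 0.25, 0.5)
```

Under this assumed model, shortening the waveform or giving it a larger share of the channel are the two ways to bring the update interval down to a few seconds.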
In
In
In
It should be further understood that any of the diagrams in
In contrast,
Independent of displaying the information to the user, as described with respect to
Alternatively, the TQP/VQP can be determined either continuously or discontinuously, but transmitted discontinuously by temporal sample coding when the speech is not being transmitted. Such methods of determining the VQP may include or utilize the interlacing of non-intrusive test waveforms, such as test speech patterns used in the PESQ protocol.
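One way to find the intervals in which speech is not being transmitted is a simple energy threshold on short frames of the signal. The sketch below is an illustrative assumption, not the method of the disclosure: it marks frames whose mean squared amplitude falls below a threshold as candidates for carrying quality data or test waveforms. The function names and the threshold value are hypothetical.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def silent_frames(samples, frame_len, energy_threshold):
    """Indices of frames whose mean energy is below the threshold."""
    energies = frame_energies(samples, frame_len)
    return [i for i, e in enumerate(energies) if e < energy_threshold]


# Illustrative signal: silence, loud speech, then faint background noise.
signal = [0.0] * 4 + [1.0] * 4 + [0.01] * 4
free_slots = silent_frames(signal, frame_len=4, energy_threshold=0.01)
```

In this example the first and third frames fall below the threshold, so they would be the ones available for discontinuous transmission of the TQP/VQP.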
For example, as shown in
The system of
Further, part or all of the analysis of the signal information to determine the TQP/VQP can occur in either or both telecommunication devices, or reside in part or in whole within the communication network. In other embodiments, the user may be able to choose among a number of methods to determine the TQP/VQP, depending on how frequently they want to be updated during the conversation. Alternatively, the system may select an algorithm most appropriate to the temporal nature and variation in the noise characteristics.
In various embodiments, background noise levels are detected using the microphone, an extra sensor that is coupled to the device for sensing environmental noise, or by indirect measures such as the position of a volume control knob, or an indication of a location of use of the device. It should be understood that the determination and transmission of TQP/VQP between the speaker and listener is optionally duplex, and is not limited to the methodology illustrated with
The above invention can be implemented in numerous ways, which for example include providing the integrated function disclosed herein with the microprocessor of a mobile telephone. Alternatively, a user may deploy a module that receives the signal through the auxiliary output jack or split line connection of any phone and merely notifies the speaker of the current, as-transmitted VQP. In other embodiments, the invention may be implemented as a software routine running on a general-purpose computer to analyze the TQP/VQP of communications using the voice over internet protocol (VoIP) from point to point, in which the signal is digitally transmitted over a network, such as the internet, between one or more users via the computer.
While the invention has been described in connection with a preferred embodiment, it is not intended to limit the scope of the invention to the particular form set forth, but on the contrary, it is intended to cover such alternatives, modifications, and equivalents as may be within the spirit and scope of the invention as defined by the appended claims.
The present application claims priority to the U.S. Provisional Patent Application for a “Telecommunication Device and Method”, having Ser. No. 60/592,179, filed on Jul. 28, 2004, which is incorporated herein by reference.
Number | Date | Country
---|---|---
60592179 | Jul 2004 | US