Some contemporary communication systems use the Internet for transmitting voice calls; the underlying technology is generally referred to as voice over internet protocol, or VoIP. Gateways are devices often used in VoIP systems to bridge the traffic across domains. For example, a business using an internet protocol-based private branch exchange (IP-PBX) system usually has one or more VoIP gateways to connect the PBX to the public internet, and also may have some ‘PSTN gateways’ for connecting to the traditional public switched telephone network. The gateways are responsible for relaying control signals as well as for relaying the media for each communication link.
Currently, gateways are tested for their quality with respect to relaying audio streams by having a human tester listen in on VoIP calls. In general, any echo and interference (distortion) are noted by the tester. However, such a testing process is somewhat subjective, is not scalable to testing large numbers of devices, and can be quite expensive.
This Summary is provided to introduce a selection of representative concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used in any way that would limit the scope of the claimed subject matter.
Briefly, various aspects of the subject matter described herein are directed towards a technology by which quality of an audio channel between a caller and a callee is evaluated, including by outputting audio generated from audio files at the callee and caller, and detected at the callee and caller, respectively. The relative timing of outputting the audio and detecting sounds at the caller and callee is analyzed to provide results indicative of the quality of the audio channel. The audio channel may include an IP-PBX device, and/or a gateway, such as a VoIP gateway or a PSTN gateway.
In one example implementation, the audio channel between a caller mechanism and a callee mechanism includes a device under test. An analyzer receives timestamps from the caller mechanism and the callee mechanism during a calling session, including a first timestamp corresponding to when the callee mechanism initially provides audio to the caller mechanism, a second timestamp corresponding to when the caller mechanism initially detects sound, a third timestamp corresponding to when the caller mechanism initially provides audio to the callee mechanism, and a fourth timestamp corresponding to when the callee mechanism initially detects sound. The analyzer determines that the audio channel is operating correctly with respect to not having interference or echo when the first timestamp is before the second timestamp, the second timestamp is before the third timestamp, and the third timestamp is before the fourth timestamp. Alternatively the analyzer determines that the audio channel has interference when the fourth timestamp is before the first timestamp or the second timestamp is before the first timestamp, or the audio channel has echo when the fourth timestamp is before the third timestamp and after the first timestamp. When the audio includes speech, a speech recognizer recognizes the speech and a confidence level corresponding to accuracy of speech recognition may also be used to establish the quality of the audio channel. Speech recognition may also be used to detect echo, e.g., when the output speech is recognized as matching input speech. One or more audio files may be randomly selected, and/or the time or times corresponding to generating audio from one or more of the audio files may be random, such as to vary the testing.
Other advantages may become apparent from the following detailed description when taken in conjunction with the drawings.
The present invention is illustrated by way of example and not limited in the accompanying figures in which like reference numerals indicate similar elements and in which:
Various aspects of the technology described herein are generally directed towards using an audio file (e.g., a “.wav” file) or set of audio files to evaluate the quality of VoIP or PSTN gateways, and/or an IP-PBX device. In general, the audio files correspond to a set of spoken words that can be recognized by speech recognition systems. As described below, the caller and the callee mechanisms ordinarily use different audio files, each comprising distinct audio such as speech, to facilitate delayed echo detection based in part on expected differences in the files, e.g., via speech recognition. However, certain echo detection can also be performed via files that do not necessarily include speech, in which event it is possible to use alternative audio files that comprise tones for testing, possibly including subsonic and/or supersonic frequencies, which may be the same files at the caller and callee.
In one example implementation, there is described a gateway and/or IP-PBX testing configuration in which a call is placed from a calling mechanism on a testing computing device back to a callee mechanism on the same computing device. As can be readily appreciated, any number of intermediary devices and/or networks may be present between the caller and the callee, including a PBX device, the PSTN, one or more gateways, an intranet, the public Internet, and so forth. However, these intermediaries introduce external variables, and ordinarily are thus avoided to the extent possible, except possibly when it is desired to evaluate a device's operation with one or more specific intermediaries being present, for example. Further, while the callee mechanism can be on the same computing device as the calling mechanism, separate computer systems for each may also be used, as long as the clocks on the separate computing systems are synchronized.
Thus, as will be understood, the technology described herein is not limited to any type of test configuration, nor to any particular type of gateway and/or particular type (e.g., PBX-type) of telephone systems, but applies to any configuration and/or telephone-related devices that are present in an audio channel. As such, the present invention is not limited to any particular embodiments, aspects, concepts, structures, functionalities or examples described herein. Rather, any of the embodiments, aspects, concepts, structures, functionalities or examples described herein are non-limiting, and the present invention may be used in various ways that provide benefits and advantages in computing, telephony and/or testing in general.
Turning to
The exemplified computing device 102 also includes a callee mechanism comprising callee telephone logic 114 and a callee telephone 116. The callee telephone 116 likewise may be an internal component or an external component, but in any event, is controlled by the callee telephone logic 114 to output appropriate audio signals such as generated from a selected audio file of the set of audio files 108. The callee telephone logic 114 also outputs other data including timestamps to the analyzer 110, as also described below.
The caller telephone may be coupled to the callee telephone in essentially any way, with any number of intermediary devices, including a device or combination of devices under test. In
To evaluate the quality of a gateway (
For speech recognition purposes, the caller telephone logic 104 is associated with one automatic speech recognizer 126, while the callee telephone logic 114 is associated with another automatic speech recognizer 128 (although it is feasible to have a single speech recognizer multiplexed between the caller and callee as needed). The automatic speech recognizers 126 and 128 assume the roles of a human speaker and listener, to automate gateway and PBX device testing, and thereby lower the testing cost for VoIP deployment or the like. However, because automated speech recognition can introduce errors, additional baseline measures are also established and provided.
An aspect of the testing is to quantitatively determine the audio channel quality for VoIP calls. More particularly, when VoIP calls are routed through VoIP or PSTN gateways, these intermediate devices often introduce echoes or random noise interference in the audio channel. The testing described herein detects such quality disturbances.
To this end, the caller logic 104 and callee logic 114 execute a test scenario and record timestamps for significant events. The analyzer 110 interprets these timestamps and generates a report 120 indicating the occurrence of echoes, noise interferences and the overall quality of the recognized speech.
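As a minimal sketch of how an analyzer might interpret the four timestamps, the following Python applies the ordering rules stated in the Summary: the channel is treated as operating correctly when the timestamps occur in the order first, second, third, fourth; as having interference when the fourth or the second timestamp precedes the first; and as having echo when the fourth timestamp falls after the first but before the third. The names `CallTimestamps` and `classify_channel`, the string labels, the order in which the rules are checked, and the "indeterminate" fallback for orderings the description does not address are illustrative assumptions rather than part of the described implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallTimestamps:
    """Four events from one calling session (hypothetical container)."""
    ts1: float  # callee initially provides audio to the caller
    ts2: float  # caller initially detects sound
    ts3: float  # caller initially provides audio to the callee
    ts4: float  # callee initially detects sound


def classify_channel(t: CallTimestamps) -> str:
    """Classify one session from the relative order of its timestamps."""
    # Sound detected before any audio was played: interference.
    if t.ts4 < t.ts1 or t.ts2 < t.ts1:
        return "interference"
    # Callee hears sound after its own playback began but before the
    # caller has played anything: echo.
    if t.ts1 < t.ts4 < t.ts3:
        return "echo"
    # Expected causal order: no interference or echo observed.
    if t.ts1 < t.ts2 < t.ts3 < t.ts4:
        return "ok"
    # Ordering not covered by the stated rules (assumption: flag it).
    return "indeterminate"


if __name__ == "__main__":
    print(classify_channel(CallTimestamps(1.0, 1.2, 3.0, 3.2)))
    print(classify_channel(CallTimestamps(1.0, 1.2, 3.0, 2.0)))
```

Because the check uses only the relative order of events, it depends on the caller and callee clocks being the same clock or being synchronized.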
By way of example, one suitable test scenario comprises example steps as set forth in the flow diagram of
At step 310 the callee plays an audio file (e.g., a .wav file) over the audio channel established at steps 302 and 304. This step corresponds to timestamp 1 (TS1) in
At step 312, the caller's speech recognizer detects the speech (or other audio) from the callee's audio file. This detection corresponds to time TS2 in
More particularly, TS3 is not always at the same interval following TS2, whether a random time interval is used or some preset time variation pattern. This ensures that, given enough repetitions, the timing of responding with the audio playback to the callee is not a factor in the test results. Random file selection in conjunction with random timing of playing back the file provides the least chance of a coincidence that would factor into the test results.
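The random file selection and random response timing described above can be sketched as follows. This is an illustration only: the function name `plan_caller_response`, the delay bounds, and the example file names are hypothetical, and the description does not specify a particular distribution or range for the delay.

```python
import random
from typing import Optional, Sequence, Tuple


def plan_caller_response(
    audio_files: Sequence[str],
    min_delay_s: float = 0.5,   # assumed lower bound, not from the source
    max_delay_s: float = 3.0,   # assumed upper bound, not from the source
    rng: Optional[random.Random] = None,
) -> Tuple[str, float]:
    """Pick which file the caller plays and how long it waits after TS2."""
    rng = rng or random.Random()
    chosen_file = rng.choice(list(audio_files))
    # Vary the gap between detecting the callee's audio (TS2) and
    # starting the caller's playback (TS3).
    delay_s = rng.uniform(min_delay_s, max_delay_s)
    return chosen_file, delay_s


if __name__ == "__main__":
    files = ["greeting.wav", "directions.wav", "numbers.wav"]  # hypothetical
    print(plan_caller_response(files, rng=random.Random(7)))
```

Passing a seeded `random.Random` makes a test run repeatable while still varying the file and delay from one call to the next.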
At step 316, the callee detects this speech, which in normal operation (
Step 316 also represents the caller's speech being recognized at the callee, and an evaluation made as to the confidence level that the speech was recognized correctly. For example, the caller can notify the callee as to what audio file was selected, by which the callee can access known good recognition text to compare against the actually recognized text. In general, the confidence level is an indication of how accurately the callee's speech recognizer was able to recognize the speech.
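One way to compare the known good recognition text against the actually recognized text is a word-level similarity score, sketched below. This is an assumption for illustration: the description does not state how the confidence level is computed, and a speech recognizer may report its own confidence value instead. The function name `transcript_match_score` is hypothetical; the comparison uses `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher


def transcript_match_score(expected_text: str, recognized_text: str) -> float:
    """Return a 0.0-1.0 word-level similarity between two transcripts."""
    expected_words = expected_text.lower().split()
    recognized_words = recognized_text.lower().split()
    # SequenceMatcher accepts any sequences of hashable items, so the
    # comparison is made on whole words rather than on characters.
    return SequenceMatcher(None, expected_words, recognized_words).ratio()


if __name__ == "__main__":
    known_good = "please hold for the next available agent"
    recognized = "please hold for the next agent"
    print(round(transcript_match_score(known_good, recognized), 3))
```

A score near 1.0 suggests the audio arrived intact, while a low score suggests distortion in the channel or a recognition error, which is why baseline measures remain useful.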
It should be noted that although not explicitly shown in
With the parameters, the analyzer can use the causal ordering of these parameter events and/or the outcome of the speech recognizer to measure the quality of the audio channels, and hence the device under test. For example, in correct execution represented in
In addition to timestamp comparison, echo can also be detected by the speech recognizer. For example, if at TS4 the callee recognizes audio that it played at TS1 (and the audio files are different), this implies that the callee is hearing itself rather than hearing the caller, which is also an echo. In this manner, the speech recognizer's outcome helps in detecting echoes, and more particularly in detecting delayed echoes that cannot be detected using the timestamp comparison.
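A recognizer-based echo check of this kind might be sketched as follows, assuming the callee knows the expected text of its own audio file and of the caller's file. The function names, the word-level similarity measure, and the 0.8 threshold are all illustrative assumptions; the description only requires that the recognized speech be identified as matching the callee's own output rather than the caller's.

```python
from difflib import SequenceMatcher


def _similarity(a: str, b: str) -> float:
    """Word-level similarity between two transcripts (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def heard_own_audio(
    recognized_text: str,
    own_file_text: str,
    remote_file_text: str,
    threshold: float = 0.8,  # assumed cutoff, not from the source
) -> bool:
    """Return True when the callee appears to have recognized its own file."""
    own_score = _similarity(recognized_text, own_file_text)
    remote_score = _similarity(recognized_text, remote_file_text)
    # Echo is suspected only if the transcript clearly matches the audio
    # this side played and matches it better than the other side's audio.
    return own_score >= threshold and own_score > remote_score


if __name__ == "__main__":
    callee_text = "your call is being transferred"
    caller_text = "please confirm the meeting time"
    print(heard_own_audio("your call is being transferred", callee_text, caller_text))
```

This check only works when the caller and callee files contain distinct speech; with identical files the two scores would be equal and the function would return False.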
In this manner, the test configurations (
As can be readily appreciated, the above description is mainly for illustrative and example purposes. Those skilled in the art can easily generalize the invention to a large scale test operation, e.g., with multiple callers making calls to multiple callees, such as via multiple computing devices. Further, the various technological aspects and concepts described herein may be applied in an environment in which a mixture of PSTN, cellular and/or VoIP calls are simultaneously involved.
Exemplary Operating Environment
The invention is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to: personal computers, server computers, hand-held or laptop devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and so forth, which perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in local and/or remote computer storage media including memory storage devices.
With reference to
The computer 810 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer 810 and includes both volatile and nonvolatile media, and removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer 810. Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
The system memory 830 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 831 and random access memory (RAM) 832. A basic input/output system 833 (BIOS), containing the basic routines that help to transfer information between elements within computer 810, such as during start-up, is typically stored in ROM 831. RAM 832 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 820. By way of example, and not limitation,
The computer 810 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only,
The drives and their associated computer storage media, described above and illustrated in
The computer 810 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 880. The remote computer 880 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 810, although only a memory storage device 881 has been illustrated in
When used in a LAN networking environment, the computer 810 is connected to the LAN 871 through a network interface or adapter 870. When used in a WAN networking environment, the computer 810 typically includes a modem 872 or other means for establishing communications over the WAN 873, such as the Internet. The modem 872, which may be internal or external, may be connected to the system bus 821 via the user input interface 860 or other appropriate mechanism. A wireless networking component 874 such as comprising an interface and antenna may be coupled through a suitable device such as an access point or peer computer to a WAN or LAN. In a networked environment, program modules depicted relative to the computer 810, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation,
An auxiliary subsystem 899 (e.g., for auxiliary display of content) may be connected via the user interface 860 to allow data such as program content, system status and event notifications to be provided to the user, even if the main portions of the computer system are in a low power state. The auxiliary subsystem 899 may be connected to the modem 872 and/or network interface 870 to allow communication between these systems while the main processing unit 820 is in a low power state.
While the invention is susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the invention to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention.
Number | Date | Country | |
---|---|---|---|
20080177534 A1 | Jul 2008 | US |