At times, it is necessary to evaluate the voice quality of a communication link, such as a Voice over Internet Protocol (VoIP) communication link, or a cellular network communication link.
Traditionally, the voice quality of a communication link has been tested by establishing a call over the communication link, and then playing a reference speech record at a remote end of the communication link while recording a copy of the speech record at a local end of the communication link. The reference speech record is then played at the local end of the communication link, and a copy of the speech record is recorded at the remote end of the communication link. Finally, each of the recorded speech records is compared to the reference speech record to evaluate its voice quality; and the voice quality of one or both of the speech records is used to characterize the voice quality of the communication link.
Typically, synchronization between the local and remote ends of the communication link is required before each file play/record (FPR) process. Without synchronization, there is a high probability that either the beginning or the end of a speech record may not be recorded (e.g., because playback begins too early or ends too late, or because recording begins too late or ends too early). If part of a speech record is missed, or is recorded at the wrong time, comparison of the recorded speech record to a reference speech record will result in an erroneous indication of poor voice quality.
Of note, the time needed to synchronize the two ends of a communication link can add significant overhead to a voice quality test. This is especially so when synchronization is undertaken each time a speech record is played, and for each direction in which the speech record is played (which is typically the case).
In one embodiment, a method comprises 1) buffering an audio record received over a communication link; 2) aligning a data window with a portion of a buffer that contains the buffered audio record; and 3) comparing a portion of the buffered audio record, to which the data window is aligned, with a portion of a reference audio record. If the portions of the buffered and reference audio records match, the buffered and reference audio records are synchronized in accord with a current position of the data window, and an audio quality of the buffered audio record is evaluated by comparing the synchronized audio records. If the portions of the buffered and reference audio records do not match, a location of the data window is incremented with respect to the buffer; and a comparison of A) the portion of the buffered audio record to which the data window is aligned, to B) the portion of the reference audio record, is repeated.
In another embodiment, a computer program comprises 1) code to initiate buffering of an audio record received over a communication link; 2) code to align a data window with a portion of a buffer that contains the buffered audio record; and 3) code to compare a portion of the buffered audio record, to which the data window is aligned, with a portion of a reference audio record. The computer program further comprises code to, if the portions of the buffered and reference audio records match, 1) synchronize the buffered and reference audio records in accord with a current position of the data window, and 2) evaluate an audio quality of the buffered audio record by comparing the synchronized audio records. The computer program also comprises code to, if the portions of the buffered and reference audio records do not match, 1) increment a location of the data window with respect to the buffer, and 2) repeat the comparison of A) the portion of the buffered audio record to which the data window is aligned, to B) the portion of the reference audio record.
In yet another embodiment, a system comprises an interface to receive and buffer an audio record. The audio record is received over a communication link to which the interface is attached. The system further comprises a processing system to 1) align a data window with a portion of a buffer that contains the buffered audio record, and 2) compare a portion of the buffered audio record, to which the data window is aligned, with a portion of a reference audio record. If the portions of the buffered and reference audio records match, the processing system 1) synchronizes the buffered and reference audio records in accord with a current position of the data window, and 2) evaluates an audio quality of the buffered audio record by comparing the synchronized audio records. If the portions of the buffered and reference audio records do not match, the processing system 1) increments a location of the data window with respect to the buffer, and 2) repeats the comparison of A) the portion of the buffered audio record to which the data window is aligned, to B) the portion of the reference audio record.
Other embodiments are also disclosed.
Illustrative embodiments of the invention are illustrated in the drawings, in which:
In accord with the method 200, the voice quality of the communication link 100 is characterized by first establishing a call between the local and remote ends 102, 104 of the communication link 100. See blocks 202 and 212 of
Finally, each of the speech files that is recorded during execution of the method 200 is compared to a corresponding reference speech file to evaluate its voice quality; and the voice quality of one or more individual speech files is used to characterize the voice quality of the communication link.
As previously mentioned, the synchronization steps 204, 208, 214, 218 of the method 200 can add significant overhead to a voice quality test. And, even if synchronization is successful, FPR timing can be jeopardized due to a network glitch.
As an alternative to the method 200,
As shown in
After aligning the data window 504 with a portion of the buffer 500, the portion of a buffered audio record to which the data window 504 is aligned is compared to a portion 506 of a reference audio record 502. See block 406 of
Upon incrementing the location of the data window 504 to the end of the buffer 500 (block 412), and upon failing to match any portion of the buffered audio record to the portion 506 of the reference audio record 502, an error condition may be signaled (at block 416).
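By way of illustration only, the window search of the method 400 might be sketched as follows. The sketch assumes that audio records are held as sequences of numeric samples; the names find_alignment, portions_match and AlignmentError, as well as the step and threshold parameters, are hypothetical and are not taken from the disclosure.

```python
from typing import Sequence


class AlignmentError(Exception):
    """Signals that the data window reached the end of the buffer without a match."""


def portions_match(window: Sequence[float], reference: Sequence[float],
                   threshold: float) -> bool:
    # Assumed matching rule: no sample pair may differ by more than `threshold`.
    return all(abs(w - r) <= threshold for w, r in zip(window, reference))


def find_alignment(buffer: Sequence[float], reference_portion: Sequence[float],
                   step: int = 1, threshold: float = 0.0) -> int:
    """Return the buffer offset at which the data window matches the reference portion."""
    window_len = len(reference_portion)
    if window_len == 0 or step < 1:
        raise ValueError("reference portion must be non-empty and step must be >= 1")
    offset = 0
    # Slide the window through the buffer, staying within the buffer's limits.
    while offset + window_len <= len(buffer):
        window = buffer[offset:offset + window_len]
        if portions_match(window, reference_portion, threshold):
            return offset  # the records can be synchronized at this position
        offset += step     # increment the location of the data window
    raise AlignmentError("no portion of the buffered record matched the reference")
```

In this sketch, the returned offset plays the role of the current position of the data window, and the raised exception corresponds to the error condition that may be signaled when the window reaches the end of the buffer.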
In one embodiment of the method 400, it may be determined that the portions of the buffered and reference audio records match when the portions differ by no more than a difference threshold. By way of example, the difference threshold may specify a difference that may not be exceeded at any sample point in an audio record; or, the difference threshold may specify a cumulative sum of differences that may not be exceeded after analysis of a plurality of sample points in an audio record. In an alternate embodiment of the method 400, an exact match of the buffered and reference audio records may be required.
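The two exemplary forms of the difference threshold might be sketched as follows. This is a minimal illustration that assumes integer sample values and uses absolute differences; the function names match_per_sample and match_cumulative are hypothetical.

```python
from typing import Sequence


def match_per_sample(buffered: Sequence[int], reference: Sequence[int],
                     max_diff: int) -> bool:
    """True if no sample point differs by more than max_diff."""
    if len(buffered) != len(reference):
        return False
    return all(abs(b - r) <= max_diff for b, r in zip(buffered, reference))


def match_cumulative(buffered: Sequence[int], reference: Sequence[int],
                     max_total: int) -> bool:
    """True if the running sum of absolute differences never exceeds max_total."""
    if len(buffered) != len(reference):
        return False
    total = 0
    for b, r in zip(buffered, reference):
        total += abs(b - r)
        if total > max_total:
            return False  # stop early once the cumulative threshold is exceeded
    return True
```

Under these assumptions, setting either threshold to zero reduces the test to the exact-match alternative.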
The lengths of the buffer 500 and data window 504 may vary. However, it is preferable that the buffer 500 be long enough (or that the data window 504 be short enough) to enable several movements of the data window 504 with respect to (and within the limits of) the buffer 500. The distance over which the data window 504 can be moved determines the sensitivity of the method 400 to variations in audio record timing.
It is also preferable that the data window 504 be moved in sufficiently small increments to enable a good correlation between buffered and reference audio records. In one exemplary embodiment, a buffer 500 was sized to store thirty seconds of recorded audio; a data window 504 was sized to span ten seconds of the buffer 500; and the data window 504 was moved with respect to the buffer 500 in increments of one-hundred (100) milliseconds (ms).
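The arithmetic behind the exemplary sizing can be checked with a short sketch. The sample rate of 8000 Hz used in the example is an assumption made only for illustration (the disclosure does not state one), and the function name window_offsets is hypothetical.

```python
def window_offsets(buffer_ms: int, window_ms: int, step_ms: int,
                   sample_rate_hz: int) -> range:
    """Return the sample offsets at which the data window can be placed in the buffer."""
    buffer_n = buffer_ms * sample_rate_hz // 1000   # buffer length in samples
    window_n = window_ms * sample_rate_hz // 1000   # window length in samples
    step_n = step_ms * sample_rate_hz // 1000       # window increment in samples
    if step_n < 1 or window_n > buffer_n:
        raise ValueError("step too small or window longer than buffer")
    # The last valid offset leaves the window flush with the end of the buffer.
    return range(0, buffer_n - window_n + 1, step_n)


# Thirty-second buffer, ten-second window, 100 ms increments, assumed 8000 Hz.
offsets = window_offsets(30_000, 10_000, 100, 8_000)
```

With these values the window can be moved over twenty seconds of the buffer in 800-sample steps, which yields 201 candidate positions.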
In one embodiment, the method 400 is commenced after buffering only part of an audio record. However, the method 400 may also be commenced after an audio record has been fully buffered.
Referring back to the method 300 (
If the buffers at each end of the communication link are sized slightly larger than the audio records that they are designed to buffer, then the play cycles 304, 310, 318 can be timed to occur somewhere within the “record windows” of the buffers (i.e., with playback 304 not beginning until after recording 314 has started, and with playback 304 ending before recording 314 has stopped).
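The timing condition described above might be expressed as a simple check. The sketch assumes that all times are measured in milliseconds on a common clock; the function name and the durations used in the example are hypothetical.

```python
def playback_fits_record_window(record_start_ms: int, record_len_ms: int,
                                play_start_ms: int, play_len_ms: int) -> bool:
    """True if playback starts after recording starts and ends before recording stops."""
    record_end_ms = record_start_ms + record_len_ms
    play_end_ms = play_start_ms + play_len_ms
    return record_start_ms <= play_start_ms and play_end_ms <= record_end_ms


# Assumed example: a 30 s record window and a 28 s audio record played 1 s in.
fits = playback_fits_record_window(0, 30_000, 1_000, 28_000)
```

In this example the buffer is two seconds longer than the audio record, so playback may start anywhere within the first two seconds of recording and still fall within the record window.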
In most cases, the method 400 will be executed by means of a computer program. In some cases, the computer program may be embodied in whole or in part in software or firmware. The computer program may be stored on any one or more computer-readable media, including, for example, any number or mixture of fixed or removable media (such as one or more fixed disks, random access memories (RAMs), read-only memories (ROMs), or compact discs), at either a single location or distributed over a network.
As shown in
The system 700 may further comprise a processing system 704 to execute the method 400. In one embodiment, the processing system 704 includes a microprocessor 706, an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA) that is controlled, at least in part, by software or firmware.
The system 700 may be housed within a single device, or may comprise multiple networked devices. In one embodiment, the system 700 is housed within an enclosure having a form-factor of a handheld device. The system 700 may be coupled directly to one end 102 of the communication link 100, or may be coupled to the end 102 via a cable 706 (e.g., a phone or network patch cable).
In addition to receiving one or more audio records via the interface 702, one or more audio records (i.e., reference audio records) may be transmitted over the communication link 100 via the interface 702. In this manner, the system 700 may facilitate execution of the method 400 at the opposite end 104 of the communication link 100, and may enable execution of the method 300 shown in