[Not Applicable]
[Not Applicable]
Voice communication systems have incorporated many new techniques to improve speech quality. One of these techniques involves the use of pulse code modulation (PCM) of voice or speech signals. For example, the ITU-T G.711 standard may be employed to digitize and encode voice frequencies using one or more variants of PCM. Complementary codecs are utilized at the transmitter and receiver to perform such pulse code modulation (PCM).
Prior to transmission at the transmitter, many voice communication systems typically employ linear G.711, μ-law G.711, or A-law G.711 types of pulse code modulation to a speech or voice waveform. When a voice waveform is digitized by way of such pulse code modulation and transmitted by a transmitter, a receiver must appropriately decode the modulation in order to regenerate the signal transmitted from the transmitter. The received signal is typically a DS0 channel transmitting a digitized 64 kilobit/second sampled PCM signal.
Often, a newly implemented voice communication system or an existing problematic voice communication system may need to be diagnosed and tested at one or more points within the system. One of the problems that may be encountered during testing of such a communication system may relate to whether a proper PCM codec is utilized at the receiver. If the PCM codec at the receiver does not employ the corresponding decoding algorithm used by the PCM codec at the transmitter, voice quality may suffer because the received voice signal was improperly decoded.
Furthermore, the inability to efficiently diagnose codec related performance issues may lead to undue testing of other subsystems within the communication system. This often results in system downtime and additional labor costs.
Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art, through comparison of such systems with some aspects of the present invention as set forth in the remainder of the present application with reference to the drawings.
Aspects of the invention provide a method and system to detect or identify one or more types of algorithms used in the encoding of a voice or speech waveform. The system and method may be used as a testing tool to identify whether a voice data stream was encoded using linear G.711, μ-law G.711, or A-law G.711 pulse code modulation (PCM) algorithms.
In one embodiment, a method is used to identify a type of encoding used in generating a voice data stream comprising reading words from a voice data stream, generating at least one parameter using the words and determining a format in which the words are encoded from a plurality of possible formats.
In one embodiment, a method of identifying a type of encoding used in generating a voice data stream incorporates reading words of the voice data stream, determining a first number of words of the voice data stream that corresponds to a first range of values, determining a second number of words of the voice data stream that corresponds to a second range of values, generating μ-law linear equivalents of the one or more words of the voice data stream, determining a third number of words corresponding to the μ-law linear equivalents of the one or more words that have values within a third range, determining a fourth number of words corresponding to the μ-law linear equivalents of the one or more words that have values within a fourth range, generating A-law linear equivalents of the one or more words of the voice data stream, determining a fifth number of words using corresponding to the A-law linear equivalents of the one or more words that have values within a fifth range, and determining a sixth number of words corresponding to the A-law linear equivalents of the one or more words that have values within a sixth range.
In one embodiment, a system for identifying a type of encoding used in generating a voice data stream includes a processor, a memory, a storage device, and a set of computer instructions residing in the storage media.
These and other advantages, aspects, and novel features of the present invention, as well as details of illustrated embodiments, thereof, will be more fully understood from the following description and drawings.
Aspects of the present invention may be found in a system and method to detect or identify one or more types of algorithms used in the encoding of a voice or speech waveform. The system and method may be used as a testing tool to identify whether a voice data stream is encoded using one or more pulse code modulation (PCM) compression algorithms defined by ITU (International Telecommunications Union) G.711 recommendation specification. The system and method may be applied to a voice data stream comprising a number of bytes of data that has been previously stored as a data file. The one or more types of algorithms may comprise a 16 bit linear (in some instances described as uniform PCM or linear G.711), μ-law G.711, and A-law G.711 types of pulse code modulation (PCM) algorithms. The system and method characterize the voice data stream in terms of one or more parameters that correlate with linear G.711, μ-law G.711, or A-law G.711. Thereafter, the parameters are analyzed by way of one or more tests to determine which algorithm was used to encode the voice data stream.
The system and method are applied to a voice data stream in order to ensure that a codec that employs the proper decoding algorithm is used to reproduce the audio waveform that was transmitted. The system comprises a set of computer instructions or software, which resides in a computing device. The aforementioned set of computer instructions or software will be termed a G.711 detection software. The G.711 detection software may be generated using a computer language. In one embodiment, the G.711 detection software may be generated using the C/C++ language. The G.711 detection software is executed by way of the computing device. The computing device will be described, hereinafter, as a G.711 detection system. The G.711 detection software operates on a stream of data that represents an encoded speech sample. The encoded speech sample may comprise a stream of data bytes or words output by a transmit codec of a transmitter. In one embodiment, the stream of bytes may correspond to one or more utterances or one or more phrases spoken in one or more languages.
Referring to
Referring to
Thereafter, at step 416, the normalized sum of μ-law and A-law “zeros” are calculated using the following equation:
zero_mag=(azero_percent+μzero_percent)/100.0, wherein
Next, at step 420, the normalized sum of μ-law and A-law “overflows” are calculated using the following equation:
ovfl_mag=(aovfl_percent+movfl_percent)/100.0, wherein
Thereafter, at step 424, the normalized difference between μ-law and A-law “zeros” are calculated, using the following exemplary equation:
zero_diff=(abs(azero_percent−μzero_percent)/(azero_percent+μzero_percent+0.001)), wherein
The value 0.001 is added in the denominator as a safeguard to prevent an instance in which the denominator in the quotient is equal to zero. In such an event, the quotient is equal to infinity and the value of ovfl_diff may not be acceptable.
At the last step 428, of
ovfl_diff=(abs(μovfl_percent−aovfl_percent)/(μovfl_percent+aovfl_percent+0.001)), wherein,
After the parameters described in
Referring to
The first test determines if both a μ-law maximum jump discontinuity and an A-law maximum jump discontinuity are greater than a first threshold. In addition, the test determines if a difference between the μ-law maximum jump discontinuity and a linear maximum jump discontinuity is greater than a second threshold. Furthermore, the test determines if a difference between an A-law maximum jump discontinuity and the linear maximum jump discontinuity is greater than the second threshold. Then, the first test verifies if a normalized sum of μ-law and A-law “overflows” is above a third threshold, a percentage of linear overflows is less than a fourth threshold, a percentage of μ-law overflows is greater than a fifth threshold, and a percentage of A-law overflows is greater than the fifth threshold. If all these conditions are satisfied, the G.711 detection software determines that the voice data stream file is linear G.711. For example, a software program such as a C/C++ program may comprise the following high level language instructions to implement this particular test, in which exemplary threshold values for JUMP_MAX, JUMP_DIFF, THR_LIN_OVFL_PERCENT, THR_OVFL_MAG, THR_UA_OVFL_PERCENT, and THR_UA_OVFL_PERCENT were defined previously.
The second test determines if a percentage of linear zeros is above a particular threshold and if a percentage of μ-law zeros and if a percentage of A-law zeros are both below the same threshold. If all these conditions are satisfied, the G.711 detection software determines that the voice data stream is linear G.711. For example, a software program such as a C/C++ program may comprise the following high level language instructions to implement this particular test:
The third test determines whether μ-law or A-law was used to encode the voice data stream file. The test determines if the μ-law and A-law zeros and overflows percentages are significantly different. For example, the G.711 detection system calculates whether a normalized difference between the μ-law and A-law overflows is greater than a normalized overflows difference threshold and a normalized difference between the μ-law and A-law zeros is greater than a normalized zeros difference threshold. If the μ-law/A-law zeros and overflows are significantly different, the G.711 detection system determines if the number of μ-law overflows is greater than the number of A-law overflows and the A-law zero percentage is greater than the μ-law zero percentage. If so, then the G.711 detection system determines that the voice data stream file is A-law G.711. If not, the G.711 detection system determines whether the number of A-law overflows is greater than μ-law overflows and that percentage of μ-law zeros is greater than the percentage of A-law zeros. If so, then the G.711 detection system determines that the voice data stream file is μ-law G.711. For example, a software program such as a C/C++ program may comprise the following high level language instructions to implement this particular test:
The fourth test checks to see if there are no μ-law or A-law overflows before using μ-law or A-law zeros percentages to determine an outcome. Then, the test determines if an A-law zeros percentage is greater than a μ-law zeros percentage. If so, the test returns an A-law G.711 decision. If the test subsequently determines if the μ-law zeros percentage is greater than the A-law zeros percentage, a μ-law G.711 decision is returned. If either μ-law or A-law G.711 decision is not determined, the fourth test returns “unknown” as a decision. For example, a software program such as a C/C++ program may comprise the following high level language instructions to implement this particular test:
wherein a helper function is invoked to determine zeroCheck as shown below:
The fifth test determines if a normalized sum of the μ-law and A-law zeros is greater than a first threshold. The test subsequently determines if a normalized difference of the A-law and μ-law zeros is greater than a second threshold. In addition to this condition, a normalized sum of the μ-law and A-law overflows and a normalized difference between the μ-law and A-law overflows must both be less than a third threshold and fourth threshold, respectively. If all of the previously described conditions are satisfied, the G.711 detection system invokes the zeroCheck helper function previously described in the fourth test to determine whether μ-law zeros percentage or A-law zeros percentage is greater. The fifth test returns a decision based on this helper function. For example, a software program such as a C/C++ program may comprise the following high level language instructions to implement this particular test:
The sixth test assesses whether a normalized sum of the μ-law and A-law overflows is greater than a first threshold and if a normalized difference of the μ-law and A-law overflows is greater than a second threshold. If these two conditions are satisfied, then an assessment is made if a normalized difference between the μ-law and A-law zeros is less than a third threshold. If the third condition is satisfied, the G.711 detection system invokes an overflowCheck helper function to determine whether the μ-law overflows percentage or the A-law overflows percentage is greater. The sixth test returns a decision based on this helper function. For example, a software program such as a C/C++ program may comprise the following high level language instructions to implement this particular test:
wherein a helper function is invoked to determine ovflCheck as shown below:
The seventh test assesses if a normalized sum of the μ-law and A-law zeros is greater than a first threshold and a normalized differences of the μ-law and A-law zeros is greater than a second threshold. If so, the G.711 detection system invokes the zeroCheck helper function previously described in the fourth test to determine whether μ-law zeros percentage or A-law zeros percentage is greater. The seventh test returns a decision based on this helper function. For example, a software program such as a C/C++ program may comprise the following high level language instructions to implement this particular test:
The eighth test assesses whether a normalized sum of the μ-law and A-law overflows is greater than a first threshold and whether a normalized difference of the μ-law and A-law overflows are greater than a second threshold. If both are significant, the detection system invokes the overflowCheck helper function, as was previously described, to determine whether the μ-law overflows percentage or the A-law overflows percentage is greater. The eighth test returns a decision based on this helper function. For example, a software program such as a C/C++ program may comprise the following high level language instructions to implement this particular test:
The ninth test assesses whether an A-law maximum discontinuity jump is greater than a first threshold and whether an absolute value of the difference between the A-law maximum discontinuity jump and a μ-law maximum discontinuity jump is greater than a second threshold. If both of these last two conditions are satisfied, then the G.711 detection system generates a μ-law decision. Otherwise, the G.711 detection system assesses whether the μ-law maximum discontinuity jump is greater than the first threshold and whether the absolute value of the difference between the A-law maximum discontinuity jump and the μ-law maximum discontinuity jump is greater than the second threshold. If both of these last two conditions are satisfied, then the G.711 detection system generates an A-law decision. For example, a software program such as a C/C++ program may comprise the following high level language instructions to implement this particular test:
The tenth test is a combination of two subtests. The first subtest compares the normalized difference between μ-law and A-law overflows against two parameters. If the normalized difference between μ-law and A-law overflows is greater than twice the normalized difference between μ-law and A-law zeros while the normalized difference between μ-law and A-law overflows is greater than a first threshold, then the G.711 detection system invokes the ovflCheck helper function previously described to determine whether the μ-law overflows percentage or the A-law overflows percentage is greater. The second subtest compares a normalized difference between μ-law and A-law zeros versus twice a normalized difference between μ-law and A-law overflows while assessing the normalized difference between μ-law and A-law zeros against a second threshold. If the normalized difference between μ-law and A-law zeros is greater than twice the normalized difference between μ-law and A-law overflows while the normalized difference between μ-law and A-law zeros is greater than a second threshold, then the G.711 detection system invokes the zeroCheck helper function previously described to determine whether the μ-law zeros percentage or the A-law zeros percentage is greater.
The following computer output is generated by an exemplary G.711 detection system that executes the exemplary G.711 detection software. The G.711 detection software operates on an exemplary file named ingress.pcm:
As illustrated by the preceding output, the samples or words in the voice data stream file are characterized by a substantial number of A-law zeros. The values of these words, after converting from A-law to linear are analyzed and those words that exceed a particular threshold value are categorized as overflows while those that fall below a particular threshold are classified as zeros. In this particular data stream file, the percentage of A-law zeros far exceeds the percentage of μ-law zeros or linear zeros. Referring to the output above, the percentage of A-law zeros is 93.69% while the μ-law and linear zeros are negligible. Another parameter of significance is the maximum discontinuity jump associated with values of successive words in either the linear, A-law, or μ-law case. As illustrated in the output, the maximum discontinuity jump associated with the A-law case is the smallest among the three possible cases. The maximum discontinuity jump associated with A-law is 11,766 compared with approximately 64,000 for the other two cases, indicating that a voice data stream decoded using A-law G.711 results in values that are more reasonable than the same voice data stream decoded using either μ-law G.711 or linear G.711. Hence, as illustrated by the last line of the output, the data stream file has been determined to be encoded using A-law (i.e., the data file is a representation of A-law).
While the invention has been described with reference to certain embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from its scope. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.
This application is a continuation of U.S. patent application Ser. No. 10/688,443 filed Oct. 17, 2003.
Number | Date | Country | |
---|---|---|---|
Parent | 10688443 | Oct 2003 | US |
Child | 12345407 | US |