This invention relates in general to communication systems, and, more particularly, to a system and method for voice quality analysis of communication systems.
One type of network that has received considerable interest over the past several years for its voice conveyance capabilities is the packet data network. In such a network, sound at an origination point may be digitized, placed into packets, and sent across the network in the packets to a destination point, which may reproduce the sound based on the data in the packets.
Unfortunately, the packets in such a network may be sent at irregular intervals, sent by different routes, and/or discarded. This leads to voice packets arriving at irregular intervals, arriving in a different order, and/or not arriving at all relative to their generation at the origination point. Thus, voice quality may suffer.
Typical systems for assessing voice quality in a packet data network require recording a test voice stream at a destination point and generating a reference voice stream from a reference voice sample. The recorded voice stream and the generated voice stream may then be compared to determine the voice quality of the network.
This approach, however, requires an additional device in the network under test so that the test voice stream may be introduced. Furthermore, introducing the test voice stream requires coordination between the origination and destination points and produces extra load in the network, which corrupts the analysis. Additionally, by only being able to measure voice quality from the origination point to the destination point, isolating problems in the network is difficult.
The present invention provides a system and method that substantially reduce and/or eliminate at least some of the problems and/or disadvantages with existing systems and methods for voice quality analysis. To accomplish this, the present invention provides, at least in certain embodiments, a voice quality module that does not require a test stream to be introduced into the network.
In particular embodiments, a system for voice quality analysis includes a voice packet capture module, a voice data substitution module, and a voice quality analysis module. The voice packet capture module is operable to receive packets in a voice stream and to generate a receipt indicator for the packets. The voice data substitution module is operable to substitute a reference voice sample for the voice data in the packets. The voice quality analysis module is operable to compare the voice data in the voice-substituted packets to the reference voice sample to determine voice quality.
In certain embodiments, a method for voice quality analysis includes receiving packets in a voice stream and generating a receipt indicator for the packets. The method also includes substituting a reference voice sample for the voice data in the packets and comparing the voice data in the voice-substituted packets to the reference voice sample to determine voice quality.
The present invention has several technical features. For example, because a voice quality module may be coupled between an originating endpoint and a receiving endpoint, problems introduced by the receiving endpoint may be eliminated from the voice quality analysis. Furthermore, because a voice quality module may be coupled to a communication system at a variety of locations, problems with voice quality in the communication system may be isolated and/or identified. As another example, because a reference voice sample does not have to be introduced into a communication system for a voice quality module to perform its task, the operations of the system may not be disturbed by the analysis, leading to more accurate analysis. Moreover, by still being able to use a reference voice sample, privacy concerns are assuaged. As another example, a device does not have to be provided at one of the endpoints to introduce the reference voice stream into the communication system, which simplifies the analysis process. Moreover, the test stream introduction and recording do not have to be coordinated, which also simplifies the analysis process.
Of course, some embodiments may possess none, one, some, or all of these technical features and/or additional technical features. Other technical features will be readily apparent to those skilled in the art from the figures, detailed written description, and claims.
The figures described below provide a more complete understanding of the present invention and its technical features, especially when considered with the following detailed written description:
In more detail, endpoints 20 may include telephones, voice-capable personal computers, personal computers, voice-capable personal digital assistants, personal digital assistants, a voice-mail storage and delivery system, and/or any other type of device for generating voice data, storing voice data, sending it to communication network 30, receiving it from communication network 30, and/or converting it to audible sound. If one of endpoints 20 generates voice data based on audible sound of a user, it will typically include a microphone to convert the audible sound into electrical signals, an encoder to convert the electrical signals to voice data, and a processor executing logical instructions to group the voice data into packets and send them to communication network 30. If one of endpoints 20 generates audible sound based on voice data, it will typically include a processor executing logical instructions to receive voice packets from communication network 30 and ungroup the voice data from the packets, a decoder to convert the voice data to electrical signals, and a speaker to convert the electrical signals to audible sound. In particular embodiments, endpoints 20 are telephones that utilize Internet protocol (IP) telephony techniques.
Communication network 30 provides the conveyance of the voice data packets between endpoints 20. To accomplish this, communication network 30 is coupled to endpoints 20 by links 31. Links 31 may be wires, cables, fiber-optic cables, microwave channels, infrared channels, or any other type of wireline or wireless path for conveying data. Additionally, communication network 30 includes conveyance modules 32. Conveyance modules 32 may be switches, routers, bridges, voice gateways, call managers, transceivers, hubs, and/or any other type of device for conveying data packets. Conveyance modules 32 are also coupled to each other by links 31, which may have intervening conveyance modules. Communication network 30 may operate according to any appropriate type of protocol, such as, for example, Ethernet, IP, X.25, frame relay, or any other packet data protocol. Note that communication network 30 may also support the conveyance of non-voice data packets between endpoints 20 and/or other devices.
Communication network 30 also includes a gateway 34, which is coupled to conveyance module 32o by one of links 31. Gateway 34 is operable to convert voice data packets in communication network 30 to a format suitable for a public switched telephone network (PSTN) 40 and/or to convert voice data from PSTN 40 to a format suitable for communication network 30. Thus, endpoints 20 may operate with standard telephony devices.
Voice quality module 50 is coupled to conveyance module 32z by one of links 31 in the illustrated embodiment, although it could be coupled to any of conveyance modules 32, or even one of endpoints 20. By being coupled to conveyance module 32z, voice quality module 50 is operable to receive packets from a voice-packet stream being conveyed by conveyance module 32z and to analyze the voice quality for the stream. To receive packets from a voice-packet stream, voice quality module 50 may, for example, tap a shared medium, such as, for example, an Ethernet connection or a wireless connection. The voice data in the packets may then be replaced by an encoded reference voice sample, and the packets analyzed using the encoded reference voice sample. Voice quality module 50 may include a communication interface, a processor, a memory, an encoder, a decoder, a voice synthesizer, a reference voice sample, a filter, an echo canceller, and/or any other components for receiving and analyzing packets for voice quality analysis. In particular embodiments, voice quality module 50 is a Linux-based PC with an Ethernet port.
In operation, when one of endpoints 20, endpoint 20a, for example, wishes to convey voice data to another of endpoints 20, endpoint 20z, for example, a session is established between the endpoints. A session may be established according to the real-time transport protocol (RTP), the H.323 protocol, the Skinny protocol, or any other appropriate protocol. Then, endpoint 20a may begin to send packets containing voice data, which may or may not have been generated by endpoint 20a, to conveyance module 32a of communication network 30. Upon receiving the packets, conveyance module 32a routes the packets toward endpoint 20z. Typically, the route will be defined during establishment of the session, or conveyance module 32a may define the route. Sometimes, however, packets may take alternate routes. The packets are conveyed over links 31 and through conveyance modules 32 towards endpoint 20z.
When packets in the voice stream arrive at conveyance module 32z, voice quality module 50 may store the packets and their arrival characteristics in order to perform voice quality analysis. To determine which packets to analyze, voice quality module 50 may examine the destination address of the packets, the origination address of the packets, the arrival port of the packets, the type of data conveyed by the packets, and/or any other appropriate indicia. For example, using the destination address, voice quality module 50 may look for voice packets destined for a particular endpoint 20, on a particular local area network (LAN), or on a particular virtual LAN (VLAN). For instance, if voice quality module 50 is coupled to a Catalyst switch, it may use the switch's spanning feature to record and analyze the voice streams on a VLAN. For a service provider environment, it may be beneficial to couple voice quality module 50 to an aggregation point or other device where network congestion is likely to occur. As another example, voice quality module 50 may look for signaling messages indicating that a call is being set up. This may be done upon command, at predetermined intervals, or upon any other appropriate criterion. The packets of interest may be stored along with their arrival characteristics, such as, for example, time of arrival, order of arrival, and/or any other appropriate arrival characteristics.
Once a sufficient number of packets of interest have been collected, perhaps indicated by the end of the voice stream from endpoint 20a to endpoint 20z or by a sufficient amount of voice data having been received, a reference voice sample may be substituted for the voice data in the collected packets. To accomplish this, the collected packets may be examined to determine the type of encoding, and possibly frame size and packetization, used for the voice data in the packets. The reference voice sample may then be encoded similarly, and the encoded reference voice sample may be substituted for the voice data in the packets, perhaps based on the size of each packet and the sequence in the voice stream. Note that if some packets are missing from the voice stream, the portions of the encoded reference voice sample associated with those packets may be discarded.
The packets with the encoded reference voice sample may be processed as if they were the actual packets arriving from endpoint 20a, with the jitter, packet losses, and/or packet ordering that occurred prior to arriving at voice quality module 50. For example, the voice data in the packets may be decoded, synthesized, and compared to the reference voice sample to obtain a voice quality analysis. The voice quality analysis may be made according to perceptual speech quality measurement (PSQM) techniques, such as, for example, those described in ITU-T P.861, or any other appropriate technique. Results of the analysis, such as, for example, signaling events (e.g., call setup and disconnect), statistics (e.g., jitter, drop, order), a voice quality score, or any other appropriate data, may be conveyed to a user by display device, acoustic device, electronic message, and/or other appropriate technique and/or stored for later retrieval. In particular embodiments, the results are conveyed if a threshold is broken, such as, for example, a high PSQM score. In certain embodiments, a representation of the reference voice sample or the voice data in the packets may also be output and/or sent to a user.
As illustrated in
Although
Note that voice quality module 50 may suffer from several drawbacks. For example, if a tap is not perfect, voice packets may be missed, and, hence, voice quality may be undercomputed. As another example, even if the tap is perfect, routing flaps or load splitting might cause some voice packets to be conveyed by alternate routes, which may bypass the tap, leading to an effect similar to the one just mentioned. As a further example, in encoding schemes where the packet type does not reflect the encoding, the encoding of the data may have to be estimated, or the signaling messages for the session between endpoints may have to be tapped. The former is error prone, and the latter may be difficult because the signaling may go by a different path or may be encrypted. In particular embodiments, however, voice quality module 50 may run in a distributed mode in which it examines both a signaling channel and a bearer channel. As an additional example, the arrival characteristics of the voice packets may only reflect the conditions in communication system 10 upstream of the tap. Thus, conditions in communication system 10 downstream of the tap may not be reflected by the voice quality analysis. As another example, accurate emulation of an adaptive jitter algorithm may be imperfect because of sensitivity of the algorithm to initial conditions and other timing-related problems. As a further example, encrypted RTP packets may be difficult to interpret correctly since the packet type is inside the encrypted envelope; of course, time sequence and sequence number are probably in the clear, so the basic tap information is still available. Even with these potential drawbacks, however, the various embodiments of the present invention have advantages, some of which have been mentioned previously.
Voice packet capture module 51 receives packets from communication network 30, determines whether the packets are of interest, and, if they are of interest, stores them and their associated arrival characteristics in a memory 52, which is a type of computer readable media. To determine whether a packet is of interest, voice packet capture module 51 may examine the destination address of the packet, the origination address of the packet, the type of data in the packet, the arrival port of the packet, and/or any other appropriate criterion that indicates the packet is part of a voice stream. If a packet is of interest, voice packet capture module 51 stores the packet and its associated arrival characteristics in a location 53 in memory 52. Voice packet capture module 51 continues to examine packets and store those of interest in location 53 until a sufficient number of packets have been received. A sufficient number of packets may be received, for example, if there are no more packets in a voice stream or if voice data representing a predetermined period of time, such as, for example, sixty seconds, has been received.
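The capture behavior described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the `Packet`, `StoredPacket`, and `CaptureModule` names, the choice of destination address and port as the only matching criteria, and the use of a plain list in place of location 53 are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Packet:
    """Hypothetical, simplified view of a captured packet."""
    src: str
    dst: str
    dst_port: int
    payload: bytes


@dataclass
class StoredPacket:
    packet: Packet
    arrival_time: float   # seconds, taken from the capture clock
    arrival_order: int    # position in which the packet reached the tap


@dataclass
class CaptureModule:
    dst: Optional[str] = None       # match on destination address if set
    dst_port: Optional[int] = None  # match on destination port if set
    stored: List[StoredPacket] = field(default_factory=list)

    def is_of_interest(self, pkt: Packet) -> bool:
        if self.dst is not None and pkt.dst != self.dst:
            return False
        if self.dst_port is not None and pkt.dst_port != self.dst_port:
            return False
        return True

    def receive(self, pkt: Packet, now: float) -> bool:
        """Store the packet with its arrival characteristics if it matches."""
        if not self.is_of_interest(pkt):
            return False
        self.stored.append(StoredPacket(pkt, now, len(self.stored)))
        return True
```

Only packets that pass the filter receive an arrival order, so the stored order reflects the voice stream of interest rather than all traffic seen at the tap.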
In particular embodiments, voice packet capture module 51 may include a communication interface, such as, for example, a network interface card, a transceiver, a modem, and/or a port, and a processor operating according to logical instructions, such as, for example, a microprocessor, a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), and/or any other type of device for manipulating data in a logical manner. The instructions for the processor could be stored in memory 52. Furthermore, memory 52 may include read-only memory (ROM), random access memory (RAM), compact-disk read-only memory (CD-ROM), registers, and/or any other type of volatile or non-volatile electromagnetic or optical data storage device. In general, voice packet capture module 51 may be any type of device that can receive, examine, and store packets from communication network 30.
Voice data substitution module 54 replaces the voice data in the packets of interest with voice data of reference voice sample 55, which may be a WAV file, an AU file, or any other storable representation of audible sound. To accomplish this, voice data substitution module 54 examines the packets to determine the type of encoding used for the voice data, such as, for example, G.711, G.726, or G.729. For example, if the packets are sent using RTP, they may indicate the type of encoding used in the RTP header. As another example, the payload type and payload length for the packets, which may be in another header, may be examined to determine the encoding scheme. Voice data substitution module 54 may then encode reference voice sample 55 according to a similar encoding scheme and substitute the encoded reference voice sample for the voice data in the packets of interest, perhaps by using the sequence number of the packets. In doing this, voice data substitution module 54 may have to discard some of the encoded reference voice sample because of missing voice packets. Also, voice data substitution module 54 may truncate or recycle reference voice sample 55, depending on the length of the voice stream to be analyzed. Voice data substitution module 54 may then output the packets, which now contain the encoded reference voice sample instead of the original voice data, according to their arrival characteristics to decoding module 57.
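A sketch of the substitution step, under stated assumptions: the reference sample has already been encoded with the detected codec into fixed-size frames, each packet carries exactly one frame, and sequence numbers do not wrap within the captured stream. The function name `substitute_reference` and its parameters are hypothetical.

```python
from typing import List, Tuple


def substitute_reference(
    captured: List[Tuple[int, bytes]],
    encoded_reference: bytes,
    frame_size: int,
) -> List[Tuple[int, bytes]]:
    """Replace each payload with the reference frame for its sequence number.

    captured: (sequence_number, payload) pairs in arrival order.
    Assumes one fixed-size frame per packet and no sequence-number wrap.
    """
    if not captured:
        return []
    total_frames = len(encoded_reference) // frame_size
    if total_frames == 0:
        raise ValueError("reference shorter than one frame")
    first_seq = min(seq for seq, _ in captured)
    out = []
    for seq, _payload in captured:
        # Recycle the reference when the stream is longer than the sample.
        index = (seq - first_seq) % total_frames
        start = index * frame_size
        out.append((seq, encoded_reference[start:start + frame_size]))
    return out
```

Because frames are selected by sequence number, the reference frames that correspond to missing packets are simply never used, which matches the discarding behavior described above. Arrival order is preserved in the output so that later stages see the same reordering as the original stream.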
In particular embodiments, voice data substitution module 54 includes a processor operating according to logical instructions encoded in a memory. In general, however, voice data substitution module 54 may be any type of device that can examine voice packets and substitute an encoded reference voice sample for the voice data in the packets.
MLS module 56 provides a pseudo-random code that voice data substitution module 54 may use to align the encoded reference voice sample with the reference voice sample. For example, voice data substitution module 54 may place the code in the first packet(s) in the voice stream to align the encoded reference with the reference.
Decoding module 57 decodes the voice data, which is now the encoded reference voice sample, in the packets. In accomplishing this, the decoding module 57 may determine the type of encoding used on the voice data in the packets, apply a jitter buffer to the packets, and decode the voice data. Note that decoding module 57 may discard packets if they are too far out of order. Decoding module 57 passes the decoded voice data to voice synthesis module 58.
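To illustrate how a jitter buffer can cause late or badly reordered packets to be discarded, the following sketch models a simple fixed-delay buffer. This is a simplification chosen for the example; an actual endpoint may use an adaptive algorithm. The function name, the 20 ms frame interval, and the 60 ms playout delay are assumptions.

```python
from typing import List, Tuple


def apply_jitter_buffer(
    arrivals: List[Tuple[int, float]],
    frame_interval: float = 0.020,
    playout_delay: float = 0.060,
) -> Tuple[List[int], List[int]]:
    """Return (played, discarded) sequence numbers for a fixed-delay buffer.

    arrivals: (sequence_number, arrival_time_seconds) pairs in arrival order.
    A packet is played only if it arrives by its scheduled playout time.
    """
    if not arrivals:
        return [], []
    first_seq, first_time = arrivals[0]
    played, discarded = [], []
    for seq, arrival in arrivals:
        deadline = first_time + playout_delay + (seq - first_seq) * frame_interval
        if arrival <= deadline:
            played.append(seq)
        else:
            discarded.append(seq)
    return sorted(played), sorted(discarded)
```

A packet that arrives out of order but before its deadline is still played in its proper position; only packets that miss their playout time are dropped.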
In particular embodiments, decoding module 57 includes a processor operating according to logical instructions encoded in a memory and operates similarly to one of endpoints 20. In general, however, decoding module 57 may be any type of device for decoding voice data.
Voice synthesis module 58 is responsible for converting the decoded voice data into a voice synthesized format. For example, voice synthesis module 58 may convert the decoded voice data into a WAV or an AU file. In particular embodiments, voice synthesis module 58 may be an audio format converter. In general, voice synthesis module 58 may be any type of device for synthesizing sound.
After conversion of the decoded voice data into a voice synthesized format, voice quality analysis module 59 may compare the synthesized voice data to the reference voice sample 55 to perform a voice quality analysis. For example, voice quality analysis module 59 may use perceptual speech quality measurement (PSQM) techniques to determine voice quality. The results of such an analysis may be output to a user by a display device, an acoustic device, an electronic message, and/or any other suitable communication technique.
In particular embodiments, voice quality analysis module 59 may be a PESQ, PAMS, or SNR calculator. In general, however, voice quality analysis module 59 may be any device that can compare audible sound.
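Of the calculators named above, SNR is the simplest to illustrate. The sketch below assumes the reference and degraded signals are already decoded to equal-length, time-aligned sample sequences; the function name `snr_db` is hypothetical, and perceptual measures such as PESQ are considerably more involved than this.

```python
import math
from typing import Sequence


def snr_db(reference: Sequence[float], degraded: Sequence[float]) -> float:
    """Signal-to-noise ratio in dB, treating (degraded - reference) as noise."""
    if len(reference) != len(degraded):
        raise ValueError("signals must be the same length and time-aligned")
    signal_power = sum(r * r for r in reference)
    noise_power = sum((d - r) ** 2 for r, d in zip(reference, degraded))
    if signal_power == 0:
        raise ValueError("reference signal is silent")
    if noise_power == 0:
        return math.inf  # identical signals: no measurable degradation
    return 10.0 * math.log10(signal_power / noise_power)
```

A higher value indicates that the synthesized voice data is closer to the reference voice sample.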
Although
Receipt indicator section 110 includes a time stamp field 112 and a sequence number field 114. Time stamp field 112 contains an indication of the time at which the associated voice packet arrived at voice packet capture module 51. As illustrated, time stamp field 112 indicates the hour, minute, second, and hundredth of a second at which a packet was received. The data in field 112 may be useful, among other things, for determining the jitter of the voice packets. Sequence number field 114 contains an indication of the sequence in which the associated voice packet arrived at voice packet capture module 51. The data in field 114 may be useful, among other things, for determining the order in which voice packets arrived at voice packet capture module 51.
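As a rough illustration of how the data in time stamp field 112 might be used, the sketch below converts receipt time stamps to seconds and measures how far the inter-arrival gaps deviate from a nominal packet interval. The `HH:MM:SS.hh` text format, the function names, and the use of mean absolute deviation as the jitter figure are assumptions for the example; wrap-around at midnight is not handled.

```python
from typing import List


def parse_receipt_time(stamp: str) -> float:
    """Convert 'HH:MM:SS.hh' (hundredths of a second) to seconds since midnight."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def interarrival_gaps(stamps: List[str]) -> List[float]:
    """Gaps between consecutive arrivals, rounded to the stamp resolution."""
    times = [parse_receipt_time(s) for s in stamps]
    return [round(b - a, 2) for a, b in zip(times, times[1:])]


def mean_gap_deviation(gaps: List[float], nominal: float) -> float:
    """Mean absolute deviation of the gaps from the nominal packet interval."""
    if not gaps:
        return 0.0
    return sum(abs(g - nominal) for g in gaps) / len(gaps)
```

With a hundredth-of-a-second resolution, this estimate is coarse relative to a typical 20 ms packet interval, so it is better suited to spotting large delays than to fine-grained jitter measurement.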
Voice packet section 120 includes an RTP section 122, a user datagram protocol (UDP) section 124, and an IP packet section 126. RTP section 122 includes a time stamp field 132, a sequence number field 134, and a coding type field 136. Time stamp field 132 contains an indication of when the associated voice data was processed. Field 132 may be useful in determining jitter and/or determining whether packets are missing from the voice stream. Sequence number field 134 contains an indication of where the associated voice data belongs in the voice stream. Field 134 may be useful in determining jitter, determining whether packets are missing from the voice stream, and/or determining the proper order of voice packets. Furthermore, the data may be useful for associating the encoded reference voice sample with the appropriate voice packets. Coding type field 136 contains an indication of the type of encoding scheme used for the voice data. For example, data that represents audible sounds may be encoded using G.711, G.726, or G.729. As mentioned previously, by examining the data in field 136, the type of encoding used for the voice data may be determined and used for encoding the reference voice sample. UDP section 124 includes a port number field 142. Port number field 142 contains an indication of the port for which the voice packet is destined at the receiving device. Field 142 may be useful in identifying packets of interest. IP packet section 126 includes a destination address field 152, a type of service (TOS) field 154, and a voice data field 156. Destination address field 152 contains an indication of the destination of the voice packet, such as, for example, an IP address. Field 152 may be useful in identifying packets of interest. TOS field 154 contains an indication of the type of data that the packet is carrying. 
For example, the data in field 154 may indicate that the packet is carrying general data, audio data, video data, low priority data, high priority data, or any other type of data. In some embodiments, by examining TOS field 154 and the size of the voice data, an indication of the encoding for the voice data may be obtained. Voice data field 156 contains the actual voice data that is being conveyed. Field 156 may be encoded according to G.711, G.726, G.729, or any other appropriate format.
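The RTP fields discussed above (time stamp, sequence number, and coding type) occupy the 12-byte fixed RTP header. The sketch below parses that header and maps a few static payload types to codec names. The `RtpHeader` and `parse_rtp_header` names are hypothetical, the payload type table is deliberately partial, and codecs that use dynamically assigned payload types cannot be resolved this way without the session signaling.

```python
import struct
from typing import NamedTuple

# A few static payload types from the RTP audio/video profile (partial table).
PAYLOAD_TYPES = {0: "G.711 mu-law (PCMU)", 8: "G.711 A-law (PCMA)", 18: "G.729"}


class RtpHeader(NamedTuple):
    version: int
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(data: bytes) -> RtpHeader:
    """Parse the 12-byte fixed RTP header; CSRC entries and extensions are ignored."""
    if len(data) < 12:
        raise ValueError("RTP header needs at least 12 bytes")
    # Network byte order: two flag bytes, 16-bit sequence, 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    return RtpHeader(b0 >> 6, b1 & 0x7F, seq, ts, ssrc)
```

For example, `PAYLOAD_TYPES.get(parse_rtp_header(raw).payload_type)` yields a codec name for a recognized static type and `None` otherwise, in which case the encoding would have to be estimated or obtained from signaling.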
Although one embodiment of location 53 in memory 52 is illustrated by
Once a packet has been received, the method calls for determining whether the packet is of interest at decision block 408. A packet may be of interest, for example, if it is carrying voice data, is destined for a particular endpoint, originates from a particular endpoint, is destined for a particular port, and/or contains any other appropriate voice-stream indicia. If the packet is not of interest, the method returns to decision block 404 to check whether another packet has been received. If, however, the packet is of interest, the method calls for generating a receipt indicator for the packet at function block 412. As discussed previously, the receipt indicator may indicate the time that the packet was received, the sequence in which the packet was received, and/or any other appropriate receipt characteristic. At function block 416, the method calls for storing the packet and the receipt indicator. For example, the packet and receipt indicator may be stored in a location of a memory.
After this, the method calls for determining whether a sufficient number of packets of interest have been received at decision block 420. A sufficient number of packets of interest may have been received, for example, if a predetermined amount of a voice stream, such as, for instance, sixty seconds, is represented by the packets of interest or if there are no more packets in a voice stream. If a sufficient number of packets of interest have not been received, the method returns to decision block 404 to check whether another packet has been received.
If, however, a sufficient number of packets of interest have been received, the method calls for substituting a reference voice sample for the voice data in the packets at function block 424. As discussed previously, this may include: 1) determining the type of encoding used for the voice data in the packets; 2) encoding the reference voice sample using the identified encoding type; and 3) substituting the encoded reference voice sample for the voice data in the packets. The method then calls for comparing the voice data, which is now the encoded reference voice sample, in the packets to the reference voice sample at function block 428. As discussed previously, this may include decoding the voice data in the packets, generating a voice synthesis of the decoded data, and comparing the generated voice synthesis to the reference voice sample using PSQM techniques. After this, the method is at an end.
Returning to decision block 404, if a packet has not been received, the method calls for determining whether a voice stream is idle at decision block 432. A voice stream may be idle, for example, if no packets have been received from it in thirty seconds. If a voice stream is not idle, the method calls for returning to decision block 404. If, however, a voice stream is idle, the method calls for performing the voice quality analysis operations discussed previously, beginning at function block 424.
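The collection loop of decision blocks 404, 420, and 432 can be sketched as follows. The event representation (a clock time paired with either the seconds of voice carried by a packet of interest or `None` when no packet was received), the function name, and the returned reason strings are assumptions for the example; the sixty-second and thirty-second defaults follow the examples given in the text.

```python
from typing import Iterable, List, Optional, Tuple

# (clock time, seconds of voice carried) or (clock time, None) when no packet arrived
Event = Tuple[float, Optional[float]]


def collect_until_ready(
    events: Iterable[Event],
    enough_voice_s: float = 60.0,
    idle_timeout_s: float = 30.0,
) -> Tuple[str, List[float]]:
    """Return (reason, arrival_times) once analysis should begin."""
    arrivals: List[float] = []
    voice_s = 0.0
    last_arrival: Optional[float] = None
    for now, voice in events:
        if voice is None:
            # No packet on this poll: check whether the stream has gone idle.
            if last_arrival is not None and now - last_arrival >= idle_timeout_s:
                return "idle", arrivals
            continue
        arrivals.append(now)
        voice_s += voice
        last_arrival = now
        if voice_s >= enough_voice_s:
            return "sufficient", arrivals
    return "exhausted", arrivals
```

In either the "sufficient" or the "idle" case, the collected packets would then be handed to the substitution and comparison operations beginning at function block 424.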
Although flowchart 400 illustrates one embodiment of a method for voice quality analysis, other embodiments may include fewer, more, and/or a different arrangement of operations. For example, if packets are prescreened to identify those of interest, decision block 408 may be eliminated. As another example, the receipt indicator may be generated upon receipt of a packet and stored with the packet before determining whether the packet is of interest. As an additional example, a particular voice stream may need to be identified at the beginning of the method to determine which packets are of interest. As a further example, a plurality of voice streams may be analyzed simultaneously, meaning that the packets from the different voice streams will have to be identified separately from each other. A variety of other examples exist.
While a variety of embodiments have been discussed for the present invention, a variety of additions, deletions, modifications, and/or substitutions will be readily suggested to those skilled in the art. It is intended, therefore, that the following claims encompass such additions, deletions, modifications, and/or substitutions to the extent that they do not do violence to the spirit of the claims.