1. Field
The present invention relates to wireless communication systems, and specifically to playback of packets in an adaptive de-jitter buffer for voice over internet protocol (VoIP) for packet switched communications.
2. Background
In a communication system, the end-to-end delay of a packet may be defined as the time from its generation at the source to when the packet reaches its destination. In a packet-switched communication system, the delay for packets to travel from source to destination may vary depending upon various operating conditions, including but not limited to, channel conditions and network loading. Channel conditions refer to the quality of the wireless link.
The end-to-end delay of a packet includes delays introduced in the network and the various elements through which the packet passes. Many factors contribute to end-to-end delay. Variance in the end-to-end delay is referred to as jitter. Factors such as jitter lead to degradation in the quality of communication. A de-jitter buffer may be implemented to correct for jitter and improve overall quality in a communication system.
Generally, speech consists of sentences having periods of talkspurts and periods of silence. Individual sentences are separated by periods of silence, and in turn, a sentence may comprise multiple talkspurts separated by periods of silence. Sentences may be long or short, and the silence periods within sentences (or “intra-sentence”) may typically be shorter than periods of silence separating sentences. As used herein, a talkspurt is generally made up of multiple packets of data. In many services and applications, e.g., voice over IP (VoIP), video telephony, interactive games, messaging, etc., data is formed into packets and routed through a network.
Generally, in wireless communication systems, channel conditions, network load, quality of service (QoS) capabilities of a system, the competition for resources by different flows, among other factors, impact the end-to-end delay of packets in a network. The end-to-end delay of packets may be defined as the time it takes a packet to travel within a network from a “sender” to a “receiver.” Each packet may incur a unique source to destination delay, resulting in a condition generally referred to as “jitter.” If a receiver fails to correct for jitter, a received message will suffer distortion when the packets are re-assembled. When packets arriving at a receiver fail to arrive at regular intervals, a de-jitter buffer may be used to adjust for the irregularity of incoming data. The de-jitter buffer smooths the jitter experienced by packets and conceals the variation in packet arrival time at the receiver. In some systems this smoothing effect is achieved using an adaptive de-jitter buffer to delay the playback of a first packet of each talkspurt. The “de-jitter delay” may be calculated using an algorithm, or may be equal to the time it takes to receive voice data equal to the length of the de-jitter buffer delay.
Channel conditions, and thus jitter, may vary, and the delay of a de-jitter buffer may change from talkspurt to talkspurt to adapt to these changing conditions. While adapting the de-jitter delay, packets (representing both speech and silence) may be expanded or compressed, in a method referred to herein as “time-warping.” The perceived voice quality of communication may not be affected when speech packets are time-warped. However, in certain scenarios, when time-warping is applied to silence periods, voice quality may appear degraded. Thus, it is an objective of the present invention to provide a method and an apparatus for modifying the playback timing of talkspurts within a sentence without affecting intelligibility.
The following discussion is applicable to packetized communications, and in particular, details a voice communication, wherein the data, or speech and silence, originate at a source and are transmitted to a destination for playback. Speech communication is one example of an application of the present discussion. Other applications may include video communications, gaming communications, or other communications having characteristics, specifications and/or requirements similar to those of speech communications. For clarity, the following discussion describes a spread-spectrum communication system supporting packet data communications including, but not limited to, code division multiple access (CDMA) systems, orthogonal frequency division multiple access (OFDMA) systems, wideband code division multiple access (W-CDMA) systems, global system for mobile communications (GSM) systems, and systems supporting IEEE standards, such as 802.11 (a, b, g), 802.16, WiMAX, etc.
For transmission from AT 140, data/voice is provided from transmit processing unit 116 to encoder 118. Lower layer processing unit 120 processes the data for transmission to BS 110. For receipt of data from BS 110 at AT 130, data is received at lower layer processing unit 108. Packets of data are then sent to a de-jitter buffer 106, where they are stored until a required buffer length or delay is reached. Once this length or delay is attained, the de-jitter buffer 106 begins to send data to a decoder 104. The decoder 104 converts the packetized data to sampled voice and sends the packets to receive processing unit 102. In the present example, the behavior of AT 130 is analogous to AT 140.
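The receive-side buffering described above may be sketched as follows. This is a minimal illustration only, not the disclosed implementation: the class name `DejitterBuffer`, the `target_depth` parameter (a buffer length counted in packets), and the choice to resume filling after an underflow are all assumptions made for the example.

```python
from collections import deque


class DejitterBuffer:
    """Hypothetical sketch: hold packets until a target depth is reached."""

    def __init__(self, target_depth):
        self.target_depth = target_depth  # assumed unit: number of packets
        self._queue = deque()
        self._playing = False

    def push(self, packet):
        """Store a packet received from lower layer processing."""
        self._queue.append(packet)
        if len(self._queue) >= self.target_depth:
            # Required buffer length reached: start feeding the decoder.
            self._playing = True

    def pop(self):
        """Return the next packet for the decoder, or None if none is due."""
        if not self._playing:
            return None  # still filling
        if not self._queue:
            # Underflow: assumed policy is to go back to filling.
            self._playing = False
            return None
        return self._queue.popleft()
```

With `target_depth=2`, the first call to `pop()` after a single `push()` yields nothing; once a second packet has been stored, packets are released to the decoder in arrival order.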
A storage or de-jitter buffer is used in ATs, such as the ones described above, to conceal the effects of jitter.
In one example, the de-jitter buffer has an adaptive buffer memory and uses speech time warping to enhance its ability to track variable delay and jitter. In this example, the processing of the de-jitter buffer is coordinated with that of the decoder, wherein the de-jitter buffer identifies an opportunity or need to time-warp the packets and instructs the decoder to time-warp the packets. The decoder time-warps the packets by compressing or expanding the packets, as instructed by the de-jitter buffer. An adaptive de-jitter buffer is discussed further in co-pending U.S. application Ser. No. 11/215,931, entitled “METHOD AND APPARATUS FOR AN ADAPTIVE DE-JITTER BUFFER,” filed Aug. 30, 2005 and assigned to the assignee of the present disclosure. The adaptive de-jitter buffer may be a memory storage unit, wherein the status of the de-jitter buffer is a measure of the data (or the number of packets) stored in the adaptive de-jitter buffer. The data processed by the de-jitter buffer may be sent to a decoder or other utility from the de-jitter buffer. The encoded packets may correspond to a fixed amount of speech data, e.g., 20 msec corresponding to 160 samples of speech data, at 8 kHz sampling rate.
If a silence period consists of just a few frames, for instance when the silence period occurs within a sentence, voice quality may be affected by the expansion or compression of silence periods.
Since expansion or compression of short periods of silence may result in degradation, the length of the transmitted silence period may be maintained at the receiver. In one scenario, when intra-sentence silence periods are detected, such as the silence periods illustrated in
In another aspect of the present disclosure, the length of a silence period between talkspurts may be calculated using the difference in RTP timestamps between the last packet of a talkspurt and the first packet of the next talkspurt. The sequence number (SN) of a real-time transport protocol (RTP) packet increments by one for each transmitted packet. The SN is used by a receiver to restore packet sequence and to detect packet loss. The time stamp (TS) may reflect the sampling instant of a first octet in the RTP data packet. The sampling instant is derived from a clock that increments monotonically and linearly in time. In applications processing speech, the TS may be incremented by a constant delta that corresponds to the number of samples in each speech packet. For instance, an input device may receive speech packets having 160 sampling periods, thus TS is incremented by 160 for each packet.
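The timestamp-based calculation described above may be sketched as follows. The sketch assumes the 160-sample packets and 8 kHz sampling rate used as examples in this description, and relies on the RTP timestamp being a 32-bit field that wraps around. Subtracting the duration of the last speech packet from the timestamp difference is an assumption of this example, as are the function names.

```python
RTP_TS_MODULUS = 1 << 32  # RTP timestamps are 32-bit values and wrap around


def silence_length_samples(last_ts, next_ts, samples_per_packet=160):
    """Silence between two talkspurts, in samples.

    last_ts: TS of the last packet of the previous talkspurt.
    next_ts: TS of the first packet of the next talkspurt.
    """
    # Modular difference tolerates a single timestamp wrap-around.
    ts_gap = (next_ts - last_ts) % RTP_TS_MODULUS
    # Assumption: the last packet itself carries samples_per_packet samples
    # of speech, so only the remainder of the gap is silence.
    return max(ts_gap - samples_per_packet, 0)


def silence_length_seconds(last_ts, next_ts, samples_per_packet=160,
                           sample_rate_hz=8000):
    """Same silence length, converted to seconds."""
    samples = silence_length_samples(last_ts, next_ts, samples_per_packet)
    return samples / sample_rate_hz
```

For back-to-back packets within a talkspurt, the TS difference equals one packet (160), so the computed silence is zero.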
In another example, if the length of silence is too strictly maintained, a degree of freedom may be removed from the operation of the de-jitter buffer. A goal of a de-jitter buffer is to introduce an optimum delay in order to correct for jitter. This delay may be updated with changing channel conditions and in consideration of factors such as frame error rate, etc. If the length of silence is strictly maintained and a de-jitter buffer is designed to only adapt between sentences, inefficiencies may be introduced. For instance, during certain initial channel conditions, inter-sentence adaptation of the de-jitter buffer may prove sufficient. However, a sudden change in jitter conditions may result in the need to adapt between even short sentences. If this capability is disabled, the de-jitter buffer will not be able to adapt quickly enough to overall changing jitter conditions.
In order to operate the de-jitter buffer with a requisite degree of freedom while maintaining integrity of voice quality, an example of the disclosed invention aims to loosely maintain silence lengths between talkspurts occurring intra-sentence. To achieve this objective, the intra-sentence silence lengths may be adjusted by an amount calculated using an algorithm based on channel conditions, user input, etc. The resulting length of silence, although adjusted, approximates the length of the original silence in the voice source. In determining the adjusted length of silence, the effect of silence compression and silence expansion is taken into account. In certain scenarios, for instance, silence compression is more noticeable than silence expansion; therefore, only expansion may be triggered. Another factor taken into consideration is the length of the original silence. When the original silence in the voice source is relatively long, there is more flexibility in the amount of adjustment. For instance, if the original length of silence is 20 msec, expanding the silence by 40 msec at the receiver may be quite noticeable. On the other hand, if the original length of silence is 100 msec, expanding the silence by 40 msec at the receiver may not be very noticeable. Assuming the original length of silence in the voice source is X sec, an example of the present disclosure maintains a silence spacing of:
[X−a,X+b], where a=MIN(0.2*X,0.02) sec, and b=MIN(0.4*X,0.04) sec
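The interval above may be evaluated as in the following sketch, which simply transcribes the expressions for a and b. The function name `silence_spacing_bounds` is a hypothetical name chosen for this example.

```python
def silence_spacing_bounds(x_sec):
    """Return (X - a, X + b) in seconds for an original silence of X seconds."""
    a = min(0.2 * x_sec, 0.02)  # allowed compression, capped at 20 msec
    b = min(0.4 * x_sec, 0.04)  # allowed expansion, capped at 40 msec
    return x_sec - a, x_sec + b
```

For an original silence of 20 msec, this gives a = 4 msec and b = 8 msec, so the spacing is kept within 16 to 28 msec. For an original silence of 100 msec, both caps apply and the spacing is kept within 80 to 140 msec.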
According to one example, for the first talkspurt of each received sentence, the playback of the first packet may be delayed by Δ, where Δ is equal to the de-jitter buffer delay. For subsequent talkspurts of each sentence, the playback of the first packet may be delayed according to the example of the following algorithm:
Let arrival_time be the arrival time of the first packet. Let depth_playout_time be the time at which the first packet would have been played out if it were delayed by de-jitter buffer delay after its arrival. Also, let spacing_playout_time (n) be the time at which the first packet would have been played out if it maintained a spacing of n with the end of previous talkspurt. Let X be the actual spacing between the last packet of the previous talkspurt and the present packet. Let actual_delay denote the time at which the packet is played out. Then:
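The conditions themselves are not reproduced in this text. One reading consistent with the silence spacing [X−a, X+b] given earlier is to use depth_playout_time when it falls inside the interval bounded by spacing_playout_time(X−a) and spacing_playout_time(X+b), and otherwise to use the nearer bound. The sketch below illustrates that assumed reading only; the function name and the `prev_talkspurt_end` parameter (the time at which playback of the previous talkspurt ends) are hypothetical.

```python
def first_packet_playout_time(arrival_time, dejitter_delay,
                              prev_talkspurt_end, x_sec):
    """Assumed clamp of the playout time to the spacing [X - a, X + b].

    All arguments are in seconds. Returns the time at which the first
    packet of a subsequent talkspurt is played out.
    """
    a = min(0.2 * x_sec, 0.02)
    b = min(0.4 * x_sec, 0.04)

    # Playout time if the packet simply waited for the de-jitter delay.
    depth_playout_time = arrival_time + dejitter_delay
    # spacing_playout_time(n) = end of previous talkspurt + n
    earliest = prev_talkspurt_end + (x_sec - a)
    latest = prev_talkspurt_end + (x_sec + b)

    # Clamp: never closer than X - a, never farther than X + b.
    return min(max(depth_playout_time, earliest), latest)
```

A practical implementation would also have to handle a packet that arrives after the latest permitted playout time, which this sketch does not address.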
These conditions are illustrated in
The above method is illustrated further in the flowchart of
During silence intervals, packets are sent from adaptive de-jitter buffer and control unit 1108 to a discontinuous transmission (DTX) unit 1112, wherein DTX unit 1112 provides background noise information to decoder 1114. The packets provided by the de-jitter buffer and control unit 1108 are ready for decode processing and may be referred to as vocoder packets. The decoder 1114 decodes the packets. In another aspect of the present disclosure, a time warping unit may be enabled to time warp speech packets as disclosed in application '931 “METHOD AND APPARATUS FOR AN ADAPTIVE DE-JITTER BUFFER,” filed Aug. 30, 2005 and assigned to the assignee of the present disclosure. Pulse code modulated (PCM) speech samples are provided to the time warping unit 1116 from decoder 1114. Time warping unit 1116 may receive a time warping indicator from de-jitter buffer and control unit 1108. The indicator may indicate expand, compress, or no warping of speech packets as disclosed in the abovementioned application for patent.
While the specification describes particular examples of the present invention, those of ordinary skill can devise variations of the present invention without departing from the inventive concept. For example, the teachings herein refer to circuit-switched network elements but are equally applicable to packet-switched domain network elements. Also, the teachings herein are not limited to authentication triplet pairs but can also be applied to use of a single triplet including two SRES values (one of the customary format and one of the newer format disclosed herein).
Those skilled in the art will understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
Those skilled in the art will further appreciate that the various illustrative logical blocks, modules, circuits, methods and algorithms described in connection with the examples disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, methods and algorithms have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The various illustrative logical blocks, modules, and circuits described in connection with the examples disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The methods or algorithms described in connection with the examples disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. A storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC.
In one or more exemplary embodiments, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The previous description of the disclosed examples is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these examples will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other examples without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the examples shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.