1. Field of the Invention
The present invention relates in general to signal coding and more particularly, to variable bit rate speech coding.
2. Background
Speech coding is traditionally driven by bandwidth considerations and efficiency. As a result, modern communication systems typically implement various speech coding and compression techniques to reduce requirements on bandwidth and to achieve higher transmission efficiency.
One typical scheme for providing speech coding is a technique called Pulse Code Modulation (“PCM”), which is used to convert speech signals into digital form and is widely used by telephone companies in their T1 circuits. Every minute of the day, millions of telephone conversations, as well as data transmissions via modems, are converted into digital form via PCM for transport over high-speed intercity trunks. PCM samples the analog waveform 8,000 times per second and converts each sample into an 8-bit number, resulting in a 64 kbps data stream (8,000 samples/s × 8 bits/sample). The PCM technique has been adopted by the International Telecommunication Union (“ITU”) in the G.711 standard, which defines a single-rate coding method at 64 kbps.
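The 64 kbps figure follows directly from the sampling rate and the code width. The sketch below shows that arithmetic together with a simple 8-bit quantizer. It assumes a *uniform* quantizer purely for illustration; G.711 itself applies logarithmic (μ-law or A-law) companding before forming its 8-bit codes, and the function names are hypothetical.

```python
# Minimal sketch of the PCM arithmetic described above.
# Assumption: a *uniform* 8-bit quantizer is used for illustration only;
# G.711 itself applies logarithmic (mu-law or A-law) companding.

SAMPLE_RATE_HZ = 8_000   # samples per second
BITS_PER_SAMPLE = 8      # bits per coded sample


def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Return the PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample


def quantize_uniform(x: float, bits: int = BITS_PER_SAMPLE) -> int:
    """Map a sample in [-1.0, 1.0] to an unsigned code in [0, 2**bits - 1]."""
    levels = 2 ** bits
    x = max(-1.0, min(1.0, x))                         # clip to the valid input range
    return int((x + 1.0) / 2.0 * (levels - 1) + 0.5)   # round to the nearest level


print(pcm_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE))  # 64000
print(quantize_uniform(-1.0), quantize_uniform(1.0))  # 0 255
```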
Another technique adopted by the ITU, Adaptive Differential PCM (“ADPCM”), also converts analog sound, such as speech, into digital form. Instead of coding an absolute measurement at each sample point, ADPCM codes the difference between samples, and it can dynamically switch the coding scale (step size) to compensate for variations in amplitude. The ITU standards that use this technique include G.721 (32 kbps), G.722 (64 kbps), G.723 (24 kbps and 40 kbps), G.726 (16, 24, 32 and 40 kbps) and G.727 (16, 24, 32 and 40 kbps).
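The two ideas in the paragraph above, coding the difference rather than the sample and adapting the coding scale, can be illustrated with a toy codec. This is a sketch under stated assumptions, not any ITU algorithm: it assumes 4-bit signed codes, a predictor equal to the previous reconstructed sample, and a hypothetical multiplicative step-size rule. The encoder tracks the decoder's reconstruction so the two stay in lockstep.

```python
# Toy adaptive differential PCM (ADPCM) codec.
# Assumptions (not taken from any ITU recommendation): 4-bit signed codes,
# a predictor equal to the previous reconstructed sample, and a simple
# multiplicative step-size adaptation rule.

from typing import List

MIN_STEP = 1.0
MAX_STEP = 4096.0


def _adapt(step: float, code: int) -> float:
    """Grow the step after large codes, shrink it after small ones."""
    factor = 1.6 if abs(code) >= 4 else 0.9
    return min(MAX_STEP, max(MIN_STEP, step * factor))


def adpcm_encode(samples: List[float], initial_step: float = 16.0) -> List[int]:
    codes: List[int] = []
    predicted, step = 0.0, initial_step
    for s in samples:
        diff = s - predicted                         # code the difference, not the sample
        code = max(-8, min(7, round(diff / step)))   # clamp to a 4-bit signed code
        codes.append(code)
        predicted += code * step                     # mirror what the decoder rebuilds
        step = _adapt(step, code)
    return codes


def adpcm_decode(codes: List[int], initial_step: float = 16.0) -> List[float]:
    out: List[float] = []
    predicted, step = 0.0, initial_step
    for code in codes:
        predicted += code * step
        out.append(predicted)
        step = _adapt(step, code)
    return out
```

Because the step shrinks while codes stay small and grows after large codes, a steady input converges to a fine step, while a sudden jump quickly enlarges it.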
A more recent ITU standard, the G.729 family, has adopted the Code Excited Linear Prediction (“CELP”) technique in its main body and Annex A (8 kbps), Annex B (0 kbps and 1.5 kbps), Annex D (6.4 kbps), Annex E (11.2 kbps), and Annex I (0, 1.5, 6.4, 8 and 11.2 kbps). These coders achieve high compression ratios while delivering toll-quality narrow-band (telephone-band) audio. A similar method is used in G.723.1 (5.3 kbps and 6.3 kbps). A method called Low-Delay CELP (“LD-CELP”) is used in the G.728 (16 kbps) standard; it provides near-toll-quality audio with lower delay by processing smaller blocks of samples at a time.
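CELP coders emit a fixed-size payload for each frame, so each rate corresponds to a number of bits per frame. The sketch below assumes the 10 ms frame used by the G.729 family and takes the rates as listed above; the dictionary labels are illustrative.

```python
# Bits carried by one coded frame for a frame-based (CELP-style) coder.
# Assumption: a 10 ms frame, the frame length used by the G.729 family;
# the rates below are those listed in the text, in bits per second.

G729_FRAME_MS = 10

RATES_BPS = {
    "G.729 / Annex A": 8_000,
    "G.729 Annex B (SID)": 1_500,
    "G.729 Annex D": 6_400,
    "G.729 Annex E": 11_200,
}


def bits_per_frame(rate_bps: int, frame_ms: int = G729_FRAME_MS) -> int:
    """Bits in one frame = rate (bit/s) x frame duration (s)."""
    bits, remainder = divmod(rate_bps * frame_ms, 1000)
    if remainder:
        raise ValueError("rate does not yield a whole number of bits per frame")
    return bits


for name, rate in RATES_BPS.items():
    print(f"{name}: {bits_per_frame(rate)} bits per {G729_FRAME_MS} ms frame")
```

Integer arithmetic in bit/s avoids floating-point rounding when converting kbps rates to bits.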
As noted above, the G.723, G.726, G.727, G.729 Annex I and G.723.1 standards define a multi-rate capability for speech data transfer. Today, network providers such as AT&T, MCI or Sprint take advantage of these multiple rates by controlling data bit rates according to predetermined factors, such as the time of day or the particular usage of the network. For example, a network provider may decide to save network bandwidth during business hours by limiting the data bit rate to 6.4 kbps, and then increase the data bit rate to 11.2 kbps after business hours. A network provider may also allocate certain lines for high-quality speech data transfer during specific hours.
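The conventional policy described above selects a rate from predetermined factors alone. A minimal sketch follows, assuming business hours of 09:00 to 17:00 and a hypothetical `premium_line` flag for lines reserved for high-quality transfer. Note that the content being transmitted never enters the decision.

```python
# Sketch of the conventional, predetermined rate policy described above.
# Assumptions: "business hours" are 09:00-17:00 local time, and a
# hypothetical premium_line flag marks lines reserved for high quality.

from datetime import time

BUSINESS_START = time(9, 0)
BUSINESS_END = time(17, 0)
LOW_RATE_KBPS = 6.4
HIGH_RATE_KBPS = 11.2


def provider_rate_kbps(now: time, premium_line: bool = False) -> float:
    """Pick a rate from the clock alone, ignoring what is being transmitted."""
    if premium_line:
        return HIGH_RATE_KBPS
    if BUSINESS_START <= now < BUSINESS_END:
        return LOW_RATE_KBPS      # save bandwidth during business hours
    return HIGH_RATE_KBPS         # more bandwidth available after hours
```

For example, `provider_rate_kbps(time(10, 0))` yields 6.4 even if the frame being sent is music.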
As shown in
While such traditional multi-rate speech encoders have been successfully implemented in digital communication systems, they are restricted in use and application. Such systems are inflexible because data bit rates are set based on predetermined factors that may or may not hold true for the signal actually being transmitted. As a result, too little or too much network bandwidth may be used for a given speech signal. For example, high-quality speech, such as music, may be transmitted on a communication channel set to a low data rate, thus causing degradation in quality. Conversely, a high data rate communication channel is wasted if it carries only lower-quality speech, such as ordinary voice, that does not require high bandwidth.
Accordingly, there is a strong need in the art for a flexible speech encoder that can efficiently utilize the bandwidth of a given communication channel. Furthermore, there is a need in the industry for a speech encoding system that can combine various speech encoding schemes while maintaining interoperability with existing speech decoders and standards.
In accordance with the purpose of the present invention as broadly described herein, there is provided a method and system for rate determination coding.
In one embodiment, the present invention includes a data rate determinator and a plurality of data signal encoders. The data rate determinator determines the data rate for the data signal and selects one of the data signal encoders based on the determined data rate, and the data signal is encoded accordingly.
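The determinator-plus-encoders arrangement can be sketched as a bank of encoders keyed by rate, with a rule that maps a signal to a required rate. Everything here is an assumption for illustration: `stub_encoder` only emits a zero payload of the right size, `rms_rate_rule` is a hypothetical loudness-based rule rather than the classification the invention contemplates, and the selection policy (lowest available rate that meets the requirement) is one reasonable choice among several.

```python
# Sketch of a data rate determinator driving a bank of encoders.
# Assumptions: encoders are plain callables keyed by bit rate (bit/s);
# stub_encoder and rms_rate_rule are hypothetical placeholders, not codecs.

import math
from typing import Callable, Dict, List, Tuple

Encoder = Callable[[List[float]], bytes]


def stub_encoder(rate_bps: int, frame_ms: int = 10) -> Encoder:
    """Hypothetical encoder that only emits a zero payload of the right size."""
    n_bytes = (rate_bps * frame_ms // 1000 + 7) // 8
    return lambda signal: bytes(n_bytes)


def rms_rate_rule(signal: List[float]) -> int:
    """Hypothetical rule: quieter signals are assigned lower rates."""
    rms = math.sqrt(sum(x * x for x in signal) / len(signal)) if signal else 0.0
    if rms < 0.01:
        return 1_500
    return 6_400 if rms < 0.3 else 11_200


class DataRateDeterminator:
    def __init__(self, encoders: Dict[int, Encoder],
                 rule: Callable[[List[float]], int]) -> None:
        self.encoders = dict(sorted(encoders.items()))  # ascending by rate
        self.rule = rule

    def select(self, required_bps: int) -> int:
        """Lowest available rate meeting the requirement, else the highest."""
        for rate in self.encoders:
            if rate >= required_bps:
                return rate
        return max(self.encoders)

    def encode(self, signal: List[float]) -> Tuple[int, bytes]:
        rate = self.select(self.rule(signal))
        return rate, self.encoders[rate](signal)
```

Usage: with `bank = {r: stub_encoder(r) for r in (1500, 6400, 11200)}`, `DataRateDeterminator(bank, rms_rate_rule).encode([0.0] * 80)` selects the 1,500 bit/s encoder.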
In another embodiment, the system includes a plurality of speech encoders, a network controller capable of selecting at least two of the speech encoders and a data rate determinator capable of determining the data rate of the speech signal and selecting, according to the data rate, one of the speech encoders selected by the network controller.
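The two-stage selection in this embodiment can be sketched as follows: the network controller enables a subset of encoders, and the rate determinator chooses only among that subset. The assumptions are that encoders are identified by their rate in bit/s and that the `enabled` list is a hypothetical policy input, for example one set by an operator.

```python
# Sketch of two-stage selection: a network controller enables a subset of
# encoders, then the rate determinator picks among that subset.
# Assumptions: encoders are identified by rate (bit/s); `enabled` is a
# hypothetical policy input.

from typing import Iterable, List


def network_controller(available: Iterable[int], enabled: Iterable[int]) -> List[int]:
    """Return the enabled encoders (at least two, per the embodiment), ascending."""
    subset = sorted(set(available) & set(enabled))
    if len(subset) < 2:
        raise ValueError("the network controller must enable at least two encoders")
    return subset


def choose_encoder(required_bps: int, subset: List[int]) -> int:
    """Lowest enabled rate meeting the requirement; otherwise the highest enabled."""
    for rate in subset:
        if rate >= required_bps:
            return rate
    return subset[-1]
```

For example, if the controller enables only 6,400 and 11,200 bit/s, a frame that needs 8,000 bit/s is sent at 11,200 bit/s, because the 8,000 bit/s encoder is not available to the determinator.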
In one aspect of the present invention, the data or speech signal includes a number of frames and the data rate determinator determines the data rate of each of the frames and selects one of the encoders based on the data rate of each frame. The signal is then encoded frame-by-frame. In another aspect of the present invention, different encoding standards may be utilized for encoding various frames of the signal.
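Frame-by-frame operation can be sketched as a loop that splits the signal into frames, classifies each frame, and tags each encoded frame with the encoder used. The sketch assumes 8 kHz input, 10 ms (80-sample) frames and a zero-padded final frame; the classifier and encoder names are hypothetical callables supplied by the caller.

```python
# Sketch of frame-by-frame encoding in which consecutive frames may use
# different encoders. Assumptions: 8 kHz input, 10 ms (80-sample) frames,
# a zero-padded final frame, and caller-supplied (hypothetical) classifier
# and encoders.

from typing import Callable, Dict, List, Tuple

FRAME_LEN = 80  # 10 ms at 8,000 samples per second


def split_frames(signal: List[float], frame_len: int = FRAME_LEN) -> List[List[float]]:
    frames = [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]
    if frames and len(frames[-1]) < frame_len:
        frames[-1] = frames[-1] + [0.0] * (frame_len - len(frames[-1]))  # pad last frame
    return frames


def encode_frames(signal: List[float],
                  classify: Callable[[List[float]], str],
                  encoders: Dict[str, Callable[[List[float]], bytes]]
                  ) -> List[Tuple[str, bytes]]:
    """Classify each frame, encode it with the chosen encoder, and tag the result."""
    out: List[Tuple[str, bytes]] = []
    for frame in split_frames(signal):
        name = classify(frame)
        out.append((name, encoders[name](frame)))
    return out
```

Keeping the encoder name alongside each payload is what allows different standards to be mixed within one signal.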
Other aspects of the present invention will become apparent with further reference to the drawings and specification, which follow.
The features and advantages of the present invention will become more readily apparent to those ordinarily skilled in the art after reviewing the following detailed description and accompanying drawings, wherein:
An embodiment of the present invention is shown in
As shown, speech signal 210 enters the encoding system 200 for transmission over communication channel 260. A “communication channel” refers to the medium or channel of communication. The communication channel may include, but is not limited to, a telephone line, a modem connection, an Internet connection, an Integrated Services Digital Network (“ISDN”) connection, an Asynchronous Transfer Mode (“ATM”) connection, a frame relay connection, an Ethernet connection, a coaxial connection, a fiber optic connection, satellite connections (e.g., Digital Satellite Services), wireless connections, radio frequency (“RF”) links, electromagnetic links, two-way paging connections, etc., and combinations thereof.
In accordance with the practices of persons skilled in the art of computer programming, the present invention is described below with reference to symbolic representations of operations that are performed by the system 200 (
When implemented in software, the elements of the present invention are essentially the code segments to perform the necessary tasks. The program or code segments can be stored in a processor readable medium or transmitted by a data signal embodied in a carrier wave over a transmission medium or communication link. The “processor readable medium” may include any medium that can store or transfer information. Examples of the processor readable medium include an electronic circuit, a semiconductor memory device, a ROM, a flash memory, an erasable ROM (EROM), a floppy diskette, a CD-ROM, an optical disk, a hard disk, a fiber optic medium, a radio frequency (RF) link, etc. The computer data signal may include any signal that can propagate over a transmission medium such as electronic network channels, optical fibers, air, electromagnetic, RF links, etc. The code segments may be downloaded via computer networks such as the Internet, Intranet, etc.
Returning to
For example, if a speech frame has the shape or characteristics of a male voice, the rate determination controller 220 may position the encoder selector 212 to select a medium data rate speech encoder, such as speech encoder 230 (G.729, 6.4 kbps), to encode that frame. If, for the next frame, the rate determination controller 220 finds a higher-quality speech frame, such as music-like speech, it may position the encoder selector 212 to select a high data rate encoder, such as speech encoder 250 (G.729, 11.2 kbps), to encode that frame and prevent quality degradation. In one embodiment, speech encoder 250 of the system 200 may instead be a G.727 ADPCM 24 kbps encoder; in that case, positioning the encoder selector 212 at speech encoder 250 would cause the speech frame to be encoded using the G.727 standard.
It should be noted that, according to one embodiment of the present invention, any number of speech encoders conforming to different standards may be included in the speech encoding system 200. Such an embodiment, of course, requires a complementary speech decoding system that supports each of these speech encoders in order to decode the speech on a frame-by-frame basis.
However, in some embodiments, the speech encoding system 200 may encode the speech frames using various speech encoders belonging to a single standard, such as G.729 Annex I. Such systems are advantageous because they require no change to conventional decoding systems that support that standard.
The rate determination controller 220 may be implemented as hardware, firmware or software, or any combination thereof. The resulting bit stream from each of the speech encoders 230, 240 and 250 is provided to communication channel 260.
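A complementary decoder for a mixed-standard stream needs some way to learn which encoder produced each frame. The sketch below assumes a hypothetical framing in which every coded frame carries a one-byte encoder identifier and a two-byte payload length ahead of the payload; actual systems may signal this differently. The decoder then dispatches frame by frame to whichever decoder is registered for that identifier.

```python
# Sketch of a complementary decoder for a mixed-standard stream.
# Assumption (hypothetical framing): each coded frame carries a one-byte
# encoder identifier and a two-byte payload length before its payload.

import struct
from typing import Callable, Dict, List, Tuple

HEADER = struct.Struct("!BH")  # encoder id, payload length (network byte order)


def pack_frame(encoder_id: int, payload: bytes) -> bytes:
    return HEADER.pack(encoder_id, len(payload)) + payload


def unpack_stream(stream: bytes) -> List[Tuple[int, bytes]]:
    frames: List[Tuple[int, bytes]] = []
    pos = 0
    while pos < len(stream):
        if pos + HEADER.size > len(stream):
            raise ValueError("truncated frame header")
        encoder_id, length = HEADER.unpack_from(stream, pos)
        pos += HEADER.size
        if pos + length > len(stream):
            raise ValueError("truncated frame payload")
        frames.append((encoder_id, stream[pos:pos + length]))
        pos += length
    return frames


def decode_stream(stream: bytes,
                  decoders: Dict[int, Callable[[bytes], List[float]]]) -> List[float]:
    """Route each frame to the decoder registered for its encoder id."""
    out: List[float] = []
    for encoder_id, payload in unpack_stream(stream):
        if encoder_id not in decoders:
            raise KeyError(f"no decoder installed for encoder id {encoder_id}")
        out.extend(decoders[encoder_id](payload))
    return out
```

A stream that mixes identifiers 1 and 2 decodes correctly only if decoders for both are registered, which mirrors the requirement that the decoding system support every encoder in use.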
As described above, speech signal 210 is first routed to the rate determination controller 220 on a frame-by-frame basis. The rate determination controller 220 then analyzes a predetermined flag in the header of each speech frame to determine the classification of that frame. For example, the value of the flag may indicate that the speech frame is a non-active speech signal (background noise or silence) and is thus to be processed by a low bit rate encoder. Alternatively, the value of the flag may indicate that the speech frame is active speech of high quality, such as music, and is thus to be processed by a high bit rate encoder, or that it is active speech of medium quality, such as a male voice, and is thus to be processed by a medium bit rate encoder. Once the encoding scheme is determined, the speech frame is routed to one of the speech encoders 1 . . . n via the encoder selector 212. It is understood that classification of the input speech may be accomplished by any type of control circuit or software, based on a predetermined standard, criterion or set of criteria, or based on system requirements and/or needs.
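The flag-based routing described above can be sketched as a lookup from a header flag to a frame class and then to an encoder. The flag position (first byte of the frame header) and the numeric flag values are hypothetical; the three classes and their low, medium and high bit-rate assignments follow the examples in the text.

```python
# Sketch of routing a frame by a classification flag carried in its header.
# Assumptions: the flag is the first byte of the frame header and uses the
# hypothetical values below; class-to-encoder mapping follows the text.

from enum import IntEnum
from typing import Tuple


class FrameClass(IntEnum):
    NON_ACTIVE = 0      # background noise or silence
    ACTIVE_MEDIUM = 1   # e.g., male voice
    ACTIVE_HIGH = 2     # e.g., music


ENCODER_FOR_CLASS = {
    FrameClass.NON_ACTIVE: "low-rate encoder",
    FrameClass.ACTIVE_MEDIUM: "medium-rate encoder",
    FrameClass.ACTIVE_HIGH: "high-rate encoder",
}


def route_frame(frame: bytes) -> Tuple[FrameClass, str]:
    """Read the header flag and return the frame class and the encoder to use."""
    if not frame:
        raise ValueError("empty frame has no header")
    try:
        cls = FrameClass(frame[0])
    except ValueError:
        raise ValueError(f"unknown classification flag {frame[0]}") from None
    return cls, ENCODER_FOR_CLASS[cls]
```

Rejecting unknown flags explicitly keeps a corrupted header from silently selecting the wrong encoder.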
Turning to
Just as explained above in relation to the embodiment of
The present invention thus provides an apparatus and method for flexible variable bit rate encoding. The flexible encoding scheme facilitates encoding of speech using any desired standard, criterion or fixed bit-rate encoder. In one embodiment, the speech encoders 440–480 may be existing fixed bit-rate encoders, such as GSM EFR (Enhanced Full Rate) or IS-641 (TIA/EIA TDMA standard); in other embodiments, the speech encoders 440–480 may include the encoders of a single multi-rate standard, such as GSM AMR (Adaptive Multi-Rate), or any combination of the above.
At any given time interval, speech may be encoded using one or a plurality of standards and/or criteria. The encoding system of the invention may interface with a decoding system based on existing standards. Alternatively, it may interface with a decoding system implemented using new standards or a decoding system with a combination of existing and new standards. In this manner, the invention provides flexibility in choice of standards, bandwidth requirements or quality of service, while enabling use with existing systems and/or new systems. Existing decoding systems may interface with the encoding system of the invention without any change or alteration. At the same time, the encoding system may accommodate the use of new standards while providing flexibility of choice.
The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.