This application claims the priority of Korean Patent Application No. 10-2004-0109267 filed on Dec. 21, 2004, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein by reference.
1. Field of the Invention
The present invention relates to methods and apparatuses for encoding and decoding, and more particularly, to methods and apparatuses for low bit rate encoding and decoding, which can efficiently compress data at a low bit rate while maintaining high sound quality.
2. Description of Related Art
Information carrier waves are analog signals, which are continuous in time and amplitude. Accordingly, in order to represent the information carrier waves in a discrete form, analog-to-digital (A/D) conversion is used. A/D conversion comprises two processes: discretization in time (sampling) and quantization of amplitude. Sampling is a process that converts time-continuous signals into time-discrete signals. Amplitude quantization is a process that restricts the amplitudes of the discrete signals to a finite number of possible levels. Namely, amplitude quantization replaces an input amplitude x(n) with an output y(n) chosen from a limited set of possible amplitude levels.
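The two A/D conversion processes can be illustrated with a minimal sketch. The function names, the 8 kHz sampling rate, the 440 Hz test tone, and the 8-bit uniform quantizer are assumptions chosen for illustration only and are not taken from the described embodiments.

```python
import math

def sample(signal, duration_s, sample_rate_hz):
    """Discretize in time: evaluate a continuous-time function at n / sample_rate_hz."""
    n_samples = round(duration_s * sample_rate_hz)
    return [signal(n / sample_rate_hz) for n in range(n_samples)]

def quantize(x, n_bits, full_scale=1.0):
    """Uniform quantizer: replace each amplitude x(n) with the nearest allowed level y(n)."""
    max_code = 2 ** (n_bits - 1) - 1      # e.g. 127 for 8 bits
    step = full_scale / max_code          # spacing between adjacent levels
    y = []
    for value in x:
        code = round(value / step)
        code = max(-max_code, min(max_code, code))  # clip to the representable range
        y.append(code * step)
    return y

def tone(t):
    """Hypothetical analog input: a 440 Hz sine wave with amplitude 0.8."""
    return 0.8 * math.sin(2.0 * math.pi * 440.0 * t)

x = sample(tone, 0.01, 8000)   # 10 ms sampled at 8 kHz
y = quantize(x, 8)             # 8-bit amplitude quantization
```

As long as the input stays within full scale, the error between x(n) and y(n) is at most half a quantization step.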
Generally, digital data is obtained after sampling and amplitude quantization of analog signals. It is then stored in a recording/storage medium, such as a compact disc (CD) or a digital audio tape (DAT), in pulse code modulation (PCM) format to be reproduced as needed. In comparison with analog schemes, the PCM scheme for storage and reproduction helps to improve sound quality and to prevent degradation over time, but it requires large amounts of data to be stored and communicated.
To solve this problem of the PCM scheme, differential pulse code modulation (DPCM) and adaptive differential pulse code modulation (ADPCM) schemes have been developed. Attempts have been made to reduce the amount of digital audio data using these schemes; however, their efficiencies vary greatly depending on signal types. In the Moving Picture Experts Group (MPEG)/audio scheme, which has recently been standardized by the International Organization for Standardization (ISO), and in the AC-2/AC-3 scheme, developed by Dolby Laboratories Inc., the human psychoacoustic model has been used to efficiently reduce the amount of data.
In known audio data compression schemes, such as MPEG-1/audio, MPEG-2/audio, or AC-2/AC-3, signals in the time domain, which are grouped into blocks of a set size, are transformed into signals in the frequency domain. The transformed signals are then subjected to scalar quantization using the human psychoacoustic model. The scalar quantization is simple, but not optimal, even when input samples are statistically independent, and it is considerably less efficient when input samples are statistically dependent. To compensate for this, lossless encoding, such as entropy encoding, or some type of adaptive quantization is incorporated into the encoding process. Consequently, such audio data compression schemes are much more complicated than schemes that only store PCM data, and their bitstreams contain not only quantized PCM data but also additional information for data compression.
An MPEG/audio standardized scheme or an AC-2/AC-3 scheme provides sound quality comparable to that of a compact disc at a bit rate of between 64 and 384 kbps, using one-eighth to one-sixth of the data required by other known digital encoding methods. Thus, the MPEG/audio standard is expected to play an important role in storing and communicating audio signals in multimedia systems, such as digital audio broadcasting (DAB), audio on demand (AOD), and Internet phones.
Unfortunately, when encoding at a low bit rate below 32 kbps, an encoding method that relies only on signal quantization lacks sufficient bits for encoding. Accordingly, there is a need for an efficient method for low bit rate compression of audio signals that can maintain close-to-original sound reproduction.
An aspect of the present invention provides a method and apparatus for low bit rate encoding and decoding, which provides efficient data compression and close-to-original sound reproduction.
According to an aspect of the present invention, there is provided a method of low bit rate encoding including: transforming input audio signals in a time domain into spectral signals in a frequency domain; extracting important-spectrum components from the spectral signals in the frequency domain, and quantizing the important-spectrum components; extracting residual-spectrum components other than the important-spectrum components from the spectral signals in the frequency domain, and calculating and quantizing a noise level of the residual-spectrum components; and encoding the quantized important-spectrum components and the quantized noise level losslessly, and outputting encoded bitstreams.
According to another aspect of the present invention, there is provided an apparatus for low bit rate encoding including an important-spectrum component processing unit that extracts important-spectrum components from a spectral signal in a frequency domain and quantizes the important-spectrum components, a noise component processing unit that extracts residual-spectrum components other than the important-spectrum components from the spectral signal in the frequency domain, and calculates and quantizes noise levels for the residual-spectrum components, and a lossless encoding unit that encodes the important-spectrum components and the noise level losslessly, and outputs encoded bitstreams.
According to still another aspect of the present invention, there is provided a method of low bit rate decoding including: decoding input bitstreams into spectral signals losslessly; dequantizing quantized important-spectrum components of the decoded spectral signals; dequantizing a noise level of additional information of the decoded spectral signals to generate noise components; combining the dequantized important-spectrum components and the noise components to be output as spectral signals in a frequency domain; and generating spectral signals in a time domain from the spectral signals in the frequency domain.
According to still another aspect of the present invention, there is provided an apparatus for low bit rate decoding including a lossless decoding unit that decodes input bitstreams into spectral signals losslessly, an important-spectrum component dequantizing unit that dequantizes quantized important-spectrum components of the decoded spectral signals, a noise component processing unit that dequantizes a noise level of additional information of the decoded spectral signals to generate noise components, a spectrum combining unit that combines the dequantized important-spectrum components and the noise components to be output as spectral signals in a frequency domain, and a signal generating unit that generates spectral signals in a time domain from the spectral signals in the frequency domain.
According to still other aspects of the present invention, there are provided computer-readable storage media encoded with processing instructions for causing a processor to execute the above-described methods.
Additional and/or other aspects and advantages of the present invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
The above and/or other aspects and advantages of the present invention will become apparent and more readily appreciated from the following detailed description, taken in conjunction with the accompanying drawings of which:
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments are described below in order to explain the present invention by referring to the figures.
The signal transforming unit 100 transforms audio signals in the time domain into spectral signals in the frequency domain. A modified discrete cosine transform (MDCT) can be applied to perform the time-to-frequency transformation. In addition, the signal transforming unit 100 divides the frequency components into several sub-bands.
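The time-to-frequency transformation can be sketched as a direct MDCT of one frame followed by a split into sub-bands. This is a minimal illustration under stated assumptions: the sine window, the O(N^2) direct evaluation, the equal-width sub-band layout, and the names `mdct` and `split_into_subbands` are not specified by the described embodiments.

```python
import math

def sine_window(length):
    """Sine window, a common MDCT window choice (assumed here)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(frame):
    """Direct MDCT: 2N windowed time samples -> N spectral coefficients."""
    two_n = len(frame)
    n_half = two_n // 2
    window = sine_window(two_n)
    phase = 0.5 + n_half / 2.0
    coeffs = []
    for k in range(n_half):
        acc = 0.0
        for n in range(two_n):
            acc += window[n] * frame[n] * math.cos(
                math.pi / n_half * (n + phase) * (k + 0.5))
        coeffs.append(acc)
    return coeffs

def split_into_subbands(coeffs, n_bands):
    """Equal-width sub-bands (an assumption; real codecs often use non-uniform bands)."""
    width = len(coeffs) // n_bands
    return [coeffs[b * width:(b + 1) * width] for b in range(n_bands)]
```

A practical encoder would use an FFT-based MDCT and overlap successive frames by half their length; the direct form is shown only because it mirrors the transform definition.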
The psychoacoustic modeling unit 110 calculates encoding bit-assignment information for each sub-band created by the signal transforming unit 100 to remove perceptual redundancy due to characteristics of the human auditory system. The psychoacoustic modeling unit 110 exploits human auditory characteristics to omit information to which the human auditory system is insensitive, and assigns a different number of bits to each frequency to reduce the amount of coding. It calculates the encoding bit-assignment information in the context of psychoacoustics, and outputs the calculated information to the important-spectrum component processing unit 120 and the noise component processing unit 130.
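One common way to derive bit-assignment information from a psychoacoustic model is a greedy allocation driven by per-band signal-to-mask ratios. The sketch below illustrates that general technique and is not the allocation rule of the described embodiments; the function name `allocate_bits`, the input `smr_db`, and the rule of thumb of about 6.02 dB of noise reduction per bit are assumptions.

```python
def allocate_bits(smr_db, total_bits, max_bits_per_band=15):
    """Greedy bit allocation from signal-to-mask ratios (in dB), one value per sub-band.

    Each pass gives one bit to the band whose quantization noise exceeds the
    masking threshold by the largest margin. Bands already masked get no bits.
    """
    bits = [0] * len(smr_db)
    nmr = list(smr_db)  # noise-to-mask ratio; with zero bits it equals the SMR
    for _ in range(total_bits):
        candidates = [b for b in range(len(bits)) if bits[b] < max_bits_per_band]
        if not candidates:
            break
        worst = max(candidates, key=lambda b: nmr[b])
        if nmr[worst] <= 0.0:
            break  # all remaining noise is below the masking threshold
        bits[worst] += 1
        nmr[worst] -= 6.02  # assumed noise reduction per added bit
    return bits

# Hypothetical SMR values for four sub-bands and a budget of six bits.
assignment = allocate_bits([20.0, 5.0, -3.0, 12.0], 6)
```

In this example the band with a negative signal-to-mask ratio receives no bits, because its content is assumed to be inaudible.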
The important-spectrum component processing unit 120 extracts important-spectrum components from spectral signals in the frequency domain, output by the signal transforming unit 100, and quantizes the important-spectrum components. The important-spectrum component processing unit 120 comprises an important-spectrum component extracting unit 121 and an important-spectrum component quantizing unit 122. The important-spectrum component extracting unit 121 determines and extracts important spectrum components for each spectrum range. The important-spectrum component quantizing unit 122 quantizes the important spectrum components extracted by the important-spectrum component extracting unit 121 at a bit rate according to the encoding bit-assignment information output by the psychoacoustic modeling unit 110.
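The extraction and quantization performed by units 121 and 122 can be sketched as follows. The selection rule used here (a coefficient is important when its magnitude exceeds a per-band threshold) and the uniform quantizer step are assumptions for illustration; the description above does not state the criterion, and the function names are hypothetical.

```python
def extract_important(spectrum, thresholds, band_width):
    """Split a spectrum into important and residual components.

    Assumed rule: a coefficient is important when its magnitude exceeds
    the threshold of the sub-band it belongs to.
    """
    important = [0.0] * len(spectrum)
    residual = [0.0] * len(spectrum)
    for k, value in enumerate(spectrum):
        band = min(k // band_width, len(thresholds) - 1)
        if abs(value) > thresholds[band]:
            important[k] = value
        else:
            residual[k] = value
    return important, residual

def quantize_important(important, step):
    """Uniform scalar quantization; a smaller step corresponds to more assigned bits."""
    return [int(round(value / step)) for value in important]

# Hypothetical 8-bin spectrum split into two sub-bands of four bins each.
spectrum = [0.1, 5.0, -0.2, 0.05, -3.0, 0.3, 0.1, 0.2]
important, residual = extract_important(spectrum, [1.0, 1.0], 4)
codes = quantize_important(important, 0.5)
```

The residual array holds the components that are left for the noise component processing unit 130.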
The noise component processing unit 130 extracts residual-spectrum components other than important-spectrum components, and calculates and quantizes a noise level for the residual-spectrum components. The noise component processing unit 130 will later be explained in more detail.
The lossless encoding unit 140 receives quantized spectral signals from the important-spectrum component processing unit 120 and the noise component processing unit 130, losslessly encodes the spectral signals, and outputs encoded bitstreams. Lossless encoding methods, such as Huffman coding and arithmetic coding, can achieve efficient compression.
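As an illustration of the lossless encoding step, the following minimal sketch builds a Huffman code for a sequence of quantized values and shows that decoding recovers the sequence exactly. The function names and the sample data are assumptions; an actual encoder would typically use predefined code tables rather than building a code per frame.

```python
import heapq
from collections import Counter

def build_huffman_code(symbols):
    """Return a dict mapping each symbol to a prefix-free bit string."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(weight, i, {sym: ""}) for i, (sym, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in c1.items()}
        merged.update({sym: "1" + code for sym, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

def encode(symbols, code):
    return "".join(code[sym] for sym in symbols)

def decode(bits, code):
    inverse = {bit_string: sym for sym, bit_string in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out

# Hypothetical quantized spectrum: mostly zeros with a few large values.
quantized = [0, 0, 0, 10, 0, 0, -6, 0, 0, 0, 1, 0]
code = build_huffman_code(quantized)
bits = encode(quantized, code)
```

Because zeros dominate the quantized spectrum, the zero symbol receives the shortest code word, which is where the compression gain comes from.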
Referring to
Referring to
In operation S310, the psychoacoustic modeling unit 110 calculates encoding bit-assignment information to be assigned to each of the sub-bands, in order to remove perceptual redundancy that occurs due to human auditory characteristics. The psychoacoustic modeling unit 110 calculates the encoding bit-assignment information in terms of psychoacoustics, thereby assigning more bits to frequencies to which human hearing is more sensitive and fewer bits to frequencies to which it is less sensitive.
In operation S320, the important-spectrum component processing unit 120 extracts important-spectrum components from the spectral signal in the frequency domain output by the signal transforming unit 100 and quantizes the important-spectrum components.
In operation S330, the noise component processing unit 130 extracts residual-spectrum components other than the important-spectrum components from the spectral signal in the frequency domain, calculates noise levels for the residual-spectrum components, and quantizes the noise levels. Operation S330 will later be explained in more detail.
In operation S340, the lossless encoding unit 140 receives the quantized spectral signal from the important-spectrum component processing unit 120 and the noise component processing unit 130, losslessly encodes the quantized spectral signal, and outputs encoded bitstreams in hierarchical format. The encoded bitstream comprises quantized data of the important-spectrum components and additional noise level information.
Referring to
In operation S410, the noise level calculating unit 210 divides the residual-spectrum components into predetermined sub-bands and calculates noise levels for various magnitudes of noise for each of the sub-bands.
The magnitudes of noise can be obtained by performing linear prediction analysis for each of the sub-bands. The linear prediction analysis is performed by using a well-known method, such as the autocorrelation method, the covariance method, or Durbin's method. Through linear prediction analysis, noise components for the current frame can be estimated. If it is estimated that there are more noise components than tone components in the current frame, the magnitude of the noise is transmitted as it is. Otherwise, if it is estimated that there are fewer noise components than tone components in the current frame, the magnitude of the noise is reduced prior to being transmitted. In addition, in the case of a small window where noise components are abruptly changing, the magnitude of the noise is further reduced before being transmitted.
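The linear prediction analysis can be sketched with the autocorrelation method and the Levinson-Durbin recursion. How the prediction result is mapped to a noise magnitude is not spelled out above, so the `noisiness` measure below (normalized prediction error energy, near 0 for tonal input and near 1 for noise-like input) is an assumption, as are the function names and the predictor order of 4.

```python
import math
import random

def autocorrelation(x, max_lag):
    """Autocorrelation values r[0..max_lag] of the sequence x."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve for predictor coefficients a (with a[0] = 1) and the prediction error energy."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

def noisiness(x, order=4):
    """Assumed measure: fraction of signal energy left after linear prediction."""
    r = autocorrelation(x, order)
    if r[0] == 0.0:
        return 0.0
    _, err = levinson_durbin(r, order)
    return err / r[0]

tone = [math.sin(0.3 * n) for n in range(256)]         # predictable, tonal input
rng = random.Random(0)
hiss = [rng.gauss(0.0, 1.0) for _ in range(256)]       # unpredictable, noise-like input
```

A frame with a high value of this measure would be treated as dominated by noise components, and a frame with a low value as dominated by tone components.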
The noise level can be obtained by the following equation:
$$\text{aNoise} = \sqrt{\frac{\text{Energy}}{\text{nCountFreq}}} \times \text{dNoise} \times \alpha \qquad (1)$$
where Energy is the energy of the sub-band, nCountFreq is the number of non-zero spectrum components, dNoise is the calculated magnitude of the noise for the sub-band, and α is a perceptual weight constant determined by the noise characteristics. α is selected to be smaller (e.g., 0.3) for a temporary noise (where data is transformed using a short window), and α is selected to be greater (e.g., 0.7) for a constant noise, such as white noise (where data is transformed using a long window).
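Equation (1) can be evaluated directly, as in the following minimal sketch. The function name, the decision to compute Energy from the residual-spectrum components of the sub-band, and the handling of an all-zero sub-band are assumptions; the α values are the example values given above.

```python
import math

ALPHA_SHORT_WINDOW = 0.3   # example value for temporary noise
ALPHA_LONG_WINDOW = 0.7    # example value for constant noise such as white noise

def noise_level(residual_band, d_noise, short_window):
    """Evaluate aNoise = sqrt(Energy / nCountFreq) * dNoise * alpha for one sub-band."""
    energy = sum(v * v for v in residual_band)
    n_count_freq = sum(1 for v in residual_band if v != 0.0)
    if n_count_freq == 0:
        return 0.0  # assumed behavior: no residual components means no noise
    alpha = ALPHA_SHORT_WINDOW if short_window else ALPHA_LONG_WINDOW
    return math.sqrt(energy / n_count_freq) * d_noise * alpha

# Hypothetical sub-band with two non-zero residual components.
band = [0.3, 0.0, -0.4, 0.0]
level_long = noise_level(band, 1.0, short_window=False)
level_short = noise_level(band, 1.0, short_window=True)
```

Here Energy is 0.25 and nCountFreq is 2, so the long-window level is 0.7 times the square root of 0.125, and the short-window level is smaller by the ratio 0.3/0.7.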
In operation S420, the noise level quantizing unit 220 quantizes the noise level at a bit rate according to the encoding bit-assignment information input by the psychoacoustic modeling unit 110.
The lossless decoding unit 600 losslessly decodes received bitstreams, and outputs spectral signals to the important-spectrum component dequantizing unit 610 and the noise level processing unit 620. More specifically, the lossless decoding unit 600 extracts data and additional information from bitstreams in hierarchical format.
The important-spectrum component dequantizing unit 610 dequantizes important-spectrum components of the decoded spectral signal.
The noise level processing unit 620 comprises a noise level dequantizing unit 621 that dequantizes the noise level in the decoded spectral signal, and a noise component generating unit 622 that generates a noise component from the dequantized noise level for the remaining range other than the predetermined range for the important-spectrum component.
The spectrum component combining unit 630 combines the dequantized important-spectrum components and the noise components to be output as a spectral signal in the frequency domain.
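The decoder-side dequantization, noise generation, and combination can be sketched as follows. The random-sign noise model, the guard range of one bin on each side of an important component (standing in for the predetermined range), the fixed seed, and all function names are assumptions made for illustration.

```python
import random

def dequantize(codes, step):
    """Inverse of uniform scalar quantization."""
    return [code * step for code in codes]

def generate_noise(important, band_levels, band_width, guard=1, seed=0):
    """Fill bins outside the guard range of important components with scaled noise."""
    rng = random.Random(seed)
    n = len(important)
    protected = [False] * n
    for k, value in enumerate(important):
        if value != 0.0:
            for j in range(max(0, k - guard), min(n, k + guard + 1)):
                protected[j] = True
    noise = [0.0] * n
    for k in range(n):
        if not protected[k]:
            level = band_levels[min(k // band_width, len(band_levels) - 1)]
            noise[k] = level * rng.choice((-1.0, 1.0))  # assumed random-sign noise
    return noise

def combine(important, noise):
    """Add the two component sets to form the output spectrum."""
    return [a + b for a, b in zip(important, noise)]

# Hypothetical decoded data: quantized important components and two noise levels.
important = dequantize([0, 10, 0, 0, -6, 0, 0, 0], 0.5)
noise = generate_noise(important, [0.2, 0.1], band_width=4)
spectrum = combine(important, noise)
```

In this example only the last two bins lie outside the guard range, so they alone receive noise at the dequantized level of their sub-band.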
The signal generating unit 640 generates an audio signal in the time domain from the spectral signal in the frequency domain.
Referring to
In operation S710, the important-spectrum component dequantizing unit 610 dequantizes the important-spectrum components of the quantized data of the decoded spectral signal.
In operation S720, the noise level processing unit 620 dequantizes the noise level of the additional information from the decoded spectral signal to generate noise components. More specifically, the noise level dequantizing unit 621 dequantizes the noise level of the decoded spectral signal, and the noise component generating unit 622 generates noise components for the remaining range other than a predetermined range for the important-spectrum component.
In operation S730, the spectrum component combining unit 630 combines the dequantized important-spectrum components and the noise components, and outputs the result as spectral signals in the frequency domain.
In operation S740, the signal generating unit 640 generates audio signals in the time domain from the spectral signals in the frequency domain.
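If the encoder uses an MDCT, the time-domain signal can be regenerated with an inverse MDCT followed by overlap-add of successive frames. The description does not name the inverse transform, so the sketch below is an assumption, as are the sine synthesis window, the 2/N scaling that pairs with a sine analysis window, and the function names.

```python
import math

def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N windowed, time-aliased samples."""
    n_half = len(coeffs)
    two_n = 2 * n_half
    phase = 0.5 + n_half / 2.0
    out = []
    for n in range(two_n):
        acc = 0.0
        for k, c in enumerate(coeffs):
            acc += c * math.cos(math.pi / n_half * (n + phase) * (k + 0.5))
        window = math.sin(math.pi * (n + 0.5) / two_n)  # sine synthesis window
        out.append(2.0 / n_half * window * acc)
    return out

def overlap_add(frames, hop):
    """Sum frames of length 2 * hop, each shifted by hop samples from the previous one."""
    signal = [0.0] * (hop * (len(frames) + 1))
    for i, frame in enumerate(frames):
        for n, value in enumerate(frame):
            signal[i * hop + n] += value
    return signal
```

The time-domain aliasing in each frame cancels against the neighboring frames, so every sample covered by two overlapping frames is reconstructed exactly when the coefficients are unmodified.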
The methods of low bit rate encoding and decoding according to the above-described embodiments of the present invention can be implemented as a computer program. Codes and code segments constituting the computer program may readily be inferred by those skilled in the art. The computer programs may be recorded on computer-readable media and read and executed by computers. Such computer-readable media include all kinds of storage devices, such as ROM, RAM, CD-ROM, magnetic tape, floppy discs, optical data storage devices, etc. The computer-readable media may be distributed to computer systems connected to a network, and codes on the distributed computer-readable media may be stored and executed in a decentralized fashion.
According to the above-described embodiments of the present invention, by separately encoding important-spectrum components and noise components of an audio signal, efficient data compression and high fidelity to the original sound can be achieved.
Although a few embodiments of the present invention have been shown and described, the present invention is not limited to the described embodiments. Instead, it would be appreciated by those skilled in the art that changes may be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
10-2004-0109267 | Dec 2004 | KR | national |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 11312457 | Dec 2005 | US |
Child | 13678413 | | US |