1. Field of the Invention
This invention relates to digital audio compression and, more particularly, to MPEG audio encoding.
2. Description of the Related Art
The computational capability of modern computer systems and the use of compression algorithms have made complex multimedia applications possible. For example, a personal computer or workstation may be capable of running applications that allow a user to listen to high quality music reproductions or watch a motion picture. Compression algorithms may allow a digital signal to be transferred at a lower bit rate than the uncompressed signal would require.
There are many compression algorithms available for compressing digital audio signals such as Code Excited Linear Prediction (CELP), μ-law and Adaptive Differential Pulse Code Modulation (ADPCM). Compressing an audio signal allows a higher bit density to be transmitted from an encoding device to a decoding device and it allows a higher bit density when storing an audio sample to a storage medium such as a compact disk (CD).
Another compression algorithm, known as the MPEG/audio compression algorithm, was developed by the Moving Picture Experts Group (MPEG) as an international standard for compressing high-fidelity audio. The MPEG/audio standard is one part of a three-part standard relating to the compression of audio and video and the synchronization of the respective audio and video streams. For a more detailed description of the MPEG/audio compression algorithm, see the ISO/IEC 11172-3 standard.
The MPEG/audio compression standard is based on the perceptual limitations of the human auditory system. Thus, the portions of an audio signal that may be either out of the normal auditory range or masked by stronger portions are removed from the signal. Although the removal of these components results in a distorted signal, the distortions may either be inaudible or barely perceptible.
In an MPEG encoder, incoming digital audio samples are separated into frequency bands and encoded. This may be accomplished using a polyphase filter bank and a psychoacoustic model. The filter bank may utilize one form of a discrete cosine transform. The psychoacoustic model may use a Fourier transform for frequency domain transformation. In the psychoacoustic model, the frequency spectra are then separated into sub-bands and calculations are performed to determine the signal-to-mask ratios used in final quantization and encoding of the digital samples.
Many computer systems run multimedia application software that allows a user to view MPEG movies or listen to MPEG audio. As multimedia applications have become more sophisticated, the demands placed on computers have increased. Microprocessors are now routinely provided with enhanced support for these applications. For example, many processors now support single-instruction multiple-data (SIMD) commands such as MMX instructions. Advanced Micro Devices, Inc. (hereinafter referred to as AMD) has implemented 3DNow!™, a set of floating point SIMD instructions on x86 processors such as the Athlon™ processor. Software applications may use these instructions to accomplish signal processing functions and the traditional x86 instructions to accomplish other desired functions.
However, though the above instructions may be efficient, the repeated execution of some of the encoder compression floating point calculations may take as much as 25% of the computational overhead of an MPEG/audio compression algorithm. Therefore, a more efficient way of performing the calculations associated with the psychoacoustic model is desired.
Various embodiments of an efficient finite length POW10 calculation for MPEG audio encoding are disclosed. In one embodiment, a method for encoding an audio input signal includes storing a plurality of predetermined tonal values corresponding to a plurality of predetermined power levels. The method also includes receiving a plurality of input values each representative of a power level of a spectral component of the audio input signal at a corresponding frequency sub-band and accessing at least one corresponding tonal value of the plurality of predetermined tonal values. The method further includes generating an encoded output signal representative of the audio input signal by using at least one corresponding tonal value for each of the plurality of input values. Further, the storing of the plurality of predetermined tonal values is performed prior to the receiving of the plurality of input values.
In an additional embodiment, a method for calculating tonal values of spectral components of an audio input signal for an audio encoder includes storing a plurality of predetermined tonal values corresponding to a plurality of predetermined power levels, receiving a plurality of input values each representative of a power level of a spectral component of the audio input signal at a corresponding frequency sub-band and accessing at least one corresponding tonal value of the plurality of predetermined tonal values. The method further includes generating a composite tonal value using at least one of the corresponding tonal values. Further, storing the plurality of predetermined tonal values is performed prior to receiving the plurality of input values.
While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.
Turning now to
In one embodiment, system memory 30 is a memory in which application programs may be stored and from which processor 10 may primarily execute. A suitable system memory 30 comprises Dynamic Random Access Memory (DRAM). For example, a plurality of banks of SDRAM (Synchronous DRAM), DDR SDRAM (Double Data Rate), or Rambus DRAM (RDRAM) may be suitable. In addition, computer system 100 may include installation media devices such as a CD-ROM (not shown) or a floppy disk (not shown).
As described above, processor 10 may execute software instructions that perform an MPEG/audio encoding process. During the encoding process, digital audio samples may be encoded or compressed into the MPEG/audio format. The digital audio samples may come from various sources. In one embodiment, the MPEG/audio encoder may be an application. However, it is contemplated that the MPEG/audio encoder software may be incorporated into the operating system. It is also contemplated that in other embodiments, more than one processor such as processor 10 may run the encoding process software.
In this particular illustration, sound card 50 may accept an analog audio input 55. Sound card 50 may then convert the analog signal into a digital representation consisting of multiple digital samples, which may be stored to mass storage 40. It is contemplated that mass storage 40 may be a hard disk drive, a tape drive, a RAM disk or any other storage device suitable for storing digital data. In other embodiments, the digital audio samples may come from other sources such as digital audio files, referred to as WAV files. It is contemplated that other sources may also provide digital audio samples to computer system 100.
Functional blocks may represent the MPEG/audio encoder software routines. One of the blocks is the psychoacoustic model introduced in the background section above. As will be described in greater detail below, the psychoacoustic model is used to calculate a signal-to-mask ratio which is then used in subsequent calculations for allocation of bits during the encoding process.
Referring to
As described above in conjunction with the background, filter bank 210 may perform a time-to-frequency transformation of the digital audio samples, thus transforming the samples into frequency spectra.
Psychoacoustic model 230 also transforms the digital audio samples into bands, referred to as frequency spectra. In one embodiment, psychoacoustic model 230 may use a fast Fourier transform to perform the transformation. Once transformed, each of the frequency bands is represented by a power level. The bands may then be broken into further sub-bands characterized according to the human aural range. Psychoacoustic model 230 may then calculate the signal-to-mask ratio for each frequency sub-band by determining the tonal and non-tonal components.
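By way of illustration only, the transformation of a block of samples into per-band power levels might be sketched as follows in Python. The function name `power_spectrum_db`, the naive discrete Fourier transform (used here in place of a fast Fourier transform for brevity), the normalization, and the floor value are all assumptions made for this sketch and are not taken from the encoder described herein.

```python
import cmath
import math

def power_spectrum_db(samples):
    """Return a power level in dB for each frequency bin up to Nyquist.

    Uses a naive O(N^2) DFT for clarity; an encoder would normally use an FFT.
    """
    n = len(samples)
    levels = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        power = (abs(acc) / n) ** 2
        # Floor tiny values so log10 never receives zero (assumed floor).
        levels.append(10.0 * math.log10(max(power, 1e-20)))
    return levels

# Example: a pure tone centered on bin 4 of a 32-sample block.
tone = [math.cos(2 * math.pi * 4 * t / 32) for t in range(32)]
levels = power_spectrum_db(tone)
peak_bin = max(range(len(levels)), key=lambda k: levels[k])
```

In this sketch the strongest power level falls in the bin that contains the tone, which is the kind of spectral peak later examined when tonal components are identified.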
In one embodiment, an interim power of ten calculation is used when determining the tonal components of the frequency sub-bands. This power of ten calculation is typically a floating-point calculation. The power level associated with a particular frequency sub-band is operated on by a software instruction referred to as POW10. The POW10 calculation closely approximates a 10^x floating-point calculation, where x is the power level associated with a particular sub-band. In some applications, as each sub-band is input to the software routine, processor 10 of
If the input power level is a floating-point number x in the mathematical expression 10^x, then ‘x’ may have both an integer portion and a decimal portion. Thus the above mathematical expression 10^x may also be expressed as 10^(i+d), or 10^i × 10^d, where ‘i’ is the integer portion and ‘d’ is the decimal portion. Thus, if the floating-point number x is separated into its integer and decimal portions, then the 10^x calculation may be performed on the integer and decimal portions independently. The results of the independent integer and decimal calculations may then be multiplied together to obtain the resultant 10^x.
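The separation into integer and decimal portions can be checked numerically. The following Python sketch is illustrative only; the function name `pow10_split` is hypothetical, and the use of a floor operation (so that the decimal portion is always between 0 and 1, including for negative inputs) is an assumption of this sketch.

```python
import math

def pow10_split(x):
    """Compute 10**x as 10**i * 10**d, with i = floor(x) and d = x - i."""
    i = math.floor(x)   # integer portion
    d = x - i           # decimal portion, 0 <= d < 1
    return (10.0 ** i) * (10.0 ** d)

# The product of the two partial results matches the direct calculation.
direct = 10.0 ** 2.5
split = pow10_split(2.5)
```

Because the two factors depend only on the integer portion and the decimal portion respectively, each factor can be drawn from its own finite set of values.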
In one embodiment, the POW10 calculations may be done while the encoder software is initializing. During initialization, the POW10 calculations may be performed on a finite set of possible input values representing the power levels of the frequency sub-bands. These values may be stored in system memory 30 or mass storage 40 of FIG. 1. As will be described in greater detail below, the calculations may be stored in one or more tables, which can then be accessed by an index value.
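A minimal sketch of such an initialization step is given below in Python. The function name `build_pow10_tables`, the table sizes, and the `int_offset` parameter (which lets an index stand for a negative exponent) are assumptions chosen for illustration and are not specified by the description above.

```python
def build_pow10_tables(int_entries, dec_entries, int_offset=0):
    """Precompute power-of-ten values once, before any input is received.

    int_table[n] holds 10**(n - int_offset).
    dec_table[k] holds 10**(k / dec_entries).
    The sizes and the offset are illustrative assumptions.
    """
    int_table = [10.0 ** (n - int_offset) for n in range(int_entries)]
    dec_table = [10.0 ** (k / dec_entries) for k in range(dec_entries)]
    return int_table, dec_table

# Performed once while the encoder software is initializing.
INT_TABLE, DEC_TABLE = build_pow10_tables(int_entries=16,
                                          dec_entries=64,
                                          int_offset=8)
```

All floating-point exponentiation is confined to this step, so the per-sample path needs only table reads and a multiplication.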
A code segment which uses the POW10 calculations is shown below as a portion of the encoder software. It is noted however that the code segment shown below is only an exemplary code segment and that in other embodiments, other code segments and other programming languages may be used.
As described above, the illustrated code segment uses power of ten values previously calculated using floating-point calculations and stored in memory to perform integer calculations. The resulting integer calculations may reduce processor overhead associated with psychoacoustic model 230.
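The following Python sketch suggests how a table-based power of ten calculation of this kind might be arranged. It is not the code segment referred to above. The fixed-point layout (`FRAC_BITS`), the table sizes, the offset, the helper names, and the combination of three adjacent spectral lines into one composite level in dB (a form commonly associated with psychoacoustic model 1) are all assumptions made for illustration.

```python
import math

FRAC_BITS = 6                  # hypothetical: 64 decimal steps per unit
DEC_ENTRIES = 1 << FRAC_BITS
INT_ENTRIES = 32               # hypothetical integer range
INT_OFFSET = 16                # index n stands for exponent n - 16

# Filled once at initialization using floating-point calculations.
INT_TABLE = [10.0 ** (n - INT_OFFSET) for n in range(INT_ENTRIES)]
DEC_TABLE = [10.0 ** (k / DEC_ENTRIES) for k in range(DEC_ENTRIES)]

def to_fixed(x):
    """Quantize exponent x to a table index, clamped to the table range."""
    idx = int(math.floor((x + INT_OFFSET) * DEC_ENTRIES))
    return max(0, min(idx, INT_ENTRIES * DEC_ENTRIES - 1))

def pow10_lookup(fixed):
    """Approximate 10**x with a shift, a mask and two table reads."""
    int_part = fixed >> FRAC_BITS
    dec_part = fixed & (DEC_ENTRIES - 1)
    return INT_TABLE[int_part] * DEC_TABLE[dec_part]

def composite_tonal_db(levels_db, k):
    """Combine bin k and its two neighbors into one level in dB."""
    total = sum(pow10_lookup(to_fixed(levels_db[j] / 10.0))
                for j in (k - 1, k, k + 1))
    return 10.0 * math.log10(total)

# Example: a 60 dB peak flanked by two 40 dB neighbors.
composite = composite_tonal_db([40.0, 60.0, 40.0], 1)
```

In this arrangement the index arithmetic is purely integer, and the only floating-point work left in the per-band path is one multiplication per lookup plus the final summation.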
Turning now to
It is noted that in the illustrated embodiment the int_part column is numbered from 0 to 511, which corresponds to the finite set of possible integers. It is contemplated that in other embodiments more or fewer integer values may be used in the finite set and therefore tonal value integer table 300 may have more or fewer entries.
Referring to
It is noted that in the illustrated embodiment the dec_part column is numbered from 0 to 1023, which corresponds to the finite set of possible decimals. It is contemplated that in other embodiments more or fewer decimal values may be used in the finite set and therefore tonal value decimal table 350 may have more or fewer entries.
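With 512 integer entries and 1024 decimal entries, the two indices occupy 9 and 10 bits respectively. The Python sketch below shows one way a combined index might be split into int_part and dec_part, and estimates the resolution of a 1024-entry decimal table. The packing of both indices into a single 19-bit value, the function name `split_index`, and the mapping of entry k to 10^(k/1024) are assumptions made for illustration only.

```python
INT_ENTRIES = 512     # int_part runs from 0 to 511  (9 bits)
DEC_ENTRIES = 1024    # dec_part runs from 0 to 1023 (10 bits)
DEC_BITS = DEC_ENTRIES.bit_length() - 1   # 10

def split_index(fixed):
    """Split a hypothetical 19-bit value into (int_part, dec_part)."""
    return fixed >> DEC_BITS, fixed & (DEC_ENTRIES - 1)

# Worst-case relative error if the decimal portion is truncated to a
# 1/1024 grid and entry k stores 10**(k / 1024) (assumed mapping).
max_rel_error = 10.0 ** (1.0 / DEC_ENTRIES) - 1.0

largest = split_index(INT_ENTRIES * DEC_ENTRIES - 1)
```

Under these assumptions the truncation error stays at roughly 0.2 percent of the looked-up value, and a larger decimal table would reduce it further at the cost of memory.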
Referring collectively to FIG. 3A and
Various embodiments may further include receiving, sending or storing instructions and/or data implemented in accordance with the above description upon a carrier medium. Generally speaking, a carrier medium may include storage media or memory media such as magnetic or optical media, e.g., disk or CD-ROM, volatile or non-volatile media such as RAM (e.g. SDRAM, DDR SDRAM, RDRAM, SRAM, etc.), ROM, etc. as well as transmission media or signals such as electrical, electromagnetic, or digital signals, conveyed via a communication medium such as network and/or a wireless link.
Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.