1. Field of the Invention
The present invention relates to a method of simplifying psychoacoustic analysis, and more particularly, to a method of simplifying psychoacoustic analysis by utilizing spectral flatness for an audio compression system.
2. Description of the Prior Art
With the rapid development of electronic video products, the video compression technology applied to these products has become increasingly important, and the standards of the Moving Picture Experts Group (MPEG) have become the mainstream for video compression.
Please refer to
Before the MDCT is executed, the block type for transforming the sound signal must be determined, namely whether the sound signal is better suited to a long-block or a short-block MDCT. The long-block MDCT is used if the sound signal is a short-term stationary signal, and the short-block MDCT is used if the sound signal contains a transition, in order to avoid pre-echo noise.
Please refer to
In addition, when the spectral characteristics of the left and right channel signals of the sound signal are similar, the M/S transform can remove the correlation between the left and right channel signals before the sound signal is compressed, which increases efficiency of compression. For example, if the left channel signal of the sound signal is defined as L[n], and the right channel signal is defined as R[n], then the middle signal is defined as M[n]=√2×(L[n]+R[n])/2, and the side signal is defined as S[n]=√2×(L[n]−R[n])/2. As can be seen, the middle signal is the common part of the left and right channel signals, and the side signal is the part in which they differ. Therefore, when the two channels are similar, the side signal is small, and the M/S transform can decrease the data amount and increase efficiency of compression. As a result, determining whether the spectral characteristics of the left and right channel signals are similar determines whether the M/S transform is suitable for the sound signal.
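The M/S definitions above can be illustrated with a minimal Python sketch. The function names `ms_transform` and `ms_inverse` are hypothetical, and the inverse is not stated in the text; it is included only because it follows algebraically from the two definitions.

```python
import math


def ms_transform(left, right):
    """Return (mid, side) with M[n] = sqrt(2)*(L[n]+R[n])/2 and
    S[n] = sqrt(2)*(L[n]-R[n])/2, as defined in the text."""
    scale = math.sqrt(2) / 2
    mid = [scale * (l + r) for l, r in zip(left, right)]
    side = [scale * (l - r) for l, r in zip(left, right)]
    return mid, side


def ms_inverse(mid, side):
    """Recover (left, right); derived from the definitions, not from the text:
    L[n] = (M[n]+S[n])/sqrt(2), R[n] = (M[n]-S[n])/sqrt(2)."""
    scale = 1 / math.sqrt(2)
    left = [scale * (m + s) for m, s in zip(mid, side)]
    right = [scale * (m - s) for m, s in zip(mid, side)]
    return left, right


# Identical channels: the side signal carries no information.
mid, side = ms_transform([1.0, 2.0, -3.0], [1.0, 2.0, -3.0])
```

When the two channels are identical, every sample of the side signal is zero, which is the case in which the M/S transform saves the most data.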
Please refer to
Therefore, the abovementioned processes 20 and 30 may increase the amount of calculation and reduce the efficiency of the system.
Therefore, the present invention provides a method and related device of simplifying psychoacoustic analysis by utilizing spectral flatness, for increasing efficiency of compression.
The present invention discloses a method of simplifying psychoacoustic analysis with spectral flatness characteristic values, which includes calculating energy of a plurality of frames of a sound signal in a frequency domain, calculating a plurality of spectral flatness values according to the energy of the plurality of frames in the frequency domain, and using a short-block or a long-block Modified Discrete Cosine Transform (MDCT) for transforming each frame of the plurality of frames according to the plurality of spectral flatness values.
The present invention further discloses an audio converter device utilized in an audio compression system, for executing the method abovementioned.
The present invention further discloses a method of simplifying psychoacoustic analysis with spectral flatness, which includes calculating energy of left and right channel signals of a sound signal in a frequency domain, calculating spectral flatness of the left and right channel signals according to the energy of the left and right channel signals in the frequency domain, and using a middle/side (M/S) transform or left and right channel encoding to transform the left and right channel signals according to the spectral flatness of the left and right channel signals.
The present invention further discloses an audio converter device utilized in an audio compression system, for executing the method abovementioned.
These and other objectives of the present invention will no doubt become obvious to those of ordinary skill in the art after reading the following detailed description of the preferred embodiment that is illustrated in the various figures and drawings.
The present invention discloses a method of simplifying psychoacoustic analysis with spectral flatness characteristic values, which utilizes spectral flatness for determining a block type and a middle/side type (M/S type) of a sound signal, so as to simplify execution of psychoacoustic analysis and increase efficiency of compression.
Please refer to
Step 400: Start.
Step 402: Calculate energy of a plurality of frames of a sound signal in a frequency domain.
Step 404: Calculate a plurality of spectral flatness of the plurality of frames according to the energy of the plurality of frames in the frequency domain.
Step 406: Use a short-block or a long-block Modified Discrete Cosine Transform (MDCT) for transforming each frame of the plurality of frames according to the plurality of spectral flatness.
Step 408: End.
According to the process 40, the embodiment of the present invention calculates the energy of the frames of a sound signal in the frequency domain and calculates the spectral flatness of the frames from that energy, so as to determine whether to use the short-block or the long-block MDCT to transform each frame. The block type of the sound signal can therefore be determined from the spectral flatness calculation alone. Moreover, if the sound signal uses the short-block MDCT for the transform in Step 204, the calculation in Step 202 becomes unnecessary, which increases efficiency of compression and simplifies the psychoacoustic analysis that would otherwise be performed twice (as shown in
In Step 402, the sound signal goes through pulse-code modulation (PCM) and proper filtering, such as subband filtering or Fast Fourier Transform (FFT), to obtain parameters of the energy of the plurality of frames of the sound signal in the frequency domain. Taking subband filtering as an example, a frame is defined as a[t], t=0˜(N−1), and is divided into M frequency bands by subband filtering, in which the frequency bands are denoted A[0][k], A[1][k], A[2][k] . . . A[M−1][k], k=0˜(N/M−1). The parameters of the energy of the frame can therefore be expressed as an energy sequence A_ene[m], m=0˜(M−1). In Step 404, by utilizing the parameters of the energy, the spectral flatness of the frame a[t] is obtained from the energy sequence A_ene[m] by the following formula (A):
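As a rough illustration of Steps 402 and 404, the Python sketch below assumes the conventional definition of spectral flatness, namely the ratio of the geometric mean to the arithmetic mean of the energy sequence; this may differ from formula (A). It also assumes that A_ene[m] is the sum of squared subband samples in band m. The function names and the small constant `eps` are hypothetical.

```python
import math


def band_energies(bands):
    """Assumed definition: A_ene[m] = sum over k of A[m][k]**2."""
    return [sum(x * x for x in band) for band in bands]


def spectral_flatness(energies, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the energy sequence.

    eps (an assumption) guards against log(0) and division by zero.
    The result is close to 1 for a flat spectrum and close to 0 for a
    spectrum dominated by a few bands.
    """
    count = len(energies)
    arithmetic = sum(energies) / count
    geometric = math.exp(sum(math.log(e + eps) for e in energies) / count)
    return geometric / (arithmetic + eps)


# Four subbands of two samples each (made-up values).
flat = spectral_flatness(band_energies([[1, 1], [1, -1], [-1, 1], [1, 1]]))
peaky = spectral_flatness(band_energies([[30, 10], [0.01, 0], [0, 0.01], [0.01, 0]]))
```

Here `flat` comes out near 1 because all bands carry equal energy, while `peaky` is far smaller because one band dominates.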
Finally, in Step 406, the frames are transformed by short-block or long-block MDCT according to the spectral flatness. A detailed operation method related to Step 406 is shown in
Step 500: Start.
Step 502: Compare the spectral flatness of one frame with a preceding frame of the plurality of frames, to generate a first differential value.
Step 504: Compare the spectral flatness of the frame with a next frame, to generate a second differential value.
Step 506: Compare the first differential value with the second differential value, to generate a third differential value.
Step 508: Determine whether the third differential value is greater than a preset value. If yes, perform Step 510; otherwise perform Step 512.
Step 510: Use the short-block MDCT to transform the frame.
Step 512: Use the long-block MDCT to transform the frame.
Step 514: End.
Please refer to
As mentioned above, the first differential value ΔN−1 and the second differential value ΔN indicate the difference between the frame grN−1 and the preceding frame grN−2, and the difference between the frame grN−1 and the next frame grN, respectively. Certainly, besides utilizing the absolute value, a logarithm value of the spectral flatness of the frames can be utilized. For example, the first differential value ΔN−1 is the absolute value of the difference between the logarithm values of the spectral flatness of the frame grN−1 and the preceding frame grN−2, and the second differential value ΔN is the absolute value of the difference between the logarithm values of the spectral flatness of the frame grN−1 and the next frame grN. In this situation, the preset value could be set to 3, which is not limited herein. Certainly, the abovementioned way of comparing the spectral flatness of each frame is only an embodiment, which is not limited herein, and values related to the spectral flatness comparison, such as the preset value, could be modified accordingly.
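Steps 502 to 512 can be sketched in Python as follows. Several details are assumptions rather than statements of the text: the third differential value is taken as the absolute difference of the first two, the logarithm is base 10, and the function name `choose_block_type` is hypothetical. The default preset value of 3 is the example value given for the logarithmic comparison.

```python
import math


def choose_block_type(sf_prev, sf_cur, sf_next, preset=3.0, use_log=True):
    """Return "short" or "long" for the current frame.

    sf_prev, sf_cur, sf_next: spectral flatness (> 0) of the preceding,
    current, and next frame.
    """
    def diff(a, b):
        if use_log:
            # Assumption: base-10 logarithm; the text does not name the base.
            return abs(math.log10(a) - math.log10(b))
        return abs(a - b)

    first = diff(sf_cur, sf_prev)    # Step 502
    second = diff(sf_cur, sf_next)   # Step 504
    third = abs(first - second)      # Step 506 (assumed form of the comparison)
    return "short" if third > preset else "long"  # Steps 508 to 512


steady = choose_block_type(0.5, 0.5, 0.5)
transient = choose_block_type(0.5, 0.5, 1e-5)
```

With these inputs, `steady` is `"long"` because the flatness does not change between frames, and `transient` is `"short"` because the flatness drops by more than three decades in the next frame.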
Therefore, the present invention utilizes the spectral flatness to determine the block type of a frame and decides whether to use the short-block or the long-block MDCT for transforming the frame, so that efficiency of compression is increased by simplifying the psychoacoustic analysis that would otherwise be performed twice (as shown in
Note that, in Step 402, if the parameters of the energy of the plurality of frames of the sound signal in the frequency domain are obtained by FFT, a frame is defined as a[t], t=0˜(N−1); the frame a[t] is then transformed by FFT to obtain a complex sequence A[n]+B[n]*i, n=0˜(N/2−1), in the frequency domain, where A[n] is the real part of the complex sequence, B[n] is the imaginary part of the complex sequence, and i is the imaginary unit; finally, an energy sequence A_ene[n]=A[n]*A[n]+B[n]*B[n], n=0˜(N/2−1), of the frame a[t] is calculated.
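The FFT-based energy sequence can be sketched in Python as below. To stay self-contained, the sketch uses a direct discrete Fourier transform from the standard library as a stand-in for an FFT; it produces the same coefficients but is much slower for large N. The function name `fft_energy` is hypothetical.

```python
import cmath


def fft_energy(frame):
    """Return A_ene[n] = A[n]*A[n] + B[n]*B[n] for n = 0 .. N/2 - 1,
    where A[n] + B[n]*i is the n-th DFT coefficient of the frame a[t]."""
    size = len(frame)
    energies = []
    for n in range(size // 2):
        coeff = sum(
            frame[t] * cmath.exp(-2j * cmath.pi * n * t / size)
            for t in range(size)
        )
        energies.append(coeff.real * coeff.real + coeff.imag * coeff.imag)
    return energies


# A unit impulse has a perfectly flat spectrum.
impulse_energy = fft_energy([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
```

For the unit impulse, every entry of `impulse_energy` equals 1 up to rounding, whereas a constant frame concentrates all of its energy in the n=0 entry.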
In addition, for a stereo sound signal transform, please refer to
Step 700: Start.
Step 702: Calculate energy of the left and the right channel signals of a sound signal in a frequency domain.
Step 704: Calculate spectral flatness of the left and the right channel signals according to the energy of the left and the right channel signals in the frequency domain.
Step 706: Use the M/S transform or left and right channel encoding to transform the left and the right channel signals according to the spectral flatness of the left and the right channel signals.
Step 708: End.
Similar to the process 40, the process 70 decides the transform method of the stereo signal according to the spectral flatness. The process 70 calculates the energy of the left and right channel signals of the sound signal in the frequency domain, and determines to use M/S transform or the left and right channel encoding to transform the left and right channel signals according to the calculated spectral flatness of the left and right channel signals.
In Step 702, the sound signal goes through PCM and proper filtering, such as subband filtering or FFT, to obtain the parameters of the energy of the left and right channel signals of the sound signal in the frequency domain. Taking subband filtering as an example, the left or right channel signal is defined as c[t], t=0˜(N−1); the left or right channel signal c[t] is divided into M frequency bands by subband filtering, where the frequency bands are denoted C[0][k], C[1][k], C[2][k] . . . C[M−1][k], k=0˜(N/M−1). Therefore, the energy sequence C_ene[m] indicates the parameters of the energy of the left or right channel signal in the frequency domain. In addition, Step 702 of an embodiment of the present invention can utilize FFT to obtain the parameters of the energy of the left and right channel signals of the sound signal in the frequency domain. Suppose the left or right channel signal is defined as c[t], t=0˜(N−1); the left or right channel signal c[t] is transformed by FFT to obtain a complex sequence C[n]+D[n]*i, n=0˜(N/2−1), in the frequency domain, where C[n] is the real part of the complex sequence, D[n] is the imaginary part of the complex sequence, and i is the imaginary unit; finally, an energy sequence C_ene[n]=C[n]*C[n]+D[n]*D[n], n=0˜(N/2−1), of the left or right channel signal c[t] is calculated.
In the embodiment of the present invention utilizing subband filtering for obtaining the parameters of energy of the left and right channel signals of the sound signal in the frequency domain, Step 704 uses the parameters of energy for calculating the spectral flatness of the left and right channel signals. Please refer to the following formula (B) for calculation of the spectral flatness.
Finally, in Step 706, whether the left and right channel signals undergo the M/S transform or left and right channel encoding is determined according to the spectral flatness of the left and right channel signals. The M/S transform is used to transform the left and right channel signals when the difference between the spectral flatness of the left channel signal and that of the right channel signal is smaller than a preset value. The left and right channel encoding is used when this difference is greater than the preset value. Preferably, after calculating the logarithm values of the spectral flatness of the left and right channel signals, the present invention takes the absolute value of the difference between the two logarithm values. The M/S transform is used to transform the left and right channel signals if this absolute difference is smaller than 5, which means the spectra of the left and right channels are similar. The left and right channel encoding is used to transform the left and right channel signals if the absolute difference is greater than 5. Certainly, the abovementioned way of comparing the spectral flatness of the left and right channels is only an embodiment, which is not limited herein, and values related to the spectral flatness comparison, such as the preset value, could be modified accordingly.
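The decision in Step 706 can be sketched in Python as follows. The base-10 logarithm, the handling of a difference exactly equal to the preset value (assigned here to left and right channel encoding), and the function name `choose_stereo_mode` are assumptions; the default preset value of 5 is the example value from the text.

```python
import math


def choose_stereo_mode(sf_left, sf_right, preset=5.0):
    """Return "ms" for the M/S transform or "lr" for left and right
    channel encoding, given the spectral flatness (> 0) of each channel."""
    # Assumption: base-10 logarithm; the text does not name the base.
    variation = abs(math.log10(sf_left) - math.log10(sf_right))
    # A variation exactly equal to the preset is not covered by the text;
    # this sketch sends it to left and right channel encoding.
    return "ms" if variation < preset else "lr"


similar = choose_stereo_mode(0.4, 0.5)
different = choose_stereo_mode(0.9, 1e-7)
```

Here `similar` is `"ms"` because the two flatness values are close, and `different` is `"lr"` because they are almost seven decades apart.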
Therefore, the present invention utilizes the spectral flatness to determine the difference between the left and right channel signals, and to determine whether to use the M/S transform to transform the left and right channel signals. Therefore, when Step 302 as shown in
In
On the other hand, as to the sound signal transform shown in
Similarly, the electronic device 80 can be a model for an electronic device to realize the process 70 shown in
In conclusion, the present invention utilizes the spectral flatness to determine the block type of a frame, and decides whether to use the short-block or the long-block MDCT for transforming the frame. Meanwhile, the present invention utilizes the spectral flatness to determine the difference between the left and right channel signals, and to determine whether to use the M/S transform to transform the left and right channel signals. Therefore, the process of determining the block type and the characteristics of the left and right channel signals in the present invention reduces the number of times the psychoacoustic analysis is executed and increases efficiency of compression, so as to realize the goal of the present invention.
Those skilled in the art will readily observe that numerous modifications and alterations of the device and method may be made while retaining the teachings of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2008 1 0178895 | Dec 2008 | CN | national |
Number | Date | Country | |
---|---|---|---|
20100145682 A1 | Jun 2010 | US |