This application claims the benefit of Korean Patent Application No. 10-2005-0020136, filed on Mar. 10, 2005, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein by reference.
1. Field of the Invention
The present invention relates to audio coding and decoding apparatuses and methods, and recording mediums on which the methods are recorded, and more particularly, to audio coding and decoding apparatuses and methods in which the quality of an audio signal including harmonics can be optimized, and recording mediums on which the methods are recorded.
2. Description of Related Art
As the range of applications of audio coders has increased, the demand for low transmission rate coders has also increased. As such, a code excited linear prediction (CELP) coder is being used for transmission rates equal to or greater than 4 kbps, and a harmonic-CELP coder is being used for transmission rates of less than 4 kbps. A harmonic-CELP coder is used for transmission rates of less than 4 kbps because, in a CELP coding algorithm, sound quality is lowered when there are too few quantization bits, whereas a harmonic coding algorithm models the periodicity of a voiced sound, which greatly affects sound quality, well even with fewer bits.
A harmonic vector excitation coder (HVXC), which is used in the MPEG-4 audio standard, is an example of a harmonic-CELP coder. An HVXC is characterized by quantization of a variable dimension harmonic vector, high-speed harmonic synthesis, harmonic amplitude estimation using a real number pitch, and natural property control using noise mixing.
However, in a harmonic-CELP coder, an audio signal section (or voiced sound section) including harmonics is formed by interpolating standard waveforms of a previous frame and a current frame. As a result, there is a high probability that pitch halving prediction, in which a pitch lag is reduced by half, or pitch doubling prediction, in which a pitch lag is doubled, occurs in a transition section of the harmonic-CELP coder. When the pitch halving prediction or the pitch doubling prediction occurs, waveform distortion and discontinuity arise at a frame boundary because the pitch lag varies severely.
In addition, since an overlap-addition method through a triangular window is used in harmonic synthesis, when a signal in an audio signal section including harmonics in a transition section increases or decreases instantaneously, a synthesis excitation signal may disadvantageously increase or decrease linearly due to the effect of the triangular window.
An aspect of the present invention provides audio coding and decoding apparatuses and methods in which the quality of an audio signal including harmonics can be optimized, and recording mediums on which the methods are recorded.
An aspect of the present invention also provides audio coding and decoding apparatuses and methods in which pitch halving prediction or pitch doubling prediction in an audio signal section including harmonics can be prevented, and recording mediums on which the methods are recorded.
An aspect of the present invention also provides audio coding and decoding apparatuses and methods in which harmonic amplitude information is converted into a quantized LPC coefficient and the quantized LPC coefficient is used to extract LPC coefficients needed by a second harmonic coding module and a CELP module, and recording mediums on which the methods are recorded.
An aspect of the present invention also provides audio coding and decoding apparatuses and methods in which bit allocation for a plurality of coding modules is performed differently according to whether harmonics are included in an input audio signal, and recording mediums on which the methods are recorded.
An aspect of the present invention also provides audio coding and decoding apparatuses and methods in which scalability can be easily applied, and recording mediums on which the methods are recorded.
According to an aspect of the present invention, there is provided an audio coding apparatus, the audio coding apparatus including: a first harmonic coding module performing first harmonic coding on an input audio signal using a pitch lag of the input audio signal and producing a quantized linear prediction coding coefficient; a first detector detecting a first difference audio signal from a difference between an audio signal output from the first harmonic coding module and the input audio signal; a second harmonic coding module performing harmonic coding on the first difference audio signal using the quantized linear prediction coding coefficient and a previous harmonic coding result; a second detector detecting a second difference audio signal obtained from a difference between an audio signal output from the second harmonic coding module and the first difference audio signal; and a code excited linear prediction (CELP) module CELP coding the second difference audio signal using the quantized linear prediction coding coefficient obtained from the first harmonic coding module.
The first harmonic coding module may convert an amplitude of harmonics of the input audio signal into a linear prediction coding coefficient, quantize the converted linear prediction coding coefficient, and provide the quantized linear prediction coding coefficient to the second harmonic coding module and the CELP module, respectively.
The second harmonic coding module may extract a quantized linear prediction coding coefficient needed for the second harmonic coding using the quantized linear prediction coding coefficient obtained from the first harmonic coding module.
According to another aspect of the present invention, there is provided an audio decoding apparatus, the audio decoding apparatus including: an inverse quantization unit inverse quantizing each of a plurality of parameters to restore an audio signal; a first harmonic decoding module performing harmonic decoding using a linear prediction coding coefficient and a phase vector output from the inverse quantization unit; a second harmonic decoding module performing harmonic decoding based on the linear prediction coding coefficient, a harmonic index, and a first gain value output from the inverse quantization unit; a first adder adding a signal output from the first harmonic decoding module to a signal output from the second harmonic decoding module; a code excited linear prediction (CELP) decoding module performing CELP decoding based on a stochastic codebook index output from the inverse quantization unit and a second gain value output from the inverse quantization unit; and a second adder adding a signal output from the first adder to a signal output from the CELP decoding module and outputting the result as a restored audio signal.
According to another aspect of the present invention, there is provided an audio coding method, the audio coding method including: harmonically coding an input audio signal without analyzing a linear prediction coding coefficient; analyzing a linear prediction coding coefficient of a difference audio signal obtained from a difference between the input audio signal and the harmonic-coding result and harmonically coding the difference audio signal; and CELP coding a difference audio signal obtained from a difference between the result of harmonically coding on the difference audio signal and the input audio signal.
According to another aspect of the present invention, there is provided an audio decoding method, the audio decoding method including: inverse quantizing a plurality of parameters for restoring an audio signal; first harmonic decoding using a linear prediction coding coefficient and a phase vector obtained through the inverse quantizing; second harmonic decoding using a linear prediction coding coefficient, a harmonic index, and a first gain value obtained through the inverse quantizing; first adding the first harmonic decoding result to the second harmonic decoding result; CELP decoding using a stochastic index and a second gain value obtained through the inverse quantization; and adding the result obtained through the first adding to the result obtained through the CELP decoding to obtain a restored audio signal.
According to another aspect of the present invention, there is provided a recording medium on which a program for performing an audio coding method is recorded, the audio coding method including: harmonically coding an input audio signal without analyzing a linear prediction coding coefficient; analyzing a linear prediction coding coefficient of a difference audio signal obtained from a difference between the input audio signal and the harmonic-coding result and harmonically coding the difference audio signal; and CELP coding a difference audio signal obtained from a difference between the result of harmonically coding on the difference audio signal and the input audio signal.
Additional and/or other aspects and advantages of the present invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
The above and/or other aspects and advantages of the present invention will become apparent and more readily appreciated from the following detailed description, taken in conjunction with the accompanying drawings of which:
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments are described below in order to explain the present invention by referring to the figures.
The pitch analyzer 110 analyzes the pitch of an input audio signal and detects a pitch lag tp. The pitch lag tp is obtained using a normalized auto-correlation function shown in Equation 1
where s(n) is the input audio signal, Lf is the length of a portion of the audio signal s(n) to be analyzed, and LMIN and LMAX are the minimum and maximum of the pitch, respectively. In general, LMIN and LMAX are 20 and 143, respectively. Maximum values of R(i) are found for LMIN ≤ i ≤ LMIN+19, LMIN+20 ≤ i ≤ LMIN+39, and LMIN+40 ≤ i ≤ LMAX, respectively. If the respective values of i are denoted t3, t2, and t1, one value is selected from t3, t2, and t1 as the pitch lag tp based on Equation 2.
The pitch lag tp detected by the pitch analyzer 110 is provided to the first harmonic coding module 140.
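The candidate search described above can be sketched as follows. This is only an illustration under stated assumptions: Equation 1 is not reproduced in this text, so the normalization used here (the cross-correlation divided by the geometric mean of the two segment energies) is an assumption, and the function names are hypothetical. The selection rule of Equation 2 is likewise not shown, so the sketch stops at the three candidate lags.

```python
import math

def normalized_autocorr(s, start, length, lag):
    """Normalized correlation between s[start:start+length] and the same
    segment delayed by `lag` samples (assumed form of R(i))."""
    idx = range(start, start + length)
    num = sum(s[n] * s[n - lag] for n in idx)
    e0 = sum(s[n] ** 2 for n in idx)
    e1 = sum(s[n - lag] ** 2 for n in idx)
    denom = math.sqrt(e0 * e1)
    return num / denom if denom > 0 else 0.0

def pitch_candidates(s, start, length, l_min=20, l_max=143):
    """Return the lag that maximizes R(i) in each of the three lag ranges.
    `start` must be at least l_max so that s[n - lag] is always valid."""
    ranges = [(l_min, l_min + 19), (l_min + 20, l_min + 39), (l_min + 40, l_max)]
    return [max(range(lo, hi + 1),
                key=lambda i: normalized_autocorr(s, start, length, i))
            for lo, hi in ranges]

# Example: a sinusoid with a period of 50 samples.
signal = [math.sin(2 * math.pi * n / 50) for n in range(400)]
candidates = pitch_candidates(signal, start=143, length=160)
```

For this periodic input, the second range yields the true lag of 50 while the third range yields 100, twice the true lag, which shows why a selection step among the candidates is needed.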
The signal classifier 120 determines whether harmonics are included in the input audio signal. That is, the signal classifier 120 detects values of the input signal such as a sharpness rate, a right and left energy rate, a zero-crossing rate, and a first-order prediction coefficient, and compares each detected value with its threshold value. If the comparison result satisfies a predetermined condition, the signal classifier 120 can determine that the harmonics are included in the input audio signal. The comparison can be performed in subframe units. The determination result of the signal classifier 120 is provided to the bit allocator 130.
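Two of the features named above can be computed as in the following sketch. The definitions used are the common textbook ones and are assumptions here, as are the threshold values and the decision rule, which the text leaves as a "predetermined condition". The sharpness rate and the right and left energy rate are omitted because their definitions are not given.

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def first_order_prediction_coefficient(frame):
    """Lag-1 autocorrelation divided by lag-0 autocorrelation."""
    r0 = sum(x * x for x in frame)
    r1 = sum(a * b for a, b in zip(frame, frame[1:]))
    return r1 / r0 if r0 > 0 else 0.0

def looks_harmonic(frame, zcr_max=0.3, pred_min=0.5):
    """Hypothetical decision rule with hypothetical thresholds."""
    return (zero_crossing_rate(frame) <= zcr_max
            and first_order_prediction_coefficient(frame) >= pred_min)

voiced_like = [math.sin(2 * math.pi * n / 40) for n in range(160)]
noise_like = [1.0 if n % 2 == 0 else -1.0 for n in range(160)]
```

In this toy setup the slowly varying sinusoid passes the check and the rapidly alternating sequence does not.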
The bit allocator 130 provides allocation bit information for the first harmonic coding module 140, the second harmonic coding module 160, and the CELP module 180 according to the determined result provided by the signal classifier 120. If a signal indicating that the harmonics are included in the input audio signal is provided by the signal classifier 120, the bit allocator 130 can provide information indicating that bits are allocated at a ratio of 3:3:2, for example, to the first harmonic coding module 140, the second harmonic coding module 160, and the CELP module 180. If a signal indicating that the harmonics are not included in the input audio signal is provided by the signal classifier 120, the bit allocator 130 can provide information indicating that bits are allocated at a ratio of 2:2:4, for example, to the first harmonic coding module 140, the second harmonic coding module 160, and the CELP module 180. The bit allocation information can be set in advance.
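As a worked illustration of the example ratios above, the following sketch splits a frame's bit budget among the three modules. The total budget of 80 bits, the function name, and the choice to give any rounding remainder to the CELP module are assumptions made for the example.

```python
def allocate_bits(total_bits, has_harmonics):
    """Split total_bits at 3:3:2 when harmonics are present, else 2:2:4."""
    ratio = (3, 3, 2) if has_harmonics else (2, 2, 4)
    unit = sum(ratio)
    shares = [total_bits * r // unit for r in ratio]
    # Assumption: bits left over after integer division go to the CELP module.
    shares[-1] += total_bits - sum(shares)
    names = ("first_harmonic", "second_harmonic", "celp")
    return dict(zip(names, shares))

with_harmonics = allocate_bits(80, True)
without_harmonics = allocate_bits(80, False)
```

With 80 bits, the harmonic case gives 30, 30, and 20 bits, and the non-harmonic case gives 20, 20, and 40 bits.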
The first harmonic coding module 140 performs harmonic coding on the input audio signal using the pitch lag and outputs a linear prediction coding (LPC) coefficient quantized for audio decoding, a quantized LPC (QLPC) coefficient index, and a quantized phase index.
To this end, the first harmonic coding module 140 includes a first harmonic analyzer 201, an amplitude/LPC coefficient converter 202, an LPC coefficient quantizer 203, a QLPC/amplitude converter 204, a phase quantizer 205, and a first harmonic synthesizer 206, as shown in
The first harmonic analyzer 201 analyzes harmonics of the input audio signal using a pitch lag (or a pitch delay). That is, the first harmonic analyzer 201 searches for a fundamental frequency ω0 using the pitch lag and searches for harmonic parameters using a sine dictionary. The harmonic parameters include an amplitude A and a phase φ.
The amplitude A and the phase φ of the sine dictionary are found using a matching pursuit (MP) algorithm in which the input audio signal s(n) is used as a target signal. The input audio signal SH(n) can be expressed using the sine dictionary as shown in Equation 3
where Ak is the amplitude of a k-th sine wave, ωk is an angular frequency of the k-th sine wave, φk is the phase of the k-th sine wave, wham(n) is a Hamming window, and K is the number of sine dictionaries, which is generally obtained using Equation 4.
The angular frequency ωk of the sine dictionaries can be obtained using Equation 5.
Referring to
where rh,k is a k-th target signal and Ek is a value obtained by multiplying a mean squared error between rh,k and a k-th sine dictionary by a hamming window wham. If k=0, rh,k(n) is the same as the original audio signal s(n). Ak and φk which minimize Ek can be defined using Equation 7.
The first harmonic analyzer 201 transmits the amplitude of the sine dictionary to the amplitude/LPC coefficient converter 202 and transmits the phase of the sine dictionary to the phase quantizer 205.
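The matching pursuit search described above can be sketched as follows. Equations 3 to 7 are not reproduced in this text, so several details are assumptions: the k-th sine wave is taken to be at ωk = (k+1)·ω0 with ω0 = 2π/tp, each sine wave has the form Ak·cos(ωk·n + φk), the error is weighted by a Hamming window, and the number of harmonics is passed in directly instead of being derived from Equation 4. All function names are hypothetical.

```python
import math

def hamming(n_samples):
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]

def fit_sinusoid(target, omega, window):
    """Amplitude and phase of A*cos(omega*n + phi) that minimize the
    window-weighted squared error against `target` (2x2 least squares)."""
    scc = sss = scs = rc = rs = 0.0
    for n, (r, w) in enumerate(zip(target, window)):
        c, s = math.cos(omega * n), math.sin(omega * n)
        scc += w * c * c
        sss += w * s * s
        scs += w * c * s
        rc += w * r * c
        rs += w * r * s
    det = scc * sss - scs * scs
    a = (rc * sss - rs * scs) / det   # a = A*cos(phi)
    b = (rs * scc - rc * scs) / det   # b = -A*sin(phi)
    return math.hypot(a, b), math.atan2(-b, a)

def harmonic_matching_pursuit(signal, pitch_lag, num_harmonics):
    """Fit one harmonic at a time; each fit uses the residual left by the
    previous one, starting from the original signal for k = 0."""
    omega0 = 2 * math.pi / pitch_lag
    window = hamming(len(signal))
    residual = list(signal)
    params = []
    for k in range(num_harmonics):
        omega_k = (k + 1) * omega0
        amp, phase = fit_sinusoid(residual, omega_k, window)
        params.append((amp, phase))
        residual = [r - amp * math.cos(omega_k * n + phase)
                    for n, r in enumerate(residual)]
    return params, residual

w0 = 2 * math.pi / 40
test_signal = [0.8 * math.cos(w0 * n + 0.3) + 0.3 * math.cos(2 * w0 * n - 1.0)
               for n in range(160)]
params, residual = harmonic_matching_pursuit(test_signal, 40, 2)
```

For this two-harmonic test signal the recovered amplitudes and phases are close to the values used to build it, and the residual is correspondingly small.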
The amplitude/LPC coefficient converter 202 converts the amplitude A of the input sine dictionary into an LPC coefficient. The LPC coefficient quantizer 203 quantizes the LPC coefficient using the allocated bit information provided by the bit allocator 130 and outputs the quantized LPC (QLPC) coefficient and the quantized LPC coefficient index.
The QLPC/amplitude converter 204 converts the quantized LPC coefficient into an amplitude vector Â of the quantized sine dictionary and outputs the amplitude vector Â.
The phase quantizer 205 quantizes a phase output from the first harmonic analyzer 201 based on the allocated bit information provided by the bit allocator 130 and outputs a quantized phase vector φ̂ and a quantized phase index.
The first harmonic synthesizer 206 synthesizes the amplitude vector Â of the quantized sine dictionary output from the QLPC/amplitude converter 204 and the quantized phase vector φ̂ output from the phase quantizer 205 using Equation 8 to obtain a synthesized audio signal ŝH(n) with respect to the input audio signal.
The first harmonic synthesizer 206 transmits the synthesized audio signal ŝH(n) to the first detector 150.
The first detector 150 detects and outputs a first difference audio signal obtained from the difference between the input audio signal and the synthesized audio signal output from the first harmonic coding module 140.
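The synthesis and difference steps can be sketched as follows. Equation 8 is not reproduced in this text, so the sketch assumes a plain sum of cosines at multiples of ω0 = 2π/tp, with no windowing or overlap-addition; the function names are hypothetical.

```python
import math

def synthesize_harmonics(amps, phases, pitch_lag, n_samples):
    """Sum of A_k*cos((k+1)*omega0*n + phi_k) over the given harmonics
    (assumed form of the harmonic synthesis)."""
    omega0 = 2 * math.pi / pitch_lag
    return [sum(a * math.cos((k + 1) * omega0 * n + p)
                for k, (a, p) in enumerate(zip(amps, phases)))
            for n in range(n_samples)]

def difference_signal(reference, synthesized):
    """Sample-wise difference, as produced by a detector."""
    return [r - s for r, s in zip(reference, synthesized)]

full = synthesize_harmonics([0.8, 0.3], [0.3, -1.0], 40, 80)
first_only = synthesize_harmonics([0.8], [0.3], 40, 80)
first_difference = difference_signal(full, first_only)
```

Here the first difference signal contains exactly the harmonic that the first stage did not model, which is what the next coding stage would then work on.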
The second harmonic coding module 160 harmonically codes the first difference audio signal detected by the first detector 150 using the quantized LPC coefficient obtained by the first harmonic coding module 140 and a previous output signal of the second harmonic coding module 160, and outputs a first synthesized difference audio signal, a quantized harmonic index for audio signal decoding, and a first quantized gain index.
To this end, referring to
The LPC coefficient analyzer 301 analyzes an LPC coefficient on the first difference audio signal output from the first detector 150 using the quantized LPC coefficient provided by the first harmonic coding module 140 and extracts an LPC coefficient needed by the second harmonic coding module 160.
The LPC coefficient analyzer 301 can be configured to extract a reduced LPC coefficient when the order of the quantized LPC coefficient provided by the first harmonic coding module 140 must be reduced according to the operation conditions of a corresponding audio coding apparatus. An LPC coefficient can be reduced by taking only the necessary LPC coefficients from the head part of the transmitted LPC coefficients. In this case, the number of LPC coefficients should be even. For example, when the order of the quantized LPC coefficient is P and the order of the LPC coefficient intended to be used in the second harmonic coding module 160 is Q, the Q LPC coefficients in the head part are extracted from all P LPC coefficients. The extracted LPC coefficients are provided to the inverse synthesis filter 302 and the synthesis filter 306, respectively.
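The order reduction described above amounts to keeping the first Q of the P transmitted coefficients, as in this short sketch. The function name and the choice to raise an error for an odd or out-of-range order are assumptions.

```python
def reduce_lpc_order(qlpc, target_order):
    """Keep the first `target_order` coefficients of the quantized LPC set."""
    if target_order % 2 != 0:
        raise ValueError("target order must be even")
    if not 0 < target_order <= len(qlpc):
        raise ValueError("target order must be between 1 and len(qlpc)")
    return list(qlpc[:target_order])

# Example with P = 10 placeholder values and Q = 6.
qlpc = [0.9, -0.4, 0.2, -0.1, 0.05, -0.02, 0.01, 0.0, 0.0, 0.0]
reduced = reduce_lpc_order(qlpc, 6)
```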
The inverse synthesis filter 302 performs the inverse operation of the operation performed by a synthesis filter on the first difference audio signal detected by the first detector 150 to generate an excitation signal of the first difference audio signal and transmits the generated excitation signal to the second harmonic analyzer 303.
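A minimal sketch of inverse synthesis filtering follows. It assumes the convention A(z) = 1 + Σ a_i·z^(−i), so that the excitation is e(n) = s(n) + Σ a_i·s(n−i), and it assumes zero history before the first sample; the sign convention actually used is not stated in this text, and the function name is hypothetical.

```python
def inverse_synthesis_filter(signal, lpc):
    """Apply A(z) = 1 + sum(a_i * z^-i) to `signal` to obtain the excitation."""
    excitation = []
    for n, x in enumerate(signal):
        acc = x
        for i, a in enumerate(lpc, start=1):
            if n - i >= 0:
                acc += a * signal[n - i]
        excitation.append(acc)
    return excitation

# A decaying sequence produced by the one-pole filter 1 / (1 - 0.5 z^-1)
# maps back to a single impulse.
excitation = inverse_synthesis_filter([1.0, 0.5, 0.25], [-0.5])
```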
Referring to
The index quantizer 304 quantizes the harmonic index output from the second harmonic analyzer 303 using the allocated bit information provided by the bit allocator 130 and outputs the quantized harmonic index and the quantized gain index.
The second harmonic synthesizer 305 has the same structure as the first harmonic synthesizer 206 of
The synthesis filter 306 outputs the first synthesized difference audio signal by synthesis filtering the synthesized audio signal output from the second harmonic synthesizer 305 using the quantized LPC coefficient output from the LPC coefficient analyzer 301. The first synthesized difference audio signal is output to the second detector 170.
The second detector 170 detects a difference audio signal obtained from the difference between the first difference audio signal output from the first detector 150 and the first synthesized difference audio signal output from the second harmonic coding module 160 and outputs the detected difference audio signal as a second difference audio signal.
The CELP module 180 CELP-codes the second difference audio signal output from the second detector 170 using the quantized LPC coefficient obtained by the first harmonic coding module 140 and outputs a quantized stochastic index and a second quantized gain index for decoding the audio signal.
To this end, the CELP module 180 includes a third detector 401, a perceptual weighting filter 402, a stochastic codebook search unit 403, an index quantizer 404, a stochastic codebook 405, a multiplier 406, an LPC coefficient analyzer 407, and a synthesis filter 408, as shown in
The third detector 401 detects a difference audio signal obtained from a difference between the second difference audio signal output from the second detector 170 and a synthesized audio signal previously obtained by the CELP module 180.
The perceptual weighting filter 402 applies perceptual weighting filtering to the difference audio signal output from the third detector 401 using the LPC coefficient provided by the LPC coefficient analyzer 407, so that, by exploiting the auditory masking effect, quantization noise of the difference audio signal is equal to or less than a masking level.
The stochastic codebook search unit 403 searches one corresponding stochastic codebook based on a signal output from the perceptual weighting filter 402 and outputs an index of the searched stochastic codebook.
The index quantizer 404 quantizes the index provided by the stochastic codebook search unit 403 and outputs the quantized stochastic codebook index and the quantized gain index.
The stochastic codebook 405 includes a plurality of stochastic codebooks and outputs a stochastic codebook that corresponds to the quantized stochastic codebook index provided by the index quantizer 404.
The multiplier 406 multiplies the stochastic codebook output from the stochastic codebook 405 by the quantized gain output from the index quantizer 404.
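The codebook search and gain selection can be sketched as follows. The text does not state the search criterion, so this sketch uses the least-squares criterion common in CELP coders: for each candidate vector the optimal gain is computed in closed form, and the candidate leaving the smallest squared error is kept. It also assumes that the candidate vectors have already been synthesis filtered and weighted so that they can be compared directly with the target; the function name is hypothetical.

```python
def search_codebook(target, codebook):
    """Return (index, gain) minimizing ||target - gain * code||^2."""
    best_index, best_gain, best_score = -1, 0.0, -1.0
    for index, code in enumerate(codebook):
        energy = sum(c * c for c in code)
        if energy == 0:
            continue  # an all-zero vector cannot reduce the error
        corr = sum(t * c for t, c in zip(target, code))
        # Error reduction achieved by this vector at its optimal gain.
        score = corr * corr / energy
        if score > best_score:
            best_index, best_gain, best_score = index, corr / energy, score
    return best_index, best_gain

codebook = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
index, gain = search_codebook([2.0, 2.0, 0.0], codebook)
```

In this example the third vector with a gain of 2.0 reproduces the target exactly, so it is selected.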
The LPC coefficient analyzer 407 analyzes the quantized LPC coefficient of the signal output from the third detector 401 using the quantized LPC coefficient provided by the first harmonic coding module 140 and extracts the quantized LPC coefficient. The method of extracting the quantized LPC coefficient is similar to the method used in the LPC coefficient analyzer 301 provided in the second harmonic coding module 160.
The extracted LPC coefficient is provided to the perceptual weighting filter 402 and the synthesis filter 408.
The synthesis filter 408 performs synthesis filtering on the signal output from the multiplier 406 using the quantized LPC coefficient output from the LPC coefficient analyzer 407 and provides the synthesis-filtered result to the third detector 401. The synthesis filtering is performed by obtaining an impulse response of the synthesis filter 408 from the quantized LPC coefficient and then convolving the impulse response with the signal output from the multiplier 406 to obtain the synthesized audio signal.
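The impulse-response approach described above can be sketched as follows. It assumes an all-pole synthesis filter 1/A(z) with A(z) = 1 + Σ a_i·z^(−i), a sign convention this text does not confirm, and it truncates the impulse response to the frame length; the function names are hypothetical.

```python
def impulse_response(lpc, length):
    """First `length` samples of the impulse response of 1/A(z)."""
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0
        for i, a in enumerate(lpc, start=1):
            if n - i >= 0:
                acc -= a * h[n - i]
        h.append(acc)
    return h

def synthesis_by_convolution(excitation, lpc):
    """Convolve the excitation with the truncated impulse response."""
    h = impulse_response(lpc, len(excitation))
    return [sum(excitation[m] * h[n - m] for m in range(n + 1))
            for n in range(len(excitation))]

h = impulse_response([-0.5], 4)
synthesized = synthesis_by_convolution([1.0, 1.0, 0.0, 0.0], [-0.5])
```

Within one frame and with zero initial state, the result matches what a recursive implementation of the same filter would produce.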
The inverse quantizers 501, 502, 503, 504, 505, and 506 can constitute an inverse quantization unit for inversely quantizing a plurality of parameters for restoring an audio signal.
The first harmonic decoding module 510 performs harmonic decoding using an LPC coefficient output from the LPC coefficient inverse quantizer 501 and a phase vector output from the phase index inverse quantizer 502 to output the restored audio signal including harmonics.
To this end, the first harmonic decoding module 510 includes an LPC coefficient/amplitude converter 511 and a harmonic synthesizer 512.
The LPC coefficient/amplitude converter 511 converts the LPC coefficient into an amplitude vector Â of a sine dictionary. The harmonic synthesizer 512 synthesizes the phase vector φ̂ output from the phase index inverse quantizer 502 with the amplitude vector Â of the sine dictionary output from the LPC coefficient/amplitude converter 511 using Equation 8 and outputs an audio signal including harmonics. The output audio signal including harmonics is output to the first adder 530.
The second harmonic decoding module 520 performs harmonic decoding based on the LPC coefficient output from the LPC coefficient inverse quantizer 501, a harmonic index output from the harmonic index inverse quantizer 503, and a first gain value output from the first gain index inverse quantizer 504.
To this end, the second harmonic decoding module 520 includes a harmonic code generator 521, a first multiplier 522, and a first synthesis filter 523.
The harmonic code generator 521 includes a plurality of harmonic codes and generates a harmonic code based on the input harmonic index. The first multiplier 522 multiplies the generated harmonic code by the first gain value.
The first synthesis filter 523 performs synthesis filtering on the signal output from the first multiplier 522 based on the input LPC coefficient and outputs the synthesized and filtered audio signal to the first adder 530. If the audio signal output from the first multiplier 522 is sh(n), the LPC coefficient is a and the synthesized and filtered audio signal is s1(n), the synthesis filtering can be defined by Equation 9
where p is the order of the LPC coefficient.
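The second harmonic decoding path (code lookup, gain multiplication, synthesis filtering) can be sketched as follows. Equation 9 is not reproduced in this text, so the recursion used here, s1(n) = sh(n) − Σ a_i·s1(n−i) for i = 1 to p with zero initial state, is an assumed form and its sign convention may differ from the original. The function names and the toy table of harmonic codes are hypothetical.

```python
def synthesis_filter(excitation, lpc):
    """All-pole recursion: out(n) = x(n) - sum(a_i * out(n - i))."""
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for i, a in enumerate(lpc, start=1):
            if n - i >= 0:
                acc -= a * out[n - i]
        out.append(acc)
    return out

def decode_second_harmonic(harmonic_codes, harmonic_index, gain, lpc):
    """Look up the harmonic code, scale it by the gain, then filter it."""
    scaled = [gain * c for c in harmonic_codes[harmonic_index]]
    return synthesis_filter(scaled, lpc)

harmonic_codes = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
decoded = decode_second_harmonic(harmonic_codes, 1, 2.0, [-0.5])
```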
The first adder 530 adds the signal output from the first harmonic decoding module 510 to the signal output from the second harmonic decoding module 520 and outputs the added result to the second adder 550.
The CELP decoding module 540 performs CELP decoding based on the stochastic index output from the stochastic index inverse quantizer 505 and the second gain value output from the second gain index inverse quantizer 506.
To this end, the CELP decoding module 540 includes a stochastic codebook 541, a second multiplier 542, and a second synthesis filter 543.
The stochastic codebook 541 includes a plurality of stochastic codebooks and outputs a stochastic codebook corresponding to the stochastic index.
The second multiplier 542 multiplies the second gain value by the stochastic codebook. The second synthesis filter 543 provides the synthesized audio signal obtained by performing synthesis filtering on the signal output from the second multiplier 542 based on the LPC coefficient using Equation 9 to the second adder 550.
The second adder 550 adds the signal output from the first adder 530 to the signal output from the CELP decoding module 540 to restore the audio signal and outputs the restored audio signal.
In operation 601, a pitch of an input audio signal is analyzed to obtain a pitch lag.
In operation 602, it is determined whether harmonics are included in the input audio signal to classify the input audio signal, and bits are allocated to the first harmonic coding module 140, the second harmonic coding module 160, and the CELP module 180 based on the classification.
In operation 603, harmonic coding is performed with respect to the input audio signal by the first harmonic coding module 140 using the pitch lag obtained in operation 601, without analyzing an LPC coefficient. That is, harmonic analysis with respect to the input audio signal is performed, the amplitude of the sine dictionary detected by harmonic analysis is converted into an LPC coefficient, the LPC coefficient is quantized and converted into the amplitude vector, and harmonic synthesis is performed. The quantized LPC coefficient is used in second harmonic coding and CELP coding.
In operation 604, a difference audio signal obtained as a difference between the input audio signal and the harmonic coding result obtained in operation 603 is set as a first difference audio signal, an LPC coefficient of the first difference audio signal is analyzed, and harmonic coding is performed on the first difference audio signal by the second harmonic coding module 160. Here, the LPC coefficient of the first difference audio signal is extracted using the quantized LPC coefficient detected in operation 603.
In operation 605, a difference audio signal obtained as a difference between the harmonic coding result obtained from the first difference audio signal and the input audio signal is set as a second difference audio signal, and the second difference audio signal is CELP coded by the CELP module 180. In the CELP coding, the LPC coefficient of the second difference audio signal is extracted using the quantized LPC coefficient detected in operation 603.
In operation 606, the plurality of parameters obtained in operations 603, 604, and 605 are transmitted in order to decode an audio signal. The plurality of parameters include the quantized LPC coefficient index, a quantized phase index, a quantized harmonic index, a first quantized gain index, a quantized stochastic index, and a second quantized gain index.
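The three-stage structure of operations 603 to 605 can be illustrated with a generic residual cascade: each stage codes what the previous stage left behind, and the decoder adds the stage outputs together. In this sketch the three coders are stand-ins (simple rounding to progressively finer steps), not the harmonic and CELP coders themselves, and the second difference is formed as the first difference minus the second stage output, as in the description of the second detector 170. All names are hypothetical.

```python
def make_rounder(step):
    """Toy stand-in for a coding stage: rounds each sample to a grid."""
    return lambda x: [round(v / step) * step for v in x]

def cascade_encode(signal, stage1, stage2, stage3):
    y1 = stage1(signal)                         # first stage output
    d1 = [s - y for s, y in zip(signal, y1)]    # first difference signal
    y2 = stage2(d1)                             # second stage output
    d2 = [d - y for d, y in zip(d1, y2)]        # second difference signal
    y3 = stage3(d2)                             # third stage output
    return y1, y2, y3

def cascade_decode(y1, y2, y3):
    # First adder combines the two harmonic outputs; second adder adds CELP.
    return [(a + b) + c for a, b, c in zip(y1, y2, y3)]

signal = [0.37, -1.234, 2.5]
stages = cascade_encode(signal, make_rounder(1.0), make_rounder(0.1),
                        make_rounder(0.01))
restored = cascade_decode(*stages)
```

Each stage narrows the remaining error, so the restored signal is within half of the finest step of the original.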
A plurality of parameters for restoring an audio signal are received in operation 701, and each of the plurality of received parameters is inverse quantized in operation 702.
In operation 703, harmonic decoding is performed by the first harmonic decoding module 510 based on an LPC coefficient and a phase value obtained in operation 702. In operation 704, harmonic decoding is performed by the second harmonic decoding module 520 based on the LPC coefficient, a harmonic index, and a first gain value obtained in operation 702. In operation 705, an audio signal in which the first harmonic decoding result obtained in operation 703 is added to the second harmonic decoding result obtained in operation 704 is obtained. In operation 706, CELP decoding is performed by the CELP decoding module 540 based on a stochastic index and a second gain value obtained in operation 702.
In operation 707, the addition result obtained in operation 705 is added to the CELP decoding result obtained in operation 706 to restore the audio signal.
Embodiments of the present invention can also be embodied as computer readable code on a computer readable recording medium. The computer readable recording medium is any data storage device that can store data which can be thereafter read by a computer system. Examples of the computer readable recording medium include read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, optical data storage devices, and carrier waves (such as data transmission through the Internet). The computer readable recording medium can also be distributed over network coupled computer systems so that the computer readable code is stored and executed in a distributed fashion. Also, functional programs, code, and code segments for accomplishing the present invention can be easily construed by programmers skilled in the art to which the present invention pertains.
According to the above-described embodiments of the present invention, harmonic analysis is performed twice such that more harmonics can be searched for using the same bits.
Allocation of bits used in harmonic coding is variably performed according to whether harmonics are included in the input audio signal such that a coarse granularity scalability function can be easily supported and harmonic sound quality can be optimized.
In addition, after harmonic coding in which the LPC coefficient is not analyzed is performed, harmonic coding in which the LPC coefficient is analyzed is performed, and then CELP coding is performed, such that pitch halving prediction or pitch doubling prediction can be prevented and lowering of sound quality can be minimized.
Although a few embodiments of the present invention have been shown and described, the present invention is not limited to the described embodiments. Instead, it would be appreciated by those skilled in the art that changes may be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
10-2005-0020136 | Mar 2005 | KR | national |