The present invention relates to an encoder, a decoder and a method thereof.
Speech coding mainly uses two types of coding technology: transform coding and transform coded excitation (TCX) coding (for example, Non-Patent Literature 1).
Transform coding involves, for example, a step of converting a signal from the time domain to the frequency domain using discrete Fourier transform (DFT) or modified discrete cosine transform (MDCT). Transform coding then quantizes and encodes the resulting spectrum coefficients. Common transform coding schemes include MPEG MP3, MPEG AAC (for example, Non-Patent Literature 2), and Dolby AC3. Transform coding is efficient for music signals and general speech signals.
In an encoder of transform coding system 10 shown in
In a decoder of transform coding system 10 shown in
By contrast, TCX coding combines a time domain (linear prediction) method with a frequency domain (transform coding) method. TCX coding applies linear prediction to an input speech signal to exploit the redundancy of the speech signal in the time domain, thereby acquiring a residual (excitation) signal. For a speech signal, and especially for an active speech section (with a resonance effect and a high pitch frequency component), this model generates a reproduced audio signal efficiently. After linear prediction, the residual (excitation) signal is converted into the frequency domain and efficiently encoded. Common TCX coding schemes include AMR-WB-E, ITU-T G.729.1, and ITU-T G.718 (for example, Non-Patent Literature 4).
In an encoder of TCX coding system 20 shown in
In a decoder of TCX coding system 20 shown in
The transform coding part of both transform coding and TCX coding is normally carried out using some quantizing method. One type of vector quantization is referred to as pulse vector coding.
For example, Non-Patent Literature 3 discloses factorial pulse coding (one of pulse vector coding) which quantizes a LPC residual in the MDCT domain (see
In an encoder of TCX coding system 30 shown in
Lefebvre, et al, “High quality coding of wideband audio signals using transform coded excitation (TCX)”, IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 1/193-1/196, April 1994
Karl Heinz Brandenburg, “MP3 and AAC Explained”, AES 17th International Conference, Florence, Italy, September 1999.
Udar Mittal, James P. Ashley and Edgardo M. Cruz-Zeno, “Low complexity factorial pulse coding of MDCT coefficients using approximation of combinatorial functions”, IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1-289-1-292, April 2007.
T. Vaillancourt et al, “ITU-T EV-VBR: A Robust 8-32 kbit/s Scalable Coder for Error Prone Telecommunication Channels”, in Proc. Eusipco, Lausanne, Switzerland, August 2008
At a low bit rate, the number of spectrum coefficients to be encoded is normally much greater than the number of pulses encoded by pulse vector coding. For example, four conditions referred to in Non-Patent Literature 3 are shown in the following table 1.
In the fifth layer in G.718, a relationship between the number of spectrum coefficients N and M representing the number of pulses which can be encoded is shown in the following table 2.
In view of the above, N is much greater than M in most conditions.
Here, when N is large, more bits are required for encoding a pulse position, and therefore more bits are required for encoding each pulse. Accordingly, when the bit rate is not sufficiently high, only a few pulses can be encoded. As a result, a large part of the spectrum remains unencoded, and this may cause the sound quality of the decoded signal to be extremely poor.
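The relationship between N and the per-pulse cost can be illustrated with a small sketch. It assumes a deliberately simple model in which each pulse position is indexed independently with a fixed-length code; the function names, the value of `other_bits_per_pulse`, and the numbers in the usage note are hypothetical and are not taken from tables 1 or 2.

```python
def position_bits(num_coeffs: int) -> int:
    """Bits needed to index one position among num_coeffs candidates.

    Assumption: a fixed-length index per pulse, i.e. ceil(log2(num_coeffs)).
    """
    if num_coeffs < 1:
        raise ValueError("num_coeffs must be at least 1")
    return (num_coeffs - 1).bit_length()


def pulses_in_budget(num_coeffs: int, bit_budget: int,
                     other_bits_per_pulse: int = 2) -> int:
    """Number of pulses that fit in bit_budget under the simple model.

    other_bits_per_pulse is a hypothetical allowance for amplitude and
    polarity; it is not a value from the source.
    """
    per_pulse = position_bits(num_coeffs) + other_bits_per_pulse
    return bit_budget // per_pulse
```

Under these assumptions, a budget of 44 bits covers 4 pulses when N = 280 but 5 pulses when only 64 positions are candidates, which is the effect the following sections exploit.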
It is therefore an object of the present invention to provide an encoder, a decoder, and a method thereof which can improve decoded signal quality by improving bit efficiency in coding.
An encoder according to the present invention employs a configuration to include a time-frequency conversion section that converts a coding target signal into a frequency domain signal; an effective range specifying section that specifies an effective range in a frequency band of the frequency domain signal; and a pulse vector coding section that performs pulse vector coding on only a signal component within the effective range.
A decoder according to the present invention employs a configuration to include a pulse vector decoding section that performs pulse vector decoding on a pulse coding parameter coded in the above encoder; a spectrum forming section that sets a decoded signal acquired in the pulse vector decoding section to a band corresponding to the effective range; and a frequency-time conversion section that converts a decoded signal set to the band corresponding to the effective range into a time domain signal.
A coding method according to the present invention employs a configuration to include a step of converting a coding target signal into a frequency domain signal; a step of specifying an effective range in a frequency band of the frequency domain signal; and a step of performing pulse vector coding on only a signal component within the effective range.
A decoding method according to the present invention employs a configuration to include a decoding step of performing pulse vector decoding on a pulse coding parameter coded in the above coding method; a spectrum forming step of setting a decoded signal acquired in the decoding step, to a band corresponding to the effective range; and a converting step of converting a decoded signal arranged in the band corresponding to the effective range into a time domain signal.
According to the present invention, it is possible to provide a spectrum coefficient coding apparatus (encoder), a decoder, and a method thereof which can improve decoded signal quality by improving bit efficiency in coding.
Embodiments according to the present invention will be described below in detail with reference to the drawings. In the embodiments, identical configuration elements are assigned the same reference codes, and duplicate descriptions thereof are omitted.
In
Adaptive spectrum forming coding section 102 acquires “an effective range” in a frequency band of S(f) and acquires Sa(f) which falls within the effective range in S(f). Also, adaptive spectrum forming coding section 102 calculates spectrum coefficients of Sa(f) which falls within the effective range.
Adaptive spectrum forming coding section 102 outputs the spectrum coefficient of Sa(f) which falls within the effective range to pulse vector coding section 103, and transmits spectrum forming information showing the effective range to the decoder side through multiplexing section 104.
Pulse vector coding section 103 performs pulse vector coding for the spectrum coefficient of Sa(f) which falls within the effective range, thereby acquiring a pulse coding parameter such as a pulse position, a pulse amplitude, a pulse polarity, and a global gain.
Multiplexing section 104 multiplexes the pulse coding parameter acquired in pulse vector coding section 103 with the spectrum forming information and transmits the result to the decoder side.
Also, in a decoder shown in
Pulse vector decoding section 106 acquires spectrum coefficients of Sa{tilde over ( )}(f) by decoding a pulse coding parameter. Sa{tilde over ( )}(f) corresponds to Sa(f) and is a base signal for forming S{tilde over ( )}(f) which is a decoded signal of S(f).
Adaptive spectrum forming decoding section 107 generates frequency domain signal S{tilde over ( )}(f) using Sa{tilde over ( )}(f) and spectrum forming information showing an effective range. Specifically, adaptive spectrum forming decoding section 107 generates frequency domain signal S{tilde over ( )}(f) by setting Sa{tilde over ( )}(f) which is a decoding result in pulse vector decoding section 106 to a band in an effective range.
Frequency-time conversion section 108 generates time domain signal S{tilde over ( )}(n) by converting frequency domain signal S{tilde over ( )}(f), into the time domain using inverse discrete Fourier transform (IDFT), inverse modified discrete cosine transform (IMDCT) or the like.
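The spectrum forming step performed in adaptive spectrum forming decoding section 107 can be sketched as follows. This is a minimal illustration, assuming the spectrum forming information gives the start index of the effective range and that positions outside the range are set to zero; the function and parameter names are hypothetical.

```python
def form_spectrum(sa_decoded, start, num_coeffs):
    """Place decoded effective-range coefficients into a full-band spectrum.

    sa_decoded : decoded coefficients of the effective range
    start      : index of the first coefficient of the effective range
    num_coeffs : number of coefficients in the full spectrum
    Coefficients outside the effective range are assumed to be zero.
    """
    if start < 0 or start + len(sa_decoded) > num_coeffs:
        raise ValueError("effective range does not fit in the spectrum")
    spectrum = [0.0] * num_coeffs
    spectrum[start:start + len(sa_decoded)] = sa_decoded
    return spectrum
```

The resulting list would then be passed to the inverse transform (IDFT, IMDCT or the like), which is outside the scope of this sketch.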
Of the overall spectrum of frequency domain signal S(f), spectrum specifying section 201 specifies the top M spectrum coefficients in terms of amplitude absolute value (that is to say, the M spectrum coefficients taken in descending order of amplitude absolute value). Here, M is the number of pulses to be encoded and is derived from the number of available bits and the number of spectrum coefficients of frequency domain signal S(f). SMax
Minimum position specifying section 202 detects minimum position (the lowest frequency) N1 among the top M spectrum coefficients of an amplitude absolute value.
Maximum position specifying section 203 detects maximum position (the highest frequency) N2 among the top M spectrum coefficients of an amplitude absolute value.
Here, one of the simplest methods for detecting minimum position N1 and maximum position N2 is to store the positions of the M spectrum coefficients in a sequence and then sort the sequence so as to acquire its maximum value and minimum value. The maximum value of the positions calculated in this way is N2 and the minimum value is N1. The part between N1 and N2 is “an effective range,” and it is considered that there is no pulse in the remaining spectrum. Minimum position N1 and maximum position N2 represent spectral shape information and are transmitted (reported) to the decoder side through multiplexing section 104.
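The detection of N1 and N2 described above can be sketched as follows. The sketch assumes zero-based positions and that ties in amplitude are resolved in favor of the lower position; the function name is hypothetical.

```python
def effective_range(spectrum, m):
    """Return (n1, n2): lowest and highest positions among the top-m
    coefficients ranked by absolute amplitude.

    Positions are zero-based indices into spectrum (an assumption of
    this sketch).
    """
    if not 0 < m <= len(spectrum):
        raise ValueError("m must be between 1 and len(spectrum)")
    # Rank positions by descending absolute amplitude and keep the top m.
    top = sorted(range(len(spectrum)),
                 key=lambda i: abs(spectrum[i]),
                 reverse=True)[:m]
    return min(top), max(top)
```

For example, with `spectrum = [0.1, 0.0, 0.9, -0.2, 1.5, 0.05, -0.7, 0.0]` and `m = 3`, the three largest magnitudes are at positions 4, 2 and 6, so the effective range is positions 2 to 6 and `spectrum[2:7]` would be handed to pulse vector coding.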
Operations of coding system 100 having the above configuration will be explained.
In an encoder of coding system 100, adaptive spectrum forming coding section 102 specifies an effective range (a range between N1 and N2 in
Specifically, spectrum specifying section 201 of adaptive spectrum forming coding section 102 specifies, from the overall spectrum of frequency domain signal S(f), the top M spectrum coefficients in terms of amplitude absolute value. Then, minimum position specifying section 202 detects minimum position N1 (the lowest frequency) among these top M spectrum coefficients, and maximum position specifying section 203 detects maximum position N2 (the highest frequency) among them. The effective range is the range where N1 is the starting point and N2 is the end point.
Next, pulse vector coding section 103 acquires a pulse coding parameter by performing pulse vector coding on the spectrum coefficient within an effective range, which is specified in adaptive spectrum forming coding section 102. Here, it is considered that there is no pulse in a spectrum which is out of an effective range. The pulse coding parameter and spectrum forming information showing an effective range, which are acquired in this way, are multiplexed in multiplexing section 104 and transmitted to the decoder side.
In this way, it is possible to reduce the number of spectrum coefficients which are a target of pulse vector coding by applying pulse vector coding to not the overall spectrum but only a part thereof, thereby making it possible to reduce the number of bits required for encoding a pulse. That is to say, it is possible to improve bit efficiency in coding. Further, it is possible to improve decoded signal quality by utilizing the reduced bits as described below. The method for utilizing the bits includes, first, increasing the number of pulses using the reduced bits, and second, using the reduced bits for encoding other parameters without changing the number of pulses.
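The bit saving can be estimated with a rough counting model. The sketch below assumes that pulse positions are coded jointly as one of C(n, m) position sets and that N1 and N2 are each sent as a plain fixed-length index; this is an illustrative model only, not the coding scheme of Non-Patent Literature 3, and amplitude, polarity and gain bits are ignored. All names are hypothetical.

```python
import math


def position_set_bits(n: int, m: int) -> int:
    """Bits to index one of the C(n, m) ways of placing m pulses
    at distinct positions among n coefficients."""
    return (math.comb(n, m) - 1).bit_length()


def net_saving(num_coeffs: int, n1: int, n2: int, m: int) -> int:
    """Bits saved by coding m pulses inside [n1, n2] instead of the full
    spectrum, after paying for N1 and N2 as side information.

    A negative result means the side information costs more than it saves.
    """
    width = n2 - n1 + 1
    side_info = 2 * (num_coeffs - 1).bit_length()
    full = position_set_bits(num_coeffs, m)
    limited = position_set_bits(width, m) + side_info
    return full - limited
```

Under this model the saving is positive when the effective range is much narrower than the full spectrum (for example, 64 of 256 coefficients with 10 pulses), and it can become negative for very short spectra, where the cost of sending N1 and N2 dominates.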
In a decoder of coding system 100, adaptive spectrum forming decoding section 107 receives a pulse vector decoding result which corresponds to spectrum coefficients of Sa(f) in an encoder, and spectrum forming information. Then, adaptive spectrum forming decoding section 107 can form frequency domain signal S{tilde over ( )}(f) which corresponds to S(f) in an encoder by arranging a pulse vector decoding result within an effective range shown by spectrum forming information (see
In view of the above, according to the present Embodiment, a spectrum effective range is determined by a range in which all pulses are arranged. That is to say, a spectrum effective range is adaptively determined in accordance with signal characteristics. Further, pulse vector coding is applied to not the overall spectrum but limited to an effective range. Since the number of spectrum coefficients within an effective range is smaller than the number of spectrum coefficients in the overall spectrum, the number of bits required for encoding the same number of pulses is reduced. That is to say, it is possible to improve bit efficiency in coding. Further, it is possible to improve decoded signal quality by utilizing reduced bits.
In the above-described Embodiment, the following modified examples are possible.
It is possible to apply any limitation upon specifying an effective range for the purpose of reducing the number of bits required for transmitting a starting position and an end position of the effective range. Here, an embodiment which sets a step size upon specifying an effective range to more than 1 will be explained.
In
In view of the above, it is possible to reduce candidates of a starting position and an end position by setting a step width to an integer more than one upon specifying an effective range. As a result, it is possible to reduce bits required for transmitting a starting position and an end position.
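The effect of the step width can be sketched as follows. The sketch assumes that the starting position is rounded down and the end position is rounded up to the step grid, so that no specified pulse falls outside the range; the rounding direction, the zero-based indexing, and the function names are assumptions of this illustration.

```python
def snap_range(n1: int, n2: int, step: int, num_coeffs: int):
    """Widen [n1, n2] so that it starts and ends on a grid of width step."""
    if step < 1:
        raise ValueError("step must be at least 1")
    start = (n1 // step) * step
    end = min((n2 // step + 1) * step - 1, num_coeffs - 1)
    return start, end


def boundary_bits(num_coeffs: int, step: int) -> int:
    """Bits to send both boundaries as fixed-length grid indices."""
    candidates = -(-num_coeffs // step)  # ceiling division
    return 2 * (candidates - 1).bit_length()
```

With 32 coefficients, a step of 1 needs 10 bits for the two boundaries, whereas a step of 4 needs 6 bits, at the price of a slightly wider effective range (for example, positions 5 to 18 become 4 to 19).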
In the above Embodiment 1, the method of reducing the number of bits required for pulse vector coding by an adaptive spectrum forming technology has been described. Embodiment 1 also discloses that it is possible to improve decoded signal quality by arranging additional pulses between N1 and N2 using the saved bits. In that case, a limitation is imposed whereby all additional pulses are arranged between N1 and N2. In addition, N1 and N2 are determined in accordance with the original number of pulses.
However, if the best position of an additional pulse is out of a range between N1 and N2, there is a problem that performance is not efficiently improved by this limitation. Accordingly, in modified example 2, to solve the problem, a configuration will be explained where an additional pulse can be arranged in a lower position (frequency) than N1, or a higher position (frequency) than N2, after N1 and N2 are determined. By this method, decoded signal quality can be further improved.
Adaptive spectrum forming coding section 102, for example, determines N1
In view of the above, a band (an effective range) in which a pulse is arranged in pulse vector coding is adaptively determined in accordance with the number of additional pulses. That is to say, modified example 2 has a feature of relieving the border of an effective range and includes the best position of an additional pulse for this feature. By this means, it is possible to improve decoded signal quality.
The present invention according to Embodiment 2 divides a frequency band into several subbands and analyzes signal characteristics for each subband, thereby determining whether or not the subband is within an effective range. Then, a flag signal showing the determination is transmitted to the decoder side.
In
Band dividing section 301 divides a frequency band of S(f) into a plurality of subbands and divides S(f) into subband signal Sn(f) which is present at each subband. Here, n represents a subband number. In
Forming determination section 302 analyzes three subband signals S1(f), S2(f), and S3(f) together with frequency domain signal S(f). Forming determination section 302 determines whether or not each subband is within an effective range in accordance with signal characteristics of each subband signal and outputs flag signals (F1,F2,F3) showing determination, as spectrum forming information.
Specifically, forming determination section 302 detects Smax(M) in which an amplitude absolute value is the Mth greatest of the overall frequency domain signal S(f). Also, forming determination section 302 detects spectrum coefficient Sn
Spectrum forming section 303 forms a spectrum in an effective range in accordance with the determination result output from forming determination section 302 and outputs the spectrum to pulse vector coding section 103. Flag signals (F1,F2,F3) showing a determination are also output to multiplexing section 104 and transmitted to the decoder side through multiplexing section 104.
Spectrum detecting section 401 detects Smax (M) in which an amplitude absolute value is the Mth greatest of the overall frequency domain signal S(f) (specifying of a standard value). Here, M is the number of pulses to be encoded, and is calculated from the number of available bits, and the number of spectrum coefficients in a frequency domain signal.
Of frequency domain subband signals which are included in subband 1-3, maximum spectrum detecting section 402-1˜3 respectively detects spectrum coefficients S1
Comparison sections 403-1˜3 compares spectrum coefficient S1
Specifically, this determination is performed as follows. Taking the first subband as an example, the determination is performed as follows. If Smax(M)≦S1
Flag signals F1, F2, and F3 acquired in this way are transmitted to the decoder side as spectrum forming information.
Next, the operations of adaptive spectrum forming coding section 102A having the above configurations will be described.
Spectrum forming section 303 forms an effective range and signal Sn(f) within the effective range by eliminating the second subband and adding (combining) the third subband to the first subband based on these flag signals.
Subsequent pulse vector coding section 103 performs pulse vector coding of Sa(f) formed in this way.
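The flag determination and spectrum forming of this embodiment can be sketched as follows. The sketch assumes that a subband is judged to be within the effective range when its largest amplitude absolute value is at least Smax(M), and that the selected subbands are concatenated in frequency order; subband boundaries are given as hypothetical (start, end) index pairs with an exclusive end.

```python
def subband_flags(spectrum, edges, m):
    """Return one flag per subband: 1 if the subband is in the effective range.

    Assumption: a subband qualifies when its maximum absolute amplitude is
    at least Smax(M), the M-th largest absolute amplitude of the full band.
    """
    threshold = sorted((abs(x) for x in spectrum), reverse=True)[m - 1]
    flags = []
    for start, end in edges:
        sub_max = max(abs(x) for x in spectrum[start:end])
        flags.append(1 if sub_max >= threshold else 0)
    return flags


def form_effective_signal(spectrum, edges, flags):
    """Concatenate the subbands whose flag is 1, dropping the others."""
    out = []
    for (start, end), flag in zip(edges, flags):
        if flag:
            out.extend(spectrum[start:end])
    return out
```

For a nine-coefficient spectrum split into three equal subbands in which the three largest magnitudes lie in the first and third subbands, the flags are (1, 0, 1), so the second subband is eliminated and the third is joined to the first, as in the operation described above.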
In view of the above, according to the present embodiment, a frequency band of S(f) is divided into a plurality of subbands and S(f) is divided into subband signal Sn(f) which is present at each subband. Then determination is made whether or not the subband is within an effective range by analyzing signal characteristics with respect to each subband signal, and a flag signal showing the determination is transmitted.
By this means, bits required for representing an effective range are only a flag signal of each subband, and therefore the number of bits for representing an effective range can be reduced, compared with a method of transmitting a starting position and an end position of an effective range as in Embodiment 1. Using bits reduced in this way for increasing the number of additional pulses, it is possible to further improve decoded signal quality in the decoder side.
The present invention according to Embodiment 3, as in Embodiment 2, divides a frequency band into several subbands and analyzes signal characteristics for each subband, thereby determining whether or not the subband is within an effective range. Then, a flag signal showing the determination is transmitted to the decoder side. It is noted that the present invention according to Embodiment 3 treats the middle band of the frequency band as always being included in the effective range, and determines inclusion in the effective range only for the subband group at the end parts of the frequency band (that is, the lower band and the higher band).
In
Forming determination section 501 analyzes lower subband signal S1(f) and higher subband signal S3(f) of the three subbands together with frequency domain signal S(f). As described above, since the middle band is treated as always being included in the effective range, forming determination section 501 does not analyze middle subband signal S2(f). Then, forming determination section 501 outputs flag signals (F1,F3) showing the determination as spectrum forming information.
Spectrum forming section 502 forms a spectrum in an effective range in accordance with a determination result output from forming determination section 501 and outputs the spectrum to pulse vector coding section 103. Flag signals (F1,F3) showing determination are also output to multiplexing section 104 and transmitted to the decoder side through multiplexing section 104.
Next, the operations of adaptive spectrum forming coding section 102B having the above configurations will be described.
Spectrum forming section 502 forms an effective range and signal Sa(f) within the effective range by eliminating the first subband and adding (combining) the third subband to the second subband, which is treated as always being included in the effective range, based on these flag signals.
Subsequent pulse vector coding section 103 performs pulse vector coding of Sa(f) formed in this way.
The above-described configuration of adaptive spectrum forming coding section 102B is effective for an input signal containing perceptually important information in the middle band. For example, in layered coding (scalable coding), there is a configuration in which a lower band is coded in a lower layer and all bands are coded in a higher layer. In this case, the lower band of the signal coded in the higher layer is formed with a differential signal between the input signal and the lower layer decoded signal, and the higher band is formed with the input signal itself. At this time, since the lower band has already been coded in the lower layer, it is unlikely that important information remains in the lower band. On the other hand, in the higher band, a speech signal in particular rarely contains important information in the first place. For such a signal, the middle band contains relatively important information, and it is therefore better to always include the subband corresponding to the middle band in the effective range. In that case, the flag information may be only two bits, F1 and F3, for the lower band and the higher band.
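How a decoder might rebuild the full-band spectrum from the two flags can be sketched as follows. This is an illustration under stated assumptions: three subbands with hypothetical (start, end) boundaries shared by encoder and decoder, a middle subband that is always treated as included, and zeros in any subband that was eliminated. The function name is hypothetical.

```python
def restore_spectrum(sa_decoded, edges, f1, f3):
    """Scatter decoded effective-range coefficients back to their subbands.

    edges : [(start, end), ...] for the lower, middle and higher subbands,
            with exclusive end indices
    f1, f3: flags for the lower and higher subbands; the middle subband is
            always in the effective range, so no flag is sent for it
    """
    flags = [f1, 1, f3]
    spectrum = [0.0] * edges[-1][1]
    pos = 0
    for (start, end), flag in zip(edges, flags):
        if flag:
            width = end - start
            spectrum[start:end] = sa_decoded[pos:pos + width]
            pos += width
    if pos != len(sa_decoded):
        raise ValueError("decoded length does not match the flagged subbands")
    return spectrum
```

With `edges = [(0, 2), (2, 5), (5, 7)]`, `f1 = 0` and `f3 = 1`, five decoded coefficients fill positions 2 to 6 and the lower subband stays at zero.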
Besides configurations described in Embodiments 2 and 3, according to characteristics of an input signal, there can be various configurations in an adaptive spectrum forming coding section which specifies an effective range by dividing a frequency band into several subbands and analyzing signal characteristics for each subband to determine whether or not the band is within an effective range.
Embodiment 4 combines an adaptive spectrum forming technology with a signal classification section or a psychoacoustic model, or signal-to-noise ratio calculation or the like. By this means, it is possible to determine an effective range more appropriately in accordance with signal characteristics, perceptual importance, or SNR, each of which is the processing output. For example, since a lower frequency part is more important for a signal such as speech, it is possible to place a greater emphasis on the lower frequency part upon applying an adaptive spectrum forming technology when an input signal is classified as speech or the like.
In
Signal classification section 601 analyzes frequency domain signal S(f) and classifies signal characteristics of a coding target signal. An object of signal classification section 601 is to determine signal characteristics, for example, whether a signal is a music signal and the like, or speech and the like, and whether signal change is significant or stable.
Forming determination section 602 analyzes three subband signals S1(f), S2(f), and S3(f) together with frequency domain signal S(f). Forming determination section 602 perceptually applies weight to a subband signal by taking into account signal type information according to the signal characteristics for each subband. Then, forming determination section 602 determines whether or not a subband is within an effective range based on the weighted subband signal and outputs flag signals (F1,F2,F3) showing the determination.
Specifically, forming determination section 602 applies weight to subband signals S1(f), S2(f), and S3(f) according to signal characteristics determined in signal classification section 601, and detects spectrum coefficient Sn
Spectrum forming section 603 forms a spectrum in an effective range in accordance with a determination result output from forming determination section 602 and weighted subband signals S1
Weighting section 701-1˜3 perceptually applies weight to each subband signal in accordance with perceptual importance, according to signal classification information. These weights are adaptively determined in accordance with signal classification information. For example, in a case where an input signal is classified as speech or the like, since a lower frequency part is more perceptually-important, weights are determined so as to be W1>W2>W3>0.
Maximum spectrum detecting section 402-1˜3 respectively detects spectrum coefficients S1Max, S2
In view of the above, according to the present embodiment, an adaptive spectrum forming technology is combined with a signal classification section or a psychoacoustic model, or a signal-to-noise ratio calculation section, and an effective range is determined more appropriately in accordance with signal characteristics or perceptual importance, or coding performance, each of which is the output processing.
Upon pulse selection in pulse vector coding, only amplitude information is considered as a condition. Accordingly, by applying different weights to different frequency domain signals, it is possible to place a greater emphasis on spectrum coefficients that are perceptually more important and to lower the importance of spectrum coefficients that are perceptually less important. For example, since a lower frequency part is more important for a signal such as speech, a greater emphasis is placed on the lower frequency part upon applying an adaptive spectrum forming technology when an input signal is classified as a speech signal or the like. By this means, sound quality can be improved.
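The effect of weighting on the flag determination can be sketched as follows. The sketch assumes that each subband is scaled by its weight before both the Mth largest value and the subband maxima are taken; whether the reference value is taken from the weighted or the unweighted spectrum is an assumption of this illustration, and the weight values in the usage note are hypothetical.

```python
def weighted_flags(spectrum, edges, weights, m):
    """Return (flags, weighted_spectrum) after per-subband weighting.

    edges   : [(start, end), ...] subband boundaries, exclusive end
    weights : one positive weight per subband, e.g. W1 > W2 > W3 > 0
              when the input is classified as speech
    """
    weighted = list(spectrum)
    for (start, end), w in zip(edges, weights):
        for i in range(start, end):
            weighted[i] = w * spectrum[i]
    # Reference value: M-th largest absolute amplitude after weighting.
    threshold = sorted((abs(x) for x in weighted), reverse=True)[m - 1]
    flags = [1 if max(abs(x) for x in weighted[s:e]) >= threshold else 0
             for s, e in edges]
    return flags, weighted
```

For `spectrum = [0.5, 0.1, 0.4, 0.45, 0.1, 0.6]` with three two-coefficient subbands and `m = 2`, equal weights give flags (1, 0, 1), whereas speech-like weights (1.0, 0.8, 0.5) give (1, 1, 0), shifting the effective range toward the lower frequencies.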
An adaptive spectrum forming technology described in Embodiments 1-4 can be applied not only to transform coding but also to TCX coding. In Embodiment 5, a case will be described where an adaptive spectrum forming technology described in Embodiments 1-4 is applied to TCX coding.
In
LPC inverse filtering section 802 acquires residual (excitation) signal Sr(n) by applying a LPC inverse filter to input signal S(n) using LPC coefficients from LPC analysis.
Time-frequency conversion section 803 converts residual signal Sr(n) into frequency domain signal Sr(f) using, for example, discrete Fourier transform (DFT), modified discrete cosine transform (MDCT) or the like.
One of adaptive spectrum forming coding sections 102, 102A, 102B, and 102C, which are described in Embodiments 1-4, is applied to adaptive spectrum forming coding section 804. Adaptive spectrum forming coding section 804 acquires Sra(f) which falls within an effective range of Sr(f). Adaptive spectrum forming coding section 804 transmits spectrum forming information to the decoder side through multiplexing section 806.
Pulse vector coding section 805 performs pulse vector coding for the spectrum coefficient of Sra(f) which falls within the effective range thereby acquiring a pulse coding parameter such as a pulse position, a pulse amplitude, a pulse polarity, and a global gain.
Multiplexing section 806 multiplexes a pulse coding parameter acquired in pulse vector coding section 805, spectrum forming information acquired in adaptive spectrum forming coding section 804, and a LPC parameter acquired in LPC analysis section 801 and transmits the multiplexing result to the decoder side.
Also, in a decoder shown in
Pulse vector decoding section 808 acquires spectrum coefficients of Sra{tilde over ( )}(f) by decoding a pulse coding parameter. Sra{tilde over ( )}(f) corresponds to Sra(f) and is a base signal for forming Sr{tilde over ( )}(f) which is a decoded signal of residual frequency domain signal Sr(f).
Adaptive spectrum forming decoding section 809 generates frequency domain signal Sr{tilde over ( )}(f) using spectrum coefficients of Sra{tilde over ( )}(f) and spectrum forming information showing an effective range.
Frequency-time conversion section 810 generates time domain signal Sr{tilde over ( )}(n) by converting frequency domain signal Sr{tilde over ( )}(f) into the time domain using inverse discrete Fourier transform (IDFT), inverse modified discrete cosine transform (IMDCT) or the like.
LPC synthesis filtering section 811 acquires signal S{tilde over ( )}(n) corresponding to signal S(n) in the encoder side by filtering time domain signal Sr{tilde over ( )}(n) using a LPC parameter demultiplexed in demultiplexing section 807.
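The relationship between the LPC inverse filter in the encoder and the LPC synthesis filter in the decoder can be sketched as follows. The sketch assumes the sign convention A(z) = 1 + a1·z^-1 + ... + ap·z^-p and zero initial filter state; the coefficient values used in the usage note are hypothetical and do not come from an actual LPC analysis.

```python
def lpc_inverse_filter(signal, a):
    """Residual r[n] = s[n] + sum_k a[k-1] * s[n-k]  (FIR filter A(z))."""
    residual = []
    for n, s_n in enumerate(signal):
        acc = s_n
        for k, a_k in enumerate(a, start=1):
            if n - k >= 0:
                acc += a_k * signal[n - k]
        residual.append(acc)
    return residual


def lpc_synthesis_filter(residual, a):
    """Reconstruction s[n] = r[n] - sum_k a[k-1] * s[n-k]  (IIR filter 1/A(z))."""
    out = []
    for n, r_n in enumerate(residual):
        acc = r_n
        for k, a_k in enumerate(a, start=1):
            if n - k >= 0:
                acc -= a_k * out[n - k]
        out.append(acc)
    return out
```

Without quantization, passing a signal through `lpc_inverse_filter` and then `lpc_synthesis_filter` with the same coefficients, for example `a = [-0.9, 0.2]`, returns the original signal up to floating-point rounding. In the coding system, the residual is instead transformed, limited to the effective range, and pulse vector coded between the two filters.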
In view of the above, the same kind of effect as in Embodiments 1-4 can also be obtained in a case where an adaptive spectrum forming technology is applied to TCX coding.
Also, a coding system, an encoder, and a decoder according to each of the above embodiments are applicable to a communication terminal apparatus or a base station apparatus.
Each function block employed in the description of each of the above embodiments may typically be implemented as an LSI constituted by an integrated circuit. These may be individual chips or partially or totally contained on a single chip. “LSI” is adopted here but this may also be referred to as “IC,” “system LSI,” “super LSI,” or “ultra LSI” depending on differing extents of integration.
Further, the method of circuit integration is not limited to LSIs, and implementation using dedicated circuitry or general purpose processors is also possible. After LSI manufacture, a field programmable gate array (FPGA) that can be programmed, or a reconfigurable processor in which connections and settings of circuit cells within an LSI can be reconfigured, can be utilized.
Further, if integrated circuit technology emerges to replace LSIs as a result of the advancement of semiconductor technology or another derivative technology, it is naturally also possible to carry out function block integration using this technology. Application of biotechnology is also possible.
The disclosure of Japanese Patent Application No.2009-250441, filed on Oct. 30, 2009, including the specification, drawings and abstract, is incorporated herein by reference in its entirety.
An encoder, a decoder according to the present invention, and a method thereof are useful for improving decoded signal quality by improving bit efficiency in coding.
100, 800 Coding system
101, 803 Time-frequency conversion section
102, 804 Adaptive spectrum forming coding section
103, 805 Pulse vector coding section
104, 806 Multiplexing section
105, 807 Demultiplexing section
106, 808 Pulse vector decoding section
107, 809 Adaptive spectrum forming decoding section
108, 810 Frequency-time conversion section
201 Spectrum specifying section
202 Minimum position specifying section
203 Maximum position specifying section
301 Band dividing section
302, 501, 602 Forming determination section
303, 502, 603 Spectrum forming section
401 Spectrum detecting section
402 Maximum spectrum detecting section
403 Comparison section
601 Signal classification section
701 Weighting section
801 LPC analysis section
802 LPC inverse filtering section
811 LPC synthesis filtering section
Number | Date | Country | Kind |
---|---|---|---|
2009-250441 | Oct 2009 | JP | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2010/006394 | 10/29/2010 | WO | 00 | 4/26/2012 |