This application is based on and claims the benefit of priority from the prior Japanese Patent Application No. 2006-187123, filed on Jul. 6, 2006, the entire contents of which are incorporated herein by reference.
1. Field of the Invention
The present invention relates to an audio signal coding apparatus and an audio signal decoding apparatus capable of reducing the number of bits contained in a coded wideband audio signal.
2. Description of the Related Art
A speech signal compressing/coding method such as AMR (Adaptive Multi-Rate) defines that a coding bit rate can be changed frame by frame based on the detected signal activity.
In the AMR method, in order to reduce transmission power, it is detected whether the activity of an input signal to be coded is voice or not in units of coding, that is, frame by frame (VAD control), and when the input signal is determined as being voice, the input signal is transmitted in the form of a normal audio coded frame, whereas when the input signal is determined not to be voice, only the basic information of the frame is transmitted discontinuously (DTX (Discontinuous Transmission) control) in the form of a comfort noise frame. However, because the DTX control is executed in frames, when this method is applied to a wideband signal such as an audio signal, the DTX control is performed for the whole band to determine whether the activity is present in the input signal.
In addition, according to the MPEG2 audio standards, the AAC (Advanced Audio Coding) method adopting the time-to-frequency transform coding is used.
For example, in the case of the frame F2, which contains frequency bands without the activity (only a slight number of bits is required), even when the number of bits is reduced for this frame, as is indicated by a hollow arrow, a surplus number of bits is used for another frame. Also, in the case of the frames F3 and F4, which contain frequency bands without the activity in part of the frequency bands, even when the number of bits is reduced for such a frequency band or the frame containing such a frequency band with no activity, as is indicated by a hollow arrow, bits are allocated to the other frequency bands or to another frame. Hence, as is shown in
A variable rate coding method for controlling the coding bit rate frame by frame is disclosed in Jpn. Pat. Appln. KOKAI Publication No. 3-191618. In this coding method, variable rate control is performed so that the SNR, which represents sound quality, remains constant. In addition, a signal sequence, such as an audio signal, is divided into plural frequency bands, and the number of bits is controlled for each frequency band on the basis of signal power in each frequency band. It should be noted, however, that because the presence or absence of an audio signal is determined over the whole frequency band and a sum of coding quantities of the entire frame is controlled, the control is not performed for each frequency band. This method is therefore essentially the same technique as the AMR method.
The coding method in the related art has a problem that the rate control cannot be performed finely and bands cannot be utilized efficiently.
The present invention has been made to solve this problem, and it is an object of the present invention to reduce a number of bits by utilizing the bands efficiently for a wideband audio signal.
According to one aspect of the present invention, an apparatus for coding a wideband audio signal is provided, the apparatus comprising: first dividing means for dividing the wideband audio signal into a plurality of frames; second dividing means for dividing each frame divided by the first dividing means into a plurality of frequency bands; detecting means, for each frequency band, for detecting whether there is activity in each frequency band, based on noise characteristics; first coding means for quantizing the frequency bands and variable length coding the quantized frequency bands; second coding means for transforming a spectrum of the frequency bands into a parameter; determining means for determining which one of the first coding means and second coding means each of the frequency bands is subject to based on the detected activity; calculating means for calculating a first characteristic of one frame and a second characteristic of all frequency bands subject to coding by the second coding means in the one frame; and adjusting means for adjusting a target code amount to be used by the first coding means based on a ratio of the first characteristic and the second characteristic.
The filter bank 1 performs processing to transform an input signal to be coded to a spectral coefficient in a frequency domain. The psycho-acoustic model portion 2 converts the input signal to a frequency-domain signal and divides the frequency-domain signal into frequency bands f0, f1, . . . , fn, and calculates PE (Perceptual Entropy), an SMR (Signal to Mask Ratio), and unpredictability measure for each of frequency bands f0, f1, . . . , fn, divided at regular intervals in terms of audibility from the spectral coefficient and the auditory characteristic. These calculation results are used for the adaptive block switching performed at the time of quantization and the filter bank processing to suppress pre-echoes. The sequence of processing is defined in the encoder section in ANNEX B of the ISO/IEC 13818-7 MPEG-2 AAC standards, the contents of which are incorporated herein by reference.
The quantizer 3 calculates a quantization step size for each frequency band on the basis of the number of bits per frame acquired from rate control information and the SMR from the psycho-acoustic model portion 2, and quantizes each spectral coefficient on the basis of the quantization step size. The noiseless coder 4 performs entropy coding, such as Huffman coding, and sectioning in order to reduce logical redundancy for a signal of the quantized spectral coefficients. In this instance, it will be described that the Huffman coding is applied for coding the quantized spectral coefficients. Consequently, noiseless coded spectral coefficients outputted from the noiseless coder 4 are the Huffman codes. The formatter 5 multiplexes the Huffman codes, the quantization step size, coded DTX control information, and so on, and generates frames containing the multiplexed information to be transmitted to a network.
The DTX controller 6 divides the spectrum signal into frequency bands f0, f1, . . . , fn at regular intervals in terms of auditory frequency resolution (Bark scale or the like). The AAD control portion 70 of the DTX controller 6 performs audio activity detection for the frequency band f0. The audio activity detection is achieved, for example, by comparing the unpredictability measure for the frequency band f0 derived from the psycho-acoustic model portion 2 with a threshold, to determine whether the frequency band f0 is a noise-like signal. The AAD control portion 70 then saves the AAD determination result as AAD flag information (for example, normal signal: ON, noise-like signal: OFF) of the frequency band f0.
The AAD control portion 71 performs the audio activity detection for the frequency band f1 and saves the result as AAD flag information of the frequency band f1 in the same manner as described above. The AAD control portion 7n performs the audio activity detection for the frequency band fn and saves the result as AAD flag information of the frequency band fn in the same manner as described above.
The DTX coder 10 in the DTX controller 6 first determines, for each frequency band, one of a first coding mode of executing normal coding processing, a second coding mode of coding DTX control information for the divided frequency band, and a third coding mode of executing no coding processing, based on the AAD flag information in the AAD control portions 70 through 7n, and executes the second mode of processing if the second coding mode is selected. The DTX control information of the divided frequency band includes a DTX control flag identifying that the frequency band is subject to the DTX control for the divided frequency band and parameters indicating the spectrum of the frequency band to be coded. The coded DTX control information, such as the coded DTX control flag and the coded parameters coded by the DTX coder 10, is outputted to the formatter 5. Upon completing the processing as described above for all the frequency bands, the rate control portion 11 corrects the bit rate according to the degree to which the second coding mode has been selected for the respective frequency bands. To correct the bit rate, the rate control portion 11 calculates rate control information and outputs the rate control information to the quantizer 3 and the noiseless coder 4.
The stream analysis/decomposition portion 51 analyses and decomposes the multiplexed information contained in received frames, and extracts the Huffman codes, the quantization step size, the coded DTX control information, and so on. Subsequently, the Huffman codes are inputted into the noiseless decoding portion 52, the quantization step size is inputted into the inverse quantization portion 53, and the coded DTX control information is inputted into the DTX decoding/interpolation portion 55, respectively. The noiseless decoding portion 52 decodes the Huffman codes and extracts a physical quantity, such as quantized spectral coefficients. The inverse quantization portion 53 performs inverse quantization processing on the extracted quantized spectral coefficients pursuant to the quantization step size received from the stream analysis/decomposition portion 51 and restores the spectral coefficients. The filter bank 54 transforms the spectral coefficients from the inverse quantization portion 53 into a time-domain PCM signal. This time-domain PCM signal corresponds to the input signal having been inputted into the filter bank 1.
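The per-band audio activity detection described above can be sketched as follows. This is a minimal illustration only: the function name `aad_flags`, the threshold value, and the convention that a larger unpredictability measure means a more noise-like band are assumptions made for the example, not details taken from the embodiment.

```python
from typing import List


def aad_flags(unpredictability: List[float], threshold: float) -> List[bool]:
    """Return one AAD flag per frequency band f0..fn.

    True  (ON)  = normal signal
    False (OFF) = noise-like signal
    Assumption: a band whose unpredictability measure reaches the
    threshold is treated as noise-like.
    """
    return [u < threshold for u in unpredictability]


# Hypothetical unpredictability measures for bands f0, f1, f2.
flags = aad_flags([0.1, 0.9, 0.4], threshold=0.5)
print(flags)
```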
For each band, the DTX decoding/interpolation portion 55 decodes the coded DTX control information and extracts the DTX control flag and parameters. Subsequently, the DTX decoding/interpolation portion 55 determines whether the frequency band is subjected to the DTX control for the divided frequency band with reference to the DTX control flag. The frequency domain interpolation portion 56 performs the frequency domain interpolation processing. The frame interpolation portion 57 performs the frame interpolation processing. The processing described above is performed for all the frequency bands.
Then, the DTX coder 10 first determines which of the first coding mode or the second coding mode is to be executed on the basis of the AAD flag for the frequency band f0. More specifically, it is determined whether the AAD determination results for preceding frames show that AAD-OFF (the AAD flag has been set to OFF) has continued for a predetermined number of times or more. When AAD-OFF has continued for the predetermined number of times or more, the frequency band is determined as being subject to the DTX control for the divided frequency band (the second coding mode), and when AAD-OFF has continued for less than the predetermined number of times, the frequency band is determined as being subject to the normal coding processing (the first coding mode) (Step S2). When the AAD determination result in Step S2 shows that AAD-OFF has continued for less than the predetermined number of times (NO in Step S2), the normal coding processing (e.g. scaling processing) is performed by the quantizer 3 and noiseless coder 4 (Step S3).
When the AAD determination result in Step S2 shows that AAD-OFF has continued the predetermined number of times or more (YES in Step S2), the DTX coder 10 determines that the frequency band is subject to the DTX control for the divided frequency band. If the DTX control for the divided frequency band is determined to be executed, the DTX coder 10 checks whether the frequency band is already placed under the DTX control for the divided frequency band (Step S4). When it is determined in Step S4 that the frequency band is not placed under the DTX control for the divided frequency band (NO in Step S4), the DTX control information (discontinuous transmission control information) is coded by the DTX coder 10 for the intended frequency band (band f0) (Step S5). The DTX control information includes the DTX control flag identifying the frequency band as being subject to the DTX control for the divided frequency band and parameters corresponding to parameterized spectrum. The parameterized spectrum can be, for example, the average power information.
On the other hand, when it is determined that the frequency band is already placed under the DTX control for the divided frequency band (YES in Step S4), whether the current frame is in the default discontinuous transmission cycle or the default cycle responding to the AAD determination result is determined by the DTX coder 10 (Step S6). When the current frame is in the default cycle (YES in Step S6), the DTX control information is newly coded to update the DTX control information (Step S5). When it is determined in Step S6 that the current frame is not in the default cycle (NO), the DTX coder 10 does not code the DTX control information. The processing for the frequency band f0 is completed by the processing described above. Herein, the cycle in which the divided band DTX control information is transmitted can be the default cycle as described above, or alternatively, it can be changed adaptively in response to the signal characteristic.
The processing as described above is performed for each frequency band until the processing is completed for all the frequency bands f0, f1, . . . , fn (Step S7).
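The decision flow of Steps S2 through S6 for a single frequency band can be sketched as a small state machine. The names `BandState` and `select_mode`, the parameter values `hangover` and `cycle`, and the choice to count the current frame in the AAD-OFF run are assumptions made for illustration; the embodiment does not fix them.

```python
from dataclasses import dataclass


@dataclass
class BandState:
    off_run: int = 0          # consecutive frames with the AAD flag OFF
    under_dtx: bool = False   # band already placed under divided-band DTX control
    since_update: int = 0     # frames since DTX control information was last coded


def select_mode(state: BandState, aad_on: bool, hangover: int, cycle: int) -> str:
    """Return "normal", "code_dtx" or "skip" for one band in one frame."""
    state.off_run = 0 if aad_on else state.off_run + 1
    if state.off_run < hangover:        # Step S2: NO
        state.under_dtx = False
        return "normal"                 # Step S3: first coding mode
    if not state.under_dtx:             # Step S4: NO
        state.under_dtx = True
        state.since_update = 0
        return "code_dtx"               # Step S5: second coding mode
    state.since_update += 1
    if state.since_update >= cycle:     # Step S6: YES, update timing reached
        state.since_update = 0
        return "code_dtx"               # Step S5 again, refreshing the parameters
    return "skip"                       # Step S6: NO, nothing is coded


# Hypothetical AAD flag history for band f0 (True = ON, False = OFF).
state = BandState()
history = [True, False, False, False, False, False, False, True]
modes = [select_mode(state, on, hangover=2, cycle=3) for on in history]
print(modes)
```

A fixed `cycle` corresponds to the default cycle; an adaptive cycle would replace the constant with a value derived from the signal characteristic.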
Subsequently, the rate control is corrected according to the degree of application of the DTX control for the divided frequency band to the respective frequency bands. The correction of the rate control is executed by the rate control portion 11 and is a method by which a correction is made by reducing the number of bits in response to a ratio of the total power for each frame and the power of the DTX applied band. Initially, power Ptot of one entire frame is calculated from the spectrum information (Step S11). Further, power Pdtx of a signal in the frequency band to which the DTX control for the divided frequency band is applied is calculated (Step S12).
Generally, an allocated number of bits Bfrm to each frame is calculated by the rate control portion 11 in advance from the parameter from the psycho-acoustic model portion 2, the capacity of the bit reservoir 12, and so forth. In the case of the DTX control for the divided frequency band, however, in order to utilize the frequency bands efficiently by means of discontinuous transmission, the coding rate (the number of bits for each frame) is lowered by the number of bits comparable to the frequency band signal component that will not be transmitted by the DTX control. To this end, the number of bits is weighted on the basis of the power information for each frequency band. In order to subtract the number of bits corresponding to the frequency bands to which the DTX control is applied, the allocated number of bits to each frame is corrected, using the parameters Ptot and Pdtx, to (target)=Bfrm×(1−Pdtx/Ptot), which is allocated to the normal coding (the first coding mode) (Step S13).
The allocated number of bits before correction, Bfrm, is applied to update the capacity of the bit reservoir 12 (Step S14). This is because there is a possibility that when the capacity of the bit reservoir 12 increases as the number of bits is reduced by the correction, information bits are used excessively in the next and subsequent frames, which makes the efficient utilization of the frequency bands impossible.
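Steps S11 through S14 can be sketched as follows. The function names, the sample values, and in particular the reservoir bookkeeping (level plus mean frame budget minus the charged bits, clamped to the capacity) are assumptions for illustration; the embodiment only states that the uncorrected Bfrm is used for the reservoir update.

```python
from typing import Sequence, Set


def power_corrected_target(bfrm: float, band_power: Sequence[float],
                           dtx_bands: Set[int]) -> float:
    """target = Bfrm x (1 - Pdtx / Ptot)."""
    ptot = sum(band_power)                          # Step S11
    pdtx = sum(band_power[i] for i in dtx_bands)    # Step S12
    if ptot <= 0.0:
        return bfrm   # assumed guard: no power to weigh, leave Bfrm uncorrected
    return bfrm * (1.0 - pdtx / ptot)               # Step S13


def update_reservoir(level: float, mean_bits: float, bfrm: float,
                     capacity: float) -> float:
    """Step S14 (assumed bookkeeping): charge the uncorrected Bfrm, not the
    reduced target, so the bits saved by DTX are not banked for later frames."""
    return max(0.0, min(capacity, level + mean_bits - bfrm))


# Hypothetical frame: three bands, band f2 under divided-band DTX control.
target = power_corrected_target(1000.0, [4.0, 2.0, 2.0], {2})
level = update_reservoir(500.0, mean_bits=1000.0, bfrm=1000.0, capacity=6144.0)
print(target, level)
```

Had the reservoir been charged with the reduced target instead, its level would grow by the saved bits every frame, which is the situation the text warns against.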
According to the first embodiment, it is possible to achieve an allocated amount of codes (target) corresponding to the power of a signal in the frequency band to which the DTX control for the divided frequency band is applied. It is thus possible to reduce an amount of codes.
In the method of correcting the bit rate according to the second embodiment, correction is made by reducing the number of bits in response to the ratio of the total PE (Perceptual Entropy) of each frame and the PE in the DTX applied frequency band on the basis of the psycho-acoustic model. The DTX controller 6 first calculates the PE value PEtot of the entire frame obtained from the psycho-acoustic model portion 2 (Step S21). Further, the DTX controller 6 calculates the PE value PEdtx of the frequency band to which the DTX control for the divided frequency band is applied (Step S22). Subsequently, the rate control portion 11 corrects the number of bits Bfrm allocated to each frame. To this end, the number of bits is weighted on the basis of the PE value, which is calculated by the psycho-acoustic model portion 2, of each frequency band, and in order to remove the PE value of the frequency band(s) to which the DTX control is applied when calculating the number of bits to be allocated to each frame, the corrected number of bits (target), Bfrm×(1−PEdtx/PEtot), to be allocated to each frame is calculated by the rate control portion 11, based on the parameters PEtot and PEdtx. The corrected number of bits (target) is used in the normal coding processing (the first coding mode) (Step S23).
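The PE-weighted correction of Steps S21 through S23 can be sketched in the same way. The function name, the per-band PE values, and the guard for a zero total PE are assumptions for illustration.

```python
from typing import Sequence


def pe_corrected_target(bfrm: float, band_pe: Sequence[float],
                        dtx_flags: Sequence[bool]) -> float:
    """target = Bfrm x (1 - PEdtx / PEtot)."""
    pe_tot = sum(band_pe)                                               # Step S21
    pe_dtx = sum(pe for pe, dtx in zip(band_pe, dtx_flags) if dtx)      # Step S22
    if pe_tot <= 0.0:
        return bfrm   # assumed guard: nothing to weigh, leave Bfrm uncorrected
    return bfrm * (1.0 - pe_dtx / pe_tot)                               # Step S23


# Hypothetical frame: band f1 is under divided-band DTX control.
pe_target = pe_corrected_target(2000.0, [300.0, 100.0], [False, True])
print(pe_target)
```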
The allocated number of bits before correction, Bfrm, is applied to update the capacity of the bit reservoir 12 (Step S24). This is because, as in the first embodiment, there is a possibility that when the capacity of the bit reservoir 12 increases as the amount of codes is reduced by the correction, information bits are used excessively in the next and subsequent frames, which makes the efficient utilization of the frequency bands impossible.
According to the second embodiment, it is possible to achieve an allocated number of bits (target) corresponding to the PE (Perceptual Entropy) of a signal in the frequency band to which the DTX control for the divided frequency band is applied. It is thus possible to reduce the number of bits.
The method of correcting the bit rate according to the third embodiment is a method by which the corrected number of bits is calculated by subtracting the number of bits for the DTX applied frequency band from the number of bits for all the frequency bands. The DTX controller 6 first performs coding with the initially allocated number of bits Bfrm (Step S31). Subsequently, the DTX controller 6 calculates the number of bits Bdtx allocated to the frequency band to which the DTX control is applied (Step S32). Then, the rate control portion 11 calculates the number of bits to be allocated to the normal coding processing (first coding mode) by subtracting Bdtx from Bfrm (Step S33). Coding is performed again with the corrected allocated number of bits. Only the noiseless coding by the noiseless coder 4 is performed, since the quantization step size is reusable.
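The subtraction of Steps S32 and S33 can be sketched as follows, assuming the first coding pass of Step S31 has already reported how many bits each band consumed. The function name, the per-band bit counts, and the clamp at zero are hypothetical.

```python
from typing import Sequence, Set


def subtractive_target(bfrm: int, first_pass_bits: Sequence[int],
                       dtx_bands: Set[int]) -> int:
    """Corrected allocation = Bfrm - Bdtx."""
    bdtx = sum(first_pass_bits[i] for i in dtx_bands)   # Step S32
    return max(0, bfrm - bdtx)                          # Step S33 (clamp is assumed)


# Hypothetical first-pass result for bands f0..f3; f2 and f3 are DTX bands.
corrected = subtractive_target(1000, [400, 300, 200, 100], {2, 3})
print(corrected)
```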
The allocated number of bits before correction, Bfrm, is applied to update the capacity of the bit reservoir 12 (Step S34). This is because, as in the first embodiment, there is a possibility that when the capacity of the bit reservoir 12 increases as the number of bits is reduced by the correction, information bits are used excessively in the next and subsequent frames, which makes the efficient utilization of the frequency bands impossible.
According to the third embodiment, it is possible to achieve an allocated number of bits from which the number of bits Bdtx allocated to the frequency band to which the DTX control is applied has been subtracted. It is thus possible to reduce the number of bits.
On the other hand, when the frequency band f0 is determined as being subject to the DTX control in Step S51 (YES), it is checked by the DTX decoding/interpolation portion 55 whether the DTX control information is included in the present received frame, that is, it is determined whether the discontinuous transmission timing in the predetermined cycle, which is defined to execute the discontinuous transmission, has come (Step S53). If the DTX control information has been received (YES), the spectrum of the intended frequency band (frequency band f0) is interpolated/restored by the frequency domain interpolation portion 56 on the basis of the DTX information (Step S54). For example, if the DTX information is the power information, a signal is restored from a random signal that is scaled so that the total power of the random signal is close to the power included in the DTX information.
When it is determined that the DTX information reception timing has not come in Step S53 (NO), the interpolation processing is performed by the frame interpolation portion 57 between frames (Step S55). For example, it is performed by the method of updating only a random signal used as the base signal based on the power value of the preceding frame or the method of linear prediction based on the power information in the past. The processing described above is performed for each frequency band until the processing is completed for all the frequency bands (Step S56).
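The decoder-side restoration of Steps S53 through S55 can be sketched as follows. The sketch assumes that the DTX information carries the total power (sum of squares) of the band, uses Gaussian noise as the random base signal, and implements only the first interpolation method mentioned above (new random signal, power of the preceding frame). The function names are hypothetical.

```python
import math
import random
from typing import List, Optional, Tuple


def restore_band(power: float, n_coeffs: int, rng: random.Random) -> List[float]:
    """Scale a random signal so that its total power matches `power`."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_coeffs)]
    energy = sum(x * x for x in noise)
    if energy == 0.0:
        return [0.0] * n_coeffs
    gain = math.sqrt(power / energy)
    return [gain * x for x in noise]


def decode_dtx_band(prev_power: float, received_power: Optional[float],
                    n_coeffs: int, rng: random.Random) -> Tuple[List[float], float]:
    """Return the restored coefficients and the power to remember."""
    if received_power is not None:      # Step S53: YES, DTX information received
        power = received_power          # Step S54: restore from the new parameter
    else:                               # Step S53: NO
        power = prev_power              # Step S55: reuse the preceding frame's power
    return restore_band(power, n_coeffs, rng), power


rng = random.Random(0)
coeffs, remembered = decode_dtx_band(1.0, 2.5, n_coeffs=16, rng=rng)
next_coeffs, _ = decode_dtx_band(remembered, None, n_coeffs=16, rng=rng)
print(sum(c * c for c in coeffs), sum(c * c for c in next_coeffs))
```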
The frame F2 comprises frequency bands without the activity (hatched portion) in the whole bandwidth, thereby having Pdtx=Ptot as the power of a signal of the frequency band to which the DTX control is applied. Hence, a number of bits (target F2) allocated to the normal coding (first coding mode) for the frame F2 after correction is Bfrm(F2)×(1−Pdtx/Ptot)=Bfrm(F2)×(1−Ptot/Ptot)=0. In practice, however, because the control bit and the like are necessary, the lowest bit rate is used.
The frame F3 comprises both the frequency bands of a signal with the activity and frequency bands without the activity (hatched portion). Given 0.4 as the ratio of the power of the DTX applied frequency band and the power of the frame, a number of bits (target F3) allocated to the normal coding (first coding mode) for the frame F3 after correction is Bfrm(F3)×(1−Pdtx/Ptot)=Bfrm(F3)×(1−0.4)=0.6Bfrm(F3).
The frame F4 also comprises both frequency bands of a signal with the activity and a frequency band without the activity (hatched portion). Given 0.2 as the ratio of the power of the DTX applied frequency band and the power of the frame, a number of bits (target F4) allocated to the normal coding (first coding mode) for the frame F4 after correction is Bfrm(F4)×(1−Pdtx/Ptot)=Bfrm(F4)×(1−0.2)=0.8Bfrm(F4).
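The targets for the frames F2, F3 and F4 can be checked numerically with a short sketch. The value Bfrm=1000 bits and the floor `min_bits`, which stands in for the lowest bit rate needed for the control bits in the frame F2, are hypothetical values chosen for illustration.

```python
def frame_target(bfrm: int, dtx_ratio: float, min_bits: int) -> int:
    """Bfrm x (1 - Pdtx/Ptot), never below the assumed lowest bit budget."""
    return max(min_bits, round(bfrm * (1.0 - dtx_ratio)))


# Pdtx/Ptot ratios taken from the description of the frames F2, F3 and F4.
ratios = {"F2": 1.0, "F3": 0.4, "F4": 0.2}
targets = {name: frame_target(1000, r, min_bits=32) for name, r in ratios.items()}
print(targets)
```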
According to the embodiments of the invention, it is possible to apply the rate control to an allocated number of bits in response to the power of a signal in the frequency band to which the DTX control is applied. It is thus possible to reduce a number of bits.
Number | Date | Country | Kind |
---|---|---|---|
2006-187123 | Jul 2006 | JP | national |