The present invention relates to an audio signal coding method and decoding method.
Among conventional audio signal coding methods and decoding methods, an international standard scheme of the ISO/IEC, commonly called MPEG (Moving Picture Experts Group) scheme, is well known. Currently, ISO/IEC 14496-3, commonly called MPEG-4 GA (General Audio Coding), is one of the coding schemes that are widely applicable and achieve high quality sound even at low bit rates (see Non-Patent Reference 1). There are many extended specifications of this scheme that are currently being standardized.
Among them is a low-delay technique for reducing a delay that occurs in coding and decoding. An example is the Low Delay AAC (Advanced Audio Coding) scheme defined by MPEG-4 Audio (ISO/IEC 14496-3) which is an ISO/IEC international standard. Other examples include techniques disclosed by Patent Reference 1 and Non-Patent Reference 2.
Hereinafter, a conventional audio signal coding method and decoding method of Non-Patent Reference 2 shall be described.
The auditory redundancy eliminating unit 101 eliminates auditory redundancy from an input audio signal. More specifically, it eliminates components that are inaudible by humans from the audio signal based on aural characteristics of humans. The auditory redundancy eliminating unit 101 includes an auditory model 103, a pre-filtering unit 104, and a quantizing unit 105.
An input digital audio signal sequence is first inputted to the auditory model 103. The auditory model 103 is an important element in determining the audio artifacts of coded audio signals. Using techniques well known to those skilled in the art, such as temporal masking and simultaneous masking, it identifies the frequency components and levels that are inaudible to humans. As a result, it adaptively calculates, for the input audio signal, the level in each frequency band at which frequency components are audible to humans. Based on this calculation result, the auditory model 103 outputs to the pre-filtering unit 104 information indicating what kind of filter the pre-filtering unit 104 should use. The auditory model 103 also includes this information in the coded sequence of the audio signal, which is the output signal of the audio signal coding apparatus. The auditory model 103 is, for example, the auditory model described in the specification of MPEG-1 Layer III (commonly called MP3).
Based on the information provided by the auditory model 103 indicating what kind of filter should be used, that is, based on a value indicating the level, in each band, of frequency components audible by humans, the pre-filtering unit 104 uses a filter to eliminate, from the input digital audio signal sequence, the components at levels inaudible to humans. The pre-filtering unit 104 thus outputs an audio signal sequence containing no inaudible components. As disclosed in Non-Patent Reference 2, the pre-filtering unit 104 is structured with plural linear prediction filters.
The quantizing unit 105 quantizes the audio signal sequence received from the pre-filtering unit 104 by rounding off the fractional part of each value, and outputs an integer-valued audio signal sequence.
In such a manner, the auditory redundancy eliminating unit 101 eliminates, from input audio signal sequences, components inaudible by humans, and outputs audio signal sequences that are quantized into an integer.
The information redundancy eliminating unit 102 eliminates redundant information from the audio signal sequences received from the auditory redundancy eliminating unit 101 so as to enhance the coding efficiency. The information redundancy eliminating unit 102 includes a lossless coding unit 106.
The lossless coding unit 106 has conventionally been proposed, and employs a method such as Huffman coding, a technique well known to those skilled in the art. The audio signal sequences inputted to the lossless coding unit 106 have already been quantized into integers by the above-mentioned quantizing unit 105. Thus, the lossless coding unit 106, which performs Huffman coding, for example, eliminates redundant information from the integers so as to enhance the coding efficiency.
With the above structure, the conventional audio signal coding apparatus 100 outputs both of the following as a coded sequence: information indicating what kind of prefilter was used by the pre-filtering unit 104, that is, information indicating the linear prediction coefficients that structure the pre-filtering unit 104; and an audio signal sequence (information) coded by the lossless coding unit 106.
Next, a conventional audio signal decoding apparatus shall be described.
The lossless decoding unit 201 decodes an audio signal sequence by performing lossless-decoding on a coded sequence outputted from the lossless coding unit 106.
The post-filtering unit 202 structures a postfilter (inverse of the filter used by the pre-filtering unit 104) from a decoded, linear prediction coefficient sequence. The post-filtering unit 202 post-filters the audio signal sequence which has been lossless-decoded by the lossless decoding unit 201, to eventually output an audio signal sequence obtained through the post-filtering.
By using the audio signal coding apparatus of
However, the above conventional audio signal coding method and decoding method entail the following problems:
For example, although techniques such as Low Delay AAC, an MPEG standard, achieve a low delay among techniques using the AAC scheme, the delay is still approximately 60 ms. Even with further improvements incorporated, a delay of approximately 40 ms occurs. The problem is that such a delay is not small enough for bi-directional communications.
Meanwhile, although the technique of Non-Patent Reference 2 reduces the delay to approximately 10 ms, it has the problem that the bit rate cannot be lowered. Further, the quantizing unit 105 quantizes input audio signals on a frame-by-frame basis. Thus, when an input audio signal sequence fluctuates greatly over time, the quantization noise (audio artifacts caused by coding) created by the quantizing unit 105 cannot be controlled appropriately. In addition, there is also the problem that sufficient coding efficiency cannot be ensured by the lossless coding unit 106.
In view of the above, the present invention has been conceived to solve the above problems, and an object of the present invention is to provide an audio signal coding method and decoding method which can not only reduce the delay, but also enhance the coding efficiency and reduce audio artifacts upon coding.
In order to solve the above problems, the audio signal coding method of the present invention is an audio signal coding method for coding an audio signal to be coded, the method comprising: judging for each of frames whether or not coding should be performed on each of two or more subframes into which the frame is divided, based on an audio signal contained in the frame into which the audio signal to be coded is divided for every set of samples; when judged that the coding should not be performed on each of the subframes, performing, for each of the frames, frame processing of (i) determining a first value representing a characteristic of an audio signal of the frame, and (ii) coding the audio signal using the determined first value; and when judged that the coding should be performed on each of the subframes, performing, for each subframe, subframe processing of (i) determining a second value representing a characteristic of an audio signal of the subframe, and (ii) coding the audio signal using the determined second value, wherein in the performing of the subframe processing, whether or not all the second values determined for the subframes are the same is judged, and when all the second values are the same, the audio signal is coded as exceptional processing, with at least one of the second values being a different value.
This makes it possible not only to reduce the delay, but also to enhance the coding efficiency and reduce audio artifacts upon coding. In addition, the function of executing exceptional processing is included, which allows redundancy involved in coding to be exploited. Such redundancy arises when coded data obtained on a subframe-by-subframe basis conveys the same meaning as coded data obtained on a frame-by-frame basis. Coded data obtained on a subframe-by-subframe basis usually requires a greater number of bits than coded data obtained on a frame-by-frame basis. In other words, when both sets of coded data indicate the same thing, the coded data obtained on a frame-by-frame basis is preferable since it requires fewer bits.
Further, it may be that in the performing of the subframe processing: an identification index is coded for each of the subframes, the identification index being for identifying whether the second values are the same or different between adjacent subframes; and when all the identification indices indicate that all the second values are the same, the audio signal is coded as the exceptional processing, with at least one of the second values being a different value.
As a result, the coding efficiency can be enhanced.
Furthermore, it may be that in the exceptional processing, the audio signal is coded with the second values assumed to monotonically increase or decrease between adjacent subframes.
Moreover, it may be that each of the first and second values is a gain value used for normalizing the audio signal or a value for determining quantizing precision.
Furthermore, the audio signal decoding method of the present invention is an audio signal decoding method for decoding a coded sequence of an audio signal coded using the above audio signal coding method, the audio signal decoding method comprising identifying that the exceptional processing has been performed and decoding the coded sequence, in the case where the coded sequence has been coded in the subframe processing.
This makes it possible to appropriately decode the coded sequence, on which the coding processing including the exceptional processing has been performed.
The audio signal coding method and decoding method of the present invention can be embodied as apparatuses. In addition, the present invention can be embodied as a program that causes a computer to execute the steps of the respective methods, and as a computer-readable recording medium recorded with the program.
The audio signal coding method and decoding method of the present invention can not only reduce the delay, but also enhance the coding efficiency and reduce audio artifacts upon coding.
Hereinafter, embodiments of the present invention shall be described with reference to the drawings.
An audio signal coding apparatus of the present embodiment is capable of selecting between a frame coding mode for coding on a frame-by-frame basis and a subframe coding mode for coding on a subframe-by-subframe basis. Here, a subframe is one of two or more sections into which a frame is divided. Further, in the subframe coding mode, the audio signal coding apparatus codes information indicating whether the gain values determined for the respective subframes are the same or different between temporally consecutive subframes. When the determined gain values are the same among all the subframes, the result is equivalent to determining a single gain for each frame. Thus, in such a case, exceptional processing different from the normal processing is performed (the exceptional processing being the coding performed when the gain values are deemed to be the same among all the subframes). It is to be noted that a gain in the present embodiment represents a ratio relative to a given amplitude of an audio signal taken as 1, and is a value used for normalizing the audio signal.
An audio signal coding apparatus 300 in the figure includes a judging unit 301, a frame processing unit 310, and a subframe processing unit 320. The frame processing unit 310 corresponds to the conventional audio signal coding apparatus 100 shown in
The judging unit 301 judges whether or not to perform coding on a subframe-by-subframe basis based on an audio signal contained in a frame so as to determine to which one of the frame processing unit 310 and the subframe processing unit 320 an audio signal sequence should be outputted.
More specifically, the judging unit 301 judges whether coding should be performed on a frame-by-frame basis (frame coding mode) or on a subframe-by-subframe basis (subframe coding mode) by detecting the maximum amplitude (energy) on a subframe-by-subframe basis for an input audio signal sequence. In the case of selecting the frame coding mode, the input audio signal sequence is outputted to the frame processing unit 310. In the case of selecting the subframe coding mode, the input audio signal sequence is outputted to the subframe processing unit 320.
The subframe processing unit 320 performs coding on the input audio signal sequence on a subframe-by-subframe basis. The subframe processing unit 320 includes an auditory redundancy eliminating unit 321 and an information redundancy eliminating unit 322. The information redundancy eliminating unit 322 and a lossless coding unit 326 included in the information redundancy eliminating unit 322 correspond to the information redundancy eliminating unit 102 and the lossless coding unit 106 shown in
The auditory redundancy eliminating unit 321 eliminates auditory redundancy on a subframe-by-subframe basis. The auditory redundancy eliminating unit 321 includes an auditory model 323, a pre-filtering unit 324, and a subframe quantizing unit 325. The auditory model 323 and the pre-filtering unit 324 have the same structures as those of the auditory model 103 and the pre-filtering unit 104 shown in
The subframe quantizing unit 325 divides the audio signal of one frame, inputted from the pre-filtering unit 324, into two or more subframes, and performs quantization by applying a gain on a subframe-by-subframe basis.
Assuming the gain is Gp and the audio signal sequence inputted into the subframe quantizing unit 325 is y(i), the relationship shown in Expression 1 holds for the value x(i) which is the target of the quantization.
y(i)=Gp×x(i) (Expression 1)
From the relationship shown in Expression 1, x(i) can be derived when the gain Gp is determined. Generally, x(i) is a real value, and the subframe quantizing unit 325 quantizes the real value x(i) into an integer. Then, the quantized x(i) is outputted to the lossless coding unit 326.
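The quantization just described can be sketched as follows. This is a minimal illustration of Expression 1, not the reference implementation; the function name and sample values are hypothetical, and a real coder would also have to code the gain itself.

```python
def quantize_subframe(y, gain):
    # Solve y(i) = gain * x(i) for x(i), then round the real-valued
    # x(i) to the nearest integer (Expression 1).
    return [round(yi / gain) for yi in y]

# Hypothetical subframe with gain Gp = 2.0
samples = [10.0, -3.9, 7.6, 0.4]
print(quantize_subframe(samples, 2.0))  # -> [5, -2, 4, 0]
```

The rounding step is where the quantization error of the later embodiments originates: the larger the integer magnitudes, the smaller the relative error.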
In the case of
In view of the above, in the case of
In order to implement coding that suppresses audio artifact and enhances the coding efficiency, the subframe quantizing unit 325 may refer to some of or all of the following as shown in
The first part of the stream which stores gain information shows gain configuration information indicating how a gain is stored. In the example shown in the figure, when the value is “0”, it means that only one gain value is assigned to the plural subframes. When the value is “1”, it means that two or more gain values are assigned to plural subframes. The settings for the gain configuration information are made by the judging unit 301. The judging unit 301 selects whether to use a gain value common to the subframes (set the value to “0”) or a gain value different for each subframe (set the value to “1”) for an input audio signal of one frame.
That is to say, the initial value of the gain configuration information being “0” means that the frame coding mode is to be executed. On the other hand, the initial value of the gain configuration information being “1” means that the subframe coding mode is to be executed.
When the initial value of the gain configuration information is “1”, three values are stored following the value “1”, namely, “x”, “y” and “z” as shown in
In such a manner as above, the gain configuration information is set. When the gain configuration information indicates “0”, there is only one gain parameter in total. When the gain configuration information is a sequence of “1010”, for example, there are two gain parameters. In more detail, the gain value of the subframe 1 is the same as that of the subframe 2, the gain value of the subframe 2 is different from that of the subframe 3, and the gain value of the subframe 3 is the same as that of the subframe 4.
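The relationship between the gain configuration bits and the number of gain parameters can be sketched as follows. This is a reading aid under the bit semantics described above ("0" meaning the gain equals the previous subframe's gain, "1" meaning it differs), not an implementation taken from any standard.

```python
def count_gain_params(config_bits):
    # "0": a single frame-wide gain.
    # "1xyz": per-subframe gains; each of x, y, z is "1" when the
    # gain differs from the previous subframe's gain, "0" when equal.
    if config_bits == "0":
        return 1
    # One gain for subframe 1, plus one more for every "differs" flag.
    return 1 + sum(int(b) for b in config_bits[1:])

print(count_gain_params("1010"))  # -> 2 (the example in the text)
print(count_gain_params("1000"))  # -> 1 (same count as "0")
```

The "1000" case returning the same count as "0" is exactly the redundancy that motivates the exceptional processing described next.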
There may be an unusual case where the gain configuration information is a sequence of “1000”. In such a case, exceptional processing different from the normal processing is executed. The reason for providing the exceptional processing is as follows:
When the gain configuration information is the sequence “1000”, taking the ordinary meaning described above, it declares that there are two or more gain values, yet all the gain values are the same among the subframes 1 through 4. That is to say, both the gain configuration information “0” and the sequence “1000” mean that a single gain is used for the given frame (all subframes). Thus, at least three bits are wasted indicating the same information. As described, even when the judging unit 301 selects the subframe coding mode and performs processing on a subframe-by-subframe basis, the outputted result becomes the same as the result of processing performed in the frame coding mode. In that case, the coding efficiency deteriorates as a consequence.
As the exceptional processing different from the normal processing, the gains of the subframes are defined to monotonically increase (or monotonically decrease), for example.
In the coded stream, a value g1 comes first, followed by values delta_gx, as the coding sequence which follows the gain configuration information and is used for deriving the actual gains of the subframes. The value g1 can be obtained by coding a gain derived using, for example, the maximum amplitude of the audio signal contained in the subframe 1. The value delta_gx can be obtained by coding the difference between the gain of a subframe x−1 and the gain of a subframe x. The variable x is an integer equal to or greater than two, and the maximum value of x equals the number of subframes (four in
By performing after-mentioned decoding processing on the values g1 and delta_gx, G1 and delta_Gx are derived, respectively. G1 is a value indicating the gain of the subframe 1. Delta_Gx is a value indicating a difference between a gain of a subframe x−1 and a gain of a subframe x.
When there is one gain value for one frame, the coded value g1 alone follows the gain configuration information in the coding processing. In the decoding processing, the gain G1 is derived from the value g1, and G1=G2=G3=G4 is applied. When there are two or more different gain values for one frame, values delta_g2, delta_g3, and delta_g4 follow the value g1 in the coding processing. In the decoding processing, the gain G1 is first derived from the value g1. Then, using delta_G2 which is a value derived by decoding delta_g2, G2=G1+delta_G2 is calculated. Thereafter, delta_g3 and delta_g4 are decoded to calculate the gains G3 and G4 sequentially.
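The decoding of the gains just described can be sketched as follows. The sketch assumes the coded values have already been dequantized into a real-valued G1 and delta_Gx differences, and assumes a delta is present only for subframes whose gain differs from the previous one (the bs_delta convention described below); the function name is hypothetical.

```python
def decode_gains(config_bits, g1, deltas):
    # config_bits: "0" for a single frame-wide gain, or "1xyz" where
    # each following bit is "1" when that subframe's gain differs
    # from the previous subframe's gain.
    if config_bits == "0":
        # Single gain per frame: G1 = G2 = G3 = G4.
        return [g1] * 4
    gains = [g1]                      # G1 for subframe 1
    it = iter(deltas)
    for flag in config_bits[1:]:      # one flag per following subframe
        if flag == "0":
            gains.append(gains[-1])             # same gain as previous
        else:
            gains.append(gains[-1] + next(it))  # Gx = G(x-1) + delta_Gx
    return gains

print(decode_gains("1010", 2.0, [0.5]))  # -> [2.0, 2.0, 2.5, 2.5]
```

The example reproduces the “1010” case from the text: subframes 1 and 2 share one gain, subframes 3 and 4 share another.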
In
Bs_same_gain[num] is flag information for identifying whether or not the gain of the num-1th subframe (hereinafter referred to as num-1 subframe) and the gain of the numth subframe (hereinafter referred to as num subframe) are the same. That is to say, it indicates “x”, “y” and “z” of the gain configuration information of
Bs_gain[0] is a value used for deriving a gain. When there is a single gain (bs_multi_gain is 0), the gain value derived using bs_gain[0] is the gain value of all of the subframes. When a gain includes two or more different values for plural subframes (bs_multi_gain is 1), the gain value derived using bs_gain[0] is the gain value of the initial subframe.
For subframes for which bs_same_gain[num] is 0, the value for deriving the difference between the gain of the num-1 subframe and that of the num subframe (or for deriving the gain of the num subframe directly) is coded as bs_delta[num], in order starting from the subframe with the smallest num.
The syntax shown in
Next, operations of the audio signal coding apparatus according to the present embodiment shall be described.
Upon receiving an audio signal sequence, the judging unit 301 selects either the frame coding mode or the subframe coding mode (S101). That is to say, bs_multi_gain of
More specifically, the judging unit 301 detects the fluctuations in the audio signal sequence by using the maximum amplitude of the audio signal sequence. When the audio signal has almost no fluctuations, e.g. when the maximum amplitude is no greater than a threshold, the quantization and coding should be performed on a frame-by-frame basis. Therefore, the audio signal sequence is outputted to the frame processing unit 310. On the other hand, when the maximum amplitude is greater than the threshold, the quantization and coding should be performed on a subframe-by-subframe basis. Therefore, the audio signal sequence is outputted to the subframe processing unit 320. The example of the audio signal sequence shown in
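The mode selection described above can be sketched as follows. The concrete fluctuation measure (spread of per-subframe peak amplitudes) and the threshold value are assumptions made for illustration; the text states only that the maximum amplitude is detected per subframe and compared against a threshold.

```python
def select_coding_mode(frame, num_subframes=4, threshold=0.5):
    # Sketch of the judging unit 301: measure the maximum amplitude
    # of each subframe and choose the subframe coding mode when the
    # spread across subframes is large (great temporal fluctuation).
    n = len(frame) // num_subframes
    peaks = [max(abs(s) for s in frame[i * n:(i + 1) * n])
             for i in range(num_subframes)]
    return "subframe" if max(peaks) - min(peaks) > threshold else "frame"

steady = [0.30, 0.31, 0.29, 0.30] * 4            # almost flat signal
bursty = [0.05] * 12 + [0.90, 0.85, 0.95, 0.88]  # burst in subframe 4
print(select_coding_mode(steady))  # -> frame
print(select_coding_mode(bursty))  # -> subframe
```

A signal whose energy jumps within the frame is routed to the subframe processing unit 320, while a steady signal is coded frame by frame.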
In the case of selecting the subframe coding mode (Yes in S101), the subframe quantizing unit 325 determines gains on a subframe-by-subframe basis and detects correlations between the determined gains (S102). In more detail, the subframe quantizing unit 325 detects whether the gain values determined on a subframe-by-subframe basis are of the same values or different values. In other words, it detects the values corresponding to “x”, “y” and “z” of
Next, the detected correlations (subframe-by-subframe based gain values) are judged (S103). When the determined gains are of two or more different values for plural subframes (Yes in S103), gains are derived on a subframe-by-subframe basis (S104).
In more detail, the difference between each of the gain values determined on a subframe-by-subframe basis and the gain value of the previous subframe is calculated.
When the determined gains are of the same values for all of the subframes (No in S103), exceptional processing is performed (S105). Here, as an example of the exceptional processing, the determined gains are assumed to monotonically increase (or monotonically decrease).
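The branch between the normal processing and the exceptional processing can be sketched as follows. The step size of the monotone rewrite is a hypothetical choice; the text only requires that the gains be made to monotonically increase (or decrease) when they all come out equal.

```python
def code_subframe_gains(gains, step=0.01):
    # Exceptional processing (S105): if every per-subframe gain is
    # equal, coding them separately conveys nothing beyond the frame
    # coding mode, so redefine the gains to increase monotonically.
    if len(set(gains)) == 1:
        gains = [gains[0] + i * step for i in range(len(gains))]
    # Differential coding (S104): first gain plus the differences
    # between consecutive subframe gains.
    deltas = [b - a for a, b in zip(gains, gains[1:])]
    return gains[0], deltas
```

For distinct gains such as [1.0, 2.0, 2.0, 3.0], this yields the first gain 1.0 and the differences [1.0, 0.0, 1.0]; for four equal gains, the monotone rewrite makes every coded difference strictly positive, so the subframe-mode bits are never spent restating a single frame-wide gain.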
When the audio signal sequence shown in the figure is inputted, the judging unit 301 judges that the fluctuations in the audio signal are great by using the maximum amplitude of each subframe. Therefore, it selects the subframe coding mode. Here, the subframe quantizing unit 325 is assumed to determine a gain value based on the energy level of an audio signal sequence contained in a subframe. In the example shown in
It should be noted that when the judging unit 301 selects the frame coding mode for the audio signal sequence of
In order to prevent the selection of the subframe coding mode from becoming meaningless as above, when the gain configuration information is the sequence “1000”, quantization and coding are performed on the gains on a subframe-by-subframe basis, assuming as exceptional processing that the gains monotonically increase.
When the frame coding mode is selected (No in S101) in the selection processing (S101), one gain is determined for each frame, and the determined gain is quantized and coded (S106).
When the above processing (S101 to S106) is finished for one frame, the same processing is repeated for the next frame.
As above, the exceptional processing is performed in the case where, even when the subframe coding mode is selected, the same result as that of selecting the frame coding mode is obtained. By doing so, it is possible to prevent the processing from becoming meaningless.
Now, a conventional bit stream syntax is illustrated to clarify the difference from the present embodiment.
Next, an apparatus utilizing the audio signal decoding method of the present embodiment shall be described.
The gain amplifying unit 403 amplifies the decoded audio signal received from the post-filtering unit 402 on a subframe-by-subframe basis.
As stated above, the audio signal coding method and decoding method of the present embodiment achieve efficient coding through exceptional processing applied to a coding step that would otherwise become meaningless. As a result, while maintaining the benefit of low-delay processing, it is possible to suppress audio artifacts and achieve highly efficient coding.
Although the audio signal coding method and decoding method of the present embodiment have been described above, it is obvious that the present invention is not limited to the above embodiment, and many modifications are possible which are intended to be included within the scope of the present invention.
For example, as shown in
In
In
On the other hand, for subframes for which bs_same_gain[num] is 0, the value for deriving the difference between the gain of a num-1 subframe and that of a num subframe (or for deriving the gain of the num subframe directly) is coded as bs_delta[num], in order starting from the subframe with the smallest num.
As described above, the bit stream syntax of
Furthermore, although in the present embodiment the judging unit 301 selects the frame coding mode and the subframe coding mode by using the maximum amplitude of an audio signal, the energy level of the audio signal may be used instead of the maximum amplitude.
Even in this case, exceptional processing needs to be performed when an audio signal sequence as shown in
The judging unit 301 selects the subframe coding mode since the fluctuations of the energy level of each subframe are great as shown in
Even when the judging unit 301 selects the subframe coding mode based on the energy level, there may be a case where the bit rate cannot be raised due to a restriction. In that case, the option consuming the smaller number of bits must be selected for each subframe, and consequently the same coding processing is selected for every subframe. In this case too, the gain configuration information becomes the sequence “1000”. As a result, as in the cases of
Further, as shown in
In addition, in the present embodiment, gain values may be derived from a table or the like provided in advance. In this case, there may be an instance where decoding is performed using a method such as G1=table(g1). In that instance, decoding may be performed through, for example, G2=table(g1+g2) or G2=table(g1)+table2(g2).
When the gain configuration information defines a monotone increase (or monotone decrease), the values of G2 through G4 are decoded through, for example, Gp=G(p−1)+delta_Gp, Gp=table(g(p−1)+gp), or Gp=table(g(p−1))+tablep(gp). Here, p is an integer no less than 2.
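One of the table-based variants just named can be sketched as follows. The table contents and the function name are hypothetical; the sketch shows only the Gp=table(g(p−1)+gp) mechanism, in which consecutive gain codes are summed to index the table.

```python
GAIN_TABLE = [0.5, 1.0, 2.0, 4.0, 8.0]  # hypothetical gain table

def decode_gains_from_table(codes):
    # G1 = table(g1); for p >= 2, Gp = table(g(p-1) + gp), one of the
    # variants named in the text.
    gains = [GAIN_TABLE[codes[0]]]
    for prev, cur in zip(codes, codes[1:]):
        gains.append(GAIN_TABLE[prev + cur])
    return gains

print(decode_gains_from_table([0, 1, 2]))  # -> [0.5, 1.0, 4.0]
```

With increasing codes, the summed indices walk up the table, yielding the monotonically increasing gains of the exceptional case.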
Furthermore, differential coding is employed for coding two or more gains, but instead of using differential information, a value may be used which allows, for gains following a first gain, direct decoding of values of the corresponding subframes without using the value of a previous subframe.
Moreover, the audio signal coding apparatus 300 in the present embodiment includes, as shown in
According to an audio signal coding method and decoding method of the present embodiment, coding and decoding are performed on quantizing precision information which affects the coding efficiency upon lossless-coding. That is to say, what is different from Embodiment 1 is the target for coding and decoding being quantizing precision information instead of gains. In the present embodiment, descriptions of the same aspects as that of Embodiment 1 shall be omitted here, but different ones shall be described.
As in Embodiment 1, the apparatus which implements the audio signal coding method of the present embodiment is the audio signal coding apparatus shown in
In the present embodiment, the subframe quantizing unit 325 quantizes the quantizing precision information. For example, considering audibility, quantizing precision information Rp is set to a small value for an audio signal of an important sample, in order to ensure adequate quantizing precision.
Assuming an audio signal inputted into the subframe quantizing unit 325 as y(i) and the quantizing precision information as Rp, a relationship shown in Expression 2 can be observed for z(i) which is the target of the quantization.
y(i)=Rp×z(i) (Expression 2)
From the relationship shown in Expression 2, z(i) can be derived when the quantizing precision information Rp is determined. Generally, z(i) is a real value, and thus the subframe quantizing unit 325 quantizes the real value z(i) into an integer. Then, the quantized z(i) is outputted to the lossless coding unit 326.
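The effect of the quantizing precision information can be sketched as follows. This is a minimal illustration of Expression 2 under hypothetical sample values; as the text notes, the mechanism mirrors Expression 1 with the gain Gp replaced by Rp.

```python
def quantize_with_precision(y, precision):
    # Expression 2: solve y(i) = Rp * z(i) for z(i), then round the
    # real-valued z(i) to an integer. A smaller Rp yields larger
    # integer magnitudes |z(i)|, shrinking the relative rounding error.
    return [round(yi / precision) for yi in y]

samples = [10.0, -3.9, 7.6]
print(quantize_with_precision(samples, 2.0))   # coarse: [5, -2, 4]
print(quantize_with_precision(samples, 0.25))  # fine: [40, -16, 30]
```

Choosing a small Rp for perceptually important samples trades more bits in the lossless coder for less audible quantization noise.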
As apparent from a comparison between Expression 1 illustrated in Embodiment 1 and Expression 2, the gain Gp is simply replaced with quantizing precision information Rp, and x(i) with z(i) accordingly. No change is made to the modules other than these, such as the lossless coding unit 326 and the auditory model 323.
As described above, the audio signal coding method and decoding method of the present embodiment can suppress audio artifacts by setting the quantizing precision information Rp to a small value, in consideration of audibility, for the audio signal of an important sample, which makes the absolute value of z(i) larger. This makes it possible to reduce the adverse effect of the quantization errors that occur in the quantization process of converting a real value into an integer.
An audio signal coding method and decoding method of the present embodiment can be applied to an audio signal coding method and decoding method in which time-frequency transformation is performed. This is the difference from Embodiments 1 and 2, since in the coding method and decoding method of Embodiments 1 and 2, time-frequency transformation is basically not performed, that is, they are time-domain coding method and decoding method.
A first application is to a system using batch orthogonal transformation in which more than one transformation length is used, typified by MPEG-2 AAC.
In this system, a frame is formed for every given set of samples of an input audio signal, and the samples in the frame undergo batch orthogonal transformation to generate a frequency spectral sequence, which is then quantized and coded. Either a single batch orthogonal transformation per frame or temporally consecutive plural batch orthogonal transformations per frame is selected.
When temporally-consecutive plural batch orthogonal transformations are performed per frame to obtain a frequency spectral sequence from each batch orthogonal transformation, the coding method of Embodiment 1 is applied to a representative gain of each frequency spectral sequence, so that the coding efficiency can be enhanced.
A second application is to a system using batch orthogonal transformation in which a single transformation length is used, typified by the Low Delay AAC.
In this system, a frame is formed for every given set of samples of an input audio signal, and the samples in the frame undergo batch orthogonal transformation to generate a frequency spectral sequence, which is then quantized and coded. A single orthogonal transformation is performed per frame.
Thus, since the orthogonal transformation is performed only once per frame, temporal fluctuations within a frame cannot be obtained. In that case, to obtain information on temporal fluctuations, plural temporal subframes are formed separately in advance, regardless of the orthogonal transformation, and the subframes are used for quantizing and coding the temporal gain information. In the decoding processing, the plural subframes may be used when, for example, an audio signal of a frame decoded through batch orthogonal transformation is corrected using the temporal gain information.
The coding efficiency can also be enhanced by dividing a frequency spectral sequence obtained from a single orthogonal transformation into plural sub bands on the frequency axis (corresponding to subframes on the time axis), and then applying the coding method of Embodiment 1 to a representative gain of each sub band.
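The sub band division on the frequency axis can be sketched as follows. Equal-width sub bands and the mean absolute amplitude as the representative gain are illustrative assumptions; the embodiment does not prescribe these particular choices.

```python
def subband_gains(spectrum, num_subbands):
    """Divide a frequency spectral sequence from a single batch
    orthogonal transformation into equal-width sub bands and
    return a representative gain (mean absolute amplitude, an
    illustrative choice) for each sub band."""
    width = len(spectrum) // num_subbands
    gains = []
    for b in range(num_subbands):
        band = spectrum[b * width:(b + 1) * width]
        gains.append(sum(abs(x) for x in band) / len(band))
    return gains

spectrum = [float(i % 4) for i in range(16)]  # 16 spectral lines
gains = subband_gains(spectrum, 4)            # 4 sub bands
```

The resulting per-sub-band gains would then be coded in the same manner as the per-subframe gains on the time axis.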
A third application is to a system using polyphase filtering for forming a time-frequency matrix, typified by the QMF (Quadrature Mirror Filter).
In this system, a time signal sequence containing plural samples is obtained in each of plural frequency sub bands. Thus, the coding method of Embodiment 1 may be applied to the gains of the signals of the plural frequency sub bands in given time samples. Further, a frequency sub band may be selected, and the coding method of Embodiment 1 may then be applied to a representative gain of the time signal sequences which contain plural samples of the selected frequency sub band and which are classified into groups in units of one or more time signal sequences.
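The two ways of traversing the time-frequency matrix can be sketched as follows. The matrix values, the use of the absolute amplitude as a gain, and the group maximum as a representative gain are assumptions for illustration.

```python
# tf[t][k] is the polyphase-filter output sample at time slot t
# in frequency sub band k (a 4-slot x 3-band matrix here,
# illustrative values).
tf = [[1.0, 2.0, 4.0],
      [1.0, 2.0, 4.0],
      [2.0, 4.0, 8.0],
      [2.0, 4.0, 8.0]]

def gains_across_bands(tf, t):
    """Gains of the signals of the plural frequency sub bands at a
    given time slot t (absolute amplitude used as the gain here)."""
    return [abs(x) for x in tf[t]]

def grouped_gain_of_band(tf, band, group_size):
    """Representative gain of one selected sub band, with its time
    signal sequence classified into groups of group_size slots
    (group maximum used as the representative, an assumption)."""
    col = [abs(row[band]) for row in tf]
    return [max(col[i:i + group_size])
            for i in range(0, len(col), group_size)]

across = gains_across_bands(tf, 0)       # across bands at slot 0
along = grouped_gain_of_band(tf, 2, 2)   # along time in band 2
```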
A fourth application is to a system using, in addition to the polyphase filtering of the third application, batch orthogonal transformation typified by DCT, as additional processing.
In this system, the output of the polyphase filtering is the same as in the third application, but when the frequency intervals of the sub bands are wide, for example, the frequency resolution of low frequency components in particular is deficient. To improve the frequency resolution of the low frequency components, time-frequency transformation is therefore performed, using an orthogonal transformation such as the Discrete Cosine Transform (DCT), on the time signal sequence which is included in the output of the polyphase filtering and which corresponds to the low frequency components.
The fourth application can be implemented as a combination of the second and third applications. For example, the same technique as that of the second application may be used in low frequency components, whereas the technique of the third application may be used in high frequency components to achieve the same enhancement of the coding efficiency.
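The additional transform step can be sketched as follows: a plain (unnormalized) DCT-II applied to the time signal sequence of a low-frequency sub band, splitting its n time samples into n finer spectral lines. The unnormalized form and the sample values are illustrative assumptions.

```python
import math

def dct_ii(x):
    """Unnormalized DCT-II, a batch orthogonal transformation that
    can be applied to the time signal sequence of a low-frequency
    sub band output by the polyphase filtering."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * k * (2 * t + 1) / (2 * n))
                for t in range(n))
            for k in range(n)]

# Time signal sequence of one low-frequency sub band (illustrative):
# a constant sequence transforms to a single nonzero (DC) line,
# i.e. its energy is resolved onto finer spectral lines.
low_band = [1.0, 1.0, 1.0, 1.0]
fine_spectrum = dct_ii(low_band)
```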
As described, even in the various systems utilizing the time-frequency transformation included in the audio signal coding method and decoding method, the coding efficiency can be enhanced by basically using the coding method and decoding method similar to the ones in Embodiment 1. Although the coding of gains has been described above, even when the coding method and decoding method similar to the ones in Embodiment 2 are performed with quantizing precision in place of the gains, the coding efficiency can still be expected to improve in the same manner.
As described, the audio signal coding method and decoding method of the present embodiment are applicable in the case where the target for coding is divided into some groups (e.g. frames on the time axis and bands on the frequency axis) and then coding is performed on a group-by-group basis. They are also applicable in the case where one group is divided into plural sub groups (e.g. subframes on the time axis and sub bands on the frequency axis) and then coding is performed on a sub group-by-sub group basis.
Although the audio signal coding method and decoding method of the present invention have been described above with some exemplary embodiments, the present invention is not to be limited to these embodiments. Those skilled in the art will readily appreciate that many modifications are possible in the exemplary embodiments without materially departing from the novel teachings and advantages of this invention. Accordingly, all such modifications are intended to be included within the scope of this invention.
For example, in the present embodiment, the exceptional processing is processing in which gain values, for example, are assumed to monotonically increase or decrease. However, it may be any other processing as long as it differs from the normal processing. For example, it may be processing in which gain values are assumed to take a large value and a small value alternately on a subframe-by-subframe basis, or processing in which gain values are assumed to vary between subframes in accordance with a predetermined rule.
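The contrast between the normal processing and such exceptional processing can be sketched as follows. The mode names, the fixed step rule, and the constant-gain assumption for the normal processing are assumptions for illustration only; the embodiment leaves the predetermined rule open.

```python
def predict_gains(first_gain, n, mode, step=1.0):
    """Predict n subframe gains from the first gain under one of
    several assumed variation patterns (hedged illustration):
    'normal'    : gains assumed roughly constant across subframes
    'increase'  : gains assumed to monotonically increase
    'decrease'  : gains assumed to monotonically decrease
    'alternate' : large and small values alternately per subframe"""
    if mode == "normal":
        return [first_gain] * n
    if mode == "increase":
        return [first_gain + step * i for i in range(n)]
    if mode == "decrease":
        return [first_gain - step * i for i in range(n)]
    if mode == "alternate":
        return [first_gain + (step if i % 2 else 0.0) for i in range(n)]
    raise ValueError(mode)

rising = predict_gains(2.0, 4, "increase")
zigzag = predict_gains(2.0, 4, "alternate")
```

Whichever pattern matches the actual gains best leaves the smallest residuals to code, which is the source of the efficiency gain.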
Furthermore, although in the present embodiment values for determining gain values or quantizing precision are quantized and coded, the target of the quantization and coding is not limited to such values. The quantization and coding may be performed on other values related to the coding of audio signals.
The present invention may be embodied as: a program that causes a computer to execute the steps of the audio signal coding method and decoding method of the present invention; a computer-readable recording medium, such as a CD-ROM, recorded with the program; and information, data or signal that indicates the program. Such a program, information, data, or a signal may be distributed via a communication network such as the Internet.
The audio signal coding method and decoding method of the present invention are applicable to various applications to which conventional audio coding methods and decoding methods have been applied. Application is possible particularly when, for example, broadcast contents are transmitted, recorded on a storing medium such as DVDs and SD cards and played back, and when AV contents are transmitted to a communication appliance typified by mobile phones. Further, it is also useful when audio signals are transmitted as electronic data exchanged over the Internet.
Foreign Application Priority Data

Number | Date | Country | Kind
---|---|---|---
2006-335399 | Dec 2006 | JP | national

PCT Information

Filing Document | Filing Date | Country | Kind | 371(c) Date
---|---|---|---|---
PCT/JP2007/073503 | 12/5/2007 | WO | 00 | 2/25/2009

Publishing Document | Publishing Date | Country | Kind
---|---|---|---
WO2008/072524 | 6/19/2008 | WO | A

U.S. Patent Documents

Number | Name | Date | Kind
---|---|---|---
6636829 | Benyassine et al. | Oct 2003 | B1
7752039 | Bessette | Jul 2010 | B2
7953595 | Xie et al. | May 2011 | B2
20030046064 | Moriya et al. | Mar 2003 | A1
20070016402 | Schuller et al. | Jan 2007 | A1
20070083362 | Moriya et al. | Apr 2007 | A1
20100023322 | Schnell et al. | Jan 2010 | A1

Foreign Patent Documents

Number | Date | Country
---|---|---
2002-26738 | Jan 2002 | JP
2003-332914 | Nov 2003 | JP
2005-49429 | Feb 2005 | JP
2005-165183 | Jun 2005 | JP
2005-260373 | Sep 2005 | JP
2005078705 | Aug 2005 | WO

Related Publications

Number | Date | Country
---|---|---
20100042415 A1 | Feb 2010 | US