This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2011-187570, filed on Aug. 30, 2011, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to an encoding method and the like.
One of the coding schemes for an audio signal is High Efficiency-Advanced Audio Coding (HE-AAC). In HE-AAC, low-frequency components of an audio signal are encoded with AAC encoding, and high-frequency components are encoded with spectral band replication (SBR) encoding, thereby improving the coding efficiency.
An exemplary related-art encoding apparatus that encodes an audio signal with HE-AAC will now be described.
The downsampler 10 is a processor that performs downsampling on an audio signal. The downsampler 10 outputs the audio signal having a low-frequency component obtained through the downsampling to the AAC encoder 20.
The AAC encoder 20 is a processor that applies AAC to the audio signal having the low-frequency component so as to encode the audio signal having the low-frequency component. The AAC encoder 20 outputs the encoded audio signal having the low-frequency component to the multiplexer 40.
The SBR encoder 30 is a processor that encodes the high-frequency component of the audio signal. The SBR encoder 30 outputs the encoded high-frequency component of the audio signal to the multiplexer 40. The SBR encoder 30 controls quantization of the audio signal in such a manner that the time resolution is set to high when the audio signal has a transient, or that the frequency resolution is set to high when the audio signal is stationary. The state in which an audio signal has a transient means that, for example, the audio signal includes an abrupt amplitude change.
The multiplexer 40 is a processor that multiplexes the encoded audio signal having the low-frequency component and the encoded audio signal having the high-frequency component and that outputs the multiplexed audio signal to an external apparatus.
Now, an example of the SBR encoder 30 illustrated in
The analysis filter bank 31 is a processor that transforms an audio signal into a time-frequency spectrum. The analysis filter bank 31 outputs the audio signal subjected to a time-frequency-spectrum transformation to the transient detector 32, the spectrum estimator 34, and the additional information determiner 35.
The transient detector 32 is a processor that analyzes the audio signal and that detects a state in which the audio signal has a transient. The transient detector 32 outputs the detection result to the grid information generator 33.
The grid information generator 33 is a processor that controls the quantizer 36 so that the time resolution is set to high when the audio signal has a transient, and the frequency resolution is set to high when the audio signal is stationary.
The spectrum estimator 34 is a processor that outputs, to the quantizer 36, supplementary information used for replicating the high-frequency component from the low-frequency component. The additional information determiner 35 is a processor that outputs, to the quantizer 36 and the multiplexer 37, additional information representing the high-frequency component of the audio signal.
The quantizer 36 is a processor that encodes the high-frequency component with the time resolution and the frequency resolution which are determined under the control of the grid information generator 33. The quantizer 36 outputs the encoded high-frequency component of the audio signal to the multiplexer 37.
The multiplexer 37 is a processor that multiplexes the encoded audio signal having the high-frequency component, which is output from the quantizer 36, and the additional information, and outputs the multiplexed information.
However, in the related art described above, there is a problem in that the implementation scale and the processing load are large.
As illustrated in
Regarding the related art, see Japanese Laid-open Patent Publication No. 2008-129541.
In addition, regarding the related art, see Suzuki, Masanao, Ota, Yasuji, and Ito, Takashi, “Wansegu Housou Muke Audio Fugouka Gijutsu (Audio Coding Algorithm for One-Segment Broadcasting),” FUJITSU, vol. 58, no. 2, pp. 162-167, March 2007.
According to an aspect of the embodiments, an encoding method executed by a computer includes: converting, by the computer, information about a transient included in a low-frequency component of an audio signal into information about a transient included in a high-frequency component of the audio signal; detecting, by the computer, the transient of the high-frequency component of the audio signal based on the high-frequency component of the audio signal and on the information about the transient of the high-frequency component obtained by the converting; and encoding, by the computer, the high-frequency component of the audio signal based on the transient detected by the detecting.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
Embodiments of an encoding method, an encoding apparatus, and an encoding program which are disclosed herein will be described in detail below based on the drawings. These embodiments are not limited to the disclosure set forth herein.
The downsampler 110 is a processor that performs downsampling on an audio signal. The downsampler 110 outputs the audio signal having a low-frequency component obtained through the downsampling, to the AAC encoder 120.
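The document does not specify how the downsampler 110 is built. In the usual dual-rate HE-AAC configuration the AAC core runs at half the input sampling rate, so the sketch below assumes 2:1 decimation with a hypothetical 31-tap Hamming-windowed sinc low-pass filter. The function names and the filter length are illustrative only.

```python
import math


def lowpass_taps(num_taps=31):
    """Hamming-windowed sinc low-pass filter with its cutoff at 1/4 of the input rate."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response for a cutoff of 0.25 cycles/sample.
        ideal = 0.5 if x == 0 else math.sin(0.5 * math.pi * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at DC


def downsample_by_2(signal, taps=None):
    """Low-pass filter the signal and keep every second sample (2:1 decimation)."""
    taps = taps or lowpass_taps()
    out = []
    for i in range(0, len(signal), 2):  # evaluate the filter only at the kept samples
        acc = 0.0
        for k, h in enumerate(taps):
            j = i - k
            if 0 <= j < len(signal):
                acc += h * signal[j]
        out.append(acc)
    return out
```

Filtering before discarding samples keeps content above the new Nyquist frequency from aliasing into the low band that the AAC encoder 120 codes.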
The AAC encoder 120 is a processor that applies AAC to the audio signal having the low-frequency component so as to encode the audio signal having the low-frequency component. The AAC encoder 120 outputs the encoded audio signal having the low-frequency component to the multiplexer 140.
The AAC encoder 120 determines whether or not the audio signal having the low-frequency component has a transient based on the audio signal. The AAC encoder 120 outputs, to the SBR encoder 130, the determination result as to whether or not the audio signal has a transient. In the following description, the determination result as to whether or not the audio signal has a transient is referred to as transient information of the low-frequency component.
The SBR encoder 130 is a processor that encodes the high-frequency component of the audio signal. The SBR encoder 130 outputs the encoded high-frequency component of the audio signal to the multiplexer 140. The SBR encoder 130 controls quantization so that the time resolution is set to high when the audio signal has a transient, and the frequency resolution is set to high when the audio signal is stationary.
The SBR encoder 130 converts the transient information of the low-frequency component obtained from the AAC encoder 120 into transient information of the high-frequency component, and determines whether or not the audio signal has a transient based on the transient information of the high-frequency component.
The phase or the like of the audio signal to be analyzed by the AAC encoder 120 is different from that of the audio signal to be analyzed by the SBR encoder 130. In the example illustrated in
Therefore, the SBR encoder 130 adjusts the phase of the transient information of the low-frequency component, thereby converting it into transient information of the high-frequency component. Specifically, the SBR encoder 130 shifts the timing at which a transient is detected in the low-frequency component by TA, and uses the shifted timing as the timing at which a transient occurs in the high-frequency component. The SBR encoder 130 is described in detail below.
The multiplexer 140 is a processor that multiplexes the encoded audio signal having the low-frequency component and the encoded audio signal having the high-frequency component and that outputs the multiplexed audio signal to an external apparatus.
Now, an exemplary configuration of the AAC encoder 120 and the SBR encoder 130 which are illustrated in
As illustrated in
The low-frequency transient detector 121 sequentially obtains the frames of the audio signal obtained through the downsampling, and divides each of the frames into eight subframes. The low-frequency transient detector 121 analyzes each of the subframes and detects a subframe including a transient. For example, the low-frequency transient detector 121 detects a subframe having an abrupt amplitude change, as a subframe including a transient. The low-frequency transient detector 121 outputs the detection result to the transient information converter 132 as transient information of the low-frequency component. In addition, the low-frequency transient detector 121 outputs the detection result to the low-frequency converter 122.
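The text says only that a subframe with an abrupt amplitude change is detected as including a transient. One common way to realize this, assumed here rather than taken from the document, is to compare the energy of each subframe with that of the preceding subframe. The threshold `ratio_threshold`, the `floor` guard, and the function name are hypothetical.

```python
def detect_transient_subframes(frame, prev_energy=None, num_subframes=8,
                               ratio_threshold=10.0, floor=1e-9):
    """Flag subframes whose energy rises abruptly relative to the preceding subframe.

    Returns (indices of transient subframes, energy of the last subframe) so that
    the last energy can be passed as prev_energy when the next frame is analyzed.
    """
    size = len(frame) // num_subframes
    energies = [sum(x * x for x in frame[s * size:(s + 1) * size])
                for s in range(num_subframes)]
    # Subframe 0 is compared with the previous frame's last subframe, if known.
    previous = [prev_energy] + energies[:-1]
    hits = [s for s, (e, p) in enumerate(zip(energies, previous))
            if p is not None and e > ratio_threshold * max(p, floor)]
    return hits, energies[-1]
```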
The low-frequency converter 122 is a processor that performs frequency conversion on the audio signal in accordance with the detection result obtained by the low-frequency transient detector 121. The low-frequency converter 122 outputs the audio signal obtained through the frequency conversion, to the low-frequency encoder 123.
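The low-frequency converter 122 performs frequency conversion "in accordance with" the transient detection result. In AAC this corresponds to block switching: a stationary frame is transformed by one long MDCT yielding 1024 coefficients, and a frame with a transient by eight short MDCTs of 128 coefficients each, which is consistent with the eight subframes used above. The simplified sketch below picks an AAC window sequence per frame from hypothetical per-frame transient flags; window shapes and encoder look-ahead are ignored.

```python
def window_sequences(transient_flags):
    """Choose an AAC window sequence per frame from per-frame transient flags.

    Simplified: ignores window shapes and encoder look-ahead details.
    """
    seqs = []
    n = len(transient_flags)
    for i, has_transient in enumerate(transient_flags):
        prev_t = i > 0 and transient_flags[i - 1]
        next_t = i + 1 < n and transient_flags[i + 1]
        if has_transient or (prev_t and next_t):
            # Eight short MDCTs; AAC has no single long window that both
            # follows and precedes a short block, so such a frame is short too.
            seqs.append("EIGHT_SHORT_SEQUENCE")
        elif next_t:
            seqs.append("LONG_START_SEQUENCE")   # long -> short transition
        elif prev_t:
            seqs.append("LONG_STOP_SEQUENCE")    # short -> long transition
        else:
            seqs.append("ONLY_LONG_SEQUENCE")    # one long MDCT of 1024 coefficients
    return seqs
```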
Now, the SBR encoder 130 will be described. The high-frequency converter 131 is a processor that performs frequency conversion on an audio signal. The high-frequency converter 131 outputs the audio signal obtained through the frequency conversion, to the high-frequency transient detector 133 and the high-frequency encoder 134.
The transient information converter 132 is a processor that converts the transient information of the low-frequency component into the transient information of the high-frequency component.
The transient information converter 132 determines which frame in the signal 70c corresponds to the time point obtained by adding a certain time period to the time point of the second subframe in the (n−2)th frame of the signal 70b. In the example illustrated in
The transient information converter 132 generates transient information of the high-frequency component based on the determination result.
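As a minimal sketch of the conversion performed by the transient information converter 132, the code below places each low-frequency transient on a common time axis, shifts it by TA, and marks the high-frequency frame that contains the shifted time. It assumes that both encoders use frames spanning 2048 input-rate samples (1024 AAC core samples at half rate) and that TA is known in input-rate samples; `shift_ta` and all other names are hypothetical.

```python
FRAME_LEN = 2048                  # assumed: 1024 core samples x 2, in input-rate samples
SUBFRAMES = 8                     # low-frequency subframes per frame, as in the text
SUBFRAME_LEN = FRAME_LEN // SUBFRAMES


def low_to_high_frame_flags(low_transients, shift_ta, num_high_frames):
    """Convert low-band (frame, subframe) transient hits into per-frame flags for the high band.

    shift_ta is the timing offset TA in input-rate samples; its value is not
    given in the document and is a placeholder here.
    """
    flags = [False] * num_high_frames
    for frame, subframe in low_transients:
        t = frame * FRAME_LEN + subframe * SUBFRAME_LEN + shift_ta   # shift the timing by TA
        high_frame = t // FRAME_LEN
        if 0 <= high_frame < num_high_frames:
            flags[high_frame] = True
    return flags
```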
The high-frequency transient detector 133 is a processor that narrows down a frame to be subjected to detection of the presence/absence of a transient, based on the transient information of the high-frequency component, and that detects a subframe including a transient from the narrowed-down frame. For example, the case where the high-frequency transient detector 133 obtains the transient information of the high-frequency component as illustrated in
For example, when the high-frequency transient detector 133 obtains the transient information of the high-frequency component as illustrated in
The high-frequency transient detector 133 outputs the frame number and the subframe number at which a transient is included to the high-frequency encoder 134.
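The narrowing-down step of the high-frequency transient detector 133 can be sketched as follows. `detector` is a hypothetical stand-in for whatever high-frequency analysis is actually used; the point illustrated is that frames without a predicted transient are never analyzed.

```python
def detect_in_flagged_frames(frames, flags, detector):
    """Run the (expensive) transient detector only on frames flagged as likely transient.

    detector is any callable returning transient subframe indices for one frame.
    Returns the (frame, subframe) hits and the number of detector calls made.
    """
    hits = []
    calls = 0
    for index, (frame, flagged) in enumerate(zip(frames, flags)):
        if not flagged:
            continue                      # no transient predicted: skip the analysis
        calls += 1
        hits.extend((index, sub) for sub in detector(frame))
    return hits, calls
```

The number of detector calls equals the number of flagged frames rather than the total number of frames, which is where the reduction in processing load comes from.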
The high-frequency encoder 134 is a processor that encodes the high-frequency component of the audio signal based on the detection result obtained by the high-frequency transient detector 133. The high-frequency encoder 134 encodes a frame that includes no transient with a high frequency resolution, for example, a frequency resolution equal to or higher than a certain threshold.
In contrast, the high-frequency encoder 134 encodes the subframes in a frame that includes a transient with a high time resolution, for example, a time resolution equal to or higher than a certain threshold. The high-frequency encoder 134 may encode a subframe that includes no transient with a high frequency resolution. The high-frequency encoder 134 outputs the encoded audio signal to the multiplexer 140.
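The resolution decision of the high-frequency encoder 134 reduces to a per-frame choice, refined per subframe for frames with a transient. The sketch below uses the hypothetical labels `high_time` and `high_frequency` in place of the encoder's actual time/frequency grid parameters, and applies the optional behavior of coding transient-free subframes with a high frequency resolution.

```python
def resolution_plan(num_frames, transients, num_subframes=8):
    """Return, per frame, the resolution to favour when encoding the high band.

    transients is a list of (frame, subframe) pairs from the transient detector.
    A transient-free frame gets a single 'high_frequency' label; a frame with a
    transient gets a per-subframe list.
    """
    by_frame = {}
    for frame, subframe in transients:
        by_frame.setdefault(frame, set()).add(subframe)
    plan = []
    for frame in range(num_frames):
        hits = by_frame.get(frame)
        if not hits:
            plan.append("high_frequency")          # stationary frame
        else:
            plan.append(["high_time" if s in hits else "high_frequency"
                         for s in range(num_subframes)])
    return plan
```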
Now, a procedure performed by the encoding apparatus 100 will be described.
The encoding apparatus 100 holds the transient information of the low-frequency component of the audio signal in operation S104, and converts the transient information of the low-frequency component into transient information of the high-frequency component in operation S105. The encoding apparatus 100 performs frequency conversion in operation S106, and specifies a corresponding frame in operation S107. In operation S107, the corresponding frame is a frame specified from the transient information of the high-frequency component.
The encoding apparatus 100 determines whether the subframes included in the corresponding frame include a transient in operation S108. The encoding apparatus 100 performs SBR encoding based on the determination result in operation S109, and generates a bit stream in operation S110.
Now, an effect of the encoding apparatus 100 according to the first embodiment will be described. The encoding apparatus 100 converts the transient information of the low-frequency component into the transient information of the high-frequency component, and uses it to estimate which frames of the audio signal having the high-frequency component include a transient. Thus, the SBR encoder 130 does not need to check every frame of the audio signal having the high-frequency component for the presence or absence of a transient, which reduces the processing load.
Now, an encoding apparatus according to a second embodiment will be described.
The downsampler 210 is a processor that performs downsampling on an audio signal. The downsampler 210 outputs the audio signal having a low-frequency component obtained through the downsampling, to the AAC encoder 220.
The AAC encoder 220 is a processor that applies AAC to the audio signal having the low-frequency component so as to encode the audio signal having the low-frequency component. The AAC encoder 220 outputs the encoded audio signal having the low-frequency component to the multiplexer 240.
The AAC encoder 220 divides the audio signal having the low-frequency component into multiple subframes, and analyzes whether each of the subframes has a transient. The AAC encoder 220 separates the subframes into an arbitrary number of groups in accordance with the position of the transient, and outputs the determination result to the SBR encoder 230. In the description below, the determination result as to whether or not each group has a transient is referred to as grouping information.
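The document leaves the grouping rule open ("an arbitrary number of groups in accordance with the position of the transient"). A minimal sketch, assuming a hypothetical rule that opens a new group at each transient subframe so that every transient sits at the start of its group, is shown below.

```python
def group_subframes(transient_subframes, num_subframes=8):
    """Split a frame's subframes into consecutive groups (hypothetical rule).

    A new group is opened at every transient subframe; each group records
    whether it contains a transient, which plays the role of grouping information.
    """
    hits = set(transient_subframes)
    groups = []
    for s in range(num_subframes):
        if not groups or s in hits:
            groups.append({"subframes": [], "transient": s in hits})
        groups[-1]["subframes"].append(s)
    return groups
```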
The SBR encoder 230 is a processor that encodes the high-frequency component of an audio signal. The SBR encoder 230 outputs the encoded high-frequency component of the audio signal to the multiplexer 240. The SBR encoder 230 controls quantization so that the time resolution is set to high when the audio signal has a transient, and the frequency resolution is set to high when the audio signal is stationary.
The SBR encoder 230 converts the grouping information obtained from the AAC encoder 220 into transient information of the high-frequency component, and determines whether or not the audio signal has a transient based on the transient information of the high-frequency component. A process in which the SBR encoder 230 converts the grouping information into the transient information of the high-frequency component will be described below.
The multiplexer 240 is a processor that multiplexes the encoded audio signal having the low-frequency component and the encoded audio signal having the high-frequency component and that outputs the multiplexed audio signal to an external apparatus.
Now, an exemplary configuration of the AAC encoder 220 and the SBR encoder 230 which are illustrated in
As illustrated in
The low-frequency transient detector 221 sequentially obtains the frames of the audio signal obtained through the downsampling, divides each of the frames into eight subframes, and classifies the subframes into an arbitrary number of groups.
The low-frequency transient detector 221 analyzes subframes in each of the groups, and detects a subframe including a transient. In the example illustrated in
The low-frequency converter 222 is a processor that performs frequency conversion on the audio signal in accordance with the detection result obtained by the low-frequency transient detector 221. The low-frequency converter 222 outputs the audio signal obtained through the frequency conversion, to the low-frequency encoder 223.
Now, the SBR encoder 230 will be described. The high-frequency converter 231 is a processor that performs frequency conversion on an audio signal. The high-frequency converter 231 outputs the audio signal obtained through the frequency conversion, to the high-frequency transient detector 233 and the high-frequency encoder 234.
The transient information converter 232 is a processor that converts the grouping information into the transient information of the high-frequency component.
The transient information converter 232 determines which subframe in which frame of the signal 70c corresponds to the time point obtained by adding a certain time period to the time point of the group 2 in the (n−2)th frame of the signal 70b. In the example illustrated in
The transient information converter 232 generates transient information of the high-frequency component based on the determination result.
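Assuming that a transient lies at the first subframe of any group flagged as containing one, converting grouping information into transient information of the high-frequency component can be sketched as follows. The 2048-sample frame, eight subframes per frame, the dictionary layout of a group, and the offset `shift_ta` in input-rate samples are all assumptions for illustration.

```python
FRAME_LEN = 2048                 # assumed frame length in input-rate samples
SUBFRAME_LEN = FRAME_LEN // 8    # eight subframes per frame


def groups_to_high_transients(frame_index, groups, shift_ta):
    """Convert one low-band frame's grouping information into high-band (frame, subframe) hits.

    Assumes the transient of a flagged group lies at the group's first subframe.
    """
    hits = []
    for group in groups:
        if not group["transient"]:
            continue
        first = group["subframes"][0]
        t = frame_index * FRAME_LEN + first * SUBFRAME_LEN + shift_ta   # shift by TA
        hits.append((t // FRAME_LEN, (t % FRAME_LEN) // SUBFRAME_LEN))
    return hits
```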
The high-frequency transient detector 233 is a processor that, based on the transient information of the high-frequency component, outputs to the high-frequency encoder 234 the frame number and the subframe number at which a transient is included.
The high-frequency encoder 234 is a processor that encodes the high-frequency component of the audio signal based on the information obtained from the high-frequency transient detector 233. The high-frequency encoder 234 encodes a frame that includes no transient with a high frequency resolution, for example, a frequency resolution equal to or higher than a certain threshold.
In contrast, the high-frequency encoder 234 encodes the subframes in a frame that includes a transient with a high time resolution, for example, a time resolution equal to or higher than a certain threshold. The high-frequency encoder 234 may encode a subframe that includes no transient with a high frequency resolution. The high-frequency encoder 234 outputs the encoded audio signal to the multiplexer 240.
Now, a procedure performed by the encoding apparatus 200 will be described.
The encoding apparatus 200 holds the grouping information in operation S204, and converts the grouping information into transient information of the high-frequency component in operation S205. The encoding apparatus 200 performs frequency conversion in operation S206. In operation S207, the encoding apparatus 200 determines whether the high-frequency component of the audio signal includes a transient, based on the transient information of the high-frequency component.
The encoding apparatus 200 performs SBR encoding based on the determination result in operation S208, and generates a bit stream in operation S209.
Now, an effect of the encoding apparatus 200 according to the second embodiment will be described. The encoding apparatus 200 converts the grouping information into the transient information of the high-frequency component, and detects a subframe including a transient without performing an actual transient detection process on the audio signal having the high-frequency component. Accordingly, the SBR encoder 230 does not need to detect a transient directly from the audio signal, which reduces the implementation scale and the processing load.
Now, an encoding apparatus according to a third embodiment will be described.
The downsampler 310 is a processor that performs downsampling on an audio signal. The downsampler 310 outputs the audio signal having a low-frequency component obtained through the downsampling, to the AAC encoder 320.
The AAC encoder 320 is a processor that applies AAC to the audio signal having the low-frequency component so as to encode the audio signal having the low-frequency component. The AAC encoder 320 outputs the encoded audio signal having the low-frequency component to the multiplexer 340.
The AAC encoder 320 divides the audio signal having the low-frequency component into multiple subframes. The AAC encoder 320 determines whether or not each of the subframes includes a transient, and outputs the determination result to the SBR encoder 330. In the description below, the determination result as to whether or not each of the subframes has a transient is referred to as transient information of the low-frequency component.
The SBR encoder 330 converts the transient information of the low-frequency component obtained from the AAC encoder 320 into transient information of the high-frequency component, and determines whether or not the audio signal has a transient based on the transient information of the high-frequency component. The process by which the SBR encoder 330 converts the transient information of the low-frequency component into the transient information of the high-frequency component is described below.
The multiplexer 340 is a processor that multiplexes the encoded audio signal having the low-frequency component and the encoded audio signal having the high-frequency component and that outputs the multiplexed audio signal to an external apparatus.
Now, an exemplary configuration of the AAC encoder 320 and the SBR encoder 330 which are illustrated in
As illustrated in
The low-frequency transient detector 321 sequentially obtains the frames of the audio signal obtained through the downsampling, and divides each of the frames into eight subframes. The low-frequency transient detector 321 analyzes each of the subframes and detects a subframe including a transient. The low-frequency transient detector 321 outputs the detection result to the transient information converter 332 as transient information of the low-frequency component. In addition, the low-frequency transient detector 321 outputs the detection result to the low-frequency converter 322.
The low-frequency converter 322 is a processor that performs frequency conversion on the audio signal in accordance with the detection result obtained by the low-frequency transient detector 321. The low-frequency converter 322 outputs the audio signal obtained through the frequency conversion, to the low-frequency encoder 323.
Now, the SBR encoder 330 will be described. The high-frequency converter 331 is a processor that performs frequency conversion on an audio signal. The high-frequency converter 331 outputs the audio signal obtained through the frequency conversion, to the high-frequency transient detector 333 and the high-frequency encoder 334.
The transient information converter 332 is a processor that converts the transient information of the low-frequency component into the transient information of the high-frequency component.
The transient information converter 332 determines which subframe in which frame of the signal 70c corresponds to the time point obtained by adding a certain time period to the time point of the subframe #1 in the (n−2)th frame of the signal 70b. In the example illustrated in
The transient information converter 332 generates transient information of the high-frequency component based on the determination result.
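In the third embodiment the converter 332 maps a low-frequency subframe to a specific subframe of a high-frequency frame, and the two sides need not use the same subframe grid. The sketch below assumes eight low-frequency subframes and 16 high-frequency subframes per 2048-sample frame (16 being a typical number of SBR time slots per frame), with a hypothetical offset `shift_ta` in input-rate samples standing in for the time period added by the converter.

```python
FRAME_LEN = 2048        # assumed frame length in input-rate samples
LOW_SUBFRAMES = 8       # low-frequency subframes per frame, as in the text
HIGH_SUBFRAMES = 16     # assumed high-frequency subframes per frame


def map_low_to_high(low_frame, low_subframe, shift_ta):
    """Map a low-band (frame, subframe) transient to a high-band (frame, subframe)."""
    low_len = FRAME_LEN // LOW_SUBFRAMES     # 256 samples
    high_len = FRAME_LEN // HIGH_SUBFRAMES   # 128 samples
    t = low_frame * FRAME_LEN + low_subframe * low_len + shift_ta   # shift by TA
    return t // FRAME_LEN, (t % FRAME_LEN) // high_len
```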
The high-frequency transient detector 333 is a processor that, based on the transient information of the high-frequency component, outputs to the high-frequency encoder 334 the frame number and the subframe number at which a transient is included.
The high-frequency encoder 334 is a processor that encodes the high-frequency component of the audio signal based on the information obtained from the high-frequency transient detector 333. The high-frequency encoder 334 encodes a frame that includes no transient with a high frequency resolution, for example, a frequency resolution equal to or higher than a certain threshold.
In contrast, the high-frequency encoder 334 encodes the subframes in a frame that includes a transient with a high time resolution, for example, a time resolution equal to or higher than a certain threshold. The high-frequency encoder 334 may encode a subframe that includes no transient with a high frequency resolution. The high-frequency encoder 334 outputs the encoded audio signal to the multiplexer 340.
Now, a procedure performed by the encoding apparatus 300 will be described.
The encoding apparatus 300 holds the transient information of the low-frequency component in operation S304, and converts the transient information of the low-frequency component into transient information of the high-frequency component in operation S305. The encoding apparatus 300 performs frequency conversion in operation S306. The encoding apparatus 300 detects a subframe including a transient based on the transient information of the high-frequency component in operation S307.
The encoding apparatus 300 performs SBR encoding based on the detection result in operation S308, and generates a bit stream in operation S309.
Now, an effect of the encoding apparatus 300 according to the third embodiment will be described. The encoding apparatus 300 converts the transient information of the low-frequency component into the transient information of the high-frequency component, and detects a subframe including a transient without performing an actual transient detection process on the audio signal having the high-frequency component. Accordingly, the SBR encoder 330 does not need to detect a transient directly from the audio signal, which reduces the implementation scale and the processing load.
Now, an alternative process performed by the encoding apparatus 300 will be described. In the example illustrated in
In this case, the high-frequency transient detector 333 performs transient detection only on the subframes #8 to #10 of the nth frame, and outputs the detection result to the high-frequency encoder 334. Thus, the encoding apparatus 300 determines whether or not a transient is included only for the subframes that the transient information indicates may include a transient, which reduces the processing load.
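Restricting detection to a few candidate subframes around the converted position can be sketched as below. The candidate window `center - margin` to `center + margin` (for example, subframes #8 to #10 correspond to `center=9`, `margin=1`) and the energy-ratio test are assumptions; the document does not specify the detector itself.

```python
def detect_in_candidates(subframe_energies, center, margin=1,
                         ratio_threshold=10.0, floor=1e-9):
    """Check only subframes center-margin .. center+margin for an abrupt energy rise."""
    first = max(1, center - margin)   # subframe 0 has no predecessor within the frame
    last = min(len(subframe_energies) - 1, center + margin)
    hits = []
    for s in range(first, last + 1):
        if subframe_energies[s] > ratio_threshold * max(subframe_energies[s - 1], floor):
            hits.append(s)
    return hits
```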
Now, an exemplary computer that executes encoding programs for achieving functions similar to those of the encoding apparatuses described in the first to third embodiments will be described.
As illustrated in
The hard disk apparatus 507 includes, for example, a downsampling program 507a, an AAC program 507b, an SBR program 507c, and a multiplexing program 507d. The CPU 501 reads out the downsampling program 507a, the AAC program 507b, the SBR program 507c, and the multiplexing program 507d, and loads them into the RAM 506.
The downsampling program 507a functions as a downsampling process 506a. The AAC program 507b functions as an AAC process 506b. The SBR program 507c functions as an SBR process 506c. The multiplexing program 507d functions as a multiplexing process 506d.
For example, the downsampling process 506a corresponds to the downsamplers 110, 210, and 310. The AAC process 506b corresponds to the AAC encoders 120, 220, and 320. The SBR process 506c corresponds to the SBR encoders 130, 230, and 330. The multiplexing process 506d corresponds to the multiplexers 140, 240, and 340.
The downsampling program 507a, the AAC program 507b, the SBR program 507c, and the multiplexing program 507d are not necessarily stored in advance in the hard disk apparatus 507. For example, these programs are stored in a “portable physical medium”, such as a flexible disk (FD), a compact disk-read-only memory (CD-ROM), a digital versatile disk (DVD), a magneto-optical disk, or an integrated circuit (IC) card, which is inserted into the computer 500. Then, the computer 500 may read out the downsampling program 507a, the AAC program 507b, the SBR program 507c, and the multiplexing program 507d from the inserted medium and execute them.
Each of the downsampler 110, the AAC encoder 120, the SBR encoder 130, and the multiplexer 140 illustrated in
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2011-187570 | Aug 2011 | JP | national |