This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2011-45171, filed on Mar. 2, 2011, the entire contents of which are incorporated herein by reference.
The embodiments disclosed herein are related to, for example, an audio coding device, an audio coding method, and a computer-readable recording medium storing an audio coding computer program.
Hitherto, audio signal coding methods for compressing the amount of data of an audio signal have been developed. One such coding method is High-Efficiency Advanced Audio Coding (HE-AAC), which has been standardized as MPEG-2 HE-AAC and MPEG-4 HE-AAC by the Moving Picture Experts Group (MPEG). In HE-AAC, the low frequency band (low frequency components) of an audio signal is coded in accordance with the Advanced Audio Coding (AAC) method, whereas the high frequency band (high-frequency components) is coded in accordance with the Spectral Band Replication (SBR) method. In the SBR method, each frame of an audio signal is divided into a plurality of time-frequency domains, and auxiliary information and the like for reproducing the high-frequency components from the corresponding low frequency components are calculated as SBR data on the basis of the signal power within each time-frequency domain. These SBR parameters are then coded. Each such time-frequency domain is called a grid.
In the SBR method, if the time length of a grid is too long relative to the temporal change of an audio signal, the electric power of the audio signal is averaged within the grid, and the information indicating the temporal change is lost. As a result, the reproduction sound quality of the coded audio signal deteriorates. In particular, sound in a certain time period may be affected by sound that occurs later, so that sound differing from the original sound is produced. Such a phenomenon is called a pre-echo. Japanese National Publication of International Patent Application No. 2003-529787 discloses a technology in which a highly transient sound, such as an attack sound, is detected in each channel of an audio signal, and a grid is set so that the time resolution increases for the highly transient sound. Such a transient portion of sound is called a transient.
Furthermore, Japanese Laid-open Patent Publication No. 2006-3580 discloses a technology in which, when the degree of similarity among a plurality of channels of an audio signal is determined to be high, the grouping of the frequency data obtained by frequency-converting the audio signal, in the time direction or in the frequency direction, is performed in common for the plurality of channels.
According to an aspect of the embodiments, an audio coding device includes a time frequency transform unit that, with respect to each of a plurality of channels included in an audio signal, generates a time frequency signal indicating frequency components at each time by performing a time frequency transform on a signal of the channel; a transient detection unit that detects a transient with respect to each of the plurality of channels so as to obtain a transient detection time; a transient time correction unit that, when a difference in transient detection times between an early detection channel in which the transient detection time is earliest and a late detection channel that is a channel other than the early detection channel among the plurality of channels is within a range in which the transient may be regarded as a transient caused by the same sound, makes a correction so that the transient detection time of the late detection channel coincides with the transient detection time of the early detection channel; a grid determination unit that, with respect to each of the plurality of channels, sets a grid for a non-transient sound in a section in which the transient has not been detected, and sets a grid for a transient sound having a length of time shorter than that of the grid for a non-transient sound in a section in which the transient has been detected; and a coding unit that codes the audio signal for each grid for a transient sound or for each grid for a non-transient sound.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
These and/or other aspects and advantages will become apparent and more readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawing of which:
A description will be given below of an audio coding device according to an embodiment. First, with reference to
In
In
The audio coding device of the related art compares, for example, the moving accumulated value of the power of the signal of each channel with a certain threshold value, and determines that a transient has occurred at a time at which the moving accumulated value becomes greater than the certain threshold value. For example, when a threshold value Th is a value indicated by a dotted line 113 in
In
Accordingly, the audio coding device disclosed in the present specification determines whether or not the transients detected in the respective channels are caused by the same sound on the basis of the difference between the transient detection times of the plurality of channels and the power of the signal at the detection time of the transient. When the transients detected in the channels have been caused by the same sound, the audio coding device unifies the start times of the grids for SBR coding in all the channels to the earliest of the transient detection times of the plurality of channels.
In the present embodiment, an audio signal to be coded is a stereo audio signal having a channel on the left side and a channel on the right side.
These units included in the audio coding device 1 are formed as individually separate circuits. Alternatively, these units may be mounted on the audio coding device 1 as one integrated circuit in which circuits corresponding to the units are integrated. In addition, these units may be function modules implemented by a computer program executed on a processor included in the audio coding device 1.
The down-sampling unit 11 obtains the low frequency components of each channel of the input audio signal, which is coded by the AAC coder 12. The frequency of the upper limit of the low frequency components is set to, for example, ½ of the highest frequency of the input audio signal. The down-sampling unit 11 performs filtering on a signal of the time domain of each channel by using a low-pass filter. Such a low-pass filter may be made to be a finite or infinite impulse response digital filter. The down-sampling unit 11 filters a signal of the time domain of each channel by using, for example, an infinite impulse response filter of the following equation, which is indicated in the HE-AAC encoder standard (TS26.410) disclosed by the standardization project 3GPP.
where ak and bk (k=1, 2, . . . , 13) are filter coefficients. For the values of ak and bk, for example, the values given in TS26.410 are used. z−k denotes a delay of k samples; that is, it refers to the signal that was input to this filter k samples earlier.
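The filtering and down-sampling step can be illustrated with a minimal Python sketch. It assumes the common rational form H(z) = B(z)/A(z), with the denominator normalized so that a0 = 1, and assumes that the filtered signal is decimated by a factor of 2 (consistent with a cutoff at ½ of the highest frequency). The function names are hypothetical, and the coefficients used below are toy values, not the TS26.410 coefficients.

```python
from typing import List, Sequence


def iir_filter(x: Sequence[float], b: Sequence[float], a: Sequence[float]) -> List[float]:
    """Direct-form I IIR filter for H(z) = B(z)/A(z) with a[0] == 1 (assumed form).

    y[n] = sum_k b[k] * x[n - k] - sum_{k>=1} a[k] * y[n - k]
    """
    if a[0] != 1.0:
        raise ValueError("a[0] must be 1 (normalized denominator)")
    y: List[float] = []
    for n in range(len(x)):
        acc = 0.0
        # Feed-forward part: delayed inputs (z^-k applied to x).
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]
        # Feedback part: delayed outputs (z^-k applied to y), k >= 1.
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y


def downsample_by_two(x: Sequence[float], b: Sequence[float], a: Sequence[float]) -> List[float]:
    """Low-pass filter, then keep every second sample (hypothetical factor of 2)."""
    return iir_filter(x, b, a)[::2]
```

For example, the toy two-tap filter b = [0.5, 0.5], a = [1.0] maps the alternating (Nyquist-frequency) input [1, −1, 1, −1] to [0.5, 0, 0, 0], showing the low-pass behavior; a real encoder would substitute the 13-coefficient filter from TS26.410.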
Furthermore, the down-sampling unit 11 may perform a time frequency transform on the signal of each channel, for example, for each frame, and apply a low-pass filter to the resulting frequency signal, thereby extracting the low frequency components of the signal of each channel. In this case, the down-sampling unit 11 may use, as the time frequency transform, for example, a fast Fourier transform, a discrete cosine transform, or a modified discrete cosine transform. The down-sampling unit 11 outputs the extracted low frequency components of the signal of each channel to the AAC coder 12.
The AAC coder 12 codes the low frequency components of the signal of each channel, which are received from the down-sampling unit 11, in accordance with the AAC coding method. The AAC coder 12 may use the technology disclosed in, for example, Japanese Laid-open Patent Publication No. 2007-183528. Specifically, the AAC coder 12 calculates a perceptual entropy (PE) value. The PE value becomes large for sound whose signal level changes within a short time, such as an attack sound emitted by a percussion instrument. Accordingly, the AAC coder 12 shortens the window set along the time axis for a frame whose PE value is comparatively large, and lengthens the window for a frame whose PE value is comparatively small. For example, the short window contains 256 samples, and the long window contains 2048 samples. The AAC coder 12 performs a modified discrete cosine transform (MDCT) on the low frequency components of the signal of each channel by using a window of the determined length, thereby converting them into a set of MDCT coefficients. The AAC coder 12 quantizes the set of MDCT coefficients at a certain quantization width, and codes the set of quantized MDCT coefficients and the quantization coefficient used to determine the quantization width in accordance with a variable length coding method, such as arithmetic coding or Huffman coding.
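The PE-driven window decision described above can be sketched as follows. This is a minimal illustration only: it assumes a single fixed PE threshold, which is a hypothetical parameter (the text only says "comparatively large" or "comparatively small"), and it does not compute PE itself.

```python
from typing import List, Sequence

SHORT_WINDOW = 256   # samples (example value from the text)
LONG_WINDOW = 2048   # samples (example value from the text)


def select_window_length(pe: float, pe_threshold: float) -> int:
    # Comparatively large PE (e.g. an attack sound) -> short window for finer time resolution.
    return SHORT_WINDOW if pe > pe_threshold else LONG_WINDOW


def select_windows(pe_per_frame: Sequence[float], pe_threshold: float) -> List[int]:
    # One window length per frame; pe_threshold is a hypothetical tuning parameter.
    return [select_window_length(pe, pe_threshold) for pe in pe_per_frame]
```

For example, with a hypothetical threshold of 1000, frames with PE values 300, 1800, and 500 would receive the long, short, and long window, respectively.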
The AAC coder 12 outputs the set of variable-length-coded MDCT coefficients and the quantization coefficient to the bit stream generation unit 14.
The SBR coder 13 codes high-frequency components of the signal for each channel in accordance with a Spectral Band Replication (SBR) coding method. The high-frequency components are components within the signal of each channel, from which low frequency components that are coded by the AAC coder 12 are excluded.
The SBR coder 13 includes a time frequency transform unit 21, a grid generation unit 22, a grid power calculation unit 23, a power quantization unit 24, an auxiliary information calculation unit 25, an auxiliary information quantization unit 26, and a multiplexing unit 27.
The time frequency transform unit 21 converts the signal of the time domain of each channel of an audio signal, which is input to the audio coding device 1, into a time frequency signal.
In the present embodiment, the time frequency transform unit 21 uses a quadrature mirror filter (QMF) filter bank in order to obtain a time frequency signal. The QMF filter bank is represented as in the following equation
where k is a variable indicating the frequency band and, in this example, denotes the k-th frequency band when the entire frequency band is equally divided into 64 bands. n is the index of the 128 sampling points that are input to the filter bank.
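Because the QMF equation itself is not reproduced here, the following Python sketch is only a simplified stand-in under stated assumptions. It assumes the complex modulation kernel exp(jπ/128·(k+0.5)(2n+1)) for 0 ≤ k < 64 and 0 ≤ n < 128, applies it directly to each 128-sample block without a prototype low-pass window, and uses a hypothetical hop of 64 samples between blocks. It illustrates how 64 complex band values are obtained per time slot; it is not a complete SBR analysis filter bank.

```python
import cmath
import math
from typing import List, Sequence

NUM_BANDS = 64   # k = 0..63: frequency bands, as in the text
BLOCK_LEN = 128  # n = 0..127: input sampling points, as in the text
HOP = 64         # hypothetical hop size between successive blocks


def kernel(k: int, n: int) -> complex:
    # Assumed modulation kernel exp(j*pi/128*(k + 0.5)*(2n + 1)).
    return cmath.exp(1j * math.pi / BLOCK_LEN * (k + 0.5) * (2 * n + 1))


def analyze_block(block: Sequence[float]) -> List[complex]:
    """Project one 128-sample block onto the 64 modulated kernels (no prototype window)."""
    if len(block) != BLOCK_LEN:
        raise ValueError("block must contain exactly BLOCK_LEN samples")
    return [sum(x * kernel(k, n) for n, x in enumerate(block)) for k in range(NUM_BANDS)]


def time_frequency_signal(signal: Sequence[float]) -> List[List[complex]]:
    """Return tf[slot][k]: one time slot of 64 complex band values per HOP samples."""
    slots = []
    for start in range(0, len(signal) - BLOCK_LEN + 1, HOP):
        slots.append(analyze_block(signal[start:start + BLOCK_LEN]))
    return slots
```

With this kernel, band k is centered at the normalized angular frequency (k+0.5)π/64, so a cosine at 10.5π/64 rad/sample concentrates its energy in band 10.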
The time frequency transform unit 21 may calculate the time frequency signal of each channel by performing another time frequency transform process, such as a wavelet transform or a fast Fourier transform, for each section of a certain length.
Each time the time frequency transform unit 21 calculates the time frequency signal of each channel, the time frequency transform unit 21 outputs the time frequency signal to the grid generation unit 22, the grid power calculation unit 23, and the auxiliary information calculation unit 25.
The grid generation unit 22 sets a grid for each channel. For this purpose, the grid generation unit 22 includes a power calculation unit 31, a transient detection unit 32, a transient time correction unit 33, and a grid determination unit 34.
The power calculation unit 31 calculates power at each time with respect to each channel, that is, power for each sampling point in the time axis of the time frequency signal. For example, the power calculation unit 31 calculates power in accordance with the following equation.
where L(k, n) denotes the time frequency signal of the n-th sampling point in the frequency band k of the left side channel, and R(k, n) denotes the time frequency signal of the n-th sampling point in the frequency band k of the right side channel. PL(n) and PR(n) denote the powers of the n-th sampling points of the left side channel and the right side channel, respectively.
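The power calculation can be sketched in Python as follows, assuming (since the equation is not reproduced here) that the power of a sampling point is the sum of the squared magnitudes of the time frequency signal over all frequency bands, PL(n) = Σk |L(k, n)|², and likewise for PR(n). The data layout tf[k][n] and the function name are assumptions for illustration.

```python
from typing import List, Sequence


def power_per_sampling_point(tf: Sequence[Sequence[complex]]) -> List[float]:
    """Assumed formula: P(n) = sum over k of |X(k, n)|^2, with tf[k][n] = X(k, n)."""
    num_points = len(tf[0])
    return [sum(abs(band[n]) ** 2 for band in tf) for n in range(num_points)]


# Usage with toy 2-band, 2-point signals for each channel (hypothetical values).
L = [[1 + 1j, 0j], [2 + 0j, 3j]]
R = [[0j, 1 + 0j], [0j, 0j]]
PL = power_per_sampling_point(L)  # approximately [6.0, 9.0]
PR = power_per_sampling_point(R)  # [0.0, 1.0]
```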
The power calculation unit 31 outputs power PL(n) and PR(n) for each sampling point with respect to each channel to the transient detection unit 32 and the transient time correction unit 33.
The transient detection unit 32 detects a transient for each channel. For this purpose, the transient detection unit 32 calculates, for each channel, the moving accumulated value of the power over a section containing a plurality of sampling points that are consecutive along the time axis. For example, for each of the left side channel and the right side channel, the transient detection unit 32 uses the sum of the powers of three consecutive sampling points as the moving accumulated value.
The transient detection unit 32 compares the moving accumulated value with the detection threshold value Th for each channel. When the moving accumulated value at the current sampling point is greater than the detection threshold value Th and the moving accumulated value at the immediately previous sampling point is smaller than or equal to the detection threshold value Th, the transient detection unit 32 detects the current sampling point as a transient. The detection threshold value Th is determined in advance experimentally on the basis of, for example, the difference between the powers before and after a transient. When the difference between the powers before and after the transient is −30 dBov and the moving accumulated value is the sum of the powers of three consecutive sampling points, the detection threshold value Th may be set to −10 dBov.
Because the moving accumulated value is used to detect a transient, the transient detection unit 32 can prevent an isolated sampling point from being erroneously detected as a transient even if the power at that sampling point becomes very large because noise is superposed on the audio signal.
The transient detection unit 32 sets time t of interest to the first time ‘1’ in the frame (operation S101). Next, the transient detection unit 32 calculates the moving accumulated value ΣP from time (t−m) to time t (operation S102), where m determines the length of the section over which the moving accumulated value is calculated (m+1 sampling points). For example, when the moving accumulated value ΣP is calculated from three sampling points that are consecutive in the time direction, m=2. Furthermore, when (t−j) (j=1, 2, . . . , m) is smaller than or equal to 0, the power at time (N+t−j) of the previous frame (N is the total number of sampling points along the time axis contained in one frame) is used to calculate the moving accumulated value ΣP.
The transient detection unit 32 determines whether or not the moving accumulated value ΣP is greater than the detection threshold value Th (operation S103). When the moving accumulated value ΣP is greater than the detection threshold value Th (operation S103—Yes), the transient detection unit 32 detects a transient (operation S104). Then, the transient detection unit 32 notifies the transient time correction unit 33 that time t is a transient detection time.
On the other hand, when the moving accumulated value ΣP is smaller than or equal to the detection threshold value Th (operation S103—No), or after operation S104, the transient detection unit 32 determines whether or not time t of interest is greater than or equal to N, the total number of sampling points along the time axis in one frame (operation S105). When t is smaller than N (operation S105—No), the transient detection unit 32 increments time t by 1 (operation S106) and repeats the processing from operation S102 onward. On the other hand, when t is greater than or equal to N (operation S105—Yes), the transient detection unit 32 ends the transient detection process.
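The detection loop can be sketched in Python as a minimal illustration. The sketch assumes the crossing rule stated earlier for the transient detection unit 32 (the moving accumulated value exceeds Th at the current sampling point and did not exceed it at the previous one), and it takes the missing leading powers from the end of the previous frame, treating them as 0 when no previous frame exists. The function and parameter names are hypothetical.

```python
from typing import List, Sequence


def detect_transients(power: Sequence[float], prev_tail: Sequence[float],
                      th: float, m: int = 2) -> List[int]:
    """Return the 1-based detection times t (1..N) of transients in one frame.

    power:     P(1), ..., P(N) of the current frame, stored as power[t - 1].
    prev_tail: powers at the end of the previous frame (oldest first); only the
               last m + 1 values are used, and missing values are taken as 0.
    th:        detection threshold value Th for the moving accumulated value.
    m:         the section spans m + 1 consecutive sampling points (m = 2 -> 3 points).
    """
    history = ([0.0] * (m + 1) + list(prev_tail))[-(m + 1):]
    ext = history + list(power)  # ext[i] holds the power at time t = i - m

    def accumulated(t: int) -> float:
        # Moving accumulated value: sum of P(t - m), ..., P(t).
        return sum(ext[t:t + m + 1])

    detections: List[int] = []
    for t in range(1, len(power) + 1):
        # Crossing rule: above Th now, at or below Th at the previous sampling point.
        if accumulated(t) > th and accumulated(t - 1) <= th:
            detections.append(t)
    return detections
```

A single noisy spike whose power stays below Th yields no detection, whereas a sustained rise is reported once, at the first sampling point where the three-point sum crosses Th.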
The transient detection unit 32 may calculate the moving average of the powers instead of the moving accumulated value of the powers. In this case, the detection threshold value may be set to the detection threshold value for the moving accumulated value divided by the number of sampling points contained in the section used to calculate one moving average value. Both the moving accumulated value and the moving average of the powers are examples of statistical values of the powers.
Each time a transient is detected with respect to each channel, the transient detection unit 32 notifies the transient time correction unit 33 of the detection time (that is, the number of the sampling point detected as a transient) of the transient.
As described above, even when the transients occurring in the respective channels are caused by the same sound, for example, an attack sound emitted from one sound source, the transient detection times may differ between the channels. In such a case, a pre-echo may occur in the channel in which the transient is detected later. Accordingly, the transient time correction unit 33 determines whether or not the difference between the transient detection times of the channels is within a range in which the transients may be regarded as caused by the same sound. When the difference is within that range, the transient time correction unit 33 corrects the transient detection time of the channel in which the transient was detected later so that it coincides with the transient detection time of the other channel. For this purpose, the transient time correction unit 33 temporarily stores, in an incorporated memory, the transient detection time of each channel notified by the transient detection unit 32 and the power at each time (that is, at each sampling point along the time axis) received from the power calculation unit 31.
Referring to
In
As illustrated in
On the other hand, as illustrated in
The transient time correction unit 33 determines whether or not notification of the transient detection time has been given with respect to any of the channels from the transient detection unit 32 (operation S201). If notification of the transient detection time has not been given (operation S201—No), the transient time correction unit 33 repeats the process of operation S201.
On the other hand, when notification of a transient detection time is given for one of the channels (operation S201—Yes), the transient time correction unit 33 temporarily stores the transient detection time and the channel in a memory included in the transient time correction unit 33. If the transient detection time of the other channel has already been stored in the memory, the transient time correction unit 33 calculates the absolute value ΔTR of the difference between the transient detection times of the two channels (operation S202). For the sake of convenience, the channel whose transient detection time was notified in operation S201 will be referred to as the late detection channel, and the channel in which a transient was detected earlier than in the late detection channel will be referred to as the early detection channel. Then, the transient time correction unit 33 determines whether or not the absolute value ΔTR of the difference is smaller than or equal to a certain threshold value Thd (operation S203). The threshold value Thd is set to, for example, the maximum difference between the transient detection times of the channels that can occur when the transients are caused by the same sound. For example, when the transient detection unit 32 calculates the moving accumulated value of the powers over a section containing three consecutive sampling points, the threshold value Thd is set to a value corresponding to the time length of that section.
When the absolute value ΔTR of the difference between the transient detection times of the two channels is greater than the certain threshold value Thd, or when no transient has been detected in the other channel (operation S203—No), the transient time correction unit 33 does not correct the transient detection time. Then, the transient time correction unit 33 notifies the grid determination unit 34 of the transient detection time of each channel. Furthermore, the transient time correction unit 33 deletes, from the memory, the powers of the sampling points of the respective channels at and before the transient detection time of the early detection channel. After that, the transient time correction unit 33 ends the transient detection time correction process.
On the other hand, when the absolute value ΔTR of the difference between the transient detection times is smaller than or equal to the certain threshold value Thd (operation S203—Yes), the transient time correction unit 33 determines whether or not the power Ptrp of the late detection channel at the transient detection time of the early detection channel is greater than the threshold value Thp (operation S204). The threshold value Thp is a value corresponding to the power of the transient sound, and is set to, for example, a value such that the threshold value Th for detecting a transient is divided by the number of sampling points contained in the section for which the moving accumulated value is to be calculated.
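As a worked example that ties this rule to the example values given earlier (Th = −10 dBov for a three-point moving accumulated value), Thp can be computed as shown below. The sketch assumes that 0 dBov corresponds to a linear power of 1.0; the helper names are hypothetical.

```python
import math

SECTION_LEN = 3  # sampling points per moving accumulated value (m = 2)


def dbov_to_power(db: float) -> float:
    # Assumes 0 dBov corresponds to a linear power of 1.0 (full-scale reference).
    return 10.0 ** (db / 10.0)


def power_to_dbov(p: float) -> float:
    return 10.0 * math.log10(p)


th = dbov_to_power(-10.0)            # Th = -10 dBov (example value from the text)
thp = th / SECTION_LEN               # Thp = Th / number of points in the section
print(round(power_to_dbov(thp), 2))  # -14.77
```

Dividing by three lowers the threshold by 10·log10(3) ≈ 4.77 dB, so Thp is about −14.77 dBov in this example.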
When the power Ptrp of the late detection channel at the transient detection time of the early detection channel is smaller than or equal to the threshold value Thp (operation S204—No), the transient time correction unit 33 does not correct the transient detection time. Then, the transient time correction unit 33 notifies the grid determination unit 34 of the transient detection time of each channel. Furthermore, the transient time correction unit 33 deletes, from the memory, the powers of the sampling points of each channel at and before the transient detection time of the early detection channel. After that, the transient time correction unit 33 ends the transient detection time correction process.
On the other hand, when the power Ptrp of the late detection channel at the transient detection time of the early detection channel is greater than the threshold value Thp (operation S204—Yes), the transient time correction unit 33 makes a correction so that the transient detection time of the late detection channel coincides with the transient detection time of the early detection channel (operation S205). The transient time correction unit 33 then notifies the grid determination unit 34 of the transient detection time of each channel and deletes the transient detection times of the early detection channel and the late detection channel from the memory. Furthermore, the transient time correction unit 33 deletes, from the memory, the powers of the sampling points of each channel at times earlier than the transient detection time of the late detection channel, which was notified in operation S201. After that, the transient time correction unit 33 ends the transient detection time correction process.
If, after a transient detection time has been notified for one of the channels, no transient detection time is notified for the other channel before a period corresponding to the threshold value Thd has elapsed, the transient time correction unit 33 determines that a transient has occurred in only the one channel. Then, the transient time correction unit 33 notifies the grid determination unit 34 of the transient detection time of the one channel and deletes, from the memory, the powers of the sampling points of each channel at and before the transient detection time notified for the one channel.
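The decision in operations S202 to S205 can be sketched in Python as follows. This minimal sketch works on detection times that have already been collected for one event rather than on the incremental, memory-based notifications described above. Following the multi-channel formulation of the embodiment, it aligns every late detection channel to the earliest detection time when ΔTR ≤ Thd and the late channel's power at the early detection time exceeds Thp. The function name and the dictionary layout are hypothetical.

```python
from typing import Dict, Optional, Sequence


def correct_transient_times(det_times: Dict[str, Optional[int]],
                            powers: Dict[str, Sequence[float]],
                            thd: int, thp: float) -> Dict[str, Optional[int]]:
    """det_times: channel -> transient detection time (sampling point) or None.
    powers:    channel -> per-sampling-point power, indexed like det_times.
    """
    detected = {ch: t for ch, t in det_times.items() if t is not None}
    corrected = dict(det_times)
    if len(detected) < 2:
        return corrected  # transient in at most one channel: nothing to align
    t_early = min(detected.values())  # detection time of the early detection channel
    for ch, t in detected.items():
        if t == t_early:
            continue
        delta_tr = t - t_early  # ΔTR (non-negative because t_early is the minimum)
        # S203: same sound? S204: is the late channel already loud at t_early?
        if delta_tr <= thd and powers[ch][t_early] > thp:
            corrected[ch] = t_early  # S205: align the late detection channel
    return corrected
```

For example, with Thd = 3 and Thp = 1/30, detections at 10 (left) and 12 (right) are unified to 10 only if the right channel's power at sampling point 10 exceeds 1/30.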
The grid determination unit 34 determines, for each frame, a grid for the high-frequency components, which are coded by the SBR coder 13, and a grid for the low frequency components, which are coded by the AAC coder 12. In the present embodiment, the grids are set so that the period of the grid of the high-frequency components and the period of the grid of the low frequency components are always the same. The grid determination unit 34 sets a preset grid for a non-transient sound in each section of the frame of interest in which no transient has been detected. The time length of the grid for a non-transient sound is, for example, about 50 msec.
Furthermore, when a transient has been detected in the frame of interest, the grid determination unit 34 sets the transient detection time as the boundary between two grids that are consecutive along the time axis, and sets a grid for a transient sound whose start time is the transient detection time. The time length of the grid for a transient sound is shorter than that of the grid for a non-transient sound; for example, the grid determination unit 34 sets it to about 5 msec to about 20 msec. The type of the grid immediately before the transient detection time depends on whether or not another transient has been detected shortly before that detection time. For example, if another transient has been detected within a certain period before the detection time of the transient of interest, the grid immediately before that detection time is also a grid for a transient sound. The certain period is equal to, for example, the time length of the grid for a transient sound. On the other hand, if no other transient has been detected within the certain period immediately before the detection time of the transient of interest, the grid immediately before that detection time is a grid for a non-transient sound.
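The grid rules above can be sketched in Python for one channel and one frame. This is a simplified illustration under stated assumptions: time is measured in abstract units (for example, sampling points), grids are half-open intervals, sections without transients are filled with non-transient grids of at most the preset length, and the "certain period" equals the transient grid length. Real SBR framing constraints are not modeled, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Grid = Tuple[int, int, str]  # (start, end_exclusive, kind)


def _fill_non_transient(start: int, end: int, long_len: int, grids: List[Grid]) -> None:
    # Cover [start, end) with grids for a non-transient sound of at most long_len.
    pos = start
    while pos < end:
        nxt = min(pos + long_len, end)
        grids.append((pos, nxt, "non-transient"))
        pos = nxt


def determine_grids(frame_len: int, transients: Sequence[int],
                    long_len: int, short_len: int) -> List[Grid]:
    grids: List[Grid] = []
    pos = 0
    for t in sorted(set(transients)):
        if not 0 <= t < frame_len:
            continue
        if grids and grids[-1][2] == "transient" and grids[-1][1] > t:
            # Another transient lies within short_len before t: the grid just
            # before t stays a transient grid but is cut at t.
            start, _, kind = grids[-1]
            grids[-1] = (start, t, kind)
            pos = t
        _fill_non_transient(pos, t, long_len, grids)
        end = min(t + short_len, frame_len)
        grids.append((t, end, "transient"))  # the transient grid starts at t
        pos = end
    _fill_non_transient(pos, frame_len, long_len, grids)
    return grids
```

Because the corrected detection times of the left and right channels coincide, calling this sketch with the same transient list for both channels yields transient grids that start at the same time in both.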
The grid is set for each channel. However, when the transient detection time of one of the channels has been corrected by the transient time correction unit 33, the transient detection times of the left and right channels coincide with each other. Consequently, the grid for a transient sound starts from the same transient detection time in both channels.
The grid determination unit 34 notifies the grid power calculation unit 23, the auxiliary information calculation unit 25, and the multiplexing unit 27 of grid information indicating the period and the start time of each grid for the high-frequency components and the low frequency components of each channel.
The grid power calculation unit 23 calculates the power for each grid with respect to each channel. For example, as illustrated in
where L(k, n) is the time frequency signal of the n-th sampling point in the frequency band k of the left side channel, and R(k, n) is the time frequency signal of the n-th sampling point in the frequency band k of the right side channel. tgs and tge are the first sampling point corresponding to the start time of the grid, and the last sampling point corresponding to the end time of the grid, respectively. fs is the sampling point in the frequency direction corresponding to the lowest frequency of the high-frequency components to be coded by the SBR coder 13. PgLl(n) and PgLh(n) are the powers of the low frequency components and the high-frequency components of the left side channel, respectively. Similarly, PgRl(n) and PgRh(n) are the powers of the low frequency components and the high-frequency components of the right side channel, respectively.
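Because the grid power equation is not reproduced here, the following Python sketch assumes the straightforward form suggested by the variable definitions: the low-band power sums |L(k, n)|² over bands k < fs and the high-band power sums it over bands k ≥ fs, in both cases over sampling points n = tgs, ..., tge inclusive (and likewise for R(k, n)). The layout tf[k][n] and the function name are assumptions.

```python
from typing import Sequence, Tuple


def grid_powers(tf: Sequence[Sequence[complex]], tgs: int, tge: int,
                fs: int) -> Tuple[float, float]:
    """Return (low, high) power of one grid for one channel (assumed formula).

    tf[k][n]: time frequency signal of band k at sampling point n.
    tgs, tge: first and last sampling points of the grid (inclusive).
    fs:       first band of the high-frequency components coded by SBR.
    """
    times = range(tgs, tge + 1)
    low = sum(abs(tf[k][n]) ** 2 for k in range(fs) for n in times)
    high = sum(abs(tf[k][n]) ** 2 for k in range(fs, len(tf)) for n in times)
    return low, high
```

Applied to L(k, n), this yields PgLl and PgLh for the grid; applied to R(k, n), it yields PgRl and PgRh.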
The grid power calculation unit 23 outputs the powers PgLl(n), PgLh(n), PgRl(n), and PgRh(n) for each grid with respect to each channel to the power quantization unit 24 and the auxiliary information calculation unit 25.
The power quantization unit 24 quantizes the powers PgLl(n) and PgRl(n) of the grids of the low frequency components, which are received from the grid power calculation unit 23, by using, for example, a quantization coefficient determined according to a target code amount, which in turn is determined in accordance with the transmission bit rate. The power quantization unit 24 sets, for example, a quantization width that becomes wider as the quantization coefficient increases, and quantizes the power of each grid at that quantization width. Then, the power quantization unit 24 outputs the quantized power for each grid to the multiplexing unit 27.
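The idea of a quantization width that widens as the quantization coefficient increases can be sketched as uniform quantization in Python. The mapping from coefficient to width used here (the width doubles every 4 steps) is a hypothetical choice for illustration only; the text does not specify the mapping. The same scheme would apply to the auxiliary information quantization unit 26.

```python
def quantization_width(coef: int) -> float:
    # Hypothetical mapping: the width doubles every 4 steps of the coefficient,
    # so that it widens monotonically as the coefficient increases.
    return 2.0 ** (coef / 4.0)


def quantize(value: float, coef: int) -> int:
    # Uniform quantization: index of the nearest multiple of the width.
    return round(value / quantization_width(coef))


def dequantize(index: int, coef: int) -> float:
    return index * quantization_width(coef)
```

A larger coefficient gives a coarser width and therefore fewer distinct indices to code: a power of 10.3 maps to index 10 at coefficient 0 (width 1) but to index 5 at coefficient 4 (width 2), which is reconstructed as 10.0.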
The auxiliary information calculation unit 25 calculates auxiliary information used to reproduce the high-frequency components from the low frequency components on the basis of the powers of the grids of the low frequency components and the high-frequency components of each channel, and of the time frequency signal. The auxiliary information contains, for example, for each frequency band and each time period contained in the grid of the high-frequency components, position information indicating the frequency band and the time period of the low frequency components from which reproduction is made, and an electric power adjustment parameter for adjusting the electric power of the high-frequency components. In addition, the auxiliary information contains information indicating the frequency bands and time periods of the high-frequency components that are difficult to reproduce from the low frequency components, and information indicating the power of those frequency bands and time periods.
As disclosed in, for example, Japanese Laid-open Patent Publication No. 2008-224902, the auxiliary information calculation unit 25 calculates the auxiliary information in accordance with the SBR coding method. For example, for the grid of interest of the high-frequency components of each channel, the auxiliary information calculation unit 25 compares the time frequency signal of each frequency band and time period within the grid with the time frequency signal in the grid of the low frequency components that is set in the same period as the grid of interest. Then, on the basis of the comparison result, the auxiliary information calculation unit 25 determines the position information from the frequency band and the time period of the low frequency components that are strongly correlated with the frequency band and the time period of the high-frequency components. Furthermore, the auxiliary information calculation unit 25 obtains the frequency bands and time periods that are difficult to reproduce from the low frequency components. In addition, the auxiliary information calculation unit 25 obtains the ratio of the power of the grid of interest of the high-frequency components of each channel to the power of the grid of the low frequency components from which reproduction is made, and calculates the electric power adjustment parameter in accordance with the ratio.
The auxiliary information calculation unit 25 outputs the auxiliary information to the auxiliary information quantization unit 26.
The auxiliary information quantization unit 26 quantizes the auxiliary information by using the quantization coefficient that is determined according to the target code amount that is determined in accordance with the transmission bit rate. By setting, for example, the quantization width that becomes wider as the quantization coefficient increases, the auxiliary information quantization unit 26 quantizes the auxiliary information at the quantization width. Then, the auxiliary information quantization unit 26 outputs the quantized auxiliary information to the multiplexing unit 27.
The multiplexing unit 27 codes the grid information, the quantized power of each grid, and the quantized auxiliary information in accordance with a variable length coding method, such as arithmetic coding or Huffman coding. The multiplexing unit 27 then multiplexes these pieces of variable-length-coded information by arranging them in accordance with a certain data output format. The multiplexed data is referred to as SBR data. The certain data output format is, for example, the MPEG-4 ADTS (Audio Data Transport Stream) format described later; in that case, the variable-length-coded information is arranged according to the SBR data layout specified in MPEG-4 ADTS. The multiplexing unit 27 outputs the SBR data to the bit stream generation unit 14.
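Huffman coding, one of the variable length coding methods named above, assigns shorter codewords to more frequent symbols. The sketch below builds a code table from symbol counts with Python's `heapq`. It illustrates the technique only: MPEG-4 SBR uses predefined Huffman tables rather than tables built per frame, and the function names here are hypothetical.

```python
import heapq
from collections import Counter
from typing import Dict, Hashable, Sequence

def build_huffman_code(symbols: Sequence[Hashable]) -> Dict[Hashable, str]:
    """Build a prefix-free code table from the symbol counts in `symbols`."""
    counts = Counter(symbols)
    if not counts:
        return {}
    if len(counts) == 1:
        # A single distinct symbol still needs a one-bit codeword.
        return {next(iter(counts)): "0"}
    # Heap entries: (count, tie_breaker, {symbol: partial codeword}).
    # The integer tie breaker keeps heapq from ever comparing the dicts.
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        n0, _, codes0 = heapq.heappop(heap)  # least frequent subtree -> bit 0
        n1, _, codes1 = heapq.heappop(heap)  # next least frequent -> bit 1
        merged = {sym: "0" + code for sym, code in codes0.items()}
        merged.update({sym: "1" + code for sym, code in codes1.items()})
        heapq.heappush(heap, (n0 + n1, next_id, merged))
        next_id += 1
    return heap[0][2]

def huffman_encode(symbols: Sequence[Hashable],
                   table: Dict[Hashable, str]) -> str:
    """Concatenate the codewords of `symbols` into a bit string."""
    return "".join(table[sym] for sym in symbols)

quantized = [0, 0, 0, 0, 1, 1, 2]  # e.g. quantized grid powers
table = build_huffman_code(quantized)
bits = huffman_encode(quantized, table)
```

The most frequent value, 0, receives a one-bit codeword, so the seven symbols are coded in ten bits instead of the fourteen a fixed two-bit code would need.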
The bit stream generation unit 14 multiplexes the AAC data received from the AAC coder 12 and the SBR data received from the SBR coder 13 by arranging them in accordance with a certain order. Then, the bit stream generation unit 14 outputs the bit stream that is generated as a result of the multiplexing.
The down-sampling unit 11 extracts low frequency components by down-sampling the signal of each channel (operation S301). The down-sampling unit 11 outputs the low frequency components of each channel to the AAC coder 12. The AAC coder 12 codes the low frequency components of each channel in accordance with the AAC coding method (operation S302). Then, the AAC coder 12 outputs the AAC data obtained as a result of the coding to the bit stream generation unit 14.
Additionally, the signal of each channel of the audio signal is also input to the SBR coder 13. Then, the time frequency transform unit 21 of the SBR coder 13 performs a time frequency transform on the signal of the time domain of each channel (operation S303). The time frequency transform unit 21 outputs the time frequency signal of each channel, which is obtained as a result of the time frequency transform, to the grid generation unit 22, the grid power calculation unit 23, and the auxiliary information calculation unit 25.
The power calculation unit 31 of the grid generation unit 22 calculates power at each time with respect to each channel (operation S304). Then, the power calculation unit 31 outputs the power of each channel at each time to the transient detection unit 32 and the transient time correction unit 33 of the grid generation unit 22. The transient detection unit 32 performs a transient detection process for each channel (operation S305). When the transient detection unit 32 detects a transient, the transient detection unit 32 notifies the transient time correction unit 33 of the transient detection time.
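One plausible way to compute the power at each time is to sum the squared magnitudes of the time frequency signal over all frequency bands in each time slot. The indexing convention `tf_signal[t][k]` (time slot t, band k) and the function name are assumptions; the text does not give the exact formula used by the power calculation unit 31.

```python
from typing import List, Sequence

def power_per_time(tf_signal: Sequence[Sequence[complex]]) -> List[float]:
    """Power of one channel at each time slot: sum of |X(t, k)|^2 over bands k."""
    return [sum(x.real ** 2 + x.imag ** 2 for x in slot) for slot in tf_signal]

powers = power_per_time([[1 + 1j, 0j], [2 + 0j, 1j]])  # [2.0, 5.0]
```

Computing `x.real ** 2 + x.imag ** 2` instead of `abs(x) ** 2` avoids a square root followed by squaring, which also keeps the example values exact.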
The transient time correction unit 33 performs a transient detection time correction process (operation S306). When the transient time correction unit 33 has corrected the transient detection time with respect to any of the channels, the transient time correction unit 33 notifies the grid determination unit 34 of the grid generation unit 22 of the transient detection time after the correction. Furthermore, with respect to the channel in which the transient detection time has not been corrected, the transient time correction unit 33 notifies the grid determination unit 34 of the transient detection time that has been detected by the transient detection unit 32.
The grid determination unit 34 determines the grid of each channel (operation S307). In doing so, the grid determination unit 34 sets a grid for a non-transient sound in any section of the frame in which no transient has been detected. On the other hand, if a transient has been detected, the grid determination unit 34 sets a grid for a transient sound, which is shorter than the grid for a non-transient sound, starting at the transient detection time. The grid determination unit 34 notifies the grid power calculation unit 23, the auxiliary information calculation unit 25, and the multiplexing unit 27 of grid information indicating the set grid.
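The grid layout rule can be sketched as follows, assuming a frame of `num_slots` time slots, at most one transient per frame, and a hypothetical transient grid length of two slots. Grids are half-open slot ranges; the sketch does not reproduce the exact envelope signaling of MPEG-4 SBR.

```python
from typing import List, Optional, Tuple

Grid = Tuple[int, int]  # (start_slot, end_slot), end exclusive

def determine_grids(num_slots: int, transient_slot: Optional[int],
                    transient_len: int = 2) -> List[Grid]:
    """Split one frame into grids, using a short grid at the transient."""
    if transient_slot is None:
        return [(0, num_slots)]  # one grid for a non-transient sound
    if not 0 <= transient_slot < num_slots:
        raise ValueError("transient_slot must lie inside the frame")
    grids: List[Grid] = []
    if transient_slot > 0:
        grids.append((0, transient_slot))  # non-transient section before it
    end = min(transient_slot + transient_len, num_slots)
    grids.append((transient_slot, end))  # short grid starting at the transient
    if end < num_slots:
        grids.append((end, num_slots))  # remaining non-transient section
    return grids
```

For example, `determine_grids(16, 5)` yields `[(0, 5), (5, 7), (7, 16)]`, so the short transient grid begins exactly at the (possibly corrected) detection time.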
When notified of the grid information, the grid power calculation unit 23 calculates the power of each grid, and the power quantization unit 24 quantizes the power of each grid (operation S308). Then, the power quantization unit 24 outputs the quantized power of each grid to the multiplexing unit 27. Furthermore, when notified of the grid information, the auxiliary information calculation unit 25 calculates the auxiliary information, and the auxiliary information quantization unit 26 quantizes it (operation S309). Then, the auxiliary information quantization unit 26 outputs the quantized auxiliary information to the multiplexing unit 27. The multiplexing unit 27 multiplexes the grid information, the quantized power of each grid, and the quantized auxiliary information to generate SBR data (operation S310). Then, the multiplexing unit 27 outputs the SBR data to the bit stream generation unit 14.
The bit stream generation unit 14 multiplexes the SBR data and the AAC data, and thereby generates a bit stream in which the coded audio data is stored (operation S311). After that, the audio coding device 1 ends the coding process.
The processing of operations S301 and S302 and the processing of operations S303 to S310 may be performed in parallel.
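Because the AAC branch (S301 and S302) and the SBR branch (S303 to S310) share only the input signal, they can run concurrently and be joined before S311. In this sketch the branch functions are hypothetical placeholders supplied by the caller, and simple concatenation stands in for the ordering applied by the bit stream generation unit 14.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

Branch = Callable[[Sequence[float]], bytes]

def code_frame(frame: Sequence[float], aac_branch: Branch,
               sbr_branch: Branch) -> bytes:
    """Run the AAC and SBR branches concurrently, then join their outputs."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        aac_future = pool.submit(aac_branch, frame)  # stands in for S301-S302
        sbr_future = pool.submit(sbr_branch, frame)  # stands in for S303-S310
        aac_data = aac_future.result()
        sbr_data = sbr_future.result()
    # Placeholder for S311: real HE-AAC framing is more than concatenation.
    return aac_data + sbr_data

stream = code_frame([0.0] * 4, lambda f: b"A" * len(f), lambda f: b"S")
```

Threads keep the example short; for CPU-bound coding written in pure Python, a process pool would be needed to obtain real parallelism because of the global interpreter lock.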
The audio signal that is coded by the audio coding device 1 may be reproduced by an audio decoding device corresponding to the SBR coding method, for example, an audio decoding device in compliance with MPEG-4 HE-AAC.
With reference to
As illustrated in the graphs 901 and 902, at time tr, transients caused by the same sound have occurred in both the left side channel and the right side channel. In the reproduction signal of the audio signal coded by the method disclosed in Japanese National Publication of International Patent Application No. 2003-529787, the signal intensity in the time-frequency domain 913 of the right side channel before time tr is stronger than that of the original sound. That is, a pre-echo has occurred in the time-frequency domain 913. Furthermore, in the reproduction signal of the audio signal coded by the method disclosed in Japanese Laid-open Patent Publication No. 2006-3580, the signal intensity in the time-frequency domains 923 and 924 before time tr is stronger than that of the original sound in both the left side channel and the right side channel. That is, a pre-echo has occurred in the time-frequency domains 923 and 924. As described above, the audio coding methods of the related art produce pre-echo, which degrades reproduction sound quality.
By contrast, in the reproduction signal of the audio signal coded by the audio coding device 1, the signal intensity of each frequency immediately before time tr is almost equal to that of the original sound, and no pre-echo has occurred.
As has been described in the foregoing, when the transient detection times of the channels differ, the audio coding device determines whether the transients of the channels are caused by the same sound. If so, the audio coding device corrects the transient detection time of the late detection channel so that it coincides with the transient detection time of the early detection channel. As a consequence, the audio coding device can set a grid for a transient sound in every channel by using the earliest detected transient as the reference. This suppresses pre-echo in the channel in which the transient is detected late, and thus improves reproduction sound quality.
The present invention is not limited to the above-described embodiment. According to a modification, the transient time correction unit may decide whether to correct the transient detection time of the late detection channel on the basis of the difference between the transient detection times of the channels alone, regardless of the power of the late detection channel. For example, if the absolute value of the difference between the transient detection times of the channels is less than a certain time period, the transient time correction unit may correct the transient detection time of the late detection channel so that it coincides with the transient detection time of the early detection channel. This certain time period is the maximum difference between transient detection times for which the transients of the channels can still be regarded as being caused by the same sound, and is set to, for example, the threshold value Thd of the above-described embodiment.
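The time-difference-only rule of this modification can be written for two channels as a minimal sketch. Detection times are assumed to be in time slots, `None` is an assumed marker for "no transient detected in this frame" (in which case no correction is made), and the function name is hypothetical.

```python
from typing import Optional, Tuple

def correct_by_time_difference(
        t_a: Optional[int], t_b: Optional[int],
        thd: int) -> Tuple[Optional[int], Optional[int]]:
    """Align the later transient detection time to the earlier one when the
    two channels' detection times differ by less than `thd` slots."""
    if t_a is None or t_b is None:
        return t_a, t_b  # assumption: no correction unless both channels detect
    if abs(t_a - t_b) < thd:
        early = min(t_a, t_b)
        return early, early  # treated as the same sound
    return t_a, t_b  # too far apart: treated as different sounds
```

For instance, with `thd=4`, detection times of 10 and 12 slots are both set to 10, whereas 10 and 20 slots are left unchanged.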
According to another modification, the transient time correction unit may determine the threshold value Thp in operation S204 in the operation flowchart of the transient detection time correction process illustrated in
Alternatively, in operation S204, instead of comparing the power of the late detection channel at the transient detection time of the early detection channel with the threshold value Thp, the transient time correction unit may compare the powers of the channels at their respective transient detection times. In this case, the transient time correction unit may correct the transient detection time of the late detection channel if, for example, the ratio of the power of the late detection channel at its transient detection time to the power of the early detection channel at its transient detection time is greater than a threshold set between ¼ and ½.
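A minimal sketch of this ratio test follows. It assumes per-slot power sequences such as those produced by the power calculation unit, assumes that the time-difference check against Thd from the embodiment is still applied before operation S204, and uses a hypothetical default ratio threshold of ¼, the lower end of the range given above.

```python
from typing import Sequence

def correct_by_power_ratio(power_early: Sequence[float],
                           power_late: Sequence[float],
                           t_early: int, t_late: int, thd: int,
                           ratio_threshold: float = 0.25) -> int:
    """Return the corrected detection time of the late detection channel."""
    if t_late - t_early >= thd:
        return t_late  # assumed kept from the embodiment: too far apart
    p_early = power_early[t_early]
    if p_early <= 0.0:
        return t_late  # guard against division by zero
    ratio = power_late[t_late] / p_early
    # Comparable transient powers suggest the same sound in both channels.
    return t_early if ratio > ratio_threshold else t_late
```

Here a late-channel transient with half the power of the early-channel transient is moved to the early detection time, while one with an eighth of the power is left alone.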
According to these modifications, the transient time correction unit can decide whether to correct the transient detection time by comparing the powers of both channels with each other. Consequently, it can accurately determine whether the difference in transient detection times between the channels has been caused by the same sound.
The audio signal to be coded is not limited to a stereo audio signal and may be any audio signal having a plurality of channels, for example, a 3.1 ch or 5.1 ch audio signal. When the audio signal to be coded has three or more channels, the audio coding device obtains the earliest of the transient detection times of the channels. The audio coding device may then perform the transient detection time correction process between the channel with the earliest transient detection time and each of the other channels.
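For three or more channels, the procedure above can be sketched by finding the earliest detection time and then applying a pairwise test against it. This sketch assumes the time-difference criterion with Thd as that pairwise test (the power-ratio test could be substituted); the channel labels and the function name are hypothetical, and `None` marks a channel with no detected transient.

```python
from typing import Dict, Optional

def align_to_earliest(times: Dict[str, Optional[int]],
                      thd: int) -> Dict[str, Optional[int]]:
    """Align every channel whose transient lies within `thd` slots of the
    earliest detected transient to that earliest time."""
    detected = [t for t in times.values() if t is not None]
    if not detected:
        return dict(times)
    earliest = min(detected)
    return {ch: (earliest if t is not None and t - earliest < thd else t)
            for ch, t in times.items()}

corrected = align_to_earliest({"L": 12, "R": 10, "C": None, "LFE": 30}, thd=4)
```

In this example, L is aligned to the earliest time of 10 slots, C keeps no transient, and LFE, 20 slots later, is treated as a different sound.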
A computer program for causing a computer to realize the functions of each unit included in the audio coding device according to the embodiment or the modification may be provided in such a manner as to be stored on a recording medium, such as a semiconductor memory, a magnetic recording medium, or an optical recording medium.
Furthermore, the audio coding device according to the above-described embodiment or modification is incorporated in various devices used to transmit or record an audio signal, such as a computer, a video signal recorder, and a video transmission device.
The video obtaining unit 101 includes an interface circuit through which a moving image signal is obtained from another device, such as a video camera. Then, the video obtaining unit 101 passes the moving image signal that has been input to the video transmission device 100 to the video coding unit 103.
The audio obtaining unit 102 includes an interface circuit through which an audio signal is obtained from another device, such as a microphone. Then, the audio obtaining unit 102 passes the audio signal that has been input to the video transmission device 100 to the audio coding unit 104.
The video coding unit 103 codes the moving image signal in order to compress the amount of data of the moving image signal. For this purpose, the video coding unit 103 codes a moving image signal in accordance with a moving image coding standard, such as, for example, MPEG-2, MPEG-4, or H.264 MPEG-4 Advanced Video Coding (H.264 MPEG-4 AVC). Then, the video coding unit 103 outputs the coded moving image data to the multiplexing unit 105.
The audio coding unit 104 includes the audio coding device according to the above-described embodiment or the modification thereof. The audio coding unit 104 codes the audio signal in accordance with the embodiment or the modification thereof described above. Then, the audio coding unit 104 outputs the coded audio data to the multiplexing unit 105.
The multiplexing unit 105 multiplexes the coded moving image data and the coded audio data. Then, the multiplexing unit 105 generates a stream in compliance with a certain format for the transmission of video data, such as an MPEG-2 transport stream.
The multiplexing unit 105 outputs the stream in which the coded moving image data and the coded audio data have been multiplexed to the communication processing unit 106.
The communication processing unit 106 divides the stream in which the coded moving image data and the coded audio data have been multiplexed into packets in compliance with a certain communication standard, such as TCP/IP. Furthermore, the communication processing unit 106 attaches a certain header in which destination information or the like is stored to each packet. Then, the communication processing unit 106 passes the packets to the output unit 107.
The output unit 107 includes an interface circuit for connecting the video transmission device 100 to a communication line. Then, the output unit 107 outputs the packets received from the communication processing unit 106 to the communication line.
The control unit 1001 is a CPU in a computer, which performs control of each device, and computations and processing of data. The control unit 1001 is also an arithmetic operation device that executes a program stored in the main storage unit 1002 or the auxiliary storage unit 1003. After the control unit 1001 receives data from the input unit 1007 or the storage device, the control unit 1001 performs computations and processing thereof, and outputs the results to the display unit 1008, the storage device, and the like.
The main storage unit 1002 is formed of a read only memory (ROM), a random access memory (RAM), or the like. It is a storage device that stores, or temporarily holds, programs executed by the control unit 1001, such as an OS (basic software) and application software, together with data.
The auxiliary storage unit 1003 is a hard disk drive (HDD) or the like, and is a storage device for storing data associated with application software or the like.
The drive device 1004 reads a program from the recording medium 1005, for example, a flexible disk, and installs the program in the storage device.
Furthermore, a certain program is stored on the recording medium 1005. The program stored on the recording medium 1005 is installed into the audio coding device 1000 through the drive device 1004, after which it becomes executable by the audio coding device 1000.
The network I/F unit 1006 is an interface between the audio coding device 1000 and peripheral devices having a communication function that are connected to it through a network, such as a local area network (LAN) or a wide area network (WAN), constructed of wired and/or wireless data transmission paths.
The input unit 1007 includes a keyboard having cursor keys, numeral input keys, various function keys, and the like, and a mouse or the like for selecting keys and other items on the display screen of the display unit 1008. The input unit 1007 is also a user interface through which a user gives operation instructions to the control unit 1001 and inputs data.
The display unit 1008 is constituted by a cathode ray tube (CRT), a liquid crystal display (LCD), or the like, and performs display corresponding to display data input from the control unit 1001.
As described above, the audio coding process of the above-described embodiment may be implemented as a program executed by a computer. The audio coding process may be realized by installing this program from a server or the like and causing a computer to execute it.
Furthermore, this program may be recorded on the recording medium 1005, and the audio coding process described above may be realized by causing a computer or a mobile terminal to read the recording medium 1005. Various types of recording media may be used as the recording medium 1005. Examples include recording media on which information is recorded optically, electrically, or magnetically, such as a CD-ROM, a flexible disk, or a magneto-optical disc, and semiconductor memories on which information is recorded electrically, such as a ROM or a flash memory. Furthermore, the audio coding process described in each of the above-described embodiments may be implemented in one or more integrated circuits.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiment of the present invention has been described in detail, it should be understood that various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Foreign Application Priority Data

Number | Date | Country | Kind |
---|---|---|---|
2011-045171 | Mar 2011 | JP | national |

U.S. Patent Documents (References Cited)

Number | Name | Date | Kind |
---|---|---|---|
7020615 | Vafin et al. | Mar 2006 | B2 |
20060136229 | Kjoerling et al. | Jun 2006 | A1 |
20060256971 | Chong et al. | Nov 2006 | A1 |
20070016405 | Mehrotra et al. | Jan 2007 | A1 |
20080120116 | Schnell et al. | May 2008 | A1 |
20080219344 | Suzuki et al. | Sep 2008 | A1 |
20090228285 | Schnell et al. | Sep 2009 | A1 |
20110202358 | Neuendorf et al. | Aug 2011 | A1 |
20120051549 | Nagel et al. | Mar 2012 | A1 |
20120224703 | Kishi et al. | Sep 2012 | A1 |

Foreign Patent Documents (References Cited)

Number | Date | Country |
---|---|---|
2003-529787 | Oct 2003 | JP |
2006-3580 | Jan 2006 | JP |
2008-224902 | Sep 2008 | JP |
WO 0126095 | Apr 2001 | WO |

Prior Publication Data

Number | Date | Country |
---|---|---|
20120224703 A1 | Sep 2012 | US |