This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 2013-259524 filed on Dec. 16, 2013, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to, for example, audio encoding devices, audio coding methods, audio coding programs, and audio decoding devices.
Audio signal coding methods for compressing the data amount of a multi-channel audio signal having three or more channels have been developed. As one of such coding methods, the MPEG Surround method standardized by the Moving Picture Experts Group (MPEG) is known. In the MPEG Surround method, for example, an audio signal of 5.1 channels (5.1 ch) to be encoded is subjected to time-frequency transformation, and the frequency signal thus obtained is downmixed to first generate a three-channel frequency signal. Further, the three-channel frequency signal is downmixed again to calculate a frequency signal corresponding to a two-channel stereo signal. Then, the frequency signal corresponding to the stereo signal is encoded by the Advanced Audio Coding (AAC) method and, as needed, by the Spectral Band Replication (SBR) coding method. On the other hand, in the MPEG Surround method, when the signal of 5.1 channels is downmixed to produce a signal of three channels, or when the signal of three channels is downmixed to produce a signal of two channels, spatial information representing sound spread and localization and a residual signal are calculated and then encoded. In such a manner, the MPEG Surround method encodes a stereo signal generated by downmixing a multi-channel audio signal, together with spatial information having a smaller data amount. Thus, the MPEG Surround method provides compression efficiency higher than that obtained by independently coding the signals of the channels contained in the multi-channel audio signal. A technique relating to coding of the multi-channel audio signal is disclosed, for example, in Japanese Laid-open Patent Application No. 2012-141412.
The residual signal described above is a signal representing an error component in the downmixing. Since the error in the downmixing may be corrected by using the residual signal during decoding, the audio signal as it was before the downmixing may be reproduced accurately.
In accordance with an aspect of the embodiments, an audio encoding device includes a processor and a memory which stores a plurality of instructions which, when executed by the processor, cause the processor to execute: mixing channel signals of a first number of channels among a plurality of channels contained in an audio signal into a downmix signal of a second number of channels; calculating a residual signal representing an error between the downmix signal and the channel signals of the first number of channels; determining a window length of the downmix signal; and performing orthogonal transformation of the downmix signal and the residual signal based on the window length.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
These and/or other aspects and advantages will become apparent and more readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
Hereinafter, examples of an audio encoding device, an audio coding method and an audio coding computer program as well as an audio decoding device according to an embodiment are described in detail based on the accompanying drawings. The examples do not limit the disclosed technology.
These components included in the audio encoding device 1 are formed, for example, as separate hardware circuits using wired logic. Alternatively, these components included in the audio encoding device 1 may be implemented in the audio encoding device 1 as a single integrated circuit in which circuits corresponding to the respective components are integrated. The integrated circuit may be, for example, an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA). Further, these components included in the audio encoding device 1 may be function modules achieved by a computer program executed on a processor included in the audio encoding device 1.
The time-frequency transformation unit 11 is configured to transform the signals of the respective channels (for example, signals of 5.1 channels) in the time domain of a multi-channel audio signal entered into the audio encoding device 1 to frequency signals of the respective channels by time-frequency transformation on a frame-by-frame basis. In Embodiment 1, the time-frequency transformation unit 11 transforms the signals of the respective channels to frequency signals by using a Quadrature Mirror Filter (QMF) of the following equation.
Here, “n” is a variable representing the nth time obtained when the audio signal in one frame is divided into 128 equal parts in the time direction. The frame length may be any value of, for example, 10 to 80 msec. “k” is a variable representing the kth frequency band obtained when the frequency band of the frequency signal is divided into 64 equal parts. QMF(k,n) is the QMF for outputting a frequency signal having the time “n” and the frequency “k”. The time-frequency transformation unit 11 generates a frequency signal of an entered channel by multiplying QMF(k,n) by the audio signal of one frame of the channel. The time-frequency transformation unit 11 may instead transform the signals of the respective channels to frequency signals by other time-frequency transform processing such as the fast Fourier transform, the discrete cosine transform, or the modified discrete cosine transform.
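As a rough illustration of this analysis step, the following sketch splits one frame into 128 time slots of 64 samples and applies a complex-exponential modulation per band. It is a minimal sketch only: the modulation term is an assumed form, and the long prototype low-pass filter used by a real QMF bank is omitted.

```python
import numpy as np

def qmf_analysis_sketch(frame, num_bands=64):
    """Toy 64-band analysis in the spirit of the QMF described above.

    `frame` is one frame of num_bands * num_slots time samples (e.g. 64 * 128).
    A real QMF bank additionally convolves with a long prototype filter; the
    modulation exp(j*pi/64*(k+0.5)*(2m+1)) is an assumption for illustration.
    """
    num_slots = len(frame) // num_bands
    slots = np.asarray(frame, dtype=float).reshape(num_slots, num_bands)
    k = np.arange(num_bands)[:, None]   # band index k
    m = np.arange(num_bands)[None, :]   # sample index within a slot
    basis = np.exp(1j * np.pi / num_bands * (k + 0.5) * (2 * m + 1))
    sig = np.empty((num_bands, num_slots), dtype=complex)
    for n in range(num_slots):
        sig[:, n] = basis @ slots[n]    # sig[k, n] ~ frequency signal at (k, n)
    return sig
```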
Each time it calculates the frequency signals of the channels on a frame-by-frame basis, the time-frequency transformation unit 11 outputs the frequency signals (for example, the left front channel frequency signal L(k,n), the left rear channel frequency signal SL(k,n), the right front channel frequency signal R(k,n), the right rear channel frequency signal SR(k,n), the center-channel frequency signal C(k,n), and the deep bass sound channel frequency signal LFE(k,n)) to the first downmix unit 12 and the calculation unit 15.
The first downmix unit 12 is configured to generate left-channel, center-channel and right-channel frequency signals by downmixing the frequency signals of the respective channels each time it receives these signals from the time-frequency transformation unit 11. In other words, the first downmix unit 12 mixes channel signals of a first number of channels among the multiple channels contained in the audio signal into a downmix signal of a second number of channels. Specifically, the first downmix unit 12 calculates, for example, frequency signals of the following three channels in accordance with the following equation.
Lin(k,n) = LinRe(k,n) + j·LinIm(k,n)  (0 ≤ k < 64, 0 ≤ n < 128)
LinRe(k,n) = LRe(k,n) + SLRe(k,n)
LinIm(k,n) = LIm(k,n) + SLIm(k,n)
Rin(k,n) = RinRe(k,n) + j·RinIm(k,n)  (0 ≤ k < 64, 0 ≤ n < 128)
RinRe(k,n) = RRe(k,n) + SRRe(k,n)
RinIm(k,n) = RIm(k,n) + SRIm(k,n)
Cin(k,n) = CinRe(k,n) + j·CinIm(k,n)  (0 ≤ k < 64, 0 ≤ n < 128)
CinRe(k,n) = CRe(k,n) + LFERe(k,n)
CinIm(k,n) = CIm(k,n) + LFEIm(k,n)  (Equation 2)
In Equation 2, LRe(k,n) represents a real part of the left front channel frequency signal L(k,n), and LIm(k,n) represents an imaginary part of the left front channel frequency signal L(k,n). SLRe(k,n) represents a real part of the left rear channel frequency signal SL(k,n), and SLIm(k,n) represents an imaginary part of the left rear channel frequency signal SL(k,n). Lin(k,n) is a left-channel frequency signal generated by downmixing. LinRe(k,n) represents a real part of the left-channel frequency signal, and LinIm(k,n) represents an imaginary part of the left-channel frequency signal.
Similarly, RRe(k,n) represents a real part of the right front channel frequency signal R(k,n), and RIm(k,n) represents an imaginary part of the right front channel frequency signal R(k,n). SRRe(k,n) represents a real part of the right rear channel frequency signal SR(k,n), and SRIm(k,n) represents an imaginary part of the right rear channel frequency signal SR(k,n). Rin(k,n) is a right-channel frequency signal generated by downmixing. RinRe(k,n) represents a real part of the right-channel frequency signal, and RinIm(k,n) represents an imaginary part of the right-channel frequency signal.
Further, CRe(k,n) represents a real part of the center-channel frequency signal C(k,n), and CIm(k,n) represents an imaginary part of the center-channel frequency signal C(k,n). LFERe(k,n) represents a real part of the deep bass sound channel frequency signal LFE(k,n), and LFEIm(k,n) represents an imaginary part of the deep bass sound channel frequency signal LFE(k,n). Cin(k,n) represents a center-channel frequency signal generated by downmixing. Further, CinRe(k,n) represents a real part of the center-channel frequency signal Cin(k,n), and CinIm(k,n) represents an imaginary part of the center-channel frequency signal Cin(k,n).
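Because Equation 2 adds the real parts and imaginary parts pairwise, this first downmix reduces to complex additions per band and time slot. A minimal sketch, assuming each input is a complex array of shape (64, 128):

```python
import numpy as np

def first_downmix(L, SL, R, SR, C, LFE):
    """Equation 2: downmix 5.1ch frequency signals to three channels.

    Complex addition adds real and imaginary parts separately, which is
    exactly what Equation 2 spells out component by component.
    """
    Lin = L + SL     # left front + left rear
    Rin = R + SR     # right front + right rear
    Cin = C + LFE    # center + deep bass (LFE)
    return Lin, Rin, Cin
```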
The first downmix unit 12 calculates, on the frequency band basis, an intensity difference between the frequency signals of the two channels to be downmixed, and a similarity between the frequency signals, as spatial information between the frequency signals. The intensity difference is information representing the sound localization, and the similarity is information representing the sound spread. The spatial information calculated by the first downmix unit 12 is an example of three-channel spatial information. In Embodiment 1, the first downmix unit 12 calculates, for example, an intensity difference CLDL(k) and a similarity ICCL(k) in a frequency band k of the left channel in accordance with the equations given below.
where “N” represents the number of time samples contained in one frame; in Embodiment 1, “N” is 128. eL(k) represents an autocorrelation value of the left front channel frequency signal L(k,n), and eSL(k) represents an autocorrelation value of the left rear channel frequency signal SL(k,n). eLSL(k) represents a cross-correlation value between the left front channel frequency signal L(k,n) and the left rear channel frequency signal SL(k,n).
Similarly, the first downmix unit 12 calculates an intensity difference CLDR(k) and a similarity ICCR(k) in a frequency band k of the right channel in accordance with the equations given below.
where eR(k) represents an autocorrelation value of the right front channel frequency signal R(k,n), and eSR(k) represents an autocorrelation value of the right rear channel frequency signal SR(k,n). eRSR(k) represents a cross-correlation value between the right front channel frequency signal R(k,n) and the right rear channel frequency signal SR(k,n).
Further, the first downmix unit 12 calculates an intensity difference CLDC(k) in a frequency band k of the center channel in accordance with the following equation.
where eC(k) represents an autocorrelation value of the center-channel frequency signal C(k,n), and eLFE(k) represents an autocorrelation value of the deep bass sound channel frequency signal LFE(k,n). The intensity differences CLDL(k), CLDR(k) and CLDC(k), and the similarities ICCL(k) and ICCR(k) calculated by the first downmix unit 12 may be collectively referred to as first spatial information SAC(k) for the sake of convenience.
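The equations for the intensity differences and similarities are not reproduced above, but the description in terms of autocorrelation and cross-correlation values admits the usual MPEG-Surround-style form. A hedged sketch under that assumption:

```python
import numpy as np

def cld_icc(front, rear, eps=1e-12):
    """Assumed MPEG-Surround-style intensity difference (CLD) and similarity
    (ICC) per frequency band k; `front` and `rear` are complex arrays of
    shape (64, N), e.g. L(k,n) and SL(k,n) for the left channel.
    """
    e_f = np.sum(np.abs(front) ** 2, axis=1)           # e.g. eL(k)
    e_r = np.sum(np.abs(rear) ** 2, axis=1)            # e.g. eSL(k)
    e_fr = np.sum(front * np.conj(rear), axis=1)       # e.g. eLSL(k)
    cld = 10.0 * np.log10((e_f + eps) / (e_r + eps))   # intensity difference [dB]
    icc = np.real(e_fr / (np.sqrt(e_f * e_r) + eps))   # similarity
    return cld, icc
```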
The first downmix unit 12 outputs the left-channel frequency signal Lin(k,n), the right-channel frequency signal Rin(k,n), and the center-channel frequency signal Cin(k,n), which are generated by downmixing, to the second downmix unit 13, and outputs the first spatial information SAC(k) to the spatial information encoding unit 14 and the calculation unit 15.
The second downmix unit 13 receives the three-channel frequency signals, that is, the left-channel frequency signal Lin(k,n), the right-channel frequency signal Rin(k,n), and the center-channel frequency signal Cin(k,n) generated by the first downmix unit 12. The second downmix unit 13 generates a left frequency signal in the stereo frequency signal by downmixing the left-channel frequency signal and the center-channel frequency signal out of the three-channel frequency signals. Further, the second downmix unit 13 generates a right frequency signal in the stereo frequency signal by downmixing the right-channel frequency signal and the center-channel frequency signal. The second downmix unit 13 generates, for example, a left frequency signal L0(k,n) and a right frequency signal R0(k,n) in the stereo frequency signal in accordance with the following equation. Further, the second downmix unit 13 calculates, for example, a center-channel signal C0(k,n) utilized for selecting a predictive coefficient contained in the codebook according to the following equation.
In Equation 8, Lin(k,n), Rin(k,n), and Cin(k,n) are respectively left-channel, right-channel, and center-channel frequency signals generated by the first downmix unit 12. The left frequency signal L0(k,n) is a synthesis of left front channel, left rear channel, center-channel and deep bass sound frequency signals of an original multi-channel audio signal. Similarly, the right frequency signal R0(k,n) is a synthesis of right front channel, right rear channel, center-channel and deep bass sound frequency signals of the original multi-channel audio signal. The left frequency signal L0(k,n) and the right frequency signal R0(k,n) in Equation 8 may be expanded as follows:
The second downmix unit 13 selects predictive coefficients from a codebook, as appropriate, for the frequency signals of the two channels to be downmixed. For example, when performing predictive coding of the center-channel signal C0(k,n) from the left frequency signal L0(k,n) and the right frequency signal R0(k,n), the second downmix unit 13 generates a two-channel stereo frequency signal by downmixing the right frequency signal R0(k,n) and the left frequency signal L0(k,n). When performing predictive coding, the second downmix unit 13 selects, from the codebook, predictive coefficients C1(k) and C2(k) such that an error d(k,n) between the frequency signal before predictive coding and the frequency signal after predictive coding becomes minimum, the error being defined on the frequency band basis in the following equations with C0(k,n), L0(k,n), and R0(k,n). In such a manner, the second downmix unit 13 obtains the predictively-encoded center-channel signal C′0(k,n).
Equation 10 may be expressed as follows by using real and imaginary parts.
C′0(k,n) = C′0Re(k,n) + j·C′0Im(k,n)
C′0Re(k,n) = c1(k)×L0Re(k,n) + c2(k)×R0Re(k,n)
C′0Im(k,n) = c1(k)×L0Im(k,n) + c2(k)×R0Im(k,n)  (Equation 11)
L0Re(k,n), L0Im(k,n), R0Re(k,n), and R0Im(k,n) represent a real part of L0(k,n), an imaginary part of L0(k,n), a real part of R0(k,n), and an imaginary part of R0(k,n), respectively.
As described above, the second downmix unit 13 may perform predictive coding of the center-channel signal C0(k,n) by selecting, from the codebook, the predictive coefficients C1(k) and C2(k) such that the error d(k,n) between the center-channel frequency signal C0(k,n) before predictive coding and the center-channel frequency signal C′0(k,n) after predictive coding becomes minimum. Equation 10 expresses this concept as an equation.
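A minimal sketch of this minimization, assuming the codebook is given as a list of candidate (c1, c2) pairs and all signals are complex arrays of shape (bands, time slots):

```python
import numpy as np

def select_predictive_coefficients(C0, L0, R0, codebook):
    """For each band k, pick the codebook pair (c1, c2) that minimizes the
    squared error between C0 and the prediction c1*L0 + c2*R0
    (the error d(k,n) of Equations 10 and 11).
    """
    best = np.zeros((C0.shape[0], 2))
    for k in range(C0.shape[0]):
        errors = [np.sum(np.abs(C0[k] - (c1 * L0[k] + c2 * R0[k])) ** 2)
                  for c1, c2 in codebook]
        best[k] = codebook[int(np.argmin(errors))]   # (c1(k), c2(k))
    return best
```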
The second downmix unit 13 holds a quantization table (codebook) indicating a correspondence relationship between representative values of the predictive coefficients C1(k) and C2(k) and index values. By referring to the quantization table, the second downmix unit 13 determines, for each frequency band, the index value closest to the predictive coefficients C1(k) and C2(k). Here, a specific example is described.
Next, the second downmix unit 13 determines a differential value between indexes in the frequency direction for frequency bands. For example, when an index value relative to a frequency band k is 2 and an index value relative to a frequency band (k−1) is 4, the second downmix unit 13 determines that the differential value of the index relative to the frequency band k is −2.
Next, the second downmix unit 13 refers to a coding table indicating a correspondence relationship between differential values of indexes and predictive coefficient codes. Then, the second downmix unit 13 determines the predictive coefficient code idxcm(k)(m=1,2) of the predictive coefficient cm(k)(m=1,2) relative to the differential value for each frequency band k by referring to the coding table. Like the similarity code, the predictive coefficient code may be a variable length code having a shorter code length for a differential value of higher appearance frequency, such as, for example, the Huffman coding or the arithmetic coding. The quantization table and the coding table are stored in advance in an unillustrated memory in the second downmix unit 13.
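The quantize-difference-encode flow above may be sketched as follows. The table contents are assumptions for illustration (a list of representative values and a toy map from differential values to variable-length codes), and sending the first band's index as-is is likewise an assumption.

```python
def encode_indices(values, quant_table, code_table):
    """Quantize per-band values to nearest-entry indexes, difference them
    along frequency, and map each differential value to its code."""
    indexes = [min(range(len(quant_table)),
                   key=lambda i: abs(quant_table[i] - v)) for v in values]
    diffs = [indexes[0]] + [indexes[k] - indexes[k - 1]
                            for k in range(1, len(indexes))]
    return [code_table[d] for d in diffs]

# Example: index 4 at band k-1 followed by index 2 at band k gives
# the differential value -2, as in the description above.
codes = encode_indices([0.1, 0.5, 0.3],
                       quant_table=[0.0, 0.2, 0.4, 0.6],
                       code_table={0: "0", 1: "10", -1: "11", 2: "110", -2: "111"})
```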
The predictive coefficient code idxcm(k)(m=1,2) may be referred to as second spatial information.
The second downmix unit 13 may perform predictive coding based on the energy ratio instead of the predictive coding based on the predictive coefficients mentioned above. The second downmix unit 13 calculates, according to the following equation, intensity differences CLD1(k) and CLD2(k) relative to the three-channel frequency signals including the left-channel frequency signal Lin(k,n), the right-channel frequency signal Rin(k,n), and the center-channel frequency signal Cin(k,n) generated by the first downmix unit 12.
The second downmix unit 13 outputs the intensity differences CLD1(k) and CLD2(k) relative to the three-channel frequency signals to the spatial information encoding unit 14. The intensity differences CLD1(k) and CLD2(k) may be referred to as second spatial information instead of the predictive coefficient code idxcm(k)(m=1,2). The second downmix unit 13 outputs the left frequency signal L0(k,n) and the right frequency signal R0(k,n) to the frequency-time transformation unit 16. In other words, any two channel signals, a first channel signal and a second channel signal, included in the multiple channels (5.1 ch) contained in the audio signal are mixed into a downmix signal by the first downmix unit 12 or the second downmix unit 13.
The spatial information encoding unit 14 generates an MPEG Surround code (hereinafter referred to as a spatial information code) from the first spatial information received from the first downmix unit 12 and the second spatial information received from the second downmix unit 13.
The spatial information encoding unit 14 refers to the quantization table indicating a correspondence relationship between similarity values in the first and second spatial information and index values. Then, the spatial information encoding unit 14 determines, for each frequency band, the index value closest to the similarity ICCi(k)(i=L,R) by referring to the quantization table. The quantization table may be stored in advance in an unillustrated memory in the spatial information encoding unit 14, for example.
Next, the spatial information encoding unit 14 determines a differential value between indexes in the frequency direction for frequency bands. For example, when an index value relative to a frequency band k is 3 and an index value relative to a frequency band (k−1) is 0, the spatial information encoding unit 14 determines that the differential value of the index relative to the frequency band k is 3.
The spatial information encoding unit 14 refers to a coding table indicating a correspondence relationship between differential values of indexes and similarity codes. Then, the spatial information encoding unit 14 determines the similarity code idxicci(k)(i=L,R) of the similarity ICCi(k)(i=L,R) relative to the differential value between indexes for each frequency band by referring to the coding table. The coding table is stored in advance in a memory in the spatial information encoding unit 14, for example. The similarity code may be a variable length code having a shorter code length for a differential value of higher appearance frequency, such as, for example, the Huffman coding or the arithmetic coding.
The spatial information encoding unit 14 refers to a quantization table indicating a correspondence relationship between intensity difference values and index values. Then, the spatial information encoding unit 14 determines, for each frequency band, the index value closest to the intensity difference CLDj(k)(j=L,R,C,1,2) by referring to the quantization table. The spatial information encoding unit 14 then determines a differential value between indexes in the frequency direction for the frequency bands. For example, when the index value relative to a frequency band k is 2 and the index value relative to a frequency band (k−1) is 4, the spatial information encoding unit 14 determines that the differential value of the index relative to the frequency band k is −2.
The spatial information encoding unit 14 refers to a coding table indicating a correspondence relationship between differential values of indexes and intensity difference codes. Then, the spatial information encoding unit 14 determines the intensity difference code idxcldj(k)(j=L,R,C,1,2) relative to the differential value of the intensity difference CLDj(k) for each frequency band k by referring to the coding table. The intensity difference code may be a variable length code having a shorter code length for a differential value of higher appearance frequency, such as, for example, the Huffman coding or the arithmetic coding. The quantization table and the coding table may be stored in advance in a memory in the spatial information encoding unit 14.
The spatial information encoding unit 14 generates the spatial information code by arranging the similarity code idxicci(k), the intensity difference code idxcldj(k), and, as needed, the predictive coefficient code idxcm(k) in a predetermined sequence. The predetermined sequence is described, for example, in ISO/IEC23003-1:2007. The spatial information encoding unit 14 outputs the generated spatial information code to the multiplexing unit 19.
The calculation unit 15 receives channel frequency signals (the left front channel frequency signal L(k,n), the left rear channel frequency signal SL(k,n), the right front channel frequency signal R(k,n), and the right rear channel frequency signal SR(k,n)) from the time-frequency transformation unit 11. The calculation unit 15 also receives first spatial information SAC(k) from the first downmix unit 12. The calculation unit 15 calculates, for example, a left-channel residual signal resL(k,n) in accordance with the following equation, from the left front channel frequency signal L(k,n), the left rear channel frequency signal SL(k,n), and the first spatial information SAC(k).
In Equation 13, CLDP(n) and ICCP(n) may be calculated in accordance with the following equations.
CLDP(n) = (1 − γ(n))×CLDL-prev(k) + γ(n)×CLDL-cur(k)
ICCP(n) = (1 − γ(n))×ICCL-prev(k) + γ(n)×ICCL-cur(k)
γ(n) = (n + 1)/M = (n + 1)/31  (Equation 14)
In Equation 14, “n” represents the time, and “M” represents the number of time samples in the frame. CLDL-cur represents the intensity difference CLDL(k) of a frequency band k for the left channel in the current frame, and CLDL-prev represents the intensity difference CLDL(k) of the frequency band k for the left channel in the preceding frame. Similarly, ICCL-cur represents the similarity ICCL(k) of the frequency band k for the left channel in the current frame, and ICCL-prev represents the similarity ICCL(k) of the frequency band k for the left channel in the preceding frame.
Next, the calculation unit 15 calculates a right-channel residual signal resR(k,n) from the right front channel frequency signal R(k,n), the right rear channel frequency signal SR(k,n), and the first spatial information in the same manner as the above-mentioned left-channel residual signal resL(k,n). The calculation unit 15 outputs the calculated left-channel residual signal resL(k,n) and right-channel residual signal resR(k,n) to the frequency-time transformation unit 16. In Equation 14, γ(n) is the linear interpolation coefficient, and this interpolation causes a delay corresponding to 0.5 frame time due to the following reason. As may be understood from Equations 13 and 14, the residual signal (the left-channel residual signal resL(k,n) or the right-channel residual signal resR(k,n)) is calculated from the input signal and the first spatial information that is used in decoding. The first spatial information used for decoding is calculated by linearly interpolating the first spatial information of the Nth and (N−1)th frames outputted from the audio encoding device 1. Here, the first spatial information outputted from the audio encoding device 1 has only one value for each frame and each band (frequency band). Hence, since the first spatial information is treated as being at the central time position of the calculation range (frame), a delay corresponding to 0.5 frames occurs due to the interpolation. Since a delay corresponding to 0.5 frames occurs in treating the first spatial information during decoding as above, a delay corresponding to 0.5 frames also occurs when the residual signal is calculated by the calculation unit 15. In other words, the calculation unit 15 calculates residual signals of any two channel signals, a first channel signal and a second channel signal, included in the multiple channels (5.1 ch) contained in the audio signal.
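A minimal sketch of the per-slot linear interpolation of Equation 14, with `prev` and `cur` being the parameter values (CLD or ICC) of the preceding and current frames; the example values of `M` and the parameters are illustrative only:

```python
def interpolate_parameter(prev, cur, M):
    """Equation 14: gamma(n) = (n + 1) / M ramps the parameter from the
    preceding frame's value toward the current frame's value over the frame,
    which places each frame's single value at the frame center (0.5-frame delay).
    """
    return [(1 - (n + 1) / M) * prev + ((n + 1) / M) * cur for n in range(M)]

# Example: a CLD value ramping across M = 32 time positions of one frame.
cld_p = interpolate_parameter(prev=-3.0, cur=1.0, M=32)
```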
The frequency-time transformation unit 16 receives the left frequency signal L0(k,n) and the right frequency signal R0(k,n) from the second downmix unit 13. The frequency-time transformation unit 16 receives the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n) from the calculation unit 15. The frequency-time transformation unit 16 transforms each frequency signal (including the residual signals) to a time-domain signal every time it receives the frequency signal. For example, when the time-frequency transformation unit 11 uses a QMF, the frequency-time transformation unit 16 performs frequency-time transformation of the frequency signal by using a complex QMF indicated in the following equation.
Here, IQMF(k,n) is a complex QMF using the time “n” and the frequency “k” as variables. When the time-frequency transformation unit 11 uses another time-frequency transformation processing such as fast Fourier transform, discrete cosine transform, and modified discrete cosine transform, the frequency-time transformation unit 16 uses inverse transformation of the time-frequency transformation processing. The frequency-time transformation unit 16 outputs a time signal of the left frequency signal L0(k,n) and right frequency signal R0(k,n) obtained by the frequency-time transformation to the determination unit 17 and the transformation unit 18. The frequency-time transformation unit 16 outputs a time signal of the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n) obtained by the frequency-time transformation to the transformation unit 18.
The determination unit 17 receives a time signal of the left frequency signal L0(k,n) and the right frequency signal R0(k,n) from the frequency-time transformation unit 16. The determination unit 17 determines a window length from the time signal of the left frequency signal L0(k,n) and the right frequency signal R0(k,n). Specifically, the determination unit 17 first determines the perceptual entropy (hereinafter, referred to as “PE”) from a time signal of the left frequency signal L0(k,n) and the right frequency signal R0(k,n). PE represents the amount of information for quantizing the frame segment so that the listener (user) will not perceive noise.
The above PE has a property of becoming greater for a sound whose signal level changes sharply in a short time, such as an attack sound produced with a percussion instrument. In other words, the determination unit 17 may determine that the window length is a short window length when the downmix signal contains an attack sound, and that the window length is a long window length when the downmix signal contains no attack sound. Accordingly, the determination unit 17 provides a shorter window length (higher time resolution at the expense of frequency resolution) for a frame segment where the PE value is relatively large, and provides a longer window length (higher frequency resolution at the expense of time resolution) for a frame segment where the PE value is relatively small. For example, the short window length contains 128 samples, and the long window length contains 1,024 samples. The determination unit 17 may determine whether the window length is short or long according to the following determination formula.
If δPow > Th, then short (short window length);
if δPow ≤ Th, then long (long window length). (Equation 16)
In Equation 16, “Th” represents an arbitrary threshold with respect to the power (amplitude) of the time signal (for example, 70% of the average power of the time signal). “δPow” is, for example, the power difference between adjacent segments in the same frame. The determination unit 17 may apply, for example, the window length determination method disclosed in Japanese Laid-open Patent Publication No. 7-66733. The determination unit 17 outputs the determined window length to the transformation unit 18.
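A minimal sketch of the decision in Equation 16, assuming eight segments per frame and the example threshold of 70% of the frame's average power (both illustrative choices, not mandated by the text):

```python
import numpy as np

def decide_window_length(time_signal, num_segments=8, ratio=0.7):
    """Equation 16: compare the power jump between adjacent segments of the
    frame (delta-Pow) against the threshold Th and pick short/long."""
    segments = np.array_split(np.asarray(time_signal, dtype=float), num_segments)
    powers = np.array([np.mean(s ** 2) for s in segments])
    th = ratio * powers.mean()                    # Th: e.g. 70% of average power
    delta_pow = np.max(np.abs(np.diff(powers)))   # largest adjacent-segment jump
    return "short" if delta_pow > th else "long"
```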
The transformation unit 18 receives the window length from the determination unit 17, and a time signal of the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n) from the frequency-time transformation unit 16. The transformation unit 18 receives a time signal of the left frequency signal L0(k,n) and the right frequency signal R0(k,n) from the frequency-time transformation unit 16.
First, the transformation unit 18 implements the modified discrete cosine transform (MDCT) as an example of the orthogonal transformation with respect to the time signal of the left frequency signal L0(k,n) and the right frequency signal R0(k,n), by using a window length determined by the determination unit 17 to transform the time signal of the left frequency signal L0(k,n) and the right frequency signal R0(k,n) to a set of MDCT coefficients. Further, the transformation unit 18 quantizes the set of MDCT coefficients and performs variable-length coding of the set of quantized MDCT coefficients. The transformation unit 18 outputs the set of MDCT coefficients subjected to the variable-length coding and relevant information such as quantization coefficients to the multiplexing unit 19, as a downmix signal code, for example. The transformation unit 18 may perform the modified discrete cosine transform, for example, according to the following equation.
In Equation 17, MDCTk represents the MDCT coefficient outputted by the transformation unit 18. Wn represents the window coefficient. In represents the input time signal, which is a time signal of the left frequency signal L0(k,n) or the right frequency signal R0(k,n). “n” is the time, and “k” is the frequency band. “N” is a constant equal to twice the window length. Further, N0 is a constant expressed as (N/2+1)/2. The above window coefficient Wn is a coefficient corresponding to one of four windows (1. long window length→long window length, 2. long window length→short window length, 3. short window length→short window length, 4. short window length→long window length) defined by the combination of the window length of the current frame to be transformed and the window length of the following (future) frame. In the orthogonal transformation by the transformation unit 18, a delay corresponding to one frame time occurs since information on the window length of the frame following the current frame is used.
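Equation 17 itself is not reproduced above, but the stated constants (N equal to twice the window length, N0 = (N/2+1)/2) match the standard MDCT, which may be sketched as follows; the exact form used by the embodiment is therefore an assumption here.

```python
import numpy as np

def mdct(x, window):
    """Windowed MDCT of one block of N input samples (N = twice the window
    length in the text's terms), producing N/2 coefficients.
    n0 = (N/2 + 1)/2 = N/4 + 1/2, as stated for Equation 17.
    """
    N = len(x)
    n0 = N / 4 + 0.5
    n = np.arange(N)
    xw = np.asarray(x, dtype=float) * np.asarray(window, dtype=float)  # Wn * In
    return np.array([np.sum(xw * np.cos(2 * np.pi / N * (n + n0) * (k + 0.5)))
                     for k in range(N // 2)])
```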
Next, the transformation unit 18 performs the modified discrete cosine transform (MDCT transform) (an example of the orthogonal transformation) of the time signals of the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n) by using the window length determined by the determination unit 17 as is, to transform the time signals of the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n) to a set of MDCT coefficients. Further, the transformation unit 18 quantizes the set of MDCT coefficients and performs variable-length coding of the set of quantized MDCT coefficients. The transformation unit 18 outputs the set of MDCT coefficients subjected to the variable-length coding and relevant information such as quantization coefficients to the multiplexing unit 19, as a residual signal code, for example. The transformation unit 18 may perform the modified discrete cosine transform of the time signals of the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n) by using Equation 17 in the same manner as the left frequency signal L0(k,n) and the right frequency signal R0(k,n). In this case, the input time signal In is a time signal of the left-channel residual signal resL(k,n) or the right-channel residual signal resR(k,n). Further, the window coefficient Wn used in the modified discrete cosine transform of the left frequency signal L0(k,n) and the right frequency signal R0(k,n) is used as is. Consequently, in the orthogonal transformation of the time signals of the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n), a delay corresponding to one frame time does not occur, since information on the window length of the frame following the current frame is not used.
When generating the downmix signal code and the residual signal code, the transformation unit 18 performs the orthogonal transformation by adjusting the delay amounts of the downmix signal code and the residual signal code so that they are synchronized with each other, due to the following reason. If the delay amounts of the downmix signal code and the residual signal code are not synchronized on the side of the audio encoding device 1, the codes are outputted to the audio decoding device without being synchronized. A typical audio decoding device is not configured to correct the time position. Accordingly, it is difficult to decode the original sound source, since decoding is performed using a downmix signal code and a residual signal code at time positions different from those of the original sound source. The delay amounts of the downmix signal code and the residual signal code therefore have to be synchronized on the side of the audio encoding device 1. The delay amounts of the downmix signal code and the residual signal code may be synchronized when the transformation unit 18 outputs the downmix signal code and the residual signal code to the multiplexing unit 19. Alternatively, the delay amounts may be synchronized when the multiplexing unit 19 performs the multiplexing described later. Further, the transformation unit 18 may include a buffer such as an unillustrated cache or memory to synchronize the delay amounts of the downmix signal code and the residual signal code.
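One possible buffering arrangement for this synchronization (an assumed sketch, not the patented structure itself) queues the two code streams separately and releases them only as time-aligned pairs before multiplexing:

```python
from collections import deque

class DelaySynchronizer:
    """Queue downmix signal codes and residual signal codes separately and
    release them only as aligned pairs, so the decoder never receives the
    two codes at different time positions."""
    def __init__(self):
        self.downmix_codes = deque()
        self.residual_codes = deque()

    def push(self, downmix_code=None, residual_code=None):
        if downmix_code is not None:
            self.downmix_codes.append(downmix_code)
        if residual_code is not None:
            self.residual_codes.append(residual_code)

    def pop_aligned(self):
        # A pair is emitted only when both paths hold a code for the frame.
        pairs = []
        while self.downmix_codes and self.residual_codes:
            pairs.append((self.downmix_codes.popleft(),
                          self.residual_codes.popleft()))
        return pairs
```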
The multiplexing unit 19 receives the downmix signal code and the residual signal code from the transformation unit 18. Also, the multiplexing unit 19 receives the spatial information code from the spatial information encoding unit 14. The multiplexing unit 19 multiplexes the downmix signal code, the spatial information code, and the residual signal code by arranging them in a predetermined sequence. Then, the multiplexing unit 19 outputs an encoded audio signal generated by the multiplexing.
Here, an example of the technical significance of Embodiment 1 is described. Although described in detail in the Comparative Example later, typically, a window length of the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n) has to be calculated by using Equation 16 from the time signals of the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n). Further, the orthogonal transformation (for example, the modified discrete cosine transform) of the time signals of the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n) has to be performed by using Equation 17 in the same manner as the time signals of the left frequency signal L0(k,n) and the right frequency signal R0(k,n). Consequently, in the orthogonal transformation of the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n), a delay corresponding to one frame time occurs since information on the window length of the frame following the current frame is used.
In Embodiment 1, however, the transformation unit 18 performs the modified discrete cosine transform of the time signals of the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n), as described above, by using as is the window coefficient Wn used in the modified discrete cosine transform of the left frequency signal L0(k,n) or the right frequency signal R0(k,n). Consequently, in the orthogonal transformation of the time signals of the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n), there is an advantage that a delay corresponding to one frame time does not occur, since information on the window length of the frame following the current frame is not used.
Next, the technical reason why the transformation unit 18 may use as is, in Embodiment 1, the window coefficient Wn used in the modified discrete cosine transform of the left frequency signal L0(k,n) or the right frequency signal R0(k,n) when performing the modified discrete cosine transform of the time signals of the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n) is described. This technical reason was newly found as a result of intensive studies by the inventors.
The technical consideration by the inventors on the above-mentioned new finding is described below. The left frequency signal L0(k,n) and the right frequency signal R0(k,n) are signals in which a direct wave with respect to an input sound source is modeled. On the other hand, the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n) are signals in which a reflected wave (an echo sound such as, for example, an echo reflected in an indoor environment) with respect to the input sound source is modeled. Since they originate from the same input sound source, both the frequency signals (the left frequency signal L0(k,n) and the right frequency signal R0(k,n)) and the residual signals (the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n)) contain components whose signal level varies sharply in a short time, such as an attack sound generated by a percussion instrument, although there exists a phase difference and a power difference between them. When the window length determination is performed with the threshold used in Equation 16 under such conditions, the influences of the phase difference and the power difference are absorbed by the threshold, and there is a strong correlation between the window length of the frequency signal and that of the residual signal.
Next, an audio encoding device according to Comparative Example 1 is described. The audio encoding device according to Comparative Example 1 includes a residual signal window length determination unit 20 in addition to the components described above.
The residual signal window length determination unit 20 receives a time signal of the left-channel residual signal resL(k,n) and right-channel residual signal resR(k,n) from the frequency-time transformation unit 16. The residual signal window length determination unit 20 calculates a window length of the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n) from the time signal of the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n) by using Equation 16. The residual signal window length determination unit 20 outputs the window length of the left-channel residual signal resL(k,n) and right-channel residual signal resR(k,n) to the transformation unit 18.
The transformation unit 18 receives the time signal of the left frequency signal L0(k,n) and right frequency signal R0(k,n), and the time signal of the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n) from the frequency-time transformation unit 16. The transformation unit 18 receives the window length of the time signal of the left frequency signal L0(k,n) and the right frequency signal R0(k,n) from the determination unit 17. Further, the transformation unit 18 receives the window length of the time signal of the left-channel residual signal resL(k,n) and right-channel residual signal resR(k,n) from the residual signal window length determination unit 20.
The transformation unit 18 transforms the time signal of the left frequency signal L0(k,n) and the right frequency signal R0(k,n) to a set of MDCT coefficients through orthogonal transformation in the same manner as Embodiment 1. Further, the transformation unit 18 quantizes the set of MDCT coefficients and performs variable-length coding of the set of quantized MDCT coefficients. The transformation unit 18 outputs the set of MDCT coefficients subjected to the variable-length coding and relevant information such as quantization coefficients to the multiplexing unit 19, as a downmix signal code.
The transformation unit 18 transforms the time signal of the left-channel residual signal resL(k,n) and right-channel residual signal resR(k,n) to a set of MDCT coefficients through orthogonal transformation. Further, the transformation unit 18 quantizes the set of MDCT coefficients and performs variable-length coding of the set of quantized MDCT coefficients. The transformation unit 18 outputs the set of MDCT coefficients subjected to the variable-length coding and relevant information such as quantization coefficients to the multiplexing unit 19, as a residual signal code, for example.
Specifically, the transformation unit 18 has to perform the orthogonal transformation of the time signals of the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n) with the window length of the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n), by using Equation 17 in the same manner as the time signals of the left frequency signal L0(k,n) and the right frequency signal R0(k,n). Consequently, in the orthogonal transformation of the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n), a delay corresponding to one frame time also occurs, since information on the window length of the frame following the current frame is used. When generating the downmix signal code and the residual signal code, the transformation unit 18 has to perform the orthogonal transformation by adjusting the delay amounts of the downmix signal code and the residual signal code so that they are synchronized with each other, as in Embodiment 1.
Here, the delay amounts of Comparative Example 1 and Embodiment 1 are compared with each other. First, in the calculation unit 15, a delay corresponding to 0.5 frame time occurs when the residual signal is calculated, as described above. In Embodiment 1, no additional delay occurs in the orthogonal transformation of the residual signal, whereas in Comparative Example 1, a further delay corresponding to one frame time occurs because the window length of the frame following the current frame is used. In both cases, the downmix signal code incurs a delay corresponding to one frame time in the orthogonal transformation.
To synchronize the delay amounts of the downmix signal code and the residual signal code, the delays have to be adjusted to the larger one. Accordingly, the delay amount according to Embodiment 1 corresponds to 1 frame time, whereas the delay amount according to Comparative Example 1 corresponds to 1.5 frame time. The audio encoding device 1 according to Embodiment 1 is thus capable of reducing the delay amount.
Next, the operation of the audio coding processing performed by the audio encoding device 1 is described with reference to an operation flowchart.
The time-frequency transformation unit 11 transforms the signals of the respective channels (for example, signals of 5.1 channels) in the time domain of a multi-channel audio signal entered into the audio encoding device 1 to frequency signals of the respective channels by time-frequency transformation on a frame-by-frame basis (step S1101). Each time it calculates the frequency signals of the respective channels on a frame-by-frame basis, the time-frequency transformation unit 11 outputs the frequency signals (for example, the left front channel frequency signal L(k,n), the left rear channel frequency signal SL(k,n), the right front channel frequency signal R(k,n), the right rear channel frequency signal SR(k,n), the center-channel frequency signal C(k,n), and the deep bass sound channel frequency signal LFE(k,n)) to the first downmix unit 12 and the calculation unit 15.
The first downmix unit 12 generates left-channel, center-channel and right-channel frequency signals by downmixing the frequency signals of the respective channels each time it receives them from the time-frequency transformation unit 11. The first downmix unit 12 calculates, on the frequency band basis, an intensity difference between the frequency signals of the two channels to be downmixed and a similarity between the frequency signals, as spatial information between the frequency signals (which may be collectively referred to as first spatial information SAC(k)). The intensity difference is information representing the sound localization, and the similarity is information representing the sound spread (step S1102). The spatial information calculated by the first downmix unit 12 is an example of three-channel spatial information. In Embodiment 1, the first downmix unit 12 calculates the first spatial information SAC(k) in accordance with Equations 3 to 7. The first downmix unit 12 outputs the left-channel frequency signal Lin(k,n), the right-channel frequency signal Rin(k,n), and the center-channel frequency signal Cin(k,n), which are generated by downmixing, to the second downmix unit 13, and outputs the first spatial information SAC(k) to the spatial information encoding unit 14 and the calculation unit 15.
The second downmix unit 13 receives the three-channel frequency signals, that is, the left-channel frequency signal Lin(k,n), the right-channel frequency signal Rin(k,n), and the center-channel frequency signal Cin(k,n) generated by the first downmix unit 12. The second downmix unit 13 generates a left frequency signal L0(k,n) in the stereo frequency signal by downmixing the left-channel frequency signal and the center-channel frequency signal out of the three-channel frequency signals. Further, the second downmix unit 13 generates a right frequency signal in the stereo frequency signal by downmixing the right-channel frequency signal and the center-channel frequency signal (step S1103). The second downmix unit 13 generates, for example, a left frequency signal L0(k,n) and a right frequency signal R0(k,n) in the stereo frequency signal in accordance with Equation 8. Further, the second downmix unit 13 calculates the predictive coefficient code idxcm(k)(m=1,2) or the intensity differences CLD1(k) and CLD2(k) as the second spatial information, by using the above-described method (step S1104). The second downmix unit 13 outputs the second spatial information to the spatial information encoding unit 14. The second downmix unit 13 outputs the left frequency signal L0(k,n) and the right frequency signal R0(k,n) to the frequency-time transformation unit 16.
The spatial information encoding unit 14 generates a spatial information code from the first spatial information received from the first downmix unit 12 and the second spatial information received from the second downmix unit 13 (step S1105). The spatial information encoding unit 14 outputs the generated spatial information code to the multiplexing unit 19.
The calculation unit 15 receives the frequency signals of the respective channels (the left front channel frequency signal L(k,n), the left rear channel frequency signal SL(k,n), the right front channel frequency signal R(k,n), and the right rear channel frequency signal SR(k,n)) from the time-frequency transformation unit 11. The calculation unit 15 also receives the first spatial information SAC(k) from the first downmix unit 12. The calculation unit 15 calculates, for example, a left-channel residual signal resL(k,n) from the left front channel frequency signal L(k,n), the left rear channel frequency signal SL(k,n), and the first spatial information SAC(k) in accordance with Equations 13 and 14 above. Next, the calculation unit 15 calculates a right-channel residual signal resR(k,n) from the right front channel frequency signal R(k,n), the right rear channel frequency signal SR(k,n), and the first spatial information in the same manner as the above-mentioned left-channel residual signal resL(k,n) (step S1106). The calculation unit 15 outputs the calculated left-channel residual signal resL(k,n) and right-channel residual signal resR(k,n) to the frequency-time transformation unit 16.
The frequency-time transformation unit 16 receives the left frequency signal L0(k,n) and the right frequency signal R0(k,n) from the second downmix unit 13. The frequency-time transformation unit 16 receives the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n) from the calculation unit 15. The frequency-time transformation unit 16 transforms the frequency signals (including the residual signals) to time-domain signals each time it receives the frequency signals (step S1107). The frequency-time transformation unit 16 outputs the time signals of the left frequency signal L0(k,n) and the right frequency signal R0(k,n) obtained by the frequency-time transformation to the determination unit 17 and the transformation unit 18. The frequency-time transformation unit 16 outputs the time signals of the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n) obtained by the frequency-time transformation to the transformation unit 18.
The determination unit 17 receives the time signals of the left frequency signal L0(k,n) and the right frequency signal R0(k,n) from the frequency-time transformation unit 16. The determination unit 17 determines the window length from the time signals of the left frequency signal L0(k,n) and the right frequency signal R0(k,n) (step S1108). The determination unit 17 outputs the determined window length to the transformation unit 18.
The transformation unit 18 receives the window length from the determination unit 17, and the time signals of the left-channel residual signal resL(k,n) and right-channel residual signal resR(k,n) from the frequency-time transformation unit 16. The transformation unit 18 receives the time signals of the left frequency signal L0(k,n) and the right frequency signal R0(k,n) from the frequency-time transformation unit 16. The transformation unit 18 implements the modified discrete cosine transform (MDCT), which is an example of the orthogonal transformation, with respect to the time signals of the left frequency signal L0(k,n) and the right frequency signal R0(k,n), by using the window length determined by the determination unit 17 to transform the time signals of the left frequency signal L0(k,n) and the right frequency signal R0(k,n) to a set of MDCT coefficients (step S1109). Further, the transformation unit 18 quantizes the set of MDCT coefficients and performs variable-length coding of the set of quantized MDCT coefficients. The transformation unit 18 outputs the set of MDCT coefficients subjected to the variable-length coding and relevant information such as quantization coefficients to the multiplexing unit 19, as a downmix signal code. The transformation unit 18 may perform the modified discrete cosine transform, for example, according to Equation 17.
Next, the transformation unit 18 performs the modified discrete cosine transform (MDCT transform) (an example of the orthogonal transformation) of the time signals of the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n) by using the window length determined by the determination unit 17 as is, to transform the time signals of the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n) to a set of MDCT coefficients (step S1110). Further, the transformation unit 18 quantizes the set of MDCT coefficients and performs variable-length coding of the set of quantized MDCT coefficients. The transformation unit 18 outputs the set of MDCT coefficients subjected to the variable-length coding and relevant information such as quantization coefficients to the multiplexing unit 19, as a residual signal code, for example. The transformation unit 18 may perform the modified discrete cosine transform of the time signals of the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n) by using Equation 17 in the same manner as the left frequency signal L0(k,n) and the right frequency signal R0(k,n). When generating the downmix signal code and the residual signal code, the transformation unit 18 performs the orthogonal transformation by adjusting the delay amounts of the downmix signal code and the residual signal code so that they are synchronized with each other.
The multiplexing unit 19 receives the downmix signal code and the residual signal code from the transformation unit 18. Also, the multiplexing unit 19 receives the spatial information code from the spatial information encoding unit 14. The multiplexing unit 19 multiplexes the downmix signal code, the spatial information code, and the residual signal code by arranging them in a predetermined sequence (step S1111). Then, the multiplexing unit 19 outputs the encoded audio signal generated by the multiplexing, and the audio encoding device 1 ends the audio coding processing.
In Embodiment 1, the strong correlation between the frequency signals (the left frequency signal L0(k,n) and the right frequency signal R0(k,n)) and the residual signals (the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n)) was described. Here, Embodiment 2, which reduces the computation load of an audio encoding device by utilizing this technical feature, is described. Illustration of Embodiment 2 is omitted, since the functional blocks of an audio encoding device according to Embodiment 2 are the same as those of the audio encoding device described above.
The transformation unit 18 performs the modified discrete cosine transform (MDCT transform) (an example of the orthogonal transformation) of the time signals of the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n) by using the window length determined by the residual signal window length determination unit 20, to transform the time signals of the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n) to a set of MDCT coefficients.
Next, the transformation unit 18 implements the modified discrete cosine transform, as an example of the orthogonal transformation, with respect to the time signals of the left frequency signal L0(k,n) and the right frequency signal R0(k,n) by using as is the window length determined by the residual signal window length determination unit 20, to transform the time signals of the left frequency signal L0(k,n) and the right frequency signal R0(k,n) to a set of MDCT coefficients. This makes the window length determination for the time signals of the left frequency signal L0(k,n) and the right frequency signal R0(k,n) in the determination unit 17 unnecessary, and thus reduces the computation load of the audio encoding device. A sketch of this order of operations follows.
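The following sketch illustrates only the order of operations in Embodiment 2, with assumed details (the eight-segment power test, the 70% threshold, and the sine window shape are illustrative, not mandated by the text):

```python
import numpy as np

def embodiment2_window(residual_time, short_len=128, long_len=1024):
    """Embodiment 2 order: decide the window length once, from the residual
    time signal, and reuse it as is for the downmix time signal, so the
    determination unit 17 needs no second decision."""
    segments = np.array_split(np.asarray(residual_time, dtype=float), 8)
    powers = np.array([np.mean(s ** 2) for s in segments])
    wlen = (short_len if np.max(np.abs(np.diff(powers))) > 0.7 * powers.mean()
            else long_len)
    # Assumed sine window over N = 2 * wlen samples; both the residual and
    # the downmix blocks are then transformed with this same window.
    window = np.sin(np.pi * (np.arange(2 * wlen) + 0.5) / (2 * wlen))
    return wlen, window
```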
These components included in the audio decoding device 3 are formed, for example, as separate hardware circuits using wired logic. Alternatively, these components included in the audio decoding device 3 may be implemented in the audio decoding device 3 as one integrated circuit in which circuits corresponding to the respective components are integrated. The integrated circuit may be, for example, an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA). Further, these components included in the audio decoding device 3 may be function modules achieved by a computer program executed on a processor of the audio decoding device 3.
The separation unit 31 receives a multiplexed encoded audio signal from the outside. The separation unit 31 separates the encoded downmix signal code, the spatial information code, and the residual signal code, which are contained in the encoded audio signal. The separation unit 31 is capable of using, for example, a method described in ISO/IEC14496-3 as a separation method. The separation unit 31 outputs a separated spatial information code to the spatial information decoding unit 32, a downmix signal code to the downmix signal decoding unit 33, and a residual signal code to the residual signal decoding unit 36.
The spatial information decoding unit 32 receives the spatial information code from the separation unit 31, and decodes the similarity ICCi(k) from the spatial information code by using, for example, a quantization table relative to the similarity.
The downmix signal decoding unit 33 receives the downmix signal code from the separation unit 31, decodes the signals (downmix signals) of the respective channels, for example, according to an Advanced Audio Coding (AAC) decoding method, and outputs them to the time-frequency transformation unit 34. The downmix signal decoding unit 33 may use, for example, a method described in ISO/IEC13818-7 as the AAC decoding method.
The time-frequency transformation unit 34 transforms the signal of each channel, which is a time signal decoded by the downmix signal decoding unit 33, to a frequency signal, for example, by using a QMF described in ISO/IEC14496-3, and outputs the frequency signal to the predictive decoding unit 35. The time-frequency transformation unit 34 may perform the time-frequency transformation by using a complex QMF illustrated in the following equation:
QMF(k,n) = exp(j·(π/128)·(k + 0.5)·(2n + 1)), 0 ≤ k < 64, 0 ≤ n < 128
Here, QMF(k,n) is a complex QMF using the time "n" and the frequency "k" as variables. The time-frequency transformation unit 34 outputs the time-frequency signals of the respective channels to the predictive decoding unit 35.
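Ignoring the prototype low-pass filter and the overlap handling of a real QMF filter bank, the modulation above might be rendered schematically as follows; this is a simplified sketch, not the standardized analysis.

```python
import numpy as np

def qmf_analysis(frame: np.ndarray, bands: int = 64) -> np.ndarray:
    """Project one frame of time samples onto 64 complex subbands using
    the exponential modulation above; the prototype low-pass filter of a
    real QMF bank is omitted, so this is schematic only."""
    n = np.arange(len(frame))
    k = np.arange(bands)[:, None]
    qmf = np.exp(1j * np.pi / (2 * bands) * (k + 0.5) * (2 * n + 1))
    return qmf @ frame   # one complex value per subband k
```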
The predictive decoding unit 35 performs predictive decoding of the predictively encoded center-channel signal C0(k,n) from the predictive coefficients received from the spatial information decoding unit 32, as appropriate, and the frequency signals received from the time-frequency transformation unit 34. For example, the predictive decoding unit 35 is capable of predictively decoding the center-channel signal C0(k,n) from the stereo frequency signal, namely the left frequency signal L0(k,n) and the right frequency signal R0(k,n), and the predictive coefficients c1(k) and c2(k), according to the following equation.
C0(k,n) = c1(k)·L0(k,n) + c2(k)·R0(k,n) (Equation 19)
When the intensity differences CLD1(k) and CLD2(k) are received from the spatial information decoding unit 32 instead of the predictive coefficients, the predictive decoding unit 35 may predictively decode the center-channel signal C0(k,n) by using Equation 19. The predictive decoding unit 35 outputs the left frequency signal L0(k,n), the right frequency signal R0(k,n), and the central frequency signal C0(k,n) to the upmix unit 37.
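Equation 19 is a linear combination computed per frequency k and time n; since c1(k) and c2(k) depend only on the band, a vectorized sketch (array names assumed) is straightforward.

```python
import numpy as np

def predict_center(l0: np.ndarray, r0: np.ndarray,
                   c1: np.ndarray, c2: np.ndarray) -> np.ndarray:
    """Equation 19: C0(k,n) = c1(k)*L0(k,n) + c2(k)*R0(k,n).
    l0 and r0 have shape (bands, frames); c1 and c2 depend only on the
    band k, so they broadcast across the time index n."""
    return c1[:, None] * l0 + c2[:, None] * r0
```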
The residual signal decoding unit 36 receives a residual signal code from the separation unit 31. The residual signal decoding unit 36 decodes the residual signal code, and outputs decoded residual signals (the left-channel residual signal resL(k,n) and the right-channel residual signal resR(k,n)) to the upmix unit 37.
The upmix unit 37 performs matrix transformation of the left frequency signal L0(k,n), the right frequency signal R0(k,n), and the central frequency signal C0(k,n), received from the predictive decoding unit 35, into the left-channel frequency signal Lout(k,n), the right-channel frequency signal Rout(k,n), and the center-channel frequency signal Cout(k,n). The upmix unit 37 then upmixes the matrix-transformed left-channel frequency signal Lout(k,n), right-channel frequency signal Rout(k,n), and center-channel frequency signal Cout(k,n), for example, to a 5.1-channel audio signal, based on the first spatial information SAC(k) received from the spatial information decoding unit 32 and the residual signals resL(k,n) and resR(k,n) received from the residual signal decoding unit 36. The upmixing may be performed by using, for example, a method described in ISO/IEC23003.
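The actual upmix matrices are derived from the transmitted spatial parameters as specified in ISO/IEC23003. As a deliberately simplified illustration only, a one-to-two upmix in which the residual signal stands in for the decorrelated component could be sketched as follows; the gain derivation below is an assumption, not the standardized computation.

```python
import numpy as np

def ott_upmix(mono: np.ndarray, res: np.ndarray, cld_db: float):
    """Split one channel into two using a channel level difference (dB);
    the residual replaces the decorrelated component (simplified)."""
    ratio = 10.0 ** (cld_db / 10.0)
    g_l = np.sqrt(2.0 * ratio / (1.0 + ratio))   # left gain from the CLD
    g_r = np.sqrt(2.0 / (1.0 + ratio))           # right gain from the CLD
    return g_l * mono + res, g_r * mono - res
```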
The frequency-time transformation unit 38 transforms the frequency signals received from the upmix unit 37 to time signals by using a frequency-time QMF corresponding to the QMF used by the time-frequency transformation unit 34.
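Under the same simplifications as the analysis sketch above (no prototype filter), the frequency-time direction can be indicated schematically as a conjugate-modulated sum; this is an assumption-laden sketch, not the standardized synthesis filter bank.

```python
import numpy as np

def qmf_synthesis(subbands: np.ndarray, frame_len: int = 128) -> np.ndarray:
    """Schematic inverse of the analysis sketch: sum the complex
    subbands against the conjugate modulation and take the real part."""
    bands = subbands.shape[0]
    n = np.arange(frame_len)
    k = np.arange(bands)[:, None]
    qmf = np.exp(1j * np.pi / (2 * bands) * (k + 0.5) * (2 * n + 1))
    return (np.conj(qmf).T @ subbands).real / bands
```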
In such a manner, the audio decoding device disclosed in Embodiment 3 is capable of accurately decoding an encoded audio signal with a reduced delay amount.
The computer 100 as a whole is controlled by a processor 101. The processor 101 is connected to a random access memory (RAM) 102 and multiple peripheral devices via a bus 109. The processor 101 may be a multiprocessor. The processor 101 is, for example, a CPU, a micro processing unit (MPU), a digital signal processor (DSP), an application specific integrated circuit (ASIC), or a programmable logic device (PLD). Further, the processor 101 may be a combination of two or more elements selected from a CPU, an MPU, a DSP, an ASIC, and a PLD.
For example, the processor 101 is capable of performing the processing of the functional blocks described above.
The RAM 102 is used as a main storage device of the computer 100. The RAM 102 temporarily stores at least a portion of the operating system (OS) program run by the processor 101 and application programs. Further, the RAM 102 stores various data to be used for processing by the processor 101.
Peripheral devices connected to the bus 109 include a hard disk drive (HDD) 103, a graphic processing device 104, an input interface 105, an optical drive device 106, a device connection interface 107, and a network interface 108.
The HDD 103 magnetically writes data to and reads data from a built-in disk. The HDD 103 is used, for example, as an auxiliary storage device of the computer 100. The HDD 103 stores the OS program, application programs, and various data. The auxiliary storage device may instead be a semiconductor storage device such as a flash memory.
The graphic processing device 104 is connected to a monitor 110. The graphic processing device 104 displays various images on a screen of the monitor 110 in accordance with an instruction given by the processor 101. The monitor 110 includes a display device using a cathode ray tube (CRT), a liquid crystal display device, and so on.
The input interface 105 is connected to a keyboard 111 and a mouse 112. The input interface 105 transmits signals sent from the keyboard 111 and the mouse 112 to the processor 101. The mouse 112 is an example of a pointing device; another pointing device, such as a touch panel, a tablet, a touch pad, or a track ball, may be used instead.
The optical drive device 106 reads data stored in an optical disk 113 by utilizing a laser beam. The optical disk 113 is a portable recording medium in which data is recorded so as to be readable by light reflection. Examples of the optical disk 113 include a digital versatile disc (DVD), a DVD-RAM, a compact disc read-only memory (CD-ROM), a CD-Recordable (CD-R), and a CD-ReWritable (CD-RW). A program stored in the optical disk 113 serving as a portable recording medium is installed in the audio encoding device 1 or the audio decoding device 3 via the optical drive device 106. The installed program is executable on the audio encoding device 1 or the audio decoding device 3.
The device connection interface 107 is a communication interface for connecting peripheral devices to the computer 100. For example, a memory device 114 and a memory reader/writer 115 may be connected to the device connection interface 107. The memory device 114 is a recording medium having a function for communication with the device connection interface 107. The memory reader/writer 115 is a device configured to write data into a memory card 116 or read data from the memory card 116. The memory card 116 is a card-type recording medium.
The network interface 108 is connected to a network 117, and transmits and receives data to and from other computers or communication devices via the network 117.
The computer 100 implements, for example, the above-mentioned processing functions by executing a program recorded in a computer-readable recording medium. A program describing details of the processing to be executed by the computer 100 may be stored in various recording media. The program may include one or more function modules; for example, the program may include function modules which implement the processing of the functional blocks described above.
In the embodiments described above, the components of the respective illustrated devices need not be physically configured as illustrated. That is, the specific manner of separation and integration of the devices is not limited to the illustrated one, and all or a portion of the devices may be functionally or physically separated or integrated in arbitrary units depending on various loads and usage conditions.
Further, according to other embodiments, the channel signal coding of the audio encoding device may be performed by coding the stereo frequency signal according to a different coding method. Moreover, the multi-channel audio signal to be encoded or decoded is not limited to a 5.1-channel audio signal; for example, it may be an audio signal having multiple channels such as 2 channels, 3 channels, 3.1 channels, or 7.1 channels. In this case as well, the audio encoding device calculates the frequency signal of each channel by performing time-frequency transformation of the audio signal of that channel, and then downmixes the frequency signals of the channels to generate a frequency signal with fewer channels than the original audio signal.
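For instance, a 5.1-channel frequency signal could be folded into three channels by adding the surround and LFE components to the front and center channels with a fixed gain; the channel pairing and the gain of 1/sqrt(2) in the sketch below are illustrative assumptions, not the standardized coefficients.

```python
SQRT_HALF = 2.0 ** -0.5   # illustrative folding gain (an assumption)

def downmix_51_to_3(l, r, c, sl, sr, lfe):
    """Fold a 5.1-channel frequency signal into three channels;
    works element-wise on scalars or NumPy arrays."""
    l_in = l + SQRT_HALF * sl      # left + left surround
    r_in = r + SQRT_HALF * sr      # right + right surround
    c_in = c + SQRT_HALF * lfe     # center + LFE
    return l_in, r_in, c_in
```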
Audio encoding devices according to the above embodiments may be implemented in various devices used to convey or record an audio signal, such as a computer, a video signal recorder, or a video transmission apparatus.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2013-259524 | Dec 2013 | JP | national |