1. Field of the Invention
The present invention relates to a method and an apparatus for processing an audio signal, which encode or decode the audio signal.
2. Discussion of the Related Art
In general, auditory masking is explained by psychoacoustic theory. The masking effect uses properties of the psychoacoustic theory in that low volume signals adjacent to high volume signals are overwhelmed by the high volume signals, thereby preventing a listener from hearing the low volume signals. During quantization of an audio signal, a quantization error occurs. Such quantization error may be appropriately allocated using a masking threshold, with the result that quantization noise may not be heard.
However, bits are insufficient for a low bit rate codec, with the result that it is not possible to completely mask such quantization noise. In this case, perceived distortion cannot be avoided, and therefore, it is necessary to allocate bits so as to minimize the perceived distortion.
According to the properties of the human auditory system, on the other hand, a speech signal is more sensitive to quantization noise of a frequency band having relatively low energy than to quantization noise of a frequency band having relatively high energy.
In particular, a psychoacoustic model based on a signal excitation pattern is applied to a signal containing a mixture of speech and music, and therefore, quantization noise is allocated irrespective of the human auditory property. As a result, it is not possible to effectively allocate a quantization error, thereby increasing perceived distortion.
Accordingly, the present invention is directed to a method for processing an audio signal and apparatus that substantially obviate one or more problems due to limitations and disadvantages of the related art.
An object of the present invention is to provide a method for processing an audio signal and apparatus that are capable of adjusting a masking threshold based on a relationship between the magnitude of energy and sensitivity of quantization noise, thereby efficiently quantizing an audio signal.
Another object of the present invention is to provide a method for processing an audio signal and apparatus that are capable of applying an auditory property for a speech signal with respect to an audio signal having a speech component and a non-speech component in a mixed state, thereby improving sound quality of the speech signal.
A further object of the present invention is to provide a method for processing an audio signal and apparatus that are capable of adjusting a masking threshold without use of additional bits under the same bit rate condition, thereby improving sound quality.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
To achieve these objects and other advantages and in accordance with the purpose of the invention, as embodied and broadly described herein, a method for processing an audio signal includes frequency-transforming an audio signal to generate a frequency spectrum, deciding a weighting per band corresponding to energy per band using the frequency spectrum, receiving a masking threshold based on a psychoacoustic model, applying the weighting to the masking threshold to generate a modified masking threshold, and quantizing the audio signal using the modified masking threshold.
The weighting per band may be generated based on a ratio of energy of a current band to average energy of a whole band.
The method for processing an audio signal may further include calculating loudness based on constraints of a given bit rate using the frequency spectrum, and the modified masking threshold may be generated based on the loudness.
The method for processing an audio signal may further include deciding a speech property with respect to the audio signal, and the step of deciding the weighting per band and the step of generating the modified masking threshold may be carried out in a band having the speech property of a whole band of the audio signal.
In another aspect of the present invention, a method for processing an audio signal includes frequency-transforming an audio signal to generate a frequency spectrum, deciding a weighting including a first weighting corresponding to a first band and a second weighting corresponding to a second band based on the frequency spectrum, receiving a masking threshold based on a psychoacoustic model, applying the weighting to the masking threshold to generate a modified masking threshold, and quantizing the audio signal using the modified masking threshold, wherein the audio signal is stronger in the first band than on average and is weaker in the second band than on average.
The first weighting may have a value of 1 or more, and the second weighting may have a value of 1 or less.
The modified masking threshold may be generated based on loudness per band, and the weighting per band may be applied to the loudness per band.
In another aspect of the present invention, an apparatus for processing an audio signal includes a frequency-transforming unit for frequency-transforming an audio signal to generate a frequency spectrum, a weighting decision unit for deciding a weighting per band corresponding to energy per band using the frequency spectrum, a masking threshold generation unit for receiving a masking threshold based on a psychoacoustic model and applying the weighting to the masking threshold to generate a modified masking threshold, and a quantization unit for quantizing the audio signal using the modified masking threshold.
The weighting per band may be generated based on a ratio of energy of a current band to average energy of a whole band.
The masking threshold generation unit may calculate loudness based on constraints of a given bit rate using the frequency spectrum, and the modified masking threshold may be generated based on the loudness.
In another aspect of the present invention, an apparatus for processing an audio signal includes a frequency-transforming unit for frequency-transforming an audio signal to generate a frequency spectrum, a weighting decision unit for deciding a weighting including a first weighting corresponding to a first band and a second weighting corresponding to a second band based on the frequency spectrum, a masking threshold generation unit for receiving a masking threshold based on a psychoacoustic model and applying the weighting to the masking threshold to generate a modified masking threshold, and a quantization unit for quantizing the audio signal using the modified masking threshold, wherein the audio signal is stronger in the first band than on average and is weaker in the second band than on average.
The first weighting may have a value of 1 or more, and the second weighting may have a value of 1 or less.
The modified masking threshold may be generated based on loudness per band, and the weighting per band may be applied to the loudness per band.
In another aspect of the present invention, a method for processing an audio signal includes receiving spectral data and a scale factor with respect to an audio signal and restoring the audio signal using the spectral data and the scale factor, wherein the spectral data and the scale factor are generated by applying a modified masking threshold to the audio signal, and the modified masking threshold is generated by applying a weighting per band corresponding to energy per band to a masking threshold based on a psychoacoustic model.
In a further aspect of the present invention, there is provided a storage medium for storing digital audio data, the storage medium being configured to be read by a computer, wherein the digital audio data include spectral data and a scale factor, the spectral data and the scale factor are generated by applying a modified masking threshold to an audio signal, and the modified masking threshold is generated by applying a weighting per band corresponding to energy per band to a masking threshold based on a psychoacoustic model.
The present invention has the following effects and advantages.
First, it is possible to adjust a masking threshold based on a relationship between the magnitude of energy and sensitivity of quantization noise, thereby minimizing perceived distortion even under a low bit rate condition.
Second, it is possible to apply the principles of human hearing to a speech signal while maintaining sound quality of a music signal. In addition, it is possible to improve sound quality of the speech signal without an increase in a bit rate.
Third, it is possible to effectively improve sound quality of a signal having a spectral tilt or formant, such as a speech vowel, without changing the bit rate.
It is to be understood that both the foregoing general description and the following detailed description of the present invention are exemplary and explanatory and are intended to provide further explanation of the invention as claimed.
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principle of the invention. In the drawings:
Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings. First of all, terminology used in this specification and claims must not be construed as limited to the general or dictionary meanings thereof and should be interpreted as having meanings and concepts matching the technical idea of the present invention based on the principle that an inventor is able to appropriately define the concepts of the terminologies to describe the invention in the best way possible. The embodiment disclosed herein and configurations shown in the accompanying drawings are only one preferred embodiment and do not represent the full technical scope of the present invention. Therefore, it is to be understood that the present invention covers the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents when this application was filed.
According to the present invention, terminology used in this specification can be construed as the following meanings and concepts matching the technical idea of the present invention. Specifically, ‘coding’ can be construed as ‘encoding’ or ‘decoding’ selectively and ‘information’ as used herein includes values, parameters, coefficients, elements and the like, and meaning thereof can be construed as different occasionally, by which the present invention is not limited.
In this disclosure, an audio signal in a broad sense is conceptually distinguished from a video signal and designates all kinds of signals that can be perceived by a human. In a narrow sense, the audio signal means a signal having no or few speech characteristics. “Audio signal” as used herein should be construed in the broad sense. However, when used in distinction from a speech signal, the audio signal of the present invention can be understood as an audio signal in the narrow sense.
Meanwhile, a frame indicates a unit used to encode or decode an audio signal, and is not limited in terms of sampling rate or time.
A method for processing an audio signal according to the present invention may be a spectral data encoding/decoding method, and an apparatus for processing an audio signal according to the present invention may be a spectral data encoding/decoding apparatus. In addition, the method for processing an audio signal according to the present invention may be an audio signal encoding/decoding method to which the spectral data encoding/decoding method is applied, and the apparatus for processing an audio signal according to the present invention may be an audio signal encoding/decoding apparatus to which the spectral data encoding/decoding apparatus is applied. Hereinafter, a spectral data encoding/decoding apparatus will be described, and a spectral data encoding/decoding method performed by the spectral data encoding/decoding apparatus will be described. Subsequently, an audio signal encoding/decoding apparatus and method, to which the spectral data encoding/decoding apparatus and method are applied, will be described.
Referring first to
Referring to
The weighting decision unit 122 decides a weighting per band corresponding to energy per band, based on the frequency spectrum (S120). Here, the frequency spectrum may be generated by the frequency-transforming unit 112 at Step S110, or the frequency spectrum may be generated from the input audio signal by the weighting decision unit 122. The weighting per band is provided to modify a masking threshold and is a value corresponding to energy per band. The weighting per band may be proportional to the energy per band. When the energy per band is higher than average (or is relatively high), the weighting per band may have a value of 1 or more. When the energy per band is lower than the average (or is relatively low), the weighting per band may have a value of 1 or less. The weighting per band will be described in detail with reference to
The psychoacoustic model 130 applies a masking effect to the input audio signal to generate a masking threshold. The masking effect is explained by psychoacoustic theory: low volume signals adjacent to high volume signals are overwhelmed by the high volume signals, thereby preventing a listener from hearing the low volume signals. For example, the highest gains may be seen around the middle of the auditory spectrum, and several bands having much lower gains may be present around the peak band. Here, the highest volume signal serves as a masker, and a masking curve is drawn based on the masker. The low volume signals covered by the masking curve serve as masked signals or maskees. Masking refers to excluding the masked signals and leaving only the remaining signals as effective signals. The masking threshold is generated based on the psychoacoustic model, which is an empirical model, using the masking effect.
The masking threshold generation unit 124 generates loudness through application of the weighting per band (S130) and receives the masking threshold from the psychoacoustic model 130 (S140). Subsequently, speech properties of the audio signal are analyzed. When the current band corresponds to an audio signal region (“YES” at Step S150), the weighting generated at Step S130 is applied to the masking threshold to generate a modified masking threshold (S160). At Step S160, the loudness may be further used, which will be described in detail with reference to
The quantization unit 114 quantizes a spectral coefficient based on the modified masking threshold to generate spectral data and a scale factor.
Where, X indicates a spectral coefficient, scalefactor indicates a scale factor, and spectral_data indicates spectral data.
Mathematical expression 1 is not an equality. Since both the scale factor and the spectral data are integers, their limited resolution makes it impossible to express an arbitrary X exactly. Consequently, the right side of Mathematical expression 1 may be expressed as X′, as represented by Mathematical expression 2 below.
An error may occur during quantization of the spectral coefficient. An error signal may indicate the difference between the original coefficient X and the quantized value X′ as represented by Mathematical expression 3 below.
Error=X−X′ [Mathematical expression 3]
Where, X is the same as in Mathematical expression 1, and X′ is the same as in Mathematical expression 2.
Energy corresponding to the error signal Error is a quantization error Eerror.
A scale factor and spectral data are obtained using the masking threshold Eth and the quantization error Eerror acquired as described above to satisfy a condition expressed in Mathematical expression 4 below.
Eth>Eerror [Mathematical expression 4]
Where, Eth indicates a masking threshold, and Eerror indicates a quantization error.
That is, since the quantization error is less than the masking threshold when the above condition is satisfied, noise due to quantization is covered by the masking effect. In other words, listeners cannot perceive the quantized noise.
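The search for a scale factor that satisfies Mathematical expression 4 can be illustrated with a short sketch. Because the exact form of Mathematical expression 1 is not reproduced in this text, the sketch assumes an AAC-style non-uniform quantizer, X′ = 2^(scalefactor/4) × spectral_data^(4/3). That quantizer form, the function names, and the search range `sf_min`..`sf_max` are illustrative assumptions, not the implementation of the embodiment.

```python
import math

def quantize(coeffs, scalefactor):
    """Map spectral coefficients X to integer spectral data (assumed AAC-style law)."""
    step = 2.0 ** (scalefactor / 4.0)
    return [int(math.copysign(round((abs(x) / step) ** 0.75), x)) for x in coeffs]

def dequantize(spectral_data, scalefactor):
    """Reconstruct X' from integer spectral data and an integer scale factor."""
    step = 2.0 ** (scalefactor / 4.0)
    return [math.copysign(abs(q) ** (4.0 / 3.0), q) * step for q in spectral_data]

def error_energy(coeffs, scalefactor):
    """E_error: energy of Error = X - X' over the band."""
    restored = dequantize(quantize(coeffs, scalefactor), scalefactor)
    return sum((x - y) ** 2 for x, y in zip(coeffs, restored))

def coarsest_masked_scalefactor(coeffs, e_th, sf_min=-40, sf_max=40):
    """Return the largest scale factor whose error energy stays below e_th.

    The range sf_min..sf_max is a hypothetical search window.
    Returns None if no scale factor in the window satisfies E_th > E_error.
    """
    for sf in range(sf_max, sf_min - 1, -1):
        if error_energy(coeffs, sf) < e_th:
            return sf
    return None
```

Raising the masking threshold `e_th` for a band can only keep or increase the scale factor returned by this search, which corresponds to a coarser step and fewer bits for that band.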
The entropy encoding unit 116 entropy codes the spectral data and the scale factor. The entropy coding may be performed based on a Huffman coding scheme, to which, however, the present invention is not limited. Subsequently, the entropy coded result is multiplexed to generate a bit stream.
Hereinafter, a first example of the weighting decision step (S120), the loudness generation step (S130), and the weighting application step (S160) of the method for processing an audio signal according to the embodiment of the present invention will be described with reference to
Referring to
A whole band is divided into a first band and a second band based on a frequency spectrum and energy (S122a). For example, the first band has higher energy than average energy of the whole band, and the second band has lower energy than average energy of the whole band. The first band may be a frequency band decided based on harmonic frequency. For example, a frequency corresponding to the harmonic frequency may be defined as represented by the following mathematical expression.
F0=[f1, . . . ,fM] [Mathematical expression 6]
The first band N having high energy may be defined as represented by the following mathematical expression based on the harmonic frequency.
N=[n1, . . . ,nM′] [Mathematical expression 7]
The remaining band, excluding the first band N, is the second band.
Subsequently, a first weighting corresponding to the first band and a second weighting corresponding to the second band are decided (S124a). For example, the first weighting and the second weighting may be decided as represented by the following mathematical expression.
a for ni∈N
b for ni∉N [Mathematical expression 8]
Where, a indicates a first weighting, and b indicates a second weighting.
The first weighting may have a value of 1 or more, and the second weighting may have a value of 1 or less. Specifically, the first weighting is a weighting with respect to a band having higher energy than average energy. The first weighting has a value of 1 or more so as to further increase the masking threshold. On the other hand, the second weighting is a weighting with respect to a band having lower energy than average energy. The second weighting has a value of 1 or less so as to further decrease the masking threshold.
Meanwhile, with respect to loudness r equally applied over the whole band, the first weighting is applied to the first band, and the second weighting is applied to the second band, to generate loudness per band (S130a). This may be defined as represented by the following mathematical expression.
r′=c×r, for ni∈N
r′=d×r, for ni∉N [Mathematical expression 9]
Where, r′ indicates loudness per band, c indicates a first weighting, d indicates a second weighting, and r indicates loudness.
The first weighting may have a value of 1 or more, and the second weighting may have a value of 1 or less. That is, the loudness is further increased in the band having high energy, and the loudness is further decreased in the band having low energy. In this way, the masking threshold is adjusted so as to maintain a modification effect of the masking threshold per frequency band. Meanwhile, the first weighting and the second weighting may be equal to those generated at Step S124a, to which, however, the present invention is not limited.
Hereinafter, a process of generating a modified masking threshold using the weighting decided at Step S124a and the loudness decided at Step S130a will be described. First, at Step S162a, when the current band of an audio signal is a first band (“YES” at Step S162a), a first weighting is applied to a masking threshold of the first band to generate a modified masking threshold (S164a). For example, the first weighting may be applied as represented by the following mathematical expression.
thr′(ni)=a×thr(ni), for ni∈N [Mathematical expression 10]
Where, thr(ni) indicates a masking threshold of the current band, a indicates a first weighting, and thr′(ni) indicates a modified masking threshold of the current band.
The first weighting may have a value of 1 or more. In this case, thr′(ni) may be greater than thr(ni). Increase of the masking threshold means that even high volume signals can be masked. Therefore, a larger quantization error may be allowed. That is, since auditory sensitivity is low in a band having relatively high energy, larger quantization noise is allowed to achieve bit reduction.
On the other hand, when the current band of an audio signal is a second band (“NO” at Step S162a), a second weighting is applied to a masking threshold (S166a). The second weighting may be applied as represented by the following mathematical expression.
thr′(ni)=b×thr(ni), for ni∉N [Mathematical expression 11]
Where, thr(ni) indicates a masking threshold of the current band, b indicates a second weighting, and thr′(ni) indicates a modified masking threshold of the current band.
The second weighting may have a value of 1 or less. In this case, thr′(ni) may be less than thr(ni). Decrease of the masking threshold means that only low volume signals can be masked. Therefore, a smaller quantization error is allowed. That is, since auditory sensitivity is high in a band having relatively low energy, little quantization noise is allowed to increase bit allocation and thus improve sound quality.
The first weighting and the second weighting are applied to the corresponding bands through Step S162a to Step S166a to generate a modified masking threshold.
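Steps S122a through S166a can be sketched as follows. The sketch assumes that the first band N is simply the set of bands whose energy exceeds the average energy of the whole band (the harmonic-frequency criterion is not modeled), and the default values 1.2 and 0.8 for the weightings are hypothetical placeholders chosen only to satisfy "1 or more" and "1 or less".

```python
def split_bands(energies):
    """S122a: return the index set N of bands with above-average energy."""
    avg = sum(energies) / len(energies)
    return {i for i, e in enumerate(energies) if e > avg}

def loudness_per_band(r, n_bands, first_band, c=1.2, d=0.8):
    """S130a: r' = c*r inside N and r' = d*r outside N (Mathematical expression 9)."""
    return [(c if i in first_band else d) * r for i in range(n_bands)]

def apply_two_weights(thresholds, first_band, a=1.2, b=0.8):
    """S162a-S166a: thr' = a*thr inside N and thr' = b*thr outside N
    (Mathematical expressions 10 and 11)."""
    return [(a if i in first_band else b) * thr
            for i, thr in enumerate(thresholds)]
```

For example, with band energies `[1.0, 9.0, 2.0, 8.0]` the average is 5.0, so bands 1 and 3 form the first band; their thresholds are raised while those of bands 0 and 2 are lowered.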
Meanwhile, loudness per band generated at Step S130a may also be used to generate a modified masking threshold. For example, a masking threshold modified as represented by the following mathematical expression may be generated.
Where, thrr(ni) indicates a modified masking threshold, thr′(ni) indicates the result at Step S164a or at Step S166a, r′ indicates loudness per band, en(n) indicates energy of the current band, and minSnr(n) indicates a minimum signal to noise ratio.
Hereinafter, an example of generating a weighting changed per band and applying the weighting to a masking threshold will be described with reference to
First, a relationship between a masking threshold based on a psychoacoustic model and a masking threshold to which loudness is applied is as follows.
$T_r(n) = \left(T(n)^{0.25} + r\right)^{4}$ [Mathematical expression 13]
Where, $T(n)$ indicates an initial masking threshold of an n-th frequency band based on a psychoacoustic model, $T_r(n)$ indicates a masking threshold to which loudness is applied, and $r$ indicates loudness.
The term r included in the above mathematical expression is loudness, which is a constant added to each scale factor band. A specific value of the loudness may be calculated from total perceived entropy Pe (sum of Pe values of the respective scale factor bands). Meanwhile, the perceived entropy may be developed as represented by the following mathematical expression so as to reveal a relationship between loudness and a threshold.
Where, pe(n) indicates perceived entropy, E(n) indicates energy of an n-th scale factor band, lq(n) indicates the estimated number of lines which are not 0 after quantization, and
and Tavg indicate an average approximate value of total thresholds.
When the desired perceived entropy pe_r at a given bit rate is substituted for Pe in the above mathematical expression, the constant loudness r is expressed as represented by the following mathematical expression.
r=2(A-pe
Tavg is an average value of initial masking thresholds. In this case, r may be assumed to be 0. When pe0 is total perceived entropy acquired from the initial masking thresholds, therefore, Tavg0.25 may be calculated to be 2(A-pe
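The relationship between loudness and perceived entropy described above can be sketched numerically. Because the expressions for the perceived entropy and for r are not fully reproduced in this text, the sketch rests on loudly stated assumptions: the per-band perceived entropy is taken as pe(n) = lq(n)·log2(E(n)/T_r(n)), the constant A is taken as the sum of lq(n)·log2(E(n)), and all thresholds are approximated by a single average Tavg. Under those assumptions the total entropy is about A − 4·lq·log2(Tavg^0.25 + r), with lq the sum of lq(n), which can be solved for r. The function names are hypothetical.

```python
import math

def perceived_entropy(E, T, lq, r=0.0):
    """Total Pe for thresholds T_r(n) = (T(n)**0.25 + r)**4 (assumed pe(n) form)."""
    return sum(l * math.log2(e / (t ** 0.25 + r) ** 4)
               for e, t, l in zip(E, T, lq))

def loudness_for_target(E, T, lq, pe_target):
    """Solve the averaged-threshold approximation for the constant loudness r."""
    A = sum(l * math.log2(e) for e, l in zip(E, lq))    # assumed definition of A
    lq_total = sum(lq)
    pe0 = perceived_entropy(E, T, lq)                   # entropy at r = 0
    t_avg_quarter = 2.0 ** ((A - pe0) / (4.0 * lq_total))  # Tavg**0.25
    return 2.0 ** ((A - pe_target) / (4.0 * lq_total)) - t_avg_quarter
```

When every band has the same initial threshold, the averaging step is exact and the returned r reproduces the target entropy exactly; otherwise the result is only an approximation that may need refinement.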
Meanwhile, Mathematical expression 13 may be modified to include a weighting w(n) as represented by the following mathematical expression.
$T_{wr}(n) = \left(T(n)^{0.25} + w(n)\,r\right)^{4}$ [Mathematical expression 16]
Where, w(n) indicates a weighting, which corresponds to energy per band. The weighting may be proportional to energy per band. Here, “proportional” means that a weighting increases as energy per band increases. However, this relationship is not necessarily directly proportional.
The weighting may be defined as a ratio of energy per band to average energy over the entire spectrum, for example, as follows.
Where, N indicates the number of whole frequency bands encoded, and Es(n) indicates a value of energy of an n-th band which is diffused using an energy expansion function. Energy contour depends upon a spectral envelope, which is suitable for introducing a perceptual weighting effect.
Therefore, the average energy across all bands is calculated first so as to obtain a weighting per band w(n) (S122b). Subsequently, energy Es(n) of the current band is calculated (S124b). A weighting per band w(n) is decided using the average energy calculated at Step S122b and the energy of the current band calculated at Step S124b (S126b).
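Steps S122b through S126b can be sketched as a ratio of each band's energy to the average energy over all encoded bands. The sketch assumes the input list already holds the spread energies Es(n); the spreading function itself is not modeled, and the function name is hypothetical.

```python
def band_weights(spread_energies):
    """w(n) = Es(n) / average of Es over all N encoded bands (S122b-S126b)."""
    n_bands = len(spread_energies)
    avg = sum(spread_energies) / n_bands          # S122b: average energy
    if avg <= 0.0:
        raise ValueError("average band energy must be positive")
    return [e / avg for e in spread_energies]     # S124b/S126b: per-band ratio
```

By construction the weights average to 1, so peak bands receive values above 1 and valley bands receive values below 1.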
The generated weighting w(n) is increased at a peak band and decreased at a valley band, and therefore, it is possible to control a bit rate reflecting a perceptual weighting concept. Since the masking threshold at the peak band becomes greater than the value of T, a larger quantization error is allowed there. On the other hand, at a band having lower energy than an intermediate value, i.e., at the valley band, the masking threshold is decreased so as to allocate a larger number of bits, with the result that the quantization error is reduced.
Such a weighting application concept may be more effective for a signal, such as a speech vowel, having a spectral tilt or a formant.
Meanwhile, when weighting change is too sharp, a serious auditory defect may occur. In order to prevent occurrence of such a serious auditory defect, w(n) may be restricted by a lower bound and an upper bound as represented by the following mathematical expression using the form of a sigmoid function so as to decide a modified weighting (per band) (S128b).
Where, $w(n)$ indicates a weighting, and $\tilde{w}(n)$ indicates a modified weighting.
The maximum value of $\tilde{w}(n)$ is 1.5, and the minimum value of $\tilde{w}(n)$ is 1/(1+e)+0.5 (approximately 0.77).
Like the weighting of Mathematical expression 17, the modified weighting $\tilde{w}(n)$ increases with the energy of a given band but is not directly proportional to it (i.e., there is no linear relationship between band energy and weighting). Meanwhile, Mathematical expression 18 may be variously modified according to a bit rate, signal properties, or usage, by which, however, the present invention is not limited.
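The bounding step S128b can be sketched with one sigmoid form that is consistent with the stated bounds. The formula below, 1/(1 + exp(−(w − 1))) + 0.5, is an assumption: it is not quoted from Mathematical expression 18, but it yields 1/(1+e)+0.5 at w = 0, exactly 1 at w = 1, and approaches 1.5 as w grows. The function name is hypothetical.

```python
import math

def bounded_weight(w):
    """Squash a raw weighting w >= 0 into the range [1/(1+e)+0.5, 1.5).

    Assumed sigmoid form; chosen only because it matches the stated bounds.
    """
    return 1.0 / (1.0 + math.exp(-(w - 1.0))) + 0.5
```

A band whose energy equals the average (w = 1) keeps its threshold unchanged, while very large or very small raw weightings are limited so that the threshold cannot change abruptly between neighboring bands.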
Loudness $r$ is decided to have a final value $\tilde{r}$ based on constraints of a bit rate (S130b). Hereinafter, Step S130b will be described in detail. When a loudness of $\tilde{w}(n)r$ is added as in the above mathematical expression, the masking threshold is increased. Consequently, audible quantization noise may be considered to have a specific loudness of $\tilde{w}(n)r$ at an n-th band, i.e., $N'_{noise}(n) = \tilde{w}(n)r$. Based on constraints of a bit rate, a value of $r$ may be decided so as to minimize the total noise loudness $N'_{noise}(n) = \tilde{w}(n)r$. In Mathematical expression 16, the perceived entropy due to $T_{wr}(n)$ is set to the desired perceived entropy pe_r according to constraints of a given bit rate. A cost function to solve this problem may be set using a Lagrange multiplier as represented by the following mathematical expression.
Where,
is related to constraints of a bit rate, and lq(n) and E(n) are the same as in Mathematical expression 14.
Assuming that $0 \le \tilde{w}(n)\,r / T(n)^{0.25} \ll 1$, the second term in parentheses of the above mathematical expression may be approximated by a second-order Taylor polynomial.
A constrained least squares problem is solved to calculate two roots r1 and r2 as represented by the following mathematical expression.
If both r1 and r2 are positive numbers, the final value $\tilde{r}$ is set to the smaller of the two. This is because the noise loudness $N'_{noise}(n) = \tilde{w}(n)r$ generated by the smaller value is less than that generated by the larger value. However, the smaller value is not always the correct root, because, as represented by Mathematical expression 21, r has a lower bound of zero. For example, if r1 is a negative number and r2 is a positive number, choosing the smaller value would select r1 even though r2 is the correct root. In that case, therefore, the final value $\tilde{r}$ is set to the larger of the two values.
A masking threshold for quantization is newly updated using a reduction value $\tilde{r}$ and an energy weighting $\tilde{w}(n)$. However, if the absolute difference between the desired perceived entropy pe_r and the resultant perceived entropy is greater than a predetermined masking threshold, an additional reduction value is calculated using Mathematical expression 22 and is added to $\tilde{r}$ using a conventional method.
Step S130b, i.e., the process of deciding loudness $r$ to have a final value $\tilde{r}$ based on constraints of a bit rate, has been described above.
A modified masking threshold $T_{wr}(n)$ is generated using the modified weighting $\tilde{w}(n)$ decided at Step S128b and the loudness $\tilde{r}$ decided at Step S130b (S160b). Mathematical expression 18 and Mathematical expression 22 may be substituted into Mathematical expression 16 so as to generate a modified masking threshold.
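Step S160b amounts to evaluating Mathematical expression 16 per band with the bounded weighting and the final loudness. A minimal sketch follows; the function and parameter names are hypothetical, and the weighting and loudness values are assumed to have been computed beforehand.

```python
def modified_thresholds(T, w_tilde, r_tilde):
    """T_wr(n) = (T(n)**0.25 + w_tilde(n) * r_tilde)**4 for every band n.

    T        initial masking thresholds from the psychoacoustic model
    w_tilde  bounded weighting per band
    r_tilde  final loudness (assumed non-negative)
    """
    return [(t ** 0.25 + w * r_tilde) ** 4 for t, w in zip(T, w_tilde)]
```

With equal initial thresholds, a band whose weighting is above 1 ends up with a higher modified threshold than a band whose weighting is below 1, so more quantization noise is tolerated at the peak and less at the valley.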
In
According to the present invention, a band having a relatively high intensity of energy may have a weighting of 1 or more, and a band having a relatively low intensity of energy may have a weighting of 1 or less. Therefore, a weighting of 1 or more is applied to the masking threshold ③ based on the psychoacoustic model in a band, such as the region A of
The demultiplexing unit (not shown) receives a bit stream and extracts spectral data and a scale factor from the received bit stream. The spectral data are generated from the spectral coefficient through quantization. In quantizing the spectral data, quantization noise is allocated in consideration of a masking threshold. Here, the masking threshold is not a masking threshold generated using a psychoacoustic model but a modified masking threshold generated by applying a weighting to the masking threshold generated by the psychoacoustic model. The modified masking threshold is provided to allocate larger quantization noise in a peak band and smaller quantization noise in a valley band.
The entropy decoding unit 212 entropy decodes spectral data. The entropy decoding may be performed based on a Huffman coding scheme, to which, however, the present invention is not limited.
The de-quantization unit 214 de-quantizes spectral data and a scale factor to generate a spectral coefficient.
The inverse transforming unit 216 performs frequency to time mapping to generate an output signal using the spectral coefficient. Here, the frequency to time mapping may be performed based on inverse quadrature mirror filterbank (IQMF) or inverse modified discrete Fourier transform (IMDCT), to which, however, the present invention is not limited.
The multi-channel encoder 310 receives a plurality of channel signals (two or more channel signals) (hereinafter, referred to as a multi-channel signal), performs downmixing to generate a mono downmixed signal or a stereo downmixed signal, and generates space information necessary to upmix the downmixed signal into a multi-channel signal. Here, space information may include channel level difference information, inter-channel correlation information, a channel prediction coefficient, downmix gain information, and the like. If the audio signal encoding device 300 receives a mono signal, the multi-channel encoder 310 may bypass the mono signal without downmixing the mono signal.
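Downmixing and one item of space information may be sketched in Python as follows, assuming a two-channel input, an equal-gain average as the mono downmix, and a channel level difference defined as the energy ratio of the two channels in decibels. These definitions and the function names are assumptions of the sketch, not the specific computation of the multi-channel encoder 310.

```python
import numpy as np

def downmix_to_mono(left, right):
    """Equal-gain mono downmix of a two-channel signal (assumed downmix rule)."""
    return 0.5 * (np.asarray(left, dtype=float) + np.asarray(right, dtype=float))

def channel_level_difference_db(left, right, eps=1e-12):
    """Energy ratio of the left channel to the right channel, in dB."""
    e_left = np.sum(np.square(np.asarray(left, dtype=float))) + eps
    e_right = np.sum(np.square(np.asarray(right, dtype=float))) + eps
    return 10.0 * np.log10(e_left / e_right)

left = np.array([1.0, -1.0, 1.0, -1.0])
right = 0.1 * left                               # right channel is 20 dB quieter
mono = downmix_to_mono(left, right)
cld = channel_level_difference_db(left, right)   # about 20 dB
```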
The band extension encoder 320 may generate band extension information to restore data of a downmixed signal excluding spectral data of a partial band (for example, a high frequency band) of the downmixed signal.
The audio signal encoder 330 encodes a downmixed signal using an audio coding scheme when a specific frame or segment of the downmixed signal has a high audio property. Here, the audio coding scheme may be based on an advanced audio coding (AAC) standard or a high efficiency advanced audio coding (HE-AAC) standard, to which, however, the present invention is not limited. Meanwhile, the audio signal encoder 330 may be a modified discrete cosine transform (MDCT) encoder.
The speech signal encoder 340 encodes a downmixed signal using a speech coding scheme when a specific frame or segment of the downmixed signal has a high speech property. Here, the speech coding scheme may be based on an adaptive multi-rate wide band (AMR-WB) standard, to which, however, the present invention is not limited. Meanwhile, the speech signal encoder 340 may also use a linear prediction coding (LPC) scheme. When a harmonic signal has high redundancy on the time axis, the harmonic signal may be modeled through linear prediction which predicts a current signal from a previous signal. In this case, the LPC scheme may be adopted to improve coding efficiency. Meanwhile, the speech signal encoder 340 may be a time domain encoder.
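Linear prediction of a current sample from previous samples may be sketched as follows. The sketch fits the predictor by ordinary least squares (the covariance method), which is only one of several possible estimation methods and is chosen here as an assumption for brevity; the function names are hypothetical.

```python
import numpy as np

def lagged_matrix(x, order):
    """Row for sample n holds the previous samples x[n-1], ..., x[n-order]."""
    return np.column_stack(
        [x[order - i - 1:len(x) - i - 1] for i in range(order)]
    )

def lpc_least_squares(signal, order):
    """Predictor coefficients a[1..order] minimising the prediction error."""
    x = np.asarray(signal, dtype=float)
    coeffs, *_ = np.linalg.lstsq(lagged_matrix(x, order), x[order:], rcond=None)
    return coeffs

def prediction_residual(signal, coeffs):
    """Difference between each sample and its linear prediction."""
    x = np.asarray(signal, dtype=float)
    order = len(coeffs)
    return x[order:] - lagged_matrix(x, order) @ coeffs

# A pure sinusoid obeys x[n] = 2*cos(w)*x[n-1] - x[n-2], so an order-2
# predictor removes it almost entirely.
omega = 0.3
harmonic = np.cos(omega * np.arange(200))
lpc = lpc_least_squares(harmonic, order=2)
residual = prediction_residual(harmonic, lpc)
```

Because the residual of such a highly redundant signal carries almost no energy, far fewer bits are needed to code it than to code the signal itself, which is the source of the coding efficiency mentioned above.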
The spectral data encoding device 350 performs frequency-transforming, quantization, and entropy encoding with respect to an input signal so as to generate spectral data. The spectral data encoding device 350 includes at least some (in particular, the weighting decision unit 122 and the masking threshold generation unit 124) of the components of the spectral data encoding device according to the embodiment of the present invention previously described with reference to
The multiplexer 360 multiplexes space information, band extension information, and spectral data to generate an audio signal bit stream.
The demultiplexer 410 demultiplexes an audio signal bit stream to extract spectral data, band extension information, and space information.
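The relationship between multiplexing and demultiplexing may be illustrated by the following Python sketch. The framing used here, in which each payload is preceded by a 4-byte big-endian length, is purely hypothetical and does not represent the actual bit stream syntax; the field names are likewise chosen only for the example.

```python
import struct

# Hypothetical field order within the bit stream.
FIELDS = ("space_info", "band_ext_info", "spectral_data")

def multiplex(space_info, band_ext_info, spectral_data):
    """Concatenate the three payloads, each with a 4-byte length prefix."""
    stream = b""
    for payload in (space_info, band_ext_info, spectral_data):
        stream += struct.pack(">I", len(payload)) + payload
    return stream

def demultiplex(stream):
    """Recover the three payloads by reading each length prefix in turn."""
    fields = {}
    offset = 0
    for name in FIELDS:
        (length,) = struct.unpack_from(">I", stream, offset)
        offset += 4
        fields[name] = stream[offset:offset + length]
        offset += length
    return fields

bit_stream = multiplex(b"\x01\x02", b"", b"\x10\x20\x30")
recovered = demultiplex(bit_stream)
```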
The spectral data decoding device 420 performs entropy decoding and de-quantization using spectral data and a scale factor. The spectral data decoding device 420 may include at least the de-quantization unit 214 of the spectral data decoding device 200 previously described with reference to
The audio signal decoder 430 decodes spectral data corresponding to a downmixed signal using an audio coding scheme when the spectral data has a high audio property. Here, the audio coding scheme may be based on an AAC standard or an HE-AAC standard, as previously described. The speech signal decoder 440 decodes a downmixed signal using a speech coding scheme when the spectral data has a high speech property. Here, the speech coding scheme may be based on an AMR-WB standard, as previously described, to which, however, the present invention is not limited.
The band extension decoder 450 decodes a bit stream of band extension information and generates spectral data of a different band (for example, a high frequency band) from some or all of the spectral data using this information.
When the decoded audio signal is downmixed, the multi-channel decoder 460 generates an output channel signal of a multi-channel signal (including a stereo channel signal) using space information.
The spectral data encoding device or the spectral data decoding device according to the present invention may be included in a variety of products, which may be divided into a standalone group and a portable group. The standalone group may include televisions (TV), monitors, and settop boxes, and the portable group may include portable media players (PMP), mobile phones, and navigation devices.
A user authentication unit 520 receives user information to authenticate a user. The user authentication unit 520 may include at least one selected from a group consisting of a fingerprint recognition unit 520A, an iris recognition unit 520B, a face recognition unit 520C, and a speech recognition unit 520D. The fingerprint recognition unit 520A, the iris recognition unit 520B, the face recognition unit 520C, and the speech recognition unit 520D receive fingerprint information, iris information, face profile information, and speech information, respectively, convert the received information into user information, and determine whether the user information coincides with registered user data to authenticate the user.
An input unit 530 allows a user to input various kinds of commands. The input unit 530 may include at least one selected from a group consisting of a keypad 530A, a touchpad 530B, and a remote control 530C, to which, however, the present invention is not limited. A signal coding unit 540 includes a spectral data encoding device 545 or a spectral data decoding device. The spectral data encoding device 545 includes at least the weighting decision unit and the masking threshold generation unit of the spectral data encoding device previously described with reference to
A controller 550 receives input signals from input devices and controls all processes of the signal coding unit 540 and an output unit 560. The output unit 560 outputs an output signal generated by the signal coding unit 540. The output unit 560 may include a speaker 560A and a display 560B. When an output signal is an audio signal, the output signal is output to the speaker. When an output signal is a video signal, the output signal is output to the display.
The method for processing an audio signal according to the present invention may be implemented as a program which can be executed by a computer. The program may be stored in a recording medium which can be read by the computer. Also, multimedia data having a data structure according to the present invention may be stored in a recording medium which can be read by the computer. The recording medium which can be read by the computer includes all kinds of devices that store data which can be read by the computer. Examples of the recording medium which can be read by the computer may include a read only memory (ROM), a random access memory (RAM), a compact disc ROM (CD-ROM), a magnetic tape, a floppy disc, and an optical data storage device. In addition, a recording medium employing a carrier wave (for example, transmission over the Internet) format may be further included. Also, a bit stream generated by the encoding method as described above may be stored in a recording medium which can be read by a computer or transmitted using a wired or wireless communication network.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit or scope of the inventions. Thus, it is intended that the present invention covers the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents.
The present invention is applicable to encoding and decoding of an audio signal.
Number | Date | Country | Kind |
---|---|---|---|
10-2009-0044622 | May 2009 | KR | national |
This application is the National Phase of PCT/KR2009/002745 filed on May 25, 2009, which claims priority under 35 U.S.C. 119(e) to U.S. Provisional Application No(s). 61/055,464 filed on May 23, 2008, 61/078,773 filed on Jul. 8, 2008 and 61/085,005 filed on Jul. 31, 2008 and under 35 U.S.C. 119(a) to Patent Application No. 10-2009-0044622 filed in the Republic of Korea on May 21, 2009, all of which are hereby expressly incorporated by reference into the present application.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/KR2009/002745 | 5/25/2009 | WO | 00 | 11/19/2010 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2009/142466 | 11/26/2009 | WO | A |
Number | Date | Country |
---|---|---|
20110075855 A1 | Mar 2011 | US |
Number | Date | Country |
---|---|---|
61055464 | May 2008 | US |
61078773 | Jul 2008 | US |
61085005 | Jul 2008 | US |