1. Field of the Invention
The present invention relates to an apparatus for processing an audio signal and method thereof. Although the present invention is suitable for a wide scope of applications, it is particularly suitable for encoding or decoding audio signals.
2. Discussion of the Related Art
Generally, within one frame of an audio signal, a low frequency band signal and a high frequency band signal are correlated. Exploiting this correlation, an audio signal can be compressed by a band extension technology that encodes high frequency band spectral data using low frequency band spectral data.
However, in the related art, if the correlation between the low frequency band signal and the high frequency band signal is low, compressing the audio signal with a band extension scheme degrades its sound quality.
In particular, since this correlation is not high for a sibilant or the like, the band extension scheme is not suitable for such a sound.
Meanwhile, there are band extension schemes of various types, and the type of band extension scheme applied to an audio signal may change over time. In this case, the sound quality may be momentarily degraded in an interval where the type changes.
Accordingly, the present invention is directed to an apparatus for processing an audio signal and method thereof that substantially obviate one or more of the problems due to limitations and disadvantages of the related art.
An object of the present invention is to provide an apparatus for processing an audio signal and method thereof, by which a band extension scheme can be selectively applied according to a characteristic of an audio signal.
Another object of the present invention is to provide an apparatus for processing an audio signal and method thereof, by which a suitable scheme can be adaptively applied according to a characteristic of an audio signal per frame instead of using a band extension scheme.
A further object of the present invention is to provide an apparatus for processing an audio signal and method thereof, by which a quality of sound can be maintained by avoiding an application of a band extension scheme if an analyzed audio signal characteristic is close to sibilant.
Another object of the present invention is to provide an apparatus for processing an audio signal and method thereof, by which band extension schemes of various types are applied over time according to a characteristic of an audio signal.
Another object of the present invention is to provide an apparatus for processing an audio signal and method thereof, by which artifacts can be reduced in an interval where the band extension scheme type changes, in case of applying band extension schemes of various types.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
To achieve these and other advantages and in accordance with the purpose of the present invention, as embodied and broadly described, a method for processing an audio signal, comprising: receiving a spectral data of lower band and type information indicating a particular band extension scheme for a current frame of the audio signal from among a plurality of band extension schemes including a first band extension scheme and a second band extension scheme, by an audio processing apparatus; when the type information indicates the first band extension scheme for the current frame, generating a spectral data of higher band in the current frame using the spectral data of lower band by performing the first band extension scheme; and when the type information indicates the second band extension scheme for the current frame, generating the spectral data of higher band in the current frame using the spectral data of lower band by performing the second band extension scheme, wherein the first band extension scheme is based on a first data area of the spectral data of lower band, and wherein the second band extension scheme is based on a second data area of the spectral data of lower band.
According to the present invention, the first data area is a portion of the spectral data of lower band, and, wherein the second data area is a plurality of portions including the portion of the spectral data of lower band.
According to the present invention, the first data area is a portion of the spectral data of lower band, and, wherein the second data area is all of the spectral data of lower band.
According to the present invention, the second data area is greater than the first data area.
According to the present invention, the higher band comprises at least one band equal to or higher than a boundary frequency and wherein the lower band comprises at least one band equal to or lower than the boundary frequency.
According to the present invention, the first band extension scheme is performed using at least one operation of bandpass filtering, time stretching processing and decimation processing.
According to the present invention, the method further comprises receiving band extension information including envelope information, wherein the first band extension scheme or the second band extension scheme is performed using the band extension information.
According to the present invention, the method further comprises decoding the spectral data of lower band according to either an audio coding scheme on frequency domain or a speech coding scheme on time domain, wherein the spectral data of higher band is generated using the decoded spectral data of lower band.
To further achieve these and other advantages and in accordance with the purpose of the present invention, an apparatus for processing an audio signal, comprising: a de-multiplexer receiving a spectral data of lower band and type information indicating a particular band extension scheme for a current frame of the audio signal from among a plurality of band extension schemes including a first band extension scheme and a second band extension scheme; a first band extension decoding unit, when the type information indicates the first band extension scheme for the current frame, generating a spectral data of higher band in the current frame using the spectral data of lower band by performing the first band extension scheme; and a second band extension decoding unit, when the type information indicates the second band extension scheme for the current frame, generating the spectral data of higher band in the current frame using the spectral data of lower band by performing the second band extension scheme, wherein the first band extension scheme is based on a first data area of the spectral data of lower band, and wherein the second band extension scheme is based on a second data area of the spectral data of lower band.
According to the present invention, the de-multiplexer further receives band extension information including envelope information, and the first band extension scheme or the second band extension scheme is performed using the band extension information.
According to the present invention, the apparatus further comprises an audio signal decoder decoding the spectral data of lower band according to an audio coding scheme on frequency domain; and, a speech signal decoder decoding the spectral data of lower band according to a speech coding scheme on time domain, wherein the spectral data of higher band is generated using the spectral data of lower band decoded by either the audio signal decoder or the speech signal decoder.
To further achieve these and other advantages and in accordance with the purpose of the present invention, a method for processing an audio signal, comprising: detecting a transient proportion for a current frame of the audio signal by an audio processing apparatus; determining a particular band extension scheme for the current frame among a plurality of band extension schemes including a first band extension scheme and a second band extension scheme based on the transient proportion; generating type information indicating the particular band extension scheme; when the particular band extension scheme is the first band extension scheme for the current frame, generating a spectral data of higher band in the current frame using the spectral data of lower band by performing the first band extension scheme; when the particular band extension scheme is the second band extension scheme for the current frame, generating the spectral data of higher band in the current frame using the spectral data of lower band by performing the second band extension scheme; and transferring the type information and the spectral data of lower band, wherein the first band extension scheme is based on a first data area of the spectral data of lower band, and wherein the second band extension scheme is based on a second data area of the spectral data of lower band.
To further achieve these and other advantages and in accordance with the purpose of the present invention, an apparatus for processing an audio signal, comprising: a transient detecting part detecting a transient proportion for a current frame of the audio signal; a type information generating part determining a particular band extension scheme for the current frame among a plurality of band extension schemes including a first band extension scheme and a second band extension scheme based on the transient proportion, the type information generating part generating type information indicating the particular band extension scheme; a first band extension encoding unit, when the particular band extension scheme is the first band extension scheme for the current frame, generating a spectral data of higher band in the current frame using the spectral data of lower band by performing the first band extension scheme; a second band extension encoding unit, when the particular band extension scheme is the second band extension scheme for the current frame, generating the spectral data of higher band in the current frame using the spectral data of lower band by performing the second band extension scheme; and a multiplexer transferring the type information and the spectral data of lower band, wherein the first band extension scheme is based on a first data area of the spectral data of lower band, and wherein the second band extension scheme is based on a second data area of the spectral data of lower band.
To further achieve these and other advantages and in accordance with the purpose of the present invention, a computer-readable medium comprising instructions stored thereon, which, when executed by a processor, cause the processor to perform operations, the instructions comprising: receiving a spectral data of lower band and type information indicating a particular band extension scheme for a current frame of an audio signal from among a plurality of band extension schemes including a first band extension scheme and a second band extension scheme, by an audio processing apparatus; when the type information indicates the first band extension scheme for the current frame, generating a spectral data of higher band in the current frame using the spectral data of lower band by performing the first band extension scheme; and when the type information indicates the second band extension scheme for the current frame, generating the spectral data of higher band in the current frame using the spectral data of lower band by performing the second band extension scheme, wherein the first band extension scheme is based on a first data area of the spectral data of lower band, and wherein the second band extension scheme is based on a second data area of the spectral data of lower band.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are intended to provide further explanation of the invention as claimed.
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention.
In the drawings:
Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings. First of all, terminologies or words used in this specification and claims are not to be construed as limited to their general or dictionary meanings and should be construed as having the meanings and concepts matching the technical idea of the present invention, based on the principle that an inventor is able to appropriately define the concepts of the terminologies to describe the inventor's invention in the best way. The embodiment disclosed in this disclosure and the configurations shown in the accompanying drawings are just one preferred embodiment and do not represent every technical idea of the present invention. Therefore, it is understood that the present invention covers the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents at the time of filing this application.
The following terminologies in the present invention can be construed based on the following criteria, and other terminologies not explained here can be construed according to the same purposes. First of all, it is understood that the concept ‘coding’ in the present invention can be construed as either encoding or decoding, as the case may be. Secondly, ‘information’ in this disclosure is a terminology that generally includes values, parameters, coefficients, elements and the like, and its meaning may occasionally be construed differently, by which the present invention is non-limited.
In this disclosure, in a broad sense, an audio signal is conceptually discriminated from a video signal and designates all kinds of signals that can be auditorily identified. In a narrow sense, the audio signal means a signal having no or few speech characteristics. The audio signal of the present invention should be construed in the broad sense. However, the audio signal of the present invention can be understood as a narrow-sense audio signal when it is used in distinction from a speech signal.
Referring to
The encoder side 100 of the audio signal processing apparatus determines whether to apply a band extension scheme according to a characteristic of an audio signal and then generates coding scheme information according to the determination. Subsequently, the decoder side 200 selects whether to apply the band extension scheme per frame according to the coding scheme information.
The sibilant detecting unit 110 detects a sibilant proportion for a current frame of an audio signal. Based on the detected sibilant proportion, the sibilant detecting unit 110 generates coding scheme information indicating whether the band extension scheme will be applied to the current frame. In this case, the sibilant proportion means the extent to which a sibilant is present in the current frame. A sibilant is a consonant such as a hissing sound generated by friction of air passing through a narrow gap between the teeth. For instance, such a sibilant includes ‘’, ‘’ and the like in Korean, and the consonant ‘s’ in English. Meanwhile, an affricate is a consonant sound that begins as a plosive and becomes a fricative, such as ‘’, ‘’, ‘’, etc. in Korean. In this disclosure, ‘sibilant’ is not limited to a specific sound but indicates a sound whose peak band, i.e., the band having maximum energy, belongs to a frequency band higher than that of other sounds. Detailed configuration of the sibilant detecting unit 110 will be explained later with reference to
As a result of detecting the sibilant proportion, if it is determined that a prescribed frame has a low sibilant proportion, the audio signal is encoded by the first encoding unit 122. If it is determined that a prescribed frame has a high sibilant proportion, the audio signal is encoded by the second encoding unit 124.
The first encoding unit 122 is an element that encodes an audio signal by a frequency domain based band extension scheme. In this scheme, spectral data corresponding to a higher band in wide band spectral data is encoded using all or a portion of a narrow band. This scheme can reduce the number of bits by exploiting the correlation between a high frequency band and a low frequency band. The band extension scheme is based on a frequency domain, and the spectral data is data frequency-transformed by a QMF (quadrature mirror filter) filterbank or the like. A decoder reconstructs spectral data of a higher band from narrow band spectral data using band extension information. In this case, the higher band is a band having a frequency equal to or higher than a boundary frequency. The narrow band (or lower band) is a band having a frequency equal to or lower than the boundary frequency and is constructed with consecutive bands. This frequency domain based band extension scheme may conform with the SBR (spectral band replication) or eSBR (enhanced spectral band replication) standard, by which the present invention is non-limited.
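As a minimal sketch only, the decoder-side reconstruction of a higher band from narrow band spectral data can be illustrated as follows. The function name extend_band, the use of one target energy per higher band as the envelope, the fixed band width, and the choice of the top of the narrow band as the source are all assumptions of this sketch; it is not the SBR or eSBR procedure.

```python
import math


def extend_band(narrow, target_energies, band_width):
    """Sketch of decoder-side band extension in the frequency domain.

    narrow:          decoded narrow band (lower band) spectral coefficients
    target_energies: one energy value per higher band (stands in for the
                     envelope carried in the band extension information)
    band_width:      number of coefficients per higher band (assumed fixed)
    """
    # Assumption: the top of the narrow band is the source for every higher band.
    source = narrow[-band_width:]
    source_energy = sum(x * x for x in source)
    higher = []
    for target in target_energies:
        # Scale the copied coefficients so the band has the target energy.
        gain = math.sqrt(target / source_energy) if source_energy > 0 else 0.0
        higher.extend(gain * x for x in source)
    return higher
```

Because only the target energies are needed in addition to the narrow band, fewer bits are spent than when the higher band coefficients are coded directly. The sketch also shows why the result depends on the higher band actually resembling the narrow band.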
Meanwhile, this frequency domain based band extension scheme is based on the correlation between a high frequency band and a low frequency band. And, this correlation may be strong or weak according to a characteristic of an audio signal. Specifically, in case of the above-mentioned sibilant, since the correlation is weak, if a band extension scheme is applied to a frame corresponding to the sibilant, a sound quality may be degraded. The application relation between energy characteristic of the sibilant and the frequency domain based band extension scheme will be explained in detail with reference to
The second encoding unit 124 is a unit that encodes an audio signal without using the frequency domain based band extension scheme. This does not mean that no band extension scheme of any type is used; rather, only the specific frequency domain based band extension scheme applied by the first encoding unit 122 is not used. First of all, the second encoding unit 124 corresponds to a speech signal encoder that applies a linear predictive coding (LPC) scheme. Secondly, the second encoding unit 124 further includes a module according to a time domain based band extension scheme as well as a speech encoder. Thirdly, the second encoding unit 124 is able to further include a module according to a PSDD (partial spectral data duplication) scheme newly proposed by this application. The corresponding details will be explained with reference to
The multiplexer 130 generates at least one bitstream by multiplexing the audio signal encoded by the first encoding unit 122 or the second encoding unit 124 with the coding scheme information generated by the sibilant detecting unit 110.
The demultiplexer 210 of the decoder side extracts the coding scheme information from the bitstream and then delivers an audio signal of a current frame to the first decoding unit 222 or the second decoding unit 224 based on the coding scheme information. The first decoding unit 222 decodes the audio signal by the above-mentioned band extension scheme and the second decoding unit 224 decodes the audio signal by the above-mentioned LPC scheme (or HBE/PSDD scheme).
Referring to
The transforming part 112 transforms a time domain audio signal into a frequency domain signal by performing frequency transform on an audio signal. In this case, this frequency transform can use one of FFT (fast Fourier transform), MDCT (modified discrete cosine transform) and the like, by which the present invention is non-limited.
The energy estimating part 114 calculates energy per band for a current frame by grouping the frequency domain audio signal into several bands. The energy estimating part 114 then determines the peak band Bmax, i.e., the band having maximum energy in the whole band. The sibilant deciding part 116 detects a sibilant proportion of the current frame by deciding whether the peak band Bmax is higher or lower than a threshold band Bth. This is based on the characteristic that a vocal sound has maximum energy at a low frequency, whereas a sibilant has maximum energy at a high frequency. In this case, the threshold band Bth may be a preset default value or a value calculated according to a characteristic of an inputted audio signal.
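The transform, energy estimation and decision steps described above can be sketched as follows. This is an illustration under stated assumptions: the function names, the equal-width band split, the naive DFT standing in for the FFT or MDCT, and the default values of num_bands and threshold_band are hypothetical and are not taken from this disclosure.

```python
import cmath
import math


def band_energies(frame, num_bands):
    """Return the energy in each of num_bands equal-width frequency bands."""
    n = len(frame)
    half = n // 2  # bins from DC up to, but not including, Nyquist
    # Naive DFT for clarity; a practical system would use an FFT or MDCT.
    spectrum = [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(half)
    ]
    width = half // num_bands  # leftover bins at the top are ignored
    return [
        sum(abs(c) ** 2 for c in spectrum[b * width:(b + 1) * width])
        for b in range(num_bands)
    ]


def is_sibilant(frame, num_bands=8, threshold_band=4):
    """Decide 'sibilant' when the peak band Bmax is higher than Bth."""
    energies = band_energies(frame, num_bands)
    b_max = max(range(num_bands), key=lambda b: energies[b])
    return b_max > threshold_band
```

For example, a frame dominated by a tone near the top of the spectrum yields a peak band above the threshold and is classified as sibilant, whereas a frame dominated by a low tone is not.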
Referring to
Meanwhile, the formerly mentioned frequency domain based band extension scheme encodes a higher band higher than a boundary frequency using a narrow band lower than the boundary frequency. This scheme is based on the correlation between spectral data of narrow band and spectral data of higher band. Yet, in case of a signal of which energy peak exists in a high frequency, the correlation is relatively reduced. Thus, if the frequency domain based band extension scheme for predicting spectral data of higher band using spectral data of the narrow band is applied, it may degrade a quality of sound. Therefore, to a current frame decided as sibilant, it is preferable that another scheme is applied rather than the frequency domain based band extension scheme.
Referring now to
Referring to (A) of
Meanwhile, a second encoding unit 124b according to a second embodiment includes an HBE encoding part 124b-1 and an LPC encoding part 124b-2. And, a second decoding unit 224b according to the second embodiment includes an LPC decoding part 224b-1 and an HBE decoding part 224b-2. The HBE encoding part 124b-1 and the HBE decoding part 224b-2 are elements for encoding/decoding an audio signal according to an HBE scheme. The HBE (high band extension) scheme is a kind of time domain based band extension scheme. An encoder generates HBE information, i.e., spectral envelope modeling information and frame energy information, for a high frequency signal and also generates an excitation signal for a low frequency signal. In this case, the spectral envelope modeling information may correspond to information obtained by transforming an LP coefficient, generated through time domain based LP (linear prediction) analysis, into an ISP (immittance spectral pair). The frame energy information may correspond to information determined by comparing original energy to synthesized energy per 64 subframes. A decoder generates a high frequency signal by shaping an excitation signal of a low frequency signal using the spectral envelope modeling information and the frame energy information. This HBE scheme differs from the above-mentioned frequency domain based band extension scheme in being based on a time domain. In terms of its time axis waveform, the sibilant is a very complicated and random noise-like signal. If the sibilant is band-extended based on a frequency domain, the result may become very inaccurate. Yet, since the HBE scheme is based on a time domain, it is able to appropriately process the sibilant. Meanwhile, if the HBE scheme further includes post-processing for reducing buzziness of a high frequency excitation signal, it is able to further enhance performance on a sibilant frame.
Meanwhile, the LPC encoding part 124b-2 and the LPC decoding part 224b-1 perform the same functions as the elements 124a-1 and 224a-1 having the same names in the first embodiment. According to the first embodiment, linear predictive encoding/decoding is performed on a whole band of a current frame. Yet, according to the second embodiment, linear predictive encoding is performed not on a whole band but on a narrow band (or lower band) after execution of HBE. After the linear predictive decoding has been performed on the narrow band, HBE decoding is performed.
A second encoding unit 124c according to a third embodiment includes a PSDD encoding part 124c-1 and an LPC encoding part 124c-2. And, a second decoding unit 224c according to the third embodiment includes an LPC decoding part 224c-1 and a PSDD decoding part 224c-2. The frequency domain based band extension scheme performed by the first encoding unit 122 shown in
Meanwhile, the LPC encoding and decoding parts described with reference to (A) to (C) of
Referring to (A) of
In this case, a band for transferring data to a decoder includes a low frequency band (sfb0, . . . , sfbs−1) and a copy band (cb) (sfbs, sfbn−4, sfbn−2) in a whole band (sfb0, . . . , sfbn−1). The copy band is a band starting from a start band (sb) or a start frequency and is used for prediction of a target band (tb) (sfbs+1, sfbn−3, sfbn−1). The target band is a band predicted using the copy band and does not transfer spectral data to a decoder.
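The relation between transferred bands and target bands can be sketched as follows, under stated assumptions. The function name psdd_reconstruct, the dictionary layout, and the optional per-band gain are hypothetical. The sketch further assumes that each target band is predicted by plain duplication of the copy band immediately below it and that both bands have the same width.

```python
def psdd_reconstruct(transmitted, num_bands, gains=None):
    """Fill every band missing from `transmitted` using the band just below it.

    transmitted: dict mapping band index -> list of spectral coefficients
                 (the low frequency bands and the copy bands)
    num_bands:   total number of bands n in the whole band
    gains:       optional dict mapping target band index -> scale factor
                 (hypothetical; defaults to 1.0, i.e. plain duplication)
    """
    gains = gains or {}
    bands = []
    for b in range(num_bands):
        if b in transmitted:
            # Low frequency band or copy band: spectral data was transferred.
            bands.append(list(transmitted[b]))
        else:
            # Target band: no data was transferred, so predict it from the
            # copy band directly below.
            if b - 1 not in transmitted:
                raise ValueError(f"no copy band available for target band {b}")
            g = gains.get(b, 1.0)
            bands.append([g * x for x in transmitted[b - 1]])
    return bands
```

With n = 8 and s = 2, for instance, bands 0 and 1 form the low frequency band, bands 2, 4 and 6 are copy bands, and bands 3, 5 and 7 are target bands that are never transferred.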
Referring to (A) of
In (A) of
Referring to (B) of
Referring to (A) of
Referring to (A) of
Referring to
The plural-channel encoder 305 generates a mono or stereo downmix signal by receiving an input of a plurality of channel signals (at least two channel signals) (hereinafter named a multi-channel signal) and then performing downmixing thereon. And, the plural-channel encoder 305 generates spatial information necessary to upmix a downmix signal into a multi-channel signal. In this case, the spatial information can include channel level difference information, inter-channel correlation information, channel prediction coefficient, downmix gain information and the like. If the audio signal encoding device 300 receives a mono signal, it is understood that the mono signal can bypass the plural-channel encoder 305 without being downmixed.
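As a rough illustration of downmixing together with one item of spatial information, the following sketch produces a mono downmix of a stereo pair and a channel level difference. The function name downmix_stereo, the averaging downmix, and the decibel energy ratio are assumptions chosen for simplicity; this disclosure does not specify how either quantity is computed.

```python
import math


def downmix_stereo(left, right):
    """Return (mono downmix, channel level difference in dB)."""
    # Assumed downmix rule: the sample-wise average of the two channels.
    mono = [0.5 * (l + r) for l, r in zip(left, right)]
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    eps = 1e-12  # avoids log of zero and division by zero for silent channels
    cld_db = 10.0 * math.log10((e_left + eps) / (e_right + eps))
    return mono, cld_db
```

A decoder holding the mono downmix and such a level difference has enough information to redistribute energy between the two output channels when upmixing.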
The sibilant detecting unit 310 detects a sibilant proportion of a current frame. If the detected sibilant proportion is non-sibilant, the sibilant detecting unit 310 delivers an audio signal to the first encoding unit 322. If the detected sibilant proportion is sibilant, an audio signal bypasses the first encoding unit 322 and the sibilant detecting unit 310 delivers the audio signal to the speech signal encoder 340. The sibilant detecting unit 310 generates coding scheme information indicating whether a band extension coding scheme is applied to the current frame and then delivers the generated coding scheme information to the multiplexer 350.
The first encoding unit 322 generates spectral data of narrow band and band extension information by applying the frequency domain based band extension scheme, which was described with reference to
If a specific frame or segment of a downmix signal has a large audio characteristic, the audio signal encoder 330 encodes the downmix signal according to an audio coding scheme. In this case, the audio coding scheme may follow the AAC (advanced audio coding) standard or the HE-AAC (high efficiency advanced audio coding) standard, by which the present invention is non-limited. Meanwhile, the audio signal encoder 330 may correspond to an MDCT (modified discrete cosine transform) encoder.
If a specific frame or segment of a downmix signal has a large speech characteristic, the speech signal encoder 340 encodes the downmix signal according to a speech coding scheme. In this case, the speech coding scheme may follow the AMR-WB (adaptive multi-rate wide-band) standard, by which the present invention is non-limited. Meanwhile, the speech signal encoder 340 can further include the former LPC (linear prediction coding) encoding part 124a-1, 124b-2 or 124c-2 described with reference to
And, the multiplexer 350 generates an audio signal bitstream by multiplexing spatial information, coding scheme information, band extension information, spectral data and the like.
As mentioned in the foregoing description,
Referring to
The demultiplexer 510 extracts spectral data, coding scheme information, band extension information, spatial information and the like from an audio signal bitstream. The demultiplexer 510 delivers an audio signal corresponding to a current frame to the audio signal decoder 520 or the speech signal decoder 530 according to the coding scheme information. In particular, in case that the coding scheme information indicates that a band extension scheme is applied to the current frame, the demultiplexer 510 delivers the audio signal to the audio signal decoder 520. In case that the coding scheme information indicates that a band extension scheme is not applied to the current frame, the demultiplexer 510 delivers the audio signal to the speech signal decoder 530.
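The per-frame routing performed by the demultiplexer 510 can be sketched as follows. The function name route_frame, the boolean form of the coding scheme information, and the representation of the two decoders as callables are assumptions made for illustration only.

```python
def route_frame(frame, band_extension_applied, audio_decoder, speech_decoder):
    """Deliver one frame to the decoder selected by the coding scheme information.

    band_extension_applied: True when the coding scheme information indicates
                            that a band extension scheme is applied to the frame
    audio_decoder, speech_decoder: callables standing in for the audio signal
                            decoder 520 and the speech signal decoder 530
    """
    decoder = audio_decoder if band_extension_applied else speech_decoder
    return decoder(frame)
```

Because the decision is made for each frame independently, consecutive frames of the same bitstream may be handled by different decoders.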
If spectral data corresponding to a downmix signal has a large audio characteristic, the audio signal decoder 520 decodes the spectral data according to an audio coding scheme. In this case, as mentioned in the foregoing description, the audio coding scheme can follow the AAC standard or the HE-AAC standard. Meanwhile, the audio signal decoder 520 can include a dequantizing unit (not shown in the drawing) and an inverse transform unit (not shown in the drawing). Therefore, the audio signal decoder 520 is able to perform dequantization and inverse transform on spectral data and scale factor carried on a bitstream.
If the spectral data has a large speech characteristic, the speech signal decoder 530 decodes a downmix signal according to a speech coding scheme. As mentioned in the foregoing description, the speech coding scheme may follow the AMR-WB (adaptive multi-rate wide-band) standard, by which the present invention is non-limited. As mentioned in the foregoing description with reference to
The first decoding unit 540 decodes a band extension information bitstream and then generates an audio signal of a high frequency band by applying the aforesaid frequency domain based band extension scheme to an audio signal using the decoded information.
If the decoded audio signal is a downmix, the plural-channel decoder 550 generates an output channel signal of a multi-channel signal (stereo signal included) using spatial information.
As mentioned in the foregoing description,
The audio signal processing apparatus according to the present invention is available for use in various products. These products can be grouped into a stand alone group and a portable group. A TV, a monitor, a settop box and the like can be included in the stand alone group. And, a PMP, a mobile phone, a navigation system and the like can be included in the portable group.
Referring to
A user authenticating unit 720 receives an input of user information and then performs user authentication. The user authenticating unit 720 can include at least one of a fingerprint recognizing unit 720A, an iris recognizing unit 720B, a face recognizing unit 720C and a voice recognizing unit 720D. The fingerprint recognizing unit 720A, the iris recognizing unit 720B, the face recognizing unit 720C and the voice recognizing unit 720D receive fingerprint information, iris information, face contour information and voice information, respectively, and then convert them into user information. The user authentication is performed by determining whether each item of user information matches pre-registered user data.
An input unit 730 is an input device enabling a user to input various kinds of commands and can include at least one of a keypad unit 730A, a touchpad unit 730B and a remote controller unit 730C, by which the present invention is non-limited.
A signal coding unit 740 performs encoding or decoding on an audio signal and/or a video signal, which is received via the wire/wireless communication unit 710, and then outputs an audio signal in time domain. The signal coding unit 740 includes an audio signal processing apparatus 745. As mentioned in the foregoing description, the audio signal processing apparatus 745 corresponds to the above-described embodiment of the present invention. Thus, the audio signal processing apparatus 745 and the signal coding unit 740 including the same can be implemented by one or more processors.
A control unit 750 receives input signals from input devices and controls all processes of the signal coding unit 740 and an output unit 760. In particular, the output unit 760 is an element configured to output an output signal generated by the signal coding unit 740 and the like and can include a speaker unit 760A and a display unit 760B. If the output signal is an audio signal, it is outputted to a speaker. If the output signal is a video signal, it is outputted via a display.
Referring to (A) of
Referring to FIG. (B) of
Referring to
The type determining unit 1110 analyzes an inputted audio signal and then detects a transient proportion. The type determining unit 1110 discriminates a stationary interval and a transient interval from each other. Based on this discrimination, the type determining unit 1110 determines a band extension scheme of a specific type for a current frame among at least two band extension schemes and then generates type information for identifying the determined scheme. Detailed configuration of the type determining unit 1110 will be explained later with reference to
The first band extension encoding unit 1120 encodes a corresponding frame according to the band extension scheme of a first type. And, the second band extension encoding unit 1122 encodes a corresponding frame according to the band extension scheme of a second type. The first band extension encoding unit 1120 is able to perform bandpass filtering, time stretching processing, decimation processing and the like. The first type band extension scheme and the second type band extension scheme will be explained in detail with reference to
The multiplexer 1130 generates an audio signal bitstream by multiplexing the lower band spectral data generated by the first and second band extension encoding units 1120 and 1122 and the type information generated by the type determining unit 1110 and the like. The demultiplexer 1210 of the decoder side 1200 extracts the lower band spectral data, the type information and the like from the audio signal bitstream. Subsequently, the demultiplexer 1210 delivers a current frame to the first or second band extension decoding unit 1220 or 1222 according to the band extension scheme type indicated by the type information. The first band extension decoding unit 1220 reversely decodes the current frame according to the first type band extension scheme encoded by the first band extension encoding unit 1120. Moreover, the first band extension decoding unit 1220 is able to perform bandpass filtering, time stretching processing, decimation processing and the like. Likewise, the second band extension decoding unit 1222 generates spectral data of higher band using the lower band spectral data in a manner of decoding the current frame according to the second type band extension scheme.
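The multiplexing and demultiplexing described above can be illustrated by the following minimal sketch. The frame layout used here (a 1-byte type field, a 2-byte coefficient count and 32-bit floating point spectral data) is purely an assumption made for illustration and is not the bitstream syntax of the present invention. The function names mux_frame, demux_frame and route are likewise hypothetical.

```python
import struct
from typing import List, Sequence, Tuple

# Hypothetical frame layout (an assumption for illustration only):
#   1 byte  : type information (1 = first type, 2 = second type)
#   2 bytes : number of lower band spectral coefficients
#   4*N     : lower band spectral data as little-endian float32


def mux_frame(type_info: int, lower_band: Sequence[float]) -> bytes:
    """Pack type information and lower band spectral data into one frame."""
    header = struct.pack("<BH", type_info, len(lower_band))
    payload = struct.pack(f"<{len(lower_band)}f", *lower_band)
    return header + payload


def demux_frame(frame: bytes) -> Tuple[int, List[float]]:
    """Extract type information and lower band spectral data from a frame."""
    type_info, count = struct.unpack_from("<BH", frame, 0)
    lower_band = list(struct.unpack_from(f"<{count}f", frame, 3))
    return type_info, lower_band


def route(type_info: int) -> str:
    """Name the decoding unit that receives the current frame."""
    units = {
        1: "first band extension decoding unit 1220",
        2: "second band extension decoding unit 1222",
    }
    if type_info not in units:
        raise ValueError(f"unknown band extension type: {type_info}")
    return units[type_info]
```

In this sketch, a frame packed with type information 2 is unpacked with the same type information and delivered to the second band extension decoding unit 1222.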
Referring to
The transient detecting part 1112 discriminates a stationary interval and a transient interval from each other by analyzing energy of an inputted audio signal. The stationary interval is an interval in which energy of an audio signal is flat, whereas the transient interval is an interval in which energy of an audio signal varies abruptly. Since energy varies abruptly in the transient interval, a listener may have difficulty in recognizing an artifact that occurs due to a type change of a band extension scheme. On the contrary, since sound flows smoothly in the stationary interval, if a band extension scheme type is changed in this interval, the sound seems to be interrupted abruptly and instantly. Hence, when it is necessary to change a type of a band extension scheme from a first type into a second type, if the type is changed not in the stationary interval but in the transient interval, the artifact due to the type change can be hidden, like the masking effect according to a psychoacoustic model.
Thus, the type information generating part 1114 determines the band extension scheme of a specific type for a current frame among at least two band extension schemes and then generates type information indicating the determined band extension scheme. At least two band extension schemes will be described with reference to
In order to determine a specific band extension scheme, the type information generating part 1114 temporarily determines a type of a band extension scheme by referring to a coding scheme received from the coding scheme deciding part 1140 and then finally determines the type of the band extension scheme by referring to the information received from the transient detecting part 1112. This is explained in detail with reference to
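The energy-based discrimination and the deferral of a type change to a transient interval can be sketched as follows. This is a minimal illustration under stated assumptions, not the detector of the present invention: the frame-to-frame energy ratio test, the threshold value 4.0, and the names frame_energies, detect_transients and schedule_types are all hypothetical.

```python
from typing import List, Sequence


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean-square energy of each full, non-overlapping frame."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def detect_transients(energies: Sequence[float], ratio: float = 4.0,
                      floor: float = 1e-9) -> List[bool]:
    """Flag a frame as transient when its energy rises or falls by more
    than `ratio` relative to the previous frame (assumed criterion)."""
    flags = [False] * len(energies)
    for i in range(1, len(energies)):
        prev = max(energies[i - 1], floor)
        cur = max(energies[i], floor)
        flags[i] = cur / prev > ratio or prev / cur > ratio
    return flags


def schedule_types(desired: Sequence[int], transient: Sequence[bool]) -> List[int]:
    """Follow the desired band extension type per frame, but switch the
    applied type only on frames flagged as transient."""
    applied: List[int] = []
    current = desired[0] if desired else 0
    for want, is_transient in zip(desired, transient):
        if want != current and is_transient:
            current = want  # the type change is hidden by the energy jump
        applied.append(current)
    return applied
```

For example, if the desired types for four frames are 1, 2, 2, 2 and only the third frame is transient, the applied types become 1, 1, 2, 2, so the change from the first type to the second type is postponed from the stationary second frame to the transient third frame.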
Referring to
The following first band extension scheme may correspond to the first band extension scheme mentioned with reference to
As mentioned in the foregoing description, a band extension scheme generates wideband spectral data using narrowband spectral data. In this case, the narrowband may correspond to a lower band, whereas a newly generated band may correspond to a higher band.
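The general idea of generating a higher band from a lower band can be sketched as follows. This is a generic copy-and-scale illustration only and corresponds to neither the first type nor the second type band extension scheme of the present invention. The function name extend_band and the per-coefficient gains, which stand in for band extension information, are assumptions.

```python
from typing import List, Sequence


def extend_band(lower: Sequence[float], n_high: int,
                gains: Sequence[float]) -> List[float]:
    """Return wideband spectral data: the lower band followed by `n_high`
    coefficients copied from the lower band and scaled by `gains`."""
    if not lower:
        raise ValueError("lower band must not be empty")
    if len(gains) != n_high:
        raise ValueError("one gain per generated coefficient is required")
    # Cycle through the lower band as the copy source for the higher band.
    high = [lower[k % len(lower)] * gains[k] for k in range(n_high)]
    return list(lower) + high
```

In this sketch the narrowband data is left untouched and only the newly generated coefficients are shaped by the gains, which mirrors the relation between the lower band and the newly generated higher band described above.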
Referring to (A) of
Referring to (B)-1 and (B)-2 of
Referring to
The plural channel encoder 1305 receives an input of a plural channel signal (signal having at least two channels). The plural channel encoder 1305 generates a mono or stereo downmix signal by downmixing the received signal and also generates spatial information required for upmixing the downmix signal into a multi-channel signal. In this case, the spatial information can include channel level difference information, inter-channel correlation information, channel prediction coefficient, downmix gain information and the like. If the audio signal encoding apparatus 1300 receives a mono signal, it is understood that the received mono signal can bypass the plural channel encoder 1305 instead of being downmixed by the plural channel encoder 1305.
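The downmixing and the generation of spatial information can be sketched for a stereo input as follows. This is a minimal illustration under stated assumptions: the downmix is taken as the simple average of the two channels, the channel level difference is expressed in dB as an energy ratio, and the inter-channel correlation is the normalized cross-correlation over the whole frame. The function name downmix_with_spatial_info is hypothetical, and an actual encoder would typically compute such parameters per frequency band.

```python
import math
from typing import List, Sequence, Tuple


def downmix_with_spatial_info(
    left: Sequence[float], right: Sequence[float], eps: float = 1e-12
) -> Tuple[List[float], float, float]:
    """Return (mono downmix, channel level difference in dB,
    inter-channel correlation) for one frame of a stereo signal."""
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]
    e_left = sum(l * l for l in left)
    e_right = sum(r * r for r in right)
    cross = sum(l * r for l, r in zip(left, right))
    # eps guards against division by zero and log of zero on silent frames.
    cld_db = 10.0 * math.log10((e_left + eps) / (e_right + eps))
    icc = cross / math.sqrt((e_left + eps) * (e_right + eps))
    return mono, cld_db, icc
```

When the right channel is a half-amplitude copy of the left channel, this sketch yields a channel level difference of about 6 dB and an inter-channel correlation of about 1.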
The type determining unit 1310 determines a type of a band extension scheme to apply to a current frame and then generates type information indicating the determined type. If a first band extension scheme is applied to a current frame, the type determining unit 1310 delivers an audio signal to the first band extension encoding unit 1320. If a second band extension scheme is applied to a current frame, the type determining unit 1310 delivers an audio signal to the second band extension encoding unit 1322. Each of the first and second band extension encoding units 1320 and 1322 generates band extension information for reconstructing a higher band using a lower band by applying a band extension scheme according to each type. Subsequently, a signal encoded by a band extension scheme is encoded by the audio signal encoder 1330 or the speech signal encoder 1340 according to a characteristic of the signal irrespective of a type of the band extension scheme. Coding scheme information according to the characteristic of the signal may include the information generated by the former coding scheme deciding part 1140 described with reference to
If a specific frame or segment of a downmix signal has a dominant audio characteristic, the audio signal encoder 1330 encodes the downmix signal according to an audio coding scheme. In this case, the audio coding scheme may follow the AAC (advanced audio coding) standard or the HE-AAC (high efficiency advanced audio coding) standard, by which the present invention is non-limited. Meanwhile, the audio signal encoder 1330 may include an MDCT (modified discrete cosine transform) encoder.
If a specific frame or segment of a downmix signal has a dominant speech characteristic, the speech signal encoder 1340 encodes the downmix signal according to a speech coding scheme. In this case, the speech coding scheme may follow the AMR-WB (adaptive multi-rate wideband) standard, by which the present invention is non-limited. Meanwhile, the speech signal encoder 1340 can further include an LPC (linear prediction coding) encoding part. If a harmonic signal has high redundancy on a time axis, it can be modeled by linear prediction for predicting a current signal from a past signal. In this case, if a linear prediction coding scheme is adopted, coding efficiency can be raised. Meanwhile, the speech signal encoder 1340 can include a time domain encoder.
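Linear prediction of a current signal from a past signal can be sketched as follows. This is a minimal illustration using the autocorrelation method with the Levinson-Durbin recursion, which is a common textbook technique and is not stated by the present invention to be the one used in the LPC encoding part. The function names autocorrelation and levinson_durbin are assumptions.

```python
from typing import List, Sequence


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Autocorrelation r[0..max_lag] of one frame (no windowing)."""
    return [
        sum(x[n] * x[n - lag] for n in range(lag, len(x)))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r: Sequence[float], order: int) -> List[float]:
    """Return predictor coefficients a[1..order] such that
    x[n] is approximated by sum_k a[k] * x[n - k]."""
    a = [0.0] * (order + 1)
    err = r[0]  # prediction error energy of the order-0 predictor
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predicted frame
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]
```

For a signal in which each sample is 0.9 times the previous sample, the first coefficient obtained by this sketch is approximately 0.9 and the second is approximately 0, so only a small residual remains to be coded.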
And, the multiplexer 1350 generates an audio signal bitstream by multiplexing spatial information, coding scheme information, band extension information, spectral data and the like.
Referring to
The demultiplexer 1410 extracts spatial information, coding scheme information, band extension information, spectral data and the like from an audio signal bitstream. According to the coding scheme information, the demultiplexer 1410 delivers an audio signal corresponding to a current frame to the audio signal decoder 1420 or the speech signal decoder 1430.
If the spectral data corresponding to a downmix signal has a dominant audio characteristic, the audio signal decoder 1420 decodes the spectral data according to an audio coding scheme. In this case, as mentioned in the foregoing description, the audio coding scheme can follow the AAC standard, the HE-AAC standard, etc. Meanwhile, the audio signal decoder 1420 can include a dequantizing unit (not shown in the drawing) and an inverse transform unit (not shown in the drawing). Therefore, the audio signal decoder 1420 is able to perform dequantization and inverse transform on the spectral data and scale factor carried on the bitstream.
If the spectral data has a dominant speech characteristic, the speech signal decoder 1430 decodes the downmix signal according to a speech coding scheme. As mentioned in the foregoing description, the speech coding scheme may follow the AMR-WB (adaptive multi-rate wideband) standard, by which the present invention is non-limited. And, the speech signal decoder 1430 can include an LPC decoding part.
As mentioned in the foregoing description, according to the type information indicating specific extension information among at least two band extension schemes, the audio signal is delivered to the first band extension decoding unit 1440 or the second band extension decoding unit 1442. The first/second band extension decoding unit 1440/1442 reconstructs wideband spectral data using a portion or whole part of the narrowband spectral data according to the band extension scheme of the corresponding type.
If the decoded audio signal is a downmix, the plural channel decoder 1450 generates an output channel signal of a multi-channel signal (stereo signal included) using the spatial information.
The audio signal processing apparatus according to the present invention can be used in various products. These products can be grouped into a stand-alone group and a portable group. A TV, a monitor, a set-top box and the like belong to the stand-alone group. A PMP, a mobile phone, a navigation system and the like belong to the portable group.
Referring to
An audio signal processing method according to the present invention can be implemented into a computer-executable program and can be stored in a computer-readable recording medium. And, multimedia data having a data structure of the present invention can be stored in the computer-readable recording medium. The computer-readable media include all kinds of recording devices in which data readable by a computer system are stored. The computer-readable media include ROM, RAM, CD-ROM, magnetic tapes, floppy discs, optical data storage devices, and the like for example. And, a bitstream generated by the above encoding method can be stored in the computer-readable recording medium or can be transmitted via wire/wireless communication network.
Accordingly, the present invention provides the following effects and/or advantages.
First of all, the present invention selectively applies a band extension scheme according to a characteristic of a signal per frame, thereby enhancing sound quality without considerably increasing the number of bits.
Secondly, the present invention applies an LPC (linear predictive coding) scheme suitable for a speech signal, an HBE (high band extension) scheme or a scheme (PSDD) newly proposed by the present invention to a frame determined as including a sound (e.g., sibilant) having high frequency band energy therein instead of a band extension scheme, thereby minimizing a loss of sound quality.
Thirdly, when various types of band extension scheme are applied over time, the present invention reduces the artifact that occurs in the interval where the band extension scheme type changes, thereby improving the sound quality of an audio signal to which a band extension scheme is applied.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention covers the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents.
Accordingly, the present invention is applicable to encoding and decoding an audio signal.
While the present invention has been described and illustrated herein with reference to the preferred embodiments thereof, it will be apparent to those skilled in the art that various modifications and variations can be made therein without departing from the spirit and scope of the invention. Thus, it is intended that the present invention covers the modifications and variations of this invention that come within the scope of the appended claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
10-2009-0090705 | Sep 2009 | KR | national |
This application claims the benefit of U.S. Provisional Application No. 61/100,263 filed on Sep. 25, 2008, U.S. Provisional Application No. 61/118,647, filed on Nov. 30, 2008, and KR Patent Application No. 10-2009-0090705, filed on Sep. 24, 2009, which are hereby incorporated by reference.
Number | Date | Country | |
---|---|---|---|
20100114583 A1 | May 2010 | US |
Number | Date | Country | |
---|---|---|---|
61100263 | Sep 2008 | US | |
61118647 | Nov 2008 | US |