This application is based on and claims the benefit of priority from the prior Japanese Patent Application No. 2007-001708, filed on Jan. 9, 2007; the entire contents of which are incorporated herein by reference.
1. Technical Field
The present invention relates to an audio data processing apparatus, a terminal, and a method of audio data processing.
2. Description of Related Art
Background noise canceling techniques are generally known in the field of mobile phones. For example, JP-2004-289614 discloses a technique for improving the clarity of a voice signal, in which the voice signal from a microphone is emphasized based on the estimated signal characteristics of the background noise and the signal characteristics of the voice signal.
According to an aspect of the invention, there is provided an audio data processing apparatus including: a decoding unit configured to extract an encoding parameter from encoded audio data by decoding the encoded audio data; an acquisition unit configured to acquire a background noise signal; a correction gain calculating unit configured to calculate a correction gain for correcting frequency characteristics of the audio data by using the encoding parameter and the background noise signal; and a frequency characteristics correcting unit configured to correct the frequency characteristics of the audio data based on the correction gain.
In the accompanying drawings:
An embodiment of the present invention will be described below with reference to the accompanying drawings.
The audio data processing apparatus 10 corrects the frequency characteristics of the playback signal S40 based on an encoding parameter S20 output from the audio decoder 20 and a background noise signal S30 obtained by a microphone 30. Thus, the influence of background noise can be reduced not only in voice communication but also when listening to music, watching a broadcast, and the like.
More specifically, in the audio data processing apparatus 10, the encoded audio data S10, which is read from a storage medium (not shown) or received by an antenna (not shown), is input into a syntax analyzing unit 40. The syntax analyzing unit 40, serving as parsing means, decodes the encoded audio data S10 using, for example, Huffman decoding, and outputs the extracted audio encoding parameter S20 to an inverse quantizing unit 50. The encoding parameter S20 includes a quantization step size S20A, called a scale factor, and a quantized spectrum S20B composed of a plurality of quantization values obtained by quantizing the spectrum with the quantization step size S20A. The quantized spectrum S20B is thus quantized audio data in the frequency domain.
In general, an audio encoding method such as AAC (Advanced Audio Coding) reduces the redundancy of the spectrum (audio data) obtained by transforming the signal into the frequency domain.
In an audio encoder (not shown), the quantization step size S20A and the quantized spectrum S20B are controlled so that, in each frequency band (scale factor band) whose frequency resolution is based on the human auditory system, the quantization noise power stays at a level at which no noise is perceived (that is, the noise is masked). This control takes into account signal characteristics such as tonality (a characteristic indicating how predictable the signal is in the time domain) and the masking characteristics of hearing (the property that a signal component auditorily masks other signal components positioned near it in the time domain and the frequency domain).
The inverse quantizing unit 50 inversely quantizes the quantized spectrum S20B based on the quantization step size S20A, converting it into a spectrum S50 on the normal scale (audio data in the frequency domain).
A frequency-time transforming unit 60 transforms the spectrum S50 in the frequency domain into a PCM (Pulse Code Modulation) playback signal S40 in the time domain. The playback signal (PCM) S40 is sent via a frequency characteristics correcting unit 70 to a digital-analog (D/A) converting unit 80, which converts it into an analog signal (audio signal) that is output from headphones 90 serving as output means.
In the embodiment, the audio data processing apparatus 10 also corrects the frequency characteristics of the playback signal S40 so that voice, music, and the like can be listened to comfortably even in the presence of background noise. More specifically, in the audio data processing apparatus 10, the background noise is picked up by the microphone 30, which is also used for voice communication, and is input into a correction gain calculating unit 100 as a background noise signal S30.
The correction gain calculating unit 100 estimates the acceptable quantization noise power, that is, the quantization noise power at which no noise is perceived, by using the quantization step size S20A and the quantized spectrum S20B supplied from the syntax analyzing unit 40 via the inverse quantizing unit 50. It then calculates a correction gain for each frequency band to be corrected so that the power of the background noise signal S30 obtained by the microphone 30 becomes smaller than the acceptable quantization noise power.
First, the frequency characteristics correcting unit 70 applies a time-frequency transform to the playback signal S40 output from the frequency-time transforming unit 60 to generate a spectrum (audio data in the frequency domain). It then performs equalizing processing, that is, it corrects the frequency characteristics by multiplying the spectrum by a correction gain Gsm(k) calculated by the correction gain calculating unit 100.
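The multiplication step of the equalizing processing can be sketched as follows. This is a minimal illustration, assuming the spectrum is a plain list of coefficients and each band k is given by inclusive bin indices (low, high); the time-frequency transform itself is omitted, and the function name apply_band_gains is hypothetical.

```python
from typing import List, Sequence, Tuple

def apply_band_gains(spectrum: Sequence[float],
                     bands: Sequence[Tuple[int, int]],
                     gsm: Sequence[float]) -> List[float]:
    """Equalize a spectrum by multiplying each bin by its band's gain Gsm(k).

    bands[k] = (low, high) gives the inclusive bin range of band k
    (a hypothetical layout). Bins outside every band are left unchanged.
    """
    if len(bands) != len(gsm):
        raise ValueError("one gain per band is required")
    out = list(spectrum)
    for (low, high), gain in zip(bands, gsm):
        for i in range(low, high + 1):
            out[i] *= gain
    return out

# Two bands of two bins each; only the upper band is boosted.
print(apply_band_gains([1.0, 1.0, 1.0, 1.0], [(0, 1), (2, 3)], [1.0, 2.0]))
# [1.0, 1.0, 2.0, 2.0]
```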
Next, the frequency characteristics correcting unit 70 applies a frequency-time transform to the corrected spectrum to generate a playback signal S60 whose frequency characteristics have been corrected. The D/A converting unit 80 converts the playback signal S60 into an analog signal, which is played back from the headphones 90. The influence of the background noise is thus reduced, and sound quality can be improved.
A background noise power calculating unit 120 calculates, from the background noise spectrum S70, the background noise power for each frequency band (scale factor band), using the same frequency bands as the inverse quantization. It then corrects this power with coefficients calculated beforehand from the analog characteristics of the microphone 30 and the attenuation rate of the background noise leaking into the headphones 90, yielding the background noise power BGN(k), where k represents the index of each frequency band.
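A minimal sketch of the per-band noise power calculation follows. It assumes the background noise spectrum S70 is available as a list of (possibly complex) coefficients, that band power is the sum of squared magnitudes over the band, and that the precomputed microphone and leakage correction is a single multiplicative coefficient per band. The names background_noise_power and coeffs are hypothetical.

```python
from typing import List, Sequence, Tuple

def background_noise_power(noise_spectrum: Sequence[complex],
                           bands: Sequence[Tuple[int, int]],
                           coeffs: Sequence[float]) -> List[float]:
    """BGN(k): band power of the noise spectrum times a per-band coefficient.

    bands[k] = (sfb0(k), sfb1(k)) are inclusive bin indices. coeffs[k] is a
    hypothetical precomputed factor combining the microphone response and
    the attenuation of noise leaking into the headphones.
    """
    bgn = []
    for (lo, hi), c in zip(bands, coeffs):
        power = sum(abs(x) ** 2 for x in noise_spectrum[lo:hi + 1])
        bgn.append(c * power)
    return bgn
```

A single scalar per band keeps the calibration step cheap; a real device might instead store a measured frequency response and derive the coefficients from it.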
On the other hand, an acceptable quantization noise power calculating unit 130 calculates acceptable quantization noise power QN(k) by using the quantization step size S20A and quantized spectrum S20B outputted from the inverse quantizing unit 50 of the audio decoder 20.
More specifically, in the case where the audio encoding method is, for example, AAC, the inverse-quantization processing in the inverse quantizing unit 50 is represented by the following equation (1):
wherein k represents the index of the frequency band (scale factor band), sf(k) represents the quantization step size (scale factor), i represents a frequency index in the frequency band, q(i) represents the quantization value (quantized spectrum coefficient (integer)), and invq(i) represents an inverse-quantized value.
When the inverse-quantization value invq(i) of equation (1) is written as a function IQ(k, q(i)) of k and q(i), the quantization step size Qstep(k, i) corresponding to the quantization value q(i) is given by the following equation (2):
$$\mathrm{Qstep}(k,i) = \mathrm{IQ}\bigl(k,\, q(i)+0.5\bigr) - \mathrm{IQ}\bigl(k,\, q(i)-0.5\bigr) \tag{2}$$
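For reference, standard AAC (ISO/IEC 14496-3) dequantizes each value as sign(q)·|q|^(4/3)·2^((sf − 100)/4), where 100 is the scale factor offset SF_OFFSET. The sketch below assumes that form for IQ(k, q) and derives Qstep(k, i) from it as in equation (2): the width of the reconstruction interval between the half-integer neighbours of q(i). The function names iq and qstep are hypothetical.

```python
import math

SF_OFFSET = 100  # scale factor offset of AAC (ISO/IEC 14496-3)

def iq(sf: int, q: float) -> float:
    """IQ(k, q): AAC-style inverse quantization with scale factor sf = sf(k).

    Accepts non-integer q so it can be evaluated at q(i) +/- 0.5.
    """
    magnitude = abs(q) ** (4.0 / 3.0) * 2.0 ** ((sf - SF_OFFSET) / 4.0)
    return math.copysign(magnitude, q)

def qstep(sf: int, q: int) -> float:
    """Qstep(k, i) = IQ(k, q(i) + 0.5) - IQ(k, q(i) - 0.5), equation (2)."""
    return iq(sf, q + 0.5) - iq(sf, q - 0.5)
```

Because of the 4/3 power law, the step grows with |q(i)|, so larger coefficients in the same band carry coarser quantization.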
The quantization noise power QN(k) in the frequency band k is calculated by the following equation (3):
wherein sfb0(k) represents a low band end of the frequency index in the frequency band (scale factor band) k, and sfb1(k) represents a high band end of the frequency index in the frequency band k.
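Equation (3) is not spelled out above. A common model, used here purely as an assumption, treats each coefficient's error as uniform over its quantization step, giving a noise power of Qstep²/12 per coefficient, summed from sfb0(k) to sfb1(k). The function name quantization_noise_power is hypothetical.

```python
from typing import Sequence

def quantization_noise_power(qstep: Sequence[float], sfb0: int, sfb1: int) -> float:
    """QN(k) for band k, assuming uniform-quantizer noise Qstep^2 / 12 per bin.

    qstep[i] holds Qstep(k, i); sfb0 and sfb1 are the inclusive band ends.
    """
    return sum(s * s for s in qstep[sfb0:sfb1 + 1]) / 12.0
```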
Generally, the audio encoder calculates a masking threshold, that is, the noise level at which no quantization noise is perceived, in consideration of the signal level of the input signal and the masking characteristics of the human auditory system, and controls the quantization step size in accordance with the masking threshold.
Accordingly, when the noise power is smaller than the quantization noise power QN(k), no noise is perceived, so that noise power is acceptable. The acceptable quantization noise power calculating unit 130 therefore outputs this quantization noise power QN(k) as the acceptable quantization noise power QN(k) in the frequency band k.
A power comparing unit 140 compares the background noise power BGN(k) with the acceptable quantization noise power QN(k) for all the frequency bands. It outputs to a gain calculating unit 150 the index k of each frequency band to be corrected, that is, each band in which BGN(k) is larger than QN(k), together with BGN(k) and QN(k).
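The comparison performed by the power comparing unit 140 amounts to a filter over the bands. A minimal sketch, assuming BGN(k) and QN(k) are given as equal-length lists indexed by k (the function name bands_to_correct is hypothetical):

```python
from typing import List, Sequence, Tuple

def bands_to_correct(bgn: Sequence[float],
                     qn: Sequence[float]) -> List[Tuple[int, float, float]]:
    """Return (k, BGN(k), QN(k)) for every band where BGN(k) > QN(k)."""
    return [(k, b, q) for k, (b, q) in enumerate(zip(bgn, qn)) if b > q]
```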
The gain calculating unit 150 uses the following equation (4) to calculate a correction gain G(k) (> 1.0) that raises the signal level in each frequency band to be corrected so that the background noise power BGN(k) becomes smaller than the acceptable quantization noise power QN(k), and outputs G(k) to a gain smoothing unit 160.
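Equation (4) is likewise not given above. One plausible reading, stated here only as an assumption, is that the acceptable noise power scales with signal power, so boosting a band's amplitude by G raises QN(k) to G²·QN(k); solving G²·QN(k) = BGN(k) gives G(k) = √(BGN(k)/QN(k)), which exceeds 1.0 whenever BGN(k) > QN(k). The function name and the margin parameter are hypothetical.

```python
import math

def correction_gain(bgn: float, qn: float, margin: float = 1.0) -> float:
    """Hypothetical form of equation (4): amplitude gain G(k) for one band.

    Assumes boosting the band's amplitude by G raises QN(k) to G**2 * QN(k).
    Solving G**2 * QN(k) = margin * BGN(k) gives the value returned here;
    margin > 1 (illustrative, not from the document) leaves BGN(k) strictly
    below the raised threshold.
    """
    if qn <= 0.0:
        raise ValueError("QN(k) must be positive")
    return math.sqrt(margin * bgn / qn)
```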
The gain smoothing unit 160 smooths the correction gain G(k) and outputs the smoothed correction gain to the frequency characteristics correcting unit 70. This reduces the discontinuity in the characteristics around the corrected frequency bands, and the excessive difference between the corrected signal and the original signal, that would result from correcting the gain of only specific frequency bands.
When the background noise power BGN(k) is larger than the acceptable quantization noise power QN(k), the gain smoothing unit 160 calculates correction gains Gs(k) for the frequency bands in the vicinity of that band by using the following equation (5):
$$G_s(k) = \alpha(k_0,\, k-k_0)\cdot G(k_0) \tag{5}$$
wherein k0 represents the frequency band to be corrected, and α represents the smoothing coefficients. The smoothing coefficients α are positive constants defined for each frequency band and have a convex shape whose peak is α(k0, 0), corresponding to k = k0: the coefficients increase monotonically up to the peak and decrease monotonically after it.
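Equation (5) can be illustrated with a concrete peaked coefficient shape. The geometric decay below (1.0 at the peak, multiplied by 0.75 per band of distance, cut off beyond two bands) is a hypothetical choice that satisfies the stated constraints; the function names smoothing_coeff and spread_gain are likewise hypothetical.

```python
from typing import Dict

def smoothing_coeff(d: int, decay: float = 0.75, radius: int = 2) -> float:
    """Hypothetical alpha(k0, d) with d = k - k0.

    Peaks at 1.0 for d = 0 and falls geometrically with |d|, so it rises
    monotonically toward the peak and falls after it. Bands farther than
    `radius` from k0 get 0.0, i.e. no contribution.
    """
    return decay ** abs(d) if abs(d) <= radius else 0.0

def spread_gain(k0: int, g_k0: float, num_bands: int) -> Dict[int, float]:
    """Gs(k) = alpha(k0, k - k0) * G(k0), equation (5), for bands near k0."""
    gs = {}
    for k in range(num_bands):
        a = smoothing_coeff(k - k0)
        if a > 0.0:
            gs[k] = a * g_k0
    return gs

print(spread_gain(2, 2.0, 5))
# {0: 1.125, 1: 1.5, 2: 2.0, 3: 1.5, 4: 1.125}
```

With G(k0) = 2.0, the neighbouring bands receive progressively smaller boosts instead of an abrupt step at band k0.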
On the other hand, a mask ratio calculating unit 170 (power ratio calculating unit) calculates, in consideration of the masking characteristics of the human auditory system, a mask ratio SMR(k), which is the ratio of the power of the inverse-quantized spectrum S50 to the acceptable quantization noise power QN(k) in the frequency band k to be corrected. It does so by using QN(k) together with the quantization step size S20A and the quantized spectrum S20B.
More specifically, the mask ratio calculating unit 170 calculates the mask ratio SMR(k) in the frequency band k from the acceptable quantization noise power QN(k) and the inverse-quantization values invq(i) by using the following equation (6), and outputs it to the gain smoothing unit 160.
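Following the description of SMR(k) as the power of the inverse-quantized spectrum relative to QN(k), a minimal sketch assumes equation (6) is the sum of invq(i)² over the band divided by QN(k), kept as a linear ratio rather than in decibels. The function name mask_ratio is hypothetical.

```python
from typing import Sequence

def mask_ratio(invq: Sequence[float], sfb0: int, sfb1: int, qn: float) -> float:
    """SMR(k): inverse-quantized signal power in band k divided by QN(k).

    sfb0 and sfb1 are the inclusive band ends; the linear-ratio form is an
    assumption of this sketch.
    """
    if qn <= 0.0:
        raise ValueError("QN(k) must be positive")
    signal_power = sum(x * x for x in invq[sfb0:sfb1 + 1])
    return signal_power / qn
```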
The gain smoothing unit 160 corrects the smoothing coefficient α in the frequency domain in accordance with the mask ratio SMR(k). More specifically, the gain smoothing unit 160 compares the mask ratio SMR(k) with a predetermined threshold and, when SMR(k) is larger than the threshold, corrects α to be smaller (a steeper slope). If multiple thresholds are provided, α may be corrected in multiple stages.
Writing the correction of the smoothing coefficient α as a function F( ), the corrected smoothing coefficient αSMR(k0, k) is given by the following equation (7):
$$\alpha_{\mathrm{SMR}}(k_0,k) = \alpha(k_0,\, k-k_0)\cdot F\bigl(\mathrm{SMR}(k_0)\bigr) \tag{7}$$
A frequency band with a large mask ratio SMR generally has strong tonality (a weak noise-like character) and little influence on neighboring bands. The smoothing coefficients for the neighboring bands, α(k0, k − k0) with k ≠ k0, are therefore corrected to be small, so that the monotonic rise and fall around the peak become steep.
Conversely, a frequency band with a small mask ratio SMR generally has weak tonality (a strong noise-like character) and considerable influence on neighboring bands. The smoothing coefficients for the neighboring bands are therefore reduced only slightly, if at all, so that the slopes do not become steep.
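The multi-stage correction described above can be sketched as a step function F. The thresholds (10.0 and 100.0) and factors (0.5 and 0.25) are hypothetical values chosen only to show the staging; F is applied as a plain multiplier, as equation (7) states. The function names smr_factor and corrected_alpha are also hypothetical.

```python
from typing import Sequence

def smr_factor(smr: float,
               thresholds: Sequence[float] = (10.0, 100.0),
               factors: Sequence[float] = (0.5, 0.25)) -> float:
    """Hypothetical multi-stage F(SMR).

    thresholds are ascending; once smr exceeds thresholds[j], factors[j]
    applies, so a more tonal band (larger SMR) gets a smaller factor.
    Below the first threshold alpha is left unchanged (factor 1.0).
    """
    f = 1.0
    for t, fac in zip(thresholds, factors):
        if smr > t:
            f = fac
    return f

def corrected_alpha(alpha: float, smr_k0: float) -> float:
    """alpha_SMR(k0, k) = alpha(k0, k - k0) * F(SMR(k0)), equation (7)."""
    return alpha * smr_factor(smr_k0)
```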
Taking into account, in this way, the mask ratio SMR(k) supplied by the mask ratio calculating unit 170, the gain smoothing unit 160 calculates the final correction gains Gsm(k) for all the frequency bands by using the following equation (8):
wherein min_k0 represents the low band end of the indices of the frequency bands to be corrected, and max_k0 represents the high band end. The summation is performed only over the frequency bands to be corrected that lie within this range.
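Equation (8) is not reproduced above, so the following is only a hypothetical reading of the description: Gsm(k) sums αSMR(k0, k)·G(k0) over the bands k0 to be corrected, and a band that receives no contribution keeps unity gain. The function name final_gains and the unity-gain default are assumptions.

```python
from typing import Callable, Dict, List, Sequence

def final_gains(num_bands: int,
                corrected: Sequence[int],
                g: Dict[int, float],
                alpha_smr: Callable[[int, int], float]) -> List[float]:
    """Hypothetical form of equation (8).

    Gsm(k) is the sum of alpha_smr(k0, k) * G(k0) over the bands k0 to be
    corrected (min_k0 <= k0 <= max_k0). Bands that receive no contribution
    keep unity gain, which is an assumption of this sketch.
    """
    gsm = [0.0] * num_bands
    touched = [False] * num_bands
    for k0 in corrected:
        for k in range(num_bands):
            a = alpha_smr(k0, k)
            if a > 0.0:
                gsm[k] += a * g[k0]
                touched[k] = True
    return [val if hit else 1.0 for val, hit in zip(gsm, touched)]
```

For example, with a coefficient of 1.0 at the peak and 0.5 one band away, a single corrected band k0 = 1 with G(1) = 4.0 in a four-band spectrum yields gains of 2.0, 4.0, 2.0, and 1.0.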
According to the embodiment, the influence of the background noise is reduced and the sound quality can be improved not only when playing back voice but also when playing back encoded audio data S10 such as music. Additionally, because the encoding parameter S20 is reused to analyze signal characteristics such as the acceptable quantization noise power QN(k), the analysis time is shortened and high-speed processing can be realized.
Moreover, the present invention is not limited to the above embodiment. For example, the correction gain G(k) may be transmitted directly from the gain calculating unit 150 of the correction gain calculating unit 100 to the frequency characteristics correcting unit 70, which then corrects the frequency characteristics by using G(k).
According to the above-described embodiment, the quality of the played-back audio signal can be improved regardless of the kind of input encoded audio data.
Number | Date | Country | Kind |
---|---|---|---|
P2007-001708 | Jan 2007 | JP | national |