The present document relates to the design of anti-aliasing and/or anti-imaging filters for resamplers using rational resampling factors. In particular, the present document relates to a method for designing such filters having a reduced number of filter coefficients or an increased perceptual performance, as well as to the filters designed using such a method.
Different audio formats may require different sampling rates (e.g. 32 kHz, 44.1 kHz or 48 kHz). In order to convert an audio signal at a first sampling rate (e.g. at 32 kHz) into an audio signal at a second sampling rate (e.g. at 48 kHz), rational resamplers may be used. The resampling of audio by rational factors typically introduces imaging/aliasing artifacts into the resampled audio signal. An anti-imaging/anti-aliasing filter may be used to suppress the unwanted images and/or aliases of the audio signal. The present document describes anti-imaging/anti-aliasing filters used in rational resamplers. Furthermore, the present document describes a method for designing such anti-imaging/anti-aliasing filters. In particular, filter design methods (and resulting filters) are described which take into account psychoacoustic constraints, in order to provide filters having a reduced number of filter coefficients, while providing subjectively unchanged or similar audio quality of the resampled audio signal. Vice versa, the filter design methods may be used to design filters with a given number of filter coefficients, which provide an improved audio quality compared to filters designed in accordance with conventional filter design methods.
As a consequence of designing improved anti-imaging/anti-aliasing filters, the complexity of rational resamplers may be decreased, while maintaining a given subjective audio quality. Vice versa, the audio quality of the rational resamplers may be increased, while maintaining the rational resamplers at a given computational complexity.
According to an aspect, a method for designing a filter, e.g. an anti-aliasing and/or anti-imaging filter, is described. The filter may be a digital filter comprising a set of filter coefficients. In the following, the filter will be referred to as an anti-aliasing filter (even though the filter may also remove imaging effects). The resulting filter may be configured to reduce imaging and/or aliasing of an output audio signal at an output sampling rate fsout. The output audio signal is a resampled version of an input audio signal at an input sampling rate fsin. The ratio of the output sampling rate fsout to the input sampling rate fsin is a rational number N/M, wherein N>0, M>0. Without loss of generality, N and M may be assumed to be mutually prime. In an embodiment, neither N nor M is equal to 1, meaning that the resampler is neither a pure upsampler by an integer factor N nor a pure downsampler by an integer factor M. In other words, the resampler comprises an upsampling component by a factor N>1 and a downsampling component by a factor M>1. In yet other words, neither the fraction N/M nor the fraction M/N is an integer value. By way of example, for fsin=40 kHz, fsout=48 kHz, N=6, M=5; or for fsin=32 kHz, fsout=48 kHz, N=3, M=2; or for fsin=44.1 kHz, fsout=48 kHz, N=160, M=147; or for fsin=32 kHz, fsout=44.1 kHz, N=441, M=320; or vice versa, i.e. for fsin=48 kHz, fsout=40 kHz, N=5, M=6; or for fsin=48 kHz, fsout=32 kHz, N=2, M=3; or for fsin=48 kHz, fsout=44.1 kHz, N=147, M=160; or for fsin=44.1 kHz, fsout=32 kHz, N=320, M=441.
The filter may be operated at an upsampled sampling rate which equals N times the input sampling rate fsin. The upsampled sampling rate also equals M times the output sampling rate fsout.
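The factors N and M and the intermediate sampling rate follow directly from the two sampling rates. The following is a minimal sketch using Python's standard `fractions` module; the helper name `resampling_factors` is hypothetical, and rates are passed as integers in Hz so that the ratio is exact.

```python
from fractions import Fraction

def resampling_factors(fs_in: int, fs_out: int):
    """Return (N, M, fs_int) with fs_out / fs_in = N / M in lowest terms.

    Rates are integers in Hz (44.1 kHz -> 44100) so the ratio is exact.
    """
    ratio = Fraction(fs_out, fs_in)   # Fraction reduces to lowest terms
    N, M = ratio.numerator, ratio.denominator
    fs_int = N * fs_in                # upsampled (intermediate) rate
    assert fs_int == M * fs_out       # same rate seen from the output side
    return N, M, fs_int

for fs_in, fs_out in [(40000, 48000), (32000, 48000), (44100, 48000),
                      (32000, 44100), (48000, 44100)]:
    N, M, fs_int = resampling_factors(fs_in, fs_out)
    print(f"{fs_in} -> {fs_out} Hz: N={N}, M={M}, intermediate rate {fs_int} Hz")
```

Because `Fraction` always reduces to lowest terms, the mutual primality of N and M assumed above holds automatically.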
The method for designing the filter may comprise the step of selecting a pass band edge of the frequency response of the filter. The selection of the pass band edge may comprise the selection of the frequency interval of the pass band. Frequencies smaller than the pass band edge are comprised within the pass band. The method may comprise the step of selecting a stop band edge of the frequency response of the filter. The selection of the stop band edge may comprise the selection of the frequency interval of the stop band. Frequencies greater than the stop band edge are comprised within the stop band.
The method may comprise the step of selecting an allowed deviation of the frequency response of the filter within the stop band. The allowed deviation indicates a deviation of the frequency response of the filter from a predetermined attenuation within the stop band. Typically, the predetermined attenuation of the filter within the stop band is 0 (i.e. −inf dB). As such, the allowed deviation may specify the tolerable deviation of the stop band attenuation from the ideal stop band attenuation at the predetermined value 0. In other words, a target frequency response of the filter may be determined. The target frequency response may specify a pass band attenuation (e.g. a value of 1 or 0 dB), a stop band attenuation (e.g. a value of 0 or −inf dB), a pass band edge and/or a stop band edge. The allowed deviation may be the allowed deviation of the frequency response of the filter from the target frequency response.
The resulting filter may be a low pass filter with a pass band covering a frequency interval in the frequency range of 0 kHz to the pass band edge. In such a case, the stop band of the resulting filter would cover the frequency interval above the stop band edge, wherein the stop band edge corresponds to a higher frequency than the pass band edge.
The allowed deviation of the frequency response of the filter within the stop band may be selected based on a perceptual frequency response indicative of an auditory spectral sensitivity. The perceptual frequency response may indicate the sensitivity of an average listener to particular frequencies of an audio signal. In other words, the perceptual frequency response may indicate how well certain frequencies of an audio signal are perceived by a listener. The perceptual frequency response may be associated with a first perceptual frequency response. The first perceptual frequency response may correspond to or may be indicative of a scaled version of an absolute threshold of hearing curve. The scaling may depend on the desired degree of rejection of the stop band. In particular, the absolute threshold of hearing (ATH) curve may be scaled such that the lowest absolute threshold of the ATH curve or an average value of the scaled ATH curve corresponds to a pre-determined degree of attenuation (e.g. −90 dB) of the target frequency response.
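A first perceptual frequency response of this kind can be sketched as follows. The source does not specify an ATH model, so this sketch assumes Terhardt's widely used approximation of the absolute threshold of hearing. It shifts the curve in dB (a gain in the linear domain) so that its lowest point lands at an assumed floor of −90 dB. The function names are hypothetical.

```python
import numpy as np

def ath_terhardt_db(f_hz):
    """Absolute threshold of hearing in dB SPL (Terhardt's approximation)."""
    f = np.asarray(f_hz, dtype=float) / 1000.0      # frequency in kHz, must be > 0
    return 3.64 * f ** -0.8 - 6.5 * np.exp(-0.6 * (f - 3.3) ** 2) + 1e-3 * f ** 4

def first_perceptual_response_db(f_hz, floor_db=-90.0):
    """ATH shifted so its minimum sits at floor_db (assumed rejection target)."""
    ath = ath_terhardt_db(f_hz)
    return ath - ath.min() + floor_db

# Output baseband [0, fs_out/2] for fs_out = 48 kHz, skipping DC (ATH diverges there).
f = np.linspace(20.0, 24000.0, 2000)
p1 = first_perceptual_response_db(f)
```

Larger values of `p1` mark frequencies where the ear is less sensitive, and hence where a larger stop band deviation may be tolerated.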
The step of selecting an allowed deviation of the frequency response of the filter within the stop band may comprise the step of selecting the allowed deviation based on images and/or mirrored images of the first perceptual frequency response. Images of the first perceptual frequency response may be copies of the first perceptual frequency response, possibly transposed to other frequency intervals. Mirrored images of the first perceptual frequency response may be mirrored versions of the first perceptual frequency response, possibly transposed to other frequency intervals. Typically, the images and/or mirrored images are transposed or shifted by the output sampling rate and/or a multiple thereof.
As indicated above, the resulting filter may be operated at an upsampled sampling rate M*fsout. As such, a spectrum of the (upsampled) output audio signal may cover a frequency range from 0 to M*fsout/2. As a result of a downsampling operation by the factor M generating the output audio signal, a portion of the spectrum covering the frequency range [(m−1)*fsout/2,(m+1)*fsout/2] for m=2,4, . . . , M, may be shifted to the baseband [−fsout/2,+fsout/2], thereby creating aliasing artifacts in the output audio signal. These artifacts are perceived by a human listener in accordance with an auditory spectral sensitivity reflected in the perceptual frequency response.
In order to reflect the shifting of high frequency ranges into the baseband, the first perceptual frequency response covering a frequency range of [0,+fsout/2], as well as a mirrored image of the first perceptual frequency response covering a frequency range of [−fsout/2,0] may be shifted to the frequency ranges [(m−1)*fsout/2,(m+1)*fsout/2] for m=2,4, . . . , M, thereby creating the images and/or mirrored images of the first perceptual frequency response. These images and/or mirrored images are symmetrical with respect to a frequency derived from the output sampling rate fsout. In particular, these images and/or mirrored images are symmetrical with respect to the output sampling rate fsout and/or a multiple thereof. In other words, some of these images and/or mirrored images may be symmetrical with respect to a symmetry axis corresponding to the output sampling rate fsout and/or a multiple thereof.
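Evaluating a baseband curve at its images and mirrored images is equivalent to folding each intermediate-domain frequency back to the output baseband [0, fsout/2], i.e. to the frequency it aliases to after down-by-M. The following is a minimal sketch of that folding; the helper names are hypothetical.

```python
import numpy as np

def fold_to_output_baseband(f_hz, fs_out):
    """Frequency in [0, fs_out/2] that f_hz aliases to after down-by-M.

    Folding reproduces the images (shifted by k*fs_out) and the mirrored
    images (mirrored about k*fs_out) of a curve defined on [0, fs_out/2].
    """
    r = np.mod(np.asarray(f_hz, dtype=float), fs_out)
    return np.where(r <= fs_out / 2, r, fs_out - r)

def imaged_response(f_hz, fs_out, baseband_curve):
    """Evaluate a baseband perceptual curve at its images/mirrored images."""
    return baseband_curve(fold_to_output_baseband(f_hz, fs_out))

# For fs_out = 48 kHz, 46 kHz and 50 kHz both fold onto 2 kHz,
# i.e. the image around fs_out is symmetric about 48 kHz.
vals = imaged_response(np.array([46000.0, 50000.0]), 48000, lambda f: f / 1000.0)
```

With M=5, the intermediate Nyquist frequency is 120 kHz, so every stop band frequency up to 120 kHz inherits the perceptual value of its alias in [0, 24 kHz].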
As such, the first perceptual frequency response may cover a frequency interval from 0 kHz to half the output sampling rate (i.e. [0, +fsout/2]) or a part of this frequency interval. Furthermore, a baseband mirrored image of the first perceptual frequency response (i.e. a mirrored image of the first perceptual frequency response in the baseband, mirrored along the symmetry axis at 0 kHz) may cover a frequency interval from 0 kHz to minus half the output sampling rate (i.e. [−fsout/2,0]) or a part of this latter frequency interval.
The images of the first perceptual frequency response which are used for the selection of the allowed deviation of the frequency response of the filter within the stop band may correspond to the first perceptual frequency response and/or the baseband mirrored image of the first perceptual frequency response shifted by the output sampling rate and/or a multiple thereof.
The selection of the allowed deviation of the frequency response of the filter within the stop band may comprise the step of setting the allowed deviation within a given frequency interval equal to the images of the first perceptual frequency response (and/or its baseband mirrored image) within the given frequency interval. In other words, the allowed deviation of the frequency response within a given frequency interval may be set equal to the images and/or mirrored images of the first perceptual frequency response within the given frequency interval. The given interval may correspond to the frequency intervals within the stop band outside the “don't care” intervals specified below.
The perceptual frequency response may be associated with a second perceptual frequency response. The second perceptual frequency response may comprise a scaled relative masking threshold curve indicative of the masking by a neighbouring masker frequency. In other words, the second perceptual frequency response may reflect the fact that a signal at a masker frequency masks signals at frequencies within the vicinity of the masker frequency. The relative masking threshold curve may indicate the threshold of hearing a frequency in the vicinity of the masker frequency. Due to the masking effect of the masker frequency, the threshold of hearing may be increased in the vicinity of the masker frequency.
As a result of an upsampling operation, images of a baseband masker frequency may be created in the intermediate, i.e. upsampled, frequency domain. Some of these images may be aliased back to the baseband masker frequency during the downsampling operation. The images of the baseband masker frequency in the intermediate frequency domain which meet this condition may be referred to as maskee frequencies as their aliases may be masked by the baseband masker frequency. In other words, the baseband masker frequency (and/or the maskee frequency) may meet the self masking condition that the maskee frequency in the intermediate frequency domain corresponds to a baseband masker frequency of the input audio signal in the frequency range of [−fsin/2, fsin/2] shifted by fsin or a multiple thereof; and that the maskee frequency aliases to the output audio signal at plus and/or minus the baseband masker frequency.
The baseband masker frequency may meet the condition that the absolute value of the baseband masker frequency corresponds to the absolute value of (n*fsin/2−m*fsout/2), for at least some of n=1, . . . , N and m=1, . . . , M. In other words, for the baseband masker frequencies the condition
$$|f| = \left| n \cdot \frac{f_{s,\mathrm{in}}}{2} - m \cdot \frac{f_{s,\mathrm{out}}}{2} \right|, \quad n = 1, \ldots, N, \; m = 1, \ldots, M,$$
may be met for at least some of the possible values of n and m. In a similar manner, the maskee frequency in the intermediate frequency domain may correspond to n*fsin/2+m*fsout/2, for at least some of n=−N, . . . , N and m=−M, . . . , M, i.e. the maskee frequency may meet the condition
$$f = n \cdot \frac{f_{s,\mathrm{in}}}{2} + m \cdot \frac{f_{s,\mathrm{out}}}{2}, \quad n = -N, \ldots, N, \; m = -M, \ldots, M,$$
for at least some of the possible values of n and m.
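The two conditions can be enumerated directly. For each (n, m), the masker |n·fsin/2 − m·fsout/2| and the maskee n·fsin/2 + m·fsout/2 differ by m·fsout and sum to n·fsin. The maskee is therefore an image of the masker (shifted by a multiple of fsin) that aliases back onto the masker (shifted by a multiple of fsout). The following is a minimal sketch with a hypothetical helper. It assumes integer rates in Hz and keeps only maskers inside the common baseband and maskees between that baseband and the intermediate Nyquist frequency N·fsin/2.

```python
def self_masking_pairs(fs_in: int, fs_out: int, N: int, M: int):
    """Hypothetical helper: sorted (masker, maskee) pairs in Hz."""
    f_base = min(fs_in, fs_out) // 2        # common baseband edge
    f_nyq_int = N * fs_in // 2              # intermediate Nyquist frequency
    pairs = set()
    for n in range(1, N + 1):
        for m in range(1, M + 1):
            masker = abs(n * fs_in // 2 - m * fs_out // 2)
            maskee = n * fs_in // 2 + m * fs_out // 2
            if masker <= f_base and f_base < maskee <= f_nyq_int:
                pairs.add((masker, maskee))
    return sorted(pairs)

# 40 kHz -> 48 kHz (N = 6, M = 5)
for masker, maskee in self_masking_pairs(40000, 48000, 6, 5):
    print(f"maskee {maskee} Hz aliases onto masker {masker} Hz")
```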
The above self masking conditions may be used to identify one or more maskee frequencies in the intermediate domain and one or more corresponding masker frequencies in the baseband. The one or more maskee frequencies meeting the self masking condition may correspond to a maximum of a scaled relative masking threshold curve. I.e. for these maskee frequencies the masking caused by the baseband masker frequency may be maximal. In addition, if a frequency in the intermediate domain approximately fulfills the above condition, then the alias of this frequency will typically be close to the baseband masker frequency, and can be subject to masking by that baseband masker frequency. This masking of frequencies in the vicinity of the maskee frequencies in the intermediate domain may be modeled by the progression of the scaled relative masking threshold curve.
In an embodiment, the second perceptual frequency response comprises a scaled relative masking threshold curve for each maskee frequency meeting the self masking condition. The overall perceptual frequency response used for the determination of the allowed deviations of the frequency response may correspond to a combination, e.g. a maximum, of the first perceptual frequency response and the second perceptual frequency response.
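A scaled relative masking threshold curve around one maskee frequency, and its combination with the first perceptual frequency response by a maximum, might look like the following. This is only a sketch under stated assumptions. It uses the Zwicker-Terhardt Bark approximation and a triangular spreading function whose peak (−20 dB), slopes (25 dB/Bark below and 10 dB/Bark above the masker) and window width (±2 kHz around the maskee) are illustrative values, not taken from the source. The function names are hypothetical.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark scale."""
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def self_masking_curve_db(f_hz, masker_hz, maskee_hz, fs_out, peak_db=-20.0,
                          lower_slope=25.0, upper_slope=10.0, half_width_hz=2000.0):
    """Hypothetical second perceptual response around one maskee (dB).

    peak_db, the slopes (dB/Bark) and half_width_hz are assumed values.
    Outside the window around the maskee the curve is -inf (no relaxation).
    """
    f = np.asarray(f_hz, dtype=float)
    r = np.mod(f, fs_out)
    alias = np.where(r <= fs_out / 2, r, fs_out - r)   # baseband alias of f
    dz = hz_to_bark(alias) - hz_to_bark(masker_hz)      # Bark distance to masker
    curve = np.where(dz < 0, peak_db + lower_slope * dz, peak_db - upper_slope * dz)
    return np.where(np.abs(f - maskee_hz) <= half_width_hz, curve, -np.inf)

# Combine with a placeholder first perceptual response (flat -90 dB here).
f = np.array([30000.0, 44000.0, 44500.0])
p1 = np.full(3, -90.0)
overall = np.maximum(p1, self_masking_curve_db(f, 4000.0, 44000.0, 48000.0))
```

Taking the maximum means that the more permissive of the two constraints wins at each frequency, which relaxes the stop band only near the maskee frequencies.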
The step of selecting the allowed deviation of the frequency response of the filter within the stop band may comprise the step of partitioning the stop band into a plurality of frequency intervals comprising one or more “don't care” intervals. The allowed deviation may take on arbitrary or undefined values within a “don't care” interval. In other words, the allowed deviation of the frequency response may be unconstrained or undefined within a “don't care” interval. The one or more “don't care” intervals may comprise one or more first “don't care” intervals associated with frequencies for which a spectrum of the input audio signal is below a pre-determined input energy threshold. By way of example, the input audio signal may be bandwidth limited to a frequency fx lower than the Nyquist frequency fsin/2. As a result, the spectrum of the input audio signal may be below the input energy threshold in the frequency interval [fx,fsin/2] (as well as in the mirrored frequency interval [−fsin/2,−fx]). The one or more first “don't care” intervals may be associated with the frequencies of the frequency interval [fx,fsin/2] (as well as with the frequencies of the mirrored frequency interval [−fsin/2,−fx]).
The one or more first “don't care” intervals may be symmetrical with respect to a frequency derived from the input sampling rate fsin. In particular, the one or more first “don't care” intervals may be symmetrical with respect to the input sampling rate fsin and/or a multiple thereof. In the above example, the frequency interval [fx,fsin/2], as well as the mirrored image [−fsin/2,−fx] may constitute first “don't care” intervals associated with the input audio signal. As a result of the up-by-N upsampling operation, further images and mirrored images of these first “don't care” intervals may be created at frequency intervals [fx+n*fsin,fsin/2+n*fsin], as well as [−fsin/2+n*fsin,−fx+n*fsin], for n=1, . . . , N/2. These further images and mirrored images may also constitute first “don't care” intervals. As a result of the shift operation by fsin some of the first “don't care” intervals are symmetrical with respect to the input sampling rate fsin and/or a multiple thereof.
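The intervals described above can be listed programmatically for a given input bandwidth fx. The following is a minimal sketch with a hypothetical helper that returns the baseband interval [fx, fsin/2], plus, for each n, the mirrored image [n·fsin − fsin/2, n·fsin − fx] and the image [n·fsin + fx, n·fsin + fsin/2]. Only the non-negative frequency axis up to the intermediate Nyquist frequency N·fsin/2 is considered.

```python
def first_dont_care_intervals(fx, fs_in, N):
    """Hypothetical helper: first "don't care" intervals in [0, N*fs_in/2]."""
    f_nyq_int = N * fs_in / 2
    intervals = [(fx, fs_in / 2)]                  # band-limited input baseband
    n = 1
    while n * fs_in - fs_in / 2 < f_nyq_int:
        # mirrored image about n*fs_in
        intervals.append((n * fs_in - fs_in / 2, n * fs_in - fx))
        lo = fx + n * fs_in                        # image shifted by n*fs_in
        if lo < f_nyq_int:
            intervals.append((lo, min(fs_in / 2 + n * fs_in, f_nyq_int)))
        n += 1
    return intervals

# fs_in = 40 kHz, N = 6, input band-limited to fx = 18 kHz (assumed value)
print(first_dont_care_intervals(18000, 40000, 6))
```

Each image/mirrored-image pair is symmetric about n·fsin, matching the symmetry described above.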
The one or more “don't care” intervals may comprise one or more second “don't care” intervals associated with frequencies for which the perceptual frequency response exceeds a pre-determined perceptual threshold. The one or more second “don't care” intervals may correspond to frequencies for which the images and/or mirrored images of the perceptual frequency response exceed the pre-determined perceptual threshold. The perceptual frequency response may take on values indicating a low auditory spectral sensitivity for certain frequency intervals. If the perceptual frequency response exceeds the pre-determined perceptual threshold, i.e. if the perceptual frequency response indicates an auditory sensitivity lying below a pre-determined sensitivity threshold, it may be beneficial to remove any constraints on the target frequency response, thereby increasing the degrees of freedom for the filter design. As such, further “don't care” intervals (i.e. the second “don't care” intervals) may be defined.
As indicated above, the method may comprise the step of selecting a pass band edge and/or a stop band edge of the frequency response of the filter. The pass band edge and/or the stop band edge may be based on the lower one of the input sampling rate fsin and the output sampling rate fsout. In particular, the pass band edge and/or the stop band edge may be set to the lower one of the Nyquist frequency fsin/2 of the input audio signal and the Nyquist frequency fsout/2 of the output audio signal. Alternatively or in addition, the pass band edge and/or the stop band edge may be selected based on the bandwidth of the input audio signal. The pass band is positioned at frequencies lower than the pass band edge, and the stop band is positioned at frequencies higher than the stop band edge.
The method may comprise the step of determining coefficients of the filter such that the frequency response of the filter is fitted to the allowed deviation of the frequency response. The step of determining coefficients of the filter may comprise the step of fitting the frequency response of the filter to the allowed deviation using a maximum absolute difference criterion or a least-squares criterion. In particular, the coefficients of the filter may be determined using a Remez exchange algorithm or the Parks-McClellan algorithm.
The Parks-McClellan algorithm minimizes the maximum of an approximation error function, wherein the approximation error function is based on the difference between the frequency response of the filter and the predetermined attenuation within the stop band. Typically, the approximation error function is weighted. The weights may be proportional to the inverse of the allowed deviation of the frequency response of the filter.
In an embodiment, the step of determining coefficients of the filter comprises the step of fitting the frequency response of the filter to the allowed deviation of the frequency response outside of the one or more “don't care” intervals. As indicated above, the allowed deviation may take on arbitrary or undefined values within the one or more “don't care” intervals. As such, the fitting of the frequency response to the allowed deviation may be performed by imposing no constraints on the frequency response of the filter within the one or more “don't care” intervals. In the context of the Parks-McClellan algorithm, the “don't care” intervals may be taken into account by ignoring the approximation error function within the “don't care” intervals. In other words, the approximation error would not be evaluated, and hence not minimized, within the “don't care” intervals.
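The following is a minimal sketch of such a weighted Parks-McClellan design with SciPy's `remez`. It uses band weights proportional to the inverse allowed deviation and leaves "don't care" intervals out of the band list. Because `remez` accepts one constant weight per band, a frequency-dependent perceptual curve has to be approximated piecewise. The rates, band edges, filter length and dB values below are assumed placeholders for illustration, not values from the source.

```python
import numpy as np
from scipy.signal import freqz, remez

# Assumed example: fs_in = 32 kHz, fs_out = 48 kHz, so N = 3, M = 2 and the
# filter runs at the intermediate rate N * fs_in = 96 kHz.
fs_int = 96000.0
pass_edge = 14000.0                    # assumed pass band edge (< 16 kHz)
pass_dev = 10 ** (0.1 / 20) - 1        # ~0.1 dB allowed pass band deviation

# Stop band pieces as (low edge, high edge, allowed deviation in dB).
# The gaps 30-31 kHz and 46-48 kHz stand in for "don't care" intervals:
# remez puts no grid points there, so no error is evaluated there.
# The dB values are placeholders for a perceptual curve.
stop_pieces = [(18000.0, 30000.0, -80.0), (31000.0, 46000.0, -60.0)]

bands, desired, weight = [0.0, pass_edge], [1.0], [1.0 / pass_dev]
for lo, hi, dev_db in stop_pieces:
    bands += [lo, hi]
    desired.append(0.0)
    weight.append(1.0 / 10 ** (dev_db / 20))   # weight ~ 1 / allowed deviation

h = remez(101, bands, desired, weight=weight, fs=fs_int, maxiter=100)

# Check the result on a dense frequency grid.
w, H = freqz(h, worN=8192, fs=fs_int)
mag = np.abs(H)
pass_err = np.max(np.abs(mag[w <= pass_edge] - 1.0))
stop_peaks_db = [20 * np.log10(np.max(mag[(w >= lo) & (w <= hi)]))
                 for lo, hi, _ in stop_pieces]
print(f"pass band error {pass_err:.2e}, stop band peaks {stop_peaks_db} dB")
```

The response inside the gaps is unconstrained by construction. Whether that is acceptable depends on the gaps genuinely being "don't care" intervals for the signal at hand.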
The method may comprise the step of selecting an allowed deviation of the frequency response of the filter within the pass band. The allowed deviation may indicate the deviation of the magnitude of the frequency response from a predetermined pass band attenuation, which is typically 1 (i.e. 0 dB). The allowed deviation may be a fixed, i.e. frequency independent, allowed deviation within the pass band.
According to a further aspect, a filter is described, wherein the filter may be designed in accordance with the design method and any related feature outlined in the present document.
According to another aspect, a filter is described, wherein the filter is configured to reduce imaging and/or aliasing of an output audio signal at an output sampling rate fsout. The output audio signal may be a resampled version of an input audio signal at an input sampling rate fsin. The ratio of the output sampling rate fsout to the input sampling rate fsin may be a rational number N/M, as outlined above. The filter may operate at an upsampled sampling rate which equals N times the input sampling rate fsin. As indicated above, the upsampled sampling rate may also be equal to M times the output sampling rate fsout. The filter may comprise a pass band and a stop band. Furthermore, the filter may have a pass band edge and/or a stop band edge (or a cut-off frequency) based on the lower one of the input sampling rate and the output sampling rate.
The frequency response of the filter within the stop band may be associated with a perceptual frequency response indicative of an auditory spectral sensitivity. As outlined above, the perceptual frequency response may be associated with a first frequency response comprising a scaled and/or shifted version of the absolute threshold of hearing (ATH) curve. In particular, the frequency response of the filter within the stop band may be associated with images and/or mirrored images of the first perceptual frequency response. These images and/or mirrored images may be symmetrical with respect to a frequency derived from the output sampling rate fsout. As such, the frequency response of the filter within the stop band may be associated with the first perceptual frequency response covering a frequency interval of [0, +fsout/2], as well as a mirrored image of the first perceptual frequency response covering a frequency interval of [−fsout/2, 0]. In particular, the frequency response of the filter within the stop band may be associated with images of these first perceptual frequency responses centered at the output sampling rate fsout and/or multiples thereof, i.e. images at [(m−1)*fsout/2,(m+1)*fsout/2] for m=2,4, . . . , M.
Alternatively or in addition the frequency response in the stop band may be associated with a second perceptual frequency response comprising a scaled relative masking threshold curve indicative of the masking (by a masker frequency) of neighbouring frequencies. In particular, the overall perceptual frequency response may be a combination of the first and second perceptual frequency response.
The frequency response of the filter may be fitted to the perceptual frequency response using a maximum absolute difference criterion. In an embodiment, the frequency response of the filter does not exceed the perceptual frequency response within selected frequency intervals, e.g. frequency intervals outside of the above mentioned “don't care” intervals. In other words, within the selected frequency intervals the filter attenuates at least as strongly as prescribed by the perceptual frequency response.
According to a further aspect, a method for resampling an input audio signal at an input sampling rate fsin to an output audio signal at an output sampling rate fsout is described. The ratio of the output sampling rate fsout to the input sampling rate fsin may be a rational number N/M. The method may comprise the step of providing a set of coefficients of a filter. The filter may be any filter described in the present document, e.g. any filter designed according to a method outlined in the present document. The method may proceed by selecting a first subset of coefficients from the set of coefficients. This first subset may comprise a first coefficient of the set and additional coefficients of the set following the first coefficient by multiples of N. In other words, every Nth coefficient (starting from the first coefficient) of the set of coefficients may be selected for the first subset of coefficients.
The method may further comprise the step of determining a first sample of the output audio signal based on the first subset of coefficients and a first plurality of samples of the input audio signal. In other words, a first sample of the output audio signal may be determined by filtering a first plurality of samples of the input audio signal using a filter based on the first subset of coefficients.
In order to determine a second sample of the output audio signal, the method may comprise the step of selecting a second coefficient of the set based on the first coefficient and M. The method may proceed by selecting a second subset of coefficients from the set of coefficients, wherein the second subset comprises the second coefficient and coefficients of the set following the second coefficient by multiples of N. In other words, the method may proceed by selecting a second subset comprising a shifted subset of filter coefficients. Finally, the method may determine the second sample of the output audio signal, directly following the first sample, based on the second subset of coefficients and a second plurality of samples of the input audio signal.
In other words, the samples of the output audio signal may be determined using a polyphase finite impulse response implementation of the psychoacoustic filter described in the present document.
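The coefficient-subset selection can be sketched in NumPy as below, next to a direct reference (zero-stuff by N, filter, keep every Mth sample) that it must reproduce. This is a minimal sketch under the usual convention that output sample j sits at position j·M on the upsampled grid. The function names are hypothetical. The phase update `p += M; p %= N` is the "second coefficient selected based on the first coefficient and M" step.

```python
import numpy as np

def resample_direct(x, h, N, M):
    """Reference: zero-stuff by N, filter with h, keep every M-th sample."""
    x_up = np.zeros(len(x) * N)
    x_up[::N] = x
    return np.convolve(x_up, h)[: len(x_up)][::M]

def resample_polyphase(x, h, N, M):
    """Same output, skipping the zero inputs and the discarded outputs."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    n_out = -(-len(x) * N // M)          # ceil(len(x) * N / M)
    y = np.zeros(n_out)
    p, base = 0, 0                       # first coefficient index, newest input index
    for j in range(n_out):
        sub = h[p::N]                    # h[p] and every N-th coefficient after it
        k = np.arange(min(len(sub), base + 1))
        y[j] = np.dot(sub[: len(k)], x[base - k])
        # Next output sample: advance M positions on the upsampled grid.
        p += M
        base += p // N
        p %= N
    return y

rng = np.random.default_rng(0)
x, h = rng.standard_normal(50), rng.standard_normal(37)
y = resample_polyphase(x, h, 6, 5)       # 40 kHz -> 48 kHz style ratio
```

Each output sample costs only about len(h)/N multiplications, instead of M·len(h) for filtering every upsampled sample.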
According to another aspect a resampler configured to generate an output audio signal at an output sampling rate fsout from an input audio signal at an input sampling rate fsin is described. The ratio of the output sampling rate fsout and the input sampling rate fsin may be a rational number N/M. The resampler may comprise a filter according to any of the aspects outlined in the present document. The filter comprises a set of coefficients. Furthermore, the resampler may comprise a coefficient selection unit configured to select a subset of coefficients from the set of coefficients. The selection of the subset may be performed as outlined above in the context of the first and/or second subset. In addition, the resampler may comprise a filtering unit configured to generate a sample of the output audio signal from a plurality of samples of the input audio signal using the subset of coefficients.
According to a further aspect, a software program is described. The software program may be adapted for execution on a processor and for performing the aspects and features outlined in the present document when carried out on a computing device.
According to another aspect, a storage medium comprising a software program is described. The software program may be adapted for execution on a processor and for performing the aspects and features outlined in the present document when carried out on a computing device.
According to a further aspect, a computer program product is described. The computer program product may comprise executable instructions for performing the aspects and features outlined in the present document when executed on a computer.
It should be noted that the methods and systems, including their preferred embodiments, as outlined in the present document may be used stand-alone or in combination with the other methods and systems disclosed in this document. Furthermore, all aspects of the methods and systems outlined in the present document may be arbitrarily combined. In particular, the features of the claims may be combined with one another in an arbitrary manner.
The methods and systems described in the present document are explained below in an exemplary manner with reference to the accompanying drawings, wherein
a illustrates example frequency intervals of an input audio signal not contributing to imaging/aliasing;
b shows an example target frequency response and an example frequency response of an anti-imaging/anti-aliasing filter;
a depicts an example absolute threshold of hearing curve;
b shows frequency intervals of an example output audio signal, wherein signal components of the output audio signal in the illustrated frequency intervals are not perceived by a human listener;
c shows an example target frequency response and an example frequency response of an anti-imaging/anti-aliasing filter taking into account psychoacoustic aspects;
a illustrates a mapping of a linear frequency scale to the Bark scale;
b illustrates an example self-masking threshold curve;
c illustrates the allowed deviations of an example resampling filter, due to self-masking and due to absolute threshold of hearing;
It should be noted that the filter 102 runs at an intermediate sampling rate (IF) of N times the input sampling rate or, equivalently, M times the output sampling rate (e.g. IF=M*48 kHz for the above mentioned cases). This means that the anti-aliasing filters 102 typically operate at high sampling rates, such that a reduction of the number of filter operations is desirable. In other words, it is desirable to reduce the number of required coefficients of the anti-aliasing filter 102, in order to reduce the overall computational complexity of the rational resampler 100.
The filters may be realized as a polyphase FIR (Finite Impulse Response) implementation. Such an implementation exploits the fact that the upsampled audio signal 111 which is filtered by filter 102 comprises N−1 zeros between the samples of the input audio signal 110. Consequently, the “zero” multiplications and additions can be omitted. Furthermore, a polyphase implementation exploits the fact that due to the subsequent down-by-M decimator 103, only every Mth sample of the filtered audio signal 112 needs to be determined. By exploiting this information during the filter implementation, the number of multiplication and/or adding operations can be significantly reduced, thereby reducing the computational complexity of the rational resampler 100. Nevertheless, it is desirable to further reduce the computational complexity or to further improve the perceptual performance of the resampler 100.
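The size of this saving can be quantified with a short calculation. The sketch below assumes an FIR filter of `num_taps` coefficients: a direct implementation filters all M upsampled samples per output sample (including the zeros), whereas a polyphase implementation touches only the roughly num_taps/N nonzero input samples for the one kept output. The helper name and the 3200-tap length are hypothetical.

```python
import math

def mults_per_output_sample(num_taps: int, N: int, M: int):
    """Multiplications per output sample: direct vs. polyphase (idealized)."""
    direct = M * num_taps                 # filter every upsampled sample, keep 1 of M
    polyphase = math.ceil(num_taps / N)   # only nonzero inputs, only kept outputs
    return direct, polyphase

# 44.1 kHz -> 48 kHz (N = 160, M = 147) with an assumed 3200-tap filter
direct, poly = mults_per_output_sample(3200, 160, 147)
print(f"direct: {direct}, polyphase: {poly}")
```

Even with the polyphase structure, the cost per output sample still scales with the filter length, which is why shortening the filter remains worthwhile.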
As indicated above, the resampling operation creates imaging and/or aliasing artifacts in the output audio signal 113 if no anti-aliasing filter 102 is used. These imaging and/or aliasing artifacts are created as a result of the upsampling 101 and downsampling 103 operations. This is illustrated in the frequency spectrum shown in
The input audio signal 110 has an input sampling rate fsin=40 kHz, i.e. the Nyquist frequency of the input audio signal 110 is at fsin/2=20 kHz. As a result of the upsampling operation 101, the upsampled audio signal 111 has an upsampled sampling rate of N×fsin=240 kHz, i.e. a Nyquist frequency of 120 kHz. The images of the sinusoids 201 at 2/3 kHz can be found at 40 kHz±2/3 kHz (reference numerals 202, 203), at 80 kHz±2/3 kHz (reference numerals 204, 205) and at 120 kHz−2/3 kHz (reference numeral 206). As such, the upsampled audio signal 111 comprises frequency components which exceed the Nyquist frequency of 20 kHz of the input audio signal 110.
If it is assumed that the input audio signal 110 at input sampling rate fsin=40 kHz is to be resampled to an output audio signal 113 at output sampling rate fsout=48 kHz, the downsampler 103 has to perform a downsampling by a factor M=5. However, due to the fact that the upsampled audio signal 111 comprises frequency components which exceed the Nyquist frequency fsout/2=24 kHz of the output audio signal 113 (see the sinusoid images 202, . . . , 206), so called aliasing occurs, thereby creating undesirable contributions of the sinusoid images 202, . . . , 206 to the spectrum of the output audio signal 113.
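The images created by the up-by-6 step and their aliases after the down-by-5 step can be computed for a single input sinusoid. The following is a minimal sketch with a hypothetical helper, using integer frequencies in Hz. The frequency f0 = 3 kHz is chosen for illustration only.

```python
def images_and_aliases(f0: int, fs_in: int, fs_out: int, N: int):
    """(image, alias) pairs in Hz for a sinusoid at f0, sorted by image.

    Images n*fs_in +/- f0 up to the intermediate Nyquist frequency N*fs_in/2
    are folded into the output baseband [0, fs_out/2].
    """
    f_nyq_int = N * fs_in // 2
    images = sorted({abs(n * fs_in + s * f0)
                     for n in range(0, N // 2 + 2) for s in (-1, 1)
                     if abs(n * fs_in + s * f0) <= f_nyq_int})
    pairs = []
    for f in images:
        r = f % fs_out
        pairs.append((f, r if r <= fs_out // 2 else fs_out - r))
    return pairs

for image, alias in images_and_aliases(3000, 40000, 48000, 6):
    print(f"component at {image} Hz appears at {alias} Hz in the output")
```

Without an anti-aliasing filter, the single 3 kHz input component therefore reaches the 48 kHz output at six frequencies: the intended 3 kHz plus five spurious ones.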
In order to avoid these undesirable contributions to the output audio signal 113, the upsampled audio signal 111 should be filtered using an anti-aliasing filter 102. The filter 102 should ensure that the spectral images 202, . . . , 206 created during the upsampling operation 101 do not cause aliasing during the downsampling operation 103. This can be ensured by using a low pass filter having a cut-off frequency or a pass band edge/stop band edge which corresponds to the lower one of fsout/2 and fsin/2, i.e. which corresponds to the lower one of the Nyquist frequency of the input audio signal 110 and the output audio signal 113.
The anti-aliasing filters 102 are usually specified by one or more filter design parameters. Typically, the most important design parameters for this type of filter are "stop band rejection", "pass band edge", and "pass band ripple" (in particular with regard to the signal processing involved). These three design parameters may have an influence on the number of filter coefficients (i.e. the length) of the anti-aliasing filter 102, and therefore on the complexity of the rational resampler 100. Consequently, a trade-off between the imposed filter design parameters and the length of the anti-aliasing filter 102 must be found. By way of example, the pass band ripple may be set at 0.1 dB and the available cycle budget (i.e. the available number of filter coefficients) may allow for a stop band rejection of around −50 dB.
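Kaiser's well-known empirical estimate for the length of an equiripple FIR filter makes this trade-off concrete. It is a standard rule of thumb, not part of this document. The sketch treats the pass band ripple figure as the peak upward deviation from unity, which is an assumption. The function name is hypothetical.

```python
import math


def kaiser_length_estimate(passband_ripple_db: float, stopband_atten_db: float,
                           transition_hz: float, fs_hz: float) -> int:
    """Kaiser's estimate of the number of taps of an equiripple (Parks-McClellan) FIR filter."""
    delta_p = 10 ** (passband_ripple_db / 20) - 1        # peak pass band deviation (linear)
    delta_s = 10 ** (-abs(stopband_atten_db) / 20)       # peak stop band deviation (linear)
    delta_f = transition_hz / fs_hz                      # transition width normalized to fs
    n = (-20 * math.log10(math.sqrt(delta_p * delta_s)) - 13) / (14.6 * delta_f) + 1
    return math.ceil(n)
```

As an illustration, consider the 40 kHz to 48 kHz resampler, where the filter runs at N·fsin = 240 kHz, together with a hypothetical 4 kHz transition band. Under these assumptions, `kaiser_length_estimate(0.1, 50, 4000, 240000)` gives 130 coefficients. Raising the rejection to 60 dB adds roughly 20 more, and halving the transition width roughly doubles the length.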
In the following, different aspects are described which should be taken into account when designing an appropriate anti-aliasing filter 102. For this purpose, reference is made to
In column 310 of the bifrequency map 300 it can be seen that the frequency component at 3 kHz of the input signal 110 creates an intended frequency component 311 at 3 kHz in the output signal 113. However, due to imaging and aliasing effects, the frequency component at 3 kHz of the input signal 110 also creates frequency components 312, 313, 314, 315, 316 at other frequencies of the output signal 113. These latter frequency components may be perceived as artifacts within the output signal 113.
In a similar manner, it can be seen in line 320 of the bifrequency map 300 that the frequency component at 7 kHz of the output signal 113 receives an intended contribution from the frequency component 321 of the input signal 110. However, the frequency component at 7 kHz of the output signal 113 also receives contributions 322, 323, 324, 325 from other frequencies of the input signal 110. These latter contributions may result in audible artifacts of the output signal 113.
As such, the bifrequency map 300 may be used to illustrate how the different frequency components of the output signal 113 are influenced by the frequency components of the input signal 110. Consequently, the bifrequency map 300 may also be used to identify certain frequency ranges of the input signal 110 which do not influence the output signal 113. This knowledge of frequency ranges of the input signal 110 not influencing the output signal 113 may be taken into account during the design of the anti-aliasing filter 102. As a result, the performance of the filter 102 may be improved and/or the length/complexity of the filter 102 may be reduced.
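A column and a line of such a bifrequency map can be computed directly. The sketch uses the triangle wave function T(x) that this document defines further below. It assumes the 40 kHz to 48 kHz example (N=6, M=5), with integer kHz values and exact rational arithmetic. Whether map 300 uses exactly these rates is an assumption, and the function names are hypothetical.

```python
import math
from fractions import Fraction


def tri(x: Fraction) -> Fraction:
    """Triangle wave T(x) = |frac(x + 1/2) - 1/2|, evaluated exactly."""
    y = x + Fraction(1, 2)
    return abs(y - math.floor(y) - Fraction(1, 2))


def outputs_of_input(f_in: int, fs_in: int, fs_out: int, n_up: int) -> list:
    """Output frequencies produced by an input tone f_in (a column of the map)."""
    nyq = Fraction(n_up * fs_in, 2)               # Nyquist frequency of the intermediate rate
    result = set()
    for k in range(n_up + 1):                     # f_in itself and its images k*fs_in -/+ f_in
        for f in (k * fs_in - f_in, k * fs_in + f_in):
            if 0 <= f <= nyq:
                result.add(fs_out * tri(Fraction(f, fs_out)))   # alias after decimation
    return sorted(result)


def inputs_of_output(f_out: int, fs_in: int, fs_out: int, n_up: int) -> list:
    """Input frequencies contributing to output frequency f_out (a line of the map)."""
    nyq = Fraction(n_up * fs_in, 2)
    result = set()
    for j in range(n_up + 1):                     # intermediate frequencies j*fs_out -/+ f_out
        for f in (j * fs_out - f_out, j * fs_out + f_out):
            if 0 <= f <= nyq:
                result.add(fs_in * tri(Fraction(f, fs_in)))     # baseband origin, eq. (1)
    return sorted(result)
```

Under this assumption, `outputs_of_input(3, 40, 48, 6)` returns 3, 5, 11, 13, 19 and 21 kHz. This is the intended component plus five spurious ones, matching the count of components 312 to 316. Likewise, `inputs_of_output(7, 40, 48, 6)` returns 1, 7, 9, 15 and 17 kHz, i.e. the intended contribution plus four others, matching contributions 322 to 325.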
This is illustrated in
It can be seen in the graph 410 of
The graph 420 illustrates the constraints or parameters imposed during the design of the filter 102. In the illustrated case, the order of the filter 102 (i.e. the number of filter coefficients) was set to 60. The stop band suppression or attenuation was set to −28 dB within the “care” intervals 415, 416, 417 (constraints 425, 426, 427, respectively). The “care” interval 411 corresponds to the desired pass band 421 of the filter 102 (with no attenuation, i.e. a pass band attenuation of 0 dB). No constraints or parameters were imposed for the “don't care” intervals 412, 413, 414.
As such, a set of constraints on the target frequency response of filter 102 can be formulated. These constraints apply to the "care" intervals 411, 415, 416, 417, whereas the frequency response of the filter 102 can take on any form within the "don't care" intervals 412, 413, 414. The filter coefficients of a filter 102 meeting or approximating these requirements may be determined using filter design methods such as the Parks-McClellan algorithm. This algorithm determines the set of filter coefficients that minimizes the maximum weighted deviation from the target frequency response.
The Parks-McClellan algorithm is directed at minimizing the maximum of an approximation error E(f) given by
$$E(f) = W(f)\,\lvert D(f) - H(f) \rvert,$$
wherein D(f) is the desired form of the low pass filter 102, i.e. the target frequency response, and is typically given by
$$D(f) = \begin{cases} 1, & 0 \le f \le f_p, \\ 0, & f_s \le f \le \tfrac{1}{2}, \end{cases}$$
with fp being the pass band edge and fs being the stop band edge (frequencies normalized to the sampling rate of the filter). As outlined above, other attenuation values may be defined in the pass band and/or the stop band. W(f) is a frequency dependent weighting function of the approximation error. H(f) is given by
$$H(f) = \sum_{k=0}^{n} d_k \cos(2\pi k f)$$
and relates to the frequency response of the filter 102 by exp(−j2πnf)H(f). The filter coefficients hk of filter 102 are given by
$$h_k = h_{2n-k}, \qquad d_{n-k} = 2h_k,\ k = 0, \dots, n-1, \qquad d_0 = h_n.$$
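The relation between a symmetric filter of length 2n+1 and its cosine (zero-phase) form can be checked numerically. The sketch assumes the usual Type I parameterization H(f) = Σ d_k cos(2πkf) with f normalized to the sampling rate. The helper names are hypothetical.

```python
import cmath
import math


def amplitude_coefficients(h):
    """Map a symmetric filter h (length 2n+1) to d_0..d_n with d_0 = h_n and d_{n-k} = 2 h_k."""
    if len(h) % 2 == 0:
        raise ValueError("expected odd length 2n+1")
    n = len(h) // 2
    if any(abs(h[k] - h[2 * n - k]) > 1e-12 for k in range(n)):
        raise ValueError("h must satisfy h_k = h_{2n-k}")
    d = [0.0] * (n + 1)
    d[0] = h[n]
    for k in range(n):
        d[n - k] = 2 * h[k]
    return d


def amplitude_response(d, f):
    """Real-valued H(f) = sum_k d_k cos(2 pi k f)."""
    return sum(dk * math.cos(2 * math.pi * k * f) for k, dk in enumerate(d))


def dtft(h, f):
    """Complex frequency response sum_k h_k exp(-j 2 pi k f)."""
    return sum(hk * cmath.exp(-2j * math.pi * k * f) for k, hk in enumerate(h))
```

For any f, `dtft(h, f)` equals `cmath.exp(-2j * math.pi * n * f) * amplitude_response(d, f)`. The linear-phase factor exp(−j2πnf) therefore carries all of the phase, and H(f) carries the real amplitude that the Parks-McClellan algorithm fits to D(f).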
The Parks-McClellan (Remez exchange) algorithm comprises the following steps:
1) Initialization: Choose an initial extremal set of frequencies $\{f^{(0)}\}$.
2) Finite set approximation (at iteration m): Calculate the best Chebyshev (i.e. minimax) approximation on the present extremal set, giving a deviation value $\delta^{(m)}$ for the minimax error on the present extremal set.
3) Interpolation: Calculate the error function $E^{(m)}(f)$ over the entire set of frequencies Ω using the approximation obtained in step 2).
4) Look for local maxima of $E^{(m)}(f)$ on the set of frequencies Ω.
5) If $\max_{f\in\Omega} E^{(m)}(f) > \delta^{(m)}$, update the extremal set to $\{f^{(m+1)}\}$ by picking new frequencies where $E^{(m)}(f)$ has its local maxima, making sure that the signed weighted error $W(f)\,(D(f)-H(f))$ alternates in sign on the ordered new extremal set. Return to step 2) and iterate.
6) If $\max_{f\in\Omega} E^{(m)}(f) \le \delta^{(m)}$, the algorithm is complete. Use the set $\{f^{(m)}\}$ and the interpolation formula to compute an inverse discrete Fourier transform to obtain the filter coefficients.
Details on the Parks-McClellan algorithm are outlined in T. Parks, J. McClellan, "Chebyshev Approximation for Nonrecursive Digital Filters with Linear Phase", IEEE Transactions on Circuit Theory, Vol. CT-19, No. 2, March 1972, which is incorporated by reference.
The fact that certain frequency intervals of the target frequency response D(f) are not specified, i.e. the fact that the target frequency response of the filter 102 comprises "don't care" intervals 412, 413, 414, typically leads to shorter filters 102 for achieving the target frequency response, or to filters 102 of a given length achieving an improved approximation of the target frequency response within the "care" intervals 411, 415, 416, 417. The "don't care" intervals 412, 413, 414 may be taken into account within the Parks-McClellan algorithm by ignoring the approximation error E(f) within the "don't care" intervals 412, 413, 414. In other words, an approximation error E(f) exceeding the deviation value δ within a "don't care" interval does not trigger a further iteration of the algorithm.
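SciPy's `scipy.signal.remez` offers one readily available way to try this. Any frequency not covered by its `bands` list is left unconstrained, so "don't care" intervals are simply gaps between bands. The scenario below is hypothetical. It uses the 40 kHz to 48 kHz resampler, with the filter at 240 kHz, and assumes an input that carries negligible energy above 16 kHz. Under that assumption only the images of 0 to 16 kHz need suppression. The sketch compares the worst care-band level against a filter of the same length whose whole range above 24 kHz is constrained. `maxiter` is raised from SciPy's default of 25 as a precaution.

```python
import numpy as np
from scipy.signal import freqz, remez

FS_KHZ = 240.0                                             # filter rate N*fsin for N = 6, fsin = 40 kHz
PASS = (0.0, 16.0)                                         # hypothetical pass band
CARE_STOP = [(24.0, 56.0), (64.0, 96.0), (104.0, 120.0)]   # images of 0-16 kHz


def design(numtaps, stop_bands):
    """Equiripple design; frequencies not covered by any band are "don't care"."""
    bands, desired = list(PASS), [1.0]
    for lo, hi in stop_bands:
        bands += [lo, hi]
        desired.append(0.0)
    return remez(numtaps, bands, desired, fs=FS_KHZ, maxiter=100)


def band_range_db(h, lo, hi):
    """Minimum and maximum magnitude (dB) of the filter response within [lo, hi] kHz."""
    w, resp = freqz(h, worN=4096, fs=FS_KHZ)
    mag = np.maximum(np.abs(resp[(w >= lo) & (w <= hi)]), 1e-12)   # avoid log10(0)
    mag_db = 20 * np.log10(mag)
    return mag_db.min(), mag_db.max()


h_dc = design(60, CARE_STOP)              # gaps 16-24, 56-64 and 96-104 kHz left unconstrained
h_full = design(60, [(24.0, 120.0)])      # everything above 24 kHz constrained
for name, h in (("don't care", h_dc), ("full stop band", h_full)):
    worst = max(band_range_db(h, lo, hi)[1] for lo, hi in CARE_STOP)
    print(f"{name}: worst care-band level {worst:.1f} dB")
```

Because the minimax problem with gaps has fewer constraints, its worst care-band level should not be higher than that of the fully constrained design. The response inside the gaps, however, is arbitrary and may even exceed unity.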
It should be noted that other filter design methods may be used to determine a filter 102 approximating the target frequency response.
The frequency response 430 of the resulting filter 102 is depicted in the graph 420 of
Alternatively or in addition, other aspects may be taken into consideration when designing the anti-aliasing filter 102. In particular, audio perceptual aspects, e.g. the absolute threshold of hearing (ATH), may be taken into consideration.
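The ATH curve 505 may be approximated by a mathematical equation, e.g. by the equation proposed by Terhardt:
$$\mathrm{ATH}(f) = 3.64\left(\frac{f}{1000}\right)^{-0.8} - 6.5\,e^{-0.6\left(\frac{f}{1000} - 3.3\right)^{2}} + 10^{-3}\left(\frac{f}{1000}\right)^{4} \quad \text{[dB SPL]},$$
wherein the frequency f is measured in Hz.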
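Terhardt's approximation, in the form commonly quoted in the literature, is easy to evaluate. The sketch below uses a hypothetical function name and returns values in dB SPL. The formula is singular at 0 Hz, so f must be positive.

```python
import math


def ath_db(f_hz: float) -> float:
    """Terhardt's approximation of the absolute threshold of hearing in dB SPL (f in Hz, f > 0)."""
    k = f_hz / 1000.0
    return 3.64 * k ** -0.8 - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2) + 1e-3 * k ** 4
```

The curve is about 3.4 dB at 1 kHz, dips below 0 dB around 3 to 4 kHz, and exceeds 150 dB at 20 kHz. This steep rise at high frequencies is what the following design steps exploit.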
b illustrates the frequency intervals 506, 507 corresponding to high values of ATH in the bifrequency map 500. It can be seen that these frequency ranges 506, 507 of the output signal 113 may have high imaging/aliasing contributions from the input signal 110, without being audible to a listener of the output signal 113. In other words, due to the high absolute threshold of hearing in the frequency ranges 506, 507, imaging/aliasing artifacts within the output signal 113 are of reduced importance to the perceived quality of the output signal 113. It is proposed to use this knowledge during the design of the anti-imaging/anti-aliasing filter 102.
c shows how this information regarding the absolute threshold of hearing can be taken into account during the design of the anti-imaging/anti-aliasing filter 102. In a similar manner to
The graph 520 of
I.e. the graph 520 illustrates the constraints which are used during the design of the filter 102. In a similar manner to
It should be noted that frequency ranges for which the ATH value exceeds a certain level, i.e. frequency ranges which cannot reasonably comprise audible frequency components of the output signal 113, may be declared as "don't care" intervals, thereby removing further constraints during the design of filter 102 in this particular frequency range. This is illustrated in segment 525-1 (associated with a frequency range of approximately 20-24 kHz), where the ATH value is very high. Consequently, the number of degrees of freedom on the frequency response of filter 102 can be increased.
As illustrated in
In view of the above, the ATH curve 505 is fitted into the frequency diagram of the upsampled output signal while taking into account the images created due to the (notional) up-by-M upsampling (i.e. prior to the down-by-M decimation). As such, the allowed deviation from the target frequency response of the filter 102 in the "care" frequency intervals 415, 416, 417 is derived from images of the ATH curve 505 in the frequency diagram of the upsampled output signal, i.e. in the frequency diagram mirrored in accordance with the Nyquist frequency of the output signal 113. This is shown in
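These mirrored ATH images can be obtained by evaluating the ATH at the output frequency to which each intermediate frequency aliases. That output frequency is fal = fsout·T(f/fsout), using the triangle wave T(x) defined later in this document. The sketch reuses Terhardt's approximation. The 20 Hz floor is an illustrative assumption that avoids the singularity at 0 Hz, and the function names are hypothetical.

```python
import math


def ath_db(f_hz: float) -> float:
    """Terhardt's ATH approximation in dB SPL (f in Hz, f > 0)."""
    k = f_hz / 1000.0
    return 3.64 * k ** -0.8 - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2) + 1e-3 * k ** 4


def tri(x: float) -> float:
    """Triangle wave T(x) = |frac(x + 1/2) - 1/2|."""
    y = x + 0.5
    return abs(y - math.floor(y) - 0.5)


def ath_image_db(f_hz: float, fs_out_hz: float, f_min_hz: float = 20.0) -> float:
    """ATH at the output frequency to which intermediate frequency f aliases.

    Across the intermediate band this traces the ATH curve mirrored about the
    multiples of fs_out/2; f_min_hz is an illustrative floor near 0 Hz.
    """
    f_al = fs_out_hz * tri(f_hz / fs_out_hz)
    return ath_db(max(f_al, f_min_hz))
```

With fsout = 48 kHz, both 44 kHz and 52 kHz map to the ATH value at 4 kHz. In contrast, 72 kHz maps to 24 kHz, where the threshold is so high that the filter can be left almost unconstrained.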
Segment 525-3 is adjacent to segment 523 corresponding to the “don't care” interval 513. The fact that the target frequency response 532 is left blank within the segment 523 indicates that no constraints are imposed on the target frequency response 532 within the “don't care” interval 513.
Overall, the allowed deviations from the target frequency response 531, 532 in the stop band are made up of a succession of scaled and possibly mirrored images of the ATH curve 505. These allowed deviations may be interrupted by "don't care" frequency intervals. Using the allowed deviation from the target frequency response 531, 532 as an input to filter design methods such as the Parks-McClellan algorithm provides the coefficients of an anti-aliasing filter 102. The resulting frequency response 430 of filter 102 is shown in graph 520 of
In
The effect of the psychoacoustic resampler 100 using an anti-aliasing filter 601 can be seen in
In the following further aspects are outlined which may be taken into consideration when designing the anti-aliasing filter 102. For this purpose, the imaging and aliasing caused by rational resampling is analyzed in further detail from a mathematical perspective.
As outlined above, fsin and fsout are the input and output sampling rates, respectively. It has been outlined in the context of
A triangle wave function T(x)=|frac(x+½)−½| may be defined, where the function “frac(.)” denotes the fractional part of its argument. Such a triangle wave function is illustrated in
On the other hand, looking at an image frequency f=n·fsin±fbb, the triangle wave function T(x) can be used to derive the baseband frequency fbb from which the image frequency f originated:
$$f_{bb} = f_{s,\mathrm{in}} \cdot T\!\left(f / f_{s,\mathrm{in}}\right). \qquad (1)$$
Likewise, during downsampling to the output sampling rate fsout, the image frequency f aliases back into the baseband via the function fal=fsout·T(f/fsout), wherein fal is the alias component (in the baseband of the output signal 113) originating from image frequency f. Varying the parameter f and plotting the periodically varying baseband component fbb against the alias component fal produces the Lissajous-like
The bifrequency plot 900 of
The bifrequency plot 900 also shows that every frequency in the output domain can be an alias of M=5 frequencies in the input domain, one being the same frequency in the input signal itself. This had already been shown in
Furthermore, points 911, 912, 913, 914 can be identified in the bifrequency map 900 where the alias component of an image of the input frequency coincides with the input frequency. These points 911, 912, 913, 914 may be referred to as self-masking points. By way of example, point 911 is positioned at coordinates (4 kHz, 4 kHz) in the bifrequency diagram 900. Line 901 (corresponding to the frequency axis of the upsampled signal 111) traverses point 911 twice, a first time at 4 kHz and a second time at 44 kHz. This means that not only the original input frequency fbb=4 kHz, but also its image fbb+fsin=44 kHz contributes to the output frequency f=4 kHz. In a similar manner, at point 912, not only the original input frequency fbb=8 kHz, but also its image fbb+2fsin=88 kHz contributes to the output frequency f=8 kHz. Since the input audio signals are real signals, their frequency spectra are symmetric, so mirrored images contribute as well: not only the original input frequency fbb=16 kHz, but also its image −fbb+2fsin=64 kHz contributes to the output frequency f=16 kHz (point 914), and not only the original input frequency fbb=12 kHz, but also its image −fbb+3fsin=108 kHz contributes to the output frequency f=12 kHz (point 913).
In addition, one can identify points where two aliases which are due to images of the same input frequency coincide with one another (while not coinciding with the original input frequency). These frequencies may require particular attenuation by the filter 102 because the two aliasing components might interfere constructively. These points may be referred to as self-interference points. By way of example, point 921 at frequency coordinates (4 kHz, 12 kHz) is traversed twice by the frequency axis 901 of the upsampled signal 111, once at 36 kHz (i.e. at −fbb+fsin, with fbb=4 kHz) and a second time at 84 kHz (i.e. at fbb+2fsin, with fbb=4 kHz). As can be seen, two images of the baseband frequency at 4 kHz contribute to the output frequency at 12 kHz, i.e. the images of the baseband frequency fbb contribute to an output frequency fal which is different from fbb.
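Both kinds of points can be located by scanning the image frequencies and comparing the baseband origin fbb = fsin·T(f/fsin) with the alias fal = fsout·T(f/fsout). The sketch below is a minimal version of such a scan. It assumes that the points fall on an integer kHz grid and uses exact rational arithmetic. The function name is hypothetical.

```python
import math
from collections import defaultdict
from fractions import Fraction


def tri(x: Fraction) -> Fraction:
    """Triangle wave T(x) = |frac(x + 1/2) - 1/2|, evaluated exactly."""
    y = x + Fraction(1, 2)
    return abs(y - math.floor(y) - Fraction(1, 2))


def bifrequency_points(fs_in: int, fs_out: int, n_up: int):
    """Scan the images fs_in/2 < f <= n_up*fs_in/2 on an integer grid.

    Returns the self-masking points {image f: f_bb} with f_al == f_bb, and the
    self-interference groups {(f_bb, f_al): [images]} in which several images
    of the same input frequency alias to the same, different, output frequency.
    """
    self_masking = {}
    groups = defaultdict(list)
    for f in range(fs_in // 2 + 1, n_up * fs_in // 2 + 1):
        f_bb = fs_in * tri(Fraction(f, fs_in))     # baseband origin, eq. (1)
        f_al = fs_out * tri(Fraction(f, fs_out))   # alias in the output band
        if f_al == f_bb:
            self_masking[f] = f_bb
        else:
            groups[(f_bb, f_al)].append(f)
    self_interference = {key: fs for key, fs in groups.items() if len(fs) > 1}
    return self_masking, self_interference
```

For fsin = 40 kHz, fsout = 48 kHz and N = 6, the self-masking result is {44: 4, 64: 16, 88: 8, 108: 12} (in kHz), i.e. the points 911 to 914. The self-interference groups include (4, 12): [36, 84], i.e. point 921. The scan also reports further groups of the same kind, such as (4, 20): [76, 116].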
In order to further analyze the generation of images during the upsampling operation 101, it is assumed in the following that M=1. As such, the relation between the input sampling rate fsin and the output sampling rate is fsout=N·fsin. Let τabs(f) be the absolute threshold of hearing (ATH) at frequency f, such that a tone with a signal level lower than τabs(f) will not be audible. A mathematical approximation of the ATH curve 505 has been provided in the context of
Considering the images at frequencies f=n fsin/2, n=1 . . . N, it is the purpose of the anti-imaging (upsampling) filter 102 to reduce the images to a level that will not be audible. To determine whether an image with energy level L at frequency f will be audible, the signal level L at frequency f can be compared to the ATH curve τabs(f). If L<τabs(f), the image will not be audible.
At the time of designing the filter 102, the signal level of the baseband audio signal 110 is typically not known, and thus the signal level L of the images is not known. The higher the signal level of the input signal 110, the more attenuation is required by filter 102. However, it can be assumed that the signal level of the baseband audio signal is below the threshold of pain τp(fbb) (where fbb is given by equation 1). By way of example, the threshold of pain τp(fbb) can be approximated by a frequency-independent constant of 120 dB SPL. Under these assumptions, the attenuation of the filter should be equal to or better than τabs(f)/τp(fbb), i.e. the squared magnitude of the frequency response of the filter 102 should satisfy
$$|H(f)|^2 < \tau_{abs}(f) / \tau_p(f_{bb}). \qquad (2)$$
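A possible way of designing such a filter 102 is the above described Parks-McClellan algorithm, with a pass band gain of 1 and a stop band gain of 0, i.e. with the target frequency response D(f) as outlined above. The linear error weighting function W(f) within the stop band may be set to
$$W(f) = \sqrt{\tau_p(f_{bb}) / \tau_{abs}(f)}.$$
With this weighting, a weighted minimax error δ corresponds to a stop band magnitude of at most δ·√(τabs(f)/τp(fbb)) (with both thresholds expressed in linear power units), so that equation (2) is met whenever δ≤1.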
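In dB, the power ratio τabs(f)/τp(fbb) becomes a simple difference. With the constant 120 dB SPL threshold of pain and Terhardt's ATH approximation, the required stop band attenuation and the corresponding linear weight can be tabulated as below. The helper names are hypothetical, and the clamp to 0 dB reflects that no amplification is wanted.

```python
import math

PAIN_DB = 120.0  # frequency-independent threshold of pain tau_p in dB SPL, as assumed in the text


def ath_db(f_hz: float) -> float:
    """Terhardt's ATH approximation in dB SPL (f in Hz, f > 0)."""
    k = f_hz / 1000.0
    return 3.64 * k ** -0.8 - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2) + 1e-3 * k ** 4


def required_attenuation_db(f_hz: float) -> float:
    """Attenuation needed so that |H(f)|^2 < tau_abs(f) / tau_p (eq. 2).

    A negative difference means that no attenuation is needed, so the result
    is clamped to 0 dB (unity gain).
    """
    return max(PAIN_DB - ath_db(f_hz), 0.0)


def stopband_weight(f_hz: float) -> float:
    """W(f) = sqrt(tau_p / tau_abs(f)), with both thresholds converted from dB SPL to power."""
    return 10 ** ((PAIN_DB - ath_db(f_hz)) / 20)
```

For a 16 kHz to 48 kHz upsampler (N=3, stop band above 8 kHz), this gives about 109 dB of required attenuation at 10 kHz and about 15 dB at 18 kHz. Above roughly 18.6 kHz, no attenuation is required at all.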
Likewise, when downsampling an input signal by a factor M (i.e. N=1) from an input sampling rate fsin to an output sampling rate fsout=fsin/M, the threshold of hearing curve 505 may be used to derive the allowed deviations from the ideal stop band suppression. However, for audio applications, the audio input signal 110 already lies in the audible frequency range. Consequently, the downsampled output signal 113 also lies in the audible frequency range. Therefore, the potential of exploiting the high thresholds of hearing at high frequencies in a pure downsampling scenario is limited.
As has been outlined in the context of
Let μ(f, f0) be the relative masking threshold afforded by a single tone with signal level L at frequency f0. That is, a tone with level L′ at frequency f where L′<L·μ(f, f0) will not be audible. An approximation of the relative masking threshold curve (also referred to as tone masking tone, TMT) may be given by
wherein the Bark scale can be approximated by
$$\mathrm{Bark}(f) = 13\cdot\arctan\left(0.76\cdot 10^{-3} f\right) + 3.5\cdot\arctan\left(\left(0.13\cdot 10^{-3} f\right)^{2}\right).$$
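The Bark approximation can be evaluated directly, for example to express the distance between an alias fal and its origin fbb, which a tone-masking-tone curve would take as input. The sketch places the square inside the second arctangent, following Zwicker and Terhardt's published formula. The extracted text is ambiguous on this point, so this is an assumption. The function names are hypothetical.

```python
import math


def bark(f_hz: float) -> float:
    """Bark(f) = 13 atan(0.76e-3 f) + 3.5 atan((0.13e-3 f)^2), with f in Hz."""
    return 13 * math.atan(0.76e-3 * f_hz) + 3.5 * math.atan((0.13e-3 * f_hz) ** 2)


def bark_distance(f1_hz: float, f2_hz: float) -> float:
    """Distance on the Bark scale, e.g. between an alias f_al and its origin f_bb."""
    return abs(bark(f1_hz) - bark(f2_hz))
```

The scale gives about 8.5 Bark at 1 kHz and between 24 and 25 Bark at 20 kHz, so equal spacings in Hz shrink in Bark towards high frequencies.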
The Bark scale is illustrated in
In the following, a signal component of the upsampled signal 111 at frequency f ∈ [0, N·fsin/2] in the intermediate, i.e. upsampled, frequency domain is considered. This component at frequency f is an image of the baseband signal component at fbb=fsin·T(f/fsin). During the downsampling process, it will be aliased to a component at frequency fal=fsout·T(f/fsout) of the output signal 113.
Let Lal=L·|H(f)|² be the signal level of the alias component at frequency fal. The alias component at fal may be masked by the component at fbb from which it originated via the upsampling-filtering-downsampling process. This may be the case if the alias component at fal originated from an input signal component at fbb with fbb≈fal. In order to exploit the masking by the component at fbb, i.e. in order to ensure that the alias component at fal is not audible, the signal level of the alias component should satisfy Lal=L·|H(f)|²≤L·μ(fal, fbb), i.e.
$$|H(f)|^2 \le \mu(f_{al}, f_{bb}). \qquad (3)$$
As can be seen in
As outlined in the context of
This is illustrated in
Furthermore,
$$|H(f)|^2 \le \max\left(\tau_{abs}(f_{al}) / \tau_p(f_{bb}),\ \mu(f_{al}, f_{bb})\right).$$
The stop band gain may be restricted to unity or lower in order to avoid amplification of aliases.
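The combined bound, including the cap at unity, can be sketched as a function of the intermediate frequency f. Because the document's tone-masking-tone curve μ is not reproduced here, μ is passed in as a callable. The toy masking function in the usage note is purely hypothetical, and the ATH part uses Terhardt's approximation with a 120 dB SPL threshold of pain.

```python
import math
from typing import Callable

PAIN_DB = 120.0  # threshold of pain tau_p in dB SPL, frequency independent as in the text


def ath_db(f_hz: float) -> float:
    """Terhardt's ATH approximation in dB SPL (f in Hz, f > 0)."""
    k = f_hz / 1000.0
    return 3.64 * k ** -0.8 - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2) + 1e-3 * k ** 4


def tri(x: float) -> float:
    """Triangle wave T(x) = |frac(x + 1/2) - 1/2|."""
    y = x + 0.5
    return abs(y - math.floor(y) - 0.5)


def max_power_gain(f_hz: float, fs_in: float, fs_out: float,
                   mu: Callable[[float, float], float]) -> float:
    """Upper bound on |H(f)|^2: the larger of tau_abs(f_al)/tau_p(f_bb) and
    mu(f_al, f_bb), capped at 1 so that aliases are never amplified."""
    f_bb = fs_in * tri(f_hz / fs_in)       # baseband origin, eq. (1)
    f_al = fs_out * tri(f_hz / fs_out)     # alias frequency in the output signal
    ath_bound = 10 ** ((ath_db(max(f_al, 20.0)) - PAIN_DB) / 10)   # 20 Hz floor is illustrative
    return min(1.0, max(ath_bound, mu(f_al, f_bb)))
```

Take a hypothetical `toy_mask = lambda f_al, f_bb: 1e-3 if abs(f_al - f_bb) < 500.0 else 0.0`. The self-masking image at 44 kHz (40 kHz to 48 kHz case) is then allowed a power gain of 1e-3 instead of about 1e-12. At 72 kHz, which aliases to 24 kHz, the ATH term alone already lifts the bound to unity.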
In
In step 1303 “don't care” intervals of the stop band are identified. The “don't care” intervals may be due to frequency intervals of the input audio signal having low energy, i.e. having an energy value below an energy threshold. Furthermore, the “don't care” intervals may be due to spectral images and/or mirrored images of such low energy frequency ranges of the input audio signal. Alternatively or in addition, the “don't care” intervals may be due to frequency ranges associated with a low auditory spectral sensitivity of a human listener. As outlined in the present document, images and/or mirrored images of such frequency ranges may be selected as “don't care” intervals of the stop band.
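Step 1303 can be sketched for the case where the "don't care" intervals stem from a low-energy input range. The hypothetical helper below maps a baseband range with negligible energy to all of its images and mirrored images in the intermediate domain [0, N·fsin/2], then merges overlapping pieces. Which input range counts as "low energy" (here 16 to 20 kHz of a 40 kHz input, N=6) is an assumption for illustration.

```python
def dont_care_intervals(low_energy, fs_in, n_up):
    """Images and mirrored images of low-energy baseband ranges [a, b] (0 <= a < b <= fs_in/2),
    clipped to [0, n_up*fs_in/2] and merged into disjoint intervals."""
    nyq = n_up * fs_in / 2
    raw = []
    for a, b in low_energy:
        for k in range(n_up + 1):
            # mirrored image below k*fs_in and direct image above it (k = 0: the range itself)
            for lo, hi in ((k * fs_in - b, k * fs_in - a), (k * fs_in + a, k * fs_in + b)):
                if hi >= 0 and lo <= nyq:
                    raw.append((max(lo, 0), min(hi, nyq)))
    raw.sort()
    merged = []
    for lo, hi in raw:
        if merged and lo <= merged[-1][1]:        # overlapping or touching: extend the last interval
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged
```

For that example, `dont_care_intervals([(16, 20)], 40, 6)` returns 16-24 kHz, 56-64 kHz and 96-104 kHz. These are the transition region plus two gaps around 60 kHz and 100 kHz in which the filter response may be left unconstrained.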
In step 1304 images or mirrored images of a perceptual frequency response indicative of the auditory spectral sensitivity of a human listener are assigned to the stop band, in particular to the frequency ranges of the stop band outside of the “don't care” intervals. The perceptual frequency response may be associated with a scaled and/or shifted version of the absolute threshold of hearing curve, thereby attributing different degrees of attenuation to different frequencies of the stop band. Alternatively or in addition, the perceptual frequency response may be associated with the self-masking threshold curve at particular frequencies.
As a result of steps 1301 to 1304 a target frequency response and allowed deviations from this target frequency response of the psychoacoustic filter 102 have been determined. Within the “don't care” intervals, the allowed deviation from the target frequency response takes on arbitrary or undefined values. Outside the “don't care” intervals of the stop band, the allowed deviation from the target frequency response is associated with a perceptual frequency response indicative of the auditory spectral sensitivity of a human listener. In step 1305 the coefficients of the filter are determined using filter design methods such as a Parks-McClellan algorithm. Such filter design methods determine the filter coefficients such that the frequency response of the resulting filter is fitted to the target frequency response, while taking into account the allowed deviation from the target frequency response.
In the present document, a method and system for designing psychoacoustic anti-aliasing filters has been described. The resulting filters may be used to implement psychoacoustic resamplers 100 which perform rational resampling at reduced computational complexity and/or at improved perceptual quality.
The methods and systems described in the present document may be implemented as software, firmware and/or hardware. Certain components may e.g. be implemented as software running on a digital signal processor or microprocessor. Other components may e.g. be implemented as hardware and/or as application specific integrated circuits. The signals encountered in the described methods and systems may be stored on media such as random access memory or optical storage media. They may be transferred via networks, such as radio networks, satellite networks, wireless networks or wireline networks, e.g. the internet. Typical devices making use of the methods and systems described in the present document are portable electronic devices or other consumer equipment which are used to store and/or render audio signals. The methods and systems may also be used on computer systems, e.g. internet web servers, which store and provide audio signals, e.g. music signals, for download.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP2011/072311 | 12/9/2011 | WO | 00 | 6/7/2013 |
Number | Date | Country |
---|---|---|
61421271 | Dec 2010 | US |