By means of phase vocoders [1-3] or other time or pitch modification techniques such as Synchronized Overlap-Add (SOLA), the playback rate of an audio signal can, for example, be modified while the original pitch is preserved. Moreover, these methods can be applied to carry out a transposition of the signal while maintaining the original playback duration. The latter can be accomplished by stretching the audio signal by an integer factor and subsequently adjusting the playback rate of the stretched audio signal by the same factor. For a time-discrete signal, this corresponds to a downsampling of the time-stretched audio signal by the stretching factor, given that the sampling rate remains unchanged.
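The relation between time stretching and downsampling can be illustrated with a minimal sketch. It assumes an idealized stretcher that, for a pure sinusoid, simply yields the same sinusoid with more samples; a real phase vocoder or SOLA stage would take its place. The function names and the test values are hypothetical, and no anti-aliasing filter is included.

```python
import math

def ideal_stretched_sine(freq_hz, sample_rate, num_samples, factor):
    """Idealized time-stretched sinusoid (assumption: a perfect stretcher).

    The frequency is unchanged, but the signal has `factor` times as many samples.
    """
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
            for n in range(num_samples * factor)]

def decimate(signal, factor):
    """Keep every `factor`-th sample; the sampling rate is left unchanged."""
    return signal[::factor]

# Stretch a 100 Hz tone by 2, then downsample by 2:
# the duration is restored and the frequency is doubled.
stretched = ideal_stretched_sine(100.0, 8000.0, 800, 2)
transposed = decimate(stretched, 2)
```

Under these assumptions, `transposed` has the original 800 samples and coincides with a 200 Hz sinusoid sampled at 8000 Hz.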
Phase vocoder based bandwidth extension methods like [4-5] generate, depending on the necessitated overall bandwidth, a variable number of band limited sub bands (patches), which are summed up to form a sum signal that exhibits the necessitated overall bandwidth.
The temporal alignment of the single patches which result from the phase vocoder application turns out to be a specific challenge. In general, these patches have time delays of different durations. This is because the synthesis windows of the phase vocoders are arranged in fixed hop sizes which are dependent on the stretching factor, and therefore every individual patch has a delay of a predefined duration. This leads to a frequency selective time delay of the bandwidth extended sum signal. Since this frequency selective delay affects the vertical coherence properties of the overall signal it has a negative impact on the transient response of the bandwidth extension method.
Another challenge arises within the individual patches, where a lack of cross frequency coherence has a negative impact on the magnitude response of the phase vocoder.
According to an embodiment, an apparatus for generating a bandwidth extended audio signal from an input signal may have: a patch generator for generating one or more patch signals from the input signal, wherein a patch signal has a patch center frequency being different from a patch center frequency of a different patch or from a center frequency of the input audio signal, wherein the patch generator is configured for performing a time stretching of subband signals from an analysis filterbank, and wherein the patch generator includes a phase adjuster for adjusting phases of the subband signals using a filterbank-channel dependent phase correction.
According to another embodiment, a method of generating a bandwidth extended audio signal from an input signal may have the steps of: generating one or more patch signals from the input signal, wherein a patch signal has a patch center frequency being different from a patch center frequency of a different patch or from a center frequency of the input audio signal, wherein a time stretching of subband signals from an analysis filterbank is performed, and wherein phases of the subband signals are adjusted using a filterbank-channel dependent phase correction.
Another embodiment may have a computer program having a program code for performing, when running in a computer, the inventive method.
An apparatus for generating a bandwidth extended audio signal from an input signal comprises a patch generator for generating one or more patch signals from the input signal. The patch generator is configured for performing a time stretching of subband signals from an analysis filter bank and comprises a phase adjuster for adjusting phases of the subband signals using a filterbank-channel dependent phase correction.
A further advantage of the present invention is that negative impacts on magnitude responses normally introduced by phase vocoder-like structures for bandwidth extension or other structures for bandwidth extension are avoided.
A further advantage of the present invention is that an optimized magnitude response of the individual patches, which are, for example, created by means of phase vocoders or phase vocoder-like structures, is obtained. In a further embodiment, the temporal alignment of the individual patches can be addressed as well, but the phase correction within a patch, i.e. among the subband signals processed using one and the same transposition factor can be applied with or without the time correction which is valid for all subband signals within a patch as a whole.
An embodiment of the present invention is a novel method for the optimization of the magnitude response and temporal alignment of the single patches which are created by means of phase vocoders. This method basically consists of choices of phase corrections to the transposed subbands in a complex modulated filterbank implementation and of the introduction of additional time delays into the single patches which result from phase vocoders with different transposition factors. The time duration of the additional delay introduced to a specific patch depends on the applied transposition factor and can be determined theoretically. Alternatively, the delay is adjusted such that, when a Dirac impulse is applied as the input signal, the temporal center of gravity of the transposed Dirac impulse in every patch is aligned on the same temporal position in a spectrogram representation.
There are many methods that carry out transpositions of audio signals by a single transposition factor such as the phase vocoder. If several transposed signals have to be combined, one can correct the time delays between the different outputs. A correct vertical alignment between the patches is useful but not necessarily part of these algorithms. This is not harmful as long as no transients are considered. The problem of correct alignment of different patches is not addressed in state of the art literature.
Transposition of spectra by means of phase vocoders is not guaranteed to preserve the vertical coherence of transients. Moreover, post echoes emerge in the high frequency bands due to the overlap add method utilized in the phase vocoder as well as the different time delays of the single patches which contribute to the sum signal. It is therefore desirable to align the patches in such a way that the bandwidth extension parametric post processing can exploit a better vertical alignment amongst the patches. The entire time span covering pre- and post-echo thereby has to be minimized.
A phase vocoder is typically implemented by multiplicative integer phase modification of subband samples in the domain of an analysis/synthesis pair of complex modulated filter banks. This procedure does not automatically guarantee the proper alignment of the phases of the resulting output contributions from each synthesis subband, and this leads to a non-flat magnitude response of the phase vocoder. This artifact results in a time-varying amplitude of a transposed slow sine sweep. In terms of audio quality for general audio, the drawback is a coloring of the output by modulation effects.
Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
a illustrates implementation details for an analysis filterbank allowing a transposition-factor independent phase correction; and
b illustrates implementation details for an analysis filterbank necessitating a transposition-factor dependent phase correction.
The present application provides different aspects of apparatuses, methods or computer programs for processing audio signals in the context of bandwidth extension and in the context of other audio applications, which are not related to bandwidth extension. The features of the subsequently described and claimed individual aspects can be partly or fully combined, but can also be used separately from each other, since the individual aspects already provide advantages with respect to perceptual quality, computational complexity and processor/memory resources when implemented in a computer system or micro processor.
Embodiments employ a time alignment of the different harmonic patches which are created by phase vocoders. The time alignment is carried out on the basis of the center of gravity of a transposed Dirac impulse. The subsequent
By transposing this Dirac impulse by means of a phase vocoder, frequency selective delays are introduced into the resulting sub bands. The time duration of these is dependent on the utilized transposition factor. Subsequently, the transposition of a Dirac impulse with the transposition factors 2, 3 and 4 is shown exemplarily in
The frequency selective delays are compensated for by insertion of an additional individual time delay into each resulting patch. This way, every single sub band is aligned such that the center of gravity of the Dirac impulse in every patch is located at the same temporal position as the center of gravity of the Dirac impulse in the highest patch. The alignment is carried out based on the highest patch because it usually has the largest time delay. Applying the inventive delay compensation, the center of gravity of the Dirac impulse is located on the same temporal position for all patches inside a spectrogram. Such a representation of the resulting signals might look as depicted in
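The delay compensation described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: the center of gravity is taken here as the energy-weighted mean sample index (the text does not fix a formula), the reference is the patch with the latest center of gravity, and all names and impulse positions are hypothetical.

```python
def center_of_gravity(samples):
    """Energy-weighted mean sample index of a real or complex sequence.

    Assumption: energy weighting; the text only speaks of a "center of gravity".
    """
    energies = [abs(s) ** 2 for s in samples]
    total = sum(energies)
    if total == 0.0:
        raise ValueError("signal has no energy")
    return sum(i * e for i, e in enumerate(energies)) / total

def compensating_delays(patch_centers):
    """Additional delay per patch so that every center of gravity matches the latest one.

    `patch_centers` maps a transposition factor to the center of gravity
    measured for a transposed impulse in that patch.
    """
    latest = max(patch_centers.values())
    return {factor: latest - center for factor, center in patch_centers.items()}

def impulse_at(position, length=32):
    """Hypothetical patch response: a unit impulse at `position`."""
    return [1.0 if i == position else 0.0 for i in range(length)]

# Hypothetical impulse positions for transposition factors 2, 3 and 4.
centers = {t: center_of_gravity(impulse_at(p)) for t, p in {2: 10, 3: 14, 4: 20}.items()}
delays = compensating_delays(centers)
```

With these made-up positions, the patches for factors 2 and 3 receive extra delays of 10 and 6 samples, and the patch for factor 4 receives none.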
Eventually, it is necessitated to additionally compensate for the remaining time delay between the transposed high frequency regions and the original input signal. For that purpose, the input signal can be delayed as well so that the centers of gravity of the transposed Dirac impulses, which have been aligned to a certain temporal position beforehand, match the temporal position of the band limited Dirac impulse. Subsequently, the spectrogram of the resulting signal is shown in
For the application of the described method it is insignificant whether the phase vocoder as fundamental component of the bandwidth extension method is realised in time domain or inside a filter bank representation like for example a pQMF filter bank.
Using SOLA techniques, the subjective audio quality of transients is impaired by echo effects due to the overlap add, whereas the vertical coherence criterion is fulfilled at transients. Possible slight deviations of the positions of the center of gravity in the single patches from the actual center of gravity in the highest patch lie in the range of pre-masking or post-masking, respectively.
The result of a poorly adjusted phase vocoder in terms of magnitude response is illustrated by the output signal on
An operation in a complex modulated filterbank based phase vocoder is the multiplicative phase modification of subband samples. An input time domain sinusoid results to very good precision in the complex valued subband signals of the form
$$C\,\hat{v}_n(\omega)\exp\left[i(\omega q_A k+\theta_n)\right]$$
where $\omega$ is the frequency of the sinusoid, $n$ is the subband index, $k$ is the subband time slot index, $q_A$ is the time stride of the analysis filterbank, $C$ is a complex constant, $\hat{v}_n(\omega)$ is the frequency response of the filter bank prototype filter, and $\theta_n$ is a phase term characteristic for the filterbank in question, defined by the requirement that $\hat{v}_n(\omega)$ becomes real valued. For typical QMF filterbank designs, it can be assumed to be positive. Upon phase modification a typical result is then of the form
$$D\,\hat{v}_n(\omega)\exp\left[i(T\omega q_S k+T\theta_n)\right]$$
where $T$ is the transposition order and $q_S$ is the time stride of the synthesis filterbank. As the synthesis filterbank is typically chosen to be a mirror image of the analysis filterbank, a proper sinusoidal synthesis necessitates this last expression to correspond to the analysis subbands of a sinusoid. A failure to conform to this will lead to the amplitude modulations as depicted in
An embodiment of the present invention is to use an additive post modification phase correction based on
$$\Delta\theta_n=(1-T)\,\theta_n$$
This will map the unmodified subband signals into having the desirable cross subband phase evolution.
$$D\,\hat{v}_n(\omega)\exp\left[i(T\omega q_S k+T\theta_n)\right]\;\rightarrow\;D\,\hat{v}_n(\omega)\exp\left[i(T\omega q_S k+\theta_n)\right].$$
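The effect of the additive post modification phase correction can be checked with a short sketch. The function names and the numeric values are hypothetical; in particular, the value of the filterbank phase term θn depends on the filterbank in question and is simply assumed here.

```python
import cmath

def phase_correction(theta_n, transposition_order):
    """Additive correction (1 - T) * theta_n for subband n."""
    return (1 - transposition_order) * theta_n

def apply_phase_correction(subband_sample, theta_n, transposition_order):
    """Rotate a phase-modified complex subband sample by the correction."""
    return subband_sample * cmath.exp(1j * phase_correction(theta_n, transposition_order))

# Hypothetical values: T = 3, theta_n = 0.4 rad, and 1.1 rad standing in
# for the signal-dependent term T * omega * q_S * k.
T, theta_n, carrier = 3, 0.4, 1.1
modified = cmath.exp(1j * (carrier + T * theta_n))   # phase after vocoder multiplication
corrected = apply_phase_correction(modified, theta_n, T)
desired = cmath.exp(1j * (carrier + theta_n))        # phase expected by the synthesis bank
```

After the rotation, `corrected` agrees with `desired` up to rounding, since T·θn + (1 − T)·θn = θn.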
For the specific example of an oddly stacked complex modulated QMF filterbank, one has
And the inventive phase correction is given based on
The output of the phase adjusted phase vocoder according to this rule is depicted on
If the analysis/synthesis filterbank pair has a more asymmetric distribution of phase twiddles, there will exist a phase correction ψn which, when added to the analysis subbands and applied with a minus sign prior to synthesis, brings the situation back to the above symmetric case. In that case the above inventive phase correction should be adjusted based on
$$\Delta\theta_n=(1-T)(\theta_n-\psi_n)$$
An example of this is given by a 64 band QMF filterbank pair used in the upcoming MPEG standard on Unified Speech and Audio coding (USAC) based on
wherein C is a real number and can have values between 2 and 3.5. Particular values are 321/128 or 385/128.
Hence for that pair one can use
Furthermore, in a special implementation of the above situation, one observes that a phase correction which is independent of the transposition order T could be incorporated in the analysis filter bank step itself. Since a correction prior to the vocoder phase multiplication corresponds to T times the same correction after phase multiplication, the following decomposition occurs as advantageous,
The analysis filterbank modulation is then modified to add the phase
compared to the case for the standardized QMF filterbank pair, and the inventive phase correction becomes equal to the second term alone,
The advantage of the phase correction is that a flat magnitude response of each vocoder order contribution to the output is obtained.
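The observation made above, that a correction applied before the vocoder phase multiplication corresponds to T times the same correction applied afterwards, can be verified numerically. The sketch below assumes that the phase multiplication keeps the magnitude of the subband sample and multiplies its phase by an integer T; the function name and the values are hypothetical.

```python
import cmath

def multiply_phase(sample, transposition_order):
    """Keep the magnitude and multiply the phase by the integer order T.

    Assumption: this is the 'multiplicative phase modification' in its
    simplest form, applied to a single complex subband sample.
    """
    magnitude, phase = cmath.polar(sample)
    return cmath.rect(magnitude, transposition_order * phase)

T = 3
sample = 0.8 * cmath.exp(0.7j)
alpha = 2.9  # hypothetical correction applied in the analysis stage

# Correction before the phase multiplication ...
pre = multiply_phase(sample * cmath.exp(1j * alpha), T)
# ... versus T times the same correction after it.
post = multiply_phase(sample, T) * cmath.exp(1j * T * alpha)
```

For integer T the two results agree even when the intermediate phase wraps around ±π, which is why a T-independent part of the correction can be moved into the analysis filterbank.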
The inventive processing is suitable for all audio applications that extend the bandwidth of audio signals by application of phase vocoder time stretching and down sampling or playback at increased rate respectively.
Preferably, a patch correction is performed in such a way that the patch generator 82 generates the one or more patch signals so that a time disalignment between the input audio signal and the one or more patch signals or a time disalignment between different patch signals is, when compared to a processing without correction, reduced or eliminated. In the embodiment in
It is to be noted that the
While a single delay for the result of all time stretched signals processed using the same time stretching amount is sufficient, an individual phase correction will have to be applied for each subband signal, since the individual phase correction is, although signal-independent, dependent on the channel number of a subband filterbank or, stated differently, a subband index of a subband signal, where a subband index means the same as a channel number in the context of this description.
The individual blocks are input into a windower 1802 for windowing the blocks using a window function for each block. Additionally, a phase calculator 1804 is provided, which calculates a phase for each block. The phase calculator 1804 can use the individual block either before windowing or subsequent to windowing. Then, a phase adjustment value p × k is calculated and input into a phase adjuster 1806. The phase adjuster applies the adjustment value to each sample in the block. Furthermore, the factor k is equal to the bandwidth extension factor. When, for example, a bandwidth extension by a factor of 2 is to be obtained, then the phase p calculated for a block extracted by the block extractor 1800 is multiplied by the factor 2, and the adjustment value applied to each sample of the block in the phase adjuster 1806 is p multiplied by 2.
In an embodiment, the single subband signal is a complex subband signal, and the phase of a block can be calculated by a plurality of different ways. One way is to take the sample in the middle or around the middle of the block and to calculate the phase of this complex sample.
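The phase calculation from the middle sample of a complex block, and the resulting adjustment value p × k, can be sketched as follows. The function names are hypothetical, and how the adjustment value is then applied to the individual samples is deliberately left out, since it depends on the embodiment.

```python
import cmath

def block_phase(block):
    """Phase p of the complex sample at (or next to) the middle of the block."""
    return cmath.phase(block[len(block) // 2])

def phase_adjustment_value(block, k):
    """Adjustment value p * k, where k is the bandwidth extension factor."""
    return k * block_phase(block)

# Hypothetical block of five complex subband samples; the middle one has phase 0.3 rad.
block = [cmath.exp(1j * phi) for phi in (0.1, 0.2, 0.3, 0.4, 0.5)]
value = phase_adjustment_value(block, 2)
```

For this block the calculated phase is 0.3 rad, so a bandwidth extension factor of 2 gives an adjustment value of 0.6 rad.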
Although illustrated in
The phase-adjusted blocks are input into an overlap/add and amplitude correction block 1808, where the windowed and phase-adjusted blocks are overlap-added. Importantly, however, the sample/block advance value in block 1808 is different from the value used in the block extractor 1800. Particularly, the sample/block advance value in block 1808 is greater than the value e used in block 1800, so that a time stretching of the signal output by block 1808 is obtained. Thus, the processed subband signal output by block 1808 is longer than the subband signal input into block 1800. When a bandwidth extension by a factor of two is to be obtained, a sample/block advance value is used which is two times the corresponding value in block 1800. This results in a time stretching by a factor of two. When, however, other time stretching factors are necessitated, then other sample/block advance values can be used so that the output of block 1808 has the necessitated time length. In an embodiment, only one sample with index m=0 will be modified to have k (or T) times its phase. This is, in this embodiment, not valid for the whole block. For the other samples, the modification can be different, as for example illustrated in
An amplitude correction is performed in order to address the issue of different overlaps in blocks 1800 and 1808. This amplitude correction could, however, also be introduced into the windower/phase adjuster multiplication factor, but the amplitude correction can also be performed subsequent to the overlap/processing.
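The interplay of block extraction, overlap-add with a larger advance value, and amplitude correction can be sketched as follows. This is a simplified illustration: the phase adjustment is omitted, the amplitude correction is done by dividing by the summed window weights (one possible choice, not prescribed by the text), and all names are hypothetical.

```python
def extract_blocks(signal, block_length, advance):
    """Extract overlapping blocks with the given sample/block advance value."""
    return [signal[start:start + block_length]
            for start in range(0, len(signal) - block_length + 1, advance)]

def overlap_add(blocks, advance, window):
    """Window the blocks, overlap-add them with `advance`, and correct the amplitude.

    Assumption: the amplitude correction divides by the summed window weights.
    """
    if not blocks:
        return []
    block_length = len(window)
    out_len = (len(blocks) - 1) * advance + block_length
    out = [0.0] * out_len
    norm = [0.0] * out_len
    for b, block in enumerate(blocks):
        start = b * advance
        for m in range(block_length):
            out[start + m] += window[m] * block[m]
            norm[start + m] += window[m]
    return [o / w if w > 0.0 else 0.0 for o, w in zip(out, norm)]

# Block length 12, extraction advance 1, overlap-add advance 2 (stretch by two).
signal = [1.0] * 40
blocks = extract_blocks(signal, 12, 1)
stretched = overlap_add(blocks, 2, [1.0] * 12)
```

Here 29 blocks are extracted from 40 input samples and overlap-added into 68 output samples, so the block positions advance twice as fast at the output as at the input, and the constant input stays at amplitude 1 after the correction.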
In the above example with a block length of 12 and a sample/block advance value in the block extractor of one, the sample/block advance value for the overlap/add block 1808 would be equal to two, when a bandwidth extension by a factor of two is performed. This would still result in an overlap of five blocks. When a bandwidth extension by a factor of three is to be performed, then the sample/block advance value used by block 1808 would be equal to three, and the overlap would drop to an overlap of three. When a four-fold bandwidth extension is to be performed, then the overlap/add block 1808 would have to use a sample/block advance value of four, which would still result in an overlap of more than two blocks.
Additionally, a phase correction dependent on the filterbank channel is input into the phase adjuster. Preferably, a single phase correction operation is performed, where the phase correction value is a combination of the signal-dependent adjustment phase value as determined by the phase calculator and the signal-independent (but filterbank channel number dependent) phase correction.
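Performing a single phase correction operation amounts to adding the two phase values before forming one complex multiplier. The following minimal sketch assumes that both values are already available in radians; the function name and the numbers are hypothetical.

```python
import cmath

def combined_phase_multiplier(signal_dependent_phase, channel_phase_correction):
    """One complex factor that applies both phase values in a single multiplication."""
    return cmath.exp(1j * (signal_dependent_phase + channel_phase_correction))

# Hypothetical values: 0.6 rad from the phase calculator,
# -0.8 rad as the filterbank-channel dependent correction.
sample = 0.5 + 0.25j
single_step = sample * combined_phase_multiplier(0.6, -0.8)
two_steps = sample * cmath.exp(0.6j) * cmath.exp(-0.8j)
```

Both variants give the same result up to rounding; the combined form needs one complex multiplication per sample instead of two.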
While
Furthermore, the merger 85 may additionally comprise an envelope adjuster, or basically a high frequency reconstruction processor for processing the signal input into the high frequency reconstructor based on the transmitted high frequency reconstruction parameters.
These reconstruction parameters may comprise envelope adjustment parameters, noise addition parameters, inverse filtering parameters, missing harmonics parameters or other parameters. The usage of these parameters and the parameters themselves and how they are applied for performing an envelope adjustment or, generally, a generation of the bandwidth extended signal is described in ISO/IEC 14496-3: 2005(E), section 4.6.8 dedicated to the spectral band replication (SBR) tool.
Alternatively, however, the merger 85 can comprise a synthesis filterbank and subsequently to the synthesis filterbank an HFR processor for processing the signal using the HFR parameters in the time domain rather than in the filterbank domain, where the HFR processor is situated before the synthesis filterbank.
Furthermore, when
The filterbank 105 finally outputs a transposer output signal which comprises bandwidth extensions by transposition factors 2, 3, and 4, and the signal output by block 105 is no longer bandwidth-limited to the crossover frequency, i.e. to the highest frequency of the core coder signal corresponding to the lowest frequency of the SBR or HFR generated signal components.
In the
Branch 110b, however, has a decimation functionality in order to obtain a transposition by 1.5. Due to the fact that the synthesis filterbank has two times the physical subband spacing of the analysis filterbank, a transposition factor of 3 is obtained as indicated in
Analogously, the third branch has a decimation functionality corresponding to a transposition factor of 2, and the final contribution of the different subband spacing in the analysis filterbank and the synthesis filterbank finally corresponds to a transposition factor of 4 of the third branch 110c.
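The arithmetic behind the overall transposition factors can be written out in a few lines. The sketch assumes that the overall factor is the product of the transposition obtained inside a branch and the ratio of the synthesis to the analysis subband spacing; the value 1 for the first branch is an inference from the overall factor of 2, and the names are hypothetical.

```python
from fractions import Fraction

def overall_transposition(branch_factor, spacing_ratio=2):
    """Branch transposition times the synthesis/analysis subband spacing ratio."""
    return branch_factor * spacing_ratio

branch_factors = {
    "110a": Fraction(1),      # inferred: no decimation in the first branch
    "110b": Fraction(3, 2),   # transposition by 1.5
    "110c": Fraction(2),      # transposition by 2
}
overall = {name: overall_transposition(f) for name, f in branch_factors.items()}
```

With a spacing ratio of two, the three branches yield overall transposition factors of 2, 3 and 4.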
Particularly, each branch has a block extractor 120a, 120b, 120c and each of these block extractors can be similar to the block extractor 1800 of
In an embodiment, the block extractor 120a of the first transposer branch 110a extracts 10 subband samples and subsequently a conversion of these 10 QMF samples to polar coordinates is performed. The output is then defined as discussed in
However, this is different for branches 110b and 110c. The block extractor 120b extracts a block of 8 subband samples and distributes these 8 subband samples in the extracted block in a different subband sample spacing. The non-integer subband sample entries for the extracted block are obtained by an interpolation, and the thus obtained QMF samples together with the interpolated samples are converted to polar coordinates and are processed by the phase adjuster 124b in order to result in a similar expression as the expression in block 143 of
The block extractor 120c is configured for extracting a block with a time extent of 6 subband samples and performs a decimation by a decimation factor of 2, performs a conversion of the QMF samples into polar coordinates and again performs an operation in the phase adjuster 124b in order to obtain an expression similar to what is included in block 143 of
The transposition outputs of each branch are then added to form the combined QMF output by the adder 128, and the combined QMF outputs are finally superimposed using overlap-add in block 130, where the overlap-add advance or stride value is two times the stride value of the block extractors 120a, 120b, 120c as discussed before.
Subsequently, different embodiments for determining phase corrections are discussed in the context of
In this embodiment, the phase adjuster is configured for applying a phase correction using the value Δθn which is indicated as Ω(k) in
In a further embodiment, indicated at 152 in
A further embodiment of the present invention indicated at 153 has the advantage over the embodiments 151 and 152 in that the phase correction term Δθn or Ω(k) illustrated in
a and
An alternative embodiment is illustrated in
When
An embodiment comprises an apparatus for generating a bandwidth extended audio signal from an input signal, comprising: a patch generator for generating one or more patch signals from the input audio signal, wherein a patch signal has a patch center frequency being different from a patch center frequency of a different patch or from a center frequency of the input audio signal, wherein the patch generator is configured to generate the one or more patch signals so that a time disalignment between the input audio signal and the one or more patch signals or a time disalignment between different patch signals is reduced or eliminated, or wherein the patch generator is configured for performing a filterbank-channel dependent phase correction within a time stretching functionality.
In a further embodiment, the patch generator comprises a plurality of patchers, each patcher having a decimating functionality, a time stretching functionality, and a patch corrector for applying a time correction to the patch signals to reduce or eliminate the time disalignment.
In a further embodiment, the patch generator is configured so that the time delay is stored and selected in such a way that, when an impulse-like signal is processed, centers of gravity of patched signals obtained by the processing are aligned with each other in time.
In a further embodiment, the time delays applied by the patch generator for reducing or eliminating the disalignment are fixedly stored and independent of the processed signal.
In a further embodiment the time stretcher comprises a block extractor using an extraction advance value, a windower/phase adjuster, and an overlap-adder having an overlap-add advance value being different from the extraction advance value.
In a further embodiment, a time delay applied for reducing or eliminating the disalignment depends on the extraction advance value, the overlap-add advance value or both values.
In a further embodiment, the time stretcher comprises the block extractor, the windower/phase adjuster, and the overlap-adder for at least two different channels having different channel numbers of an analysis filterbank, wherein the windower/phase adjuster for each of the at least two channels is configured for applying a phase adjustment for each channel, the phase adjustment depending on the channel number.
In a further embodiment, the phase adjuster is configured for applying a phase adjustment to sampling values of a block of sampling values, the phase adjustment being a combination of a phase value depending on a time stretching amount and on an actual phase of the block, and a signal-independent phase value depending on the channel number.
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
The inventive encoded audio signal can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are performed by any hardware apparatus.
While this invention has been described in terms of several advantageous embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.
[1] J. L. Flanagan and R. M. Golden, Phase Vocoder, The Bell System Technical Journal, November 1966, pp. 1493-1509
[2] U.S. Pat. No. 6,549,884 Laroche, J. & Dolson, M.: Phase-vocoder pitch-shifting
[3] J. Laroche and M. Dolson, New Phase-Vocoder Techniques for Pitch-Shifting, Harmonizing and Other Exotic Effects, Proc. IEEE Workshop on App. of Signal Proc. to Audio and Acous., New Paltz, N.Y. 1999.
[4] Frederik Nagel, Sascha Disch, A harmonic bandwidth extension method for audio codecs, ICASSP, Taipei, Taiwan, April 2009
[5] Frederik Nagel, Sascha Disch and Nikolaus Rettelbach, A phase vocoder driven bandwidth extension method with novel transient handling for audio codecs, 126th AES Convention, Munich, Germany, May 7-10, 2009
This application is a continuation of copending International Application No. PCT/EP2011/053298, filed Mar. 4, 2011, which is incorporated herein by reference in its entirety, and additionally claims priority from U.S. Application No. 61/312,118, filed Mar. 9, 2010, which is also incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
61312118 | Mar 2010 | US

Relation | Number | Date | Country
---|---|---|---
Parent | PCT/EP2011/053298 | Mar 2011 | US
Child | 13604313 | | US