In PCT WO 98/57436 the concept of transposition was established as a method to recreate a high frequency band from a lower frequency band of an audio signal. A substantial saving in bitrate can be obtained by using this concept in audio coding. In an HFR based audio coding system, a low bandwidth signal is processed by a core waveform coder and the higher frequencies are regenerated using transposition and additional side information of very low bitrate describing the target spectral shape at the decoder side. For low bitrates, where the bandwidth of the core coded signal is narrow, it becomes increasingly important to recreate a high band with perceptually pleasant characteristics. The harmonic transposition defined in PCT WO 98/57436 performs very well for complex musical material in a situation with low crossover frequency. The principle of a harmonic transposition is that a sinusoid with frequency ω is mapped to a sinusoid with frequency Tω, where T>1 is an integer defining the order of transposition. In contrast to this, a single sideband modulation (SSB) based HFR method maps a sinusoid with frequency ω to a sinusoid with frequency ω+Δω, where Δω is a fixed frequency shift. Given a core signal with low bandwidth, a dissonant ringing artifact can result from SSB transposition.
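The difference between the two mappings can be illustrated with a small sketch. The function names, the 220 Hz tone and the 700 Hz shift are hypothetical values chosen for illustration only; the sketch merely checks whether the mapped partials still lie on a harmonic series of the original fundamental.

```python
def harmonic_transpose(freqs_hz, order):
    """Map each sinusoid frequency w to order * w (harmonic transposition)."""
    return [order * f for f in freqs_hz]


def ssb_transpose(freqs_hz, shift_hz):
    """Map each sinusoid frequency w to w + shift_hz (SSB based mapping)."""
    return [f + shift_hz for f in freqs_hz]


def is_harmonic_series(freqs_hz, fundamental_hz, tol=1e-9):
    """True if every frequency is an integer multiple of the fundamental."""
    return all(
        abs(f / fundamental_hz - round(f / fundamental_hz)) < tol
        for f in freqs_hz
    )


# Hypothetical low band content: three partials of a 220 Hz tone.
partials = [220.0, 440.0, 660.0]

harmonic = harmonic_transpose(partials, order=2)   # T = 2
shifted = ssb_transpose(partials, shift_hz=700.0)  # fixed shift (assumed value)

print(harmonic, is_harmonic_series(harmonic, 220.0))
print(shifted, is_harmonic_series(shifted, 220.0))
```

With these values the harmonically transposed partials stay on the 220 Hz harmonic grid, while the shifted partials do not, which is one way to picture the dissonance attributed to SSB transposition.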
In order to reach the best possible audio quality, state of the art high quality harmonic HFR methods employ complex modulated filter banks, e.g. a Short Time Fourier Transform (STFT), with high frequency resolution and a high degree of oversampling. The fine resolution is needed to avoid unwanted intermodulation distortion arising from nonlinear processing of sums of sinusoids. With sufficiently high frequency resolution, i.e. narrow subbands, the high quality methods aim at having a maximum of one sinusoid in each subband. A high degree of oversampling in time is needed to avoid alias type distortion, and a certain degree of oversampling in frequency is needed to avoid pre-echoes for transient signals. The obvious drawback is that the computational complexity can become high.
Subband block based harmonic transposition is another HFR method used to suppress intermodulation products, in which case a filter bank with coarser frequency resolution and a lower degree of oversampling is employed, e.g. a multichannel QMF bank. In this method, a time block of complex subband samples is processed by a common phase modifier while the superposition of several modified samples forms an output subband sample. This has the net effect of suppressing intermodulation products which would otherwise occur when the input subband signal consists of several sinusoids. Transposition based on block based subband processing has much lower computational complexity than the high quality transposers and reaches almost the same quality for many signals. However, the complexity is still much higher than for the trivial SSB based HFR methods, since a plurality of analysis filter banks, each processing signals of different transposition orders T, is needed in a typical HFR application in order to synthesize the required bandwidth. Additionally, a common approach is to adapt the sampling rate of the input signals to fit analysis filter banks of a constant size, even though the filter banks process signals of different transposition orders. It is also common to apply bandpass filters to the input signals in order to obtain output signals, processed from different transposition orders, with non-overlapping spectral densities.
Storage or transmission of audio signals is often subject to strict bitrate constraints. In the past, coders were forced to drastically reduce the transmitted audio bandwidth when only a very low bitrate was available. Modern audio codecs are nowadays able to code wideband signals by using bandwidth extension (BWE) methods [1-12]. These algorithms rely on a parametric representation of the high-frequency content (HF) which is generated from the low-frequency part (LF) of the decoded signal by means of transposition into the HF spectral region (“patching”) and application of a parameter driven post processing. The LF part is coded with any audio or speech coder. For example, the bandwidth extension methods described in [1-4] rely on single sideband modulation (SSB), often also termed the “copy-up” method, for generating the multiple HF patches.
Lately, a new algorithm, which employs a bank of phase vocoders [15-17] for the generation of the different patches, has been presented [13] (see
However, since the BWE algorithm is performed on the decoder side of a codec chain, computational complexity is a serious issue. State-of-the-art methods, especially the phase vocoder based HBE, come at the price of a largely increased computational complexity compared to SSB based methods.
As outlined above, existing bandwidth extension schemes apply only one patching method on a given signal block at a time, be it SSB based patching [1-4] or HBE vocoder based patching [15-17]. Additionally, modern audio coders [19-20] offer the possibility of switching the patching method globally on a time block basis between alternative patching schemes.
SSB copy-up patching introduces unwanted roughness into the audio signal, but is computationally simple and preserves the time envelope of transients. In audio codecs employing HBE patching, the transient reproduction quality is often suboptimal. Moreover, the computational complexity is significantly increased over the computationally very simple SSB copy-up method.
When it comes to complexity reduction, sampling rates are of particular importance. A high sampling rate means high complexity, and a low sampling rate generally means low complexity, because fewer operations have to be performed. In bandwidth extension applications, however, the sampling rate of the core coder output signal will typically be too low for a full bandwidth signal. Stated differently, when the sampling rate of the decoder output signal is, for example, 2 or 2.5 times the maximum frequency of the core coder output signal, then a bandwidth extension by, for example, a factor of 2 means that an upsampling operation has to be performed so that the sampling rate of the bandwidth extended signal is high enough to “cover” the additionally generated high frequency components.
Additionally, filterbanks such as analysis filterbanks and synthesis filterbanks are responsible for a considerable amount of processing operations. Hence, the size of the filterbanks, i.e. whether the filterbank is a 32 channel filterbank, a 64 channel filterbank or even a filterbank with a higher number of channels, will significantly influence the complexity of the audio processing algorithm. Generally, one can say that a high number of filterbank channels involves more processing operations and, therefore, higher complexity than a small number of filterbank channels. In view of this, in bandwidth extension applications and also in other audio processing applications where different sampling rates are an issue, such as in vocoder-like applications or any other audio effect applications, there is a specific interdependency between complexity and sampling rate or audio bandwidth. This means that operations for upsampling or subband filtering can drastically increase the complexity without improving the audio quality when the wrong tools or algorithms are chosen for the specific operations.
In the context of bandwidth extension, parametric data sets are used for performing a spectral envelope adjustment and for performing other manipulations to a signal generated by a patching operation, i.e. by an operation that takes some data from the source range, i.e. from the low band portion of the bandwidth extended signal which is available at the input of the bandwidth extension processor and then maps this data to a high frequency range. Spectral envelope adjustment can take place before actually mapping the low band signal to the high frequency range or subsequently to having mapped the source range to the high frequency range.
Typically, the parametric data sets are provided with a certain frequency resolution, i.e. parametric data refer to frequency bands of the high frequency part. On the other hand, the patching from the low band to the high band, i.e. which source ranges are used for obtaining which target or high frequency ranges, is an operation independent of the frequency resolution in which the parametric data sets are given. The fact that the transmitted parametric data are, in a sense, independent of what is actually used as the patching algorithm is an important feature, since this allows great flexibility on the decoder-side, i.e. when it comes to the implementation of the bandwidth extension processor. Here, different patching algorithms can be used, but one and the same spectral envelope adjustment can be performed. Stated differently, the high frequency reconstruction processor or spectral envelope adjustment processor in a bandwidth extension application does not need to have information on the applied patching algorithm in order to perform the spectral envelope adjustment.
A disadvantage of this procedure, however, is that a misalignment between the frequency bands, for which the parametric data sets are provided on the one hand and the spectral borders of a patch on the other hand, can occur. Particularly in situations where the spectral energy strongly changes in the vicinity of a patch border, artifacts may arise specifically in this region, which degrade the quality of the bandwidth extended signal.
According to an embodiment, an apparatus for processing an audio signal to generate a bandwidth extended signal having a high frequency part and a low frequency part using parametric data for the high frequency part, the parametric data relating to frequency bands of the high frequency part, may have: a patch border calculator for calculating a patch border of a plurality of patch borders such that the patch border coincides with a frequency band border of the frequency bands of the high frequency part; and a patcher for generating a patched signal using the audio signal and the patch border, wherein the patch borders relate to the high frequency part of the bandwidth extended signal; wherein the patch border calculator is configured for: calculating a frequency table defining the frequency bands of the high frequency part using the parametric data or further configuration input data; setting a target synthesis patch border different from the patch border using at least one transposition factor; searching, in the frequency table, for a matching frequency band having a matching border coinciding with the target synthesis patch border within a predetermined matching range, or searching for the frequency band having a frequency band border being closest to the target synthesis patch border; and selecting the matching frequency band as the patch border, wherein the matching frequency band has a matching border coinciding with the target synthesis patch border within a predetermined matching range or has a frequency band border being closest to the target synthesis patch border.
According to another embodiment, a method of processing an audio signal to generate a bandwidth extended signal having a high frequency part and a low frequency part using parametric data for the high frequency part, the parametric data relating to frequency bands of the high frequency part, may have the steps of: calculating a patch border such that the patch border of a plurality of patch borders coincides with a frequency band border of the frequency bands of the high frequency part; and generating a patched signal using the audio signal and the patch border, wherein the patch borders relate to the high frequency part of the bandwidth extended signal, wherein said calculating a patch border may have the steps of: calculating a frequency table defining the frequency bands of the high frequency part using the parametric data or further configuration input data; setting a target synthesis patch border different from the patch border using at least one transposition factor; searching, in the frequency table, for a matching frequency band having a matching border coinciding with the target synthesis patch border within a predetermined matching range, or searching for the frequency band having a frequency band border being closest to the target synthesis patch border; and selecting the matching frequency band as the patch border, wherein the matching frequency band has a matching border coinciding with the target synthesis patch border within a predetermined matching range or has a frequency band border being closest to the target synthesis patch border.
Another embodiment may have a computer program having a program code for performing, when running on a computer, the method of processing an audio signal to generate a bandwidth extended signal having a high frequency part and a low frequency part using parametric data for the high frequency part, the parametric data relating to frequency bands of the high frequency part, which method may have the steps of: calculating a patch border such that the patch border of a plurality of patch borders coincides with a frequency band border of the frequency bands of the high frequency part; and generating a patched signal using the audio signal and the patch border, wherein the patch borders relate to the high frequency part of the bandwidth extended signal, wherein said calculating a patch border may have the steps of: calculating a frequency table defining the frequency bands of the high frequency part using the parametric data or further configuration input data; setting a target synthesis patch border different from the patch border using at least one transposition factor; searching, in the frequency table, for a matching frequency band having a matching border coinciding with the target synthesis patch border within a predetermined matching range, or searching for the frequency band having a frequency band border being closest to the target synthesis patch border; and selecting the matching frequency band as the patch border, wherein the matching frequency band has a matching border coinciding with the target synthesis patch border within a predetermined matching range or has a frequency band border being closest to the target synthesis patch border.
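The border search recited in these embodiments can be sketched as follows. This is a minimal illustration and not the claimed implementation: the function names, the example frequency table (given as subband indices), the crossover value, the rule that the target border for transposition factor T is T times the crossover border, and the choice to accept the first border found within the matching range are all assumptions made for the sketch.

```python
def align_patch_border(freq_table, target_border, matching_range=None):
    """Pick a band border from freq_table for a target synthesis patch border.

    freq_table     -- ascending list of frequency band borders
    target_border  -- target synthesis patch border, same units as the table
    matching_range -- optional tolerance; the first border within it is taken
                      (assumption), otherwise the closest border is used
    """
    if not freq_table:
        raise ValueError("frequency table must not be empty")
    if matching_range is not None:
        for border in freq_table:
            if abs(border - target_border) <= matching_range:
                return border
    # Fallback / alternative rule: border closest to the target.
    return min(freq_table, key=lambda b: abs(b - target_border))


def aligned_patch_borders(freq_table, crossover, factors, matching_range=None):
    """Aligned patch border per transposition factor.

    Assumption: the target border for factor T is T * crossover.
    """
    return {
        t: align_patch_border(freq_table, t * crossover, matching_range)
        for t in factors
    }


# Hypothetical frequency table (band borders as subband indices).
freq_table = [10, 12, 14, 16, 18, 21, 24, 27, 31, 35, 40]
borders = aligned_patch_borders(freq_table, crossover=10, factors=(2, 3, 4))
print(borders)
```

In this example the targets 20, 30 and 40 are moved to the table borders 21, 31 and 40, so every patch border coincides with a band border of the parametric data.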
Embodiments of the present invention relate to an apparatus for processing an audio signal to generate a bandwidth extended signal having a high frequency portion and a low frequency portion, where parametric data for the high frequency portion is used, and where the parametric data relates to frequency bands of the high frequency part. The apparatus comprises a patch border calculator for calculating a patch border such that the patch border coincides with a frequency band border of the frequency bands. The apparatus furthermore comprises a patcher for generating a patch signal using the audio signal and the calculated patch border. In an embodiment, the patch border calculator is configured to calculate the patch border as a frequency border in a synthesis frequency range corresponding to the high frequency part. In this context, the patcher is configured to select a frequency portion of the low band part using a transposition factor and the patch border. In a further embodiment, the patch border calculator is configured for calculating the patch border using a target patch border not coinciding with a frequency band border of the frequency band. Then, the patch border calculator is configured to set the patch border different from the target patch border in order to obtain the alignment. Particularly in the context of a plurality of patches using different transposition factors, the patch border calculator is configured to calculate patch borders, for example, for three different transposition factors such that each patch border coincides with a frequency band border of the frequency bands of the high frequency part. The patcher is then configured to generate the patch signal using the three different transposition factors such that the border between two adjacent patches coincides with a border between two adjacent frequency bands to which the parametric data is related.
The present invention is particularly useful in that the artifacts arising from misaligned patch borders on the one hand and frequency bands for the parametric data on the other hand are avoided. Instead, due to the perfect alignment, even strongly changing signals or signals having strongly changing portions in the region of the patch border are subjected to bandwidth extension with a good quality.
Furthermore, the present invention is advantageous in that it nevertheless allows high flexibility due to the fact that the encoder does not have to deal with a patching algorithm to be applied on the decoder-side. The independency between patching on the one hand and spectral envelope adjustment, i.e. using the parametric data generated by a bandwidth extension encoder, on the other hand is maintained and allows the application of different patching algorithms or even a combination of different patching algorithms. This is possible, since the patch border alignment makes sure that in the end the patch data on the one hand and the parametric data sets on the other hand match with each other with respect to the frequency bands, which are also called scale factor bands.
Depending on the calculated patch borders which can, for example, relate to the target range, i.e. the high frequency part of the finally obtained bandwidth extended signal, the corresponding source ranges for determining the patch source data from the low band portion of the audio signal are determined. It turns out that only a certain (small) bandwidth of the low band portion of the audio signal may be used due to the fact that in some embodiments harmonic transposition factors are applied. Therefore, in order to efficiently extract this portion from the low band audio signal, a specific analysis filterbank structure relying on cascaded individual filterbanks is used.
Such embodiments rely on a specific cascaded placement of analysis and/or synthesis filterbanks in order to obtain a low complexity resampling without sacrificing audio quality. In an embodiment, an apparatus for processing an input audio signal comprises a synthesis filterbank for synthesizing an audio intermediate signal from the input audio signal, where the input audio signal is represented by a plurality of first subband signals generated by an analysis filterbank placed in processing direction before the synthesis filterbank, wherein a number of filterbank channels of the synthesis filterbank is smaller than a number of channels of the analysis filterbank. The intermediate signal is furthermore processed by a further analysis filterbank for generating a plurality of second subband signals from the audio intermediate signal, wherein the further analysis filterbank has a number of channels being different from the number of channels of the synthesis filterbank so that a sampling rate of a subband signal of the plurality of subband signals is different from a sampling rate of a first subband signal of the plurality of first subband signals generated by the analysis filterbank.
The cascade of a synthesis filterbank and a subsequently connected further analysis filterbank provides a sampling rate conversion and additionally a modulation of the bandwidth portion of the original audio input signal, which has been input into the synthesis filterbank, to a base band. This time intermediate signal, which has now been extracted from the original input audio signal (which can, for example, be the output signal of a core decoder of a bandwidth extension scheme), is now represented advantageously as a critically sampled signal modulated to the base band. It has been found that this representation, i.e. the resampled output signal, when being processed by a further analysis filterbank to obtain a subband representation, allows a low complexity processing of further processing operations which may or may not occur and which can, for example, be bandwidth extension related processing operations such as non-linear subband operations followed by high frequency reconstruction processing and by a merging of the subbands in the final synthesis filterbank.
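The sampling rate conversion performed by such a cascade can be illustrated with simple arithmetic. The sketch below assumes that the subband sample rate of each filterbank equals its time domain rate divided by its number of channels; the function name, the 48 kHz input rate and the channel counts 32, 8 and 32 are hypothetical and are not taken from the embodiments.

```python
from fractions import Fraction


def cascade_rates(fs_in, analysis_channels, synthesis_channels,
                  further_analysis_channels):
    """Rates along analysis -> smaller synthesis -> further analysis.

    Assumption: subband rate = time domain rate / number of channels.
    Returns (first subband rate, intermediate signal rate, second subband rate).
    """
    first_subband_rate = Fraction(fs_in, analysis_channels)
    # The synthesis bank only merges a subset of the subbands, so the
    # intermediate time signal runs at a reduced rate.
    intermediate_rate = first_subband_rate * synthesis_channels
    second_subband_rate = intermediate_rate / further_analysis_channels
    return first_subband_rate, intermediate_rate, second_subband_rate


# Hypothetical configuration: 32 channel analysis at 48 kHz, 8 channel
# synthesis, 32 channel further analysis.
first, intermediate, second = cascade_rates(48000, 32, 8, 32)
print(first, intermediate, second)
```

Under these assumptions the intermediate signal runs at a quarter of the input rate, and the second subband signals have a sampling rate different from that of the first subband signals, as described above.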
The present application provides different aspects of apparatuses, methods or computer programs for processing audio signals in the context of bandwidth extension and in the context of other audio applications, which are not related to bandwidth extension. The features of the subsequently described and claimed individual aspects can be partly or fully combined, but can also be used separately from each other, since the individual aspects already provide advantages with respect to perceptual quality, computational complexity and processor/memory resources when implemented in a computer system or micro processor.
Embodiments provide a method to reduce the computational complexity of a subband block based harmonic HFR method by means of efficient filtering and sampling rate conversion of the input signals to the HFR filter bank analysis stages. Further, the bandpass filters applied to the input signals can be shown to be obsolete in a subband block based transposer.
The present embodiments help to reduce the computational complexity of subband block based harmonic transposition by efficiently implementing several orders of subband block based transposition in the framework of a single analysis and synthesis filter bank pair. Depending on the perceptual quality versus computational complexity trade-off, only a suitable sub-set of orders or all orders of transposition can be performed jointly within a filterbank pair. Furthermore, a combined transposition scheme can be used, where only certain transposition orders are calculated directly whereas the remaining bandwidth is filled by replication of available, i.e. previously calculated, transposition orders (e.g. 2nd order) and/or the core coded bandwidth. In this case, patching can be carried out using every conceivable combination of available source ranges for replication.
Additionally, embodiments provide a method to improve both high quality harmonic HFR methods as well as subband block based harmonic HFR methods by means of spectral alignment of HFR tools. In particular, increased performance is achieved by aligning the spectral borders of the HFR generated signals to the spectral borders of the envelope adjustment frequency table. Further, the spectral borders of the limiter tool are by the same principle aligned to the spectral borders of the HFR generated signals.
Further embodiments are configured for improving the perceptual quality of transients and at the same time reducing computational complexity by, for example, application of a patching scheme that applies a mixed patching consisting of harmonic patching and copy-up patching.
In specific embodiments, the individual filterbanks of the cascaded filterbank structure are quadrature mirror filterbanks (QMF), which all rely on a lowpass prototype filter or window modulated using a set of modulation frequencies defining the center frequencies of the filterbank channels. Advantageously, all window functions or prototype filters depend on each other in such a way that the filters of the filterbanks with different sizes (filterbank channels) depend on each other as well. Advantageously, the largest filterbank in a cascaded structure of filterbanks comprising, in embodiments, a first analysis filterbank, a subsequently connected filterbank, a further analysis filterbank, and at some later state of processing a final synthesis filter bank, has a window function or prototype filter response having a certain number of window function or prototype filter coefficients. The smaller sized filterbanks are all sub-sampled versions of this window function, which means that the window functions for the other filterbanks are sub-sampled versions of the “large” window function. For example, if a filterbank has half the size of the large filterbank, then the window function has half the number of coefficients, and the coefficients of the smaller sized filterbanks are derived by sub-sampling. In this situation, the sub-sampling means that e.g. every second filter coefficient is taken for the smaller filterbank having half the size. However, when there are other relations between the filterbank sizes which are non-integer valued, then a certain kind of interpolation of the window coefficients is performed so that in the end the window of the smaller filterbank is again a sub-sampled version of the window of the larger filterbank.
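The derivation of a smaller window from the window of the largest filterbank can be sketched as follows. The function name and the ramp-shaped test window are hypothetical, and linear interpolation is only one possible choice for the "certain kind of interpolation" mentioned for non-integer size relations; it is used here purely as an assumption.

```python
from fractions import Fraction


def derive_window(large_window, large_channels, small_channels):
    """Derive the window of a smaller filterbank from the largest one.

    Integer size ratios take every (large/small)-th coefficient.
    Non-integer ratios use linear interpolation (an assumption).
    """
    n_large = len(large_window)
    if (n_large * small_channels) % large_channels:
        raise ValueError("window length does not scale to an integer")
    n_small = n_large * small_channels // large_channels
    window = []
    for i in range(n_small):
        # Exact position of the i-th small coefficient in the large window.
        pos = Fraction(i * large_channels, small_channels)
        lo = pos.numerator // pos.denominator
        frac = pos - lo
        if frac == 0:
            window.append(large_window[lo])
        else:
            hi = min(lo + 1, n_large - 1)
            window.append(float((1 - frac) * large_window[lo]
                                + frac * large_window[hi]))
    return window


# Hypothetical 12 tap ramp standing in for a real prototype filter.
large = [float(n) for n in range(12)]
half = derive_window(large, large_channels=4, small_channels=2)
three_quarters = derive_window(large, large_channels=4, small_channels=3)
print(half)
print(three_quarters)
```

For the half size filterbank every second coefficient is taken; for the three quarter size filterbank the coefficients fall between the original ones and are interpolated.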
Embodiments of the present invention are particularly useful in situations where only a portion of the input audio signal may be used for further processing, and this situation particularly occurs in the context of harmonic bandwidth extension. In this context, vocoder-like processing operations are particularly advantageous.
It is an advantage of embodiments that the embodiments provide a lower complexity for a QMF transposer by efficient time and frequency domain operations and an improved audio quality for QMF and DFT based harmonic spectral band replication using spectral alignment.
Embodiments relate to audio source coding systems employing an e.g. subband block based harmonic transposition method for high frequency reconstruction (HFR), and to digital effect processors, e.g. so-called exciters, where generation of harmonic distortion adds brightness to the processed signal, and to time stretchers, where the duration of a signal is extended while maintaining the spectral content of the original. Embodiments provide a method to reduce the computational complexity of a subband block based harmonic HFR method by means of efficient filtering and sampling rate conversion of the input signals prior to the HFR filter bank analysis stages. Further, embodiments show that the conventional bandpass filters applied to the input signals are obsolete in a subband block based HFR system. Additionally, embodiments provide a method to improve both high quality harmonic HFR methods as well as sub-band block based harmonic HFR methods by means of spectral alignment of HFR tools. In particular, embodiments teach how increased performance is achieved by aligning the spectral borders of the HFR generated signals to the spectral borders of the envelope adjustment frequency table. Further, the spectral borders of the limiter tool are by the same principle aligned to the spectral borders of the HFR generated signals.
Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
a illustrates an advantageous implementation of the patch border calculator of
b illustrates a further overview of a sequence of steps performed by embodiments of the invention;
a illustrates a block diagram illustrating more details of the patch border calculator and more details on the spectral envelope adjustment in the context of the alignment of patch borders;
b illustrates a flowchart for the procedure indicated in
The below-described embodiments are merely illustrative and may provide a lower complexity of a QMF transposer by efficient time and frequency domain operations, and improved audio quality of both QMF and DFT based harmonic SBR by spectral alignment. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the impending patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.
The table in
For the example in
Although it has been outlined that the source ranges are changed together with the target ranges, for other implementations one could also manipulate the transposition factor and maintain the source range or the target borders, or for other applications one could even change the source range and the transposition factor, in order to finally arrive at adjusted patch borders which coincide with frequency band borders of frequency bands to which the parametric bandwidth extension data describing the spectral envelope of the high band portion of the original signal are related.
The filterbanks 1401 and 1403 can be of any complex exponential modulated type such as QMF or a windowed DFT. They can be evenly or oddly stacked in the modulation and can be defined from a wide range of prototype filters or windows. It is important to know the quotient ΔfS/ΔfA of the following two filter bank parameters, measured in physical units.
For the configuration of the subband processing 1402 it is useful to find the correspondence between source and target subband indices. It is observed that an input sinusoid of physical frequency Ω will result in a main contribution occurring at input subbands with index n≈Ω/ΔfA. An output sinusoid of the desired transposed physical frequency T·Ω will result from feeding the synthesis subband with index m≈T·Ω/ΔfS. Hence, the appropriate source subband index values of the subband processing for a given target subband index m obey n≈(ΔfS/ΔfA)·m/T.
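As a hedged illustration of this index correspondence, the sketch below eliminates Ω from n≈Ω/ΔfA and m≈T·Ω/ΔfS, which gives n≈(ΔfS/ΔfA)·m/T, and rounds to the nearest integer index. The function name, the rounding rule and the example values are assumptions made for illustration.

```python
from fractions import Fraction


def source_subband_index(m, order, spacing_ratio):
    """Source subband index n for target subband index m.

    order         -- transposition order T
    spacing_ratio -- quotient of synthesis to analysis frequency spacing
    Uses n ~ spacing_ratio * m / T, rounded to the nearest integer
    (Python's round() resolves exact .5 ties to the even integer).
    """
    return round(Fraction(spacing_ratio) * m / order)


# If the spacing quotient equals the transposition order, source and
# target indices coincide.
same_index = [source_subband_index(m, order=3, spacing_ratio=3)
              for m in range(5)]

# With equal spacings and T = 2, target index 10 reads from source index 5.
halved = source_subband_index(10, order=2, spacing_ratio=1)
print(same_index, halved)
```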
Consider first the case T=2. The objective is specifically that the processing chain of a 64 band QMF analysis 1602-2, a subband processing unit 1603-2, and a 64 band QMF synthesis 1505 results in a physical transposition of T=2. Identifying these three blocks with 1401, 1402 and 1403 of
For the case T=3, the exemplary system includes a sampling rate converter 1601-3 which converts the input sampling rate down by a factor 3/2 from fS to 2fS/3. The objective is specifically that the processing chain of the 64 band QMF analysis 1602-3, the subband processing unit 1603-3, and a 64 band QMF synthesis 1505 results in a physical transposition of T=3. By identifying these three blocks with 1401, 1402 and 1403 of
For the case T=4, the exemplary system includes a sampling rate converter 1601-4 which converts the input sampling rate down by a factor two from fS to fS/2. The objective is specifically that the processing chain of the 64 band QMF analysis 1602-4, the subband processing unit 1603-4, and a 64 band QMF synthesis 1505 results in a physical transposition of T=4. By identifying these three blocks with 1401, 1402 and 1403 of
In the subband block based transposer of the HFR module 103, three transposition orders T=2, 3 and 4, are to be produced and delivered in the domain of a 64 band QMF operating at output sampling rate 2fS. The input time domain signal is bandpass filtered in the blocks 103-12, 103-13 and 103-14. This is done in order to make the output signals, processed by the different transposition orders, to have non-overlapping spectral contents. The signals are further downsampled (103-23, 103-24) to adapt the sampling rate of the input signals to fit analysis filter banks of a constant size (in this case 64). It can be noted that the increase of the sampling rate, from fS to 2fS, can be explained by the fact that the sampling rate converters use downsampling factors of T/2 instead of T, in which the latter would result in transposed sub-band signals having equal sampling rate as the input signal. The downsampled signals are fed to separate HFR analysis filter banks (103-32, 103-33 and 103-34), one for each transposition order, which provide a multitude of complex valued subband signals. These are fed to the non-linear subband stretching units (103-42, 103-43 and 103-44). The multitude of complex valued output subbands are fed to the Merge/Combine module 104 together with the output from the subsampled analysis bank 102. The Merge/Combine unit simply merges the sub-bands from the core analysis filter bank 102 and each stretching factor branch into a single multitude of QMF subbands to be fed into the HFR processing unit 105.
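The sampling rates in the three branches follow from the downsampling factors T/2 and can be tabulated with a short sketch. The function name and the value fS = 24 kHz are hypothetical; only the factor T/2 and the output rate 2fS are taken from the description above.

```python
from fractions import Fraction


def transposer_branch_rates(fs, orders=(2, 3, 4)):
    """Downsampling factor and analysis input rate per transposition order.

    Each branch downsamples by T/2, so its analysis bank runs at 2*fs/T.
    """
    branches = {}
    for t in orders:
        factor = Fraction(t, 2)        # downsampling factor T/2
        rate = Fraction(fs) / factor   # analysis filter bank input rate
        branches[t] = (factor, rate)
    return branches


rates = transposer_branch_rates(24000)  # hypothetical fS = 24 kHz
for t, (factor, rate) in rates.items():
    print(f"T={t}: downsample by {factor}, analysis input rate {rate} Hz")
```

For T=2 no rate change occurs, for T=3 the rate drops to 2fS/3, and for T=4 it drops to fS/2, while the ratio of the output rate 2fS to the branch rate always equals T.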
When the signal spectra from different transposition orders are set to not overlap, i.e. the spectrum of the T-th transposition order signal should start where the spectrum from the T-1 order signal ends, the transposed signals need to be of bandpass character. Hence the traditional bandpass filters 103-12 to 103-14 in
While the SSB transposer employed by SBR [ISO/IEC 14496-3:2009, “Information technology—Coding of audio-visual objects—Part 3: Audio] typically exploits the entire base band, excluding the first subband, to generate the high band signal, a harmonic transposer generally uses a smaller part of the core coder spectrum. The amount used, the so-called source range, depends on the transposition order, the bandwidth extension factor, and the rules applied for the combined result, e.g. if the signals generated from different transposition orders are allowed to overlap spectrally or not. As a consequence, just a limited part of the harmonic transposer output spectrum for a given transposition order will actually be used by the HFR processing module 105.
Therefore, the time length of the single subband signal is shorter than the time length before forming the decimation. The single subband signal is input into a block extractor 1800, which can be identical to the block extractor 201, but which can also be implemented in a different way. The block extractor 1800 in
The individual blocks are input into a windower 1802 for windowing the blocks using a window function for each block. Additionally, a phase calculator 1804 is provided, which calculates a phase for each block. The phase calculator 1804 can use the individual block either before or after windowing. Then, a phase adjustment value p×k is calculated and input into a phase adjuster 1806. The phase adjuster applies the adjustment value to each sample in the block. Here, the factor k is equal to the bandwidth extension factor. When, for example, a bandwidth extension by a factor of 2 is to be obtained, the phase p calculated for a block extracted by the block extractor 1800 is multiplied by the factor 2, and the adjustment value applied to each sample of the block in the phase adjuster 1806 is p multiplied by 2. This is an exemplary value/rule. Equivalently, the corrected phase for synthesis can be written as k*p = p+(k−1)*p. So in this example the correction is either a factor of 2, if applied by multiplication, or a term of 1*p, if applied by addition. Other values/rules can be applied for calculating the phase correction value.
In an embodiment, the single subband signal is a complex subband signal, and the phase of a block can be calculated by a plurality of different ways. One way is to take the sample in the middle or around the middle of the block and to calculate the phase of this complex sample. It is also possible to calculate the phase for every sample.
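The phase rule can be sketched in a few lines. The example below is only one possible reading of the description: it takes the phase p of the middle sample of a complex block and adds (k−1)*p to every sample, so that the middle sample ends up with phase k*p. The function name `adjust_block_phase` and the toy input block are assumptions made for illustration.

```python
import cmath

def adjust_block_phase(block, k):
    """Rotate every sample so that the middle sample's phase p becomes k*p."""
    mid = block[len(block) // 2]
    p = cmath.phase(mid)                    # phase of the middle sample
    rotation = cmath.exp(1j * (k - 1) * p)  # adds (k-1)*p to each sample's phase
    return [x * rotation for x in block]

# Toy complex subband block: unit magnitude, phase 0.1 rad per sample (assumed data).
block = [cmath.rect(1.0, 0.1 * n) for n in range(12)]
out = adjust_block_phase(block, k=2)
```

The rotation has unit magnitude, so the sample amplitudes are left unchanged; only the phases are modified.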
Although illustrated in
The phase-adjusted blocks are input into an overlap/add and amplitude correction block 1808, where the windowed and phase-adjusted blocks are overlap-added. Importantly, however, the sample/block advance value in block 1808 is different from the value used in the block extractor 1800. Particularly, the sample/block advance value in block 1808 is greater than the value e used in block 1800, so that a time stretching of the signal output by block 1808 is obtained. Thus, the processed subband signal output by block 1808 is longer than the subband signal input into block 1800. When a bandwidth extension by a factor of two is to be obtained, a sample/block advance value is used which is two times the corresponding value in block 1800. This results in a time stretching by a factor of two. When other time stretching factors are to be used, other sample/block advance values can be chosen so that the output of block 1808 has the desired time length.
For addressing the overlap issue, an amplitude correction is advantageously performed in order to address the issue of different overlaps in block 1800 and 1808. This amplitude correction could, however, be also introduced into the windower/phase adjustor multiplication factor, but the amplitude correction can also be performed subsequent to the overlap/processing.
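The interplay of block extraction, overlap-add with a larger advance value, and amplitude correction can be sketched as follows. This is a simplified illustration, not the described implementation: it assumes a rectangular window, omits the phase adjustment, and uses one possible amplitude correction (dividing each output sample by the number of blocks that contributed to it). The name `stretch_subband` and the default parameters are hypothetical.

```python
def stretch_subband(x, block_len=12, hop_in=1, stretch=2):
    """Extract blocks with advance hop_in, overlap-add with advance stretch*hop_in."""
    if len(x) < block_len:
        raise ValueError("input shorter than one block")
    hop_out = stretch * hop_in
    n_blocks = (len(x) - block_len) // hop_in + 1
    out_len = (n_blocks - 1) * hop_out + block_len
    acc = [0j] * out_len        # overlap-add accumulator
    weight = [0.0] * out_len    # number of blocks contributing to each sample
    for b in range(n_blocks):
        start_in, start_out = b * hop_in, b * hop_out
        for i in range(block_len):
            acc[start_out + i] += x[start_in + i]  # rectangular window (assumed)
            weight[start_out + i] += 1.0
    # Amplitude correction: compensate for the varying number of overlapping blocks.
    return [a / w for a, w in zip(acc, weight)]

x = [1 + 0j] * 32               # constant toy subband signal (assumed data)
y = stretch_subband(x)
```

For the 32-sample input, 21 blocks are extracted and the output has 52 samples, i.e. it is longer than the input, as described for block 1808.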
In the above example with a block length of 12 and a sample/block advance value in the block extractor of one, the sample/block advance value for the overlap/add block 1808 would be equal to two, when a bandwidth extension by a factor of two is performed. This would still result in an overlap of five blocks. When a bandwidth extension by a factor of three is to be performed, then the sample/block advance value used by block 1808 would be equal to three, and the overlap would drop to an overlap of three. When a four-fold bandwidth extension is to be performed, then the overlap/add block 1808 would have to use a sample/block advance value of four, which would still result in an overlap of more than two blocks.
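The overlap arithmetic for a block length of 12 can be checked with a small helper. The sketch below reports two quantities for each advance value: the maximum number of blocks contributing to one output sample, and the number of following blocks that still overlap a given block. Which of the two the figures quoted above refer to is an assumption on the reader's side; the function name `overlap_counts` is hypothetical.

```python
import math

def overlap_counts(block_len, hop_out):
    """Return (max blocks covering a sample, later blocks overlapping a given block)."""
    covering = math.ceil(block_len / hop_out)
    following = covering - 1
    return covering, following

for hop in (2, 3, 4):
    covering, following = overlap_counts(12, hop)
    print(f"advance {hop}: {covering} blocks per sample, {following} following blocks overlap")
```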
Large computational savings can be achieved by restricting the input signals to the transposer branches to solely contain the source range, and this at a sampling rate adapted to each transposition order. The basic block scheme of such a system for a subband block based HFR generator is illustrated in
The essential effect of each downsampler is to filter out the source range signal and to deliver that to the analysis filter bank at the lowest possible sampling rate. Here, lowest possible refers to the lowest sampling rate that is still suitable for the downstream processing, not necessarily the lowest sampling rate that avoids aliasing after decimation. The sampling rate conversion may be obtained in various manners. Without limiting the scope of the invention, two examples will be given: the first shows the resampling performed by multi-rate time domain processing, and the second illustrates the resampling achieved by means of QMF subband processing.
Examples of an input signal and the spectrum after modulation is depicted in
where the fraction has been reduced by the common factor 8. Hence, the interpolation factor is 3 (as seen from
Another approach is to use the subband outputs from the subsampled 32-band analysis QMF bank 102 already present in the SBR HFR method. The subbands covering the source ranges for the different transposer branches are synthesized to the time domain by small subsampled QMF banks preceding the HFR analysis filter banks. This type of HFR system is illustrated in
The system outlined in
A block diagram of a factor 2 downsampler is shown in
Hence, the downsampler may be structured as in
A block diagram of the factor 1.5=3/2 downsampler is shown in
Hence, the downsampler may be structured as in
The time domain signal from the core decoder (101 in
Specifically,
In the
If the signals generated by different transposition orders are unaligned to the scale-factor bands, as illustrated in
A realistic scenario showing the potential artifacts when using unaligned borders is depicted in
a illustrates an overview of an implementation of the patch border calculator 2302 and the patcher and the location of those elements within a bandwidth extension scenario in accordance with an embodiment. Specifically, an input interface 2500 is provided, which receives the low band data 2300 and parametric data 2302. The parametric data can be bandwidth extension data as, for example, known from ISO/IEC 14496-3: 2009, which is incorporated herein by reference in its entirety, and particularly with respect to the section related to bandwidth extension, which is section 4.6.18 “SBR tool”. Of particular relevance in section 4.6.18 is section 4.6.18.3.2 “Frequency band tables”, and particularly the calculation of the frequency tables fmaster, fTableHigh, fTableLow, fTableNoise and fTableLim. Particularly, section 4.6.18.3.2.1 of the Standard defines the calculation of the master frequency band table, and section 4.6.18.3.2.2 defines the calculation of the derived frequency band tables from the master frequency band table, and particularly how fTableHigh, fTableLow and fTableNoise are calculated. Section 4.6.18.3.2.3 defines the calculation of the limiter frequency band table.
The low resolution frequency table fTableLow is for low resolution parametric data and the high resolution frequency table fTableHigh is for high resolution parametric data, which are both possible in the context of the MPEG-4 SBR tool, as discussed in the mentioned Standard and whether the parametric data is low resolution parametric data or high resolution parametric data depends on the encoder implementation. The input interface 2500 determines whether the parametric data is low or high resolution data and provides this information to the frequency table calculator 2501. The frequency table calculator then calculates the master table or generally derives a high resolution table 2502 and a low resolution table 2503 and provides same to the patch border calculator core 2504, which additionally comprises or cooperates with a limiter band calculator 2505. Elements 2504 and 2505 generate aligned synthesis patch borders 2506 and corresponding limiter band borders related to the synthesis range. This information 2506 is provided to a source band calculator 2507, which calculates the source range of the low band audio signal for a certain patch so that together with the corresponding transposition factors, the aligned synthesis patch borders 2506 are obtained after patching using, for example, a harmonic transposer 2508 as a patcher.
Particularly, the harmonic transposer 2508 may perform different patching algorithms such as a DFT-based patching algorithm or a QMF-based patching algorithm. The harmonic transposer 2508 may be implemented to perform a vocoder-like processing which is described in the context of
The transposed signal 2509 output by the transposer 2508 is forwarded to an envelope adjuster and gain limiter 2510, which receives as an input the high resolution table 2502 and the low resolution table 2503, the adjusted limiter bands 2511 and, naturally, the parametric data 2302. The envelope adjusted high band on line 2512 is then input into a synthesis filterbank 2514, which additionally receives the low band typically in the form as output by the core decoder 2509. Both contributions are merged by the synthesis filterbank 2514 to finally obtain the high frequency reconstructed signal on line 2515.
It is clear that the merging of the high band and the low band can be done differently, such as by performing a merging in the time domain rather than in the frequency domain. Furthermore, it is clear that the order of merging irrespective of the implementation of the merging and envelope adjustment can be changed, i.e. so that envelope adjustment of a certain frequency range can be performed subsequent to merging or, alternatively, before merging, where the latter case is illustrated in
As already outlined in the context of block 2508, a DFT-based harmonic transposer or a QMF-based harmonic transposer can be applied in embodiments. Both algorithms rely on a phase-vocoder frequency spreading. The core coder time-domain signal is bandwidth extended using a modified phase vocoder structure. The bandwidth extension is performed by time stretching followed by decimation, i.e. transposition, using several transposition factors (t=2, 3, 4) in a common analysis/synthesis transform stage. The output signal of the transposer will have a sampling rate twice that of the input signal, which means that for a transposition factor of two, the signal will be time stretched but not decimated, effectively producing a signal of equal time duration as the input signal but having twice the sampling frequency. The combined system may be interpreted as three parallel transposers using transposition factors of 2, 3 and 4, respectively, where the decimation factors are 1, 1.5 and 2. To reduce complexity, the factor 3 and 4 transposers (third and fourth order transposers) are integrated into the factor 2 transposer (second order transposer) by means of interpolation as is subsequently discussed in the context of
For each frame, a nominal “full size” transform size of a transposer is determined depending on a signal-adaptive frequency domain oversampling which can be applied in order to improve the transient response or which can be switched off. This value is indicated in
Subsequently, the pseudo code indicated in
Regarding the pseudo code in
Furthermore, it is not necessarily the case that a matching within an alignment range is looked for where the alignment range is predetermined. Instead, a search in the table can be performed to find the best matching table entry, i.e. the table entry which is closest to the target frequency value irrespective of whether the difference between those two is small or high.
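The unrestricted best-match search can be sketched as follows. The table values are hypothetical subband indices chosen for illustration and are not taken from the Standard; the function name `closest_border` is likewise an assumption.

```python
def closest_border(table, target):
    """Return the table entry closest to target, regardless of the distance."""
    return min(table, key=lambda border: abs(border - target))

# Hypothetical frequency band table (subband indices), for illustration only.
table = [12, 14, 16, 19, 22, 26, 31, 37]
print(closest_border(table, 25))
print(closest_border(table, 100))
```

In this sketch, a tie between two equally distant entries resolves to the one that appears first in the table; other tie-breaking rules are equally possible.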
Other implementations relate to a search in the table, such as fTableLow or fTableHigh for the highest border that does not exceed the (fundamental) bandwidth limits of the HFR generated signal for a transposition factor T. Then, this found highest border is used as the frequency limit of the HFR generated signal of transposition factor T. In this implementation, the target calculation indicated near box 2522 in
Further embodiments employ a mixed patching scheme which is shown in
Thus embodiments generate the patches of higher order that occupy the upper spectral regions advantageously by computationally efficient SSB copy-up patching and the lower order patches covering the middle spectral regions, for which the preservation of the harmonic structure is desired, advantageously by HBE patching. The individual mix of patching methods can be static over time or, advantageously, be signaled in the bitstream.
For the copy-up operation, the low frequency information can be used as shown in
The advantages of the proposed concepts are
The filterbank 1050 finally outputs a transposer output signal which comprises bandwidth extensions by transposition factors 2, 3, and 4, and the signal output by block 1050 is no longer bandwidth-limited to the crossover frequency, i.e. to the highest frequency of the core coder signal corresponding to the lowest frequency of the SBR or HFR generated signal components. The analysis filterbank 1010 in
Regarding the limiter frequency band tables, it is to be noted that the limiter frequency band tables can be constructed to have either one limiter band over the entire reconstruction range or approximately 1.2, 2 or 3 bands per octave, signaled by a bitstream element bs_limiter_bands as defined in ISO/IEC 14496-3: 2009, 4.6.18.3.2.3. The band table may comprise additional bands corresponding to the high frequency generator patches. The table may hold indices of the synthesis filterbank subbands, where the number of elements is equal to the number of bands plus one. When harmonic transposition is active, it is made sure that the limiter band calculator introduces limiter band borders coinciding with the patch borders defined by the patch border calculator 2504. Additionally, the remaining limiter band borders are then calculated between those “fixedly” set limiter band borders for the patch borders.
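The idea of keeping the patch borders fixed and placing the remaining limiter band borders between them can be illustrated with a simplified sketch. This is not the procedure of ISO/IEC 14496-3; it merely assumes that each patch is subdivided with roughly geometric spacing according to a target number of bands per octave. The function name `limiter_borders` and the example patch borders are hypothetical.

```python
import math

def limiter_borders(patch_borders, bands_per_octave):
    """Return limiter band borders that always include every patch border."""
    borders = [patch_borders[0]]
    for lo, hi in zip(patch_borders, patch_borders[1:]):
        octaves = math.log2(hi / lo)
        n_bands = max(1, round(octaves * bands_per_octave))
        for i in range(1, n_bands):
            # Intermediate border with geometric spacing, rounded to a subband index.
            b = round(lo * (hi / lo) ** (i / n_bands))
            if borders[-1] < b < hi:
                borders.append(b)
        borders.append(hi)  # the patch border itself is kept as a fixed border
    return borders

patch_borders = [16, 32, 48, 64]  # hypothetical patch borders (subband indices)
result = limiter_borders(patch_borders, bands_per_octave=2)
print(result)
```

The resulting table holds one more element than the number of limiter bands, in line with the table layout described above.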
In the
Branch 110b, however, has a decimation functionality in order to obtain a transposition by 1.5. Due to the fact that the synthesis filterbank has two times the physical subband spacing of the analysis filterbank, a transposition factor of 3 is obtained as indicated in
Analogously, the third branch has a decimation functionality corresponding to a transposition factor of 2, and the final contribution of the different subband spacing in the analysis filterbank and the synthesis filterbank finally corresponds to a transposition factor of 4 of the third branch 110c.
Particularly, each branch has a block extractor 120a, 120b, 120c and each of these block extractors can be similar to the block extractor 1800 of
In an embodiment, the block extractor 120a of the first transposer branch 110a extracts 10 subband samples and subsequently a conversion of these 10 QMF samples to polar coordinates is performed. This output, generated by the phase adjuster 124a, is then forwarded to the windower 126a, which extends the output by zeroes for the first and the last value of the block, where this operation is equivalent to a (synthesis) windowing with a rectangular window of length 10. The block extractor 120a in branch 110a does not perform a decimation. Therefore, the samples extracted by the block extractor are mapped into an extracted block in the same sample spacing as they were extracted.
However, this is different for branches 110b and 110c. The block extractor 120b advantageously extracts a block of 8 subband samples and distributes these 8 subband samples in the extracted block in a different subband sample spacing. The non-integer subband sample entries for the extracted block are obtained by an interpolation, and the thus obtained QMF samples together with the interpolated samples are converted to polar coordinates and are processed by the phase adjuster. Then, again, windowing in the windower 126b is performed in order to extend the block output by the phase adjuster 124b by zeroes for the first two samples and the last two samples, which operation is equivalent to a (synthesis) windowing with a rectangular window of length 8.
The block extractor 120c is configured for extracting a block with a time extent of 6 subband samples, performs a decimation with a decimation factor of 2, converts the QMF samples into polar coordinates, and again performs an operation in the phase adjuster 124c. The output is again extended by zeroes, now for the first three subband samples and for the last three subband samples. This operation is equivalent to a (synthesis) windowing with a rectangular window of length 6.
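The zero extension in the three branches can be viewed as three rectangular synthesis windows embedded in a common block. The sketch below assumes a common block length of 12 subband samples, which follows from 10+2, 8+4 and 6+6 samples in the three branches; the function name `zero_extended_window` is hypothetical.

```python
def zero_extended_window(block_len, n_zeros_each_side):
    """Rectangular window with n_zeros_each_side zeros at both ends of the block."""
    ones = block_len - 2 * n_zeros_each_side
    return [0.0] * n_zeros_each_side + [1.0] * ones + [0.0] * n_zeros_each_side

# One window per transposition order: 1, 2 and 3 zeros on each side.
windows = {
    2: zero_extended_window(12, 1),  # rectangular window of length 10
    3: zero_extended_window(12, 2),  # rectangular window of length 8
    4: zero_extended_window(12, 3),  # rectangular window of length 6
}
```

Because all three windows share the same block length, the branch outputs can be added sample by sample before the common overlap-add stage.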
The transposition outputs of each branch are then added to form the combined QMF output by the adder 128, and the combined QMF outputs are finally superimposed using overlap-add in block 130, where the overlap-add advance or stride value is two times the stride value of the block extractors 120a, 120b, 120c as discussed before.
For the third branch 110c, the block extractor 120c once again receives the patch borders and performs a block extraction from the subbands corresponding to synthesis bands defined by xOverQmf(2) until xOverQmf(3). The analysis numbers n are calculated by 2 multiplied by k, and this is the calculation rule for calculating the analysis channel numbers from the synthesis channel numbers. In this context, it is to be outlined that xOverQmf corresponds to xOverBin of
The procedure for determining the patch borders for calculating the analysis ranges for the embodiment of
An embodiment comprises a method for decoding an audio signal by using subband block based harmonic transposition, comprising the filtering of a core decoded signal through an M-band analysis filter bank to obtain a set of subband signals; synthesizing a subset of said sub-band signals by means of subsampled synthesis filter banks having a decreased number of subbands, to obtain subsampled source range signals.
An embodiment relates to a method for aligning the spectral band borders of HFR generated signals to spectral borders utilized in a parametric process.
An embodiment relates to a method for aligning the spectral borders of the HFR generated signals to the spectral borders of the envelope adjustment frequency table comprising: the search for the highest border in the envelope adjustment frequency table that does not exceed the fundamental bandwidth limits of the HFR generated signal of transposition factor T; and using the found highest border as the frequency limit of the HFR generated signal of transposition factor T.
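The search for the highest table border that does not exceed a bandwidth limit can be sketched with a binary search. The example assumes an ascending table of hypothetical subband indices and, purely for illustration, takes the bandwidth limit of order T to be T times a hypothetical crossover index of 12; neither value is taken from the Standard.

```python
import bisect

def highest_border_within(table, limit):
    """Return the largest entry of an ascending table that is <= limit, or None."""
    idx = bisect.bisect_right(table, limit)
    if idx == 0:
        return None  # no border lies within the limit
    return table[idx - 1]

table = [12, 14, 16, 19, 22, 26, 31, 37]  # hypothetical envelope adjustment table
xover = 12                                # hypothetical crossover subband index
limits = {t: highest_border_within(table, t * xover) for t in (2, 3, 4)}
print(limits)
```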
An embodiment relates to a method for aligning the spectral borders of the limiter tool to the spectral borders of the HFR generated signals comprising: adding the frequency borders of the HFR generated signals to the table of borders used when creating the frequency band borders used by the limiter tool; and forcing the limiter to use the added frequency borders as constant borders and to adjust the remaining borders accordingly.
An embodiment relates to combined transposition of an audio signal comprising several integer transposition orders in a low resolution filter bank domain where the transposition operation is performed on time blocks of subband signals.
A further embodiment relates to combined transposition, where transposition orders greater than 2 are embedded in an order 2 transposition environment.
A further embodiment relates to combined transposition, where transposition orders greater than 3 are embedded in an order 3 transposition environment, whereas transposition orders lower than 4 are performed separately.
A further embodiment relates to combined transposition, where transposition orders (e.g. transposition orders greater than 2) are created by replication of previously calculated transposition orders (i.e. especially lower orders) including the core coded bandwidth. Every conceivable combination of available transposition orders and core bandwidth is possible without restrictions.
An embodiment relates to reduction of computational complexity due to the reduced number of analysis filter banks which may be used for transposition.
An embodiment relates to an apparatus for generating a bandwidth extended signal from an input audio signal, comprising: a patcher for patching an input audio signal to obtain a first patched signal and a second patched signal, the second patched signal having a different patch frequency compared to the first patched signal, wherein the first patched signal is generated using a first patching algorithm, and the second patched signal is generated using a second patching algorithm; and a combiner for combining the first patched signal and the second patched signal to obtain the bandwidth extended signal.
A further embodiment relates to this apparatus, in which the first patching algorithm is a harmonic patching algorithm, and the second patching algorithm is a non-harmonic patching algorithm.
A further embodiment relates to a preceding apparatus, in which the first patching frequency is lower than the second patching frequency or vice versa.
A further embodiment relates to a preceding apparatus, in which the input signal comprises a patching information; and in which the patcher is configured for being controlled by the patching information extracted from the input signal to vary the first patching algorithm or the second patching algorithm in accordance with the patching information.
A further embodiment relates to a preceding apparatus, in which the patcher is operative to patch subsequent blocks of audio signal samples, and in which the patcher is configured to apply the first patching algorithm and the second patching algorithm to the same block of audio samples.
A further embodiment relates to a preceding apparatus, in which a patcher comprises, in arbitrary orders, a decimator controlled by a bandwidth extension factor, a filter bank, and a stretcher for a filter bank subband signal.
A further embodiment relates to a preceding apparatus, in which the stretcher comprises a block extractor for extracting a number of overlapping blocks in accordance with an extraction advance value; a phase adjuster or windower for adjusting subband sampling values in each block based on a window function or a phase correction; and an overlap/adder for performing an overlap-add-processing of windowed and phase adjusted blocks using an overlap advance value greater than the extraction advance value.
A further embodiment relates to an apparatus for bandwidth extending an audio signal comprising: a filter bank for filtering the audio signal to obtain downsampled subband signals; a plurality of different subband processors for processing different subband signals in different manners, the subband processors performing different subband signal time stretching operations using different stretching factors; and a merger for merging processed subbands output by the plurality of different subband processors to obtain a bandwidth extended audio signal.
A further embodiment relates to an apparatus for downsampling an audio signal, comprising: a modulator; an interpolator using an interpolation factor; a complex low-pass filter; and a decimator using a decimation factor, wherein the decimation factor is higher than the interpolation factor.
An embodiment relates to an apparatus for downsampling an audio signal, comprising: a first filter bank for generating a plurality of subband signals from the audio signal, wherein a sampling rate of the subband signal is smaller than a sampling rate of the audio signal; at least one synthesis filter bank followed by an analysis filter bank for performing a sample rate conversion, the synthesis filter bank having a number of channels different from a number of channels of the analysis filter bank; a time stretch processor for processing the sample rate converted signal; and a combiner for combining the time stretched signal and a low-band signal or a different time stretched signal.
A further embodiment relates to an apparatus for downsampling an audio signal by a noninteger downsampling factor, comprising: a digital filter; an interpolator having an interpolation factor; a poly-phase element having even and odd taps; and a decimator having a decimation factor being greater than the interpolation factor, the decimation factor and the interpolation factor being selected such that a ratio of the interpolation factor and the decimation factor is non-integer.
An embodiment relates to an apparatus for processing an audio signal, comprising: a core decoder having a synthesis transform size being smaller than a nominal transform size by a factor, so that an output signal is generated by the core decoder having a sampling rate smaller than a nominal sampling rate corresponding to the nominal transform size; and a post processor having one or more filter banks, one or more time stretchers and a merger, wherein a number of filter bank channels of the one or more filter banks is reduced compared to a number as determined by the nominal transform size.
A further embodiment relates to an apparatus for processing a low-band signal, comprising: a patch generator for generating multiple patches using the low-band audio signal; an envelope adjustor for adjusting an envelope of the signal using scale factors given for adjacent scale factor bands having scale factor band borders, wherein the patch generator is configured for performing the multiple patches, so that a border between the adjacent patches coincides with a border between adjacent scale factor bands in the frequency scale.
An embodiment relates to an apparatus for processing a low-band audio signal, comprising: a patch generator for generating multiple patches using the low band audio signal; and an envelope adjustment limiter for limiting envelope adjustment values for a signal by limiting in adjacent limiter bands having limiter band borders, wherein the patch generator is configured for performing the multiple patches so that a border between adjacent patches coincides with a border between adjacent limiter bands in a frequency scale.
The inventive processing is useful for enhancing audio codecs that rely on a bandwidth extension scheme, especially if an optimal perceptual quality at a given bitrate is highly important and, at the same time, processing power is a limited resource.
Most prominent applications are audio decoders, which are often implemented on hand-held devices and thus operate on a battery power supply.
The inventive encoded audio signal can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a micro-processor in order to perform one of the methods described herein. Generally, the methods are advantageously performed by any hardware apparatus.
While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.
This application is a continuation of copending International Application No. PCT/EP2011/053313, filed Mar. 4, 2011, which is incorporated herein by reference in its entirety, and additionally claims priority from U.S. Application No. 61/312,127, filed Mar. 9, 2010, which is also incorporated herein by reference in its entirety. The present invention relates to audio source coding systems which make use of a harmonic transposition method for high frequency reconstruction (HFR), and to digital effect processors, e.g. so-called exciters, where generation of harmonic distortion adds brightness to the processed signal, and to time stretchers, where the duration of a signal is extended while maintaining the spectral content of the original.
Number | Date | Country
---|---|---
61312127 | Mar 2010 | US

Relation | Number | Date | Country
---|---|---|---
Parent | PCT/EP2011/053313 | Mar 2011 | US
Child | 13604336 | | US