The replay speed of audio signals can be changed while maintaining the pitch, for example with the help of a phase vocoder (see for example J. L. Flanagan and R. M. Golden, “The Bell System Technical Journal”, November 1966, pages 1493 to 1509; U.S. Pat. No. 6,549,884 Laroche, J. & Dolson, M.: “Phase-vocoder pitch-shifting”; Jean Laroche and Mark Dolson, “New Phase-Vocoder Techniques for Pitch-Shifting, Harmonizing And Other Exotic Effects”, Proc. 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, N.Y., Oct. 17-20, 1999). In the same way, such methods can be used to transpose the signal while maintaining the original replay duration. The latter is obtained by replaying the stretched signal accelerated by the time stretching factor. In a time-discrete signal representation, this corresponds to down-sampling the signal by the stretching factor while maintaining the sampling frequency. Conventionally, this time stretching takes place in the time domain. Alternatively, it can also take place within a filter bank, such as a pseudo-quadrature mirror filterbank (pQMF). The pseudo-quadrature mirror filterbank (pQMF) is sometimes also called a QMF filterbank.
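As a hedged illustration of the relationship between time stretching and transposition described above, the following Python sketch decimates an already stretched sinusoid. The helper names `sine` and `transpose_by_decimation` are hypothetical, the stretched signal is synthesized directly rather than produced by a phase vocoder, and no anti-aliasing filter is applied, which a practical implementation would likely need.

```python
import math


def sine(freq_hz, num_samples, fs):
    """Sampled sinusoid, used here only as a stand-in for a stretched signal."""
    return [math.sin(2 * math.pi * freq_hz * n / fs) for n in range(num_samples)]


def transpose_by_decimation(stretched, k):
    """Keep every k-th sample while the sampling frequency stays unchanged.

    Assumption: k is a positive integer stretching factor.
    """
    return stretched[::k]


fs = 8000
stretched = sine(100, 1600, fs)                   # 100 Hz tone, twice the original duration
transposed = transpose_by_decimation(stretched, 2)
reference = sine(200, 800, fs)                    # 200 Hz tone at the original duration
```

Under these assumptions the decimated signal has the original number of samples and coincides with a tone at twice the frequency, which is the transposition effect described in the text.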
Specific challenges in stretching are transient events that are “blurred” in time during the processing step of time stretching. This occurs because methods, such as the phase vocoder, affect the so-called vertical coherence properties (with regard to a time frequency spectrogram representation) of the signal.
Some current methods apply stronger time stretching around the transients, so that little or no time stretching has to be performed during the transient itself. This has been described, for example, in:
Another paper on the topic was written by Röbel, A.: A NEW APPROACH TO TRANSIENT PROCESSING IN THE PHASE VOCODER; Proc. of the 6th Int. Conference on Digital Audio Effects (DAFx-03), London, UK, Sep. 8-11, 2003.
In time stretching of audio signals by phase vocoders, transient signal portions are “blurred” by dispersion, since the so-called vertical coherence in the spectrogram view of the signal is affected. Methods operating with so-called overlap-add methods can generate spurious pre-echoes and post-echoes of transient sound events. These problems can be handled by changing the time stretching in the environment of transients, with no stretching during the actual transients and stronger stretching in the surroundings. If, however, transposition is to take place, the transposition factor will no longer be constant in the environment of the transients, i.e. the pitch of superimposed (possibly tonal) signal portions changes in a spuriously audible manner. When time stretching takes place within a filter bank, such as the pQMF, similar problems occur.
The field of this application relates to a method for perceptually motivated handling of transient sound events within such a process. In particular, transient sound events may be removed during the signal manipulation of time stretching. Subsequently, the unprocessed transient signal portion may be added back to the changed (stretched) signal at a precisely fitting position, taking the stretching into account.
According to an embodiment, an apparatus for processing an audio signal may have an analysis filterbank for generating subband signals of the audio signal; a time manipulator for individually time manipulating a plurality of subband signals representing the audio signal, wherein the time manipulator may have an overlap-add stage for overlapping and adding blocks of at least one of the plurality of subband signals using an overlap-add-advance value different from a block-extraction-advance value used for extracting the blocks from a subband signal of the plurality of subband signals; a transient detector for detecting a transient in the audio signal or the at least one subband signal of the plurality of subband signals, wherein the overlap-add stage is configured for reducing an influence of a detected transient or for not using the detected transients in a subband-individual manner when adding by the overlap-add stage; and a transient adder for adding a detected transient to the at least one subband signal generated by the overlap/add stage in a subband-individual manner.
According to another embodiment, a method for processing an audio signal may have the steps of generating a plurality of subband signals of the audio signal; overlapping and adding blocks of a corresponding one of the plurality of subband signals representing the audio signal using an overlap-add-advance value different from a block-extraction-advance value used for extracting the blocks from a subband signal of the plurality of subband signals; detecting a transient in the at least one subband signal of the plurality of subband signals; either reducing an influence or discarding a detected transient when overlapping and adding in a subband-individual manner; adding a detected transient to the at least one subband signal generated by the action of overlapping and adding in a subband-individual manner.
According to another embodiment, a computer program may perform a method for processing an audio signal when the computer program runs on a computer, wherein the method may have the steps of generating a plurality of subband signals of the audio signal; overlapping and adding blocks of a corresponding one of the plurality of subband signals representing the audio signal using an overlap-add-advance value different from a block-extraction-advance value used for extracting the blocks from a subband signal of the plurality of subband signals; detecting a transient in the at least one subband signal of the plurality of subband signals; either reducing an influence or discarding a detected transient when overlapping and adding in a subband-individual manner; adding a detected transient to the at least one subband signal generated by the action of overlapping and adding in a subband-individual manner.
According to embodiments of the teachings disclosed in this document, an apparatus for processing an audio signal comprises a time manipulator for individually time manipulating a plurality of subband signals of the audio signal. The time manipulator comprises an overlap-add stage for overlapping and adding blocks of at least one of the plurality of subband signals using an overlap-add-advance value being different from a block extraction advance value, a transient detector for detecting a transient in the audio signal or a subband signal, and a plurality of transient adders for adding a detected transient to a plurality of signals generated by the overlap-add stage. The overlap-add stage is configured for reducing an influence of a detected transient or for not using the detected transients when adding.
According to another embodiment, an apparatus for processing an audio signal comprises an analysis filterbank for generating subband signals; a time manipulator for individually time manipulating a plurality of subband signals, the time manipulator comprising: an overlap-add stage for overlapping and adding blocks of the subband signal using an overlap-add-advance value being different from a block extraction advance value; a transient detector for detecting a transient in the audio signal or a subband signal, wherein the overlap-add stage is configured for reducing an influence of a detected transient or for not using the detected transients when adding; and a transient adder for adding a detected transient to a signal generated by the overlap-add stage.
According to another embodiment, a method for processing an audio signal comprises:
Another embodiment relates to a computer program for performing a method when the computer program runs on a computer, the method comprising:
According to related embodiments, the apparatus may further comprise a decimator for decimating the audio signal or the plurality of audio signals. The time manipulator may be configured for performing a time stretching of the plurality of subband signals.
According to a further embodiment, the transient detector may be configured to mark blocks detected as comprising a transient, and the plurality of overlap-add stages may be configured to ignore the marked blocks.
According to a further embodiment, the plurality of overlap-add stages may be configured for applying an overlap-add value being greater than a block extraction value for performing a time stretching of the plurality of subband signals.
According to a further embodiment, the time manipulator may further comprise a block extractor, a windower/phase adjustor, and a phase calculator for calculating a phase, based on which the windower/phase adjustor performs the adjustment of an extracted block.
According to a further embodiment, the transient adder may be further configured to insert a portion of the subband signal having the transient, wherein the length of the portion is selected to be sufficiently long that a cross-fade from the portion having the transient to the output of the overlap-add processing is possible.
According to a related embodiment, the transient adder may be configured for performing the cross-fade operation.
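A cross-fade of the kind mentioned above might, purely as an illustrative sketch, look as follows in Python. The linear fade shape and the function name `cross_fade` are assumptions; the embodiments do not prescribe a particular fade curve.

```python
def cross_fade(fade_out, fade_in):
    """Blend two equally long segments with linear weights (assumed fade shape).

    fade_out: e.g. the tail of the inserted transient portion.
    fade_in:  e.g. the overlap-add output over the same time span.
    """
    if len(fade_out) != len(fade_in):
        raise ValueError("segments must have the same length")
    n = len(fade_out)
    blended = []
    for i in range(n):
        w = (i + 1) / (n + 1)          # weight of fade_in rises from near 0 to near 1
        blended.append((1.0 - w) * fade_out[i] + w * fade_in[i])
    return blended
```

The inserted transient portion would have to extend beyond the transient itself by at least the fade length, which is why the text asks for a sufficiently long portion.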
According to a further embodiment, the transient detector may be configured for detecting blocks extracted by a block extractor from the subband signal having a transient characteristic. The overlap-add stage may be further configured for reducing an influence of the detected blocks or for not using the detected blocks when adding.
According to a further embodiment, the transient detector may be configured for performing a moving center of gravity calculation of energy across a predetermined time period of a signal to be input into an analysis filterbank or a subband signal.
Exact determination of the position of the transient, for the purpose of selecting an appropriate section, can for example be performed with the help of a moving centroid calculation of the energy across an appropriate time period. In particular, transient determination can be performed in a frequency-selective manner within a filter bank. Additionally, the time period of the section can be selected as a constant value or in a variable manner based on information from the transient determination.
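The moving centroid calculation of the energy could, as a minimal sketch, be written in Python as shown below. The function names, the use of the squared magnitude as energy, and the hop parameter are assumptions made for illustration; the samples may be real or complex subband samples.

```python
def energy_centroid(samples, start, length):
    """Energy-weighted mean sample index of samples[start:start + length].

    Returns None when the window carries no energy.
    """
    window = samples[start:start + length]
    energies = [abs(x) ** 2 for x in window]
    total = sum(energies)
    if total == 0:
        return None
    return start + sum(i * e for i, e in enumerate(energies)) / total


def moving_centroid(samples, length, hop=1):
    """Centroid for every window position, advancing by `hop` samples."""
    return [energy_centroid(samples, s, length)
            for s in range(0, len(samples) - length + 1, hop)]
```

A centroid that stays at the same absolute index while the window moves across it indicates a concentrated energy burst at that position, which can then serve as the center of the section to be cut out.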
According to a further embodiment, the apparatus may further comprise an analysis filterbank for generating the subband signals.
According to a further embodiment, the apparatus may further comprise a decimator arranged at an input side or an output side of the analysis filter bank. The time manipulator may be configured for performing a time stretching of the plurality of subband signals.
According to a further embodiment, the apparatus may further comprise a first analysis filterbank, a second analysis filter bank, a resampler upstream of the second analysis filter bank, and a plurality of phase vocoders for a second plurality of subband signals output by the second analysis filterbank, the plurality of phase vocoders having a bandwidth extension factor greater than one and a phase vocoder output being provided to the plurality of overlap-add stages.
According to a further embodiment, the apparatus may further comprise a connecting stage between the first analysis filterbank and the plurality of phase vocoders at an input side of the connecting stage and the plurality of overlap-add stages at an output side of the connecting stage, the connecting stage being configured to control a provision of the blocks of the corresponding one of the plurality of subband signals and phase-vocoder processed signal to the overlap-add stage.
According to a further embodiment, the apparatus may further comprise: an amplitude correction configured to compensate for amplitude affecting effects of different overlap values.
The present application thus provides different aspects of apparatuses, methods or computer programs for processing audio signals in the context of bandwidth extension and in the context of other audio applications which are not related to bandwidth extension. The features of the described and claimed individual aspects can be partly or fully combined, but can also be used separately from each other, since the individual aspects already provide advantages with respect to perceptual quality, computational complexity and processor/memory resources when implemented in a computer system or micro processor.
According to the teachings disclosed herein, and in contrast to existing methods, a windowed section including the transient may be removed from the signal to be manipulated. This may be obtained by summing up, block by block, only those time portions not including transients during the overlap-and-add (OLA) process. This results in a time stretched signal including no transients. Once the time stretching is complete, the unstretched transients that have been removed from the original signal are added again.
Dispersion and echo effects hence no longer affect the subjective audio quality of the transient.
Inserting the original signal portion results in a change of timbre or pitch when the sampling rate is changed. Generally, however, the transient masks this psycho-acoustically. If, in particular, stretching by an integer factor takes place, the timbre will be changed only slightly, since outside the environment of the transient, only every n-th (n=stretching factor) harmonic is mapped.
The accompanying drawings are included to provide a further understanding of embodiments and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments and, together with the description, serve to explain the principles of the embodiments. Other embodiments and many of the intended advantages of embodiments will be readily appreciated, as they become better understood with reference to the following detailed description. Like reference numerals designate corresponding or similar parts.
Derived there from,
With the apparatus, method, and computer program according to the disclosed teachings, artifacts (dispersions, pre-echoes and post-echoes) resulting when processing transients by time stretching and transposition methods are effectively avoided. In addition, a frequency-selective distinction is made as to whether stationary or transient portions predominate in a subband, and the transient handling method is selected correspondingly. Additionally, the time period of the signal portion to be inserted can be formed in a variable manner considering parameters of transient determination for optimally adapting the time period of the signal portion to the transient.
The method is suitable for all audio applications where the replay speed of audio signals or their pitch is to be changed. Particularly suited are applications for bandwidth extension or in the field of audio effects.
Each pQMF analysis stage 104a, 104b, 104c outputs a plurality of different subband signals in different subband channels, where each subband signal has a reduced bandwidth and, typically, a reduced sampling rate. In this case, the filterbank is a 2-times oversampled filterbank, which is advantageous for the present invention. However, a critically sampled filterbank may also be used.
The corresponding narrow band signal or subband signal output in a pQMF analysis channel is input into a phase vocoder. Although
An apparatus according to the teachings disclosed herein may be implemented in a distributed manner in one or more of the QMF analysis stages 104a, 104b, 104c and the QMF synthesis filterbank 108. In the same manner or a similar manner, a time manipulator which is a part of the apparatus according to the disclosed teachings may be distributed among the QMF analysis stages 104a, 104b, 104c and the QMF synthesis filterbank 108. Accordingly, one or more of the QMF analysis stages 104a, 104b, 104c may omit blocks containing a transient from time manipulation and forward the original blocks to the synthesis filterbank 108. The synthesis filterbank 108 may provide the functionality of a transient adder by adding a detected and typically unmodified transient to a signal generated by an overlap-add stage of the synthesis filterbank 108. The schematic block diagram of
The individual phase vocoders are related to an individual pQMF band. In
The synthesized signal can be generated using an arbitrarily selected combination of phase vocoder outputs and baseband pQMF analysis 112 outputs. It is to be noted that the switching stage 114 can be a controlled switching stage which is controlled by an audio signal having a certain side information, or which is controlled by a certain signal characteristic. Alternatively, the stage 114 can be a simple connecting stage without any switching capabilities. This is the case, when a certain distribution of output signals from elements 112 and 106a-106b is fixedly set and fixedly programmed. In this case, the stage 114 will not comprise any switches, but will comprise certain through-connections.
The individual blocks are input into a windower 1802 for windowing the blocks using a window function for each block. Additionally, a phase calculator 1804 is provided which calculates a phase for each block. The phase calculator 1804 can use the individual block either before windowing or subsequent to windowing. Then, a phase adjustment value p×k is calculated and input into a phase adjuster 1806. The phase adjuster applies the adjustment value to each sample in the block. The factor k is equal to the bandwidth extension factor. When, for example, a bandwidth extension by a factor of 2 is to be obtained, the phase p calculated for a block extracted by the block extractor 1800 is multiplied by the factor 2, and the adjustment value applied to each sample of the block in the phase adjuster 1806 is p multiplied by 2. This is a value/rule provided by way of example. Equivalently, the corrected phase for synthesis can be written as k*p or as p+(k−1)*p. So in this example the correction is either a multiplication of the phase by 2 or an addition of 1*p. Other values/rules can be applied for calculating the phase correction value.
In an embodiment, the single subband signal is a complex subband signal, and the phase of a block can be calculated by a plurality of different ways. One way is to take the sample in the middle or around the middle of the block and to calculate the phase of this complex sample.
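The phase calculation from the middle sample and the subsequent phase adjustment might be sketched in Python as follows. This is one possible reading only: the function names are hypothetical, windowing is omitted, and the adjustment is implemented as a rotation of every sample by (k−1)·p, so that the middle sample ends up with the phase k·p.

```python
import cmath


def block_phase(block):
    """Phase p of the complex sample at (or next to) the middle of the block."""
    return cmath.phase(block[len(block) // 2])


def adjust_phase(block, k):
    """Rotate each sample by (k - 1) * p, giving the middle sample phase k * p.

    Assumption: the same rotation is applied to every sample of the block.
    """
    p = block_phase(block)
    rotation = cmath.exp(1j * (k - 1) * p)
    return [x * rotation for x in block]
```

For k = 2 and a block whose middle sample has phase 0.3 rad, the adjusted middle sample has phase 0.6 rad while its magnitude is unchanged.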
Although illustrated in
The phase-adjusted blocks are input into an overlap/add and amplitude correction block 1808, where the windowed and phase-adjusted blocks are overlap-added. Importantly, however, the sample/block advance value in block 1808 is different from the value used in the block extractor 1800. In particular, the sample/block advance value in block 1808 is greater than the value e used in block 1800, so that a time stretching of the signal output by block 1808 is obtained. Thus, the processed subband signal output by block 1808 is longer than the subband signal input into block 1800. When a bandwidth extension by a factor of two is to be obtained, a sample/block advance value is used which is two times the corresponding value in block 1800. This results in a time stretching by a factor of two. When other time stretching factors are needed, other sample/block advance values can be used so that the output of block 1808 has the needed time length.
An amplitude correction is advantageously performed in order to address the issue of the different overlaps in blocks 1800 and 1808. This amplitude correction could be introduced into the windower/phase adjustor multiplication factor, but it can also be performed subsequent to the overlap/add processing.
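The effect of using different advance values for block extraction and for overlap-add can be illustrated with the following Python sketch. The function names are hypothetical, and windowing, phase adjustment and amplitude correction are deliberately left out so that only the time stretching mechanism is shown.

```python
def extract_blocks(signal, block_len, e):
    """Extract blocks of block_len samples, advancing by e samples per block."""
    return [signal[s:s + block_len]
            for s in range(0, len(signal) - block_len + 1, e)]


def overlap_add(blocks, advance):
    """Overlap-add the blocks, advancing by `advance` samples per block."""
    if not blocks:
        return []
    block_len = len(blocks[0])
    out = [0.0] * (advance * (len(blocks) - 1) + block_len)
    for i, block in enumerate(blocks):
        for n, x in enumerate(block):
            out[i * advance + n] += x
    return out


signal = [1.0] * 20
blocks = extract_blocks(signal, block_len=4, e=1)   # extraction advance e = 1
stretched = overlap_add(blocks, advance=2)          # overlap-add advance k * e = 2
```

With these values, 20 input samples yield 17 blocks and 36 output samples, so the output approaches twice the input length as the signal grows longer.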
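One conceivable form of such an amplitude correction, applied after the overlap-add processing, is sketched below in Python. The source does not specify how the correction is computed; dividing by the accumulated window weight is an assumption, and the names `overlap_gain` and `amplitude_correct` are hypothetical.

```python
def overlap_gain(window, num_blocks, advance):
    """Accumulated window weight at each output sample of the overlap-add."""
    gain = [0.0] * (advance * (num_blocks - 1) + len(window))
    for i in range(num_blocks):
        for n, w in enumerate(window):
            gain[i * advance + n] += w
    return gain


def amplitude_correct(ola_output, window, num_blocks, advance, eps=1e-12):
    """Divide each output sample by the accumulated weight (assumed rule).

    Samples where the accumulated weight is (almost) zero are left unchanged.
    """
    gain = overlap_gain(window, num_blocks, advance)
    return [y / g if abs(g) > eps else y for y, g in zip(ola_output, gain)]
```

For a constant input and a rectangular window, the corrected output is again constant, independent of the advance value that was used.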
In the above example with a block length of 12 and a sample/block advance value in the block extractor of one, the sample/block advance value for the overlap/add block 1808 would be equal to two, when a bandwidth extension by a factor of two is performed. This would still result in an overlap of six blocks. When a bandwidth extension by a factor of three is to be performed, then the sample/block advance value used by block 1808 would be equal to three, and the overlap would drop to an overlap of four. When a four-fold bandwidth extension is to be performed, then the overlap/add block 1808 would have to use a sample/block advance value of four which would still result in an overlap of more than two blocks.
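The overlap figures quoted above follow from dividing the block length by the overlap-add advance value. The Python sketch below reproduces this arithmetic and cross-checks it by counting the blocks that cover one output sample in the steady state; both function names are hypothetical.

```python
def overlap_factor(block_len, advance):
    """Average number of blocks contributing to one output sample."""
    return block_len / advance


def blocks_covering(sample_index, block_len, advance, num_blocks):
    """Count the blocks whose overlap-add position covers sample_index."""
    return sum(1 for i in range(num_blocks)
               if i * advance <= sample_index < i * advance + block_len)
```

For a block length of 12, advance values of two, three and four give overlaps of six, four and three blocks, respectively.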
The phase vocoder for an individual subband signal illustrated in
The stretched signal without transients is input into the transient adder which is configured for adding the transient to the stretched signal so that, at the output, there exists a stretched signal having inserted transients, but these inserted transients have not been affected by a multiple overlap/add processing.
In one embodiment, the transient portion is inserted from the subband signal itself as illustrated by connection line 206 and line 201a. Alternatively, the signal can be taken out from any other subband signal or from the signal before the subband analysis, since it is characteristic for a transient that the transient occurs in a quite similar manner over the individual subbands. On the other hand, however, using the transient event occurring in a subband is advantageous in some instances, since the sampling rate and other considerations are as close as possible to a stretched signal.
The transient-containing samples are then added again to the stretched signal without transients by the transient adder 204. The transient adder 204 receives a control signal from the transient detector 200 and the original single subband signal as inputs. With this information, the transient adder can identify the samples that have been suppressed by the transient suppression windower 1798 and re-insert these samples in the stretched signal without transients. At the output of the transient adder 204 the processed subband signal (long time length) having inserted transients is obtained.
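The re-insertion performed by the transient adder might be sketched in Python as shown below. Several points are assumptions made for illustration: the function name `add_transient`, the mapping of an original sample index t to the position k·t on the stretched time axis, and the premise that the stretched signal is zero where the transient-containing samples were suppressed.

```python
def add_transient(stretched, original, start, length, k):
    """Add the unstretched samples original[start:start + length] to the
    stretched signal, beginning at the assumed target position start * k.

    The input list is not modified; a new list is returned.
    """
    out = list(stretched)
    target = start * k                 # assumed position on the stretched time axis
    for n in range(length):
        if 0 <= target + n < len(out) and start + n < len(original):
            out[target + n] += original[start + n]
    return out
```

Because the transient samples are copied without stretching, the inserted portion keeps its original duration and is not affected by multiple overlap-add operations.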
Beneath the sequence 1202 in
In
The lower part of
As mentioned above, a residual gap of two samples remains. When the regular blocks begin again, starting with the subsequent block 1208′,
As an alternative to removing complete blocks that comprise one or more transient-containing samples, as illustrated in
The block extractor and buffer 1810 outputs extracted blocks and provides them to an overlap-add stage 1808 in which the extracted blocks are overlapped with an overlap-add-advance value k*e different from the block extraction advance value e and added up to form the time manipulated audio signal. The overlap-add stage 1808 may comprise a plurality of overlap-add units, e.g. one overlap-add unit for a corresponding one of the plurality of subband signals. Another option would be to use a single overlap-add stage or a few overlap-add units in a time-sharing or multiplexed manner so that the subband signals are overlap-added individually and successively.
The time manipulator further comprises a transient detector 200 which receives the plurality of subband signals. The transient detector 200 may analyze the subband signals or the audio signal with respect to e.g. a non-harmonic attack phase of a musical sound or spoken word or a high degree of non-periodic components and/or a higher magnitude of high frequencies than the harmonic content of that sound. An output of the transient detector 200 indicates whether or not a transient has been identified in a current section of the audio signal and is provided to the overlap-add stage 1808 and a transient adder 1812. In case the output of the transient detector 200 indicates that a transient has been detected, the overlap-add stage 1808 is controlled to ignore those blocks that contain the transient T when performing the overlap-add action. The transient adder 1812, on its part, inserts the original transient section to the otherwise time-manipulated audio signal upon reception of an indication from the transient detector 200 that a transient has been detected. The time-manipulated signal with the added transient forms an output of the time manipulator.
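The interaction between the transient detector and the overlap-add stage can be illustrated with the following Python sketch. It assumes that the detector delivers one Boolean flag per sample and that a block is ignored as soon as it contains one flagged sample; the function names are hypothetical, and the detection itself is not shown.

```python
def mark_transient_blocks(transient_flags, block_len, e):
    """True for every block (extraction advance e) containing a flagged sample."""
    return [any(transient_flags[s:s + block_len])
            for s in range(0, len(transient_flags) - block_len + 1, e)]


def overlap_add_skipping(blocks, marked, advance):
    """Overlap-add that ignores the blocks marked as containing a transient."""
    block_len = len(blocks[0])
    out = [0.0] * (advance * (len(blocks) - 1) + block_len)
    for i, (block, skip) in enumerate(zip(blocks, marked)):
        if skip:
            continue                   # marked block is left out of the sum
        for n, x in enumerate(block):
            out[i * advance + n] += x
    return out
```

The gap left by the ignored blocks in the output is where the transient adder would subsequently insert the original transient section.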
At 1504 the blocks of a corresponding subband signal of the plurality of subband signals are overlapped and added. An overlap-add advance value is used that is different from a block extraction advance value. The action 1504 represents the normal process flow in the absence of transients and is performed continuously.
A transient detection action is performed at 1506 to detect a transient in the audio signal or in a subband signal. The action 1506 may be performed concurrently with the action 1504 and other actions shown in the flow diagram of
An influence of a detected transient is either reduced, or the detected transient is discarded, when performing the action 1504 of overlapping and adding.
A detected transient is then added, at action 1510, to a plurality of signals generated by the action 1504 of overlapping and adding.
Although according to the teachings disclosed herein the transient section of the audio signal has typically not undergone the same time manipulation as the rest of the audio signal, the time-manipulated resulting signal typically renders the transient sections in a realistic manner. This may be at least partly due to the fact that a transient is highly insensitive to many signal manipulation methods, such as frequency shifting.
According to another aspect of the teachings disclosed herein, an apparatus for processing an audio signal may comprise:
an analysis filterbank for generating subband signals;
a time manipulator for individually time manipulating a plurality of subband signals, the time manipulator comprising:
an overlap-add stage for overlapping and adding blocks of the subband signal using an overlap-add-advance value being different from a block extraction advance value;
a transient detector for detecting a transient in the audio signal or a subband signal,
wherein the overlap-add stage is configured for reducing an influence of a detected transient or for not using the detected transients when adding; and
a transient adder for adding a detected transient to a signal generated by the overlap/add stage.
According to another aspect of the teachings disclosed herein, an apparatus as previously described, may further comprise a decimator arranged at an input side or an output side of the analysis filterbank, wherein the time manipulator may be configured for performing a time stretching of a subband signal.
According to another aspect of the teachings disclosed herein, in an apparatus as previously described, the transient detector may be configured to mark blocks detected as comprising a transient; and the overlap-add stage may be configured to ignore the marked blocks.
According to another aspect of the teachings disclosed herein, in an apparatus as previously described, the overlap-add stage may be configured for applying an overlap-add-advance value being greater than a block-extraction-advance value for performing a time stretching of the subband signal.
According to another aspect of the teachings disclosed herein, in an apparatus as previously described, the time manipulator may comprise: a block extractor; a windower/phase adjustor; and a phase calculator for calculating a phase, based on which the windower/phase adjustor performs the phase adjustment of an extracted block.
According to another aspect of the teachings disclosed herein, in an apparatus as previously described, the transient detector may be configured to determine a length of a portion of the subband signal containing the transient, the length matching the length of the signal to be inserted by the transient adder.
According to another aspect of the teachings disclosed herein, in an apparatus as previously described, the transient adder may be configured to insert a portion of the subband signal having the transient, wherein the length of the portion may be selected sufficiently long, such that a cross-fade from the signal output from the overlap-add-processing to the portion having the transient or from the portion having the transient to the output from the overlap-add-processing is possible.
According to another aspect of the teachings disclosed herein, in an apparatus as previously described, the transient adder may be configured for performing the cross-fade operation.
According to another aspect of the teachings disclosed herein, in an apparatus as previously described, the transient detector may be configured for detecting blocks extracted by a block extractor from the subband signal having a transient characteristic, and the overlap-add stage may be configured for reducing an influence of the detected blocks or for not using the detected blocks when adding.
According to another aspect of the teachings disclosed herein, in an apparatus as previously described, the transient detector may be configured for performing a moving center of gravity calculation of an energy across a predetermined time period of a signal to be input into an analysis filterbank or a subband signal.
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
The inventive encoded audio signal can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are advantageously performed by any hardware apparatus.
The above described embodiments are merely illustrative of the principles of the present invention. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the appended patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.
While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.
This application is a continuation of copending International Application No. PCT/EP2011/053303, filed Mar. 4, 2011, which is incorporated herein by reference in its entirety, and additionally claims priority from U.S. Patent Application No. 61/312,131, filed Mar. 9, 2010, which is incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
61312131 | Mar 2010 | US

Relation | Number | Date | Country
---|---|---|---
Parent | PCT/EP2011/053303 | Mar 2011 | US
Child | 13604813 | | US