The present invention is related to audio processing and, in particular, to the decomposition of audio signals into a background component signal and a foreground component signal.
A significant number of references directed to audio signal processing exist, some of which are related to audio signal decomposition. Exemplary references are:
Furthermore, WO 2010017967 discloses an apparatus for determining a spatial output multichannel audio signal based on an input audio signal comprising a semantic decomposer for decomposing the input audio signal into a first decomposed signal being a foreground signal part and into a second decomposed signal being a background signal part. Furthermore, a renderer is configured for rendering the foreground signal part using amplitude panning and for rendering the background signal part by decorrelation. Finally, the first rendered signal and the second rendered signal are processed to obtain a spatial output multi-channel audio signal.
Furthermore, references [1] and [2] disclose a transient steering decorrelator.
European application 16156200.4, not yet published, discloses high resolution envelope processing. The high resolution envelope processing is a tool for improved coding of signals that predominantly consist of many dense transient events such as applause, raindrop sounds, etc. At the encoder side, the tool works as a preprocessor with high temporal resolution before the actual perceptual audio codec by analyzing the input signal, attenuating and, thus, temporally flattening the high frequency part of transient events, and generating a small amount of side information such as 1 to 4 kbps for stereo signals. At the decoder side, the tool works as a postprocessor after the audio codec by boosting and, thus, temporally shaping the high frequency part of transient events, making use of the side information that was generated during encoding.
Upmixing usually involves a signal decomposition into direct and ambient signal parts where the direct signal is panned between loudspeakers and the ambient part is decorrelated and distributed across the given number of channels. Remaining direct components, like transients, within the ambient signals lead to an impairment of the resulting perceived ambience in the upmixed sound scene. In [3] a transient detection and processing is proposed which reduces detected transients within the ambient signal. One method proposed for transient detection comprises a comparison between a frequency weighted sum of bins in one time block and a weighted long time running mean for deciding whether a certain block is to be suppressed or not.
In [4], efficient spatial audio coding of applause signals is addressed. The proposed downmix and upmix methods all work for a full applause signal.
Furthermore, reference [5] discloses a harmonic/percussive separation where signals are separated into harmonic and percussive signal components by applying median filters to the spectrogram in the horizontal and vertical directions.
Reference [6] is a tutorial covering frequency domain approaches and time domain approaches, such as an envelope follower or an energy follower, in the context of onset detection. Reference [7] discloses power tracking in the frequency domain, such as detecting a rapid increase of power, and reference [8] discloses a novelty measure for the purpose of onset detection.
The separation of a signal into a foreground and a background signal part as described in references of conventional technology is disadvantageous because such known procedures may result in a reduced audio quality of a result signal or of decomposed signals.
According to an embodiment, an apparatus for decomposing an audio signal into a background component signal and a foreground component signal may have: a block generator for generating a time sequence of blocks of audio signal values; an audio signal analyzer for determining a block characteristic of a current block of the audio signal and for determining an average characteristic for a group of blocks, the group of blocks including at least two blocks; and a separator for separating the current block into a background portion and a foreground portion in response to a ratio of the block characteristic of the current block and the average characteristic of the group of blocks, wherein the background component signal includes the background portion of the current block and the foreground component signal includes the foreground portion of the current block.
According to another embodiment, a method of decomposing an audio signal into a background component signal and a foreground component signal may have the steps of: generating a time sequence of blocks of audio signal values; determining a block characteristic of a current block of the audio signal and determining an average characteristic for a group of blocks, the group of blocks including at least two blocks; and separating the current block into a background portion and a foreground portion in response to a ratio of the block characteristic of the current block and the average characteristic of the group of blocks, wherein the background component signal includes the background portion of the current block and the foreground component signal includes the foreground portion of the current block.
According to another embodiment, a non-transitory digital storage medium may have a computer program stored thereon to perform the inventive method when said computer program is run by a computer.
In one aspect, an apparatus for decomposing an audio signal into a background component signal and a foreground component signal comprises a block generator for generating a time sequence of blocks of audio signal values, an audio signal analyzer connected to the block generator and a separator connected to the block generator and the audio signal analyzer. In accordance with a first aspect, the audio signal analyzer is configured for determining a block characteristic of a current block of the audio signal and an average characteristic for a group of blocks, the group of blocks comprising at least two blocks such as a preceding block, the current block and a following block or even more preceding blocks or more following blocks.
The separator is configured for separating the current block into a background portion and a foreground portion in response to a ratio of the block characteristic of the current block and the average characteristic. Thus, the background component signal comprises the background portion of the current block and the foreground component signal comprises the foreground portion of the current block. Therefore, the current block is not simply classified as being either background or foreground. Instead, the current block is actually separated into a non-zero background portion and a non-zero foreground portion. This procedure reflects the situation that, typically, a foreground signal never exists alone in a signal but is combined with a background signal component. Thus, the present invention, in accordance with this first aspect, reflects the situation that a background portion typically remains in addition to the foreground portion, irrespective of whether the separation is performed without any threshold or only when the ratio reaches a certain threshold.
Furthermore, the separation is done by a very specific separation measure, i.e., the ratio of a block characteristic of the current block and the average characteristic derived from at least two blocks, i.e., derived from the group of blocks. Thus, depending on the size of the group of blocks, a quite slowly changing moving average or a quite rapidly changing moving average can be set. For a high number of blocks in the group of blocks, the moving average is relatively slowly changing while, for a small number of blocks in the group of blocks, the moving average is quite rapidly changing. Furthermore, the usage of a relation between a characteristic from the current block and an average characteristic over the group of blocks reflects a perceptual situation, i.e., that individuals perceive a certain block as comprising a foreground component when a ratio between a characteristic of this block with respect to an average is at a certain value. In accordance with this aspect, however, this certain value does not necessarily have to be a threshold. Instead, the ratio itself can already be used for performing a quantitative separation of the current block into a background portion and a foreground portion. A high ratio results in a high portion of the current block being a foreground portion while a low ratio results in the situation that most or all of the current block remains in the background portion and the current block only has a small foreground portion or does not have any foreground portion at all.
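The ratio described above can be illustrated with a minimal sketch. It assumes that the block characteristic is the block energy (sum of squared values) and that the group of blocks is a symmetric window around the current block; the function names and the parameter `half_width` are hypothetical and chosen for illustration only.

```python
def block_energy(block):
    """Energy of one block: sum of squared sample values."""
    return sum(x * x for x in block)


def energy_ratios(blocks, half_width):
    """Ratio of each block's energy to the mean energy of its group.

    The group holds the current block plus up to `half_width` preceding
    and following blocks (fewer at the edges of the sequence).
    """
    energies = [block_energy(b) for b in blocks]
    ratios = []
    for n, energy in enumerate(energies):
        lo = max(0, n - half_width)
        hi = min(len(energies), n + half_width + 1)
        group = energies[lo:hi]
        average = sum(group) / len(group)
        # Guard against an all-zero group (assumed convention: ratio 0).
        ratios.append(energy / average if average > 0.0 else 0.0)
    return ratios


# Four quiet blocks and one loud block at index 2 (made-up sample values).
blocks = [[0.1, -0.1], [0.1, 0.1], [1.0, -1.0], [0.1, -0.1], [0.1, 0.1]]
ratios = energy_ratios(blocks, half_width=2)
```

A larger `half_width` makes the average change more slowly, so a single loud block stands out more clearly against it; a smaller value lets the average follow the signal more quickly.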
Advantageously, an amplitude-related characteristic is determined and this amplitude-related characteristic such as an energy of the current block is compared to an average energy of the group of blocks to obtain the ratio, based on which the separation is performed. In order to make sure that in response to a separation a background signal remains, a gain factor is determined and this gain factor then controls how much of the average energy of a certain block remains within the background or noise-like signal and which portion goes into the foreground signal portion that can, for example, be a transient signal such as a clap signal or a raindrop signal or the like.
In a further second aspect of the present invention that can be used in addition to the first aspect or separate from the first aspect, the apparatus for decomposing the audio signal comprises a block generator, an audio signal analyzer and a separator. The audio signal analyzer is configured for analyzing the characteristic of the current block of the audio signal. The characteristic of the current block of the audio signal can be the ratio as discussed with respect to the first aspect but, alternatively, can also be a block characteristic only derived from the current block without any averaging. Furthermore, the audio signal analyzer is configured for determining a variability of the characteristic within a group of blocks, where the group of blocks comprises at least two blocks and advantageously at least two preceding blocks, at least two following blocks, or both at least two preceding blocks and at least two following blocks, in each case with or without the current block. In advantageous embodiments, the number of blocks is greater than 30 or even 40.
Furthermore, the separator is configured for separating the current block into the background portion and the foreground portion, wherein this separator is configured to determine a separation threshold based on the variability determined by the signal analyzer and to separate the current block when the characteristic of the current block is in a predetermined relation to the separation threshold such as greater than or equal to the separation threshold. Naturally, when the threshold is defined to be a kind of inverse value then the predetermined relation can be a smaller than relation or a smaller than or equal relation. Thus, thresholding is typically performed in such a way that when the characteristic is within a predetermined relation to the separation threshold then the separation into the background portion and the foreground portion is performed while, when the characteristic is not within the predetermined relation to the separation threshold then a separation is not performed at all.
In accordance with the second aspect that uses the variable threshold depending on the variability of the characteristic within the group of blocks, the separation can be a full separation, i.e., the whole block of audio signal values is introduced into the foreground component when a separation is performed, or the whole block of audio signal values is treated as a background signal portion when the predetermined relation with respect to the variable separation threshold is not fulfilled. In an advantageous embodiment this aspect is combined with the first aspect in that, as soon as the characteristic is found to be in the predetermined relation to the variable threshold, a non-binary separation is performed, i.e., only a portion of the audio signal values is put into the foreground signal portion and a remaining portion is left in the background signal.
Advantageously, the separation into the foreground signal portion and the background signal portion is determined based on a gain factor. That is, the same signal values are, in the end, within the foreground signal portion and the background signal portion, but the energy of the signal values differs between the two portions. This energy is determined by a separation gain that depends on the characteristic, such as the block characteristic of the current block itself or the ratio, for the current block, between the block characteristic and an average characteristic for the group of blocks associated with the current block.
The usage of a variable threshold reflects the situation that individuals perceive even a small deviation from a quite stationary signal, i.e., a signal without significant fluctuations, as a foreground signal portion. However, when there is a strongly fluctuating signal, then it appears that the strongly fluctuating signal itself is perceived to be the background signal component and a small deviation from this pattern of fluctuations is not perceived to be a foreground signal portion. Only stronger deviations from the average or expected value are perceived to be a foreground signal portion. Thus, it is advantageous to use a quite small separation threshold for signals with a small variance and to use a higher separation threshold for signals with a high variance. However, when inverse values are considered, the situation is opposite to the above.
Both aspects, i.e., the first aspect having a non-binary separation into the foreground signal portion and the background signal portion based on the ratio between the block characteristic and the average characteristic and the second aspect comprising a variable threshold depending on the variability of the characteristic within the group of blocks, can be used separately from each other or can even be used together, i.e., in combination with each other. The latter alternative constitutes an advantageous embodiment as described later on.
Embodiments of the invention are related to a system where an input signal is decomposed into two signal components to which individual processing can be applied and where the processed signals are re-synthesized to form an output signal. Applause and also other transient signals can be seen as a superposition of distinctly and individually perceivable transient clap events and a more noise-like background signal. In order to modify characteristics such as the ratio of foreground and background signal density, etc., of such signals, it is advantageous to be able to apply an individual processing to each signal part. Additionally, a signal separation motivated by human perception is obtained. Furthermore, the concept can also be used as a measurement device to measure signal characteristics such as on a sender site and restore those characteristics on a receiver site.
Embodiments of the present invention do not exclusively aim at generating a multi-channel spatial output signal. A mono input signal is decomposed and individual signal parts are processed and re-synthesized to a mono output signal. In some embodiments the concept, as defined in the first or the second aspect, outputs measurements or side information instead of an audible signal.
Additionally, the separation is based on a perceptual aspect and, advantageously, on a quantitative characteristic or value rather than on a semantic aspect.
In accordance with embodiments, the separation is based on a deviation of an instantaneous energy with respect to an average energy within a considered short time frame. While a transient event with an energy level close to or below the average energy in such a time frame is not perceived as substantially different from the background, events with a high energy deviation can be distinguished from the background signal. This kind of signal separation adopts this principle and allows for processing closer to the human perception of transient events and closer to the human perception of foreground events over background events.
Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
Furthermore, the apparatus comprises a separator 130 for separating the current block into a background portion and a foreground portion in response to a ratio of the block characteristic of the current block and the average characteristic. Thus, the ratio of the block characteristic of the current block and the average characteristic is used as a characteristic, based on which the separation of the current block of audio signal values is performed. Particularly, the background component signal at signal output 140 comprises the background portion of the current block, and the foreground component signal output at the foreground component signal output 150 comprises the foreground portion of the current block. The procedure illustrated in
Advantageously, the audio signal analyzer 120 is configured for analyzing an amplitude-related measure as the block characteristic of the current block and for additionally analyzing the amplitude-related characteristic for the group of blocks.
Advantageously, a power measure or an energy measure for the current block and an average power measure or an average energy measure for the group of blocks is determined by the audio signal analyzer, and a ratio between those two values for the current block is used by the separator 130 to perform the separation.
In step 202, a separation gain is calculated from the ratio or the characteristic. Then, a threshold comparison can optionally be performed in step 204. When a threshold comparison is performed in step 204, the result can be that the characteristic is in a predetermined relation to the threshold. When this is the case, the control proceeds to step 206. When, however, it is determined in step 204 that the characteristic is not in the predetermined relation to the threshold, then no separation is performed and the control proceeds to the next block in the sequence of blocks.
In accordance with the first aspect, a threshold comparison in step 204 can be performed or can, alternatively, not be performed as illustrated by the broken line 208. When it is determined in block 204 that the characteristic is in a predetermined relation to the separation threshold or, in the alternative of line 208, in any case, step 206 is performed, where the audio signals are weighted using a separation gain. To this end, step 206 receives the audio signal values of an input audio signal in a time representation or, advantageously, a spectral representation as illustrated by line 210. Then, depending on the application of the separation gain, the foreground component C is calculated as illustrated by the equation directly below
Subsequently,
The characteristic of the current block and the variability of the characteristic are both forwarded to the separator 130 via a connection line 129. The separator is then configured for separating the current block into a background portion and the foreground portion to generate the background component signal 140 and the foreground component signal 150. Particularly, the separator is configured, in accordance with the second aspect, to determine a separation threshold based on the variability determined by the audio signal analyzer and to separate the current block into the background component signal portion and the foreground component signal portion when the characteristic of the current block is in a predetermined relation to the separation threshold. When, however, the characteristic of the current block is not in the predetermined relation to the (variable) separation threshold, then no separation of the current block is performed and the whole current block is forwarded to or used or assigned as the background component signal 140.
Specifically, the separator 130 is configured to determine a first separation threshold for a first variability and a second separation threshold for a second variability, wherein the first separation threshold is lower than the second separation threshold and the first variability is lower than the second variability, and wherein the predetermined relation is “greater than”.
An example is illustrated in
Depending on certain implementations, the separator 130 is configured to determine the (variable) separation threshold either using a table access, where the functions illustrated in
As illustrated in
Particularly, a separation stage 600 that is illustrated in detail in
Advantageously, based on signal separation/decomposition of the input signal a(t) into distinctly perceivable claps c(t) and more noise-like background signals n(t) an individual processing of the decomposed signal parts is realized. After processing, the modified foreground and background signals c′(t) and n′(t) are re-synthesized resulting in the output signal a′(t).
Particularly, the system in
The applause input signal a(t), i.e., the input signal comprising background components and applause components, is fed into a signal switch (not shown in
The signal separator 130 in
Furthermore, when the adaptive thresholding operation in accordance with the second aspect is performed, then the audio signal analyzer additionally performs an envelope variability estimation as illustrated in block 174, and the variability measure v(n) is forwarded to the separator, and particularly, to the adaptive thresholding processing block 182 to finally obtain the gain gs(n) as will be described later on.
A flow chart of the internals of the foreground signal detector is depicted in
where w(n) denotes a weighting window applied to the instantaneous energy estimates with window length Lw=2M+1. As an indication as to whether a distinct clap is active within the input signal, the energy ratio Ψ(n) of instantaneous and average energy is used according to:
In the simpler case without adaptive thresholding, for time instances where the energy ratio exceeds the attack threshold τattack, the separation gain which extracts the distinct clap part from the input signal is set to 1; consequently, the noise-like signal is zero at these time instances. A block diagram of a system with hard signal switching is depicted in
In a further embodiment, the above equation is replaced by the following equation:
Note: if τattack=0, the amount of signal routed to the distinct clap signal only depends on the energy ratio Ψ(n) and the fixed gain gN, yielding a signal dependent soft decision. In a well-tuned system, the time period in which the energy ratio exceeds the attack threshold captures only the actual transient event. In some cases, it might be desirable to extract a longer period of time frames after an attack occurred. This can be done, for instance, by introducing a release threshold τrelease indicating the level to which the energy ratio Ψ has to decrease after an attack before the separation gain is set back to zero:
In a further embodiment, the immediately preceding equation is replaced by the following equation:
An alternative but more static method is to simply route a certain number of frames after a detected attack to the distinct clap signal.
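The attack/release behavior described above can be sketched as a small hysteresis loop. The equations of the source are not reproduced in this text, so the gain form `sqrt(1 - g_n / ratio)` used below is an assumption chosen only to illustrate a soft decision that depends on the energy ratio and a fixed gain; the function name and all numeric values are hypothetical.

```python
import math


def separation_gains(ratios, tau_attack, tau_release, g_n):
    """Per-block separation gain with attack/release hysteresis.

    A block becomes active when its ratio reaches `tau_attack` and stays
    active until the ratio falls below `tau_release`. The gain formula
    sqrt(1 - g_n / ratio) is an illustrative assumption, not the source's.
    """
    gains = []
    active = False
    for psi in ratios:
        if psi >= tau_attack:
            active = True          # attack detected
        elif psi < tau_release:
            active = False         # ratio decayed far enough: release
        if active and psi > g_n:
            gains.append(math.sqrt(1.0 - g_n / psi))
        else:
            gains.append(0.0)
    return gains


# Made-up energy ratios: one attack at index 2 that decays afterwards.
ratios = [0.8, 1.0, 4.0, 2.5, 1.2, 0.9]
gains = separation_gains(ratios, tau_attack=3.0, tau_release=2.0, g_n=1.0)
```

In this sketch the block at index 3 stays in the foreground path although its ratio is below the attack threshold, because the ratio has not yet dropped below the release threshold.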
In order to increase flexibility of the thresholding, thresholds could be chosen in a signal adaptive manner resulting in τattack(n) and τrelease(n), respectively. The thresholds are controlled by an estimate of the variability of the envelope of the applause input signal, where a high variability indicates the presence of distinctive and individually perceivable claps and a rather low variability indicates a more noise-like and stationary signal. Variability estimation could be done in time domain as well as in frequency domain. The advantageous method in this case is to do the estimation in frequency domain:
$$v'(n) = \operatorname{var}\big([\Phi_A(n-M),\ \Phi_A(n-M+1),\ \ldots,\ \Phi_A(n+m)]\big), \quad m = -M \ldots M$$
where var(⋅) denotes the variance computation. To yield a more stable signal, the estimated variability is smoothed by low pass filtering yielding the final envelope variability estimate
$$v(n) = h_{TP}(n) * v'(n)$$
where * denotes a convolution. The mapping of envelope variability to corresponding threshold values can be done by mapping functions ƒattack(x) and ƒrelease(x) such that
$$\tau_{attack}(n) = f_{attack}(v(n))$$
$$\tau_{release}(n) = f_{release}(v(n))$$
In one embodiment, the mapping function could be realized as clipped linear functions, which corresponds to a linear interpolation of the thresholds. The configuration for this scenario is depicted in
The separated signals are obtained by
$$C(k,n) = g_s(n) \cdot A(k,n)$$
$$N(k,n) = A(k,n) - C(k,n)$$
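As a minimal sketch of these two relations, the following applies one separation gain to all spectral values of a block and forms the background part as the remainder. The function name and the example spectrum are hypothetical; only the two relations themselves are taken from the text.

```python
def separate_block(spectrum, gain):
    """Split one spectral block A(k, n) into C(k, n) and N(k, n).

    C(k, n) = gain * A(k, n) and N(k, n) = A(k, n) - C(k, n), so the two
    parts always add up to the input block.
    """
    foreground = [gain * a for a in spectrum]
    background = [a - c for a, c in zip(spectrum, foreground)]
    return foreground, background


# Hypothetical complex spectral values for one block and a gain of 0.5.
spectrum = [1.0 + 2.0j, -0.5 + 0.0j, 0.0 - 4.0j]
fg, bg = separate_block(spectrum, gain=0.5)
```

Because the background is computed as the difference, any gain below 1 leaves a non-zero background part wherever the input block is non-zero.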
Furthermore,
Furthermore, as illustrated with respect to equations (7) to (9) in
Furthermore,
Particularly,
Alternatively, as illustrated in the right portion of
The separated applause signal parts can be fed into measurement stages where certain (perceptually motivated) characteristics of transient signals can be measured. An exemplary configuration for such a use case is depicted in
Estimating the foreground density ΘFGD(n) can be done by counting the event rate per second, i.e. the number of detected claps per second. The foreground prominence ΘFFG(n) is given by the energy ratio of estimated foreground clap signal C(n) and A(n):
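A minimal sketch of both measurements follows. It assumes that an event is counted whenever the separation gain changes from zero to a non-zero value, and that the prominence is the foreground energy divided by the input energy over the analyzed values; both conventions, the function names and the example numbers are assumptions for illustration.

```python
def foreground_density(gains, blocks_per_second):
    """Detected events per second; an event starts where the gain turns non-zero."""
    onsets = 0
    previous = 0.0
    for g in gains:
        if g > 0.0 and previous == 0.0:
            onsets += 1
        previous = g
    duration = len(gains) / blocks_per_second
    return onsets / duration if duration > 0 else 0.0


def foreground_prominence(foreground, original):
    """Energy of the foreground signal relative to the energy of the input."""
    e_fg = sum(abs(c) ** 2 for c in foreground)
    e_in = sum(abs(a) ** 2 for a in original)
    return e_fg / e_in if e_in > 0 else 0.0


# Made-up per-block gains covering two seconds at four blocks per second.
example_gains = [0.0, 0.8, 0.7, 0.0, 0.0, 0.9, 0.0, 0.0]
density = foreground_density(example_gains, blocks_per_second=4)

# Made-up signal values: the foreground is half the input amplitude.
prominence = foreground_prominence([0.5, 1.0, 1.0], [1.0, 2.0, 2.0])
```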
A block diagram of the restoration of the measured signal characteristics is depicted in
While in the previous embodiment the signal characteristic was only measured, the system can also be used to modify signal characteristics. In one embodiment, the foreground processing could output a reduced number of the detected foreground claps, resulting in a density modification towards lower density of the resulting output signal. In another embodiment, the foreground processing could output an increased number of foreground claps, e.g., by adding a delayed version of the foreground clap signal to itself, resulting in a density modification towards increased density. Furthermore, by applying weights in the respective processing stages, the balance of foreground claps and noise-like background could be modified. Additionally, any processing like filtering, adding reverb, delay, etc. in both paths can be used to modify the characteristics of an applause signal.
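Two of the modifications mentioned above, increasing the density by adding a delayed copy of the foreground signal and re-balancing foreground against background by weights, can be sketched as follows. The function names, the delay expressed in samples and the weights are hypothetical choices for illustration.

```python
def increase_density(foreground, delay, weight=1.0):
    """Add a delayed, weighted copy of the foreground signal to itself."""
    if delay <= 0:
        raise ValueError("delay must be a positive number of samples")
    out = list(foreground)
    for i in range(delay, len(foreground)):
        out[i] += weight * foreground[i - delay]
    return out


def resynthesize(foreground, background, fg_weight=1.0, bg_weight=1.0):
    """Weighted sum of the processed foreground and background signals."""
    return [fg_weight * c + bg_weight * n
            for c, n in zip(foreground, background)]


# A single made-up clap at index 0 gains a second, weaker clap at index 2.
denser = increase_density([1.0, 0.0, 0.0, 0.0, 0.0], delay=2, weight=0.5)

# Attenuate the foreground and emphasize the background (made-up values).
mixed = resynthesize([1.0, 2.0], [3.0, 4.0], fg_weight=0.5, bg_weight=2.0)
```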
Subsequently, further advantageous embodiments are discussed with respect to
In the
The exemplarily illustrated overlapping blocks consist, for example, of a current block 304 that overlaps within the overlap range with a preceding block 303 or a following block 305. Thus, when a group of blocks comprises at least two preceding blocks then this group of blocks would consist of the preceding block 303 with respect to the current block 304 and the further preceding block indicated with order number 3 in
These blocks are, for example, formed by the block generator 110 that advantageously also performs a time-spectral conversion such as the DFT mentioned earlier or an FFT (Fast Fourier transform).
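A block generator of this kind can be sketched as below. The sketch assumes a fixed hop size between block starts (a hop of half the block length gives 50% overlap) and uses a direct, unoptimized DFT in place of an FFT; windowing is omitted, and all names are hypothetical.

```python
import cmath


def overlapping_blocks(samples, block_length, hop):
    """Cut `samples` into blocks of `block_length` that start every `hop` samples."""
    if block_length <= 0 or hop <= 0:
        raise ValueError("block_length and hop must be positive")
    blocks = []
    start = 0
    while start + block_length <= len(samples):
        blocks.append(samples[start:start + block_length])
        start += hop
    return blocks


def dft(block):
    """Direct discrete Fourier transform of one block (O(n^2), for illustration)."""
    n = len(block)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(block))
        for k in range(n)
    ]


# Eight made-up samples, blocks of four samples with 50% overlap.
time_blocks = overlapping_blocks(list(range(8)), block_length=4, hop=2)
spectral_blocks = [dft(b) for b in time_blocks]
```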
The result of the time-spectral conversion is a sequence of spectral blocks I to VIII, where each spectral block illustrated in
Advantageously, a separation is then performed in the frequency domain, i.e., using the spectral representation where the audio signal values are spectral values. Subsequent to the separation, a foreground spectral representation, once again consisting of blocks I to VIII, and a background representation consisting of I to VIII, are obtained. Naturally, and depending on the thresholding operation, it is not necessarily the case that each block of the foreground representation subsequent to the separation 130 has values different from zero. However, advantageously, it is made sure by at least the first aspect of the present invention that each block in the spectral representation of the background component has values different from zero in order to avoid a drop out of energy in the background signal component.
For each component, i.e., the foreground component and the background component, a spectral-time conversion is performed as has been discussed in the context of
Advantageously, as illustrated in
In particular, step 400 illustrates the determination of a general characteristic or a ratio between a block characteristic and an average characteristic for a current block.
In block 402, a raw variability is calculated with respect to the current block. In block 404, raw variabilities for preceding or following blocks are calculated to obtain, by the output of block 402 and 404, a sequence of raw variabilities. In block 406, the sequence is smoothed. Thus, at the output of block 406 a smoothed sequence of variabilities exists. The variabilities of the smoothed sequence are mapped to corresponding adaptive thresholds as illustrated in block 408 so that one obtains the variable threshold for the current block.
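The chain of blocks 402 to 408 can be sketched as follows. The sketch assumes the variance as raw variability, a first-order recursive low-pass as smoother and a clipped linear mapping from variability to threshold; the smoother, the function names and all numeric values are assumptions made for illustration.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def raw_variabilities(energies, half_width):
    """Variance of the energies in a window around each block."""
    out = []
    for n in range(len(energies)):
        lo = max(0, n - half_width)
        hi = min(len(energies), n + half_width + 1)
        out.append(variance(energies[lo:hi]))
    return out


def smooth(values, alpha):
    """First-order recursive low-pass filter (an assumed smoother)."""
    out = []
    state = values[0] if values else 0.0
    for v in values:
        state = alpha * v + (1.0 - alpha) * state
        out.append(state)
    return out


def clipped_linear(x, x_lo, x_hi, y_lo, y_hi):
    """Linear interpolation between (x_lo, y_lo) and (x_hi, y_hi), clipped outside."""
    if x <= x_lo:
        return y_lo
    if x >= x_hi:
        return y_hi
    return y_lo + (y_hi - y_lo) * (x - x_lo) / (x_hi - x_lo)


# Made-up block energies with one strong event at index 4.
energies = [1.0, 1.0, 1.0, 1.0, 9.0, 1.0, 1.0]
smoothed = smooth(raw_variabilities(energies, half_width=1), alpha=0.5)
thresholds = [clipped_linear(v, 0.0, 10.0, 2.0, 4.0) for v in smoothed]
```

The mapping is chosen so that a low variability yields the lower threshold and a high variability the higher threshold; swapping the order of `smooth` and the mapping gives a variant in which raw thresholds are smoothed instead.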
An alternative embodiment is illustrated in
In block 403, a sequence of variabilities is calculated using, for example, equation 6 of
In block 405, the sequence of variabilities is mapped to a sequence of raw thresholds in accordance with equation 8 and equation 9 but with non-smoothed variabilities in contrast to equation 7 of
In block 407, the sequence of raw thresholds is smoothed in order to finally obtain the (smoothed) threshold for the current block.
Subsequently,
Once again, in step 500, a characteristic or ratio between a current block characteristic and an average block characteristic is calculated.
In step 502, an average or, generally, an expectation over the characteristics/ratios for the group of blocks is calculated.
In block 504, differences between characteristics/ratios and the average value/expectation value are calculated and, as illustrated in block 506, the addition of the differences or of certain values derived from the differences is performed, advantageously with a normalization. When the squared differences are added, then the sequence of steps 502, 504, 506 reflects the calculation of a variance as has been outlined with respect to equation 6. However, when, for example, magnitudes of differences or powers of differences other than two are added together, then a different statistical value derived from the differences between the characteristics and the average/expectation value is used as the variability.
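Steps 502 to 506 can be sketched with a single function in which the power applied to the differences is a parameter. Normalizing by the number of values is an assumed choice, and the function name and the example values are hypothetical.

```python
def variability(values, power=2):
    """Mean of |value - mean| ** power over a group of characteristics.

    power=2 gives the (population) variance, power=1 the mean absolute
    deviation; dividing by the group size is the assumed normalization.
    """
    if not values:
        raise ValueError("values must not be empty")
    mean = sum(values) / len(values)                        # step 502
    deviations = [abs(v - mean) ** power for v in values]   # step 504
    return sum(deviations) / len(values)                    # step 506


# Made-up characteristics/ratios for a group of six blocks.
group = [1.0, 1.0, 4.0, 1.0, 1.0, 4.0]
var_value = variability(group, power=2)
mad_value = variability(group, power=1)
```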
Alternatively, however, as illustrated in step 508, also differences between time-following characteristics/ratios for adjacent blocks are calculated and used as the variability measure. Thus, block 508 determines a variability that does not rely on an average value but that relies on a change from one block to the other, wherein, as illustrated in
Subsequently, examples of embodiments are defined that can be used separately from the below examples or in combination with any of the below examples:
Subsequently, further examples are described that can be used separately from the above examples or in combination with any of the above examples.
An inventively encoded audio signal can be stored on a digital storage medium or a non-transitory storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier or a non-transitory storage medium.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are advantageously performed by any hardware apparatus.
While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
16199402 | Nov 2016 | EP | regional |
This application is a continuation of copending International Application No. PCT/EP2017/079516, filed Nov. 16, 2017, which is incorporated herein by reference in its entirety, and additionally claims priority from European Application No. EP 16 199 402.5, filed Nov. 17, 2016, which is incorporated herein by reference in its entirety.
Number | Date | Country |
---|---|---|
20190272835 A1 | Sep 2019 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/EP2017/079516 | Nov 2017 | US |
Child | 16415392 | | US |