The present invention relates to a transient detector operating on an audio signal, and a method for supporting encoding of an audio signal.
An encoder is a device, circuitry or computer program that is capable of analyzing a signal such as an audio signal and outputting a signal in an encoded form. The resulting signal is often used for transmission, storage and/or encryption purposes. On the other hand a decoder is a device, circuitry or computer program that is capable of inverting the encoder operation, in that it receives the encoded signal and outputs a decoded signal.
In most state-of-the-art encoders, such as audio encoders, each frame of the input signal is analyzed in the frequency domain. The result of this analysis is quantized and encoded and then transmitted or stored depending on the application. At the receiving side (or when using the stored encoded signal), a corresponding decoding procedure followed by a synthesis procedure makes it possible to restore the signal in the time domain.
Codecs are often employed for compression/decompression of information such as audio and video data for efficient transmission over bandwidth-limited communication channels.
In particular, there is a high market need to transmit and store audio signals at low bit rates while maintaining high audio quality. For example, in cases where transmission resources or storage are limited, low bit rate operation is an essential cost factor. This is typically the case, for example, in streaming and messaging applications in mobile communication systems.
A general example of an audio transmission system using audio encoding and decoding is schematically illustrated in
An audio signal can be considered quasi-stationary, i.e. stationary for short time periods. For example, a transform-based audio codec divides the signal into short time periods, frames, and relies on the quasi-stationarity to achieve efficient compression.
The audio signal may contain a number of rapid changes in frequency spectrum or amplitude, so-called transients. It is desirable to detect these transients so that the audio codec can take proper actions to avoid the audible artifacts that transients may cause in, for example, transform-based audio codecs (such as the pre-echo effect, i.e. quantization noise spread in time).
For this reason a transient detector is used in connection with the audio codec. The transient detector analyzes the audio signal and is responsible for signaling detected transients to the encoder. There are transient detectors operating in the time-domain as well as transient detectors operating in the frequency-domain.
For example, a transient detector is commonly included in audio codecs to provide the input to the window switching module [1, 2].
However, there is a general demand for more efficient audio encoding and improved mechanisms and realizations for supporting audio encoding including transient detectors.
It is a general object of the present invention to provide an improved transient detector operating on an audio signal.
It is also an object to provide a method for supporting encoding of an audio signal.
These and other objects are met by the invention as defined by the accompanying patent claims.
The inventors have recognized that when transient detection is performed in the time domain and the codec operates based on a lapped transform, a transient in a given frame will also affect the encoding of a following frame. A basic idea of the invention is therefore to provide a transient detector which analyzes a given frame n of the input audio signal to determine, based on audio signal characteristics of the given frame n, a transient hangover indicator for a following frame n+1, and signals the determined transient hangover indicator to an associated audio encoder to enable proper encoding of the following frame n+1.
Preferably, when the audio signal characteristics of frame n include characteristics representative of a transient, the transient detector determines a transient hangover indicator indicating a transient for the following frame n+1.
In practice, it is thus possible to configure the transient detector in such a way that if a transient is detected and signaled to the codec for a current frame, the transient detector will also signal a transient hangover that is relevant for the following frame.
In this way it can be ensured that proper encoding actions are taken, when the codec operates based on a lapped transform, also for the following frame.
The invention covers both a transient detector and a method for supporting encoding of an audio signal.
Other advantages offered by the invention will be appreciated when reading the below description of embodiments of the invention.
The invention, together with further objects and advantages thereof, will be best understood by reference to the following description taken together with the accompanying drawings, in which:
Throughout the drawings, the same reference characters will be used for corresponding or similar elements.
As previously mentioned, it is desirable to detect transients in an audio signal such that the audio codec can take proper actions to avoid the audible artifacts that transients may cause in for example transform-based audio codecs (e.g. the pre-echo effect) and more generally audio encoders operating based on a lapped transform. Pre-echoes generally occur when a signal with a sharp attack begins near the end of a transform block immediately following a region of low energy. In general, a transient is characterized by a sudden change in audio signal characteristics such as amplitude and/or power measured in the time and/or frequency domain. Preferably, the audio encoder is configured to perform transform-based encoding especially adapted for transients (transient encoding mode) when a transient is detected for an input frame. There are a number of different conventional strategies for encoding transients.
However, the inventors have recognized that when transient detection is performed in the time domain and the codec operates based on a lapped transform, a transient in a given frame will also affect the encoding of a following frame. Based on this insight into the operation of a lapped transform codec, a novel transient detector is introduced.
The analyzer 110 performs suitable signal analysis based on the received audio signal. Preferably, the transient detector 100 analyzes a given frame n to determine, based on audio signal characteristics of the given frame n, a transient hangover indicator for a following frame n+1 in a novel hangover indicator module 112 of the analyzer 110. The signaling module 120 is operable for signaling the determined transient hangover indicator to the associated audio encoder 10 to enable proper encoding of the following frame n+1. Any suitable transient detection measure may be used, such as a short-to-long-term energy ratio.
It is thus possible for the transient detector 100 to signal not only a transient for the current frame n, but also a transient hangover indicator for a following frame n+1 based on an analysis of the current frame n.
As illustrated in
For example, transform-based audio encoders are normally built around a time-to-frequency domain transform such as a DCT (Discrete Cosine Transform), a Modified Discrete Cosine Transform (MDCT) or a lapped transform other than the MDCT. A common characteristic of transform-based audio encoders is that they operate on overlapped blocks of samples: overlapped frames.
In
In
It should be noted that the input to the transform for encoding frame n and the input to the transform for encoding frame n+1 are overlapping. Hence, the reason for referring to these larger transform input blocks as overlapped frames.
If transient detection is performed in time domain and the codec operates with lapped transforms, such as the Modified Discrete Cosine Transform (MDCT), a transient in the input frame will also appear in the following frame.
Since the transient is encoded not only in the frame where it is detected, but also in the following frame, it is suggested to introduce a hangover in the transient detector. The hangover implies that if a transient is detected and signalled to the codec for the current frame, then the transient detector shall also signal to the codec that a transient is detected in the following frame.
In this way it can be ensured that proper encoding actions are taken also for the following frame. When a hangover indicator indicating a transient is signaled from the signaling module 120 of the transient detector 100 to the audio encoder 10, the encoder 10 performs so-called transient encoding of frame n+1; i.e. using a so-called transient encoding mode adapted for encoding of an overlapped frame block that includes a transient.
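The basic hangover rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `signal_with_hangover` and the representation of the per-frame detector output as a sequence of booleans are hypothetical and do not appear in the original text.

```python
from typing import Iterable, List, Tuple


def signal_with_hangover(detected: Iterable[bool]) -> List[Tuple[bool, bool]]:
    """Apply the basic hangover rule to raw per-frame detections.

    For each frame n the result holds (is_transient, hangover):
      - hangover repeats the raw detection of frame n and is meant
        for the encoding of frame n+1;
      - is_transient is True if frame n itself contains a detected
        transient or if frame n-1 set a hangover.
    """
    out: List[Tuple[bool, bool]] = []
    pending = False  # hangover carried over from the previous frame
    for d in detected:
        is_transient = d or pending
        hangover = d
        out.append((is_transient, hangover))
        pending = hangover
    return out


if __name__ == "__main__":
    # A transient detected only in the second frame also marks the third.
    for n, flags in enumerate(signal_with_hangover([False, True, False, False])):
        print(n, flags)
```

With this rule the encoder would use the transient encoding mode for frame n+1 even when the analysis of frame n+1 itself finds no transient.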
Proper encoding actions in so-called transient encoding mode could for instance be to decrease the length of the transform to improve the time resolution at the cost of a worse frequency resolution. This may for example be effectuated by performing time-domain aliasing (TDA) based on an overlapped frame to generate a corresponding time-domain aliased frame, and perform segmentation in time based on the time-domain aliased frame to generate at least two segments, also referred to as sub-frames. Based on these segments, transform-based spectral analysis may then be performed to obtain, for each segment, coefficients representative of the frequency content of the segment.
It should be understood that even if no transient is detected by the transient detector 100 based on the audio signal characteristics of input frame n+1 (see
With reference to the exemplary schematic flow diagram of
In step S1, an audio signal is received. In step S2, a given frame n is analyzed to determine, based on audio signal characteristics of the given frame n, a transient hangover indicator for a following frame n+1. In step S3, the transient hangover indicator is signaled to an associated audio encoder to enable appropriate encoding actions with respect to the following frame n+1 of the audio signal.
As indicated above, the value of the transient hangover indicator is preferably determined in dependence on the existence of audio signal characteristics representative of a transient within the given input frame n that is being analyzed. The value of the hangover indicator may be expressed in many different ways, including True/False, 1/0, +1/−1 and a number of other equivalent representations.
For a better understanding of the invention, more detailed examples of signal analysis and detection mechanisms will now be described.
Block-Wise Energy Calculation
As an example, a transient detector may be based on the fluctuations in power in the audio signal. For instance the audio frame to be encoded can be divided in several blocks, as illustrated in
A long term power, PLT(i), can be calculated by a simple IIR filter, PLT(i) = α PLT(i−1) + (1−α) PST(i), where PST(i) is the short term power of block i and α is a forgetting factor.
When the quotient PST(i)/PLT(i) exceeds a certain threshold, the transient detector signals that a transient is found in block i.
Expressed in terms of energy; for each block, a comparison between the short term energy E(i) and the long term energy ELT(i) is performed. A transient can be considered as detected whenever the energy ratio is above a certain threshold:
E(i)≥RATIO×ELT(i), where RATIO is an energy ratio threshold that may be set to some suitable value such as for example 7.8 dB.
This is merely an example of a detection measure, and the invention is not limited thereto.
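A minimal sketch of the block-wise measure described above follows. It rests on several assumptions that are not stated in the text: the short term power of a block is taken as the mean of the squared samples, the 7.8 dB threshold is interpreted as 10·log10 of a power ratio, and the forgetting factor value 0.95 is purely illustrative. The function name `detect_transient_blocks` is hypothetical.

```python
from typing import List, Sequence, Tuple

RATIO_DB = 7.8                      # example threshold value from the text
RATIO = 10.0 ** (RATIO_DB / 10.0)   # assumed: dB of a power/energy ratio
ALPHA = 0.95                        # hypothetical forgetting factor


def detect_transient_blocks(
    frame: Sequence[float],
    num_blocks: int,
    p_lt: float = 0.0,
    alpha: float = ALPHA,
    ratio: float = RATIO,
) -> Tuple[List[bool], float]:
    """Return per-block transient flags and the updated long term power."""
    block_len = len(frame) // num_blocks
    flags: List[bool] = []
    for i in range(num_blocks):
        block = frame[i * block_len:(i + 1) * block_len]
        # Short term power of block i (assumed: mean square).
        p_st = sum(x * x for x in block) / block_len
        # Long term power via the first-order IIR filter from the text.
        p_lt = alpha * p_lt + (1.0 - alpha) * p_st
        # Flag the block when the short/long term quotient reaches the threshold.
        flags.append(p_lt > 0.0 and p_st >= ratio * p_lt)
    return flags, p_lt


if __name__ == "__main__":
    quiet_then_loud = [0.01] * 30 + [1.0] * 10
    print(detect_transient_blocks(quiet_then_loud, num_blocks=4, p_lt=1e-4))
```

Because the long term power in this recursion already includes the current block, the quotient can never exceed 1/(1−α). The forgetting factor therefore has to be chosen large enough for the threshold to be reachable, which is why 0.95 is used in the sketch.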
High-Pass Filter and Zero-Crossings
Since the blocks of the audio frame are short, there is a risk that the transient detector above triggers on stationary signals, where the fluctuations of a low frequency sine function appear to be rapid power changes.
This problem can be avoided by adding a high-pass filter prior to power calculation, as illustrated in the example of
Another possible solution to the problem above could be to calculate the number of zero-crossings in the analyzed block. If the number of zero crossings is low, it is assumed that the signal only contains low frequencies and the transient detector could decide to increase the threshold value or to consider the block as free of transients.
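The zero-crossing idea can be illustrated as follows. This is a sketch only: the helper names, the minimum number of crossings and the factor by which the threshold is raised are hypothetical values chosen for the example, not values from the text.

```python
from typing import Sequence


def count_zero_crossings(block: Sequence[float]) -> int:
    """Count sign changes between consecutive samples (zero counts as positive)."""
    return sum(
        1 for a, b in zip(block, block[1:]) if (a >= 0.0) != (b >= 0.0)
    )


def adjusted_threshold(
    block: Sequence[float],
    base_ratio: float,
    min_crossings: int = 2,   # hypothetical limit for "low" crossing count
    penalty: float = 4.0,     # hypothetical factor for raising the threshold
) -> float:
    """Raise the detection threshold for blocks that look low-frequency."""
    if count_zero_crossings(block) < min_crossings:
        return base_ratio * penalty
    return base_ratio


if __name__ == "__main__":
    print(count_zero_crossings([1.0, -1.0, 1.0, -1.0]))   # alternating signs
    print(adjusted_threshold([1.0, 2.0, 3.0, 4.0], 6.0))  # no sign change
```

Returning an infinite threshold instead of a raised one would correspond to the other option mentioned above, namely treating the block as free of transients.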
Transient/Hangover Detection Dependent on Window-Function and/or Location
Optionally, the signal analyzer of the transient detector may be configured to determine the value of the transient hangover indicator not only in dependence on the existence of a transient but also in dependence on a predetermined window function and/or the location of the transient within the frame being analyzed.
Before transformation in the audio encoder, the audio signal is normally multiplied by a window function. In the case of codecs based on the Modified Discrete Cosine Transform (MDCT), the window function is often the so called sine window, but it could also be a Kaiser-Bessel window or some other window function.
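For reference, the commonly used MDCT sine window over two frames of N samples each is w(k) = sin(π(k + 1/2)/(2N)) for k = 0, …, 2N−1. The short sketch below evaluates it; the function name `sine_window` is hypothetical, and the assumption that the first half of the window covers the preceding frame and the second half the current frame is made here for illustration.

```python
import math
from typing import List


def sine_window(frame_len: int) -> List[float]:
    """Sine window spanning two frames (2 * frame_len samples)."""
    total = 2 * frame_len
    return [math.sin(math.pi * (k + 0.5) / total) for k in range(total)]


if __name__ == "__main__":
    n = 4
    w = sine_window(n)
    # Samples 0..n-1 are assumed to cover the preceding frame,
    # samples n..2n-1 the current frame.
    print([round(v, 3) for v in w])
```

The window is symmetric, peaks at the boundary between the two frames and decays towards both outer edges. It also satisfies w(k)² + w(k+N)² = 1, which is the condition a lapped transform such as the MDCT needs for perfect reconstruction.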
The window functions generally have a maximum value at the beginning of the current frame and the end of the preceding frame, while they are close to zero at the end of the current frame and the beginning of the preceding frame.
This means that a transient near the end of the current frame will be suppressed by the window function and is therefore less important to signal to the encoder. If the transient is suppressed enough it may even be beneficial to not signal to the encoder that a transient is detected.
However, when the next frame is to be encoded the transient will be in the end of the preceding frame, i.e. located near the maximum of the window function and it is essential that the encoder is signaled that a transient is detected.
A detected transient near the end of a frame should therefore result in a Hangover set to 1 (or equivalent representation) while no detected transient is signaled to the encoder. This way the transient detector signals that a transient is detected in the following frame.
Similarly, if a transient is detected in the beginning of a frame, the transient detector should signal that a transient is detected, but set the Hangover to 0 (or equivalent representation) since the transient will be suppressed by the window function when the next frame is encoded.
A transient located in the center of the frame will appear in both the current frame and the following frame. “Transient detected” should therefore be signaled and Hangover set to 1.
The exact borders between “Beginning of Frame”, “Center of Frame” and “End of Frame” are preferably chosen with respect to the window function.
It should also be understood that the 1/0 representation of Table 1 is merely used as an example. In fact, any suitable representation including True/False and +1/−1 may be used for indicating hangover/not hangover. It is even possible to use non-binary representations such as probability indications.
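The location-dependent decisions described above (beginning, center and end of frame) can be summarized in a small lookup. The sketch below is illustrative only: the function name `classify` is hypothetical, and the borders at one third and two thirds of the frame are placeholders, since the text states that the exact borders should be chosen with respect to the window function.

```python
from typing import Optional, Tuple


def classify(
    position: Optional[int],
    frame_len: int,
    begin_frac: float = 1.0 / 3.0,  # hypothetical border: beginning/center
    end_frac: float = 2.0 / 3.0,    # hypothetical border: center/end
) -> Tuple[int, int]:
    """Map a transient position to (transient_detected, hangover).

    position is the sample index of the transient within frame n,
    or None if no transient was found.
    """
    if position is None:
        return (0, 0)
    if position < begin_frac * frame_len:
        return (1, 0)   # beginning: suppressed when frame n+1 is encoded
    if position >= end_frac * frame_len:
        return (0, 1)   # end: suppressed now, prominent for frame n+1
    return (1, 1)       # center: visible in both overlapped frames


if __name__ == "__main__":
    for pos in (None, 10, 480, 950):
        print(pos, classify(pos, frame_len=960))
```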
In other words, the transient detector may be configured to determine a transient hangover indicator indicating a transient for the following frame n+1 if audio signal characteristics representative of a transient in frame n are detectable after a windowing operation based on a predetermined window function. The transient detector may also be configured to determine a hangover indicator that does not indicate a transient for the following frame n+1 if audio signal characteristics representative of a transient in frame n are suppressed after a windowing operation based on the window function. The window function generally corresponds to the window function (covering at least two frames) used for transform coding of frame n in the associated audio encoder, but shifted one frame forward in time, as will be explained below.
This invention introduces a decision logic which modifies a primary transient detection in order to adjust the decision to cope with overlapped frames. This is based on the fact that certain transients, depending on when they occur within the frame, do not need to be handled in a special way. For such cases the invention will override the primary decision and signal that there is no transient. In general the invention would modify the primary transient detection to adjust the decision based on the specific application.
For hangover indication purposes, frame n is used as the analysis frame, but the window function is shifted one frame forward as illustrated in
After a window operation using the selected window function, the transient in frame n (beginning of frame) is detectable in the example of
In the example of
In the example of
As illustrated in the example of
The above concept could be improved by adapting the transient detection to the selected window function even further.
In an exemplary embodiment of the invention, before dividing the short-term energy by the long-term energy and comparing the quotient to the threshold, the short-term energy could be scaled by the window function at the current block. The long-term energy is still updated with the unscaled version of the short-term energy. If the scaled short-term energy divided by the long-term energy exceeds the threshold, the transient detector signals that a transient is detected.
Similarly the short-term energy is scaled by the window function at the position of the block shifted one frame length (the position of the block when the next frame is encoded). If the scaled short-term energy divided by the long-term energy exceeds the threshold, the transient detector sets Hangover to 1, otherwise 0.
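The window-scaled comparison can be sketched as below. Several choices here are assumptions made for the example rather than details from the text: a sine window is used, the scaling of a block's energy is taken as the mean squared window value over that block, the current frame is assumed to occupy the second half of the two-frame window (and the first half once the window is shifted one frame forward), the long-term energy is updated before the comparison, and the values of `RATIO` and `ALPHA` are illustrative. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

RATIO = 10.0 ** (7.8 / 10.0)  # example threshold, assumed to be an energy ratio in dB
ALPHA = 0.95                  # hypothetical forgetting factor


def sine_window(frame_len: int) -> List[float]:
    total = 2 * frame_len
    return [math.sin(math.pi * (k + 0.5) / total) for k in range(total)]


def block_scales(frame_len: int, num_blocks: int) -> Tuple[List[float], List[float]]:
    """Mean squared window gain per block for the current and the shifted window."""
    w = sine_window(frame_len)
    block_len = frame_len // num_blocks
    cur, nxt = [], []
    for i in range(num_blocks):
        lo, hi = i * block_len, (i + 1) * block_len
        # Current frame assumed to sit in the second half of the window.
        cur.append(sum(v * v for v in w[frame_len + lo:frame_len + hi]) / block_len)
        # With the window shifted one frame forward it sits in the first half.
        nxt.append(sum(v * v for v in w[lo:hi]) / block_len)
    return cur, nxt


def detect_with_window(
    frame: Sequence[float], num_blocks: int, e_lt: float,
    ratio: float = RATIO, alpha: float = ALPHA,
) -> Tuple[int, int, float]:
    """Return (transient_detected, hangover, updated long-term energy)."""
    frame_len = len(frame)
    block_len = frame_len // num_blocks
    cur, nxt = block_scales(frame_len, num_blocks)
    transient = hangover = 0
    for i in range(num_blocks):
        block = frame[i * block_len:(i + 1) * block_len]
        e = sum(x * x for x in block)            # unscaled short-term energy
        e_lt = alpha * e_lt + (1.0 - alpha) * e  # long-term update uses unscaled energy
        if e_lt > 0.0:
            if cur[i] * e >= ratio * e_lt:
                transient = 1
            if nxt[i] * e >= ratio * e_lt:
                hangover = 1
    return transient, hangover, e_lt


if __name__ == "__main__":
    burst_at_end = [0.01] * 30 + [1.0] * 10
    print(detect_with_window(burst_at_end, num_blocks=4, e_lt=1e-3)[:2])
```

Under these assumptions, a burst in the first block yields a transient without hangover, while a burst in the last block yields a hangover without a transient for the current frame, matching the location-dependent behavior described earlier.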
In a preferred exemplary embodiment of the invention, the transient detector comprises means for scaling frame n by the selected window function to produce a first scaled frame, means for determining a transient indicator for frame n based on the first scaled frame, means for scaling frame n by the window function shifted one frame forward in time to produce a second scaled frame, and means for determining a transient hangover indicator for the following frame n+1 based on the second scaled frame.
In the following, the invention will be described in relation to a specific exemplary and non-limiting codec realization suitable for the “ITU-T G.722.1 fullband codec extension”, now renamed the ITU-T G.719 standard. In this particular example, the codec is presented as a low-complexity transform-based audio codec, which preferably operates at a sampling rate of 48 kHz and offers full audio bandwidth ranging from 20 Hz up to 20 kHz. The encoder processes input 16-bit linear PCM signals in frames of 20 ms and the codec has an overall delay of 40 ms. The coding algorithm is preferably based on transform coding with adaptive time-resolution, adaptive bit-allocation and low-complexity lattice vector quantization. In addition, the decoder may replace non-coded spectrum components by either signal adaptive noise-fill or bandwidth extension.
A transient detected at a certain frame will also trigger a transient at the next frame. The output of the transient detector is a flag, for example denoted IsTransient. The flag is set to the value 1 or the logical value TRUE or equivalent representation if a transient is detected, or set to the value 0 or the logical value FALSE or equivalent representation otherwise (if a transient is not detected).
It may be beneficial to group the obtained spectral coefficients into bands of unequal lengths. The norm of each band is estimated and the resulting spectral envelope consisting of the norms of all bands is quantized and encoded. The coefficients are then normalized by the quantized norms. The quantized norms are further adjusted based on adaptive spectral weighting and used as input for bit allocation. The normalized spectral coefficients are lattice vector quantized and encoded based on the allocated bits for each frequency band. The level of the non-coded spectral coefficients is estimated, coded and transmitted to the decoder. Huffman encoding is preferably applied to quantization indices for both the coded spectral coefficients as well as the encoded norms.
After de-quantization, low frequency non-coded spectral coefficients (allocated zero bits) are regenerated, preferably by using a spectral-fill codebook built from the received spectral coefficients (spectral coefficients with non-zero bit allocation).
A noise level adjustment index may be used to adjust the level of the regenerated coefficients. High frequency non-coded spectral coefficients are preferably regenerated using bandwidth extension.
The decoded spectral coefficients and regenerated spectral coefficients are mixed and lead to a normalized spectrum. The decoded spectral envelope is applied leading to the decoded full-band spectrum.
Finally, the inverse transform is applied to recover the time-domain decoded signal. This is preferably performed by applying either the inverse Modified Discrete Cosine Transform (IMDCT) for stationary modes, or the inverse of the higher temporal resolution transform for transient mode.
The algorithm adapted for fullband extension is based on adaptive transform-coding technology. It operates on 20 ms frames of input and output audio. Because the transform window (basis function length) is 40 ms and a 50 percent overlap is used between successive input and output frames, the effective look-ahead buffer size is 20 ms. Hence, the overall algorithmic delay is 40 ms, which is the sum of the frame size and the look-ahead size. All other additional delays experienced in use of an ITU-T G.719 codec are due to computational and/or network transmission delays.
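The delay figures above follow from simple arithmetic, which the following lines reproduce using the 48 kHz sampling rate mentioned earlier. The constant and variable names are chosen for this illustration only.

```python
SAMPLE_RATE_HZ = 48_000  # sampling rate stated for the codec
FRAME_MS = 20            # frame size
WINDOW_MS = 40           # transform window length (50 percent overlap)

frame_samples = SAMPLE_RATE_HZ * FRAME_MS // 1000    # samples per frame
window_samples = SAMPLE_RATE_HZ * WINDOW_MS // 1000  # samples per transform window
lookahead_ms = WINDOW_MS - FRAME_MS                  # part of the window beyond the frame
algorithmic_delay_ms = FRAME_MS + lookahead_ms       # frame size plus look-ahead

if __name__ == "__main__":
    print(frame_samples, window_samples, lookahead_ms, algorithmic_delay_ms)
```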
Advantages of the invention include low complexity, time domain computation (no spectrum computation required), and/or compatibility with lapped transforms based on the hangover value.
The embodiments described above are merely given as examples, and it should be understood that the present invention is not limited thereto. Further modifications, changes and improvements which retain the basic underlying principles disclosed and claimed herein are within the scope of the invention.
This application is a continuation of U.S. application Ser. No. 15/296,600, filed on Oct. 18, 2016 (now U.S. Pat. No. 10,311,883 issued on Jun. 4, 2019), which is a continuation of U.S. application Ser. No. 12/673,862 (now U.S. Pat. No. 9,495,971, issued on Nov. 15, 2016), which has a § 371(c) date of Aug. 12, 2010 and which is the National Stage of International Patent Application No. PCT/SE2008/050960, filed Aug. 25, 2008, which claims priority to U.S. Provisional Application No. 60/968,229, filed Aug. 27, 2007. The above identified applications and publications are incorporated by this reference.
Number | Name | Date | Kind |
---|---|---|---|
5978761 | Johansson | Nov 1999 | A |
5991718 | Malah | Nov 1999 | A |
6078882 | Sato et al. | Jun 2000 | A |
6202046 | Oshikiri | Mar 2001 | B1 |
6597961 | Cooke | Jul 2003 | B1 |
6615169 | Ojala et al. | Sep 2003 | B1 |
6704705 | Kabal et al. | Mar 2004 | B1 |
6775650 | Lockwood | Aug 2004 | B1 |
6889187 | Zhang | May 2005 | B2 |
7328150 | Chen et al. | Feb 2008 | B2 |
7937271 | You | May 2011 | B2 |
8392202 | Taleb | Mar 2013 | B2 |
8744862 | You | Jun 2014 | B2 |
9153240 | Briand et al. | Oct 2015 | B2 |
9495971 | Ullberg et al. | Nov 2016 | B2 |
9558750 | Sung et al. | Jan 2017 | B2 |
10096324 | Sung et al. | Oct 2018 | B2 |
10311883 | Taleb | Jun 2019 | B2 |
20020103643 | Rotola-Pukkila | Aug 2002 | A1 |
20020111798 | Huang | Aug 2002 | A1 |
20020133764 | Wang | Sep 2002 | A1 |
20030115052 | Chen | Jun 2003 | A1 |
20040044520 | Chen | Mar 2004 | A1 |
20040044534 | Chen | Mar 2004 | A1 |
20040088160 | Manu | May 2004 | A1 |
20040133423 | Crockett | Jul 2004 | A1 |
20050075861 | Youn | Apr 2005 | A1 |
20050131678 | Chandran | Jun 2005 | A1 |
20060031065 | Liljeryd et al. | Feb 2006 | A1 |
20060116873 | Hetherington et al. | Jun 2006 | A1 |
20060161427 | Ojala | Jul 2006 | A1 |
20070016405 | Mehrotra | Jan 2007 | A1 |
20070061138 | Chen | Mar 2007 | A1 |
20070078650 | Rogers | Apr 2007 | A1 |
20070140499 | Davis | Jun 2007 | A1 |
20070242833 | Herre et al. | Oct 2007 | A1 |
20080024207 | Baker | Jan 2008 | A1 |
20080027717 | Rajendran et al. | Jan 2008 | A1 |
20080059202 | You | Mar 2008 | A1 |
20080120116 | Schnell et al. | May 2008 | A1 |
20100250265 | Taleb | Sep 2010 | A1 |
20120128162 | Chen et al. | May 2012 | A1 |
20140257824 | Taleb et al. | Sep 2014 | A1 |
20150142452 | Sung et al. | May 2015 | A1 |
20170140762 | Sung et al. | May 2017 | A1 |
Number | Date | Country |
---|---|---|
2001127641 | May 2001 | JP |
2006201375 | Aug 2006 | JP |
200019414 | Apr 2000 | WO |
0045389 | Aug 2000 | WO |
2008022566 | Feb 2008 | WO |
WO-2009029033 | Mar 2009 | WO |
Entry |
---|
Office Action issued in corresponding Japanese application No. 2013-030367 dated Dec. 6, 2013, 2 pages. |
Bosi et al., “Description and Complexity Estimation of the NBC Advanced Blockswitching Scheme (ABS),” International Organization for Standardization ISO/IEC JTC1/SC29/WG11 Coding of Moving Pictures and Associated Audio, MPEG Meeting 96/1114, Tampere, Jul. 1996, XP030030508, 8 pages. |
ITU-T Draft, “G.722.1 extension for 20 KHz fullband audio,” Draft new ITU-T Recommendation G.722.1-FB, Study Period 2005-2008, International Telecommunication Union, Study Group 16, Apr. 29, 2008, XP017543692, 44 pages. |
Extended European Search Report issued in corresponding European Patent Application No. 08828880.8, dated Nov. 27, 2013, 3 pages. |
European Result of consultation cited in EP 08 828 880.8 dated May 9, 2016, 8 pages. |
Summons to attend oral proceedings pursuant to Rule 115(1) EPC issued in European Application 08828880.8-1901/2186090 dated Sep. 22, 2015, 7 pages. |
Response to examination report issued in European Patent Application No. 08828880.08-1907 dated Mar. 26, 2015, 4 pages. |
Office Action issued in corresponding Japanese Application No. 2010-522866 dated Oct. 22, 2012, 2 pages. |
Decision of Refusal dated Dec. 8, 2014 in corresponding Japanese Patent Application No. 2013-030367, 3 pages. |
Requisition by the Examiner cited in Canadian Application No. 2,697,920 dated Apr. 30, 2014, 4 pages. |
Requisition by the Examiner cited in Canadian Application No. 2,697,920 dated May 26, 2015, 4 pages. |
First Office Action cited in Chinese Patent Application No. 200880104833.5 dated Sep. 26, 2011, 8 pages. |
European extended Search Report cited in EP 08828880.08-1907 dated Nov. 27, 2013, 7 pages. |
Communication pursuant to Article 94(3) EPC cited in 08 828 880.8-1907 dated Sep. 17, 2014, 5 pages. |
M. Bosi et al, “IS 13818-7 (MPEG-2 Advanced Audio Coding, AAC)”, International Organization for Standardization Organisation Internationale de Normalisation ISO/IEC JTC1/SC29/WG11 Coding of Moving Pictures and Audio, ISO/IEC 13818-7:1997(E), Apr. 1997, 107 pages. |
Information Technology—Generic Coding of Moving Pictures and Associated Audio: Systems Recommendation H.222.0,International Organization for Standardization Organisation Internationale de Normalisation ISO/IEC JTC1/SC29/WG11 Coding of Moving Pictures and Audio, ISO/IEC 13818-1, Nov. 13, 1994, 161 pages. |
Office action issued for Canadian Patent Application No. 2,697,920, dated May 25, 2016, 4 pages. |
Taleb et al., “G.719: The First ITU-T Standard for High-Quality Conversational Fullband Audio Coding”, IEEE Communications Magazine, Oct. 2009, pp. 124-130. |
International Search Report and Written Opinion dated Nov. 24, 2008 in International application No. PCT/SE2008/050960, 9 pages. |
Number | Date | Country | |
---|---|---|---|
20190244625 A1 | Aug 2019 | US |
Number | Date | Country | |
---|---|---|---|
60968229 | Aug 2007 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 15296600 | Oct 2016 | US |
Child | 16386863 | US | |
Parent | 12673862 | US | |
Child | 15296600 | US |