Examples herein relate to encoding and decoding apparatus, in particular for performing temporal noise shaping (TNS).
The following documents describe the known technology:
Temporal Noise Shaping (TNS) is a tool for transform-based audio coders that was developed in the 90s (conference papers [1-3] and patents [4-5]). Since then, it has been integrated in major audio coding standards such as MPEG-2 AAC, MPEG-4 AAC, 3GPP E-AAC-Plus, MPEG-D USAC, 3GPP EVS, MPEG-H 3D Audio.
TNS can be briefly described as follows. At the encoder-side and before quantization, a signal is filtered in the frequency domain (FD) using linear prediction, LP, in order to flatten the signal in the time-domain. At the decoder-side and after inverse quantization, the signal is filtered back in the frequency-domain using the inverse prediction filter, in order to shape the quantization noise in the time-domain such that it is masked by the signal.
TNS is effective at reducing the so-called pre-echo artefact on signals containing sharp attacks such as e.g. castanets. It is also helpful for signals containing pseudo stationary series of impulse-like signals such as e.g. speech.
TNS is generally used in an audio coder operating at relatively high bitrate. When used in an audio coder operating at low bitrate, TNS can sometimes introduce artefacts, degrading the quality of the audio coder. These artefacts are click-like or noise-like and appear in most of the cases with speech signals or tonal music signals.
Examples in the present document permit suppression or reduction of the impairments of TNS while maintaining its advantages.
Several examples below permit obtaining an improved TNS for low-bitrate audio coding.
According to an embodiment, an encoder apparatus may have: a temporal noise shaping, TNS, tool for performing linear prediction, LP, filtering on an information signal including a plurality of frames; and a controller configured to control the TNS tool so that the TNS tool performs LP filtering with: a first filter whose impulse response has a higher energy; and a second filter whose impulse response has a lower energy, wherein the second filter is not an identity filter, wherein the controller is configured to choose between filtering with the first filter and filtering with the second filter on the basis of a frame metrics, wherein the controller is further configured to: modify the first filter so as to acquire the second filter in which the filter's impulse response energy is reduced.
According to another embodiment, a method for performing temporal noise shaping, TNS, filtering on an information signal including a plurality of frames may have the steps of: for each frame, choosing between filtering with a first filter and filtering with a second filter, whose impulse response has a lower energy, on the basis of a frame metrics, wherein the second filter is not an identity filter; filtering the frame using the filtering chosen between filtering with the first filter and filtering with the second filter; and modifying the first filter so as to acquire the second filter in which the filter's impulse response energy is reduced.
Another embodiment may have a non-transitory digital storage medium having a computer program stored thereon to perform the method for performing temporal noise shaping, TNS, filtering on an information signal including a plurality of frames, the method having the steps of: for each frame, choosing between filtering with a first filter and filtering with a second filter, whose impulse response has a lower energy, on the basis of a frame metrics, wherein the second filter is not an identity filter; filtering the frame using the filtering chosen between filtering with the first filter and filtering with the second filter; and modifying the first filter so as to acquire the second filter in which the filter's impulse response energy is reduced, when said computer program is run by a computer.
In accordance with examples, there is provided an encoder apparatus comprising:
It has been noted that it is possible to remove artefacts on problematic frames while minimally affecting the other frames.
Instead of simply turning on/off the TNS operations, it is possible to maintain the advantages of the TNS tool while reducing its impairments. An intelligent real-time feedback-based control is therefore obtained by simply reducing filtering where needed instead of avoiding it.
In accordance with examples, the controller is further configured to:
Accordingly, the second filter with reduced impulse response energy may be created when needed.
In accordance with examples, the controller is further configured to:
By intelligently modifying the first filter, a filtering status may be created which is not achievable by simply performing operations of turning on/off the TNS. At least one intermediate status between full filtering and no filtering is obtained. This intermediate status, if invoked when needed, permits reducing the disadvantages of the TNS while maintaining its positive characteristics.
In accordance with examples, the controller is further configured to:
In accordance with examples, the controller is further configured to:
In accordance with examples, the controller is further configured to:
Therefore, it is possible to define, for different metrics, different adjustment factors to obtain the filter parameters which are the most appropriate for each frame.
In accordance with examples, the controller is further configured to define the adjustment factor as
wherein thresh is the TNS filtering determination threshold, thresh2 is the filtering type determination threshold, frameMetrics is a frame metrics, and γmin is a fixed value.
Artefacts caused by the TNS occur in frames in which the prediction gain is in a particular interval, here defined as the set of values higher than the TNS filtering determination threshold thresh but lower than the filtering type determination threshold thresh2. In cases in which the metrics is the prediction gain, with thresh=1.5 and thresh2=2, artefacts caused by the TNS tend to occur for prediction gains between 1.5 and 2. Therefore, several examples permit to overcome these impairments by reducing the filtering for 1.5&lt;predGain&lt;2.
In accordance with examples, the controller is further configured to modify the parameters of the first filter to obtain the parameters of the second filter by applying:
αw(k)=γkα(k),k=0, . . . ,K
where α(k) are parameters of the first filter, γ is the adjustment factor such that 0<γ<1, αw(k) are the parameters of the second filter and K is the order of the first filter.
This is a simple yet valid technique for obtaining the parameters of the second filter so that the impulse response energy is reduced with respect to the impulse response energy of the first filter.
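As an illustration, the weighting αw(k)=γkα(k) may be sketched as follows (the function name and the list representation of the parameters are illustrative, not part of the examples above):

```python
def weight_lpc(a, gamma):
    """Apply aw(k) = gamma**k * a(k), k = 0..K, to reduce the impulse
    response energy of the first filter; gamma < 1 yields the second
    filter, while gamma = 1 leaves the first filter unchanged."""
    return [(gamma ** k) * ak for k, ak in enumerate(a)]
```

With 0&lt;γ&lt;1, each higher-order coefficient is attenuated more strongly, so the impulse response of the resulting filter has a lower energy than that of the first filter.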
In accordance with examples, the controller is further configured to obtain the frame metrics from at least one of a prediction gain, an energy of the information signal and/or a prediction error.
These metrics permit easy and reliable discrimination of the frames which need to be filtered by the second filter from the frames which need to be filtered by the first filter.
In accordance with examples, the frame metrics comprises a prediction gain calculated as
where energy is a term associated to an energy of the information signal, and predError is a term associated to a prediction error.
In accordance with examples, the controller is configured so that:
In accordance with examples, the controller is configured to:
Accordingly, it is easy to automatically establish whether the signal is to be filtered using the first filter or using the second filter.
In accordance with examples, the controller is configured to:
Accordingly, it is also possible to avoid TNS filtering altogether when not appropriate.
In examples, the same metrics may be used twice (by performing comparisons with two different thresholds): both for deciding between the first filter and second filter, and for deciding whether to filter or not to filter.
In accordance with examples, the controller is configured to:
In accordance with examples, the apparatus may further comprise:
These data may be stored and/or transmitted, for example, to a decoder.
In accordance with examples, there is provided a system comprising an encoder side and a decoder side, wherein the encoder side comprises an encoder apparatus as above and/or below.
In accordance with examples, there is provided a method for performing temporal noise shaping, TNS, filtering on an information signal including a plurality of frames, the method comprising:
In accordance with examples, there is provided a non-transitory storage device storing instructions which, when executed by a processor, cause the processor to perform at least some of the steps of the methods above and/or below and/or to implement a system as above or below and/or an apparatus as above and/or below.
Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
The encoder apparatus 10 may, inter alia, comprise a temporal noise shaping, TNS, tool 11 for performing TNS filtering on an FD information signal 13 (Xs(n)). The encoder apparatus 10 may, inter alia, comprise a TNS controller 12. The TNS controller 12 may be configured to control the TNS tool 11 so that the TNS tool 11 performs filtering (e.g., for some frames) using at least one higher impulse response energy linear prediction (LP) filtering and (e.g., for some other frames) using at least one lower impulse response energy LP filtering. The TNS controller 12 is configured to perform a selection between higher impulse response energy LP filtering and lower impulse response energy LP filtering on the basis of a metrics associated to the frame (frame metrics). The energy of the impulse response of the first filter is higher than the energy of the impulse response of the second filter.
The FD information signal 13 (Xs(n)) may be, for example, obtained from a modified discrete cosine transform, MDCT, tool (or modified discrete sine transform MDST, for example) which has transformed a representation of a frame from a time domain, TD, to the frequency domain, FD.
The TNS tool 11 may process signals, for example, using a group of linear prediction (LP) filter parameters 14 (a(k)), which may be parameters of a first filter 14a. The TNS tool 11 may also comprise parameters 14′ (aw(k)) which may be parameters of a second filter 15a (the second filter 15a may have an impulse response with lower energy as compared to the impulse response of the first filter 14a). The parameters 14′ may be understood as a weighted version of the parameters 14, and the second filter 15a may be understood as being derived from the first filter 14a. Parameters may comprise, inter alia, one or more of the following parameters (or the quantized version thereof): LP coding, LPC, coefficients, reflection coefficients, RCs, coefficients rci(k) or quantized versions thereof rcq(k), arcsine reflection coefficients, ASRCs, log-area ratios, LARs, line spectral pairs, LSPs, and/or line spectral frequencies, LSFs, or other kinds of such parameters. In examples, it is possible to use any representation of filter coefficients.
The output of the TNS tool 11 may be a filtered version 15 (Xf(n)) of the FD information signal 13 (Xs(n)).
Another output of the TNS tool 11 may be a group of output parameters 16, such as reflection coefficients rci(k) (or quantized versions thereof rcq(k)).
Downstream to the components 11 and 12, a bitstream coder may encode the outputs 15 and 16 into a bitstream which may be transmitted (e.g., wirelessly, e.g., using a protocol such as Bluetooth) and/or stored (e.g., in a mass memory storage unit).
TNS filtering provides reflection coefficients which are in general different from zero, and an output which is in general different from the input.
As shown in
Reference numeral 17′ in
The metrics 17 may be, for example, a metrics associated to the energy of the signal in the frame (for example, the metrics may be such that the higher the energy, the higher the metrics). The metrics may be, for example, a metrics associated to a prediction error (for example, the metrics may be such that the higher the prediction error, the lower the metric). The metrics may be, for example, a value associated to the relationship between the prediction error and energy of the signal (for example, the metrics may be such that the higher the ratio between the energy and the prediction error, the higher the metrics). The metrics may be, for example, a prediction gain for a current frame, or a value associated or proportional to the prediction gain for the current frame (such as, for example, the higher the prediction gain, the higher the metrics). The frame metrics (17) may be associated to the flatness of the signal's temporal envelope.
It has been noted that artefacts due to TNS occur only (or at least prevalently) when the prediction gain is low. Therefore, when the prediction gain is high, the problems caused by TNS do not arise (or are less prone to arise) and it is possible to perform full TNS (e.g., higher impulse response energy LP filtering). When the prediction gain is very low, it is advantageous not to perform TNS at all (non-filtering). When the prediction gain is intermediate, it is advantageous to reduce the effects of the TNS by using a lower impulse response energy linear prediction filtering (e.g., by weighting LP coefficients or other filtering parameters and/or reflection coefficients and/or using a filter whose impulse response has a lower energy). The higher and the lower impulse response energy LP filterings differ in that the former uses a filter whose impulse response has a higher energy than the filter used in the latter; a filter being in general characterized by its impulse response energy, it is possible to identify it with that energy.
Hence, with the present examples, the TNS operations may be computed by:
High impulse response energy LP filtering may be obtained, for example, using a first filter having a high impulse response energy. Low impulse response energy LP filtering may be obtained, for example, using a second filter having a lower impulse response energy. The first and second filter may be linear time-invariant (LTI) filters.
In examples, the first filter may be described using the filter parameters a(k) (14). In examples, the second filter may be a modified version of the first filter (e.g., as obtained by the TNS controller 12). The second filter (lower impulse response energy filter) may be obtained by downscaling the filter parameters of the first filter (e.g., using a parameter γ or γk such that 0&lt;γ&lt;1, with k being a natural number such that k≤K, K being the order of the first filter).
Therefore, in examples, when the filter parameters are obtained and, on the basis of the metrics, it is determined that the lower impulse response energy filtering is to be used, the filter parameters of the first filter may be modified (e.g., downscaled) to obtain the filter parameters of the second filter, to be used for the lower impulse response energy filtering.
At step S31, a frame metrics (e.g., prediction gain 17) is obtained.
At step S32, it is checked whether the frame metrics 17 is higher than a TNS filtering determination threshold or first threshold (which may be 1.5, in some examples). An example of metrics may be a prediction gain.
If at S32 it is verified that the frame metrics 17 is lower than the first threshold (thresh), no filtering operation is performed at S33 (it could be said that an identity filter is used, the identity filter being a filter in which the output is the same as the input). For example, Xf(n)=Xs(n) (the output 15 of the TNS tool 11 is the same as the input 13), and/or the reflection coefficients rci(k) (and/or their quantized versions rcq(k)) are also set at 0. Therefore, the operations (and the output) of the decoder apparatus 20 will not be influenced by the TNS tool 11. Hence, at S33, neither the first filter nor the second filter may be used.
If at S32 it is verified that the frame metrics 17 is greater than the TNS filtering determination threshold or first threshold (thresh), a second check may be performed at step S34 by comparing the frame metrics with a filtering type determination threshold or second threshold (thresh2, which may be greater than the first threshold, and be, for example, 2).
If at S34 it is verified that the frame metrics 17 is lower than the filtering type determination threshold or second threshold (thresh2), lower impulse response energy LP filtering is performed at S35 (e.g., a second filter with lower impulse response energy is used, the second filter not being an identity filter).
If at S34 it is verified that the frame metrics 17 is greater than the filtering type determination threshold or second threshold (thresh2), higher impulse response energy LP filtering is performed at S36 (e.g., a first filter whose response energy is higher than the lower energy filter is used).
The method 30 may be reiterated for a subsequent frame.
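The decision flow of steps S32-S36 may be sketched as follows (a sketch only; the threshold values are taken from the examples above and the names are illustrative):

```python
def tns_decision(frame_metrics, thresh=1.5, thresh2=2.0):
    """Three-way decision of method 30: no filtering (S33) below
    thresh, lower impulse response energy filtering (S35) between the
    thresholds, higher impulse response energy filtering (S36) above
    thresh2."""
    if frame_metrics <= thresh:
        return "no_filtering"    # S33: identity, rc set to 0
    if frame_metrics < thresh2:
        return "lower_energy"    # S35: second filter
    return "higher_energy"       # S36: first filter
```

The same metrics (e.g., the prediction gain) is thus compared twice, once per threshold, before a filter is selected.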
In examples, the lower impulse response energy LP filtering (S35) may differ from the higher impulse response energy LP filtering (S36) in that the filter parameters 14 (a(k)) may be weighted, for example, by different values (e.g., the higher impulse response energy LP filtering may be based on unitary weights and the lower impulse response energy LP filtering may be based on weights lower than 1). In examples, the lower impulse response energy LP filtering may differ from the higher impulse response energy LP filtering in that the reflection coefficients 16 obtained by performing lower impulse response energy LP filtering may cause a higher reduction of the impulse response energy than the reduction caused by the reflection coefficients obtained by performing higher impulse response energy LP filtering.
Hence, while performing higher impulse response energy filtering at the step S36, the first filter is used on the basis of the filter parameters 14 (a(k)) (which are therefore the first filter parameters). While performing lower impulse response energy filtering at the step S35, the second filter is used. The second filter may be obtained by modifying the parameters of the first filter (e.g., by weighting with weight less than 1).
The sequence of steps S31-S32-S34 may be different in other examples: for example, S34 may precede S32. One of the steps S32 and/or S34 may be optional in some examples.
In examples, at least one of the first and/or second thresholds may be fixed (e.g., stored in a memory element).
In examples, the lower impulse response energy filtering may be obtained by reducing the impulse response of the filter by adjusting the LP filter parameters (e.g., LPC coefficients or other filtering parameters) and/or the reflection coefficients, or an intermediate value used to obtain the reflection coefficients. For example, coefficients less than 1 (weights) may be applied to the LP filter parameters (e.g., LPC coefficients or other filtering parameters) and/or the reflection coefficients, or an intermediate value used to obtain the reflection coefficients.
In examples, the adjustment (and/or the reduction of the impulse response energy) may be (or be in terms of):
where thresh2 is the filtering type determination threshold (and may be, for example, 2), thresh is the TNS filtering determination threshold (and may be 1.5), γmin is a constant (e.g., a value between 0.7 and 0.95, such as between 0.8 and 0.9, such as 0.85). γ values may be used to scale the LPC coefficients (or other filtering parameters) and/or the reflection coefficients. frameMetrics is the frame metrics.
In one example, the formula may be
where thresh2 is the filtering type determination threshold (and may be, for example, 2), thresh is the TNS filtering determination threshold (and may be 1.5), γmin is a constant (e.g., a value between 0.7 and 0.95, such as between 0.8 and 0.9, such as 0.85). γ values may be used to scale the LPC coefficients (or other filtering parameters) and/or the reflection coefficients. predGain may be the prediction gain, for example.
From the formula it may be seen that a frameMetrics (or predGain) lower than thresh2 but close to it (e.g., 1.999) will cause the reduction of impulse response energy to be weak (e.g., γ≅1). Therefore, the lower impulse response energy LP filtering may be one of a plurality of different lower impulse response energy LP filterings, each being characterized by a different adjustment parameter γ, e.g., in accordance with the value of the frame metrics.
In examples of lower impulse response energy LP filtering, different values of the metrics may cause different adjustments. For example, a higher prediction gain may be associated to a higher value of γ, and hence to a lower reduction of the impulse response energy with respect to the first filter. γ may be seen as a linear function dependent on predGain. An increment of predGain will cause an increment of γ, which in turn will diminish the reduction of the impulse response energy. If predGain is reduced, γ is also reduced, and the impulse response energy will accordingly also be reduced.
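Under the assumption that γ ramps linearly from γmin at thresh up to 1 at thresh2 (an assumed reading consistent with the behaviour described above; the exact expression is given by the elided formula), the adjustment factor may be sketched as:

```python
def adjustment_factor(frame_metrics, thresh=1.5, thresh2=2.0,
                      gamma_min=0.85):
    """Adjustment factor gamma for the lower impulse response energy
    filtering; the linear ramp between gamma_min and 1 is an
    assumption, not a quotation of the elided formula."""
    if frame_metrics >= thresh2:
        return 1.0  # no reduction: higher impulse response energy
    ratio = (frame_metrics - thresh) / (thresh2 - thresh)
    return gamma_min + (1.0 - gamma_min) * ratio
```

A frameMetrics close to thresh2 (e.g., 1.999) then yields γ≅1, i.e., a weak reduction, as discussed above.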
Therefore, subsequent frames of the same signal may be differently filtered:
Accordingly, for each frame, a particular first filter may be defined (e.g., on the basis of the filter parameters), while a second filter may be developed by modifying the filter parameters of the first filter.
A frame metrics (e.g., prediction gain) 17 may be obtained and compared to a TNS filtering determination threshold 18a (e.g., at a comparer 10a). If the frame metrics 17 is greater than the TNS filtering determination threshold 18a (thresh), it is permitted (e.g., by the selector 11a) to compare the frame metrics 17 with a filtering type determination threshold 18b (e.g., at a comparer 12a). If the frame metrics 17 is greater than the filtering type determination threshold 18b, then a first filter 14a whose impulse response has higher energy (e.g. γ=1) is activated. If the frame metrics 17 is lower than the filtering type determination threshold 18b, then a second filter 15a whose impulse response has lower energy (e.g., γ<1) is activated (element 12b indicates a negation of the binary value output by the comparer 12a). The first filter 14a whose impulse response has higher energy may perform filtering S36 with higher impulse response energy, and the second filter 15a whose impulse response has lower energy may perform filtering S35 with lower impulse response energy.
The method 36 may comprise a step S36a of obtaining the filter parameters 14. The method 36 may comprise a step S36b of performing filtering (e.g., S36) using the parameters of the first filter 14a. Step S36b may be performed only at the determination (e.g., at step S34) that the frame metrics is over the filtering type determination threshold.
The method 35 may comprise a step S35a of obtaining the filter parameters 14 of the first filter 14a. The method 35 may comprise a step S35b of defining the adjustment factor γ (e.g., by using at least one of the thresholds thresh and thresh2 and the frame metrics). The method 35 may comprise a step S35c for modifying the first filter 14a to obtain a second filter 15a having lower impulse response energy with respect to the first filter 14a. In particular, the first filter 14a may be modified by applying the adjustment factor γ (e.g., as obtained at S35b) to the parameters 14 of the first filter 14a, to obtain the parameters of the second filter. The method 35 may comprise a step S35d in which the filtering with the second filter (e.g., at S35 of the method 30) is performed. Steps S35a, S35b, and S35c may be performed at the determination (e.g., at step S34) that the frame metrics is less than the filtering type determination threshold.
The steps of method 40 (indicated as a sequence a)-b)-c)-d)-1)-2)-3)-e)-f) and by the sequence S41′-S49′) are discussed below.
r(k)=r(k)w(k), k=0, . . . ,K
w(k)=exp[−½(2παk)2], k=0, . . . ,K
αw(k)=γkα(k), k=0, . . . ,K
and round(.) is the rounding-to-nearest-integer function.
s0(nstart−1)=s1(nstart−1)= . . . =sK-1(nstart−1)=0
A bitstream may be transmitted to the decoder. The bitstream may comprise, together with an FD representation of the information signal (e.g., an audio signal), also control data, such as the reflection coefficients obtained by performing TNS operations described above (TNS analysis).
The method 40″ (decoder side) may comprise steps g) (S41″) and h) (S42″) in which, if TNS is on, the quantized reflection coefficients are decoded and the quantized MDCT (or MDST) spectrum is filtered back. The following procedure may be used:
s0(nstart−1)=s1(nstart−1)= . . . =sK-1(nstart−1)=0
for n=nstart to nstop do
An example of encoder apparatus 50 (which may embody the encoder apparatus 10 and/or perform at least some of the operation of the methods 30 and 40′) is shown in
The encoder apparatus 50 may comprise a plurality of tools for encoding an input signal (which may be, for example, an audio signal). For example, a MDCT tool 51 may transform a TD representation of an information signal to an FD representation. A spectral noise shaper, SNS, tool 52 may perform noise shaping analysis (e.g., a spectral noise shaping, SNS, analysis), for example, and retrieve LPC coefficients or other filtering parameters (e.g., a(k), 14). The TNS tool 11 may be as above and may be controlled by the controller 12. The TNS tool 11 may perform a filtering operation (e.g., according to method 30 or 40′) and output both a filtered version of the information signal and a version of the reflection coefficients. A quantizer tool 53 may perform a quantization of data output by the TNS tool 11. An arithmetic coder 54 may provide, for example, entropy coding. A noise level tool 55′ may also be used for estimating a noise level of the signal. A bitstream writer 55 may generate a bitstream associated to the input signal that may be transmitted (e.g., wirelessly, e.g., using Bluetooth) and/or stored.
A bandwidth detector 58′ (which may detect the bandwidth of the input signal) may also be used. It may provide the information on active spectrum of the signal. This information may also be used, in some examples, to control the coding tools.
The encoder apparatus 50 may also comprise a long term post filtering tool 57 which may be input with a TD representation of the input signal, e.g., after the TD representation has been downsampled by a downsampler tool 56.
An example of decoder apparatus 60 (which may embody the decoder apparatus 20 and/or perform at least some of the operation of the method 40″) is shown in
The decoder apparatus 60 may comprise a reader 61 which may read a bitstream (e.g., as prepared by the apparatus 50). The decoder apparatus 60 may comprise an arithmetic residual decoder 61a which may perform, for example, entropy decoding, residual decoding, and/or arithmetic decoding to provide a digital representation in the FD (restored spectrum). The decoder apparatus 60 may comprise a noise filling tool 62 and a global gain tool 63, for example. The decoder apparatus 60 may comprise a TNS decoder 21 and a TNS decoder controller 22. The apparatus 60 may comprise an SNS decoder tool 65, for example. The decoder apparatus 60 may comprise an inverse MDCT (or MDST) tool 65′ to transform a digital representation of the information signal from the FD to the TD. A long term post filtering may be performed by the LTPF tool 66 in the TD. Bandwidth information 68 may be obtained from the bandwidth detector 58′, for example, and applied to some of the tools (e.g., 62 and 21).
Examples of the operations of the apparatus above are here provided.
Temporal Noise Shaping (TNS) may be used by tool 11 to control the temporal shape of the quantization noise within each window of the transform.
In examples, if TNS is active in the current frame, up to two filters per MDCT-spectrum (or MDST spectrum or other spectrum or other FD representation) may be applied. It is possible to apply a plurality of filters and/or to perform TNS filtering on a particular frequency range. In some examples, this is only optional.
The number of filters for each configuration and the start and the stop frequency of each filter are given in the following table:
Information such as the start and stop frequencies may be signalled, for example, from the bandwidth detector 58′.
Where NB is narrowband, WB is wideband, SSWB is semi-super wideband, SWB is super wideband, and FB is fullband.
The TNS encoding steps are described below. First, an analysis may estimate a set of reflection coefficients for each TNS filter. Then, these reflection coefficients may be quantized. And finally, the MDCT-spectrum (or MDST spectrum or other spectrum or other FD representation) may be filtered using the quantized reflection coefficients.
The complete TNS analysis described below is repeated for every TNS filter f, with f=0 . . . num_tns_filters−1 (num_tns_filters being provided by the table above).
A normalized autocorrelation function may be calculated (e.g., at step S41′) as follows, for each k=0 . . . 8
where sub_start(f, s) and sub_stop(f, s) are given in the table above.
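A simplified sketch of such a normalized autocorrelation is given below; the per-subblock normalization by the subblock energy is an assumed reading of the elided formula, not a quotation of it:

```python
def normalized_autocorrelation(x, subblocks, max_lag=8):
    """Autocorrelation computed per subblock, each contribution
    normalized by the subblock energy and summed over subblocks;
    subblocks is a list of (start, stop) index pairs, e.g., the
    (sub_start, sub_stop) pairs of the table above."""
    r = [0.0] * (max_lag + 1)
    for start, stop in subblocks:
        seg = x[start:stop]
        energy = sum(v * v for v in seg)
        if energy == 0.0:
            continue  # silent subblock contributes nothing
        for k in range(max_lag + 1):
            acc = sum(seg[n] * seg[n - k] for n in range(k, len(seg)))
            r[k] += acc / energy
    return r
```

The normalization makes each subblock contribute equally regardless of its energy, which is the purpose of a normalized (rather than plain) autocorrelation.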
The normalized autocorrelation function may be lag-windowed (e.g., at S42′) using, for example:
r(k)=r(k)exp[−½(0.02πk)2] for k=0 . . . 8
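The lag window above may be sketched directly from the formula (the function name is illustrative):

```python
import math

def lag_window(r):
    """Apply the lag window r(k) = r(k) * exp(-1/2 * (0.02*pi*k)**2)
    for k = 0..8; the window decays with the lag k."""
    return [rk * math.exp(-0.5 * (0.02 * math.pi * k) ** 2)
            for k, rk in enumerate(r)]
```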
The Levinson-Durbin recursion described above may be used (e.g., at step S43′) to obtain LPC coefficients or other filtering parameters α(k), k=0 . . . 8 and/or a prediction error e.
The decision to turn on/off the TNS filter f in the current frame is based on the prediction gain:
If predGain>thresh, then turn on the TNS filter f
With, for example, thresh=1.5 and the prediction gain being obtained, for example, as:
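The recursion and the prediction gain may be sketched as follows; the elided prediction-gain formula is assumed to be the usual ratio of the zero-lag autocorrelation r(0) to the final prediction error e, consistent with "energy/predError" above:

```python
def levinson_durbin(r):
    """Levinson-Durbin recursion: from autocorrelation r(0..K) to LPC
    coefficients a(0..K) (with a(0) = 1) and final prediction error e."""
    K = len(r) - 1
    a = [1.0] + [0.0] * K
    e = r[0]
    for k in range(1, K + 1):
        acc = r[k] + sum(a[i] * r[k - i] for i in range(1, k))
        rc = -acc / e                 # k-th reflection coefficient
        a_prev = a[:]
        for i in range(1, k):
            a[i] = a_prev[i] + rc * a_prev[k - i]
        a[k] = rc
        e *= 1.0 - rc * rc            # error shrinks at every order
    return a, e

def prediction_gain(r):
    """predGain = r(0) / e (assumed form of the elided formula)."""
    _, e = levinson_durbin(r)
    return r[0] / e
```

With thresh=1.5, a frame whose autocorrelation yields predGain above 1.5 would turn the TNS filter on.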
The additional steps described below are performed only if the TNS filter f is turned on (e.g., if the step S32 has result “YES”).
A weighting factor γ is computed by
with thresh2=2, γmin=0.85 and
The LPC coefficients or other filtering parameters may be weighted (e.g., at step S46′) using the factor γ
αw(k)=γkα(k) for k=0 . . . 8
The weighted LPC coefficients or other filtering parameters may be converted (e.g., at step S47′) to reflection coefficients using, for example, the following algorithm:
wherein rc(k, f)=rc(k) are the final estimated reflection coefficients for the TNS filter f.
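One standard way to perform this conversion is the step-down (backward Levinson) recursion; the listing below is a generic sketch of that technique, not the exact elided algorithm:

```python
def lpc_to_reflection(a):
    """Step-down recursion: LPC coefficients a(0..K), a(0) = 1, to
    reflection coefficients rc(0..K-1)."""
    a = list(a)
    K = len(a) - 1
    rc = [0.0] * K
    for k in range(K, 0, -1):
        rc[k - 1] = a[k]                      # last coefficient is rc
        denom = 1.0 - rc[k - 1] ** 2
        a = [1.0] + [(a[i] - rc[k - 1] * a[k - i]) / denom
                     for i in range(1, k)]    # step down one order
    return rc
```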
If the TNS filter f is turned off (e.g., outcome “NO” at the check of step S32), then the reflection coefficients may be simply set to 0: rc(k, f)=0, k=0 . . . 8.
The quantization process, e.g., as performed at step S48′, is now discussed.
For each TNS filter f, the reflection coefficients obtained may be quantized, e.g., using scalar uniform quantization in the arcsine domain
and
rcq(k,f)=sin [Δ(rci(k,f)−8)] for k=0 . . . 8
wherein
and nint(.) is the rounding-to-nearest-integer function, for example. rci(k, f) may be the quantizer output indices and rcq(k, f) may be the quantized reflection coefficients.
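The quantization in the arcsine domain may be sketched as follows. The value of Δ is given by the elided formula and is assumed here to be π/17 (so that indices span 0..16 with rc=0 at index 8); nint(.) is implemented to round halves away from zero:

```python
import math

DELTA = math.pi / 17  # assumed step; the exact value is given by the
                      # elided formula for DELTA in the text

def nint(x):
    """Round to nearest integer, halves away from zero (the text's
    nint(.); Python's round() would send halves to even)."""
    return int(math.floor(x + 0.5)) if x >= 0 else int(math.ceil(x - 0.5))

def quantize_rc(rc):
    """Scalar uniform quantization of one reflection coefficient in
    the arcsine domain: returns index rci (centred on 8 for rc = 0)
    and quantized coefficient rcq."""
    rci = nint(math.asin(rc) / DELTA) + 8
    rcq = math.sin(DELTA * (rci - 8))
    return rci, rcq
```

Quantizing in the arcsine domain spends more resolution near |rc|=1, where the filter is most sensitive to coefficient errors.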
The order of the quantized reflection coefficients may be calculated using
k=7
while k≥0 and rcq(k, f)=0 do
k=k−1
rcorder(f)=k+1
The total number of bits consumed by TNS in the current frame can then be computed as follows
The values of tab_nbits_TNS_order and tab_nbits_TNS_coef may be provided in tables.
The MDCT (or MDST) spectrum Xs(n) (input 15) may then be filtered using the following algorithm
s0(start_freq(0)−1)=s1(start_freq(0)−1)= . . . =s7(start_freq(0)−1)=0
for f=0 to num_tns_filters−1 do
With reference to operations performed at the decoder (e.g., 20, 60), quantized reflection coefficients may be obtained for each TNS filter f using
rcq(k,f)=sin [Δ(rci(k,f)−8)] for k=0 . . . 8
wherein rci(k, f) are the quantizer output indices.
The MDCT (or MDST) spectrum X̂s(n) as provided to the TNS decoder 21 (e.g., as obtained from the global gain tool 63) may then be filtered using the following algorithm
s0(start_freq(0)−1)=s1(start_freq(0)−1)= . . . =s7(start_freq(0)−1)=0
for f=0 to num_tns_filters−1 do
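The loop body of the decoder-side filtering is not reproduced above. As an illustration only, an all-pole lattice synthesis filter driven by the reflection coefficients (one common realization of inverse TNS filtering; names and structure are assumptions) can be sketched as:

```python
def lattice_synthesis(x, rc):
    # All-pole lattice filter: y(n) is obtained from x(n) by running the
    # reflection coefficients rc(0..K-1) backwards through the lattice.
    # s holds the backward prediction errors (the filter states).
    K = len(rc)
    s = [0.0] * K
    y = []
    for xn in x:
        f = xn
        for m in range(K, 0, -1):
            f -= rc[m - 1] * s[m - 1]            # forward error of stage m-1
            if m < K:
                s[m] = rc[m - 1] * f + s[m - 1]  # backward error update
        if K > 0:
            s[0] = f
        y.append(f)
    return y
```

For K = 2 and rc = [0.5, 0.25] this is equivalent to the direct-form recursion y(n) = x(n) − 0.625·y(n−1) − 0.25·y(n−2).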
As explained above, TNS can sometimes introduce artefacts, degrading the quality of the audio coder. These artefacts are click-like or noise-like and appear in most of the cases with speech signals or tonal music signals.
It was observed that artefacts generated by TNS only occur in frames where the prediction gain predGain is low and close to a threshold thresh.
One might think that increasing the threshold would easily solve the problem. But for most frames, it is actually beneficial to turn on TNS even when the prediction gain is low.
Our proposed solution is to keep the same threshold but to adjust the TNS filter when the prediction gain is low, so as to reduce the impulse response energy.
There are many ways to implement this adjustment (which in some cases may be referred to as “attenuation”, e.g., when the reduction of impulse response energy is obtained by reducing the LP filter parameters). We may choose to use, for example, the weighting
a_w(k)=γ^k a(k), k=0, . . . ,K
where a(k) are the LP filter parameters (e.g., LPC coefficients) computed in Encoder Step c) and a_w(k) are the weighted LP filter parameters. The adjustment (weighting) factor γ is made dependent on the prediction gain such that a higher reduction of impulse response energy (γ<1) is applied for lower prediction gains and such that there is, for example, no reduction of impulse response energy (γ=1) for higher prediction gains.
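One plausible mapping from prediction gain to γ, reusing the constants thresh=1.5, thresh2=2 and γmin=0.85 given earlier, is a linear ramp between thresh and thresh2 (an assumption; the exact formula is not reproduced in this text):

```python
def weighting_factor(pred_gain, thresh=1.5, thresh2=2.0, gamma_min=0.85):
    # gamma rises linearly from gamma_min at pred_gain = thresh to 1.0 at
    # pred_gain = thresh2; above thresh2 the filter is left unweighted.
    # Only relevant when pred_gain > thresh (otherwise TNS is off anyway).
    if pred_gain >= thresh2:
        return 1.0
    return 1.0 - (1.0 - gamma_min) * (thresh2 - pred_gain) / (thresh2 - thresh)
```

The returned γ then feeds the weighting a_w(k) = γ^k·a(k) shown above.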
The proposed solution was proven to be very effective at removing all artefacts on problematic frames while minimally affecting the other frames.
Reference can now be made to
The prediction gain is related to the flatness of the signal's temporal envelope (see, for example, Section 3 of ref [2] or Section 1.2 of ref [3]).
A low prediction gain implies a relatively flat temporal envelope, while a high prediction gain implies a strongly non-flat temporal envelope.
Depending on certain implementation requirements, examples may be implemented in hardware. The implementation may be performed using a digital storage medium, for example a floppy disk, a Digital Versatile Disc (DVD), a Blu-Ray Disc, a Compact Disc (CD), a Read-only Memory (ROM), a Programmable Read-only Memory (PROM), an Erasable and Programmable Read-only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM) or a flash memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
Generally, examples may be implemented as a computer program product with program instructions, the program instructions being operative for performing one of the methods when the computer program product runs on a computer. The program instructions may for example be stored on a machine readable medium.
Other examples comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier. In other words, an example of the method is, therefore, a computer program having program instructions for performing one of the methods described herein, when the computer program runs on a computer.
A further example of the methods is, therefore, a data carrier medium (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier medium, the digital storage medium or the recorded medium is tangible and/or non-transitory, rather than a signal which is intangible and transitory.
A further example comprises a processing unit, for example a computer, or a programmable logic device performing one of the methods described herein.
A further example comprises a computer having installed thereon the computer program for performing one of the methods described herein.
A further example comprises an apparatus or a system transferring (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
In some examples, a programmable logic device (for example, a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some examples, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods may be performed by any appropriate hardware apparatus.
While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
17201094.4 | Nov 2017 | EP | regional |
This application is a continuation of copending International Application No. PCT/EP2018/080339, filed Nov. 6, 2018, which is incorporated herein by reference in its entirety, and additionally claims priority from European Application No. EP 17 201 094.4, filed Nov. 10, 2017, which is incorporated herein by reference in its entirety.
Number | Date | Country | |
---|---|---|---|
Parent | PCT/EP2018/080339 | Nov 2018 | US |
Child | 16868954 | US |