The present invention relates to the field of audio signal encoding and decoding techniques and to Adaptive Differential Pulse Code Modulation (ADPCM) techniques in particular.
For the purpose of data compression, samples of a digital audio signal are frequently coded using entropy coding. However, such samples typically exhibit substantial temporal correlation, leading to redundancy in the compressed stream if they are coded independently. If this redundancy is removed or reduced, the compressed data rate can be reduced further. For example, in a transform based encoder, the correlation causes the signal power to be concentrated in a few of the transformed bands, and the remaining, low-power bands can be adequately represented by a smaller number of bits.
In an Adaptive Differential Pulse Code Modulation (ADPCM) encoder, the current audio sample is predicted on the basis of previous samples, and the predicted value is subtracted from the actual current audio value to leave a ‘residual’. An approximation to the residual is communicated over a transmission channel to the ADPCM decoder which similarly computes a predicted value. The decoder then adds the residual to the predicted value in order to reconstruct an approximation to the original audio sample. The sequence of residual samples is known as an ‘innovation signal’ because it contains that which was not predicted and is therefore new, and the ratio between the audio signal power and the innovation signal power is termed the prediction gain.
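By way of illustration only, a toy ADPCM-style round trip might look like the following Python sketch. The uniform quantiser step and the fixed two-tap predictor are assumptions chosen for brevity (backwards-adaptive prediction, which is the subject of the invention, is discussed later) and are not features of any particular codec.

```python
import numpy as np

def adpcm_round_trip(x, step=0.05):
    """Toy ADPCM loop: predict, quantise the residual, reconstruct.
    The step size and the fixed two-tap predictor are illustrative choices only."""
    h = np.array([1.6, -0.8])            # fixed predictor coefficients (example values)
    state_enc = np.zeros(2)              # last two reconstructed samples, encoder side
    state_dec = np.zeros(2)              # decoder keeps an identical state
    symbols, decoded = [], []

    for sample in x:
        # --- encoder ---
        pred = h @ state_enc                  # prediction from past reconstructed samples
        resid = sample - pred                 # the 'innovation' / residual
        sym = int(np.round(resid / step))     # quantiser index sent over the channel
        recon = pred + sym * step             # encoder's local reconstruction
        state_enc = np.array([recon, state_enc[0]])
        symbols.append(sym)

        # --- decoder (sees only sym) ---
        pred_d = h @ state_dec
        recon_d = pred_d + sym * step
        state_dec = np.array([recon_d, state_dec[0]])
        decoded.append(recon_d)

    return np.array(symbols), np.array(decoded)

t = np.arange(200)
x = np.sin(2 * np.pi * t / 40)
_, y = adpcm_round_trip(x)
print("max reconstruction error:", np.max(np.abs(x - y)))   # bounded by half the quantiser step
```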
It is generally desirable to maximise the prediction gain, i.e. to minimise the innovation signal power, in order to minimise the data rate required to transmit the innovation signal. Linear prediction is almost invariably used and, according to linear prediction theory, the prediction gain is maximised when temporal correlation has been removed. This criterion is equivalent to saying that, viewed in the frequency domain, the innovation signal has a white spectrum. The process of subtracting a linear prediction from the input signal can be expressed as a filtering operation performed by a filter whose z-transform response is W(z). In order to produce a white innovation spectrum the filter W(z) needs to have an amplitude response equal to the inverse of the amplitude spectrum of the input audio signal. Frequently, an adjustable or reconfigurable filter is used in order to follow, at least to some extent, variations in the input spectrum.
The prediction algorithms in an ADPCM encoder and in the corresponding decoder should be kept in step with each other so that the decoder can accurately invert the encoder's prediction processing at all times and there are three methods of achieving this. A fixed predictor can be used, or the encoder can choose suitable predictor settings from time to time and communicate these settings to the decoder, or the encoder and decoder can both use a common method of adapting predictor settings from the innovation signal as conveyed over the transmission channel. This last method is termed backwards adaptive prediction and the invention concerns a method for backwards adaptive prediction.
Backwards adaptive prediction requires of the predictor adaptation method not only that it should result in a predictor having a useful prediction gain, but also that it should be robust against transmission errors. That is, when implemented in both an encoder and a decoder, the decoder's prediction settings should converge to those of the encoder, for example on start up or after a transmission error. Moreover, a transmission error should not perturb the decoder's setting more than necessary. It is usual to employ a filter of fixed architecture but with dynamically adjustable coefficients.
Prediction filters may be implemented digitally in a transversal (Finite Impulse Response or FIR) or a recursive (Infinite Impulse Response or IIR) structure, or a combination of both. Known algorithms for adapting or training a filter include the least-mean-squares (LMS) algorithm of Widrow and Hoff (Bernard Widrow, Samuel D. Stearns in “Adaptive Signal Processing”, Prentice Hall, 1985, ISBN 0-13-004029-0) and its variants. In the context of an encoder, this algorithm explicitly attempts to minimise the power in the residual signal. In order to keep the adaptations within the encoder and decoder in step with each other, both must operate from the same signal, and to this end the encoder quantises the signal for transmission and then uses an inverse quantiser to reconstruct a signal from which both the encoder and the decoder can adapt their predictors.
Whether or not in the context of an ADPCM codec, it is conventional to incorporate ‘leakage’ into an LMS implementation, to improve the stability of the adaptation and prevent filter coefficients from wandering in the presence of arithmetic rounding errors. In an ADPCM codec, however, prediction gain will generally be significantly reduced if sufficient leakage is applied to provide adequate stability of the adaptation. It would therefore be desirable to provide high prediction gain while maintaining good stability and convergence properties in the adaptation.
According to a first aspect of the present invention there is provided a method of adaptively filtering an input signal having an input spectral dynamic range and related to an audio signal to furnish a partially whitened signal having a reduced spectral dynamic range compared to the input spectral dynamic range, the method comprising the steps of:
The present invention thus employs a filter depending on adjustable time-varying coefficients to furnish a partially whitened signal and also employs an adaptive whitening filter for performing a further filtering operation. In many embodiments any signal that results from the filtering by the adaptive whitening filter is not used except internally to guide the adaptation. Rather, the purpose of the adaptive whitening filter is to provide coefficients which vary with time, which represent the response F1(z), which depend on the spectrum of the partially whitened signal and which are used in the step of adjusting. The invention in its first aspect thus allows W(z) to depend on the spectrum of the partially whitened signal in a controlled manner. This allows the spectral relationship of the partially whitened or ‘compressed’ signal to depend in a controlled manner on the spectrum of the input signal, in contrast to prior art methods that provide a compressed signal which is the output of a whitening filter, and which is fully whitened except for imperfections and limitations of the whitening filter that may be more difficult to control.
According to a second aspect of the present invention there is provided a method of adaptively filtering a partially whitened signal having a reduced spectral dynamic range to produce an output signal having an output spectral dynamic range which is increased compared to the reduced spectral dynamic range, the method comprising the steps of:
In this way, the method allows W^(-1)(z) to depend on F1(z), and hence on the spectrum of the partially whitened signal, in a controlled manner. If suitably configured, the invention in its second aspect may thus substantially invert the processing performed by the invention in its first aspect.
In some embodiments, one or more of the adjustable time-varying coefficients are identified with coefficients of the first adaptive whitening filter, such that the step of adjusting those coefficients is performed as part of normal operation of the first adaptive whitening filter. That is, an adaptive whitening filter adjusts its own coefficients, and therefore if these are identified with the adjustable time varying coefficients, for example by using shared memory, then those other coefficients are adjusted automatically.
In some embodiments, the step of adjusting comprises copying coefficients of the first adaptive whitening filter to the adjustable time-varying coefficients. Preferably, the copying comprises scaling a coefficient by a factor γ^j, where 0.7<γ≦1 and where the coefficient multiplies a signal sample having a delay of j samples. This provides a safety margin to ensure that the step of filtering the input signal is minimum-phase, or alternatively that the step of filtering the partially whitened signal according to the second aspect does not result in excessively high amplification.
In the method of the first aspect, the time-varying response W(z) may incorporate a factor F1(z/γ)^n where 0.7<γ≦1 and where n is a positive integer. Similarly, in the method of the second aspect, the time-varying response W^(-1)(z) may incorporate a factor F1(z/γ)^(-n) where 0.7<γ≦1 and where n is a positive integer. Preferably, the positive integer n is selected from the integers 1, 2, 3, 4 and 5.
In some embodiments the step of filtering may be performed using a component filter whose response is proportional to F1(z/γ)^(-1).
The method of the first aspect may comprise the further step of feeding the partially whitened signal to a second adaptive whitening filter, wherein the second adaptive whitening filter filters the partially whitened signal with a time-varying response F2(z), and wherein the response W(z) incorporates a factor F2(z/γ2)^n.
In a similar manner, the method of the second aspect may comprise the further step of feeding the partially whitened signal to a second adaptive whitening filter, wherein the second adaptive whitening filter filters the partially whitened signal with a time-varying response F2(z), and wherein the response W^(-1)(z) incorporates a factor:
F1(z/γ1)^(-n1)·F2(z/γ2)^(-n2)
for some γ1, γ2≦1 and integers n1, n2≧1.
Preferably, the second adaptive whitening filter has lower spectral resolution and adapts more quickly than the first adaptive whitening filter. This allows fast adaptation to sudden changes in broad spectral features of the input signal, while avoiding stability problems that typically occur when an attempt is made to provide fast adaptation simultaneously with high spectral resolution.
It is also preferred that the second adaptive whitening filter has lower spectral resolution than the first adaptive whitening filter and that n2>n1. This allows broad spectral features of the input signal to be compressed more heavily than narrower features, thus avoiding stability problems that typically occur when an attempt is made to provide high compression with a filter of high spectral resolution.
In the method of the first aspect, the time-varying response W(z) may incorporate a fixed factor K(z) where K(z)≠1. Likewise, in the method of the second aspect, the time-varying response W^(-1)(z) may incorporate a fixed factor K^(-1)(z) where K(z)≠1. This allows a falling spectral density representative of typical audio signals to be compensated (or, according to the second aspect, reconstituted) by a fixed filter, the time-varying filters thereby needing to provide less compression in order to reduce the spectral dynamic range of the partially whitened signal to an acceptable level.
Preferably, the first adaptive whitening filter is an FIR filter that adapts its coefficients based on the method of Widrow and Hoff.
Preferably, in the method of the first aspect, the step of filtering comprises the steps of:
The step of deriving may comprise quantising the difference sample or a sample generated therefrom. The latter takes account of the fact that the difference sample may be processed before quantisation. Additionally, or alternatively, the step of deriving comprises furnishing a normalised version of the partially whitened signal, and the step of feeding feeds the normalised version of the partially whitened signal to the first adaptive whitening filter.
Similarly, in the method of the second aspect, it is preferred that the step of filtering comprises the steps of:
According to a third aspect of the present invention, there is provided an encoder adapted to encode a signal according to the method of the first aspect. Preferably, the encoder is an Adaptive Differential Pulse Code Modulation (ADPCM) encoder.
According to a fourth aspect of the present invention, there is provided a decoder adapted to decode a signal according to the method of the second aspect. Preferably, the decoder is an Adaptive Differential Pulse Code Modulation (ADPCM) decoder.
According to a fifth aspect of the present invention, a codec comprises an encoder according to the third aspect in combination with a decoder according to the fourth aspect, wherein the decoder is adapted to produce the output signal having a spectrum substantially the same as a spectrum of the input signal to the encoder. In this way, the decoder substantially undoes the actions of the encoder. The qualifier “substantially” serves to emphasise the fact that the recovered signal may not be identical, since quantisation noise may “fill in” any deep troughs that were present in the input spectrum. However, the output signal components that are correlated with input signal components should have the same spectrum as those input signal components. This condition need not apply to noise components, particularly those that are uncorrelated.
According to a sixth aspect of the present invention, a data carrier comprises an audio signal encoded using the method of the first aspect.
According to a seventh aspect of the present invention, there is provided an encoder for encoding an audio signal, the encoder comprising an adaptive filter that compresses the spectral dynamic range of an input signal by a substantially constant ratio.
Using known techniques, such as LMS (with leakage), the compression ratio applied to a “sine-wave” variation of an input spectrum, for example, is vastly higher when the sine wave lies in a high-level part of the spectrum than when it lies in a low-level part of the spectrum. By comparison, the present invention achieves substantially similar compression in the two cases.
Preferably, a local spectral feature is compressed by a ratio that does not change by more than a factor of two when the input signal changes to reduce the spectral density in the vicinity of the feature from 0 dB to −20 dB relative to the largest spectral density within the spectrum, and wherein the compression ratio is at least a factor 1.5:1 when said spectral density in the vicinity is −20 dB.
The intention here is that the “local feature” is superposed on either a loud part of the spectrum or a quiet part, and the test is performed twice, with the signal being either loud or quiet in the region of the feature. The term “local spectral feature” might imply a visually identifiable feature, but in the case of a low order filter the compression will not be able to follow fine “local” detail or achieve the same clean compression. What is actually intended, therefore, is that if one were to take a first spectrum and make a “broad” perturbation in a region of it where the spectral density is high, and then apply the same perturbation to a second spectrum having a low spectral density in the same region (e.g. lower by 20 dB), then on comparing the four compressed spectra one would find that the perturbation has made substantially the same difference in the two cases. For example, the two compressed perturbations might be within a factor of two of each other.
As will be appreciated by those skilled in the art, the present invention provides an ADPCM encoder and decoder, the encoder receiving an input signal and transmitting information relating to a quantised innovation signal to the decoder. The decoder reconstructs a replica of the quantised innovation signal and both the encoder and decoder train the coefficients of a first adaptive whitening filter having time-varying response denoted by F(z) so as to substantially whiten the quantised innovation signal, within the modelling capability of said filter.
No signal output from this adaptive whitening filter is used for prediction purposes. Instead, the signal path from the input of the encoder to the quantised innovation signal includes a further filter whose coefficients are continually adjusted in dependence on the coefficients of the adaptive whitening filter so that the transfer function from the signal input to the quantiser has a response that includes a factor F(z)^n for some n≧1, or alternatively F(z/γ)^n where 0.7<γ≦1. The decoder similarly comprises a reconstruction filter whose coefficients are continually adjusted in dependence on the adaptive whitening filter within the decoder so that the reconstruction filter has a response that includes a factor F(z)^(-n) or F(z/γ)^(-n). The adjusting may be accomplished by using the same storage for the coefficients of the further filter or reconstruction filter as for the adaptive whitening filter, or alternatively by continually copying the coefficients of the adaptive whitening filter.
The further filter may comprise n cascaded filters whose coefficients are adjusted in dependence on the coefficients of the adaptive whitening filter. The quantised innovation signal can be normalised to have a reduced dynamic range relative to the input signal. The encoder and decoder can both use the quantised innovation signal to train also a second adaptive whitening filter having a different modelling capability from the first adaptive whitening filter. Thus, the further filter in the encoder then comprises also one or more filters whose coefficients are adjusted in dependence on the coefficients of the second adaptive whitening filter, and similarly for the reconstruction filter in the decoder.
Examples of the present invention will now be described in detail with reference to the accompanying drawings, in which:
The invention will now be described with reference to the figures, in which circles represent arithmetic addition except where otherwise indicated by a minus sign or a multiplication sign.
The decoder 102 feeds a received symbol stream 154d to an inverse quantiser 130d which furnishes a received QIS 155d. The received QIS 155d is fed to an adaptive predictor 140d to furnish a predicted signal 156d. The adder 111 adds the predicted signal 156d to the received QIS 155d to furnish the decoder's output signal 152.
In a practical system, the communication path 150 often includes an entropy coder followed by a data formatting unit within the encoder 101, then data storage or a data transmission path to the decoder 102, which in turn unpacks the data and entropy decodes it if necessary to recover the stream of symbols 154d. These aspects are not the subject of the invention and so have not been shown explicitly, it being assumed simply that, subject to transmission errors, the communication path furnishes for the decoder a stream of symbols 154d that is a replica of the symbol stream 154 in the encoder.
Thus, in the absence of transmission errors, and subject to appropriate stability and start-up conditions, signals 155 and 155d in the encoder and decoder respectively should be identical, hence the two filters 140 and 140d should adapt in the same manner and the two predicted signals 156 and 156d should also be identical. It follows that, to the extent that the quantised innovation signal 155 approximates the unquantised innovation signal 153, the decoder's output signal 152 will approximate the encoder's input signal 151.
Prior-art adaptation of an FIR predictor by the LMS method of Widrow and Hoff (Bernard Widrow, Samuel D. Stearns in “Adaptive Signal Processing”, Prentice Hall, 1985, ISBN 0-13-004029-0) is now reviewed with reference to
The prediction error at time-step n is
en = xn − (Hn,1·xn−1 + Hn,2·xn−2 + . . . + Hn,p·xn−p)   (1)
where Hn,1 . . . Hn,p are the p time-varying coefficient values of a prediction filter H at time-step n. The LMS coefficient adaptation formula is then:
Hn+1,i = (1−λ)·Hn,i + μ·en·xn−i,   i = 1 . . . p   (2)
where λ is a small leakage constant, μ is a constant controlling the adaptation speed, en is the prediction error at time-step n and xn−i is the input sample delayed by i samples.
Alternatively, the normalised update formula known as NLMS may be used in order to render the adjustment speed independent of the overall scaling of the input signal 251:
Strictly, z-transform techniques apply only to fixed filters, but if time variation is ignored we can denote the transfer function of H by H(z). Similarly denoting the transfer function from {xn} to {en} by F(z), we have:
F(z) = 1 − H(z)   (4)
For stationary signals, with λ=0 and with small μ, the update formula (2) or (3) adjusts the coefficients of filter H so as to minimise the mean-square value of {en}. It can be shown that this corresponds in the spectral domain to whitening the spectrum of {en}, provided the length p of filter H is sufficiently large. Hence we refer to F(z) as a whitening filter, though the whitening is in general imperfect because of the finite length of H. We say that {xn} has been whitened to furnish {en}, subject to the modelling capability of filter F(z).
Note that H has no delay-free path, that is H(z) = H1·z^-1 + H2·z^-2 + . . . + Hp·z^-p with no term in z^0, and the same is therefore true of 1 − F(z) or of F(z) − 1, a fact that will be used later.
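As an illustrative sketch of the review above, the leaky update (2) and the whitening relation (4) can be realised as follows. The filter length p, the constants μ and λ, the optional power normalisation (standing in for the NLMS variant) and the coloured test signal are all assumed example choices.

```python
import numpy as np

def leaky_lms_whiten(x, p=16, mu=0.01, lam=1e-4, normalise=True):
    """Adapt an FIR predictor H by (leaky) LMS and return the residual e,
    so that e is x filtered by F(z) = 1 - H(z) as in equation (4).
    p, mu and lam are example values only."""
    H = np.zeros(p)                   # H[i] multiplies x[n-1-i], i.e. the coefficient of z^-(i+1)
    past = np.zeros(p)                # delay line: past[0] = x[n-1], past[1] = x[n-2], ...
    e = np.zeros(len(x))

    for n, xn in enumerate(x):
        pred = H @ past               # prediction uses previous samples only (no delay-free path)
        en = xn - pred                # residual / innovation sample
        e[n] = en

        norm = (past @ past + 1e-12) if normalise else 1.0   # NLMS-style normalisation (optional)
        H = (1.0 - lam) * H + (mu / norm) * en * past         # update (2), with leakage (1 - lam)

        past = np.concatenate(([xn], past[:-1]))              # shift the delay line
    return e, H

rng = np.random.default_rng(0)
w = rng.standard_normal(20000)
x = np.zeros_like(w)
for n in range(1, len(w)):            # coloured test signal: white noise through a one-pole filter
    x[n] = 0.9 * x[n - 1] + w[n]

e, H = leaky_lms_whiten(x)
print("prediction gain (dB):", 10 * np.log10(np.var(x) / np.var(e[1000:])))
```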
Inspection of
We now consider the effect of transmission channel errors, which cause erroneous symbols to be received by the decoder, in turn resulting in disturbances to the output 155d of the decoder's inverse quantiser 130d. These disturbances propagate to the decoder's output by at least three mechanisms: directly via adder 111, via the signal path of the predictor 140d, and also by perturbing the adaptation of the adaptive predictor 140d, causing the spectral shape of the decoded output to be altered. It is the object of the invention to minimise the effect of the last of these three disturbance mechanisms.
In the prior art it is known that the perturbation to the adaptation caused by transmission channel errors can be reduced by applying leakage or ‘damping’ to the predictor's coefficients, but that such leakage also degrades system performance when there are no errors (Gibson, J. D. et al., “Kalman Backward Adaptive Predictor Coefficient Identification in ADPCM with PCQ”, IEEE Trans. Communications, vol. COM-28, no. 3, March 1980). A prior art ADPCM codec using IIR prediction filters including leakage is described in detail in Recommendation G.722, “7 kHz Audio—Coding within 64 kbit/s”, International Telecommunications Union, 1988 & 1993.
The present invention recognises that the effect of the adaptive predictor in the decoder is to detect small deviations of the spectrum of the received QIS 155d from a completely flat or ‘white’ spectrum, and to amplify these deviations so that the final decoded signal 152 has the same spectrum as the original signal 151. Thus, although optimal prediction gain would require almost complete whitening of the QIS, such almost complete whitening would require the decoder to amplify the remaining small deviations to an excessive extent, also amplifying spectral disturbances caused by transmission errors.
Fortunately, it can be shown that a variation of about 12 dB in spectral density can be tolerated in the QIS without significant adverse effect on the prediction gain, though a variation significantly greater than 12 dB, except for narrow bands of lower spectral density, will significantly reduce prediction gain. The invention therefore attempts to compress the input spectral dynamic range in a controlled manner, such that the range of 40 dB or more found in many audio signals can be compressed into a smaller range, perhaps of order 12 dB.
If all that were required was compression of a larger spectral range into a smaller one, then the prior art use of leakage would suffice. However, leakage does not perform this compression in an optimal way, as we shall now show.
In
In trace C of each of
A more nearly uniform compression ratio produced by an encoder according to the invention is shown in trace C of
The invention does not require that the compression ratio provided by the encoder be completely uniform, but a reasonable design aim is to ensure that the spectral compression ratio provided by the encoder does not vary by more than a factor two as the input spectral density varies over a useful range of 20 dB, for example a range extending from the highest spectral density in the signal to a density 20 dB below that.
An encoder 401 according to the invention and implementing this controlled spectral compression is shown in
The coefficients of the training filter 460 are passed along the path 450 and used to configure a forward-path filter 441 that receives the input signal 151 and furnishes a signal 153 that is fed to quantiser 120. Filter 441 is intended to implement a response F^n, or an approximation thereto, where n is typically an integer ≧ 1. As F is in general time-varying, filter 441 has adjustable coefficients that are continually adjusted in order to implement the time-varying response F^n.
To analyse the spectral behaviour of this encoder architecture, we assume steady-state conditions and also that the discarded signal output 456 of the fully-adapted filter 460 is indeed spectrally white, having a spectral density k where k is constant. Inspection shows that the transfer function from input 151 to the discarded signal 456 has a factor F^n from filter 441 and a factor F from filter 460 and is thereby F^(n+1). It follows that the spectral density D(f) of the input signal is given by
D = k/F^(n+1)   (5)
from which
F = (k/D)^(1/(n+1))   (6)
The spectral density Q(f) of the QIS is given by
Q = D·F^n = k^(n/(n+1))·D^(1/(n+1))   (7)
The term D^(1/(n+1)) represents a compression factor of (n+1) when the spectral densities D and Q are plotted logarithmically, for example on a decibel scale as in
Given a filter with response F, a filter with response F^n, where n is an integer ≧ 1, may be implemented as n cascaded copies of F.
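Since, by equation (7), the logarithmic compression ratio is (n+1):1, choosing n = 3, for example, compresses a 48 dB input spectral range into 12 dB. A minimal sketch of the cascade, assuming the transversal form F(z) = 1 − H(z) reviewed earlier (the coefficient values below are illustrative only):

```python
import numpy as np

def apply_F(x, H):
    """One whitening stage: y[n] = x[n] - sum_i H[i] * x[n-1-i], i.e. F(z) = 1 - H(z)."""
    y, past = np.zeros(len(x)), np.zeros(len(H))
    for n, xn in enumerate(x):
        y[n] = xn - H @ past
        past = np.concatenate(([xn], past[:-1]))
    return y

def apply_F_power(x, H, n):
    """F(z)^n realised as n cascaded copies of the same stage (shared coefficients, separate states)."""
    y = np.asarray(x, dtype=float)
    for _ in range(n):
        y = apply_F(y, H)
    return y

# Example: a first-order F(z) = 1 - 0.9*z^-1 cascaded three times, i.e. (1 - 0.9*z^-1)^3
H = np.array([0.9])
print(apply_F_power(np.ones(8), H, 3))
```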
The decoder 402 of
As noted earlier, (F − 1) and (1 − F) are filters that do not contain a delay-free path, so that it is legitimate to put feedback around them. We assume further that F is causal and minimum-phase, so that F^(-1) and F^(-n) are also causal minimum-phase filters.
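A sketch of this feedback construction, again assuming the transversal form F(z) = 1 − H(z): because 1 − F(z) = H(z) contains no delay-free path, each output sample can be computed from the current input and previously computed outputs, which realises F^(-1) exactly; cascading n such stages gives F^(-n). The coefficient value is again illustrative only.

```python
import numpy as np

def apply_F(x, H):
    """Forward whitening stage F(z) = 1 - H(z) (repeated here so the sketch is self-contained)."""
    y, past = np.zeros(len(x)), np.zeros(len(H))
    for n, xn in enumerate(x):
        y[n] = xn - H @ past
        past = np.concatenate(([xn], past[:-1]))
    return y

def apply_F_inverse(x, H):
    """F(z)^-1 by feedback: y[n] = x[n] + sum_i H[i] * y[n-1-i].
    Legitimate because 1 - F(z) = H(z) has no delay-free path; stable when F is minimum-phase."""
    y, past_y = np.zeros(len(x)), np.zeros(len(H))
    for n, xn in enumerate(x):
        y[n] = xn + H @ past_y                        # feedback uses previously computed outputs only
        past_y = np.concatenate(([y[n]], past_y[:-1]))
    return y

def apply_F_inverse_power(x, H, n):
    """F(z)^-n realised as n cascaded feedback stages."""
    for _ in range(n):
        x = apply_F_inverse(x, H)
    return x

# Round trip: three forward stages followed by three inverse stages recovers the input exactly.
H = np.array([0.9])
x = np.sin(2 * np.pi * np.arange(32) / 8)
fwd = apply_F(apply_F(apply_F(x, H), H), H)                    # F(z)^3
print(np.max(np.abs(apply_F_inverse_power(fwd, H, 3) - x)))    # ~0 up to rounding
```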
Under these conditions the architecture of
In the codecs of
In
The codec of
Meanwhile, it will be clear that the action of decoder 502 can be made identical to that of decoder 402 if filter 540d is configured to have a response F^(-n) − 1 as shown, so that filter 540d and adder 511 in combination implement the response F^(-n) of filter 442. Since the signal response of encoder 401 is the inverse of the signal response of decoder 402, it follows that the encoders 401 and 501 have identical responses F^n to a signal, and therefore that the codec of
We now consider how the responses F^n − 1 and F^(-n) − 1 of filters 540 and 540d may be implemented, taking as an example the case n=3.
The structure of
In practice, implementation of several copies of a filter having a response F or F−1 is unlikely to result in a severe load on a typical Digital Signal Processor, which is well suited to the task. The major computational load is likely instead to be in the adaptation of the coefficients of the training filter 460.
The adaptation of the training filter 460 may be performed using any of several prior art methods, such as the normalised least-mean-squares (NLMS) method discussed previously. In some cases it will be necessary to adjust the adaptation parameters on account of the coefficient feedback path 450 in the encoder. The adaptation may be stabilised if necessary by adapting the coefficients more slowly, that is, by reducing the parameter μ in the adaptation formulas (2) and (3) displayed earlier. Since the input 155 of the adaptive whitening filter has the benefit of spectral compression it may be possible to apply moderate leakage without suffering the severe variation in the spectral compression ratio that results, as noted in relation to
A plain LMS adaptation algorithm is unable to maintain a sensible adaptation rate in the presence of the very large changes in signal amplitude found in typical audio signals. The NLMS algorithm incorporates normalisation to address this problem, while the G.722 algorithm (Recommendation G.722, “7 kHz Audio—Coding within 64 kbit/s”, International Telecommunications Union, 1988 & 1993) addresses the problem of dynamic range by considering only the signs (positive, zero or negative) and disregarding the magnitudes of the signals relevant to the adaptation.
Another method to cope with wide dynamic range is to feed the training filter from a normalised version of the QIS. Such a normalised signal will typically be produced within each inverse quantiser of an ADPCM codec as shown in
The normalised QIS 652 will differ spectrally from the unnormalised QIS 651 because in quantiser 120 the effective division by the time-varying scale factor 653 will generate sidebands; nevertheless the use of the normalised QIS for training will often be more attractive on grounds of computational efficiency than using a self-normalising adaptation algorithm such as NLMS. Normalisation will not normally be instantaneous, and sudden increases in the amplitude of the original audio signal may cause the signal 652 to take a large value transiently and possibly provoke instability through excessive adaptation speed. Such transient misbehaviour can be controlled by use of clipping, for example by clipping the signal 251 in
For the decoder 402 to implement a filter 442 having response F−n, the gain of F must not be zero at any frequency; moreover F must have a minimum-phase response if F−n is to be a causal and stable filter. The same conditions are required in the decoder 502. It is known that the LMS and similar algorithms do converge in the mean towards a minimum-phase response, but as the process is stochastic there is no guarantee that the minimum-phase condition will not be violated transiently.
In the z-transform domain, the minimum-phase condition is that the zeroes of F(z) should always lie inside the unit circle, so that the poles of F(z)^(-n) also lie inside the unit circle. Ideally there should be some safety margin, since poles close to the unit circle result in high gain in the decoder, and hence amplification of any disturbances to the received QIS 155d caused by transmission channel errors.
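For a transversal F(z) = 1 − H1·z^-1 − … − Hp·z^-p the condition is easy to check numerically; the sketch below simply examines the magnitudes of the zeroes, with the margin value being an assumed example.

```python
import numpy as np

def is_minimum_phase(H, margin=0.98):
    """True if every zero of F(z) = 1 - H[0]*z^-1 - ... - H[p-1]*z^-p has magnitude
    below 'margin'; margin < 1 leaves a safety band inside the unit circle."""
    poly = np.concatenate(([1.0], -np.asarray(H, dtype=float)))   # coefficients of z^p, z^(p-1), ..., z^0
    return bool(np.all(np.abs(np.roots(poly)) < margin))

print(is_minimum_phase([0.9]))             # single zero at 0.9: acceptable
print(is_minimum_phase([1.99, -0.9925]))   # complex zeros of magnitude ~0.996: margin violated
```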
An example of a signal that may cause trouble is a single sinewave, which theoretically will cause a plain LMS algorithm to converge to a filter having zeroes of F(z) on the unit circle. Coefficient leakage will restrain the zeroes somewhat, but leakage alone is not always sufficient and has disadvantages as already mentioned. In the case that filter 441 is a transversal (FIR) filter, the desired safety margin can be obtained by simple adjustment of coefficients as they are transferred along the copying path 450, using a technique known as ‘bandwidth expansion’ (P. Kabal, “Ill-Conditioning and Bandwidth Expansion in Linear Prediction of Speech”, Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, pp. I-824–I-827, 2003). According to this technique, the coefficient of z^-i is scaled (i.e., multiplied) by γ^i for some 0<γ<1. This has the effect of multiplying each zero of the filter 441 by a factor γ, i.e. bringing the zero to a position closer to the centre of the unit circle. If the unmodified response of the filter 441 was F(z)^n and the response of the modified filter is F′(z)^n, then
F′(z) = F(z/γ)   (8)
The replacement of F(z) by F′(z) should of course be made consistently between filters 441 and 442 in the encoder 401 and decoder 402, implying that this replacement should also apply in filters 461 and 462; similarly the replacement should be made in filters 540 and 540d. For filters that are not plain transversal filters, the same principles can be applied: for example, a coefficient of z^-i can be scaled by a factor γ^i in both the numerator and denominator of a biquad or other recursive filter. Although the coefficient modification must be applied consistently between an encoder and a decoder, consistent modification is not required in the different sections of a filter. For example, it may not be necessary to modify the denominator of a recursive filter F, or alternatively a different modification may be used.
A suitable value for γ may be found empirically. If γ is too small, prediction gain will be reduced excessively. If γ is too large, then there may be inadequate safety margin against violation of the minimum-phase condition. Use of γ<1 also restrains the encoder from placing excessively deep nulls in its prediction filter in the case that the input signal contains one or more pure sine waves. Such “notching out” in the encoder would require the decoder to provide a very high resonant peak in its response, increasing the sensitivity to transmission channel errors. Leakage alone will be ineffective in controlling this behaviour, and a combination of a modest amount of leakage and coefficient modification using γ is recommended. Bandwidth expansion multiplies the last tap of an n-tap filter by γ^n, and it may be considered that coefficients that are thus multiplied by a factor of less than, say, 0.5 do not provide enough decorrelation to justify their computational cost, so that a filter with fewer taps should be considered. In practice a value γ=0.875 has been used for a fourth order filter, or γ=0.995 for a 64th order filter.
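A sketch of the coefficient adjustment made on the copying path 450 for a transversal H(z), so that F(z) = 1 − H(z) becomes F(z/γ) = 1 − H(z/γ) as in equation (8). The coefficient values are illustrative; γ = 0.875 is the fourth-order value quoted above.

```python
import numpy as np

def bandwidth_expand(H, gamma):
    """Scale the coefficient multiplying z^-i by gamma^i when copying, so the configured
    filter implements F(z/gamma) rather than F(z)."""
    i = np.arange(1, len(H) + 1)                   # H[i-1] is the coefficient of z^-i in H(z)
    return np.asarray(H, dtype=float) * gamma ** i

H = np.array([0.5, -0.3, 0.2, -0.1])               # illustrative coefficients only
H_bw = bandwidth_expand(H, 0.875)

# Each zero of F(z) = 1 - H(z) is pulled towards the origin by the factor gamma:
zeros    = np.roots(np.concatenate(([1.0], -H)))
zeros_bw = np.roots(np.concatenate(([1.0], -H_bw)))
print(np.sort(np.abs(zeros)))
print(np.sort(np.abs(zeros_bw)))                   # the same magnitudes multiplied by 0.875
```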
Typical audio applications benefit from the use of more than one prediction filter, so that a single filter does not have to address conflicting requirements. For example, speech contains both voiced vowels that have a harmonic structure, and explosive consonants that cause sudden spectral changes and require rapid adaptation if prediction gain is to be maintained. The voiced vowels favour the use of a high order filter, for example a transversal filter with several tens of taps, while the explosive consonants are more suited to a low order filter having lower spectral resolution but configured for faster adaptation.
If the transfer functions F(z) and G(z) were to have the same modelling capability, and if the corresponding training filters 460 and 760 were to use the same adaptation algorithm, then these training filters 760 and 460 would converge to have the same response and the architecture of
While F(z) may provide high spectral resolution and G(z) may provide speed of response, broad spectral features that persist for some time may be ‘seen’ by both training filters, and therefore corrected by both forward-path filters F(z)^n and G(z)^m. These features will thereby be compressed spectrally in the ratio (m+n+1):1. If this higher spectral compression ratio is considered undesirable, it can be avoided by feeding the training filter 760 not from the QIS but from a signal such as 456 in
It will be clear that the architecture of
W(z) = K(z)·F(z)^n·G(z)^m   (9)
or alternatively by
W(z) = K(z)·F(z/γ1)^n·G(z/γ2)^m   (10)
where γ1, γ2 provide bandwidth expansion, perhaps differently for the two factors F and G as discussed earlier. The decoder's response W^(-1)(z) is the inverse of the encoder's response W(z).
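Purely as a numerical illustration of equation (10), the combined encoder response can be evaluated on the unit circle by multiplying the responses of its factors; the decoder's W^(-1)(z) is simply the reciprocal at each frequency. Every coefficient value, order and γ below is an assumed example, not a value taken from the text.

```python
import numpy as np

def fir_freq_response(coeffs, w):
    """Response of sum_k coeffs[k] * z^-k evaluated at z = exp(j*w)."""
    k = np.arange(len(coeffs))
    return np.array([np.dot(coeffs, np.exp(-1j * wk * k)) for wk in w])

def bandwidth_expand(coeffs, gamma):
    """Replace X(z) by X(z/gamma): scale the coefficient of z^-k by gamma^k."""
    return np.asarray(coeffs, dtype=float) * gamma ** np.arange(len(coeffs))

w = np.linspace(0.01, np.pi, 512)

K = np.array([1.0, -0.5])                    # assumed fixed pre-emphasis K(z) = 1 - 0.5*z^-1
F = np.array([1.0, -1.2, 0.4])               # assumed 'slow, high-resolution' factor F(z)
G = np.array([1.0, -0.6])                    # assumed 'fast, low-order' factor G(z)
n, m, g1, g2 = 3, 1, 0.995, 0.875            # example orders and bandwidth-expansion factors

W = (fir_freq_response(K, w)
     * fir_freq_response(bandwidth_expand(F, g1), w) ** n
     * fir_freq_response(bandwidth_expand(G, g2), w) ** m)

mag_db = 20 * np.log10(np.abs(W))
print("encoder response spans %.1f dB" % (mag_db.max() - mag_db.min()))
# The decoder would apply 1/W at each frequency, restoring the original spectral shape.
```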
As in the decoder 402 of
Referring to
The ordering of F and G filters is significant, the two architectures of
It will be clear how the architectures of