This invention relates to an encoder and a vector estimation system and method for processing a sequence of input vectors to determine a filtered estimate vector for each input vector. The invention is particularly useful for, but not necessarily limited to, determining filtered estimate vectors to be encoded by a speech encoder and transmitted over a communication link.
A digital speech communication or storage system typically uses a speech encoder to produce a parsimonious representation of the speech signal. A corresponding decoder is used to generate an approximation to the speech signal from that representation. The combination of the encoder and decoder is known in the art as a speech codec. As will be apparent to a person skilled in the art, many segments of speech signals contain quasiperiodic waveforms. Accordingly, consecutive cycles of quasiperiodic waveforms can be considered, and processed, by a speech codec as data vectors that evolve slowly over time.
An important element of a speech codec is the way it exploits correlation between consecutive cycles of quasiperiodic waveforms. Frequently, correlation is exploited by transmitting a single cycle of the waveform, or of a filtered version of the waveform, only once every 20–30 ms, so that a portion of the data is missing in the received signal. In a typical decoder the missing data is determined by interpolating between samples of the transmitted cycles.
In general, the use of interpolation by a speech decoder to generate data between the transmitted cycles only produces an adequate approximation to the speech signal if the speech signal really is quasiperiodic, or, equivalently, if the vectors representing consecutive cycles of the waveform evolve sufficiently slowly. However, many segments of speech contain noisy signal components, and this results in comparatively rapid evolution of the waveform cycles. In order for waveform interpolation in an encoder to be useful for such signals, it is necessary to extract a sufficiently quasiperiodic component from the noisy signal in the encoder. This extracted component may be encoded by transmitting only selected cycles and decoded by interpolation in the manner described above. The remaining noisy component may also be encoded using other appropriate techniques and combined with the quasiperiodic component in the decoder.
Linear low-pass filtering, in the time dimension, of a sequence of vectors representing consecutive cycles of speech is well known in the speech coding literature. The difficulty with this approach is that good separation of the slowly and rapidly evolving components requires a low-pass frequency response with a sharp roll-off. This in turn requires a long impulse response, and hence an undesirably large filter delay.
A Kalman filter technique for estimating quasiperiodic signal components has been described by Gruber and Todtli (IEEE Trans Signal Processing, Vol. 42, No. 3, March 1994, pp 552–562). However, because this Kalman filter technique is based on a linear dynamic system model of a frequency domain representation of the signal, it is unnecessarily complex. It also assumes that the dynamic system model parameters (i.e. noise energy and the harmonic signal gain) are known. However, when considering speech coding, noise energy and the harmonic signal gain parameters are not known.
A technique for determining the system parameters required in a Kalman filter using an Expectation Maximisation algorithm has been described in a more general setting by Digalakis et al (IEEE Trans Speech and Audio Processing, Vol. 1, No. 4, October 1993, pp 431–442). However, the technique is iterative and, in the absence of good initial estimates, may converge slowly. It may also produce a result that is not globally optimal, and no prior art method is known for obtaining good initial estimates. Further, this method typically requires a significant amount of data over which the unknown parameters are constant. In the context of speech coding, where the parameters change continuously, rapid estimation is essential, and this application of the Expectation Maximisation algorithm therefore needs to be improved.
Stachurski (PhD Thesis, McGill University, Montreal Canada, 1997) proposed a technique for estimating quasiperiodic signal components of a speech signal. This method involves minimizing a weighted combination of estimated noise energy and a measure of rate of change in the quasiperiodic component. This method is highly complex and does not allow the rate of evolution of the quasiperiodic component to be specified independently. Nor does it allow for an independently varying gain for the quasiperiodic component.
In this specification, including the claims, the terms "comprises", "comprising" or similar terms are intended to mean a non-exclusive inclusion, such that a method or apparatus that comprises a list of elements does not include those elements solely, but may also include other elements not listed.
According to one aspect of the invention there is provided a vector estimation system for processing a sequence of input vectors, said input vectors each comprising a plurality of element values, and said system comprising:
Suitably, said parameter estimator may be characterised by said current predictor gain element values being dependent upon both a sequence of previous input vectors and a sequence of said previous filtered estimate vectors.
Preferably, said filter may have a predictor error variance output and an observation noise variance input, said predictor error variance output providing a current predictor error variance vector of current predictor error variance element values.
Suitably, when said vector estimation system receives said current input vector, said parameter estimator may provide a current observation noise variance vector of current observation noise variance element values at said observation noise variance output thereby modifying said current filtered estimate element values at said current slowly evolving filter estimate output, said current observation noise variance element values being dependent upon said previous filtered estimate vector received at said previous slowly evolving filter estimate input, said current input vector received at said estimator vector input, a said current predictor gain vector and said current predictor error variance vector.
Preferably, the parameter estimator may have an unvoiced speech module that determines the current input vector's harmonic energy content by assessing the current predictor gain element values and depending upon the current predictor gain element values the parameter estimator selectively sets the current observation noise variance values.
According to another aspect of the invention there is provided a vector estimation system for processing a sequence of input vectors, said input vectors each comprising a plurality of element values, and said system comprising:
Preferably, the parameter estimator may have an unvoiced speech module that determines the current input vector's harmonic energy content by assessing the current predictor gain element values and depending upon the current predictor gain element values the parameter estimator selectively sets the current observation noise variance values.
Suitably, said digital filter may further include: a slowly evolving predicted estimate output providing a current predicted estimate vector of current predicted estimate element values of said slowly evolving component of said sequence of input vectors. The digital filter may also have a process noise variance input.
Suitably, there may be a smoother module having inputs coupled respectively to at least two outputs of said digital filter.
Preferably, said smoother module may have five inputs coupled to respective outputs of said filter. Preferably, said smoother module may have a smoothed estimate output providing a smoothed estimate value of a previous slowly evolving component.
Suitably, said smoothed estimate output is coupled to a smoothed estimate input of said parameter estimator.
According to another aspect of the invention there is provided a method for processing a sequence of input vectors each comprising a plurality of elements, said vectors being applied to a vector estimation system having a parameter estimator coupled to a digital filter, said method comprising the steps of:
Preferably, said step of determining may be further characterised by providing a current observation noise variance vector of current observation noise variance element values and a current predictor error variance vector of current predictor error variance element values from said current input vector.
Suitably, said step of applying may be further characterised by said filter receiving said current observation noise variance element values thereby modifying said current filtered estimate element values, each of said current observation noise variance element values being dependent upon a said previous filtered estimate vector, said current input vector, a said current predictor gain element vector and said current predictor error variance vector.
According to another aspect of the invention there is provided a method for processing a sequence of input vectors each comprising a plurality of elements, said vectors being applied to a vector estimation system having a parameter estimator coupled to a digital filter, said method comprising the steps of:
Preferably, the filter may be a Kalman filter.
According to another aspect of the invention there is provided an encoder for processing a speech signal, said encoder comprising:
Preferably, the encoder may include an adder module with one input coupled to said slowly evolving filter estimate output and another input coupled to the output of the signal normalization module, wherein in use said adder module subtracts said current filtered estimate element values at the output of the vector estimation system from at least one of the elements of the sequence of input vectors.
Suitably, an output of the adder module may be coupled to a rapidly evolving component encoder.
Suitably, said parameter estimator may be characterised by said current predictor gain element values being dependent upon both a sequence of previous input vectors and a sequence of filtered estimate vectors.
In order that the invention may be readily understood and put into practical effect, reference will now be made to a preferred embodiment as illustrated with reference to the accompanying drawings in which:
In the drawings, like numerals on different Figs are used to indicate like elements throughout. Referring to
The parameter estimator 10 has four inputs and three outputs. The parameter estimator 10 inputs are an estimator vector input 19 coupled to the vector input 3; a previous slowly evolving filter estimate input 13 coupled to the previous slowly evolving filter estimate output 20; a current predictor error variance input 15 coupled to the current predictor error variance output 21; and a smoothed estimate input 16. The three outputs of the parameter estimator 10 are a predictor gain output 11 coupled to the predictor gain input 4; an observation noise variance output 12 coupled to the observation noise variance input 5; and an OnsetFlag output 22 coupled to the OnsetFlag input 26.
The smoother module 17 has six inputs: one coupled to the slowly evolving filter estimate output 6; one coupled to the slowly evolving predicted estimate output 7; one coupled to the previous filter error variance output 9; one coupled to the previous slowly evolving filter estimate output 20; one coupled to the predictor error variance output 21; and one coupled to the predictor gain output 11. The smoother module 17 also has a smoothed estimate output 18, which provides an output for the vector estimation system 1 and is coupled to the smoothed estimate input 16 of the parameter estimator 10.
Referring to
An output from the previous filtered state adjustment module 33 provides the previous slowly evolving filter estimate output 20 that is coupled to an input of a predicted state estimation module 35. Another input to the predicted state estimation module 35 is provided by the predictor gain input 4. An output of the predicted state estimation module 35 provides the slowly evolving predicted estimate output 7 that is coupled to an input of the filtered state estimation module 31.
The output from the Kalman gain determination module 30 is also coupled to an input of a filter variance estimation module 32 that has an output coupled to an input to a previous filter variance adjustment module 36. An output from the previous filter variance adjustment module 36 provides the previous filter error variance output 9 that also provides an input to a predictor variance estimation module 37. Other inputs to the predictor variance estimation module 37 are provided by the predictor gain input 4, process noise variance input 25, OnsetFlag input 26 and observation noise variance input 5. An output from the predictor variance estimation module 37 provides the predictor error variance output 21 that is coupled to inputs of the Kalman gain determination module 30, the filter variance estimation module 32 and previous filter variance adjustment module 36. Other inputs to the previous filter variance adjustment module 36 are provided by the predictor gain input 4, the process noise variance input 25 and the OnsetFlag input 26.
As will be apparent to a person skilled in the art, the characteristics of the digital filter 2 are formalised in equations (1)–(6) below.
At an nth input vector yn (a current input vector) of the series of input vectors (y0 to yT) received by the system 1, the previous filtered state adjustment module 33 provides, at the previous slowly evolving filter estimate output 20, a previous filtered estimate vector xf,n−1 of previous filtered estimate element values xf,n−1,i.
The OnsetFlag input 26 is a binary signal input that indicates whether or not the beginning of a signal segment containing a significant amount of harmonic energy (determined by a threshold value) has been detected. If OnsetFlag input 26 is set to a value that indicates that the beginning of such a segment has been detected, then the previous filtered estimate vector xf,n−1 is set to a previous input vector yn−1.
For the current input vector yn, the digital filter 2 provides a current predicted estimate vector xp,n of current predicted estimate element values xp,n,i at the predicted estimate output 7. Each of the current predicted estimate element values xp,n,i is computed according to:
$$x_{p,n,i} = \alpha_{n,i}\, x_{f,n-1,i} \qquad (1)$$
Where i is an index identifying an element of a vector; and αn,i is a current predictor gain element value of a current predictor gain vector αn for an ith element in the nth input vector yn, provided at the predictor gain input 4.
Once the current predicted estimate vector xp,n is computed, a current filtered estimate vector xf,n of current filtered estimate element values xf,n,i is also provided, for the current input vector yn, at the slowly evolving filter estimate output 6. Each of the current filtered estimate element values xf,n,i is computed according to:
$$x_{f,n,i} = x_{p,n,i} + K_{n,i}\,(y_{n,i} - x_{p,n,i}) \qquad (2)$$
where $K_{n,i}$ is a current Kalman gain element value of a current Kalman gain vector $K_n$ of the digital filter 2 for the ith element of the nth (current) input vector $y_n$.
The Kalman gain element value Kn,i is computed according to:
$$K_{n,i} = \frac{\Sigma_{p,n,i}}{\Sigma_{p,n,i} + \sigma^2_{v,n,i}} \qquad (3)$$
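where $\Sigma_{p,n,i}$ is a current predictor error variance element value of a current predictor error variance vector $\Sigma_{p,n}$, provided at the predictor error variance output 21, for the ith element of the nth input vector $y_n$; and $\sigma^2_{v,n,i}$ is a current observation noise variance element value of a current observation noise variance vector $\sigma^2_{v,n}$, provided at the observation noise variance input 5, for the same element.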
If the OnsetFlag is set to a value that indicates that the beginning of a signal segment containing a significant amount of harmonic energy has been detected, then the current predictor error variance vector $\Sigma_{p,n}$ is typically set to the current observation noise variance vector $\sigma^2_{v,n}$.
This results in Equation (3) producing the current Kalman gain element value Kn,i equal to 0.5 for all elements of the Kalman gain vector Kn.
If the OnsetFlag is set to a value that indicates that the beginning of a signal segment containing a significant amount of harmonic energy has not been detected, then the current predictor error variance element values Σp,n,i are computed according to:
$$\Sigma_{p,n,i} = \alpha_{n,i}^2\, \Sigma_{f,n-1,i} + \sigma_w^2 \qquad (4)$$
where $\sigma_w^2$ is a process noise variance value provided at the process noise variance input 25; and $\Sigma_{f,n-1,i}$ is a previous filtered error variance element value of a previous filtered error variance vector $\Sigma_{f,n-1}$ for the ith element of a previous input vector $y_{n-1}$.
If the OnsetFlag is set to a value that indicates that the beginning of a signal segment containing a significant amount of harmonic energy has not been detected, then a current filtered error variance element value Σf,n,i of a current filtered error variance vector Σf,n, provided at the output of the filter variance estimation module 32, is computed according to:
$$\Sigma_{f,n,i} = (1 - K_{n,i})\, \Sigma_{p,n,i} \qquad (5)$$
If the OnsetFlag is set to a value that indicates that the beginning of a signal segment containing a significant amount of harmonic energy has been detected, then each current filtered error variance element value Σf,n,i is computed according to:
$$\Sigma_{f,n,i} = \frac{\Sigma_{p,n,i} - \sigma_w^2}{\alpha_{n,i}^2} \qquad (6)$$
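The element-wise recursion of equations (1) to (6) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `kalman_step`, its argument order, and the representation of the OnsetFlag as a Boolean `onset` argument are hypothetical, and $\alpha_n$, $\sigma^2_{v,n}$ and $\sigma_w^2$ are simply passed in rather than produced by the parameter estimator 10.

```python
import numpy as np


def kalman_step(x_f_prev, sigma_f_prev, y, alpha, sigma_v2, sigma_w2,
                onset=False, y_prev=None):
    """One element-wise filter update following equations (1)-(6).

    Vector arguments are length-N sequences; sigma_w2 is a scalar.
    Returns (x_p, x_f, sigma_p, sigma_f, k).
    """
    y = np.asarray(y, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    sigma_v2 = np.asarray(sigma_v2, dtype=float)

    if onset:
        # Voicing onset: restart from the previous input vector and set the
        # predictor error variance equal to the observation noise variance.
        x_f_prev = np.asarray(y_prev, dtype=float)
        sigma_p = sigma_v2.copy()
    else:
        x_f_prev = np.asarray(x_f_prev, dtype=float)
        # Equation (4): predictor error variance.
        sigma_p = alpha * alpha * np.asarray(sigma_f_prev, dtype=float) + sigma_w2

    x_p = alpha * x_f_prev                 # (1) predicted estimate
    k = sigma_p / (sigma_p + sigma_v2)     # (3) Kalman gain
    x_f = x_p + k * (y - x_p)              # (2) filtered estimate

    if onset:
        sigma_f = (sigma_p - sigma_w2) / alpha**2   # (6); assumes alpha != 0
    else:
        sigma_f = (1.0 - k) * sigma_p               # (5)
    return x_p, x_f, sigma_p, sigma_f, k
```

Because every quantity is indexed by element i, each element is filtered independently. With `onset=True`, setting $\Sigma_{p,n} = \sigma^2_{v,n}$ makes every gain equal to 0.5, as stated for equation (3).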
Referring to
The initial parameter estimation module 40 computes initial estimates of the current predictor gain element values $\alpha_{n,i}$ and the current observation noise variance element values $\sigma^2_{v,n,i}$. These are determined as follows:
αn,i=αn(i,an,0, . . . an,m
where an,0 . . . an,m
In general, the functions in (7a) and (8a) may take a variety of forms. In one preferred embodiment, where the indexes $m_a$ and $m_b$ equal 2, the parameter estimator 10 computes estimates of the current predictor gain element values $\alpha_{n,i}$ and the current observation noise variance element values $\sigma^2_{v,n,i}$ as follows:
$$\alpha_{n,i} = a_{n,0} + a_{n,1}\, \frac{i}{N} \qquad (7b)$$
It may be assumed that smoothness constraints apply to $\alpha_{n,i}$ and $\sigma^2_{v,n,i}$ at the boundaries between consecutive cycles (input vectors). We may assume, for example, that the function $\alpha_n$ evaluated at $i = 0$ is the same as the function $\alpha_{n-1}$ evaluated at $i = N$. Hence $a_{n,0}$ is equal to $\alpha_{n-1,N}$, and $b_{n,0}$ is equal to $\sigma^2_{v,n-1,N}$.
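For the preferred form (7b), the boundary constraint leaves only $a_{n,1}$ to be estimated for each new cycle. The sketch below assumes that elements are indexed $i = 1, \ldots, N$, so that $i = 0$ corresponds to the end of the previous cycle. The function name `predictor_gain_vector` is hypothetical, and $a_{n,1}$ is passed in as an argument because equation (9), which determines it, is not reproduced in this text.

```python
import numpy as np


def predictor_gain_vector(alpha_prev_last, a1, n_elements):
    """Predictor gains alpha_{n,i} for i = 1..N from equation (7b).

    The boundary constraint fixes a_{n,0} to the last gain of the previous
    cycle, alpha_{n-1,N}; a1 is the slope parameter a_{n,1}.
    """
    a0 = alpha_prev_last                  # a_{n,0} = alpha_{n-1,N}
    i = np.arange(1, n_elements + 1)      # assumed element indexes 1..N
    return a0 + a1 * i / n_elements       # equation (7b)
```

For example, `predictor_gain_vector(0.8, 0.1, 4)` gives gains 0.825, 0.85, 0.875 and 0.9, and the last of these becomes $a_{n+1,0}$ for the following cycle, so the gain is continuous across the cycle boundary.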
Furthermore, $a_{n,1}$ is calculated using equation (9) as follows:
The parameter $b_{n,1}$ is calculated by substituting equation (8b) into equation (10).
In order to determine $b_{n,1}$, we need to substitute for $\sigma^2_{v,n,i}$ by using equation (8b), and then substitute for $x_{f,n,i}$ by using equations (2) and (3). This results in the following equation (10b):
As will be apparent to a person skilled in the art, from equation (10b), bn,1 can be determined by an iterative method, such as the Newton-Raphson algorithm.
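Equation (10b) is not reproduced in this text, so the sketch below shows only the generic Newton-Raphson iteration $b \leftarrow b - g(b)/g'(b)$ that could be used to solve an equation of the form $g(b_{n,1}) = 0$. The residual `g`, its derivative `dg`, the starting value `b0`, the tolerance and the iteration limit are all hypothetical placeholders, not part of the described method.

```python
def newton_raphson(g, dg, b0, tol=1e-12, max_iter=50):
    """Solve g(b) = 0 by Newton-Raphson iteration starting from b0."""
    b = b0
    for _ in range(max_iter):
        step = g(b) / dg(b)   # Newton update: b <- b - g(b) / g'(b)
        b -= step
        if abs(step) < tol:   # stop once the update becomes negligible
            return b
    raise RuntimeError("Newton-Raphson did not converge")
```

As a stand-in example, `newton_raphson(lambda b: b * b - 2.0, lambda b: 2.0 * b, 1.0)` converges to the square root of 2 in a handful of iterations. As noted for iterative methods generally, convergence depends on a reasonable starting value.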
The unvoiced speech adjustment module 41 determines whether the current input vector $y_n$ represents a segment of speech that contains no significant harmonic energy, and if so selectively sets the current predictor gain vector $\alpha_n$ and the current observation noise variance vector $\sigma^2_{v,n}$ appropriately. Preferably, the unvoiced speech adjustment module 41 determines that the current input vector $y_n$ represents a segment of speech that contains no significant harmonic energy by detecting whether either of the following conditions is true:
If either condition (i) or (ii) holds, then typically the unvoiced speech adjustment module 41 will set $\alpha_{n,i}$ to 1.0 and re-compute $\sigma^2_{v,n,i}$ accordingly using equation (8).
The voicing onset adjustment module 42 determines whether the current input vector $y_n$ represents the second cycle of a segment of speech containing a significant amount of harmonic energy, and if so adjusts the current predictor gain element values $\alpha_{n,i}$ and the current observation noise variance element values $\sigma^2_{v,n,i}$ to more appropriate values and sets the OnsetFlag to a value indicating that voicing onset has been detected.
Typically, the voicing onset adjustment module 42 determines that the current input vector $y_n$ is the second cycle of a segment of speech containing a significant amount of harmonic energy as follows. An input prediction gain $\beta$ is computed according to:
$$\beta = \frac{y_n^T\, y_{n-1}}{y_{n-1}^T\, y_{n-1}} \qquad (11)$$
Input prediction error variance values $\sigma^2_{e,i}$ are computed according to:
$$\sigma^2_{e,i} = \frac{y_n^T\,(y_n - \beta\, y_{n-1})}{N} \qquad (12)$$
where $\sigma^2_{e,i}$ is the same for all elements of the vector $\sigma^2_e$.
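Equations (11) and (12) amount to a single least-squares prediction of $y_n$ from $y_{n-1}$. The NumPy sketch below computes both quantities and returns $\sigma^2_e$ as a length-N vector whose elements are all equal, as the text notes; the function name `input_prediction` is hypothetical.

```python
import numpy as np


def input_prediction(y_n, y_prev):
    """Input prediction gain beta (11) and error variances sigma_e^2 (12)."""
    y_n = np.asarray(y_n, dtype=float)
    y_prev = np.asarray(y_prev, dtype=float)
    beta = (y_n @ y_prev) / (y_prev @ y_prev)             # equation (11)
    sigma_e2 = y_n @ (y_n - beta * y_prev) / y_n.size     # equation (12)
    return beta, np.full(y_n.size, sigma_e2)              # same value for all i
```

Because $\beta$ is the least-squares gain, the residual $y_n - \beta y_{n-1}$ is orthogonal to $y_{n-1}$, so equation (12) equals the mean squared prediction residual. For example, with $y_n = (1, 2, 3, 4)$ and $y_{n-1} = (1, 1, 1, 1)$, $\beta = 2.5$ and every $\sigma^2_{e,i} = 1.25$, which is also the mean of the squared residuals (−1.5, −0.5, 0.5, 1.5).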
The voicing onset adjustment module 42 determines whether both of the following conditions are true:
If both conditions (iii) and (iv) hold, then typically the voicing onset adjustment module 42 will set $\alpha_{n,i}$ to $\beta$ and set $\sigma^2_{v,n,i}$ to $\sigma^2_{e,i}$.
Referring to
The smoothed state estimation modules 50 provide smoothed estimates $x_{s,n-j,i}$ for successive values of $j$, beginning with $j = 1$. These estimates are computed according to:
$$x_{s,n-j,i} = x_{f,n-j,i} + C\,(x_{s,n-j+1,i} - x_{p,n-j+1,i}) \qquad (13)$$
wherein
$$C = \frac{\Sigma_{f,n-j,i}\, \alpha_{n-j,i}}{\Sigma_{p,n-j+1,i}} \qquad (14)$$
and
$$x_{s,n,i} = x_{f,n,i} \qquad (15)$$
From the above it will be apparent that the purpose of the smoother module 17 is to provide an estimate $x_{s,n-j}$ of the slowly evolving component of an input vector $y_{n-j}$ based upon input vectors up to and including $y_n$. The smoother module 17 thus uses current data to estimate a past slowly evolving component value, in contrast to the digital filter 2, which uses current data to estimate a current slowly evolving component value.
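The backward recursion of equations (13) to (15) can be sketched as follows, storing the forward filter's per-cycle outputs as rows of 2-D arrays (row k holds cycle k, and the last row is the current cycle n). The function name `smooth`, this array layout and the `depth` argument (how many past cycles to revisit, which must not exceed n) are assumptions; the gain uses $\alpha_{n-j,i}$ exactly as written in equation (14).

```python
import numpy as np


def smooth(x_f, x_p, sigma_f, sigma_p, alpha, depth):
    """Backward recursion of equations (13)-(15).

    Each argument is a 2-D array whose row k holds the per-element values
    for cycle k; the last row is the current cycle n.  Returns the smoothed
    estimates for cycles n-depth .. n, oldest first.
    """
    x_f, x_p = np.asarray(x_f, float), np.asarray(x_p, float)
    sigma_f, sigma_p = np.asarray(sigma_f, float), np.asarray(sigma_p, float)
    alpha = np.asarray(alpha, float)
    n = x_f.shape[0] - 1
    if not 0 <= depth <= n:
        raise ValueError("depth must lie between 0 and n")
    x_s = [x_f[n]]                                       # equation (15)
    for j in range(1, depth + 1):
        k = n - j
        c = sigma_f[k] * alpha[k] / sigma_p[k + 1]       # equation (14)
        x_s.append(x_f[k] + c * (x_s[-1] - x_p[k + 1]))  # equation (13)
    return np.array(x_s[::-1])
```

Each smoothed estimate $x_{s,n-j}$ becomes available only once cycle n has been filtered, that is, j cycles after the input $y_{n-j}$ was received.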
In use, the vector estimation system 1 receives the sequence of input vectors y0 to yT, each comprising N elements. Each of the input vectors y0 to yT contains a sampled period of a presumed quasiperiodic signal. This sampled signal is typically time warped to allow for variations in the quasiperiodic period, so that each input vector contains the same number of elements, as will be apparent to a person skilled in the art. Alternatively, elements may be added to or removed from consecutive input vectors y0 to yT, again so that each has the same number of elements. For an nth iteration, an input vector yn is applied to the vector input 3 and the estimator vector input 19. The digital filter 2 processes this input vector yn, and the previous filtered estimate vector xf,n−1 of the slowly evolving component of the sequence of vectors y0 to yT is provided to input 13 from the previous slowly evolving filter estimate output 20.
The parameter estimator 10 processes the previous filtered estimate vector xf,n−1 and the current input vector yn to provide a current predictor gain vector αn at the predictor gain output 11. The current predictor gain vector αn is thereby applied to input 4 of the digital filter 2 to control the gain thereof during filtering of input vector yn. The parameter estimator 10 determines the current predictor gain element values αn,i of the current predictor gain vector αn by the calculation stated in equation (7b).
As will be apparent to a person skilled in the art, at initialisation (i.e. at the first sample time, when n is 0 and input vector y0 is applied to the vector estimation system 1), there are no previous filtered estimate element values xf,n−1,i. Accordingly, although there are many ways to allocate values for the previous filtered estimate element values xf,n−1,i, the present invention preferably assigns them the same element values as input vector y0.
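To illustrate this initialisation, the sketch below runs equations (1) to (5) over a short sequence with $x_{f,-1}$ set to $y_0$. It is a simplification under stated assumptions: $\alpha$, $\sigma^2_v$ and $\sigma_w^2$ are held fixed instead of being re-estimated for every cycle by the parameter estimator 10, the OnsetFlag path is omitted, and the initial filtered error variance `sigma_f_init` is a hypothetical choice because the text does not specify one. The function name `run_filter` is also hypothetical.

```python
import numpy as np


def run_filter(ys, alpha, sigma_v2, sigma_w2, sigma_f_init):
    """Filter a (T+1) x N array of input vectors with fixed parameters."""
    ys = np.asarray(ys, dtype=float)
    x_f_prev = ys[0].copy()                                # x_{f,-1} = y_0
    sigma_f_prev = np.full(ys.shape[1], float(sigma_f_init))  # hypothetical start value
    estimates = []
    for y in ys:
        x_p = alpha * x_f_prev                             # (1)
        sigma_p = alpha * alpha * sigma_f_prev + sigma_w2  # (4)
        k = sigma_p / (sigma_p + sigma_v2)                 # (3)
        x_f = x_p + k * (y - x_p)                          # (2)
        sigma_f_prev = (1.0 - k) * sigma_p                 # (5)
        x_f_prev = x_f
        estimates.append(x_f)
    return np.array(estimates)
```

With $\alpha = 1$, $\sigma_w^2 = 0$ and unit variances, the estimates for the scalar inputs 0, 2, 0, 2 are 0, 2/3, 0.5 and 0.8: each new input moves the estimate less than the one before, as expected when no process noise is allowed. A positive $\sigma_w^2$ keeps the gain from decaying, so the estimate can continue to follow a slowly evolving component.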
Referring to
In operation, the speech encoder 60 first normalises a speech signal with respect to its spectral envelope, energy and period. The normalisation process involves estimating parameters that describe the spectral envelope, energy and period of the input signal, and these parameters are typically transmitted to a speech decoder at outputs 66, 67, 68. The process noise variance provided at the process noise variance input 25 is typically used to control the vector estimation system 1. The normalisation process produces the sequence of input vectors (y0 to yT) for the vector estimation system 1. This sequence (y0 to yT) is a sequence of fixed-length vectors representing sampled consecutive cycles of the normalised waveform. These vectors (y0 to yT) are applied to the vector input 3 of the vector estimation system 1, which generates a slowly evolving component at the smoothed estimate output 18. Subtracting this slowly evolving component from the input vectors (y0 to yT) produces a rapidly evolving, or noise-like, component, which is provided to the rapidly evolving component encoder 64. The slowly evolving and rapidly evolving components are encoded by the slowly and rapidly evolving component encoders 65 and 64 respectively. The encoders 64, 65 use appropriate methods known in the art to produce parameters at respective outputs 70, 69, which are transmitted to a speech decoder.
Advantageously, the present invention provides for the vector estimation system 1 to receive the current input vector yn, which is one of the sequence of input vectors y0 to yT. The parameter estimator 10 then provides the current predictor gain element values αn,i at the predictor gain output 11, thereby modifying the current filtered estimate element values xf,n,i at the slowly evolving filter estimate output 6 (see equations (1) and (2)). The current predictor gain element values αn,i are dependent upon the previous filtered estimate vector xf,n−1 and the current input vector yn (see equations (7b) and (9)). As will be apparent to a person skilled in the art, the parameter estimator 10 therefore determines the current predictor gain element values αn,i from both a sequence of input vectors y0 to yn and a sequence of previous filtered estimate vectors xf,0 to xf,n−1.
The present invention also advantageously allows for the parameter estimator 10 to provide the current observation noise variance element values $\sigma^2_{v,n,i}$ at the observation noise variance output 12. These values are dependent upon the current input vector yn, the current predictor gain vector αn, the current predictor error variance vector Σp,n, and the previous filtered estimate vector xf,n−1 (see equations (10a), (10b) and (8b)).
The detailed description provides a preferred exemplary embodiment only, and is not intended to limit the scope, applicability, or configuration of the invention. Rather, the detailed description of the preferred exemplary embodiment provides those skilled in the art with an enabling description for implementing a preferred exemplary embodiment of the invention. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the invention as set forth in the appended claims.
Number | Date | Country | |
---|---|---|---|
20030125937 A1 | Jul 2003 | US |