Periodicity enhancement in decoding wideband signals

Information

  • Patent Grant
  • 6795805
  • Patent Number
    6,795,805
  • Date Filed
    Monday, July 23, 2001
  • Date Issued
    Tuesday, September 21, 2004
Abstract
An alternative approach by which periodicity enhancement of an excitation signal is achieved through filtering an innovative codevector by an innovation filter to reduce low frequency content of the innovative codevector and enhance the periodicity at low frequencies more than high frequencies.
Description




BACKGROUND OF THE INVENTION




1. Field of the Invention




The present invention relates to a method and device for enhancing periodicity of the excitation of a signal synthesis filter in view of producing a synthesized wideband signal.




2. Brief Description of the Prior Art




The demand for efficient digital wideband speech/audio encoding techniques with a good subjective quality/bit rate trade-off is increasing for numerous applications such as audio/video teleconferencing, multimedia, and wireless applications, as well as Internet and packet network applications. Until recently, telephone bandwidths filtered in the range 200-3400 Hz were mainly used in speech coding applications. However, there is an increasing demand for wideband speech applications in order to increase the intelligibility and naturalness of the speech signals. A bandwidth in the range 50-7000 Hz was found sufficient for delivering a face-to-face speech quality. For audio signals, this range gives an acceptable audio quality, but still lower than the CD quality which operates on the range 20-20000 Hz.




A speech encoder converts a speech signal into a digital bitstream which is transmitted over a communication channel (or stored in a storage medium). The speech signal is digitized (sampled and quantized, usually with 16 bits per sample) and the speech encoder has the role of representing these digital samples with a smaller number of bits while maintaining a good subjective speech quality. The speech decoder or synthesizer operates on the transmitted or stored bitstream and converts it back to a sound signal.




One of the best prior art techniques capable of achieving a good quality/bit rate trade-off is the so-called Code Excited Linear Prediction (CELP) technique. According to this technique, the sampled speech signal is processed in successive blocks of L samples usually called frames where L is some predetermined number (corresponding to 10-30 ms of speech). In CELP, a linear prediction (LP) synthesis filter is computed and transmitted every frame. The L-sample frame is then divided into smaller blocks called subframes of size N samples, where L=kN and k is the number of subframes in a frame (N usually corresponds to 4-10 ms of speech). An excitation signal is determined in each subframe, which usually consists of two components: one from the past excitation (also called pitch contribution or adaptive codebook or pitch codebook) and the other from an innovative codebook (also called fixed codebook). This excitation signal is transmitted and used at the decoder as the input of the LP synthesis filter in order to obtain the synthesized speech.




An innovative codebook, in the CELP context, is an indexed set of N-sample-long sequences which will be referred to as N-dimensional codevectors. Each codebook sequence is indexed by an integer k ranging from 1 to M, where M represents the size of the codebook, often expressed as a number of bits b, where $M = 2^b$.




To synthesize speech according to the CELP technique, each block of N samples is synthesized by filtering an appropriate codevector from a codebook through time varying filters modeling the spectral characteristics of the speech signal. At the encoder end, the synthesis output is computed for all, or a subset, of the codevectors from the codebook (codebook search). The retained codevector is the one producing the synthesis output closest to the original speech signal according to a perceptually weighted distortion measure. This perceptual weighting is performed using a so-called perceptual weighting filter, which is usually derived from the LP synthesis filter.




The CELP model has been very successful in encoding telephone band sound signals, and several CELP-based standards exist in a wide range of applications, especially in digital cellular applications. In the telephone band, the sound signal is band-limited to 200-3400 Hz and sampled at 8000 samples/sec. In wideband speech/audio applications, the sound signal is band-limited to 50-7000 Hz and sampled at 16000 samples/sec.




Some difficulties arise when applying the telephone-band optimized CELP model to wideband signals, and additional features need to be added to the model in order to obtain high quality wideband signals.




Enhancing the periodicity of the excitation signal improves the quality in case of voiced segments. This was done in the past by filtering the innovative codevector from the fixed codebook through a filter having a transfer function of the form $1/(1 - \varepsilon b z^{-T})$, where ε is a factor below 0.5 which controls the amount of introduced periodicity. This approach is less efficient in case of wideband signals since it introduces the periodicity over the entire spectrum.
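
As a rough illustration of the prior-art approach just described, the following Python sketch applies the recursion y(n) = c(n) + εb·y(n−T), which is what the transfer function 1/(1 − εbz^−T) implies, to a single subframe. The function name, the zero initial filter state, and the restriction to an integer pitch lag are assumptions made for the example and are not taken from the specification.

```python
def prior_art_periodicity_filter(codevector, pitch_lag, pitch_gain, epsilon=0.25):
    """Apply 1/(1 - epsilon*b*z^-T) to one subframe (hypothetical helper).

    Assumes an integer pitch lag T and a filter state of zero at the
    start of the subframe.
    """
    out = list(codevector)
    beta = epsilon * pitch_gain
    # Each sample receives a scaled copy of the output T samples earlier.
    for n in range(pitch_lag, len(out)):
        out[n] += beta * out[n - pitch_lag]
    return out


# A single pulse is repeated every T samples with decaying amplitude.
enhanced = prior_art_periodicity_filter(
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], pitch_lag=3, pitch_gain=1.0
)
```

Because the repeated pulses are full-band copies of the original pulse, the added periodicity is the same at every frequency, which is the limitation the text points out for wideband signals.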




SUMMARY OF THE INVENTION




More specifically, in accordance with the present invention, there is provided a method for enhancing periodicity of an excitation signal produced in relation to a pitch codevector and an innovative codevector for supplying a signal synthesis filter in view of synthesizing a wideband signal. In this periodicity enhancing method, a periodicity factor related to the wideband signal is calculated. Then, the innovative codevector is filtered in relation to the periodicity factor to thereby reduce energy of a low frequency portion of the innovative codevector and enhance periodicity of a low frequency portion of the excitation signal.




The device of the invention, for enhancing periodicity of an excitation signal produced in relation to adaptive and innovative codevectors for supplying a signal synthesis filter in view of synthesizing a wideband signal, comprises:




a) a factor generator for calculating a periodicity factor related to said wideband signal; and




b) an innovative filter for filtering the innovative codevector in relation to the periodicity factor to thereby reduce energy of a low frequency portion of the innovative codevector and enhance periodicity of a low frequency portion of the excitation signal.




According to a first preferred embodiment:




the innovative codevector is filtered with a transfer function of the form:








$$F(z) = -\alpha z + 1 - \alpha z^{-1}$$








where α is the periodicity factor derived from a level of periodicity of the excitation signal; and




the periodicity factor α is calculated using the relation:






$$\alpha = q R_p \quad \text{bounded by} \quad \alpha < q$$








where q is an enhancement factor set for example to 0.25, and where







$$R_p = \frac{b^2 \, v_T^t v_T}{u^t u} = \frac{b^2 \sum_{n=0}^{N-1} v_T^2(n)}{\sum_{n=0}^{N-1} u^2(n)}$$















where $v_T$ is the pitch codevector, b is a pitch gain, N is a subframe length, and u is the excitation signal, or




the relation:






$$\alpha = 0.125\,(1 + r_v), \quad \text{where} \quad r_v = \frac{E_v - E_c}{E_v + E_c}$$






where $E_v$ is the energy of the pitch codevector and $E_c$ is the energy of the innovative codevector.
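
The first embodiment can be sketched in Python as follows, assuming the filter F(z) = −αz + 1 − αz^−1 is applied within one subframe as c_f(n) = c(n) − α[c(n+1) + c(n−1)]. Treating samples outside the subframe as zero and enforcing the bound by clipping α at q are assumptions of this sketch, as are the function names.

```python
def periodicity_factor_alpha(pitch_codevector, excitation, pitch_gain, q=0.25):
    """alpha = q * R_p, with R_p = b^2 * sum(v_T^2) / sum(u^2).

    Clipping at q is an assumed way of enforcing the stated bound.
    """
    energy_v = sum(v * v for v in pitch_codevector)
    energy_u = sum(u * u for u in excitation)
    if energy_u == 0.0:
        return 0.0  # silent subframe: no enhancement (assumption)
    r_p = pitch_gain ** 2 * energy_v / energy_u
    return min(q * r_p, q)


def innovation_filter_symmetric(codevector, alpha):
    """Apply F(z) = -alpha*z + 1 - alpha*z^-1 to one subframe.

    Samples before and after the subframe are taken as zero (assumption).
    """
    size = len(codevector)
    out = []
    for n in range(size):
        previous = codevector[n - 1] if n > 0 else 0.0
        following = codevector[n + 1] if n + 1 < size else 0.0
        out.append(codevector[n] - alpha * (previous + following))
    return out


alpha = periodicity_factor_alpha([1.0, 1.0], [2.0, 2.0], pitch_gain=1.0)
filtered = innovation_filter_symmetric([0.0, 1.0, 0.0], 0.25)
```

The frequency response of this filter is 1 − 2α·cos(ω), so its gain is 1 − 2α at zero frequency and 1 + 2α at half the sampling rate. That is consistent with the stated aim of reducing the low-frequency energy of the innovative codevector.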




According to a second preferred embodiment:




the innovative codevector is filtered with a transfer function of the form:








$$F(z) = 1 - \sigma z^{-1}$$








where σ is a periodicity factor derived from a level of periodicity of the excitation signal; and




the periodicity factor σ is calculated using the relation:




$$\sigma = 2 q R_p \quad \text{bounded by} \quad \sigma < 2q$$






where q is an enhancement factor set for example to 0.25, and where







$$R_p = \frac{b^2 \, v_T^t v_T}{u^t u} = \frac{b^2 \sum_{n=0}^{N-1} v_T^2(n)}{\sum_{n=0}^{N-1} u^2(n)}$$















where $v_T$ is the pitch codevector, b is a pitch gain, N is a subframe length, and u is the excitation signal, or




the relation:






$$\sigma = 0.25\,(1 + r_v), \quad \text{where} \quad r_v = \frac{E_v - E_c}{E_v + E_c}$$






where $E_v$ is the energy of the pitch codevector and $E_c$ is the energy of the innovative codevector.
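
A minimal Python sketch of the second embodiment follows, using the energy-based relation σ = 0.25(1 + r_v) and the first-order filter F(z) = 1 − σz^−1, that is, c_f(n) = c(n) − σ·c(n−1). The function names, the handling of an all-zero subframe, and the `previous_sample` argument that carries filter memory across subframes are assumptions of the example.

```python
def voicing_factor(pitch_codevector, innovative_codevector):
    """r_v = (E_v - E_c) / (E_v + E_c), which lies between -1 and 1."""
    energy_v = sum(v * v for v in pitch_codevector)
    energy_c = sum(c * c for c in innovative_codevector)
    total = energy_v + energy_c
    if total == 0.0:
        return 0.0  # both vectors silent (assumed convention)
    return (energy_v - energy_c) / total


def periodicity_factor_sigma(r_v):
    """sigma = 0.25 * (1 + r_v), so sigma ranges from 0 to 0.5."""
    return 0.25 * (1.0 + r_v)


def innovation_filter_first_order(codevector, sigma, previous_sample=0.0):
    """Apply F(z) = 1 - sigma*z^-1; previous_sample is the filter memory."""
    out = []
    prev = previous_sample
    for sample in codevector:
        out.append(sample - sigma * prev)
        prev = sample
    return out


sigma_voiced = periodicity_factor_sigma(voicing_factor([1.0, 1.0], [0.0, 0.0]))
sigma_unvoiced = periodicity_factor_sigma(voicing_factor([0.0, 0.0], [1.0, 1.0]))
```

With these definitions a purely voiced subframe (all energy in the pitch codevector) gives σ = 0.5, while a purely unvoiced one gives σ = 0 and leaves the innovative codevector unchanged.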




The present invention further relates to a decoder for producing a synthesized wideband signal, comprising:




a) a signal fragmenting device for receiving an encoded wideband signal and extracting from this encoded wideband signal at least pitch codebook parameters, innovative codebook parameters, and synthesis filter coefficients;




b) a pitch codebook responsive to the pitch codebook parameters for producing a pitch codevector;




c) an innovative codebook responsive to innovative codebook parameters for producing an innovative codevector;




d) a periodicity enhancing device as described above, comprising the factor generator for calculating a periodicity factor related to the wideband signal; and the innovation filter for filtering the innovative codevector in relation to the periodicity factor;




e) a combiner circuit for combining the pitch codevector and the innovative codevector filtered by the innovation filter to thereby produce a periodicity-enhanced excitation signal; and




f) a signal synthesis filter for filtering that periodicity-enhanced excitation signal in relation to the synthesis filter coefficients to thereby produce the synthesized wideband signal.




According to the present invention, in a decoder for producing a synthesized wideband signal, comprising: a signal fragmenting device for receiving an encoded wideband signal and extracting from this encoded wideband signal at least pitch codebook parameters, innovative codebook parameters, and synthesis filter coefficients; a pitch codebook responsive to the pitch codebook parameters for producing a pitch codevector; an innovative codebook responsive to innovative codebook parameters for producing an innovative codevector; a combiner circuit for combining the pitch codevector and the innovative codevector to thereby produce an excitation signal; and a signal synthesis filter for filtering that excitation signal in relation to the synthesis filter coefficients to thereby produce the synthesized wideband signal; the improvement therein comprising a periodicity enhancing device as described above, comprising the factor generator for calculating a periodicity factor related to the wideband signal; and the innovation filter for filtering the innovative codevector in relation to the periodicity factor before supplying this innovative codevector to the combiner circuit.




The present invention still further relates to a cellular communication system, a cellular mobile transmitter/receiver unit, a cellular network element, and a bidirectional wireless communication sub-system comprising the above described decoder.




The objects, advantages and other features of the present invention will become more apparent upon reading of the following non-restrictive description of a preferred embodiment thereof, given by way of example only with reference to the accompanying drawings.











BRIEF DESCRIPTION OF THE DRAWINGS




In the appended drawings:





FIG. 1 is a schematic block diagram of a preferred embodiment of wideband encoding device;





FIG. 2 is a schematic block diagram of a preferred embodiment of wideband decoding device;





FIG. 3 is a schematic block diagram of a preferred embodiment of pitch analysis device; and





FIG. 4 is a simplified, schematic block diagram of a cellular communication system in which the wideband encoding device of FIG. 1 and the wideband decoding device of FIG. 2 can be used.











DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT




As well known to those of ordinary skill in the art, a cellular communication system such as 401 (see FIG. 4) provides a telecommunication service over a large geographic area by dividing that large geographic area into a number C of smaller cells. The C smaller cells are serviced by respective cellular base stations $402_1, 402_2, \ldots, 402_C$ to provide each cell with radio signaling, audio and data channels.




Radio signaling channels are used to page mobile radiotelephones (mobile transmitter/receiver units) such as 403 within the limits of the coverage area (cell) of the cellular base station 402, and to place calls to other radiotelephones 403 located either inside or outside the base station's cell or to another network such as the Public Switched Telephone Network (PSTN) 404.




Once a radiotelephone 403 has successfully placed or received a call, an audio or data channel is established between this radiotelephone 403 and the cellular base station 402 corresponding to the cell in which the radiotelephone 403 is situated, and communication between the base station 402 and radiotelephone 403 is conducted over that audio or data channel. The radiotelephone 403 may also receive control or timing information over a signaling channel while a call is in progress.




If a radiotelephone 403 leaves a cell and enters another adjacent cell while a call is in progress, the radiotelephone 403 hands over the call to an available audio or data channel of the new cell base station 402. If a radiotelephone 403 leaves a cell and enters another adjacent cell while no call is in progress, the radiotelephone 403 sends a control message over the signaling channel to log into the base station 402 of the new cell. In this manner mobile communication over a wide geographical area is possible.




The cellular communication system 401 further comprises a control terminal 405 to control communication between the cellular base stations 402 and the PSTN 404, for example during a communication between a radiotelephone 403 and the PSTN 404, or between a radiotelephone 403 located in a first cell and a radiotelephone 403 situated in a second cell.




Of course, a bidirectional wireless radio communication subsystem is required to establish an audio or data channel between a base station 402 of one cell and a radiotelephone 403 located in that cell. As illustrated in very simplified form in FIG. 4, such a bidirectional wireless radio communication subsystem typically comprises in the radiotelephone 403:




a transmitter 406 including:

    an encoder 407 for encoding the voice signal; and

    a transmission circuit 408 for transmitting the encoded voice signal from the encoder 407 through an antenna such as 409; and

a receiver 410 including:

    a receiving circuit 411 for receiving a transmitted encoded voice signal usually through the same antenna 409; and

    a decoder 412 for decoding the received encoded voice signal from the receiving circuit 411.




The radiotelephone further comprises other conventional radiotelephone circuits 413 to which the encoder 407 and decoder 412 are connected and for processing signals therefrom, which circuits 413 are well known to those of ordinary skill in the art and, accordingly, will not be further described in the present specification.




Also, such a bidirectional wireless radio communication subsystem typically comprises in the base station 402:




a transmitter 414 including:

    an encoder 415 for encoding the voice signal; and

    a transmission circuit 416 for transmitting the encoded voice signal from the encoder 415 through an antenna such as 417; and

a receiver 418 including:

    a receiving circuit 419 for receiving a transmitted encoded voice signal through the same antenna 417 or through another antenna (not shown); and

    a decoder 420 for decoding the received encoded voice signal from the receiving circuit 419.




The base station 402 further comprises, typically, a base station controller 421, along with its associated database 422, for controlling communication between the control terminal 405 and the transmitter 414 and receiver 418.




As well known to those of ordinary skill in the art, voice encoding is required in order to reduce the bandwidth necessary to transmit a sound signal, for example a voice signal such as speech, across the bidirectional wireless radio communication subsystem, i.e., between a radiotelephone 403 and a base station 402.




LP voice encoders (such as 415 and 407) typically operating at 13 kbits/second and below, such as Code-Excited Linear Prediction (CELP) encoders, typically use an LP synthesis filter to model the short-term spectral envelope of the voice signal. The LP information is transmitted, typically, every 10 or 20 ms to the decoder (such as 420 and 412) and is extracted at the decoder end.




The novel techniques disclosed in the present specification may apply to different LP-based coding systems. However, a CELP-type coding system is used in the preferred embodiment for the purpose of presenting a non-limitative illustration of these techniques. In the same manner, such techniques can be used with sound signals other than voice and speech as well as with other types of wideband signals.





FIG. 1 shows a general block diagram of a CELP-type speech encoding device 100 modified to better accommodate wideband signals.




The sampled input speech signal 114 is divided into successive L-sample blocks called “frames”. In each frame, different parameters representing the speech signal in the frame are computed, encoded, and transmitted. LP parameters representing the LP synthesis filter are usually computed once every frame. The frame is further divided into smaller blocks of N samples (blocks of length N), in which excitation parameters (pitch and innovation) are determined. In the CELP literature, these blocks of length N are called “subframes” and the N-sample signals in the subframes are referred to as N-dimensional vectors. In this preferred embodiment, the length N corresponds to 5 ms while the length L corresponds to 20 ms, which means that a frame contains four subframes (N=80 at the sampling rate of 16 kHz and 64 after down-sampling to 12.8 kHz). Various N-dimensional vectors occur in the encoding procedure. A list of the vectors which appear in FIGS. 1 and 2 as well as a list of transmitted parameters are given herein below:




List of the Main N-dimensional Vectors




s Wideband signal input speech vector (after down-sampling, preprocessing, and preemphasis);




$s_w$ Weighted speech vector;

$s_0$ Zero-input response of weighted synthesis filter;

$s_p$ Down-sampled pre-processed signal;

Oversampled synthesized speech signal;

s′ Synthesis signal before deemphasis;

$s_d$ Deemphasized synthesis signal;

$s_h$ Synthesis signal after deemphasis and postprocessing;

x Target vector for pitch search;

x′ Target vector for innovation search;

h Weighted synthesis filter impulse response;

$v_T$ Adaptive (pitch) codebook vector at delay T;

$y_T$ Filtered pitch codebook vector ($v_T$ convolved with h);

$c_k$ Innovative codevector at index k (k-th entry from the innovation codebook);

$c_f$ Enhanced scaled innovation codevector;




u Excitation signal (scaled innovation and pitch codevectors);




u′ Enhanced excitation;




z Band-pass noise sequence;




w′ White noise sequence; and




w Scaled noise sequence.




List of Transmitted Parameters




STP Short term prediction parameters (defining A(z));




T Pitch lag (or pitch codebook index);




b Pitch gain (or pitch codebook gain);




j Index of the low-pass filter used on the pitch codevector;




k Codevector index (innovation codebook entry); and




g Innovation codebook gain.




In this preferred embodiment, the STP parameters are transmitted once per frame and the rest of the parameters are transmitted four times per frame (every subframe).




Encoder Side




The sampled speech signal is encoded on a block by block basis by the encoding device 100 of FIG. 1 which is broken down into eleven modules numbered from 101 to 111.




The input speech is processed into the above mentioned L-sample blocks called frames.




Referring to FIG. 1, the sampled input speech signal 114 is down-sampled in a down-sampling module 101. For example, the signal is down-sampled from 16 kHz down to 12.8 kHz, using techniques well known to those of ordinary skill in the art. Down-sampling down to another frequency can of course be envisaged. Down-sampling increases the coding efficiency, since a smaller frequency bandwidth is encoded. This also reduces the algorithmic complexity since the number of samples in a frame is decreased. The use of down-sampling becomes significant when the bit rate is reduced below 16 kbit/s, although down-sampling is not essential above 16 kbit/s.




After down-sampling, the 320-sample frame of 20 ms is reduced to a 256-sample frame (down-sampling ratio of 4/5).
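
The frame and subframe sizes quoted above follow directly from the block durations and the two sampling rates. The short Python sketch below reproduces that arithmetic; the helper name `samples_in` is hypothetical.

```python
def samples_in(duration_ms, sampling_rate_hz):
    """Number of samples in a block of the given duration (hypothetical helper)."""
    return duration_ms * sampling_rate_hz // 1000


frame_at_16k = samples_in(20, 16000)       # L before down-sampling
frame_at_12k8 = samples_in(20, 12800)      # L after down-sampling
subframe_at_16k = samples_in(5, 16000)     # N before down-sampling
subframe_at_12k8 = samples_in(5, 12800)    # N after down-sampling
subframes_per_frame = frame_at_12k8 // subframe_at_12k8
```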




The input frame is then supplied to the optional pre-processing block 102. Pre-processing block 102 may consist of a high-pass filter with a 50 Hz cut-off frequency. High-pass filter 102 removes the unwanted sound components below 50 Hz.




The down-sampled pre-processed signal is denoted by $s_p(n)$, n=0, 1, 2, . . . , L−1, where L is the length of the frame (256 at a sampling frequency of 12.8 kHz). In a preferred embodiment of the preemphasis filter 103, the signal $s_p(n)$ is preemphasized using a filter having the following transfer function:








$$P(z) = 1 - \mu z^{-1}$$








where μ is a preemphasis factor with a value located between 0 and 1 (a typical value is μ=0.7). A higher-order filter could also be used. It should be pointed out that high-pass filter 102 and preemphasis filter 103 can be interchanged to obtain more efficient fixed-point implementations.




The function of the preemphasis filter 103 is to enhance the high frequency contents of the input signal. It also reduces the dynamic range of the input speech signal, which renders it more suitable for fixed-point implementation. Without preemphasis, LP analysis in fixed-point using single-precision arithmetic is difficult to implement.
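
The effect of the preemphasis filter can be illustrated with a short Python sketch of the difference equation s(n) = s_p(n) − μ·s_p(n−1) that corresponds to P(z) = 1 − μz^−1. The function name and the `previous_sample` argument (filter memory carried over from the preceding frame) are assumptions of the example.

```python
def preemphasize(signal, mu=0.7, previous_sample=0.0):
    """Apply P(z) = 1 - mu*z^-1 sample by sample (hypothetical helper)."""
    out = []
    prev = previous_sample
    for sample in signal:
        out.append(sample - mu * prev)
        prev = sample
    return out


# A constant (zero-frequency) input is attenuated towards 1 - mu ...
low_frequency = preemphasize([1.0, 1.0, 1.0, 1.0], mu=0.5)
# ... while an input alternating at half the sampling rate grows towards 1 + mu.
high_frequency = preemphasize([1.0, -1.0, 1.0, -1.0], mu=0.5)
```

Attenuating the strong low-frequency content while boosting the weak high-frequency content is what flattens the spectrum and narrows the dynamic range before LP analysis.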




Preemphasis also plays an important role in achieving a proper overall perceptual weighting of the quantization error, which contributes to improved sound quality. This will be explained in more detail herein below.




The output of the preemphasis filter 103 is denoted s(n). This signal is used for performing LP analysis in calculator module 104. LP analysis is a technique well known to those of ordinary skill in the art. In this preferred embodiment, the autocorrelation approach is used. In the autocorrelation approach, the signal s(n) is first windowed using a Hamming window (usually having a length of the order of 30-40 ms). The autocorrelations are computed from the windowed signal, and Levinson-Durbin recursion is used to compute LP filter coefficients, $a_i$, where i=1, . . . , p, and where p is the LP order, which is typically 16 in wideband coding. The parameters $a_i$ are the coefficients of the transfer function of the LP filter, which is given by the following relation:







$$A(z) = 1 + \sum_{i=1}^{p} a_i z^{-i}$$















LP analysis is performed in calculator module 104, which also performs the quantization and interpolation of the LP filter coefficients. The LP filter coefficients are first transformed into another equivalent domain more suitable for quantization and interpolation purposes. The line spectral pair (LSP) and immittance spectral pair (ISP) domains are two domains in which quantization and interpolation can be efficiently performed. The 16 LP filter coefficients, $a_i$, can be quantized in the order of 30 to 50 bits using split or multi-stage quantization, or a combination thereof. The purpose of the interpolation is to enable updating the LP filter coefficients every subframe while transmitting them once every frame, which improves the encoder performance without increasing the bit rate. Quantization and interpolation of the LP filter coefficients is believed to be otherwise well known to those of ordinary skill in the art and, accordingly, will not be further described in the present specification.
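
The autocorrelation approach mentioned for module 104 (Hamming windowing, autocorrelation, Levinson-Durbin recursion) can be sketched in plain Python as below. This is a textbook formulation under the sign convention A(z) = 1 + Σ a_i z^−i; the function names are hypothetical, and practical codecs typically add refinements (for example lag windowing) that are omitted here.

```python
import math


def hamming_window(length):
    """Symmetric Hamming window of the given length (length >= 2)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def autocorrelation(signal, max_lag):
    """r[k] = sum_n signal[n] * signal[n - k] for k = 0..max_lag."""
    return [sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return [1, a_1, ..., a_p] such that A(z) = 1 + sum_i a_i z^-i."""
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:
            break  # degenerate input (e.g. silence): keep remaining a_i at 0
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error  # reflection coefficient of stage i
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        error *= 1.0 - k * k  # prediction error energy after stage i
    return a


def lp_analysis(signal, order):
    """Window the signal, then solve for the LP coefficients."""
    window = hamming_window(len(signal))
    windowed = [s * w for s, w in zip(signal, window)]
    return levinson_durbin(autocorrelation(windowed, order), order)


# Autocorrelation of a first-order process with correlation 0.5:
coefficients = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

For this autocorrelation sequence the recursion recovers a first-order predictor, a_1 = −0.5 with a_2 essentially zero, which is a convenient sanity check on the sign convention.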




The following paragraphs will describe the rest of the coding operations performed on a subframe basis. In the following description, the filter A(z) denotes the unquantized interpolated LP filter of the subframe, and the filter Â(z) denotes the quantized interpolated LP filter of the subframe.




Perceptual Weighting:




In analysis-by-synthesis encoders, the optimum pitch and innovation parameters are searched by minimizing the mean squared error between the input speech and synthesized speech in a perceptually weighted domain. This is equivalent to minimizing the error between the weighted input speech and weighted synthesis speech.




The weighted signal $s_w(n)$ is computed in a perceptual weighting filter 105. Traditionally, the weighted signal $s_w(n)$ is computed by a weighting filter having a transfer function W(z) in the form:








$$W(z) = A(z/\gamma_1) / A(z/\gamma_2) \quad \text{where} \quad 0 < \gamma_2 < \gamma_1 \le 1$$






As well known to those of ordinary skill in the art, in prior art analysis-by-synthesis (AbS) encoders, analysis shows that the quantization error is weighted by a transfer function $W^{-1}(z)$, which is the inverse of the transfer function of the perceptual weighting filter 105. This result is well described by B. S. Atal and M. R. Schroeder in “Predictive coding of speech and subjective error criteria”, IEEE Transaction ASSP, vol. 27, no. 3, pp. 247-254, June 1979. Transfer function $W^{-1}(z)$ exhibits some of the formant structure of the input speech signal. Thus, the masking property of the human ear is exploited by shaping the quantization error so that it has more energy in the formant regions where it will be masked by the strong signal energy present in these regions. The amount of weighting is controlled by the factors $\gamma_1$ and $\gamma_2$.




The above traditional perceptual weighting filter 105 works well with telephone band signals. However, it was found that this traditional perceptual weighting filter 105 is not suitable for efficient perceptual weighting of wideband signals. It was also found that the traditional perceptual weighting filter 105 has inherent limitations in modeling the formant structure and the required spectral tilt concurrently. The spectral tilt is more pronounced in wideband signals due to the wide dynamic range between low and high frequencies. The prior art has suggested to add a tilt filter into W(z) in order to control the tilt and formant weighting of the wideband input signal separately.




A novel solution to this problem is, in accordance with the present invention, to introduce the preemphasis filter 103 at the input, compute the LP filter A(z) based on the preemphasized speech s(n), and use a modified filter W(z) by fixing its denominator.




LP analysis is performed in module 104 on the preemphasized signal s(n) to obtain the LP filter A(z). Also, a new perceptual weighting filter 105 with fixed denominator is used. An example of transfer function for the perceptual weighting filter 105 is given by the following relation:








$$W(z) = A(z/\gamma_1) / (1 - \gamma_2 z^{-1}) \quad \text{where} \quad 0 < \gamma_2 < \gamma_1 \le 1$$






A higher order can be used at the denominator. This structure substantially decouples the formant weighting from the tilt.
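
A possible Python sketch of the fixed-denominator weighting filter follows. It relies on the standard identity that A(z/γ) has coefficients a_i·γ^i, and then applies the first-order recursion for the denominator 1 − γ2·z^−1. The function names, the zero initial state, and the default value γ1 = 0.9 are illustrative assumptions; γ2 = 0.7 is chosen only to match the typical preemphasis factor μ quoted earlier.

```python
def bandwidth_expand(lp_coeffs, gamma):
    """Coefficients of A(z/gamma), given [1, a_1, ..., a_p] for A(z)."""
    return [a * gamma ** i for i, a in enumerate(lp_coeffs)]


def perceptual_weighting(signal, lp_coeffs, gamma1=0.9, gamma2=0.7):
    """Filter the signal through W(z) = A(z/gamma1) / (1 - gamma2*z^-1).

    Starts from zero filter state (assumption); gamma values are illustrative.
    """
    numerator = bandwidth_expand(lp_coeffs, gamma1)
    out = []
    previous_output = 0.0
    for n in range(len(signal)):
        # FIR part: A(z/gamma1) applied to the input.
        acc = sum(numerator[i] * signal[n - i]
                  for i in range(len(numerator)) if n - i >= 0)
        # Recursive part: fixed first-order denominator.
        acc += gamma2 * previous_output
        out.append(acc)
        previous_output = acc
    return out


weighted = perceptual_weighting([1.0, 0.0, 0.0], [1.0, -0.5], gamma1=0.5, gamma2=0.5)
```

Because the denominator no longer depends on A(z), the tilt of W(z) is set by γ2 alone while the formant weighting is set by γ1, which is the decoupling described above.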




Note that because A(z) is computed based on the preemphasized speech signal s(n), the tilt of the filter $1/A(z/\gamma_1)$ is less pronounced compared to the case when A(z) is computed based on the original speech. Since deemphasis is performed at the decoder end using a filter having the transfer function:

$$P^{-1}(z) = 1/(1 - \mu z^{-1}),$$

the quantization error spectrum is shaped by a filter having a transfer function $W^{-1}(z)P^{-1}(z)$. When $\gamma_2$ is set equal to μ, which is typically the case, the spectrum of the quantization error is shaped by a filter whose transfer function is $1/A(z/\gamma_1)$, with A(z) computed based on the preemphasized speech signal. Subjective listening showed that this structure for achieving the error shaping by a combination of preemphasis and modified weighting filtering is very efficient for encoding wideband signals, in addition to the advantages of ease of fixed-point algorithmic implementation.
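
The cancellation behind this statement can be written out explicitly. The step below simply substitutes the fixed-denominator weighting filter and the deemphasis filter given in the text; no additional assumption is involved beyond setting γ2 = μ.

```latex
W^{-1}(z)\,P^{-1}(z)
  = \frac{1 - \gamma_2 z^{-1}}{A(z/\gamma_1)} \cdot \frac{1}{1 - \mu z^{-1}}
  = \frac{1}{A(z/\gamma_1)}
  \qquad \text{when } \gamma_2 = \mu .
```

The first-order numerator of the inverse weighting filter cancels the deemphasis pole, leaving only the formant-shaped term.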




Pitch Analysis:




In order to simplify the pitch analysis, an open-loop pitch lag $T_{OL}$ is first estimated in the open-loop pitch search module 106 using the weighted speech signal $s_w(n)$. Then the closed-loop pitch analysis, which is performed in closed-loop pitch search module 107 on a subframe basis, is restricted around the open-loop pitch lag $T_{OL}$, which significantly reduces the search complexity of the LTP parameters T and b (pitch lag and pitch gain). Open-loop pitch analysis is usually performed in module 106 once every 10 ms (two subframes) using techniques well known to those of ordinary skill in the art.




The target vector x for LTP (Long Term Prediction) analysis is first computed. This is usually done by subtracting the zero-input response $s_0$ of weighted synthesis filter W(z)/Â(z) from the weighted speech signal $s_w(n)$. This zero-input response $s_0$ is calculated by a zero-input response calculator 108. More specifically, the target vector x is calculated using the following relation:








$$x = s_w - s_0$$








where x is the N-dimensional target vector, $s_w$ is the weighted speech vector in the subframe, and $s_0$ is the zero-input response of filter W(z)/Â(z), which is the output of the combined filter W(z)/Â(z) due to its initial states.




The zero-input response calculator 108 is responsive to the quantized interpolated LP filter Â(z) from the LP analysis, quantization and interpolation calculator 104 and to the initial states of the weighted synthesis filter W(z)/Â(z) stored in memory module 111 to calculate the zero-input response $s_0$ (that part of the response due to the initial states as determined by setting the inputs equal to zero) of filter W(z)/Â(z). This operation is well known to those of ordinary skill in the art and, accordingly, will not be further described.




Of course, alternative but mathematically equivalent approaches can be used to compute the target vector x.
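
The role of the zero-input response can be illustrated with a generic direct-form filter that carries explicit memory. The Python sketch below is not the specific W(z)/Â(z) implementation; the function names and the layout of the stored states (past inputs and outputs, most recent last) are assumptions chosen to keep the example small.

```python
def filter_with_memory(numerator, denominator, inputs, past_inputs, past_outputs):
    """Direct-form IIR filter with denominator[0] == 1 and explicit states.

    past_inputs and past_outputs hold earlier samples, most recent last.
    """
    x_hist = list(past_inputs)
    y_hist = list(past_outputs)
    out = []
    for x in inputs:
        x_hist.append(x)
        acc = sum(b * x_hist[-1 - i]
                  for i, b in enumerate(numerator) if i < len(x_hist))
        acc -= sum(a * y_hist[-i]
                   for i, a in enumerate(denominator) if 0 < i <= len(y_hist))
        y_hist.append(acc)
        out.append(acc)
    return out


def zero_input_response(numerator, denominator, length, past_inputs, past_outputs):
    """Output caused by the stored states alone (inputs set to zero)."""
    return filter_with_memory(numerator, denominator, [0.0] * length,
                              past_inputs, past_outputs)


def target_vector(weighted_speech, zir):
    """x = s_w - s_0, computed sample by sample."""
    return [sw - s0 for sw, s0 in zip(weighted_speech, zir)]


num, den = [1.0], [1.0, -0.5]
full = filter_with_memory(num, den, [1.0, 0.0, 0.0], [], [2.0])
zir = zero_input_response(num, den, 3, [], [2.0])
zero_state = filter_with_memory(num, den, [1.0, 0.0, 0.0], [], [])
```

Because the filter is linear, the full response equals the zero-state response plus the zero-input response. Subtracting the zero-input response therefore leaves exactly the part of the signal that the current subframe's excitation must reproduce.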




An N-dimensional impulse response vector h of the weighted synthesis filter W(z)/Â(z) is computed in the impulse response generator 109 using the LP filter coefficients A(z) and Â(z) from module 104. Again, this operation is well known to those of ordinary skill in the art and, accordingly, will not be further described in the present specification.




The closed-loop pitch (or pitch codebook) parameters b, T and j are computed in the closed-loop pitch search module 107, which uses the target vector x, the impulse response vector h and the open-loop pitch lag $T_{OL}$ as inputs. Traditionally, the pitch prediction has been represented by a pitch filter having the following transfer function:






$$1/(1 - b z^{-T})$$






where b is the pitch gain and T is the pitch delay or lag. In this case, the pitch contribution to the excitation signal u(n) is given by bu(n−T), where the total excitation is given by








u


(


n


)=


bu


(


n−T


)+


gc




k


(


n


)






with g being the innovative codebook gain and c


k


(n) the innovative codevector at index k.




This representation has limitations if the pitch lag T is shorter than the subframe length N. In another representation, the pitch contribution can be seen as a pitch codebook containing the past excitation signal. Generally, each vector in the pitch codebook is a shift-by-one version of the previous vector (discarding one sample and adding a new sample). For pitch lags T>N, the pitch codebook is equivalent to the filter structure 1/(1−bz^{−T}), and a pitch codebook vector v_T(n) at pitch lag T is given by

$$v_T(n) = u(n-T), \qquad n = 0, \ldots, N-1.$$

For pitch lags T shorter than N, a vector v_T(n) is built by repeating the available samples from the past excitation until the vector is completed (this is not equivalent to the filter structure).
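The construction of v_T(n) for integer lags, including the repetition used when T is shorter than N, can be sketched as follows. The function name and the convention that the last list element is u(−1) are assumptions made for this illustration.

```python
def pitch_codebook_vector(past_excitation, lag, n_sub):
    """Build v_T(n) = u(n - T), n = 0..n_sub-1, for an integer lag T.

    past_excitation[-1] is u(-1). When lag < n_sub, samples already
    written into the vector are reused, which repeats the last `lag`
    samples of the past excitation until the vector is complete.
    """
    if not 1 <= lag <= len(past_excitation):
        raise ValueError("lag must lie within the stored past excitation")
    buf = list(past_excitation)
    v = []
    for _ in range(n_sub):
        sample = buf[-lag]
        v.append(sample)
        buf.append(sample)
    return v
```

For example, with past excitation [1, 2, 3, 4, 5], a lag of 2 and a subframe of 5 samples, the result is [4, 5, 4, 5, 4].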




In recent encoders, a higher pitch resolution is used which significantly improves the quality of voiced sound segments. This is achieved by oversampling the past excitation signal using polyphase interpolation filters. In this case, the vector v_T(n) usually corresponds to an interpolated version of the past excitation, with pitch lag T being a non-integer delay (e.g. 50.25).




The pitch search consists of finding the best pitch lag T and gain b that minimize the mean squared weighted error E between the target vector x and the scaled filtered past excitation. The error E is expressed as:

$$E = \|x - b\,y_T\|^2$$

where y_T is the filtered pitch codebook vector at pitch lag T:

$$y_T(n) = v_T(n) * h(n) = \sum_{i=0}^{n} v_T(i)\,h(n-i), \qquad n = 0, \ldots, N-1.$$











It can be shown that the error E is minimized by maximizing the search criterion

$$C = \frac{x^t y_T}{\sqrt{y_T^t\, y_T}}$$

where t denotes vector transpose.
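As a hedged illustration of the search described above, the sketch below filters each candidate vector with h, evaluates the criterion C, and returns the best lag together with the gain b = x^t y_T / (y_T^t y_T) that minimizes E for that lag. The candidate dictionary and all function names are hypothetical; a practical encoder would update y_T recursively rather than recompute the convolution for every lag.

```python
import math


def filtered_codevector(v, h):
    """y(n) = sum_{i=0..n} v(i) h(n - i), truncated to the subframe length."""
    return [sum(v[i] * h[n - i] for i in range(n + 1)) for n in range(len(v))]


def search_criterion(x, y):
    """C = x^t y / sqrt(y^t y); returns 0 for an all-zero y."""
    energy = sum(s * s for s in y)
    if energy == 0.0:
        return 0.0
    return sum(p * q for p, q in zip(x, y)) / math.sqrt(energy)


def best_lag(x, h, candidates):
    """Return (lag, gain) maximizing C over a {lag: v_T} mapping."""
    lag = max(
        candidates,
        key=lambda t: search_criterion(x, filtered_codevector(candidates[t], h)),
    )
    y = filtered_codevector(candidates[lag], h)
    energy = sum(s * s for s in y)
    gain = sum(p * q for p, q in zip(x, y)) / energy if energy else 0.0
    return lag, gain
```

Note that h must contain at least as many samples as the subframe.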




In the preferred embodiment of the present invention, a 1/3 subsample pitch resolution is used, and the pitch (pitch codebook) search is composed of three stages.




In the first stage, an open-loop pitch lag T_OL is estimated in open-loop pitch search module 106 in response to the weighted speech signal s_w(n). As indicated in the foregoing description, this open-loop pitch analysis is usually performed once every 10 ms (two subframes) using techniques well known to those of ordinary skill in the art.

In the second stage, the search criterion C is searched in the closed-loop pitch search module 107 for integer pitch lags around the estimated open-loop pitch lag T_OL (usually ±5), which significantly simplifies the search procedure. A simple procedure is used for updating the filtered codevector y_T without the need to compute the convolution for every pitch lag.

Once an optimum integer pitch lag is found in the second stage, a third stage of the search (module 107) tests the fractions around that optimum integer pitch lag.




When the pitch predictor is represented by a filter of the form 1/(1−bz^{−T}), which is a valid assumption for pitch lags T>N, the spectrum of the pitch filter exhibits a harmonic structure over the entire frequency range, with a harmonic frequency related to 1/T. In case of wideband signals, this structure is not very efficient since the harmonic structure in wideband signals does not cover the entire extended spectrum. The harmonic structure exists only up to a certain frequency, depending on the speech segment. Thus, in order to achieve efficient representation of the pitch contribution in voiced segments of wideband speech, the pitch prediction filter needs to have the flexibility of varying the amount of periodicity over the wideband spectrum.




A new method which achieves efficient modeling of the harmonic structure of the speech spectrum of wideband signals is disclosed in the present specification, whereby several forms of low-pass filters are applied to the past excitation and the low-pass filter with the highest prediction gain is selected.




When subsample pitch resolution is used, the low pass filters can be incorporated into the interpolation filters used to obtain the higher pitch resolution. In this case, the third stage of the pitch search, in which the fractions around the chosen integer pitch lag are tested, is repeated for the several interpolation filters having different low-pass characteristics and the fraction and filter index which maximize the search criterion C are selected.




A simpler approach is to complete the search in the three stages described above to determine the optimum fractional pitch lag using only one interpolation filter with a certain frequency response, and then to select the optimum low-pass filter shape at the end by applying the different predetermined low-pass filters to the chosen pitch codebook vector v_T and selecting the low-pass filter which minimizes the pitch prediction error. This approach is discussed in detail below.





FIG. 3 illustrates a schematic block diagram of a preferred embodiment of the proposed approach.




In memory module 303, the past excitation signal u(n), n<0, is stored. The pitch codebook search module 301 is responsive to the target vector x, to the open-loop pitch lag T_OL and to the past excitation signal u(n), n<0, from memory module 303 to conduct a pitch codebook search maximizing the above-defined search criterion C (which is equivalent to minimizing the error E). From the result of the search conducted in module 301, module 302 generates the optimum pitch codebook vector v_T. Note that since a sub-sample pitch resolution is used (fractional pitch), the past excitation signal u(n), n<0, is interpolated and the pitch codebook vector v_T corresponds to the interpolated past excitation signal. In this preferred embodiment, the interpolation filter (in module 301, but not shown) has a low-pass filter characteristic removing the frequency contents above 7000 Hz.




In a preferred embodiment, K filter characteristics are used; these filter characteristics could be low-pass or band-pass filter characteristics. Once the optimum codevector v_T is determined and supplied by the pitch codevector generator 302, K filtered versions of v_T are computed respectively using K different frequency shaping filters such as 305(j), where j=1, 2, . . . , K. These filtered versions are denoted v_f^(j), where j=1, 2, . . . , K. The different vectors v_f^(j) are convolved in respective modules 304(j), where j=0, 1, 2, . . . , K, with the impulse response h to obtain the vectors y^(j), where j=0, 1, 2, . . . , K. To calculate the mean squared pitch prediction error for each vector y^(j), the value y^(j) is multiplied by the gain b^(j) by means of a corresponding amplifier 307(j) and the value b^(j)y^(j) is subtracted from the target vector x by means of a corresponding subtractor 308(j). Selector 309 selects the frequency shaping filter 305(j) which minimizes the mean squared pitch prediction error

$$e^{(j)} = \|x - b^{(j)} y^{(j)}\|^2, \qquad j = 1, 2, \ldots, K$$








Each gain b^(j) used in calculating the mean squared pitch prediction error e^(j) is calculated in a corresponding gain calculator 306(j) in association with the frequency shaping filter at index j, using the following relationship:

$$b^{(j)} = \frac{x^t y^{(j)}}{\|y^{(j)}\|^2}$$

In selector 309, the parameters b, T, and j are chosen based on v_T or v_f^(j) which minimizes the mean squared pitch prediction error e.
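The selection performed by selector 309 can be sketched as below. This is an illustrative simplification, not the embodiment itself: the shaping filters are modeled as short causal FIR filters with zero initial state, index j=0 is taken to mean the unfiltered codevector v_T, and all names are hypothetical.

```python
def fir(signal, taps):
    """Causal FIR filtering with zero initial state, truncated to len(signal)."""
    return [
        sum(taps[k] * signal[n - k] for k in range(min(len(taps), n + 1)))
        for n in range(len(signal))
    ]


def select_shaping_filter(x, h, v_t, shaping_filters):
    """Return (j, b_j, e_j) minimizing e_j = ||x - b_j y_j||^2.

    j = 0 denotes the unfiltered codevector; j >= 1 refers to
    shaping_filters[j - 1].
    """
    versions = [v_t] + [fir(v_t, taps) for taps in shaping_filters]
    best = None
    for j, v in enumerate(versions):
        y = fir(v, h)  # convolution with the impulse response h
        energy = sum(s * s for s in y)
        b = sum(p * q for p, q in zip(x, y)) / energy if energy else 0.0
        e = sum((p - b * q) ** 2 for p, q in zip(x, y))
        if best is None or e < best[2]:
            best = (j, b, e)
    return best
```

Only the index j, the lag T and the quantized gain need to reach the decoder, which applies the same filter to its own copy of v_T.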




Referring back to FIG. 1, the pitch codebook index T is encoded and transmitted to multiplexer 112. The pitch gain b is quantized and transmitted to multiplexer 112. With this new approach, extra information is needed to encode the index j of the selected frequency shaping filter in multiplexer 112. For example, if three filters are used in addition to the unfiltered case (j=0, 1, 2, 3), then two bits are needed to represent this information. The filter index information j can also be encoded jointly with the pitch gain b.




Innovative Codebook Search:




Once the pitch, or LTP (Long Term Prediction) parameters b, T, and j are determined, the next step is to search for the optimum innovative excitation by means of search module 110 of FIG. 1. First, the target vector x is updated by subtracting the LTP contribution:

$$x' = x - b\,y_T$$

where b is the pitch gain and y_T is the filtered pitch codebook vector (the past excitation at delay T filtered with the selected low-pass filter and convolved with the impulse response h as described with reference to FIG. 3).




The search procedure in CELP is performed by finding the optimum excitation codevector c_k and gain g which minimize the mean-squared error between the target vector and the scaled filtered codevector

$$E = \|x' - g\,H c_k\|^2$$

where H is a lower triangular convolution matrix derived from the impulse response vector h.




In the preferred embodiment of the present invention, the innovative codebook search is performed in module 110 by means of an algebraic codebook as described in U.S. Pat. No. 5,444,816 (Adoul et al.) issued on Aug. 22, 1995; U.S. Pat. No. 5,699,482 granted to Adoul et al., on Dec. 17, 1997; U.S. Pat. No. 5,754,976 granted to Adoul et al., on May 19, 1998; and U.S. Pat. No. 5,701,392 (Adoul et al.) dated Dec. 23, 1997.

Once the optimum excitation codevector c_k and its gain g are chosen by module 110, the codebook index k and gain g are encoded and transmitted to multiplexer 112.

Referring to FIG. 1, the parameters b, T, j, Â(z), k and g are multiplexed through the multiplexer 112 before being transmitted through a communication channel.




Memory Update:




In memory module 111 (FIG. 1), the states of the weighted synthesis filter W(z)/Â(z) are updated by filtering the excitation signal u=gc_k+bv_T through the weighted synthesis filter. After this filtering, the states of the filter are memorized and used in the next subframe as initial states for computing the zero-input response in calculator module 108.




As in the case of the target vector x, other alternative but mathematically equivalent approaches well known to those of ordinary skill in the art can be used to update the filter states.




Decoder Side




The speech decoding device 200 of FIG. 2 illustrates the various steps carried out between the digital input 222 (input stream to the demultiplexer 217) and the output sampled speech 223 (output of the adder 221).

Demultiplexer 217 extracts the synthesis model parameters from the binary information received from a digital input channel. From each received binary frame, the extracted parameters are:




the short-term prediction parameters (STP) Â(z) (once per frame);




the long-term prediction (LTP) parameters T, b, and j (for each subframe); and




the innovation codebook index k and gain g (for each subframe).




The current speech signal is synthesized based on these parameters as will be explained hereinbelow.




The innovative codebook 218 is responsive to the index k to produce the innovation codevector c_k, which is scaled by the decoded gain factor g through an amplifier 224. In the preferred embodiment, an innovative codebook 218 as described in the above mentioned U.S. Pat. Nos. 5,444,816; 5,699,482; 5,754,976; and 5,701,392 is used to represent the innovative codevector c_k.

The generated scaled codevector gc_k at the output of the amplifier 224 is processed through an innovation filter 205.




Periodicity Enhancement:




The generated scaled codevector at the output of the amplifier 224 is processed through a frequency-dependent pitch enhancer 205.

Enhancing the periodicity of the excitation signal u improves the quality in case of voiced segments. This was done in the past by filtering the innovation vector from the innovative codebook (fixed codebook) 218 through a filter in the form 1/(1−εbz^{−T}) where ε is a factor below 0.5 which controls the amount of introduced periodicity. This approach is less efficient in case of wideband signals since it introduces periodicity over the entire spectrum. A new alternative approach, which is part of the present invention, is disclosed whereby periodicity enhancement is achieved by filtering the innovative codevector c_k from the innovative (fixed) codebook through an innovation filter 205 (F(z)) whose frequency response emphasizes the higher frequencies more than lower frequencies. The coefficients of F(z) are related to the amount of periodicity in the excitation signal u.




Many methods known to those skilled in the art are available for obtaining valid periodicity coefficients. For example, the value of gain b provides an indication of periodicity. That is, if gain b is close to 1, the periodicity of the excitation signal u is high, and if gain b is less than 0.5, then periodicity is low.




Another efficient way to derive the filter F(z) coefficients, used in a preferred embodiment, is to relate them to the amount of pitch contribution in the total excitation signal u. This results in a frequency response depending on the subframe periodicity, where higher frequencies are more strongly emphasized (stronger overall slope) for higher pitch gains. Innovation filter 205 has the effect of lowering the energy of the innovative codevector c_k at low frequencies when the excitation signal u is more periodic, which enhances the periodicity of the excitation signal u at lower frequencies more than higher frequencies. Suggested forms for innovation filter 205 are

$$F(z) = 1 - \sigma z^{-1}, \qquad (1)$$

or

$$F(z) = -\alpha z + 1 - \alpha z^{-1} \qquad (2)$$

where σ or α are periodicity factors derived from the level of periodicity of the excitation signal u.
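Both forms of F(z) translate directly into short time-domain filters, sketched below. The function names are hypothetical, and samples outside the subframe are assumed to be zero, a boundary choice the text does not specify. For form (2), the frequency response is F(e^{jω}) = 1 − 2α cos ω, so the gain is 1 − 2α at zero frequency and 1 + 2α at half the sampling rate, which matches the described emphasis of higher frequencies.

```python
def innovation_filter_two_tap(c, sigma):
    """Form (1), F(z) = 1 - sigma z^-1: c_f(n) = c(n) - sigma c(n-1)."""
    return [c[n] - sigma * (c[n - 1] if n > 0 else 0.0) for n in range(len(c))]


def innovation_filter_three_tap(c, alpha):
    """Form (2), F(z) = -alpha z + 1 - alpha z^-1:
    c_f(n) = c(n) - alpha (c(n+1) + c(n-1)).
    Samples outside the subframe are taken as zero (an assumption)."""
    n_sub = len(c)
    out = []
    for n in range(n_sub):
        prev = c[n - 1] if n > 0 else 0.0
        nxt = c[n + 1] if n + 1 < n_sub else 0.0
        out.append(c[n] - alpha * (prev + nxt))
    return out
```

With α = 0 (or σ = 0) both filters pass the codevector unchanged, so the enhancement switches off when the excitation shows no periodicity.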




The second three-term form of F(z) is used in a preferred embodiment. The periodicity factor α is computed in the voicing factor generator 204. Several methods can be used to derive the periodicity factor α based on the periodicity of the excitation signal u. Two methods are presented below.




Method 1:




The ratio of pitch contribution to the total excitation signal u is first computed in voicing factor generator 204 by

$$R_p = \frac{b^2\, v_T^t v_T}{u^t u} = \frac{b^2 \sum_{n=0}^{N-1} v_T^2(n)}{\sum_{n=0}^{N-1} u^2(n)}$$

where v_T is the pitch codebook vector, b is the pitch gain, and u is the excitation signal given at the output of the adder 219 by

$$u = g\,c_k + b\,v_T$$

Note that the term bv_T has its source in the pitch codebook 201 in response to the pitch lag T and the past value of u stored in memory 203. The pitch codevector v_T from the pitch codebook 201 is then processed through a low-pass filter 202 whose cut-off frequency is adjusted by means of the index j from the demultiplexer 217. The resulting codevector v_T is then multiplied by the gain b from the demultiplexer 217 through an amplifier 226 to obtain the signal bv_T.



The factor α is calculated in voicing factor generator 204 by

$$\alpha = q R_p \quad \text{bounded by } \alpha < q$$

where q is a factor which controls the amount of enhancement (q is set to 0.25 in this preferred embodiment).
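Method 1 can be sketched in a few lines. The function name is hypothetical; the bound is implemented here as a clamp to q and a zero-energy excitation is mapped to α = 0, both of which are assumptions made for the illustration.

```python
def periodicity_factor_method1(b, v_t, g, c_k, q=0.25):
    """alpha = q * R_p, limited to q, with R_p = b^2 v_T^t v_T / (u^t u)."""
    u = [g * c + b * v for c, v in zip(c_k, v_t)]
    energy_u = sum(s * s for s in u)
    if energy_u == 0.0:
        return 0.0  # assumption: no excitation means no enhancement
    r_p = b * b * sum(v * v for v in v_t) / energy_u
    return min(q * r_p, q)
```

When the innovative contribution vanishes, u equals bv_T, R_p equals 1 and α reaches its upper limit q.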




Method 2:




Another method used in a preferred embodiment of the invention for calculating periodicity factor α is discussed below.




First, a voicing factor r_v is computed in voicing factor generator 204 by

$$r_v = (E_v - E_c)/(E_v + E_c)$$

where E_v is the energy of the scaled pitch codevector bv_T and E_c is the energy of the scaled innovative codevector gc_k. That is

$$E_v = b^2\, v_T^t v_T = b^2 \sum_{n=0}^{N-1} v_T^2(n)$$

and

$$E_c = g^2\, c_k^t c_k = g^2 \sum_{n=0}^{N-1} c_k^2(n).$$

Note that the value of r_v lies between −1 and 1 (1 corresponds to purely voiced signals and −1 corresponds to purely unvoiced signals).

In this preferred embodiment, the factor α is then computed in voicing factor generator 204 by

$$\alpha = 0.125\,(1 + r_v)$$

which corresponds to a value of 0 for purely unvoiced signals and 0.25 for purely voiced signals.
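A minimal sketch of Method 2 follows. The function names are hypothetical, and returning r_v = 0 when both energies are zero is an assumption added only to avoid a division by zero.

```python
def voicing_factor(b, v_t, g, c_k):
    """r_v = (E_v - E_c) / (E_v + E_c), in the range [-1, 1]."""
    e_v = b * b * sum(v * v for v in v_t)  # energy of b * v_T
    e_c = g * g * sum(c * c for c in c_k)  # energy of g * c_k
    total = e_v + e_c
    return (e_v - e_c) / total if total else 0.0


def periodicity_factor_method2(r_v):
    """alpha = 0.125 * (1 + r_v), between 0 and 0.25."""
    return 0.125 * (1.0 + r_v)
```

Equal pitch and innovative energies give r_v = 0 and therefore α = 0.125, halfway between the unvoiced and voiced extremes.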




In the first, two-term form of F(z), the periodicity factor σ can be approximated by using σ=2α in methods 1 and 2 above. In such a case, the periodicity factor σ is calculated as follows in method 1 above:

$$\sigma = 2 q R_p \quad \text{bounded by } \sigma < 2q.$$

In method 2, the periodicity factor σ is calculated as follows:

$$\sigma = 0.25\,(1 + r_v).$$






The enhanced signal c_f is therefore computed by filtering the scaled innovative codevector gc_k through the innovation filter 205 (F(z)).

The enhanced excitation signal u′ is computed by the adder 220 as:

$$u' = c_f + b\,v_T$$

Note that this process is not performed at the encoder 100. Thus, it is essential to update the content of the pitch codebook 201 using the excitation signal u without enhancement to keep synchronism between the encoder 100 and decoder 200. Therefore, the excitation signal u is used to update the memory 203 of the pitch codebook 201 and the enhanced excitation signal u′ is used at the input of the LP synthesis filter 206.
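The distinction between the two excitation signals at the decoder can be sketched as follows, using the three-term form of F(z). The function name, the list-based memory and the zero boundary samples are assumptions made for this illustration.

```python
def decode_excitation(memory, g, c_k, b, v_t, alpha):
    """Return the enhanced excitation u' and append the plain u to `memory`.

    `memory` stands in for the pitch codebook memory: it receives
    u = g c_k + b v_T, never the enhanced signal.
    """
    scaled = [g * c for c in c_k]
    pitch = [b * v for v in v_t]
    n_sub = len(scaled)
    c_f = [
        scaled[n]
        - alpha * ((scaled[n - 1] if n > 0 else 0.0)
                   + (scaled[n + 1] if n + 1 < n_sub else 0.0))
        for n in range(n_sub)
    ]
    memory.extend(s + p for s, p in zip(scaled, pitch))  # u, without enhancement
    return [f + p for f, p in zip(c_f, pitch)]           # u', fed to synthesis
```

Keeping u rather than u′ in the memory means the decoder's pitch codebook holds the same samples as the encoder's, since the encoder never applies F(z).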




Synthesis and Deemphasis




The synthesized signal s′ is computed by filtering the enhanced excitation signal u′ through the LP synthesis filter 206 which has the form 1/Â(z), where Â(z) is the interpolated LP filter in the current subframe. As can be seen in FIG. 2, the quantized LP coefficients Â(z) on line 225 from demultiplexer 217 are supplied to the LP synthesis filter 206 to adjust the parameters of the LP synthesis filter 206 accordingly. The deemphasis filter 207 is the inverse of the preemphasis filter 103 of FIG. 1. The transfer function of the deemphasis filter 207 is given by

$$D(z) = \frac{1}{1 - \mu z^{-1}}$$

where μ is a preemphasis factor with a value located between 0 and 1 (a typical value is μ=0.7). A higher-order filter could also be used.
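Since D(z) is the inverse of the preemphasis filter, the preemphasis has the form 1 − μz^{−1}, and the pair can be sketched as two one-line recursions. The function names and the zero initial states are assumptions for this illustration.

```python
def preemphasis(x, mu=0.7):
    """1 - mu z^-1: y(n) = x(n) - mu x(n-1), with x(-1) = 0."""
    return [x[n] - mu * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]


def deemphasis(x, mu=0.7, prev_out=0.0):
    """D(z) = 1 / (1 - mu z^-1): y(n) = x(n) + mu y(n-1)."""
    out = []
    y = prev_out
    for s in x:
        y = s + mu * y
        out.append(y)
    return out
```

Applying deemphasis to a preemphasized signal returns the original samples, up to rounding, when both start from zero state.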




The vector s′ is filtered through the deemphasis filter D(z) (module 207) to obtain the vector s_d, which is passed through the high-pass filter 208 to remove the unwanted frequencies below 50 Hz and further obtain s_h.




Oversampling and High-frequency Regeneration




The over-sampling module 209 conducts the inverse process of the down-sampling module 101 of FIG. 1. In this preferred embodiment, oversampling converts from the 12.8 kHz sampling rate to the original 16 kHz sampling rate, using techniques well known to those of ordinary skill in the art. The oversampled synthesis signal is denoted ŝ. Signal ŝ is also referred to as the synthesized wideband intermediate signal.

The oversampled synthesis signal ŝ does not contain the higher frequency components which were lost by the downsampling process (module 101 of FIG. 1) at the encoder 100. This gives a low-pass perception to the synthesized speech signal. To restore the full band of the original signal, a high frequency generation procedure is disclosed. This procedure is performed in modules 210 to 216, and adder 221, and requires input from voicing factor generator 204 (FIG. 2).




In this new approach, the high frequency contents are generated by filling the upper part of the spectrum with a white noise properly scaled in the excitation domain, then converted to the speech domain, preferably by shaping it with the same LP synthesis filter used for synthesizing the down-sampled signal ŝ.




The high frequency generation procedure in accordance with the present invention is described hereinbelow.




The random noise generator 213 generates a white noise sequence w′ with a flat spectrum over the entire frequency bandwidth, using techniques well known to those of ordinary skill in the art. The generated sequence is of length N′ which is the subframe length in the original domain. Note that N is the subframe length in the down-sampled domain. In this preferred embodiment, N=64 and N′=80 which correspond to 5 ms.




The white noise sequence is properly scaled in the gain adjusting module 214. Gain adjustment comprises the following steps. First, the energy of the generated noise sequence w′ is set equal to the energy of the enhanced excitation signal u′ computed by an energy computing module 210, and the resulting scaled noise sequence is given by

$$w(n) = w'(n)\,\sqrt{\frac{\sum_{n=0}^{N-1} u'^2(n)}{\sum_{n=0}^{N'-1} w'^2(n)}}, \qquad n = 0, \ldots, N'-1.$$










The second step in the gain scaling is to take into account the high frequency contents of the synthesized signal at the output of the voicing factor generator 204 so as to reduce the energy of the generated noise in case of voiced segments (where less energy is present at high frequencies compared to unvoiced segments). In this preferred embodiment, measuring the high frequency contents is implemented by measuring the tilt of the synthesis signal through a spectral tilt calculator 212 and reducing the energy accordingly. Other measurements such as zero crossing measurements can equally be used. When the tilt is very strong, which corresponds to voiced segments, the noise energy is further reduced. The tilt factor is computed in module 212 as the first correlation coefficient of the synthesis signal s_h and it is given by:

$$\text{tilt} = \frac{\sum_{n=0}^{N-1} s_h(n)\,s_h(n-1)}{\sum_{n=0}^{N-1} s_h^2(n)}, \quad \text{conditioned by } \text{tilt} \ge 0 \text{ and } \text{tilt} \ge r_v,$$

where voicing factor r_v is given by

$$r_v = (E_v - E_c)/(E_v + E_c)$$

where E_v is the energy of the scaled pitch codevector bv_T and E_c is the energy of the scaled innovative codevector gc_k, as described earlier. Voicing factor r_v is most often less than tilt but this condition was introduced as a precaution against high frequency tones where the tilt value is negative and the value of r_v is high. Therefore, this condition reduces the noise energy for such tonal signals.




The tilt value is 0 in case of flat spectrum and 1 in case of strongly voiced signals, and it is negative in case of unvoiced signals where more energy is present at high frequencies.




Different methods can be used to derive the scaling factor g_t from the amount of high frequency contents. In this invention, two methods are given based on the tilt of signal described above.




Method 1:




The scaling factor g_t is derived from the tilt by

$$g_t = 1 - \text{tilt} \quad \text{bounded by } 0.2 \le g_t \le 1.0$$

For strongly voiced signals where the tilt approaches 1, g_t is 0.2 and for strongly unvoiced signals g_t becomes 1.0.




Method 2:




The tilt factor is first restricted to be larger than or equal to zero, then the scaling factor g_t is derived from the tilt by

$$g_t = 10^{-0.6\,\text{tilt}}$$

The scaled noise sequence w_g produced in gain adjusting module 214 is therefore given by:

$$w_g = g_t\, w.$$

When the tilt is close to zero, the scaling factor g_t is close to 1, which does not result in energy reduction. When the tilt value is 1, the scaling factor g_t results in a reduction of 12 dB in the energy of the generated noise.
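The tilt measurement and the two scaling rules can be sketched together. The function names are hypothetical, and the sample s_h(−1) needed at n=0 is passed in explicitly with a default of zero, which is an assumption. The last function makes the 12 dB figure easy to check, since 20·log10(10^(−0.6)) = −12.

```python
import math


def tilt_factor(s_h, r_v, prev_sample=0.0):
    """First correlation coefficient of s_h, limited below by 0 and by r_v."""
    num = sum(
        s_h[n] * (s_h[n - 1] if n > 0 else prev_sample) for n in range(len(s_h))
    )
    den = sum(s * s for s in s_h)
    raw = num / den if den else 0.0
    return max(raw, 0.0, r_v)


def noise_gain_method1(tilt):
    """g_t = 1 - tilt, bounded to the range [0.2, 1.0]."""
    return min(1.0, max(0.2, 1.0 - tilt))


def noise_gain_method2(tilt):
    """g_t = 10^(-0.6 * tilt), with the tilt first restricted to be >= 0."""
    return 10.0 ** (-0.6 * max(tilt, 0.0))
```

A constant signal yields a tilt of 1 and therefore the strongest noise reduction, while a signal alternating in sign yields a negative raw value that the lower limits replace by max(0, r_v).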




Once the noise is properly scaled (w_g), it is brought into the speech domain using the spectral shaper 215. In the preferred embodiment, this is achieved by filtering the noise w_g through a bandwidth expanded version of the same LP synthesis filter used in the down-sampled domain (1/Â(z/0.8)). The corresponding bandwidth expanded LP filter coefficients are calculated in spectral shaper 215.
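Substituting z/γ for z in Â(z) multiplies the i-th coefficient by γ^i, so the bandwidth expanded coefficients can be obtained as in the sketch below. The function name and the coefficient layout (a[0] = 1 followed by the coefficients of z^{−1}, z^{−2}, ...) are assumptions for this illustration.

```python
def bandwidth_expand(a, gamma=0.8):
    """Coefficients of A(z/gamma): a_i becomes a_i * gamma**i."""
    return [coef * gamma ** i for i, coef in enumerate(a)]
```

For instance, A(z) = 1 − 1.6z^{−1} + 0.64z^{−2} has a double root at 0.8; after expansion with γ = 0.8 the coefficients are approximately [1, −1.28, 0.4096], whose double root lies at 0.64. Moving the poles of the synthesis filter toward the origin in this way broadens its spectral peaks.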




The filtered scaled noise sequence w_f is then band-pass filtered to the required frequency range to be restored using the band-pass filter 216. In the preferred embodiment, the band-pass filter 216 restricts the noise sequence to the frequency range 5.6-7.2 kHz. The resulting band-pass filtered noise sequence z is added in adder 221 to the oversampled synthesized speech signal ŝ to obtain the final reconstructed sound signal s_out on the output 223.




Although the present invention has been described hereinabove by way of a preferred embodiment thereof, this embodiment can be modified at will, within the scope of the appended claims, without departing from the spirit and nature of the subject invention. Even though the preferred embodiment discusses the use of wideband speech signals, it will be obvious to those skilled in the art that the subject invention is also directed to other embodiments using wideband signals in general and that it is not necessarily limited to speech applications.



Claims
  • 1. A device for enhancing periodicity of an excitation signal produced in relation to a pitch codevector and an innovative codevector for supplying a signal synthesis filter in view of synthesizing a wideband speech signal, said periodicity enhancing device comprising:a) a factor generator for calculating a periodicity factor related to the wideband speech signal; and b) an innovation filter for filtering the innovative codevector in relation to said periodicity factor to thereby reduce energy of a low frequency portion of the innovative codevector and enhance periodicity of a low frequency portion of the excitation signal.
  • 2. A periodicity enhancing device as defined in claim 1, wherein said factor generator comprises a means for calculating a periodicity factor in response to the pitch codevector and the innovative codevector.
  • 3. A periodicity enhancing device as defined in claim 1, wherein said innovation filter has a transfer function of the form: $F(z) = -\alpha z + 1 - \alpha z^{-1}$ where α is a periodicity factor derived from a level of periodicity of the excitation signal.
  • 4. A periodicity enhancing device as defined in claim 3, wherein said factor generator comprises a means for calculating said periodicity factor α using the relation: $\alpha = q R_p$ bounded by $\alpha < q$, where q is an enhancement factor, and where $R_p = \frac{b^2 v_T^t v_T}{u^t u} = \frac{b^2 \sum_{n=0}^{N-1} v_T^2(n)}{\sum_{n=0}^{N-1} u^2(n)}$ where v_T is the pitch codevector, b is a pitch gain, N is a subframe length, and u is the excitation signal.
  • 5. A periodicity enhancing device as defined in claim 4, wherein said enhancement factor q is set to 0.25.
  • 6. A periodicity enhancing device as defined in claim 3, wherein said factor generator comprises a means for calculating said periodicity factor α using the relation: $\alpha = 0.125\,(1 + r_v)$, where $r_v = (E_v - E_c)/(E_v + E_c)$, where E_v is the energy of the pitch codevector and E_c is the energy of the innovative codevector.
  • 7. A periodicity enhancing device as defined in claim 1, wherein said innovation filter has a transfer function of the form: $F(z) = 1 - \sigma z^{-1}$ where σ is a periodicity factor derived from a level of periodicity of the excitation signal.
  • 8. A periodicity enhancing device as defined in claim 7, wherein said factor generator comprises a means for calculating said periodicity factor σ using the relation: $\sigma = 2 q R_p$ bounded by $\sigma < 2q$, where q is an enhancement factor, and where $R_p = \frac{b^2 v_T^t v_T}{u^t u} = \frac{b^2 \sum_{n=0}^{N-1} v_T^2(n)}{\sum_{n=0}^{N-1} u^2(n)}$ where v_T is the pitch codevector, b is a pitch gain, N is a subframe length, and u is the excitation signal.
  • 9. A periodicity enhancing device as defined in claim 8, wherein said enhancement factor q is set to 0.25.
  • 10. A periodicity enhancing device as defined in claim 7, wherein said factor generator comprises a means for calculating said periodicity factor σ using the relation: $\sigma = 0.25\,(1 + r_v)$, where $r_v = (E_v - E_c)/(E_v + E_c)$, where E_v is the energy of the pitch codevector and E_c is the energy of the innovative codevector.
  • 11. A method for enhancing periodicity of an excitation signal produced in relation to a pitch codevector and an innovative codevector for supplying a signal synthesis filter in view of synthesizing a wideband speech signal, said periodicity enhancing method comprising:a) calculating a periodicity factor related to the wideband speech signal; and b) filtering the innovative codevector in relation to said periodicity factor to thereby reduce energy of a low frequency portion of the innovative codevector and enhance periodicity of a low frequency portion of the excitation signal.
  • 12. A method for enhancing periodicity as defined in claim 10, wherein said factor generator comprises a means for calculating a periodicity factor in response to the pitch codevector and the innovative codevector.
  • 13. A method for enhancing periodicity as defined in claim 10, wherein said filtering comprises processing the innovation vector through an innovation filter having a transfer function of the form: $F(z) = -\alpha z + 1 - \alpha z^{-1}$ where α is a periodicity factor derived from a level of periodicity of the excitation signal.
  • 14. A method for enhancing periodicity as defined in claim 13, wherein said periodicity factor calculation comprises calculating said periodicity factor α using the relation: $\alpha = q R_p$ bounded by $\alpha < q$, where q is an enhancement factor, and where $R_p = \frac{b^2 v_T^t v_T}{u^t u} = \frac{b^2 \sum_{n=0}^{N-1} v_T^2(n)}{\sum_{n=0}^{N-1} u^2(n)}$ where v_T is the pitch codevector, b is a pitch gain, N is a subframe length, and u is the excitation signal.
  • 15. A method for enhancing periodicity as defined in claim 14, wherein said enhancement factor q is set to 0.25.
  • 16. A method for enhancing periodicity as defined in claim 13, wherein said periodicity factor calculation comprises calculating said periodicity factor α using the relation: $\alpha = 0.125\,(1 + r_v)$, where $r_v = (E_v - E_c)/(E_v + E_c)$, where E_v is the energy of the pitch codevector and E_c is the energy of the innovative codevector.
  • 17. A method for enhancing periodicity as defined in claim 11, wherein said filtering comprises processing the innovation vector through an innovation filter having a transfer function of the form: $F(z) = 1 - \sigma z^{-1}$ where σ is a periodicity factor derived from a level of periodicity of the excitation signal.
  • 18. A method for enhancing periodicity as defined in claim 17, wherein said periodicity factor calculation comprises calculating said periodicity factor σ using the relation: $\sigma = 2 q R_p$ bounded by $\sigma < 2q$, where q is an enhancement factor, and where $R_p = \frac{b^2 v_T^t v_T}{u^t u} = \frac{b^2 \sum_{n=0}^{N-1} v_T^2(n)}{\sum_{n=0}^{N-1} u^2(n)}$ where v_T is the pitch codevector, b is a pitch gain, N is a subframe length, and u is the excitation signal.
  • 19. A method for enhancing periodicity as defined in claim 18, wherein said enhancement factor q is set to 0.25.
  • 20. A method for enhancing periodicity as defined in claim 17, wherein said periodicity factor calculation comprises calculating said periodicity factor σ using the relation: $\sigma = 0.25\,(1 + r_v)$, where $r_v = (E_v - E_c)/(E_v + E_c)$, where E_v is the energy of the pitch codevector and E_c is the energy of the innovative codevector.
  • 21. A decoder for producing a synthesized wideband speech signal, comprising: a) a signal fragmenting device for receiving an encoded wideband speech signal and extracting from said encoded wideband speech signal at least pitch codebook parameters, innovative codebook parameters, and synthesis filter coefficients; b) a pitch codebook responsive to said pitch codebook parameters for producing a pitch codevector; c) an innovative codebook responsive to said innovative codebook parameters for producing an innovative codevector; d) a periodicity enhancing device as recited in claim 1 comprising said factor generator for calculating a periodicity factor related to the wideband speech signal, and said innovation filter for filtering the innovative codevector; e) a combiner circuit for combining said pitch codevector and said innovative codevector filtered by said innovation filter to thereby produce said periodicity enhanced excitation signal; and f) a signal synthesis filter for filtering said periodicity enhanced excitation signal in relation to said synthesis filter coefficients to thereby produce said synthesized wideband speech signal.
  • 22. A decoder for producing a synthesized wideband speech signal as defined in claim 21, wherein said factor generator comprises a means for calculating a periodicity factor in response to the pitch codevector and the innovative codevector.
  • 23. A decoder for producing a synthesized wideband speech signal as defined in claim 21, wherein said innovation filter has a transfer function of the form: $F(z) = -\alpha z + 1 - \alpha z^{-1}$, where $\alpha$ is a periodicity factor derived from a level of periodicity of the excitation signal.
  • 24. A decoder for producing a synthesized wideband speech signal as defined in claim 23, wherein said factor generator comprises a means for calculating said periodicity factor $\alpha$ using the relation: $\alpha = qR_p$ bounded by $\alpha < q$, where $q$ is an enhancement factor, and where $R_p = \frac{b^2\, v_T^t v_T}{u^t u} = \frac{b^2 \sum_{n=0}^{N-1} v_T^2(n)}{\sum_{n=0}^{N-1} u^2(n)}$, where $v_T$ is the pitch codevector, $b$ is a pitch gain, $N$ is a subframe length, and $u$ is the excitation signal.
  • 25. A decoder for producing a synthesized wideband speech signal as defined in claim 24, wherein said enhancement factor q is set to 0.25.
  • 26. A decoder for producing a synthesized wideband speech signal as defined in claim 23, wherein said factor generator comprises a means for calculating said periodicity factor $\alpha$ using the relation: $\alpha = 0.125\,(1 + r_v)$, where $r_v = (E_v - E_c)/(E_v + E_c)$, $E_v$ is the energy of the pitch codevector, and $E_c$ is the energy of the innovative codevector.
  • 27. A decoder for producing a synthesized wideband speech signal as defined in claim 21, wherein said innovation filter has a transfer function of the form: $F(z) = 1 - \sigma z^{-1}$, where $\sigma$ is a periodicity factor derived from a level of periodicity of the excitation signal.
  • 28. A decoder for producing a synthesized wideband speech signal as defined in claim 27, wherein said factor generator comprises a means for calculating said periodicity factor $\sigma$ using the relation: $\sigma = 2qR_p$ bounded by $\sigma < 2q$, where $q$ is an enhancement factor, and where $R_p = \frac{b^2\, v_T^t v_T}{u^t u} = \frac{b^2 \sum_{n=0}^{N-1} v_T^2(n)}{\sum_{n=0}^{N-1} u^2(n)}$, where $v_T$ is the pitch codevector, $b$ is a pitch gain, $N$ is a subframe length, and $u$ is the excitation signal.
  • 29. A decoder for producing a synthesized wideband speech signal as defined in claim 28, wherein said enhancement factor q is set to 0.25.
  • 30. A decoder for producing a synthesized wideband speech signal as defined in claim 27, wherein said factor generator comprises a means for calculating said periodicity factor $\sigma$ using the relation: $\sigma = 0.25\,(1 + r_v)$, where $r_v = (E_v - E_c)/(E_v + E_c)$, $E_v$ is the energy of the pitch codevector, and $E_c$ is the energy of the innovative codevector.
  • 31. In a decoder for producing a synthesized wideband speech signal, comprising: a) a signal fragmenting device for receiving an encoded wideband speech signal and extracting from said encoded wideband speech signal at least pitch codebook parameters, innovative codebook parameters, and synthesis filter coefficients; b) a pitch codebook responsive to said pitch codebook parameters for producing a pitch codevector; c) an innovative codebook responsive to said innovative codebook parameters for producing an innovative codevector; d) a combiner circuit for combining said pitch codevector and innovative codevector to thereby produce an excitation signal; and e) a signal synthesis filter for filtering said excitation signal in relation to said synthesis filter coefficients to thereby produce said synthesized wideband speech signal; the improvement comprising a periodicity enhancing device as recited in claim 1 comprising said factor generator for calculating a periodicity factor related to the wideband speech signal, and said innovation filter for filtering the innovative codevector.
  • 32. A decoder for producing a synthesized wideband speech signal as defined in claim 31, wherein said factor generator comprises a means for calculating a periodicity factor in response to the pitch codevector and the innovative codevector.
  • 33. A decoder for producing a synthesized wideband speech signal as defined in claim 31, wherein said innovation filter has a transfer function of the form: $F(z) = -\alpha z + 1 - \alpha z^{-1}$, where $\alpha$ is a periodicity factor derived from a level of periodicity of the excitation signal.
  • 34. A decoder for producing a synthesized wideband speech signal as defined in claim 33, wherein said factor generator comprises a means for calculating said periodicity factor $\alpha$ using the relation: $\alpha = qR_p$ bounded by $\alpha < q$, where $q$ is an enhancement factor, and where $R_p = \frac{b^2\, v_T^t v_T}{u^t u} = \frac{b^2 \sum_{n=0}^{N-1} v_T^2(n)}{\sum_{n=0}^{N-1} u^2(n)}$, where $v_T$ is the pitch codevector, $b$ is a pitch gain, $N$ is a subframe length, and $u$ is the excitation signal.
  • 35. A decoder for producing a synthesized wideband speech signal as defined in claim 34, wherein said enhancement factor q is set to 0.25.
  • 36. A decoder for producing a synthesized wideband speech signal as defined in claim 33, wherein said factor generator comprises a means for calculating said periodicity factor $\alpha$ using the relation: $\alpha = 0.125\,(1 + r_v)$, where $r_v = (E_v - E_c)/(E_v + E_c)$, $E_v$ is the energy of the pitch codevector, and $E_c$ is the energy of the innovative codevector.
  • 37. A decoder for producing a synthesized wideband speech signal as defined in claim 31, wherein said innovation filter has a transfer function of the form: $F(z) = 1 - \sigma z^{-1}$, where $\sigma$ is a periodicity factor derived from a level of periodicity of the excitation signal.
  • 38. A decoder for producing a synthesized wideband speech signal as defined in claim 37, wherein said factor generator comprises a means for calculating said periodicity factor $\sigma$ using the relation: $\sigma = 2qR_p$ bounded by $\sigma < 2q$, where $q$ is an enhancement factor, and where $R_p = \frac{b^2\, v_T^t v_T}{u^t u} = \frac{b^2 \sum_{n=0}^{N-1} v_T^2(n)}{\sum_{n=0}^{N-1} u^2(n)}$, where $v_T$ is the pitch codevector, $b$ is a pitch gain, $N$ is a subframe length, and $u$ is the excitation signal.
  • 39. A decoder for producing a synthesized wideband speech signal as defined in claim 38, wherein said enhancement factor q is set to 0.25.
  • 40. A decoder for producing a synthesized wideband speech signal as defined in claim 37, wherein said factor generator comprises a means for calculating said periodicity factor $\sigma$ using the relation: $\sigma = 0.25\,(1 + r_v)$, where $r_v = (E_v - E_c)/(E_v + E_c)$, $E_v$ is the energy of the pitch codevector, and $E_c$ is the energy of the innovative codevector.
  • 41. A cellular communication system for servicing a large geographical area divided into a plurality of cells, comprising: a) mobile transmitter/receiver units; b) cellular base stations respectively situated in said cells; c) a control terminal for controlling communication between the cellular base stations; d) a bidirectional wireless communication sub-system between each mobile unit situated in one cell and the cellular base station of said one cell, said bidirectional wireless communication sub-system comprising, in both the mobile unit and the cellular base station: i) a transmitter including an encoder for encoding a wideband speech signal and a transmission circuit for transmitting the encoded wideband speech signal; and ii) a receiver including a receiving circuit for receiving a transmitted encoded wideband speech signal and a decoder as recited in claim 21 for decoding the received encoded wideband speech signal.
  • 42. A cellular communication system as defined in claim 41, wherein said factor generator comprises a means for calculating a periodicity factor in response to the pitch codevector and the innovative codevector.
  • 43. A cellular communication system as defined in claim 41, wherein said innovation filter has a transfer function of the form: $F(z) = -\alpha z + 1 - \alpha z^{-1}$, where $\alpha$ is a periodicity factor derived from a level of periodicity of the excitation signal.
  • 44. A cellular communication system as defined in claim 43, wherein said factor generator comprises a means for calculating said periodicity factor $\alpha$ using the relation: $\alpha = qR_p$ bounded by $\alpha < q$, where $q$ is an enhancement factor, and where $R_p = \frac{b^2\, v_T^t v_T}{u^t u} = \frac{b^2 \sum_{n=0}^{N-1} v_T^2(n)}{\sum_{n=0}^{N-1} u^2(n)}$, where $v_T$ is the pitch codevector, $b$ is a pitch gain, $N$ is a subframe length, and $u$ is the excitation signal.
  • 45. A cellular communication system as defined in claim 44, wherein said enhancement factor q is set to 0.25.
  • 46. A cellular communication system as defined in claim 43, wherein said factor generator comprises a means for calculating said periodicity factor $\alpha$ using the relation: $\alpha = 0.125\,(1 + r_v)$, where $r_v = (E_v - E_c)/(E_v + E_c)$, $E_v$ is the energy of the pitch codevector, and $E_c$ is the energy of the innovative codevector.
  • 47. A cellular communication system as defined in claim 41, wherein said innovation filter has a transfer function of the form: $F(z) = 1 - \sigma z^{-1}$, where $\sigma$ is a periodicity factor derived from a level of periodicity of the excitation signal.
  • 48. A cellular communication system as defined in claim 47, wherein said factor generator comprises a means for calculating said periodicity factor $\sigma$ using the relation: $\sigma = 2qR_p$ bounded by $\sigma < 2q$, where $q$ is an enhancement factor, and where $R_p = \frac{b^2\, v_T^t v_T}{u^t u} = \frac{b^2 \sum_{n=0}^{N-1} v_T^2(n)}{\sum_{n=0}^{N-1} u^2(n)}$, where $v_T$ is the pitch codevector, $b$ is a pitch gain, $N$ is a subframe length, and $u$ is the excitation signal.
  • 49. A cellular communication system as defined in claim 48, wherein said enhancement factor q is set to 0.25.
  • 50. A cellular communication system as defined in claim 47, wherein said factor generator comprises a means for calculating said periodicity factor $\sigma$ using the relation: $\sigma = 0.25\,(1 + r_v)$, where $r_v = (E_v - E_c)/(E_v + E_c)$, $E_v$ is the energy of the pitch codevector, and $E_c$ is the energy of the innovative codevector.
  • 51. A cellular mobile transmitter/receiver unit comprising: a) a transmitter including an encoder for encoding a wideband speech signal and a transmission circuit for transmitting the encoded wideband speech signal; and b) a receiver including a receiving circuit for receiving a transmitted encoded wideband speech signal and a decoder as recited in claim 21 for decoding the received encoded wideband speech signal.
  • 52. A cellular mobile transmitter/receiver unit as defined in claim 51, wherein said factor generator comprises a means for calculating a periodicity factor in response to the pitch codevector and the innovative codevector.
  • 53. A cellular mobile transmitter/receiver unit as defined in claim 51, wherein said innovation filter has a transfer function of the form: $F(z) = -\alpha z + 1 - \alpha z^{-1}$, where $\alpha$ is a periodicity factor derived from a level of periodicity of the excitation signal.
  • 54. A cellular mobile transmitter/receiver unit as defined in claim 53, wherein said factor generator comprises a means for calculating said periodicity factor $\alpha$ using the relation: $\alpha = qR_p$ bounded by $\alpha < q$, where $q$ is an enhancement factor, and where $R_p = \frac{b^2\, v_T^t v_T}{u^t u} = \frac{b^2 \sum_{n=0}^{N-1} v_T^2(n)}{\sum_{n=0}^{N-1} u^2(n)}$, where $v_T$ is the pitch codevector, $b$ is a pitch gain, $N$ is a subframe length, and $u$ is the excitation signal.
  • 55. A cellular mobile transmitter/receiver unit as defined in claim 54, wherein said enhancement factor q is set to 0.25.
  • 56. A cellular mobile transmitter/receiver unit as defined in claim 53, wherein said factor generator comprises a means for calculating said periodicity factor $\alpha$ using the relation: $\alpha = 0.125\,(1 + r_v)$, where $r_v = (E_v - E_c)/(E_v + E_c)$, $E_v$ is the energy of the pitch codevector, and $E_c$ is the energy of the innovative codevector.
  • 57. A cellular mobile transmitter/receiver unit as defined in claim 51, wherein said innovation filter has a transfer function of the form: $F(z) = 1 - \sigma z^{-1}$, where $\sigma$ is a periodicity factor derived from a level of periodicity of the excitation signal.
  • 58. A periodicity enhancing device as defined in claim 57, wherein said factor generator comprises a means for calculating said periodicity factor $\sigma$ using the relation: $\sigma = 2qR_p$ bounded by $\sigma < 2q$, where $q$ is an enhancement factor, and where $R_p = \frac{b^2\, v_T^t v_T}{u^t u} = \frac{b^2 \sum_{n=0}^{N-1} v_T^2(n)}{\sum_{n=0}^{N-1} u^2(n)}$, where $v_T$ is the pitch codevector, $b$ is a pitch gain, $N$ is a subframe length, and $u$ is the excitation signal.
  • 59. A cellular mobile transmitter/receiver unit as defined in claim 58, wherein said enhancement factor q is set to 0.25.
  • 60. A cellular mobile transmitter/receiver unit as defined in claim 57, wherein said factor generator comprises a means for calculating said periodicity factor $\sigma$ using the relation: $\sigma = 0.25\,(1 + r_v)$, where $r_v = (E_v - E_c)/(E_v + E_c)$, $E_v$ is the energy of the pitch codevector, and $E_c$ is the energy of the innovative codevector.
  • 61. A cellular network element comprising: a) a transmitter including an encoder for encoding a wideband speech signal and a transmission circuit for transmitting the encoded wideband speech signal; and b) a receiver including a receiving circuit for receiving a transmitted encoded wideband speech signal and a decoder as recited in claim 21 for decoding the received encoded wideband speech signal.
  • 62. A cellular network element as defined in claim 61, wherein said factor generator comprises a means for calculating a periodicity factor in response to the pitch codevector and the innovative codevector.
  • 63. A cellular network element as defined in claim 61, wherein said innovation filter has a transfer function of the form: $F(z) = -\alpha z + 1 - \alpha z^{-1}$, where $\alpha$ is a periodicity factor derived from a level of periodicity of the excitation signal.
  • 64. A cellular network element as defined in claim 63, wherein said factor generator comprises a means for calculating said periodicity factor $\alpha$ using the relation: $\alpha = qR_p$ bounded by $\alpha < q$, where $q$ is an enhancement factor, and where $R_p = \frac{b^2\, v_T^t v_T}{u^t u} = \frac{b^2 \sum_{n=0}^{N-1} v_T^2(n)}{\sum_{n=0}^{N-1} u^2(n)}$, where $v_T$ is the pitch codevector, $b$ is a pitch gain, $N$ is a subframe length, and $u$ is the excitation signal.
  • 65. A cellular network element as defined in claim 64, wherein said enhancement factor q is set to 0.25.
  • 66. A cellular network element as defined in claim 63, wherein said factor generator comprises a means for calculating said periodicity factor $\alpha$ using the relation: $\alpha = 0.125\,(1 + r_v)$, where $r_v = (E_v - E_c)/(E_v + E_c)$, $E_v$ is the energy of the pitch codevector, and $E_c$ is the energy of the innovative codevector.
  • 67. A cellular network element as defined in claim 61, wherein said innovation filter has a transfer function of the form: $F(z) = 1 - \sigma z^{-1}$, where $\sigma$ is a periodicity factor derived from a level of periodicity of the excitation signal.
  • 68. A cellular network element as defined in claim 67, wherein said factor generator comprises a means for calculating said periodicity factor $\sigma$ using the relation: $\sigma = 2qR_p$ bounded by $\sigma < 2q$, where $q$ is an enhancement factor, and where $R_p = \frac{b^2\, v_T^t v_T}{u^t u} = \frac{b^2 \sum_{n=0}^{N-1} v_T^2(n)}{\sum_{n=0}^{N-1} u^2(n)}$, where $v_T$ is the pitch codevector, $b$ is a pitch gain, $N$ is a subframe length, and $u$ is the excitation signal.
  • 69. A cellular network element as defined in claim 68, wherein said enhancement factor q is set to 0.25.
  • 70. A cellular network element as defined in claim 67, wherein said factor generator comprises a means for calculating said periodicity factor $\sigma$ using the relation: $\sigma = 0.25\,(1 + r_v)$, where $r_v = (E_v - E_c)/(E_v + E_c)$, $E_v$ is the energy of the pitch codevector, and $E_c$ is the energy of the innovative codevector.
  • 71. In a cellular communication system for servicing a large geographical area divided into a plurality of cells, comprising: mobile transmitter/receiver units; cellular base stations, respectively situated in said cells; and a control terminal for controlling communication between the cellular base stations: a bidirectional wireless communication sub-system between each mobile unit situated in one cell and the cellular base station of said one cell, said bidirectional wireless communication sub-system comprising, in both the mobile unit and the cellular base station: a) a transmitter including an encoder for encoding a wideband speech signal and a transmission circuit for transmitting the encoded wideband speech signal; and b) a receiver including a receiving circuit for receiving a transmitted encoded wideband speech signal and a decoder as recited in claim 21 for decoding the received encoded wideband speech signal.
  • 72. A bidirectional wireless communication sub-system as defined in claim 71, wherein said factor generator comprises a means for calculating a periodicity factor in response to the pitch codevector and the innovative codevector.
  • 73. A bidirectional wireless communication sub-system as defined in claim 71, wherein said innovation filter has a transfer function of the form: $F(z) = -\alpha z + 1 - \alpha z^{-1}$, where $\alpha$ is a periodicity factor derived from a level of periodicity of the excitation signal.
  • 74. A bidirectional wireless communication sub-system as defined in claim 73, wherein said factor generator comprises a means for calculating said periodicity factor $\alpha$ using the relation: $\alpha = qR_p$ bounded by $\alpha < q$, where $q$ is an enhancement factor, and where $R_p = \frac{b^2\, v_T^t v_T}{u^t u} = \frac{b^2 \sum_{n=0}^{N-1} v_T^2(n)}{\sum_{n=0}^{N-1} u^2(n)}$, where $v_T$ is the pitch codevector, $b$ is a pitch gain, $N$ is a subframe length, and $u$ is the excitation signal.
  • 75. A bidirectional wireless communication sub-system as defined in claim 74, wherein said enhancement factor q is set to 0.25.
  • 76. A bidirectional wireless communication sub-system as defined in claim 73, wherein said factor generator comprises a means for calculating said periodicity factor $\alpha$ using the relation: $\alpha = 0.125\,(1 + r_v)$, where $r_v = (E_v - E_c)/(E_v + E_c)$, $E_v$ is the energy of the pitch codevector, and $E_c$ is the energy of the innovative codevector.
  • 77. A bidirectional wireless communication sub-system as defined in claim 71, wherein said innovation filter has a transfer function of the form: $F(z) = 1 - \sigma z^{-1}$, where $\sigma$ is a periodicity factor derived from a level of periodicity of the excitation signal.
  • 78. A bidirectional wireless communication sub-system as defined in claim 77, wherein said factor generator comprises a means for calculating said periodicity factor $\sigma$ using the relation: $\sigma = 2qR_p$ bounded by $\sigma < 2q$, where $q$ is an enhancement factor, and where $R_p = \frac{b^2\, v_T^t v_T}{u^t u} = \frac{b^2 \sum_{n=0}^{N-1} v_T^2(n)}{\sum_{n=0}^{N-1} u^2(n)}$, where $v_T$ is the pitch codevector, $b$ is a pitch gain, $N$ is a subframe length, and $u$ is the excitation signal.
  • 79. A bidirectional wireless communication sub-system as defined in claim 78, wherein said enhancement factor q is set to 0.25.
  • 80. A bidirectional wireless communication sub-system as defined in claim 77, wherein said factor generator comprises a means for calculating said periodicity factor $\sigma$ using the relation: $\sigma = 0.25\,(1 + r_v)$, where $r_v = (E_v - E_c)/(E_v + E_c)$, $E_v$ is the energy of the pitch codevector, and $E_c$ is the energy of the innovative codevector.
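
The relations recited in the claims above (the two innovation filter forms and the two ways of deriving the periodicity factor) can be sketched in Python with NumPy. This is an illustrative sketch only: the function names, the zero-padding at subframe boundaries, the treatment of zero-energy subframes, and the use of min() to apply the bound are assumptions made here, since the claims state only the formulas.

```python
import numpy as np

def periodicity_ratio(v_T, u, b):
    """R_p = b^2 * sum(v_T(n)^2) / sum(u(n)^2) over one subframe."""
    energy_u = float(np.dot(u, u))
    if energy_u == 0.0:
        return 0.0  # assumption: a silent subframe counts as non-periodic
    return b * b * float(np.dot(v_T, v_T)) / energy_u

def alpha_from_rp(v_T, u, b, q=0.25):
    """alpha = q * R_p, limited to q (assumption: min() applies the bound)."""
    return min(q * periodicity_ratio(v_T, u, b), q)

def voicing_ratio(pitch_vec, innov_vec):
    """r_v = (E_v - E_c) / (E_v + E_c), with E_v and E_c the two energies."""
    e_v = float(np.dot(pitch_vec, pitch_vec))
    e_c = float(np.dot(innov_vec, innov_vec))
    total = e_v + e_c
    return 0.0 if total == 0.0 else (e_v - e_c) / total

def alpha_from_rv(pitch_vec, innov_vec):
    """alpha = 0.125 * (1 + r_v), for the three-tap filter."""
    return 0.125 * (1.0 + voicing_ratio(pitch_vec, innov_vec))

def sigma_from_rv(pitch_vec, innov_vec):
    """sigma = 0.25 * (1 + r_v), for the two-tap filter."""
    return 0.25 * (1.0 + voicing_ratio(pitch_vec, innov_vec))

def three_tap_filter(c, alpha):
    """F(z) = -alpha*z + 1 - alpha*z^-1, i.e.
    c_f(n) = c(n) - alpha * (c(n+1) + c(n-1)).
    Assumption: samples outside the subframe are taken as zero."""
    c = np.asarray(c, dtype=float)
    return np.convolve(c, [-alpha, 1.0, -alpha], mode="same")

def two_tap_filter(c, sigma):
    """F(z) = 1 - sigma*z^-1, i.e. c_f(n) = c(n) - sigma * c(n-1).
    Assumption: the sample before the subframe is taken as zero."""
    c = np.asarray(c, dtype=float)
    out = c.copy()
    out[1:] -= sigma * c[:-1]
    return out
```

With a unit pulse as the innovative codevector, the three-tap form subtracts a fraction alpha of the pulse from both neighbouring samples, while the two-tap form subtracts a fraction sigma from the following sample only. At zero frequency the gains are 1 - 2*alpha and 1 - sigma respectively, so a larger periodicity factor attenuates the low-frequency content of the innovative codevector more strongly.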
Priority Claims (1)
Number Date Country Kind
2252170 Oct 1998 CA
PCT Information
Filing Document Filing Date Country Kind
PCT/CA99/01009 WO 00
Publishing Document Publishing Date Country Kind
WO00/25303 5/4/2000 WO A
US Referenced Citations (7)
Number Name Date Kind
5235669 Ordentlich et al. Aug 1993 A
5444816 Adoul et al. Aug 1995 A
5450449 Kroon Sep 1995 A
5699482 Adoul et al. Dec 1997 A
5701392 Adoul et al. Dec 1997 A
5754976 Adoul et al. May 1998 A
5819213 Oshikiri et al. Oct 1998 A
Foreign Referenced Citations (2)
Number Date Country
0788091 Aug 1997 EP
0658874 Aug 1999 EP
Non-Patent Literature Citations (1)
Entry
Atal and Schroeder, “Predictive Coding of Speech Signals and Subjective Error Criteria,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-27, No. 2, Jun. 1979, pp. 247-254.