Waveform interpolation speech coding apparatus and method for reducing complexity thereof

Information

  • Patent Application
  • 20080004867
  • Publication Number
    20080004867
  • Date Filed
    December 19, 2006
  • Date Published
    January 03, 2008
Abstract
A waveform interpolation speech coding apparatus and a method for reducing the complexity thereof are disclosed. The waveform interpolation speech coding apparatus includes: a waveform interpolation encoding unit for receiving a speech signal, calculating parameters for waveform interpolation from the received speech signal, and quantizing the calculated parameters; and a realignment parameter calculating unit for restoring a characteristic waveform (CW) using the quantized parameters and calculating, for the restored CW, a realignment parameter that maximizes a cross-correlation among consecutive CWs.
Description

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other objects and features of the present invention will become better understood with regard to the following description of the preferred embodiments given in conjunction with the accompanying drawings, in which:



FIG. 1 is a block diagram illustrating a waveform interpolation encoder in accordance with a related art;



FIG. 2 is a block diagram illustrating a waveform interpolation encoder for reducing a computation amount of a decoder in accordance with an embodiment of the present invention; and



FIG. 3 is a flowchart of a waveform interpolation encoding method for reducing a computation amount of a decoder in accordance with an embodiment of the present invention.





DETAILED DESCRIPTION OF THE INVENTION

Hereinafter, a waveform interpolation speech coding apparatus and method will be described in more detail with reference to the accompanying drawings.



FIG. 2 is a block diagram illustrating a waveform interpolation encoder for reducing a computation amount of a decoder in accordance with an embodiment of the present invention.


Referring to FIG. 2, the waveform interpolation encoder according to the present embodiment includes a linear prediction coefficient (LPC) analyzer 10, a line spectral frequency (LSF) converter 11, a linear prediction (LP) analysis filter 12, a pitch estimator 13, a characteristic waveform (CW) extractor 14, a power calculator 15, a CW aligning unit 16, a decomposition/down-sampler 17, a slowly evolving waveform (SEW) quantizer 18, a rapidly evolving waveform (REW) quantizer 19, and a realignment parameter calculator 20.


The realignment parameter calculator 20 includes a REW decoder 21, a SEW decoder 22, a waveform compositor 23, and a CW realigning unit 24.


The realignment parameter calculator 20 is newly included in the encoder according to the present embodiment, and calculates a realignment parameter that is a phase shift, which is required to realign CWs in a decoder. The conventional WI encoder obtains an LPC, a pitch period, a power of CW, a SEW, and a REW in an encoding procedure. However, in the present embodiment, the encoder additionally calculates a realignment parameter through the realignment parameter calculator 20 as well as calculating the above five parameters.


First, the waveform interpolation encoder according to an embodiment of the present invention receives a speech signal, calculates parameters for waveform interpolation, and quantizes the calculated parameters.


Then, the waveform interpolation encoder according to the present embodiment calculates a realignment parameter to be used in a decoder. Hereinafter, a step of calculating the realignment parameter will be described.


First, the REW decoder 21 decodes the quantized REW parameter, and the SEW decoder 22 decodes the quantized SEW parameter.


Then, the waveform compositor 23 composites the SEW parameter and the REW parameter, thereby restoring an original CW.


Unlike the CWs output from the CW aligning unit 16 shown in FIG. 1, the CW restored by the waveform compositor 23 is no longer aligned because of quantization error. Therefore, the CW realigning unit 24 calculates a phase shift value for realigning the CWs, in the same manner as the CW alignment operation shown in FIG. 1.
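
The phase shift search described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the CW realigning unit 24: it assumes each CW is a list of time-domain samples over one pitch cycle, that consecutive CWs have equal length, and that the search is an exhaustive scan over integer circular shifts. Practical WI coders often perform the alignment on a frequency-domain representation instead. The names `circular_cross_correlation`, `realignment_shift`, and `max_shift` are hypothetical.

```python
def circular_cross_correlation(prev_cw, cur_cw, shift):
    """Correlate prev_cw with cur_cw circularly rotated by `shift` samples."""
    n = len(prev_cw)
    return sum(prev_cw[k] * cur_cw[(k + shift) % n] for k in range(n))


def realignment_shift(prev_cw, cur_cw, max_shift):
    """Return (shift, correlation) for the shift in [-max_shift, max_shift]
    that maximizes the cross-correlation between consecutive CWs.
    Ties are resolved in favor of a zero shift, then the first maximum found."""
    if len(prev_cw) != len(cur_cw):
        raise ValueError("consecutive CWs must have the same length")
    best_shift = 0
    best_corr = circular_cross_correlation(prev_cw, cur_cw, 0)
    for shift in range(-max_shift, max_shift + 1):
        corr = circular_cross_correlation(prev_cw, cur_cw, shift)
        if corr > best_corr:
            best_shift, best_corr = shift, corr
    return best_shift, best_corr


# Hypothetical toy data: the current CW is the previous CW delayed by 2 samples.
previous = [0, 1, 0, 0, 0, 0, 0, 0]
current = [0, 0, 0, 1, 0, 0, 0, 0]
shift, corr = realignment_shift(previous, current, max_shift=4)
```

In this toy case the search returns a shift of 2, the value that an encoder of this kind would quantize and transmit so that the decoder does not have to repeat the search.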


Accordingly, the waveform interpolation decoder receives the phase shift value for realignment from the encoder and performs the decoding operation without calculating a realignment parameter itself. In the encoder, the computation amount increases because of the additional operation for calculating the realignment parameter. However, in speech storage applications, the encoder is not required to process speech signals in real time. Therefore, although the computation amount of the encoder increases due to the realignment parameter calculation, it does not influence the performance of the speech CODEC.


The realignment parameter obtained in the encoder must be quantized because it is transmitted to the decoder for use in the realignment operation. The influence of quantizing the realignment parameter on the realignment in the decoder can be measured using an average normalized cross-correlation (ANCC), as in Eq. 9.









$$\mathrm{ANCC} = \frac{1}{N} \sum_{i=1}^{N} \left[ \frac{C(n_i, \phi_{T'})}{C(n_i, \phi_{T})} \right] \qquad \text{Eq. 9}$$

In Eq. 9, C(n_i, φ_T) denotes a maximum cross-correlation value for alignment, and C(n_i, φ_T′) denotes a maximum cross-correlation value for realignment.
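
As a rough illustration of how the measure in Eq. 9 could be evaluated, the sketch below averages the per-CW ratio of the realignment correlation to the alignment correlation. It assumes that the realignment value is the numerator and the alignment value the denominator, and that every alignment correlation is nonzero; the function name `ancc` and the sample values are hypothetical.

```python
def ancc(alignment_corrs, realignment_corrs):
    """Average normalized cross-correlation over N characteristic waveforms.

    alignment_corrs[i]   : maximum cross-correlation found during alignment
    realignment_corrs[i] : maximum cross-correlation found during realignment
    """
    if not alignment_corrs or len(alignment_corrs) != len(realignment_corrs):
        raise ValueError("need two non-empty sequences of equal length")
    ratios = [r / a for a, r in zip(alignment_corrs, realignment_corrs)]
    return sum(ratios) / len(ratios)


# Hypothetical values: the second CW is realigned perfectly, the first is not.
example = ancc([2.0, 4.0], [1.0, 4.0])
```

When every realignment correlation equals the corresponding alignment correlation, each ratio is one and so is the average, which matches the perfect-realignment case.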


If the decoder perfectly realigns the CWs, the ANCC value becomes one. Table 1 shows ANCC values measured to show the effect of the realignment parameter in a decoder. The shift range in Table 1 denotes the phase shift range for realignment in a decoder.














TABLE 1

Number of bits    Shift range       ANCC       Realignment rate
0                 0                 0.94667    77.45%
2                 −2 ≦ T ≦ 2        0.96216    91.22%
3                 −4 ≦ T ≦ 4        0.97418    96.38%
4                 −8 ≦ T ≦ 8        0.98722    98.56%
5                 −16 ≦ T ≦ 16      0.99501    99.39%
6                 −32 ≦ T ≦ 32      0.99906    99.89%










In Table 1, when the shift range is 0, that is, when the encoder transmits no realignment value, the decoder does not perform a realignment operation. Even though no realignment operation is performed, 77.45% of all CWs are already aligned, and only 22.55% of CWs are misaligned due to the quantization error.
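
One possible reading of the realignment rate is the fraction of CWs whose required phase shift falls inside the permitted shift range. The sketch below follows that reading only as an assumption; the source does not state how the rate was measured. The function name `realignment_rate` and the list of shifts are hypothetical and are not the data behind Table 1.

```python
def realignment_rate(required_shifts, shift_range):
    """Fraction of CWs whose required shift T satisfies |T| <= shift_range."""
    if not required_shifts:
        raise ValueError("at least one CW is required")
    covered = sum(1 for t in required_shifts if abs(t) <= shift_range)
    return covered / len(required_shifts)


# Hypothetical required shifts for eight CWs.
shifts = [0, 0, 1, -3, 0, 9, 0, 0]
rate_without_realignment = realignment_rate(shifts, 0)
rate_with_range_4 = realignment_rate(shifts, 4)
```

Under this reading, a shift range of 0 counts only the CWs that need no shift at all, and widening the range can only keep the rate the same or raise it.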


When the shift range is −8 ≦ T ≦ 8, four bits are required to transmit the realignment parameter. If the realignment operation is performed using this realignment parameter, 98.56% of CWs are aligned. If a 25 msec frame length is used in the speech signal coding operation and five bits are used for the realignment parameter, the realignment rate is 99.39% relative to a decoder that calculates the realignment itself, and the overall bit rate increases by about 0.2 kbps (five bits per 25 msec frame).
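
The bit rate overhead quoted above can be checked with simple arithmetic. The sketch assumes that exactly one realignment parameter is transmitted per frame, which is consistent with the 0.2 kbps figure but is not stated explicitly; the function name `added_bit_rate_kbps` is hypothetical.

```python
def added_bit_rate_kbps(bits_per_frame, frame_length_ms):
    """Extra bit rate in kbps for sending bits_per_frame once per frame.

    Bits per millisecond are numerically equal to kilobits per second.
    """
    if frame_length_ms <= 0:
        raise ValueError("frame length must be positive")
    return bits_per_frame / frame_length_ms


# Five bits per 25 msec frame, as in the text above.
overhead = added_bit_rate_kbps(5, 25)
```

With five bits and a 25 msec frame the result is 0.2 kbps; the same calculation gives 0.16 kbps for four bits.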



FIG. 3 is a flowchart of a waveform interpolation encoding method for reducing a computation amount of a decoder in accordance with an embodiment of the present invention.


Referring to FIG. 3, an encoder according to the present embodiment receives a speech signal, and calculates parameters for waveform interpolation encoding using the received speech signal. These parameters are an LPC, a pitch period, the power of CW, a SEW, and a REW as shown in FIG. 2, and the calculated parameters are quantized at step S302.


Then, the quantized SEW and REW parameters are decoded, and the two parameters are composited, thereby restoring the original CWs at step S304.


Unlike the CWs output in the CW alignment step, the CW restored at step S304 is no longer aligned because of quantization error. Therefore, a realignment parameter for realigning the CWs is calculated in the same manner as in the CW alignment, and the realignment parameter is quantized at step S306. Herein, the realignment parameter is a parameter that maximizes the cross-correlation among consecutive CWs.


In a conventional decoder, the realignment parameter calculation corresponding to step S306 accounts for about 20% of the entire computation amount. Therefore, it is preferable to calculate the realignment parameter in the encoding procedure of the waveform interpolation encoder, which reduces the computation amount of decoding.


The above-described method according to the present invention can be embodied as a program and stored on a computer readable recording medium. The computer readable recording medium is any data storage device that can store data which can thereafter be read by a computer system. Examples of the computer readable recording medium include a read-only memory (ROM), a random-access memory (RAM), a CD-ROM, a floppy disk, a hard disk, and a magneto-optical disk.


According to certain embodiments of the present invention, an encoder, which is not required to operate in real time, calculates a CW realignment parameter in advance, quantizes the CW realignment parameter, and transmits the quantized CW realignment parameter to a decoder. The decoder uses the received CW realignment parameter to realign the CWs without calculating the CW realignment parameter itself, a calculation that requires a large amount of complicated computation. Therefore, the computation amount of the decoder can be reduced.
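
On the decoder side, using a received realignment parameter reduces to a single circular rotation of the restored CW, with no cross-correlation search. The following is a minimal sketch under the assumptions that the CW is a list of samples over one pitch cycle and that a positive shift advances the waveform; the function name `apply_realignment` and the sample values are hypothetical.

```python
def apply_realignment(cw, shift):
    """Circularly rotate a restored CW by the received phase shift.

    A positive shift advances the waveform: output[k] = cw[(k + shift) % n].
    """
    n = len(cw)
    return [cw[(k + shift) % n] for k in range(n)]


# Hypothetical restored CW that lags its predecessor by 2 samples.
restored = [0, 0, 0, 1, 0, 0, 0, 0]
realigned = apply_realignment(restored, 2)
```

The rotation costs one pass over the CW samples, whereas a search would evaluate a full cross-correlation for every candidate shift.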


Although the bit rate would slightly increase due to transmission of the CW realignment parameter, the computation amount of the decoder can be reduced in the technology field of storing a speech signal in which the computation amount is a major factor influencing the performance thereof.


In the communication technology field, both an encoder and a decoder must operate in real time. However, in the technology field of storing a speech signal, the encoder is not required to operate in real time. Therefore, the present invention allows an encoder to encode, compress, and store the speech signal off-line, and allows a decoder to restore the original speech signal through real-time decoding as needed, thereby reducing the computation amount in the decoder, which requires the real-time decoding operation.


Since most text-to-speech (TTS) synthesizers developed recently are based on a technique known as synthesis by concatenation, the implementation of a high-quality TTS requires huge storage space for a large number of speech segments. In order to compress the database of a TTS system, it is essential to use a speech CODEC. In the technology field of compressing the database of a TTS synthesizer, the computation amount of a decoder seriously influences the performance of a speech CODEC.


The waveform interpolation encoding apparatus according to the present invention may be applied to a TTS synthesizer in order to reduce the complexity of the decoder, so that the database of the TTS synthesizer, once compressed and stored, can be decoded with a smaller amount of computation.


Such an effective speech coding method can be embedded in the TTS synthesizer.


The present application contains subject matter related to Korean patent application Nos. KR 2006-0055059 and KR 2006-81265, filed in the Korean Intellectual Property Office on Jun. 19, 2006 and Aug. 25, 2006, respectively, the entire contents of which are incorporated herein by reference.


While the present invention has been described with respect to certain preferred embodiments, it will be apparent to those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the following claims.

Claims
  • 1. A waveform interpolation coding apparatus for reducing a computation amount of a decoder, comprising: a waveform interpolation encoding means for receiving a speech signal, calculating parameters for a waveform interpolation from the received speech signal, and quantizing the calculating parameters; and a realignment parameter calculating means for restoring a characteristic waveform (CW) using the quantized parameter, calculating a realignment parameter that maximizes a cross-correlation among consecutive CWs for the restored CW.
  • 2. The waveform interpolation coding apparatus as recited in claim 2, wherein the realignment parameter calculating means includes: a rapidly evolving waveform (REW) coding means for receiving a REW parameter among the quantized parameters and decoding the received REW parameter; a slowly evolving waveform (SEW) coding means for receiving a SEW parameter among the quantized parameters and decoding the received SEW parameter; a waveform combining means for combining the decoded REW parameter and the decoded SEW parameter in order to restore the CWs; and a CW realigning means for calculating a realignment parameter that maximizes a cross-correlation among consecutive CWs for the restored CW and quantizing the realignment parameter.
  • 3. The waveform interpolation coding apparatus as recited in claim 2, wherein the CW realigning means allocates a corresponding bit rate for transmitting the obtained realignment parameter to a decoder according to a rate of realigning the CWs.
  • 4. A waveform interpolation encoding method for reducing a computation amount in a decoder, comprising the steps of: a) receiving a speech signal, calculating parameters for waveform interpolation encoding, and quantizing the calculated parameters; b) restoring characteristic waveforms using the quantized parameters; and c) calculate a realignment parameter maximizing a cross-correlation among consecutive CWs for the restored CWs and quantizing the calculated realignment parameter.
  • 5. The waveform interpolation encoding method as recited in claim 4, wherein the step b) includes the steps of: b1) decoding a rapidly evolving waveform (REW) parameter among the quantized parameters; b2) decoding a slowly evolving waveform (SEW) parameter among the quantized parameters; and b3) restoring a CW by combining the decoded REW parameter and the decoded SEW parameter.
  • 6. The waveform interpolation encoding method as recited in claim 4, wherein in the step c), a bit rate for transmitting the calculated realignment parameter to a decoder is allocated according to a rate of realigning the CWs.
Priority Claims (2)
Number Date Country Kind
10-2006-0055059 Jun 2006 KR national
10-2006-0081265 Aug 2006 KR national