The above and other objects and features of the present invention will become better understood with regard to the following description of the preferred embodiments given in conjunction with the accompanying drawings, in which:
Hereinafter, a waveform interpolation speech coding apparatus and method will be described in more detail with reference to the accompanying drawings.
Referring to
The realignment parameter calculator 20 includes a REW decoder 21, a SEW decoder 22, a waveform compositor 23, and a CW realigning unit 24.
The realignment parameter calculator 20 is newly included in the encoder according to the present embodiment, and calculates a realignment parameter, which is the phase shift required to realign CWs in a decoder. The conventional WI encoder obtains an LPC, a pitch period, a power of CW, a SEW, and a REW in an encoding procedure. In the present embodiment, however, the encoder calculates a realignment parameter through the realignment parameter calculator 20 in addition to the above five parameters.
First, the waveform interpolation encoder according to an embodiment of the present invention receives a speech signal, calculates parameters for waveform interpolation, and quantizes the calculated parameters.
Then, the waveform interpolation encoder according to the present embodiment calculates a realignment parameter to be used in a decoder. Hereinafter, a step of calculating the realignment parameter will be described.
First, the REW decoder 21 decodes the quantized REW parameter, and the SEW decoder 22 decodes the quantized SEW parameter.
Then, the waveform compositor 23 composites the SEW parameter and the REW parameter, thereby restoring an original CW.
The CW restored in the waveform compositor 23 is not aligned due to a quantization error unlike the CWs outputted from the CW aligning unit 16 shown in
Accordingly, the waveform interpolation decoder receives the phase shift value for realignment from the encoder and performs a decoding operation without calculating a realignment parameter. In the encoder, the computation amount increases due to the additional operation for calculating the realignment parameter. In the technology field of storing speech signals, however, the encoder is not required to process speech signals in real time. Therefore, although the computation amount of the encoder increases due to the realignment parameter calculation, it does not affect the performance of the speech CODEC.
The realignment parameter obtained in the encoder must be quantized because it is transmitted to the decoder for use in the realignment operation. The influence of quantizing the realignment parameter on the realignment in the decoder can be measured using an average normalized cross-correlation (ANCC), as in Eq. 9.
In Eq. 9, C(ui,φτ) denotes a maximum cross-correlation value for alignment, and C(ui,φτ′) denotes a maximum cross-correlation value for realignment.
If the decoder perfectly realigns the CW, the ANCC value becomes one. Table 1 shows ANCC values measured to show the effect of realignment parameters in a decoder. A shift range in Table 1 denotes a phase shift range for realignment in a decoder.
In Table 1, when the shift range is 0, that is, when the encoder transmits no realignment value, the decoder does not perform a realignment operation. Even though no realignment operation is performed, 77.45% of all CWs are already aligned, and only 22.55% of CWs are misaligned due to the quantization error.
When the shift range is 8, four bits are required to transmit a realignment parameter. If the realignment operation is performed using the realignment parameter, 98.56% of CWs are aligned. If a 25 msec frame length is used in a speech signal coding operation and five bits are used for the realignment parameter, the rate of realignment is 99.39% compared with a real decoder, and the overall bit rate increases by about 0.2 kbps.
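The 0.2 kbps figure can be checked with simple arithmetic. The following is a minimal sketch that assumes exactly one realignment parameter is transmitted per frame; the function name `side_info_bit_rate_kbps` is hypothetical and is introduced only for illustration.

```python
def side_info_bit_rate_kbps(bits_per_frame, frame_length_ms):
    """Bit rate added by a per-frame side parameter.

    Assumption (not stated explicitly in the description): one
    realignment parameter is sent per frame. Bits per millisecond
    are numerically equal to kilobits per second.
    """
    return bits_per_frame / frame_length_ms


# Five bits per 25 msec frame, as in the description.
print(side_info_bit_rate_kbps(5, 25))
```

Under the same assumption, a four-bit parameter with the same frame length would add 4 / 25 = 0.16 kbps.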
Referring to
Then, the quantized SEW and REW parameters are decoded, and the two parameters are composited, thereby restoring the original CWs at step S304.
The CW restored at step S304 is not aligned due to quantization error, unlike the CWs outputted in the CW alignment step. Therefore, a realignment parameter for realigning the CWs is calculated in the same manner as in the CW alignment, and the realignment parameter is quantized at step S306. Herein, the realignment parameter is a parameter for maximizing the cross-correlation between consecutive CWs.
The calculation of the realignment parameter performed at step S306 accounts for about 20% of the entire computation amount when it is performed in a decoder. Therefore, it is preferable to calculate the realignment parameter in the encoding procedure using a waveform interpolation encoder in order to reduce the computation amount of decoding.
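The search for a shift that maximizes the cross-correlation between consecutive CWs can be sketched as follows. This is an illustrative simplification, not the method of the embodiment: it assumes both CWs are time-domain sequences of equal length, that the phase shift is an integer circular shift in samples, and that the search is limited to a symmetric range. All function names are hypothetical.

```python
import math


def circular_shift(cw, shift):
    """Rotate a CW so that output[i] = cw[(i + shift) mod len(cw)]."""
    n = len(cw)
    return [cw[(i + shift) % n] for i in range(n)]


def normalized_cross_correlation(a, b):
    """Zero-lag cross-correlation normalized by the signal energies."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0.0 else 0.0


def realignment_shift(prev_cw, cur_cw, shift_range):
    """Exhaustive search over [-shift_range, +shift_range].

    Returns the shift of cur_cw that maximizes its normalized
    cross-correlation with prev_cw, together with that maximum.
    """
    best_shift = 0
    best_corr = normalized_cross_correlation(prev_cw, cur_cw)
    for shift in range(-shift_range, shift_range + 1):
        corr = normalized_cross_correlation(
            prev_cw, circular_shift(cur_cw, shift))
        if corr > best_corr:
            best_shift, best_corr = shift, corr
    return best_shift, best_corr


# Toy example: one sine period, misaligned by three samples.
prev_cw = [math.sin(2.0 * math.pi * i / 16) for i in range(16)]
cur_cw = circular_shift(prev_cw, 3)
shift, corr = realignment_shift(prev_cw, cur_cw, 8)
```

In this toy example the search recovers a shift of -3, which undoes the three-sample misalignment and gives a normalized cross-correlation of one.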
The above-described method according to the present invention can be embodied as a program and stored on a computer readable recording medium. The computer readable recording medium is any data storage device that can store data which can thereafter be read by a computer system. The computer readable recording medium includes a read-only memory (ROM), a random-access memory (RAM), a CD-ROM, a floppy disk, a hard disk and a magneto-optical disk.
According to certain embodiments of the present invention, an encoder, which is not required to operate in real time, calculates a CW realignment parameter in advance, quantizes the CW realignment parameter, and transmits the quantized CW realignment parameter to a decoder. The decoder uses the received CW realignment parameter for realigning the CWs without itself calculating the CW realignment parameter, which requires a large amount of complicated computation. Therefore, the computation amount of the decoder can be reduced.
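The division of work between encoder and decoder can be illustrated with a minimal sketch. The index mapping below is an assumption made only for illustration (a signed integer shift stored as an unsigned index of a fixed number of bits); the description does not specify how the realignment parameter is quantized, and all function names are hypothetical.

```python
def encode_shift(shift, bits):
    """Encoder side: map a signed shift to an unsigned index.

    Assumed range: -2**(bits-1) .. 2**(bits-1) - 1.
    """
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    if not lo <= shift <= hi:
        raise ValueError("shift does not fit in the given number of bits")
    return shift - lo


def decode_shift(index, bits):
    """Decoder side: recover the signed shift from the received index."""
    return index - (1 << (bits - 1))


def apply_shift(cw, shift):
    """Decoder side: realign a CW by a circular shift, with no search."""
    n = len(cw)
    return [cw[(i + shift) % n] for i in range(n)]


# The decoder only decodes the index and rotates the CW.
index = encode_shift(-3, 4)
realigned = apply_shift([0, 1, 2, 3, 4, 5], decode_shift(index, 4))
```

The decoder performs one index lookup and one rotation per CW, whereas the cross-correlation search stays entirely in the encoder.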
Although the bit rate would slightly increase due to transmission of the CW realignment parameter, the computation amount of the decoder can be reduced in the technology field of storing a speech signal in which the computation amount is a major factor influencing the performance thereof.
In the communication technology field, both an encoder and a decoder must operate in real time. However, in the technology field of storing a speech signal, the encoder is not required to operate in real time. Therefore, the present invention allows an encoder to encode, compress and store the speech signal off-line, and allows a decoder to restore the original speech signal through real time decoding as needed, thereby reducing the computation amount in the decoder, which requires the real time decoding operation.
Since most text-to-speech (TTS) synthesizers developed recently are based on a technique known as synthesis by concatenation, the implementation of a high-quality TTS requires huge storage space for a large number of speech segments. In order to compress the database of a TTS system, it is essential to use a speech CODEC. In a technology field related to compressing the database of a TTS synthesizer, the computation amount of a decoder seriously influences the performance of a speech CODEC.
The waveform interpolation encoding apparatus according to the present invention may be applied to the TTS compositor in order to reduce the complexity of the decoder, so that the database of the TTS compositor, after being compressed and stored, can be decoded with less computation.
Such an effective speech coding method for a TTS compositor can be embedded in the TTS compositor.
The present application contains subject matter related to Korean patent application Nos. KR 2006-0055059 and KR 2006-81265 filed in the Korean Intellectual Property Office on Jun. 19, 2006 and Aug. 25, 2006, respectively, the entire contents of which are incorporated herein by reference.
While the present invention has been described with respect to certain preferred embodiments, it will be apparent to those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the following claims.
Number | Date | Country | Kind |
---|---|---|---|
10-2006-0055059 | Jun 2006 | KR | national |
10-2006-0081265 | Aug 2006 | KR | national |