System and method for reducing distortion in voice synthesis through improved interpolation

Information

  • Patent Grant
  • 5111505
  • Patent Number
    5,111,505
  • Date Filed
    Tuesday, October 16, 1990
  • Date Issued
    Tuesday, May 5, 1992
Abstract
A voice synthesizing device which compiles wave segments, such as pitch wave segments, in order to synthesize speech. Speech is synthesized by connecting wave segments to form a contiguous waveform. Each wave segment is assigned one or more connection types which describe the connection to be made between points on that wave segment and points on adjacent wave segments. A wave segment connector uses information on the connection types of adjacent wave segments to connect the end point and lead point of the adjacent wave segments using a normal sampling period or a normal sampling period compressed or expanded by 1/2 of the sampling period. The period used depends on the connection type stored in the connection type memory.
Description
Claims
  • 1. A device used with a voice synthesizing device which connects wave segments such as pitch wave segments in speech input to the device, comprising:
  • a connection type memory for storing a plurality of wave segment connection types;
  • means for assigning a connection type to a connection between a preceding wave segment and a following wave segment; and
  • a wave segment connector which, when said wave segments are connected, connects an end sampling point of the preceding wave segment and a lead sampling point of the following wave segment utilizing a preferred sampling period between the end sampling point of the preceding wave segment and the lead sampling point of the following wave segment with an interval determined by the connection type assigned to the connection between the preceding wave segment and the following segment.
  • 2. A device according to claim 1 wherein said preferred sampling period is selected from the group consisting of a predetermined sampling time period, three-halves a predetermined sampling time period, and one half of a predetermined sampling time period.
  • 3. A device used with a voice synthesizing device for connecting wave segments, comprising:
  • a) a connection type memory for storing a plurality of preferred connection types for wave segments, said connection types each representing a connection of an interpolated waveform for an end sampled value of a preceding wave segment of a particular type with an interpolated waveform for a lead sampled value of a following wave segment of a particular type, each of said preferred connection types determining a preferred sampling period for use during connection of said wave segments;
  • b) means for assigning a connection type to a connection between a preceding wave segment and a following wave segment by interpolating a time axis zero cross point for said interpolated waveform for said end sampled value of said preceding wave segment and a time axis zero cross point for said interpolated waveform for said lead sampled value of said following wave segment; and
  • c) a wave segment connector providing connection of said preceding and following wave segments using one of said preferred sampling periods as determined by the connection type assigned to the connection between said preceding and following wave segments.
  • 4. A device according to claim 3 wherein said preferred sampling period has one of the following three values: a predetermined sampling time period, three-halves a predetermined sampling time period, and one half of a predetermined sampling time period.
  • 5. A device according to claim 3 wherein said plurality of preferred connection types comprises:
  • a) a first connection type in which both the time axis zero cross point of said interpolated waveform for said lead sampled value of said following wave segment and the time axis zero cross point of said interpolated wave segment for said end sampled value of said preceding wave segment are located within a second half of a predetermined sampling time period;
  • b) a second connection type in which both the time axis zero cross point of said interpolated waveform for said lead sampled value of said following wave segment and the time axis zero cross point of said interpolated wave segment for said end sampled value of said preceding wave segment are located within a first half of a predetermined sampling time period;
  • c) a third connection type in which the time axis zero cross point of said interpolated waveform for said lead sampled value of said following wave segment is located within a second half of a predetermined sampling period and the time axis zero cross point of said interpolated waveform segment for said end sampled value of said preceding wave segment is located within a first half of a predetermined sampling time period; and
  • d) a fourth connection type in which the time axis zero cross point of said interpolated waveform for said lead sampled value of said following wave segment is located within a first half of a predetermined sampling time period and the time axis zero cross point of said interpolated wave segment for said end sampled value of said preceding wave segment is located within a second half of a predetermined sampling time period.
  • 6. A device for connecting wave segments according to claim 3 wherein said wave segments comprise pitch wave segments.
  • 7. A device for connecting wave segments according to claim 3 wherein said wave segments comprise voice wave segments.
  • 8. A device for connecting wave segments according to claim 7 wherein said voice wave segments comprise quasi-voice wave segments.
  • 9. An improved voice synthesizing device of the type in which a read only memory device stores a control program for use by a central processing unit for voice synthesis, a random access memory device is used as a work memory during voice synthesis, a data read only memory device is used to store voice coding data, an input/output interface is provided through which input/output signals pass at the start of voice synthesis and using other processes, a digital to analog convertor is used for conversion of voice wave data synthesized under the control of the central processing unit, and in which an amplifier amplifies an input analog voice wave and outputs to a loudspeaker, wherein the improvement comprises:
  • a) a connection type memory for storing a plurality of preferred connection types for wave segments, said connection types each representing a connection of an interpolated waveform for an end sampled value of a preceding wave segment of a particular type with an interpolated waveform for a lead sampled value of a following wave segment of a particular type, each of said preferred connection types determining a preferred sampling period for use during connection of said wave segments;
  • b) means for assigning a connection type to a connection between a preceding wave segment and a following wave segment by interpolating a time axis zero cross point for said interpolated waveform for said end sampled value of said preceding wave segment and a time axis zero cross point for said interpolated waveform for said lead sampled value of said following wave segment;
  • c) a wave segment connector providing connection of said wave segments using one of said preferred sampling portions as determined by the connection type assigned to the connection between said wave segments to provide a synthesized voice output independent of any distortion in the pitch wave rise; and
  • d) means for electrically interconnecting said connection type memory and said wave segment connector with the control read only memory, the input/output interface, the central processing unit, the data read only memory, and the digital to analog convertor.
  • 10. A method of smoothly connecting wave segments for use in creating a synthesized voice free of distortion in a pitch wave rise, comprising the steps of:
  • a) interpolating between sampled values to determine interpolated values to produce an interpolated waveform;
  • b) identifying a time axis zero cross point for an interpolated waveform of an end sampled value of a preceding wave segment;
  • c) determining a time axis zero cross point for an interpolated waveform of a lead sampled value of a following wave segment;
  • d) classifying the time axis zero cross point of the preceding wave segment and the following wave segment with a connection type memory to select a preferred wave segment connection type;
  • e) selecting a preferred wave segment connection type and a preferred sampling period from a plurality of connection types and sampling periods as determined by said wave types; and
  • f) connecting said preceding wave segment with said following wave segment using said selected preferred wave segment connection type and said selected preferred sampling period to provide a synthesized voice independent of distortion in the pitch wave rise.
  • 11. A method of smoothly connecting wave segments which can be used for creating a synthesized voice free of distortion in the pitch wave rise according to claim 10, wherein the step of selecting a preferred wave segment connection type and a preferred sampling period comprises the steps of:
  • a) categorizing the time axis zero cross points of each of the interpolated waveforms for the preceding wave segment and the following wave segment by determining which memory waveforms stored in a wave segment connection type memory are most similar to said interpolated waveforms; and
  • b) interpolating between said end sampled value and said lead sampled value with the preferred sampling period corresponding to the preferred wave connection type, the sampling period selected from a group comprising a predetermined sampling time, three-halves a predetermined sampling time, and one half a predetermined sampling time.
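The four connection types recited in claim 5 amount to a two-by-two lookup on the half of the sampling period in which each zero cross point falls. A minimal Python sketch of that lookup follows; the function name `connection_type` and the "first"/"second" labels are hypothetical, chosen only for illustration.

```python
def connection_type(lead_half: str, end_half: str) -> int:
    """Return the claim 5 connection type (1 to 4).

    lead_half: half of the sampling period ("first" or "second") holding the
               zero cross point for the lead sampled value of the following
               wave segment.
    end_half:  the same, for the end sampled value of the preceding segment.
    The string labels are an illustrative assumption, not from the patent.
    """
    table = {
        ("second", "second"): 1,  # claim 5 a): both in the second half
        ("first", "first"): 2,    # claim 5 b): both in the first half
        ("second", "first"): 3,   # claim 5 c): lead second, end first
        ("first", "second"): 4,   # claim 5 d): lead first, end second
    }
    try:
        return table[(lead_half, end_half)]
    except KeyError:
        raise ValueError("each argument must be 'first' or 'second'") from None


print(connection_type("second", "first"))  # 3
```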
Priority Claims (1)
Number Date Country Kind
63-183906 Jul 1988 JPX
SUMMARY OF THE INVENTION

This is a continuation of application Ser. No. 07/381,000, filed Jul. 17, 1989, now abandoned.

1. Field of the Invention

The present invention relates to a voice synthesizing device which compiles wave segments such as pitch wave segments and quasi-voice wave segments to reproduce a voice wave.

2. Description of the Prior Art

It is well known that, among the different voice waves, the waves of voiced sounds such as vowels have a redundant pitch structure in which essentially the same wave is repeated from several to a dozen times with a period of from 2 or 3 ms to 10 ms. Conventionally, voice synthesizers have employed a phoneme segment compiling method that uses this pitch structure to generate a synthesized voice. Voice synthesizers of this type repeat and connect pitch wave segments or quasi-voice wave segments for a predetermined period to synthesize a voice wave. This reduces the amount of wave segment data needed for the pitch wave segments or quasi-voice wave segments while maintaining high quality in the synthesized voice.

However, because a conventional voice synthesizer using the segment compiling method synthesizes a voice wave by simply repeating and connecting pitch wave segments, or quasi-voice wave segments based on those pitch wave segments, for a predetermined period, distortion arises where the segments are connected, as described below.

FIG. 4(a) through FIG. 4(d) show examples of pitch wave segments used in voice waveform synthesis. Each double circle in FIG. 4(a) through FIG. 4(d) shows the sampled value at a sampling time (hereafter referred to as a sampled value); the solid lines drawn perpendicular to the time axis from these points represent the sampling times; and the dotted lines drawn perpendicular to the time axis between these sampling points represent the interpolated sampling times at which the sampled values are interpolated to output an interpolated value during waveform synthesis.
The pitch wave segments shown in FIG. 4(a) through FIG. 4(d) fall into one of the following four wave types depending on the position at which the wave crosses zero. Specifically, the sampling time period Ts is divided into two phases, the first referred to as P1 and the latter as P2.

  • Wave type (1), shown in FIG. 4(a): zero cross point m for the interpolated waveform of the top (lead) sampled value of the pitch segment falls within the range P2, and zero cross point o for the interpolated waveform of the end sampled value falls within the range P2.
  • Wave type (2), shown in FIG. 4(b): the zero cross point for the interpolated waveform of the top sampled value falls within the range P1, and the zero cross point for the interpolated waveform of the end sampled value falls within the range P1.
  • Wave type (3), shown in FIG. 4(c): the zero cross point for the interpolated waveform of the top sampled value falls within the range P2, and the zero cross point for the interpolated waveform of the end sampled value falls within the range P1.
  • Wave type (4), shown in FIG. 4(d): the zero cross point for the interpolated waveform of the top sampled value falls within the range P1, and the zero cross point for the interpolated waveform of the end sampled value falls within the range P2.

Thus, if pitch wave segments of these types are simply repeated and connected, the pitch cycle at the connection can be shifted in phase by half the sampling period, producing a distorted wave that differs from the original. For example, if like waves of type (3) are simply connected, the phase of the resulting wave will be delayed by one-half sampling cycle, as shown in FIG. 5(b).
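As a rough illustration of the P1/P2 classification above, the following Python sketch locates the zero cross point between two adjacent samples and reports the phase in which it falls. It assumes straight-line interpolation between the samples, which the text does not specify, and the function names `zero_cross_fraction` and `phase_half` are hypothetical. Treating a crossing exactly at the midpoint as P2 is likewise an arbitrary choice.

```python
def zero_cross_fraction(y0: float, y1: float) -> float:
    """Return where, as a fraction of one sampling period Ts, the straight
    line from sample y0 to the next sample y1 crosses zero.

    Assumption: linear interpolation; the source does not name a method.
    """
    if y0 == 0.0:
        return 0.0  # the crossing sits exactly on the first sample
    if y0 * y1 > 0:
        raise ValueError("samples do not bracket a zero crossing")
    # Solve y0 + (y1 - y0) * t = 0 for t.
    return y0 / (y0 - y1)


def phase_half(fraction: float) -> str:
    """Map a position within Ts to P1 (first half) or P2 (second half).

    Assumption: the exact midpoint is assigned to P2.
    """
    return "P1" if fraction < 0.5 else "P2"


print(phase_half(zero_cross_fraction(-1.0, 3.0)))  # P1 (crossing at 0.25 Ts)
print(phase_half(zero_cross_fraction(-3.0, 1.0)))  # P2 (crossing at 0.75 Ts)
```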
Furthermore, if like waves of type (4) are simply connected, the phase of the resulting wave will be advanced by one-half sampling cycle, as shown in FIG. 5(c). In this event, interference will occur in the rise of the pitch wave segment, and the sound quality of the synthesized voice will deteriorate significantly. The deterioration is particularly severe when the pitch period is short (i.e., the pitch frequency is high), as in female voices.

Two methods have been used to address this problem. In the first, one pitch wave segment is cut out, temporarily converted to a frequency axis wave by fast Fourier transformation (FFT) analysis, and reconverted to a time axis wave by inverse FFT after phase adjustment so that both ends of the pitch wave segment approach zero. In the second, an impulse response wave is reproduced by linear predictive coding (LPC) of the one pitch wave which has been cut out, and this impulse response wave is used as the pitch wave segment. With either method, however, the ends of the pitch wave segment are not sufficiently close to zero, so distortion remains in the pitch wave segment, resulting in variations in the tone.

Therefore, it is an object of the present invention to provide a voice synthesizing device which produces a synthetic voice with no sound quality distortion through a simple process for connecting the wave segments.
In order to achieve the aforementioned objective, a voice synthesizing device of the present invention for compiling wave segments such as pitch wave segments in speech to synthesize speech is characterized by the provision of a connection type memory for storing a connection type descriptive of the connection state of that point where said wave segments are connected; and a wave segment connector which, when said wave segments are connected, connects the end sampling point and the lead sampling point of the wave segments with a conventional sampling period, or with a conventional sampling period compressed or expanded by only 1/2 of the sampling period according to the connection type stored in said connection type memory.

Thus, when voice wave segments are compiled to synthesize a voice, the connection type stored in the connection type memory is referenced. According to the referenced connection type, the end and leading sampling points of the wave segments are connected with a conventional sampling period, or with a conventional sampling period compressed or expanded by only 1/2 of the sampling period so that said wave segments are connected smoothly to provide a synthesized voice wave.
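The connector described above can be sketched in Python as a function that timestamps the samples of two segments and sets only the junction interval from the connection type. The mapping from type to interval used here (types 1 and 2 keep the normal period, type 3 uses one half, type 4 uses three halves) is an inference from the statement that like waves of type (3) are delayed, and those of type (4) advanced, by half a sampling cycle when simply connected; the text does not state the mapping explicitly. The names `JUNCTION_FACTOR` and `connect_segments` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed junction interval, in units of the sampling period Ts.
# Inferred from the described half-period delay/advance, not stated outright.
JUNCTION_FACTOR = {1: 1.0, 2: 1.0, 3: 0.5, 4: 1.5}


def connect_segments(
    preceding: Sequence[float],
    following: Sequence[float],
    conn_type: int,
    ts: float,
) -> List[Tuple[float, float]]:
    """Return (time, value) pairs for the two segments joined end to lead.

    Samples inside each segment stay Ts apart; only the interval between the
    end sample of `preceding` and the lead sample of `following` changes.
    """
    if conn_type not in JUNCTION_FACTOR:
        raise ValueError("conn_type must be 1, 2, 3 or 4")
    if not preceding:
        raise ValueError("preceding segment must not be empty")

    joined = [(i * ts, value) for i, value in enumerate(preceding)]
    end_time = (len(preceding) - 1) * ts
    lead_time = end_time + JUNCTION_FACTOR[conn_type] * ts
    joined.extend((lead_time + j * ts, value) for j, value in enumerate(following))
    return joined


print(connect_segments([0.1, 0.2], [0.3, 0.4], conn_type=3, ts=1.0))
# [(0.0, 0.1), (1.0, 0.2), (1.5, 0.3), (2.5, 0.4)]
```

Keeping the half-period adjustment in a single lookup table means the assumed mapping can be changed without touching the connection logic.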

US Referenced Citations (7)
Number Name Date Kind
4214125 Mozer et al. Jul 1980
4392018 Fette Jul 1983
4419540 Henderson Dec 1983
4433434 Mozer Feb 1984
4489437 Fukuichi et al. Dec 1984
4601052 Saito et al. Jul 1986
4619359 Morito Sep 1987
Foreign Referenced Citations (2)
Number Date Country
0081595 Jun 1983 EPX
WO8504747 Oct 1985 WOX
Non-Patent Literature Citations (1)
Entry
Yato et al., "Speech Synthesis by the Compilation of Speech Segments (in Japanese)", presented Dec. 18, 1973, at a laboratory of Kokusai Denshin Denwa. This paper contains an English summary.
Continuations (1)
Number Date Country
Parent 381000 Jul 1989