Claims
- 1. In a system of speech analysis used, for example, in a vocoder in which the pitch value of human speech is determined and subsequently transmitted together with other speech parameters to the receiving section of another vocoder wherein the speech is synthesized and reproduced, a method for determining the pitch of a speech signal comprising the steps:
- analyzing the amplitude of said speech signal by regularly selecting time segments of the speech signal, determining from each time segment a sequence of spectrum components which constitutes the discrete Fourier transform of samples of the speech signal, and deriving in each time segment the position of the significant peaks in the spectrum from the sequence of spectrum components;
- selecting a starting value for the pitch, determining a sequence of consecutive integral multiples of said pitch value, and establishing intervals around said pitch value and multiples thereof, said intervals defining a mask having apertures in situ of said intervals, harmonic numbers corresponding to the multiplication factors in said multiples being associated respectively with said apertures;
- determining the number of said significant peak positions which coincide with said mask apertures;
- computing a quality figure in accordance with a criterion indicating the degree to which said significant peak positions and said mask apertures match;
- repeating said immediately preceding determining and computing steps using masks, as determined in said selecting step, for consecutively higher values of pitch up to a predetermined highest value of pitch, resulting in the computing of separate quality figures associated with each of said pitch values;
- selecting the value of pitch having the highest associated quality figure and designating the associated mask as a reference mask;
- associating the harmonic numbers of the apertures of said reference mask with the significant peak positions coinciding with said apertures, thereby defining the location of each of said significant peak positions in a sequence of harmonics of a same fundamental tone;
- determining a probable value for the pitch of the speech signal, wherein the deviations between the significant peak positions and the corresponding multiples of the probable value having the same harmonic numbers are as small as possible, and
- combining said determined pitch value with other speech parameters for subsequent transmission or storage thereof in, for example, a read-only-memory.
- 2. A method for determining the pitch of a speech signal as claimed in claim 1, characterized in that the step of computing the quality figure Q uses one of the expressions: ##EQU9## wherein K represents the number of significant peak positions coinciding with apertures of the mask, M representing the number of apertures of the mask and N the number of significant peak positions.
- 3. A method for determining the pitch of a speech signal as claimed in claim 2, characterized in that in the step of computing the quality figure, M' is substituted for the quantity M in the expressions for the quality figure Q, wherein M' is equal to M reduced by the number of apertures located outside the range of the significant peak positions.
- 4. A method for determining the pitch of a speech signal as claimed in claim 2, characterized in that in the step of computing the quality figure, in the expressions for the quality figure Q the quantity N is replaced by N' which is equal to N reduced by the number of significant peak positions which are located outside the range of the mask apertures.
- 5. A method for determining the pitch of a speech signal as claimed in claim 1, characterized in that the step of determining the probable value of the pitch F_o uses the expression: ##EQU10## wherein x_i represents the i-th significant peak position and n_i the harmonic number associated therewith and wherein K represents the number of significant peak positions which coincide with apertures of the mask.
- 6. In a system of speech analysis used, for example, in a vocoder in which the pitch value of human speech is determined and subsequently transmitted together with other speech parameters to the receiving section of another vocoder wherein the speech is synthesized and reproduced, a method for determining the pitch of a speech signal comprising the steps:
- analyzing the amplitude of said speech signal by regularly selecting time segments of the speech signal, determining from each time segment a sequence of spectrum components which constitutes the discrete Fourier transform of samples of the speech signal, and deriving in each time segment the position of the significant peaks in the spectrum from the sequence of spectrum components;
- selecting an initial value for the pitch and determining a sequence of consecutive integral multiples of this pitch value, harmonic numbers corresponding to the multiplication factor in said multiples being associated respectively with said multiples of said pitch value;
- establishing intervals around the significant peak positions, said intervals defining a mask having apertures in situ of said peak positions;
- determining the number of multiples of said pitch value which coincide with said mask apertures;
- computing a quality figure in accordance with a criterion indicating the degree to which said multiples of said pitch value and said mask apertures match;
- repeating said immediately preceding determining and computing steps using consecutively higher values of pitch and multiples thereof, as determined in said selecting step, up to a predetermined highest value, resulting in the computing of separate quality figures associated with each of said pitch values;
- selecting the value of pitch having the highest associated quality figure and designating this pitch value as a reference pitch value;
- associating the harmonic numbers of the multiples of the reference pitch value with the significant peak positions located in the same aperture, thereby defining the location of said significant peak positions in a sequence of harmonics of the same fundamental tone;
- determining a probable value for the pitch of the speech signal, wherein the deviations between the significant peak positions and the corresponding multiples of the probable value having the same harmonic numbers are as small as possible; and
- combining said determined pitch value with other speech parameters for subsequent transmission or storage thereof in, for example, a read-only-memory.
- 7. A method as claimed in claim 6 characterized in that the step of computing the quality figure Q uses one of the expressions: ##EQU11## wherein K represents the number of multiples of the pitch which coincide with an aperture of the mask, wherein M represents the number of multiples of the pitch of the sequence and N the number of significant peak positions.
- 8. A method as claimed in claim 7, characterized in that in the step of computing the quality figure, M' is substituted for the quantity M in the expression for the quality figure Q, wherein M' is equal to M reduced by the number of multiples of the pitch which are located outside the range of the significant peak positions.
- 9. A method as claimed in claim 7, characterized in that in the step of computing the quality figure, in the expressions for the quality figure Q the quantity N is replaced by N' which is equal to N reduced by the number of significant peak positions which are located outside the range of the sequence of multiples of the pitch.
- 10. A method as claimed in claim 2, characterized in that the step of determining the probable value of the pitch F_o uses the expression: ##EQU12## wherein x_i represents the value of the i-th significant peak position and R_i the harmonic number associated therewith, wherein N represents the number of significant peak positions, and wherein the number zero is associated with a significant peak position when no multiple of the selected pitch is located in the relevant mask aperture.
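The steps of claim 1 can be illustrated with a minimal sketch, written under stated assumptions rather than as the patented implementation. The expressions behind ##EQU9## and ##EQU10## are not reproduced in this text, so the quality figure `q = k / (m + n_peaks)` and the least-squares refinement below are hypothetical stand-ins. The function name `estimate_pitch`, the relative aperture width `tol`, the candidate range, and the choice to count apertures only up to the highest peak (in the spirit of claim 3) are likewise assumptions made for illustration.

```python
def estimate_pitch(peaks, f_min=50.0, f_max=400.0, step=1.0, tol=0.04):
    """Sketch of a harmonic-mask pitch search over spectral peak positions (Hz).

    Returns (refined_pitch, reference_pitch, harmonic_numbers).
    """
    if not peaks:
        raise ValueError("at least one significant peak is required")
    n_peaks = len(peaks)          # N: number of significant peak positions
    top = max(peaks)
    best_q, best_f, best_matches = -1.0, None, []

    f = f_min
    while f <= f_max:             # consecutively higher candidate pitch values
        # M: apertures whose lower edge n*f*(1 - tol) lies at or below the top peak
        m = int(top // (f * (1.0 - tol)))
        matches = []
        for x in peaks:
            n = max(1, round(x / f))            # nearest harmonic number
            if abs(x - n * f) <= tol * n * f:   # peak falls inside that aperture
                matches.append((n, x))
        k = len(matches)          # K: peaks coinciding with apertures
        q = k / (m + n_peaks)     # ASSUMED quality figure, not the patent's ##EQU9##
        if q > best_q:            # the first candidate wins ties
            best_q, best_f, best_matches = q, f, matches
        f += step

    if not best_matches:
        raise ValueError("no peak matched any mask aperture")

    # ASSUMED refinement: least-squares fit of x_i ~ n_i * F_o over matched peaks
    num = sum(n * x for n, x in best_matches)
    den = sum(n * n for n, _ in best_matches)
    return num / den, best_f, [n for n, _ in best_matches]


if __name__ == "__main__":
    pitch, reference, harmonics = estimate_pitch([201.0, 399.0, 602.0, 798.0, 1001.0])
    print(pitch, reference, harmonics)
```

Several neighbouring candidates can tie on the quality figure; the sketch keeps the first one as the reference mask, and the refinement step then depends only on the harmonic numbers assigned to the peaks, not on which tied candidate was kept. The variant of claim 6 swaps the roles, placing the apertures around the peak positions and counting the pitch multiples that fall inside them.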
Priority Claims (1)
Number | Date | Country | Kind
7812151 | Dec 1978 | NLX |
Parent Case Info
This is a continuation of application Ser. No. 099,296, filed Dec. 3, 1979.
Continuations (1)
| Number | Date | Country
Parent | 99296 | Dec 1979 |