1. Field of the Invention
The present invention relates to a pitch period extracting apparatus of a speech signal. More specifically, the present invention relates to a pitch period extracting apparatus which extracts a pitch period of an inputted speech signal by evaluating a delay time at which a maximum autocorrelative value is obtainable.
2. Description of the Prior Art
Two methods are known for extracting a pitch period of a speech signal by utilizing autocorrelative values. The first method utilizes a short-time autocorrelation, and the second method utilizes a modified short-time autocorrelation.
In the first method, it is assumed that the speech signal is restricted in time, and autocorrelative values are evaluated on the assumption that the speech signal exists only within a period of a time length Ts and is always zero outside that period. In the second method, it is assumed that the speech signal is not restricted in time, and autocorrelative values are evaluated between a period of a time length Tt and a period obtained by delaying that period within a range in which a presence of a pitch period is assumed.
Now, if a waveform of an inputted speech signal is represented by digital speech data x(n), the short-time autocorrelative value Rn(k) in the first method is given by the following equation (1).
In the equation (1), “Ts” indicates a time period in which a presence of the speech signal is assumed, and “k” is a delay time for delaying the speech signal waveform in calculating the short-time autocorrelative value Rn(k).
Furthermore, the modified short-time autocorrelative value R′n(k) in the second method is given by the following equation (2).
In addition, in the equation (2), “k” is a delay time for delaying the speech signal waveform in calculating the modified short-time autocorrelative value R′n(k), and a relationship of Ts>Tt>>k holds.
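Equations (1) and (2) themselves are not reproduced in this text. Under the usual textbook definitions of the short-time and modified short-time autocorrelation, and assuming a summation index m (a symbol not named in the source) with Ts and Tt measured in samples, forms consistent with the surrounding description would be:

```latex
% Assumed reconstruction; the index m and the exact summation limits are not given in the source.
R_n(k) = \sum_{m=0}^{T_s-1-k} x(n+m)\, x(n+m+k) \qquad (1)

R'_n(k) = \sum_{m=0}^{T_t-1} x(n+m)\, x(n+m+k) \qquad (2)
```

In this assumed form, the number of products in (1) is Ts−k and therefore shrinks as k grows, while the number of products in (2) is always Tt.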
As is clear from the equations (1) and (2), in the first method, the range over which a product sum is calculated in evaluating the autocorrelative value (hereinafter called the “product sum range”) decreases as the delay time k increases, whereas in the second method, the product sum range is constant irrespective of the delay time k.
Therefore, in the first method, there is no possibility that double the true pitch period is erroneously evaluated as the pitch period, whereas in the second method, there is such a possibility. That is, in comparison with the second method, the first method is advantageous in terms of the accuracy of the pitch period.
However, in comparison with the second method, the first method is disadvantageous in terms of processing time. More specifically, in the first method, the autocorrelative values are weighted with extremely large weights when the pitch period is short, and with extremely small weights when the pitch period is long. Therefore, in the case of a long pitch period, it is necessary to prevent the autocorrelative value at the pitch period from becoming smaller than an autocorrelative value at a shorter delay time that is not a pitch period. Accordingly, in the first method, in order to calculate the pitch period with precision, it is necessary to set the time period Ts to a time length of at least about double the longest possible pitch period (k=100 in FIG. 6). The first method therefore has the disadvantage that the processing time becomes long. In contrast, in the second method, since the weights are constant irrespective of the pitch period, the time length Tt may be set to a time length approximately equal to a pitch period, and therefore the processing time is short.
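The difference between the two prior-art methods can be illustrated with a small sketch. It is not taken from the source: the function names, the pulse-train test signal, and the window lengths TS and TT are all hypothetical, and the summation limits follow the usual textbook definitions rather than the (unreproduced) equations (1) and (2).

```python
def short_time_autocorr(x, k, ts):
    """First method (assumed form): product sum over m = 0 .. ts-1-k.

    The product sum range shrinks as the delay k grows.
    """
    return sum(x[m] * x[m + k] for m in range(ts - k))


def modified_autocorr(x, k, tt):
    """Second method (assumed form): product sum over m = 0 .. tt-1.

    The product sum range does not depend on the delay k.
    """
    return sum(x[m] * x[m + k] for m in range(tt))


# Hypothetical test signal: unit pulses every 30 samples (true pitch period 30).
TRUE_PERIOD = 30
TS, TT = 200, 100  # illustrative window lengths with TS > TT
x = [1.0 if m % TRUE_PERIOD == 0 else 0.0 for m in range(TS)]

# Autocorrelative values at the true period (30) and at double the period (60).
r_short_p = short_time_autocorr(x, 30, TS)
r_short_2p = short_time_autocorr(x, 60, TS)
r_mod_p = modified_autocorr(x, 30, TT)
r_mod_2p = modified_autocorr(x, 60, TT)

print(r_short_p, r_short_2p)  # first method: the value at 60 is smaller
print(r_mod_p, r_mod_2p)      # second method: the two values tie
```

For this idealized signal, the first method ranks the true period above its double, while the second method cannot distinguish them, so noise or a tie-breaking rule could select the double period.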
In other words, in the first method, there is an advantage that it is possible to extract a pitch period with precision but a disadvantage that the processing time is long, and in the second method, there is an advantage that the processing time is short but a disadvantage that there is a possibility that an erroneous pitch period is extracted.
Therefore, a principal object of the present invention is to provide a novel pitch period extracting apparatus of a speech signal.
Another object of the present invention is to provide a pitch period extracting apparatus in which it is possible to accurately extract a pitch period with a short processing time.
A pitch period extracting apparatus according to the present invention comprises: an A/D converter for converting a speech signal into speech signal data with a sampling frequency; a memory for storing the speech signal data outputted from the A/D converter; an autocorrelative value calculating means for calculating autocorrelative values of the speech signal data stored in the memory on the basis of delay times of the speech signal data; a delay time range determining means for determining a range of the delay times on the basis of the sampling frequency; and a pitch period detecting means for detecting a pitch period of the speech signal by evaluating a maximum value out of the autocorrelative values.
The delay time range determining means determines the delay times in calculating the autocorrelative values by the autocorrelative value calculating means on the basis of information of the sampling frequency. Therefore, it is possible to most suitably set the range of the delay times for extracting the pitch period. Therefore, according to the present invention, it is possible to calculate the pitch period with accuracy and it is possible to prevent a calculation amount from being increased.
In an aspect of the present invention, the pitch period extracting apparatus further comprises a period setting means for setting a plurality of periods within the range of the delay times determined by the delay time range determining means, and a product sum range control means. In a case where the sampling frequency is 8 kHz, the above described delay time range determining means determines the range of 20 samples≦k≦100 samples, and in a case where the sampling frequency is 6 kHz, the range of 15 samples≦k≦75 samples. Then, in the case of 8 kHz, the period setting means sets periods of 20≦k<40, 40≦k<80 and 80≦k≦100 as a first period, a second period and a third period, respectively. In the case of 6 kHz, periods of 15≦k<30, 30≦k<60 and 60≦k≦75 are respectively set as the first period, the second period and the third period.
In such a case, the period setting means preferably sets a starting value and an end value of each of the first, second and third periods in a manner that the end value does not include double the starting value.
Furthermore, the product sum range control means controls product sum ranges in respectively evaluating the autocorrelative values in the first, second and third periods. Specifically, the product sum range control means makes the product sum ranges for the first period, the second period and the third period sequentially shorter in this order, whereby the autocorrelative values of the respective periods can be weighted with weights different from each other.
Then, the pitch period detecting means evaluates a maximum value out of the autocorrelative values of the respective periods, and detects a pitch period equal to a delay time at which the maximum value is obtained.
In accordance with the present invention, even if a pitch period is short, the autocorrelative values are not weighted with extremely large weights, and therefore, the range of delay times in calculating the autocorrelative values may be narrow in comparison with the conventional first method. Therefore, a time for calculating the autocorrelative values becomes short, and a memory capacity necessary for calculating the autocorrelative values can be reduced.
The above described objects and other objects, features, aspects and advantages of the present invention will become more apparent from the following detailed description of the present invention when taken in conjunction with the accompanying drawings.
A pitch period extracting apparatus 10 of this embodiment shown in
Delay times k in calculating the autocorrelative values Rn(k) are determined by the microcomputer 18 according to information of the sampling frequency fs of the A/D converter 14. Then, the microcomputer 18 evaluates a maximum value out of the autocorrelative values Rn(k) of the speech signal data x(n), and a delay time k at which the maximum value is obtained is outputted as a pitch period P of the analog speech signal x(t). The pitch frequency of a speech signal is normally approximately 80-400 Hz, and this range covers almost all speech signals generated by human beings. For example, if the sampling frequency of the A/D converter 14 is 8 kHz, the range within which the autocorrelative values Rn(k) are calculated, that is, the range of the delay time k, is set as 20 samples≦k≦100 samples. Furthermore, if the sampling frequency is 6 kHz, for example, the range of the delay times k for calculating the autocorrelative values Rn(k) is set as 15 samples≦k≦75 samples. In addition, the numbers of samples are calculated as fs/400 to fs/80.
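The mapping from sampling frequency to delay range can be sketched as follows. The function name and the use of integer (floor) division are assumptions. The source only gives the fs/400 and fs/80 bounds and the two worked cases.

```python
def delay_range(fs_hz, f_high_hz=400, f_low_hz=80):
    """Return (k_min, k_max) in samples for a pitch range of f_low_hz..f_high_hz.

    The shortest pitch period corresponds to the highest pitch frequency,
    so k_min = fs / f_high and k_max = fs / f_low.
    Floor division is an assumed rounding policy; the source does not say
    how non-integer results should be handled.
    """
    k_min = fs_hz // f_high_hz
    k_max = fs_hz // f_low_hz
    return k_min, k_max


print(delay_range(8000))  # (20, 100)
print(delay_range(6000))  # (15, 75)
```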
With referring to
Then, in a step S3, the microcomputer 18 sequentially reads-out the speech signal data x(n) stored in the buffer memory 16, and calculates the autocorrelative values Rn(k) on the basis of the following equation (3) and according to the delay times k set in the step S2.
More specifically, the microcomputer 18 calculates, in the product sum range represented by “T” in the equation (3), autocorrelative values Rn(20), Rn(21), . . . Rn(99) and Rn(100) when the sampling frequency fs is 8 kHz, or autocorrelative values Rn(15), Rn(16), . . . Rn(74), and Rn(75) when the sampling frequency fs is 6 kHz. Then, in a step S4, the microcomputer 18 evaluates a maximum value out of the autocorrelative values Rn(k) calculated in the step S3, and outputs a delay time k at which the maximum value is evaluated as a pitch period P of the inputted speech signal x(t).
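The steps S2 to S4 can be sketched as below. This is only an illustration under stated assumptions: equation (3) is not reproduced in this text, so the sketch assumes a product sum over m = 0 .. T−1 with a fixed range T. The function names, the value T=140, and the pulse-train test signal are hypothetical.

```python
def delay_range(fs_hz, f_high_hz=400, f_low_hz=80):
    """Step S2 (sketch): delay range in samples, assumed as fs/400 .. fs/80."""
    return fs_hz // f_high_hz, fs_hz // f_low_hz


def autocorr(x, k, t):
    """Step S3 (sketch): assumed form of equation (3), product sum over t samples."""
    return sum(x[m] * x[m + k] for m in range(t))


def extract_pitch_period(x, fs_hz, t):
    """Step S4 (sketch): return the delay k with the largest autocorrelative value."""
    k_min, k_max = delay_range(fs_hz)
    # max() returns the first maximal element, so ties go to the shorter delay.
    return max(range(k_min, k_max + 1), key=lambda k: autocorr(x, k, t))


# Hypothetical input: unit pulses every 70 samples, sampled at 8 kHz
# (a pitch of roughly 114 Hz). The buffer must hold at least T + k_max samples.
FS = 8000
T = 140
x = [1.0 if m % 70 == 0 else 0.0 for m in range(T + 100)]

print(extract_pitch_period(x, FS, T))  # 70
```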
In the second embodiment, the microcomputer 18 sets a most suitable range of the delay times k according to the sampling frequency fs. In contrast, in
More specifically, in this embodiment shown, the microcomputer 18 divides the range of the delay times k which is a pitch period searching time period in calculating the autocorrelative values Rn(k) into a plurality of periods. In such a case, respective starting values and end values of the respective periods are determined such that the end value of the period does not include double the starting value of that period. Then, autocorrelative values Rn1(k), Rn2(k) and Rn3(k) of the respective periods are calculated.
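One way to divide the delay range so that no period contains double its own starting value is sketched below. The function name and the greedy splitting rule are assumptions. The source states only the constraint and the resulting periods for 8 kHz and 6 kHz.

```python
def split_delay_range(k_min, k_max):
    """Split k_min..k_max into inclusive periods (start, end) with end < 2 * start.

    Greedy rule (an assumption): each period runs as far as possible without
    reaching double its starting value, and the next period starts right after.
    """
    if k_min < 1:
        raise ValueError("k_min must be at least 1")
    periods = []
    start = k_min
    while start <= k_max:
        end = min(2 * start - 1, k_max)
        periods.append((start, end))
        start = end + 1
    return periods


print(split_delay_range(20, 100))  # [(20, 39), (40, 79), (80, 100)]
print(split_delay_range(15, 75))   # [(15, 29), (30, 59), (60, 75)]
```

Because every period ends before twice its starting value, a true pitch period and its double can never fall in the same period.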
With referring to
In addition, when the sampling frequency fs is 6 kHz, as similar to
In succeeding steps S11, S12, and S13, the microcomputer 18 calculates the autocorrelative values Rn1(k), Rn2(k), and Rn3(k) according to the following equations (4), (5) and (6).
In the steps S11, S12 and S13, the microcomputer 18 also determines the respective product sum ranges T1, T2 and T3 of the equations (4), (5) and (6).
Thereafter, in a step S14, the microcomputer 18 evaluates a maximum value out of the autocorrelative values Rn1(k), Rn2(k) and Rn3(k) calculated in the previous steps S11, S12 and S13, and outputs a delay time k at which the maximum value is obtained as a pitch period P of the input speech signal.
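The per-period calculation and the final maximum search can be sketched as follows. Equations (4) to (6) are not reproduced in this text, so the sketch assumes that each has the form of a product sum over m = 0 .. Ti−1 and that they differ only in the range Ti. The function names, the values 160, 120 and 100 chosen for T1, T2 and T3, and the pulse-train signal are hypothetical.

```python
def autocorr(x, k, t):
    """Assumed common form of equations (4)-(6): product sum over t samples."""
    return sum(x[m] * x[m + k] for m in range(t))


def extract_pitch_period(x, periods, ranges):
    """Return the delay k whose autocorrelative value is largest over all periods.

    periods: inclusive (k_start, k_end) pairs, one per period.
    ranges:  product sum range T_i for each period, longest first.
    """
    best_k, best_r = None, float("-inf")
    for (k_start, k_end), t in zip(periods, ranges):
        for k in range(k_start, k_end + 1):
            r = autocorr(x, k, t)
            if r > best_r:
                best_k, best_r = k, r
    return best_k


# Periods for 8 kHz as described in the text; T1 > T2 > T3 are illustrative.
PERIODS_8K = [(20, 39), (40, 79), (80, 100)]
RANGES = [160, 120, 100]

# Hypothetical input: unit pulses every 30 samples, so delays 30, 60 and 90
# are all multiples of the true period.
x = [1.0 if m % 30 == 0 else 0.0 for m in range(200)]

print(autocorr(x, 30, 160), autocorr(x, 60, 120), autocorr(x, 90, 100))
print(extract_pitch_period(x, PERIODS_8K, RANGES))  # 30
```

Because the product sum range shortens from period to period, the multiples of the true period in the later periods accumulate fewer products and receive smaller values than the true period itself.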
In this embodiment, the possibility that double the true pitch period is erroneously recognized as the pitch period is made small by applying small weights to the autocorrelative values having long periods, and therefore, the true pitch period can be extracted. However, unlike the first method of the prior art, the autocorrelative values within each of the periods are not weighted with different weights. That is, the autocorrelative values in each of the periods are weighted with the same weight. The reason is that, in this embodiment, the end value of each period is determined so as not to include a value twice the starting value of that period, and therefore, no period contains a double-period component.
However, in
Furthermore, if an end value of the product sum range in calculating the autocorrelative values is set at a possible maximum value in each of the periods, the accuracy of the pitch period can be increased. More specifically, in the above described example, if the sampling frequency is 6 kHz, the product sum ranges T1, T2 and T3 may be set as T1=Ts−29, T2=Ts−59 and T3=Ts−75, and in a case of 8 kHz, T1=Ts−39, T2=Ts−79 and T3=Ts−100 may be set.
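The rule behind these values appears to be that each product sum range equals Ts minus the largest delay in its period, so that the last product still falls inside the Ts stored samples. A minimal sketch under that reading, with a hypothetical function name and a hypothetical Ts of 200 samples:

```python
def product_sum_ranges(ts, periods):
    """Return T_i = ts - k_end for each inclusive period (k_start, k_end).

    With this choice the largest index touched, (T_i - 1) + k_end, is ts - 1,
    i.e. the last sample of a buffer holding ts samples.
    """
    return [ts - k_end for (_k_start, k_end) in periods]


TS = 200  # hypothetical buffer length in samples
PERIODS_6K = [(15, 29), (30, 59), (60, 75)]
PERIODS_8K = [(20, 39), (40, 79), (80, 100)]

print(product_sum_ranges(TS, PERIODS_6K))  # [171, 141, 125]
print(product_sum_ranges(TS, PERIODS_8K))  # [161, 121, 100]
```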
As similar to
Thereafter, in steps S22, S23, and S24, the autocorrelative values Rn1(k), Rn2(k) and Rn3(k) are calculated for each of the periods according to the aforementioned equations (4), (5) and (6) with the product sum ranges T1, T2 and T3. In a last step S25, the microcomputer 18 evaluates a maximum value out of the autocorrelative values, and outputs a delay time k at which the maximum value is obtained as a pitch period of the speech signal.
In addition, in the above described embodiments, a case where the sampling frequency fs is 8 kHz or 6 kHz is described. However, the value of the sampling frequency fs is not limited thereto. Furthermore, the range of the delay times k is determined as 15≦k≦75 (6 kHz) or 20≦k≦100 (8 kHz). However, an arbitrary range of delay times may be set. Furthermore, although the range of delay times is divided into three periods, the number of the periods may be an arbitrary value.
Although the present invention has been described and illustrated in detail, it is clearly understood that the same is by way of illustration and example only and is not to be taken by way of limitation, the spirit and scope of the present invention being limited only by the terms of the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
6-108544 | May 1994 | JP | national |

Number | Name | Date | Kind |
---|---|---|---|
4653098 | Nakata et al. | Mar 1987 | A |

Relation | Number | Date | Country |
---|---|---|---|
Parent | 08447646 | May 1995 | US |
Child | 09685938 | | US |