This non-provisional application claims priority under 35 U.S.C. §119(a) on Patent Application No(s). 95145977 filed in Taiwan, R.O.C. on Dec. 8, 2006, the entire contents of which are hereby incorporated by reference.
1. Field of Invention
The present invention relates to a method for varying speech speed, and more particularly to a method that varies the speech speed based on the pitch period of a speech signal.
2. Related Art
In electronic apparatuses equipped with language-learning functions, the conversations to be studied may be recorded in the apparatus in advance. The apparatus may be portable so that the user can learn the language anywhere and at any time. However, users are at different learning levels; a playback speed that suits some users may be too fast for others to follow. Therefore, a so-called speed-varying function has become one of the major functions of a language-learning apparatus.
Speed variation means that the language-learning apparatus varies the playback speed of the speech on the user's demand while keeping the same tone at every speed. Ideally, whether the speech is slowed down or sped up, the user can still hear it clearly, which is very helpful for language learning.
Although conventional language-learning apparatuses have a speed-varying function, the speech played back at a varied speed is usually distorted. Since the speech signal is a continuous analog signal, the voiceprint frequencies generated by different persons' pronunciations or by different sound sources are different. A common speed-varying technique is to play the sampled speech data repeatedly, or to play it intermittently at intervals, thereby achieving the speed variation. Such an approach provides decelerated or accelerated playback with the same signal envelope as the original speech. However, it also generates echoes and machine noise, leading to a decrease in the voiceprint frequency; the effect is like slowing down or speeding up the motor of a recorder, which causes obvious distortion.
Therefore, how to maintain the tone of the original speech without distortion while the user operates the speed-varying function of a language-learning apparatus has become an issue that urgently needs to be solved.
Accordingly, the present invention provides a method for varying speech speed, which processes the speech signal so that the speech can be played back slower or faster on the user's demand. The speech output to the user after speed variation remains clear and does not lose its original tone.
A method for varying speech speed provided by an exemplary embodiment of the present invention includes the following steps. First, receive an original speech signal. Calculate a pitch period of the original speech signal. Define search ranges according to the pitch period. Find a maximum within each of the search ranges of the original speech signal. Divide the original speech signal into speech sections according to the maxima. Obtain a speed-varied speech signal by applying a speed-varying algorithm to each speech section of the original speech signal according to a speed-varying command. Finally, output the speed-varied speech signal.
According to the present invention, the original speech signal is first divided into plural speech sections. Unlike in the conventional technology, the sections are not of fixed length but are defined according to the Sum of Magnitude Difference Function (SMDF) or the Average of Magnitude Difference Function (AMDF). The pitch period of the original speech signal is obtained in advance, and a maximum is then found from the data around each pitch period. Afterwards, the found maxima are used to divide the original speech signal into the plural speech sections. The advantage of this solution is that the speed variation process operates on the smallest unit of the speech signal, namely the pitch period. Therefore, the present invention uses a more precise solution to improve the quality of the speed variation.
Further scope of applicability of the present invention will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.
The present invention will become more fully understood from the detailed description given hereinbelow for illustration only, which thus is not limitative of the present invention, and wherein:
Please refer to
Step S10: Receive an original speech signal. The original speech signal is a spoken-language recording, such as an English or Japanese conversation.
Step S20: Calculate a pitch period of the original speech signal. The frequency range of the human voice is about 50 Hz to 1000 Hz. Different people reading the same section of conversation produce different speech because every person has a different voice timbre. The differences between voice timbres appear as different waveform shapes within their pitch periods. Accordingly, different speech signals have different pitch periods. Because each individual's voice timbre is unique, the speech signals generated by the same person have approximately the same pitch period even when the speech content differs.
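As a small worked example of the 50 Hz to 1000 Hz range mentioned above, the following Python sketch converts that frequency range into a range of candidate pitch periods measured in samples. The function name `lag_bounds` and the sample rates used are assumptions for illustration only; the description does not specify a sample rate.

```python
import math


def lag_bounds(sample_rate_hz, f_min_hz=50.0, f_max_hz=1000.0):
    """Return (min_lag, max_lag): candidate pitch periods in samples.

    The highest voice frequency gives the shortest period and the lowest
    frequency gives the longest one. The default limits follow the
    approximate 50 Hz to 1000 Hz range stated in the text.
    """
    if sample_rate_hz <= 0 or not 0 < f_min_hz < f_max_hz:
        raise ValueError("expected sample_rate_hz > 0 and 0 < f_min_hz < f_max_hz")
    min_lag = math.floor(sample_rate_hz / f_max_hz)  # shortest period
    max_lag = math.ceil(sample_rate_hz / f_min_hz)   # longest period
    return min_lag, max_lag


# Assumed sample rate of 8000 Hz (hypothetical, not from the description).
print(lag_bounds(8000))
```

At an assumed 8000 Hz sample rate, the candidate pitch periods therefore span 8 to 160 samples.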
Please refer to
Please refer to
In addition, the above SMDF calculation yields smaller curve values as the overlapped portion of the waveform becomes shorter. To avoid this, a normalized SMDF can be obtained. Namely, divide the inner product of the overlapped portion by the number of overlapped sample points to obtain the conventional AMDF (Average of Magnitude Difference Function). Therefore, either the SMDF or the AMDF may be used to calculate the pitch period of the original speech signal.
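The normalization described above can be illustrated with a minimal Python sketch. It assumes the commonly used definition of the magnitude difference function, namely the sum of absolute differences between the signal and a copy of itself shifted by a lag, and it assumes the pitch period is taken as the lag that minimizes the AMDF. The function names and the lag limits are hypothetical and are not taken from this description.

```python
def smdf(signal, lag):
    """Sum of magnitude differences over the portion where the signal
    overlaps a copy of itself shifted by `lag` samples (assumed definition)."""
    overlap = len(signal) - lag
    return sum(abs(signal[i] - signal[i + lag]) for i in range(overlap))


def amdf(signal, lag):
    """SMDF divided by the number of overlapped sample points, so that
    short overlaps are not favored merely for having fewer terms."""
    overlap = len(signal) - lag
    if overlap <= 0:
        raise ValueError("lag must be smaller than the signal length")
    return smdf(signal, lag) / overlap


def estimate_pitch_period(signal, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the smallest AMDF value."""
    max_lag = min(max_lag, len(signal) - 1)
    if min_lag < 1 or min_lag > max_lag:
        raise ValueError("empty lag range")
    return min(range(min_lag, max_lag + 1), key=lambda lag: amdf(signal, lag))


# A synthetic sawtooth with a period of 40 samples stands in for speech.
demo_signal = [i % 40 for i in range(400)]
demo_period = estimate_pitch_period(demo_signal, 20, 60)
```

Restricting the lag range matters in this sketch: a periodic signal also matches itself at integer multiples of its period, so an overly wide range could return a multiple of the true pitch period.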
Step S30: Define search ranges according to the pitch period calculated in step S20. Although a section of the original speech signal is composed of multiple pitch periods, the pitch still rises and falls with the speech content (the content of the language being spoken), so the pitch periods differ slightly in length. Consequently, after the pitch period has been calculated, a search range is defined around each of the pitch periods to facilitate the following search operations.
Step S40: Find a maximum within each of the search ranges of the original speech signal. Use each of the search ranges defined in step S30 as a unit to search the original speech signal, and record the maximum found in each search range.
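Steps S30 and S40 can be sketched in Python as follows. This is one possible reading, not the embodiment itself: it assumes that each search range is centered one pitch period after the previously found maximum and extends a fixed number of samples (`tolerance`, a hypothetical parameter) on either side. The function name `find_pitch_marks` is likewise an assumption.

```python
def find_pitch_marks(signal, pitch_period, tolerance):
    """Return the index of the maximum found in each search range.

    Assumed scheme: the first maximum is searched within the first pitch
    period; every later search range is centered one pitch period after
    the previous maximum and spans `tolerance` samples on each side.
    """
    if pitch_period < 1 or not 0 <= tolerance < pitch_period:
        raise ValueError("expected pitch_period >= 1 and 0 <= tolerance < pitch_period")
    if not signal:
        return []
    first_stop = min(pitch_period, len(signal))
    marks = [max(range(first_stop), key=lambda i: signal[i])]
    while True:
        center = marks[-1] + pitch_period
        if center + tolerance >= len(signal):
            break  # the next search range would run past the end
        search_range = range(center - tolerance, center + tolerance + 1)
        marks.append(max(search_range, key=lambda i: signal[i]))
    return marks


# Synthetic signal whose peaks are spaced roughly 40 samples apart.
demo = [0.0] * 160
for peak in (5, 45, 86, 125):
    demo[peak] = 1.0
demo_marks = find_pitch_marks(demo, pitch_period=40, tolerance=3)
```

Because each search range is anchored to the previous maximum rather than to a fixed grid, small variations in the period length do not accumulate in this sketch.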
Step S50: Divide the original speech signal into plural speech sections according to the maxima. Please refer to
Step S60: Obtain a speed-varied speech signal by applying a speed-varying algorithm to each speech section of the original speech signal according to a speed-varying command. The speed-varying command is given by the user. When the user finds that the speech signal is played too fast, a speed-varying command to decelerate may be given to the apparatus. When the speed-varying command is to decelerate, the speed-varying algorithm duplicates some of the speech sections to make the speed-varied speech signal longer than the original speech signal. Please refer to
Conversely, when the speed-varying command is to accelerate, the speed-varying algorithm deletes some of the speech sections to make the speed-varied speech signal shorter than the original speech signal. Please refer to
Step S70: Finally, output the speed-varied speech signal. The speed variation procedure is now complete.
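The division of step S50 and the duplication or deletion of step S60 can be sketched in Python as follows. The rule used here for choosing which sections to duplicate or delete (stepping through the sections at a fractional `rate`) is an assumption made for illustration; the description only states that some sections are duplicated to decelerate and some are deleted to accelerate. The function names are hypothetical.

```python
def split_sections(signal, marks):
    """Divide the signal into sections whose boundaries are the maxima.

    `marks` is assumed to be sorted in ascending order.
    """
    inner = [m for m in marks if 0 < m < len(signal)]
    bounds = [0] + inner + [len(signal)]
    return [signal[a:b] for a, b in zip(bounds, bounds[1:])]


def vary_speed(sections, rate):
    """Rebuild a signal from whole sections.

    rate < 1 decelerates by repeating sections, rate > 1 accelerates by
    skipping sections, and rate == 1 reproduces the original signal.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    output = []
    position = 0.0
    while position < len(sections):
        output.extend(sections[int(position)])
        position += rate
    return output


sections = split_sections(list(range(10)), [3, 7])
slower = vary_speed(sections, 0.5)  # every section is played twice
faster = vary_speed(sections, 2.0)  # every other section is dropped
```

Because whole sections are repeated or dropped, each retained pitch period keeps its original length in this sketch; only the number of periods changes.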
Please refer to
Step S62: Multiply each of the speech sections in the original speech signal by a weighting function to obtain a weighting section, wherein in each of the search ranges the weighting function is an increasing function before the maximum and a decreasing function after the maximum. Accordingly, the weighting function may be a triangle wave function.
Step S64: Add up the weighting sections. Since each of the speech sections has been multiplied by the weighting function to become a weighting section, these weighting sections can then be added up according to the speed-varying command. Therefore, the speed-varied speech signal will be as clear as the original speech signal, without distortion. Neither intermittent sounds nor echoes will be generated.
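Steps S62 and S64 can be illustrated with the following Python sketch. It assumes a linear (triangular) weighting that rises to 1.0 at the position of the maximum and falls afterwards, and it assumes that two weighting sections are added sample by sample starting from their first samples. The exact alignment and scaling used in the embodiment are not given in this text, and all function names are hypothetical.

```python
from itertools import zip_longest


def triangle_weights(length, peak):
    """Weights that increase linearly up to index `peak` (value 1.0)
    and decrease linearly after it."""
    if not 0 <= peak < length:
        raise ValueError("peak must lie inside the section")
    weights = []
    for i in range(length):
        if i <= peak:
            weights.append((i + 1) / (peak + 1))            # rising side
        else:
            weights.append((length - i) / (length - peak))  # falling side
    return weights


def weight_section(section, peak):
    """Multiply a speech section by the triangular weighting function."""
    weights = triangle_weights(len(section), peak)
    return [s * w for s, w in zip(section, weights)]


def add_up(weighted_a, weighted_b):
    """Add two weighting sections sample by sample; the shorter one is
    treated as zero beyond its end."""
    return [a + b for a, b in zip_longest(weighted_a, weighted_b, fillvalue=0.0)]


first = weight_section([4.0, 4.0, 4.0], peak=1)
second = weight_section([2.0, 2.0], peak=0)
summed = add_up(first, second)
```

The tapering toward both ends is what lets adjacent sections blend when they are summed, rather than meeting at an abrupt boundary.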
The aforesaid add-up speed-varying algorithm may further include the step of inserting the summed weighting section between the speech sections. Please refer to
Conversely, the add-up speed-varying algorithm may further include the step of replacing the speech section(s) with the summed weighting section(s). Please refer to
Finally, please refer to
The invention being thus described, it will be obvious that the same may be varied in many ways. Such variations are not to be regarded as a departure from the spirit and scope of the invention, and all such modifications as would be obvious to one skilled in the art are intended to be included within the scope of the following claims.
Number | Date | Country | Kind |
---|---|---|---|
95145977 A | Dec 2006 | TW | national |
Number | Date | Country | |
---|---|---|---|
20080140391 A1 | Jun 2008 | US |