1. Technical Field
The embodiments herein generally relate to voice synthesizers or speech synthesizers and particularly to a pitch tracking system for the human voice. The embodiments herein more particularly relate to a real time dynamic pitch tracking system for use in a mobile communication system and a singer evaluation method using the real time dynamic pitch tracking system.
2. Description of the Related Art
Over the past few years, the practice of voice tracking has grown in many applications. The property of voice called pitch is determined by the rate of vibration of the vocal cords. Pitch tracking is important in several speech processing applications. Given this wide range of interest, researchers have worked on constructing pitch determination algorithms suited to their particular applications. Despite advances in mobile communication, pitch tracking in real time remains quite a challenge. Accurate speech recognition systems typically depend on algorithms and complex statistical models.
Pitch is the fundamental frequency of the repetitive portion of the voice wave form. Pitch is typically measured in terms of the time period of the repetitive segments of the voiced portion of the speech wave forms. The speech waveform is a highly complex waveform and very rich in harmonics. The complexity of the speech waveform makes it very difficult to extract pitch information.
The basic categories of the pitch tracking methods include a frequency domain analysis and a time domain analysis. Frequency domain analysis utilizes Fourier analysis to transform a window of a signal from amplitude vs. time to amplitude vs. frequency and compute a frequency using the Fourier components. Time domain analysis is performed on the window of the signal without transforming it to the frequency domain and performing calculations on the original signal to determine the pitch.
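As an illustration of the time domain category only, the following minimal sketch estimates the pitch of one window by picking the lag with the largest autocorrelation. It is not the method of the embodiments herein; the function name `autocorr_pitch`, the 8000 Hz sample rate, the 60 Hz to 600 Hz search range and the synthetic 200 Hz test tone are all assumptions made for the example.

```python
import math

def autocorr_pitch(samples, sample_rate, f_min=60.0, f_max=600.0):
    """Estimate the pitch (Hz) of one window from its autocorrelation peak."""
    lag_min = int(sample_rate / f_max)   # shortest period searched
    lag_max = int(sample_rate / f_min)   # longest period searched
    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Correlate the window with a copy of itself shifted by `lag` samples.
        value = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag

# Hypothetical test signal: a pure 200 Hz tone sampled at 8000 Hz.
sample_rate = 8000
window = [math.sin(2 * math.pi * 200.0 * n / sample_rate) for n in range(1024)]
pitch = autocorr_pitch(window, sample_rate)
```

The calculation works directly on the amplitude vs. time samples, which is what distinguishes it from a Fourier based approach.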
Various pitch detection algorithms have been developed in the past years. Pitch tracking is not new, but the currently available systems use complex computational algorithms.
None of the currently available pitch tracking systems estimate and track the pitch of a human being dynamically in real time and in an easy manner. Hence there is a need for a dynamic real time pitch tracking system for a mobile communication system.
The abovementioned shortcomings, disadvantages and problems are addressed herein and will be understood by reading and studying the following specification.
The primary object of the embodiments herein is to develop a system to estimate the pitch of the voice of a human being in real time easily using an algorithm.
Another object of the embodiments herein is to develop a system to track the pitch of the voice of a human being dynamically using a time varying model.
Yet another object of the embodiments herein is to develop a system for singer evaluation in real time.
Yet another object of the embodiments herein is to develop a system for short term identification of songs and human vocabulary.
These and other objects and advantages of the embodiments herein will become readily apparent from the following detailed description taken in conjunction with the accompanying drawings.
The various embodiments herein provide a system and method to track the pitch of the voice of a human being in real time using a time varying model. According to one embodiment, the input voice is synthesized into a sum of two time series, namely a higher order model (HOM) and a lower order model (LOM). In the current method of tracking the pitch in real time, the voice time series Vlk is extracted from the input voice Vk by passing the input voice through a 6th order low pass Butterworth filter. The output of the filter is down sampled and fitted to a 2nd order time varying model. The signal after fitting with the time varying model is passed through a pitch tracking filter to obtain the pitch frequency. The estimated pitch is smoothed using a 2nd order Kalman filter to remove the noise in the pitch.
According to one embodiment, a model based real time pitch tracking system has a low pass filter. A down sampler is connected to the low pass filter. A second order band pass filter is connected to the down sampler. A Gradient filter is connected to the second order band pass filter. A fading filter is connected to the second order band pass filter. An integrator is connected to the fading filter and to the gradient filter. A first order filter is connected to an integrator. A pitch frequency estimator is connected to the first order filter. A smoothing filter is connected to the pitch frequency estimator.
A lower order model is separated from an input voice time series to perform a pitch tracking process in real time.
The low pass filter is a sixth order low pass Butterworth filter to receive the input voice series and to extract a lower order voice series from the input voice series in real time. The down sampler performs the down sampling of the extracted lower order voice series to obtain a low order voice signal. The second order band pass filter is connected to the down sampler and is provided with an algorithm to fit a second order time varying model to the output of the down sampler to obtain the model parameters related to the lower order voice series of the input voice.
The fading filter is connected to the output of the second order band pass filter through an adder. The fading filter is connected to the input of the second order band pass filter through a first delay unit. The fading filter is connected to the second order band pass filter to calculate an error value in the measurement of the lower order voice in a pitch tracking process.
The gradient filter is connected to the second order band pass filter and is provided with an algorithm to calculate a gradient of the measured error value in the measurement of the lower order voice in a pitch tracking process. The integrator is connected to the gradient filter through a second delay unit to receive the gradient of the measured error value. The integrator is connected to the input and to the output of the fading filter to receive the input lower order voice and the measured error value. The integrator is connected to the fading filter and the gradient filter to calculate a model parameter related to the pitch of the lower order voice. The pitch frequency estimator is connected to the integrator through a first order filter to receive the output of the integrator to calculate a pitch value of the input voice. The smoothing filter is connected to the pitch frequency estimator to obtain a smooth pitch. The smoothing filter is a second order Kalman filter.
According to another embodiment, a singer evaluation method using the model based real time pitch tracking system is provided. According to the method, an interactive voice response system is accessed through a communication means by a singer. A song is selected by the singer for singing. The selected song is played.
Then the selected song is sung by the singer. The song sung by the singer is recorded. The song sung by the singer is compared and evaluated with the selected reference song to calculate a score. The evaluation result is displayed. The process of evaluating includes estimating the pitch of the singer and the pitch of the reference singer who has sung the reference song to calculate the score corresponding to the degree of matching between the singer and the reference singer.
The process of accessing interactive voice response system involves initiating a phone call using a fixed line or a mobile phone. The process of selecting a song for singing involves selecting a desired song from a list of songs stored in a database. The process of selecting further comprises selecting options including language, gender and songs.
The method further comprises a process of selecting a listening option or recording option at the end of the playing of the selected song by a singer. The selected song is played again when the listening option is chosen by the singer. The recording option is selected by the singer to record the song sung by the singer. The process of recording the song sung by the singer includes playing karaoke during the singing of the selected song by the singer. The process of recording the song sung by the singer involves playing the recorded song along with karaoke and returning back to the recording mode after playing the recorded song sung by the singer. The process of recording involves enabling the singer to sing the selected song for any number of times until the singer is satisfied with the recorded song. The process of evaluating the song sung by the singer is initiated after receiving a confirmation of the recorded song from the singer.
These and other aspects of the embodiments herein will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following descriptions, while indicating preferred embodiments and numerous specific details thereof, are given by way of illustration and not of limitation. Many changes and modifications may be made within the scope of the embodiments herein without departing from the spirit thereof, and the embodiments herein include all such modifications.
The other objects, features and advantages will occur to those skilled in the art from the following description of the preferred embodiment and the accompanying drawings in which:
Although specific features of the embodiments herein are shown in some drawings and not in others, this is done for convenience only, as each feature may be combined with any or all of the other features in accordance with the embodiments herein.
In the following detailed description, reference is made to the accompanying drawings that form a part hereof, and in which the specific embodiments that may be practiced are shown by way of illustration. These embodiments are described in sufficient detail to enable those skilled in the art to practice the embodiments and it is to be understood that logical, mechanical and other changes may be made without departing from the scope of the embodiments. The following detailed description is therefore not to be taken in a limiting sense.
The various embodiments herein provide a system and method to track the pitch of a human being in real time using a time varying model. According to one embodiment, the input voice is synthesised to obtain a lower order model. The lower order model is down sampled and fitted to a time varying 2nd order model. The down sampled signal is passed through a pitch tracking filter, a fading filter and a gradient filter to obtain a pitch signal in real time. The noise included in the pitch signal is removed by passing the acquired pitch signal through a Kalman filter to obtain a smoothed pitch signal in real time.
A lower order model is separated from an input voice time series to perform a pitch tracking process in real time. The low pass filter is a sixth order low pass Butterworth filter 401 to receive the input voice series and to extract a lower order voice series from the input voice series in real time. The down sampler 402 performs the down sampling of the extracted lower order voice series to obtain a low order voice signal. The second order band pass filter 403 is connected to the down sampler 402 and is provided with an algorithm to fit a second order time varying model to the output of the down sampler 402 to obtain the model parameters related to the lower order voice series of the input voice.
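The low pass and down sampling stage described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the circuit of the embodiments: the sixth order Butterworth response is built as three cascaded biquad sections through a pre-warped bilinear transform, the 600 Hz cutoff is taken from the description of H(z) in this specification, and the 8000 Hz input rate, the down sampling factor of 4 and all function names are hypothetical.

```python
import math

def butterworth6_sections(cutoff_hz, sample_rate):
    """Return three biquads (b0, b1, b2, a1, a2) forming a 6th order low pass."""
    k = math.tan(math.pi * cutoff_hz / sample_rate)  # pre-warped cutoff
    sections = []
    for i in (1, 2, 3):
        # 1/Q of the i-th Butterworth pole pair for order N = 6.
        inv_q = 2.0 * math.sin((2 * i - 1) * math.pi / 12.0)
        norm = 1.0 + k * inv_q + k * k
        b0 = k * k / norm
        a1 = 2.0 * (k * k - 1.0) / norm
        a2 = (1.0 - k * inv_q + k * k) / norm
        sections.append((b0, 2.0 * b0, b0, a1, a2))
    return sections

def lowpass(samples, sections):
    """Run the samples through each biquad section in turn."""
    out = list(samples)
    for b0, b1, b2, a1, a2 in sections:
        x1 = x2 = y1 = y2 = 0.0
        for n, x in enumerate(out):
            y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
            x2, x1 = x1, x
            y2, y1 = y1, y
            out[n] = y
    return out

def downsample(samples, factor):
    """Keep every `factor`-th sample of an already band-limited signal."""
    return samples[::factor]

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

fs = 8000  # assumed input sample rate
sections = butterworth6_sections(600.0, fs)
low = [math.sin(2 * math.pi * 200.0 * n / fs) for n in range(4000)]
high = [math.sin(2 * math.pi * 3000.0 * n / fs) for n in range(4000)]
low_out = downsample(lowpass(low, sections), 4)    # 200 Hz tone passes
high_out = downsample(lowpass(high, sections), 4)  # 3000 Hz tone is removed
```

Filtering before discarding samples matters here: the low pass stage removes content above the new Nyquist frequency so that the down sampler does not alias it into the pitch band.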
The fading filter 409 is connected to the output of the second order band pass filter 403 through an adder 407. The fading filter 409 is connected to the input of the second order band pass filter 403 through a first delay unit 406. The fading filter 409 is connected to the second order band pass filter 403 to calculate an error value in the measurement of the lower order voice in a pitch tracking process.
The gradient filter 404 is connected to the second order band pass filter 403 and is provided with an algorithm to calculate a gradient of the measured error value in the measurement of the lower order voice in a pitch tracking process. The integrator 410 is connected to the gradient filter 404 through a second delay unit 405 to receive the gradient of the measured error value. The integrator 410 is connected to the input and to the output of the fading filter 409 to receive the input lower order voice and the measured error value. The integrator 410 is connected to the fading filter 409 and the gradient filter 404 to calculate a model parameter related to the pitch of the lower order voice. The pitch frequency estimator 412 is connected to the integrator 410 through a first order filter 411 to receive the output of the integrator to calculate a pitch value of the input voice. The smoothing filter 413 is connected to the pitch frequency estimator 412 to obtain a smooth pitch. The smoothing filter is a second order Kalman filter 413.
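To make the idea of tracking one parameter of a second order model with an error, a gradient and an integrator more concrete, the sketch below uses a generic normalised gradient (NLMS style) update. It is a swapped-in textbook technique and not the filter equations of the embodiments: it has no fixed pole position, fading filter, delay units or first order filter. The model x[k] = p·x[k-1] − x[k-2], the step size, the 2000 Hz sample rate and the name `track_pitch` are assumptions.

```python
import math

def track_pitch(samples, sample_rate, step=0.05, eps=1e-6):
    """Track p = 2*cos(2*pi*F/fs) of a second order model; return F per sample."""
    p = 0.0
    pitches = []
    for k in range(2, len(samples)):
        # Prediction error of the model x[k] = p*x[k-1] - x[k-2].
        error = samples[k] + samples[k - 2] - p * samples[k - 1]
        # Normalised gradient step; accumulating it acts as an integrator on p.
        p += step * error * samples[k - 1] / (eps + samples[k - 1] ** 2)
        p = max(-2.0, min(2.0, p))  # keep acos() in its valid range
        pitches.append(sample_rate * math.acos(p / 2.0) / (2.0 * math.pi))
    return pitches

# Hypothetical input: a 220 Hz tone at an assumed down sampled rate of 2000 Hz.
fs_low = 2000
tone = [math.sin(2 * math.pi * 220.0 * n / fs_low) for n in range(2000)]
track = track_pitch(tone, fs_low)
final_pitch = track[-1]
```

For a clean sinusoid the error is proportional to the parameter mismatch, so the estimate settles at the tone frequency; on real voice the estimate fluctuates, which is why a smoothing stage follows the pitch frequency estimator.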
According to the method, the pitch tracking in real time is performed by extracting the time series (LOM) $v_k^L$ from $v_k$ as
$v_k$ → 6th Order Butterworth Filter $H(z)$ → $\hat{v}_k^L$ (2)
and a time-varying 2nd order model is fitted to $v_k^L$. The filter $H(z)$ (in Eq 2) is designed to have unity gain in the pass-band and to roll off at 600 Hz. Down sampling of the signal $\hat{v}_k^L$ is performed to get $v_k^L$.
$\hat{v}_k^L$ → Down Sampler → $v_k^L$ (3)
This down sampling is performed essentially to make the computation involved in tracking the pitch by Eq 4 numerically efficient and stable. A 2nd order time varying model $P(z)$ is fitted to the signal $v_k^L$ as:
The model parameters are $\hat{p}$ and $r$, in which $r$ is the fixed pole position of the model and $\hat{p}$ varies as the pitch changes and is tracked.
The Pitch Tracking filter in Eq 4 is written in time domain as:
When tracking is at steady state, the error $e_k = x_k - v_k^L$ is zero in the least-squares sense and is measured or computed using a fading filter given as:
The model parameter $\hat{p}$ is updated and tracked using the integrator relation
In the above equation, $s_k$ is the gradient of the error $e_k$, which is obtained numerically using a gradient filter given as:
The pitch frequency $F_k$ is estimated using the equation
Equations 5, 6, 7 and 8 are used in tandem to track the pitch in real time. The pitch $F_k$ obtained using Equation 9 contains some noise, which can be seen as fast variations. This noise is due to the control methods in the tracking filter (Equations 5, 6, 7 and 8). Normally the pitch of a human voice does not change so rapidly, so the noise can be reduced by using the smoothing technique given below. The pitch is smoothed using a 2nd order Kalman filter with a moving window of N=200 samples implemented via:
where
and pitch variations are captured using the relation
$\hat{F}_j = \hat{F}_{k-1} + \dot{F}_k$
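A smoothing stage of this kind can be sketched with a two-state Kalman filter whose states are the pitch and its rate of change, so that the predicted pitch is the previous pitch plus the pitch variation. This is a minimal sketch under assumptions and not the filter of the embodiments: "2nd order" is read here as two states, the moving window of N=200 samples is not reproduced, and the noise variances, the name `kalman_smooth` and the synthetic noisy track are hypothetical.

```python
import random

def kalman_smooth(pitches, meas_var=25.0, accel_var=1e-4):
    """Smooth a pitch track with a two-state (pitch, pitch rate) Kalman filter."""
    f, v = pitches[0], 0.0                       # state: pitch and pitch rate
    p00, p01, p10, p11 = meas_var, 0.0, 0.0, 1.0  # state covariance
    smoothed = []
    for z in pitches:
        # Predict: pitch advances by its rate, the rate stays constant.
        f = f + v
        n00 = p00 + p01 + p10 + p11
        n01 = p01 + p11
        n10 = p10 + p11
        n11 = p11 + accel_var
        # Update with the measured pitch z.
        s = n00 + meas_var
        k0, k1 = n00 / s, n10 / s
        innovation = z - f
        f += k0 * innovation
        v += k1 * innovation
        p00, p01 = (1.0 - k0) * n00, (1.0 - k0) * n01
        p10, p11 = n10 - k1 * n00, n11 - k1 * n01
        smoothed.append(f)
    return smoothed

# Hypothetical noisy track: a steady 220 Hz pitch with 5 Hz of jitter.
rng = random.Random(0)
noisy = [220.0 + rng.gauss(0.0, 5.0) for _ in range(400)]
smooth = kalman_smooth(noisy)
raw_err = sum(abs(x - 220.0) for x in noisy[-100:]) / 100
smooth_err = sum(abs(x - 220.0) for x in smooth[-100:]) / 100
```

A small process variance relative to the measurement variance encodes the observation that the pitch of a human voice does not change rapidly, so fast variations are treated as noise.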
.wav file → Data Converter → $u_k$ → mRpT pitch Tracker → $\hat{F}_k$
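The data converter step of this chain can be illustrated with the Python standard library. The sketch assumes a mono, 16-bit PCM .wav input and scales the samples to floating point values; the function name `wav_to_samples` and the in-memory test file are hypothetical, and the actual data converter of the embodiments may differ.

```python
import io
import math
import struct
import wave

def wav_to_samples(file_obj):
    """Read a mono 16-bit PCM .wav and return (sample_rate, floats in [-1, 1))."""
    with wave.open(file_obj, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("this sketch handles mono 16-bit PCM only")
        sample_rate = wav.getframerate()
        count = wav.getnframes()
        raw = wav.readframes(count)
    ints = struct.unpack("<%dh" % count, raw)  # little-endian signed 16-bit
    return sample_rate, [value / 32768.0 for value in ints]

# Build a small in-memory .wav so the example needs no file on disk.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(8000)
    tone = [int(12000 * math.sin(2 * math.pi * 220 * n / 8000))
            for n in range(800)]
    wav.writeframes(struct.pack("<%dh" % len(tone), *tone))
buffer.seek(0)
rate, u = wav_to_samples(buffer)
```

The resulting sequence `u` plays the role of the input series fed to the pitch tracker.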
The pitch tracking digital circuits are shown in
With respect to
Then the selected song is sung by the singer. The song sung by the singer is recorded 604. The song sung by the singer is compared and evaluated with the selected reference song to calculate a score 605. The evaluation result is displayed. The process of evaluating includes estimating the pitch of the singer and the pitch of the reference singer who has sung the reference song to calculate the score corresponding to the degree of matching between the singer and the reference singer.
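The specification does not state how the degree of matching is converted into a score. One possible scoring rule, given purely as a hypothetical sketch, counts the fraction of frames in which the singer's pitch lies within a tolerance of the reference pitch, measured in cents. The function name `pitch_match_score`, the 50 cent tolerance and the example pitch values are assumptions.

```python
import math

def pitch_match_score(singer_hz, reference_hz, tolerance_cents=50.0):
    """Percentage of frames where the singer is within tolerance of the reference."""
    frames = min(len(singer_hz), len(reference_hz))
    if frames == 0:
        return 0.0
    hits = 0
    for s, r in zip(singer_hz, reference_hz):
        if s > 0 and r > 0:  # skip frames with no detected pitch
            cents = abs(1200.0 * math.log2(s / r))
            if cents <= tolerance_cents:
                hits += 1
    return 100.0 * hits / frames

# Hypothetical pitch tracks in Hz, one value per frame.
reference = [220.0, 220.0, 247.0, 262.0]
singer = [221.0, 219.0, 260.0, 262.0]
score = pitch_match_score(singer, reference)  # third frame is about 89 cents off
```

Measuring the difference in cents rather than in Hz makes the tolerance independent of the register in which the song is sung.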
The process of accessing interactive voice response system involves initiating a phone call using a fixed line or a mobile phone. The process of selecting a song for singing involves selecting a desired song from a list of songs stored in a database. The process of selecting further comprises selecting options including language, gender and songs.
The method further comprises a process of selecting a listening option or recording option at the end of the playing of the selected song by a singer. The selected song is played again when the listening option is chosen by the singer. The recording option is selected by the singer to record the song sung by the singer. The process of recording the song sung by the singer includes playing karaoke during the singing of the selected song by the singer. The process of recording the song sung by the singer involves playing the recorded song along with karaoke and returning to the recording mode after playing the recorded song sung by the singer. The process of recording involves enabling the singer to sing the selected song any number of times until the singer is satisfied with the recorded song.
The process of evaluating the song sung by the singer is initiated after receiving a confirmation of the recorded song from the singer.
The embodiments herein provide a simple method to track the pitch of a human being in real time using an algorithm. The pitch tracking method and system help to track the pitch dynamically in real time by fitting a time varying model. The system and method may be used for singer evaluation and for short term identification of songs and human vocabulary.
Although various specific embodiments are provided herein, it will be obvious for a person skilled in the art to practice the embodiments herein with modifications. However, all such modifications are deemed to be within the scope of the claims.
It is also to be understood that the following claims are intended to cover all of the generic and specific features of the embodiments herein and all the statements of the scope of the invention which, as a matter of language, might be said to fall therebetween.
Number | Date | Country | Kind |
---|---|---|---|
2970/CHE/2008 | Dec 2008 | IN | national |