1. Field of the Invention
The present invention relates to a method and apparatus for playing back an audio signal at a decelerated rate by a signal processing unit while simultaneously keeping the pitch of the audio signal constant using a multiresolution analysis technique.
2. Description of the Related Art
A signal can be viewed as composed of a smooth background and fluctuations or details on top of it. The distinction between the smooth part and the details is determined by the resolution. At a given resolution, a signal is approximated by ignoring all fluctuations below that scale. The resolution can be progressively increased; at each stage of the increase in resolution, finer details are added to the coarser description, providing a successively better approximation to the signal. Eventually, when the resolution goes to infinity, the exact signal is recovered. Multiresolution refers to the simultaneous presence of different resolutions.
Systems are available in the market which enable users to play back an audio signal at a decelerated rate. The audio signal that is typically played back at a decelerated rate can be a speech signal, a music recording or an audio data signal. However, in none of the available systems does the pitch of the audio signal remain constant when it is played back at a decelerated rate.
When an audio signal is played back at a slower rate than the rate at which it was sampled, the pitch of the output audio signal typically differs from that of the original signal. Thus, sound quality deteriorates as the signal is played slower. There are no known audio systems that can handle this problem.
There may be several reasons for playing an audio signal at a rate that is slower than the sampling rate used during audio signal capture or recording. However, playback at a slower rate is often unpleasant, if not a strange version of the original that sounds significantly different from it.
For the present invention to be easily understood and readily practiced, preferred embodiments will now be described, for purposes of illustration and not limitation, in conjunction with the following figures:
Interpolation is a process of estimating and inserting one or more values between two known values in a sequence of values. There are several known one-dimensional interpolation techniques; nearest neighbor interpolation, linear interpolation, cosine interpolation and cubic spline interpolation are a few of them. Nearest neighbor interpolation is the fastest interpolation technique, but it gives the worst result in terms of smoothness. Linear interpolation uses more memory and takes more execution time than nearest neighbor interpolation. In this technique, the known values or points are simply joined by straight line segments. Each segment (bounded by two data points) can be interpolated independently. Although linear interpolation is better than nearest neighbor interpolation, the slope of the straight line segments changes abruptly at the vertex points. Cosine interpolation gives a smoother interpolating function than linear interpolation. Cubic spline interpolation has the longest relative execution time and produces the smoothest results of the techniques listed. The plurality of interpolators 120, 140, 160 can employ any of the known interpolation techniques depending upon the availability of memory and execution time.
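By way of illustration only, three of the interpolation techniques named above may be sketched as follows. The function names and the helper upsample are hypothetical and do not appear in the embodiments; cubic spline interpolation is omitted for brevity. The sketch assumes that the last sample is held at the end of the sequence, where no right-hand neighbor exists.

```python
import math

def nearest(a, b, t):
    """Nearest neighbor: return whichever known value is closer to position t."""
    return a if t < 0.5 else b

def linear(a, b, t):
    """Linear: a point on the straight line segment joining the known values."""
    return a + (b - a) * t

def cosine(a, b, t):
    """Cosine: ease in and out so that the slope is zero at each known value."""
    w = (1.0 - math.cos(math.pi * t)) / 2.0
    return a + (b - a) * w

def upsample(samples, factor, kernel):
    """Insert factor-1 estimated values after each sample (hypothetical helper)."""
    out = []
    n = len(samples)
    for i, a in enumerate(samples):
        # Assumption: hold the final value where no right-hand neighbor exists.
        b = samples[i + 1] if i + 1 < n else a
        for k in range(factor):
            out.append(kernel(a, b, k / factor))
    return out
```

For example, upsample([0.0, 1.0], 2, linear) yields four values, the second of which lies midway between the two known values.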
Each of the plurality of interpolators 120, 140, 160 is communicatively coupled to an output of only one of the plurality of bandpass filters 110, 130, 150. The interpolator 120 is communicatively coupled to an output of the bandpass filter 110, the interpolator 140 is communicatively coupled to an output of the bandpass filter 130, and the interpolator 160 is communicatively coupled to an output of the bandpass filter 150. The plurality of interpolators 120, 140, 160 generate a third set of plurality of samples. The samples generated by the bandpass filter 110, which are a constituent of the second set of plurality of samples, pass through the interpolator 120, and the interpolator 120 inserts at least one sample into the samples passing through it. Hence the number of samples at an output of each of the plurality of interpolators 120, 140, 160 is more than the number of samples in x(n). The plurality of interpolators 120, 140, 160 employ different interpolation techniques. The interpolation technique employed by the interpolator 120 depends on the pass band and the stop band of the bandpass filter 110, that employed by the interpolator 140 depends on the pass band and the stop band of the bandpass filter 130, and so on. The adder 170 superimposes the constituents of the third set of plurality of samples generated by the plurality of interpolators 120, 140, 160 on a sample by sample basis. Superimposition is carried out in the time domain. The adder 170 outputs a fourth plurality of samples, y(n). Each of the constituents of the third set of plurality of samples and y(n) have an identical number of samples. Thus the number of samples in y(n) is more than the number of samples in x(n). Hence, on playing y(n) at the rate at which x(n) would be played, a decelerated version of the audio signal is obtained. The bandpass filters 110, 130, 150 and the interpolators 120, 140, 160 are so chosen that the decelerated version has a pitch which is consistent with the pitch obtained on playing x(n).
The pitch of the decelerated version is thus consistent with the pitch of the audio signal in a non-decelerated condition.
In one embodiment of the present invention, x(n) is, for example, two hundred and fifty-six samples of the audio signal, and the audio signal is played back decelerated by a factor of two. The constituents of the second set of plurality of samples in the said embodiment are thus each two hundred and fifty-six in number. The constituents of the third set of plurality of samples in the said embodiment will each be 256×2=512 (five hundred and twelve) samples. The plurality of interpolators 120, 140, 160 employ different interpolation techniques. The interpolation techniques employed by the plurality of interpolators in the said embodiment may be as follows. The interpolator 120 inserts one sample after every sample of the two hundred and fifty-six samples passing through it. Thus the number of samples obtained at an output of the interpolator 120 is five hundred and twelve. The interpolator 140 inserts two samples after every two samples of the two hundred and fifty-six samples passing through it. Hence the number of samples obtained at an output of the interpolator 140 is also five hundred and twelve. The amplitudes of the inserted samples depend on the amplitudes of the samples present at the inputs of the plurality of interpolators. In the embodiment of the invention discussed above, the adder 170 superimposes the five hundred and twelve samples generated by each of the plurality of interpolators 120, 140, 160. y(n) is thus five hundred and twelve samples available at an output of the signal processing unit 100. Since x(n) is two hundred and fifty-six samples of the audio signal, a decelerated version of the audio signal is obtained on playing y(n).
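The sample counts of this embodiment may be checked with a minimal sketch such as the following. It covers only the insertion patterns stated for the interpolators 120 and 140. The helper names are hypothetical, the input data are placeholders rather than real filter outputs, and the sketch assumes that each inserted sample repeats the last sample of its group, since the embodiment states only that the inserted amplitudes depend on the input amplitudes.

```python
def insert_after_group(samples, group, count):
    """After every `group` input samples, insert `count` new samples.

    Assumption: each inserted sample repeats the last sample of its group
    (a zero-order hold); the embodiment does not fix the amplitude rule.
    """
    out = []
    for start in range(0, len(samples), group):
        chunk = samples[start:start + group]
        out.extend(chunk)
        out.extend([chunk[-1]] * count)
    return out

def superimpose(*bands):
    """Sample-by-sample sum in the time domain, in the manner of the adder 170."""
    assert len({len(b) for b in bands}) == 1, "bands must have equal length"
    return [sum(values) for values in zip(*bands)]

# Placeholder inputs standing in for the outputs of filters 110 and 130.
band_110 = [float(i % 8) for i in range(256)]
band_130 = [float(i % 16) for i in range(256)]

out_120 = insert_after_group(band_110, group=1, count=1)  # one after every one
out_140 = insert_after_group(band_130, group=2, count=2)  # two after every two
y = superimpose(out_120, out_140)
```

Both insertion patterns double the number of samples, so the two outputs can be superimposed sample by sample even though the samples are inserted at different positions.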
By way of example, an audio signal is to be played back decelerated by a factor of two. Suppose x(n) is two hundred and fifty-six samples of the audio signal. x(n) is passed through each of the plurality of subunits 210, 220, 230. The plurality of subunits generate a second set of plurality of samples from x(n). The constituents of the second set of plurality of samples in the present embodiment are each 256×2=512 samples. In other words, the number of samples present at the output of each of the plurality of subunits 210, 220, 230 is five hundred and twelve. The number of samples in y(n), the output of the adder, is again five hundred and twelve in the present embodiment. On playing y(n), a two times decelerated version of the audio signal is obtained.
The signal processing unit has a plurality of bandpass filters, a plurality of interpolators and an adder. In block 312, the plurality of bandpass filters and the plurality of interpolators are provided. The number of bandpass filters in the signal processing unit depends at least on the deceleration rate, the sampling frequency and the interference introduced by the plurality of bandpass filters. The Q factor is kept constant across the plurality of bandpass filters. The pass bands and stop bands of the plurality of bandpass filters are designed to be different from one another.
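One way to obtain different pass bands with a constant Q factor, offered here only as a sketch under stated assumptions, is to give every band the same ratio between its upper and lower edge. The Q factor is taken in its usual sense as the center frequency divided by the bandwidth, with the geometric mean of the edges as the center frequency. The function names and the frequency limits used in the test are hypothetical and are not taken from the embodiments.

```python
import math

def constant_q_bands(f_min, f_max, num_bands):
    """Split [f_min, f_max] into contiguous bands with an equal edge ratio."""
    ratio = (f_max / f_min) ** (1.0 / num_bands)
    bands = []
    lo = f_min
    for _ in range(num_bands):
        hi = lo * ratio
        bands.append((lo, hi))
        lo = hi
    return bands

def q_factor(lo, hi):
    """Q = center frequency / bandwidth, using the geometric center."""
    return math.sqrt(lo * hi) / (hi - lo)
```

With an edge ratio r, every band has Q equal to sqrt(r)/(r-1), so the bands differ in position and width while the Q factor stays the same.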
The plurality of interpolators and the plurality of bandpass filters correspond in number. The interpolation technique employed by each of the plurality of interpolators is different. The interpolation technique employed in an interpolator can include inserting at least one sample into the plurality of samples passing through the interpolator. The determination of which of the plurality of bandpass filters is to be connected with which of the plurality of interpolators is made at the next block 316. Such a determination comprises inspecting the pass band and the stop band of each of the plurality of bandpass filters and inspecting the interpolation technique of each of the plurality of interpolators. The plurality of interpolators are communicatively connected with the outputs of the plurality of bandpass filters in block 320.
Block 324 illustrates that the first plurality of samples of the audio signal collected at block 304 is passed through each of the plurality of bandpass filters. The plurality of bandpass filters generate a second set of plurality of samples. In the next block 328, the samples generated at an output of each of the plurality of bandpass filters are passed through the corresponding interpolator to which the bandpass filter is connected. The plurality of interpolators generate a third set of plurality of samples. The constituents of the third set of plurality of samples are superimposed in block 332 on a sample by sample basis, giving rise to a fourth plurality of samples. The fourth plurality of samples is played in block 336, generating a decelerated version of the audio signal. The actions described in blocks 308, 312, 316, 320, 324, 328 and 332 ensure that the pitch of the decelerated version of the audio signal is consistent with the pitch of a non-decelerated version of the audio signal. The process ends at block 340.
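The data flow of blocks 324 through 332 may be sketched as below. The sketch is illustrative only and makes several assumptions not found in the embodiments: the bandpass filters are stood in for by differences of moving averages, the pairing of bands with interpolation techniques is arbitrary, and all function names are hypothetical. It shows how the sample counts and the superimposition fit together and is not offered as evidence of the pitch behavior.

```python
import math

def moving_average(x, width):
    """Crude low-pass stand-in: centered moving average, clamped at the edges."""
    n, half = len(x), width // 2
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def split_bands(x, widths=(3, 9)):
    """Block 324 (assumed filters): three bands that sum back to x."""
    fine = moving_average(x, widths[0])
    coarse = moving_average(x, widths[1])
    high = [a - b for a, b in zip(x, fine)]
    mid = [a - b for a, b in zip(fine, coarse)]
    return [high, mid, coarse]

def hold(a, b, t):
    return a  # repeat the previous sample

def linear(a, b, t):
    return a + (b - a) * t

def cosine(a, b, t):
    return a + (b - a) * (1.0 - math.cos(math.pi * t)) / 2.0

def interpolate(band, factor, kernel):
    """Block 328: insert factor-1 estimated samples after each sample."""
    out = []
    n = len(band)
    for i, a in enumerate(band):
        b = band[i + 1] if i + 1 < n else a  # hold the last value at the end
        out.extend(kernel(a, b, k / factor) for k in range(factor))
    return out

def decelerate(x, factor, kernels=(hold, linear, cosine)):
    """Blocks 324 to 332: filter, interpolate per band, superimpose."""
    bands = split_bands(x)                                  # second set
    stretched = [interpolate(b, factor, k)                  # third set
                 for b, k in zip(bands, kernels)]
    return [sum(values) for values in zip(*stretched)]      # y(n)
```

Calling decelerate on two hundred and fifty-six samples with a factor of two returns five hundred and twelve samples, in agreement with the sample counts of the embodiments described above.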
The above-discussed embodiments of the invention are presented for illustrative purposes only. It would be understood by a person of skill in the art that other embodiments and other configurations are possible, while still maintaining the spirit and scope of the invention. For a proper determination of the scope of the present invention, reference should be made to the appended claims.