The present application is based on and claims the benefit of priority from prior Japanese Patent Application No. 2009-191570, filed Aug. 21, 2009, and No. 2009-191571, filed Aug. 21, 2009, the entire contents of which are incorporated herein by reference.
1. Field of the Invention
The present invention relates to a data converting apparatus and a data converting method, which receive an audio signal and output the audio signal at a changed reproducing speed.
2. Description of the Related Art
A technique that reproduces an audio signal while changing its reproducing time without changing its pitch is in practical use. The technique is used for applying sound effects in so-called KARAOKE, and in fast-forward reproduction and/or slow-speed reproduction of CDs, video tapes, and DVDs. The technique is roughly divided into two groups: one is based on operation on a time axis and the other is based on operation on a frequency axis. As typical examples of the former, the OLA (OverLap and Add) method and its developed form, the PSOLA (Pitch Synchronous OLA) method, are known, and as a typical example of the latter, the phase vocoder is known.
The OLA method is described, for instance, in Japanese Patent No. 2005-266571 A. In the OLA method, audio data is divided into plural frames, and a window function is applied to these frames. The windowed frames are successively disposed with portions overlapping each other, and data values of the overlapped portions are added and output. By changing the lengths of the overlapped portions, the audio data is extended and/or contracted. The OLA method is simple in processing and can be realized by a simple configuration. However, discontinuity can occur at and around the connecting portions of frames, including the overlapping portions of the frames, causing noises.
The PSOLA method is disclosed, for example, in Japanese Patent No. Hei10-124082 A. The PSOLA method extracts a pitch of audio data and sets a frame size of the audio data to be divided based on the extracted pitch, thereby reducing noises caused at connecting portions of the frames. However, the PSOLA method has a drawback in that it can be used only for audio data having a pitch that can be extracted, in other words, only for audio data of musical instruments of a single tone.
The technique based on the operation on the frequency axis is disclosed in U.S. Pat. No. 7,672,835. In the technique based on the operation on the frequency axis, such as the phase vocoder, the audio data is subjected to a Fourier transform, the phase is operated on the frequency axis, and an inverse Fourier transform is then computed to obtain data on the time axis again. In this way, the audio data is extended and/or contracted. This technique can obtain better results than the OLA method and the PSOLA method, but has a problem in that it requires complex arithmetic processing such as the Fourier transform and complex number operations.
The present invention has an object to provide a data converting apparatus and a data converting method, which can be composed with a simple configuration using OLA method, and, in particular, can solve a problem of deterioration in sound quality which may be caused due to discontinuity at frame connecting points and/or in a slow speed reproduction.
According to one aspect of the present invention, there is provided a data converting apparatus, which divides input data into plural frames and partially overlaps the divided frames, thereby producing output data, the apparatus comprising an input buffer for storing input data of a first frame size, an output buffer for storing output data of a second frame size decided in accordance with a reproducing speed ratio, wherein the reproducing speed ratio indicates a ratio of a reproducing speed of the output data to the input data, a controlling unit for deciding an overlapping rate based on the reproducing speed ratio, and for deciding a first hopsize based on the first frame size of the input data and the decided overlapping rate, and a data converting unit for producing the output data of the second frame size from the input data stored in the input buffer, wherein the data converting unit comprises an input-frame data producing unit for producing, from the input data stored in the input buffer, input frames each having the first hopsize and including a predetermined number of sub-frames, wherein the predetermined number of sub-frames corresponds to the overlapping rate, a window function executing unit for executing a window function on the input frames produced by the input-frame data producing unit, a frame processing unit for shifting the input frames subjected to the window function by the first hopsize and overlapping the shifted input frames, thereby obtaining data of an output frame, wherein the obtained data of the output frame has a second hopsize defined based on the first hopsize and the reproducing speed ratio and includes sub-frames, and an output buffer data producing unit for storing in the output buffer the data of the output frame as the output data each having the second hopsize and including the predetermined number of sub-frames, and the controlling unit sets the first hopsize and the overlapping rate in a slow speed reproduction different from the first hopsize and the overlapping rate in a high speed reproduction and vice versa, wherein the slow speed reproduction is set when the reproducing speed ratio has been set lower than “1” and the high speed reproduction is set when the reproducing speed ratio has been set higher than “1”.
According to another aspect of the present invention, there is provided a data converting method, which divides input data into plural frames of a first frame size and partially overlaps the divided frames, thereby producing output data, the method comprising a controlling step of deciding an overlapping rate based on a reproducing speed ratio indicating a ratio of a reproducing speed of the output data to the input data, deciding a first hopsize based on the first frame size of the frame and the decided overlapping rate, and deciding the first hopsize and the overlapping rate in a slow speed reproduction different from the first hopsize and the overlapping rate in a high speed reproduction and/or deciding the first hopsize and the overlapping rate in the high speed reproduction different from the first hopsize and the overlapping rate in the slow speed reproduction, wherein the slow speed reproduction is set when the reproducing speed ratio has been set lower than “1” and the high speed reproduction is set when the reproducing speed ratio has been set higher than “1”, an input-frame data producing step of producing, from the input data stored in an input buffer, input frames each having the first hopsize and including a predetermined number of sub-frames, wherein the predetermined number of sub-frames corresponds to the overlapping rate, a window function executing step of executing a window function on the input frames produced at the input-frame data producing step, a frame processing step of shifting the input frames subjected to the window function by the first hopsize and overlapping the shifted input frames, thereby obtaining data of an output frame having sub-frames of a second hopsize defined based on the first hopsize and the reproducing speed ratio, and an output buffer data producing step of storing data from the data of the output frame in an output buffer as output data each including the predetermined number of sub-frames having the second hopsize.
Embodiments of a data converting apparatus of the present invention will be described with reference to the accompanying drawings.
The input unit 12 has key switches such as a switch for setting a reproducing speed and ten keys. The displaying unit 13 includes, for example, a liquid crystal displaying device. CPU 11 executes various processes, including detection of switching operation of the input unit 12, determination of a reproducing speed ratio based on the switching operation, driving of a converting process based on the switching operation, calculation of parameters to be used in the converting process, determination of parameters to be used for various types of filters and in filtering processes, transferring of the parameters to the signal converting unit 18 and control of input I/F 17.
ROM 14 stores a process program for CPU 11 to execute the processes, including detection of switching operation of the input unit 12, determination of a reproducing speed ratio based on the switching operation, driving of a converting process based on the switching operation, calculation of parameters to be used in the converting process, determination of parameters used for various types of filters and in filtering processes, transferring of the parameters to the signal converting unit 18 and control of input I/F 17. RAM 15 provides data storing areas for an input buffer, output buffer, input frame and output frame, wherein the input buffer stores data input from the input I/F 17, the output buffer stores data read from the audio circuit 19, and the input frame and output frame temporarily store data to be processed. Further, RAM 15 stores various parameters to be used in the process. The large scale storing device 16 comprises hard disk drives and card type memories. The large scale storing device 16 stores various data including data input through the input I/F 17, data converted by the signal converting unit 18 and data subjected to the filtering process.
The input I/F 17 is connected with a microphone 20 and has an input terminal 21, as shown in
The signal converting unit 18 executes a time extending process or a time contracting process on input data, and further executes a filtering process on data subjected to the time extending process, thereby producing data whose reproducing speed has been converted. A configuration of the signal converting unit 18 and a process to be executed by the signal converting unit 18 will be described later. The audio circuit 19 has a D/A converter (not shown) and an amplifier (not shown). The audio circuit 19 converts data received from the signal converting unit 18 into an analog audio signal and amplifies the audio signal, outputting the audio signal through a speaker 22.
The frame-process calculating unit 30 executes processes relating to frames and matters between frames including a data operation such as a data shift in the input frame 27 and output frame 29, execution of a window function on the input frame 27, and addition of a data value of the input frame 27 to a data value of the corresponding output frame 29.
After the initializing process is finished at step 301, the process at step 302 and the following processes are repeatedly executed. CPU 11 judges at step 302 whether or not any switch of the input unit 12 has been operated. When it is determined at step 302 that a switch has been operated, a process is executed in accordance with the operated switch.
CPU 11 judges at step 401 whether or not the reproducing speed setting switch in the input unit 12 has been turned on. When it is determined at step 401 that the reproducing speed setting switch has been turned on (YES at step 401), CPU 11 obtains an input reproducing speed ratio from operation of the ten keys or operation of the “+” key (“plus” key) or “−” key (“minus” key) at step 402.
The reproducing speed ratio indicates a reproducing speed at which the converted output data is reproduced when a speed of the input data is “1”. The reproducing speed ratio “2” means that the output data is reproduced at twice the speed of the input data. Further, the reproducing speed ratio “0.5” means that the output data is reproduced at half the speed of the input data. In the data converting process to be described later, an “expansion ratio” is equivalent to the inverse of the reproducing speed ratio, that is, the “expansion ratio” is equivalent to “1/reproducing speed ratio”.
CPU 11 determines sizes of the input data buffer 25 and input frame 27, an overlapping ratio, a size of the output frame 29, size of the intermediate buffer 31 and a size of the output buffer 33 at step 403. In the present embodiment, the input data is converted at a reproducing speed ratio designated in the range of the reproducing speed ratio from 0.5 to 2.0, and converted data is outputted. In the present embodiment, OLA (OverLap and Add) method is employed and the input data is converted into output data without executing a complex calculation.
In the OLA method, the input data is first divided into input frames of a predetermined size. The input frames are extracted such that they partially overlap each other, and the output data is likewise produced from frames that partially overlap each other. An overlapping rate is used as a parameter indicating the partially overlapping portion of the frames. The overlapping rate indicates how much the frame is shifted relative to the whole frame size. For example, when the overlapping rate is “4”, the frame is shifted by ¼ of the frame size. The shifted amount of the frame is called a “hopsize”.
When the frames are overlapped to be added or connected, these frames are subjected to the window function and the windowed data is added together or connected. For example, hanning window can be used as the window function. If a hopsize of the input data is equivalent to a hopsize of the output data, data increases in amplitude by an overlapped amount, but the original waveform is maintained. If the hopsize of the output data is larger than the hopsize of the input data, the time extending process is executed on data, whereby the reproducing speed is made slow (slow-speed reproduction). Meanwhile, if the hopsize of the output data is smaller than the hopsize of the input data, the time contracting process is executed on data, whereby the reproducing speed is made high (high-speed reproduction).
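The relationship between the two hopsizes described above can be illustrated with a minimal sketch. It is not the embodiment's implementation: the function names `hann` and `ola_stretch`, the pure-Python list handling, and the rounding of the output hopsize are assumptions made only for illustration.

```python
import math

def hann(n):
    """Periodic Hann ("hanning") window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def ola_stretch(samples, speed_ratio, frame_size=4096, overlap_rate=4):
    """Overlap-add time scaling; speed_ratio < 1 lengthens the output."""
    in_hop = frame_size // overlap_rate            # hopsize of the input data
    out_hop = max(1, round(in_hop / speed_ratio))  # hopsize of the output data (rounded: assumption)
    window = hann(frame_size)
    n_frames = (len(samples) - frame_size) // in_hop + 1
    if n_frames <= 0:
        return []
    out = [0.0] * ((n_frames - 1) * out_hop + frame_size)
    for f in range(n_frames):
        src = f * in_hop    # where the frame is taken from the input
        dst = f * out_hop   # where the windowed frame is added to the output
        for i in range(frame_size):
            out[dst + i] += samples[src + i] * window[i]
    return out
```

With equal hopsizes and an overlapping rate of 4, a constant input of 1.0 comes out as a constant 2.0 in the fully overlapped region, which corresponds to the amplitude increase by the overlapped amount mentioned above.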
If the frame sizes of the frames to be overlapped and the overlapping rate are not set properly when the frames are overlapped to be added or connected, sound quality is deteriorated in slow-speed reproduction and in high-speed reproduction. For instance, when the overlapping rate is set to the minimum value “2” and the time extending process is executed on the frames in the slow reproduction, as shown in
Meanwhile, when the overlapping rate is set large, a problem due to discontinuity at connecting portions of frames can be reduced, but another problem is invited, as will be described below. As the overlapping rate is set larger, the number of the overlapping frames increases. Therefore, effect of a comb filter is multiply-applied on data, and as a result, sound quality is greatly changed. Even if a frame size is set large, since the overlapping portion of frames is made large, the effect of a comb filter is excessively applied on data.
The inventor of the present invention has found that it is preferable to set the overlapping rate to a value not less than “2” and as near to “2” as possible. If the overlapping rate is set to “2”, the time extending process in the slow-speed reproduction invites a problem that the overlapped frames fluctuate in amplitude and a problem of discontinuity of the superimposed frames, but the inventor has found that, as far as the time contracting process in the high-speed reproduction is concerned, no problem is invited even though the overlapping rate is set to “2”.
In the slow-speed reproduction, it is preferable to set the overlapping rate to a value larger than “2” to avoid causing the problem of the frame discontinuity. In consideration of the effect of the comb filter, it has been found that it is preferable to set the overlapping rate to “3” or “4”. In the present embodiment, the overlapping rate in the slow-speed reproduction is set to “4” in consideration of easy processing.
Further, it is preferable to set the frame size in consideration of the effect of the comb filter. The inventor has found that when the overlapping rate is set to “2”, approximately one fifth of the number of samples corresponding to a sampling frequency is suitable for the frame size, and that when the overlapping rate is set to “4”, approximately one tenth of the number of samples corresponding to the sampling frequency is suitable for the frame size.
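The frame-size guideline above can be worked through numerically. The following is only a sketch: the helper name `suggested_frame_size` is hypothetical, and rounding the target to the nearest power of two is an assumption made here for convenience, not something stated in the description.

```python
def suggested_frame_size(sampling_hz, overlap_rate):
    """Return (target, nearest power of two) for overlapping rates 2 and 4."""
    divisor = {2: 5, 4: 10}[overlap_rate]  # one fifth or one tenth of the sampling frequency
    target = sampling_hz / divisor
    power = 1
    while power * 2 <= target:
        power *= 2
    # Assumption: pick whichever neighbouring power of two is closer to the target.
    if target - power > power * 2 - target:
        power *= 2
    return target, power
```

For a sampling frequency of 44,100 Hz, this gives a target of 4410 samples and a power-of-two size of 4096 for an overlapping rate of 4, and a target of 8820 samples and a size of 8192 for an overlapping rate of 2.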
In the present embodiment, the sampling frequency is 44.1 kHz (44,100 Hz) and both the input buffer 801 and the input frame 802 have a size (frame size) of “4096”. The overlapping rate is “4” and the size of a sub-frame is “1024”, which is given by dividing the frame size (“4096”) by the overlapping rate “R” (Refer to a reference numeral: 811 in
The output frame 803 in the present embodiment has a sub-frame 831 at its leading portion, which has a size of “1024/R”, wherein the size is obtained by dividing the sub-frame size of the input buffer 801 and input frame 802 by the reproducing speed ratio “R”. Following the leading sub-frame 831, there are disposed four sub-frames having the same size as the sub-frames of the input buffer 801 and input frame 802 (Refer to a reference numeral: 832).
The intermediate buffer 804 in the present embodiment comprises four sub-frames (Refer to a reference numeral: 841) having the same size (1024/R) as the leading sub-frame 831.
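The sizes given for the slow-speed case can be tabulated with a short sketch. The function name `slow_speed_layout` and the dictionary keys are hypothetical; the sketch also assumes that the four sub-frames following the leading sub-frame each have the 1024-sample size of an input sub-frame, and that “1024/R” is rounded to a whole number of samples.

```python
def slow_speed_layout(speed_ratio, frame_size=4096, overlap_rate=4):
    """Buffer sizes in samples for a slow-speed ratio (0 < speed_ratio < 1)."""
    if not 0.0 < speed_ratio < 1.0:
        raise ValueError("slow-speed reproduction requires 0 < speed_ratio < 1")
    in_sub = frame_size // overlap_rate     # input sub-frame (first hopsize)
    out_sub = round(in_sub / speed_ratio)   # leading output sub-frame, 1024/R (rounded: assumption)
    return {
        "input_sub_frame": in_sub,
        "leading_output_sub_frame": out_sub,
        # Leading sub-frame followed by overlap_rate input-sized sub-frames (assumption).
        "output_frame": out_sub + overlap_rate * in_sub,
        # overlap_rate sub-frames, each the size of the leading output sub-frame.
        "intermediate_buffer": overlap_rate * out_sub,
    }
```

For a reproducing speed ratio of 0.5, the leading output sub-frame becomes 2048 samples and the intermediate buffer 8192 samples.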
The leading sub-frame 931 of the output frame 903 has a size of 4096/R, which is given by dividing the size of the sub-frame of the input buffer 901 or the input frame 902 by the reproducing speed ratio “R”. Following to the leading sub-frame 931, there are disposed two sub-frames (Refer to a reference numeral: 932 in
In the present embodiment, a combination of sizes of the input buffer and input frame and the overlapping rate in the case that the reproducing speed ratio is set larger than “1” and a combination of sizes of the input buffer and input frame and the overlapping rate in the case that the reproducing speed ratio is set lower than “1” are stored in RAM 14. Therefore, the reproducing speed ratio is obtained at step 402 in the switching process (
In the case that the reproducing speed ratio is set lower than “1”, that is, in the case of the slow-speed reproduction, a data area is secured in RAM 15 at step 404 for storing the input buffer, input frame, output frame, intermediate buffer and output buffer shown in
CPU 11 determines the type of filter and parameters (for example, filter coefficients) for the filtering process depending on the reproducing speed ratio at step 405. Hereinafter, the filtering process executed by the filtering processing unit 32 will be described in detail.
In the case that the reproducing speed ratio falls within the range from “0.8” to “1.0” (0.8≦ reproducing speed ratio <1.0), that is, in the case that the time extending rate is relatively low in the slow-speed reproduction, samples are delayed by some tens or some hundreds of samples and are added to a sample of an original waveform in a frame addition with frames overlapped to be described later. When some delayed samples are added to the original sample, a filter represented by the following equation (1) is obtained.
$$H(z) = 1 + \alpha z^{-K} \tag{1}$$
In the equation (1), α is a multiplier factor determined by the window function, and K represents the number of the delayed samples. The filter represented by the equation (1) acts as the comb filter on the original waveform of the sample. The effect of the comb filter changes the waveform of the converted data, altering sound quality.
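In the time domain, the equation (1) corresponds to adding a delayed, scaled copy of the signal to itself. A minimal sketch follows; the function names `comb_filter` and `comb_magnitude` are hypothetical, and samples before the start of the data are assumed to be zero.

```python
import math

def comb_filter(samples, alpha, delay):
    """Apply y[n] = x[n] + alpha * x[n - delay]; earlier samples are taken as zero."""
    out = []
    for n, x in enumerate(samples):
        delayed = samples[n - delay] if n >= delay else 0.0
        out.append(x + alpha * delayed)
    return out

def comb_magnitude(freq_hz, sampling_hz, alpha, delay):
    """Magnitude of H(z) = 1 + alpha * z**-delay evaluated on the unit circle."""
    omega = 2.0 * math.pi * freq_hz / sampling_hz
    return math.sqrt(1.0 + alpha * alpha + 2.0 * alpha * math.cos(omega * delay))
```

The magnitude swings between 1 + α and 1 − α as the frequency changes, which is the periodic coloration referred to as the effect of the comb filter.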
Meanwhile, in the case that the reproducing speed ratio is relatively low (0.5≦ reproducing speed ratio ≦0.6), that is, in the case that the time extending rate is relatively high, the effect of the comb filter is reduced to be negligible. But since the time extending rate has been set high, the problem of discontinuity at the connecting portions of the frames is made noticeable, and the waveform of the converted data includes a considerable amount of rough noises.
In the present embodiment, in the time extending process in the slow-speed reproduction, the data which has been subjected to the data converting process is subjected to an inverse filtering process in the range where the reproducing speed ratio is set close to “1”, reducing the effect of the comb filter. A transfer function of the reverse filter is given by the equation (2).
$$H(z) = \frac{1}{1 + \beta z^{-K}} \tag{2}$$
In the equation (2), β is a multiplier factor and K represents the number of the delayed samples. The multiplier factor β is determined based on the reproducing speed ratio by CPU 11.
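The equation (2) corresponds to the recursive difference equation y[n] = x[n] − β·y[n − K]. The sketch below is an illustration only; the function names are hypothetical, and the small comb filter is included solely to show that the reverse filter undoes it when β equals the comb's multiplier factor.

```python
def comb_filter(samples, alpha, delay):
    """y[n] = x[n] + alpha * x[n - delay], with zero history."""
    return [x + (alpha * samples[n - delay] if n >= delay else 0.0)
            for n, x in enumerate(samples)]

def inverse_comb_filter(samples, beta, delay):
    """y[n] = x[n] - beta * y[n - delay], i.e. H(z) = 1 / (1 + beta * z**-delay)."""
    out = []
    for n, x in enumerate(samples):
        feedback = out[n - delay] if n >= delay else 0.0
        out.append(x - beta * feedback)
    return out
```

Applying `inverse_comb_filter` with β = 0.5 to the output of `comb_filter` with α = 0.5 and the same delay returns the original samples up to rounding error. A smaller β removes only part of the comb effect.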
In the case that the reproducing speed ratio is set lower, that is, the reproducing speed ratio is set to “0.5”, the hopsize will be “2048”. Then, the discontinuity appears every 2048 samples, causing noises due to the discontinuity. The fundamental frequency of the noises is given by 44,100 Hz/2048 ≈ 21.5 Hz, and harmonics are added to this fundamental.
To negate the noises, a filter of the transfer function represented by the equation (3) is executed on the data after data conversion in the present embodiment.
$$H(z) = 1 + \gamma z^{-K} \tag{3}$$
In the equation (3), γ is a multiplier factor and K represents the number of the delayed samples. The multiplier factor γ may be fixed to 0.5 or may be determined based on the reproducing speed ratio by CPU 11. The filter represented by the equation (3) is a comb filter, which has zero points at odd multiples of approximately 21.5 Hz (44,100 Hz/2048) when the number of the delayed samples is 1024. The noises caused due to discontinuity at the connecting portions of frames can be reduced by applying the comb filter on the data. In other words, in the present embodiment, the noises caused at the discontinuities between the sample frames can be reduced by applying on the data the comb filter, which has zero points at the connecting points of sub-frames in the intermediate buffer and output buffer.
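The positions at which a filter of the form of the equation (3) attenuates most can be computed directly from the sampling frequency and the delay. The following sketch assumes γ > 0; the function names are hypothetical.

```python
import math

def comb_notch_frequencies(sampling_hz, delay, count=3):
    """First `count` frequencies where |1 + gamma * z**-delay| is smallest (gamma > 0)."""
    # The minimum occurs where z**-delay equals -1: odd multiples of fs / (2 * delay).
    return [(2 * m + 1) * sampling_hz / (2.0 * delay) for m in range(count)]

def comb_magnitude(freq_hz, sampling_hz, gamma, delay):
    """Magnitude of 1 + gamma * z**-delay on the unit circle."""
    omega = 2.0 * math.pi * freq_hz / sampling_hz
    return math.sqrt(1.0 + gamma * gamma + 2.0 * gamma * math.cos(omega * delay))
```

For a sampling frequency of 44,100 Hz and a delay of 1024 samples, the first notch falls at 44,100/2048 ≈ 21.5 Hz, and the notches repeat at intervals of 44,100/1024 ≈ 43.1 Hz. With γ = 0.5 the magnitude at a notch is 0.5 rather than exactly zero.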
After the switching process finishes at step 302 in
After the input data process finishes at step 303, CPU 11 gives the signal converting unit 18 an instruction of a data converting process. The signal converting unit 18 executes the data converting process at step 304.
The signal converting unit 18 judges at step 1002 whether or not the parameter “i” is larger than the overlapping rate. When it is determined at step 1002 that the parameter “i” is not larger than the overlapping rate (NO at step 1002), the signal converting unit 18 (input frame data producing unit 26) copies the i-th sub-frame of the input buffer to the tail sub-frame of the input frame at step 1003.
As will be described later, the sub-frames of the input frame are shifted by one sub-frame at step 1007. Therefore, when the process of step 1002 is executed repeatedly, the sub-frames In (t−3), In (t−2), In (t−1) and In (t) will be stored from the leading to the tail in the input frame as shown in
Further, the signal converting unit 18 (intermediate buffer data producing unit 28) copies the leading sub-frame of the output frame to the i-th sub-frame of the intermediate buffer at step 1004. As will be described later, the sub-frames of the output frame are shifted by one sub-frame at step 1008. Therefore, data corresponding to the i-th sub-frame of the intermediate buffer is disposed to the leading sub-frame of the output frame at step 1004.
After the sub-frame copying process at step 1004, the signal converting unit 18 (frame-process calculating unit 30) applies the hanning window on the input frame at step 1005. Then, the signal converting unit 18 adds a data value of the hanning windowed input frame and data values corresponding to the second and the following sub-frames of the output frame at step 1006. The hanning window 1202 is applied onto the input frame 1200 at step 1005, as shown at (a) in
After the process finishes at step 1006, the signal converting unit 18 (input frame data producing unit 26) shifts data values of the input frame by one sub-frame in the left-hand direction at step 1007, whereby the k-th (k=2 to 4) sub-frame at step 1006 will become the (k-1)-th sub-frame at step 1007, respectively. The first sub-frame at step 1006 will be set aside. The sub-frame of the input buffer is copied onto the fourth sub-frame of the input frame at step 1003 in the next cycle. But at step 1006, a data value of “0” is stored in the fourth sub-frame of the input frame.
The signal converting unit 18 (intermediate buffer data producing unit 28) shifts data values of the output frame by the first sub-frame of the output frame in the left-hand direction at step 1008. In the output frame 1205, which has been processed at step 1008 as shown at (c) in
After the process finishes at step 1008, the signal converting unit 18 increments the parameter “i” at step 1009, and returns to step 1002.
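The loop of steps 1002 to 1009 can be sketched as follows. This is an illustrative reading of the steps rather than the embodiment's code: the function name `convert_block`, the use of Python lists for the buffers, the 0-based parameter `i`, and applying the window to a copy of the input frame (so that the shifted data stays unwindowed) are assumptions.

```python
import math

def hann(n):
    """Periodic Hann ("hanning") window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def convert_block(input_buffer, input_frame, output_frame, in_sub, out_sub, overlap_rate):
    """Process one input buffer; returns the intermediate buffer contents.

    input_frame (overlap_rate * in_sub samples) and output_frame
    (out_sub + len(input_frame) samples) are updated in place and keep
    their state between calls.
    """
    window = hann(len(input_frame))
    intermediate = []
    for i in range(overlap_rate):                       # steps 1002 and 1009
        # Step 1003: i-th sub-frame of the input buffer -> tail sub-frame of the input frame.
        input_frame[-in_sub:] = input_buffer[i * in_sub:(i + 1) * in_sub]
        # Step 1004: leading sub-frame of the output frame -> i-th sub-frame of the intermediate buffer.
        intermediate.extend(output_frame[:out_sub])
        # Steps 1005 and 1006: window the input frame and add it after the leading sub-frame.
        for k, (x, w) in enumerate(zip(input_frame, window)):
            output_frame[out_sub + k] += x * w
        # Step 1007: shift the input frame left by one sub-frame, filling the tail with zeros.
        input_frame[:] = input_frame[in_sub:] + [0.0] * in_sub
        # Step 1008: shift the output frame left by its leading sub-frame, filling the tail with zeros.
        output_frame[:] = output_frame[out_sub:] + [0.0] * out_sub
    return intermediate
```

A caller would allocate `input_frame` and `output_frame` once, filled with zeros, and call `convert_block` for each new input buffer. Under these assumptions the first call returns silence, since the output lags the input by one buffer.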
In the data converting process shown in
With reference to
After the data converting process finishes at step 304 in
As shown in the table of
When the filtering processing unit 32 has received the instruction of executing the filtering process at step 1501 in
When it is determined at step 1502 that the second filter should be applied (NO at step 1502), the filtering processing unit 32 reads a data value of the intermediate buffer 31 at step 1506, and uses the filter multiplier factor provided from CPU 11, executing on the read data value a filter corresponding to the comb filter represented by the equation (3) at step 1507. Then, the filtering processing unit 32 stores at step 1505 in the output buffer 33 the data value which has been subjected to the filter at step 1507.
When the filtering processing unit 32 does not receive the instruction of executing the filtering process at step 1501 in
After the filtering process finishes at step 305 in
In the present embodiment, in the case that the time extending process should be executed, that is, in the case of the slow-speed reproduction, when the reproducing speed ratio falls within the first range close to “1”, that is, when the time extending rate in the time extending process is relatively low, the filtering processing unit 32 applies the reverse filter of the comb filter on the data to reduce the effect of the comb filter which is caused in the output data during the time extending process, whereby the slow-speed reproduction is realized with simple processing and without deterioration in sound quality.
In the present embodiment, CPU 11 determines based on the reproducing speed ratio the multiplier factor β in the transfer function of the reverse filter $H(z) = 1/(1 + \beta z^{-K})$, where β is the multiplier factor and K represents the number of the delayed samples, and transfers the determined multiplier factor β to the filtering processing unit 32, whereby a degree at which the reverse filter is applied on the data can be adjusted in accordance with the set reproducing speed ratio.
Particularly in the present embodiment, CPU 11 determines the multiplier factor β such that as the reproducing speed ratio becomes lower, the multiplier factor β in the transfer function of the reverse filter becomes lower, in other words, such that as the time extending rate in the time extending process becomes larger, the effect of the comb filter on the output data is reduced. Therefore, the reverse filter can be applied on the output data in accordance with a level of the effect of the comb filter.
Further, in the present embodiment, when the reproducing speed ratio falls within the second range, which is lower than the first range, the filtering processing unit 32 applies the second filter, that is, the comb filter, on the output data. When the time extending rate in the time extending process is relatively large, the filtering processing unit 32 applies the comb filter on the output data to reduce noises caused due to the discontinuity at the connecting portions of frames. Therefore, even in the case of a low reproducing speed ratio (or large time extending rate), output data with noises reduced can be obtained.
In the present embodiment, the comb filter has zero points at connecting points of sub-frames having hopsize in the intermediate buffer and output buffer. Therefore, noises at the connecting points can be reduced properly.
In the present embodiment, CPU 11 determines based on the reproducing speed ratio the multiplier factor γ in the transfer function of the comb filter $H(z) = 1 + \gamma z^{-K}$, where γ is the multiplier factor and K represents the number of the delayed samples, and transfers the determined multiplier factor γ to the filtering processing unit 32, whereby a degree at which the comb filter is applied on the data can be adjusted in accordance with the set reproducing speed ratio.
Particularly in the present embodiment, CPU 11 determines the multiplier factor γ such that as the reproducing speed ratio becomes lower, the multiplier factor γ in the transfer function of the comb filter becomes larger. As the reproducing speed ratio becomes lower, or as the time extending rate becomes larger in the time extending process, levels of noises at connecting points of frames increase. Therefore, the effect of the comb filter can be applied on the output data in accordance with the levels of the noises.
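The selection of the filter and its multiplier factor from the reproducing speed ratio can be sketched as below. Only the two ranges (0.8 to 1.0 and 0.5 to 0.6), the directions in which β and γ change, and the value 0.5 for γ come from the description; the function name `choose_slow_speed_filter`, the linear mappings, the limits `beta_max` and `gamma_min`, and the handling of ratios outside the two ranges are hypothetical.

```python
def choose_slow_speed_filter(speed_ratio, beta_max=0.5, gamma_min=0.25, gamma_max=0.5):
    """Return (kind, coefficient); kind is "inverse", "comb", or None."""
    if 0.8 <= speed_ratio < 1.0:
        # First range: reverse filter; beta becomes lower as the ratio becomes lower.
        beta = beta_max * (speed_ratio - 0.8) / 0.2      # linear mapping: assumption
        return "inverse", beta
    if 0.5 <= speed_ratio <= 0.6:
        # Second range: comb filter; gamma becomes larger as the ratio becomes lower.
        gamma = gamma_min + (gamma_max - gamma_min) * (0.6 - speed_ratio) / 0.1
        return "comb", gamma
    # Ratios outside both ranges are not covered by this sketch.
    return None, 0.0
```

For example, a ratio of 0.9 selects the reverse filter with β of about 0.25 under these assumed limits, and a ratio of 0.5 selects the comb filter with γ of about 0.5.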
Although specific embodiments of the present invention have been described in the foregoing detailed description, it will be understood that the invention is not limited to the particular embodiments described herein, and that numerous rearrangements, modifications, and substitutions may be made to the embodiments of the invention and fall within the scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2009-191570 | Aug 2009 | JP | national |
2009-191571 | Aug 2009 | JP | national |
Number | Name | Date | Kind |
---|---|---|---|
5991724 | Kojima et al. | Nov 1999 | A |
6484137 | Taniguchi et al. | Nov 2002 | B1 |
6675141 | Inoue et al. | Jan 2004 | B1 |
7464028 | Singhal | Dec 2008 | B2 |
7526351 | He et al. | Apr 2009 | B2 |
7672835 | Setoguchi | Mar 2010 | B2 |
7957960 | Chen | Jun 2011 | B2 |
8078456 | Chen et al. | Dec 2011 | B2 |
20060143000 | Setoguchi | Jun 2006 | A1 |
20060221788 | Lindahl et al. | Oct 2006 | A1 |
20070078662 | Sakurai et al. | Apr 2007 | A1 |
20070168188 | Choi | Jul 2007 | A1 |
20070276657 | Gournay et al. | Nov 2007 | A1 |
20080162151 | Cho | Jul 2008 | A1 |
20080262856 | Megeid et al. | Oct 2008 | A1 |
Number | Date | Country |
---|---|---|
5-143077 | Jun 1993 | JP |
7-77999 | Mar 1995 | JP |
10124082 | May 1998 | JP |
2002-175099 | Jun 2002 | JP |
2005266571 | Sep 2005 | JP |
Entry |
---|
Japanese Office Action for Japanese Patent Application Serial No. 2009-191571 mailed on Jun. 14, 2011. |
Ogawa, Digital Filter Recipe for Processing Audio Digital Signal by means of Software, C Magazine, vol. 14, No. 3, Softbank Publishing Co. Ltd., Mar. 1, 2002, pp. 13-41. |
Japanese Office Action for Japanese Patent Application Serial No. 2009-191570 mailed on Jun. 14, 2011. |
Number | Date | Country | |
---|---|---|---|
20110046967 A1 | Feb 2011 | US |