This Nonprovisional application claims priority under 35 U.S.C. §119(a) on patent Application No. 2004-181881 filed in Japan on Jun. 18, 2004, the entire contents of which are hereby incorporated by reference.
1. Field of the Invention
The present invention relates to an audio signal processing method, an audio signal processing apparatus, and an audio signal processing system for increasing or decreasing a predetermined non-harmonic structured spectral component contained in an audio signal, as well as to a computer program product for causing a computer to increase or decrease a predetermined non-harmonic structured spectral component contained in an audio signal.
2. Description of Related Art
Graphic equalizers are widely used as means for adjusting an audio signal such as music outputted from a speaker (e.g., Japanese Patent Application Laid-Open No. 5-175773 (1993)). When a graphic equalizer is used, an audio signal reproduced from a CD (compact disc) or the like can be frequency-analyzed, and then the spectra of specific frequency ranges can be increased or decreased. Thus, when a bass drum sound contained in an audio signal outputted from a speaker is to be emphasized, the spectrum of a low frequency range may be increased.
Nevertheless, in many cases, a plurality of musical instruments are used in a musical performance, and hence a plurality of instrumental sounds are contained in the audio signal. Thus, when the spectrum of a specific frequency range of the audio signal is increased or decreased, all the instrumental sounds having a spectrum in that frequency range are increased or decreased together. For example, when the spectrum of a low frequency range is increased for the purpose of emphasizing a bass drum, the bass drum sound is increased, and so are other instrumental sounds, such as a bass guitar sound, that have a spectrum in the low frequency range targeted for the increase.
As such, a graphic equalizer increases and decreases the spectra of specific frequency ranges of an audio signal, and hence all the instrumental sounds that have a spectrum in the targeted frequency range are increased or decreased alike. This has caused a problem that a specific instrumental sound cannot be solely increased or decreased without an influence on the other instrumental sounds; for example, a bass drum sound cannot be solely increased or decreased without an influence on a bass guitar sound.
The present invention has been devised in consideration of such a situation. An object of the invention is to provide an audio signal processing method, an audio signal processing apparatus, and a computer program product for extracting a predetermined non-harmonic structured spectral component contained in an audio signal and then increasing or decreasing the spectral component, so as to allow the predetermined spectral component contained in the audio signal to be independently increased or decreased without an influence on the other spectral components.
Another object of the invention is to provide an audio signal processing method, an audio signal processing apparatus, and a computer program product for calculating the spectrum of an audio signal by frequency analysis so as to allow a non-harmonic structured sound such as a drum sound to be extracted from the audio signal on the basis of the spectrum distribution.
Another object of the invention is to provide an audio signal processing method, an audio signal processing apparatus, and a computer program product for adapting a spectral component of a template in such a manner that the difference between an extracted spectral component and the spectral component of the template goes below or at a predetermined value, so as to improve the accuracy in the extraction of a non-harmonic structured sound such as a drum sound.
Another object of the invention is to provide an audio signal processing method, an audio signal processing apparatus, and a computer program product for selecting a predetermined number of extracted spectral components in ascending order of difference between the spectral component and a spectral component of a template and then updating the spectral component of the template into the median of the predetermined number of selected spectral components so as to permit the acquisition of a template in which the spectra of spectral components not having a non-harmonic structure are suppressed.
Another object of the invention is to provide an audio signal processing method, an audio signal processing apparatus, and a computer program product for quantizing an extracted spectral component and a spectral component of a template in the initial adaptation for the spectral component of the template, so as to suppress the erroneous calculation of a large difference value even though the two components are alike.
Another object of the invention is to provide an audio signal processing method, an audio signal processing apparatus, and a computer program product for increasing or decreasing an extracted predetermined spectral component in response to a received amount of increase or decrease so as to allow the power of the extracted predetermined spectral component to be adjusted independently of the power of the audio signal.
Another object of the invention is to provide an audio signal processing method, an audio signal processing apparatus, and a computer program product for causing the process of extracting a predetermined non-harmonic structured spectral component and the process of increasing or decreasing the spectral component to be performed in different apparatuses from each other, so as to allow the load to be distributed efficiently.
An audio signal processing method according to the first invention is characterized by comprising steps of extracting a predetermined non-harmonic structured spectral component contained in an audio signal; and increasing or decreasing the extracted predetermined spectral component.
An audio signal processing method according to the second invention is based on the first invention, and characterized by further comprising a step of calculating a spectrum of the audio signal by frequency analysis, wherein, in the step of extracting the predetermined non-harmonic structured spectral component, a spectrum is extracted that corresponds to the predetermined non-harmonic structured spectral component.
An audio signal processing method according to the third invention is based on the first invention, and characterized in that the step of extracting the predetermined non-harmonic structured spectral component is performed with reference to a spectral component of a template stored in advance, and the method further comprises a step of adapting the spectral component of the template in such a manner that a difference between the extracted spectral component and the spectral component of the template goes below or at a predetermined value.
An audio signal processing method according to the fourth invention is an audio signal processing method for extracting, with reference to a spectral component of a template stored in advance, a predetermined non-harmonic structured spectral component contained in an audio signal, and is characterized by comprising a step of adapting the spectral component of the template in such a manner that a difference between the extracted spectral component and the spectral component of the template goes below or at a predetermined value.
An audio signal processing method according to the fifth invention is based on the third or fourth invention, and is characterized in that the adapting step further comprises steps of calculating a difference between each extracted spectral component and the spectral component of the template in case that a plurality of spectral components have been extracted; selecting a predetermined number of spectral components in ascending order of the calculated difference; and updating the spectral component of the template into a median of the predetermined number of selected spectral components.
An audio signal processing method according to the sixth invention is based on the fifth invention, and characterized by further comprising a step of quantizing the extracted spectral components and the spectral component of the template in an initial adaptation for the spectral component of the template, wherein, in the step of calculating a difference, a difference is calculated between each extracted spectral component and the spectral component of the template which have been quantized.
An audio signal processing method according to the seventh invention is based on the first or fourth invention, and characterized by further comprising a step of receiving an amount of increase or decrease for the predetermined spectral component, wherein, in the increasing or decreasing step, the extracted predetermined spectral component is increased or decreased in response to the received amount of increase or decrease.
An audio signal processing method according to the eighth invention is characterized by comprising steps of extracting a predetermined non-harmonic structured spectral component contained in an audio signal; outputting onset time information of the extraction of the predetermined non-harmonic structured spectral component from the audio signal, the predetermined spectral component, and the audio signal; receiving the outputted onset time information, the predetermined spectral component, and the audio signal; and increasing or decreasing the received spectral component contained in the received audio signal, on the basis of the received onset time information.
An audio signal processing apparatus according to the ninth invention is characterized by comprising: extracting means for extracting a predetermined non-harmonic structured spectral component contained in an audio signal; and increasing and decreasing means for increasing or decreasing the predetermined spectral component extracted by the extracting means.
An audio signal processing apparatus according to the tenth invention is based on the ninth invention, and characterized by further comprising calculating means for calculating a spectrum of the audio signal by frequency analysis, wherein the extracting means extracts a spectrum corresponding to the predetermined non-harmonic structured spectral component.
An audio signal processing apparatus according to the eleventh invention is based on the tenth invention, and characterized in that the extraction of a predetermined non-harmonic structured spectral component is performed with reference to a spectral component of a template stored in a storage unit in advance, and the apparatus further comprises adapting means for adapting the spectral component of the template in such a manner that a difference between the extracted spectral component and the spectral component of the template goes below or at a predetermined value.
An audio signal processing apparatus according to the twelfth invention is an audio signal processing apparatus for extracting, with reference to a spectral component of a template stored in a storage unit in advance, a predetermined non-harmonic structured spectral component contained in an audio signal, and characterized by comprising adapting means for adapting the spectral component of the template in such a manner that a difference between the extracted spectral component and the spectral component of the template goes below or at a predetermined value.
An audio signal processing apparatus according to the thirteenth invention is based on the eleventh or twelfth invention, and characterized in that the adapting means further comprises: subtracting means for calculating a difference between each extracted spectral component and the spectral component of the template in case that a plurality of spectral components have been extracted; selecting means for selecting a predetermined number of spectral components in ascending order of the difference calculated by the subtracting means; and updating means for updating the spectral component of the template into a median of the predetermined number of spectral components selected by the selecting means.
An audio signal processing apparatus according to the fourteenth invention is based on the thirteenth invention, and characterized by further comprising quantizing means for quantizing the extracted spectral components and the spectral component of the template in an initial adaptation for the spectral component of the template, wherein the subtracting means calculates a difference between each extracted spectral component and the spectral component of the template which have been quantized by the quantizing means.
An audio signal processing apparatus according to the fifteenth invention is based on the ninth or twelfth invention, and characterized by further comprising receiving means for receiving an amount of increase or decrease for the predetermined spectral component, wherein the increasing and decreasing means increases or decreases the extracted predetermined spectral component in response to the amount of increase or decrease received by the receiving means.
An audio signal processing system according to the sixteenth invention is characterized by including: a first audio signal processing apparatus comprising: extracting means for extracting a predetermined non-harmonic structured spectral component contained in an audio signal; and outputting means for outputting onset time information of the extraction of the predetermined non-harmonic structured spectral component from the audio signal by the extracting means, the predetermined spectral component, and the audio signal; and a second audio signal processing apparatus comprising: receiving means for receiving the onset time information, the predetermined spectral component, and the audio signal outputted from the first audio signal processing apparatus; and increasing and decreasing means for increasing or decreasing the received spectral component contained in the received audio signal, on the basis of the onset time information received by the receiving means.
An audio signal processing apparatus according to the seventeenth invention is characterized by comprising: extracting means for extracting a predetermined non-harmonic structured spectral component contained in an audio signal; and outputting means for outputting onset time information of the extraction of the predetermined non-harmonic structured spectral component from the audio signal by the extracting means, the predetermined spectral component, and the audio signal.
An audio signal processing apparatus according to the eighteenth invention is characterized by comprising: receiving means for receiving onset time information of the extraction of a predetermined non-harmonic structured spectral component from an audio signal, the predetermined spectral component, and the audio signal; and increasing and decreasing means for increasing or decreasing the received spectral component contained in the received audio signal, on the basis of the onset time information received by the receiving means.
A computer program product according to the nineteenth invention is a computer program product for causing a computer to process an audio signal, wherein the computer program product comprises a computer readable storage medium having computer readable program code means embodied in the medium, and characterized in that the computer readable program code means comprises instructions for: extracting a predetermined non-harmonic structured spectral component contained in an audio signal; and increasing or decreasing the extracted predetermined spectral component.
A computer program product according to the twentieth invention is based on the nineteenth invention, and characterized in that the computer readable program code means further comprises an instruction for calculating a spectrum of the audio signal by frequency analysis, and the extracting instruction causes the computer to extract a spectrum corresponding to the predetermined non-harmonic structured spectral component.
A computer program product according to the twenty-first invention is based on the twentieth invention, and characterized in that the instruction for extracting a predetermined non-harmonic structured spectral component is executed with reference to a spectral component of a template stored in advance, and the computer readable program code means further comprises an instruction for adapting the spectral component of the template in such a manner that a difference between the extracted spectral component and the spectral component of the template goes below or at a predetermined value.
A computer program product according to the twenty-second invention is a computer program product for causing a computer to extract, with reference to a spectral component of a template stored in a memory in advance, a predetermined non-harmonic structured spectral component contained in an audio signal, and characterized in that the computer program product comprises a computer readable storage medium having computer readable program code means embodied in the medium, and characterized in that the computer readable program code means comprises an instruction for adapting the spectral component of the template in such a manner that a difference between the extracted spectral component and the spectral component of the template goes below or at a predetermined value.
A computer program product according to the twenty-third invention is based on the twenty-first or twenty-second invention, and characterized in that, in the adapting instruction, the computer readable program code means further comprises instructions for: calculating a difference between each extracted spectral component and the spectral component of the template in case that a plurality of spectral components have been extracted; selecting a predetermined number of spectral components in ascending order of the calculated difference; and updating the spectral component of the template into a median of the predetermined number of selected spectral components.
A computer program product according to the twenty-fourth invention is based on the twenty-third invention, and characterized in that the computer readable program code means further comprises an instruction for quantizing the extracted spectral components and the spectral component of the template in an initial adaptation for the spectral component of the template; and the instruction for calculating a difference causes the computer to calculate a difference between each extracted spectral component and the spectral component of the template which have been quantized.
A computer program product according to the twenty-fifth invention is based on the nineteenth or twenty-second invention, and characterized in that the computer readable program code means further comprises an instruction for receiving an amount of increase or decrease for the predetermined spectral component; and the increasing or decreasing instruction causes the computer to increase or decrease the extracted predetermined spectral component in response to the received amount of increase or decrease.
A computer program product according to the twenty-sixth invention is a computer program product for causing a computer to process an audio signal, wherein the computer program product comprises a computer readable storage medium having computer readable program code means embodied in the medium, and characterized in that the computer readable program code means comprises instructions for: extracting a predetermined non-harmonic structured spectral component contained in an audio signal; and outputting onset time information of the extraction of the predetermined non-harmonic structured spectral component from the audio signal, the predetermined spectral component, and the audio signal.
A computer program product according to the twenty-seventh invention is a computer program product for causing a computer to process an audio signal, wherein the computer program product comprises a computer readable storage medium having computer readable program code means embodied in the medium, and the computer readable program code means comprises instructions for: receiving onset time information of the extraction of a predetermined non-harmonic structured spectral component from an audio signal, the predetermined spectral component, and the audio signal; and increasing or decreasing the received spectral component contained in the received audio signal, on the basis of the received onset time information.
In the first, ninth and nineteenth inventions, a predetermined non-harmonic structured spectral component contained in an audio signal is extracted. An example of the non-harmonic structured tone is a sound of a percussion instrument such as a drum. Then, in the audio signal, the extracted predetermined spectral component is increased or decreased. For example, when the extracted spectral component of a drum is increased, the drum sound is emphasized. On the contrary, when the extracted spectral component of a drum is decreased, the drum sound is cancelled. As such, a predetermined spectral component contained in an audio signal is solely extracted and can be independently increased or decreased without an influence on the other spectral components.
In the second, tenth and twentieth inventions, the spectrum of an audio signal is calculated by frequency analysis. The sound of a percussion instrument such as a drum is of non-harmonic structure, and has slight or no harmonic structure. The sounds of other types of musical instruments have a harmonic structure. Thus, on the basis of the spectrum distribution, the non-harmonic structured sound of a percussion instrument such as a drum can be discriminated from the harmonic structured sounds of other types of musical instruments. That is, the non-harmonic structured sound of a percussion instrument such as a drum can be extracted from the audio signal on the basis of the spectrum distribution.
In the third, fourth, eleventh, twelfth, twenty-first and twenty-second inventions, the extraction of a predetermined non-harmonic structured spectral component is performed on the basis of a spectral component of a template stored in advance. For example, when a drum sound is to be extracted, a template of a drum sound is stored in a storage unit in advance. Nevertheless, it is extremely rare that the drum sound contained in an audio signal agrees completely with the drum sound of the template stored in advance. These sounds usually differ from each other more or less. Thus, the spectral component of the template is adapted in such a manner that the difference between the extracted spectral component and the spectral component of the template goes below or at a predetermined value. This ensures that the drum sound contained in the audio signal agrees approximately with the drum sound of the template stored in advance. This improves the accuracy in the extraction of the drum sound, and hence permits accurate increase or decrease of the extracted drum sound. Further, this approach allows various drum sounds to be extracted on the basis of a single template.
In the fifth, thirteenth and twenty-third inventions, in case that a plurality of spectral components have been extracted, the difference between each extracted spectral component and a spectral component of a template is calculated. Then, a predetermined number of spectral components are selected in ascending order of the calculated difference. The spectral component of the template is then updated into the median of the predetermined number of selected spectral components, so that the template is adapted. The spectral structure of a non-harmonic structured spectral component usually appears in the same position of the selected spectral components. In contrast, the spectral structure of a harmonic structured spectral component seldom appears in the same position of the selected spectral components. Thus, when the median is used, the spectral structure of the non-harmonic structured spectral component is expected to be retained, whereas harmonic structured musical instrumental sounds other than the sound of a percussion instrument such as a drum are seldom retained. As a result, the spectra of spectral components not having a non-harmonic structure are suppressed.
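The select-then-median update described above can be illustrated with a small sketch. It is not the implementation of the invention: the function name `update_template`, the use of the Euclidean distance as the "difference", and the toy 2×2 matrices are all assumptions made for illustration.

```python
import numpy as np

def update_template(template, segments, n_select):
    """Return a new template: the element-wise median of the n_select
    segments closest to the current template."""
    template = np.asarray(template, dtype=float)   # shape (T, F)
    segments = np.asarray(segments, dtype=float)   # shape (N, T, F)
    # Assumption: the "difference" is the Euclidean distance between matrices.
    distances = np.sqrt(((segments - template) ** 2).sum(axis=(1, 2)))
    # Select segments in ascending order of difference.
    nearest = np.argsort(distances)[:n_select]
    # The median keeps structure shared by the selected segments and
    # drops peaks that appear in only a minority of them.
    return np.median(segments[nearest], axis=0)

# Toy data: three drum-like segments (two carry a stray peak at
# different positions) and one unrelated segment.
template = [[1, 0], [1, 0]]
segments = [
    [[1, 0], [1, 0]],
    [[1, 5], [1, 0]],
    [[1, 0], [1, 4]],
    [[9, 9], [9, 9]],
]
updated = update_template(template, segments, n_select=3)
print(updated)
```

In this toy case the stray peaks (5 and 4) sit at different positions in the selected segments, so the median removes both, while the component common to all three selected segments is retained.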
In the sixth, fourteenth and twenty-fourth inventions, extracted spectral components and a spectral component of a template are quantized in the initial adaptation for the spectral component of the template, and then the difference is calculated between each extracted spectral component and the spectral component of the template which have been quantized. Before the template has been adapted, it is extremely rare that a drum sound, for example, contained in an audio signal agrees completely with a template drum sound, and hence a large difference could be erroneously calculated even though the two sounds are alike. In contrast, when the extracted spectral components and the spectral component of the template are quantized, and when a representative value such as the median is used in the difference calculation, the erroneous calculation of a large difference for two sounds that are alike is suppressed.
In the seventh, fifteenth and twenty-fifth inventions, an amount of increase or decrease for a predetermined spectral component is received, and then the extracted predetermined spectral component is increased or decreased in response to the received amount of increase or decrease. For example, an increase and decrease knob similar to a volume control knob for the power of the audio signal may be used for inputting the amount of increase or decrease. A user adjusts the increase and decrease knob so as to vary the power of the extracted predetermined spectral component independently of the power of the audio signal.
In the eighth, sixteenth, seventeenth, eighteenth, twenty-sixth and twenty-seventh inventions, in a first audio signal processing apparatus, a predetermined non-harmonic structured spectral component contained in an audio signal is extracted. Then, outputted are onset time information of the extraction of the predetermined non-harmonic structured spectral component from the audio signal, the predetermined spectral component, and the audio signal. These outputs may be recorded in a recording medium or transmitted through a communication network. In a second audio signal processing apparatus, the onset time information, the predetermined spectral component, and the audio signal which have been outputted are received. Then, the received spectral component contained in the received audio signal is increased or decreased on the basis of the received onset time information. Various types of information described here may be received in the form of a recording medium or through a communication network. The extraction of a predetermined non-harmonic structured spectral component is a task of heavy load, and hence is desired to be carried out by a high performance computer or the like. In contrast, the increasing or decreasing of a predetermined spectral component is a task of light load, and hence may be carried out by a general audio device or the like. As such, according to the invention, the load is efficiently distributed so that even an audio device of low performance can increase or decrease the predetermined non-harmonic structured spectral component.
According to the first, ninth and nineteenth inventions, a predetermined spectral component contained in an audio signal can be independently increased or decreased without an influence on the other spectral components.
According to the second, tenth and twentieth inventions, a non-harmonic structured sound such as a drum sound can be extracted from an audio signal on the basis of the spectrum distribution.
According to the third, fourth, eleventh, twelfth, twenty-first and twenty-second inventions, the accuracy is improved in the extraction of a non-harmonic structured sound such as a drum sound. This permits accurate increase or decrease of the extracted drum sound. Further, the invention allows various non-harmonic structured sounds such as various drum sounds to be extracted on the basis of a single template.
According to the fifth, thirteenth and twenty-third inventions, a template is obtained in which the spectra of spectral components not having a non-harmonic structure are suppressed.
According to the sixth, fourteenth and twenty-fourth inventions, the erroneous calculation of a large difference is suppressed in the case where an extracted spectral component and a spectral component of a template are alike.
According to the seventh, fifteenth and twenty-fifth inventions, the power of an extracted predetermined spectral component can be adjusted independently of the power of the audio signal.
According to the eighth, sixteenth, seventeenth, eighteenth, twenty-sixth and twenty-seventh inventions, the process of extracting a predetermined non-harmonic structured spectral component and the process of increasing or decreasing the spectral component are carried out by different apparatuses from each other. Thus, the load is efficiently distributed so that even a general audio device or the like can increase or decrease a predetermined non-harmonic structured spectral component.
The above and further objects and features of the invention will more fully be apparent from the following detailed description with accompanying drawings.
The invention is described below in detail with reference to drawings showing its embodiments.
The CPU 11 controls the system components 12 through 17 described above. The CPU 11 causes the RAM 12 to store programs and data received through the input unit 15 or the communication unit 17, programs and data read out from a recording medium by the HDD 13 or the external storage unit 14 and the like. Further, the CPU 11 performs various processing such as the execution of the programs stored in the RAM 12 and arithmetic operations on the stored data, and causes the RAM 12 to store the results of the various processing as well as temporary data used in the various processing. The data such as operation results temporarily stored in the RAM 12 is transferred to the HDD 13 and outputted through the display unit 16 or the communication unit 17 under the control of the CPU 11.
The HDD 13 stores an audio signal (sound data) received from the outside by the computer 10. The computer 10 extracts a non-harmonic structured sound (spectral component), such as the sound of a drum or other percussion instrument, contained in the audio signal, and then increases or decreases the extracted sound. The amount of increase or decrease of the extracted sound is received through the input unit (receiving means) 15. The non-harmonic structured sound is a sound having almost no harmonic structure. However, the sound may contain a very weak harmonic structure negligible in comparison with general musical instrumental sounds having a harmonic structure.
The CPU 11 serves as means (calculating means) for calculating the power spectrum P(t, f) of an audio signal at a frame t and frequency f. In an example, the audio signal is sampled at 44.1 kHz. Then, an STFT (Short-Time Fourier Transform) is calculated using a Hanning window having a window width of 4096 points (a frequency resolution of 10.8 Hz) and a window shift length of 441 points (a time resolution of 10 ms), so that the power spectrum P(t, f) is obtained.
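The power spectrum calculation with the example parameters above (44.1 kHz sampling, 4096-point Hanning window, 441-point shift) can be sketched as follows. This is a minimal illustration using NumPy; the function name `power_spectrogram` and the 1 kHz test tone are assumptions. Note that `numpy.fft.rfft` returns 2049 bins for a 4096-point frame; how these map onto a 2048-bin frequency axis is not specified here and is left open.

```python
import numpy as np

SAMPLE_RATE = 44100   # Hz, from the example in the text
WIN_LEN = 4096        # window width in points
HOP = 441             # window shift in points (10 ms at 44.1 kHz)

def power_spectrogram(signal, win_len=WIN_LEN, hop=HOP):
    """Return P[t, f], the power spectrum at frame t and frequency bin f."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < win_len:
        # Zero-pad short inputs so that at least one frame exists.
        signal = np.pad(signal, (0, win_len - len(signal)))
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack(
        [signal[t * hop : t * hop + win_len] * window for t in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

# One second of a 1 kHz sine as a stand-in for real audio.
time_axis = np.arange(SAMPLE_RATE) / SAMPLE_RATE
P = power_spectrogram(np.sin(2 * np.pi * 1000.0 * time_axis))
bin_width_hz = SAMPLE_RATE / WIN_LEN   # about 10.77 Hz per bin
peak_bin = int(np.argmax(P[0]))        # bin nearest to 1000 Hz
print(P.shape, round(bin_width_hz, 2), peak_bin)
```

The bin spacing of 44100/4096 ≈ 10.77 Hz corresponds to the frequency resolution of about 10.8 Hz stated above, and the 441-point shift gives one frame per 10 ms.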
The CPU 11 serves also as means for detecting an onset time candidate oi of a drum. The onset time candidate oi of the drum is detected, for example, as a time (frame) where the power spectrum rises steeply. In case that the differential Q(t, f)={∂P(t, f)/∂t} of P(t, f) with respect to time (frame) satisfies Q(t, f)>0 in three successive frames in the time direction (t=a−1, a, a+1), the CPU 11 calculates the differential Q(a, f) at frame a. On the contrary, in case that Q(t, f)>0 is not satisfied in the three successive frames, the CPU 11 sets Q(a, f)=0. Then, at each frame t, the CPU 11 multiplies Q(t, f) by a low pass filter function F(f) based on the typical frequency characteristics of a drum, and calculates a sum S(t) in the frequency direction according to the following equation.
S(t) = Σf F(f)·Q(t, f)
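The calculation of Q(t, f) and the sum S(t) described above can be sketched as follows. This is an illustration under stated assumptions: the time differential is approximated by a backward difference between adjacent frames, the function name `onset_strength` is hypothetical, and the weights passed as `lowpass` merely stand in for the filter function F(f), whose actual shape is not given here.

```python
import numpy as np

def onset_strength(P, lowpass):
    """Return S[t] = sum over f of lowpass[f] * Q[t, f].

    Q[a, f] is kept only where the power rises at all three successive
    frames a-1, a, a+1; otherwise Q[a, f] is set to 0.
    """
    P = np.asarray(P, dtype=float)
    # Assumption: the differential is a backward difference P[t] - P[t-1].
    dP = np.zeros_like(P)
    dP[1:] = P[1:] - P[:-1]
    rising = dP > 0
    keep = np.zeros_like(rising)
    keep[1:-1] = rising[:-2] & rising[1:-1] & rising[2:]
    Q = np.where(keep, dP, 0.0)
    # Weighted sum in the frequency direction.
    return Q @ np.asarray(lowpass, dtype=float)

# Toy spectrum with 6 frames and 2 bins: bin 0 rises steadily for three
# frames, bin 1 only flutters up and down.
P_toy = np.array([[0, 5], [1, 4], [3, 5], [6, 4], [6, 5], [6, 4]])
S = onset_strength(P_toy, lowpass=[1.0, 0.5])
print(S)
```

Only frame 2 of bin 0 has a positive differential at itself and at both neighbouring frames, so it is the only frame contributing to S(t); the fluttering bin contributes nothing.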
The HDD (storage unit) 13 stores a seed template Ts created on the basis of a single tone signal of a drum. The seed template Ts is a power spectrum having a predetermined time length and acquired by STFT starting at an onset time. The seed template Ts is in the form of a matrix the row of which corresponds to time and the column of which corresponds to frequency. Each component is specified as a seed template Ts(t, f) (where 1≦t≦15 and 1≦f≦2048).
The CPU 11 serves as means (adapting means) for adapting the seed template Ts to an audio signal of the target of analysis. The CPU 11 updates the seed template Ts as described later, and repeats the update of the template after that. The template having undergone the g-th update is expressed by Tg. Since the seed template Ts is the initially inputted (g=0) template, T0=Ts. The CPU 11 serves as means (calculating means) for extracting a spectrum segment Pi (i=1, . . . , N, where N is the total number of detected onset time candidates) which is a power spectrum having a predetermined time length and starting at an onset time candidate oi (ms) detected from the audio signal of the target of analysis. The spectrum segment Pi is a matrix having the same size as the template Tg.
The extraction of the spectrum segment is carried out as described above. Nevertheless, the time resolution of 10 ms is not sufficient for the template to be adapted accurately. Thus, a correction process is preferably performed on the onset time candidate oi. In an example, the CPU 11 serves as means for correcting the onset time candidate oi (ms) into oi′ (ms), and then extracts a spectrum segment Pi for the corrected onset time candidate oi′ (ms). For example, in case that a spectrum segment starting at oi′=oi−5 ms or oi+5 ms has better quality than the spectrum segment starting at oi (ms), the CPU 11 adopts as the spectrum segment Pi the power spectrum starting at time oi′ (ms).
In an example, the CPU 11 extracts a spectrum segment Pi,j starting at time oi+j (ms) (where j=−5 ms, 0 ms and 5 ms). Then, the CPU 11 calculates the correlation value Corr(j) between the template Tg′ and the spectrum segment Pi,j according to the following equation.
The CPU 11 then acquires an offset value J maximizing the correlation value Corr(j), and determines the Pi,j with the obtained offset value J to be Pi.
The CPU 11 further calculates a template Tg′ and a spectrum segment Pi′ which are generated by multiplying the template Tg and the spectrum segment Pi respectively by the low pass filter function F(f) according to the following equations.
Tg′(t,f)=F(f) Tg(t,f)
Pi′(t,f)=F(f) Pi(t,f)
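The onset correction by offset selection can be illustrated as follows. The correlation equation itself is not reproduced in this text, so the sketch assumes a plain sum of element-wise products of the two low-pass-weighted spectra; the names `apply_lowpass`, `correlation`, and `pick_offset` are hypothetical.

```python
def apply_lowpass(spectrum, F):
    """Multiply every frame of a spectrum by F(f), as in Tg' and Pi'."""
    return [[F[f] * row[f] for f in range(len(row))] for row in spectrum]


def correlation(a, b):
    """Assumed correlation: sum of element-wise products over (t, f)."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def pick_offset(template, candidates, F):
    """Return (J, Pi): the offset with the largest correlation and its segment.

    candidates maps an offset j in ms (for example -5, 0, 5) to the
    spectrum segment extracted at oi + j.
    """
    filtered_template = apply_lowpass(template, F)
    scores = {j: correlation(filtered_template, apply_lowpass(seg, F))
              for j, seg in candidates.items()}
    best = max(scores, key=scores.get)
    return best, candidates[best]


template_toy = [[1.0, 0.0], [0.0, 1.0]]
candidates_toy = {
    -5: [[0.0, 1.0], [1.0, 0.0]],
    0: [[1.0, 0.0], [0.0, 1.0]],
    5: [[0.5, 0.0], [0.0, 0.5]],
}
J, Pi = pick_offset(template_toy, candidates_toy, F=[1.0, 0.5])
```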
The CPU 11 serves as means (selecting means) for selecting a predetermined number M of spectrum segments that are alike to the template Tg in the course of adaptation. The predetermined number M has a constant ratio (0.1 in the present embodiment) to the total number of spectrum segments (detected onset time candidates). The CPU 11 serves also as subtracting means. That is, the CPU 11 calculates the distance (difference) Di between the template Tg and the spectrum segment Pi, and then selects a predetermined number M of spectrum segments in ascending order of the calculated distance. The distance Di may be calculated according to the following equation.
In case that the distance Di is calculated according to the above equation, a large distance is obtained even when the power peak position in the template Tg differs only slightly from that in the spectrum segment Pi. Thus, there is a possibility that the distance cannot be calculated accurately.
In order to avoid this situation, in the invention, the seed template T0 (Ts) and the spectrum segment Pi are quantized with lower time and frequency resolutions in the initial adaptation as shown in
The CPU 11 then calculates the distance Di between the seed template T0 (Ts) and the spectrum segment Pi according to the following equation.
The CPU 11 serves also as updating means for updating the template Tg into a new template Tg+1 on the basis of the predetermined number M of selected spectrum segments Ps (s=1, . . . , M). It is probable that the spectral structure of a drum sound appears in the same position in each spectrum segment Ps. In contrast, the sound spectral components of musical instruments other than the drum seldom appear in the same position in each spectrum segment Ps. Thus, the CPU 11 determines as a new template Tg+1 the median of the selected spectrum segments Ps as follows.
Tg+1(t, f)=median{Ps(t, f)} (s=1, . . . , M)
When the median is used as described here, the spectral structure of the drum sound is expected to be retained. In contrast, instrumental sounds other than the drum sound are seldom retained. Thus, the sound spectral components of musical instruments other than the drum are expected to be suppressed. As such, the seed template T0 can be adapted to a drum sound in an audio signal containing plural types of instrumental sounds.
When the determination of a new template Tg+1 is repeated, the drum sound of the template approaches the drum sound contained in the audio signal, so that the template adaptation is achieved. In the course of repetition of the determination, the amount of change in the template becomes smaller, so that the adaptation converges. The CPU 11 serves as means for comparing the present template Tg with a new template Tg+1, and thereby determining that the adaptation has converged in case that the difference between the two spectra is equal to or less than a predetermined value. At that time, the CPU 11 adopts the new template Tg+1 as an adapted template TA.
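The select, update, and converge cycle described above can be sketched as follows. The distance equations are not reproduced in this text, so a Euclidean distance is assumed here, and the quantization used in the initial adaptation is omitted. The names `distance`, `update_template`, and `adapt`, the tolerance `tol`, and the iteration cap `max_iter` are hypothetical.

```python
import math
from statistics import median


def distance(a, b):
    """Assumed distance: Euclidean distance over all (t, f) components."""
    return math.sqrt(sum((x - y) ** 2
                         for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


def update_template(template, segments, ratio=0.1):
    """Select the M closest segments and take their element-wise median."""
    m = max(1, round(ratio * len(segments)))
    chosen = sorted(segments, key=lambda seg: distance(template, seg))[:m]
    n_frames = len(template)
    n_bins = len(template[0])
    return [[median(seg[t][f] for seg in chosen) for f in range(n_bins)]
            for t in range(n_frames)]


def adapt(seed, segments, ratio=0.1, tol=1e-6, max_iter=50):
    """Repeat the update until the template changes by no more than tol."""
    template = seed
    for _ in range(max_iter):
        new_template = update_template(template, segments, ratio)
        if distance(template, new_template) <= tol:
            return new_template  # adopted as the adapted template TA
        template = new_template
    return template


# Toy data: three drum-like segments and three unrelated ones.
segments_toy = [
    [[2.0, 0.0]], [[2.0, 0.5]], [[2.0, 0.1]],
    [[0.0, 5.0]], [[0.0, 6.0]], [[0.0, 7.0]],
]
TA = adapt([[1.0, 0.0]], segments_toy, ratio=0.5)
```

In the toy data the second bin holds values that differ between the selected segments, and the median keeps only the middle one, which illustrates how components that do not recur at the same position are suppressed.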
The CPU 11 serves also as means (extracting means) for performing template matching based on the adapted template TA and thereby determining whether the drum is generating a sound at an onset time candidate oi or not. The CPU 11 multiplies the adapted template TA by the low pass filter function F(f) described above, and thereby calculates according to the following equation a weight function ω that indicates the magnitude of characteristics on the spectrum at each frame t of the adapted template TA and at each frequency f.
ω(t, f)=F(f)·TA(t, f)
In case that the power of a spectrum segment differs from that of the template, the determination whether the template is contained in the spectrum segment or not may not be performed appropriately. Thus, for the purpose of ensuring appropriate template matching, the power of each spectrum segment is preferably adjusted such that the power matches that of the template. The CPU 11 selects the frequency ft,k (k=1, . . . , 15) of a characteristic point having the k-th largest value of ω(t, ft,k) at frame t in the template TA, and then calculates the power difference ηi(t, ft,k) according to the following equation.
ηi(t, ft,k)=Pi(t, ft,k)−TA(t, ft,k)
Then, the CPU 11 selects the value of ηi(t, ft,k) at the first quartile point (the point at 25% of the sample set sorted in ascending order), and thereby adopts this value as the power difference δi(t) at frame t. In case that the number of frames that do not satisfy δi(t)≧Ψ (Ψ is a negative constant) exceeds a predetermined threshold value R, the CPU 11 determines that TA is not contained in the spectrum segment Pi.
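The per-frame power difference and the frame-count test can be sketched as follows. The indexing used for the first quartile point is an assumption (the element at position (n−1)/4, rounded down, of the sorted sample set), and the names `frame_power_differences` and `fails_frame_test` are hypothetical.

```python
def frame_power_differences(segment, template, weight, k=15):
    """Return delta_i(t) for every frame t.

    For each frame, the k frequencies with the largest weight are taken
    as characteristic points, eta = Pi - TA is evaluated there, and the
    first quartile of the sorted eta values is returned.
    """
    deltas = []
    for p_row, t_row, w_row in zip(segment, template, weight):
        top = sorted(range(len(w_row)),
                     key=lambda f: w_row[f], reverse=True)[:k]
        eta = sorted(p_row[f] - t_row[f] for f in top)
        deltas.append(eta[(len(eta) - 1) // 4])  # assumed quartile index
    return deltas


def fails_frame_test(deltas, psi, r):
    """True when more than r frames violate delta_i(t) >= psi."""
    return sum(1 for d in deltas if d < psi) > r


deltas_toy = frame_power_differences(
    segment=[[5, 1, 9, 4, 7]],
    template=[[1, 1, 1, 1, 1]],
    weight=[[5, 4, 3, 2, 1]],
    k=5,
)
```

Taking the first quartile rather than the mean keeps the estimate close to the smaller differences, so that characteristic points where another instrument adds power to the segment have less influence on the estimate.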
The CPU 11 calculates the final power difference Δi (the adjustment value for the spectrum segment: −Δi) according to the following equation.
In case that Δi≦Θ (Θ is a constant) is satisfied, the CPU 11 determines that the adapted template TA is not contained in the spectrum segment Pi. In case that Δi≦Θ is not satisfied, the CPU 11 determines that the adapted template TA is contained in the spectrum segment Pi, and then calculates an adjusted spectrum segment Pi′ according to the following equation.
Pi′(t, f)=Pi(t, f)−Δi
The CPU 11 serves also as means for calculating the distance between the adapted template TA and the adjusted spectrum segment Pi′. At the calculation of the distance, the CPU 11 determines whether the spectrum of the adapted template TA is contained in the spectrum of the spectrum segment Pi′.
Here, Ψ is a negative constant. When a non-zero negative number is used as Ψ, a small variation in the spectral component can be absorbed. The CPU 11 integrates the distance measure γi over the time-frequency domain, and thereby acquires the overall distance Γi. At that time, the CPU 11 performs a weighting operation of multiplying the distance measure by the weight function ω according to the following equation.
The CPU 11 serves also as means for determining whether the target drum has generated a sound in the spectrum segment Pi′(t, f) portion or not. More specifically, in case that Γi<θ is satisfied, the CPU 11 determines that the target drum has generated a sound, and then decides the onset time candidate oi as the onset time.
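The matching decision can be illustrated as follows. The equations for the distance measure γi and the overall distance Γi are not reproduced in this text, so the sketch assumes that γi(t, f) is 0 where Pi′(t, f)−TA(t, f)≧Ψ holds and 1 elsewhere, and that Γi is the ω-weighted sum of γi over all frames and frequencies. The names `overall_distance` and `drum_sounded` are hypothetical.

```python
def overall_distance(adjusted_segment, template, weight, psi):
    """Weighted count of points where the template is not contained.

    Assumption: a point (t, f) contributes weight[t][f] when the adjusted
    segment falls below the template by more than |psi|.
    """
    total = 0.0
    for p_row, t_row, w_row in zip(adjusted_segment, template, weight):
        for p, t, w in zip(p_row, t_row, w_row):
            gamma = 0.0 if p - t >= psi else 1.0
            total += w * gamma
    return total


def drum_sounded(adjusted_segment, template, weight, psi, theta):
    """Decide the onset: True when the overall distance is below theta."""
    return overall_distance(adjusted_segment, template, weight, psi) < theta


segment_toy = [[3.0, 1.0], [2.0, 0.0]]
template_ta = [[2.0, 2.0], [2.0, 2.0]]
weight_toy = [[1.0, 0.5], [0.25, 4.0]]
gamma_total = overall_distance(segment_toy, template_ta, weight_toy, psi=-1.5)
```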
The CPU 11 serves also as increasing and decreasing means for increasing or decreasing a drum sound at onset time.
As described above, the CPU 11 calculates various numerical data. The numerical data calculated by the CPU 11 is stored in the RAM 12 or the HDD 13. Further, when the CPU 11 is to calculate other numerical data on the basis of already calculated numerical data, the CPU 11 reads necessary numerical data from the RAM 12 before the new calculation.
A computer program stored in a recording medium 19 such as a CD-ROM is read by the external storage unit 14 and then temporarily stored in the HDD 13 or the RAM 12. After that, the computer program is executed by the CPU 11. This approach allows the CPU 11 to serve as various system components described above. Alternatively, a computer program may be received via the communication unit 17 from another apparatus connected to the communication network 20, and then temporarily stored in the HDD 13 or the RAM 12. After that, the computer program may be executed by the CPU 11.
Described below is a practical procedure of increasing or decreasing a drum sound by using a computer (audio signal processing apparatus) according to the invention.
The computer 10 reads an audio signal (sound data), for example, from a recording medium 19 in the external storage unit 14, and then stores the data into the HDD 13. Alternatively, the computer 10 may store into the HDD 13 sound data that is inputted through a sound card (not shown) and then converted into an audio signal (such data is also referred to as an audio signal hereafter). The computer 10 further reads a drum sound template (seed template Ts), for example, from a recording medium 19 in the external storage unit 14, and then stores the data into the HDD 13.
The CPU 11 first performs frequency analysis on the audio signal so as to calculate the power spectrum P, and then stores into the HDD 13 the data of the calculated power spectrum P. The CPU 11 then detects an onset time candidate oi (S10) on the basis of the power spectrum P stored in the HDD 13. The CPU 11 stores the detected onset time candidate oi into the HDD 13. On the basis of the onset time candidate oi, the CPU 11 extracts (calculates) a spectrum segment Pi (S12), and then stores the data of the extracted spectrum segment Pi into the HDD 13. After that, the CPU 11 performs template adaptation (S14), and thereby updates the template Tg (the seed template Ts in the beginning) stored in the HDD 13. As a result, the template converges into an adapted template TA.
After that, the CPU 11 performs template matching by using the adapted template TA, and then decides the onset time (extracts a drum sound) (S16). The CPU 11 stores the decided onset time into the HDD 13. Using the adapted template TA, the CPU 11 increases or decreases the power spectrum in the vicinity of the decided onset time (S18), and thereby creates an audio signal used as an output. The CPU 11 stores this audio signal into the HDD 13. The increase or decrease of the power spectrum is performed in response to the amount of increase or decrease received through the input unit 15. The audio signal (sound data) used as an output may be outputted and recorded into a recording medium 19 in the external storage unit 14. Alternatively, the audio signal used as an output may be outputted through a sound card not shown.
The above-mentioned embodiment has been described in the case that the audio signal processing apparatus according to the invention is embodied in the form of a software process using a computer. However, the invention is applicable also to various types of apparatuses for outputting an audio signal such as a recording device, an electronic musical instrument, an audio device, a portable audio device, and a portable telephone or the like.
The control unit 31 serves as means for extracting a predetermined non-harmonic structured spectral component such as a drum sound contained in an audio signal as well as means for increasing or decreasing the extracted predetermined spectral component. The control unit 31 serves also as means for calculating the spectrum of an audio signal by frequency analysis, and thereby extracts a spectrum corresponding to the predetermined non-harmonic structured spectral component. The extraction of the predetermined non-harmonic structured spectral component is performed with reference to a spectral component of a template stored in the flash memory (storage unit) 33 in advance. The control unit 31 serves as means for adapting the spectral component of the template in such a manner that the difference between the extracted spectral component and the spectral component of the template stored in the flash memory 33 becomes equal to or less than a predetermined value. More specifically, in case that a plurality of spectral components have been extracted, the control unit 31 serves as: means for calculating the difference between each extracted spectral component and the spectral component of the template; means for selecting a predetermined number of spectral components in ascending order of the calculated difference; and means for updating the spectral component of the template into the median of the predetermined number of selected spectral components. As such, the control unit 31 adapts the spectral component of the template.
The control unit 31 serves also as means for quantizing each extracted spectral component and the spectral component of the template in the initial adaptation for the spectral component of the template, and thereby calculates the difference between each extracted spectral component and the spectral component of the template that have been quantized. The operation unit 35 serves as means for receiving the amount of increase or decrease of the predetermined spectral component, so that the control unit 31 increases or decreases the extracted predetermined spectral component in response to the amount of increase or decrease received through the operation unit 35. In an example, in addition to a volume control knob for the overall power of the audio signal, the operation unit 35 comprises a volume control knob for bass drum.
Similarly to the computer shown in
In the configuration shown in
The above-mentioned embodiment has been described in the case that a non-harmonic structured sound such as a drum sound is extracted and increased or decreased. However, the invention is not limited to the drum sound. A non-harmonic structured sound generated by another percussion instrument such as cymbals may be extracted and increased or decreased. Further, a non-harmonic structured sound generated by another type of sound source may be extracted and increased or decreased. Further, a bass drum sound or a snare drum sound among various types of drum sounds may be extracted and increased or decreased.
An audio signal processed according to the invention may contain a voice signal. For example, a predetermined non-harmonic structured spectral component may be extracted from an audio signal of music containing a vocal, and then the extracted spectral component may be increased or decreased. Further, a predetermined non-harmonic structured spectral component may be extracted from an audio signal containing a voice of the target of speech recognition, and then the extracted spectral component may be increased or decreased. Accordingly, in speech recognition, a predetermined non-harmonic structured spectral component contained in voice data can be extracted and decreased. Such a non-harmonic structured spectral component contained in voice data is a noise component in many cases. Thus, the noise component can be cancelled by extracting and decreasing it. This improves the accuracy in the speech recognition.
Further, the above-mentioned embodiment has been described in the case that once the onset time is decided, the power spectrum is immediately increased or decreased in the vicinity of the onset time (S16 and S18 in
As this invention may be embodied in several forms without departing from the spirit of essential characteristics thereof, the present embodiment is therefore illustrative and not restrictive, since the scope of the invention is defined by the appended claims rather than by the description preceding them, and all changes that fall within metes and bounds of the claims, or equivalence of such metes and bounds thereof are therefore intended to be embraced by the claims.
Number | Date | Country | Kind |
---|---|---|---|
2004-181881 | Jun 2004 | JP | national |