The present invention relates to a technique for evaluating a degree of consonance or dissonance between a plurality of sounds.
Heretofore, there have been proposed techniques for evaluating a degree of an auditory difference (i.e., consonance or dissonance) between a plurality of sounds. Japanese Patent Application Laid-open Publication No. 2007-316416 (hereinafter referred to as “Patent Literature 1”) and International Publication WO 2006/079813 (hereinafter referred to as “Patent Literature 2”), for example, disclose techniques for measuring a difference in pitch between a singing voice sound of a user and a normative sound (i.e., model sound) and correcting the pitch of the singing sound.
However, with the techniques disclosed in Patent Literature 1 and Patent Literature 2, it is necessary to detect the pitches (fundamental frequencies) of the singing sound and the model sound in order to evaluate a degree of difference between the singing sound and the model sound, and thus there arises the problem that, if the singing sound and the model sound greatly differ from each other in pitch, a degree of consonance or dissonance between the two sounds cannot be evaluated appropriately. Although the foregoing has discussed the prior-art problem involved in evaluating singing sounds, a similar problem would arise when evaluating sounds other than singing sounds, such as tones performed by musical instruments.
In view of the foregoing, it is an object of the present invention to provide an improved sound processing apparatus and program which can evaluate a degree of consonance or dissonance between a plurality of sounds appropriately with high accuracy.
In order to accomplish the above-mentioned object, the present invention provides an improved sound processing apparatus, which comprises: a mask generation section that generates an evaluating mask indicative of a degree of dissonance with a first sound per each frequency along a frequency axis, by setting, for each of a plurality of peaks in spectra of the first sound, a dissonance function indicative of relationship between a frequency difference from the peak and a degree of dissonance with a component of the peak; and an index calculation section that collates spectra of a second sound with the evaluating mask to thereby calculate a consonance index value indicative of a degree of consonance or dissonance between the first sound and the second sound. The term “sound” is used herein to refer to any of desired sounds, including not only a voice uttered by a person but also a tone performed by a musical instrument, operating sound of a machine, etc.
In the sound processing apparatus of the present invention, the evaluating mask, generated by setting a dissonance function for each of a plurality of peaks in spectra of the first sound, is used for calculation of a consonance index value indicative of a degree of consonance or dissonance between the first sound and the second sound. Thus, in principle, the present invention can eliminate the need for detecting the fundamental frequencies of the first and second sounds. As a result, the present invention can evaluate, with high accuracy, a degree of consonance or dissonance between the first and second sounds, regardless of the fundamental frequencies of the first and second sounds.
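By way of illustration only, the mask generation could be sketched as follows in Python. The Plomp-Levelt-style shape of the dissonance function, its 100-cent scale, and all names here are assumptions of the sketch, not the claimed implementation; the actual dissonance function is defined with reference to the drawings. Note that the curve is zero at the peak itself, consistent with the adjustment described later in the embodiment.

```python
import numpy as np

def dissonance_function(delta_cents, amp, width=100.0):
    """Degree of dissonance versus frequency difference from a peak.
    A Plomp-Levelt-style curve is assumed for illustration: zero at the
    peak itself (a shared frequency is fully consonant), maximal a short
    distance away, decaying thereafter."""
    x = np.abs(delta_cents) / width
    return amp * (np.exp(-3.5 * x) - np.exp(-5.75 * x))

def generate_evaluating_mask(freqs, peak_freqs, peak_amps):
    """Set a dissonance function on the frequency axis for every peak of
    the first sound; where functions overlap, keep the maximum degree of
    dissonance at each frequency."""
    mask = np.zeros_like(freqs, dtype=float)
    for fp, ap in zip(peak_freqs, peak_amps):
        mask = np.maximum(mask, dissonance_function(freqs - fp, ap))
    return mask
```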
In a preferred implementation, the spectra of the first sound are supplied as a spectral trajectory comprising a time-series arrangement of spectra, and the mask generation section generates a time-series trajectory of the evaluating masks; the spectra of the second sound are supplied as a spectral trajectory comprising a time-series arrangement of spectra, and the index calculation section collates the spectral trajectory of the second sound with the trajectory of the evaluating masks. Because the spectral trajectory of the second sound is collated with the trajectory of the evaluating masks, the present invention can evaluate degrees of consonance or dissonance between the first and second sounds in view of changes over time of the first and second sounds.
Preferably, the sound processing apparatus of the present invention further comprises: a correlation calculation section that calculates a correlation value between the spectra of the first sound and the spectra of the second sound; and a shift processing section that shifts the spectra of the second sound, in a direction of the frequency axis, by a given frequency difference such that the correlation value calculated by the correlation calculation section becomes maximum. The index calculation section collates the spectra of the second sound, having been processed by the shift processing section, with the evaluating mask. Because the spectra of the second sound are shifted in the direction of the frequency axis, by a given frequency difference such that the correlation value between the first and second sounds becomes maximum and then collated with the evaluating mask, the present invention can evaluate, with high accuracy, a degree of consonance or dissonance between the first and second sounds, for example, even where the first and second sounds differ from each other in pitch range.
Preferably, the correlation calculation section includes: a band processing section that generates a band intensity distribution of the first sound indicative of a spectral intensity of each predetermined unit band of the first sound and generates a band intensity distribution of the second sound indicative of a spectral intensity of each predetermined unit band of the second sound; and an arithmetic operation processing section that calculates, per each frequency difference corresponding to the unit band, a correlation value between the band intensity distribution of the first sound and the band intensity distribution of the second sound. Because a correlation value between the band intensity distribution of the first sound and the band intensity distribution of the second sound is calculated in the present invention, the correlation value calculation processing can be simplified as compared to a case where, for example, a correlation value between the frequency spectra of the first and second sounds is calculated.
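A minimal sketch of this band processing and correlation, assuming equal-length distributions, an inner product over the overlapping portions as the correlation measure, and hypothetical names throughout:

```python
import numpy as np

def band_intensity(spectrum, band_size):
    """One intensity per unit band: here simply the sum of the spectral
    intensities falling inside each band (band_size bins per band)."""
    n = len(spectrum) // band_size
    return spectrum[:n * band_size].reshape(n, band_size).sum(axis=1)

def correlation_per_shift(sa, sb, max_shift):
    """Correlation value C0 per frequency difference, measured in whole
    unit bands: the inner product of the overlapping portions of the two
    band intensity distributions after shifting SB upward by `shift`."""
    c0 = {}
    for shift in range(-max_shift, max_shift + 1):
        if shift >= 0:
            a, b = sa[shift:], sb[:len(sb) - shift]
        else:
            a, b = sa[:shift], sb[-shift:]
        m = min(len(a), len(b))
        c0[shift] = float(np.dot(a[:m], b[:m]))
    return c0
```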
In a further preferred implementation, the correlation calculation section further includes a first correction value calculation section that calculates, for each of the frequency differences between the first sound and the second sound, a first correction value corresponding to a sum of the intensities in a portion of the band intensity distribution of the first sound that does not overlap with the band intensity distribution of the second sound; a second correction value calculation section that calculates, for each of the frequency differences between the first sound and the second sound, a second correction value corresponding to a sum of the intensities in a portion of the band intensity distribution of the second sound that does not overlap with the band intensity distribution of the first sound; and a correction section that, for each of the frequency differences, subtracts the first and second correction values from the correlation value calculated by the arithmetic operation processing section and thereby corrects the correlation value. The aforementioned arrangements of the present invention can avoid the inconvenience that the correlation value increases despite high intensities in a portion of the band intensity distribution of one of the first and second sounds that does not overlap with the band intensity distribution of the other sound, and thus allow the pitch ranges of the first and second sounds to be matched with high accuracy.
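Continuing the sketch above (and reusing the hypothetical correlation_per_shift helper from it), the correction could be expressed as follows; the indexing convention for the non-overlapping portions is an assumption of the example:

```python
def corrected_correlations(sa, sb, max_shift):
    """Correction per the text: C = C0 - A1 - A2 for each shift, where
    A1 sums the intensities of SA lying outside the overlap with the
    shifted SB, and A2 does the same for SB."""
    c = {}
    for shift, c0 in correlation_per_shift(sa, sb, max_shift).items():
        if shift > 0:
            a1, a2 = sa[:shift].sum(), sb[-shift:].sum()
        elif shift < 0:
            a1, a2 = sa[shift:].sum(), sb[:-shift].sum()
        else:
            a1 = a2 = 0.0
        c[shift] = c0 - a1 - a2
    return c

# The shift processing section would then shift the second sound by the
# frequency difference at which the corrected correlation C is maximum:
#   best_shift = max(c, key=c.get)
```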
Preferably, when a plurality of the dissonance functions overlap each other on the frequency axis, the mask generation section generates the evaluating mask by selecting, at each such frequency, a maximum value of the degrees of dissonance of the plurality of the dissonance functions. Thus, even where adjoining peaks in the spectra of the first sound are located so close to each other that a plurality of the dissonance functions overlap on the frequency axis, the present invention can generate an evaluating mask having the degrees of dissonance of the individual peaks properly set therein.
Preferably, the mask generation section generates the evaluating mask by adding or subtracting a predetermined value to or from the degree of dissonance of the dissonance function set on the frequency axis. Because the degree of dissonance in the evaluating mask can be appropriately adjusted through the addition or subtraction of the predetermined value, the present invention can generate an evaluating mask suited for collation with the spectra of the second sound.
Preferably, the index calculation section includes: an intensity identification section that identifies a maximum value of amplitudes of the peaks in the spectra of the second sound; a collation section that multiplies, for each of the frequencies, the amplitude of the spectral trajectory of the second sound and the numerical value of the evaluating mask, to thereby output a product for each of the frequencies; and an index determination section that determines a consonance index value by dividing a maximum value of the products, outputted by the collation section, by the maximum amplitude value identified by the intensity identification section. Because the maximum value of the products, outputted by the collation section, is normalized through the division by the maximum value of the amplitudes of the peaks in the spectra of the second sound, the present invention can calculate an appropriate consonance index value while effectively reducing influence of amplitude levels of the spectra of the second sound.
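A hedged one-function sketch of this collation and normalization, with hypothetical names (the real sections operate per unit portion on spectral trajectories):

```python
import numpy as np

def consonance_index(mask, spectrum_b):
    """Multiply, frequency by frequency, the second sound's amplitudes by
    the mask values; the maximum product is then divided by the second
    sound's maximum peak amplitude so that the index is insensitive to
    the overall amplitude level of the second sound."""
    products = mask * spectrum_b
    return float(products.max() / spectrum_b.max())
```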
Preferably, the index calculation section calculates the consonance index value for each of a plurality of cases where the spectra of the second sound have been shifted by different shift amounts in the direction of the frequency axis, and the sound processing apparatus of the invention further comprises a tone pitch adjustment section that changes a tone pitch of the second sound by a given shift amount such that the degree of consonance indicated by the consonance index value becomes maximum (or the degree of dissonance becomes minimum). Because the tone pitch of the second sound is adjusted by a shift amount corresponding to the consonance index value, the present invention can generate a second sound highly consonant with the first sound.
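Illustratively, and reusing the hypothetical consonance_index helper from the sketch above, the search for the shift amount might look like this; np.roll is only a crude stand-in for an actual tone pitch shift:

```python
import numpy as np

def best_pitch_shift(mask, spectrum_b, candidate_shifts):
    """Evaluate the consonance index value for each candidate shift of
    the second sound's spectrum along the frequency axis and return the
    shift whose index indicates the least dissonance with the first
    sound; the tone pitch of the second sound would then be changed by
    that amount."""
    return min(candidate_shifts,
               key=lambda s: consonance_index(mask, np.roll(spectrum_b, s)))
```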
Preferably, the index calculation section collates each of a plurality of the second sounds with the evaluating mask, to thereby calculate a consonance index value for each of the second sounds. Because a consonance index value is calculated individually for each of the second sounds, the present invention can select, from among the plurality of the second sounds, a sound having a high degree of consonance or dissonance with the first sound.
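A short sketch of this selection, again reusing the hypothetical consonance_index helper:

```python
def most_consonant(mask, candidate_spectra):
    """Collate every candidate second sound against the same evaluating
    mask and return the index of the candidate whose consonance index
    value indicates the least dissonance with the first sound."""
    return min(range(len(candidate_spectra)),
               key=lambda i: consonance_index(mask, candidate_spectra[i]))
```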
The aforementioned sound processing apparatus of the present invention may also be constructed and implemented as a computer-implemented method. Also, the present invention may be implemented by hardware (electronic circuitry), such as a DSP (Digital Signal Processor) dedicated to the inventive sound processing, as well as by cooperation between a general-purpose arithmetic operation processing device, such as a CPU (Central Processing Unit), and a software program. Further, the processor used in the present invention may comprise a dedicated processor with dedicated logic built in hardware, not to mention a computer or other general-purpose processor capable of running a desired software program.
The following will describe embodiments of the present invention, but it should be appreciated that the present invention is not limited to the described embodiments and various modifications of the invention are possible without departing from the basic principles. The scope of the present invention is therefore to be determined solely by the appended claims.
For better understanding of the object and other features of the present invention, its preferred embodiments will be described hereinbelow in greater detail with reference to the accompanying drawings, in which:
As shown in
The arithmetic operation processing device 12 functions as a sound evaluation section 20. The sound evaluation section 20 calculates an index value of consonance D between one of the sounds stored in the storage device 14 (hereinafter referred to as “target sound VA”) and another one of the sounds (hereinafter referred to as “evaluated sound VB”). The index value of consonance (hereinafter referred to as “consonance index value”) D is a numerical value indicative of a degree of dissonance, with the target sound VA, of the evaluated sound VB which a human listener auditorily perceives when the target sound VA and the evaluated sound VB are reproduced in parallel or in succession. There is a tendency that the greater the consonance index value D of the evaluated sound VB, the more difficult it is for the evaluated sound VB to be musically consonant with the target sound VA (i.e., the smaller the consonance index value D of the evaluated sound VB, the easier it is for the evaluated sound VB to be musically consonant with the target sound VA). The consonance index value D calculated by the sound evaluation section 20 is output, for example, from a display device or sounding device as an image or sound. A user can recognize a degree of dissonance between the target sound VA and the evaluated sound VB by knowing the consonance index value D. Although the instant embodiment will be described assuming that the target sound VA and the evaluated sound VB have a same time length, these sounds VA and VB may have different time lengths.
As shown in
The quantization section 24 of
First, as shown in
Second, as also shown in
The mask generation section 30 of
(A) of
As shown in (A) of
The degree of dissonance D0(f) calculated through the aforementioned arithmetic operations sometimes may not become zero at the frequency fp of the peak p of the target sound VA. However, components of sounds which have a same or common frequency f naturally become consonant with each other, i.e., present a zero degree of dissonance D0(f). Thus, for each of the peaks p, the second adjustment section 36 of
The third adjustment section 38 of
Dmask(f)=D0(f)−Dmax+k (2)
Further, the third adjustment section 38 establishes an evaluating mask M by setting all degrees of dissonance D0(f) below zero at zero, as shown in (E) of
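As an illustrative reading of equation (2), assuming Dmask denotes the maximum degree of dissonance D0(f) on the frequency axis and k the predetermined value mentioned above (their exact definitions fall within the passage referenced to the drawings), the third adjustment could be sketched as:

```python
import numpy as np

def finalize_mask(d0, dmax, k):
    """Third adjustment per equation (2): Dmask(f) = D0(f) - Dmax + k,
    followed by setting every negative degree of dissonance to zero to
    obtain the evaluating mask M."""
    return np.clip(d0 - dmax + k, 0.0, None)
```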
The evaluating mask M is generated in accordance with the aforementioned procedure, and thus, in a case where the evaluated sound VB contains a lot of components of frequencies f having high degrees of dissonance Dmask(f) defined in the evaluating mask M, the evaluated sound VB has a high possibility of being dissonant with the target sound VA. Thus, the index calculation section 60 of
However, if the target sound VA and the evaluated sound VB do not coincide with each other in pitch range, a range of frequencies f having high degrees of dissonance Dmask(f) in the evaluating mask M and a range of frequencies fp of peaks p of the spectral trajectory RB differ from each other. Thus, even if the target sound VA and the evaluated sound VB are sounds musically dissonant with each other, the index value D calculated by the collation between the evaluating mask M and the spectral trajectory RB takes a small value (namely, the two sounds VA and VB are evaluated as consonant with each other). In order to avoid the above-mentioned non-coincidence, the correlation calculation section 40 and shift processing section 50 of
The correlation calculation section 40 of
The band processing section 42 generates band intensity distributions S (SA and SB) from the spectral trajectories R (RA and RB) generated by the quantization section 24 per each of the unit portions TU. Namely, the band intensity distribution SA is generated from the spectral trajectory RA, while the band intensity distribution SB is generated from the spectral trajectory RB.
As shown in
The arithmetic operation processing section 44 of
Because the correlation value C0 is calculated only for overlapping portions between the band intensity distribution SA and the band intensity distribution SB, the correlation value C0 calculated by the arithmetic operation processing section 44 may sometimes take a great value even where respective conspicuous components (components of great amplitudes within bands) of the band intensity distribution SA and band intensity distribution SB are present in portions of the band intensity distribution SA and the band intensity distribution SB that do not overlap with each other at the frequency difference Δf in question. However, if respective conspicuous components of the band intensity distribution SA and the band intensity distribution SB are present in non-overlapping portions between the distributions SA and SB as noted above, these band intensity distributions SA and SB should be evaluated as having a low correlation as a whole. In view of the foregoing, the correction section 48 in the instant embodiment corrects the correlation value C0, calculated by the arithmetic operation processing section 44, in accordance with intensities in the non-overlapping portions between the band intensity distributions SA and SB. More specifically, the correction section 48 lowers the correlation value C0 calculated by the arithmetic operation processing section 44 for the frequency difference Δf at which the components in the non-overlapping portions between the band intensity distributions SA and SB become conspicuous. The following paragraphs describe a specific example manner in which the correlation value C0 is corrected.
The first correction value calculation section 461 of
Similarly, the second correction value calculation section 462 of
The correction section 48 calculates a corrected correlation value C by subtracting the correction values A1 and A2 from the correlation value C0 per each frequency difference Δf. (E) of
The shift processing section 50 of
(B) of
The index calculation section 60 of
The collation section 64 collates the spectral trajectory RB of each of the Nt unit portions TU with the evaluating mask M created from the spectral trajectory RA of the unit portion TU. More specifically, the collation section 64 calculates, for each of a plurality of bands Bq (each of 10 cents) of the spectral trajectories RB where there exists a peak p, an index value d by multiplying (1) the degree of dissonance Dmask(fp) at the frequency fp of the peak p in the evaluating mask M and (2) the amplitude ap of the peak p in the spectral trajectory RB (d=Dmask(fp)·ap). The collation between the spectral trajectory RB and the evaluating mask M (i.e., calculation of the index value d per each band Bq) is performed for every one of the Nt unit portions TU of the evaluated sound VB.
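A minimal sketch of this per-unit-portion collation, with a hypothetical data layout (one mask and one list of (band, amplitude) peak pairs per unit portion TU):

```python
def collate_unit_portions(masks, peaks_per_unit):
    """For each of the Nt unit portions TU, multiply the mask value
    Dmask(fp) at each peak's band by the peak amplitude ap, collecting
    an index value d = Dmask(fp) * ap for every band Bq with a peak."""
    index_values = []
    for mask, peaks in zip(masks, peaks_per_unit):
        for band_index, ap in peaks:   # (band containing the peak, amplitude)
            index_values.append(mask[band_index] * ap)
    return index_values
```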
As shown in
In the instant embodiment, as described above, a consonance index value D between the target sound VA and the evaluated sound VB is calculated using the evaluating mask M having a dissonance function Fd set for each of a plurality of peaks p in the spectral trajectory RA of the target sound VA. Thus, in principle, the instant embodiment can eliminate the need for detecting the fundamental frequencies of the target sound VA and evaluated sound VB. As a result, the instant embodiment can evaluate, with high accuracy, a degree of dissonance (or consonance) between the target sound VA and the evaluated sound VB even in the case where the target sound VA and the evaluated sound VB differ from each other in fundamental frequency or where a component of the fundamental frequency is missing from the target sound VA or from the evaluated sound VB.
Further, because the spectral trajectories RB of the evaluated sound VB are shifted along the frequency axis in such a manner that the pitch range of the target sound VA and the pitch range of the evaluated sound VB approach each other, the instant embodiment can evaluate, with high accuracy, a degree of dissonance (or consonance) between the target sound VA and the evaluated sound VB even in the case where the target sound VA and the evaluated sound VB differ from each other in pitch range (e.g., where the target sound VA and the evaluated sound VB are performed on different musical instruments). Further, with the instant embodiment, where the corrected correlation value C based on the correction values A1 and A2 is used to determine a shift amount ΔF of the spectral trajectories RB, the pitch range of the target sound VA and the pitch range of the evaluated sound VB can be caused to approach each other with high accuracy, regardless of the bands of the spectral trajectories RA and RB where there exist respective conspicuous components.
The following paragraphs describe a second embodiment of the sound processing apparatus of the present invention. In the following description of the second embodiment, the same elements as in the first embodiment are indicated by the same reference numerals and characters and will not be described here to avoid unnecessary duplication.
The tone pitch adjustment section 70 of
The sound evaluation section 20 selects, from among the plurality of evaluated sounds VB stored in the storage device 14, an evaluated sound VB of which the calculated consonance index values D are minimum (i.e., which is most consonant with a target sound VA). Namely, in the third embodiment, it is possible to extract, from among the plurality of evaluated sounds VB, an evaluated sound VB sufficiently auditorily consonant with the target sound VA. Such an evaluated sound VB identified by the sound evaluation section 20 can be suitably used, for example, for mixing or connection with the target sound VA or for composition of a new music piece.
Whereas the third embodiment of the present invention has been described above as selecting one evaluated sound VB, it may be constructed to select a plurality of evaluated sounds VB ranked high in ascending order of the consonance index values D (i.e., from the most consonant) and use these selected evaluated sounds for mixing or connection with the target sound VA. Further, the arrangements of the second embodiment may be applied to the third embodiment. For example, regarding the one of the plurality of evaluated sounds VB, stored in the storage device 14, for which the consonance index values D become minimum, a shift amount ΔP with which the consonance index values D become minimum with respect to the target sound VA may be determined in generally the same manner as in the second embodiment, so that the tone pitch adjustment section 70 changes the tone pitch of that evaluated sound VB by the shift amount ΔP.
<Modification>
The above-described embodiments may be modified variously. Specific example modifications will be set forth below, and two or more of these modifications may be combined as desired.
(1) Modification 1:
Whereas each of the embodiments has been described above as constructed to calculate spectral trajectories R (RA and RB) at the time of the calculation of the consonance index values D, it is also advantageous to calculate and store, in the storage device 14, spectral trajectories R of individual sounds V (target and evaluated sounds VA and VB) in advance. In the case where a plurality of evaluated sounds VB are collated with a target sound VA as in the above-described third embodiment, it is particularly advantageous to calculate and store in advance spectral trajectories R of a plurality of sounds V (target and evaluated sounds VA and VB), with a view to reducing the time required for calculation of the spectral trajectories R of each of the sounds V at the time of the calculation of the consonance index values D. Further, it is also advantageous to employ a construction where spectral trajectories R calculated by an external apparatus are supplied to the arithmetic operation processing device 12 via a communication network or via a portable storage or recording medium; in this case, the frequency analysis section 22 and quantization section 24 are omitted from the sound evaluation section 20. In the aforementioned modification where spectral trajectories R are prepared in advance, the sounds V need not be stored in the storage device 14. Whereas the foregoing has described the storage and supply of spectral trajectories R, there may be employed another modified construction where band intensity distributions S (SA and SB) too are stored in advance in the storage device 14 or supplied from an external apparatus.
(2) Modification 2:
The way in which the index calculation section 60 calculates a consonance index value D may be modified as appropriate. For example, there may be employed a modified construction where the index calculation section 60 calculates a consonance index value D by averaging the index values d, calculated by the collation section 64 per each spectral trajectory RB, over the Nt unit portions TU. Namely, the present invention may advantageously employ a modified construction where a consonance index value D is calculated through collation between the spectral trajectory RB of the evaluated sound VB and the evaluating mask M, and the relationship between results of that collation and the calculated consonance index values D may be defined in any desired form or manner. Further, whereas each of the embodiments has been described above as constructed to determine the maximum value of the index values d as a consonance index value D, there may be advantageously employed a modified construction where the minimum value of the index values d is determined as a consonance index value D (i.e., where a greater consonance index value D is set as the degree of consonance between the target and evaluated sounds VA and VB increases). Namely, the consonance index value D is defined as an index indicative of a degree of either consonance or dissonance between the target and evaluated sounds VA and VB, and the relationship between increase/decrease of the degree of consonance or dissonance and increase/decrease of the consonance index value D may be defined in any desired form or manner.
(3) Modification 3:
In a case where there is no problem concerning a difference in pitch range between a target sound VA and an evaluated sound VB (e.g., where the target sound VA and the evaluated sound VB coincide with each other in pitch range), the correlation calculation section 40 and shift processing section 50 may be dispensed with. Further, whereas each of the embodiments has been described above as constructed to calculate a correlation value C between band intensity distributions SA and SB of target and evaluated sounds VA and VB, the present invention may be constructed to calculate a correlation value C between a spectral trajectory RA (or frequency spectra QA or qA) of a target sound VA and a spectral trajectory RB (or frequency spectra QB or qB) of an evaluated sound VB.
(4) Modification 4:
Further, whereas each of the embodiments has been described above as constructed to use the spectral trajectories R (RA and RB) having been quantized by the quantization section 24, there may be employed a modified construction where the frequency spectra q (qA and qB) calculated by the conversion section 221 are used in place of the spectral trajectories R (RA and RB) (namely, where the adjustment section 223 and the quantization section 24 are omitted), or a modified construction where the frequency spectra Q (QA and QB) having been adjusted by the adjustment section 223 are used in place of the spectral trajectories R (RA and RB) (namely, where the quantization section 24 is omitted).
This application is based on, and claims priority to, JP PA 2008-164057 filed on 24 Jun. 2008. The disclosure of the priority application, in its entirety, including the drawings, claims, and the specification thereof, is incorporated herein by reference.