The present invention relates to a technology of estimating time lengths of beats and measures in a musical piece from a sound signal indicating the musical piece.
There is known a technique of analyzing a sound signal. For example, Patent Literatures 1 and 2 disclose techniques using a probability model to estimate beat points, tempos, measure positions, and chords of a musical piece from a sound signal indicating the musical piece.
Patent Literature 1: JP-A-2015-114361
Patent Literature 2: JP-A-2015-200803
In the techniques disclosed in Patent Literatures 1 and 2, it is necessary to define a transition probability model of beat points, tempos, beats, and chord progression in advance.
In this regard, the present invention provides a technology for estimating the lengths of beats and measures from an input sound signal more simply.
An aspect of the present invention provides a sound signal processing method including: acquiring a unit time length in an input sound signal which indicates a musical piece; calculating a tone feature amount from the input sound signal; calculating, in a case where time lengths of one beat and one measure in the musical piece are assumed using the unit time length as a reference in the input sound signal with respect to the tone feature amount, a first index which indicates validity of the assumed time lengths; and estimating the time lengths of one beat and one measure using the unit time length as a reference based on the first index.
Another aspect of the present invention provides a sound signal processing device including: at least one memory storing instructions; and at least one processor configured to implement the stored instructions to execute a plurality of tasks, including: a unit-time acquisition task that acquires a unit time length in an input sound signal indicating a musical piece; a feature amount calculation task that calculates a tone feature amount from the input sound signal; a first index calculation task that calculates, in a case where time lengths of one beat and one measure in the musical piece are assumed using the unit time length as a reference in the input sound signal with respect to the tone feature amount, a first index which indicates validity of the assumed time lengths; and an estimation task that estimates the time lengths of one beat and one measure using the unit time length as a reference based on the first index.
The index may be calculated using a priority which is set in advance with respect to a combination of the time lengths of one beat and one measure.
According to the present invention, it is possible to more simply estimate lengths of beats and measures from an input sound signal.
The sound signal processing device 1 includes an input sound acquisition unit 11, a unit-time acquisition unit 12, a feature amount calculation unit 13, an index calculation unit 14, an estimation unit 15, a storage unit 16, and an output unit 17. The input sound acquisition unit 11 acquires the input sound signal, that is, a sound signal indicating the musical piece which is the processing target described below. The unit-time acquisition unit 12 acquires a unit time length in the input sound signal. The feature amount calculation unit 13 calculates a tone feature amount from the input sound signal. In a case where time lengths of one beat and one measure are assumed using the unit time length as a reference in the input sound signal with respect to the tone feature amount calculated by the feature amount calculation unit 13, the index calculation unit 14 calculates an index indicating validity of the assumed time lengths. The estimation unit 15 estimates the time lengths of one beat and one measure using the unit time length as a reference based on the index calculated by the index calculation unit 14.
The storage unit 16 stores a priority which is set in advance with respect to each combination of the time lengths of one beat and one measure. In this example, the estimation unit 15 estimates the time lengths of one beat and one measure based on the priority stored in the storage unit 16. The output unit 17 outputs information on the time lengths of one beat and one measure estimated by the estimation unit 15.
The storage 103 stores a program which causes the computer to serve as the sound signal processing device 1. The CPU 101 executes the program to implement the functions illustrated in
In Step S1, the input sound acquisition unit 11 acquires the input sound signal. The input sound signal is, for example, a sound signal of the musical piece based on non-compressed or compressed (wav, mp3, etc.) sound data, but the present invention is not limited thereto. The sound data may be stored in the storage 103 in advance, or may be input from the outside of the sound signal processing device 1.
In Step S2, the unit-time acquisition unit 12 acquires a unit time length ta. The unit time length ta means a minimum unit of musical time in the musical piece, for example, a repeating unit of the performing sound of a musical instrument (for example, an interval from one stroke to the next stroke of a high hat in a case where a rhythm is split using the high hat). For example, the unit time length ta corresponds to the length of an eighth note or a sixteenth note in the musical piece. As an example, the unit-time acquisition unit 12 calculates the unit time length ta by analyzing the input sound signal. A well-known technique is used to calculate the unit time length ta. Alternatively, the unit time length ta may be designated by a user's command input. In this case, the unit-time acquisition unit 12 acquires the unit time length ta according to the user's command input. Specifically, for example, the user repeatedly pushes a button (or taps a touch screen) at timing corresponding to the unit time length in synchronization with an input sound. The unit-time acquisition unit 12 determines the unit time length ta corresponding to the repeated input.
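The tap-based acquisition of the unit time length ta described above can be sketched as follows. This is a minimal illustration only: the function name `unit_time_from_taps` and the use of the median of the inter-tap intervals are assumptions made for this sketch, since the description does not specify how the repeated input is converted into ta.

```python
from statistics import median


def unit_time_from_taps(tap_times):
    """Estimate the unit time length ta (in seconds) from tap timestamps.

    Assumption for this sketch: ta is taken as the median interval between
    consecutive taps, which is robust against a single mistimed tap.
    """
    if len(tap_times) < 2:
        raise ValueError("at least two taps are needed")
    intervals = [later - earlier
                 for earlier, later in zip(tap_times, tap_times[1:])]
    if any(interval <= 0 for interval in intervals):
        raise ValueError("tap times must be strictly increasing")
    return median(intervals)


# Hypothetical tap timestamps (seconds) entered in time with the music.
taps = [0.00, 0.25, 0.51, 0.75, 1.00, 1.26]
ta = unit_time_from_taps(taps)
```

The median is only one possible choice; an average or any other well-known period estimate could be substituted without changing the rest of the flow.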
The timing (for example, the timing of sounding a high hat) which is automatically calculated by the sound signal processing device 1 or designated by a user's command input is not always exactly periodic. Therefore, the unit time length ta acquired in Step S2 does not need to be constant over all the analysis target sections in the input sound signal. The input sound signal is divided into a plurality of sections, and the unit time length ta may differ from section to section. In other words, the unit time length ta is an example of the time length used to smooth the tone feature amount described below. Alternatively, the sound signal processing device 1 may determine a constant unit time length ta over all the analysis target sections, for example by calculating an average value. In this case, a portion (for example, a portion where the tempo changes in the music) in which a variation of the timing interval in the musical piece exceeds a threshold (for example, 10% of the average value) may be processed differently from the other portions by the sound signal processing device 1.
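As a rough illustration of the averaging alternative described above, the following sketch computes a constant unit time length and flags the intervals whose deviation exceeds the threshold. The function name `constant_unit_time` and the reading of "variation" as the absolute deviation from the average are assumptions of this sketch, not details taken from the embodiment.

```python
from statistics import mean


def constant_unit_time(intervals, rel_threshold=0.10):
    """Return (average interval, indexes of intervals to treat separately).

    Assumption for this sketch: an interval is flagged when its absolute
    deviation from the average exceeds rel_threshold times the average.
    """
    avg = mean(intervals)
    flagged = [i for i, value in enumerate(intervals)
               if abs(value - avg) > rel_threshold * avg]
    return avg, flagged


# Hypothetical timing intervals (seconds); the last one mimics a tempo change.
avg_ta, flagged = constant_unit_time([0.25, 0.25, 0.25, 0.25, 0.35])
```

How the flagged portions are then processed is left open here, as it is in the description.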
The description will be given with reference to
In Step S4, the index calculation unit 14 calculates an index. The index indicates the validity of the assumed time length in a case where the time lengths of one beat and one measure are assumed using the unit time length ta as a reference in the input sound signal with respect to the tone feature amount.
[Equation 1]
R[d,n]=|DFT{x[d,t]}| (1)
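Equation (1) can be illustrated with a short sketch that takes the amplitude of a discrete Fourier transform along the time axis. It is assumed here, for illustration only, that x[d, t] holds the d-th dimension of the tone feature amount at the t-th unit time, and the function name `amplitude_dft` is hypothetical. A direct O(N²) DFT is used so that the example needs only the standard library.

```python
import cmath


def amplitude_dft(x):
    """Compute R[d][n] = |DFT{x[d][t]}| for each feature dimension d.

    x is a list of D rows, each a sequence of N samples taken at
    unit-time resolution (an assumption of this sketch).
    """
    result = []
    for row in x:
        n_len = len(row)
        result.append([
            abs(sum(row[t] * cmath.exp(-2j * cmath.pi * n * t / n_len)
                    for t in range(n_len)))
            for n in range(n_len)
        ])
    return result


# A feature that repeats every 2 unit times produces a peak at n = N / 2.
R = amplitude_dft([[1, 0, 1, 0, 1, 0, 1, 0]])
```

In this toy input the amplitude is concentrated at n = 0 and n = 4, the latter corresponding to a period of two unit times for N = 8.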
The description will be given with reference to
[Equation 2]
S[l] = Σ_n w[l, n] (Σ_d R[d, n]) (2)
Herein, w[l, n] is as follows.
Further, u[l, n] is as follows.
The Equations (2) to (4) show that data of surroundings corresponding to the period l in the amplitude DFT of length N is subjected to the sum of products. In other words, w[l, n] is a window function to cut the data out of surroundings of the period l. “λ” of Equation (4) is a constant which is experimentally determined. In other words, in Step S42, the index of the time lengths of one beat and one measure is calculated by applying the DFT results with the window function corresponding to the time lengths of one beat and one measure in the musical piece using the unit time length ta as a reference in a time domain.
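The windowed sum of products of Equation (2) can be sketched as below. Because the definitions of w[l, n] and u[l, n] are not reproduced in this text, the sketch substitutes a hypothetical normalized Gaussian window centered on the DFT bin N / l, with `lam` standing in for the experimentally determined constant λ. The actual window of the embodiment may differ.

```python
import math


def window(l, n_len, lam=0.5):
    """Hypothetical window w[l][n]: a normalized Gaussian around bin N / l."""
    center = n_len / l
    u = [math.exp(-(lam * (n - center)) ** 2) for n in range(n_len)]
    total = sum(u)
    return [value / total for value in u]


def index_s(R, l, lam=0.5):
    """S[l] = sum over n of w[l][n] * (sum over d of R[d][n])."""
    n_len = len(R[0])
    summed = [sum(row[n] for row in R) for n in range(n_len)]
    w = window(l, n_len, lam)
    return sum(w[n] * summed[n] for n in range(n_len))


# Toy amplitude spectrum (N = 8) with energy only at bin 4, i.e. period 2.
R = [[0, 0, 0, 0, 4, 0, 0, 0]]
s2 = index_s(R, 2)
s4 = index_s(R, 4)
```

With this toy spectrum, S[2] is larger than S[4] because the window for l = 2 is centered on the bin that carries the energy.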
The description will be given with reference to
In this example, the candidates of the combinations (m, b) of m and b are restricted in advance from the viewpoint of music. First, most musical pieces are in 2 beats, 3 beats, or 4 beats. Therefore, for example, even in a case where there is a limitation to m∈{2, 3, 4}, there is no problem in many cases. Likewise, when the unit time length ta is considered to correspond to an eighth note or a sixteenth note, even in a case where there is a limitation to b∈{2, 3, 4}, there is no problem in many cases. When m and b are limited as described above, the candidates of the combinations (m, b) are limited to 9 candidates. The storage unit 16 stores information which is used to specify the candidates of the combination. The index calculation unit 14 sequentially selects a combination among these 9 candidates. The limitation of the candidates of the combinations (m, b) described herein is merely exemplary, and the invention is not limited thereto.
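The restriction to 9 candidates can be written out as a short sketch, assuming that both m and b are drawn from {2, 3, 4}. The constant names are chosen for illustration only.

```python
from itertools import product

# Assumed value ranges: m (beats per measure) and b (unit times per beat).
M_VALUES = (2, 3, 4)
B_VALUES = (2, 3, 4)

# Every (m, b) pair in those ranges is a candidate combination.
candidates = list(product(M_VALUES, B_VALUES))
```

Widening either range simply enlarges the candidate list, which corresponds to the modification in which the limitation is relaxed.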
In Step S44, the index calculation unit 14 acquires a priority P0[m, b] corresponding to the selected combination (m, b). The priority P0 is set in advance, and stored in the storage unit 16.
The description will be given with reference to
[Equation 5]
P[m,b]=S[b]+S[mb]+P0[m,b] (5)
As an example, in a case where (m, b) is (4, 4), the following Equation (6) is obtained.
[Equation 6]
P[4,4]=S[4]+S[16]+P0[4,4] (6)
The index calculation unit 14 stores the calculated index P[m, b] in the storage unit 16.
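A worked instance of Equations (5) and (6) is sketched below, reading both of the first two terms as the index S evaluated at the periods b and mb. The numeric values of S and P0 are made up for illustration, and the function name `index_p` is hypothetical.

```python
def index_p(S, P0, m, b):
    """P[m, b] = S[b] + S[m * b] + P0[m, b].

    S maps a period l to its index value; P0 maps (m, b) to a priority.
    A missing priority is treated as 0.0 (an assumption of this sketch).
    """
    return S[b] + S[m * b] + P0.get((m, b), 0.0)


# Made-up values: S[4] = 1.5, S[16] = 2.0, P0[4, 4] = 0.25.
S = {4: 1.5, 16: 2.0}
P0 = {(4, 4): 0.25}
p_44 = index_p(S, P0, 4, 4)  # 1.5 + 2.0 + 0.25
```

The first term rewards periodicity at the beat length, the second at the measure length, and the third biases the result toward combinations that are musically common.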
The index S[l] needs to be calculated up to S[mb], which corresponds to the product of m and b. In other words, when the maximum values of m and b are mmax and bmax, the period l needs to cover the range up to lmax given by the following Equation (7).
[Equation 7]
lmax=mmax·bmax (7)
For example, in a case where mmax=4 and bmax=4, the following Equation (8) is obtained.
[Equation 8]
l∈{2,3,4,6,8,9,12,16} (8)
Therefore, the index calculation unit 14 calculates the index S[l] in a range of Equation (8) in Step S42.
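The set of periods in Equation (8) follows from collecting every b and every product mb over the candidate ranges. The sketch below reproduces it, assuming m and b each range over {2, 3, 4}; the function name `required_periods` is hypothetical.

```python
def required_periods(m_values, b_values):
    """Return the sorted periods l at which S[l] is needed.

    P[m, b] uses S[b] and S[m * b], so both kinds of period are collected.
    """
    periods = set(b_values)
    periods.update(m * b for m in m_values for b in b_values)
    return sorted(periods)


periods = required_periods((2, 3, 4), (2, 3, 4))
# periods == [2, 3, 4, 6, 8, 9, 12, 16]
```

The largest element, 16, equals mmax·bmax, in agreement with Equation (7).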
In Step S46, the index calculation unit 14 determines whether the indexes P[m, b] have been calculated for all the candidates of the combinations (m, b). In a case where it is determined that there is a combination (m, b) for which the index P[m, b] has not been calculated (S46: NO), the index calculation unit 14 causes the process to move to Step S43. Thereafter, the combination (m, b) is updated, and the processes of Steps S44 and S45 are repeated. In a case where it is determined that the indexes P[m, b] have been calculated for all the candidates of the combinations (S46: YES), the index calculation unit 14 ends the flow of
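The loop over Steps S43 to S46 can be sketched as follows. The final selection of the combination with the largest index is an assumption of this sketch, made on the basis that the index indicates validity; the values of S are made up, and the function name `estimate_combination` is hypothetical.

```python
def estimate_combination(S, P0, candidates):
    """Compute P[m, b] for every candidate and pick the largest one.

    Choosing the maximum is an assumption of this sketch.
    """
    scores = {}
    for m, b in candidates:              # Step S43: select a combination
        prior = P0.get((m, b), 0.0)      # Step S44: look up the priority
        scores[(m, b)] = S[b] + S[m * b] + prior  # Step S45: index P[m, b]
    best = max(scores, key=scores.get)   # loop ends when all are scored (S46)
    return best, scores


# Made-up index values for the periods l in {2, 3, 4, 6, 8, 9, 12, 16}.
S = {2: 1.0, 3: 0.5, 4: 3.0, 6: 0.5, 8: 1.0, 9: 0.25, 12: 0.5, 16: 2.0}
candidates = [(m, b) for m in (2, 3, 4) for b in (2, 3, 4)]
best, scores = estimate_combination(S, {}, candidates)
```

With these made-up values, the combination (4, 4) obtains the largest index, since both S[4] and S[16] are large.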
The description will be given with reference to
In Step S6, the output unit 17 outputs information on the combination (m, b) which is estimated by the estimation unit 15. The information on the combination (m, b) is, for example, a beat (4/4 beat, 3/4 beat, etc.) of the musical piece related to the input sound signal. Alternatively, the output unit 17 outputs a parameter to specify the combination (m, b). In a case where the output destination is the user, the output information is, for example, displayed on a display. In a case where the output destination is another sound signal processing device, the output information is, for example, output as data.
Table 1 shows results of beat estimation obtained by a method (example) according to the embodiment and by a comparative example. The inventors of this application performed the beat estimation on actual musical pieces using the methods according to the example and the comparative example, and evaluated the accuracy rate. As the comparative example, an algorithm which estimates the beat of every musical piece as a 4/4 beat was used. As targets for the beat estimation, 100 popular songs were prepared. The musical pieces were classified into a 4-beat system (the numerator of the beat is a multiple of “2”) and a 3-beat system (the numerator of the beat is a multiple of “3”).
In the example, the accuracy rate for the musical pieces of the 4-beat system is slightly lower than in the comparative example. However, the accuracy rate for the musical pieces of the 3-beat system is dramatically improved. Therefore, the accuracy rate of the example as a whole is significantly improved compared to the comparative example.
The present invention is not limited to the embodiment, and various modifications can be made. Hereinafter, some modifications will be described. Two or more of the following modifications may be combined.
A specific computing method of the index P[m, b] is not limited to the one exemplified in the embodiment. For example, the priority P0 may not be taken into consideration. In other words, the third term on the right-hand side of Equation (5) may be omitted.
In the embodiment, the candidates of the combinations (m, b) are limited from a musical viewpoint, but such a limitation may be omitted. For example, “m” and “b” each may be set in a different range of usable values, and all of the possible combinations (m, b) in those ranges may be candidates. In this case, the priority P0 may be used to exclude the possibility that a combination (m, b) having no musical meaning is estimated as the most plausible. For example, a combination of (m, b)=(7, 3) corresponds to a 7/8 beat. However, since a 7/8 beat musical piece itself is rare, the priority P0 may be set to a low value (for example, a negative value).
In the example according to the embodiment, the number “m” indicates the number of beats contained in one measure. However, the number “m” may indicate the number of unit time lengths ta contained in one measure. In this case, the number “m” has to be an integer multiple of the number “b”. Therefore, a case where the number “m” is not an integer multiple of the number “b” may be excluded when the candidates of the combinations (m, b) are limited. Alternatively, the candidates of the combinations (m, b) may be left unlimited, and the priority P0 corresponding to a combination (m, b) of which the number “m” is not an integer multiple of the number “b” may be set to an extremely low value (for example, −∞).
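The alternative in which the candidates are left unlimited and unsuitable combinations are suppressed through the priority can be sketched as below. Here “m” is taken to count unit time lengths per measure, and the function name `build_priority` and the default priority of 0.0 are assumptions made for illustration.

```python
import math


def build_priority(m_values, b_values):
    """Build P0 so that combinations where m is not a multiple of b get -inf.

    Valid combinations receive 0.0 here (an assumed neutral priority).
    """
    P0 = {}
    for m in m_values:
        for b in b_values:
            P0[(m, b)] = 0.0 if m % b == 0 else -math.inf
    return P0


# Hypothetical ranges: m counts unit times per measure, b unit times per beat.
P0 = build_priority(range(2, 17), (2, 3, 4))
```

Because adding −∞ to any finite value yields −∞, such a combination can never have the largest index, so it is effectively excluded without filtering the candidate list itself.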
The tone feature amount is not limited to that described in the embodiment. For example, a feature amount other than the MFCC, such as a formant frequency and an LPC (Linear Predictive Coding) cepstrum, may be used.
The window function is not limited to that exemplified in Equation (3). Any function format may be employed as long as a spectrum of the surroundings of the period l can be cut out.
In the example according to the embodiment, a single processing device has all the functions of
A program executed by the CPU 101 of the sound signal processing device 1 may be provided by a memory medium such as an optical disk, a magnetic disk, and a semiconductor memory, or may be downloaded through a communication line such as the Internet. The program has no need to include all the steps of
Number | Date | Country | Kind |
---|---|---|---|
2016-048562 | Mar 2016 | JP | national |
This application is a continuation of the international patent application No. PCT/JP2017/009745 which was filed on Mar. 10, 2017, claiming the benefit of priority of Japanese Patent Application No. 2016-048562 filed on Mar. 11, 2016, the contents of which are incorporated herein by reference in its entirety.
Number | Name | Date | Kind |
---|---|---|---|
8344234 | Sugai | Jan 2013 | B2 |
9087501 | Maezawa | Jul 2015 | B2 |
9280961 | Eronen | Mar 2016 | B2 |
20080115656 | Sumita | May 2008 | A1 |
20110255700 | Maxwell | Oct 2011 | A1 |
20140338515 | Sheffer | Nov 2014 | A1 |
20140358265 | Wang | Dec 2014 | A1 |
20150094835 | Eronen | Apr 2015 | A1 |
Number | Date | Country |
---|---|---|
2000221979 | Aug 2000 | JP |
2002116454 | Apr 2002 | JP |
2007052394 | Mar 2007 | JP |
2008275975 | Nov 2008 | JP |
2015114361 | Jun 2015 | JP |
2015200803 | Nov 2015 | JP |
2009125489 | Oct 2009 | WO |
Entry |
---|
English translation of Written Opinion issued in Intl. Appln. No. PCT/JP2017/009745 dated May 23, 2017, previously cited in IDS filed Aug. 30, 2018. |
International Search Report issued in Intl. Appln. No. PCT/JP2017/009745 dated May 23, 2017. English translation provided. |
Written Opinion issued in Intl. Appln. No. PCT/JP2017/009745 dated May 23, 2017. |
Shoji et al. “Downbeat Estimation of Acoustic Signals of Music with Irregular Meter.” Journal of the Acoustical Society of Japan. Dec. 1, 2012: 595-604. vol. 68, No. 12. |
Number | Date | Country | |
---|---|---|---|
20180374463 A1 | Dec 2018 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/JP2017/009745 | Mar 2017 | US |
Child | 16117154 | | US |