The present invention relates to a technique for detecting or identifying, from a sound signal, a repetition of a plurality of portions that are similar to each other in musical character.
Heretofore, there have been proposed various techniques for identifying, from a music piece, a portion where a musical character of performance tones satisfies a predetermined condition. Japanese Patent Application Laid-open Publication No. 2004-233965, for example, discloses a technique for identifying a refrain (or chorus) portion of a music piece by appropriately putting together a plurality of portions of a sound signal, obtained by recording performance tones of the music piece, which are similar to each other in musical character.
The technique disclosed in the No. 2004-233965 publication can identify with a high accuracy a refrain portion of a music piece if the music piece is simple and clear in musical construction (e.g., a pop or rock music piece having clear introductory and refrain portions) and the refrain portion continues for a relatively long time (i.e., has a relatively long duration). However, with the technique disclosed in the No. 2004-233965 publication, which is only intended to identify a refrain portion of a music piece, it is difficult to identify with a high accuracy a particular portion of a music piece where one or more portions each having a short time length (i.e., short-time portions) are repeated successively, e.g., a piece of electronic music where performance tones of a bass or rhythm guitar are repeated in one or more short-time portions each having a time length of about one or two measures.
In view of the foregoing, it is an object of the present invention to provide a technique which can also identify with a high accuracy a portion of a music piece where a short-time portion is repeated.
In order to accomplish the above-mentioned object, the present invention provides an improved sound signal processing apparatus for identifying a loop region where a similar musical character is repeated in a sound signal, which comprises: a character extraction section that divides the sound signal into a plurality of unit portions and extracts a character value of the sound signal for each of the unit portions; a degree of similarity calculation section that calculates degrees of similarity between the character values of individual ones of the unit portions; a first matrix generation section that generates a degree of similarity matrix by arranging the degrees of similarity between the character values of the individual unit portions, calculated by the degree of similarity calculation section, in a matrix configuration, the degree of similarity matrix having arranged in each column thereof the degrees of similarity acquired by comparing, for each of the unit portions, the sound signal and a delayed sound signal obtained by delaying the sound signal by a time difference equal to an integral multiple of a time length of the unit portion, the degree of similarity matrix having a plurality of the columns in association with different time differences equal to different integral multiples of the time length of the unit portion; a probability calculation section that, for each of the columns corresponding to the different time differences in the degree of similarity matrix, calculates a repetition probability indicative of a level of similarity on the basis of the degree of similarity; a peak identification section that identifies a plurality of peaks in a distribution of the repetition probabilities calculated by the probability calculation section; a second matrix generation section that generates a reference matrix having a plurality of columns corresponding to different time differences equal to different integral multiples of the time length of the unit portion 
and having predetermined reference values arranged in the columns associated with positions of the time differences where the plurality of peaks identified by the peak identification section are located; and a collation section that identifies the loop region in the sound signal by collating the reference matrix with the degree of similarity matrix.
Because the sound signal processing apparatus of the present invention is arranged to identify the loop region by collating, with the degree of similarity matrix, the reference matrix set in accordance with the positions of the individual peaks in the distribution of the repetition probabilities calculated from the degree of similarity matrix, it can also identify with a high accuracy a loop region comprising repeated portions each having a short time length.
In a preferred embodiment, the collation section includes: a correlation calculation section that calculates correlation values along a time axis of the sound signal by applying the reference matrix to the degree of similarity matrix, and a sound signal portion identification section that identifies the loop region on the basis of peaks in a distribution of the correlation values calculated by the correlation calculation section.
Further, in a preferred embodiment, the peak identification section includes: a period identification section that identifies a period of the peaks in the distribution of the repetition probabilities; and a peak selection section that selects a plurality of peaks appearing with the period, identified by the period identification section, in the distribution of the repetition probabilities. The period identification by the period identification section may be performed using a conventionally-known technique, such as auto-correlation arithmetic operations or frequency analysis (e.g., Fourier transform).
If the number of the peaks to be identified from the distribution of the repetition probabilities is too great (namely, if the size of the reference matrix is too great), it would be difficult to detect a loop region of a relatively short time length. If, on the other hand, the number of the peaks to be identified from the distribution of the repetition probabilities is too small, too many sound signal portions including short-time repetitions would be detected as loop regions. Thus, in a preferred embodiment of the present invention, the peak identification section limits, to within a predetermined range, the total number of the peaks to be identified from the distribution of the repetition probabilities. Because the total number of the peaks to be identified by the peak identification section is limited to within the predetermined range in this manner, the sound signal processing apparatus can advantageously identify each loop region of a suitable time length with a high accuracy. For example, in order to detect, as a loop region, a short-time repetition as well, the total number of the peaks to be identified is limited to below a predetermined threshold value, while, in order to prevent a short-time repetition from being detected as a loop region, the total number of the peaks to be identified is limited to above a predetermined threshold value.
Loop region identification based on the positions of peaks in the distribution of the correlation values may be performed in any desired manner. For example, the portion identification section may identify, as a loop region, a sound signal portion running from a time point of a peak in the distribution of the correlation values to a time point when a reference length corresponding to a size of the reference matrix terminates. However, in a case where a loop region lasts over a time length exceeding the size of the reference matrix, a peak detected from the distribution of the correlation values is likely to have a flat top. Thus, when a peak having a flat top is detected, the portion identification section of the present invention preferably identifies, as a loop region, a sound signal portion having a start point that coincides with the leading edge of the peak and an end point that coincides with a time point located a reference length, corresponding to the size of the reference matrix, after the trailing edge of the peak.
The sound signal processing apparatus of the present invention may be implemented not only by hardware (electronic circuitry), such as a DSP (Digital Signal Processor) dedicated to processing of input sounds, but also by cooperation between a general-purpose arithmetic operation processing device, such as a CPU (Central Processing Unit), and a program. The program of the present invention is a program for causing a computer to perform a process for identifying a loop region, where a plurality of repeated portions are arranged, from a sound signal, which comprises: a character extraction operation for extracting a character value of the sound signal for each of unit portions of the signal; a degree of similarity calculation operation for calculating degrees of similarity between the character values of the individual unit portions; a first matrix generation operation for generating a degree of similarity matrix by arranging the degrees of similarity between the character values of the individual unit portions in a matrix configuration (i.e., in a plane including a time axis and a time difference axis), the degree of similarity matrix having arranged in each column (similarity column line corresponding to a high degree-of-similarity portion of the sound signal) thereof the degrees of similarity acquired by comparing, for each of the unit portions, the sound signal and a delayed sound signal obtained by delaying the sound signal by a time difference equal to an integral multiple of a time length of the unit portion; a probability calculation operation for, for each of the time differences in the degree of similarity matrix, calculating a repetition probability corresponding to a ratio of the high degree-of-similarity portion; a peak identification operation for identifying a plurality of peaks in a distribution of the repetition probabilities; a second matrix generation operation for generating a reference matrix having a plurality of reference column lines at 
positions of the peaks identified by the peak identification operation; a correlation calculation operation for, for each of a plurality of time points on the time axis of the degree of similarity matrix, calculating a correlation value between the reference column line of the reference matrix and the similarity column line of the degree of similarity matrix; and a portion identification operation for identifying a loop region on the basis of peaks in a distribution of the correlation values. The program of the present invention may not only be supplied to a user stored in a computer-readable storage medium and then installed in a user's computer, but also be delivered to a user from a server apparatus via a communication network and then installed in a user's computer.
The following will describe embodiments of the present invention, but it should be appreciated that the present invention is not limited to the described embodiments and various modifications of the invention are possible without departing from the basic principles. The scope of the present invention is therefore to be determined solely by the appended claims.
For better understanding of the object and other features of the present invention, its preferred embodiments will be described hereinbelow in greater detail with reference to the accompanying drawings, in which:
The sound processing apparatus 100 identifies a loop region of a sound signal V supplied from the signal generation device 12. As seen in
As shown in
Character extraction section 22 of
Degree of similarity calculation section 24 calculates numerical values (hereinafter referred to as “degrees of similarity”) SM, which are indices of similarity, by comparing the sound character values F of individual unit portions with each other. More specifically, the degree of similarity calculation section 24 calculates a degree of similarity in sound character value F between every pair of unit portions. If the sound character values F are represented as vectors, a Euclidean distance or the cosine of the angle between the sound character values F of every pair of the unit portions to be compared is calculated (or evaluated) as the degree of similarity SM.
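By way of illustration only, the pairwise comparison described above may be sketched as follows. The function names and the list-of-lists representation of the character values F are assumptions made for this sketch and do not appear in the embodiment. Note that a Euclidean distance becomes smaller as two unit portions become more similar, whereas a cosine value becomes greater, so the two measures would need opposite handling in any later threshold comparison.

```python
import math

def cosine_similarity(f1, f2):
    """Cosine of the angle between two character-value vectors."""
    dot = sum(a * b for a, b in zip(f1, f2))
    n1 = math.sqrt(sum(a * a for a in f1))
    n2 = math.sqrt(sum(b * b for b in f2))
    if n1 == 0.0 or n2 == 0.0:
        return 0.0  # assumption: a silent (all-zero) unit portion matches nothing
    return dot / (n1 * n2)

def euclidean_distance(f1, f2):
    """Euclidean distance between two character-value vectors (smaller = more similar)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))

def pairwise_similarity(features):
    """Degree of similarity SM for every pair of unit portions (cosine variant)."""
    n = len(features)
    return [[cosine_similarity(features[i], features[j]) for j in range(n)]
            for i in range(n)]
```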
The matrix generation section 26 of
In other words, in the degree of similarity matrix MA, degrees of similarity obtained by comparing, for each of the unit portions, the sound signal V and a delayed sound signal obtained by delaying the sound signal V by a time corresponding to an integral multiple of the time length of the unit portion are put in a column, and a plurality of such columns are included in the matrix MA in association with the time differences corresponding to different integral multiples of the time length of the unit portion. Namely, the time axis T is a row axis, while the time difference axis D is a column axis. The “shift amount d” is a delay time whose minimum length is equal to the time length of the unit portion.
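The arrangement of the degree of similarity matrix MA along the time axis T (rows) and the time difference axis D (columns) may be sketched as below. This is a minimal sketch under stated assumptions: `time_lag_matrix`, `max_lag` and the helper `cosine` are hypothetical names, the shift amount d is counted in whole unit portions starting from one, and entries for which the delayed signal has not yet started are simply set to zero.

```python
import math

def cosine(f1, f2):
    """Cosine similarity used here as the degree of similarity SM (an assumption)."""
    dot = sum(a * b for a, b in zip(f1, f2))
    norm = math.sqrt(sum(a * a for a in f1)) * math.sqrt(sum(b * b for b in f2))
    return dot / norm if norm else 0.0

def time_lag_matrix(features, max_lag, similarity=cosine):
    """Build MA with rows = time axis T and columns = time difference axis D.

    Column index d - 1 holds the comparison between the signal at unit
    portion t and the signal delayed by d unit portions, i.e. portion t - d.
    """
    n = len(features)
    ma = [[0.0] * max_lag for _ in range(n)]
    for t in range(n):
        for d in range(1, max_lag + 1):
            if t - d >= 0:  # the delayed signal exists only from portion d onward
                ma[t][d - 1] = similarity(features[t], features[t - d])
    return ma
```

With character values that alternate every unit portion, the column for d = 2 fills with high values while the columns for d = 1 and d = 3 stay low, which is the kind of column structure the subsequent processing relies on.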
Because the portion s1 (t1-t2) and portion s2 (t2-t3) are similar to each other in character value F between their respective unit portions as illustrated in
As shown in
Note that, in a case where the degree of similarity SM is high only in a small number of unit portions, some area of the degree of similarity matrix MA where the second values b2 are distributed may be dotted with a few first values b1. Further, in practice, even portions musically similar to each other may be dissimilar in character value F to each other in only a few unit portions, and thus, some arrays of the first values b1 may be spaced from each other with a slight interval (i.e., interval corresponding to an area of the second values b2) along the time axis T. The filter process (Morphological Filtering) performed by the noise sound removal section 264 includes an operation for removing the first values b1, distributively located in the T-D plane, following the threshold value process, and an operation for interconnecting a plurality of the arrays of the first values b1 that are located in spaced-apart relation to each other with a slight interval along the time axis T. Namely, the noise sound removal section 264 removes, as noise, the first values b1 other than those values constituting the similarity column line GA exceeding a predetermined length. Through the aforementioned processing, the degree of similarity matrix MA of
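The threshold value process and the two filter operations described above may be illustrated by the following sketch. It is not the filter of the embodiment: the names `binarize`, `clean_column` and `filter_matrix`, the parameters `max_gap` and `min_len`, the choice of b1 = 1 and b2 = 0, and the order of the operations (gaps are bridged first, short arrays are removed afterwards) are all assumptions.

```python
def binarize(ma, threshold):
    """Threshold value process: first value b1 = 1 where SM is high, else b2 = 0."""
    return [[1 if v >= threshold else 0 for v in row] for row in ma]

def clean_column(col, max_gap, min_len):
    """Filter one column (fixed time difference) along the time axis T."""
    col = list(col)
    n = len(col)
    # Operation 1: interconnect arrays of 1s separated by a slight interval.
    i = 0
    while i < n:
        if col[i] == 0:
            j = i
            while j < n and col[j] == 0:
                j += 1
            # Bridge the gap [i, j) only if it has 1s on both sides and is short.
            if i > 0 and j < n and (j - i) <= max_gap:
                for k in range(i, j):
                    col[k] = 1
            i = j
        else:
            i += 1
    # Operation 2: remove, as noise, arrays of 1s shorter than min_len.
    i = 0
    while i < n:
        if col[i] == 1:
            j = i
            while j < n and col[j] == 1:
                j += 1
            if (j - i) < min_len:
                for k in range(i, j):
                    col[k] = 0
            i = j
        else:
            i += 1
    return col

def filter_matrix(binary, max_gap, min_len):
    """Apply clean_column to every column of a binarized matrix MA."""
    columns = [clean_column(col, max_gap, min_len) for col in zip(*binary)]
    return [list(row) for row in zip(*columns)]
```

For instance, a column `[1, 1, 0, 1, 1, 0, 0, 0, 1, 0]` filtered with `max_gap=1` and `min_len=3` keeps one interconnected array of five first values and drops the isolated first value near the end.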
Probability calculation section 32 of
In
The peak identification section 34 includes a period identification section 344 and a peak selection section 346. The period identification section 344 identifies a period TR of the peaks PR in the repetition probability distribution r, using auto-correlation arithmetic operations performed on the repetition probability distribution r. Namely, while moving (i.e., shifting) the repetition probability distribution r along the time difference axis D, the period identification section 344 first calculates a correlation value CA between the repetition probability distributions r before and after the shifting, to thereby identify the relationship between the shift amount Δ and the correlation value CA.
Then, the period identification section 344 identifies a period TR of the peaks PR in the repetition probability distribution r on the basis of results of the auto-correlation arithmetic operations. For example, the period identification section 344 calculates intervals Δp between a plurality of adjoining peaks, as counted from a point at which the shift amount is zero, of a multiplicity of peaks appearing in a distribution of the correlation values CA, and it determines a maximum value of the intervals Δp as the period TR of the peaks PR in the repetition probability distribution r.
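The period identification by auto-correlation may be sketched as follows. The sketch assumes that the repetition probability distribution r is given as a list indexed by time difference, that the zero-shift point is treated as the first peak position when the intervals Δp are measured, and that only the first few peaks (`num_peaks`, a hypothetical parameter) are examined; none of these details is fixed by the description above.

```python
def autocorrelation(r):
    """Correlation value CA for every shift amount along the time difference axis."""
    n = len(r)
    return [sum(r[i] * r[i + lag] for i in range(n - lag)) for lag in range(n)]

def local_peaks(values):
    """Indices of interior local maxima (the zero-shift point is excluded)."""
    return [i for i in range(1, len(values) - 1)
            if values[i] > values[i - 1] and values[i] >= values[i + 1]]

def identify_period(r, num_peaks=4):
    """Period TR taken as the maximum interval between adjoining CA peaks."""
    ca = autocorrelation(r)
    peaks = local_peaks(ca)[:num_peaks]
    positions = [0] + peaks  # assumption: intervals are counted from zero shift
    intervals = [b - a for a, b in zip(positions, positions[1:])]
    return max(intervals) if intervals else None
```

A distribution with peaks at time differences of 4, 8, 12 and 16 unit portions would yield CA peaks at shifts of 4, 8 and 12, and hence a period of 4.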
Peak selection section 346 of
The peak selection section 346 limits the number m of the peaks PR, which are to be selected from the probability distribution r, to no more than a threshold value TH1 (e.g., TH1=5). For example, if the number of the peaks PR detected from the probability distribution r is greater than the threshold value TH1, then m (m=TH1) peaks PR close to the original point of the time difference axis D are selected. In a case where the music piece does not include any clear loop region L, the number of the peaks PR in the probability distribution r is small, and thus, if the number m of the peaks PR detected from the probability distribution r is smaller than a predetermined threshold value TH2 (TH2<TH1, e.g., TH2=3), the peak selection section 346 informs a user, through image display or voice output, that the music piece does not include any loop region L. Namely, the number m of the peaks PR ultimately selected by the peak selection section 346 is limited to within a range of equal to or smaller than the threshold value TH1 but equal to or greater than the threshold value TH2. The threshold value TH1 and threshold value TH2 are variably controlled in accordance with a user's instruction. The following description assumes that the peak identification section 34 has identified four peaks PR (i.e., m=4).
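The limitation of the number m of the peaks PR to the range between the threshold values TH2 and TH1 may be sketched as below. The function name `select_peaks` and the use of `None` to signal that no loop region L is present are assumptions of this sketch; peak positions are assumed to be expressed as time differences in unit portions.

```python
def select_peaks(peak_positions, th1=5, th2=3):
    """Keep at most th1 peaks nearest the origin of the time difference axis D.

    Returns None when fewer than th2 peaks were detected, in which case the
    caller would inform the user that the music piece has no loop region L.
    """
    ordered = sorted(peak_positions)
    if len(ordered) < th2:
        return None
    return ordered[:th1]
```

With the example values TH1=5 and TH2=3, seven detected peaks are reduced to the five nearest the origin, four detected peaks are kept unchanged, and two detected peaks lead to the "no loop region" result.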
Matrix generation section 36 of
As shown in
Then, the matrix generation section 36 generates a reference matrix MB by setting at the first value b1 (that is a predetermined reference value, such as “1”) each of M numerical values belonging to the m peak correspondent columns Cp and located from a positive diagonal line (i.e., straight line extending from the first-row-first-column position to the M-th-row-M-th-column position) to the M-th row, and setting at the second value b2 (e.g., “0”) each of the other numerical values belonging to the m peak correspondent columns Cp. In
As noted above, column lines (hereinafter referred to as “reference column lines”) GB where the first reference values b1 (=1) are arranged are set in the individual peak-correspondent columns Cp of the reference matrix MB. Peaks PR appear in the repetition probability distribution r with a period corresponding to each of the repeated portions SR within the loop regions L. Thus, there is a high possibility that similarity column lines GA exist, in a similar manner to the reference column lines GB of the reference matrix MB, in areas of the degree of similarity matrix MA where the loop regions L are present.
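The layout of the reference matrix MB may be sketched as follows. Because the derivation of the size M is not reproduced in the passage above, M is simply taken as a parameter here; it is further assumed that column j corresponds to a time difference of j unit portions and that the peak positions are given as 1-based column indices. The name `reference_matrix` is hypothetical.

```python
def reference_matrix(size_m, peak_columns, b1=1, b2=0):
    """Build an M-by-M matrix MB with reference column lines GB.

    In each peak-correspondent column Cp, the entries from the positive
    diagonal (row == column) down to the M-th row are set to b1; every
    other entry is set to b2.
    """
    mb = [[b2] * size_m for _ in range(size_m)]
    for col in peak_columns:                 # 1-based column index
        for row in range(col, size_m + 1):   # diagonal position to M-th row
            mb[row - 1][col - 1] = b1
    return mb
```

For M=4 and peaks at columns 2 and 4, the second column carries the first value from the second row downward and the fourth column carries it only in the fourth row, so that the reference column lines GB start on the diagonal.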
In
The correlation calculation section 42 of
The correlation value CB is a numerical value functioning as an index of correlation (similarity) in form (interval and total length) between the arrangement of the individual reference column lines GB of the reference matrix MB and the arrangement of the individual similarity column lines GA of the degree of similarity matrix MA. For example, the correlation value CB is calculated by adding together a plurality of (i.e., M×M) numerical values obtained by multiplying together corresponding pairs of the numerical values (b1 and b2) in the reference matrix MB and the degrees of similarity SM (b1 and b2) in an M-row-M-column area of the degree of similarity matrix MA which overlaps the reference matrix MB.
Through the aforementioned process, the correlation value CB (i.e., relationship between the time axis T and the correlation value CB) is calculated for each of a plurality of time points on the time axis T of the degree of similarity matrix MA. As understood from the description about the aforementioned correlation value CB, the correlation value CB takes a greater value as the individual reference column lines GB of the reference matrix MB and the similarity column lines GA in the area of the degree of similarity matrix MA corresponding to the reference matrix MB are more similar in form.
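The calculation of the correlation value CB for each time point may be sketched as below. The sketch assumes that MA is stored with rows along the time axis T and at least M columns along the time difference axis D, that MB is an M-by-M list of lists, and that the reference matrix is moved one unit portion at a time; the name `correlation_values` is hypothetical.

```python
def correlation_values(ma, mb):
    """Slide MB along the time axis of MA and sum the element-wise products.

    The value at index t0 is the sum of the M x M products between MB and
    the area of MA that starts at row t0 and overlaps MB.
    """
    m = len(mb)
    n = len(ma)
    cb = []
    for t0 in range(n - m + 1):
        total = 0
        for i in range(m):
            for j in range(m):
                total += mb[i][j] * ma[t0 + i][j]
        cb.append(total)
    return cb
```

When an area of MA contains similarity column lines GA arranged exactly like the reference column lines GB, the value at the corresponding time point equals the number of first values in MB and is greater than the values at neighboring time points.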
The portion identification section 44 of
As shown in (b) of
If the time length (i.e., “reference time length”) of the reference matrix MB, corresponding to the number M of the rows of the reference matrix MB, agrees with the time length of a loop region L of the music piece, the correlation value CB increases only when the reference matrix MB is superposed on the loop region L on the time axis T. Thus, a peak PC (PC1) having a sharp top appears in the distribution of the correlation values CB, as shown in (b) of
The portion determination section 446 identifies a loop region L on the basis of the position LP detected by the peak detection section 444. When the peak detection section 444 has detected the position LP of a sharp peak PC (PC1), the portion determination section 446 identifies, as a loop region (i.e., group of m repeated portions SR) L, a portion (music piece portion or sound signal portion) running from the position LP to a time point at which the reference time length W terminates. Once the peak detection section 444 detects the position LP of the trailing edge of a flat peak PC (PC2 or PC3), the portion determination section 446 identifies, as a loop region L, a portion (music piece portion or sound signal portion) running from the leading edge of the peak PC to a time point at which the reference time length W, measured from the position LP of the trailing edge, terminates. Namely, if the peak PC is flat, the loop region L is a portion that comprises an interconnected combination of a given number of repeated portions SR corresponding to a portion running from the leading edge to the trailing edge of the peak PC and m repeated portions SR.
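The handling of sharp and flat peaks PC may be sketched as follows. The way the peak detection section 444 actually detects peaks is not reproduced here; the sketch merely assumes that a peak is a maximal run of correlation values CB at or above a hypothetical `threshold`, so that a sharp peak is a run of length one and a flat peak is a longer run. Time points are indices in unit portions, and the end point of each region is exclusive.

```python
def find_plateaus(cb, threshold):
    """Return (leading_edge, trailing_edge) pairs of runs where cb >= threshold."""
    runs = []
    start = None
    for t, value in enumerate(cb):
        if value >= threshold and start is None:
            start = t
        elif value < threshold and start is not None:
            runs.append((start, t - 1))
            start = None
    if start is not None:
        runs.append((start, len(cb) - 1))
    return runs

def loop_regions(cb, threshold, ref_len):
    """Loop region L = leading edge of the peak to trailing edge plus W.

    For a sharp peak the two edges coincide, so the region runs from the
    peak position LP for one reference time length W.
    """
    return [(lead, trail + ref_len) for lead, trail in find_plateaus(cb, threshold)]
```

For example, with `cb = [0, 1, 4, 1, 0, 4, 4, 4, 0]`, a threshold of 4 and a reference length of 4, the sharp peak at index 2 gives the region (2, 6) and the flat peak spanning indices 5 to 7 gives the longer region (5, 11).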
Because the reference matrix MB, set in accordance with the positions of the individual peaks PR of the probability distribution r calculated from the degree of similarity matrix MA, is used to identify a loop region L, the instant embodiment can also detect with a high accuracy a loop region L comprising repeated portions SR each having a short time length.
If the number m of the peaks PR to be used for generation of the reference matrix MB is too great (namely, if the reference column lines GB of the reference matrix MB are too many), there would arise the problem that only a loop region L in which the similarity column lines GA remain similar to the reference matrix MB over a long time is detected, so that a loop region L of a relatively short time length is difficult to detect. If, on the other hand, the number m of the peaks PR to be used for generation of the reference matrix MB is too small, there would arise the problem that an excessively great number of loop regions L are detected. However, the instant embodiment, where the number m of the peaks PR to be used for generation of the reference matrix MB is limited to the range between the threshold value TH2 and the threshold value TH1, can advantageously detect loop regions L each having an appropriate time length.
Further, in the instant embodiment, peaks PC having a flat top, in addition to peaks PC having a sharp top, can be detected from the distribution of the correlation values CB, and, for such a peak PC having a flat top, a sound signal portion running from the leading edge of the peak PC to the time point when the reference length W, measured from the trailing edge (position LP), terminates is detected as a loop region L. As a consequence, even a loop region having a time length exceeding the reference length W can be detected with a high accuracy.
The above-described embodiment of the present invention may be modified variously as set forth below by way of example, and such modifications may be combined as desired.
The method for detecting peaks PR from the repetition probability distribution r may be modified as desired. For example, the period identification section 344 of the peak identification section 34 may identify, as the period TR, an interval from the original point of the shift amount Δ (i.e., “Δ=0” point) to the point of the maximum value (peak) of the correlation values CA in the distribution of the correlation values CA, as shown in
Further, the method for identifying the period TR of the peaks PR appearing in the probability distribution r is not limited to the aforementioned scheme using auto-correlation arithmetic operations. For example, there may be employed an arrangement that identifies a frequency spectrum (or cepstrum) of the probability distribution r by performing frequency analysis, such as the Fourier transform, and identifies the period TR from frequencies of peaks in the identified frequency spectrum.
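The frequency-analysis alternative may be sketched as below using a plain discrete Fourier transform. The sketch assumes that the mean of the probability distribution r is removed first and that the period TR is read from the single strongest non-zero-frequency bin; these choices, and the names `dft_magnitudes` and `period_from_spectrum`, are illustrative assumptions rather than the arrangement of the modification itself.

```python
import cmath

def dft_magnitudes(r):
    """Magnitude spectrum of r (mean removed) for bins 0 .. N/2."""
    n = len(r)
    mean = sum(r) / n
    centered = [v - mean for v in r]
    return [abs(sum(centered[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def period_from_spectrum(r):
    """Period TR, in unit portions, from the strongest non-DC spectral bin."""
    mags = dft_magnitudes(r)
    k = max(range(1, len(mags)), key=lambda i: mags[i])
    return len(r) / k
```

A distribution containing sharp peaks also has strong harmonics in its spectrum, so a practical arrangement would probably need to prefer the lowest of several comparably strong bins rather than simply the strongest one.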
Results of the loop region detection may be used in any desired manners. For example, a new music piece may be made by appropriately interconnecting individual repeated portions SR of loop regions L detected by the sound processing apparatus 100. Results of the loop region detection may also be used in analysis of the organization of the music piece, such as measurement of a ratio of the loop regions L.
This application is based on, and claims priority to, JP PA 2008-037654 filed on 19 Feb. 2008. The disclosure of the priority application, in its entirety, including the drawings, claims, and the specification thereof, is incorporated herein by reference.
Number | Date | Country | Kind |
---|---|---|---|
2008-037654 | Feb 2008 | JP | national |