This application claims priority based on 35 USC 119 from prior Japanese Patent Application No. P2007-078956 filed on Mar. 26, 2007, the entire contents of which are incorporated herein by reference.
1. Field of the Invention
The present invention relates to an apparatus which detects music (musical piece) sections from an audio signal including speech sections and music sections in a mixed manner.
2. Description of Related Art
In general, an aired audio often includes sections carrying speeches of an announcer and music sections in a mixed manner. When a listener wishes to record his/her favorite musical piece while listening to the audio, the listener has to manually start recording the musical piece at the timing when the musical piece begins, and to manually stop recording at the timing when the musical piece ends. These manual operations are troublesome for the listener. Moreover, if a listener suddenly decides to record a favorite musical piece which is being aired, it is usually impossible to record the musical piece thoroughly from its beginning without missing any part. In such a case, it is effective to record an entire aired program first, and then extract the favorite musical piece from the recorded program by editing. This editing becomes easier by separating music sections from the aired program beforehand and by playing back only the separated music sections.
To this end, a technology has been proposed for automatically separating music sections and speech sections from each other by analyzing characteristics of each of the sections. A technology disclosed by Japanese Patent Application Laid-Open Publication No. 2004-258659 separates a musical piece and a speech from each other by using characteristic amounts in terms of frequencies such as mel-frequency cepstral coefficients (MFCCs). However, the technology disclosed by the Publication No. 2004-258659 has a problem that the process for calculating the characteristic amount in the frequency area of an audio signal is so complicated that the computational workload for the process becomes large.
An aspect of the invention provides an apparatus implementing at least recording or playback that detects a music section from an audio signal. The apparatus comprises: a cut point detector configured to detect, as a cut point, a time point where a level of an audio signal or an amount of change in the audio signal level is equal to or more than a predetermined value; a frequency characteristic amount calculator configured to calculate a characteristic amount in a frequency area of the audio signal; a cut point judging unit configured to judge an attribute of the cut point on the basis of the calculated characteristic amount in the frequency area; and a music section detector configured to detect a start point and an end point of a music section on the basis of the attribute and an interval between sampling points.
Another aspect of the invention provides an apparatus implementing at least recording or playback that detects a music section from an audio signal. The apparatus comprises: a cut point detector configured to detect, as a cut point, a time point where a level of an audio signal or an amount of change in the audio signal level is equal to or more than a predetermined value; a frequency characteristic amount calculator configured to calculate a characteristic amount in a frequency area of the audio signal; and a music section detector configured to detect a start point and an end point of each music section on the basis of the calculated characteristic amount of the frequency and information on the detected cut point.
Still another aspect of the invention provides a musical piece detecting apparatus that detects a musical piece from an inputted audio. The apparatus comprises: an audio power calculator configured to calculate an audio power from an inputted audio signal; a cut point detector configured to detect, as a cut point on the basis of the audio power, a time point where a level of an audio signal or an amount of change in the audio signal level is equal to or more than a predetermined value, the cut point detector configured to output time information on the cut point; a frequency characteristic amount calculator configured to calculate a characteristic amount in a frequency area at the detected cut point of the inputted audio signal; a likelihood calculator configured to calculate a likelihood between the characteristic amount and reference data on the musical piece; a cut point judging unit configured to judge, on the basis of the likelihood, whether or not the audio signal at the cut point is the musical piece; a time length judging unit configured to judge, on the basis of the time information on the cut point, a result of the judgment made by the cut point judging unit, the time length judging unit judging whether or not a section between sections not judged as musical pieces lasts for a predetermined time length or longer; and a music section detector configured to detect a music section on the basis of a result of the judgment made by the time length judging unit.
The recording or playback apparatus is capable of separating the musical piece from the audio consisting of the musical piece and the speech through a simple arithmetic process.
Descriptions will be provided hereinbelow for an embodiment with reference to the drawings.
MPEG audio layer-3 (MP3) codec 3 includes an encoder function and a decoder function. The encoder function encodes the digital audio data to generate compressed coded data, and outputs the compressed coded data along with time information. The decoder function decodes the coded data. D/A (digital-to-analog) converter 4 converts the digital audio data, which is decoded by MP3 codec 3, to analog signal data. Subsequently, this analog signal data is inputted into speaker 5 via an amplifier, whose illustration is omitted from
On the basis of the audio signal, DSP (digital signal processor) 7 calculates an audio power, obtained by raising a value representing the amplitude of the audio signal to the second power, for the purpose of detecting the audio signal level. In addition, DSP 7 calculates an amount of change in the audio power in order to detect an amount of change in the audio signal level. Furthermore, DSP 7 defines, as a cut point, a timing at which the amount of change in the audio power is not smaller than a predetermined value, and thus detects the cut point. Moreover, DSP 7 calculates a characteristic amount in a frequency area, an MFCC for example, only at each cut point and in its proximity. Then, DSP 7 calculates a likelihood between the characteristic amount and an MFCC calculated on the basis of a sample audio signal.
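The power-based cut point detection described above can be sketched in Python as follows. This is an illustrative sketch only: the function name `detect_cut_points`, the frame length, and the threshold are assumptions for demonstration, not values disclosed for the actual apparatus.

```python
import numpy as np

def detect_cut_points(samples, frame_len=1024, change_threshold=0.5):
    """Frame-wise audio power (amplitude raised to the second power) and
    cut points where the change in power is not smaller than a threshold.
    Frame length and threshold are hypothetical values."""
    n_frames = len(samples) // frame_len
    frames = np.reshape(samples[:n_frames * frame_len], (n_frames, frame_len))
    power = np.mean(frames ** 2, axis=1)       # audio power per frame
    change = np.abs(np.diff(power))            # amount of change in the power
    cut_frames = np.nonzero(change >= change_threshold)[0] + 1
    return power, cut_frames
```

A frame whose power differs sharply from the preceding frame is marked as a cut point, mirroring the role of cut point detector 72.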
Through bus 6, CPU (central processing unit) 8 controls the overall operation of the recording or playback apparatus according to the present embodiment. In addition, CPU 8 performs processes such as estimating whether the cut point corresponds to the start point or the end point of the musical piece. HDD (hard disc drive) 10 is a large-capacity storage in which the coded data and the time information are stored via HDD interface 9, which is an ATA (advanced technology attachment) interface. Memory 11 stores the execution program, temporarily stores data generated through the arithmetic process, and delays the audio data for a predetermined time length right after the audio data is converted from analog to digital. It should be noted that various pieces of data are transmitted to, and received from, MP3 codec 3, DSP 7, CPU 8, HDD interface 9 and memory 11 via bus 6.
The digital audio data from A/D converter 2 is stored in delay memory 11a, which delays the digital audio data by a time length equivalent to the time needed for DSP 7 to perform its process. Concurrently, audio power calculator 71 in DSP 7 calculates the audio power equivalent to the audio signal level, that is, a value obtained by raising the value representing the amplitude of the audio signal to the second power.
Cut point detector 72 in DSP 7 detects, as a cut point, a timing at which the amount of change in the audio signal level is large, that is, a timing at which the amount of change in the audio signal level is not smaller than the predetermined value, and outputs a result of the detection. Concurrently, the time information and the amount of change at the cut point are stored in temporary storage memory 11c.
Frequency characteristic amount calculator 73 synchronizes the audio data, which is outputted from delay memory 11a delayed by the predetermined time, with the output from cut point detector 72. Then, in a very short period of time between a timing slightly preceding a cut point and a timing slightly following the cut point, calculator 73 calculates the characteristic amount of the frequency, such as the MFCC. Then, the result is inputted to likelihood calculator 74.
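A minimal MFCC computation for one audio frame can be sketched as follows: power spectrum of a windowed frame, a triangular mel filterbank, logarithm, then a DCT-II of the log energies. All parameter values (sample rate, filter count, coefficient count) are illustrative assumptions, not those of the disclosed apparatus.

```python
import numpy as np

def mfcc_frame(frame, sample_rate=16000, n_filters=20, n_coeffs=12):
    """Minimal MFCC sketch: power spectrum -> triangular mel filterbank ->
    log -> DCT-II. Parameter values are illustrative."""
    n = len(frame)
    # Power spectrum of the Hamming-windowed frame
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(n))) ** 2

    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    # Filter edge frequencies, equally spaced on the mel scale
    mel_points = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)

    # Triangular filterbank applied to the power spectrum
    fbank = np.zeros((n_filters, len(spectrum)))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    log_energies = np.log(np.maximum(fbank @ spectrum, 1e-10))

    # DCT-II of the log filterbank energies yields the cepstral coefficients
    idx = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), idx + 0.5) / n_filters)
    return basis @ log_energies
```

Because the apparatus evaluates this characteristic amount only at each cut point and in its proximity, the function would be called on a handful of frames per cut point rather than on the whole signal.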
In the present embodiment, it is taken into consideration that the characteristic amount of the frequency of the musical piece is different from that of the speech. For this reason, a characteristic amount of the frequency typical of the musical piece and that of the speech are both stored in external memory 11b beforehand as reference data used for comparison between the characteristic amounts of the frequencies. As a result, likelihood calculator 74 in the DSP calculates the likelihood between the reference data and the output representing the result of the calculation of the characteristic amount at each cut point and in its proximity, which output is received from frequency characteristic amount calculator 73. Thereafter, likelihood calculator 74 inputs an output representing the calculated likelihood to cut point judging unit 81 in CPU 8.
It should be noted that the calculated characteristic amount of the frequency does not have to be compared with the reference data. Specifically, in addition to the foregoing method of calculating the likelihood of the musical piece by comparing the calculated characteristic amount of the frequency with the reference data, another applicable method calculates the likelihood of the musical piece by assigning the characteristic amount of the frequency to an evaluation function set up beforehand.
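One simple way to realize the comparison with reference data is a diagonal-Gaussian log-likelihood, with one (mean, variance) pair per class learned from sample signals. This is a sketch under that assumption; the function names and the Gaussian model itself are illustrative choices, not the disclosed likelihood calculation.

```python
import numpy as np

def log_likelihood(mfcc, mean, var):
    """Log-likelihood of an MFCC vector under a diagonal Gaussian whose
    mean and variance serve as hypothetical reference data."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (mfcc - mean) ** 2 / var)

def classify_cut_point(mfcc, music_ref, speech_ref):
    """music_ref / speech_ref are (mean, variance) pairs derived beforehand
    from sample music and speech signals."""
    if log_likelihood(mfcc, *music_ref) >= log_likelihood(mfcc, *speech_ref):
        return 'music'
    return 'speech'
```

The class whose reference statistics give the higher likelihood becomes the attribute of the cut point, mirroring the judgment of cut point judging unit 81.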
Subsequently, cut point judging unit 81 judges whether the audio signal at the cut point belongs to the music or the speech on the basis of the output of the calculated likelihood. A result of the judgment is additionally stored in temporary storage memory 11c, in which the time information and the amount of change at the cut point which are received from the cut point detector 72 are already stored, with the result of the judgment associated with the time information and the amount of change at the cut point.
Time length judging unit 83 judges whether the audio judged, by cut point judging unit 81, as belonging to the music section lasts for a predetermined time length or longer. Time length judging unit 83 judges that the section is not a musical piece when the music section lasts shorter than the predetermined time length. In the case shown in
It is empirically known that a musical piece lasts longer than 100 seconds. Accordingly, in the case where the time interval between two neighboring sampling points judged as a speech is shorter than 100 seconds, even if a sampling point between the two neighboring sampling points is judged as a musical piece, time length judging unit 83 is designed not to judge the section between the two neighboring sampling points as a musical piece. In other words, time length judging unit 83 measures the time interval between two neighboring sampling points each judged as a speech or anything but a musical piece, and judges the corresponding section as a musical piece only when the interval is not shorter than 100 seconds.
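The 100-second rule applied by time length judging unit 83 can be sketched as follows; the data layout (a sorted list of labeled cut points) and the function name are assumptions for illustration.

```python
def detect_music_sections(cut_points, min_length=100.0):
    """cut_points: list of (time_seconds, label) pairs sorted by time,
    with label 'music' or 'speech'. A section between two consecutive
    non-music cut points is judged as a musical piece only if it lasts
    at least min_length seconds (a hypothetical threshold)."""
    non_music_times = [t for t, label in cut_points if label != 'music']
    sections = []
    for start, end in zip(non_music_times, non_music_times[1:]):
        if end - start >= min_length:
            sections.append((start, end))
    return sections
```

A cut point labeled as music inside a short gap between speech points is thus discarded, while a long enough gap survives as a music section.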
Music section detector 82 receives an output of the judgment which is obtained from time length judging unit 83, and thus rewrites the table in temporary storage memory 11c, accordingly changing an existing table to a table (final table) for each musical piece.
When the recording operation is completed, this final table is supplied to HDD interface unit 9 via music section detector 82, and is subsequently stored in HDD 10.
It should be noted that each final table stored in HDD 10 retains a start point, an end point, cut points, and amounts of change for a corresponding musical piece. These are all used to play back the chorus of the musical piece when the musical piece is going to be played back.
Out of encoded data stored in HDD 10, only parts corresponding to music sections specified in the final table are sequentially read out in accordance with editing and playback operations, and are thus inputted into MP3 codec 3. MP3 codec 3 decodes the corresponding parts in the encoded data. Subsequently, the decoded parts are converted to the audio signal by D/A converter 4, and are thus outputted from speaker 5. This makes it possible to detect only the musical piece from the audio signal including speech sections and the like, as well as accordingly to extract and play back the musical piece.
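The selective readout driven by the final table can be sketched on decoded samples as follows. This is a simplification: the apparatus itself reads the encoded parts from HDD 10 and decodes them through MP3 codec 3, whereas this sketch simply slices an already-decoded sample array.

```python
def extract_music(samples, sample_rate, final_table):
    """Concatenate only the samples inside the music sections listed in a
    final table of (start_seconds, end_seconds) pairs. A sketch operating
    on decoded samples, not on the encoded data the apparatus reads."""
    pieces = []
    for start, end in final_table:
        pieces.append(samples[int(start * sample_rate):int(end * sample_rate)])
    return [sample for piece in pieces for sample in piece]
```

Only the music sections survive, so playback skips speech sections entirely.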
The present embodiment makes it possible to precisely detect the musical piece, because the music sections are detected by use of both information on the cut points and information on the characteristic amounts in the frequency area.
Furthermore, the present embodiment also makes it possible to detect the music sections through an arithmetic process entailing only a light workload, because the music sections are detected by calculating the characteristic amount in the frequency area of the audio signal only at each cut point and in its proximity.
In the present embodiment, DSP 7 and CPU 8 are designed to implement their respective functions. However, the present embodiment is not necessarily limited to this function division; the two functions may be implemented by CPU 8 only. Otherwise, the present embodiment may have a configuration in which, through a software process, CPU 8 implements the functions respectively of A/D converter 2, MP3 codec 3 and D/A converter 4 in addition to the function of DSP 7. Although delay memory 11a, external memory 11b and temporary storage memory 11c have been discretely shown in the foregoing example, the memories are formed in memory 11 shown in
In the case of the foregoing example, the apparatus detects the music sections while recording the musical piece, so that the apparatus creates and records the final table. Instead, a configuration may be adopted, which causes the apparatus to detect the music sections while sequentially playing back the recorded digital audio data from HDD 10 during an idle time after the apparatus completes recording the musical piece, so that the apparatus creates the final table. Otherwise, a circuit configuration may be adopted, which causes the apparatus to carry out all of the operations according to the foregoing example in linkage with the playback operation. It goes without saying that these configurations are included in the present invention.
In addition, in the foregoing example, the audio signal level is detected as the value obtained by raising a value representing the amplitude of the audio signal to the second power. The audio signal level can be similarly detected as the absolute value of the amplitude, instead.
Moreover, in the foregoing example, the cut point is defined as a timing at which the audio signal level changes to a large extent. As a result, the cut point does not precisely correspond to either the start point or the end point of the musical piece. However, the cut point can be sufficiently used as the playback start point or the playback end point of the musical piece.
The foregoing example has a configuration effective for a method in which, while editing after recording musical pieces, the operator determines whether or not each recorded musical piece is one the operator wished to have by playing back a part of every recorded musical piece, and afterward keeps only the desired musical pieces as a library. The foregoing example aims at being usable regardless of whether or not the editing is carried out precisely.
(Modification)
The music sections may be detected in accordance with the following procedure.
The detection according to the modification makes it possible to increase the precision with which the music section is detected in comparison with the technology, disclosed in Japanese Patent Application Laid-Open Publication No. 2004-258659, for detecting a music section by use of a characteristic amount of the frequency only.
The invention includes other embodiments in addition to the above-described embodiments without departing from the spirit of the invention. The embodiments are to be considered in all respects as illustrative, and not restrictive. The scope of the invention is indicated by the appended claims rather than by the foregoing description. Hence, all configurations including the meaning and range within equivalent arrangements of the claims are intended to be embraced in the invention.
Number | Date | Country | Kind |
---|---|---|---|
2007-078956 | Mar 2007 | JP | national |
Number | Name | Date | Kind |
---|---|---|---|
5233484 | Nagasawa et al. | Aug 1993 | A |
5402277 | Nagasawa et al. | Mar 1995 | A |
5712953 | Langs | Jan 1998 | A |
6169241 | Shimizu | Jan 2001 | B1 |
6242681 | Daishoji | Jun 2001 | B1 |
6570991 | Scheirer et al. | May 2003 | B1 |
6998527 | Agnihotri | Feb 2006 | B2 |
7120576 | Gao | Oct 2006 | B2 |
7179980 | Kirkeby et al. | Feb 2007 | B2 |
7256340 | Okazaki et al. | Aug 2007 | B2 |
7277852 | Iyoku et al. | Oct 2007 | B2 |
7315899 | Dunning et al. | Jan 2008 | B2 |
7336890 | Lu et al. | Feb 2008 | B2 |
7346516 | Sall et al. | Mar 2008 | B2 |
7544881 | Makino et al. | Jun 2009 | B2 |
7558729 | Benyassine et al. | Jul 2009 | B1 |
20020120456 | Berg et al. | Aug 2002 | A1 |
20030101050 | Khalil et al. | May 2003 | A1 |
20030171936 | Sall et al. | Sep 2003 | A1 |
20030229537 | Dunning et al. | Dec 2003 | A1 |
20040069118 | Okazaki et al. | Apr 2004 | A1 |
20040165730 | Crockett | Aug 2004 | A1 |
20040167767 | Xiong et al. | Aug 2004 | A1 |
20050016360 | Zhang | Jan 2005 | A1 |
20050169114 | Ahn | Aug 2005 | A1 |
20060074667 | Saffer | Apr 2006 | A1 |
20060081118 | Okazaki et al. | Apr 2006 | A1 |
20060085188 | Goodwin et al. | Apr 2006 | A1 |
20070051230 | Hasegawa | Mar 2007 | A1 |
20070106406 | Makino et al. | May 2007 | A1 |
20080097756 | De Lange et al. | Apr 2008 | A1 |
20080236368 | Matsumoto et al. | Oct 2008 | A1 |
20090088878 | Otsuka et al. | Apr 2009 | A1 |
Number | Date | Country |
---|---|---|
2004-258659 | Sep 2004 | JP |
Number | Date | Country
---|---|---
20080236368 A1 | Oct 2008 | US