This application is based on Japanese Patent Application No. 2009-223066 filed on Sep. 28, 2009 and Japanese Patent Application No. 2010-195431 filed on Sep. 1, 2010, the contents of which are hereby incorporated by reference.
1. Field of the Invention
The present invention relates to a music track extraction device which extracts only a music track portion from a radio broadcast program and a music track recording device which records a music track.
2. Description of Related Art
There is a digital reproduction device which automatically extracts a music portion from a received radio broadcast program and stores the music portion. For example, there is a digital reproduction device that extracts a music track portion by judging, from left channel data and right channel data of broadcast data, whether the data is stereo data or monaural data, and by setting a stereo portion as a music track and a monaural portion as a non-music track.
However, the digital reproduction device has a problem in that the degree of separation between the left and right channel data is small if the received field intensity of a radio broadcast is low, and hence an audio signal that is originally a stereo portion may be judged as a monaural signal, which makes it impossible to correctly extract a music track portion. The digital reproduction device has another problem in that a music track portion cannot be extracted unless the broadcast transmits at least left and right channel data (for example, a frequency modulation (FM) broadcast). Specifically, for example, a music track portion cannot be extracted from an amplitude modulation (AM) broadcast, which transmits only monaural data.
A music track extraction device according to the present invention includes:
an audio power calculation section which calculates an audio power from an audio signal; and
a judgment section which performs a judgment between a music track portion and a non-music track portion based on a state of the audio power.
A music track recording device according to the present invention includes:
the music track extraction device described above; and
a recording section which records an audio signal within a segment judged as a music track by the music track extraction device.
The meaning and effects of the present invention become clearer from the following description of embodiments. However, the following embodiments are mere examples of the embodiment of the present invention, and the meaning of the present invention or the meanings of the terms of respective components thereof are not limited to what are described in the following embodiments.
First, a recording/reproduction device 100 according to a first embodiment being an embodiment of the present invention is described in detail with reference to the drawings.
The FM tuner 1 demodulates an FM broadcast wave and outputs an analog audio signal. The A/D conversion section 2 converts the analog audio signal into a digital audio signal. The DSP 3 includes a music track extraction section (a section which extracts only a music track portion from the audio signal and outputs the music track portion) and an audio codec section (including an encoder which encodes an uncompressed digital audio signal into compressed audio data and a decoder which decodes the compressed audio data into the uncompressed digital audio signal). The D/A conversion section 4 converts the digital audio signal into an analog audio signal and outputs the analog audio signal. If the audio signal is a stereo signal, respective signals of the two left and right channels are output. The CPU 5 is a processor. The memory 6 is a so-called work memory for the CPU 5. The compressed audio data (recorded music track data) and setting information added thereto are recorded on the recording medium 7.
First, the FM tuner 1 and the encoder within the DSP 3 are activated, and an audio signal is recorded into a recorded file on the recording medium 7 (for example, HDD) while being encoded (S1 and S2). Based on an encoded sound waveform, calculation of an audio power value, calculation of a change amount of the audio power value, and calculation of a difference (L-R difference) signal between the two left and right channels are started (S3, S4, and S5).
Here,
Further,
If a change point at which the change amount of the audio power is equal to or larger than a predetermined value (indicated by, for example, the broken line of the graph at the left bottom of
On the other hand, if neither the average value of the audio power nor the average value of the L-R difference is equal to or larger than the threshold value, the position of the change point (relative time instant with reference to the start of recording) is recorded as a non-music track point (TA(i)) (S10). This procedure is repeated until an instruction to stop the recording is issued (S11, S12).
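The judgment made at each change point can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function names, the tuple layout, and the use of separate threshold values for the audio power and the L-R difference are assumptions, and the averages are assumed to have been calculated beforehand.

```python
def classify_change_point(avg_power, avg_lr_diff, power_threshold, diff_threshold):
    """Judge one change point (hypothetical helper).

    The point is treated as a music track point if either average value
    reaches its threshold, and as a non-music track point otherwise.
    """
    if avg_power >= power_threshold or avg_lr_diff >= diff_threshold:
        return "music"
    return "non-music"


def collect_non_music_points(change_points, power_threshold, diff_threshold):
    """Return the list TA of time instants judged as non-music track points.

    change_points: iterable of (time_sec, avg_power, avg_lr_diff) tuples,
    where time_sec is relative to the start of recording.
    """
    ta = []
    for time_sec, avg_power, avg_lr_diff in change_points:
        label = classify_change_point(
            avg_power, avg_lr_diff, power_threshold, diff_threshold
        )
        if label == "non-music":
            ta.append(time_sec)  # only non-music track points are recorded
    return ta
```

Only the non-music track points are kept, which matches the processing in which a music track point is not recorded.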
If the instruction to stop the recording is issued (yes in S12), the encoding is stopped, the non-music track point (TA(i)) is saved, and the recorded file is closed (S13). The non-music track point (TA(i)) may be saved in the recorded file separately from the compressed audio data, or may be saved in a file other than the recorded file.
Note that, only the non-music track point is recorded and a music track point is not recorded in the above-mentioned processing because the recording/reproduction device 100 according to this embodiment judges that a segment (1) between the non-music track point and the next non-music track point (2) which has a length equal to or longer than a predetermined time (for example, equal to or longer than 90 seconds) is a music track segment (which is described later with reference to the flowchart of
Further, in the above-mentioned processing, the non-music track point is determined if neither the average value of the audio power nor the average value of the L-R difference is equal to or larger than the threshold value, while the music track point is determined if the average value of the audio power or the average value of the L-R difference is equal to or larger than the threshold value, because: (1) the average value of the audio power tends to be larger in the music track portion than in the non-music track portion; and (2) the average value of the audio power does not become so small even if the field intensity is lowered. This is described with reference to
The graph at the top of
The graph at the middle of
The graph at the bottom of
First, a non-music track point TA(i) is read from a recorded file or the like (S21). Then, a distance (for example, TA(1)-TA(0)) between adjacent non-music track points TA(i) is calculated (S22). If the distance is equal to or longer than TM seconds (for example, equal to or longer than 90 seconds), the non-music track points TA(0) and TA(1) are recorded as the start point and the end point of the music track, respectively (S23). If the distance is shorter than TM seconds, the procedure returns to Step S22 while incrementing i by 1, in which TA(2)-TA(1) is calculated and compared with TM seconds. This processing is repeated until there is no candidate for point data indicating a music track (until the judgment of Step S26 results in yes).
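The extraction of music track segments from the non-music track points can be sketched as follows. The function name and the list representation of TA(i) are assumptions made for illustration; the default of 90 seconds is the example value of TM given above.

```python
def extract_music_segments(ta, tm_seconds=90.0):
    """Return (start, end) pairs for music track segments.

    ta: non-music track points TA(i) in seconds, in ascending order.
    A pair of adjacent points is treated as the start point and the end
    point of a music track if their distance is at least tm_seconds.
    """
    segments = []
    for start, end in zip(ta, ta[1:]):
        if end - start >= tm_seconds:
            segments.append((start, end))
    return segments
```

For example, with TA = [0, 10, 25, 240, 250, 500] and TM = 90, the pairs (25, 240) and (250, 500) are kept, while the short gaps, which are likely talk or jingles, are discarded.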
Next, a recording/reproduction device 100a according to a second embodiment being an embodiment of the present invention is described in detail with reference to the drawings. Note that the second embodiment is a specific example of performing a judgment between the music track portion and the non-music track portion by using the above-mentioned characteristic found by the present applicant (that more change points occur in the non-music track part such as a talk than in the music track part).
The recording/reproduction device 100a according to this embodiment includes the FM tuner 1, an AM tuner 1a, the A/D conversion section 2a, a DSP 3a, the D/A conversion section 4, the CPU 5, the memory 6, and the recording medium 7.
The AM tuner 1a demodulates an AM broadcast wave and outputs an analog audio signal. The A/D conversion section 2a converts the analog audio signal output from the FM tuner 1 or the AM tuner 1a into a digital audio signal. The DSP 3a includes the music track extraction section and the audio codec section, but the configuration and operation of the music track extraction section are different from those of the DSP 3 of the recording/reproduction device 100 according to the first embodiment (details thereof are described later). The D/A conversion section 4 converts the digital audio signal into an analog audio signal and outputs the analog audio signal. The CPU 5, the memory 6, and the recording medium 7 are the same as those of the recording/reproduction device 100 according to the first embodiment.
Note that,
Next, the music track extraction section included in the DSP 3a of the recording/reproduction device 100a according to the second embodiment is described in detail with reference to the drawings.
The music track extraction section included in the DSP 3a of the recording/reproduction device 100a according to this embodiment includes an audio power calculation section 301, a second change amount calculation section 302, a second change point detection section 303, a second change point frequency calculation section 304, an audio power average calculation section 305, a difference signal calculation section 306, a difference signal average calculation section 307, and a music track segment judgment section 308.
In the same manner as in the recording/reproduction device 100 according to the first embodiment, as illustrated in
In the same manner as in the recording/reproduction device 100 according to the first embodiment, as illustrated in
In the same manner as in the recording/reproduction device 100 according to the first embodiment, as illustrated in
The second change point frequency calculation section 304 calculates a frequency of the second change point detected by the second change point detection section 303. For example, it is possible to count the number of second change points included in a second time described later and calculate the number as the frequency of the second change point.
In the same manner as in the recording/reproduction device 100 according to the first embodiment, as illustrated in
In the same manner as in the recording/reproduction device 100 according to the first embodiment, as illustrated in
In the same manner as in the recording/reproduction device 100 according to the first embodiment, as illustrated in
In the same manner as in the recording/reproduction device 100 according to the first embodiment, the music track segment judgment section 308 performs the judgment between the music track portion and the non-music track portion based on the magnitude of the audio power (the above-mentioned power value) and the magnitude of the difference signal (the above-mentioned difference value). Specifically, if it is confirmed at least one of that the average value of the audio power calculated by the audio power average calculation section 305 is equal to or larger than the threshold value as illustrated in
Further, in the recording/reproduction device 100a according to this embodiment, the music track segment judgment section 308 performs the judgment between the music track portion and the non-music track portion based on a frequency at which the change amount of the audio power becomes equal to or larger than a predetermined magnitude. An outline of the above-mentioned judgment method is described in detail with reference to the drawings.
Therefore, if it is confirmed that the frequency of the second change point calculated by the second change point frequency calculation section 304 is equal to or smaller than the threshold value, the music track segment judgment section 308 judges at least one part of the confirmed time as the music track portion. Further, if it is confirmed that the frequency of the second change point calculated by the second change point frequency calculation section 304 is larger than the threshold value, the music track segment judgment section 308 judges at least one part of the confirmed time as the non-music track portion.
That is, if at least one of the following is confirmed, namely that the average value of the audio power is equal to or larger than the threshold value, that the average value of the difference signal is equal to or larger than the threshold value, or that the frequency of the second change point is equal to or smaller than the threshold value, the music track segment judgment section 308 judges at least one part of the confirmed time as the music track portion. In contrast, if all of the following are confirmed, namely that the average value of the audio power is smaller than the threshold value, that the average value of the difference signal is smaller than the threshold value, and that the frequency of the second change point is larger than the threshold value, the music track segment judgment section 308 judges at least one part of the confirmed time as the non-music track portion.
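The combination of the three conditions can be sketched as follows. The function and parameter names are hypothetical, and each condition is assumed to have its own threshold value.

```python
def judge_segment(avg_power, avg_diff, change_point_count,
                  power_threshold, diff_threshold, count_threshold):
    """Combine the three judgments (illustrative sketch).

    A time is judged as a music track portion if at least one condition
    holds; it is judged as a non-music track portion only if all three fail.
    """
    is_music = (
        avg_power >= power_threshold              # large audio power
        or avg_diff >= diff_threshold             # large L-R difference
        or change_point_count <= count_threshold  # few second change points
    )
    return "music" if is_music else "non-music"
```

Because the conditions are joined by a logical OR, a monaural broadcast with a small difference signal can still be judged as music from the audio power or from the frequency of the second change point.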
With the above-mentioned configuration, the judgment between the music track portion and the non-music track portion of the audio signal is performed based on the state of the audio power. Therefore, even if received field intensity is low or even if a broadcast being received is transmitting only the monaural data, it is possible to perform the judgment between the music track portion and the non-music track portion of the audio signal with high accuracy. This is not limited to the recording/reproduction device 100a according to this embodiment, and the same applies to the recording/reproduction device 100 according to the first embodiment.
Note that, in the recording/reproduction device 100a according to this embodiment, the music track segment judgment section 308 performs the judgment between the music track portion and the non-music track portion of the audio signal based on three factors, that is, the magnitude of the audio power, the magnitude of the difference signal, and the frequency at which the change amount of the audio power becomes large, but the judgment based on at least one of the magnitude of the audio power and the magnitude of the difference signal does not need to be performed. That is, the recording/reproduction device 100a may be configured to exclude at least one of the audio power average calculation section 305 and the pair of the difference signal calculation section 306 and the difference signal average calculation section 307. Further, the same applies to the recording/reproduction device 100 according to the first embodiment, and the judgment based on the magnitude of the difference signal does not need to be performed.
However, it is preferred that the judgment between the music track portion and the non-music track portion of the audio signal be performed by using various kinds of judgment methods because the judgment can be performed with high accuracy as described in the first embodiment. Further, as described above, if a portion to be judged as the music track portion is judged as the music track portion by any one of a plurality of judgment methods, the music track portions of the audio signal can be judged without exception.
Next, a specific example of the operation of the recording/reproduction device 100a according to the second embodiment illustrated in
As illustrated in
Subsequently, the audio signals output from the A/D conversion section 2a are sequentially read into an audio first-in first-out (FIFO) section 61 (S43). Then, the music track extraction section of the DSP 3a performs the above-mentioned judgment on the audio signals sequentially read from the audio FIFO section 61. Note that, the audio FIFO section 61 can be interpreted as a part of the memory 6.
First, the audio power calculation section 301 calculates the audio power as described above (S44). Further, the difference signal calculation section 306 calculates the difference signal as described above (S45). The calculation of the audio power and the calculation of the difference signal are performed until the processing on the audio signal during a first time T1(n) is finished (until the judgment of Step S46 results in yes).
The first time T1(n) is a unit time obtained by dividing the audio signal at predetermined intervals for processing (judgment). One first time T1(n) has a duration of, for example, several tens of milliseconds (ms).
After the audio power and the difference signal of the audio signal during the first time T1(n) are calculated, the audio power average calculation section 305 calculates the average value of the audio power during the first time T1(n) as described above (S47). Further, the difference signal average calculation section 307 calculates the average value of the difference signal during the first time T1(n) as described above (S48). Further, the second change amount calculation section 302 calculates a second change amount c(n) of the audio power during the first time T1(n) as described above (S49).
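The per-frame calculations of Steps S44 to S49 can be sketched as follows. The exact formulas are not reproduced in this passage, so the sketch rests on labeled assumptions: the audio power is taken as the mean of the squared samples of both channels, the difference signal as the absolute L-R difference, and the second change amount c(n) as the absolute difference between the average audio powers of consecutive first times. The function name and the parameter frame_len (the number of samples in one first time T1(n)) are hypothetical.

```python
def frame_features(left, right, frame_len):
    """Return per-frame (avg_power, avg_diff, change_amount) tuples.

    left, right: equal-length sequences of samples for the two channels.
    frame_len: number of samples in one first time T1(n) (assumption).
    Trailing samples that do not fill a whole frame are ignored.
    """
    features = []
    prev_power = None
    for start in range(0, len(left) - frame_len + 1, frame_len):
        l = left[start:start + frame_len]
        r = right[start:start + frame_len]
        # Assumed definition: mean squared amplitude over both channels.
        avg_power = sum((a * a + b * b) / 2.0 for a, b in zip(l, r)) / frame_len
        # Assumed definition: mean absolute L-R difference.
        avg_diff = sum(abs(a - b) for a, b in zip(l, r)) / frame_len
        # Assumed definition of c(n): change from the previous frame's power.
        change = 0.0 if prev_power is None else abs(avg_power - prev_power)
        features.append((avg_power, avg_diff, change))
        prev_power = avg_power
    return features
```

As one possible setting, a first time of 20 ms at a sampling rate of 44.1 kHz would correspond to frame_len = 882; these figures are examples and do not appear in the description.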
If the second change amount c(n) is equal to or larger than the threshold value (yes in S50), a data item “1” indicating that the second change point exists is recorded in a change point FIFO section 62 (S51). On the other hand, if the second change amount c(n) is smaller than the threshold value (no in S50), a data item “0” indicating that the second change point does not exist is recorded in the change point FIFO section 62 (S52). Note that, the change point FIFO section 62 can be interpreted as a part of the memory 6.
Further, the second change point frequency calculation section 304 calculates the frequency of the second change point by referencing the data items recorded in the change point FIFO section 62 (S53). At this time, at least the data items regarding the second change point detected from a music signal during a second time T2(n) are recorded in the change point FIFO section 62. The second change point frequency calculation section 304 calculates the frequency of the second change point by counting the number of the data items “1” indicating that the second change point exists among the data items during the second time T2(n) read from the change point FIFO section 62 (S53).
Like the first time T1(n), the second time T2(n) is a unit time obtained by dividing the audio signal at predetermined intervals for processing (judgment). One second time T2(n) has a duration of, for example, several seconds (s). Note that the second time T2(n) is a time for calculating the frequency of the second change point, and hence it is preferred that the second time T2(n) be longer than the first time T1(n).
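The change point FIFO section 62 and the counting of Step S53 can be sketched with a fixed-length queue. This is an illustrative sketch only: the class name, the parameter frames_per_window (the number of first times T1(n) contained in one second time T2(n)), and the sliding-window behavior are assumptions.

```python
from collections import deque


class ChangePointCounter:
    """Sliding count of second change points over one second time T2(n)."""

    def __init__(self, frames_per_window, change_threshold):
        # A deque with maxlen discards the oldest item automatically,
        # which gives first-in first-out behavior.
        self.fifo = deque(maxlen=frames_per_window)
        self.change_threshold = change_threshold

    def push(self, change_amount):
        """Record 1 or 0 for the latest frame and return the current count."""
        flag = 1 if change_amount >= self.change_threshold else 0
        self.fifo.append(flag)
        return sum(self.fifo)  # frequency of the second change point
```

Each call to push corresponds to Steps S50 to S53 for one first time T1(n). Until the queue is full, the count covers a time shorter than T2(n).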
The first time T1(n) and the second time T2(n) are described in detail with reference to the drawings.
Further, as described above, the music track segment judgment section 308 performs the judgment between the music track portion and the non-music track portion of the audio signal based on the three factors, that is, the magnitude of the audio power, the magnitude of the difference signal, and the frequency at which the change amount of the audio power becomes large (S54). Note that, the music track segment judgment section 308 may output the non-music track point TA(i) as a judgment result in the same manner as in the recording/reproduction device 100 according to the first embodiment.
The time of the audio signal at which the music track segment judgment section 308 performs the judgment based on the magnitude of the audio power and the magnitude of the difference signal is at least a part of the first time T1(n) (for example, time instant substantially at the midpoint of the first time T1(n)). Meanwhile, the time at which the judgment is performed based on the frequency at which the change amount of the audio power becomes large is at least a part of the second time T2(n) (for example, time instant substantially at the midpoint of the second time T2(n)).
As described above, in the recording/reproduction device 100a according to this embodiment, the time of the audio signal at which the music track segment judgment section 308 performs the judgment may be shifted depending on each judgment method. Therefore, for example, judgment results obtained sequentially (for example, respective judgment results based on the magnitude of the audio power and the magnitude of the difference signal) may be retained in a judgment result retaining section 63, and final judgment results may be output after the judgment results obtained by the above-mentioned three methods have been produced. Note that, the judgment result retaining section 63 can be interpreted as a part of the memory 6.
If the judgment is performed on the audio signal in Step S54, for example, the CPU 5, the DSP 3a, or the like increments the variable n by 1 (S55). Then, the above-mentioned judgment (S43 to S55) is repeated until the instruction to stop the recording is issued (until the judgment of S56 results in yes).
If the instruction to stop the recording is issued (yes in S56), the encoding is stopped, the judgment results (for example, non-music track point TA(i)) are saved, and the recorded file is closed (S57). The judgment results may be saved in the recorded file separately from the compressed audio data, or may be saved in a file other than the recorded file.
With such a configuration, it is possible to smoothly combine and perform the respective judgment methods based on the magnitude of the audio power, the magnitude of the difference signal, and the frequency at which the change amount of the audio power becomes large.
Note that, there may be a case where sufficient data (data on the second time T2(n) necessary for the judgment) is not recorded in the change point FIFO section 62 at the start or the end of the judgment. In such a case, for example, the judgment result of other judgment methods (judgments based on the magnitude of the audio power and the magnitude of the difference signal) may be employed, the judgment may be performed by referencing data during a time shorter than the second time T2(n) recorded in the change point FIFO section 62, or the judgment may be performed by compensating insufficient data by dummy data.
Further, the judgment result produced by a judgment method having a high judgment accuracy may be given a higher priority than the judgment result produced by another judgment method. In this case, for example, the final judgment may be performed by assigning priorities to (weighting) the judgment results produced by the respective judgment methods and combining the judgment results produced by the respective judgment methods.
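One conceivable way of weighting the judgment results is a signed weighted vote, sketched below. The description does not specify the combining rule, so the scoring scheme, the method names, the weights, and the handling of a tie are all assumptions; a tie is resolved toward the music track portion here so that music is less likely to be missed.

```python
def weighted_vote(judgments, weights):
    """Combine per-method judgments into a final judgment (assumed scheme).

    judgments: dict mapping a method name to True (music) or False (non-music).
    weights: dict mapping the same method names to positive weights;
             a method with higher judgment accuracy gets a larger weight.
    """
    score = 0.0
    for method, is_music in judgments.items():
        score += weights[method] if is_music else -weights[method]
    return "music" if score >= 0 else "non-music"
```

With equal weights this reduces to a majority vote, which is stricter than the logical OR used when any single method suffices.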
Further, in the case where the music track segment judgment section 308 outputs the non-music track point TA(i) as the judgment result, the method of generating the playlist as illustrated in
The same judgment methods as those in the recording/reproduction device 100 according to the first embodiment may be employed in the respective judgments based on the magnitude of the audio power and the magnitude of the difference signal performed by the music track segment judgment section 308 of the recording/reproduction device 100a according to the second embodiment. The configuration for this case is described in detail with reference to the drawings.
The music track extraction section included in the DSP 3a of the recording/reproduction device 100a according to this example includes the audio power calculation section 301, the second change amount calculation section 302, the second change point detection section 303, the second change point frequency calculation section 304, an audio power average calculation section 305b, the difference signal calculation section 306, a difference signal average calculation section 307b, a music track segment judgment section 308b, a first change amount calculation section 309b, and a first change point detection section 310b.
As illustrated in
Then, in the same manner as in the recording/reproduction device 100 according to the first embodiment, as illustrated in
Further, in the same manner as in the recording/reproduction device 100 according to the first embodiment, as illustrated in
In the same manner as in the recording/reproduction device 100 according to the first embodiment, the music track segment judgment section 308b performs the judgment at the time instant of the first change point of the audio signal based on the magnitude of the audio power and the magnitude of the difference signal. Further, in the same manner as in the normally used recording/reproduction device 100a according to the second embodiment, the music track segment judgment section 308b performs the judgment at a time of at least one part of the second time T2(n) (for example, time instant substantially at the midpoint of the second time T2(n)) based on a frequency at which the second change amount of the audio power becomes large (the number of the second change points included in the second time T2(n)).
Even with such a configuration, it is possible to combine and perform the respective judgment methods based on the magnitude of the audio power, the magnitude of the difference signal, and the frequency at which the change amount of the audio power becomes large.
Note that, the second predetermined value used by the second change point detection section 303 which detects the second change point may be set smaller than the predetermined value used by the first change point detection section 310b which detects the first change point as illustrated in
With such a configuration, the first change point and the second change point that are suitable for each of the judgment methods can be detected, which can improve the judgment accuracy of each of the judgment methods. Specifically, for example, the judgment accuracy of the judgment methods based on the magnitude of the audio power and the magnitude of the difference signal can be improved if the first predetermined value is raised to an extent that allows a boundary between the music track portion and the non-music track portion to be judged with high certainty. Further, for example, the judgment accuracy of the judgment method based on the frequency at which the change amount of the audio power becomes large can be improved if the second predetermined value is reduced to an extent that allows a dispersed state and a dense state to be clearly distinguished from each other (that increases a difference between the numbers of the second change points in the respective states).
Further, in this example, the second change amount calculation section 302 and the first change amount calculation section 309b may be shared. Further, the second change point detection section 303 and the first change point detection section 310b may be shared. With such a configuration, a processing amount of the DSP 3a can be reduced.
A part or all of the operations of the DSPs 3 and 3a or the like of the recording/reproduction devices 100 and 100a according to the embodiments of the present invention may be performed by a control device such as a microcomputer. Further, all or a part of the functions realized by such a control device may be described as a program, and may be realized by executing the program on a program execution device (for example, a computer).
Further, irrespective of the above-mentioned case, the recording/reproduction devices 100 and 100a illustrated in
The above-mentioned descriptions of the respective embodiments are intended solely to describe the present invention, and should not be interpreted as limiting the invention beyond the scope of the appended claims or reducing the scope. Further, the respective components of the present invention are not limited to the above-mentioned embodiments, and naturally various kinds of modifications can be made within the technical scope described within the scope of the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
2009-223066 | Sep 2009 | JP | national |
2010-195431 | Sep 2010 | JP | national |