This application is a 371 application of the International PCT application serial no. PCT/JP2018/020079, filed on May 24, 2018. The entirety of the above-mentioned patent application is hereby incorporated by reference herein and made a part of this specification.
The present disclosure relates to a beat timing generation device and a beat timing generation method.
In the related art, there are devices outputting tempo information reflecting a playing tempo (for example, Patent Literature 1). In addition, there is a technology in which a tempo clock synchronized with music included in an audio signal can be generated (for example, refer to Patent Literature 2 and Patent Literature 3). In addition, there is a technology in which a rhythm pattern of an input acoustic signal is determined (for example, Patent Literature 4).
When music is performed or sung, listeners often clap their hands in accordance with its rhythm. It is conceivable that such hand clapping be added automatically (without depending on manual operation) during a performance or a reproduction of music. However, the technologies in the related art are directed to generating tempo information or a tempo clock or to determining a rhythm pattern; they do not take automatic output of hand clapping into consideration, and they require a large amount of complicated calculation.
The present disclosure has been made in consideration of the foregoing circumstances, and an objective thereof is to provide a beat timing generation device and a beat timing generation method capable of generating timings of a beat with a small amount of calculation.
According to an aspect of the present disclosure, there is provided a beat timing generation device including a generation unit for generating, from data of music that has been input, information of timings on which a beat of the music depends and a plurality of pieces of intensity data indicating a power at the timings, a calculation unit for calculating a period and a phase of the beat of the music using the plurality of pieces of intensity data, and a detection unit for detecting a beat generation timing on a basis of the period and the phase of the beat.
The beat timing generation device may further include a reproduction processing unit for performing reproduction processing of the beat in accordance with the beat generation timing.
The beat timing generation device may employ a configuration in which the calculation unit sets a beats per minute (BPM) for the plurality of pieces of intensity data on a basis of the timings indicated by the plurality of pieces of intensity data, calculates one period of the BPM as the period of the beat, and calculates a relative position of the beat generation timing in a sine wave indicating the BPM as the phase of the beat; and the detection unit obtains a count value indicating the period of the beat and the phase of the beat, performs counting up to the count value using a counter incremented for a sampling rate for each one of samples, and detects a timing as the beat generation timing when a value of the counter reaches the count value.
The beat timing generation device may employ a configuration in which the calculation unit calculates, as the period of the beat, one period of the BPM at a time when a value of Fourier conversion data obtained through Fourier conversion performed for each of the plurality of pieces of intensity data and each of a plurality of BPM becomes the largest.
The beat timing generation device may employ a configuration in which when the Fourier conversion data is obtained for each of the plurality of pieces of intensity data and a first BPM of the plurality of BPM, the calculation unit acquires the Fourier conversion data for at least one second BPM having a vibrational frequency which is an integral multiple of a vibrational frequency of the first BPM, and the calculation unit sums up a value of the Fourier conversion data calculated using the first BPM and a value of the Fourier conversion data calculated using the second BPM at a predetermined proportion as a value of the Fourier conversion data for the first BPM.
The beat timing generation device may employ a configuration in which the generation unit acquires a frame consisting of a predetermined number of continuous samples of sound from data of the music that has been input, performs thinning of the samples in the frame, performs fast Fourier conversion for the thinned samples, performs processing of obtaining data indicating a sum of powers of each frequency bandwidth obtained through fast Fourier conversion at predetermined intervals, and extracts data indicating the sum of the powers as the intensity data when the data indicating the sum of the powers indicating a value larger than itself has not been in existence continues for a predetermined time.
According to another aspect, there is provided a beat timing generation method including generating, from data of music that has been input, information of timings on which a beat of the music depends and a plurality of pieces of intensity data indicating a power at the timings, calculating a period and a phase of the beat of the music using the plurality of pieces of intensity data, and detecting a beat generation timing on a basis of the period and the phase of the beat.
Hereinafter, with reference to the drawings, a beat timing generation device and a beat timing generation method according to the embodiment will be described. The configuration of the embodiment is an example. The present disclosure is not limited to the configuration of the embodiment.
The ROM 11 stores various programs executed by the CPU 10 and data used when the programs are executed. The RAM 12 is used as a development domain for the programs, an operation domain for the CPU 10, a storage domain for data, and the like. The HDD 13 stores the programs, the data used when the programs are executed, music data, and the like. For example, the music data is sound data having a predetermined audio file format such as an MP3 or a WAVE form. The format form of an audio file may be a form other than an MP3 or a WAVE form. The ROM 11 and the RAM 12 are examples of a main storage device, and the HDD 13 is an example of an auxiliary storage device. The main storage device and the auxiliary storage device are examples of a storage device or a storage medium.
The input device 14 includes keys, buttons, a touch panel, and the like and is used for inputting information (including an instruction and a command). The display device 15 is used for displaying information. The communication I/F 16 is connected to a network 2 and controls processing related to communication. The CPU 10 can download desired music data (musical signal) from the network 2, for example, in accordance with an instruction input from the input device 14 and can store the music data in the HDD 13.
The CPU 10 performs various kinds of processing by executing a program. In addition to the processing related to downloading of music, the processing includes processing related to reproduction of music, processing of generating a beat generation timing of music, processing of outputting a beat (for example, a clapping sound, particularly a hand clapping sound or the like) in accordance with a beat generation timing, and the like.
For example, when music data is reproduced, the CPU 10 generates digital data (digital signal) representing musical sound from music data read out from the HDD 13 to the RAM 12 by executing a program and supplies the digital data to the D/A 17. The D/A 17 converts digital data representing sound into an analog signal through digital/analog conversion and outputs the analog signal to the AMP 18. An analog signal of which an amplitude is adjusted by the AMP 18 is output from the speaker 19.
For example, the MIC 21 collects singing sound and the like accompanying musical sound (karaoke) output from the speaker 19. An analog audio signal collected by the MIC 21 has its amplitude amplified by the AMP 18 and is output from the speaker 19. At this time, singing sound may be mixed with musical sound, or the two may be output from separate speakers.
In addition, the MIC 21 is also used when amplifying sound (outputting sound from the speaker 19) or recording sound by collecting audio of a performance using a musical instrument (a so-called live performance) or reproduced audio of music from external equipment. For example, a performance sound signal collected by the MIC 21 is converted into a digital signal by the A/D 20 and delivered to the CPU 10. The CPU 10 converts the performance sound signal into a form in accordance with an audio file format, generates an audio file, and stores the audio file in the HDD 13. The processing of generating a beat generation timing may be performed for a musical sound signal collected by the MIC 21.
The beat timing generation device 1 may include a drive device (not illustrated) for a disc-type recording medium such as a compact disc (CD). In this case, musical sound may be reproduced by supplying a digital signal representing musical sound read from a disc-type recording medium using the drive device to the D/A 17. In this case, processing of generating a beat generation timing may be performed for a musical sound signal read from the disc-type recording medium.
The generation unit 101 for Spx data generates and outputs Spx data using digital data representing musical sound. The buffer 102 accumulates the Spx data (corresponding to a plurality of pieces of intensity data) over at least a predetermined time. In the present embodiment, a time period of six seconds is adopted for the predetermined time as an example, but the predetermined time may be longer or shorter than six seconds. The calculation unit 103 calculates period data and phase data of a beat using the set of Spx data accumulated in the buffer 102 over the predetermined time. The detection unit 104 for a generation timing detects a beat generation timing using the period data and the phase data. The reproduction processing unit 105 performs reproduction processing of a beat in accordance with the generation timing.
Hereinafter, processing of each part constituting the control unit 100 will be described in detail.
<Generation of Spx Data>
Generation of Spx data performed by the generation unit 101 will be described. A digital signal representing musical sound data (data sent to the D/A 17 for outputting audio) related to reproduction is input to the generation unit 101. A digital signal (musical signal) representing sound may be obtained through reproduction processing of music data stored in the HDD 13 or may be obtained through A/D conversion of an audio signal collected by the MIC 21.
Digital data representing sound is stored in the RAM 12 and is used for processing of the generation unit 101. Digital data representing sound is a cluster of sample (specimen) data (generally, voltage values of analog signals) gathered from analog signals in accordance with a predetermined sampling rate. In the present embodiment, as an example, the sampling rate is 44,100 Hz. However, the sampling rate can be suitably changed as long as a desired FFT resolution can be obtained.
In S02, the generation unit 101 performs thinning processing. That is, the generation unit 101 performs ¼ thinning with respect to 1,024 samples and obtains 256 samples. Thinning other than ¼ thinning may be adopted. In S03, the generation unit 101 performs fast Fourier conversion (FFT) with respect to 256 samples and obtains data (which will be referred to as power data) indicating a magnitude of power in units of frames from the results (power of each frequency bandwidth) of the FFT (S04). Since the power is expressed as a square of an amplitude, the concept of “power” also includes an amplitude.
For example, the power data is the sum of powers obtained by performing FFT with respect to 256 samples. When the power of a corresponding bandwidth in a preceding frame is subtracted from the power of each frequency bandwidth in a current frame and a value thereof is positive (power has increased), the value of the power is saved for calculation of the sum, but any other values than those described above (negative values after subtraction (power has decreased)) may be disregarded. This is because there is a high probability that a part having a large amount of increase in power is a beat.
In addition, as long as the comparison target is the same as that in another frame, a value used for calculating the sum may be the sum of powers of the current frame, may be the sum of powers of which the value obtained by subtracting the power of the preceding frame from the power of the current frame is a positive value, or may be a difference obtained by subtracting the power of the preceding frame from the power of the current frame. In addition, in a power spectrum obtained by performing FFT, calculation of the foregoing difference may be performed for only a frequency lower than a predetermined frequency. A frequency equal to or higher than the predetermined frequency may be filtered out using a low pass filter.
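The per-frame power calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the function names `thin`, `band_powers`, and `power_data` are hypothetical, a naive discrete Fourier transform stands in for the FFT so that the sketch needs only the standard library, and only the variant that sums positive per-band increases is shown.

```python
import cmath
import math

def thin(frame, step=4):
    """Keep every `step`-th sample (1/4 thinning turns 1,024 samples into 256)."""
    return frame[::step]

def band_powers(samples):
    """Power (squared magnitude) of each non-negative frequency bin.

    A naive DFT is used here as a stand-in for the FFT (assumption made
    only to keep the sketch dependency-free).
    """
    n = len(samples)
    powers = []
    for b in range(n // 2 + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * b * k / n)
                  for k, s in enumerate(samples))
        powers.append(abs(acc) ** 2)
    return powers

def power_data(curr_powers, prev_powers):
    """Sum only the per-band increases in power relative to the preceding frame.

    Bands whose power decreased contribute nothing to the sum.
    """
    return sum(max(c - p, 0.0) for c, p in zip(curr_powers, prev_powers))
```

Restricting the sum to bands below a cutoff frequency, as mentioned above, would amount to slicing `curr_powers` and `prev_powers` before calling `power_data`.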
The power data is stored in the RAM 12 or the HDD 13 in units of frames. Every time the power data in units of frames is prepared, the generation unit 101 compares the magnitudes of the sums (peak values) of the power, saves the data indicating the larger sum, and discards the data indicating the smaller sum (S05). The generation unit 101 determines whether or not a state in which no sum larger than the sum saved in S05 has appeared has continued for a predetermined time (S06). The predetermined time is 100 ms, for example, but it may be longer or shorter than 100 ms. When no data indicating a larger sum has appeared for the predetermined time, the generation unit 101 extracts the data indicating the sum of powers as Spx data and stores (retains) the Spx data in the buffer 102 (S07). In this manner, the Spx data is data obtained by extracting peak values of digital data indicating music at intervals of 100 ms and indicates the timings on which a beat of music depends and the power at those timings. A plurality of pieces of Spx data is accumulated in the buffer 102. The generation unit 101 repeatedly performs the processing from S01 to S06.
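The peak extraction of S05 to S07 can be sketched as follows. This is one possible reading of the description, given for illustration only: the function name `extract_spx` and the `(time, power)` tuple layout are hypothetical, and the choice to start the next candidate from the frame at which a peak is emitted is an assumption that the text does not state.

```python
def extract_spx(frames, hold=0.1):
    """Pick peaks from per-frame power sums.

    frames: iterable of (time_sec, power_sum), one entry per frame.
    hold:   time in seconds (100 ms here) during which no larger sum
            may appear before the saved sum is extracted as Spx data.
    Returns a list of (time_sec, power_sum) peaks.
    """
    spx = []
    best = None  # (time, power) of the largest sum saved so far
    for t, p in frames:
        if best is None or p > best[1]:
            best = (t, p)        # a larger sum replaces the saved one (S05)
        elif t - best[0] >= hold:
            spx.append(best)     # nothing larger for `hold` seconds (S06, S07)
            best = (t, p)        # assumption: restart from the current frame
    return spx
```

With 1,024-sample frames at 44,100 Hz, consecutive entries of `frames` are about 23 ms apart, so roughly four or five frames must pass without a larger sum before a peak is emitted.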
<Calculation of Period Data and Phase Data>
To be specific, for the six seconds of Spx data, the sum of products with Exp(2πjft) (for a sine wave vibrating at a BPM frequency, the amplitude is the same regardless of the vibrational frequency) is obtained for each frequency (BPM frequency) f = {86, 90, 94, ..., 168}/60 corresponding to a predetermined number of BPM values, for example, 20 BPM values within a range of 86 to 168. That is, Fourier conversion is performed. The result of Fourier conversion is adopted as Fourier conversion data c(i) (i = 0, 1, 2, 3, ..., 19).
Here, the factor t(k) in Expression 1 indicates a time position, in units of seconds, within the previous six seconds at which the Spx data is present. The factor k indicates an index of the Spx data, that is, k = 1, 2, ..., M (M is the number of pieces of Spx data). In addition, the factor x(t(k)) indicates the value (magnitude of the peak value) of the Spx data at that time position. The factor j indicates the imaginary unit (j² = −1). The factor f(i) indicates a BPM frequency. For example, BPM 120 indicates 2.0 Hz.
The calculation unit 103 determines, as the BPM of the Spx data (beat), the BPM for which the absolute value of c(i) = (c0, c1, c2, ..., c19) is the largest (S13). In addition, the phase value (Phase) φ = Arg(c(i)) [rad] of that c(i) is adopted as a beat timing for the six seconds of Spx data. A beat timing indicates a relative position with respect to the beat generation timings which arrive periodically.
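The selection of the BPM and its phase can be sketched as follows. This is a hedged illustration: the function name `estimate_beat` and the `(time, magnitude)` layout are hypothetical, the positive exponent follows the Exp(2πjft) form quoted in the text, and folding the phase into the range 0 to 2π is an assumption made here so that it reads directly as a delay within one period.

```python
import cmath
import math

def estimate_beat(spx, bpms):
    """Return (bpm, phase_rad) of the candidate with the largest |c|.

    spx:  list of (t_seconds, magnitude) peaks from the analysis window.
    bpms: candidate BPM values supplied by the caller.
    """
    best_bpm, best_c = None, 0j
    for bpm in bpms:
        f = bpm / 60.0  # BPM frequency in Hz, e.g. BPM 120 -> 2.0 Hz
        # Sum of products of each peak with Exp(2*pi*j*f*t).
        c = sum(x * cmath.exp(2j * math.pi * f * t) for t, x in spx)
        if best_bpm is None or abs(c) > abs(best_c):
            best_bpm, best_c = bpm, c
    # Arg(c) lies in (-pi, pi]; fold it into [0, 2*pi) (assumption).
    phase = cmath.phase(best_c) % (2 * math.pi)
    return best_bpm, phase
```

For peaks spaced 0.5 s apart and offset by 0.1 s from the start of the window, the candidate BPM 120 gives the largest magnitude and a phase of 2π × 2.0 Hz × 0.1 s = 0.4π rad, that is, one fifth of a period.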
The phase value φ indicates the argument (deflection angle) of a complex number and can be obtained by the following Expression 2 when c = c_re + j·c_im (c_re is the real part and c_im is the imaginary part) is established.
Through calculation of the phase value φ, it is possible to ascertain the relative position of the beat generation timing with respect to a BPM sine wave, that is, how far the beat generation timing is delayed within one period of the BPM.
For example, when the BPM is 104 and the sampling rate is 44,100 Hz, the period data (the number of samples) is 44,100 [samples]/(104/60)=25,442 [samples]. In addition, in a case in which the period data is 25,442 [samples], when the phase value φ is 0.34 [rad], the phase data (the number of samples) is 25,442 [samples]×0.34 [rad]/2π[rad]=1,377 [samples] (S15). Further, the calculation unit 103 outputs the period data and the phase data (S16). The calculation unit 103 repeatedly performs the processing of S11 to S16 every time six seconds of Spx data has been accumulated. Accordingly, it is possible to follow a change in rhythm of music.
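The conversion into sample counts can be checked with a short sketch. The function names are hypothetical, and rounding to the nearest integer is an assumption; the text does not state a rounding rule, but rounding to nearest reproduces the figures given above.

```python
import math

def period_samples(bpm, sample_rate=44100):
    """Number of samples in one beat period at the given BPM."""
    return round(sample_rate / (bpm / 60.0))

def phase_samples(period, phase_rad):
    """Number of samples corresponding to a phase within one period."""
    return round(period * phase_rad / (2 * math.pi))

period = period_samples(104)          # 25442
phase = phase_samples(period, 0.34)   # 1377
```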
<Detection of Beat Generation Timing>
In S22, the detection unit 104 employs the new period data and phase data for detecting the beat generation timing, and the old period data and old phase data are discarded. At this time, the samples of the frame constituting the Spx data are delayed by 100 ms because of the waiting time applied when the Spx data is prepared. Therefore, time adjustment (phase adjustment) is performed here such that the rhythm of the music being performed or reproduced and a hand clapping sound (which will be described below) coincide with each other. Thereafter, the processing proceeds to S23.
In S23, a counter is set using the number of samples of the period data and the number of samples of the phase data. For example, the detection unit 104 has a counter that counts up (increments) once for each sample at the sampling rate (that is, at each interval between voltage checks of the analog signal). The detection unit 104 stands by until the count value, starting from zero, becomes equal to or larger than a predetermined value (the sum of the number of samples (count value) of the phase data and the number of samples (count value) of the period data) (S24).
When the count value of the counter becomes equal to or larger than the predetermined value, the detection unit 104 detects the beat generation timing based on the estimation and outputs an output instruction of a beat (S25). In response to the output instruction, the reproduction processing unit 105 sends the digital data of the beat (for example, a hand clapping sound) which has been stored in the ROM 11 or the HDD 13 in advance to the D/A 17. The digital data is converted into an analog signal by the D/A 17, has its amplitude amplified by the AMP 18, and then is output from the speaker 19. Accordingly, a hand clapping sound is output in a manner of being superimposed on the music being reproduced or performed.
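The counter-based detection of S23 to S25 can be sketched as follows. This is an illustration under assumptions: the function name `beat_sample_indices` is hypothetical, the 100 ms time adjustment is omitted, and the behavior after the first beat (resetting the counter and waiting one further period) is an assumption, since the text describes only the first target value.

```python
def beat_sample_indices(period, phase, total_samples):
    """Return the sample indices at which a beat output would be instructed.

    period, phase: period data and phase data, both in samples.
    total_samples: number of samples to step through.
    """
    beats = []
    counter = 0
    target = phase + period  # predetermined value: phase plus one period
    for n in range(total_samples):
        counter += 1         # incremented once per sample
        if counter >= target:
            beats.append(n)  # beat generation timing detected (S25)
            counter = 0
            target = period  # assumption: later beats are one period apart
    return beats
```

With a toy period of 10 samples and a phase of 3 samples, stepping through 40 samples yields beats at indices 12, 22, and 32.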
According to the embodiment, music which has been reproduced or performed (in the past) is input to the generation unit 101, and the generation unit 101 generates Spx data. Such Spx data is accumulated in the buffer 102, the calculation unit 103 calculates the period and the phase of a beat from a plurality of pieces of Spx data from over a predetermined time (six seconds), and the detection unit 104 detects and outputs the beat generation timing in accordance with the music being reproduced or performed. Accordingly, the reproduction processing unit 105 can output a hand clapping sound which coincides with the rhythm of music being reproduced or performed. This hand clapping sound can be automatically output through generation of the Spx data described above, calculation of the period and the phase of the beat based on the Fourier conversion data, and a simple algorithm having a small amount of calculation, such as counting up to the counter value. Accordingly, increase in load with respect to an object (CPU 10) which executes processing or increase in memory resource usage can be avoided. In addition, due to the small amount of processing, a clapping sound can be output without a delay (or even if there is a delay, people cannot recognize the delay) with respect to reproduction sound or performance sound.
The processing performed by the control unit 100 may be performed using a plurality of CPUs (processors) or may be performed using a CPU having a multi-core configuration. In addition, the processing performed by the control unit 100 may be executed by a processor (a DSP, a GPU, or the like) other than the CPU 10, an integrated circuit (an ASIC, an FPGA, or the like) other than a processor, or a combination (an MPU, an SoC, or the like) of a processor and an integrated circuit.
In the embodiment described above, regarding the BPM used for calculating the period data, an example using a BPM within a range of 86 to 168 has been described. As a modification, the absolute value (spectral intensity) of c(i) may be obtained not only for each BPM within the range of 86 to 168 (each corresponding to a first BPM) but also for the BPM within a range of 172 to 336 (twice the first BPM) or within a range of 344 to 672 (four times the first BPM) (each corresponding to at least one second BPM having a vibrational frequency which is an integral multiple of the vibrational frequency of the first BPM).
Depending on the music, since there may be a greater power with respect to a BPM corresponding to finer eighth notes or sixteenth notes than a basic beat symbolized by quarter notes, a better BPM may be able to be selected by reflecting twice or four times this power in the intensity of the basic beat. In the foregoing example, twice or four times are adopted as an example of an integral multiple, but similar effects can be obtained with an integral multiple of three times, five times, or greater. The constituents shown in the embodiment can be suitably combined within a range not departing from the objective.
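The summing of the intensities at integral multiples of a BPM can be sketched as follows. The function names and, in particular, the weights (1.0, 0.5, and 0.25 for the first BPM, twice it, and four times it) are hypothetical; the disclosure says only that the values are summed at a predetermined proportion.

```python
import cmath
import math

def spectral_intensity(spx, bpm):
    """Absolute value of the Fourier conversion data at one BPM."""
    f = bpm / 60.0
    return abs(sum(x * cmath.exp(2j * math.pi * f * t) for t, x in spx))

def combined_intensity(spx, bpm, weights=((1, 1.0), (2, 0.5), (4, 0.25))):
    """Weighted sum of the intensities at the first BPM and its multiples.

    weights: (multiple, proportion) pairs; the values are illustrative only.
    """
    return sum(w * spectral_intensity(spx, bpm * m) for m, w in weights)
```

As an example of the effect, 24 equal peaks spaced 0.25 s apart (eighth notes at BPM 120) cancel almost completely at BPM 120 alone, yet with the weights above the combined value for BPM 120 is 0.5 × 24 + 0.25 × 24 = 18, so the basic beat can still be selected.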
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2018/020079 | 5/24/2018 | WO |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2019/224990 | 11/28/2019 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
7373209 | Tagawa | May 2008 | B2 |
7534951 | Yamashita | May 2009 | B2 |
7923621 | Shiraishi | Apr 2011 | B2 |
8269093 | Naik | Sep 2012 | B2 |
8426715 | Bregar | Apr 2013 | B2 |
8436241 | Yamashita | May 2013 | B2 |
8704069 | Naik | Apr 2014 | B2 |
10262640 | Takehisa | Apr 2019 | B2 |
20050117032 | Ueda | Jun 2005 | A1 |
20060127054 | Matsuyama | Jun 2006 | A1 |
20070234882 | Usa | Oct 2007 | A1 |
20080276793 | Yamashita | Nov 2008 | A1 |
20090056526 | Yamashita | Mar 2009 | A1 |
20090287323 | Kobayashi | Nov 2009 | A1 |
20100282045 | Chen | Nov 2010 | A1 |
20110144780 | Ueshima | Jun 2011 | A1 |
20130226957 | Ellis | Aug 2013 | A1 |
20150094835 | Eronen | Apr 2015 | A1 |
20200152162 | Maezawa | May 2020 | A1 |
20200357368 | Yoshino | Nov 2020 | A1 |
20210241729 | Kusakabe | Aug 2021 | A1 |
20220020348 | Kusakabe | Jan 2022 | A1 |
20220351707 | Kusakabe | Nov 2022 | A1 |
20230053899 | Minakata | Feb 2023 | A1 |
Number | Date | Country |
---|---|---|
104620313 | May 2015 | CN |
2002150689 | May 2002 | JP |
2003289494 | Oct 2003 | JP |
2004302053 | Oct 2004 | JP |
2006102080 | Apr 2006 | JP |
2006153923 | Jun 2006 | JP |
2007033851 | Feb 2007 | JP |
2008107569 | May 2008 | JP |
2008275975 | Nov 2008 | JP |
2008283305 | Nov 2008 | JP |
2009092681 | Apr 2009 | JP |
2009098262 | May 2009 | JP |
2010055076 | Mar 2010 | JP |
4561735 | Oct 2010 | JP |
2011164497 | Aug 2011 | JP |
2012118417 | Jun 2012 | JP |
2017219595 | Dec 2017 | JP |
2018180480 | Nov 2018 | JP |
2020529235 | Oct 2020 | JP |
2008129837 | Oct 2008 | WO |
WO-2019043798 | Mar 2019 | WO |
Entry |
---|
“Office Action of Japan Counterpart Application”, dated Nov. 24, 2021, with English translation thereof, pp. 1-6. |
“Written Opinion and Written amendment of Japan Counterpart Application”, submitted on Jan. 20, 2022, with English translation thereof, pp. 1-8. |
“International Search Report (Form PCT/ISA/210) of PCT/JP2018/020079,” dated Aug. 7, 2018, with English translation thereof, pp. 1-5. |
Number | Date | Country | |
---|---|---|---|
20210241729 A1 | Aug 2021 | US |