The present technology relates to an acoustic processing apparatus, an acoustic processing method, a program, an electronic apparatus, a server apparatus, a client apparatus, and an acoustic processing system, and, in particular, to an acoustic processing apparatus or the like able to mark karaoke using commercial music content.
Most karaoke marking methods, systems and apparatuses of the related art prepare singing main melody data that is a model in addition to accompaniment data that does not include the singing main melody of music, and perform marking according to the degree of matching between pitch time series data extracted from the singing voice of the singer that is the marking target and the singing main melody data (for example, PTL 1). Such a karaoke marking function is provided through karaoke apparatuses or karaoke games installed in karaoke shops and restaurants in town, and Internet services or the like.
Meanwhile, current commercial music content is delivered to end users in forms such as a physical media package, such as a CD, or through download sales of a compressed audio file format, such as MP3 or AAC, over a communication line, such as the Internet. Most commercial music content is ordinarily provided as an audio signal in which the singing and accompaniment are mixed together, and in this case, the singing main melody is not provided as independent data.
If a technology existed that extracts only the singing main melody signal from an audio signal of commercial music content in which the singing and accompaniment are mixed, it would be possible to realize karaoke marking with the method of the related art. However, although there has been much research, it is difficult to say that signal extraction of the singing main melody has sufficient precision. In consideration of the above situation, it can be said that there has until now been no means of enjoying karaoke marking with only commercial music content provided as a CD or a compressed audio file.
For control of acoustic effects in karaoke of the related art, it is common for a singer to use a karaoke apparatus (a karaoke machine in a karaoke box, or PC or game software) and to adjust the echo and harmony in advance, that is, to turn the functions on or off and to control their strength. A method in which the music provider prepares these acoustic effects in advance to match the atmosphere of the music, so that the acoustic effects are applied automatically, has also been proposed (for example, refer to PTL 2).
However, in the case of the user setting the acoustic effect in advance, the effect remains the same from start to finish, and auditory stimulation is lacking. When used by a person with little singing ability, dissonance is generated with respect to the harmony, making not only the singer but also the surrounding listeners uncomfortable. In a case in which the acoustic effects are changed to match the atmosphere of the music, although a certain extent of auditory stimulation is obtained, there remain the problems of the time and effort required for the music provider to set the acoustic effects in advance, and of the dissonance in a case in which a singer with little singing ability uses harmony.
PTL 1: Japanese Unexamined Patent Application Publication No. 4-070690
PTL 2: Japanese Unexamined Patent Application Publication No. 11-052970
An object of the present technology is to enable karaoke marking using commercial music content. Another object of the present technology is to enable application of acoustic effects in real time according to the singing ability of the singer.
According to an aspect of the present technology, there is provided an acoustic processing apparatus including a first feature amount calculator that calculates a first pitch feature amount from a music acoustic signal for each predetermined time interval; a second feature amount calculator that calculates a second pitch feature amount from a target comparison acoustic signal for each time interval corresponding to the predetermined time interval; and a similarity calculator that calculates a similarity between acoustic signals by comparison of the first pitch feature amount and the second pitch feature amount.
In the technology, the first pitch feature amount is calculated from the music acoustic signal for each predetermined time interval by the first feature amount calculator. The music acoustic signal, for example, is provided by a media package, such as a CD, or is provided through a communication line, such as the Internet. The predetermined time interval, for example, is a comparatively short time interval over which the feature amount is approximately constant.
The second pitch feature amount is calculated by the second feature amount calculator from the target comparison acoustic signal for each time interval corresponding to the predetermined time interval. The comparison acoustic signal is a singing voice signal or a musical instrument performance signal. The time interval corresponding to the predetermined time interval does not necessarily correspond one-to-one to the predetermined time interval, and may have a correspondence relationship with the predetermined time interval. For example, the time interval corresponding to the predetermined time interval may be a time interval of an integer multiple of the predetermined time interval.
For example, in the first feature amount calculator, signal intensity information for each time period or each frequency of the music acoustic signal is calculated as the first pitch feature amount. For example, in the second feature amount calculator, the time period or frequency of each signal component included in the target comparison acoustic signal is calculated as the second pitch feature amount.
A similarity between acoustic signals is calculated by the similarity calculator by comparison of the first pitch feature amount and the second pitch feature amount. For example, the above-described signal intensity information as the first pitch feature amount may be used as is, or may be binarized and used. Binarizing the information makes it possible to reduce the calculation amount of the similarity calculation. For example, as the second pitch feature amount, a time period that is double the detected time period, or a frequency that is ½ the detected frequency, may be used in addition to the detected time period or frequency itself.
In the present technology, the similarity between acoustic signals is calculated by comparison between a first pitch feature amount calculated from a music acoustic signal for each predetermined time interval and a second pitch feature amount calculated from the target comparison acoustic signal for each time interval corresponding to the predetermined time interval, and, for example, it is possible to mark karaoke using commercial music content.
In the present technology, an acoustic effect application portion that applies a predetermined acoustic effect to the singing voice signal according to the similarity may be further included. In this case, it is possible to apply acoustic effects in real time according to the singing ability of the singer.
According to another aspect of the present technology, there is provided an electronic apparatus including an accompaniment audio output portion that performs output of accompaniment audio according to a music acoustic signal; an acoustic signal acquisition portion that acquires a target comparison acoustic signal; and a signal processing portion that performs comparison processing between the target comparison acoustic signal and the music acoustic signal, in which the signal processing portion includes a first feature amount calculator that calculates a first pitch feature amount from the music acoustic signal for each predetermined time interval, a second feature amount calculator that calculates a second pitch feature amount from the target comparison acoustic signal for each time interval corresponding to the predetermined time interval, and a similarity calculator that calculates a similarity between acoustic signals by comparison of the first pitch feature amount and the second pitch feature amount.
According to still another aspect of the present technology, there is provided an acoustic processing apparatus including a marking processing portion that performs a marking process based on a singing voice signal; and an acoustic effect application portion that applies a predetermined acoustic effect to the singing voice signal according to a result of the marking process.
In the technology, the marking process is performed based on the singing voice signal by the marking processing portion. A predetermined acoustic effect is applied to the singing voice signal according to the result of the marking process by the acoustic effect application portion. For example, the marking processing portion may be set so as to perform the marking process by calculating a similarity between a music acoustic signal and the singing voice signal. For example, the marking processing portion may include a first feature amount calculator that calculates a first pitch feature amount from the music acoustic signal for each predetermined time interval, a second feature amount calculator that calculates a second pitch feature amount from the singing voice signal for each time interval corresponding to the predetermined time interval, and a similarity calculator that calculates a similarity between acoustic signals by comparison of the first pitch feature amount and the second pitch feature amount.
In the technology, predetermined acoustic effects are applied to the singing voice signal according to the results of the marking process based on the singing voice signal, and it is possible to apply acoustic effects in real time according to the singing ability of the singer.
According to still another aspect of the present technology, there is provided an acoustic processing system including a server apparatus and a client apparatus, in which the server apparatus includes a feature amount calculator that calculates a first pitch feature amount from a music acoustic signal for each predetermined time interval, and an information transmitter that transmits information based on the first pitch feature amount to a client apparatus, and the client apparatus includes an acoustic signal acquisition portion that acquires a target comparison acoustic signal, and a similarity acquisition portion that acquires a similarity between acoustic signals calculated by comparison between the first pitch feature amount and a second pitch feature amount calculated from the target comparison acoustic signal for each time interval corresponding to the predetermined time interval.
The present technology is formed by a server apparatus and a client apparatus. A feature amount calculator and an information transmitter are provided in the server apparatus. The first pitch feature amount is calculated from the music acoustic signal for each predetermined time interval by the feature amount calculator. Information based on the first pitch feature amount is transmitted to the client apparatus by the information transmitter.
For example, the server apparatus may further include an acoustic signal receiver that receives a target comparison acoustic signal from the client apparatus; a second feature amount calculator that calculates a second pitch feature amount from the target comparison acoustic signal for each time interval corresponding to the predetermined time interval; and a similarity calculator that calculates a similarity between acoustic signals by comparison of the first pitch feature amount and the second pitch feature amount, in which the information transmitter transmits the similarity to the client apparatus.
For example, the server apparatus may further include a feature amount receiver that receives a second pitch feature amount calculated from a target comparison acoustic signal for each time interval corresponding to the predetermined time interval from the client apparatus; and a similarity calculator that calculates a similarity between acoustic signals by comparison of the first pitch feature amount and the second pitch feature amount, in which the information transmitter transmits the similarity to the client apparatus.
An acoustic signal acquisition portion and a similarity acquisition portion are included in the client apparatus. The target comparison acoustic signal is acquired by the acoustic signal acquisition portion. A similarity between acoustic signals calculated by comparison between the first pitch feature amount and a second pitch feature amount calculated from the target comparison acoustic signal for each time interval corresponding to the predetermined time interval is acquired by the similarity acquisition portion.
For example, the client apparatus may further include a feature amount calculator that calculates the second pitch feature amount from the target comparison acoustic signal; a feature amount receiver that receives the first pitch feature amount from a server apparatus; and a similarity calculator that calculates a similarity between acoustic signals by comparison of the first pitch feature amount and the second pitch feature amount, in which the similarity acquisition portion acquires the similarity from the similarity calculator.
For example, the client apparatus may further include a feature amount calculator that calculates the second pitch feature amount from the target comparison acoustic signal; a feature amount transmitter that transmits the second pitch feature amount to the server apparatus; and a similarity receiver that receives the similarity from the server apparatus, in which the similarity acquisition portion acquires the similarity from the similarity receiver.
In the present technology, a process of calculating the first pitch feature amount from the music acoustic signal is performed by at least the server apparatus, and it is possible to reduce the processing burden and the circuit scale of the user side apparatus.
According to the present technology, it is possible to mark karaoke using commercial music content. According to the present technology, it is also possible to apply acoustic effects in real time according to the singing ability of the singer.
a) is a diagram showing an example of binarized signal intensity information of each time period for each time interval of a music acoustic signal.
a) is a diagram showing an example of signal intensity information of each time period for each time interval of a music acoustic signal.
Below, description will be given of embodiments for realizing the invention (below, referred to as “embodiments”). The description will be given in the following order.
1. Embodiments
2. Modification Examples
The microphone 11 configures the acquisition portion for the singing voice signal. The user (singer) inputs a singing voice matching the accompaniment audio into the microphone 11, and the microphone 11 outputs a singing voice signal corresponding to the singing voice. The marking processing portion 12 performs a marking process based on the singing voice signal and outputs marking information showing a similarity.
The acoustic effect application portion 13 applies a predetermined acoustic effect to the singing voice signal output from the microphone 11 according to the marking information as a marking process result. The adder 14 adds the singing voice signal output from the acoustic effect application portion 13 to the accompaniment audio signal. The speaker 15 outputs audio (accompaniment audio, singing audio) by the output signal of the adder 14.
The pitch feature amount analyzer 111 analyzes the music acoustic signal, and calculates the pitch feature amount of the musical composition audio for each predetermined time interval. Here, the predetermined time interval, for example, is a comparatively short time interval such that the feature amount in the time interval is approximately constant, such as 20 msec or 40 msec. Here, the calculated pitch feature amount is the signal intensity information for each time period or for each frequency of the music acoustic signal. The pitch feature amount analyzer 111 obtains time series data of the pitch feature amount of the music acoustic signal by calculating the above-described pitch feature amount for all of the above-described predetermined time intervals of the music acoustic signal.
The signal intensity information for each time period of the music acoustic signal, for example, is calculated using the autocorrelation function represented by the following expression (1).

R(T) = (1/N) Σ_{t=0}^{N−1} s(t)·s(t+T) . . . (1)

R(T): autocorrelation with time difference (period) T
s(t): input time signal at time t
N: number of data samples
The signal intensity information for each frequency of the music acoustic signal, for example, is calculated by performing a short-time Fourier transform. Below, obtaining the signal intensity information for each time period of the music acoustic signal with the pitch feature amount analyzer 111 will be described. However, although a detailed description will not be made, it is possible to obtain marking information by similar processing even in a case in which the signal intensity information is calculated for each frequency of the music acoustic signal in the pitch feature amount analyzer 111.
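As an illustrative sketch (not part of the disclosed apparatus itself), the calculation of signal intensity information for each time period by expression (1) can be written as follows; the sampling rate, frame length, and candidate period range are hypothetical values chosen for the example.

```python
import math

def autocorrelation(s, T):
    # Expression (1): R(T) = (1/N) * sum over t of s(t) * s(t + T)
    N = len(s) - T
    return sum(s[t] * s[t + T] for t in range(N)) / N

# Hypothetical 40 msec frame at an assumed 8 kHz sampling rate:
# a 200 Hz tone, whose time period is 40 samples
fs = 8000
frame = [math.sin(2 * math.pi * 200 * t / fs) for t in range(320)]

# Signal intensity information for each candidate time period
intensity = {T: autocorrelation(frame, T) for T in range(20, 61)}
strongest = max(intensity, key=intensity.get)
print(strongest)  # → 40 (the period of the 200 Hz tone)
```

Repeating this per interval over the whole signal yields the time series data of the pitch feature amount described above.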
The pitch detector 113 calculates the pitch feature amount from the singing voice signal for each time interval corresponding to the above-described predetermined time interval. The time interval corresponding to the predetermined time interval may be the same as the predetermined time interval or may be different. That is, the time interval corresponding to the predetermined time interval does not necessarily correspond one-to-one to the predetermined time interval, and may merely have a correspondence relationship with the predetermined time interval. For example, the time interval corresponding to the predetermined time interval may be a time interval of an integer multiple of the predetermined time interval. Below, the case of a one-to-one correspondence between the time interval and the predetermined time interval will be described. The pitch detector 113 obtains time series data of the pitch feature amount of the singing voice signal by calculating the above-described pitch feature amount in each time interval of the singing voice signal.
The calculated pitch feature amount is time period information or frequency information of the singing voice signal. The time period information of the singing voice signal, for example, is calculated using the autocorrelation function represented by the above-described expression (1). In this case, the pitch detector 113 extracts the basic period showing a strong correlation value. The frequency information of the singing voice signal is calculated by performing a short-time Fourier transform. In this case, because the power spectrum of a periodic signal holds peaks at integer multiples of the basic frequency, the pitch detector 113 extracts the lowest peak frequency. After the frequency information of the singing voice signal is calculated, it is also easy to convert the frequency information to the above-described time period information. Below, obtaining the time period information of the singing voice signal with the pitch detector 113 will be described.
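A minimal sketch of the frequency-side processing just described, under assumed values: the lowest-peak search is simplified here to a threshold test on hypothetical power-spectrum bins, and fs is an assumed sampling rate rather than one specified in the disclosure.

```python
def lowest_peak_frequency(freqs, power, threshold=0.2):
    # Because the power spectrum of a periodic signal peaks at integer
    # multiples of the basic frequency, take the lowest strong bin as
    # the basic frequency (simplified to a threshold test here).
    peaks = [f for f, p in zip(freqs, power) if p >= threshold]
    return min(peaks) if peaks else None

def frequency_to_period(f0, fs=8000):
    # Converting frequency information to time period information (samples)
    return round(fs / f0)

# Hypothetical spectrum of a singing voice with a 200 Hz fundamental
freqs = [100, 200, 300, 400, 600]
power = [0.01, 0.9, 0.05, 0.5, 0.3]
f0 = lowest_peak_frequency(freqs, power)
print(f0, frequency_to_period(f0))  # → 200 40
```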
The singing voice marking portion 114 calculates the marking information indicating the similarity between acoustic signals by comparison of the pitch feature amount of the music acoustic signal obtained by the pitch feature amount analyzer 111 and the pitch feature amount of the singing voice signal obtained by the pitch detector 113. In the singing voice marking portion 114, the signal intensity information for each time period of the music acoustic signal obtained by the pitch feature amount analyzer 111 is used as is, or is binarized and used. It is possible to reduce the calculation amount through the binarization. In the singing voice marking portion 114, the time period information of the singing voice signal obtained by the pitch detector 113 is used as is, or the doubled time period information is used together with it. Here, the doubled time period corresponds to ½ the frequency, that is, the pitch one octave lower.
A marking process example in the marking processing portion 12 will be described.
In marking process example 1, the signal intensity information for each time period is calculated as a pitch feature amount of the music acoustic signal for each predetermined time interval by the pitch feature amount analyzer 111, and binarized information of the signal intensity information is used in the singing voice marking portion 114. In marking process example 1, time period information that is the pitch feature amount of the singing voice signal for each predetermined time interval is calculated by the pitch detector 113, and the time period information thereof is used in the singing voice marking portion 114.
b) shows an example of binarized signal intensity information of each time period for each time interval of a music acoustic signal.
The flowchart in
Next, the marking processing portion 12, in Step ST4, calculates the time period information in the target time interval of the singing voice signal with the pitch detector 113 (refer to
The marking processing portion 12, in Step ST7, divides the score by the number of elapsed time intervals with the singing voice marking portion 114, and sets the marking result (marking information). The marking processing portion 12, in Step ST8, determines whether the singing is finished. The marking processing portion 12, for example, determines the finish of singing when the user performs a finishing operation from an operation portion not shown in the drawings, or when the accompaniment audio finishes. When the singing is not finished, the marking processing portion 12 returns to the process in Step ST2, and moves to the process with the next time interval set as the target time interval. Meanwhile, when the singing is finished, the marking processing portion 12 immediately finishes the marking process in Step ST9.
In the marking process in the flowchart in
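Marking process example 1 can be sketched as follows under assumed data structures (Steps ST2 to ST7 condensed into one loop): each interval's binarized music-side information is represented as a hypothetical dict mapping time period to 0 or 1.

```python
def marking_example_1(binary_tables, sung_periods):
    # binary_tables[i]: binarized signal intensity for interval i (period -> 0/1)
    # sung_periods[i]:  time period detected from the singing voice in interval i
    score = 0
    marking = 0.0
    for i, (table, period) in enumerate(zip(binary_tables, sung_periods), start=1):
        if table.get(period, 0) == 1:   # Step ST5: sung pitch present in the music
            score += 1                  # Step ST6: add one point
        marking = score / i             # Step ST7: divide by elapsed intervals
    return marking

tables = [{40: 1, 80: 1}, {40: 1}, {45: 1}, {45: 1}]
sung = [40, 40, 44, 45]
print(marking_example_1(tables, sung))  # → 0.75
```

Because only a table lookup and an increment are needed per interval, the binarized form keeps the per-interval calculation amount small.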
In marking process example 2, the signal intensity information for each time period is calculated as a pitch feature amount of the music acoustic signal for each predetermined time interval by the pitch feature amount analyzer 111, and the signal intensity information is used as is in the singing voice marking portion 114. In marking process example 2, time period information that is the pitch feature amount of the singing voice signal for each predetermined time interval is calculated by the pitch detector 113, and the time period information thereof is used in the singing voice marking portion 114.
b) shows an example of signal intensity information of each time period for each time interval of the music acoustic signal.
The flowchart in
Next, the marking processing portion 12, in Step ST14, adds, to the score with the singing voice marking portion 114, the signal intensity information of the time period indicated by the time period information calculated in Step ST13, from among the signal intensity information of each time period calculated in Step ST12. The marking processing portion 12, in Step ST15, divides the score by the number of elapsed time intervals with the singing voice marking portion 114, and sets the marking result (marking information).
Next, the marking processing portion 12, in Step ST16, determines whether the singing is finished. The marking processing portion 12, for example, determines the finish of singing when the user performs a finishing operation from an operation portion not shown in the drawings, or when the accompaniment audio finishes. When the singing is not finished, the marking processing portion 12 returns to the process in Step ST12, and moves to the process with the next time interval set as the target time interval. Meanwhile, when the singing is finished, the marking processing portion 12 immediately finishes the marking process in Step ST17.
In the marking process in the flowchart in
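Marking process example 2 differs from example 1 only in adding the raw (non-binarized) signal intensity to the score; a sketch with hypothetical values:

```python
def marking_example_2(intensity_tables, sung_periods):
    # intensity_tables[i]: signal intensity per time period, used as is
    score = 0.0
    marking = 0.0
    for i, (table, period) in enumerate(zip(intensity_tables, sung_periods), start=1):
        score += table.get(period, 0.0)  # Step ST14: add intensity at the sung period
        marking = score / i              # Step ST15: divide by elapsed intervals
    return marking

tables = [{40: 0.9}, {40: 0.8}, {45: 0.7}]
sung = [40, 40, 44]
print(marking_example_2(tables, sung))  # (0.9 + 0.8 + 0.0) / 3
```

Using the raw intensity weights a near-miss against a weakly voiced period differently from a strong one, at the cost of the extra calculation that binarization avoids.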
In marking process example 3, the signal intensity information for each time period is calculated as a pitch feature amount of the music acoustic signal for each predetermined time interval by the pitch feature amount analyzer 111, and binarized information of the signal intensity information is used in the singing voice marking portion 114. In marking process example 3, time period information that is the pitch feature amount of the singing voice signal for each predetermined time interval is calculated by the pitch detector 113, and the doubled time period information thereof is used in the singing voice marking portion 114 along with the time period information thereof.
b) shows an example of binarized signal intensity information of each time period for each time interval of a music acoustic signal.
The flowchart in
Next, the marking processing portion 12, in Step ST24, calculates the time period information in the target time interval of the singing voice signal with the pitch detector 113 (refer to
When the signal intensity information is “0” in Step ST25, the marking processing portion 12, in Step ST28, determines whether the signal intensity information of the time period that is double the time period indicated by the time period information calculated in Step ST24, that is, the time period one octave lower, is “1”. When the signal intensity information is “1”, the marking processing portion 12, in Step ST26, adds one point to the score with the singing voice marking portion 114, and thereafter moves to the process in Step ST27. Meanwhile, when the signal intensity information is “0”, the marking processing portion 12 moves immediately to the process in Step ST27.
The marking processing portion 12, in Step ST27, divides the score by the number of elapsed time intervals with the singing voice marking portion 114, and sets the marking result (marking information). The marking processing portion 12, in Step ST29, determines whether the singing is finished. The marking processing portion 12, for example, determines the finish of singing when the user performs a finishing operation from an operation portion not shown in the drawings, or when the accompaniment audio finishes. When the singing is not finished, the marking processing portion 12 returns to the process in Step ST22, and moves to the process with the next time interval set as the target time interval. Meanwhile, when the singing is finished, the marking processing portion 12 immediately finishes the marking process in Step ST30.
In the marking process in the flowchart in
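Marking process example 3 adds octave tolerance: when the sung period itself misses, the doubled period (one octave lower) is also checked. A sketch with hypothetical tables:

```python
def marking_example_3(binary_tables, sung_periods):
    score = 0
    marking = 0.0
    for i, (table, period) in enumerate(zip(binary_tables, sung_periods), start=1):
        # Step ST25: check the sung period; Step ST28: also accept the
        # doubled period, i.e. the pitch one octave lower in the music
        if table.get(period, 0) == 1 or table.get(period * 2, 0) == 1:
            score += 1                   # Step ST26
        marking = score / i              # Step ST27
    return marking

tables = [{40: 1}, {80: 1}, {45: 1}]
sung = [40, 40, 50]   # the second interval is sung one octave above the music
print(marking_example_3(tables, sung))  # → 2 of 3 intervals match
```

This keeps a singer who transposes up an octave from being penalized for every interval.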
The operation of the marking processing portion 12 shown in
[Reducing Process of Accompaniment Audio Creeping from Speaker]
It is assumed that singing is performed while accompaniment audio is output in a space by the music acoustic signal. In this case, an additional processing configuration such as shown in
The music acoustic signal is supplied to a song vocal cancellation processing portion 121, in addition to the pitch feature amount analyzer 111. In the song vocal cancellation processing portion 121, the vocal signal is canceled from the music acoustic signal, and the accompaniment audio signal is obtained. The accompaniment audio signal is supplied to the speaker 122, and the accompaniment audio is output from the speaker 122.
The singing voice is input to the microphone 123, and the accompaniment audio creeping in from the speaker 122 is also input. Therefore, in the output signal of the microphone 123, an echo signal due to the accompaniment audio is added to the singing voice signal. In the echo estimating portion 125, the space propagation characteristics (echo characteristics) between the speaker and the microphone are estimated by an adaptive filter process or the like, and an echo signal corresponding to the echo signal included in the singing voice signal is generated based on the accompaniment audio signal or the like. In the adder 124, the echo signal generated by the echo estimating portion 125 is subtracted from the output signal of the microphone 123. The singing voice signal from which the echo signal has been removed is output from the adder 124, and input to the pitch detector 113.
In the additional processing configuration such as shown in
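The adaptive filter process in the echo estimating portion can be sketched, for example, with a normalized LMS update; the filter length, step size, and signals below are all hypothetical, and the apparatus is not limited to this particular method.

```python
import math

def nlms_echo_cancel(mic, ref, taps=4, mu=0.5):
    # Estimate the speaker-to-microphone echo path from the accompaniment
    # reference signal, then subtract the estimated echo from the microphone
    # signal (the role of the echo estimating portion 125 and the adder 124).
    w = [0.0] * taps
    out = []
    for n in range(len(mic)):
        x = [ref[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        est = sum(wi * xi for wi, xi in zip(w, x))      # estimated echo
        e = mic[n] - est                                # echo-removed signal
        norm = sum(xi * xi for xi in x) + 1e-9
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        out.append(e)
    return out

ref = [math.sin(0.7 * n) + 0.5 * math.sin(1.3 * n) for n in range(2000)]
# Hypothetical echo path: a direct component plus a one-sample reflection
mic = [0.6 * ref[n] + (0.2 * ref[n - 1] if n > 0 else 0.0) for n in range(2000)]
residual = max(abs(e) for e in nlms_echo_cancel(mic, ref)[-100:])
print(residual < 0.01)  # the residual echo shrinks as the filter adapts
```

In practice the microphone signal also contains the singing voice; the adaptive update then tracks only the component correlated with the accompaniment reference.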
Configuring the marking processing portion 12 shown in
The server apparatus 12B includes a pitch feature amount analyzer 111 and a pitch feature amount transmitter 131. The pitch feature amount analyzer 111 calculates the pitch feature amount of the music acoustic signal for each predetermined time interval, for example, the signal intensity information of each time period by analyzing the music acoustic signal. The pitch feature amount transmitter 131 transmits time series data of the pitch feature amount obtained with the pitch feature amount analyzer 111 to the client apparatus 12A. Although the instruction path is not shown, an analysis instruction is transmitted from the client apparatus 12A to the server apparatus 12B before singing, and analysis of the music acoustic signal is begun in the server apparatus 12B based on the analysis instruction.
The client apparatus 12A includes a pitch detector 113, a singing voice marking portion 114, and a pitch feature amount receiver 132. The pitch detector 113 calculates a pitch feature amount of the singing voice signal for each predetermined time interval, for example, time period information from the singing voice signal. The pitch feature amount receiver 132 receives time series data of the pitch feature amount that is transmitted from the server apparatus 12B. The singing voice marking portion 114 calculates and outputs the marking information indicating the similarity between acoustic signals by comparison of the pitch feature amount of the music acoustic signal received by the pitch feature amount receiver 132 and the pitch feature amount of the singing voice signal obtained by the pitch detector 113.
The operation of the marking processing portion 12 (configuration example 1) shown in
In the client apparatus 12A, singing by the client (user) is begun. The pitch feature amount of the singing voice signal for each predetermined time interval, for example, time period information is calculated from the singing voice signal by the pitch detector 113. In the client apparatus 12A, the marking information is calculated by the singing voice marking portion 114 by comparison of the pitch feature amount of the music acoustic signal received by the pitch feature amount receiver 132 and the pitch feature amount of the singing voice signal obtained by the pitch detector 113. In so doing, acquisition of the marking information is performed with the client apparatus 12A.
In the marking processing portion 12 (configuration example 1) shown in
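Configuration example 1 can be sketched as follows; the transport layer is omitted, and all names and data shapes are hypothetical. The server precomputes the music-side feature time series once, and the client marks locally against the received data.

```python
def server_analyze(music_frames, binarize_threshold=0.25):
    # Pitch feature amount analyzer 111: each frame is a hypothetical dict of
    # {time period: signal intensity}; the binarized time series is what the
    # pitch feature amount transmitter 131 would send to the client.
    return [{T: 1 if r >= binarize_threshold else 0 for T, r in frame.items()}
            for frame in music_frames]

def client_mark(received_tables, sung_periods):
    # Pitch feature amount receiver 132 + singing voice marking portion 114:
    # compare the received music-side tables with the sung periods.
    score = sum(table.get(p, 0) for table, p in zip(received_tables, sung_periods))
    return score / len(sung_periods)

music = [{40: 0.6, 80: 0.5}, {40: 0.7}, {45: 0.8}]
tables = server_analyze(music)            # done once on the server side
print(client_mark(tables, [40, 40, 44]))  # → about 0.67
```

The analysis of the music acoustic signal, the heaviest step, runs only on the server, which is the burden reduction described above.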
The server apparatus 12B includes the pitch feature amount analyzer 111, the pitch detector 113, the singing voice marking portion 114, a voice signal receiver 142, and a marking information transmitter 143. The pitch feature amount analyzer 111 calculates the pitch feature amount of the music acoustic signal for each predetermined time interval, for example, the signal intensity information of each time period by analyzing the music acoustic signal. Although the instruction path is not shown, an analysis instruction is transmitted from the client apparatus 12A to the server apparatus 12B before singing, and analysis of the music acoustic signal is begun in the server apparatus 12B based on the analysis instruction.
The voice signal receiver 142 receives the singing voice signal transmitted from the client apparatus 12A. The pitch detector 113 calculates the pitch feature amount of the singing voice signal for each predetermined time interval, for example, the time period information from the singing voice signal received by the voice signal receiver 142. The singing voice marking portion 114 calculates the marking information indicating the similarity between acoustic signals by comparison of the pitch feature amount of the music acoustic signal obtained by the pitch feature amount analyzer 111 and the pitch feature amount of the singing voice signal obtained by the pitch detector 113. The marking information transmitter 143 transmits the marking information calculated with the singing voice marking portion 114 to the client apparatus 12A.
The client apparatus 12A includes the voice signal transmitter 141 and the marking information receiver 144. The voice signal transmitter 141 transmits the singing voice signal to the server apparatus 12B. The marking information receiver 144 receives the marking information transmitted from the server apparatus 12B.
The operation of the marking processing portion 12 (configuration example 2) shown in
In the client apparatus 12A, singing by the client (user) is begun. The singing voice signal is transmitted from the voice signal transmitter 141 of the client apparatus 12A and received by the voice signal receiver 142 of the server apparatus 12B. In the server apparatus 12B, the pitch feature amount of the singing voice signal for each predetermined time interval, for example, the time period information is calculated from the singing voice signal received in this way by the pitch detector 113.
In the server apparatus 12B, the marking information is calculated by the singing voice marking portion 114 by comparison of the pitch feature amount of the music acoustic signal obtained by the pitch feature amount analyzer 111 and the pitch feature amount of the singing voice signal obtained by the pitch detector 113. The marking information calculated in this way is transmitted from the marking information transmitter 143 of the server apparatus 12B, received by the marking information receiver 144 of the client apparatus 12A, and acquisition of the marking information is performed by the client apparatus 12A.
In the marking processing portion 12 (configuration example 2) shown in
The server apparatus 12B includes the pitch feature amount analyzer 111, the singing voice marking portion 114, a pitch feature amount receiver 152, and a marking information transmitter 153. The pitch feature amount analyzer 111 calculates the pitch feature amount of the music acoustic signal for each predetermined time interval, for example, the signal intensity information of each time period by analyzing the music acoustic signal. Although the instruction path is not shown, an analysis instruction is transmitted from the client apparatus 12A to the server apparatus 12B before singing, and analysis of the music acoustic signal is begun in the server apparatus 12B based on the analysis instruction.
The pitch feature amount receiver 152 receives time series data of the pitch feature amount of the singing voice signal transmitted from the client apparatus 12A. The singing voice marking portion 114 calculates the marking information indicating the similarity between acoustic signals by comparison of the pitch feature amount of the singing voice signal received by the pitch feature amount receiver 152 and the pitch feature amount of the music acoustic signal obtained with the pitch feature amount analyzer 111. The marking information transmitter 153 transmits the marking information calculated with the singing voice marking portion 114 to the client apparatus 12A.
The client apparatus 12A includes the pitch detector 113, the pitch feature amount transmitter 151 and the marking information receiver 154. The pitch detector 113 calculates a pitch feature amount of the singing voice signal for each predetermined time interval, for example, time period information from the singing voice signal. The pitch feature amount transmitter 151 transmits time series data of the pitch feature amount obtained with the pitch detector 113 to the server apparatus 12B. The marking information receiver 154 receives the marking information transmitted from the server apparatus 12B.
The operation of the marking processing portion 12 (configuration example 3) shown in
In the client apparatus 12A, singing by the client (user) is begun. In the client apparatus 12A, the pitch feature amount of the singing voice signal for each predetermined time interval, for example, the time period information is calculated by the pitch detector 113. The time series data of the pitch feature amount of the singing voice signal is transmitted from the pitch feature amount transmitter 151 of the client apparatus 12A, and received by the pitch feature amount receiver 152 of the server apparatus 12B.
In the server apparatus 12B, the marking information is calculated by the singing voice marking portion 114 by comparison of the pitch feature amount of the music acoustic signal obtained by the pitch feature amount analyzer 111 and the pitch feature amount of the singing voice signal received by the pitch feature amount receiver 152. The marking information calculated in this way is transmitted from the marking information transmitter 153 of the server apparatus 12B, received by the marking information receiver 154 of the client apparatus 12A, and acquisition of the marking information is performed by the client apparatus 12A.
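The division of labor described above, in which the client transmits only a compact pitch time series and the server returns the marking information, can be sketched as follows. The JSON payload format, the function names, and the 50-cent match tolerance are assumptions for illustration and are not taken from the present description.

```python
import json
import math

# Hypothetical sketch: the client apparatus 12A sends one pitch value per
# predetermined time interval, and the server apparatus 12B compares that
# series with the pitch feature amount of the music acoustic signal to
# produce the marking information.

def client_pack_pitch_series(pitch_hz_per_interval):
    """Client side: serialize the singing pitch time series for transmission."""
    return json.dumps({"pitch_series": pitch_hz_per_interval})

def server_mark(payload, reference_pitch_hz):
    """Server side: score the received singing pitches against the reference
    pitches of the music acoustic signal, interval by interval.  None marks
    intervals with no reference pitch; 0.0 marks unvoiced singing frames."""
    sung = json.loads(payload)["pitch_series"]
    matches = compared = 0
    for s, r in zip(sung, reference_pitch_hz):
        if r is None:
            continue  # nothing to compare against in this interval
        compared += 1
        if s and abs(1200 * math.log2(s / r)) < 50:  # within 50 cents
            matches += 1
    return round(100 * matches / compared) if compared else 0
```

For example, a singer who matches two of three voiced reference intervals would receive a score of 67 out of 100 under this illustrative rule.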
In the marking processing portion 12 (configuration example 3) shown in
As described above, in the marking processing portion 12 shown in
The configuration example of the marking processing portion 12 showing
The marking processing portion 12 includes a correct answer data delivery portion 161, the pitch detector 162, and a singing voice marking portion 163. The pitch detector 162 detects pitch information of the singing voice signal for each predetermined time interval (short time interval), and inputs the information to the singing voice marking portion 163. Here, the pitch information is the fundamental frequency obtained by analyzing the periodicity of the singing voice signal for each short time interval, or a pitch name obtained by quantization thereof.
The correct answer data delivery portion 161 delivers the correct answer data to the singing voice marking portion 163 while maintaining time synchronization with the pitch information. Here, the correct answer data is time series data of the fundamental frequency that is a model, or of a pitch name obtained by quantization thereof. The singing voice marking portion 163 compares the pitch information and the correct answer data, performs scoring according to a match or the closeness of the values, and obtains the marking information.
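A minimal sketch of this quantization and frame-by-frame comparison follows, assuming equal temperament with A4 = 440 Hz. The half-point rule for near misses is an illustrative choice; the exact scoring rule is not specified in the present description.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def to_pitch_name(f0_hz):
    """Quantize a fundamental frequency to the nearest equal-tempered
    pitch name (A4 = 440 Hz corresponds to MIDI note 69)."""
    midi = round(69 + 12 * math.log2(f0_hz / 440.0))
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)

def score_frame(sung_f0, correct_f0, tolerance_cents=100):
    """Full points on an exact pitch-name match, half points when the
    sung pitch is within the tolerance (in cents), otherwise zero."""
    if to_pitch_name(sung_f0) == to_pitch_name(correct_f0):
        return 1.0
    cents = abs(1200 * math.log2(sung_f0 / correct_f0))
    return 0.5 if cents <= tolerance_cents else 0.0
```

Summing or averaging `score_frame` over all voiced frames yields one form of the marking information.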
The operation of the marking processing portion 12 shown in
The harmony application portion 173 receives the singing voice signal S1 as input, and generates the harmony application signal S3 by adding a pitch-converted signal at an interval (for example, a third or a fifth) that matches the singing voice signal S1 when synthesized with it. The addition portion 174 adds the reverberation application signal S2 generated with the reverberation application portion 172 to the harmony application signal S3 generated with the harmony application portion 173, and a reverberation and harmony application signal S4 is obtained.
The acoustic effect application determining portion 171 performs a threshold determining process, as below, with respect to the marking information obtained by the marking processing portion 12 (refer to
The acoustic effect application determining portion 171 is set such that the connection of the switch portion 175 switches to the a terminal when score<α, and the input singing voice signal S1 is output as the output singing voice signal S5. It is set such that the connection of the switch portion 175 switches to the b terminal when α≦score<β, and the reverberation application signal S2 is output as the output singing voice signal S5. Furthermore, it is set such that the connection of the switch portion 175 switches to the c terminal when β≦score, and the reverberation and harmony application signal S4 is output as the output singing voice signal S5.
The operation of the acoustic effect application portion 13 shown in
In the harmony application portion 173, the harmony application signal S3 is generated by adding a pitch-converted signal at an interval (for example, a third or a fifth) that matches the singing voice signal S1 when synthesized with it. The harmony application signal S3 and the above-described reverberation application signal S2 are added with the addition portion 174, and the reverberation and harmony application signal S4 is obtained. The reverberation and harmony application signal S4 is supplied to the c terminal of the switch portion 175.
The marking information is supplied to the acoustic effect application determining portion 171. In the acoustic effect application determining portion 171, a threshold determining process is performed with respect to the marking information, and switching of the switch portion 175 is controlled. When score<α, that is, when the score is low, the connection of the switch portion 175 is switched to the a terminal, and the input singing voice signal S1 is set as is as the output singing voice signal S5. When α≦score<β, that is, when the score is intermediate, the connection of the switch portion 175 is switched to the b terminal, and the reverberation application signal S2 is set as the output singing voice signal S5. When β≦score, that is, when the score is high, the connection of the switch portion 175 is switched to the c terminal, and the reverberation and harmony application signal S4 is set as the output singing voice signal S5.
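The threshold determining process and switch control described above can be sketched as follows; the values of α and β are illustrative, not taken from the present description.

```python
def select_output(score, s1, s2, s4, alpha=40, beta=70):
    """Return the output singing voice signal S5 according to the score:
    the input signal S1 as is (low score, a terminal), the reverberation
    application signal S2 (intermediate score, b terminal), or the
    reverberation and harmony application signal S4 (high score, c terminal)."""
    if score < alpha:
        return s1  # a terminal: no effect
    if score < beta:
        return s2  # b terminal: reverberation only
    return s4      # c terminal: reverberation and harmony
```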
In the acoustic effect application portion 13 shown in
The acoustic effect application portion 13 shown in the above-described
For example, consider setting the marking information (score) to SCORE (maximum 100 points), setting α and β (where α<β) as thresholds, and adding an effect to the singing voice signal as below.
(1) If SCORE<α, the output singing voice signal S5 is switched to the input singing voice signal S1.
(2) If α≦SCORE<β, the output singing voice signal S5 is switched to the reverberation application signal S2 in which the intensity of the reverberation is controlled according to the SCORE as below. In this case, when the intensity of the reverberation is set to RLev (0 to 1.0), RLev=SCORE÷100.
(3) If β≦SCORE, the output singing voice signal S5 is switched to the reverberation and harmony application signal S4 in which the intensities of the reverberation and the harmony are controlled according to the SCORE as below. In this case, when the intensity of the reverberation is set to RLev (0 to 1.0), RLev=SCORE÷100, and when the intensity of the harmony is set to HLev (0 to 1.0), HLev=SCORE÷100.
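Rules (1) to (3) above can be sketched as follows; the threshold values are illustrative.

```python
def effect_levels(score, alpha=40, beta=70):
    """Map the marking information SCORE (0 to 100) to effect intensities:
    no effect below alpha, reverberation only between alpha and beta, and
    reverberation plus harmony above beta, with RLev = HLev = SCORE / 100."""
    if score < alpha:
        return {"reverb": 0.0, "harmony": 0.0}   # rule (1): S1 as is
    if score < beta:
        return {"reverb": score / 100, "harmony": 0.0}  # rule (2): S2
    return {"reverb": score / 100, "harmony": score / 100}  # rule (3): S4
```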
The acoustic effect application portion 13 shown in the above-described
(1) If SCORE<α, the output singing voice signal S5 is switched to the reverberation application signal S2 in which the intensity of the reverberation is controlled according to the SCORE as below. In this case, when the intensity of the reverberation is set to RLev (0 to 1.0), RLev=(100−SCORE)÷100.
(2) If α≦SCORE, switching is performed to the reverberation and harmony application signal S4 in which the intensities of the reverberation and the harmony are controlled according to the SCORE as below. In this case, when the intensity of the reverberation is set to RLev (0 to 1.0), RLev=(100−SCORE)÷100, and when the intensity of the harmony is set to HLev (0 to 1.0), HLev=SCORE÷100.
By deepening the echo (reverb) as the score decreases through this control, it is possible to cover for off-key singing. Since harmony sounds discomforting in the case of an off-key singer, its intensity is suppressed as the score decreases.
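The alternate control of rules (1) and (2) above, in which the reverberation deepens as the score falls while the harmony grows with the score, can be sketched as follows; the value of α is illustrative.

```python
def effect_levels_covering(score, alpha=60):
    """Alternate control: RLev = (100 - SCORE) / 100, so reverberation
    deepens as the score falls and covers off-key singing, while harmony
    is added only at or above alpha with HLev = SCORE / 100."""
    rlev = (100 - score) / 100
    hlev = score / 100 if score >= alpha else 0.0
    return {"reverb": rlev, "harmony": hlev}
```

Compared with the previous scheme, the reverberation mapping is inverted so that the effect masks, rather than rewards, the singer.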
As described above, in the karaoke apparatus 10 in
In the above-described embodiment, although an example in which the target comparison acoustic signal is the singing voice signal is shown, the present technology is not limited thereto, and cases of other acoustic signals, for example, musical instrument performance signals, or the like, are considered.
In the above-described embodiments, although description was made assuming a case of a single person singing as the singing voice signal, it is possible to perform the same marking process with respect to the singing voice signal in a case of two people singing, for example, a duet piece. Naturally, singing by three or more people is also possible.
In addition, in the above-described embodiments, it is not necessary to perform the process in which the pitch feature amount analyzer 111 obtains the pitch feature amount of the musical composition from the music acoustic signal in real time in synchronization with the singing; the process may be performed in advance.
Here, the present technology may also adopt the following configuration.
(1) An acoustic processing apparatus including a first feature amount calculator that calculates a first pitch feature amount from a music acoustic signal for each predetermined time interval; a second feature amount calculator that calculates a second pitch feature amount from a target comparison acoustic signal for each time interval corresponding to the predetermined time interval; and a similarity calculator that calculates a similarity between acoustic signals by comparison of the first pitch feature amount and the second pitch feature amount.
(2) The acoustic processing apparatus according to (1), in which the target comparison acoustic signal is a singing voice signal.
(3) The acoustic processing apparatus according to (2), further including an acoustic effect application portion that applies a predetermined acoustic effect to the singing voice signal according to the similarity.
(4) The acoustic processing apparatus according to any one of (1) to (3), in which the first feature amount calculator calculates signal intensity information for each time period or each frequency of the music acoustic signal as a first pitch feature amount, and the second feature amount calculator calculates a time period or frequency of each signal component included in the target comparison acoustic signal as a second pitch feature amount.
(5) The acoustic processing apparatus according to (4), in which the similarity calculator binarizes and uses the signal intensity information as the first pitch feature amount.
(6) The acoustic processing apparatus according to (4) or (5), in which the similarity calculator uses, in addition to the time period or the frequency as the second pitch feature amount, a time period that is double the time period or a frequency that is ½ the frequency.
(7) An acoustic processing method including the steps of calculating a first pitch feature amount from a music acoustic signal for each predetermined time interval; calculating a second pitch feature amount from a target comparison acoustic signal for each time interval corresponding to the predetermined time interval; and calculating a similarity between acoustic signals by comparison of the first pitch feature amount and the second pitch feature amount.
(8) A program causing a computer to function as first feature amount calculating means for calculating a first pitch feature amount from a music acoustic signal for each predetermined time interval; second feature amount calculating means for calculating a second pitch feature amount from a target comparison acoustic signal for each time interval corresponding to the predetermined time interval; and similarity calculating means for calculating a similarity between acoustic signals by comparison of the first pitch feature amount and the second pitch feature amount.
(9) An electronic apparatus including an accompaniment audio output portion that performs output of accompaniment audio according to a music acoustic signal; an acoustic signal acquisition portion that acquires a target comparison acoustic signal; and a signal processing portion that performs comparison processing between the target comparison acoustic signal and the music acoustic signal, in which the signal processing portion includes a first feature amount calculator that calculates a first pitch feature amount from the music acoustic signal for each predetermined time interval, a second feature amount calculator that calculates a second pitch feature amount from the target comparison acoustic signal for each time interval corresponding to the predetermined time interval, and a similarity calculator that calculates a similarity between acoustic signals by comparison of the first pitch feature amount and the second pitch feature amount.
(10) An acoustic processing apparatus including a marking processing portion that performs a marking process based on a singing voice signal; and an acoustic effect application portion that applies a predetermined acoustic effect to the singing voice signal according to a result of the marking process.
(11) The acoustic processing apparatus according to (10), in which the marking processing portion performs the marking process by calculating a similarity between a music acoustic signal and the singing voice signal.
(12) The acoustic processing apparatus according to (11), in which the marking processing portion includes a first feature amount calculator that calculates a first pitch feature amount from the music acoustic signal for each predetermined time interval, a second feature amount calculator that calculates a second pitch feature amount from the singing voice signal for each time interval corresponding to the predetermined time interval, and a similarity calculator that calculates a similarity between acoustic signals by comparison of the first pitch feature amount and the second pitch feature amount.
(13) A server apparatus including a first feature amount calculator that calculates a first pitch feature amount from a music acoustic signal for each predetermined time interval; and an information transmitter that transmits information based on the first pitch feature amount to a client apparatus.
(14) The server apparatus according to (13) further including an acoustic signal receiver that receives a target comparison acoustic signal from the client apparatus; a second feature amount calculator that calculates a second pitch feature amount from the target comparison acoustic signal for each time interval corresponding to the predetermined time interval; and a similarity calculator that calculates a similarity between acoustic signals by comparison of the first pitch feature amount and the second pitch feature amount, in which the information transmitter transmits the similarity to the client apparatus.
(15) The server apparatus according to (13) further including a feature amount receiver that receives a second pitch feature amount calculated from a target comparison acoustic signal for each time interval corresponding to the predetermined time interval from the client apparatus; and a similarity calculator that calculates a similarity between acoustic signals by comparison of the first pitch feature amount and the second pitch feature amount, in which the information transmitter transmits the similarity to the client apparatus.
(16) A client apparatus including an acoustic signal acquisition portion that acquires a target comparison acoustic signal; and a similarity acquisition portion that acquires a similarity between acoustic signals calculated by comparison between a first pitch feature amount calculated from a music acoustic signal for each predetermined time interval and a second pitch feature amount calculated from the target comparison acoustic signal for each time interval corresponding to the predetermined time interval.
(17) The client apparatus according to (16) further including a feature amount calculator that calculates the second pitch feature amount from the target comparison acoustic signal; a feature amount receiver that receives the first pitch feature amount from a server apparatus; and a similarity calculator that calculates a similarity between acoustic signals by comparison of the first pitch feature amount and the second pitch feature amount, in which the similarity acquisition portion acquires the similarity from the similarity calculator.
(18) The client apparatus according to (16) further including a feature amount calculator that calculates the second pitch feature amount from the target comparison acoustic signal; a feature amount transmitter that transmits the second pitch feature amount to a server apparatus; and a similarity receiver that receives the similarity from the server apparatus, in which the similarity acquisition portion acquires the similarity from the similarity receiver.
(19) An acoustic processing system including a server apparatus and a client apparatus, in which the server apparatus includes a feature amount calculator that calculates a first pitch feature amount from a music acoustic signal for each predetermined time interval, and an information transmitter that transmits information based on the first pitch feature amount to a client apparatus, and the client apparatus includes an acoustic signal acquisition portion that acquires a target comparison acoustic signal, and a similarity acquisition portion that acquires a similarity between acoustic signals calculated by comparison between the first pitch feature amount and a second pitch feature amount calculated from the target comparison acoustic signal for each time interval corresponding to the predetermined time interval.
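The binarization of configuration (5) and the octave handling of configuration (6) above can be sketched as follows. The dictionary representation of the signal intensity information (time-period bin to intensity) and the binarization threshold are assumptions for illustration.

```python
# Hedged sketch of configurations (4)-(6): the first pitch feature amount
# is binarized signal intensity information over time-period bins for each
# interval; the second is the detected time period of the singing, also
# checked at double the time period (half the frequency) to absorb octave
# ambiguity in pitch detection.

def binarize(intensity_map, threshold=0.5):
    """Configuration (5): keep only bins whose intensity exceeds the threshold."""
    return {period for period, level in intensity_map.items() if level > threshold}

def interval_matches(active_periods, sung_period):
    """Configuration (6): count a match if the sung time period, or double
    the time period, falls in an active bin."""
    return sung_period in active_periods or 2 * sung_period in active_periods

def similarity(intensity_maps, sung_periods, threshold=0.5):
    """Fraction of intervals in which the singing matches the binarized
    intensity information of the music acoustic signal."""
    hits = sum(
        interval_matches(binarize(m, threshold), p)
        for m, p in zip(intensity_maps, sung_periods)
    )
    return hits / len(sung_periods)
```

The doubled-time-period check lets a singer who is one octave below the dominant component of the music still register a match.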
Number | Date | Country | Kind
---|---|---|---
2012-032139 | Feb 2012 | JP | national

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/JP2013/050722 | 1/17/2013 | WO | 00