The present invention relates to an electronic musical instrument, a musical sound generating method of an electronic musical instrument, and a storage medium.
Heretofore, technologies for musical piece playing devices that enable a singing voice sound to be played using keyboard operation elements or the like have been proposed (for example, the technology disclosed in Patent Document 1). This related art proposes a so-called vocoder technology: the voice sound level of each frequency band in an input voice sound (modulator signal) is measured using a plurality of band pass filters (an analysis filter group, or vocal tract analysis filters) having mutually different center frequencies; electronic sounds (carrier signals) played using keyboard operation elements are passed through another plurality of band pass filters (a reproduction filter group, or vocal tract reproduction filters) having mutually different center frequencies; and the output level of each of these band pass filters is controlled on the basis of the measured voice sound levels. With this vocoder technology, the sounds played using the keyboard operation elements are changed into sounds that resemble those made when a person talks.
In addition, heretofore, as a voice sound generation method for generating a person's voice, a technology has also been known in which a person's voice is imitated by inputting a continuous waveform signal that determines the pitch through a filter (vocal tract filter) that models the vocal tract of a person.
Furthermore, a sound source technology of an electronic musical instrument is also known in which a physical sound source is used as a device that enables wind instrument sounds or string instrument sounds to be played using keyboard operation elements or the like. This related art is a technology called a waveguide and enables musical instrument sounds to be generated by imitating the changes in the vibration of a string or air using a digital filter.
Patent Document 1: Japanese Patent Application Laid-Open Publication No. 2015-179143
However, in the above-described related art, although the waveform of a sound source can approximate a person's voice or a natural musical instrument, the pitch of the output sound is determined uniformly by an electronic sound (carrier signal or excitation signal) having a constant pitch based on the pitch played using a keyboard operation element; pitch changes are therefore monotonous and do not reflect reality. Accordingly, the present invention is directed to a scheme that substantially obviates one or more of the problems due to limitations and disadvantages of the related art.
Accordingly, an object of the present invention is to reproduce not only formant changes, which are characteristic of an input voice sound, but also pitch changes of the input voice sound.
Additional or separate features and advantages of the invention will be set forth in the descriptions that follow and in part will be apparent from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
To achieve these and other advantages and in accordance with the purpose of the present invention, as embodied and broadly described, in one aspect, the present disclosure provides an electronic musical instrument including: a memory that stores, before performance of a musical piece on the electronic musical instrument by a performer begins, pitch variation data that represents differences between fundamental tone frequencies of notes in a melody of the musical piece and fundamental tone frequencies of notes in prescribed singing voice waveform data, the prescribed singing voice waveform data representing or simulating a singing voice that is generated when a person actually sings the melody of the musical piece; and a sound source that outputs a pitch-adjusted carrier signal to be received by a waveform synthesizing device that generates synthesized waveform data based on the pitch-adjusted carrier signal, the pitch-adjusted carrier signal being generated on the basis of the pitch variation data acquired from the memory and performance instruction pitch data that represent pitches specified by the performer during the performance of the musical piece on the electronic musical instrument, the pitch-adjusted carrier signal being generated even when the performer does not sing after performance of the musical piece begins.
In another aspect, the present disclosure provides a method performed by an electronic musical instrument that includes: a memory that stores, before performance of a musical piece on the electronic musical instrument by a performer begins, pitch variation data that represents differences between fundamental tone frequencies of notes in a melody of the musical piece and fundamental tone frequencies of notes in prescribed singing voice waveform data, the prescribed singing voice waveform data representing or simulating a singing voice that is generated when a person actually sings the melody of the musical piece, and a plurality of pieces of amplitude data that represent characteristics of the singing voice generated on the basis of the prescribed singing voice waveform data and that respectively correspond to a plurality of frequency bands; a sound source; and a waveform synthesizing device, the method including: causing the sound source to output a pitch-adjusted carrier signal generated on the basis of the pitch variation data acquired from the memory and performance instruction pitch data that represent pitches specified by the performer during the performance of the musical piece on the electronic musical instrument, the pitch-adjusted carrier signal being generated even when the performer does not sing after performance of the musical piece begins; and causing the waveform synthesizing device to modify the pitch-adjusted carrier signal in accordance with the plurality of pieces of amplitude data acquired from the memory so as to generate and output synthesized waveform data.
In another aspect, the present disclosure provides a non-transitory computer-readable storage medium having stored thereon a program executable by an electronic musical instrument that includes: a memory that stores, before performance of a musical piece on the electronic musical instrument by a performer begins, pitch variation data that represents differences between fundamental tone frequencies of notes in a melody of the musical piece and fundamental tone frequencies of notes in prescribed singing voice waveform data, the prescribed singing voice waveform data representing or simulating a singing voice that is generated when a person actually sings the melody of the musical piece, and a plurality of pieces of amplitude data that represent characteristics of the singing voice generated on the basis of the prescribed singing voice waveform data and that respectively correspond to a plurality of frequency bands; a sound source; and a waveform synthesizing device, the program causing the electronic musical instrument to perform the following: causing the sound source to output a pitch-adjusted carrier signal generated on the basis of the pitch variation data acquired from the memory and performance instruction pitch data that represent pitches specified by the performer during the performance of the musical piece on the electronic musical instrument, the pitch-adjusted carrier signal being generated even when the performer does not sing after performance of the musical piece begins; and causing the waveform synthesizing device to modify the pitch-adjusted carrier signal in accordance with the plurality of pieces of amplitude data acquired from the memory so as to generate and output synthesized waveform data.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory, and are intended to provide further explanation of the invention as claimed.
The present invention will become more fully understood from the following detailed description taken in conjunction with the accompanying drawings.
Hereafter, embodiments for carrying out the present invention will be described in detail while referring to the drawings.
The memory 101 stores: second amplitude data 111, which is time series data of amplitudes that respectively correspond to a plurality of frequency bands of the tones (notes) included in singing voice waveform data (voice sound data) of an actually sung musical piece; pitch variation data 112, which is time series data representing differences between the fundamental tone frequencies of vowel segments of tones (notes) included in a melody (e.g., model data) of singing of the musical piece (the term “fundamental tone frequency” used in the present specification means the frequency of a fundamental tone, that is, the fundamental frequency) and the fundamental tone frequencies of vowel segments of the tones included in the singing voice waveform data; and consonant amplitude data 113, which is time series data corresponding to consonant segments of the tones of the singing voice waveform data. The second amplitude data 111 is time series data used to control the gains of the band pass filters of a band pass filter group of the vocoder demodulation device 104, which allows a plurality of frequency band components to pass therethrough. The pitch variation data 112 is data obtained by extracting, in time series, difference data between fundamental tone frequency data of pitches (e.g., model pitches) that are set in advance for the vowel segments of tones included in the melody, and fundamental tone frequency data of the vowel segments of tones included in the singing voice waveform data obtained from actual singing. The consonant amplitude data 113 is a time series of noise amplitude data of the consonant segments of tones included in the singing voice waveform data.
The keyboard operation elements 102 input, in time series, performance specified pitch data (performance instruction pitch data) 110 that represents the pitches specified by a user via the user's performance operations.
As pitch change processing, the microcomputer 107 generates a time series of changed (adjusted) pitch data 115 by changing the time series of the performance specified pitch data 110 input from the keyboard operation elements 102 on the basis of a time series of the pitch variation data 112 sequentially input from the memory 101.
Next, as first output processing, the microcomputer 107 outputs the changed pitch data 115 to the sound source 103, and generates a time series of key press/key release instructions 114 corresponding to key press and key release operations of the keyboard operation elements 102 and outputs the generated time series of key press/key release instructions 114 to the sound source 103.
On the other hand, as noise generation instruction processing, in consonant segments of the tones included in the singing voice waveform data corresponding to operations of the keyboard operation elements 102, for example, in prescribed short time segments preceding the sound generation timings of the tones, the microcomputer 107 outputs, to a noise generator 106, the consonant amplitude data 113 sequentially read from the memory 101 at the timings of the consonant segments instead of outputting the pitch variation data 112 to the sound source 103.
In addition, as a part of the amplitude changing processing, the microcomputer 107 reads out, from the memory 101, a time series of the plurality of pieces of second amplitude data 111 respectively corresponding to the plurality of frequency bands of the tones included in the singing voice waveform data and outputs the time series to the vocoder demodulation device 104.
Under the control realized by the first output processing of the microcomputer 107, the sound source 103 outputs, as pitch-changed (pitch-adjusted) first waveform data 109, waveform data having pitches corresponding to the fundamental tone frequencies indicated by the changed pitch data 115 input from the microcomputer 107, while starting and stopping sound generation on the basis of the key press/key release instructions 114 also input from the microcomputer 107. In this case, the sound source 103 operates as an oscillator that oscillates the pitch-changed first waveform data 109 as a carrier signal for exciting the vocoder demodulation device 104 connected in the subsequent stage. Therefore, in the vowel segments of the tones included in the singing voice waveform data, the pitch-changed first waveform data 109 contains a triangular-wave harmonic frequency component often used as a carrier signal, or the harmonic frequency components of an arbitrary musical instrument, and is a continuous waveform that repeats at a pitch corresponding to the changed pitch data 115.
In addition, in a consonant segment that exists at, for example, the start of the sound generation timing of each tone of the singing voice waveform data, the noise generator 106 (or consonant waveform generator), under the control realized by the above-described noise generation instruction processing of the microcomputer 107, generates consonant noise (for example, white noise) having an amplitude corresponding to the consonant amplitude data 113 input from the microcomputer 107, and superimposes the consonant noise on the pitch-changed first waveform data 109 as consonant segment waveform data before the sound source 103 performs its output.
Through control of the above-described amplitude changing processing performed by the microcomputer 107, the vocoder demodulation device (can also be referred to as an output device, a voice synthesizing device, or a waveform synthesizing device, instead of a vocoder demodulation device) 104 changes a plurality of pieces of first amplitude data, which are obtained from the pitch-changed first waveform data 109 output from the sound source 103 and respectively correspond to a plurality of frequency bands, on the basis of the plurality of pieces of second amplitude data 111 output from the microcomputer 107 and respectively corresponding to a plurality of frequency bands of tones included in the singing voice waveform data. In this case, the vocoder demodulation device 104 is excited by consonant noise data included in the pitch-changed first waveform data 109 in a consonant segment of each tone of the singing voice waveform data described above, and is excited by the pitch-changed first waveform data 109 having a pitch corresponding to the changed pitch data 115 in the subsequent vowel segment of each tone.
Next, as second output processing specified by the microcomputer 107, the vocoder demodulation device 104 outputs second waveform data (synthesized waveform data) 116, which is obtained by changing each of the plurality of pieces of first amplitude data, to the sound system 105, and the data is then output from the sound system 105 as sound.
The switch group 108 functions as an input unit that inputs various instructions to the microcomputer 107 when a user takes a lesson regarding (learns) a musical piece.
The microcomputer 107 executes overall control of the electronic musical instrument 100. Although not specifically illustrated, the microcomputer 107 includes a central processing unit (CPU), a read-only memory (ROM), a random access memory (RAM), and an interface circuit that performs input and output to and from the units 101, 102, 103, 104, 106, and 108.
The above-described electronic musical instrument 100 is able to produce sound by outputting the second waveform data 116, which is obtained by adding the nuances of a person's singing voice to the pitch-changed first waveform data 109, that is, to the melody, musical instrument sound, or the like generated by the sound source 103 so as to reflect the nuances of the pitch variations of the singing voice.
In addition, the vocoder demodulation device 104 includes a multiplier group 202 that is composed of a plurality of multipliers (×#1 to ×#n) that respectively multiply the pieces of first amplitude data 204 (#1 to #n) output from the band pass filters (BPF #1, BPF #2, BPF #3, . . . , BPF #n) by the values of the #1 to #n pieces of second amplitude data 111 input from the microcomputer 107.
Furthermore, the vocoder demodulation device 104 includes an adder 203 that adds together the outputs from the multipliers (×#1 to ×#n) of the multiplier group 202 and outputs the result as the second waveform data 116.
In this way, the above-described vocoder demodulation device 104 generates the second waveform data 116 by scaling, in each frequency band, the first amplitude data 204 obtained from the pitch-changed first waveform data 109 by the corresponding piece of second amplitude data 111 and summing the scaled band outputs.
The vocoder modulation device 401 receives singing voice waveform data 404 obtained from a microphone when the melody of a certain musical piece is sung in advance, generates the second amplitude data group 111, and stores the generated second amplitude data group 111 in the memory 101.
The pitch detector 402 extracts a fundamental tone frequency (pitch) 406 of the vowel segment of each tone from the singing voice waveform data 404 based on the actual singing of the melody described above.
The subtractor 403 calculates a time series of the pitch variation data 112 by subtracting the fundamental tone frequencies 405, which are for example model fundamental tone frequencies set in advance for the vowel segments of the tones included in the melody, from the fundamental tone frequencies 406 of the vowel segments of the tones included in the singing voice waveform data 404, which are extracted by the pitch detector 402 from the above-described actual singing of the melody.
The consonant detector 407 determines that segments of the singing voice waveform data 404 in which tones exist but in which the pitch detector 402 did not detect fundamental tone frequencies 406 are consonant segments, calculates the average amplitude of each of these segments, and outputs these values as the consonant amplitude data 113.
Furthermore, the vocoder modulation device 401 includes an envelope follower group 502 that is composed of a plurality of envelope followers (EF #1, EF #2, EF #3, . . . , EF #n). The envelope followers respectively extract envelope data representing the changes over time in the outputs of the band pass filters (BPF #1, BPF #2, BPF #3, . . . , BPF #n), sample the respective envelope data every fixed period of time (for example, 10 msec), and output the resulting data as the pieces of second amplitude data 111 (#1 to #n). The envelope followers are, for example, low pass filters that take as input the absolute values of the amplitudes of the outputs of the band pass filters and allow only sufficiently low frequency components to pass therethrough in order to extract the envelope of the changes over time.
The singing voice waveform data 404 may be data obtained by storing the singing voice sung by a person in the memory 101 in advance before a performer performs the musical piece by specifying the operation elements, or may be data obtained by storing singing voice data output by a mechanism using a voice synthesis technology in the memory 101.
When a user instructs starting of a lesson using the switch group 108, the microcomputer 107 starts the control processing described below.
When the determination made in step S801 is YES, a sound generation start (note on) instruction is output to the sound source 103 (step S802).
Next, it is determined whether there is a key release (step S803).
When the determination made in step S803 is YES, a sound generation stop (note off) instruction is output to the sound source 103 (step S804).
After that, the keyboard processing of step S701 ends.
Next, a pitch change instruction based on the changed pitch data 115 is issued to the sound source 103 (step S902). After that, the pitch updating processing of step S702 ends.
The outputs of the multipliers inside the multiplier group 202 are added together by the adder 203 and are output as the second waveform data 116.
According to the above-described embodiment, the second waveform data 116 can be obtained in which the nuances of the pitch variations in the singing voice waveform data 404, obtained from a singing voice that sang the melody, are reflected in the pitch-changed first waveform data 109.
In addition, although a filter group (analysis filters, vocal tract analysis filters) is used to reproduce the formant of a voice with the aim of playing a singing voice using keyboard operation elements in this embodiment, if the present invention were applied to a configuration in which a natural musical instrument such as a wind instrument or string instrument is modeled using a digital filter group, a performance that is closer to the expression of the natural musical instrument could be realized by imitating the pitch variations of the wind instrument or string instrument in accordance with operation of the keyboard operation elements.
A method may also be considered in which a lyric voice recorded in advance is built in as pulse code modulation (PCM) data and this voice is then reproduced; however, with this method, the amount of voice sound data is large, and producing a sound at the (incorrect) pitch actually specified when a performer makes a mistake while playing is comparatively difficult. Additionally, there is a method in which lyric data is built in and a voice signal obtained through voice synthesis based on this data is output as sound, but this method has the disadvantage that large amounts of calculation and data are necessary in order to perform voice synthesis, and therefore real-time control is difficult.
Since the need for an analysis filter group can be eliminated in this embodiment by performing synthesis using a vocoder method and analyzing the amplitude changes at each frequency in advance, the circuit scale, calculation amount, and data amount can be reduced compared with the case in which the data is built in as PCM data. In addition, in the case where a lyric voice is stored in the form of PCM voice sound data and an incorrect keyboard operation element 102 is played, it is necessary to perform pitch conversion of the PCM data in order to make the voice match the pitch of the incorrect keyboard operation element pressed by the user, whereas when the vocoder method is adopted, the pitch conversion can be performed by simply changing the pitch of the carrier, and therefore this method also has the advantage of being simple.
Through cooperative control with the microcomputer 107, the vocoder demodulation device 104 generates and outputs the second waveform data 116 as described above.
In the above-described embodiment, the voice spectrum envelope data and the pitch variation data corresponding to a lyric voice of a musical piece are stored in advance in the memory 101.
Furthermore, the pitch variation data is added to the pitch for each key press in the above-described embodiment, but sound production may instead be carried out by using pitch variation data in note transition periods between key presses.
In addition, in the above-described embodiment, the microcomputer 107 generates the changed pitch data 115 by adding the pitch variation data 112 itself, read out from the memory 101, to the performance specified pitch data 110 of the pitch corresponding to a key press in the pitch updating processing described above.
A specific embodiment of the present invention has been described above, but the present invention is not limited to the above-described embodiment and various changes may be made without departing from the gist of the present invention. It will be obvious to a person skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or the scope of the present invention. Therefore, it is intended that the present invention encompass the scope of the appended claims and modifications and variations that are equivalent to the scope of the appended claims. In particular, it is clearly intended that any part or whole of any two or more of the above-described embodiment and modifications thereof combined with each other can be considered as being within the scope of the present invention.
Foreign Application Priority Data

Number | Date | Country | Kind
---|---|---|---
2017-186690 | Sep. 2017 | JP | national

References Cited: U.S. Patent Documents

Number | Name | Date | Kind
---|---|---|---
8,735,709 | Yamauchi | May 2014 | B2
2001/0037196 | Iwamoto | Nov. 2001 | A1
2003/0221542 | Kenmochi | Dec. 2003 | A1
2011/0000360 | Saino | Jan. 2011 | A1
2011/0144982 | Salazar | Jun. 2011 | A1
2015/0040743 | Tachibana | Feb. 2015 | A1
2015/0310850 | Nakano | Oct. 2015 | A1
2017/0025115 | Tachibana | Jan. 2017 | A1

Foreign Patent Documents

Number | Date | Country
---|---|---
H09-204185 | Aug. 1997 | JP
H10-240264 | Sep. 1998 | JP
2001-249668 | Sep. 2001 | JP
2006-154526 | Jun. 2006 | JP
2009-122611 | Jun. 2009 | JP
2015-34920 | Feb. 2015 | JP
2015-179143 | Oct. 2015 | JP

Other Publications

Japanese Office Action dated May 22, 2018, in counterpart Japanese patent application No. 2017-186690 (with an unreviewed machine translation attached).
Japanese Office Action dated Dec. 18, 2018, in counterpart Japanese patent application No. 2017-186690 (with an unreviewed machine translation attached).

Publication Data

Number | Date | Country
---|---|---
2019/0096379 A1 | Mar. 2019 | US