The present invention relates to a technique for generating, with a designated pitch, a voice based on a character.
There have heretofore been known apparatus which generate singing voices by synthesizing voices of lyrics while varying a pitch in accordance with a melody. Patent Literature 1, for example, discloses a technique for updating or controlling a singing position in lyrics, indicated by lyrics data, in response to receipt of performance data (pitch data). Namely, Patent Literature 1 discloses a technique in which a melody performance is executed by a user operating an operation section, such as a keyboard, and the lyrics are caused to progress in synchronism with a progression of the melody performance. Further, in the field of electronic musical instruments, controllers of various shapes have been under development, and it has been known to provide a grip section projecting from the body of a keyboard musical instrument and provide, on the grip section, a desired operation section and an appropriate detection section for detecting a manual operation performed on the operation section (see, for example, Patent Literature 2 and Patent Literature 3).
Further, Patent Literature 4, for example, discloses a technique in which a plurality of lyrics are displayed on a display device, a desired portion of the lyrics is selected through an operation of an operation section, and the selected portion is output as a singing voice of a designated pitch. Patent Literature 4 also discloses a construction in which a user designates a syllable of the lyrics displayed on a touch panel, and then, once the user performs key depression successively three times on a keyboard, the designated syllable is audibly generated or sounded with a pitch designated on the keyboard.
Patent Literature 1: Japanese Patent Application Laid-open No. 2008-170592
Patent Literature 2: Japanese Patent Application Laid-open No. HEI-01-38792
Patent Literature 3: Japanese Patent Application Laid-open No. HEI-06-118955
Patent Literature 4: Japanese Patent Application Laid-open No. 2014-10190
In the conventionally-known apparatus which generate voices on the basis of characters, such as singing voice generation devices, various performance expressions achievable by the voice generation, such as user expressions, are undesirably limited in width or range. Specifically, in live performances, it is desirable to permit flexible modification of the lyrics and/or control of a style or manner (state) of voice generation, i.e. flexible ad-lib performances, such as repeating a phrase of a desired portion of the lyrics as the music piece warms up or reaches a climax and/or changing, even where the same phrase is repeated, the lyrics expressions, intonations of the performance and/or the like per repetition of the phrase as necessary. However, with the conventionally-known apparatus, it is not possible to easily execute such flexible ad-lib performances. For example, it is not easy to flexibly control the manner of the voice generation, such as by making a setting such that a user-desired partial range of the music piece is repeated during the performance, or changing, in a case where the same phrase is repeated, the lyrics and/or intonation per repetition.
Besides, there has heretofore been a demand for development of various techniques for allowing an object of repeat to be selected with ease. Namely, in order to repeat the lyrics in the technique disclosed in Patent Literature 4, it is necessary to select the lyrics displayed on the display section. However, this requires the user to view the display section while singing voices are being output. Further, when an operation for selecting the displayed lyrics is required, the performing style of a human player would be limited to one that permits the viewing of the display section and the lyrics selecting operation. During a live performance using a performance device provided with a display section, for example, it is essential for the human player to view the performance device. Therefore, it tends to be difficult for the human player to play the performance device by touch alone without relying on the sense of vision, and thus, the range of motion, performance posture, etc. of the user would be limited to those that permit the viewing of the display section and the selection operation.
In view of the foregoing prior art problems, it is an object of the present invention to provide a technique which generates voices based on a pre-defined character string, such as lyrics, in accordance with performed pitches, and which permits an ad-lib performance, such as a change of a voice to be generated and thereby permits an increased range of expressions in the character-based voice generation. It is another object of the present invention to permit selection of an object of repeat without relying on the sense of vision.
In order to accomplish the aforementioned object, the present invention provides a controller for a voice generation device, the voice generation device being configured to generate a voice corresponding to one or more designated characters in a pre-defined character string, the controller comprising: a character selector configured to be operable by a user to designate the one or more designated characters in the pre-defined character string; and a voice control operator configured to be operable by the user to control a state of the voice to be generated by the voice generation device. The present invention also provides a system comprising the aforementioned controller and the aforementioned voice generation device.
According to the present invention, a voice corresponding to the one or more characters designated from the pre-defined character string in response to a user's operation of the character selector is generated by the voice generation device, and the voice to be generated can be controlled as desired in response to a user's operation of the voice control operator. Thus, although the present invention is constructed to generate voices based on the pre-defined character string, the voice to be generated can be changed or otherwise varied in accordance with a user's operation. Consequently, in the case where voices corresponding to characters of lyrics are to be generated in synchronism with a music performance, controllability by the user can be enhanced, which can thereby facilitate an ad-lib performance in lyrics-based voice generation. In this way, the present invention can significantly increase a width or range of expressions in the lyrics-based voice generation.
In one embodiment of the present invention, the controller further comprises a grip adapted to be held with a hand of the user, and the character selector and the voice control operator are both provided on the grip. In one embodiment, the character selector and the voice control operator are provided on the grip at positions where the character selector and the voice control operator are operable with different fingers of the user holding the grip. Further, in one embodiment, the controller is constructed in such a manner that one of the character selector and the voice control operator is operable with the thumb of the user and the other of the character selector and the voice control operator is operable with another finger of the user.
Further, in one embodiment, the character selector and the voice control operator are disposed on different surfaces of the grip. The construction where the character selector and the voice control operator are disposed on the single grip in the aforementioned manner is suited for the user to appropriately operate both of the character selector and the voice control operator using any of the fingers of one hand of the user holding the grip. Thus, the user can easily operate the character selector and the voice control operator on the grip with one hand while performing a keyboard musical instrument or the like with the other hand.
According to another aspect of the present invention, there is provided a voice generation device which comprises a processor configured to function as: an information acquisition section that acquires information designating one or more characters in a pre-defined character string; a voice generation section that generates, based on the acquired information, a voice corresponding to the designated one or more characters; an object-of-repeat reception section that receives information designating a currently-generated voice as an object of repeat; and a repeat control section that controls the voice generation section to repeatedly generate the voice designated as the object of repeat. Thus, by listening to voices sequentially generated by the voice generation section, the user can quickly auditorily judge whether the voice being currently generated in real time is suited to be designated as an object of repeat and then designate (select) the currently-generated voice as an object of repeat. In this way, the user can select a character as the object of repeat, without relying on the sense of vision.
Further, the input/output section 60 comprises an input section that inputs an instruction given from the user etc., and an output section (including a display and a speaker) that outputs to the user various information (image information and voice information). As an example, rotary switches and a display are provided as the input section and the output section, respectively, on the keyboard musical instrument 10 and depicted within a dotted-line block in
The controller 10a projects from one side surface (left side surface in the illustrated example of
On the grip G of the controller 10a are provided a character selector 60a capable of functioning as a part of the input/output section 60 of the keyboard musical instrument 10, a voice control operator 60b, and a repeat operator 60c. Namely, a signal and/or information generated in response to an operation of any of the character selector 60a, voice control operator 60b and repeat operator 60c on the controller 10a is transferred to the body (voice generation device) 10b of the keyboard musical instrument 10, where the signal and/or information is handled as a user-input signal and/or information. The character selector 60a, which is configured to be operable by the user to designate one or more characters included in a pre-defined character string (such as lyrics), includes a plurality of selection buttons Mcf, Mcb, Mpf and Mpb that are in the form of push button switches. The character selector 60a is disposed on the curved or slanting surface (chamfered part) formed between the upper flat surface and the rear flat surface (see
The repeat operator 60c is operable by the user to enter repeat-performance-related input. In the instant embodiment, the repeat operator 60c, which is also in the form of a push button switch, is disposed on the curved or slanting surface (chamfered part) formed between the upper flat surface and the rear flat surface (see
The voice control operator 60b is configured to be operable by the user to control the state of the voice to be generated by the voice generation device 10b. As an example, the pitch of the voice to be generated is controllable in response to an operation of the voice control operator 60b. The voice control operator 60b is disposed on the front flat surface of the grip G (see
In the above-described construction, the user operates the character selector 60a, voice control operator 60b and repeat operator 60c while holding the grip G of the controller 10a with the left hand as shown in
Further, when the user is holding the grip G as shown in
Further, according to the above-described construction, the user can operate the character selector 60a or the repeat operator 60c with the thumb of the one hand and operate the voice control operator 60b with another finger of the one hand while holding the grip G of the controller 10a with the one hand. Thus, the user can readily simultaneously operate, with the one hand, the voice control operator 60b and the character selector 60a (or the repeat operator 60c). Further, the user's operation on the voice control operator 60b with the one hand is similar to an operation of holding a guitar fret or the like; thus, by the user touching the voice control operator 60b with an operation similar to the guitar fret holding operation, the manner of voice generation can be controlled in accordance with the user's touch-operating or touching contact position on the voice control operator 60b. Further, when the user is holding the controller 10a, the user's hand contacts only the flat, curved or slanting surfaces of the controller 10a without contacting any pointed portion of the controller 10a. Thus, the user can slidingly move the hand repeatedly along the longitudinal direction (i.e., left-right direction in
Further, a voice generation program 30a, character information 30b and a voice fragment database 30c are recorded in advance in the non-volatile memory 30. The character information 30b is information of a pre-defined character string, such as lyrics, which includes, for example, information of a plurality of characters constituting the character string and information indicative of an order of the individual characters in the character string. In the instant embodiment, the character information 30b is in the form of text data where codes indicative of the characters are described in accordance with the above-mentioned order. Needless to say, the data of the lyrics prestored in the non-volatile memory 30 may be of only one or a plurality of music pieces, or just one phrase of a portion of a music piece. When voices of a desired song or character string are to be generated, the character information 30b of the music piece, i.e. the character string, is selected. Further, the voice fragment database 30c is a collection of data for playing back or reproducing human singing voices, and in the instant embodiment, the voice fragment database 30c is created by collecting waveforms of voices, represented by characters, when the voices were uttered with reference pitches, segmenting each of the collected waveforms into voice fragments each having a short time period and then databasing waveform data indicative of the segmented voice fragments. Namely, the voice fragment database 30c comprises a collection of waveform data indicative of a plurality of voice fragments. Combining such waveform data indicative of voice fragments can reproduce voices indicated by desired characters.
More specifically, the voice fragment database 30c is a collection of waveform data of voice transition portions (articulations), such as C to V (i.e., Consonant-to-Vowel) transition portions, V to V (i.e., Vowel-to-another-Vowel) transition portions and V to C (Vowel-to-Consonant) transition portions, and waveform data of stretched sounds (stationaries) of vowels V. Namely, the voice fragment database 30c is a collection of voice fragment data indicative of various voice fragments as materials of singing voices. These voice fragment data are data created on the basis of voice fragments extracted from voice waveforms uttered by actual persons. In the instant embodiment, voice fragment data to be connected together for reproducing voices of desired characters or a desired character string are predetermined and prestored in the non-volatile memory 30 (although not particularly shown). The CPU 20 references the non-volatile memory 30 in accordance with desired characters or a desired character string indicated by the character information 30b to select voice fragment data to be connected together. Then, waveform data for reproducing voices indicated by the desired characters or desired character string are created by the CPU 20 connecting together the selected voice fragment data. Note that the voice fragment database 30c may be prepared for various different languages or for different characteristics of voices, such as the sexes of human voice utterers. Further, the waveform data constituting the voice fragment database 30c may each be data prepared by segmenting a train of samples, obtained by sampling the waveform of the voice fragment at a predetermined sampling rate, into frames each having a predetermined time length, or per-frame spectral data (of amplitude and phase spectra) obtained by performing the FFT (Fast Fourier Transform) on the data prepared by segmenting a train of samples. 
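The frame segmentation and per-frame spectral data mentioned above may be illustrated by the following minimal sketch. It assumes non-overlapping frames and uses a naive discrete Fourier transform in place of the FFT for brevity; the function names `split_frames` and `frame_spectrum` are hypothetical and do not appear in the embodiment.

```python
import cmath
import math

def split_frames(samples, frame_len):
    """Segment a train of samples into frames of a predetermined length.
    Non-overlapping frames are an assumption of this sketch."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]

def frame_spectrum(frame):
    """Return (amplitude, phase) spectra of one frame.
    A naive DFT is used here; an FFT yields the same values faster."""
    n = len(frame)
    bins = [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]
    return [abs(b) for b in bins], [cmath.phase(b) for b in bins]

# Illustrative input: a cosine completing one cycle per 8-sample frame.
samples = [math.cos(2 * math.pi * t / 8) for t in range(16)]
frames = split_frames(samples, 8)
amplitude, phase = frame_spectrum(frames[0])
```

In this illustration, the energy of the cosine concentrates in the bin matching its frequency, which is the kind of per-frame spectral data the following description assumes.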
The following describes a case where the waveform data constituting the voice fragment database 30c are the latter data, i.e. spectral data.
In the illustrated embodiment, the CPU 20 can execute the voice generation program 30a stored in the non-volatile memory 30. Through execution of the voice generation program 30a, the CPU 20 generates, with pitches instructed by the user on the pitch selector 50, voice signals corresponding to characters defined as the character information 30b. Then, the CPU 20 instructs the sound output section 70 to output voices in accordance with the generated voice signals, in response to which the sound output section 70 generates analog waveform signals for outputting the voices and amplifies the analog waveform signals to audibly output the voices.
In the present invention, the pre-defined character string is not necessarily limited to lyrics of an existing song associated in advance with a predetermined music piece and may be any desired character string of a poem, a verse, an ordinary sentence or the like. In the following description, let it be assumed that voices corresponding to a character string of lyrics associated with a predetermined music piece are generated. As known, a progression of notes and a progression of lyrics in a music piece are associated with each other in a predetermined relationship. In such a case, a note may correspond to one syllable or a plurality of syllables, or it may sometimes correspond to a sustained portion of a syllable having been generated in correspondence to an immediately preceding note. As also known, the unit number of characters that can be associated with one note differs depending on the type of language. In Japanese, for example, each syllable can generally be expressed by one Japanese alphabetical letter (kana character), and thus, lyrics can be associated with individual notes on a kana-character-by-kana-character basis. In many of other languages, such as English, on the other hand, one syllable is generally expressed by one or a plurality of characters, and thus, lyrics are associated with individual notes on a syllable-by-syllable basis rather than on the character-by-character basis; namely, the number of characters constituting a syllable may be just one or plural (more than one). The concept derivable from the foregoing is that, in any language systems, the number of characters for designating a voice to be generated in correspondence to a syllable is one or plural. In this sense, the one or plural characters to be designated for generation of a voice in the present invention suffice to identify one or plural syllables (including a syllable with a consonant alone) necessary for the voice generation.
As an example, a construction may be employed where, in synchronism with a user's pitch designation operation on the pitch selector 50, one or more characters in a character string (lyrics) are caused to sequentially progress in accordance with a predetermined character progression order of the character string (lyrics). For that purpose, the individual characters in the character string (lyrics) are divided into character groups, each comprising one or more characters, in association with respective notes to which the characters are allocated, and such groups are ordered in accordance with the progression order.
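As a non-limiting illustration, the division of a character string into ordered character groups might be represented as in the following sketch. The class name `CharacterGroup`, the 0-based position values and the sample lyrics are assumptions introduced only for this example.

```python
from dataclasses import dataclass

@dataclass
class CharacterGroup:
    position: int    # position in the progression order (0-based here, an assumption)
    characters: str  # one or more characters allocated to one note

def build_character_groups(chars_per_note):
    """Order the per-note character groups in accordance with the progression order."""
    return [CharacterGroup(position=i, characters=c)
            for i, c in enumerate(chars_per_note)]

# Hypothetical English lyrics, one entry per note (syllable-by-syllable basis).
groups = build_character_groups(["twin", "kle", "lit", "tle", "star"])
```

A pointer holding a position value is then sufficient to designate the character group for which a voice is to be generated next.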
Further,
Note that the aforementioned pitch conversion process may be arranged in any desired manner as long as it can convert a voice of a particular pitch to a voice of another pitch; for example, the pitch conversion process may be implemented by operations for evaluating a difference between the pitch designated on the pitch selector 50 and the reference pitch of the voice indicated by the voice fragment data, shifting, in a frequency axis direction, a spectral distribution indicated by the waveform of the voice fragment data by frequencies corresponding to the evaluated difference, etc. Needless to say, the pitch conversion process may be implemented by various other operations than the aforementioned and may be performed on the time axis. The voice generation of step S105 is arranged to also control the state (e.g., pitch) of the to-be-generated voice in accordance with an operation performed via the voice control operator 60b, as will be later described in greater detail. In the voice generation of step S105, various factors (such as pitch, volume and color) of the to-be-generated voice may be made adjustable, and voice control for imparting vibrato and/or the like to the to-be-generated voice may be performed.
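One possible form of such a frequency-axis operation is sketched below. The sketch assumes that pitches are expressed as MIDI-style note numbers (12 per octave) and that a frame is held as a list of per-bin magnitudes; neither is specified in the embodiment. It also interprets the "shifting" as a rescaling of the frequency axis by the pitch ratio, which is only one reading of the description.

```python
def pitch_ratio(designated_note, reference_note):
    """Frequency ratio for the evaluated pitch difference (note numbers are an assumption)."""
    return 2.0 ** ((designated_note - reference_note) / 12.0)

def rescale_spectrum(magnitudes, ratio):
    """Move spectral content along the frequency axis by `ratio`.
    Output bin k takes its value from input position k / ratio,
    using linear interpolation between neighbouring bins."""
    n = len(magnitudes)
    out = [0.0] * n
    for k in range(n):
        src = k / ratio
        lo = int(src)
        if lo + 1 >= n:
            continue  # source position beyond the available bins: leave at zero
        frac = src - lo
        out[k] = (1.0 - frac) * magnitudes[lo] + frac * magnitudes[lo + 1]
    return out

# A single spectral peak at bin 2, raised by one octave (12 note numbers).
frame = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
raised = rescale_spectrum(frame, pitch_ratio(72, 60))
```

In this illustration the peak moves from bin 2 to bin 4, i.e. to twice the frequency. Phase handling and preservation of the spectral envelope are outside the scope of the sketch.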
Once the voice signal is generated, the CPU 20 outputs the generated voice signal to the sound output section 70. Then, the sound output section 70 converts the voice signal into an analog waveform signal and audibly outputs the analog waveform signal after amplification. Thus, from the sound output section 70 is audibly output the voice that is of the syllable indicated by the object-of-output character group and that has the pitch, volume intensity, etc. designated on the pitch selector 50.
At following step S106, the CPU 20 determines whether the repeat function has been turned on by an operation of the repeat operator 60c, details of which will be described later. Normally, the repeat function is in an OFF state, and thus, a NO determination is made at step S106, so that the CPU 20 goes to step S120 where the value of the pointer j is incremented by “1”. Thus, an object-of-output character group designated by the incremented value of the pointer j corresponds to a voice to be generated at the next voice generation time.
In the above-described processing, the CPU 20 increments the variable (pointer j) for designating the object-of-output character group, each time the pitch selector 50 is operated once (step S120). In the instant embodiment, the CPU 20, after starting the operation for generating and outputting the voice corresponding to the object-of-output character group with the pitch designated on the pitch selector 50, increments the variable (pointer j) irrespective of whether the generation and output of the voice has been stopped or not. Thus, in the instant embodiment, the term “object-of-output character group” refers to a character group corresponding to a voice to be generated and output in response to the next voice generation instruction, in other words a character group waiting for voice generation and output.
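The pointer handling described above may be sketched as follows. Only the normal path, in which the repeat function is OFF, is modeled; the repeat branch of step S106 is omitted because its details are described later. The class name `LyricsSequencer`, the tuple standing in for voice generation and the behavior at the end of the character string are assumptions of this sketch.

```python
class LyricsSequencer:
    """Holds the pointer j designating the object-of-output character group."""

    def __init__(self, groups):
        self.groups = groups  # character groups in progression order
        self.j = 0            # pointer j (0-based here, an assumption)

    def on_pitch_designated(self, pitch):
        """Called each time the pitch selector is operated once."""
        if self.j >= len(self.groups):
            return None  # end of the character string: no voice (assumption)
        # Stand-in for step S105: voice the group at pointer j with the pitch.
        voiced = (self.groups[self.j], pitch)
        # Step S120: increment irrespective of whether output has stopped,
        # so that j designates the group waiting for the next instruction.
        self.j += 1
        return voiced

seq = LyricsSequencer(["sa", "ku", "ra"])
first = seq.on_pitch_designated(60)
```

After the first call, the pointer already designates the second character group, which matches the meaning of "object-of-output character group" given above.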
In the instant embodiment, the CPU 20 may display, on a display of the input/output section 60, the object-of-output character group and at least another character group of the position, in the progression order, preceding or succeeding the object-of-output character group. For example, a lyrics display frame for displaying a predetermined number of characters (e.g., m characters) is provided on the display of the input/output section 60. The CPU 20 references the RAM 40 to acquire, from the character string, a total of m characters including one character group of the position designated by the pointer j and other characters preceding and/or succeeding the one character group and then displays the thus-acquired characters on the lyrics display frame of the display.
Further, the CPU 20 may cause the input/output section 60 to present a display such that the object-of-output character group and the other characters are visually distinguished from each other. Such a display can be implemented in various manners, such as by highlighting the object-of-output character group (e.g., flashing the object-of-output character group, changing the color of the object-of-output character group, or adding an underline to the object-of-output character group), clearly displaying the other characters preceding or succeeding the object-of-output character group (e.g., flashing the other characters, changing the color of the other characters, or adding an underline to the other characters), and/or the like. Further, the CPU 20 switches the displayed content on the display of the input/output section 60 so that the object-of-output character group is always displayed on the display of the input/output section 60. The display switching may be implemented in various manners, such as by scrolling the displayed content on the display as the object-of-output character group is switched to another in response to a change in the value of the pointer j, sequentially switching the displayed content by a plurality of characters at a time, and/or the like.
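By way of illustration only, the acquisition of m characters around the object-of-output character group, together with the range to be highlighted, might be computed as in the sketch below. The function name `lyrics_window`, the roughly centered placement and the sample lyrics are assumptions; the embodiment leaves the exact layout open.

```python
def lyrics_window(groups, j, m):
    """Return (window, hl_start, hl_end): m characters that include the
    character group at pointer j, and the slice of the window to highlight."""
    text = "".join(groups)
    start = sum(len(g) for g in groups[:j])   # offset of group j in the text
    end = start + len(groups[j])
    pad_before = max(0, (m - (end - start)) // 2)
    # Centre the group where possible, clamped to the bounds of the text.
    left = max(0, min(start - pad_before, len(text) - m))
    return text[left:left + m], start - left, end - left

groups = ["twin", "kle", "twin", "kle", "lit", "tle", "star"]
window, hl_start, hl_end = lyrics_window(groups, 2, 10)
highlighted = window[hl_start:hl_end]
```

Recomputing the window whenever the value of the pointer j changes corresponds to the scrolling type of display switching mentioned above.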
Further, in
According to such a basic example of voice generation, the user can control the voice pitch and the character progression via the pitch selector 50, so that singing voices corresponding to the lyrics having a predetermined order of characters can be generated (automatically sung) with pitches exactly as desired by the user. However, in such a basic example, the characters in the character string progress in accordance with the predetermined progression order, and thus, if the user performs an unscheduled operation, such as an erroneous operation, on the pitch selector 50 that differs from, or does not correspond to, an actual progression of the music piece, the progression of the singing voices would undesirably become faster or slower than the progression of the music piece. In the illustrated example of
In view of the foregoing, the controller 10a of the keyboard musical instrument 10 according to the instant embodiment is provided with a character selector 60a, and the controller 10a is constructed in such a manner that, even when an unscheduled operation has been performed on the pitch selector 50, the object-of-output character group for which voices are to be generated (i.e., which is to be voiced) can be returned to a character group conforming to the scheduled or original music piece progression by the user operating the character selector 60a. Further, an ad-lib performance modifying the original music piece progression can be executed by the user intentionally operating the pitch selector 50 and the character selector 60a in combination as necessary.
More specifically, as shown in
The following describes, with reference to
When the operated selection button is the forward character shift selection button Mcf, the CPU 20 shifts the position, in the progression order, of the object-of-output character group forward by one position (step S205). Namely, the CPU 20 increments the value of the pointer j by one. When the operated selection button is the backward character shift selection button Mcb, the CPU 20 shifts the position of the object-of-output character group backward by one position (step S210). Namely, the CPU 20 decrements the value of the pointer j by one.
Further, when the operated operator is the forward phrase shift selection button Mpf, the CPU 20 shifts the position of the object-of-output character group forward by one phrase (step S215). Namely, the CPU 20 references the character information 30b of the lyrics character train to search for the end of a nearest phrase present between the current object-of-output character group and a character group of a position in the progression order succeeding (i.e., greater in position-indicative value than) the current object-of-output character group. Then, when the end of the nearest phrase has been detected, the CPU 20 sets a numerical value indicative of the position of a character group located next to the end of the nearest phrase (i.e., a position, in the progression order, of the leading or first character group of a phrase immediately succeeding the end of the nearest phrase) into the pointer j.
Further, when the operated operator is the backward phrase shift selection button Mpb, the CPU 20 shifts the position of the object-of-output character group backward by one phrase (step S220). Namely, the CPU 20 references the character information 30b of the lyrics character train to search for the end of a nearest phrase present between the current object-of-output character group and a character group of a position in the progression order preceding (i.e., smaller in position-indicative value than) the current object-of-output character group. Then, when the end of the nearest phrase has been detected, the CPU 20 sets a numerical value indicative of the position of a character group located backward next to the end of the nearest phrase (i.e., a position, in the progression order, of the leading or first character group of a phrase immediately preceding the end of the nearest phrase) into the pointer j.
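The four pointer updates of steps S205 to S220 may be sketched as follows. The sketch assumes that the character information yields a sorted list `phrase_ends` holding the position of the last character group of each phrase, which is a hypothetical representation. The backward phrase shift is implemented under one reading of the description, namely a jump to the first character group of the phrase terminated by the nearest preceding phrase end. Clamping the pointer to the valid range is likewise an assumption.

```python
from bisect import bisect_left

def handle_selector(button, j, phrase_ends, n_groups):
    """Return the new value of pointer j after a selection button is operated."""
    if button == "Mcf":                         # step S205: forward by one position
        j += 1
    elif button == "Mcb":                       # step S210: backward by one position
        j -= 1
    elif button == "Mpf":                       # step S215: forward by one phrase
        i = bisect_left(phrase_ends, j)         # nearest phrase end at or after j
        if i < len(phrase_ends):
            j = phrase_ends[i] + 1              # first group of the next phrase
    elif button == "Mpb":                       # step S220: backward by one phrase
        i = bisect_left(phrase_ends, j) - 1     # nearest phrase end before j
        if i >= 0:
            # First group of the phrase that this phrase end terminates.
            j = phrase_ends[i - 1] + 1 if i >= 1 else 0
    return max(0, min(j, n_groups - 1))         # clamp to valid positions (assumption)

# Three phrases of four character groups each: positions 0-3, 4-7 and 8-11.
phrase_ends = [3, 7, 11]
j_forward = handle_selector("Mpf", 5, phrase_ends, 12)
j_backward = handle_selector("Mpb", 9, phrase_ends, 12)
```

With the pointer at position 5, the forward phrase shift yields position 8; with the pointer at position 9, the backward phrase shift yields position 4.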
Once the user designates a pitch by operating the pitch selector 50 at generally the same time that, or at an appropriate timing immediately after, the value of the pointer j is incremented or decremented as needed in response to a user's operation of the character selector 60a, the CPU 20 performs the process of
The order of the character groups for which voices are to be generated can be modified by a user's operation of the character selector 60a as set forth above. Thus, even when the user has performed an erroneous pitch designation operation on the pitch selector 50, the order of the character groups for which voices are to be generated can be adjusted back to an appropriate order corresponding to the predetermined music piece progression.
According to the instant embodiment, the position of the object-of-output character group changes in synchronism with the user's operations of the pitch selector 50, in such a case. Therefore, as shown in
With the aforementioned construction, the user can change the object-of-output character group on a character-group-by-character-group basis or on a phrase-by-phrase basis in accordance with the order indicated by the character information, by operating the character selector 60a. Thus, with the simple construction, the user can appropriately correct the object-of-output character group; besides, if the user accurately remembers the order of the lyrics character string, the user can also modify the object-of-output character group by a mere touching operation without relying on the sense of vision.
Further, according to the aforementioned construction, a voice corresponding to the object-of-output character group is generated in synchronism with an operation of the pitch selector 50, and then, the pointer j designating the position of the object-of-output character group is incremented. Thus, once the voice is generated in response to the operation of the pitch selector 50, another character group of the position immediately succeeding the character group corresponding to the generated voice becomes the object of output. In this manner, the user can know a state of progression of the singing voices by listening to the voice having been output at the current time point. Thus, when the user operates any one of the buttons of the character selector 60a, the user can readily know for which lyrics character a voice can be generated next, i.e. which lyrics character can be voiced next. For example, if the user operates the backward character shift selection button Mcb so that the object-of-output character group is shifted backward by one position, the user can recognize that the character group corresponding to the currently output voice (or last-output voice of voices whose output has been completed) can be made the object-of-output character group again. In this way, the user can change the object-of-output character group by operating the character selector 60a on the basis of information acquired through the auditory sense, so that the user can more easily correct the object-of-output character group by a mere touching operation without relying on the sense of vision.
Further, the instant embodiment is configured to be capable of controlling a characteristic (e.g., adjusting a pitch) of a voice to be generated in response to the user operating the voice control operator 60b in order to enhance the performance of the keyboard musical instrument 10 as a musical instrument. More specifically, once the voice control operator 60b is operated with a finger of the user during generation of a voice responsive to an operation of the pitch selector 50, the CPU 20 acquires a touching contact position of the finger on the voice control operator 60b and also acquires a correction amount associated in advance with the contact position. Then, the CPU 20 controls a characteristic (any one of pitch, volume, color, etc.) of the currently generated voice in accordance with the correction amount.
If a voice is currently being generated as determined at step S300, the CPU 20 acquires a touching contact position of a user's finger (step S305); namely, the CPU 20 acquires a signal indicative of a touching contact position output from the voice control operator 60b. Then, on the basis of the contact position of the user's finger on the voice control operator 60b, the CPU 20 acquires a correction amount relative to a reference pitch that is the pitch designated on the pitch selector 50 (step S310).
More specifically, the voice control operator 60b is a sensor which has an elongated rectangular finger-contact detecting surface and which is configured to detect at least a one-dimensional operated position (linear position). In one example, a lengthwise middle position of the long side of the voice control operator 60b corresponds to the reference pitch, and correction amounts for different touching contact positions are predetermined such that the correction amount of pitch gets greater as the contact position gets farther from the middle position of the long side of the voice control operator 60b. Further, of the correction amounts, correction amounts for raising the pitch are associated with individual touching contact positions on one side from the middle position of the voice control operator 60b, while correction amounts for lowering the pitch are associated with individual touching contact positions on the other side from the middle position of the voice control operator 60b.
Thus, the opposite end positions of the long side of the voice control operator 60b represent the highest and lowest pitches. In a construction which permits correction by up to four half tones from the reference pitch, for example, the reference pitch is associated with the middle position of the long side of the voice control operator 60b, a pitch higher by four half tones than the reference pitch is associated with one of the opposite ends of the long side, and a pitch higher by two half tones than the reference pitch is associated with a position midway between the one end and the middle position. Further, a pitch lower by four half tones than the reference pitch is associated with the other end of the long side, and a pitch lower by two half tones than the reference pitch is associated with a position midway between the other end and the middle position. In the instant embodiment, where corrected pitches are associated with individual touching contact positions as noted above, the CPU 20, after having acquired a contact-position indicating signal from the voice control operator 60b, acquires, as a correction amount, a difference in frequency between the pitch corresponding to the contact position and the reference pitch.
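The association between contact positions and corrected pitches in the four-half-tone example can be sketched as below. This is only an illustration in Python under stated assumptions: the function names are hypothetical, the contact position is assumed to be normalized to the range 0.0 to 1.0 along the long side, the positions between the stated points are assumed to be interpolated linearly, the end at 1.0 is arbitrarily taken as the pitch-raising side, and equal temperament is assumed for converting half tones into frequency.

```python
def semitone_offset(position, max_semitones=4.0):
    """Map a normalized contact position to a pitch offset in half tones.

    0.5 is the middle of the long side (reference pitch); 0.0 and 1.0 are
    the opposite ends (lowest and highest pitches in this sketch).
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie in [0.0, 1.0]")
    return (position - 0.5) * 2.0 * max_semitones


def frequency_correction(reference_hz, position, max_semitones=4.0):
    """Return the frequency difference (Hz) between corrected and reference pitch."""
    offset = semitone_offset(position, max_semitones)
    # Assumption: equal temperament, i.e. one half tone is a factor of 2**(1/12).
    corrected_hz = reference_hz * 2.0 ** (offset / 12.0)
    return corrected_hz - reference_hz
```

With these assumptions, a contact at the middle position yields a correction amount of zero, and a contact midway between the middle and the raising end yields a pitch two half tones above the reference pitch.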
Then, the CPU 20 performs pitch conversion (step S315). Namely, using, as the reference pitch, the pitch designated by the currently depressed pitch selector 50, i.e. the pitch of the voice determined at step S300 to be currently generated, the CPU 20 performs pitch adjustment (pitch conversion) of the currently generated voice in accordance with the correction amount acquired at step S310. More specifically, the CPU 20 performs a pitch conversion process for creating voice fragment data with which to output a voice with the corrected pitch, such as by performing a process for shifting, in the frequency axis direction, a spectral distribution indicated by a waveform of voice fragment data with which to output a voice with the reference pitch. Further, the CPU 20 generates a voice signal on the basis of the voice fragment data having been created by the pitch conversion process and outputs the thus-generated voice signal to the sound output section 70. As a consequence, the voice of the corrected pitch is output from the sound output section 70. In the above-described example, an operation of the voice control operator 60b is detected during generation of a voice and the correction amount acquisition and the pitch conversion process are performed on the basis of the detected operation as noted above. Alternatively, when the voice control operator 60b has been operated before output of a voice is started, followed by an operation of the pitch selector 50, the correction amount acquisition and the pitch conversion process may be performed, during generation of a voice corresponding to the operation of the pitch selector 50, while reflecting the operation of the voice control operator 60b immediately preceding the generation of the voice.
Thus, once the pitch of Mi is designated by an operation on the pitch selector 50 at next time point t5, a voice corresponding to the character group L3 is generated with the pitch of Mi. In this case, once the generation of the voice corresponding to the character group L3 is started, the object-of-output character group designated by the pointer j switches to the next character group L4. The generation of the voice corresponding to the character group L3 lasts from the start time of the depression operation of the pitch selector 50 designating the pitch of Mi (i.e., from time point t5) to a time at which the depression operation of the pitch selector 50 is terminated (i.e., to time point t6). Then, once the pitch of Fa is designated by an operation of the pitch selector 50 at time point t6, a voice corresponding to the object-of-output character group L4 is generated with the pitch of Fa.
In the illustrated example of
Further, in such a case, although the same lyrics characters are repeated as noted above, a perfection level of the performance can often be enhanced if the singing voices repeated in the period from time point t5 to time point t7 are different in state from the singing voices output in the period from time point t3 to time point t5. Further, in the instant embodiment, where the keyboard musical instrument 10 is provided with the voice control operator 60b, the user can change, by operating the voice control operator 60b, the state of the singing voices between the first and second of the repeated performances.
Further, in the illustrated example of
Further, in the illustrated example of
Note that, although it is necessary for the user to simultaneously operate the forward character shift selection button Mcf and the voice control operator 60b at time point tf, the user can easily perform such simultaneous operations of the selection button Mcf and the control operator 60b by use of the controller 10a according to the embodiment of the invention. Namely, with the controller 10a according to the embodiment of the invention, where the voice control operator 60b is provided on the front flat surface of the grip as viewed from the user and the forward character shift selection button Mcf is provided between the upper and rear flat surfaces of the grip, the user can operate the forward character shift selection button Mcf with the thumb of one hand and operate the voice control operator 60b with another finger (such as the index finger) while holding the grip G with the one hand; thus, the user can simultaneously operate the forward character shift selection button Mcf and the voice control operator 60b.
With the voice control operator 60b provided in the aforementioned manner, it is possible to execute singing voice performances in many variations. For example, even with the construction where the order of character groups is caused to progress each time the single pitch selector 50 is operated once, a voice indicated by a single character group can be generated with two or more successive pitches. Let us assume, for example, a song to be performed sequentially in the order of the character groups L1, L2, L3, L4, L5 and L6 and with predetermined pitches, i.e., Do for the character group L1, Re for the character group L2, Mi and Fa for the character group L3, Do for the character group L4, Re for the character group L5, and Mi for the character group L6. In this case, the user operates the pitch selector 50 to designate the pitches of Do, Re and Mi at time points t1, t2 and t3, respectively, as shown in
With the above-described construction, the user can use the controller 10a to give an instruction for generating voices based on characters in various expressions. Further, while the user is performing the keyboard musical instrument 10 and voices are being output in response to the performance of the keyboard musical instrument 10, the user can flexibly execute modification of the lyrics and control of the manner of voice generation, such as repetition of a desired lyrics portion, like a chorus or highlighted portion, and change of intonation in response to warming-up or climaxing of the music piece. Furthermore, when a same lyrics portion is repeated through modification of the lyrics, it is also possible to change the intonation of the same lyrics portion by controlling the manner of voice generation, and thus, it is possible to increase the range of expressions of character-based voices.
Further, in order to allow an ad-lib performance of the lyrics to be executed in a variety of ways, the instant embodiment of the invention is constructed in such a manner that the user can designate, by operating the repeat operator 60c, a range of character groups (character group range) to be set as an object of repeat (i.e., start and end of the repeat performance). More specifically, once the user depresses the repeat operator 60c, the CPU 20 starts selection of character groups to be set as an object of repeat. Then, once the user terminates the depression operation on the repeat operator 60c, the CPU 20 ends the selection of character groups as the object of repeat. In this manner, the CPU 20 sets, as the object of repeat, the range of the character groups selected while the user was depressing the repeat operator 60c.
First, with reference to
The following describes the object-of-repeat selection (setting) process with reference to
If the repeat function is currently OFF as determined at step S400, the CPU 20 turns on the repeat function (step S405). Namely, in the instant embodiment, once the user depresses the repeat operator 60c when the repeat function is OFF, the CPU 20 determines that the repeat function has been switched to the ON state and rewrites the repeat flag recorded in the RAM 40 into a value indicating that the repeat function is currently ON. After the repeat function has been turned on as above, the CPU 20 performs a process for setting a range of character groups (character group range) to be made an object of repeat for a period till the depression operation on the repeat operator 60c is terminated.
Then, the CPU 20 sets the object-of-output character group as the first character group of the object of repeat (step S410). Namely, the CPU 20 acquires the current value of the pointer j and records the thus-acquired current value of the pointer j into the RAM 40 as a value indicative of a position, in the progression order, of the first character group of the object of repeat. The object-of-output character group indicated by the current value of the pointer j is indicative of a voice to be generated at the next voice generation time (i.e., the next time the pitch selector 50 is operated). In the illustrated example of
Then, the CPU 20 waits until it is determined that the depression operation on the repeat operator 60c has been terminated (step S415). Even during the waiting period, the CPU 20 performs the aforementioned voice generation process in response to an operation on the pitch selector 50 (see
Once the depression operation on the repeat operator 60c is terminated as determined at step S415, the CPU 20 sets, as the last character group of the object of repeat, the character group immediately preceding the object-of-output character group (step S420). Namely, the CPU 20 acquires the current value of the pointer j and records a value (j−1) obtained by subtracting 1 (one) from the current value of the pointer j into the RAM 40 as a value indicative of the position of the last character group of the object of repeat. The character group immediately preceding the object-of-output character group, indicated by the value (j−1), corresponds to the currently-generated voice or last-generated voice.
In the illustrated example of
Once the character group range is set as the object of repeat in the aforementioned manner, the CPU 20 sets the first character group of the object of repeat as the object-of-output character group (step S425). Namely, the CPU 20 references the RAM 40 to acquire a value indicative of the position, in the progression order, of the first character group of the object of repeat and sets the thus-acquired value into the pointer j. Thus, the next time pitch designation information is acquired in response to an operation on the pitch selector 50, a voice corresponding to the first character group of the object of repeat will be generated.
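The range-setting sequence of steps S405 to S425 can be sketched as follows. This is a simplified illustration in Python that covers only the case where the repeat function is OFF when the repeat operator is depressed; the class name `RepeatState` and its method names are hypothetical, and voice generation is reduced to advancing the pointer j.

```python
class RepeatState:
    """Holds the pointer j and the object-of-repeat range (0-based positions)."""

    def __init__(self):
        self.j = 0              # object-of-output character group
        self.repeat_on = False  # stands in for the repeat flag in the RAM
        self.first = None       # first character group of the object of repeat
        self.last = None        # last character group of the object of repeat

    def on_repeat_pressed(self):
        """Steps S405 and S410: turn the repeat function on, record the start."""
        self.repeat_on = True
        self.first = self.j

    def on_voice_generated(self):
        """Simplified stand-in for the voice generation process: advance j."""
        self.j += 1

    def on_repeat_released(self):
        """Steps S420 and S425: record (j - 1) as the end, then rewind j."""
        self.last = self.j - 1
        self.j = self.first
```

The sketch does not handle the case where the repeat operator is released before any voice has been generated; how that case is treated is not stated in the description above.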
The following describes, with reference to
At step S110, the CPU 20 determines whether or not the object-of-output character group indicated by the pointer j is the last character group of the object of repeat. If the object-of-output character group indicated by the pointer j is not the last character group of the object of repeat, the CPU 20 branches from a NO determination of step S110 to step S120, where it increments the value of the pointer j by one.
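The pointer update of steps S110 and S120 can be sketched as below. The sketch is in Python and is illustrative only; the function name `advance_pointer` and its parameters are hypothetical, and the return to the first character group once the last character group of the object of repeat has been reached is an assumption about the YES branch of step S110.

```python
def advance_pointer(j, repeat_on, first, last):
    """Return the next value of the pointer j after a voice has been generated.

    j         -- position of the character group that was just voiced
    repeat_on -- whether the repeat function is currently ON
    first     -- position of the first character group of the object of repeat
    last      -- position of the last character group of the object of repeat
    """
    if repeat_on and j == last:
        # Assumption: the YES branch of step S110 rewinds to the first group.
        return first
    # NO branch of step S110: step S120 increments the pointer by one.
    return j + 1
```

Calling the function once per pitch designation operation, with a repeat range of positions 2 to 4, would cycle the pointer through 2, 3, 4, 2, 3 and so on for as long as the repeat function stays ON.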
Namely, each time a pitch designation operation is performed on the pitch selector 50, the process of
To turn off the repeat function currently in the ON state, the user depresses the repeat operator 60c again, in response to which the process of
Then, the CPU 20 clears the setting of the character group range as the object of repeat (step S435). Namely, the CPU 20 deletes, from the RAM 40, the values indicative of the respective positions, in the progression order, of the first and last character groups of the object of repeat. As an example, the CPU 20 is configured to leave the value of the pointer j, i.e. the object-of-output character group, unchanged even when the repeat function has been turned off. Thus, in the illustrated example of
The user can identify the object-of-output character group (L5 in the illustrated example of
For example, the user can set the character group L7 as the object of output by depressing the forward character shift selection button Mcf twice at a timing preceding time point t7. In this case, if the user operates the pitch selector 50 at time point t7, the voice indicated by the character group L7 is output. Further, in a case where a boundary between the character group L6 and the character group L7 is set as the end of a phrase in the character information 30b, the user can set the character group L7 as the object of output by depressing the forward phrase shift selection button Mpf once at a timing preceding time point t7. In such a case too, if the user operates the pitch selector 50 at time point t7, the voice indicated by the character group L7 is output.
Note that, as a modification of the operation of step S435, the CPU 20 may automatically advance the value of the pointer j to an original predetermined progressing position. More specifically, the CPU 20 may sequentially advance a reference pointer, which assumes that no repeat is being made during a repeat performance, in response to a pitch designation operation. For instance, in the illustrated example of
Note that combining operations via the repeat operator 60c and voice control via the voice control operator 60b permits a wide variety of performances. For example, such a combination permits a performance similar to that shown in
According to the above-described construction of the instant embodiment, the CPU 20 repeatedly generates, in response to operations on the repeat operator 60c, voices corresponding to a character group range set, as desired by the user, as an object of repeat. Further, with the instant embodiment, a repeat timing of voices indicated by characters of the object of repeat can be controlled in accordance with a user's instruction (user's operation on the pitch selector 50). Further, the user can designate a desired character range of the lyrics character string and thereby cause voices of the desired character range to be output repeatedly as set forth above, and thus, when a performance of a same portion is to be repeated for mastering, memorizing, etc. of a musical instrument performance, the user can easily designate a desired repeat range and cause the designated repeat range to be performed in a repeated fashion. Besides, the above-described repeat function can be used for mastering etc. of, for example, a foreign language without being limited to a musical instrument performance; as an example, voices of a desired character range can be repeatedly generated, such as for listening training of a foreign language or the like. Furthermore, in creation of the character information 30b, creation of a same character group for a repeated performance (i.e., creation of the same character group for being performed for the second or subsequent time following the first performance) may be omitted. In this way, it is possible to simplify the operation for creating the character information 30b and hence reduce a necessary storage capacity for the character information 30b.
Moreover, according to the instant embodiment, a desired portion can be selected from a character string of a predetermined progression order defined as the character information 30b and can be repeated while voices are being generated by the voice generation apparatus on the basis of the character information 30b, as set forth above. Thus, it is possible to generate voices of the character string with the existing progression order of the character string modified as desired. The existing progression order of the character string may be modified in various manners, such as by trolling, repeating a highlighted or climaxing portion (i.e., chorus) of the music piece, scatting words like “La, La, La”, and repeating a portion of a high performing difficulty for a practicing purpose. Further, with the instant embodiment, it is possible to not only designate a character range as an object of repeat but also instruct a start and end of a repeat performance, via the repeat operator 60c in the form of a single push button switch. Thus, not only designation of a character range as an object of repeat but also timing control of a repeat performance can be executed with extremely simple operations. Furthermore, repeat-related control can be performed with a reduced number of operations. Moreover, the user can select characters as an object of repeat in real time by listening to voices sequentially output from the sound output section 70; thus, the user can select such characters as an object of repeat without relying on the visual sense.
The above-described embodiment is just an illustrative example for describing the present invention, and various other embodiments may be employed. For example, the controller 10a is not limited to the shape shown in
Furthermore, for the grip G, it is only necessary that the character selector 60a, the repeat operator 60c and the voice control operator 60b be provided at such positions that, when the character selector 60a or the repeat operator 60c is operated with a finger of the user, the voice control operator 60b can be operated with another finger of the user. For that purpose, the character selector 60a (or the repeat operator 60c) and the voice control operator 60b may be provided on a portion of the grip G where the fingers of one hand of the user are placed while the user is holding the grip G with the one hand. For example, the grip G may be constructed in such a manner that the character selector 60a (or the repeat operator 60c) and the voice control operator 60b are provided on different surfaces rather than on a same flat surface, as shown in (A), (B), (D) and (E) of
Further, in order for the user to stably hold the grip while grasping the grip with one hand, it is preferable that the character selector 60a (or the repeat operator 60c) and the voice control operator 60b not be located on two opposite surfaces (e.g., front and rear surfaces in (A) and (E) of
What is more, the manner of interconnection between the controller 10a and the body 10b is not necessarily limited to that shown in
Furthermore, the application of the present invention is not necessarily limited to the keyboard musical instrument 10 and may be another type of electronic musical instrument equipped with the pitch selector 50. The present invention is also applicable to a singing voice generation device which automatically generates voices of lyrics defined in the character information 30b in accordance with pre-created pitch information (such as MIDI information), or an apparatus which reproduces recorded sound information and recorded image information.
In such a case, the CPU 20 may acquire pitch designation information (MIDI event information etc.) automatically reproduced in accordance with an automatic performance sequence, generate a voice of a character group, designated by the pointer j, with a pitch designated by the acquired pitch designation information (MIDI event information etc.), and advance the value of the pointer j in accordance with the acquired pitch designation information (MIDI event information etc.). When the character selector 60a has been operated in the embodiment which acquires such pitch designation information according to the automatic performance sequence, the CPU 20 may temporarily stop acquisition of the pitch designation information according to the automatic performance sequence, acquire, instead of such pitch designation information, pitch designation information given from the pitch selector 50 in response to a user's operation, and then generate a voice of a character group, designated by the pointer j having been changed in response to the operation on the character selector 60a, with a pitch designated by the pitch designation information acquired from the pitch selector 50. A modification of the embodiment where the pitch designation information is acquired in accordance with the automatic performance sequence may be constructed in such a manner that, when the character selector 60a has been operated, the progression of the automatic performance is changed (advanced or returned) in accordance with a change of the value of the pointer j responsive to the operation on the character selector 60a, and that pitch designation information automatically generated in accordance with the thus-changed progression of the automatic performance is acquired and then a voice of a character group, designated by the pointer j having been changed in response to the operation of the character selector 60a, is generated with a pitch indicated by the acquired pitch designation information.
In such a modification, the pitch selector 50 is unnecessary. Even where a voice generation (output) timing is designated by a user's operation, a means for designating such a voice generation (output) timing is not necessarily limited to the pitch selector 50 and may be another type of suitable switch or the like. For example, the modification may be constructed such that information indicative of a pitch of a voice to be generated is acquired from automatic sequence data and a generation timing of that voice is designated in accordance with a user's operation of a suitable switch.
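The last-mentioned modification, in which pitches come from automatic sequence data while the generation timing comes from a user-operated switch, can be sketched as follows. This is a hedged illustration in Python; the function name `make_trigger`, the representation of the sequence data as a plain list of pitch names, and the one-to-one pairing of the pitch at position j with the character group at position j are all assumptions made for the example.

```python
def make_trigger(pitches, groups):
    """Return a callback to be invoked each time the user operates the switch.

    pitches -- pitches taken from (hypothetical) automatic sequence data
    groups  -- character groups in their progression order
    """
    state = {"j": 0}  # pointer j shared between successive calls

    def on_switch():
        j = state["j"]
        if j >= len(groups) or j >= len(pitches):
            return None  # assumption: nothing is voiced once either list ends
        state["j"] = j + 1
        # The switch decides when to voice; the sequence data decides the pitch.
        return groups[j], pitches[j]

    return on_switch
```

Each call to the returned callback corresponds to one operation of the switch and yields the character group to be voiced together with its pitch.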
Furthermore, the construction for varying the pitch on the basis of the voice control operator 60b is not necessarily limited to the one employed in the above-described embodiment, and various other constructions may be employed. For example, the CPU 20 may be configured to acquire a pitch variation rate from the reference pitch on the basis of a touching contact position on the voice control operator 60b and vary the pitch on the basis of the acquired pitch variation rate. Further, the CPU 20 may consider that the position on the voice control operator 60b that the user has first contacted while a voice is being generated with the reference pitch corresponds to the reference pitch, and then, when the contact position has changed from the first contact position, the CPU 20 may determine a pitch correction amount and a pitch variation rate on the basis of a distance between the first contact position and the changed contact position.
In the aforementioned case, a pitch correction amount and pitch variation rate per unit distance are determined in advance. Under such conditions, the CPU 20 acquires a changed distance that is a distance of the changed contact position from the first contact position. Then, the CPU 20 identifies a pitch correction amount and pitch variation rate by multiplying a value, calculated by dividing the changed distance by the unit distance, by the per-unit-distance pitch correction amount and pitch variation rate. Alternatively, the CPU 20 may be configured to identify a pitch correction amount and pitch variation rate on the basis of a change in the contact position on the voice control operator 60b (such as a moving velocity) rather than on the basis of a touching contact position on the voice control operator 60b. Of course, the width or range over which the pitch is variable via the voice control operator 60b is not necessarily limited to the aforementioned and may be any of various other ranges (such as a range of one octave). Further, the pitch variation range may be made variable in accordance with a user's instruction or the like. Furthermore, the object of control by the voice control operator 60b may be selected from among pitch, volume and character of a voice (such as a sex of a voice utterer and characteristic of the voice) in accordance with a user's instruction or the like.
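The per-unit-distance calculation described above can be written out as a short sketch. It is given in Python for illustration only; the function name `relative_correction`, the use of millimeters, and the use of a signed distance (so that movement in one direction raises and in the other lowers the pitch) are assumptions not stated in the description.

```python
def relative_correction(first_pos_mm, current_pos_mm, unit_distance_mm, amount_per_unit):
    """Return the correction amount for a contact that has moved on the sensor.

    first_pos_mm     -- position first contacted (treated as the reference pitch)
    current_pos_mm   -- changed contact position
    unit_distance_mm -- the predetermined unit distance (must be positive)
    amount_per_unit  -- predetermined correction amount per unit distance
    """
    if unit_distance_mm <= 0:
        raise ValueError("unit_distance_mm must be positive")
    # Assumption: a signed distance, so the direction of movement sets the sign.
    changed_distance = current_pos_mm - first_pos_mm
    return (changed_distance / unit_distance_mm) * amount_per_unit
```

For instance, with a unit distance of 10 mm and one half tone per unit distance, moving the finger 20 mm from the first contact position would give a correction of two half tones under these assumptions.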
Note that the voice control operator 60b may be disposed separate from the grip G having the character selector 60a provided thereon, rather than on the grip G. For example, an existing tone control operator provided on the input/output section 60 of the body 10b of the keyboard musical instrument 10 may be used as the voice control operator 60b.
Furthermore, the way of acquiring the character information 30b is not necessarily limited to the aforementioned, and the character information 30b may be input from an external recording medium, having the character information 30b recorded therein, to the keyboard musical instrument 10 through wired or wireless communication. Alternatively, singing voices being uttered may be picked up in real time via a microphone and buffered into the RAM 40 of the keyboard musical instrument 10 so that character information 30b can be acquired on the basis of buffered audio waveform data.
Furthermore, the character information 30b defining a predetermined character string of lyrics or the like may be any information as long as it is capable of substantively defining a plurality of characters and an order of the characters, and the character information 30b may be in any form of data expression, such as text data, image data or audio data. For example, the character information 30b may be expressed with code information indicative of time-serial variation of syllables corresponding to characters, or with time-serial audio waveform data. In short, whatever form of data expression the character information 30b may be in, it is only necessary that the character information 30b be coded in such a manner that individual character groups (each comprising one or more characters corresponding to a syllable) in the character string are separately distinguishable, and that voice signals can be generated in accordance with such codes.
Furthermore, the above-described voice generation device may be constructed in any desired manner as long as it has a function for generating voices, indicated by characters, in accordance with an order of the characters, namely, as long as it can reproduce, as voices, sounds of words indicated by characters on the basis of the character information. Furthermore, as the technique for generating voices corresponding to character groups as set forth above, any desired one of various techniques may be employed, such as a technique which generates waveforms for sounding characters, indicated by the character information, on the basis of waveform information indicative of sounds of various syllables.
Furthermore, the voice control operator may be constructed in any desired manner as long as it can change a factor that is an object of control (object-of-control factor); for example, the voice control operator may be a sensor via which the user can designate variation from a predetermined reference of the object-of-control factor, a value of the object-of-control factor, a state of the object-of-control factor after variation, and/or the like. The voice control operator may be a push-button switch or the like rather than a touch sensor. Furthermore, although it is only necessary that the voice control operator be at least capable of controlling the manner of generation of a voice indicated by a character selected by the character selector, the voice control operator is not so limited, and the voice control operator may be configured to be also capable of controlling the manner of generation of a voice independently of selection by the character selector.
What is more, the character selector 60a may include one or more other types of character selection (designation) means in addition to the aforementioned four types of selection buttons Mcf, Mcb, Mpf and Mpb.
Also, in the illustrated example of
To summarize the above-described embodiments with regard to the repeat function, the CPU 20 is configured to advance or retreat the pointer j artificially in response to an operation of the character selector 60a and/or in response to a progression of an automatic performance sequence and to identify (acquire) a character group, comprising one or more characters, from the pointer j (see steps S102, S105, steps S200 to S220, etc.). Such a function performed by the CPU 20 corresponds to a function as an information acquisition section that acquires information designating one or more characters included in a pre-defined character string.
Further, the CPU 20 is configured to generate a voice, corresponding to a character group of a position in the progression order designated by the pointer j, with a pitch designated as above (step S105). The thus-generated voice is output from the sound output section 70. Such a function performed by the CPU 20 corresponds to a function as a voice generation section that generates a voice of the designated one or more characters on the basis of the acquired information.
Further, as shown in
Number | Date | Country | Kind |
---|---|---|---|
2014-124091 | Jun 2014 | JP | national |
2014-124092 | Jun 2014 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2015/066659 | 6/10/2015 | WO | 00 |