1. Field of the Invention
The present invention relates to a method and device for editing singing voice synthesis data that directs control over synthesis of a singing voice. The invention also relates to a method for analyzing a singing voice that generates singing characteristics data used for editing singing voice synthesis data.
2. Description of the Related Art
There is known in the art of singing voice synthesis, a technique of synthesizing a singing voice based on singing voice synthesis data. The term singing voice synthesis data referred to here is sequence data including note data specifying a duration and pitch of a voice, and lyrics data associated with the note data, and sound control data. Examples of kinds of data included in the sound control data are volume control data for controlling a volume of a voice outputting lyrics indicated by the lyrics data, and pitch control data for controlling a pitch of the voice.
The singing voice synthesis data may be freely edited by a user and stored in a memory. The different kinds of data constituting the singing voice synthesis data, i.e., each of the pieces of note data, the lyrics data associated with each piece of note data, and the sound control data are read out from a memory in a sequential manner and supplied to a singing voice synthesizer by a sequencer. The singing voice synthesizer synthesizes singing voice signals that correspond to the lyrics indicated by the lyrics data, which are supplied by the sequencer, and have a pitch and voicing duration specified by the note data. The singing voice synthesizer then performs sound control such as volume and pitch control on the singing voice signals based on the sound control data, for output.
When an actual person sings, the first voicing of a phrase segmented by silent sections strongly characterizes the singer. One may desire that singing be made much more expressive by varying both volume and pitch at a start of a phrase. Japanese Patent Application Laid-Open Publication No. 2015-034920 (JP 2015-034920, hereinafter) discloses a technology in which a probability model is used to machine learn a relationship between pitch transitions of synthesized singing represented by reference music track data consisting of a combination of note data and lyrics data of a particular music track, and pitch transitions of reference singing data obtained by actually singing the particular music track. Singing characteristics data that define the probability model are then generated.
One possibility for making singing more expressive is to generate singing characteristics data by using the technology of JP 2015-034920, and further generating sound control data to impart variation in a pitch and volume at a beginning of a phrase based on the singing characteristics data. In the technology of JP 2015-034920, however, the section for which the probability model performs machine learning is determined based on the note data of the reference music track data. Consequently, the technology of JP 2015-034920 is not able to obtain singing characteristics data that could be used to enhance musical expressivity in a section immediately before note-on, since the technology interprets such a section as a silent section, and thus differentiates the section from a voiced section.
The present invention has been made in view of the abovementioned situation, and one of the objects of the invention is to provide a method and device for editing singing voice synthesis data so as to impart enhanced musical expressiveness to singing at a beginning of a phrase. Another object of the invention is to provide an improved method for analyzing singing and to increase utility of a method in editing singing voice synthesis data and a device used for realizing the method.
A singing voice synthesis data editing method according to one aspect of the present invention includes adding to singing voice synthesis data, a piece of virtual note data that is placed immediately before a piece of note data having no contiguous preceding piece of note data, the singing voice synthesis data including: multiple pieces of note data each specifying a duration and a pitch at which each note that is in a time series, representative of a melody to be sung, is voiced; multiple pieces of lyrics data associated with at least one of the multiple pieces of note data; and a sequence of sound control data for directing sound control over a singing voice that is synthesized from the multiple pieces of lyrics data. The method additionally includes obtaining sound control data that directs sound control over the singing voice synthesized from the multiple pieces of lyrics data, and that is associated with the piece of virtual note data. The above method may also be embodied as a device for editing singing voice synthesis data.
When there is a piece of note data that has no contiguous preceding piece of note data, such as at the beginning of a phrase, the method or device for editing singing voice synthesis data of the present invention adds to the singing voice synthesis data a piece of virtual note data that is placed immediately before the note data that does not have any contiguous preceding piece of note data. Sound control data associated with the piece of virtual note data is then obtained. Accordingly, it is possible to implement sound control by the sound control data for the section before the first note-on timing of a phrase, whereby singing at the beginning of the phrase is made expressive.
A singing analysis method according to another aspect of the present invention includes generating singing characteristics data based on music track data that includes multiple pieces of note data each specifying a duration and a pitch at which each note that is in a time series, representative of a melody to be sung, is voiced, with multiple pieces of lyrics data associated with at least one of the multiple pieces of note data, as well as singing data indicating a singing voice waveform obtained by singing the music track. The generated singing characteristics data defines a probability model for generation of singing data from the music track data. The singing analysis method also includes adding, to the music track data from which the singing characteristics data is generated, a piece of virtual note data placed immediately before a piece of note data having no contiguous preceding piece of note data, among the multiple pieces of note data. The singing analysis method may be embodied as a singing analysis device that executes such singing analysis method.
According to this method or device for analyzing singing, singing characteristics data are generated based on the music track data to which the piece of virtual note data has been added. Consequently, by using the obtained singing characteristics data, the aforementioned method or device for editing singing voice synthesis data enables generation of sound control data appropriate for the piece of virtual note data that has been added.
An embodiment of the invention will be described below, referring to the drawings.
The singing analysis device 100 generates singing characteristics data Z that represents the singing style of a particular singer (hereinafter the “reference singer”). Singing style as used here means a manner of expression including the way of singing that is distinctive to the reference singer (for example, note bending) and facial expressions. The singing voice synthesis device 200 executes singing voice synthesis incorporating singing characteristics data Z generated by the singing analysis device 100, and then generates singing voice signals of a singing voice of any music track that reflects the singing style of the reference singer. In other words, even when the singing voice of the reference singer is not available for a desired music track, the singing voice synthesis device 200 may generate a singing voice of the subject music track with the singing style of the reference singer attributed thereto (i.e., a voice sounding as if the reference singer is actually singing the music track).
Singing Analysis Device 100
The singing analysis device 100 has a CPU 12, a volatile storage unit 13, a non-volatile storage unit 14 and a communication interface (I/F) 15. The non-volatile storage unit 14 is formed of, for example, a read-only memory (ROM) or a hard disk drive (HDD), and stores reference singing data XA and reference music track data XB that are used to generate singing characteristics data Z. The reference singing data XA represents the waveform of the voice of the reference singer singing a particular music track as shown as an example in
By executing a singing analysis program GA stored in the non-volatile storage unit 14, the CPU 12 achieves multiple functions for generating the singing characteristics data Z of the reference singer (a variable extractor 22, a characteristics analyzer 24, and a virtual note data adder 26). The singing analysis program GA may be provided in a form stored in a computer-readable storage medium and be installed in the singing analysis device 100.
Such storage medium and the non-volatile storage unit 14 are, for example, non-transitory recording media, and they may be any publicly known recording media such as optical recording media (optical discs) such as a CD-ROM, magnetic recording media and semiconductor recording media. “Non-transitory” recording media mentioned in the description of the present invention include all types of recording media that may be read by a computer, except for transitory, propagating signals, and they do not exclude volatile recording media. The singing analysis program GA may alternatively be provided in a form distributed through a communication network and be installed onto a computer.
The variable extractor 22 obtains from the reference singing data XA, a sequence of feature quantities of the reference voice. In this example, the variable extractor 22 sequentially calculates, as the feature quantity, a difference (hereinafter, a relative pitch) R that is the difference between a pitch PB of the synthetic voice generated through voice synthesis using the reference music track data XB, and a pitch PA of the reference voice represented by the reference singing data XA. In other words, the relative pitch R may be also referred to as the numerical value of the pitch bend of the reference voice (the amount the pitch PA of the reference sound varies against the pitch PB of the synthetic voice that serves as a benchmark). As shown in
The transition generator 32 sets a transition (hereinafter, synthetic pitch transition) CP of the pitch PB of the synthetic voice generated through voice synthesis using the reference music track data XB. In phoneme-connecting voice synthesis using the reference music track data XB, the synthetic pitch transition (pitch curve) CP is generated according to the pitch and voicing duration specified by the reference music track data XB for each note, and the phonemes corresponding to the lyrics of respective notes are tuned to each pitch PB of the synthetic pitch transition CP and inter-connected, whereby a synthetic voice is generated. The transition generator 32 generates the synthetic pitch transition CP according to the reference music track data XB of the reference music track. As will be understood from the above, the synthetic pitch transition CP corresponds to the locus of the model (standard) pitch PB of the voice singing the reference music track.
It is of note that whereas the synthetic pitch transition CP may be used in voice synthesis as mentioned above, actual generation of a synthetic voice is not necessary with the singing analysis device 100 as long as the synthetic pitch transition CP corresponding to the reference music track data XB is generated.
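The relationship between the synthetic pitch transition CP and the relative pitch R can be illustrated with a minimal sketch. It assumes a frame-based representation, pitches expressed as note numbers, a stepwise (unsmoothed) CP, and the sign convention R = PA − PB; none of these details is fixed by the description, and the names `Note`, `synthetic_pitch_transition` and `relative_pitch` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Note:
    onset: float     # seconds (assumed unit)
    duration: float  # seconds (assumed unit)
    pitch: float     # note number (assumed representation)


def synthetic_pitch_transition(notes, frame_times):
    """Stepwise stand-in for CP: the pitch PB of the note sounding at each
    frame time, or None where no note is sounding. A real pitch curve would
    be smoothed between notes; that is omitted here."""
    cp = []
    for t in frame_times:
        pb = None
        for n in notes:
            if n.onset <= t < n.onset + n.duration:
                pb = n.pitch
                break
        cp.append(pb)
    return cp


def relative_pitch(pa_seq, pb_seq):
    """R = PA - PB per frame (assumed sign convention); None where either
    pitch is undefined."""
    return [
        None if pa is None or pb is None else pa - pb
        for pa, pb in zip(pa_seq, pb_seq)
    ]


notes = [Note(0.0, 0.5, 60), Note(0.5, 0.5, 62)]
frames = [0.0, 0.25, 0.5, 0.75, 1.0]
cp = synthetic_pitch_transition(notes, frames)
pa = [59.5, 60.0, 61.0, 62.25, None]  # made-up detected pitches PA
r = relative_pitch(pa, cp)
```

Only the note data is needed to build `cp`, which mirrors the point that no synthetic voice has to be rendered for the analysis.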
The pitch detector 34 of
Specifically, the interpolator 36 sets a sequence of the pitch PA in an interpolated section (a first interpolated section) ηA2, of a predetermined length, of the unvoiced section σ0, the first interpolated section ηA2 residing at the starting side of the unvoiced section σ0, based on a sequence of the pitch PA in a section (a first section) ηA1, of a predetermined length, of the voiced section σ1, the first section ηA1 residing at the ending side of the voiced section σ1. For example, the interpolator 36 sets numerical values along an approximate line (regression line for example) L1 of the sequence of the pitch PA in the section ηA1, as the pitch PA in the interpolated section ηA2 that is immediately subsequent to the section ηA1. In other words, the sequence of the pitch PA in the voiced section σ1 is extended into the unvoiced section σ0 such that the transition of the pitch PA is continuous across the voiced section σ1 (section ηA1) and the immediately subsequent unvoiced section σ0 (interpolated section ηA2).
Similarly, the interpolator 36 sets a sequence of the pitch PA in an interpolated section (a second interpolated section) ηB2, of a predetermined length, of the unvoiced section σ0, the second interpolated section ηB2 residing at the ending side of the unvoiced section σ0, based on a sequence of the pitch PA in a section (a second section) ηB1, of a predetermined length, of the voiced section σ2, the second section ηB1 residing at the starting side of the voiced section σ2. For example, the interpolator 36 sets each numerical value along an approximate line (a regression line for example) L2 of the sequence of the pitch PA within the section ηB1 as the pitch PA within the interpolated section ηB2 that immediately precedes the section ηB1. In other words, the sequence of the pitch PA within the voiced section σ2 is extended into the unvoiced section σ0 so that the transition of the pitch PA is continuous across the voiced section σ2 (section ηB1) and the immediately preceding unvoiced section σ0 (interpolated section ηB2). The section ηA1 and the interpolated section ηA2 are set to have the same time duration with each other, and the section ηB1 and the interpolated section ηB2 are also set to have the same time duration with each other. It is, however, of note that each section may have a time duration different from one another. In addition, the time duration of the section ηA1 and that of the section ηB1 may or may not be the same, and the time duration of the interpolated section ηA2 and that of the interpolated section ηB2 also may or may not be the same.
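The two-sided extension into the unvoiced section can be sketched as follows. This is an illustration under stated assumptions, not the interpolator 36 itself: pitch is a per-frame list with `None` in unvoiced frames, all four sections ηA1, ηA2, ηB1 and ηB2 share one length `span`, the approximate lines L1 and L2 are ordinary least-squares lines, and the behavior when the two interpolated sections overlap is an arbitrary choice. The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def interpolate_unvoiced(pitch, span):
    """Extend voiced pitch into each unvoiced gap from both sides."""
    out = list(pitch)
    n = len(pitch)
    i = 0
    while i < n:
        if pitch[i] is not None:
            i += 1
            continue
        start = i
        while i < n and pitch[i] is None:
            i += 1
        end = i  # the unvoiced section occupies frames [start, end)

        # First section (etaA1): up to `span` voiced frames just before the gap.
        a1 = []
        j = start - 1
        while j >= 0 and pitch[j] is not None and len(a1) < span:
            a1.append(j)
            j -= 1
        if len(a1) >= 2:
            slope, icpt = fit_line(a1, [pitch[k] for k in a1])
            for k in range(start, min(start + span, end)):
                out[k] = slope * k + icpt  # first interpolated section (etaA2)

        # Second section (etaB1): up to `span` voiced frames just after the gap.
        b1 = []
        j = end
        while j < n and pitch[j] is not None and len(b1) < span:
            b1.append(j)
            j += 1
        if len(b1) >= 2:
            slope, icpt = fit_line(b1, [pitch[k] for k in b1])
            # Assumption: if the gap is shorter than 2 * span, this backward
            # extension overwrites the forward one in the overlapping frames.
            for k in range(max(end - span, start), end):
                out[k] = slope * k + icpt  # second interpolated section (etaB2)
    return out


pitch = [60.0, 61.0, 62.0, None, None, None, None, None, None, 70.0, 69.0, 68.0]
filled = interpolate_unvoiced(pitch, span=2)
```

In this made-up example the rising trend before the gap continues for two frames, the falling trend after the gap is projected back for two frames, and the middle of the gap stays undefined.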
As shown in
The characteristics analyzer 24 of
The section setter 42 segments the sequence of the relative pitches R, which has been generated by the variable extractor 22, into multiple sections (hereinafter, unit sections) UA along the time axis. More specifically and as understood from
Furthermore, the section setter 42 associates the following information with each of the multiple unit sections UA:
A phrase here stands for a section in the reference music track corresponding to a melody (a sequence of notes) that the listener recognizes as a cohesive unit of music. The unit section UA set by the section setter 42 is thus distinguished from a phrase. For example, the reference music track may be segmented into phrases with silent sections as the boundaries, the silent sections having a time length longer than a predetermined time length (for example, a quarter-note rest or longer).
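Phrase segmentation at silent sections can be sketched as below. The sketch assumes notes are given as (onset, duration) pairs in beats and sorted by onset, and that the predetermined time length is one beat; the function name and the default threshold are hypothetical.

```python
def split_into_phrases(notes, min_rest=1.0):
    """Split a sorted list of (onset, duration) pairs into phrases wherever
    the silence between two successive notes is at least `min_rest` beats."""
    phrases = []
    current = []
    prev_end = None
    for onset, duration in notes:
        if prev_end is not None and onset - prev_end >= min_rest:
            phrases.append(current)
            current = []
        current.append((onset, duration))
        prev_end = onset + duration
    if current:
        phrases.append(current)
    return phrases


notes = [(0, 1), (1, 1), (2.5, 1), (5, 2)]
phrases = split_into_phrases(notes)
```

The half-beat rest before the third note does not start a new phrase, whereas the 1.5-beat rest before the last note does.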
The analysis processor 44 of
The analysis processor 44 generates the decision tree T[n] through machine learning (decision tree learning) in which whether or not predetermined conditions (queries) related to the unit sections UA are met is sequentially determined. The decision tree T[n] is a classification tree that puts (clusters) the unit sections UA into clusters, and the decision tree T[n] is expressed in a tree structure in which nodes v (va, vb, and vc) are connected to one another over multiple levels. As shown in
At the root node va and the intermediate nodes vb, whether or not such conditions as the following (contexts) are met is determined: whether or not the unit section UA is a silent section; whether or not a note in the unit section UA is shorter than a sixteenth note; whether or not the unit section UA is on the starting side of a note; or whether or not the unit section UA is on the ending side of a note. The time point at which to terminate the classification of each unit section UA (the timing at which the decision tree T[n] is finalized) is determined according to the standard of Minimum Description Length (MDL) for example. The structure of the decision tree T[n] (such as the number of intermediate nodes vb and the conditions set thereat, as well as the K number of the end nodes vc) differs for each of the states St of the probability model M.
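How a finished decision tree sorts a unit section into an end node can be sketched as follows. The sketch covers only the application of a tree, not the decision tree learning or the MDL stopping criterion. The `Node` class, the dictionary keys describing a unit section, and the particular arrangement of queries are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    query: Optional[Callable[[dict], bool]] = None  # None marks an end node
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    cluster: Optional[int] = None  # index k of the end node


def classify(node, unit):
    """Walk from the root, answering one yes/no query per level, until an
    end node is reached; return that end node's cluster index."""
    while node.query is not None:
        node = node.yes if node.query(unit) else node.no
    return node.cluster


SIXTEENTH = 0.25  # beats, assuming a quarter note is one beat

# A hand-built tree using three of the contexts named in the text.
tree = Node(
    query=lambda u: u["silent"],
    yes=Node(cluster=0),
    no=Node(
        query=lambda u: u["note_len"] < SIXTEENTH,
        yes=Node(cluster=1),
        no=Node(
            query=lambda u: u["at_note_start"],
            yes=Node(cluster=2),
            no=Node(cluster=3),
        ),
    ),
)

k = classify(tree, {"silent": False, "note_len": 1.0, "at_note_start": True})
```

Here the root corresponds to va, the two inner query nodes to vb, and the four leaves to end nodes vc, so K = 4 in this toy tree.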
The variable information D[n] of the unit data z[n] in
Meanwhile, the section setter 42 segments the reference music track into multiple unit sections UA for every unit sound value, by referring to the reference music track data XB (SA5). In doing so, the virtual note data adder 26 first adds virtual note data to the reference music track data XB. The section setter 42 then performs the segmentation by referring to the reference music track data XB after the virtual note data has been added thereto. In other words, when there is a time difference longer than a predetermined duration between the note-off timing of a preceding note and the note-on timing of a subsequent note (such as at the beginning of a phrase), the two notes being placed side by side in the reference music track data XB, the virtual note data adder 26 adds a piece of virtual note data that is placed immediately before the subsequent note. The section setter 42 segments multiple notes included in the reference music track data XB containing one or more pieces of such virtual note data into a predetermined duration (for example, the duration of a sixteenth note), wherein the segmenting is performed, for each and every note, in the order from the beginning of each note to the end of each note.
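The addition of virtual note data followed by segmentation into unit sections can be sketched as below. Several points are assumptions made only for illustration: times are in beats, a virtual note takes the pitch of the note it precedes, its length is a fixed value clipped to the available silence, the first note of the track is treated as having no preceding note, and a unit section is a sixteenth note (0.25 beat). The names `Note`, `add_virtual_notes` and `segment` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Note:
    onset: float
    duration: float
    pitch: int
    virtual: bool = False


def add_virtual_notes(notes, min_gap=1.0, virtual_len=0.5):
    """Insert a virtual note immediately before every note that has no
    contiguous preceding note (silence longer than `min_gap`, or no
    preceding note at all)."""
    out = []
    prev_end = None
    for n in notes:
        room = n.onset if prev_end is None else n.onset - prev_end
        starts_phrase = prev_end is None or room > min_gap
        if starts_phrase and room > 0:
            length = min(virtual_len, room)  # assumed length rule
            out.append(Note(n.onset - length, length, n.pitch, virtual=True))
        out.append(n)
        prev_end = n.onset + n.duration
    return out


def segment(notes, unit=0.25):
    """Cut every note, from its beginning to its end, into unit sections.
    Returns (start_time, label) pairs, labelled UA' for virtual notes."""
    sections = []
    for n in notes:
        count = math.ceil(n.duration / unit - 1e-9)
        for i in range(count):
            sections.append((n.onset + i * unit, "UA'" if n.virtual else "UA"))
    return sections


track = [Note(1.0, 1.0, 60), Note(2.0, 1.0, 62), Note(5.0, 0.5, 64)]
with_virtual = add_virtual_notes(track)
sections = segment(with_virtual)
```

In this example virtual notes appear before the first and third notes only, since the second note follows its predecessor without a gap.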
More specifically, the section setter 42 segments, into unit sections UA, each note, except for virtual notes, included in the reference music track data XB. The section setter 42 also segments, this time into unit sections UA′ that have the same length as the unit sections UA, the notes corresponding to the virtual note data (refer to
It is of note that the detailed way of adding the virtual note data is the same as that indicated in
The analysis processor 44 generates a decision tree T[n] for each state St of the probability model M through machine learning using each unit section (UA or UA′) (SA6). The analysis processor 44 then generates the variable information D[n] corresponding to the relative pitch R within each unit section (UA or UA′) that has been classified as an end node vc of the decision tree T[n] (SA7). Subsequently, the analysis processor 44 stores in the non-volatile storage unit 14, the singing characteristics data Z that contains the unit data z[n] for each state St of the probability model M, the unit data z[n] including the decision tree T[n] generated in step SA6 and the variable information D[n] generated in step SA7 (SA8). By repeating the abovementioned steps for every combination of a reference singer (reference singing data XA) and reference music track data XB, the non-volatile storage unit 14 appropriately stores different sets of singing characteristics data Z respectively corresponding to the mutually differing reference singers.
The above explanation of the functions of the singing analysis device 100 has so far focused on the generation of the singing characteristics data that indicates a pitch transition. The same method can generally be applied to the generation of singing characteristics data that indicates a volume transition. Unlike the generation of the singing characteristics data that indicates a pitch transition, however, the singing characteristics data that indicates a volume transition does not use the volume characteristics of the reference music track data XB, but rather uses the volume characteristics detected from the reference singing data XA as they are, as the singing characteristics data.
Singing Voice Synthesis Device 200
In
The non-volatile storage unit 202 of the present embodiment stores a singing voice synthesis program 210, a phoneme database 220, and a singing characteristics database 230. The singing voice synthesis program 210 and the phoneme database 220 are read out of a storage medium by the memory I/F 207, or received from a network server by the communication I/F 206 for example, and are stored in the non-volatile storage unit 202. The singing characteristics database 230 is made up of the singing characteristics data Z, which is generated by the singing analysis device 100, either by being downloaded via the communication I/F 206 or by being read out of a storage medium having stored thereon the singing characteristics data Z by the memory I/F 207, and then being stored in the non-volatile storage unit 202 for compilation in a database. Each of the storage medium from which the singing voice synthesis program 210 is read out, the non-volatile storage unit 202, and the volatile storage unit 203 may be, for example, non-transitory recording media and they may include optical recording media (optical discs) such as CD-ROMs, or alternatively, any commonly known recording media such as magnetic recording media or semiconductor recording media.
The phoneme database 220 is a collection of phoneme waveform data indicating waveforms of various phonemes, such as consonants and vowels, that are materials forming a singing voice. This phoneme waveform data refers to data based on phoneme waveforms extracted from the waveform of a voice of an actual person. The phoneme database 220 contains groups of phoneme waveform data obtained from singing voice waveforms of different singers having different voice qualities, such as a male voice, a female voice, a clear voice, or a husky voice. The singing voice synthesis program 210 causes the CPU 201 to execute singing voice synthesis using this phoneme database 220, and the singing characteristics database 230.
A data format such as VSQ or VSQX is used for singing voice synthesis data 310, and the singing voice synthesis data 310 includes a sequence of note data 311, a sequence of lyrics data 312, and a sequence of sound control data 313. The note data 311 indicates a sequence of notes representing the melody of a song, and more specifically, a sequence of multiple pieces of note data that specify the duration and pitch at which each note is voiced. The lyrics data 312 indicates the lyrics to be sung along with the notes, and more specifically, it is a sequence of multiple pieces of lyrics data indicating names of multiple phonemes that are present in the lyrics. Each of the pieces of lyrics data indicating the phoneme names of the lyrics is associated with at least one of the pieces of note data 311. In other words, each piece of the lyrics data is data indicating the lyrics, or more specifically, data indicating the phoneme names of the lyrics associated with each piece of note data 311 indicating one of the notes included in the sequence of note data 311. The sound control data 313 is a sequence of data for controlling the volume and the pitch at which singing is performed, based on the lyrics indicated by the lyrics data 312 along with the notes indicated by the note data 311.
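The three parts of the singing voice synthesis data 310 and the association between lyrics data and note data can be pictured with a small in-memory model. This is not the VSQ or VSQX format; every class and field name below is hypothetical, and time in ticks and a per-point volume and pitch offset are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NoteData:
    onset: int     # ticks (assumed unit)
    duration: int  # ticks
    pitch: int     # note number (assumed representation)


@dataclass
class LyricsData:
    note_index: int      # the piece of note data this lyric is associated with
    phonemes: List[str]  # phoneme names present in the lyric


@dataclass
class ControlPoint:
    time: int            # ticks
    volume: float        # volume control value (assumed scale)
    pitch_offset: float  # pitch control value (assumed unit: semitones)


@dataclass
class SingingVoiceSynthesisData:
    notes: List[NoteData] = field(default_factory=list)
    lyrics: List[LyricsData] = field(default_factory=list)
    sound_control: List[ControlPoint] = field(default_factory=list)

    def phonemes_for(self, note_index):
        """Phoneme names of all lyrics associated with one note."""
        return [
            p
            for entry in self.lyrics
            if entry.note_index == note_index
            for p in entry.phonemes
        ]


data = SingingVoiceSynthesisData(
    notes=[NoteData(0, 480, 60), NoteData(480, 480, 62)],
    lyrics=[LyricsData(0, ["s", "a"]), LyricsData(1, ["k", "u"])],
    sound_control=[ControlPoint(0, 0.8, 0.0), ControlPoint(240, 1.0, -0.2)],
)
```

Keeping the sound control data as its own time-stamped sequence, rather than as fields of each note, is what later allows control values to exist for a section in which no note is voiced.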
The singing voice synthesis data editor 211 causes the display 204 to display a Graphical User Interface (GUI) that accepts the input operation of the singing voice synthesis data 310. With the GUI displayed, the user operates the operator 205 to input each of the pieces of singing voice synthesis data 310. The singing voice synthesis data editor 211 stores, in a predetermined storage area within the volatile storage unit 203, the singing voice synthesis data 310 that the user has input by operating the operator 205. Meanwhile, when the user inputs an instruction to store the singing voice synthesis data 310 by operating the operator 205, the singing voice synthesis data editor 211 stores in the non-volatile storage unit 202 the singing voice synthesis data 310 that was stored in the volatile storage unit 203.
As part of a function unique to the present embodiment, the singing voice synthesis data editor 211 has a virtual note data adder 211a and a sound control data acquirer 211b. In a sequence of note data 311 of the singing voice synthesis data 310, when there is a piece of note data that does not have a contiguous preceding piece of note data, the virtual note data adder 211a adds to the sequence of note data 311, a piece of virtual note data that is placed immediately before the piece of note data having no contiguous preceding piece of note data.
In
When the user inputs a singing voice synthesis instruction by operating the operator 205, the sequencer 212, while advancing the relative time that has as its benchmark the starting point of the singing voice synthesis data 310 stored in the volatile storage unit 203, reads out from the volatile storage unit 203 a piece of note data 311 whose relative time is the beginning of the voiced period, as well as reading out a piece of lyrics data 312 and a piece of sound control data 313 each of which is associated with the piece of note data 311. The sequencer 212 then supplies to the singing voice synthesizer 213 each of the piece of note data 311, the piece of lyrics data 312, and the volume control data and the pitch control data included in the piece of sound control data 313.
The singing voice synthesizer 213 first reads out from the phoneme database 220 one or more pieces of phoneme waveform data corresponding to the phoneme name(s) indicated by the lyrics data supplied from the sequencer 212. Then by performing pitch conversion on the phoneme waveform data piece(s), the singing voice synthesizer 213 generates phoneme waveform data piece(s) having a pitch obtained by changing the pitch indicated by the piece of note data 311 based on the pitch control data. Subsequently, the singing voice synthesizer 213 performs on the generated phoneme waveform data piece(s) volume control indicated by the volume control data. The singing voice synthesizer 213 smoothly connects along the time axis, the phoneme waveform data pieces thus obtained. In this way, the singing voice synthesizer 213 generates a digital sound signal for outputting a singing voice (singing waveform data that is in a waveform format), and outputs the generated singing waveform data to the sound system 208. Such is the functional configuration achieved by the execution of the singing voice synthesis program 210.
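How the pitch control data and the volume control data act on a note can be illustrated numerically. The sketch does not perform pitch conversion of a waveform; it only computes the target frequency that such a conversion would aim for and applies a gain to samples. It assumes the note pitch is a MIDI-style note number in equal temperament (note 69 = 440 Hz), the pitch control value is an offset in semitones, and the volume control value is a linear gain. Both function names are hypothetical.

```python
def target_frequency(note_number, pitch_offset):
    """Frequency in Hz of the note pitch after it has been changed by the
    pitch control value (assumed to be an offset in semitones)."""
    return 440.0 * 2.0 ** ((note_number + pitch_offset - 69) / 12.0)


def apply_volume(samples, gain):
    """Scale waveform samples by a linear gain (assumed volume control)."""
    return [s * gain for s in samples]


f_plain = target_frequency(69, 0.0)  # no pitch control applied
f_bent = target_frequency(68, 1.0)   # one semitone upward bend from note 68
quiet = apply_volume([0.5, -0.5, 1.0], 0.5)
```

A bend of one semitone applied to note 68 reaches the same frequency as an unbent note 69, which is the sense in which the control data changes the pitch indicated by the note data.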
Operation of the Present Embodiment
Operation of the present embodiment will now be described below.
According to the present embodiment, the user of the singing voice synthesis device 200 may accumulate in the singing characteristics database 230 of the non-volatile storage unit 202 the singing characteristics data Z of the desired singer generated by the singing analysis device 100. The user of the singing voice synthesis device 200 may use for singing voice synthesis, the singing characteristics data Z of the desired singer stored in the singing characteristics database 230.
When the user of the singing voice synthesis device 200 operates the operator 205 in a predetermined way, the CPU 201 executes the singing voice synthesis program 210. Into the singing voice synthesis data editor 211 of the singing voice synthesis program 210 there is input, for example by a user operating the operator 205, a piece of note data 311 and one or more pieces of lyrics data 312, which are then stored in a predetermined area within the volatile storage unit 203. The singing voice synthesis data editor 211 of the present embodiment includes a function of editing the sound control data 313 that is associated with the piece of note data 311 and the one or more pieces of lyrics data 312.
First, the CPU 201 performs the preprocessing (SB1).
The CPU 201 performs virtual note data addition (SB2).
To summarize, as stated above, whereas the singing voice synthesizer 213 generates phoneme waveform data by changing the pitch indicated by the note data 311 based on the pitch control data, the note data 311 here does not include the virtual note data. It is of note here that with regard to the adjustment made to the note data N1 in the preprocessing (the adjustment from section (a) to (b) in
Subsequently, the CPU 201 determines whether the editing mode of the sound control data has been selected by the user as the manual-editing mode or the automatic-editing mode (SB3).
In a case that the user selects the manual-editing mode, the CPU 201 causes the display 204 to display the sequence of note data 311 and the lyrics data 312. The CPU 201 then acquires the sound control data including the volume control data and the pitch control data that the user inputs by operating the operator 205 (SB4). In this case, the user may input sound control data for the section(s) of the virtual note data. It is of note, however, that the virtual note data is not included in the sequence of note data 311 supplied by the sequencer 212.
On the other hand, in a case that the user selects the automatic-editing mode, the CPU 201 generates the sound control data based on the sequence of note data 311, the lyrics data 312 and the singing characteristics data Z of the desired singer selected by the user (SB5).
More specifically, the CPU 201 refers to the sequence of note data 311 to which the virtual note data has been added and segments along the time axis, the melody line of the music track that is the target of singing voice synthesis, into multiple unit sections, each having a unit value (for example, a sixteenth note) that is substantially the same as the aforementioned unit section UA or UA′. A synthetic music track that is the target of singing voice synthesis is the sequence of note data 311 (the sequence of note data 311 with virtual note data added thereto) of the singing voice synthesis data 310. The CPU 201 segments each of the multiple notes (pieces of note data corresponding to the notes originally included in the sequence of note data 311 and the pieces of virtual note data that have been added) included in this sequence of note data 311. This segmentation is performed in substantially the same manner as in the aforementioned segmentation of the unit section UA and UA′.
Subsequently, the CPU 201 applies each unit section to the decision tree T[n] of the unit data z[n] corresponding to the n-th state St of the probability model M included in the singing characteristics data Z. By doing so, the CPU 201 specifies an end node vc that the relevant unit section belongs to, among the K number of end nodes vc of the decision tree T[n], and then specifies the sequence of the relative pitches R by using each variable ω (ω0, ω1, ω2, and ωd) of the variable group Ω[k] corresponding to the relevant end node vc among the variable information D[n]. By sequentially executing the above processes for each state St of the probability model M, the sequence of the relative pitches R within the unit section is specified. More specifically, the duration of each state St is set according to the variable ωd of the variable group Ω[k], and each relative pitch R is calculated such that the joint probability of the following appearance probabilities is maximized: the appearance probability of the relative pitch R defined by the variable ω0; the appearance probability of the alteration over time ΔR of the relative pitches R defined by the variable ω1; and the appearance probability of the second order differential value Δ²R of the relative pitches R defined by the variable ω2. By connecting the sequences of the relative pitches R across multiple unit sections along the time axis, a relative pitch transition CR that extends over the entire synthetic music track is generated. The CPU 201 designates pitch control data indicating a relative pitch transition CR as the sound control data 313.
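The maximization described above can be written compactly as follows. The notation is introduced here only for illustration and rests on assumptions the description does not state: a frame index t running from 1 to T, appearance probabilities that factorize over frames, and the alteration over time and the second order differential value taken as simple finite differences (with boundary frames handled in some suitable way).

```latex
\[
\hat{R}_{1:T}
  = \arg\max_{R_{1:T}}
    \prod_{t=1}^{T}
      p_{\omega_0}(R_t)\,
      p_{\omega_1}(\Delta R_t)\,
      p_{\omega_2}(\Delta^2 R_t),
\qquad
\Delta R_t = R_t - R_{t-1},
\quad
\Delta^2 R_t = \Delta R_t - \Delta R_{t-1}
\]
```

Here each density is the appearance probability defined by the corresponding variable ω0, ω1 or ω2 of the state that is active at frame t, with the number of frames spent in each state set by ωd. Because the three terms share the same sequence of relative pitches R, the maximizing sequence trades off the preferred level of R against its preferred rate of change and curvature.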
The above description takes the editing of the pitch control data as an example. Editing of the volume control data follows substantially the same steps: the CPU 201 generates the volume control data, which indicates the volume transition that occurs while the singing is performed, based on the sequence of note data 311 with the piece(s) of virtual note data added thereto, the lyrics data 312, and the singing characteristics data Z.
When the user inputs an instruction for singing voice synthesis by operating the operator 205, the sequencer 212 reads out a piece of note data 311, the lyrics data 312 associated with this piece of note data 311, and the sound control data 313 from the volatile storage unit 203 and supplies them to the singing voice synthesizer 213, in the same manner as described above. Here, the sound control data 313 includes sound control data that controls the volume and the pitch in the section of the virtual note data.
The singing voice synthesizer 213 first reads out, from the phoneme database 220, one or more pieces of phoneme waveform data corresponding to the phoneme names indicated by the lyrics data supplied from the sequencer 212. Then, by performing pitch conversion on such phoneme waveform data piece(s), the singing voice synthesizer 213 generates phoneme waveform data piece(s) having a pitch that is obtained by changing the pitch indicated by the piece of note data according to the pitch control data. Subsequently, the singing voice synthesizer 213 performs the volume control indicated by the volume control data on the generated phoneme waveform data piece(s).
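The way the note pitch, the pitch control data, and the volume control data combine can be illustrated with a minimal sketch. It assumes that the note pitch is a MIDI-style note number (69 corresponding to 440 Hz), that the relative pitch is expressed in cents, and that the volume control data is a per-sample linear gain. None of these representations is specified in the description, and the function names are hypothetical. The pitch conversion of the waveform itself is not shown.

```python
def target_frequency_hz(note_number, relative_pitch_cents):
    """Pitch after applying pitch control data to the pitch of a note.

    Assumptions: note_number follows the MIDI convention (69 = 440 Hz),
    and the relative pitch is an offset in cents (1200 cents = 1 octave).
    """
    base = 440.0 * 2.0 ** ((note_number - 69) / 12.0)
    return base * 2.0 ** (relative_pitch_cents / 1200.0)


def apply_volume(samples, gains):
    """Scale each waveform sample by the matching volume control value.

    Assumption: the volume control data is a linear gain per sample.
    """
    if len(samples) != len(gains):
        raise ValueError("samples and gains must have the same length")
    return [s * g for s, g in zip(samples, gains)]
```

Under these assumptions, a relative pitch of 0 leaves the note at its nominal frequency, and a relative pitch of 1200 cents raises it by one octave.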
The pitch control data and the volume control data in this case include the pitch control data and the volume control data corresponding to the section of the virtual note.
Accordingly, with the present embodiment, a variation in pitch and volume according to the singing characteristics of the desired singer may be given to the section immediately before a section having no contiguous preceding note, such as the beginning of a phrase. As a result, singing is made more expressive.
The variable extractor 22 and the characteristics analyzer 24 of the singing analysis device 100 (refer to
Other Embodiments
The above description applies to one embodiment of the present invention, but other embodiments of the present invention are possible. The following are examples of such other embodiments.
100 . . . singing analysis device, 200 . . . singing voice synthesis device, 12 and 201 . . . CPU, 14 and 202 . . . non-volatile storage unit, 12 and 203 . . . volatile storage unit, 15 and 206 . . . communication I/F, 204 . . . display, 205 . . . operator, 207 . . . memory I/F, 208 . . . sound system, GA . . . singing analysis program, 22 . . . variable extractor, 24 . . . characteristics analyzer, XA . . . reference singing data, XB . . . reference music track data, Z . . . singing characteristics data, 210 . . . singing voice synthesis program, 220 . . . phoneme database, 230 . . . singing characteristics database, 211 . . . singing voice synthesis data editor, 211a and 26 . . . virtual note data adder, 211b . . . sound control data acquirer, 212 . . . sequencer, 213 . . . singing voice synthesizer, 310 . . . singing voice synthesis data, 311 . . . note data, 312 . . . lyrics data, and 313 . . . sound control data.
Number | Date | Country | Kind |
---|---|---|---|
2015-146889 | Jul 2015 | JP | national |
2016-102192 | May 2016 | JP | national |
Number | Name | Date | Kind |
---|---|---|---|
5621182 | Matsumoto | Apr 1997 | A |
20050137862 | Monkowski | Jun 2005 | A1 |
20090306987 | Nakano | Dec 2009 | A1 |
20100175539 | Silbert | Jul 2010 | A1 |
20150025892 | Lee | Jan 2015 | A1 |
20150040743 | Tachibana | Feb 2015 | A1 |
Number | Date | Country |
---|---|---|
2015-34920 | Feb 2015 | JP |
Number | Date | Country |
---|---|---|
20170025115 A1 | Jan 2017 | US |