The present invention relates to a method and apparatus for music generation and more particularly to a method and apparatus for generating a piece of music after receiving any length of input such as a segment of sound or music.
As time has progressed, music has become a large part of human life, and people can access music almost anytime and anywhere. Some people, such as lyricists and composers, are good at creating a melody, chords, a beat or a complete piece of music, and can even make a living by producing music. However, not everyone has a talent for creating music, and those who lack it would benefit from a music generation method and apparatus that lets them create their own works. Therefore, there remains a need for a new and improved method and apparatus for music generation to overcome the problems presented above.
The present invention provides a method and apparatus for music generation which may include steps of receiving an input of any length; recognizing pitches and rhythm of the input; generating a first segment of a full music; generating segments other than the first segment to complete the full music; generating connecting notes, chords and beats of the segments of the full music and handling anacrusis; and generating instrument accompaniment for the full music.
Sound extraction techniques are employed in sound processing and in several data representations, and the key features of the input are extracted according to the characteristics of the input sound. The step of recognizing pitches and rhythm of the input is a signal-processing step applied to the input, and the frame of the generated music, including an initial short melody and initial bars and time signature, is generated in this step.
After the frame of the generated music is generated, the sound input is processed through a deep learning system to generate, in sequence, a first segment of a full music and segments other than the first segment to complete the full music. Furthermore, each of the two steps is completed through the deep learning system, including steps of extracting Musical Instrument Digital Interface (MIDI) data; extracting melody; extracting chord; extracting beat; and extracting music progression of the input sound.
The detailed description set forth below is intended as a description of the presently exemplary device provided in accordance with aspects of the present invention and is not intended to represent the only forms in which the present invention may be prepared or utilized. It is to be understood, rather, that the same or equivalent functions and components may be accomplished by different embodiments that are also intended to be encompassed within the spirit and scope of the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood to one of ordinary skill in the art to which this invention belongs. Although any methods, devices and materials similar or equivalent to those described can be used in the practice or testing of the invention, the exemplary methods, devices and materials are now described.
All publications mentioned are incorporated by reference for the purpose of describing and disclosing, for example, the designs and methodologies that are described in the publications that might be used in connection with the presently described invention. The publications listed or discussed above, below and throughout the text are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the inventors are not entitled to antedate such disclosure by virtue of prior invention.
In order to further understand the goal, characteristics and effect of the present invention, a number of embodiments are illustrated along with the drawings as follows:
Referring to
Sound extraction techniques are employed in sound processing and in several data representations, and the key features of the input are extracted according to the characteristics of the input sound. The step of recognizing pitches and rhythm of the input (120) is a signal-processing step applied to the input, wherein the frame of the generated music, including an initial short melody and initial bars and time signature, is generated in this step, and the data representations of generating the initial short melody (Equation 1) and the initial bars and time signatures (Equation 2) are shown below:
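$$M_0 = \{n_{M_0,1}, \dots, n_{M_0,|M_0|}\}$$
$$n_{M_0,i} = (t_{M_0,i}, d_{M_0,i}, h_{M_0,i}, v_{M_0,i})$$
Notes in the main melody do not overlap.
Equation 1. Data Representations of Generating Initial Short Melody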
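$$B_0 = \{b_{0,1}, \dots, b_{0,|B_0|}\}$$
$$b_{0,i} = (t_{b_{0,i}}, s_{b_{0,i}})$$
At this point the time signature for each bar should be the same:
$$\forall\, 1 \le i < j \le |B_0|, \quad s_{b_{0,i}} = s_{b_{0,j}}$$
Equation 2. Data Representations of Generating Initial Bars & Time Signatures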
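The data representations of Equations 1 and 2 can be pictured as plain records. The following is a minimal sketch, not the claimed implementation: the text names only the symbols, so the meanings given to `t`, `d`, `h`, `v` and `s` in the comments, the class names `Note` and `Bar`, and both helper functions are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Note:
    # Field meanings are assumptions; the text only names the symbols t, d, h, v.
    t: float  # assumed: onset time in beats
    d: float  # assumed: duration in beats
    h: int    # assumed: pitch height, e.g. a MIDI note number
    v: int    # assumed: velocity (loudness)


@dataclass(frozen=True)
class Bar:
    t: float            # assumed: start time of the bar in beats
    s: Tuple[int, int]  # assumed: time signature as (numerator, denominator)


def notes_do_not_overlap(melody: List[Note]) -> bool:
    """Return True if no note starts before the previous note has ended."""
    ordered = sorted(melody, key=lambda n: n.t)
    return all(a.t + a.d <= b.t for a, b in zip(ordered, ordered[1:]))


def same_time_signature(bars: List[Bar]) -> bool:
    """Return True if every bar shares one time signature."""
    return len({bar.s for bar in bars}) <= 1


# A three-note initial melody and two bars in 4/4.
m0 = [Note(0.0, 1.0, 60, 90), Note(1.0, 1.0, 62, 90), Note(2.0, 2.0, 64, 90)]
b0 = [Bar(0.0, (4, 4)), Bar(4.0, (4, 4))]
```

Both constraints stated with the equations (non-overlapping notes, one time signature for all bars) become simple checks that can be run on the frame before it is passed on.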
After the frame of the generated music is generated, the sound input is processed through a deep learning system (200) to generate, in sequence, a first segment of a full music (130) and segments other than the first segment to complete the full music (140). Furthermore, each of the two steps (130) (140) is completed through the deep learning system (200), including steps of extracting Musical Instrument Digital Interface (MIDI) data from the music input (201); extracting score information from the MIDI (202); extracting a main melody from the MIDI (203); extracting a chord progression from the MIDI (204); extracting a beat pattern from the MIDI (205); extracting a music progression from the MIDI (206); and applying music theory to the melody, chord progression and beat pattern extracted in steps 203 to 205 (207), as shown in
Regarding music sequence handling, when generating a segment of music having several bars, deep learning models tend to make every bar unique. However, real-world music often has some degree of repetition among the bars of the same segment. By introducing such repetition, the music can leave a stronger imprint of its motive and main theme on the listener.
In the present invention, three types of music sequence are defined: (i) melody sequence: this sequence determines how the main melody is to be repeated. For example, the first 2 bars of Frère Jacques have the same main melody, and bars 3-4 of the song also share one melody; (ii) beat pattern sequence: this sequence determines how the beat/rhythm pattern is to be repeated. For example, in the Happy Birthday song, the same 2-bar beat/rhythm pattern is repeated four times; and (iii) chord progression sequence: this sequence determines how the chord progression is to be repeated. Unlike melody and beat pattern, the repetition of a chord progression is more limited. In the present invention, a chord progression is only allowed to be repeated from the beginning of the segment, because repeating a chord progression from the middle of another chord progression could have a negative effect on the music.
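One way to picture the three sequence types is as one list per type with one entry per bar. This is a sketch under an assumed encoding, not a format given in the text: `None` marks a bar with new material and an integer is the index of the earlier bar to repeat. The names `is_valid_sequence` and `chords_repeat_from_start` are hypothetical.

```python
from typing import List, Optional

# Hypothetical encoding: None = new material, int = index of the bar to repeat.
MusicSequence = List[Optional[int]]


def is_valid_sequence(seq: MusicSequence) -> bool:
    """Every repeat must point at an earlier bar of the same segment."""
    return all(ref is None or 0 <= ref < i for i, ref in enumerate(seq))


def chords_repeat_from_start(seq: MusicSequence) -> bool:
    """Check the restriction on chord progression sequences.

    Each run of repeated bars must copy bars 0, 1, 2, ... in order, so the
    progression is only ever replayed from the beginning of the segment.
    """
    expected = 0  # index that the next bar of the current run must copy
    for ref in seq:
        if ref is None:
            expected = 0      # a new chord bar ends any run of repeats
        elif ref == 0:
            expected = 1      # a run of repeats (re)starts at the first bar
        elif ref == expected:
            expected += 1     # the run continues with the next bar in order
        else:
            return False      # the repeat starts or jumps mid-progression
    return True


# Melody sequence for the opening of Frere Jacques: bar 2 repeats bar 1,
# bar 4 repeats bar 3 (indices are zero-based).
frere_jacques_melody = [None, 0, None, 2]

# Beat pattern sequence for a 2-bar pattern played four times.
two_bar_pattern = [None, None, 0, 1, 0, 1, 0, 1]
```

Under this encoding the melody and beat pattern sequences only need `is_valid_sequence`, while a chord progression sequence must also pass `chords_repeat_from_start`.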
In one embodiment, the music sequence can be extracted from a music database by a process which includes steps of: (i) identifying the key of the music and performing chord-progression recognition; (ii) splitting the music into segments based on the recognized chord progression; (iii) extracting the main melody and beat pattern for each bar in the segment; and (iv) utilizing a machine learning algorithm to determine which bars have their melody, beat pattern or chord progression repeated.
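Step (iv) can be illustrated with a much simpler stand-in. The sketch below does not use machine learning: it swaps in exact matching, which only finds bars that are identical, and the function name `find_repeats` and the tuple encoding of a beat pattern are assumptions for illustration.

```python
from typing import Hashable, List, Optional, Sequence


def find_repeats(bars: Sequence[Hashable]) -> List[Optional[int]]:
    """For each bar, return the index of the earliest identical bar, or None."""
    first_seen = {}
    result: List[Optional[int]] = []
    for i, bar in enumerate(bars):
        result.append(first_seen.get(bar))  # None if this content is new
        first_seen.setdefault(bar, i)       # remember the first occurrence only
    return result


# Beat pattern of each bar as a tuple of note onsets relative to the bar start.
beat_patterns = [(0, 1, 2, 3), (0, 2), (0, 1, 2, 3), (0, 2)]
print(find_repeats(beat_patterns))  # [None, None, 0, 1]
```

The same function can be applied to per-bar melodies or chords, provided each bar is encoded as a hashable value. A learned model would be needed to recognise repeats that are similar but not identical.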
In another embodiment, when generating a segment of music of n bars, the process is as follows: (i) selecting a music sequence of length n from the database; (ii) based on the selected music sequence and the input melody, generating a chord progression for the current segment that matches both the input melody and the selected music sequence (i.e. repeating previous chords when instructed by the music sequence); and (iii) generating the melody and beat pattern bar by bar.
In a further embodiment, the step of generating melody and beat pattern bar by bar may involve one of three possibilities. First, a bar has an entirely new beat pattern and melody. The system utilizes deep learning to generate the new beat pattern and melody and, after generation, records the generated beat pattern and melody for future use.
Second, a bar needs to repeat a previous beat pattern but does not need to repeat a previous melody. The system first loads the previously generated beat pattern and then uses deep learning to generate the new melody. Because the generated melody might not match the previously generated beat pattern, the final step is to align the generated melody to the beat pattern, as described below. After generation, the system records the generated beat pattern and melody for future use. Third, a bar needs to repeat both a previous beat pattern and a previous melody. In this case the system can simply load the previously generated beat pattern and melody.
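The three possibilities can be sketched as a small dispatch loop. Everything here is illustrative: the deep learning generators are replaced by toy random stand-ins, `toy_align` is only a placeholder and not the alignment procedure of this description, and the sequence encoding (`None` for new material, an integer for the bar to repeat) is an assumption.

```python
import random
from typing import Callable, List, Optional, Tuple

Beat = Tuple[float, ...]   # note onsets within one bar
Melody = Tuple[int, ...]   # one pitch per onset
Bar = Tuple[Beat, Melody]


def generate_segment(
    beat_seq: List[Optional[int]],
    melody_seq: List[Optional[int]],
    new_bar: Callable[[], Bar],
    new_melody: Callable[[], Melody],
    align: Callable[[Melody, Beat], Melody],
) -> List[Bar]:
    bars: List[Bar] = []
    for beat_ref, melody_ref in zip(beat_seq, melody_seq):
        if beat_ref is None and melody_ref is None:
            # First case: entirely new beat pattern and melody.
            beat, melody = new_bar()
        elif beat_ref is not None and melody_ref is None:
            # Second case: reuse a beat pattern, generate a melody, align it.
            beat = bars[beat_ref][0]
            melody = align(new_melody(), beat)
        elif beat_ref is not None and melody_ref is not None:
            # Third case: reuse both beat pattern and melody.
            beat = bars[beat_ref][0]
            melody = bars[melody_ref][1]
        else:
            # A new beat with a repeated melody is not one of the three cases.
            raise ValueError("unsupported combination of sequence entries")
        bars.append((beat, melody))  # record the bar for later repeats
    return bars


rng = random.Random(0)
SCALE = [60, 62, 64, 65, 67]


def toy_new_bar() -> Bar:
    beat = (0.0, 1.0, 2.0, 3.0)
    return beat, tuple(rng.choice(SCALE) for _ in beat)


def toy_new_melody() -> Melody:
    return tuple(rng.choice(SCALE) for _ in range(rng.randint(2, 6)))


def toy_align(melody: Melody, beat: Beat) -> Melody:
    # Placeholder only: trim, or pad with the last pitch, to one pitch per onset.
    pitches = list(melody[: len(beat)])
    while len(pitches) < len(beat):
        pitches.append(melody[-1])
    return tuple(pitches)


segment = generate_segment(
    beat_seq=[None, 0, None, 2],
    melody_seq=[None, None, None, 2],
    new_bar=toy_new_bar,
    new_melody=toy_new_melody,
    align=toy_align,
)
```

In this example bar 2 reuses the beat pattern of bar 1 with a fresh melody, and bar 4 is a full repeat of bar 3. Passing the generators in as callables keeps the dispatch logic separate from whatever model produces the notes.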
In still a further embodiment, the generated melody might not have the same rhythm as the beat pattern previously generated because a beat pattern determines at what time there should be a new note. As a result, the generated melody must be aligned with the beat pattern. For a melody with n notes and a beat pattern requesting m notes, the process of aligning the melody to the beat pattern is as follows:
Regarding melody mutation handling, it is known that repetition is very important to music. However, too much repetition can make music sound boring. As a result, melody mutation is introduced to add some variation to the generated music while preserving the strengthened motive introduced by the music sequence. After each segment of music is generated, music mutation is applied to the generated segment. Similar to music sequence, music mutation may include chord mutation, beat mutation and melody mutation. The general mutation process is as follows:
In the step of extracting main melody from MIDI (203), the deep learning system (200) is configured to select the one track which is most likely to be the main melody of the music to be generated. However, it is also possible for the deep learning system (200) to extract more than one main melody from a MIDI file. The data representation of extracting main melody from MIDI (203) (Equation 3) is shown below:
$$M = \{n_{M,1}, \dots, n_{M,|M|}\}$$
$$n_{M,i} = (t_{M,i}, d_{M,i}, h_{M,i}, v_{M,i})$$
Notes in the main melody do not overlap.
Equation 3. Data Representation of Extracting Main Melody
In the step of extracting chord progression from MIDI (204), a chord progression is generated according to the data representation of extracting chord progression from MIDI (204) (Equation 4), which is shown below:
$$C = \{(t_{C,1}, c_1), \dots, (t_{C,|C|}, c_{|C|})\}$$
In the step of extracting beat pattern from MIDI (205), the deep learning system (200) is configured to use heuristic data representations to extract the beat pattern for each bar, and a beat pattern is generated according to the data representations of extracting beat pattern from MIDI (205) (Equation 5), which are shown below:
$$E = E_1 \cup \dots \cup E_{|B|}$$
Ei={(tE
Moreover, in one embodiment, the chord progression of the generated music is configured to be adjusted according to the generated beat pattern. The deep learning system (200) is adapted to assume a chord change can only happen at a downbeat. The deep learning system (200) is adapted to detect whether there is a chord change for each downbeat and identify which chord is changed when detecting a chord change so as to generate the adjusted chord progression.
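The downbeat restriction can be sketched as a resampling of the chord progression. This is an illustration under stated assumptions, not the system's detector: the progression is taken to be a non-empty list of (start time, chord) pairs sorted by time, the downbeat times are assumed to be known from the bars, and "the chord sounding at the downbeat" stands in for whatever detection the deep learning system (200) performs. The names `chord_at` and `adjust_to_downbeats` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

Chord = str
Progression = List[Tuple[float, Chord]]  # (start time, chord), sorted by time


def chord_at(progression: Progression, time: float) -> Chord:
    """Return the chord sounding at `time` (the last one starting at or before it)."""
    starts = [t for t, _ in progression]
    index = bisect_right(starts, time) - 1
    return progression[max(index, 0)][1]


def adjust_to_downbeats(progression: Progression, downbeats: List[float]) -> Progression:
    """Keep a chord change only where a downbeat carries a different chord."""
    adjusted: Progression = []
    for beat in downbeats:
        chord = chord_at(progression, beat)
        if not adjusted or adjusted[-1][1] != chord:
            adjusted.append((beat, chord))  # a chord change detected at this downbeat
    return adjusted


raw = [(0.0, "C"), (3.5, "F"), (8.0, "G"), (9.0, "C")]
print(adjust_to_downbeats(raw, [0.0, 4.0, 8.0, 12.0]))
# [(0.0, 'C'), (4.0, 'F'), (8.0, 'G'), (12.0, 'C')]
```

In the example the change to F at time 3.5 is moved to the downbeat at 4.0, and the change back to C at 9.0 is deferred to the downbeat at 12.0.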
In the step of extracting music progression from MIDI (206), a music progression is generated from the MIDI, and the data representations of extracting music progression from MIDI (206) (Equation 6) are shown below:
$$\mathcal{P} = \{(P_1, l_1), \dots, (P_{|\mathcal{P}|}, l_{|\mathcal{P}|})\}$$
Pi={bP
Moreover, after the extracting processes, the deep learning system (200) is configured to train itself and thereby develop a deep learning model within the system (200).
Therefore, in the step of generating a first segment of a full music (130), the main melody, the chord progression, and the beat of the first segment of the full music are respectively generated through the deep learning system (200) in the following data representations (Equations 7, 8 and 9), wherein the first segment of the full music is defined as Part x:
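$$M_x = \{n_{M_x,1}, \dots, n_{M_x,|M_x|}\}$$
$$n_{M_x,i} = (t_{M_x,i}, d_{M_x,i}, h_{M_x,i}, v_{M_x,i})$$
Notes in the main melody do not overlap.
$$M_0 \subseteq M_x$$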
nM
Equation 7. Data Representations of Extracting Main Melody for Part x
Cx={(tC
On the other hand, in the step of generating segments other than the first segment to complete the full music (140), the main melody, the chord progression, and the beat of the segments other than the first segment are respectively generated through the deep learning system (200) in the following data representations (Equations 10, 11 and 12):
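$$M' = M'_1 \cup \dots \cup M'_{|\mathcal{P}|}$$
$$M'_i = \{n_{M'_i,1}, \dots, n_{M'_i,|M'_i|}\}$$
$$n_{M'_i,j} = (t_{M'_i,j}, d_{M'_i,j}, h_{M'_i,j}, v_{M'_i,j})$$
Notes in the main melody do not overlap.
$$M'_i \cap M'_j = \varnothing, \quad \forall i \neq j$$
Equation 10. Data Representations of Initial Melody for Full Music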
Cx={(tC
The step of generating connecting notes, chords and beats of the segments of the full music and handling anacrusis (150) is performed after the full music, including melody, chord progression and beat pattern, is generated by the deep learning system (200). In this step, a music generating system of the present invention having a music theory database is configured to generate connecting notes, chords, and beats between two connected segments and to handle anacrusis, for example by generating unstressed notes before the first bar of a segment, wherein the music theory may include an anacrusis handler and a connection handler as shown in
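$$M' = M'_1 \cup \dots \cup M'_{|\mathcal{P}|}$$
$$M'_i = \{n_{M'_i,1}, \dots, n_{M'_i,|M'_i|}\}$$
$$n_{M'_i,j} = (t_{M'_i,j}, d_{M'_i,j}, h_{M'_i,j}, v_{M'_i,j})$$
Notes in the main melody do not overlap.
$$M'_i \cap M'_j = \varnothing, \quad \forall i \neq j$$
Equation 13. Data Representations of Generating Melody for Full Music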
Cx={(tC
As shown in
$$R = \{(R_1, I_1), (R_2, I_2), \dots, (R_{|R|}, I_{|R|})\}$$
Ri={(tR
Furthermore, since the generated music or segments of the full music are sometimes not perfectly aligned with their bars, the music generating system of the present invention enables a user to modify the generated main melody through the deep learning system (200). After a segment, several segments or the full music is generated, the user has options such as (i) stopping here; (ii) letting the deep learning system (200) regenerate selected segments; and (iii) letting the deep learning system (200) regenerate the full music. Moreover, the music generating system of the present invention is configured to save the input sound for future use, or for generating different music by mixing different saved input sounds through the deep learning system (200).
In another embodiment, referring to
$$\mathcal{P} = \{(P_1, l_1), \dots, (P_{|\mathcal{P}|}, l_{|\mathcal{P}|})\}$$
Pi={bP
$$x \in [1, |\mathcal{P}|]$$
Some songs are not perfectly aligned with bars, so a way to represent such songs is needed.
Equation 17. Data Representations of Generating Music Progression From Metadata
In addition, the music generating system of the present invention comprises the deep learning system (200) and means for receiving any length of input (110); recognizing pitches and rhythm of the input (120); generating a first segment of a full music (130); generating segments other than the first segment to complete the full music (140); generating connecting notes, chords and beats of the segments of the full music and handling anacrusis (150); generating instrument accompaniment for the full music (160); and generating music progression from metadata (170).
Having described the invention by the description and illustrations above, it should be understood that these are exemplary of the invention and are not to be considered as limiting. Accordingly, the invention is not to be considered as limited by the foregoing description, but includes any equivalents.
Number | Date | Country | |
---|---|---|---|
20200066240 A1 | Feb 2020 | US |
Number | Date | Country | |
---|---|---|---|
62723342 | Aug 2018 | US |