Method and apparatus for music generation

FIELD OF THE INVENTION

The present invention relates to a method and apparatus for music generation and more particularly to a method and apparatus for generating a piece of music after receiving any length of input such as a segment of sound or music.

BACKGROUND OF THE INVENTION

As time has progressed, music has become a big part of human life, and people can easily access music almost anytime and anywhere. Some people, such as lyricists and composers, are good at creating a melody, chords, a beat or a complete piece of music, and some can even make a living by producing music. However, not everyone has a talent for creating music, and those who lack it would benefit from being able to create their own works through a music generation method and apparatus. Therefore, there remains a need for a new and improved method and apparatus for music generation to overcome the problems presented above.

SUMMARY OF THE INVENTION

The present invention provides a method and apparatus for music generation which may include steps of receiving an input of any length; recognizing pitches and rhythm of the input; generating a first segment of a full music; generating segments other than the first segment to complete the full music; generating connecting notes, chords and beats of the segments of the full music and handling anacrusis; and generating instrument accompaniment for the full music.

After the frame of the generated music is generated, the sound input is processed through a deep learning system to generate a first segment of a full music and then segments other than the first segment to complete the full music, in sequence. Furthermore, each of the two steps is completed through the deep learning system, which includes steps of extracting musical instrument digital interface (MIDI) data; extracting melody; extracting chord; extracting beat; and extracting music progression of the input sound.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flow chart of a method and apparatus for music generation of the present invention.

FIG. 2 is a flow chart illustrating the processing of a deep learning system of the method and apparatus for music generation in the present invention.

FIG. 3 is a flow chart of another embodiment of the method and apparatus for music generation of the present invention.

FIG. 4 is a flow chart of step 130 of the method and apparatus for music generation of the present invention.

FIG. 5 is a flow chart of step 140 of the method and apparatus for music generation of the present invention.

FIG. 6 is a flow chart of step 150 of the method and apparatus for music generation of the present invention.

FIG. 7 is a flow chart of step 160 of the method and apparatus for music generation of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

The detailed description set forth below is intended as a description of the presently exemplary device provided in accordance with aspects of the present invention and is not intended to represent the only forms in which the present invention may be prepared or utilized. It is to be understood, rather, that the same or equivalent functions and components may be accomplished by different embodiments that are also intended to be encompassed within the spirit and scope of the invention.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods, devices and materials similar or equivalent to those described can be used in the practice or testing of the invention, the exemplary methods, devices and materials are now described.

All publications mentioned are incorporated by reference for the purpose of describing and disclosing, for example, the designs and methodologies that are described in the publications that might be used in connection with the presently described invention. The publications listed or discussed above, below and throughout the text are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the inventors are not entitled to antedate such disclosure by virtue of prior invention.

In order to further understand the goal, characteristics and effect of the present invention, a number of embodiments are described below with reference to the drawings:

Referring to FIG. 1, the present invention provides a method and apparatus for music generation, and the method for music generation may include steps of receiving any length of input (110); recognizing pitches and rhythm of the input (120); generating a first segment of a full music (130); generating segments other than the first segment to complete the full music (140); generating connecting notes, chords and beats of the segments of the full music and handling anacrusis (150); and generating instrument accompaniment for the full music (160).

Sound extraction techniques are employed in sound processing together with several data representations, and the key features of the input are extracted according to the characteristics of the input sound. The step of recognizing pitches and rhythm of the input (120) is a signal-processing step in which the frame of the generated music is produced, including an initial short melody and initial bars and time signatures. The data representations of the initial short melody (Equation 1) and of the initial bars and time signatures (Equation 2) are shown below:

$$M_0 = \{ n_{M_0,1}, \ldots, n_{M_0,|M_0|} \}$$
$$n_{M_0,j} = (t_{M_0,j},\, d_{M_0,j},\, h_{M_0,j},\, v_{M_0,j})$$

- $n_{M_0,j}$: jth note of melody
- $t_{M_0,j}$: Starting tick of jth note of melody
- $d_{M_0,j}$: Duration (ticks) of jth note of melody
- $h_{M_0,j}$: Pitch of jth note of melody
- $v_{M_0,j}$: Velocity of jth note of melody

Notes in the main melody do not overlap.

Equation 1. Data Representations of Generating Initial Short Melody

$$B_0 = \{ b_{0,1}, \ldots, b_{0,|B_0|} \}$$
$$b_{0,i} = (t_{b_{0,i}},\, s_{b_{0,i}})$$

- $t_{b_{0,i}}$: Ending tick of the ith bar.
- $s_{b_{0,i}}$: Time signature of the ith bar.

At this point the time signature of every bar should be the same: $\forall\, 1 \le i < j \le |B_0|,\ s_{b_{0,i}} = s_{b_{0,j}}$

Equation 2. Data Representations of Generating Initial Bars & Time Signatures
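
As an illustration only, Equations 1 and 2 can be read as plain records. The sketch below is not the patented implementation; the names `Note`, `Bar`, `melody_is_monophonic` and `bars_share_time_signature` are hypothetical, and it assumes that pitch is stored as a MIDI note number and that a time signature is a (numerator, denominator) pair, neither of which is stated in the text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Note:
    """One melody note n = (t, d, h, v) as in Equation 1."""
    start_tick: int  # t: starting tick
    duration: int    # d: duration in ticks
    pitch: int       # h: pitch (assumed to be a MIDI note number)
    velocity: int    # v: velocity


@dataclass(frozen=True)
class Bar:
    """One bar b = (t, s) as in Equation 2."""
    end_tick: int                    # t: ending tick of the bar
    time_signature: Tuple[int, int]  # s: assumed (numerator, denominator)


def melody_is_monophonic(melody: List[Note]) -> bool:
    """Check the constraint that notes in the main melody do not overlap."""
    ordered = sorted(melody, key=lambda n: n.start_tick)
    return all(a.start_tick + a.duration <= b.start_tick
               for a, b in zip(ordered, ordered[1:]))


def bars_share_time_signature(bars: List[Bar]) -> bool:
    """Check the constraint that every initial bar has the same time signature."""
    return len({b.time_signature for b in bars}) <= 1
```

A frame produced by step 120 could then be validated by calling both checks before it is handed to later steps.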

After the frame of the generated music is generated, the sound input is processed through a deep learning system (200) to generate a first segment of a full music (130) and then segments other than the first segment to complete the full music (140), in sequence. Each of the two steps (130) (140) is completed through the deep learning system (200), which includes steps of extracting musical instrument digital interface (MIDI) data from the music input (201); extracting score information from the MIDI (202); extracting a main melody from the MIDI (203); extracting a chord progression from the MIDI (204); extracting a beat pattern from the MIDI (205); extracting a music progression from the MIDI (206); and applying a music theory to the melody, chord progression and beat pattern extracted in steps 203 to 205 (207), as shown in FIGS. 4 and 5. In the step of extracting MIDI (201), the deep learning system (200) is configured to translate the MIDI of the sound input received in step 110 into a format that is more readable for the deep learning system (200). Then, once the MIDI information of the sound input has been extracted, the deep learning system (200) acquires the score information, main melody, chord progression, beat pattern, and music progression of the music. In one embodiment, the score information is specified at the beginning of the MIDI and can be acquired directly. In one embodiment, the music theory may include a music sequence handler and a melody mutation handler.

Regarding music sequence handling, when generating a segment of music having several bars, deep learning models tend to generate each bar independently, so that every bar is unique. However, real-world music often has some degree of repetition among the bars in the same segment. By introducing such repetition, the music can leave a stronger imprint of its motif and main theme on the listener.

In our invention, we define three types of music sequence: (i) melody sequence: this sequence determines how the main melody is to be repeated. For example, the first 2 bars of Frère Jacques have the same main melody, and bars 3-4 of the song also have the same melody; (ii) beat pattern sequence: this sequence determines how the beat/rhythm pattern is to be repeated. For example, in the Happy Birthday song, the same 2-bar beat/rhythm pattern is repeated four times; and (iii) chord progression sequence: this sequence determines how the chord progression is to be repeated. Unlike melody and beat pattern, the repetition of chord progression is more limited. In the present invention, we only allow a chord progression to be repeated from the beginning of the segment because repeating a chord progression from the middle of another chord progression could have a negative effect on the music.

In one embodiment, the music sequence can be extracted from a music database through steps of: (i) identifying the key of the music and performing chord-progression recognition; (ii) splitting the music into segments based on the recognized chord progression; (iii) extracting the main melody and beat pattern for each bar in the segment; and (iv) utilizing a machine learning algorithm to determine which bars have their melody, beat pattern or chord progression repeated.

In another embodiment, when generating a segment of music of n bars, the process is as follows: (i) selecting a music sequence of length n from the database; (ii) based on the selected music sequence and the input melody, generating a chord progression for the current segment that matches the input melody as well as the selected music sequence (i.e., repeating previous chords when instructed by the music sequence); and (iii) generating the melody and beat pattern bar by bar.

In a further embodiment, the step of generating melody and beat pattern bar by bar may include three possibilities. First, a bar may have an entirely new beat pattern and melody. In this case, the system utilizes deep learning to generate a new beat pattern and melody, and then records the generated beat pattern and melody for future use.

Second, a bar may need to repeat a previous beat pattern but not a previous melody. The system first loads the previously generated beat pattern and then uses deep learning to generate a new melody. Because the generated melody might not match the previously generated beat pattern, the final step is to align the generated melody to the beat pattern, as described below. After generation, the system records the generated beat pattern and melody for future use.

Third, a bar may need to repeat both a previous beat pattern and a previous melody. The system can simply load the previously generated beat pattern and melody.
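
The three possibilities can be sketched as a small dispatch loop. This is a minimal illustration under stated assumptions, not the disclosed system: the plan format (`"new"`, `"beat"`, `"both"` with a reference to an earlier bar), the function name `generate_bars`, and the callables `new_bar`, `new_melody` and `align` are all hypothetical stand-ins; in particular, the callables take the place of the deep learning model and the alignment procedure.

```python
from typing import Callable, List, Optional, Tuple

Beat = List[int]    # hypothetical: onset ticks of the beats within one bar
Melody = List[int]  # hypothetical: one pitch per onset


def generate_bars(
    plan: List[Tuple[str, Optional[int]]],
    new_bar: Callable[[], Tuple[Beat, Melody]],
    new_melody: Callable[[], Melody],
    align: Callable[[Melody, Beat], Melody],
) -> List[Tuple[Beat, Melody]]:
    """Generate bars one by one, recording each result for later reuse."""
    bars: List[Tuple[Beat, Melody]] = []
    for kind, ref in plan:
        if kind == "new":
            # First case: entirely new beat pattern and melody.
            beat, melody = new_bar()
        elif kind == "beat":
            # Second case: reuse a recorded beat pattern, generate a new
            # melody, then align that melody to the reused beat pattern.
            beat = list(bars[ref][0])
            melody = align(new_melody(), beat)
        elif kind == "both":
            # Third case: reuse both the recorded beat pattern and melody.
            beat, melody = list(bars[ref][0]), list(bars[ref][1])
        else:
            raise ValueError(f"unknown plan entry: {kind!r}")
        bars.append((beat, melody))  # record for future use
    return bars
```

Keeping the generators as parameters makes the control flow testable without any trained model.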

In still a further embodiment, the generated melody might not have the same rhythm as the beat pattern previously generated because a beat pattern determines at what time there should be a new note. As a result, the generated melody must be aligned with the beat pattern. For a melody with n notes and a beat pattern requesting m notes, the process of aligning the melody to the beat pattern is as follows:

- (i) If n=m: Aligning is straightforward. The system simply modifies the starting time and duration of each note in the melody to match the requirement of the beat pattern.
- (ii) If n>m: The system selects the least significant note in the melody and removes it. The system repeats this process until n=m, and then uses the methodology in (i) to align the melody to the beat pattern. The significance of the notes in the melody is measured by the following criteria:
  - a. The current chord and key of the music. If the pitch of the note matches the key and chord poorly, the note has low significance. For example:
    - i. In music in the key of C under a C major chord, the note C# will have a low significance since it matches neither the C scale nor the notes constituting the C major chord.
    - ii. In music in the key of C under an E major chord, the note G# will have a high significance while G will have a low significance. This is because G# is essential to the E major chord while G does not fit well in E major.
  - b. Length of the note. Shorter notes have lower significance.
- (iii) If n<m: The system performs the following operations:
  - a. Remove the beat with the shortest duration from the beat pattern. The removed beat is thus merged with one adjacent beat. This will result in m being reduced by 1. If n=m after removal, then use the methodology in (i) to align the melody to the beat pattern. Otherwise, go to step (iii)b.
  - b. Repeat the most significant note in the melody. The significance is defined in the same fashion as in (ii). This operation will result in n being increased by 1. If n=m after the repetition, then use the methodology in (i) to align the melody to the beat pattern. Otherwise, go to step (iii)a.
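
The alignment cases (i) to (iii) above can be sketched as follows. This is a hedged illustration rather than the disclosed implementation: the text names the significance criteria but not how they are combined, so the ranking used here (chord tone over scale tone over other pitches, with duration as tie-breaker) is an assumption, as are the names `significance`, `merge_shortest_slot` and `align`, the tuple layouts, and the choice of which neighbour absorbs a removed beat.

```python
from typing import List, Set, Tuple

Note = Tuple[int, int, int]  # (start_tick, duration, pitch); velocity omitted
Slot = Tuple[int, int]       # (start_tick, duration) requested by the beat pattern


def significance(note: Note, chord: Set[int], scale: Set[int]) -> Tuple[int, int]:
    """Assumed ranking: chord tone > scale tone > other; longer notes rank higher."""
    pitch_class = note[2] % 12
    harmonic = 2 if pitch_class in chord else 1 if pitch_class in scale else 0
    return (harmonic, note[1])


def merge_shortest_slot(slots: List[Slot]) -> List[Slot]:
    """Step (iii)a: remove the shortest beat by merging it with a neighbour."""
    i = min(range(len(slots)), key=lambda k: slots[k][1])
    a = i - 1 if i > 0 else 0  # merge slots a and a+1 (previous one if it exists)
    merged = (slots[a][0], slots[a][1] + slots[a + 1][1])
    return slots[:a] + [merged] + slots[a + 2:]


def align(melody: List[Note], slots: List[Slot],
          chord: Set[int], scale: Set[int]) -> List[Note]:
    """Align a melody of n notes to a beat pattern requesting m notes."""
    if not melody or not slots:
        raise ValueError("melody and beat pattern must be non-empty")
    notes, slots = list(melody), list(slots)
    while len(notes) > len(slots):  # case (ii): drop least significant notes
        i = min(range(len(notes)), key=lambda k: significance(notes[k], chord, scale))
        del notes[i]
    merge_next = True
    while len(notes) < len(slots):  # case (iii): alternate steps a and b
        if merge_next:
            slots = merge_shortest_slot(slots)
        else:
            i = max(range(len(notes)), key=lambda k: significance(notes[k], chord, scale))
            notes.insert(i + 1, notes[i])  # repeat the most significant note
        merge_next = not merge_next
    # case (i): take start and duration from the beat pattern, pitch from the note
    return [(start, dur, note[2]) for (start, dur), note in zip(slots, notes)]
```

With a C major chord (pitch classes 0, 4, 7) in the key of C, a C# passing note is the first to be removed when the melody has more notes than the beat pattern requests, which matches example (ii)a.i.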

Regarding melody mutation handling, repetition is very important to music, but too much repetition can make music sound boring. We therefore apply melody mutation to add variation to the generated music while preserving the strengthened motif introduced by the music sequence. After each segment of music is generated, we apply music mutation to the generated segment. Similar to music sequence, music mutation may include chord mutation, beat mutation and melody mutation. The general mutation process is as follows:

- (i) Input generated melody, beat pattern and chord progression;
- (ii) For each bar of music,
  - a. “Roll a die” to determine whether the chord of this bar should be mutated. If true:
    - i. Change the chord according to manually defined chord mutation rules. The chord mutation rules are based on the key of the music. For example, in the key of C, Dm can be mutated to Bdim.
    - ii. After chord mutation, the melody of this bar will be adjusted to match the new chord. For example, when mutating Em to E, all G notes in the melody need to change to G#.
  - b. For each beat in the beat pattern, “roll a die” to determine whether the beat should be mutated. If true, one of three possible mutations is applied to the beat:
    - i. Shorten/lengthen the beat. The length of the next beat will be adjusted as a result.
    - ii. Merge the beat with the next beat.
    - iii. Split the beat into two beats.
  - c. If the beat pattern is modified, align the melody to the modified beat pattern. The alignment process is described above in the discussion of music sequence handling.
  - d. For each note in the melody, “roll a die” to determine whether the pitch of the note should be mutated. If true, adjust the pitch of the note according to manually defined note mutation rules. The note mutation rules are based on the key of the music and the chord. For example:
    - i. Under the key of C and a C chord, note G4 can be mutated to C5.
    - ii. Under the key of C and an Em chord, note G4 can be mutated to B4.
    - i. Under C key and C chord, note G4 can be mutated to C5.
    - ii. Under C key and Em chord, note G4 can be mutated to B4.
- (iii) Repeat step (ii) until all bars have been covered.
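
The chord and pitch parts of this process, steps (ii)a and (ii)d, can be sketched as below; beat mutation (steps (ii)b and (ii)c) is left out to keep the example short. Everything here is an assumption made for illustration: the rule tables contain only the examples given in the text for the key of C, the probabilities `p_chord` and `p_note` are hypothetical parameters (the text does not say how the die is weighted), and pitches are taken to be MIDI note numbers with C4 = 60.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical rule tables for the key of C, seeded only with the text's examples.
CHORD_RULES: Dict[str, str] = {"Dm": "Bdim", "Em": "E"}
# Pitch-class changes needed after a chord mutation: Em -> E turns G (7) into G# (8).
CHORD_FIXUPS: Dict[Tuple[str, str], Dict[int, int]] = {("Em", "E"): {7: 8}}
# Note rules keyed by (chord, pitch): G4 -> C5 under C, G4 -> B4 under Em.
NOTE_RULES: Dict[Tuple[str, int], int] = {("C", 67): 72, ("Em", 67): 71}


def mutate_bar(chord: str, pitches: List[int], rng: random.Random,
               p_chord: float, p_note: float) -> Tuple[str, List[int]]:
    """Apply chord mutation, then per-note pitch mutation, to one bar."""
    if rng.random() < p_chord and chord in CHORD_RULES:
        new_chord = CHORD_RULES[chord]
        fix = CHORD_FIXUPS.get((chord, new_chord), {})
        # Adjust the melody so that it matches the new chord.
        pitches = [p - p % 12 + fix[p % 12] if p % 12 in fix else p
                   for p in pitches]
        chord = new_chord
    mutated = []
    for p in pitches:
        if rng.random() < p_note and (chord, p) in NOTE_RULES:
            p = NOTE_RULES[(chord, p)]
        mutated.append(p)
    return chord, mutated


def mutate_segment(bars: List[Tuple[str, List[int]]], rng: random.Random,
                   p_chord: float = 0.1, p_note: float = 0.1
                   ) -> List[Tuple[str, List[int]]]:
    """Step (iii): repeat the per-bar mutation until all bars are covered."""
    return [mutate_bar(chord, pitches, rng, p_chord, p_note)
            for chord, pitches in bars]
```

Passing a seeded `random.Random` keeps a mutation run reproducible, which is convenient when a user asks to regenerate only selected segments.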

In the step of extracting main melody from MIDI (203), the deep learning system (200) is configured to select the one track that is most likely to be the main melody of the music to be generated. However, it is also possible for the deep learning system (200) to extract more than one main melody from a MIDI file. The data representation of extracting main melody from MIDI (203) (Equation 3) is shown below:

$$M = \{ n_{M,1}, \ldots, n_{M,|M|} \}$$
$$n_{M,i} = (t_{M,i},\, d_{M,i},\, h_{M,i},\, v_{M,i})$$

- $n_{M,i}$: ith note of melody
- $t_{M,i}$: Starting tick of ith note of melody
- $d_{M,i}$: Duration (ticks) of ith note of melody
- $h_{M,i}$: Pitch of ith note of melody
- $v_{M,i}$: Intensity (velocity) of ith note of melody

Notes in the main melody do not overlap.

Equation 3. Data Representation of Extracting Main Melody

In the step of extracting chord progression from MIDI (204), a chord progression is generated in the data representation (Equation 4) shown below:

$$C = \{ (t_{C,1}, c_1), \ldots, (t_{C,|C|}, c_{|C|}) \}$$

- $t_{C,i}$: Starting tick of the ith chord.
- $c_i$: Shape of ith chord.

Equation 4. Data Representations of Extracting Chord

In the step of extracting beat pattern from MIDI (205), the deep learning system (200) is configured to use heuristic data representations to extract the beat pattern for each bar, and the beat pattern is generated in the data representation (Equation 5) shown below:

$$E = E_1 \cup \cdots \cup E_{|B|}$$
$$E_i = \{ (t_{E_i,1}, e_{E_i,1}), \ldots, (t_{E_i,|E_i|}, e_{E_i,|E_i|}) \}$$

- $E_i$: Beat for ith bar.
- $t_{E_i,j}$: Tick of the jth beat in ith bar
- $e_{E_i,j}$: Type of jth beat in ith bar.

$$E_i \cap E_j = \emptyset,\ \forall i \ne j$$

Equation 5. Data Representations of Extracting Beat

Moreover, in one embodiment, the chord progression of the generated music is configured to be adjusted according to the generated beat pattern. The deep learning system (200) is adapted to assume a chord change can only happen at a downbeat. The deep learning system (200) is adapted to detect whether there is a chord change for each downbeat and identify which chord is changed when detecting a chord change so as to generate the adjusted chord progression.
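
The downbeat assumption can be illustrated with a short sketch. It is not the disclosed model: `adjust_chords` is a hypothetical name, beats are assumed to be (tick, type) pairs with type `"down"` or `"up"`, and the callable `detect_chord` stands in for whatever the deep learning system (200) uses to identify the chord sounding at a given downbeat.

```python
from typing import Callable, List, Optional, Tuple


def adjust_chords(
    beats: List[Tuple[int, str]],
    detect_chord: Callable[[int], str],
) -> List[Tuple[int, str]]:
    """Build a chord progression in which chords change only at downbeats."""
    progression: List[Tuple[int, str]] = []
    current: Optional[str] = None
    for tick, kind in sorted(beats):
        if kind != "down":
            continue  # chord changes are only considered at downbeats
        chord = detect_chord(tick)
        if chord != current:  # a chord change was detected at this downbeat
            progression.append((tick, chord))
            current = chord
    return progression
```

The result has the same shape as the chord representation of Equation 4: a list of (starting tick, chord) pairs.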

In the step of extracting music progression from MIDI (206), a music progression is extracted from the MIDI, and its data representation (Equation 6) is shown below:

$$\mathcal{P} = \{ (P_1, l_1), \ldots, (P_{|\mathcal{P}|}, l_{|\mathcal{P}|}) \}$$
$$P_i = \{ b_{P_i,1}, \ldots, b_{P_i,|P_i|} \}$$

- $P_i$: ith part of the song. Each part contains a list of bars $b_{P_i,j} \in B$. $P_i$ and $P_j$ do not overlap.
- $l_i$: Label of ith part of the song (verse, chorus, etc.)

Equation 6. Data Representations of Extracting Music Progression

Moreover, after the extracting processes, the deep learning system (200) is configured to train itself on the extracted data, developing a deep learning model within the system (200).

Therefore, in the step of generating a first segment of a full music (130), the main melody, the chord progression, and the beat of the first segment of the full music are respectively generated through the deep learning system (200) in the following data representations (Equations 7, 8 and 9), wherein the first segment of the full music is defined as Part x:

$$M_x = \{ n_{M_x,1}, \ldots, n_{M_x,|M_x|} \}$$
$$n_{M_x,j} = (t_{M_x,j},\, d_{M_x,j},\, h_{M_x,j},\, v_{M_x,j})$$

- $n_{M_x,j}$: jth note of melody
- $t_{M_x,j}$: Starting tick of jth note of melody
- $d_{M_x,j}$: Duration (ticks) of jth note of melody
- $h_{M_x,j}$: Pitch of jth note of melody
- $v_{M_x,j}$: Velocity of jth note of melody

Notes in the main melody do not overlap.

$$M_0 \subseteq M_x$$
$$n_{M_x,i} = n_{M_0,i},\ \forall i \le |M_0|$$

Equation 7. Data Representations of Extracting Main Melody for Part x

$$C_x = \{ (t_{C_x,1}, c_{C_x,1}), \ldots, (t_{C_x,|C_x|}, c_{C_x,|C_x|}) \}$$

- $t_{C_x,i}$: Starting tick of the ith chord.
- $c_{C_x,i}$: Shape of ith chord.

$$C_0 \subseteq C_x$$
$$(t_{C_x,i}, c_{C_x,i}) = (t_{C_0,i}, c_{C_0,i}),\ \forall i \le |C_0|$$

Equation 8. Data Representations of Extracting Chord Progression for Part x
  
$$E_x = E_{x,1} \cup \cdots \cup E_{x,|P_x|}$$
$$E_{x,i} = \{ (t_{E_{x,i},1}, e_{E_{x,i},1}), \ldots, (t_{E_{x,i},|E_{x,i}|}, e_{E_{x,i},|E_{x,i}|}) \}$$

- $E_{x,i}$: Beat for ith bar.
- $t_{E_{x,i},j}$: Tick of the jth beat in ith bar
- $e_{E_{x,i},j}$: Type (up or down) of the jth beat in ith bar.

$$E_0 \subseteq E_x$$
$$E_{x,i} = E_{0,i},\ \forall i \le |B_0|$$
$$E_{x,i} \cap E_{x,j} = \emptyset,\ \forall i \ne j$$

Equation 9. Data Representations of Extracting Beat for Part x

On the other hand, in the step of generating segments other than the first segment to complete the full music (140), the main melody, the chord progression, and the beat of segments other than the first segment are respectively generated through the deep learning system (200) in the following data representations (Equations 10, 11 and 12):

$$M' = M'_1 \cup \cdots \cup M'_{|\mathcal{P}|}$$
$$M'_i = \{ n_{M'_i,1}, \ldots, n_{M'_i,|M'_i|} \}$$
$$n_{M'_i,j} = (t_{M'_i,j},\, d_{M'_i,j},\, h_{M'_i,j},\, v_{M'_i,j})$$

- $M'_i$: Melody of ith part of the song
- $n_{M'_i,j}$: jth note of melody $M'_i$

Notes in the main melody do not overlap.

$$M'_i \cap M'_j = \emptyset,\ \forall i \ne j$$

Equation 10. Data Representations of Initial Melody for Full Music

$$C_x = \{ (t_{C_x,1}, c_{C_x,1}), \ldots, (t_{C_x,|C_x|}, c_{C_x,|C_x|}) \}$$

- $t_{C_x,i}$: Starting tick of the ith chord.
- $c_{C_x,i}$: Shape of ith chord.

$$C_0 \subseteq C_x$$
$$(t_{C_x,i}, c_{C_x,i}) = (t_{C_0,i}, c_{C_0,i}),\ \forall i \le |C_0|$$

Equation 11. Data Representations of Initial Chord Progression for Full Music
  
$$E_x = E_{x,1} \cup \cdots \cup E_{x,|P_x|}$$
$$E_{x,i} = \{ (t_{E_{x,i},1}, e_{E_{x,i},1}), \ldots, (t_{E_{x,i},|E_{x,i}|}, e_{E_{x,i},|E_{x,i}|}) \}$$

- $E_{x,i}$: Beat for ith bar.
- $t_{E_{x,i},j}$: Tick of the jth beat in ith bar
- $e_{E_{x,i},j}$: Type (up or down) of the jth beat in ith bar.

$$E_0 \subseteq E_x$$
$$E_{x,i} = E_{0,i},\ \forall i \le |B_0|$$
$$E_{x,i} \cap E_{x,j} = \emptyset,\ \forall i \ne j$$

Equation 12. Data Representations of Initial Beat for Full Music

The step of generating connecting notes, chords and beats of the segments of the full music and handling anacrusis (150) is performed after the full music, including melody, chord progression and beat pattern, has been generated by the deep learning system (200). In this step, a music generating system of the present invention having a music theory database is configured to generate connecting notes, chords, and beats between two connected segments and to handle anacrusis, for example by generating unstressed notes before the first bar of a segment. The music theory may include an anacrusis handler and a connection handler as shown in FIG. 6, and the data representations of generating melody, chord progression, and beat for the full music (Equations 13, 14 and 15) are respectively shown below:

$$M' = M'_1 \cup \cdots \cup M'_{|\mathcal{P}|}$$
$$M'_i = \{ n_{M'_i,1}, \ldots, n_{M'_i,|M'_i|} \}$$
$$n_{M'_i,j} = (t_{M'_i,j},\, d_{M'_i,j},\, h_{M'_i,j},\, v_{M'_i,j})$$

- $M'_i$: Melody of ith part of the song
- $n_{M'_i,j}$: jth note of melody $M'_i$

Notes in the main melody do not overlap.

$$M'_i \cap M'_j = \emptyset,\ \forall i \ne j$$

Equation 13. Data Representations of Generating Melody for Full Music

$$C_x = \{ (t_{C_x,1}, c_{C_x,1}), \ldots, (t_{C_x,|C_x|}, c_{C_x,|C_x|}) \}$$

- $t_{C_x,i}$: Starting tick of the ith chord.
- $c_{C_x,i}$: Shape of ith chord.

$$C_0 \subseteq C_x$$
$$(t_{C_x,i}, c_{C_x,i}) = (t_{C_0,i}, c_{C_0,i}),\ \forall i \le |C_0|$$

Equation 14. Data Representations of Generating Chord Progression for Full Music
  
$$E = E_1 \cup \cdots \cup E_{|\mathcal{P}|}$$
$$E_i = E_{i,1} \cup \cdots \cup E_{i,|P_i|}$$
$$E_{i,j} = \{ (t_{E_{i,j},1}, e_{E_{i,j},1}), \ldots, (t_{E_{i,j},|E_{i,j}|}, e_{E_{i,j},|E_{i,j}|}) \}$$

- $E_i$: Beat for ith part of the song.
- $E_{i,j}$: Beat for jth bar in part $P_i$.
- $t_{E_{i,j},k}$: Tick of the kth beat in jth bar in $P_i$
- $e_{E_{i,j},k}$: Type (up or down) of the kth beat in jth bar in $P_i$

$$E_i \cap E_k = \emptyset,\ \forall i \ne k$$
$$E_{i,j} \cap E_{i,k} = \emptyset,\ \forall j \ne k$$

Equation 15. Data Representations of Generating Beat for Full Music

As shown in FIG. 7, the step of generating instrument accompaniment for the full music (160) is performed after the connecting notes, chords and beats have been generated and anacrusis has been handled for the full music. The data representation of generating instrument accompaniment for the full music (Equation 16) is shown below:

$$R = \{ (R_1, I_1), (R_2, I_2), \ldots, (R_{|R|}, I_{|R|}) \}$$
$$R_i = \{ (t_{R_i,1}, d_{R_i,1}, n_{R_i,1}), \ldots, (t_{R_i,|R_i|}, d_{R_i,|R_i|}, n_{R_i,|R_i|}) \}$$

- $R$: Set of tracks
- $R_i$: ith track
- $I_i$: Instrument of ith track
- $t_{R_i,j}$: Starting tick of jth note of the ith track
- $d_{R_i,j}$: Duration (ticks) of jth note of the ith track
- $n_{R_i,j}$: Pitch of jth note of the ith track

$$R_1 = M$$

Equation 16. Data Representations of Generating Instrument Accompaniment for Full Music

Furthermore, since the generated music or segments of the full music are sometimes not perfectly aligned with their bars, the music generating system of the present invention enables a user to modify the generated main melody through the deep learning system (200). After a segment, several segments or the full music has been generated, the user has options such as (i) stopping at that point; (ii) letting the deep learning system (200) regenerate selected segments; and (iii) letting the deep learning system (200) regenerate the full music. Moreover, the music generating system of the present invention is configured to save the input sound for future use, or for generating a different music by mixing different saved input sounds through the deep learning system (200).

In another embodiment, referring to FIG. 3, the system of the present invention is configured to accept different inputs at the same time, such as user humming (1101) and metadata (1102), wherein the metadata includes genre and the user's mood. The main methodology of generating a first segment of a full music (130) and generating segments other than the first segment to complete the full music (140) is the same as in the embodiment described above. The steps include receiving any length of input (110); recognizing pitches and rhythm of the input (120); generating music progression from metadata (170); generating a first segment of a full music (130); generating segments other than the first segment to complete the full music (140); generating connecting notes, chords and beats between two segments of the full music and handling anacrusis (150); and generating instrument accompaniment for the full music (160). The data representations, except for that of generating music progression from metadata, are the same as described above, and the data representation of generating music progression from metadata (Equation 17) is shown below:

$$\mathcal{P} = \{ (P_1, l_1), \ldots, (P_{|\mathcal{P}|}, l_{|\mathcal{P}|}) \}$$
$$P_i = \{ b_{P_i,1}, \ldots, b_{P_i,|P_i|} \}$$
$$x \in [1, |\mathcal{P}|]$$

- $P_i$: ith part of the song. Each part contains a list of bars $b_{P_i,j} \in B$. $P_i$ and $P_j$ do not overlap.
- $x$: The part to which the initial melody belongs
- $l_i$: Label of ith part of the song (verse, chorus, etc.)

Some songs are not perfectly aligned with bars. Need some way to represent.

Equation 17. Data Representations of Generating Music Progression From Metadata

In addition, the music generating system of the present invention comprises the deep learning system (200) and means for receiving any length of input (110); recognizing pitches and rhythm of the input (120); generating a first segment of a full music (130); generating segments other than the first segment to complete the full music (140); generating connecting notes, chords and beats of the segments of the full music and handling anacrusis (150); generating instrument accompaniment for the full music (160); and generating music progression from metadata (170).

Having described the invention by the description and illustrations above, it should be understood that these are exemplary of the invention and are not to be considered as limiting. Accordingly, the invention is not to be considered as limited by the foregoing description, but includes any equivalents.

