Programs have been developed that can generate music based on lyrics inputted by a user. However, the music that is generated by such programs often lacks musical qualities that many people appreciate, and thus isn't very song-like. For example, auto-generated music from such programs can suffer from misalignments between lyrics and melody notes, scattered or disjointed organization and song structure, mismatched rhythm tracks, and lack of a catchy repeating melody. As a result, such programs have not achieved widespread use, and a barrier presently exists to rapid song development using such programs.
In view of the above, a music generation system is provided comprising a processor and a memory operatively coupled to the processor and storing a rhythm template database comprising a plurality of rhythm templates, and a music generation program stored in the memory and executed by the processor to be configured to receive a user input of lyrics, identify a plurality of syllables in the lyrics, determine a syllable pattern in the identified plurality of syllables, match the syllable pattern to a selected rhythm template of the plurality of rhythm templates, generate a melody based on the selected rhythm template, generate a music file encoding the melody and the lyrics, and output the music file encoding the melody and the lyrics.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.
In view of the above issues, systems and methods are provided to generate music based on lyrics inputted by a user. Referring to
A bus 20 may operatively couple the processor 14, the input/output module 18, and the volatile memory 16 to the non-volatile memory 24. Although the song configuration file 28, song structure settings 38, score template 46, the score template generator 60, and the music generator 62 are depicted as hosted (i.e., executed) at one computing device 12, it will be appreciated that the song configuration file 28, song structure settings 38, score template 46, score template generator 60, and music generator 62 can alternatively be hosted across a plurality of computing devices to which the computing device 12 is communicatively coupled via a network 22.
As one example of one such other computing device, a client computing device 64 may be provided, which is operatively coupled to the computing device 12. In some examples, the network 22 can take the form of a local area network (LAN), wide area network (WAN), wired network, wireless network, personal area network, or a combination thereof, and can include the Internet.
The computing device 12 comprises a processor 14 and a non-volatile memory 24 configured to store the song configuration file 28, song structure settings 38, score template 46, score template generator 60, and music generator 62. Non-volatile memory 24 is memory that retains instructions and stored data even in the absence of externally applied power, such as FLASH memory, a hard disk, read only memory (ROM), electrically erasable programmable memory (EEPROM), etc. The instructions include one or more programs, including the music generation program 26 comprising the score template generator 60 and the music generator 62, and data used by such programs sufficient to perform the operations described herein. In response to execution by the processor 14, the instructions cause the processor 14 to execute the music generation program 26.
The processor 14 is a microprocessor that includes one or more of a central processing unit (CPU), a graphical processing unit (GPU), an application specific integrated circuit (ASIC), a system-on-chip (SOC), a field-programmable gate array (FPGA), a logic circuit, or other suitable type of microprocessor configured to perform the functions recited herein. The system 10 further includes volatile memory 16 such as random access memory (RAM), static random access memory (SRAM), dynamic random access memory (DRAM), etc., which temporarily stores data only for so long as power is applied during execution of programs.
Referring to
Referring back to
The user input 82 of lyrics may comprise song structure settings 38 or lyrics settings 42. Referring to
The order of variations may determine the actual order in which the variations appear in the final song. The number of variations may match the number of appearances of the song section in the song structure section 40.
The instrumental section 44 specifies the instrumental music to accompany each song section within the song. In this example, the instrumental music is identified by a number representing a unique piece of instrumental music. The song structure section 40 specifies the actual structure of the song. In this example, the song structure section 40 comprises a sequence starting with a verse, followed up by a chorus, another verse, another chorus, a bridge, and ending with a third chorus.
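As an illustration only, the song structure settings described above could be held in a structure like the following. This is a minimal sketch: the dictionary keys, the instrumental numbers, the placeholder lyric strings, and the helper `variation_counts_match` are all assumptions made for this example and are not taken from the disclosure.

```python
from collections import Counter

# Hypothetical layout of the song structure settings; every key name and
# value below is an assumption made for illustration.
song_structure_settings = {
    # Song structure section: verse, chorus, verse, chorus, bridge, chorus.
    "song_structure": ["verse", "chorus", "verse", "chorus", "bridge", "chorus"],
    # Instrumental section: a number identifying a unique piece of
    # instrumental music for each song section.
    "instrumental": {"verse": 3, "chorus": 7, "bridge": 5},
    # Lyrics settings: one lyric variation per appearance of a section,
    # listed in the order in which they should appear in the song.
    "lyrics": {
        "verse": ["first verse lyrics", "second verse lyrics"],
        "chorus": ["chorus lyrics", "chorus lyrics", "chorus lyrics"],
        "bridge": ["bridge lyrics"],
    },
}


def variation_counts_match(settings):
    """Check that each section has as many variations as appearances."""
    appearances = Counter(settings["song_structure"])
    return all(
        len(settings["lyrics"].get(section, [])) == count
        for section, count in appearances.items()
    )
```

Here the chorus appears three times in the song structure, so three chorus variations are listed; the check returns False if a section has too few or too many variations.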
Referring back to
Referring to
For example, when Mandarin Chinese lyrics are processed by the lyrics parser 60a, Pinyin may be used as the phoneme format to obtain the phonemes. The phonemes may be grouped by syllables by identifying compound words in the lyrics. For example, in Mandarin Chinese lyrics, the lyrics “Jīntiān tiānqi zhēn hǎo” are parsed into syllable groups “Jīntiān”, “tiānqi”, and “zhēn hǎo”, so that there are three identified syllable groups, each having two syllables. In English lyrics, the lyrics “How are you today” are parsed into four syllable groups “How”, “are”, “you”, “today”, so that there are three identified syllable groups, each having one syllable, and one identified syllable group which has two syllables.
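The two parsing examples above can be expressed as a syllable pattern, that is, the number of syllables in each syllable group. The sketch below assumes the lyrics have already been split into syllables by some upstream parser; the function name `syllable_pattern` and the nested-list representation are hypothetical.

```python
def syllable_pattern(groups):
    """Return the number of syllables in each syllable group."""
    return [len(group) for group in groups]


# Each inner list is one syllable group, already split into syllables
# (the splitting itself is assumed to be done elsewhere).
mandarin = [["Jīn", "tiān"], ["tiān", "qi"], ["zhēn", "hǎo"]]
english = [["How"], ["are"], ["you"], ["to", "day"]]

print(syllable_pattern(mandarin))  # three groups of two syllables
print(syllable_pattern(english))   # three one-syllable groups, one two-syllable group
```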
Responsive to parsing the lyrics, the rhythm template selector 60b is configured to assign a rhythm template to each song section of the lyrics, and the chord progression template selector 60c is configured to assign a chord progression template to each song section of the lyrics. Based on how the rhythm template selector 60b rates the rhythm template, the rhythm template selector 60b selects a rhythm template for each song section. Likewise, based on how the chord progression template selector 60c rates the chord progression template, the chord progression template selector 60c selects a chord progression template for each song section.
In the example of
The rhythm template selector 60b may evaluate a first condition to ensure that the number of notes in the rhythm template is equal to or larger than the number of syllables in the syllable pattern of the song section. The rhythm template selector 60b may evaluate a second condition to ensure that the minimum number of syllables that the rhythm template supports is equal to or smaller than the number of syllables in the song section. The rhythm template selector 60b may evaluate a third condition to ensure that there are no breaks inside a multi-syllabic English word (“hello”, for example) or a compound Chinese word (“nǐ hǎo”, for example) when the rhythm template is matched to the song section. The rhythm template selector 60b may evaluate a fourth condition to determine how many words in the song section need to be deleted or added (for example, adding one word or deleting two words) to match the rhythm template to the song section.
For each song section, the rhythm template selector 60b of the music generation program 26 may rate the rhythm templates 34a-e based on how the first condition, second condition, the third condition, and/or the fourth condition are met by the rhythm template for the song section to determine a rating reflecting a degree of matching between the syllable pattern and each of the plurality of rhythm templates 34a-e. The rhythm template selector 60b selects a rhythm template 34a-e with the best rating for each song section.
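One possible way to turn these conditions into a rating is sketched below. It assumes that a template fits when it has at least as many notes as there are syllables and when its minimum supported syllable count does not exceed the syllable count, that shortfalls are counted as syllables to delete or add, and that a lower penalty means a better match. The class `RhythmTemplate`, its fields, the function names, and the penalty weight are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RhythmTemplate:
    name: str
    num_notes: int
    min_syllables: int
    # Assumed representation: k is in breaks_after if the rhythm has a
    # break immediately after the k-th syllable (1-based).
    breaks_after: set = field(default_factory=set)


def penalty(template, group_sizes, break_weight=10.0):
    """Lower is better. group_sizes lists the syllables in each group."""
    total = sum(group_sizes)
    score = 0.0
    # Conditions 1 and 4: syllables that would have to be deleted.
    score += max(0, total - template.num_notes)
    # Conditions 2 and 4: syllables that would have to be added.
    score += max(0, template.min_syllables - total)
    # Condition 3: penalize breaks that fall inside a syllable group.
    position = 0
    for size in group_sizes:
        inner_boundaries = range(position + 1, position + size)
        score += break_weight * sum(
            1 for b in inner_boundaries if b in template.breaks_after
        )
        position += size
    return score


def select_template(templates, group_sizes):
    """Pick the template with the lowest penalty for this song section."""
    return min(templates, key=lambda t: penalty(t, group_sizes))
```

For a section with three two-syllable groups, a template whose breaks fall after the second and fourth syllables splits no group, while a template with a break after the third syllable splits the second group and is penalized.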
The rhythm template selector 60b may use a lyrics alignment template 54 of the score template 46 to map a syllable sequence in the lyrics to a melodic and monophonic note sequence. The lyrics alignment template 54 supports the mapping of a certain range of syllables (10 to 15 syllables, for example). The lyrics alignment template 54 aligns syllables of the lyrics to a melodic and monophonic note sequence in the corresponding rhythm template 34. The lyrics alignment template 54 may be initialized manually, or generated automatically. The lyrics alignment template 54 may be provided in a format indicating how to map syllables to the notes. In the example of
Referring to
Referring to the example of
It will be appreciated that the numbers of post-processing methods in the melody post-processing template 48, syllable patterns 56, song sections 50a-h, lyric alignment templates 54, repetition types in the repetition structures list 52 are not necessarily limited to the numbers described with reference to
Referring to
The populated score template 46 is then inputted into the music generator 62 to generate a music file 84 encoding the melody and the lyrics, applying the melody post-processing methods selected by the melody processing selector 60e. The melody post-processing methods may be performed by a pitch generator, which may be a conditional multimodal variational autoencoder (MVAE) model, for example.
For each song section, the music generator 62 may select and stitch all the syllable patterns 56 in the score template 46 to generate a melody score with lyrics, a MIDI file with chord progressions, or an audio file, for example. The length of the music file 84 may depend on the pattern lengths and the number of unique patterns for each song section. For example, if a song structure includes an ‘ABAB’ repetition type, the stitched pattern ‘AB’ instead of ‘ABAB’ would be processed by the music generator 62 to generate pitches. To deal with anacrusis in generating a MIDI file, the music generator 62 may pad the first chord to an extra bar at the beginning of the song, and not merge any pickup bars. Following pitch generation, the generated music file 84 may be chopped into 4-bar or 5-bar pieces, for example.
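The ‘ABAB’ example above can be sketched as follows: only the unique patterns are passed on for pitch generation, and the repetition is re-applied afterwards. The function names and the dictionary of generated pieces are assumptions made for illustration.

```python
def unique_patterns(repetition_type):
    """Collapse a repetition type to its unique labels, in first-seen order."""
    return "".join(dict.fromkeys(repetition_type))


def apply_repetition(repetition_type, generated):
    """Expand generated pieces (keyed by label) back to the full sequence."""
    return [generated[label] for label in repetition_type]


to_generate = unique_patterns("ABAB")          # only 'A' and 'B' need pitches
pieces = {"A": "melody for A", "B": "melody for B"}  # placeholder results
full_section = apply_repetition("ABAB", pieces)
```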
The outputted music file 84 may be in a format which carries information about the melody, rhythm pattern, etc. of the song. One example is the MIDI format which carries musical information about the pitch, start timing, stop timing, loudness (attack velocity), etc. The MIDI data can be multi-track, and each track can have a musical instrument type associated with it, such as piano, bass guitar, strings, and drums. In this way, the melody can be encoded in one track of a multi-track MIDI file, the rhythm can be encoded in another track of the MIDI file, etc. The MIDI file may be played through a playback program that assigns synthesized and/or sampled electronic instruments to play back each track, thereby generating an audio file of the song. In one example, the MIDI file may have a General MIDI format, so that like-sounding instruments are assigned to predetermined MIDI instrument codes.
The method 100 may start at step 102, when user input of lyrics for one song section is received. After step 102, a song structure is selected at step 104 for the one song section that is specified by the user input of lyrics. After step 104, the music generation program generates the rest of the song sections and selects the song structure for the rest of the song sections at step 106.
Alternatively, the method 100 may start at step 106, when a user or a machine inputs lyrics for all song sections. At step 108, song structures are selected for all song sections in accordance with the inputted lyrics, and the lyrics are parsed to obtain phonemes, which are grouped by syllables.
The method 100 continues to step 110, at which a rhythm template and a chord progression template are matched for each song section. At step 112, a song structure with rhythm repetitions is generated. At step 114, melody post-processing methods are selected. At step 116, inputting the populated score template into the music generator, pitch generation is performed based on the rhythms and chord progressions specified in the score template. At step 118, rhythm repetition is applied in accordance with the score template. At step 120, the melody post-processing methods selected in the score template are performed.
At step 122, a chord score is generated, and at step 124, accompaniment is generated. At the same time as steps 122 and 124, at step 126, the melody score is generated. At step 128, the melody score is aligned with the lyrics. At step 130, a melody score with the lyrics is generated. At step 132, a singing voice is generated. At step 134, a mix-down audio is generated based on the generated chord score and melody score.
The method 200 comprises three main steps: step 202 in which all rhythms which meet the requirements for each lyric phrase are rated, step 204 in which all possible rhythm sequences for each song section are searched for and rated, and step 206 in which inter-section rhythm rating is performed.
Step 202, in which all rhythms which meet the requirements for each lyric phrase are rated, may include the following steps, and may be performed by the rhythm template selector. At step 202a, phonemes of the lyrics are aligned to the rhythm of the rhythm template by the lyrics alignment template. At step 202b, for each lyric group, a left gap (the onset gap between the onset of the last lyric in the previous lyric group and the onset of the first lyric in the current lyric group), a right gap (the duration of the last lyric), and all onset gaps in the current lyric group are calculated.
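The gap calculation of step 202b can be sketched as below, assuming note onsets and durations are expressed in beats. The function name `group_gaps` and its parameters are hypothetical.

```python
def group_gaps(prev_last_onset, onsets, last_duration):
    """Compute the gaps for one lyric group.

    prev_last_onset: onset of the last lyric in the previous lyric group
    onsets:          onsets of the lyrics in the current group, in order
    last_duration:   duration of the last lyric in the current group
    """
    # Left gap: from the last onset of the previous group to the first
    # onset of the current group.
    left_gap = onsets[0] - prev_last_onset
    # Onset gaps between consecutive lyrics inside the current group.
    inner_gaps = [later - earlier for earlier, later in zip(onsets, onsets[1:])]
    # Right gap: the duration of the last lyric.
    right_gap = last_duration
    return left_gap, inner_gaps, right_gap
```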
Referring to
Referring to
Returning to
At step 202d, for each onset gap which is greater than the minimum, the rating is increased by onset gap/minimum gap. In pseudocode, this rating increase would be represented by the equation: rating=rating+g/m, where g is each onset gap and m is the minimum gap. At step 202e, for any rest, or a music bar with no notes, in each onset gap, the rating is increased by the product of the rest duration and the rest weight. In pseudocode, this rating increase would be represented by the equation: rating=rating+rest duration*rest weight.
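The two rating updates of steps 202d and 202e can be written as a short function. This is a sketch only: the function name, the parameter names, and the choice to pass rests as a list of durations are assumptions.

```python
def rate_rhythm(onset_gaps, min_gap, rest_durations=(), rest_weight=1.0, rating=0.0):
    """Accumulate the rating increases of steps 202d and 202e."""
    # Step 202d: rating = rating + g / m for each onset gap g above the minimum m.
    for gap in onset_gaps:
        if gap > min_gap:
            rating += gap / min_gap
    # Step 202e: rating = rating + rest duration * rest weight for each rest.
    for duration in rest_durations:
        rating += duration * rest_weight
    return rating
```

With onset gaps of 0.5, 1.0, and 2.0 beats, a minimum gap of 0.5, and one rest of 1.0 beat weighted at 0.5, the gaps contribute 2 and 4 and the rest contributes 0.5, giving a rating of 6.5.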
Step 204, in which all possible rhythm sequences for each song section are searched for and rated, may include the following steps, and may be performed by the repetition constructor. At step 204a, the types of rhythm repetitions which are allowed are defined. For example, for two lyric phrases, AA and AB rhythm repetitions may be allowed. For four lyric phrases, AAAA, AABA, ABAB, AABB, ABCB, ABAC, AABC, and ABCC rhythm repetitions may be allowed. At step 204b, for each repetition type, the exact rhythm sequences that match the repetition type are searched for. For example, for rhythm repetition ABAB, rhythm sequences [rhythm 1, rhythm 3, rhythm 1, rhythm 3] and [rhythm 1, rhythm 4, rhythm 1, rhythm 4] may be selected.
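The search in step 204b can be sketched as an enumeration over assignments of rhythms to repetition labels. The sketch assumes that step 202 has produced, for each lyric phrase, a set of rhythm identifiers that meet the requirements, and that different labels must map to different rhythms; the function name and data layout are hypothetical.

```python
from itertools import permutations


def sequences_for(repetition_type, candidates):
    """List rhythm sequences that match a repetition type.

    repetition_type: a string such as 'ABAB'
    candidates:      one set of allowed rhythm ids per lyric phrase
    """
    labels = list(dict.fromkeys(repetition_type))  # unique labels, in order
    pool = sorted(set().union(*candidates))
    matches = []
    # permutations() yields distinct rhythms, so different labels never
    # share a rhythm.
    for combo in permutations(pool, len(labels)):
        assignment = dict(zip(labels, combo))
        sequence = [assignment[label] for label in repetition_type]
        if all(r in allowed for r, allowed in zip(sequence, candidates)):
            matches.append(sequence)
    return matches


# Rhythm 1 fits phrases 1 and 3; rhythms 3 and 4 fit phrases 2 and 4.
found = sequences_for("ABAB", [{1}, {3, 4}, {1}, {3, 4}])
```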
At step 204c, all rhythm sequences which are not ‘stitchable’, or not appropriate to link together in rhythm sequences, are filtered out. For example, if the next rhythm template has anacrusis (pickup notes) and the current rhythm template does not have enough space at the end to accommodate the anacrusis, the two rhythms are not ‘stitchable’. At step 204d, a total rating is given for each ‘stitchable’ rhythm sequence by adding the rating of each rhythm template and multiplying the sum by a weight, which is the number of unique rhythms in this sequence. For example, for the rhythm sequence [rhythm 1, rhythm 4, rhythm 1, rhythm 4], which contains two unique rhythms, the total rating will be (0.1+0.2+0.3+0.6)*2=2.4. Because the sequence with the lowest rating is selected, the weighting mechanism makes the method 200 more likely to select more repetitive rhythm sequences due to their lower weight. At step 204e, the ‘stitchable’ rhythm sequence with the lowest rating is selected as the rhythm sequence to use for the song section.
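Steps 204c through 204e can be sketched as follows. The sketch assumes the weight is the number of distinct rhythms in the sequence, which is consistent with the worked value of 2.4 and with more repetitive sequences receiving lower totals. The `Rhythm` class, its `pickup` and `tail_space` fields (in beats), and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rhythm:
    name: str
    pickup: float      # length of the anacrusis in beats, 0 if none
    tail_space: float  # free beats at the end of the rhythm


def stitchable(sequence):
    """Step 204c: each pickup must fit in the space left by the previous rhythm."""
    return all(
        nxt.pickup <= cur.tail_space for cur, nxt in zip(sequence, sequence[1:])
    )


def total_rating(sequence, ratings):
    """Step 204d: sum of per-phrase ratings times the number of distinct rhythms."""
    weight = len({rhythm.name for rhythm in sequence})
    return sum(ratings) * weight


def select_sequence(candidates):
    """Step 204e: lowest-rated stitchable sequence.

    candidates is a list of (sequence, ratings) pairs. Raises ValueError
    if no candidate is stitchable.
    """
    usable = [(seq, r) for seq, r in candidates if stitchable(seq)]
    return min(usable, key=lambda pair: total_rating(*pair))


r1 = Rhythm("rhythm 1", pickup=0.0, tail_space=1.0)
r4 = Rhythm("rhythm 4", pickup=0.5, tail_space=1.0)
example = total_rating([r1, r4, r1, r4], [0.1, 0.2, 0.3, 0.6])  # about 2.4
```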
In step 206, in which the inter-section rhythm rating is performed, the ‘stitchability’ between the rhythm sequences that were selected in step 204 is evaluated. Step 206 may include the following steps, which may be performed by the repetition constructor. At step 206a, all of the rhythm sequences that are not ‘stitchable’ between song sections are filtered out. At step 206b, for all the ‘stitchable’ rhythm sequences, total ratings are given by summing up all the ratings in the rhythm sequence. At step 206c, the ‘stitchable’ rhythm sequence with the lowest rating is selected as the rhythm sequence between song sections.
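A minimal sketch of steps 206a through 206c is given below. It assumes each song section contributes one or more candidate rhythm sequences, each summarized by its rating, the pickup length of its first rhythm, and the free space at the end of its last rhythm; the dictionary keys and the function name are hypothetical.

```python
from itertools import product


def pick_across_sections(section_candidates):
    """Return (total_rating, combination) with the lowest total, or None.

    section_candidates: one list of candidate dicts per song section, each
    with the keys 'rating', 'pickup', and 'tail_space' (assumed names).
    """
    best = None
    for combo in product(*section_candidates):
        # Step 206a: drop combinations that do not stitch between sections.
        if any(nxt["pickup"] > cur["tail_space"] for cur, nxt in zip(combo, combo[1:])):
            continue
        # Step 206b: total rating is the sum of the ratings in the combination.
        total = sum(candidate["rating"] for candidate in combo)
        # Step 206c: keep the combination with the lowest total rating.
        if best is None or total < best[0]:
            best = (total, combo)
    return best


verse = [
    {"rating": 1.0, "pickup": 0.0, "tail_space": 0.0},
    {"rating": 2.5, "pickup": 0.0, "tail_space": 1.0},
]
chorus = [{"rating": 1.5, "pickup": 0.5, "tail_space": 1.0}]
result = pick_across_sections([verse, chorus])
```

In this example the cheaper verse candidate leaves no room for the chorus pickup, so the second verse candidate is chosen even though its own rating is higher.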
The above-described system and methods generate lyric-aligned melodies from lyrics, which may lower the barrier to music composition for users. Generated melodies accommodate varied song structures including verses and choruses, as well as rhythm and song section repetitions. Lyrics are aligned to melodies by drawing from different rhythm templates and chord progression templates, increasing the variety of possible melody compositions. By lowering the barrier to music composition, user engagement may be increased on social platforms hosting the music generation program. Amateurs and professional musicians alike may benefit from using an easy-to-use music creation tool to create songs from lyrics.
In some embodiments, the methods and processes described herein may be tied to a computing system of one or more computing devices. In particular, such methods and processes may be implemented as a computer-application program or service, an application-programming interface (API), a library, and/or other computer-program product.
Computing system 300 includes a logic processor 302, volatile memory 304, and a non-volatile storage device 306. Computing system 300 may optionally include a display subsystem 308, input subsystem 310, communication subsystem 312, and/or other components not shown in
Logic processor 302 includes one or more physical devices configured to execute instructions. For example, the logic processor may be configured to execute instructions that are part of one or more applications, programs, routines, libraries, objects, components, data structures, or other logical constructs. Such instructions may be implemented to perform a task, implement a data type, transform the state of one or more components, achieve a technical effect, or otherwise arrive at a desired result.
The logic processor may include one or more physical processors (hardware) configured to execute software instructions. Additionally or alternatively, the logic processor may include one or more hardware logic circuits or firmware devices configured to execute hardware-implemented logic or firmware instructions. Processors of the logic processor 302 may be single-core or multi-core, and the instructions executed thereon may be configured for sequential, parallel, and/or distributed processing. Individual components of the logic processor optionally may be distributed among two or more separate devices, which may be remotely located and/or configured for coordinated processing. Aspects of the logic processor may be virtualized and executed by remotely accessible, networked computing devices configured in a cloud-computing configuration. It will be understood that, in such a case, these virtualized aspects are run on different physical logic processors of various different machines.
Non-volatile storage device 306 includes one or more physical devices configured to hold instructions executable by the logic processors to implement the methods and processes described herein. When such methods and processes are implemented, the state of non-volatile storage device 306 may be transformed—e.g., to hold different data.
Non-volatile storage device 306 may include physical devices that are removable and/or built in. Non-volatile storage device 306 may include optical memory (e.g., CD, DVD, HD-DVD, Blu-Ray Disc, etc.), semiconductor memory (e.g., ROM, EPROM, EEPROM, FLASH memory, etc.), and/or magnetic memory (e.g., hard-disk drive, floppy-disk drive, tape drive, MRAM, etc.), or other mass storage device technology. Non-volatile storage device 306 may include nonvolatile, dynamic, static, read/write, read-only, sequential-access, location-addressable, file-addressable, and/or content-addressable devices. It will be appreciated that non-volatile storage device 306 is configured to hold instructions even when power is cut to the non-volatile storage device 306.
Volatile memory 304 may include physical devices that include random access memory. Volatile memory 304 is typically utilized by logic processor 302 to temporarily store information during processing of software instructions. It will be appreciated that volatile memory 304 typically does not continue to store instructions when power is cut to the volatile memory 304.
Aspects of logic processor 302, volatile memory 304, and non-volatile storage device 306 may be integrated together into one or more hardware-logic components. Such hardware-logic components may include field-programmable gate arrays (FPGAs), program- and application-specific integrated circuits (PASIC/ASICs), program- and application-specific standard products (PSSP/ASSPs), system-on-a-chip (SOC), and complex programmable logic devices (CPLDs), for example.
The terms “module,” “program,” and “engine” may be used to describe an aspect of computing system 300 typically implemented in software by a processor to perform a particular function using portions of volatile memory, which function involves transformative processing that specially configures the processor to perform the function. Thus, a module, program, or engine may be instantiated via logic processor 302 executing instructions held by non-volatile storage device 306, using portions of volatile memory 304. It will be understood that different modules, programs, and/or engines may be instantiated from the same application, service, code block, object, library, routine, API, function, etc. Likewise, the same module, program, and/or engine may be instantiated by different applications, services, code blocks, objects, routines, APIs, functions, etc. The terms “module,” “program,” and “engine” may encompass individual or groups of executable files, data files, libraries, drivers, scripts, database records, etc.
When included, display subsystem 308 may be used to present a visual representation of data held by non-volatile storage device 306. The visual representation may take the form of a graphical user interface (GUI). As the herein described methods and processes change the data held by the non-volatile storage device, and thus transform the state of the non-volatile storage device, the state of display subsystem 308 may likewise be transformed to visually represent changes in the underlying data. Display subsystem 308 may include one or more display devices utilizing virtually any type of technology. Such display devices may be combined with logic processor 302, volatile memory 304, and/or non-volatile storage device 306 in a shared enclosure, or such display devices may be peripheral display devices.
When included, input subsystem 310 may comprise or interface with one or more user-input devices such as a keyboard, mouse, touch screen, or game controller. In some embodiments, the input subsystem may comprise or interface with selected natural user input (NUI) componentry. Such componentry may be integrated or peripheral, and the transduction and/or processing of input actions may be handled on- or off-board. Example NUI componentry may include a microphone for speech and/or voice recognition; an infrared, color, stereoscopic, and/or depth camera for machine vision and/or gesture recognition; a head tracker, eye tracker, accelerometer, and/or gyroscope for motion detection and/or intent recognition; as well as electric-field sensing componentry for assessing brain activity; and/or any other suitable sensor.
When included, communication subsystem 312 may be configured to communicatively couple various computing devices described herein with each other, and with other devices. Communication subsystem 312 may include wired and/or wireless communication devices compatible with one or more different communication protocols. As non-limiting examples, the communication subsystem may be configured for communication via a wireless telephone network, or a wired or wireless local- or wide-area network, such as a HDMI over Wi-Fi connection. In some embodiments, the communication subsystem may allow computing system 300 to send and/or receive messages to and/or from other devices via a network such as the Internet.
The following paragraphs provide additional support for the claims of the subject application. One aspect provides a music generation system comprising a processor and a memory operatively coupled to the processor and storing a rhythm template database comprising a plurality of rhythm templates; and a music generation program stored in the memory and executed by the processor to be configured to receive a user input of lyrics; identify a plurality of syllables in the lyrics; determine a syllable pattern in the identified plurality of syllables; match the syllable pattern to a selected rhythm template of the plurality of rhythm templates; generate a melody based on the selected rhythm template; generate a music file encoding the melody and the lyrics; and output the music file encoding the melody and the lyrics. In this aspect, additionally or alternatively, the outputted music file is a melody score with lyrics, a MIDI file with chord progressions, or an audio file. In this aspect, additionally or alternatively, the lyrics comprise a plurality of song sections; the selected rhythm template is matched to the syllable pattern of one of the plurality of song sections; and each of the plurality of song sections is matched to one of the plurality of rhythm templates. In this aspect, additionally or alternatively, the user input of lyrics comprises song structure settings including a song structure; and the melody is generated based on the song structure indicating an order of the plurality of song sections. In this aspect, additionally or alternatively, each of the plurality of song sections is matched to one of the plurality of rhythm templates based on a list of allowed rhythm repetition types. In this aspect, additionally or alternatively, each of the plurality of song sections is matched to one of the plurality of rhythm templates based on a list of rhythm pairs that are prohibited from being linked with each other.
In this aspect, additionally or alternatively, to match the syllable pattern to the selected rhythm template, a degree of matching between the syllable pattern and each of the plurality of rhythm templates is determined based on a number of syllables in the syllable pattern and a minimum number of syllables supported by each of the plurality of rhythm templates. In this aspect, additionally or alternatively, to match the syllable pattern to the selected rhythm template, a degree of matching between the syllable pattern and each of the plurality of rhythm templates is determined based on a number of syllable groups that require adding or subtracting to the determined syllable pattern to match with the selected rhythm pattern. In this aspect, additionally or alternatively, the memory is configured to further store a chord progression database comprising a plurality of chord progressions; the syllable pattern is matched to a selected chord progression of the plurality of chord progressions; and the melody is generated based on the selected rhythm template and the selected chord progression. In this aspect, additionally or alternatively, the melody is generated by performing pitch generation based on the selected rhythm template and the selected chord progression.
Another aspect provides a music generation method comprising steps to receive a user input of lyrics; identify a plurality of syllables in the lyrics; determine a syllable pattern in the identified plurality of syllables; match the syllable pattern to a selected rhythm template of the plurality of rhythm templates; generate a melody based on the selected rhythm template; generate a music file encoding the melody and the lyrics; and output the music file encoding the melody and the lyrics. In this aspect, additionally or alternatively, the outputted music file is a melody score with lyrics, a MIDI file with chord progressions, or an audio file. In this aspect, additionally or alternatively, the lyrics comprise a plurality of song sections; the selected rhythm template is matched to the syllable pattern of one of the plurality of song sections; and each of the plurality of song sections is matched to one of the plurality of rhythm templates. In this aspect, additionally or alternatively, the user input of lyrics comprises song structure settings including a song structure; and the melody is generated based on the song structure indicating an order of the plurality of song sections. In this aspect, additionally or alternatively, each of the plurality of song sections is matched to one of the plurality of rhythm templates based on a list of allowed rhythm repetition types. In this aspect, additionally or alternatively, each of the plurality of song sections is matched to one of the plurality of rhythm templates based on a list of rhythm pairs that are prohibited from being linked with each other. In this aspect, additionally or alternatively, to match the syllable pattern to the selected rhythm template, a degree of matching between the syllable pattern and each of the plurality of rhythm templates is determined based on a number of syllable groups that require adding or subtracting to the determined syllable pattern to match with the selected rhythm pattern.
In this aspect, additionally or alternatively, the syllable pattern is matched to a selected chord progression of a plurality of chord progressions; and the melody is generated based on the selected rhythm template and the selected chord progression. In this aspect, additionally or alternatively, the melody is generated by performing pitch generation based on the selected rhythm template and the selected chord progression.
Another aspect provides a music generation system comprising a processor and a memory operatively coupled to the processor and storing a rhythm template database comprising a plurality of rhythm templates and a chord progression database comprising a plurality of chord progressions; an audio reproduction device operatively coupled to the memory and the processor; and a music generation program stored in the memory and executed by the processor to be configured to receive a user input of lyrics; identify a plurality of syllables in the lyrics; determine a syllable pattern in the identified plurality of syllables; match the syllable pattern to a selected rhythm template of the plurality of rhythm templates; match the syllable pattern to a selected chord progression of the plurality of chord progressions; and generate a melody based on the selected rhythm template and the selected chord progression; generate an audio file or a MIDI file encoding the melody and the lyrics; and output the audio file or the MIDI file encoding the melody and the lyrics on the audio reproduction device.
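It will be appreciated that “and/or” as used herein refers to the logical disjunction operation, and thus A and/or B has the following truth table.
A: True, B: True, A and/or B: True
A: True, B: False, A and/or B: True
A: False, B: True, A and/or B: True
A: False, B: False, A and/or B: False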
It will be understood that the configurations and/or approaches described herein are exemplary in nature, and that these specific embodiments or examples are not to be considered in a limiting sense, because numerous variations are possible. The specific routines or methods described herein may represent one or more of any number of processing strategies. As such, various acts illustrated and/or described may be performed in the sequence illustrated and/or described, in other sequences, in parallel, or omitted. Likewise, the order of the above-described processes may be changed.
The subject matter of the present disclosure includes all novel and non-obvious combinations and sub-combinations of the various processes, systems and configurations, and other features, functions, acts, and/or properties disclosed herein, as well as any and all equivalents thereof.
To the extent that terms “includes,” “including,” “has,” “contains,” and variants thereof are used herein, such terms are intended to be inclusive in a manner similar to the term “comprises” as an open transition word without precluding any additional or other elements.