The instant invention relates generally to methods of generating music works and, more particularly, methods of automatically generating music works via a rule-based approach that utilizes structured and customizable algorithmic templates and AI technology.
Creation of a musical work has been a goal and dream of many people for as long as music has been around. However, a lack of knowledge of details regarding the intricacies of music styles has prevented many from generating and writing music. As such, this endeavor has, for a very long time, been a privilege of people having the necessary knowledge and education.
With the advent of personal computers and the widespread development of specialized software for these devices in the home consumer market, software products have emerged that allow a user to create pleasing and useful musical compositions without having to know music theory or needing to understand music constructs such as measures, bars, harmonies, time signatures, key signatures, music notation, etc. These software products generally provide graphical user interfaces that feature a visual approach to song and music content creation that allows even novice users to focus on the creative process by providing easy access to the process of music generation.
Additionally, these software products have simplified the user's access to content useful for the generation of music. A multitude of individual sound clips, e.g., sound loops or just “loops”, are usually provided to the user for selection and insertion into the tracks of a graphical user interface. With these sorts of software products, the task of music or song generation has come within reach for an expanded audience of users, who happily take advantage of the more simplified approach to music or song generation. These software products have evolved over the years, gotten more sophisticated and more specialized, and some have even been implemented on mobile devices.
However, the general approach to music or song generation has remained virtually unchanged, i.e., the user is required to select individual pre-generated loops that represent different instruments, for example drums, bass, guitar, synthesizer, vocals, etc., and place them in digital tracks to generate individual song parts that have lengths of 4 or 8 measures. Using this approach, most users are able to generate one or two of these song parts with the help of the graphical user interface of a mobile or desktop-based software product. However, this tends to produce an unfinished music work, because the generation of a complete, musically pleasing music work is a task that is not practicable for most users, who will then leave the music work unfinished and abandon the attempt to generate music works.
Heretofore, as is well known in the media editing industry, it should now be recognized, as was recognized by the present inventors, that there exists, and has existed for some time, a very real need for a system and method that would address and solve the above-described problems.
Thus, what is needed is a system and method for a rule-based algorithmic generative music system that is easily accessible to the user and that provides an algorithmic approach utilizing structured and customizable templates that integrate AI technology and utilize provided, pre-prepared databases containing data content for use by the instant invention.
Before proceeding to a description of the present invention, however, it should be noted and remembered that the description of the invention which follows, together with the accompanying drawings, should not be construed as limiting the invention to the examples (or embodiments) shown and described. This is so because those skilled in the art to which the invention pertains will be able to devise other forms of this invention within the ambit of the appended claims.
According to a first embodiment, there is presented herein a generative music system using rule-based algorithms organized in selectable templates for music generation utilizing AI technology. The generative music system utilizes a three-phase process for generating a musical work—the three phases are an input phase, a data determination phase and, last, a render phase. The input phase collects, compacts and organizes data provided by the user and the inventor. In the data determination phase of the instant invention the data collected in the input phase is put through a multi-step process wherein data values are determined that represent music work generation values that are then finally utilized in the render phase.
The foregoing has outlined in broad terms some of the more important features of the invention disclosed herein so that the detailed description that follows may be more clearly understood, and so that the contribution of the instant inventors to the art may be better appreciated. The instant invention is not to be limited in its application to the details of the construction and to the arrangements of the components set forth in the following description or illustrated in the drawings. Rather, the invention is capable of other embodiments and of being practiced and carried out in various other ways not specifically enumerated herein. Finally, it should be understood that the phraseology and terminology employed herein are for the purpose of description and should not be regarded as limiting, unless the specification specifically so limits the invention.
These and further aspects of the invention are described in detail in the following examples and accompanying drawings.
The invention will be described in connection with its preferred embodiments. However, to the extent that the following detailed description is specific to a particular embodiment or a particular use of the invention, this is intended to be illustrative only and is not to be construed as limiting the invention's scope. On the contrary, it is intended to cover all alternatives, modifications, and equivalents included within the invention's spirit and scope, as defined by the appended claims.
While this invention is susceptible of embodiment in many different forms, there is shown in the drawings, and will be described hereinafter in detail, some specific embodiments of the instant invention. It should be understood, however, that the present disclosure is to be considered an exemplification of the principles of the invention and is not intended to limit the invention to the specific embodiments or algorithms so described.
As is generally indicated in
By way of a general introduction,
In
Associated with each genre is the provision of at least one user-selectable template 605 that will be used in the generation of the output music work. The discussion associated with
Next, and preferably, the user might be allowed to choose between a collection-based 610 or a mix-pack based 615 music work generation process. The difference between the variants is associated with the type and source of the source audio material that is used in the generation of at least one seed part. One key difference between the two approaches is that in the collection based 610 approach collections of audio material, as disclosed in connection with
As a next preferred step, the data determination phase 680 is initiated. In the data determination phase 680, data values that define multiple different seed parts are generated 620 and stored using the information provided by the template pipeline seed part generation instructions 422. Additional information regarding the seed part generation is provided in connection with
One component of the determination phase 680 is the generation of the structure 625 of the output music work. As part of this step the instant invention utilizes the data values representing the seed part, variation parts, shuffled parts, intro parts, outro parts and transition parts and applies a structure based at least in part on the order of these parts. As one example, the order of the parts might be flagged as “reverse” so that the previously determined ordering A B C D E F built with information from the seed part returns A B C D E F E D C B. As is indicated in
The generation of the parts mentioned above takes place in the next three steps. First, the instant invention generates part variations 630 of the previously generated seed part using the information provided by the template pipeline shuffle/duplicate instructions 424. These variation parts might be intro or outro parts or shuffled or duplicate parts. Alternatively, or additionally, this step generates transition parts 635 and, optionally, an intro/outro as has been discussed previously in connection with the information provided by the template pipeline, steps 426 and 428, respectively. As an additional step of the data determination phase 680, the instant invention will determine if vocal parts 640 are to be added to the output music work, where vocal content is obtained from a mixpack containing audio loops with vocal content. The vocal content is specifically prepared for selection and integration into the data values representing the output music work.
As a next preferred step the data determination phase 680 will populate the generated structure 645 of the music work parts with information about audio loops from the collections or mix-packs that have been selected for insertion into the music work parts.
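By way of illustration only, the “reverse” ordering flag mentioned in connection with the structure generation step 625 might be sketched as follows. The function name `apply_order_flag` and the representation of parts as single letters are assumptions made for this sketch and are not taken from the disclosure; only the example ordering itself comes from the text above.

```python
def apply_order_flag(parts, flag=None):
    """Return the part ordering after applying a (hypothetical) ordering flag."""
    parts = list(parts)
    if flag == "reverse":
        # Walk forward through every part, then back down again without
        # repeating the last part and without returning to the first part.
        return parts + parts[-2:0:-1]
    return parts


structure = apply_order_flag(["A", "B", "C", "D", "E", "F"], flag="reverse")
print(" ".join(structure))
```

With the ordering A B C D E F the sketch yields A B C D E F E D C B, which matches the example given for the “reverse” flag.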
Next, the instant invention will preferably determine values for harmony presets 650. The harmony preset values define the chord progression sequences for each music work part, with these sequences being drawn from a range of provided presets stored in the template. The harmony presets are selected and provided to the render phase 690. As a last step of the data determination phase the instant invention will select and provide the stored data values for the automix algorithm 650 as provided by the selected template algorithm.
Finally, it should be noted that the only essential steps in the data determination phase 680 in
In the render phase 690 the instant invention will utilize the data values from the data determination phase 680. The rendering step will generate the output music work 660 by building the seed part, setting the structure, generating the part variations, transitions, and vocal parts, ordering the structure, populating the structure with audio loops from an audio loop database and then applying the harmony preset settings and the automix values to generate an output work. That is, step 660 utilizes the data values collected in the previous steps. The structure is disclosed in
Each audio collection 200 contains one or more different mixpacks 232, 234, and 236, each of which contains or is otherwise associated with some number of audio loops that are musically similar to each other and are compatible with a common genre and also with the theme of the mixpack. Each mixpack might contain audio loops that are stored locally as part of its data structure and/or it might contain pointers to loops that are stored in a general loop database 240 as is indicated in
Turning next to
According to a first aspect of the inventive template, foundational data is accumulated from the audio collection(s) 410 that are available to it. As has been discussed previously in connection with
An additional aspect of the template algorithm is a data construct referred to herein as a pipeline 420. The pipeline 420 contains a list of instructions that are utilized to build the music work and all of its parts. At its most basic level this could be a software module with the instructions embedded in it or read on the fly. In other cases, it could be a collection of high-level instructions or commands, e.g., macro instructions, that are executed by a software engine designed for that purpose. The instructions define, among others, the length and song part structure of the resulting music work.
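As a non-limiting sketch of the second variant described above, a pipeline might be held as a list of high-level instructions that a small engine executes in turn. The command names (`seed`, `duplicate`, `intro`, `outro`), the tuple layout, and the function `run_pipeline` are hypothetical and are used here only to illustrate the idea of macro instructions interpreted by a purpose-built engine.

```python
def run_pipeline(instructions):
    """Execute hypothetical macro instructions and return the ordered part names."""
    handlers = {
        # Start the work with a named seed part.
        "seed": lambda parts, arg: parts + [arg],
        # Append a copy of the most recent part, marked with a prime.
        "duplicate": lambda parts, arg: parts + [parts[-1] + "'"],
        # Place an intro part in front of everything built so far.
        "intro": lambda parts, arg: [arg] + parts,
        # Place an outro part after everything built so far.
        "outro": lambda parts, arg: parts + [arg],
    }
    parts = []
    for command, arg in instructions:
        parts = handlers[command](parts, arg)
    return parts


pipeline = [("seed", "A"), ("duplicate", None), ("intro", "Intro"), ("outro", "Outro")]
print(run_pipeline(pipeline))
```

Keeping the instructions as data rather than embedding them in the engine is one way a template could remain customizable, since a different template would simply supply a different list.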
A main function of the pipeline application is to create song parts from scratch or based on other song parts and, additionally, structuring these parts. More particularly, the pipeline 420 contains instructions and steps that, in essence, generate a plurality of data values, beginning with the generation of a seed part. It should be noted that the sorted listing in this figure is not meant to represent a strict stepwise order of the individual parts in this listing. As has been stated, the template provides a plurality of data values that are then utilized by a render phase to generate the output music work. A seed part is the initial part of the song/music work that embodies the overall concept and feel of the work. One reason for referring to this construct as a “seed part” is that it forms a seed or basis for the steps that follow. For example, the generated shuffled parts, duplicate parts, transition parts, intro and outro parts are built based on the characteristics of the seed part.
Execution of the pipeline instructions generates a seed part that forms part of the initial building block of the music work. The seed part will not typically be the first or last part of the song structure but, instead, it will typically be situated in the body of the work, preceded by at least an intro section and followed by at least an outro section.
The music work parts acting as intro and outro are, at the most basic, variant copies of the parts to which the intro is building and from which the outro is following. According to one embodiment, there is a priority list of instrument channels that are activated (intro) and deactivated (outro) when transitioning from the intro into the body of the music work and from the outro to the music work end. With respect to the intro, preferably these transitions will be achieved by activating instrument channels in a preferred order to transition from the intro to the main body of the music work. Conversely, the outro of the music work will transition to its ending by deactivating instrument channels in a preferred order. In one embodiment, the order of activation would be Keys, Strings, Synth, Guitar, Percussion . . . , and the ordering for deactivation would be Drum, Bass, Percussion, Synth, Strings, Keys. Obviously, the particular instruments that are activated/deactivated will depend on the instrument channels that have been created in the music work, e.g., not every music work will have an intro or outro that has a keyboard (i.e., “Keys”) channel.
Another component of the instruction list associated with the pipeline is a collection of steps that indicate how a seed music work part should be generated 422. These steps are utilized in step 620 of
As is indicated in
Continuing now with
Next, a drum audio loop and a bass audio loop will be randomly selected 730. Note that this selection could be from among the audio loops 240 in one of the mixpacks 232, 234 and 236 in the example of
As a next preferred step, the instant invention will parse through all of the instrument labels (i.e., instrument types) in the audio collection associated with the selected template and determine 740 a list of the instrument labels (i.e., the instrument types) and their frequency in the collection. From this list, an ordering of the frequency of occurrence of each instrument type will be created, and the most common instrument types will be identified. Note that the drum and bass labels/instrument types will be excluded from this list.
Next, the instant invention will identify audio loops associated with the three most commonly occurring instrument types 750. Additionally, in some embodiments a random component can be introduced into the seed part generation process by adding a chance, e.g., a 50% chance, that at least one or more audio loops is added from any of the other less common instrument types. Of course, this percentage can be varied from 0% to 100% to vary the likelihood that a less common instrument will be selected. The instant invention will implement the above steps a number of times to provide the user with multiple seed parts 760 which potentially can lead to multiple output music works.
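The collection-based seed part steps described above (random drum and bass selection, ranking of the remaining instrument types by frequency, selection from the three most common types, and an optional chance-based addition) can be sketched as follows. This is a minimal illustration under stated assumptions: loops are represented as `(name, instrument_type)` tuples, the labels `"drums"` and `"bass"` are hypothetical label spellings, and the function name `generate_seed_part` is not from the disclosure.

```python
import random
from collections import Counter

RHYTHM_TYPES = {"drums", "bass"}  # assumed label spellings


def generate_seed_part(loops, extra_chance=0.5, rng=random):
    """Pick loops for one seed part from (name, instrument_type) tuples."""
    by_type = {}
    for name, instrument in loops:
        by_type.setdefault(instrument, []).append(name)

    # Randomly select one drum loop and one bass loop.
    seed = [rng.choice(by_type["drums"]), rng.choice(by_type["bass"])]

    # Rank the remaining instrument types by how often they occur.
    counts = Counter(inst for _, inst in loops if inst not in RHYTHM_TYPES)
    ranked = [inst for inst, _ in counts.most_common()]

    # Take one loop from each of the three most common types.
    for inst in ranked[:3]:
        seed.append(rng.choice(by_type[inst]))

    # With the given chance, add one loop from a less common type.
    less_common = ranked[3:]
    if less_common and rng.random() < extra_chance:
        seed.append(rng.choice(by_type[rng.choice(less_common)]))
    return seed
```

Calling the function several times, for example with differently seeded `random.Random` instances, would produce the multiple candidate seed parts contemplated in step 760.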
Returning now to
An additional component of the pipeline instructions might be the information about the generation of transition parts 426. Transition parts act as bridges between two music work parts. In some variations, the instrument channels might change the audio loops from the earlier music part one channel at a time and music work part by music work part to match the music part that follows. Instructions associated with the transition operation 426 will specify a starting part and an ending part between which the transition music work parts are placed.
A further entry in the instruction list of the pipeline is the set of instructions for the generation of music work parts that are utilized as intros and outros 428. Instructions for the generation of intro and outro parts are similar in the following respect. In both cases, multi-channel music work parts are generated wherein for intro parts instrument channels are activated one by one and for outro parts instrument channels are deactivated one by one. The instructions will create as many parts as necessary to arrive at the desired target part for both the intro and outro parts. In addition, for the intro a music work part is selected toward which the intro part is building and, for the outro part, a music work part is selected that the outro part is building down from. For both variants the instrument channel deactivation and activation is preferably determined from an ordered list of instrument channels as has been described previously. In some cases there might be instruments that are flagged as never active. Additionally, the lengths (measures, time, etc.) of the intro and outro can be specified separately.
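One possible reading of the channel-by-channel transition described above is sketched below. Parts are modeled as dictionaries mapping an instrument channel to the loop it plays; this representation, the channel ordering argument, and the name `build_transitions` are assumptions of the sketch rather than features of the disclosure.

```python
def build_transitions(start_part, end_part, channel_order):
    """Return intermediate parts that morph start_part into end_part."""
    current = dict(start_part)
    steps = []
    for channel in channel_order:
        if current.get(channel) == end_part.get(channel):
            continue  # this channel already matches the part that follows
        if channel in end_part:
            current[channel] = end_part[channel]  # swap in the later loop
        else:
            del current[channel]  # the channel is silent in the later part
        steps.append(dict(current))
    # The final swap reproduces the ending part itself, so it is not a bridge.
    if steps and steps[-1] == end_part:
        steps.pop()
    return steps


start = {"drums": "d1", "bass": "b1", "keys": "k1"}
end = {"drums": "d2", "bass": "b2", "keys": "k2"}
for part in build_transitions(start, end, ["keys", "bass", "drums"]):
    print(part)
```

In this example two bridging parts result: one in which only the keys have changed, and one in which the keys and bass have changed while the drums still play the earlier loop.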
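The one-by-one activation and deactivation of instrument channels might be sketched as follows. The function names, the list-based representation of a part as its active channels, and the handling of channels flagged as never active are assumptions of this sketch. The activation list used in the example contains only the channels named earlier in this description; the remainder of that list is not reproduced here.

```python
def build_intro(target_channels, activation_order, never_active=()):
    """Return intro parts that activate channels one by one toward the target."""
    order = [c for c in activation_order
             if c in target_channels and c not in never_active]
    # Each intro part adds one more channel; the full set is the target itself.
    return [order[:i] for i in range(1, len(order))]


def build_outro(source_channels, deactivation_order, never_active=()):
    """Return outro parts that deactivate channels one by one from the source."""
    active = [c for c in source_channels if c not in never_active]
    parts = []
    for channel in deactivation_order:
        if channel in active:
            active = [c for c in active if c != channel]
            if active:  # an empty part would simply be silence
                parts.append(list(active))
    return parts


intro = build_intro(["Keys", "Synth", "Guitar"],
                    ["Keys", "Strings", "Synth", "Guitar", "Percussion"])
outro = build_outro(["Drum", "Bass", "Keys", "Synth"],
                    ["Drum", "Bass", "Percussion", "Synth", "Strings", "Keys"])
print(intro)
print(outro)
```

Channels that appear in the ordered list but not in the target or source part are skipped, consistent with the observation that the instruments activated or deactivated depend on the channels actually present in the music work.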
Turning to an additional aspect of the template algorithm structure as set out in
Another data value that is utilized and implemented in the music work part generation is called progressions 450 and represents harmony presets, which are chord sequences that might be chosen for each music work part in a data collection representing an output music work. The chord sequences utilized for individual music work parts are drawn at random from a range of hard coded, predetermined and provided presets. These presets might be organized in the template algorithm organization according to this example:
In the example above, each letter represents one beat (not one bar) and major chords are represented with upper case letters, while minor chords are written in lower case. In some cases, these presets might correspond to standard music chord change patterns. e.g., 1-5-6-4 (e.g., C, G, Am, F), 6-4-1-5 (e.g., Am, F, C, G), 1-4-5-4 (e.g., C, F, G, F), 1-6-4-5 (e.g., C, Am, F, G), 2-5-1-6 (e.g., Dm, G, C, Am), etc.
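For illustration, the standard chord change patterns listed above can be expanded into chord names and drawn at random per music work part as sketched below. The sketch assumes the key of C major, consistent with the chord names given in the examples; the names `C_MAJOR_DEGREES`, `degrees_to_chords`, and `choose_progressions` are hypothetical, and the degree-number notation is used here in place of the template's own preset format.

```python
import random

# Diatonic triads of C major for scale degrees 1 through 6.
C_MAJOR_DEGREES = {1: "C", 2: "Dm", 3: "Em", 4: "F", 5: "G", 6: "Am"}

PRESETS = ["1-5-6-4", "6-4-1-5", "1-4-5-4", "1-6-4-5", "2-5-1-6"]


def degrees_to_chords(pattern):
    """Translate a pattern such as '1-5-6-4' into chord names in C major."""
    return [C_MAJOR_DEGREES[int(degree)] for degree in pattern.split("-")]


def choose_progressions(part_names, presets=PRESETS, rng=random):
    """Draw one preset at random for each music work part."""
    return {name: degrees_to_chords(rng.choice(presets)) for name in part_names}


print(degrees_to_chords("1-5-6-4"))  # ['C', 'G', 'Am', 'F']
print(choose_progressions(["Intro", "A", "B", "Outro"]))
```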
Another set of instructions that might be included as part of the instruction list contained in the pipeline 420 are directions associated with the automix algorithm 460 that aims to provide a more balanced mix of the music work parts.
The automix volume adjustment 300 multi-step process of
To address this problem the instant invention utilizes the automix algorithm of
The volume levels of the generated music work are adjusted by applying the automix algorithm to the music work's audio loops and/or its song parts and/or its instrument channels. These adjustments are applied in multiple steps and preferably at different granularity levels. Each step might be applied alone or all of the approaches in
The first granularity level involves a loudness adjustment being made to all audio loops that are part of the music work. As is illustrated in
A second/higher level of granularity is an adjustment of the volume of each song part 310 that makes up the music work, with the goal being to make the different song parts more consistent in volume. As a first preferred step for each song part the instant invention will determine the number of active instrument channels 340. If the number of active instrument channels is above four 345, the volume of all instrument channels will be reduced by a factor 350, for example 0.5 dB. If the number of active instrument channels is less than four then the instant invention 355 will, for each song part, increase the volume of all instrument channels by a factor 360, for example 0.5 dB. The 0.5 dB value was selected based on the experience of the inventors with a goal of keeping the loops in the song part in balance. Note that the 0.5 dB value was empirically determined and could be, for example, 0.25 dB, 0.5 dB, 0.75 dB, 1.0 dB, etc., depending on the loops involved and the tastes of the user. Those of ordinary skill in the art will readily understand how this value might be chosen in a particular case.
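The song part rule just described can be expressed compactly as sketched below. The function name and parameter names are hypothetical. The text does not state what happens when exactly four instrument channels are active; the sketch assumes the volume is left unchanged in that case.

```python
def part_gain_offset_db(active_channels, threshold=4, step_db=0.5):
    """Return the dB offset applied to every channel of one song part."""
    if active_channels > threshold:
        return -step_db  # busy part: pull all channels down slightly
    if active_channels < threshold:
        return step_db   # sparse part: push all channels up slightly
    return 0.0           # assumption: exactly at the threshold, no change


for count in (2, 4, 6):
    print(count, part_gain_offset_db(count))
```

The `step_db` parameter corresponds to the empirically chosen 0.5 dB value and could be set to 0.25 dB, 0.75 dB, 1.0 dB, etc., as noted above.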
The third and highest granularity level is a volume adjustment based on instrument channels. In a first preferred step, all instrument channels are selected 315 and the volume of these instrument channels is reduced by a predetermined or calculated value 365, for example, by 2 dB. In this third granularity level, the volumes of the drum and bass instrument channels will not be reduced by this amount. This approach is designed to shift the audio experience in favor of instrument channels that typically make up the power/energy of the music work, i.e., the drum and bass content. Although 2 dB is a preferred value, other choices based on the experience of the instant inventors might be, for example, 1 dB, 4 dB, and 5 dB.
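A minimal sketch of the instrument channel adjustment follows. The dictionary of per-channel gains in decibels, the channel label spellings, and the name `apply_channel_trim` are assumptions made for illustration only.

```python
def apply_channel_trim(channel_gains_db, trim_db=2.0,
                       protected=("drum", "drums", "bass")):
    """Lower every channel by trim_db except the drum and bass channels."""
    return {
        channel: gain if channel.lower() in protected else gain - trim_db
        for channel, gain in channel_gains_db.items()
    }


gains = {"Drums": 0.0, "Bass": 0.0, "Keys": 0.0, "Synth": -1.0}
print(apply_channel_trim(gains))
```

Because only the non-rhythm channels are lowered, the drum and bass content ends up relatively louder, which is the shift in emphasis described above.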
As one possible alternative to the process discussed above, manual pre-set volume level offsets might be provided for specified instrument channels. Adjustment values given in decibels might be provided. In many cases, mostly negative values will be utilized. These presets might be organized in the template algorithm organization according to this example:
Turning next to
Each song part has a specific runtime at a given tempo. The run time might be defined in terms of measures instead of time, for example, 4 or 8 measures or multiples thereof. Additionally, the song parts might be further identified by, for example, designating them as being an intro, ending, verse, chorus, bridge, etc.
In
Coming next to
An embodiment that illustrates a preferred approach to generating the seed part is contained in
Next, the instant invention will utilize the priority mix-pack selection and parse through all instrument channel labels in the specified mixpack and determine 850 a list and order of the most common instrument channel labels for that mix-pack. As mentioned previously, this list will exclude bass and drum instrument channels from the ordered list. The instant invention will then select audio loops which are associated with at least the top three of an ordered list of the determined most common instrument channel labels/types 860. Additionally, the instant invention will optionally randomly select at least one other audio loop from any of the less common instrument channel labels. The instant invention will typically implement the above disclosed steps a plurality of times to provide the user with multiple seed parts 870 which, in turn, can lead to multiple output music works.
The audio collections 910 are repositories of audio material based on a thematic approach, preferably genre. The next part is the seed part generation 920, which represents an AI model 970 that utilizes a set of audio loops as the basis for the output music work concept. A variety of methods might be utilized to generate the seed parts; for example, there might be GAN generated seed parts, AI template generated seed parts 970, or AI machine learning based generated seed parts.
The AI in this step will previously have been trained on a number of loops, preferably the loops in the database 240. In one embodiment, the content of the loop database will be analyzed by an algorithm which provides data values for around 200 fundamental/low level parameters of each audio loop. These parameters might include, for example, volume, loudness, FFT (e.g., the frequency content of the loop or sound based on its fast Fourier transform and/or its frequency spectrum), etc. In one preferred embodiment the analysis might continue by applying principal component analysis (“PCA”), linear discriminant analysis (“LDA”), etc., to the fundamental/low level parameters to reduce their number and dimensionality. Methods of reducing dimensionality using PCA and LDA in a way to maximize the amount of information captured are well known to those of ordinary skill in the art. The resulting summary parameters which, in some embodiments, might comprise at least eight or so parameters, will be used going forward. The summary parameters might include one that corresponds to the instrument(s) that are predominant in each loop.
The next part is the generation of a structure 930 of the audio loop sequence. In this case, an unsupervised process AI model 970 utilizes the seed part and intelligently selects and assembles a set of audio loops into a sophisticated music work structure. This process applies the AI template to the contextual numerical relationship between the audio loop sounds, as derived by a convolutional neural network audio signal retrieval process that the audio loops have been subjected to prior to storage in the database, wherein a particular numerical value uniquely represents each audio loop. As before, the AI will previously have been trained on music works of different genres, tempos, lengths, etc.
As a next preferred part there is the provision of a representation of an output music work 940, wherein the system provides a symbolic representation of the output music work as a machine readable alpha numerical file or metadata file. The next part is the music work generation 950 wherein the render engine reads the symbolic presentation of the output music work and generates an output music work as the final part 960. The AI 970 is one that has been trained on different genres of music and is able to assist the instant invention in forming and selecting templates, generating seed parts, and establishing the structure of the music work. As has been noted previously, the fact that the steps of
It should be noted and understood that the invention is described herein with a certain degree of particularity. However, the invention is not limited to the embodiment(s) set forth herein for purposes of exemplification, but is limited only by the scope of the attached claims.
It is to be understood that the terms “including”, “comprising”, “consisting” and grammatical variants thereof do not preclude the addition of one or more components, features, steps, or integers or groups thereof and that the terms are to be construed as specifying components, features, steps or integers.
The singular shall include the plural and vice versa unless the context in which the term appears indicates otherwise.
If the specification or claims refer to “an additional” element, that does not preclude there being more than one of the additional element.
It is to be understood that where the claims or specification refer to “a” or “an” element, such reference is not to be construed that there is only one of that element.
It is to be understood that where the specification states that a component, feature, structure, or characteristic “may”, “might”, “can” or “could” be included, that particular component, feature, structure, or characteristic is not required to be included.
Where applicable, although state diagrams, flow diagrams or both may be used to describe embodiments, the invention is not limited to those diagrams or to the corresponding descriptions. For example, flow need not move through each illustrated box or state, or in exactly the same order as illustrated and described.
Methods of the present invention may be implemented by performing or completing manually, automatically, or a combination thereof, selected steps or tasks.
The term “method” may refer to manners, means, techniques and procedures for accomplishing a given task including, but not limited to, those manners, means, techniques and procedures either known to, or readily developed from known manners, means, techniques and procedures by practitioners of the art to which the invention belongs.
For purposes of the instant disclosure, the term “at least” followed by a number is used herein to denote the start of a range beginning with that number (which may be a range having an upper limit or no upper limit, depending on the variable being defined). For example, “at least 1” means 1 or more than 1. The term “at most” followed by a number is used herein to denote the end of a range ending with that number (which may be a range having 1 or 0 as its lower limit, or a range having no lower limit, depending upon the variable being defined). For example, “at most 4” means 4 or less than 4, and “at most 40%” means 40% or less than 40%.
Terms of approximation (e.g., “about”, “substantially”, “approximately”, etc.) should be interpreted according to their ordinary and customary meanings as used in the associated art unless indicated otherwise. Absent a specific definition and absent ordinary and customary usage in the associated art, such terms should be interpreted to be ±10% of the base value.
When, in this document, a range is given as “(a first number) to (a second number)” or “(a first number)-(a second number)”, this means a range whose lower limit is the first number and whose upper limit is the second number. For example, 25 to 100 should be interpreted to mean a range whose lower limit is 25 and whose upper limit is 100. Additionally, it should be noted that where a range is given, every possible subrange or interval within that range is also specifically intended unless the context indicates to the contrary. For example, if the specification indicates a range of 25 to 100 such range is also intended to include subranges such as 26-100, 27-100, etc., 25-99, 25-98, etc., as well as any other possible combination of lower and upper values within the stated range, e.g., 33-47, 60-97, 41-45, 28-96, etc. Note that integer range values have been used in this paragraph for purposes of illustration only and decimal and fractional values (e.g., 46.7-91.3) should also be understood to be intended as possible subrange endpoints unless specifically excluded.
It should be noted that where reference is made herein to a method comprising two or more defined steps, the defined steps can be carried out in any order or simultaneously (except where context excludes that possibility), and the method can also include one or more other steps which are carried out before any of the defined steps, between two of the defined steps, or after all of the defined steps (except where context excludes that possibility).
Further, it should be noted that terms of approximation (e.g., “about”, “substantially”, “approximately”, etc.) are to be interpreted according to their ordinary and customary meanings as used in the associated art unless indicated otherwise herein. Absent a specific definition within this disclosure, and absent ordinary and customary usage in the associated art, such terms should be interpreted to be plus or minus 10% of the base value.
Still further, additional aspects of the instant invention may be found in one or more appendices attached hereto and/or filed herewith, the disclosures of which are incorporated herein by reference as if fully set out at this point.
Thus, the present invention is well adapted to carry out the objects and attain the ends and advantages mentioned above as well as those inherent therein. While the inventive device has been described and illustrated herein by reference to certain preferred embodiments in relation to the drawings attached thereto, various changes and further modifications, apart from those shown or suggested herein, may be made therein by those of ordinary skill in the art, without departing from the spirit of the inventive concept, the scope of which is to be determined by the following claims.
This application claims the benefit of U.S. Provisional Patent Application Ser. No. 63/450,136 filed on Mar. 6, 2023, and incorporates said provisional application by reference into this document as if fully set out at this point.
Number | Date | Country
---|---|---
63450136 | Mar 2023 | US