This application relates generally to generating music tracks, and more specifically to generating music tracks using artificial neural networks.
Some computer-generated music uses interactive evolutionary computation (IEC), by which a computer generates a random initial set of music tracks, and then a human selects aesthetically pleasing tracks that are used to produce the next generation. However, even with human input for selecting the next generation, computer-generated music often sounds artificial and uninspired. Also, computer-generated music often lacks a global structure that holds together the entire song.
One example embodiment involves a method for generating rhythms. This method comprises the steps of: generating an initial population of Compositional Pattern Producing Networks (CPPNs) wherein each CPPN produces a rhythm output; receiving a selection of one of the rhythm outputs; and evolving a next generation of CPPNs based upon the selection.
Another example embodiments involves a system for generating rhythms. This system comprises a plurality of Compositional Pattern Producing Networks (CPPNs), each of the CPPNs using a time signature input to produce a rhythm; logic configured to receive a selection of one or more of the CPPNs; and logic configured to generate at least one evolved CPPN based upon the selection.
Music may be represented as a function of time. In this regard, where t=0 indicates the beginning of a musical composition and t=n indicates the end of a musical composition, there is a function f(t) that embodies a pattern equivalent to the musical composition itself. However, with respect to the musical composition, the function f(t) may be difficult to formulate. While it may be difficult to formulate a function f(t) indicative of the musical composition itself, the musical composition has recognizable structure, which varies symmetrically over time. For example, the time in measure increases from the start of the measure to the end of the measure then resets to zero for the next measure. Thus, a particular musical composition exhibits definable variables, such as time in measure (“m”), time in beat (“b”), and time in song (“t”), which can be viewed as a time signature. These variables may then be used as arguments to a function, e.g., g(m, b, t), which receives the variables as arguments at any given time and produces a note or a drum hit for the given time.
Over the period t=0 to t=n, these note or drum hit outputs comprise a rhythm exhibiting the structure of the time signature inputs. In this regard, the rhythm output produced by the function will move in accordance with the function input signals, i.e., the time in measure and the time in beat over the time period t=0 to t=n. Thus, g(m, b, t) will output a function of the musical composition structure, i.e., time in measure and time in beat, and the output will sound like rhythms indicative of the time structure.
The transformation function g(t) which generates a rhythm as a function of various inputs can be implemented by, or embodied in, an artificial neural network. Viewed another way, the artificial neural network encodes a rhythm. In some embodiments, a specific type of artificial neural network called a Compositional Pattern Producing Network (CPPN) is used. Although embodiments using CPPNs are discussed below, other embodiments uses different types of artificial neural networks are also contemplated.
The systems and methods disclosed herein generate an initial set of CPPNs which produce a rhythm output from a set of timing inputs. A user selects one or more CPPNs from the initial population, and the systems and methods evolve new CPNNs based on the user selections.
Memory 20 stores various programs, in software and/or firmware, including an operating system (O/S) 52, rhythm CPPN generation logic 53, and rhythm evolving logic 54. The operating system 52 controls execution of other programs and provides scheduling, input-output control, file, data management, memory management, and communication control and related services. In addition, memory 20 stores CPPN data 40 comprising a plurality of initial rhythm CPPNs 100-110 and a plurality of evolved rhythm CPPN 200-210.
During operation, the rhythm CPPN generation logic 53 generates the plurality of initial rhythm CPPNs 100-110 that produce a plurality of respective rhythms, e.g., drum rhythms. In this regard, each CPPN 100-110 receives one or more inputs containing timing information (described further herein), and produces an output that is an audible representation of the rhythm embodied or encoded in the respective CPPNs 100-110.
Once the CPPNs 100-110 are generated, a program can query CPPN generation logic 53 to obtain a description of one or more of the initial CPPNs 100-110, where the CPPN description includes a description of the rhythm output. In some embodiments, the CPPN description also describes other aspects of the CPPN, including (for example) the input signal, the activation functions, etc. Using the CPPN description, the program can display one or more graphical representations of the rhythms embodied in the CPPNs 100-110 via the display 82. A graphical display of the rhythms embodied in the CPPNs is described further with reference to
The graphical representations enable the user to visually inspect different characteristics of each of the rhythms embodied in the initial CPPNs 100-110. In addition and/or alternatively, the user can listen to each of the rhythms and audibly discern the different characteristics of the plurality of rhythms. The user then selects one or more rhythms exhibiting characteristics that the user desires in an ultimate rhythm selection.
After selection of one or more rhythms by the user, the rhythm CPPN evolving logic 54 generates a plurality of evolved CPPNs 200-210. In one embodiment, the CPPN evolving logic 54 generates the CPPNs 200-210 by employing a Neuroevolution of Augmenting Topologies (NEAT) algorithm. The NEAT algorithm is described in “Evolving Neural Networks through Augmenting Topologies,” in the MIT Press Journals, Volume 10, Number 2 authored by K. O. Stanley and R. Mikkulainen, which is incorporated herein by reference. The NEAT algorithm and its application within the rhythm-evolving system 10 are described hereinafter with reference to
In employing NEAT to evolve the CPPNs 200-210, the CPPN evolving logic 54 may alter or combine one or more of the CPPNs 100-110. In this regard, the CPPN evolving logic 54 may mutate at least one of the CPPNs 100-110 or mate one or more of the CPPNs 100-110 based upon those selected by the user. The user may select, for example, CPPNs 100-105 as exhibiting characteristics desired in a rhythm by the user. With the selected CPPNs 100-105, the evolving logic 54 may select one or more of the CPPNs 100-105 selected to mate and/or mutate. Furthermore, the evolving logic 54 may apply speciation to the selected CPPNs 100-105 to form groups of like or similar CPPNs that the evolving logic 54 makes and/or mutates.
Once the evolving logic 54 mutates at least one CPPN 100-110 and/or mates at least two of the CPPNs 100-110, the evolving logic 54 stores the mutated and/or mated CPPNs 100-110 as evolved rhythm CPPNs 200-210. Once the evolving logic 54 generates one or more CPPNs 200-210, a program can query evolving logic 54 to obtain a description of the evolved CPPNs 200-210 to the user, and can generate a graphical representation of one or more of evolved CPPNs 200-210 (as described earlier in connection with CPPNs 100-110). Again, the user can select one or more of the rhythms embodied in the CPPNs 200-210 as desirable, and the evolving logic 54 performs mutation and mating operations on those CPPNs embodying those rhythms desired by the user. This process can continue over multiple generations until a rhythm is evolved that the user desires.
Each function f (A)-f (E) is referred to as an “activation function,” which is a mathematical formula that transforms on input(s) of a processing element A-E into one or more output rhythm signals 32. Thus, each input signal 11-13 can be viewed as comprising a series of time steps, where at each time step the CPPN 100′ transforms the combination of input time signature inputs 11-13 into one or more corresponding output rhythm signals 32, each of which represents a note or a drum hit for that time.
When associated with a particular percussion instrument (e.g., when a user makes the association via a user interface), a particular rhythm signal 32 indicates at what volume the instrument should be played for each time step. For ease of illustration, the example embodiment of
The time signature inputs for the example CPPN 100′ of
Other inputs may be provided to the CPPN 100′. As an example, a sine wave may be provided as an input that peaks in the middle of each measure of the musical composition, and the CPPN function may be represented as g(m, b, t, s) where “s” is the sine wave input. While many rhythms may result when the sine wave is provided as an additional input, the output produced by the function g(m, b, t, s) exhibits a sine-like symmetry for each measure.
To further illustrate the concept of how the CPPNs in
Example activation functions implemented by processing elements A-E include sigmoid, Gaussian, or additive. The combination of processing elements within a CPPN can be viewed as applying the function g(m, b, t) (described above) to generate a rhythm signal 32 at output 31 in accordance with the inputs 11-13. Note that, unless otherwise specified, each input is multiplied by the weight of the connection over which the input is received. This support for periodic (e.g., sine) and symmetric (e.g., Gaussian) functions distinguishes the CPPN from an ANN.
As an example, f(D) may employ a sigmoid activation function represented by the following mathematical formula:
F(D)=2.0*(1.0/(1.0+exp(−1.0*x))))−1.0 A.1
In such an example, the variable x is represented by the following formula:
x=input 26*weight of connection 45+input 25*weight of connection 47, A.2 as described herein.
As another example, f(D) may employ a Gaussian activation function represented by the following mathematical formula:
f(D)=2.5000*((1.0/sqrt(2.0*Pl))*exp(−0.5*(x*x))) A.3
In such an example, the variable z is also represented by the formula A.2 described herein.
As another example, f(D) may employ a different Gaussian activation function represented by the following mathematical formula:
f(D)=(5.0138*(1/sqrt(2*Pl))exp(−0.5(x*x)))−1
In such an example, the variable x is also represented by the formula A.2 described herein.
Numerous activation functions may be employed in each of the plurality of processing elements A-E, including but not limited to an additive function, y=x; an absolute value function, y=|x|; and exponent function, y=exp(x); a negative function y=−1.0*(2*(1.0/(1.0+exp(−1.0*x)))−1); a reverse function, if (value>0) y=2.50000* ((1.0/sqrt(2.0*Pl))**exp(−8.0*(x*x))) else if (value<0) y=−2.5000*((1.0/sqrt(2.0*Pl))* exp(−8.0*(x*x))); sine functions, y=sin((Pl*x)/(2.0*4.0)), y=sin(x*Pl), or y=sin(x*2*Pl); an inverse Gaussian function y==2.5000*((1.0/sqrt(2.0*Pl))*exp(−0.5*(value*value))); a multiply function, wherein instead of adding the connection values, they are multiplied and a sigmoid, e.g., A.1 is applied to the final product.
As an example, processing element D comprises inputs 25 and 26 and output 27. Further, for example purposes, the connection 45 may exhibit a connection strength of “2” and connection 47 may exhibit a connection strength of “1.” Note that the “strength” of a connection affects the amplitude or the numeric value of the particular discrete value that is input into the processing element. The function f(D) employed by processing element may be, for example, a summation function, i.e.,
F(D)=Σ(Inputs)=1*input25+2*(input 26)=output 27.
Note that other functions may be employed by the processing elements A-E, as described herein, and the summation function used herein is for example purposes.
Note that the placement of the processing elements A-E, the activation functions f(A)-f(E), described further herein, of each processing element A-E, and the strength of the connections 44-48 are referred to as the “topology” of the CPPN 100′ or CPPN 100*. The strength of the connections 44-48 may be manipulated, as described further herein, during evolution of the CPPN 100′ or CPPN 100* to produce the CPPNs 200-210 and/or produce a modified rhythm reflecting one or more of the CPPNs 100-110 mated or mutated. Notably, the strengths of the connections 44-48 may be increased and/or decreased in order to manipulate the output of the CPPN 100′.
As described earlier with reference to
In one embodiment, the CPPN generation logic 53 generates the initial population of CPPNs 100-110. This initial population may comprise, for example, ten (10) CPPNs having an input processing element and an output processing element. In such an example, each input processing element and output processing element of each CPPN randomly generated employs one of a plurality of activation functions, as described herein, in a different manner. For example, one of the randomly generated CPPNs may employ formula A.1 in its input processing element and A.2 in its output processing element, whereas another randomly generated CPPN in the initial population may employ A.2 in its input processing element and A.1 in its output processing element. In this regard, each CPPN generated for the initial population is structurally diverse.
Further, the connection weight of a connection 44-48 intermediate the processing elements of each CPPN in the initial population may vary as well. As an example, in one randomly generated CPPN the connection weight between the processing element A and B may be “2,” whereas in another randomly generated CPPN the connection weight may be “3.”
Once the CPPN generation logic 53 generates the initial population, a user may view a graphical representation or listen to the rhythm of each CPPN 100-110 generated. One such graphic representation will be described below in connection with
GUI 100 interprets the rhythm output of one or more CPPNs to visually convey the strength at which each instrument beat is played. In the example of
By examining the row 115 associated with an instrument, one can evaluate, based upon the visual representation of in the row 115, whether the rhythm for the instrument may or may not be an acceptable one. In addition, the GUI 100 includes a “Play” button 102 associated with each grid. When button 102 is selected, the CPPN activation logic 54 plays the rhythm graphically represented by the particular grid 111. Since each of the grids 111 and 112 is a representation of a particular CPPN's output (100-110 or 200-210), selecting a “Show Net” button 16 results in a diagram of a CPPN representation (e.g., that depicted in
Once the user evaluates the rhythm by visually evaluating the grid 111 or playing the rhythm, the user can rate the rhythm by selecting a rating. In this example embodiment, ratings are selected through a pull-down button (e.g., poor, fair, or excellent). Other embodiments use other descriptive words or rating systems.
The GUI 100 further includes a “Number of Measures” control 104 and a “Beats Per Measure” control 105. As described herein, the rhythm displayed in grid 100 is a graphical representation of an output of a CPPN (100-110, 200-210) that generates the particular rhythm, where the CPPNs 100-110 and 200-210 further comprise beat, measure, and time inputs 11-13 (
Furthermore, the GUI 100 includes a tempo control 106 that one may used to change the tempo of the rhythm graphically represented by grid 111. In the example embodiment of
The GUI 100 further includes a “Load Base Tracks” button 107. A base track plays at the same time as the generated rhythm, allowing the user to determine whether or not a particular generated rhythm is appropriate for use as a rhythm for the base track. Further, one can clear the tracks that are used to govern evolution by selecting the “Clear Base Track” button 108. Once each rhythm is evaluated, the user may then select the “Save Population” button 109 to save those rhythms that are currently loaded, for example, “Rhythm 1” and “Rhythm 2.”
Additionally, once one or more rhythms have been selected as good or acceptable as described herein, the user may then select the “Create Next Generation” button 101. The evolving logic 54 then evolves the selected CPPNs 100-110 corresponding to the selected or approved rhythms as described herein. In this regard, the evolving logic 54 may perform speciation, mutate, and/or mate one or more CPPNs 100-110 and generate a new generation of rhythms generated by the generated CPPNs 200-210. The user can continue to generate new generations until satisfied.
The GUI 100 further comprises a “Use Sine Input” selection button 117. If selected, the evolving logic 54 may feed a Sine wave into an CPPN 100-110 or 200-210 as an additional input, for example, to CPPN 100′ (
Once the CPPNs are generated, the user evaluates the rhythms. In some embodiments, a program interacts with CPPN logic 52 and 53 to obtain a description of the initial population of CPPNs, then produces a visual representation of the CPPNs (e.g., GUIs 300, 500). At step 420, the system 10 receives a user selection of one or more of the rhythms. In some embodiments, user selection includes a user rating each initial rhythm for example, on a scale from excellent to poor.
At step 430, the system 10 creates a next generation of CPPNs based upon the selection input. In this regard, the system 10 generates CPPNs 200-210 through speciation, mutation, and/or mating based upon those rhythms that the user selected and their corresponding CPPNs. At step 440, the system 10 determines whether or not the user desires additional generations of CPPNs to be produced. If Yes, the process repeats again starting at step 420. If No, the process is ended. In this manner, the process of selection and reproduction iterates until the user is satisfied.
The systems and methods for evolving a rhythm disclosed herein can be implemented in software, hardware, or a combination thereof. In some embodiments, such as that shown in
The systems and methods disclosed herein can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device. Such instruction execution systems include any computer-based system, processor-containing system, or other system that can fetch and execute the instructions from the instruction execution system. In the context of this disclosure, a “computer-readable medium” can be any means that can contain or store the program for use by, or in connection with, the instruction execution system. The computer readable medium can be based on, for example but not limited to, electronic, magnetic, optical, electromagnetic, or semiconductor technology.
Specific examples of a computer-readable medium using electronic technology would include (but are not limited to) the following: a random access memory (RAM); a read-only memory (ROM); an erasable programmable read-only memory (EPROM or Flash memory). A specific example using magnetic technology includes (but is not limited to) a floppy diskette or a hard disk. Specific examples using optical technology include (but are not limited to) a compact disc read-only memory (CD-ROM).
The software components illustrated herein are abstractions chosen to illustrate how functionality is partitioned among components in some embodiments disclosed herein. Other divisions of functionality are also possible, and these other possibilities are intended to be within the scope of this disclosure. Furthermore, to the extent that software components are described in terms of specific data structures (e.g., arrays, lists, flags, pointers, collections, etc.), other data structures providing similar functionality can be used instead.
Software components are described herein in terms of code and data, rather than with reference to a particular hardware device executing that code. Furthermore, to the extent that system and methods are described in object-oriented terms, there is no requirement that the systems and methods be implemented in an object-oriented language. Rather, the systems and methods can be implemented in any programming language, and executed on any hardware platform.
Software components referred to herein include executable code that is packaged, for example, as a standalone executable file, a library, a shared library, a loadable module, a driver, or an assembly, as well as interpreted code that is packaged, for example, as a class. In general, the components used by the systems and methods for evolving a rhythm are described herein in terms of code and data, rather than with reference to a particular hardware device executing that code. Furthermore, the systems and methods can be implemented in any programming language, and executed on any hardware platform.
The flow charts, messaging diagrams, state diagrams, and/or data flow diagrams herein provide examples of the operation of rhythm generating CPPN logic, according to embodiments disclosed herein. Alternatively, these diagrams may be viewed as depicting actions of an example of a method implemented by rhythm generating CPPN logic. Blocks in these diagrams represent procedures, functions, modules, or portions of code which include one or more executable instructions for implementing logical functions or steps in the process. Alternate implementations are also included within the scope of the disclosure. In these alternate implementations, functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved.
The foregoing description has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure to the precise forms disclosed. Obvious modifications or variations are possible in light of the above teachings. The implementations discussed, however, were chosen and described to illustrate the principles of the disclosure and its practical application to thereby enable one of ordinary skill in the art to utilize the disclosure in various implementations and with various modifications as are suited to the particular use contemplated. All such modifications and variation are within the scope of the disclosure as determined by the appended claims when interpreted in accordance with the breadth to which they are fairly and legally entitled.
This application claims the benefit of U.S. Provisional Application No. 60/941,192, filed May 31, 2007, which is incorporated by reference herein in its entirety.
Number | Name | Date | Kind |
---|---|---|---|
5138928 | Nakajima et al. | Aug 1992 | A |
5883326 | Goodman et al. | Mar 1999 | A |
6051770 | Milburn et al. | Apr 2000 | A |
6297439 | Browne | Oct 2001 | B1 |
6417437 | Aoki | Jul 2002 | B2 |
7065416 | Weare et al. | Jun 2006 | B2 |
7193148 | Cremer et al. | Mar 2007 | B2 |
7196258 | Platt | Mar 2007 | B2 |
7227072 | Weare | Jun 2007 | B1 |
7381883 | Weare et al. | Jun 2008 | B2 |
20050092165 | Weare et al. | May 2005 | A1 |
20060266200 | Goodwin | Nov 2006 | A1 |
20070022867 | Yamashita | Feb 2007 | A1 |
20070074618 | Vergo | Apr 2007 | A1 |
20070199430 | Cremer et al. | Aug 2007 | A1 |
20080190271 | Taub et al. | Aug 2008 | A1 |
20080249982 | Lakowske | Oct 2008 | A1 |
Number | Date | Country | |
---|---|---|---|
20080295674 A1 | Dec 2008 | US |
Number | Date | Country | |
---|---|---|---|
60941192 | May 2007 | US |