1. Field of the Invention
The present invention relates generally to data processing, pattern recognition and music composition. In particular, the invention provides an automated method of data regression and generation to produce original music compositions. The system utilizes existing musical compositions provided by a user and generates new compositions based upon a given input.
2. Background
Although the spirit of popular music has slowly eroded over the last several years, there exists a strong, industry-rooted motivation for research towards discovering an elusive “pop formula.” While the reward for discovering such a formula may be great, research in this field has, to date, not utilized popular music songs to create new compositions.
Much of the work done thus far in computational composition has been quite respectful of the role of the human being in the process of composition. From Lejaren Hiller (Hiller, L. & L. Isaacson, 1959, Experimental Music, McGraw Hill Book Co. Inc.) to David Cope (Cope, D., 1987, Experiments in Music Intelligence, Proceedings of the International Music Conference, San Francisco: Computer Music Ass'n.) and Michael Mozer (Mozer, M., Neural Network Music Composition by Prediction: Exploring the Benefits of Psychoacoustic Constraints and Multiscale Processing, Connection Science, 1994), researchers have likened their use of machinery in the creation of original works to the use that any artist makes of an inanimate tool. Hiller stated this view explicitly.
The field of research into computational methods of musical analysis and generation is quite broad. Early efforts towards the probabilistic generation of melody involved the random selection of segments of a discrete number of training examples (P. Pinkerton, Information Theory and Melody, Scientific American, 194:77-86, 1956). In 1957, Hiller, working with Leonard Isaacson, generated the first original piece of music made with a computer—the “Illiac Suite for String Quartet.” Hiller improved upon earlier methods by applying the concept of state to the process, specifically the temporal state represented in a Markov chain. Subsequent efforts by music theorists, computer scientists, and composers have maintained a not-too-distant orbit around these essential approaches: comprehensive analysis of a musical “grammar” followed by a stochastic “walk” through the rules inferred by the grammar to produce original material, which (it is hoped) evinces both some degree of creativity and some resemblance to the style and format of the training data.
In the ensuing years, various techniques were tried, ranging from the application of expert systems girded with domain-specific knowledge encoded by actual composers, to the modeling of music as auras of sound whose sequence is entirely determined by probabilistic functions (I. Xenakis, Musiques Formelles, Stock Musique, Paris, 1981).
The field enjoyed a resurgence in the 80's and 90's with the widespread adoption of the MIDI (Musical Instrument Digital Interface) format and the accessibility that format provides for composers and engineers alike to music at the level of data. In the world of popular music, the growth in popularity of electronica, trance, dub, and other forms of mechanically generated music has led to increased experimentation in computational composition on the part of musicians and composers. Indeed, in the world of video games, the music composed never ventures further than the soundboards of the computers on which it is composed. As far as the official record goes, however, even given all of the research that has gone into automatic composition and computer-aided composition, in the world of pop (which is a world of simple, catchy, ostensibly formulaic tunes) there is still no robotic Elvis, nor a similar system that allows for the composition of such musical pieces.
A search of the prior art uncovers systems that are designed to develop musical compositions as continuations of single musical inputs. These systems utilize single musical compositions as templates for a continuation of the melody, but do not create new compositions based upon the original input. Other systems utilize statistical methods for morphing one sound into another. While these systems utilize more than one input, their output is merely a new sound that begins with the original input and evolves into the second input. Such a basic system lacks the ability to create completely new compositions from more complex input such as a pop song. Other systems allow for the recognition of representative motifs that repeat in a given composition, but they do not create completely new compositions. As a result, there is a need for a system that can utilize multiple advanced compositions, such as pop songs, to create new musical pieces.
The present invention provides a system for the creation of music based upon input provided by the user. A user can upload a number of musical compositions into the system. The user can then select from a number of different statistical methods to be used in creating new compositions. The system utilizes a selected statistical method to determine patterns amongst the inputs and creates a new musical composition that utilizes the discovered patterns. The user can select from the following statistical methods: Radial Basis Function (RBF) Regression, Polynomial Regression, Hidden Markov Models (HMM) (Gaussian), HMM (discrete), Next Best Note (NBN), and K-Means clustering. After the existing musical pieces and the statistical method are chosen, the system develops a new musical composition. Lastly, when the user selects the “Listen” option, the program plays the new composition and displays a graphical representation for the user.
The various features of novelty that characterize the invention will be pointed out with particularity in the claims of this application.
The above and other features, aspects, and advantages of the present invention are considered in more detail, in relation to the following description of embodiments thereof shown in the accompanying drawings, in which:
The invention summarized above and defined by the enumerated claims may be better understood by referring to the following description, which should be read in conjunction with the accompanying drawings. This description of an embodiment, set out below to enable one to build and use an implementation of the invention, is not intended to limit the invention, but to serve as a particular example thereof. Those skilled in the art should appreciate that they may readily use the conception and specific embodiments disclosed as a basis for modifying or designing other methods and systems for carrying out the same purposes of the present invention. Those skilled in the art should also realize that such equivalent assemblies do not depart from the spirit and scope of the invention in its broadest form.
In an effort to solve the above-described problem, a computer application for the automatic composition of musical melodies is provided.
The user can then select the melodies that will be utilized to train the system. The user can select melodies one at a time, or select all of the melodies using a single button 121. The user can also instruct the system to select songs that appear on the list at specific intervals using another button 123, such as selecting every fifth song on the list to compile the training set. The user can also instruct the system to utilize only songs belonging to a designated identifier, using separate buttons such as B 127, C 129, D 131, or P 133, as the training set for creating a new composition.
Once the user selects the training set from the song list 103, a specific training approach 108 can be selected. A number of buttons are provided so the user can choose from a number of training methods: Regression-RBF 107, Regression-Polynomial 109, HMM (discrete) 111, HMM (Gaussian) 113, NBN (Next Best Note) 115, or K-Means 117. Each of these training methods is described in further detail below. Once the training method is selected, the user can specify parameters for the selected method, such as the “sigma” value 135, the “degree” 137, the number of discrete hidden states 139, the number and mix of hidden states 141, 143, or the number of centroids to consider 145. Having been given the inputs, the system is trained and produces an output file. A suitable programming language such as Python is used to translate the sequence of integers contained in the output file into a playable MIDI file. The newly created MIDI file can then be launched from the GUI 101. The file is launched by selecting the “listen” 119 option. The MIDI file is then played through any audio peripheral compatible with the system, and a graphical representation of the training results can be presented to the user as shown in
The output of the new song generation process is a file that can be stored in a subfolder of the application directory tree, or any location selected by the user. As stated previously, that file can then be translated into a MIDI file that can also be stored at a specific location in the application directory tree or a location selected by the user. Some embodiments of the present invention can allow the user to select locations for storage of the output file, the MIDI file, both files, or neither file. Some embodiments of the present invention allow the user to specify the name of the MIDI file. Other embodiments do not allow the user to specify the name of the file. If the embodiment does not allow the user to select a new name for the file, the file will be overwritten with every use of the application or given a generic name that changes when the application is subsequently utilized. When the user is ready to select a new training set, the user may clear the previous selections by selecting the “clear selection” button 125.
The output files are generated through a variety of different methods as shown in
In this method, a standard least-squares measure is used for the estimation of the empirical risk R(θ), which is minimized over θ:
Where N is the number of samples in the training set, y is a vector of outputs, X is a D-dimensional matrix of N rows of input, and θ represents the coefficient parameters used. In some embodiments, the variable X (representing the sequence of pitches) is unidimensional. To elevate the resulting equation from its simplistic linear output, a feature space of non-linear functions φ is introduced and applied to each input. The dimensions of X thus become the values of each input xi as transformed by each φi (for each i in the feature space). Under this model, Equation (1) becomes:
The minimization over θ is accomplished by computing the gradient ∇R of Equation (2) (essentially, taking partial derivatives of the equation), setting it to zero, and solving for θ. The resulting equation (in matrix form) simplifies to:
θ=(XᵀX)⁻¹Xᵀy (3)
Equation (3) is simply the pseudo-inverse of matrix X multiplied by the output vector y.
The first feature vector (which can be implemented and tested by setting the parameters and pressing the “Regression Polynomial” 109 button in the GUI 101) is simply an array of functions that raise each input xi to the power i, one power for each φi in the feature space. As an input parameter, the Regression Polynomial function 109 accepts an integer value for its “sigma” component 135.
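The pseudo-inverse solve of Equation (3) with the polynomial feature space above can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation; the pitch values are hypothetical, and `np.linalg.lstsq` is used as a numerically stable stand-in for forming (XᵀX)⁻¹Xᵀy explicitly.

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit via the pseudo-inverse of Equation (3)."""
    # Feature map phi: column i raises the input to the power i.
    X = np.vstack([x ** i for i in range(degree + 1)]).T
    # Stable equivalent of theta = (X^T X)^-1 X^T y.
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def predict(theta, x):
    """Evaluate the fitted polynomial at new inputs."""
    X = np.vstack([x ** i for i in range(len(theta))]).T
    return X @ theta

# Hypothetical melody fragment: note position -> MIDI pitch.
x = np.arange(8, dtype=float)
y = np.array([60, 62, 64, 65, 67, 65, 64, 62], dtype=float)
theta = fit_polynomial(x, y, degree=3)
print(np.round(predict(theta, x)))
```

Rounding the smoothed predictions back to integers yields a playable pitch sequence, mirroring the integer output file described above.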
Another method is the Radial Basis Function (RBF) Regression that can be selected by using the button 107 on the GUI 101. This method is more versatile than the basic Regression Polynomial. The RBF Regression generally takes the form of some weight multiplied by a distance metric from a given centroid to the data provided. In one embodiment of the present invention, the function utilized is Gaussian providing a normal distribution of output over a given range. As an input parameter, the RBF function 107 accepts an integer value for its “sigma” component or degree 137, which corresponds to the width of the Gaussian function involved. This function is represented by the formula:
An RBF Regression has the advantage of being more flexible than the simple polynomial regression because it takes into account its distance from the data at every point (centroid here corresponding to the individual input data points). This is an additive model, meaning that the output from each function is “fused” with the output of each succeeding and preceding function to generate a smoother graph. At smaller values of sigma, the output provides an accurate representation of the input.
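The additive Gaussian model described above can be sketched as follows, with each training point serving as a centroid. This is an illustrative NumPy version under that assumption, not the patent's code; the pitch data is hypothetical.

```python
import numpy as np

def fit_rbf(x, y, sigma):
    """Gaussian RBF regression: one basis function centred on every
    training point; weights solved by least squares."""
    Phi = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2.0 * sigma ** 2))
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

def predict_rbf(w, centers, x_new, sigma):
    """Sum the weighted Gaussian bumps at the query points."""
    Phi = np.exp(-((x_new[:, None] - centers[None, :]) ** 2) / (2.0 * sigma ** 2))
    return Phi @ w

# Hypothetical melody fragment: note position -> MIDI pitch.
x = np.arange(8, dtype=float)
y = np.array([60, 62, 64, 65, 67, 65, 64, 62], dtype=float)
w = fit_rbf(x, y, sigma=0.5)
print(np.round(predict_rbf(w, x, x, sigma=0.5)))
```

At this small sigma the narrow bumps barely overlap, so the fit essentially interpolates the training pitches, consistent with the observation above that small sigma yields an accurate representation of the input.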
The next available method for music generation utilizes Hidden Markov Models (HMM) both discrete and Gaussian, which can be selected by buttons 111 and 113. The Markov model has been used widely in the field of natural language recognition and understanding. The general Markov principle provides that the future is independent of the past, given the present. Although this principle may appear to be dismissive of the concept of history, it implies a strong regard for the temporal nature of data.
Inherent in this structure, however, is the conditional probability of node Z given node Y. Mathematically, this is presented as P(Z|Y). For the model shown in
p(X,Y,Z)=p(X)p(Y|X)p(Z|Y) (5)
This model differs from a probabilistic model in which the output of any node is equally likely—the case in which the entire set of outputs is independently and identically distributed (typically, and often cryptically, referred to as IID):
p(X,Y,Z)=p(X)p(Y)p(Z) (6)
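As a concrete numerical sketch of the chain-rule factorization of Equation (5), the following uses hypothetical two-symbol probability tables chosen purely for illustration:

```python
# Hypothetical distributions over binary-valued nodes, for illustration only.
p_x = {0: 0.6, 1: 0.4}
p_y_given_x = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p_z_given_y = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}}

def joint(x, y, z):
    """Equation (5): p(X, Y, Z) = p(X) p(Y|X) p(Z|Y)."""
    return p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]

# The factorized probabilities still sum to 1 over all outcomes.
total = sum(joint(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1))
print(joint(0, 1, 1), total)
```

Note that, unlike the IID case of Equation (6), the probability of Z here depends on which value Y took.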
It is often the case, when reviewing data for statistical analysis, that certain data points are observed and others remain unknown to us. This situation gave rise to the concept of the Hidden Markov Model, in which an n-th order Markovian chain stands “behind the scenes” and is held responsible for a sequence of outputs.
As an imaginary-world example, consider the Wizard of Oz (Baum, L. Frank, The Wonderful Wizard of Oz, George M. Hill, Chicago and N.Y., 1900). The flaming head and scowling visage of the Wizard in the grand hall of the Emerald City can be seen as occupying any of a sequence of output states X={x1, x2, x3, . . . , xn}, where x1 (for example) is his chilling cry of “SILENCE!” at the protestations of the Cowardly Lion. Meanwhile, the diminutive and somewhat avuncular figure of the old gentleman from Kansas, who stands frantically behind the curtain, yanking levers and pulling knobs, can be seen as occupying any of a number of “hidden” states Q={q1, q2, q3} which give rise to the output states mentioned above.
In this case, the old gentleman's transition from one state qt to the next state qt+1 is governed by a matrix of transition probabilities, which is typically chosen to be homogeneous (meaning that the probability of transition from one state to the next is independent of the time variable t). A graphic illustration of this model can be found in
The joint probability for the model is therefore given by:
The essential idea of the HMM is that we can determine likelihood of a given hidden state sequence and output sequence by assuming that there is a “man behind the curtain” at work in generating the sequence.
A classic example illustrates the principle embodied in the present invention. One can determine the probability of drawing a sequence of colored balls from a row of urns, each of which contains a specific number of differently-colored balls, if one knows how many balls of each color are in each urn, and the likelihood of moving from one urn to the next. Similarly, one can determine the probability of each urn containing a certain number of each color if one is shown enough sequences and told something about the probability of transitioning from urn to urn. (See Rabiner, Lawrence, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE, Vol. 77, No. 2, February 1989). As a result, generally, if one knows the number of hidden states, the likelihood of moving from one hidden state to the next, and the probability of emitting a given output symbol for each hidden state, then the world of the model is uncovered.
The HMM is a powerful tool for analyzing seemingly random sequences of emissions. In one embodiment of the present invention, the emission states correspond to a sequence of pitches. The preferred embodiment of the present invention estimates the transition matrix and then, given a set of training examples or emission sequences (the notes in the training songs), estimates the probabilities of emissions. The resulting model is then utilized to generate new data.
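The generation step described above—walking the hidden states and sampling an emission at each step—can be sketched as follows. The transition and emission matrices here are hypothetical stand-ins for the estimated ones, and the emitted symbols would index into a pitch table.

```python
import numpy as np

def sample_hmm(pi, A, B, length, rng):
    """Generate an emission sequence from a discrete HMM: walk the hidden
    states via transition matrix A, emitting symbols from rows of B."""
    state = rng.choice(len(pi), p=pi)
    out = []
    for _ in range(length):
        out.append(rng.choice(B.shape[1], p=B[state]))  # emit from current state
        state = rng.choice(A.shape[0], p=A[state])      # step the hidden chain
    return out

rng = np.random.default_rng(0)
pi = np.array([1.0, 0.0])               # always start in hidden state 0
A = np.array([[0.8, 0.2], [0.3, 0.7]])  # hidden-state transition probabilities
B = np.array([[0.9, 0.1, 0.0],          # state 0 favours symbol 0
              [0.0, 0.2, 0.8]])         # state 1 favours symbol 2
print(sample_hmm(pi, A, B, 16, rng))
```

In the described embodiment the matrices themselves would be estimated from the training songs before this sampling step is run.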
As shown in
Another technique utilized in one embodiment of the present invention is the Next Best Note (NBN). The NBN technique can be selected using button 115. In this approach, a kind of virtual grammar is induced from the dataset by examining, at each point of a song, the most likely next note given the current note. This can be viewed as a first-order Markovian approach. The interesting aspect of this model is that the generated output tends to represent the original dataset more faithfully. In addition, it provides an improved training strategy across multiple songs. In this approach, a matrix of N×177 is created, where N is the number of songs in the training set and 177 is the normalized song length. Each song is encoded with a fixed “start note” that is outside the range of notes present in any of the songs. The application then stochastically selects from among the most common next notes. This process continues for each selected note until the end of the song.
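A minimal sketch of the NBN approach follows. The pitch data is hypothetical, and the fixed “start note” is represented here by −1, a value outside the MIDI pitch range; this is an illustration of the technique, not the patent's code.

```python
import random
from collections import defaultdict

START = -1  # fixed "start note", outside the range of notes in any song

def train_nbn(songs):
    """Count, for every note, how often each note follows it across all songs."""
    counts = defaultdict(lambda: defaultdict(int))
    for song in songs:
        prev = START
        for note in song:
            counts[prev][note] += 1
            prev = note
    return counts

def generate_nbn(counts, length, rng=random):
    """Stochastically pick each next note in proportion to how often it
    followed the current note in the training set."""
    note, out = START, []
    for _ in range(length):
        # Fall back to the start-note distribution if a note was only terminal.
        successors = counts.get(note) or counts[START]
        notes, weights = zip(*successors.items())
        note = rng.choices(notes, weights=weights)[0]
        out.append(note)
    return out

songs = [[60, 62, 64, 62, 60], [60, 64, 62, 60, 59]]
print(generate_nbn(train_nbn(songs), 8))
```

Because the weights are raw follow counts, notes that frequently follow the current note in the training set are proportionally more likely to be selected, which is what keeps the output close to the original dataset.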
Another technique utilized to create new melodies is the K-means clustering 117, as shown in
The process continues until convergence. In one embodiment of the present invention, the algorithm is run twenty times, choosing from among the twenty results the centroids that produce the minimum value for J, where J is determined as the sum, across all points and all centroids, of the Euclidean distance of the points to the centroids, or:
The user of one embodiment of the present invention can specify the number of clusters/centroids 145 he or she would like to examine. After identifying the clusters, each segment is fed to the discrete HMM, which generates output based on its estimation. The K-means algorithm according to the present invention identifies a certain segmentation within the song and the HMM (at its finer level of granularity) is able to extract intra-segment patterns that yield more aesthetically pleasing melodies. As described below,
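The multi-restart selection described above can be sketched as follows: a minimal NumPy version for one-dimensional pitch data (where the Euclidean distance reduces to an absolute difference), with hypothetical inputs; it is an illustration, not the patent's implementation.

```python
import numpy as np

def kmeans(points, k, rng, iters=50):
    """One K-means run: assign points to the nearest centroid, move each
    centroid to its cluster mean, repeat; returns centroids and objective J."""
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        d = np.abs(points[:, None] - centroids[None, :])  # 1-D distances
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # guard against an empty cluster
                centroids[j] = points[labels == j].mean()
    J = np.abs(points - centroids[labels]).sum()
    return centroids, J

def best_of_restarts(points, k, restarts=20, seed=0):
    """Run K-means twenty times and keep the centroids with minimum J."""
    rng = np.random.default_rng(seed)
    runs = [kmeans(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda r: r[1])

# Hypothetical pitches forming two clusters around 61 and 72.
pitches = np.array([60.0, 61.0, 62.0, 71.0, 72.0, 73.0])
centroids, J = best_of_restarts(pitches, k=2)
print(sorted(centroids), J)
```

In the described embodiment the segments identified this way would then be handed to the discrete HMM for finer-grained pattern extraction.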
While the present invention can be used in the traditional approach (i.e., the production of music utilizing complex musical forms of classical—or at least historic—origin), it can also function by drawing from a very different corpus, i.e., popular music. One dataset utilized by the present invention consists of 46 pieces of pop music written by the Beatles (excepting Sir Ringo Starr). Given this dataset, ostensibly much reduced in complexity and theoretically possessing a tangible formulaic quality, the present invention demonstrates that truly aesthetic pop songs (to the ear of a human listener) can be generated using a variety of statistical techniques.
As shown on
The songs are normalized using an open-source library of MIDI conversion and decoding tools (such as that found at http://www.mxm.dk/products/public/pythonmidi). As explained earlier, normalization reduces the note count to the lowest common denominator (177) and applies uniformity to note duration (each note is transformed to an eighth note, regardless of its previous duration).
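The normalization step can be sketched in plain Python as below. The exact API of the MIDI library is not reproduced here, and how the note count is reduced to 177 is not fully specified in the text, so this sketch simply truncates each pitch sequence to the common length and discards duration—both assumptions made for illustration.

```python
COMMON_LENGTH = 177  # lowest common note count across the training set, per the text

def normalize(songs, length=COMMON_LENGTH):
    """Reduce every song to the common note count and drop duration
    information (every note is treated as a uniform eighth note),
    keeping only the pitch sequence."""
    out = []
    for song in songs:
        pitches = [pitch for pitch, _duration in song]  # durations discarded
        out.append(pitches[:length])                    # cut to common length
    return out

# Hypothetical (pitch, duration) pairs decoded from two MIDI files.
songs = [[(60, 0.5), (62, 1.0), (64, 0.25)], [(55, 2.0), (57, 0.5)]]
print(normalize(songs, length=2))
```

The resulting uniform integer sequences are what the training methods above consume.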
A graphical depiction of the output of the system utilizing the Polynomial Regression method 109 trained at degree 6 with the song “When I'm 64” is shown in
The invention has been described with references to exemplary embodiments. While specific values, relationships, materials and steps have been set forth for purposes of describing concepts of the invention, it will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the invention as shown in the specific embodiments without departing from the spirit or scope of the basic concepts and operating principles of the invention as broadly described. It should be recognized that, in the light of the above teachings, those skilled in the art can modify those specifics without departing from the invention taught herein. Having now fully set forth the preferred embodiments and certain modifications of the concept underlying the present invention, various other embodiments as well as certain variations and modifications of the embodiments herein shown and described will obviously occur to those skilled in the art upon becoming familiar with such underlying concept. It should be understood, therefore, that the invention may be practiced otherwise than as specifically set forth herein. Consequently, the present embodiments are to be considered in all respects as illustrative and not restrictive.
This application is based upon and claims benefit of copending and co-owned U.S. Provisional Patent Application Ser. No. 60/927,998 entitled “Music Analysis and Generation Method”, filed with the U.S. Patent and Trademark Office on May 4, 2007 by the inventor herein, the specification of which is incorporated herein by reference.