The present invention relates generally to analyzing musical compositions represented in audio files/sources and more particularly to predicting and/or determining musical key information about the musical composition.
The capacity to accurately determine musical key information from a musical composition represented, for example, in a digital audio file has myriad applications. For instance, DJs and musicians often need accurate musical key information for audio sampling, remixing, or other DJ-related purposes. Specifically, musical key information can be used to create audio mash-ups, compose new songs, or overlay elements of one song with another song without experiencing a harmonic key clash. Although the need for musical key information is apparent, the method to obtain such information is not. Frequently, documentation concerning the musical composition, e.g. sheet music, is not available, thereby frustrating any efforts directed toward discovering musical key information about the composition.
Even without the necessary documentation, musical key information about a composition can be determined by an artisan with a “trained” ear. Simply by listening to a musical composition, the artisan can proffer a reasonably accurate conclusion as to the musical key information of the composition in question. Unfortunately, many are without such a skill set.
It is also known to use computer software to predict musical key information about a musical composition represented in an audio file. Representative software packages include Rapid Evolution available through Mixshare and MixMeister Studio marketed by MixMeister Technology, L.L.C. These software products allow an audio file or other source containing a musical composition to be analyzed for musical key information, although with varying degrees of success and utility.
Consider, for exemplary purposes, the following sequence illustrating one approach to extracting/predicting musical key information from a musical composition. Initially, the musical composition is decomposed into its constituent musical note components. The collection of constituent musical notes is then compared to a database of musical key templates, often twenty-four templates, one for each musical key. Each template in the database describes the notes most commonly associated with a specific key. To predict musical key information, the software selects the template, i.e. musical key, with the highest correlation to the collection of constituent musical notes from the subject audio file. Moreover, the software may also provide correlation or probability information describing the relationship between the collection of constituent musical notes and each of the templates.
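The template-correlation sequence above can be sketched as follows. The major/minor profile values are the widely published Krumhansl-Kessler numbers; the function names and the use of Pearson correlation are illustrative assumptions, not details drawn from any of the products mentioned:

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Krumhansl-Kessler major/minor key profiles (published reference values),
# stated with the tonic at index 0.
MAJOR_PROFILE = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                          2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR_PROFILE = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                          2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

def build_templates():
    """Rotate the two base profiles to all 12 tonics -> 24 key templates."""
    templates = {}
    for i, tonic in enumerate(PITCH_CLASSES):
        templates[f"{tonic} Major"] = np.roll(MAJOR_PROFILE, i)
        templates[f"{tonic} Minor"] = np.roll(MINOR_PROFILE, i)
    return templates

def predict_key(note_vector):
    """Select the template (musical key) with the highest correlation
    to the composition's note vector; also return all correlations."""
    templates = build_templates()
    scores = {key: np.corrcoef(note_vector, t)[0, 1]
              for key, t in templates.items()}
    return max(scores, key=scores.get), scores
```

A note vector identical to the C major profile correlates perfectly with the "C Major" template, so that key is selected.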
Unfortunately, the database of templates typically employed in these types of software applications is hampered by the style of compositions used to build the templates (styles or genres of music different from that used to generate the templates may distort the results) and the limited number of templates available, such as only twenty-four.
Thus, what is needed is a musical key detection system that can readily accommodate different musical styles, draw on a database containing as many templates as desired, and provide additional metrics from which to more accurately predict musical key information from a musical composition represented by digital audio signals.
The present invention is a system and method for predicting and/or determining musical key information about a musical composition represented by an audio signal. The system includes a database having a collection of reference musical works. Each of the reference musical works is described by both a root key value and a note strength profile. The root key identifies the tonic triad, the chord, major or minor, which represents the final point of rest for a piece, or the focal point of a section. The note strength profile, or relative note strength profile, describes the frequency, duration, and volume of every note in the reference musical work compared to other notes in the same musical work. Thus, for every reference musical work in the database, a corresponding root key and note strength profile exists. The root key and note strength profile may be determined through the same or different processes. For example, the root key may be determined by a neural network-based analysis of the reference musical work or by a skilled artisan with a trained ear listening to the song. The note strength profile may be determined by any number of software implemented algorithms. The database may include as many reference musical works as desired.
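A minimal sketch of one reference-work record as described above, assuming Python; the class name, field names, and sample values are illustrative, not taken from the invention:

```python
from dataclasses import dataclass

@dataclass
class ReferenceWork:
    """One record in the reference database (illustrative layout)."""
    title: str
    root_key: str        # e.g. "C Minor" -- the tonic triad, however determined
    note_strength: tuple # relative strength of each of the 12 pitch classes

# A hypothetical record; the profile values are made-up relative strengths.
work = ReferenceWork("Example Piece", "C Minor",
                     (0.21, 0.03, 0.09, 0.17, 0.02, 0.11,
                      0.04, 0.15, 0.06, 0.05, 0.04, 0.03))
```

Every record carries both metrics, so a learning algorithm can relate note strength profiles to root keys across the whole collection.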
The present invention also provides a musical key estimation system coupled to the database, or, alternatively worded, capable of accessing the database. The musical key estimation system includes a note strength algorithm, an association algorithm, and a target audio file input. The note strength algorithm operates to determine the note strength of the target audio file (the audio file or audio source containing the musical composition of interest). To avoid confusion, it should be noted that the structure/content of the note strength of the target audio file (i.e. musical composition) and the note strength profile of the reference musical works are comparable. Further, in the preferred embodiment, the note strength algorithm can also be used to determine the note strength profiles of the reference musical works. The target audio file input is an interface, whether hardware or software, adapted to accept/receive the target audio file to permit the musical key estimation system to analyze the target audio file (i.e. musical composition).
The association algorithm predicts musical key information about the target audio file given the note strength of the target audio file and the information, i.e. reference musical works characteristics, in the database. Specifically, the association algorithm functions to predict musical key information based on an input, the note strength of the target audio file, and the existing relationships defined in the database by corresponding root keys and reference musical work note strength profiles and between different reference musical works. The association algorithm allows the musical key estimation system to generate implicit musical key information from the database given the note strength of the target audio file.
The association algorithm may be comprised of two main components, a data mining model and a prediction query. The data mining model is a combination of a machine learning algorithm and training data, e.g. the database of reference musical works. The data mining model is utilized to extract useful information and predict unknown values from a known data set (the database in the present instance). The major focus of a machine learning algorithm is to extract information from data automatically by computational and/or statistical methods. Examples of machine learning algorithms include Decision Trees, Logistic Regression, Linear Regression, Naïve Bayes, Association, Neural Networks, and Clustering algorithms/methods. The prediction query leverages the data mining model to predict the musical key information based on the note strength profile of the target audio file.
One important aspect of the present invention is the ability to have a database with reference musical works described by both a root key and a note strength profile. This provides the association algorithm with a database having multiple metrics describing a single reference musical work from which to base predictions. However, the importance lies not only in this multiple metric aspect but also in a database that can be populated with a limitless number of reference audio files from any styles or genres of music. In essence, the robust database provides a platform from which the association algorithm can base musical key information predictions. This engenders the present invention with a musical key prediction/detection accuracy not seen in the prior art.
The present invention relates generally to analyzing musical compositions represented in audio files. More specifically, the present invention relates to predicting and/or determining musical key information about the musical composition based on the note strength of the composition in relation to a database of reference musical works, each reference musical work having a note strength profile and a root key value. A musical work or composition describes lyrics, music, and/or any type of audible sound.
Now referring to
The musical estimation system 12 includes an association algorithm 16, a note strength algorithm 18, and an audio file input 20. The audio file input 20 permits the musical estimation system 12 to access or receive the target audio file 32, the target audio file 32 containing/representing the musical composition of interest 38 (the composition for which musical key information is desired, hereinafter “musical composition” 38). The target audio file 32 can be of any format, such as WAV, MP3, etc. (regardless of the particular medium storing/transferring the file 32, e.g. CD, DVD, hard drive, etc.). The audio file input 20 may be a piece of hardware, such as a USB port, a CD/DVD drive, or an Ethernet card; it may be implemented via software; or it may be a combination of both hardware and software components. Regardless of the particular implementation, the audio file input 20 permits the musical key estimation system 12 to accept/access the musical composition 38.
The note strength algorithm 18 is used to determine the note strength 34 of the musical composition 38 and, as will be explained in more detail below, provides a description of the musical composition 38 on which the predicted key information may be based. The note strength 34 provides a measure of the frequency, duration, and volume of every note in the musical composition 38 compared to other notes in the same composition 38 and operates as a signature for the musical composition 38. Accordingly, in the preferred embodiment, the note strength 34 is based on the relative core note values, a value for each musical note A, Ab, B, Bb, C, D, Db, E, Eb, F, F#, and G.
However, it is also within the scope of the present invention for the note strength 34 to encompass only a subset of the relative core notes and values, such as when the musical composition 38 does not contain one or more of the relative core notes or when processing/speed concerns dictate that not all of the relative core notes and values be used. Further, the present invention also envisages a note strength 34 composed of a set of notes greater than the relative core notes; for instance, the note strength 34 may describe twenty-four or forty-eight notes. Even more generally, the note strength 34 may be composed of as many notes (e.g. frequency bands) as desired to effectively analyze the musical composition 38. For example, many modern pianos have a total of eighty-eight keys (thirty-six black and fifty-two white) and the note strength 34 may be composed of eighty-eight notes, one for each key on the piano. The set of notes comprising the note strength 34 is only constrained by the parameters of the association algorithm 16. Thus, if the association algorithm 16 accepts a note strength 34 with X number of elements, then the musical composition 38 may be segmented into X number of elements by the note strength algorithm 18.
Referring to
The tuning frequency of a musical piece is typically defined to be the pitch A4 or 440 Hertz. For the note strength 34 to provide a robust and meaningful description of the musical composition 38, the actual tuning frequency of the composition 38 should be accounted for (tuning frequencies may vary due to, for example, the use of historic instruments or timbre preferences, etc.). To this end, the note strength algorithm 18 extracts the tuning frequency in a pre-processing effort (step 56).
The pre-processing step may be accomplished, among other ways, by applying, in parallel, three banks of resonance filters, with their mid-frequencies spaced by one semi-tone (100 cents), to the audio signal. The mid-frequencies of the three banks are slightly shifted by a constant offset. The mean energy over all semi-tones is calculated, resulting in a three-dimensional energy vector, and the tuning frequency of the filter banks is adapted towards the maximum of the energy distribution. The final tuning frequency of the “middle” filter bank is then the result of this pre-processing step. A similar process is also described by Alexander Lerch, On the Requirement of Automatic Tuning Frequency Estimation, Proc. of the 7th Int. Conference on Music Information Retrieval (ISMIR 2006), Victoria, Canada, Oct. 8-12, 2006, which is hereby incorporated by reference.
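A greatly simplified sketch in the same spirit: score a few candidate tuning offsets by the spectral energy they capture at correspondingly shifted semitone frequencies, and keep the best. The adaptive three-filter-bank process described above is more elaborate than this FFT-based stand-in, and the offset values and note range below are assumptions:

```python
import numpy as np

def estimate_tuning(signal, sr, offsets_cents=(-30.0, 0.0, 30.0)):
    """Return the estimated A4 tuning frequency in Hz by testing a few
    candidate offsets (in cents) around the nominal 440 Hz standard."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / sr)
    best = max(offsets_cents, key=lambda c: _energy(spectrum, freqs, c))
    return 440.0 * 2 ** (best / 1200.0)

def _energy(spectrum, freqs, cents):
    """Sum spectral energy at semitone frequencies shifted by `cents`,
    over an assumed note range of C2..B6 (MIDI 36..95)."""
    total = 0.0
    for midi in range(36, 96):
        f = 440.0 * 2 ** ((midi - 69) / 12.0 + cents / 1200.0)
        idx = np.argmin(np.abs(freqs - f))  # nearest FFT bin
        total += spectrum[idx] ** 2
    return total
```

For a signal tuned to the 440 Hz standard, the zero-cent candidate captures the most energy, so 440 Hz is returned.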
Now that the actual tuning frequency is known, the tonal content, extracted from the frequency domain representation of the audio signal of the musical composition 38, can be converted into the pitch domain based on the actual tuning frequency of the musical composition 38—in essence, shifting the tonal content based on the actual tuning frequency, shown in step 58. The conversion results in a list of peaks with a pitch frequency and magnitude. This list is then converted into an octave-independent pitch class representation by summing all pitches that represent a C, C#, D, etc. from all octaves into one pitch chromagram vector that is 12-dimensional, one dimension for each pitch class, as shown in step 60. The pitch chromagram vector, visually represented in
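The fold from the list of pitch peaks into the 12-dimensional pitch chromagram vector (step 60) can be sketched as below; the peak-list format and the index convention (0 = C through 11 = B) are assumptions for illustration:

```python
import math

def pitch_chromagram(peaks, a4=440.0):
    """Fold a list of (frequency_hz, magnitude) pitch peaks into an
    octave-independent 12-dimensional pitch-class vector, summing all
    octaves of C into one bin, all octaves of C# into the next, etc."""
    chroma = [0.0] * 12
    for freq, mag in peaks:
        midi = round(69 + 12 * math.log2(freq / a4))  # nearest MIDI note
        chroma[midi % 12] += mag                      # fold octaves together
    return chroma
```

For example, peaks at C4 and C5 land in the same bin, while A4 lands in the bin for pitch class A.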
The database 14 includes a plurality of reference audio files 22 (also referred to as analyzed audio signals 22), each reference audio file 22 representing a musical work 36 (also referred to as a musical piece 36 or reference composition 36) and having a root key 24 and a note strength profile 26 or reference note strength profile 26. The note strength profile 26 of a musical work 36 is analogous to the note strength 34 of the musical composition 38 and, in the preferred embodiment, is obtained via the note strength algorithm 18 detailed above.
The root key 24 identifies the tonic triad, the chord, major or minor, which represents the final point of rest for a piece, or the focal point of a section. The root key 24 can be determined in numerous ways, such as by a neural engine after it has been trained by evaluating outcomes using pre-defined criteria and informing the engine as to which outcomes are correct based on the criteria, by documentation accompanying the reference audio file 22 or musical work 36, by the conclusion of an artisan with a trained ear, by the musician or composer of the work 36, etc. Consequently, and importantly, all musical works 36 in the database 14 are described by two disparate metrics: root key 24 and note strength profile 26.
The database 14 may be contained on a single storage device or distributed among many storage devices. Further, the database 14 may simply describe a platform from which the plurality of reference files 22 can be located or accessed, e.g. a directory. The plurality of reference files 22 contained within the database 14 may be altered at any time as new reference musical works or supplemental analyzed audio files are added, removed, updated, or re-classified.
The database 14 can be populated as depicted in
The association algorithm 16 predicts musical key information about the musical composition 38 by analyzing the note strength of the composition 34 in relation to both the root keys 24 and note strength profiles 26 of the plurality of reference audio files 22 (containing/representing the musical works 36). The association algorithm 16 of one embodiment is comprised of two main components: a data mining model 28 and a prediction query 30.
The data mining model 28 uses the pre-defined relationships between the root keys 24 and the note strengths profiles 26 and between different reference audio files 22 to generate/predict musical key information based on previously undefined relationships, i.e. a relationship between the note strength of the musical composition 38 and the reference audio files 22 or musical works 36. To realize this ability, the data mining model 28 relies on training data from the database 14, in the form of root keys 24 and note strength profiles 26, and a machine learning algorithm.
Machine learning is a subfield of artificial intelligence that is concerned with the design, analysis, implementation, and applications of algorithms that learn from experience; experience, in the present invention, is analogous to the database 14. Machine learning algorithms may, for example, be based on neural networks, decision trees, Bayesian networks, association rules, dimensionality reduction, etc. In the preferred embodiment, the machine learning algorithm (or association algorithm 16 more generally) is based on a Naïve Bayes model.
Bayesian theory is a mathematical theory that controls the process of logical inference. A form of Bayes' theorem is reproduced below:
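The equation itself does not survive in this text; the standard form of Bayes' theorem being referred to, using the same symbols defined below, is:

```latex
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
```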
Naïve Bayes models are well suited for basing predictions on data sets that are not fully developed. Specifically, Naïve Bayes models assume the attributes of the data set are independent of one another given the predicted value. This allows the above equation to be simplified as follows:
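The simplified equation is likewise missing from this text. Treating the note strength B as a set of attributes B_1, ..., B_n assumed independent given the key A, the simplification referred to would likely read:

```latex
P(A \mid B) = \frac{P(A)\,\prod_{i=1}^{n} P(B_i \mid A)}{P(B)}
```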
Where, in relation to the present invention, P(A|B) is the probability of a particular musical key given the note strength, P(B|A) is the probability of the note strength given a particular musical key, P(A) is the probability of a particular musical key, and P(B) is the probability of a particular note strength. Intuitively, P(B) would likely be zero, unless one of the plurality of reference audio files 22 (containing/representing the musical works 36) had exactly the same note strength/note strength profile as the musical composition 38, an unlikely scenario as the note strength is not restricted to a limited number of incarnations. Thus, the note strength profiles 26 are grouped into categories and it is the probability of these categories of note strength profiles that is used in the Naïve Bayes model for P(B).
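A toy illustration of this categorized Naïve Bayes prediction; the 'low'/'mid'/'high' discretization stands in for the grouped note strength profiles described above, and all names, the three-category scheme, and the Laplace smoothing are illustrative assumptions:

```python
import math
from collections import Counter, defaultdict

def train(records, n_categories=3):
    """records: list of (category_tuple, root_key), each category tuple a
    discretized note-strength profile. Count priors and per-position
    category frequencies for each key."""
    key_counts = Counter(key for _, key in records)
    feat_counts = defaultdict(Counter)   # (position, key) -> category counts
    for cats, key in records:
        for i, c in enumerate(cats):
            feat_counts[(i, key)][c] += 1
    return key_counts, feat_counts, len(records), n_categories

def predict_root_key(model, cats, smoothing=1.0):
    """Score each candidate key by log P(key) + sum_i log P(category_i | key),
    with Laplace smoothing so unseen categories do not zero out a key."""
    key_counts, feat_counts, n, n_categories = model
    def score(key):
        s = math.log(key_counts[key] / n)
        for i, c in enumerate(cats):
            counts = feat_counts[(i, key)]
            s += math.log((counts[c] + smoothing) /
                          (key_counts[key] + smoothing * n_categories))
        return s
    return max(key_counts, key=score)
```

With a few training records, a profile whose categories match the C Major examples is assigned that key.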
The prediction query 30 utilizes the data mining model 28 to predict musical key information based on the note strength of the target audio file 34. However, this process need not be recreated for every different application; rather it can be facilitated by commercially available software. For illustrative purposes, a SQL database management package, distributed by Microsoft®, could be employed to build the data mining model 28 and request information from the database 14 via the data mining model 28. Advantageously, the SQL package has an integral Naïve Bayes-based data mining model/tool. One specific implementation of a Naïve Bayes-based data mining model/tool is presented in U.S. Pat. No. 7,051,037 issued to Thomas et al., and is hereby incorporated by reference.
As is clear from
In another embodiment of the present invention, the association algorithm 16 can be based on data clustering (“Clusters”) instead of a data mining model/tool. Clustering partitions a large data set, e.g. the database 14, into smaller subsets according to predetermined criteria. This process is detailed in
An exemplary representation of a clusters database 15 having two C Minor clusters and two C Major clusters is depicted in
A prediction sequence based on this Clusters embodiment is shown in
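A minimal sketch of the Clusters-based prediction: each cluster gathers reference note strength profiles sharing a root key, and the composition is assigned the root key of the nearest cluster centroid. The use of Euclidean distance and per-key cluster labels are assumptions, as the text does not fix the clustering criteria:

```python
import math

def centroid(profiles):
    """Mean note-strength profile of one cluster's member profiles."""
    n = len(profiles)
    return [sum(p[i] for p in profiles) / n for i in range(len(profiles[0]))]

def nearest_cluster_key(note_strength, clusters):
    """clusters: dict mapping a cluster label (here its root key) to the
    member note-strength profiles; return the label of the cluster whose
    centroid lies closest to the composition's note strength."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(clusters,
               key=lambda k: dist(note_strength, centroid(clusters[k])))
```

A profile near the C Minor cluster's centroid is assigned C Minor even if it matches no member profile exactly.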
It should also be noted that the association algorithm 16 (whether via a Bayesian technique, Clusters technique, or other) can not only provide/predict the musical key with the highest probability or correlation to that of the musical composition 38 but also provide information about the probability or correlation for all other keys. In other words, the present invention can predict the likelihood of each possible key being the actual key of the musical composition 38.
Further, and once again independent of the particular technique employed, the operation of the musical key estimation system 12 can be described, in part, as generating a plurality of prospect values and using the prospect values to predict musical key information about the musical composition 38. Specifically, each distinct prospect value relates the note strength of the musical composition 34 to a distinct note strength profile of a musical work 26 (or group of musical works 26, as in the clusters method or the Naïve Bayes model). By evaluating the prospect values, the musical key estimation system 12 can select a candidate note strength profile (one particular note strength profile) from the plurality of note strength profiles 26 or grouped note strength profiles. The candidate note strength profile selected has a prospect value within an indicator range, the indicator range defining some metric, e.g. the highest or lowest correlation between the note strength and a note strength profile. The musical key estimation system 12 then provides the root key 24 corresponding to the candidate note strength profile as the output or result.
Moreover, as the association algorithm 16 can employ different techniques to predict/detect the musical key of the composition 38, the present invention also allows the results of the different techniques to be compared using a lift chart, a measure of the effectiveness of a predictive model calculated as the ratio between the results obtained with and without the predictive model. Thus, when different association algorithms 16 (using different techniques) vary in accuracy, the present invention can determine which technique (or, more precisely, which association algorithm 16 using a specific technique) is more accurate and base the prediction on the most effective technique.
The database 14 may also include a composition classification system 48. The composition classification system 48 provides a structure that permits the plurality of reference audio files 22 to be organized (or at least searchable) according to the type of musical work they represent—such as jazz, classical, rock, etc. In some instances, better predictions may result if the association algorithm 16 only bases its efforts on musical works 36 in the same genre or style as the musical composition 38. Thus, if the musical composition 38 is known to be a jazz song (classified, for example, in a first class) then the present invention permits the association algorithm 16 to only employ musical works 36 in the database 14 classified as jazz works or in the first class, as determined by the composition classification system 48. However, and more generally, the composition classification system 48 allows the association algorithm 16 to use any number or type/style/genre of classifications for its predictions whether or not the classification of any particular musical work 36 accords with the style or genre of the musical composition 38.
Although in most cases an entire musical composition will be analyzed to detect the musical key, the present invention also permits the musical composition 38 to be analyzed in segments of varying size. Further, as the present invention can analyze the musical composition 38 in segments, it can also report key changes that occur during the composition 38. Thus, if the key of the musical composition 38 changes from A Minor to E Minor, the present invention can report the change and the specific segment in the composition 38 where the change occurred.
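Segment-wise key-change reporting can be sketched as below; the prediction function is passed in, and the segmentation of the composition into per-segment note strengths is assumed to have happened upstream:

```python
def detect_key_changes(segment_note_strengths, predict_key):
    """Run any key-prediction function over per-segment note strengths and
    report (segment_index, key) at each point where the predicted key
    changes, including the opening key at segment 0."""
    changes, prev = [], None
    for i, ns in enumerate(segment_note_strengths):
        key = predict_key(ns)
        if key != prev:
            changes.append((i, key))
            prev = key
    return changes
```

With a toy predictor, a composition opening in A Minor and shifting mid-way is reported with the segment where the shift occurs.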
The audio file input 20 of the musical estimation system 12 is adapted to accept the target audio source 32. For example, if the target audio source 32 is a flash drive 32, the audio file input 20 may be a USB port 20 that receives the flash drive 32. Further, in this example, the musical key estimation system 12 may be a personal computer having a memory storage device, such as a first hard drive, that stores the association algorithm 16 and the note strength algorithm 18. The personal computer 12 may also provide the necessary control over the audio file input 20 (e.g. the USB port) to manipulate the target audio source 32 and provide the memory (e.g. the first hard drive, RAM, cache) and the processing power (e.g. the CPU) needed to execute the algorithms 16 and 18.
The database 14, containing the reference audio files 22, may be a separate storage device, e.g. another computer or a server, or it may be another component of the musical key estimation system 12, e.g. a second hard drive in the personal computer 12 or merely a part of the first hard drive. Irrespective of the configuration of the musical key estimation system 12 and the database 14, the association algorithm 16 is able to access and read the database 14 and the reference audio files 22 to generate/predict musical key information about the composition 38.
Once the association algorithm 16 has determined/predicted musical key information about the musical composition 38, the results may be reported on an output display 158, such as a computer monitor.
Thus, although there have been described particular embodiments of the present invention of a new and useful SYSTEM AND METHOD FOR PREDICTING MUSICAL KEYS FROM AN AUDIO SOURCE REPRESENTING A MUSICAL COMPOSITION, it is not intended that such references be construed as limitations upon the scope of this invention except as set forth in the following claims.
This application is a non-provisional application which claims benefit of co-pending U.S. Patent Application Ser. No. 60/945,311 filed Jun. 20, 2007, entitled “MUSICAL KEY DETECTION USING HUMAN TRAINING DATA” which is hereby incorporated by reference.