The present disclosure relates to content selection and organization systems.
Individual pieces of music are identified herein as “songs” for simplicity, regardless of whether such songs actually involve any form of human singing. Rather, a song is an individual piece of music that has a beginning and an end, regardless of its length, the type of music being played therein, whether it is instrumental, vocal or a combination of both, and regardless of whether it is part of a collection of songs, such as an album, or by itself, a single.
Traditional content selection systems, especially music selection systems such as APPLE ITUNES, tend to rely on content types based on style, genre, content author(s), content performer(s), etc., for enabling users to browse through vast libraries of content and make selections to watch, listen, rent, buy, etc. For example, in such music selection systems, the music is often organized by the genre, style or type of music, e.g., jazz, classical, hip hop, rock and roll, electronic, etc., and within such genres, the music may be further classified by the artist, author, record label, era (e.g., '50s rock), etc.
Some music selection systems will also make recommendations for music based on user preferences and other factors. Pandora Media, Inc.'s PANDORA radio system, for example, allows users to pick music based on genre and artists, and will then recommend additional songs the user may be interested in listening to based on its own identification system, which is derived from the Music Genome Project. While the details of the Music Genome Project do not appear to be publicly available, certain unverified information about it is available on-line. For example, Wikipedia states that the Music Genome Project uses over 450 different musical attributes, combined into larger groups called focus traits, to make these recommendations. There are alleged to be thousands of focus traits, including rhythm syncopation, key tonality, vocal harmonies, and displayed instrumental proficiency. See http://en.wikipedia.org/wiki/Music_Genome_Project.
According to Wikipedia, under the Music Genome Project each song is represented by a vector (a list of attributes) containing up to 450 attributes or “genes,” as noted above. Each gene corresponds to a characteristic of the music, for example, gender of lead vocalist, level of distortion on the electric guitar, type of background vocals, etc. Different genres of music will typically have different sets of genes, e.g., 150 genes for some types of music, 350 to 400 genes for other types, and as many as 450 genes for some forms of classical music. Each gene is assigned a number between 0 and 5, in half-integer increments. The assignment is performed by a human in a process that takes 20 to 30 minutes per song. Some percentage of the songs is further analyzed by other humans to ensure conformity. Distance functions are used to develop lists of songs related to a selected song based on the vector assigned to the selected song.
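By way of illustration only, a vector-based matching scheme of this general kind might look like the following sketch; the actual genes and distance function used by the Music Genome Project are not public, so the gene names and the Euclidean metric here are assumptions.

```python
# Hypothetical sketch of vector-distance matching of the kind described above;
# the gene names and the Euclidean metric are assumptions, not the Music
# Genome Project's actual attributes or distance function.
import math

def gene_distance(song_a: dict, song_b: dict) -> float:
    """Euclidean distance over the genes the two songs share."""
    shared = set(song_a) & set(song_b)
    return math.sqrt(sum((song_a[g] - song_b[g]) ** 2 for g in shared))

# Genes are scored 0-5 in half-integer increments, per the description above.
seed = {"syncopation": 3.5, "vocal_harmony": 2.0, "distortion": 1.5}
candidates = {
    "song_x": {"syncopation": 3.0, "vocal_harmony": 2.5, "distortion": 1.0},
    "song_y": {"syncopation": 0.5, "vocal_harmony": 5.0, "distortion": 4.5},
}
# Rank candidate songs by distance to the seed song (closest first).
ranked = sorted(candidates, key=lambda name: gene_distance(seed, candidates[name]))
print(ranked)  # ['song_x', 'song_y']
```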
While the Music Genome Project represents an ambitious and detailed identification system, it suffers from many shortcomings as a result of its inherent complexity. The most significant of these deficiencies is that, as implemented by PANDORA, it often recommends songs as being similar to other songs, but listeners of those songs are not capable of identifying why those songs were determined to be similar. There may be very good reasons, among the hundreds of attributes being used to make determinations of similarity between the songs, but those similarities do not appear to relate to what most listeners hear or feel. Accordingly, a better, simpler solution is needed.
Human identification relies on human perception, which is inherently subjective. Human perception appears to be involved because songs identified by a particular mood and a particular color may or may not sound anything like other songs identified by the same mood and color. This tends to indicate that human perception introduces a subjective error factor when music is identified in this manner.
A content selection system and method are disclosed for identifying and organizing moods in content using objectively measured scores for rhythm, texture and pitch (RTP) that are clustered into six mood classifications based on an objective analysis of the measured scores. Digitized representations of the content may also be identified and organized based on the content's frequency data, three-dimensional shapes derived from the digitized representations, and colors derived from the frequency data. Each piece of content may be identified by at least a mood shape, but may also be identified by a mood color and/or a mood based on the clustered RTP scores and/or the digitized representation. In a further embodiment, the RTP-based mood classifications may be used in place of fingerprints and combined with color and shape. Users of the selection system may be able to view the moods identified in the different manners, or combinations of two or three mood-identifying manners, and select, customize and organize content based on the identified moods.
Throughout the drawings, reference numbers may be re-used to indicate correspondence between referenced elements. The drawings are provided to illustrate examples described herein and are not intended to limit the scope of the disclosure.
Embodiments of the present disclosure are primarily directed to music selection and organization, but the principles described herein are equally applicable to other forms of content that involve sound, such as video. In particular, embodiments involve a content identification or classification system that objectively identifies music based on a three-part classification model, a rhythm, texture and pitch (RTP) model, or a combination of both.
The three parts of the three-part classification model include fingerprint, color and shape. The generation of the fingerprint will be described first. Music, and songs in particular, may be represented in a number of different ways that provide a visual representation of the music. As illustrated in
While such a waveform can be somewhat distinctive of the song represented, the amount of information conveyed by the small distortions in the waveform is limited, making it difficult for someone viewing the waveform to extract much in the way of perceptual information. If that song evoked a mood in someone listening to the song, the 1-D waveform does little to represent the characteristics of the song that evoke that mood.
Accordingly, audio spectrograms based on a short-term Fourier transform, such as represented in
The spectrogram is a two dimensional (2-D) representation of frequency over time, like a waveform, but is considered to provide a more accurate representation of the song because the spectrogram shows changes in intensity on specific frequencies, much like a musical score. The 2-D spectrogram shows some visual distinction based on signal differences due to different audio sources, such as different persons' voices and different types of instruments used to perform the song.
While the spectrogram visually represents some similarities and differences in the music, the time-domain signal representation makes the process of comparing spectrograms using correlation slow and inaccurate. One solution proposed for analyzing the characteristics of spectrogram images is disclosed by Y. Ke, D. Hoiem, and R. Sukthankar, Computer Vision for Music Identification, In Proceedings of Computer Vision and Pattern Recognition, 2005. In this paper, the authors propose determining these characteristics based on: “(a) differences of power in neighboring frequency bands at a particular time; (b) differences of power across time within a particular frequency band; (c) shifts in dominant frequency over time; (d) peaks of power across frequencies at a particular time; and (e) peaks of power across time within a particular frequency band.” Different filters are used to isolate these characteristics from the audio data. If the audio data is formatted in a particular music format, such as MP3, .WAV, FLAC, etc., the compressed audio data would first be uncompressed before creating the spectrogram and applying the filters.
One solution for analyzing spectrograms of music in this fashion is the CHROMAPRINT audio fingerprint used by the ACOUSTID database. CHROMAPRINT converts input audio to a sampling rate of 11025 Hz and processes it with a frame size of 4096 samples (0.371 s) and ⅔ overlap. CHROMAPRINT then processes the converted data by transforming the frequencies into musical notes, represented by 12 bins, one for each note, called “chroma features”.
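As an illustration of the spectrogram-to-chroma step described above, the following sketch computes a 12-bin chromagram with the librosa library rather than with CHROMAPRINT itself; the hop length approximating the stated ⅔ overlap is an assumption, and CHROMAPRINT's additional filtering and encoding steps are not shown.

```python
# A minimal sketch of computing a chromagram from audio, using librosa rather
# than CHROMAPRINT itself; the hop length below approximates the stated 2/3
# frame overlap and is an assumption.
import librosa

def chromagram(path: str):
    # Resample to 11025 Hz, matching the CHROMAPRINT parameters noted above.
    y, sr = librosa.load(path, sr=11025, mono=True)
    # Frame size 4096 (~0.371 s); 2/3 overlap means the hop is 1/3 of a frame.
    return librosa.feature.chroma_stft(y=y, sr=sr, n_fft=4096, hop_length=4096 // 3)

# chroma = chromagram("song.mp3")   # shape: (12 note bins, n_frames)
```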
While the audio representation, or chromagram, of
The arrangement of filter images from
CHROMAPRINT uses 16 filters that can each produce an integer that can be encoded into 2 bits. When these are combined, the result is a 32-bit integer. This same process may be repeated for every subimage generated from the scanned image, resulting in an audio fingerprint, such as that illustrated in
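The combination step may be illustrated as follows; this sketch assumes each filter's output has already been quantized to a 2-bit code and shows only how 16 such codes pack into a single 32-bit integer.

```python
# Illustrative sketch of combining 16 two-bit filter outputs into one 32-bit
# subfingerprint value, as described above; the filters themselves and their
# quantization thresholds are outside this sketch.
def pack_filter_outputs(codes):
    """codes: 16 integers in the range 0-3, one per filter."""
    assert len(codes) == 16 and all(0 <= c <= 3 for c in codes)
    value = 0
    for code in codes:
        value = (value << 2) | code   # append each 2-bit code
    return value                      # fits in 32 bits

print(hex(pack_filter_outputs([3, 0, 1, 2] * 4)))  # 0xc6c6c6c6
```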
Audio or acoustic fingerprints may be used to identify audio samples, such as songs, melodies, tunes, advertisements and sound effects. This may enable users, for example, to identify the name of a song, the artist(s) that recorded the song, etc., which can then be used to monitor copyright compliance, licensing compliance, monetization schemes, etc.
The present disclosure proposes a number of additional novel uses for audio fingerprints and other components used to generate the fingerprints, including the spectrogram and the chromagram, which may be used to objectively identify texture, pitch and the moods in songs or other content, respectively. When a listener feels a particular emotion as a result of listening to a song (except for any personal connection to memories or experiences that the listener may also have), the listener is typically reacting to some inherent quality that can be identified within the frequencies or other characteristics of the song. Since all aspects of a song may be represented in its frequencies and aspects of those frequencies, and those frequencies are used to generate an audio fingerprint, the mood of that song may therefore also be represented in that audio fingerprint. By comparing known similarities between audio fingerprints for songs having the same mood, it may be possible to identify the mood of a song by simply analyzing the audio fingerprint.
Since some songs represent multiple moods or more or less of a mood than other songs, the degree of one or more moods represented by a song may be represented by similarity percentages. For example, if a song is 40% aggressive, 40% sad, and 20% neutral, the mood identifiers may also be associated with those similarity percentages.
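One possible way to record such similarity percentages alongside a song, offered only as an illustrative data layout (the field names are assumptions), is shown below.

```python
# Hypothetical per-song record carrying multiple moods with similarity
# percentages, matching the example above; field names are assumptions.
song_moods = {
    "title": "Example Song",
    "moods": [
        {"mood": "aggressive", "percent": 40},
        {"mood": "sad", "percent": 40},
        {"mood": "neutral", "percent": 20},
    ],
}
dominant = max(song_moods["moods"], key=lambda m: m["percent"])
print(dominant["mood"])  # 'aggressive' (ties broken by list order)
```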
While the use of audio fingerprints to identify moods in a wide range of songs is a significant improvement over the existing technique of relying on humans to listen to songs in order to identify the mood(s) conveyed by the songs, there is still a causality dilemma. In order to get a classifying/clustering machine to use audio fingerprints to identify different moods represented in songs, it is first necessary to train the machine with audio fingerprints that represent different moods in the songs, which requires humans to listen to some set of songs to identify the moods in that set so that set can be used to train the machine. As a result, if the machine is not well trained, then the audio fingerprint based identification may not be accurate. This human element also reintroduces some of the possible subjective error that exists from reliance on human-based mood identification.
The present disclosure also addresses these problems by using at least three different mood identification techniques to identify each song. While a human element may be used to get the process started, using multiple mood identification techniques, such as audio fingerprinting, makes it possible to check each mood identification technique against at least two other different mood identification techniques, which results in better mood identification accuracy.
One additional mood identification technique, as previously noted, is the use of color. Color is more traditionally thought of in the context of timbre, which is the term commonly used to describe all of the aspects of a musical sound that have nothing to do with the sound's pitch, loudness or length. For example, two different musical instruments playing the same note, at the same loudness and for the same length of time may still be distinguished from one another because each of the instruments is generating a complex sound wave containing multiple different frequencies. Even small differences between these frequencies cause the instruments to sound different, and this difference is often called the “tone color” or “musical color” of the note. Words commonly used to identify musical color include clear, reedy, brassy, harsh, warm, resonant, dark, bright, etc., which are not what humans tend to think of as visual colors.
Nevertheless, a similar technique may be used to assign visual colors to music. The human eye has three kinds of cone cells that sense light with spectral sensitivity peaks in long, middle and short wavelengths, denoted L, M and S. Visual colors similarly correspond to these wavelengths, with blue in the short wavelength, green in the middle wavelength, and red spanning between the middle wavelength and the long wavelength. The three LMS parameters can be represented in three-dimensional space, called “LMS color space,” to quantify human color vision. LMS color space maps a range of physically produced colors to objective descriptions of color as registered in the eye, which are called “tristimulus values.” The tristimulus values describe the way three primary colors can be mixed together to create a given visual color. The same concept may be applied to music by analogy. In this context, musical tristimulus measures the mixture of harmonics in a given sound, grouped into three sections. The first tristimulus may measure the relative weight of the first harmonic frequency; the second tristimulus may measure the relative weight of the 2nd, 3rd, and 4th harmonics taken together; and the third tristimulus may measure the relative weight of all the remaining harmonics. Analyzing musical tristimulus values in a song may make it possible to assign visual color values to a song.
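A minimal sketch of the tristimulus grouping just described, given a list of harmonic magnitudes, follows; how the harmonics are extracted from a song and how the three values map to visual colors are left open here.

```python
# A minimal sketch of the musical tristimulus grouping described above, given
# harmonic magnitudes for a note; extraction of the magnitudes from a song and
# the mapping of the values to visual colors are assumptions left open here.
def musical_tristimulus(harmonics):
    """harmonics: magnitudes of the 1st, 2nd, 3rd, ... harmonic partials."""
    total = sum(harmonics)
    if total == 0 or len(harmonics) < 5:
        raise ValueError("need at least five harmonics with nonzero energy")
    t1 = harmonics[0] / total            # relative weight of the 1st harmonic
    t2 = sum(harmonics[1:4]) / total     # 2nd, 3rd and 4th harmonics together
    t3 = sum(harmonics[4:]) / total      # all remaining harmonics
    return t1, t2, t3

print(musical_tristimulus([5.0, 3.0, 2.0, 1.0, 0.5, 0.25]))
```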
Since the musical tristimulus values correspond to the frequencies represented in a song, and (as noted above) the frequencies can be used to identify moods in that song, it follows that the visual colors identified in that song can be used to identify the mood(s) of that song. However, because songs consist of multiple frequencies, simple visual color representations of songs tend to end up looking like mixed up rainbows, with the entire visual color spectrum being represented at once. To address this issue, each visual color representation can be further analyzed to identify predominant visual colors and to delete visual colors that are less dominant, resulting in a smaller (more representative) set of visual colors, such as six colors, representing a mood of each song.
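The reduction to a smaller set of predominant visual colors might be sketched as follows; the quantization of the color representation to discrete color labels is an assumption made for clarity.

```python
# Illustrative sketch of reducing a full color representation to its six most
# predominant colors, as described above; quantizing the representation to
# named color labels beforehand is an assumption.
from collections import Counter

def predominant_colors(color_samples, keep=6):
    """color_samples: iterable of color labels sampled from the representation."""
    counts = Counter(color_samples)
    return [color for color, _ in counts.most_common(keep)]

sample = ["red"] * 40 + ["blue"] * 30 + ["green"] * 5 + ["violet"] * 2
print(predominant_colors(sample))  # most frequent colors first
```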
In a similar manner, melody, rhythm and harmony may also be mapped to colors. For example, sound without melody, harmony, or rhythm is known as white noise. A song with lots of rhythm, harmony and melody may be thought of as being black. Sounds that have equal attributes of melody, rhythm and harmony, may therefore be thought of as being gray. Harmonious tones without melody or rhythm may be mapped to a specific color, such as yellow. Music that includes melody and harmony, but no driving rhythm, may be considered green. Music with lots of melody, but little rhythm or harmony, may be considered cyan. A simple melody with a hard driving rhythm may be blue. Music with lots of rhythm, some melody, and some harmony, may be purple. Music with lots of rhythm and some harmony, but little melody, may be red. The above color association is just an example and other color associations may readily be used. The point is that combining this form of music colorization with the musical tristimulus colorization technique may result in songs being identified with more predictable color identifiers.
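The example color association above may be sketched in code as follows; the numeric thresholds standing in for “lots” and “little” are assumptions.

```python
# Sketch of the example melody/rhythm/harmony-to-color association described
# above; the 0.33/0.66 thresholds standing in for "little" and "lots" are
# assumptions, and other color associations may readily be used.
def _high(x: float) -> bool:
    return x >= 0.66

def _low(x: float) -> bool:
    return x <= 0.33

def mrh_color(melody: float, rhythm: float, harmony: float) -> str:
    """Inputs are normalized 0.0-1.0 levels of melody, rhythm and harmony."""
    if _low(melody) and _low(rhythm) and _low(harmony):
        return "white"    # sound without melody, harmony or rhythm
    if _high(melody) and _high(rhythm) and _high(harmony):
        return "black"    # lots of rhythm, harmony and melody
    if abs(melody - rhythm) < 0.1 and abs(rhythm - harmony) < 0.1:
        return "gray"     # roughly equal attributes of all three
    if _high(harmony) and _low(melody) and _low(rhythm):
        return "yellow"   # harmonious tones without melody or rhythm
    if _high(melody) and _high(harmony) and _low(rhythm):
        return "green"    # melody and harmony, but no driving rhythm
    if _high(melody) and _low(rhythm) and _low(harmony):
        return "cyan"     # lots of melody, little rhythm or harmony
    if _high(rhythm) and _low(harmony) and not _low(melody):
        return "blue"     # a simple melody with a hard driving rhythm
    if _high(rhythm) and _low(melody) and not _low(harmony):
        return "red"      # lots of rhythm and some harmony, little melody
    if _high(rhythm):
        return "purple"   # lots of rhythm, some melody and some harmony
    return "gray"         # fall back to gray for in-between mixtures

print(mrh_color(melody=0.8, rhythm=0.1, harmony=0.2))  # 'cyan'
```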
According to a research study published in the journal BMC Medical Research Methodology (Feb. 2, 2010), people on average tend to pick different colors to describe different moods.
Using any appropriate color/mood association chart, it is thus possible to identify the mood(s) represented by a song from the color identifiers for that song. The mood(s) identified for a song using the color technique can then be compared to the mood(s) identified for the same song using audio fingerprints or other mood identifying techniques. There may be a correlation between the two, but not always, which illustrates that at least one of the techniques may be less accurate, which is why at least one additional mood identification technique may be helpful in order to triangulate the mood identifiers for a song. For example, it may be difficult to accurately identify the location of an object based on two reference points (signals between a mobile device and two cell towers, for example), but when three reference points are used, the object's location can usually be fairly accurately identified, absent any signal interference, or the utilization of techniques for eliminating or accommodating for such interference. The same principles may be applied with moods. With two mood identifiers for a song, it may be possible to identify the mood of the song, but that identification may not always be accurate. When a third mood identifier is added, however, the accuracy may increase significantly. Additional mood identifiers may likewise be added to further increase accuracy, up to a certain point, where the addition of further mood identifiers makes no significant statistical difference.
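A minimal sketch of checking agreement among three independent mood identifiers for the same song follows; the simple majority rule is an assumption, and any comparable agreement test could be substituted.

```python
# A minimal sketch of triangulating the mood of a song from three independent
# mood identifiers, as described above; the majority rule is an assumption.
from collections import Counter

def triangulate_mood(fingerprint_mood, color_mood, shape_mood):
    votes = Counter([fingerprint_mood, color_mood, shape_mood])
    mood, count = votes.most_common(1)[0]
    # Two or more identifiers agreeing is treated as a match; otherwise the
    # result is left undetermined for further review.
    return mood if count >= 2 else None

print(triangulate_mood("happy", "happy", "calm"))  # 'happy'
print(triangulate_mood("happy", "sad", "calm"))    # None
```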
A third mood identification technique for music includes shape. One type of system that generates shapes associated with songs is the class of software programs known as “visualizers,” such as VSXU, MILKDROP, GFORCE and multiple different plugins for APPLE's ITUNES. Such visualizers tend to use the same audio frequency data utilized to generate audio fingerprints, so the resulting images may be similarly useful in terms of identifying moods. However, despite the fact that the same frequency data may be used as an input to an audio fingerprint system and a visualizer system, some visualizers also use loudness as an input, which may cause the visualizer to identify different moods than the audio fingerprint system using the same frequency data, where amplitude may not be used to generate the audio fingerprint. In addition, each visualizer may analyze the frequency data differently and may therefore identify different moods.
Since it is desirable to have a fairly high level of mood identification correlation between the different systems used, it may be problematic if the visualizer identifies different moods than the other mood identification systems. Accordingly, mood identification systems may be chosen based on the level of correlation between the moods each system identifies.
While a visualizer may be used as a mood identification system, some visualizers are designed to create different visualizations for each song every time the program is run, so some visualizers may not be well suited for mood identification for this reason. In addition, some visualizers produce very complex imagery associated with songs. For example, one of the simpler screenshots generated by the GFORCE visualizer, and made available on the SoundSpectrum, Inc. website, http://www.soundspectrum.com/, is illustrated in black and white in
Similarly complicated and unique shapes may be generated from songs in much the same way that images may be generated from fractals. For example, in HARLAN J. BROTHERS, INTERVALLIC SCALING IN THE BACH CELLO SUITES, Fractals 17:04, 537-545, Online publication date: 1 Dec. 2009, it was noted that the cello suites of Johann Sebastian Bach exhibit several types of power-law scaling, which can be considered fractal in nature. Such fractals are based on melodic interval and its derivative, melodic moment, as well as a pitch-related analysis. Although complicated shapes, such as the images generated by the GFORCE visualizer or by a fractal-based system, may be used for mood identification, one issue is that the images may be too complicated for many users to reliably identify a mood from them.
Accordingly, the present disclosure describes a simpler type of imagery that may be utilized to identify the moods in songs, and which may be combined with the color mood identification system described above. The present disclosure uses audio fingerprints, instead of the frequency data used to generate the audio fingerprints (although the frequency data may be used instead of the fingerprint), to generate simple, three-dimensional geometric shapes for each song. By using the fingerprint itself, although analyzed in a different way, a correlation between the moods identified with the fingerprint and the shapes may be assured.
As previously described, an audio fingerprint may be used to generate a static visual representation of a song, but like visualizer images and fractal images, that visual representation may be too complicated for humans to easily identify similarities between different fingerprints. By converting the audio fingerprint into a simple geometric representation of the complex data, it is easier for humans to recognize and differentiate between the visually represented songs.
The details that make up the geometric shape (angles, number of sides, length of lines) are determined by the unique data contained within the audio fingerprints. Since each of the above songs is distinctly different from the others, they generate distinctly different shapes. Songs with more aggressive moods tend to generate shapes with sharper angles. Songs with sadder or more mellow moods tend to generate shapes with angles that are close to 90 degrees. Songs with more ambiguous moods tend to generate more ambiguous shapes.
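Purely as an illustration of reducing fingerprint data to a simple polygon, the following sketch maps a sequence of 32-bit fingerprint integers to a small set of angle/length pairs; the specific reduction rule is an assumption, not the mapping used by the disclosed system.

```python
# Illustrative sketch only: one way a sequence of 32-bit fingerprint integers
# might be reduced to a simple polygon, with angles and side lengths drawn
# from the fingerprint data. The reduction rule here is an assumption.
def fingerprint_to_polygon(fingerprint, n_sides=6):
    """fingerprint: list of 32-bit integers; returns (angle, length) pairs."""
    chunk = max(1, len(fingerprint) // n_sides)
    sides = []
    for i in range(n_sides):
        window = fingerprint[i * chunk:(i + 1) * chunk] or [0]
        avg = sum(window) / len(window)
        angle = 60 + (avg % 90)          # sharper or squarer angles per song
        length = 1 + (avg % 97) / 97     # normalized side length
        sides.append((round(angle, 1), round(length, 2)))
    return sides

print(fingerprint_to_polygon([0x9F3A1C55, 0x12345678, 0x0BADF00D] * 4))
```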
At the same time, there may also be degrees of similarities between songs that are based on the level of mood represented. For example, in
With reference back to
In view of the above, it may be possible to identify the mood or moods in a particular song in at least three different ways, as illustrated in
With these three different types of classifications, it may be possible to identify corresponding commonalities in any type of music or other content. The fingerprint, color and shape may define where a song or other type of content fits within any type of content selection system, which may make it easier for any user of the selection system to select content they desire.
An alternative to using audio fingerprints to objectively identify moods in songs is to use rhythm, texture and pitch to objectively identify moods in songs. And, as further described below, it may likewise be possible to objectively determine scores for rhythm, texture and pitch for purposes of using those scores to determine the mood classification/mood class/mood for each song.
An SVM trained with the RTP scores for the moods listed in
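A hedged sketch of training such a multiclass SVM on RTP vectors with scikit-learn follows; the example scores and mood labels are placeholders rather than the training set referenced above.

```python
# A hedged sketch of training a multiclass SVM on human-scored RTP vectors and
# then predicting the mood class for a new song, assuming scikit-learn; the
# example scores and mood labels are placeholders, not the disclosed training set.
from sklearn.svm import SVC

# Each row is (rhythm, texture, pitch), scored 1-5 as described below.
rtp_scores = [(3, 3, 5), (4, 4, 2), (2, 1, 1), (5, 5, 4), (1, 2, 3)]
moods = ["happy", "aggressive", "sad", "excited", "calm"]

classifier = SVC(kernel="linear")
classifier.fit(rtp_scores, moods)

print(classifier.predict([(3, 3, 4)]))  # predicted mood class for a new song
```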
In an embodiment, as illustrated in
In an embodiment, the characteristics of the digital representation of songs may be analyzed to determine their moods as described above before the songs are first published by artists or record labels representing the artists. Alternatively, the artists or record labels could input the songs for analysis, step 1602. If the moods of the songs are determined before being published, the metadata for each song could include the mood(s) in the same manner as genre, artist, title, etc.
In the same manner, songs may be input by a music service, step 1604, such as ITUNES, SPOTIFY, PANDORA and the like. Once input, the songs may be analyzed to determine either the audio fingerprints or the RTP scores, step 1606, for the songs. Once the songs have been analyzed or scored, the songs may be placed into different mood classifications as noted above, step 1608, and as appropriate for the analysis/score of the songs. Once the moods of songs are determined, the music service could then include the mood of each song with the streamed data. In an embodiment, the music service analyzes the songs itself in the manner described above and provides mood information with the streamed songs so users can select songs based on mood as well as other data.
In an embodiment, users may be able to customize moods after the moods have first been determined, step 1612. As the moods of the songs are objectively determined, but mood is inherently a subjective determination of the listener, not every objective mood determination will fit a particular listener's subjective mood. Hence, it may be desirable to allow listeners to change moods or group songs into their own categories. If a listener/user does not want to customize any aspect of a mood for a song, then the user may be enabled to simply listen to the songs as classified, step 1614. Alternatively, if the listener/user wants to customize the moods or categorize the songs based on moods, they may do so, step 1616. In an embodiment, the user may want to categorize songs with the same mood, or perhaps different moods, within a single category that they name themselves, such as “Sunday Listening,” which includes a grouping of songs with different moods that a user likes to listen to on Sundays.
User preferences may include negative or positive preferences in a wide variety of categories. For example, a user could indicate a desire not to listen to any country music or pop songs within a particular mood, or to only listen to songs published after 1979. Preferences may include genre, artists, bands, years of publication, etc. Once a user has customized the mood classes based on their preferences in step 1616, the user may then be enabled to listen to the music based on the customizations, step 1618. It should be noted that just because one or more songs were excluded from a particular mood class by a customization, that does not change the mood class associated with those one or more songs. A user may create a customized play list of songs on one occasion based on particular preferences and then create a different customized play list of songs based on other preferences on another occasion. Thus, the songs remain classified as they are and remain in the mood classes to which they are assigned regardless of the user's customizations, which apply only to the customized lists created in step 1618.
As noted above, in addition to objectively analyzing or scoring songs in order to classify them by mood, it may also be possible to objectively analyze songs in order to generate RTP scores for the songs. For example, the chromagrams generated as part of the process of creating audio fingerprints may also be used to objectively determine the pitch for songs for which chromagrams have been generated. The chromagram is analogous to a helix that circles around from bottom to top like a corkscrew. The musical scale goes around the helix, repeating keys each time it goes around, but each time at a higher octave. Just as there are 12 notes in each octave of the scale, there are 12 vectors in the chromagram. If one were to attempt to scale the 12 vectors of the chromagram for each song across the 88 keys of a piano, for example, it may be very difficult to do, but that is not actually necessary.
The same SVM system described above can be used to objectively determine pitch in songs. First, a number of combinations of chromagram vectors, among all possible combinations of chromagram vectors, that correspond to certain pitch scores is determined, and a multiclass SVM is trained with that number of combinations of chromagram vectors. Once the SVM has been trained, the SVM may then be used to determine which pitch score (such as an integer between 1 and 5) corresponds to every possible combination of chromagram vectors. Once a set of predetermined combinations of chromagram vectors has been mapped to different pitch scores, future pitch scores may be determined by comparing a combination of chromagram vectors against the set of predetermined combinations and assigning the pitch score to the music based on a match between that combination and one of the predetermined combinations of chromagram vectors in the set.
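The chromagram-to-pitch-score step might be sketched as follows, again assuming scikit-learn; the chromagram summary (averaging the 12 bins over time) and the random placeholder training data are assumptions for illustration only.

```python
# Sketch only: collapsing a song's chromagram into one 12-bin vector and using
# a trained multiclass SVM to assign a 1-5 pitch score, per the approach
# described above; the training data here is a random placeholder.
import numpy as np
from sklearn.svm import SVC

def chroma_vector(chromagram):
    """chromagram: array of shape (12, n_frames); returns a 12-value summary."""
    return np.asarray(chromagram).mean(axis=1)

# Placeholder training set: 12-bin chroma summaries with pitch scores 1-5.
rng = np.random.default_rng(0)
X_train = rng.random((25, 12))
y_train = rng.integers(1, 6, size=25)

pitch_svm = SVC(kernel="rbf").fit(X_train, y_train)
print(pitch_svm.predict([chroma_vector(rng.random((12, 40)))]))
```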
Training the SVM as noted above and testing it on a 15-song subset of the 65 songs for which chromagrams had been produced generated correct results and acceptable functional margins for twelve of the fifteen songs, compared to the human-generated scores for pitch, which are noted after each song below:
Texture generally relates to the singular or multiple melodies in a song and whether or not those melodies are accompanied by chords. Texture can also be representative of the density of frequencies in a song over time. In order to compute the chromagram used to generate the audio fingerprints and the objective pitch scores noted above, it is first necessary to generate the spectrogram for each song. The spectrogram is a visual representation of the spectrum of frequencies in each song as the song plays. As a result, the spectrogram for a song can be used to represent the texture of the song. One manner of representing the texture and scaling the score from 1 to 5 is to average the frequency densities and separate them into fifths. If the average frequency density for a song is in the first quintile, the texture score would be a 1, and if the average frequency density for a song is in the fifth quintile, the texture score would be a 5.
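The quintile mapping described above may be sketched as follows; the assumed density range of 0.0 to 1.0 would in practice be replaced by the observed range across the song library.

```python
# A minimal sketch of the quintile mapping for texture described above: an
# average frequency-density value is placed into one of five equal bands to
# yield a 1-5 texture score. The 0.0-1.0 density range is an assumption.
def texture_score(avg_density: float, low: float = 0.0, high: float = 1.0) -> int:
    """Map an average frequency density onto a 1-5 score by quintile."""
    span = (high - low) / 5.0
    quintile = int((avg_density - low) // span) + 1
    return min(max(quintile, 1), 5)   # clamp the boundary case at the top

print(texture_score(0.07), texture_score(0.55), texture_score(0.99))  # 1 3 5
```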
Rhythm can also be denoted by a long list of complicated factors, but the primary components of rhythm are beats per minute (BPM) and time signature. A number of currently available tools exist to measure BPM in songs as well as time signatures, which indicate how many beats are contained in each bar and which note value is to be given one beat. In a manner similar to that described above, the raw BPM and time signature data generated by such tools can then be used to derive averages for both BPM and time signatures in the songs, which may then be mapped to a 1-5 score. For example, a song with a consistently average BPM throughout the song, neither too fast nor too slow, along with an even time signature, such as 4/4 time, may be scored a 3 for rhythm. A song with a higher average BPM that is inconsistent in places, or a moderate average BPM with either a changing time signature or another variation, such as 5/4 time, may be scored a 4 for rhythm. Songs with slower BPMs and even time signatures may be scored a 2 for rhythm.
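An illustrative sketch of this rhythm scoring follows; the BPM bands, the treatment of odd time signatures, and the use of librosa's beat tracker to estimate average BPM are all assumptions made for the example.

```python
# Illustrative sketch of the rhythm scoring logic described above; the BPM
# bands, the handling of odd time signatures, and the use of librosa's beat
# tracker are assumptions made for the example.
import librosa

def rhythm_score(path: str, beats_per_bar: int = 4) -> int:
    y, sr = librosa.load(path, mono=True)
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)   # estimated average BPM
    tempo = float(tempo)
    uneven = beats_per_bar % 2 == 1                  # e.g. 5/4 time
    if tempo < 90:
        score = 2
    elif tempo <= 130:
        score = 3
    else:
        score = 4
    return min(score + (1 if uneven else 0), 5)

# print(rhythm_score("song.wav", beats_per_bar=5))
```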
With regard to RTP scores and application of the SVM, once the SVM has classified every possible combination of RTP scores, there will be no need to run a song through the SVM again once the song has been scored. For example, if a song has an RTP score of 3, 3, 5, its mood classification will be Happy because previously RTP-scored songs with the same vector have been classified as Happy. Since every digital file, including any digitized music file, has a computable MD5 hash associated with it, where the MD5 hash is a cryptographic hash function producing a 128-bit (16-byte) hash value, typically expressed in text format as a 32-digit hexadecimal number, the MD5 hash can serve as a unique identifier for each song. Once a song has been scored, the MD5 hash can be computed and associated with the song. Thereafter, rather than rescoring a song, the MD5 hash can first be computed to see if the song has already been scored, and if so, the existing score can be used for that song, thereby greatly simplifying the scoring process for known songs. If the song has not been scored, it will be scored and the MD5 hash will be associated with that score. Other unique identifiers associated with different types of music formats may be used in a similar manner.
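A minimal sketch of the MD5-based caching described above follows; the scoring function itself is a placeholder.

```python
# A minimal sketch of the MD5-based lookup described above: hash the music
# file, reuse a stored score and mood if the hash is known, otherwise score
# the song and cache the result. The scoring function is a placeholder.
import hashlib

score_cache: dict[str, dict] = {}    # md5 hex digest -> stored score/mood

def md5_of_file(path: str) -> str:
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()        # 32-digit hexadecimal identifier

def classify(path: str, score_song) -> dict:
    key = md5_of_file(path)
    if key not in score_cache:       # only score songs not seen before
        score_cache[key] = score_song(path)
    return score_cache[key]

# classify("song.mp3", score_song=lambda p: {"rtp": (3, 3, 5), "mood": "happy"})
```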
A block diagram of a music selection and organization system based on the above disclosure is illustrated in
The three-dimensional shapes may be wire frames or solids. Each three-dimensional shape may include at least one angle and one side, where the degree of the angle and the length of the side identify a percentage of a mood identified in the content. The three-dimensional shapes may also include one or more colors that represent moods in the corresponding content. The color may be determined by creating a color representation of the content based on the frequency data sampled from the content. Predominant colors identified in the color representations are kept and less predominant colors in the color representation are deleted so as to generate one or more colors representing the mood(s) in the content. The color representation may be based on tristimulus values. The color representation may also be based on combinations of melodies, harmonies and rhythms in the content and/or RTP data. The moods may also be derived by the analyzer directly from the digitized representations. The content may be music or video that includes music.
In an embodiment, a user of the selection and organization system may also load a song or other content of their choice into the selection and organization system so as to generate a color, shape, RTP score, or fingerprint representation, and then search for similarly identified content. Extending the same idea further, the user may randomly generate a visual representation without specific content in mind, and find content based on fingerprints, RTP scores, colors or shapes that are aesthetically pleasing to the user.
Conditional language used herein, such as, among others, “can,” “could,” “might,” “may,” “e.g.,” and the like, unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain examples include, while other examples do not include, certain features, elements, and/or steps. Thus, such conditional language is not generally intended to imply that features, elements and/or steps are in any way required for one or more examples or that one or more examples necessarily include logic for deciding, with or without author input or prompting, whether these features, elements and/or steps are included or are to be performed in any particular example. The terms “comprising,” “including,” “having,” and the like are synonymous and are used inclusively, in an open-ended fashion, and do not exclude additional elements, features, acts, operations, and so forth. Also, the term “or” is used in its inclusive sense (and not in its exclusive sense) so that when used, for example, to connect a list of elements, the term “or” means one, some, or all of the elements in the list.
In general, the various features and processes described above may be used independently of one another, or may be combined in different ways. All possible combinations and subcombinations are intended to fall within the scope of this disclosure. In addition, certain method or process blocks may be omitted in some implementations. The methods and processes described herein are also not limited to any particular sequence, and the blocks or states relating thereto can be performed in other sequences that are appropriate. For example, described blocks or states may be performed in an order other than that specifically disclosed, or multiple blocks or states may be combined in a single block or state. The example blocks or states may be performed in serial, in parallel, or in some other manner. Blocks or states may be added to or removed from the disclosed examples. The example systems and components described herein may be configured differently than described. For example, elements may be added to, removed from, or rearranged compared to the disclosed examples.
While certain example or illustrative examples have been described, these examples have been presented by way of example only, and are not intended to limit the scope of the subject matter disclosed herein. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of certain of the subject matter disclosed herein.
This application is a continuation-in-part of U.S. patent application Ser. No. 14/671,973, filed Mar. 27, 2015, which is a continuation-in-part of U.S. patent application Ser. No. 14/603,324, filed Jan. 22, 2015; and is a continuation-in-part of U.S. patent application Ser. No. 14/603,325, filed Jan. 22, 2015; both of which are a continuation-in-part of U.S. patent application Ser. No. 13/828,656, filed Mar. 14, 2013, now U.S. Pat. No. 9,639,871, the entire contents of each of which are incorporated herein by reference. U.S. patent application Ser. Nos. 14/603,324 and 14/603,325 both claim benefit under 35 U.S.C. § 119(e) of Provisional U.S. Patent Application No. 61/930,442, filed Jan. 22, 2014, and of Provisional U.S. Patent Application No. 61/930,444, filed Jan. 22, 2014, the entire contents of each of which are incorporated herein by reference. U.S. patent application Ser. No. 14/671,973 also claims benefit under 35 U.S.C. § 119(e) of Provisional Application No. 61/971,490, filed Mar. 27, 2014, the entire contents of which are incorporated herein by reference. This application is also a continuation-in-part of U.S. patent application Ser. No. 14/671,979, filed Mar. 27, 2015, the entire contents of which are incorporated herein by reference.
Other Publications:
www.picitup.com; Picitup's PicColor product; copyright 2007-2010; accessed Feb. 2, 2015; 1 page.
http://labs.tineye.com; Multicolor; Idee Inc.; copyright 2015; accessed Feb. 2, 2015; 1 page.
http://statisticbrain.com/attention-span-statistics/; Statistics Brain; Statistic Brain Research Institute; accessed Feb. 2, 2015; 4 pages.
Dukette et al.; "The Essential 20: Twenty Components of an Excellent Health Care Team"; RoseDog Books; 2009; pp. 72-73.
Music Genome Project; http://en.wikipedia.org/wiki/Music_Genome_Project; accessed Apr. 15, 2015; 4 pages.
Ke et al.; "Computer Vision for Music Identification"; In Proceedings of Computer Vision and Pattern Recognition; 2005; vol. 1; pp. 597-604.
Lalinsky; "How does Chromaprint work?"; https://oxygene.sk/2011/01/how-does-chromaprint-work; Jan. 18, 2011; accessed Apr. 15, 2015; 3 pages.
Gforce; http://www.soundspectrum.com; copyright 2015; accessed Apr. 15, 2015; 2 pages.
Harlan J. Brothers; "Intervallic Scaling in the Bach Cello Suites"; Fractals; vol. 17, issue 4; Dec. 2009; pp. 537-545.
Ke et al.; "Computer Vision for Music Identification"; IEEE Computer Vision and Pattern Recognition (CVPR); Jun. 2005; 8 pages.
Number | Date | Country
---|---|---
20180139268 A1 | May 2018 | US
Number | Date | Country
---|---|---
61971490 | Mar 2014 | US
61930442 | Jan 2014 | US
61930444 | Jan 2014 | US
Relation | Number | Date | Country
---|---|---|---
Parent | 14671979 | Mar 2015 | US
Child | 15868902 | | US
Parent | 14671973 | Mar 2015 | US
Child | 14671979 | | US
Parent | 14603324 | Jan 2015 | US
Child | 14671973 | | US
Parent | 14603325 | Jan 2015 | US
Child | 14603324 | | US
Parent | 13828656 | Mar 2013 | US
Child | 14603325 | | US
Parent | 13828656 | | US
Child | 14603324 | | US