A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
This invention is directed to the matching and selection of recommended consumer items from a database.
Many consumers have difficulty finding items, such as music or videos, that they like, whether online, in a retail store or at home. For example, studies have shown that many consumers who enter music retail stores intending to buy leave without making a purchase, and that many of those unsatisfied consumers had fully intended to buy music on that visit. The online shopping experience, with its even larger selection, can be even more challenging to consumers. For example, to discover music, consumers must rely on rough genre classification tools or collaborative filtering technology. Neither is effective, as reflected in the purchasing patterns in the industry. In 2000, less than 3% of active music titles accounted for over 80% of sales. Consumers lack an effective means of browsing and discovering new music they will like.
On the other side of the would-be music transaction, music retailers, record labels, and other delivery channels strive to find the right listeners for the music they have to offer. No current means exist for the cost-effective promotion of extensive product lines to a wide audience. The economics of national promotion forces record labels to consolidate their marketing efforts and rely on “hits” to meet annual growth targets. Music consumers are not familiar with the vast majority of music releases, which, as a result, are not purchased and are unprofitable. The present invention is directed to novel methods and systems for retailers and content providers to better understand consumers and target music promotions.
The invention is directed to a method and system for determining at least one match item that corresponds to a source item. For example, in the context of music, the invention includes the steps of creating a database comprising multiple songs, each song in the database represented by an n-dimensional database vector corresponding to n musical characteristics of the song; determining an n-dimensional source song vector that corresponds to n musical characteristics of the source song; calculating a first distance between the source song vector and a first database song vector, the distance being a function of the differences between the n musical characteristics of the source song vector and the first database song vector; calculating a second distance between the source song vector and a second database song vector, the distance being a function of the differences between the n musical characteristics of the source song vector and the second database song vector; and selecting the at least one match song based on the magnitude of the first distance and the second distance.
The invention may include numerous other features and characteristics. For example, again in the context of music, the steps of calculating the first and second distances may further include application of a weighting factor to the difference between certain of the n musical characteristics of the first and second database song vectors and the source song vector.
Other details of the invention are set forth in the following description.
The Music Genome Project™ is a database of songs. Each song is described by a set of multiple characteristics, or “genes,” that are collected into logical groups called “chromosomes.” The set of chromosomes makes up the genome. One of the major groups in the genome is the Music Analysis Chromosome. This particular subset of the entire genome is sometimes referred to as “the genome.”
Song Matching Techniques
Song to Song Matching
The Music Genome Project™ system is a large database of records, each describing a single piece of music, and an associated set of search and matching functions that operate on that database. The matching engine effectively calculates the distance between a source song and the other songs in the database and then sorts the results to yield an adjustable number of closest matches.
Each gene can be thought of as an orthogonal axis of a multi-dimensional space and each song as a point in that space. Songs that are geometrically close to one another are “good” musical matches. To maximize the effectiveness of the music matching engine, we maximize the effectiveness of this song distance calculation.
Song Vector
A given song “S” is represented by a vector containing approximately 150 genes. Each gene corresponds to a characteristic of the music, for example, gender of lead vocalist, level of distortion on the electric guitar, type of background vocals, etc. In a preferred embodiment, rock and pop songs have 150 genes, rap songs have 350, and jazz songs have approximately 400. Other genres of music, such as world and classical, have 300–500 genes. The system depends on a sufficient number of genes to render useful results. Each gene “s” of this vector is a number between 0 and 5. Fractional values are allowed but are limited to half integers.
Song S=(s1, s2, s3, . . . , sn)
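By way of illustration only, the song vector described above might be sketched in Python as follows. The function names are hypothetical and are not part of the disclosure, and treating the range 0 to 5 as inclusive is an assumption.

```python
def is_valid_gene(value):
    """True if the value lies in 0..5 (assumed inclusive) and is a whole or half integer."""
    return 0 <= value <= 5 and value * 2 == int(value * 2)


def make_song_vector(genes):
    """Return the genes as a tuple S=(s1, ..., sn) after checking every value."""
    for i, gene in enumerate(genes, start=1):
        if not is_valid_gene(gene):
            raise ValueError(f"gene s{i}={gene!r} is not a half integer in 0..5")
    return tuple(float(gene) for gene in genes)


# A toy 5-gene song; a real vector would hold on the order of 150 genes or more.
song_s = make_song_vector([0, 2.5, 5, 1, 3.5])
```

A value such as 2.3 would be rejected by this check because it is not a half integer.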
A. Basic Matching Engine
The simple distance between any two songs “S” and “T”, in n-dimensional space, can be calculated as follows:
distance=square-root of (the sum over all n elements of the genome of (the square of (the difference between the corresponding elements of the two songs)))
This can be written symbolically as:
distance(S,T)=sqrt[(for i=1 to n)Σ(si−ti)^2]
Because the square-root function is monotonic, it does not change the ordering of these distances, so computing it is not necessary. Instead, the invention uses distance-squared in song comparisons. Accepting this and leaving the subscript notation implied, the distance calculation is written in simplified form as:
distance(S,T)=Σ(s−t)^2
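The basic matching step can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the database is modeled as a plain dictionary, and the song titles, gene values, and function names are hypothetical.

```python
def distance_squared(s, t):
    """Sum over all genes of the squared difference between two song vectors."""
    if len(s) != len(t):
        raise ValueError("song vectors must have the same number of genes")
    return sum((si - ti) ** 2 for si, ti in zip(s, t))


def closest_matches(source, database, count):
    """Sort the database by distance-squared to the source and keep the nearest few."""
    ranked = sorted(database.items(),
                    key=lambda item: distance_squared(source, item[1]))
    return [title for title, _ in ranked[:count]]


# Hypothetical 3-gene database for illustration.
database = {
    "song_a": (1.0, 2.0, 3.0),
    "song_b": (1.0, 2.5, 3.0),
    "song_c": (5.0, 0.0, 0.5),
}
source = (1.0, 2.0, 3.5)
matches = closest_matches(source, database, count=2)
```

Because the square root is never taken, the ranking is the same as it would be with the full distance, at lower cost per comparison.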
B. Weighted and Focus Matching
1. Weighted Matching
Because not all of the genes are equally important in establishing a good match, the distance is better calculated as a sum that is weighted according to each gene's individual significance. Taking this into account, the revised distance can be calculated as follows:
distance=Σ[w*(s−t)^2]=[w1*(s1−t1)^2]+[w2*(s2−t2)^2]+ . . .
where the weighting vector “W,”
W=(w1, w2, w3, . . . , wn)
is initially established through empirical work done, for example, by a music team that analyzes songs. The weighting vector can be manipulated in various ways that affect the overall behavior of the matching engine. This will be discussed in more detail in the Focus Matching section of this document.
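As a sketch of the weighted calculation, the following Python fragment shows how a weighting vector can change which of two candidate songs is the better match. The gene values and weights are hypothetical and chosen only to make the effect visible.

```python
def weighted_distance(s, t, w):
    """Sum over all genes of w * (s - t)^2."""
    return sum(wi * (si - ti) ** 2 for wi, si, ti in zip(w, s, t))


source = (1.0, 4.0, 2.0)
song_t1 = (1.0, 1.0, 2.0)   # differs from the source only in gene 2, by 3
song_t2 = (3.0, 4.0, 2.0)   # differs from the source only in gene 1, by 2

equal_weights = (1.0, 1.0, 1.0)
tuned_weights = (3.0, 0.25, 1.0)  # hypothetical: gene 1 matters most, gene 2 least

unweighted = (weighted_distance(source, song_t1, equal_weights),
              weighted_distance(source, song_t2, equal_weights))
weighted = (weighted_distance(source, song_t1, tuned_weights),
            weighted_distance(source, song_t2, tuned_weights))
```

With equal weights song_t2 is nearer (4 versus 9); with the tuned weights song_t1 is nearer (2.25 versus 12).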
Scaling Functions
The data represented by many of the individual genes is not linear. In other words, the distance between the values of 1 and 2 is not necessarily the same as the distance between the values of 4 and 5. The introduction of scaling functions f(x) may adjust for this non-linearity. Adding these scaling functions changes the matching function to read:
distance=Σ[w*(f(s)−f(t))^2]
There are a virtually limitless number of scaling functions that can be applied to the gene values to achieve the desired result.
Alternatively, one can generalize the difference-squared function to any function that operates on the absolute difference of two gene values. The general distance function is:
distance=Σ[w*g(|s−t|)]
In the specific case, g(x) is simply x^2, but it could become x^3, for example, if it were preferable to prioritize songs with many small differences over ones with a few large ones.
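The two generalizations above can be illustrated as follows. The particular scaling function f(x)=x^2 is an assumption made only for the example (the text does not name one), and all gene values are hypothetical.

```python
def scaled_distance(s, t, w, f):
    """Sum of w * (f(s) - f(t))^2, with f applied to each gene value."""
    return sum(wi * (f(si) - f(ti)) ** 2 for wi, si, ti in zip(w, s, t))


def general_distance(s, t, w, g):
    """Sum of w * g(|s - t|) for an arbitrary function g."""
    return sum(wi * g(abs(si - ti)) for wi, si, ti in zip(w, s, t))


def stretch(x):
    """Hypothetical scaling function that spreads out the high end of a gene."""
    return x * x


# The same raw difference of 1 contributes differently at each end of the scale.
low_end = scaled_distance((1.0,), (2.0,), (1.0,), stretch)
high_end = scaled_distance((4.0,), (5.0,), (1.0,), stretch)

# Many small differences versus one large difference.
source = (2.0, 2.0, 2.0, 2.0)
many_small = (3.0, 3.0, 3.0, 3.0)
few_large = (4.0, 2.0, 2.0, 2.0)
ones = (1.0, 1.0, 1.0, 1.0)

square = (general_distance(source, many_small, ones, lambda x: x ** 2),
          general_distance(source, few_large, ones, lambda x: x ** 2))
cube = (general_distance(source, many_small, ones, lambda x: x ** 3),
        general_distance(source, few_large, ones, lambda x: x ** 3))
```

In this toy case the two candidates tie under x^2 (4 and 4), while under x^3 the song with many small differences is nearer (4 versus 8).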
2. Focus Matching
Focus matching allows the end user of a system equipped with a matching engine to control the matching behavior of the system. Focus traits may be used to re-weight the song matching system and refine searches for matching songs to include or exclude the selected focus traits.
Focus Trait Presentation
Focus Traits are the distinguishing aspects of a song. When an end user enters a source song into the system, its genome is examined to determine which focus traits have been determined by music analysts to be present in the music. Triggering rules are applied to each of the possible focus traits to discover which apply to the song in question. These rules may trigger a focus trait when a given gene rises above a certain threshold, when a given gene is marked as a definer, or when a group of genes fits a specified set of criteria. The identified focus traits (or a subset) are presented on-screen to the user. This tells the user what elements of the selected song are significant.
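The three kinds of triggering rule described above might be modeled as small predicate functions. The sketch below is hypothetical throughout: the trait names, gene names, thresholds, and the representation of a genome as a dictionary with a separate set of definer genes are all assumptions made for illustration.

```python
def threshold_rule(gene, minimum):
    """Trigger when a given gene rises above a threshold."""
    return lambda genome, definers: genome.get(gene, 0) > minimum


def definer_rule(gene):
    """Trigger when a given gene has been marked as a definer."""
    return lambda genome, definers: gene in definers


def group_rule(genes, minimum_each):
    """Trigger when every gene in a group meets a criterion (here, a minimum)."""
    return lambda genome, definers: all(
        genome.get(gene, 0) >= minimum_each for gene in genes)


# Hypothetical focus traits and their triggering rules.
FOCUS_TRAIT_RULES = {
    "heavy guitar distortion": threshold_rule("guitar_distortion", 3.5),
    "distinctive lead vocal": definer_rule("lead_vocal_gender"),
    "lush vocal harmony": group_rule(["background_vocals", "vocal_harmony"], 3),
}


def present_focus_traits(genome, definers):
    """Return the focus traits whose rules apply to the source song."""
    return [trait for trait, rule in FOCUS_TRAIT_RULES.items()
            if rule(genome, definers)]


genome = {"guitar_distortion": 4.5, "background_vocals": 3.0,
          "vocal_harmony": 2.0, "lead_vocal_gender": 5.0}
traits = present_focus_traits(genome, definers={"lead_vocal_gender"})
```

Here the harmony trait is not presented because one gene in its group falls below the assumed minimum.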
Focus Trait Matching
An end user can choose to focus a match around any of the presented traits. When a trait, or number of traits, is selected, the matching engine modifies its weighting vector to more tightly match the selection. This is done by increasing the weights of the genes that are specific to the Focus Trait selected and by changing the values of specific genes that are relevant to the Trait. The resulting songs will closely resemble the source song in the trait(s) selected.
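One way to picture the re-weighting step is to multiply the weights of the genes tied to the selected trait by a boost factor. The factor of 4, the gene indices, and the function names below are assumptions for illustration; the sketch covers only the weight increase, not the adjustment of gene values.

```python
def weighted_distance(s, t, w):
    """Sum over all genes of w * (s - t)^2."""
    return sum(wi * (si - ti) ** 2 for wi, si, ti in zip(w, s, t))


def focus_weights(weights, trait_genes, boost=4.0):
    """Return a copy of the weighting vector with the trait's genes boosted."""
    return [w * boost if i in trait_genes else w
            for i, w in enumerate(weights)]


base_weights = [1.0, 1.0, 1.0]
source = (4.0, 2.0, 2.0)
song_a = (4.0, 5.0, 2.0)   # matches the source on gene 0, far off on gene 1
song_b = (2.0, 2.0, 2.0)   # off on gene 0, matches elsewhere

# Hypothetical focus trait that depends on gene 0 only.
focused = focus_weights(base_weights, trait_genes={0})

before = (weighted_distance(source, song_a, base_weights),
          weighted_distance(source, song_b, base_weights))
after = (weighted_distance(source, song_a, focused),
         weighted_distance(source, song_b, focused))
```

Before focusing, song_b is nearer (4 versus 9); after focusing on gene 0, song_a is nearer (9 versus 16), since it resembles the source in the selected trait.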
Personalization
The weighting vector can also be manipulated for each end user of the system. By raising the weights of genes that are important to the individual and reducing the weights of those that are not, the matching process can be made to improve with each use.
Aggregation
Song to Song Matching
The matching engine is capable of matching songs. That is, given a source song, it can find the set of songs that closely match it by calculating the distances to all known songs and then returning the nearest few. The distance between any two songs is calculated as the weighted Pythagorean sum of the squares of the differences between the corresponding genes of the songs.
Basic Multi-Song Matching
It may also be desirable to build functionality that will return the best matches to a group of source songs. Finding matches to a group of source songs is useful in a number of areas as this group can represent a number of different desirable searches. The source group could represent the collected works of a single artist, the songs on a given CD, the songs that a given end user likes, or analyzed songs that are known to be similar to an unanalyzed song of interest. Depending on the makeup of the group of songs, the match result has a different meaning to the end user but the underlying calculation should be the same.
This functionality provides a list of songs that are similar to the repertoire of an artist or CD. It also allows recommendations to be generated for an end user, purely on taste, without the need for a starting song.
Vector Pairs
Referring to
The center-deviation vector pair can be used in place of the full set of songs for the purpose of calculating distances to other objects.
Raw Multi-Song Matching Calculation
If the assumption is made that a song's genes are normally distributed and that they are of equal importance, the problem is straightforward. First a center vector μ and a standard deviation vector σ are calculated for the set of source songs. Then the standard song matching method is applied, but using the center vector in place of the source song and the inverse of the square of the standard deviation vector elements as the weights:
distance=Σ[(1/σi^2)*(μi−ti)^2]
As is the case with simple song-to-song matching, the songs that are the smallest distances away are the best matches.
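The raw multi-song calculation can be sketched with the Python standard library. Using the population variance (rather than the sample variance) is an assumption, since the text does not say which is intended, and the gene values are hypothetical. This raw form divides by the variance and so fails when any gene has zero variance across the source songs.

```python
from statistics import fmean, pvariance


def center_and_variance(songs):
    """Per-gene mean and population variance of a group of song vectors."""
    columns = list(zip(*songs))
    center = [fmean(column) for column in columns]
    variance = [pvariance(column) for column in columns]
    return center, variance


def raw_group_distance(center, variance, t):
    """Sum of (mu - t)^2 / sigma^2; requires every variance to be non-zero."""
    return sum((mu - ti) ** 2 / var
               for mu, var, ti in zip(center, variance, t))


# Two hypothetical source songs: loose agreement on gene 0, tight on gene 1.
source_songs = [(1.0, 4.0), (3.0, 5.0)]
center, variance = center_and_variance(source_songs)

off_on_loose_gene = raw_group_distance(center, variance, (3.0, 4.5))
off_on_tight_gene = raw_group_distance(center, variance, (2.0, 5.5))
```

Both candidates are one unit from the center in a single gene, but the miss on the tightly clustered gene costs four times as much (4 versus 1).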
Using Multi-Song Matching With the Weighting Vector
The weighting vector that has been used in song-to-song matching must be incorporated into this system alongside the 1/σ^2 terms. Assume that they are multiplied together, so that the new weight vector elements are simply:
New weight=wi/σi^2
A problem that arises with this formula is that when σ^2 is zero the new weight becomes infinitely large. Because there is some noise in the rated gene values, σ^2 can be thought of as never truly being equal to zero. For this reason a minimum value of (0.5)^2=0.25 is added to it to take this variation into account, and the weight is scaled by the same constant so that it equals wi when σi^2 is zero. The revised distance function becomes:
distance=Σ[(wi*0.25/(σi^2+0.25))*(μi−ti)^2]
Other weighting vectors may be appropriate for multi-song matching of this sort. A different multi-song weighting vector may be established, or the (0.5)^2 constant may be modified to fit empirically observed matching results.
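The revised weights can be checked numerically with a short sketch. The function names and sample values are hypothetical; the constant 0.25 is the one given in the revised distance function.

```python
NOISE_FLOOR = 0.25  # the (0.5)^2 constant from the revised distance function


def group_weights(weights, variance, floor=NOISE_FLOOR):
    """New weight = w * floor / (sigma^2 + floor), finite even when sigma^2 is 0."""
    return [w * floor / (var + floor) for w, var in zip(weights, variance)]


def group_distance(center, variance, weights, t):
    """Weighted distance from a candidate song t to the group's center vector."""
    new_weights = group_weights(weights, variance)
    return sum(nw * (mu - ti) ** 2
               for nw, mu, ti in zip(new_weights, center, t))


weights = [2.0, 2.0, 2.0]
variance = [0.0, 0.25, 0.75]     # hypothetical per-gene variances
new_weights = group_weights(weights, variance)

center = [2.0, 4.0, 1.0]
candidate = (3.0, 4.0, 3.0)
distance = group_distance(center, variance, weights, candidate)
```

A gene with zero variance keeps its full original weight, and the weight falls off as the source songs disagree more on that gene.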
Taste Portraits
Groups with a coherent, consistent set of tracks will have both a known center vector and a tightly defined deviation vector. This simple vector pair scheme will break down, however, when there are several centers of musical style within the collection. In this case we need to describe the set of songs as a set of two or more vector pairs.
As shown in
Ideally there will be a small number of such clusters, each with a large number of closely packed elements. We can then choose to match to a single cluster at a time. In applications where we are permitted several matching results, we can choose to return a few from each cluster according to cluster size.
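Returning a few results from each cluster according to cluster size might be sketched as follows. How the clusters are found is outside this sketch, so they are supplied as given. The largest-remainder split of result slots, the use of population variance, equal gene weights, and all names and values are assumptions made for illustration.

```python
from statistics import fmean, pvariance

NOISE_FLOOR = 0.25


def vector_pair(songs):
    """Center and variance vectors for one cluster of songs."""
    columns = list(zip(*songs))
    return [fmean(c) for c in columns], [pvariance(c) for c in columns]


def cluster_distance(center, variance, t):
    """Distance to a cluster center with equal gene weights and the noise floor."""
    return sum(NOISE_FLOOR / (var + NOISE_FLOOR) * (mu - ti) ** 2
               for mu, var, ti in zip(center, variance, t))


def allocate_slots(cluster_sizes, total):
    """Split result slots across clusters in proportion to cluster size."""
    whole = sum(cluster_sizes)
    exact = [total * size / whole for size in cluster_sizes]
    slots = [int(x) for x in exact]
    leftovers = sorted(range(len(exact)),
                       key=lambda i: exact[i] - slots[i], reverse=True)
    for i in leftovers[: total - sum(slots)]:
        slots[i] += 1
    return slots


def match_portrait(clusters, candidates, total):
    """Take the nearest unused candidates for each cluster, up to its quota."""
    results = []
    quotas = allocate_slots([len(songs) for songs in clusters], total)
    for songs, quota in zip(clusters, quotas):
        center, variance = vector_pair(songs)
        unused = [name for name in candidates if name not in results]
        unused.sort(key=lambda name: cluster_distance(
            center, variance, candidates[name]))
        results.extend(unused[:quota])
    return results


# Two hypothetical style clusters in a user's collection.
clusters = [
    [(4.0, 1.0), (5.0, 1.0), (4.0, 2.0), (5.0, 2.0)],
    [(1.0, 5.0), (1.0, 4.0)],
]
candidates = {"r1": (4.5, 1.5), "r2": (4.0, 1.0),
              "j1": (1.0, 4.5), "x": (3.0, 3.0)}
picks = match_portrait(clusters, candidates, total=3)
```

With three results allowed, the larger cluster contributes two matches and the smaller one contributes one.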
The invention has been described with respect to specific examples including presently preferred modes of carrying out the invention. Those skilled in the art will appreciate that there are numerous variations and permutations of the above described systems and techniques, for example, that would be used with videos, wine, films, books and video games, that fall within the spirit and scope of the invention as set forth in the appended claims.
This application claims priority to provisional U.S. Application Ser. No. 60/291,821 filed May 16, 2001.