© 2002-2009 Mystrands, Inc. A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. 37 CFR § 1.71(d).
This invention pertains to methods and systems to provide recommendations of media items, for example music items, in which the recommendations reflect dynamic adaptation in response to explicit and implicit user feedback.
New technologies combining digital media item players with dedicated software, together with new media distribution channels through computer networks (e.g., the Internet), are quickly changing the way people organize and play media items. As a direct consequence of this evolution in the media industry, users are faced with a huge volume of available choices that can easily overwhelm them when choosing what item to play at a given moment.
This overwhelming effect is apparent in the music arena, where people are faced with the problem of selecting music from very large collections of songs. However, in the future, we might detect similar effects in other domains such as music videos, movies, news items, etc.
In general, the disclosed process and device is applicable to any kind of media item that can be grouped by users to define mediasets. For example, in the music domain, these mediasets are called playlists. Users put songs together in playlists to overcome the problem of being overwhelmed when choosing a song from a large collection, or just to enjoy a set of songs in particular situations. For example, one might be interested in having a playlist for running, another for cooking, etc.
Different approaches can be adopted to help users choose the right options with personalized recommendations. One kind of approach employs human expertise to classify the media items and then uses these classifications to infer recommendations to users based on an input mediaset. For instance, if item x appears in the input mediaset and x belongs to the same classification as y, then a system could recommend item y based on the fact that both items are classified in a similar cluster. However, this approach requires an enormous amount of human work and expertise. Another approach is to analyze the data of the items (audio signal for songs, video signal for videos, etc.) and then try to match users' preferences with the extracted analysis. This class of approaches has yet to be shown effective from a technical point of view.
The use of a large number of playlists to make recommendations may be employed in a recommendation scheme. Analysis of “co-occurrences” of media items on multiple playlists may be used to infer some association of those items in the minds of the users whose playlists are included in the raw data set. Recommendations are made, starting from one or more input media items, based on identifying other items that have a relatively strong association with the input item based on co-occurrence metrics. More detail is provided in our PCT publication number WO 2006/084102.
Recommendation schemes based on playlists or similar lists of media items are limited in their utility because the underlying data is fixed. While new playlists may be added (or others deleted) from time to time, and the recommendation databases updated, that approach does not directly respond to user input or feedback. Put another way, users may create playlists, and submit them (for example through a web site), but the user may not in fact actually play the items on that list. User behavior is an important ingredient in making useful recommendations. One aspect of this disclosure teaches how to take into account both what a user “says” (by their playlist) and what the user actually does, in terms of the music they play, or other media items they experience. The present application discloses these concepts and other improvements in related recommender technologies.
Additional aspects and advantages of this invention will be apparent from the following detailed description of preferred embodiments, which proceeds with reference to the accompanying drawings.
Reference is now made to the figures in which like reference numerals refer to like elements. For clarity, the first digit of a reference numeral indicates the figure number in which the corresponding element is first used.
In the following description, certain specific details of programming, software modules, user selections, network transactions, database queries, database structures, etc. are omitted to avoid obscuring the invention. Those of ordinary skill in computer sciences will comprehend many ways to implement the invention in various embodiments, the details of which can be determined using known technologies.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In general, the methodologies of the present invention are advantageously carried out using one or more digital processors, for example the types of microprocessors that are commonly found in servers, PC's, laptops, PDA's and all manner of desktop or portable electronic appliances.
Described herein is a new system for building Pre-Computed Correlation (PCC) datasets for recommending media items. In some embodiments, the proposed system combines the methods to build mutually exclusive PCC datasets into a single unified process. The process is presented here as a simple discrete dynamical system that combines item similarity estimates derived from statistical data about user media consumption patterns with a priori similarity estimates derived from metadata to introduce new information into the PCC datasets. Statistical data gathered from user interactions with recommender-driven media experiences is then used as feedback to fine-tune these PCC datasets.
In one embodiment, the process takes advantage of statistical data gathered from user-initiated media consumption and metadata to introduce new information into PCCs in a way that leverages social knowledge and addresses a “cold-start” problem. The “cold-start problem” arises when there are new media items that are not yet included in any user-defined associations such as playlists or playstreams. The problem is how to make recommendations without any such user-defined associations. The system disclosed herein incorporates metadata related to new media items with the user-defined associations to make recommendations related to the new media items until the new media items begin to appear in user-defined associations or until passage of a particular time period.
In one embodiment, the PCCs are fine-tuned using feedback in the form of user interactions logged from recommender-driven media experiences. In some embodiments, the system may be used to build individual PCC datasets for specific media catalogs, a single PCC dataset for multiple catalogs, or other special PCC datasets (new releases, community-based, etc.).
In an embodiment, the playlist analyzer 106 accesses and analyzes playlists from “in-the-wild” sources, aggregating the playlist data in an Ultimate Matrix of Associations (UMA) dataset 116. “In-the-wild” playlists are those accessed from various databases and publicly and/or commercially available playlist sources. The playstream analyzer 108 accesses and analyzes consumed media item data (e.g., logged user playstream data), aggregating the consumed media item data in a Listening Ultimate Matrix of Associations (LUMA) dataset 118. The media catalog analyzer 110 accesses and analyzes media catalog data, aggregating the media item data in a Metadata PCC (MPCC) dataset 120. The user feedback analyzer 114 accesses and analyzes logged user feedback responsive to recommended media items, aggregating the data in a Feedback Ultimate Matrix of Associations (FUMA) dataset 122.
In one embodiment, PCC builder module 104 merges the UMA 116, LUMA 118, FUMA 122 and MPCC 120 relational information to generate a single media item recommender dataset to be used in recommender application 112 configured to provide users with media item recommendations based on the recommender dataset.
In one embodiment, the playlist analyzer 106 may generate the UMA dataset 116 by accessing “in-the-wild” playlist source(s) 124. Similarly, the playstream analyzer 108 may generate the LUMA dataset 118 by accessing a playstream data (ds) database 128 which comprises at least one playstream source. The playstream harvester 130 compiles statistics on the co-occurrences of media items in the playstreams, aggregating them in the LUMA dataset 118. The LUMA dataset 118 can also be viewed as an adjacency matrix of a weighted, directed graph. In one embodiment, each row Li is a vector of statistics on the co-occurrences of item i with every other item j in the collection of playstreams gathered by the playstream harvester 130, and each element Li,j, as with the UMA dataset 116, is therefore the weight on the edge in the graph from item i to item j. Generating the LUMA dataset 118 and playstream data by analyzing consumed media item data is discussed in greater detail below.
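The adjacency-matrix view of these datasets can be made concrete with a short sketch. The function below is purely illustrative (the data layout, the names, and the decision to count each item once per playstream are assumptions, not the harvester's actual implementation); it aggregates, for each pair of items, the number of playstreams in which they co-occur, which is the kind of statistic a row Li collects:

```python
from collections import defaultdict
from itertools import combinations

def build_cooccurrence_matrix(playstreams):
    """Aggregate LUMA-style co-occurrence statistics from playstreams.

    Each playstream is a list of item IDs.  Returns `luma`, a nested dict
    where luma[i][j] is the number of playstreams in which items i and j
    co-occur (the weight on the edge from i to j), and `pop`, where pop[i]
    is the number of playstreams containing item i.
    """
    luma = defaultdict(lambda: defaultdict(int))
    pop = defaultdict(int)
    for stream in playstreams:
        items = set(stream)                  # count each item once per stream
        for i in items:
            pop[i] += 1
        for i, j in combinations(items, 2):  # unordered pairs of distinct items
            luma[i][j] += 1
            luma[j][i] += 1
    return luma, pop

# Example:
# luma, pop = build_cooccurrence_matrix([["a", "b", "c"], ["b", "c"], ["a", "c"]])
# luma["b"]["c"] == 2, pop["c"] == 3
```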
In one embodiment, the media catalog analyzer 110 generates the MPCC dataset 120 by accessing the media catalog(s) 133. The coldstart catalog scanner 136 compares the metadata for media items in one or more media catalogs 133. The all-to-all comparison of media item metadata by coldstart catalog scanner 136 generates a preliminary PCC, M(n), that can be combined with a preliminary PCC corresponding to the LUMA dataset 118 and UMA dataset 116 generated in the PCC builder 104.
In one embodiment, the user feedback analyzer 114 generates the FUMA dataset 122 by aggregating user feedback statistics with popularity and similarity statistics based on the LUMA dataset 118. The user generated feedback is responsive to media item experiences associated with media item recommendations driven by the recommender 102. However, there are various other methods of incorporating user generated feedback and claimed subject matter is not limited to this embodiment. Generating the FUMA dataset 122 using the user feedback, popularity, and similarity statistics is described in greater detail below.
In one embodiment, the PCC builder initially accesses or receives the relational data UMA dataset 116, (U(n)), LUMA dataset 118, (L(n)), and the MPCC dataset 120, (M(n)). At each PCC update instant n, this relational information is combined with FUMA dataset 122, (F(n)), and the previous value P(n−1) to compute the new PCC values 138 (P(n)) for item i. The computed PCCs 138 are supplied to the recommender 102, and the recommender knowledge base (kb) 102 is used to drive recommender-based applications 112. In one embodiment, the user responses to those applications are logged at user behavior log 132, between instant n−1 and n. User feedback processor 134 processes the logged user feedback to generate the FUMA dataset 122 (F(n)) used by the PCC Builder 104 in the update operation, here represented formally as:
P(n)=f(P(n−1),U(n),L(n),M(n),F(n))
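As one concrete, purely illustrative reading of this update equation, the loop below sketches the dataflow through the processing stages described in this disclosure; the stage functions passed in are hypothetical stand-ins for the linear estimator, graph search, metadata blending, and Bayesian feedback tuning discussed below, not the actual implementation:

```python
def pcc_update(prev_pcc, uma, luma, mpcc, fuma,
               combine, graph_search, blend_metadata, apply_feedback):
    """One update instant n of the PCC builder:
    P(n) = f(P(n-1), U(n), L(n), M(n), F(n)).

    The four stage callables are assumptions standing in for the processing
    stages described in the text.
    """
    x = combine(uma, luma)                    # estimate co-occurrences X(n)
    y = graph_search(x)                       # transitive similarities Y(n)
    z = blend_metadata(y, mpcc)               # fold in metadata M(n) -> Z(n)
    return apply_feedback(z, fuma, prev_pcc)  # tune with feedback F(n) -> P(n)
```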
In some embodiments, individual values in the MPCC dataset 120 (M(n)) may not evolve after initial computation; the time evolution of M(n) reflects the effect of adding new media items or metatags to the media catalogs 133 (mi and mij). The adaptive recommender system 100 proposes a method for combining U(n) and L(n) into new values to which a graph search process is applied, and a method for modifying the result using M(n) and F(n).
In some embodiments, Pre-Computed Correlation (PCC) datasets are built from various Ultimate Matrix of Association (UMA) and Listening UMA datasets based on playlist and/or playstream data. The UMA and LUMA datasets are discussed in greater detail below.
In some embodiments, the PCCs may be built using ad hoc methods. For instance, the PCCs may be built from processed versions of UMA and LUMA datasets wherein the UMA or LUMA datasets for the item with ID i may include two random variables qi and ci,j, which may be treated as measurements of the popularity of item i and the similarity between items i and j.
Using one such ad hoc method, the similarities may first be weighted as:

$$\tilde{c}_{i,j} = c_{i,j}\left[\frac{2\ln q}{(q_i q_j)^k}\right]$$

The weighted similarities $\tilde{c}_{i,j}$ then serve as the edge weights of a directed graph over the items.
In this embodiment, the PCC for item i is built by searching the graph starting from item i and ordering all items j≠i according to their maximum transitive similarity ri,j to item i. The transitive similarity along a path ei,j={i=k0, k1, k2, . . . , j=kn} from i to j along which no item km appears twice is computed as:
$$r(e_{i,j}) = \prod_{l=0}^{n-1} \tilde{c}_{k_l,\,k_{l+1}}$$
The maximum transitive similarity between items i and j then is computed, subject to search depth and time bounding constraints, as:
$$r_{i,j} = \max_{e_{i,j}} r(e_{i,j})$$
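The graph search can be sketched as a small depth-bounded traversal; the function below is a minimal illustration under assumed names and a plain DFS strategy (the actual build applies search depth and time bounding constraints and other pruning not shown here):

```python
def max_transitive_similarity(graph, source, max_depth=4):
    """Depth-bounded search for the maximum transitive similarity from `source`.

    `graph[i][j]` holds the weighted similarity between items i and j; the
    transitive similarity of a path is the product of its edge weights, and
    no item may appear twice on a path.  Returns best[j], an approximation
    of r_{source,j} for every item reached within `max_depth` hops.
    """
    best = {}

    def dfs(node, product, depth, visited):
        for nbr, w in graph.get(node, {}).items():
            if nbr in visited:
                continue
            p = product * w
            if p > best.get(nbr, 0.0):
                best[nbr] = p
            if depth + 1 < max_depth:
                dfs(nbr, p, depth + 1, visited | {nbr})

    dfs(source, 1.0, 0, {source})
    return best  # sort items by best[j] to build the PCC row for `source`
```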
In other embodiments, PCCs may be built using a principled approach, such as for instance using a Bernoulli model to build PCC datasets from UMA and/or LUMA datasets as described below.
The simplest model for the co-occurrence of two items i and j on a playlist or in a playstream is a Bernoulli model that places no deterministic or probabilistic constraints on playstream/playlist length. This Bernoulli model just assumes that:
ρij=Pr{Oc(j)|Oc(i)}=Pr{Oc(i)|Oc(j)}=ρji
where Oc(i) denotes item i occurs on a playlist or in a playstream, and 0≦ρij≦1 is some symmetric measure of the “similarity” of item i and j. The random occurrence of both items on a playlist or in a playstream given that either item occurs then is modeled as a Bernoulli trial with probability:
Taking advantage of the identities:
this can be re-expressed as:
Finally, the probability that both items occur given that either occurs is denoted ηij=Pr{Oc(i) ∧ Oc(j) | Oc(i) ∨ Oc(j)}.
To model the co-occurrences, let ci(n) denote the number of actual playlists/playstreams that include item i up through update index n, and let ci,j(n) denote the actual number of playlists/playstreams that includes both item i and item j. To capture initial conditions correctly, assume also there is some earliest update n0>0 after which both items could be included on a playlist/playstream. The total number of playlists including item i or item j then is
c(i,j;n)=[ci(n)−ci(n0)]+[cj(n)−cj(n0)]−cij(n)
Since the occurrence of both items on a playlist or in a playstream given that either item occurs is modeled as a Bernoulli trial, the number of playlists/playstreams that include item j given that the playlist/playstream includes item i after update n0 is a binomial random variable cij(n) with distribution:
and mean and variance:
$$\mu_c = c(i,j;n)\,\eta \qquad\qquad \sigma_c^2 = c(i,j;n)\,\eta(1-\eta)$$
respectively.
Continuing with the general Bernoulli model for building PCCs, one quantity of interest in this model of co-occurrences is the estimate ρ̂ij of the similarity ρij given the quantities ci(n), cj(n), and cij(n). For the binomial distribution fc(c), the maximum-likelihood estimate η̂ for η is the value which maximizes the function fc(c) for a given c=cij(n) and c(i, j; n). This is the value η̂ such that
From which it is easily computed that:
The maximum likelihood estimate for the similarity then is (perhaps not surprisingly)
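The displays for this maximum-likelihood step are not reproduced above. For reference, under the binomial model just stated the condition and its solution take the standard textbook form below (the corresponding expression for ρ̂ij depends on the identity relating ρij to ηij, which also is not reproduced here):

$$\frac{\partial}{\partial\eta} f_c\bigl(c_{ij}(n)\bigr)\Big|_{\eta=\hat\eta} = 0 \quad\Longrightarrow\quad \hat{\eta} = \frac{c_{ij}(n)}{c(i,j;n)}$$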
Continuing still with the general Bernoulli model for building PCCs, another quantity of interest is the expected number of co-occurrences of two items given that either of them appears on a playlist or in a playstream. This is the quantity:
where c(i, j; n) is the number of playlists or playstreams that include either item i or j.
As already noted, given actual values ci(n), cj(n), cij(n), and n0, the number of playlists or playstreams including item i or j item is:
c(i,j;n)=[ci(n)−ci(n0)]+[cj(n)−cj(n0)]−cij(n)
If ρij is known, the expected number of co-occurrences, to which cij(n) can be compared, would be
The probability that cij(n) would actually be observed is:
Given multiple random processes x1, . . . , xm representing independent samples xi=x+wi of an underlying variable x corrupted by zero-mean additive measurement noise wi, a linear estimate x̂ for x is:

$$\hat{x} = k_1 x_1 + \cdots + k_m x_m$$
In the optimal minimum variance estimator, the gains k1, . . . , km are chosen such that the estimation error:

$$\tilde{x} = x - \hat{x} = x - (k_1 x_1 + \cdots + k_m x_m)$$

has zero mean E{x̃} and minimum variance E{x̃2}, given the known variances σ12, . . . , σm2 of the m observations for x.
The zero mean requirement is met by:
From this, the constraint km=1−Σi=1m−1ki results.
The variance of x̃ can be simplified using the properties that E{wi}=0, E{wiwi}=σi2, and E{wiwj}=0 for i≠j.
Noting the relationship on the ki derived from the zero-mean constraint, this simplifies further to
The minimum-variance choices for the gains ki are found by solving the family of simultaneous equations:
for i=1, . . . , m−1. The general solution is:
while for the special case m=2
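The displays for the general solution and the m=2 special case are not reproduced in this text; for reference, the standard inverse-variance weighting to which a minimum-variance estimator of this form reduces for two observations is:

$$k_1 = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2},\qquad k_2 = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2},\qquad \hat{x} = \frac{\sigma_2^2\,x_1 + \sigma_1^2\,x_2}{\sigma_1^2 + \sigma_2^2}$$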
Referring again to
In one embodiment, Mi datasets for new items i are initially computed and updated each processing instant, by the following general process:
This process assumes that a suitable computation of the similarity mij of two items i and j is available. Additionally, the process accounts for the case in which the catalog of seed items for recommendations contains items that are not in, or are even completely disjoint from, the catalog of recommendable items.
Playlist analyzer 106 generates the UMA dataset 116 by accessing “in-the-wild” playlists source(s) 124. Harvester 126 compiles statistics on the co-occurrences of media items in the playlists, such as tracks, artists, albums, videos, actors, authors and/or books. These statistics are aggregated in the UMA dataset 116. UMA dataset 116 can be viewed as an adjacency matrix of a weighted, directed graph. In one embodiment, each row Ui is a vector of statistics on the co-occurrences of item i with every other item j in the collection of playlists gathered by the Harvester 126 process, and each element Ui,j is therefore the weight on the edge in the graph from item i to item j.
The dataflow diagram of
In one embodiment, the played events in the played table of the ds database 128 are the primary source data for LUMA 118. The data is buffered in the played event buffer 518 and stored in the Buffered Playlist Data (bds) database 516. Table 1 below presents the column structure of the played table. Several columns of the “played” table are relevant for building LUMA 118.
The fields shown in Table 1 and their contents may include:
pd_user_id_pk_fk—registered user ID.
pd_subscriber_id—Client platform ID.
pd_remote_addr—Originating IP address for play event.
pd_time_zone—Offset from GMT for client local time.
pd_country_code—The two-letter ISO country code returned by GeoIP for the IP address.
pd_shuffle—Media player shuffle mode flag (0=non-shuffle, 1=shuffle).
pd_source—Source of play event track (the legitimate values depend on the listening scenario and the client; see Table 2).
In one embodiment, legitimate values for Table 1 fields include but are not limited to:
In one embodiment, the contents of the pd_source and pd_playlist name items depend on the listening scenario and the client as shown in Table 2. In Table 2, “dpb” means “determined by player” and of course “nA” means “not applicable”. “pl_name” means the playlist name as known to the music player and “lib_name” means the library name as known to the music player. “shd_name” for the Mac client means the name the user has set as the iTunes->Preferences->Sharing->Shared name. Library and Musicstore may be the actual text strings returned by the player. Finally, “-” means that the items get assigned the null string as a value, either because, or regardless, of what the client may have sent.
The orphan_track and resolved_track tables in the orphan database 508 may contain additional supporting information for possible resolution of tracks that could not be resolved when the play event was logged. Tables 3 and 4 present embodiments of the column structures of the orphan_track and resolved_track tables, respectively. In one embodiment, raw track information may be retrieved from a Backend Resolver 520 API.
In one embodiment, to decouple the LUMA build process 500 from other activity in the ds database 128, the played events in the played table are buffered in the played event buffer 518 into one or more copies of the played table in the played event buffer bds database 516. The played table in the bds database 516 may have the same or similar structure as shown in Table 1 for the source played table of ds database 128.
In an embodiment, a MySql playstream segmentation (ps) database 510 may be used to maintain data, in some cases keyed to user IDs, needed for the segmentation operation. Because the contents of this database may be constantly changing, a framework such as iBATIS may be used as the access method.
In a particular embodiment, in order to support the dynamic segmentation of played events accumulated in the played table of the ds database 128 into playstreams, a detection table is maintained for mapping the ID of each user (dt_user_id_pk_fk=pd_user_id_pk_fk) into the ID in the played table for the last played item (dt_played_id_pk=pd_played_id_pk) actually included in a playstream and the ID of the last playstream extracted (dt_stream_id). Table 5 presents an embodiment of a column structure of the detection table in the ps database that implements this mapping.
Events in the played table may be processed in blocks. In an embodiment, to track the last played event of the last processed block, an extraction table may be maintained that includes only the last processed event ID. Table 6 presents an embodiment of a column structure of the extraction table in the ps database 510 that maintains this value.
In a particular embodiment, to keep track of the last ID assigned to a playstream for a user, a stream table may be maintained for mapping the ID of each user (st_user_id_pk_fk pd_user_id_pk_fk) into the last playstream converted into an rpf file (st_rpf_id). Table 7 presents an embodiment of the column structure of a stream table in the ps database 510 that implements this mapping.
To keep track of the last ID assigned to a playlist, a single-row table must be maintained that contains the last assigned playlist ID (lst_playlist_id). Table 8 presents an embodiment of a column structure of the list table in the ps database 510 that implements this mapping.
In a particular embodiment, a single-row luma2uma table may be used to store the ID of the last RPF file from the rpf playlist directory 506 that has been combined into an input rpf file for the UMA build pipeline in the playlist analyzer 106.
In one embodiment, playstreams detected and extracted from the played table of the ds database 128 may be stored in playstreams archive 512 as individual files in a hierarchical directory structure keyed by the 32-bit pd_user_id_pk_fk and a 32-bit playstream ID number. In one embodiment, if the 32-bit pd_user_id_pk_fk is represented as the four-byte string u3u2u1u0 and the 32-bit playstream ID number is represented by the four-byte string p3p2p1p0, then the fully-qualified path file names for playstream files may have the form:
archive_path/u3/u2/u1/u0/p3/p2/p1/p0
where archive_path is the root path of the playstream archive.
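A minimal sketch of this path construction follows; whether each byte is rendered as a decimal or hexadecimal directory name is not specified above, so the decimal rendering here (and the function name) is an assumption:

```python
import os

def playstream_path(archive_path, user_id, stream_id):
    """Build archive_path/u3/u2/u1/u0/p3/p2/p1/p0 for a playstream file.

    user_id and stream_id are 32-bit integers; each is split into its four
    bytes, most significant first, and each byte becomes one path component
    (the decimal rendering of each byte is an assumption).
    """
    u_bytes = [(user_id >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    p_bytes = [(stream_id >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return os.path.join(archive_path, *[str(b) for b in u_bytes + p_bytes])

# playstream_path("/var/playstreams", 0x0102030A, 7)
#   -> "/var/playstreams/1/2/3/10/0/0/0/7"
```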
In an embodiment, each playstream file may contain relevant elements from the played table events for the tracks in the playstream. The format may consist of a first line which contains identifying information for the playstream and then n item lines, one for each of the n tracks in the playstream.
The first line of the playstream file may have the format:
pd_user_id_pk_fk pd_subscriber_id pd_remote_addr pd_time_zone pd_country_code pd_source pd_playlist_name pd_shuffle stream_begin_time stream_end_time
where the items with the “pd_” prefix are the corresponding items from the first play event in the stream, stream_begin_time is the pd_begin_time of the first event in the playstream, and stream_end_time is the pd_end_time of the last event in the playstream. All items are space separated and the last item is followed by the OS-defined EOL separator. In one embodiment, a necessary condition for play events to be grouped into a playstream may be that they all have the same value for the first six items in the first line of the playstream file.
The remaining n lines for the tracks in the playstream have the format:
pd_played_id_pk pd_track_id:pd_artist_id:pd_album_id pd_is_skip
where the items with the “pd_” prefix may be the corresponding items from the play event for the track.
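A hypothetical serializer for this file format (field order taken from the format description above; the function and variable names are assumptions) might look like:

```python
def write_playstream_file(path, header, events):
    """Write a playstream file: one space-separated header line followed by
    one line per track.

    `header` is a dict with the ten header fields listed above; `events` is a
    list of dicts holding pd_played_id_pk, pd_track_id, pd_artist_id,
    pd_album_id, and pd_is_skip for each track.  This is an illustrative
    serializer, not the production writer.
    """
    header_fields = [
        "pd_user_id_pk_fk", "pd_subscriber_id", "pd_remote_addr",
        "pd_time_zone", "pd_country_code", "pd_source",
        "pd_playlist_name", "pd_shuffle",
        "stream_begin_time", "stream_end_time",
    ]
    with open(path, "w") as f:
        f.write(" ".join(str(header[k]) for k in header_fields) + "\n")
        for e in events:
            triple = f'{e["pd_track_id"]}:{e["pd_artist_id"]}:{e["pd_album_id"]}'
            f.write(f'{e["pd_played_id_pk"]} {triple} {e["pd_is_skip"]}\n')
```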
As shown in
In an embodiment, the playstream segmenter 530 segments playstreams by a process that examines events in the played table for a given user to determine groups of sequentially contiguous events which can be segmented into playstreams.
In a particular embodiment, two criteria may be used to find segmentation boundaries between groups of played events. The first criterion may be that all events in a group must have the same values for the following columns in the played table:
In a particular embodiment, two consecutive events which differ in any of these values may define a boundary between two consecutive playstreams.
The second criterion for defining a playstream may be based on time gaps between sequential tracks. Two consecutive tracks for which the pd_begin_time of the second event follows the pd_end_time of the first event by more than a specified gap may also define a boundary between two consecutive playstreams.
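The two segmentation criteria can be sketched as follows; the specific gap threshold, the column set, and the assumption that pd_begin_time/pd_end_time are numeric timestamps are all illustrative choices, not values taken from this disclosure:

```python
def segment_playstreams(events, context_cols, max_gap_seconds=600):
    """Split a user's chronologically ordered played events into playstreams.

    A boundary is placed between two consecutive events when (a) any of the
    context columns differs, or (b) the second event's pd_begin_time follows
    the first event's pd_end_time by more than max_gap_seconds.  Events are
    dicts keyed by played-table column names.
    """
    streams, current = [], []
    for event in events:
        if current:
            prev = current[-1]
            same_ctx = all(event[c] == prev[c] for c in context_cols)
            gap = event["pd_begin_time"] - prev["pd_end_time"]
            if not same_ctx or gap > max_gap_seconds:
                streams.append(current)
                current = []
        current.append(event)
    if current:
        streams.append(current)
    return streams

# context_cols might be ["pd_user_id_pk_fk", "pd_subscriber_id",
#                        "pd_remote_addr", "pd_time_zone",
#                        "pd_country_code", "pd_source"]
```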
As already noted, the playstream extraction process is asynchronous with processes for inserting events into the played table. In a particular embodiment, both processes run continuously, with the user ID to played event ID mapping in the detection table of the ps database 510 used to arbitrate the data transfer between the processes.
The playstream-to-playlist converter 504 processes the extracted playstreams into rpf format playlists. This processing mainly involves removing redundant events and resolving orphan events that could not be resolved at the time the event was generated.
In an embodiment, each event in a raw playstream may contain either a valid colon-delimited track:artist:album triple, or a null triple 0:0:0 together with an orphan ID. In addition, a playstream can contain duplications which are not of interest for a playlist. The playstream-to-playlist converter resolves the orphans it can with the aid of the resolver 509 and the resolved_track table in the orphan database.
The ps database 510 may contain the state information for the asynchronous playstream-to-rpf conversion process. For each user ID, the stream table may contain the playstream ID (e.g., st_rpf_id) of the last playstream actually converted to an rpf playlist and the detection table may contain the playstream ID (e.g., dt_stream_id) of the last playstream actually extracted by the playstream segmenter 530. In one embodiment, the playstream segmenter 530 is a functional block of the playstream harvester 130 (see
An important question in defining CTL events is whether the playstream analyzer 108 should generate events on a per-playstream basis or for aggregate statistics, or both. On one hand, if CTL events are generated on a per-playstream basis, the number could be large, and grow with the number of users. On the other hand, because the LUMA builder operates in an asynchronous mode, a natural period over which to aggregate statistics would be one activation of the LUMA processes. Thus the actual time period encompassed by the playstreams processed in a single activation of the LUMA processes could vary from activation to activation, and so additional states would have to be maintained to regularize the aggregated statistics.
CTL events may be generated on a per-playstream/per-playlist basis and stored in the ctl database 514. That is, a CTL PLAYSTREAM_HARVEST event may be generated for each extracted playstream and a CTL PLAYLIST_HARVEST event may be generated for each playstream converted to an rpf playlist.
Referring to
The playstream-to-playlist converter process described herein assumes identifiers for playstreams are sequential, such that the last playstream identified will have an ID indicating that it was the last-in-time playstream to be identified. Process 900 begins at block 902 by retrieving the list of users (pd_user_id_pk_fk) for which the playstream ID (dt_stream_id) in the detection table in the ps database 510 is greater than the last identified raw playlist (st_rpf_id) in the stream table. At block 904, for each value user_id in the list, the value of last_stream_id for the selected user_id is retrieved from the detection table in the ps database 510 and the value of last_rpf_id for the selected user_id is retrieved from the stream table in the ps database 510. The process flows to block 906 where, for each playstream with this_stream_id from last_rpf_id+1 to last_stream_id, an iterative process begins by removing from the playstream all but one instance of each event with duplicate track IDs or orphan IDs, regardless of whether they are sequential or not. At block 908, the track ID, artist ID, and album ID are extracted for each item in the processed playstream into an rpf format playlist. At block 910, the rpf playlist is stored in the watched directory at the start of the UMA build system playlist analyzer 106, with the 4-byte playstream user ID as the playlist Member ID, the lower 24 bits of last_playlist_id+1 as the lower 3 bytes of the Playlist ID, and the upper byte of the Playlist ID set to a code for the playstream source according to Table 10.
At block 912, increment last_playlist_id and update the list table in the ps database 510 with last_playlist_id. At block 914, update the stream table in the ps database with this_stream_id for this user_id. At block 916 the process ends.
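A compact, purely illustrative rendering of process 900 follows; the database accessors, archive reader, and output writer are hypothetical helpers, and the duplicate-removal and ID-packing details mirror the description above:

```python
def convert_playstreams_to_playlists(ps_db, archive, rpf_out_dir, source_code):
    """Sketch of the playstream-to-playlist conversion (process 900)."""
    # Block 902: users whose last extracted stream is newer than the last converted one.
    pending = [u for u in ps_db.users()
               if ps_db.detection_last_stream(u) > ps_db.stream_last_rpf(u)]

    for user_id in pending:                                   # block 904
        last_stream_id = ps_db.detection_last_stream(user_id)
        last_rpf_id = ps_db.stream_last_rpf(user_id)

        for stream_id in range(last_rpf_id + 1, last_stream_id + 1):   # block 906
            events, seen = [], set()
            for e in archive.read(user_id, stream_id):
                key = e["track_id"] or e["orphan_id"]         # drop duplicate tracks/orphans
                if key not in seen:
                    seen.add(key)
                    events.append(e)

            # Block 908: extract track/artist/album IDs into an rpf playlist.
            rpf_items = [(e["track_id"], e["artist_id"], e["album_id"]) for e in events]

            # Blocks 910/912: pack the Playlist ID (source code in the upper byte,
            # lower 24 bits from the incremented counter) and store the playlist.
            playlist_id = (source_code << 24) | ((ps_db.last_playlist_id() + 1) & 0xFFFFFF)
            rpf_out_dir.write(member_id=user_id, playlist_id=playlist_id, items=rpf_items)
            ps_db.set_last_playlist_id(ps_db.last_playlist_id() + 1)

        ps_db.set_stream_last_rpf(user_id, last_stream_id)    # block 914
```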
To start, in a particular embodiment the linear estimator 202 receives the playlist and playstream data U(n) 116 and L(n) 118.
Linear Estimator for Estimating Co-Occurrences from Playlist and Playstream Data
The Bernoulli model discussed above for determining co-occurrences in the UMA 116 and LUMA 118 datasets is applied below. The model postulates that the random occurrence of two items i and j on a playlist or in a playstream, given that either item occurs on the playlist or in the playstream, is modeled as a Bernoulli trial with probability:
where 0≦ρi,j≦1 is some symmetric measure (ρij=ρji) of the assumed “similarity” of item i and j. In this model, the number of co-occurrences of items i and j is modeled by a binomial random variable xij(n) and the expected number of co-occurrences is:
where x(i, j; n) is the number of playlists or playstreams that include item i or item j.
Referring to FIG. 2, the estimated number of co-occurrences computed by the linear estimator 202 is:

$$x_{ij}(n) = \hat{\eta}(n)\, x(i,j;n)$$
where {circumflex over (η)}(n) is the estimated probability that both items i and j occur on a playlist or playstream if either one does, and x(i, j; n) is some preferred choice for the total number of playlists and playstreams that include item i or j.
A starting assumption for the estimator is that it may be desirable to arbitrarily weight the relative contributions of the playlist and playstream data in any estimate. The most straightforward way to do this is by defining two weighting constants 0≦αu, αl≦1 such that the effective numbers of co-occurrences are αuuij(n) and αllij(n), and the total numbers of playlists or playstreams including item i or j, as defined below, are αuu(i, j; n) and αll(i, j; n). The estimate for η then is:
The estimator can then be re-expressed as:
For some specific choices of αu, αl and x(i, j; n), the general estimator reduces to specific linear estimators (a sketch of the weighted estimator follows the list of special cases below):
αu=1, αl=1, x(i,j;n)=u(i,j;n)+l(i,j;n)—The resulting estimator

$$x_{ij}(n) = u_{ij}(n) + l_{ij}(n)$$

with unweighted contributions by uij(n) and lij(n) turns out to be a simple minimum variance estimator as described below.
x(i,j;n)=αuu(i,j;n)+αll(i,j;n)—For this case, the estimator

$$x_{ij}(n) = \alpha_u\, u_{ij}(n) + \alpha_l\, l_{ij}(n)$$

is a weighted minimum variance estimator. The weights should reflect some independent assessment of the relative value uij(n) and lij(n) contribute to the PCCs driving the recommender. Note the value of x(i, j; n) for this estimator implies that the popularities in the items Xi(n) and Xj(n) of the data set built from Ui(n), Uj(n), Li(n) and Lj(n) must be the weighted sums of the popularities Ui(n), Li(n) and Uj(n), Lj(n), respectively.
αu=αl, x(i,j;n)=αuu(i,j;n)+αll(i,j;n)—The general case of the resulting estimator
is an unweighted minimum variance estimator if the popularities in the items Xi(n) and Xj(n) are adjusted to be the weighted sum of the popularities in Ui(n), Li(n) and Uj(n), Lj(n), respectively. This form of the co-occurrence estimator may be useful for accommodating mathematical requirements in the subsequent graph search phase of the PCC build process.
x(i,j;n)=u(i,j;n)+l(i,j;n)—The general case of the resulting estimator
results in inconsistent datasets Xi(n). Because this choice for x(i, j; n) implies the popularities in Xi(n) and Xj(n) are the sum of Ui(n), Li(n) and Uj(n), Lj(n), respectively, but the co-occurrences are a weighted estimate, the number of playlists and playstreams implied by xi(n), xj(n), and xij(n) will be inconsistent with x(i, j; n). Furthermore, xi(n), xj(n) cannot be adjusted for every i and j to be consistent. The special case αu=αl reduces to the unweighted minimum variance estimator.
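The display giving the weighted estimate for η is not reproduced above; the sketch below assumes the natural form implied by the definitions (weighted co-occurrence counts divided by weighted totals) and shows how the simple minimum-variance case collapses out of it:

```python
def estimate_cooccurrence(u_ij, l_ij, u_total, l_total, alpha_u=1.0, alpha_l=1.0):
    """Weighted linear estimate of co-occurrences from playlist (UMA) and
    playstream (LUMA) counts.

    u_ij / l_ij:       co-occurrence counts for items i and j in playlists / playstreams.
    u_total / l_total: number of playlists / playstreams containing item i or j,
                       i.e. u(i,j;n) and l(i,j;n).
    alpha_u / alpha_l: relative weights for the two data sources.

    Returns (eta_hat, x_ij): the estimated co-occurrence probability and the
    estimated co-occurrence count for the weighted total
    x(i,j;n) = alpha_u*u(i,j;n) + alpha_l*l(i,j;n).  The exact weighted form
    is an assumption, since the original display is not reproduced.
    """
    weighted_total = alpha_u * u_total + alpha_l * l_total
    if weighted_total == 0:
        return 0.0, 0.0
    eta_hat = (alpha_u * u_ij + alpha_l * l_ij) / weighted_total
    x_ij = eta_hat * weighted_total        # = alpha_u*u_ij + alpha_l*l_ij
    return eta_hat, x_ij

# With alpha_u = alpha_l = 1 this reduces to the simple minimum-variance case:
# estimate_cooccurrence(3, 5, 40, 60) -> (0.08, 8.0)
```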
Graph Search for Determining Similarity from Co-Occurrence Estimate
The following discussion refers to the graphs illustrated in FIGS. 3 and 4.
In an embodiment, a graph search may identify all paths in X(n) graph 300 between all pairs of nodes comprising a head node and a tail node (or originating node and destination node). For a given head node, a search may determine all other nodes in graph 300 that are connected to the head node via some continuous path. For instance, head node 310 is indirectly connected to tail node 312 via path 308 through an intervening node 316. Head node 304 is directly connected to tail node 314 along path 311 via edge 302.
In Y(n) graph 400 the paths identified in graph 300 are represented as weighted edges (e.g., 402) connecting head nodes to tail nodes in graph 400. The weight attached to an edge is a function indicating similarity and/or distance which correlates to the number of nodes traversed over a particular path joining two nodes in the X(n) graph 300. For instance, for head node 410 (corresponding to node 310 of graph 300) and tail node 412 (corresponding to node 312 in graph 300) the weight on edge 408 correlates to path 308 in graph 300. The weight on edge 411 connecting nodes 404 and 414 correlates to path 311 in graph 300.
In an embodiment, for similarity, the weight on an edge joining a head node to a highly similar tail node is greater than the weight on an edge joining the head node to a less similar tail node. For distance the opposite is the case: the distance weight on an edge joining the head node to a highly similar tail node is less (they are closer) than the weight on an edge joining the head node to a less similar tail node.
Referring again to
In practice, variants of the second and third stage functionality may be combined into a single processing operation in several ways. For instance, in one embodiment, a Bayesian estimator 208 tunes the composite Z(n) 222 in response to user feedback F(n) 218. User feedback may be short-term user feedback Fs(n) and/or long-term user feedback Fl(n), used to produce the final PCC dataset P(n) 218. Long and short term user feedback is discussed in further detail below.
Referring again to
Given an initial update instant ni in which both item i and item j first appear on playlists or in playstreams, zij(n) may be computed as follows:
Using this formula, the contribution of m(n) is faded out and the contribution of yij(n) is faded in, reflecting an assumption that even relatively small values of yij(n) should be used as zij(n) if they have persisted long enough, because they represent rare but interesting similarities between i and j. A choice for the coefficient β under this assumption is:
$$\beta = e^{-1/N}$$
where N is the number of updates after which the contribution of mij should be less than roughly ⅓.
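The display defining zij(n) is not reproduced here, so the blending below is an assumption: it implements the fade described in the text, with the metadata term decaying geometrically by β = e^(−1/N) per update after the first instant ni at which both items appear, and the graph-search term fading in correspondingly:

```python
import math

def blend_metadata(y_ij, m_ij, n, n_i, half_life_updates=10):
    """Blend graph-search similarity y_ij(n) with metadata similarity m_ij.

    n_i is the first update instant at which both items appear on playlists
    or in playstreams.  The metadata contribution is faded out by
    beta = exp(-1/N) per update and the co-occurrence contribution is faded
    in; the exact blending formula is an assumption.
    """
    beta = math.exp(-1.0 / half_life_updates)
    fade = beta ** max(n - n_i, 0)     # ~1 right after n_i, -> ~1/3 after N updates
    return fade * m_ij + (1.0 - fade) * y_ij
```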
A variety of other processes and procedures based on assumptions about the relationship between metadata similarity and the model of similarity implied by the graph search procedure on the co-occurrence data may also be executed by the adaptive recommender system 100 and claimed subject matter is not limited in this regard. For instance, the update instant n1 at which fading out of the metadata contribution begins could be delayed until the number of correlations between every item on the path between i and j exceeds a certain number. The graph search process would view the number of correlations between two items as 0 until a threshold is exceeded. Another approach could be based on deriving an estimate for the variance of the yij(n) and delaying n1 until that variance falls below a threshold value after both items i and j first appear on playlists or in playstreams.
PCC builder 104 in
It should be noted that in the embodiment described herein, the task of adapting the recommender to better match aggregate audience preferences is addressed. However, personalizing recommendations may be accomplished for instance by looking at results for individual users and claimed subject matter is not limited in this regard. Adapting the recommender kb 102 to aggregate audience preferences may be implemented in a variety of ways. Thus, the embodiments described herein are intended for illustrative purposes and do not limit the scope of claimed subject matter.
PCC datasets may be organized on a per item basis. The PCC dataset for item i may include a set of random variables ri,j, each of which is a monotonic estimate of the actual similarity ρi,j between item i and item j. The PCC dataset also includes a random variable qi which is an estimate of the popularity σi of item i.
In an embodiment, various sources of data can be used in the recommendation process, including the UMA 116 and an analogous pair of popularity q′i(t) and association estimates r′i,j(t) based on user listening behavior using the LUMA 118.
Use of various types of user feedback leverages differences inherent and implicit in various types of feedback. For instance, there may be an essential difference between the replays/skips and the thumbs up/down ratings as listeners come to actually use those features. Aggregate replays/skips data may reflect the popularity arc of a track/artist/album. Aggregate thumbs up/down ratings may reflect something intrinsic about the quality of a track/artist/album. Replays/skips and thumbs up/down ratings data may be a measure of attributes of the specific tracks, or may be indicative of some relationship between the subject item and other preceding tracks. In other words, a thumbs-down rating on a rock track that appears in the context of a number of jazz tracks the listener likes suggests that the rock track is not a good recommendation to a listener who likes the jazz tracks but is not necessarily a useful rating of the inherent quality of the rock track.
Users may interact with media streams built or suggested using data provided by recommender kb 102. The users may interact with these media streams in several ways and those interactions can be divided for example into positive assessments and negative assessments reflecting general user assessments of the items in the streams:
Positive assessments are actions that to some degree indicate a positive reception by the user, for example:
Negative assessments are actions that to some degree indicate a negative reception by the user, for example:
In interpreting these actions, the context in which the user assessments are made may be accounted for by using the media streams as context delimiters. For instance, when a user bans an item j, (e.g. a Bach fugue) in a context that includes item i (e.g. a Big & Rich country hit), that action indicates something about the item j independently, and about item j relative to the preferred item i. Both types of information are useful in tuning the recommender. The view of media streams as context delimiters, and the user interactions as both absolute and relative assessments of items in those contexts, can be used to adapt the association information encoded in the unadapted PCC dataset Z(n) 222 to produce the final tuned PCC dataset P(n) 138.
Different user actions can be inferred to have different importance for tuning recommendations. Plays, replays, skips, thumbs up, and thumbs down actions suggest more transient responses to items, while add-to-favorites and bans suggest more enduring assessments. To reflect this difference, the former user actions may be measured over a short time span, such as over one update instance or period, while the latter user actions may be measured over a longer time span.
The presentation of media items may be organized into sessions. Users may control media consumption during a presentation session by providing feedback where the feedback selections such as replays/skips and thumbs up/down rating features exert influences on the user-experience, for instance:
Based on these considerations, information about the attributes of individual media items, and about the relationships between media items, can be extrapolated from the user feedback data.
In Bayes Estimation, an observed random variable y is assumed to have a density fy(θ; y), where θ is some parameter of the density function. The parameter itself is assumed to be a random variable 0≦θ≦1 with density fθ(θ) referred to as a prior distribution. The problem is to derive an estimate θ̂ given some sample y of y and some assumed form for the distribution fy(θ; y) and the prior distribution fθ(θ). An important aspect of Bayes estimation is that fθ(θ) need not be an objective distribution as in standard probability theory, but can be any function that has the formal mathematical properties of a distribution, based on a belief of what it should be or derived from other data.
Because fy(θ; y) varies with θ, it can be viewed as a conditional density fy|θ(y|θ). The joint density fy,θ(y, θ) of y and θ then can be expressed as:

$$f_{\theta|y}(\theta\mid y)\,f_y(y) \;=\; f_{y,\theta}(y,\theta) \;=\; f_{y|\theta}(y\mid\theta)\,f_\theta(\theta)$$
Re-arranging by Bayes Law yields the posterior distribution:

$$f_{\theta|y}(\theta\mid y) = \frac{f_{y|\theta}(y\mid\theta)\,f_\theta(\theta)}{f_y(y)}$$

Although fy(y) typically is not known, it can be derived from fy|θ(y|θ) and fθ(θ) as:

$$f_y(y) = \int_0^1 f_{y|\theta}(y\mid\theta)\,f_\theta(\theta)\,d\theta$$

Given a value for y, the Bayes estimate for θ is the value for which fθ|y(θ|y) has minimum variance. This is just the conditional mean θ̂=E{θ|y} of fθ|y(θ|y).
As a simple example of Bayes estimation, consider the case where fy|θ(y|θ) has a binomial distribution and fθ(θ) has a beta distribution:
The joint density then is:
From this the marginal can be computed as:
Taking the quotient yields the beta posterior density:
The Bayes estimate is the conditional mean E{θ|y} of fθ|y(θ|y).
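The displays for this beta-binomial example are not reproduced above; as a concrete illustration of the conjugate-prior result it describes, the posterior mean of a binomial success probability under a Beta(a, b) prior can be computed as below (parameter names are generic, not those of the original example):

```python
def beta_binomial_bayes_estimate(y, trials, a, b):
    """Posterior-mean (minimum-variance Bayes) estimate of a binomial
    success probability theta with a Beta(a, b) prior.

    y: observed successes out of `trials` Bernoulli trials.
    The posterior is Beta(a + y, b + trials - y), whose mean is the Bayes
    estimate; with a = b = 0 the formula reduces to the maximum likelihood
    estimate y / trials.
    """
    return (a + y) / (a + b + trials)

# Example: prior Beta(2, 8) (prior mean 0.2), then observe 7 successes in 10 trials.
# beta_binomial_bayes_estimate(7, 10, 2, 8) -> 0.45
```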
Referring again to
The user feedback F(n) 122 in
The first five items (plays, replays, thumbs up, skips, thumbs down) may be aggregations over a small number of previous update periods, while the last two items (add to favorites, ban) may be aggregations over a long time scale.
At each update instant n, the number ai(n) of actual presentations of item i and the number aij(n) of actual presentations of item j in the context of item i are known. Let Ai(n) represent the collection of these counts for item i and A(n) represent the collection of all Ai(n). An estimate of the number of presentations di(n) and dij(n) that the audience actually desired is calculated from A(n) and F(n), perhaps as the weighted sums:
where the γk and λk are arbitrary constants. di(n) and dij(n) could also be computed according to any suitable non-linear functions di(n)=Γ(fi(n)) and dij(n)=Λ(fij(n)). This model can also be applied to user feedback measured on a “1”-“5” star scale, or any similar rating scheme.
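The weighted-sum display referenced above is not reproduced in this text, so the following sketch assumes one plausible form (actual presentations plus positively weighted positive feedback minus weighted negative feedback); the feedback categories and weights are placeholders:

```python
def desired_presentations(actual, feedback, gamma):
    """Estimate the number of presentations the audience 'desired' as a
    weighted sum of actual presentations and feedback counts.

    actual:   number of actual presentations a_i(n).
    feedback: dict of feedback counts over the update period, e.g.
              replays, skips, thumbs_up, thumbs_down for item i.
    gamma:    dict of weights for each feedback type (arbitrary constants).

    The specific form below is an assumption for illustration only.
    """
    d = float(actual)
    d += gamma.get("replays", 0.0) * feedback.get("replays", 0)
    d += gamma.get("thumbs_up", 0.0) * feedback.get("thumbs_up", 0)
    d -= gamma.get("skips", 0.0) * feedback.get("skips", 0)
    d -= gamma.get("thumbs_down", 0.0) * feedback.get("thumbs_down", 0)
    return max(d, 0.0)
```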
With values ai(n) and aij(n) for the actual number of presentations of item i and of item j in the context of item i, and estimates di(n) and dij(n) for the imputed desired number of presentations, any number of schemes can be used to compute an estimate pij(n) for the component pij(n) of the PCC item Pi(n). In one embodiment, a Bayesian estimator (as described above) may be used to derive a posterior estimate {circumflex over (p)}ij(n) of the value pij(n) most likely to result in the desired number of presentations di(n) and dij(n), given that the actual presentations aij(n) were randomly generated by the recommender kb 102 and application at a rate proportional to the prior value pij(n) determined by the value zij(n) of the random variable zij(n).
The Bayesian estimator example described above makes the rather arbitrary assumptions that the random variable pij(n), given the actual presentations ai(n) of item i and the expected presentations ai(n)zij(n) of item j in the context of item i, has a beta distribution (omitting the update index n for the moment to simplify the notation):
and that the random variable dij(n) conditioned on pij(n) has a binomial distribution:
The resulting random variable pij(n) conditioned on dij(n) also is beta distributed:
The Bayesian estimate p̂ij(n)=E{pij(n)|dij(n)} then is:
The Bayesian estimator for p̂ij(n) only compensates for the difference between the user experience that resulted from the prior value pij(n) and the desired user experience. The effects of zij(n+1), reflecting information from new playlists, new playstreams and metadata on the PCC dataset, must also be incorporated in the computation of the new pij(n+1) value to be used in the PCC dataset until the next update instant. If it is assumed that the difference between the value pij(n+1) used by the recommender until the next update instant and the compensated p̂ij(n) value for the current instant n is solely determined by the playstreams, playlists, and metadata fed into the system between instant n and n+1, an estimate for pij(n+1) can be expressed as:

$$p_{ij}(n+1) = \hat{p}_{ij}(n) + z_{ij}(n) - z_{ij}(n-1)$$
Finally, the notation with regard to time instants can be cleaned up a bit by letting pij(n) denote the random variable for the value of pij to be used from time instant n until the next update at time instant n+1, and letting dij(n) denote the random variable for the value of dij based on the user feedback from time instant n−1 until the update at time instant n based on experiences generated by the recommender for the value pij(n−1). With those definitions, the random variable pij(n) can be expressed as:
$$p_{ij}(n) = k_p\,p_{ij}(n-1) + k_d\,d_{ij}(n) + k_0(n) + z_{ij}(n) - z_{ij}(n-1)$$
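Read literally, this update is a one-line recursion; the sketch below restates it with placeholder gains (in the text the gains come from the Bayesian estimator rather than being fixed constants):

```python
def update_pcc_entry(p_prev, d_fb, z_now, z_prev, k_p=0.8, k_d=0.2, k_0=0.0):
    """One feedback-driven update of a PCC entry, following
    p_ij(n) = k_p*p_ij(n-1) + k_d*d_ij(n) + k_0(n) + z_ij(n) - z_ij(n-1).

    The gain values here are placeholders for illustration only.
    """
    return k_p * p_prev + k_d * d_fb + k_0 + (z_now - z_prev)
```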
It is important to note that even though the assumptions about the forms of the densities fp(pij) and fp|d(pij|dij) may not match the actual data, and therefore that the estimate for pij(n) may be sub-optimal, the overall system may be stable as long as the estimates of di(n) and dij(n) are constrained such that di(n)≧dij(n). In production, the sub-optimal performance of the adaption process may be all but obscured by the other random effects in the system, but it may be necessary to estimate the relevant distributions if experience shows that better performance is required.
In another embodiment, consumption of media items by a single user may be organized into sets of items, which in the case of music media items may be called “tracks.” Such a set may be referred to as a session {I1, . . . , Il}.

For each item i, the set of sessions for day n which include item i may be defined. If user sessions span multiple days, they may be arbitrarily divided into multiple sessions. In a particular embodiment, users may be restricted from randomly requesting items. However, a user may request repeated performances and may skip the first or subsequent repeated performances. As a result, in general the set of sessions including item i can be represented as the union of two non-disjoint subsets which include the plays and the skips, respectively, of item i.
For the purposes of discussion, the raw PCC dataset for item i is represented as φi, and the final PCC dataset as θi(k), where φi,j ≡ ri,j and θi,j(k) are the values for item j in the respective PCC datasets for item i. Xi(k) represents the number of times the system selects item i for presentation to the audience over some interval nk−Δ<n≦nk. Similarly, for the same time period, Yi(k) represents the number of times the audience would like to have item i performed, and the number of times the audience would like item j performed in a session with item i is represented as yi,j(k).
In one embodiment, inferring θi,j(k) from φi,j(k), Xi(k), Yi(k), and yi,j(k) proceeds in two phases at each update instant k. In the first phase, the quantities Xi(k), Yi(k), and yi,j(k) are inferred from the data. Using those statistics, in the second phase the final PCC entry θi,j(k) is estimated from the values for Xi(k), Yi(k), and yi,j(k) computed in the first phase and φi,j(k) using simple Bayesian techniques.
In an embodiment, in the first phase the number Xi(k) of presentations of item i that the system makes to the audience is expressed, and the numbers Yi(k) and yi,j(k) of performances of item i, and of performances of item j in a session with item i, respectively, that the audience preferred are inferred. Xi(k) is based on the system constraints. Since the user may not randomly request an item, and the system does not initiate presentation of an item more than once in a session, the number of presentations by the system is the number of sessions containing at least one play or skip of item i:
Although a particular session may include more than one instance of item i, only the first instance in either subset would have been presented by the system to the user. For later use in computing yi,j(k), the analogous number of presentations of item j in a session with item i by the system is:
In contrast to Xi(k), Yi(k) and yi,j(k) reflect audience responses to the items presented to them. As noted previously, the audience members may have two types of responses available to them. First, they may choose to listen to the item one or more times, or they may skip the item. Second, they may rate the item as “thumbs up”, “thumbs sideways” or “thumbs down”. Yi(k) and yi,j(k) may be inferred from user feedback provided through these mechanisms by computing certain daily statistics from the session histories described herein below. For convenience, the description represents the sum statistic for a daily statistic z(n) as:
The statistics may be assumed to start from day n=1, and therefore z (n;n) is the sum from n=1.
To define Yi(k), four random variables are defined which are daily statistics for the sessions in Pi(n). Let pi(n), si(n), ui(n), and di(n) represent the number of plays, skips, “thumbs up” ratings, and “thumbs down” ratings, respectively, for item i. For these daily statistics, define the four sum statistics Pi(n, Δ), Si(n, Δ), Ui(n, n), and Di(n, Δ), where Δ defines the time period over which skipped items should be repeated less frequently. Although skipped items are discussed explicitly here, the effect of skips is primarily manifest in the system implicitly through a value for Yi(k) which would be less than the value the system autonomously would present in the absence of skips. The number of plays the audience desired is defined as:
The first bracketed term reflects the number of performances of those presented by the system that the audience actually chose to accept. The second bracketed term is the number of repeats requested by the audience, and the third term is a boost factor to reflect the historical popularity of highly-rated items. Assume that rating an item “thumbs down” does not automatically cause the system to skip the item and that a play is registered for the item. If the system automatically skips the item in response to a “thumbs down” user rating the first term would be Xi(k)−Si(nk, Δ).
The weighting factors specify the relative emphasis the system should give to the audience response to the baseline system presentation (λi), audience requested repeats (ki), and ratings (ni). The constant ξi plays a role in the second phase, where it in effect prevents the system from exaggerating the similarity between item i and other items in a session based on too little data about item i.
The number of performances of item j in a session with item i that the audience desired is defined in an analogous way to Yi(k). First let xi,j(n), pi,j(n), si,j(n), ui,j(n), and di,j(n) represent a number of presentations, plays, skips, “thumbs up” ratings, and “thumbs down” ratings, respectively, for item j in a session in which the user accepts a performance of item i, and define the corresponding sum statistics Xi,j(k), Pi,j(n, Δ), Si,j(n, Δ), Ui,j(n, n), and Di,j(n, Δ). The number of performances of item j in a session with item i desired by the audience then is:
System constraints preclude the system from presenting an item more than once per session to a user, and the definition of Xi,j(k) gives:

$$X_i(k) - D_i(n_k,\Delta) - S_i(n_k,\Delta) \;\geq\; X_{i,j}(k) \;\geq\; X_{i,j}(k) - D_{i,j}(n_k,\Delta) - S_{i,j}(n_k,\Delta)$$
Similarly, since under the same constraints an item can only be rejected at most once per session, Ui(nk, nk)≧Ui,j(nk, nk). If the user could not request that items be repeated, then Yi(k)≧yi,j(k) if λi≧λi,j, ki≧ki,j, ni≧ni,j, and ξi≧ξi,j. However, because the number of repeats a user may request of item i is independent of the number of repeats he or she can request of item j, we cannot assume that:
$$P_i(n_k,\Delta) + S_i(n_k,\Delta) - X_i(k) \;\geq\; P_{i,j}(n_k,\Delta) + S_{i,j}(n_k,\Delta) - X_{i,j}(k)$$
or, therefore, that Yi(k)≧yi,j(k). Since a specific user request that item j be repeated would typically mean that the user simply likes item j, rather than that the user prefers joint performances of item i and item j, and since repeats will be relatively infrequent, yi,j(k) can be arbitrarily upper-bounded by Yi(k) to account for this.
Additionally, the coefficients λi, ki, ni, ξi, and λi,j, ki,j, ni,j, ξi,j may be selected using various techniques. One approach would be to derive the coefficients such that Yi(k) and yi,j(k) are maximum likelihood or Bayesian estimates based on the observed data Pi(n, Δ), Si(n, Δ), Ui(n, n), Di(n, Δ), and Pi,j(n, Δ), Si,j(n, Δ), Ui,j(n, n), and Di,j(n, Δ).
Another method is an ad hoc technique based on a “gut feeling” of how each component should be weighted to give the best picture of the audience preferences. In this case, it is important first to understand the role of the constant terms ξi and ξi,j by examining the ratio xi,j/Xi. As Xi becomes small, this ratio becomes increasingly non-representative of the entire audience. One way to counter this is to choose ξi and ξi,j such that the ratio ξi/ξi,j reflects the similarity value φi,j for item j in the PCC dataset for item i. The Bayesian estimation technique outlined below presents one formal alternative for incorporating φi,j.
Another important observation for the ad hoc approach is that the coefficients ki and ki,j determine how much repeat requests by the audience members should be weighted. Arguably, m repeat requests by a single audience member should be given less emphasis than m repeat requests by m audience members, so ki and ki,j should be monotonic increasing functions of the number of audience members represented by the sessions. The same reasoning applies to the coefficients ηi and ηi,j on the contribution of the positively rated items.
Once the random process models Xi(k), Yi(k), and yi,j(k) for the audience preference statistics are derived, a parameter estimation problem arises which is: For each pair of items i and j, there are observations yi,j(k) described by a random process yi,j(k) whose sample instants have the distribution fy(y) that depends in some way on the element θi,j in the final PCC dataset. There is also prior information in the form of an entry φi,j in the raw PCC dataset. In order to find the value of the parameter θi,j that best explains the observations yi,j(k) given the prior information φi,j, and to develop a realistic way for computing the weighting coefficients α, β, and γ an estimator of the general form:
θ(k)=αφ+βy(k)+γ
is used.
Thus, at any particular time assume that entry θi,j for item j in the PCC dataset for item i is the probability that item j should be presented to a user in a session with track i. Under this assumption, yi,j(k) has a binomial distribution (again omitting the subscripts to clarify the notation):
where, for a particular yi,j(k), θi,j(k) is an element of the final PCC dataset. Yi(k) is the maximum number of possible presentations of item j in the context of item i derived by the methods discussed above in Phase 1, and is independent of the number of presentations of j.
Two approaches are discussed herein for estimating θ̂i,j(k) that provide an explanation for an observed value y′i,j(k)=min{yi,j(k), Yi(k)}, where the observed value y′i,j(k) is taken to be bounded by Yi(k) to account for possible user-requested repeats of item j in a session with item i. First, a maximum likelihood estimate for the second embodiment of the user feedback system, in the absence of any other information about θi,j(k) and yi,j(k), is discussed. Then a Bayesian estimator for the second embodiment of the user feedback system, which incorporates additional knowledge of the prior PCC φi,j(k) used to determine the number of items xi,j(k) originally presented to the user, is discussed.
In the absence of any other information except the observed data yi,j(k), a choice for θi,j would be the maximum likelihood estimate (MLE) θ̂i,j. Omitting subscripts for notational clarity, the MLE θ̂ is the value of θ for which:

showing that, in the absence of any additional information about θi,j(k), the best estimate is θ̂i,j(k)=yi,j(k)/Yi(k).
The naive maximum likelihood estimator makes no assumptions about the properties of θi,j(k). The Bayesian approach to estimation assumes instead that θi,j(k) is a random variable whose prior distribution fθ(θ) is known at the outset, and treats the distribution fy(θ; y) of the observed data as a conditional distribution fy|θ(y|θ). In this case, of interest is an estimate θ̂i,j(k) given the observation yi,j(k) and the assumption for the prior distribution of θi,j(k).
In the Bayesian estimation framework, θ̂i,j(k) is referred to as an a posteriori estimate for θi,j(k), and is the value of θ for which the posterior distribution:

has minimum variance. This minimum variance Bayes estimate is the conditional mean θ̂i,j(k)=E{θ|y} of fθ|y(θ|y).
The conditional distribution fθ|y(θ|Y) is assumed to be binomial. Further, fθ(θ) is assumed to be the conjugate prior density of fθ|y(θ|y). For a binomial conditional, the conjugate prior is the beta density:
where φi,j is an element of the initial PCC dataset used to select the xi,j(k), and Xi(k) is the actual number of presentations of item i initiated by the system, derived by the methods of the previous section. Xi(k)φ is used here rather than xi,j(k) to explicitly incorporate the nominal influence of φ into the model rather than implicitly introduce φ via its influence on the observations xi,j(k).
Given the conditional distribution fθ|y(θ|y) and the prior density fθ(θ), the joint density can be directly expressed as:
From the joint density, the marginal distribution can be derived as:
Taking the quotient shows that the posterior density is also a beta density:
Thus, from the posterior density fθ|y(θ|y) the Bayes estimator is:
For comparison, the maximum likelihood estimator is the value θ̂MSE for which fθ|y(θ|y) assumes a maximum value (the mode). Using the methods of Phase 1, the following estimate is found:
The weighted sum forms of these estimates highlight how the coefficients depend on the sizes of the data sets, in contrast to weighted sum formulations with fixed coefficients, and how both estimates can differ significantly from the maximum likelihood estimate of the previous section where the initial PCC value φi,j is not taken into account. This form also shows how the Bayes estimate includes a constant term that is not present in the ML estimate. Finally, for small X+Y the difference between the two estimates can be non-trivial, but for either large X or large Y the two estimates converge:
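The closed-form displays in this passage are not reproduced above; under the stated assumptions (binomial likelihood with Y opportunities, conjugate beta prior whose parameters encode the initial PCC value φ and the X system-initiated presentations), a hypothetical comparison of the Bayes estimate with the naive MLE looks like:

```python
def bayes_pcc_estimate(y, Y, phi, X):
    """Posterior-mean estimate of theta_ij given y desired performances out of
    Y opportunities, with a conjugate beta prior encoding the initial PCC
    value phi and the X system-initiated presentations.

    The prior Beta(X*phi, X*(1 - phi)) is an assumed parameterization: its
    mean is phi and its weight is equivalent to X pseudo-observations.  For
    large Y the estimate approaches the naive MLE y/Y; for large X it stays
    near the prior value phi.
    """
    return (X * phi + y) / (X + Y) if (X + Y) > 0 else phi

def mle_pcc_estimate(y, Y):
    """Naive maximum likelihood estimate, which ignores the prior phi."""
    return y / Y if Y > 0 else 0.0

# Example: phi = 0.10, X = 50 presentations, y = 12 desired out of Y = 40.
# bayes_pcc_estimate(12, 40, 0.10, 50) -> (5 + 12) / 90 ≈ 0.189
# mle_pcc_estimate(12, 40)             -> 0.30
```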
Although every item in every PCC dataset could be updated at each time instant, for the case Yi(k)=0, and therefore yi,j(k)=0, the estimate is set to:
Thus, even though the audience did not desire any performances of item i, or item j in the presence of item i, the value of θi,j(k) differs from φi,j. Note this is not the case for the maximum likelihood estimator since:
Differentiating the case of null audience feedback (no presentations of an item) from wholly negative audience feedback (all skips) can be done by elaborating the actual process for the estimator as follows:
where θi,j(0)=φi,j.
The proposed process for building PCC datasets seeks to combine the processes for building U(n) and L(n) to build PCCs for the recommender. The new process can reasonably be viewed as a dynamical system driven by statistical data about user consumption, catalog metadata, and user feedback in response to recommender performance. The data processing involved has been described at a certain level of abstraction to provide reasonable insight into the actual objective of each step without prescribing specific, possibly suboptimal, computations in needless detail. The resulting system merges the two independent processes into a single process that addresses the cold-start problem in a reasonably simple but useful way. Finally, the new process provides a method for fine-tuning the PCCs in response to user feedback.
It will be obvious to those having skill in the art that many changes may be made to the details of the above-described embodiments without departing from the underlying principles of the invention. The scope of the present invention should, therefore, be determined only by the following claims.
This application claims priority to U.S. Provisional Application No. 61/057,833 filed May 31, 2008 and incorporated herein by this reference in its entirety.
Number | Date | Country
---|---|---
61057833 | May 2008 | US