Recommendation systems are programs that suggest items of potential interest to a person—such as television programs, music, and retail products—given some information about the person's interests.
Often, recommendation systems are implemented using collaborative filtering techniques, in which a person's interests are determined (filtered) based on the interests of many other people (by collaboration). Collaborative filtering systems generally operate in two steps: first, identify people who share the same interests as the target user, as indicated by rating patterns or past purchase activity; then, using the ratings from those like-minded people, make recommendations to the user. Shortcomings of naive collaborative filtering include: inadequate overlap of interests between the user and the group (a.k.a., the “sparsity problem”); ineffectiveness when not enough rating or purchase information is available for new items; potential privacy concerns arising from purchase or preference information being stored on third-party servers; and the potential for recommendations to be influenced by the artificial inflation or deflation of ratings (spoofing).
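The two-step collaborative filtering process described above can be sketched as follows. This is a generic, minimal illustration of the prior-art technique, not part of the present invention; the rating data, the cosine similarity measure, and the `recommend` function are hypothetical choices made for the sketch:

```python
from math import sqrt

def similarity(a, b):
    """Cosine similarity between two sparse rating dicts (step 1:
    measure how like-minded two users are)."""
    common = set(a) & set(b)
    if not common:
        return 0.0  # no overlap: the "sparsity problem" in miniature
    dot = sum(a[i] * b[i] for i in common)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm

def recommend(target, others, top_n=2):
    """Step 2: score unseen items by the similarity-weighted ratings
    of like-minded users."""
    scores, weights = {}, {}
    for ratings in others.values():
        w = similarity(target, ratings)
        if w <= 0:
            continue
        for item, r in ratings.items():
            if item in target:
                continue  # only recommend items the user has not rated
            scores[item] = scores.get(item, 0.0) + w * r
            weights[item] = weights.get(item, 0.0) + w
    ranked = sorted(((s / weights[i], i) for i, s in scores.items()), reverse=True)
    return [item for _, item in ranked[:top_n]]

target = {"A": 5, "B": 4}
others = {
    "u1": {"A": 5, "B": 5, "C": 4},
    "u2": {"A": 1, "D": 5},
}
print(recommend(target, others))  # → ['D', 'C']
```

Note that an empty overlap yields a similarity of zero, which is exactly the sparsity shortcoming noted above.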
Another approach to recommendation systems is content-based. In this approach, the content or other characteristics of the items themselves are used to gauge a person's interest in new items. For example, knowledge of the genres, artists, actors, directors, writers, MPAA-type ratings, cost, and production dates of previously consumed (viewed, purchased, listened to) items is used to predict additional items of interest. These techniques depend on the ratings or past behavior of an individual user, not on the preferences of a group. Shortcomings of this approach include the need for the user to explicitly enter preference/profile information and the difficulty of extracting good features for describing items.
One content-based system is described in U.S. Pat. No. 6,727,914 B1, by Gutta. Unlike the present invention, Gutta does not use a large, dynamic population of descriptive terms (attributes); instead, a small number of attributes are predefined for all users. His system appears to use a conventional, information-gain-directed decision tree rather than the incrementally updated clustering decision tree described herein. Further, unlike the present invention, his system appears unable to provide ranking scores for items, instead partitioning items into two groups: recommended and not recommended. It is also unclear from U.S. Pat. No. 6,727,914 B1 which attributes are preferred for a television recommendation system, and whether they are defined as simple Boolean variables, as multi-valued variables, or over a continuous range. Further, unlike the present invention, Gutta does not disclose how the system identifies programs watched by the user.
Attribute: In certain embodiments, an attribute is a term or feature used to describe an item, often encoded in a binary format when represented in a BAF, attribute bit vector, or term bit vector.
Attribute Bit Vector: In certain embodiments, an attribute bit vector is a bit vector containing fixed (predefined) attributes describing a particular item.
Binary Attribute Format (BAF): In certain embodiments, binary attribute format is both the format of and a reference to a compiled fixed length data structure containing all the information about an item for use by the IPGX Client.
Bit Vector: In certain embodiments, a bit vector is the Attribute Bit Vector and Term Bit Vector taken together, upon which learning, ranking, and filtering decisions are based.
Certainty Factors: In certain embodiments, a certainty factor is a prior probability score given to programs or other items indicating a measure of belief or disbelief in the like or dislike of an item by a viewer (or user in general). In other words, a certainty factor indicates how good or bad the evidence is for classifying an item as being of interest or disinterest.
Clickstream: In certain embodiments, a clickstream is a time sequence of channel changes, web site visits, or button clicks made by TV viewers or computer users.
Clustering: In certain embodiments, clustering is the process of partitioning items into groups of similar items.
Clustering Decision Tree: In certain embodiments, a clustering decision tree is a decision tree in which leaves denote clusters of similar examples. In certain embodiments, the criteria used to determine node splitting in the clustering decision tree is similarity of cluster centroids, rather than a metric related to information gain.
Common (or Canonical) Text Format: In certain embodiments, common (or canonical) text format is a text encoding used for collecting key descriptive terms about items before they are compiled into Bit Vectors.
CPE (Customer Premises Equipment): In certain embodiments, a CPE device is an electronic computing device deployed in the home or end user location for the purposes of providing television, radio, information access, or other functions on a broadband or other communications network. Examples of CPE devices include television set-top-boxes (for cable, satellite, and IPTV), certain televisions, personal computers, game consoles, mobile telephones, IP telephones, personal digital assistants, iPods, music players, and the like.
Data Sources: In certain embodiments, data sources are web sites, online databases, private databases, printed item descriptions, and electronic files containing item descriptions.
Decision Tree: In certain embodiments, a decision tree is a tree whose internal nodes are tests on input patterns and whose leaf nodes are categories of patterns.
Example: In certain embodiments, an example is a bit vector, BAF, or the like that describes an item.
Headend: In certain embodiments, a headend is the distribution system's side of a transmission system where servers and broadcast equipment are located. Also appears as “head end” and “head-end.”
Inductive Learning: In certain embodiments, inductive learning refers to methods of learning from examples.
IPG (Interactive Program Guide): In certain embodiments, an IPG is a program that displays a variety of available program options to users and allows users to select programs for viewing, listening, or recording. IPG is also referred to as EPG (Electronic Program Guide).
IPGX (Interactive Program Guide Extensions): IPGX refers to several embodiments of the present invention. In certain embodiments, IPGX provides television and movie program recommendations to users based on their viewing history.
IPGX Client: In certain embodiments, an IPGX client consists of the IPGX software components residing in a set-top box or other CPE device. Many variations of the IPGX Client are available according to embodiments of the present invention.
IPGX Server: In certain embodiments, an IPGX server consists of the IPGX software components residing at the distribution system headend. Many variations of the IPGX server are available according to embodiments of the present invention.
IPTV (Internet Protocol Television): In certain embodiments, IPTV refers to video content delivered using Internet Protocol, typically available from web sites or portals for downloading or streaming to CPE devices.
Information Retrieval (IR): In certain embodiments, information retrieval is the subfield of computer science that deals with the automated storage and retrieval of documents.
Items: In certain embodiments, items are television programs, movies, advertisements, music, books, merchandise, online auction items, sports players, sports teams, e-mail messages, vendors, service providers, businesses, advertisers, web sites, video clips, pictures, text content, and the like.
Item Descriptions: In certain embodiments, item descriptions are television program listings, interactive program guide entries, web pages, database entries, text documents, and reviews.
iTV: In certain embodiments, iTV refers to interactive television.
MSO (multiple services operator): In certain embodiments, MSO is a generic term for the provider of video programs and related services like Video on Demand, Electronic Program Guides, and Internet/Data Services.
Program: In certain embodiments, a program is a unit of video content such as a movie, series episode, advertisement, sporting event, or the like.
Program Data Repository: In certain embodiments, a program data repository is a database or other data structures where program data is stored on the IPGX Server.
Target: In certain embodiments, to target is to identify items of particular interest to a specific user or group of users, as in “to target advertising.”
Targeted Advertising: In certain embodiments, targeted advertising consists of information about products or services designed to appeal to specific groups of viewers and delivered to reach those viewers.
Term: In certain embodiments, a term is a descriptive element found in items or item descriptions, such as a word, phrase, name, title, e-mail address, identification number, and the like.
Term Attribute: In certain embodiments, a term attribute is an attribute mapped to a bit in the term map for a user, and subsequently appearing in a term bit vector.
Term Identifier: In certain embodiments, a term identifier is the same as a term number. Also denoted by “term ID.”
Term Name: In certain embodiments, a term name is a unique name assigned to a term found in item descriptions, in which space characters are replaced by underscore characters (_), all letters are converted to uniform upper or lower case, and the name is prefixed by a character that identifies the kind of term, such as actor, director, writer, genre, and the like.
Term Number: In certain embodiments, a term number is a unique integer assigned by the server to a term. In certain embodiments, a term number is assigned to a term only if the term meets several criteria. The term number is used to uniquely identify a term attribute during term mapping, learning, and ranking.
Term Bit Vector: In certain embodiments, a term bit vector is a bit vector containing the variable attributes (a.k.a. terms) describing a particular item.
Term Dictionary: In certain embodiments, a term dictionary is a table maintained on the Server that keeps track of all variable terms extracted from the various item description sources. The table maps each term to a unique identifier and notes the frequency each term has occurred in the entire item database.
Term Map: In certain embodiments, a term map is a list or other data structure maintained on a set-top box or other CPE device that keeps track of the variable attribute terms associated with “liked” items. It is used for assigning terms to bits in the local Term Bit Vector. It contains the Term Number (from the Server Term Dictionary), the Term Frequency (the number of times the term has been seen in “liked” items on the Client), and the Bit Assignment (which bit in the Term Bit Vector, if any, the term has been assigned to for the particular set-top box or PC).
Term Mapping: In certain embodiments, term mapping is the process of translating the variable terms encoded in Term Bit Vectors to specific bits in Bit Vectors. This personalizes item descriptions that are learned (via decision tree clustering) for an individual user or CPE device.
User: In certain embodiments, a user denotes an individual or group of people who use a CPE device or recommendation system, such as a television viewer, a radio listener, a group of television viewers who share a television or other CPE device, and the like.
Vector Space Model: In certain embodiments, the vector space model is a popular technique in IR where documents are represented as vectors of the significant words they contain. Each element of a vector specifies the frequency with which the corresponding word appears in the document. Similarities between documents (and between queries expressed as vectors and documents) can be computed using vector math operations.
Viewer: In certain embodiments, a viewer is a person who views and interacts with programs and other content provided by the distribution system. In certain embodiments, viewer is synonymous with user.
VOD (Video-on-demand): In certain embodiments, VOD refers to the process of unicasting video content to individual viewers on a specific CPE device whenever the viewers choose to view the content, as opposed to the more traditional method of scheduled broadcast video content delivery.
In an exemplary embodiment of the present invention, a system and method represent one or more items of interest to a user. The representation of an item of interest is presented as a vector consisting of N distinct attributes representing content or features that collectively describe the item. The relevance of an item, a quantitative estimate of a user's interest in the item, can be determined by analyzing the clickstream of remote control actions for television channel changes, or clicks used in navigating a web site (e.g., choosing auction items from an auction web site) via television or the internet. The N distinct attributes are gleaned from descriptive data about each item (preferably encoded in a canonical format). The attributes can be: (a) Predefined for the kind of items being considered, (b) Determined automatically from frequency data, or (c) A combination of both predefined and automatically-defined attributes. In an embodiment, each attribute in the item vector is a binary bit. Each item being learned by the system is assigned a relevance value representing a degree of interest the user has in the item. During a learning process, a binary decision tree is constructed using the Bit Vectors associated with each item being learned. Using an unsupervised learning algorithm, item vectors (a.k.a., examples) are clustered in the leaf nodes of the decision tree, each retaining its corresponding relevance value. The tree may be constrained to a fixed size, thus bounding the processing and memory requirements. To allow continuous learning of examples despite the fixed size, similar examples are merged and older examples are forgotten.
The invention is particularly suitable for systems providing broadband delivery of media content to CPE devices, because all learning, ranking, recommending, and filtering can be distributed to and performed within the constricted computing environments of CPE devices, and all personal data describing user likes, interests, and viewing habits are maintained within the CPE device, enhancing privacy.
To make item recommendations, the system periodically gathers information about items from data sources such as web sites, online databases, local files, electronic program guides (EPGs), auction item descriptions, and the like. As in the learning process, each item is represented as a bit vector with bits set to represent the current item. These vectors are passed to the ranking engine component of the system, which filters the item through the binary decision tree and finds the cluster of examples most closely matching the new item. The item is then ranked by calculating an interest score for the item that blends the similarity of the item with the examples in the cluster and their associated relevance values. In the electronic program guide embodiment, the Attribute and Term Bits represent information such as title, creation date, writer, director, producer, cast, and genre.
Another embodiment employs a number of clustering decision trees, with each tree representing a demographic group of television viewers. These trees can be used to help determine the makeup of viewers in a given household using the same STB and thus help decide what programming and targeted advertising to recommend at a particular time. The embodiment predefines a number of clustering decision trees, each of which effectively describes a demographic group of interest. From the remote control clickstream data, the embodiment ranks each of the clustering decision trees, accumulates the ranking values of each decision tree, and infers the number and characteristics of the people who live in a specific household with respect to their ages and television viewing habits. In this way, the embodiment ascertains the demographics of household members from data related to, for example, who is watching what program at what times of the day on a set-top box.
Yet another embodiment of the present invention compares the clustering decision trees generated by different applications, and/or on different set-top-boxes (if there are more than one set-top box) to generate more information about the viewers in one household. The embodiment compares the trees and branches of those trees for similarities and differences using a similarity algorithm. Differences would indicate another viewer who prefers one television unit over another. Similarities at different times indicate a viewer who watches different televisions at different times. This allows the identification of different television sets for the family, the kids, the bedroom, etc. Similarities at overlapping times may indicate multiple viewers in the house with the same interests and demographics. Small differences between these similar trees/branches/clusters may define subtle distinctions between similar individuals.
In another embodiment of the present invention, the algorithm is configured for filtering e-mail spam, where a front-end processor is employed to translate each e-mail message to set appropriate bits. Messages that are in a common text format (CTF) and received from an e-mail server are encoded into a binary attribute format (BAF) consisting of a message identifier, relevance (interest score), and attribute bit vector encoding features such as message date, time, size, header fields (e.g., the sender of the message, the subject of the message, etc.), and attachments (if any). Based on the recipient's interest in the message as determined by clickstream analysis (i.e., opening of the e-mail message, or deletion of the message without opening it), the embodiment determines whether or how to pass the message example to the user.
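A minimal sketch of this front-end translation step follows. The bit positions, thresholds, and field names are assumptions made for illustration only; the actual BAF layout and attribute set are defined elsewhere in this disclosure:

```python
# Hypothetical attribute bit positions (illustrative only, not the BAF spec)
KNOWN_SENDER, HAS_ATTACHMENT, LARGE, SENT_AT_NIGHT = 0, 1, 2, 3

def encode_email(msg, known_senders):
    """Translate a parsed message dict into a small attribute bit vector."""
    bits = 0
    if msg["from"] in known_senders:
        bits |= 1 << KNOWN_SENDER
    if msg.get("attachments"):
        bits |= 1 << HAS_ATTACHMENT
    if msg["size"] > 100_000:          # assumed size threshold, in bytes
        bits |= 1 << LARGE
    if not 7 <= msg["hour"] <= 22:     # assumed "daytime" window
        bits |= 1 << SENT_AT_NIGHT
    return bits

msg = {"from": "friend@example.com", "size": 2048, "hour": 14, "attachments": []}
print(bin(encode_email(msg, {"friend@example.com"})))  # → 0b1
```

The resulting bit vector, together with a relevance value derived from the recipient's clickstream, is what the learning and ranking machinery operates on.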
According to another embodiment of the present invention, a system for and method of recommending items to a user is presented. This embodiment includes generating item descriptions and learning the item description of interest to a user. This embodiment further includes learning terms or attributes effective for representing item descriptions and clustering item descriptions that are similar. This embodiment further includes recommending item descriptions based on the learning steps.
Various optional features of the above embodiments include the following. The item descriptions may constitute television programming data, advertising data, electronic mail, or web-based auction item data. A relevance value of each learned item of interest may be calculated from clickstream data using the following formula:
In certain embodiments, the maximum relevance value may be set to 127.
The following description is intended to convey an understanding of the invention by providing a number of specific embodiments and details involving various applications of the invention. It is understood, however, that the invention is not limited to these embodiments and details, which are exemplary only. It is further understood that one possessing ordinary skill in the art, in light of known systems and methods, would appreciate the use of the invention for its intended purposes and benefits in any number of alternative embodiments, depending upon specific design and other needs.
The following disclosure considers in detail potential applications for embodiments of the present invention, including, by way of non-limiting examples, systems and methods for providing greater personalization in the areas of TV programming, TV-based advertising, and email filtering.
In the past, television viewers identified television programs of interest using printed television program guides. Typically, such guides contained grids listing the available television programs by time, date, channel, and title. As the number of television content providers increased, so did the number of programs available to viewers. In some parts of the world broadcast channels alone number in the hundreds. With the added availability of pay-per-view channels and video-on-demand, effectively identifying desirable television programs using such printed guides has become impractical for viewers.
More recently, television program guides have become available in electronic format, often referred to as electronic program guides (EPGs). An EPG is an application used with Digital Video Recorders, set-top boxes for Cable, Satellite and IPTV delivery systems, and newer TVs to list current and scheduled programs that are or will be available on each channel. EPGs display a fragment of the available broadcast content in grids listing the available television programs by time and date, channel and title—much like their paper counterparts. Some also provide search functionality that allows viewers to find programs by actors, keywords, and other criteria.
Referring now to
In one configuration, the system 100 includes a plurality of set-top boxes (STBs) 140(a)-140(n) located, for instance, at customer premises. Generally, an STB 140 is a consumer electronics device that serves as a gateway between a customer's televisions 150(a)-150(n) and the network 115. Alternatively, STBs 140(a)-140(n) may be embodied more generally as a “Home Multimedia Platform” or an “Internet Protocol Set-Top Box”, signifying that STBs are becoming aligned with home broadband connectivity, such as wired or wireless LANs.
As shown in
EPGs may be improved according to certain embodiments of the present invention so as to enhance the ability of viewers to more quickly and easily identify programs of interest. For example, many viewers have preferences toward, or biases against, certain categories of programming, such as action-based programs or sports programming. By applying these viewer preferences to the EPG, programs of interest to a particular viewer can more effectively be found.
To address the limitations of EPGs, systems capable of making video programming recommendations are being developed. Some, like those offered by NetFlix, TiVo, and MovieLens, are based primarily on collaborative filtering. Others, such as that described in U.S. Pat. No. 6,727,914 to Gutta, take a content-based approach. Combined aspects of collaborative filtering, content-based recommenders, and explicit feedback may be used to provide better solutions. To minimize the need for viewers to explicitly enter information about themselves, some systems keep track of the channels the viewers watch (or buttons they push) to attempt to learn viewers' interests in the background (a.k.a., “clickstreaming”). Many of these systems go a long way toward improving viewers' abilities to find programs of interest. Many, however, still suffer from some of the shortcomings of collaborative filtering and content-based approaches mentioned above.
In the field of television-based advertising, the commercials currently aired during a program are those expected to appeal to the people who watch the program. This type of demographic targeting is coarse grained. It is well known that only a small fraction of viewers find the commercials of interest. More often, the viewer switches channels or walks away from the set. Methods are being developed to deliver advertisements on a more personalized basis. These methods leverage the collaborative filtering, content-based, and clickstream techniques discussed earlier and promise to better match product and service providers with interested customers. Methods that require the least effort by viewers, most closely match their current interests, and respect their privacy rights and expectations will generally be most successful.
With the pervasive acceptance of email comes the explosion of unwanted mass emailing (a.k.a., “spam”). Filtering techniques have been invented to stem the flood of spam, such as: local blacklisting where a system administrator maintains a list of spammers' IP addresses and blocks email from those addresses; distributed blacklists where web sites share blacklists; whitelisting that creates a list of accepted e-mailers and allows email only from senders on that list (cuts off email from legitimate senders that are not on the list); Bayesian filtering that scores words associated with spam and analyzes new messages based on the score. Other ways of filtering spam include: accepting only mail with a trusted key (may also remove non-spam emails that do not have an associated key); a greylist technique where a mail server refuses new messages with a temporary error and remembers the delivery attempt for the recipient email address, source email address, and source IP (may also refuse authentic messages until some time has elapsed since the first attempt, which introduces some delay on legitimate mail delivery). In turn, spammers are finding new ways to work around these protections. It would be beneficial to have more robust systems and methods for filtering unwanted electronic mail.
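The Bayesian filtering approach mentioned above can be illustrated with a minimal word-scoring sketch. This is a generic illustration with made-up frequency tables, not the method of the present invention; production filters additionally handle tokenization, smoothing choices, and decision thresholds:

```python
from math import log

def spam_score(words, spam_freq, ham_freq):
    """Naive Bayes-style log-odds score: positive means more spam-like."""
    score = 0.0
    for w in words:
        p_spam = spam_freq.get(w, 0) + 1   # add-one smoothing for unseen words
        p_ham = ham_freq.get(w, 0) + 1
        score += log(p_spam / p_ham)
    return score

# Hypothetical word counts learned from previously labeled messages
spam_freq = {"free": 40, "winner": 25, "meeting": 1}
ham_freq = {"free": 2, "winner": 1, "meeting": 30}

print(spam_score(["free", "winner"], spam_freq, ham_freq) > 0)   # spam-like
print(spam_score(["meeting"], spam_freq, ham_freq) > 0)          # ham-like
```

Spammers defeat such filters by padding messages with innocuous words, which is one reason more robust approaches remain desirable.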
Embodiments of the present invention typically employ a primarily content-based approach that uses inductive learning techniques to learn user likes and dislikes in a way that is efficient in terms of processing time and memory space, requires no explicit input from users, and avoids some of the other drawbacks of collaborative filtering and content-based recommendation systems mentioned earlier. Among its various applications, it can be used to make TV program and movie recommendations, product/service recommendations, target advertising, and filter unwanted email.
Certain embodiments of the present invention may leverage techniques from the fields of artificial intelligence and information retrieval—most notably inductive learning and vector space modeling.
A wide variety of embodiments are referred to as IPGX, for Interactive Program Guide Extensions. IPGX extends an EPG to provide multimedia program recommendations to viewers.
In an IPGX embodiment, each program available from a distribution system is encoded by a Translator module 340 on the IPGX Server embodiment (
Pertinent to the operation of IPGX embodiments are the Relevance, Attribute Bit Vector, Term Bit Vector, Term Count, and Term List segments.
Relevance is an integer ranging from 0 to 255 quantifying the viewer's interest in a particular program. Its value is set by the IPGX Client embodiment (
The information encoded in the Attribute and Term Bit Vectors is compiled from a Common Text Format made up of program information extracted from sources such as Zapit™ 310(a), Yahoo™, TV Guide™ 310(b), and VOD catalogs 310(n). The Bit Vectors are used to drive the learning and ranking functions of a given IPGX embodiment.
The Attribute Bit Vector is a 64-bit structure with bits representing the fixed attributes defined in the Common Text Format. These are attributes like program air time, program cost, run time, age suitability, quality ratings, production year, MPAA-type ratings, content advisories, program type, and program genre.
The Term Bit Vector is a 64-bit structure with bits representing the variable attributes defined in the Common Text Format. These include program title, subtitle, description, actors, writers, directors, producers, sub-genres, and channel name. Variable attributes range over an unlimited set of terms that change over time as new actors, writers, directors, and producers gain prominence; new channels come on line; and new titles are released. This vector also accommodates words extracted from program descriptions.
The Term Count and Term List are used by the IPGX Client when assigning terms to bits in the Term Bit Vector.
The flow of the overall IPGX process of an exemplary embodiment is shown in
In step 220, item descriptions are generated by IPGX Server processes (
Once received by the IPGX Client, items of interest are identified in step 230 using a Viewer History Collector module 420. An embodiment of this component and step uses a clickstream algorithm as described in Appendix A to accomplish this learning. The clickstream algorithm assigns relevance values (a.k.a. interest scores) ascribing viewer interest to programs by analyzing clicks of the TV remote control. Based on the relevance value, the IPGX Client software determines whether or not to pass the program example to an IPGX Learning engine 440 to be learned. (In this embodiment, only “liked” programs exceeding a certain relevance threshold are learned.) Each program example consists of an appropriately set Attribute Bit Vector (64 bits) and its associated relevance value. In step 230, the embodiment of the present invention is configured to learn the programs that a viewer likes best.
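The clickstream algorithm itself is described in Appendix A and is not reproduced here. The following toy stand-in merely illustrates the idea of converting viewing activity into a relevance value and applying a “liked” threshold; the dwell-time heuristic, the 127 cap, and the threshold value are assumptions made for the sketch:

```python
def relevance_from_dwell(intervals, program_minutes):
    """Toy heuristic: relevance proportional to the fraction of the
    program actually watched, capped at 127."""
    watched = sum(end - start for start, end in intervals)
    return min(127, round(127 * watched / program_minutes))

LIKE_THRESHOLD = 64  # hypothetical cutoff; only "liked" programs are learned

# Viewer tuned to this program's channel for minutes 0-20 and 25-55
events = [(0, 20), (25, 55)]
score = relevance_from_dwell(events, 60)
print(score, score >= LIKE_THRESHOLD)  # → 106 True
```

A program clearing the threshold would then be passed, as a Bit Vector plus relevance value, to the learning engine.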
In the next step 240, the IPGX learning embodiment 440 is configured to perform term mapping by analyzing the variable terms (
In step 250, the system uses a clustering decision tree to learn to recognize items (i.e., program examples) of interest, from relevant examples identified in step 230. This is accomplished using a binary decision tree (an IPGX Preferences Tree embodiment) that is built on the fly from examples using the complete Bit Vectors generated in the previous step. The nodes of the tree correspond to decision logic that decides which node to route a given example to next based on the values of bits in the Bit Vector—one bit decision per node. Tree leaves store example vectors and their associated relevance values. Leaf nodes are split into new leaf nodes and the tree is reorganized as needed. The clusters are groups of “liked” programs that have similar attributes as computed by the decision tree logic.
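A highly simplified sketch of such an incrementally built clustering tree follows. The class names, the leaf capacity, and the bit-selection rule used for splitting are hypothetical; in particular, the sketch splits on the most balanced bit rather than on the centroid-similarity criterion described herein, and it omits merging and forgetting:

```python
class Leaf:
    def __init__(self):
        self.examples = []  # (bit_vector, relevance) pairs stored in the cluster

class Node:
    def __init__(self, bit, zero, one):
        self.bit, self.zero, self.one = bit, zero, one

class ClusterTree:
    def __init__(self, n_bits=4, capacity=3):
        self.root = Leaf()
        self.n_bits = n_bits
        self.capacity = capacity

    def _find(self, vec):
        """Route a vector to its leaf, one bit decision per node."""
        node, parent, side = self.root, None, None
        while isinstance(node, Node):
            parent = node
            side = (vec >> node.bit) & 1
            node = node.one if side else node.zero
        return node, parent, side

    def learn(self, vec, relevance):
        leaf, parent, side = self._find(vec)
        leaf.examples.append((vec, relevance))
        if len(leaf.examples) > self.capacity:
            self._split(leaf, parent, side)

    def _split(self, leaf, parent, side):
        # Toy criterion: pick the bit that most evenly divides the leaf
        best_bit, best_balance = 0, 0
        for b in range(self.n_bits):
            ones = sum((v >> b) & 1 for v, _ in leaf.examples)
            balance = min(ones, len(leaf.examples) - ones)
            if balance > best_balance:
                best_bit, best_balance = b, balance
        if best_balance == 0:
            return  # no bit separates the examples; cannot split
        zero, one = Leaf(), Leaf()
        for vec, rel in leaf.examples:
            (one if (vec >> best_bit) & 1 else zero).examples.append((vec, rel))
        node = Node(best_bit, zero, one)
        if parent is None:
            self.root = node
        elif side:
            parent.one = node
        else:
            parent.zero = node

tree = ClusterTree()
for vec, rel in [(0b0011, 90), (0b0111, 80), (0b1100, 40), (0b1000, 30)]:
    tree.learn(vec, rel)
leaf, _, _ = tree._find(0b0011)
print([rel for _, rel in leaf.examples])  # → [90, 80]
```

After the fourth example exceeds the leaf capacity, the leaf splits on a discriminating bit, leaving similar programs clustered together in the new leaves.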
Finally in step 260, new programs represented as Bit Vectors are passed to the IPGX Ranking engine embodiment 435 in order to assess their potential interest to the viewer. Using the IPGX Preferences Tree, a candidate program is filtered into the cluster containing the most similar examples and a score is generated for ranking the candidate program against other programs. The scores are a measure of similarity between the candidate program and the example programs in that cluster combined with the relevance values of those example programs. Candidate programs that are most similar to the most relevant examples receive the highest scores. Feedback about the quality of clusters and program recommendations can be used to modify which attributes are used and what clusters are formed.
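Once a candidate is filtered into a leaf cluster, a score blending similarity and relevance might be computed as in the following sketch. The Hamming-based similarity measure and the simple averaging blend are assumptions made for illustration, not the disclosed scoring formula:

```python
def bit_similarity(a, b, n_bits=8):
    """Fraction of bit positions on which two vectors agree."""
    differing = bin((a ^ b) & ((1 << n_bits) - 1)).count("1")
    return 1 - differing / n_bits

def rank(candidate, cluster, n_bits=8):
    """Blend similarity to each stored example with that example's relevance."""
    return sum(bit_similarity(candidate, vec, n_bits) * rel
               for vec, rel in cluster) / len(cluster)

# Hypothetical leaf cluster of "liked" program examples
cluster = [(0b00001111, 100), (0b00001110, 80)]

# A near-duplicate of the cluster outranks a dissimilar program
print(rank(0b00001111, cluster) > rank(0b11110000, cluster))  # → True
```

Candidates most similar to the most relevant stored examples receive the highest scores, matching the behavior described above.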
The methods of Information Retrieval and Machine Learning used in certain embodiments have many characteristics in common—making them very compatible and powerful when used in combination. These characteristics include, by way of non-limiting example:
Techniques specific to Information Retrieval leveraged in this embodiment include:
In the Machine Learning domain, an IPGX embodiment uses binary decision trees to cluster program examples. It builds trees incrementally using an unsupervised learning technique.
Some unique aspects of this embodiment include, by way of non-limiting example:
The following paragraphs discuss the main data structures in a typical IPGX embodiment: Attribute and Term Bit Vectors, Term Dictionary, Term Map, and Decision Trees.
Attribute and Term Bit Vectors are used to represent programs in a way suitable for building decision trees and calculating similarity between programs. These vectors consist of bits representing various program features and content. IPGX embodiments typically use binary values for each element in these vectors (thus the name Bit Vectors). Boolean bits, in lieu of integers, strings, or other data types, are well suited to the set-top environment and provide good overall performance.
The more representative these features are of programs, the better the clustering, ranking, and overall system performance.
The complete Bit Vector can be represented as follows:
(a1 a2 a3 . . . a64 t1 t2 t3 . . . t64)
Where,
a1 . . . a64 represent the 64 fixed attribute bits (Attribute Bits), by way of non-limiting example:
If the attribute equals “1,” that attribute is true for the current program. If “0,” the attribute is false for the current program.
As described in Appendix B, fixed attributes in many cases have been consolidated from source attributes spanning continuous ranges or a large number of choices. (This is sometimes called "dimensionality reduction.") For example, program production years (YEAR) have been translated into four date ranges: OLD, DATED, MODERN, and CURRENT; and program genres (GENRE) have been consolidated into 15 genres deemed most significant. This has two benefits: (1) It allows simpler and faster bit operations on the data, and (2) It avoids over-constraining (overfitting) the data for classification and ranking.
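The YEAR consolidation described above can be sketched as follows. This is a minimal illustration only; the cutoff years are assumptions, as the source does not specify the boundaries of the four date ranges.

```python
# Hypothetical sketch of the YEAR dimensionality reduction described above.
# The cutoff years are illustrative assumptions, not values from the source.
def year_bucket(year: int) -> str:
    """Map a production year onto one of the four coarse date ranges."""
    if year < 1960:
        return "OLD"
    elif year < 1980:
        return "DATED"
    elif year < 2000:
        return "MODERN"
    else:
        return "CURRENT"
```

Reducing a continuous year to one of four categories lets the range be represented with a handful of Boolean bits, consistent with the bit-operation benefit noted above.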
t1 . . . t64 represent the 64 variable attribute bits (Term Bits)
Term bits are defined uniquely for each set-top box based on the frequency the terms appear in the descriptions of watched programs. (These are the terms contained in the Term List of the BAFs sent from the IPGX Server embodiment.) They are maintained in a Term Dictionary on the Server and tracked on the set-top using a Term Map. The Term Map maps the “most important” terms (those having a relatively large amount of statistically significant correlation to programs of interest) to bits in the Term Bit Vector.
For example, given the term map shown in
These bits indicate preferences for programs featuring Russell Crowe as an actor, broadcast on ESPN2, falling into the subgenre Soccer, and directed by Ron Howard. Note that terms used on the Client are encoded as an identification number. These IDs are defined by the Server and mapped to underlying strings in the Term Dictionary.
This vector indicates the following about this movie:
It is worth noting additional variable terms (up to 64) were likely sent to the set-top as being relevant to this movie, such as “Aed harris,” “Ajennifer connelly,” “Achristopher plummer,” “Ajosh lucas,” “Tbeautiful,” “Tmind,” “Sbrilliant,” “Sasocial,” “Smathematician,” “Scryptography,” “Snightmare.” However, since none of these terms corresponded to bits assigned in the Term Map, they were not assigned in the Term Bit Vector and thus not used for learning or ranking.
In another embodiment, no fixed Attribute Bits are used. Instead, all attributes are Term Bits and they are dynamically adopted for use in Bit Vectors based on their correlation significance to program examples on a given set-top box.
All programs are represented by vectors using these attributes with bits set as appropriate for the particular program.
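As a hedged sketch, assembling the 128-bit vector described above (64 fixed Attribute Bits followed by 64 variable Term Bits) might proceed as follows. The attribute positions and Term Map entries shown are hypothetical examples, not assignments defined by the system; note that terms absent from the Term Map are simply ignored, as described above.

```python
# Illustrative sketch: a program as a 128-element bit vector,
# 64 fixed attribute bits followed by 64 variable term bits.
# These position assignments are hypothetical.
ATTRIBUTE_POSITIONS = {"MOVIE": 0, "SPORTS": 1, "CURRENT": 2}   # a1..a64
TERM_MAP = {"Arussell crowe": 0, "Cespn2": 1, "Gsoccer": 2}     # t1..t64

def build_bit_vector(attributes, terms):
    vec = [0] * 128
    for attr in attributes:
        pos = ATTRIBUTE_POSITIONS.get(attr)
        if pos is not None:
            vec[pos] = 1
    for term in terms:
        pos = TERM_MAP.get(term)
        if pos is not None:          # terms outside the Term Map are ignored
            vec[64 + pos] = 1
    return vec

# "Tunknown" has no bit in the Term Map, so it contributes nothing.
v = build_bit_vector(["MOVIE", "CURRENT"], ["Arussell crowe", "Tunknown"])
```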
On the IPGX Server embodiment, a dictionary is maintained to keep track of the terms extracted from the program descriptions. These are the terms used to populate the Term Bit Vectors. The dictionary contains term names, a count of the number of times each term appears in all program descriptions (term frequency), and a global ID used to refer to the term on set-top boxes (term #).
Term names are prefixed with unique characters designating the type of term. As described in Appendix B, current types are G (Sub-Genre), T (Title), S (Subtitle and descriptive words), A (Actor/Cast Member), D (Director), P (Producer), W (Writer), and C (Channel). These classifications denote the meaning of the terms.
The following lists give examples of the types of terms stored in the Term Dictionary.
Sub-genres:
The current data type of Term # is an unsigned 16-bit integer. Therefore, 65,536 terms may be assigned.
Terms are processed differently depending on term type.
MAX Filter and MIN Filter indicate cutoffs for terms having frequencies above or below given thresholds. Only those terms with frequencies within range are assigned Term #s to be included in the BAFs and sent to the Client to be learned or ranked. Additional filtering takes place on the Client in determining what terms are included in the Term Bit Vector.
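The MAX/MIN frequency cutoffs might be sketched as below. The threshold values and the sequential assignment of Term #s are assumptions for illustration; the source specifies only that terms outside the frequency range are excluded.

```python
# Sketch of the MAX/MIN frequency cutoffs: only terms whose global
# frequency lies within range are assigned Term #s for inclusion in
# BAFs. Threshold values here are illustrative assumptions.
def assign_term_numbers(term_freqs, min_freq=2, max_freq=10000):
    term_ids = {}
    next_id = 0
    for term, freq in sorted(term_freqs.items()):
        if min_freq <= freq <= max_freq:
            term_ids[term] = next_id   # unsigned 16-bit Term # on the Server
            next_id += 1
    return term_ids
```

A very high-frequency term (e.g., a common word surviving the stop list) and a term seen only once would both be filtered out, leaving mid-frequency terms that carry more discriminating power.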
The mapping of terms to bits in the Term Bit Vector is accomplished using a set-top Term Map. This structure keeps track of terms associated with programs of interest as seen in BAF Term Lists sent to the set top.
As shown in
Referring back to
Each of these terms has been encountered enough times in maximum relevance items (i.e., BAF examples with maximum relevance) to merit assignment as term bits. They are thus used to learn and rank programs for this set-top box.
A decision tree provides the core of a typical IPGX preferences structure. It currently takes the form of a univariate binary tree. Such trees can be characterized recursively as follows:
The IPGX decision tree embodiment encodes the viewing history of a set-top box using the bits provided in the Attribute and Term Bit Vectors. In one embodiment, only “liked” programs are learned in the tree. The degree each program was liked is encoded in the relevance value for that program. That information is stored with the program vectors in the leaf nodes.
IPGX tree embodiments are built incrementally from example program vectors, automatically clustering similar programs into leaf nodes.
The example vectors stored in the leaf nodes are kept in chronological order to facilitate merging or replacement operations when the leaf is “full.” The similar examples get merged or the oldest examples replaced.
When used for ranking, candidate program vectors are submitted to the tree, a similar cluster found, and a ranking score computed using the similarity of the candidate vector to the other vectors in the cluster while factoring in the associated relevance values of the examples.
An IPGX decision tree embodiment is shown in
Additionally, program example vectors do not maintain their associated Program ID. In fact, examples having similar vectors are merged into single (archetypical) vectors in the interest of storage space and computing efficiency.
A clickstream algorithm in step 230 has been defined for using channel change information to quantify viewing preferences. This algorithm runs on the Client and sets the relevance value for programs that have been viewed. Details of the algorithm are provided in Appendix A.
Programs with relevance values exceeding a certain threshold are passed to the IPGX Learning engine embodiment to be learned as examples of “liked” programs.
While monitoring what viewers watch allows for some inference of dislikes or aversions, improvements are possible. (It is possible viewers simply did not know what shows were available, or the viewer had to choose between desirable programs aired at the same time.) Thus relevance ratings are assigned in the range 0-255, where 0 represents ambivalence or no information available, 255 represents the strongest possible interest, and values in between indicate progressively more "liking." As a result, the current decision tree clusters only programs presumably of interest (with some noise, of course). Since a preferred purpose of the IPGX system is to identify the most desirable recommendations for viewers, negative examples could arguably be ignored; in other embodiments of the present invention, however, negative examples may have a certain value, and as such those negative examples are accommodated and retained by the embodiment described herein.
Attribute Bit Vectors are typically built on the Server using the fixed attributes defined in the Common Text Format. Attribute Bit Vectors may also be built on the client when necessary (e.g., source data comes from program guide information within the client). Term Bit Vectors are built on the Client using the BAF Term List and the set-top Term Table.
An exemplary process for building term vectors is:
Criteria for assigning a term a bit in the Term Map are:
These thresholds make it easy for terms to be assigned bits initially; assignment becomes more difficult as more terms are seen.
Every time a term is seen in the BAF of a program being learned, its frequency count is incremented by 1. Terms seen for the first time are added to the Term Map and their frequency set to 1.
As Term Frequency counts reach the limit of the Term Frequency data type, all term frequencies in the list are reduced by half.
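The frequency maintenance described in the two preceding paragraphs can be sketched as follows. The 8-bit counter limit is an assumption for illustration, as the source does not state the Term Frequency data type.

```python
# Sketch of Term Map frequency maintenance: increment a term's count each
# time it is seen, and halve all counts when any count reaches the limit
# of the frequency data type. An 8-bit limit (255) is assumed here.
MAX_COUNT = 255

def observe_term(term_map: dict, term: str) -> None:
    """Record one sighting of a term; rescale all counts at the limit."""
    term_map[term] = term_map.get(term, 0) + 1
    if term_map[term] >= MAX_COUNT:
        for t in term_map:
            term_map[t] //= 2   # reduce every frequency by half
```

Halving all counts preserves the relative ordering of term frequencies while keeping them within the counter's range.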
Typical elements of maintaining the Term Dictionary on the Server include:
The stop word list on the Server includes words judged to be not useful in characterizing programs. These provide an initial filtering out of high frequency words from the program description stream. Additional high frequency filtering is provided by the cutoffs determining what terms are included in BAFs.
Term stemming is accomplished in one embodiment using the Porter Stemmer.
IPGX embodiments typically build univariate binary trees as depicted in
New examples do not require a radical rebuild of the tree; the tree can be modified to accommodate such new examples by allowing it to sprout new leaves as appropriate.
Only Bit Vectors for programs deemed to be of interest to the viewer are typically sent to the IPGX Learning engine embodiment for learning and thus incorporated in the tree.
The trees are built incrementally. Leaf nodes are split into two new leaves based on the following criteria:
To decide whether or not to split a leaf, each attribute is tried and centroid differences calculated until one is found that exceeds the threshold. If no split is found with a suitable metric, the leaf node remains as is.
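The split check described above might be sketched as follows. The L1 centroid distance and the threshold value are illustrative assumptions, since the equations themselves appear in the drawings and the source does not restate the difference metric here.

```python
# Illustrative leaf-split check: for each attribute bit, partition the
# leaf's example vectors on that bit and measure the difference between
# the two partition centroids; split on the first attribute whose
# centroid difference exceeds a threshold. The L1 distance and the
# threshold are assumptions for illustration.
def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def find_split(vectors, threshold=0.5):
    for bit in range(len(vectors[0])):
        ones = [v for v in vectors if v[bit] == 1]
        zeros = [v for v in vectors if v[bit] == 0]
        if not ones or not zeros:
            continue                      # attribute does not partition
        cx, cy = centroid(ones), centroid(zeros)
        diff = sum(abs(a - b) for a, b in zip(cx, cy))  # L1 distance
        if diff > threshold:
            return bit                    # split the leaf on this attribute
    return None                           # no suitable split; leaf remains as is
```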
Calculating the centroid of those program data sets may use, by way of non-limiting example:
Where Cx and Cy denote the two centroid vectors and the a's represent each attribute bit of the centroid vectors.
Since the program vectors are binary, this equation may be represented as, by way of non-limiting example:
where,
When the number of examples in a leaf node reaches a maximum, e.g., 16, each new example is merged with an existing example (if one is found that is similar enough). Otherwise the new example replaces the oldest previous example. The current metric for merging examples is a similarity value between vectors that is >0.9, where similarity is measured by the Dice Coefficient.
The Dice Coefficient may be represented as, by way of non-limiting example:
Where px and py denote the two program vectors and the a's represent each attribute bit. Since the program vectors are binary, this equation reduces to, by way of non-limiting example:
where
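The Dice Coefficient is a standard similarity measure; for binary vectors it reduces to twice the number of shared set bits divided by the total number of set bits in the two vectors, consistent with the reduction described above. A minimal sketch:

```python
# Dice coefficient for binary vectors: twice the count of shared set bits
# over the total set bits in both vectors. Returns 0.0 for two empty vectors.
def dice(px, py):
    shared = sum(a & b for a, b in zip(px, py))
    total = sum(px) + sum(py)
    return 2 * shared / total if total else 0.0
```

Under the merging rule above, two example vectors whose coefficient exceeds 0.9 would be merged into a single archetypical vector.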
The following are typical characteristics of IPGX decision tree embodiments:
Ranking is accomplished by passing vectors representing new programs of potential viewer interest to the IPGX Ranking engine embodiment.
The vectors are prepared much like those for learned examples—the Server sets the fixed attributes and the Client sets the variable term bits based on the Term Map.
The Ranking engine navigates the tree to find the cluster best matching the example. A rank score is then calculated by blending the similarity between the submitted example and each example stored in the cluster with the associated relevance values of those examples. The blending sums, over each example in the cluster, the Dice coefficient times that example's relevance value. A K-factor is used to adjust the relative importance of the two weighting schemes.
The engine returns a rank value for each example sent to it suitable for comparison against other examples. Thus a rank order of programs of potential interest can be provided for the viewer. The Client can send any collection of examples to be ranked, such as all programs available in some time window or all associated with particular genres. This will help constrain the ranking to what may be of particular interest at the time.
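The ranking blend described above might be sketched as follows. The exact form in which the K-factor enters the sum is an assumption, as the source states only that it balances the two weighting schemes.

```python
# Sketch of the rank-score blend: sum, over the cluster's examples, the
# Dice similarity to the candidate times each example's relevance value.
# The K-factor placement (exponentiating the similarity term) is assumed.
def dice(px, py):
    shared = sum(a & b for a, b in zip(px, py))
    total = sum(px) + sum(py)
    return 2 * shared / total if total else 0.0

def rank_score(candidate, cluster, k=1.0):
    """cluster: list of (example_vector, relevance) pairs, relevance 0-255."""
    return sum((dice(candidate, ex) ** k) * rel for ex, rel in cluster)
```

Candidates most similar to the most relevant examples accumulate the largest sums, matching the behavior stated for step 260.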
In summary, the IPGX embodiment described here extends Electronic Program Guides to allow program recommendations to be made to users based on their viewing history. To deliver this capability, certain IPGX embodiments leverage vector space model and inductive learning techniques.
These techniques allow certain IPGX embodiments to meet goals of: ease of use, the ability to make good recommendations despite the presence of exceptions and dependencies, consistency with "edge computing," provision of a ranking capability for new items, and protection of user security and privacy.
Some ways in which IPGX embodiments differ from other approaches include:
An example would be a 30-second advertisement for a product such as a specific automobile. If the viewer spends 20 seconds watching the advertisement, the interest score of that viewer for the product is 85, which translates to a strong interest. In such an event, the embodiment of the present invention utilizes the interest score and transmits it to a learning core that tailors individual advertisements of that specific automobile to the specific viewer.
Another embodiment employs a number of clustering decision trees, with each tree representing a demographic of television viewers. The embodiment is implemented to discover the demographics of, for example, one household: who is watching what program, and at what times of day, on one set-top box. The embodiment pre-defines a number of clustering decision trees, each associated with a demographic group of interest. From the remote control clickstream data, the embodiment ranks each of the clustering decision trees, accumulates the ranking values of each decision tree, and infers the number and characteristics of the people who watch television programming, including their ages and their television viewing habits. The demographic groups are then sorted according to the accumulated ranking values of each decision tree. As a result, the demographic group with the highest value may be representative of those watching television on that specific set-top box.
To identify the demographic group of the current dominant viewer, the embodiment performs the ranking, accumulation, and sorting for only the last 20 minutes of clickstream data. In another variation of the present invention, the algorithm constructs a two-dimensional matrix indexed by both tree demographic (such as age and television viewing habits) and a time value (day-of-week/time-of-day) at which the data is collected. This allows the demographic results to vary over time, giving more precise results for time-related viewing. For example, the set-top box may be controlled by a family of two, the first being a hard-riding biker father and the other his 3-year-old daughter. Combining the primary method, based on identification of the two demographic groups, with a secondary method, based on the time of day and day of the week at which each demographic group watches, yields an idea of who is watching at any given time. As a result, this combination produces better historical patterns that may be utilized in constructing individually targeted advertising for different demographic groups of a single household. Another method identifies branches of the clustering decision tree associated with individual viewers and calculates a centroid of each cluster as well as a centroid of all the clusters. When the centroid of all clusters in one branch is significantly different from that of another branch, each branch is associated with a distinct viewer, the similarity within one viewer's cluster being higher than that of a cluster generated by the viewing habits of another viewer.
Yet another embodiment of the present invention compares the clustering decision trees generated by different applications and/or on different set-top boxes (if there is more than one in the house) to generate more information about the viewers in the household. The embodiment compares the trees, and the branches of those trees, for similarities and differences using the similarity algorithm defined earlier. Differences indicate another viewer who prefers one TV over another. Similarities at different times indicate a viewer who watches different televisions at different times. This may allow identification of different television sets for the family, the kids, the bedroom, etc.
Similarities at overlapping times indicate multiple viewers in the house with the same interests and demographics. Small differences between these similar trees/branches/clusters may define subtle distinctions between similar individuals.
To bid on on-line auction items, bidders may log onto one of the many auction web sites and register by entering basic personal information, credit card information (if any), and an e-mail address for payment processing and communication purposes. In addition, bidders are usually required to submit a username and password that allow them to log onto the auction web site. Bidders then search categories to find items to bid on; when they find an item of interest, bidders typically select that item, enter their registered username and password, and place the minimum bid if no higher bid exists, or otherwise a bid higher than the current high bid, which is displayed along with the description of the item. The bidder is notified by e-mail whether he or she is the highest bidder or has been out-bid by another. As such, an automated embodiment for organizing a bidder's pattern of items and enabling bidders to receive information about items of interest is needed.
Another embodiment employs a number of clustering decision trees to recommend on-line auction items to individual bidders. Information about a currently selected item that a bidder is bidding on, such as the type of item (for example, a wristwatch, a digital camera, or a laptop computer), its age, its size, and other characteristics, is collected by the embodiment of the present invention and transformed into a binary vector, and the vector is assigned a relevance value. The relevance value ranges from zero to 255: 127 positive values, 127 negative values, and one zero value. Positive values represent positive interest, the zero value represents neutral interest, and negative values represent a total lack of interest on the part of the bidder. Based on the relevance value accorded to the auction item data, the embodiment of the present invention passes interest scores to a learning engine that utilizes a binary decision tree constructed from the term bit vectors associated with the auction item data having the maximum relevance values. Along with the number of times a bidder inquired about an item, an example of the bidder's preference is presented to the learning algorithm, which incrementally updates the learned preferences of the bidder. The embodiment of the present invention employs a binary decision tree to store learned information from the sequence of examples and maintains the set of examples grouped into clusters already determined to have matching attributes of an on-line bid item. The embodiment periodically collects bid items from multiple on-line auction web sites, and when new examples related to a given item are identified, the embodiment applies a similarity configuration to find the existing leaf node of the binary decision tree where each new example related to a bid item should reside.
In order to rank the examples in the cluster, the embodiment employs a similarity algorithm based on the other examples in the cluster. In a further step of arbitrating between attributes associated with items of interest and items of disinterest, the embodiment calculates the level of interest using certainty factors, in which a prior probability supplies a measure of belief (MB) for positive examples and a measure of disbelief (MD) for negative examples. After the auction items are given a certainty factor value, the embodiment of the present invention groups the items using a similarity algorithm and concludes which of the examples to keep on a leaf node of the decision tree and which to discard or merge with others. Based on the certainty factor value accorded to an on-line auction item, and the examples in the leaf nodes of the binary decision tree, the embodiment creates individual preferences based on the learned process to recommend items to a specific user.
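A hedged sketch of the certainty-factor arbitration described above follows. The incremental combination rule used here is the classical MYCIN-style update (CF = MB - MD); the source names the MB/MD framework but does not specify the combination formula, so that choice is an assumption.

```python
# Hedged sketch of certainty-factor evidence combination: MB accumulates
# from positive examples, MD from negative ones, and CF = MB - MD.
# The incremental update rule below is the standard MYCIN-style form,
# assumed rather than specified by the source.
def combine(prev: float, new: float) -> float:
    """Incrementally combine two belief values in [0, 1]."""
    return prev + new * (1 - prev)

def certainty_factor(positive_evidence, negative_evidence) -> float:
    mb = 0.0
    for e in positive_evidence:
        mb = combine(mb, e)       # measure of belief from positive examples
    md = 0.0
    for e in negative_evidence:
        md = combine(md, e)       # measure of disbelief from negative examples
    return mb - md                # CF in [-1, 1]
```

A positive CF would suggest keeping the item's example on the leaf node; a strongly negative CF would suggest discarding it.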
In another embodiment of the present invention, the algorithms are configured for a data filtering application, such as filtering e-mail spam. In this exemplary embodiment, all processing takes place on the client (e.g., PC or set-top box) after e-mails are received from an e-mail server. Items are e-mail messages rather than video program descriptions. Fixed attributes include bits for representing information such as, by way of non-limiting example:
Variable terms are parsed from the To, From, and Subject fields and the Body of the e-mails and represented as:
These terms are encoded and maintained in a Term Dictionary (hash table) on the client and used to assign term bits based on term frequency.
With reference to
In step 220, item descriptions are generated by client processes by receiving e-mails from an e-mail server, extracting attributes and terms from the e-mails, and maintaining the extracted terms in a local Term Dictionary (hash table).
Then, items to be filtered (e.g., those e-mails considered spam) are learned in step 230. In a spam filtering application this would likely be accomplished by the user clicking a “This is SPAM” button.
In the next step 240, the Learning engine 440 is configured to learn the variable terms listed above based on the number of times it encounters those terms in e-mails of interest and, when a threshold is reached, it adds the term to the Term Bit Vector. Once this step is done, a complete bit vector representing the e-mail is ready to be learned by the decision tree.
In step 250, the system clusters the e-mail examples for comparison against future incoming e-mails using the binary decision tree. E-mails designated both SPAM and NOT SPAM are learned by the tree to allow appropriate discrimination during spam filtering.
Finally, in step 260, new e-mails are processed into Bit Vectors and passed to the Ranking engine 435 to assess their potential interest to the recipient. Using the Preferences Tree, a candidate e-mail is filtered into the cluster containing the most similar examples, where it is determined whether the e-mail is spam. E-mails classified as spam are routed to a special storage area for future inspection by the user and/or for deletion.
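The spam decision at the leaf cluster might be sketched as follows, assuming both SPAM and NOT-SPAM examples are stored there as described in step 250. The similarity-weighted vote is an illustrative choice, not the source's exact rule.

```python
# Minimal sketch of the spam decision at the matched leaf cluster:
# classify a candidate e-mail vector by the similarity-weighted vote of
# the SPAM vs. NOT-SPAM examples stored there. The vote rule is an
# illustrative assumption.
def dice(px, py):
    shared = sum(a & b for a, b in zip(px, py))
    total = sum(px) + sum(py)
    return 2 * shared / total if total else 0.0

def classify_email(candidate, cluster):
    """cluster: list of (bit_vector, label) pairs, label 'SPAM' or 'NOT_SPAM'."""
    spam = sum(dice(candidate, v) for v, lbl in cluster if lbl == "SPAM")
    ham = sum(dice(candidate, v) for v, lbl in cluster if lbl == "NOT_SPAM")
    return "SPAM" if spam > ham else "NOT_SPAM"
```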
It will be apparent to those skilled in the art that there are many variations that may be made in the embodiments of the invention as described above without departing from the spirit and scope of the invention. There are, for example, different ways that one might apply a variety of known methods for selecting statistically significant terms to accomplish the purposes described in the invention. Such differences do not necessarily constitute distinguishable variations because it is well known that the same results in function, hence method of application, may well be accomplished by a variety of code and/or system arrangements. There are similarly many other variations that will be apparent to those with skill in the art, all within the spirit and scope of the invention.
The present application claims priority to and incorporates by reference in its entirety U.S. Provisional Patent Application No. 60/709,420, entitled "Method and Apparatus for Multimedia Program Recommendation." The invention relates to an intelligent technique for learning user interests based on user actions and then applying the learned knowledge to rank, recommend, and/or filter new items based on the level of interest to a user. More particularly, the invention relates to an automated, personalized information learning and recommendation engine for a multitude of offerings, such as television programming, web-based auctions, targeted advertising, radio programming, websites, video clips, and electronic mail filtering. Some embodiments are structured to generate item descriptions, identify items of interest from user actions, discover relevant attributes for individual users, cluster similar items in a compact data structure, and then use the structure to rank new offerings.
Number | Date | Country
---|---|---
60709420 | Aug 2005 | US |