Embodiments of the invention relate to methods of recommending items for a person's use.
Modern communication networks, such as mobile phone networks and the Internet, and the plethora of devices that provide access to the services those networks offer have not only made people intensely aware of each other, but have also inundated them with a surfeit of information and options for satisfying needs and desires ranging from the simplest to the most complex. Whereas in the not too distant past the information available to an individual was relatively sparse and generally expensive in time and/or resources to acquire, today information, wanted and unwanted, is relatively inexpensive. All too often, the information is overwhelmingly abundant and diluted with irrelevant information.
For example, today a person interested in choosing a movie may receive for review, via the Internet and mobile phone or cable networks, a bewildering number of recommendations for many tens, if not hundreds, of movies. Each movie may be accompanied by options for viewing at home, at a conventional movie theater, or on a desktop computer, laptop, notebook, and/or smartphone. A person in transit, on foot or in a vehicle, can easily use a laptop or smartphone to request suggestions of coffee shops or restaurants to patronize, and may receive a list of recommendations of confusing length. Whereas the cost of acquiring information appears to have plummeted, the task of managing its copiousness and its various options to determine relevance has become increasingly complex and expensive.
Various recommender systems and algorithms have been developed in an attempt to deal with the challenges and opportunities that the abundance of information has generated, and to automatically focus and filter information to match the interests and needs of a business, an organization, or a person. The systems and algorithms typically acquire and process explicit and/or implicit data about people to determine characteristics of the people and their consumer histories that may be used to infer their preferences for various information, products, and/or activities, generically referred to as “items”.
Explicit data comprises information that a person consciously provides responsive to explicit requests for the information. Implicit data comprises data acquired responsive to observations of a person's behavior that are not consciously generated in response to an explicit request for information. The characteristics and/or their representations are used to configure and filter information recommended to a person to improve relevance of the recommended information and reduce an amount of irrelevant data that accompanies and dilutes the information.
Common recommender algorithms for automatically inferring and recommending items that a person might be interested in are algorithms referred to as “collaborative filtering” and “content-based filtering” algorithms. A recommender system using a collaborative filtering algorithm recommends an item to an individual if persons sharing a commonality of preferences with the individual have exhibited a preference for the item. For example, if the individual has shown a preference for item “A” in the past, and persons in the database who have shown preference for item A have also shown preference for an item “B”, then item B may preferentially be recommended to the individual. In accordance with a content-based filtering algorithm, a recommender system recommends an item to an individual if the item shares a similarity with items previously preferred by the individual. For example, if the individual has shown a preference for action movies, the algorithm may preferentially recommend an action movie to the individual.
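The two filtering styles described above can be illustrated with a small sketch. The people, items, and genre labels below are hypothetical toy data, and the functions `collaborative_scores` and `content_based` are illustrative names only. Co-occurrence counting stands in for collaborative filtering and a shared genre attribute stands in for content similarity; production systems use far richer scoring.

```python
from collections import Counter

# Hypothetical toy data: person -> set of items that person has preferred.
PREFS = {
    "u1": {"A", "B"},
    "u2": {"A", "B", "C"},
    "u3": {"B", "C"},
    "u4": {"D"},
}

# Hypothetical item attribute used by the content-based filter.
GENRE = {"A": "action", "B": "action", "C": "drama", "D": "action"}


def collaborative_scores(individual_items, prefs):
    """Score each unseen item by the number of people who share at least one
    preferred item with the individual and who also preferred the unseen item."""
    scores = Counter()
    for items in prefs.values():
        if items & individual_items:  # a person with a commonality of preferences
            for item in items - individual_items:
                scores[item] += 1
    return scores


def content_based(individual_items, genre):
    """Return unseen items that share an attribute with an item the individual preferred."""
    liked = {genre[i] for i in individual_items}
    return sorted(i for i, g in genre.items() if g in liked and i not in individual_items)


print(collaborative_scores({"A"}, PREFS))  # Counter({'B': 2, 'C': 1})
print(content_based({"A"}, GENRE))         # ['B', 'D']
```

With item A as the individual's only preference, item B scores highest collaboratively because both people who preferred A also preferred B, while the content-based filter returns the other action items, B and D.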
An aspect of an embodiment of the invention relates to providing a recommender system that recommends an item to a user responsive to correspondence between items preferred by the user in the past and clusters of items defined for a plurality of items. The clusters group related items and may be generated as functions of preferences expressed by a population of people and/or characteristics that the items share. Hereinafter, the plurality of items is also referred to as a “catalog” of items, and items in the catalog may be referred to as “catalog items”.
In an embodiment of the invention, the recommender system processes explicit and/or implicit data acquired for the population of people and the catalog of items to define and “parse” the catalog of items into clusters of items and associate catalog items with the clusters in accordance with a clustering algorithm. To determine item recommendations for the user, the recommender system associates items, hereinafter “user legacy items”, preferred by the user in the past with catalog clusters to which they belong. A catalog cluster associated with a user legacy item is referred to as a “tagged” catalog cluster. The recommender system chooses catalog items to recommend to the user from among catalog items in tagged catalog clusters.
In an embodiment of the invention, the recommender system also chooses catalog items to be recommended from catalog clusters that are not tagged but are related to tagged catalog clusters by a shared characteristic. Optionally, the recommender system does not recommend catalog items from catalog clusters that are not tagged. The recommender system may recommend catalog items from different tagged and/or untagged catalog clusters to provide the user with a selection of recommended items exhibiting enhanced variety.
A catalog item chosen from a tagged catalog cluster for recommendation to the user in accordance with an embodiment of the invention may be a catalog item that satisfies a recommendation constraint. In an embodiment of the invention, catalog items and user legacy items are represented by trait vectors whose components are determined responsive to preference rankings for catalog items exhibited by a population of people. Optionally, the recommendation constraint comprises at least one constraint on an inner product of a trait vector representing a recommended catalog item and a trait vector representing a user legacy item. The at least one constraint on the inner product may comprise a constraint that a magnitude of the inner product be greater than a given threshold magnitude. The at least one constraint on the inner product may comprise a constraint that a magnitude of the inner product, normalized by the magnitudes of the trait vectors, be greater than a given threshold magnitude. The recommendation constraint may require that a recommended catalog item have an order greater than a given order in a set of catalog items ordered with respect to magnitudes of their inner products with a trait vector representing a user legacy item.
Optionally, the recommendation constraint comprises a content constraint, and the recommender system chooses a catalog item from a tagged catalog cluster to recommend to a user by filtering the tagged catalog cluster using a content-based filtering algorithm. In an embodiment of the invention, the recommendation constraint comprises a collaborative filtering requirement, and a recommended catalog item may be an item chosen from a tagged catalog cluster by filtering the tagged catalog cluster using a collaborative filtering algorithm.
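The three inner-product constraints listed above can be sketched as follows. This is a minimal illustration assuming plain Python lists as trait vectors; the function names, the toy vectors, and the reading of "an order greater than a given order" as "among the top n items ranked by inner-product magnitude" are all assumptions, not the method of the embodiment.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def passes_inner_product(item_vec, legacy_vec, threshold):
    """|<item, legacy>| must exceed a threshold magnitude."""
    return abs(dot(item_vec, legacy_vec)) > threshold


def passes_normalized(item_vec, legacy_vec, threshold):
    """|<item, legacy>| divided by |item| * |legacy| (a cosine) must exceed a threshold."""
    norms = math.sqrt(dot(item_vec, item_vec)) * math.sqrt(dot(legacy_vec, legacy_vec))
    return abs(dot(item_vec, legacy_vec)) / norms > threshold


def top_ranked(catalog, legacy_vec, n):
    """Order catalog items by decreasing |<item, legacy>| and keep the first n
    (assumed reading of the ordering constraint)."""
    ranked = sorted(catalog, key=lambda name: abs(dot(catalog[name], legacy_vec)), reverse=True)
    return ranked[:n]


# Hypothetical 2-dimensional trait vectors.
CATALOG = {"x": [2.0, 0.0], "y": [1.0, 1.0], "z": [0.0, 3.0]}
LEGACY = [1.0, 0.0]

print(top_ranked(CATALOG, LEGACY, 2))                   # ['x', 'y']
print(passes_inner_product(CATALOG["y"], LEGACY, 0.5))  # True
print(passes_normalized(CATALOG["y"], LEGACY, 0.8))     # False (cosine is about 0.707)
```

The normalized form is insensitive to vector length, so a catalog item with a very large trait vector cannot pass the constraint merely by being long.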
In the discussion, unless otherwise stated, adjectives such as “substantially” and “about” modifying a condition or relationship characteristic of a feature or features of an embodiment of the invention, are understood to mean that the condition or characteristic is defined to within tolerances that are acceptable for operation of the embodiment for an application for which it is intended.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Non-limiting examples of embodiments of the invention are described below with reference to figures attached hereto that are listed following this paragraph. Identical structures, elements or parts that appear in more than one figure are generally labeled with a same numeral in all the figures in which they appear. Dimensions of components and features shown in the figures are chosen for convenience and clarity of presentation and are not necessarily shown to scale.
In the discussion below, components of a recommender system in accordance with an embodiment of the invention are discussed with reference to
Recommender system 20 optionally comprises an “explicit-implicit database” 31 comprising explicit and/or implicit data acquired responsive to preferences exhibited by a population of users 21 for items in a catalog of items. Recommender system 20 may comprise a model maker 40 and a cluster engine 41 that cooperate to cluster related catalog items in catalog clusters and generate a clustered database 32. A recommender engine 50 recommends catalog items from catalog clusters in clustered database 32.
Optionally, the population comprises at least about 1,000 users 21. In an embodiment of the invention, the population may comprise at least about 100,000 users 21. Optionally, the population is equal to or greater than about 1,000,000 users 21. In an embodiment of the invention, the number of items in the catalog is equal to or greater than about 500 catalog items. Optionally, the number of items is equal to or greater than about 5,000 catalog items. In an embodiment, the number of items in the catalog is greater than or equal to about 10,000 catalog items.
Explicit data optionally comprised in explicit-implicit database 31 includes information acquired by recommender system 20 responsive to explicit requests for information submitted to users 21 in the population. Explicit requests for information may comprise, by way of example, questions in a questionnaire, requests to rank a book or movie for its entertainment value, or requests to express an opinion on quality of a product. Implicit data optionally comprised in explicit-implicit database 31 includes data acquired by the recommender system responsive to observations of behavior of users 21 in the population that is not consciously generated by an explicit request for information. For example, implicit data may comprise data responsive to determining which catalog items a user 21 in the population views in an online store, how long a user 21 focuses on a particular catalog item, or to determining a pattern that a user 21 exhibits in choosing catalog items.
Model maker 40 processes explicit and/or implicit data comprised in explicit-implicit database 31 to implement a model for representing catalog items that represents each of the catalog items by a representation usable to cluster the catalog items. Cluster engine 41 processes the representations of the catalog items provided by model maker 40 to generate “clustered database” 32 in which the plurality of catalog items is clustered into catalog clusters, each of which groups a different set of related catalog items. Whereas
Any of various models for providing representations of catalog items and methods of processing the representations to cluster the catalog items and generate clustered database 32 may be used in practice of an embodiment of the invention. Model maker 40 may, for example, generate representations of catalog items that are based on feature vectors. Optionally, model maker 40 represents catalog items by vectors in a space spanned by eigenvectors determined from a singular value decomposition (SVD) of a “ranking matrix” representing preferences of users 21 for the catalog items. Model maker 40 may represent catalog items by trait vectors in a latent space determined by matrix factorization of a ranking matrix.
Cluster engine 41 optionally clusters catalog items in a same catalog cluster if same users exhibit similar preferences for the catalog items. Optionally, cluster engine 41 uses a classifier, such as a support vector machine, trained on a subset of the catalog items to distinguish catalog items and cluster catalog items into catalog clusters. In an embodiment of the invention, cluster engine 41 uses an iterative k-means clustering algorithm to cluster vectors representing catalog items and generate clustered database 32.
Whereas
For convenience of presentation, in
In
Optionally, cluster engine 41 generates a catalog cluster trait vector “V-CCLi, k” also referred to as a “cluster vector”, for each catalog cluster CCLi. A cluster vector V-CCLi, k represents catalog cluster CCLi in latent space 70, and is a function of trait vectors V-CITm, k representing catalog items CITm in the catalog cluster. Solid arrows 73 extending from origin 71 of latent space 70 schematically represent cluster vectors V-CCLi, k in
By way of example, cluster vector V-CCLi, k may be an average or weighted average of trait vectors V-CITm, k. Optionally, cluster vector V-CCLi, k 73 for a catalog cluster is a latent space vector that extends to a centroid (not shown) of the catalog cluster. For embodiments in which clustering is performed using a k-means algorithm, the cluster vectors may be the “centroid vectors” used in the last iteration of the algorithm, at which the clustering procedure is considered to have converged.
In an embodiment of the invention, cluster vectors V-CCLi, k may be used to configure a map of catalog clusters into which catalog items CITm are clustered. For example, a number of catalog clusters CCLi into which catalog items are clustered may be determined by requiring that different catalog clusters group catalog items of a desired degree of dissimilarity. The requirement may be implemented by requiring that an inner product of any two catalog cluster trait vectors V-CCLi, k have magnitude less than a given maximum magnitude. Alternatively or additionally, it may be required that all catalog items in a given catalog cluster exhibit a desired degree of similarity. The similarity requirement may be implemented by requiring that an inner product of the given catalog cluster's cluster vector V-CCLi, k 73 and the trait vector V-CITm, k of a catalog item in the catalog cluster be greater than a desired magnitude.
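The cluster-vector definition and the two vetting requirements described above can be sketched as follows. This is a minimal illustration assuming plain Python lists as trait vectors; the function names, thresholds, and toy vectors are hypothetical.

```python
from itertools import combinations


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cluster_vector(member_vecs, weights=None):
    """Weighted average of a cluster's trait vectors (plain average when weights is None)."""
    if weights is None:
        weights = [1.0] * len(member_vecs)
    total = sum(weights)
    return [sum(w * v[k] for w, v in zip(weights, member_vecs)) / total
            for k in range(len(member_vecs[0]))]


def clusters_dissimilar(cluster_vecs, max_magnitude):
    """Every pair of cluster vectors has an inner product of magnitude below max_magnitude."""
    return all(abs(dot(a, b)) < max_magnitude for a, b in combinations(cluster_vecs, 2))


def members_similar(cluster_vec, member_vecs, min_value):
    """Every member's inner product with its cluster vector exceeds min_value."""
    return all(dot(cluster_vec, v) > min_value for v in member_vecs)


members = [[1.0, 0.0], [3.0, 0.0]]
print(cluster_vector(members))                             # [2.0, 0.0]
print(cluster_vector(members, [3.0, 1.0]))                 # [1.5, 0.0]
print(clusters_dissimilar([[2.0, 0.0], [0.0, 2.0]], 0.5))  # True (orthogonal)
print(members_similar([2.0, 0.0], members, 1.0))           # True
```

A failed dissimilarity check suggests that too many clusters were requested, and a failed similarity check suggests too few; either result could prompt re-clustering with a different number of clusters.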
In an embodiment of the invention, recommender engine 50 recommends catalog items to a given user 21 responsive to correlations between the given user's past preferences for items and the catalog clusters determined by cluster engine 41. Optionally, the recommender engine processes a plurality of “Z” user legacy items “UITZ”, 1≦z≦Z, which are items for which the given user has expressed a preference in the past, to correlate past preferences with catalog clusters. In
In an embodiment of the invention, recommender engine 50 chooses catalog items to be recommended to the given user 21 from the tagged catalog clusters and may employ any of various methods to choose a catalog item from a given tagged catalog cluster from among CCL1, CCL2, CCL4, or CCL8 for recommendation.
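Tagging can be sketched as a lookup from each user legacy item to its catalog cluster. This is a minimal illustration assuming that a precomputed item-to-cluster map is available; the map, the item names, and the function names are hypothetical. The cluster indices are chosen so that clusters 1, 2, 4, and 8 end up tagged, mirroring the example above.

```python
from collections import Counter

# Hypothetical map from catalog item to the index i of its catalog cluster CCLi.
ITEM_TO_CLUSTER = {"a": 1, "b": 1, "c": 2, "d": 4, "e": 8, "f": 3, "g": 1, "h": 8}


def tag_clusters(legacy_items, item_to_cluster):
    """Count user legacy items per catalog cluster; the keys are the tagged clusters."""
    return Counter(item_to_cluster[item] for item in legacy_items if item in item_to_cluster)


def candidate_items(legacy_items, item_to_cluster):
    """Catalog items in tagged clusters that the user has not already preferred."""
    tagged = tag_clusters(legacy_items, item_to_cluster)
    legacy = set(legacy_items)
    return sorted(item for item, cluster in item_to_cluster.items()
                  if cluster in tagged and item not in legacy)


legacy_items = ["a", "b", "c", "d", "e"]
print(tag_clusters(legacy_items, ITEM_TO_CLUSTER))     # Counter({1: 2, 2: 1, 4: 1, 8: 1})
print(candidate_items(legacy_items, ITEM_TO_CLUSTER))  # ['g', 'h']
```

Keeping the per-cluster counts, rather than only the set of tagged clusters, makes the counts available later when the number of recommendations per cluster is scaled by the number of legacy items the cluster contains.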
For example, in an embodiment, to choose a catalog item from an i*-th tagged catalog cluster “CCLi*”, recommender engine 50 may define a representative legacy trait vector, “V-RUITi*, k”, as a trait vector representative of user legacy items located in the i*-th tagged catalog cluster. (Hereinafter, an asterisk on an index, such as the index “i”, of an indexed symbol indicates that the indexed symbol refers to a tagged catalog cluster or to an object associated with a tagged catalog cluster.) Representative legacy trait vector V-RUITi*, k is optionally an average or weighted average of legacy trait vectors V-UITz, k of user legacy items UITz comprised in tagged catalog cluster CCLi*. In
Optionally, a catalog item CITm that is chosen for recommendation from tagged catalog cluster CCLi* is a catalog item for which an inner product of its trait vector V-CITm, k and representative legacy trait vector V-RUITi*, k has a magnitude greater than a desired minimum magnitude. Optionally, the minimum magnitude is not the same for all tagged catalog clusters CCLi*, and the minimum magnitude is smaller for tagged catalog clusters found to include a greater number of user legacy items UITZ. In an embodiment, recommender engine 50 operates to recommend catalog items CITm from a tagged catalog cluster CCLi* responsive to a distance between the catalog cluster's cluster vector V-CCLi*, k and the cluster's representative legacy trait vector V-RUITi*, k. Catalog items may be chosen for recommendation from those catalog items whose trait vectors V-CITm, k have an inner product with representative legacy trait vector V-RUITi*, k that is greater than the inner product between cluster vector V-CCLi*, k and representative legacy trait vector V-RUITi*, k.
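The last selection rule can be sketched as follows. This is a minimal illustration assuming that V-RUIT is the unweighted mean of the legacy trait vectors in the cluster (the text also allows a weighted mean) and that the catalog items passed in exclude the user's legacy items. All names and vectors are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mean(vecs):
    """Unweighted average; the text also allows a weighted average."""
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def choose_from_tagged(catalog_vecs, cluster_vec, legacy_vecs):
    """Keep items whose inner product with the representative legacy trait vector
    V-RUIT exceeds the inner product of the cluster vector V-CCL with V-RUIT."""
    rep = mean(legacy_vecs)      # representative legacy trait vector
    bar = dot(cluster_vec, rep)  # the cluster vector sets the bar to beat
    return sorted(name for name, v in catalog_vecs.items() if dot(v, rep) > bar)


# Hypothetical non-legacy catalog items in one tagged cluster, and that cluster's vector.
CLUSTER_ITEMS = {"p": [1.0, 0.0], "q": [0.0, 1.0], "r": [2.0, 1.0], "s": [1.5, 0.0]}
CLUSTER_VEC = [1.0, 1.0]
LEGACY_VECS = [[3.0, 0.0], [1.0, 0.0]]  # user legacy items in the same cluster

print(choose_from_tagged(CLUSTER_ITEMS, CLUSTER_VEC, LEGACY_VECS))  # ['r', 's']
```

Using the cluster vector as the bar selects the part of the cluster that leans toward the user's own taste, relative to the cluster's center, rather than applying a fixed global threshold.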
By way of yet another example, recommender engine 50 may apply a collaborative filtering algorithm to catalog items CITm comprised in a tagged catalog cluster CCLi* to choose a catalog item from the tagged cluster for recommendation to the user.
In an embodiment, recommended catalog items are drawn from a catalog cluster CCLi that is not tagged by a presence of a user legacy item UITZ, but is characterized by satisfying a proximity constraint, which is a function of a metric that defines a distance between any two catalog clusters CCLi. The metric may be a magnitude of an inner product between cluster vectors V-CCLi, k 73 of the two catalog clusters. Optionally, the constraint requires that a distance between the non-tagged catalog cluster CCLi and at least one tagged catalog cluster CCLi* be less than a desired upper bound distance. A catalog cluster CCLi that satisfies the constraint may be referred to as a “related catalog cluster”.
In an embodiment of the invention, catalog items are chosen for recommendation from related catalog clusters responsive to their respective distances from a tagged catalog cluster. For example, more catalog items may be chosen for recommendation from related catalog clusters closer to a tagged catalog cluster than from related catalog clusters farther from a tagged catalog cluster. In an embodiment of the invention, the constraint requires that a distance between the non-tagged catalog cluster and each of two tagged catalog clusters be less than a desired upper bound distance. For example, non-tagged catalog cluster CCL3 in
After recommender engine 50 chooses catalog items CITm for recommendation from tagged and, optionally, related catalog clusters, the recommender engine compiles a recommendation list of items to be recommended to the given user. Optionally, not all catalog items chosen for recommendation are included in the recommendation list. In an embodiment of the invention, a number of recommended catalog items CITm from each tagged catalog cluster CCLi* included in the recommendation list is responsive to a number of user legacy items UITZ found to be included in the catalog cluster. Optionally, the number of catalog items CITm from a tagged catalog cluster CCLi* included in the recommendation list is greater for tagged catalog clusters that include a greater number of user legacy items UITZ.
In a block 102, recommender system 20 acquires an explicit-implicit database 31 comprising explicit and/or implicit data for users Un, n=1, . . . , N, and catalog items CITm, m=1, . . . , M. In a block 103, recommender system 20 determines a model for representing catalog items CITm and provides each catalog item CITm with a representation that may be used to cluster the catalog item.
In a block 104, a trial number “I” of catalog clusters into which catalog items CITm are to be clustered is initialized, and in a block 106 the recommender system defines catalog clusters CCLi, i=1, . . . , I, and clusters the catalog items into the defined clusters. A catalog cluster CCLi comprises a subset of J(i) catalog items and may be defined by an expression CCLi={CITij:j=1, . . . , J(i)}. The numbers J(i) of catalog items in the catalog clusters sum to the total number M of catalog items; in symbols, M=ΣiJ(i).
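The proximity constraint for related catalog clusters, and the idea of drawing more recommendations from closer related clusters, can be sketched as follows. The text suggests the magnitude of the inner product between cluster vectors as a possible metric. This sketch instead uses Euclidean distance between cluster vectors as a stand-in because it behaves as a distance (smaller means closer); the `dist` parameter accepts any other metric. The 1/(1 + d) slot weighting, the toy vectors, and all function names are hypothetical.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def related_clusters(cluster_vecs, tagged, max_dist, min_tagged=1, dist=euclidean):
    """Map each untagged cluster lying within max_dist of at least min_tagged (>= 1)
    tagged clusters to its distance from the nearest tagged cluster."""
    related = {}
    for cid, vec in cluster_vecs.items():
        if cid in tagged:
            continue
        close = sorted(d for d in (dist(vec, cluster_vecs[t]) for t in tagged) if d < max_dist)
        if len(close) >= min_tagged:
            related[cid] = close[0]
    return related


def slots(related, total):
    """Split total recommendation slots so that closer related clusters get more."""
    weights = {cid: 1.0 / (1.0 + d) for cid, d in related.items()}
    norm = sum(weights.values())
    # Rounding means the slot counts may not sum exactly to total.
    return {cid: round(total * w / norm) for cid, w in weights.items()}


# Hypothetical cluster vectors; clusters 1 and 2 are tagged.
VECS = {1: [0.0, 0.0], 2: [4.0, 0.0], 3: [2.0, 0.0], 5: [0.0, 1.0], 6: [10.0, 10.0]}
TAGGED = {1, 2}

near_one = related_clusters(VECS, TAGGED, max_dist=3.0)
print(near_one)                                           # {3: 2.0, 5: 1.0}
print(related_clusters(VECS, TAGGED, 3.0, min_tagged=2))  # {3: 2.0}
print(slots(near_one, 5))                                 # {3: 2, 5: 3}
```

Setting `min_tagged=2` implements the stricter variant in which a related cluster must lie close to each of two tagged clusters; here only cluster 3, which sits between tagged clusters 1 and 2, qualifies.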
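The relation M=ΣiJ(i) holds when the catalog clusters form a partition of the catalog, that is, when every catalog item is assigned to exactly one catalog cluster, as a hard clustering method such as k-means produces. Written out with the symbols defined above:

```latex
\begin{aligned}
\mathrm{CCL}_i &= \{\mathrm{CIT}_{ij} : j = 1, \dots, J(i)\}, \qquad i = 1, \dots, I,\\
\mathrm{CCL}_i \cap \mathrm{CCL}_{i'} &= \emptyset \quad (i \neq i'), \qquad
\bigcup_{i=1}^{I} \mathrm{CCL}_i = \{\mathrm{CIT}_m : m = 1, \dots, M\},\\
\Longrightarrow \quad M &= \sum_{i=1}^{I} J(i).
\end{aligned}
```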
Any of various models and methods may be used to configure and process data in explicit-implicit database 31 to represent catalog items CITm and provide catalog clusters CCLi. By way of example, in an embodiment, explicit-implicit database 31 comprises values of preference rankings for “M” items by “N” users, which are conveniently configured as an N×M matrix, a “ranking matrix”, RNKn, m (1≦n≦N, 1≦m≦M). In an embodiment of the invention, recommender system 20 assumes a latent space of “K” dimensions to represent catalog items CITm and factors ranking matrix RNKn, m to determine a trait vector V-CITm, k, 1≦k≦K for each catalog item CITm.
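Factoring the ranking matrix RNK into K-dimensional trait vectors can be sketched as follows. The text does not fix a factorization method. This sketch swaps in one common choice, stochastic gradient descent with L2 regularization on the observed entries only, with `None` marking a missing ranking. The hyperparameters, the toy 5 x 4 matrix, and the function names are hypothetical.

```python
import random


def factorize(rnk, k, epochs=2000, lr=0.01, reg=0.02, seed=0):
    """Factor an N x M ranking matrix (None marks a missing ranking) into N user
    vectors and M item trait vectors of K components each, by stochastic gradient
    descent on the observed entries with L2 regularization."""
    rng = random.Random(seed)
    users = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in rnk]
    items = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in rnk[0]]
    observed = [(n, m, r) for n, row in enumerate(rnk) for m, r in enumerate(row) if r is not None]
    for _ in range(epochs):
        for n, m, r in observed:
            u, v = users[n], items[m]
            err = r - sum(a * b for a, b in zip(u, v))
            for j in range(k):
                uj, vj = u[j], v[j]
                u[j] += lr * (err * vj - reg * uj)
                v[j] += lr * (err * uj - reg * vj)
    return users, items


def rmse(rnk, users, items):
    """Root-mean-square error of the factorization on the observed rankings."""
    sq = [(r - sum(a * b for a, b in zip(users[n], items[m]))) ** 2
          for n, row in enumerate(rnk) for m, r in enumerate(row) if r is not None]
    return (sum(sq) / len(sq)) ** 0.5


# Hypothetical 5 x 4 ranking matrix RNK (rows: users, columns: catalog items).
RNK = [
    [5, 3, None, 1],
    [4, None, None, 1],
    [1, 1, None, 5],
    [1, None, None, 4],
    [None, 1, 5, 4],
]
users, items = factorize(RNK, k=2)
print(len(items), len(items[0]))  # 4 2
print(rmse(RNK, users, items))
```

Each row of `items` plays the role of a trait vector V-CITm, k. Missing entries are skipped rather than treated as zeros, so an unranked item is not mistaken for a disliked one.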
Recommender system 20 optionally clusters catalog items CITm by clustering trait vectors V-CITm, k in accordance with a k-means clustering algorithm. In a k-means algorithm, an initial set of I “clustering vectors”, one for each of the I catalog clusters into which it is desired to cluster catalog items CITm, is optionally determined randomly. Each trait vector V-CITm, k is then clustered with the initial clustering vector with which its inner product is largest to form a first set of catalog clusters. For each catalog cluster in the first set of catalog clusters, a first-iteration clustering vector is determined. Optionally, the first-iteration clustering vector is an average or centroid vector of the trait vectors V-CITm, k in the cluster. The trait vectors V-CITm, k are clustered a second time using the first-iteration clustering vectors to determine a second set of catalog clusters and a second-iteration clustering vector for each catalog cluster in the second set of catalog clusters. Iterations continue until the procedure converges to an acceptable set of catalog clusters and cluster vectors V-CCLi, k.
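The iterative procedure just described can be sketched as follows. This is a minimal illustration, and it adds one assumption not stated in the text: each clustering vector is normalized to unit length (as in spherical k-means), so that a long clustering vector cannot attract every trait vector merely by its length when assignment is by largest inner product. The toy vectors, the seed, and the function names are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def unit(v):
    """Scale a vector to unit length (assumed spherical-k-means normalization)."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n else list(v)


def mean(vecs):
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cluster_items(trait_vecs, n_clusters, max_iter=100, seed=0):
    """Assign each trait vector to the clustering vector with the largest inner product,
    recompute each clustering vector from its cluster's centroid, and repeat until no
    assignment changes. Returns (cluster index per item, final cluster vectors)."""
    rng = random.Random(seed)
    centers = [unit(v) for v in rng.sample(trait_vecs, n_clusters)]  # random initial vectors
    assignment = None
    for _ in range(max_iter):
        new = [max(range(n_clusters), key=lambda i: dot(v, centers[i])) for v in trait_vecs]
        if new == assignment:
            break  # converged: no trait vector changed cluster
        assignment = new
        for i in range(n_clusters):
            members = [v for v, a in zip(trait_vecs, assignment) if a == i]
            if members:  # an emptied cluster keeps its previous vector
                centers[i] = unit(mean(members))
    return assignment, centers


# Hypothetical trait vectors forming two directions in a 2-dimensional latent space.
VECS = [[1.0, 0.1], [2.0, 0.0], [1.5, 0.2], [0.1, 1.0], [0.0, 2.0], [0.2, 1.5]]
labels, cluster_vecs = cluster_items(VECS, 2)
print(labels)  # the first three items share one label and the last three the other
```

The returned `cluster_vecs` correspond to the cluster vectors V-CCLi, k of the converged iteration, up to the assumed unit-length scaling.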
In a block 108 recommender system 20 vets the catalog clusters to determine if the clusters include clusters that are too similar or too dissimilar. For example, in an embodiment of the invention, after generating catalog clusters CCLi and their representative cluster vectors V-CCLi, k (vectors 73 in
If, on the other hand, in decision block 110 recommender system 20 determines that the inner products have satisfactory magnitudes, the recommender system optionally proceeds to a block 114 and acquires data identifying user legacy items UITZ, 1≦z≦Z. In a block 116, the recommender system determines to which catalog clusters CCLi each of the user legacy items UITZ (represented by x-circles 75 in
As noted above in the discussion of
In a block 120, recommender system 20 optionally compiles all recommended items RIT(i*)r from all tagged catalog clusters CCLi* into a recommendation set {S-RIT}={RIT(i*)r:i*=1, . . . , I* and r=1, . . . , R(i*)} for transmission to the user. Optionally, fewer than all of the catalog items chosen for recommendation are included in a recommendation set. For example, a user may request a list of recommendations limited to a number smaller than the total number of items determined eligible for recommendation.
In the description and claims of the present application, each of the verbs “comprise”, “include” and “have”, and conjugates thereof, is used to indicate that the object or objects of the verb are not necessarily a complete listing of components, elements or parts of the subject or subjects of the verb.
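Compiling a length-limited recommendation list in which tagged clusters holding more user legacy items contribute more items can be sketched as follows. The proportional-share rule, the guarantee of at least one slot per cluster, the tie-breaking by cluster index, and all names and data are hypothetical choices for illustration.

```python
def compile_list(legacy_counts, picks, limit):
    """Build a recommendation list of at most `limit` items in which each tagged cluster
    contributes a share proportional to its number of user legacy items."""
    total = sum(legacy_counts.values())
    result = []
    # Clusters with more legacy items go first, so truncation to `limit` drops others' picks.
    for cid in sorted(legacy_counts, key=lambda c: (-legacy_counts[c], c)):
        # round() uses round-half-to-even; every tagged cluster gets at least one slot.
        share = max(1, round(limit * legacy_counts[cid] / total))
        result.extend(picks.get(cid, [])[:share])
    return result[:limit]


# Hypothetical inputs: legacy-item counts per tagged cluster and ranked picks per cluster.
COUNTS = {1: 3, 4: 1}
PICKS = {1: ["a", "b", "c", "d"], 4: ["x", "y"]}

print(compile_list(COUNTS, PICKS, 4))  # ['a', 'b', 'c', 'x']
print(compile_list(COUNTS, PICKS, 2))  # ['a', 'b']
```

When the user requests fewer recommendations than are eligible, the final truncation keeps the picks from the clusters that best reflect the user's past preferences.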
Descriptions of embodiments of the invention in the present application are provided by way of example and are not intended to limit the scope of the invention. The described embodiments comprise different features, not all of which are required in all embodiments of the invention. Some embodiments utilize only some of the features or possible combinations of the features. Variations of embodiments of the invention that are described, and embodiments of the invention comprising different combinations of features noted in the described embodiments, will occur to persons of the art. The scope of the invention is limited only by the claims.