Disclosed embodiments relate to recommendation methods, systems, and apparatuses. Specifically, they relate to recommendation methods, systems, and apparatuses based on collaborative filtering techniques.
The objective of a recommender system is to use information about users to predict the utility or relevance of a particular item, and use this information to provide personalized recommendations for users. There are two main types of recommendation methods and systems: content-based filtering and collaborative filtering.
Content-based methods, such as the one disclosed in U.S. Pat. No. 7,373,318, select the items to recommend based on their content. These methods are usually limited to items that can be analyzed by a machine. Consequently, such systems are difficult to implement in situations that require retrieving multimedia information where machine perception of the content (colors, textures, etc.) differs greatly from user perception. Furthermore, content-based filtering can neither evaluate the quality of an item nor find serendipitous items (i.e., recommended items that are not apparently related to the user profile). Consequently, content-based methods are poorly suited both to recommending items based on their quality and to recommending multimedia items.
Collaborative filtering methods are not based on the content of items but rather on the opinions of other users. These methods recommend items that have received high ratings from other users with similar tastes or interests. In these techniques, the items are actually rated by people. Consequently, these systems do not need to analyze content (and can therefore be implemented for any type of item, including non-annotated multimedia content), and the quality or subjective evaluation of the items is also considered. User ratings are stored in a table known as the rating matrix. This table is processed in order to generate the recommendations. Depending on how the data of the rating matrix are processed, two types of methods, memory-based and model-based, can be differentiated. Memory-based methods, such as the one disclosed in U.S. patent application Ser. No. 10/333,953, use the whole table to compute their prediction. Generally, they use similarity measures to select users (or items) that are similar to the active user. Then, the prediction is calculated from the ratings of these neighbors. Memory-based methods are simpler and obtain reasonably accurate results. However, they present serious scalability problems given that the method has to process all the data to compute a single prediction. With a high number of users or items, these methods are not appropriate for online systems that require real-time recommendations.
Model-based methods, such as the one disclosed in U.S. patent application Ser. No. 12/347,958, first construct a model to represent the behavior of the users and, therefore, to predict their ratings. The parameters of the model are estimated offline using the data from the rating matrix. These methods tend to be faster in prediction time than the memory-based approaches. However, model-based methods still present several problems. The models can be extremely complex, as they have a multitude of parameters to estimate, and they can be too sensitive to data changes. Additionally, the assumptions of the model may not fit the data, leading to wrong recommendations. In practice, many theoretical models cannot be applied to real data. Moreover, model construction (and update, when new data are added) usually takes a long time for practical applications.
In general terms, collaborative filtering methods are based on the similarities among users or items. Although many different techniques have been used to process the data, most focus on finding more or less hidden relationships. The idea is that, if two users show a similar rating pattern, they will probably coincide in the missing ratings, too. However, to find these relationships, most techniques require a significant amount of information. Therefore, with sparse datasets, these similarity-based methods face serious problems. Currently, there is a need for methods, apparatuses, and systems able to provide high recommendation quality while retaining scalability and computational efficiency.
Disclosed embodiments include a method for recommending one or more items among a plurality of items to a user implemented in a recommendation system including a processor and a memory, the method comprising the steps of: (a) calculating a user profile based on a plurality of previous ratings provided by the user; (b) calculating an item profile based on a plurality of previous ratings provided by a plurality of other users; (c) calculating an estimated rating for an item not rated by the user; and (d) calculating a recommendation of one or more items to said user based on said user profile, said item profile, and said estimated rating. According to one embodiment, the recommendation system includes a tendencies based collaborative filtering method. In a particular embodiment, the Tendencies Based Collaborative Filtering Method (TBCFM) is based on the user mean, the item mean, the user tendency calculated using a User Tendency Calculation Method (UTCM), and the item tendency calculated using an Item Tendency Calculation Method (ITCM).
Disclosed embodiments are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings.
According to one embodiment, as shown in
Certain specific details are set forth in the following description and figures to provide a thorough understanding of various embodiments disclosed. Certain well-known details often associated with computing hardware, processing, and software technology are not set forth in the following disclosure to avoid unnecessarily obscuring the various disclosed embodiments. Further, those of ordinary skill in the relevant art will understand that they can practice other embodiments without one or more of the details described below. Aspects of the disclosed embodiments may be implemented in the general context of computer-executable instructions, such as program modules, being executed by a computer, computer server, or device containing a processor and memory. Generally, program modules include routines, programs, objects, components, and data structures that perform particular tasks or implement particular abstract data types. Aspects of the disclosed embodiments may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote storage media including memory storage devices. Alternatively, according to an embodiment, the methods disclosed can be implemented in hardware using an integrated microcontroller or FPGA device to create a recommendation apparatus or system. Those skilled in the art will appreciate that, given the description of the modules comprising the disclosed embodiments provided in this specification, it is a routine matter to provide working systems that will work on a variety of known and commonly available technologies capable of incorporating the features described herein.
According to one embodiment, as shown in
According to one embodiment, as shown in
where vui is the rating of the user u for the item i and Iu is the set of items rated by user u. According to one embodiment, as shown in
where v.i is the item mean. Therefore, the user tendency is defined as the average difference between his/her ratings and the item mean.
According to one embodiment, as shown in
where Ui is the set of users who rated the item i. According to one embodiment, as shown in
where vu. is the user mean. Therefore, the item tendency is defined as the average difference between the item rating provided by a user and the user mean.
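By way of illustration only, and without limitation, the following Python sketch computes the user mean, the item mean, the user tendency, and the item tendency from their verbal definitions above. The function name `compute_profiles` and the representation of the rating matrix as a dictionary keyed by (user, item) pairs are assumptions made for this example and are not part of the disclosure.

```python
from collections import defaultdict


def compute_profiles(ratings):
    """Compute means and tendencies from a sparse rating matrix.

    `ratings` is assumed to be a dict mapping (user, item) -> numeric rating.
    Returns four dicts: user_mean, item_mean, user_tendency, item_tendency.
    """
    by_user = defaultdict(dict)  # user -> {item: rating}, i.e. I_u
    by_item = defaultdict(dict)  # item -> {user: rating}, i.e. U_i
    for (user, item), value in ratings.items():
        by_user[user][item] = value
        by_item[item][user] = value

    # User mean: average of the ratings given by the user.
    user_mean = {u: sum(r.values()) / len(r) for u, r in by_user.items()}
    # Item mean: average of the ratings received by the item.
    item_mean = {i: sum(r.values()) / len(r) for i, r in by_item.items()}

    # User tendency: average difference between the user's ratings
    # and the mean of each rated item.
    user_tendency = {
        u: sum(v - item_mean[i] for i, v in r.items()) / len(r)
        for u, r in by_user.items()
    }
    # Item tendency: average difference between the ratings received
    # by the item and the mean of each rating user.
    item_tendency = {
        i: sum(v - user_mean[u] for u, v in r.items()) / len(r)
        for i, r in by_item.items()
    }
    return user_mean, item_mean, user_tendency, item_tendency
```

Each pass visits every stored rating a constant number of times, so the cost of this sketch grows with the number of ratings in the matrix.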
According to one particular embodiment, the Tendencies Based Collaborative Filtering Method (TBCFM) takes into account the user mean and the item mean as well as their respective tendencies when computing a prediction for a rating, named estimated rating 422.
The TBCFM is shown in
$$p_{ui} = \max\{\, v_{u\cdot} + \tau_{\cdot i},\; v_{\cdot i} + \tau_{u\cdot} \,\} \qquad (1)$$
If the user tendency and the item tendency are both less than zero then the estimated rating is calculated using the Formula 2 414:
$$p_{ui} = \min\{\, v_{u\cdot} + \tau_{\cdot i},\; v_{\cdot i} + \tau_{u\cdot} \,\} \qquad (2)$$
If the user tendency is less than zero, the item tendency is greater than or equal to zero, and the item mean is greater than or equal to the user mean, then the estimated rating is calculated using Formula 3 416:
$$p_{ui} = \min\{\, \max\{\, v_{u\cdot},\; (v_{\cdot i} + \tau_{u\cdot})\beta + (v_{u\cdot} + \tau_{\cdot i})(1-\beta) \,\},\; v_{\cdot i} \,\} \qquad (3)$$
where β is a parameter that controls the contribution of the item and user mean. Possible values range from 0 to 1. In a particular embodiment, and without limitation, β is set to 0.6.
If the user tendency is greater than or equal to zero, the item tendency is less than zero, and the item mean is less than the user mean, then the estimated rating is calculated using Formula 4 418:
$$p_{ui} = \min\{\, \max\{\, v_{\cdot i},\; (v_{\cdot i} + \tau_{u\cdot})\beta + (v_{u\cdot} + \tau_{\cdot i})(1-\beta) \,\},\; v_{u\cdot} \,\} \qquad (4)$$
Otherwise, the estimated rating is calculated using the Formula 5 420:
$$p_{ui} = v_{\cdot i}\,\beta + v_{u\cdot}\,(1-\beta) \qquad (5)$$
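By way of illustration only, and without limitation, the five cases of the TBCFM described above can be sketched in Python as follows. The function name `predict_rating` and its argument names are assumptions made for this example; the default value of 0.6 for β follows the particular embodiment mentioned above.

```python
def predict_rating(user_mean, item_mean, user_tendency, item_tendency, beta=0.6):
    """Estimated rating p_ui following Formulas 1 to 5 (illustrative sketch)."""
    from_user = user_mean + item_tendency  # v_u. + tau_.i
    from_item = item_mean + user_tendency  # v_.i + tau_u.

    # Formula 1: both tendencies are greater than or equal to zero.
    if user_tendency >= 0 and item_tendency >= 0:
        return max(from_user, from_item)
    # Formula 2: both tendencies are less than zero.
    if user_tendency < 0 and item_tendency < 0:
        return min(from_user, from_item)

    # Weighted combination shared by Formulas 3 and 4.
    blend = from_item * beta + from_user * (1 - beta)

    # Formula 3: negative user tendency, non-negative item tendency,
    # item mean at or above the user mean.
    if user_tendency < 0 and item_tendency >= 0 and item_mean >= user_mean:
        return min(max(user_mean, blend), item_mean)
    # Formula 4: non-negative user tendency, negative item tendency,
    # item mean below the user mean.
    if user_tendency >= 0 and item_tendency < 0 and item_mean < user_mean:
        return min(max(item_mean, blend), user_mean)

    # Formula 5: all remaining cases.
    return item_mean * beta + user_mean * (1 - beta)
```

Given precomputed means and tendencies, each call performs a fixed number of comparisons and arithmetic operations, regardless of the number of users or items.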
According to another embodiment,
Disclosed embodiments can be implemented in a recommendation apparatus or system for recommending one or more items among a plurality of items to a user. Such apparatus or system comprises: (a) a memory to store a plurality of ratings; and (b) a processor configured for (1) calculating a user profile based on a plurality of previous ratings provided by the user; (2) calculating an item profile based on a plurality of previous ratings provided by a plurality of other users; (3) calculating an estimated rating for an item not rated by the user; and (4) calculating a recommendation of one or more items to the user based on the user profile, the item profile, and the estimated rating. Furthermore, the processor can be configured to execute any of the above described methods, transforming it into a particular and specific recommendation machine that solves a particular and specific problem, namely, a machine for providing recommendations. In particular embodiments, and without limitation, the calculating a user profile is based on computing a mean of the previous ratings provided by the user; the calculating an item profile is based on computing a mean of the previous ratings provided by the plurality of other users; the calculating a user profile comprises computing a tendency of the previous ratings provided by the user; the calculating an item profile comprises computing a tendency of the previous ratings provided by the plurality of other users; the tendency of the previous ratings provided by the user is calculated using a User Tendency Calculation Method (UTCM); the tendency of the previous ratings provided by the plurality of other users is calculated using an Item Tendency Calculation Method (ITCM); and the estimated rating for an item not rated by the user is calculated using a Tendencies Based Collaborative Filtering Method (TBCFM).
Furthermore, in particular embodiments, the recommendation of one or more items to the user comprises the steps of: (a) calculating a plurality of estimated ratings for a plurality of items not rated by the user; (b) sorting the items not rated by the user according to the estimated ratings; and (c) selecting and reporting the items with the estimated rating above a threshold. More specifically, and without limitation, according to one particular embodiment the TBCFM comprises (a) calculating a user rating pui, based on a user mean vu., a user tendency τu., an item mean v.i, and an item tendency τ.i, according to pui=max {vu.+τ.i,v.i+τu.} if the user tendency and the item tendency are both greater than or equal to zero; (b) calculating the user rating as pui=min {vu.+τ.i,v.i+τu.} if the user tendency and the item tendency are both less than zero; (c) calculating the user rating as pui=min {max{vu., (v.i+τu.)β+(vu.+τ.i)(1−β)}, v.i} if the user tendency is less than zero, the item tendency is greater than or equal to zero, and the item mean is greater than or equal to the user mean; and (d) calculating the user rating as pui=min {max{v.i, (v.i+τu.)β+(vu.+τ.i)(1−β)}, vu.} if the user tendency is greater than or equal to zero, the item tendency is less than zero, and the item mean is less than the user mean, or as pui=v.iβ+vu.(1−β) in all other cases.
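By way of illustration only, and without limitation, recommendation steps (a) through (c) above can be sketched in Python as follows. The function name `recommend`, its parameters, and the use of a caller-supplied `estimate` callable standing in for the rating estimator are assumptions made for this example.

```python
def recommend(user, candidate_items, rated_items, estimate, threshold):
    """Return (item, estimated_rating) pairs above `threshold`, best first.

    `estimate` is assumed to be a callable (user, item) -> estimated rating,
    and `rated_items` the collection of items the user has already rated.
    """
    # (a) Estimate a rating for every item the user has not rated.
    scored = [
        (item, estimate(user, item))
        for item in candidate_items
        if item not in rated_items
    ]
    # (b) Sort the unrated items by estimated rating, highest first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    # (c) Select and report the items whose estimate exceeds the threshold.
    return [(item, score) for item, score in scored if score > threshold]
```

Passing the estimator as a callable keeps the selection logic independent of how the estimated ratings are produced.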
Similarly, the disclosed methods can be embodied in a non-transitory computer-readable storage medium with an executable program stored thereon to implement a recommendation system, wherein the executable program instructs an apparatus to perform the following steps: (a) calculating a user profile based on a plurality of previous ratings provided by the user; (b) calculating an item profile based on a plurality of previous ratings provided by a plurality of other users; (c) calculating an estimated rating for an item not rated by the user; and (d) calculating a recommendation of one or more items to the user based on the user profile, the item profile, and the estimated rating. The specific methods described above can be incorporated and embodied in this non-transitory computer-readable storage medium.
While particular embodiments have been described, it is understood that, after learning the teachings contained in this disclosure, modifications and generalizations will be apparent to those skilled in the art without departing from the spirit of the disclosed embodiments. It is noted that the foregoing examples have been provided merely for the purpose of explanation and are in no way to be construed as limiting. While the method, apparatus, and system have been described with reference to various embodiments, it is understood that the words which have been used herein are words of description and illustration, rather than words of limitation. Further, although the system has been described herein with reference to particular means, materials and embodiments, the actual embodiments are not intended to be limited to the particulars disclosed herein; rather, the system extends to all functionally equivalent structures, methods and uses, such as are within the scope of the appended claims. Those skilled in the art, having the benefit of the teachings of this specification, may effect numerous modifications thereto and changes may be made without departing from the scope and spirit of the disclosed embodiments in their aspects.
According to one particular embodiment, the Tendencies Based Collaborative Filtering Method was evaluated using two datasets in order to demonstrate its utility and performance compared to other methods. The datasets used were MovieLens and Netflix, which are among the datasets most widely used by researchers and developers in the field of collaborative filtering.
The experiments were performed by dividing the dataset into two groups, a training subset and an evaluation subset. The first corresponds to data the method already knows, that is, the data used to train the method model. With this information, the method computes the recommendations that are later compared with the original data present in the evaluation subset. Two different approaches were followed for selecting the training subset. In the first, it is constructed from a randomly chosen percentage of the available ratings. For the tests, the following percentages were used: 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, and 90%. The evaluation subset was composed by randomly selecting 10% of the dataset. The ratings that appear in the evaluation subset were never included in the training subset. With high percentages of ratings in the training set, the behavior of the method under relatively high density conditions can be evaluated. In contrast, a small percentage allows evaluating the method under sparsity conditions, which are common in the initial phases or in domains with a large number of users and/or items. In fact, given that many systems operate under these conditions, a second strategy was used to evaluate methods under sparsity conditions. It consists of selecting as the training set a fixed, generally low, number of ratings from each user: 1, 2, 3, 4, 5, 7, 10, 12, 15, and 20 ratings. This strategy is named Given-N.
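By way of illustration only, and without limitation, the Given-N strategy described above can be sketched in Python as follows. The function name `given_n_split`, the fixed random seed, and the choice to return the remaining ratings as a held-out pool are assumptions made for this example; how the evaluation subset was drawn in the Given-N experiments is not specified beyond the description above.

```python
import random


def given_n_split(ratings, n, seed=0):
    """Split `ratings` so that each user contributes at most `n` training ratings.

    `ratings` is assumed to be a dict mapping (user, item) -> rating.
    Returns (train, held_out); the two dicts never share a key.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_user = {}
    for (user, item), value in ratings.items():
        by_user.setdefault(user, []).append((item, value))

    train, held_out = {}, {}
    for user, pairs in by_user.items():
        rng.shuffle(pairs)  # choose the N training ratings at random
        for index, (item, value) in enumerate(pairs):
            target = train if index < n else held_out
            target[(user, item)] = value
    return train, held_out
```

Users with fewer than N ratings contribute all of their ratings to the training subset in this sketch.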
The two most common tasks of recommender systems were evaluated: the prediction of the item rating and the recommendation of a certain number of items (5, 10, 15, 25, 50, 100, and 150 items were considered). In this second case, there is a challenge inherent to the use of offline datasets: real ratings are not available for most items. In these cases, small errors related to the items rated, like including an item with low rating or leaving out one with a high score, greatly affect the final result. To minimize this problem, in the evaluation, the method was forced to recommend items that have a rating in the evaluation set. Therefore, the final list will consist of N items that were already rated. In the task of annotation in context, this problem is smaller. As long as there is a significant amount of evaluation data, the results can be extrapolated to all the items.
A set of methods representative of the different techniques found in the literature was evaluated for comparison. These include:
All have been compared with the Tendencies Based Collaborative Filtering Method represented as TB in the figures.
First, the evolution of the accuracy and coverage of the different methods was studied, as the percentage of data used as training set was changed. The results for MovieLens dataset are shown in
Similarly, as the density of the information increases, a slight decrease in the differences among methods is observed. Most of them present similar results under relatively high density conditions, while their differences are accentuated as density diminishes. In fact, for the MovieLens dataset and with a training set of 80%, there is no statistical significance in the MAE differences among the six best methods (UB, RSVD2, SVD++, RSVD, SO and TB), while with 10%, only the three best methods (RSVD2, SVD++, TB) present equivalent results. The same conclusion can be reached from results using the Netflix dataset. With an 80% training set, no statistically significant differences exist among the best methods (RSVD, RSVD2, SVD++, SO and TB), while with 10%, SVD++ presents the best results, followed by RSVD2, NSVD2, RSVD and TB.
The tests under sparsity environments are good indicators of the ability of methods to extract more information from the rating matrix. As stated previously, the Given-N strategy was used to evaluate the methods under such conditions. Results for MovieLens and Netflix datasets are presented in
Comparing
As seen in
It is in computational efficiency that the tendencies based method obtains its best results. This is very important for recommendation systems that require real-time recommendations. In
Complexity has been separated into two parts, training and prediction, corresponding to the construction of the model using the training data and the computation of a single prediction, respectively. Training only needs to be performed once, while, generally, a large number of predictions will be made. Thus, a low complexity when making a prediction can compensate for a high complexity in training. In general, model-based methods are more efficient when computing a prediction, although the construction of the model has a considerable complexity. The tendencies based method stands out by far as the most efficient, with a complexity of O(mn) in training and of O(1) in prediction.
These theoretical results are confirmed by studying the execution times.
This application claims the benefit of U.S. Provisional Application No. 61/492,206, filed on Jun. 1, 2011 by the present inventors, which is incorporated herein by reference.
| Number | Date | Country |
|---|---|---|---|
| 61492206 | Jun 2011 | US |