The present invention relates to recommender systems in social networks. More particularly, it relates to a Bayesian inference based recommender system for social networks.
Recommender systems enable users to quickly locate items of interest and have become an important tool for overcoming the information overload problem. Recommender systems are usually classified into three categories: 1) content-based recommendations; 2) Collaborative Filtering (CF) recommendations; and 3) hybrid approaches. Content-based recommendations rely on content descriptions and match the content with the user's preferences. Collaborative Filtering (CF) recommendations use the opinions of a set of users to predict another user's interest. The recommendation scheme of the present invention falls into the CF recommendation category.
Online social networks, such as FACEBOOK™, TWITTER™, and YOUTUBE™, have become extremely popular and attract millions of users. Online social networks provide users novel ways to communicate, to share information, and to build virtual communities. Findings in the sociology and psychology fields indicate that human beings tend to associate and bond with similar others, a phenomenon known as homophily. Using social networks to improve CF recommendations is an ongoing effort.
Trust has been identified as an effective means of utilizing social networks to improve recommendation quality. Empirical studies have shown a correlation between trust and user similarity. Various techniques have been proposed to incorporate trust into CF approaches. Typically, trust is a numerical value, with a larger value indicating a higher level of trust.
Finally, a Bayesian approach has been employed in the past in recommendation systems. In one example, a Bayesian network is formed with each node corresponding to an item. The states of a node correspond to the possible rating values. A learning algorithm searches over various model structures and converges to the best one.
According to an embodiment of the present invention, a node in the Bayesian network is a user, and the structure of the Bayesian network is governed by the underlying social network. As is known, social networks already have similar users connected to each other. The recommendation method of the present invention can be carried out in a distributed fashion and is thus more scalable than known systems.
The Bayesian inference based recommendation, as described herein, uses conditional probability distributions to capture the similarity between users. A probability distribution carries rich information and allows the present invention to employ Bayesian network inference to conduct multiple-hop inferences.
According to an implementation, the method for providing recommendations in a social network includes the steps of receiving a query message from a querying initiator (QI) for a content rating, forwarding the query message to the QI's neighbors, receiving recommendation ratings from the QI's neighbors and/or the neighbors' friends, receiving the conditional probability distribution of the recommendation ratings given by recommenders rooted at a friend, and constructing a QI rating response using a Bayesian inference network.
In a further implementation, the method for providing recommendations in a social network includes the steps of determining whether the QI has previously rated the requested content in response to the received query message, returning the QI's own rating to the QI when it has been determined that the QI has previously rated the requested content, and terminating the query message from the QI.
These and other aspects, features and advantages of the present principles will become apparent from the following detailed description of exemplary embodiments, which is to be read in connection with the accompanying drawings.
The present principles may be better understood in accordance with the accompanying exemplary figures.
The present principles are directed to recommendation systems, and more specifically to a Bayesian inference based recommendation system.
It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the present invention and are included within its spirit and scope.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the present invention and the concepts contributed by the inventor(s) to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions.
Moreover, all statements herein reciting principles, aspects, and embodiments of the present invention, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.
Thus, for example, it will be appreciated by those skilled in the art that the block diagrams presented herein represent conceptual views of illustrative circuitry embodying the present invention. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudocode, and the like represent various processes which may be substantially represented in computer readable media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term “processor” or “controller” should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (“DSP”) hardware, read-only memory (“ROM”) for storing software, random access memory (“RAM”), and non-volatile storage.
Other hardware, conventional and/or custom, may also be included. Similarly, any switches shown in the figures are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.
In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements that performs that function or b) software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. The present invention as defined by such claims resides in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. It is thus regarded that any means that can provide those functionalities are equivalent to those shown herein.
Reference in the specification to “one embodiment” or “an embodiment” of the present invention, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrase “in one embodiment” or “in an embodiment”, as well any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.
As shown in the accompanying figure, consider an example in which a user, John, seeks a recommendation for a newly released movie that three of his friends, Amy, Bob, and Casey, have already watched and rated.
The key problem in this example is how to estimate John's rating with low estimation error. One naive approach is to recommend to John the average of his friends' ratings. The underlying assumption of this approach is that his friends' ratings are equally important to John. In reality, John's movie taste may be "closer" to that of Bob than to that of Amy or Casey. Intuitively, Bob's rating should carry more weight when estimating John's rating. One can introduce a "trust value" between friends and use the trust-weighted sum as the recommendation. However, it is challenging to represent the rating closeness between friends using one numerical value. In addition, it is difficult to propagate trust values through a social network. A more refined approach is to look into the pairwise movie rating correlation between John and each of his friends in the past, and to infer the "most probable" rating of John for the current movie. More specifically, if John and Amy have watched and rated a common set of movies in the past, the rating correlation between them can be measured by a set of conditional probabilities $P(R_J \mid R_A)$ and $P(R_A \mid R_J)$, where $R_J$ and $R_A$ are the random variables representing the ratings of John and Amy, respectively. If Amy gives a score $r_A$ for the new release, then based on the rating history, the most probable rating of John for the movie can be estimated as

$$\tilde{R}_J(r_A) = \arg\max_{r} P(R_J = r \mid R_A = r_A).$$
Similarly, based on Bob's and Casey's ratings, we can calculate two more estimates of John's rating, $\tilde{R}_J(r_B)$ and $\tilde{R}_J(r_C)$. Again, we need to compute one recommendation for John by reconciling the three different estimates. Ideally, we want to calculate the most probable rating of John conditional on his three friends' ratings:

$$\tilde{R}_J(r_A, r_B, r_C) = \arg\max_{r} P(R_J = r \mid r_A, r_B, r_C),$$
where we introduce the abbreviation $r_{\{\cdot\}}$ for the event $R_{\{\cdot\}} = r_{\{\cdot\}}$. Unfortunately, it is hard to estimate the joint conditional distribution between John and all of his friends. Instead, an embodiment of the present invention resorts to a Bayesian network to reconstruct the joint conditional distribution based on the marginal conditional distributions between John and each of his friends. More specifically, the present invention constructs a one-level Bayesian tree between John and his friends as follows: a) John is the root of the tree; b) each of John's friends is a direct child of John in the tree. Following the definition of a Bayesian network, we essentially assume that John's friends' ratings are independent of each other conditional on John's rating: $\{R_A \perp R_B \perp R_C \mid R_J\}$. In other words, we assume the rating discrepancies between John and his friends are independent. Under this assumption, we have:
$$P(r_A, r_B, r_C \mid r_J) = P(r_A \mid r_J)\,P(r_B \mid r_J)\,P(r_C \mid r_J).$$
Following the Bayesian rule, we can calculate the conditional probability of John's rating based on his friends' ratings as:

$$P(r_J \mid r_A, r_B, r_C) = \frac{P(r_J)\,P(r_A \mid r_J)\,P(r_B \mid r_J)\,P(r_C \mid r_J)}{\sum_{j=1}^{N} P(R_J = j)\,P(r_A \mid R_J = j)\,P(r_B \mid R_J = j)\,P(r_C \mid R_J = j)},$$

where $N$ is the highest rating.
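As an illustration only, the following sketch shows how the one-level fusion above could be computed, assuming a 5-star rating scale and assuming that the prior $P(R_J)$ and the pairwise conditional distributions $P(R_F \mid R_J)$ for each friend have already been estimated from rating histories; the variable names and example numbers are hypothetical.

```python
import numpy as np

N = 5  # hypothetical 5-star rating scale

def fuse_friend_ratings(prior_j, cond_given_j, friend_ratings):
    """One-level Bayesian fusion of friends' ratings.

    prior_j        : length-N array, P(R_J = r) for r = 1..N
    cond_given_j   : dict friend -> N x N array, entry [f-1, r-1] = P(R_F = f | R_J = r)
    friend_ratings : dict friend -> observed rating r_F in 1..N

    Returns the posterior P(R_J = r | all friends' ratings), assuming the friends'
    ratings are conditionally independent given John's rating.
    """
    posterior = np.array(prior_j, dtype=float)
    for friend, r_f in friend_ratings.items():
        posterior *= cond_given_j[friend][r_f - 1, :]  # P(R_F = r_f | R_J = r) for each r
    return posterior / posterior.sum()

# Hypothetical example: uniform prior and "noisy identity" conditionals (columns sum to 1).
prior = np.full(N, 1.0 / N)
noisy_id = 0.6 * np.eye(N) + 0.08 * np.ones((N, N))
conds = {"Amy": noisy_id, "Bob": noisy_id, "Casey": noisy_id}
post = fuse_friend_ratings(prior, conds, {"Amy": 4, "Bob": 5, "Casey": 4})
print("P(R_J = r | r_A, r_B, r_C):", np.round(post, 3))
print("Most probable rating:", int(np.argmax(post)) + 1)
```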
In more general settings, when a user wants a recommendation rating for a movie, it may happen that none of his direct friends have watched or rated the movie. In order to increase the chance of obtaining a recommendation, an embodiment of the present invention allows a user to propagate his query through the social network and collect ratings from indirect friends who are several social hops away. Based on the collected ratings, we can construct a multi-level Bayesian inference network to estimate the most probable rating for the querying user.
More specifically, the distributed recommendation system of an embodiment of the present invention works in the following framework.
Let $S \in V$ be the user sending out the recommendation rating query for a movie. Let $L^* = \{L_i \in V,\ i = 1, 2, \ldots, k\}$ be the indexed set of direct and remote friends of $S$ who respond to the query. Social networks normally have rich connectivity. In an effort to avoid redundant query responses, we generate a unique id for each query. Each user in $L^*$ only responds to a query when he/she receives the query for the first time. We further assume that a query response is returned to $S$ along the reverse of the query propagation path. As a result, the union of all query response paths is the shortest path tree from all recommenders in $L^*$ to the common root $S$.
In order to work with ratings from direct and remote friends, an embodiment of the invention resorts to a general Bayesian network to calculate the most probable rating at the root. Thus, a Bayesian inference network is constructed out of the recommendation propagation tree.
Each user takes its next-hop node on the shortest path to the root as its only parent in the Bayesian inference network. In other words, we assume that a user's rating is independent of the ratings of non-descendant users in the propagation tree conditional on its parent's rating. Using the constructed Bayesian inference network, one can calculate the conditional probability $P(R_S = s \mid R_{L_1} = r_1, \ldots, R_{L_k} = r_k)$ of the root's rating conditional on all recommenders' ratings.
For any node $m$ in the propagation tree, we define $C_m$ as the set of its direct children. We further define $D_m$ as the set of recommenders in the subtree rooted at $m$:

$$D_m = \{\,L_i \in L^* : L_i \text{ is in the subtree rooted at } m\,\}.$$
Recommendations generated by users in $D_m$ are relayed by node $m$ to the root. Obviously, we have $D_S = L^*$. Due to the query forwarding protocol, if $m$ itself is a recommender, it does not forward the query to its friends. Consequently, it does not relay recommendations generated by other recommenders, i.e., $D_m = \{m\}$ if $m \in L^*$.
We define the probabilistic event that recommender $L_i$ gives rating $r_i$ as

$$\psi_i \triangleq \{R_{L_i} = r_i\}.$$

Then the event of the joint ratings of all recommenders under $m$ can be composed as

$$\Psi_m \triangleq \bigcap_{L_i \in D_m} \psi_i = \{R_{L_i} = r_i,\ \forall L_i \in D_m\}.$$
Our goal is to estimate $P(R_S = s \mid \Psi_S)$. Following the Bayesian rule, we have

$$P(R_S = s \mid \Psi_S) = \frac{P(\Psi_S \mid R_S = s)\,P(R_S = s)}{\sum_{j=1}^{N} P(\Psi_S \mid R_S = j)\,P(R_S = j)}.$$
It is therefore sufficient to calculate $P(\Psi_S \mid R_S = s)$. By definition, we have $D_S = \bigcup_{c \in C_S} D_c$, and hence

$$P(\Psi_S \mid R_S = s) = P\Big(\bigcap_{c \in C_S} \Psi_c \,\Big|\, R_S = s\Big) = \prod_{c \in C_S} P(\Psi_c \mid R_S = s),$$
where the last equality is established by the conditional independence in the Bayesian network. If child $c$ is a recommender, i.e., $c \in L^*$, then $D_c = \{c\}$ and $P(\Psi_c \mid R_S = s) = P(R_c = r_c \mid R_S = s)$, which can be directly obtained from the conditional probability distribution between $c$ and its parent $S$. If $c$ is a relay node, i.e., $c \notin L^*$, then we have

$$P(\Psi_c \mid R_S = s) = \sum_{i=1}^{N} P(\Psi_c \mid R_c = i)\,P(R_c = i \mid R_S = s),$$

where the last equality is established by the independence, in the Bayesian network, of the ratings of $c$'s descendants from $S$ conditional on $c$'s rating. The last term, $P(R_c = i \mid R_S = s)$, is readily available from the marginal conditional distributions between $c$ and $S$. The first term, $P(\Psi_c \mid R_c = i)$, can be recursively calculated following the same process by going up one level in the Bayesian network.
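For concreteness, the following is a minimal sketch of this recursive computation, written as if the full propagation tree and all pairwise conditional distributions were available in one place (in the actual protocol the same quantities are computed in a distributed fashion, with each relay node reporting $P(\Psi_c \mid R_c = i)$ to its parent); the data structures and names are hypothetical.

```python
import numpy as np

N = 5  # highest rating (hypothetical 5-star scale)

def likelihood_given_parent(node, parent, tree, cond, ratings):
    """Return the length-N vector [P(Psi_node | R_parent = s)] for s = 1..N.

    tree    : dict node -> list of children in the propagation tree
    cond    : dict (child, parent) -> N x N array, [i-1, s-1] = P(R_child = i | R_parent = s)
    ratings : dict recommender -> observed rating in 1..N
    """
    c_given_p = cond[(node, parent)]                 # P(R_node = i | R_parent = s)
    if node in ratings:                              # node is a recommender: D_node = {node}
        return c_given_p[ratings[node] - 1, :]
    # Relay node: P(Psi_node | R_node = i) = product over children, then marginalize R_node.
    psi_given_own = np.ones(N)
    for child in tree.get(node, []):
        psi_given_own *= likelihood_given_parent(child, node, tree, cond, ratings)
    return c_given_p.T @ psi_given_own               # sum_i P(Psi|R_node=i) P(R_node=i|R_parent=s)

def root_posterior(root, prior_root, tree, cond, ratings):
    """P(R_root = s | Psi_root) over s = 1..N via the recursion above."""
    lik = np.ones(N)
    for child in tree.get(root, []):
        lik *= likelihood_given_parent(child, root, tree, cond, ratings)
    post = lik * np.asarray(prior_root, dtype=float)
    return post / post.sum()
```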
According to an embodiment of the invention, two recommendation metrics are employed: the most probable (MP) recommendation, $\hat{S}_{MP}$, and the Bayes mean square error (MSE) estimator, $\hat{S}_{MSE}$. Specifically,

$$\hat{S}_{MP} = \arg\max_{s \in \{1, \ldots, N\}} P(R_S = s \mid \Psi_S), \qquad \hat{S}_{MSE} = \sum_{s=1}^{N} s\,P(R_S = s \mid \Psi_S),$$

where $N$ is the highest rating. The Bayes MSE estimator is also the minimum mean square error (MMSE) estimator.
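Continuing the hypothetical sketch above, the two metrics could be computed from the posterior vector as follows.

```python
import numpy as np

N = 5  # highest rating, as in the sketch above

def mp_rating(posterior):
    """Most probable (MP) recommendation: the rating s maximizing P(R_S = s | Psi_S)."""
    return int(np.argmax(posterior)) + 1

def mmse_rating(posterior):
    """Bayes MSE (MMSE) recommendation: the posterior mean sum_s s * P(R_S = s | Psi_S)."""
    return float(np.dot(np.arange(1, N + 1), posterior))
```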
According to an embodiment of the invention, the recommendation social network agent comprises three key components: a user interface, a learning module, and a recommendation engine.
Below is a description of an implementation of the learning module and the protocol employed in the recommendation engine, according to an embodiment of the invention.
Individual users are required to estimate their own recommendation rating probability distribution, and their direct friends' recommendation rating probability distributions conditioned on their own ratings. Accordingly, the learning module maintains a database that stores the following counters: $\{n_i\}$ and $\{n^T_{j,i}\}$. Here $n_i$ is the number of occurrences of rating $i$ made by the local user, and $n^T_{j,i}$ is the number of occurrences of rating $j$ made by friend $T$ while the local user gives rating $i$, where $i, j = 1, 2, \ldots, N$, and $N$ is the highest rating. A frequency based approach is then used to estimate the distributions: $P(i) = n_i / \sum_{i=1}^{N} n_i$, and $P_T(j \mid i) = n^T_{j,i} / \sum_{j=1}^{N} n^T_{j,i}$. Note that the frequency based estimate is the maximum likelihood estimate (MLE).
How these counters are updated is described next. All counters are initialized to zero. When user $S$ makes a recommendation of {movie_id, s} through the user-recommendation engine interface, the learning module stores the recommendation pair in the recommendation table, and the counter $n_s$ is increased by one. The learning module then sends {S, movie_id, s} to all direct friends. Assume friend $T$ receives the message. $T$'s learning module performs a lookup in its recommendation table and determines whether $T$ has already rated the same movie. If $T$ never rated the movie, the message is discarded without further action. If $T$ rated the movie with rating $t$, the learning module of user $T$ increases the counter $n^S_{s,t}$ by one. In addition, $T$ sends back a message of {T, movie_id, t} to user $S$. The purpose of this message is to allow user $S$ to update its conditional counter accordingly.
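A minimal, illustrative sketch of such a learning module is given below, assuming per-friend counter tables and the message format described above; class and method names are hypothetical, and the transport (how messages reach friends) is left abstract.

```python
from collections import defaultdict

N = 5  # highest rating (hypothetical 5-star scale)

class LearningModule:
    def __init__(self):
        self.my_ratings = {}                                  # movie_id -> own rating
        self.n = defaultdict(int)                             # n[i]: occurrences of own rating i
        self.n_cond = defaultdict(lambda: defaultdict(int))   # n_cond[T][(j, i)]: T rated j while we rated i

    def on_local_rating(self, movie_id, s):
        """Local user rates a movie: store it, bump n_s, and build the message for direct friends."""
        self.my_ratings[movie_id] = s
        self.n[s] += 1
        return ("RATING", movie_id, s)

    def on_friend_rating(self, friend_id, movie_id, j):
        """A friend reports rating j for movie_id; update the conditional counter if we rated it too."""
        if movie_id in self.my_ratings:
            i = self.my_ratings[movie_id]
            self.n_cond[friend_id][(j, i)] += 1

    def p_own(self, i):
        """MLE of P(own rating = i)."""
        total = sum(self.n.values())
        return self.n[i] / total if total else 0.0

    def p_friend_given_own(self, friend_id, j, i):
        """MLE of P(friend rates j | own rating = i)."""
        row = sum(self.n_cond[friend_id][(jj, i)] for jj in range(1, N + 1))
        return self.n_cond[friend_id][(j, i)] / row if row else 0.0
```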
Inference Recommendation by Querying the Social Network

Assume user S queries the recommendation engine. The recommendation engine sends query messages to S's neighbors over the social network. A query message is a 3-tuple of [sequence-id, movie-id, TTL]. The sequence-id is a unique id identifying the query message. For instance, the id can be created by hashing the movie-id together with the querying user's own social network id. The movie-id identifies the movie of interest for which the recommendation is sought. The TTL, or time-to-live, defines the search scope of the query. If the query message reaches the TTL (i.e., the TTL times out), the query message is dropped without further relaying.
Upon receiving a query message, a user first checks whether he/she has already rated the movie of interest. If so, the movie rating is returned to the user from which the query message was received, and the query message is dropped without further forwarding. If the receiving user has not rated the movie, and the number of times the query message has been forwarded is less than the TTL, the query message is forwarded to the receiving user's neighbors, except the one from which the query message was received. Finally, if the query message reaches the TTL, it is dropped, and the receiving user sends a DROP message back to the user from which the query was received, indicating that the query has been dropped without a recommendation.
The social network may have loops, and a user may receive the same query message from different neighbors. To simplify the inference, the receiving user only responds to the first copy of the query message and sends DROP messages to all other sending users.
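The query-handling rules above could be sketched as follows. This is an illustrative outline only, with hypothetical message names (QUERY, RATING, DROP), an abstract send() primitive, and the assumption that each node object exposes seen_queries, my_ratings, and neighbors.

```python
def handle_query(node, query, sender, send):
    """Process an incoming query message at `node`.

    query  : dict with keys 'seq_id', 'movie_id', 'ttl'
    sender : neighbor the query arrived from
    send   : callable send(dest, message) supplied by the transport layer
    """
    seq_id, movie_id, ttl = query["seq_id"], query["movie_id"], query["ttl"]

    # Duplicate suppression: only the first copy of a query is answered.
    if seq_id in node.seen_queries:
        send(sender, {"type": "DROP", "seq_id": seq_id})
        return
    node.seen_queries.add(seq_id)

    # If the node has rated the movie, return the rating and stop forwarding.
    if movie_id in node.my_ratings:
        send(sender, {"type": "RATING", "seq_id": seq_id,
                      "rating": node.my_ratings[movie_id]})
        return

    # TTL exhausted: report back that this branch yielded no recommendation.
    if ttl <= 0:
        send(sender, {"type": "DROP", "seq_id": seq_id})
        return

    # Otherwise forward to all neighbors except the sender and await their responses.
    for neighbor in node.neighbors:
        if neighbor != sender:
            send(neighbor, {"type": "QUERY", "seq_id": seq_id,
                            "movie_id": movie_id, "ttl": ttl - 1})
```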
If a user is an intermediate user, he/she waits for responses after forwarding the query message to his/her direct friends. One response shall be received from each friend to which the query message has been forwarded. A response falls into one of three categories: a DROP message, a recommendation rating given by the friend, or the conditional probability distributions of the recommendation ratings given by the recommenders rooted at the friend, conditioned on the N possible ratings of the friend. Once the user has received all responses, he/she constructs his/her own response, namely the conditional probability distributions of all received recommendation ratings conditioned on his/her own N possible ratings, using the algorithm described in the above discussion of the Bayesian inference network. The response is then delivered to the user from which the query message was received. Finally, after user S has received all responses, he/she computes the personalized recommendation rating as described above.
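To make the relay step concrete, the sketch below shows how an intermediate user might combine the three kinds of responses into the vector $[P(\Psi_m \mid R_m = i)]_{i=1}^{N}$ that is passed toward the root. It reuses the hypothetical N-by-N conditional tables of the earlier sketches and is not the only possible message layout.

```python
import numpy as np

N = 5  # highest rating (hypothetical 5-star scale)

def build_relay_response(responses, cond_child_given_self):
    """Combine children's responses into P(Psi_m | R_m = i), i = 1..N.

    responses             : dict child -> ('DROP', None) | ('RATING', r) | ('DIST', length-N vector),
                            where a 'DIST' vector holds P(Psi_child | R_child = i).
    cond_child_given_self : dict child -> N x N array, [j-1, i-1] = P(R_child = j | R_m = i)
    Returns None if every child reported DROP (nothing to relay).
    """
    combined = np.ones(N)
    useful = False
    for child, (kind, payload) in responses.items():
        if kind == "DROP":
            continue
        useful = True
        c_given_m = cond_child_given_self[child]
        if kind == "RATING":                       # child is a recommender
            combined *= c_given_m[payload - 1, :]  # P(R_child = r | R_m = i)
        else:                                      # child relayed a conditional distribution
            combined *= c_given_m.T @ payload      # sum_j P(Psi_child|R_child=j) P(R_child=j|R_m=i)
    return combined if useful else None
```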
Coping with Cold Start and Rating Sparseness Using Maximum a Posteriori (MAP) Estimation
Learning modules use common ratings to construct probability distributions, and a frequency based approach is used in the probability estimation. The frequency based approach can be shown to be the maximum likelihood estimator (MLE). The estimation may be inaccurate, however, when the number of common ratings is small. In addition, a newly joined user has not made any ratings, so friends cannot construct conditional probabilities against the new user. These so-called rating sparseness and cold start issues have long challenged collaborative filtering (CF) based recommendation approaches.
Social networks offer a new venue that allows users to help tackle the above issues. In the context of a social network, a new user is likely to know his/her friends and to have general feelings/opinions about them. A new user can therefore give a guideline on what the conditional probability distribution should look like. For instance, the new user may indicate to the recommendation engine that with a likelihood of eight (on a scale from one to ten), his rating is the same as a neighbor's, with a likelihood of five the rating is probably off by one, and so on. With this information, the recommendation engine can construct informative prior distributions for a new user's neighbors. Similarly, the neighbors of a new user can apply the same principle and construct informative prior distributions for the new user. The prior distribution attributes uncertainty to the probability distribution. As actual recommendation ratings accumulate over time, the prior distribution is rectified by the actual ratings to reflect the true similarity between two users. Below we describe a MAP based probability estimation.
Let $p = [p_1, p_2, \ldots, p_n]$ be an n-state discrete probability density function (p.d.f.) and let $x = \{x_i\},\ i = 1, 2, \ldots, n$, be the number of observed samples for each state. If $g(p)$ is the prior distribution of $p$, the MAP estimate, $p_{MAP}$, is defined as the distribution that maximizes the posterior p.d.f. of $p$:

$$p_{MAP} = \arg\max_{p} f(x \mid p)\,g(p),$$

where $f(x \mid p)$ is the probability of $x$ given $p$.
We select the Dirichlet distribution as the prior distribution. The Dirichlet distribution is conjugate to the discrete probability distribution, which simplifies the derivation. The Dirichlet distribution is defined as

$$g(p) = \frac{1}{Z} \prod_{i=1}^{n} p_i^{\alpha_i - 1},$$

where $\alpha_i > 0$, $p_i > 0$, $\sum_{i=1}^{n-1} p_i < 1$, $p_n = 1 - \sum_{i=1}^{n-1} p_i$, and $Z$ is the normalization constant $Z = \prod_{i=1}^{n} \Gamma(\alpha_i) / \Gamma\big(\sum_{i=1}^{n} \alpha_i\big)$ ($\Gamma(\cdot)$ is the gamma function). The Dirichlet distribution defines the distribution of $p$ given the parameters $\{\alpha_i\}$. Therefore, the MAP estimate becomes

$$p_i^{MAP} = \frac{x_i + \alpha_i - 1}{\sum_{j=1}^{n} x_j + \sum_{j=1}^{n} (\alpha_j - 1)}, \quad i = 1, 2, \ldots, n.$$
Interestingly, at the beginning, with no observations/samples, the MAP estimate reduces to

$$p_i^{MAP} = \frac{\alpha_i - 1}{\sum_{j=1}^{n} (\alpha_j - 1)},$$

which is determined entirely by the prior distribution.
Parameters $\{\alpha_i\}$ shall be set based on users' inputs. The value of $\sum_{i=1}^{n} (\alpha_i - 1)$ plays an important role in determining how fast the impact of the prior distribution diminishes as the number of available samples increases.
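The MAP update above could be implemented as in the following sketch, which contrasts it with the plain MLE; the confidence-derived pseudo-counts are hypothetical example values (and are all at least 1, so the posterior mode is interior).

```python
import numpy as np

def map_estimate(counts, alpha):
    """MAP estimate of a discrete distribution under a Dirichlet(alpha) prior.

    counts : length-n array of observed sample counts x_i
    alpha  : length-n array of Dirichlet parameters alpha_i (assumed >= 1 here)
    """
    counts = np.asarray(counts, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    num = counts + alpha - 1.0
    return num / num.sum()

def mle_estimate(counts):
    """Plain frequency (maximum likelihood) estimate, for comparison."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum() if counts.sum() else counts

# Hypothetical example: a prior concentrated on states 3-5, then a few observations.
alpha = np.array([1.5, 2.0, 4.0, 5.0, 3.5])
print(np.round(map_estimate([0, 0, 0, 0, 0], alpha), 3))  # prior-only estimate (cold start)
print(np.round(map_estimate([0, 1, 6, 3, 1], alpha), 3))  # prior rectified by actual samples
```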
The parameters of the prior Dirichlet distribution have to be set in such a way that: (i) they reflect the confidence level of a user toward a neighbor in terms of the similarity of their taste/opinion towards movies; and (ii) the impact of the prior distribution diminishes as more samples are collected.
The confidence level of a user towards his/her neighbors can be collected using a simple GUI interface as depicted in the accompanying figure.
Now we describe how to set the Dirichlet parameters. Suppose user X has a neighbor, user Y. User X's and user Y's recommendation ratings are denoted by x and y, respectively. Let $c_i$ denote the confidence level expressed by user X that user Y's rating is offset from his/her own by $i$ stars. User X sets up the parameters of the N conditional probability distributions $p(y = h \mid x = l)$, $l, h = 1, 2, \ldots, N$, as follows: the probability mass assigned to $h = l$ is proportional to $c_0$, the mass assigned to $h = l \pm i$ (for $i \geq 1$) is proportional to $c_i / 2$, and the resulting values are normalized so that $\sum_{h=1}^{N} p(y = h \mid x = l) = 1$. Except for $c_0$, the confidence levels are divided by two in computing the probabilities because an offset of $i$ stars can either be greater than the conditional value $l$ by $i$ stars or smaller than the conditional value $l$ by $i$ stars.
The parameters of the prior Dirichlet distribution for $p(y \mid x = l)$ are denoted by $\{\alpha^l_h,\ h = 1, 2, \ldots, N\}$ and are set as $\alpha^l_h = K \cdot p(y = h \mid x = l) + 1$, where $K = \sum_{h=1}^{N} (\alpha^l_h - 1)$. The value of $K$ is selected to properly reflect the influence of the prior distribution relative to the actual samples.
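A possible way to turn the GUI confidence levels into Dirichlet parameters is sketched below, under the interpretation given above ($c_0$ for the same rating, $c_i/2$ for an offset of $i$ stars on either side); the scaling constant K and the example confidence values are hypothetical.

```python
import numpy as np

N = 5  # hypothetical 5-star scale

def prior_conditional(conf, l):
    """p(y = h | x = l), h = 1..N, built from confidence levels conf = [c_0, c_1, ...]."""
    weights = np.zeros(N)
    for h in range(1, N + 1):
        offset = abs(h - l)
        c = conf[offset] if offset < len(conf) else 0.0
        weights[h - 1] = c if offset == 0 else c / 2.0  # split c_i over the two directions
    return weights / weights.sum()

def dirichlet_params(conf, l, K=10.0):
    """alpha_h^l = K * p(y = h | x = l) + 1; K sets the weight of the prior vs. actual samples."""
    return K * prior_conditional(conf, l) + 1.0

# Hypothetical confidence input: 8 for same rating, 5 for off-by-one, 2 for off-by-two, etc.
conf = [8, 5, 2, 1, 1]
print(np.round(prior_conditional(conf, l=4), 3))
print(np.round(dirichlet_params(conf, l=4), 3))
```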
For a performance evaluation of the method of the present invention, we select the MovieLens dataset, with about one million ratings for 3,900 movies by 6,040 users. The same dataset has been used in other studies, and the numbers of ratings, movies, and users are manageable yet rich enough to evaluate our algorithm. The ratings are on a numerical five-star scale, where one and two stars represent negative ratings, three stars represents an ambivalent rating, and four and five stars represent positive ratings. Unless indicated otherwise, the dataset is divided into two parts: the first 70% of the user ratings is used for training purposes, i.e., to construct the social network and to estimate the rating probability and conditional probability distributions at individual users. The remaining 30% of the data is used for testing/evaluation.
An artificial social network is constructed using the training data set. Sociology and psychology studies show that similar users have a tendency to associate with each other. We use the Pearson correlation coefficient to measure two users' similarity. The Pearson correlation coefficient between user $u$ and user $w$ is:

$$\rho_{u,w} = \frac{\sum_{i \in I_C} (r_{u,i} - \bar{r}_u)(r_{w,i} - \bar{r}_w)}{\sqrt{\sum_{i \in I_C} (r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in I_C} (r_{w,i} - \bar{r}_w)^2}},$$
where $I_C$ is the set of commonly rated items, $r_{u,i}$ is user $u$'s rating of item $i$, and $\bar{r}_u$ is user $u$'s average rating. Each user chooses ten users as his/her direct friends. These ten users are selected, from a pool of 500 users randomly drawn from the entire user population, as the ones with the highest Pearson correlation coefficients. Since a user may also be selected by other users as a friend, the average number of friends, i.e., the degree of a user in the social network, is twenty. In addition, if the number of common ratings between two users is less than 20, these two users are not made friends, in order to avoid randomness.
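A sketch of this network-construction step is shown below; it assumes ratings are held in per-user dictionaries and uses hypothetical parameter names for the pool size, friend count, and common-rating threshold.

```python
import math
import random

def pearson(ratings_u, ratings_w):
    """Pearson correlation between two users over their commonly rated items."""
    common = set(ratings_u) & set(ratings_w)
    if len(common) < 2:
        return 0.0
    mu_u = sum(ratings_u[i] for i in common) / len(common)
    mu_w = sum(ratings_w[i] for i in common) / len(common)
    num = sum((ratings_u[i] - mu_u) * (ratings_w[i] - mu_w) for i in common)
    den = math.sqrt(sum((ratings_u[i] - mu_u) ** 2 for i in common)) * \
          math.sqrt(sum((ratings_w[i] - mu_w) ** 2 for i in common))
    return num / den if den else 0.0

def pick_friends(user, all_ratings, pool_size=500, n_friends=10, min_common=20):
    """Select the most similar users from a random pool as direct friends."""
    others = [w for w in all_ratings if w != user]
    pool = random.sample(others, min(pool_size, len(others)))
    candidates = []
    for w in pool:
        common = set(all_ratings[user]) & set(all_ratings[w])
        if len(common) >= min_common:  # require enough overlap to avoid spurious similarity
            candidates.append((pearson(all_ratings[user], all_ratings[w]), w))
    candidates.sort(key=lambda t: t[0], reverse=True)
    return [w for _, w in candidates[:n_friends]]
```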
In the following experiments we use the most probable (MP) rating as the recommendation rating, and the mean absolute error (MAE) to measure the recommendation accuracy. To avoid the computational error introduced by zero probabilities, we smooth the distributions by adding a small probability to every discrete probability value. With the Most Probable (MP) recommendation described above, each recommendation has an associated probability. We set a probability threshold, and only the recommendations whose probability is greater than the threshold are presented to the querying user; a recommendation with low probability is not trustworthy. As shown in the following experiment, the MAE of the recommendations improves as the probability threshold increases.
In this experiment, we also look into the impact of dynamic probability learning. Specifically, the first 70% of the data set is used to train the probability distributions. Once the simulation of the remaining 30% testing data set starts, we allow the users to continuously update the probability distributions.
Hybrid recommendation searching scheme: Note that the recommendation quality improves as the searching scope decreases. We thus propose a hybrid recommendation searching scheme. The hybrid recommendation searching scheme looks for the closest recommenders possible, and increases the searching scope only if no closer recommenders can be found. One simple way to implement this scheme is to start with TTL=1, and to increase the TTL by one only if no recommendation can be found in the previous round. The query process stops once a personalized recommendation is found.
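The expanding-ring behaviour of the hybrid scheme could be realized roughly as follows; `issue_query` is a hypothetical helper that runs one query round at a given TTL and returns the resulting recommendation, or None if every branch reported DROP.

```python
def hybrid_search(issue_query, movie_id, max_ttl=6):
    """Expanding-ring query: start at TTL = 1 and widen the scope only when needed."""
    for ttl in range(1, max_ttl + 1):
        recommendation = issue_query(movie_id, ttl)  # one query round over the social network
        if recommendation is not None:
            return recommendation, ttl  # stop at the closest recommenders that answered
    return None, max_ttl  # no recommendation found within the maximum search scope
```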
Comparison with Collaborative Filtering
There are many CF algorithms. We use the user-to-user nearest neighbor prediction algorithm based on the Pearson correlation. The recommendation is based on the N nearest neighbors, in terms of Pearson correlation, from the entire user population. The number of recommendations received is shown in the accompanying figure.
The forwarding of the QI's query message (910) is performed for a predetermined period of time defined by the TTL, and a determination is made at step 912 whether the TTL has timed out. If the TTL has timed out, the neighbor sends the QI a DROP message without a recommendation (914). If the TTL has not timed out, the receiving users (i.e., the QI's neighbors) send to the QI: 1) recommendation ratings from friends of the neighbor; and 2) the conditional probability distribution of the recommendation ratings given by recommenders rooted at the users' (i.e., the QI's neighbors') friends (916). A determination is then made whether or not the QI has received all the responses (918). If not, the system loops to wait for all responses. If all responses have been received, the QI's recommendation query engine constructs its own rating response using the Bayesian inference network (920). Once calculated, the recommendation query engine delivers the rating response to the QI (922).
These and other features and advantages of the present principles may be readily ascertained by one of ordinary skill in the pertinent art based on the teachings herein. It is to be understood that the teachings of the present principles may be implemented in various forms of hardware, software, firmware, special purpose processors, or combinations thereof.
Most preferably, the teachings of the present principles are implemented as a combination of hardware and software. Moreover, the software may be implemented as an application program tangibly embodied on a program storage unit. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units (“CPU”), a random access memory (“RAM”), and input/output (“I/O”) interfaces. The computer platform may also include an operating system and microinstruction code. The various processes and functions described herein may be either part of the microinstruction code or part of the application program, or any combination thereof, which may be executed by a CPU. In addition, various other peripheral units may be connected to the computer platform such as an additional data storage unit and a printing unit.
It is to be further understood that, because some of the constituent system components and methods depicted in the accompanying drawings are preferably implemented in software, the actual connections between the system components or the process function blocks may differ depending upon the manner in which the present principles are programmed. Given the teachings herein, one of ordinary skill in the pertinent art will be able to contemplate these and similar implementations or configurations of the present principles.
Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the present principles are not limited to those precise embodiments, and that various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope or spirit of the present principles. All such changes and modifications are intended to be included within the scope of the present principles as set forth in the appended claims.
This application claims the benefit of U.S. Provisional Application Ser. No. 61/343,158 filed Apr. 23, 2010, which is incorporated by reference herein in its entirety.