The present invention relates generally to the fields of data processing and information technology. More specifically, embodiments of the present invention relate to a service for selecting and propagating content and/or metadata to client devices, with applications including selecting and propagating user created content via the World Wide Web (WWW).
With advances in computing, networking and related technologies, more and more computing devices are networked together, with more and more content available to the networked computing users. For example, billions of content pages/objects are available on the WWW for Internet users. However, publication and propagation of content in a relevant manner, that is, publishing and propagating content to those who would be interested, remains a challenge.
For example, social networks on the Internet have become very popular in recent years. Social networks typically consist of two main elements: 1) users; and 2) the content within the network, such as home pages and images, that the users come to the network to view. For a network to become successful, it must attract users who will both produce and consume content. In the social networks that exist today, content is typically produced (i.e. published) by users using a traditional publishing approach. That is, when a user has something he or she decides to share, the user uses the social network system to create (publish) the content—for example by writing a blog entry, by uploading an image, or by rearranging his or her home page. This set of explicit actions lets a user construct a representation, available for others to view, of his or her personality and interests, or persona. This approach allows for the display of a breadth of content, but it requires users to actively update their content in order to maintain the interest of viewers. Because updating content is labor-intensive for the publisher, sites typically have a very large difference between the number of people viewing and the number of people creating content, sometimes as much as 100:1. This means that the social network system must attract a very large number of people in order to have enough actively changing content to generate repeat traffic. Typically such social network systems have a large number of publishers who create an initial page and then rarely or never update it. Likewise, the abandonment rate of viewers is also often high. Viewers must be dedicated in order to find new and interesting content. Thus, increased automation in content publication and propagation in a relevant manner would be desirable.
There are a number of websites, most notably Amazon and Netflix, as well as startups such as Findory, that provide recommendation systems. These look at historical purchases people have made, or content they have viewed, and from them construct suggestions for additional purchases or information. These systems often use a cosine similarity algorithm.
For the distribution of user created content, e.g. in the context of a social network, the simple approach of using a cosine similarity algorithm does not work well. The distribution of user created content involves a large number of discrete content items, little of which actually gets purchased, much of which is not catalogued in detail, and much of which is not viewed frequently.
Embodiments of the present invention will be described by way of exemplary embodiments, but not limitations, illustrated in the accompanying drawings in which like references denote similar elements, and in which:
Illustrative embodiments of the present invention include, but are not limited to, methods and apparatuses for receiving from client devices automatically collected user activities associated data, and for selecting and propagating content and/or metadata back to the client devices in a more efficient, flexible and effective (with high relevancy) manner. The methods and apparatuses have particular application to selection and propagation of relevant user created content in a social network.
Various aspects of the illustrative embodiments will be described using terms commonly employed by those skilled in the art to convey the substance of their work to others skilled in the art. However, it will be apparent to those skilled in the art that alternate embodiments may be practiced with only some of the described aspects. For purposes of explanation, specific numbers, materials, and configurations are set forth in order to provide a thorough understanding of the illustrative embodiments. However, it will be apparent to one skilled in the art that alternate embodiments may be practiced without the specific details. In other instances, well-known features are omitted or simplified in order not to obscure the illustrative embodiments.
Further, various operations will be described as multiple discrete operations, in turn, in a manner that is most helpful in understanding the illustrative embodiments; however, the order of description should not be construed as to imply that these operations are necessarily order dependent. In particular, these operations need not be performed in the order of presentation.
The phrase “in one embodiment” is used repeatedly. The phrase generally does not refer to the same embodiment; however, it may. The terms “comprising,” “having,” and “including” are synonymous, unless the context dictates otherwise. The phrase “A/B” means “A or B”. The phrase “A and/or B” means “(A), (B), or (A and B)”. The phrase “at least one of A, B and C” means “(A), (B), (C), (A and B), (A and C), (B and C) or (A, B and C)”. The phrase “(A) B” means “(B) or (A B)”, that is, A is optional.
Content/metadata selection and propagation service 104 may be implemented on a single central computer or a collection of servers, e.g. a cluster of locally networked servers, or a system of distributed servers coupled via one or more local/wide area networks. The various networks may comprise wired or wireless segments/domains.
The term “content/metadata” as used herein means content and/or metadata. Content may be commercial or non-commercial in nature, may be public or private, and may be text, graphics, video, audio or multi-media in form. Metadata may be a wide range of data describing technical and/or substantive attributes of the content. Accordingly, each of the content/metadata providers may be any one of a wide range of such providers, including but not limited to a commercial or non-commercial website, a video and/or audio service, and so forth.
For the illustrated embodiments, each client device 102 may be endowed with at least a client data collection and management service 112, a client content/metadata selection and propagation service 114 and a client content presentation service 116. Services 112 and 114 may be configured complementarily to services 122 and 124. Various implementations of services 112, 114 and 116 are the subject matter of a co-pending application entitled “Automated User Activity Associated Data Collection and Reporting for Content/Metadata Selection and Propagation Service”, having common inventorship with the subject application, and contemporaneously filed (application number to be assigned). For further details of services 112-116, readers are referred to the co-pending application.
Each of client devices 102 may be any one of a broad range of computing or processor based devices known in the art or to be developed, including but not limited to, desktop computers, notebook computers, palm-sized hand-held computing devices, personal digital assistants, smart phones, game consoles, set top boxes, and so forth.
Network 106 may comprise one or more wired and/or wireless, local and/or wide area networks.
Referring now to
Content message generation service 202 is configured to generate messages comprising content and/or metadata 208 for selection and propagation to the various client devices. Core pattern matching service 204 is configured to perform pattern detection for client devices 102, discerning patterns from reported user activities 210 on client devices, and/or relevancy between content and the client devices.
In various embodiments, core pattern matching service 204 performs the pattern detection and relevance determination for client devices, employing a number of pattern/relevance analysis algorithms 212. Pattern/relevance analysis algorithms 212 may be any one of such analysis algorithms known in the art or to be devised. Examples of these pattern/relevance analysis algorithms 212 include but are not limited to a cosine similarity algorithm, a Bayesian network, and so forth. However, preferably the pattern/relevance analysis algorithms 212 complement each other, in that one algorithm's strength compensates, at least in part, for the weakness of another pattern/relevance analysis algorithm. For the embodiments, algorithms 212 are maintained and managed by core algorithm manager 206. In various embodiments, algorithm manager 206 also manages the algorithms to be employed for local pattern/relevance analysis on client devices 102 (see the co-pending application for details).
In various embodiments, the messages 208 are propagated to the client devices based on their relevance to the various client devices. In various embodiments, the messages 208 propagated to each client device are locally merged with messages locally generated on the particular client device 102 and presented on the client devices 102 respectively (see the co-pending application for further details).
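As an illustration of one of the algorithms named above, the following is a minimal sketch of cosine similarity computed over two users' click histories. The dictionary representation, the function name `cosine_similarity`, and the sample item identifiers are assumptions made for illustration only; the embodiments do not prescribe a particular data structure.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts.

    Keys are content identifiers, values are interaction counts.
    (Hypothetical representation, chosen only for this sketch.)
    """
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # no activity recorded, so no similarity can be inferred
    return dot / (norm_a * norm_b)


# Two users who share one clicked item out of two each.
user_a = {"item1": 1.0, "item2": 1.0}
user_b = {"item2": 1.0, "item3": 1.0}
similarity = cosine_similarity(user_a, user_b)  # approximately 0.5
```

A user with no recorded activity yields a similarity of zero here, which is one way the weakness of this algorithm on sparse data shows up and why a complementary algorithm may be useful.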
Each of these elements performs its conventional functions known in the art. In particular, system memory 304 and mass storage 306 may be employed to store a working copy and a permanent copy of the programming instructions implementing, in whole or in part, services 122 and 124 (core services), including the various components illustrated in
The permanent copy of the programming instructions may be placed into permanent storage 406 in the factory, or in the field, through, for example, a distribution medium (not shown), such as a compact disc (CD), or through communication interface 410 (from a distribution server (not shown)). That is, one or more distribution media having an implementation of the agent program may be employed to distribute the agent and to program various computing devices.
The constitution of these elements 302-312 is known, and accordingly will not be further described.
As alluded to earlier, the above-described embodiments of the present invention may be practiced to provide relevant content to client devices in a social network, including content created by users of the client devices, thus enabling the social network to propagate and present to each user of the system a set of constantly changing content that the user will likely find interesting (relevant).
In various embodiments, the relevant content service may be designed such that additional relevance algorithms may be added at any time. Each relevance algorithm is given a unique identifier. The relevant content service stores the relevance weight that each relevance algorithm provides for the content that the relevant content service surfaces, and records the resulting clickthrough rates on that content. The relevant content service then back-propagates a score to the relevance algorithms that suggested the content, weighted by their relevance score. Thus, a relevance algorithm that gave high relevance to a piece of content that was clicked on will get a large bonus.
In various embodiments, the relevant content service uses these weights as the weighting score discussed previously. As a result, relevance algorithms that are most effective for a particular user will gain increasing influence in selecting content for that user.
Additionally, the relevant content service gives a score to the overall performance of each relevance algorithm across the entire set of users, and combines that score with the per-user score to determine the actual weighting of that algorithm for that particular user. This has the value of damping out spikes that might occur due to a very short-term behavior pattern of a user. (For example, the user might click heavily on one content base and thereby give excessive weight to a particular relevance algorithm.)
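The feedback loop described above can be sketched as follows. This is a minimal illustration under stated assumptions: the class name `AlgorithmWeights`, the parameter `global_share`, and the linear blend of the overall and per-user scores are all hypothetical, since the description does not specify how the two scores are combined or normalized.

```python
from collections import defaultdict


class AlgorithmWeights:
    """Tracks per-user and overall scores for each relevance algorithm."""

    def __init__(self, global_share=0.5):
        # Assumed blend factor between overall and per-user scores.
        self.global_share = global_share
        self.per_user = defaultdict(float)  # (user_id, algorithm_id) -> score
        self.overall = defaultdict(float)   # algorithm_id -> score

    def record_click(self, user_id, relevance_by_algorithm):
        """Credit each algorithm in proportion to the relevance it gave
        to the content that the user clicked on."""
        for algorithm_id, relevance in relevance_by_algorithm.items():
            self.per_user[(user_id, algorithm_id)] += relevance
            self.overall[algorithm_id] += relevance

    def weight(self, user_id, algorithm_id):
        """Blend the overall score with the per-user score so that a
        short burst of clicks by one user is damped."""
        g = self.global_share
        return (g * self.overall[algorithm_id]
                + (1.0 - g) * self.per_user[(user_id, algorithm_id)])


weights = AlgorithmWeights(global_share=0.5)
# User u1 clicked content that "social" rated 0.75 and "cosine" rated 0.25.
weights.record_click("u1", {"social": 0.75, "cosine": 0.25})
# User u2 clicked content that only "cosine" had rated (0.5).
weights.record_click("u2", {"cosine": 0.5})
```

In this sketch an algorithm that assigned high relevance to clicked content accumulates a larger score, and a user with no history still inherits the overall score of each algorithm.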
In various embodiments, the strength is a function of explicit statements such as ‘best friend’, as well as implicit voting based upon clickthroughs or other response activity. The strength of a connection drops with distance. Thus people a user knows will have a much stronger weight than people who are known only by people that the user knows. (For example, suppose user A knows user B. User B knows user C. User C knows user D. User A doesn't know user C or D. Suppose user B and user D have clicked on the content. The combined strength would be f(1)+f(3), where f is a distance function. Here, “1” represents the distance between user A and user B, and “3” represents the distance between user A and user D. In this context, distance may also be referred to as “degree of removal”.) The function f could be any one of a number of functions with an “inversely proportional” behavior. An example of such a function is 1/n². In other words, the various embodiments assume that people in a social network have enough of a relationship that they will have some common interests or behaviors, but that this commonality drops off with distance (or degree of removal) in a non-linear fashion.
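The worked example above (users A through D, with B and D having clicked) can be sketched as follows. The adjacency-list representation of the social network, the function names, and the use of breadth-first search to find the degree of removal are assumptions made for illustration; explicit statements such as ‘best friend’ are not modeled here.

```python
from collections import deque


def distances_from(graph, start):
    """Breadth-first search returning each reachable user's degree of
    removal from `start`."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for friend in graph.get(person, ()):
            if friend not in dist:
                dist[friend] = dist[person] + 1
                queue.append(friend)
    return dist


def inverse_square(n):
    """Example distance function f(n) = 1/n^2."""
    return 1.0 / (n * n)


def combined_strength(graph, viewer, clickers, f=inverse_square):
    """Sum f(distance) over the users who clicked on the content,
    skipping the viewer and anyone not connected to the viewer."""
    dist = distances_from(graph, viewer)
    return sum(f(dist[c]) for c in clickers if dist.get(c, 0) > 0)


# A knows B, B knows C, C knows D.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
strength = combined_strength(graph, "A", ["B", "D"])  # f(1) + f(3)
```

With f(n) = 1/n², the click by the direct acquaintance B contributes 1 while the click by D, three degrees removed, contributes only 1/9, reflecting the non-linear drop-off.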
The above relationship-based approach provides one good source of information in constructing relevant content. However, the social network might not always be active, and it might not always be a good predictor. In various embodiments, the relevant content service enhances the accuracy of the prediction with a clickstream-based cosine similarities model,
In various embodiments, the relevant content service additionally looks at metadata associated with content the user has responded to, in order to select relevant content,
In various embodiments, the process of
In various embodiments, the relevant content service further employs a Bayesian system that analyzes a particular user's patterns to attempt to learn what might be useful to send them,
In various embodiments, the relevant content service may additionally inject (e.g. randomly or pseudo-randomly) a set of content that hasn't yet been clicked on, and for which there is therefore no response data about it, into the queue into a mix of locations (see e.g.
Although specific embodiments have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that a wide variety of alternate and/or equivalent implementations may be substituted for the specific embodiments shown and described, without departing from the scope of the embodiments of the present invention. This application is intended to cover any adaptations or variations of the embodiments discussed herein. Therefore, it is manifestly intended that the embodiments of the present invention be limited only by the claims and the equivalents thereof.
The present non-provisional application claims priority to provisional application No. 60/850,838, entitled Relevant Content Recommendation System, filed on Oct. 10, 2006.
Number | Date | Country
---|---|---
60850838 | Oct 2006 | US