This invention relates generally to techniques for networking people around the content that they jointly value by first analyzing and leveraging pre-existing curated collections of links to documents, such as the favorite photos of individuals on photo sites, the citations within documents on medical journal sites, or the links in any other hypermedia database.
Document: is used in the general sense and may refer to any sort of information content including a photo, video, text or person identity.
Like: is used in the general sense and refers to a positive assessment.
Dislike: is used in the general sense and refers to a negative assessment.
Curation: refers to a collection of links to documents. Examples include favorite photos, favorite photos within a particular category, and the citations within a document, as these represent a set of curated documents centered around the subject matter of the document containing the citations.
What enables Google's PageRank to rank documents by relevance is the fact that documents are networked by hyperlinks that signal relevance and trust. However, PageRank has two limitations that relate to this invention. First, most new content does not contain hyperlinks (e.g., images, music and video), so there is no hyperlink-based network to leverage. Second, PageRank is most effective at producing a global rank vector that is generally independent of any individual's personal tastes. While PageRank is very effective for globally accepted objective information, such as “what is a polar vortex”, it is ineffective in the subjective space, such as “what is the best treatment for my form of breast cancer”, because that answer is subject to the various competing points of view of the researchers in the evolving field of breast cancer.
For this reason, Facebook is now the main solution people use to discover more personalized content. It works because a person's friends and associates are more likely to share their interests, so they can all pool their efforts in mining the Web for content they find more personally appealing. However, a person's social connections are only loosely related to that person's interests. Thus, a Facebook user's network feed tends to be noisy and dominated by content that is trending within their social circles. Further, a social network is ineffective when one's interest is in exploring the views of alternative and competing thought leaders in a particular space.
One aspect of this invention is taking advantage of the prevalence of curated links to documents of interest on the internet to network people around the content and to allow these people to explore subject areas from the points of view of various and competing thought leaders. The method is independent of document content, so it applies to all media types. Further, the method does not depend on any sort of metadata, such as tags or keywords, to define a curated area of interest. Curated areas of interest are simply defined by the links contained within them. Given that even the cited articles within a document identify a curation around the subject matter of the document, any hyperlinked database of documents can seed the discovery network. Thus, the method provides a general technique that can network individuals around even the most niche areas of interest, over the entire range of content types available on the internet, whether that content is embodied in a linked document or a media type without links (audio, images, video), making the method potentially more widely applicable than PageRank.
In contrast to social networks like Facebook, the method gives networked users direct control over the type of content that flows to them and automatically connects them with individuals around the world who share their interests or hold the points of view that the user wishes to explore. This is again in contrast to social networks, which require users to connect with people they know.
At step 101, pre-existing curated links to documents are identified. These may be the favorite videos of individuals on YouTube or the articles cited within the articles of medical journals, as each collection of cited articles is considered a curation around the subject matter of the document citing the articles.
At step 102 each of these curations is analyzed using a collaborative filtering method. Each document is associated with a related-document vector. An example primitive collaborative filtering method would be:
related-document-vector(src_id, related_id)=log ii(totalLikes(related_id))*(common_likes(src_id, related_id)/totalLikes(related_id))
Where:
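The term definitions are not reproduced here, so the following Python sketch is only one plausible reading of the example formula, not the definitive method. It assumes that totalLikes(x) is the number of curations that contain document x, that common_likes(a, b) is the number of curations that contain both a and b, and that the logarithm is the natural log. The function name `related_document_vectors` and the input layout (a dict mapping a curation id to a list of document ids) are hypothetical.

```python
import math
from collections import defaultdict
from itertools import permutations


def related_document_vectors(curations):
    """Build a ranked related-document vector for every document.

    `curations` maps a curation id to the document ids it links to
    (hypothetical input layout, not taken from the source).
    """
    total_likes = defaultdict(int)   # assumed: curations containing a document
    common_likes = defaultdict(int)  # assumed: curations containing both documents
    for docs in curations.values():
        unique = set(docs)
        for doc in unique:
            total_likes[doc] += 1
        for src, rel in permutations(unique, 2):
            common_likes[(src, rel)] += 1

    vectors = defaultdict(dict)
    for (src, rel), common in common_likes.items():
        total = total_likes[rel]
        # log(totalLikes) * (common_likes / totalLikes); natural log is assumed.
        vectors[src][rel] = math.log(total) * common / total

    # Sort each vector by descending score, breaking ties by document id.
    return {
        src: sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        for src, scores in vectors.items()
    }


example = {"u1": ["a", "b"], "u2": ["a", "b"], "u3": ["a", "c"]}
vectors = related_document_vectors(example)
```

Under these assumptions, a related document that appears in only one curation scores zero, because the logarithm of its like count is zero.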
At Step 103, the networked users of the system define one or more personal curations. Users create these to represent topical areas of interest or competing viewpoints they wish to explore. They then associate one or more of the preprocessed linked documents identified in steps 101 and 102 with each of their personal curations. Further, as users “drop” these documents into a personal curation, they also identify whether they “like” or “dislike” the document.
At Step 104, aggregate “like” and “dislike” vectors are created for each personal curation by combining the individual “like” and “dislike” related-document vectors, respectively. This needs to be done in such a way that normalizes the effect of popular linked documents relative to less-popular linked documents. For example, when aggregating all the “liked” documents for curation j, one could interleave the individual “like” related-document-vectors:
related_id=related-document-vector-at-position(src_id, position)
score=related-document-vector(src_id, related_id)*pow(E, maxInterleaves−position)
aggregate_like(related_id)=aggregate_like(related_id)+score
WHERE:
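The term definitions are not reproduced here, so the following Python sketch is one hedged interpretation of the interleaving example. It assumes that each related-document vector is a list of (related_id, score) pairs sorted by descending score, that position counts from zero, and that maxInterleaves caps how many positions are read from each vector. The function name `aggregate_vector` and its parameters are hypothetical.

```python
import math
from collections import defaultdict


def aggregate_vector(doc_ids, related_vectors, max_interleaves=10):
    """Interleave the related-document vectors of the given documents.

    `doc_ids` are the documents liked (or disliked) within one personal
    curation; `related_vectors` maps a document id to its ranked list of
    (related_id, score) pairs. Both layouts are assumptions.
    """
    aggregate = defaultdict(float)
    for position in range(max_interleaves):
        for src_id in doc_ids:
            ranked = related_vectors.get(src_id, [])
            if position >= len(ranked):
                continue  # this vector has no entry at this position
            related_id, base_score = ranked[position]
            # score = vector score * pow(E, maxInterleaves - position)
            score = base_score * math.exp(max_interleaves - position)
            aggregate[related_id] += score
    return dict(aggregate)


related = {"a": [("x", 1.0), ("y", 0.5)], "b": [("y", 2.0)]}
likes = aggregate_vector(["a", "b"], related, max_interleaves=2)
```

Reading the vectors position by position gives every source document one entry per round, and the exponential factor lets earlier positions dominate. This is one way to keep a single popular document from swamping the aggregate.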
At Step 105, the related-personal-curations vectors are created for each personal curation. Curations that are more closely related to one another will have a higher relationship score. This analysis is performed as follows:
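The specific analysis is not reproduced here. As a stand-in only, the Python sketch below scores the relationship between personal curations with cosine similarity over their aggregate “like” vectors. Cosine similarity is a substituted technique chosen for illustration and is not stated in the source. The names `cosine_similarity` and `related_curations` and the input layout are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_curations(aggregates):
    """Rank every other curation by similarity, for each curation.

    `aggregates` maps a curation id to its aggregate "like" vector,
    a dict of document id to score (assumed layout).
    """
    result = {}
    for curation_id, vector in aggregates.items():
        scored = [
            (other_id, cosine_similarity(vector, other_vector))
            for other_id, other_vector in aggregates.items()
            if other_id != curation_id
        ]
        # Higher score means more closely related; ties break by id.
        result[curation_id] = sorted(scored, key=lambda kv: (-kv[1], kv[0]))
    return result


aggregates = {
    "c1": {"x": 1.0, "y": 1.0},
    "c2": {"x": 2.0, "y": 2.0},
    "c3": {"z": 1.0},
}
related = related_curations(aggregates)
```

A fuller treatment would presumably also use the aggregate “dislike” vectors, for example to lower the score of curations whose likes match the user's dislikes. That extension is likewise an assumption.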
Once the related-personal-curations vectors are computed, various aspects of this invention may be presented to users. This is encapsulated in step 106.
At Step 106, one embodiment would be to present a realtime stream of content based on the “likes” being dropped by other users into the related curations identified in step 105. This parallels the behavior of prior-art social networks. To obtain a realtime stream, this step ranks activity occurring in time ranges that double in length at every quantum. For example, all activity within:
Because exponentially more assessments are used to rank the content as one moves down the stream, the stream's quality improves with age. That is, the most relevant documents percolate up as time progresses, naturally preserving the most valuable information over time.
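The doubling time ranges can be sketched in Python as follows. The exact ranges are not listed here, so this sketch assumes windows of length q, 2q, 4q and so on, measured back from the present, where q is the quantum. It also assumes the events have already been limited to “likes” dropped into related curations, and that documents within a window are ranked by like count. The names `bucket_index` and `build_stream` are hypothetical.

```python
from collections import Counter


def bucket_index(age, quantum):
    """Return the window an event of the given age falls into.

    Window 0 covers [0, q), window 1 covers [q, 3q), window 2 covers
    [3q, 7q), so each window is twice as long as the one before it.
    """
    index, length, end = 0, quantum, quantum
    while age >= end:
        length *= 2
        end += length
        index += 1
    return index


def build_stream(events, now, quantum=3600.0):
    """Group (timestamp, doc_id) like events into doubling windows.

    Returns a list of (window index, ranked doc ids), newest window first.
    """
    buckets = {}
    for timestamp, doc_id in events:
        age = now - timestamp
        if age < 0:
            continue  # ignore events stamped in the future
        counter = buckets.setdefault(bucket_index(age, quantum), Counter())
        counter[doc_id] += 1

    stream = []
    for index in sorted(buckets):
        ranked = sorted(buckets[index].items(), key=lambda kv: (-kv[1], kv[0]))
        stream.append((index, [doc for doc, _ in ranked]))
    return stream


events = [(9900, "a"), (9500, "b"), (9600, "b"), (8500, "c"), (5000, "d")]
stream = build_stream(events, now=10000, quantum=1000)
```

Because older windows span more time, each one typically ranks its documents using more assessments than the window before it.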
This application claims priority from U.S. Provisional Patent Application No. 62/292,869, filed Feb. 9, 2016.
Number | Date | Country | |
---|---|---|---|
62292869 | Feb 2016 | US |