Many types of document management systems are available for storing documents in document repositories. These document management systems include file management systems, collaboration systems, source code management systems, video library systems, electronic mail systems, voice mail systems, and so on that store documents in a document repository. Each of these systems typically allows the documents to be stored in the document repository in a hierarchical manner and allows metadata (e.g., filename and create date) to be stored along with the content of the documents. These systems provide features that are tailored to a specific application. For example, a file management system provided by an operating system provides basic features for creating, updating, and searching for documents. A collaboration system provides features to facilitate collaborative development of documents by a team. These features may include versioning, change tracking, document check out/check in, and so on.
These document management systems allow a large number of documents to be created, changed, and viewed. It is not uncommon for a document repository to contain millions of documents. Because of the sheer number of documents, it can be difficult for a user to identify which documents need the user's attention. For example, at an organization (e.g., company), an IT worker may be a member of and provide compliance oversight for multiple projects. The IT worker may need to review design documents, requirement documents, user instruction manuals, and so on for each project to ensure that they comply with the standards of the organization. The IT worker may also need to review and edit documents that set the standards for the organization. With current document management systems, the IT worker can search for documents that need to be reviewed in various ways. For example, the IT worker can search for documents by name, but the IT worker would need to already know what documents need to be reviewed. As another example, the IT worker can search for documents by edit date to identify the documents that have been recently edited and then view the content of the documents to see what documents need attention. A difficulty with such an approach is that hundreds of documents can be edited on a given day, so the list of documents can be long. Another difficulty is that some of the edits may be minor changes (e.g., correcting a typo) made by one person and not need the IT worker's review—so the IT worker may spend time checking documents unnecessarily. Also, the IT worker may not need to review a document when it is edited, but could defer the review until the document is actually needed by a team.
A system for ranking documents based on activity level is provided. In some embodiments, a document promotion system generates a view score for a document based on the number of times the document was viewed and a freshness score for the document based on when the document was last updated. The document promotion system then generates an activity score for the document based on the view score and the freshness score for the document. The activity score for a document represents the level of activity associated with the document. The document promotion system ranks documents based on their generated activity scores and provides the documents to a user in the order of the ranking.
A method and system for highlighting documents for user review based on the activity level of the documents is provided. In some embodiments, a document promotion system generates an activity score for each document indicating the level of user activity associated with the document. The user activities may include creating a document, editing a document, viewing a document, printing a document, archiving a document, and so on. The document promotion system may quantify different types of user activity as sub-scores to generate an activity score indicating the activity level of each document. The document promotion system may quantify the activity of viewing a document using a view score derived from the number of times the document was viewed. The document promotion system may assume that a user may be more interested in documents that have been viewed many times than those that have been viewed only a few times. The document promotion system may quantify the activity of accessing a document using a unique user score derived from the number of unique users who have accessed the document. The document promotion system may assume that a user may be more interested in documents that have been accessed by many different users than those accessed many times but only by a few users. The document promotion system may quantify the activity of updating the document using a freshness score derived from when the document was last updated. The document promotion system may assume that a user may be more interested in newly updated (e.g., created or changed) documents than those that have not been updated for a while. The document promotion system may generate the activity score for a document based on a combination of the sub-scores. The document promotion system may then rank the documents based on their activity scores and present those documents to a user based on their ranking. 
In this way, the document promotion system can automatically identify documents to present to a user that are more likely to be of interest to the user.
In some embodiments, the document promotion system may generate the activity score for a document based on a weighted combination of sub-scores. For example, the document promotion system may generate sub-scores for different types of activities in the range of 0 to 1, with 0 meaning a low level of activity and 1 meaning a high level of activity. The document promotion system may weight one sub-score more than another sub-score to reflect the effect of the type of activity on user interest in a document. In addition, the document promotion system may use different weights for different users. The weights for a user may be tuned by the user. So, for example, a user interested in tracking documents that are of interest to a wide range of users may weight the unique user score highly. The document promotion system may also learn the weights for each user using various machine learning techniques based on “click-through” data indicating which documents a user selected when presented with a list of ranked documents. For example, if a user tends to select documents that have been recently updated, then a machine learning technique may generate a fairly high weight for the freshness score.
In some embodiments, the document promotion system may factor in the recency of user activity when generating a sub-score. For example, when generating a view score, the document promotion system may consider only those views within a view window (e.g., last two days, last week, and last month) or may consider all views but with their contribution to the view score decaying over time. If the contributions decay (e.g., exponentially) over time, a document with many views a week ago may have a lower view score than a document with only two views two days ago, and a document with only one view in the last day may have an even higher view score than the other documents. In a similar manner, when generating a unique user score, the document promotion system may consider only those accesses within an access window (e.g., one week), may consider all accesses but with their contribution to the unique user score decaying over time, and so on. The document promotion system may also weight the activity of certain users, referred to as distinguished users, more than the activity of other users. For example, a user who is a member of a team may be more interested in documents accessed by other members of the team than those accessed by non-members. As another example, a user may be more interested in documents accessed by the user or the user's supervisor than those accessed by subordinates. The document promotion system may also use machine learning techniques (e.g., based on gradient descent) to learn the influence of recent activity or activity by distinguished users on the sub-scores for a user.
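As an illustrative, non-limiting sketch, the windowed counting and the weighting of distinguished users described above might be expressed as follows. The function names, the `boost` factor of 2.0, and the sample data are hypothetical and chosen only for illustration. The description does not prescribe a particular weighting scheme for distinguished users.

```python
from datetime import datetime, timedelta

def windowed_view_count(view_times, now, window):
    """Count only the views that fall inside the view window."""
    return sum(1 for t in view_times if timedelta(0) <= now - t <= window)

def weighted_unique_users(accesses, now, window, distinguished, boost=2.0):
    """Sum one contribution per unique user seen inside the access window.

    A distinguished user contributes `boost` instead of 1.0. This scheme is
    an assumption made for illustration, not one taken from the description.
    """
    users = {user for user, t in accesses if timedelta(0) <= now - t <= window}
    return sum(boost if user in distinguished else 1.0 for user in users)

# Hypothetical activity log for one document.
now = datetime(2024, 1, 15, 12, 0)
views = [now - timedelta(days=d) for d in (0.5, 1, 3, 10, 40)]
accesses = [
    ("alice", now - timedelta(days=1)),
    ("alice", now - timedelta(days=2)),   # repeat access, same user
    ("bob", now - timedelta(days=3)),     # treated as a distinguished user below
    ("carol", now - timedelta(days=30)),  # outside the one-week window
]

week = timedelta(weeks=1)
views_in_window = windowed_view_count(views, now, week)
user_total = weighted_unique_users(accesses, now, week, distinguished={"bob"})
print(views_in_window, user_total)
```

In this sketch, three of the five views fall within the one-week window, and the two users inside the access window contribute 1.0 and 2.0 respectively.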
In some embodiments, the document promotion system may generate the activity score based on the following equation:
$$AS_d = w_v \cdot VS_d + w_{uu} \cdot UUS_d + w_F \cdot FS_d \tag{1}$$
where $AS_d$ represents the activity score of document d, $VS_d$ represents the view score of document d, $UUS_d$ represents the unique user score of document d, $FS_d$ represents the freshness score of document d, $w_v$ represents the weight of the view score, $w_{uu}$ represents the weight of the unique user score, and $w_F$ represents the weight of the freshness score. In some embodiments, the document promotion system may use different combinations of sub-scores to generate an activity score. For example, the document promotion system may generate an activity score based only on a view score and a unique user score or only on a view score and a freshness score.
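As an illustrative sketch, the weighted combination of equation (1) might be computed as shown below. The dictionary keys, the sub-score values, and the weight values are hypothetical. Choosing weights that sum to 1 is an assumption made here so that the activity score stays in the range of 0 to 1, and the description does not require it.

```python
def activity_score(sub_scores, weights):
    """Weighted combination of sub-scores, following equation (1).

    Only the sub-scores named in `weights` are used, so the same function
    covers combinations such as view score plus freshness score only.
    """
    return sum(weight * sub_scores[name] for name, weight in weights.items())

# Hypothetical sub-scores for one document, each in the range 0 to 1.
doc = {"view": 0.8, "unique_user": 0.3, "freshness": 0.5}

# Hypothetical per-user weights (assumed here to sum to 1).
all_three = {"view": 0.5, "unique_user": 0.3, "freshness": 0.2}
view_and_freshness = {"view": 0.6, "freshness": 0.4}

score_all = activity_score(doc, all_three)
score_two = activity_score(doc, view_and_freshness)
print(round(score_all, 2), round(score_two, 2))
```

Because the weights are passed per call, different users may be given different weights without changing the scoring function.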
In some embodiments, the document promotion system may generate the view score based on the following equation:
where $cV_d$ represents the number of times document d was viewed and $s_V$ represents a tunable saturation parameter. The document promotion system may generate the unique user score based on the following equation:
where $cUU_d$ represents the number of unique users who accessed document d and $s_{UU}$ represents a tunable saturation parameter. The document promotion system may generate the freshness score based on the following equation:
where $cF_d$ represents the time since document d was last updated and $s_F$ represents a tunable saturation parameter.
In some embodiments, the document promotion system generates $cV_d$, $cUU_d$, and $cF_d$ using an exponential decay function represented by the following equation:
$$cX_d = \sum e^{-\lambda t} \tag{5}$$
where X represents V, UU, or F, t represents the time since the access, and λ represents the rate of decay. This equation counts the most recent accesses (e.g., t=0) as one and counts less recent accesses as values that approach zero at a speed set by the decay rate. The use of saturation parameters allows control over how rapidly a sub-score approaches 1. For example, with a low saturation parameter (e.g., 1), an increase in the underlying number (e.g., count of views or time since update) has a smaller influence on the sub-score. With a high saturation parameter (e.g., 100), the same increase has a larger influence on the sub-score. The document promotion system may allow a user to set these tunable parameters and decay rates or may use machine learning techniques to learn them.
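As an illustrative sketch, the decayed count of equation (5) and the effect of a saturation parameter might be computed as shown below. The decay rate of 1.0 per day and the event ages are hypothetical. The saturating form count / (count + s) used in `saturate` is an assumption, chosen only because it is bounded by 1 and behaves as described for low and high saturation parameters. It should not be read as the form of the sub-score equations themselves.

```python
import math

def decayed_count(ages, decay_rate):
    """Equation (5): sum of exp(-decay_rate * t) over events of age t."""
    return sum(math.exp(-decay_rate * t) for t in ages)

def saturate(count, s):
    """ASSUMED saturating form count / (count + s); approaches 1 as count grows."""
    return count / (count + s)

decay_rate = 1.0  # per day; hypothetical value

# Ages are in days: 50 views a week ago, 2 views two days ago, 1 view just now.
week_old = decayed_count([7.0] * 50, decay_rate)
two_days = decayed_count([2.0, 2.0], decay_rate)
today = decayed_count([0.0], decay_rate)
print(week_old < two_days < today)

# Effect of going from 10 to 20 events under a low and a high saturation parameter.
gain_low_s = saturate(20, 1) - saturate(10, 1)
gain_high_s = saturate(20, 100) - saturate(10, 100)
print(gain_low_s < gain_high_s)
```

With this decay rate, the single view at t=0 counts as one and outweighs both the two views from two days ago and the fifty views from a week ago. Under the assumed saturating form, the step from 10 to 20 events moves the value less when the saturation parameter is 1 than when it is 100.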
The document promotion system may also include a collaboration user interface component 103, a search engine 104, a rank documents component 105, and an indexer component 106. The indexer component populates the search catalog based on information in the document and log repository. The collaboration user interface component may provide a conventional user interface of a collaboration system that has been modified to present documents based on activity level. The search engine may be a conventional search engine that receives a query, uses the search catalog to identify documents that match the query, and ranks the identified documents based on activity level. The rank documents component is provided with a list of documents and ranks the documents by activity level using information in the document and log repository and/or the search catalog.
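As an illustrative sketch, the interaction between the search engine and the rank documents component might look like the following. The `Doc` class, the in-memory list standing in for the search catalog, and the substring matching are hypothetical simplifications. A conventional search engine would use its own index and matching logic.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    name: str
    text: str
    activity_score: float  # assumed to be precomputed elsewhere

def rank_documents(docs):
    """Order documents from highest to lowest activity score."""
    return sorted(docs, key=lambda d: d.activity_score, reverse=True)

def search(catalog, query):
    """Find documents containing every query term, then rank by activity level."""
    terms = query.lower().split()
    matches = [d for d in catalog if all(t in d.text.lower() for t in terms)]
    return rank_documents(matches)

# Hypothetical catalog contents.
catalog = [
    Doc("design.docx", "Design document for the billing project", 0.42),
    Doc("standards.docx", "Coding standards document", 0.91),
    Doc("notes.txt", "Meeting notes", 0.77),
]

results = [d.name for d in search(catalog, "document")]
print(results)
```

Here the two documents that match the query are returned with the more active one first, and the non-matching document is excluded regardless of its activity score.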
The computing devices and systems on which the document promotion system may be implemented may include a central processing unit, input devices, output devices (e.g., display devices and speakers), storage devices (e.g., memory and disk drives), network interfaces, graphics processing units, accelerometers, cellular radio link interfaces, global positioning system devices, and so on. The input devices may include keyboards, pointing devices, touch screens, gesture recognition devices (e.g., for air gestures), head and eye tracking devices, microphones for voice recognition, and so on. The computing devices may include desktop computers, laptops, tablets, e-readers, personal digital assistants, smartphones, gaming devices, servers, and computer systems such as massively parallel systems. The computing devices may access computer-readable media that include computer-readable storage media and data transmission media. The computer-readable storage media are tangible storage means that do not include a transitory, propagating signal. Examples of computer-readable storage media include memory such as primary memory, cache memory, and secondary memory (e.g., DVD) and include other storage means. The computer-readable storage media may have recorded upon them or may be encoded with computer-executable instructions or logic that implements the document promotion system. The data transmission media are used for transmitting data via transitory, propagating signals or carrier waves (e.g., electromagnetism) via a wired or wireless connection.
The document promotion system may be described in the general context of computer-executable instructions, such as program modules and components, executed by one or more computers, processors, or other devices. Generally, program modules or components include routines, programs, objects, data structures, and so on that perform particular tasks or implement particular data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments. Aspects of the document promotion system may be implemented in hardware using, for example, an application-specific integrated circuit (“ASIC”).
Although the subject matter has been described in language specific to structural features and/or acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims. For example, although the document promotion system has been described primarily in the context of views and updates to a document, the activity level may factor in many different types of user activity or even non-user activity. Other user and non-user activity may include publishing a document to a web site, archiving a document, printing a document, changing metadata associated with a document (e.g., primary author), and so on. Accordingly, the invention is not limited except as by the appended claims.