The present invention generally relates to collaboration tools, and more particularly relates to integrating messaging with collaboration tools.
People use various dedicated collaboration tools (e.g., wikis) to help them organize task materials in a single location where all members can view these task materials. Various collaboration topics (e.g., a particular wiki) can be created in those tools to work on a particular aspect of a project. There are many different electronic collaboration tools that support collaborative tasks. Examples of collaboration tools are wikis (e.g., Mediawiki), teamrooms (e.g., Lotus Quickr), blogs (e.g., Blogspot), calendar meeting schedulers (e.g., Lotus Notes or Evite), forums (e.g., Ubuntu Forums and GameDev.net), groups (e.g., Yahoo Groups), activities (e.g., Lotus Activities), communities (e.g., Jive SBS), shared files (e.g., Google Documents, Microsoft Sharepoint, and Flickr), microblogs (e.g., Twitter and Yammer), and business processes (e.g., SalesForce.com).
One embodiment of the present invention provides a method. According to the method, in response to a user creating a message in a messaging system, information from the message is compared with data sets associated with the user. Each of the data sets corresponds to a collaboration topic of the user for the at least one collaboration tool. At least one of the data sets is selected based on the comparison, and information indicating the one or more collaboration topics of the user that correspond to the at least one data set that is selected is presented to the user via a user interface, with the information suggesting to the user to post the message to the one or more corresponding collaboration topics of the user.
Another embodiment of the present invention provides a computer program product comprising a computer readable storage medium having computer readable program code embodied therewith. The computer readable program code comprises computer readable program code configured to compare information from a message created by the user in a messaging system with data sets associated with the user that each correspond to a collaboration topic of the user for the at least one collaboration tool, select at least one of the data sets based on the comparison, and present information indicating the one or more collaboration topics of the user that correspond to the at least one data set that is selected, with the information suggesting to the user to post the message to the one or more corresponding collaboration topics of the user.
A further embodiment of the present invention provides a system that includes a matcher and a suggestion agent. The matcher compares information from a message created by the user in a messaging system with data sets associated with the user, and selects at least one of the data sets based on the comparison. Each of the data sets corresponds to a collaboration topic of the user for the at least one collaboration tool. The suggestion agent presents to the user via a user interface information indicating the one or more collaboration topics of the user that correspond to the at least one data set that is selected, with the information suggesting to the user to post the message to the one or more corresponding collaboration topics of the user.
Other objects, features, and advantages of the present invention will become apparent from the following detailed description. It should be understood, however, that the detailed description and specific examples, while indicating various embodiments of the present invention, are given by way of illustration only and various modifications may naturally be performed without deviating from the present invention.
Various embodiments of the present invention will be discussed in detail herein below with reference to the attached drawings.
It is now possible to collaborate with co-workers using many different enterprise collaboration tools (wikis, team spaces, and so on) to get work done. These tools offer benefits for collaboration, providing ways to create a shared space to organize work around tasks. The focus of the work may be thought of as a task, project, activity, etc., where the term “task” can refer to any kind of work focus. An electronic collaboration tool provides a way to share and organize task materials (digital information) in a shared electronic repository that is accessible to the people participating in the work, so that they can read, edit, organize, and manipulate the materials. The terms “collaboration tools” and “collaborative tools” are used herein to refer to tools that support shared materials. By this definition, email is not a collaborative tool, because there are no shared materials; each person has their own copy of the emails and attachments.
Web 2.0 technologies such as wikis, blogs, collaborative bookmarking, and social networking sites offer significant potential benefits for enterprise collaboration. However, one important property of these early systems is that they are generally focused on “weak tie” forms of collaboration where participation tends to be voluntary and participants may be unknown to each other at the outset. See, for example, M. Granovetter “The Strength of Weak Ties”, American Journal of Sociology, 78(6): 1360-1380, 1973. On the other hand, for “strong tie” collaborations, where teams work together on focused projects, tools such as wikis, forums, and blogs offer potential additional benefits in allowing teams to share information. Wikis and forums, in particular, offer ways for teams to collate, structure, and manage collective team resources.
However, despite the potential benefits of tools such as wikis and dedicated collaborative applications such as Lotus Activities, considerable effort is required to shift to these tools. All members of a team have to agree on the tool that they want to use, and there may be start-up costs associated with this. Teams also have to negotiate collective practices for using collective resources, and research shows that new users are often unwilling to edit the work of others on a wiki or to publish content that they feel is unfinished.
As a result, people find it easier to simply send emails and attachments to each other. However, one problem with using email for collaborative tasks is that each person must manage their own copy of the task materials, and email clients are not particularly good at helping people manage a multiplicity of emails around different tasks. Specifically, people have difficulty collating materials related to a given task when these are spread across multiple messages, determining the context for a given message, and monitoring the state of complex tasks. In addition, emails remain in each individual's email inbox, and the task materials are not shared. Each individual must manage and organize their own copies of email messages and attachments, which usually never become integrated with existing collaboration tools.
Furthermore, email studies show that materials relating to collaborative tasks end up in multiple, often disjoint threads distributed throughout the overloaded inbox. As a result, users have to scroll through their inboxes or access project folders to identify relevant versions of attachments and comments relating to shared tasks. This makes it difficult to keep track of collaboration deliverables and contributions. Email also increases personal workload as every involved participant has to organize and upload personal versions of documents, slides, and spreadsheets in their own email and personal file systems. Overall, resources and conversations are hard to track and are not shared, and everyone has the overhead of managing resources in their personal information space.
Embodiments of the present invention bridge messaging tools (such as email) and collaboration tools. While team members often set up collaboration topics (e.g., a wiki or a discussion forum) in collaboration tools, they commonly forget or do not make the effort to use them, instead reverting to email. Embodiments of the present invention enhance existing messaging clients to provide suggestions about collaboration topics to which team members might forward (i.e., post) their emails. This reminds users about the existence of the relevant collaboration tools and simplifies the process of contributing to those tools.
The server system 102, in this embodiment, comprises a collaboration integrator 106. The collaboration integrator 106, in this embodiment, comprises a work profile builder 108, a matcher 110, a suggestion agent 112, and a router 114. The collaboration integrator 106 integrates messaging (which is easier and more natural for users) with collaboration tools (which are explicitly designed to help users coordinate and manage tasks). The collaboration integrator 106 detects when a person is sending a message, such as an email, instant message, blog, and/or the like, determines, via the matcher 110, one or more collaborative tools that are potentially relevant to the message, and suggests, via the suggestion agent 112, these tools to the user. The collaboration integrator 106 then routes, via the router 114, the message to the tools that the user selected, in addition to sending the message to the users in the address fields of the message. Components of the collaboration integrator 106 can reside outside of the collaboration integrator 106 and/or across various systems.
The collaboration integrator 106 presents the user with suggestions on relevant collaboration tools by creating one or more work profiles 116 that index a person's collaborations and the tools involved. In this embodiment, the work profile 116 for each user is stored as a separate file on a server 102. Thus, users can engage in shared collaboration by simply sending messages to other users without having to learn and use dedicated collaboration tools. While an email message is used throughout this description as an exemplary message type applicable to the collaboration integrator 106, other types of messages, such as instant messages and blog messages, are applicable as well.
As shown in
The collaboration tool server(s) 120 comprises collaboration environments/tools 126, such as wikis (e.g., Mediawiki), teamrooms (e.g., Lotus Quickr), blogs (e.g., Blogspot), calendar meeting schedulers (e.g., Lotus Notes or Evite), forums (e.g., Ubuntu Forums or GameDev.net), groups (e.g., Yahoo Groups), activities (e.g., Lotus Activities), communities (e.g., Jive SBS), shared files (e.g., Google Documents, Microsoft Sharepoint, or Flickr), microblogs (e.g., Twitter or Yammer), business processes (e.g., SalesForce.com), and so on. The users of the users systems 118 interact with these collaboration tools 126 to, among other things, organize task materials.
In this embodiment, a work unit 202 comprises a unique identifier, an optional title, a list of the users involved in a focus of work, a list of tags created by users or created automatically, a list of representative keywords that describe the content of the work focus, the status of the work (such as whether the work is currently active, completed, dormant, etc.), dates of activity for the work, a pointer to the collaborative tool used to support the work, pointers to related work units, email descriptors, and the like. An important aspect of the work unit 202 is the set of pointers to collaborative tools. Each collaborative tool supports collaboration topics 206 (also referred to as tool units; the terms “tool unit” and “collaboration topic” are used interchangeably throughout this description).
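As a rough illustration of the fields listed above, a work unit could be modeled as a simple record. The following is only a sketch: the class name, field names, types, and the sample pointer format are assumptions made for illustration and are not taken from the embodiment.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WorkUnit:
    """Illustrative record for one work unit 202 (all field names are hypothetical)."""
    unit_id: str                        # unique identifier
    tool_pointer: str                   # pointer to the collaboration topic in its tool
    title: Optional[str] = None         # optional title
    users: List[str] = field(default_factory=list)              # people involved in the work
    tags: List[str] = field(default_factory=list)               # user-created or automatic tags
    keywords: Dict[str, float] = field(default_factory=dict)    # keyword -> weight
    status: str = "active"              # e.g. active, completed, dormant
    activity_dates: List[str] = field(default_factory=list)     # dates of activity
    related_units: List[str] = field(default_factory=list)      # ids of related work units
    email_descriptors: List[str] = field(default_factory=list)  # descriptors of posted emails


# Hypothetical example: a work unit pointing at a wiki site.
unit = WorkUnit(
    unit_id="wu-1",
    tool_pointer="wiki://example/seminar-series",
    title="Seminar Series",
    users=["james@example.com"],
)
```

Keeping the keyword weights alongside the keywords is one possible design choice; it lets a matcher reuse the stored weights instead of recomputing them for every message.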
A collaboration topic 206 is a particular instance of data and processes. For example, a calendaring tool supports the creation of meetings. A particular meeting is a collaboration topic. For a wiki tool, a particular wiki site is a collaboration topic. For the Lotus Activities tool, a particular Activity is a collaboration topic. The exemplary diagram in
As explained above, the work profile 116 is a database of work units 202. Each work unit 202 describes the users involved in the work that the unit 202 represents.
The work profile 116 is personalized (i.e., a separate work profile is built for each user). The work profile builder 108 uses a user's authentication information 204 (e.g., username and password in collaboration tools) to build that user's personalized work profile 116 by extracting their collaboration topics 206 from the collaboration tools 126. In this exemplary embodiment, the work profile builder 108 utilizes an application programming interface (API), such as the ATOM API, exposed by the collaboration tools to find these collaboration topics. The work profile builder also uses an API to retrieve a web feed for each collaboration topic, and then processes that feed and applies information retrieval algorithms to extract features from the collaboration topic.
The work profile builder 108 creates a work unit 202 for each collaboration topic 206.
The work profile 116 for a set of users is built by mining the range of socio-collaboration tools that they use. The work profile builder 108, in this exemplary embodiment, interacts with one or more collaboration tools 126 associated with a given user to obtain the tool units/collaboration topics 206 associated with that user. Various mechanisms can be used by the work profile builder 108 to obtain the tool units/collaboration topics 206. For example, in one embodiment, the work profile builder 108 uses an API to obtain a web feed for each of the user's collaboration topics 206.
The work profile builder 108 then extracts summary information from each topic 206 and creates corresponding work units 202. The collaboration topic data in most collaborative tools 126 is accessible only to the users who are given access permission, who are usually the set of users involved in working with the collaboration topic 206. Therefore, the work profile 116, in this embodiment, is built on a per-user basis. In other words, each user who wants to be indexed in the work profile database gives the work profile builder 108 permission to access one or more collaboration topics 206 associated with that user, as shown by the authentication element 204 in
Most of the data needed for work units 202 can be obtained directly from the collaboration topics 206. For example, for each collaboration topic, the work profile builder 108 extracts the names and email addresses of the users involved, the unique identifier (UUID) of the collaboration topic, and the title of the collaboration topic from the corresponding attribute values in the feed. The work profile builder 108 computes the weights for each of the names and the title words (after removing standard stop words) using an extraction algorithm such as the term frequency-inverse document frequency (TF-IDF) information extraction algorithm (see G. Salton et al. “A vector space model for automatic indexing”). In this embodiment, the corpus to compute IDF for the names is computed from all the names in that user's entire set of collaboration topics, and the corpus to compute IDF for the title words is computed from all the title words of that user's entire set of collaboration topics.
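The weighting of title words described above can be sketched as follows. This is a minimal illustration that assumes one common TF-IDF variant (relative term frequency multiplied by log(N/df)); the stop-word list, the function names, and the sample titles are hypothetical, and the embodiment may use a different variant.

```python
import math
from collections import Counter
from typing import Dict, List

STOP_WORDS = {"a", "an", "and", "for", "of", "the", "to"}  # tiny illustrative list


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def tfidf_weights(doc: List[str], corpus: List[List[str]]) -> Dict[str, float]:
    """Weight each term of doc by (relative term frequency) * log(N / document frequency)."""
    n_docs = len(corpus)
    counts = Counter(doc)
    weights: Dict[str, float] = {}
    for term, count in counts.items():
        df = sum(1 for d in corpus if term in d)
        if df == 0:
            continue  # term unseen in the corpus: no IDF available
        weights[term] = (count / len(doc)) * math.log(n_docs / df)
    return weights


def top_terms(weights: Dict[str, float], n: int) -> List[str]:
    """Return the n highest-weighted terms (ties broken alphabetically)."""
    return sorted(weights, key=lambda t: (-weights[t], t))[:n]


# Hypothetical titles of one user's collaboration topics; together they form the IDF corpus.
titles = ["Seminar Series", "Research Community Wiki", "Seminar Planning for the Team"]
corpus = [tokenize(t) for t in titles]
weights = tfidf_weights(corpus[0], corpus)
```

In this toy corpus "series" receives a higher weight than "seminar", because "seminar" also occurs in another title and is therefore less distinctive for the first topic.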
Although most of the data needed for the work units 202 can be retrieved from the collaboration topics 206, the set of keywords 512 summarizing the content of the collaboration topics 206 is computed. One technique for this keyword extraction is TF-IDF. However, other keyword extraction techniques can also be used. To generate collaboration topic content keywords, the work profile builder 108 first extracts all of the content words from that topic 206. The work profile builder 108 reads each of the entries and stores the words in a vector after eliminating stop words. Next, the work profile builder 108 computes the TF-IDF score of each of the words. To compute IDF, the work profile builder 108 maintains a corpus comprising the words from the user's entire set of collaboration topics 206. The work profile builder 108 selects the top N words (where N is set a priori) as content keywords from the collaboration topic. As an example, the value of N (the maximum number of keywords) can be set to 50, which works well in practice. However, other values of N can be used. Once the work profile builder 108 has computed the above features, the work profile builder 108 creates a work unit 202 for the collaboration topic 206 in the user's work profile 116, as shown in
Collaboration topics 206 change dynamically over time. Therefore, the work profile 116 is updated as these changes are made. There are several ways that this updating can be performed. For example, in one embodiment each user allows the work profile builder 108 to periodically rebuild the work profile 116. In an alternative embodiment, the work profile 116 is updated by subscribing to feeds of changes from the collaborative tools 126, and the work profile builder 108 performs updates dynamically.
If the message type being integrated is an email message, in this embodiment a dynamic update is performed as follows. Whenever a user indicates that an email message is to be posted to a collaborative topic 206, a descriptor of that message is added to the corresponding work unit 202. Then, when another user creates a similar email message (e.g., the subject of the first email is “New idea” and the subsequent email subject is “RE: New idea”), posting to the same collaborative topic 206 is suggested.
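One way such a subject-based match could work is to strip reply and forward prefixes before comparing a new subject with the stored descriptors. The sketch below is an assumption for illustration only: the embodiment does not specify how similarity between email messages is determined, and the prefix list and function names are hypothetical.

```python
import re
from typing import Iterable

# Leading "Re:", "Fw:", or "Fwd:" prefixes, possibly repeated (assumed prefix list).
_PREFIX = re.compile(r"^\s*((re|fw|fwd)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Strip leading reply/forward prefixes, lowercase, and collapse whitespace."""
    stripped = _PREFIX.sub("", subject)
    return " ".join(stripped.lower().split())


def matches_descriptor(subject: str, descriptors: Iterable[str]) -> bool:
    """True if the normalized subject equals any normalized stored descriptor."""
    return normalize_subject(subject) in {normalize_subject(d) for d in descriptors}


# Descriptors previously recorded on a work unit (hypothetical data).
stored = ["New idea"]
is_follow_up = matches_descriptor("RE: New idea", stored)
```

Exact matching after normalization is deliberately conservative; a fuzzier comparison could be substituted if near-duplicate subjects should also match.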
Also, there are many systems that mine a user's email collection to compute that user's interests or social network (e.g., Lotus Atlas). Therefore, in one embodiment, one or more of these email mining systems is utilized to supplement a user's work profile 116. For example, if a user has email folders, these can be treated as collaborative topics 206 and work units 202 can be created for them. Email threads can also be treated as collaborative topics 206. Even further, email clusters derived by textual analysis can be considered as “potential” collaborative topics 206. Potential collaborative topics 206 can be used to suggest to the user that they may want to create a new “real” collaborative topic 206 to share the material in the potential collaborative topic 206.
With respect to computing suggestions to display to the user via the interface 122, the matcher 110 matches the header and body information from the composed message 208 to the work profile 116 of the user and rates how relevant each work unit is to the message 208. Matching a message 208 to a work unit 202 is performed by computing different similarity values, such as the similarity between the set of users mentioned in the message 208 and the set of users in the work unit, the words in the body of the message 208 and the keywords in the work unit 202, the subject of the message 208 and the message descriptors in the work unit 202, and the current time and the times of activity in the work unit 202. In this embodiment, the overall similarity of a message to a work unit 202 is computed by combining these similarities. The set of work units 202 that are most similar to the message 208 are the prime candidates for the message to be associated with. The matcher 110 returns the top-K work units as relevant, where K can be set a priori. The matcher 110 uses a similarity function to compute the similarity value of each work unit with the message and then picks the top-scoring work units as relevant.
More specifically, there is a message such as an email E and a set of work units W={W1, W2, . . . Wn}. The similarity function Fs(E,Wi) returns the similarity of work unit Wi with email E. The matcher 110 applies this similarity function for each work unit Wi in the set W, constructs a list of work units with their similarity values with the email, discards the work units which have no similarity (i.e., similarity value zero) with the email from the list, and returns the top-K (where K≦n) work units as relevant.
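The selection step just described (score every work unit, discard zero-similarity units, keep the top K) can be sketched as below. The similarity function is passed in as a parameter; the function name and the toy keyword-overlap similarity used in the example are assumptions for illustration and are not the similarity function of the embodiment.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

E = TypeVar("E")  # message type
W = TypeVar("W")  # work unit type


def top_k_work_units(
    email: E,
    work_units: Sequence[W],
    similarity: Callable[[E, W], float],
    k: int,
) -> List[Tuple[W, float]]:
    """Score each work unit, drop those with zero similarity, return the k best."""
    scored = [(w, similarity(email, w)) for w in work_units]
    scored = [(w, s) for w, s in scored if s > 0.0]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data: work units identified by name, with hypothetical keyword sets.
unit_keywords = {
    "W1": {"seminar", "mobile"},
    "W2": {"seminar", "budget"},
    "W3": {"travel"},
}
email_words = {"seminar", "mobile", "talk"}


def overlap(words: set, unit: str) -> float:
    """Stand-in similarity: number of shared words."""
    return float(len(words & unit_keywords[unit]))


suggestions = top_k_work_units(email_words, list(unit_keywords), overlap, k=2)
```

Because zero-similarity units are discarded first, fewer than K suggestions may be returned when few work units share anything with the message.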
A discussion is now given on how the similarity function Fs(E,Wi) is defined. A simple definition can be the cosine similarity (see G. Salton et al. “A vector space model for automatic indexing”) between the bag-of-words from the email and the work unit. However, such a simplified approach ignores the structure of the email (recipients, subject, and body) and the work unit (names, title, and keywords). Ignoring header data and using a simple bag-of-words can yield poor similarity scores for an email that does not have enough word-matches with the work unit. To illustrate this consider an email that has the following parts.
The user's work profile has the following work units.
W1:
W2:
Assuming that K=1 (i.e., the user interface 122 displays only one suggestion to the user), in this example the cosine similarity matching between the email and the work units using the bag-of-words approach does not rate the work unit W1 as most similar to the email. This is because W1 has only a three word match (James, Seminar, mobile) with the email and the resultant cosine similarity score is lower than the cosine similarity score for the other work unit, which has a four word match (organize, collaboration, research, community) with the email.
However, in this example the first work unit should be suggested instead of the second one, because the title of the first (Seminar Series) closely matches the subject of the email (Seminar), and the single recipient of the email also matches one of the members of the work unit. An email's subject expresses the purpose of the email and is an important clue to the relevancy of the email to the work units in a user's profile. Information in the subject line should therefore be rated more highly than regular content in the message body. Similarly, the recipients of the email are also an important clue to the relevancy of that email to a work unit. To use such important clues, people matching and title matching should be treated separately from keyword matching. For the above example, if title matching and people matching are considered separately and given higher weights, then work unit W2 will have a lower matching score than W1.
To incorporate this structured data, in this exemplary embodiment the similarity function Fs(E,Wi) is defined as a weighted combination of the following similarity values.
Thus, Fs(E,Wi)=w1s1+w2s2+w3s3+w4s4+w5s5. All of the above similarity values use cosine similarity values of two vectors. Vectors from different parts of the email are created to compute similarity values. If the email is denoted as Ei, vectors are created from the “To”, “Subject”, and “Body” fields of the email. The first vector, denoted as vpi, comprises the email addresses of the recipients typed in the “To” field. The second vector, denoted as vsi, comprises the words in the subject field of the email, and the third vector, denoted as vbi, contains the bag of words from the body of the email. For each work unit Wj in a user's work profile, the vectors vpj, vsj, and vbj are created. The first vector comprises the email addresses of the people in that work unit, the second vector comprises the words in the title of the work unit, and the third vector comprises the keywords of the work unit. Next, the similarity values are computed as follows.
s1=cosine(vpi,vpj)
s2=cosine(vsi,vsj)
s3=cosine(vsi, vbj)
s4=cosine(vbi, vsj)
s5=cosine(vbi, vbj)
The cosine similarity computation uses the TF-IDF score of each term stored in the vector. For the vectors vpj, vbj and vsj, a TF-IDF score is available from the work unit. However, for the vectors vpi, vbi and vsi, the TF-IDF score of each term is determined by computing its TF in the email and normalizing by IDF, which is computed from the same corpus used to compute the IDF score of the terms in the work unit. Use of the same corpus ensures that words in emails and work units are normalized uniformly.
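The weighted combination Fs(E,Wi)=w1s1+w2s2+w3s3+w4s4+w5s5 can be sketched as follows, with each vector represented as a mapping from term to TF-IDF score. The dictionary keys ("people", "subject", "body", "title", "keywords"), the sample vectors, and the equal weights are assumptions chosen for illustration; in the embodiment the weights are learned.

```python
import math
from typing import Dict, Sequence

Vector = Dict[str, float]  # term -> TF-IDF score


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity(email: Dict[str, Vector], unit: Dict[str, Vector],
               weights: Sequence[float]) -> float:
    """Weighted sum of the five structured similarity values s1..s5."""
    s = (
        cosine(email["people"], unit["people"]),     # s1 = cosine(vpi, vpj)
        cosine(email["subject"], unit["title"]),     # s2 = cosine(vsi, vsj)
        cosine(email["subject"], unit["keywords"]),  # s3 = cosine(vsi, vbj)
        cosine(email["body"], unit["title"]),        # s4 = cosine(vbi, vsj)
        cosine(email["body"], unit["keywords"]),     # s5 = cosine(vbi, vbj)
    )
    return sum(w * x for w, x in zip(weights, s))


# Hypothetical vectors with unit scores, purely for demonstration.
email = {
    "people": {"james@example.com": 1.0},
    "subject": {"seminar": 1.0},
    "body": {"mobile": 1.0, "talk": 1.0},
}
unit = {
    "people": {"james@example.com": 1.0, "ana@example.com": 1.0},
    "title": {"seminar": 1.0, "series": 1.0},
    "keywords": {"mobile": 1.0},
}
score = similarity(email, unit, weights=[0.2, 0.2, 0.2, 0.2, 0.2])
```

With these sample vectors only s1, s2, and s5 are non-zero, which shows how people and title matches contribute independently of body keyword matches.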
Similarity weights can also be learned from examples. In this embodiment, the similarity weights w1, w2, w3, w4 and w5 are learned from a corpus of emails, work units, and user supplied relevancy labels. The corpus is generated by having users label their emails to indicate the relation between each email and their work units. If E={E1, E2, . . . } denotes a set of emails and W={W1, W2, . . . } is a set of work units in the corpus, this algorithm uses a user provided relevancy label to provide a mapping between each email Ei and work unit Wj in the corpus. These relevancy labels are denoted as U(Ei, Wj) where:
Thus, the input to this algorithm is the user provided relevancy labels between each email and work unit pair. Next, the relevancy between each email and work unit pair is computed using each of the similarity metrics. This is denoted as Sk(Ei, Wj) where:
Next, for each of the similarity metrics, the user provided relevancy label (U(Ei, Wj)) and the relevancy label computed by the similarity metric Sk for each email and work unit pair are compared. The total number of agreements between them is counted. Table 1 illustrates this for emails {E1, E2, E3} and work units {W1, W2, W3, W4}.
In the table, X's show the relevancy relation that the user indicated between a message and a work unit and Y's indicate the relations that the algorithm computed. The cells which are blank are neither defined as relevant by users nor found to be relevant by the similarity metric Sk. Thus, there is an agreement between the algorithm and user when a given cell in the table comprises both an X and a Y (i.e., the user and the algorithm agree on a label), or when the cell is blank (i.e., the user and the algorithm agree that there is no match between that message and the work unit).
Ak(Ei, Wj) is the function defined as follows.
Thus, Ak(Ei, Wj) measures the agreement between the user provided relevancy and the relevancy computed by the similarity metric Sk. Continuing with the example, Ak(E1, W1) is 1 because the email and work unit pair is said to be relevant by the user and also found as relevant by the similarity metric Sk. Similarly, Ak(E2, W2)=1, Ak(E3, W1)=1, and Ak(E3, W4)=1 because the corresponding email and work unit pair is said to be relevant by the user and computed as relevant by the algorithm. Ak(E1, W4)=1, Ak(E2, W1)=1, Ak(E2, W4)=1, and Ak(E3, W3)=1 because the user and algorithm agree that there is no match between the corresponding email and work unit.
All such agreements (where Ak(Ei, Wj)=1) between email and work unit pairs are counted. The resultant number is the agreement score Ak for the similarity metric Sk. Thus, in this example, the agreement score Ak is 8. Once the agreement score Ak is determined using the above algorithm for each similarity metric Sk, the agreement scores Ak are added to obtain the normalization factor A. The weight wk of the similarity metric Sk is computed as: wk=Ak/A. To illustrate this, if there are two similarity metrics s1 and s2 and the agreement scores computed for them are 2 and 3, then w1=⅖ and w2=⅗.
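The agreement counting and the normalization wk=Ak/A can be sketched as below. The sketch takes, for each similarity metric, the set of email and work unit pairs that the metric found relevant as a given input; how a metric decides relevancy is not modeled here, and the function names and sample data are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (email id, work unit id)


def agreement_score(all_pairs: List[Pair], user_relevant: Set[Pair],
                    metric_relevant: Set[Pair]) -> int:
    """Count pairs where user and metric agree (both relevant, or both not relevant)."""
    return sum(1 for p in all_pairs if (p in user_relevant) == (p in metric_relevant))


def learn_weights(all_pairs: List[Pair], user_relevant: Set[Pair],
                  relevant_by_metric: Dict[str, Set[Pair]]) -> Dict[str, float]:
    """Weight each metric by its agreement score divided by the sum of all scores."""
    scores = {name: agreement_score(all_pairs, user_relevant, rel)
              for name, rel in relevant_by_metric.items()}
    total = sum(scores.values())
    if total == 0:
        # Assumed handling: the source does not say what happens with no agreements.
        raise ValueError("no agreements found; weights are undefined")
    return {name: score / total for name, score in scores.items()}


# Hypothetical corpus: two emails, two work units, two metrics.
pairs = [(e, w) for e in ("E1", "E2") for w in ("W1", "W2")]
user_labels = {("E1", "W1"), ("E2", "W2")}
metric_output = {
    "s1": {("E1", "W1"), ("E2", "W1")},                # agrees on 2 of 4 pairs
    "s2": {("E1", "W1"), ("E2", "W2"), ("E1", "W2")},  # agrees on 3 of 4 pairs
}
learned = learn_weights(pairs, user_labels, metric_output)
```

The sample data reproduces the two-metric illustration above: agreement scores of 2 and 3 give weights of 2/5 and 3/5.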
Once the suggestions have been determined using the above process, the collaboration integrator suggestion agent 112 presents these suggestions 210 to the user via the collaboration integration interface 122. In this embodiment, in addition to the “To” and “CC” fields, the email client has a “Post” field that holds the names of the tool units/collaboration topics 206 that the email is going to be routed to. The modified client also comprises an area to display suggested relevant work units 202 and a mechanism that allows the user to make a selection 212 with respect to one or more displayed work units 202. The collaboration integration interface 122 can be embedded in a plug-in for the messaging client 124. Alternatively, the collaboration integration interface 122 can be a desktop application on the user's system 118 or a web service that is displayed in a web browser.
The weights of each similarity metric can be dynamically updated depending on each user's collaboration pattern (which may change over time). Also, there can be situations in which messages are automatically posted to collaboration topics 206 without interacting with the user. In this case, the collaboration integration interface 122 may not be used, or there may be a minimal collaboration integration interface 122 that informs the user when an automatic posting occurs. Also, the collaboration integration interface 122 can be used to allow a person to view their work profile 116. The collaboration integration interface 122 can allow the person to edit the work units 202, adding or correcting the automatically mined information (e.g., add or change keywords), changing the status of work units (e.g., a particular wiki is an obsolete version), relating work units (e.g., relating a particular wiki and a particular Activity as a higher level work unit, as shown in
After building her work profile 116, whenever Ana uses her collaboration integrator 106 enhanced messaging client 124, it suggests one or more collaboration topics to post her emails to.
In this embodiment, the router 114 has internal knowledge on how to route/post messages to the different collaborative tools. For collaborative tools 126 such as blogging or micro-blogging tools, forum discussion tools, or Lotus Activities, the routed messages can fit easily into the structure of the collaboration topics. In this situation, the messages are posted as new entries. However, other methods are also applicable, such as inserting a message within existing pages or replacing certain existing text. For some tools 126 such as wikis, messages may not easily be fitted into the structure of the collaboration topics. In this situation, various methods can be used to add the messages to the topics. For example, in a wiki an email can be appended to a special Email page in each wiki site.
The router 114 also obtains permission from the user to post information to the collaboration topics on behalf of the user. This permission can be obtained/given by the user logging into a server that passes the user's credentials to the router 114. An advantage of the interface 122 being embedded in the messaging client 124 is that the user is almost always logged into their messaging service.
Accordingly, the collaboration integrator 106 provides a lightweight mechanism for users to organize task-relevant content in collaboration tools, while continuing to use messaging clients as their predominant communication tool. The collaboration integrator 106 bridges the gap between messaging and collaboration tools, so as to allow users to contribute to these tools as a side-effect of their normal practices. The collaboration integrator 106 utilizes an efficient algorithm that determines which of the user's collaboration topics are most relevant to the message being composed. The algorithm combines different similarity metrics between an email and collaboration topic and learns the similarity weights from a labeled corpus. Additionally, the collaboration integration system can automatically summarize message email threads and archive old content. Also, message attachments can be automatically uploaded to a collaboration tool and replaced with a link to the shared copy.
Experiments were performed on the suggestion algorithm described above and the performance of this suggestion algorithm was evaluated against real email data that was hand-classified by 12 participants. In addition to showing that the algorithm performed well, the experimental data showed which attributes of both email and collaboration topics lead to the best suggestions. Additionally, the user experience with the collaboration integration system was evaluated with 32 people and two groups in the targeted user population (email users interested in or already using collaboration tools with their teams).
A total of 1237 emails were collected from 12 users, all of whom were employees in a large organization and used both collaboration tools (including Lotus Activities and/or Lotus Wikis) and email for their collaborative tasks.
Their work units were first extracted from the collaboration topics that they were working on (activities and wikis) using the collaboration work profile builder. Each participant was given a list of the work units in their work profiles. They were asked to label each of their approximately 100 emails with the name of one or more relevant work units. If no work units were relevant, participants did not label the email. In total, 65% of the emails were unlabeled, 24% had 1 label, 5% had 2 labels, 1.4% had 3 labels, and 4.6% had 4 to 10 labels. This dataset was divided into two parts: the training set contained 50% of emails from each user selected randomly, and the validation set contained the remaining emails.
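The per-user split described above can be sketched as follows. This is a minimal illustration, assuming that each email is represented as a (user, email identifier) pair and that a fixed random seed is used for repeatability; the function name and data layout are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Email = Tuple[Hashable, Hashable]  # (user, email identifier), assumed layout


def split_per_user(emails: List[Email], fraction: float = 0.5, seed: int = 0):
    """Randomly place `fraction` of each user's emails in the training set."""
    by_user: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for user, email_id in emails:
        by_user[user].append(email_id)

    rng = random.Random(seed)  # seeded so the split can be reproduced
    training: List[Email] = []
    validation: List[Email] = []
    for user, ids in by_user.items():
        shuffled = ids[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * fraction)
        training.extend((user, i) for i in shuffled[:cut])
        validation.extend((user, i) for i in shuffled[cut:])
    return training, validation
```

Splitting within each user, rather than across the pooled emails, keeps every participant represented in both the training set and the validation set.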
First, the suggestion algorithm was trained using the training set to learn the weights (w1, w2, w3, w4, w5) of individual similarity metrics using the algorithm described above. Next, the accuracy of the suggestion algorithm was evaluated using the validation set to calculate two standard metrics: recall (r) and precision (p). Recall indicates whether the right collaboration topics were suggested and measures the proportion of all user labels that the algorithm successfully suggested. Precision helps to understand how noisy the suggestions were by calculating the proportion of all suggestions made by the algorithm that were correct.
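The two metrics, as defined in the preceding paragraph, can be sketched as follows. The sketch assumes that the user labels and the suggestions for each email are available as sets of work unit names and that the counts are pooled over all emails; the pooling choice and the function name are assumptions, since the text does not state how per-email results were aggregated.

```python
from typing import List, Set, Tuple


def recall_precision(labels: List[Set[str]], suggestions: List[Set[str]]) -> Tuple[float, float]:
    """Compute pooled recall and precision over parallel lists of sets.

    recall    = correctly suggested labels / all user labels
    precision = correctly suggested labels / all suggestions made
    """
    if len(labels) != len(suggestions):
        raise ValueError("labels and suggestions must have the same length")
    hits = sum(len(l & s) for l, s in zip(labels, suggestions))
    total_labels = sum(len(l) for l in labels)
    total_suggestions = sum(len(s) for s in suggestions)
    # Assumption: report 0.0 when a denominator is empty.
    recall = hits / total_labels if total_labels else 0.0
    precision = hits / total_suggestions if total_suggestions else 0.0
    return recall, precision
```

For example, if a user labeled an email with two work units and the algorithm suggested four, one of which was correct, that email contributes one hit, two labels, and four suggestions to the pooled counts.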
The recall and precision of the suggestion algorithm were computed for each similarity metric individually and for a combination of all similarity metrics as follows.
The recall and precision results of the suggestion algorithm were compared with those of a baseline algorithm. The baseline algorithm made work unit suggestions for an email by computing the cosine similarity of a bag-of-words collected from the email and a bag-of-words from each work unit. For the experiments, the maximum number of suggestions (K) was varied from 1 to 10. Emails in the validation set that had more labels than K were not considered. In other words, when the accuracy of the algorithm was tested with K set to 2 suggestions, emails that users had labeled with 3 or more labels were excluded for simplicity.
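A baseline of this kind can be sketched as follows. The sketch assumes raw term counts and a simple lowercase alphanumeric tokenizer; the tokenization, any term weighting, and the function names are assumptions, since the text specifies only that cosine similarity is computed between two bags of words.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase alphanumeric tokens (assumed tokenization)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty bag is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

Under this sketch, an email and a work unit that share no terms score 0, and two texts with identical term counts score 1 (up to floating-point rounding).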
Measured by recall (i.e., how many of the correct suggestions were generated), the suggestion algorithm performed well at suggesting the collaboration topics that matched a user's email labels.
Among the individual similarity metrics, email subject and work unit title (subject-title) gave the best suggestions, with 63% recall on average, one correct suggestion 78% of the time, and all correct suggestions 65% of the time. Email subject and work unit keywords (subject-keyword) followed, with 56% recall on average, one correct suggestion 74% of the time, and all correct suggestions 63% of the time. The suggestion algorithm also did well at precisely providing collaboration topic suggestions without many false positives.
These experiments show that, as usual, there is a trade-off between recall and precision: as the number of suggestions increases, recall increases but precision decreases. Because users can ignore suggestions, it is better for the collaboration integrator to suggest the right collaboration topics (maximize recall) than to minimize the number of false positives (maximize precision). However, too much noise in the suggestions can lead to user annoyance. It was determined that a good balance of both high recall (75%) and precision (62%) occurs when six suggestions are shown (K=6).
The evaluation of recall/precision based on data from 12 users showed that the combined and subject-title similarity metrics both performed very well. However, it is possible that an individual user's collaboration pattern can affect the similarity metrics that lead to the best suggestions. Whether or not the best similarity metrics would be different for people with different collaboration patterns was tested by comparing the algorithm's recall and precision performance for two users: User 1, who had significant variation in the people she collaborated with across topics, and User 2, who worked with many of the same people across topics. The results of this test showed that “people” led to more accurate suggestions than any other individual similarity metric for User 1, whereas “subject-title” led to more accurate suggestions for User 2 and the aggregated set of users discussed above. As with the aggregated set of users, the combined metric yielded the best suggestions in both cases. The precision results followed the same pattern as the recall results: “people” led to less noisy suggestions than any other individual similarity metric for User 1, whereas “subject-title” led to less noisy suggestions for User 2. These results show that the different collaboration patterns of individual users can result in different similarity metrics being more important when deciding which collaboration topics to suggest.
The collaboration integrator 106 determines that the user is creating a message 208 using a messaging client 124 and matches/compares the message 208 to each work unit 202 in the user's work profile 116, at step 1110. The collaboration integrator 106 identifies relevant work units 202 (as a result of step 1110) and presents collaboration topic suggestions to the user via the interface 122, at step 1112. The collaboration integrator 106 receives the user's selection of one or more of the suggested collaboration topics and sends the message to the collaboration tool associated with the selected topic, at step 1114.
An example of the processes discussed above with respect to steps 1110 to 1112 is given as follows. As the user begins to compose a message, this action results in the suggestion agent 112 being called and the matcher 110 being invoked. The matcher 110 dynamically matches header and body information from the composed message with each of the work units 202 in the user's work profile 116. The matcher 110 computes the top-K relevant work units for a message, where K is the maximum number of suggestions and is set a priori. The matcher 110 returns the work unit title and the unique IDs of those top-K matched work units to the suggestion agent 112. The suggestion agent 112 displays their titles in the “suggested posts” field of the interface 122.
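The top-K selection performed by the matcher 110 can be sketched as follows. The sketch assumes that a scoring function for a (message, work unit) pair is supplied by the caller and that work units scoring zero are not suggested; the class, function, and field names are hypothetical, and the zero-score cutoff is an assumption rather than part of the embodiment.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class WorkUnit:
    unit_id: str  # unique ID returned to the suggestion agent
    title: str    # title shown in the "suggested posts" field


def top_k_work_units(
    message: str,
    work_units: Sequence[WorkUnit],
    score: Callable[[str, WorkUnit], float],
    k: int,
) -> List[Tuple[str, str]]:
    """Return (unit_id, title) for the k highest-scoring work units."""
    scored = [(score(message, unit), unit) for unit in work_units]
    best = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    # Assumption: a work unit with no similarity at all is not worth showing.
    return [(unit.unit_id, unit.title) for value, unit in best if value > 0]


def shared_title_words(message: str, unit: WorkUnit) -> float:
    """Toy scoring function for illustration: words shared with the title."""
    return float(len(set(message.lower().split()) & set(unit.title.lower().split())))
```

Passing the scoring function as a parameter keeps the selection step independent of which similarity metrics, or which combination of them, is used to rank the work units.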
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
Referring now to
The information processing system 1200 includes a computer 1202. The computer 1202 has a processor(s) 1204 that is connected to a main memory 1206, mass storage interface 1208, and network adapter hardware 1210. A system bus 1212 interconnects these system components. Although only one CPU 1204 is illustrated for computer 1202, computer systems with multiple CPUs can be used equally effectively. The main memory 1206, in this embodiment, comprises the collaboration integrator 106 and work profiles 116 if the system 1200 is the server system 102, or the user interface 122 and the suggestion agent 112 if the system 1200 is the user system 118.
The mass storage interface 1208 is used to connect mass storage devices, such as mass storage device 1214, to the information processing system 1200. One specific type of data storage device is an optical drive such as a CD/DVD drive, which can be used to store data to and read data from a computer readable medium or storage product such as (but not limited to) a CD/DVD 1216. Another type of data storage device is a data storage device configured to support, for example, NTFS type file system operations.
An operating system included in the main memory is a suitable multitasking operating system such as any of the Linux, UNIX, Windows, and Windows Server based operating systems. Embodiments of the present invention are also able to use any other suitable operating system. Some embodiments of the present invention utilize architectures, such as an object oriented framework mechanism, that allow instructions of the components of the operating system to be executed on any processor located within the information processing system 1200. The network adapter hardware 1210 is used to provide an interface to a network 104. Embodiments of the present invention are able to be adapted to work with any data communications connections including present day analog and/or digital techniques or via a future networking mechanism.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.