The disclosure relates to classifying information and to providing recommendations based on such classification.
The increased capability of computers to store vast amounts of on-line information has led to an increasing need for efficient data classification systems. Data classification systems are especially needed for natural language texts (e.g., articles, faxes, memos, electronic mail, etc.), where information may be unstructured and unassociated with other texts. As a result, users are forced to sift through increasing amounts of on-line text to locate relevant information. Users require classification systems that provide useful information under particular circumstances and that distinguish useful information from other information.
An exemplary application in which vast amounts of data are classified is a customer call center, or more generally a contact center. In a contact center, an agent must respond to a high volume of incoming messages. To efficiently process those messages, contact centers can use software that provides auto-suggested responses to the agent to save the agent's time in preparing a response. In order to prepare a response, the content of the incoming message may first be analyzed to determine the nature of the message. Once the nature, or problem description, has been determined, an appropriate response can be prepared.
A system is disclosed to provide content analysis services. The system includes a classifier that provides one or more recommendations based on an incoming message. The classifier uses query-based classification in combination with example-based classification to classify the content of an incoming message. The system may include a user application that allows an agent to classify, process, and respond to incoming messages.
In a contact center, for example, appropriately configured software can use the classification result to efficiently retrieve relevant data from a database and to automatically suggest responses to an agent. By limiting the automatically suggested responses to those that are likely to be incorporated into the reply, the software can reduce the time and effort required for the agent to respond to incoming messages. As such, the agent's productivity can be enhanced, and contact center costs can be reduced, by software that incorporates query-based classification. Various aspects of the system relate to analyzing the content of incoming messages.
For example, in one aspect, a method of analyzing the content of an incoming message includes classifying the incoming message using query-based classification to select at least one category that relates to the content of the incoming message. The method also includes classifying the incoming message using an example-based classification algorithm to search through a set of stored previous messages to identify at least one stored previous message that relates to the content of the incoming message. Each stored previous message is associated with at least one of the selected categories.
In another aspect, a computer program product is tangibly embodied in an information carrier. The computer program product contains instructions that, when executed, cause a processor to perform operations to analyze the content of an incoming message according to the above-described method aspect.
In still another aspect, a computer-implemented system for responding to incoming messages includes a content analysis engine that uses query-based classification to select at least one category that relates to the content of the incoming message. The content analysis engine operates according to the above-described method aspect.
In various modifications, the foregoing aspects may include identifying at least one business object that is associated with the selected category. In that case, they may further include recommending the identified at least one business object. The aspects may include identifying at least one business object that is associated with the identified stored previous message. In that case, they may further include recommending the identified at least one business object.
Classifying the incoming message using query-based classification may include evaluating content of the incoming message using pre-defined queries. The pre-defined queries are associated with each of a plurality of pre-defined categories in a categorization scheme. Classifying then also includes selecting a category for which one of the pre-defined queries evaluates as true.
Classifying the incoming message using an example-based classification algorithm may include comparing the incoming message with the set of stored previous messages, and determining which stored previous messages in the set of stored previous messages are most similar to the incoming message.
The foregoing aspects may be modified to include identifying at least one business object that is associated with the selected category, and identifying at least one business object that is associated with the identified stored previous message. In some examples, the modification further includes recommending business objects that are associated with both the selected category and the identified stored previous message. In other examples, it further includes recommending business objects that are associated with at least one of the selected category and the identified stored previous message.
The incoming message may be an email, or it may be received via Internet self-service. The foregoing aspects may also include providing a recommendation based on both the selected category and the identified at least one stored previous message. The example-based classification algorithm may be a k-nearest neighbor algorithm, or a support vector machine algorithm.
The foregoing aspects and modifications provide various features and advantages. For example, agent productivity can be increased because only the most relevant responses are automatically recommended to the agent. Accordingly, even though messages can be processed at a greater rate, the quality of the responses can be maintained or even improved. This results from the automatically suggested responses being analyzed using both query-based classification and example-based classification. The combination of these analysis techniques improves the quality of the suggested responses by filtering out irrelevant responses that either technique, by itself, would have suggested.
In a contact center with a large database of previously resolved problems, the identification process may be accelerated by pre-screening the database to limit the number of previously resolved problems that are evaluated. For example, the problem description may first be categorized using query-based classification, so that only those previously resolved problems that correspond to the same category as the problem description are evaluated. This may have the advantage of reducing the cost and time associated with searching for customer solutions.
In some implementations, the method can provide significant computational advantages. For example, if the database of stored previous messages is very large, example-based classification can delay completion of the response, substantially burden the processing resources of the enterprise computing system at the expense of other processes, or both. By using query-based classification to narrow the number of previous examples considered during the example-based classification, the computational efficiency and speed of the classification process can be dramatically improved.
Other advantages result from the combination of both query-based and example-based classification in different ways to meet different objectives. For example, the results can be combined additively to broaden the number of suggested responses. Alternatively, the results can be combined exclusively to limit the suggested responses to those that are identified by both classification techniques. Additional features and advantages will be readily apparent from the following detailed description, the accompanying drawings, and the claims.
Like reference symbols in the various drawings indicate like elements.
In general, content analysis is a step in the process of preparing a substantive response to an incoming message or request. If content analysis is automated using software, then agent productivity may be increased by auto-suggesting relevant solutions to the agent. Where the incoming message is classified based upon a content analysis, the auto-suggested solutions may be selected based on the classification. Accordingly, efficient and accurate content analysis of an incoming message is key to auto-suggesting relevant responses.
This document describes a method of content analysis that uses a combination of two algorithms. First, the content of the message is categorized using pre-defined queries associated with categories in a pre-defined categorization scheme. A category is selected if the pre-defined query associated with that category evaluates as “true.” Second, the contents of the message are compared to a database of previous requests for information, and particularly to previous messages in that database that have an association with the selected category. A previous message that is similar to the incoming message is identified using an example-based algorithm, such as k-nearest neighbor, or support vector machine. The selected category and the identified previous message can then be used to provide suggested responses for responding to the incoming message.
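By way of illustration, the two-stage analysis just described can be sketched in a few lines of Python. The sketch below is a simplified model under stated assumptions, not the system's actual modules: the similarity function is a crude stand-in for a text-mining engine, and the scheme and message structures are hypothetical.

```python
# Sketch of the two-stage content analysis: query-based classification
# selects categories, and example-based classification then searches only
# the stored previous messages associated with those categories.

def similarity(a: str, b: str) -> float:
    """Crude token-overlap similarity standing in for a text-mining engine."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def analyze(message, scheme, previous_messages, top_n=3):
    # Stage 1: query-based classification over pre-defined queries.
    selected = [c for c in scheme if c["query"](message)]
    names = {c["name"] for c in selected}
    # Stage 2: example-based classification, restricted to previous
    # messages associated with at least one selected category.
    candidates = [p for p in previous_messages if p["category"] in names]
    candidates.sort(key=lambda p: similarity(message, p["text"]), reverse=True)
    return selected, candidates[:top_n]

scheme = [{"name": "driving directions",
           "query": lambda m: "directions" in m.lower()}]
previous = [{"category": "driving directions",
             "text": "How do I get driving directions to the park?",
             "object_id": 42}]
print(analyze("Please send directions to the park", scheme, previous))
```

Restricting the candidate set before the similarity search is what gives the combined approach the computational advantage discussed below.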
For ease of understanding, content analysis software and methods will first be introduced in the context of a computing environment in which these methods may be executed. With this introduction, an overview of how content analysis may be applied is described in an exemplary system for responding to incoming e-mail messages. Then, the details of the two content analysis algorithms, namely query-based classification and example-based classification, are described in turn.
Beginning with the computing environment for content analysis, a user in a customer environment 12 can communicate over the Internet 14 with an enterprise computing system 10.
For example, the user in the customer environment 12 can send an email over the Internet 14 from the terminal 20, and the user in the enterprise computing system 10 can receive the email at the terminal 22. The enterprise computing system 10 includes software that, when executed, first analyzes the content of the incoming email, and then automates at least a portion of the process of generating a response to the incoming email. An agent who uses the terminal 22 to respond to the email can use this software, such as e-mail response management system (ERMS) software, to efficiently generate a response. In this example, the ERMS includes program modules stored in the stored information repository 24 of the enterprise computing system 10.
The stored information repository 24 also includes various databases, such as, for example, a categorization scheme database 26, a previous messages database 28, and a stored information database 30. The categorization scheme database 26 contains predefined categorization schemes, each of which includes predefined categories. The previous messages database 28 contains information about messages that have been previously received. The stored information database 30 contains various types of stored information, referenced herein as business objects, which may be used to generate responses to incoming messages from the customer.
With that introduction to the computing environment, an overview of content analysis applied in an exemplary ERMS will now be described.
The content analysis engine 37 classifies the text of the incoming e-mail 34 using both the query-based classification module 38 and the example-based classification module 39. The content analysis is performed in order to identify business objects that are relevant to generating the response e-mail 36. Accordingly, the business objects that are stored in the stored information database 30 can be identified based on the classification results.
The query-based classification module 38 uses at least one categorization scheme to categorize the textual content of the incoming e-mail 34. More than one category may be selected if the queries associated with more than one category evaluate as “true.” The selected category in this example is linked to several example documents in the example-based classification module. Each example document (or previous example) that is stored in the previous messages database 28 is in turn linked to an object ID.
The content analysis engine 37 provides output signal 40, which includes the categories selected by the query-based classification module 38, and output signal 42, which includes object IDs identified by the example-based classification module 39. The selected categories of signal 40 are linked to business objects in the stored information database 30. The identified object IDs of signal 42 are associated with business objects in the stored information database 30.
The business object selection module 44 can combine the output signals 40, 42 to select relevant business objects for use in responding to the incoming e-mail 34. The output signals 40, 42 can be combined in different ways to meet different objectives. For example, the results can be combined additively to broaden the number of suggested responses. In an additive combination, the business objects that are linked to the selected categories in the output signal 40 are combined with all of the business objects associated with the identified object IDs in the output signal 42. The resulting additive combination includes all business objects that are either linked to a selected category or associated with an identified object ID. As such, the number of business objects tends to be relatively large. Alternatively, the results can be combined exclusively to reduce the number of business objects selected by the business object selection module 44. An exclusive combination includes only those business objects that are linked to the previous examples linked to the selected categories. This means that the previous examples that are to be used to identify business objects during an example-based classification are filtered by the categories selected during a query-based classification. Using exclusive combinations, the number of business objects selected by the business object selection module 44 may be reduced.
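By way of illustration, if each output signal has been resolved to a set of business object identifiers, the two combination modes reduce to simple set operations. The sketch below is a hypothetical simplification: the exclusive combination described above filters previous examples by selected category before collecting their objects, which an intersection of identifier sets approximates.

```python
# Hypothetical sketch of the business object selection module's two
# combination modes, assuming output signals 40 and 42 have been
# resolved to sets of business object identifiers.

def combine_additive(category_objects: set, example_objects: set) -> set:
    """Union: every object linked to a selected category OR associated
    with an identified object ID. Tends to yield a relatively large set."""
    return category_objects | example_objects

def combine_exclusive(category_objects: set, example_objects: set) -> set:
    """Intersection-style filtering: only objects supported by both the
    query-based and the example-based classification results."""
    return category_objects & example_objects

category_objects = {"template-7", "solution-3", "expert-1"}
example_objects = {"solution-3", "template-9"}
print(combine_additive(category_objects, example_objects))   # broadened set
print(combine_exclusive(category_objects, example_objects))  # {'solution-3'}
```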
The content analysis engine 37 thus uses the results of both classification modules to identify relevant business objects for responding to the incoming e-mail 34.
With that overview of content analysis using the combination of query-based classification and example-based classification, the detailed operation of each of these two classification algorithms will next be described in turn.
Query-Based Classification
Turning first to query-based classification, the following discussion describes its details so that the operation of the query-based classification module 38 may be better understood.
The selection of categories during content analysis depends on the structural details of the categorization scheme itself. The structures of two exemplary categorization schemes that may be used in the query-based classification module 38 will now be described.
Referring to the relationship between business processes and categorization, a plurality of business process steps 100 may each make use of one or more categorization schemes 105, and each categorization scheme 105 includes categories 110 that are linked to business objects 115.
Accordingly, the categorization schemes 105 relate business objects 115 to the business process steps 100. By defining these associations, categorization schemes reflect relationships between business processes and resources (i.e., business objects), especially stored information, in the enterprise computing system 10. Moreover, if a categorization scheme 105 identifies a selected category from among the categories 110 that subsequently provides relevant BOs 115 to more than one business process step 100, then that categorization scheme 105 may be referred to as a “coherent” categorization scheme. In a business application that includes coherent categorization, a single categorization may be used to provide business objects to multiple business process steps within the business application. As such, the categorization schemes 105 may reflect relationships across multiple business processes.
For example, an interaction reason categorization scheme 135 may include a LEGOLAND® category 160 having child categories such as an events category 200, a driving directions category 210, and a building instructions category 220.
By way of example, each of the categories 200, 210, and 220 is linked to relevant business objects within the business objects 115. For example, the events category 200 has a link 225 to a set of business objects 230. Similarly, the driving directions category 210 is linked to a set of business objects 240, and the building instructions category 220 is linked to a set of business objects 250.
As has been previously suggested, the sets 230, 240, 250 of business objects are selected from available business objects as being relevant to the categories to which they are linked. As such, the number of business objects of a particular type that are included within the particular set of business objects linked to a category can vary based on the number of business objects that are available. For example, the number of experts that are included in the set of linked business objects 230, 240, 250 depends upon the availability of subject matter experts who have knowledge relevant to the appropriate category. Similarly, the numbers of quick solutions 48 and response templates 50 that are included in a set of linked business objects 230, 240, 250 depend upon the stored contents of, for example, a knowledge base within the stored information repository 24.
Accordingly, if the interaction record business process step 120 is being performed in the presence of an input signal 30, then the content of the input signal 30 will determine how the categorization scheme 135 is navigated. If the content of the input signal 30 relates to driving directions to LEGOLAND®, then the categorization scheme would be navigated through the link 155 to the LEGOLAND® category 160, and through the link 205 to the driving directions category 210. If the ERMS business process step 125 is subsequently performed while responding to the same input signal 30, then the business process step 125 will automatically receive business objects that relate to the chosen driving directions category 210 from the set of business objects 240.
Thus, in the foregoing example, the performance of the interaction record business process step 120 categorizes the input signal 30 to select and use the driving directions category 210. The selected category may subsequently be used by a later business process step, in this example, the ERMS process step 125. Accordingly, the exemplary categorization scheme just described exhibits coherency because a selected category identified in one step of a business process can be used to perform a subsequent business process step.
Additional structural detail of a categorization scheme in accordance with the foregoing categorization schemes will now be described.
Each of the linked business objects 44 is linked to the selected category 410 by a unique link. Individual experts 46a, 46b, and 46c are linked to the selected category 410 by links 47a, 47b, and 47c, respectively, of the “is_expert” type. Individual quick solutions 48a, 48b are linked to the selected category 410 by links 49a, 49b, respectively, of the “is_solution” type. Individual response templates 50a, 50b, and 50c are linked to the selected category 410 by links 51a, 51b, and 51c, respectively, of the “is_response_template” type. Accordingly, one way to modify the categorization scheme is to modify the links 47, 49, or 51.
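For illustration, a category and its typed links might be modeled as follows. The class names and fields are hypothetical stand-ins for the stored schema; only the link types (“is_expert,” “is_solution,” and “is_response_template”) come from the description above.

```python
# Hypothetical representation of a selected category and its typed links
# to business objects, mirroring the "is_expert", "is_solution", and
# "is_response_template" link types described above.

from dataclasses import dataclass, field

@dataclass
class Link:
    link_type: str   # e.g. "is_expert", "is_solution", "is_response_template"
    object_id: str   # identifier of the linked business object

@dataclass
class Category:
    name: str
    links: list = field(default_factory=list)

    def objects_of_type(self, link_type: str) -> list:
        """Return IDs of business objects attached via the given link type."""
        return [link.object_id for link in self.links if link.link_type == link_type]

category = Category("driving directions", [
    Link("is_expert", "expert-46a"),
    Link("is_solution", "solution-48a"),
    Link("is_response_template", "template-50a"),
])
print(category.objects_of_type("is_response_template"))  # ['template-50a']
```

Modifying the categorization scheme then amounts to adding or removing Link entries, which parallels modifying the links 47, 49, or 51 described above.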
Use of the categorization schemes to classify the content of an incoming message will now be described with reference to flowchart 500, which illustrates an exemplary query-based classification process.
Starting at 510, the contents of the incoming message 30 are retrieved at 512. The contents may be retrieved, for example, from a memory location in which the message was initially stored. All categorization schemes that are to be used to evaluate the retrieved contents are retrieved at 514. In general, categorization schemes will be retrieved from the categorization scheme database 26. A first categorization scheme is then selected, and the top-level categories of the selected scheme are set as the current set of categories at 518.
After setting the current set of categories, the first category in the set is selected at 520. Predefined content queries associated with the selected category are evaluated against the content of the incoming message at 522. If the content matches the queries at 524, then the matching category is added to a results list at 526. The children, if any, at the next lower level of the selected category are assigned at 528 to be the current set of categories within the new recursion step that is started at 530.
Each of these children is evaluated in a recursive fashion by looping back to step 520 until no matching categories are found. In effect, this recursion loop may be described as navigating from the top level of categories of the hierarchical categorization scheme to successive matching child categories. A matching category is added to the result list at 526 if all its parent categories are matching.
After the recursive evaluation started at 530 has finished, or if no match has been found for the content queries at 524, then the next (i.e., neighbor) category on the same level in the current set of categories is selected at 532. If more categories require evaluation, then the flow loops back to the evaluation step 522. However, if no categories remain to be evaluated in the current set of categories, then the result list for the selected categorization scheme is added to the query-based classification result at 534.
If another categorization scheme remains to be evaluated, then the next categorization scheme is selected at 536, and control loops back to step 518. However, if no more of the retrieved categorization schemes remain to be evaluated, then the query-based classification result is returned at 538. After returning this result for use by the business application, the process of classifying content of the received message is completed at 540.
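The recursion of flowchart 500 can be sketched as follows. The dictionary structure and the per-category query predicates are illustrative assumptions, with the flowchart's step numbers noted in comments; as described above, a child category is evaluated only after its parent's queries have matched.

```python
# Sketch of the recursive query-based classification of flowchart 500.
# Each category's "query" stands in for evaluating its pre-defined
# content queries against the incoming message.

def classify(message, categories, results=None):
    """Depth-first walk from the top-level categories; children are
    evaluated only when their parent category matched (steps 520-530)."""
    if results is None:
        results = []
    for category in categories:                       # select category (520/532)
        if category["query"](message):                # evaluate queries (522/524)
            results.append(category["name"])          # add to result list (526)
            # Recurse into the children of a matching category (528/530).
            classify(message, category.get("children", []), results)
    return results                                    # classification result (534/538)

scheme = [{
    "name": "LEGOLAND",
    "query": lambda m: "legoland" in m.lower(),
    "children": [
        {"name": "driving directions",
         "query": lambda m: "directions" in m.lower()},
        {"name": "events",
         "query": lambda m: "event" in m.lower()},
    ],
}]
print(classify("Please send driving directions to LEGOLAND", scheme))
# ['LEGOLAND', 'driving directions']
```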
The result returned by the process of flowchart 500 is a set of categories that have been selected. As will be described in detail below, the selected categories may then be used to identify linked business objects for responding to the incoming message.
Following the query-based classification procedure of flowchart 500, business objects that are linked to the selected categories may be retrieved for use in performing a business process step.
It is not necessary that all business processes that are performed use the same business objects. Although multiple business objects may be linked to the selected category (or selected categories), the business process may be configured to filter out all but the most relevant types of business objects.
The foregoing steps of flowchart 500 may be performed, for example, as part of an ERMS business process, as will now be described.
In an exemplary ERMS business process 600, an incoming email 610 is received and a response 612 is prepared. A content analysis step 614 analyzes the content of the incoming email 610 against categorization schemes stored in a categorization scheme repository 618 and produces a suggested category 615.
The suggested category 615 is automatically presented to a user in a categorization step 616. The categorization step 616 corresponds to the selection process of flowchart 500, described above, and its end result is the selected category 620.
The selected category 620 determines which API 622 is used to display the linked business objects. The API 622 defines, for example, the inheritance rules for displaying business objects. Inheritance rules may optionally be used to cause the display of business objects that are directly and/or indirectly linked to the selected category. For example, the inheritance rules may be configured to cause the display of all objects that are linked to the children of the selected category in addition to the objects directly linked to the selected category. In addition, the inheritance rules may optionally be configured to display business objects linked to parent categories of the selected category. The API 622 is typically configured when the software is installed in the enterprise computing system, and may be modified through maintenance. Accordingly, the API 622 can display business objects linked to parents and/or children of the selected category 620, in addition to the business objects in the set of linked business objects 624 that are directly linked to the selected category 620. The linked business objects 624 correspond to the linked business objects 44 described above.
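For illustration, inheritance rules of the kind the API 622 applies might be sketched as follows; the dictionary structure and flag names are hypothetical, not the configured API.

```python
# Hypothetical sketch of inheritance rules for collecting displayable
# business objects: objects directly linked to the selected category,
# optionally plus objects linked to its children and/or its parent.

def collect_objects(category, include_children=True, include_parents=False):
    objects = list(category["objects"])               # directly linked objects
    if include_children:
        for child in category.get("children", []):
            objects += collect_objects(child, include_children, False)
    if include_parents and category.get("parent") is not None:
        objects += category["parent"]["objects"]      # parent's direct objects
    return objects

child = {"objects": ["solution-48b"], "children": []}
selected = {"objects": ["template-50a"], "children": [child], "parent": None}
print(collect_objects(selected))  # ['template-50a', 'solution-48b']
```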
The linked business objects 624 represent stored information that is relevant to performing the ERMS business process 600, and specifically to responding to the incoming email 610. For example, the experts 46 may identify a business partner who has special expertise that relates to the content of the incoming email 610. The quick solutions 48 may include documents that address the customer's questions in the email. In addition, the response templates 50 may provide the text of a reply email message so that the agent receives a prepared draft of a reply message.
Using these linked business objects 624, an agent can use an email editor 626 to finalize the response 612. Optionally, the agent may use other viewsets 628 to perform other steps in finalizing the response 612. For example, the agent may use one of the other viewsets 628 to attach a document that is one of the quick solutions 48 in the linked business objects 624. The agent may also involve a subject matter expert in the response 612 by using an expert 46 in the linked business objects 624 to contact the subject matter expert.
In the final step of the ERMS business process 600, the agent ends the contact 630 by, for example, sending the response 612 in the form of an email. Additional processes may be initiated as the contact is ended at 630. In this example, the 1-order repository 632 may record information about the just completed ERMS business process 600 for later uses. In other implementations, information about the transaction may be passed to other business processes within the enterprise system 10 for purposes such as, for example, reporting, monitoring, quality control, and the like.
The just described exemplary ERMS business process 600 may include a number of business process steps that, when performed together, constitute a system for responding to customer emails, and particularly business processes that are capable of supporting a large volume of interactions. Such business processes provide capabilities to interact with customers by e-mail, telephone, mail, facsimile, internet-based chat, or other forms of customer communication. Such business processes may be manual, partially automated, or fully automated. Business processes that include automation generally use computers, which, in some implementations, take the form of enterprise computing systems that integrate and perform multiple business processes.
In the foregoing example, business objects are linked to a selected category, and the business objects are used to perform a step in responding to the incoming message. The step may be performed once per incoming message, or as many times as the run-time user provides an input command to perform that business process step. As such, user input determines which business process steps are performed in the presence of a particular incoming message. Whether multiple processes are performed or not, the categorization is coherent if multiple business process steps are configured to be able to use business objects linked to a selected category.
In this implementation, the content analysis step 614 involves selecting a category based upon the content of the incoming email 610. The content of the email 610 may first be analyzed by, for example, a text-mining engine. In some implementations, the content analysis step 614 may include identifying key words in the header or body, for example, of the incoming email 610. Key words may include words, phrases, symbols, and the like, that are relevant to performing the categorization.
As will be shown below, the linked business objects 624 may be displayed to the agent for use in preparing the response 612.
The linked business objects 624 that are displayed can be of at least three types. One type is an expert 46. Experts provide contacts and referrals to human resources who can provide knowledge and support that relates to the selected category 620. Referral of a request in an incoming email 610 to one or more experts 46 may constitute part of preparing the response 612. An expert may be, for example, a business partner (e.g., an independent contractor) who has a business relationship with the enterprise, although not necessarily an employee relationship. A second type of linked business object 624 is a quick solution 48. Quick solutions 48 refer to stored business objects that contain information responsive to the incoming email 610. Quick solutions 48 include documents that directly contain the responsive information, as well as pointers to other sources of such direct information, such as, for example, internet hyperlinks, website addresses, and uniform resource locators (URLs). A third type of linked business object 624 is a response template 50 that may be incorporated into the email editor 626 for the purpose of providing the agent pre-formatted, predefined content for an email. These response templates save the agent time in drafting the content of a response to each incoming email 610, thereby promoting the efficient performance of the ERMS business process 600. Both quick solutions 48 and response templates 50 may be stored in a knowledge base or other information storage container (e.g., the stored information repository 24).
In the step of using the email editor 626 to finalize the response 612, the agent can review and edit the email. In addition, the user may also identify information, such as a quick solution 48 (e.g., documents or links to internet-based resources), and attach it to the email. Although the described implementation refers to preparing a response in the form of a reply email to the customer, other implementations may be used. For example, if an email is prepared, the email may be addressed to the customer who initiated the incoming email 610, or to an expert 46, or to both. However, the response 612 need not be in email form. By way of example, the response 612 may be in the form of a return phone call, facsimile, letter, or other action that may be internal or external to the enterprise system 10. If the incoming email 610 is a purchase order, for example, the response 612 may comprise an internally-generated sales order (via the 1-order repository 632) that ultimately results in the response 612 taking the form of a delivery of goods or services to the customer.
Depending upon the specific business process step that is being performed, the agent could also use the other viewsets 628 to finalize the response 612. The other viewsets 628 may be displayed as part of a graphical user interface (GUI), as will be shown in the examples below.
In implementations that are computer-based, portions of the business process steps to prepare the response 612 to the incoming email 610 may be automated. For example, the categorization scheme repository 618 may be stored in a memory location, such as a disk drive, random access memory (RAM), or other equivalent media for storing information in a computer system. In the end contact step at 630, for example, the results at the conclusion of the ERMS business process 600 may be stored in a memory location, such as in a 1-order repository 632, for subsequent use. In the categorization step 616, as a further example, the process of categorizing may be automated, for example, according to flowchart 500, described above.
In this example, the agent has first entered information into the interaction record viewset 718 based upon the agent's analysis of the text 712 of the incoming message. The agent has specified that the reason for the e-mail relates to directions, that the priority of the interaction is medium, and that the e-mail may be described as relating to directions to LEGOLAND®. As one step of the ERMS business process, the information entered into the interaction record viewset 718 may be stored within the enterprise system 10 for later use.
The information that the agent has entered into the interaction record viewset 718 provides the basis for performing a categorization using a categorization scheme. Given the above-entered information, and with reference to the interaction reason categorization scheme 135, the driving directions category 210 under the LEGOLAND® category 160 may be selected.
In this example, an analysis of the content of the email has identified that the incoming email request relates to driving directions. In response, the DDLB 730 displays a list of suggested standard responses that are linked to the selected driving directions category 210. The suggested responses include the response templates 50 from the linked set of business objects 240. As such, the suggested responses displayed in the DDLB 730 derive from a categorization based on the text 712 of the incoming email.
Although, in the foregoing example, the agent selected one of the suggested response templates 50, the agent could have made other choices. For example, the agent could have selected the “More Responses . . . ” from the DDLB 730 to display other business objects that are not linked to the selected driving directions category 210. Alternatively, the agent could have selected more than one of the response templates 50 for inclusion in the reply email.
The ERMS business process step 125 of replying to an e-mail has been performed. The agent has manually categorized the content of the incoming email using the interaction reason categorization scheme 135. After the agent selected the driving directions category 210, a response template 50 linked to that selected category 210 was included in the response. In addition, the selected driving directions category 210 was also used to perform the interaction record business process step 120. Accordingly, the interaction reason categorization scheme 135 is coherent in this example because the selected category 210 was used to perform both the ERMS business process step 125 and the interaction record business process step 120.
In the search results area 934, a list of search results is displayed. In this example, two search results are displayed, each of which corresponds to a quick solution 48 document.
In the foregoing example, two business process steps have been performed using business objects linked to a single selected category. The selected building instructions category 220, which was initially selected during the performance of the interaction record business process step 120, has been used in the ERMS business process to perform the step of attaching a suggested quick solution 48 to the reply e-mail, and to perform the step of inserting a suggested response template 50 into the reply e-mail.
The foregoing examples have illustrated how quick solutions 48 and response templates 50 are types of linked business objects 44 that may be used to perform a business process step. As has been described above, experts 46 are another type of business object that can be linked to a selected category. In an ERMS business process, for example, using an expert 46 involves routing an electronic message to notify and to inform a human expert about the incoming message. Each human expert has the capability to respond to certain categories of incoming messages. The capability of each human expert determines which categories are linked to each expert 46. Because experts that can provide high quality responses are limited resources, and because retaining experts can be costly to an enterprise, the efficient allocation of the time of experts is an important factor in enterprise system cost and quality. Accordingly, the ability to refer only appropriate incoming messages to experts, or to route incoming messages to the appropriate experts, is important.
Creating Categorization Schemes
In order to be able to perform the above-described query-based classification, a categorization scheme must first be defined. One convenient method of defining a categorization scheme that may be used by the query-based classification module 38 is through a user interface 1000, which includes a categorization area 1001 and a linking area 1002.
Accordingly, the categorization area 1001 in the user interface 1000 serves as a tool to enter, modify, and display categorization schemes. As can be appreciated, the categorization area 1001 is used to define various links that structure the hierarchical relationships within the categorization scheme.
In the linking area 1002, a number of tabs are provided to display various fields related to a user-highlighted category in the categorization area 1001. In this example, the driving directions category 210 is the highlighted category in the categorization area 1001.
In the query viewset tab 1003, two rows of query criteria are shown. Elements for defining a query may be entered into columnar fields defined in a first row 1004 and a second row 1006. In the first row 1004, a match column 1005 includes a leading “if” statement. In the second row 1006, the match column 1005 includes a user-selectable drop-down list box (DDLB) into which the user can select various conditional conjunctions such as, for example, “and,” “or,” and “nor.” The conjunction provides the logical operation that connects queries in the rows 1004, 1006. If at run-time, for example, the row query 1004 evaluates as “true” and if the row query 1006 evaluates as “false,” and if the conjunction 1005 in the row 1006 is “or,” then the complete query will evaluate as “true.” However, if the conjunction 1005 in the row 1006 is “and,” then the complete query will evaluate as “false.” If the complete query for a category evaluates as “true,” then the content of the e-mail “corresponds” to that category. On the other hand, if the complete query evaluates as “false,” then the content does not correspond to that category.
Columnar fields in each row define the row queries for rows 1004, 1006. An attributes column 1007 provides a DDLB through which the user can identify attributes that are to be evaluated using the query defined in that row. For example, if the query of an e-mail relates to information contained in both the subject line and the body of the email, each row query can evaluate the content of the subject line, the body, or both. In this example, the row 1004 will evaluate “subject and body,” while the row 1006 query evaluates only the “subject.”
As described above with reference to flowchart 500, the complete query defined for a category is evaluated at run-time against the retrieved content of the incoming message.
An operator column 1008 provides a DDLB through which the user can define the relational operator to be used to evaluate the query in that row. For example, the operator column 1008 may include operators such as equality, inequality, greater than, less than, sounds like, or includes. A value column 1009 provides a field in each of rows 1004, 1006 into which the user can enter values for each row query. If the attribute 1007 and the value 1009 in a row query have the relationship of the selected operator 1008, then that particular row will evaluate as “true.” If the attribute 1007 and the value 1009 do not have the relationship of the selected operator 1008 for a particular row, then that particular row will evaluate as “false.” Each row is connected to the previous row or to the subsequent row through a logical match operator 1005, such as “and,” “or,” and “nor.” Although only two rows 1004, 1006 are shown in this example, other rows may be entered using the scroll keys 1011. A case column 1012 provides a check box which, when checked, makes the query in that row case sensitive.
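By way of illustration, the evaluation of row queries might be sketched as follows. The operator set, the attribute names, and the interpretation of the “nor” conjunction are illustrative assumptions; a deployed system would supply the full operator list (equality, inequality, greater than, less than, sounds like, includes) described above.

```python
# Sketch of evaluating row queries: each row tests one or more message
# attributes against a value with a relational operator, and rows are
# chained with "and"/"or"/"nor" match operators.

OPERATORS = {
    "equals": lambda attr, value: attr == value,
    "includes": lambda attr, value: value in attr,
}

def evaluate_row(message, row):
    attr = " ".join(message[a] for a in row["attributes"])  # e.g. subject, body
    value = row["value"]
    if not row.get("case_sensitive", False):                # case check box 1012
        attr, value = attr.lower(), value.lower()
    return OPERATORS[row["operator"]](attr, value)

def evaluate_query(message, rows):
    result = evaluate_row(message, rows[0])                 # leading "if" row
    for row in rows[1:]:
        row_result = evaluate_row(message, row)
        if row["match"] == "and":
            result = result and row_result
        elif row["match"] == "or":
            result = result or row_result
        elif row["match"] == "nor":                         # assumed: neither holds
            result = not (result or row_result)
    return result

message = {"subject": "Directions please", "body": "How do I reach the park?"}
rows = [
    {"attributes": ["subject", "body"], "operator": "includes", "value": "directions"},
    {"match": "or", "attributes": ["subject"], "operator": "includes", "value": "address"},
]
print(evaluate_query(message, rows))  # True: the content "corresponds"
```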
Query-based classification may be implemented using the XML language. Both example-based and query-based classification may use a search engine to extract content from a message that is to be evaluated. For example, a natural language search engine may be used to identify text from the subject line of an email message for evaluation against queries defined for categories in a categorization scheme. Accordingly, a commercially available text search engine may be used to perform the step of retrieving content to be classified, as in step 512 of flowchart 500, described above.
Example-Based Classification
The foregoing discussion described the details of query-based classification and the operation of the query-based classification module 38. The following discussion describes the details of example-based classification so that the operation of the example-based classification module 39 may be understood.
Example-based classification can most easily be described in the context of an application, such as an ERMS application. An exemplary ERMS includes a knowledge base 1010, a repository for collected examples 1020, a maintainer user interface 1030, a text-mining index 1050, a classifier 1060, and a user application 1131, each of which is described below.
In one example, the knowledge base 1010 stores authoritative problem descriptions and corresponding solutions. The stored information in the knowledge base 1010 is manually maintained. The manual maintenance is typically performed by a knowledge administrator (or knowledge engineer) 1130, who may edit, create, and delete information contained in this knowledge base 1010. Because the stored information in the knowledge base is manually maintained, it may be referred to as authoritative. For example, an authoritative problem description may be a problem description submitted by a customer and then manually added to the knowledge base 1010 by the knowledge engineer 1130. As such, the authoritative problem description is manually maintained as part of a knowledge management process, and may represent, for example, what a customer has requested in the past, or is expected to request in the future.
In contrast to an authoritative problem description, a request for information is contained in an incoming message. The request for information may include a problem description, which is that part of the content that the system can classify and respond to.
Initially, a problem description in an incoming request for information is non-authoritative. A non-authoritative problem description has not been incorporated into the knowledge base 1010 according to the knowledge management process. However, a non-authoritative problem description may be semantically equivalent to an authoritative problem description if the descriptions express the same facts, intentions, problems, and the like. For example, the following problem descriptions may be semantically equivalent: “my hard disk crashed,” “my hard drive had a crash,” and “my disk lost all data due to a crash.” Because each description describes the same problem using different words, the descriptions are semantically equivalent.
A problem description may be referred to herein as non-authoritative and semantically equivalent if it 1) has not been formally incorporated into the knowledge base 1010 by the knowledge engineer, and 2) describes the same problem as an authoritative problem description (i.e., one that is incorporated into the knowledge base 1010), but using different words. For example, a customer's email that describes a “hard disk failure” is non-authoritative when it is received, and may remain so unless the knowledge engineer 1130 subsequently incorporates it into the knowledge base 1010. Moreover, the problem description may be semantically equivalent to an authoritative problem description, “computer disk crash,” stored in the knowledge base 1010, because both have the same meaning.
Each problem description and corresponding solution stored in knowledge base 1010 represents a particular class of problems and may be derived from a previous request for information. Accordingly, each problem description and its corresponding solution stored in knowledge base 1010 may be referred to as a class-center.
A repository for collected examples 1020 is provided that stores non-authoritative semantically equivalent problem descriptions and pointers to corresponding solutions stored in knowledge base 1010. Each non-authoritative semantically equivalent problem description and pointer may be referred to as a class-equivalent and may be derived from a previous request for information. In one implementation, class-equivalents may be determined by an expert 1110 or by an agent 1120. For example, in a call center context, the expert 1110 may be an individual familiar with the subject topic of an unclassified problem description. Although only a single expert and a single agent are described here, any number of experts and agents may use the system.
A maintainer user interface 1030 may be provided that allows a user to edit problem descriptions stored in both the repository of collected examples 1020 and the knowledge base 1010. For example, a knowledge engineer 1130 may use the interface 1030 to post-process and maintain both the class-equivalents stored in the collected examples repository 1020, and the class-centers stored in knowledge base 1010. In one implementation, the knowledge engineer 1130 may be responsible for creating additional class-equivalents and editing unclassified problem descriptions to better serve as class-equivalents. In other implementations, the collected examples repository 1020 and the knowledge base 1010 may be maintained automatically.
For ease of understanding, the details of the maintainer user interface 1030 will now be briefly described.
The maintainer user interface 1030 provides save functions 1144, 1146 that store edited problem descriptions in knowledge base 1010 and equivalent problem descriptions in the collected examples repository 1020. The maintainer user interface may provide create functions 1148, 1150 that generate class-centers in the knowledge base 1010 and class-equivalents in the collected examples repository 1020, respectively. The maintainer user interface 1030 may also provide delete functions 1152, 1154 to remove class-centers from the knowledge base 1010 and class-equivalents from the collected examples repository 1020, respectively, and a reassign function 1156 that may reassign an already associated class-equivalent to another class-center.
The maintainer user interface 1030 also may provide state information regarding class-equivalents stored in the collected examples repository 1020. The state of a class-equivalent may be, for example, “valuable” or “irrelevant.” The knowledge engineer may decide which of the collected examples are “valuable” by accessing a state pull-down menu 1158 associated with each class-equivalent and selecting either the “valuable” or “irrelevant” option.
Referring again to the content analysis system, a text-mining index 1050 stores feature vectors for the class-centers in the knowledge base 1010 and for the class-equivalents in the collected examples repository 1020, and a classifier 1060 uses the text-mining index 1050 to classify incoming problem descriptions.
A user application 1131 provides access to problem descriptions and solutions in knowledge base 1010 and collects class-equivalents for storage in the repository for collected examples 1020. In one implementation, the system may be used by expert 1110 and agent 1120 to respond to incoming customer messages. In other implementations, user application 1131 may be provided directly to customers for suggested solutions.
The user application 1131 provides an e-mail screen 1070 and a solution search display 1105 comprising a manual search interface 1090, a solution cart component 1100, and a search result area 1080, which displays auto-suggested solutions as well as solutions from the manual search interface 1090.
The user application 1131 may be used either by an expert 1110, an agent 1120, or both, to respond to problem descriptions. Although only a single expert and a single agent are described here, any number of experts and agents may use the system.
In an illustrative example, a customer may send a request for information including a problem description to the system via an electronic message. An e-mail screen 1070 may be implemented where the agent 1120 may preview the incoming electronic message and accept it for processing. Once an incoming message has been accepted, the classifier 1060 of the content analysis system may be invoked automatically and suggest one or more solutions from knowledge base 1010 using text-mining index 1050. In one implementation, the system may automatically respond to the incoming message based upon a level of classification accuracy calculated by the classifier 1060. In other examples, expert 1110 and agent 1120 may respond to the incoming message based upon one or more solutions recommended by the classifier 1060. The user application 1131 thus uses the e-mail screen 1070 to display electronic messages to the agent 1120.
For situations in which the recommended solutions 1170 do not adequately match the problem description from the incoming message, the solution search display 1105 includes a manual search interface 1090.
The solution search display 1105 also provides a class-score 1172 to indicate the text-mining similarity of the recommended solutions 1170 to the incoming message. In addition, the solution search display 1105 also may provide drilldown capabilities whereby selecting a recommended solution in the search result area 1080 causes the detailed problem descriptions and the solutions stored in the knowledge base 1010 and identified by the classifier 1060 to be displayed.
A solution cart component 1100 of the solution search display 1105 may be used to collect and store new class-equivalent candidates into the collected examples repository 1020, and to cause selected solutions to appear on the e-mail screen 1070 in the attachment area 1164.
Referring back to the classifier 1060, example-based classification may be performed using a k-nearest-neighbor algorithm, in which an incoming problem description is represented as a vector of text-mining features.
The classifier 1060 may calculate the distance between the vector representing the customer problem and each vector stored in text-mining index 1050. The distance between the vector representing the customer problem description and vectors stored in text-mining index 1050 may be indicative of the similarity or lack of similarity between problems. The k vectors stored in text-mining index 1050 (i.e. class-centers and class-equivalents) with the highest similarity value may be considered the k-nearest-neighbors and may be used to calculate an overall classification accuracy as well as a scored list of potential classes matching a particular problem description.
Use of the k-nearest neighbor algorithm to perform example-based classification is illustrated using flowchart 1200.
The classifier 1060 transforms the message into a vector of features at 1204 and may calculate a classification result at 1206 that includes a list of candidate classes with a class-weight and a class-score for each candidate class, as well as an accuracy measure for the classification given by this weighted list of candidate classes.
For each neighbor d_i (where i = 1, . . . , k), the text-mining search engine yields the class c_i to which the neighbor is assigned, and a text-mining score s_i, which is a measure of the similarity between the neighbor and the unassociated message. Within the k-nearest-neighbors of the unassociated message, only κ ≦ k distinct candidate classes γ_j (where j = 1, . . . , κ) are present.
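For illustration, a k-nearest-neighbor lookup of this kind can be sketched over a bag-of-words vector space. The vectorize and cosine functions below are crude stand-ins for the text-mining index 1050 and its similarity measure, and the data are hypothetical.

```python
# Sketch of k-nearest-neighbor retrieval over a bag-of-words vector
# space standing in for the text-mining index 1050. Each stored entry
# is a (class, text) pair covering class-centers and class-equivalents.

import math
from collections import Counter

def vectorize(text: str) -> Counter:
    return Counter(text.lower().split())            # feature vector (step 1204)

def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[t] * v[t] for t in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def k_nearest(message: str, index: list, k: int = 3):
    """Return the k most similar entries as (class c_i, score s_i) pairs."""
    query = vectorize(message)
    scored = [(c, cosine(query, vectorize(text))) for c, text in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

index = [("disk-crash", "my hard disk crashed"),
         ("disk-crash", "my disk lost all data due to a crash"),
         ("printer", "printer will not print")]
print(k_nearest("hard drive had a crash", index, k=2))
```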
Based on the above information about the k-nearest-neighbors, the classifier 1060 may calculate the classification result. In one implementation, the classification result may include a class-weight and a class-score for each candidate class.
The class-weight w_j may measure the probability that a candidate class γ_j identified in text-mining index 1050 is the correct class for the classification. In one implementation, class-weights are proportional to the text-mining scores and may be calculated using the following formula (for each j in the set of 1, . . . , κ):

w_j = ( Σ_{i : c_i = γ_j} s_i ) / ( Σ_{i=1}^{k} s_i )
In other implementations, class-weights also may be calculated using text-mining ranks from the text-mining search, assuming the nearest-neighbors d_i are sorted descending in text-mining score. Class-weights proportional to text-mining ranks may then be calculated, for example, using the following formula (for each j in the set of 1, . . . , κ):

w_j = ( Σ_{i : c_i = γ_j} (k + 1 − i) ) / ( Σ_{i=1}^{k} (k + 1 − i) )
The classifier 1060 also may calculate an accuracy measure σ that may be normalized (i.e. 0≦σ≦1) and that signifies the reliability of the classification.
Class-weights also may relay information regarding how candidate classes γ_j are distributed across the nearest-neighbors and may be used as a basis to calculate an accuracy measure. For example, normalized entropy may be used in combination with the definitions of class-weights, giving, for example, the following formula for classification accuracy:

σ = 1 + ( Σ_{j=1}^{κ} w_j log w_j ) / log κ

With this definition, σ equals one when a single candidate class dominates and equals zero when the class-weights are uniformly distributed.
The global accuracy measure may take into account all classes (e.g., by normalizing with the logarithm of the total number of classes), while the local accuracy measure may account only for the κ classes present in the k-nearest-neighbors.
The classifier 1060 may also calculate class-scores which may be displayed to expert 1110 and agent 1120 to further facilitate understanding regarding candidate classes and their relatedness to the unassociated message. In contrast to the normalized class-weights, class-scores need not sum to one if summed over all candidate classes.
For example, if the focus of the user is on classification reliability, the classifier 1060 may set the class-score equal to the class-weights. Alternatively, if the focus of the user is on text-mining similarity between candidate classes and the unassociated message, the classifier 1060 may allow the class-score to deviate from the class-weights. In one implementation, the class-score t_j may be calculated as an arithmetic average of the text-mining scores per class using the following formula, where k_j denotes the number of the k-nearest-neighbors assigned to class γ_j (for each j in the set of 1, . . . , κ):

t_j = ( Σ_{i : c_i = γ_j} s_i ) / k_j
In another implementation, class-score may be calculated as the weighted average of the text-mining scores per class, using the scores themselves as weights, for example (for each j in the set of 1, . . . , κ):

t_j = ( Σ_{i : c_i = γ_j} s_i² ) / ( Σ_{i : c_i = γ_j} s_i )
In other implementations, class-score may be calculated as a maximum of text-mining scores per class using the following formula (for each j in the set of 1, . . . , κ):

t_j = max_{i : c_i = γ_j} s_i
The class-score calculated by the arithmetic average may underestimate the similarity between the class and the unassociated message if the variance of the text-mining scores in the class is large. In contrast, the class-score calculated as a maximum text-mining score per class may overestimate the similarity. The class-score calculated as the weighted average may be a value between these extremes. Although three class-score calculations have been disclosed, classifier 1060 may support additional or different class-score calculations.
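Taken together, the class-weights, the alternative class-scores, and a local accuracy measure can be computed from the list of (c_i, s_i) pairs returned for the k-nearest-neighbors. The sketch below follows the formulas given above, which are themselves reconstructions from the surrounding text, so it is an assumption-laden illustration rather than the exact calculation.

```python
# Sketch computing class-weights, three alternative class-scores, and a
# local accuracy measure from k-nearest-neighbor results (c_i, s_i).

import math
from collections import defaultdict

def classification_result(neighbors):
    """neighbors: list of (class c_i, text-mining score s_i) pairs."""
    per_class = defaultdict(list)
    for c, s in neighbors:
        per_class[c].append(s)

    total = sum(s for _, s in neighbors)
    weights = {c: sum(v) / total for c, v in per_class.items()}

    # Arithmetic-average, score-weighted-average, and maximum class-scores.
    scores = {c: {"mean": sum(v) / len(v),
                  "weighted": sum(x * x for x in v) / sum(v),
                  "max": max(v)}
              for c, v in per_class.items()}

    # Local accuracy: one minus the normalized entropy of the class-weights.
    kappa = len(per_class)
    if kappa > 1:
        entropy = -sum(w * math.log(w) for w in weights.values() if w > 0)
        accuracy = 1.0 - entropy / math.log(kappa)
    else:
        accuracy = 1.0
    return weights, scores, accuracy

neighbors = [("disk-crash", 0.9), ("disk-crash", 0.7), ("printer", 0.2)]
weights, scores, accuracy = classification_result(neighbors)
print(weights, scores, accuracy)
```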
The classifier 1060 may determine if the classification is accurate at 1208 based upon the calculated accuracy measure. If the classification is accurate at 1212, the classifier 1060 automatically selects at 1214 a response that incorporates a solution description. If the classification is inaccurate at 1210, based upon the accuracy measure value, the classifier 1060 displays at 1216 a list of class-centers and class-equivalents. This allows the expert 1110 or agent 1120 to manually select at 1218 a response including a solution description from the classes displayed.
Other Examples
The above-described content analysis can provide generic classification services. In one implementation, for example, the system may serve as a routing system or expert finder without modification. The system may classify problem descriptions according to the types of problems agents have previously solved so that customer messages may be automatically routed to the most competent agent. The recommendation also may be a list of identifiers, each of which corresponds to a respective group of one or more suggested persons or entities knowledgeable about subject matter in the problem description.
The system, however, is not limited to incoming problem descriptions. In one implementation, the system may be used in a sales scenario. For example, the system may classify an incoming customer message containing product criteria with product descriptions in a product catalog or with other examples of customer descriptions of products to facilitate a sale.
With respect to business objects of the type “expert,” the “stored” information may be within the knowledge of a human expert who may be referred to in responding to an incoming message. Typically, an expert has more capability to address certain categories of incoming messages than a general call center agent. “Experts” (also referred to as business partners) may refer to one or more individuals who may be employees or contractors, and who may be on-site or off-site relative to the physical enterprise computing system. Accordingly, references in this document to an expert business object refer to identifying information, such as contact information, stored in the enterprise computing system. As such, a stored expert-type business object may provide a name, phone number, address, email address, website address, hyperlink, or other known methods for communicating with an expert who is linked to a selected category.
Although the examples discussed in this document have focused primarily on business processes that handle inbound and outbound information in the form of email, the coherent categorization schemes and content analysis may be used with other forms and combinations of inbound and outbound textual information. Such forms may include, for example, internet-based chat, data transmitted over a network, voice over telephone, voice over internet protocol (VoIP), facsimile, and communications for the visually and/or hearing-impaired (e.g., TTY), and the like. Furthermore, received information may be in one form while response information may be in a different form, and either may be in a combination of forms. In addition, inbound and outbound information may incorporate data that represents text, graphics, video, audio, and other forms of data. The interaction may or may not be performed in real time.
Various features of the system may be implemented in hardware, software, or a combination of hardware and software. For example, some features of the system may be implemented in computer programs executing on programmable computers. Each program may be implemented in a high level procedural or object-oriented programming language to communicate with a computer system or other machine. Furthermore, each such computer program may be stored on a storage medium such as read-only-memory (ROM) readable by a general or special purpose programmable computer or processor, for configuring and operating the computer to perform the functions described above.
The invention can be implemented with digital electronic circuitry, or with computer hardware, firmware, software, or in combinations of them. Apparatus of the invention can be implemented in a computer program product tangibly embodied in a machine-readable storage device for execution by a programmable processor; and method steps of the invention can be performed by a programmable processor executing a program of instructions to perform functions of the invention by operating on input data and generating output. The invention can be implemented advantageously in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device. Each computer program can be implemented in a high-level procedural or object-oriented programming language, or in assembly or machine language if desired; and in any case, the language can be a compiled or interpreted language. Suitable processors include, by way of example, both general and special purpose microprocessors. Generally, a processor will receive instructions and data from a read-only memory and/or a random access memory. The essential elements of a computer are a processor for executing instructions and a memory. Generally, a computer will include one or more mass storage devices for storing data files; such devices include magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and optical disks. Storage devices suitable for tangibly embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM disks. Any of the foregoing can be supplemented by, or incorporated in, ASICs (application-specific integrated circuits).
Other examples are within the scope of the following claims.