Software tool for training and testing a knowledge base

Information

  • Patent Grant
  • 7756810
  • Patent Number
    7,756,810
  • Date Filed
    Thursday, August 23, 2007
  • Date Issued
    Tuesday, July 13, 2010
Abstract
A software tool for creating, training and testing a knowledge base of a computerized customer relationship management system is disclosed. The software tool includes corpus editing processes for displaying and editing text-based corpus items, and assigning selected categories to individual corpus items. Knowledge base construction processes construct a knowledge base by analyzing a first subset of the corpus items, and testing processes test the knowledge base on a second subset of the corpus items. Reporting processes generate reports containing indicia representative of the testing results, which may be utilized to edit the corpus items and retrain the knowledge base so as to improve performance.
Description
BACKGROUND OF THE INVENTION

1. Field of the Invention


The present invention relates generally to computer software, and more particularly to relationship management software for classifying and responding to customer communications.


2. Description of the Prior Art


Most commercial enterprises devote significant time and resources to the tasks of reviewing and appropriately responding to inquiries, requests and other text-based electronic communications received from current or prospective customers. In order to enable more efficient administration of these tasks, certain software vendors, such as iPhrase Technologies, Inc. of Cambridge, Mass., have developed computerized customer relationship management (CRM) systems which perform analysis of incoming electronic communications and classify the communications into predetermined categories based on the determined intent. This categorization process may be utilized to automate generation of responses, or to guide human agents in the selection of a suitable response.


Such CRM systems typically require construction of a knowledge base (KB) before the analysis and classification functions may be performed reliably, i.e., before the CRM system may be put on-line. The KB contains relevant statistical and semantic information derived from a body of sample texts (known collectively as a corpus) by using a process known as training. KB performance may be improved by periodically retraining the KB with additional texts, or by providing the KB with online feedback (a process referred to as online learning, an example of which is described in U.S. patent application Ser. No. 09/754,179, filed Jan. 3, 2001). Generally, the accuracy and reliability of a CRM system depend on optimizing and maintaining KB performance. Poor KB performance may result in unacceptably high rates of false positives (i.e., frequently assigning non-relevant categories to communications) and/or false negatives (i.e., frequently failing to assign a relevant category to communications).


To construct and train a KB that provides satisfactory performance, the CRM user must carefully perform a number of preparatory tasks, including collecting appropriate sample texts, identifying a set of categories that classify the texts according to intent, and assigning the proper category to each sample text. If this process is conducted improperly or if erroneous information is used, then the performance of the resultant KB will be compromised, and the associated CRM system will behave in an unreliable fashion. Unfortunately, the prior art lacks tools for testing the performance of a KB and for reporting the test results in a manner which would allow the user to identify and remedy errors and problematic conditions in order to improve KB performance.


SUMMARY

Roughly described, an embodiment of the present invention provides a software tool for training and testing a knowledge base of a computerized customer relationship management system. The software tool may be conceptually divided into four component processes: corpus editing processes, knowledge base (KB) building processes, KB testing processes, and reporting processes. The corpus editing processes import selected sample texts, allow assignment of relevant categories from a predefined category list to individual corpus items, display corpus items and associated field and category information for user inspection, and modify the corpus items and associated information in accordance with user input. KB building processes select a subset of the corpus items to be used for training in response to user input, and cause a KB to be constructed based on analysis of the texts in the training subset. KB building processes may use the services of a modeling engine to perform the requisite text processing and semantic and statistical analysis operations. Once the KB has been built, KB testing processes test the performance of the KB by using it to classify each corpus item in a second subset. Reporting processes then generate selected reports representative of the performance of the KB, and cause the reports to be displayed to the user. The reports may identify errors or problematic conditions to the user, which may be remedied by making appropriate changes to corpus items and/or organization of the KB.


Reports which may be generated by the reporting processes and viewed by the user include reports representative of overall KB performance across all categories, and reports representative of KB performance for a selected category. Illustrative examples of reports which may be selected include scoring graph reports, showing match scores in a selected category for each corpus item in the testing subset; reports showing the relationship between precision and recall, either for all categories or for a selected category; cumulative success over time reports, showing how the KB performance changes over time; threshold calculator reports, depicting the relationship between values of threshold, cost ratio, precision and recall and allowing the user to rationally set threshold values to be used by an application; and, stealing/stolen reports, showing the percentage and number of corpus items “stolen” by or from one category of a pair of categories, which may be used to identify categories having overlapping intents.





BRIEF DESCRIPTION OF THE FIGURES

In the attached drawings:



FIG. 1 is a block diagram depicting the knowledge base (KB) tool of the invention in relation to an exemplary computerized customer relationship management (CRM) system;



FIG. 2 is a block diagram depicting components of the KB tool;



FIG. 3 is a workflow diagram depicting the steps of a process for training and testing the KB;



FIG. 4 is an exemplary user interface (UI) screen of the KB tool used for displaying and editing corpus items;



FIG. 5 is a block diagram depicting the division of the corpus items into training and testing subsets;



FIG. 6 is an exemplary UI screen of the KB tool presenting a set of user-selectable options for dividing the corpus into training and testing subsets;



FIG. 7 is an exemplary scoring graph report;



FIG. 8 is an exemplary report of total precision versus recall;



FIG. 9 is an exemplary cumulative success over time report;



FIG. 10 is an exemplary threshold calculator report; and



FIG. 11 is an exemplary stealing/stolen report.





DETAILED DESCRIPTION

The invention may be more easily understood with reference to the attached figures, which depict various aspects of an embodiment of a software tool for training and testing a knowledge base of a computerized customer relationship management system. Referring initially to FIG. 1, there is shown a software tool (hereinafter referred to as the “KB tool”) 100, which provides a user with the ability to train and test a knowledge base (hereinafter referred to as “KB”) of a computerized customer relationship management (“CRM”) system 102. CRM system 102 may be logically and conceptually divided into three components: an application 104, a modeling engine 106, and a KB 108. Application 104, which may be configured to perform any variety of functions, receives text-based electronic communications from an external source. The communications will typically take the form of electronic mail messages (e-mails), or text supplied through a web interface (e.g., in a query box of an HTML form). Application 104 calls upon the services of modeling engine 106 to analyze the communication and to determine an associated intent. As will be discussed in further detail below, modeling engine 106 may determine intent by calculating a set of match scores for each communication, wherein individual match scores of the match score set correspond to one of a plurality of pre-established categories. The match score is representative of a confidence that the communication “belongs to” the associated category; a high match score for a category is indicative of a high probability that the communication is relevant to that category, whereas a low match score indicates a low probability of relevance. Modeling engine 106 uses KB 108 to perform the analysis and scoring functions, as will be described below.
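By way of a non-limiting illustration (the names MatchScore and classify, and the keyword-overlap scoring, are hypothetical simplifications rather than part of the disclosed modeling engine), a match score set of the kind described above might be represented and produced as follows:

```python
from dataclasses import dataclass
from typing import Dict, List, Set

@dataclass
class MatchScore:
    category: str
    score: float  # confidence that the communication belongs to the category (0-100)

def classify(text: str, kb_categories: Dict[str, Set[str]]) -> List[MatchScore]:
    """Return one match score per pre-established category.

    A real modeling engine performs NLP and statistical pattern matching
    against KB 108; here each category is scored by a trivial keyword-overlap
    heuristic purely for illustration.
    """
    tokens = set(text.lower().split())
    scores = []
    for name, keywords in kb_categories.items():
        overlap = len(tokens & keywords) / max(len(keywords), 1)
        scores.append(MatchScore(category=name, score=round(100 * overlap, 1)))
    return sorted(scores, key=lambda s: s.score, reverse=True)
```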


Match scores calculated by modeling engine 106 are returned to application 104, which may select and take an appropriate action based on the match scores. In one example, application 104 takes the form of an automated e-mail response application, which receives inquiries and requests from current or prospective customers. Depending on match score values determined by the modeling engine, application 104 may select and send an appropriate response to the inquiry or route the inquiry to an appropriate agent 110 for further action. As an illustrative example, modeling engine 106 may analyze an e-mail received from a prospective customer and calculate a high match score for a category associated with a specific product or service offered by a company. The e-mail response application could then automatically send the prospective customer a response with information about the specific product/service, or route the customer e-mail to a human agent having the relevant expertise.


Those skilled in the art will recognize that application 104, modeling engine 106 and KB 108, as well as KB tool 100, may reside and be executed on a single computer, or on two or more computers connected over a network. The computer or computers on which the components reside will typically be equipped with a monitor and/or other display device, as well as a mouse, keyboard and/or other input device such that the user may view UI screens and reports and enter user input. Those skilled in the art will also recognize that the foregoing software components will typically be implemented as sets of instructions executable by a general-purpose microprocessor.


In a specific implementation of CRM system 102, modeling engine 106 uses a two-phase process to analyze and classify received communications. In the first phase, a natural-language processing (NLP) engine extracts concepts from the communication and generates a structured document containing these concepts. As used herein, the term “concept” denotes any feature which may be used to characterize a specific category and distinguish it from other categories, including words or phrases as well as information representative of the source or context of the communication (e.g., an e-mail address). The NLP engine extracts the concepts by performing a prescribed sequence of operations, which may include language identification and encoding conversions, tokenization, text cleanup, spelling and grammatical error correction, and morphological and linguistic analysis.
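The first (NLP) phase might be sketched, in greatly simplified form, as follows; the cleanup and normalization steps shown are illustrative assumptions and do not reproduce the engine's actual sequence of operations:

```python
import re
from typing import Dict, List

def extract_concepts(raw_email: str, sender: str = "") -> Dict[str, List[str]]:
    """Illustrative stand-in for the NLP phase: raw text in, structured concepts out."""
    # Text cleanup: strip markup-like noise and collapse whitespace.
    text = re.sub(r"<[^>]+>", " ", raw_email)
    text = re.sub(r"\s+", " ", text).strip().lower()
    # Tokenization with a very rough morphological normalization (trim plural "s").
    tokens = [t.rstrip("s") if len(t) > 3 else t for t in re.findall(r"[a-z']+", text)]
    concepts = {"terms": tokens}
    # A concept may also capture the source or context of the communication.
    if sender:
        concepts["source"] = [sender.lower()]
    return concepts
```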


According to the two-phase implementation of modeling engine 106, the structured document generated by the NLP engine and containing the extracted concepts is passed to a semantic modeling engine, which performs statistical pattern matching on the document by comparing it with the content of categories residing in KB 108 to produce the match score set. As noted above, each score in the match score set represents a confidence level that the communication falls within the associated category. KB 108 may also include one or more user-supplied rules specifying how to route communications to specific categories based on the content of the communication or related metadata (indicating, for example, the identity of the person sending the communication, or properties of the channel over which the communication was received, e.g., secured or unsecured).


Software utilizing a two-phase modeling engine of the foregoing general description is commercially available from iPhrase Technologies, Inc. It is noted, however, that the description of a specific implementation of modeling engine 106 is provided by way of an example, and the invention should not be construed as being limited thereto.


KB 108 may be regarded as an object containing the learned information required by modeling engine 106 to perform the match score generation function, and may take any suitable form, including a database or file (or collection of files). KB 108 contains relevant statistical and semantic information derived from a collection of sample texts known as a corpus. The process of deriving the relevant statistical and semantic information from the corpus is known as “training.” The performance of KB 108 may be maintained and improved over time by providing it (either in real-time or at specified intervals) with feedback and adjusting information contained within KB 108 accordingly, a process known as “learning.” In one example of feedback, application 104 may execute an “auto-suggest” function, wherein it identifies to a human agent two or more categories (or a set of candidate responses each of which is associated with one of the categories) most likely to be relevant to the received communication. When the agent selects one (or none) of the identified categories or associated responses, feedback is provided to KB 108, and statistics contained within KB 108 are appropriately modified to reflect the selection. The process of adapting a knowledge base using feedback is described in greater detail in co-pending U.S. patent application Ser. No. 09/754,179, filed Jan. 3, 2001, which is incorporated by reference.


In an exemplary implementation, KB 108 may be organized into an array of nodes, wherein each node contains semantic statistical information and/or rules for use by modeling engine 106 in classifying communications. Some or all of the nodes will represent individual categories. The simplest way to organize nodes in KB 108 is to place them in a single-level flat knowledge base structure. If, for example, CRM system 102 is designed to analyze customer e-mails and determine to which product each e-mail pertains, KB 108 may take the form of a flat knowledge base of several nodes, each node representing a product and containing the relevant semantic and statistical information. Alternatively, the nodes may be organized into a multi-level hierarchical structure, wherein certain of the nodes have child nodes, or into other structures known in the art.
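Assuming a simple in-memory representation (the KBNode class and its fields are hypothetical), a flat or hierarchical arrangement of nodes might look like this:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class KBNode:
    name: str                                     # category represented by this node
    term_counts: Dict[str, int] = field(default_factory=dict)          # semantic/statistical information
    threshold: Optional[float] = None             # optional per-category threshold (see FIG. 10)
    rules: List[Callable[[dict], bool]] = field(default_factory=list)  # user-supplied routing rules
    children: List["KBNode"] = field(default_factory=list)             # empty in a flat KB

# Single-level flat KB for the product e-mail example: one node per product.
flat_kb = [KBNode("product_a"), KBNode("product_b"), KBNode("product_c")]

# The same nodes could instead be arranged hierarchically beneath a parent node.
root = KBNode("all_products", children=flat_kb)
```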


KB tool 100 advantageously provides means for constructing and training KB 108, for assessing its performance, and for identifying various errors and problematic conditions. Referring now to FIG. 2, it is seen that KB tool 100 may be conceptually divided into four composite sets of processes: corpus editing processes 202, KB building processes 204, KB testing processes 206, and reporting processes 208. Generally described, corpus editing processes 202 import selected sample texts into a corpus, display corpus items and associated field and category information for user inspection, and modify the corpus items and associated information in accordance with user input; KB building processes 204 select a subset of the corpus items to be used for training in response to user input, and cause a KB to be constructed based on analysis and classification of text and metadata contained in the selected corpus items; KB testing processes 206 test the KB using a second subset of the corpus items; and, reporting processes 208 generate reports on the testing and cause the reports to be displayed to the user. It should be recognized that the partition of KB tool 100 into separate processes is conceptual in nature and should not be construed as specifying the actual program architecture of KB tool 100, i.e., as requiring that each set of processes reside in an independent module.


The functions performed by each of the processes, and by KB tool 100 as a whole, may be more clearly explained with reference to FIG. 3, which depicts the workflow associated with training and testing KB 108, and to FIGS. 4-11, which depict exemplary UI screens and reports that are displayed to the user and employed to implement the various functions of KB tool 100. Referring initially to FIG. 3 and proceeding from left to right, the operations of training and testing KB 108 begin with the creation and editing of the corpus file, which is managed by corpus editing processes 202. To create the corpus file, the user identifies (typically through a dialog box or other UI element) a source or sources of the sample texts that will be used for training and testing. The sample texts should be of the same type as and representative of the communications that will be analyzed and classified by CRM system 102. For example, if CRM system 102 is configured to act as an automated e-mail response application that automatically provides or suggests appropriate pre-prepared text responses to incoming e-mails, then the sample texts should be typical e-mail messages containing questions that are similar to those which will be received by CRM system 102. Performance of KB 108 will be improved by creating a corpus file containing a relatively large number of sample texts. Furthermore, it is beneficial to create a corpus file that contains a significant number of sample texts pertinent to each of the categories into which the communications will be classified. Files of various formats and types may serve as the source of the sample texts, including, without limitation, comma-separated value (CSV) files, Microsoft Excel (worksheet) files, and PST (Microsoft Outlook e-mail) files. In addition, the corpus file may be manually constructed (or modified) by entering or copying individual corpus items via a user interface.


Creation and editing of the corpus also involves defining corpus fields (also referred to as name-value pairs, or NVPs) and assigning a category to each corpus item. Corpus fields are data sets containing information associated with each corpus item. Definition of corpus fields allows the user to specify which elements of the corpus items (and of communications to be acted upon by CRM system 102) will be analyzed by modeling engine 106. For example, if the corpus items are e-mail messages, appropriate corpus fields may include a “From” field identifying the source of the corpus item, a “Message” field containing the message body, a “Subject” field containing the message subject, and a “Category” field identifying the category to which the corpus item belongs. Each corpus field may be assigned properties specifying the data type contained in the field (e.g., text or number) as well as options for how the field is processed (or not processed) by the NLP engine of modeling engine 106. These properties will typically be assigned via a dialog box or similar UI element. Each corpus item may include structured information, unstructured information, or both. Structured information consists of information having certain predetermined constraints on its values and/or format, such as a corpus field which can only take a value of TRUE or FALSE. Unstructured information, such as a free language field (for example, the “Message” field described above), does not need to conform to prescribed constraints.
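A minimal sketch of corpus fields and corpus items as name-value pairs, assuming the hypothetical CorpusField and CorpusItem classes below, might read:

```python
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class CorpusField:
    name: str          # e.g. "From", "Subject", "Message", "Category"
    data_type: str     # e.g. "text" or "number"
    analyzed: bool     # whether the NLP engine should process this field

@dataclass
class CorpusItem:
    # Name-value pairs: structured fields carry constrained values (e.g. TRUE/FALSE),
    # while free-language fields such as "Message" hold unstructured text.
    values: Dict[str, Any] = field(default_factory=dict)
    category: str = ""  # category assigned by the user

fields = [
    CorpusField("From", "text", analyzed=False),
    CorpusField("Subject", "text", analyzed=True),
    CorpusField("Message", "text", analyzed=True),
]
item = CorpusItem(values={"From": "buyer@example.com",
                          "Subject": "price question",
                          "Message": "How much does the deluxe model cost?"},
                  category="product information request")
```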


Corpus field names and properties may be specified by the user through a dialog box or other UI element. Alternatively, the corpus field names and properties may be specified in the sample text files themselves. In another alternative, corpus editing processes 202 may automatically define corpus fields and properties if the sample text file is in a certain prescribed format, such as a PST file containing e-mail messages.


Corpus editing processes 202 also manage the assignment of categories to each corpus item. The categories are representative of distinct groupings into which the communications may be classified according to the communications' intents. Typically, identification of categories is performed by manually reviewing a set of sample texts to determine what common intents are expressed in the texts. In one example, CRM system 102 is an automated e-mail response application for a product retailer. The user, upon review of a sample of recently received emails, finds that the e-mails may be classified into one of three areas: requests for product specifications and pricing information, complaints about purchased products, and inquiries regarding store locations and hours of operation. The user may then specify, using a dialog box or other UI element presented by the corpus editing processes 202 to the user, that three categories are to be used by KB 108 for classification, consisting of a product information request category, a complaint category, and a store location category. Next, the user assigns a relevant category to each item (e-mail) in the corpus. Assignment of the categories may be performed via a UI presented by corpus editing processes 202, or alternatively the categories may be added to the file containing the sample texts prior to importing them into the corpus file. Other methods and techniques, both manual and semi-automated, may be utilized to define a set of categories and assign a relevant category to individual corpus items. These methods and techniques include locating specified text strings, classifying by response (e.g., for sample texts consisting of standard (“canned”) answers appended to customer email inquiries), and clustering (identifying semantic similarities in unclassified corpus items to group textually similar items together).



FIG. 4 is an example of a UI 400 presented by corpus editing processes 202, allowing a user to view and edit individual corpus items. Each row 402 in the UI represents an individual corpus item, and each column 404 represents a corpus field, or name-value pair. In the example depicted in FIG. 4, the corpus items are articles posted to Usenet groups, and the corpus fields include a “From” field identifying the source email address, a “Message” field containing the text of the article, and a “Subject” field. The corpus fields further include a “Categories” field identifying the category which has been assigned by the user to each corpus item (in the example depicted, the Usenet group to which the article has been posted), using a manual or semi-automated technique. The user may select one or more corpus items from the list displayed in the UI to view details of the items or to edit the values of the corresponding corpus fields.


Referring again to the workflow diagram of FIG. 3, after the corpus file has been created and edited, KB 108 is built and tested from analysis of the corpus items. Building of KB 108 is managed by KB building processes 204. KB building processes initially split the corpus into a first subset to be used for training KB 108, and a second subset to be used for testing KB 108. The process of splitting the corpus into training and testing subsets is symbolically depicted in FIG. 5. Of course, many schemes may be utilized for dividing the corpus into subsets. Preferably, the manner in which the corpus is split is selectable by the user. FIG. 6 is an exemplary UI screen 600 listing various user-selectable options 602 for splitting the corpus into subsets for training and testing (e.g., using random cuts; creating (training) with even-numbered items and analyzing (testing) with odd-numbered items, a method known in the art as “jack-knife”; and so on). It should be recognized that the training and testing subsets may be overlapping (i.e., include common corpus items), and that one or both of the subsets may include the entire corpus (e.g., as used for the “Create using all selected, analyze using all selected” option).
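Two of the splitting schemes mentioned above (a random cut and the even/odd “jack-knife” split) might be sketched as follows; the split_corpus function is an assumption for illustration, and, as noted, the subsets may overlap or both span the entire corpus:

```python
import random
from typing import List, Sequence, Tuple

def split_corpus(items: Sequence, scheme: str = "random",
                 train_fraction: float = 0.5, seed: int = 0) -> Tuple[List, List]:
    """Return (training_subset, testing_subset) under the requested scheme."""
    if scheme == "random":
        shuffled = list(items)
        random.Random(seed).shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        return shuffled[:cut], shuffled[cut:]
    if scheme == "jackknife":
        # Create (train) using even-numbered items, analyze (test) using odd-numbered items.
        return list(items[0::2]), list(items[1::2])
    if scheme == "all":
        # Both subsets may be the entire corpus, as with the
        # "Create using all selected, analyze using all selected" option.
        return list(items), list(items)
    raise ValueError(f"unknown split scheme: {scheme}")
```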


After the corpus has been split into training and testing subsets, KB building processes 204 initiate the creation of KB 108. Generally described, the process of building KB 108 involves deriving relevant semantic and statistical information from the corpus items in the training subset and associating this information with corresponding nodes of the KB 108. As noted above, some or all of the nodes represent categories of the predefined set of categories; for the automated e-mail response application example described above, KB 108 may consist of three nodes arranged in a flat structure: a first node corresponding to the product information request category, a second node corresponding to the complaint category, and a third node corresponding to the store location category. According to the implementation depicted in FIG. 1, KB building processes 204 may invoke the services of modeling engine 106 to perform natural language and semantic analysis of the corpus texts and thereby derive the semantic and statistical information to be associated with the nodes of KB 108. Those skilled in the art will recognize that various well-known techniques and algorithms may be employed for processing of the corpus texts and extraction of the relevant semantic and statistical information, and so such techniques and algorithms need not be discussed herein. It should also be recognized that KB 108 will not necessarily be empty (i.e., lacking structure and relevant semantic/statistical information) prior to initiation of the KB building process; in some cases and implementations, KB building processes 204 will operate on an existing KB which has previously been provided with a structure and relevant information. In such cases and implementations, KB building processes 204 will cause the structure and information to be modified in accordance with the results of analysis of the texts in the training subset.
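Under the same simplifying assumptions, training might amount to accumulating term statistics on a node for each category; a real modeling engine would apply far richer natural language and semantic analysis, so the following is only a schematic:

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

def build_kb(training_items: Iterable[Tuple[str, str]],
             kb: Optional[Dict[str, Counter]] = None) -> Dict[str, Counter]:
    """Build (or retrain) a flat KB: one term-frequency node per category.

    `training_items` yields (message_text, assigned_category) pairs. Passing an
    existing `kb` corresponds to retraining a KB that already has structure and
    statistics rather than starting from an empty one.
    """
    kb = {} if kb is None else kb
    for text, category in training_items:
        node = kb.setdefault(category, Counter())
        node.update(text.lower().split())
    return kb
```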


After KB 108 has been built, its performance is tested by classifying the corpus items in the testing subset of the corpus using the information contained in KB 108 to determine if the corpus items have been classified into the most relevant category(ies). Testing of KB 108 is managed by KB testing processes 206. In the FIG. 1 embodiment, KB testing processes 206 may call upon the services of modeling engine 106 to extract concepts from the corpus items (using, for example, an NLP engine) and perform statistical pattern matching using the relevant semantic and statistical information for each category contained within KB 108. This process will return a set of match scores for each corpus item in the testing subset. Each match score in the match score set represents a confidence level that the corpus item belongs to the associated category. In a typical implementation, match scores determined by modeling engine 106 fall within a pre-established range (e.g., 0-100), with higher scores denoting a high level of confidence that the corpus item belongs to the associated category, and lower scores denoting a low level of confidence that the corpus item belongs to the associated category. For example, using the three-category KB example discussed above (consisting of a product information category, a complaint category, and a store location category), a corpus item in the testing subset could have a match score of 95 for the product information category, a match score of 30 for the complaint category, and a match score of 5 for the store location category. If the corpus item in question is properly classified in the product information category, then KB 108 would be regarded as performing well; if, in fact, the corpus item is properly classified in one of the other two categories, then KB 108 would be regarded as performing poorly. Test results, comprising match score sets obtained for each corpus item in the testing subset, are stored by KB testing processes 206 and used for generating reports assessing various aspects of KB performance, as described below.
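Continuing the schematic, testing might score every item in the testing subset against every node and record whether the user-assigned category received the highest score (the scoring heuristic below is a crude stand-in for the engine's concept extraction and statistical pattern matching):

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

def score_item(text: str, kb: Dict[str, Counter]) -> Dict[str, float]:
    """Return a match score (0-100 range) for each category node; purely illustrative."""
    tokens = Counter(text.lower().split())
    scores = {}
    for category, node in kb.items():
        total = sum(node.values()) or 1
        overlap = sum(min(count, node[term]) for term, count in tokens.items())
        scores[category] = round(100.0 * overlap / total, 2)
    return scores

def test_kb(testing_items: Iterable[Tuple[str, str]], kb: Dict[str, Counter]) -> List[dict]:
    """Score each (text, assigned_category) pair and note whether the top score matches."""
    results = []
    for text, assigned in testing_items:
        scores = score_item(text, kb)
        top = max(scores, key=scores.get)
        results.append({"scores": scores, "assigned": assigned, "top": top,
                        "correct": top == assigned})
    return results
```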


Referring again to the workflow diagram shown in FIG. 3, the user may select and view reports generated by KB tool 100 to gauge the performance of KB 108 and make appropriate adjustments to improve performance. Report generation is managed by reporting processes 208. As used herein, the term “report” denotes any collection of graphical and/or textual information that visually represents the performance of KB 108. Reports generated by reporting processes 208 include both summary reports, which depict the performance of KB 108 across all categories, and category reports, which depict the performance of KB 108 for a specified category. In a typical implementation, the reporting processes 208 will cause a UI or series of UI screens to be displayed in which the user can select the type and content of report he wishes to view. Examples of reports generated by KB tool 100 are described below. It is noted, however, that the reports described and depicted herein are intended as illustrative examples, and that the scope of the present invention should not be construed as being limited to these examples. It is further noted that the reports may be presented in a window of a graphical display and/or in a printed document.



FIG. 7 is an exemplary category report in the form of a scoring graph report 700. Scoring graph report 700 depicts match scores for each corpus item in a selected category. Each point 702 on the graph represents an individual corpus item. Light points 704 represent corpus items that belong to the selected category, and dark points 706 represent corpus items that do not belong to the selected category. If KB 108 is performing well in the selected category, most of the light points 704 will appear in the upper portion of the graph (at or above a match score of 0.80), and most of the dark points 706 will appear in the lower portion of the graph. In a preferred implementation of the scoring graph report, a user can select an individual point 702 on the graph (e.g., by clicking on the point) to view details of the corresponding corpus item. This feature allows the user to quickly and easily inspect “stray points” which are displaced from their expected, optimal area of the graph, i.e., light points 704 appearing in the lower portion of the graph and dark points 706 appearing in the upper portion of the graph, and determine if any discernible error or condition exists which caused the misclassification or failure to classify into the expected category. For example, the user may click on one of the stray dark points and discern that the associated corpus item was assigned the wrong category during the corpus creation process. The user may then edit the corpus item to assign the correct category and re-train KB 108 using the corrected information.



FIG. 8 is a summary report 800 consisting of a graph of total precision versus recall for all categories in KB 108. As used herein, the term “precision” denotes the fraction of corpus items identified as relevant to a category that are actually relevant to the category, and the term “recall” denotes the fraction of corpus items actually relevant to a category that are identified as being relevant. The graph of total precision versus recall represents a weighted average of the precision for each recall value, wherein categories having a relatively greater number of texts are accorded greater weight than categories having a relatively smaller number of texts. The total precision versus recall graph provides a visual indication of the overall performance of KB 108. Generally, a curve located primarily in the upper-right portion of the graph indicates that KB 108 is performing well, whereas a curve located primarily in the lower-left portion of the graph indicates a poorly performing KB 108. If the results indicate that the performance of KB 108 is poor, then the user may select and view category reports depicting precision versus recall results for each category in order to identify whether any specific category is particularly problematic.
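Using the definitions above, per-category precision and recall at a candidate threshold, and the size-weighted total, might be computed roughly as follows (the `results` structure and helper names are assumptions carried over from the earlier sketches):

```python
from typing import Dict, List, Tuple

def precision_recall(results: List[dict], category: str, threshold: float) -> Tuple[float, float]:
    """Precision and recall for one category at a given match-score threshold.

    Each entry of `results` holds an item's match scores ("scores") and the
    category assigned by the user ("assigned"), as in the testing sketch above.
    """
    predicted = [r for r in results if r["scores"].get(category, 0.0) >= threshold]
    actual = [r for r in results if r["assigned"] == category]
    true_pos = [r for r in predicted if r["assigned"] == category]
    precision = len(true_pos) / len(predicted) if predicted else 1.0
    recall = len(true_pos) / len(actual) if actual else 1.0
    return precision, recall

def total_precision(per_category_precision: Dict[str, float],
                    category_sizes: Dict[str, int]) -> float:
    """Weighted average: categories with more texts receive proportionally greater weight."""
    total = sum(category_sizes.values()) or 1
    return sum(per_category_precision[c] * category_sizes[c]
               for c in per_category_precision) / total
```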



FIG. 9 shows an exemplary cumulative success over time report 900. This report consists of a graph depicting the cumulative success of KB 108 during the lifetime of a chronological testing corpus (i.e., a corpus whose items are in the order they were received by the system). Each line 902 on the graph shows how often the correct category was among the top five category choices (those categories having the highest match scores). More specifically, the bottommost line represents, for each point in time, how often the correct category was the highest scoring category, the next (vertically adjacent) line shows how often the correct category was one of the two highest scoring categories, and so on. Cumulative success over time report 900 is useful to assess trends in KB 108 performance, and identify problems occurring during particular time frames (as evidenced by dips in the lines indicative of decreased KB 108 performance). Generation of the cumulative success over time report requires inserting a corpus field for each corpus item that contains the date and time the corpus item was received.
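A cumulative success over time series of this kind could be derived from chronologically ordered test results roughly as shown below; each returned series gives, after every item, the running fraction of items whose correct category fell within the top k scoring categories (function name assumed):

```python
from typing import Dict, List

def cumulative_success(results: List[dict], max_rank: int = 5) -> Dict[int, List[float]]:
    """`results` must be ordered by the date/time each corpus item was received.

    Returns, for each k in 1..max_rank, the running fraction of items whose
    user-assigned category appeared among the k highest-scoring categories.
    """
    hits = {k: 0 for k in range(1, max_rank + 1)}
    series: Dict[int, List[float]] = {k: [] for k in range(1, max_rank + 1)}
    for n, r in enumerate(results, start=1):
        ranked = sorted(r["scores"], key=r["scores"].get, reverse=True)
        for k in range(1, max_rank + 1):
            if r["assigned"] in ranked[:k]:
                hits[k] += 1
            series[k].append(hits[k] / n)
    return series
```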



FIG. 10 shows an exemplary threshold calculator report 1000. Thresholds are values used by application 104 to determine whether to take a specified action with respect to a communication. For example, where application 104 is in the form of an automated e-mail response application, a threshold setting may be used by application 104 to determine whether to auto-respond to an incoming e-mail, i.e., application 104 will auto-respond to a customer email only if the match score for a category exceeds a value (e.g., 90) indicative of a high confidence that the email should be classified in the category. Prior art CRM systems have generally lacked tools enabling the user to intelligently set thresholds in order to achieve a desired performance objective. Threshold calculator report 1000 provides a means for depicting the relationship between the threshold value and various performance parameters, including cost ratio (defined below), precision, and recall.


Threshold calculator report 1000 includes a graph 1002 showing match values for each corpus item for a specified category. Again, light points 1004 represent corpus items which belong to the specified category, and dark points 1006 represent corpus items which do not belong to the specified category. The current value of the threshold is represented as line 1008. Threshold calculator report 1000 also lists values of cost ratio, precision, recall, false positives, and false negatives corresponding to the current threshold value. The user may set values for any one of the following parameters: threshold, cost ratio, precision, or recall. In alternative implementations, user-settable values may include other suitable parameters which would be apparent to those skilled in the art. One such user-settable value is an automation ratio, which denotes the percentage of corpus items which meet or exceed the threshold. Responsive to entry of any of these values, reporting processes 208 calculate and display corresponding values of the other parameters. For example, if the user enters a threshold value, reporting processes 208 calculate and display the resultant values of precision and recall. In another example, the user enters a desired value of precision, and reporting processes 208 calculate and display the corresponding threshold value. The user may also specify a cost ratio, which is the amount saved by automatically responding to a communication correctly divided by the amount lost by automatically responding to a communication incorrectly (for example, a saving of $10 for each correct automated response and a loss of $100 for each incorrect automated response will yield a cost ratio of 0.1), and reporting processes 208 will responsively calculate and display the corresponding threshold value. The methods of calculating the values of the foregoing parameters based on other specified parameters should be easily discernible to one of ordinary skill in the art and need not be described herein. The threshold calculator report 1000 may also include a button 1010 allowing the user to write the current (most recently specified or calculated) threshold value to the corresponding node of KB 108.
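The relationships the threshold calculator reports might be computed along the following lines (a sketch under the same assumed `results` structure; the actual calculation methods are left to the skilled reader, as noted above):

```python
from typing import Dict, List

def threshold_report(results: List[dict], category: str, threshold: float) -> Dict[str, float]:
    """Values corresponding to one threshold setting for the selected category."""
    scores = [(r["scores"].get(category, 0.0), r["assigned"] == category) for r in results]
    above = [(s, in_cat) for s, in_cat in scores if s >= threshold]
    true_pos = sum(1 for _, in_cat in above if in_cat)
    actual = sum(1 for _, in_cat in scores if in_cat)
    return {
        "threshold": threshold,
        "precision": true_pos / len(above) if above else 1.0,
        "recall": true_pos / actual if actual else 1.0,
        "false positives": len(above) - true_pos,
        "false negatives": actual - true_pos,
        # Automation ratio: percentage of corpus items meeting or exceeding the threshold.
        "automation ratio": 100.0 * len(above) / len(scores) if scores else 0.0,
    }

def threshold_for_precision(results: List[dict], category: str, target: float) -> float:
    """Lowest observed score usable as a threshold that achieves the desired precision."""
    for t in sorted({r["scores"].get(category, 0.0) for r in results}):
        if threshold_report(results, category, t)["precision"] >= target:
            return t
    return 100.0  # no threshold in range achieves the target

# Cost ratio example from the text: $10 saved per correct automated response
# divided by $100 lost per incorrect one gives a cost ratio of 10 / 100 = 0.1.
cost_ratio = 10 / 100
```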


Finally, FIG. 11 shows a “stealing/stolen” report 1100 generated for a specified category. In some cases, poor KB performance occurs when categories “steal” corpus items from each other (i.e., when a corpus item receives a higher match score for an inappropriate category, relative to the match score calculated for the category to which the item belongs). For a selected category, stealing/stolen report 1100 shows the percentage and number of corpus items initially assigned to the selected category which yielded higher match scores in other categories (the “stolen from” column). In addition, stealing/stolen report 1100 displays, for each of the other categories, the percentage of corpus items initially assigned to that category which yielded a higher match score in the selected category (the “stolen by” column).
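The stolen-from and stolen-by tallies described above could be derived from the stored match score sets roughly as follows (function name assumed):

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def stealing_stolen(results: List[dict], selected: str) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Counts of corpus items "stolen" from and by the selected category.

    stolen_from[c]: items assigned to `selected` that scored higher in category c.
    stolen_by[c]:   items assigned to category c that scored higher in `selected`.
    """
    stolen_from: Dict[str, int] = defaultdict(int)
    stolen_by: Dict[str, int] = defaultdict(int)
    for r in results:
        scores, assigned = r["scores"], r["assigned"]
        if assigned == selected:
            for category, score in scores.items():
                if category != selected and score > scores.get(selected, 0.0):
                    stolen_from[category] += 1
        elif scores.get(selected, 0.0) > scores.get(assigned, 0.0):
            stolen_by[assigned] += 1
    return dict(stolen_from), dict(stolen_by)
```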


The occurrence of a relatively high number of incidents of stealing between pairs of categories may indicate that modeling engine 106 does not perceive a clear difference between the intents of the two categories, i.e., that the two nodes of KB 108 representing the categories contain overlapping content. In such situations, KB 108 performance may be improved by carefully redefining the categories to more clearly distinguish intents (or, if appropriate, joining them into a single category), reassigning categories to the corpus items to reflect the redefined categories, and retraining KB 108 using KB building processes 204.


Referring again to the FIG. 3 workflow diagram, the user may utilize information contained in one or more of the reports generated by reporting processes 208 to improve KB performance. Actions which may be taken by the user to remedy problems identified in the reports include redefining, deleting or adding categories; correcting or otherwise modifying individual corpus items; and, modifying KB 108 structure (e.g., by changing the organization of nodes, or by adding or changing rule-based nodes). Once these actions have been taken, KB 108 may be retrained by invoking KB building processes 204, and the retrained KB 108 may be tested against the testing subset of corpus items using KB testing processes 206. The user may then evaluate the performance of the retrained KB 108 by generating the appropriate reports using reporting processes 208.


It will be recognized by those skilled in the art that, while the invention has been described above in terms of preferred embodiments, it is not limited thereto. Various features and aspects of the above invention may be used individually or jointly. Further, although the invention has been described in the context of its implementation in a particular environment and for particular applications, those skilled in the art will recognize that its usefulness is not limited thereto and that the present invention can be beneficially utilized in any number of environments and implementations.

Claims
  • 1. A computer-implemented method for training and testing a knowledge base of a computerized customer relationship management system, comprising: collecting, in a computer, one or more corpus items into a corpus, wherein the corpus items comprise electronic communications from customers; assigning, in the computer, a category from a set of predefined categories to each of the corpus items in the corpus; building, in the computer, a knowledge base of the computerized customer relationship management system by performing natural language and semantic analysis of a first subset of the corpus items in the corpus; testing, in the computer, the knowledge base of the computerized customer relationship management system on a second subset of the corpus items in the corpus by classifying each of the corpus items of the second subset into at least one of the predefined categories using information contained in the knowledge base of the computerized customer relationship management system; and generating and displaying, in the computer, a report based on results produced by the testing step to a user of the computerized customer relationship management system to gauge performance of the computerized customer relationship management system using the knowledge base, so that appropriate adjustments are made to improve the performance of the computerized customer relationship management system using the knowledge base.
  • 2. The method of claim 1, wherein the step of testing the knowledge base includes calculating a set of scores for each corpus item in the second subset, each score from the calculated set of scores being associated with a corresponding category and being representative of a confidence that the corpus item belongs to the corresponding category.
  • 3. The method of claim 1, wherein the step of generating and displaying a report includes generating a report relating to a single selected category.
  • 4. The method of claim 1, wherein the step of generating and displaying a report includes generating a cumulative report relating to a plurality of categories.
  • 5. The method of claim 1, wherein the step of generating and displaying a report includes: receiving user input specifying one of a precision value, a recall value, false positive rate, false negative rate, automation ratio or a cost ratio; and calculating and displaying, for a selected category, a match score based on the user input.
  • 6. The method of claim 1, wherein the step of generating and displaying a report includes: receiving user input specifying a match score; and calculating and displaying, for a selected category, a precision value and a recall value based on the user input.
  • 7. The method of claim 1, wherein the step of generating and displaying a report includes calculating precision as a function of recall and causing a graph to be displayed depicting the relationship between precision and recall.
  • 8. The method of claim 1, wherein the step of generating and displaying a report includes generating and displaying a graph depicting cumulative success over time, the graph showing, for a plurality of groups of corpus items each having a common time parameter, the fraction of corpus items in the group that were appropriately classified.
  • 9. The method of claim 1, wherein the step of generating and displaying a report includes generating and displaying a report showing, for each of a plurality of pairs of categories, a percentage of corpus items initially assigned to a first category of the pair of categories that were erroneously classified into a second category of the pair of categories.
  • 10. The method of claim 1, wherein the step of generating and displaying a report includes generating and displaying a scoring report showing, for a selected category, match scores for each corpus item in the second subset, the match scores being representative of the relevance of the selected category to the corpus item.
  • 11. The method of claim 1, wherein the first and second subsets of corpus items are selected in accordance with user input.
  • 12. The method of claim 1, wherein the steps of building and testing the knowledge base include using a modeling engine to analyze and classify corpus items.
  • 13. The method of claim 1, wherein the step of generating and displaying a report includes selecting a report from a plurality of available reports in response to user input.
  • 14. The method of claim 1, wherein the corpus items comprise customer communications received from one or more external sources.
  • 15. The method of claim 1, wherein the corpus items include structured and unstructured information.
  • 16. A device embodying instructions that, when executed by a computer, result in the computer performing a computer-implemented method for training and testing a knowledge base of a computerized customer relationship management system, comprising: collecting, in a computer, one or more corpus items into a corpus, wherein the corpus items comprise electronic communications from one or more customers; assigning, in the computer, a category from a set of predefined categories to each of the corpus items in the corpus; building, in the computer, a knowledge base of the computerized customer relationship management system by performing natural language and semantic analysis of a first subset of the corpus items in the corpus; testing, in the computer, the knowledge base of the computerized customer relationship management system on a second subset of the corpus items in the corpus by classifying each of the corpus items of the second subset into at least one of the predefined categories using information contained in the knowledge base of the computerized customer relationship management system; and generating and displaying, in the computer, a report based on results produced by the testing step to a user of the computerized customer relationship management system to gauge performance of the computerized customer relationship management system using the knowledge base, so that appropriate adjustments are made to improve the performance of the computerized customer relationship management system using the knowledge base.
CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of U.S. Utility application Ser. No. 10/835,694, filed on Apr. 29, 2004, entitled “SOFTWARE TOOL FOR TRAINING AND TESTING A KNOWLEDGE BASE,” which application claims the benefit of U.S. Provisional Application No. 60/468,493, filed May 6, 2003. The disclosures of the foregoing applications are incorporated herein by reference.

US Referenced Citations (343)
Number Name Date Kind
3648253 Mullery et al. Mar 1972 A
4110823 Cronshaw et al. Aug 1978 A
4286322 Hoffman et al. Aug 1981 A
4586160 Amano et al. Apr 1986 A
4589081 Massa et al. May 1986 A
4642756 Sherrod Feb 1987 A
4658370 Erman et al. Apr 1987 A
4724523 Kucera Feb 1988 A
4805107 Kieckhafer et al. Feb 1989 A
4814974 Narayanan et al. Mar 1989 A
4817027 Plum et al. Mar 1989 A
4908865 Doddington et al. Mar 1990 A
4918735 Morito et al. Apr 1990 A
4942527 Schumacher Jul 1990 A
4984178 Hemphill et al. Jan 1991 A
5018215 Nasr et al. May 1991 A
5023832 Fulcher et al. Jun 1991 A
5040141 Yazima et al. Aug 1991 A
5051924 Bergeron et al. Sep 1991 A
5060155 van Zuijlen Oct 1991 A
5067099 McCown et al. Nov 1991 A
5068789 van Vliembergen Nov 1991 A
5099425 Kanno et al. Mar 1992 A
5101349 Tokuume et al. Mar 1992 A
5111398 Nunberg et al. May 1992 A
5118105 Brim et al. Jun 1992 A
5125024 Gokcen et al. Jun 1992 A
5148408 Matthews Sep 1992 A
5210872 Ferguson et al. May 1993 A
5228116 Harris et al. Jul 1993 A
5230054 Tamura Jul 1993 A
5247677 Welland et al. Sep 1993 A
5251129 Jacobs Oct 1993 A
5251131 Masand et al. Oct 1993 A
5265033 Vajk et al. Nov 1993 A
5278942 Bahl et al. Jan 1994 A
5287430 Iwamoto et al. Feb 1994 A
5311583 Friedes et al. May 1994 A
5321608 Namba et al. Jun 1994 A
5325298 Gallant Jun 1994 A
5325526 Cameron et al. Jun 1994 A
5345501 Shelton Sep 1994 A
5349526 Potts et al. Sep 1994 A
5365430 Jagadish Nov 1994 A
5369570 Parad Nov 1994 A
5369577 Kadashevich et al. Nov 1994 A
5371807 Register et al. Dec 1994 A
5377354 Scannell et al. Dec 1994 A
5418717 Su et al. May 1995 A
5418948 Turtle May 1995 A
5437032 Wolf et al. Jul 1995 A
5444820 Tzes et al. Aug 1995 A
5475588 Schabes et al. Dec 1995 A
5483466 Kawahara et al. Jan 1996 A
5487100 Kane Jan 1996 A
5493677 Balogh et al. Feb 1996 A
5493692 Theimer et al. Feb 1996 A
5522026 Records et al. May 1996 A
5526521 Fitch et al. Jun 1996 A
5542088 Jennings, Jr. et al. Jul 1996 A
5555344 Zunkler Sep 1996 A
5559710 Shahraray et al. Sep 1996 A
5574933 Horst Nov 1996 A
5577241 Spencer Nov 1996 A
5590055 Chapman et al. Dec 1996 A
5594641 Kaplan et al. Jan 1997 A
5596502 Koski et al. Jan 1997 A
5610812 Scabes et al. Mar 1997 A
5615360 Bezek et al. Mar 1997 A
5627914 Pagallo May 1997 A
5630128 Farrell et al. May 1997 A
5634053 Noble et al. May 1997 A
5634121 Tracz et al. May 1997 A
5636124 Rischar et al. Jun 1997 A
5649215 Itoh Jul 1997 A
5664061 Andreshak et al. Sep 1997 A
5680628 Carus Oct 1997 A
5687384 Nagase Nov 1997 A
5694616 Johnson et al. Dec 1997 A
5701400 Amado Dec 1997 A
5706399 Bareis Jan 1998 A
5708829 Kadashevich Jan 1998 A
5715371 Ahamed et al. Feb 1998 A
5721770 Kohler Feb 1998 A
5721897 Rubinstein Feb 1998 A
5724481 Garberg et al. Mar 1998 A
5737621 Kaplan et al. Apr 1998 A
5737734 Schultz Apr 1998 A
5745652 Bigus Apr 1998 A
5745736 Picart Apr 1998 A
5748973 Palmer et al. May 1998 A
5754671 Higgins et al. May 1998 A
5761631 Nasukawa Jun 1998 A
5765033 Miloslavsky Jun 1998 A
5768578 Kirk et al. Jun 1998 A
5794194 Takebayashi et al. Aug 1998 A
5799268 Boguraev Aug 1998 A
5809462 Nussbaum Sep 1998 A
5809464 Kopp et al. Sep 1998 A
5822731 Schultz Oct 1998 A
5822745 Hekmatpour Oct 1998 A
5826076 Bradley et al. Oct 1998 A
5832220 Johnson et al. Nov 1998 A
5835682 Broomhead et al. Nov 1998 A
5845246 Schalk Dec 1998 A
5850219 Kumomura Dec 1998 A
5860059 Aust et al. Jan 1999 A
5864848 Horvitz et al. Jan 1999 A
5864863 Burrows Jan 1999 A
5867495 Elliott et al. Feb 1999 A
5878385 Bralich et al. Mar 1999 A
5878386 Coughlin Mar 1999 A
5884032 Bateman et al. Mar 1999 A
5884302 Ho Mar 1999 A
5890142 Tanimura et al. Mar 1999 A
5890147 Peltonen et al. Mar 1999 A
5895447 Ittycheriah et al. Apr 1999 A
5899971 De Vos May 1999 A
5913215 Rubinstein et al. Jun 1999 A
5920835 Huzenlaub et al. Jul 1999 A
5933822 Braden-Harder et al. Aug 1999 A
5940612 Brady et al. Aug 1999 A
5940821 Wical Aug 1999 A
5944778 Takeuchi et al. Aug 1999 A
5946388 Walker et al. Aug 1999 A
5948058 Kudoh et al. Sep 1999 A
5950184 Kartutunen Sep 1999 A
5950192 Moore et al. Sep 1999 A
5956711 Sullivan et al. Sep 1999 A
5960393 Cohrs et al. Sep 1999 A
5963447 Kohn et al. Oct 1999 A
5963894 Richardson et al. Oct 1999 A
5970449 Alleva et al. Oct 1999 A
5974385 Ponting et al. Oct 1999 A
5974465 Wong Oct 1999 A
5983216 Kirach Nov 1999 A
5991713 Unger et al. Nov 1999 A
5991751 Rivette et al. Nov 1999 A
5991756 Wu Nov 1999 A
5995513 Harrand et al. Nov 1999 A
5999932 Paul Dec 1999 A
5999990 Sharrit et al. Dec 1999 A
6006221 Liddy et al. Dec 1999 A
6009422 Ciccarelli Dec 1999 A
6012053 Pant et al. Jan 2000 A
6018735 Hunter Jan 2000 A
6021403 Horvitz et al. Feb 2000 A
6025843 Sklar Feb 2000 A
6026388 Liddy et al. Feb 2000 A
6032111 Mohri et al. Feb 2000 A
6035104 Zahariev Mar 2000 A
6038535 Campbell Mar 2000 A
6038560 Wical Mar 2000 A
6055528 Evans Apr 2000 A
6058365 Nagai et al. May 2000 A
6058389 Chandra et al. May 2000 A
6061667 Danford-Klein et al. May 2000 A
6061709 Bronte May 2000 A
6064953 Maxwell, III et al. May 2000 A
6064971 Hartnett May 2000 A
6064977 Haverstock et al. May 2000 A
6067565 Horvitz May 2000 A
6070149 Tavor et al. May 2000 A
6070158 Kirsch et al. May 2000 A
6073098 Buchsbaum et al. Jun 2000 A
6073101 Maes Jun 2000 A
6076088 Paik et al. Jun 2000 A
6081774 de Hita et al. Jun 2000 A
6085159 Ortega et al. Jul 2000 A
6092042 Iso Jul 2000 A
6092095 Maytal Jul 2000 A
6094652 Falsal Jul 2000 A
6098047 Oku et al. Aug 2000 A
6101537 Edelstein et al. Aug 2000 A
6112126 Hales et al. Aug 2000 A
6115734 Mansion Sep 2000 A
6138128 Perkowitz et al. Oct 2000 A
6138139 Beck et al. Oct 2000 A
6144940 Nishi et al. Nov 2000 A
6148322 Sand et al. Nov 2000 A
6151538 Bate et al. Nov 2000 A
6154720 Onishi et al. Nov 2000 A
6161094 Adcock et al. Dec 2000 A
6161130 Horvitz et al. Dec 2000 A
6167370 Tsourikov et al. Dec 2000 A
6169986 Bowman et al. Jan 2001 B1
6182029 Friedman Jan 2001 B1
6182036 Poppert Jan 2001 B1
6182059 Angotti et al. Jan 2001 B1
6182063 Woods Jan 2001 B1
6182065 Yeomans Jan 2001 B1
6182120 Beaulieu et al. Jan 2001 B1
6185603 Henderson et al. Feb 2001 B1
6199103 Sakaguchi et al. Mar 2001 B1
6203495 Bardy Mar 2001 B1
6212544 Borkenhagen et al. Apr 2001 B1
6223201 Reznak Apr 2001 B1
6226630 Billmers May 2001 B1
6233575 Agrawal et al. May 2001 B1
6233578 Machihara et al. May 2001 B1
6236987 Horowitz et al. May 2001 B1
6243679 Mohri et al. Jun 2001 B1
6243735 Imanishi et al. Jun 2001 B1
6249606 Kiraly et al. Jun 2001 B1
6253188 Witek et al. Jun 2001 B1
6256631 Malcolm Jul 2001 B1
6256773 Bowman-Amuah Jul 2001 B1
6260058 Hoenninger et al. Jul 2001 B1
6263335 Paik et al. Jul 2001 B1
6269368 Diamond Jul 2001 B1
6271840 Finseth et al. Aug 2001 B1
6275819 Carter Aug 2001 B1
6278973 Chung et al. Aug 2001 B1
6282565 Shaw et al. Aug 2001 B1
6292794 Cecchini et al. Sep 2001 B1
6292938 Sarkar et al. Sep 2001 B1
6298324 Zuberec et al. Oct 2001 B1
6301602 Ueki Oct 2001 B1
6304864 Liddy et al. Oct 2001 B1
6304872 Chao Oct 2001 B1
6308197 Mason et al. Oct 2001 B1
6311194 Sheth et al. Oct 2001 B1
6314439 Bates et al. Nov 2001 B1
6314446 Stiles Nov 2001 B1
6324534 Neal et al. Nov 2001 B1
6327581 Platt Dec 2001 B1
6349295 Tedesco et al. Feb 2002 B1
6353667 Foster et al. Mar 2002 B1
6353827 Davies et al. Mar 2002 B1
6360243 Lindsley et al. Mar 2002 B1
6363373 Steinkraus Mar 2002 B1
6363377 Kravets et al. Mar 2002 B1
6366910 Rajaraman et al. Apr 2002 B1
6370526 Agrawal et al. Apr 2002 B1
6374221 Haimi-Cohen Apr 2002 B1
6377945 Rievik Apr 2002 B1
6377949 Gilmour Apr 2002 B1
6393415 Getchius et al. May 2002 B1
6397209 Read et al. May 2002 B1
6397212 Biffar May 2002 B1
6401084 Ortega et al. Jun 2002 B1
6408277 Nelken Jun 2002 B1
6411947 Rice et al. Jun 2002 B1
6411982 Williams Jun 2002 B2
6415250 van den Akker Jul 2002 B1
6418458 Maresco Jul 2002 B1
6421066 Sivan Jul 2002 B1
6421675 Ryan et al. Jul 2002 B1
6424995 Shuman Jul 2002 B1
6424997 Buskirk, Jr. et al. Jul 2002 B1
6430615 Hellerstein et al. Aug 2002 B1
6434435 Tubel et al. Aug 2002 B1
6434554 Asami et al. Aug 2002 B1
6434556 Levin et al. Aug 2002 B1
6438540 Nasr et al. Aug 2002 B2
6442542 Ramani et al. Aug 2002 B1
6442589 Takahashi et al. Aug 2002 B1
6446061 Doerre et al. Sep 2002 B1
6446081 Preston Sep 2002 B1
6446256 Hyman et al. Sep 2002 B1
6449589 Moore Sep 2002 B1
6449646 Sikora et al. Sep 2002 B1
6460074 Fishkin Oct 2002 B1
6463533 Calamera et al. Oct 2002 B1
6466940 Mills Oct 2002 B1
6477500 Maes Nov 2002 B2
6477580 Bowman-Amuah Nov 2002 B1
6480843 Li Nov 2002 B2
6490572 Akkiraju et al. Dec 2002 B2
6493447 Goss et al. Dec 2002 B1
6493694 Xu et al. Dec 2002 B1
6496836 Ronchi Dec 2002 B1
6496853 Klein Dec 2002 B1
6505158 Conkie Jan 2003 B1
6507872 Geshwind Jan 2003 B1
6513026 Horvitz et al. Jan 2003 B1
6535795 Schroeder et al. Mar 2003 B1
6542889 Aggarwal et al. Apr 2003 B1
6560330 Gabriel May 2003 B2
6560590 Shwe et al. May 2003 B1
6571282 Bowman-Amuah May 2003 B1
6584464 Warthen Jun 2003 B1
6594697 Praitis et al. Jul 2003 B1
6601026 Appelt et al. Jul 2003 B2
6611535 Ljungqvist Aug 2003 B2
6611825 Billheimer et al. Aug 2003 B1
6615172 Bennett et al. Sep 2003 B1
6618727 Wheeler et al. Sep 2003 B1
6651220 Penteroudakis et al. Nov 2003 B1
6654726 Hanzek Nov 2003 B1
6654815 Goss et al. Nov 2003 B1
6665662 Kirkwood et al. Dec 2003 B1
6675159 Lin et al. Jan 2004 B1
6704728 Chang et al. Mar 2004 B1
6711561 Chang et al. Mar 2004 B1
6714643 Gargeya et al. Mar 2004 B1
6714905 Chang et al. Mar 2004 B1
6738759 Wheeler et al. May 2004 B1
6742015 Bowman-Amuah May 2004 B1
6744878 Komissarchik et al. Jun 2004 B1
6745181 Chang et al. Jun 2004 B1
6747970 Lamb et al. Jun 2004 B1
6748387 Garber et al. Jun 2004 B2
6766320 Wang et al. Jul 2004 B1
6785671 Bailey et al. Aug 2004 B1
6850513 Pelissier Feb 2005 B1
6862710 Marchisio Mar 2005 B1
6868065 Kloth et al. Mar 2005 B1
6879586 Miloslavsky et al. Apr 2005 B2
6915344 Rowe et al. Jul 2005 B1
7047242 Ponte May 2006 B1
20010022558 Karr et al. Sep 2001 A1
20010027408 Nakisa Oct 2001 A1
20010027463 Kobayashi Oct 2001 A1
20010042090 Williams Nov 2001 A1
20010047270 Gusick et al. Nov 2001 A1
20010056456 Cota-Robles Dec 2001 A1
20020032715 Utsumi Mar 2002 A1
20020052907 Wakai et al. May 2002 A1
20020059161 Li May 2002 A1
20020059204 Harris May 2002 A1
20020065953 Alford et al. May 2002 A1
20020073129 Wang et al. Jun 2002 A1
20020078119 Brenner et al. Jun 2002 A1
20020078121 Ballantyne Jun 2002 A1
20020078257 Nishimura Jun 2002 A1
20020083251 Chauvel et al. Jun 2002 A1
20020087618 Bohm et al. Jul 2002 A1
20020087623 Eatough Jul 2002 A1
20020091746 Umberger et al. Jul 2002 A1
20020099714 Murray Jul 2002 A1
20020103871 Pustejovsky Aug 2002 A1
20020107926 Lee Aug 2002 A1
20020116463 Hart Aug 2002 A1
20020150966 Muraca Oct 2002 A1
20020154645 Hu et al. Oct 2002 A1
20020196911 Gao et al. Dec 2002 A1
20030028564 Sanfilippo Feb 2003 A1
20030046297 Mason Mar 2003 A1
20030069780 Hailwood et al. Apr 2003 A1
20040167889 Chang et al. Aug 2004 A1
20040254904 Nelken et al. Dec 2004 A1
20050187913 Nelken et al. Aug 2005 A1
Foreign Referenced Citations (7)
Number Date Country
2180392 Feb 2001 CA
0 597 630 May 1994 EP
0 304 191 Feb 1999 EP
09106296 Apr 1997 JP
WO 0036487 Jun 2000 WO
0184373 Aug 2001 WO
0184374 Aug 2001 WO
Related Publications (1)
Number Date Country
20070294201 A1 Dec 2007 US
Provisional Applications (1)
Number Date Country
60468493 May 2003 US
Continuations (1)
Number Date Country
Parent 10835694 Apr 2004 US
Child 11843937 US