Identifying an intent of a caller in a conversation between the caller and an agent of a call center is a useful task for efficient customer relationship management (CRM), where an intent may be, for example, a reason why the caller has called into the call center. CRM processes, both automatic and manual, can be designed to improve intent identification. Intent identification is useful for CRM to determine issues related to products and services, for example, in real-time as callers call the call center. In addition, these processes can both improve customer satisfaction and allow for cross-selling/upselling of other products.
In an embodiment, a method of labeling sentences for presentation to a human can include, in a hardware processor, selecting an intent bearing excerpt from sentences in a database, presenting the intent bearing excerpt to the human, and enabling the human to apply a label to each sentence based on the presentation of the intent bearing excerpt, the label being stored in a field of the database corresponding to the respective sentence. The sentences can be a grouping of sentences, such as from a same audio or text file. The sentences can be sentences associated with each other, for example, by being from the same source (e.g., from the same speaker or dialogue).
In another embodiment, the method can further include training the selecting of the intent bearing excerpt through use of manual input.
In yet another embodiment, the method can further include filtering the sentences used for training based on an intelligibility threshold. The intelligibility threshold can be an automatic speech recognition confidence threshold.
In yet another embodiment, the method can include choosing a representative sentence of a set of sentences based on at least one of similarity of the sentences of the set or similarity of intent bearing excerpts of the set of sentences. The method can further include applying the label to the entire set based on the label chosen for the intent bearing excerpt of the representative sentence.
In yet another embodiment, the intent bearing excerpt can be a non-contiguous portion of the sentences.
In another embodiment, the method can further include determining a part of the excerpt likely to include an intent of the sentences. Selecting the intent bearing excerpt can include focusing the selection on the part of the excerpt that includes the intent.
In yet another embodiment, the method can include loading the sentences by loading a record that includes a dialogue, monologue, transcription, dictation, or combination thereof.
In another embodiment, the method can include annotating the excerpt with a suggested label and presenting the excerpt with the suggested annotation to the human.
In another embodiment, the method can include presenting the intent bearing excerpt to a third party.
In another embodiment, a system for labeling sentences for presentation to a human can include a selection module configured to select an intent bearing excerpt from sentences associated with each other. The system can further include a presentation module configured to present the intent bearing excerpt to the human. The system can further include a labeling module configured to enable the human to apply a label to each of the sentences based on the presentation of the intent bearing excerpt.
In another embodiment, a non-transitory computer-readable medium can be configured to store instructions for labeling sentences for presentation to a human. The instructions, when loaded and executed by a processor, can cause the processor to select an intent bearing excerpt from sentences, present the intent bearing excerpt to the human, and enable the human to apply a label to each sentence based on the presentation of the intent bearing excerpt.
The foregoing will be apparent from the following more particular description of example embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating embodiments of the present invention.
A description of example embodiments of the invention follows.
In an embodiment of the present invention, call classification can have two phases. A first phase is the training of a classifier. In the first phase of training, a human is used to label example calls to train a classifier. Stated another way, training can be a human assigning one of a set of labels to each call. Training produces a classifier, which is a form of statistical model and which can be embodied as a file in a memory.
A second phase of call classification is the classification of calls not labeled during training. The second phase is performed by a computer program that extracts information from the calls and uses the classifier (e.g., statistical model) to attempt to automatically assign labels to the unlabeled calls. An embodiment of the present invention optimizes the first phase of training the classifier to minimize human labor in training the classifier and/or creating a more accurate classifier.
Manually labeling a subset of calls with intent labels helps accurately predict the intent labels for the remaining calls using a classifier trained by the manual labeling. While manually labeling most or all of the calls with intent labels can improve label prediction accuracy, such a large manual effort is costly and impractical in most scenarios.
A traditional call classification system assigns intent labels to all the unlabeled calls. Human supervised or semi-supervised methods achieve improved accuracy by manually assigning labels to calls. Human supervised or semi-supervised methods can include manual labeling of calls or providing labels to a classifier, which can then label calls. Prediction accuracy is high if more calls are manually labeled, but that requires a large manual effort. Based on a chosen budget of manual effort (e.g., labor budget, budget of manual labeling, budget of human effort, budget of human labeling), the system chooses a subset M of N total calls to label manually. The system trains a classifier based on the M manually labeled calls. The classifier is later used to automatically label the remaining N-M calls. Typically, higher accuracy can require a higher M value, or a higher M:N ratio.
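As a minimal sketch of the budget-driven split, assuming each call is represented by a dict with an illustrative "text" field, a random selection stands in here for whichever strategy is used to choose the M calls:

```python
import random

def split_by_budget(calls, budget_m, seed=0):
    """Split N unlabeled calls into M calls for manual labeling and
    N - M calls left for automatic labeling by the trained classifier."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(calls)), min(budget_m, len(calls))))
    manual = [c for i, c in enumerate(calls) if i in chosen]
    automatic = [c for i, c in enumerate(calls) if i not in chosen]
    return manual, automatic

# Example: a manual-labeling budget of M = 2 out of N = 5 calls.
calls = [{"text": f"call {i}"} for i in range(5)]
to_label_manually, to_label_automatically = split_by_budget(calls, budget_m=2)
print(len(to_label_manually), len(to_label_automatically))  # 2 3
```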
In an embodiment, a labeling system is used to achieve an optimal label prediction accuracy with least possible manual effort. The labeling system includes three subsystems that reduce manual effort involved in traditional intent labeling systems. A first subsystem is a call intelligibility classifier. Not all the calls recorded by the call center are intelligible or contain useful information. For example, for some calls, the automated speech recognition (ASR) error rate is high enough that it is impossible to determine information, such as an intent, from the call. As another example, the caller can be speaking in a different language. As another example, the call may have produced an error at the interactive voice response (IVR) system and, therefore, not produced a useful text result. Discarding such unintelligible calls automatically reduces the manual effort involved in labeling such calls.
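A minimal sketch of such an intelligibility filter, assuming each call record carries an ASR confidence score in a hypothetical "asr_confidence" field (the field name and the threshold value are illustrative, not prescribed):

```python
def filter_intelligible(calls, confidence_threshold=0.6):
    """Keep calls whose ASR confidence meets the threshold; discard the rest
    so they are never sent to the manual labeler."""
    intelligible, discarded = [], []
    for call in calls:
        target = intelligible if call["asr_confidence"] >= confidence_threshold else discarded
        target.append(call)
    return intelligible, discarded

calls = [
    {"text": "I would like to check my account balance", "asr_confidence": 0.91},
    {"text": "uh the um balance uh", "asr_confidence": 0.32},
]
keep, drop = filter_intelligible(calls)
print(len(keep), len(drop))  # 1 1
```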
A second subsystem is a call intent summarizer. Caller intent is typically conveyed in short segments within calls. The call intent summarizer generates an intent-focused summary of the call to reduce the manual effort by a human by avoiding the reading by the human of the irrelevant parts of the calls. For example, consider a call stating “Hello. I am a customer and I would like to be able to check my account balance.” The call intent summarizer can generate a call intent summary stating “check my account balance,” saving the human the time of reading irrelevant words of the call.
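A minimal sketch of such a summarizer, using a hand-written cue-phrase heuristic as a stand-in for the trained summarizer described later (the cue list and regular expression are illustrative assumptions):

```python
import re

# Cue phrases that often precede the caller's stated intent; the list is an
# illustrative assumption and would normally be learned or curated from data.
INTENT_CUES = [
    r"i would like to(?: be able to)?",
    r"i need to",
    r"i(?:'| a)m calling (?:to|about|because)",
    r"can you help me(?: with)?",
]
CUE_PATTERN = re.compile(r"(?:{})\s+(.*)".format("|".join(INTENT_CUES)), re.IGNORECASE)

def summarize_intent(call_text):
    """Return a short intent bearing excerpt of the call, or the full text
    if no cue phrase is found."""
    match = CUE_PATTERN.search(call_text)
    if match:
        # Keep the clause after the cue, trimmed at the end of the sentence.
        return re.split(r"[.?!]", match.group(1))[0].strip()
    return call_text

print(summarize_intent(
    "Hello. I am a customer and I would like to be able to check my account balance."
))  # -> check my account balance
```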
A third subsystem is an active sampling module. Label information for one or more of the calls can be generalized to a set of calls. For example, the system may determine that a set of calls have a similar intent (e.g., by having a similar pattern of words, etc.). Upon a human's choosing an intent bearing label for one of the set of calls, a classifier can apply this label to the remainder of the calls, so there is no need for a human to label a call manually with the same intent again. Choosing an optimal set of calls for manual labeling can lead to maximal information gain and, thus, least manual effort because the human only has to label one representative call of the set as opposed to each call individually.
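One way to sketch the active sampling step is to cluster the call summaries and keep one representative per group; TF-IDF features with k-means clustering are an illustrative choice, not the only possible grouping method:

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def choose_representatives(summaries, n_groups=2, seed=0):
    """Group call summaries that look alike and pick one representative per
    group for manual labeling; a label chosen for the representative can then
    be applied to the rest of its group."""
    features = TfidfVectorizer().fit_transform(summaries)
    groups = KMeans(n_clusters=n_groups, n_init=10, random_state=seed).fit_predict(features)
    representatives = {}
    for index, group in enumerate(groups):
        representatives.setdefault(int(group), index)  # first member represents the group
    return representatives  # maps group id -> index of the representative summary

summaries = [
    "check my account balance",
    "check the balance on my account",
    "cancel my subscription",
    "cancel my monthly subscription",
]
representatives = choose_representatives(summaries, n_groups=2)
print({group: summaries[index] for group, index in representatives.items()})
```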
These three subsystems can be combined as a pre-screening process to use human effort to label calls manually more efficiently. Combined, the three subsystems prevent human effort from being spent on labeling calls that are unintelligible, prevent human effort from being spent on labeling calls similar to calls already manually labeled, and isolate the intent bearing parts of each call so that the human can label each call faster. Together, the three subsystems therefore allow the manual labeling to apply to a broader set of calls and produce a more robust training of the classifier. Alternatively, less time can be spent manually labeling, thereby reducing the labor budget of a project, while still producing the same training of the classifier.
The call preprocessing module 106 outputs calls to be manually labeled 108 to a presentation device 110. A manual labeler 116, from the presentation device 110, reads an intent bearing excerpt 114 associated with one of the calls to be manually labeled 108. The call preprocessing module 106 generates the intent bearing excerpt 114 in processing the unlabeled calls 104. Consider an example unlabeled call 104 stating “Hello. I would like help to purchase a ticket to Toronto on Thursday.” An example intent bearing excerpt 114 for this call can be “ticket to Toronto on Thursday.” The manual labeler 116 can read the intent bearing excerpt 114 instead of reading the entire call, and therefore can label each call faster, because the presentation device 110 shows the manual labeler 116 only the intent bearing excerpt 114. The call preprocessing module 106, for example, can compute an intelligibility score for each call. Calls with a score below a threshold are assumed to be unintelligible and are filtered out of the list of calls to be manually labeled. The call preprocessing module 106 can further reduce the number of calls presented to the human by presenting for manual labeling only one call per group of similar calls. The call preprocessing module 106 can perform active sampling to group similar calls together, and only present one of a group of calls with similar intent bearing excerpts 114 to the manual labeler 116 on the presentation device 110.
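The pre-screening flow of the call preprocessing module 106 can be sketched end to end as follows; the field names, the excerpt heuristic, and the exact-match grouping are simplified stand-ins for the trained components described above:

```python
def preprocess_calls(calls, confidence_threshold=0.6):
    """Discard unintelligible calls, attach an intent bearing excerpt to each
    remaining call, and keep only one call per group of matching excerpts
    for manual labeling; the rest are routed to automatic labeling."""
    to_label_manually, to_label_automatically = [], []
    seen_excerpts = set()
    for call in calls:
        if call["asr_confidence"] < confidence_threshold:
            continue  # unintelligible: never shown to the manual labeler
        text = call["text"]
        # Simplified excerpt: the clause after "would like", else the full text.
        excerpt = text.split("would like", 1)[-1].strip(" .") if "would like" in text else text
        enriched = dict(call, excerpt=excerpt)
        if excerpt in seen_excerpts:
            to_label_automatically.append(enriched)  # a similar call is already queued
        else:
            seen_excerpts.add(excerpt)
            to_label_manually.append(enriched)
    return to_label_manually, to_label_automatically

calls = [
    {"text": "Hello. I would like help to purchase a ticket to Toronto on Thursday.",
     "asr_confidence": 0.90},
    {"text": "Hi, I would like help to purchase a ticket to Toronto on Thursday.",
     "asr_confidence": 0.88},
    {"text": "[unintelligible]", "asr_confidence": 0.20},
]
manual, automatic = preprocess_calls(calls)
print([c["excerpt"] for c in manual])  # one representative excerpt
print(len(automatic))                  # 1: similar call routed to automatic labeling
```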
Upon a budget of manual labor being exhausted, the presentation device 110 outputs intents and corresponding calls 120 to a classifier training module 122. The classifier training module 122 builds a classification model 124 based on the intents and corresponding calls 120. Then, a call classifier 126 receives calls to be automatically labeled 118 from the call preprocessing module 106. The call classifier 126, using the classification model 124, automatically labels the calls to be automatically labeled 118 and outputs calls with labels 128. Therefore, the call preprocessing module 106, by improving the efficiency of the manual labeler 116, either reduces the labor budget to be expended for manual labeling, or creates a more robust classification model 124 based on the improved efficiency of the manual labeler 116 with the same labor budget.
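The training and automatic labeling steps can be sketched as follows; a TF-IDF plus logistic-regression pipeline is one plausible embodiment of the classification model 124, and the labels and example calls are illustrative:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_and_autolabel(labeled_calls, unlabeled_calls):
    """Build a classification model from the manually labeled calls and use it
    to label the remaining calls automatically."""
    texts, labels = zip(*labeled_calls)
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(texts, labels)
    return list(zip(unlabeled_calls, model.predict(unlabeled_calls)))

# Intents and corresponding calls produced by the manual labeling phase.
labeled = [
    ("I would like to check my account balance", "BALANCE_INQUIRY"),
    ("what is the balance on my savings account", "BALANCE_INQUIRY"),
    ("I want to cancel my subscription", "CANCEL_SERVICE"),
    ("please cancel my monthly plan", "CANCEL_SERVICE"),
]
# Calls to be automatically labeled.
unlabeled = ["could you tell me my current balance", "I'd like to cancel the plan"]
for call, label in train_and_autolabel(labeled, unlabeled):
    print(label, "<-", call)
```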
The M intelligible calls 432 are then sent to a manual intent labeling trainer 434. The manual intent labeling trainer 434 is employed to train an intent summarizer 438 to find intent bearing excerpts of sentences. The intent summarizer 438 is not employed to find the intents themselves, but rather is employed to find areas of sentences in a call that are likely to contain the intent. In order to perform such a summary of sentences, a user manually provides data on a number of calls to build a classifier, or training info for summarizer 436, that the intent summarizer 438 can use for the rest of the M intelligible calls 432. The intent summarizer 438 then outputs call summaries 440 to an active sampling module 442. The active sampling module 442 forms groups of calls that are determined to have the same meaning, and selects a representative subset from each group for labeling. The active sampling module 442 then only presents or displays a representative subset of calls or call summaries of each group to the user for manual labeling of the calls. The representative subset of calls or call summaries can be one or more calls or call summaries.
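One way to sketch how the training info for summarizer 436 might be used is to model the intent summarizer 438 as a per-sentence classifier trained on sentences a user has marked as intent bearing; the example data and the choice of model below are assumptions for illustration only:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Training info for the summarizer: sentences a user has manually marked as
# intent bearing (1) or not (0). The sentences and labels are illustrative.
marked_sentences = [
    ("hello thanks for calling", 0),
    ("i would like to check my account balance", 1),
    ("have a nice day", 0),
    ("i need to reset my online password", 1),
    ("ok let me pull that up", 0),
    ("i am calling about a charge on my bill", 1),
]

texts, marks = zip(*marked_sentences)
summarizer = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
summarizer.fit(texts, marks)

def summarize(call_sentences):
    """Return the sentence of the call judged most likely to bear the intent."""
    scores = summarizer.predict_proba(call_sentences)[:, 1]
    return call_sentences[int(scores.argmax())]

call = [
    "hi good morning",
    "i would like to purchase a ticket to Toronto on Thursday",
    "thanks bye",
]
print(summarize(call))
```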
Client computer(s)/devices 50 and server computer(s) 60 provide processing, storage, and input/output devices executing application programs and the like. The client computer(s)/devices 50 can also be linked through communications network 70 to other computing devices, including other client devices/processes 50 and server computer(s) 60. The communications network 70 can be part of a remote access network, a global network (e.g., the Internet), a worldwide collection of computers, local area or wide area networks, and gateways that currently use respective protocols (TCP/IP, Bluetooth®, etc.) to communicate with one another. Other electronic device/computer network architectures are suitable.
In one embodiment, the processor routines 92 and data 94 are a computer program product (generally referenced 92), including a non-transitory computer-readable medium (e.g., a removable storage medium such as one or more DVD-ROM's, CD-ROM's, diskettes, tapes, etc.) that provides at least a portion of the software instructions for the invention system. The computer program product 92 can be installed by any suitable software installation procedure, as is well known in the art. In another embodiment, at least a portion of the software instructions may also be downloaded over a cable communication and/or wireless connection.
While this invention has been particularly shown and described with references to example embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.