This invention relates generally to techniques for processing agent-customer interactions and, more specifically, relates to determining information from the interactions and concurrent agent activity.
Call centers are part of a customer service system, both of which are included under the strategy of customer relationship management (CRM). Call centers handle a variety of topics, from customer support to technical support to billing. Interactions between the customers and the agents who respond to the calls (or the chats) can be complex. Studies in the past have analyzed these interactions to attempt to provide insight and feedback, and thereby improve efficiency, customer loyalty, and revenue.
For instance, a contact study has been used to assess call center and back office operations in delivery centers. The contact study was performed by using a “contact collector tool”, which is an advanced “time and motion” tool that allows for capture and analysis of existing agent contact handling interactions, processes and behaviors. The contact study helped leading companies identify key areas for improvement, including providing data for business case justification to support the overall business vision and leverage the contact center as a competitive differentiator. The contact study provided a mechanism to derive operational strengths and areas for opportunity.
To perform the contact study, people were sent to sit side-by-side with agents and use the contact collector tool to collect information such as segmentations of call handling, information technology (IT) system utilization, and business-related information. Once such data was collected, analysts had to consolidate individual inputs to perform analysis. Overall, this particular contact study took about 320 human hours per engagement.
While the results of the contact study were very useful, the contact study used a tremendous number of human hours. It would be beneficial to provide techniques that do not require such a large human hour requirement.
In a first aspect, a method includes deriving first information from a number of agent-customer interactions in a customer service system, and determining concurrent system activity by the agents in the customer service system, the concurrent system activity occurring at least partially concurrently with the number of agent-customer interactions. The method further includes combining the determined first information and the determined concurrent system activity to determine second information related to one or more of the number of agent-customer interactions, and outputting the second information.
In a second aspect, an apparatus is disclosed that includes one or more processors and one or more memories coupled to the one or more processors and comprising program code. The one or more processors, in response to executing the program code, are configured to cause the apparatus to perform the following: deriving first information from a number of agent-customer interactions in a customer service system; determining concurrent system activity by the agents in the customer service system, the concurrent system activity occurring at least partially concurrently with the number of agent-customer interactions; combining the determined first information and the determined concurrent system activity to determine second information related to one or more of the number of agent-customer interactions; and outputting the second information.
In a third aspect, a computer readable medium is disclosed that tangibly embodies a program of machine-readable instructions executable by a digital processing apparatus to cause the digital processing apparatus to perform operations including: deriving first information from a number of agent-customer interactions in a customer service system, and determining concurrent system activity by the agents in the customer service system, the concurrent system activity occurring at least partially concurrently with the number of agent-customer interactions; combining the determined first information and the determined concurrent system activity to determine second information related to one or more of the number of agent-customer interactions; and outputting the second information.
The foregoing and other aspects of embodiments of this invention are made more evident in the following Detailed Description of Exemplary Embodiments, when read in conjunction with the attached Drawing Figures, wherein:
Techniques are disclosed herein for multi-modal processing for automatic call/chat segmentation and analysis. These techniques can analyze speech/text (i.e., call/chat) agent-customer interactions coupled with concurrent system activity of the agents to derive insights that can improve the efficiency of the customer service process. Applications of these techniques include, but are not limited to, agent performance analysis, process efficiency improvement, and automatic quality monitoring. Applications of these techniques provide analysis at a much lower cost in human hours.
Referring now to
Associated with the back office 145 is a supervisor 140. The back office 145 provides a supervisory level of support, such as billing and oversight. From the front office 130 and the back office 145, the inputs 150 are used in the data collection 155 action. After data collection 155, there is data processing 160, data analysis 165, and insights 170. The instant invention resides primarily in the data processing 160 action, but can also perform at least some part of the data analysis 165 action.
A typical scenario would be that the supervisor 140 would like to be able to examine information about the interaction 115, in order to reach the insights 170. The data processing 160 and data analysis 165 provided by the instant invention can provide the types of exemplary insights 170.
Turning now to
The instant invention can provide time locations T1, T2, T3, and T4 for the call phases. Furthermore, in order to determine the time locations T1-T4, the invention can use the time locations T5, T6, T7, T8, and T9 of the system activities 220 in order to provide more accurate assessments of the locations T1-T4. For instance, the system activity 220 between T6 and T7 indicates that the greeting phase is most likely concluded. Combining the system activity information 220 with information about the interaction 115 can therefore provide additional analysis and determination of the call phase information 210. Moreover, the instant invention can also be used (as a non-limiting, non-exhaustive list) to perform the following, which aid in insight: (a) understand what call phase is taking what proportion of the interaction time (this can be used to change the interaction style, as an example); (b) detect calls that behave significantly differently from an average call; and/or (c) detect calls that fit a certain criterion (e.g., calls with no “closing” phase).
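The idea of using system-activity timestamps to sharpen the call-phase boundaries may be sketched as follows. This is an illustrative sketch only, not taken from the specification; the function names, the snapping heuristic, and the tolerance value are assumptions for illustration.

```python
def refine_boundaries(phase_boundaries, system_events, max_shift=5.0):
    """Snap each estimated phase-boundary time (in seconds) to the closest
    system-activity timestamp, provided one lies within max_shift seconds;
    otherwise keep the audio-only estimate unchanged."""
    refined = []
    for t in phase_boundaries:
        nearest = min(system_events, key=lambda e: abs(e - t))
        refined.append(nearest if abs(nearest - t) <= max_shift else t)
    return refined

# Example: the audio-only estimate puts the end of the greeting at 32 s,
# but a system lookup began at 31.5 s, suggesting the greeting ended there.
boundaries = refine_boundaries([32.0, 95.0, 240.0], [30.0, 31.5, 120.0, 238.0])
```

In this hypothetical example the first and third boundaries snap to nearby system events, while the second stays put because no system activity occurs within the tolerance window.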
Similar to the contact study previously described, the invention may also be used in a contact study. Such contact studies are often a part of CRM process transformation. Project goals of such a contact study include enabling contact-study automation with established bases, providing visibility into front office 130 and back office 145 processes, and developing quantifiable insight for process improvements. Additional goals commensurate with this include:
1) Automate and simplify contact (call and case) study data collection with identification of phase and system timers for contact analysis;
2) Perform advanced analytics with the contact study data for process behavior insights leading to opportunities for process improvement; and
3) Track improvement opportunities identified by time volume capture (TVC) and automatic contact collector (ACC) tools together across sites and resource pools for higher productivity and standardization within processes. Such goals may be met by exemplary aspects of the instant invention.
Referring now to
The insights portion 170 is typically displayed by the client computer 330, with the reporting/charting tool(s) 325 providing data to the client computer 330. The client computer 330 shows output of the reporting/charting tool(s) 325, including a scorecard 335 (e.g., how well certain criteria are being met), a chart 340, and a report 350.
Typically, the front office 130 is the section of the contact center that deals with customers in real time, i.e., voice calls or interactive chats. The back office 145 is the section that deals with non-real-time transactions such as emails, letters, and voice mails. However, the instant invention may take a wide variety of configurations. The scorecard 335, chart 340, and report 350 all help to develop insight, such as to understand what call phase is taking what proportion of the interaction time, to detect calls that behave significantly differently from an average call, and/or to detect calls that fit a certain criterion. The instant invention has aspects spread across all of the data collection portion 155, data processing portion 160, data analysis portion 165, and insights portion 170. The system 300 will typically be used to understand the interaction process at an aggregate level (i.e., across various agents and different times) by an expert (e.g., supervisor 140) whose goal is typically to find ways in which the process can be made more efficient (i.e., spend less time and/or improve the rate of problem resolution and/or improve customer satisfaction) and/or to find areas of improvement for individual agents. Example insights are mentioned above. The insights should give an idea of what kinds of questions can be asked, for example: (a) what was the agent doing when the customer was on hold? (b) what was the main concern of the customer? Other exemplary insights include (a) the time spent in the problem diagnosis phase (a phase 210 of
The instant invention, e.g., using the system 300 or portions thereof may be used to improve the efficiency of call/chat processes by combining (a) insights obtained from the audio exchange of the call, and (b) concurrent activities on the agent's computer system. Further, exemplary embodiments of the instant invention provide methods, apparatus, and program products for segmenting conversations that use multiple sources of information, including system activity, transcription of audio, identity of speakers (e.g., caller/agent), and prosodic features and that use an automatic or semi-supervised learning algorithm to make the most efficient use of available labeled training data. Exemplary embodiments of the instant invention are also directed to techniques for determining identity of speaker that uses acoustic, lexical, automatic speech recognition (ASR)-related and channel-specific features. Additional exemplary embodiments provide techniques for answering higher level questions about calls that use segments of the conversation along with other features including: words transcribed, emotions and information aggregated across calls.
Referring now to
Semi-supervised algorithms are performed in block 430. These algorithms 430 make optimal use of the limited hand-labeled audio calls to generate phase boundaries and/or other labels for unlabeled calls and use these labels to re-learn the characteristics of the interactions. One possible embodiment of a semi-supervised algorithm 430 is described as follows. A Hidden Markov Model (HMM) can be trained on the unlabeled data (which are, e.g., the ASR transcripts of the audio calls with no information about the phase/segment boundaries). The trained HMM will assign a “phase label” to each part of the call transcript. This phase label can then be used as an additional feature in the supervised training procedure on the labeled data. Another way of utilizing the trained HMM is to use its output to find the words/features that are highly correlated with certain HMM states and then assign a higher weight to these words/features in the supervised training.
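The phase-labeling step of the semi-supervised approach above can be sketched with a small Viterbi decoder. This is a hypothetical illustration: the two phases, the vocabulary, and all probabilities are hand-set toy values; in practice the HMM parameters would be learned (e.g., by Baum-Welch) from the unlabeled ASR transcripts.

```python
def viterbi(tokens, states, start_p, trans_p, emit_p):
    """Return the most likely state (phase) label for each token."""
    # Unseen words get a small floor probability rather than zero.
    V = [{s: start_p[s] * emit_p[s].get(tokens[0], 1e-6) for s in states}]
    back = [{}]
    for t in range(1, len(tokens)):
        V.append({})
        back.append({})
        for s in states:
            prev, p = max(((r, V[t - 1][r] * trans_p[r][s]) for r in states),
                          key=lambda x: x[1])
            V[t][s] = p * emit_p[s].get(tokens[t], 1e-6)
            back[t][s] = prev
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(tokens) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

states = ["greeting", "diagnosis"]
start = {"greeting": 0.9, "diagnosis": 0.1}
trans = {"greeting": {"greeting": 0.7, "diagnosis": 0.3},
         "diagnosis": {"greeting": 0.1, "diagnosis": 0.9}}
emit = {"greeting": {"hello": 0.5, "thanks": 0.3},
        "diagnosis": {"error": 0.4, "reboot": 0.3}}
labels = viterbi(["hello", "thanks", "error", "reboot"], states, start, trans, emit)
```

Each token's assigned label could then serve as the additional feature described above for the supervised stage.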
Speech/text interaction(s) 405 are analyzed by block 420, which computes lexical and prosodic (pros) features. Call/chat segmentation is performed in block 425 (see
In block 440, the system 400 may perform automatic answering of questions based on inputs from blocks 415 and 425, and from system activity information 435 and insights from call aggregates 445. Insights from call aggregates 445 are generated by aggregating the calls that are similar on some dimensions such as “on same topic”, “from close geographical location” or “around the same time” and so on. Insights can include “average proportion of each of the phases”, “most likely sequence of phases”, “tools/aids available to the agent” and so on. It is noted that block 440 can benefit from analysis of similar calls, such as calls occurring around the same time or from a geographically close area or on the same topic. Such global analysis captures dynamically varying trends. In block 450, insights to improve process efficiency are determined.
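One of the aggregate insights named above, "average proportion of each of the phases," may be sketched as follows. This is an illustrative sketch only; the data shapes, the grouping dimension (topic), and all values are assumptions.

```python
from collections import defaultdict

def average_phase_proportions(calls):
    """calls: list of dicts with a 'topic' key and a 'phases' dict
    mapping phase name -> duration in seconds. Returns, per topic,
    the average fraction of call time spent in each phase."""
    by_topic = defaultdict(list)
    for call in calls:
        total = sum(call["phases"].values())
        by_topic[call["topic"]].append(
            {p: d / total for p, d in call["phases"].items()})
    return {
        topic: {p: sum(c.get(p, 0.0) for c in per_call) / len(per_call)
                for p in {p for c in per_call for p in c}}
        for topic, per_call in by_topic.items()}

calls = [
    {"topic": "billing", "phases": {"greeting": 30, "diagnosis": 60, "closing": 10}},
    {"topic": "billing", "phases": {"greeting": 10, "diagnosis": 80, "closing": 10}},
]
props = average_phase_proportions(calls)
```

Grouping on other dimensions mentioned in the text ("from close geographical location," "around the same time") would only change the grouping key.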
With regard to the system activity information block 435, customer-agent interaction typically involves a parallel interaction between the agent and the system, e.g., retrieving/verifying customer data, browsing frequently asked questions (FAQs), generating requests, and so on. Accordingly, temporal profiles of various activities of the agent on the system are generated (using, e.g., system times 305). Many high-level questions (e.g., “what did the agent do while the customer was on hold?” and so on) can be answered only by combining such system activity profiles with insights from audio data. System activity information also helps in improving the performance of call segmentation (block 425).
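Answering the hold question above reduces to intersecting the audio-derived hold interval with the agent's system-activity profile. The following is a minimal sketch under assumed names and data shapes, not the specification's implementation.

```python
def activity_during_hold(hold, activities):
    """hold: (start, end) in seconds, derived from the audio analysis.
    activities: list of (start, end, label) system-activity events.
    Returns the labels of events overlapping the hold interval."""
    h_start, h_end = hold
    return [label for start, end, label in activities
            if start < h_end and end > h_start]  # standard interval-overlap test

events = [(0, 20, "CRM lookup"),
          (25, 70, "browsing FAQ"),
          (80, 90, "escalation form")]
answer = activity_during_hold((30, 75), events)
```

Here only "browsing FAQ" overlaps the hold placed at 30-75 s, so the combined modes answer a question neither mode could answer alone.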
In regard to the automatic answering of questions block 440, the following observations may be made: (1) answers for questions are not equally likely in each phase 210; (2) some answers are more likely in speech of the agent (or speech of the customer); and (3) emotions are indicative of many answers. Consequently, to learn likely answer phrases, calls are analyzed where the answers are provided by human experts and the locations of the answers are hand-labeled. This analysis occurs in semi-supervised algorithms block 430 and also in insights from call aggregates block 445. The hand-labels from the experts are learned by semi-supervised algorithms block 430 and the call trends are captured in insights from call aggregates block 445. Additionally, the call/chat segmentation block 425 is the segmentation phase, which has the information that can be used by the automatic answering of questions block 440.
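The three observations above suggest scoring a candidate answer location by its phase, its speaker, and the emotion observed there. The sketch below combines these cues multiplicatively, naive-Bayes style; it is hypothetical, and every probability is an invented toy value, not a learned one.

```python
def answer_score(phase, speaker, emotion,
                 phase_prior, speaker_prior, emotion_prior):
    """Score a candidate answer location by three independent cues;
    unseen values fall back to small/neutral defaults."""
    return (phase_prior.get(phase, 0.01)
            * speaker_prior.get(speaker, 0.5)
            * emotion_prior.get(emotion, 0.5))

# For a question like "what resolution was offered?", the answer is most
# likely in the resolution phase, in the agent's speech, with calm emotion.
phase_p = {"resolution": 0.7, "diagnosis": 0.2, "greeting": 0.05}
speaker_p = {"agent": 0.8, "customer": 0.2}
emotion_p = {"neutral": 0.6, "angry": 0.1}
good = answer_score("resolution", "agent", "neutral", phase_p, speaker_p, emotion_p)
bad = answer_score("greeting", "customer", "angry", phase_p, speaker_p, emotion_p)
```

In practice the priors would be estimated from the hand-labeled answer locations described above.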
Turning to
Concerning ASR-based features, the speaker-independent ASR system, with appropriate AM/LM (acoustic model/language model) 502, periodically computes speaker-specific parameters (SSPs) (e.g., the VTLN α-factor) to improve the recognition performance. VTLN is vocal tract length normalization, and the “VTLN α-factor” is a technical term used in ASR algorithms to recognize speech even when the speaker changes. A significant change in one or more of these SSPs indicates a change in speaker; conversely, regions with similar values for all the SSPs indicate speech from the same speaker. The ASR system 511 uses the appropriate AM/LM (acoustic model/language model) 502 and the speech signal 501. In block 513, temporal variations in speaker-specific parameters (e.g., the VTLN warp factor, also called the VTLN α-factor herein) are computed. In block 515, locations are detected with variations above a certain threshold. In block 520, likely locations of speaker change are determined.
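Blocks 513-520 amount to thresholding the frame-to-frame variation of an SSP track. The following hedged sketch illustrates this; the warp-factor values, window indexing, and threshold are made up for illustration.

```python
def speaker_change_candidates(ssp_track, threshold=0.05):
    """ssp_track: per-window values of a speaker-specific parameter
    (e.g., the VTLN warp factor). Returns indices where the absolute
    change from the previous window exceeds the threshold (block 515),
    i.e., likely speaker-change locations (block 520)."""
    return [i for i in range(1, len(ssp_track))
            if abs(ssp_track[i] - ssp_track[i - 1]) > threshold]

# Warp factor stable near 0.98 (one speaker), then jumping to ~1.10
# (a different vocal tract) at window index 3.
warp = [0.98, 0.99, 0.98, 1.10, 1.11, 1.10]
changes = speaker_change_candidates(warp)
```

A production system would smooth the track and require the change to persist, but the thresholding principle is the same.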
With regard to prosodic features, each speaker has a unique speech production apparatus. This uniqueness is captured by analyzing the physical speech signal 501. In block 505 therefore, prosodic features such as pitch, energy, and voice quality are computed. In block 506, locations are detected where feature variation is above a certain threshold. In block 510, likely locations of speaker changes are determined.
Concerning lexical features, typically, different sets of words are spoken by the customer and the agent during different phases of the interaction. In order to determine these different sets, transcripts are computed in block 525. In block 530, short-time histograms of different N-grams are computed. In block 535, locations are identified where the histograms shift substantially. In block 540, likely locations of speaker change are determined.
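Blocks 525-540 can be sketched as a sliding-window comparison of word histograms. The window size, the L1 distance measure, and the threshold below are assumptions for illustration, not values from the specification.

```python
from collections import Counter

def histogram_shifts(words, window=4, threshold=0.5):
    """Return word indices where the short-time unigram histogram of the
    following window differs from that of the preceding window by more
    than threshold (L1 distance on normalized counts, block 535)."""
    shifts = []
    for i in range(window, len(words) - window + 1):
        prev = Counter(words[i - window:i])
        nxt = Counter(words[i:i + window])
        vocab = set(prev) | set(nxt)
        dist = sum(abs(prev[w] / window - nxt[w] / window) for w in vocab)
        if dist > threshold:
            shifts.append(i)
    return shifts

# Greeting vocabulary followed by problem-description vocabulary:
words = ["hello", "hi", "hello", "hi", "error", "reboot", "error", "reboot"]
shifts = histogram_shifts(words)
```

The detected index marks a likely transition in who is speaking or in the interaction phase; short-time histograms of higher-order N-grams would be handled the same way.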
It is noted that channel-specific features may also be used in addition to the ASR and/or the prosodic features. The volume, background noise, and other non-speech cues vary between the customer and the agent locations.
Combination of the above features in block 545 leads to a temporal profile of silence/speaker-turn and locations of speaker changes.
Turning now to
For instance,
The speaker turn information 415 (see
Agent-system interaction 625 is input to block 630. The agent-system interaction 625 is the system activity information 435. In block 630, system activity analysis is performed, and locations of important events are determined in block 640. It is noted that the system activity analysis in block 630 may be supplemented and helped by events/categories of applications to track (block 645). Some examples of events/categories-of-applications to track are “agent filling the problem escalation form”, “agent browsing FAQ pages”, “agent accessing the client's servers for information” and so on.
In block 660, call aggregates 650 are analyzed to learn rules that indicate phase changes and/or the identity of a phase. One way of learning the rules mentioned in block 660 is to analyze the distribution of words in the vicinity of phase boundaries and in the middle of the phases.
In block 670, these various outputs are combined in order to segment calls. A phase boundary is identified at locations where more than one of the following sources identifies a trigger: (a) a speaker change is identified; (b) an account-specific or N-gram based feature is detected; (c) system activity indicates an event of interest; and/or (d) a phase-change or phase-ID rule is triggered.
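The multi-source combination in block 670 can be sketched as a simple voting scheme over trigger times. This is an illustrative sketch only; the vote count, time tolerance, and example trigger times are assumptions.

```python
def segment_boundaries(sources, min_votes=2, tolerance=2.0):
    """sources: dict mapping source name (speaker change, N-gram feature,
    system event, learned rule) -> list of trigger times in seconds.
    Returns times where at least min_votes distinct sources trigger
    within +/- tolerance seconds of each other."""
    boundaries = []
    all_times = sorted(t for times in sources.values() for t in times)
    for t in all_times:
        votes = sum(any(abs(t - u) <= tolerance for u in times)
                    for times in sources.values())
        # Keep only the first time of each agreeing cluster.
        if votes >= min_votes and not any(abs(t - b) <= tolerance
                                          for b in boundaries):
            boundaries.append(t)
    return boundaries

triggers = {
    "speaker_change": [31.0, 120.5],
    "ngram_feature": [30.5, 200.0],
    "system_event": [32.0, 119.8],
}
cuts = segment_boundaries(triggers)
```

In this toy input, two clusters of triggers agree (near 31 s and near 120 s) and become boundaries, while the isolated N-gram trigger at 200 s is rejected for lack of corroboration.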
Each of the above modes provides complementary information. For example, (a) audio analysis can indicate the location of a hold and who initiated the hold, (b) the corresponding system information can indicate what happened during the hold, and (c) speaker identification (ID) detection after the hold can identify whether a new agent (e.g., a subject matter expert) joined the interaction. Combining this information captured by different modes gives a richer understanding of the interaction. Blocks 670 and 450 are the blocks where this combination of information from different modes is performed.
Referring now to
A selection criterion may also be selected or entered (in the Enter block with “X=?”). The button 721 allows one to list calls and then to select a call. The button 722 allows a selected call to be played. The button 723 allows a transcript and phrases to be displayed.
Turning to
In block 1420, concurrent system activity is determined (see, e.g., blocks 435, 635). The concurrent system activity occurs concurrently with the agent-customer interactions. In block 1430, the determined first information and the determined concurrent system activity are combined to determine second information related to one or more of the agent-customer interactions. The second information is output (block 1440), e.g., in a form suitable for display. The second information is displayed in block 1450. Such display could be, e.g., the scorecard 335, chart 340, or report 350 in
In block 1460, insights are determined using the displayed information. Insights have been described above but include (a) understanding what call phase is taking what proportion of the interaction time (this can be used to change the interaction style, as an example); (b) detecting calls that behave significantly differently from an average call; (c) detecting calls that fit a certain criterion (e.g., calls with no “closing” phase); (d) determining that the time spent in the problem diagnosis phase (a phase 210 of
As should be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” It is noted that “entirely software” embodiments still require some type of hardware (e.g., a general purpose computer) on which to be executed (and therefore create a special purpose computer performing one or more of the actions described herein). Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or assembly language or similar programming languages. Such computer program code may also include code for field-programmable gate arrays, such as VHDL (Very-high-speed integrated circuit Hardware Description Language).
Aspects of the present invention are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable digital processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable digital processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the best techniques presently contemplated by the inventors for carrying out embodiments of the invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. All such and similar modifications of the teachings of this invention will still fall within the scope of this invention.
Furthermore, some of the features of exemplary embodiments of this invention could be used to advantage without the corresponding use of other features. As such, the foregoing description should be considered as merely illustrative of the principles of embodiments of the present invention, and not in limitation thereof.