The components in the drawings are not necessarily to scale relative to each other.
Like reference numerals designate corresponding parts throughout the several views.
As will be described in detail herein with reference to several exemplary embodiments, systems and methods for analyzing communication sessions can potentially enhance post-recording processing of communication sessions. In this regard, it is known that compliance recording and/or recording of communication sessions for other purposes involves recording various types of information that are of relatively limited substantive use. By way of example, music, announcements and/or queries by IVR systems commonly are recorded. Such information can cause problems during post-recording processing in that these types of information can impede accurate processing by speech recognition and phonetic analysis systems. Additionally, since such information affords relatively little substantive value, inclusion of such information tends to consume recording resources, i.e., the information takes up space in memory, thereby incurring cost without providing corresponding value.
Referring now to
In some embodiments, information that does not correspond to a voice component of any party to the communication session is deleted from the recording of the communication session. As another example, such information could be identified and any post-recording processing algorithms could ignore those portions, thereby enabling processing resources to be devoted to analyzing other portions of the recordings.
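By way of illustration, the deletion approach can be sketched as follows. The sketch assumes a hypothetical data model in which the recording has already been segmented and labeled; the `Segment` class and the label names are illustrative only and are not part of any described system:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start_s: float   # segment start time, in seconds
    end_s: float     # segment end time, in seconds
    label: str       # e.g. "voice", "music", "announcement", "ivr"

def strip_non_voice(segments):
    """Return only the segments attributable to a party's voice."""
    return [s for s in segments if s.label == "voice"]

recording = [
    Segment(0.0, 12.5, "announcement"),
    Segment(12.5, 40.0, "music"),    # music on hold
    Segment(40.0, 95.0, "voice"),    # customer/agent conversation
]
kept = strip_non_voice(recording)
```

Under this sketch, only the customer/agent portion survives; equivalently, a processing algorithm could consult the labels and simply skip the non-voice segments rather than delete them.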
As a further example, at least with respect to announcements and queries from IVR systems that involve pre-recorded or synthetic human voices (i.e., computer generated voices), information regarding those audio components can be provided to the post-recording processing algorithms so that analysis can be accomplished efficiently. In particular, if the processing system has knowledge of the actual words that are being spoken in those audio components, the processing algorithm can more quickly and accurately convert those audio components to transcript form (as in the case of speech recognition) or to phoneme sequences (as in the case of phonetic analysis).
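A minimal sketch of this idea follows. The prompt identifiers, stored transcripts, and function names are hypothetical placeholders, not part of any described system; the point is only that a known prompt's stored text can be emitted directly, bypassing full recognition:

```python
# Hypothetical table of pre-recorded IVR prompts and their known words.
KNOWN_PROMPTS = {
    "prompt_welcome": "thank you for calling please hold",
    "prompt_account": "please enter your account number",
}

def transcribe_segment(segment_id, fallback_recognizer):
    """Use the stored transcript for known prompts; otherwise run
    the full recognition path on the audio segment."""
    if segment_id in KNOWN_PROMPTS:
        return KNOWN_PROMPTS[segment_id]     # no recognition needed
    return fallback_recognizer(segment_id)   # full analysis path

text = transcribe_segment("prompt_account", lambda s: "<recognized>")
```

The same lookup could equally yield a known phoneme sequence for phonetic analysis rather than a transcript.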
The contact center also incorporates an automated call distributor (ACD) 314 that facilitates routing of a call between the customer and the agent. The communication session is recorded by a recording system 316 that is able to provide information corresponding to the communication session to the voice analysis system for analysis.
In operation, the voice analysis system receives information corresponding to a communication session that occurs between a customer 320 and an agent 322, with the session occurring via a communication network 324. Specifically, the ACD routes the call so that the customer and agent can interact and the recorder records the communication session.
With respect to the voice analysis system 302, the identification system 304 analyzes the communication session (e.g., from the recording) to determine whether post-recording processing should be conducted with respect to each of the recorded portions of the session. Based on the determinations, which can be performed in various manners (examples of which are described in detail later), processing can be performed by the post-recording processing system 306. By way of example, the embodiment of
Notably, the ACD 314 can be responsible for providing various announcements to the customer. In some embodiments, these announcements can be provided via synthetic human voices and/or recordings. It should be noted that other types of announcements can be present in recordings that are not provided by an ACD. By way of example, a telephone central office can introduce announcements that could be recorded. As another example, voice mail systems can provide announcements. The principles described herein relating to treatment of ACD announcements are equally applicable to such other forms of announcements regardless of the manner in which the announcements become associated with a recording.
Additionally or alternatively, the ACD can facilitate interaction of the customer with an IVR system that queries the customer for various information. Additionally or alternatively, the ACD can provide music on hold, such as when the call is queued awaiting pickup by an agent. It should be noted that other types of music can be present in recordings that are not provided by an ACD. By way of example, a customer could be speaking to an agent when music is being played in the background. The principles described herein relating to treatment of ACD music on hold are equally applicable to such other forms of music regardless of the manner in which the music becomes associated with a recording.
In block 410, information regarding the presence of the music, announcements and/or IVR audio is used to influence post-recording processing of a communication session. By way of example, the corresponding portions of the recording can be designated or otherwise flagged with information indicating that music, announcements and/or IVR audio is present. Other manners in which such a post-recording process can be influenced will be described in greater detail later.
Thereafter, the process proceeds to block 412, in which post-recording processing is performed. In particular, such post-recording processing can include at least one of speech recognition and phonetic analysis.
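An illustrative sketch of how the flags of block 410 can influence the processing of block 412 follows. The portion labels and the placeholder recognizer are hypothetical; a flagged portion is simply skipped so that analysis resources go to the remaining audio:

```python
# Labels treated as flagged; portions so labeled are not analyzed.
FLAGGED_LABELS = {"music", "announcement", "ivr"}

def process_recording(portions, recognize):
    """portions: list of (label, audio) pairs. Returns transcripts of
    only the unflagged portions."""
    transcripts = []
    for label, audio in portions:
        if label in FLAGGED_LABELS:
            continue                          # flag influences processing
        transcripts.append(recognize(audio))  # speech recognition/analysis
    return transcripts

session = [("announcement", "a1"), ("voice", "v1"),
           ("music", "m1"), ("voice", "v2")]
result = process_recording(session, lambda audio: f"text:{audio}")
```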
With respect to the identification of various portions of a communication session, a voice analysis system can be used to distinguish those portions of a communication session that include voice components of a party to the communication from other audio components. Depending upon the particular embodiment, such a voice analysis system could identify the voice components of the parties as being suitable for post-recording analysis and/or could identify other portions as not being suitable for post-recording analysis.
In some embodiments, a voice analysis system is configured to identify dual tone multi-frequency (DTMF) tones, i.e., the sounds generated by a touch tone phone. In some of these embodiments, the tones can be removed from the recording. In removing such tones prior to speech recognition and/or phonetic analysis, such analysis may be more effective as the DTMF tones may no longer mask some of the recorded speech.
As an additional benefit, the desire for improved security of personal information may require in some circumstances that such DTMF tones not be stored or otherwise made available for later access. For instance, a customer responding to an IVR system query may input DTMF tones corresponding to a social security number or a bank account number. Clearly, recording such tones could increase the likelihood of this information being compromised. However, an embodiment of a voice analysis system that deletes these tones does not incur this potential liability.
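One manner in which such DTMF tones can be identified is the Goertzel algorithm, which measures a signal's energy at each of the eight standard DTMF frequencies and selects the strongest low/high pair. The sketch below is illustrative only; a practical detector would also apply energy thresholds and twist checks to reject non-DTMF audio before acting on a detection:

```python
import math

def goertzel_power(samples, sample_rate, freq):
    """Power of a single frequency component (Goertzel algorithm)."""
    k = int(0.5 + len(samples) * freq / sample_rate)
    w = 2.0 * math.pi * k / len(samples)
    coeff = 2.0 * math.cos(w)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2

# Standard DTMF keypad: each key is one low-group plus one high-group tone.
LOW = [697, 770, 852, 941]
HIGH = [1209, 1336, 1477, 1633]
KEYS = ["123A", "456B", "789C", "*0#D"]

def detect_dtmf(samples, sample_rate=8000):
    """Return the keypad digit whose tone pair is strongest."""
    low = max(LOW, key=lambda f: goertzel_power(samples, sample_rate, f))
    high = max(HIGH, key=lambda f: goertzel_power(samples, sample_rate, f))
    return KEYS[LOW.index(low)][HIGH.index(high)]

# Synthesize the tone pair for "5" (770 Hz + 1336 Hz) and detect it.
rate = 8000
tone = [math.sin(2 * math.pi * 770 * n / rate) +
        math.sin(2 * math.pi * 1336 * n / rate) for n in range(205)]
digit = detect_dtmf(tone, rate)
```

Once a tone's time span is known, the corresponding samples can be zeroed or excised from the recording, addressing both the masking and the security concerns noted above.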
In some embodiments, signaling tones, such as distant and local ring tones and busy equipment signals, can be identified. With respect to the identification of ring tones, identification of regional tones can provide additional information about a call that may be useful. By way of example, such tones could identify the region to which an agent placed a call while a customer was on hold. Moreover, once identified, the signaling tones can be removed from the recording of the communication session.
Regional identification of audio components also can occur in some embodiments with respect to announcements. In this regard, some regions provide unique announcements, such as those originating from a central telephone office. For example, in the United States an announcement may be as follows, “I am sorry, all circuits are busy. Please try your call again later.” Identifying such an audio component in a recording could then inform a user that a party to the communication session attempted to place a call to the United States.
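In its simplest form, such regional identification can amount to a normalized lookup against a table of known announcements. The sketch below is illustrative, with a hypothetical single-entry table; a deployed system would match against audio fingerprints or recognized text for many regions:

```python
# Hypothetical table mapping known central-office announcements to the
# region they indicate.
REGIONAL_ANNOUNCEMENTS = {
    "i am sorry all circuits are busy please try your call again later":
        "United States",
}

def identify_region(announcement_text):
    """Normalize the text (case, whitespace, punctuation) and look it up;
    returns the region name, or None if the announcement is unknown."""
    key = " ".join(announcement_text.lower().split())
    key = "".join(ch for ch in key if ch.isalnum() or ch == " ")
    return REGIONAL_ANNOUNCEMENTS.get(key)

region = identify_region(
    "I am sorry, all circuits are busy. Please try your call again later.")
```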
Various techniques can be used for differentiating the various portions of a communication session. In this regard, energy envelope analysis, which involves graphically displaying the amplitude of audio of a communication session, can be used to distinguish music from voice components. This is because music tends to follow established tempo patterns and oftentimes exhibits higher energy levels than voice components.
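As a rough numeric analogue of such an envelope, the sketch below computes per-frame RMS energy and compares its variability for two synthetic signals: a steady tone standing in for music and a bursty signal standing in for speech, which alternates utterances with near-silent pauses. The heuristic and the synthetic signals are illustrative only; practical discrimination uses more robust features:

```python
import math

def energy_envelope(samples, frame_len):
    """Per-frame RMS energy of an audio signal."""
    env = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        env.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return env

def envelope_variability(env):
    """Coefficient of variation of the envelope; speech tends higher."""
    mean = sum(env) / len(env)
    var = sum((e - mean) ** 2 for e in env) / len(env)
    return math.sqrt(var) / mean

# Steady "music" vs. bursty "voice" (quiet every other 200-sample span).
music = [math.sin(0.3 * n) for n in range(800)]
voice = [math.sin(0.3 * n) if (n // 200) % 2 == 0 else 0.01
         for n in range(800)]

frame = 100
music_cv = envelope_variability(energy_envelope(music, frame))
voice_cv = envelope_variability(energy_envelope(voice, frame))
```

Under these assumptions the bursty signal exhibits markedly higher envelope variability, which is one cue a system (or a user viewing the displayed envelope) can use to separate music from voice components.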
In some embodiments, such identification can be accomplished manually, semi-automatically or automatically. By way of example, a semi-automatic mode of identification can include providing a user with a graphical user interface that depicts an energy envelope corresponding to a communication session. The graphical user interface could then provide the user with a sliding window that can be used to identify contiguous portions of the communication session. In this regard, the sliding window can be altered to surround a portion of the recording that is identified, such as by listening to that portion, as music. The portion of the communication session that has been identified within such a sliding window as being attributable to music can then be automatically compared by the system to other portions of the recorded communication session. When a suitable match is automatically identified, each such portion also can be designated as being attributable to music.
Additionally or alternatively, some embodiments of a voice analyzer system can differentiate between announcements and tones that are regional in nature. This can be accomplished by comparing the recorded announcements and/or tones to a database of known announcements and tones to check for parity. Once designations are made about the portions of a communication session containing regional characteristics, the actual audio can be discarded or otherwise ignored during post-recording processing. In this manner, speech analysis does not need to be undertaken with respect to those portions of the audio, thereby allowing speech analysis systems to devote more time and resources to other portions of the communication session. Notably, however, the aforementioned designations can be retained in the records of the communication session so that information corresponding to the occurrence of such characteristics is not discarded.
In some embodiments, a database can be used for comparative purposes to identify variable announcements. That is, a variable announcement is an announcement that includes established fields within which information can be changed. An example of such a variable announcement includes an airline reservation announcement that indicates current rate promotions. Such an announcement usually includes a fixed field identifying the airline and then variable fields identifying a destination and a fare. Knowledge of the first variable field involving a destination could be used to simplify post-recording processing in some embodiments, whereas other embodiments may avoid processing of that portion once a determination is made that the portion corresponds to an announcement. Alternatively, a hybrid approach could involve not processing the audio corresponding to the fixed fields while allowing post-recording processing on the audio corresponding to the variable fields.
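Working on recognized text, the fixed/variable structure can be sketched as a pattern in which the fixed wording anchors the match and named groups capture the variable fields. The announcement wording and field names below are hypothetical illustrations, not taken from any actual system:

```python
import re

# Fixed wording with slots for the variable fields (illustrative only).
PATTERN = re.compile(
    r"(?P<airline>[A-Za-z ]+) announces fares to (?P<destination>[A-Za-z ]+)"
    r" from \$(?P<fare>\d+)")

def parse_variable_announcement(text):
    """Return the variable-field values if the fixed wording matches,
    or None if the text is not this announcement."""
    m = PATTERN.search(text)
    return m.groupdict() if m else None

fields = parse_variable_announcement(
    "Acme Air announces fares to Boston from $99")
```

Under the hybrid approach described above, a match would let the system skip the audio behind the fixed wording while still running post-recording processing (or simply retaining the captured values) for the variable fields.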
Another form of variable announcements relates to voicemail systems. In this regard, voicemail systems use variable fields to inform a caller that a voice message can be recorded. In some embodiments, these announcements can be identified and handled as described above. One notable distinction, however, involves the treatment of the actual voicemail message that is left by a caller. If such a caller indicates that the message is “private,” some embodiments can delete the message or otherwise avoid post-recording processing of the message.
Further, the local interface may include address, control, and/or data connections to enable appropriate communications among the aforementioned components. The processor may be a hardware device for executing software, particularly software stored in memory.
The memory can include any one or combination of volatile memory elements (e.g., random access memory (RAM, such as DRAM, SRAM, SDRAM, etc.)) and nonvolatile memory elements (e.g., ROM, hard drive, tape, CDROM, etc.).
Moreover, the memory may incorporate electronic, magnetic, optical, and/or other types of storage media. Note that the memory can have a distributed architecture, where various components are situated remote from one another, but can be accessed by the processor. Additionally, the memory includes an operating system 510, as well as instructions associated with a voice analysis system 51, exemplary embodiments of which are described above.
One should note that the flowcharts included herein show the architecture, functionality and/or operation of a possible implementation of one or more embodiments that can be implemented in software and/or hardware. In this regard, each block can be interpreted to represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical functions. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur out of the order in which they are depicted. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
One should note that any of the functions (such as depicted in the flowcharts) can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. In the context of this document, a “computer-readable medium” can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The computer readable medium can be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device. More specific examples (a nonexhaustive list) of the computer-readable medium could include an electrical connection (electronic) having one or more wires, a portable computer diskette (magnetic), a random access memory (RAM) (electronic), a read-only memory (ROM) (electronic), an erasable programmable read-only memory (EPROM or Flash memory) (electronic), an optical fiber (optical), and a portable compact disc read-only memory (CDROM) (optical). In addition, the scope of the certain embodiments of this disclosure can include embodying the functionality described in logic embodied in hardware or software-configured mediums.
It should be emphasized that many variations and modifications may be made to the above-described embodiments. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.