The need for a simple and efficient way of capturing client-agent sessions or interactions is well known. Current systems focus on recording telephonic and computer-based interactions with customers, such as telephone calls, e-mails, chat sessions, collaborative browsing and the like, but are not suitable for recording face-to-face voice interactions in walk-in environments, where a client has a frontal, face-to-face interaction with a representative or an agent of a service provider.
The walk-in environments may be service centers, branches of banks, governmental offices, fast-food counters, department stores and other private, commercial or government sites. In such an environment, it is very difficult to record a specific interaction between a client and an agent at an acceptable audio quality, for several reasons. Firstly, the environmental noise, which is mainly human speech, may not be easily eliminated from the recording. Further, the agent may be required to leave his or her regular location facing the client during the interaction. Accordingly, existing voice recording systems are not suitable for noisy, crowded environments such as walk-in service centers.
The subject matter regarded as the invention is particularly pointed out and distinctly claimed in the concluding portion of the specification. The invention, however, both as to organization and method of operation, together with objects, features, and advantages thereof, may best be understood by reference to the following detailed description when read with the accompanying drawings in which:
It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements for clarity. Further, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements.
In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, it will be understood by those skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, and components have not been described in detail so as not to obscure the present invention.
Although embodiments of the invention are not limited in this regard, discussions utilizing terms such as, for example, “processing,” “computing,” “calculating,” “determining,” “establishing”, “analyzing”, “checking”, or the like, may refer to operation(s) and/or process(es) of a computer, a computing platform, a computing system, or other electronic computing device, that manipulate and/or transform data represented as physical (e.g., electronic) quantities within the computer's registers and/or memories into other data similarly represented as physical quantities within the computer's registers and/or memories or other information storage medium that may store instructions to perform operations and/or processes.
Although embodiments of the invention are not limited in this regard, the terms “plurality” and “a plurality” as used herein may include, for example, “multiple” or “two or more”. The terms “plurality” or “a plurality” may be used throughout the specification to describe two or more components, devices, elements, units, parameters, or the like. For example, “a plurality of stations” may include two or more stations.
Although embodiments of the invention are not limited in this regard, the terms “walk-in center” and “walk-in environment” as used herein may be used throughout the specification to describe any place in which a verbal interaction between two or more persons may occur, for example, service centers of service providers, branches of banks, stores and other private, commercial or government points of presence.
Although embodiments of the invention are not limited in this regard, the term “an agent” as used herein may be used throughout the specification to describe any professional representative of a business or government providing a service to a customer, client or a civilian. Non-limiting examples may include a service provider representative, a clerk in a store, a banker, a tax authority representative and the like.
Reference is now made to
Each end-point, for example, end-points 110, 120 and 130, may include one or more agent input devices 111, for example a portable microphone to receive audio signals from agents, and an input client unit 113 to receive audio signals from one or more clients. Each end-point 110, 120 and 130 may further include an interaction capture unit 112 to capture voice data from agent input device 111 and from input client unit 113. The audio signals captured by interaction capture unit 112 may be created by at least one agent and at least one client during a face-to-face verbal interaction occurring at the location of the respective end-point 110, 120 or 130. Although in the exemplary illustration of
Interaction capture unit 112 may process the captured audio signal, e.g., filter out non-relevant external acoustic sources, and may transmit the processed audio signals via a wired or wireless link to central capture device 140, as described in detail below with reference to
Central capture device 140 may interface one or more end-points, for example, end-points 120 and 130 in environment 100, and may transfer the processed audio signals of a verbal interaction to one or more storage units such as storage unit 150. In some embodiments of the present invention, central capture device 140 may receive the audio signals from interaction capture units 112 and may process the audio signals before transferring them to storage unit 150. For example, central capture device 140 may combine the audio signals captured by agent input device 111 and the signals captured by input client unit 113 into a synchronized audio signal of an entire face-to-face interaction. In some embodiments, such processing may be performed by interaction capture unit 112, and central capture device 140 may separate the audio signals before transferring them to storage unit 150.
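A minimal sketch of one way such combining could be realized is shown below, assuming both channels are mono PCM streams at a common sample rate and that any fixed capture-start offset between the devices is known. The function and variable names (e.g., combine_interaction, agent_offset_samples) are illustrative only and do not appear in the specification.

```python
# Illustrative sketch: align the agent and client PCM channels and stack
# them into one two-channel record of the face-to-face interaction.
import numpy as np

def combine_interaction(agent_pcm: np.ndarray,
                        client_pcm: np.ndarray,
                        agent_offset_samples: int = 0) -> np.ndarray:
    """Return a (2, N) array: channel 0 = agent, channel 1 = client."""
    # Compensate for a known capture-start offset between the two devices.
    if agent_offset_samples > 0:
        agent_pcm = np.concatenate(
            [np.zeros(agent_offset_samples, dtype=agent_pcm.dtype), agent_pcm])
    # Pad the shorter stream so both channels have equal length.
    length = max(len(agent_pcm), len(client_pcm))
    agent_pcm = np.pad(agent_pcm, (0, length - len(agent_pcm)))
    client_pcm = np.pad(client_pcm, (0, length - len(client_pcm)))
    return np.stack([agent_pcm, client_pcm], axis=0)
```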
Although the scope of the present invention is not limited in this respect, central capture device 140 may be implemented using any suitable combination of software and/or hardware and may be implemented as a stand-alone unit or as a part of storage unit 150. Central capture device 140 may be coupled to communication network 160 to deliver the processed audio signals for storage at storage unit 150 or live monitoring at terminal 170. Storage unit 150 and/or terminal 170 may be coupled to or may be a part of quality assurance or quality management system 180, which may be used for validating that the walk-in environment activities are being performed effectively and efficiently.
According to some embodiments of the present invention, input client unit 113 may include a directional microphone, or one or more closely positioned microphones acting like a highly directional microphone, in order to detect the audio signals, e.g., the voice created by the client, as is further described in
Although the scope of the present invention is not limited in this respect, input client unit 113 may be implemented using a microphone array, which may include a plurality of microphones which may optimize the signal-to-noise ratio (SNR) of the detected audio signal created by client 220 (of
Throughout the specification, for simplicity of the illustration, input client unit 113 is referred to as a microphone array. It should be understood by a person skilled in the art that the invention is not limited in this respect and that, according to embodiments of the present invention, other devices having directional microphone functionalities are likewise applicable.
Communication network 160 may be a Local Area Network (LAN), a Wireless LAN (WLAN), a Metropolitan Area Network (MAN), a Wireless MAN (WMAN), a Wide Area Network (WAN) or a Wireless WAN (WWAN), and may include networks operating in accordance with the existing IEEE 802.11, 802.11a, 802.11b, 802.11e, 802.11g, 802.11h, 802.11i, 802.11n, 802.16, 802.16d and 802.16e standards and/or future versions and/or derivatives and/or Long Term Evolution (LTE) of the above standards. By way of example, communication network 160 may facilitate an exchange of information packets in accordance with Ethernet local area network (LAN) standards. Such Ethernet LANs conform to the IEEE 802.3, 802.3u and 802.3x network standards, published by the Institute of Electrical and Electronics Engineers (IEEE). In some embodiments, proprietary interface protocols may be used and/or implemented.
Storage unit 150 may be used for voice interaction capturing, storing and retrieval. An exemplary system is sold under the trade name NiceLog™ by NICE Systems Ltd., Ra'anana, Israel, the assignee of this patent application. In some embodiments of the present invention, storage unit 150 may further comprise screen capture and storage components for capturing screen shots and screen events of the interaction, and/or a video capture and storage component for capturing, storing and retrieving the visual streaming video interaction coming from one or more video cameras which may be located at one or more of end-points 110, 120 and/or 130. Storage unit 150 may include or may be coupled to a database component in which information regarding the interaction is stored for later query and analysis (not shown).
Although the scope of the present invention is not limited in this respect, capture elements, such as interaction capture units 112 and central capture device 140, and storage elements, such as storage unit 150, may be separated and interconnected over a LAN/WAN or any other IP-based local or wide network, e.g., communication network 160. Storage unit 150, which may include a database component (not shown), may either be located at the same location or be centralized at another location covering multiple walk-in environments or branches. The transfer of content, such as voice, screen or other media, from interaction capture units 112 to central capture device 140 may either be based on proprietary protocols, such as a unique packaging of RTP packets for the voice, or based on standard protocols, such as H.323 for VoIP and the like.
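A minimal sketch of the kind of packaging mentioned above is given below, framing captured PCM audio into RTP-style packets sent over UDP. The 12-byte header layout follows the standard RTP format (RFC 3550); the payload type, SSRC, destination address and frame size are illustrative assumptions, not values specified by this description.

```python
# Illustrative sketch: package audio frames as RTP-style packets over UDP.
import socket
import struct

def rtp_packet(payload: bytes, seq: int, timestamp: int,
               ssrc: int = 0x1234, payload_type: int = 96) -> bytes:
    header = struct.pack(
        "!BBHII",
        0x80,                 # version 2, no padding, no extension, CC=0
        payload_type & 0x7F,  # marker bit 0, dynamic payload type
        seq & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc)
    return header + payload

def send_frames(frames, dest=("192.0.2.10", 5004), samples_per_frame=160):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    timestamp = 0
    for seq, frame in enumerate(frames):
        sock.sendto(rtp_packet(frame, seq, timestamp), dest)
        timestamp += samples_per_frame   # advance the RTP clock per frame
```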
Reference is now made to
According to some embodiments of the present invention, input agent unit 230 may be a portable unit having dimensions small enough to be easily attached to and detached from the agent's clothing or body. In other embodiments, agent unit 230 may be a fixed device, e.g., fixed to a desk, a computer or other equipment at the location of end-point 200. According to some embodiments of the present invention, agent unit 230 may detect and capture the voice stream created by agent 210 and may filter out all external acoustic sources other than the voice of agent 210. Agent unit 230 may further transmit the captured voice stream to local interaction capture unit 240 via a communication connection 260.
For a wireless agent unit, the transmission may be done via a wireless connection, for example a radio frequency (RF) connection. For a fixed agent unit, the transmission may be done via any wired connection, as known in the art. In some embodiments of the present invention, filtering and further processing of the voice stream detected by agent unit 230 may be performed by interaction capture unit 240. Input agent unit 230 may be implemented using hardware components or any suitable combination of software and hardware, as is described in detail below with reference to
Communication connection 260 may be a power-efficient and inexpensive interface, implemented, for example, by proprietary unidirectional Wireless Personal Area Network (WPAN) protocols for low-power networks, standard RF protocols or proprietary RF protocols. Other communication protocols and methods may be used, e.g., ZigBee I, ZigBee II, Bluetooth or IEEE 802.15.4.
According to the characteristics of a certain walk-in environment, e.g., walk-in environment 100 of
The design of microphone array 250 may be based on microphone phased-array technology and may include one or more microphones which may optimize the signal-to-noise ratio (SNR) of the detected audio signal created by client 220. Microphone array 250 may include a set of closely positioned microphones to achieve better directionality than a single microphone by taking advantage of the fact that an incoming acoustic wave arrives at each of the microphones at a slightly different time or phase.
Non-limiting examples of microphone array designs may include a two-element microphone array, a straight four-element microphone array and an L-shaped four-element microphone array. Microphone array 250 may combine the signals detected by all microphones and may act like a highly directional microphone, forming what is also referred to herein as “a beam”, a term known in the art. This microphone array beam may be electronically managed to point to the speaker, e.g., client 220. Using microphone array 250 may be mechanically equivalent to using two highly directional microphones: one for scanning the end-point space and measuring the sound level, and the other for pointing to the direction with the highest sound level, e.g., toward client 220.
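A minimal sketch of how such a beam can be formed is delay-and-sum beamforming over a small linear array, shown below. The array geometry, sample rate, integer-sample delays and all function names are illustrative assumptions; they are one simple realization of the directional behavior described above, not the specific design of microphone array 250.

```python
# Illustrative sketch: delay-and-sum beamforming for a linear microphone array.
import numpy as np

SPEED_OF_SOUND = 343.0  # meters per second

def delay_and_sum(mic_signals: np.ndarray, mic_positions: np.ndarray,
                  steer_angle_deg: float, fs: int = 16000) -> np.ndarray:
    """mic_signals: (num_mics, num_samples); mic_positions: per-microphone
    position along a line, in meters. Returns the beamformed signal."""
    angle = np.deg2rad(steer_angle_deg)
    # Arrival-time difference per microphone for a plane wave from 'angle'.
    delays = mic_positions * np.sin(angle) / SPEED_OF_SOUND
    delay_samples = np.round(delays * fs).astype(int)
    delay_samples -= delay_samples.min()      # keep all shifts non-negative
    num_mics, n = mic_signals.shape
    out = np.zeros(n)
    for m in range(num_mics):
        # Advance each channel so the wavefronts line up, then average.
        out += np.roll(mic_signals[m], -delay_samples[m])
    return out / num_mics
```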
According to some embodiments of the present invention, microphone array 250 may detect and/or capture audio signals from client 220 and may transmit these audio signals to local interaction capture unit 240. According to some embodiments of the present invention, microphone array 250 may include a microphone array receiving unit 280 to amplify and sample the audio signal detected by microphone array 250.
According to some embodiments of the present invention, interaction capture unit 240 may include an agent receiving unit 290 to receive the voice transferred from input agent unit 230, a processor 270 coupled to units 280 and 290 to process the received signals and a communication interface unit 275. Processor 270 may further control input agent unit 230 and optionally microphone array 250. According to embodiments of the present invention, processor 270 may sum the voice streams received from input agent unit 230 and microphone array 250 and may deliver a data stream of a complete verbal interaction between agent 210 and client 220.
Optionally, according to some embodiments of the invention, processor 270 may include or may be coupled to a memory unit 278. Memory unit 278 may be used as a buffer to store temporary data, for example, when the communication between interaction capture unit 240 and central capture device 140 is down. Although the scope of the present invention is not limited in this respect, types of memory that may be used with embodiments of the present invention may include, for example, a shift register, a Flash memory, a random access memory (RAM), dynamic RAM (DRAM), static RAM (SRAM) and the like.
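One simple way such buffering could behave is sketched below: frames are held in a bounded in-memory queue while the uplink is down and flushed when it recovers. The class name, queue size and the send/connectivity callables are illustrative assumptions rather than elements of the specification.

```python
# Illustrative sketch: buffer audio frames locally while the uplink is down.
from collections import deque

class BufferedUplink:
    def __init__(self, send_fn, link_up_fn, max_frames=5000):
        self._send = send_fn          # callable that forwards one audio frame
        self._link_up = link_up_fn    # callable returning True when link is up
        self._buffer = deque(maxlen=max_frames)  # oldest frames dropped first

    def push(self, frame: bytes) -> None:
        if self._link_up():
            self._flush()
            self._send(frame)
        else:
            self._buffer.append(frame)   # hold frame until the link recovers

    def _flush(self) -> None:
        while self._buffer and self._link_up():
            self._send(self._buffer.popleft())
```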
According to embodiments of the present invention, unit 280 may include one or more amplifiers and one or more analog-to-digital (A/D) converters (not shown) to prepare the detected voice for further processing, such as, but not limited to, filtering by processor 270. In some embodiments, unit 280 may include an amplifier and an A/D converter for each microphone of microphone array 250. Unit 280 may further contain control circuitry to transmit control signals from processor 270 to microphone array 250. Microphone array receiving unit 280 may contain other blocks or circuitry. Microphone array receiving unit 280 may be implemented using hardware components or any suitable combination of software and hardware.
Microphone array 250 may be positioned in front of client 220 to produce a high-directivity “beam”. The array may be considered an acoustical phased-array antenna with a narrow, controlled main beam and minimal side lobes, obtained by processor 270 changing the weight of the signal received from each microphone of microphone array 250. Processor 270 may create the “beams” by, for example, weighted summation of all microphone array signals or other algorithms, and may control the “movement” of the beam in order to track client 220 by applying mathematical algorithms to the signals received from microphone array receiving unit 280.
According to embodiments of the present invention, processor 270 may search for the position of client 220 and may aim the beam in that direction using, for example, dedicated software. When client 220 moves, processor 270 may control microphone array 250 to follow the sound source by applying a software tracking algorithm. By way of example, the tracking algorithm used may be the GBD of Microsoft® designed by Ivan Tashev and Henrique S. Malvar.
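A minimal sketch of one generic way to aim the beam is shown below: scan a set of candidate steering angles and pick the one with the highest output energy. This is a simple energy-scan heuristic for illustration only, not the specific tracking algorithm cited above; it reuses the illustrative delay_and_sum() sketch given earlier, and the angle range and step are assumptions.

```python
# Illustrative sketch: steer the beam toward the talker by energy scanning.
import numpy as np

def track_speaker(mic_signals, mic_positions, fs=16000,
                  candidate_angles=range(-45, 46, 5)):
    best_angle, best_energy = 0.0, -1.0
    for angle in candidate_angles:
        beam = delay_and_sum(mic_signals, mic_positions, angle, fs)
        energy = float(np.mean(beam ** 2))   # average output power of the beam
        if energy > best_energy:
            best_angle, best_energy = angle, energy
    return best_angle   # steer the main beam to this direction
```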
According to some embodiments of the present invention, processor 270 may be a general-purpose processor. Additionally or alternatively, processor 270 may include a digital signal processor (DSP), a microprocessor, a host processor, a controller, a plurality of processors or controllers, a chip, a microchip, one or more circuits, circuitry, a logic unit such as a field-programmable gate array (FPGA), an integrated circuit (IC), an application-specific IC (ASIC), or any other suitable multi-purpose or specific processor or controller. In some embodiments of the invention, processor 270 may be implemented as an integrated unit in microphone array 250.
According to some embodiments of the present invention, in which input agent unit 230 is a wireless device and communication connection 260 is implemented as a wireless connection, agent receiving unit 290 may include an antenna, for example, a dipole antenna 292, to receive the audio signals transferred from input agent unit 230 via the wireless connection, as well as amplifier circuitry and RF demodulator circuitry (not shown) to demodulate the audio signals received from input agent unit 230. The output of the demodulator circuitry or other circuitry may be further processed by processor 270. Processor 270 may transfer the agent voice stream and the client voice stream as separate channels or in a combined stream via communication interface unit 275 to a higher level, for example, central capture device 140 of
Communication interface unit 275 may include circuitry and physical components for transferring the captured and processed voice streams or audio signals via a communication network, e.g., network 160 of
Although the scope of the invention is not limited in this respect, the space architecture of end-point 200 may follow the exemplary specification detailed herein. According to an exemplary embodiment of the invention, the distance between client 220 and microphone array 250 may be no more than 1.5 meters, the angle between client 220 and interaction capture unit 240 may be no more than ±45 degrees in the horizontal plane, and the angle between client 220 and end-point 200 may be within −30 to +45 degrees in the vertical plane.
According to an exemplary embodiment of the invention, the agent may carry agent unit 230 such that the distance between the agent unit and the agent's mouth may not exceed 0.3 meters, the distance between agent unit 230 and interaction capture unit 240 may not exceed 20 meters, and the distance between microphone array 250 and other direct sound sources at other end-points may be no less than 3 meters. Other distances may be used.
Reference is now made to
Input agent unit 300 may comprise a processing and control unit 320 to capture the analog voice signal received by microphone 310, to process the signal and to transfer the processed signal to interaction capture unit 240. The received and/or processed signal may be transmitted via antenna 330, which may include or may be, for example, a PCB-printed folded dipole antenna or any other antenna as is known in the art.
According to some embodiments of the present invention, processing and control unit 320 may include amplifying circuits and/or other components to amplify the analog audio signal received from and/or detected by microphone 310, an analog-to-digital (A/D) converter to convert the received analog audio signal to a digital signal for further processing, and transmitting circuitry to transmit the processed signal via a wireless connection, e.g., connection 260 of
Although embodiments of the invention are not limited in this regard, processing and control unit 320 may include circuitry for filtering out external acoustic sources other than the voice of agent 210 and for controlling the transmission of the processed signal according to the required communication protocol, for example, a proprietary RF protocol which may include a handshake over an RF link band of 2400-2480 MHz. Any other license-free link band may likewise be used.
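One very simple form of such filtering is sketched below: a band-pass filter restricted to the speech band, applied to the digitized microphone signal. This is an illustrative stand-in only; the specification does not define the exact filtering performed by processing and control unit 320, and the filter order, band edges and sample rate are assumptions.

```python
# Illustrative sketch: band-pass filter the digitized signal to the speech band.
import numpy as np
from scipy.signal import butter, lfilter

def speech_bandpass(signal: np.ndarray, fs: int = 16000,
                    low_hz: float = 300.0,
                    high_hz: float = 3400.0) -> np.ndarray:
    # 4th-order Butterworth band-pass, cutoff frequencies normalized to Nyquist.
    b, a = butter(4, [low_hz / (fs / 2), high_hz / (fs / 2)], btype="band")
    return lfilter(b, a, signal)
```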
According to some embodiments of the present invention processing and control unit 320 may include a general-purpose processor. Additionally or alternatively, processing and control unit 320 may include a digital signal processor (DSP), a microprocessor, a host processor, a controller, a plurality of processors or controllers, a chip, a microchip, one or more circuits, circuitry, a logic unit, an integrated circuit (IC), an application-specific IC (ASIC), or any other suitable multi-purpose or specific processor or controller.
According to some embodiments of the present invention, agent device 300 may include a power supply 340 which may be, for example, a rechargeable battery such as a lithium-ion battery, a super-iron battery and the like. Power supply 340 may be recharged via charge pins 350 and may allow easy maintenance of agent device 300. Although embodiments of the invention are not limited in this regard, power supply 340 may have dimensions small enough to be included in a personal portable device and may operate for several hours, e.g., up to 9 hours, without the need to recharge it.
Reference is now made to
As indicated at box 410, the method may include receiving audio stream signals of the voice created by a participant of a face-to-face interaction, for example, agent 210 (of
As indicated at box 430, the method may include transmitting the signals processed at box 420 via a communication link, for example, an RF wireless communication link, to a capture unit, for example, interaction capture unit 240 (of
As indicated at box 450, the method may include processing the audio signals received at boxes 440 and 420, for example, beam forming and filtering of external noises and reverberations other than the voices of client 220 and agent 210. The method may further include processing of the received signals or controlling of the receiving microphones, e.g., microphone array unit 250, in order to optimize the signal-to-noise ratio of the received signal, as is described with reference to
According to some embodiments of the present invention, processing the audio signals received at box 410 may be performed in addition or as an alternative to the processing indicated at box 420. According to some embodiments of the present invention, the features of the method described at boxes 450 and 440 may be implemented in a single physical unit, and according to other embodiments they may be implemented in separate physical units.
As indicated at box 460, the method may include transmitting the processed signals of the face-to-face interaction to a higher level, for example, central capture unit 140 (of
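A minimal sketch tying the method steps together (boxes 410 through 460) is given below, reusing the illustrative helpers sketched earlier in this description. All function names and the data layout are hypothetical assumptions, not elements of the specification.

```python
# Illustrative sketch: end-to-end flow of capturing one face-to-face interaction.
import numpy as np

def capture_interaction(agent_frames, mic_signals, mic_positions, uplink):
    # Box 410/420: receive and process the agent's voice stream.
    agent_audio = [speech_bandpass(f) for f in agent_frames]
    # Box 440/450: receive the client's voice via the microphone array and
    # beamform toward the talker to suppress environmental noise.
    angle = track_speaker(mic_signals, mic_positions)
    client_audio = delay_and_sum(mic_signals, mic_positions, angle)
    # Box 450/460: combine both participants into one synchronized record and
    # forward it to a higher level for storage or live monitoring.
    interaction = combine_interaction(np.concatenate(agent_audio), client_audio)
    uplink.push(interaction.tobytes())
```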
While certain features of the invention have been illustrated and described herein, many modifications, substitutions, changes, and equivalents will now occur to those of ordinary skill in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the true spirit of the invention.