The present invention relates to a speech detection system and method capable of detecting non-standard speech that takes place on platforms using voice communication.
Telephone communication can be used to carry out most financial transactions, such as banking transactions, e-commerce and others. In these transactions, a user's identity can be protected by using password control, one-time password (OTP) entry, or biometric verification when a transaction is performed by voice over a transmission medium such as a telephone. Identity verification can be done either by a machine or by a human operator. Various security modes may be used to verify the identity of a caller. However, there is always a possibility that an individual who can provide a positive identification is doing so under duress, in which case the caller may be acting against his/her will. This presents a serious threat. A similar situation may arise during a cash transaction.
Current systems that detect a caller's emotion use the caller's voice to detect the content of the speech or a change in the caller's emotion during the speech. These systems rely on common speech models generated from a common database. In this type of application, the training algorithms for mood models use general data, and therefore only the emotional features common to all people in the database are extracted. As an example, an angry model can be generated from the analysis of an angry conversation. However, using a general database to train the model increases the emotion detection error rate. Tone and manner of speech vary from one individual to another. One individual's angry tone of voice may be normal speech for another individual. These differences affect the operation of the model and therefore result in errors in identifying the mood in a speech. There is no known method in which an individual's speech pattern is analyzed by using a model that is trained on that individual's own speech.
An objective of the present invention is to realize a non-standard speech detection system and method, which is capable of analyzing human voice to detect the speech style of an individual.
Another objective of the present invention is to realize a non-standard speech detection system and method, which is capable of notifying related units in the event that non-standard speech is detected.
Another objective of the present invention is to realize a non-standard speech detection system and method that uses a model trained on personalized speech rather than on the speech of others, thereby increasing the accuracy rate of the system.
Another objective of the present invention is to realize a non-standard speech detection system and method which is trained by using personalized speech that includes content and behavioral features embedded in it, in addition to using acoustic features of the speech.
Another objective of the present invention is to realize a non-standard speech detection system and method which will increase the security level and reliability of transactions carried out in restricted-access electronic fields such as telephone banking, call centers and telephone shopping, as well as conversations made in physical environments such as banks and cash points. In the event that an individual's speech style differs from that individual's standard speech style, the system detects the difference in the speech style and communicates with the related units. If the voice of an individual changes during a conversation because of stress, anger, psychological state, etc., the system is capable of detecting that change.
Another objective of the present invention is to realize a non-standard speech detection system and method that is capable of detecting and communicating a speech that is out of norm.
“A non-standard speech detection system and method” realized to fulfil the objectives of the present invention is shown in the figures attached, in which:
The components illustrated in the figures are individually numbered, where the numbers refer to the following:
Non-standard speech detection system (1) that is capable of analyzing human voice to detect the speech style of an individual, comprising:
Analysis Module (6) analyses and extracts the speech content and behavioral characteristics of a speech. Analysis Module (6) detects a non-standard speech style by comparing a new speech with stored speeches in Model Database (7). Transmission Unit (9) communicates the result of the comparison made in Analysis Module (6) to the other units.
Voice Recording Unit (2) records speeches of an individual to be used to create a personal standard speech style model. Speech recordings can be obtained from call center conversations, which are performed in each individual's own standard speech style. In an embodiment of the invention, it is possible to initiate an external call with an individual or ask an individual to leave a voice recording during that individual's first call in order to store the voice recording of the individual. This recording of the individual is used for creating the model.
All voice recordings are performed by Voice Recording Unit (2) and the results are stored in Voice Database (3). The storing step is performed for each individual. As a result of the analysis of these recordings, personal acoustic, speech content and behavioral characteristics of each individual are extracted and personal standard speech style models are created.
Acoustic Examination Module (4) determines the acoustic characteristics of an individual's speech by analyzing the recordings stored in Voice Database (3). A large number of acoustic parameters are evaluated in order to determine the acoustic characteristics of a speech. In a preferred embodiment of the invention, Acoustic Examination Module (4) uses acoustic parameters such as prosodic parameters, pitch frequency, emphasis parameters, energy, spectrum parameters, duration parameters, Mel Frequency Cepstral Coefficients, harmonics, jitter and shimmer in order to determine acoustic characteristics. Speech Recognition Module (5) determines the behavioral characteristics of an individual from that individual's voice recording. In a preferred embodiment of the invention, behavioral characteristics of an individual's speech are related to parameters such as speaking speed, monotony, hesitance, interruption interval for the other party, and speech overlap with the other party. Speech behavioral characteristics may vary from one individual to another; therefore, determining personal behavioral characteristics is an important factor in the accuracy of detection. General models that are trained by using the speech of many people may lead to errors during non-standard speech detection. For example, the voice of an individual who talks monotonously and hesitantly in daily life may be considered non-standard when it is compared to general behavioral characteristics. This may cause misunderstandings and generate erroneous results.
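As an illustration only, a few of the acoustic parameters named above (pitch frequency, jitter, shimmer and energy) could be computed as in the following minimal sketch. The specification does not define how these parameters are calculated. The sketch assumes the commonly used "local" definitions of jitter and shimmer (mean absolute cycle-to-cycle difference divided by the mean value) and assumes that pitch periods and peak amplitudes have already been measured by some upstream step. The function names and feature names are hypothetical.

```python
from statistics import mean


def relative_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the mean value.

    Applied to pitch periods this gives local jitter; applied to
    per-cycle peak amplitudes it gives local shimmer.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def acoustic_features(pitch_periods_s, peak_amplitudes, samples):
    """Collect a few acoustic parameters for one recording (hypothetical names)."""
    return {
        "pitch_hz": 1.0 / mean(pitch_periods_s),       # average pitch frequency
        "jitter": relative_perturbation(pitch_periods_s),
        "shimmer": relative_perturbation(peak_amplitudes),
        "energy": mean(s * s for s in samples),        # mean squared amplitude
    }
```

A real embodiment would add the remaining parameters, such as Mel Frequency Cepstral Coefficients, which are omitted here for brevity.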
In a preferred embodiment of the invention, Speech Recognition Module (5) automatically converts the voice recording of an individual into text and extracts the speech content characteristics of a customer by performing content analysis. Speech content is related to different parameters such as sentence patterns, words and phrases, language model n-gram parameters, semantic parameters, and the content, context and shape parameters used in an individual's daily life. For example, it is possible that an individual may talk differently than his/her standard speech style when that individual is under stress.
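The language model n-gram parameters mentioned above can be pictured with the following minimal sketch. It assumes that the speech has already been converted to text, and it uses simple whitespace tokenization and relative frequencies, neither of which is prescribed by the specification. The function names are hypothetical.

```python
from collections import Counter


def ngram_counts(transcript, n=2):
    """Count word n-grams in one transcript (lowercased, whitespace tokenized)."""
    tokens = transcript.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_profile(transcripts, n=2):
    """Relative n-gram frequencies over all transcripts of one individual."""
    totals = Counter()
    for text in transcripts:
        totals.update(ngram_counts(text, n))
    total = sum(totals.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in totals.items()}
```

A profile built from an individual's earlier calls could then serve as one content-related input to that individual's personal model.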
Analysis Module (6) analyses the content of speech, behavioral characteristics of speech, and acoustic speech characteristics. Personal standard speech style models are generated based on the results obtained from Analysis Module (6). Personal standard speech style models are stored in the Standard Speech Style Model Database (7).
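One simple way to picture a personal standard speech style model is as per-feature statistics over an individual's own recordings. The sketch below is an assumption made for illustration: the specification does not state the model type, and the mean and standard deviation representation, the function name and the feature names are all hypothetical.

```python
from statistics import mean, pstdev


def build_style_model(feature_vectors):
    """Summarize one individual's recordings as feature -> (mean, std).

    feature_vectors: one dict per recording, all with the same numeric features.
    """
    if not feature_vectors:
        raise ValueError("at least one recording is required")
    model = {}
    for name in feature_vectors[0]:
        values = [vector[name] for vector in feature_vectors]
        model[name] = (mean(values), pstdev(values))
    return model
```

Because the statistics are computed only from one person's recordings, a habitually loud or monotonous speaker obtains a baseline that already reflects that habit.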
When an individual/customer wants to carry out a transaction or get information over various speech channels, that individual is welcomed by a voice recording played by a machine or by a customer representative, and voice communication is established. During this conversation, Acoustic Examination Module (4) and Speech Recognition Module (5) extract the acoustic, speech content and behavioral characteristics from the instant voice data taken from the customer. Evaluation Module (8) compares the characteristics obtained from these modules with the recorded personal standard speech style models and decides whether the individual's speech is standard or non-standard.
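The comparison performed by Evaluation Module (8) could, for example, be sketched as a per-feature deviation test. The specification does not say how the comparison is made, so the z-score rule, the threshold of 3.0, the function name and the feature names below are assumptions chosen only to make the idea concrete.

```python
def deviating_features(model, observed, z_threshold=3.0, min_std=1e-6):
    """Return the features whose observed value is far from the personal baseline.

    model:    feature -> (mean, std) for one individual (hypothetical format)
    observed: feature -> value extracted from the current conversation
    """
    flagged = []
    for name, (mu, sigma) in model.items():
        z = abs(observed[name] - mu) / max(sigma, min_std)  # guard against zero std
        if z > z_threshold:
            flagged.append(name)
    return flagged


def is_non_standard(model, observed, z_threshold=3.0):
    """Speech is treated as non-standard if any feature deviates strongly."""
    return bool(deviating_features(model, observed, z_threshold))
```

With a personal baseline of 110 Hz and a deviation of 10 Hz, an observed pitch of 150 Hz would be flagged under these assumptions, while the same 150 Hz would pass for a speaker whose own baseline is 145 Hz.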
Transmission Unit (9) informs the related units when there is a speech that is different from the standard speech style. In an embodiment of the invention, Transmission Unit (9) may send information to call center supervisors, security department officials, customer representatives or cashier officers by way of an e-mail, an SMS, or pop-up methods depending on the application type. The information may also be reflected on the screen of an electronic device (PC, tablet, smart phone, etc.) of a related individual over an interface.
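The choice of recipient and channel by application type can be pictured with a small routing table. The table contents, the application type names and the class and function names in this sketch are hypothetical. The specification only states that the recipients and the channel depend on the application type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Notification:
    channel: str
    recipient: str
    message: str


# Hypothetical mapping from application type to (channel, recipient) pairs.
ROUTES = {
    "call_center": [("pop-up", "customer representative"),
                    ("e-mail", "call center supervisor")],
    "bank_branch": [("pop-up", "cashier officer"),
                    ("SMS", "security department official")],
}


def build_notifications(application_type, speaker_id):
    """Create one notification per configured recipient; unknown types yield none."""
    message = f"Non-standard speech style detected for speaker {speaker_id}"
    return [Notification(channel, recipient, message)
            for channel, recipient in ROUTES.get(application_type, [])]
```

Actual delivery (sending the e-mail or SMS, or showing the pop-up) is left out, since it depends on the systems already in use at the deployment site.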
Reference models to be used for detecting non-standard speech style are trained by using personalized data and not general data. The reference recordings used for model training are obtained from an individual's standard speech by the system modules. Designing the system in a personalized way increases its accuracy rate and enhances its efficiency.
In an embodiment of the invention, Non-Standard Speech Detection System (1) can be applied to the speech of a customer representative in addition to that of a customer. For example, the speech and voice of a customer representative may change if the customer representative gets tired after long and frequent conversations. This tiredness leads to a change in the speech style of the customer representative. In the event that Non-Standard Speech Detection System (1) is applied to a customer representative, Transmission Unit (9) provides information to the customer representative or to a pre-determined authorized person when Evaluation Module (8) determines that the speech style of the customer representative during a conversation deviates from the standard speech style. The customer representative or the authorized person can take the necessary action by using this information. For example, a customer representative who has been in conversation for a long time and is informed that his/her speech is becoming different from the standard speech style knows that s/he needs to take a break and rest. In another embodiment, a communication channel controller will be able to watch the speech style of a customer representative and, in the event that his/her speech becomes different from the standard, will be able to take the necessary action.
The inventive non-standard speech detection system (1) can be used in all necessary fields.
The inventive non-standard speech detection method (100) comprises steps of:
The inventive non-standard speech detection method (100) is related to performing personal analysis from voices of persons who want to carry out transaction by establishing voice communication over various communication channels or customer representatives who help customers during transaction, and detecting non-standard speech.
In the inventive non-standard speech detection method (100), Voice Recording Unit (2) records an individual's conversations, which are made by that individual over a voice communication channel, to Voice Database (3) (101). In an embodiment of the invention, the communication channel may be a restricted-access electronic field such as telephone banking transactions, conversations made with call centers and telephone shopping, or a conversation made in a physical environment such as a bank or a cash point. During these conversations, the speech of a customer, a customer representative and/or a cashier is recorded individually in that person's own voice database (3).
Acoustic Examination Module (4) analyses the acoustic speech characteristics from the voice recordings in Voice Database (3) (102). Acoustic characteristics of speech may vary from one individual to another. An individual who talks loudly in daily life will likely have a high-volume voice when that individual carries out a voice banking transaction over a telephone. It would be a mistake to conclude that the loud speech of such an individual is non-standard speech. The same applies to people who talk monotonously and in a low volume of voice in daily life. In order to prevent these and similar mistakes, the speech of each individual is recorded individually and each voice recording is evaluated separately by Acoustic Examination Module (4) in the inventive Non-Standard Speech Detection Method (100).
Speech Recognition Module (5) analyses the behavioral characteristics of a speaker by using the voice recordings of the speaker stored in Voice Database (3) (103). Behavioral characteristics vary from one individual to another. Therefore, examining the behavioral characteristics of each individual increases the accuracy rate of the system. Speech Recognition Module (5) examines the daily standard speeches of an individual that are stored in Voice Database (3), analyses the behaviors of the individual and their reflections on that individual's voice, and creates a behavioral profile of that individual. This analysis is important for Evaluation Module (8) in order to examine variations in that individual's voice caused by stress, anger, and psychological state during a new conversation.
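Two of the behavioral parameters, speaking speed and speech overlap with the other party, can be derived from time-stamped speaker turns, as in the following minimal sketch. The turn format (speaker label, start time, end time, word count) is an assumption, as are the function name and the feature names. The specification does not describe how the turns are obtained.

```python
def behavioral_features(turns, speaker):
    """Compute speaking speed and overlap for one speaker.

    turns: list of (speaker_label, start_s, end_s, word_count) tuples
           (hypothetical format produced by some upstream segmentation step).
    """
    own = [t for t in turns if t[0] == speaker]
    others = [t for t in turns if t[0] != speaker]

    talk_time = sum(end - start for _, start, end, _ in own)
    words = sum(count for _, _, _, count in own)

    # Total time during which the speaker talks at the same time as the other party.
    overlap = sum(
        max(0.0, min(own_end, other_end) - max(own_start, other_start))
        for _, own_start, own_end, _ in own
        for _, other_start, other_end, _ in others
    )
    return {
        "words_per_second": words / talk_time if talk_time else 0.0,
        "overlap_seconds": overlap,
    }
```

Values collected this way over an individual's everyday calls would form the behavioral part of that individual's profile.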
In Non-Standard Speech Detection System (1), Speech Recognition Module (5) converts the voice recordings in Voice Database (3) into text and analyses the speech content from this text (104). The voice tone or behavioral characteristics of an individual during a conversation may vary according to the content of the conversation. For example, it is possible that an individual who is not pleased with his/her previous shopping experience may want to cancel a transaction. During this cancellation process, the individual may talk louder. However, loud conversation may be the standard for this individual. Accordingly, determining which tone and characteristics of speech are standard for an individual depends on the content of the speech as well.
In Non-Standard Speech Detection System (1), Analysis Module (6) creates a personal standard speech style from the analysis results of an individual's speech characteristics, speech content and behavioral characteristics (105). The personal standard speech created by Analysis Module (6) is stored in Standard Speech Style Model Database (7) (106). Accordingly, Evaluation Module (8) compares new voice input with the personal standard speech style in Standard Speech Style Model Database (7) (107) when the speaker engages in a new voice conversation.
Speech styles other than a standard speech style may occur for many different reasons. An individual may be angry or under stress, or his/her psychological state may be different on a given day, which may cause the person to talk differently than his/her standard speech style. An individual being forced to carry out a transaction against his/her will is one example that may lead to critical results. In circumstances where an individual is under stress or duress, it is normal for differences in the individual's speech to be noticeable. The individual would talk in a nervous and stressed way. In these circumstances, Evaluation Module (8) will determine that the individual's speech does not match the standard speech style models in the database. Transmission Unit (9) transmits the information obtained after the evaluation process carried out by Evaluation Module (8) to related pre-determined places (108). Due to this feedback, a transaction carried out during, for example, a voice call can be regarded as invalid depending on the subject, the related security departments can be informed, and preventive measures can be taken. Thus, the security of transactions carried out by people via voice call is enhanced. An out-of-the-ordinary situation is detected by the system and security forces are notified.
In an embodiment of the invention, if Evaluation Module (8) determines that the instant voice input of a customer carrying out a banking transaction over the telephone is not in the standard speech style, the result is communicated to a customer representative, and the customer representative may indirectly ask the customer to say a password pre-determined between the customer and the bank for urgent/risky situations. If it is determined that there is a security risk, then the transaction is regarded as invalid by the bank and this situation is reported to the related security departments.
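The handling sequence of this embodiment can be summarized as a small decision function. The sketch below is only an illustration: the action names are hypothetical, and how the representative concludes that a security risk exists (for example from the customer's response to the password request) is deliberately left as an input rather than modelled.

```python
from enum import Enum


class Action(Enum):
    PROCEED = "proceed with transaction"
    ASK_PASSWORD = "ask for pre-determined password"
    INVALIDATE_AND_REPORT = "invalidate transaction and report to security"


def next_action(speech_is_standard, risk_confirmed=None):
    """Decide the next step for a telephone banking transaction.

    speech_is_standard: result of the evaluation of the instant voice input
    risk_confirmed:     None until the representative has checked; afterwards
                        True if a security risk was determined, else False
    """
    if speech_is_standard:
        return Action.PROCEED
    if risk_confirmed is None:
        return Action.ASK_PASSWORD
    return Action.INVALIDATE_AND_REPORT if risk_confirmed else Action.PROCEED
```

For instance, `next_action(False)` yields the password request, and `next_action(False, True)` yields invalidation and reporting.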
Non-Standard Speech Detection System (1) and method (100) are not limited to the examples disclosed herein.
Number | Date | Country | Kind |
---|---|---|---|
a 2014/12527 | Oct 2014 | TR | national |
Number | Date | Country | |
---|---|---|---|
20160118050 A1 | Apr 2016 | US |