The present disclosure relates to an automated method for verifying that a user of a system is a human, and relates more particularly to a voice-based implementation of Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA).
In the modern Internet environment, digital enterprise platforms, e.g., finance, retail and/or travel websites, need to contend with bots, i.e., automated software applications programmed to do specific tasks much faster than can be performed by human users. Bots, which usually operate over a network, often imitate or replace a human user's behavior to perform malicious activities, e.g., hacking user accounts, scanning the web for contact information, etc. Examples of bots include web crawlers (which scan webpage contents on the Internet), social bots (which operate on social media platforms), chatbots (which simulate human responses in conversations) and malicious bots (which can send spam, scrape content, and/or perform credential stuffing).
One of the techniques for combatting bots is the completely automated public Turing test to tell computers and humans apart (CAPTCHA), which is a challenge-response mechanism configured to distinguish between a bot and a human. Conventional CAPTCHAs utilize text and/or images as the bases for the challenge-response mechanism. However, such CAPTCHAs are increasingly being solved by bots and CAPTCHA farms faster than the text and/or images can load on users' browsers, and conventional CAPTCHAs are not able to detect when a single entity has solved the posed challenge multiple times, thus defeating the CAPTCHAs.
Therefore, there is a need to provide an improved CAPTCHA which can effectively distinguish between a bot and a human.
According to an example embodiment of a method and a system for a voice CAPTCHA according to the present disclosure, a synthetically generated speech is distinguished from a natural human voice.
According to an example embodiment of the method and the system for a voice CAPTCHA according to the present disclosure, a user's voiceprint is created and associated with the user for authentication.
According to an example embodiment of the method and the system for a voice CAPTCHA according to the present disclosure, once a user logs into an account of the user in a system having the voice CAPTCHA functionality, the system checks whether the user's voiceprint already exists, and if not, the system records the user's speech to generate a unique voiceprint of the user.
According to an example embodiment of the method and the system for a voice CAPTCHA according to the present disclosure, once a user logs into an account of the user in a system having the voice CAPTCHA functionality, the system checks whether the user's voiceprint already exists, and if so, the system authenticates the user's voice by matching it to the user's voiceprint.
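The following is a minimal, hypothetical sketch (in Python, not part of the disclosure) of the login-time flow described in the two preceding paragraphs. The service interface and method names (VoiceBiometricService, voiceprint_exists, verify, enroll) are assumptions for illustration only, as is the policy of accepting the user upon first-time enrollment.

from typing import Protocol

class VoiceBiometricService(Protocol):
    """Hypothetical interface to the voice biometric service (VBS)."""
    def voiceprint_exists(self, user_id: str) -> bool: ...
    def verify(self, user_id: str, speech: bytes) -> bool: ...
    def enroll(self, user_id: str, speech: bytes) -> None: ...

class VoiceCaptchaModule:
    def __init__(self, vbs: VoiceBiometricService):
        self.vbs = vbs

    def handle_login(self, user_id: str, recorded_speech: bytes) -> bool:
        """Return True if the logged-in user is verified as a human user."""
        if self.vbs.voiceprint_exists(user_id):
            # A voiceprint already exists: authenticate the user's voice
            # by matching it against the stored voiceprint.
            return self.vbs.verify(user_id, recorded_speech)
        # No voiceprint yet: record the user's speech and build a unique
        # voiceprint for future authentication.
        self.vbs.enroll(user_id, recorded_speech)
        return True  # accept on first-time enrollment (assumed policy)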
According to an example embodiment of the method and the system for a voice CAPTCHA according to the present disclosure, in the case a user performs a “guest checkout” (e.g., performs a purchase transaction) without logging into an account of the user in a system having the voice CAPTCHA functionality, the system determines whether the user is at least one of i) unique, ii) human, and iii) speaking live.
According to an example embodiment of the method and the system for a voice CAPTCHA according to the present disclosure, in the case a user performs a “guest checkout” without logging into an account of the user in a system having the voice CAPTCHA functionality, the system will try to match the user's voice to previous voices used for checkouts and/or those voices that have been enrolled already to determine, e.g., whether the user has previously purchased the same item.
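The following is a minimal, hypothetical sketch of the guest-checkout matching step described above: the guest's speech is compared against voiceprints from previous checkouts and/or previously enrolled voiceprints to detect a repeat speaker. The function names, the similarity scoring function, and the threshold value are assumptions for illustration only.

from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class GuestMatchResult:
    is_repeat_speaker: bool
    matched_voiceprint_id: Optional[str]

def match_guest_voice(speech: bytes,
                      known_voiceprints: Dict[str, bytes],
                      score_fn: Callable[[bytes, bytes], float],
                      threshold: float = 0.8) -> GuestMatchResult:
    """Compare the guest's speech to previously used and enrolled voiceprints."""
    best_id: Optional[str] = None
    best_score = 0.0
    for vp_id, voiceprint in known_voiceprints.items():
        score = score_fn(speech, voiceprint)  # similarity score in [0, 1] (assumed scale)
        if score > best_score:
            best_id, best_score = vp_id, score
    if best_id is not None and best_score >= threshold:
        # The same speaker has been seen before, e.g., in a previous checkout.
        return GuestMatchResult(True, best_id)
    return GuestMatchResult(False, None)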
FIG. 1a is a schematic diagram of various components of an example system for implementing the voice CAPTCHA method according to the present disclosure.
FIG. 4 illustrates an example signal flow in an example system for implementing the voice CAPTCHA method for the case in which the user performs a “guest checkout”.
The MW 102 then sends a request to enroll the user with the VBS 103, as shown by the process arrow 4015. Once the VBS 103 sends to the MW 102 an indication that sufficient audio material from the user has been collected for training, as shown by the process arrow 4016, the MW 102 sends a request to the VBS 103 (as shown by the process arrow 4017) to start the training process to build a unique voiceprint. Once the training process for the voiceprint of the user has been completed, the VBS 103 sends to the MW 102 an indication that the unique voiceprint for the user has been successfully trained, as shown by the process arrow 4018.
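The following is a hypothetical sketch of the enrollment exchange between the middleware (MW 102) and the voice biometric service (VBS 103) shown by the process arrows 4015 through 4018. The client methods and return values are assumptions for illustration only and do not represent the disclosed API.

def enroll_and_train(vbs_client, user_id: str, audio_chunks) -> bool:
    """Enroll a user with the VBS and train a unique voiceprint."""
    # 4015: the MW sends a request to enroll the user with the VBS.
    session = vbs_client.start_enrollment(user_id)
    for chunk in audio_chunks:
        vbs_client.add_audio(session, chunk)
        # 4016: the VBS indicates when sufficient audio material has been
        # collected from the user for training.
        if vbs_client.has_sufficient_audio(session):
            break
    # 4017: the MW sends a request to the VBS to start the training process
    # to build a unique voiceprint.
    vbs_client.start_training(session)
    # 4018: the VBS indicates whether the unique voiceprint for the user
    # has been successfully trained.
    return vbs_client.training_succeeded(session)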
As a summary, several examples of the method and the system according to the present disclosure are provided.
A first example of the method according to the present disclosure provides a method of Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA), comprising: recording, by a voice CAPTCHA module, a speech spoken by a user; determining, by a voice biometric service (VBS), whether a voiceprint matching the user's speech exists; and if a voiceprint matching the user's speech exists, verifying the user as a human user by the VBS.
A second example of the method modifying the first example of the method, the second method further comprising: if a voiceprint matching the user's speech does not exist, generating by the VBS a unique voiceprint for the user based on the user's speech.
A third example of the method modifying the first example of the method, the third method further comprising: if a voiceprint matching the user's speech does not exist, determining by the VBS whether the user's speech is at least one of a synthetically generated speech and a previously recorded audio being played back.
A fourth example of the method modifying the first example of the method, the fourth method further comprising: presenting, by the voice CAPTCHA module, a login screen to the user; wherein the VBS determines whether the voiceprint matching the user's speech exists after the user has logged in.
A fifth example of the method modifying the second example of the method, the fifth method further comprising: presenting, by the voice CAPTCHA module, a login screen to the user; wherein the VBS determines whether the voiceprint matching the user's speech exists after the user has logged in.
In a sixth example of the method modifying the third example of the method, the voice CAPTCHA module enables the user to perform a guest checkout without logging into the voice CAPTCHA module.
A seventh example of the method modifying the sixth example of the method, the seventh method further comprising: comparing, by the VBS, previously used voiceprints to the user's speech.
An eighth example of the method modifying the second example of the method, the eighth method further comprising: if a voiceprint matching the user's speech does not exist, determining by the VBS whether the user's speech is one of a synthetically generated speech and a previously recorded audio being played back.
In a ninth example of the method modifying the eighth example of the method, if the user's speech is not one of a synthetically generated speech and a previously recorded audio being played back, the VBS determines the user's speech to be a unique and authentic human voice.
In a tenth example of the method modifying the ninth example of the method, the unique voiceprint for the user is generated by the VBS after determining the user's speech is a unique and authentic human voice.
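The following is a hypothetical sketch illustrating the chain of the eighth through tenth examples of the method: when no matching voiceprint exists, synthetically generated speech and played-back recordings are first ruled out, after which the speech is treated as a unique and authentic human voice and a unique voiceprint is generated. The detector and generator functions are assumptions for illustration only.

from typing import Callable, Optional

def verify_new_voice(speech: bytes,
                     is_synthetic: Callable[[bytes], bool],
                     is_playback: Callable[[bytes], bool],
                     generate_voiceprint: Callable[[bytes], bytes]) -> Optional[bytes]:
    """Return a new voiceprint if the speech is a live, authentic human voice."""
    if is_synthetic(speech):
        return None  # synthetically generated speech: reject
    if is_playback(speech):
        return None  # previously recorded audio being played back: reject
    # Neither synthetic nor playback: the speech is determined to be a unique
    # and authentic human voice, and a unique voiceprint is generated.
    return generate_voiceprint(speech)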
A first example of the system according to the present disclosure provides a system for implementing a method of Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA), comprising: a voice CAPTCHA module configured to record a speech spoken by a user; and a voice biometric service (VBS) configured to: i) determine whether a voiceprint matching the user's speech exists, and ii) if a voiceprint matching the user's speech exists, verify the user as a human user.
In a second example of the system modifying the first example of the system, the VBS is configured to generate a unique voiceprint for the user based on the user's speech if a voiceprint matching the user's speech does not exist.
In a third example of the system modifying the first example of the system, if a voiceprint matching the user's speech does not exist, the VBS is configured to determine whether the user's speech is at least one of a synthetically generated speech and a previously recorded audio being played back.
In a fourth example of the system modifying the first example of the system, the voice CAPTCHA module is configured to present a login screen to the user; and the VBS is configured to determine whether the voiceprint matching the user's speech exists after the user has logged in.
In a fifth example of the system modifying the second example of the system, the voice CAPTCHA module is configured to present a login screen to the user; and the VBS is configured to determine whether the voiceprint matching the user's speech exists after the user has logged in.
In a sixth example of the system modifying the third example of the system, the voice CAPTCHA module is configured to enable the user to perform a guest checkout without logging into the voice CAPTCHA module.
In a seventh example of the system modifying the sixth example of the system, the VBS is configured to compare previously used voiceprints to the user's speech.
In an eighth example of the system modifying the second example of the system, if a voiceprint matching the user's speech does not exist, the VBS is configured to determine whether the user's speech is at least one of a synthetically generated speech and a previously recorded audio being played back.
In a ninth example of the system modifying the eighth example of the system, if the user's speech is not one of a synthetically generated speech and a previously recorded audio being played back, the VBS is configured to determine the user's speech to be a unique and authentic human voice.
In a tenth example of the system modifying the ninth example of the system, the VBS is configured to generate the unique voiceprint for the user after determining the user's speech is a unique and authentic human voice.