The invention relates to artificial intelligence, and in particular to employing artificial intelligence (AI) to automatically determine whether an entity engaging in electronic messaging is a robot.
Recent developments in AI are driving sweeping changes in virtually every aspect of human activity. A particular class of AI applications is directed at human-machine interaction and comprises the development of language models enabling a computer to communicate in a natural language such as English or Chinese. While creating vast opportunities for business development by enabling the automation of various tasks such as customer service and information retrieval, language model-enabled software agents (commonly known as chat robots or chatbots) are also posing unprecedented technical and ethical challenges. As chatbots become ever more sophisticated and capable of engaging in realistic conversation, it becomes truly difficult to distinguish human from machine. Unscrupulous entities may exploit such confusion for purposes such as fraud, unsolicited communications (spam), identity theft, large-scale disinformation, and political manipulation, among others.
The problem of determining whether a conversation partner is a human or a computer is generically known as a Turing test and is almost as old as information technology itself. Several methods of implementing a Turing test have been proposed for various applications, most recently related to communicating via the Internet. One example comprises a category of methods collectively known as Completely Automated Public Turing tests to tell Computers and Humans Apart (CAPTCHA). In various embodiments, CAPTCHA comprises issuing a challenge (e.g., an image recognition problem) in response to an attempt to access an online service (e.g., a particular web page), and selectively allowing access to the respective service according to a response to the respective challenge. Such methods implicitly rely on the current inability of a typical software agent to solve the particular type of problem used as a challenge. However, recent developments in AI and language modeling are quickly rendering such conventional Turing tests obsolete.
For the reasons outlined above, there is a renewed interest in developing robust and effective Turing tests.
According to one aspect, a computer system comprises at least one hardware processor configured to employ a generative language model to generate a surrogate conversation sequel and a surrogate response. The surrogate conversation sequel comprises a predicted continuation of an ongoing online conversation comprising a sequence of messages. The surrogate response comprises a predicted response from a conversation partner to the surrogate conversation sequel. The at least one hardware processor is further configured to, in response, distort the surrogate conversation sequel to produce a challenge message and add the challenge message to the ongoing online conversation. The at least one hardware processor is further configured to, in response to receiving a partner response from the conversation partner, the partner response comprising a response to the challenge message, determine whether the conversation partner comprises a robot according to a similarity between the partner response and the surrogate response.
The foregoing aspects and advantages of the present invention will become better understood upon reading the following detailed description and upon reference to the drawings where:
In the following description, it is understood that all recited connections between structures can be direct operative connections or indirect operative connections through intermediary structures. A set of elements includes one or more elements. Any recitation of an element is understood to refer to at least one element. A plurality of elements includes at least two elements. Any use of ‘or’ is meant as a nonexclusive or. Unless otherwise required, any described method steps need not be necessarily performed in a particular illustrated order. A first element (e.g., data) derived from a second element encompasses a first element equal to the second element, as well as a first element generated by processing the second element and optionally other data. Making a determination or decision according to a parameter encompasses making the determination or decision according to the parameter and optionally according to other data. Unless otherwise specified, an indicator of some quantity/data may be the quantity/data itself, or an indicator different from the quantity/data itself. A computer program is a sequence of processor instructions carrying out a task. Computer programs described in some embodiments of the present invention may be stand-alone software entities or sub-entities (e.g., subroutines, libraries) of other computer programs. Computer-readable media encompass non-transitory media such as magnetic, optic, and semiconductor storage media (e.g. hard drives, optical disks, flash memory, DRAM), as well as communication links such as conductive cables and fiber optic links. According to some embodiments, the present invention provides, inter alia, computer systems comprising hardware (e.g. one or more processors) programmed to perform the methods described herein, as well as computer-readable media encoding instructions to perform the methods described herein.
Embodiments of the present invention are directed at detecting chatbots masquerading as humans in online conversations. A chatbot herein denotes any computer program configured to automatically generate text formulated in a natural language such as English, Russian, and Chinese, among others, and to interface with a messaging application to transmit the respective text to another computer system. Generating text comprises effectively creating the respective text fragment (for instance by automatically concatenating words extracted from a dictionary according to an algorithm and/or language model), as opposed to merely encoding text provided by a human operator. An online conversation herein comprises a sequence of electronic messages exchanged among a set of partners via a messaging application and/or online platform. Messages may vary in format according to the respective messaging platform, protocol, and/or application, but in general an electronic message may comprise an encoding of a text part and/or an encoding of a media file (e.g., image, movie, sound, etc.). The text part may comprise text written in a natural language and other alphanumeric and/or special characters such as emoticons, among others.
For clarity, the following description will focus on processing text messages. However, a skilled artisan will know that the described systems and methods may be adapted to other kinds of messaging such as audio/video (e.g., spoken text) or a combination thereof. For instance, some embodiments may determine whether an audio file comprising a spoken message was generated by a chatbot. In one exemplary embodiment, a speech-to-text translator may be used to convert an audio file into a text fragment, to which the methods described herein in relation to text messaging may then be applied.
Online messaging encompasses peer-to-peer messaging as well as messaging via public chatrooms, forums, social media sites, etc. Examples of online conversations include an exchange of short message service (SMS) messages, a sequence of e-mail messages, and a sequence of messages exchanged via instant messaging applications such as WhatsApp Messenger®, Telegram®, WeChat®, and Facebook® Messenger®, among others. Other exemplary online conversations include a content of a Facebook® wall, a chat conducted on an online forum such as Reddit® and Discord®, and a set of comments to a blog post. Exemplary messaging applications according to embodiments of the present invention include client-side instances of mobile applications such as WhatsApp®, Facebook®, Instagram®, SnapChat® etc., as well as software executing the server side of the respective messaging operations. Other examples of messaging application include an email client and an instance of an Internet browser.
The following description illustrates embodiments of the invention by way of example and not necessarily by way of limitation.
In typical online messaging scenarios such as social media platforms and online forums, messages are centralized, routed, and/or dispatched by a messaging server 12, for instance using a client-server protocol. Stated otherwise, individual messages sent by multiple client systems 10a-c may accumulate at messaging server 12, which may selectively display or otherwise deliver them to their intended destination. In alternative embodiments, electronic messaging uses a de-centralized network of individual peer-to-peer connections between client systems 10a-c.
In some embodiments, a chatbot detector determines whether at least a part of an online conversation (i.e., a set of messages) was automatically generated. For instance, the chatbot detector may determine whether a selected client system 10a-c comprises a chatbot masquerading as a human in online conversations. Exemplary embodiments of the chatbot detector include a set of interconnected computer programs, dedicated hardware modules, or a combination thereof. In some embodiments, at least a part of the chatbot detector may execute on a utility server 16 which may include a set of interconnected computer systems further connected to communication network 15. In some embodiments as described herein, a chatbot detector comprises an artificial intelligence (AI) system comprising a set of pre-trained neural networks. Training the respective AI system may be carried out on a dedicated AI training appliance 14.
In some embodiments, detector 20 comprises a conversation agent 32 interconnected with a challenge generator 34 and a response analyzer 36. Conversation agent 32 may interface with a messaging application 30 generically representing any software configured to enable a user of the respective client system to exchange electronic messages with other users. Exemplary messaging applications 30 include a local instance of a Facebook® mobile application and a local instance of an Internet browser, among others, as well as server-side components of messaging platforms as described above. Application 30 may display a content of each electronic message on an output device (e.g., screen) of the respective client system and may further organize messages according to sender, recipient, time, subject, or other criteria. Application 30 may further receive input from the user (e.g., from a keyboard, touchscreen, dictation interface, etc.), formulate electronic messages according to the received input, and transmit electronic messages to messaging server 12 and/or directly to other client systems 10a-c. Transmitting a message may comprise, for instance, adding an encoding of the respective message to an outbound queue of a communication interface of the respective client system.
Interfacing with application 30 includes retrieving and/or transmitting data from/to application 30. For instance, agent 32 may be configured to parse an ongoing online conversation and extract a conversation sample 22 including selected parts copied from the respective conversation. Extracting conversation samples may comprise identifying individual messages of a conversation and determining message-specific features such as a sender and/or receiver, a time of transmission (e.g., timestamp), a text of the respective message, and possibly other content data such as an image attached to the respective message. Interfacing with messaging application 30 may further include transmitting a challenge 44 to application 30, challenge 44 comprising a deliberately crafted message for insertion into the respective online conversation, as described in more detail below.
Conversation agent 32 may be implemented using any method known in the art. In one exemplary embodiment, agent 32 may be incorporated into messaging application 30, for instance as an add-on or plugin (e.g., a browser extension). Such embodiments may use the functionality of application 30 to extract data from and/or insert data into ongoing conversations. Other exemplary embodiments of agent 32 may employ techniques known in the art of robotic process automation, for instance using a local driver to automatically identify elements of a user interface of messaging application 30 and mimic the way in which a human user would interact with it to read and write messages. Some such embodiments may extract message content using built-in features of the local operating system, such as an accessibility application programming interface (API), i.e., software typically used for grabbing information currently displayed on screen for the purpose of making such information accessible to people with disabilities. Agent 32 may further parse various data structures such as a user interface tree or document object model (DOM) to identify individual messages and extract their content and other information such as an identity of the sender, etc.
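As a rough illustration of such DOM-based extraction, the following sketch parses a hypothetical rendered conversation using the BeautifulSoup library; the HTML structure and class names are invented for this example and will differ across real messaging applications:

```python
# pip install beautifulsoup4
from bs4 import BeautifulSoup

# Hypothetical rendered markup; real applications use different structures.
HTML = """
<div class="chat">
  <div class="message"><span class="sender">alice</span>
    <span class="body">Hi there!</span></div>
  <div class="message"><span class="sender">bob</span>
    <span class="body">Hello!</span></div>
</div>
"""

def extract_messages(html: str) -> list[dict]:
    """Identify individual messages and extract sender and text."""
    soup = BeautifulSoup(html, "html.parser")
    messages = []
    for node in soup.find_all("div", class_="message"):
        messages.append({
            "sender": node.find("span", class_="sender").get_text(strip=True),
            "text": node.find("span", class_="body").get_text(strip=True),
        })
    return messages

print(extract_messages(HTML))
```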
In alternative embodiments, messaging application 30 may be surreptitiously modified (for instance by hooking) to install conversation agent 32. For instance, a code patch may be inserted into application 30, the respective code configured to transparently invoke execution of agent 32 whenever messaging application 30 performs a particular action, such as receiving or transmitting a message. Yet another embodiment of agent 32 may extract message content directly from intercepted network traffic going into messaging application 30 and/or passing via a network adapter of the respective client device 10a-c. Such communication interceptors may implement communication protocols such as HTTP, WebSocket, and MQTT, among others, to parse communications and extract structured message data. When instant messages are encrypted, some embodiments employ techniques such as man-in-the-middle (MITM) interception to decrypt traffic for message content extraction.
Some embodiments of the present invention rely on the observation that modern chatbots are typically aware of a conversation's context, i.e., they generate individual messages according to a content of other messages previously exchanged during an ongoing conversation. Therefore, some embodiments of conversation agent 32 are further configured to organize messages into conversations. A conversation herein comprises a sequence of individual messages exchanged between the same pair of interlocutors (in the case of a one-to-one exchange), or within the same group (in the case of a group chat, for instance). In one exemplary embodiment, conversation agent 32 may identify a sender and/or receiver of each intercepted message, and attach a label to each message, the label creating an association between the respective message and a conversation. Following such labeling, messages may be selectively retrieved according to conversation and ordered according to a message-specific timestamp. In an alternative embodiment, conversation agent 32 may store each conversation as a separate data object comprising a concatenation of messages selected according to sender and/or recipient and arranged in the order of transmission according to their respective timestamp.
An exemplary conversation data object may further include a set of media indicators, for instance copies of image/video/audio files attached to messages belonging to the respective conversation, or a network address/URL where the respective media file is located. Other exemplary media indicators may include an indicator of a media format (encoding protocol), etc. A skilled artisan will understand that the actual data format of conversation objects may differ among embodiments; exemplary formats include a version of the extensible markup language (XML) and JavaScript Object Notation (JSON), among others.
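By way of example, a minimal conversation object along the lines described above may be assembled and serialized to JSON as follows; all field names are illustrative assumptions rather than a prescribed schema:

```python
import json
from operator import itemgetter

# Hypothetical intercepted messages; field names are illustrative only.
raw_messages = [
    {"sender": "bob",   "recipient": "alice", "timestamp": 1700000060, "text": "Hello!"},
    {"sender": "alice", "recipient": "bob",   "timestamp": 1700000000, "text": "Hi there!"},
]

def build_conversation(messages, participants):
    """Select messages exchanged between the same pair of interlocutors
    and arrange them in order of transmission according to timestamps."""
    selected = [m for m in messages
                if {m["sender"], m["recipient"]} == set(participants)]
    selected.sort(key=itemgetter("timestamp"))
    return {"participants": sorted(participants),
            "messages": selected,
            "media": []}  # e.g., URLs of attached image/video/audio files

print(json.dumps(build_conversation(raw_messages, ("alice", "bob")), indent=2))
```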
In some embodiments, in a step 302, conversation agent 32 may determine a conversation context 42 characterizing an ongoing online conversation, for instance by extracting and aggregating conversation samples 22 as described above.
In response to determining conversation context 42, agent 32 may transmit context 42 to challenge generator 34. In some embodiments, challenge generator 34 is configured to automatically construct a challenge 44 and at least one surrogate response 48 to the respective challenge (a step 306).
Next, in a step 404, generator 34 may employ a generative language model to determine a surrogate conversation sequel 43 according to conversation context 42. Surrogate sequel 43 is deliberately constructed to predict or mimic a continuation (e.g., a new message) of an ongoing conversation represented by context 42. The modifier “surrogate”, as applied to “sequel” and “response”, is used herein to indicate that the respective item is an artefact produced by challenge generator 34, as opposed to a real message exchanged via messaging application 30.
Generator 34 may apply any generative language model known in the art to produce surrogate conversation sequel 43. Typical language models receive an input text fragment and produce an output text fragment comprising a computed continuation of the input text fragment. A model may be invoked iteratively, wherein at every step the input is modified according to the output determined at a previous step, for instance by concatenation. Language models are typically pre-trained on large text corpora and are language-specific.
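As a minimal illustration of such iterative generation, the sketch below uses the open-source transformers library with the small "gpt2" model as a stand-in for any pre-trained generative model; the model choice and decoding parameters are assumptions, not mandated by the embodiments described herein:

```python
# pip install torch transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any locally available causal language model may stand in for the
# generative module; "gpt2" is used purely as a placeholder.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

context = "Alice: Hi, how was your weekend?\nBob:"
input_ids = tokenizer(context, return_tensors="pt").input_ids

# Autoregressive decoding: each step appends the newly predicted token
# to the input, so the model always conditions on its own prior output.
output_ids = model.generate(input_ids, max_new_tokens=30,
                            do_sample=True, top_p=0.9,
                            pad_token_id=tokenizer.eos_token_id)
sequel = tokenizer.decode(output_ids[0, input_ids.shape[1]:],
                          skip_special_tokens=True)
print(sequel)  # candidate surrogate conversation sequel
```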
One exemplary architecture of an AI module implementing a generative language model comprises a convolutional neural network (CNN) layer followed by a dense (i.e., fully connected) layer further coupled to a rectifier (e.g., ReLU or another activation function) and/or a loss layer. Alternative embodiments may comprise a CNN layer feeding into a recurrent neural network (RNN), followed by fully connected and ReLU/loss layers. Convolutional layers effectively multiply an internal representation of each token of a sequence with a matrix of weights known in the art as filters, to produce an embedding tensor wherein each element has contributions from the respective token, but also from other tokens adjacent to it. The embedding tensor therefore collectively represents the input token sequence at a granularity that is coarser than that of individual tokens. The filter weights are adjustable parameters which may be tuned during a training process.
Recurrent neural networks (RNN) form a special class of artificial neural networks, wherein connections between the network nodes form a directed graph. Several flavors of RNN are known in the art, including long short-term memory (LSTM) networks and graph neural networks (GNN), among others. A typical RNN comprises a set of hidden units (e.g., individual neurons), and the topology of the network is specifically configured so that each hidden unit receives an input (e.g., an embedding vector) characterizing a respective token t_j, but also an input provided by an adjacent hidden unit, which in turn receives an input characterizing a token t_(j-1) preceding t_j within the input token sequence. As a result, the output of each hidden unit is influenced not only by the respective token t_j, but also by the preceding token t_(j-1). Stated otherwise, an RNN layer may process information about each token in the context of previous token(s). Bi-directional RNN architectures may process information about each token in the context of both previous and subsequent token(s) of the input token sequence.
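For illustration only, the following PyTorch sketch instantiates a toy CNN-to-RNN stack of the kind described above; all layer sizes are arbitrary choices and the module is untrained:

```python
import torch
import torch.nn as nn

class ConvRecurrentLM(nn.Module):
    """Toy CNN -> RNN -> dense stack mirroring the architecture above;
    sizes are arbitrary and chosen only for illustration."""
    def __init__(self, vocab_size=10000, embed_dim=128,
                 conv_channels=256, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Convolutional filters mix each token with its neighbors.
        self.conv = nn.Conv1d(embed_dim, conv_channels,
                              kernel_size=3, padding=1)
        # The LSTM propagates information from preceding tokens.
        self.rnn = nn.LSTM(conv_channels, hidden_dim, batch_first=True)
        self.relu = nn.ReLU()
        self.head = nn.Linear(hidden_dim, vocab_size)  # next-token scores

    def forward(self, token_ids):                  # (batch, seq_len)
        x = self.embed(token_ids)                  # (batch, seq, embed)
        x = self.conv(x.transpose(1, 2)).transpose(1, 2)
        x, _ = self.rnn(x)
        return self.head(self.relu(x))             # (batch, seq, vocab)

logits = ConvRecurrentLM()(torch.randint(0, 10000, (1, 12)))
print(logits.shape)  # torch.Size([1, 12, 10000])
```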
Yet another exemplary embodiment of an AI module for generating surrogate conversation sequel 43 comprises a stack of transformer neural network layers. The transformer architecture is described, for instance, in A. Vaswani et al., ‘Attention is all you need’, arXiv:1706.03762, among others. For each input token sequence, transformer layers may produce a sequence of contextualized token embedding vectors, wherein each token embedding vector encodes information from multiple (e.g., all) tokens t_j of the input sequence. The output of the transformer layers may be fed into multiple distinct classifier modules (e.g., dense layers) known in the art as prediction heads, which in turn determine output tokens and therefore construct a continuation of the input token sequence.
In some embodiments as illustrated, the input token sequence further comprises a provisional sequel 143 projecting conversation context 42 into the future. Initially, provisional sequel 143 may consist exclusively of placeholder tokens; the language model may then progressively replace such placeholder tokens with predicted tokens, thus constructing surrogate conversation sequel 43.
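One way to picture such placeholder replacement is masked-token infilling, sketched below with the transformers fill-mask pipeline; the use of a BERT-style model here (which fills one placeholder at a time) is an illustrative assumption, not a description of the generative module itself:

```python
# pip install transformers
from transformers import pipeline

# Masked-language-model infilling as a stand-in for placeholder-token
# replacement; the model name is a placeholder choice.
fill = pipeline("fill-mask", model="bert-base-uncased")

provisional = ("alice: hi, how was your weekend? "
               "bob: it was great, we went [MASK] with the kids.")
for candidate in fill(provisional, top_k=3):
    print(candidate["token_str"], round(candidate["score"], 3))
```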
In response to determining surrogate conversation sequel 43, a step 406 may distort conversation sequel 43 to produce challenge 44. Some embodiments rely on the observation that deep neural networks are extremely complex systems and as such are prone to exhibiting chaotic behavior. One of the signatures of chaotic dynamics is the extreme sensitivity to initial conditions, wherein small differences in the initial state of a chaotic system grow exponentially fast, a phenomenon sometimes referred to as “the butterfly effect” in popular culture. A consequence of such sensitivity is that a chaotic system may be sent on a vastly different trajectory by the slightest nudge. Such behavior translates to the field of generative language models, where various computer experiments have revealed that the same model may produce substantially different output when fed input text that differs only slightly. Building on such observations, some embodiments distort surrogate conversation sequel 43 in a deliberate attempt to construct a challenge 44 that causes the respective chatbot to depart on a diverging trajectory, thus enabling chatbot detection.
In some embodiments, challenge generator 34 comprises a sequence modifier 60 configured to distort surrogate conversation sequel 43 to produce challenge 44. Exemplary distortions include inserting selected keywords, special characters, and/or emojis into sequel 43, altering a sentiment of sequel 43 (for instance from neutral to angry), and changing a type of sentence (for instance from affirmative to negative or interrogative), among others.
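A toy sequence modifier along these lines might look as follows; the specific perturbations are illustrative placeholders rather than distortions tuned against actual chatbots:

```python
import random

# Illustrative distortions only; a production sequence modifier would be
# calibrated via computer experiments, as discussed below.
def distort(sequel: str) -> str:
    """Apply one small, randomly chosen perturbation to a sequel."""
    choice = random.choice(["emoji", "keyword", "question"])
    if choice == "emoji":
        return sequel + " \U0001F914"          # append a thinking-face emoji
    if choice == "keyword":
        return sequel + " (purely hypothetically, of course)"
    # Turn an affirmative sentence into an interrogative one.
    return sequel.rstrip(".!") + ", right?"

print(distort("It was great, we went hiking with the kids."))
```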
In response to determining challenge 44, in a step 408, challenge generator 34 may determine a set of surrogate responses 48a-b, each comprising a predicted response of a conversation partner. In some embodiments, surrogate response 48a comprises a predicted response to challenge 44, while surrogate response 48b comprises a predicted response to the undistorted surrogate conversation sequel 43. A further step 410 may then evaluate challenge 44 to determine whether it is satisfactory.
Evaluating challenge 44 may comprise determining a likelihood that a chatbot's response to challenge 44 may be substantially different from that of a human. In some embodiments, step 410 comprises evaluating a similarity between two token sequences, for instance between surrogate conversation sequel 43 and challenge 44, and comparing the respective similarity to a pre-determined threshold. Challenges 44 that are not sufficiently removed from sequel 43 may then be rejected as unsatisfactory.
In an alternative embodiment, challenge 44 is deemed satisfactory when a difference Dc between challenge 44 and sequel 43 is smaller than a pre-determined upper bound ΔU, relying on the observation that a successful challenge will only derail a robot, while challenges that are too strange or out of context may trigger non-standard responses from both humans and robots. Yet other exemplary embodiments may evaluate a difference Dr between surrogate responses 48a and 48b and determine that challenge 44 is satisfactory when the respective difference exceeds a pre-determined lower threshold ΔL. Both ΔU and ΔL may be determined via computer experiments and may be specific to a type of chatbot.
Such criteria may also be combined. In one such example, challenge 44 is deemed satisfactory when Dc < ΔU AND Dr > ΔL. In another example, challenge generator 34 may determine a composite distance, for instance D = Dc / Dr, and determine that challenge 44 is satisfactory when D is lower than a pre-determined threshold.
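The combined criterion and one possible composite distance may be sketched as follows; the threshold values and the functional form D = Dc / Dr are assumptions for illustration:

```python
DELTA_U = 0.4   # upper bound on Dc; illustrative value only
DELTA_L = 0.3   # lower bound on Dr; illustrative value only

def challenge_is_satisfactory(d_c: float, d_r: float) -> bool:
    """Combined criterion: the challenge stays close to the undistorted
    sequel (Dc small) while the predicted responses diverge (Dr large)."""
    return d_c < DELTA_U and d_r > DELTA_L

def composite_distance(d_c: float, d_r: float, eps: float = 1e-6) -> float:
    """One possible composite distance, small when Dc is small and
    Dr is large; the exact functional form is an assumption."""
    return d_c / (d_r + eps)

print(challenge_is_satisfactory(0.2, 0.5), composite_distance(0.2, 0.5))
```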
Challenge generator 34 may use any method known in the art to quantify the similarity between two text sequences. Exemplary similarity measures include various variants of the Levenshtein distance, as well as a distance evaluated in an embedding space, which is further detailed below. Other similarity measures known in the art determine a measure of a sentiment transmitted by a target fragment of text, enabling some embodiments to detect a change in mood (e.g., from neutral to angry, etc.). Yet other exemplary similarity measures may be derived from metrics known in the art of machine translation, such as a bilingual evaluation understudy (BLEU) score or a recall-oriented understudy for gisting evaluation (ROUGE) score. When challenge 44 is deemed satisfactory, challenge 44 is output together with the respective surrogate response(s) 48a-b in a step 414.
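As a simple illustration of an edit-distance-style measure, the Python standard library's SequenceMatcher yields a similarity ratio in [0, 1]; it stands in here for any of the measures listed above:

```python
from difflib import SequenceMatcher

def sequence_similarity(a: str, b: str) -> float:
    """Ratio in [0, 1]; 1.0 means identical sequences. SequenceMatcher
    is a standard-library stand-in for an edit-distance measure."""
    return SequenceMatcher(None, a, b).ratio()

challenge = "It was great, we went hiking with the kids, right?"
sequel = "It was great, we went hiking with the kids."
print(round(sequence_similarity(challenge, sequel), 3))
```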
In a step 308, conversation agent 32 may insert challenge 44 into the ongoing conversation, for instance by transmitting challenge 44 to messaging application 30 as described above. Agent 32 may then listen for a partner response 46 comprising a response of the conversation partner to challenge 44, and forward response 46 to response analyzer 36, which may determine a bot verdict 26 in a step 314.
In some embodiments, response analyzer 36 determines bot verdict 26 according to a measure of similarity between partner response 46 and at least one of surrogate responses 48a-b determined in step 306. For instance, partner response 46 may be considered to originate from a chatbot when a difference between partner response 46 and surrogate response 48b exceeds a pre-determined threshold, which may be determined via experimentation. An alternative embodiment may further determine a difference between partner response 46 and surrogate response 48a, and determine that response 46 was authored by a chatbot when items 46 and 48a are sufficiently similar. Yet other embodiments may decide according to a similarity between item 46 and item 48b AND according to a similarity between item 46 and item 48a. For instance, an exemplary embodiment may determine that partner response 46 was generated by a chatbot if response 46 is more similar to surrogate response 48a than to surrogate response 48b.
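A toy version of such a decision rule might read as follows; the normalization into a 0-to-1 likelihood is an illustrative choice rather than a prescribed formula:

```python
def bot_likelihood(sim_to_challenge_response: float,
                   sim_to_normal_response: float) -> float:
    """Toy likelihood that the partner is a robot: high when the observed
    response resembles the predicted response to the challenge (48a) more
    than the predicted response to the undistorted sequel (48b)."""
    total = sim_to_challenge_response + sim_to_normal_response
    return sim_to_challenge_response / total if total else 0.5

print(bot_likelihood(0.8, 0.3))  # ~0.73: likely a robot
```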
An exemplary similarity measure for use in step 314 may be determined according to a distance separating two token sequences in an abstract hyperspace sometimes referred to as an embedding space. In such embodiments, each token sequence may be represented as a multidimensional embedding vector comprising a plurality of numerical coordinates collectively indicating a position of the respective token sequence in the embedding space. Individual coordinates of the embedding vector are determined by a component of chatbot detector 20 usually known as an encoder.
In an alternative embodiment, encoder 66 may calculate an embedding vector for each token of a sequence, and a similarity measure may then be calculated as an aggregated distance combining multiple inter-token distances. An exemplary text token embedding may be calculated using a version of the Word2Vec or GloVe algorithms, among others. To produce such token embedding vectors, encoder 66 may be pre-trained on a text corpus, for instance using a bag-of-words and/or skip-gram algorithm.
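A minimal sketch of an embedding-space distance follows, assuming a hypothetical table of pre-trained token vectors; real embeddings would come from Word2Vec/GloVe vectors or a transformer encoder, and other aggregation schemes than mean pooling are possible:

```python
import numpy as np

# Hypothetical two-dimensional token embeddings; real vectors would be
# produced by a pre-trained encoder and have hundreds of dimensions.
EMBEDDINGS = {"we": np.array([0.1, 0.3]), "went": np.array([0.2, 0.1]),
              "hiking": np.array([0.9, 0.4]), "sailing": np.array([0.8, 0.6])}

def sequence_vector(tokens):
    """Mean-pool per-token vectors into one sequence embedding."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    return np.mean(vecs, axis=0)

def cosine_distance(u, v):
    """Distance in embedding space: 0 for identical directions."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

a = sequence_vector("we went hiking".split())
b = sequence_vector("we went sailing".split())
print(round(cosine_distance(a, b), 3))
```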
Bot verdict 26 may comprise a label (e.g., human, robot, etc.) or a Boolean value indicative of whether the author of partner response 46 is a chatbot or not. Alternatively, verdict 26 may comprise a number indicating a likelihood (e.g., a probability scaled between 0 and 1, with 1 indicating certainty) that response 46 was formulated by a robot. In response to determining verdict 26, chatbot detector 20 may display an indicator of verdict 26 to a user of the respective client system. For instance, some embodiments may employ conversation agent 32 to mark the respective conversation partner accordingly in a user interface of messaging application 30. In one such example, a chatbot conversation partner may be highlighted using a distinctive label, color, icon, etc.
As indicated above, various components of chatbot detector 20, such as generative language module 50 and encoder 66, among others, may comprise pre-trained neural networks. Training herein denotes a process of presenting the respective neural networks with a set of training samples (e.g., a corpus of text formulated in a natural language such as English, Chinese, Russian, etc.), employing the respective networks to determine an output according to the training samples, and in response, adjusting a set of parameters of the respective neural networks (e.g., synapse weights, etc.) according to the respective output. Several training strategies are known in the art, such as supervised and unsupervised training, among others. Generative language models are typically expensive to train in terms of computational resources (processing power and memory), so in typical embodiments training is performed on a dedicated AI training appliance 14.
Some embodiments may bypass language model training altogether and use publicly available language models and/or chatbots (such as an implementation of ChatGPT from OpenAI, Inc.) to embody some of the functionality of chatbot detector 20. For instance, challenge generator 34 may invoke a remote chatbot to generate surrogate conversation sequel 43 and/or surrogate responses 48a-b. In an alternative embodiment, components such as generative language module 50 may comprise local instances of publicly available pre-trained language models.
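One possible remote integration is sketched below using the openai Python package; the model name, prompt framing, and reliance on an environment-variable API key are assumptions for illustration, not part of the described embodiments:

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def remote_sequel(conversation: str) -> str:
    """Ask a hosted model for a plausible continuation of a conversation."""
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Continue the conversation with one short message."},
            {"role": "user", "content": conversation},
        ],
    )
    return reply.choices[0].message.content

print(remote_sequel("Alice: Hi, how was your weekend?\nBob:"))
```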
Processors 82 are generally characterized by an instruction set architecture (ISA), which specifies the respective set of processor instructions (e.g., the x86 family vs. ARM® family), and the size of registers (e.g., 32 bit vs. 64 bit processors), among others. The architecture of processors 82 may further vary according to their intended primary use. While central processing units (CPU) are general-purpose processors, graphics processing units (GPU) may be optimized for image/video processing and some forms of parallel computing. Processors 82 may further include application-specific integrated circuits (ASIC), such as Tensor Processing Units (TPU) from Google®, Inc., and Neural Processing Units (NPU) from various manufacturers. TPUs and NPUs may be particularly suited for AI and machine learning applications as described herein.
Memory unit 84 may comprise volatile computer-readable media (e.g., dynamic random-access memory, DRAM) storing data/signals/instruction encodings accessed or generated by processor(s) 82 in the course of carrying out operations. Input devices 86 may include computer keyboards, mice, and microphones, among others, including the respective hardware interfaces and/or adapters allowing a user to introduce data and/or instructions into computer system 80. Output devices 88 may include display devices such as monitors and speakers among others, as well as hardware interfaces/adapters such as graphic cards, enabling the respective computing appliance to communicate data to a user. In some embodiments, input and output devices 86-88 share a common piece of hardware (e.g., a touch screen). Storage devices 92 include computer-readable media enabling the non-volatile storage, reading, and writing of software instructions and/or data. Exemplary storage devices include magnetic and optical disks and flash memory devices, as well as removable media such as CD and/or DVD disks and drives. Network adapter(s) 94 enable computer system 80 to connect to an electronic communication network (e.g., network 15).
Controller hub 90 generically represents the plurality of system, peripheral, and/or chipset buses, and/or all other circuitry enabling the communication between processor(s) 82 and the rest of the hardware components of system 80. For instance, controller hub 90 may comprise a memory controller, an input/output (I/O) controller, and an interrupt controller. Depending on hardware manufacturer, some such controllers may be incorporated into a single integrated circuit, and/or may be integrated with processor(s) 82. In another example, controller hub 90 may comprise a northbridge connecting processor 82 to memory 84, and/or a southbridge connecting processor 82 to devices 86, 88, 92, and 94.
The exemplary systems and methods described above enable performing a Turing test to determine whether an online conversation partner comprises a chatbot. Chatbot technology has benefitted from recent developments in natural language processing, and in particular from the success of large language models such as generative pre-trained transformers (GPT). Currently, the problem of determining whether a conversation partner is robotic or not is critical to many applications, including prevention of fraud and online disinformation.
Conventional robot detection methods include various flavors of CAPTCHA, comprising inviting a user to solve a particular puzzle (e.g., displaying multiple images and asking the user to indicate which one shows a particular item such as a car or a bicycle), and determining whether the respective user is a robot according to a response to the respective puzzle. Conventional CAPTCHA explicitly relies on limitations of current AI systems in dealing with certain problems such as image recognition, among others. Recent improvements in AI are quickly rendering such conventional Turing tests obsolete. In contrast to conventional CAPTCHA tests, some embodiments do not assume any capability or limitation of the target chatbot (apart from the ability to generate plausible text) and instead exploit deeper, intrinsic features of generative language models, such as their inherent chaoticity. Systems and methods described herein may therefore prove more reliable at detecting AI than conventional Turing tests based on CAPTCHA technologies.
A chatbot detector as described herein may directly interface with a messaging application to effectively participate in an ongoing online conversation. An exemplary conversation agent component may be added on to the respective messaging application in the form of an extension or plugin. The respective component may determine a context of an ongoing conversation, attribute individual messages to their respective senders, and instruct the respective messaging application to submit a continuation to the respective conversation. The chatbot detector may formulate a challenge in the form of at least one conversation message, listen for a response to the respective challenge, and determine whether the respective conversation partner is a chatbot according to the response.
Some embodiments rely on the observation that, being complex systems, generative language models employed by chatbots may display chaotic behavior. One characteristic of chaotic systems is an extreme sensitivity to initial conditions, i.e., slight differences in initial conditions may lead to vastly different futures. The chaotic nature of language models may explain, for instance, why chatbots sometimes output surprising, out-of-context statements, a phenomenon known in the art as hallucination. Some embodiments of the present invention explicitly use such trajectory divergence for chatbot detection, by carefully and deliberately constructing a challenge that would cause a chatbot to deviate from its expected behavior.
To exploit such sensitivity to initial conditions, some embodiments construct a surrogate conversation sequel 43 comprising a plausible continuation of an ongoing conversation, and then distort the respective sequel to produce a challenge 44 deliberately crafted to send a chatbot's generative model on a diverging trajectory.
The described systems and methods further rely on the observation that a successful challenge 44 should depart only slightly from a plausible conversation sequel 43, so as not to cause a human conversation partner to react in an unexpected way, thereby causing a false positive detection. Some embodiments therefore run a set of quality tests on a candidate challenge before actually submitting the respective challenge to the messaging application. In one example, a challenge is deemed satisfactory when a difference between the respective challenge 44 and the undistorted conversation sequel 43 falls within predetermined upper and lower bounds. In another example, the chatbot detector may generate a plausible response 48a to the respective challenge. The respective challenge 44 may then be deemed satisfactory if it stays within a predetermined upper bound away from the undistorted conversation sequel 43, while the surrogate response 48a to the respective challenge is farther removed from a surrogate response 48b to the undistorted conversation sequel than a pre-determined lower bound.
Exemplary distortions applied to construct the challenge include modifying the generated conversation sequel by introducing particular keywords, special characters and/or emojis, by altering a sentiment of the conversation sequel (for instance from neutral to angry), and changing a type of sentence (for instance from affirmative to negative or interrogative), among others. The specific types of distortion used by the chatbot detector may be updated from time to time, to keep pace with advances in chatbot technology. The choice of distortion may be informed by direct computer experiments comprising various attempts to cause hallucination or trajectory divergence in actual online chatbots.
It will be clear to one skilled in the art that the above embodiments may be altered in many ways without departing from the scope of the invention. Accordingly, the scope of the invention should be determined by the following claims and their legal equivalents.