Typically, agents overseeing interactive sessions, or chats, oversee a single chat rather than multiple chats concurrently. The chats can use generative artificial intelligence (“AI”) to provide responses to a user request. The agent typically has to review the response generated by the AI before providing an input to transmit the response to the user. This prevents the agent from being able to oversee multiple chats occurring at once. Further, typical user interfaces are not configured to support multiple chats. Rather, the typical user interface provides for output a single chat such that the agent can see what is happening in that chat at all times. The constant review and approval of responses generated by AI is time consuming and prevents the agent from being able to oversee multiple chats.
The technology is generally directed to using generative AI to facilitate multiple interactive sessions between one agent and multiple users concurrently. The interactive sessions may be electronic communication sessions, such as chats, configured to transmit and receive content among the participants of the interactive sessions. The interactive sessions may be established in response to receiving content from respective users. A predicted response may be identified, or generated, by generative AI trained to provide predicted responses based on the received content. The predicted responses may be automatically transmitted to the user if no manual input from the agent is received within a threshold period of time. In some examples, in response to the received content, a second machine learning model may determine whether to transmit a notification to the agent. The notification may be a request for agent intervention such that the agent subsequently provides one or more manual inputs rather than the predicted response.
One aspect of the disclosure is directed to a method comprising receiving, by one or more processors, content from a plurality of users, generating, by the one or more processors, a respective interaction window for each of the plurality of users, wherein each respective interaction window corresponds to a respective interactive session, identifying, by the one or more processors executing a first machine learning model based on the received content, a predicted response for each interactive session, determining, by the one or more processors prior to transmitting the predicted response, if a manual input from an agent is received, and automatically transmitting, by the one or more processors after a threshold period of time if the manual input from the agent is not received, the predicted response for each interactive session, wherein the automatically transmitting occurs with respect to multiple interactive sessions concurrently.
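The threshold-based automatic transmission described above can be sketched as follows (a minimal illustration only; the names `PendingResponse` and `resolve_response`, and the injected clock, are assumptions for clarity rather than the disclosed implementation):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PendingResponse:
    """A predicted response awaiting a possible agent override."""
    session_id: str
    predicted: str
    created_at: float              # seconds, from an injected clock
    manual_input: Optional[str] = None

def resolve_response(pending: PendingResponse, now: float,
                     threshold: float = 10.0) -> Optional[str]:
    """Return the text to transmit, or None while still waiting.

    If the agent supplied a manual input before the threshold elapsed,
    that input is transmitted; otherwise the predicted response is
    transmitted automatically once the threshold period passes.
    """
    if pending.manual_input is not None:
        return pending.manual_input
    if now - pending.created_at >= threshold:
        return pending.predicted
    return None  # still inside the waiting window

def resolve_all(pendings, now, threshold=10.0):
    """Resolve several concurrent sessions in one pass."""
    return {p.session_id: resolve_response(p, now, threshold)
            for p in pendings}
```

Injecting `now` rather than reading a system clock keeps the waiting-window logic testable; in practice the timer element of an interaction window could display `threshold - (now - created_at)` as the remaining time.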
The respective interaction window may include a timer element in relation to the predicted response. The timer element may provide an indication of a remaining amount of time of the threshold period of time before the predicted response is automatically transmitted.
The respective interactive session may correspond to an electronic communication session among two or more of a respective user, the machine learning model, or an agent. The respective interaction windows for each of the plurality of users may be provided for output on one or more displays coupled to an agent computing device. The respective interactive windows may be cascaded in a panel of a single display.
A visible portion of the respective interaction windows may include a timer and an identifier of the respective user. The timer may provide an indication of an elapsed time since a previous response was transmitted to a respective user or an elapsed time from when content was received from the respective user. The previous response may be the predicted response or the manual input from the agent.
The method may further comprise automatically identifying, by the one or more processors executing a second machine learning model, whether to transmit a notification to an agent. The notification may be an audible or visual notification. The notification may be a request for agent intervention. The agent intervention may correspond to one or more manual inputs from the agent in response to the received content from a respective user.
The method may further comprise terminating, by one or more processors executing the machine learning model based on the received content, the respective interactive session. The method may further comprise providing, by the one or more processors as input into the first or second machine learning model, contents of the respective interaction session. The method may further comprise updating, by the one or more processors based on the contents of the respective interactive session, the first or second machine learning model. The contents of the respective interactive session may include an indication of when the manual input from the agent was transmitted instead of the predicted response.
Another aspect of the disclosure is directed to a system comprising one or more processors. The one or more processors may be configured to receive content from a plurality of users, generate a respective interaction window for each of the plurality of users, wherein each respective interaction window corresponds to a respective interactive session, identify, by executing a first machine learning model based on the received content, a predicted response for each interactive session, determine, prior to transmitting the predicted response, if a manual input from an agent is received, and automatically transmit, after a threshold period of time if the manual input from the agent is not received, the predicted response for each interactive session, wherein the automatically transmitting occurs with respect to multiple interactive sessions concurrently.
Yet another aspect of the disclosure is directed to one or more non-transitory computer-readable storage media encoding instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising receiving content from a plurality of users, generating a respective interaction window for each of the plurality of users, wherein each respective interaction window corresponds to a respective interactive session, identifying, by executing a first machine learning model based on the received content, a predicted response for each interactive session, determining, prior to transmitting the predicted response, if a manual input from an agent is received, and automatically transmitting, after a threshold period of time if the manual input from the agent is not received, the predicted response for each interactive session, wherein the automatically transmitting occurs with respect to multiple interactive sessions concurrently.
The technology is generally directed to concurrently conducting multiple interactive sessions between a single agent and a plurality of users. The single agent is able to oversee the multiple interactive sessions through the use of generative AI configured to predict responses to content provided by a respective user. Further, the single agent is able to oversee the multiple interactive sessions, concurrently, through the use of AI to provide notifications to alert the agent when to intervene in a particular interactive session.
The interactive session may be an electronic communication session, or a chat session, between a respective user and the agent. The agent may be overseeing multiple interactive sessions concurrently. The generative AI model may be trained to provide predictive responses to the content received from the user for a given interactive session. The generative AI may, in some examples, be an ML model. A plurality of interactive sessions may occur concurrently using the generative AI, such that the generative AI provides individualized predicted responses for each interactive communication session based on the content received from the user of the respective interactive session. The predicted responses provided by the generative AI as output provide conversational and interactive responses to the specific content provided by the user. The predicted responses may be transmitted automatically if an agent input is not received within a threshold period of time from the generation of the predicted response.
By using generative AI to predict responses, an agent may oversee a plurality of interactive sessions simultaneously as opposed to overseeing a single interactive session at a time. For example, the use of AI may automate actions and workflows within the interactive sessions thereby reducing the need for agent interaction with the interactive sessions. Further, the use of AI may allow for the number of interactive sessions that can be managed concurrently to increase without interfering or reducing the quality and effectiveness of the interactive sessions. For example, the predicted responses provided by the generative AI, the threshold period of time for waiting for an agent input prior to transmitting the predicted responses, and other features may result in a natural conversation between the user and the generative AI, in lieu of the agent, such that the user is unaware that their interactions are with AI rather than a human.
The threshold waiting period, in some examples, may correspond to a buffer time to avoid giving full control to the AI system when responding to users. This threshold waiting period, therefore, prevents the AI system from acting without agent oversight.
According to some examples, an artificial intelligence (“AI”) model, such as a machine learning (“ML”) model, may be trained to provide a notification to the agent based on the content received from the user. The AI model may be, for example, a notification system. For example, if the content includes a request to speak to an agent, to access account information, or the like, the AI model may transmit a notification to the agent. In some examples, if the AI model cannot generate a response to the content, cannot access information responsive to the content, or the like, the AI model may transmit a notification to the agent. The notification may be, in some examples, visual or audible. For example, each interactive session that is occurring simultaneously may include a header. The notification may cause the header of the respective interactive session to flash, blink, change colors, or the like. Additionally or alternatively, the notification may be audible, haptic, etc. and draw the attention of the agent to the respective interaction session.
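A rule-based stand-in for the notification decision might look like the following sketch (the trigger phrases and the function name are illustrative assumptions; the disclosure contemplates a trained ML model rather than fixed rules):

```python
from typing import Optional

def should_notify_agent(user_content: str,
                        model_response: Optional[str]) -> bool:
    """Decide whether to alert the agent.

    Triggers on explicit escalation requests, account-access
    requests, or when the response model produced no usable answer.
    """
    text = user_content.lower()
    escalation_phrases = ("speak to an agent", "talk to a human",
                          "account information")
    if any(phrase in text for phrase in escalation_phrases):
        return True
    if model_response is None:  # model could not generate a response
        return True
    return False
```

In deployment, the returned Boolean could drive the visual notification (e.g., causing the session header to flash) or an audible or haptic alert.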
In some examples, when a notification is transmitted to the agent, the agent may intervene, such as by providing a manual input in response to the content received from the user. Such intervention may be used to update the second AI model, e.g., the notification system. For example, the manual input in response to a notification may be used as training data to update the notification system. The training data may include an indication of why the notification was sent, why the agent intervened in the interactive session, when the agent intervened, what the context of the intervention was, or the like. The content of the intervention may be, for example, the timing in the interactive session, text, request, etc. The training data may be used to update the notification system to determine one or more additional notifications. According to some examples, updating the system based on details regarding an agent's intervention in the interactive session may increase the computational efficiency of the system. For example, by using the details associated with why a notification was sent, why an agent intervened in an interactive session, when an agent intervened, what the context of the intervention was, etc., the AI model may be updated to more accurately predict when a notification is necessary or more appropriate. By sending a notification at an earlier time, in response to certain content, or the like, the use of computational resources of the system decreases by reducing the number of messages being transmitted in the interactive session. For example, by sending the notification to the agent to intervene at a more appropriate or accurate time during the interactive session, the user may no longer have to send a plurality of messages and/or receive a plurality of unresponsive messages via the interactive session. 
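The intervention details described above could be captured as structured training examples, as in this hypothetical sketch (the field names and label format are assumptions, not the actual schema):

```python
from dataclasses import dataclass, asdict

@dataclass
class InterventionRecord:
    """One record describing an agent intervention."""
    session_id: str
    notified: bool           # whether a notification preceded it
    elapsed_seconds: float   # when in the session the agent stepped in
    user_content: str        # content that prompted the intervention
    manual_input: str        # what the agent sent instead

def to_training_example(rec: InterventionRecord) -> dict:
    """Convert an intervention into a labeled example: the context is
    the input, and 'notify' is the desired notification-model output."""
    features = asdict(rec)
    label = {"notify": True}  # an intervention implies a notification was warranted
    return {"features": features, "label": label}
```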
This can dramatically reduce the number of messages when large numbers of messages must be processed at the same time by the system, such as when an extensive number of chats runs in parallel. Because a large number of chats can be run in parallel by a single agent, reducing the number of messages exchanged by all parties involved in an interactive session can improve the functioning of the system by reducing processing power, network overhead, and other system resources.
The notifications may allow for an agent to simultaneously oversee a plurality of interactive sessions as the notification will alert the agent to the interactive session that requires an agent input while the generative AI continues to predict and, in some examples, transmit responses to the content received from the user.
Using generative AI, e.g., the predicted response system, to provide predicted responses to content received from the user and an AI model to provide notifications to an agent to intervene in the interactive session may allow for a plurality of efficient and effective interactive sessions to occur concurrently while being overseen by a single agent. In particular, the agent may continue to control and supervise the AI predicted responses while saving time and increasing productivity by intervening only when the AI model, e.g., the notification system, provides a notification to do so. The AI models may, therefore, allow a single agent to supervise a plurality of interactive sessions based, in part, on the predicted responses provided by the generative AI, e.g., the predicted response system, and the notifications to intervene provided by the AI model, e.g., the notification system.
According to some examples, by increasing the number of concurrent interactive sessions being overseen by a single agent, the computational efficiency of the system may increase by decreasing the amount of processing power and network overhead to conduct the interactive sessions. The amount of processing power and network overhead may be decreased by reducing the number of computer systems having to run concurrently, e.g., by no longer having to have a single computing system per interactive session. Rather, the processing power and network overhead are reduced by having a single computing system capable of handling a plurality of interactive sessions running concurrently.
This disclosure describes techniques for enabling artificial intelligence to provide predicted responses during an interactive session and identify whether to transmit a notification to alert an agent to intervene in the interactive session. Artificial intelligence (AI) is a segment of computer science that focuses on the creation of intelligent agents that can learn and act autonomously (e.g., without human intervention). Artificial intelligence systems can utilize one or more of (i) machine learning, which focuses on developing algorithms that can learn from data, (ii) natural language processing, which focuses on understanding and generating human language, and/or (iii) computer vision, which is a field that focuses on understanding and interpreting images and videos. Artificial intelligence systems can include generative models that generate new content (e.g., images/video, text, audio, or other content) in response to input prompts.
Example machine-learned models include neural networks or other multi-layer non-linear models. Example neural networks include feed forward neural networks, deep neural networks, recurrent neural networks, and convolutional neural networks. Some example machine-learned models can leverage an attention mechanism such as self-attention. For example, some example machine-learned models can include multi-headed self-attention models (e.g., transformer models).
The model(s) can be trained using various training or learning techniques. The training can implement supervised learning, unsupervised learning, reinforcement learning, etc. The training can use techniques such as, for example, backwards propagation of errors. For example, a loss function can be backpropagated through the model(s) to update one or more parameters of the model(s) (e.g., based on a gradient of the loss function). Various loss functions can be used such as mean squared error, likelihood loss, cross entropy loss, hinge loss, and/or various other loss functions. Gradient descent techniques can be used to iteratively update the parameters over a number of training iterations. A number of generalization techniques (e.g., weight decays, dropouts, etc.) can be used to improve the generalization capability of the models being trained.
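As a concrete, minimal illustration of this loss/backpropagation/gradient-descent scheme (a one-parameter linear model with an analytic gradient, not the disclosed models):

```python
def train_linear(xs, ys, lr=0.01, max_iters=5000, tol=1e-9):
    """Fit y = w * x by gradient descent on mean squared error.

    Each iteration computes the loss, derives its gradient with
    respect to the single parameter w (the 1-D analogue of
    backpropagation), and updates w until a stopping criterion is
    met: convergence of the loss or a maximum iteration count.
    """
    w = 0.0
    prev_loss = float("inf")
    for _ in range(max_iters):
        preds = [w * x for x in xs]
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
        if abs(prev_loss - loss) < tol:  # convergence check
            break
        # dL/dw = (2/n) * sum((w*x - y) * x)
        grad = 2 * sum((p - y) * x
                       for p, y, x in zip(preds, ys, xs)) / len(xs)
        w -= lr * grad                   # gradient descent step
        prev_loss = loss
    return w
```

The same structure generalizes to multi-layer models, where the gradient is computed by backpropagation and regularizers such as weight decay or dropout are layered on top.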
The model(s) can be pre-trained before domain-specific alignment. For instance, a model can be pretrained over a general corpus of training data and fine-tuned on a more targeted corpus of training data. A model can be aligned using prompts that are designed to elicit domain-specific outputs. Prompts can be designed to include learned prompt values (e.g., soft prompts). The trained model(s) may be validated prior to their use using input data other than the training data, and may be further updated or refined during their use based on additional feedback/inputs.
According to some examples, when the interface 100 receives an input corresponding to a selection of an interaction window 108, 110 an interactive session may be provided as a pop-up 106 or overlay on the interface 100. The interactive session may be a communication interface configured to allow for communication among the user, AI, and the agent. According to some examples, the interactive sessions may be provided as cascaded windows in chat panel 104. The interactive session provided for output in the chat panel 104 may be the same or different from the interactive session provided for output as pop-up 106. By having different interactive sessions in each of the chat panel 104 and pop-up 106, the agent may oversee and/or interact with a plurality of interactive sessions concurrently without having to switch interfaces, windows, computers, etc. This may increase computational efficiency by reducing the amount of processing power and network overhead by allowing the agent to engage with a plurality of users concurrently, without having to have multiple interfaces, computer systems, or the like running concurrently.
The interaction window 202 may include an abstract interaction window 204. The abstract interaction window 204 may include conversation bridge service 206, interactive session service 208, and interactive session controller cache 210. The conversation bridge service 206 may be a bridge between the user interface and an API.
According to some examples, bridge protocol buffers may be generated by the system 200. According to some examples, the protocol buffers may encode structured data in an efficient yet extensible format. For example, each field of the content may include a data type, tag, name, etc. A protocol compiler may generate code that constructs and parses the content, produces human-readable dumps, or the like. According to some examples, bridge protocol buffers may generate services required for the interaction windows based on the use case of the interface, e.g., customer support. The bridge protocol buffers and their generated services may be reused by other bridge servers for other use cases, e.g., sales. Extensibility in the frontend components may be provided where appropriate to provide sufficient customization for all currently known and anticipated application specific use cases. The components may include one or more of a header component 212, transcript component 214, input component 216, and predicted response component 218.
According to some examples, an AI model may be used to identify, provide for output, and/or store the state of all active interactive sessions an agent is engaged in. For example, a controller may be configured to sync the AI model used to identify the state of the interactive session with the state of the database.
In some examples, a bridge protocol buffer may be created to identify information, or data, on the interactive session tangle entity, all participant entities, and the automated flow entities of the interactive session. Participant entities may include, for example, model representation for all participants in an interactive session. The participants may be, for example, the user, the agent, or the like. Automated flow entities may include, for example, model representation for agent supervised automation in the conversation. The model may contain the step configuration that determines how the automation in the interactive session will proceed. For example, the model may determine whether to send a predetermined response or an AI generated message, end the chat, etc.
In some examples, a bridge protocol buffer may be generated to identify event data for an interactive session. For example, a first layer may have an event entity that stores common event data. A separate entity may store information related to specific types of events. The bridge conversation event entity may consolidate the individual entities into a single entity. Common fields may be added to the combined entity and used to create and/or update interactive session events. The interactive session events may be used throughout the life cycle, or lifetime, of the interactive session. The interactive session events may include, for example, when content is being transmitted and/or received by the user or agent, when content is being transmitted as part of an automated flow, an intervention from the agent, etc.
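The layered event entities might be modeled as in the following sketch, with a first-layer common entity consolidated into a combined event (all names and fields are illustrative assumptions, not the actual protocol buffer schema):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CommonEventData:
    """First-layer entity holding fields shared by all event types."""
    event_id: str
    session_id: str
    timestamp: float
    participant: str   # e.g., "user", "agent", or "model"

@dataclass
class BridgeConversationEvent:
    """Consolidated entity: common fields plus type-specific payloads.

    Only one payload is typically set per event, mirroring the
    per-event-type entities described above.
    """
    common: CommonEventData
    message_text: Optional[str] = None         # transmitted/received content
    automated_flow_step: Optional[str] = None  # automated flow events
    intervention_note: Optional[str] = None    # agent interventions
```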
Interactive session service 208 may be configured to expose one or more additional features of the application programming interface (“API”) for interface 100. The API may include, for example, a bridge between the interface and the API configured to fetch a specific interactive session and return a model of the interactive session if it exists, a bridge between the interface and the API configured to fetch interactive session events, a bridge between the interface and the API configured to insert a completed interactive session event into a text box to be transmitted as content, a bridge between the interface and the API configured to insert a completed interactive session event into a text box to be transmitted as content to terminate the interactive session, and a bridge between the interface and the API configured to insert a completed interactive session event including joint event data. Completed interactive session events may be, for example, events in which common interactive session event metadata has been filled out, e.g., provided. Event metadata may include, for example, an identification of the participant that triggered the interactive session event, whether an automated flow was used to trigger the interactive session event, and IDs and text for components of the interactive session event that were provided by AI, among other IDs and timestamps needed by the system.
Interactive session controller cache 210 may be configured for tracking interactive sessions in memory. The system 200 may store and/or track interactive sessions after confirming the users have provided authorization for the interactive session to be stored and/or tracked. The contents of the interactive sessions may be determined and used after the user provides authorization for the system 200 to access and receive information related to the interactive session the user participated in. For example, the user may provide authorization to a website, application, or system when participating in an interactive session. The authorization may be for the application or system to access, store, or track the contents of the interactive session, or the like.
According to some examples, the interactive session controller cache 210 may be configured as a pass through for content being transmitted to interactive session service 208. In such an example, interactive session controller cache 210 may add logic and/or local event handling on an as needed basis.
In some examples, the interactive session controller cache 210 may be configured to generate, store, and/or provide an identifier of respective interactive sessions. Each component of system 200 may use the identifier throughout a lifecycle of an interactive session.
The interactive session controller cache 210 may be configured to subscribe/unsubscribe 220 interactive sessions to memory. For example, subscribing an interactive session to memory may include loading contents of an interactive session into memory and updating the contents with the storage layer. The contents may include, for example, any content that was transmitted and/or received by the system, events associated with the interactive session, typing status of the user, agent, or AI models, or the like. The interactive session controller cache 210 may be configured to keep the interactive session model in sync by reloading at least a portion of the interactive session for each invalidation type.
Unsubscribing the interactive session may include, for example, removing the contents and/or associated interactive session model from memory. According to some examples, the interactive session controller cache 210 may be configured to unsubscribe from an interactive session at the termination of the interactive session.
According to some examples, a request to confirm whether the contents of an interactive session are loaded may be transmitted. In response to the request, interactive session controller cache 210 may provide a Boolean. The Boolean may provide an indication, such as a loading indicator, as to whether the interactive session has been loaded into memory.
In some examples, a request to return a model for the interactive session may be transmitted. In response to the request, the interactive session controller cache 210 may provide the model for the specific interactive session if the interactive session is in memory. If the specific interactive session is not in memory, the interactive session controller cache 210 may provide an empty response or an indication that the specific interactive session is not in memory.
According to some examples, a request for a list of interactive session events may be transmitted. In response to the request, interactive session controller cache 210 may return a list of interactive session events for a specific interactive session if the interactive session is stored in memory. If the specific interactive session is not in memory, the interactive session controller cache 210 may provide an empty response or an indication that the specific interactive session is not in memory.
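The cache behaviors described above (subscribe/unsubscribe, the loaded-check Boolean, the model lookup, and the event-list query) can be sketched together as follows (the class and method names are assumptions for illustration):

```python
class InteractiveSessionCache:
    """In-memory tracker sketch for interactive sessions."""

    def __init__(self):
        self._sessions = {}  # session_id -> {"model": ..., "events": [...]}

    def subscribe(self, session_id, model):
        """Load a session's contents into memory."""
        self._sessions[session_id] = {"model": model, "events": []}

    def unsubscribe(self, session_id):
        """Remove the session from memory, e.g., at termination."""
        self._sessions.pop(session_id, None)

    def is_loaded(self, session_id) -> bool:
        """Boolean used to drive a loading indicator."""
        return session_id in self._sessions

    def get_model(self, session_id):
        """Return the session model, or None if not in memory."""
        entry = self._sessions.get(session_id)
        return entry["model"] if entry else None

    def list_events(self, session_id):
        """Return the event list, or an empty response if absent."""
        entry = self._sessions.get(session_id)
        return list(entry["events"]) if entry else []
```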
According to some examples, a request to send content may be transmitted. For example, the request to send content may be transmitted to interactive session service 208. In response to the request, interactive session controller cache 210 may add, or include, logic configured to handle the in-flight view of the content to be transmitted. The interactive session controller cache 210 may include a pending event field, which may provide an indication of the status of the content to be transmitted. For example, when the system 200 receives content 222, the system 200 may generate an interaction window 202 and corresponding interactive session that includes the content 222. A pending message from the system 200 to the user may be provided as part of a call to the interactive session events. An indication of “pending” may be provided in the interactive session and/or interaction window 202 until the pending message is confirmed as “sent” or “failed.” Upon confirmation of “sent” or “failed,” the indication of the status of the message may be updated to correspond to “sent” or “failed.” According to some examples, when the status of the message is “failed,” the indication “failed” may remain in place in the interactive session and a “retry” input may be provided. An input corresponding to the selection of “retry” may cause the same message to be resent with updated timestamps and the previous “failed” message may be removed from the interactive session.
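The pending/sent/failed lifecycle with retry might be sketched as follows (the class and function names are illustrative assumptions):

```python
import time

class OutboundMessage:
    """Sketch of the pending -> sent/failed message lifecycle."""

    def __init__(self, text):
        self.text = text
        self.timestamp = time.time()
        self.status = "pending"  # shown in the interaction window

    def confirm(self, delivered: bool):
        """Update the indicator once delivery is confirmed."""
        self.status = "sent" if delivered else "failed"

def retry(failed_msg: "OutboundMessage") -> "OutboundMessage":
    """Resend the same text with a fresh timestamp; the caller removes
    the previous failed message from the interactive session."""
    assert failed_msg.status == "failed"
    return OutboundMessage(failed_msg.text)
```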
The interactive window 202 may include a plurality of interface components. The interface components may include, for example, header component 212, transcript component 214, input component 216, and predicted response component 218.
Header component 212 may be configured to provide interactive session data, such as the username, interactive session identifier, or the like. The information or data provided by the header component 212 may be specific to the use case of interface 100. For example, if the interface is used in conjunction with sales or customer support, header component 212 may include order information.
Transcript component 214 may be configured to access, read, and/or be bound to the interactive session cache, e.g., interactive session controller cache 210. According to some examples, transcript component 214 may be configured to register renderers for each event type. For example, for a content event the transcript component 214 may be configured to render content. When an interactive window is generated, transcript component 214 may be configured to render an identification of the participants within the interaction session, e.g., the user, the agent, or the like. When a participant leaves the interactive session, transcript component 214 may be configured to render an identification of the participants who have left the interactive session.
Input component 216 may be configured to receive content to be transmitted to a user as part of the interactive session. According to some examples, input component 216 may be configured to call interactive session cache, e.g., interactive session controller cache 210, when an agent or the AI model transmits content to a user. The input component 216 may be configured to toggle between shareable content, such as emojis, attachments, predicted responses, spelling and grammar check, manual input from the agent, or the like. According to some examples, system 200 may disable input component 216 when there is a pending message from the agent and/or AI model to the user.
Predicted response component 218 may be configured to provide a generated, or predicted, response to content 222 received from the user. The predicted response component 218 may be bound to the conversation cache, e.g., interactive session controller cache 210, such that predicted responses provided by predicted response component 218 may be appended to draft messages.
According to some examples, the predicted response system 302 can receive the inference data 304 and/or training data 306 as part of a call to an application programming interface (API) exposing the predicted response system 302 to one or more computing devices. Inference data 304 and/or training data 306 can also be provided to the predicted response system 302 through a storage medium, such as remote storage connected to the one or more computing devices over a network. Inference data 304 and/or training data 306 can further be provided as input through a user interface on a client computing device coupled to the one or more computing devices.
The inference data 304 can include data associated with predicting responses to content as part of a plurality of concurrent interactive sessions. The inference data 304 may include content, such as event data, context data, or the like, associated with interactive sessions. In some examples, the inference data 304 may include source text of the interactive sessions as well as metadata for the source text, such as timestamp, event type, interventions, or the like.
The training data 306 can correspond to an artificial intelligence (AI) task, such as a ML task, for predicting responses to content received from a user, such as a task performed by a neural network. The training data can be split into a training set, a validation set, and/or a testing set. An example training/validation/testing split can be an 80/10/10 split, although any other split may be possible. The training data 306 can include example responses for certain content received from users. For example, if the content received from the user is a request for a status update on their order, the example responses may be “Can you please provide your order number?” or “I am happy to help you find that.” The training data 306 may be based on previous interactive sessions among users, agents, the predicted response system 302, and/or other AI models. For example, the content of completed, or terminated, interactive sessions may be provided as training data 306 for the predicted response system 302. The predicted response system may identify example responses, based on previously provided predicted responses and/or manual input from the agent, provided based on the content received from the user.
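The training/validation/testing split described above may be sketched as follows. The function name, shuffling seed, and use of Python are illustrative assumptions for the sketch and are not part of the disclosure:

```python
import random

def split_dataset(examples, train_frac=0.8, val_frac=0.1, seed=0):
    """Split examples into training, validation, and testing sets.

    An 80/10/10 split is used by default, as described above; whatever
    remains after the training and validation fractions becomes the
    testing set, so other splits are possible by changing the fractions.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# Hypothetical (content, example response) pairs drawn from completed
# interactive sessions, as described above.
pairs = [(f"content {i}", f"response {i}") for i in range(100)]
train, val, test = split_dataset(pairs)
print(len(train), len(val), len(test))  # 80 10 10
```

Because the split simply partitions the shuffled list, every example lands in exactly one of the three sets.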
The training data 306 can be in any form suitable for training a model, according to one of a variety of different learning techniques. Learning techniques for training a model can include supervised learning, unsupervised learning, and semi-supervised learning techniques. For example, the training data can include multiple training examples that can be received as input by a model. The training examples can be labeled with a desired output for the model when processing the labeled training examples. The label and the model output can be evaluated through a loss function to determine an error, which can be backpropagated through the model to update weights for the model. For example, if the machine learning task is a classification task, the training examples can be images labeled with one or more classes categorizing subjects depicted in the images. As another example, a supervised learning technique can be applied to calculate an error between the model outputs and a ground-truth label of a training example processed by the model. Any of a variety of loss or error functions appropriate for the type of the task the model is being trained for can be utilized, such as cross-entropy loss for classification tasks, or mean square error for regression tasks. The gradient of the error with respect to the different weights of the candidate model on candidate hardware can be calculated, for example using a backpropagation algorithm, and the weights for the model can be updated. The model can be trained until stopping criteria are met, such as a number of iterations for training, a maximum period of time, convergence, or when a minimum accuracy threshold is met.
From the inference data 304 and/or training data 306, the predicted response system 302 can be configured to output one or more results related to providing a generative predicted response to content received from users during an interactive session. The predicted response may be generated as output data 314. As examples, the output data 314 can be any kind of score, classification, or regression output based on the input data. Correspondingly, the AI or machine learning task can be a scoring, classification, and/or regression task for predicting some output given some input. For example, the predicted response system 302 may predict a response given the input, e.g., content from a user. These AI or machine learning tasks can correspond to a variety of different applications in processing images, video, text, speech, or other types of data to provide an efficient and effective conversational experience among a user, an agent, and the predicted response system 302.
As an example, the predicted response system 302 can be configured to send the output data 314 for display on a client or user display. For example, the output data 314 may be provided for display on interface 100. As another example, the predicted response system 302 can be configured to provide the output data 314 as a set of computer-readable instructions, such as one or more computer programs. The computer programs can be written in any type of programming language, and according to any programming paradigm, e.g., declarative, procedural, assembly, object-oriented, data-oriented, functional, or imperative. The computer programs can be written to perform one or more different functions and to operate within a computing environment, e.g., on a physical device, virtual machine, or across multiple devices. The computer programs can also implement functionality described herein, for example, as performed by a system, engine, module, or model. The predicted response system 302 can further be configured to forward the output data 314 to one or more other devices configured for translating the output data into an executable program written in a computer programming language. The predicted response system 302 can also be configured to send the output data 314 to a storage device for storage and later retrieval.
The predicted response system 302 may provide predicted responses to be transmitted to a user in response to content received from the user. The predicted responses provided as output data 314 may be automatically transmitted after a threshold period of time if a manual input from the agent is not received. For example, after the predicted response system 302 provides the predicted response to the content as output data 314, an agent may review or otherwise provide an input overriding the predicted response. A threshold period of time may be set for the agent to intervene with the predicted response. After the threshold period of time elapses, the predicted response may be automatically transmitted to the user in response to the content.
The threshold period of time may, therefore, be a buffer period of time that prevents the AI system from having full control over the interactive sessions. In some examples, the threshold period of time allows agents to intervene to prevent the interactive sessions from being purely automated. The threshold period of time may, in some examples, be determined on a per-agent basis based on one or more variables. The variables may include, for example, the number of concurrent interactive sessions the agent is engaged with, the time that has elapsed between communications within the interactive sessions, the number of communications transmitted and/or received within the interactive session, the type of issue the agent is handling within the interactive session, etc. In some examples, the threshold period of time may be determined for a plurality of agents, all agents, etc. For example, the threshold period of time may be determined based on the variables for the plurality of agents. According to some examples, the threshold period of time may be determined using an AI model trained to optimize the threshold period of time.
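A per-agent threshold computed from the listed variables may be sketched as follows. The disclosure names the variables but not a formula, so the base value and weights below are purely illustrative assumptions:

```python
def threshold_seconds(num_sessions, avg_gap_seconds, num_messages,
                      issue_weight=1.0, base=10.0):
    """Return an illustrative per-agent threshold period, in seconds.

    Scales a base threshold by the variables described above: the number
    of concurrent sessions, the average time elapsed between
    communications, the message count, and an issue-type weight.
    All coefficients are hypothetical, not part of the disclosure.
    """
    threshold = base
    threshold += 2.0 * num_sessions     # busier agents get more time
    threshold += 0.1 * avg_gap_seconds  # slow conversations tolerate delay
    threshold += 0.5 * num_messages     # longer sessions need more review
    return threshold * issue_weight     # e.g., complex issue types scaled up

# An agent on 4 concurrent sessions, 30 s average gap, 12 messages:
t = threshold_seconds(4, 30.0, 12)  # 27.0 seconds
```

In practice, as the passage notes, such a heuristic could be replaced by an AI model trained to optimize the threshold directly.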
By using the predicted response system 302 and automatically transmitting the output data 314 if a manual input from the agent is not received within the threshold period of time, the computational efficiency of the system may be increased by requiring fewer manual inputs from the agent. For example, the predicted response system 302 may generate responses to be transmitted without requiring input from the agent. This increases computational efficiency by reducing the number of inputs required to engage with a user, which decreases the amount of processing and network overhead associated with the interactive session. Further, a predicted response that is generated based on inference data 304 and training data 306 that is continuously updated based on terminated interactive sessions means that fewer inputs are required to have an efficient and effective interactive session. Further, the network and processor overhead associated with subsequent manual inputs from the agent is reduced.
In some examples, by using the predicted response system 302 and automatically transmitting the output data 314 if a manual input from the agent is not received within the threshold period of time, the computational efficiency of the system may be increased by requiring fewer computer systems to execute the predicted response system 302. In particular, the predicted response system 302 may provide for a single agent to oversee multiple interactive sessions in a single interface as compared to having each individual interactive session on a respective computer system.
After generating the interactive session 400, predicted response system 302 may generate a predicted response 404. The predicted response 404 may be based on the content received from the user. For example, in response to a request to establish the interactive session 400, the predicted response system 302 may provide, as output, a predicted response 404 welcoming the user to the interactive session 400.
According to some examples, the predicted response 404 may include a timer element 406. The timer element 406 may provide an indication of the amount of time remaining in the threshold period of time before the predicted response 404 is automatically transmitted to the user. For example, the timer element 406 may provide a real-time countdown of the time remaining before the predicted response 404 is automatically transmitted to the user. If a manual input from the agent is not received before the timer element 406 elapses, or reaches zero, the predicted response may be automatically transmitted to the user as part of the interactive session 400.
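The countdown behavior of the timer element 406 may be sketched as follows; time is simulated rather than read from a real clock so the logic is self-contained, and all names are illustrative assumptions:

```python
def run_timer(threshold, poll, manual_input_at=None):
    """Simulate the timer element counting down the threshold period.

    The predicted response is automatically transmitted when the timer
    elapses, or reaches zero, unless a manual input from the agent
    arrives first. `manual_input_at` is the simulated time, in seconds,
    at which the agent intervenes (None if the agent never does).
    """
    elapsed = 0.0
    while elapsed < threshold:
        if manual_input_at is not None and manual_input_at <= elapsed:
            return "manual"   # agent intervened; predicted response withheld
        elapsed += poll       # countdown advances in poll increments
    return "auto"             # threshold elapsed: auto-transmit the response

print(run_timer(10.0, 1.0))                        # auto
print(run_timer(10.0, 1.0, manual_input_at=3.0))   # manual
```

A deployed interface would drive the same decision from a real-time countdown displayed to the agent rather than a simulated loop.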
As illustrated in
According to some examples, as illustrated in
The AI model may, in some examples, be a separate, or different, AI model than the predicted response system 302. The AI model may identify whether to transmit a notification to the agent based on the content received from the user. According to some examples, the AI model may be trained to predict the likelihood the agent could, would, and/or should intervene based on historical examples of similar cases. When the likelihood is above a threshold, a notification may be transmitted to alert the agent. For example, the system may receive content requesting to speak to the agent directly. In such an example, the ML model may generate the notification for the interactive session. As shown in
The notification may correspond to a request for agent intervention. Agent intervention may be, for example, one or more manual inputs from the agent in response to the content received from a user rather than the predicted response. By transmitting an audible and/or visible notification, the agent's attention may be drawn to a given interactive session, e.g., interactive session 502. This may allow the agent to supervise multiple interactive sessions concurrently.
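The likelihood-over-threshold decision described above may be sketched as follows. The cutoff value and the keyword stand-in for the trained model are illustrative assumptions; the disclosure's actual model would score content from historical examples:

```python
NOTIFY_THRESHOLD = 0.7  # illustrative cutoff, not specified in the text

def should_notify(intervention_likelihood):
    """Transmit a notification when the predicted likelihood of agent
    intervention exceeds the threshold, as described above."""
    return intervention_likelihood > NOTIFY_THRESHOLD

def classify(content):
    """Stand-in for the second ML model.

    A trained classifier would score the content; this hypothetical
    keyword rule merely approximates one for the sketch.
    """
    return 0.95 if "speak to" in content.lower() else 0.1

print(should_notify(classify("I want to speak to the agent directly")))  # True
print(should_notify(classify("Where is my order?")))                     # False
```

When `should_notify` returns true, the system would emit the audible and/or visible notification to draw the agent's attention to that session.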
According to some examples, the notification system 602 can receive the inference data 604 and/or training data 606 as part of a call to an application programming interface (API) exposing the notification system 602 to one or more computing devices. Inference data and/or training data can also be provided to the notification system 602 through a storage medium, such as remote storage connected to the one or more computing devices over a network. Inference data and/or training data can further be provided as input through a user interface on a client computing device coupled to the notification system 602.
The inference data 604 can include data associated with identifying whether to transmit a notification. According to some examples, the inference data 604 may include content from interactive sessions. The content may include, for example, source text of the interactive sessions as well as metadata for the source text, such as timestamp, event types, or the like.
The training data 606 can correspond to an artificial intelligence (AI) task, such as a ML task, for determining whether to transmit a notification, such as a task performed by a neural network. The training data can be split into a training set, a validation set, and/or a testing set. An example training/validation/testing split can be an 80/10/10 split, although any other split may be possible. The training data 606 can include examples for when a notification should be transmitted. For example, a notification should be transmitted if the content received from the user includes a request to speak to a human or an agent, a request for information that cannot be provided by the predicted response system 302, or the like.
The training data 606 can be in any form suitable for training a model, according to one of a variety of different learning techniques. Learning techniques for training a model can include supervised learning, unsupervised learning, and semi-supervised learning techniques. For example, the training data 606 can include multiple training examples that can be received as input by a model. The training examples can be labeled with a desired output for the model when processing the labeled training examples. The label and the model output can be evaluated through a loss function to determine an error, which can be backpropagated through the model to update weights for the model. For example, if the machine learning task is a classification task, the training examples can be images labeled with one or more classes categorizing subjects depicted in the images. As another example, a supervised learning technique can be applied to calculate an error between the model outputs and a ground-truth label of a training example processed by the model. Any of a variety of loss or error functions appropriate for the type of the task the model is being trained for can be utilized, such as cross-entropy loss for classification tasks, or mean square error for regression tasks. The gradient of the error with respect to the different weights of the candidate model on candidate hardware can be calculated, for example using a backpropagation algorithm, and the weights for the model can be updated. The model can be trained until stopping criteria are met, such as a number of iterations for training, a maximum period of time, convergence, or when a minimum accuracy threshold is met.
From the inference data 604 and/or training data 606, the notification system 602 can be configured to output one or more results related to whether a notification should be generated as output data 614. As examples, the output data 614 can be any kind of score, classification, or regression output based on the input data. The input data may be, for example, the content received from the user. Correspondingly, the AI or machine learning task can be a scoring, classification, and/or regression task for predicting some output given some input. These AI or machine learning tasks can correspond to a variety of different applications in processing images, video, text, speech, or other types of data to determine whether to generate and/or transmit a notification. The output data 614 can include instructions associated with generating and/or transmitting a notification.
As an example, the notification system 602 can be configured to send the output data 614 for display on a client or user display. For example, if notification system 602 determines that a notification should be provided, a visible notification may be provided as output data 614 for display on a display. In some examples, the notification may be an audible notification such that the notification may be provided as output data 614 via one or more outputs, such as speakers. As another example, the notification system 602 can be configured to provide the output data 614 as a set of computer-readable instructions, such as one or more computer programs. The computer programs can be written in any type of programming language, and according to any programming paradigm, e.g., declarative, procedural, assembly, object-oriented, data-oriented, functional, or imperative. The computer programs can be written to perform one or more different functions and to operate within a computing environment, e.g., on a physical device, virtual machine, or across multiple devices. The computer programs can also implement functionality described herein, for example, as performed by a system, engine, module, or model. The notification system 602 can further be configured to forward the output data 614 to one or more other devices configured for translating the output data 614 into an executable program written in a computer programming language. The notification system 602 can also be configured to send the output data 614 to a storage device for storage and later retrieval.
As shown in
Based on the inputs received, the system may provide the predicted response to the user, as shown in
According to some examples, after the interactive session 816 has ended, an archive chat input 818 may be provided for output to the agent. In response to an input corresponding to the selection of “archive chat,” and after confirming the user has provided authorization, the contents of the interactive session may be stored and/or tracked. In some examples, archiving the chat may include providing the contents of the interactive session as training data and/or inference data for the predicted response system 302 and/or notification system 602.
According to some examples, an interactive session may be highlighted. For example, interactive session 904 may include a highlight 921. The highlight 921 may be, for example, a shading, color saturation, or any visual indication that indicates that the interactive session 904 is currently selected. The highlighted interactive session may indicate an interactive session that the agent is currently engaging with.
In some examples, one or more interactive sessions 904-907 may include an indicator 901. The indicator may provide an indication that content was recently received from a user. In some examples, the indicator 901 may correspond to a notification that the agent should intervene in the interactive session.
The interactive sessions may include a timer 910. The timer 910 may provide an indication of how much time has elapsed since content was received from the user. In some examples, the timer may provide an indication of how much time has elapsed since responsive content was transmitted to the user.
In some examples, the interactive sessions may include a status indicator 912. The status indicators 912 may provide an indication of what is happening in the interactive session, whether the user is waiting for responsive content, whether the agent and/or AI model(s) is waiting for content from the user, whether the interactive session is active or terminated, or the like. For example, as shown, the indicator 912 shows that the agent and/or AI model(s) is waiting to receive content from the user.
According to some examples, in response to receiving an input corresponding to a selection of an interactive session in the interaction window panel 1002, a pop-up 1004, or overlay, of the interactive session may be provided for display on the interface 1000. The pop-up 1004 may include one or more inputs, such as input 1006, configured to receive manual inputs from the agent. The manual inputs from the agent may include an input to compose and send content within the pop-up 1004, an input to scroll through the contents of the interactive session, or the like.
After the consulting interactive session 1106 is established, the consulting interactive session 1106 may be linked to the interactive session 1104 associated with the consult. For example, as shown in
The server computing device 1241 can include one or more processors 1242 and memory 1243. The memory 1243 can store information accessible by the processors 1242, including instructions 1245 that can be executed by the processors 1242. The memory 1243 also includes data 1244 that can be retrieved, manipulated, or stored by the processors 1242. The memory 1243 can be a type of non-transitory computer readable medium capable of storing information accessible by the processors 1242, such as volatile and non-volatile memory. The processors 1242 can include one or more central processing units (CPUs), graphic processing units (GPUs), field-programmable gate arrays (FPGAs), and/or application-specific integrated circuits (ASICs), such as tensor processing units (TPUs).
The instructions 1245 can include one or more instructions that, when executed by the processors 1242, cause the one or more processors 1242 to perform actions defined by the instructions 1245. The instructions 1245 can be stored in object code format for direct processing by the processors, or in other formats including interpretable scripts or collections of independent source code modules that are interpreted on demand or compiled in advance. The instructions 1245 can include instructions for implementing a predicted response system 302 and/or notification system 602, which can correspond to the predicted response system 302 of
The data 1244 can be retrieved, stored, or modified by the processors 1242 in accordance with the instructions 1245. The data 1244 can be stored in computer registers, in a relational or non-relational database as a table having a plurality of different fields and records, or as JSON, YAML, proto, or XML documents. The data 1244 can also be formatted in a computer-readable format such as, but not limited to, binary values, ASCII, or Unicode. Moreover, the data 1244 can include information sufficient to identify relevant information, such as numbers, descriptive text, proprietary codes, pointers, references to data stored in other memories, including other network locations, or information that is used by a function to calculate relevant data.
The client computing device 1201 can also be configured similarly to the server computing device 1241, with one or more processors 1202, memory 1203, instructions 1205, and data 1204. The client computing device 1201 can also include a user input 1206, a user output 1207, and a communications interface 1208. The user input 1206 can include any appropriate mechanism or technique for receiving input from a user, such as keyboard, mouse, mechanical actuators, soft actuators, touchscreens, microphones, and sensors. The inputs 1206 may receive images, natural language inputs, or the like for input into the predicted response system 302 and/or notification system 602.
The server computing device 1241 can be configured to transmit data to the client computing device 1201, and the client computing device 1201 can be configured to display at least a portion of the received data on a display implemented as part of the user output 1207. The user output 1207 can also be used for displaying an interface between the client computing device 1201 and the server computing device 1241. For example, the output 1207 may be a display, such as a monitor having a screen, a touchscreen, a projector, or a television, configured to electronically display information to a user via a graphical user interface (“GUI”) or other types of user interfaces. For example, output 1207 may electronically display the output of the predicted response system 302 and/or notification system 602, such as predicted responses and/or notifications, respectively. The user output 1207 can alternatively or additionally include one or more speakers, transducers or other audio outputs, a haptic interface or other tactile feedback that provides non-visual and non-audible information to the platform user of the client computing device.
Device 1201 may be at a node of network 1250 and capable of directly and indirectly communicating with other nodes of network 1250. Although a single device 1201 is depicted in
In block 1310, content from a plurality of users is received. The content may be, for example, natural language inputs, such as text, images, documents, or the like.
In block 1320, a respective interaction window for each of the plurality of users is generated. Each respective interaction window may correspond to an interactive session. The respective interactive sessions may correspond to an electronic communication session among two or more of a respective user, generative AI, or an agent. The generative AI may, in some examples, be a first machine learning model. In some examples, the respective interactive sessions may be overseen by an agent while content responsive to the content received from the user is generated by the machine learning model. The respective interaction windows and, therefore, interactive sessions for each of the plurality of users may be provided for output on one or more displays coupled to an agent computing device. According to some examples, a visible portion of the respective interaction windows may include a timer and an identifier of the respective user. The timer may provide an indication of an elapsed time since a previous response was transmitted to a respective user or an elapsed time from when content was received from the respective user. The previous response may be the predicted response or the manual input from the agent.
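A per-user interaction window carrying the identifier and timer described above may be sketched as a small data structure. The class and field names are illustrative assumptions for the sketch:

```python
from dataclasses import dataclass, field
import time

@dataclass
class InteractionWindow:
    """One interaction window per user, as in block 1320.

    The visible portion carries the user identifier and a timer showing
    the elapsed time since the previous response was transmitted or
    content was received. Field names are hypothetical.
    """
    user_id: str
    last_event_time: float = field(default_factory=time.monotonic)

    def elapsed(self, now=None):
        """Seconds since the last transmitted response or received content."""
        now = time.monotonic() if now is None else now
        return now - self.last_event_time

# One window per user; passing `now` keeps the demonstration deterministic.
windows = {u: InteractionWindow(u, last_event_time=0.0)
           for u in ["user-a", "user-b"]}
print(windows["user-a"].elapsed(now=5.0))  # 5.0
```

A real interface would render each window's timer and identifier in its visible portion and reset `last_event_time` whenever content is transmitted or received.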
In block 1330, a predicted response for each interactive session may be identified by executing the first machine learning model based on the received content. The first machine learning model may be, for example, the predicted response system 302 of
In block 1340, prior to transmitting the predicted response, it is determined if a manual input from the agent is received. For example, the respective interaction window may include a timer element in relation to the predicted response. The timer element may provide an indication of a remaining amount of time of a threshold period of time before the predicted response is automatically transmitted. For example, after the predicted response is identified, the timer element may set a countdown clock corresponding to the threshold period of time. The predicted response may not be automatically transmitted until the expiration of the timer.
In block 1350, if, after the threshold period of time the manual input from the agent is not received, the predicted response for each interactive session may be automatically transmitted. Automatically transmitting the predicted response may occur with respect to multiple interactive sessions concurrently. In this regard, an agent may concurrently supervise the multiple interactive sessions while the predicted response system 302 generates the responsive content to be automatically transmitted to the user.
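The concurrent auto-transmission of blocks 1340 and 1350 may be sketched as a single pass over every session overseen by one agent; the session-state fields are illustrative assumptions:

```python
def auto_transmit_due(sessions, now, threshold):
    """Return the sessions whose predicted response should auto-transmit.

    For each interactive session concurrently overseen by the agent, the
    predicted response is transmitted only if no manual input from the
    agent was received (block 1340) and the threshold period of time has
    elapsed (block 1350). `sessions` maps a session id to hypothetical
    state fields: when the prediction was made and whether the agent
    intervened.
    """
    transmitted = []
    for sid, state in sessions.items():
        if state["manual_input"]:
            continue  # agent intervened; the manual input is used instead
        if now - state["predicted_at"] >= threshold:
            transmitted.append(sid)  # auto-transmit the predicted response
    return transmitted

sessions = {
    "s1": {"predicted_at": 0.0, "manual_input": False},
    "s2": {"predicted_at": 0.0, "manual_input": True},   # agent overrode
    "s3": {"predicted_at": 8.0, "manual_input": False},  # still counting down
}
print(auto_transmit_due(sessions, now=10.0, threshold=10.0))  # ['s1']
```

Because the check runs over all sessions in one pass, a single agent can supervise many sessions while the system transmits responses for each of them concurrently.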
According to some examples, a second machine learning model may be executed to identify whether to transmit a notification to an agent. The notification may be transmitted via the interface. The notification may be an audible or visual notification. For example, the visual notification may be a change in color, flashing colors, or the like. An audible notification may be a beep, ping, or the like. The notification may correspond to a request for agent intervention. Agent intervention may correspond to one or more manual inputs from the agent in response to the received content from a respective user.
In some examples, the respective interactive session may be terminated by executing the first machine learning model based on the received content. The contents of the respective interactive sessions may be provided as input into the first or second machine learning model. The first or second machine learning model may be updated based on the contents of the respective interactive session. According to some examples, the content of the respective interactive session may include an indication of when the manual input from the agent was transmitted instead of the predicted response.
The use of generative AI, such as the predicted response system 302, may allow for an agent to oversee a plurality of interactive sessions simultaneously as opposed to conducting a single interactive session at a time. The generative nature of the predicted response system 302 provides content responsive to the content received from the user, thereby providing an engaging, efficient, and productive interactive session with little to no manual input from the agent. For example, the predicted response system 302 may automate actions and workflows within the interactive sessions. This may reduce the number of inputs received from an agent, thereby increasing the computational efficiency of the system as a whole. For example, reducing the number of inputs received by the agent may decrease the processing power and network overhead required to engage in multiple interactive sessions concurrently.
Including a threshold period of time before transmitting the generated response prevents the system from being fully automated, such that all actions are performed by the AI models. The threshold period of time, or buffer, prevents the AI models from having full control of the interactive sessions overseen by the agent.
Further, the use of a notification system 602 may reduce the number of inputs received from the agent. For example, by providing an audible or visible notification alerting the agent to an interactive session that requires agent intervention, the agent no longer has to click between multiple interactive sessions, windows, browsers, programs, or the like. This may increase the computational efficiency of the system by decreasing the processing power and network overhead required to engage in, or intervene in, multiple interactive sessions concurrently.
According to some examples, by increasing the number of interactive sessions a single agent can oversee concurrently, the computational efficiency of the system may increase by decreasing the number of computer systems required to engage in the same number of interactive sessions as compared to a configuration in which an agent is only capable of overseeing a single interactive session. For example, as the number of concurrent interactive sessions the agent oversees increases, the computational efficiency of the system increases by decreasing the processing power, e.g., a reduced number of computer systems, inputs, requests, etc., and decreasing network overhead.
Unless otherwise stated, the foregoing alternative examples are not mutually exclusive, but may be implemented in various combinations to achieve unique advantages. As these and other variations and combinations of the features discussed above can be utilized without departing from the subject matter defined by the claims, the foregoing description of the examples should be taken by way of illustration rather than by way of limitation of the subject matter defined by the claims. In addition, the provision of the examples described herein, as well as clauses phrased as “such as,” “including” and the like, should not be interpreted as limiting the subject matter of the claims to the specific examples; rather, the examples are intended to illustrate only one of many possible implementations. Further, the same reference numbers in different drawings can identify the same or similar elements.
This application claims the benefit of the filing date of U.S. Provisional Patent Application No. 63/529,896 filed Jul. 31, 2023, the disclosure of which is hereby incorporated herein by reference.
| Number | Date | Country |
|---|---|---|
| 63529896 | Jul 2023 | US |