The present disclosure relates to a method and a system for training virtual agents through fallback analysis. More specifically, the disclosure relates to leveraging fallback logs to suggest new training phrases for existing intents and to recommend new intents based on clusters of fallback utterances.
A virtual agent refers to an interactive software program or system that simulates human-like conversations or interactions using artificial intelligence (AI) techniques. The virtual agent is designed to communicate with users in a natural language format, typically through text-based chat interfaces or voice-based interactions. The virtual agent is capable of understanding user queries or requests, interpreting the context, and providing appropriate responses or actions.
The virtual agent's functionality is based on advanced algorithms and AI models, which enables processing and analyzing user input, extracting relevant information, and generating meaningful and contextually appropriate responses. The virtual agent may utilize various techniques such as natural language processing (NLP), machine learning, pattern recognition, and knowledge representation to achieve accurate understanding and effective communication with users.
The virtual agent's capabilities may include but are not limited to:
One of the challenges in developing virtual agents is training them to accurately understand user inputs and respond appropriately. Traditional approaches involve manually identifying new intents (the purpose or goal behind a user's input) and creating training phrases for these intents. However, this manual process is time-consuming, error-prone, and may result in poor intent training, leading to inaccurate or unsatisfactory responses.
There is, therefore, a need in the present state of the art for an automated method and system that leverages fallback analysis to suggest new training phrases and intent recommendations using the logs of fallback utterances. By automating this process, developers may save considerable time and effort that would otherwise be spent manually identifying new intents and possible training phrases.
It is within this context that the present embodiments arise.
The following embodiments present a simplified summary in order to provide a basic understanding of some aspects of the disclosed invention. This summary is not an extensive overview, and it is not intended to identify key/critical elements or to delineate the scope thereof. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
Some example embodiments disclosed herein provide a method for training a virtual agent through fallback analysis, the method comprising obtaining a plurality of fallback utterances. The method may further include classifying the plurality of fallback utterances into one or more existing intent categories, via a Machine Learning (ML) model. The method may further include, upon unsuccessful classification of one or more utterances of the plurality of fallback utterances, clustering the one or more utterances into one or more groups based on similarities among the one or more utterances, via the ML model. The method may also include generating labels for the one or more groups to determine names of new intent categories associated with the one or more utterances.
According to some example embodiments, the method further comprises training the virtual agent based on the one or more groups of the one or more utterances, wherein the one or more utterances comprise training phrases.
According to some example embodiments, fallback logs comprise the plurality of fallback utterances.
According to some example embodiments, the plurality of fallback utterances corresponds to at least one of incorrect responses, ambiguous responses, or insufficient responses generated by the virtual agent for queries.
According to some example embodiments, the method further comprises receiving a query from a user; generating a response for the query, wherein the response is one of an incorrect response, an ambiguous response, or an insufficient response; storing the query in the fallback logs; and mining the fallback logs to obtain the plurality of fallback utterances.
According to some example embodiments, the ML model is a classification model.
According to some example embodiments, the classification model comprises at least one of a logistic regression model, a decision tree, a random forest model, a Support Vector Machine (SVM), or an automated machine learning (AutoML) model.
According to some example embodiments, the ML model is a clustering model.
According to some example embodiments, the clustering model uses at least one of a K-means clustering algorithm, a hierarchical clustering algorithm, a density-based spatial clustering algorithm, a Gaussian mixture model, or a hybrid ensemble model.
According to some example embodiments, generating the labels comprises assigning generic names to the one or more groups.
Some example embodiments disclosed herein provide a computer system for training a virtual agent through fallback analysis, the computer system comprising one or more computer processors, one or more computer readable memories, one or more computer readable storage devices, and program instructions stored on the one or more computer readable storage devices for execution by the one or more computer processors via the one or more computer readable memories, the program instructions comprising obtaining a plurality of fallback utterances. The one or more processors are further configured for classifying the plurality of fallback utterances into one or more existing intent categories, via a Machine Learning (ML) model. The one or more processors are further configured for, upon unsuccessful classification of one or more utterances of the plurality of fallback utterances, clustering the one or more utterances into one or more groups based on similarities among the one or more utterances, via the ML model. The one or more processors are further configured for generating labels for the one or more groups to determine names of new intent categories associated with the one or more utterances.
Some example embodiments disclosed herein provide a non-transitory computer readable medium having stored thereon computer-executable instructions which, when executed by one or more processors, cause the one or more processors to carry out operations for training a virtual agent through fallback analysis. The operations comprise obtaining a plurality of fallback utterances. The operations further comprise classifying the plurality of fallback utterances into one or more existing intent categories, via a Machine Learning (ML) model. The operations further comprise, upon unsuccessful classification of one or more utterances of the plurality of fallback utterances, clustering the one or more utterances into one or more groups based on similarities among the one or more utterances, via the ML model. The operations further comprise generating labels for the one or more groups to determine names of new intent categories associated with the one or more utterances.
The foregoing summary is illustrative only and is not intended to be in any way limiting. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features will become apparent by reference to the drawings and the following detailed description.
The above and still further example embodiments of the present disclosure will become apparent upon consideration of the following detailed description of embodiments thereof, especially when taken in conjunction with the accompanying drawings, and wherein:
The figures illustrate embodiments of the invention for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be apparent, however, to one skilled in the art that the present disclosure can be practiced without these specific details. In other instances, systems, apparatuses, and methods are shown in block diagram form only in order to avoid obscuring the present invention.
Reference in this specification to “one embodiment” or “an embodiment” or “example embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Further, the terms “a” and “an” herein do not denote a limitation of quantity, but rather denote the presence of at least one of the referenced items. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not for other embodiments.
Some embodiments of the present disclosure will now be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown. Indeed, various embodiments of the invention may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements. Like reference numerals refer to like elements throughout.
The terms “comprise”, “comprising”, “includes”, or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a setup, device, or method that comprises a list of components or steps does not include only those components or steps but may include other components or steps not expressly listed or inherent to such setup or device or method. In other words, one or more elements in a system or apparatus preceded by “comprises . . . a” does not, without more constraints, preclude the existence of other elements or additional elements in the system or method.
Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present invention. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., are non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, non-volatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
The embodiments are described herein for illustrative purposes and are subject to many variations. It is understood that various omissions and substitutions of equivalents are contemplated as circumstances may suggest or render expedient but are intended to cover the application or implementation without departing from the spirit or the scope of the present invention. Further, it is to be understood that the phraseology and terminology employed herein are for the purpose of the description and should not be regarded as limiting. Any heading utilized within this description is for convenience only and has no legal or limiting effect.
The term “module” used herein may refer to a hardware processor including a Central Processing Unit (CPU), an Application-Specific Integrated Circuit (ASIC), an Application-Specific Instruction-Set Processor (ASIP), a Graphics Processing Unit (GPU), a Physics Processing Unit (PPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), a Programmable Logic Device (PLD), a Controller, a Microcontroller unit, a Processor, a Microprocessor, an ARM, or the like, or any combination thereof.
The term “machine learning model” may be used to refer to a computational, statistical, or mathematical model that is trained using classical ML modelling techniques, with or without classical image processing. The “machine learning model” is trained over a set of data using an algorithm through which it may learn from the dataset.
The term “artificial intelligence” may be used to refer to a model built using simple or complex neural networks, deep learning techniques, and computer vision algorithms. An artificial intelligence model learns from the data and applies that learning to achieve specific pre-defined objectives.
The term “virtual agent” may be used to refer to a virtual assistant, that is, a computer program or AI system designed to simulate human-like conversations with users. Virtual agents are typically powered by artificial intelligence and natural language processing technologies. The virtual agent can understand user inputs, generate appropriate responses, and perform specific tasks or provide information. Virtual agents are often used in customer support, information retrieval, and other applications to provide automated and efficient conversational experiences.
Embodiments of the present disclosure may provide a method, a system, and a computer program product for training a virtual agent. The method, the system, and the computer program product for training a virtual agent are described with reference to
The user input may represent a query, request, or command from the user 102, indicating their intention or information they seek. The user input serves as the starting point for the virtual agent 104 to understand the user's needs and provide appropriate assistance or information. It ranges from specific questions or requests to more general inquiries or tasks. The objective of the virtual agent 104 is to accurately interpret and process the user input to deliver a relevant and helpful response.
The virtual agent 104 may be capable of creating a smooth and seamless user experience by effectively capturing and interpreting the user's intent, regardless of the input format or language used. By understanding the user's intent accurately, the virtual agent 104 may provide more relevant and tailored responses, improving the overall user satisfaction and achieving the goals of the interaction.
However, there are circumstances where the virtual agent 104 is unable to process the query of the user 102, or may provide incorrect, ambiguous, or insufficient responses to the user 102. In such cases, the virtual agent needs to be trained to ensure accurate and adequate responses to user queries.
Here are some common examples where such conditions may occur:
New or uncommon user queries: When users ask questions or provide inputs that fall outside the predefined training data of the virtual agent, it may struggle to understand and respond appropriately.
Lexical variations and language nuances: Different users may express the same intent using varied phrasing, slang, or regional language nuances, which the virtual agent may not be equipped to handle without further training.
Evolving user needs and context: As user needs and contexts evolve over time, the virtual agent may encounter queries that were not anticipated during its initial training, leading to inaccurate or inadequate responses.
Complex or ambiguous queries: Users may pose complex or ambiguous queries that require deeper understanding or disambiguation. Without proper training, the virtual agent may struggle to provide satisfactory responses in such scenarios.
Intent confusion and intent clashing: If the virtual agent has not been trained to differentiate between similar intents or to handle potential conflicts between intents, it may provide incorrect or conflicting responses to user queries.
In all such cases, by training through fallback analysis, the virtual agent can identify and learn from these new intents, enabling it to provide accurate and adequate responses to user queries. This is further explained in greater detail in conjunction with
While only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
The example of the machine 200 includes at least one processor 202 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), advanced processing unit (APU), or combinations thereof), one or more memories such as a main memory 204, a static memory 206, or other types of memory, which communicate with each other via link 208. Link 208 may be a bus or other type of connection channel. The machine 200 may include further optional aspects such as a graphics display unit 210 comprising any type of display. The machine 200 may also include other optional aspects such as an alphanumeric input device 212 (e.g., a keyboard, touch screen, and so forth), a user interface (UI) navigation device 214 (e.g., a mouse, trackball, touch device, and so forth), a storage unit 216 (e.g., disk drive or other storage device(s)), a signal generation device 218 (e.g., a speaker), sensor(s) 221 (e.g., global positioning sensor, accelerometer(s), microphone(s), camera(s), and so forth), output controller 228 (e.g., wired or wireless connection to connect and/or communicate with one or more other devices such as a universal serial bus (USB), near field communication (NFC), infrared (IR), serial/parallel bus, etc.), and a network interface device 220 (e.g., wired and/or wireless) to connect to and/or communicate over one or more networks 226.
Executable Instructions and Machine-Storage Medium
The various memories (i.e., 204, 206, and/or memory of the processor(s) 202) and/or storage unit 216 may store one or more sets of instructions and data structures (e.g., software) 224 embodying or utilized by any one or more of the methodologies or functions described herein. These instructions, when executed by the processor(s) 202, cause various operations to implement the disclosed embodiments.
As used herein, the terms “machine-storage medium,” “device-storage medium,” “computer-storage medium” mean the same thing and may be used interchangeably in this disclosure. The terms refer to a single or multiple storage devices and/or media (e.g., a centralized or distributed database, and/or associated caches and servers) that store executable instructions and/or data. The terms shall accordingly be taken to include storage devices such as solid-state memories, and optical and magnetic media, including memory internal or external to processors. Specific examples of machine-storage media, computer-storage media and/or device-storage media include non-volatile memory, including by way of example semiconductor memory devices, e.g., erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), FPGA, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The terms machine-storage media, computer-storage media, and device-storage media specifically and unequivocally exclude carrier waves, modulated data signals, and other such transitory media, at least some of which are covered under the term “signal medium” discussed below.
Signal Medium
The term “signal medium” shall be taken to include any form of modulated data signal, carrier wave, and so forth. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
Computer Readable Medium
The terms “machine-readable medium,” “computer-readable medium” and “device-readable medium” mean the same thing and may be used interchangeably in this disclosure. The terms are defined to include both machine-storage media and signal media. Thus, the terms include both storage devices/media and carrier waves/modulated data signals.
As used herein, the term “network” may refer to a long-range cellular network (such as a GSM (Global System for Mobile Communication) network, an LTE (Long-Term Evolution) network, or a CDMA (Code Division Multiple Access) network) or a short-range network (such as a Bluetooth network, a Wi-Fi network, an NFC (near-field communication) network, LoRaWAN, ZigBee, or a wired network such as a LAN).
As used herein, the term “computing device” may refer to a mobile phone, a personal digital assistant (PDA), a tablet, a laptop, a computer, a VR headset, smart glasses, a projector, or any such capable device.
As used herein, the term ‘electronic circuitry’ may refer to (a) hardware-only circuit implementations (for example, implementations in analog circuitry and/or digital circuitry); (b) combinations of circuits and computer program product(s) comprising software and/or firmware instructions stored on one or more computer readable memories that work together to cause an apparatus to perform one or more functions described herein; and (c) circuits, such as, for example, a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation even if the software or firmware is not physically present. This definition of ‘circuitry’ applies to all uses of this term herein, including in any claims. As a further example, as used herein, the term ‘circuitry’ also includes an implementation comprising one or more processors and/or portion(s) thereof and accompanying software and/or firmware. As another example, the term ‘circuitry’ as used herein also includes, for example, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, other network device, and/or other computing device.
Accordingly, blocks of the flow diagram support combinations of means for performing the specified functions and combinations of operations for performing the specified functions. It will also be understood that one or more blocks of the flow diagram, and combinations of blocks in the flow diagram, may be implemented by special purpose hardware-based computer systems which perform the specified functions, or combinations of special purpose hardware and computer instructions. The method 300 illustrated by the flowchart diagram of
The method 300 starts at step 302, and at step 304, a user calls an online service. The user may initiate a call to the online service through various means such as, but not limited to, accessing a website, using a mobile application, or interacting with a chatbot.
In an embodiment, a virtual agent is activated to serve the user at step 306. In particular, the user who calls for the online service may seek assistance or information from the virtual agent. The virtual agent is an automated system that may be designed to interact with users and provide relevant information or assistance. The information may be related to order enquiries, billing enquiries, booking flights, finding accommodation, getting recommendations for popular destinations, and the like.
Further, at step 308, the virtual agent answers the user's queries for existing intents. The virtual agent proceeds to answer the user's queries that fall within the scope of existing intents. These existing intents represent predefined categories or purposes behind user inputs on which the virtual agent has been trained. When a user's query matches one of these existing intents, the virtual agent utilizes its training to understand the intent and respond accordingly.
However, in situations where the user's query does not correspond to any existing intent, the virtual agent performs fallback analysis to identify new intents, at step 310. Fallback refers to user utterances that the virtual agent is unable to understand or process effectively. Fallback analysis refers to a process of analyzing user queries or inputs that cannot be matched to any predefined intents or categories within a system. When a user submits a query that does not correspond to any existing intent, the virtual agent performs fallback analysis to understand the user's intent and generate an appropriate response.
In a more elaborative way, the virtual agent utilizes natural language understanding techniques to analyze and process the user's query. These techniques may involve tokenization, part-of-speech tagging, entity recognition, and other NLU processes to extract meaningful information from the query.
The virtual agent compares the processed user query with the existing intents in the system. This may be achieved using techniques such as pattern matching, keyword matching, or machine learning algorithms to find the best matching intent.
If the virtual agent does not find a suitable matching intent, it determines that the user's query does not correspond to any existing intent. This triggers the fallback analysis process to identify new intents.
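By way of a non-limiting illustration, the following Python sketch shows one way such intent matching with a fallback trigger may be realized; the intent names, training phrases, and the 0.5 confidence threshold are hypothetical, and a TF-IDF/cosine-similarity matcher stands in for whichever matching technique a given implementation employs.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical existing intents and their training phrases.
intents = {
    "order_status": ["where is my order", "track my package", "order status"],
    "returns": ["how do I return an item", "return a defective product"],
}

phrases, labels = [], []
for name, examples in intents.items():
    phrases.extend(examples)
    labels.extend([name] * len(examples))

vectorizer = TfidfVectorizer().fit(phrases)
phrase_vectors = vectorizer.transform(phrases)

def match_intent(query, threshold=0.5):
    """Return the best-matching existing intent, or None to trigger fallback."""
    scores = cosine_similarity(vectorizer.transform([query]), phrase_vectors)[0]
    best = scores.argmax()
    return labels[best] if scores[best] >= threshold else None

print(match_intent("track my package please"))        # matches "order_status"
print(match_intent("can I schedule a product demo"))  # None -> fallback analysis
```

A query whose best score falls below the threshold is the kind of input that enters the fallback analysis described below.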
During the fallback analysis, the virtual agent may generate suggestions for new intents based on the characteristics of the user's query. These suggestions may be based on clustering similar queries, analyzing the intent patterns in the fallback utterances, or applying machine learning techniques to identify distinct user needs.
In some embodiments, user feedback may be incorporated into the fallback analysis process. Users may have the option to provide feedback on the virtual agent's response or indicate whether the provided intents match their query. This feedback may help to validate the need for new intents and refine the analysis process.
Upon identifying the new intent through fallback analysis, the virtual agent proceeds to answer the user's queries for the new intent at step 312, and the method 300 terminates at step 314.
The method 300 can be better explained with the help of an example. Consider a scenario where the user is calling an online shopping service to inquire about product availability and place an order. The user might ask, “Is the blue shirt in stock?” or “How do I return a defective item?”. The virtual agent may accurately understand and respond to these queries based on its existing training data.
In some cases, the user may ask a question or make a request that the virtual agent is not adequately trained to handle. For instance, the user might ask, “Can I schedule a product demo with a sales representative?”. If the virtual agent is unsure how to respond, it enters a fallback state. During the fallback state, the virtual agent generates a fallback response, such as “I'm sorry, I didn't understand your request”. The user, in response to the fallback, might provide feedback utterances to express their frustration, confusion, or clarification needs. These feedback utterances may include statements like, “Your response doesn't make sense” or “I need to speak to a salesperson”.
The fallback utterances generated by the user are collected and analyzed. The fallback analysis process begins by classifying these utterances to determine if they are variations of existing intents. If they are variations, the virtual agent may suggest new training phrases for the corresponding intent, helping to improve its understanding and responses.
For the fallback utterances that cannot be classified into existing intents, the system applies clustering algorithms to group similar utterances together. Each cluster represents a possible new intent. In the above example, the feedback utterances expressing the need to speak to a salesperson may form a cluster, indicating a possible new intent related to sales assistance.
The virtual agent then runs intent name generation algorithms on the clusters to suggest relevant names or labels for the new intents. In this case, it might generate a suggested intent name like “Sales Inquiry” or “Sales Support”.
Once the new intent and its corresponding name are identified, the virtual agent may be trained on additional training phrases for the new intent to better understand and respond to similar user queries in the future. Thus, by analyzing the fallback utterances, generating new intents, and training phrases, the virtual agent becomes more adept at handling a broader range of user queries and can provide more accurate and helpful responses.
User messages contain the messages or queries sent by the user to the virtual agent. This can include text inputs, voice commands, or any other form of user communication.
Agent responses include the replies or responses generated by the virtual agent in reaction to the user's messages. These responses may be in the form of text, voice, or other relevant formats depending on the medium of interaction.
In an example embodiment, diarization of speaker refers to the process of identifying and distinguishing between different speakers in an audio conversation or interaction. Diarization involves segmenting the audio signal into distinct speaker turns or segments, labeling them according to the speaker's identity, and determining when a new speaker begins talking or when a speaker switch occurs.
Virtual agents or chatbots rely on natural language processing (NLP) to understand and respond to user inputs. However, in scenarios where multiple participants are involved, such as in group discussions or meetings, speaker diarization becomes essential to accurately attribute spoken content to specific individuals.
By performing speaker diarization, the virtual agent 104 may analyze the audio input, recognize different speakers, and associate their spoken content with respective identities. This enables the virtual agent 104 to provide more personalized and context-aware responses, tailor the interaction based on individual preferences or histories, and facilitate smoother multi-party conversations.
Speaker diarization algorithms typically utilize techniques such as voice activity detection (VAD) to determine when a speaker is active, speech segmentation to identify speaker boundaries, and speaker recognition or clustering algorithms to assign speakers to their respective segments. These algorithms can be trained on large amounts of audio data to improve their accuracy in distinguishing between different speakers and handling various acoustic environments.
Apart from the VAD, speaker change detection algorithms can also be employed to identify transitions between different speakers. These algorithms analyze acoustic features, such as pitch, energy, or spectral characteristics, to detect changes in the speaking style or characteristics, indicating a potential speaker change.
Further, advanced acoustic modeling techniques, such as Gaussian Mixture Models (GMMs) or Hidden Markov Models (HMMs), may be used to capture speaker-specific information and model the acoustic properties of individual speakers. These models may be trained on large datasets to learn speaker-specific patterns and improve the accuracy of diarization.
Alternatively, deep neural networks (DNNs) and deep learning architectures, such as recurrent neural networks (RNNs) or convolutional neural networks (CNNs), can also be applied to speaker diarization tasks. These models can learn complex patterns and representations of speaker characteristics, leading to improved accuracy in identifying speaker boundaries and assigning speakers to segments.
In addition to audio information, incorporating other modalities like video or textual data can enhance the diarization process. Visual cues from video, such as lip movements or facial expressions, can provide additional information for speaker identification. Textual data from transcriptions or subtitles can also be used to align speech segments with corresponding text and helps in diarization.
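As a toy illustration of the energy-based end of this spectrum, the Python sketch below flags frames whose short-time energy exceeds a fraction of the peak frame energy; the frame length and threshold ratio are assumptions, and production diarization systems rely on trained VAD and speaker models rather than this simple heuristic.

```python
import numpy as np

def energy_vad(signal, sample_rate, frame_ms=30, threshold_ratio=0.1):
    """Mark a frame as speech when its short-time energy exceeds a fixed
    fraction of the peak frame energy (a crude stand-in for a trained VAD)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)
    return energy > threshold_ratio * energy.max()

# Synthetic example: one second of low-level noise with a louder tone burst.
rng = np.random.default_rng(0)
sr = 16000
audio = 0.01 * rng.standard_normal(sr)
audio[6000:10000] += 0.5 * np.sin(2 * np.pi * 220 * np.arange(4000) / sr)
print(energy_vad(audio, sr).astype(int))  # 1s cluster around the burst
```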
In an example embodiment, when the virtual agent encounters a user input that it cannot understand or respond to adequately within its existing intents, the logged user input and the corresponding fallback utterances are stored as part of fallback logs at block 404. The fallback logs act as a repository of fallback scenarios, capturing instances where the virtual agent encountered challenges in understanding user inputs or providing suitable responses. The fallback logs include the user's fallback utterances, timestamps, and other relevant information, which may be used for various purposes, such as analyzing user needs, identifying areas for system improvement, and generating new intents or training phrases.
The purpose of maintaining these fallback logs is to utilize the information for various purposes:
Analysis of User Needs: The fallback logs may be analyzed to gain an understanding of the types of queries or user needs that the virtual agent struggles to handle. By examining the patterns and commonalities in the fallback utterances, developers and system administrators may identify areas where the virtual agent requires improvement.
System Improvement: The fallback logs may be utilized to improve the virtual agent's performance and enhance its understanding of user inputs. By reviewing the logged fallback scenarios, developers may identify the root causes of fallbacks and make adjustments to the system, such as refining existing intents, adding new intents, or expanding the training data.
Generation of New Intents or Training Phrases: The logged fallback utterances may act as a valuable resource for generating new intents and training phrases. By clustering similar fallback utterances and applying intent name generation techniques, new intents may be identified and labeled. These new intents may then be used to train the virtual agent, improving its ability to understand and respond to similar user queries in the future.
Upon obtaining a plurality of fallback utterances from the fallback logs, a fallback analysis is performed to train the virtual agent for the new intents, at block 406. The fallback analysis may include steps of classification, clustering, and label generation.
In a more elaborative way, at block 408, the proposed system employs a machine learning classification model to classify fallback utterances as variations of existing intent training phrases. By comparing the patterns and semantic similarities between fallbacks and known training data, the system identifies which existing intents the fallback utterances belong to. This step ensures the utilization of existing training phrases and avoids duplication.
The classification may be performed by at least one of the following machine learning classification models: logistic regression, decision tree, random forest, Support Vector Machine (SVM), or an automated machine learning (AutoML) model. Here is how each of these models may be applied in the classification process:
By employing classification models such as logistic regression, decision tree, random forest, SVM, or AutoML, the system may perform the classification of fallback utterances into existing intent classes. It is to be noted that each model has its own capability to learn the patterns and relationships between training phrases and intent labels, enabling accurate prediction and identification of the most appropriate existing intent for a given fallback utterance.
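A minimal Python sketch of this classification step, assuming a TF-IDF feature extractor, a logistic regression classifier, and a hypothetical confidence threshold below which an utterance is treated as unclassified, might look as follows (the training phrases and intent names are illustrative only):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled training phrases for the existing intents.
train_phrases = [
    "is the blue shirt in stock", "do you have this in size medium",
    "how do I return a defective item", "I want to send my order back",
]
train_intents = ["availability", "availability", "returns", "returns"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(train_phrases, train_intents)

def classify_fallback(utterance, min_confidence=0.55):
    """Assign an existing intent, or return None when the model is unsure
    so the utterance can be routed to the clustering stage instead."""
    probs = clf.predict_proba([utterance])[0]
    best = probs.argmax()
    if probs[best] >= min_confidence:
        return clf.classes_[best], probs[best]
    return None

print(classify_fallback("is that shirt available in blue"))
print(classify_fallback("I need to speak to a salesperson"))
```

Utterances for which None is returned are the ones passed on to the clustering step of block 410.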
Further, at block 410, for fallback utterances that cannot be classified into existing intents, the system applies a machine learning clustering model (more specifically, machine learning clustering algorithms). These algorithms group similar fallback utterances into clusters, considering their semantic and contextual similarities. Each cluster represents a possible new intent, as the fallback utterances within the cluster share common characteristics.
The clustering may be performed by at least one of the following machine learning clustering models: a K-means clustering algorithm, a hierarchical clustering algorithm, a density-based spatial clustering algorithm, a Gaussian mixture model, or hybrid ensemble models. Here is how each of these models may be applied in the clustering process:
By employing these clustering models, the system provides a comprehensive and flexible approach to clustering fallback utterances. Each model offers unique strengths and capabilities, enabling the system to identify different types of clusters and effectively suggest new intents based on the grouped fallback utterances. It is to be noted that a selection of clustering models may be tailored to the specific requirements and characteristics of the fallback data to achieve accurate and meaningful clustering results.
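By way of example, a minimal K-means sketch in Python (with hypothetical fallback utterances and an assumed cluster count of two) may group the unclassified utterances as follows; in practice the number of clusters would be chosen from the data, for example via silhouette analysis:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Hypothetical fallback utterances that could not be classified.
fallbacks = [
    "I need to speak to a salesperson",
    "connect me with a sales rep",
    "can I schedule a product demo",
    "my discount code is not working",
    "the promo code gives an error",
]

X = TfidfVectorizer().fit_transform(fallbacks)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Each cluster is a candidate new intent.
for cluster_id in range(km.n_clusters):
    members = [u for u, c in zip(fallbacks, km.labels_) if c == cluster_id]
    print(cluster_id, members)
```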
Furthermore, at block 412, after clustering the remaining fallback utterances, the system employs a label generation model (more specifically, intent name generation algorithms) to determine the names of the new intents. The intent name generation algorithms analyze the content and context of the clustered fallbacks to suggest relevant and descriptive names for the new intents. The generated intent names accurately reflect the underlying themes and concepts of the clustered fallback utterances. These algorithms consider various factors such as keyword analysis, semantic similarity, contextual cues, and linguistic patterns to generate relevant and appropriate names for the new intent categories.
The specific intent name generation algorithms used in the present invention may vary depending on the implementation. Several techniques and algorithms may be utilized for this purpose, including but not limited to:
It is to be noted that the specific algorithms used for intent name generation may be customized and adapted based on the requirements and characteristics of the virtual agent. The choice of algorithms depends on factors such as the complexity of the utterances, the desired level of specificity in intent naming, and the available training data.
After clustering the fallback utterances into groups representing possible new intents, at block 414, the virtual agent is trained based on these groups and their corresponding utterances. This step incorporates the clustered fallback utterances as new training data for the virtual agent's natural language understanding model.
With this expanded training data, the virtual agent's natural language understanding model is retrained. This may include updating the model's parameters, such as weights and biases, based on the combined training data. The training process typically utilizes machine learning algorithms, such as deep learning or statistical approaches, to optimize the model's performance in recognizing user intents accurately.
Once the virtual agent is trained by incorporating the newly identified intents and training phrases into the virtual agent's training data, it may now generate a response to a user's query, at block 416. This expanded training dataset enhances the virtual agent's understanding, enabling it to generate responses based on the training and provide accurate and contextually appropriate replies to user inputs.
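As a hedged sketch of this retraining step, the clustered fallback utterances may simply be appended, under their newly generated intent labels, to the existing training data before the intent classifier is refit; all names below are illustrative:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Existing training data (illustrative).
existing_phrases = ["is the blue shirt in stock", "how do I return a defective item"]
existing_intents = ["availability", "returns"]

# Clustered fallback utterances with their generated intent label.
new_phrases = ["I need to speak to a salesperson", "connect me with a sales rep"]
new_intents = ["sales_inquiry", "sales_inquiry"]

# Retrain on the combined data so the new intent becomes recognizable.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(existing_phrases + new_phrases, existing_intents + new_intents)
print(clf.predict(["can I talk to a sales rep"]))  # ideally the new intent
```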
Accordingly, blocks of the flow diagram support combinations of means for performing the specified functions and combinations of operations for performing the specified functions. It will also be understood that one or more blocks of the flow diagram, and combinations of blocks in the flow diagram, may be implemented by special purpose hardware-based computer systems which perform the specified functions, or combinations of special purpose hardware and computer instructions.
The method 500 illustrated by the flow diagram of
Fallback utterances typically arise when the virtual agent generates incorrect, ambiguous, or insufficient responses to user queries. These fallback utterances represent user inputs that the system was unable to understand or respond to satisfactorily. They indicate possible areas where the system's understanding or response generation may be improved.
By way of an example, if a user asks, “What is the weather like today?” and the virtual agent responds with an unrelated answer like, “I'm sorry, I don't have that information”, the user might provide a fallback utterance such as, “That's not what I asked for” or “You didn't answer my question”. These fallback utterances highlight instances where the virtual agent failed to provide the desired response.
The method 500, at step 506, may include classifying the plurality of fallback utterances into one or more existing intent categories, via a Machine Learning (ML) model. The intent categories represent predefined labels or classes that capture different types of user intents that the virtual agent is designed to understand and respond to.
To perform the classification, a Machine Learning (ML) model is utilized. The ML model is trained on a dataset that includes examples of user inputs labeled with their corresponding intents. This training data helps the model to learn patterns and features that may distinguish one intent from another. The classification of fallback utterances using the ML model includes training the model on a labeled dataset and then using it to predict the class or category of new, unlabeled data. Here are some general processes of classification using the ML model:
It is to be noted that the specific ML model and algorithms used for classification may vary depending on the requirements, dataset characteristics, and available resources. The choice of model and feature extraction techniques should be tailored to the specific context and goals of the classification task.
The method 500, at step 508, may include clustering the one or more utterances into one or more groups, upon unsuccessful classification of one or more utterances of the plurality of fallback utterances, based on similarities among the one or more utterances, via a clustering model.
The general process for clustering using the ML model may include:
Further, the method 500, at step 510, may include generating labels for the one or more groups to determine names of new intent categories associated with the one or more utterances and the method 500 terminates at 512. The labels act as names or identifiers for the new intent categories associated with the utterances in each group. The labels should be descriptive and representative of the content or theme of the utterances within the cluster. A process of generating the labels is further explained in conjunction with
The objective of the label generation process is to capture the essence of the utterances in each cluster and assign descriptive names that reflect the underlying intent or purpose. By generating these labels, the method 500 enables the identification and naming of new intent categories based on the content and context of the fallback utterances.
In some example embodiments, a computer programmable product may be provided. The computer programmable product may comprise at least one non-transitory computer-readable storage medium having stored thereon computer-executable program code instructions that when executed by a computer, cause the computer to execute the method 500.
In an example embodiment, an apparatus for performing the method 500 of
Accordingly, blocks of the flow diagram support combinations of means for performing the specified functions and combinations of operations for performing the specified functions. It will also be understood that one or more blocks of the flow diagram, and combinations of blocks in the flow diagram, may be implemented by special purpose hardware-based computer systems which perform the specified functions, or combinations of special purpose hardware and computer instructions. The method 600 illustrated by the flowchart diagram of
The method 600, starting at 602, commences by receiving a query from a user in step 604. This query represents the input or request made by the user to the virtual agent. It may be in the form of natural language text, voice input, or any other medium through which the user interacts with the virtual agent.
In an embodiment, the method 600, at step 606 may further include generating a response for the query. The response may be in the form of a text message, spoken words, or any other appropriate output format. The virtual agent attempts to provide a relevant and accurate response that addresses the user's query. The response generated in this step may fall into one of the following categories:
Further, the method 600, at step 608, may include storing the query in the fallback logs. In a more elaborative way, if the response generated in step 606 falls into any of the categories mentioned above (i.e., incorrect, ambiguous, or insufficient), the query is considered to have resulted in a fallback state. In this step, the query is logged and stored in the fallback logs. The fallback logs act as a record of user queries that the virtual agent was not able to understand or respond to adequately.
Further, the method 600, at step 610, may include mining the logs to obtain the plurality of fallback utterances, and the method 600 terminates at 612. The logs are examined to identify the queries that resulted in fallback situations. These queries form the plurality of fallback utterances, representing the set of user inputs that the virtual agent struggled to understand or respond to accurately.
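As a minimal illustration, assuming line-delimited JSON logs with hypothetical field names, the mining step may be sketched in Python as a simple filtering pass:

```python
import json

# Hypothetical line-delimited JSON fallback log; the field names are assumptions.
raw_logs = """\
{"timestamp": "2024-01-01T10:00:00", "query": "schedule a product demo", "outcome": "fallback"}
{"timestamp": "2024-01-01T10:02:00", "query": "track my order", "outcome": "answered"}
{"timestamp": "2024-01-01T10:05:00", "query": "talk to a salesperson", "outcome": "fallback"}
"""

# Keep only the queries that resulted in a fallback state.
fallback_utterances = [
    entry["query"]
    for entry in (json.loads(line) for line in raw_logs.splitlines())
    if entry["outcome"] == "fallback"
]
print(fallback_utterances)
```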
The mining process may include various techniques, such as natural language processing (NLP), data analysis, or machine learning algorithms, to extract and separate the fallback utterances from the rest of the logs. These utterances are then used for further analysis, classification, clustering, and generating suggestions for new training phrases and intent recommendations, as described in earlier steps of the present disclosure. Here are some techniques commonly employed in the mining process:
In some example embodiments, a computer programmable product may be provided. The computer programmable product may comprise at least one non-transitory computer-readable storage medium having stored thereon computer-executable program code instructions that when executed by a computer, cause the computer to execute the method 600.
In an example embodiment, an apparatus for performing the method 600 of
Accordingly, blocks of the flow diagram support combinations of means for performing the specified functions and combinations of operations for performing the specified functions. It will also be understood that one or more blocks of the flow diagram, and combinations of blocks in the flow diagram, may be implemented by special purpose hardware-based computer systems which perform the specified functions, or combinations of special purpose hardware and computer instructions.
The method 700 illustrated by the flow diagram of
The method 700, at step 706, may include classifying the plurality of fallback utterances into one or more existing intent categories, via a classification model. The classification model is trained on labeled data, where each utterance is associated with a specific intent category. By applying the classification model to the fallback utterances, the system attempts to assign them to the existing intent categories that best match their content and purpose. Here is an explanation of how the classification model typically works to perform this task:
The method 700 may include, at step 708, clustering the one or more utterances into one or more groups upon unsuccessful classification of one or more utterances of the plurality of fallback utterances, based on similarities among the one or more utterances, via a clustering model. The clustering model identifies groups or clusters of utterances that exhibit similar characteristics, patterns, or themes. This process aims to group together fallback utterances that share common features or intents but could not be assigned to existing intent categories through classification.
There are various clustering algorithms available, such as K-means, hierarchical clustering, DBSCAN (Density-Based Spatial Clustering of Applications with Noise), or spectral clustering. The choice of algorithm depends on the nature of the data and the desired clustering outcome. Each algorithm has its own way of defining clusters based on different criteria, such as distance, density, or connectivity. The clustering model is applied to the unclassified fallback utterances. The clustering model analyzes the extracted features and groups similar utterances together into clusters. The similarity between utterances is typically determined based on distance metrics, such as Euclidean distance or cosine similarity, which measure the similarity in feature space.
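As an illustration of a density-based choice, the following Python sketch applies DBSCAN with a cosine distance metric to hypothetical fallback utterances; unlike K-means it does not require a preset cluster count and marks isolated utterances as noise (label -1), though the eps radius shown is an assumption that would need tuning to real data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import DBSCAN

# Hypothetical unclassified fallback utterances.
fallbacks = [
    "I need to speak to a sales rep",
    "connect me with a sales rep",
    "my promo code is not working",
    "the promo code shows an error",
    "how do I reset everything",
]

X = TfidfVectorizer().fit_transform(fallbacks)
# Dense regions become clusters; one-off utterances are labeled -1 (noise).
labels = DBSCAN(eps=0.8, min_samples=2, metric="cosine").fit_predict(X)
print(list(zip(fallbacks, labels)))
```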
The method 700 may further include, at step 710, generating labels for the one or more groups to determine names of new intent categories associated with the one or more utterances. These labels act as names or identifiers for the new intent categories associated with the clustered utterances. The labels are generated to reflect the underlying theme or purpose shared by the utterances within each cluster.
The step 710 may include a sub-step 712 for generating the labels. In an embodiment, at step 712 generating the labels comprises assigning generic names to the one or more groups, and the method 700 terminates at step 714. In the sub-step 712, a process is followed to assign generic names to the clusters. The generic names provide a high-level representation of the intent or concept encompassed by the cluster, without delving into specific details. These generic labels capture the common theme or purpose of the clustered utterances.
Intent name generation algorithms employ various techniques to generate meaningful labels for the one or more groups of clusters. Some common approaches include:
Keyword Extraction: Extracting important keywords or phrases from the utterances within each cluster can provide descriptive labels that capture the central theme. This may be done using techniques like TF-IDF (Term Frequency-Inverse Document Frequency), TextRank, or other keyword extraction algorithms; a minimal sketch of this approach follows this list.
Topic Modeling: Applying topic modeling algorithms, such as Latent Dirichlet Allocation (LDA) or Non-Negative Matrix Factorization (NMF), to the utterances within each cluster may help identify dominant topics or themes. The generated topics may serve as labels for the clusters.
Semantic Analysis: Leveraging semantic analysis techniques, such as word embeddings or contextualized word representations (e.g., BERT), may capture the semantic meaning of the utterances. By analyzing the semantic relationships and similarities among the clustered utterances, labels may be generated based on the overarching intent or concept.
Rule-based Labeling: Establishing a set of predefined rules or heuristics may be useful for assigning labels based on specific patterns or characteristics observed within the clusters. These rules may be based on linguistic patterns, frequently occurring terms, or other domain-specific criteria.
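A minimal sketch of the keyword-extraction approach, assuming hypothetical clusters and treating each cluster as a single document so that TF-IDF surfaces its most distinctive terms as a label suggestion:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical clusters produced by the clustering step.
clusters = {
    0: ["I need to speak to a salesperson", "connect me with a sales rep"],
    1: ["my promo code is not working", "the discount code gives an error"],
}

# One document per cluster; TF-IDF then highlights cluster-specific terms.
docs = [" ".join(utterances) for utterances in clusters.values()]
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(docs)
terms = np.array(vectorizer.get_feature_names_out())

for cluster_id, row in zip(clusters, tfidf.toarray()):
    top_terms = terms[row.argsort()[::-1][:2]]
    print(cluster_id, "suggested label:", "_".join(top_terms))
```

The raw top terms would typically be reviewed or post-processed (e.g., into names such as “Sales Inquiry”) before being adopted as intent names.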
By employing intent name generation algorithms, the system generates labels for the clusters obtained from the clustering step. These labels represent the names or identifiers associated with the new intent categories detected from the fallback utterances. The generated labels may then be used to further enhance the virtual agent's understanding and training phrases, improving its performance and accuracy in responding to user interactions.
In some example embodiments, a computer programmable product may be provided. The computer programmable product may comprise at least one non-transitory computer-readable storage medium having stored thereon computer-executable program code instructions that when executed by a computer, cause the computer to execute the method 700.
In an example embodiment, an apparatus for performing the method 700 of
Accordingly, blocks of the flow diagram support combinations of means for performing the specified functions and combinations of operations for performing the specified functions. It will also be understood that one or more blocks of the flow diagram, and combinations of blocks in the flow diagram, may be implemented by special purpose hardware-based computer systems which perform the specified functions, or combinations of special purpose hardware and computer instructions.
The method 800 illustrated by the flow diagram of
Here is an explanation of how training is typically done by considering these groups of utterances:
The method 800, at step 806, may further include generating a response for a user based on the training, and the method 800 terminates at step 808. Once the virtual agent is trained, it may generate responses for user queries based on the acquired knowledge. When a user submits a query, the virtual agent uses its trained model to understand the query's intent and context. It then generates an appropriate response that aligns with the intent and provides relevant information or assistance to the user. The training process allows the virtual agent to understand and classify user queries accurately, providing appropriate and relevant responses.
In some example embodiments, a computer programmable product may be provided. The computer programmable product may comprise at least one non-transitory computer-readable storage medium having stored thereon computer-executable program code instructions that when executed by a computer, cause the computer to execute the method 800.
In an example embodiment, an apparatus for performing the method 800 of
In an embodiment, the user selects the “Optimized Intent” window 902 from the GUI 900. This action opens a new window that presents an “Optimized Utterances” option 904, a “Similar Utterances” option 906, and an “Intents” option 908 associated with the selected utterances.
Within the “Optimized Intent” window 902, the user can find the “Optimized Utterances” option 904. This option displays a list of carefully selected utterances that have been classified as relevant and representative of the associated intent. Each utterance is accompanied by a score indicating its relevance or confidence level. Examples of optimized utterances mentioned in the description:
The “Similar Utterances” option 906 provides a list of utterances that are similar to the optimized utterances. These utterances share common themes or topics and can provide additional context or variations related to the associated intent. Examples of similar utterances mentioned in the description:
The “Intents” option 908 displays a list of intents associated with the optimized and similar utterances. These intents represent the identified themes or categories to which the utterances belong. Examples of intents mentioned in the description:
Thus, the GUI 900 allows users to view and analyze classified utterances, explore optimized and similar utterances, and identify the intents associated with them. This helps in understanding user queries, improving the virtual agent training, and facilitating effective communication and responses to user queries.
In an embodiment, the user opens the “New Intents” window 1002 from the GUI 1000. This action opens a new window specifically designed for managing the newly identified utterances.
Within the “New Intents” window 1002, the user can find the “Optimal Utterances” option 1004. This option displays a list of carefully selected utterances that have been identified as relevant and representative of the newly identified intents. Examples of optimal utterances mentioned in the description:
The “Similar Utterances” option 1006 provides a list of utterances that are similar to the optimal utterances. These utterances share similar themes or topics and may provide additional context or variations related to the newly identified intents. Examples of similar utterances mentioned in the description:
The “Intents” option 1008 displays a list of intents associated with the optimal and similar utterances. These intents represent the newly identified themes or categories to which the utterances belong. Examples of intents mentioned in the description:
In particular, the GUI 1000 allows users to view and analyze newly identified utterances, explore optimal and similar utterances, and identify the intents associated with them. This helps in recognizing emerging patterns or themes in user queries, facilitating the creation of new intent categories, and improving the virtual agent training to handle these specific topics.
As will be appreciated by those skilled in the art, the techniques described in the various embodiments discussed above are not routine, or conventional, or well understood in the art. The present disclosure addresses the problem of leveraging fallback logs to suggest new training phrases for existing intents and to recommend new intents based on clusters of fallback utterances. The primary goal of the present invention is to automate the identification of new intents and training phrase suggestions, minimizing the manual effort involved.
The techniques discussed above provide several advantages: they automate an otherwise entirely manual process and reduce the considerable hours developers spend manually identifying new intents and possible training phrases for existing intents. Further, the techniques provide a significant reduction in manual effort for identifying new intents and training phrases, integration into existing processing pipelines, and enhanced overall system efficiency.
Many modifications and other embodiments of the inventions set forth herein will come to mind to one skilled in the art to which these inventions pertain having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the inventions are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Moreover, although the foregoing descriptions and the associated drawings describe example embodiments in the context of certain example combinations of elements and/or functions, it should be appreciated that different combinations of elements and/or functions may be provided by alternative embodiments without departing from the scope of the appended claims. In this regard, for example, different combinations of elements and/or functions than those explicitly described above are also contemplated as may be set forth in some of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.
It is to be understood that the above description is intended to be illustrative, and not restrictive. For example, the above-discussed embodiments may be used in combination with each other. Many other embodiments will be apparent to those of skill in the art upon reviewing the above description.
With respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth herein for sake of clarity.
The benefits and advantages which may be provided by the present invention have been described above with regard to specific embodiments. These benefits and advantages, and any elements or limitations that may cause them to occur or to become more pronounced are not to be construed as critical, required, or essential features of any or all of the embodiments.
While the present invention has been described with reference to particular embodiments, it should be understood that the embodiments are illustrative and that the scope of the invention is not limited to these embodiments. Many variations, modifications, additions, and improvements to the embodiments described above are possible. It is contemplated that these variations, modifications, additions, and improvements fall within the scope of the invention.