Generative Artificial Intelligence (Gen. AI) refers to AI methodologies that can be used to create new content. Gen. AI is powered by very large machine learning (ML) models, commonly referred to as foundation models (FMs), that are pre-trained on vast amounts of data via self-supervised or semi-supervised learning. FMs generate output from inputs provided as human language instructions. FMs are based on complex neural networks including generative adversarial networks (GANs), transformers, and variational autoencoders. Large language models (LLMs) are one class of FMs with billions or even trillions of parameters. It is the rapid advances in LLMs that have opened a new era in which Gen. AI models can generate textual, audio, graphics, or video content. Gen. AI starts with a prompt that could be in the form of a text, an image, a video, a design, musical notes, or any input that the AI system can process. Various AI algorithms then return new content in response to the prompt.
Features of the present disclosure are illustrated by way of examples shown in the following figures. In the following figures, like numerals indicate like elements, in which:
For simplicity and illustrative purposes, the present disclosure is described by referring to examples thereof. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be readily apparent, however, that the present disclosure may be practiced without limitation to these specific details. In other instances, some methods and structures have not been described in detail so as not to unnecessarily obscure the present disclosure. Throughout the present disclosure, the terms “a” and “an” are intended to denote at least one of a particular element. As used herein, the term “includes” means includes but not limited to, and the term “including” means including but not limited to. The term “based on” means based at least in part on.
A Gen. AI-based chatbot apparatus responds to a received user query by employing classical AI techniques as well as Gen. AI techniques. A chatbot input interface is used to receive the user query, which may include an intent and a slot. The user query can be received as a voice input or a textual input that conveys, via the intent, the user's requirement for at least one task to be executed. The task may pertain to retrieval and/or analysis of data from a plurality of data sources. The plurality of data sources may include relational databases and knowledge bases. The relational databases may include structured data generated, manipulated, and used in transactions. The knowledge bases can store unstructured domain-based knowledge such as process information, user profile information, and metadata related to the transactions including session information, the database schema of the relational databases, etc. The slot in the user query can act as a filter to extract data from one or more of the plurality of data sources. Therefore, the slot can convey an identity of the data source, which may include one or more of the relational databases and the knowledge bases, that can be used for carrying out the task conveyed in the intent. Furthermore, depending on the context, the slot can also convey features of the data or relevant entities from the data sources which can be used as filters to retrieve data responsive to the user query.
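By way of a non-limiting sketch, a parsed user query may be represented as a simple structure pairing the predicted intent with one or more slots; the names below are hypothetical and borrow the overstocked-items example discussed later in this disclosure:

    from dataclasses import dataclass, field

    @dataclass
    class ParsedQuery:
        """Illustrative holder for a user query after intent and slot extraction."""
        text: str                                  # raw user utterance
        intent: str                                # task to be executed
        slots: dict = field(default_factory=dict)  # filters over the data sources

    query = ParsedQuery(
        text="Can I get the breakdown for overstocked items for Supplier items?",
        intent="breakup_overstocked",
        slots={"material_group": "Supplier items"},
    )
    print(query.slots)  # {'material_group': 'Supplier items'}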
When the user query is received, it is provided to an intent-slot prediction model for extracting the intent and the slot. In an example, the intent-slot prediction model implements classical AI techniques for the prediction of the intent and identification of the slot. Additionally, the intent-slot prediction model also outputs a confidence score associated with the accuracy of the intent prediction. This confidence is compared to a threshold such as a configured confidence limit. If the confidence of the intent prediction is equal to or above the configured confidence limit, an answer to the user query can be generated using structured queries on the relational databases or using an LLM-based approach on the unstructured knowledge bases, as per the configuration. If the confidence of the intent prediction is below the configured confidence limit, one or more of a plurality of large language models (LLMs) may be employed to automatically generate the structured queries using the knowledge bases, and the automatically-generated structured queries are in turn used to retrieve results from the relational databases. The chatbot interface formats the retrieved results to provide an answer to the user query.
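In an example, this confidence-based routing may be sketched as follows; the function bodies, names, and the value of the confidence limit are illustrative assumptions and not limiting:

    from typing import Tuple

    CONFIDENCE_LIMIT = 0.85  # hypothetical value for the configured confidence limit

    def predict_intent_slot(query: str) -> Tuple[str, str, float]:
        """Stand-in for the classical intent-slot prediction model."""
        # A trained model would return learned predictions with a confidence score.
        return "breakup_overstocked", "material_group", 0.92

    def route_query(query: str) -> str:
        """Route the query to the classical path or the Gen. AI path."""
        intent, slot, confidence = predict_intent_slot(query)
        if confidence >= CONFIDENCE_LIMIT:
            return f"classical path: execute the query mapped to intent '{intent}'"
        return "Gen. AI path: prompt an LLM to generate the structured query"

    print(route_query("Provide the breakup for overstocked items"))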
When the confidence of the intent prediction is above the configured confidence limit, a mapping of the intents to the structured queries and/or search queries/terms that can carry out the tasks conveyed by the intents is accessed. The intent extracted from the user query is matched to one of the intents in the intent-query mappings. The corresponding structured query or search query is retrieved from the intent-query mappings and executed on one of the relational databases or the unstructured knowledge bases to retrieve results responsive to the user query. The chatbot interface may format the retrieved results to form an answer to the user query. The answer may be provided as a textual output, voice output, etc.
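By way of a non-limiting illustration, such an intent-query mapping may be a simple lookup keyed by intent; the table and intent names below are hypothetical:

    import sqlite3

    # Hypothetical intent-query mappings: intent -> structured query.
    INTENT_QUERY_MAPPINGS = {
        "breakup_overstocked": (
            "SELECT material_group, COUNT(*) AS items FROM materials "
            "WHERE stock_qty > reorder_level GROUP BY material_group"
        ),
    }

    def run_mapped_query(conn, intent):
        """Retrieve and execute the structured query mapped to a known intent."""
        return conn.execute(INTENT_QUERY_MAPPINGS[intent]).fetchall()

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE materials (material_group TEXT, stock_qty INT, reorder_level INT)")
    conn.execute("INSERT INTO materials VALUES ('Supplier items', 120, 50)")
    print(run_mapped_query(conn, "breakup_overstocked"))  # [('Supplier items', 1)]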
When the confidence of the intent prediction is below the configured confidence limit, the user query is provided via a prompt to a selected one of the plurality of LLMs. A prompt is natural language text describing the task that should be performed by an LLM. A prompt can be a query, a command, a short statement of feedback, or a longer statement including context, instructions, and input data. Particularly, a prompt generated from the data of the user query can be provided to an LLM. The LLM is trained to automatically generate a structured query (e.g., a Structured Query Language (SQL) query) for the intent or to carry out the task conveyed in the user query. The LLM can be trained on the various forms that user queries can take or the various ways in which the intents can be conveyed for executing various tasks. Additional parameters for the training can include at least user profile information and the database schema of the relational database which includes the results responsive to the user query. The user profile information can include a user's role and the access privileges associated therewith. The database schema can include database configuration information such as, but not limited to, the number of columns, column names, and relations such as primary keys, foreign key references, etc. In an example, the structured query thus automatically generated is executed on one or more tables of the relational databases to create the results which can be formatted into the answer for the user query. Furthermore, the plurality of LLMs can include subsets of customized LLMs wherein each subset is customized for a given domain, and each LLM within the subset of customized LLMs is further trained to identify a corresponding intent.
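In an example, such a prompt may be assembled from the user query, the user profile information, and the database schema as sketched below; the template wording and parameter names are illustrative assumptions:

    PROMPT_TEMPLATE = (
        "The user has role '{role}' with access privileges {privileges}.\n"
        "Relational table schema: {schema}\n"
        "Generate a SQL query for: {query}\n"
        "If no SQL query can be generated, say so."
    )

    def build_sql_generation_prompt(query: str, role: str, privileges: list, schema: str) -> str:
        """Assemble a natural-language prompt asking an LLM for a structured query."""
        return PROMPT_TEMPLATE.format(role=role, privileges=privileges, schema=schema, query=query)

    prompt = build_sql_generation_prompt(
        query="Provide the breakup for overstocked items",
        role="manager",
        privileges=["read:materials"],
        schema="materials(material_group TEXT, stock_qty INT, reorder_level INT)",
    )
    print(prompt)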
If the structured query cannot be automatically generated by the currently selected LLM, another prompt may be issued to divide the task into a sequence of subtasks. Particularly, the LLM can be prompted for the steps involved in achieving the task conveyed in the intent, thus obtaining the sequence of subtasks. Prompts are then generated to obtain the structured query for each of the subtasks, thereby generating multiple structured queries. The multiple structured queries can be executed on one or more tables of the relational databases for retrieval of responsive results. Different prompting strategies provide answers with corresponding confidence scores, and the response with the highest score is selected. If a structured query cannot be generated from the prompt, the prompt containing the instructions is processed through an augmented retrieval process wherein similar documents are retrieved from the knowledge bases, and those documents which satisfy a similarity threshold may be processed through the currently-selected LLM for the answer. The answer thus generated can be provided to the user via an output screen of the chatbot interface. Feedback is also collected from the users for the answers provided, and based on the feedback, the LLM used for responding to user queries can be switched to a different LLM.
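In an example, the subtask fallback may be sketched as follows, assuming a generic llm() callable that returns the model's text completion (the stub below is a placeholder, not an actual model call):

    def llm(prompt: str) -> str:
        """Stand-in for the currently selected LLM; a real call returns its completion."""
        return "NONE"  # placeholder response

    def structured_queries_for(user_query: str) -> list:
        """Try direct SQL generation first; on failure, decompose into subtasks."""
        sql = llm(f"Generate a SQL query for: {user_query}. "
                  "If none can be generated, answer exactly 'NONE'.")
        if sql.strip() != "NONE":
            return [sql]
        # Prompt for the steps (subtasks) involved in the task, one per line.
        steps = llm(f"List the steps needed to accomplish: {user_query}")
        return [llm(f"Generate a SQL query for this step: {s}")
                for s in steps.splitlines() if s.strip()]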
The Gen. AI chatbot apparatus and methods disclosed herein provide a technical solution wherein existing computing systems using classical AI models are enabled to be forward compatible with LLMs based on Gen. AI techniques. Usage of different LLMs lends the power of Gen. AI techniques to processing user queries issued in natural language, besides allowing users to frame queries differently for the same intent. Training LLMs to produce structured queries enables leveraging Gen. AI techniques for improving classical data retrieval methodologies. Furthermore, providing for the collection of user feedback and switching between the LLMs based on the user feedback configures the Gen. AI chatbot apparatus to improve over time with usage.
The apparatus 100 includes an input/output (I/O) interface such as the chatbot interface 150, an orchestrator 160, a structured data manager 170, an unstructured data manager 180, a Generative AI (“Gen. AI”) switch 190, and a feedback processor 140. In an example, the chatbot interface 150 can include an intent-slot prediction model 152 which predicts the intent 112 and the slot 114 from the user query 110. In an example, the intent-slot prediction model 152 can be trained on labeled training data including various user queries/user inputs/user utterances and the corresponding intents and slots. For example, the same intent can be conveyed through utterances or queries framed with different words. Accordingly, the intent-slot prediction model 152 is trained to identify the intent 112 accurately from various types of utterances using different words. In an example, the intent-slot prediction model 152 implements classical AI techniques such as, but not limited to, neural nets, Support Vector Machines (SVMs), or deep learning models such as Bidirectional Encoder Representations from Transformers (BERT).
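By way of a non-limiting illustration of such classical techniques, an SVM-based intent classifier may be trained on labeled utterances as sketched below; the tiny dataset and intent labels are hypothetical:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    # Hypothetical labeled utterances; differently-worded queries share one intent.
    utterances = [
        "Provide the breakup for overstocked items",
        "Can I get the breakdown for overstocked items?",
        "Show me which materials are overstocked",
        "Show pending purchase orders",
        "List purchase orders awaiting approval",
        "Which orders are still pending?",
    ]
    intents = ["breakup_overstocked"] * 3 + ["pending_orders"] * 3

    # probability=True exposes the confidence scores used for thresholding.
    model = make_pipeline(TfidfVectorizer(), SVC(probability=True))
    model.fit(utterances, intents)
    proba = model.predict_proba(["overstocked items breakdown"])[0]
    print(dict(zip(model.classes_, proba.round(2))))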
The intent-slot prediction model 152 can also be configured to output the confidences associated with the accuracy of its predictions. The user query 110, along with the predicted intent 112 and slot 114 and the corresponding confidences, is provided to the orchestrator 160. The orchestrator 160 can compare the intent prediction confidence to a confidence limit 162 configured therein. If the confidence of the intent prediction is higher than the confidence limit 162, the intent 112 and the slot 114 are provided to the structured data manager 170 for information retrieval. The data responsive to the user query 110 can be obtained from an appropriate one of the data sources 120 identified by the slot 114 and provided to the user device 116 via the chatbot interface 150. If the intent prediction confidence is lower than the confidence limit 162, the user query 110 is provided to the unstructured data manager 180 for determining the intent 112. The intent 112 may be inherently associated with the slot 114, as the intent is indicative of a task which can be domain-specific. For example, the terminology used to convey intents in different domains may differ, and therefore the accuracy of intent prediction can inherently depend on the accuracy of slot prediction, or vice versa.
In an example, additional data such as a user context 132, which includes user role and session information, can be used for intent/slot determination by the unstructured data manager 180. The user context 132 can be retrieved from the knowledge bases 124 which include domain-specific knowledge in structured or unstructured formats. The user context 132 data can be retrieved using, for example, the user's login information. The unstructured data manager 180, via the Gen. AI switch 190, employs one of a plurality of LLMs 192, e.g., LLM1, LLM2, . . . , LLMk (wherein k is a natural number greater than 1), for obtaining the data requested in the user query 110 from one or more of the data sources 120. The plurality of LLMs 192 can be trained for generating structured queries for extraction of data from the one or more data sources 120 for different intents, different domains, etc.
The unstructured data manager 180 can also prompt the LLM providing the structured query for retrieving the data from the relational databases 122 to also provide a corresponding confidence score indicating that the query provided is responsive to the intent conveyed in the user query 110. In case the confidence score associated with the query is less than a predetermined threshold confidence score, or if no query can be generated, then further processes, which may include an augmented retrieval process, are employed with the prompt containing instructions to get the answers from one or more of the plurality of data sources 120. Based again on the predetermined threshold confidence score, only chunks from the knowledge bases 124 having a predefined level of similarity with the intent 112 are used for answering the user query 110. In case no relevant chunks are available, the LLM is prompted to answer the subtask and instructed to explain the answer 194. Generally, the answer 194 can be provided as a textual output on a display screen of the user device 116. However, the chatbot interface 150 may also be enabled for voice interaction via, for example, text-to-voice and/or voice-to-text converters, so that the user query 110 can be received as an audio input and the answer 194 is provided as a voice output to the user device 116 by the chatbot interface 150. Wherever necessary, the chatbot interface 150 can also be configured to identify videos from one or more of the plurality of data sources 120 and output the videos through the chatbot interface 150.
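In an example, the similarity gating of knowledge-base chunks may be sketched as follows, with TF-IDF cosine similarity used as an assumed stand-in for whatever representation the knowledge bases 124 employ:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    SIMILARITY_THRESHOLD = 0.2  # hypothetical predefined level of similarity

    def relevant_chunks(intent_text: str, chunks: list) -> list:
        """Return only knowledge-base chunks similar enough to the intent."""
        vec = TfidfVectorizer().fit(chunks + [intent_text])
        scores = cosine_similarity(vec.transform([intent_text]), vec.transform(chunks))[0]
        return [c for c, s in zip(chunks, scores) if s >= SIMILARITY_THRESHOLD]

    chunks = ["Overstock is stock above the reorder level.", "Holiday calendar for 2024."]
    print(relevant_chunks("breakdown of overstocked items", chunks))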
The feedback from the user for the answer 194 can be received by the feedback processor 140. Users can provide feedback for each response by tagging it with a thumbs up or a thumbs down. They can also provide longer textual feedback for each session. The feedback can indicate whether or not the retrieved information is responsive to the user query 110. Feedback can be used as a mechanism to fine-tune the prompt for queries where data from the knowledge bases 124 has to be fetched for known intents. Answers with high negative feedback counts are added to the prompt so that the currently selected LLM can provide better results. An example prompt providing feedback may have a format along the following lines (the field layout shown is illustrative):
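    role: {role}
    task: {task}
    instruction: {instruction}
    context: {context}
    format instructions: {format instructions}
    previous answer: {previous answer}
    feedback: {feedback}
    answer: {answer}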
In the above example, the ‘role’ is the profile under which the user may have logged in, e.g., manager, sales assistant, system administrator, etc. The ‘task’ is the intent or the function to be executed, and the ‘instruction’ may detail how the task is to be executed. In an example, the instruction may include slot information, e.g., a data source to be used. The ‘context’ includes DB schema information, session metadata, etc., and the ‘format instructions’ may detail the format in which the user likes to receive the answer. The ‘previous answer’ is the prior answer received by the user, and the ‘feedback’ is the user's reaction to the previous answer. The ‘answer’ is the response that was expected by the user. Similarly, query-response pairs for queries with unknown intents and having positive feedback can also be used to fine-tune one or more of the plurality of LLMs 192.
If the feedback from a predetermined number of users is negative, indicating that the answer 194 is not responsive to the user query 110, the retrieved information/answer with the high negative feedback can be added to the prompt to increase the likelihood that the non-responsive LLM is not selected for similar user inputs that may be received in the future. If the feedback from a predetermined number of users is positive, indicating that the retrieved information/answer 194 is responsive to the user query 110, the answer with the high positive feedback can be used with the query that generated the information in the positively-reviewed answer so that the responsive LLM is selected for similar user inputs that may be received in the future. Thus, regular monitoring and analysis of free-text feedback helps to identify gaps in the chatbot. New topics can be identified through LLM prompting with few-shot prompting or classical topic modeling techniques. Few-shot prompting can be used as a technique to enable in-context learning, where demonstrations are provided in the prompt to steer the model to better performance. The demonstrations serve as conditioning for subsequent examples where the model is to generate a response. New intents can be added to address prominent gaps to improve the accuracies of the LLMs.
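By way of a non-limiting illustration, a few-shot prompt for identifying new topics in free-text feedback may be sketched as follows; the demonstrations and topic labels are hypothetical:

    FEW_SHOT_PROMPT = """Classify the topic of each feedback comment.

    Feedback: "The stock report missed last week's deliveries." -> Topic: inventory_accuracy
    Feedback: "I want the answer as a chart, not a table." -> Topic: answer_formatting

    Feedback: "{comment}" -> Topic:"""

    def topic_prompt(comment: str) -> str:
        """Build a few-shot prompt whose demonstrations condition the LLM's response."""
        return FEW_SHOT_PROMPT.format(comment=comment)

    print(topic_prompt("The bot never understands supplier lead-time questions."))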
The selection/de-selection of the plurality of LLMs 192 is carried out by the Gen. AI switch 190. The Gen. AI switch 190 can be configured in a rule-based process or programmatically to select a particular one of the plurality of LLMs 192 based on the slot 114 and the intent 112 obtained from the user input. In an example, each LLM of the plurality of LLMs 192 can be trained to provide a query for retrieving information from the relational databases 122. The plurality of LLMs 192 can include LLMs trained for particular domains, e.g., the oil and gas domain, the retail domain, the healthcare domain, etc. The plurality of LLMs 192 can also include models that are customized to particular organizations so that such models can handle proprietary or confidential information. The rule-based process may implement a hierarchy in the selection to narrow down the LLM that will be used based at least on configuration details such as industry, region, functional domains within an organization, etc., in addition to user feedback. The different types of LLMs used in the plurality of LLMs 192 may include Jurassic-2 Ultra®, GPT-4®, BARD®, Cohere®, PaLM®, Claude v1®, etc. Therefore, even as new LLMs are introduced, such LLMs may be incorporated into the apparatus 100 without significant re-configuration so that the apparatus 100 maintains forward compatibility. In an example, LLMs that may be particularly suited for structured query generation, such as but not limited to Jurassic-2 Ultra, may be frequently selected by the Gen. AI switch 190.
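In an example, the rule-based hierarchy of the Gen. AI switch 190 may be sketched as follows; the rule table and model names are illustrative assumptions:

    # Hypothetical selection rules, most specific first: (industry, domain) -> model.
    SELECTION_RULES = [
        (("oil_and_gas", "procurement"), "org-custom-llm"),
        (("retail", None), "sql-specialist-llm"),
        ((None, None), "general-llm"),  # default fallback
    ]

    def select_llm(industry: str, domain: str) -> str:
        """Walk the rule hierarchy and return the first matching LLM."""
        for (rule_industry, rule_domain), model in SELECTION_RULES:
            if rule_industry in (None, industry) and rule_domain in (None, domain):
                return model
        return "general-llm"

    print(select_llm("retail", "inventory"))  # -> sql-specialist-llm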
The query process selector 204 selects whether the user query 110 is processed by the structured data manager 170 or the unstructured data manager 180. If the intent 112 is known, then the structured data manager 170 is selected for further processing. However, if the intent 112 is unknown (i.e., it cannot be determined with a confidence above the confidence limit 162), then the user query 110 is processed via the unstructured data manager 180 for retrieval of the query results. The answer provider 206 receives the query results from one of the structured data manager 170 or the unstructured data manager 180 and forwards the query results to the chatbot interface 150 for presentation on the user device 116 in one or more of text, audio, or video formats as the answer 194. In an example, the answer 194 may be composed as an email that the user can review and send out, or the answer 194 may trigger an action on a downstream Enterprise Resource Planning (ERP) system. When presented as an audio response or a video, the answer 194 may bring up interactive dashboard reports for the user to perform analysis and derive insights, etc.
The data thus gathered by the user input processor 402, including the user context 132, is provided to the prompt generator 404. The prompt generator 404 automatically identifies a prompt 442 to obtain a structured query 444 (e.g., an SQL query) from the one of the plurality of LLMs 192 which is currently in use. The prompt 442 can be identified from a datastore of pre-configured prompts 424. For example, a pre-configured prompt “Generate sql query for {query} based on the relational table with following details {table schema} {table description}. Explain the query. If no sql query can be generated, say no sql query can be generated” can be retrieved from the datastore of pre-configured prompts 424 to obtain the structured query 444. The structured query 444 is executed against the relational databases 122 by the results retriever 406 for extracting the query results 446 for the user query 110, which may be formatted into the answer 194 by the chatbot interface 150. In an example, if an SQL query cannot be obtained from the current LLM, the prompt generator 404 may initially run another prompt from the datastore of pre-configured prompts 424 to sub-divide the intent 112 into multiple sub-tasks and then generate prompts to obtain multiple SQL queries, one for each of the multiple sub-tasks. In case the sub-division into tasks also fails, the prompt generator 404 can implement an augmented retrieval process to obtain the relevant data/results. Under the augmented retrieval process (e.g., a retrieval augmented generation), an input is used to retrieve a set of relevant/supporting documents/chunks from the knowledge bases 124 which may include the domain-specific information. In an example, a relevance threshold can be implemented to select the top n documents (wherein n is a natural number and n=1, 2, 3, . . . ). The documents/chunks can be concatenated as context with the original input prompt and fed to the currently selected LLM which produces the final output. The final output includes the structured query 444 which is executed by the results retriever 406 on the relational databases 122 for obtaining the query results 446.
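In an example, the augmented retrieval step may be sketched as follows, with TF-IDF retrieval standing in for the knowledge-base index; the function names and the prompt wording are illustrative assumptions:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def retrieve_top_n(question: str, documents: list, n: int = 3) -> list:
        """Retrieve the n most relevant knowledge-base documents for the input."""
        vec = TfidfVectorizer().fit(documents + [question])
        scores = cosine_similarity(vec.transform([question]), vec.transform(documents))[0]
        ranked = sorted(zip(scores, documents), reverse=True)
        return [doc for _, doc in ranked[:n]]

    def augmented_prompt(question: str, documents: list) -> str:
        """Concatenate retrieved chunks as context with the original input prompt."""
        context = "\n".join(retrieve_top_n(question, documents))
        return f"Context:\n{context}\n\nUsing the context, {question}"

    docs = ["Overstock is stock above the reorder level.", "Holiday calendar for 2024."]
    print(augmented_prompt("provide the breakup for overstocked items", docs))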
The user query 110 is processed by the intent-slot prediction model 152 at 504 for identification of the intent 112 and the slot 114. Known intents and slots can be configured for the intent-slot prediction model 152. These help to distinctly identify frequently used queries from users. More particularly, the intent-slot prediction model 152 can be trained on user queries along with the intents and slots labeled for such user queries. Different queries may have the same intent, and the request for the same information can be framed differently. Additionally, the confidences associated with such predictions can also be extracted as output from the intent-slot prediction model 152. For example, user queries such as “Provide the breakup for overstocked items” or “Can I get the breakdown for overstocked items for Supplier items?” can both be mapped to the intent “breakup_overstocked” on the slot “material_group”. Therefore, in response to these queries, the apparatus 100 may execute SQL queries against material tables to retrieve the overstocked items as the answer.
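By way of a non-limiting illustration, the mapped retrieval for this example may bind the slot value as a filter parameter; the table and column names are hypothetical:

    import sqlite3

    # Hypothetical SQL mapped to the intent "breakup_overstocked"; the slot
    # "material_group" is bound as a filter over the material table.
    SQL = ("SELECT material_group, SUM(stock_qty - reorder_level) AS overstock "
           "FROM materials WHERE stock_qty > reorder_level AND material_group = ? "
           "GROUP BY material_group")

    def breakup_overstocked(conn: sqlite3.Connection, material_group: str):
        """Execute the intent's mapped query, filtered by the slot value."""
        return conn.execute(SQL, (material_group,)).fetchall()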
The apparatus 100 is configured with the confidence limit 162 for comparison of the confidences with which the intent 112 is determined. At 506, the confidence with which the intent 112 was determined (i.e., the intent determination confidence) is extracted from the intent-slot prediction model 152 which can implement classical AI techniques. The confidence thus extracted is compared with the confidence limit 162. At 508, it is determined if the intent determination confidence is greater than the confidence limit 162. If the intent determination confidence is greater than the confidence limit 162, then the method moves to 510 to retrieve the responsive query 352 mapped to the intent 112 from the intent-query mappings 350. The responsive query 352 is executed on the relational databases 122 to obtain the query results 354 at 512. The query results 354 are formatted at 514 and output as the answer 194 to the user query 110. If at 508, it is determined that the intent determination confidence is less than the confidence limit 162, the method moves to 516 wherein the query results 354 for the user query 110 are retrieved using one of the plurality of LLMs 192 as described herein. The method returns to 514 wherein the query results 354 are formatted and output as the answer 194 to the user query 110.
The prompts for the sub-tasks obtained at 610 are further executed by the currently selected LLM for obtaining the corresponding structured queries at 612. It is determined at 614 if the structured queries are generated for the sub-tasks. If it is determined at 614 that the structured queries are generated for the sub-tasks by the currently selected LLM, the method proceeds to 616 for the execution of the structured queries on the relational databases 122 in a sequence corresponding to the sequence of the sub-tasks. If it is determined at 614 that the structured queries were not generated for the sub-tasks, the user query 110 is passed through the augmented retrieval approach at 618 to obtain the answers. The generation of the sub-tasks and the subsequent generation of SQL queries are carried out by the currently selected LLM in a process which may not be transparent to the user.
If at 704, it is determined that none of the documents retrieved from the knowledge bases 124 meet the predetermined similarity threshold, the currently selected LLM is prompted to provide an answer to the subtask and instructed to provide an explanation for the answer at 714, and the process terminates at the end block.
The computer system 900 includes hardware processor(s) 902, such as a central processing unit, an ASIC, or another type of hardware processing circuit; input/output (I/O) devices 912, such as a display, mouse, keyboard, etc.; a network interface 904, such as a Local Area Network (LAN), a wireless 802.11x LAN, a 3G, 4G or 5G mobile WAN or a WiMax WAN; and a processor-readable medium 906. Each of these components may be operatively coupled to a bus 908. The processor-readable or computer-readable medium 906 may be any suitable medium that participates in providing instructions to the processor(s) 902 for execution. For example, the processor-readable medium 906 may be a non-transitory or non-volatile medium, such as a magnetic disk or solid-state non-volatile memory, or a volatile medium such as RAM. The instructions or modules stored on the processor-readable medium 906 may include machine-readable instructions 964 executed by the processor(s) 902 that cause the processor(s) 902 to perform the methods and functions of the Gen. AI-based chatbot apparatus 100.
The chatbot apparatus 100 may be implemented as software or machine-readable instructions stored on a non-transitory processor-readable storage medium and executed by one or more processors 902. For example, the computer-readable storage medium or non-transitory processor-readable medium 906 may store an operating system 962, such as MAC OS, MS WINDOWS, UNIX, or LINUX, and code/instructions 964 for the Gen. AI-based chatbot apparatus 100. The operating system 962 may be multi-user, multiprocessing, multitasking, multithreading, real-time, and the like. For example, during runtime, the operating system 962 is running and the code for the Gen. AI-based chatbot apparatus 100 is executed by the processor(s) 902.
The computer system 900 may include data storage 910, which may include non-volatile data storage. The data storage 910 may be used as local data storage of the Gen. AI-based chatbot apparatus 100 to store the user queries, retrieved results, formatted answers, confidences, and other data generated or used by the apparatus 100.
The network interface 904 connects the computer system 900 to internal systems, for example, via a LAN. Also, the network interface 904 may connect the computer system 900 to the Internet. For example, the computer system 900 may connect to web browsers and other external applications and systems via the network interface 904.
What has been described and illustrated herein is an example along with some of its variations. The terms, descriptions, and figures used herein are set forth by way of illustration only and are not meant as limitations. Many variations are possible within the spirit and scope of the subject matter, which is intended to be defined by the following claims and their equivalents.