As the technological capacity for organizations to create, track, and retain information continues to grow, a variety of different technologies for managing and storing the rising tide of information have been developed. Different types of data may be stored across many different systems or services. When it is time to locate desired information, the different systems or services storing data may have to be checked in order to obtain relevant data.
While embodiments are described herein by way of example for several embodiments and illustrative drawings, those skilled in the art will recognize that embodiments are not limited to the embodiments or drawings described. It should be understood that the drawings and detailed description thereto are not intended to limit embodiments to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope as defined by the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description or the claims. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include,” “including,” and “includes” mean including, but not limited to.
It will also be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first contact could be termed a second contact, and, similarly, a second contact could be termed a first contact, without departing from the scope of the present invention. The first contact and the second contact are both contacts, but they are not the same contact.
Various techniques of intent classification for executing a retrieval augmented generation pipeline for natural language tasks using a generative machine learning model are described herein. Generative machine learning models refer to machine learning techniques that model different types of data in order to perform various data generative tasks given a prompt. For example, natural language generative machine learning models, such as large language models (LLMs), are one type of generative machine learning model, referring to machine learning techniques applied to model language, which may include natural language (e.g., human speech) and machine-readable language (e.g., programming languages, scripts, code representations, etc.). Generative machine learning models that model language may take language prompts and generate corresponding programming language predictions (which may be referred to as code predictions or code suggestions).
Generative machine learning models that generate language to perform various natural language processing tasks are a form of machine learning that provides language processing capabilities with wide applicability to a number of different systems, services, or applications. More generally, machine learning refers to a discipline by which computer systems can be trained to recognize patterns through repeated exposure to training data. In unsupervised learning, a self-organizing algorithm learns previously unknown patterns in a data set without any provided labels. In supervised learning, this training data includes an input that is labeled (either automatically, or by a human annotator) with a “ground truth” of the output that corresponds to the input. A portion of the training data set is typically held out of the training process for purposes of evaluating/validating performance of the trained model. The use of a trained model in production is often referred to as “inference,” during which the model receives new data that was not in its training data set and provides an output based on its learned parameters. The training and validation process may be repeated periodically or intermittently, by using new training data to refine previously learned parameters of a production model and deploy a new production model for inference, in order to mitigate degradation of model accuracy over time.
For generative machine learning models, the “inference” may be the output predicted by the generative machine learning model to satisfy a language prompt (e.g., create a summary of a draft financial plan). A prompt may be an instruction and/or input text in one (or more) languages (e.g., in a programming language). Different generative machine learning models may be trained to handle varying types of prompts. Some generative machine learning models may be generally trained across a wide variety of subjects and then later fine-tuned for use in specific applications and subject areas. Fine-tuning refers to further training performed on a given machine learning model that may adapt the parameters of the machine learning model toward specific knowledge areas or tasks through the use of additional training data. For example, an LLM may be trained to recognize patterns in text and generate text predictions across many different scientific areas, literature, transcribed human conversations, and other academic disciplines and then later fine-tuned to be optimized to perform language tasks in a specific area.
Retrieval augmented generation is another technique for adapting generative machine learning models to perform tasks for specific use cases by obtaining relevant data as part of using a generative machine learning model. For example, various data retrieval techniques for identifying and providing relevant data may be implemented in order to augment the performance of the generative machine learning model. Challenges arise from the number and complexity of different data sources to access and from determining how to handle different natural language requests, including if, when, and how much to utilize retrieval augmented generation to perform tasks that are adapted to relevant data. Some natural language requests may suffer from poor performance if provided directly to data retrieval or to prompt a generative machine learning model to obtain a result. For example, different types of tasks, including instructions, questions, and conversational interactions, may each involve different handling in order to optimally contextualize the performance of the natural language request and task for data retrieval (if any) and generative machine learning response generation. Therefore, techniques that can better determine the intent behind a natural language request to perform a natural language task can achieve better performance results across different natural language tasks, by intelligently recognizing when different types of handling are optimal for different natural language tasks. Accordingly, implementing intent classification for executing a retrieval augmented generation pipeline for natural language tasks using a generative machine learning model can improve the performance of generative machine learning systems by optimally using computing resources when appropriate (e.g., not performing data retrieval when not needed), decontextualizing requests (e.g., to add in relevant information), and recognizing and performing tasks in multiple parts, when needed (e.g., by classifying a task as multi-part in order to determine and perform the multiple parts before providing a response).
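As one non-limiting illustration of such intent-gated handling, the following sketch shows how a phatic request might bypass data retrieval entirely while a terse keyword request is rewritten before retrieval. The intent labels and the classify_intent, retrieve, and generate interfaces are hypothetical placeholders, not a definitive implementation of the techniques described herein.

```python
# A minimal sketch, assuming hypothetical classifier/retriever/model interfaces.

def classify_intent(request: str) -> str:
    # Stand-in for a trained intent classifier model.
    if request.strip().lower() in {"hello", "how are you?"}:
        return "phatic"
    if len(request.split()) <= 2:
        return "keyword"
    return "instruction"

def handle_request(request: str, retrieve, generate) -> str:
    intent = classify_intent(request)
    if intent == "phatic":
        # Generic conversation: skip data retrieval and prompt the model directly.
        return generate(request)
    if intent == "keyword":
        # Terse or technical queries may lack semantics; rewrite before retrieval.
        request = f"Find information about: {request}"
    passages = retrieve(request)
    prompt = f"Using only this data:\n{passages}\n\nRespond to: {request}"
    return generate(prompt)
```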
Generative machine learning service 110 may implement natural language task orchestration 120 in order to handle natural language tasks for an associated application 102 implementing workflows, such as the example workflow illustrated below with regard to
In some embodiments, machine learning based approaches may be implemented for intent classifier model 122. For example, a neural network-based language model, such as Bidirectional Encoder Representations from Transformers (BERT) or Robustly Optimized BERT pre-training Approach (RoBERTa), may be used. These or various other machine learning models may be trained to recognize different intents. For example, generic conversation natural language requests like “Hello” or “How are you?” can be detected by training the intent classifier model 122 to recognize phatic intent (which does not need data retrieval pipeline 160). For instruction or command intents, which include requests such as “write email,” “summarize text,” “write article,” and so on, the intent classifier model 122 can be further trained to detect instruction intent (including general and conversational commands). Intent classifier model 122 can also be trained to recognize keyword requests (which may be queries that consist of just a keyword without other context). Keyword requests may lack sufficient semantics and could be very short or technical. Such requests possibly do not need a generative model (e.g., data retrieval alone may be sufficient) or might require some query rewriting to make them semantically meaningful. For example, an IP address search for “172.1.2.100” or a search for a specific term like “MX-52113,” which may be a product number. The intent classifier model 122 can similarly be trained to recognize multi-part tasks.
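By way of a hedged example only, an intent classifier along these lines might be implemented with a BERT-family model via the Hugging Face transformers library; the label set and checkpoint below are illustrative assumptions, not specifics of intent classifier model 122.

```python
# A hedged sketch of one possible intent classifier; the labels and
# checkpoint are assumptions for illustration.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

LABELS = ["phatic", "instruction", "keyword", "multi_part"]  # assumed label set

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=len(LABELS)
)  # would be fine-tuned on labeled example requests before production use

def classify(request: str) -> str:
    inputs = tokenizer(request, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

print(classify("Hello"))     # ideally "phatic" after fine-tuning
print(classify("MX-52113"))  # ideally "keyword" after fine-tuning
```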
Retrieval pipeline 160 may access 123 associated data repository(ies) for the application and use obtained data 125 to generate an augmented prompt 162 for generative machine learning model 170. The result 172 may be provided to natural language task orchestration 120, which may then determine whether other parts need to be performed (e.g., according to the intent classification), whether the result is valid, or other post-result processing tasks. Some natural language tasks 102 may be determined to have zero iterations through retrieval pipeline 160 and instead may be sent as prompt 106 to generative machine learning model 170. The final result 108, when achieved, may be provided to the natural language generative application.
Please note that the previous description is a logical illustration and thus is not to be construed as limiting as to the implementation. Different combinations or implementations may be implemented in various embodiments.
This specification begins with a general description of a provider network that implements a generative natural language application service that supports intent classification for executing a retrieval augmented generation pipeline for natural language tasks using a generative machine learning model. Then various examples of distributed orchestration of natural language tasks using a generative machine learning model including different components, or arrangements of components that may be employed as part of implementing the service are discussed. A number of different methods and techniques to implement intent classification for executing a retrieval augmented generation pipeline for natural language tasks using a generative machine learning model are then discussed, some of which are illustrated in accompanying flowcharts. Finally, a description of an example computing system upon which the various components, modules, systems, devices, and/or nodes may be implemented is provided. Various examples are provided throughout the specification.
In various embodiments, the components illustrated in
In various embodiments, natural language generative application service 210 may provide a scalable, serverless, and machine-learning powered service to create or support generative natural language applications using data specific to the application, such as data stored in database services 230, data storage services 240, or other services 260. Natural language generative application service 210 may enable users (e.g., enterprise customers) to deploy a generative AI-powered “expert” in minutes. For example, users (e.g., enterprise employees or agents) can ask complex questions via applications that operate on enterprise data, get comprehensive answers, and execute actions on their enterprise applications in a unified, intuitive experience powered by generative AI.
Natural language generative application service 210 easily connects to a variety of different systems, services, and applications, both hosted internal to provider network 200 and external to provider network 200 (e.g., other provider network/public cloud services or on-premise/privately hosted systems). Once connected, natural language generative application service 210 allows users to ask complex questions and execute actions on these systems using natural language (e.g., human speech commands). For example, a sales agent can ask the generative application to compare various credit card offers and recommend a card with the best travel points for their customer, and natural language generative application service 210 would support the features to provide a recommendation and the reason for its choice, along with references to the data sources for this recommendation. In some scenarios, a user can use the generative application to create a case summary and add it to a customer relationship management (CRM) system.
Natural language generative application service 210 may implement security layers that check user permissions to prevent unauthorized access to enterprise systems, thereby ensuring users only see information and perform actions they are entitled to. Natural language generative application service 210 implements guardrails to protect against and avoid incorrect or erroneous statements or other generated results (sometimes called hallucinations) by limiting the responses to data in the enterprise, and builds trust by providing citations and references to the sources used to generate the answers. Natural language generative application service 210 may offer an intuitive user interface to create and deploy an enterprise-grade application to users in minutes without requiring generative machine learning domain expertise.
For example, enterprises are struggling to provide new generative AI-powered experiences that their users expect while interacting with enterprise systems. Users may need to switch across multiple fragmented systems like an internal wiki, various data share sites, communication sites, or messaging services in order to find information because they cannot get comprehensive answers collated from ideas contained in multiple pieces of content. Moreover, users are unable to ask probing follow-up questions or perform comparative analysis on the content to understand it better. When users need to take any follow-up actions, users then need to go through multiple platforms like CRM systems, ticketing systems, and other enterprise applications to take the action.
Recent advancements in generative AI powered by machine learning models trained to generate content (referred to as generative machine learning models), such as generative language models, like Large Language Models (LLMs), have opened up possibilities to build intuitive expert-like experiences. However, these generative models have limitations as they are not knowledgeable about enterprise data and their knowledge is not up-to-date. Generative models also hallucinate and there is no way for end users to fact-check the responses. Additionally, enterprises need to ensure that users do not get answers from content that they do not have access to. Enterprises may also need to build a conversational application and deploy it for their users. This makes it hard to adopt the new generative AI technologies for enterprise use cases. Lack of unified, intuitive experiences for the enterprise leads to poor knowledge sharing among the users, lower rate of self-service, and loss of productivity across the company.
With natural language generative application service 210, enterprises (and other service users) utilize the various features of natural language generative application service 210 to overcome the technical challenges standing in the way of enterprises making use of generative AI. Natural language generative application service 210 allows enterprises to easily tap into the power of AI technologies, including generative AI, to transform how their users interact with their enterprise applications in a secure way. Natural language generative application service 210 moves beyond the traditional fragmented experience of navigating multiple systems to a single, unified expert-like experience. Using intuitive interface elements (e.g., a simple point-and-click admin interface), application creators (e.g., for enterprises) can sync with enterprise systems. Users of the generative applications benefit from capabilities like generative answers from multiple documents, answers from knowledge embedded in the model, comparative analysis, content summarization, math and reasoning, text generation, and the ability to execute actions on enterprise apps. Natural language generative application service 210 may support requests to find information and execute follow-up actions (e.g., “find me policy options for this client and attach a summary to client notes in a CRM system”). Natural language generative application service 210 uses enterprise content to generate answers, thus minimizing hallucinations and providing up-to-date information. To ensure trust and safety for the users, natural language generative application service 210 weaves in human-like citations, references, and attachments for source documents in its response. Natural language generative application service 210 manages enterprise access and access control list (ACL) permissions. When the user asks a question to natural language generative application service 210, natural language generative application service 210 analyzes the data in the enterprise systems and generates responses only from the content that the user has access to. Natural language generative application service 210 also provides a pre-built conversational application that can be easily deployed for end users in minutes, speeding up the time to value for application creators. The unified and intuitive experience provided by natural language generative application service 210 improves productivity and knowledge sharing for enterprises and enhances self-service for end users.
In various embodiments, application creators can deploy generative applications that can utilize natural language generative application service 210 in their enterprise in minutes. For example, in a console or other graphical user interface, creators can quickly connect their enterprise systems to natural language generative application service 210. Natural language generative application service 210 provides a wide range of built-in data connectors to different data sources to associate them as data repositories for a generative application and supports data retrievers, which find relevant data (e.g., documents or other non-natural language data, such as image data, numerical data, audio or video data) to feed into a generative machine learning model (e.g., an LLM). Natural language generative application service 210 also supports actions for enterprise systems, such as updating a customer record in a database or creating a ticket in an issue management system, so that users can execute actions in those applications using natural language commands. Next, application creators can connect their generative applications with their identity providers (e.g., internal or external to provider network 200). Finally, application creators can deploy the pre-built conversational application to their end users.
Natural language generative application service 210 may support interactions through a generative application created (and in some embodiments hosted) by natural language generative application service 210 in order to perform various tasks, which may be specified in a natural language request. Features of natural language generative application service 210 to support these interactions may include question answering for enterprise data. For instance, natural language generative application service 210 can process questions from end users and return generative responses using information from various secure enterprise data sources. Natural language generative application service 210 can continue the conversation with the user in the context of the active session or start with a new one. Natural language generative application service 210 will support question answering on both structured and unstructured data sources. Application creators (e.g., which may be enterprise administrators) can choose if they want to limit answers to enterprise content or leverage the knowledge of the generative model to answer queries.
Another example feature of natural language generative application service 210 to support interactions may be security. Natural language generative application service 210 provides ACL support across private data (e.g., enterprise data) and application-level security for enterprise systems. Natural language generative application service 210 may generate responses that are only based on content that an end user has access to. Natural language generative application service 210 may present references and other summary information from the sources (e.g., documents) which were used to generate the response for the end user so that the user can use that for fact checking. Follow-up actions suggested by natural language generative application service 210 to the user will only execute actions on applications that the user has access to (e.g., database systems, CRM systems, and so on).
Another example feature of natural language generative application service 210 to support interactions may be actions. Natural language generative application service 210 enables end users to perform actions on various applications like email, messaging, posting or other communication or data sharing applications using natural language commands. For example, an end user can ask natural language generative application service 210 to update an opportunity in a CRM system or create a ticket in a ticketing system.
Another example feature of natural language generative application service 210 to support interactions is summarization. End users can also ask for a summary of the content in their chat.
Another example feature of natural language generative application service 210 to support interactions is built-in data connectors. Natural language generative application service 210 natively supports document and other data retrievers for many different data storage systems, data search systems, database systems, or any other data repositories, including support for ACLs for those systems. The connectors may eliminate the heavy lifting involved in crawling data sources, extracting text content from files, and making it available for search.
Another example feature of natural language generative application service 210 to support interactions may be usage analytics. Natural language generative application service 210 allows application creators (e.g., admins) to analyze end user engagement metrics including the number of queries, number of sessions, queries per session, and popular queries. In this way, an application can be updated or modified based on the usage analytics.
Another example feature of natural language generative application service 210 to support interactions is personalization. Natural language generative application service 210 leverages an end user's context such as role, location, etc. and learns from past interactions such as past searches as well as thumbs up/thumbs down feedback received from users to provide a personalized experience.
Natural language generative application service 210 may support various features to ingest, index, and/or retrieve relevant data from associated data repositories for a generative application. Natural language generative application service 210 implements features that can connect to and ingest data from different data sources. Once the data sources are connected, natural language generative application service 210 will process data from these content sources and be ready to be deployed in minutes. However, if an application creator already has content in a retriever like OpenSearch or another index, then these retrievers can easily be integrated with natural language generative application service 210.
As noted above, generative machine learning models can sometimes create seemingly good but factually incorrect or otherwise erroneous answers called hallucinations. In addition, it is possible that generative machine learning models pick up inappropriate content because they are trained on large public data sets. These risks can undermine the accuracy and trustworthiness of applications. Natural language generative application service 210 addresses these issues with multiple capabilities. Natural language generative application service 210 combines generative machine learning models with application-specific data retrieval to provide question answering functionality. Natural language generative application service 210 first uses a retriever to find relevant data for a request from the associated data repositories and then feeds portions from the top relevant data to a generative machine learning model to get a synthesized response that is relevant to application creator (e.g., enterprise) content. In addition, natural language generative application service 210 provides citations and references to the enterprise documents that were used to generate the responses so that end users can verify the accuracy of the answer. Natural language generative application service 210 also leverages built-in prompt and response classifiers to detect inappropriate content such as swearing, insults, and profanity.
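For example, a simplified sketch of this retrieve-then-generate flow with citation attribution might look like the following; the Passage structure and the retriever and model interfaces are assumptions for illustration, not the service's actual implementation.

```python
# A minimal sketch, assuming placeholder retriever/model interfaces.
from dataclasses import dataclass

@dataclass
class Passage:
    text: str
    source: str  # e.g., document title or URI, used later for citations

def answer(question: str, retriever, llm, top_k: int = 5) -> dict:
    passages = retriever.search(question)[:top_k]  # top relevant data only
    context = "\n\n".join(p.text for p in passages)
    prompt = (
        "Answer using ONLY the provided content; if the answer is not "
        f"present, say so.\n\nContent:\n{context}\n\nQuestion: {question}"
    )
    return {
        "answer": llm.generate(prompt),
        "citations": [p.source for p in passages],  # for end-user fact checking
    }
```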
Natural language generative application service 210 provides various interface elements and features, including APIs and UI components (e.g., code snippets or libraries that encapsulate the natural language generative application service 210 functionality without defining the specific style of the user interface) for application creators who want to integrate natural language generative application service 210 with their own generative AI-powered applications. Using these APIs and headless components, application creators can embed natural language generative application service 210 features into their own applications.
Natural language generative application service 210 provides many customization options for application creators, including but not limited to:
There may be scenarios in which natural language generative application service 210 cannot find or cannot generate a desired result (e.g., an answer to a particular question). In such scenarios, natural language generative application service 210 will respond that it could not find the answer and will return a list of documents or other data that may contain information related to the question asked.
Natural language generative application service 210 supports various creation user interfaces, including programmatic, API or software development kit (SDK), and/or graphical user interfaces, such as a hosted web-console. For example, a web-console of natural language generative application service 210 may provide an easy way to get started. An application creator can point natural language generative application service 210 to content sources and use the experience builder to quickly deploy a pre-built user interface for end users. An application creator can also apply customization such as response tuning, custom document enrichment, and custom synonyms, to further improve answer accuracy, as noted above. Natural language generative application service 210 can also be integrated with non-hosted applications using APIs.
The natural language capabilities of natural language generative application service 210 enable it to understand any business domain or specialty. However, for application-specific vocabulary (e.g., specific to a particular enterprise), application creators can use the custom synonyms feature of natural language generative application service 210 to tune natural language generative application service 210 so that it can recognize those words.
Natural language generative application service 210 may provide support to access various types of data files and formats, including but not limited to, PDF, HTML, slide presentation files, word processing files, spreadsheet files, Javascript Object Notation (JSON), Comma Separated Value (CSV), Rich Text Files (RTFs), plain text, audio/video, images, and scanned documents. Natural language generative application service 210 may support many different human languages for interactions when performing natural language tasks.
Natural language generative application service 210 may securely store application data and use it only for the purpose of providing the service to the application's end users. The data may be encrypted using service-provided keys or application creator provided keys.
Natural language generative application service 210 may implement front-end 211, in some embodiments. Front-end 211 may support various types of programmatic (e.g., Application Programming Interfaces (APIs)), command line, and/or graphical user interfaces to support managing data sets for analysis, requesting, configuring, and/or otherwise obtaining new or existing analyses, and/or performing natural language queries, as discussed below. Front-end 211 may be a service that an application creator (or application owner) will use to configure and build custom applications (e.g., for generative AI-powered conversation). For example, front-end 211 may support HTTPS/2 for streaming use cases and fall back to HTTPS/1.1 for non-streaming use cases, in some embodiments. In some embodiments, front-end 211 may have browser support for its API, with web-socket support for the streaming interface. In various embodiments, front-end 211 may implement throttling and metering, as well as ensuring authentication and authorization.
Front-end 211 may dispatch requests to (and/or proxy for) downstream services of natural language generative application service 210 (e.g., control plane 212, natural language task orchestration 213, session store 214, retrieval 215, ingestion and indexing 216, data access management 217, and application management 218). For example, front-end 211 may dispatch requests to control plane 212 for setting up the top level resources necessary for generative applications/accounts, to application management 218 to allow configuration of the app, to retrieval 215 to allow configuring of retrieval sources against the generative application, to session store 214 to get conversational history (for the conversational history API), and to natural language task orchestration 213 for generative requests.
Natural language generative application service 210 may implement control plane 212, in some embodiments. Control plane 212 may be a service which will store and manage the top level account for a generative application (or multiple generative applications that may be created under an account). Control plane 212 may also be a single point service for handling data protection regulation (e.g., GDPR), resource identification and tagging from other provider network 200 services, and requests for operations such as deletion of top level resources. Control plane 212 may orchestrate the actions across other services of natural language generative application service 210, such as application management 218 and retrieval 215.
Natural language generative application service 210 may implement ingestion and indexing 216, in some embodiments. Ingestion and indexing 216 may allow application creators to identify and index data for association as a data repository for a generative application. Ingestion and indexing 216 may index documents to a service index (e.g., via an API call). Ingestion and indexing 216 may be a service that stores documents into a service index for retrieval as part of performing natural language tasks. In some embodiments, ingestion and indexing 216 abstracts the underlying storage and type and may include a model invocation during indexing and retrieval operations. The model call may be to generate embedding vectors before the data is indexed and also against the data (e.g., query text) during retrieval invocation.
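A minimal sketch of such an embedding-backed index follows, assuming a placeholder embedding function; the model is invoked once per document at indexing time and once per query at retrieval time, as described above. The class and method names are illustrative assumptions.

```python
# A hedged sketch of an embedding-based index; `embed` is a placeholder.
import numpy as np

class VectorIndex:
    def __init__(self, embed):
        self.embed = embed  # callable mapping text to a vector (assumed)
        self.vectors, self.docs = [], []

    def index(self, doc: str) -> None:
        self.vectors.append(self.embed(doc))  # model invocation during indexing
        self.docs.append(doc)

    def retrieve(self, query: str, k: int = 3) -> list:
        q = self.embed(query)  # model invocation against the query text
        sims = [
            float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
            for v in self.vectors
        ]
        top = sorted(range(len(sims)), key=sims.__getitem__, reverse=True)[:k]
        return [self.docs[i] for i in top]
```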
Natural language generative application service 210 may implement data access management 217, in some embodiments. As discussed in detail below with regard to
Natural language generative application service 210 may implement application management 218, in some embodiments. In various embodiments, application management 218 may support creation and hosting of a generative application that will be made available to end users, like SaaS (Software as a Service), as a hosting service or as an application that is published to an endpoint, as discussed in detail below with regard to
Natural language generative application service 210 may implement natural language task orchestration 213, in some embodiments. Natural language task orchestration 213 may execute workflows to perform natural language tasks received as natural language requests, as discussed above and in detail below with regard to
Natural language generative application service 210 may implement session store 214, in some embodiments. Session store 214 may be responsible for ensuring that the context in a conversation is maintained (e.g., even if the socket connection is closed by the user). Session store 214 may also provide the data for the conversation history (as discussed below with regard to
Natural language generative application service 210 may implement retrieval 215, in some embodiments. Retrieval service 215 may support data retrieval from retrieval sources, as discussed in detail below with regard to
In various embodiments, database services 230 may be various types of data processing services that perform general or specialized data processing functions (e.g., analytics, big data querying, time-series data, graph data, document data, relational data, structured data, or any other type of data processing operation) over data that is stored across multiple storage locations, in some embodiments. For example, in at least some embodiments, database services 230 may include various types of database services (e.g., relational) for storing, querying, and updating data. Such services may be enterprise-class database systems that are scalable and extensible. Queries may be directed to a database in database service(s) 230 that is distributed across multiple physical resources, as discussed below, and the database system may be scaled up or down on an as-needed basis, in some embodiments. The database system may work effectively with database schemas of various types and/or organizations, in different embodiments. In some embodiments, clients/subscribers may submit queries or other requests (e.g., requests to add data) in a number of ways, e.g., interactively via an SQL interface to the database system or via Application Programming Interfaces (APIs). In other embodiments, external applications and programs may submit queries using Open Database Connectivity (ODBC) and/or Java Database Connectivity (JDBC) driver interfaces to the database system.
In some embodiments, database services 230 may be various types of data processing services to perform different functions (e.g., query or other processing engines to perform functions such as anomaly detection, machine learning, data lookup, or any other type of data processing operation). For example, in at least some embodiments, database services 230 may include a map reduce service that creates clusters of processing nodes that implement map reduce functionality over data stored in one of data storage services 240. Various other distributed processing architectures and techniques may be implemented by database services 230 (e.g., grid computing, sharding, distributed hashing, etc.). Note that in some embodiments, data processing operations may be implemented as part of data storage service(s) 240 (e.g., query engines processing requests for specified data).
Data storage service(s) 240 may implement different types of data stores for storing, accessing, and managing data on behalf of clients 270 as a network-based service that enables clients 270 to operate a data storage system in a cloud or network computing environment. For example, one data storage service 240 may be implemented as a centralized data store so that other data storage services may access data stored in the centralized data store for processing and/or storing within the other data storage services, in some embodiments. Such a data storage service 240 may be implemented as an object-based data store, and may provide storage and access to various kinds of object or file data stores for putting, updating, and getting various types, sizes, or collections of data objects or files. Such data storage service(s) 240 may be accessed via programmatic interfaces (e.g., APIs) or graphical user interfaces. A data storage service 240 may provide virtual block-based storage for maintaining data as part of data volumes that can be mounted or accessed similar to local block-based storage devices (e.g., hard disk drives, solid state drives, etc.) and may be accessed utilizing block-based data storage protocols or interfaces, such as internet small computer interface (iSCSI).
In various embodiments, data stream and/or event services may provide resources to ingest, buffer, and process streaming data in real-time, which may be a source of data repositories. In some embodiments, data stream and/or event services may act as an event bus or other communications/notifications for event driven systems or services (e.g., events that occur on provider network 200 services and/or on-premise systems or applications).
Generally speaking, clients 270 may encompass any type of client configurable to submit network-based requests to provider network 200 via network 280, including requests for natural language generative application service 210 (e.g., a request to create a generative application at natural language generative application service 210). For example, a given client 270 may include a suitable version of a web browser, or may include a plug-in module or other type of code module that may execute as an extension to or within an execution environment provided by a web browser. Alternatively, a client 270 may encompass an application, such as a generative application (or a user interface thereof), that uses provider network 200 to implement various features, systems, or applications (e.g., using natural language generative application service 210 APIs to send natural language requests to perform different tasks, such as question answering, summarization, or various other features as discussed above). In some embodiments, such an application may include sufficient protocol support (e.g., for a suitable version of Hypertext Transfer Protocol (HTTP)) for generating and processing network-based services requests without necessarily implementing full browser support for all types of network-based data. That is, client 270 may be an application that may interact directly with provider network 200. In some embodiments, client 270 may generate network-based services requests according to a Representational State Transfer (REST)-style network-based services architecture, a document- or message-based network-based services architecture, or another suitable network-based services architecture.
In some embodiments, a client 270 may provide access to provider network 200 to other applications in a manner that is transparent to those applications. For example, client 270 may integrate with an operating system or file system to provide storage on one of data storage service(s) 240 (e.g., a block-based storage service). However, the operating system or file system may present a different storage interface to applications, such as a conventional file system hierarchy of files, directories and/or folders. In such an embodiment, applications may not need to be modified to make use of the storage system service model. Instead, the details of interfacing to the data storage service(s) 240 may be coordinated by client 270 and the operating system or file system on behalf of applications executing within the operating system environment.
Clients 270 may convey network-based services requests (e.g., natural language queries) to and receive responses from provider network 200 via network 280. In various embodiments, network 280 may encompass any suitable combination of networking hardware and protocols necessary to establish network-based communications between clients 270 and provider network 200. For example, network 280 may generally encompass the various telecommunications networks and service providers that collectively implement the Internet. Network 280 may also include private networks such as local area networks (LANs) or wide area networks (WANs) as well as public or private wireless networks. For example, both a given client 270 and provider network 200 may be respectively provisioned within enterprises having their own internal networks. In such an embodiment, network 280 may include the hardware (e.g., modems, routers, switches, load balancers, proxy servers, etc.) and software (e.g., protocol stacks, accounting software, firewall/security software, etc.) necessary to establish a networking link between given client 270 and the Internet as well as between the Internet and provider network 200. It is noted that in some embodiments, clients 270 may communicate with provider network 200 using a private network rather than the public Internet.
As noted above, natural language generative application service 210 may support communications with external data sources 290 over network 280 in order to obtain data for performing various natural language tasks.
A request to create a non-hosted application 302 may be received. The creation request may include many of the aforementioned configuration features or parameters, such as an identity provider, implementing or enabling various analytics collection, associating or specifying various associated data repositories, and enabling/specifying various custom features (e.g., actions, style, etc., as discussed above). Request handling 300 may be invoked by control plane 212 (which may be invoked by front-end 211, not illustrated) to perform the request and create, in application metadata 310, configuration information for non-hosted application 312. Various features of a non-hosted application can be changed in subsequent requests (not illustrated), such as adding or removing data repositories, adding, modifying, or removing custom features, or various other features of the non-hosted application. For a non-hosted application, application provisioning 320 may still allocate application identifiers and/or other information, as indicated at 321. Non-hosted generative language application 352 may then invoke natural language generative application service 210 via front-end 211 to perform different tasks (e.g., responsive to end user interactions 354) using the provided identifier, as indicated at 356. Although not illustrated, interactions with an identity provider may be performed prior to performing interactions 356 (e.g., by application 352 interacting with an identity provider system/service directly). The end user identity, having been determined by the identity provider (e.g., using sign-on or another end user identification procedure), may be included in interactions 356 so that they are specific to the identified end user.
For a request to create a hosted application 304, request handling 300 may initiate application creation 305, and application provisioning 320 may provision computing resources 330 and a network endpoint for accessing the generative natural language application 332 (which may be configured according to various options supported by application management 218) in addition to adding hosted application metadata 314. For example, the creation request 304 may include many of the aforementioned configuration features or parameters, such as an identity provider, implementing or enabling various analytics collection, associating or specifying various associated data repositories, and enabling/specifying various custom features (e.g., actions, style, etc., as discussed above). Various features of a hosted application can be changed in subsequent requests (not illustrated), such as adding or removing data repositories, adding, modifying, or removing custom features, or various other features of the hosted application. Application provisioning 320 may obtain (e.g., from a computing service of provider network 200) computing resources 330 (e.g., virtual computing resources to serve as a host system) and build a generative natural language application 332 according to the provided configuration features. For example, different software components corresponding to the different selected features can be obtained and integrated based on the application specific information (e.g., identified data repositories, identified data retrievers, identity provider, and so on). Then an executable form (e.g., a compiled, assembled, or otherwise built form) of the generative application may be installed on the provisioned computing resources as generative natural language application 332. A network endpoint (e.g., a network address, such as a URL) may be provided so that end users can access generative natural language application 332.
Once created, generative natural language application 332 may be ready to accept end user requests 344 and interact 346 with natural language generative application service 210 via front-end 211. An example interaction flow is described below. An end user visits the hosted generative application (e.g., web app) network endpoint for the first time and gets directed to the login page of the configured identity provider, where the end user enters their username and password. Upon successful authentication, the end user is directed to obtain access credentials for generative natural language application 332 (e.g., using the SAMLRedirect API, where the identity provider provides the SAMLAssertion certification, then calling the STS (Security Token Service) assumeRoleWithSAML using the SAMLAssertion to obtain SigV4 credentials (AccessKey, SecretKey)). The obtained credentials may be valid for a period of time (e.g., 1 hour), allowing the end user access to generative natural language application 332. The end user is then directed to the home page for final authentication and credential handling (e.g., using cookies or other session-preserving information). An authentication token may be obtained and used to establish a connection for interactive features (e.g., a WebSocket chat connection to front-end 211) and event streaming by signing all calls with these credentials and storing them in browser memory for further use until they expire.
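As a hedged sketch of the credential-exchange step only, the STS AssumeRoleWithSAML operation could be invoked as follows (shown here via boto3); the role and identity-provider ARNs and the assertion value are placeholder assumptions, not values defined by this description.

```python
# A minimal sketch; ARNs and the assertion are hypothetical placeholders.
import boto3

saml_assertion = "<base64 SAMLResponse from the identity provider>"  # placeholder

sts = boto3.client("sts")
resp = sts.assume_role_with_saml(
    RoleArn="arn:aws:iam::123456789012:role/GenerativeAppEndUser",      # hypothetical
    PrincipalArn="arn:aws:iam::123456789012:saml-provider/ExampleIdP",  # hypothetical
    SAMLAssertion=saml_assertion,
    DurationSeconds=3600,  # e.g., credentials valid for 1 hour
)
creds = resp["Credentials"]  # AccessKeyId, SecretAccessKey, SessionToken
# Subsequent calls to front-end 211 would be SigV4-signed with these credentials.
```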
For example, ingestion 410 may implement different connectors (e.g., software components that interact with or are deployed as agents) on a data source 401. Data source 401 may be various types of data storage, processing, messaging, streaming, or other information sources, internal to or external to provider network 200, as noted above. Different connectors may implement different respective file interpreters, parsers, crawlers, or other features that can interpret and obtain information from data source 401 to include in an index. For example, ingestion 410 may extract both metadata descriptive of data objects (e.g., document-wide metadata describing author, title, publisher, etc.) and the data itself (e.g., as document text passages). Once obtained, the ingested data 412 may be provided to index generation 420 for index creation.
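For illustration only (the structure and function names below are assumptions, not the connectors' actual interfaces), a toy connector for a plain-text source might yield document-wide metadata together with extracted passages:

```python
# A hedged sketch of a connector's output; names are assumed.
from dataclasses import dataclass, field

@dataclass
class IngestedDocument:
    metadata: dict                         # e.g., {"title": ..., "author": ...}
    passages: list = field(default_factory=list)

def crawl_plaintext_source(files: dict) -> list:
    """Toy connector: `files` maps a file name to its text content."""
    docs = []
    for name, text in files.items():
        passages = [p for p in text.split("\n\n") if p.strip()]
        docs.append(IngestedDocument(metadata={"title": name}, passages=passages))
    return docs

docs = crawl_plaintext_source({"policy.txt": "Section 1 ...\n\nSection 2 ..."})
```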
Index generation 420 may implement various indexing techniques in order to perform searches for data when performing a natural language task, as discussed below with regard to
A request to add a repository without indexing 404 may be performed by updating 405 the data repository metadata 430 (and may include schema information for searching/accessing the data repository). For example, the request may provide location information, such as a network address, access credentials, data format or other schema information, in order to allow a data retriever to obtain data for a retrieval pipeline when performing a natural language task, as discussed below.
Similar to data connectors discussed above, different data retrievers 530 may implement respective interface components for generating and sending data retrieval requests to respective data repositories. For example, one type of data retriever 530 can read data from one type of index and another type of data retriever 530 can read data from another type of index (or external data repository).
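One non-limiting way to sketch such per-repository retriever interfaces (class and method names assumed for illustration) is the following, with one implementation reading from a service index and another stubbing access to an external repository:

```python
# A hedged sketch of per-repository retriever interfaces; names are assumed.
from abc import ABC, abstractmethod

class DataRetriever(ABC):
    @abstractmethod
    def search(self, query: str, limit: int = 5) -> list:
        """Return relevant passages from one kind of data repository."""

class ServiceIndexRetriever(DataRetriever):
    def __init__(self, index):
        self.index = index  # e.g., an index built during ingestion and indexing

    def search(self, query: str, limit: int = 5) -> list:
        return self.index.retrieve(query, k=limit)

class ExternalRepositoryRetriever(DataRetriever):
    def __init__(self, endpoint, schema):
        # Location and schema information supplied when the retriever was added.
        self.endpoint, self.schema = endpoint, schema

    def search(self, query: str, limit: int = 5) -> list:
        # Would translate the query using the provided schema and call the
        # external repository's own search interface; stubbed in this sketch.
        raise NotImplementedError
```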
Requests to invoke selected retrievers 506 may cause retrieval request handler 510 to dispatch queries 514 to the selected data retrievers 530, which may access 532 data in data repository(ies) 540. The schema information for some data repositories may be known (as it may have been generated when ingesting and indexing the data repository). For other data repositories which are external or otherwise not indexed/ingested, schema information (which may be provided as part of an add retriever request 501) may be used to search and return relevant data. As discussed below with regard to
A natural language request for a natural language task may be received, as indicated at 602. Task orchestration workflow 600 may implement conversation history 610. Conversation history 610 may obtain (if any) past conversations in order to perform decontextualization. For example, a user identifier and/or session identifier may be used to perform a query/search on session store 214 for other requests performed for an end user of the generative application. A number of past sessions may be obtained (if any exist). The number may, in some embodiments, be determined according to a window of past conversations, turns, or other tasks, out of a larger number of stored conversations, turns, or tasks (e.g., the n most recent conversations). The conversation data may be obtained and provided for further processing. If no conversation history exists, then an entry, data structure, or file may be created to store conversation history (including the current natural language request and task 602).
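A minimal sketch of such windowed conversation-history handling, assuming a simple in-memory stand-in for the session store, might be:

```python
# A hedged sketch; the session-store interface is an assumed placeholder.
from collections import deque

WINDOW = 5  # e.g., the n most recent turns used for decontextualization

def load_history(session_store: dict, session_id: str) -> deque:
    # Creates an entry for the session if no conversation history exists yet.
    turns = session_store.setdefault(session_id, [])
    return deque(turns[-WINDOW:], maxlen=WINDOW)

def record_turn(session_store: dict, session_id: str, request: str, response: str):
    session_store.setdefault(session_id, []).append(
        {"request": request, "response": response}
    )
```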
Intent classification model 620, similar to intent classification model 122 in
Application principal store 636 may be used to provide local user credentials or information to be used when retrieving data at data retrieval 634 (according to the techniques discussed below with regard to
In various embodiments, prompt generation 640 may implement a rules-based prompt generator which, according to a classification type, may generate a prompt (e.g., by completing a corresponding prompt template for each classification type) with the request and, if applicable, relevant data retrieved at pipeline 630 and the request rewritten at 632. Generative machine learning model 650 may be trained to generate natural language responses to prompts generated at 640. In some embodiments, generative machine learning model 650 may be an LLM, including a privately developed or maintained Foundation Model (FM), which may use millions or billions of parameters in order to generate a response to the prompt. As part of the prompt, a requirement may be included to use the provided relevant data (retrieved via pipeline 630) so that generative machine learning model 650 may not return a response that hallucinates. Generative machine learning model 650 may be hosted as part of natural language generative application service 210, or hosted as a separate service of provider network 200. In some embodiments, generative application creation may support selecting a particular generative machine learning model out of multiple available models, including ones hosted externally to provider network 200.
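By way of a hedged illustration (the template wording is assumed, not the service's actual prompts), a rules-based prompt generator keyed on classification type might complete templates as follows:

```python
# A minimal sketch; template text and intent keys are assumptions.
TEMPLATES = {
    "instruction": "Perform this instruction.\n\nInstruction: {request}",
    "question": (
        "Answer using ONLY the data below.\n\nData:\n{data}\n\n"
        "Question: {request}"
    ),
    "keyword": (
        "The user searched for a term. Summarize the relevant data below.\n\n"
        "Data:\n{data}\n\nTerm: {request}"
    ),
}

def generate_prompt(intent: str, request: str, data: str = "") -> str:
    # Complete the template corresponding to the classification type.
    return TEMPLATES[intent].format(request=request, data=data)
```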
A result of generative machine learning model 650 may then be evaluated 660 for completion (e.g., whether the last part of a multi-part question has been performed) and for validity (if the result is not valid, an error or other failure indication may be sent). For example, natural language task orchestration 213 may track the number of parts of a task completed and return to earlier stages in pipeline 630 to perform additional stages (e.g., based on the output of a prior part, or not based on prior output).
In some embodiments, sources may be attributed 670 for retrieved data used to generate the result. For example, as discussed above, annotations or other indications of the retrieved documents (e.g., based on document-wide metadata from which retrieved document passages are obtained) may be used to annotate the response. In some embodiments, an additional machine learning model trained to detect profane or otherwise inappropriate content may be invoked on the result to ensure that the result is not invalid due to inappropriate content. In some embodiments, a response 604 indicating that the question cannot be answered (e.g., due to an inappropriate result or a lack of relevant data to provide from the retrieval pipeline) may be sent. Otherwise, response 604 may be sent based on the generated response from generative machine learning model 650.
As discussed above, in some embodiments, generative natural language service may help to enforce access restrictions when accessing data repositories so that end users do not obtain access to information for which they are not authorized to access (e.g., as multiple different end users of a generative application may have varying levels of access rights).
Data access management 217 may provide principal information in order to determine which data may be accessed when performing a natural language task. Data crawler(s) 732 (which may be implemented as part of ingestion 410) may crawl data in data source(s) 720. Data may be obtained, formatted, and stored, as indicated at 742, in indexed data repository(ies) 740. Additionally, access control data 744 (e.g., access control lists of users for accessing individual documents) may be obtained by data crawler(s) 732.
Identity crawler(s) 734 may be implemented (e.g., as part of data access management 217) and access data source(s) 720 to obtain local user information and local group information (which may be obtained or provided by data source(s) 720 by accessing data source(s) identity management 710). Identity crawler(s) 734 may be used to create principal store 750 for an application. Service user(s) 752 may be created that are service-level, aggregating different local users (for the same user) to corresponding source(s) 753, and thus may act as a global federated user identity for end users of a generative application. Group membership(s) 754 of the service user 752 may also be stored. Service group(s) 756 may also be stored, indicating the group user(s) 757 that are in the group. In some embodiments, maintenance operations to update application principal data store 750 based on user changes may be performed (e.g., by listening for or receiving notifications of user changes from data source(s) 720).
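For illustration (field names assumed, not the store's actual layout), the principal store entries described above might be structured as follows, with one service user federating that person's local identities across data sources:

```python
# A hedged sketch of principal store entries; field names are assumptions.
from dataclasses import dataclass, field

@dataclass
class ServiceUser:  # cf. service user(s) 752
    service_id: str
    local_users: dict = field(default_factory=dict)  # data source -> local id (cf. 753)
    groups: set = field(default_factory=set)         # group memberships (cf. 754)

@dataclass
class ServiceGroup:  # cf. service group(s) 756
    name: str
    members: set = field(default_factory=set)        # group user(s) (cf. 757)

alice = ServiceUser("alice", {"wiki": "a.smith", "crm": "asmith42"}, {"sales"})
```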
The techniques discussed above may improve access control performance in various scenarios. For example, an application environment may use data repositories that are accessible using different identity providers and multiple applications (which can serve as data sources) linked to the identity providers. In such scenarios, a user can be present in both identity providers, resulting in a different identifier for each identity provider. Moreover, each application may also have its own user identifier local to that application for the user, resulting in multiple user-identity and user-local user connections. These connections may be captured in the application principal store, allowing data retrieval to identify and use the appropriate local user restrictions based on the service user's local user mapping 753, group membership 754, and group user(s) 757 information with respect to access control data 744 maintained and enforced for different indexed data repositories.
Although
Various different systems and devices may implement the various methods and techniques described below, either singly or working together. For example, a natural language generative application service such as described above with regard to
As indicated at 810, a natural language request may be received via an interface of a generative machine learning service to perform a natural language task for a generative natural language application using one or more data repositories associated with the generative natural language application, in some embodiments. For example, a hosted or non-hosted generative application may send a request to an interface of the generative machine learning service (e.g., via an API) to perform the natural language task. The request may include or be identified with an existing session (e.g., an existing or ongoing chat) using network communication features, such as tokens and/or cookies, and utilizing bi-directional communication protocols, in some embodiments.
As indicated at 820, a classification machine learning model, trained to determine intents of natural language requests, may be caused to determine an intent for the natural language request, in some embodiments. For example, as discussed above with regard to
As indicated at 830, a number of iterations of a retrieval pipeline may be determined to perform the natural language task of the natural language request based, at least in part, on the intent for the natural language request, in some embodiments. For example, some intents may not require data retrieval, whereas other intents (e.g., multi-part, instruction, query/question, keyword) may need one or more retrieval iterations. An orchestration system, such as natural language task orchestrator 213, may be implemented as part of the generative machine learning service and may track progress and ensure that the determined number of iterations is performed.
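A minimal sketch of elements 820 and 830 might look like the following, assuming a classifier callable that returns per-intent probabilities; the particular intent labels and iteration counts below are illustrative assumptions, not an actual service mapping:

    def classify_intent(request_text: str, classifier) -> str:
        """Ask the trained classification model for the most likely intent."""
        scores = classifier(request_text)     # e.g., {"small_talk": 0.7, ...}
        return max(scores, key=scores.get)

    def retrieval_iterations(intent: str) -> int:
        """Map an intent to a number of retrieval pipeline iterations."""
        if intent == "small_talk":
            return 0                          # no data retrieval needed
        if intent == "multi_part":
            return 3                          # e.g., one pass per sub-request
        return 1                              # instruction, query/question, keyword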
As indicated at 840, the natural language request may be processed through the retrieval pipeline according to the determined number of iterations, in some embodiments. As discussed in detail above with regard to
Using the retrieved data (if obtained), a prompt may be generated and provided to a generative machine learning model. As indicated at 850, a response to the natural language request, based at least in part on a result received from the generative machine learning model, may be returned via the interface of the generative machine learning service, in some embodiments. Other post-result processing may also be performed, as discussed above, including source attribution, validation, verification that the response is appropriate, or checks for completeness of the task processing workflow, in some embodiments.
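Putting elements 840 and 850 together, a hedged end-to-end sketch (reusing retrieval_iterations() above, with the retriever and generative model as assumed stubs rather than real APIs) might read:

    def perform_task(request_text: str, intent: str, retriever, generative_model) -> str:
        retrieved: list[str] = []
        query = request_text
        for _ in range(retrieval_iterations(intent)):
            passages = retriever(query)      # one pass of the retrieval pipeline
            retrieved.extend(passages)
            # Refine the next pass with what has been found so far
            # (one possible strategy, assumed for the sketch).
            query = request_text + "\n" + "\n".join(passages)
        prompt = "Context:\n" + "\n".join(retrieved) + "\n\nTask: " + request_text
        return generative_model(prompt)      # result returned at 850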
As indicated at 920, a service-level principal store may be created that maps service users to local users and local groups found in the local user information and local group information, in some embodiments. For example, service users may be treated as a global, federated user mapping for end users of a generative application. In this way, the service-level principal store can provide local user information to enforce access controls across a number of different data repositories that may be involved in performing a natural language task, as discussed above.
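Reusing the PrincipalStore and ServiceUser sketches above, element 920 might be approximated as follows; matching local users for the same person by email address is purely an illustrative assumption, as any stable cross-source attribute could serve:

    def build_principal_store(crawled_identities) -> PrincipalStore:
        """crawled_identities: iterable of (source, local_user_id, email) tuples."""
        store = PrincipalStore()
        for source, local_user_id, email in crawled_identities:
            user = store.service_users.setdefault(
                email, ServiceUser(service_user_id=email))
            user.local_users[source] = local_user_id   # local user mapping 753
        return store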
As indicated at 930, the service-level principal store may be provided for enforcing local access control at data repositories that store data from the data sources when performing natural language requests from the natural language generative application, in some embodiments. For example, as indicated in
The methods described herein may in various embodiments be implemented by any combination of hardware and software. For example, in one embodiment, the methods may be implemented by a computer system (e.g., a computer system as in
Embodiments of intent classification for executing a retrieval augmented generation pipeline for natural language tasks using a generative machine learning model as described herein may be executed on one or more computer systems, which may interact with various other devices. One such computer system is illustrated by
In the illustrated embodiment, computer system 1000 includes one or more processors 1010 coupled to a system memory 1020 via an input/output (I/O) interface 1030. Computer system 1000 further includes a network interface 1040 coupled to I/O interface 1030, and one or more input/output devices 1050, such as cursor control device 1060, keyboard 1070, and display(s) 1080. Display(s) 1080 may include standard computer monitor(s) and/or other display systems, technologies or devices. In at least some implementations, the input/output devices 1050 may also include a touch- or multi-touch enabled device such as a pad or tablet via which a user enters input via a stylus-type device and/or one or more digits. It is contemplated that some embodiments may be implemented using a single instance of computer system 1000, while in other embodiments multiple such systems, or multiple nodes making up computer system 1000, may host different portions or instances of embodiments. For example, in one embodiment some elements may be implemented via one or more nodes of computer system 1000 that are distinct from those nodes implementing other elements.
In various embodiments, computer system 1000 may be a uniprocessor system including one processor 1010, or a multiprocessor system including several processors 1010 (e.g., two, four, eight, or another suitable number). Processors 1010 may be any suitable processor capable of executing instructions. For example, in various embodiments, processors 1010 may be general-purpose or embedded processors implementing any of a variety of instruction set architectures (ISAs), such as the x86, PowerPC, SPARC, or MIPS ISAs, or any other suitable ISA. In multiprocessor systems, each of processors 1010 may commonly, but not necessarily, implement the same ISA.
In some embodiments, at least one processor 1010 may be a graphics processing unit. A graphics processing unit or GPU may be considered a dedicated graphics-rendering device for a personal computer, workstation, game console or other computing or electronic device. Modern GPUs may be very efficient at manipulating and displaying computer graphics, and their highly parallel structure may make them more effective than typical CPUs for a range of complex graphical algorithms. For example, a graphics processor may implement a number of graphics primitive operations in a way that makes executing them much faster than drawing directly to the screen with a host central processing unit (CPU). In various embodiments, graphics rendering may, at least in part, be implemented by program instructions configured for execution on one of, or parallel execution on two or more of, such GPUs. The GPU(s) may implement one or more application programmer interfaces (APIs) that permit programmers to invoke the functionality of the GPU(s). Suitable GPUs may be commercially available from vendors such as NVIDIA Corporation, ATI Technologies (AMD), and others.
System memory 1020 may store program instructions and/or data accessible by processor 1010. In various embodiments, system memory 1020 may be implemented using any suitable memory technology, such as static random access memory (SRAM), synchronous dynamic RAM (SDRAM), nonvolatile/Flash-type memory, or any other type of memory. In the illustrated embodiment, program instructions and data implementing desired functions, such as those described above, are shown stored within system memory 1020 as program instructions 1025 and data storage 1035, respectively. In other embodiments, program instructions and/or data may be received, sent or stored upon different types of computer-accessible media or on similar media separate from system memory 1020 or computer system 1000. Generally speaking, a non-transitory, computer-readable storage medium may include storage media or memory media such as magnetic or optical media, e.g., disk or CD/DVD-ROM, coupled to computer system 1000 via I/O interface 1030. Program instructions and data stored via a computer-readable medium may be transmitted by transmission media or signals such as electrical, electromagnetic, or digital signals, which may be conveyed via a communication medium such as a network and/or a wireless link, such as may be implemented via network interface 1040.
In one embodiment, I/O interface 1030 may coordinate I/O traffic between processor 1010, system memory 1020, and any peripheral devices in the device, including network interface 1040 or other peripheral interfaces, such as input/output devices 1050. In some embodiments, I/O interface 1030 may perform any necessary protocol, timing or other data transformations to convert data signals from one component (e.g., system memory 1020) into a format suitable for use by another component (e.g., processor 1010). In some embodiments, I/O interface 1030 may include support for devices attached through various types of peripheral buses, such as a variant of the Peripheral Component Interconnect (PCI) bus standard or the Universal Serial Bus (USB) standard, for example. In some embodiments, the function of I/O interface 1030 may be split into two or more separate components, such as a north bridge and a south bridge, for example. In addition, in some embodiments some or all of the functionality of I/O interface 1030, such as an interface to system memory 1020, may be incorporated directly into processor 1010.
Network interface 1040 may allow data to be exchanged between computer system 1000 and other devices attached to a network, such as other computer systems, or between nodes of computer system 1000. In various embodiments, network interface 1040 may support communication via wired or wireless general data networks, such as any suitable type of Ethernet network, for example; via telecommunications/telephony networks such as analog voice networks or digital fiber communications networks; via storage area networks such as Fibre Channel SANs, or via any other suitable type of network and/or protocol.
Input/output devices 1050 may, in some embodiments, include one or more display terminals, keyboards, keypads, touchpads, scanning devices, voice or optical recognition devices, or any other devices suitable for entering or retrieving data by one or more computer systems 1000. Multiple input/output devices 1050 may be present in computer system 1000 or may be distributed on various nodes of computer system 1000. In some embodiments, similar input/output devices may be separate from computer system 1000 and may interact with one or more nodes of computer system 1000 through a wired or wireless connection, such as over network interface 1040.
As shown in
Those skilled in the art will appreciate that computer system 1000 is merely illustrative and is not intended to limit the scope of the techniques as described herein. In particular, the computer system and devices may include any combination of hardware or software that can perform the indicated functions, including a computer, personal computer system, desktop computer, laptop, notebook, or netbook computer, mainframe computer system, handheld computer, workstation, network computer, a camera, a set top box, a mobile device, network device, internet appliance, PDA, wireless phones, pagers, a consumer device, video game console, handheld video game device, application server, storage device, a peripheral device such as a switch, modem, router, or in general any type of computing or electronic device. Computer system 1000 may also be connected to other devices that are not illustrated, or instead may operate as a stand-alone system. In addition, the functionality provided by the illustrated components may in some embodiments be combined in fewer components or distributed in additional components. Similarly, in some embodiments, the functionality of some of the illustrated components may not be provided and/or other additional functionality may be available.
Those skilled in the art will also appreciate that, while various items are illustrated as being stored in memory or on storage while being used, these items or portions of them may be transferred between memory and other storage devices for purposes of memory management and data integrity. Alternatively, in other embodiments some or all of the software components may execute in memory on another device and communicate with the illustrated computer system via inter-computer communication. Some or all of the system components or data structures may also be stored (e.g., as instructions or structured data) on a computer-accessible medium or a portable article to be read by an appropriate drive, various examples of which are described above. In some embodiments, instructions stored on a non-transitory, computer-accessible medium separate from computer system 1000 may be transmitted to computer system 1000 via transmission media or signals such as electrical, electromagnetic, or digital signals, conveyed via a communication medium such as a network and/or a wireless link. Various embodiments may further include receiving, sending or storing instructions and/or data implemented in accordance with the foregoing description upon a computer-accessible medium. Accordingly, the present invention may be practiced with other computer system configurations.
It is noted that any of the distributed system embodiments described herein, or any of their components, may be implemented as one or more web services. For example, leader nodes within a data warehouse system may present data storage services and/or database services to clients as network-based services. In some embodiments, a network-based service may be implemented by a software and/or hardware system designed to support interoperable machine-to-machine interaction over a network. A network-based service may have an interface described in a machine-processable format, such as the Web Services Description Language (WSDL). Other systems may interact with the web service in a manner prescribed by the description of the network-based service's interface. For example, the network-based service may define various operations that other systems may invoke, and may define a particular application programming interface (API) to which other systems may be expected to conform when requesting the various operations.
In various embodiments, a network-based service may be requested or invoked through the use of a message that includes parameters and/or data associated with the network-based services request. Such a message may be formatted according to a particular markup language such as Extensible Markup Language (XML), and/or may be encapsulated using a protocol such as Simple Object Access Protocol (SOAP). To perform a web services request, a network-based services client may assemble a message including the request and convey the message to an addressable endpoint (e.g., a Uniform Resource Locator (URL)) corresponding to the web service, using an Internet-based application layer transfer protocol such as Hypertext Transfer Protocol (HTTP).
In some embodiments, web services may be implemented using Representational State Transfer (“RESTful”) techniques rather than message-based techniques. For example, a web service implemented according to a RESTful technique may be invoked through parameters included within an HTTP method such as PUT, GET, or DELETE, rather than encapsulated within a SOAP message.
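As a small, hypothetical contrast to the message-based style above, a RESTful invocation carries its parameters in the HTTP method and URL rather than in a SOAP envelope; the endpoint and parameters below are assumptions for the sketch:

    import urllib.parse
    import urllib.request

    def rest_get(base_url: str, **params) -> bytes:
        """Invoke a RESTful operation via HTTP GET with URL-encoded parameters."""
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as response:
            return response.read()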
The various methods as illustrated in the FIGS. and described herein represent example embodiments of methods. The methods may be implemented in software, hardware, or a combination thereof. The order of the methods may be changed, and various elements may be added, reordered, combined, omitted, modified, etc.
Various modifications and changes may be made as would be obvious to a person skilled in the art having the benefit of this disclosure. It is intended that the invention embrace all such modifications and changes and, accordingly, the above description is to be regarded in an illustrative rather than a restrictive sense.