The present disclosure relates generally to machine learning (ML), and more particularly to systems for processing complex or compound queries using large language models (LLMs).
ML language models, including large language models, are commonly used to generate responses to queries. In some cases, complex or compound queries can be addressed by decomposing the queries into a series of separate steps or tasks performed sequentially by a language model. In such systems, the language model used is preferably a large language model trained on a diverse range of inputs for many purposes, so as to offer the versatility needed to identify and execute steps or tasks of any type needed to address the complex or compound query or prompt.
This disclosure presents a method of processing a compound prompt. This method includes mapping a domain to each of several distinct machine learning models (MLMs) and decomposing the compound prompt into a plan having several steps. For each step, one of the MLMs is selected by matching the step to a corresponding mapped domain, and a language output is generated using the selected MLM. These outputs are integrated into a syntactically and semantically coherent final output using a large language model.
This disclosure also presents a system for generating a response to a complex prompt. This system includes a manager and a plurality of specialized large language models (LLMs), each having a corresponding domain of specialization. The manager includes a planner, a selection module, and an integration module. The planner is configured to decompose the complex prompt into a plan having a plurality of steps. The selection module is configured to select one of the plurality of specialized LLMs to execute each of the plurality of steps. The integration module is configured to generate a language output responsive to the complex prompt from outputs of each of the selected ones of the plurality of specialized LLMs.
The present summary is provided only by way of example, and not limitation. Other aspects of the present disclosure will be appreciated in view of the entirety of the present disclosure, including the entire text, claims, and accompanying figures.
While the above-identified figures set forth one or more embodiments of the present disclosure, other embodiments are also contemplated, as noted in the discussion. In all cases, this disclosure presents the invention by way of representation and not limitation. It should be understood that numerous other modifications and embodiments can be devised by those skilled in the art, which fall within the scope and spirit of the principles of the invention. The figures may not be drawn to scale, and applications and embodiments of the present invention may include features and components not specifically shown in the drawings.
The present disclosure presents methods and systems for processing complex or compound queries or prompts using a system including multiple separately specialized large language models. The term “compound prompt” can refer to a prompt explicitly including multiple separate tasks or steps, e.g., “(1) identify the three highest-selling jazz musicians of the 1970s, and then (2) generate a report comparing the musical styles of these three musicians.” The term “complex prompt” can refer more generally to any prompt requiring multiple steps to process, either implicitly or explicitly, e.g., “generate a report comparing the musical styles of the three highest-selling jazz musicians of the 1970s.” Although distinctions can be drawn between complex and compound prompts, this disclosure will treat complex and compound prompts as equivalent for the purposes of explanation except where specifically stated.
Machine learning (ML) models are increasingly used to process complex or compound queries. Complex prompts can, for example, demand retrieval of data from multiple or specialized sources, assembly of outputs (e.g., natural language, computer code, lists) from the retrieved data based on identified criteria, and/or subsequent processing of those outputs (e.g., transmission or archival to specified categories, locations, and/or recipients). Existing solutions to complex prompt processing use large language model (LLM) planners to semantically decompose such prompts into multiple steps, then execute those steps either using the same LLM, or using native functions or databases. Tools for such approaches include Semantic Kernel and LangChain. Although complex prompts most often involve sequential steps, each contingent upon the results or outputs of previous steps, some complex prompts can also or instead include parallel tasks or steps, i.e., tasks or steps not contingent upon the results of other tasks or steps. A dependency-aware plan representation of this kind is sketched below.
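By way of non-limiting illustration only (this sketch is not part of the disclosed system), a decomposed plan that supports both sequential and parallel steps can be represented as a list of steps, each naming the steps whose outputs it depends on; steps with no unmet dependencies can then be executed in parallel:

```python
# Illustrative sketch only: Step and parallel_batches are hypothetical names,
# not elements of the disclosed system.
from dataclasses import dataclass, field

@dataclass
class Step:
    step_id: str
    prompt: str
    depends_on: list[str] = field(default_factory=list)

def parallel_batches(steps: list[Step]) -> list[list[Step]]:
    """Group steps into batches; steps within a batch are mutually independent."""
    done: set[str] = set()
    remaining = list(steps)
    batches: list[list[Step]] = []
    while remaining:
        ready = [s for s in remaining if all(d in done for d in s.depends_on)]
        if not ready:
            raise ValueError("cyclic dependency in plan")
        batches.append(ready)
        done.update(s.step_id for s in ready)
        remaining = [s for s in remaining if s.step_id not in done]
    return batches

# The compound prompt example from above, decomposed into sequential steps:
plan = [
    Step("1", "identify the three highest-selling jazz musicians of the 1970s"),
    Step("2", "generate a report comparing the musical styles of these three musicians",
         depends_on=["1"]),
]
print(parallel_batches(plan))  # two batches: step 2 waits on step 1
```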
The present disclosure presents systems and methods for improving accuracy and reliability in complex prompt response through the use of multiple specialized machine learning models (MLMs). The present disclosure focuses principally on the use of specialized LLMs, but a person skilled in the art will understand that the approaches set forth below are largely generalizable to other specialist MLMs, except where otherwise noted. In the most general case, this disclosure presents a system whereby general query processing is handled by a manager using a generalist or “jack-of-all-trades” LLM or Meta-Language Model, but steps falling within specialized domains are advantageously handled by dedicated, specially-trained specialist LLMs selected by the manager.
Processor 102 is a logic-capable device configured to execute software, applications, and/or programs stored on memory 104. Examples of processor 102 can include one or more of a processor, a microprocessor, a controller, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other equivalent discrete or integrated logic circuitry. Processor 102 can be entirely or partially mounted on one or more circuit boards.
Memory 104 is a machine-readable storage medium configured to store information including complex prompt handling system 110, and can most generally include both transitory and non-transitory storage media. In some examples, a computer-readable storage medium can include a non-transitory information storage medium. The term “non-transitory” can indicate that the storage medium is not embodied in a carrier wave or a propagated signal. In certain examples, a non-transitory storage medium can store data that can, over time, change (e.g., in RAM or cache). In some examples, memory 104 is or includes a temporary memory. As used herein, a temporary memory refers to a memory having a primary purpose that is not long-term storage. Memory 104, in some examples, is described as volatile memory. As used herein, a volatile memory refers to a memory that does not maintain stored contents when power to memory 104 is turned off. Examples of volatile memories can include random access memories (RAM), dynamic random access memories (DRAM), static random access memories (SRAM), and other forms of volatile memories. In some examples, the memory is used to store program instructions for execution by the processor. The memory, in one example, is used by software or applications running on computer hardware device 100 (e.g., complex prompt handling system 110) to temporarily store information during program execution. Memory 104, in some examples, also includes one or more persistent computer-readable storage media. Examples of such persistent, non-volatile storage elements can include, for example, magnetic hard discs, optical discs, floppy discs, flash memories, or forms of electrically programmable memories (EPROM) or electrically erasable and programmable (EEPROM) memories.
User interface 106 is an input and/or output device and/or software interface, and enables an operator, such as user 108, to control operation of and/or interact with software elements of computer hardware device 100. For example, user interface 106 can be configured to receive inputs from an operator and/or provide outputs. Most relevantly for the present disclosure, user interface 106 provides means for user 108 to supply complex prompt 120 to computer hardware device 100. User interface 106 can, for example, be a local interface including one or more of a sound card, a video graphics card, a speaker, a display device (such as a liquid crystal display (LCD), a light emitting diode (LED) display, an organic light emitting diode (OLED) display, etc.), a touchscreen, a keyboard, a mouse, a joystick, or other type of device for facilitating input and/or output of information in a form understandable to users and/or machines. In other mutually consistent embodiments, user interface 106 can additionally or alternatively include means such as a wired or wireless transceiver for communicating with a remote user, e.g., through a remote device and/or over a local or wide area network.
Computer hardware device 100 receives complex prompt 120 from user 108 via user interface 106, as noted above. Complex prompt 120 is stored in memory 104 as an input of complex prompt handling system 110 (see
Network 140 can be a local area network (LAN), such as a network connecting computer hardware device 100 to local user devices and/or databases, or can be a wide-area network (WAN) suitable for connecting computer hardware device 100 to servers and other computing components separated by greater geographic distances than the devices of a local network. Network 140 can include network infrastructure for connecting devices separated by larger geographic distances. In at least some examples, network 140 is the internet. For illustrative purposes, computer hardware device 100 can communicate with remote devices 142 and 144 via network 140. More generally, any number of remote devices can be communicatively connected to computer hardware device 100 via one or multiple networks 140.
As illustrated in
Manager 200 (with its various constituent modules 202-210) forms the core of complex prompt handling system 110 and is responsible both for initial processing of complex prompts and for final assembly of outputs responsive to those complex prompts. Specialist models 220 (used herein to refer to any or all specialist models 220a-n) are MLMs, and in most embodiments specifically LLMs, that are trained for competence in specific domains. Advantageously, manager 200 delegates specific tasks necessary for the execution of each complex prompt 120 to appropriate specialist models 220, thereby providing advantages from specialized training (efficiency, reliability, reduced hallucination, etc.) at each step, within a generalist overall system.
As noted above, specialist models 220 are MLMs such as LLMs that are trained for efficiency and reliability within specific domains. As illustrative examples, specialist model 220a can be a model fine-tuned to scrape data from websites, specialist model 220b can be a model trained to generate emails matching the style of a particular individual (e.g., from training data including a set of that person's sent emails), and specialist model 220d can be an MLM trained to perform chemical reaction mathematics. Specialist models 220 need not have anything in common with each other save their accessibility to manager 200. In
As noted above, manager 200 includes planner 202, selection module 204, integration module 208, and model 210. Model 210 is an LLM trained to process natural language prompts. In various embodiments, model 210 can be used by various functional components of manager 200, including planner 202, selection module 204, and/or integration module 208. Although planner 202, selection module 204, and integration module 208 are described separately below in terms of function for the purpose of explanation, in some embodiments many functions of planner 202, selection module 204, and/or integration module 208 may be performed by providing prompts or context injections to model 210, i.e., to a single shared generalist LLM used by manager 200. In other embodiments, however, manager 200 can include multiple models 210, each dedicated specifically to a subset of modules 202, 204, and/or 208.
Planner 202 is a semantic processing module capable of decomposing a complex prompt into a plurality of steps. Planner 202 can, for example, be a planner such as those used in conventional systems such as Semantic Kernel or LangChain. In the most general case, planner 202 can be any suitable natural language processing (NLP) agent capable of identifying a plurality of actionable tasks (i.e., steps) for the resolution of complex prompt 120. Planner 202 can, for example, make use of model 210 for generative production of a response to complex prompt 120 that identifies these actionable tasks. In some embodiments planner 202 can immediately output a plan (i.e., several steps) in direct response to the complex prompt. In other embodiments planner 202 can assemble the plan over several iterations, e.g., by identifying an initial task or set of tasks insufficient to completely address the complex prompt, then supplementing this partial plan with an additional step or steps in response to completion of the initial task or tasks.
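As a non-limiting sketch of the planner function described above, the following assumes a generate() callable wrapping model 210; the function names and prompt template are illustrative placeholders rather than elements of the disclosure:

```python
# Hypothetical sketch of planner 202: decompose a complex prompt into steps
# by prompting a generalist LLM. generate() is a placeholder for model 210.
from typing import Callable

PLAN_TEMPLATE = (
    "Decompose the following request into a numbered list of minimal, "
    "actionable steps, one per line.\n\nRequest: {prompt}"
)

def decompose(complex_prompt: str, generate: Callable[[str], str]) -> list[str]:
    raw = generate(PLAN_TEMPLATE.format(prompt=complex_prompt))
    # Strip leading numbering such as "1." from each returned line.
    return [line.split(".", 1)[-1].strip()
            for line in raw.splitlines() if line.strip()]
```

An iterative planner of the kind described above could instead call decompose() again after each completed task, appending that task's output to the prompt, so the plan grows as results become available.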
Selection module 204 is responsible for delegation of tasks identified via planner 202 to specific models. More specifically, selection module 204 identifies a best-suited model for the execution of each task or step from among all models (210, 220a-n) available within complex prompt handling system 110, and provides a prompt (either generated by planner 202 as a part of the plan, or generated by selection module 204 based on the plan and model record 206; see below) corresponding to the task in question to the selected model. Selection module 204 can include model record 206, which maps a domain to each specialist model 220. Model record 206 can, in some embodiments, consist of a subject-matter specialization identification for each specialist model 220, with selection module 204 performing intent identification on each step via model 210 to identify a specialist model 220 having a subject-matter specialization substantially matching that intent. In some such embodiments, model 210 can be trained in model selection through provision of a large number of (step/task) prompts labeled as suited for a particular specialist model. In alternative embodiments, functions described herein by reference to planner 202 and selection module 204 can be performed inseparably based on context of complex prompt 120 using a meta-language model (i.e., wherein model 210 is a meta-language model) trained to designate a single specialist model 220 as a part of each plan step generated from complex prompt 120.
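A minimal sketch of one possible selection mechanism follows, assuming model record 206 stores a subject-matter description per specialist; the record contents and classification prompt are illustrative assumptions, and an embedding-similarity lookup would be an equally valid substitute:

```python
# Hypothetical model record mapping specialist names to domain descriptions.
from typing import Callable

MODEL_RECORD = {
    "web_scraper": "extracting structured data from web pages",
    "email_stylist": "drafting emails in a specific person's style",
    "chem_math": "chemical reaction stoichiometry and mathematics",
}

def select_model(step: str, generate: Callable[[str], str]) -> str:
    """Ask the generalist model (a stand-in for model 210) to match a step
    to the specialist whose mapped domain best fits the step's intent."""
    options = "\n".join(f"- {name}: {desc}" for name, desc in MODEL_RECORD.items())
    answer = generate(
        f"Task: {step}\nAvailable specialist models:\n{options}\n"
        "Reply with the single best model name, or 'generalist' if none fit."
    )
    name = answer.strip().lower()
    return name if name in MODEL_RECORD else "generalist"
```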
Manager 200 delegates execution of a particular task or step to a selected specialist model 220 by providing a prompt corresponding to that task or step to the selected specialist model 220. In some embodiments, model record 206 can, in addition to mapping domains of each specialist model 220, also identify prompt rules, formats, or templates suitable for each or some specialist models 220, e.g., for context injection based on the designated model and the specific task or complex prompt 120. More generally, prompts provided to models 210 and 220 can include retrieval-augmented generation (RAG) or other context injection to reduce hallucination or otherwise constrain outputs based on templates retrieved from, or data validated through, outside data storage, e.g., lists and/or vector and/or relational databases either stored in memory 104 or retrieved from other devices 130, 142, or 144. This context injection can be provided by manager 200 based on operation of selection module 204, or can be sourced externally to complex prompt handling system 110 based on the nature of the prompt provided by selection module 204 to the selected specialist model.
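The following sketch illustrates one way such context injection might be assembled before delegation; retrieve() stands in for any vector- or relational-database lookup, and the template is a hypothetical entry of the kind model record 206 might hold:

```python
# Hypothetical context-injection step: prepend retrieved reference material
# and a per-model template to the task prompt before delegation.
from typing import Callable

def build_specialist_prompt(task: str, template: str,
                            retrieve: Callable[[str], list[str]],
                            k: int = 3) -> str:
    chunks = retrieve(task)[:k]  # e.g., top-k passages from a vector database
    return template.format(context="\n".join(chunks), task=task)

# Example template of the kind that might be stored per specialist model:
EMAIL_TEMPLATE = "Reference material:\n{context}\n\nDraft an email that: {task}"
```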
Specialist models 220 advantageously offer improved performance within their specialized domains over a generalist (general-purpose, jack-of-all-trades) model, but may be incapable outside of those specialized domains. Tasks for which selection module 204 can identify no appropriate specialist model 220 are handled by model 210, or by another generalist model. Specialist models 220 can have domains of varying breadth. In some embodiments, complex prompt handling system 110 can include both specialist models with non-overlapping domains and specialist models with overlapping domains, e.g., of different scope.
Integration module 208 is a natural language processing module configured to generate a single output responsive to the complex prompt based on the outputs of the steps of the plan generated by planner 202, as executed by specialist models 220 (and in some instances model 210) per delegation by selection module 204. Integration module 208 can, for example, include its own specialized LLM trained to aggregate these various model outputs into a semantically and syntactically coherent single output without introducing hallucinations or errors, or omitting information provided by the various specialist models 220. Alternatively, model 210 can be a generalist LLM (as described above) capable of performing this function using prompts generated by integration module 208 from the aforementioned outputs of specialist models 220 (and in some instances model 210), i.e., such that the same model 210 provides the trained machine learning backbone of integration module 208 and planner 202, selection module 204, or both. Integration module 208 can in some embodiments receive inputs from all designated specialist models 220 used in handling steps identified by planner 202. In other embodiments, where outputs of some specialist models 220 are used only to provide inputs to other specialist models 220 (see
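A minimal sketch of this integration step follows, under the same placeholder assumptions (a generate() callable standing in for the integration LLM, and a hypothetical per-output flag marking intermediate-only results):

```python
# Hypothetical integration step: merge non-intermediate step outputs into a
# single coherent response via an LLM prompt that forbids adding or dropping facts.
from typing import Callable

def integrate(complex_prompt: str,
              outputs: list[tuple[str, bool]],  # (text, intermediate_only)
              generate: Callable[[str], str]) -> str:
    finals = [text for text, intermediate_only in outputs if not intermediate_only]
    numbered = "\n\n".join(f"[{i + 1}] {t}" for i, t in enumerate(finals))
    return generate(
        f"Original request: {complex_prompt}\n\nStep results:\n{numbered}\n\n"
        "Combine these results into one coherent response. Do not add facts "
        "absent from the step results, and do not omit any of them."
    )
```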
In step 302, the training data is generated. For training the computer-implemented machine learning model(s) used in complex prompt handling system 110, training data includes domain-specific information with example inputs labeled with (i.e., mapped to or tagged with) example outputs. Data can be labeled entirely manually or, to reduce labor, can be labeled in a semi-automated fashion, e.g., using clustering or analogy to manually labeled data. Each specialist model 220 is trained on different training data, although in some embodiments some specialist models 220 can be trained on training data that is a subset of, or overlaps with, training data used to train other specialist models 220.
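As a hedged illustration of the semi-automated labeling mentioned above, unlabeled examples can inherit the label of their nearest manually labeled neighbor in embedding space; embed() is a placeholder for any sentence-embedding model, and the similarity threshold is an assumption:

```python
# Hypothetical label propagation by analogy to manually labeled seed data.
import math
from typing import Callable, Optional

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def propagate_labels(seeds: list[tuple[str, str]],       # (example, label)
                     unlabeled: list[str],
                     embed: Callable[[str], list[float]],
                     min_sim: float = 0.8) -> list[tuple[str, Optional[str]]]:
    seed_vecs = [(label, embed(text)) for text, label in seeds]
    results = []
    for text in unlabeled:
        v = embed(text)
        label, sim = max(((lbl, cosine(v, sv)) for lbl, sv in seed_vecs),
                         key=lambda pair: pair[1])
        # Below-threshold examples are left unlabeled for manual review.
        results.append((text, label if sim >= min_sim else None))
    return results
```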
In step 304, the labeled data is used to train each computer-implemented machine learning model (i.e., models 210, 220) to produce appropriate outputs given inputs within its domain. Assuming models operate within their domains, broader training (i.e., of models intended for more general use) will, for many applications, produce less reliable or accurate outputs than narrower training (i.e., of more specialized models). This is generally the case not only when the overall volume of training data is held constant (such that a model having a narrower domain has a higher density of training data within that domain), but also when the density of training data is held constant (where a comparatively less specialized model is trained with more data overall, but the subset of that training data corresponding to the domain of a narrower model is comparable to the entirety of the training data used to train the narrower model). In other words, the introduction of out-of-domain training data to broaden or generalize model competence can, within some domains, produce less reliable or accurate outputs in-domain. As a consequence, specialist models 220 of various scopes can be useful so long as selection module 204 is capable of delegating tasks or steps intelligently.
As used herein, “training” a computer-implemented machine learning model refers to any process by which parameters, hyperparameters, weights, and/or any other values related to model accuracy are adjusted to improve the fit of the computer-implemented machine learning model to the training data. The labeled data can be transformed by, for example, one or more programs and/or one or more other trained machine learning models before it is used for training in step 304.
In step 306, the trained computer-implemented machine learning model is tested with domain-specific test data. This test data is unlabeled data that can be used to qualify and/or quantify performance of the trained computer-implemented machine learning model. More specifically, a human or machine operator can evaluate the performance of the machine learning model by evaluating the fit of the model to the test data. Step 306 can be used to determine, for example, whether the machine learning model was overfit to the labeled data during model training in step 304.
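One simple, hypothetical way to operationalize this overfitting check is to compare average performance on training data against held-out, domain-specific test data; score() is a placeholder for whatever task metric applies (exact match, BLEU, etc.):

```python
# Hypothetical overfitting check for step 306: a large positive gap between
# training and held-out performance suggests the model was overfit in step 304.
from typing import Callable

def overfit_gap(predict: Callable[[str], str],
                train: list[tuple[str, str]],
                test: list[tuple[str, str]],
                score: Callable[[str, str], float]) -> float:
    def average(pairs: list[tuple[str, str]]) -> float:
        return sum(score(predict(x), y) for x, y in pairs) / len(pairs)
    return average(train) - average(test)
```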
As depicted in
Training method 300 can advantageously be used to train any machine learning model described herein. More generally, the systems and methods disclosed herein advantageously allow for the training and use of machine learning models that can serve as a general-purpose model (e.g., model 210) as well as domain-specific specialist models (e.g., specialist models 220) by varying the scope of training and test data.
In some embodiments, model 210 can be a general-purpose model produced by method 300, and at least some specialist models 220 can be produced from model 210 via fine-tuning and/or transfer learning. In other embodiments, at least some specialist models 220 can be trained entirely independently of model 210, or even entirely independently of any more general model. In either case, the domain of training and testing data used for training specialist models 220 is narrower, i.e., more specialized, than the domain of training data used to train model 210.
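As a generic sketch of the fine-tuning/transfer-learning option (not a specific implementation from this disclosure), a pretrained backbone analogous to model 210 can be frozen while a narrow, domain-specific head is tuned on specialist data; the module structure here is illustrative:

```python
# Generic PyTorch transfer-learning sketch; module names are illustrative.
import torch
import torch.nn as nn

def fine_tune(base: nn.Module, head: nn.Module, loader, epochs: int = 3) -> nn.Module:
    for p in base.parameters():           # freeze the generalist backbone
        p.requires_grad = False
    model = nn.Sequential(base, head)
    opt = torch.optim.AdamW(head.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for inputs, targets in loader:    # domain-specific labeled batches
            opt.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            opt.step()
    return model
```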
First, complex prompt 120 is received by planner 202. (Step 702). Planner 202 generates plan 504 generally as described above with respect to
The systems and methods set forth above advantageously allow specialized machine learning models to be leveraged to address specific parts or steps of complex prompt responses. This is accomplished by separately training multiple specialist models, and by additionally training selection and integration modules to interface with this library of available models. Specifically, the selection module allows for precise and improvable selection of the best-suited available model to handle specific tasks on a step-by-step basis, and the integration module assembles outputs from selected models into a coherent final output responsive to the complex prompt. An end-to-end sketch of this flow appears below.
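Tying the foregoing sketches together, and under the same placeholder assumptions (decompose(), select_model(), and integrate() as sketched above, with specialists mapping model names to callables), one hypothetical end-to-end pass might look like:

```python
# Hypothetical end-to-end orchestration: plan, delegate each step to the
# best-matched specialist (falling back to the generalist), then integrate.
from typing import Callable

def handle_complex_prompt(prompt: str,
                          generate: Callable[[str], str],
                          specialists: dict[str, Callable[[str], str]]) -> str:
    steps = decompose(prompt, generate)                  # planner 202
    outputs: list[tuple[str, bool]] = []
    context = ""
    for step in steps:                                   # sequential execution
        name = select_model(step, generate)              # selection module 204
        run = specialists.get(name, generate)            # generalist fallback
        result = run(f"{context}\nTask: {step}".strip())
        outputs.append((result, False))
        context += f"\nResult of '{step}': {result}"     # feed later steps
    return integrate(prompt, outputs, generate)          # integration module 208
```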
The following are non-exclusive descriptions of possible embodiments of the present invention.
A method of processing a compound prompt, the method comprising: decomposing the compound prompt into a plan having a plurality of steps via a planner module instantiated in machine-readable memory and operable by a processor; mapping a domain to each of a plurality of distinct machine learning models; for each of the plurality of steps: selecting one of the plurality of distinct machine learning models by matching the step to the corresponding mapped domain; and generating a language output using the selected one of the distinct machine learning models; and integrating the language outputs of each of the plurality of steps into a syntactically and semantically coherent final output via an integration module utilizing a large language model.
The method of the preceding paragraph can optionally include, additionally and/or alternatively, any one or more of the following features, configurations and/or additional components:
A further embodiment of the foregoing method, further comprising training each of the plurality of distinct machine learning models using different training data specific to its respective domain.
A further embodiment of the foregoing method, wherein at least a subset of the plurality of distinct machine learning models are trained entirely separately from others of the plurality of distinct machine learning models, without overlapping training data.
A further embodiment of the foregoing method, wherein at least a subset of the plurality of distinct machine learning models are specialized in a respective domain via fine-tuning or transfer learning.
A further embodiment of the foregoing method, wherein mapping the domain to each of the plurality of distinct machine learning models comprises mapping a subject-matter specialization to each of the plurality of distinct machine learning models.
A further embodiment of the foregoing method, wherein mapping the domain to each of the plurality of distinct machine learning models comprises training a selection model via machine learning to map compound prompts to one or more of the plurality of distinct machine learning models.
A further embodiment of the foregoing method, wherein generating a language output using the selected one of the distinct machine learning models comprises providing the selected one of the plurality of distinct machine learning models with at least a portion of the compound prompt, one of the language outputs from another of the plurality of steps, or both.
A further embodiment of the foregoing method, wherein at least a subset of the plurality of machine learning models are large language models.
A further embodiment of the foregoing method, wherein the large language model is one of the plurality of distinct machine learning models having a corresponding generalist domain.
A further embodiment of the foregoing method, wherein the selection of one of the plurality of distinct machine learning models for each step is performed by the large language model.
A further embodiment of the foregoing method, wherein decomposing the compound prompt into a plan having a plurality of steps comprises identifying the plurality of steps, ordering the plurality of steps, and identifying outputs from at least one of the plurality of steps to be received as inputs by another of the plurality of steps.
A further embodiment of the foregoing method, wherein at least some of the plurality of steps are executed sequentially.
A system for generating a response to a complex prompt, the system comprising: an input device configured to receive the complex prompt; a logic processor; machine-readable memory; a plurality of specialized large language models (LLMs) instantiated in the machine-readable memory, each of the specialized LLMs having a corresponding domain of specialization; and a manager comprising: a planner instantiated in the machine-readable memory and operable via the logic processor to decompose the complex prompt into a plan having a plurality of steps; a selection module instantiated in the machine-readable memory and operable via the logic processor to select one of the plurality of specialized LLMs to execute each of the plurality of steps; and an integration module instantiated in the machine-readable memory and operable via the logic processor to generate a language output responsive to the complex prompt from outputs of each of the selected ones of the plurality of specialized LLMs.
The system of the preceding paragraph can optionally include, additionally and/or alternatively, any one or more of the following features, configurations and/or additional components:
A further embodiment of the foregoing system, wherein the manager further comprises a generalist LLM, and wherein the integration module generates the language output from outputs of at least a subset of the selected ones of the plurality of specialized large language models using the generalist large language model.
A further embodiment of the foregoing system, wherein the generalist LLM is a Meta-Language Model.
A further embodiment of the foregoing system, wherein the selection module maps each of the plurality of steps to one of the plurality of specialized large language models using the generalist large language model.
A further embodiment of the foregoing system, wherein the selection module comprises a model record identifying the corresponding domain of specialization and at least one of an input format and an output format for each of the plurality of specialized large language models.
A further embodiment of the foregoing system, further comprising a database communicatively coupled with at least one of the plurality of specialized large language models to provide context injection for that respective specialized large language model.
A further embodiment of the foregoing system, wherein each of the plurality of specialized large language models is trained for its respective domain using different training data and/or parameters than all others of the plurality of specialized large language models.
A further embodiment of the foregoing system, wherein at least a subset of the specialized large language models are trained via fine tuning, transfer learning, or both.
Any relative terms or terms of degree used herein, such as “substantially”, “essentially”, “generally”, “approximately” and the like, should be interpreted in accordance with and subject to any applicable definitions or limits expressly stated herein. In all instances, any relative terms or terms of degree used herein should be interpreted to broadly encompass any relevant disclosed embodiments as well as such ranges or variations as would be understood by a person of ordinary skill in the art in view of the entirety of the present disclosure, such as to encompass ordinary manufacturing tolerance variations, incidental alignment variations, alignment or shape variations induced by thermal, rotational or vibrational operational conditions, and the like.
While the invention has been described with reference to an exemplary embodiment(s), it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment(s) disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.
The present application claims priority to U.S. provisional patent application Ser. No. 63/543,454 by S. Joynt, filed Oct. 10, 2023 and entitled “COMPOUND PROMPT PROCESSING USING MULTIPLE INTEGRATED DOMAIN-SPECIALIZED LANGUAGE MODELS.”