The present disclosure relates generally to machine learning (ML), and more particularly to systems for processing complex or compound queries using large language models (LLMs).
ML language models, including large language models, are commonly used to generate responses to queries. In some cases, complex or compound queries can be addressed by decomposing the queries into a series of separate steps or tasks performed sequentially by a language model. In such systems, the language model used is preferably a large language model trained on a diverse range of inputs for many purposes, so as to offer the versatility needed to identify and execute steps or tasks of any type needed to address the complex or compound query or prompt.
This disclosure presents a method of processing a compound prompt. This method includes mapping a domain to each of several distinct machine learning models (MLMs) and decomposing the compound prompt into a plan having several steps. For each step, one of the MLMs is selected by matching the step to a corresponding mapped domain, and a language output is generated using the selected MLM. These outputs are integrated into a syntactically and semantically coherent final output using a large language model.
This disclosure also presents a system for generating a response to a complex prompt. This system includes a manager and a plurality of specialized large language models (LLMs), each having a corresponding domain of specialization. The manager includes a planner, a selection module, and an integration module. The planner is configured to decompose the complex prompt into a plan having a plurality of steps. The selection module is configured to select one of the plurality of specialized LLMs to execute each of the plurality of steps. The integration module is configured to generate a language output responsive to the complex prompt from outputs of each of the selected ones of the plurality of specialized LLMs.
The present summary is provided only by way of example, and not limitation. Other aspects of the present disclosure will be appreciated in view of the entirety of the present disclosure, including the entire text, claims, and accompanying figures.
While the above-identified figures set forth one or more embodiments of the present disclosure, other embodiments are also contemplated, as noted in the discussion. In all cases, this disclosure presents the invention by way of representation and not limitation. It should be understood that numerous other modifications and embodiments can be devised by those skilled in the art, which fall within the scope and spirit of the principles of the invention. The figures may not be drawn to scale, and applications and embodiments of the present invention may include features and components not specifically shown in the drawings.
The present disclosure presents methods and systems for processing complex or compound queries or prompts using a system including multiple separately specialized large language models. The term “compound prompt” can refer to a prompt explicitly including multiple separate tasks or steps, e.g., “(1) identify the three highest-selling jazz musicians of the 1970s, and then (2) generate a report comparing the musical styles of these three musicians.” The term “complex prompt” can refer more generally to any prompt requiring multiple steps to process, either implicitly or explicitly, e.g., “generate a report comparing the musical styles of the three highest-selling jazz musicians of the 1970s.” Although distinctions can be drawn between complex and compound prompts, this disclosure will treat complex and compound prompts as equivalent for the purposes of explanation except where specifically stated.
Machine learning (ML) models are increasingly used to process complex or compound queries. Complex prompts can, for example, demand retrieval of data from multiple or specialized sources, assembly of outputs (e.g., natural language, computer code, lists) from the retrieved data based on identified criteria, and/or subsequent processing of those outputs (e.g., transmission or archival to specified categories, locations, and/or recipients). Existing solutions to complex prompt processing use large language model (LLM) planners to semantically decompose such prompts into multiple steps, then execute those steps either using the same LLM, or using native functions or databases. Tools for such approaches include Semantic Kernel and LangChain. Although complex prompts most often involve sequential steps, each contingent upon the results or outputs of previous steps, some complex prompts can also or instead include parallel tasks or steps, i.e., tasks or steps not contingent upon the results of other tasks or steps. A dependency-aware plan representation of this kind is sketched below.
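By way of non-limiting illustration only (this sketch is not part of the disclosed system), a decomposed plan that supports both sequential and parallel steps can be represented as a list of steps, each naming the steps whose outputs it depends on; steps with no unmet dependencies can then be executed in parallel:

```python
# Illustrative sketch only: Step and parallel_batches are hypothetical names,
# not elements of the disclosed system.
from dataclasses import dataclass, field

@dataclass
class Step:
    step_id: str
    prompt: str
    depends_on: list[str] = field(default_factory=list)

def parallel_batches(steps: list[Step]) -> list[list[Step]]:
    """Group steps into batches; steps within a batch are mutually independent."""
    done: set[str] = set()
    remaining = list(steps)
    batches: list[list[Step]] = []
    while remaining:
        ready = [s for s in remaining if all(d in done for d in s.depends_on)]
        if not ready:
            raise ValueError("cyclic dependency in plan")
        batches.append(ready)
        done.update(s.step_id for s in ready)
        remaining = [s for s in remaining if s.step_id not in done]
    return batches

# The compound prompt example from above, decomposed into sequential steps:
plan = [
    Step("1", "identify the three highest-selling jazz musicians of the 1970s"),
    Step("2", "generate a report comparing the musical styles of these three musicians",
         depends_on=["1"]),
]
print(parallel_batches(plan))  # two batches: step 2 waits on step 1
```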
The present disclosure presents systems and methods for improving accuracy and reliability in complex prompt response through the use of multiple specialized machine learning models (MLMs). The present disclosure focuses principally on the use of specialized LLMs, but a person skilled in the art will understand that the approaches set forth below are largely generalizable to other specialist MLMs, except where otherwise noted. In the most general case, this disclosure presents a system whereby general query processing is handled by a manager using a generalist or “jack-of-all-trades” LLM or Meta-Language Model, but steps falling within specialized domains are advantageously handled by dedicated, specially-trained specialist LLMs selected by the manager.
Processor 102 is a logic-capable device configured to execute software, applications, and/or programs stored on memory 104. Examples of processor 102 can include one or more of a processor, a microprocessor, a controller, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other equivalent discrete or integrated logic circuitry. Processor 102 can be entirely or partially mounted on one or more circuit boards.
Memory 104 is a machine-readable storage medium configured to store information including complex prompt handling system 110, and can most generally include both transitory and non-transitory storage media. In some examples, a computer-readable storage medium can include a non-transitory information storage medium. The term “non-transitory” can indicate that the storage medium is not embodied in a carrier wave or a propagated signal. In certain examples, a non-transitory storage medium can store data that can, over time, change (e.g., in RAM or cache). In some examples, memory 104 is or includes a temporary memory. As used herein, a temporary memory refers to a memory having a primary purpose that is not long-term storage. Memory 104, in some examples, is described as volatile memory. As used herein, a volatile memory refers to a memory that does not maintain stored contents when power to memory 104 is turned off. Examples of volatile memories can include random access memories (RAM), dynamic random access memories (DRAM), static random access memories (SRAM), and other forms of volatile memories. In some examples, the memory is used to store program instructions for execution by the processor. The memory, in one example, is used by software or applications running on computer hardware device 100 (e.g., complex prompt handling system 110) to temporarily store information during program execution. Memory 104, in some examples, also includes one or more persistent computer-readable storage media. Examples of such persistent, non-volatile storage elements can include, for example, magnetic hard discs, optical discs, floppy discs, flash memories, or forms of electrically programmable memories (EPROM) or electrically erasable and programmable (EEPROM) memories.
User interface 106 is an input and/or output device and/or software interface, and enables an operator, such as user 108, to control operation of and/or interact with software elements of computer hardware device 100. For example, user interface 106 can be configured to receive inputs from an operator and/or provide outputs. Most relevantly for the present disclosure, user interface 106 provides means for user 108 to supply complex prompt 120 to computer hardware device 100. User interface 106 can, for example, be a local interface including one or more of a sound card, a video graphics card, a speaker, a display device (such as a liquid crystal display (LCD), a light emitting diode (LED) display, an organic light emitting diode (OLED) display, etc.), a touchscreen, a keyboard, a mouse, a joystick, or other type of device for facilitating input and/or output of information in a form understandable to users and/or machines. In other mutually consistent embodiments, user interface 106 can additionally or alternatively include means such as a wired or wireless transceiver for communicating with a remote user, e.g., through a remote device and/or over a local or wide area network.
Computer hardware device 100 receives complex prompt 120 from user 108 via user interface 106, as noted above. Complex prompt 120 is stored in memory 104 as an input of complex prompt handling system 110 (see
Network 140 can be a local area network (LAN), such as a network connecting computer hardware device 100 to local user devices and/or databases, or can be a wide-area network (WAN) suitable for connecting computer hardware device 100 to servers and other computing components separated by greater geographic distances than the devices of a local network. Network 140 can include network infrastructure for connecting devices separated by larger geographic distances. In at least some examples, network 140 is the internet. For illustrative purposes, computer hardware device 100 can communicate with remote devices 142 and 144 via network 140. More generally, any number of remote devices can be communicatively connected to computer hardware device 100 via one or multiple networks 140.
As illustrated in
Manager 200 (with its various constituent modules 202-210) forms the core of complex prompt handling system 110 and is responsible both for initial processing of complex prompts and for final assembly of outputs responsive to those complex prompts. Specialist models 220 (used herein to refer to any or all specialist models 220a-n) are MLMs, and in most embodiments specifically LLMs, that are trained for competence in specific domains. Advantageously, manager 200 delegates specific tasks necessary for the execution of each complex prompt 120 to appropriate specialist models 220, thereby providing advantages from specialized training (efficiency, reliability, reduced hallucination, etc.) at each step, within a generalist overall system.
As noted above, specialist models 220 are MLMs such as LLMs that are trained for efficiency and reliability within specific domains. As illustrative examples, specialist model 220a can be a model fine-tuned to scrape data from websites, specialist model 220b can be a model trained to generate emails matching the style of a particular individual (e.g., from training data including a set of that person's sent emails), and specialist model 220d can be an MLM trained to perform chemical reaction mathematics. Specialist models 220 need not have anything in common with each other save their accessibility to manager 200. In
As noted above, manager 200 includes planner 202, selection module 204, integration module 208, and model 210. Model 210 is an LLM trained to process natural language prompts. In various embodiments, model 210 can be used by various functional components of manager 200, including planner 202, selection module 204, and/or integration module 208. Although planner 202, selection module 204, and integration module 208 are described separately below in terms of function for the purpose of explanation, in some embodiments many functions of planner 202, selection module 204, and/or integration module 208 may be performed by providing prompts or context injections to model 210, i.e., to a single shared generalist LLM used by manager 200. In other embodiments, however, manager 200 can include multiple models 210, each dedicated specifically to a subset of modules 202, 204, and/or 208.
Planner 202 is a semantic processing module capable of decomposing a complex prompt into a plurality of steps. Planner 202 can, for example, be a planner such as those used in conventional systems such as Semantic Kernel or LangChain. In the most general case, planner 202 can be any suitable natural language processing (NLP) agent capable of identifying a plurality of actionable tasks (i.e., steps) for the resolution of complex prompt 120. Planner 202 can, for example, make use of model 210 for generative production of a response to complex prompt 120 that identifies these actionable tasks. In some embodiments planner 202 can immediately output a plan (i.e., several steps) in direct response to the complex prompt. In other embodiments planner 202 can assemble the plan over several iterations, e.g., by identifying an initial task or set of tasks insufficient to completely address the complex prompt, then supplementing this partial plan with an additional step or steps in response to completion of the initial task or tasks.
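As a non-limiting sketch of the planner function described above, the following assumes a generate() callable wrapping model 210; the function names and prompt template are illustrative placeholders rather than elements of the disclosure:

```python
# Hypothetical sketch of planner 202: decompose a complex prompt into steps
# by prompting a generalist LLM. generate() is a placeholder for model 210.
from typing import Callable

PLAN_TEMPLATE = (
    "Decompose the following request into a numbered list of minimal, "
    "actionable steps, one per line.\n\nRequest: {prompt}"
)

def decompose(complex_prompt: str, generate: Callable[[str], str]) -> list[str]:
    raw = generate(PLAN_TEMPLATE.format(prompt=complex_prompt))
    # Strip leading numbering such as "1." from each returned line.
    return [line.split(".", 1)[-1].strip()
            for line in raw.splitlines() if line.strip()]
```

An iterative planner of the kind described above could instead call decompose() again after each completed task, appending that task's output to the prompt, so the plan grows as results become available.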
Selection module 204 is responsible for delegation of tasks identified via planner 202 to specific models. More specifically, selection module 204 identifies a best-suited model for the execution of each task or step from among all models (210, 220a-n) available within complex prompt handling system 110, and provides a prompt (either generated by planner 202 as a part of the plan, or generated by selection module 204 based on the plan and model record 206; see below) corresponding to the task in question to the selected model. Selection module 204 can include model record 206, which maps a domain to each specialist model 220. Model record 206 can, in some embodiments, consist of a subject-matter specialization identification for each specialist model 220, with selection module 204 performing intent identification on each step via model 210 to identify a specialist model 220 having a subject-matter specialization substantially matching that intent. In some such embodiments, model 210 can be trained in model selection through provision of a large number of (step/task) prompts labeled as suited for a particular specialist model. In alternative embodiments, functions described herein by reference to planner 202 and selection module 204 can be performed inseparably based on context of complex prompt 120 using a meta-language model (i.e., wherein model 210 is a meta-language model) trained to designate a single specialist model 220 as a part of each plan step generated from complex prompt 120.
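A minimal sketch of one possible selection mechanism follows, assuming model record 206 stores a subject-matter description per specialist; the record contents and classification prompt are illustrative assumptions, and an embedding-similarity lookup would be an equally valid substitute:

```python
# Hypothetical model record mapping specialist names to domain descriptions.
from typing import Callable

MODEL_RECORD = {
    "web_scraper": "extracting structured data from web pages",
    "email_stylist": "drafting emails in a specific person's style",
    "chem_math": "chemical reaction stoichiometry and mathematics",
}

def select_model(step: str, generate: Callable[[str], str]) -> str:
    """Ask the generalist model (a stand-in for model 210) to match a step
    to the specialist whose mapped domain best fits the step's intent."""
    options = "\n".join(f"- {name}: {desc}" for name, desc in MODEL_RECORD.items())
    answer = generate(
        f"Task: {step}\nAvailable specialist models:\n{options}\n"
        "Reply with the single best model name, or 'generalist' if none fit."
    )
    name = answer.strip().lower()
    return name if name in MODEL_RECORD else "generalist"
```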
Manager 200 delegates execution of a particular task or step to a selected specialist model 220 by providing a prompt corresponding to that task or step to the selected specialist model 220. In some embodiments, model record 206 can, in addition to mapping domains of each specialist model 220, also identify prompt rules, formats, or templates suitable for each or some specialist models 220, e.g., for context injection based on the designated model and the specific task or complex prompt 120. More generally, prompts provided to models 210 and 220 can include retrieval-augmented generation (RAG) or other context injection to reduce hallucination or otherwise constrain outputs based on templates retrieved from, or data validated through, outside data storage, e.g., lists and/or vector and/or relational databases either stored in memory 104 or retrieved from other devices 130, 142, or 144. This context injection can be provided by manager 200 based on operation of selection module 204, or can be sourced externally to complex prompt handling system 110 based on the nature of the prompt provided by selection module 204 to the selected specialist model.
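The following sketch illustrates one way such context injection might be assembled before delegation; retrieve() stands in for any vector- or relational-database lookup, and the template is a hypothetical entry of the kind model record 206 might hold:

```python
# Hypothetical context-injection step: prepend retrieved reference material
# and a per-model template to the task prompt before delegation.
from typing import Callable

def build_specialist_prompt(task: str, template: str,
                            retrieve: Callable[[str], list[str]],
                            k: int = 3) -> str:
    chunks = retrieve(task)[:k]  # e.g., top-k passages from a vector database
    return template.format(context="\n".join(chunks), task=task)

# Example template of the kind that might be stored per specialist model:
EMAIL_TEMPLATE = "Reference material:\n{context}\n\nDraft an email that: {task}"
```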
Specialist models 220 advantageously offer improved performance within their specialized domains over a generalist (general-purpose, jack-of-all-trades) model, but may be incapable outside of those specialized domains. Tasks for which selection module 204 can identify no appropriate specialist model 220 are handled by model 210, or by another generalist model. Specialist models 220 can have domains of varying breadth. In some embodiments, complex prompt handling system 110 can include both specialist models with non-overlapping domains and specialist models with overlapping domains, e.g., of different scope.
Integration module 208 is a natural language processing module configured to generate a single output responsive to the complex prompt based on the outputs of the steps of the plan generated by planner 202, as executed by specialist models 220 (and in some instances model 210) per delegation by selection module 204. Integration module 208 can, for example, include its own specialized LLM trained to aggregate these various model outputs into a semantically and syntactically coherent single output without introducing hallucinations or errors, or omitting information provided by the various specialist models 220. Alternatively, model 210 can be a generalist LLM (as described above) capable of performing this function using prompts generated by integration module 208 from the aforementioned outputs of specialist models 220 (and in some instances model 210), i.e., such that the same model 210 provides the trained machine learning backbone of integration module 208 and planner 202, selection module 204, or both. Integration module 208 can in some embodiments receive inputs from all designated specialist models 220 used in handling steps identified by planner 202. In other embodiments, where outputs of some specialist models 220 are used only to provide inputs to other specialist models 220 (see
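A minimal sketch of this integration step follows, under the same placeholder assumptions (a generate() callable standing in for the integration LLM, and a hypothetical per-output flag marking intermediate-only results):

```python
# Hypothetical integration step: merge non-intermediate step outputs into a
# single coherent response via an LLM prompt that forbids adding or dropping facts.
from typing import Callable

def integrate(complex_prompt: str,
              outputs: list[tuple[str, bool]],  # (text, intermediate_only)
              generate: Callable[[str], str]) -> str:
    finals = [text for text, intermediate_only in outputs if not intermediate_only]
    numbered = "\n\n".join(f"[{i + 1}] {t}" for i, t in enumerate(finals))
    return generate(
        f"Original request: {complex_prompt}\n\nStep results:\n{numbered}\n\n"
        "Combine these results into one coherent response. Do not add facts "
        "absent from the step results, and do not omit any of them."
    )
```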
In step 302, the training data is generated. For training the computer-implemented machine learning model(s) used in complex prompt handling system 110, training data includes domain-specific information with example inputs labeled with (i.e., mapped to or tagged with) example outputs. Data can be labeled entirely manually or, to reduce labor, can be labeled in a semi-automated fashion, e.g., using clustering or analogy to manually labeled data. Each specialist model 220 is trained on different training data, although in some embodiments some specialist models 220 can be trained on training data that is a subset of, or overlaps with, training data used to train other specialist models 220.
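As a hedged illustration of the semi-automated labeling mentioned above, unlabeled examples can inherit the label of their nearest manually labeled neighbor in embedding space; embed() is a placeholder for any sentence-embedding model, and the similarity threshold is an assumption:

```python
# Hypothetical label propagation by analogy to manually labeled seed data.
import math
from typing import Callable, Optional

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def propagate_labels(seeds: list[tuple[str, str]],       # (example, label)
                     unlabeled: list[str],
                     embed: Callable[[str], list[float]],
                     min_sim: float = 0.8) -> list[tuple[str, Optional[str]]]:
    seed_vecs = [(label, embed(text)) for text, label in seeds]
    results = []
    for text in unlabeled:
        v = embed(text)
        label, sim = max(((lbl, cosine(v, sv)) for lbl, sv in seed_vecs),
                         key=lambda pair: pair[1])
        # Below-threshold examples are left unlabeled for manual review.
        results.append((text, label if sim >= min_sim else None))
    return results
```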
In step 304, the labeled data is used to train each computer-implemented machine learning model (i.e., models 210, 220) to produce appropriate outputs given inputs within its domain. Assuming models operate within their domains, broader training (i.e., of models intended for more general use) will, for many applications, produce less reliable or accurate outputs than narrower training (i.e., of more specialized models). This is generally the case not only when the overall volume of training data is held constant (such that a model having a narrower domain has a higher density of training data within that domain), but also when the density of training data is held constant (where a comparatively less specialized model is trained with more data overall, but the subset of that training data corresponding to the domain of a narrower model is comparable to the entirety of the training data used to train the narrower model). In other words, the introduction of out-of-domain training data to broaden or generalize model competence can, within some domains, produce less reliable or accurate outputs in-domain. As a consequence, specialist models 220 of various scopes can be useful so long as selection module 204 is capable of delegating tasks or steps intelligently.
As used herein, “training” a computer-implemented machine learning model refers to any process by which parameters, hyperparameters, weights, and/or any other values related to model accuracy are adjusted to improve the fit of the computer-implemented machine learning model to the training data. The labeled data can be transformed by, for example, one or more programs and/or one or more other trained machine learning models before it is used for training in step 304.
In step 306, the trained computer-implemented machine learning model is tested with domain-specific test data. This test data is unlabeled data that can be used to qualify and/or quantify performance of the trained computer-implemented machine learning model. More specifically, a human or machine operator can evaluate the performance of the machine learning model by evaluating the fit of the model to the test data. Step 306 can be used to determine, for example, whether the machine learning model was overfit to the labeled data during model training in step 304.
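One simple, hypothetical way to operationalize this overfitting check is to compare average performance on training data against held-out, domain-specific test data; score() is a placeholder for whatever task metric applies (exact match, BLEU, etc.):

```python
# Hypothetical overfitting check for step 306: a large positive gap between
# training and held-out performance suggests the model was overfit in step 304.
from typing import Callable

def overfit_gap(predict: Callable[[str], str],
                train: list[tuple[str, str]],
                test: list[tuple[str, str]],
                score: Callable[[str, str], float]) -> float:
    def average(pairs: list[tuple[str, str]]) -> float:
        return sum(score(predict(x), y) for x, y in pairs) / len(pairs)
    return average(train) - average(test)
```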
As depicted in
Training method 300 can advantageously be used to train any machine learning model described herein. More generally, the systems and methods disclosed herein advantageously allow for the training and use of machine learning models that can serve as a general-purpose model (e.g., model 210) as well as domain-specific specialist models (e.g., specialist models 220) by varying the scope of training and test data.
In some embodiments, model 210 can be a general-purpose model produced by method 300, and at least some specialist models 220 can be produced from model 210 via fine-tuning and/or transfer learning. In other embodiments, at least some specialist models 220 can be trained entirely independently of model 210, or even entirely independently of any more general model. In either case, the domain of training and testing data used for training specialist models 220 is narrower, i.e., more specialized, than the domain of training data used to train model 210.
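As a generic sketch of the fine-tuning/transfer-learning option (not a specific implementation from this disclosure), a pretrained backbone analogous to model 210 can be frozen while a narrow, domain-specific head is tuned on specialist data; the module structure here is illustrative:

```python
# Generic PyTorch transfer-learning sketch; module names are illustrative.
import torch
import torch.nn as nn

def fine_tune(base: nn.Module, head: nn.Module, loader, epochs: int = 3) -> nn.Module:
    for p in base.parameters():           # freeze the generalist backbone
        p.requires_grad = False
    model = nn.Sequential(base, head)
    opt = torch.optim.AdamW(head.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for inputs, targets in loader:    # domain-specific labeled batches
            opt.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            opt.step()
    return model
```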
First, complex prompt 120 is received by planner 202. (Step 702). Planner 202 generates plan 504 generally as described above with respect to
The systems and methods set forth above advantageously allow specialized machine learning models to be leveraged to address specific parts or steps of complex prompt responses. This is accomplished by separately training multiple specialist models, and by additionally training selection and integration modules to interface with this library of available models. Specifically, the selection module allows for precise and improvable selection of the best-suited available model to handle specific tasks on a step-by-step basis, and the integration module assembles outputs from selected models into a coherent final output responsive to the complex prompt. An end-to-end sketch of this flow appears below.
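Tying the foregoing sketches together, and under the same placeholder assumptions (decompose(), select_model(), and integrate() as sketched above, with specialists mapping model names to callables), one hypothetical end-to-end pass might look like:

```python
# Hypothetical end-to-end orchestration: plan, delegate each step to the
# best-matched specialist (falling back to the generalist), then integrate.
from typing import Callable

def handle_complex_prompt(prompt: str,
                          generate: Callable[[str], str],
                          specialists: dict[str, Callable[[str], str]]) -> str:
    steps = decompose(prompt, generate)                  # planner 202
    outputs: list[tuple[str, bool]] = []
    context = ""
    for step in steps:                                   # sequential execution
        name = select_model(step, generate)              # selection module 204
        run = specialists.get(name, generate)            # generalist fallback
        result = run(f"{context}\nTask: {step}".strip())
        outputs.append((result, False))
        context += f"\nResult of '{step}': {result}"     # feed later steps
    return integrate(prompt, outputs, generate)          # integration module 208
```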
The following are non-exclusive descriptions of possible embodiments of the present invention.
A method of processing a compound prompt, the method comprising: decomposing the compound prompt into a plan having a plurality of steps via a planner module instantiated in machine-readable memory and operable by a processor; mapping a domain to each of a plurality of distinct machine learning models; for each of the plurality of steps: selecting one of the plurality of distinct machine learning models by matching the step to the corresponding mapped domain; and generating a language output using the selected one of the distinct machine learning models; and integrating the language outputs of each of the plurality of steps into a syntactically and semantically coherent final output via an integration module utilizing a large language model.
The method of the preceding paragraph can optionally include, additionally and/or alternatively, any one or more of the following features, configurations and/or additional components:
A further embodiment of the foregoing method, further comprising training each of the plurality of distinct machine learning models using different training data specific to its respective domain.
A further embodiment of the foregoing method, wherein at least a subset of the plurality of distinct machine learning models are trained entirely separately from others of the plurality of distinct machine learning models, without overlapping training data.
A further embodiment of the foregoing method, wherein at least a subset of the plurality of distinct machine learning models are specialized in a respective domain via fine-tuning or transfer learning.
A further embodiment of the foregoing method, wherein mapping the domain to each of the plurality of distinct machine learning models comprises mapping a subject-matter specialization to each of the plurality of distinct machine learning models.
A further embodiment of the foregoing method, wherein mapping the domain to each of the plurality of distinct machine learning models comprises training a selection model via machine learning to map compound prompts to one or more of the plurality of distinct machine learning models.
A further embodiment of the foregoing method, wherein generating a language output using the selected one of the distinct machine learning models comprises providing the selected one of the plurality of distinct machine learning models with at least a portion of the compound prompt, one of the language outputs from another of the plurality of steps, or both.
A further embodiment of the foregoing method, wherein at least a subset of the plurality of machine learning models are large language models.
A further embodiment of the foregoing method, wherein the large language model is one of the plurality of distinct machine learning models having a corresponding generalist domain.
A further embodiment of the foregoing method, wherein the selection of one of the plurality of distinct machine learning models for each step is performed by the large language model.
A further embodiment of the foregoing method, wherein decomposing the compound prompt into a plan having a plurality of steps comprises identifying the plurality of steps, ordering the plurality of steps, and identifying outputs from at least one of the plurality of steps to be received as inputs by another of the plurality of steps.
A further embodiment of the foregoing method, wherein at least some of the plurality of steps are executed sequentially.
A system for generating a response to a complex prompt, the system comprising: an input device configured to receive the complex prompt; a logic processor; machine-readable memory; a plurality of specialized large language models (LLMs) instantiated in the machine-readable memory, each of the specialized LLMs having a corresponding domain of specialization; and a manager comprising: a planner instantiated in the machine-readable memory and operable via the logic processor to decompose the complex prompt into a plan having a plurality of steps; a selection module instantiated in the machine-readable memory and operable via the logic processor to select one of the plurality of specialized LLMs to execute each of the plurality of steps; and an integration module instantiated in the machine-readable memory and operable via the logic processor to generate a language output responsive to the complex prompt from outputs of each of the selected ones of the plurality of specialized LLMs.
The system of the preceding paragraph can optionally include, additionally and/or alternatively, any one or more of the following features, configurations and/or additional components:
A further embodiment of the foregoing system, wherein the manager further comprises a generalist LLM, and wherein the integration module generates the language output from outputs of at least a subset of the selected ones of the plurality of specialized large language models using the generalist large language model.
A further embodiment of the foregoing system, wherein the generalist LLM is a Meta-Language Model.
A further embodiment of the foregoing system, wherein the selection module maps each of the plurality of steps to one of the plurality of specialized large language models using the generalist large language model.
A further embodiment of the foregoing system, wherein the selection module comprises a model record identifying the corresponding domain of specialization and at least one of an input format and an output format for each of the plurality of specialized large language models.
A further embodiment of the foregoing system, further comprising a database communicatively coupled with at least one of the plurality of specialized large language models to provide context injection for that respective specialized large language model.
A further embodiment of the foregoing system, wherein each of the plurality of specialized large language models is trained for its respective domain using different training data and/or parameters than all others of the plurality of specialized large language models.
A further embodiment of the foregoing system, wherein at least a subset of the specialized large language models are trained via fine tuning, transfer learning, or both.
Any relative terms or terms of degree used herein, such as “substantially”, “essentially”, “generally”, “approximately” and the like, should be interpreted in accordance with and subject to any applicable definitions or limits expressly stated herein. In all instances, any relative terms or terms of degree used herein should be interpreted to broadly encompass any relevant disclosed embodiments as well as such ranges or variations as would be understood by a person of ordinary skill in the art in view of the entirety of the present disclosure, such as to encompass ordinary manufacturing tolerance variations, incidental alignment variations, alignment or shape variations induced by thermal, rotational or vibrational operational conditions, and the like.
While the invention has been described with reference to an exemplary embodiment(s), it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment(s) disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.
The present application claims priority to U.S. provisional patent application Ser. No. 63/543,454 by S. Joynt, filed Oct. 10, 2023 and entitled “COMPOUND PROMPT PROCESSING USING MULTIPLE INTEGRATED DOMAIN-SPECIALIZED LANGUAGE MODELS.”