SPECIALIST LANGUAGE MODEL SET MAPPING AND SELECTION FOR PROMPT DELEGATION

Information

  • Patent Application
  • Publication Number
    20250117587
  • Date Filed
    April 23, 2024
  • Date Published
    April 10, 2025
  • CPC
    • G06F40/30
  • International Classifications
    • G06F40/30
Abstract
A method of processing a compound prompt includes decomposing the compound prompt into a plan having multiple steps, and associating a domain descriptor with each of several specialized machine learning models. For each step of the plan, a relevance score for each specialized model is assigned by semantic comparison between its domain descriptor and the step. A subset of the models is selected based on relevance score and used to produce a step output. The outputs of all steps are assembled into a syntactically and semantically coherent final output via an integration module utilizing a large language model.
Description
FIELD OF THE INVENTION

The present disclosure relates generally to machine learning (ML), and more particularly to systems for processing complex or compound queries using large language models (LLMs).


BACKGROUND

ML language models, including large language models, are commonly used to generate responses to queries. In some cases, complex or compound queries can be addressed by decomposing queries into a series of separate steps or tasks performed sequentially by a language model. In such systems, the language model used is preferably a large language model trained on a diverse range of inputs for many purposes, so as to offer versatility needed to identify and execute steps or tasks of any type needed to address the complex or compound query or prompt.


SUMMARY

This disclosure presents a method of processing a compound prompt. This method includes decomposing the compound prompt into a plan having multiple steps, and associating a domain descriptor with each of several specialized machine learning models. For each step of the plan, a relevance score for each specialized model is assigned by semantic comparison between its domain descriptor and the step. A subset of the models is selected based on relevance score and used to produce a step output. The outputs of all steps are assembled into a syntactically and semantically coherent final output via an integration module utilizing a large language model.


This disclosure also presents a system for generating a response to a complex prompt. This system includes a manager and several specialized large language models (LLMs), each having an associated domain descriptor identifying its area of specialization. The manager includes a planner, a selection module, and an integration module. The planner is configured to decompose the complex prompt into a plurality of steps. The selection module is configured to identify approaches for generating outputs for each step using a step-specific subset of the plurality of specialized LLMs. The step-specific subset is identified by the selection module based on semantic comparison between the respective step and the associated domain descriptor of each of the plurality of specialized LLMs. The integration module is configured to generate a language output responsive to the complex prompt from all of the step outputs corresponding to each of the plurality of steps.


The present summary is provided only by way of example, and not limitation. Other aspects of the present disclosure will be appreciated in view of the entirety of the present disclosure, including the entire text, claims, and accompanying figures.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a schematic diagram of a query processing system including a complex prompt handling system.



FIG. 2 is a schematic diagram of the complex prompt handling system of FIG. 1.



FIG. 3 is a method flowchart describing a simplified training method for generating language models of the prompt handling system of FIGS. 1 and 2 using guided machine learning.



FIG. 4 is a method flowchart describing methods of further training for generating language models of the prompt handling system of FIGS. 1 and 2 using transfer learning and/or fine tuning.



FIG. 5 is a process flowchart illustrating decomposition and processing of a complex prompt in several steps using the complex prompt handling system of FIGS. 1 and 2.



FIG. 6 is a process flowchart illustrating processing of a step of FIG. 5 in greater detail.



FIG. 7 is a method flowchart illustrating a method of generating an output to a complex prompt using the complex prompt handling system of FIGS. 1 and 2 as set forth generally in FIGS. 5 and 6.





While the above-identified figures set forth one or more embodiments of the present disclosure, other embodiments are also contemplated, as noted in the discussion. In all cases, this disclosure presents the invention by way of representation and not limitation. It should be understood that numerous other modifications and embodiments can be devised by those skilled in the art, which fall within the scope and spirit of the principles of the invention. The figures may not be drawn to scale, and applications and embodiments of the present invention may include features and components not specifically shown in the drawings.


DETAILED DESCRIPTION

The present disclosure presents methods and systems for processing complex or compound queries or prompts using a system including multiple separately specialized large language models. The term “compound prompt” can refer to a prompt explicitly including multiple separate tasks or steps, e.g., “(1) identify the three highest-selling jazz musicians of the 1970s, and then (2) generate a report comparing the musical styles of these three musicians.” The term “complex prompt” can refer more generally to any prompt requiring multiple steps to process, either implicitly or explicitly, e.g., “generate a report comparing the musical styles of the three highest-selling jazz musicians of the 1970s.” Although distinctions can be drawn between complex and compound prompts, this disclosure will treat complex and compound prompts as equivalent for the purposes of explanation except where specifically stated.


Machine learning (ML) models are increasingly used to process complex or compound queries. Complex prompts can, for example, demand retrieval of data from multiple or specialized sources, assembly of outputs (e.g. natural language, computer code, lists) from the retrieved data based on identified criteria, and/or subsequent processing of those outputs (e.g. transmission or archival to specified categories or locations and/or recipients). Existing solutions to complex prompt processing use large language model (LLM) planners to semantically decompose such prompts into multiple steps, then execute those steps either using the same LLM, or using native functions or databases. Tools for such approaches include Semantic Kernel and LangChain. Although complex prompts most often involve sequential steps each contingent upon the results or outputs of previous steps, some complex prompts can also or instead include parallel tasks or steps, i.e., steps not contingent upon the results of other steps.
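The sequential-versus-parallel step structure described above can be pictured as a small dependency graph over plan steps. The following minimal Python sketch (all names hypothetical, not part of the disclosure) shows one way a plan could record dependencies so that contingent steps execute in order while independent steps group into parallel "waves":

```python
# Hypothetical sketch of a decomposed plan: each step records which earlier
# steps it depends on, so independent steps could run in parallel.
from dataclasses import dataclass, field

@dataclass
class Step:
    step_id: str
    task: str                                  # natural-language task for this step
    depends_on: list = field(default_factory=list)

def execution_waves(steps):
    """Group steps into 'waves': steps in the same wave depend only on
    already-completed steps, so they could be executed in parallel."""
    done, waves = set(), []
    remaining = list(steps)
    while remaining:
        wave = [s for s in remaining if all(d in done for d in s.depends_on)]
        if not wave:
            raise ValueError("cyclic or unsatisfiable dependencies")
        waves.append(wave)
        done.update(s.step_id for s in wave)
        remaining = [s for s in remaining if s.step_id not in done]
    return waves

plan = [
    Step("s1", "identify the three highest-selling jazz musicians of the 1970s"),
    Step("s2", "summarize the musical style of each musician", ["s1"]),
    Step("s3", "generate a report comparing the three styles", ["s2"]),
]
```

A fully sequential plan like the one above yields one wave per step; steps with no shared dependencies would land in the same wave.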


The present disclosure presents systems and methods for improving accuracy and reliability in responding to complex prompts through the use of multiple specialized machine learning models (MLMs). The present disclosure focuses principally on the use of specialized LLMs, but a person skilled in the art will understand that the approaches set forth below are mostly generalizable to other specialist MLMs, except where otherwise noted. In the most general case, this disclosure presents a system whereby general query processing is handled by a manager using a generalist or “jack-of-all-trades” LLM or Meta-Language Model, but steps falling within specialized domains are advantageously handled by dedicated, specially-trained specialist LLMs selected by the manager. This disclosure provides advantageous examples of methods for mapping plan steps (i.e., tasks) to one or more appropriate models.



FIG. 1 is a schematic diagram of prompt processing system 10, an illustrative example of a system disposed to receive and process prompts, particularly complex natural language prompts, and produce syntactically and semantically coherent outputs responsive to those prompts. FIG. 1 depicts hardware computing device 100 with processor 102, memory 104, and user interface 106 accessible to user 108. Complex prompt handling system 110 is instantiated at least partly within memory 104, and executable via processor 102 to generate outputs responsive to complex prompt 120. In some embodiments prompt processing system 10 can also include local device 130 directly connected to hardware computing device 100, and network 140 further connecting remote devices 142 and 144 to hardware computing device 100 indirectly. Remote device 144 can include computer-executable software accessible via API 146.



FIG. 1 focuses on hardware components of prompt processing system 10, and is provided as an illustrative example of a general hardware system for processing complex prompts. Software logic for prompt processing system 10 is primarily discussed below with respect to subsequent figures. The elements presented in FIG. 1, particularly including local device 130, network 140, and remote devices 142 and 144, can be omitted or replaced with analogous hardware in different architectures without departing from the scope and spirit of the present disclosure. In particular, although complex prompt handling system 110 is illustrated as instantiated in memory 104 of hardware computing device 100, subcomponents of complex prompt handling system 110 (see FIG. 2 and accompanying description) can in other embodiments be distributed across multiple hardware devices, including but not limited to remote and/or local devices as illustratively shown in FIG. 1 (130, 142, 144). Similarly, although hardware computing device 100 is illustrated as directly accessible to user 108, embodiments wherein hardware computing device 100 and therefore complex prompt handling system 110 are only accessible via intervening devices or steps also fall within the scope of the present disclosure.


Processor 102 is a logic-capable device configured to execute software, applications, and/or programs stored on memory 104. Examples of processor 102 can include one or more of a processor, a microprocessor, a controller, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other equivalent discrete or integrated logic circuitry. Processor 102 can be entirely or partially mounted on one or more circuit boards.


Memory 104 is a machine-readable storage medium configured to store information including complex prompt handling system 110, and can most generally include both transitory and non-transitory storage media. In some examples, a computer-readable storage medium can include a non-transitory information storage medium. The term “non-transitory” can indicate that the storage medium is not embodied in a carrier wave or a propagated signal. In certain examples, a non-transitory storage medium can store data that can, over time, change (e.g., in RAM or cache). In some examples, memory 104 is or includes a temporary memory. As used herein, a temporary memory refers to a memory having a primary purpose that is not long-term storage. Memory 104, in some examples, is described as volatile memory. As used herein, a volatile memory refers to a memory that does not maintain stored contents when power to the memory 104 is turned off. Examples of volatile memories can include random access memories (RAM), dynamic random access memories (DRAM), static random access memories (SRAM), and other forms of volatile memories. In some examples, the memory is used to store program instructions for execution by the processor. The memory, in one example, is used by software or applications running on hardware computing device 100 (e.g., complex prompt handling system 110) to temporarily store information during program execution. Memory 104, in some examples, also includes one or more persistent computer-readable storage media. Examples of such persistent, non-volatile storage elements can include, for example, magnetic hard discs, optical discs, floppy discs, flash memories, or forms of electrically programmable memories (EPROM) or electrically erasable and programmable (EEPROM) memories.


User interface 106 is an input and/or output device and/or software interface, and enables an operator, such as user 108, to control operation of and/or interact with software elements of computer hardware device 100. For example, user interface 106 can be configured to receive inputs from an operator and/or provide outputs. Most relevantly for the present disclosure, user interface 106 provides means for user 108 to supply complex prompt 120 to computer hardware device 100. User interface 106 can, for example, be a local interface including one or more of a sound card, a video graphics card, a speaker, a display device (such as a liquid crystal display (LCD), a light emitting diode (LED) display, an organic light emitting diode (OLED) display, etc.), a touchscreen, a keyboard, a mouse, a joystick, or other type of device for facilitating input and/or output of information in a form understandable to users and/or machines. In other mutually consistent embodiments, user interface 106 can additionally or alternatively include means such as a wired or wireless transceiver for communicating with a remote user, e.g., through a remote device and/or over a local or wide area network.


Hardware computing device 100 receives complex prompt 120 from user 108 via user interface 106, as noted above. Complex prompt 120 is entered in memory 104 as an input of complex prompt handling system 110 (see FIGS. 2-7), which is executed by processor 102 to produce a merged output advantageously leveraging the task-specific training of multiple domain-specialized LLMs.


Network 140 can be a local area network (LAN) such as a network connecting computer hardware device 100 to local user devices and/or databases, or can be a wide-area network (WAN) suitable for connecting computer hardware device 100 to servers and other computing components that are separated by greater geographic distances than the devices of a local network. Network 140 can include network infrastructure for connecting devices separated by larger geographic distances. In at least some examples, network 140 is the internet. For illustrative purposes, computer hardware device 100 can communicate with remote devices 142 and 144 via network 140. More generally, any number of remote devices can be communicatively connected to computer hardware device 100 via one or multiple networks 140.


As illustrated in FIG. 1, prompt processing system 10 can also include local device 130 and network 140 with remote devices 142 and 144. Devices 130, 142, and 144 are generalizations representing all hardware external to computer hardware device 100 that can be included in prompt processing system 10. Memory 104 and analogous memory situated on devices 130, 142, and/or 144 can contain retrievably housed data stored in a variety of storage formats or architectures, including as structured databases or application data, unstructured data (e.g., data lakes), or vector databases. In the most general case, devices 130, 142, and 144 can provide locations external to computer hardware device 100 with separate processors and associated memory for remote data storage (e.g., databases or similar storage structures for context injection) or processing (e.g., of software including specialized LLMs as discussed in detail below). In some examples devices 100, 130, 142, and 144 can be only a few hardware devices in a cloud architecture. In an opposite example, prompt processing system 10 can eschew devices 130, 142, and 144 and handle all data storage and processing locally within computer hardware device 100. In the most general case, software functions described with respect to FIGS. 2-7, including various language models, Meta-Language Models, Large Language Models (LLMs), and logic software, can be distributed across any useful number of separate hardware devices.



FIG. 2 is a schematic diagram of complex prompt handling system 110. As noted above, complex prompt handling system 110 is instantiated in memory 104 and executed via processor 102 on computer hardware device 100. As shown in FIG. 2, complex prompt handling system 110 includes manager 200 and several specialist models 220a-n with associated domain descriptors 222a-n. Manager 200 is made up of multiple software modules, including planner 202, selection module 204 (with model record 206), integration module 208, and model 210. Manager 200 receives complex prompt 120 from user interface 106, decomposes this complex prompt into several steps, and delegates these steps as appropriate to specialist models 220a-n, as described in greater detail below. The final results of these delegated steps, directly and/or indirectly, are assembled into a syntactically and semantically coherent final output responsive to complex prompt 120. The components of complex prompt handling system 110 are illustrated and described with reference to FIG. 2, while the process of operation of complex prompt handling system 110 is illustrated and described by reference primarily to FIGS. 5-7, below. FIGS. 3 and 4 illustrate training methods for various models within complex prompt handling system 110, including model 210 and specialist models 220a-n. FIGS. 3-7 are described separately below, but elements illustrated in FIG. 2 are best understood in view of FIGS. 2-7, together.


Manager 200 (with its various constituent modules 202-210) forms the core of complex prompt handling system 110 and is responsible both for initial processing of complex prompts and for final assembly of outputs responsive to those complex prompts. Specialist models 220 (used herein to refer to any or all specialist models 220a-n) are MLMs, and in most embodiments specifically LLMs, that are trained for competence in specific domains identified by their respective domain descriptors 222 (used herein to refer to any or all domain descriptors 222a-n). Domain descriptors 222 can be natural-language terms or phrases identifying the specialization of their respective models 220, e.g., “mapping and routefinding,” “email generation,” or “mathematics.” Advantageously, manager 200 delegates specific steps necessary for the execution of each complex prompt 120 to appropriate specialist models 220, thereby providing advantages from specialized training (efficiency, reliability, reduced hallucination, etc.) at each step, in a generalist overall system.


As noted above, specialist models 220 are MLMs such as LLMs that are trained for efficiency and reliability within specific domains. As illustrative examples, specialist model 220a can be a model fine-tuned to scrape data from websites, specialist model 220b can be a model trained to generate emails to match a style of a particular individual (i.e., from training data including a set of that person's sent emails), and specialist model 220d can be an MLM trained to perform chemical reaction mathematics. Specialist models 220 need not have anything in common with each other but their accessibility to manager 200. In FIG. 2, specialist models 220d and 220e are shown as situated on local device 130 and remote device 142 or 144, respectively. More generally, FIG. 2 illustrates that not all specialist models 220 need be situated on computer hardware device 100. In some instances specialist models may be proprietary third-party models accessed, e.g., through API 146, or first-party models retrieved or queried on an as-needed basis from separate or remote hardware. Specialist models 220 can, for example, be convolutional or recurrent neural network models. In alternative cases, some specialist models 220 can be random forest models or other decision-tree-like models. In all cases, however, specialist models 220 are models trained via machine learning, with at least a portion of this training including training data specific to a particular domain (e.g., web scraping, email mimicry, or mathematics, to carry forward the previous examples). This training data can be wholly manually labeled (i.e., for supervised learning) or partially manually labeled (i.e., for semi-supervised learning).


As noted above, manager 200 includes planner 202, selection module 204, integration module 208, and model 210. Model 210 is an LLM trained to process natural language prompts. In various embodiments model 210 can be used by various functional components of manager 200, including planner 202, selection module 204, and/or integration module 208. Although planner 202, selection module 204, and integration module 208 are described separately below in terms of function for the purpose of explanation, in some embodiments many functions of planner 202, selection module 204, and/or integration module 208 may be performed by providing prompts or context injections to model 210, i.e., to a single shared generalist LLM used by manager 200. In other embodiments, however, manager 200 can include multiple models 210, each dedicated specifically to a subset of modules 202, 204, and/or 208.


Planner 202 is a semantic processing module capable of decomposing a complex prompt into a plurality of steps. Planner 202 can, for example, be a planner such as those used in conventional systems like Semantic Kernel or LangChain. In the most general case, planner 202 can be any suitable natural language processing (NLP) agent capable of identifying a plurality of actionable tasks (i.e., steps) for the resolution of complex prompt 120. Planner 202 can, for example, make use of model 210 for generative production of a response to complex prompt 120 that identifies these actionable tasks. In some embodiments planner 202 can immediately output a plan (i.e., several steps) in direct response to the complex prompt. In other embodiments planner 202 can assemble the plan over several iterations, e.g., by identifying an initial task or set of tasks insufficient to completely address the complex prompt, then supplementing this partial plan with an additional step or steps in response to completion of the initial task or tasks. For simplicity, the following explanation does not distinguish between initially-generated and subsequently-generated steps.
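As a purely illustrative sketch of the planner's role, the snippet below assumes a hypothetical generalist model (standing in for model 210) that replies to a decomposition prompt with a numbered list, and parses that reply into discrete steps. The prompt wording and the parsing scheme are assumptions for illustration, not the disclosed implementation:

```python
import re

# Hypothetical prompt template asking a generalist LLM (model 210) to emit
# a numbered plan; the actual model call is omitted, and a canned reply is
# used so the sketch stays self-contained.
PLANNER_PROMPT = (
    "Decompose the following request into a numbered list of atomic steps:\n{prompt}"
)

def parse_plan(llm_reply: str):
    """Extract one step per numbered line of the planner's reply."""
    return [m.group(1).strip()
            for m in re.finditer(r"^\s*\d+[.)]\s*(.+)$", llm_reply, re.MULTILINE)]

reply = ("1. Identify the three highest-selling jazz musicians of the 1970s.\n"
         "2. Compare their musical styles in a report.")
```

In an iterative planner, `parse_plan` would simply be re-run on each supplemental reply, appending new steps to the existing partial plan.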


Selection module 204 is responsible for identifying, for each step provided by planner 202, an approach to executing that step using specialist models 220. As described in greater detail below, selection module 204 identifies, for each step identified by planner 202, a corresponding set of models to be queried at that particular step and a method by which this model or these models are to be queried. Selection module 204 can, by way of example, identify a subset of models suitable for executing each step from among all models (210 and 220) available within complex prompt handling system 110, and provide a prompt (general or step-specific, and either generated by planner 202 as a part of the plan, or generated by selection module 204 based on the plan and model record 206; see below) corresponding to the step in question to each of the selected models of that subset.


Selection module 204 can include model record 206 that maps a domain to each specialist model 220, i.e., that reflects domain descriptors 222. Model record 206 can, in some examples, be generated or updated on-the-fly as specialist models 220 are added to or removed from complex prompt handling system 110. Model record 206 enables selection module 204 to perform intent identification of steps generated by planner 202 via model 210 (or another appropriate model) to identify specialist models 220 having a domain (i.e., subject-matter specialization) relevant to that intent. In some such embodiments, model 210 can be trained in model selection through provision of a large number of (step/task) prompts labeled as suited for a particular specialist model. In alternative embodiments, functions described herein by reference to planner 202 and selection module 204 can be performed inseparably based on context of complex prompt 120 using a meta-language model (i.e., wherein model 210 is a meta-language model) trained to identify specialist models 220 as a part of each plan step generated from complex prompt 120, or trained to semantically identify a task or type of task referenceable against domain descriptors 222.
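One way the semantic comparison between a plan step and each domain descriptor 222 might be realized is sketched below. A production system would use learned sentence embeddings (e.g., via model 210); token-overlap cosine similarity is substituted here only to keep the example self-contained, and the model names, descriptors, and threshold are hypothetical:

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stand-in for a
    learned sentence embedding)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_models(step, model_record, threshold=0.1):
    """Score every specialist model's domain descriptor against the step;
    return (model_name, score) pairs above threshold, best first."""
    step_vec = embed(step)
    scored = [(name, cosine(step_vec, embed(desc)))
              for name, desc in model_record.items()]
    return sorted([s for s in scored if s[1] >= threshold],
                  key=lambda s: -s[1])

# Hypothetical model record mapping model names to domain descriptors.
model_record = {
    "mapper": "mapping and routefinding",
    "mailer": "email generation",
    "mathbot": "mathematics",
}
```

A step with no descriptor scoring above the threshold would fall through to the generalist model, mirroring the fallback behavior described for model 210.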


Manager 200 delegates execution of a particular plan step to selected specialist models 220 by providing prompts corresponding to that step to the selected specialist models 220. In some embodiments, model record 206 can, in addition to mapping domains of each specialist model 220, also identify prompt rules, formats, or templates suitable for each or some specialist models 220, e.g., for context injection based on the designated model and the specific step, or complex prompt 120. More generally, prompts provided to models 210 and 220 can include retrieval-augmented generation (RAG) or other context injection to reduce hallucination or otherwise constrain outputs based on templates retrieved from or data validated through outside data storage, e.g., lists and/or vector and/or relational databases either stored in memory 104 or retrieved from other devices 130, 142, or 144. Unless otherwise specified, this context injection can be provided by manager 200 based on operation of selection module 204, or can be sourced externally to complex prompt handling system 110 based on the nature of the prompt provided by selection module 204 to the selected specialist model.
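The per-model prompt templates and context injection that model record 206 might carry could be approximated as follows; the template text, model names, and lookup scheme are illustrative assumptions only:

```python
# Hypothetical per-model prompt templates, approximating one role of model
# record 206: shaping the prompt sent to each selected specialist model,
# with retrieved context injected RAG-style.
TEMPLATES = {
    "mailer": "Write an email in the house style.\nContext:\n{context}\nTask: {task}",
    "default": "Task: {task}\nRelevant context:\n{context}",
}

def build_prompt(model_name, task, retrieved_context):
    """Fill the model-specific template (or a default) with the step's task
    and any context retrieved from outside data storage."""
    template = TEMPLATES.get(model_name, TEMPLATES["default"])
    return template.format(task=task, context="\n".join(retrieved_context))
```

A model absent from the template table simply receives the default format, so newly registered specialist models need no template to be usable.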


Specialist models 220 advantageously offer improved performance within their specialized domains over a generalist (general-purpose, jack-of-all-trades) model, but may be less capable outside of those specialized domains. Steps for which selection module 204 can identify no appropriate specialist model 220 are handled by model 210, or by another generalist model. Specialist models 220 can have domains of varying breadth. In some embodiments, complex prompt handling system 110 can include both specialist models with non-overlapping domains, and specialist models with overlapping domains, e.g., of broader or narrower scope. During the execution of each step generated by planner 202 in response to complex prompt 120, selection module 204 may identify multiple relevant specialist models 220 to be separately or collectively prompted during that step, as discussed in greater detail below.


Integration module 208 is a natural language processing module disposed to generate a singular output responsive to the complex prompt based on the outputs of the steps of the plan generated by planner 202, as executed by specialist models 220 (and in some instances model 210) per delegation by selection module 204. Integration module 208 can, for example, include its own specialized LLM trained to aggregate these various model outputs into a semantically and syntactically coherent single output without introducing hallucinations or errors, or omitting information provided by the various specialist models 220. Alternatively, model 210 can be a generalist LLM (as described above) capable of performing this function using prompts generated by integration module 208 from the aforementioned outputs of specialist models 220 (and in some instances model 210), i.e. such that the same model 210 provides the trained machine learning backbone of integration module 208 and planner 202, selection module 204, or both. Integration module 208 can in some embodiments receive inputs from all designated specialist models 220 used in handling steps identified by planner 202. In other embodiments, where outputs of some specialist models 220 are used only to provide inputs to other specialist models 220 (see FIG. 6, described below), integration module 208 can receive only a subset of the outputs of specialist models 220. In some instances integration module 208 can also receive the plan produced by planner 202, and generate its output based on a combination of the outputs of specialist models 220 and the plan and/or complex prompt 120. Inputs of integration module 208 can, for example, be aggregated into an LLM prompt using model 210. Although FIG. 2 illustrates a single integration module 208, some embodiments of complex prompt handling system 110 can include specialized integration modules suited to aggregating model outputs of particular types. Similarly, although integration module 208 is mainly described herein as a tool for generating the aforementioned singular output responsive to complex prompt 120, some embodiments of integration module(s) 208 can serve an intermediate function by structuring outputs of several specialist models 220 for use as prompts to other specialist models 220.
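The aggregation role of integration module 208 can be sketched as a single prompt-building step for a hypothetical integration LLM; the prompt wording below is an assumption for illustration, not the disclosed method:

```python
# Sketch of the integration stage: fold every step output (plus the original
# complex prompt) into one prompt for a hypothetical integration LLM, which
# would then produce the final coherent answer.
def build_integration_prompt(complex_prompt, step_outputs):
    parts = [f"Original request: {complex_prompt}", "Step results:"]
    parts += [f"{i}. {out}" for i, out in enumerate(step_outputs, 1)]
    parts.append("Combine the step results above into one coherent answer "
                 "without adding new facts or omitting any provided information.")
    return "\n".join(parts)
```

Including the original request alongside the step outputs corresponds to the embodiments in which the integration module also receives the plan and/or complex prompt 120.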



FIG. 3 presents training method 300, a simplified method for supervised or semi-supervised training of machine learning models such as models 210 and/or 220. Training method 300 includes the steps of generating training data (step 302), training the computer-implemented machine learning model with the training data (step 304), and testing the trained computer-implemented machine learning model with test data (step 306). Training method 300 is a method of machine learning that can be used to train any suitable computer-implemented machine learning model for use with complex prompt handling system 110 (see FIG. 2, discussed above) or method 700 (see FIG. 7, discussed below).


In step 302, the training data is generated. For training the computer-implemented machine learning model(s) used in complex prompt handling system 110, training data includes domain specific information with example inputs labeled (i.e. mapped to or tagged with) example outputs. Data can be labeled entirely manually or, to reduce labor, can be labeled in a semi-automated fashion, e.g. using clustering or analogy to manually labeled data. Each specialist model 220 is trained on different training data, although in some embodiments some specialist models 220 can be trained on training data that is a subset of or overlaps with training data used to train other specialist models 220.


In step 304, the labeled data is used to train each computer-implemented machine learning model (i.e. models 210, 220) to produce appropriate outputs given inputs within its domain. Assuming models operate within their domains, training over a broader domain, i.e. for models intended for more general use, will for many applications produce less reliable or accurate outputs than narrower training, i.e. of more specialized models. This is generally the case not only when conserving overall volume of training data (i.e. such that a model having a narrower domain has a higher density of training data within that domain), but also when conserving density of training data (i.e. where a comparatively less specialized model is trained with more data, but the subset of that training data corresponding to the domain of a narrower model is analogous to the entirety of training data used to train the narrower model). In other words, the introduction of out-of-domain training data to broaden or generalize model competence can, within some domains, produce less reliable or accurate in-domain outputs. As a consequence, specialist models 220 of various scopes can be useful so long as selection module 204 is capable of delegating tasks or steps intelligently.


As used herein, “training” a computer-implemented machine learning model refers to any process by which parameters, hyperparameters, weights, and/or any other value related to model accuracy are adjusted to improve the fit of the computer-implemented machine learning model to the training data. The labeled data can be transformed by, for example, one or more programs and/or one or more other trained machine learning models before it is used for training in step 304.


In step 306, the trained computer-implemented machine learning model is tested with domain-specific test data. This test data is unlabeled data that can be used to qualify and/or quantify performance of the trained computer-implemented machine learning model. More specifically, a human or machine operator can evaluate the performance of the machine learning model by evaluating the fit of the model to the test data. Step 306 can be used to determine, for example, whether the machine learning model was overfit to the labeled data during model training in step 304.


As depicted in FIG. 3, steps 304 and 306 can be performed iteratively to improve the performance of the machine learning model. More specifically, if the fit of the model to the unlabeled data determined in step 306 is undesirable, step 304 can be repeated to further adjust the parameters, hyperparameters, weights, etc. of the model to improve the fit of the model to the test data. Step 306 can then be repeated with a new set of unlabeled test data to determine how the adjusted model fits the new set of unlabeled test data. If the fit continues to be undesirable, further iterations of steps 304 and 306 can be performed until the fit of the model becomes desirable.
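The iteration between steps 304 and 306 can be sketched as follows. This is a minimal illustration only, not part of the disclosed embodiments: the model, the adjustment routine (`adjust_fn`), the fit metric (`fit_fn`), and the target threshold are all hypothetical stand-ins.

```python
def train_until_fit(model, labeled_data, test_sets, adjust_fn, fit_fn, target=0.9):
    """Iterate steps 304 and 306: further adjust the model on labeled data,
    then evaluate its fit against fresh unlabeled test data, until the fit
    is acceptable or the supply of test sets is exhausted."""
    for test_data in test_sets:
        model = adjust_fn(model, labeled_data)   # step 304: further training
        if fit_fn(model, test_data) >= target:   # step 306: evaluate fit
            return model
    return model


# Toy stand-ins: the "model" is a single number and each adjustment
# improves its fit score by 0.5, so the loop stops on the second pass.
result = train_until_fit(
    0.0, None, [None, None, None],
    adjust_fn=lambda m, d: m + 0.5,
    fit_fn=lambda m, t: m,
)
```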


Training method 300 can advantageously be used to train any machine learning model described herein. More generally, the systems and methods disclosed herein advantageously allow for the training and use of machine learning models that can be used for a general-purpose model (e.g. model 210) as well as for domain-specific specialist models (e.g. specialist models 220) by varying the scope of training and test data.



FIG. 4 illustrates training method 400, a simplified expansion of training method 300 to include fine tuning and/or transfer learning. Method 400 includes steps 302, 304, and 306 as described above with respect to FIG. 3, as well as parallel steps 402, 404, and 406 generally mirroring steps 302, 304, and 306. Method 300 can, for example, result in a trained model such as a neural network model, where training produces edge weights between nodes distributed across several network levels or depths. In one embodiment, FIG. 4 provides a method of fine-tuning this model by applying further training to this existing model according to substantially the same process set forth with respect to FIG. 3 (i.e. with steps 402, 404, and 406 generally qualitatively matching steps 302, 304, and 306), but with new training and testing data matching an altered domain. Fine-tuning of this kind can be used to improve the functioning of a broader (i.e. less specialized) model within a more specialized domain by providing a domain-constrained second set of labeled training data and testing the resulting model with more specialized test data. Alternatively or additionally, training method 400 can include transfer learning provided by freezing edge weights (step 408) of the model produced through iterations of step 304, and adding additional model layers adjustable through training in steps 402-406.
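The transfer-learning variant of method 400 (freezing existing edge weights in step 408 and appending trainable layers adjusted in steps 402-406) can be sketched with a toy model representation. The list-of-layers structure below is a hypothetical illustration, not the disclosed network architecture.

```python
def freeze_and_extend(base_model, new_layer_weights):
    """Transfer learning per method 400: freeze the edge weights of the
    previously trained model (step 408) and append a new trainable layer
    to be adjusted in steps 402-406. Toy representation: a model is a
    list of layers, each a dict of weights plus a 'frozen' flag."""
    frozen_base = [{"weights": dict(layer["weights"]), "frozen": True}
                   for layer in base_model]
    new_head = {"weights": dict(new_layer_weights), "frozen": False}
    return frozen_base + [new_head]


# A two-layer base model gains a third, trainable layer; the base layers
# are no longer adjustable.
base = [{"weights": {"w1": 0.2}, "frozen": False},
        {"weights": {"w2": -0.1}, "frozen": False}]
extended = freeze_and_extend(base, {"w3": 0.0})
```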


In some embodiments, model 210 can be a general purpose model produced by method 300, and at least some of specialized models 220 can be produced from model 210 via fine-tuning and/or transfer learning from model 210. In other embodiments, at least some specialized models 220 can be trained entirely independently of model 210, or even entirely independently of any more general model. In either case, the domain of training and testing data used for training specialist models 220 is narrower, i.e. more specialized, than the domain of training data used to train model 210.



FIG. 5 is a flowchart illustrating process 500 for decomposing and processing complex prompt 120 using complex prompt handling system 110. Specifically, planner 202 generates plan 504 including several separate steps 506a, 506b, 506c (collectively or generically referred to hereinafter as steps 506). Steps 506 are provided for illustrative purposes only—more generally, plan 504 can include any number of steps 506 generated by planner 202 to address complex prompt 120. FIG. 6 is a flowchart illustrating process 600, which depicts handling of an individual step 506 (of FIG. 5) in greater detail. FIGS. 5 and 6 are described together.


Each step 506 of plan 504 includes a task generally specifying an output. Steps 506 can take the form of prompts. Carrying forward the illustrative examples introduced with discussion of FIG. 2, step 506a could be “determine what products are currently discounted on GenericBrand website,” and step 506b could be “generate a cheerful informational email informing frequent GenericBrand buyers of new offers based on currently discounted products.” Steps 506 can be executed in parallel, as illustrated in FIG. 5. Alternatively or additionally, some steps 506 can be executed in series, with outputs of previously executed steps 506 used as inputs to other steps 506.


As illustrated in FIG. 5, each step 506 is received by selection module 204, which generates a corresponding approach 508a, 508b, 508c (hereinafter collectively or generically referred to as approaches 508). Each approach 508 identifies means for extracting information from one or more models 220 responsive to step 506. In the simplest possible case, approach 508 identifies a single relevant model 220 to execute step 506 and return an output to integration module 208. As discussed above with reference to FIG. 2, integration module 208 synthesizes final output 510 based on outputs from steps 506. Final output 510 is an output responsive to complex prompt 120. Preferably, final output 510 is a result or other output that fully addresses the intent of complex prompt 120 from user 108. Where user intent is unclear, or further information is needed from user 108 in order to provide a satisfactory result, final output 510 can instead or additionally include follow-up questions to user 108 prompting user 108 to provide additional information to supplement complex prompt 120.


Results from multiple steps 506 can be merged by integration module 208 (e.g. using model 210) at the stage of producing final output 510. In some embodiments, however, results from execution of one or more steps 506 using associated approach 508 can be inputs of other steps 506, as noted above.


As shown in FIG. 5, approaches 508 can vary between steps 506 (see approaches 508i and 508j for steps 506a and 506c, respectively), but need not be different for every step (see approach 508i used for both step 506a and step 506b).



FIG. 6 explores differences in approaches 508 in greater detail by reference to more complex cases wherein multiple models 220 may be relevant to execution of a step 506. Specifically, FIG. 6 depicts process 600, by which selection module 204 receives step 506 from planner 202 and analyzes (action 602a-n, hereinafter collectively and/or generically semantic analysis 602) each domain descriptor 222a-n for relevance to step 506. This analysis can, for example, be semantic analysis to identify conceptual or subject matter overlap between natural language of step 506 and natural language descriptions of each model 220 provided through associated domain descriptors 222, either provided upon request or stored locally in model record 206.


Semantic analysis 602 of each respective domain descriptor 222 for relevance to step 506 produces a respective model relevance score 604a-n (hereinafter generically and/or collectively relevance score(s) 604) quantifying a degree of relevance of corresponding specialist model 220 to step 506. Model relevance scores 604 can, in some embodiments, be binary, identifying a model either as relevant or irrelevant to step 506. In more complex embodiments of process 600, model relevance scores 604 can reflect a degree of semantic overlap between step 506 and associated domain descriptor 222. In some such embodiments, semantic analysis 602 can, for example, include vector similarity (e.g., cosine similarity) scoring between vectorized text of all or parts of step 506 and each domain descriptor 222. In other such embodiments, scoring of model relevance 604 can, for example, be based on intent recognition (i.e., intent classification) performed, e.g., by model 210. In still further embodiments, model 210 or a separate dedicated delegation model (not shown) can undergo reinforcement learning or other training as set forth in FIG. 4 with reference specifically to the set of specialist models 220 available within prompt processing system 10, so as to train identification of relevant models 220 to steps 506.
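The cosine-similarity variant of semantic analysis 602 can be sketched as follows. This is a minimal illustration assuming a simple bag-of-words vectorization and invented model names and descriptor text; a production system would use learned text embeddings rather than raw term counts.

```python
from collections import Counter
from math import sqrt


def vectorize(text):
    # Bag-of-words term counts stand in for vectorized text here.
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = (sqrt(sum(v * v for v in a.values()))
            * sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def score_models(step_text, domain_descriptors):
    """Produce a relevance score 604 per specialist model, keyed by a
    (hypothetical) model name, by comparing the step text against each
    model's domain descriptor 222."""
    step_vec = vectorize(step_text)
    return {name: cosine_similarity(step_vec, vectorize(descriptor))
            for name, descriptor in domain_descriptors.items()}


scores = score_models(
    "determine what products are currently discounted on the website",
    {"retail": "products currently discounted on retail websites",
     "email": "generating cheerful informational marketing copy"},
)
```

With these illustrative descriptors, the “retail” model scores strictly higher than the “email” model for the product-discount step, so selection module 204 would favor it.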


If relevance of specific models cannot be ascertained from step 506 based on a lack of information in complex prompt 120, complex prompt handling system 110 may solicit additional information from user 108 via follow-up questions or requests for clarification (i.e., as new complex prompts 120, either replacing or cumulative with a previous complex prompt 120). In some embodiments the need for such clarifications can be identified, and clarification requested, prior to generation of plan 504 by planner 202.


Selection module 204 selects (action 606) a subset of available specialist models (hereinafter selected model set 608) 220 based on their respective model relevance scores 604. In some embodiments, selected model set 608 can consist of the single highest-scoring model 220, or of a preset number N of highest-scoring models 220. In other embodiments, selected model set 608 can include all models scoring over a minimum relevance threshold (e.g., cosine similarity>0.6). In combinations of these approaches, selected model set 608 can include up to a maximum number of highest-scoring models 220, excluding any scoring below a threshold value. Each specialist model 220 in selected model set 608 (illustratively shown as including specialist models 220b and 220n) is used in the generation of an intermediate model output 610 provided to integration module 208.
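The combined selection approach of action 606 (a cap on the number of highest-scoring models, plus a minimum relevance threshold) can be sketched as follows. The threshold of 0.6 and cap of two models are illustrative defaults, not prescribed values.

```python
def select_models(relevance_scores, threshold=0.6, max_models=2):
    """Action 606, hybrid form: keep at most max_models of the
    highest-scoring models, excluding any scoring below the threshold."""
    eligible = [(name, score) for name, score in relevance_scores.items()
                if score >= threshold]
    eligible.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in eligible[:max_models]]


# "d" fails the threshold; "c" clears it but is crowded out by the cap.
selected = select_models({"a": 0.9, "b": 0.7, "c": 0.65, "d": 0.3})
```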


In one embodiment, all specialist models in selected model set 608 are prompted in parallel. According to this approach, each specialist model 220 in selected model set 608 is provided with step 506 or a natural language portion thereof as a prompt, producing a respective intermediate model output 610 therefrom. These intermediate model outputs (illustratively shown as intermediate model outputs 610b and 610n, corresponding to specialist models 220b and 220n, respectively) are synthesized by integration module 208 to produce step output 212.
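The parallel-prompting embodiment can be sketched as follows, with each specialist model 220 in selected model set 608 represented by a hypothetical callable stub standing in for an LLM invocation.

```python
from concurrent.futures import ThreadPoolExecutor


def prompt_in_parallel(selected_models, step_prompt):
    """Provide the same step prompt to every model in the selected set at
    once and collect one intermediate output per model, in order, for the
    integration module."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda model: model(step_prompt), selected_models))


# Two stub "models" transform the same prompt independently.
outputs = prompt_in_parallel([str.upper, str.title], "draft email")
```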


In an alternative embodiment, specialist models in selected model set 608 are prompted in series, i.e., as a feed-forward from one model to the next. According to this approach, a first specialist model 220 in selected model set 608 is provided with step 506 or a natural language portion thereof as a prompt (specialist model 220b, in the illustrated example). An output of this specialist model is provided as input to another specialist model (specialist model 220n, in the illustrated example), either alone or in combination with other information such as step 506. This feed-forward approach continues until all specialist models 220 in selected model set 608 have been queried, thereby producing an intermediate model output 610 (intermediate model output 610n, in the illustrated example). In some embodiments, selected model set 608 can be traversed serially in multiple orders (e.g., ABC, ACB, CBA, . . . etc.), with each such traversal producing an associated intermediate model output 610. In some embodiments of process 600, a combination of serial and parallel approaches can be used to traverse selected model set 608.
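The serial feed-forward embodiment, including traversal of the selected set in multiple orders, can be sketched as follows; again, each model is a hypothetical callable stub.

```python
from itertools import permutations


def traverse_serially(models, step_prompt):
    """Feed-forward traversal: each model's output becomes the next
    model's input, ending in one intermediate model output 610."""
    text = step_prompt
    for model in models:
        text = model(text)
    return text


def traverse_all_orders(models, step_prompt):
    """One intermediate model output 610 per ordering of the selected
    model set (e.g., ABC, ACB, CBA, ...)."""
    return [traverse_serially(order, step_prompt)
            for order in permutations(models)]


# Two non-commuting stub models illustrate that traversal order matters.
append_a = lambda text: text + "a"
upper = str.upper
single = traverse_serially([append_a, upper], "hi")
all_orders = traverse_all_orders([append_a, upper], "hi")
```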


Intermediate model outputs 610 are reconciled and/or aggregated by integration module 208 to produce step output 612, an output responsive to step 506. Integration module 208 can, for example, poll individual model outputs 610 for commonalities between outputs, or for agreement between different models. Step outputs 612 for each step 506 are synthesized by integration module 208 to produce final output 510 responsive to complex prompt 120 as a whole, as described above with reference to FIG. 5.
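The polling form of reconciliation can be sketched as a simple majority vote over intermediate outputs. This is an illustrative reduction only; a production integration module would synthesize outputs with a large language model rather than by exact-match voting.

```python
from collections import Counter


def reconcile(intermediate_outputs):
    """Poll intermediate model outputs 610 for agreement and keep the
    most common answer as the step output."""
    answer, _votes = Counter(intermediate_outputs).most_common(1)[0]
    return answer


step_output = reconcile(["in stock", "sold out", "in stock"])
```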



FIG. 7 is a method flowchart illustrating method 700. Method 700 provides an overview of the approach set forth in FIGS. 2, 5, and 6 whereby prompt processing system 10 decomposes complex prompt 120 into steps, handles each step with a selected approach 508, and generates a final output 510 from these intermediate step results. Method 700 begins with the receipt of complex prompt 120 from user 108 via user interface 106 (see FIG. 1). (Step 702). Planner 202 of manager 200 within complex prompt handling system 110 then generates a plan 504 based on complex prompt 120 (see FIG. 2 and associated description). (Step 704). Plan 504 can include any number of steps 506. For each step 506, so long as steps remain (evaluation 706), selection module 204 of manager 200 in complex prompt handling system 110 selects an approach 508 for addressing that step 506 using an appropriate subset of models 220 (see FIGS. 5 and 6). (Step 708). A step output 612 is generated by integration module 208 of manager 200 for each step 506 using the selected approach. (Step 710). In some embodiments, step outputs 612 can be used in the completion of downstream steps, as noted with reference to FIGS. 2 and 5. Once step outputs 612 for each step 506 have been produced, integration module 208 of manager 200 in complex prompt handling system 110 integrates these results (Step 714) to produce final output 510 (Step 716), a unified output responsive to complex prompt 120.
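Method 700 as a whole can be sketched as the following pipeline, with hypothetical callables standing in for the planner, selection, execution, and integration modules of FIG. 2.

```python
def handle_complex_prompt(prompt, plan_fn, select_fn, execute_fn, integrate_fn):
    """Sketch of method 700: decompose the prompt into a plan (step 704),
    select an approach and generate an output for each step (steps 708
    and 710), then integrate step outputs into a final output (steps
    714 and 716)."""
    steps = plan_fn(prompt)                            # planner 202
    step_outputs = [execute_fn(step, select_fn(step))  # selection + execution
                    for step in steps]
    return integrate_fn(step_outputs)                  # integration module 208


# Toy stand-ins: the "plan" naively splits on " and ", every step gets
# the same approach, and integration joins step outputs.
final = handle_complex_prompt(
    "find current discounts and draft a promotional email",
    plan_fn=lambda p: p.split(" and "),
    select_fn=lambda step: "specialist",
    execute_fn=lambda step, approach: f"{approach}: {step}",
    integrate_fn=" | ".join,
)
```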


The systems and methods set forth above advantageously allow specialized machine learning models to be leveraged to complete relevant steps of complex prompt responses, thereby enabling system 10 as a whole to exhibit broad general competence across a wide range of subject matter while reducing hallucination and error rate compared to purely generalist model-based approaches. This is accomplished by separately training multiple specialist models 220, and by identifying and traversing a relevant subset of specialist models 220 with respect to each step 506 involved in responding to complex prompt 120 via an appropriate approach 508.


Discussion of Possible Embodiments

A method of processing a compound prompt, the method comprising: decomposing the compound prompt into a plan having a plurality of steps via a planner module instantiated in machine-readable memory and operable by a processor; associating a domain descriptor to each of a plurality of distinct machine learning models; for each of the plurality of steps: generating a model relevance score by semantic comparison of the step to the domain descriptors of each of the plurality of distinct machine learning models; selecting a subset of the plurality of distinct machine learning models based on the respective model relevance score of each of the plurality of distinct machine learning models; and generating a step output addressing the step, from the selected subset of the plurality of distinct machine learning models; and assembling all of the step outputs into a syntactically and semantically coherent final output via an integration module utilizing a large language model.


The method of the preceding paragraph can optionally include, additionally and/or alternatively, any one or more of the following features, configurations and/or additional components:


A further embodiment of the foregoing method, wherein generating the model relevance score comprises evaluating vector cosine similarity between vectorized text of each of the domain descriptors and at least a portion of the step.


A further embodiment of the foregoing method, wherein generating the model relevance score comprises classifying intent of at least a portion of the step, and scoring similarity of the classified intent with the domain descriptors of each of the plurality of distinct machine learning models.


A further embodiment of the foregoing method, wherein selecting the subset of the plurality of distinct machine learning models comprises selecting those of the plurality of distinct machine learning models having associated model relevance scores above a threshold value.


A further embodiment of the foregoing method, wherein selecting the subset of the plurality of distinct machine learning models comprises selecting those of the plurality of distinct machine learning models having the highest associated model relevance scores among the plurality of distinct machine learning models.


A further embodiment of the foregoing method, wherein generating the step output comprises integrating intermediate outputs from all models of the selected subset of distinct machine learning models.


A further embodiment of the foregoing method, wherein generating the step output comprises serially traversing all of the selected subset of distinct machine learning models via feed-forward of an initial model output of one of the selected subset of distinct machine learning models as input into another of the selected subset of distinct machine learning models.


A further embodiment of the foregoing method, wherein generating the step output comprises serially traversing the selected subset of the distinct machine learning models in multiple orders, and wherein generating the step output comprises integrating intermediate outputs of a last specialist model in each such sequence via a large language model.


A further embodiment of the foregoing method, further comprising training each of the plurality of distinct machine learning models using different training data specific to its associated domain descriptor.


A further embodiment of the foregoing method, wherein at least a subset of the plurality of distinct machine learning models are trained entirely separately from others of the plurality of distinct machine learning models, without overlapping training data.


A further embodiment of the foregoing method, wherein at least a subset of the plurality of distinct machine learning models are specialized in a respective domain via fine-tuning or transfer learning.


A further embodiment of the foregoing method, wherein each of the plurality of distinct machine learning models is a large language model.


A system for generating a response to a complex prompt, the system comprising: an input device configured to receive the complex prompt; a logic processor; machine-readable memory; a plurality of specialized large language models (LLMs) instantiated in the machine-readable memory, each of the specialized LLMs having an associated domain descriptor identifying its area of specialization; and a manager comprising: a planner instantiated in the machine-readable memory and operable via the logic processor to decompose the complex prompt into a plan having a plurality of steps; a selection module instantiated in the machine-readable memory and operable via the logic processor, for each of the plurality of steps, to identify an approach for generating a step output corresponding to a respective step using a step-specific subset of the plurality of specialized LLMs, wherein the approach identifies step-specific subset of the plurality of specialized LLMs by semantic comparison between the respective step and the associated domain descriptor of each of the plurality of specialized LLMs; and an integration module instantiated in the machine-readable memory and operable via the logic processor to generate a language output responsive to the complex prompt from all of the step outputs corresponding to each of the plurality of steps.


The system of the preceding paragraph can optionally include, additionally and/or alternatively, any one or more of the following features, configurations and/or additional components:


A further embodiment of the foregoing system, wherein, for each of the plurality of steps, the selection module is configured to assign a model relevance score to each of the plurality of specialized LLMs based on the semantic comparison between the respective step and the associated domain descriptor of the respective specialized LLM.


A further embodiment of the foregoing system, wherein assigning the model relevance score comprises evaluating cosine similarity of vectorized text of the associated domain descriptor to vectorized text of at least a portion of the respective step.


A further embodiment of the foregoing system, wherein assigning the model relevance score comprises classifying intent of at least a portion of the step, and scoring similarity of the classified intent to the respective domain descriptor.


A further embodiment of the foregoing system, wherein identifying the step-specific subset of the plurality of specialized LLMs comprises identifying those of the plurality of specialized LLMs having an associated model relevance score exceeding a threshold value.


A further embodiment of the foregoing system, wherein identifying the step-specific subset of the plurality of specialized LLMs comprises identifying those of the plurality of specialized LLMs having the highest model relevance scores among all of the plurality of specialized LLMs.


A further embodiment of the foregoing system, wherein the integration module is further operable to generate the step outputs for each of the plurality of steps using outputs from multiple of the subset of the plurality of specialized LLMs associated with that respective step.


A further embodiment of the foregoing system, wherein each of the plurality of specialized LLMs is trained to its respective area of specialization via fine tuning, transfer learning, or both.


Summation

Any relative terms or terms of degree used herein, such as “substantially”, “essentially”, “generally”, “approximately” and the like, should be interpreted in accordance with and subject to any applicable definitions or limits expressly stated herein. In all instances, any relative terms or terms of degree used herein should be interpreted to broadly encompass any relevant disclosed embodiments as well as such ranges or variations as would be understood by a person of ordinary skill in the art in view of the entirety of the present disclosure, such as to encompass ordinary manufacturing tolerance variations, incidental alignment variations, alignment or shape variations induced by thermal, rotational or vibrational operational conditions, and the like.


While the invention has been described with reference to an exemplary embodiment(s), it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment(s) disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.

Claims
  • 1. A method of processing a compound prompt, the method comprising: decomposing the compound prompt into a plan having a plurality of steps via a planner module instantiated in machine-readable memory and operable by a processor; associating a domain descriptor to each of a plurality of distinct machine learning models; for each of the plurality of steps: generating a model relevance score by semantic comparison of the step to the domain descriptors of each of the plurality of distinct machine learning models; selecting a subset of the plurality of distinct machine learning models based on the respective model relevance score of each of the plurality of distinct machine learning models; and generating a step output addressing the step, from the selected subset of the plurality of distinct machine learning models; and assembling all of the step outputs into a syntactically and semantically coherent final output via an integration module utilizing a large language model.
  • 2. The method of claim 1, wherein generating the model relevance score comprises evaluating vector cosine similarity between vectorized text of each of the domain descriptors and at least a portion of the step.
  • 3. The method of claim 1, wherein generating the model relevance score comprises classifying intent of at least a portion of the step, and scoring similarity of the classified intent with the domain descriptors of each of the plurality of distinct machine learning models.
  • 4. The method of claim 1, wherein selecting the subset of the plurality of distinct machine learning models comprises selecting those of the plurality of distinct machine learning models having associated model relevance scores above a threshold value.
  • 5. The method of claim 1, wherein selecting the subset of the plurality of distinct machine learning models comprises selecting those of the plurality of distinct machine learning models having the highest associated model relevance scores among the plurality of distinct machine learning models.
  • 6. The method of claim 1, wherein generating the step output comprises integrating intermediate outputs from all models of the selected subset of distinct machine learning models.
  • 7. The method of claim 1, wherein generating the step output comprises serially traversing all of the selected subset of distinct machine learning models via feed-forward of an initial model output of one of the selected subset of distinct machine learning models as input into another of the selected subset of distinct machine learning models.
  • 8. The method of claim 7, wherein generating the step output comprises serially traversing the selected subset of the distinct machine learning models in multiple orders, and wherein generating the step output comprises integrating intermediate outputs of a last specialist model in each such sequence via a large language model.
  • 9. The method of claim 1, further comprising training each of the plurality of distinct machine learning models using different training data specific to its associated domain descriptor.
  • 10. The method of claim 9, wherein at least a subset of the plurality of distinct machine learning models are trained entirely separately from others of the plurality of distinct machine learning models, without overlapping training data.
  • 11. The method of claim 9, wherein at least a subset of the plurality of distinct machine learning models are specialized in a respective domain via fine-tuning or transfer learning.
  • 12. The method of claim 9, wherein each of the plurality of distinct machine learning models is a large language model.
  • 13. A system for generating a response to a complex prompt, the system comprising: an input device configured to receive the complex prompt; a logic processor; machine-readable memory; a plurality of specialized large language models (LLMs) instantiated in the machine-readable memory, each of the specialized LLMs having an associated domain descriptor identifying its area of specialization; and a manager comprising: a planner instantiated in the machine-readable memory and operable via the logic processor to decompose the complex prompt into a plan having a plurality of steps; a selection module instantiated in the machine-readable memory and operable via the logic processor, for each of the plurality of steps, to identify an approach for generating a step output corresponding to a respective step using a step-specific subset of the plurality of specialized LLMs, wherein the approach identifies the step-specific subset of the plurality of specialized LLMs by semantic comparison between the respective step and the associated domain descriptor of each of the plurality of specialized LLMs; and an integration module instantiated in the machine-readable memory and operable via the logic processor to generate a language output responsive to the complex prompt from all of the step outputs corresponding to each of the plurality of steps.
  • 14. The system of claim 13, wherein, for each of the plurality of steps, the selection module is configured to assign a model relevance score to each of the plurality of specialized LLMs based on the semantic comparison between the respective step and the associated domain descriptor of the respective specialized LLM.
  • 15. The system of claim 14, wherein assigning the model relevance score comprises evaluating cosine similarity of vectorized text of the associated domain descriptor to vectorized text of at least a portion of the respective step.
  • 16. The system of claim 14, wherein assigning the model relevance score comprises classifying intent of at least a portion of the step, and scoring similarity of the classified intent to the respective domain descriptor.
  • 17. The system of claim 14, wherein identifying the step-specific subset of the plurality of specialized LLMs comprises identifying those of the plurality of specialized LLMs having an associated model relevance score exceeding a threshold value.
  • 18. The system of claim 14, wherein identifying the step-specific subset of the plurality of specialized LLMs comprises identifying those of the plurality of specialized LLMs having the highest model relevance scores among all of the plurality of specialized LLMs.
  • 19. The system of claim 13, wherein the integration module is further operable to generate the step outputs for each of the plurality of steps using outputs from multiple of the subset of the plurality of specialized LLMs associated with that respective step.
  • 20. The system of claim 13, wherein each of the plurality of specialized LLMs is trained to its respective area of specialization via fine tuning, transfer learning, or both.
CROSS-REFERENCE TO RELATED APPLICATION

The present application claims priority to U.S. provisional patent application Ser. No. 63/543,454 by S. Joynt, filed Oct. 10, 2023 and entitled “COMPOUND PROMPT PROCESSING USING MULTIPLE INTEGRATED DOMAIN-SPECIALIZED LANGUAGE MODELS.”

Provisional Applications (1)
Number Date Country
63543454 Oct 2023 US