The present disclosure relates generally to machine learning (ML), and more particularly to systems for processing complex or compound queries using large language models (LLMs).
ML language models, including large language models, are commonly used to generate responses to queries. In some cases, complex or compound queries can be addressed by decomposing queries into a series of separate steps or tasks performed sequentially by a language model. In such systems, the language model used is preferably a large language model trained on a diverse range of inputs for many purposes, so as to offer versatility needed to identify and execute steps or tasks of any type needed to address the complex or compound query or prompt.
This disclosure presents a method of processing a compound prompt. This method includes decomposing the compound prompt into a plan having multiple steps, and associating a domain descriptor with each of several specialized machine learning models. For each step of the plan, a relevance score for each specialized model is assigned by semantic comparison between its domain descriptor and the step. A subset of the models is selected based on the relevance scores and used to produce a step output. The outputs of all steps are assembled into a syntactically and semantically coherent final output via an integration module utilizing a large language model.
This disclosure also presents a system for generating a response to a complex prompt. This system includes a manager and several specialized large language models (LLMs), each having an associated domain descriptor identifying its area of specialization. The manager includes a planner, a selection module, and an integration module. The planner is configured to decompose the complex prompt into a plurality of steps. The selection module is configured to identify approaches for generating outputs for each step using a step-specific subset of the plurality of specialized LLMs. The step-specific subset is identified by the selection module based on semantic comparison between the respective step and the associated domain descriptor of each of the plurality of specialized LLMs. The integration module is configured to generate a language output responsive to the complex prompt from all of the step outputs corresponding to each of the plurality of steps.
The present summary is provided only by way of example, and not limitation. Other aspects of the present disclosure will be appreciated in view of the entirety of the present disclosure, including the entire text, claims, and accompanying figures.
While the above-identified figures set forth one or more embodiments of the present disclosure, other embodiments are also contemplated, as noted in the discussion. In all cases, this disclosure presents the invention by way of representation and not limitation. It should be understood that numerous other modifications and embodiments can be devised by those skilled in the art, which fall within the scope and spirit of the principles of the invention. The figures may not be drawn to scale, and applications and embodiments of the present invention may include features and components not specifically shown in the drawings.
The present disclosure presents methods and systems for processing complex or compound queries or prompts using a system including multiple separately specialized large language models. The term “compound prompt” can refer to a prompt explicitly including multiple separate tasks or steps, e.g., “(1) identify the three highest-selling jazz musicians of the 1970s, and then (2) generate a report comparing the musical styles of these three musicians.” The term “complex prompt” can refer more generally to any prompt requiring multiple steps to process, either implicitly or explicitly, e.g., “generate a report comparing the musical styles of the three highest-selling jazz musicians of the 1970s.” Although distinctions can be drawn between complex and compound prompts, this disclosure will treat complex and compound prompts as equivalent for the purposes of explanation except where specifically stated.
Machine learning (ML) models are increasingly used to process complex or compound queries. Complex prompts can, for example, demand retrieval of data from multiple or specialized sources, assembly of outputs (e.g., natural language, computer code, lists) from the retrieved data based on identified criteria, and/or subsequent processing of those outputs (e.g., transmission or archival to specified categories, locations, and/or recipients). Existing solutions to complex prompt processing use large language model (LLM) planners to semantically decompose such prompts into multiple steps, then execute those steps either using the same LLM, or using native functions or databases. Tools for such approaches include Semantic Kernel and LangChain. Although complex prompts most often involve sequential steps, each contingent upon the results or outputs of previous steps, some complex prompts can also or instead include parallel tasks or steps, i.e., steps not contingent upon the results of other steps.
The present disclosure presents systems and methods for improving accuracy and reliability in complex prompt response through the use of multiple specialized machine learning models (MLMs). The present disclosure focuses principally on the use of specialized LLMs, but a person skilled in the art will understand that the approaches set forth below are generalizable to other specialist MLMs, except where otherwise noted. In the most general case, this disclosure presents a system whereby general query processing is handled by a manager using a generalist or “jack-of-all-trades” LLM or meta-language model, but steps falling within specialized domains are advantageously handled by dedicated, specially-trained specialist LLMs selected by the manager. This disclosure provides advantageous examples of methods for mapping plan steps (i.e., tasks) to one or more appropriate models.
Processor 102 is a logic-capable device configured to execute software, applications, and/or programs stored on memory 104. Examples of processor 102 can include one or more of a processor, a microprocessor, a controller, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other equivalent discrete or integrated logic circuitry. Processor 102 can be entirely or partially mounted on one or more circuit boards.
Memory 104 is a machine-readable storage medium configured to store information including complex prompt handling system 110, and can most generally include both transitory and non-transitory storage media. In some examples, a computer-readable storage medium can include a non-transitory information storage medium. The term “non-transitory” can indicate that the storage medium is not embodied in a carrier wave or a propagated signal. In certain examples, a non-transitory storage medium can store data that can, over time, change (e.g., in RAM or cache). In some examples, memory 104 is or includes a temporary memory. As used herein, a temporary memory refers to a memory having a primary purpose that is not long-term storage. Memory 104, in some examples, is described as volatile memory. As used herein, a volatile memory refers to a memory that does not maintain stored contents when power to the memory 104 is turned off. Examples of volatile memories can include random access memories (RAM), dynamic random access memories (DRAM), static random access memories (SRAM), and other forms of volatile memories. In some examples, the memory is used to store program instructions for execution by the processor. The memory, in one example, is used by software or applications running on hardware computing device 100 (e.g., complex prompt handling system 110) to temporarily store information during program execution. Memory 104, in some examples, also includes one or more persistent computer-readable storage media. Examples of such persistent, non-volatile storage elements can include, for example, magnetic hard discs, optical discs, floppy discs, flash memories, or forms of electrically programmable memories (EPROM) or electrically erasable and programmable (EEPROM) memories.
User interface 106 is an input and/or output device and/or software interface, and enables an operator, such as user 108, to control operation of and/or interact with software elements of computer hardware device 100. For example, user interface 106 can be configured to receive inputs from an operator and/or provide outputs. Most relevantly for the present disclosure, user interface 106 provides means for user 108 to supply complex prompt 120 to computer hardware device 100. User interface 106 can, for example, be a local interface including one or more of a sound card, a video graphics card, a speaker, a display device (such as a liquid crystal display (LCD), a light emitting diode (LED) display, an organic light emitting diode (OLED) display, etc.), a touchscreen, a keyboard, a mouse, a joystick, or other type of device for facilitating input and/or output of information in a form understandable to users and/or machines. In other mutually consistent embodiments, user interface 106 can additionally or alternatively include means such as a wired or wireless transceiver for communicating with a remote user, e.g., through a remote device and/or over a local or wide area network.
Computer hardware device 100 receives complex prompt 120 from user 108 via user interface 106, as noted above. Complex prompt 120 is stored in memory 104 as an input of complex prompt handling system 110 (see
Network 140 can be a local area network (LAN) such as a network connecting computer hardware device 100 to local user devices and/or databases, or can be a wide-area network (WAN) suitable for connecting computer hardware device 100 to servers and other computing components that are separated by greater geographic distances than the devices of a local network. Network 140 can include network infrastructure for connecting devices separated by larger geographic distances. In at least some examples, network 140 is the internet. For illustrative purposes, computer hardware device 100 can communicate with remote devices 142 and 144 via network 140. More generally, any number of remote devices can be communicatively connected to computer hardware device 100 via one or multiple networks 140.
As illustrated in
Manager 200 (with its various constituent modules 202-210) forms the core of complex prompt handling system 110 and is responsible both for initial processing of complex prompts and for final assembly of outputs responsive to those complex prompts. Specialist models 220 (used herein to refer to any or all specialist models 220a-n) are MLMs, and in most embodiments specifically LLMs, that are trained for competence in specific domains identified by their respective domain descriptors 222 (used herein to refer to any or all domain descriptors 222a-n). Domain descriptors 222 can be natural-language terms or phrases identifying the specialization of their respective models 220, e.g., “mapping and routefinding,” “email generation,” or “mathematics.” Advantageously, manager 200 delegates specific steps necessary for the execution of each complex prompt 120 to appropriate specialist models 220, thereby providing advantages from specialized training (efficiency, reliability, reduced hallucination, etc.) at each step, in a generalist overall system.
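By way of non-limiting illustration only, the pairing of specialist models 220 with domain descriptors 222 can be sketched in code as follows; the SpecialistEntry structure, names, and placeholder callables are hypothetical and are not required by this disclosure.

```python
# Illustrative sketch only: pairing each specialist model 220 with its
# natural-language domain descriptor 222. The generate callables stand in for
# whatever inference interface a given specialist model exposes.
from dataclasses import dataclass
from typing import Callable

@dataclass
class SpecialistEntry:
    name: str                        # e.g., "specialist_model_220a"
    domain_descriptor: str           # e.g., "mapping and routefinding"
    generate: Callable[[str], str]   # maps a prompt string to a model output

registry = [
    SpecialistEntry("specialist_model_220a", "web data scraping", lambda p: "..."),
    SpecialistEntry("specialist_model_220b", "email generation", lambda p: "..."),
    SpecialistEntry("specialist_model_220d", "chemical reaction mathematics", lambda p: "..."),
]
```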
As noted above, specialist models 220 are MLMs such as LLMs that are trained for efficiency and reliability within specific domains. As illustrative examples, specialist model 220a can be a model fine-tuned to scrape data from websites, specialist model 220b can be a model trained to generate emails to match a style of a particular individual (i.e., from training data including a set of that person's sent emails), and specialist model 220d can be an MLM trained to perform chemical reaction mathematics. Specialist models 220 need not have anything in common with each other but their accessibility to manager 200. In
As noted above, manager 200 includes planner 202, selection module 204, integration module 208, and model 210. Model 210 is an LLM trained to process natural language prompts. In various embodiments model 210 can be used by various functional components of manager 200, including planner 202, selection module 204, and/or integration module 208. Although planner 202, selection module 204, and integration module 208 are described separately below in terms of function for the purpose of explanation, in some embodiments many functions of planner 202, selection module 204, and/or integration module 208 may be performed by providing prompts or context injections to model 210, i.e., to a single shared generalist LLM used by manager 200. In other embodiments, however, manager 200 can include multiple models 210, each dedicated specifically to a subset of modules 202, 204, and/or 208.
Planner 202 is a semantic processing module capable of decomposing a complex prompt into a plurality of steps. Planner 202 can, for example, be a planner such as those used in conventional frameworks like Semantic Kernel or LangChain. In the most general case, planner 202 can be any suitable natural language processing (NLP) agent capable of identifying a plurality of actionable tasks (i.e., steps) for the resolution of complex prompt 120. Planner 202 can, for example, make use of model 210 for generative production of a response to complex prompt 120 that identifies these actionable tasks. In some embodiments planner 202 can immediately output a plan (i.e., several steps) in direct response to the complex prompt. In other embodiments planner 202 can assemble the plan over several iterations, e.g., by identifying an initial task or set of tasks insufficient to completely address the complex prompt, then supplementing this partial plan with an additional step or steps in response to completion of the initial task or tasks. For simplicity of explanation, the following explanation does not distinguish between initially-generated and subsequently-generated steps.
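A minimal, non-limiting sketch of such a planner is provided below; the llm_complete callable is a hypothetical wrapper around any text-completion interface of model 210 and is not part of this disclosure.

```python
# Illustrative sketch of a planner in the spirit of planner 202: a generalist
# LLM is asked to decompose a complex prompt into numbered, actionable steps.
from typing import Callable, List

def decompose(complex_prompt: str, llm_complete: Callable[[str], str]) -> List[str]:
    instruction = (
        "Decompose the following request into a numbered list of minimal, "
        "actionable steps, one per line:\n" + complex_prompt
    )
    raw_plan = llm_complete(instruction)
    steps = []
    for line in raw_plan.splitlines():
        line = line.strip()
        if line and line[0].isdigit():
            # strip leading "1." / "2)" style numbering from each step
            steps.append(line.lstrip("0123456789.) ").strip())
    return steps
```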
Selection module 204 is responsible for identifying, for each step provided by planner 202, an approach to executing that step using specialist models 220. As described in greater detail below, selection module 204 is responsible for identifying, for each step identified by planner 202, a corresponding set of models to be queried at that particular step and a method by which this/these model(s) are to be queried. Selection module 204 can, by way of example, identify a subset of models suitable for executing each step from among all models (210 and 220) available within complex prompt handling system 110, and provide a prompt (general or step-specific, and either generated by planner 202 as a part of the plan, or generated by selection module 204 based on the plan and model record 206; see below) corresponding to the step in question to each of the selected models of that subset.
Selection module 204 can include model record 206 that maps a domain to each specialist model 220, i.e., that reflects domain descriptors 222. Model record 206 can, in some examples, be generated or updated on-the-fly as specialist models 220 are added to or removed from complex prompt handling system 110. Model record 206 enables selection module 204 to perform intent identification of steps generated by planner 202 via model 210 (or another appropriate model) to identify specialist models 220 having a domain (i.e., subject-matter specialization) relevant to that intent. In some such embodiments, model 210 can be trained in model selection through provision of a large number of (step/task) prompts labeled as suited for a particular specialist model. In alternative embodiments, functions described herein by reference to planner 202 and selection module 204 can be performed inseparably based on context of complex prompt 120 using a meta-language model (i.e., wherein model 210 is a meta-language model) trained to identify specialist models 220 as a part of each plan step generated from complex prompt 120, or trained to semantically identify a task or type of task referenceable against domain descriptors 222.
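By way of non-limiting example, intent identification against model record 206 can be sketched as follows, here phrased as a zero-shot classification prompt to model 210; llm_complete is again a hypothetical completion wrapper and the prompt wording is purely illustrative.

```python
# Illustrative sketch of intent identification: the generalist model is asked
# which registered domain descriptor best matches the intent of a plan step.
from typing import Callable, List, Optional

def classify_intent(step: str,
                    domain_descriptors: List[str],
                    llm_complete: Callable[[str], str]) -> Optional[str]:
    menu = "\n".join(f"- {d}" for d in domain_descriptors)
    prompt = (
        "Which one of the following domains best matches the intent of the task "
        f"below? Answer with the domain text only, or 'none'.\nDomains:\n{menu}\n"
        f"Task: {step}"
    )
    answer = llm_complete(prompt).strip().lower()
    for descriptor in domain_descriptors:
        if descriptor.lower() in answer:
            return descriptor
    return None   # no relevant domain identified; fall back to the generalist model
```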
Manager 200 delegates execution of a particular plan step to selected specialist models 220 by providing prompts corresponding to that step to the selected specialist models 220. In some embodiments, model record 206 can, in addition to mapping domains of each specialist model 220, also identify prompt rules, formats, or templates suitable for each or some specialist models 220, e.g., for context injection based on the designated model and the specific step, or complex prompt 120. More generally, prompts provided to models 210 and 220 can include retrieval-augmented generation (RAG) or other context injection to reduce hallucination or otherwise constrain outputs based on templates retrieved from or data validated through outside data storage, e.g., lists and/or vector and/or relational databases either stored in memory 104 or retrieved from other devices 130, 142, or 144. Unless otherwise specified, this context injection can be provided by manager 200 based on operation of selection module 204, or can be sourced externally to complex prompt handling system 110 based on the nature of the prompt provided by selection module 204 to the selected specialist model.
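One possible, non-limiting form of such per-model prompt construction with context injection is sketched below; the retrieve callable stands in for any retrieval backend (e.g., a vector or relational database), and the template text is purely illustrative.

```python
# Illustrative sketch of per-model prompt construction with optional context
# injection (e.g., retrieval-augmented generation) prior to delegation.
from typing import Callable, Optional

def build_prompt(step: str,
                 template: str,
                 retrieve: Optional[Callable[[str], str]] = None) -> str:
    context = retrieve(step) if retrieve else ""
    # The template is expected to contain {step} and {context} placeholders.
    return template.format(step=step, context=context)

email_template = (
    "You draft emails in the user's personal style.\n"
    "Relevant prior emails:\n{context}\n"
    "Task: {step}"
)
```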
Specialist models 220 advantageously offer improved performance within their specialized domains compared to a generalist (general-purpose, jack-of-all-trades) model, but may be less capable outside of those specialized domains. Steps for which selection module 204 can identify no appropriate specialist model 220 are handled by model 210, or by another generalist model. Specialist models 220 can have domains of varying breadth. In some embodiments, complex prompt handling system 110 can include both specialist models with non-overlapping domains, and specialist models with overlapping domains, e.g., of broader or narrower scope. During the execution of each step generated by planner 202 in response to complex prompt 120, selection module 204 may identify multiple relevant specialist models 220 to be separately or collectively prompted during that step, as discussed in greater detail below.
Integration module 208 is a natural language processing module disposed to generate a singular output responsive to the complex prompt based on the outputs of the steps of the plan generated by planner 202, as executed by specialist models 220 (and in some instances model 210) per delegation by selection module 204. Integration module 208 can, for example, include its own specialized LLM trained to aggregate these various model outputs into a semantically and syntactically coherent single output without introducing hallucinations or errors, or omitting information provided by the various specialist models 220. Alternatively, model 210 can be a generalist LLM (as described above) capable of performing this function using prompts generated by integration module 208 from the aforementioned outputs of specialist models 220 (and in some instances model 210), i.e., such that the same model 210 provides the trained machine learning backbone of integration module 208 and planner 202, selection module 204, or both. Integration module 208 can in some embodiments receive inputs from all designated specialist models 220 used in handling steps identified by planner 202. In other embodiments, where outputs of some specialist models 220 are used only to provide inputs to other specialist models 220 (see
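A non-limiting sketch of one possible aggregation prompt usable by integration module 208 follows; llm_complete is a hypothetical stand-in for model 210 or a dedicated integration LLM, and the prompt wording is illustrative only.

```python
# Illustrative sketch of integration: step outputs are numbered and supplied to
# an LLM that is instructed to merge them without adding or dropping content.
from typing import Callable, List

def integrate(complex_prompt: str,
              step_outputs: List[str],
              llm_complete: Callable[[str], str]) -> str:
    numbered = "\n".join(f"[{i + 1}] {out}" for i, out in enumerate(step_outputs))
    prompt = (
        "Combine the numbered intermediate results below into a single coherent "
        "answer to the original request. Do not omit or invent information.\n"
        f"Original request: {complex_prompt}\nIntermediate results:\n{numbered}"
    )
    return llm_complete(prompt)
```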
In step 302, the training data is generated. For training the computer-implemented machine learning model(s) used in complex prompt handling system 110, training data includes domain specific information with example inputs labeled (i.e. mapped to or tagged with) example outputs. Data can be labeled entirely manually or, to reduce labor, can be labeled in a semi-automated fashion, e.g. using clustering or analogy to manually labeled data. Each specialist model 220 is trained on different training data, although in some embodiments some specialist models 220 can be trained on training data that is a subset of or overlaps with training data used to train other specialist models 220.
In step 304, the labeled data is used to train each computer-implemented machine learning model (i.e., models 210, 220) to produce appropriate outputs given inputs within its domain. For many applications, and assuming models operate within their domains, training over a broader domain (i.e., of models intended for more general use) will produce less reliable or accurate outputs than narrower training of more specialized models. This is generally the case not only when the overall volume of training data is held constant (i.e., such that a model having a narrower domain has a higher density of training data within that domain), but also when the density of training data is held constant (i.e., where a comparatively less specialized model is trained with more data overall, but the subset of that training data corresponding to the domain of a narrower model is comparable to the entirety of the training data used to train the narrower model). In other words, the introduction of out-of-domain training data to broaden or generalize model competence can, within some domains, produce less reliable or accurate outputs in-domain. As a consequence, specialist models 220 of various scopes can be useful so long as selection module 204 is capable of delegating tasks or steps intelligently.
As used herein, “training” a computer-implemented machine learning model refers to any process by which parameters, hyperparameters, weights, and/or any other values related to model accuracy are adjusted to improve the fit of the computer-implemented machine learning model to the training data. The labeled data can be transformed by, for example, one or more programs and/or one or more other trained machine learning models before it is used for training in step 304.
In step 306, the trained computer-implemented machine learning model is tested with domain-specific test data. This test data is unlabeled data that can be used to qualify and/or quantify performance of the trained computer-implemented machine learning model. More specifically, a human or machine operator can evaluate the performance of the machine learning model by evaluating the fit of the model to the test data. Step 306 can be used to determine, for example, whether the machine learning model was overfit to the labeled data during model training in step 304.
As depicted in
Training method 300 can advantageously be used to train any machine learning model described herein. More generally, the systems and methods disclosed herein advantageously allow for the training and use of machine learning models that can serve as a general-purpose model (e.g., model 210) as well as domain-specific specialist models (e.g., specialist models 220) by varying the scope of training and test data.
In some embodiments, model 210 can be a general purpose model produced by method 300, and at least some of specialized models 220 can be produced from model 210 via fine-tuning and/or transfer learning from model 210. In other embodiments, at least some specialized models 220 can be trained entirely independently of model 210, or even entirely independently of any more general model. In either case, the domain of training and testing data used for training specialist models 220 is narrower, i.e. more specialized, than the domain of training data used to train model 210.
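For illustration only, one possible way to derive a specialist model 220 from a general pre-trained checkpoint by fine-tuning is sketched below using the Hugging Face transformers Trainer API; the checkpoint identifier and data file are placeholders, and other frameworks or transfer-learning recipes could equally be used.

```python
# Illustrative sketch only: fine-tuning a pre-trained causal language model on
# domain-specific text to produce a specialist model (cf. specialist models 220).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "base-generalist-checkpoint"            # placeholder checkpoint name
tokenizer = AutoTokenizer.from_pretrained(base)
if tokenizer.pad_token is None:                # some causal LMs lack a pad token
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

data = load_dataset("text", data_files={"train": "chemistry_domain.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train = data["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="specialist_220d", num_train_epochs=1),
    train_dataset=train,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```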
Each step 506 of plan 504 includes a task generally specifying an output. Steps 506 can take the form of prompts. Carrying forward the illustrative examples introduced with discussion of
As illustrated in
Results from multiple steps 506 can be merged by integration module 208 (e.g. using model 210) at the stage of producing final output 510. In some embodiments, however, results from execution of one or more steps 506 using associated approach 508 can be inputs of other steps 506, as noted above.
As shown in
Semantic analysis 602 of each respective domain descriptor 222 for relevance to step 506 produces a respective model relevance score 604a-n (hereinafter generically and/or collectively relevance score(s) 604) quantifying a degree of relevance of corresponding specialist model 220 to step 506. Model relevance scores 604 can, in some embodiments, be binary, identifying a model either as relevant or irrelevant to step 506. In more complex embodiments of process 600, model relevance scores 604 can reflect a degree of semantic overlap between step 506 and associated domain descriptor 222. In some such embodiments, semantic analysis 602 can, for example, include vector similarity (e.g., cosine similarity) scoring between vectorized text of all or parts of step 506 and each domain descriptor 222. In other such embodiments, scoring of model relevance 604 can, for example, be based on intent recognition (i.e., intent classification) performed, e.g., by model 210. In still further embodiments, model 210 or a separate dedicated delegation model (not shown) can undergo reinforcement learning or other training as set forth in
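A minimal, non-limiting sketch of vector-similarity scoring of this kind is provided below; the embed callable stands in for any text-embedding model and is hypothetical.

```python
# Illustrative sketch of semantic analysis 602: the step text and each domain
# descriptor 222 are embedded, and cosine similarity between the vectors is
# used as the model relevance score 604.
import math
from typing import Callable, Dict, List

def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def relevance_scores(step: str,
                     descriptors: Dict[str, str],
                     embed: Callable[[str], List[float]]) -> Dict[str, float]:
    step_vec = embed(step)
    return {name: cosine(step_vec, embed(desc)) for name, desc in descriptors.items()}
```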
If relevance of specific models cannot be ascertained from step 506 based on a lack of information in complex prompt 120, complex prompt handling system 110 may solicit additional information from user 108 via follow-up questions or requests for clarification (i.e., as new complex prompts 120, either replacing or cumulative with a previous complex prompt 120). In some embodiments the need for such clarifications can be identified, and clarification requested, prior to generation of plan 504 by planner 202.
Selection module 204 selects (action 606) a subset of available specialist models 220 (hereinafter selected model set 608) based on their respective model relevance scores 604. In some embodiments, selected model set 608 can consist of the single highest-scoring model 220, or of a preset number N of highest-scoring models 220. In other embodiments, selected model set 608 can include all models scoring over a minimum relevance threshold (e.g., cosine similarity>0.6). In combinations of these approaches, selected model set 608 can include up to a maximum number of highest-scoring models 220, excluding any scoring below a threshold value. Each specialist model 220 in selected model set 608 (illustratively shown as including specialist models 220b and 220n) is used in the generation of an intermediate model output 610 provided to integration module 208.
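By way of non-limiting example, the combined threshold-and-top-N selection described above can be sketched as follows; the threshold and maximum count are illustrative values only.

```python
# Illustrative sketch of selection action 606: retain at most max_models of the
# highest-scoring specialist models, excluding any scoring below the threshold.
from typing import Dict, List

def select_models(scores: Dict[str, float],
                  threshold: float = 0.6,
                  max_models: int = 3) -> List[str]:
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, score in ranked[:max_models] if score >= threshold]
```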
In one embodiment, all specialist models in selected model set 608 are prompted in parallel. According to this approach, each specialist model 220 in selected model set 608 is provided with step 506 or a natural language portion thereof as a prompt, producing a respective intermediate model output 610 therefrom. These intermediate model outputs (illustratively shown as intermediate model outputs 610b and 610n, corresponding to specialist models 220b and 220n, respectively) are synthesized by integration module 208 to produce step output 212.
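One non-limiting way to realize this parallel approach is sketched below, here using a thread pool; the model callables are hypothetical stand-ins for the selected specialist models 220.

```python
# Illustrative sketch of the parallel approach: each selected specialist model
# is prompted with the step independently, and the intermediate model outputs
# 610 are collected for synthesis by integration module 208.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

def prompt_in_parallel(step: str,
                       selected_models: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(model, step)
                   for name, model in selected_models.items()}
    # Exiting the context manager waits for all submitted calls to finish.
    return {name: future.result() for name, future in futures.items()}
```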
In an alternative embodiment, specialist models in selected model set 608 are prompted in series, i.e., as a feed-forward from one model to the next. According to this approach, a first specialist model 220 in selected model set 608 is provided with step 506 or a natural language portion thereof as a prompt (specialist model 220b, in the illustrated example). An output of this specialist model is provided as input to another specialist model (specialist model 220n, in the illustrated example), either alone or in combination with other information such as step 506. This feed-forward approach continues until all specialist models 220 in selected model set 608 have been queried, thereby producing an intermediate model output 610 (intermediate model output 610n, in the illustrated example). In some embodiments, selected model set 608 can be traversed serially in multiple orders (e.g., ABC, ACB, CBA, etc.), with each such traversal producing an associated intermediate model output 610. In some embodiments of process 600, a combination of serial and parallel approaches can be used to traverse selected model set 608.
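A minimal, non-limiting sketch of the serial, feed-forward traversal follows; the ordered model callables are hypothetical.

```python
# Illustrative sketch of the serial approach: each model's output is carried
# forward as additional context for the next model in selected model set 608,
# and the final output becomes the intermediate model output 610 for the step.
from typing import Callable, List

def prompt_in_series(step: str, ordered_models: List[Callable[[str], str]]) -> str:
    carried = ""
    for model in ordered_models:
        prompt = step if not carried else f"{step}\n\nPrior result:\n{carried}"
        carried = model(prompt)
    return carried
```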
Intermediate model outputs 610 are reconciled and/or aggregated by integration module 208 to produce step output 612, an output responsive to step 506. Integration module 208 can, for example, poll individual intermediate model outputs 610 for commonalities between outputs, or for agreement between different models. Step outputs 612 for each step 506 are synthesized by integration module 208 to produce final output 510 responsive to complex prompt 120 as a whole, as described above with reference to
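For illustration only, one possible reconciliation strategy is sketched below: identical outputs are accepted directly, and disagreements are resolved by a further LLM call; llm_complete is hypothetical.

```python
# Illustrative sketch of reconciling intermediate model outputs 610 into a step
# output 612, preferring points of agreement between the selected models.
from typing import Callable, List

def reconcile(step: str,
              outputs: List[str],
              llm_complete: Callable[[str], str]) -> str:
    if len({o.strip().lower() for o in outputs}) == 1:   # all models agree
        return outputs[0]
    joined = "\n---\n".join(outputs)
    prompt = (
        "The following candidate answers were produced for the task below. "
        "Reconcile them into one answer, preferring points of agreement.\n"
        f"Task: {step}\nCandidates:\n{joined}"
    )
    return llm_complete(prompt)
```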
The systems and methods set forth above advantageously allow specialized machine learning models to be leveraged to complete relevant steps of complex prompt responses, thereby enabling complex prompt handling system 110 as a whole to exhibit broad general competence across a wide range of subject matter while reducing hallucination and error rate compared to purely generalist model-based approaches. This is accomplished by separately training multiple specialist models 220, and by identifying and traversing a relevant subset of specialist models 220 with respect to each step 506 involved in responding to complex prompt 120 via an appropriate approach 508.
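To illustrate how the foregoing pieces can fit together, a non-limiting end-to-end sketch is provided below; it reuses the hypothetical helper functions from the earlier sketches (decompose, relevance_scores, select_models, prompt_in_parallel, reconcile, integrate) and falls back to the generalist model when no specialist is relevant.

```python
# Illustrative end-to-end sketch of complex prompt handling system 110, reusing
# the hypothetical helpers sketched above. embed and llm_complete stand in for
# an embedding model and the generalist model 210, respectively.
def handle_complex_prompt(complex_prompt, registry, embed, llm_complete):
    descriptors = {e.name: e.domain_descriptor for e in registry}
    models = {e.name: e.generate for e in registry}
    step_outputs = []
    for step in decompose(complex_prompt, llm_complete):
        scores = relevance_scores(step, descriptors, embed)
        selected = select_models(scores)
        if not selected:                   # no relevant specialist: use model 210
            step_outputs.append(llm_complete(step))
            continue
        outputs = prompt_in_parallel(step, {n: models[n] for n in selected})
        step_outputs.append(reconcile(step, list(outputs.values()), llm_complete))
    return integrate(complex_prompt, step_outputs, llm_complete)
```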
A method of processing a compound prompt, the method comprising: decomposing the compound prompt into a plan having a plurality of steps via a planner module instantiated in machine-readable memory and operable by a processor; associating a domain descriptor to each of a plurality of distinct machine learning models; for each of the plurality of steps: generating a model relevance score by semantic comparison of the step to the domain descriptors of each of the plurality of distinct machine learning models; selecting a subset of the plurality of distinct machine learning models based on the respective model relevance score of each of the plurality of distinct machine learning models; and generating a step output addressing the step, from the selected subset of the plurality of distinct machine learning models; and assembling all of the step outputs into a syntactically and semantically coherent final output via an integration module utilizing a large language model.
The method of the preceding paragraph can optionally include, additionally and/or alternatively, any one or more of the following features, configurations and/or additional components:
A further embodiment of the foregoing method, wherein generating the model relevance score comprises evaluating vector cosine similarity between vectorized text of each of the domain descriptors and at least a portion of the step.
A further embodiment of the foregoing method, wherein generating the model relevance score comprises classifying intent of at least a portion of the step, and scoring similarity of the classified intent with the domain descriptors of each of the plurality of distinct machine learning models.
A further embodiment of the foregoing method, wherein selecting the subset of the plurality of distinct machine learning models comprises selecting those of the plurality of distinct machine learning models having associated model relevance scores above a threshold value.
A further embodiment of the foregoing method, wherein selecting the subset of the plurality of distinct machine learning models comprises selecting those of the plurality of distinct machine learning models having the highest associated model relevance scores among the plurality of distinct machine learning models.
A further embodiment of the foregoing method, wherein generating the step output comprises integrating intermediate outputs from all models of the selected subset of distinct machine learning models.
A further embodiment of the foregoing method, wherein generating the step output comprises serially traversing all of the selected subset of distinct machine learning models via feed-forward of an initial model output of one of the selected subset of distinct machine learning models as input into another of the selected subset of distinct machine learning models.
A further embodiment of the foregoing method, wherein generating the step output comprises serially traversing the selected subset of the distinct machine learning models in multiple orders, and wherein generating the step output comprises integrating intermediate outputs of a last specialist model in each such sequence via a large language model.
A further embodiment of the foregoing method, further comprising training each of the plurality of distinct machine learning models using different training data specific to its associated domain descriptor.
A further embodiment of the foregoing method, wherein at least a subset of the plurality of distinct machine learning models are trained entirely separately from others of the plurality of distinct machine learning models, without overlapping training data.
A further embodiment of the foregoing method, wherein at least a subset of the plurality of distinct machine learning models are specialized in a respective domain via fine-tuning or transfer learning.
A further embodiment of the foregoing method, wherein each of the plurality of distinct machine learning models is a large language model.
A system for generating a response to a complex prompt, the system comprising: an input device configured to receive the complex prompt; a logic processor; machine-readable memory; a plurality of specialized large language models (LLMs) instantiated in the machine-readable memory, each of the specialized LLMs having an associated domain descriptor identifying its area of specialization; and a manager comprising: a planner instantiated in the machine-readable memory and operable via the logic processor to decompose the complex prompt into a plan having a plurality of steps; a selection module instantiated in the machine-readable memory and operable via the logic processor, for each of the plurality of steps, to identify an approach for generating a step output corresponding to a respective step using a step-specific subset of the plurality of specialized LLMs, wherein the approach identifies the step-specific subset of the plurality of specialized LLMs by semantic comparison between the respective step and the associated domain descriptor of each of the plurality of specialized LLMs; and an integration module instantiated in the machine-readable memory and operable via the logic processor to generate a language output responsive to the complex prompt from all of the step outputs corresponding to each of the plurality of steps.
The system of the preceding paragraph can optionally include, additionally and/or alternatively, any one or more of the following features, configurations and/or additional components:
A further embodiment of the foregoing system, wherein, for each of the plurality of steps, the selection module is configured to assign a model relevance score to each of the plurality of specialized LLMs based on the semantic comparison between the respective step and the associated domain descriptor of the respective specialized LLM.
A further embodiment of the foregoing system, wherein assigning the model relevance score comprises evaluating cosine similarity of vectorized text of the associated domain descriptor to vectorized text of at least a portion of the respective step.
A further embodiment of the foregoing system, wherein assigning the model relevance score comprises classifying intent of at least a portion of the step, and scoring similarity of the classified intent to the respective domain descriptor.
A further embodiment of the foregoing system, wherein identifying the step-specific subset of the plurality of specialized LLMs comprises identifying those of the plurality of specialized LLMs having an associated model relevance score exceeding a threshold value.
A further embodiment of the foregoing system, wherein identifying the step-specific subset of the plurality of specialized LLMs comprises identifying those of the plurality of specialized LLMs having the highest model relevance scores among all of the plurality of specialized LLMs.
A further embodiment of the foregoing system, wherein the integration module is further operable to generate the step outputs for each of the plurality of steps using outputs from multiple of the subset of the plurality of specialized LLMs associated with that respective step.
A further embodiment of the foregoing system, wherein each of the plurality of specialized LLMs is trained to its respective area of specialization via fine tuning, transfer learning, or both.
Any relative terms or terms of degree used herein, such as “substantially”, “essentially”, “generally”, “approximately” and the like, should be interpreted in accordance with and subject to any applicable definitions or limits expressly stated herein. In all instances, any relative terms or terms of degree used herein should be interpreted to broadly encompass any relevant disclosed embodiments as well as such ranges or variations as would be understood by a person of ordinary skill in the art in view of the entirety of the present disclosure, such as to encompass ordinary manufacturing tolerance variations, incidental alignment variations, alignment or shape variations induced by thermal, rotational or vibrational operational conditions, and the like.
While the invention has been described with reference to an exemplary embodiment(s), it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment(s) disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.
The present application claims priority to U.S. provisional patent application Ser. No. 63/543,454 by S. Joynt, filed Oct. 10, 2023 and entitled “COMPOUND PROMPT PROCESSING USING MULTIPLE INTEGRATED DOMAIN-SPECIALIZED LANGUAGE MODELS.”