Large language models (LLMs) are powerful tools that can comprehend natural language text and other complex information, and generate text responses to various input prompts, such as queries or commands. To interact with an LLM, a human user provides a prompt and receives a response from the model, which is based on its internal algorithms. The user then evaluates the response and decides whether it answers the prompt satisfactorily. This process may be repeated several times until the user reaches a desired outcome (or gives up).
Current LLMs, such as PaLM, GPT-4, and LLaMA, can exhibit both remarkable and baffling behaviors, depending on the task. In evaluating what causes these discrepancies, research has primarily focused on testing LLMs on different tasks so that users can gain insights about their behavior. For GPT-4, a large collection of behaviors has been studied across several tasks, and benchmarking efforts have been conducted in various domains such as causal discovery, summarization, and reasoning. These research directions are reasonable, given that most LLMs are black boxes that cannot easily be inspected, interpreted, or explained. However, the field lacks a mathematical framework to systematically describe, compare, and improve LLMs.
The disclosed examples are described in detail below with reference to the accompanying drawing figures listed below. The following summary is provided to illustrate some examples disclosed herein. The following is not meant, however, to limit all examples to any particular configuration or sequence of operations.
Example solutions for processing LLM prompts include: receiving an input large language model (LLM) prompt; creating a first LLM prompt based on the input LLM prompt, the first LLM prompt representing a first step toward generating a first solution to the input LLM prompt; submitting the first LLM prompt to an LLM as a first sub-query, thereby resulting in the generation of a first LLM output; creating a second LLM prompt based on the input LLM prompt, the second LLM prompt representing a second step toward generating the first solution to the input LLM prompt, the second LLM prompt including the first LLM output; submitting the second LLM prompt to the LLM as a second sub-query, thereby resulting in the generation of a second LLM output; and transmitting the second LLM output as the solution to the input LLM prompt.
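For purposes of illustration only, the following sketch (in Python) shows one possible realization of this two-sub-query flow. The call_llm function is a hypothetical stand-in for submitting a prompt to an LLM and receiving its text output; it does not correspond to any particular LLM interface, and the prompt wording is illustrative.

    # Illustrative sketch of the two-sub-query flow summarized above.
    def call_llm(prompt: str) -> str:
        raise NotImplementedError("hypothetical stand-in for an LLM invocation")

    def solve_with_sub_queries(input_prompt: str, first_step: str, second_step: str) -> str:
        # First sub-query: a prompt representing a first step toward the solution.
        first_prompt = f"{input_prompt}\n{first_step}"
        first_output = call_llm(first_prompt)
        # Second sub-query: includes the first LLM output as context for the second step.
        second_prompt = f"{input_prompt}\nIntermediate result: {first_output}\n{second_step}"
        second_output = call_llm(second_prompt)
        # The second LLM output is returned as the solution to the input prompt.
        return second_output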
The disclosed examples are described in detail below with reference to the accompanying drawing figures listed below:
Corresponding reference characters indicate corresponding parts throughout the drawings. Any of the drawings may be combined into a single example or embodiment.
In examples, a system and associated framework are described that clarify key terms and concepts in large language model (LLM) research such as, for example, hallucinations, alignment, self-verification and chain-of-thought reasoning. The system offers a precise and consistent way to characterize LLMs, identify their strengths and weaknesses, and integrate new findings. The system differentiates chain-of-thought reasoning from chain-of-thought prompting and establishes the conditions under which they are equivalent. This distinction clarifies the basic assumptions behind chain-of-thought prompting and its implications for methods that use it, such as self-verification and prompt programming.
The system described herein provides a formal framework for LLMs that helps both researchers and practitioners explore new possibilities for generative artificial intelligence (AI). This system provides a tool for opening up new research avenues. The formal definitions and results described herein help advance the discussion on how to build generative AI systems that are safe, reliable, fair, and robust, especially in domains like healthcare and software engineering.
An example system offers a new perspective on interpreting LLMs and presents a mathematical framework that formalizes and generalizes what is known about LLMs. The system pursues at least two objectives. First, the system defines key LLM concepts such as alignment, hallucinations, and chain-of-thought reasoning using a proposed mathematical framework. Second, the system provides a reasoning tool that facilitates a common understanding of existing LLM research and forms a basis for exploring new questions and challenges. An example framework is based on commutative diagrams that relate different levels of abstraction and computation in LLMs. The framework is usable by researchers in LLMs to share, position, and reason about their findings. An example framework also captures the essence of LLMs as abstract execution machines that use natural language as an interface, which reveals the assumptions and implications of various methods that use LLMs, such as self-verification and prompt programming. An example framework also formalizes and evaluates the output of LLMs using distance metrics and compatibility relations.
The example system improves the performance of LLMs to avoid hallucinations and misalignments. Creating multiple LLM prompts based on an initial input prompt and submitting those LLM prompts as sub-queries to the LLM allows the system to break the input prompt into several parts or steps toward generating a solution to the input LLM prompt. Such a divisional approach reduces the number of responses that contain hallucinations or misalignments. When responses contain such erroneous output, users typically reword and resubmit additional queries to the LLM. These additional queries impose additional computational overhead on the LLM and consume additional network bandwidth for the added queries and responses. The example system reduces the number of queries needed to generate accurate output, thus providing a technical improvement.
While modules 122-126 are shown as provided by the LLM analytics device 120 for purposes of illustrating aspects of the system 100, it should be understood that any of the operations described herein can be provided by any of the modules 122-126 or other components of the LLM analytics system 100 not expressly shown. Further, while the example architecture shown in FIG. 1 is provided for purposes of illustration, other architectures are contemplated.
During operation, the LLM analytics system 100 uses an LLM 110 to process input prompts 106A, 106B (collectively, “input prompts 106”). In some examples, input prompts 106A are sent directly to the LLM 110 (e.g., from a user computing device 104) to generate LLM output 112. In other examples, an LLM analytics device 120 acts as an interface to the LLM 110 (e.g., for the user computing device 104), taking the input prompt 106B from the user computing device 104 and using the LLM 110 to generate curated output 134 given that prompt 106B.
The LLM 110 is an execution machine that uses natural language as an interface. More specifically, the LLM 110 is a type of artificial intelligence program designed to understand and generate human-like text based on the input received. The LLM 110 is built using a neural network (e.g., a transformer model), which is adept at handling sequential data, such as text. The LLM 110 is trained on a diverse and extensive dataset comprising, for example, text from books, websites, articles, and other written materials. This training enables the LLM 110 to learn language patterns, grammar, facts about the world, and various writing styles. Once trained, the LLM 110 can perform a range of language-related tasks, such as answering questions, writing essays, summarizing texts, translating languages, and creating content such as poems or code. The LLM 110 is designed to understand context from the text input received. As such, the LLM 110 can generate coherent and contextually relevant responses based on conversation history and the input text provided. Despite the capabilities, the LLM 110 can sometimes generate incorrect or nonsensical responses, especially when dealing with complex or ambiguous queries. The LLM 110 can also unintentionally propagate biases present in the training data.
In examples, given an input prompt 106 that captures the intentions of the user 102, the LLM 110 produces an output answer in the form of text that is presented to the user 102 (e.g., as LLM output 112, intermediate outputs 132, or curated output 134 via processing by the LLM analytics device 120). The prompt 106 aims to solve a problem such as, for example, a mathematical operation, a summarization, or the generation of a certain type of text. The prompt 106 expresses an intention to execute some operations based on some information provided by the user 102. The LLM 110, which in many senses operates as a “black box,” uses the internal memory acquired at training time to provide an answer by sampling from some probability distribution conditioned on the input prompt 106. This output 112 is computed by, for example, choosing the next word that best fits the previous text, and then repeating that process for each subsequent word.
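For purposes of illustration only, the following sketch outlines the word-by-word generation loop described above. The next_token_distribution function is a hypothetical stand-in for the conditional distribution acquired at training time, and the end-of-sequence marker is an assumption.

    import random

    def next_token_distribution(context: list[str]) -> dict[str, float]:
        raise NotImplementedError("hypothetical stand-in for the trained model")

    def generate(prompt_tokens: list[str], max_tokens: int = 50) -> list[str]:
        context = list(prompt_tokens)
        for _ in range(max_tokens):
            dist = next_token_distribution(context)                  # conditioned on prompt plus prior output
            tokens, weights = zip(*dist.items())
            token = random.choices(tokens, weights=weights, k=1)[0]  # sample the next word/token
            if token == "<eos>":                                     # assumed end-of-sequence marker
                break
            context.append(token)
        return context[len(prompt_tokens):]                          # the generated continuation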
In some examples, the input prompt 106B is received by the LLM analytics device 120, and the LLM analytics device 120 uses the prompt processing module 126 to break the problem presented by the input prompt 106B into multiple sub-problems. More specifically, and as described in further detail below, the LLM analytics device 120 submits each sub-problem to the LLM 110 as a separate intermediate prompt 130.
In examples, the system 100 provides a unified framework that formally defines example concepts for LLMs 110, including alignment (or misalignment 116), hallucinations 114, chain-of-thought reasoning, self-verification, and prompt programming. In these examples, a concrete state is defined as a function that assigns values from a concrete domain to a finite set of variables. V denotes a set of variables and C denotes a concrete domain. As such, σ: V→C indicates that σ is a concrete state. For example, suppose V={x,y} and C=Z, the set of integers. Then, one possible concrete state is σ={x→1, y→2}, which means that σ(x)=1 and σ(y)=2. The set of all concrete states over V and C is denoted by 2^(V→C).
Further, for an example problem, let q: 2^(V→C)→2^(V→C) be a query, and let σ be a concrete state. Informally, the query q expresses what a user wishes to compute over a state σ. For example, q might select some variables, apply some operations, or filter some conditions. As such, a problem is defined by the query-state pair (q,σ). A prompt is a natural language expression of a problem (q,σ) that the LLM 110 can solve. While examples provided herein separate elements to simplify this exposition, both elements can be mixed in some prompts. The prompt (e.g., the prompt 302 described below) expresses both elements as natural language text that is submitted to the LLM 110.
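For purposes of illustration only, the following sketch models these formal objects in Python, with dictionaries playing the role of concrete states. The variable names and the example query are illustrative assumptions.

    from typing import Callable, Optional

    State = dict[str, Optional[int]]     # sigma: V -> C, with None standing in for the undefined value
    Query = Callable[[State], State]     # a query maps states to states

    sigma: State = {"x": 1, "y": 2}      # the concrete state {x->1, y->2}

    def q(state: State) -> State:        # an example query that computes x + y into z
        return {**state, "z": state["x"] + state["y"]}

    problem = (q, sigma)                 # a problem is the query-state pair (q, sigma)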
In examples, given a prompt that expresses a problem (q, σi), the LLM 110 produces an output state string, and the system 100 relates the concrete computation and the LLM computation using the diagram 200 described below.
In this example, the diagram 200 represents different ways or paths to solve a problem (q, σi). Each path starts from an initial concrete state σi and ends at a final concrete state σi+1. The diagram 200 is commutative if and only if every path leads to a compatible σi+1, in other words, if and only if Λq(σi)≡(γc∘γa∘Λq̂∘αa∘αc)(σi), where ≡ is a suitable compatibility relation. Additional examples of compatibility relations, such as equality, are provided below. Further, in this example, the path that passes through the state string represents the computation performed by the LLM 110, while the direct path Λq represents the ground-truth computation.
For concrete state, the set of variables V={x,y,z} is defined, where x, y, and z are the variables of the problem. The concrete domain C is the set of integers Z. The concrete initial state is defined as σi={x→12, y→13, z→⊥}, where ⊥ represents an undefined value. The query is q=λσ. σ(x)+σ(y) and the problem 300 is the pair (q, σi). The functional Λq(σi) computes the concrete output state σi+1. As such, σi+1={x→12, y→13, z→25} is the correct answer that the user 102 expects from the LLM 110 (e.g., the “ground truth” of this example problem).
As an abstraction map, the function αc maps the concrete input state σi to a state string that expresses the values of x and y in natural language as part of the prompt 302.
In this example, the LLM output 304 for prompt 302 is “The value of z is 25”, and this is represented by the state string from which the concretization map γc recovers the concrete output state σi+1={x→12, y→13, z→25}, matching the expected answer.
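For purposes of illustration only, the following sketch checks commutativity for this arithmetic example by comparing the two paths of the diagram 200: direct application of the query, and the path through the LLM. The prompt wording, the parsing performed by gamma_c, and the call_llm stub (which simply returns the output 304 of this example) are illustrative assumptions.

    import re

    def lambda_q(state):          # ground-truth path: apply the query directly (z = x + y)
        return {**state, "z": state["x"] + state["y"]}

    def alpha_c(state):           # abstraction: concrete state -> natural language prompt
        return f"Suppose x is {state['x']} and y is {state['y']}. What is the value of z if z = x + y?"

    def gamma_c(answer, state):   # concretization: LLM text -> concrete output state
        match = re.search(r"z is (-?\d+)", answer)
        return {**state, "z": int(match.group(1)) if match else None}

    def call_llm(prompt):         # hypothetical stand-in; returns the output 304 of this example
        return "The value of z is 25"

    sigma_i = {"x": 12, "y": 13, "z": None}
    ground_truth = lambda_q(sigma_i)                            # {x: 12, y: 13, z: 25}
    llm_state = gamma_c(call_llm(alpha_c(sigma_i)), sigma_i)
    print("diagram commutes:", llm_state == ground_truth)       # equality as the compatibility relation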
For concrete state, the set of variables V={x, body, legs, head, tail, horn, m} is defined and represents the information needed to draw an animal using TikZ, which is a language for creating graphics in LaTeX. The variable x holds the name of the animal to be drawn (a string value), such as unicorn, and the variables body, legs, head, tail, and horn hold the TikZ code for drawing each part of the animal, such as a circle, a line, or a curve. The variable m is a slack variable that can be used to store any additional information or code that is not captured by the other variables. The system 100 starts with an initial state σi, where x is assigned the value “unicorn” and all the other variables are assigned the undefined value ⊥, meaning that there is not yet any code for drawing the unicorn.
In this example, the query 402, q, is a TikZ code generator that takes σi as input, draws a diagram of σi(x), and the problem is the pair (q, σi). The functional Λq(σi) computes the concrete output state σi+1. In particular, σi+1={x→“unicorn”, body→<code for body>, legs→<code for legs>, head→<code for head>, tail→<code for tail>, horn→<code for horn>, m→⊥}. This is the expected answer for the user 102 (e.g., the ground truth), which is defined as any code that sketches the unicorn. However, there are many possible ways to write such code. In examples, and as described in further detail below, the system 100 measures the quality of output 404, 406 provided by the LLM 110.
As an abstraction map, the function αc maps the concrete input state σi to a state string that expresses, in natural language, that the animal to be drawn is a unicorn, as part of the prompt 402.
In this example, the LLM output 404 for the prompt 402, as shown in FIG. 4, includes TikZ code for drawing the unicorn.
Unlike the arithmetic example 300 of FIG. 3, which has a single correct answer, this example admits many acceptable outputs, so the quality of the LLM output is evaluated using a more flexible notion of compatibility, as described below.
The terms “hallucination” and “misalignment” are used to describe two common issues with LLMs. Hallucination means that an LLM produces wrong or irrelevant responses to prompts while making them sound reasonable or coherent. This can be dangerous when LLMs are used in critical domains where accuracy and safety are important. Alignment means that LLMs act in accordance with their human users' intentions. LLMs that are misaligned act differently from what their users want. This can also cause harm, such as giving wrong answers, generating biased outputs, or producing discriminatory results. Alignment involves tuning LLMs to encourage desired behaviors and discourage undesired ones.
In this example, the system uses the diagram 200 and associated framework to represent hallucination and misalignment. Given a problem (q, σ), a compatibility relation ≡ is defined to compare the equivalence of two concrete states. Hallucination and misalignment occur when the diagram 200 for (q, σ) does not commute, meaning that the paths from σ to the output concrete state do not produce equivalent states under ≡. Formally, this means that the compatibility relationship (γ∘Λq̂∘α)(σ)≡Λq(σ), where (γ∘Λq̂∘α)(σ) is the answer generated by the LLM 110, does not hold. The system 100 detects hallucinations or misalignments by comparing states, and thus a suitable notion of equivalence (≡) is defined that captures the types of errors of interest.
For example, consider the simple arithmetic problem 300 of FIG. 3. Suppose the LLM 110 outputs “The value of z is 24”. This answer is wrong, but it is aligned with the user's intention in the sense that it still assigns an integer value to z.
However, suppose the LLM 110 outputs “The sum of x and y is a tiger”. This answer is both wrong and misaligned because it contains a nonsensical output that is not an integer. In this case, the only way for the two states to be aligned is if every variable in the scope has an integer value. That is, (σ1≡σ2)⇐⇒∀x∈V.typeof(σ1(x))=typeof(σ2(x))=int.
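For purposes of illustration only, the following sketch encodes two candidate compatibility relations for this example: strict equality, and the weaker, type-based relation just described, which the “tiger” answer violates.

    V = {"x", "y", "z"}

    def equal(s1: dict, s2: dict) -> bool:                 # strict equality of states
        return all(s1.get(x) == s2.get(x) for x in V)

    def same_integer_types(s1: dict, s2: dict) -> bool:    # every variable in scope must hold an integer
        return all(isinstance(s1.get(x), int) and isinstance(s2.get(x), int) for x in V)

    ground_truth = {"x": 12, "y": 13, "z": 25}
    hallucinated = {"x": 12, "y": 13, "z": "a tiger"}
    print(same_integer_types(ground_truth, hallucinated))  # False: the nonsensical output is detected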
To evaluate how well the LLM 110 performs a task, the system 100 measures the extent of its hallucinations or misalignments. For example, given a problem (q, σ), the output of the LLM 110 is defined by the state string it produces, which the concretization map γc translates into a concrete output state.
As such, the system 100 evaluates how well the LLM 110 solves a problem by comparing the output state derived from the LLM 110 with a reference state that represents the correct answer, Λq(σ). To do this, the system 100 computes a distance metric Δ: 2^(V→C)×2^(V→C)→ℝ+ that quantifies how different two states are. The choice of Δ depends on the problem scenario, but it must satisfy a consistency condition with the diagram 200: the diagram commutes, meaning that the concrete state derived from the output of the LLM, γc applied to the output state string, is compatible with Λq(σ), exactly when Δ between the two states evaluates to zero.
For example, suppose the system 100 is solving the problem of adding 12+13. If the output of the LLM 110 is a state string and the system 100 applies γc to it, the system 100 generates a concrete output state whose value for z can be compared, using Δ, against the ground-truth value of 25.
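For purposes of illustration only, the following sketch shows one possible distance metric for this arithmetic scenario; the penalty for undefined or non-integer outputs is an arbitrary illustrative choice.

    import math

    def delta(s1: dict, s2: dict) -> float:
        z1, z2 = s1.get("z"), s2.get("z")
        if not (isinstance(z1, int) and isinstance(z2, int)):
            return math.inf                          # undefined or non-integer output: maximal distance
        return abs(z1 - z2)

    print(delta({"x": 12, "y": 13, "z": 25}, {"x": 12, "y": 13, "z": 25}))   # 0: the diagram commutes
    print(delta({"x": 12, "y": 13, "z": 25}, {"x": 12, "y": 13, "z": 24}))   # 1: close, but not commuting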
Some problems need more complex distance metrics to measure how close the diagram 200 is to being commutative. This is more difficult when there are many valid solutions for the ground truth Λq(σ), such as in code generation where different solutions are acceptable. In examples, the user 102 of the LLM 110 may define these metrics to capture the right properties for these problems.
In examples, the method involves prompting the LLM 110 to generate intermediate steps that lead to the final answer, and using these steps as additional context or feedback for the next step. More specifically, the input prompt 106B is transmitted to the LLM analytics device 120, and the LLM analytics device 120 interacts with the LLM 110 by sending one or more intermediate prompts 130 to the LLM 110 and receiving an intermediate output 132 in response, before generating a final output, or “curated output” 134 that is sent to the user computing device 104. This curated output 134 represents an output that is improved over what would be directly generated by the LLM 110 given the same input prompt 106 (e.g., a decrease in hallucinations 114, misalignments 116, or other errors, as compared to the LLM output 112 that would be generated by application of the input prompt 106A to the LLM 110 directly).
The diagram 500 in FIG. 5 illustrates how the LLM analytics device 120 formalizes chain-of-thought (CoT) reasoning by decomposing a problem into sub-problems.
In this example, to formalize CoT reasoning, the LLM analytics device 120 creates diagram 500. Consider a problem (q, σi), where q is a query (e.g., the input query 106B) and σi is a concrete input state. The LLM analytics device 120 splits this problem into two sub-problems 502 (q1, σ1) and 504 (q2, σ2), such that applying the query transformation Λq is the same as applying Λq1 and then Λq2, that is, Λq(σi)=(Λq2∘Λq1)(σi).
In a first example lemma, suppose a problem (q, σi) can be solved by splitting it into two sub-problems (q1, σ1) and (q2, σ2), such that Λq(σi)=(Λq2∘Λq1)(σi). Then, if the diagram for each sub-problem commutes, solving the sub-problems in sequence (e.g., with one call to the LLM 110 per sub-problem) produces an output that is compatible with Λq(σi), so the diagram for the original problem also commutes.
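For purposes of illustration only, the following sketch shows CoT reasoning as two explicit sub-queries, where the first LLM output feeds the second prompt. The call_llm function is a hypothetical stand-in and the prompt wording is illustrative.

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("hypothetical stand-in for an LLM invocation")

    def cot_reasoning(context: str, step1: str, step2: str) -> str:
        answer1 = call_llm(f"{context} {step1}")                            # sub-problem (q1, sigma_1)
        answer2 = call_llm(f"{context} {step1} gives {answer1}. {step2}")   # sub-problem (q2, sigma_2)
        return answer2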
To elicit CoT reasoning, CoT prompting embeds the intermediate reasoning steps in a single prompt sent to the LLM 110 (e.g., via an intermediate prompt 130). This method is convenient because it avoids extra calls to the LLM 110, which can be costly. In the diagram 600 shown in FIG. 6, the two sub-queries are combined into one prompt, and the LLM 110 is expected to perform the intermediate step internally in a single call.
In an example second lemma, suppose a problem (q, σi) can be solved by splitting it into two sub-problems (q1, σ1) and (q2, σ2), such that Λq(σi)=(Λq2∘Λq1)(σi). Then, CoT prompting produces an output compatible with Λq(σi) only if the LLM 110, given the single combined prompt, internally applies the two sub-queries in sequence.
Consider as a corollary: CoT prompting and CoT reasoning produce the same output for a problem (q, σ) if and only if both of the following conditions are true: (i) CoT reasoning holds for the problem (q, σ) (that is, Λq=Λq2∘Λq1 and the diagram for each sub-problem commutes); and (ii) the condition of the second lemma holds, namely that the LLM 110, given the single CoT prompt, internally applies the two sub-queries in sequence.
CoT prompting and CoT reasoning are equivalent only under the conditions given by this corollary. However, many known systems (including engineering APIs) rely on CoT prompting without verifying these conditions, which can lead to incorrect outcomes. In practice, CoT prompting alone may not be sufficient to enable CoT reasoning.
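By contrast, and again for purposes of illustration only, the following sketch shows CoT prompting packing both steps into a single prompt and relying on the LLM to compose the sub-queries internally, which is exactly the assumption made explicit by the corollary above. The call_llm function and the prompt wording are illustrative.

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("hypothetical stand-in for an LLM invocation")

    def cot_prompting(context: str, step1: str, step2: str) -> str:
        single_prompt = (f"{context} First, {step1} Then, using that result, {step2} "
                         f"Show the intermediate result before the final answer.")
        return call_llm(single_prompt)   # one call; no guarantee the two steps are actually composed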
In this example, the first prompt 902 submitted to the LLM 110 is “Prompt1”, namely “You are a programmer that uses TikZ to generate diagrams. Draw a hexagon.” In response, the LLM 110 generates TikZ code (not shown) that results in a first intermediate output 132, identified in FIG. 9 as ‘{answer1}’. The LLM analytics device 120 then creates and submits a second prompt 904 as “Prompt2”, namely “You are a programmer that uses TikZ to generate diagrams. A hexagon can be drawn as {answer1}. Draw a circle inside the hexagon.” In response, the LLM 110 generates additional TikZ code (not shown) that results in a second intermediate output 132, identified as ‘{answer2}’.
In this example, the LLM analytics device 120 then uses the first and second intermediate outputs 132 from the prompts 902, 904 (e.g., ‘{answer1}’ and ‘{answer2}’) to create and submit a third prompt 906 to the LLM 110 as “Prompt3”, namely “You are a programmer that uses TikZ to generate diagrams. A hexagon can be drawn as {answer1}. A circle inside the hexagon can be drawn as {answer2}. Draw a square inside the circle.” In response, the LLM 110 generates the figure shown at the bottom of FIG. 9 as output 908, namely a hexagon with a circle inside it and a square inside the circle.
In examples, subsequent intermediate prompts 130 (e.g., prompts 904, 906) can identify, reference, or otherwise include one or more intermediate outputs 132 generated by prior intermediate prompts 130 (e.g., prompts 902, 904). For purposes of illustration, the example shown in FIG. 9 uses three intermediate prompts 130, but any number of intermediate prompts 130 and intermediate outputs 132 may be used.
In examples, the initial input prompt 106B is parsed into a plurality of intermediary steps. For example, presume that the initial input prompt 106B of the example in FIG. 9 is: “You are a programmer that uses TikZ to generate diagrams. First, draw a hexagon. Then, draw a circle within the hexagon. Then, draw a square inside the circle.”
In this example, while parsing this example initial input prompt 106B, the LLM analytics device 120 identifies the sentence “First, draw a hexagon” as the first step or instruction (e.g., based on the temporal adverb ‘first’), and thus generates the first prompt 902 as “<Contextual text>.<Instruction 1>”, where <Contextual text> in these examples is “You are a programmer that uses TikZ to generate diagrams”, and where <Instruction 1> is “Draw a hexagon” (e.g., from the sentence of the initial input prompt 106B, “First, draw a hexagon”, with the temporal adverb being removed). As such, the first prompt 902 is created as “You are a programmer that uses TikZ to generate diagrams. Draw a hexagon.” When applied to the LLM 110, this first prompt 902 generates an intermediate output 132 (not shown) that can subsequently be referenced as ‘{answer1}’ in subsequent intermediate prompts 130.
Likewise, for the second prompt 904, the LLM analytics device 120 identifies a second step or instruction from the next sentence of the initial input prompt 106B (e.g., “Then, draw a circle within the hexagon” as <Instruction 2>). From this second instruction, the second prompt 904 is created as “<Contextual text>.<Intermediate Output 1>.<Instruction 2>”, where <Intermediate Output 1> is an inclusion or reference to the prior intermediate output 132 (e.g., the ‘{answer1}’ generated by the first prompt 902, shown here as the sentence “A hexagon can be drawn as {answer1}”), and where <Instruction 2> is “Draw a circle inside the hexagon”. As such, the second prompt 904 is created as “You are a programmer that uses TikZ to generate diagrams. A hexagon can be drawn as {answer1}. Draw a circle inside the hexagon.” Similarly, when applied to the LLM 110, this second prompt 904 generates an intermediate output 132 (not shown) that can subsequently be referenced as ‘{answer2}’ in subsequent intermediate prompts 130.
For the third prompt 906, the LLM analytics device 120 identifies a third step or instruction from the next sentence of the initial input prompt 106B (e.g., “Then, draw a square inside the circle” as <Instruction 3>). From this third instruction, the third prompt 906 is created as “<Contextual text>.<Intermediate Output 1>.<Intermediate Output 2>.<Instruction 3>”, where <Intermediate Output 1> is an inclusion or reference to {answer1} generated by the first prompt 902, <Intermediate Output 2> is an inclusion or reference to {answer2} generated by the second prompt 904, and where <Instruction 3> is “Draw a square inside the circle”. As such, the third prompt 906 is created as “You are a programmer that uses TikZ to generate diagrams. A hexagon can be drawn as {answer1}. A circle inside the hexagon can be drawn as {answer2}. Draw a square inside the circle.” Similarly, when applied to the LLM 110, this third prompt 906 generates an intermediate output 132 (shown as output 908), which can finally be provided as curated output 134.
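For purposes of illustration only, the following sketch generalizes this prompt-assembly pattern: each intermediate prompt combines the contextual text, sentences describing the prior intermediate outputs, and the next instruction. The call_llm function and the wording of the prior-output sentences are illustrative assumptions.

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("hypothetical stand-in for an LLM invocation")

    def run_steps(context: str, instructions: list[str]) -> str:
        answers: list[str] = []
        for instruction in instructions:
            prior = " ".join(f"Step {i + 1} can be done as {ans}." for i, ans in enumerate(answers))
            prompt = " ".join(part for part in (context + ".", prior, instruction + ".") if part)
            answers.append(call_llm(prompt))
        return answers[-1]     # the final intermediate output becomes the curated output

    # run_steps("You are a programmer that uses TikZ to generate diagrams",
    #           ["Draw a hexagon", "Draw a circle inside the hexagon", "Draw a square inside the circle"])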
In some examples, parsing the initial input prompt 106B can include identifying multiple steps by matching a syntactic form of the initial input prompt 106B against one or more pre-configured templates. For example, presume a template is defined as: “<*A*>.< [First], *B*>.< [Second, Next, Then], *C*>.< [Third, Next, Then, Finally], *D*>”. Such a template can be applied to the example initial input prompt 106B as described above based on the adverbs ‘first’ and ‘then’. As such, *A* is assigned to the contextual text “You are a programmer that uses TikZ to generate diagrams”, *B* is identified as the first step, *C* is identified as the second step, and *D* is identified as the third step. As such, the LLM analytics device 120 can use such a template to parse the initial input prompt 106B into multiple sub-steps, and thus to similarly generate the prompts 902, 904, 906.
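For purposes of illustration only, the following sketch approximates such template matching with a simple adverb-based sentence splitter; the adverb list and the sentence-splitting rule are illustrative assumptions.

    import re

    ADVERBS = ("First", "Second", "Third", "Next", "Then", "Finally")

    def parse_steps(prompt: str) -> tuple[str, list[str]]:
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", prompt.strip()) if s.strip()]
        context, steps = [], []
        for sentence in sentences:
            match = re.match(rf"({'|'.join(ADVERBS)}),?\s+(.*)", sentence, re.IGNORECASE)
            if match:
                steps.append(match.group(2).rstrip("."))   # keep the step with the adverb removed
            else:
                context.append(sentence.rstrip("."))       # non-step sentences form the contextual text
        return ". ".join(context), steps

    # parse_steps("You are a programmer that uses TikZ to generate diagrams. "
    #             "First, draw a hexagon. Then, draw a circle within the hexagon. "
    #             "Then, draw a square inside the circle.")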
In some examples, parsing the initial input prompt 106B can include using the LLM 110 to parse the initial input prompt 106B. More specifically, the LLM analytics device 120 can create a ‘deconstruction LLM prompt’ that is used to parse (or ‘deconstruct’) the initial input prompt 106B into multiple steps through the help of the LLM 110. For example, the LLM 110 can be asked to identify the multiple steps of the initial input prompt 106B by creating and submitting a deconstruction LLM prompt as: “Identify each individual step in the following LLM prompt: ‘<Initial Input Prompt>’”, where <Initial Input Prompt> is the source text of the initial input prompt 106B. As such, in the above example, the LLM 110 identifies three individual steps as “1. Draw a Hexagon. 2. Draw a circle inside the hexagon. 3. Draw a square inside the circle.” Accordingly, the LLM analytics device 120 may initially submit this deconstruction LLM prompt (as a first intermediate prompt 130) and analyze the output to identify the instructions identified by the LLM 110 (e.g., identifying the three steps from a parsing of the output). As such, and similar to the above examples, once the three steps have been identified, the LLM analytics device 120 uses those three steps to generate the prompts 902, 904, 906, similarly submitting each intermediate prompt 130 in the identified order, and subsequently submitting the prior results to the next intermediate prompt 130 until the output 908 has been generated.
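For purposes of illustration only, the following sketch shows one way to build the deconstruction LLM prompt and parse a numbered-list response. The call_llm function is a hypothetical stand-in, and the expected output format is an assumption.

    import re

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("hypothetical stand-in for an LLM invocation")

    def deconstruct(initial_prompt: str) -> list[str]:
        deconstruction_prompt = (
            "Identify each individual step in the following LLM prompt: "
            f"'{initial_prompt}'"
        )
        output = call_llm(deconstruction_prompt)
        # Expect output such as "1. Draw a hexagon. 2. Draw a circle inside the hexagon. ..."
        steps = re.split(r"\s*\d+\.\s*", output)
        return [s.strip().rstrip(".") for s in steps if s.strip()]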
The above corollary has important consequences for concepts in the LLM space. These include self-verification and prompt engineering. Self-verification is the ability of the LLM 110 to assess the validity of its own output. In problems solved with self-verification, the same LLM 110 plays two different roles: (1) the LLM 110 is a generator of solutions, and (2) the LLM 110 is a discriminator of potentially wrong answers.
During the generation step, the LLM 110 produces a list of 10 names (represented here as output 1110), but it has no way of verifying whether they were all born in Chicago (e.g., there could be hallucinations 114 or misalignments 116 in this output 1110). In fact, Nancy Pelosi, Bobby Rush, and Timuel Black were born elsewhere, but in this example, the LLM 110 identifies all of them as having been born in Chicago (e.g., as also shown with errors in output 1112). However, by splitting the task into two steps (e.g., generating the list and then verifying the list), the LLM analytics device 120 can use the LLM 110 to identify these mistakes.
To verify this output, the LLM analytics device 120 creates and submits a verification LLM prompt 1202 as shown in FIG. 12, which asks the LLM 110 to confirm, for each name in the generated list, whether that person was born in Chicago, allowing erroneous entries to be flagged.
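For purposes of illustration only, the following sketch shows the generate-then-verify pattern, in which the same LLM first generates candidates and is then queried about each one. The call_llm function and the prompt templates are illustrative assumptions.

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("hypothetical stand-in for an LLM invocation")

    def generate_and_verify(generation_prompt: str, verification_template: str) -> dict[str, str]:
        candidates = [c.strip() for c in call_llm(generation_prompt).splitlines() if c.strip()]
        verdicts = {}
        for candidate in candidates:
            # e.g., verification_template = "Was {candidate} born in Chicago? Answer yes or no."
            verdicts[candidate] = call_llm(verification_template.format(candidate=candidate))
        return verdicts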
Generating and verifying answers may each involve multiple sub-problems, and CoT reasoning can help decompose the generation and verification processes further when they require multiple steps. Further, the text structure of a prompt can affect how LLMs process and respond to it.
A clear and concise prompt has two basic components: a query and a state. The query tells the LLM 110 what the user 102 wants to do, such as summarize a text or caption an image. The state provides the LLM 110 with the data or information it needs to do it, such as the text or the image. This separation makes it easier to design prompts that can adapt to different scenarios.
As used herein, an API to a general purpose trainer provides a natural way to separate the query and the state in the prompt. The API lets users structure the prompt into three sections: “system”, “user”, and “assistant”. These sections correspond to the key elements of the diagram 200 of FIG. 2: for example, the query can be placed in the “system” section and the state in the “user” section, while the “assistant” section holds the LLM output.
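For purposes of illustration only, the following sketch separates the query and the state into distinct prompt sections represented as plain dictionaries; how a particular service consumes such messages is outside the scope of this sketch.

    from typing import Optional

    def build_messages(query: str, state: str, prior_output: Optional[str] = None) -> list[dict]:
        messages = [
            {"role": "system", "content": query},    # what the user wants to do (the query)
            {"role": "user", "content": state},      # the data the query should operate on (the state)
        ]
        if prior_output is not None:                 # an intermediate output carried into the next step
            messages.append({"role": "assistant", "content": prior_output})
        return messages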
This API gives the user flexibility to structure the problem and assign different roles to the parts of the prompt. However, it does not show how the service combines these components internally. As explained herein, CoT reasoning involves strong assumptions and multiple calls to the LLM 110 (e.g., one for each step), unlike CoT prompting, which uses a single prompt. Recognizing this distinction enables more diverse and effective ways to use and deploy LLMs 110 such as GPT-4 with CoT reasoning. This is especially helpful in domains like healthcare, where a domain expert needs to define and execute sub-tasks of a problem.
At operation 1616, the LLM analytics device 120 creates a second LLM prompt (e.g., prompt 904) based on the input LLM prompt, the second LLM prompt representing a second step toward generating the first solution to the input LLM prompt, the second LLM prompt including the first LLM output (e.g., ‘{answer1}’ of the example of FIG. 9).
In some examples, the input LLM prompt includes a first sentence and a second sentence, wherein creating the first LLM prompt comprises creating the first LLM prompt to include at least the first sentence, wherein creating the second LLM prompt comprises creating the second LLM prompt to include at least the second sentence. In some examples, the LLM analytics device 120 also parses the input LLM prompt based on one or more of sequence adverbs and temporal adverbs, wherein creating the first LLM prompt further comprises removing at least one adverb from the first sentence. In some examples, the LLM analytics device 120 also identifies a pre-configured template that matches a form of the input LLM prompt, wherein creating the first LLM prompt comprises creating the first LLM prompt based on matching one or more portions of the input LLM prompt to one or more portions of the pre-configured template.
In some examples, the LLM analytics device 120 also creates a deconstruction LLM prompt based on the input LLM prompt, the deconstruction LLM prompt being formed to query the LLM to identify multiple sub-steps from within the input LLM prompt, and submits the deconstruction LLM prompt to the LLM, thereby generating a deconstruction LLM output that includes at least a first step and a second step, wherein creating the first LLM prompt based on the input LLM prompt further includes creating the first LLM prompt based on the first step, wherein creating the second LLM prompt based on the input LLM prompt further includes creating the second LLM prompt based on the second step.
In some examples, the LLM analytics device 120 also creates a verification LLM prompt based on the first LLM output, submits the verification LLM prompt to the LLM, thereby generating a verification LLM output, compares the verification LLM output to the first solution based on a comparison metric, and displays a result of the comparison.
In some examples, the LLM analytics device 120 also generates a graph that includes a first state node, a second state node, and one or more edges, the first state node representing an initial state of one or more variables identified by the input LLM prompt, the second state node representing a second state of the one or more variables, identifies a first edge connecting the first state node and the second state node, the first edge representing a trusted application of the input LLM prompt to the one or more variables, thereby resulting in a trusted solution, identifies one or more other edges connecting the first state node and the second state node, the one or more other edges representing application of the input LLM prompt via the LLM, thereby resulting in the first solution, and determines whether or not the graph commutes based on whether the first solution matches the trusted solution.
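For purposes of illustration only, the following sketch captures this graph-based check: one trusted edge and one or more LLM edges connect the two state nodes, and the graph commutes when every LLM solution matches the trusted solution under a chosen comparison. The function and parameter names are illustrative.

    def graph_commutes(initial_state: dict, trusted_edge, llm_edges, compare=lambda a, b: a == b) -> bool:
        trusted_solution = trusted_edge(initial_state)
        return all(compare(edge(initial_state), trusted_solution) for edge in llm_edges)

    # Example with the arithmetic problem:
    # graph_commutes({"x": 12, "y": 13},
    #                trusted_edge=lambda s: {**s, "z": s["x"] + s["y"]},
    #                llm_edges=[lambda s: {**s, "z": 25}])   # -> True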
An example system comprises: a processor; and a computer-readable medium storing instructions that are operative upon execution by the processor to: receive an input large language model (LLM) prompt; create a first LLM prompt based on the input LLM prompt, the first LLM prompt representing a first step toward generating a first solution to the input LLM prompt; submit the first LLM prompt to an LLM as a first sub-query, thereby resulting in the generation of a first LLM output; create a second LLM prompt based on the input LLM prompt, the second LLM prompt representing a second step toward generating the first solution to the input LLM prompt, the second LLM prompt including the first LLM output; submit the second LLM prompt to the LLM as a second sub-query, thereby resulting in the generation of a second LLM output; and cause the second LLM output to be displayed as the first solution to the input LLM prompt in response to the receiving of the input LLM prompt.
An example computer-implemented method comprises: receiving an input large language model (LLM) prompt; creating a first LLM prompt based on the input LLM prompt, the first LLM prompt representing a first step toward generating a first solution to the input LLM prompt; submitting the first LLM prompt to an LLM as a first sub-query, thereby resulting in the generation of a first LLM output; creating a second LLM prompt based on the input LLM prompt, the second LLM prompt representing a second step toward generating the first solution to the input LLM prompt, the second LLM prompt including the first LLM output; submitting the second LLM prompt to the LLM as a second sub-query, thereby resulting in the generation of a second LLM output; and displaying the second LLM output as the first solution to the input LLM prompt in response to the receiving of the input LLM prompt.
An example computer storage device has computer-executable instructions stored thereon, which, on execution by a computer, cause the computer to perform operations comprising: receiving an input large language model (LLM) prompt; creating a first LLM prompt based on the input LLM prompt, the first LLM prompt representing a first step toward generating a first solution to the input LLM prompt; submitting the first LLM prompt to an LLM as a first sub-query, thereby resulting in the generation of a first LLM output; creating a second LLM prompt based on the input LLM prompt, the second LLM prompt representing a second step toward generating the first solution to the input LLM prompt, the second LLM prompt including the first LLM output; submitting the second LLM prompt to the LLM as a second sub-query, thereby resulting in the generation of a second LLM output; and displaying the second LLM output as the first solution to the input LLM prompt in response to the receiving of the input LLM prompt.
Alternatively, or in addition to the other examples described herein, examples include any combination of the following:
While the aspects of the disclosure have been described in terms of various examples with their associated operations, a person skilled in the art would appreciate that a combination of operations from any number of different examples is also within scope of the aspects of the disclosure.
The examples disclosed herein may be described in the general context of computer code or machine-useable instructions, including computer-executable instructions such as program components, being executed by a computer or other machine, such as a personal data assistant or other handheld device. Generally, program components including routines, programs, objects, components, data structures, and the like, refer to code that performs particular tasks or implements particular abstract data types. The disclosed examples may be practiced in a variety of system configurations, including personal computers, laptops, smart phones, mobile tablets, hand-held devices, consumer electronics, specialty computing devices, etc. The disclosed examples may also be practiced in distributed computing environments where tasks are performed by remote-processing devices that are linked through a communications network.
Computing device 1700 includes a bus 1710 that directly or indirectly couples the following devices: computer storage memory 1712, one or more processors 1714, one or more presentation components 1716, input/output (I/O) ports 1718, I/O components 1720, a power supply 1722, and a network component 1724. While computing device 1700 is depicted as a seemingly single device, multiple computing devices 1700 may work together and share the depicted device resources. For example, memory 1712 may be distributed across multiple devices, and processor(s) 1714 may be housed with different devices.
Bus 1710 represents what may be one or more busses (such as an address bus, data bus, or a combination thereof). Although the various blocks of FIG. 17 are shown with lines for the sake of clarity, in reality, delineating various components is not so clear, and the lines would more accurately be grey and fuzzy.
In some examples, memory 1712 includes computer storage media. Memory 1712 may include any quantity of memory associated with or accessible by the computing device 1700. Memory 1712 may be internal to the computing device 1700 (as shown in FIG. 17), external to the computing device 1700 (not shown), or both (not shown).
Processor(s) 1714 may include any quantity of processing units that read data from various entities, such as memory 1712 or I/O components 1720. Specifically, processor(s) 1714 are programmed to execute computer-executable instructions for implementing aspects of the disclosure. The instructions may be performed by the processor, by multiple processors within the computing device 1700, or by a processor external to the client computing device 1700. In some examples, the processor(s) 1714 are programmed to execute instructions such as those illustrated in the flow charts discussed below and depicted in the accompanying drawings. Moreover, in some examples, the processor(s) 1714 represent an implementation of analog techniques to perform the operations described herein. For example, the operations may be performed by an analog client computing device 1700 and/or a digital client computing device 1700. Presentation component(s) 1716 present data indications to a user or other device. Exemplary presentation components include a display device, speaker, printing component, vibrating component, etc. One skilled in the art will understand and appreciate that computer data may be presented in a number of ways, such as visually in a graphical user interface (GUI), audibly through speakers, wirelessly between computing devices 1700, across a wired connection, or in other ways. I/O ports 1718 allow computing device 1700 to be logically coupled to other devices including I/O components 1720, some of which may be built in. Example I/O components 1720 include, for example but without limitation, a microphone, joystick, game pad, satellite dish, scanner, printer, wireless device, etc.
Computing device 1700 may operate in a networked environment via the network component 1724 using logical connections to one or more remote computers. In some examples, the network component 1724 includes a network interface card and/or computer-executable instructions (e.g., a driver) for operating the network interface card. Communication between the computing device 1700 and other devices may occur using any protocol or mechanism over any wired or wireless connection. In some examples, network component 1724 is operable to communicate data over public, private, or hybrid (public and private) networks using a transfer protocol, between devices wirelessly using short range communication technologies (e.g., near-field communication (NFC), Bluetooth™ branded communications, or the like), or a combination thereof. Network component 1724 communicates over wireless communication link 1726 and/or a wired communication link 1726a to a remote resource 1728 (e.g., a cloud resource) across network 1730. Various examples of communication links 1726 and 1726a include a wireless connection, a wired connection, and/or a dedicated link, and in some examples, at least a portion is routed through the internet.
Although described in connection with an example computing device 1700, examples of the disclosure are capable of implementation with numerous other general-purpose or special-purpose computing system environments, configurations, or devices. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with aspects of the disclosure include, but are not limited to, smart phones, mobile tablets, mobile computing devices, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, gaming consoles, microprocessor-based systems, set top boxes, programmable consumer electronics, mobile telephones, mobile computing and/or communication devices in wearable or accessory form factors (e.g., watches, glasses, headsets, or earphones), network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, virtual reality (VR) devices, augmented reality (AR) devices, mixed reality devices, holographic device, and the like. Such systems or devices may accept input from the user in any way, including from input devices such as a keyboard or pointing device, via gesture input, proximity input (such as by hovering), and/or via voice input.
Examples of the disclosure may be described in the general context of computer-executable instructions, such as program modules, executed by one or more computers or other devices in software, firmware, hardware, or a combination thereof. The computer-executable instructions may be organized into one or more computer-executable components or modules. Generally, program modules include, but are not limited to, routines, programs, objects, components, and data structures that perform particular tasks or implement particular abstract data types. Aspects of the disclosure may be implemented with any number and organization of such components or modules. For example, aspects of the disclosure are not limited to the specific computer-executable instructions or the specific components or modules illustrated in the figures and described herein. Other examples of the disclosure may include different computer-executable instructions or components having more or less functionality than illustrated and described herein. In examples involving a general-purpose computer, aspects of the disclosure transform the general-purpose computer into a special-purpose computing device when configured to execute the instructions described herein.
By way of example and not limitation, computer readable media comprise computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable memory implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules, or the like. Computer storage media are tangible and mutually exclusive to communication media. Computer storage media are implemented in hardware and exclude carrier waves and propagated signals. Computer storage media for purposes of this disclosure are not signals. Exemplary computer storage media include hard disks, flash drives, solid-state memory, phase change random-access memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disk read-only memory (CD-ROM), digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that may be used to store information for access by a computing device. In contrast, communication media typically embody computer readable instructions, data structures, program modules, or the like in a modulated data signal such as a carrier wave or other transport mechanism and include any information delivery media.
The order of execution or performance of the operations in examples of the disclosure illustrated and described herein is not essential, and the operations may be performed in different sequential manners in various examples. For example, it is contemplated that executing or performing a particular operation before, contemporaneously with, or after another operation is within the scope of aspects of the disclosure. When introducing elements of aspects of the disclosure or the examples thereof, the articles “a,” “an,” “the,” and “said” are intended to mean that there are one or more of the elements. The terms “comprising,” “including,” and “having” are intended to be inclusive and mean that there may be additional elements other than the listed elements. The term “exemplary” is intended to mean “an example of.” The phrase “one or more of the following: A, B, and C” means “at least one of A and/or at least one of B and/or at least one of C.”
Having described aspects of the disclosure in detail, it will be apparent that modifications and variations are possible without departing from the scope of aspects of the disclosure as defined in the appended claims. As various changes could be made in the above constructions, products, and methods without departing from the scope of aspects of the disclosure, it is intended that all matter contained in the above description and shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.
This application claims the benefit of U.S. Provisional Patent Application No. 63/596,212 filed on Nov. 3, 2023, which is hereby incorporated by reference herein in its entirety.