Large language models (LLMs) are particular types of machine learning models that can perform various natural language processing (NLP) tasks, such as language generation, machine translation, and question-answering. These LLMs are typically trained on enormous amounts of diverse data including data from, but not limited to, webpages, electronic books, software code, electronic news articles, and machine translation data. Accordingly, these LLMs leverage the underlying data on which they were trained in performing these various NLP tasks. For instance, in performing a language generation task, these LLMs can process a natural language (NL) based input that is received from a client device, and generate an NL based output that is responsive to the NL based input and that is to be rendered at the client device.
In some cases, an LLM can include hundreds of millions of parameters, billions of parameters, or even one hundred billion or more parameters. As such, given the large numbers of parameters included in an LLM, performance of NLP tasks using an LLM can consume relatively large amounts of resources (e.g., in terms of computing resources used in completing the NLP task, time taken to complete performance of the NLP task, energy consumed for completion of the NLP task, etc.). Furthermore, again owing to the size of LLMs, it can be difficult to adequately train an LLM such that it can reliably perform a given NLP task according to that task's respective constraints, particularly when those constraints are not explicitly provided. It is therefore beneficial in terms of computational resource usage for LLMs to generate responses to NL based inputs that do not necessitate additional follow-up NL based inputs.
Implementations described herein can serve to reduce the number of follow-up NL based inputs that may be received by an LLM. Although any given user may decide to provide a follow-up NL based input, any “on average” reduction in the number of follow-up NL based inputs can be significantly beneficial in terms of computational resource usage. More specifically, implementations described herein relate to using self-evaluation when utilizing an LLM to generate a response to an NL based input. The LLM can be used to process the NL based input to generate a plurality of responses, and to generate a critique of those responses by comparing the responses to a set of response evaluation criteria. One of the responses can then be selected as the “highest quality” response, based on the comparison with the set of response evaluation criteria, which can vary from one NL based input to another. For instance, if the NL based input requests that the LLM generate a response that includes a fiction story for children, then the “highest quality” response may be one that includes simple words and sentences and that is short and engaging as set forth in the response evaluation criteria, and regardless of whether the fiction story is factually accurate. However, if the NL based input requests that the LLM generate a response that recounts a historical event, then the “highest quality” response may be one that is most factually accurate as set forth in the response evaluation criteria. Such techniques can result in responses being provided that reduce the number of follow-up NL based inputs, at least on average across the user base.
Since the LLM operates in a probabilistic manner, the quality of the initial, candidate, responses generated based on processing the NL based input using the LLM can vary. For instance, if an average of the quality of the candidate responses is taken, some of the candidate responses can be considered below the average quality and some can be considered above the average quality. By evaluating the quality of the candidate responses based on comparing them against a set of response evaluation criteria, the candidate response that is considered the highest quality can be identified.
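As a non-limiting illustration of this overall flow, the following Python sketch generates a plurality of candidate responses, asks the LLM for evaluation criteria implied by the input, and asks the LLM to critique the candidates against those criteria before selecting one. The llm(prompt) callable, the prompt wording, and the number-parsing step are assumptions made for the sketch only and do not reflect any particular LLM interface.

from typing import Callable, List

def self_evaluate(llm: Callable[[str], str], nl_input: str, num_candidates: int = 3) -> str:
    """Generate candidate responses, let the LLM critique them against criteria
    it infers itself, and return the candidate it judges best."""
    # 1. Generate a plurality of candidate responses for the same NL based input.
    candidates: List[str] = [llm(nl_input) for _ in range(num_candidates)]

    # 2. Ask the LLM for response evaluation criteria implied by the input.
    criteria = llm(
        "List the qualities a good response to the following request should have:\n" + nl_input
    )

    # 3. Ask the LLM to critique the candidates against the criteria and name the best one.
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    critique = llm(
        f"Request: {nl_input}\nCriteria:\n{criteria}\nCandidates:\n{numbered}\n"
        "Which candidate best satisfies the criteria? Answer with its number only."
    )
    digits = "".join(ch for ch in critique if ch.isdigit())
    best_index = int(digits) if digits else 0  # fall back to the first candidate
    return candidates[best_index % num_candidates]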
As used herein, the “set of response evaluation criteria” can include constraints, guidelines, principles, quality metrics, rules, etc., which are implied or inherent to a particular NL based input and/or to a particular context. The set of response evaluation criteria can be used to determine an objective measure of quality of a response to an NL based input based on the extent to which the response complies with one or more criteria included in the set of response evaluation criteria (e.g., how many of the criteria the response complies with). In other words, the set of response evaluation criteria for a particular NL based input can be indicative of attributes that should be exhibited by a response to that particular NL based input. It can be assumed that the likelihood of a given response resulting in follow-up NL based input(s) decreases as the extent to which the response complies with the set of response evaluation criteria increases.
In some implementations, at least some of the response evaluation criteria can be generated based on processing the NL based input using the same LLM used in generating the candidate responses and/or the critique of the candidate responses. For instance, implied constraints, quality metrics, etc. can be inferred by the LLM from attributes of the NL based input. This can be performed by providing a request, to the LLM, to generate a set of response evaluation criteria that is particular to the NL based input. In this way, the LLM can effectively self-evaluate its own responses according to criteria it itself generated in order to identify a “high-quality” response, without human intervention. In additional or alternative implementations, at least some of the response evaluation criteria can be obtained from data associated with a particular user of a client device. In additional or alternative implementations, at least some of the response evaluation criteria can be obtained from data associated with a third party (3P) that is associated with the particular user of the client device, such as a third party entity that is distinct from a first party entity that trains and/or manages the LLM.
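One possible, non-limiting way of prompting the LLM to produce such a set of response evaluation criteria is sketched below; the prompt wording, the llm(prompt) callable, and the line-based parsing are illustrative assumptions only.

from typing import Callable, List

def generate_evaluation_criteria(llm: Callable[[str], str], nl_input: str) -> List[str]:
    """Ask the LLM to infer response evaluation criteria that are implied by,
    but not explicitly stated in, the NL based input."""
    request = (
        "List the qualities that a good response to the following request should have, "
        "one per line, without answering the request itself.\n\n"
        f"Request: {nl_input}"
    )
    raw = llm(request)
    # Treat each non-empty line of the LLM output as one criterion.
    return [line.strip("-* ").strip() for line in raw.splitlines() if line.strip()]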
As used herein, a “critique response” can be indicative of an evaluation of a corresponding one of the candidate responses according to the set of response evaluation criteria. For instance, a critique response can be generated for each of the candidate responses. This can be performed by providing a request, to the LLM, to determine which of one or more criteria included in the set of response evaluation criteria each candidate response complies with. In some implementations, each critique response can therefore provide an indication of an extent to which a corresponding one of the candidate responses complies with the response evaluation criteria. The candidate response which best complies with the response evaluation criteria can then be identified based on the critique responses. In some implementations, the candidate response whose corresponding critique response indicates compliance with the highest number of response evaluation criteria can be identified.
In some other implementations, multiple critique responses can be generated for each of the corresponding candidate responses, where each of the multiple critique responses for a given one of the corresponding candidate responses indicates an extent to which the given one of the candidate responses complies with the response evaluation criteria. In these implementations, a majority vote can be used to determine which of the candidate responses is most consistently determined to best comply with the set of response evaluation criteria. For instance, assume that 5 critique responses are generated for each of the corresponding candidate responses. In this instance, and assuming that 4 of the 5 critique responses (or some other majority) indicate that the given one of the corresponding candidate responses best complies with the set of response evaluation criteria, then the given one of the corresponding candidate responses can be identified.
In some implementations, the self-evaluation process described herein can be used during utilization of an NL based response system including an LLM to generate a response to an NL based input associated with (e.g. provided by) a user via a client device. The response identified as being the “highest quality” response (or the top N responses identified as being the “highest quality”, where N is a positive integer greater than one) can be selected for being rendered at the client device and in response to the user's NL based input.
In some additional or alternative implementations, the self-evaluation process described herein can be used to generate synthetic training data for fine-tuning an LLM for subsequent utilization by, for example, an NL based response system. For instance, an NL based input can be obtained (for instance, from a database of previously submitted NL based inputs provided by one or more users), and provided as input to the NL based response system. The NL based response system can provide, as output, a high quality response by utilizing the self-evaluation process described herein. The NL based input and the high quality response can then be stored as a training instance to be used for fine-tuning the LLM. In some versions of those implementations, additional information, such as the number of response evaluation criteria the high quality response is determined to comply with, can also be stored in the training instance. For instance, the additional information can be used to provide a weighting for the high quality response during fine-tuning. By fine-tuning the LLM based on examples of “high quality” responses, the average quality of responses generated using the fine-tuned LLM can be higher than that of corresponding responses generated using the LLM prior to fine-tuning. This process can be repeated, such that at each iteration, the average quality of responses generated using the LLM is improved.
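A minimal, non-limiting sketch of how such a training instance could be assembled and stored is given below; the field names and the JSON-lines file format are assumptions made for illustration only.

import json
from dataclasses import dataclass, asdict

@dataclass
class TrainingInstance:
    nl_input: str              # the NL based input that was processed
    selected_response: str     # the "high quality" response chosen via self-evaluation
    criteria_satisfied: int    # how many response evaluation criteria it complied with
    criteria_total: int        # size of the set of response evaluation criteria

def store_training_instance(instance: TrainingInstance, path: str = "training_instances.jsonl") -> None:
    """Append one self-evaluated example to a JSON-lines file for later fine-tuning."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(instance)) + "\n")

# Example usage (values are illustrative):
store_training_instance(
    TrainingInstance(
        nl_input="Write a short bedtime story for a five year old.",
        selected_response="Once upon a time, a sleepy star looked for a cloud to rest on...",
        criteria_satisfied=7,
        criteria_total=8,
    )
)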
In these and other manners, responses generated using an LLM can be reliably of a higher quality. This can be the case whether the self-evaluation process described herein is used when the response is generated or is used to generate training data used in fine-tuning the LLM prior to the response being generated. As such, instances of subsequent (e.g., follow-up) NL based inputs that are provided by a user, e.g., in order to improve the quality of an initial response, and that would otherwise be processed by the LLM can be reduced. For instance, if an initial response did not comply with an implied constraint from the initial NL based input, the user may provide a further NL based input which explicitly includes the constraint in order to force the LLM to generate a response which complies with the constraint. The user may repeat this process a number of times, for instance, if the subsequent responses do not adequately comply with the constraint, or if there are further implied constraints which the response does not comply with. As described herein, implementations which utilize the self-evaluation process can ensure that resources, which would otherwise be consumed in these repeated interactions with the LLM, are conserved.
Furthermore, as described herein, a mechanism for self-evaluation for responses generated using an LLM is provided. In this way, the responses, and the corresponding NL based input, can be stored as training data, with the self-evaluation process providing a manner of labelling the training data without human intervention. As such, implementations described herein can provide a relatively low cost and time efficient manner of labelling training data as compared to, for instance, manually labelling training data by humans (e.g. manually indicating a relative or absolute quality of responses generated using an LLM).
In addition, in some instances, the self-evaluation process can be used as part of an NL based response system to be used for conducting a dialog (e.g., including multiple inputs and responses) with a human user. For instance, the NL based response system can be provided as part of an automated assistant, a chat bot, etc. In some cases, the user can provide one or more commands to be fulfilled as part of the dialog (e.g., to control a smart device, to generate code, to generate commands to control a robot, to assist with navigation in a vehicle, etc.). As such, use of the self-evaluation process described herein can also assist the user in performing a technical task by means of a continued and guided human-machine interaction process. Further, and since the responses generated using the LLM can be reliably of a higher quality, the human-machine interaction process can be concluded in a quick and efficient manner.
Furthermore, implementations described herein can allow a user to more easily and intuitively interact and control the NL based response system, which is itself a technical system. For instance, since the NL based response system can be capable of inferring response evaluation criteria from an NL based input, it is not necessary for the user to explicitly provide such information in the NL based input. As discussed herein, determining such information by a human can require trial and error, or can require high levels of skill, training, and/or familiarity with the particular LLM. As such, implementations described herein can mitigate these obstacles.
In other words, implementations described herein can provide a mechanism by which, without any additional interaction from the user, additional information that is in addition to the NL based input (e.g., the set of response evaluation criteria) can effectively be leveraged when processing the NL based input by the LLM to provide a higher quality response, and therefore more efficient access to the information stored in the LLM. This has the effect of augmenting the NL based input to the LLM, and thus improving the information retrieval by the LLM on an objective basis.
As mentioned, an LLM is typically trained with data from, for instance, webpages, electronic books, software code, electronic news articles, and machine translation data, and, when, for instance, generating a response to a particular NL based input, the LLM leverages the underlying data on which it was trained. In this way, an LLM can be considered to be a database structure with information stored in the parameters of the LLM. Since, as described herein, an NL based input to be processed by the LLM can be augmented by using the self-evaluation process described herein, this can be considered to be an improved database query, which can result in more efficient information retrieval.
The above description is provided as an overview of some implementations of the present disclosure. Those implementations, and other implementations, are described in more detail below.
Turning now to
In some implementations, all or some aspects of the NL based response system 120 can be implemented locally at the client device 110. In additional or alternative implementations, all or some aspects of the NL based response system 120 can be implemented remotely from the client device 110 as depicted in
The client device 110 can be, for example, one or more of: a desktop computer, a laptop computer, a tablet, a mobile phone, a computing device of a vehicle (e.g., an in-vehicle communications system, an in-vehicle entertainment system, an in-vehicle navigation system), a standalone interactive speaker (optionally having a display), a smart appliance such as a smart television, and/or a wearable apparatus of the user that includes a computing device (e.g., a watch of the user having a computing device, glasses of the user having a computing device, a virtual or augmented reality computing device). Additional and/or alternative client devices may be provided.
The client device 110 can execute one or more software applications, via application engine 115, through which NL based input can be submitted and/or NL based output and/or other output that is responsive to the NL based input can be rendered (e.g., audibly and/or visually). The application engine 115 can execute one or more software applications that are separate from an operating system of the client device 110 (e.g., one installed “on top” of the operating system), or can alternatively be implemented directly by the operating system of the client device 110. For example, the application engine 115 can execute a web browser or automated assistant installed on top of the operating system of the client device 110. As another example, the application engine 115 can execute a web browser software application or automated assistant software application that is integrated as part of the operating system of the client device 110. The application engine 115 (and the one or more software applications executed by the application engine 115) can interact with the NL based response system 120.
In various implementations, the client device 110 can include a user input engine 111 that is configured to detect user input provided by a user of the client device 110 using one or more user interface input devices. For example, the client device 110 can be equipped with one or more microphones that capture audio data, such as audio data corresponding to spoken utterances of the user or other sounds in an environment of the client device 110. Additionally, or alternatively, the client device 110 can be equipped with one or more vision components that are configured to capture vision data corresponding to images and/or movements (e.g., gestures) detected in a field of view of one or more of the vision components. Additionally, or alternatively, the client device 110 can be equipped with one or more touch sensitive components (e.g., a keyboard and mouse, a stylus, a touch screen, a touch panel, one or more hardware buttons, etc.) that are configured to capture signal(s) corresponding to touch or typed input directed to the client device 110.
Some instances of an NL based input described herein can be a query for an NL response that is formulated based on user input provided by a user of the client device 110 and detected via user input engine 111. For example, the query can be a typed query that is typed via a physical or virtual keyboard, a suggested query that is selected via a touch screen or a mouse of the client device 110, a spoken voice query that is detected via microphone(s) of the client device 110 (and optionally directed to an automated assistant executing at least in part at the client device 110), or an image or video query that is based on vision data captured by vision component(s) of the client device 110 (or based on NL input generated based on processing the image using, for example, object detection model(s), captioning model(s), etc.). Other instances of an NL based input described herein can be a prompt for NL content that is formulated based on user input provided by a user of the client device 110 and detected via the user input engine 111. For example, the prompt can be a typed prompt that is typed via a physical or virtual keyboard, a suggested prompt that is selected via a touch screen or a mouse of the client device 110, a spoken prompt that is detected via microphone(s) of the client device 110, or an image prompt that is based on an image captured by a vision component of the client device 110.
In various implementations, the client device 110 can include a rendering engine 112 that is configured to render content (e.g., NL based response(s)) for audible and/or visual presentation to a user of the client device 110 using one or more user interface output devices. For example, the client device 110 can be equipped with one or more speakers that enable the content to be provided for audible presentation to the user via the client device 110. Additionally, or alternatively, the client device 110 can be equipped with a display or projector that enables the content to be provided for visual presentation to the user via the client device 110.
In various implementations, the client device 110 can include a context engine 113 that is configured to determine a context (e.g., current or recent context) of the client device 110 and/or of a user of the client device 110 (e.g., an active user of the client device 110 when the client device 110 is associated with multiple users). In some of those implementations, the context engine 113 can determine a context based on data stored in response evaluation criteria data database 110A. The data stored in the response evaluation criteria data database 110A can include, for example, user interaction data that characterizes current or recent interaction(s) of the client device 110 and/or a user of the client device 110, location data that characterizes a current or recent location(s) of the client device 110 and/or a user of the client device 110, user attribute data that characterizes one or more attributes of a user of the client device 110, user preference data that characterizes one or more preferences of a user of the client device 110, user profile data that characterizes a profile of a user of the client device 110, third party (3P) data which is indicative of one or more response evaluation criteria defined by a 3P and/or any other data accessible to the context engine 113 via the response evaluation criteria data database 110A or otherwise.
For example, the context engine 113 can determine a current context based on a current state of a dialog session (e.g., considering one or more recent inputs provided by a user during the dialog session), profile data, and/or a current location of the client device 110. For instance, the context engine 113 can determine a current context of “best landmarks to visit in London” based on a recently issued query, profile data, and/or a current or an anticipated future location of the client device 110 (e.g., based on calendar information associated with the user accessible to the context engine 113). As another example, the context engine 113 can determine a current context based on which software application is active in the foreground of the client device 110, a current or recent state of the active software application, and/or content currently or recently rendered by the active software application. A context determined by the context engine 113 can be utilized, for example, in supplementing or rewriting NL based input that is formulated based on user input, in generating an implied NL based input (e.g., an implied query or prompt formulated independent of any explicit NL based input provided by a user of the client device 110), and/or in determining to submit an implied NL based input and/or to render result(s) (e.g., an NL based output) for an implied NL based input.
In various implementations, the client device 110 can include an implied input engine 114 that is configured to: generate an implied NL based input independent of any user explicit NL based input provided by a user of the client device 110; submit an implied NL based input, optionally independent of any user explicit NL based input that requests submission of the implied NL based input; and/or cause rendering of response(s) for the implied NL based input, optionally independent of any explicit NL based input that requests rendering of the response(s). For example, the implied input engine 114 can use one or more past or current contexts, from the context engine 113, in generating an implied NL based input, determining to submit the implied NL based input, and/or in determining to cause rendering of response(s) that is responsive to the implied NL based input. For instance, the implied input engine 114 can automatically generate and automatically submit an implied query or implied prompt based on the one or more past or current contexts. Further, the implied input engine 114 can automatically push the response(s) that is generated responsive to the implied query or implied prompt to cause them to be automatically rendered or can automatically push a notification of the response(s), such as a selectable notification that, when selected, causes rendering of the response(s). Additionally, or alternatively, the implied input engine 114 can submit respective implied NL based input at regular or non-regular intervals, and cause respective response(s) to be automatically provided (or a notification thereof automatically provided). For instance, the implied NL based input can be “automated assistant news” based on the one or more past or current contexts indicating a user's general interest in automated assistants, the implied NL based input or a variation thereof periodically submitted, and the respective response(s) can be automatically provided (or a notification thereof automatically provided). It is noted that the respective response(s) can vary over time in view of, e.g., presence of new/fresh search result document(s) over time.
Further, the client device 110 and/or the NL based response system 120 can include one or more memories for storage of data and/or software applications, one or more processors for accessing data and executing the software applications, and/or other components that facilitate communication over one or more of the networks 199. In some implementations, one or more of the software applications can be installed locally at the client device 110, whereas in other implementations one or more of the software applications can be hosted remotely (e.g., by one or more servers) and can be accessible by the client device 110 over one or more of the networks 199.
Although aspects of
The NL based response system 120 is illustrated in
Further, the NL based response system 120 is illustrated in
As described in more detail herein (e.g., with respect to
The NL based input can be associated with a client device 110 (for instance, provided by a user of the client device 110 via user input engine 111, implied input engine 114, etc.), and the self-evaluation can be used to select a particular response to be rendered (e.g., using rendering engine 112) at the client device (e.g. as described with respect to
In additional or alternative implementations, the NL based input can be obtained from example input data stored in the example input data database 131A. In these implementations, the self-evaluation can be used to generate labelled training instances, including the NL based input and a response selected based on the determination of the evaluation engine 155 as to which of the candidate responses best complies with the set of response evaluation criteria (e.g., as described with respect to
Turning now to
The NL based input 210 can be processed using an NL based response system, such as the NL based response system 120 as described in relation to
In some implementations, a set of response evaluation criteria 232 can also be obtained. In some implementations, at least some of the set of response evaluation criteria can be generated by the NL based response system 120 (e.g., using the response evaluation criteria generation engine 153). For instance, the set of response evaluation criteria 232 can be generated based on processing the NL based input 210 with an LLM (e.g., the same LLM used in generating the candidate responses 230, or a different LLM stored in the LLM(s) database 142A).
In some versions of those implementations, the NL based response system 120 can generate at least some of the set of response evaluation criteria based on processing a request to generate the set of response evaluation criteria 232 along with the NL based input 210. The content of the request can be automatically generated during utilization of the NL based response system 120, or the content of the request can be generated prior to the utilization and retrieved (for instance, from request generation data database 141A) when required. In either case, the request to generate the response evaluation criteria 232 can be generated based on the obtained content. In some implementations, the request to generate the response evaluation criteria 232 can include at least one (e.g., 2, 10, 20, 100, etc.) example of response evaluation criteria for one or more given NL based input(s). For instance, the examples can be predefined by a human, and can guide the LLM as to an appropriate format and/or content of response evaluation criteria for a given NL based input for use in the self-evaluation process described herein.
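For example, the request could be assembled by prepending a handful of predefined examples, as in the following non-limiting sketch; the example pairs and the prompt wording are assumptions made up for illustration.

from typing import List, Tuple

# Hypothetical, human-authored examples of (NL based input, evaluation criteria) pairs.
EXAMPLE_CRITERIA: List[Tuple[str, List[str]]] = [
    (
        "Write a bedtime story for a five year old.",
        ["Uses simple words and short sentences", "Is short and engaging", "Is age appropriate"],
    ),
    (
        "Summarize the causes of the First World War.",
        ["Is factually accurate", "Covers the main causes", "Is concise"],
    ),
]

def build_criteria_request(nl_input: str) -> str:
    """Build a few-shot request asking the LLM to produce response evaluation
    criteria for a new NL based input, guided by the predefined examples."""
    parts = ["For each request, list the criteria a good response should meet.\n"]
    for example_input, example_criteria in EXAMPLE_CRITERIA:
        parts.append(f"Request: {example_input}")
        parts.extend(f"- {criterion}" for criterion in example_criteria)
        parts.append("")  # blank line between examples
    parts.append(f"Request: {nl_input}")
    return "\n".join(parts)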
In some implementations, the set of response evaluation criteria 232 can be filtered based on one or more filtering criteria. The filtering criteria can include, for instance, length of a response evaluation criterion. For instance, if a response evaluation criterion from the set of response evaluation criteria 232 is too long (e.g., if it includes a number of words or characters above a threshold), it can be excluded from further processing to reduce a quantity of tokens to be processed using the LLM. Other filtering criteria are also possible, such as grammatical features, semantic content, etc.
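Such a filtering step could, as a non-limiting example, be implemented as follows; the word and character thresholds shown are arbitrary illustrative values.

from typing import List

def filter_criteria(criteria: List[str], max_words: int = 25, max_chars: int = 200) -> List[str]:
    """Drop response evaluation criteria that are too long, reducing the number
    of tokens that later critique requests must process."""
    kept = []
    for criterion in criteria:
        if len(criterion.split()) > max_words or len(criterion) > max_chars:
            continue  # exclude an overly long criterion from further processing
        kept.append(criterion)
    return kept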
In some implementations, at least some of the response evaluation criteria 232 can be obtained based on response evaluation criteria data stored in response evaluation criteria data database 110A. For instance, user information can be obtained. The user information can be associated with a user profile of the user accessible to the NL based response system 120. The user information can be retrieved from a device associated with the user, a remote server system, a local database, etc. The user information can include, for instance, user interaction history with the LLM and/or various software applications, user-defined preferences, contextual information within an ongoing dialog between the user and the LLM, etc.
In some versions of those implementations, at least some of the set of response evaluation criteria 232 can be generated based on processing the user information using the LLM (e.g., independently or along with the NL based input). Additionally, or alternatively, the user information itself can be indicative of response evaluation criteria. For instance, the user information can include response evaluation criteria (or a token indicative of response evaluation criteria) that has been determined based on information associated with the user (e.g., user preferences, etc.) and/or that is predetermined by the user. In this way, behavior of the NL based response system 120 can be personalized to the user (e.g., during utilization by the user and/or in generating training instances to fine-tune the LLM of the NL based response system 120 such that it is personalized to the user).
In some implementations, at least some of the response evaluation criteria 232 can be obtained based on information associated with a third party (3P). The 3P information can include (or be indicative of) information submitted by the 3P and/or information otherwise known regarding the 3P (e.g., based on publicly available information). In some versions of those implementations, at least some of the set of response evaluation criteria 232 can be generated based on processing the 3P information using the LLM (e.g., independently or along with the NL based input). Additionally, or alternatively, the 3P information itself can be indicative of response evaluation criteria. For instance, the 3P information can include response evaluation criteria (or a token indicative of response evaluation criteria) that has been determined based on 3P information associated with the 3P (e.g., values provided by the 3P, etc.) and/or that is predetermined by the 3P.
As one non-limiting example, the NL based response system 120 can be provided on behalf of a 3P. The 3P can provide an interface by which users can provide input (e.g., via a chatbot available on a webpage associated with the 3P). The 3P interface can then conduct interaction between the NL based response system 120 and the user, for instance, using an application programming interface (API). The 3P can also provide information indicative of response evaluation criteria (e.g., tokens or response evaluation criteria themselves based on 3P defined rules) to be processed by the NL based response system 120. In this way, a mechanism is provided by which a 3P can provide an additional signal to an NL based response system 120 when generating responses to user input in order to bias the responses generated, without requiring any additional resource intensive training. Additionally, or alternatively, the 3P can provide information indicative of response evaluation criteria during training or fine-tuning of the LLM of the NL based response system 120. In this way, a simple and relatively inexpensive (e.g., as compared to the 3P developing their own NL based response system 120 and/or training their own LLM) mechanism by which a NL based response system 120 can be personalized to a 3P is provided.
In various implementations, the NL based response system 120 can generate critique responses 240 based on processing the candidate responses 230 and the set of response evaluation criteria 232. The critique responses 240 can be indicative of an extent to which the candidate responses 230 comply with the set of response evaluation criteria 232.
In some implementations, the generation of the critique responses 240 can be initiated by a request for the LLM to generate the critique responses 240. For instance, the request can be generated and processed, along with the candidate responses 230 and the set of response evaluation criteria 232, using the LLM. The content of the request for the LLM to generate the critique responses 240 can be predefined, or can be generated through utilization of the NL based response system 120. As described herein, there are various mechanisms by which the candidate responses 230 can be evaluated. As will be appreciated, attributes of the request (such as the format and content of the request) can influence the form of the critique responses 240 generated using the LLM. As such, the attributes of the request can be chosen based on the particular implementation of evaluation of the candidate responses 230. In some implementations, the request does not include any examples of critique responses for the LLM. In this way, the critique responses 240 generated using the LLM can be less constrained (e.g., by attributes of any examples provided).
Based on the critique responses 240, one (or more) of the candidate responses 230 can be selected using evaluation engine 155. For instance, the candidate response 230 which is determined to best comply with the set of response evaluation criteria 232 based on the critique response(s) 240 can be selected as the selected response. In some implementations, more than one of the candidate responses 230 can be selected using the evaluation engine 155 (for instance, if multiple candidate responses 230 are determined to equally comply with the set of response evaluation criteria 232, if a second candidate response is above a threshold compliance with the set of response evaluation criteria 232, if the NL based response system 120 is configured to provide multiple responses, etc.). In this case, the order of the selected responses 250, based on how well they comply with the set of response evaluation criteria 232, can be indicated. Additionally, or alternatively, an indication of the extent to which each of the selected responses 250 complies with the set of response evaluation criteria 232 (e.g., a comparison measure, as described herein) can be provided.
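One possible, non-limiting implementation of this selection is sketched below; the (response, comparison measure) pairing and the secondary threshold value are assumptions made for illustration.

from typing import List, Tuple

def select_responses(
    scored_candidates: List[Tuple[str, float]],  # (candidate response, comparison measure in [0, 1])
    max_responses: int = 1,
    secondary_threshold: float = 0.75,
) -> List[Tuple[str, float]]:
    """Return the best-complying candidate first, optionally followed by further
    candidates whose comparison measure exceeds the threshold."""
    ranked = sorted(scored_candidates, key=lambda pair: pair[1], reverse=True)
    selected = ranked[:1]
    for candidate, measure in ranked[1:]:
        if len(selected) >= max_responses:
            break
        if measure >= secondary_threshold:
            selected.append((candidate, measure))
    return selected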
As an example of a particular evaluation mechanism, a critique response 240 can be generated for each of the candidate responses 230. For instance, a critique response 240 can include an indication of an extent to which a corresponding one of the candidate responses 230 complies with the set of response evaluation criteria 232. The critique response 240 can also include an indication of a reasoning for why the corresponding candidate response 230 is determined to comply or not comply with each of the set of response evaluation criteria. In order to initiate the generation of these critique responses, a request for the LLM to indicate the extent to which each of the candidate responses 230 complies with the set of response evaluation criteria 232 can be generated and processed using the LLM.
In particular, a critique response 240 can include a comparison measure determined based on comparing the set of response evaluation criteria 232 to the corresponding candidate response 230, using the LLM. The comparison measure can be indicative of the number of response evaluation criteria 232 the corresponding candidate response 230 is determined to comply with and/or an extent to which the corresponding candidate response 230 complies with a given criterion included in the response evaluation criteria 232. For instance, the comparison measure can be determined by dividing the number of response evaluation criteria the corresponding candidate response 230 is determined to comply with by the total number of response evaluation criteria in the set of response evaluation criteria 232.
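A per-candidate comparison measure of this kind could, as a non-limiting example, be computed as follows; the yes/no critique format and the llm(prompt) callable are illustrative assumptions.

from typing import Callable, List

def comparison_measure(
    llm: Callable[[str], str],
    nl_input: str,
    candidate: str,
    criteria: List[str],
) -> float:
    """Fraction of the response evaluation criteria that the LLM's critique
    judges the candidate response to comply with."""
    satisfied = 0
    for criterion in criteria:
        verdict = llm(
            f"Input: {nl_input}\nCandidate response: {candidate}\nCriterion: {criterion}\n"
            "Does the candidate comply with the criterion? Answer yes or no, then explain briefly."
        )
        if verdict.strip().lower().startswith("yes"):
            satisfied += 1
    return satisfied / len(criteria) if criteria else 0.0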
In some implementations, a plurality of critique responses 240 can be generated for each of the candidate responses 230. Since, as described herein, the LLM can be probabilistic, each of the critique responses 240 for a corresponding candidate response 230 can vary (e.g., in terms of which of the set of response evaluation criteria 232 the corresponding response 230 complies with, or the reasoning provided for why this is the case), even though the same input data (e.g., the NL based input 210, context, and/or other input data described herein) is processed using the LLM. As such, the critique responses 240 (or the comparison measures of the critique responses 240) for a corresponding candidate response 230 can be summarized (e.g., averaged, summed, concatenated, most consistent response determined, etc.).
Alternatively, or additionally, the critique response 240 for a corresponding candidate response 230 can be filtered based on one or more filtering criteria. For example, if the indication of the reasoning of a particular critique response 240 includes more than a threshold number of words or characters, it can be excluded from further processing (e.g., discounted from the summarizing). As another example, if a particular critique response 240 for a corresponding candidate response 230 has a lower than threshold consistency measure with the other critique responses 240 for the corresponding candidate response 230, it can be excluded from further processing. As yet another example, when a plurality of corresponding critique responses 240 are generated for each of the candidate responses 230, a comparison measure for a given candidate response 230 can be determined based on a quantity of the critique responses 240 for the given candidate response 230 that indicate that the given candidate response 230 complies with at least a threshold number of the response evaluation criteria 232. For instance, the comparison measure can be based on a percentage or the total number of critique responses 240 that indicate that the candidate response 230 complies with at least a threshold number of the response evaluation criteria 232.
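The following non-limiting sketch illustrates both operations for multiple critique responses per candidate: filtering out critiques whose reasoning is too long, and computing a comparison measure from the fraction of retained critiques that indicate compliance with at least a threshold number of criteria. The data shape and the threshold values are assumptions made for illustration.

from dataclasses import dataclass
from typing import List

@dataclass
class Critique:
    criteria_satisfied: int   # criteria the critique says the candidate complies with
    reasoning: str            # the critique's stated reasoning

def quantity_based_comparison_measure(
    critiques: List[Critique],
    min_criteria: int = 6,            # threshold number of criteria (illustrative)
    max_reasoning_words: int = 150,   # filtering criterion on reasoning length (illustrative)
) -> float:
    """Fraction of (retained) critique responses indicating that the candidate
    complies with at least `min_criteria` response evaluation criteria."""
    # Filter out critiques whose reasoning is too long before summarizing.
    kept = [c for c in critiques if len(c.reasoning.split()) <= max_reasoning_words]
    if not kept:
        return 0.0
    passing = sum(1 for c in kept if c.criteria_satisfied >= min_criteria)
    return passing / len(kept)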
Based on the critique responses 240, one (or more) of the candidate responses 230 can be selected using evaluation engine 155. For instance, the candidate response 230 which is determined to comply with the most response evaluation criteria of the set of response evaluation criteria 232 can be selected as the selected response 250. This can be determined on the basis of the comparison measure. For instance, the candidate response with the highest corresponding comparison measure (e.g., as indicated by the corresponding critique response 240) can be selected as the selected response 250.
As an example of this particular evaluation mechanism, assume that NL based input X has been submitted to the NL based response system 120, which has responsively provided candidate responses A, B and C. Further assume that the NL based response system 120 has obtained a set of response evaluation criteria Y which includes eight response evaluation criteria, and has generated critique responses A*, B*, and C* based on comparing the candidate responses A, B, and C to the set of response evaluation criteria Y respectively. In this example, critique response A* indicates that candidate response A complies with six out of eight of the set of response evaluation criteria Y (e.g. critique response A* can include a comparison measure of 0.75). Further, critique response B* indicates that candidate response B complies with seven out of eight of the set of response evaluation criteria Y (e.g. critique response B* can include a comparison measure of 0.875). Moreover, critique response C* indicates that candidate response C complies with two out of eight of the set of response evaluation criteria Y (e.g. critique response C* can include a comparison measure of 0.25). Accordingly, in this example, candidate response B can be selected as the selected candidate response using the evaluation engine 155 since it has been determined to comply with the highest number of the set of response evaluation criteria Y (and can thus be considered to be the “highest quality” response).
As another example of a particular evaluation mechanism, one or more critique responses 240 can be generated for all of the candidate responses 230. For instance, the critique response(s) 240 can be generated based on comparing all of the candidate responses 230 with the set of response evaluation criteria 232, using the LLM. The critique response(s) 240 can thus include an indication of a candidate response 230 which is determined to best comply with the set of response evaluation criteria 232. The critique response(s) 240 can also include an indication of the reasoning for why the particular candidate response 230 was determined to best comply with the set of response evaluation criteria 232. Generating the critique response(s) can be initiated by generating and processing a request for the LLM to determine which one of the candidate responses 230 best complies with the set of response evaluation criteria 232. Where a single critique response 240 is generated, the candidate response 230 determined as best complying with the set of response evaluation criteria 232 in the critique response 240 can be selected as the selected response 250.
In some implementations, multiple critique responses 240 can be generated using the LLM, with each of the critique responses 240 including an indication as to which of the candidate responses 230 best complies with the set of response evaluation criteria 232. As described herein, since the LLM can be probabilistic in nature, each of the critique responses 240 may vary (e.g., in terms of the candidate response 230 indicated as best complying with the set of response evaluation criteria 232 and/or in the indication of reasoning provided), even though the critique responses 240 were generated based on the same candidate responses 230 and the same set of response evaluation criteria 232 using the LLM. Additionally, or alternatively, multiple critique responses 240 can be generated based on processing the candidate responses 230 with different subsets of the set of response evaluation criteria 232, to generate more diverse critique responses 240. From the plurality of critique responses 240, it can be determined which of the candidate responses 230 is most consistently (e.g., most often) determined as best complying with the set of response evaluation criteria 232. This candidate response 230 can thus be selected as the selected response 250. In this example, a comparison measure can be determined across the plurality of critique responses 240. The comparison measure can be indicative of a consistency across the plurality of critique responses 240 for the selected response 250 being indicated as the candidate response 230 which best complies with the set of response evaluation criteria 232. For instance, the comparison measure can be based on performing majority voting over the plurality of critique responses 240. In some implementations, the comparison measure can be in a range between 0 and 1.
As described herein, the plurality of critique responses 240 can be filtered based on one or more filtering criteria. Based on the filtering, a subset of the plurality of critique responses 240 can be identified for further processing (e.g., in selecting the selected response 250).
As an example of this particular evaluation mechanism, assume that NL based input X has been submitted to the NL based response system, which has responsively provided candidate responses A, B and C. Further assume that the NL based response system has obtained a set of response evaluation criteria Y, and has generated critique responses D, E, F, G, and H based on determining which of the candidate responses A, B, and C best complies with the set of response evaluation criteria Y over five iterations. Critique responses D, F, G, and H each indicate that candidate response A best complies with the set of response evaluation criteria Y. Critique response E indicates that candidate response C best complies with the set of response evaluation criteria Y. In other words, it can be considered that candidate response A has received four “votes” for best complying with the set of response evaluation criteria Y. It can also be considered that candidate response C has received one “vote” for best complying with the set of response evaluation criteria Y. In this case, candidate response A can be selected as the selected candidate response since it has been determined most consistently that candidate response A best complies with the set of response evaluation criteria Y (and can thus be considered to be the “highest quality” response). Put another way, candidate response A has received the most “votes”, and will be selected due to majority voting. Since candidate response A received four out of five “votes”, a corresponding comparison measure can be determined as 0.8.
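Reproduced in code, the majority voting of this example could, as a non-limiting sketch, look as follows; the use of collections.Counter is merely one convenient implementation of the vote counting.

from collections import Counter
from typing import List, Tuple

def majority_vote(best_candidate_votes: List[str]) -> Tuple[str, float]:
    """Return the candidate most often voted "best complying" across critique
    responses, along with a comparison measure equal to its share of the votes."""
    counts = Counter(best_candidate_votes)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(best_candidate_votes)

# Votes from critique responses D, E, F, G, and H in the example above:
selected, measure = majority_vote(["A", "C", "A", "A", "A"])
print(selected, measure)  # -> A 0.8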
Turning now to
For instance, the NL based input 210 can be provided based on user input by a user of the client device. The user can provide the user input, for instance, by typing on a virtual or physical keyboard of the client device, providing speech which is captured by one or more microphones of the client device 110, selecting (e.g., via tapping on a touch screen display, voice command, using a pointing device, etc.) a suggested input, providing gestures captured by one or more sensors of the client device 110, etc. Information indicative of the user input can be used to determine the NL based input 210. For instance, the information can include text entered, selected, or determined based on processing the user's speech using speech recognition. This text can then be provided as the NL based input. As another example, the information can include one or more token(s) which can be used to determine the NL based input 210 (e.g., by the client device or the NL based response system 120). The information can be provided to the NL based response system by the client device 110, for instance via a wireless network (such as network 199).
Similarly, the selected response 250 (or information indicative of the selected response 250) can be provided to the client device 110 by the NL based response system 120 (e.g., via network 199). A command can also be sent to the client device 110 to cause the client device 110 to render the selected response (e.g., via a display of the client device 110, via a speaker of the client device 110, etc.). However, in some implementations, it can be assumed that the client device 110, upon receiving the selected response 250, will render the selected response 250, without any explicit command to do so being received. In some implementations, more than one of the candidate responses 230 can be selected to be sent to the client device 110. In this case, the client device 110 can render one or more of the received selected responses 250. For instance, the client device 110 can determine which of the selected responses 250 to render. This determination can be based on additional information received from the NL based response system 120 (e.g., comparison measures associated with the selected responses 250).
Although it has generally been described that the client device 110 to which the NL based input 210 is associated and the client device 110 which renders the selected response 250 are the same client device 110, in some implementations this may not be the case. In other words, the client device 110 which renders the selected response 250 can be a different client device 110 than the client device which provided the NL based input 210. For instance, the selected response 250 can be rendered on a display separate from (but possibly associated with, for instance, by virtue of a user account being signed in on both devices) a smart speaker which received the NL based input 210.
In this way, the NL based response system 120, and particularly the self-evaluation mechanism described herein, can be utilized to generate responses to NL based inputs 210 associated with a client device 110 and cause rendering of selected responses 250 by the client device 110. In other words, the NL based response system 120, and particularly the self-evaluation mechanism described herein, can be utilized to provide responses for a user. Put another way, the NL based response system 120, and particularly the self-evaluation mechanism described herein, can be used during inference using the LLM. As such, resources required to process repeated interactions (e.g., follow-up NL based input(s)) with the LLM (e.g., of the NL based response system 120), which could otherwise occur in order to refine an initial response, can be conserved. In addition, expert knowledge and experience required to formulate an NL based input in order to retrieve a particular response can be reduced and/or eliminated altogether. An example of the self-evaluation process during inference using an LLM (e.g., of the NL based response system 120) is described herein in relation to
For example, and referring briefly to
The graphical interface also includes a plurality of candidate responses 356, 358, 360 generated using an NL based response system (e.g., NL based response system 120) based on processing the NL based input 352. As depicted in
Referring briefly to
Referring briefly to
Referring briefly to
As noted above, although intermediate stages of the self-evaluation process are shown as being rendered by client device 310 in
Turning now to
The example input data database 131A can include a number of example NL based inputs. The example NL based inputs can be obtained from previous usage of the NL based response system 120 (or another NL based response system). For instance, during utilization of the NL based response system 120 as described in relation to
As described in relation to
The training instance(s) database 132A can include training instances generated as described in relation to
Once the training instance(s) have been generated in this manner, the NL based response system 120 (or an LLM thereof), can be fine-tuned (or otherwise termed, trained) using the training instances stored in the training instance(s) database 132A (e.g., using training engine 132). This can be performed in any suitable way (e.g., supervised learning, reinforcement learning, etc.).
For instance, and referring briefly to
In some implementations, additional data, such as comparison data 416 can be obtained from the training instance 410 as well. The comparison data 416 can be included in the comparison between the selected response 420 and the training instance response 414. The comparison data 416 can be used to provide a weighting to the comparison when generating the training loss 430. For instance, if the comparison data 416 indicates that the training instance response 414 is of a very high quality (e.g. if it is determined to comply with all of the response evaluation criteria), a difference between the selected response 420 and the training instance response 414 can be propagated to a greater extent (e.g. by determining a larger training loss 430, by giving the training loss 430 a greater significance during training, etc.), and vice versa. As another example, the comparison data 416 can be used to train a separate reward model for use in fine-tuning the LLM (e.g., of the NL based response system 120) using reinforcement learning.
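As a non-limiting illustration of such weighting, a per-instance loss could simply be scaled by the comparison measure, as sketched below; the linear scaling and the example loss value are assumptions made for illustration rather than a prescribed training recipe.

def weighted_training_loss(base_loss: float, comparison_measure: float) -> float:
    """Scale the loss for one training instance by how strongly its selected
    response complied with the response evaluation criteria.

    A comparison measure near 1.0 (complies with all criteria) propagates the
    example's error more strongly; a low measure down-weights it.
    """
    return base_loss * comparison_measure

# Illustrative values: the same base loss contributes differently depending on
# how "high quality" the stored training instance response was judged to be.
print(weighted_training_loss(base_loss=2.3, comparison_measure=0.875))  # ~2.01
print(weighted_training_loss(base_loss=2.3, comparison_measure=0.25))   # ~0.575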
Once the LLM (e.g., of the NL response system 120) has been fine-tuned, the fine-tuned LLM can be deployed for use in generating responses to NL based input. In some cases, the NL based response system 120 can be updated with the fine-tuned LLM for use in inference (e.g., in the manner described in relation to
Turning now to
At block 510, the system receives a natural language (NL) based input associated with a client device (e.g., in the same or similar manner described with respect to
At block 520, the system generates a large language model (LLM) response based on processing the NL based input using an LLM. In various implementations, block 520 can include block 522, block 524, block 526, and/or block 528 for utilization in generating the LLM response.
At block 522, the system obtains a set of response evaluation criteria (e.g., in the same or similar manner described with respect to
In some implementations, obtaining the set of response evaluation criteria can additionally or alternatively include obtaining user information associated with the user of the client device. The set of response evaluation criteria can then be determined based on the user information. The user information can include, for instance, one or more of: user interaction history with the LLM and/or another application, user-defined preferences, and contextual information within an ongoing dialog between the user and the LLM, wherein the NL based input is associated with the ongoing dialog.
In some implementations, obtaining the set of response evaluation criteria can additionally or alternatively include obtaining information indicative of a set of response evaluation criteria associated with a third party (3P). The set of response evaluation criteria can then be determined based on the obtained information.
At block 524, the system generates a plurality of candidate LLM responses based on processing the NL based input using the LLM (e.g., in the same or similar manner described with respect to
At block 526, the system generates, for each of the plurality of candidate LLM responses, a corresponding critique response based on comparing each of the plurality of candidate LLM responses to the set of response evaluation criteria using the LLM (e.g., in the same or similar manner described with respect to
Generating a corresponding critique response for a given candidate LLM response can include, for example, generating a request for the LLM to determine which of the set of response evaluation criteria the given candidate LLM response complies with. The request can then be processed using the LLM to generate the corresponding critique response. In some implementations, the request does not include any examples of a critique response.
Each of the corresponding critique responses can include an indication of an extent to which a corresponding one of the plurality of candidate LLM responses complies with the set of response evaluation criteria. For instance, a comparison measure can be determined for each of the plurality of candidate LLM responses, and included in the corresponding critique responses. The comparison measure can be based on comparing each of the plurality of candidate LLM responses to the set of response evaluation criteria using the LLM.
For instance, the comparison measure for a given candidate LLM response can be indicative of the number of response evaluation criteria from among the set of response evaluation criteria the given candidate LLM response is determined to comply with (e.g., using the LLM). As a specific example, the comparison measure for a given candidate LLM response can be determined by dividing the number of response evaluation criteria of the set of response evaluation criteria the given candidate LLM response is determined to comply with by the total number of response evaluation criteria of the set of response evaluation criteria.
In addition, each of the corresponding critique responses can include an indication of a reasoning for why a corresponding one of the plurality of candidate LLM responses complies, or does not comply, with the set of response evaluation criteria.
In some implementations, a plurality of corresponding critique responses are generated for each of the plurality of candidate LLM responses. In this case, the corresponding critique responses can be filtered based on the reasoning included in each of them. Additionally or alternatively, in this case, the comparison measure for a given candidate LLM response can be determined based on a quantity of the plurality of critique responses, for the given candidate LLM response, that indicate that the given candidate LLM response complies with at least a threshold number of response evaluation criteria from the set of response evaluation criteria.
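One possible, purely illustrative way to aggregate a plurality of critique responses per candidate and then select the candidate with the highest resulting measure is sketched below; it assumes each critique response has already been reduced to a count of complied-with criteria, and the `aggregate_measure` and `select_best` helpers are hypothetical.

```python
from typing import Dict, List

def aggregate_measure(complied_counts: List[int], threshold: int) -> float:
    """Comparison measure: fraction of critiques in which the candidate was
    determined to comply with at least `threshold` criteria."""
    if not complied_counts:
        return 0.0
    return sum(count >= threshold for count in complied_counts) / len(complied_counts)

def select_best(candidates: Dict[str, List[int]], threshold: int) -> str:
    """Select the candidate LLM response with the highest aggregated measure."""
    return max(candidates, key=lambda cand: aggregate_measure(candidates[cand], threshold))

# Example: three critiques per candidate, each reduced to a count of complied criteria.
candidates = {
    "response A": [4, 3, 4],  # complies with >= 3 criteria in every critique
    "response B": [2, 4, 1],  # complies with >= 3 criteria in only one critique
}
print(select_best(candidates, threshold=3))  # "response A"
```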
At block 528, the system selects, based on the corresponding critique responses, one of the plurality of candidate LLM responses as the LLM response (e.g., in the same or similar manner described with respect to
At block 530, the system causes the LLM response to be rendered at the client device.
In some implementations, the NL based input and the selected LLM response can be stored as training data for use in subsequent fine-tuning of the LLM. The corresponding critique response (e.g., including the corresponding comparison measure) can also be stored as training data. At a subsequent time, the LLM can be fine-tuned based on the training data.
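A minimal sketch of persisting such a training instance is shown below; the JSON Lines format and the field names are assumptions for illustration, not requirements of the implementations described herein.

```python
import json
from pathlib import Path
from typing import Optional

def store_training_instance(
    path: Path,
    nl_input: str,
    selected_response: str,
    critique: Optional[str] = None,
    comparison_measure: Optional[float] = None,
) -> None:
    """Append one fine-tuning instance to a JSON Lines file."""
    record = {
        "nl_input": nl_input,
        "llm_response": selected_response,
        "critique": critique,
        "comparison_measure": comparison_measure,
    }
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

# Example usage with made-up values.
store_training_instance(
    Path("finetune_data.jsonl"),
    nl_input="Tell a short bedtime story about a brave snail.",
    selected_response="Sam the snail was small, but very brave...",
    critique="Complies with all four criteria: simple words, short, engaging, has an ending.",
    comparison_measure=1.0,
)
```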
Turning now to
At block 610, the system generates training data for fine-tuning a large language model (LLM) (e.g., in the same or similar manner described with respect to
At block 612, the system obtains a natural language (NL) based input for the LLM (e.g., in the same or similar manner described with respect to
At block 614, the system obtains a set of response evaluation criteria (e.g., in the same or similar manner described with respect to
At block 616, the system generates a plurality of candidate LLM responses based on processing the NL based input using the LLM (e.g., in the same or similar manner described with respect to
At block 618, the system generates, for each of the plurality of candidate LLM responses, a corresponding critique response based on comparing each of the plurality of candidate LLM responses to the set of response evaluation criteria using the LLM (e.g., in the same or similar manner described with respect to
At block 620, the system selects, based on the corresponding critique responses, one of the plurality of candidate LLM responses as an LLM response that is responsive to the NL based input (e.g., in the same or similar manner described with respect to
At block 622, the system stores, as an instance of the training data, the NL based input along with the LLM response that is selected from among the plurality of candidate LLM responses.
At optional block 630, the system fine-tunes the LLM based on the training data (e.g., in the same or similar manner described with respect to
In some implementations, the LLM can be fine-tuned using reinforcement learning (RL). For instance, a reward model can be generated based on a selected LLM response to an NL based input along with its corresponding comparison measure. Fine-tuning the LLM can then include fine-tuning the LLM with RL using the reward model.
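The following is a deliberately simplified, illustrative sketch of that idea: the selected responses and their comparison measures form (response, reward) pairs, and a toy token-overlap scorer stands in for what would, in practice, be a learned reward model; the RL fine-tuning step itself is only indicated by a comment.

```python
from typing import Callable, List, Tuple

# Training pairs: (selected LLM response, comparison measure used as the reward signal).
# These values are illustrative only.
reward_pairs: List[Tuple[str, float]] = [
    ("Sam the snail was small, but very brave...", 1.0),
    ("The snail is a gastropod of the class ...", 0.5),
]

def train_reward_model(pairs: List[Tuple[str, float]]) -> Callable[[str], float]:
    """Toy stand-in for training a reward model: a nearest-neighbour lookup based
    on token overlap. A real reward model would be a learned scoring network."""
    def reward(response: str) -> float:
        def overlap(a: str, b: str) -> int:
            return len(set(a.lower().split()) & set(b.lower().split()))
        best = max(pairs, key=lambda p: overlap(p[0], response))
        return best[1]
    return reward

reward_model = train_reward_model(reward_pairs)
print(reward_model("Sam the brave snail..."))  # 1.0: resembles the high-measure example

# The reward model would then supply the scalar reward in an RL fine-tuning loop
# (e.g., a policy-gradient method) that updates the LLM's parameters.
```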
In some implementations, subsequent to fine-tuning the LLM, the LLM can be deployed for use in generating responses to NL based inputs. For instance, an NL based input associated with a client device can be received. An LLM response can be generated based on processing the NL based input associated with the client device using the fine-tuned LLM. The client device can then be caused to render the LLM response.
Turning now to
Computing device 710 typically includes at least one processor 714 which communicates with a number of peripheral devices via bus subsystem 712. These peripheral devices can include a storage subsystem 724, including, for example, a memory subsystem 725 and a file storage subsystem 726, user interface output devices 720, user interface input devices 722, and a network interface subsystem 716. The input and output devices allow user interaction with computing device 710. Network interface subsystem 716 provides an interface to outside networks and is coupled to corresponding interface devices in other computing devices.
User interface input devices 722 can include a keyboard, pointing devices such as a mouse, trackball, touchpad, or graphics tablet, a scanner, a touch screen incorporated into the display, audio input devices such as voice recognition systems, microphones, and/or other types of input devices. In general, use of the term “input device” is intended to include all possible types of devices and ways to input information into computing device 710 or onto a communication network.
User interface output devices 720 can include a display subsystem, a printer, a fax machine, or non-visual displays such as audio output devices. The display subsystem can include a cathode ray tube (CRT), a flat-panel device such as a liquid crystal display (LCD), a projection device, or some other mechanism for creating a visible image. The display subsystem can also provide non-visual display such as via audio output devices. In general, use of the term “output device” is intended to include all possible types of devices and ways to output information from computing device 710 to the user or to another machine or computing device.
Storage subsystem 724 stores programming and data constructs that provide the functionality of some or all of the modules described herein. For example, the storage subsystem 724 can include the logic to perform selected aspects of the methods disclosed herein, as well as to implement various components depicted in
These software modules are generally executed by processor 714 alone or in combination with other processors. Memory 725 used in the storage subsystem 724 can include a number of memories including a main random access memory (RAM) 730 for storage of instructions and data during program execution and a read only memory (ROM) 732 in which fixed instructions are stored. A file storage subsystem 726 can provide persistent storage for program and data files, and can include a hard disk drive, a floppy disk drive along with associated removable media, a CD-ROM drive, an optical drive, or removable media cartridges. The modules implementing the functionality of certain implementations can be stored by file storage subsystem 726 in the storage subsystem 724, or in other machines accessible by the processor(s) 714.
Bus subsystem 712 provides a mechanism for letting the various components and subsystems of computing device 710 communicate with each other as intended. Although bus subsystem 712 is shown schematically as a single bus, alternative implementations of the bus subsystem 712 can use multiple busses.
Computing device 710 can be of varying types including a workstation, server, computing cluster, blade server, server farm, or any other data processing system or computing device. Due to the ever-changing nature of computers and networks, the description of computing device 710 depicted in
In situations in which the systems described herein collect or otherwise monitor personal information about users, or can make use of personal and/or monitored information, the users can be provided with an opportunity to control whether programs or features collect user information (e.g., information about a user's social network, social actions or activities, profession, a user's preferences, or a user's current geographic location), or to control whether and/or how to receive content from the content server that can be more relevant to the user. Also, certain data can be treated in one or more ways before it is stored or used, so that personally identifiable information is removed. For example, a user's identity can be treated so that no personally identifiable information can be determined for the user, or a user's geographic location can be generalized where geographic location information is obtained (such as to a city, ZIP code, or state level), so that a particular geographic location of a user cannot be determined. Thus, the user can have control over how information is collected about the user and/or used.
In some implementations, a method implemented by one or more processors is provided, and includes receiving natural language (NL) based input associated with a client device; and generating a large language model (LLM) response based on processing the NL based input using an LLM. Generating the LLM response includes obtaining a set of response evaluation criteria; generating a plurality of candidate LLM responses based on processing the NL based input using the LLM; generating, for each of the plurality of candidate LLM responses, a corresponding critique response based on comparing each of the plurality of candidate LLM responses to the set of response evaluation criteria using the LLM; and selecting, based on the corresponding critique responses, one of the plurality of candidate LLM responses as the LLM response. The method further includes causing the LLM response to be rendered at the client device.
These and other implementations can optionally include one or more of the following features.
In some implementations, each of the corresponding critique responses can include an indication of an extent to which a corresponding one of the plurality of candidate LLM responses complies with the set of response evaluation criteria.
In some versions of those implementations, the indication of the extent to which a corresponding one of the plurality of candidate LLM responses complies with the set of response evaluation criteria can include a comparison measure, and the comparison measure can be generated, for each of the plurality of candidate LLM responses, based on comparing each of the plurality of candidate LLM responses to the set of response evaluation criteria using the LLM.
In some implementations, obtaining the set of response evaluation criteria can include generating the set of response evaluation criteria based on processing the NL based input using the LLM. In some versions of those implementations, generating the set of response evaluation criteria can include: generating a request for the LLM to generate a set of response evaluation criteria based on the NL based input; and processing the request using the LLM to generate the set of response evaluation criteria.
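A minimal sketch of such a request is shown below; the `llm` callable, the prompt wording, and the `fake_llm` placeholder are illustrative assumptions rather than part of the implementations described above.

```python
from typing import Callable, List

def generate_criteria(llm: Callable[[str], str], nl_input: str) -> List[str]:
    """Ask the LLM itself to propose response evaluation criteria for this input."""
    request = (
        "List the criteria a high quality response to the following request "
        "should satisfy, one criterion per line.\n\nRequest:\n" + nl_input
    )
    return [line.strip("- ").strip() for line in llm(request).splitlines() if line.strip()]

# Example usage with a placeholder LLM callable.
fake_llm = lambda prompt: "- Uses simple words\n- Is short and engaging\n- Has a clear ending"
print(generate_criteria(fake_llm, "Tell my daughter a bedtime story about a dragon."))
```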
In some implementations, obtaining the set of response evaluation criteria can include obtaining user information associated with a user of the client device; and determining the set of response evaluation criteria based on the user information.
In some implementations, obtaining the set of response evaluation criteria can include obtaining information indicative of a set of response evaluation criteria associated with a third party (3P); and determining the set of response evaluation criteria based on the obtained information.
In some implementations, generating a corresponding critique response for a given candidate LLM response can include: generating a request for the LLM to determine which of the set of response evaluation criteria the given candidate LLM response complies with; and processing the request using the LLM to generate the corresponding critique response.
In some implementations, a method implemented by one or more processors is provided, and includes generating training data for fine-tuning a large language model (LLM). Generating the training data includes obtaining a natural language (NL) based input for the LLM; obtaining a set of response evaluation criteria; generating a plurality of candidate LLM responses based on processing the NL based input using the LLM; generating, for each of the plurality of candidate LLM responses, a corresponding critique response based on comparing each of the plurality of candidate LLM responses to the set of response evaluation criteria using the LLM; selecting, based on the corresponding critique responses, one of the plurality of candidate LLM responses as an LLM response that is responsive to the NL based input; and storing, as an instance of the training data, the NL based input along with the LLM response that is selected from among the plurality of candidate LLM responses.
These and other implementations can optionally include one or more of the following features.
In some implementations, the method can further include fine-tuning the LLM based on the training data.
In some versions of those implementations, the method can further include generating, for each of the plurality of candidate LLM responses, and based on comparing each of the plurality of candidate LLM responses to the set of response evaluation criteria using the LLM, a corresponding comparison measure, and training a reward model based on the selected one of the plurality of candidate LLM responses and the corresponding comparison measure. Fine-tuning the LLM can include fine-tuning the LLM with reinforcement learning (RL) using the reward model.
In additional or alternative versions of those implementations, the method can further include, subsequent to fine-tuning the LLM: receiving an NL based input associated with a client device; generating an LLM response based on processing the NL based input associated with the client device using the LLM; and causing the LLM response to be rendered at the client device.
In some implementations, each of the corresponding critique responses can include an indication of an extent to which a corresponding one of the plurality of candidate LLM responses complies with the set of response evaluation criteria.
In some versions of those implementations, the indication of the extent to which a corresponding one of the plurality of candidate LLM responses complies with the set of response evaluation criteria can include a comparison measure, and the comparison measure can be generated, for each of the plurality of candidate LLM responses, based on comparing each of the plurality of candidate LLM responses to the set of response evaluation criteria using the LLM.
In some implementations, obtaining the set of response evaluation criteria can include generating the set of response evaluation criteria based on processing the NL based input using the LLM.
In some versions of those implementations, generating the set of response evaluation criteria can include generating a request for the LLM to generate a set of response evaluation criteria based on the NL based input; and processing the request using the LLM to generate the set of response evaluation criteria.
In some implementations, obtaining the set of response evaluation criteria can include obtaining user information associated with a user of the client device; and determining the set of response evaluation criteria based on the user information.
In some implementations, obtaining the set of response evaluation criteria can include obtaining information indicative of a set of response evaluation criteria associated with a third party (3P); and determining the set of response evaluation criteria based on the obtained information.
In some implementations, generating a corresponding critique response for a given candidate LLM response can include generating a request for the LLM to determine which of the set of response evaluation criteria the given candidate LLM response complies with; and processing the request using the LLM to generate the corresponding critique response.
In some implementations, a method implemented by one or more processors is provided, and includes receiving natural language (NL) based input; and generating a large language model (LLM) response based on processing the NL based input using an LLM. Generating the LLM response can include obtaining a set of response evaluation criteria; generating a plurality of candidate LLM responses based on processing the NL based input using the LLM; generating at least one critique response based on comparing each of the plurality of candidate LLM responses to the set of response evaluation criteria using the LLM; and selecting, based on the at least one critique response, one of the plurality of candidate LLM responses as the LLM response. The at least one critique response is indicative of a candidate LLM response, from among the plurality of candidate LLM responses, that is determined to best comply with the set of response evaluation criteria. The method further includes causing the LLM response to be rendered at a client device and/or storing, as an instance of training data, the NL based input along with the LLM response that is selected from among the plurality of candidate LLM responses.
These and other implementations can optionally include one or more of the following features.
In some implementations, the method can further include: generating a plurality of critique responses based on comparing each of the plurality of candidate LLM responses to the set of response evaluation criteria using the LLM; and determining which of the candidate LLM responses is most often determined, across the plurality of critique responses, to be the candidate LLM response that best complies with the set of response evaluation criteria. Selecting one of the plurality of candidate LLM responses as the LLM response can include selecting the candidate LLM response that is most often determined, across the plurality of critique responses, to best comply with the set of response evaluation criteria.
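One purely illustrative way to perform that majority-based selection, assuming each critique response has already been reduced to the identifier of the candidate it judged best, is:

```python
from collections import Counter
from typing import List

def select_by_majority(best_per_critique: List[str]) -> str:
    """Select the candidate most often judged best across the plurality of critiques."""
    return Counter(best_per_critique).most_common(1)[0][0]

# Example: five critique responses, each naming the candidate it found best.
print(select_by_majority(["A", "B", "A", "A", "C"]))  # "A"
```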
In addition, some implementations include one or more processors (e.g., central processing unit(s) (CPU(s)), graphics processing unit(s) (GPU(s)), and/or tensor processing unit(s) (TPU(s))) of one or more computing devices, where the one or more processors are operable to execute instructions stored in associated memory, and where the instructions are configured to cause performance of any of the aforementioned methods. Some implementations also include one or more computer readable storage media (e.g., transitory and/or non-transitory) storing computer instructions executable by one or more processors to perform any of the aforementioned methods. Some implementations also include a computer program product including instructions executable by one or more processors to perform any of the aforementioned methods.
Number | Date | Country
---|---|---
63466132 | May 2023 | US