The present disclosure generally relates to human-in-the-loop cyber-physical systems, and in particular, to a system and associated methods for applying large language models (LLM) to human-in-the-loop human-in-the-plant cyber physical systems (CPS) to translate high level prompts into action plans that align with the physical system dynamics of the CPS and are also safe for the human users.
Safety criticality implies that operation of an autonomous cyber physical system (CPS) has the potential to harm human participants who are affected by the CPS goal.
Given the impending risks to the human user, safety critical applications often operate with a human in the loop (HIL) system. In such systems, the human is in charge of starting and stopping automation and can provide manual inputs when safety concerns or operational inefficiencies are perceived. In medical applications such as automated insulin delivery, this system results in a human in the loop-human in the plant (HIL-HIP) system model.
The HIP component results in complex dynamical systems such as biological or biochemical processes, with hard requirements on the safety criteria that must be satisfied under all circumstances. Moreover, the HIP components contribute to increased variability and uncertainty in the plant dynamics compared to CPS without HIP.
It is with these observations in mind, among others, that various aspects of the present disclosure were conceived and developed.
Corresponding reference characters indicate corresponding elements among the views of the drawings. The headings used in the figures do not limit the scope of the claims.
Aspects of the present disclosure relate to inventive concepts for implementing large language models (LLMs) in human-in-the-loop human-in-the-plant cyber physical systems (CPS) to translate a high-level prompt into a personalized plan of actions, and subsequently convert that plan into a grounded inference of sequential decision making automated by a real-world CPS controller to achieve a control goal. The present disclosure shows that it is relatively straightforward to contextualize LLMs so that they can generate domain-specific plans. However, these plans may be infeasible for the physical system to execute, or may be unsafe for human users. To address this, the present disclosure outlines CPS-LLM, a computer-implemented system implementing an LLM that is retrained using an instruction-tuning framework, which ensures that generated plans not only align with the physical system dynamics of the CPS but are also safe for the human users. CPS-LLM includes two innovative components: a) a physical dynamics coefficient estimator based on a liquid time constant neural network that can derive coefficients of dynamical models with some unmeasured state variables; and b) an LLM trained with prompts embodied with traces from the dynamical system and the corresponding model coefficients. Results show that when CPS-LLM is integrated with a contextualized chatbot such as BARD, it can generate feasible and safe plans to manage external events such as meals for automated insulin delivery systems used by Type 1 Diabetes subjects.
Referring to
Deployment: In this stage, the user provides two inputs:
With these inputs, the plan generation mechanism operates with the following steps:
To summarize, the present disclosure outlines the following contributions: a) in the planning domain, the feasibility of using LLMs is evaluated for safe and effective generation of usage plan for CPS; and b) in the machine learning (ML) domain, a liquid time constant neural network-based model parameter estimation is demonstrated for CPS when some of the state variables of the physical dynamics are not measured.
Safety criticality implies that the operation of the autonomous cyber physical system (CPS) has the potential to harm human participants who are affected by the CPS goal. Given the impending risks to the human user, safety critical applications often operate with a human in the loop (HIL) system (Li et al. 2014). In such systems, the human is in charge of starting and stopping automation, and can provide manual inputs when safety concerns or operational inefficiencies are perceived. In medical applications such as automated insulin delivery, this system results in a human in the loop-human in the plant (HIL-HIP) (Maity, Banerjee, and Gupta 2023) system model (
Existing safety certification processes generally assume a control affine system model, where the plant state X is assumed to follow the dynamics in Eqn 1 below:
Large scale deployment and day-to-day usage imply that a significant number of users will not conform to the “average user” settings, resulting in novel and unforeseen usage scenarios. To achieve a level of performance similar to that obtained in the safety certification process, a real user may undertake personalization usage plans. These plans consist of a temporal sequence of b external inputs uex(qi) at times qi and/or a system configuration changes s(pi) at times pi, applied with or without consultations from expert advisory agents (such as clinicians): {s(p1) . . . s(pa)}∪{uex (q1) . . . uex (qb)}. Such inputs may have a causal relation with the HIP state X, are out of distribution, and may violate safety criteria. Such unverified personalization usage plans carry the risk of compromising operational safety (Banerjee et al. 2023; Maity et al. 2022).
Inventive Solutions: In the present disclosure, it is assumed that the autonomous system (
Here, uex∈Uex is a set of personalized inputs, and s∈S is a set of controller configuration changes specific to a real-life user. The presented technique is validated by generating safe usage plans for automated insulin delivery (AID) systems aimed at controlling glucose levels in individuals with Type 1 Diabetes (T1D).
Formally the problem can be defined as follows (
Three broad classes of safe CPS control synthesis exist:
Reinforcement learning approach: Safe RL (Garcia and Fernandez 2015) is an emerging approach that models agents with a value function that has the control objective as the reward and safety violation as the penalty. The safe RL technique starts with an initial safe model predictive control (MPC) design that may not be effective, and evaluates the value function at each control step. If the value function is less than a threshold, indicating a heavy penalty, safe RL defaults to the MPC strategy; otherwise, it uses the strategy obtained by maximizing the value function. This approach has been frequently used in robotics; however, the value function evaluation strategy does not involve human inputs.
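By way of non-limiting illustration, the fallback rule described above can be sketched as follows. The function name `select_action`, the finite list of candidate actions, and the callables `value_fn` and `mpc_policy` are hypothetical and are introduced only for this sketch; an actual safe RL implementation may maximize the value function over a continuous action space.

```python
from typing import Callable, Sequence, TypeVar

State = TypeVar("State")
Action = TypeVar("Action")


def select_action(
    state: State,
    candidate_actions: Sequence[Action],
    value_fn: Callable[[State, Action], float],
    mpc_policy: Callable[[State], Action],
    threshold: float,
) -> Action:
    """Pick the value-maximizing action, or defer to the safe MPC design.

    Assumption: the action space is a finite list of candidates, so the
    maximization is a simple search.
    """
    if not candidate_actions:
        return mpc_policy(state)
    # Strategy obtained by maximizing the value function.
    best_action = max(candidate_actions, key=lambda a: value_fn(state, a))
    # A value below the threshold indicates a heavy safety penalty,
    # so the controller defaults to the MPC strategy.
    if value_fn(state, best_action) < threshold:
        return mpc_policy(state)
    return best_action
```

In this sketch the human user does not appear anywhere in the decision rule, which reflects the limitation noted above.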
The key advantages that LLMs like GPT3 (Floridi and Chiriatti 2020), BARD (AI 2023), LLAMA2 (Touvron et al. 2023) offer over the above-mentioned traditional techniques are:
The LLM's responses are generated based on the patterns learned from diverse data sources. It can generate creative and imaginative responses, which might or might not align with factual or realistic plans. The efficacy of LLMs in generating accurate plans or delivering meaningful responses without hallucination depends on the quality of prompts provided, and the inherent capabilities of the model. In relation to LLMs it is helpful to clarify the assumed meanings of the following terms.
In the context of the present disclosure including associated study, the capabilities of LLMs can be categorized into the following key areas:
LLMs demonstrate exceptional performance in natural language processing tasks; however, using them to generate a sequence of external inputs and controller set points in the continuous time real number domain is still an unexplored area. In this domain, LLMs are yet to be tested extensively.
Referring again to
Deployment: In this stage, the user provides two inputs: a) a natural language prompt that describes a CPS usage plan discovery task through a chat RL interface, BARD in this case (AI 2023), and b) a trace {X(t) ∀t ∈[t0−th, t0]} of the physical dynamics of the CPS, where t0 is the current time and th is the past horizon. With these inputs, the plan generation mechanism can operate with the following steps:
At least two innovations are presented: a) in the planning domain, the feasibility of using LLMs for safe and effective generation of usage plan for CPS is evaluated, and b) in the machine learning (ML) domain, a liquid time constant neural network-based model parameter estimation for CPS is demonstrated when some of the state variables of the physical dynamics are not measured.
The usage of CPS-LLM is illustrated using the example of the Artificial Pancreas (AP). The AP uses the HIL-HIP architecture and is a safety-critical medical device. The LLM based planning architecture is used to protect the system from critical errors as well as to personalize the system based on the dynamically changing user context. AID systems are exemplary CPS used by T1D subjects to automate insulin delivery with the aim of controlling the blood glucose level within a tight range of 70 mg/dl to 180 mg/dl, while preventing hypoglycemia, which occurs when the blood glucose level measured by the Continuous Glucose Monitor (CGM) falls below 70 mg/dl. However, AID systems may not effectively handle glucose fluctuations induced by factors like meals, exercise, or medication intake such as hydrocortisone. In order to maintain safe and efficient operation, the user has to undertake a CPS usage plan by either providing external bolus insulin uex or by changing the set point configuration of the AID controller s. For example, the Loop AID system (Jeyaventhan et al. 2021) has a set point of 90 mg/dl throughout the day, except for mealtime, when the set point is increased to 110 mg/dl and an external insulin bolus is injected. The set point reverts to 90 mg/dl 2 hrs after meal intake. The bolus computation follows the standard clinical process, where the user sets a carb ratio (CR), which is the grams of carbohydrate covered by one unit of insulin. Before a meal intake, the user makes an informed estimate of the grams of carbohydrate. The insulin dosage is then computed as the ratio of the grams of carbohydrate to the CR, minus any residual insulin still in the body, also known as insulin on board (IOB). This residual insulin or IOB depends on the insulin pharmacokinetics, given by Equation 4, which is the plant dynamics obtained from the Bergman Minimal Model (BMM) (Bergman 2021), and is very difficult for a human to guess.
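By way of non-limiting illustration, the bolus rule described above (grams of carbohydrate divided by the CR, minus IOB) can be sketched as follows. The function name `meal_bolus` and its parameter names are hypothetical. The sketch assumes the CR is expressed in grams of carbohydrate per unit of insulin, which is the convention implied by dividing grams by the CR, and it assumes that a negative result is clamped to zero, which the description does not state.

```python
def meal_bolus(carbs_g: float, carb_ratio_g_per_unit: float, iob_units: float) -> float:
    """Return a meal bolus in units of insulin.

    dose = carbs / CR - IOB, where CR is in grams of carbohydrate per unit.
    Assumption: a negative dose is clamped to zero, since insulin that is
    already on board cannot be withdrawn.
    """
    if carb_ratio_g_per_unit <= 0:
        raise ValueError("carb ratio must be positive")
    dose = carbs_g / carb_ratio_g_per_unit - iob_units
    return max(dose, 0.0)
```

For instance, a 60 g meal with a CR of 15 and 1 U of IOB yields 60/15 − 1 = 3 U under this rule. Adding the IOB instead of subtracting it would yield 5 U, an overdose of twice the IOB.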
As such, some simple formulas based on linearity assumptions are used by mobile apps such as Bolus Wizard (Shashaj, Busetto, and Sulli 2008) to estimate IOB and, consequently, the meal bolus. The insulin on board is assumed to decrease linearly over time, with the slope determined by the insulin action time setting set by the user. However, this is a gross and often inaccurate approximation. The final meal insulin intake is determined by Equation 4.
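By way of non-limiting illustration, the linear IOB assumption can be sketched as follows. The function name `linear_iob` and its parameters are hypothetical, and the sketch is not a reproduction of any particular app's computation; it only shows a straight-line decay from the full dose at injection time to zero at the user-configured insulin action time.

```python
def linear_iob(dose_units: float, minutes_since_dose: float, action_time_min: float) -> float:
    """Estimate insulin on board under a linear decay assumption.

    The remaining fraction falls from 1 at the time of the dose to 0 once
    the insulin action time has elapsed.
    """
    if action_time_min <= 0:
        raise ValueError("insulin action time must be positive")
    remaining_fraction = 1.0 - minutes_since_dose / action_time_min
    # Clamp so that the estimate never exceeds the dose or goes negative.
    remaining_fraction = min(1.0, max(0.0, remaining_fraction))
    return dose_units * remaining_fraction
```

Because actual insulin pharmacokinetics are not linear, an estimate of this kind can differ noticeably from the IOB implied by the plant dynamics, which is the inaccuracy noted above.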
A self-adaptive MPC controller, Tandem Control IQ (Forlenza et al. 2019), can be used, which gives the control actions u=π(X, s). A trace T is a collection of CGM trajectories for an extended run of the AP controller, which in this case includes the state X=(y, z, iob), the control actions u, and the set point s. In addition, users can also manually provide a priming bolus uex to prepare for an unplanned glycemic event such as a meal.
The outcome is measured using four metrics: a) percentage time in range (TIR), 70 mg/dl<CGM<180 mg/dl, b) mean CGM, c) time above range (TAR), when CGM>180 mg/dl, and d) time below range (TBR), when CGM<70 mg/dl.
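By way of non-limiting illustration, the four outcome metrics can be computed from a sampled CGM trace as sketched below. The function name `glycemic_metrics` is hypothetical, and the sketch assumes evenly spaced CGM samples so that the fraction of samples equals the fraction of time.

```python
from statistics import mean
from typing import Dict, Sequence


def glycemic_metrics(
    cgm_mg_dl: Sequence[float], low: float = 70.0, high: float = 180.0
) -> Dict[str, float]:
    """Compute TIR, mean CGM, TAR, and TBR for evenly sampled CGM values.

    Strict inequalities are used, matching the definitions in the text, so
    a reading exactly equal to a bound is counted in none of the ranges.
    """
    if not cgm_mg_dl:
        raise ValueError("CGM trace must not be empty")
    n = len(cgm_mg_dl)
    in_range = sum(1 for g in cgm_mg_dl if low < g < high)
    above = sum(1 for g in cgm_mg_dl if g > high)
    below = sum(1 for g in cgm_mg_dl if g < low)
    return {
        "TIR": 100.0 * in_range / n,  # percent of samples in range
        "mean_cgm": mean(cgm_mg_dl),  # mg/dl
        "TAR": 100.0 * above / n,     # percent of samples above range
        "TBR": 100.0 * below / n,     # percent of samples below range
    }
```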
Here, it is demonstrated how the CPS-LLM can be used to derive a safe meal management plan when integrated with an AID controller that relies on the human user to inject external insulin to control post-prandial (after meal) hyperglycemia. This section shows the performance of a general LLM used for this planning purpose, and the subsequent section shows how CPS-LLM provides much safer and more efficacious insulin dosage recommendations.
STL formulas can be applied to continuous time signals to define specific properties that hold true over some notions of time. STL formula satisfaction can be evaluated using a robustness function (Donzé and Maler 2010).
The robustness value ρ maps an STL formula ϕ, a continuous time signal, and a time t∈[0, T] to a real value. Safety criteria established by the American Diabetes Association (ADA) can be specified using the STL formula ϕt: GI(TBR<4%), where GI denotes globally true over the interval I.
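By way of non-limiting illustration, the robustness of a "globally below a threshold" formula such as GI(TBR<4%) can be evaluated on a sampled signal as sketched below. The function name `robustness_globally_below` is hypothetical, and the sketch assumes a discretely sampled signal with the interval I given as inclusive sample indices. Under the usual quantitative semantics, the robustness of the predicate is the margin (threshold minus signal value) and the robustness of the globally operator is the minimum of that margin over the interval.

```python
from typing import Sequence


def robustness_globally_below(
    signal: Sequence[float], threshold: float, start: int, end: int
) -> float:
    """Robustness of G_[start, end](signal < threshold) on sampled data.

    A positive value means the formula holds with that margin at every
    sample in the interval; a negative value means it is violated.
    """
    window = signal[start : end + 1]
    if not window:
        raise ValueError("interval selects no samples")
    return min(threshold - value for value in window)
```

For a TBR signal expressed in percent, calling the function with `threshold=4.0` returns a positive value only if TBR stays below 4% at every sample in the interval.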
In some aspects, when evaluating the action plan for safety through forward simulation of physical dynamics of the dynamical physical system, the systems outlined herein can perform formally-specified safety checks using a safety criterion descriptive of safe operation constraints of the dynamical physical system, which may be formatted using STL. The safety checks may be performed using simulated/projected (expected) continuous time signals associated with the physical dynamics of the dynamical physical system, and may also be continually or periodically evaluated during execution of an action plan using continuous time signals that are measured or otherwise derived in real-time.
As such, the system can access a safety criterion descriptive of safe operation constraints of the dynamical physical system, the safety criterion being formatted using signal temporal logic, and evaluate the action plan with respect to the safety criterion and a continuous time signal associated with the physical dynamics of the dynamical physical system.
The safety of the LLM generated plan is evaluated using forward simulation. For the AID system the T1D simulator from UVA PADOVA (Man et al. 2014) can be used. In the T1D simulator, virtual patients can be instantiated with the same BMM model coefficients as obtained from the LTC NN discussed in the “Liquid Time Constant Neural Networks based coefficient estimation” section. Simulations can be made for the future time horizon tf to determine whether the plan generated by the LLM is safe.
In one example, an untuned LLAMA 7B model (Touvron et al. 2023) was used and contextualized using the prompts shown below.
Based on this contextualization, the LLAMA 7B model was prompted with the following inference prompt:
The response obtained from the LLM was as follows:
Clearly, the LLM computed the insulin dosage incorrectly because it could not infer that IOB should be subtracted from the carb intake/carb ratio computation. Instead, it added the IOB to the ratio, resulting in a 2 U higher insulin dosage, which may result in severe hypoglycemia.
A prompt with a serious safety condition of hypoglycemia resulted in a physically incoherent explanation of insulin computation. The following prompt was provided.
The response provided by the LLM model was as follows.
This is a numerically and physically incoherent explanation of the insulin dosage.
When the physical dynamics of the human body were used as a prompt, the LLAMA2 7B model was unable to produce the required results.
Below is an instruction that describes the task of finding the Insulin On Board of a type 1 diabetic patient paired with a diffusion parameter of the Bergman Minimal Model for an insulin intake. Write a corresponding output that is the Insulin On Board time series.
The response provided by the base LLAMA2 7B model is as follows.
As can be seen, the LLM model did not generate any meaningful response to the prompt embedded with physical dynamics.
A virtual patient was used with the BMM parameters shown in Table 2 as simulation settings. A total of 218 meal instances were generated, with sizes ranging from 7 g to 50 g, for various carb ratio settings ranging from 10 to 25. The virtual patients were set up with prior insulin usage starting from 30 mins before meal to 3 hrs before meal. An MPC controller similar to Control IQ was integrated, which generates the insulin outputs u=π(X, s) in addition to the prior bolus and the meal bolus. The meal bolus for each of the cases was generated by the CPS-LLM and compared against the un-tuned LLM and the Bolus Wizard.
LNNs are neural networks where the hidden state dynamics are given by a time constant component and a parameterized non-linear component. LNNs are considered to be universal function approximators and have been shown to learn complex non-linear functions with far fewer cells than traditional deep learning techniques.
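By way of non-limiting illustration, the hidden state dynamics of a liquid time constant cell are commonly written in the liquid time-constant network literature in the form below. The symbols are those of that general formulation and are assumptions with respect to the present disclosure: x(t) is the hidden state, I(t) is the input, τ is the time constant, f is the parameterized non-linear component with parameters θ, and A is a bias vector.

```latex
\frac{d\mathbf{x}(t)}{dt}
  = -\left[\frac{1}{\tau} + f\big(\mathbf{x}(t), \mathbf{I}(t), t, \theta\big)\right]\mathbf{x}(t)
    + f\big(\mathbf{x}(t), \mathbf{I}(t), t, \theta\big)\, A
```

The 1/τ term is the time constant component, and the two terms involving f form the parameterized non-linear component, so that the effective time constant of each cell varies with the input.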
LTC NN based diffusion coefficient estimate: The LTC NN based encoder decoder architecture is shown in
For this experiment, two different types of LLMs were used: 1) proprietary LLMs accessed via an API, and 2) open source LLMs. The first LLM category (BARD) was used to develop domain-specific embodied prompts based on user queries. These embodied prompts incorporate various personalized factors of the user. The second category of LLMs used is the state-of-the-art LLAMA2 model developed by Meta AI. This model is fine-tuned on domain-specific datasets that encompass the constraints from both the cyber and the physical world. The 7B base version of the LLAMA2 model was used for this experiment.
Prompt Generation. The BARD model was accessed through its interactive GUI. The model was primed with a few examples and then used to generate personal-domain-specific embodied prompts. Upon careful consideration of the different prompting techniques, the ALPACA (Taori et al. 2023) format was used for fine-tuning the LLAMA2-7B model. To prime the model for better instruction tuning, one can use the following system prompt: “Below is an instruction that describes the task of finding the diffusion parameter of the Bergman Minimal Model paired with a time series of 40 Insulin on Board.” The system prompt is followed by an instruction, an input, and the corresponding output. An example of the entire prompt is as follows.
The fine-tuned LLAMA model, i.e. CPS-LLM, was tested with the query of the following form.
Below is an instruction that describes the task of finding Insulin On Board of a type 1 diabetic patient paired with a diffusion parameter of the Bergman Minimal Model for an insulin intake. Write a corresponding output that is the Insulin On Board timeseries. ###Instruction: I took an insulin dosage now. What is my Insulin On Board percentage timeseries? ###Input:
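By way of non-limiting illustration, a query of this form could be assembled programmatically as sketched below. The helper name `build_prompt`, the exact whitespace around the section markers, and the placeholder input string are assumptions made for this sketch; the section layout (system prompt, instruction, input, response) follows the general ALPACA convention, and no actual parameter value from the study is reproduced.

```python
SYSTEM_PROMPT = (
    "Below is an instruction that describes the task of finding Insulin On Board "
    "of a type 1 diabetic patient paired with a diffusion parameter of the Bergman "
    "Minimal Model for an insulin intake. Write a corresponding output that is the "
    "Insulin On Board timeseries."
)


def build_prompt(instruction: str, input_text: str, response: str = "") -> str:
    """Assemble an instruction-tuning style prompt.

    For training examples `response` holds the target output; for inference
    it is left empty so that the model completes the text after the marker.
    """
    return (
        f"{SYSTEM_PROMPT}\n\n"
        f"### Instruction:\n{instruction}\n\n"
        f"### Input:\n{input_text}\n\n"
        f"### Response:\n{response}"
    )


# Hypothetical usage; the input is a placeholder, not a value from the study.
example = build_prompt(
    "I took an insulin dosage now. What is my Insulin On Board percentage timeseries?",
    "<diffusion parameter>",
)
```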
The following form of response was obtained from the CPS-LLM model:
Three different CPS usage plan generation mechanisms were tested, each interfaced with the MPC Control IQ type controllers.
The first approach is manual plan generation, where the user uses the bolus wizard and the standard linear assumption on the IOB computation to compute the meal bolus insulin in accordance with the rule described in Equation 4.
The second approach is the un-tuned LLM LLAMA2 7B model interfaced with contextualized BARD chat RL to determine the usage plan and integrated with MPC.
The third approach is the integration of CPS-LLM (fine-tuned LLAMA2 7B model), contextualized BARD and the MPC controller.
Importantly, when there is a violation (i.e., detected through evaluation of the action plan or one or more traces of the dynamical physical system for safety), the system can generate and re-check an updated action plan with appropriate prompting.
For example, the system can: access one or more traces of the dynamical physical system and a safety criterion descriptive of safe operation constraints of the dynamical physical system; generate, by the chatbot model that integrates the neural network implementing the Large Language Model, an updated action plan based on the text prompt, the one or more traces of the dynamical physical system, and the safety criterion; and evaluate the updated action plan for safety.
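By way of non-limiting illustration, the generate-and-re-check behavior can be sketched as the loop below. The names `plan_until_safe`, `generate_plan`, and `is_safe` are hypothetical: `generate_plan` stands in for the chatbot model that integrates the LLM, and `is_safe` stands in for the safety evaluation (for example, forward simulation checked against the safety criterion). The retry limit and the wording of the feedback text are assumptions.

```python
from typing import Any, Callable, Optional, Sequence


def plan_until_safe(
    prompt: str,
    traces: Sequence[Any],
    safety_criterion: Any,
    generate_plan: Callable[[str, Sequence[Any], Any, Optional[str]], Any],
    is_safe: Callable[[Any, Sequence[Any], Any], bool],
    max_attempts: int = 3,
) -> Optional[Any]:
    """Generate action plans until one passes the safety evaluation.

    Returns the first safe plan, or None if no safe plan is found within
    `max_attempts`, in which case the caller should not execute any plan.
    """
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        plan = generate_plan(prompt, traces, safety_criterion, feedback)
        if is_safe(plan, traces, safety_criterion):
            return plan
        # Pass the violation back so that the next prompt can account for it.
        feedback = f"Previous plan violated the safety criterion: {plan!r}"
    return None
```

Returning `None` rather than the last unsafe plan is a deliberate choice in this sketch, so that a failure to find a safe plan cannot be mistaken for a usable recommendation.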
This also allows dynamic plan generation, i.e., if something changes during execution of an action plan, then the system can generate and check an updated action plan based on new context. The new context may be provided by the user through a text prompt or any other interface available to the user. Additionally, new context may be provided through measurements obtained by a CGM monitor or another sensor associated with the dynamical physical system. New context may include, but are not limited to: information about changes in medication or nutrition amount/type, changes in user schedule or times for access to medication or nutrition (e.g., work or travel plans, etc.), and physical aspects of the user's body itself (e.g., exercise, rest, hydration, measured or unmeasured values, etc.).
In some examples, the system may detect a deviation from one or more parameters associated with the action plan. In such a case, generation of the updated action plan can be responsive to detection of the deviation. The one or more parameters associated with the action plan could include values associated with the one or more traces of the dynamical physical system, which may be simulated or measured/derived in real-time. The one or more parameters associated with the action plan could also include one or more nutritional or medication parameters which may change, such as but not limited to carb or fiber intake or insulin dosage/time/type.
The present disclosure has demonstrated the feasibility of using LLMs in planning the personalized usage of a CPS. It addresses an important question in the planning community and shows a first use of LLMs in planning control tasks for safety critical human in the loop and human in the plant systems. The example used herein is in the medical domain, which enhances the significance of the results. The main observation is that it is feasible to use LLMs for planning control tasks, provided two important steps are meticulously designed: a) contextualization of the chat RL, and b) fine tuning of the LLM internal weights through embodied training, where textual instructions and interpretations are intertwined with traces from the real world system. This is only an initial attempt at using LLMs in safety critical planning and has only been shown for one example. The methodology is general, but its application to other examples such as autonomous cars and unmanned aerial vehicles is yet to be tested. The approach described herein may start a new domain of research that is crucial for the progress of LLMs and planning.
The functions performed in the processes and methods described herein may be implemented in differing order. Furthermore, the outlined steps and operations are provided as examples, and some of the steps and operations may be optional, combined into fewer steps and operations, or expanded into additional steps and operations without detracting from the essence of the disclosed embodiments.
Device 300 comprises one or more network interfaces 310 (e.g., wired, wireless, PLC, etc.), at least one processor 320, and a memory 340 interconnected by a system bus 350, as well as a power supply 360 (e.g., battery, plug-in, etc.). Device 300 can also include a display device 330 that enables a user to view or otherwise interact with the aspects of the system 100 shown in
Network interface(s) 310 include the mechanical, electrical, and signaling circuitry for communicating data over the communication links coupled to a communication network. Network interfaces 310 are configured to transmit and/or receive data using a variety of different communication protocols. As illustrated, the box representing network interfaces 310 is shown for simplicity, and it is appreciated that such interfaces may represent different types of network connections such as wireless and wired (physical) connections. Network interfaces 310 are shown separately from power supply 360, however it is appreciated that the interfaces that support PLC protocols may communicate through power supply 360 and/or may be an integral component coupled to power supply 360.
Memory 340 includes a plurality of storage locations that are addressable by processor 320 and network interfaces 310 for storing software programs and data structures associated with the embodiments described herein. In some embodiments, device 300 may have limited memory or no memory (e.g., no memory for storage other than for programs/processes operating on the device and associated caches). Memory 340 can include instructions executable by the processor 320 that, when executed by the processor 320, cause the processor 320 to implement aspects of the system 100 and the methods outlined herein.
Processor 320 comprises hardware elements or logic adapted to execute the software programs (e.g., instructions) and manipulate data structures 345. An operating system 342, portions of which are typically resident in memory 340 and executed by the processor, functionally organizes device 300 by, inter alia, invoking operations in support of software processes and/or services executing on the device. These software processes and/or services may include CPS-LLM processes/services 390, which can include aspects of the methods and/or implementations of various modules described herein. Note that while CPS-LLM processes/services 390 is illustrated in centralized memory 340, alternative embodiments provide for the process to be operated within the network interfaces 310, such as a component of a MAC layer, and/or as part of a distributed computing network environment.
It will be apparent to those skilled in the art that other processor and memory types, including various computer-readable media, may be used to store and execute program instructions pertaining to the techniques described herein. Also, while the description illustrates various processes, it is expressly contemplated that various processes may be embodied as modules or engines configured to operate in accordance with the techniques herein (e.g., according to the functionality of a similar process). In this context, the terms module and engine may be interchangeable. In general, the term module or engine refers to a model or an organization of interrelated software components/functions. Further, while the CPS-LLM processes/services 390 is shown as a standalone process, those skilled in the art will appreciate that this process may be executed as a routine or module within other processes.
It should be understood from the foregoing that, while particular embodiments have been illustrated and described, various modifications can be made thereto without departing from the spirit and scope of the invention as will be apparent to those skilled in the art. Such changes and modifications are within the scope and teachings of this invention as defined in the claims appended hereto.
This is a non-provisional patent application that claims benefit to U.S. Provisional Patent Application Ser. No. 63/615,231 filed on Dec. 27, 2023, which is herein incorporated by reference in its entirety.
| Number | Date | Country |
|---|---|---|
| 63615231 | Dec 2023 | US |