The application claims the benefit of British Patent Application No. 2315750.6 filed Oct. 13, 2023, and entitled “A COMPUTER-IMPLEMENTED METHOD FOR GENERATING AN ACKNOWLEDGEMENT IN AN AUTOMATED CONVERSATIONAL HEALTHCARE PIPELINE,” which is hereby incorporated by reference in its entirety.
The present invention relates to generation of acknowledgements in automated conversational healthcare pipelines. More particularly, the present invention relates to a method for generating an acknowledgement in an automated conversational healthcare pipeline, and related data processing apparatuses, computer programs, and computer-readable storage media.
Conversational computing and artificial intelligence are becoming increasingly pervasive, supported by the presence and integration of such technologies on phones, appliances, and in cars. In addition, the awareness of an individual's state of well-being is on the rise. Consequently, provisions for providing support, coaching, treatment and/or therapy are of interest.
Typical conversational computing systems are relatively simple. The complexity of a software application (or “bot”) running an interactive system may be measured in “turns”—i.e., the number of interactions between the bot and the user required to complete the given activity. A bot that enables a user to, for example, check the weather forecast for a given location or confirm the timing of their next medication, may require between one and ten turns.
In contrast, automated conversational healthcare interactions are complex. In patient-therapist text-based cognitive behavioural therapy (CBT), for example, a patient may typically spend around 6 hours in therapy sessions in which the CBT protocol is delivered. There will be, on average, around 50 “turns” per hour per patient and therefore systems may be required to handle several hundred turns. Other protocols or strategies, including specific forms of CBT protocols, may also be delivered, and may be deemed healthcare protocols, delivered to the patient or user in a healthcare pipeline.
In order to address this level of complexity in a healthcare pipeline, the protocol may be divided into a plurality of elements of care, each of which may be delivered by a dedicated sub-dialogue unit or bot. The overall pipeline may comprise a conversation, or dialogue, between the user and the computing system (or operating clinician thereof) and may be divided into a number of different stages, or sub-dialogues, wherein a separate sub-dialogue unit may deliver each stage.
One challenge that arises in typical healthcare pipelines is the inadequacy of conversational “acknowledgements” provided in response to user inputs. Typical conversational computing systems use limited sets of pre-templated, slot-filled responses, because clinicians see these as safer in this domain. User research suggests that an important acceptability criterion for the end user is that they feel heard and understood, which is not routinely accomplished with conventional techniques.
As a simple example, consider a conversational computing system implementing a known healthcare pipeline. A typical dialogue between a user and the system may begin with the system asking: “How do you feel?”, to which the user may respond: “Not great. My dog was sick this week.”. A known system may perform simple textual sentiment analysis on the user input and respond with a prescribed response: “Sorry to hear that”. Internal research indicates that users award an average score of only 3.04 out of a maximum of 5 for the ability of current known automated conversational healthcare systems to provide tailored communication.
Tailored communication is one manner in which conversational computing systems can administer successful healthcare treatments, as tailored communication can motivate the user to interact with the system. Studies have shown that users who type (share) more readily and frequently in early sessions are more likely to engage in treatment.
One approach to address this inadequacy of automated conversational healthcare systems is to modify existing pipelines with further and more granular pathways, so as to map a user input more appropriately to an acknowledgement, which provides a more suitably tailored response to the user input. However, this additional complexity requires significantly increased processing capabilities, which is not desirable in conversational contexts, where the user should not face time delays during sessions.
The inventors have therefore realised that improvements related to communication between a user and a chatbot within a healthcare protocol are desired.
The invention is defined in the independent claims, to which reference should now be made. Further features are set out in the dependent claims.
According to an aspect of the invention, there is provided a computer-implemented method for generating an acknowledgement in an automated conversational healthcare pipeline (that is, a conversation or dialogue between user and system in a healthcare context). The automated conversational healthcare pipeline may be offered to the user for the specific treatment of a mental health disorder using a specific treatment methodology.
The method includes a step of receiving an input from a user. The input may be accepted via a graphical user interface (GUI) provided on a user device. The input may be responsive to a question or query provided by a conversational computing system implementing the healthcare pipeline, presented on the GUI. The input may be provided via typed text or via spoken word, transcribed by the user device or the conversational computing system into text.
The method includes a step of deciding, by the conversational computing system, whether to provide a generated acknowledgement to the user. The generated acknowledgement is responsive to the user input. The conversational computing system makes the decision through use of a filtering mechanism. The filtering mechanism, which comprises any number of individual filters, is configured to avoid the output of inappropriate acknowledgements by means of, for example, classification of the input and/or acknowledgement in respect of clinical appropriateness or suitability to the particular healthcare context of concern. An acknowledgement, in this context, refers to a message from the computing system, explicitly or implicitly informing the user that the user's input has been received. The acknowledgment may be in the form of a single utterance or may be in the form of multiple utterances, within a longer running dialogue. The acknowledgment may utilise or implement such techniques (at least in part) as mirroring (e.g., imitation of the user's speech pattern or use of terminology) and Socratic questioning (e.g., seeking clarification from the user on an aspect of their input; challenging a user's assumptions; exploring implications and consequences of the user's input).
The method includes a step of, in response to a decision to provide the generated acknowledgement, outputting the generated acknowledgement to the user on the GUI. The generated acknowledgement is produced by processing the input through a generative acknowledgement model. The generative acknowledgement model may be a large language model (LLM), for instance a Llama 2 model or a ChatGPT model.
The method includes a step of continuing with the automated healthcare pipeline. Without any further user input, the conversational computing system generates a next output of the conversational healthcare pipeline, for instance based on the user input. The conversational computing system then causes display of the next output of the conversational healthcare pipeline on the GUI of the user device.
The computing system may thereby respond empathically to the user in a way that is concise, and in a way that does not introduce unwanted content (e.g., unverified clinical content or further questions) into the healthcare pipeline. The use of the generative acknowledgement model is only instigated when the filtering mechanism deems it acceptable to do so.
The technique does not create new conversational pathways and thus avoids the need to modify existing pipelines with further and more granular pathways. The method therefore avoids the increases in processing capability required by known techniques.
The method provides assistance to the user for entering text into the implementing computer system through provision of a generative acknowledgment, which is shown to improve rates and extent of interaction with the computer system.
Traditional techniques for implementing dialogue systems using separate modules for natural language understanding (NLU), dialogue planning (DP), and natural language generation (NLG) limit the variety of responses that the user may receive. Conventional techniques may therefore make the language produced by the system seem inflexible, and thus unengaging. By contrast, systems built around large foundation models are able to generate richer and more natural-sounding responses. However, these types of systems are difficult to control over a longer conversation, which makes them prone to diverging away from the clinical protocol they are required to deliver. Systems implementing the method according to embodiments are controllable by using a traditional dialogue planner, but are also able to generate rich and natural-sounding responses. This is particularly useful when acknowledging responses from the user, where the user has invested a high degree of effort, or emotion.
The automated conversational healthcare pipeline (and the output of a generated acknowledgement therein) may be for the treatment of any or all of the following mental health disorders: generalised anxiety disorder; depression; obsessive compulsive disorder (OCD); post-traumatic stress disorder (PTSD); or phobia disorder. Additionally, the automated conversational healthcare pipeline (and the output of a generated acknowledgement therein) may be for the treatment of disorders such as chronic fatigue; chronic pain; and irritable bowel syndrome.
Furthermore, the automated conversational healthcare pipeline (and the output of a generated acknowledgement therein) may improve the wellbeing of an individual by, for example, reducing symptoms of worry or stress, social anxiety or sleeping problems.
Mental health disorders may affect individuals with chronic illnesses (long-term health conditions that may not have a cure), such as diabetes (whether type 1 or type 2), asthma, arthritis, cancer, chronic obstructive pulmonary disease, heart disease and stroke, kidney disease, and viral diseases such as hepatitis C and HIV/AIDS. Management of associated mental health conditions is an important factor in ongoing treatment for chronic illnesses and, for example, helps patients with adherence to, and persistence with, medication plans.
The automated conversational healthcare pipeline (and the output of a generated acknowledgement therein) may be used as part of the treatment of any or all of the following chronic illnesses: diabetes (whether type 1 or type 2), asthma, arthritis, cancer, chronic obstructive pulmonary disease, heart disease and stroke, kidney disease, and viral diseases such as hepatitis C and HIV/AIDS.
The automated healthcare pipeline may be utilised by a patient on a “wait list”, that is a patient awaiting their first appointment with a healthcare professional, and/or between appointments with their healthcare professional and, furthermore, may be utilised by patients who have completed a course of treatment in order to prevent, or reduce the risk of, relapse.
The automated conversational healthcare pipeline may deliver psychological therapies. The type of psychological therapy may be dependent on a diagnosis or it may be transdiagnostic in that the therapy focuses on making changes that target common patterns of functioning across psychological disorders, rather than being specific to a diagnosis.
The type of psychological therapy may be selected from: cognitive behavioural therapy (CBT), cognitive therapy, behavioural therapy, rational emotive behavioural therapy, exposure therapy, emotional schema therapy, schema therapy, mindfulness based cognitive therapy, acceptance and commitment therapy, compassion focussed therapy, dialectical behaviour therapy or metacognitive therapy. The psychological therapy may also be interpersonal psychotherapy (IPT) or psychodynamic therapy (PDT).
The person skilled in the art will recognise that this list is not closed, as new psychological therapies and protocols are continually being developed and assessed for clinical effectiveness, and may be based on elements from the aforementioned psychological therapies. The psychological therapy may be any therapy currently approved by the National Institute for Health and Care Excellence (NICE) for use in the Improving Access to Psychological Therapies (IAPT) (now known as “Talking Therapies”) Manual (updated annually). The generated acknowledgement may be varied depending on the treatment method in question.
Optionally, the filtering mechanism includes a plurality of filters. The plurality may include an input filter, which is to be applied to the user input. The input filter is configured to decide whether to produce the generated acknowledgement. Of course, the input filter may be further divided into numerous functional input filters. The plurality may also include an output filter, which is to be applied to a produced generated acknowledgement. The output filter is configured to decide whether to output the generated acknowledgement. Of course, the output filter may be further divided into numerous functional output filters. These two safety mechanisms are able to cope with different circumstances (i.e., inappropriate user input and inappropriate generated acknowledgement).
Any or all of the filters may involve an aspect of probabilistic filtering, which may assess the probability of a generated acknowledgement being an appropriate utterance. For instance, a filter may be configured to assess probabilities or logarithmic probabilities (logprobs) output from the generative acknowledgment model. For example, a probabilistic filter may decide not to provide a generated acknowledgement to the user in the event that a corresponding logprob is below a threshold value.
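By way of illustration only, such a probabilistic filter might be sketched as follows (in Python), where the token logprobs are assumed to be returned alongside the generated acknowledgement and the threshold value is an illustrative assumption:

def passes_logprob_filter(token_logprobs, threshold=-1.5):
    # Reject the generated acknowledgement if its mean token logprob falls
    # below the (illustrative) threshold, i.e., the model is insufficiently
    # confident that the acknowledgement is an appropriate utterance.
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return mean_logprob >= threshold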
Optionally, the method may be intended for improving security when generating an acknowledgement in an automated conversational healthcare pipeline. Deciding whether to provide the generative acknowledgement may thus include a step of processing the input using an input outlier filter of the filtering mechanism, which is configured to determine if the input is an outlier, relative to expected or usual inputs. For instance, the input outlier filter may be an embedding-based classifier, and the input may be classified as an input outlier in the case that the distance in embedding or latent space between an embedding of the input and a distribution of embeddings of known inputs exceeds an input predetermined threshold. Distance in this context may be, for instance, the Mahalanobis distance.
In response to a determination that the input is an input outlier (e.g., is an unusual or unexpected input), deciding whether to provide the generative acknowledgement may also include a step of classifying the input to be unsafe and/or unsuitable. In this way, the system flags the input as being unsuitable for the output of a generated acknowledgment. The implementing system then decides not to provide any generated acknowledgement to the user.
Use of an input outlier filter of the filtering mechanism in this way avoids cases where unsafe user input may disrupt the system (e.g., via prompt injection) and/or cause an unsafe acknowledgement. The use of an input outlier filter therefore offers a simple way of protecting both the system and the user.
Use of the input outlier filter may result in a decision to carry on with the generative acknowledgement process, in which case further filters may be applied, such as an output outlier filter, a probabilistic filter, and/or a combination filter. Of course, such filters may be used without use of an input outlier filter and any or all of these filters may be used together.
Optionally, the decision as to whether to provide the generated acknowledgement to the user includes a step of processing the input using the generative acknowledgement model to produce the generated acknowledgement. The method may then include a step of processing the generated acknowledgement using an output outlier filter of the filtering mechanism, which is configured to determine if the generated acknowledgement is an output outlier, relative to expected or usual acknowledgements. For instance, the output outlier filter may be an embedding-based classifier, and the generated acknowledgement may be classified as an output outlier in the case that the distance between an embedding of the generated acknowledgement and a distribution of embeddings of known acknowledgements exceeds an output predetermined threshold.
In response to a determination that the output is an output outlier (e.g., is an unusual or unexpected output), deciding whether to provide the generative acknowledgement may also include a step of classifying the output to be unsuitable and/or unsafe. In this way, the system flags the generated acknowledgment as being unsuitable. The implementing system then decides not to provide any generated acknowledgement to the user.
Use of an output outlier filter of the filtering mechanism in this way avoids cases where unsafe acknowledgements are output to the user.
Optionally, the decision as to whether to provide the generated acknowledgement to the user includes a step of processing a combination of the generated acknowledgement and the user input using a first combination filter of the filtering mechanism. The first combination filter may be a machine-learning based classifier, trained and configured to classify combinations of inputs and corresponding acknowledgements in terms of the combinations' clinical appropriateness (e.g., as assessed by a trained clinician). For instance, the first combination filter may classify a combination in a binary manner, as either “appropriate” or “inappropriate” for output to a user. For example, the first combination filter may be configured to identify if the generated acknowledgement is coherent in view of the input, poses further questions, is socially appropriate, etc. The first combination filter may be an embedding-based classifier.
In response to a determination by the first combination filter that the combination is not clinically appropriate (e.g., the classifier output indicates that the combination fails to meet an appropriateness threshold criterion), deciding whether to provide the generative acknowledgement may also include a step of classifying the output to be unsuitable and/or unsafe. In this way, the system flags the generated acknowledgment as being unsuitable. The implementing system then decides not to provide any generated acknowledgement to the user.
Optionally, the first combination filter may be configured to perform classification of the combination in respect of multiple classes. For instance, the first combination filter may be configured to perform multiple classifications, one for each of the multiple classes. In the event that predetermined classification criteria are met, the system may flag the generated acknowledgment as being unsuitable. The multiple classes may correspond to multiple appropriateness criteria, such as: the posing of further questions; the normalisation of negative feelings the user may express in the input; and the provision of medical diagnosis. This ensures that the output from the system does not impinge on clinically prepared healthcare protocols.
Optionally, the decision as to whether to provide the generated acknowledgement to the user includes a step of processing a combination of a generated acknowledgement and the user input using a second combination filter of the filtering mechanism. The second combination filter may be a machine-learning based LLM, trained and configured to classify combinations of inputs and corresponding acknowledgements in terms of the combinations' clinical appropriateness (e.g., as assessed by a trained clinician). The second combination filter may involve use of a prepared prompt (textual input given to the model in order to have the model generate an output) for the LLM, which provides terminology definitions and conditions by which the LLM is to classify the combination. For example, where a condition (of the acknowledgment) is introduced in the LLM prompt to define a “personal opinion”, the LLM may then identify personal opinions within generated acknowledgements. The LLM may be pretrained (using off-the-shelf weights and biases for a particular LLM architecture) or may be finetuned to the specific task at hand.
In response to a determination by the second combination filter that the combination is not clinically appropriate, deciding whether to provide the generative acknowledgement may also include a step of classifying the output to be unsuitable and/or unsafe. In this way, the system flags the generated acknowledgment as being unsuitable. The implementing system then decides not to provide any generated acknowledgement to the user.
Optionally, the second combination filter may not be initiated in cases where the expected processing time for the second combination filter exceeds a predetermined time constraint. That is, in response to a determination that the time required for use of the second combination filter of the filtering mechanism to perform the classification would exceed the predetermined time constraint, the method may involve output of the generated acknowledgment without performing the second combination filtering. Of course, alternatively, the method may not output the generated acknowledgement in this scenario. In response to a determination that the time required would not exceed the predetermined time constraint, the method may perform second combination filtering as described above. The expected processing time for the second combination filter may be informed through consulting historical processing times for example user inputs and accompanying acknowledgements of varying lengths and complexities. The predetermined time constraint may be, for example, five seconds.
By imposing a time constraint on the second combination filtering, the method ensures that there is a continued guided human-machine interaction at a pace that is shown to mirror a natural human-human dialogue. This, in turn, supports the provision of tailored communication to the user in a supportive manner.
Optionally, the method includes a step of processing the input using a risk listener. The risk listener is a software application or bot (or functional aspect thereof) configured to judge the input to identify if the user is likely to pose a risk to themselves and/or to pose a risk to others. In response to a positive identification of a risk, the method may further include a step of outputting an alarm to the user and/or to any other relevant party (such as clinical staff or emergency services). The risk listener may be applied before application of the filtering mechanism (or particular aspects thereof), so as to minimise the processing requirements induced through filters and through the generative acknowledgment model in a case where it is not appropriate to generate any acknowledgement (rather, it is appropriate to output preconfigured, clinically verified, acknowledgments to deal with such user risk). The risk listener may instead be applied in parallel to the processing of the filtering mechanism.
Optionally, the generated acknowledgement may be produced by the generative acknowledgement model using the user input and using a history of acknowledgements and user inputs. Previous input data may be accessed from a local or a remote data storage server. Further context may also be provided to the generative acknowledgment model, such as user details (age, gender, location, employment status, prescribed medication, etc.) to inform the acknowledgement generation process. This context could be optionally inserted into a generative prompt at specific locations or following specific questions of the healthcare pipeline so that the generative acknowledgement model may include certain details for a more tailored and personalised interaction.
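By way of illustration only, the assembly of conversation history and user context into a generative prompt might be sketched as follows; the template wording and field names are illustrative assumptions rather than a prescribed prompt:

def build_prompt(history, user_input, context):
    # history: list of (speaker, text) tuples from the running dialogue.
    # context: dict of user details (e.g., age, location) from local or remote storage.
    turns = "\n".join(f"{speaker}: {text}" for speaker, text in history)
    details = ", ".join(f"{k}={v}" for k, v in context.items())
    return (
        "You are generating a brief, empathic acknowledgement within a "
        "clinical protocol. Do not give advice or ask questions.\n"
        f"User details: {details}\n"
        f"Conversation so far:\n{turns}\n"
        f"User: {user_input}\n"
        "Acknowledgement:"
    )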
Optionally, the method may further include, in response to any decision not to provide a generated acknowledgment, outputting a predetermined acknowledgement to the user. Any predetermined acknowledgement may be clinically coded, in the sense that a trained clinician has prepared the acknowledgement and deemed it suitable for the particular healthcare pipeline at hand.
Optionally, the method may further include, in response to any decision not to provide a generated acknowledgement, revising the generated acknowledgement. For instance, information related to a decision by the filtering mechanism (or filters thereof) to not provide a generated acknowledgement (such as classifications) may be appended to the generated acknowledgement as it is reprocessed using the generative acknowledgement model. In this way, the generative acknowledgement model may tune the generated acknowledgement into a more acceptable state.
Optionally, the method may include provision of a GUI to the user, upon a user device, which is configured to accept the input from the user and to output any generated acknowledgements (i.e., any generated acknowledgement, any revised generated acknowledgement, and/or any predetermined acknowledgement).
Use of the GUI ensures that information in the form of a tailored generated acknowledgment is presented to the user, which is shown to improve the physiological reaction of the user in the sense of continued interaction with the healthcare protocol and improved response times. The conditional output of the generated acknowledgement ensures that this procedure occurs in a clinically safe manner, where the filtering mechanism acts as a gatekeeper performing a clinical safety and appropriateness check.
Embodiments of another aspect include a data processing apparatus or system comprising a memory storing computer-readable instructions and a processor. The processor (or controller circuitry) is configured to execute the instructions to carry out a computer-implemented method for generating an acknowledgement in an automated conversational healthcare pipeline.
The data processing system may be realised in a distributed computing environment, comprising a networked client or user device and a server system. The user device may be configured to receive an input from the user and to transmit the input to the server system. The server system may be configured to receive the user input and to decide whether to provide a generated acknowledgement to the user in response to the input using a filtering mechanism configured to avoid inappropriate acknowledgements. The server system may be configured to, responsive to a decision to provide the generated acknowledgement, output the generated acknowledgement produced by processing the input using a generative acknowledgement model. The server system may be configured to continue with the automated healthcare pipeline, by generating a next output of the conversational healthcare pipeline. The server system may be configured to output and to transmit the next output of the conversational healthcare pipeline. The user device may be configured to receive and display the next output of the conversational healthcare pipeline.
Techniques herein are thereby readily adaptable to limited hardware and bandwidth resources, where, for example, computationally expensive generative steps may be performed remotely (at the server system) relative to the user device.
Embodiments of another aspect include a computer program comprising instructions, which, when executed by a computer, cause the computer to execute a computer-implemented method for generating an acknowledgement in an automated conversational healthcare pipeline.
Embodiments of another aspect include a non-transitory computer-readable storage medium comprising instructions, which, when executed by a computer, cause the computer to execute a computer-implemented method for generating an acknowledgement in an automated conversational healthcare pipeline.
The invention may be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations thereof. The invention may be implemented as a computer program or a computer program product, i.e., a computer program tangibly embodied in a non-transitory information carrier, e.g., in a machine-readable storage device or in a propagated signal, for execution by, or to control the operation of, one or more hardware modules.
A computer program may be in the form of a stand-alone program, a computer program portion, or more than one computer program, and may be written in any form of programming language, including compiled or interpreted languages, and it may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a data processing environment. A computer program may be deployed to be executed on one module or on multiple modules at one site or distributed across multiple sites and interconnected by a communication network.
Method steps of the invention may be performed by one or more programmable processors executing a computer program to perform functions of the invention by operating on input data and generating output. Apparatus of the invention may be implemented as programmed hardware or as special purpose logic circuitry, including e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions coupled to one or more memory devices for storing instructions and data.
The invention is described in terms of particular embodiments. Other embodiments are within the scope of the following claims. For example, the steps of the invention may be performed in a different order and still achieve desirable results.
Elements of the invention have been described using the terms “processor” and “input device”. The skilled person will appreciate that such functional terms and their equivalents may refer to parts of the system that are spatially separate but combine to serve the function defined. Equally, the same physical parts of the system may provide two or more of the functions defined. For example, separately defined means may be implemented using the same memory and/or processor as appropriate.
Reference is made, by way of example only, to the accompanying drawings in which:
In the present example, a first filter 31 (“filter #1”) may be configured to identify any unacceptable user inputs, and to ensure that only acceptable inputs are passed through a generative model for the production of generated acknowledgements. The first filter in this example is an input outlier filter. That is, the first filter 31 may be a classification model, configured to identify outlying inputs relative to expected, usual, or conventional inputs. An embedding-based machine learning classification model may be used for this purpose. The embedding-based classification model may be trained using conventional techniques, such that the trained model is configured to construct machine-readable representations of the meaning of inputs (“embeddings”) and to classify the input embeddings as acceptable or unacceptable via closeness analysis between the input embedding and known categories in the input embedding domain.
As an example, an embedding-based model for the first filter 31 may be implemented as a BERT-based model, for instance using the SentenceTransformers framework or as a RoBERTa model (see the work of Reimers, N. & Gurevych, I. arXiv: 1908.10084, and Liu, Y. et al. arXiv: 1907.11692, respectively). In one example, training and testing data may be user utterance data, generated using OpenAI's gpt-3.5-turbo model, based on clinically acquired conversational history (comprising the last 5 conversational turns in a dialogue). Examples (acquired from clinical trials, from real users) of suitable training data include the following user utterances:
In the case of a SentenceTransformers model, the training, testing, and inference data undergo dimensionality reduction, taking the locally linear embeddings from approximately 500 dimensions down to approximately 100 dimensions.
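By way of illustration only, this dimensionality-reduction step might be sketched as follows, assuming scikit-learn's locally linear embedding implementation; the neighbour count is an illustrative assumption:

import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

def reduce_embeddings(embeddings: np.ndarray, target_dim: int = 100) -> np.ndarray:
    # Reduce, e.g., ~500-dimensional sentence embeddings to ~100 dimensions
    # ahead of the closeness analysis described below.
    lle = LocallyLinearEmbedding(n_neighbors=15, n_components=target_dim)
    return lle.fit_transform(embeddings)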
The closeness analysis may be performed using the Mahalanobis distance (see, for example, the work of Podolskiy, A. et al. arXiv: 2101.03778). The dimensionally reduced input embeddings may be compared to a set of embeddings representing in-domain (classed as acceptable) user inputs or utterances. Briefly, the Mahalanobis distance, d, (serving as an out-of-domain score) may be defined as:

\[ d(x) = \min_{c} \, (\psi(x) - \mu_c)^{\top} \, \Sigma^{-1} \, (\psi(x) - \mu_c) \]

where ψ(x) is a vector representation (embedding) of the utterance x, μ_c is the centroid for a class c, and Σ is the covariance matrix. The estimations of μ_c and Σ may be defined as:

\[ \mu_c = \frac{1}{N_c} \sum_{x \in X_c} \psi(x), \qquad \Sigma = \frac{1}{N} \sum_{c} \sum_{x \in X_c} (\psi(x) - \mu_c)(\psi(x) - \mu_c)^{\top} \]

where \( X_c = \{ x \mid (x, y) \in D_{\mathrm{in}}, \, y = c \} \), N is the total number of utterances, and N_c is the number of utterances belonging to class c.
A particular utterance (user input) may be considered to be an outlier if the Mahalanobis distance exceeds a predetermined threshold. Table 1 below indicates example user inputs, the calculated Mahalanobis distance, and the resultant classification as being an outlier or being an in-domain utterance (and thus suitable for subsequent processing). In this example, the predetermined threshold may be set to 10.
Of course, other closeness analysis techniques and metrics (such as cosine similarity) may be used. The above-described closeness analysis technique is used to classify user input 30 as either acceptable or unacceptable; more granular classification is also possible, for instance, classifying user input 30 given in response to a particular category of question. In such cases, the “acceptability” of a user input 30 may be best determined by calculation of the Mahalanobis distance from the input embedding to a specific class of known utterances in embedding space.
As another example of a suitable first filter, the SentenceTransformer “all-MiniLM-L6-v2”, may be used, which is configured to map sentences and paragraphs to a 384 dimensional dense vector space and may be used for tasks like clustering or semantic search. A predetermined threshold of a Mahalanobis distance of 2 may be used: if the distance of a user input 30 in embedding space is greater than 2 from the acceptable cluster, the user input 30 may be deemed an outlier and too far from the intended-use user utterance distribution.
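By way of illustration only, such an input outlier filter might be sketched as follows, assuming the sentence-transformers package; the model name and the threshold of 2 follow the example above, while the in-domain utterances are illustrative placeholders:

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings

# Embeddings of known-acceptable ("in-domain") user utterances (placeholders).
in_domain = model.encode([
    "I felt anxious before my appointment.",
    "My sleep has been poor this week.",
    "Work has been stressful lately.",
])
mu = in_domain.mean(axis=0)  # centroid of the acceptable cluster
cov_inv = np.linalg.pinv(np.cov(in_domain, rowvar=False))  # pseudo-inverse for stability

def mahalanobis(x):
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

def is_outlier(utterance, threshold=2.0):
    # Deem the user input out-of-domain if its Mahalanobis distance from the
    # acceptable cluster exceeds the threshold.
    return mahalanobis(model.encode(utterance)) > threshold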
As seen in the above examples, this first filter 31 is particularly suited to improve reliability and safety through providing means to prevent prompt injections (recall that the user input 30 is to be sent to a generative acknowledgement model 32), which is a particular concern in the context of automated conversational healthcare.
As a further example of a first filter 31 (to be used in addition to, or as an alternative to, the classification-based filter described above), one may use a statistical n-gram (including 1-, 2-, 3-, and 4-gram) language model. Using all patient utterances in a human-human corpus (that is, a database of human-human dialogue), the language model can provide the per-word perplexity of any new utterance given the model (where perplexity is a measurement of how well a probability distribution or probability model predicts a sample). Highly improbable user input 30 utterances are awarded a high score and, above a threshold, flagged as unacceptable.
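By way of illustration only, such an n-gram filter might be sketched as follows, here using a bigram model with add-one smoothing; the corpus and perplexity threshold are illustrative placeholders:

import math
from collections import Counter

# Tokenised patient utterances from a human-human corpus (placeholders).
corpus = [["my", "dog", "was", "sick", "this", "week"],
          ["i", "felt", "anxious", "before", "my", "meeting"]]
unigrams = Counter(w for s in corpus for w in s)
bigrams = Counter((a, b) for s in corpus for a, b in zip(s, s[1:]))
V = len(unigrams)  # vocabulary size for add-one smoothing

def perplexity(tokens):
    # Per-word perplexity of an utterance under the smoothed bigram model.
    log_p = 0.0
    for a, b in zip(tokens, tokens[1:]):
        log_p += math.log((bigrams[(a, b)] + 1) / (unigrams[a] + V))
    return math.exp(-log_p / max(len(tokens) - 1, 1))

def is_unacceptable(tokens, threshold=500.0):
    # Highly improbable utterances receive a high perplexity and are flagged.
    return perplexity(tokens) > threshold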
In the present example, when the input (or embedding thereof) is found by the first filter 31 to be acceptable, the input is passed to a generative acknowledgement model 32. As an example, the generative acknowledgment model 32 may be implemented as a large language model (LLM), for instance using a Llama2 model (see the work of Touvron, H. et al. arXiv: 2307.09288) or OpenAI's gpt-3.5-turbo model. Pre-trained models may be finetuned using training datasets including clinician-approved acknowledgements.
In the present example, a second filter 33 (“filter #2”) may be configured to identify any unacceptable outputs from the generative acknowledgement model. The second filter 33 in this example is an output outlier filter. That is, the second filter 33 may be a classification model, configured to identify outlying outputs relative to expected, usual or conventional outputs (acknowledgements). An embedding-based machine learning classification model may be used for this purpose, as described above in the context of the first filter. The second filter 33 may be configured to classify output embeddings as acceptable or unacceptable via closeness analysis between the output embedding and known categories in the output embedding domain. Table 2 below indicates example generated acknowledgments, the calculated Mahalanobis distance, and the resultant classification as being an outlier or being an in-domain acknowledgment (and thus suitable for subsequent processing). In this example, the predetermined threshold may be set to 20 (i.e., the predetermined threshold does not need to be the same as the predetermined threshold for the first filter).
As seen in the above examples, this second filter 33 is particularly suited to remove inappropriate acknowledgments, which may cause offence to the user and/or may be deemed clinically unsuitable in the present context. The second filter 33 is able to catch far out-of-distribution generated acknowledgments that are unsuitable or unusual, such as prompt attacks, or otherwise indicative of unexpected, unusual behaviour. However, as suggested in
As another example of a suitable second filter 33, the SentenceTransformer “all-MiniLM-L6-v2”, may be used. A predetermined threshold of a Mahalanobis distance of 2 may be used: if the distance of a generated acknowledgement in embedding space is greater than 2 from the acceptable cluster, the generated acknowledgement may be deemed an outlier and too far from the intended-use acknowledgement distribution.
Training data, used as the in-distribution data, may be acknowledgements generated for users correctly interacting with a conversational computing system implementing a known healthcare pipeline. The intended user utterance data may be taken from real users interacting with such a system. Examples (acquired from clinical trials, from real users) of suitable training data include the following generated acknowledgements:
In addition, manual checking of the generated responses used as training data may be performed to ensure generated acknowledgments are in-domain (i.e., generally acceptable).
As a further example of a second filter 33 (to be used in addition to, or as an alternative to, the classification-based filter described above), one may use a statistical n-gram (including 1-, 2-, 3-, and 4-gram) language model, as described above.
In the present example, a third filter 35 (“filter #3”) may be configured to process a combination of input and generated acknowledgment, to assess the clinical acceptability of the generated acknowledgment specifically in the context of the user input. The third filter 35 in this example is a first combination filter, configured to identify outlying combinations (of input and output) relative to expected, usual, or conventional combinations. An embedding-based machine learning classification model may be used for this purpose, as described above in the context of the first filter 31 and the second filter 33. The third filter 35 may be configured to perform a binary classification on combinations, as acceptable or unacceptable.
As an example, an embedding-based model for the third filter 36 may be implemented as a BERT-based model, for instance using a DistilBERT model (see the work of Sanh, V. et al. arXiv: 1910.01108), which is known for its small size, and low computational cost to implement.
In this case, the third filter 35 may be trained on generated data to prevent specific suboptimal effects based on clinically relevant concepts, such as negative reinforcement. For instance, to create acceptable acknowledgements to real user utterances, the inventors used a version of a contextual acknowledgement prompt in an LLM (GPT-3.5). To generate unacceptable examples, the inventors used another “negative” prompt, which would, for example, break one of a number of preconfigured clinical rules (e.g., give medical advice). One of the prompts for the LLM used to generate unacceptable acknowledgements is as follows:
The above prompt produced the following examples of unacceptable generated acknowledgments for three different user utterances:
Examples of acceptable generations for three different user utterances include:
Following verification by a clinician and data cleansing (e.g., removing incorrectly labelled data), the pre-trained DistilBERT model was fine-tuned using concatenated user input 30 and generated acknowledgment with the following hyperparameters:
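By way of illustration only, such a fine-tuning step might be sketched as follows, assuming the Hugging Face transformers and datasets packages; the hyperparameter values and training example shown here are illustrative assumptions, not the specific values used:

from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)  # acceptable / unacceptable

# Each example concatenates a user input and its generated acknowledgement.
data = Dataset.from_dict({
    "text": ["My dog was sick this week. [SEP] That sounds really hard."],
    "label": [1],  # 1 = acceptable (placeholder example)
})
data = data.map(lambda e: tokenizer(e["text"], truncation=True,
                                    padding="max_length", max_length=128),
                batched=True)

args = TrainingArguments(output_dir="combination-filter", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
Trainer(model=model, args=args, train_dataset=data).train()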
The following results in Table 3 were found for test data for the third filter 35 (using a dataset comprising 4038 datapoints, and using an 80:20 train:test split, such that 3230 datapoints were used as training data):
In the present example, a fourth filter 36 (“filter #4”) may be configured to process a combination of input and generated acknowledgment, to assess the clinical acceptability of the generated acknowledgment specifically in the context of the user input. The fourth filter 36 in this example is a second combination filter, configured to classify combinations (of input and output) in respect of potential undesirable clinical categorisations.
An LLM-based machine learning model may be used for this purpose. In one example, a pretrained instance of OpenAI's gpt-4 model may be used. Of course, alternative LLMs may be used, and fine-tuning may be applied so as to enable higher quality results and to enable lower latency requests. A suitable LLM prompt to inform the LLM of clinical categorisations and to instruct the model to perform such classification is as follows:
Of course, additional or alternative conditions and definitions may be included within the LLM prompt as required. As an example of application of the above prompt to a combination of user input 30 and generated acknowledgment, consider the example user input 30 utterance “I am really worried about my meeting later, what if it doesn't go well and I look really silly”, and a generated acknowledgement “Don't worry about it, it will be fine”. A pretrained gpt-4 model may provide the following output:
That is, the example fourth filter 36 identifies the example generated acknowledgment (generated in response to the example user input) as containing language that encourages a control agenda (as defined in the example prompt). The example generated acknowledgement does not ask questions, and does not include emojis or make jokes. Again, other categories of unacceptable outputs may be defined, such as those that include discriminatory language or other forms of socially inappropriate language.
Where any condition is satisfied, the fourth filter 36 may indicate the combination as inappropriate. In another arrangement, a predetermined number of conditions (e.g., 2) may require satisfaction in order to classify the combination as inappropriate or unacceptable.
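By way of illustration only, such an LLM-based combination filter might be sketched as follows, assuming the openai Python package; the judge prompt, condition list, and model name here are illustrative assumptions rather than the prompt set out above:

from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = (
    "You check a draft acknowledgement against these conditions: "
    "(1) it asks the user a question; (2) it gives medical advice or a "
    "personal opinion; (3) it encourages a control agenda. "
    "Answer with the numbers of any satisfied conditions, or 'none'."
)

def combination_is_appropriate(user_input, acknowledgement):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": JUDGE_PROMPT},
            {"role": "user", "content": (f"User: {user_input}\n"
                                         f"Acknowledgement: {acknowledgement}")},
        ],
    )
    verdict = response.choices[0].message.content.strip().lower()
    return verdict == "none"  # no condition satisfied, so the combination passes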
Throughout the automated conversational healthcare pipeline, wherever a filter is triggered so as to identify a user input, a generated acknowledgement, or a combination thereof as inappropriate or unacceptable, the automated conversational healthcare pipeline may continue without use of the generated acknowledgement. Where no filter is triggered, the generated acknowledgement may be output to the user and the automated conversational healthcare pipeline may continue as required.
As indicated in
Briefly, the risk listener may be a functional unit (a bot, a unit, a software application, etc.) of the pipeline, which may comprise a natural language understanding module configured to receive the input from the user and, if present within the input, identify an intent indicating a risk. For example, in a clinical setting, therapists delivering care have a responsibility to monitor their patient for signs of risk to self or others; a similar responsibility may be assigned to the risk listener. The watchful monitoring of user inputs for intents indicating a risk may be permanently present throughout a clinical conversation, regardless of the point currently reached in the interaction. The risk listener may be triggered by user inputs that include potential intents indicating a risk. Once triggered, the risk listener may be selected to provide an output to the user. For instance, the user may be provided with functionality to then decide whether to pause the activity or to continue with the current activity. The risk listener may be further configured to take an action, wherein the action is based, at least in part, on the identified risk. The actions may include notifying the user's treating clinician, launching a crisis management procedure, involving clinical personnel, and/or calling out to the local emergency services, as appropriate.
Typical AI approaches to this risk listening use rule-based systems (i.e., exact matching of keywords and phrases) and, while they tend to have high positive predictive value (i.e., the instances they identify are indicative of risk), they often miss the myriad ways in which risk can be expressed in language (e.g., common misspellings of ‘suicide’ and/or where the wider context is needed to correctly identify risk). Compared to their rule-based counterparts, however, machine learning systems are better at generalising to unseen data and therefore tend to have higher sensitivity (recall) for classification tasks. For this use case, given that the cost of a false positive (i.e., highlighting an SOS feature when not appropriate) is likely lower than that of a false negative (i.e., not highlighting an SOS feature when appropriate), a preferred arrangement uses a hybrid approach. One hybrid approach to identifying mentions of clinical risk involves both a rule-based component (i.e., based on keywords and phrases developed by clinicians that are highly indicative of risk regardless of context) and a machine learning component that is trained from data. With this hybrid approach, the rule-based component improves identification of common, stereotypical mentions of risk, and the machine learning component increases the sensitivity of the system.
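By way of illustration only, such a hybrid risk listener might be sketched as follows; the keyword patterns and the classifier interface are illustrative placeholders:

import re

# Clinician-curated patterns that are highly indicative of risk (placeholders).
RISK_PATTERNS = [re.compile(p, re.IGNORECASE)
                 for p in (r"\bsuicid\w*", r"\bhurt myself\b")]

def rule_based_risk(utterance):
    return any(p.search(utterance) for p in RISK_PATTERNS)

def detect_risk(utterance, ml_risk_score):
    # ml_risk_score: callable returning a risk probability from a trained
    # classifier. Either component may trigger the risk listener: the rules
    # provide precision on stereotypical mentions, the classifier adds recall.
    return rule_based_risk(utterance) or ml_risk_score(utterance) > 0.5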
Where a risk is identified, the risk listener may output an SOS message to the user or a safeguarding message to relevant personnel. In this event, it is not appropriate for the system to provide a generated acknowledgement, as it is desirable to stick to preconfigured responses, which are known to be clinically suitable. A pre-programmed acknowledgement to the user input is then output and the known automated conversational healthcare pipeline continues through generation and output of a next output.
Where no risk is identified in the user input, the automated system performs an assessment as to whether the input is acceptable. For instance, the automated system tests if the user input appears to be a prompt injection. This assessment is performed by a computationally cheap classifier, as described above in the context of the first filter 31 (input outlier). Where the user input is deemed to be an outlier, a pre-programmed acknowledgement to the user input is then output and the known automated conversational healthcare pipeline continues. Where the user input is not deemed to be an outlier, the automated system passes the user input through an LLM to obtain a generated acknowledgement.
The automated system then performs an assessment as to whether the generated acknowledgment is an outlier. Again, this assessment is performed by a computationally cheap classifier, as described above in the context of the second filter 33 (output outlier). For instance, the automated system tests if the generated acknowledgement is relevant and/or contains therapeutic content. Where the generated acknowledgement is deemed to be an outlier, a pre-programmed acknowledgement to the user input is then output and the known automated conversational healthcare pipeline continues through generation and output of a next output.
Where the generated acknowledgement is not deemed to be an outlier, the automated system performs an assessment on the user input and the generated acknowledgement to determine if the combination is appropriate according to criteria. This assessment is performed by a computationally cheap classifier, as described above in the context of the third filter 35 (first combination filter). For instance, the automated system may query if the combination is coherent, poses further questions, is socially appropriate, and/or is clinically appropriate. Where the generated acknowledgement in combination with the user input is deemed not to be appropriate (e.g., fails to satisfy preconfigured conditions), a pre-programmed acknowledgement to the user input is then output and the known automated conversational healthcare pipeline continues through generation and output of a next output.
Where the generated acknowledgement in combination with the user input is deemed to be appropriate by the third filter 35, the automated system may perform a further assessment on the user input and the generated acknowledgement to determine if the combination is appropriate according to criteria. This assessment is performed by an LLM-based filter, as described above in the context of the fourth filter 36 (second combination filter). Where the generated acknowledgement in combination with the user input is deemed not to be appropriate by the fourth filter 36 (e.g., fails to satisfy preconfigured conditions in accordance with regulatory requirements), a pre-programmed acknowledgement to the user input is then output and the known automated conversational healthcare pipeline continues. Where the generated acknowledgement in combination with the user input is deemed to be appropriate, the automated system may be configured to output the generated acknowledgement and to continue with the known automated conversational healthcare pipeline through generation and output of a next output.
As this LLM-based filter is comparatively computationally expensive, the automated system may only initiate the check if the system deems there to be sufficient time in which to acquire a determination. For instance, in order to ensure that the user is not faced with a lengthy wait for an acknowledgement to their input, the automated system may only initiate the check if the system believes an LLM-based response will be provided before elapse of a preconfigured timer (e.g., 5 seconds, which is found to be an acceptable time delay). If, for instance, the user input is relatively long, such that the automated system is aware that an LLM-based response will take a time exceeding such a preconfigured timer, the automated system may output the generated acknowledgement without performing the illustrated further assessment. Alternatively, the automated system may output a pre-programmed acknowledgement to the user input and continue with the known automated conversational healthcare pipeline.
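By way of illustration only, the time-constraint check might be sketched as follows; the linear latency estimate (standing in for a lookup of historical processing times) and its coefficients are illustrative assumptions, while the five-second budget follows the example above:

TIME_BUDGET_S = 5.0  # preconfigured timer from the example above

def expected_llm_latency(user_input, acknowledgement):
    # Hypothetical linear fit to historical processing times for inputs and
    # acknowledgements of varying lengths; the coefficients are placeholders.
    return 0.8 + 0.01 * (len(user_input) + len(acknowledgement))

def should_run_llm_filter(user_input, acknowledgement):
    # Initiate the LLM-based check only if a determination is expected in time.
    return expected_llm_latency(user_input, acknowledgement) <= TIME_BUDGET_S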
Again, as the LLM-based filter is comparatively computationally expensive (that is, relative to classification-based filters), in one arrangement, the insights derived from an LLM-based filter may be used as training data for a cheaper classification filter. For instance, training data accumulated from the LLM-based filter may be used to train and to improve the first classification-based combination filter. In the above-described example, the first combination filter is configured to perform a binary classification on combinations, as acceptable or unacceptable. However, with the insights derived from the LLM-based filter, the first combination filter may, instead, be configured to perform classification in regard to the labels produced by the fourth filter 36, such as the conditions set out in the example LLM prompt above.
As indicated with dashed lines in
The frontend GUI may be resident on a user device, typically a mobile phone, but potentially also a personal digital assistant, personal computer, laptop computer or other computing device. The user device may communicate with the backend via the internet, using standard communication protocols, such as TCP/IP, HTTP, and REST.
The backend in this example comprises a collection of software services hosted within a cloud computing platform, such as Microsoft Azure (of course, implementation on a single fixed server is also possible, as the skilled person will be aware). Communication with the backend may be routed via an endpoint web app, which coordinates the activities of the various backend services in a way that provides the desired behaviour for the user, as detailed next.
The endpoint web app may access the authentication service to validate the user's credentials and confirm their permissions to access the conversational agent. The authentication service may be implemented using existing off-the-shelf technology, such as Azure Active Directory, or other third-party solutions, like Auth0.
After authenticating the user, the endpoint web app may connect to a patient management service to log the user's access of the conversational agent, which may be required for administrative purposes, such as billing. Depending on the clinical pathway set in place, the user may be required to complete a series of tasks, such as filling in a range of clinical questionnaires. The results of these may also be stored via a patient management service. These results may later be accessed as additional context as input into the generative acknowledgement model.
After completing the administrative tasks required when access to the system is initiated, the user is given access to the functionality of the conversational agent. The endpoint web app does this by routing frontend requests to an orchestrator bot, which welcomes the user to the new session, retrieves any pre-existing conversation state, and hands over control of the conversation to the appropriate sub-dialogue unit.
Any bots or units, such as orchestrators and sub-dialogue units, may be implemented using the Microsoft Bot Framework, or other third-party solution, such as RASA. Filtering mechanisms may be implemented as custom-built components hosted within the Azure ML service. Alternatively, they could be based on Azure Cognitive Services for Language Understanding, or be suitably configured LLMs such as those offered by the Azure OpenAI Service.
In order to allow continuation of previously interrupted conversations, sub-dialogue units are able to persistently store the conversation state for each user. This functionality may be implemented using an Azure Cosmos DB datastore, or some other similar solution.
If multimedia content is part of the designed user experience (such as images), this content may be stored within Azure Blob Storage and made available to sub-dialogue units in this manner. Sub-dialogue units have the option to retrieve such content from storage and return it as part of their response to the user's request.
Along with routing requests to the orchestrator, or the currently active sub-dialogue unit, the endpoint web app may also send requests to the background units. If one or more of the background units identifies an intent within the latest user utterance, the endpoint web app may decide to cede control of the conversation to one of the background units. To make that determination, the endpoint web app may use an adjudicator service, which may itself be implemented as a web app providing a REST API. The adjudicator implements the decision logic that takes into account the relative priority of all the bots that are able to provide a response, their confidence (i.e., detection probability) for the intent they have each identified, and a set of rules implementing other relevant business logic.
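By way of illustration only, the adjudicator's decision logic might be sketched as follows; the field names, confidence threshold, and tie-breaking rule are illustrative assumptions:

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Candidate:
    bot_name: str
    priority: int      # e.g., a risk listener outranks sub-dialogue units
    confidence: float  # detection probability for the identified intent

def adjudicate(candidates: List[Candidate], min_confidence: float = 0.7) -> Optional[str]:
    # Discard low-confidence candidates, then prefer the highest-priority bot,
    # breaking ties by confidence; None leaves the currently active unit in control.
    eligible = [c for c in candidates if c.confidence >= min_confidence]
    if not eligible:
        return None
    return max(eligible, key=lambda c: (c.priority, c.confidence)).bot_name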
For monitoring and compliance reasons it may be necessary to maintain comprehensive activity logs that keep a record of all the interactions between the user and the system, and all the decisions made by the system. One way to implement this is to represent each system event as a snippet of XML or JSON content; these event representations are then collected and persistently stored in Azure Blob Storage, or another storage solution.
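By way of example, a single system event represented as a JSON snippet might be produced as follows; the field names are illustrative.

```python
# Illustrative construction of one activity-log event as a JSON snippet;
# field names are examples. Events would be appended to persistent
# storage such as Azure Blob Storage.
import datetime
import json


def make_event(kind: str, payload: dict) -> str:
    return json.dumps({
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "kind": kind,  # e.g., "user_utterance", "adjudication"
        "payload": payload,
    })


print(make_event("adjudication", {"winner": "risk-monitor", "confidence": 0.72}))
```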
As illustrated in
As seen, across all descriptive terms, users consistently rate the healthcare pipeline with generated acknowledgements higher than the pipeline without them. Approximately 10% of all generated acknowledgements in this example were not ideal: they included repetitions, parroted the user input, or were slightly out of context. Of the 747 generated acknowledgements in this example, none were considered to be “risky” by clinically trained annotators.
Studies show that patients who type more in a first session are more likely to engage in treatment, and thus the method for generating an acknowledgement in an automated conversational healthcare pipeline may demonstrably improve user engagement with a treatment protocol.
For example, an embodiment may be composed of a network of such computing devices. Optionally, the computing device also includes one or more input mechanisms such as keyboard and mouse 996, and a display unit such as one or more monitors or screens 995. The components are connectable to one another via a bus 992.
The memory 994 may include a computer readable medium, a term which may refer to a single medium or multiple media (e.g., a centralised or distributed database and/or associated caches and servers) configured to carry computer-executable instructions or have data structures stored thereon. Computer-executable instructions may include, for example, instructions and data accessible by and causing a general purpose computer, special purpose computer, or special purpose processing device (e.g., one or more processors) to perform one or more functions or operations. Thus, the term “computer-readable storage medium” may also include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methods of the present disclosure. The term “computer-readable storage medium” may accordingly be taken to include, but not be limited to, solid-state memories, optical media and magnetic media. By way of example, and not limitation, such computer-readable media may include non-transitory computer-readable storage media, including Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory devices (e.g., solid state memory devices).
The processor 993 is configured to control the computing device and execute processing operations, for example executing code stored in the memory to implement the method for generating an acknowledgement in an automated conversational healthcare pipeline and the associated filtering mechanism described here and in the claims. The memory 994 stores data being read and written by the processor 993. As referred to herein, a processor may include one or more general-purpose processing devices such as a microprocessor, central processing unit, or the like. The processor may include a complex instruction set computing (CISC) microprocessor, reduced instruction set computing (RISC) microprocessor, very long instruction word (VLIW) microprocessor, or a processor implementing other instruction sets or processors implementing a combination of instruction sets. The processor may also include one or more special-purpose processing devices such as an ASIC, an FPGA, a digital signal processor (DSP), network processor, or the like. In one or more embodiments, a processor is configured to execute instructions for performing the operations and steps discussed herein.
Display 995 may display a user interface controlled by the conversational agent and provide the frontend introduced above. Input 996, in the form of a touchscreen, a screen and keyboard, and/or voice input, may be used for user input. The user interface may be embodied as a user app shown on the display and optionally connected to the audio input/output of the user device for voice input and audio output. Local storage, for example of a user identification and/or settings, may be provided by memory 994, and processor 993 may carry out background functions. The core functionality (acknowledgement generation, for example) is preferably implemented remotely from the user device, for example on the cloud.
The network interface (network I/F) 997 may be connected to a network, such as the internet, and is connectable to other such computing devices via the network. The network I/F 997 may control data input/output from/to other apparatus via the network. Other peripheral devices such as microphone, speakers, etc. may be included in the computing device.
A filtering mechanism module may comprise processing instructions stored on a portion of the memory 994, the processor 993 to execute the processing instructions, and a portion of the memory 994 to store filtering model details, such as weights, biases, and other information concerning the classifier architecture and/or LLM architecture during the execution of the processing instructions. The classifier and/or LLM weights and biases of the filtering mechanism may be stored on the memory 994 and/or on a connected storage unit, and may be transmitted, transferred or otherwise communicated to further components.
A generative acknowledgement model module may comprise processing instructions stored on a portion of the memory 994, the processor 993 to execute the processing instructions, and a portion of the memory 994 to store generative acknowledgement model details, such as weights, biases, and other information concerning the LLM architecture during the execution of the processing instructions. The LLM weights and biases of the generative acknowledgement model may be stored on the memory 994 and/or on a connected storage unit, and may be transmitted, transferred or otherwise communicated to further components.
Methods embodying the present invention may be carried out on a computing device such as that illustrated in
A method embodying the present invention may be carried out by a plurality of computing devices operating in cooperation with one another. One or more of the plurality of computing devices may be a data storage server storing at least a portion of trained model weights and/or biases, training datasets, model hyperparameters, user utterances, generated acknowledgements, etc.
The various methods described above may be implemented by a computer program. The computer program may include computer code (e.g., instructions) arranged to instruct a computer to perform the functions of one or more of the various methods described above. For example, the steps of the methods described in relation to
In an implementation, the modules, components and other features described herein may be implemented as discrete components or integrated in the functionality of hardware components such as ASICs, FPGAs, DSPs or similar devices.
A “hardware component” is a tangible (e.g., non-transitory) physical component (e.g., a set of one or more processors) capable of performing certain operations and may be configured or arranged in a certain physical manner. A hardware component may include dedicated circuitry or logic that is permanently configured to perform certain operations. A hardware component may comprise a special-purpose processor, such as an FPGA or an ASIC. A hardware component may also include programmable logic or circuitry that is temporarily configured by software to perform certain operations.
In addition, the modules and components may be implemented as firmware or functional circuitry within hardware devices. Further, the modules and components may be implemented in any combination of hardware devices and software components, or only in software (e.g., code stored or otherwise embodied in a machine-readable medium or in a transmission medium).
Unless specifically stated otherwise, as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “receiving”, “determining”, “comparing”, “enabling”, “maintaining”, “identifying”, “obtaining”, “accessing”, or the like, refer to the actions and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel methods and apparatuses described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of methods and apparatus described herein may be made.
| Number | Date | Country | Kind |
|---|---|---|---|
| 2315750.6 | Oct 2023 | GB | national |