DIALOGUE SYSTEM AND A DIALOGUE METHOD

Information

  • Patent Application Publication Number
    20240404514
  • Date Filed
    August 30, 2023
  • Date Published
    December 05, 2024
Abstract
A dialogue system, comprising: an input configured to receive input data relating to speech or text provided by a user; an output configured to provide output data relating to speech or text to a user; and one or more processors, configured to: receive, by way of the input, input data relating to speech or text provided by a user; provide the input data to a subject safety module, the subject safety module configured to receive the input data and evaluate the input data before a system response is output, evaluating the input data comprising performing a first determination on the input data using a matching module and a second determination on the input data using a trained model, wherein the matching module performs the first determination to determine whether the input data matches one or more items from a pre-determined set of one or more items; generate a system response using a first process or using a second process, wherein the first process uses at least one trained language model to generate a dynamically determined system response and wherein the second process retrieves a pre-determined system response, wherein a selection between the first process and the second process is made based on the evaluation of the input data; output, by way of the output, the system response.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to and benefit of UK Patent Application No. 2308287.8 filed Jun. 2, 2023, which is hereby incorporated in its entirety by this reference.


FIELD

The present disclosure relates to a dialogue system and a dialogue method.


BACKGROUND

Dialogue systems provide conversational user interfaces, for example chatbots or voice-based agents, which may be used in many applications. Producing text and dialogue that is human-like has long been a challenge in artificial intelligence. As a result, much research has been done in making humans and machines converse. Early attempts used rule-based models, while more recent advances have relied on more complex neural networks.


The key task of language modelling is to produce output text, given some previous input. As such, language modelling models a sequence of tokens (which may correspond to words) and predicts the next token in this sequence. For example, the sentence “I went to the kitchen to brew myself a cup of” will likely be followed by the word “coffee” or “tea”, but not “car”. This logic can then be extended further, to produce entire sentences, paragraphs or even longer conversations or texts.
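As a purely illustrative sketch of this next-token prediction, consider the following Python fragment; the candidate probabilities are invented for the example above rather than taken from any real model:

    context = "I went to the kitchen to brew myself a cup of"

    # A language model assigns a probability to each candidate next token
    # given the context; a hand-written table stands in for the model here,
    # with values invented purely for illustration.
    next_token_probs = {"coffee": 0.46, "tea": 0.41, "water": 0.10, "car": 0.001}

    # Greedy decoding: pick the most probable continuation.
    prediction = max(next_token_probs, key=next_token_probs.get)
    print(context, prediction)  # expected continuation: "coffee"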


The way that language models learn these statistical dependencies is by observing large amounts of human-written text, like books or internet forums. In simple terms, these models observe word-word or longer-term sentence-sentence pairs, and base their predictions on such observed regularities. As a result, language modelling may be considered as complex pattern matching.


Language models have achieved significant progress in recent years, producing what may appear as human-like text. Firstly, the internet has provided large quantities of text, increasing the amount of training data manifold, and leading to advances in performance. Secondly, specific large neural networks have been developed to deal with the complexities of natural language and increased computing power has made it easier to train them.


Among these neural network architectures are large transformer models. These comprise attention mechanisms within their neural network architecture, and have a large number of trainable parameters, for example tens of billions of parameters. These large models have seen significant gains in performance, being able to produce coherent long-term text completion. Because these systems are so large, this text completion ability also allows them to engage in other text-related tasks, like summarisation or conversations. While some transformers are further fine-tuned on these tasks, they still rely on the same basic prediction principle of language modelling.
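As an illustrative sketch of the attention mechanism at the core of such transformer models, the scaled dot-product attention operation is shown below with NumPy and toy dimensions; this is not the full architecture, only the central computation:

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                           # token-to-token similarities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over keys
        return weights @ V                                         # weighted mix of value vectors

    # Toy example: 4 tokens with 8-dimensional embeddings.
    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
    print(scaled_dot_product_attention(Q, K, V).shape)             # (4, 8)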


Such large language models may have many advantages over older, more classical text generation systems, like rule-based chatbots. For example, the large amounts of training data enable them to understand quite varied forms of language and tone, and make them robust to mistakes in the input text. This also allows them to respond in varied, and potentially highly nuanced, ways, adapting their tone and content rather than relying on formulaic responses. This can make them seem significantly more human-like and empathetic than other systems, which in turn makes them a key candidate for engaging in deeper conversations with humans.


However, for some real-world applications, further fine-tuning training of the large language model may be needed in order to provide the required performance. Since the models are so large, in other words since the number of trainable parameters is so great, such further training may be computationally inefficient.


Furthermore, these architectures may make any sort of explainability difficult. They engage in pure, black-box pattern matching that does not allow for causal explanations of the output utterances. This black-box nature may also make it harder to constrain the model outputs in a useful fashion, potentially letting the models be led astray, or engage in potentially harmful behaviour that they have picked up through their source material.


SUMMARY

In one aspect, there is provided a dialogue system, comprising:

    • an input configured to obtain input data relating to speech or text provided by a user;
    • an output configured to provide output data relating to speech or text to a user;
    • one or more processors, configured to:
      • receive, by way of the input, input data relating to speech or text provided by a user;
      • provide the input data to a first module, the first module comprising a subject understanding module, wherein the subject understanding module comprises one or more subject understanding models, each of the one or more subject understanding models configured to take as input the input data and provide as output subject profile information;
      • generate a system input using the subject profile information;
      • provide the system input to a language model, the language model configured to provide a system response; and
      • output, by way of the output, the system response.


The one or more subject understanding models represent clinical constructs or frameworks. The subject profile information output by these models represents clinical concepts or diagnoses, such as cognitive distortions.


In one example, the system input is a system prompt, wherein the system prompt is provided as input to the language model in order to output a system response. The system prompt may be generated using the input data and the subject profile information.
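One possible, purely illustrative way of assembling such a system prompt from the input data and the subject profile information is sketched below; the function and field names are hypothetical:

    def build_system_prompt(input_data: str, subject_profile: dict) -> str:
        """Combine the user's input and the subject profile information into one prompt."""
        profile_lines = "\n".join(f"- {k}: {v}" for k, v in subject_profile.items())
        return (
            "You are a dialogue system. Take the subject profile below into account.\n"
            f"Subject profile:\n{profile_lines}\n"
            f"User said: {input_data}\n"
            "System response:"
        )

    prompt = build_system_prompt(
        "I always mess everything up.",
        {"cognitive_distortion": "catastrophizing", "sentiment": "negative"},
    )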


In one example, the first module further comprises a subject recommendation module, wherein the subject recommendation module comprises a subject recommendation model configured to take as input the subject profile information and provide as output a subject recommendation, wherein the system prompt is generated using the subject recommendation.


In one example, the language model is a general generative pre-trained transformer model.


The input data may further comprise numeric data.


In one example, the one or more subject understanding models comprises a trained cognitive understanding model configured to take as input the input data, determine one or more classifications, and provide the determined one or more classifications as output subject profile information. The one or more classifications may comprise at least one classification predicting a specific medical diagnosis. The one or more classifications may represent an aspect of a clinical framework or process, or a clinical concept for example.


In one example, the specific medical diagnosis comprises one or more cognitive distortions. In one example, the specific medical diagnosis comprises any or any combination of: catastrophizing; dichotomous thinking; negative filtering; fortune telling; mind reading; and/or personalising.
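For illustration only, a keyword-based placeholder standing in for a trained cognitive understanding model might look as follows; a real implementation would use a trained classifier, and the cue phrases here are invented:

    DISTORTION_CUES = {
        "catastrophizing": ["disaster", "ruined", "worst thing"],
        "dichotomous thinking": ["always", "never", "completely"],
        "negative filtering": ["nothing good", "only bad"],
        "fortune telling": ["bound to fail", "will definitely go wrong"],
        "mind reading": ["they think", "everyone thinks"],
        "personalising": ["my fault", "because of me"],
    }

    def classify_distortions(utterance: str) -> list[str]:
        """Return the cognitive distortion labels whose cue phrases appear in the text."""
        text = utterance.lower()
        return [label for label, cues in DISTORTION_CUES.items()
                if any(cue in text for cue in cues)]

    # e.g. classify_distortions("I never get anything right; it's all my fault.")
    # -> ["dichotomous thinking", "personalising"]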


In one example, the language model is configured to generate the subsequent words in a sequence of text beginning with the system prompt.


In one example, one or more of the one or more subject understanding models predicts information about the subject relating to an aspect of a clinical process. One or more of the one or more subject understanding models may be trained using data that is labelled with information relating to the corresponding aspect.


In one example, the language model is a trained model and one or more of the one or more subject understanding models is a trained model, wherein the one or more trained subject understanding models each has a smaller number of trained parameters than the language model.


In one example, the one or more processors is further configured to receive, at an output safety module, the system response from the language model and evaluate the system response at the output safety module before it is output. The output safety module may be further configured to prevent output of the system response depending on the output of the evaluation.


In one example, the one or more processors is further configured to receive, at a subject safety module, the input data and evaluate the input data before a system response is output. The subject safety module may be further configured to prevent output of a system response depending on the output of the evaluation.


In one example, the one or more subject understanding models comprises a trained behavioural understanding model configured to take as input the input data, determine one or more activity classifications using a trained model, classify the activity classifications as user adaptive or user maladaptive behaviours and provide the determined one or more classifications as output subject profile information.


In one example, the one or more processors are further configured to output, by way of the output, a record comprising one or more system responses provided to the user and the subject profile information used as system input to generate each system response.


In one example, the first module further comprises a recommender module.


In one aspect, there is provided a computer-implemented dialogue method, comprising:

    • receiving, by way of an input, input data relating to speech or text provided by a user;
    • providing the input data to a first module, the first module comprising a subject understanding module, wherein the subject understanding module comprises one or more subject understanding models, each of the one or more subject understanding models configured to take as input the input data and provide as output subject profile information;
    • generating a system input using the subject profile information;
    • providing the system input to a language model, the language model configured to provide a system response; and
    • outputting, by way of an output, the system response.


The dialogue system and dialogue method may be used in a method of automated diagnosis and/or automated treatment for example.


In one aspect, there is provided a computer-implemented method for automated diagnostics, the method comprising:

    • performing the dialogue method; and
    • outputting an indication of one or more diagnoses based on the subject profile information.


In one aspect, there is provided a computer-implemented method for automated treatment, the method comprising:

    • performing the dialogue method; and
    • outputting a treatment plan based on the subject profile information.


In one aspect, there is provided a dialogue system, comprising:

    • an input configured to obtain input data relating to speech, text or numeric data provided by a user;
    • an output configured to provide output data relating to speech, text or numeric data to a user;
    • one or more processors, configured to:
      • receive, by way of the input, input data relating to speech, text or numeric data provided by a user;
      • provide the input data to a first module, the first module comprising a subject understanding module, wherein the subject understanding module comprises one or more subject understanding models, each of the one or more subject understanding models configured to take as input the input data and provide as output subject profile information;
      • generate a system input using the subject profile information;
      • provide the system input to a language model, the language model configured to provide a system response; and
      • output, by way of the output, the system response.


In one aspect, there is provided a dialogue method, comprising:

    • receiving, by way of the input, input data relating to speech, text or numeric data provided by a user;
    • providing the input data to a first module, the first module comprising a subject understanding module, wherein the subject understanding module comprises one or more subject understanding models, each of the one or more subject understanding models configured to take as input the input data and provide as output subject profile information;
    • generating a system input using the subject profile information;
    • providing the system input to a language model, the language model configured to provide a system response; and
    • outputting, by way of the output, the system response.


In one aspect, there is provided a dialogue system, comprising:

    • an input configured to obtain input data relating to speech or text provided by a user;
    • an output configured to provide output data relating to speech or text to a user;
    • one or more processors, configured to:
      • receive, by way of the input, input data relating to speech or text provided by a user;
      • generate subject profile information using a first module, the first module comprising a subject understanding module, wherein the subject understanding module comprises one or more subject understanding models, each of the one or more subject understanding models configured to provide as output subject profile information;
      • generate a system input using the input data and the subject profile information;
      • provide the system input to a language model, the language model configured to provide a system response; and
      • output, by way of the output, the system response.


In one aspect, there is provided a dialogue method, comprising:

    • receiving, by way of an input, input data relating to speech or text provided by a user;
    • generating subject profile information using a first module, the first module comprising a subject understanding module, wherein the subject understanding module comprises one or more subject understanding models, each of the one or more subject understanding models configured to provide as output subject profile information;
    • generating a system input using the input data and the subject profile information;
    • providing the system input to a language model, the language model configured to provide a system response; and
    • outputting, by way of an output, the system response.


According to another aspect, there is provided a non-transitory computer readable storage medium comprising computer readable code configured to cause a computer to perform the above methods.


In one aspect, there is provided a dialogue system, comprising:

    • an input configured to receive input data relating to speech or text provided by a user;
    • an output configured to provide output data relating to speech or text to a user; and
    • one or more processors, configured to:
      • receive, by way of the input, input data relating to speech or text provided by a user;
      • provide the input data to a subject safety module, the subject safety module configured to receive the input data and evaluate the input data before a system response is output, evaluating the input data comprising performing a first determination on the input data using a matching module and a second determination on the input data using a trained model, wherein the matching module performs the first determination to determine whether the input data matches one or more items from a pre-determined set of one or more items;
      • generate a system response using a first process or using a second process, wherein the first process uses at least one trained language model to generate a dynamically determined system response and wherein the second process retrieves a pre-determined system response, wherein a selection between the first process and the second process is made based on the evaluation of the input data;
      • output, by way of the output, the system response.


In one example, responsive to the system response being generated by the second process, the one or more processors are further configured to provide a function to the user to contact a third party.


In one example, responsive to the system response being generated by the second process, the one or more processors are further configured to transmit information comprising the input data to a second user.


In one example, the first determination comprises applying one or more regular expressions.


In one example, the trained model comprises a language model. The second determination may comprise generating a system prompt including instructions to evaluate the input data and the input data, and providing the system prompt to the language model.
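A minimal sketch of how the two determinations could be combined is given below; the regular expressions are hypothetical examples of a pre-determined set of items, and call_language_model stands in for whatever trained model is actually deployed:

    import re

    # Hypothetical pre-determined set of items for the matching module.
    CRISIS_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (
        r"\bhurt myself\b",
        r"\bend it all\b",
        r"\bno reason to live\b",
    )]

    def first_determination(input_data: str) -> bool:
        """Matching module: does the input match any pre-determined item?"""
        return any(p.search(input_data) for p in CRISIS_PATTERNS)

    def second_determination(input_data: str, call_language_model) -> bool:
        """Trained-model check: generate a system prompt including instructions
        to evaluate the input data, and provide it to a language model."""
        prompt = ("Answer YES if the following message indicates the user may be "
                  f"in crisis, otherwise NO.\nMessage: {input_data}")
        return call_language_model(prompt).strip().upper().startswith("YES")

    def evaluate_input(input_data: str, call_language_model) -> bool:
        """The input is flagged if either determination indicates a concern."""
        return first_determination(input_data) or second_determination(input_data, call_language_model)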


In one example, the pre-determined system response is retrieved based on a rule based dialogue flow.


In one example, the dynamically determined system response is generated by generating a system prompt comprising the input data and providing the system prompt to the at least one trained language model.


In one example, the subject safety module is configured to generate a first output based on the evaluation of the input data. The one or more processors may be configured to select the second process if the first output includes an indication that the user is in crisis.


In one example, the one or more processors are configured to select the first process if the first output does not include an indication that the user is in crisis.
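The selection between the two processes could, for instance, be expressed as the following hypothetical routing logic, where generate_with_llm and retrieve_predetermined stand in for the first and second processes respectively:

    def generate_system_response(input_data: str, evaluation: dict,
                                 generate_with_llm, retrieve_predetermined) -> str:
        """Select between the first (dynamic) and second (pre-determined) process."""
        if evaluation.get("user_in_crisis"):
            # Second process: retrieve a pre-determined response,
            # e.g. via a rule-based dialogue flow.
            return retrieve_predetermined(input_data)
        # First process: dynamically generate a response with the language model.
        return generate_with_llm(input_data)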


In one example, the system response is a first system response. The one or more processors may be further configured to:

    • provide the first system response to an output safety module, wherein the output safety module is configured to evaluate the first system response and provide an evaluation result;
    • responsive to a first evaluation result, output, by way of the output, the first system response;
    • responsive to a second evaluation result, generate a second system response based on the second evaluation result;
    • output, by way of the output, the second system response.


In one example, generating the system response using the first process comprises:

    • receiving, at a first module, structured information comprising information relating to a clinical state of the user, the structured information being generated from the input data, the first module comprising a subject understanding module and a subject recommendation module, wherein the subject understanding module comprises one or more subject understanding models, each of the one or more subject understanding models configured to take as input the structured information and provide as output subject profile information;
    • generating, at the subject understanding module, subject profile information based on the structured information;
    • determining a subject recommendation comprising an intervention for the user, determining the subject recommendation comprising providing the subject profile information as input to the subject recommendation module; and
    • wherein the system response is output as a part of a dialogue with the user, the system response delivering an intervention.


In one example, the subject recommendation module comprises decision logic configured to map subject profile information to a pre-determined list of interventions.
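Such decision logic might, purely as an illustration, be a simple lookup from classifications in the subject profile information to a pre-determined list of interventions; the labels and intervention names below are hypothetical:

    INTERVENTIONS = {
        "catastrophizing": "cognitive restructuring exercise",
        "negative activity": "behavioural activation plan",
        "negative sentiment": "mood monitoring exercise",
    }

    def recommend_intervention(subject_profile: dict) -> str | None:
        """Map subject profile information to a stored intervention, if any applies."""
        for classification, intervention in INTERVENTIONS.items():
            if subject_profile.get(classification):
                return intervention
        return None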


In one example, the one or more subject understanding models comprises two or more subject understanding models, each of the subject understanding models comprising a trained model configured to determine a classification, wherein each classification comprises a prediction of a specific diagnosis.


In one example, the subject recommendation module is configured to identify an intervention from a plurality of stored interventions, wherein each stored intervention is associated with a specific stored diagnosis corresponding to a classification.


In one example, the one or more processors are further configured to provide the input data to an interaction module, the interaction module comprising at least one model configured to generate the structured information by generating one or more questions to be provided to the user and storing the input data received in response to the one or more questions as the structured information.


In one example, the subject recommendation module is further configured to determine whether the subject profile information satisfies one or more thresholds, and responsive to determining that the subject profile information does not satisfy the one or more thresholds, instruct the interaction module to generate one or more further questions to be provided to the user.


In one example, the interaction module is further configured to generate a label corresponding to each of the one or more questions, the label indicating a topic associated with the question, and store the input data received in response to the one or more questions associated with the topic as the structured information.


In one example, a first topic corresponds to an indication that the question relates to thoughts, a second topic corresponds to an indication that the question relates to situations and a third topic corresponds to an indication that the question relates to feelings.


In one example, the subject understanding module comprises one or more of: a first model, a second model, a third model and a fourth model, wherein the first model is configured to detect whether information from the structured information corresponds to a thought, the second model is configured to determine whether a thought is a distorted thought, the third model is configured to determine whether information from the structured information corresponds to a positive sentiment or a negative sentiment, and the fourth model is configured to determine whether information from the structured information corresponds to a positive activity or a negative activity.
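A sketch of how the outputs of these four models could be collected into subject profile information is shown below; the four model arguments are abstract callables standing in for the trained models described above:

    from dataclasses import dataclass

    @dataclass
    class SubjectProfile:
        """Subject profile information assembled from the four model outputs."""
        is_thought: bool
        is_distorted: bool
        sentiment: str   # "positive" or "negative"
        activity: str    # "positive" or "negative"

    def run_subject_understanding(item: str, thought_model, distortion_model,
                                  sentiment_model, activity_model) -> SubjectProfile:
        """Apply the four models to one piece of structured information."""
        is_thought = thought_model(item)
        return SubjectProfile(
            is_thought=is_thought,
            is_distorted=distortion_model(item) if is_thought else False,
            sentiment=sentiment_model(item),
            activity=activity_model(item),
        )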


In one example, the subject recommendation module is further configured to determine the subject recommendation based on stored historical data associated with one or more previous interventions for the user.


In one example, the subject recommendation module includes a trained model.


In one example, the subject recommendation module is further configured to determine the subject recommendation based on one or more of: stored diagnosis information associated with the user, stored demographic data associated with the user, stored historical data associated with one or more previous interventions for the user, and feedback from the user in relation to one or more previous interventions for the user.


In one example, the recommender model is configured to predict the intervention for the user so as to maximise at least one of three target metrics, at least two of the three target metrics, or three of the three target metrics, wherein a first metric of the three target metrics relates to therapeutic alliance, a second metric of the three target metrics relates to patient outcomes, and a third metric of the three target metrics relates to insights derived by the subject understanding module.
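One hedged reading of this is a recommender that scores each candidate intervention on the three target metrics and picks the highest combined score; predict_metrics below stands in for the trained recommender model, and the equal weighting is an assumption:

    def score_intervention(intervention: str, predict_metrics) -> float:
        """Combine the three target metrics (therapeutic alliance, patient
        outcomes, and insights derived by the subject understanding module)."""
        alliance, outcome, insight = predict_metrics(intervention)
        return alliance + outcome + insight   # assumed equal weighting

    def choose_intervention(candidates: list[str], predict_metrics) -> str:
        """Pick the candidate intervention with the highest combined score."""
        return max(candidates, key=lambda c: score_intervention(c, predict_metrics))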


In one example, the interaction module is further configured to:

    • determine whether the input data from the user includes information relating to a clinical state of the user;
    • in response to determining that the input data includes information relating to the clinical state of the user, generating the structured information from subsequent input data received from the user; and
    • output, by way of the output, information explaining the intervention to the user.


In one example, the interaction module is further configured to generate a first system prompt to instruct a language model to determine whether the input data includes information relating to the clinical state of the user, wherein the first system prompt comprises a first flag to be output by the language model when it determines that the input data includes information relating to the clinical state of the user, the dialogue history, and an instruction to the language model to generate a system response to the user input, wherein in response to determining that the first flag is output, the interaction module is further configured to generate a second system prompt, wherein the second system prompt instructs the language model to generate the one or more questions to be provided to the user.
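A minimal sketch of this flag-based prompting flow is given below; the flag tokens and prompt wording are hypothetical, and call_language_model stands in for the deployed language model:

    CLINICAL_FLAG = "[CLINICAL_STATE]"            # hypothetical first flag
    INTERVENTION_FLAG = "[INTERVENTION_REQUEST]"  # hypothetical second flag

    def first_system_prompt(dialogue_history: str, user_input: str) -> str:
        return ("You are a dialogue system. Reply to the user.\n"
                f"If the message contains information about the user's clinical state, "
                f"prefix your reply with {CLINICAL_FLAG}.\n"
                f"If the user requests a specific intervention, prefix it with {INTERVENTION_FLAG}.\n"
                f"Dialogue history:\n{dialogue_history}\nUser: {user_input}\nSystem:")

    def handle_turn(dialogue_history: str, user_input: str, call_language_model) -> str:
        reply = call_language_model(first_system_prompt(dialogue_history, user_input))
        if reply.startswith(CLINICAL_FLAG):
            # Second system prompt: ask the model for assessment questions.
            return call_language_model("Generate one or more questions to ask the user "
                                       "about their situation, thoughts and feelings.")
        if reply.startswith(INTERVENTION_FLAG):
            return call_language_model("Generate a system response delivering the requested intervention.")
        return reply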


In one example, the first system prompt includes a second flag to be output by the language model when it determines that the user has requested a specific intervention, wherein in response to determining that the second flag is output, the interaction module is further configured to instruct the language model to generate a system response delivering the intervention.


In one example, the system responses deliver a sequence of interventions.


In one example, generating the system response using the first process comprises:

    • providing the input data to an interaction module, the interaction module comprising at least one model configured to:
      • determine whether the input data from the user includes information relating to a clinical state of the user;
      • in response to determining that the input data includes information relating to the clinical state of the user, generate structured information from subsequent input data received from the user, wherein the structured information comprises information to identify an intervention for the user;
      • output, by way of the output, information explaining the intervention to the user; and
    • wherein the system response is output as a part of a dialogue with the user, the system response delivering an intervention.


In one example, the at least one model comprises a language model, wherein the language model is a general generative pre-trained transformer model.


In one example, the at least one model is a model trained using supervised learning.


In one example, the interaction module is further configured to generate a first system prompt to instruct the language model to determine whether the input data includes information relating to the clinical state of the user.


In one example, the first system prompt includes a first flag to be output by the language model when it determines that the input data includes information relating to the clinical state of the user.


In one example, the first system prompt further comprises the dialogue history and an instruction to the language model to generate a system response to the user input.


In one example, in response to determining that the first flag is output, the interaction module is further configured to generate a second system prompt, wherein the second system prompt instructs the language model to generate a question to be provided to the user relating to cognitive behavioural therapy assessment.


In one example, the first system prompt includes a second flag to be output by the language model when it determines that the user has requested a specific intervention.


In one example, in response to determining that the second flag is output, the interaction module is further configured to generate a fourth system prompt, wherein the fourth system prompt instructs the language model to generate a system response delivering the intervention.


In one example, the second system prompt instructs the language model to generate the structured information.


In one example, the interaction module is further configured to generate a third system prompt to instruct the language model to generate output information explaining the intervention to the user, wherein the third system prompt includes at least some information specific to the user.


In one example, the interaction module is further configured to generate a fourth system prompt to instruct the language model to generate system responses delivering the intervention to the user.


In one aspect, there is provided a dialogue system, comprising:

    • an input configured to obtain input data relating to speech or text provided by a user;
    • an output configured to provide output data relating to speech or text to a user;
    • one or more processors, configured to:
      • receive, by way of the input, input data relating to speech or text provided by a user;
      • provide the input data to a first module, the first module comprising a subject understanding module, wherein the subject understanding module comprises one or more subject understanding models, each of the one or more subject understanding models configured to take as input the input data and provide as output subject profile information;
      • generate a system input using the subject profile information;
      • provide the system input to a language model, the language model configured to provide a system response; and
      • output, by way of the output, the system response,
      • wherein the one or more processors are further configured to provide the input data to a subject safety module and evaluate the input data before a system response is output, wherein the subject safety module is further configured to prevent output of the system response depending on the output of the evaluation.


In one example, the one or more processors are further configured to provide the system response to an output safety module. The output safety module may be configured to receive the system response from the language model and evaluate the system response before it is output.


In one aspect, there is provided a dialogue method, comprising:

    • receiving, by way of the input, input data relating to speech or text provided by a user;
    • providing the input data to a first module, the first module comprising a subject understanding module, wherein the subject understanding module comprises one or more subject understanding models, each of the one or more subject understanding models configured to take as input the input data and provide as output subject profile information;
    • generating a system input using the subject profile information;
    • providing the system input to a language model, the language model configured to provide a system response; and
    • outputting, by way of the output, the system response,
    • wherein the one or more processors are further configured to provide the input data to a subject safety module and evaluate the input data before a system response is output, wherein the subject safety module is further configured to prevent output of the system response depending on the output of the evaluation.


In one aspect, there is provided a dialogue method, comprising:

    • receiving, by way of the input, input data relating to speech or text provided by a user;
    • providing the input data to a subject safety module, the subject safety module configured to receive the input data and evaluate the input data before a system response is output, evaluating the input data comprising performing a first determination on the input data using a matching module and a second determination on the input data using a trained model, wherein the matching module performs the first determination to determine whether the input data matches one or more items from a pre-determined set of one or more items;
    • generating a system response using a first process or using a second process, wherein the first process uses at least one trained language model to generate a dynamically determined system response and wherein the second process retrieves a pre-determined system response, wherein a selection between the first process and the second process is made based on the evaluation of the input data;
    • outputting, by way of the output, the system response.


In one aspect, there is provided a dialogue system, comprising:

    • an input configured to receive input data relating to speech or text provided by a user;
    • an output configured to provide output data relating to speech or text to a user; and
    • one or more processors, configured to:
      • receive, by way of the input, input data relating to speech or text provided by a user;
      • provide the input data to a subject safety module, the subject safety module configured to receive the input data and evaluate the input data before a system response is output;
      • generate a system response using a first process or using a second process, wherein the first process uses at least one trained language model to generate a dynamically determined system response and wherein the second process retrieves a pre-determined system response, wherein a selection between the first process and the second process is made based on the evaluation of the input data;
      • output, by way of the output, the system response.


In one aspect, there is provided a dialogue method, comprising:

    • receiving, by way of the input, input data relating to speech or text provided by a user;
    • providing the input data to a subject safety module, the subject safety module configured to receive the input data and evaluate the input data before a system response is output;
    • generating a system response using a first process or using a second process, wherein the first process uses at least one trained language model to generate a dynamically determined system response and wherein the second process retrieves a pre-determined system response, wherein a selection between the first process and the second process is made based on the evaluation of the input data;
    • outputting, by way of the output, the system response.


In one aspect, there is provided a dialogue system, comprising:

    • an input configured to obtain input data relating to speech or text provided by a user;
    • an output configured to provide output data relating to speech or text to a user; and
    • one or more processors, configured to:
      • receive, by way of the input, input data relating to speech or text provided by a user;
      • generate a first system response from the input data using a first trained model;
      • provide the first system response to an output safety module, wherein the output safety module is configured to evaluate the first system response and provide an evaluation result;
      • responsive to a first evaluation result, output, by way of the output, the first system response;
      • responsive to a second evaluation result, generate a second system response;
      • output, by way of the output, the second system response.


In one example, the second system response is generated based on the second evaluation result.


In one example, generating the first system response comprises generating a first system prompt comprising the input data and providing the system prompt to the first trained model. The first trained model may comprise a first language model. Generating the second system response may comprise generating a second system prompt. The second system prompt may comprise the second evaluation result. The second system prompt may comprise the first system prompt annotated with the second evaluation result. The second system prompt may comprise the first system response. Generating the second system response may comprise providing the second system prompt to the first trained model or to a second trained model. The second trained model may comprise a second language model.
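A possible sketch of regenerating the response from a second system prompt annotated with the evaluation result is shown below; the prompt wording is hypothetical, and call_language_model may represent the first trained model or a second trained model:

    def regenerate_response(first_prompt: str, first_response: str,
                            evaluation_result: str, call_language_model) -> str:
        """Build a second system prompt from the first prompt, the first response
        and the evaluation result, then request a revised response."""
        second_prompt = (f"{first_prompt}\n"
                         f"Previous response: {first_response}\n"
                         f"Evaluation of previous response: {evaluation_result}\n"
                         "Rewrite the response so that it addresses the evaluation.")
        return call_language_model(second_prompt)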


In one example, the second evaluation result includes an indication that the dynamically determined system response includes information unsuitable for the user.


In one example, unsuitable information comprises at least one of: undesirable information, harmful information, or medical advice.


In one example, the second system prompt includes instructions to modify a portion of the first system response.


In one example, the second evaluation result includes an indication that the reading age of the first system response is outside a pre-determined range.
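The reading-age check could, for example, use a standard readability measure such as the Flesch-Kincaid grade level; the syllable heuristic and thresholds below are illustrative assumptions rather than part of the described system:

    import re

    def count_syllables(word: str) -> int:
        # Rough heuristic: count groups of vowels, with a minimum of one.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def flesch_kincaid_grade(text: str) -> float:
        """Flesch-Kincaid grade level:
        0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z']+", text)
        n_words = max(1, len(words))
        syllables = sum(count_syllables(w) for w in words)
        return 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59

    def grade_in_range(text: str, low: float = 4.0, high: float = 9.0) -> bool:
        """Check whether the estimated grade level lies within a pre-determined range."""
        return low <= flesch_kincaid_grade(text) <= high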


In one example, the output safety module is further configured to:

    • compare the first system response to one or more stored text sequences and determine the evaluation result based on the comparison, wherein the one or more stored text sequences comprise one or more stored text sequences corresponding to one or more interventions.


In one example, the one or more processors are further configured to:

    • receive, at a first module, structured information comprising information relating to a clinical state of the user, the structured information being generated from the input data, the first module comprising a subject understanding module and a subject recommendation module, wherein the subject understanding module comprises one or more subject understanding models, each of the one or more subject understanding models configured to take as input the structured information and provide as output subject profile information;
    • generate, at the subject understanding module, subject profile information based on the structured information; and
    • determine a subject recommendation comprising an intervention for the user, determining the subject recommendation comprising providing the subject profile information as input to the subject recommendation module.

A stored text sequence may correspond to the intervention.


In one example, generating the first system response comprises:

    • providing the input data to a subject safety module, the subject safety module configured to receive the input data and evaluate the input data before a system response is output, evaluating the input data comprising performing a first determination on the input data using a matching module and a second determination on the input data using a trained model, wherein the matching module performs the first determination to determine whether the input data matches one or more items from a pre-determined set of one or more items;
    • generating the first system response using a first process or using a second process, wherein the first process uses at least one trained language model to generate a dynamically determined system response and wherein the second process retrieves a pre-determined system response, wherein a selection between the first process and the second process is made based on the evaluation of the input data.


In one example, generating the first system response comprises:

    • receiving, at a first module, structured information comprising information relating to a clinical state of the user, the structured information being generated from the input data, the first module comprising a subject understanding module and a subject recommendation module, wherein the subject understanding module comprises one or more subject understanding models, each of the one or more subject understanding models configured to take as input the structured information and provide as output subject profile information;
    • generating, at the subject understanding module, subject profile information based on the structured information;
    • determining a subject recommendation comprising an intervention for the user, determining the subject recommendation comprising providing the subject profile information as input to the subject recommendation module; and
    • wherein the system response is output as a part of a dialogue with the user, the system response delivering an intervention.


In one example, the subject recommendation module comprises decision logic configured to map subject profile information to a pre-determined list of interventions.


In one example, the one or more subject understanding models comprises two or more subject understanding models, each of the subject understanding models comprising a trained model configured to determine a classification, wherein each classification comprises a prediction of a specific diagnosis.


In one example, the subject recommendation module is configured to identify an intervention from a plurality of stored interventions, wherein each stored intervention is associated with a specific stored diagnosis corresponding to a classification.


In one example, the one or more processors are further configured to provide the input data to an interaction module, the interaction module comprising at least one model configured to generate the structured information by generating one or more questions to be provided to the user and storing the input data received in response to the one or more questions as the structured information.


In one example, the subject recommendation module is further configured to determine whether the subject profile information satisfies one or more thresholds, and responsive to determining that the subject profile information does not satisfy the one or more thresholds, instruct the interaction module to generate one or more further questions to be provided to the user.


In one example, the interaction module is further configured to generate a label corresponding to each of the one or more questions, the label indicating a topic associated with the question, and store the input data received in response to the one or more questions associated with the topic as the structured information.


In one example, a first topic corresponds to an indication that the question relates to thoughts, a second topic corresponds to an indication that the question relates to situations and a third topic corresponds to an indication that the question relates to feelings.


In one example, the subject understanding module comprises one or more of: a first model, a second model, a third model and a fourth model, wherein the first model is configured to detect whether information from the structured information corresponds to a thought, the second model is configured to determine whether a thought is a distorted thought, the third model is configured to determine whether information from the structured information corresponds to a positive sentiment or a negative sentiment, and the fourth model is configured to determine whether information from the structured information corresponds to a positive activity or a negative activity.


In one example, the subject recommendation module is further configured to determine the subject recommendation based on stored historical data associated with one or more previous interventions for the user.


In one example, the subject recommendation module includes a trained model.


In one example, the subject recommendation module is further configured to determine the subject recommendation based on one or more of: stored diagnosis information associated with the user, stored demographic data associated with the user, stored historical data associated with one or more previous interventions for the user, and feedback from the user in relation to one or more previous interventions for the user.


In one example, the recommender model is configured to predict the intervention for the user so as to maximise at least one of three target metrics, at least two of the three target metrics, or three of the three target metrics, wherein a first metric of the three target metrics relates to therapeutic alliance, a second metric of the three target metrics relates to patient outcomes, and a third metric of the three target metrics relates to insights derived by the subject understanding module.


In one example, the interaction module is further configured to:

    • determine whether the input data from the user includes information relating to a clinical state of the user;
    • in response to determining that the input data includes information relating to the clinical state of the user, generating the structured information from subsequent input data received from the user; and
    • output, by way of the output, information explaining the intervention to the user.


In one example, the interaction module is further configured to generate a first system prompt to instruct a language model to determine whether the input data includes information relating to the clinical state of the user, wherein the first system prompt comprises a first flag to be output by the language model when it determines that the input data includes information relating to the clinical state of the user, the dialogue history, and an instruction to the language model to generate a system response to the user input, wherein in response to determining that the first flag is output, the interaction module is further configured to generate a second system prompt, wherein the second system prompt instructs the language model to generate the one or more questions to be provided to the user.


In one example, the first system prompt includes a second flag to be output by the language model when it determines that the user has requested a specific intervention, wherein in response to determining that the second flag is output, the interaction module is further configured to instruct the language model to generate a system response delivering the intervention.


In one example, the system responses deliver a sequence of interventions.


In one example, generating the first system response comprises:

    • providing the input data to an interaction module, the interaction module comprising at least one model configured to:
      • determine whether the input data from the user includes information relating to a clinical state of the user;
      • in response to determining that the input data includes information relating to the clinical state of the user, generate structured information from subsequent input data received from the user, wherein the structured information comprises information to identify an intervention for the user;
      • output, by way of the output, information explaining the intervention to the user; and
    • wherein the system response is output as a part of a dialogue with the user, the system response delivering an intervention.


In one example, the at least one model comprises a language model, wherein the language model is a general generative pre-trained transformer model.


In one example, the at least one model is a model trained using supervised learning.


In one example, the interaction module is further configured to generate a first system prompt to instruct the language model to determine whether the input data includes information relating to the clinical state of the user.


In one example, the first system prompt includes a first flag to be output by the language model when it determines that the input data includes information relating to the clinical state of the user.


In one example, the first system prompt further comprises the dialogue history and an instruction to the language model to generate a system response to the user input.


In one example, in response to determining that the first flag is output, the interaction module is further configured to generate a second system prompt, wherein the second system prompt instructs the language model to generate a question to be provided to the user relating to cognitive behavioural therapy assessment.


In one example, the first system prompt includes a second flag to be output by the language model when it determines that the user has requested a specific intervention.


In one example, in response to determining that the second flag is output, the interaction module is further configured to generate a fourth system prompt, wherein the fourth system prompt instructs the language model to generate a system response delivering the intervention.


In one example, the second system prompt instructs the language model to generate the structured information.


In one example, the interaction module is further configured to generate a third system prompt to instruct the language model to generate output information explaining the intervention to the user, wherein the third system prompt includes at least some information specific to the user.


In one example, the interaction module is further configured to generate a fourth system prompt to instruct the language model to generate system responses delivering the intervention to the user.


In one aspect, there is provided a dialogue system, comprising:

    • an input configured to obtain input data relating to speech or text provided by a user;
    • an output configured to provide output data relating to speech or text to a user;
    • one or more processors, configured to:
      • receive, by way of the input, input data relating to speech or text provided by a user;
      • provide the input data to a first module, the first module comprising a subject understanding module, wherein the subject understanding module comprises one or more subject understanding models, each of the one or more subject understanding models configured to take as input the input data and provide as output subject profile information;
      • generate a system input using the subject profile information;
      • provide the system input to a language model, the language model configured to provide a system response; and
      • output, by way of the output, the system response.

The one or more processors may further be configured to provide the system response to an output safety module. The output safety module may be configured to receive the system response from the language model and evaluate the system response before it is output.


In one example, the one or more processors are further configured to provide the input data to a subject safety module and evaluate the input data before a system response is output. The subject safety module may further be configured to prevent output of the system response depending on the output of the evaluation.


In one aspect, there is provided a dialogue method, comprising:

    • receiving, by way of the input, input data relating to speech or text provided by a user;
    • providing the input data to a first module, the first module comprising a subject understanding module, wherein the subject understanding module comprises one or more subject understanding models, each of the one or more subject understanding models configured to take as input the input data and provide as output subject profile information;
    • generating a system input using the subject profile information;
    • providing the system input to a language model, the language model configured to provide a system response; and
    • outputting, by way of the output, the system response,
      • wherein the one or more processors are further configured to provide the system response to an output safety module, wherein the output safety module is configured to receive the system response from the language model and evaluate the system response before it is output.


According to another aspect, there is provided a dialogue method, comprising:

    • receiving, by way of the input, input data relating to speech or text provided by a user;

    • generating a first system response from the input data using a first trained model;
    • providing the first system response to an output safety module, wherein the output safety module is configured to evaluate the first system response and provide an evaluation result;
    • responsive to a first evaluation result, outputting, by way of the output, the first system response;
    • responsive to a second evaluation result, generating a second system response;
    • outputting, by way of the output, the second system response.


According to another aspect, there is provided a carrier medium comprising computer readable code configured to cause a computer to perform the above methods. The methods are computer-implemented methods. Since some methods in accordance with examples can be implemented by software, some examples encompass computer code provided to a general purpose computer on any suitable carrier medium. The carrier medium can comprise any storage medium such as a floppy disk, a CD ROM, a magnetic device or a programmable memory device, or any transient medium such as any signal e.g. an electrical, optical or microwave signal. The carrier medium may comprise a non-transitory computer readable storage medium.





BRIEF DESCRIPTION OF FIGURES

Systems and methods in accordance with non-limiting examples will now be described with reference to the accompanying figures in which:



FIG. 1 is a schematic illustration of a system comprising a dialogue system and a user device;



FIG. 2 is a schematic illustration of a dialogue application according to an example;



FIG. 3 shows a flow chart of a dialogue method according to an example, that may be performed by the dialogue application when executed on the dialogue system as described above;



FIG. 4 is a schematic illustration of a cognitive distortion understanding model which may be used in the subject understanding module in this example;



FIG. 5 is a schematic illustration of an example language model that may be used in this example;



FIG. 6 is a schematic illustration of a dialogue application according to another example, that may be stored and executed by a dialogue system as described in relation to FIG. 1;



FIG. 7 shows a schematic illustration of a dialogue application comprising a first subject safety module, which acts on the user inputs;



FIG. 8 is a schematic illustration of example information used to generate a system prompt in the method described in relation to FIG. 3;



FIG. 9 is a schematic illustration of an example recommender module that may be used in the first module;



FIG. 10 shows an example where the user input is evaluated;



FIG. 11 shows a multiple regression mood prediction approach;



FIG. 12 is a schematic illustration of a dialogue application according to another example;



FIG. 13(a) is a schematic illustration of a subject safety module according to an example that may be used in a dialogue application;



FIG. 13(b) is a schematic illustration of a rules based dialogue flow that may be performed by a subject safety module according to an example that may be used in a dialogue application;



FIG. 14a is an illustration of an example dialogue flow triggered by a crisis dialogue module according to an example that may be used in a subject safety module;



FIG. 14b is an illustration of another example dialogue flow triggered by a crisis dialogue module according to an example that may be used in a subject safety module;



FIG. 14c is an illustration of yet another example dialogue flow triggered by a crisis dialogue module according to an example that may be used in a subject safety module;



FIG. 15 is a schematic illustration of an output safety module according to an example that may be used in a dialogue application;



FIG. 16a is a flow chart of a method for evaluating a system response, that may be performed by the output safety module;



FIG. 16b is a flow chart of another method for evaluating a system response, that may be performed by the output safety module;



FIG. 17 shows a list of output information that may be generated by a language module according to an example that may be used in a dialogue application;



FIG. 18 is a schematic illustration of a dialogue application according to another example;



FIG. 19(a) is a schematic illustration of a first module according to an example that may be used in a dialogue application;



FIG. 19(b) is a schematic illustration of functions performed by a first module according to an example that may be used in a dialogue application;



FIG. 20 shows a flow chart of a dialogue method according to another example, that may be performed by the dialogue application when executed on the dialogue system;



FIG. 21 illustrates an example flow of control within a dialogue application according to an example;



FIG. 22 illustrates another example flow of control within a dialogue application.





DETAILED DESCRIPTION

Dialogue systems provide conversational user interfaces, for example chatbots or voice-based agents, which may be used in many applications. Producing text and dialogue that is human-like has long been a challenge in artificial intelligence. As a result, much research has been done in making humans and machines converse. Early attempts used rule-based models, while more recent advances have relied on more complex neural networks.


The key task of language modelling is to produce output text, given some previous input. As such, language modelling models a sequence of tokens (which may correspond to words) and predicts the next token in this sequence. For example, the sentence “I went to the kitchen to brew myself a cup of” will likely be followed by the word “coffee” or “tea”, but not “car”. This logic can then be extended further, to produce entire sentences, paragraphs or even longer conversations or texts.


The way that language models learn these statistical dependencies is by observing large amounts of human-written text, like books or internet forums. In simple terms, these models observe word-word or longer term sentence-sentence pairs, and base their predictions based on such observed regularities. As a result, language modelling may be considered as complex pattern matching.


Language models have achieved significant progress in recent years, producing what may appear as human-like text. Firstly, the internet has provided large quantities of text, increasing the amount of training data manifold, and leading to advances in performance.


Secondly, specific large neural networks have been developed to deal with the complexities of natural language and increased computing power has made it easier to train them.


Among these neural network architectures are large transformer models. These comprise attention mechanisms within their neural network architecture, and have a large number of trainable parameters—for example tens of billions of parameters. These large models have seen significant gains in performance, being able to produce coherent long-term text completion. Because these systems are so large, this text completion ability allows them to also engage in other text-related tasks, like summarisation or conversations. While some transformers are more fine-tuned on these tasks, they still rely on the same basic prediction idea in language modelling.


Such large language models may have many advantages over older, more classical text generation systems, like rule-based chatbots. For example, their large amounts of training data enable them to understand quite varied forms of language and tone, and they are robust to mistakes in the input text. Additionally, this allows them to respond in varied, and potentially highly nuanced ways, adapting their tone and response, as well as not having to rely on formulaic responses. This can make them seem significantly more human-like and empathetic than other systems. This in turn makes them a key candidate to engage in deeper conversations with humans.


However, for some real world applications, further fine-tuning training of the large language model may be needed in order to provide required performance. Since the models are so large, in other words the number of trainable parameters is so great, such further training may be computationally inefficient. For example, a large language model may be used to deliver autonomous therapy. However, fine-tuning training a large language model to deliver autonomous therapy using in-session transcripts may be computationally inefficient.


The present disclosure provides a dialogue system in which the user input utterances are first provided to a first module, comprising a subject understanding module. The subject understanding module comprises one or more subject understanding models, each of the one or more subject understanding models configured to take as input the input data and provide as output subject profile information. A system prompt is then generated using the subject profile information and provided as input to the large language model. The large language model provides as output a system response based on the subject profile information. The language model is a frozen, general purpose large language model of variable size.


In this system, each subject understanding model can be trained on comparatively less data. In particular, each subject understanding model can focus on a particular aspect of the clinical process (for example cognitive distortions). It can therefore be trained only on data collected for this particular aspect. Furthermore, each subject understanding model may have a smaller number of trainable parameters than the language model. For example, the subject understanding models may have of the order of hundreds to tens of thousands of trainable parameters, whereas the language model 21 may use between 10 and 500 billion parameters and consume many terabytes of text data. In one example, the subject understanding models have less than 1 million parameters. In one example, the language model has more than 1 billion parameters. In one example, the language model has more than 5 billion parameters. In one example, the language model has more than 7 billion parameters. Using a large language model alone to perform the same therapy task requires massive amounts of data, of varying quality. For example, the training data set may require more than 1 billion tokens. This may be around 40000 hours of therapy. In some cases more than 400000 hours may be needed.


Furthermore, using a large language model alone means there is no straightforward way to steer the behaviour of the model. For example, different clinics may follow different treatment pathways and may require changes to the internal workings of the dialogue application 102. This may be difficult to do, and computationally inefficient, as the single large language model is a monolithic model—changing one component affects all others.


A similar problem occurs when a large language model is trained on a subset of disorders, but then a new disorder is added. The new disorder may require very similar treatment, but the large language model may struggle to recognise it, as it has never seen an example of this. By contrast, the modular approach using one or more subject understanding models can generalise if the individual components are similar enough. Furthermore, a new subject understanding model may be trained and included in the system, without modification of other components. The modular system makes it easy to change individual parts.


Furthermore, using only a large language model architecture may make any sort of explainability difficult. Large language models engage in pure, black-box style pattern-matching that does not allow for causal explanations behind the output utterances. This black box nature may also make it harder to constrain the model outputs in a useful fashion, potentially letting the models be led astray, or engage in potentially harmful behaviour that they have picked up through their source material. Each subject understanding model on the other hand provides a view into the chatbot behaviour, as it can be shown exactly how each input was judged and on what basis a decision was made. These outputs could then feed into session reports, for example automatically annotating what an intervention was related to using the label of a subject understanding model. The session report can show to the clinician the internal clinical logic, which can be amended by the clinician for the purpose of the report. The output from each subject understanding model may be stored for each dialogue turn. If an automated therapy conversation is audited, these stored outputs show which clinical concepts were triggered at any moment.


Furthermore, even the largest language models have relatively limited memory. That is, their maximal input length is limited to the equivalent of several pages of text. Even if memory size increases, model performance tends to deteriorate the longer the input sequence. Especially in longer term human-machine conversations, this precludes the building of deeper relationships or insights. Such insights may instead be captured and labelled by the subject understanding model(s), and the labels fed into the language model to generate the system output.


Furthermore, even though the large language model may engage with large amounts of knowledge in their training data, they lack any truly grounded knowledge about the world. As a result, while they will be able to answer some questions through pattern matching (“The capital of France is . . . ”), they may lack the ability to retrieve or use more factual knowledge in a logical manner. Such factual knowledge is instead provided to the language model as input by the subject understanding model(s).


Furthermore, even with fine-tuning, large language models produce text in a goal-less manner through their underlying pattern matching procedure. Whereas humans might pursue specific goals and have longer term plans within a single conversation, or across conversations with the same individual, large language models lack any of this goal-directedness. Instead, such goals are captured by the subject understanding model(s), and fed into the language model as input, to generate the output from the system.


Talking to a therapist is the core component of any treatment approach in psychotherapy, from psycho-analysis to cognitive behavioural therapy. In any of these fields, becoming a therapist requires long training, deep interpersonal skills and a large amount of factual knowledge. The relationship that arises through talking to a therapist is one of the key predictors of treatment success. Often discussed under the umbrella term ‘therapeutic alliance’, this relationship is better when conversations are authentic and the patient feels understood.


Therapeutic alliance can be achieved at several levels. On a more long-term level, therapeutic alliance will be higher when patients feel understood and perceive the tasks and exercises they engage with to be useful. On a lower level, patients will engage more with a therapist if they perceive the conversations to be engaging and empathetic. This can for example be achieved through mirroring the patient (for example in tone) or by active listening.


Any attempt to automatise even parts of talking therapy in a machine system should thus pay close attention to creating a deep relationship with the patient. While more mechanistic approaches might be well suited to creating long term understanding and making useful recommendations, large language models are suited for the lower level tasks. Through their nuanced language understanding and production abilities they can create engaging, empathetic, targeted and varied text when conversing with a patient.


However, use cases such as automated therapy should have a level of explainability and should be able to engage in safe conversations. Furthermore, patients and therapists engage in long-term relationships, which are impossible given large language models' memory restrictions. Furthermore, large language models lack any form of grounded knowledge about the patient, mental disorders, or interventions. They are also unable to plan longer term treatment or engage in goal-directed conversations with a patient. In other words, large language models are able to talk like an adult human being but lack the insight that is gained through training as a psychotherapist.


In the disclosed system, the subject understanding model(s) endow the large language model with the knowledge of a therapist. The dialogue system is a two-part system. As one part of the system—the second module—a large language model is used to produce text that a patient would engage with. However, this large language model is guided by a model-based therapeutic understanding and recommendation system—the first module. This first module's task is to endow the digital therapist with goal-directedness, subject-matter knowledge, and explainability. The first module captures a patient's disorder through machine learning tools that encapsulate aspects of qualitative models used in psychotherapy. It then uses these insights to suggest interventions and exercises through a recommender system that aims to optimise patient outcomes.


Together, these two subsystems thus serve to create therapeutic alliance both in the long- and short-term. This is achieved by understanding the patient and recommending useful interventions via the first module, and by creating engaging and empathetic interventions via the large language model.


The two sub-systems interact in the following way. The model-based system in the first module guides and constrains the conversation between the application and the patient. This can for example be done through producing a prompt (or any other input modality) for the language model that contains relevant background information, and outlines the relevant goal for the following conversation. The large language model then produces the utterance that the patient receives. The patient's replies are then taken as input to the language model to generate further utterances and are constantly monitored by the model-based system in the first module in order to derive further insights.


In-session transcripts are not the most computationally efficient training data from which to construct an autonomous therapy delivery computer programme. Training a large language model to emulate a human therapist on thousands of hours of in-session transcripts is a hard challenge. Effectively this involves trying to infer the clinical models from noisy high-dimensional transcript data, which is a very complex function to approximate. The disclosed system models a human clinical psychologist. The language model is therefore provided with input information corresponding to a variety of protocols and clinical models of care. An example from cognitive behavioural therapy is the following: [PATIENTS WITH DEPRESSION COMMONLY EXHIBIT 5 DISTORTED THINKING PATTERNS]>[COGNITIVE RESTRUCTURING IS AN APPROPRIATE EXERCISE TO ADMINISTER TO A PATIENT WHO EXHIBITS DISTORTED THINKING]>[A SUCCESSFULLY REFRAMED DISTORTED THOUGHT LOOKS AS FOLLOWS]. The disclosed system draws on clinical models, or clinical frameworks, to guide the conversation with the patient in a goal-directed way. Representing these clinical models as a network of interconnected statistical models is more efficient. Furthermore, each statistical module in the first module can be regulated. Each represents a concept that can be understood by other human members of the care team.



FIG. 1 is a schematic illustration of a system comprising a dialogue system 100 and a user device 200. The dialogue system 100 is configured to communicate with the user device 200 via a communication network, for example the Internet.


The user device 200 is configured to provide output to a user and to receive input from a user. The user device 200 may be an end user computing device, such as a laptop or desktop computer, or a mobile device such as a tablet or smart phone for example. In one example, the user provides spoken inputs and the user device 200 provides audio output to the user. In this case, the user device 200 comprises a microphone for receiving audio inputs from the user and a speaker for outputting audio outputs to the user. In another example, the user provides text inputs and the user device displays a text output. The user device 200 may therefore additionally or alternatively comprise a keyboard for receiving text inputs from the user and a screen for displaying text outputs to the user, or a touchscreen for receiving text inputs and displaying text outputs for example. The user device 200 can be configured to interact with the user with one or both of text and audio signals.


The user may interact with the dialogue system 100 through a web-based portal or through an application running on the user device 200 for example. The user provides spoken or text inputs through the web browser or application. The audio or text signal is then processed and the data sent from the user device 200 to the dialogue system 100 over the communication network. Data received from the dialogue system 100 at the user device 200 over the communication network is also processed and provided as an audio or text output to the user through the web-browser or application.


The dialogue system 100 in this example comprises one or more servers. The dialogue system 100 comprises an input/output component 101, a processor 105, working memory 111, and storage 107.


The processor 105 is coupled to the storage 107 and accesses the working memory 111. The processor 105 may comprise logic circuitry that responds to and processes the instructions in code stored in the working memory 111. In particular, when executed, a dialogue application 102 is represented as a software product stored in the working memory 111. The dialogue application 102 will be described in more detail in relation to the figures below. Execution of the dialogue application 102 by the processor 105 will cause examples as described herein to be implemented.


The processor 105 also accesses the input/output component 101. The input/output component 101 may be a single component or may be divided into a separate input component and output component. Furthermore, multiple input or output components may be provided for receiving and sending different inputs and outputs. The input/output component 101 is configured to receive data transmitted through the communication network and output data to be transmitted through the communication network. In particular, input data from the user device 200 corresponds to text and/or audio. This data is received through the input/output component 101 and processed by the dialogue application 102 as will be described below. A user of the user device 200 may interact with the dialogue application 102 by providing an audio or text input in order to access a service provided by the dialogue application 102. Output audio data or text data generated by the dialogue application 102 is provided to the input/output component 101 and transmitted to the user device 200 through the communication network.


The dialogue system 100 may communicate with further devices and systems over the communication network, for example other user devices corresponding to other users using the dialogue application 102.


The processor 105 is also configured to communicate with the non-volatile storage 107. The storage 107 may contain data that is used by the dialogue application 102 when executed by the processor 105. As illustrated, the storage 107 is local memory that is contained in the device. Alternatively however, the storage 107 may be wholly or partly located remotely, for example, using cloud based memory that can be accessed remotely via the communication network. The dialogue application 102 is stored in the storage 107. The dialogue application 102 is placed in working memory 111 when executed.


As illustrated, the system 100 comprises a single processor. However, the dialogue application 102 may be executed across multiple processing components, which may be located remotely, for example, using cloud based processing. For example, the dialogue system 100 may comprise at least one graphical processing unit (GPU) and a general central processing unit (CPU), wherein various operations described in relation to the methods below are implemented by the GPU, and other operations are implemented by the CPU. For example, matrix operations or vector operations are performed by a GPU.


Although in this example, the dialogue application 102 runs remotely from the user device 200 on a dialogue system 100 comprising one or more servers, in some other examples, a user device itself acts as a dialogue system, with the dialogue application 102 running on the user device. Inputs are directly received and outputs directly outputted to and from the user on the device on which the dialogue application 102 is executed.


Usual procedures for the loading of software into memory and the storage of data in the storage unit 107 apply. The dialogue application 102 can be embedded in original equipment, or can be provided, as a whole or in part, after manufacture. For instance, the dialogue application 102 can be introduced, as a whole, as a computer program product, which may be in the form of a download, or can be introduced via a computer program storage medium, such as an optical disk. Alternatively, modifications to existing software can be made by an update, or plug-in, to provide features of the above described example.


While it will be appreciated that the below examples are applicable to any computing system, the example computing system 100 illustrated in FIG. 1 provides means capable of putting an example, as described herein, into effect.



FIG. 2 is a schematic illustration of a dialogue application 102 according to an example, that may be stored and executed by a dialogue system 100 as described in relation to FIG. 1. Execution of the dialogue application 102 will cause methods as described herein to be implemented.


When executed, the dialogue application 102 receives, by way of the input 101, input data relating to speech or text provided by a user.


When speech data is received as input, a step of automatic speech recognition (ASR) is performed in order to generate input text data from the input speech data. The dialogue application 102 may comprise an ASR module. The ASR module may perform speech recognition using any suitable method. For example, the ASR module may comprise a trained speech recognition model based on a neural network or a Hidden Markov Model.


The input text data is provided to a first module 30. This first module 30 is a goal-directed module. The first module 30 comprises a subject understanding module 31. The subject understanding module 31 comprises one or more subject understanding models, also referred to as understanding models. Each of the one or more subject understanding models is configured to take as input the input text data and provide as output subject profile information. The subject understanding models are also referred to throughout as “mechanistic models”. Example subject understanding models will be described below.


The subject profile information output from the subject understanding module 31 is taken as input to a prompt generation module 40 in this example. The input text data is also taken as input to the prompt generation module 40 in this example. The prompt generation module 40 generates a system prompt using the input data and the subject profile information. An example of how a system prompt may be generated will be described below. A system prompt is an example of a system input.


The generated system prompt corresponds to a sequence of text. The generated system prompt is taken as input to the second module 20. The second module 20 comprises a language model 21 configured to take as input the system prompt and provide as output a system response. The provided system response corresponds to a sequence of text. The language model 21 is a large generative pre-trained transformer model, trained for text completion via prompting.


The system response may be converted to speech before being output to the user. In examples in which speech output may be used, the dialogue application 102 further comprises a text to speech module that receives the text signal corresponding to the system response and synthesizes speech data based on the text signal. A TTS model may provide text-to-speech functionality using any suitable method. For example, the text-to-speech functionality may be provided using any of concatenative synthesis, formant synthesis, Hidden Markov Model-based synthesis, and/or deep learning-based synthesis. In other examples, the language model 21 may deliver the generated responses via written text or via digital speech.



FIG. 3 shows a flow chart of a dialogue method according to an example, that may be performed by the dialogue application 102 when executed on the dialogue system 100 as described above. Here, an example will be described in which the subject understanding module comprises a cognitive understanding module. The method will be described with reference to an example dialogue with a user.


In S301, input data relating to speech or text provided by a user is received. In the example dialogue described here, in a first iteration of the method, the user provides a text input “Everybody hates me”.


In S302, this input is provided to a first module 30, the first module 30 comprising a subject understanding module 31. In this example, the subject understanding module 31 comprises a first cognitive understanding model. In this example, the first cognitive understanding model is a cognitive distortion understanding model 300. The input text “Everybody hates me” is provided as input to the cognitive distortion understanding model 300. The cognitive distortion understanding model 300 provides as output subject profile information. In this example, the subject profile information is an indication of whether the user input contains a cognitive distortion. In this example, the cognitive distortion understanding model 300 predicts a probability that the following thought is distorted: “Everybody hates me”. This probability is then used to output an indication of whether the user input corresponds to a cognitive distortion.



FIG. 4 is a schematic illustration of a cognitive distortion understanding model 300 which may be used in the subject understanding module 31 in this example.


The way people process information, or the way they interpret situations, is a critical contributor to their mental health. Importantly, research has shown that people may not always interpret situations in an objective or accurate way and this faulty information processing is known as cognitive distortions (or thought distortions). For instance, a depressed patient might think in “extreme, negative, categorical, absolute, and judgmental” ways, whereby these thinking errors contribute to the patient's mental health symptoms. One of the core aims of cognitive behavioural therapy is to identify these thought distortions in order to challenge, change and restructure these mal-adaptive patterns.


The cognitive distortion understanding model 300 performs evaluation of the user input. In this example, the cognitive distortion understanding model 300 comprises a trained deep learning model 330 that is used to classify free-text input (i.e. the user data input) as to whether the presented thought was distorted or not—in this case a binary classifier.


The model takes free-text as input, whereby this free-text is transformed into a vector representation. Many different ways of obtaining such vector representations are possible (e.g. term-frequency inverse document frequency or transformer networks), and the outlined application is independent of the chosen embedding. This transformed text is then used to predict the probability that the input user data, which represents the thought being evaluated by the cognitive distortion understanding model 300, contains one of the specified cognitive distortions.


In the example cognitive distortion understanding model 300 shown in FIG. 4, the user input text is transformed into two separate vector representations. A first vector representation is generated using a pre-trained sentence embedder 310. For example, a pre-trained sentence embedding may be generated using the implementation described in the paper “sentence-BERT”, Reimers & Gurevych, 2019, see arXiv:1908.10084, the entire contents of which are incorporated by reference herein. A second vector representation is generated using a simple word-count embedder 320, based on the most common keywords that are present in distorted thoughts. The keyword representation is derived by comparing the most common words for all classes of cognitive distortions to the most common words in non-distorted thoughts. In this example, the 750 words that appear most commonly in distorted thoughts but do not occur in non-distorted thoughts (for example that do not appear in the top 2000 most common words for non-distorted thoughts) are selected. In other examples, different numbers of words can be selected. This additional bag-of-words based embedding allows the model to capture a qualitative difference in the specific words used in distorted thinking.
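As an illustration of the two vector representations described above, the following is a minimal sketch in Python. It assumes the sentence-transformers library as one possible sentence-BERT style embedder; the model name, keyword list and function names are illustrative and not part of the original disclosure.

```python
# Minimal sketch of the dual embedding (sentence embedding + keyword counts).
# The model name and keyword list below are illustrative assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer  # sentence-BERT style embedder

# Hypothetical subset of the ~750 keywords over-represented in distorted thoughts.
DISTORTION_KEYWORDS = ["everybody", "nobody", "always", "never", "hates", "worthless"]

_sentence_embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed pre-trained model

def embed_utterance(text: str) -> tuple[np.ndarray, np.ndarray]:
    """Return (dense sentence embedding, keyword count vector) for one utterance."""
    dense = _sentence_embedder.encode(text)          # first vector representation
    tokens = text.lower().split()
    counts = np.array([tokens.count(k) for k in DISTORTION_KEYWORDS], dtype=np.float32)
    return dense, counts                             # second, bag-of-words style vector
```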


The two embeddings, i.e. the two vector representations, are taken as input to the evaluation process 330. The evaluation process 330 comprises a deep learning algorithm which is trained to classify whether the thought represented by the input vector representations is distorted or not distorted. The two vector representations are fed into two separate streams of a neural network with separate hidden layers. Each stream comprises one or more fully connected feed forward layers. These separate hidden layers are then combined into a final fully connected hidden layer which feeds into an output softmax layer. Applying the deep learning algorithm to the vector representations allows the system to evaluate whether the user input corresponds to a distorted thought, and this is output by the evaluation process 330. In particular, the evaluation process 330 outputs a probability that the user input corresponds to a distorted thought, with a higher value indicating a higher likelihood of a distorted thought in this example.
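For concreteness, the two-stream classifier described above might be sketched as follows, assuming PyTorch; the layer sizes are illustrative assumptions rather than values taken from the disclosure.

```python
# Sketch of the two-stream evaluation network: separate hidden layers per
# embedding, combined into a final hidden layer feeding a softmax output.
import torch
import torch.nn as nn

class TwoStreamDistortionClassifier(nn.Module):
    def __init__(self, sent_dim: int = 384, kw_dim: int = 750, hidden: int = 64):
        super().__init__()
        self.sent_stream = nn.Sequential(nn.Linear(sent_dim, hidden), nn.ReLU())
        self.kw_stream = nn.Sequential(nn.Linear(kw_dim, hidden), nn.ReLU())
        self.combined = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                      nn.Linear(hidden, 2))

    def forward(self, sent_vec: torch.Tensor, kw_vec: torch.Tensor) -> torch.Tensor:
        # Concatenate the two separate streams before the combined hidden layer.
        h = torch.cat([self.sent_stream(sent_vec), self.kw_stream(kw_vec)], dim=-1)
        return torch.softmax(self.combined(h), dim=-1)  # [p(not distorted), p(distorted)]
```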


The deep learning algorithm used in the evaluation process 330 is trained using a training dataset, prior to implementation of the system. The training dataset may comprise historical patient utterances (sequences of text) and may be used to learn the weights of the deep learning algorithm prior to implementation of the dialogue system 100. The training data may comprise clinician-labelled datasets, which may comprise a number of example patient utterances. To prepare the training dataset, a number of clinicians review the patient utterances and label them as corresponding to a “distorted thought” (these utterances are given the label 1) or not a distorted thought (these utterances are given the label 0). The training data comprises data associated with an individual or a group of individuals. A training data set may include data collected for a given individual, and the deep learning algorithm (e.g., for the individual or other persons) may be trained using the individual training data set. This may enable generating deep learning algorithms that are tailored to trends for the individual. A training data set may include data collected for multiple individuals, and the deep learning algorithm (e.g., for some or all of the individuals in the group or other persons) may be trained using the group training data set. This may enable generating deep learning algorithms that benefit from a relatively large data set that incorporates general trends across multiple individuals. In some examples, further data can be collected as the deep learning algorithm is used and further training can be performed. As a loss function, a cross-entropy loss may be used.


During the training process, sequences of text from the training dataset are processed by the cognitive distortion understanding model 300 in the same manner as described above. The evaluation process 330 comprises a number of trainable parameters, which can be expressed as a vector θ_e. The parameters include the neural network layer weights for example. The parameters are randomly initialised. The update process searches for a parameter vector θ_e so that the difference between the label in the training dataset and the prediction made by the model is minimised. A process of updating θ_e sequentially by computing the gradient of a loss function and updating θ_e using the computed gradient and an optimiser function is performed. A cross entropy loss may be used, in which:







$$L_e = -\frac{1}{N}\sum_{i=1}^{N}\Big[\,y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\,\Big]$$

where y_i is the label from the training data set for the utterance i (1 for “distorted” and 0 for “not distorted”) and ŷ_i is the probability value output from the evaluation process 330 for the utterance i. The gradient of the loss L_e with respect to each of the trainable parameters is determined through back-propagation. The gradient is then used to determine the updated parameters, using an optimiser function. This family of update methods is known as gradient descent (GD), generally defined iteratively as:







$$\theta_e = \theta_e - \mu_e \frac{\partial L_e}{\partial \theta_e}$$

where μ_e is the learning rate, which defines how quickly the parameters are updated. This process is performed for each utterance in the training corpus. The update may be performed in batches of N utterances, based on a batch average.
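A minimal training-step sketch corresponding to the binary cross-entropy loss and gradient descent update above is shown below, assuming PyTorch; the batching, optimiser settings and model interface are illustrative assumptions.

```python
# Sketch of one training epoch for the evaluation process 330.
import torch

def train_epoch(model, batches, learning_rate: float = 1e-3) -> None:
    optimiser = torch.optim.SGD(model.parameters(), lr=learning_rate)  # plain gradient descent
    loss_fn = torch.nn.BCELoss()  # binary cross-entropy, as in the loss L_e above
    for sent_vecs, kw_vecs, labels in batches:        # labels: 1 = distorted, 0 = not distorted
        probs = model(sent_vecs, kw_vecs)[:, 1]       # predicted probability of "distorted"
        loss = loss_fn(probs, labels.float())
        optimiser.zero_grad()
        loss.backward()                               # back-propagation of the gradient
        optimiser.step()                              # theta_e <- theta_e - mu_e * dL_e/dtheta_e
```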


In some examples, a more complex version of the cognitive distortion understanding model 300 is used, where the cognitive distortion understanding model 300 predicts which (if any) specific cognitive distortion was present out of a list of the most common cognitive distortions. In this case, a multi-class classifier can be used, where one class corresponds to no cognitive distortion, and each other class corresponds to a different type of cognitive distortion. Various cognitive distortions can be classified by the cognitive distortion understanding model 300, including but not limited to: catastrophizing; dichotomous thinking; negative filtering; fortune telling; mind reading; and/or personalising; as well as other cognitive distortions that are described in the literature (and it should also be noted that other terminology may be used in the literature to refer to these example cognitive distortions). In this case, the training dataset comprises utterances which are labelled with the type of cognitive distortion present (i.e. the classes).


Although in this example, a cognitive distortion understanding model 300 is described, other types of cognitive understanding models may additionally or alternatively be included in the subject understanding module 31. In one example, a cognitive understanding model comprises a neural network, for example a transformer-based or multi-layer feed-forward network. In another example, the cognitive understanding model comprises a tree-based model. In one example, the cognitive understanding model comprises a neural network configured as a binary classifier or a multi-class classifier. As a loss function, a cross-entropy loss may be used, where the input to the cognitive understanding model is a user utterance, and the output is a class label (e.g. “distorted thought” or “core belief: good partner”). Such models are trained using supervised learning. The training data may comprise clinician-labelled datasets, which may comprise a number of example patient utterances. To prepare the training dataset, a number of clinicians review the patient utterances and label them as corresponding to one of the classes—for example, an utterance may be labelled as “distorted thought”, or “core belief: good partner”, etc. In other examples, the training may make use of reinforcement learning—this could be beneficial in systems with multiple steps and a clear metric of success, e.g. therapeutic alliance.


In this example, the first module 30 outputs one of an indication indicating that the user input is distorted or an indication indicating that the user input is not distorted. The indications may comprise text. For example, if the probability output from the cognitive distortion understanding model 300 is greater than a pre-defined threshold value, the indication indicating that the user input is distorted is output, otherwise the indication indicating that the user input is not distorted is output. The threshold may be 0.5. In this example, the indication indicating that the user input is distorted is “###DISTORTED ###”. The indication indicating that the user input is not distorted is “###NOT DISTORTED ###”.
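A sketch of this thresholding step follows, using the 0.5 threshold and indication strings described above; the function name is illustrative.

```python
def distortion_indication(prob_distorted: float, threshold: float = 0.5) -> str:
    """Map the model's probability output to the indication text."""
    return "###DISTORTED ###" if prob_distorted > threshold else "###NOT DISTORTED ###"
```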


In another example, the first module 30 additionally or alternatively outputs the probability value. As described above, in other examples, the cognitive distortion understanding model 300 may be a multi-class classifier, configured to identify the type of distortion from a plurality of different types of cognitive distortion. For example, where the user input comprises “Everybody hates me”, the cognitive distortion understanding model 300 may output the class label corresponding to the highest probability—which may be “dichotomous thinking”, also referred to as “all or nothing thinking”.


Returning to FIG. 3, the subject profile information output by the first module 30 in this example comprises an indication that the user input is distorted (“###DISTORTED ###”) or an indication that the user input is not distorted (“###NOT DISTORTED ###”). In S303 of FIG. 3, a system prompt is generated using the input data and the subject profile information. Although in this example, the subject profile information comprises one of two indications (“###DISTORTED ###” or “###NOT DISTORTED ###”), in other examples, the subject profile information comprises one or more of a plurality of indications, which could be used to indicate “distorted thought”, “core belief”, “negative automatic thought”, etc.


In this step, a stored text template is retrieved. The prompt generation module 40 may comprise one or more stored text templates. An example stored text template is provided below:


“About You:

You are an AI assistant using techniques from psychotherapy to help the user better understand their problems and improve their symptoms.


Context:

You are engaged in a conversation with the user and will ask them how they are feeling and what they are thinking about, much like a psychotherapist might do during a session.


Your Goal:

Your goal is to help the user get better. To do this, you will employ techniques from psychotherapy, like cognitive restructuring. Other AI assistants will monitor the user's responses and will add further information to the user's input. For example, an AI assistant may append “#Distorted thought #” to the user input, which would indicate that the user needs cognitive restructuring, alternatively “#Not distorted #” may indicate that no restructuring is needed. You can use the AI assistant's inputs to decide how to continue the conversation and whether to begin a specific technique.


Further Information about Your Available Techniques:


The goal of cognitive restructuring is to guide the user step by step through reflecting on their thoughts and beliefs. The user is encouraged to think about the evidence supporting their thoughts and generate an alternative view that provides the best fit with “objective” external experience. The objective of cognitive restructuring is to raise doubts in the client's mind about long-held maladaptive beliefs (e.g., “People will notice I'm anxious and think there is something wrong with me”) and to consider the accuracy and utility of an alternative perspective (e.g., “People might notice I'm a little anxious but consider it unimportant”).”


In this example, there is a single stored text template, which is used for every user input. However, in other examples, the prompt generation module 40 may select a stored text template from a plurality of stored text templates for each user input, based on information output from the first module 30. The stored text templates used by the prompt generation module 40 are also referred to here as “general prompts instructing the language model”.


The stored text template is then combined with the user input to generate the system prompt. In this example, the user input text is appended on the end of the stored text template. The system prompt in the first iteration of the example dialogue therefore is:


“About You:

You are an AI assistant using techniques from psychotherapy to help the user better understand their problems and improve their symptoms.


Context:

You are engaged in a conversation with the user and will ask them how they are feeling and what they are thinking about, much like a psychotherapist might do during a session.


Your Goal:

Your goal is to help the user get better. To do this, you will employ techniques from psychotherapy, like cognitive restructuring. Other AI assistants will monitor the user's responses and will add further information to the user's input. For example, an AI assistant may append “#Distorted thought #” to the user input, which would indicate that the user needs cognitive restructuring, alternatively “#Not distorted #” may indicate that no restructuring is needed. You can use the AI assistant's inputs to decide how to continue the conversation and whether to begin a specific technique.


Further Information about Your Available Techniques:


The goal of cognitive restructuring is to guide the user step by step through reflecting on their thoughts and beliefs. The user is encouraged to think about the evidence supporting their thoughts and generate an alternative view that provides the best fit with “objective” external experience. The objective of cognitive restructuring is to raise doubts in the client's mind about long-held maladaptive beliefs (e.g., “People will notice I'm anxious and think there is something wrong with me”) and to consider the accuracy and utility of an alternative perspective (e.g., “People might notice I'm a little anxious but consider it unimportant”).


User: “Everybody hates me” ###DISTORTED ###.”


In this example, the prompt generation module 40 generates the system prompt from a text template, which is a general prompt instructing the language model 21 and is a stored prompt component, followed by a user entry. The system prompt may be post-processed before being provided to the language model 21.
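A minimal sketch of this prompt assembly is shown below; the function and variable names are illustrative assumptions, and the concatenation format follows the example system prompt above.

```python
def build_system_prompt(template: str, user_input: str, indication: str) -> str:
    """Append the user entry (with its indication) to the stored general prompt."""
    return f'{template}\n\nUser: "{user_input}" {indication}.'

# Example for the first dialogue turn described above:
# prompt = build_system_prompt(stored_template, "Everybody hates me", "###DISTORTED ###")
```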


In S304, the above text corresponding to the system prompt is taken as input to the second module 20. The second module 20 comprises a language model 21. The language model 21 is a large, generative, general-purpose language model. The language model provides as output a system response—in other words, text that is to be output in response to the user input.


In this example, the generated system response is:


“I understand that you're feeling like everybody hates you. Let's try to examine this thought more closely and see if we can find a different perspective. What evidence do you have that supports the idea that everyone hates you?”


Various large language models are known and can be used, for example large GPT models, such as GPT-3.



FIG. 5 is a schematic illustration of an example language model 21 that may be used in this example. In this example, the language model 21 comprises a transformer decoder. The language model 21 comprises a plurality of attention layers. Other types of language model may be used. For example, a language model based on a recurrent neural network may be used.


The system prompt is taken as input to a tokeniser 22. The tokeniser 22 takes the input text and outputs a sequence of tokens representing the text, from a vocabulary of possible tokens. Special tokens, such as tokens representing a start or end, may also be included in the vocabulary. Each token may be represented by a different positive integer number for example. The tokeniser 22 outputs a sequence of numbers corresponding to the input system prompt. The number of tokens in the sequence will vary between different input prompts.


The sequence of tokens is taken as input to a vector representation module 23. The vector representation module comprises stored token representations. Each token representation is a stored vector, where each vector corresponds to a token from the vocabulary. For each token in the input sequence of tokens, the corresponding token representation is retrieved.


The vector representation module may further comprise stored positional representations. For example, each positional representation may be a stored vector corresponding to an absolute position in the sequence of tokens.


For each token in the input sequence, addition of the token representation and the positional representation is performed, to produce a single vector corresponding to the token. This sequence of vectors is taken as input to a repeated set of transformer blocks 24. In particular, the sequence of vectors populates an input matrix, where each vector from the sequence populates a row (any remaining rows being populated with zero values).


Each transformer block comprises an attention module, followed by an addition and layer normalisation, a feed forward neural network and a second addition and layer normalisation. The attention module may be a self-attention module. The attention module may be a multi-headed attention module.


Each attention module comprises three stored matrices of weights. These are trainable parameters of the language model 21. In particular, each attention module comprises a key weight matrix, a query weight matrix and a value weight matrix. When the input matrix is taken as input to the first transformer block, a key matrix is calculated as the matrix product of the input matrix with the key weight matrix, a query matrix is calculated as the matrix product of the input matrix with the query weight matrix, and a value matrix is calculated as the matrix product of the input matrix with the value weight matrix. For a multi-headed attention layer, each of the query matrix, key matrix and value matrix is then split into separate matrices, one for each attention head, along the column dimension.


Scores are then calculated for each attention head. A matrix product is calculated between the query matrix and the transposed key matrix for each attention head. The scores represent, for the token being processed (the score matrix row), the attention of the model on each other token in the sequence (the score matrix columns). A higher score corresponds to more focus on the token.


An attention mask is then applied to the output score matrix for each attention head, to mask out the scores corresponding to future tokens. A softmax function is applied to the result, giving a final score matrix for each attention head. The matrix product of the score matrix with the value matrix is then taken for each attention head. The outputs from the attention heads are then merged. The matrix product with a stored projection matrix is taken, to give the output of the attention layer.
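The masked attention computation described above might be sketched as follows for a single attention head, assuming PyTorch; the scaling by the square root of the key dimension is a standard transformer detail assumed here rather than stated in the text.

```python
# Sketch of masked (causal) self-attention for one head.
import math
import torch

def causal_self_attention(x: torch.Tensor, w_q: torch.Tensor,
                          w_k: torch.Tensor, w_v: torch.Tensor) -> torch.Tensor:
    """x: [seq_len, d_model]; w_q, w_k, w_v: [d_model, d_k] weight matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v               # query, key and value matrices
    scores = (q @ k.T) / math.sqrt(k.shape[-1])       # query x transposed key scores
    mask = torch.triu(torch.ones_like(scores), diagonal=1).bool()
    scores = scores.masked_fill(mask, float("-inf"))  # mask out future tokens
    weights = torch.softmax(scores, dim=-1)           # final score matrix
    return weights @ v                                # weighted combination of values
```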


This output is taken as input to a first addition and layer normalisation module, in which an element wise addition is performed with the input matrix and a layer normalisation performed on the result. The feed forward neural network is then applied. Each row (corresponding to a token) from the output of the addition and layer normalisation module is taken as input separately to the neural network, so that a sequence of vectors is output from the neural network. A nonlinear activation is applied in the feed forward layer. The output from the neural network module is taken as input to the second addition and layer normalisation module, which performs an element-wise addition with the input to the feed forward neural network module, and then a layer normalisation. The output of the second addition and layer normalisation module is formed into an input matrix and taken into the next transformer block.


The language model 21 is run to predict the next word in a sequence of text. A vector comprising the values from the row corresponding to the last token in the input text in the final matrix output from the repeated transformer block 24 is thus taken as input to the text prediction module 25, in order to predict the next token in the sequence.


In the text prediction module 25, the matrix product of this vector with a matrix of the stored token representations is taken. The result corresponds to a score for each token in the vocabulary. A softmax function is applied to the output, to produce a vector of probability values, each value corresponding to a token in the vocabulary. This represents the probability that the next token in the sequence corresponds to each token in the vocabulary. The next token in the sequence is predicted based on the output probabilities. For example, the token having the highest probability may be selected, or a token may be sampled based on the probabilities.


The predicted token is then appended to the sequence of tokens corresponding to the input system prompt text, and the result is taken again as input to the language model 21, to predict the next token in the sequence. This process is repeated, until a pre-defined maximum number of tokens is reached, or until an end token is outputted. The tokens are then converted into the corresponding sequence of text—this is the system response.
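The autoregressive decoding loop described above might look like the following sketch; the language model and tokeniser interfaces are illustrative assumptions, and greedy selection is shown although sampling based on the probabilities could equally be used.

```python
# Sketch of the generation loop: predict, append, repeat until end token or limit.
import torch

def generate(language_model, tokeniser, prompt: str,
             max_new_tokens: int = 256, end_token_id: int = 0) -> str:
    token_ids = tokeniser.encode(prompt)                 # assumed: text -> list of token ids
    for _ in range(max_new_tokens):
        probs = language_model(torch.tensor(token_ids))  # probabilities over the vocabulary
        next_id = int(torch.argmax(probs))               # greedy pick; sampling also possible
        if next_id == end_token_id:
            break
        token_ids.append(next_id)                        # feed the extended sequence back in
    return tokeniser.decode(token_ids)                   # assumed: token ids -> text
```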


In S305, the system response is output to the user. The next user input is then received in response. The method then returns to S301 to process the next user input. The conversations with the user may contain specific exercises. For example, these might include thought and mood records that produce information about what the patient is thinking, doing, and how they are feeling. These exercises can also be more directly targeted at improving symptoms, for example through behavioural activation, or through thought challenges.


In the above described method, the cognitive distortion understanding model 300 is trained on less data than the language model 21. This is because the cognitive distortion understanding model 300 is focused on a particular aspect of the clinical process—cognitive distortions. It can therefore be trained only on utterances labelled with cognitive distortion information. Furthermore, the cognitive distortion understanding model 300 has a smaller number of trainable parameters than the language model 21. For example, the cognitive distortion understanding model 300 may have of the order of hundreds to tens of thousands of trainable parameters whereas the language model 21 may use between 10-500 billion parameters and consume many terabytes of text data. In this example, the cognitive distortion understanding model 300 has 31000 trained parameters. The language model has 175 billion trained parameters.


The combination of one or more clinical, mechanistic models (such as the cognitive distortion understanding model 300) with a large language model provides improved computational efficiency compared to using a single large language model to accomplish the same task. In an example, a dataset of N=120 patient thoughts was used, which clinicians had labelled as “distorted” or “undistorted”, with each category comprising 60 samples. Within clinical settings, a clinician will often try to engage in “cognitive restructuring” if they notice a distorted thought (e.g. a statement like “nobody likes me”) and will aim to softly challenge this thought. To recreate this setting in a way relevant to the present disclosure, it was compared whether an LLM alone, which was instructed to engage in cognitive restructuring if it discovers a distorted thought, would perform better or worse than an LLM which was similarly instructed, but additionally received the predicted label (distorted versus undistorted) from a mechanistic model trained to identify distorted thoughts. The combined model correctly engages significantly more often in cognitive restructuring compared to a language model alone, as indicated by a chi-square test on the full output frequencies of the two models (chi-square(3)=15.77, p<0.005), leading to a much improved F1 score for distorted thoughts (0.55 versus 0.31). Augmenting the large language model with further information supplied by clinical models improves performance over the use of the large language model alone.


An example method of training the language model 21 will be described. However, various other methods may be used. Furthermore, various pre-trained large language models are known and available, such as GPT based language models, and these may be used in the dialogue application 102 as the language model 21, without further training.


Various datasets for training language models are known. For example, the language model may be trained using a dataset comprising a large number of web pages, such as the “WebText” dataset for example. The dataset may further comprise data from other text sources as well. The training data is general text data, in other words it is not specific to the dialogue application 102.


During the training process, sequences of text from the training dataset are processed by the language model 21 in the same manner as described above. The language model 21 comprises a number of trainable parameters, which can be expressed as a vector θ. The parameters include the token representation vector values, the position representation vector values, the attention layer weights and the neural network layer weights for example. The parameters are randomly initialised. The update process searches for a parameter vector θ so that the difference between the next token in the sequence extracted from the dataset and the prediction of the next token made by the language model 21 is minimised. A process of updating θ sequentially by computing the gradient of a loss function and updating θ using the computed gradient and an optimiser function is performed. A cross entropy loss may be used, in which:






$$L = -\sum_{v=1}^{V} y_v \log(\hat{y}_v)$$

where y_v is the v-th entry from a one-hot encoding representing the next token in the sequence and ŷ_v is the v-th entry from the vector of probability values output from the language model 21. The gradient of the loss L with respect to each of the trainable parameters of the language model 21 is determined through back-propagation. The gradient is then used to determine the updated parameters, using an optimiser function. This family of update methods is known as gradient descent (GD), generally defined iteratively as:






$$\theta = \theta - \mu \frac{\partial L}{\partial \theta}$$

where μ is the learning rate, which defines how quickly the parameters are updated. An AdamW optimization algorithm may be used for example. This process is performed for each sequence extracted from the training corpus. The update may be performed in batches, based on a batch average.
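For illustration, one language-model training step matching the next-token cross-entropy loss and AdamW optimiser mentioned above might be sketched as follows, assuming PyTorch and a model that returns unnormalised scores (logits) over the vocabulary; the model interface is an assumption.

```python
# Sketch of a single next-token prediction training step.
import torch
import torch.nn.functional as F

def lm_train_step(model, optimiser, token_ids: torch.Tensor) -> float:
    """token_ids: [batch, seq_len] integer tensor of training text."""
    logits = model(token_ids[:, :-1])                  # scores for each next token
    targets = token_ids[:, 1:]                         # shifted-by-one target tokens
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()                                   # e.g. AdamW update of theta
    return loss.item()

# optimiser = torch.optim.AdamW(model.parameters(), lr=1e-4)
```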


In the above described example, the language model 21 is trained on general data and then used in the dialogue application 102. However, in some examples, further fine-tuning training of the language model 21 may be performed. In such fine-tuning training processes, example user inputs and therapist responses are used as training data. The fine-tuning training process then updates the language model 21 so that the system output matches the therapist responses as closely as possible. The fine-tuning training may be used for a specific goal-directed dialogue application 102 dealing with specific user-groups, for example children, where the language model 21 may be fine-tuned for their needs.



FIG. 6 is a schematic illustration of a dialogue application 102 according to another example, that may be stored and executed by a dialogue system 100 as described in relation to FIG. 1. Execution of the dialogue application 102 will cause methods as described herein to be implemented.


In this example, the dialogue system comprises a first module 30, which is a goal-directed system. The first module 30 comprises a subject understanding module 31, comprising a cognitive understanding module. The cognitive understanding module may comprise a cognitive distortion understanding model 300 as described previously.


The subject understanding module 31 further comprises a behavioural understanding module. An example behavioural understanding module will be described below. The first module 30 further comprises a knowledge bank 33.


The knowledge bank 33 comprises stored therapy-related knowledge. As has been described previously, the input text data from the user is provided to the first module 30. As well as being provided to the subject understanding module 31, the input text data is also provided to the knowledge bank 33 in this example. The user input text data is compared to each entry in the knowledge bank 33. A similarity measure is generated for each entry in the knowledge bank 33. The similarity measure may be generated by embedding the input text using a language model, embedding all reference entries in the knowledge bank 33 using the same language model, and computing the cosine similarity (or some other similarity measure) between the input embedding and each reference embedding. Other methods may involve regex matching or word frequency comparisons. One or more entries corresponding to the highest similarity scores are selected. For example, the five most similar entries are selected. These are included in the system prompt. For example, the text corresponding to the entries is appended to the end of the text template and before the previous user input text in the system prompt.
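A sketch of the cosine-similarity retrieval from the knowledge bank is given below; the embedding function is assumed to be supplied (for example a sentence embedder as described earlier), and the top-five selection follows the example in the text.

```python
import numpy as np

def retrieve_entries(user_text: str, knowledge_entries: list[str], embed, top_k: int = 5):
    """Return the top_k knowledge bank entries most similar to the user input."""
    query = embed(user_text)
    ranked = []
    for entry in knowledge_entries:
        ref = embed(entry)
        cosine = float(np.dot(query, ref) / (np.linalg.norm(query) * np.linalg.norm(ref)))
        ranked.append((cosine, entry))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in ranked[:top_k]]       # entries to append to the system prompt
```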


The system prompt including the most similar entries from the knowledge bank is then taken as input to the language model 21. In this way, the language model 21 can ground its answers in this knowledge. Although in this example, the knowledge bank comprises clinical knowledge, in some examples it may alternatively or additionally comprise further background information, such as patient information. For example, in this example, patient information 35 is described as an input to the recommender system 32. However, this information may alternatively be included in the knowledge bank 33.


The dialogue application 102 in this example further comprises an output safety module 50. The outputs from the language model 21 are checked and monitored by this additional safety module 50. The safety module 50 contains one or more machine learning modules that evaluate the quality and safety of the utterances of the LLM.


For example, the output safety module 50 may search the system response text output from the language model 21 for a pre-defined set of specific words and/or phrases. This search is performed before the system response text is output to the user. For example, the module may apply a set of one or more regular expressions to detect words and/or phrases. If a word or phrase is detected, the safety module 50 prevents the language model 21 from providing the output to the user. The safety module 50 may then edit the system prompt and input the edited system prompt into the language model 21. The safety module 50 may hand over to a rules based dialogue flow or a human operator for example. The safety module 50 may send the user input to a therapist, for example by sending an email, text, phone call or other kind of alert using stored contact details of a therapist.


The output safety module 50 may additionally or alternatively comprise a trained model. For example, the trained model may be a feed forward neural network, taking in the system response text, and classifying with a label (e.g. "risk", "no risk"). The model is trained using a dataset of utterances labelled with "true" labels by trained raters. The loss function may be cross-entropy loss. Again, if a risk is detected, the language model 21 is prevented from outputting the system response to the user. In some examples, a prompted large language model may be used, for example by asking the model whether something is "on topic" compared to a desired topic. The language model may be validated using a labeled dataset and internal testing.
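
A hedged sketch of such a trained safety classifier is given below: a small feed-forward network over a text embedding whose two output logits correspond to the "risk" / "no risk" labels. The architecture, embedding dimension and layer sizes are illustrative assumptions.

```python
# Sketch of a feed-forward risk classifier over a text embedding (sizes assumed).
import torch
import torch.nn as nn

class RiskClassifier(nn.Module):
    def __init__(self, embedding_dim=768, hidden_dim=128, num_labels=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embedding_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_labels),        # logits for "no risk" / "risk"
        )

    def forward(self, text_embedding):
        return self.net(text_embedding)

# Training would minimise nn.CrossEntropyLoss() between these logits and the
# rater-assigned labels; at inference, a "risk" prediction blocks the output.
```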


As a delivery platform, a mental health app 70 is provided, in which patients interact with a mental health chatbot through text conversations. A chatbot interface 70 is shown in FIG. 6. The chatbot presents text to the user of a user device 200 and allows responses to be input by the user, for example in text format inputted via a physical device control or a device graphical user interface 120 (for example, using an "on-screen" keyboard), or by voice. The user device 200 as shown in FIG. 1 may be a mobile smartphone or tablet computer, desktop computer, laptop computer, smart television or display, augmented reality device, virtual reality device, smart speaker, etc. The chatbot interface 70 can be used to converse with the user or prompt the user to enter thought records for example. The interface 70 may interact with the user at predetermined times, or at predetermined time intervals, or in response to certain activities performed on the user device 200, e.g. after the completion of a voice or video call, upon detecting certain keywords being entered via the user interface 70, or upon detection of certain physiological signals via sensors in the user device 200 or another device in communication with the user device 200 such as a smart watch or other physiological sensor.


In the above described example, the output of the first module 30 is provided to the language generation system 20, which provides an output to the user via a chatbot for example. However, one or more outputs from the first module 30 may additionally be used to provide other system outputs. For example, such outputs may be used to schedule an intervention action to be performed on the user device, for example a sleep intervention if sleep problems are detected. Such outputs may be used to feed the user input through to a therapist, for example by sending an email, text, phone call or other kind of alert using stored contact details of a therapist. Such outputs may be used to trigger an immediate follow-up to the user input, e.g. a thought-challenging exercise.


In some examples, an assessment or treatment plan may be generated after an interaction with the dialogue application 102 comprising a plurality of dialogue turns. This may then be presented to a therapist, or a patient management system.


In one example, the first module 30 reports results. The results may be reported to a therapist or medical professional. For example, if a potentially distorted thought is detected, this thought may be flagged in a report for review by a therapist or medical professional in order to make them aware of the distortions their patients are experiencing. Moreover, the frequency (over time) of different distortions for each patient may be aggregated in the report in order to enable the therapist to track the occurrence of these distortions over time. Reports can be generated at regular frequencies or on-demand for the therapist or medical professional. The reports may comprise all data collected by the first module 30 and any corresponding evaluations made by the subject understanding module 31. Alternatively, the reports may comprise a subset of the data collected and/or a subset of the evaluations. For example, prior to providing the reports, noise may be filtered out such that the reports contain key events only. In other words, reports may selectively provide diagnosis information to a therapist or medical professional.


An output of the first module 30 may be provided as an input to action logic, for example, after a plurality of dialogue turns. The action logic may be configured to select an action to perform responsive to the output and/or to cause performance of an action in response to the output. For example, as described above, the action may be transmission of the output to a user or a third party. The action logic may be configured to generate a treatment pathway (or "plan"), schedule appointments with clinicians or to establish a communication with one or more third parties, such as a clinician or an emergency service. For example, the action logic may be configured to establish a communication channel between the user and a clinician or an emergency service or between a clinician and an emergency service and may transmit the output to the one or more third parties.


For example, a user may be allocated to a predetermined treatment pathway depending on any detected cognitive distortion. For example, allocation to a treatment pathway may be performed by the action logic. A predetermined treatment pathway is the route through which the patient is seen by a mental health care professional. There may be several different pre-programmed treatment pathways. For example, a treatment pathway for patients that are prioritised for early treatment so that they are seen by a mental health care professional within 2 weeks, or a treatment pathway for patients whose condition is relatively mild and who could be seen by a mental health care professional within a longer wait time of 8 weeks. The mental health care service may be informed of the user and their allocated treatment pathway by the action logic. The user can then be seen by a mental health care professional according to their allocated treatment pathway. The action logic may be configured to prioritize some users for treatment based on the output.


As discussed above, the action logic may be configured to generate a treatment plan. For example, if a cognitive distortion is predicted to be present by the subject understanding module 31 after a plurality of dialogue turns, a corresponding treatment plan may be generated. A treatment plan for an individual may include a listing of one or more parameters for treatment of the individuals. The parameters may include, for example, a schedule of treatment (e.g., a schedule of calls/visits to a mental health care professional), schedule of input (e.g., a schedule of prompting the individual for input of a specific type of information). Treatment of an individual may include treatment in accordance with parameters specified by a treatment plan for an individual (e.g., conducting calls/visits to a mental health care professional in accordance with a treatment plan schedule, prompting the individual for input of a specific type of information in accordance with the treatment plan schedule, and so forth). In some examples, a system (e.g., system 100) automatically schedules or contacts relevant parties for treatment in accordance with treatment plan parameters. For example, the system may automatically schedule an appointment for a meeting between an individual and a mental health care professional in accordance with a treatment plan schedule. For example, a corresponding treatment plan may be generated upon a threshold number of distortions being predicted. The threshold may be with respect to all cognitive distortions or only specific cognitive distortions. In examples in which the threshold is with respect to specific cognitive distortions, the generated treatment plan may correspond to the specific cognitive distortions.



FIG. 7 shows a schematic illustration of a dialogue application 102 comprising a first subject safety module 60, which acts on the user inputs. The other components of the dialogue application 102 have been described above.


The safety module 60 may comprise one or more safety models.


For example, the safety module 60 may comprise a crisis detection module. Each user input text is provided to the crisis detection module. The crisis detection module is thus applied to each user utterance. The crisis detection module comprises a trigger word system, which searches the user input text for a pre-defined set of specific words (e.g. "suicide") and/or phrases. For example, the crisis detection module may apply a set of one or more regular expressions to detect words and/or phrases in the user text input. If a word or phrase is detected, the safety module 60 prevents the language model 21 from providing a further output to the user. In this example, the safety module 60 outputs an indication to the prompt generation module 40 that a further prompt should not be generated. The safety module 60 may provide output text to the user based on a stored text template associated with the detected word and/or phrase. For example, each word and phrase in the pre-defined set has a corresponding stored text template, which is output to the user if the word or phrase is detected. The text template may comprise information sign-posting options for support, for example a phone number that the user may dial. In some examples, the safety module 60 may trigger a specific dialogue flow, asking if the user needs immediate support and providing sign-posting. For example, the safety module 60 may trigger a rules based dialogue flow.
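
The following minimal sketch illustrates the trigger word system: a small set of regular expressions applied to each user utterance, each mapped to a stored sign-posting template. The patterns and template text are purely illustrative, not the pre-defined set used by the module.

```python
# Minimal sketch of the regex trigger-word check (patterns and templates assumed).
import re

TRIGGER_TEMPLATES = {
    r"\bsuicid(e|al)\b": "It sounds like you may need immediate support. You can call ...",
    r"\bself[- ]harm\b": "If you are at risk of harming yourself, please contact ...",
}

def check_user_input(user_text):
    """Return a sign-posting template if a trigger pattern matches, else None."""
    for pattern, template in TRIGGER_TEMPLATES.items():
        if re.search(pattern, user_text, flags=re.IGNORECASE):
            return template        # block the language model and output this instead
    return None
```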


The safety module 60 may additionally or alternatively comprise a trained model. For example, the trained model may be a feed forward neural network, taking in the user text input, and classifying with a label (e.g. “risk”, “no risk”). The model is trained by using a dataset of user utterances labelled with “true” labels by trained raters. The loss function may be cross-entropy loss. Again, if a user input that is a risk is detected, the language model 21 is prevented from outputting a system response to the user. In some examples, a prompted large language model may be used, for example by asking the model whether something is “on topic” compared to a desired topic. The language model may be validated using a labeled dataset and internal testing.


In the example shown in FIG. 6, the first module 30 further comprises a recommender module 32, which is a subject recommendation model configured to take as input the subject profile information output from the understanding module and provide as output a subject recommendation, wherein the system prompt is generated using the subject recommendation. In one example, the task of this recommender system 32 is to choose from a number of interventions defined by CBT in order to maximise patient outcomes, for example measured by therapeutic alliance and recovery rates. The recommender module 32 outputs a suggested intervention. This may then be included in the system prompt. It may be included either as a text suggestion, generated by retrieving a stored text template and including the intervention in the template (for example "The recommended course of action would be to perform intervention X"), or implicitly by labelling user inputs.


The recommender module 32 may comprise a trained neural network, for example a transformer-based or multi-layer feed-forward network, or a tree-based classifier model. As loss function, a cross-entropy loss may be used for multi-class classifiers, where the input could be a user utterance, and the output a class label. The class labels may correspond to different types of intervention. In other examples, reinforcement learning may be used, which would be beneficial in systems with multiple steps and a clear metric of success, e.g. therapeutic alliance. The recommender module 32 may initially be trained separately, for instance by learning from which interventions clinicians prescribed to patients and how well these worked. The recommender module 32 may in alternative examples be a heuristic model, based on clinical knowledge. Over time, the recommender system 32 could also be trained together with other models in the dialogue system, including for example the language model 21.


The recommender module 32 may take as input the output of the subject understanding modules 31. The recommender module 32 may also take further information as input, for example patient information 35, and/or information from a history module 37. In some examples, the recommender module 32 may also take as input the user utterance, or multiple previous user utterances. This information may be provided through the history module 37, which may act as a kind of filter, which has access to the entire conversation history of a patient, but only selects the most relevant exchanges. For example, this could apply a “recency” filter, which always outputs the last N user utterances. As another example, the history module may also take as input general information about the user or subject, such as medical history, or demographic data.


An example recommender module 32 which may be used in the dialogue application 102 will now be described. The recommender module 32 uses a machine learning based recommender system within the context of mental health treatment. The task of this system is to adaptively choose interventions for mental health disorders within the context of an electronic application, such as a mental health chatbot. In addition to standard inputs for such a system, like demographics and questionnaires, the recommender module 32 may use two kinds of main inputs to drive the system: (1) machine-learning models capturing a mechanistic understanding of the patient's disorder, i.e. the subject understanding module 31, and (2) a representation of the patient's treatment history 37 that takes into account uncertainties about the current problem description. To target the long-term goal of improved patient outcomes with a fine-grained signal, the recommender system 32 may optimise for three types of metrics: (1) therapeutic alliance, a measure that captures how well the patient connects to and feels understood by a therapist and is a key predictor of therapeutic success. This information is also weighted against (2) patient outcomes and (3) information gain.


Recommender systems are a class of machine learning techniques that broadly aim to present items to a user in a personalised manner. These systems are used in many applications, for example driving the suggestion of videos on a streaming service, posts on social media or items in online shopping. Despite being implemented diversely, recommender systems (RS) share a common goal: to automatically maximise some outcome measures. In an online setting, this outcome is often simply measured by basic engagement metrics, such as the click-through or purchase rate. More complex systems still attempt to maximise metrics but might attempt to balance engagement and purchasing decisions with several different, possibly conflicting metrics. For example, a streaming service might want to optimise for both the time spent listening to music as well as the diversity of music presented. Most recommender systems can be placed on a continuum between two types of systems: (1) those that base their decisions on similarities between items and (2) those that base their recommendations on similarities between users. While pure cases of this distinction might exist, most systems will rely on a mix of both, combining insights both about items and users to drive recommendations. As a class of techniques, recommender systems are not limited to one specific 'learning' approach within machine learning. Instead, they can theoretically be implemented in a supervised, unsupervised or reinforcement learning based manner. Indeed, many real-world recommender system implementations rely on a mix of these approaches. For example, an unsupervised learning approach might be used to identify user groups or to create a more latent representation of a user. This representation might then be used as an input variable for either a supervised learning model that tries to predict a metric based on past user interactions, or a reinforcement learning based system which might use these representations to explore or exploit its action space.


Outside of purely consumer-focused applications, recommender systems may also be used in a health setting. Here, the aim of the system shifts from pure optimization of customer success to optimising a patient outcome. As such, recommender systems may be used to deliver personalised health care and hold promise to improve patient outcomes significantly. Recommender systems can not only be used to treat more typical physical maladies in a more personalised fashion but can also be applied to mental health. One such use case is within the context of mental health mobile applications. There, personalization that can drive engagement is particularly important: due to the nature of their disorders, the mental health patient population can often suffer from low motivation, making an engaging product crucial. As a result, and due to poor personalisation, low engagement and uptake of such applications can follow, significantly reducing their chances to help with symptom alleviation.


Mental health recommender systems may focus on specific subtasks within psychotherapy. For example, a mental health recommender system domain is behavioural activation, a technique to alleviate symptoms, particularly in depression. The idea behind behavioural activation is to engage in activities that increase the patient's wellbeing short- and long-term, and more broadly to get the patient to become more active. As a result, behavioural activation recommender systems have aimed to optimally deliver recommendations of such activities in a personalised manner. Similar specialised recommender systems may also drive recommendations for activities in an obesity context or for stress management.


The example recommender module 32 described here provides a more general recommender system for treatment delivery. It aims to optimally and automatically deliver a diverse set of interventions and exercises during the course of a mental health treatment within the context of a mental health application. Ideally, a recommender system used in the context of a mental health application would aim to optimize for the patient's outcome. However, patient outcomes are a complex concept, encapsulating different timescales and levels of granularity. With a longer term view in mind, therapeutic success might be defined as an absence of a relapse into illness over a longer period of time. Even more broadly, it would encapsulate all aspects of a former patient's life, for example in line with the World Health Organisation's mental health definition ("a state of well-being in which an individual can realize his or her own potential, cope with the normal stresses of life, work productively and make a contribution to the community"). Recommender systems however require relatively frequent and well-defined feedback signals. Thus, while the WHO's definition is a useful abstract aim, it is also both hard to operationalize and temporally sparse. Temporal sparsity also applies to relapse rates, which can often only be assessed long after therapy. This makes directly using such broad metrics unrealistic for a mental health recommender system.


Instead of such broad longer term metrics, a recommender system might therefore optimize more myopically and more specifically. Here, several possibilities exist, ranging from moment-to-moment mood ratings to more machine-learning driven insights about the patient's wellbeing during therapy. However, even those metrics are likely to be very noisy, partially rely on self-report (rather than revealed preference), and can only be assessed at medium frequencies. Therefore, the recommender module 32 focuses beyond pure patient outcomes as success metrics.


Whether for interaction with a human or digital therapist, a key consideration for therapy success outside of pure patient outcomes is therapeutic alliance. Therapeutic alliance is a term (and metric) that encapsulates many aspects of interpersonal relationships between a therapist and patient, and is conceptually independent from more explicitly disorder related measures of therapy success. Despite this independence, there is consensus in the literature that high therapeutic alliance represents a key predictor of successful psychotherapy. In other words, if the patient feels like they are understood and respected by the therapist and that the therapist suggests useful interventions, symptom improvements are more likely.


The connection between therapeutic alliance and patient outcomes makes it important to consider therapeutic alliance as one of the most important, if not the most important, metrics to optimise for a longer-term mental health recommender system. Particularly in the context of the mental health patient population, maximising therapeutic alliance should also be key to reduce churn and drive longer term user engagement. This will in turn be key to deliver quality digital mental health care over the long run. A further advantage of using therapeutic alliance as a key metric is that it can be measured at a higher temporal frequency than other, direct patient outcome measures. For example, therapeutic alliance might be measured through lower-level activities such as app opens, or engaging with a single recommendation. Both of these are 'free' signals an application can derive from any user without further disruption, and these measures should closely correlate with the patient's feeling of being understood and respected by the mental health application. In contrast, measuring patient symptoms can be a more noisy and laborious process that requires explicit measures (such as repeated questionnaires) and can fluctuate highly over time.


Therapeutic alliance can be conceptualised both in terms of the application itself or in terms of the therapist that might work in concert with the application. The recommender module 32 here is focused on alliance with the application itself.


For inputs to the recommender module 32, a mechanistic understanding of the patient's disorder is used, based on qualitative models used in psychotherapy. These models are captured through machine learning systems. In general, the recommender module 32 is not limited to one particular way of modelling the patient's disorder. As such, the most prominent psychotherapy approach of CBT can be used as a modelling basis, but so can others like psychoanalysis. Thus one or more subject understanding models as described previously feed into the recommender system 32.


Inherent within all recommender systems is a tension between exploitation and exploration. In this context, exploitation means that a recommender system, regardless of specific implementation, uses its current knowledge about the best possible items for a user. In turn, exploration means that the recommender system might suggest items where user preferences are yet unclear. For example, a streaming service might keep recommending the same well-liked show or try to expose its users to new, somewhat different content—about which it is uncertain but which might drive long-term engagement. More abstractly, while exploitation maximises short-term gain, it also comes at the expense of potentially missing out on more fruitful solutions—something that exploration can remedy.


The tension between exploration and exploitation is particularly relevant in the context of mental health delivery. That is, successfully suggesting an exercise does not only reveal whether the patient profits from this exercise or is likely to keep engaging with it (as would for example be the case in the streaming service example). In psychotherapy, exercises themselves are often aimed to aid in the therapist better understanding the patient and the patients better understanding themselves. As a result, each suggested exercise is a key source of fine-grained data about the patient. For example, letting the patient record their thoughts and moods might be both engaging and have a therapeutic effect. However, such thought and mood logs are a key data source for creating insights about the patient's thinking—which will in turn drive better insights in the long run.


As a result, a mental health recommender system may take uncertainties in these insights into account. For example, if part of the machine learning systems captures a patient's core beliefs through thought records, then the system should take into account how many of these thought records have so far been captured and how high the signal-to-noise ratio in each of these is.



FIG. 9 is a schematic illustration of an example recommender module 32 that may be used in the first module 30. The output from the recommender module 32 may be included in the system prompt.


This example recommender module 32 is described as applied to CBT. The recommender module 32 uses the following three basic classes as input variables to the recommender system:

    • Patient information: This includes demographic information (e.g. age, location, . . . ), as well as more clearly grounded information, such as therapist diagnosis or questionnaire data. This information may be stored and retrieved for a specific subject.
    • Output from subject understanding module 31 as described previously: This contains information about the patient captured by machine learning models that quantitatively describe the patient's disorder in a mechanistic way. For example, this might contain information about the patient's distorted thinking and the activities they engage in. In a more explicit probabilistic setting, this information might be accompanied by quantified measures of uncertainty for each variable, like a standard error. This description might be rooted in existing psychotherapy such as CBT or psychodynamic approaches. However, these mechanisms might have also been purely created in a data-driven approach.
    • History module: This contains information about the current course and history of the treatment, such as previously completed interventions. In addition to explicit measures of uncertainty in the understanding module, this can also serve as an uncertainty measure, for example by containing the number of times and recency of specific interventions that underlie specific insights. This information may be stored and retrieved for a specific subject.


The inputs might be directly delivered to the recommender system 32. Alternatively, they might first be independently processed by unsupervised learning approaches to improve generalisation and avoid overfitting. For example, the treatment of a patient might be represented by a lower-dimensional representation created by a (variational) auto-encoder or a PCA.
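
As a simple illustration of this optional preprocessing, the sketch below reduces a high-dimensional treatment representation with PCA before it is passed to the recommender; an autoencoder could be substituted. The feature matrix and dimensions are placeholder assumptions.

```python
# Sketch of dimensionality reduction of recommender inputs (placeholder data).
import numpy as np
from sklearn.decomposition import PCA

treatment_features = np.random.rand(200, 50)   # 200 patients x 50 raw treatment features

pca = PCA(n_components=10)
compact_representation = pca.fit_transform(treatment_features)   # 200 x 10 recommender input
```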


Once this information is fed to the recommender system 32, the recommender system 32 might engage in one or more of the following actions:

    • Suggest a single intervention/exercise for the next time step.
    • Plan a sequence of several next exercises.


Information outlining the selected actions may be included in the system prompt. These options are not entirely mutually exclusive: for example, a system that has a strong history representation might end up producing sequences, despite only choosing single interventions.


Interventions will generally be chosen from a set of interventions typical within the psychotherapy framework used by the application. They may be selected in collaboration with trained clinicians. The recommender system 32 aims to select higher-level interventions (e.g. behavioural activation in general), rather than the specific implementational details of each action (e.g. meet with family). Such lower-level personalisations might also be served through more specialised recommender systems.


As discussed previously, the recommender module 32 may be trained separately initially. The knowledge generated by the suggested exercises may be fed back to the understanding module and thereby drive further patient insights in some examples.


The following three classes of metrics may be used as key targets for the recommender system 32:

    • Therapeutic alliance: This can encapsulate both basic short-term engagement with the app as well as longer-term measures of trust, bond or understanding.
    • Patient outcomes: Like therapeutic alliance, patient outcomes can be measured both at a more temporally fine-grained level (e.g. momentary mood ratings or ML-derived insights) but also over longer time (e.g. via weekly questionnaires).
    • Information gain: These metrics capture how uncertain the understanding module might still be about specific insights derived by the understanding modules, for example via standard deviations of specific measures.


Furthermore, the recommender system 32 might be constrained by specific guardrail metrics.


Various methods may be used for maximising these metrics, for weighting them, and for implementing the recommendation process.


The recommender system 32 may work (autonomously) during psychotherapy itself. The recommender system 32 might be applied before in-person psychotherapy with a human therapist. Different levels of autonomy are possible, so that the recommender system 32 may work together with a therapist to different degrees during therapy itself. Both approaches may result in the recommender system 32 being adjusted in specific ways: For example, a purely waitlist focused version of the recommender system 32 might aim more to collect information for a therapist and stabilise the patient outcomes. Such a waitlist system might also have a restricted action space, for example mainly focusing on psychoeducation. When working in concert with a therapist during therapy itself, a version of this recommender system 32 might be equally restricted or prioritise different metrics.


The following outlines how such a system might be implemented in a mental health application that relies on cognitive behavioural therapy as its framework for understanding and treating a patient.


The inputs to a CBT based recommender system 32 are first described. The inputs might be input directly without further preprocessing, or might be first processed using unsupervised learning in order to reduce the dimensionality.


The patient information 35 not derived from machine learning may comprise the following:


1. Demographics

A number of different demographic and treatment variables may be used for driving the recommendation engine 32, including but not limited to the following:

    • Age
    • Gender
    • Ethnicity
    • Disability status
    • Receiving previous mental health support


2. Therapist Information

Information that would be entered by a therapist, for example:

    • Diagnosis
    • Intensity of treatment suggested
    • Length of treatment suggested


3. Symptom Measures

Standardised measures of patient symptoms (e.g. upon referral or entering therapy/waitlist) can also be used, for example:

    • Questionnaires assessing depression symptoms (e.g. PHQ-9).
    • Questionnaires assessing anxiety symptoms (e.g. GAD-7).
    • Questionnaires assessing disorder-specific other symptoms (e.g. PTSD, phobias, etc.)


Questionnaires can be input into the recommender system 32 at different scales. First, questionnaires can be used at the single item level. This might reveal specific aspects of a patient's disorder. Second, the questionnaire scores (i.e. summed over the entire questionnaire) can be used, which might add more stability and provide a bigger picture. Finally, for additional noise-reduction and longer-term monitoring, a (weighted) average over questionnaires that have been administered several times can be used. The latter two options might also be achieved implicitly via representation learning approaches, e.g. in a neural network.
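
The following short sketch illustrates these three scales using PHQ-9 as an example: single items, summed questionnaire scores, and a recency-weighted average over repeated administrations. The item values and weighting scheme are illustrative assumptions.

```python
# Sketch of the three questionnaire input scales (all values assumed).
import numpy as np

phq9_administrations = [
    [2, 1, 3, 2, 1, 0, 2, 1, 0],   # oldest administration (9 item scores)
    [1, 1, 2, 2, 1, 0, 1, 1, 0],
    [1, 0, 2, 1, 1, 0, 1, 0, 0],   # most recent
]

item_level = phq9_administrations[-1]                           # single-item view of latest answers
total_scores = [sum(items) for items in phq9_administrations]   # summed questionnaire scores

weights = np.array([0.2, 0.3, 0.5])                             # assumed recency weighting
weighted_average = float(np.dot(weights, total_scores))         # smoothed longer-term score
```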


An example subject understanding module 31 will now be described. A core principle of CBT is the recognition of a reciprocal relationship between patients' cognitive processes (what they think) and their affect (emotional experience), physiology, and behaviour. The interactions of cognition, emotion, physiology and behaviour are thereby driven by core beliefs which result from earlier life experiences and instantiate under specific (stressful) situations. Therapeutic interventions in CBT focus on changing the resulting cognitive and behavioural patterns in order to reduce symptoms. Thus, a key component of a CBT based recommender system 32 is to understand the individual subcomponents of this qualitative disorder model in a more quantitative manner. This will then allow the targeted delivery of CBT interventions.


Outputs from the following machine learning subject understanding models may be taken as inputs to the recommender system 32:

    • A behavioural patterns model that captures the activities the patient engages in, their frequency and how they impact the patient's mood;
    • A cognitive distortions model that captures how distorted a patient's thoughts are;
    • A core beliefs model that captures the key schemas underlying their thinking;
    • A topic model that captures in which life areas these schemas surface.


Furthermore, physiological symptoms may be captured through a mobile device (e.g. smartwatch/-ring) and the patient's activities tracked using smartphone metrics like gyroscope data, screen time or location. This would allow the system to capture the physiological aspects of a patient's disorder and gain deeper insights into the patient's activities.


For tracking uncertainty, each of these measures can be presented with a (proxy) measure of variance, for example the standard error of a parameter, or the number of times an intervention has been administered that provides information about a parameter.


A history module 37 can contain information about the current course and history of the mental health treatment. This might for example be achieved via a running history of the exercises delivered (treatment at time t-1, t-2, etc.) or via counts of individual exercises.


Here, the history module serves two purposes: it represents what the patient has already completed, thereby reducing redundancies, and it serves as a proxy measure of uncertainty (as discussed above).


An action set defines the possible treatments/exercises the recommender system 32 can suggest to the patient. The recommender module 32 described here uses a set of interventions previously established by CBT. These can be grouped into specific subcategories with the help of trained clinicians. As noted, this recommender system 32 aims to select from higher-level actions, rather than the specific implementational details of each action. Such groups of higher level actions might comprise:

    • Behavioural activation
    • Mood Logs
    • Thought Logs
    • Thought Challenging
    • Psychoeducation:
        • About cognitive distortions
        • About core beliefs


As previously discussed, it is challenging to assess patient outcomes on a fine-grained level, and therapeutic alliance as well as information gain are important metrics to take into account when choosing the optimal intervention from the action set. Next, it will be discussed how these aspects might be captured more specifically.


Therapeutic Alliance

Therapeutic alliance can be broken down into roughly three categories:

    • Goals: The patient and the therapist agree on the goals of the treatment;
    • Tasks: The tasks that the therapist assigns to the patient are perceived as useful;
    • Bond: The patient feels like there is mutual respect and appreciation.


This can be broken down to the following attributes of interacting with a mental health application:

    • User activity: This category describes general engagement of the user with the app which can be measured by low-level interaction with the application. Examples include application opens, reactions to push notifications, time spent in app.
    • User acceptance of suggestions measures how likely a patient is to accept an intervention delivered via the recommender system. This can be measured both by initial acceptance of a suggestion (e.g. via click-through-rate or an equivalent measure in a chat situation) or successful completion of this suggestion.
    • Utility ratings are a more explicit measure of user acceptance and might be captured by pre- and post-task questions about how useful the patient perceives an intervention or whether they liked it more broadly.
    • General utility ratings: Apart from short-term measures of therapeutic alliance, longer-term trust and bond may be assessed using questionnaire measures.


Patient Outcomes

Patient outcomes can be captured through three different categories of signals:

    • Explicit short-term assessments of mood/emotional state. A mental health application can explicitly ask the patient about their current emotional state and rely on this self report as a moment-to-moment measure of patient well-being
    • Explicit long-term assessments of patient well-being can be made through questionnaires that cover symptoms. Here, both more general symptom questionnaires as well as disorder-specific questionnaires can be employed.
    • Implicit machine learning measures: Through a suite of machine learning models it is possible to gain specific insights into the state of the patient's disorder.


Information Gain

An important aspect of many exercises in therapy is helping therapists and patients better understand the mechanisms underlying symptoms. Two aspects of information gain can be conceptualised:

    • Certainty within the understanding module: Capturing latent mental health constructs is naturally a noisy process. This can for example be measured through the standard error or entropy associated with specific model parameters or predictions in the understanding models.
    • Metacognitive insights by the patient describe more explicit measures of how well the patient feels they understand themselves. This is an important goal of CBT and could be assessed through intermediate questionnaires or exercises.


Weighting the Individual Measures

Combining and weighting metrics to guide recommender systems is possible but a non-trivial endeavour, especially when metrics are collected at different time scales. In general, the ultimate aim of a mental health application remains the improvement of a patient's mental health. As a result, any weighting and combination of the three aforementioned metric classes and their sub-metrics should target the patient's health over the longer term. As discussed above, however, long-term improvement is a highly sparse signal, and might in some cases (e.g. relapse rate) only be assessed long after treatment.


Here, several possibilities to approach this issue by weighting shorter-term patient outcomes, therapeutic alliance and information gain are discussed.


1. Pure reliance on simple short-term signals only, like moment-to-moment measures of therapeutic alliance (app opens, accepted recommendations) or patient outcomes (e.g. self-reported mood). While this might be the most trivial approach, it also has some key advantages: this measure is generally freely available and reliable, in that it is unlikely to be confounded by e.g. demand characteristics or misunderstandings of more explicit questionnaire-style measures. Because it is also available at high frequency, it can make training the system significantly more stable. A recommender system might thereby attempt to model short-term impacts of treatments on these measures, but could also assess longer-term changes (e.g. through a response function). The latter option again profits from the high frequency of these metrics.


2. A hand-crafted weighting function. This might reflect different priorities derived from the scientific literature and in collaboration with trained clinicians. Handcrafted weighting might also be applied when the application is used in different contexts. For example, the application might be used during a waitlist period or purely for information gathering in concert with a human therapist. In this situation, the recommender system might be set up to strongly prioritise information gain over other metrics.


3. A data-driven weighting function. With large amounts of data, a statistical model could be built that predicts long term patient outcomes through a function which takes into account the different metrics. This would then reveal an optimal weighting that the recommender system might use.


All options might be constrained by guardrail metrics. Furthermore, outside of a linear weighting, options 2 and 3 might also use other combinations, for example via more complex machine learning models or approaches taking into account Pareto-optimality.
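
As an illustration of a hand-crafted weighting (option 2) constrained by a guardrail, the sketch below combines the three metric classes in a simple weighted sum. The weights, metric names and guardrail flag are assumptions for illustration only.

```python
# Sketch of a hand-crafted linear weighting of the three metric classes (values assumed).
METRIC_WEIGHTS = {"therapeutic_alliance": 0.5, "patient_outcome": 0.3, "information_gain": 0.2}

def combined_objective(metrics, guardrails):
    """Weighted sum of normalised metrics, overridden when a guardrail is triggered."""
    if guardrails.get("crisis_detected", False):
        return float("-inf")                      # guardrail: never optimise past a safety flag
    return sum(METRIC_WEIGHTS[name] * metrics[name] for name in METRIC_WEIGHTS)

# Example: score a candidate intervention's predicted effect.
score = combined_objective(
    {"therapeutic_alliance": 0.8, "patient_outcome": 0.4, "information_gain": 0.6},
    guardrails={"crisis_detected": False},
)
```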


The recommender system 32 may have one of the following system architectures:


1. Supervised learning based on therapist recommendations: In a first step, the recommender system 32 could learn the optimal treatments to deliver based on treatments that have been suggested by a therapist. This could for example be achieved by therapists suggesting apps for their own patients through the mental health application. The recommendation system 32 could then either learn to directly mimic this behaviour, or to mimic this behaviour while taking into account the above metrics (that is, only suggest interventions to patients when they were suggested by a therapist and had positive outcomes). The advantage of the therapist based approach is the potentially higher safety of the recommendations. However, using only prior therapist data limits the system's ability to generalise and is at least partially reliant on the quality of the therapist's recommendations.


2. Reinforcement learning (RL): Reinforcement learning is a machine learning technique where an ‘agent’ (in this case the recommender system 32) interacts with an ‘environment’ (in this case the patient) and learns to take actions (here the interventions) to maximise some reward signal (here the metrics). This approach has previously been used at different scales in recommender systems, ranging from relatively simple contextual bandits to systems that use deep neural networks in combination with reinforcement learning algorithms. A key advantage of reinforcement learning is that it has a built in notion of active exploration and exploitation, something that isn't inherent in default supervised systems. Because of this built in exploration aspect, reinforcement learning might be particularly well suited to the therapeutical setting. The fact that an RL system can explore by itself however also has its downsides which need careful consideration: There are potential safety concerns in the exploration of novel actions. In general, ‘shaping’ the value function and reward signal of an RL system will be key in applying RL to a mental health setting.


In their basic implementations, both supervised and reinforcement learning based systems suffer from low sample efficiency, that is, they require large amounts of training data (or trial and error in the RL setting) until they arrive at good decisions. However, this might be remedied: for example, a reinforcement learning based system could use therapist strategies learned through supervised learning as a starting point (i.e. a prior) to begin its exploration and exploitation of the intervention space.


As discussed, the focus here is on a recommender system 32 choosing higher level intervention categories rather than how these interventions might be implemented. However, it is equally conceivable to develop a system that does this more end-to-end, for example through an integrated two-stage process where a higher level subsystem first suggests the relevant activities and a lower level system then chooses how they would be implemented.


The recommender system 32 is a mechanism- and uncertainty-based mental health recommender system. Described here is a specific instantiation of it in the context of CBT.


While recommender systems are powerful tools, they also have challenges. Below, we discuss how these challenges matter in the context of the recommender system 32.


A first challenge for recommender systems is the so-called cold start problem. That is, recommender systems may have little to no information about a user when they first encounter them. This makes personalised suggestions close to impossible. The mental health recommender system 32 however is at a clear advantage. Under any circumstances, therapy will always begin with substantial, often standardised, data collection steps. This will happen both before the therapy starts (e.g. during the referral process) and early on in therapy. As a result, the recommender system 32 may already possess significant amounts of information about a patient before making its first decision. Furthermore, in the beginning of a treatment, patients could be exposed to an initial default user journey where further data is collected but the recommender system 32 remains offline.


A further challenge is the large action space inherent in most applications of recommender systems. For example, an online streaming service must recommend one of millions of videos to a user. This is problematic for resource reasons (checking the possible value of each item is very expensive) but also complicates the learning process significantly: individual users only interact with very few items, making what is known as the item-user interaction matrix extremely sparse. Again, here the recommender system 32 is in a privileged position: specifically, the recommender system 32 has an intervention space that is orders of magnitude smaller than what most online recommender systems need to optimise over. This reduces both the resources and the data needed to drive decisions, and also allows the user-action space to be explored relatively comprehensively.


This small action space is also relevant to another problem of recommender systems: echo chambers. These can form when a recommender system suggests overly homogenous content. Echo chambers are an unlikely occurrence in the context of the recommender system 32 not only because of the low number of possible interventions but also because the recommender system 32 is designed to prioritise exploring. However, optimising for a specific metric (or a weighting of metrics) can have potential pitfalls and unintended consequences. For example, simply maximising engagement might come at the cost of long-term patient outcomes and only prioritise easy exercises. As a result, both the careful shaping of the objective function and a monitoring of recommendation results is important. Furthermore, fallbacks to simpler and more rule-based recommendations might be used to prevent the system from deviating too much from the therapeutic norm.


Finally, so-called grey and black sheep represent users that are either inconsistent with existing users (grey) or lie completely outside of the space of users covered thus far (black). While this does affect the recommender system 32, the recommender system 32 relies on representations of patients grounded in the scientific literature. This grounding might significantly improve the performance and generalizability within the system.


Although an example subject understanding model is described above, as described previously, a subject understanding model may be based on a different classifier architecture, for example one that has only a single pre-processing step, or one that is tree-based.


In one example, the subject understanding module 31 may comprise a large language model, which acts as a mechanistic model. In other words, the classifier used as a subject understanding model comprises a further large language model. A prompted LLM is used as a classifier. An example input prompt may be


“About You:

You are a clinical diagnostic tool used to predict whether a patient's statement contains a distorted thought according to principles of psychotherapy.


Your Task:

You will receive an utterance that the patient has made and will return a probability (ranging from 0 meaning no distortion to 1 meaning certain distortion).


You will receive the input as “Please predict the probability that the following thought is distorted: {thought}” where “{thought}” is the patient's utterance.


First, think step by step about this problem, then respond with “Probability=X”, where X is your estimated probability that the thought is distorted.”


The user input is then appended at the end of this prompt and fed into the LLM. In this example, a prompted LLM determines subject profile information from the user utterance. The output from the LLM may be a label (e.g. all-or-nothing thinking) and a probability.
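
A minimal sketch of this prompted-LLM classifier is shown below: the prompt above is combined with the user utterance, passed to an LLM, and the returned "Probability=X" string is parsed. The call_llm function is a placeholder for whatever LLM interface is used; it is not defined here.

```python
# Sketch of a prompted LLM used as a distortion classifier (call_llm is a placeholder).
import re

SYSTEM_PROMPT = "About You: You are a clinical diagnostic tool ..."   # full prompt as above

def classify_distortion(thought, call_llm):
    prompt = (
        SYSTEM_PROMPT
        + f'\n\nPlease predict the probability that the following thought is distorted: "{thought}"'
    )
    response = call_llm(prompt)                      # e.g. "... Probability=0.85"
    match = re.search(r"Probability\s*=\s*([01](?:\.\d+)?)", response)
    return float(match.group(1)) if match else None
```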


Another example of a subject understanding model that may be included in the subject understanding module is a behavioural understanding model. An example of a behavioural understanding model will now be described. The model performs one or more classifications in order to generate a classification of adaptive and/or maladaptive behaviours.


The language model 21 may prompt the user to enter activities and corresponding mood records. Alternatively, prompts for the user to enter mood data can be made shortly after the end of an appointment in a calendar for example, or triggered by sensors in the device indicating that the user has left the location of a corresponding appointment in the calendar data.


Any user activity input provided as free text input is classified into a number of pre-determined/pre-defined activity classes. The classification of the free text activity log inputs is first performed. In order to derive meaningful insights, the activities reported/recorded/input by the user are classified into distinct activity categories (i.e. a limited number of pre-defined/pre-determined activities). In this example, this is performed automatically through use of a machine learning model. In some examples, the machine learning model can be derived using unsupervised methods for data-driven approaches, using supervised methods, or using clinician-provided heuristics and look-up tables.


The model takes free-text as input, whereby this free-text is transformed into a vector representation. Many different ways of obtaining such vector representations are possible (e.g. term-frequency inverse document frequency or transformer networks) and the outlined application is independent of the chosen embedding.


An evaluation process performs the evaluation of the user input. In this example, the evaluation process comprises a sequence of processes that are used to process the user input (e.g. the user activity logs and associated mood logs).


To train the models, a first unsupervised model performs clustering of the activity inputs. Using the data collected from user activity logs, in free text form, the free-text inputs are transformed into a sentence embedding using a pre-trained transformer model (for example the "distilbert-base-nli-mean-tokens" model) to output a vectorised text input in the pre-processing step 315. The vectorised text input is provided to an unsupervised clustering algorithm (in this example, a Bayesian Gaussian Mixture model), which determines the most common and consistent clusters of activities represented in the data.
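
A hedged sketch of this first, unsupervised training step is given below, using the sentence-transformers library and scikit-learn's Bayesian Gaussian Mixture implementation. The example logs and the number of mixture components are assumptions.

```python
# Sketch of the unsupervised clustering step over embedded activity logs.
from sentence_transformers import SentenceTransformer
from sklearn.mixture import BayesianGaussianMixture

activity_logs = ["went for a run", "had lunch with a friend", "stayed in bed all day"]

encoder = SentenceTransformer("distilbert-base-nli-mean-tokens")   # model named in the text
embeddings = encoder.encode(activity_logs)                         # 768-dimensional vectors

clusterer = BayesianGaussianMixture(n_components=20, random_state=0)  # assumed component count
cluster_labels = clusterer.fit_predict(embeddings)                    # raw clusters for clinician review
```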


To train a second model, these clusters are reviewed manually by trained clinicians to derive an interpretation for each of the clusters, to create the necessary data for supervised training of the second model. Specifically, the manual review of the clusters of data involves reviewing at least some of the text examples in the cluster and the distance of text examples from the cluster mean. Multiple clusters can be combined during the manual review to construct more general and/or overarching clusters (for example a first cluster might be deemed to represent examples of users eating breakfast while a second cluster might be deemed to represent examples of users eating lunch so these two clusters can be combined into a more general cluster to represent eating). By using a manual review, by clinical professionals, the clusters are typically more meaningful and/or more interpretable.


Due to the high dimensionality of the input data (in this example, there are 768 dimensions), the clustering algorithm will derive strongly separated clusters, which means that the cluster borders will not generalise well to new and unseen text examples. To overcome this limitation, the clustering approach in this example is refined by using a second, supervised, neural network model trained on the manually created overarching cluster labels. Specifically, the vectorised text is used as input and the broader cluster labels (created by the manual review process described above) are used as the output, and the supervised neural network is trained to predict these output cluster labels. Thus, this approach uses supervised machine learning to approximate the output of an unsupervised clustering algorithm, acting as a (quasi) dimensionality reduction technique and substantially improving the categorisation performance and/or the generalisability to new text examples (effectively leading to the categories having less strict boundaries specifically fit to the training data).


Predictions from the resulting trained supervised machine-learned model 320 (the second model) are used to infer the activity categories from the user's free text input of their activity logs.


The output of the trained, supervised, machine-learned model is not just the most likely prediction from the algorithm but an output of class probabilities across all existing categories. In some examples, the method takes the activations of a final neural layer and feeds them through a SoftMax function. By outputting probabilities across each of multiple categories, the output accounts for the varying contributions of different activities (for example, a free text entry in an activity log might be "I went for dinner with some friends", which when input to the model will cause an output probability for both the "eating" and "socialising" categories, reflecting that both categories are represented by this activity log entry).
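
The sketch below illustrates the second, supervised step and its probabilistic output: a small classifier trained to predict the clinician-reviewed category labels from sentence embeddings, with predict_proba providing class probabilities across all categories. The training examples, labels and classifier settings are illustrative assumptions.

```python
# Sketch of the supervised model approximating the reviewed clusters (data assumed).
from sentence_transformers import SentenceTransformer
from sklearn.neural_network import MLPClassifier

encoder = SentenceTransformer("distilbert-base-nli-mean-tokens")

# Placeholder training data: embedded activity logs and clinician-assigned categories.
training_texts = ["went for a run", "had lunch with a friend", "stayed in bed all day"]
reviewed_labels = ["exercise", "socialising", "rest"]

embeddings = encoder.encode(training_texts)
activity_classifier = MLPClassifier(hidden_layer_sizes=(128,), max_iter=500)
activity_classifier.fit(embeddings, reviewed_labels)

# Class probabilities across all categories for a new activity log entry.
probs = activity_classifier.predict_proba(encoder.encode(["I went for dinner with some friends"]))
```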


In order to determine adaptive or mal-adaptive behaviours, the influence of activities needs to be correlated with the wellbeing of the user. In this example, the mood data collected from the user is used as an indicator for the wellbeing of the user. In other examples, other indicators for wellbeing can be used such as scores on clinical outcome measures. Any user mood input provided needs to be transformed into a numerical dimension, and, in this example, this transformation is performed by the pre-processing process 315 applying sentiment analysis techniques to convert the mood inputs from the user into numerical values. In this example, the Vader sentiment analysis model, which is a rule-based approach, is used to transform free text into a one-dimensional numerical representation ranging from negative to positive. In other examples, different models can be used to perform sentiment analysis, or a neural network approach could be used.
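
A minimal sketch of this mood pre-processing step with the VADER model is shown below; the compound score is used as the one-dimensional mood value. The example sentences are illustrative.

```python
# Sketch of mapping free-text mood entries to a single numerical value with VADER.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def mood_to_number(mood_text):
    """Return a one-dimensional mood value in [-1, 1]."""
    return analyzer.polarity_scores(mood_text)["compound"]

# mood_to_number("I felt really low and tired today")  -> negative value
# mood_to_number("Had a lovely, relaxed evening")      -> positive value
```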


To infer whether activities have an adaptive or mal-adaptive nature, some form of correlation between activities and mood needs to be established. In this example, a multiple regression approach is used, where the input includes the proportion to which each activity class contributes to each reported activity. This approach allows the inference process 340 (for determining the correlation between activities and mood) to control for the effects of other activities when inferring the influence of a specific activity on mood (for example, drinking and socialising might be highly correlated for some people and often happen together so, when estimating the influence of each of these activities the approach needs to control for the presence of the other activity—for example is the user happy because they are drinking or happy because they are socialising). In other examples, a simple correlation between the presence or absence of an activity with the experienced mood can be recorded.


Referring now to FIG. 11, which shows the multiple regression mood prediction approach 400, by using the probabilistic output 410 from the activity classification machine-learned model (which outputs multiple probabilities across activities as outlined above) as input to the multiple regression process 400, the multiple regression process 400 (applying the learned weights for each activity 420) outputs a prediction 430 of the patient's mood. The class probabilities 410 output by the activity classification algorithm represent to what degree each reported activity (i.e. each entry in a user activity log) contains aspects of different activity classes and, by inputting all of these probabilities (across classes) into the multiple regression 400, the approach controls for mutual influences between activities on mood. The derived beta weights 420 for each activity class are a measure of the determined correlation for each user between mood and activities, controlled for potential influences from other activities. The beta weights 420 are used to define adaptive (positive beta weights) and mal-adaptive (negative beta weights) behavioural patterns/activities.
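
A minimal sketch of the multiple regression step is shown below, using ordinary least squares regression from scikit-learn; the data shapes, synthetic values and category names are illustrative assumptions.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Illustrative stand-ins: for each reported activity, the class probabilities output by the
    # activity classification model, and the mood value reported around the same time.
    rng = np.random.default_rng(0)
    activity_probabilities = rng.dirichlet(alpha=np.ones(4), size=60)  # columns: e.g. eating, socialising, drinking, exercise
    mood = rng.normal(size=60)

    regression = LinearRegression()
    regression.fit(activity_probabilities, mood)

    # The fitted coefficients play the role of the beta weights 420: positive values suggest
    # adaptive activities, negative values suggest mal-adaptive activities, each controlled
    # for the contribution of the other activity classes.
    beta_weights = regression.coef_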


One constraint for estimating the beta weights is the amount of data available for each patient, in particular for new users for whom there is no or limited data available. In order to estimate the beta weights reliably, sufficient data is needed for each patient and each activity. If only limited data points are available, estimates for the beta weights will be noisy and unreliable. To enable individual insights, therefore, sufficient data is required in order to derive reliable individual estimates. To provide estimates for an individual before sufficient data is obtained, in this example group-level priors are used on the estimated beta weights. To derive these group-level priors, a hierarchical Bayesian model is fitted which estimates an average group-level beta weight for each activity (an average effect across all patients for which data has been gathered), but allows individuals to vary from this beta weight by estimating a full distribution of beta weights in the analysed group. The group distribution for each beta is represented by its mean (the average effect over all patients for which data has been gathered) with a standard deviation (the variation between patients on how this activity affects their mood). Using this group-level distribution to inform the individual beta estimates (for example by calculating a maximum a posteriori fit) has the advantage of using group-level information when little data is available for the patient, while allowing the beta weights to vary from this group-level effect value if sufficient data is available for the patient, effectively solving the trade-off between providing individual-specific insights and robustness against noise where there is limited data per patient.
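
As a simplified one-dimensional illustration only (assuming a Gaussian group-level prior and a Gaussian individual estimate, which the described hierarchical model does not necessarily assume), a maximum a posteriori estimate for an individual beta weight can be written as a precision-weighted combination of the group-level mean and the individual's own estimate:

    def map_beta(group_mean: float, group_std: float,
                 individual_estimate: float, individual_std_error: float) -> float:
        # Precision-weighted combination of the group-level prior and the individual estimate.
        # With little individual data (large standard error) the result stays close to the
        # group mean; with plenty of data it is dominated by the individual estimate.
        prior_precision = 1.0 / group_std ** 2
        data_precision = 1.0 / individual_std_error ** 2
        return (prior_precision * group_mean + data_precision * individual_estimate) / (
            prior_precision + data_precision
        )

    # New user with a noisy individual estimate: the group-level prior dominates.
    print(map_beta(group_mean=0.3, group_std=0.1, individual_estimate=-0.8, individual_std_error=1.0))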



FIG. 8 is a schematic illustration of example information used to generate a system prompt in S303 of the method described in relation to FIG. 3 above.


As has been described previously, the dialogue system 100 receives an input user utterance 81. The user utterance is provided as input to generate the system prompt.


The user utterance 81 is also provided to a subject understanding module 31, which comprises one or more clinical understanding modules (also referred to as mechanistic models). The subject understanding module 31 outputs subject profile information based on the input user utterance. The subject profile information may comprise an indication that the user utterance contained a distorted thought for example—this is also referred to here as a prediction. The subject profile information output from the subject understanding module 31 may further comprise one or more inputs and/or additional information from the one or more clinical understanding models. The subject profile information is used to generate the system prompt.


A general prompt comprising instructions for the language model is also retrieved and included in the system prompt. For example, the system prompt may be constructed by taking the general prompt and appending the user utterance and then the subject profile information. The general prompt may correspond to a stored text template. The stored text template may be retrieved based on the subject profile information for example. In one example, the general prompt may comprise information instructing the language model on the role to be performed by the language model. This information may include information indicating that the language model is to use techniques from psychotherapy. The general prompt may comprise information indicating the context of the system output to be generated. For example, the general prompt may state that the language model is engaged in a conversation with the user. The general prompt may comprise information indicating the goal of the language model. The general prompt may comprise information indicating available techniques.
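
A minimal sketch of one way the system prompt could be assembled is shown below; the exact ordering and field names are assumptions based on the construction described above (general prompt, then the user utterance, then the subject profile information), and the example values are hypothetical.

    def build_system_prompt(general_prompt: str, user_utterance: str, subject_profile: dict) -> str:
        # Append the user utterance and then the subject profile information to the general prompt.
        profile_lines = "\n".join(f"- {name}: {value}" for name, value in subject_profile.items())
        return (
            f"{general_prompt}\n\n"
            f"User utterance: {user_utterance}\n\n"
            f"Subject profile information:\n{profile_lines}"
        )

    prompt = build_system_prompt(
        general_prompt="You are engaged in a conversation with the user and use techniques from psychotherapy.",
        user_utterance="Nobody ever listens to me.",
        subject_profile={"prediction": "distorted thought", "technique": "cognitive restructuring"},
    )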


The general prompt may comprise information relating to the possible outputs of the subject understanding module 31. For example, the general prompt may include information identifying the possible classifications used by the subject understanding module 31, and one or more specific clinical techniques associated with one or more of the possible classifications. At least one classification predicts a specific medical diagnosis. The clinical technique may be a clinical technique used to treat a subject receiving such a diagnosis. The general prompt may further include information explaining how one or more of the specific clinical techniques is performed. The clinical techniques are techniques used by psychotherapists. For example, a classification may be an indication of cognitive distortion. A clinical technique associated with this classification may be cognitive restructuring.


Optionally, the system prompt is also generated to include one or more previous utterances from the dialogue with the user, retrieved from a stored record of the dialogue, referred to here as the user history 82.


Optionally, the system prompt is also generated using inputs from a database—also referred to here as background knowledge 33.


Optionally, the system prompt is also generated using an output from other processing steps or augmentations. For example, the output of a recommender module 32. The inputs from the database may be used by the other processing steps or augmentations.


The user input may be used by the other processing steps or augmentations. These are provided in the system prompt as further inputs from other models.


The system prompt is taken as input to the language model 21, which generates the utterance (LLM utterance). The system prompt may also be used for other purposes, for example user routing.


The language model 21 is instructed to perform one or more core functions, for example cognitive restructuring, check-ins, three column thought records, exposure, behavioural activation, etc., based on what one or more mechanistic models in the first module 30 recommend. The mechanistic models in the first module 30 may augment the user input with indications such as "distorted thought" or "core belief", which the language model can then use to decide which function it should fulfil.


In some examples, the subject understanding models are queried in a flexible way. For example, an initial pre-processing model may be included in the subject understanding module, which takes the user input on every turn. Instead of outputting a label, this initial pre-processing module determines which of the mechanistic subject matter understanding models is to be queried. This pre-processing model may be the goal directed system. For example, the free-text input is transformed into a vector representation and fed into an initial pre-processing deep neural network in the subject understanding module in order to classify whether the free-text input represents a thought or a different type of patient input (e.g. description of a situation or an emotion). If the free-text input represents a thought, it is then provided as input to the cognitive distortion understanding model 300 described above.
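
A minimal sketch of this flexible querying is shown below; the helper functions (embedding, input-type classifier and cognitive distortion model) are hypothetical placeholders for the models described above.

    def route_free_text_input(text, embed, input_type_classifier, cognitive_distortion_model):
        # Transform the free-text input into a vector representation.
        vector = embed(text)
        # Classify whether the input represents a thought, a situation or an emotion.
        input_type = input_type_classifier(vector)
        if input_type == "thought":
            # Only thoughts are passed on to the cognitive distortion understanding model.
            return cognitive_distortion_model(vector)
        # Other input types are handled by other subject understanding models.
        return None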


Example 1

In the below, an example in the framework of Cognitive Behavioural Therapy (CBT) will be described, in which a dialogue system uses CBT as the lens through which to understand and treat a patient. In particular, it is described how a CBT model-based system interacts with a large generative pretrained transformer model trained for text completion via prompting. As the delivery platform, a mental health app 70 where patients interact with a mental health chatbot through text conversations is provided. The app runs on a user device 200. The example will be described with reference to FIG. 6.


As has been described previously, the AI system has two main components. The first module 30 is a goal-directed therapy system based on the principles of CBT. This CBT system 30 itself has several subcomponents:

    • A subject understanding module 31 that generates quantitative or qualitative insights (e.g. a detailed description of their most common symptom taken from a large set of previous interactions) about the patient based on the patient's conversation with the chatbot;
    • A recommender module 32 that uses the insights generated by the understanding module 31 to select next interventions (this can include treatment, psychoeducation, etc.);
    • A knowledge bank 33 comprising material that the chatbot can use to explain concepts (in psychoeducation and beyond).


The second main component, or second module 20, comprises a large language model 21 which translates the insights and recommendations generated by the goal-directed system into the text for the chatbot 70. In this case, the goal-directed module 30 and large language module 20 interact through prompts that can contain information provided by the goal-directed module 30.


The dialogue application 102 also comprises a safety module 50, that comprises one or more machine learning modules that evaluate the quality and safety of the utterances of the language model 21.


As described, the patient interacts with a conversational chatbot app 70 that takes at least part of the role of a psychotherapist. In this example, the input format for the patient is text. The chatbot app 70 responds with natural language and converses with the patient (also referred to as the subject). In this example, the user is also the subject. In other examples, the user may be a clinician, who uses the dialogue application in relation to a patient who is a separate person.


The conversations can contain specific exercises. For example, these might include thought and mood records that produce information about what the patient is thinking, doing, and how they are feeling. These exercises can also be more directly targeted at improving symptoms, for example through behavioural activation, or through thought challenges.


The information produced by such exercises is then processed by the subject understanding modules 31. These subject understanding modules 31 capture core components of CBT which include, but are not limited to, behavioural patterns, core beliefs and negative automatic thoughts. As a result, they can build a detailed profile of the patient, for example showing that the individual is characterised by medium amounts of cognitive distortions and holds the core belief that the world is a dangerous place.


A further function of the understanding module 31 is to contain general information about the patient like demographics, questionnaire scores, or diagnosis. Together, this information is then used by a mental health recommender system 32. The task of this recommender system 32 is to choose from a number of interventions defined by CBT in order to maximise patient outcomes, for example measured by therapeutic alliance.


In the prompt, this specific information is then summarised, framing the right exercises or intervention as well as surfacing the relevant background knowledge.


In targeted psychoeducation, the language model 21 uses information about the patient (subject) that it derived from previous conversations as well as background information to explain a concept. Note that this is only an example, and that prompts may be significantly longer. In this example, text created by the language model 21 is shown in bold. In the example, the dialogue system provides personalised psychoeducation about behavioural activation. The prompt contains information specific to this psychoeducation.


In the below, the initial prompt comprises the text up to the first Therapist utterance. The first Therapist utterance is then generated by the language model 21, the patient replies, the language model receives all of these interactions (including the initial prompt) again and generates the next Therapist utterance and so forth. The information about the patient may come from stored profile information, or from a subject understanding model. The part relating to behavioural activation is added and scheduled via a behavioural understanding module or a recommender module (or as part of a pre-defined treatment plan which was generated at an earlier time).


Although in the above described examples, the language model receives the latest patient utterance on each dialogue turn, in other examples the input to the language model comprises a processed or filtered summary of previous user inputs, for example containing a summary of a previous intervention and its success, or a patient's most frequent issues.


In some examples, the system prompt may comprise additional information. For example, the system prompt may comprise data corresponding to physiological measurements of the subject, such as heart rate.


The system prompt may additionally or alternatively comprise numeric data from the user, for example the subject's daily step count or number of activities.


Various language models may be used in the dialogue system. The language model may comprise a general purpose transformer trained on text completion, or a model more specifically trained for conversational skills and/or fine-tuned for psychotherapeutic purposes, for example.


Furthermore, although example subject understanding models are described, various other models may be used to generate insights about the patient that are delivered to the language model. For example, the subject understanding models might capture aspects of cognitive behavioural therapy, but could additionally or alternatively be based on methods and theories of psychoanalysis. Various machine learning methods and therapy tools may be used to generate these insights and to build a quantitative and explainable profile of the patient.


Furthermore, although some examples described above use a recommender system, with the described actions and interventions from which such a system could choose, these could additionally or alternatively be derived from different schools of psychotherapy and rely on different metrics to optimise. Instead of being chosen by a recommender system, specific exercises might also be prescribed by a therapist but nevertheless delivered and executed by a chat system that interacts with the patient.


Although the above described examples show English language dialogues, large language models are inherently multi-lingual, being pre-trained on a multitude of languages. As a result, the dialogue system may in some examples be capable of conducting therapeutic conversations in different languages.


In the above described examples, the dialogue system is a fully autonomous system. However, in some other examples, the input prompt may comprise recommendations about exercises input by a human therapist.


Although in the above described examples, the subject understanding models are trained models that receive the user input and generate the output subject profile information, in some other examples, the subject understanding models are not trained models, and some other kind of model is used. In some examples, the subject understanding models do not receive the user input, and instead generate the subject profile information from some other kind of input, for example from clinician data or historic patient data.


In the above described method, a system input is generated using the subject profile information and provided to a language model to generate a system response. In the above described examples, the system input is a system prompt, which is provided as input text to the language model. The language model then iteratively generates the next words in a sequence starting with the prompt. Thus an input text prompt is used to provide the information from the first module to the language model. Such prompts are a good tool for communicating complex textual information. However, in other examples this context information may be input via different means, for example as a state representation. Thus in other examples, the system input is a state representation. This could for example be a vector embedding of the context information, similar to how the text in the cognitive distortions model is embedded.



FIG. 12 is a schematic illustration of a dialogue application 102 according to another example that may be stored and executed by a dialogue system 100 as described in relation to FIG. 1. Execution of the dialogue application 102 will cause methods as described herein to be implemented. In this example, the dialogue application 102 comprises an input safety module 60 to evaluate input data from a user, a first module 30 and an interaction module 56 to generate dynamically determined system responses, and an output safety module 50 to evaluate the dynamically determined system responses prior to outputting the dynamically determined system responses to the user.


The dialogue application 102 when executed receives, by way of input 101, input data from the user. The input safety module 60 evaluates the input data and outputs a subject safety module output based on the evaluation. A flow module 58 parses the subject safety module output from the input safety module 60 to either trigger the execution of the prompt generation module 40 included in the interaction module 56, or to trigger the execution of a crisis dialogue module (not shown in FIG. 12 but shown as crisis dialogue module 63 in FIG. 13(a)). If execution of the prompt generation module 40 is triggered, the prompt generation module 40, along with the second module 20 and in some cases the first module 30, analyses the input data further to generate dynamically determined system responses. If execution of the crisis dialogue module is triggered, the crisis dialogue module retrieves a pre-determined system response to be outputted, by way of output 101, to the user. The input safety module 60 is another example of a subject safety module 60 as described in relation to FIG. 7, and is also referred to here as a subject safety module 60. The input safety module 60 runs safety checks on all user inputs. In this example, it may also further direct the user to crisis services.


The dynamically determined system responses are evaluated by the output safety module 50 prior to outputting them to the user. The output safety module is an example of an output safety module 50 as described in relation to FIG. 6 above. The output safety module 50 evaluates the dynamically determined system responses and outputs an output safety module output based on the evaluation. The output safety module output from the output safety module 50 may include an indication of whether the dynamically determined system responses are unsuitable for outputting to the user. The flow module 58 parses the output safety module output from the output safety module 50. If the output safety module output from the output safety module 50 includes an indication that the dynamically determined system responses are unsuitable for outputting, then the flow module 58 may trigger execution of a modifier module (not shown in FIG. 12 but shown as modifier module 65 in FIG. 15). The modifier module enables modification of and/or changes to the dynamically determined system responses such that the dynamically determined system responses are suitable for outputting to the user. If the output safety module output from the output safety module 50 includes an indication that the dynamically determined system responses are suitable for outputting to the user, then these responses are outputted, by way of output 101, to the user.


As discussed previously, a user may interact with the dialogue system 100 through an app running on the user device 200. The user may trigger the execution of the dialogue application 102 via the app. When the dialogue application 102 is executed, the dialogue application may begin engaging with the user. In some examples, to engage with the user, the dialogue application 102 provides the user with system utterances and/or system responses. These system utterances and/or system responses include cues that prompt the user to provide user input (e.g., as user utterances and/or patient utterances).


The user inputs are provided to the dialogue application 102 as input data. In other examples, the user may provide the dialogue application 102 with input data without any cues. For instance, the user may initiate an engagement with the dialogue application 102 by providing input data to the dialogue application 102 before the dialogue application 102 provides the user with system utterances and/or system responses.


Input data received by the dialogue application 102 is first evaluated by the input safety module 60. More specifically, each input data from the user is provided to the input safety module 60 for evaluation. The input safety module 60 evaluates the input data and outputs an indication of whether the user is in crisis. An indication of crisis may include a determination that the user is in immediate danger of self-harm or of harming others.


An indication of crisis may include a determination that the user is in immediate danger of one or more of the following:

    • attempting suicide,
    • physically abusing themselves,
    • physically abusing others,
    • being physically abused by others,
    • psychologically abusing others,
    • being psychologically abused by others,
    • inflicting harm upon themselves,
    • harming others,
    • being harmed by others,
    • threatening others,
    • being threatened by others,
    • being subject to violence, or
    • causing violence.


In this example, the input safety module 60 comprises a matching engine and a language model to evaluate the input data. Additionally, the input safety module 60 comprises a crisis dialogue module to be executed in response to an indication that the user is in crisis. FIG. 13(a) is a schematic illustration of an input safety module 60 that may be included in the dialogue application 102 described in relation to FIG. 12. As seen in FIG. 13(a), the input safety module 60 comprises matching engine 61, language model module 74, language model 62, and crisis dialogue module 63.


The matching engine 61 is applied to each user input (e.g., each user utterance and/or patient utterance). The matching engine 61 comprises a trigger word system, which searches the user input for a pre-defined set of one or more specific words and/or phrases. For example, the matching engine 61 may apply a set of one or more regular expressions to detect words and/or phrases in the user input. As an example, the pre-defined set of words and/or phrases in the trigger word system may include phrases that might provide an indication that the user may be in crisis. For example, the pre-defined set of words and/or phrases may include phrases such as “I want to harm myself,” “I want to kill myself,” etc. or words such as “suicide,” “harm,” etc. If a word and/or phrase from the pre-defined set of specific words and/or phrases is detected in the user input, then the matching engine 61 outputs a subject safety module output including an indication that the user is in crisis. For example, the subject safety module output may include a list of words and/or phrases from the pre-defined set of specific words and/or phrases found in the user input. In another example, the system output may include text such as a word or a phrase (e.g., “found,” “trigger words found,” etc.) that provides an indication that the user is in crisis. Additionally or alternatively, the system output may include a data flag (e.g., tag, label, annotation, flag, etc.) that provides an indication that the user is in crisis. For instance, the matching engine 61 may set a data flag as “true” indicating that the user is in crisis.
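
A minimal sketch of the trigger word check is shown below; the patterns listed are illustrative only, and a deployed system would use a clinically curated set of words and phrases.

    import re

    # Hypothetical trigger patterns; a deployed system would use a clinically curated set.
    TRIGGER_PATTERNS = [
        re.compile(r"\bI want to (harm|kill) myself\b", re.IGNORECASE),
        re.compile(r"\bsuicide\b", re.IGNORECASE),
        re.compile(r"\bharm\b", re.IGNORECASE),
    ]

    def matching_engine(user_input: str) -> dict:
        # Subject safety module output: the matched phrases plus a data flag set to True
        # when the user input contains any word or phrase from the pre-defined set.
        matched = [match.group(0) for pattern in TRIGGER_PATTERNS for match in pattern.finditer(user_input)]
        return {"crisis": bool(matched), "matched_phrases": matched}

    print(matching_engine("I want to harm myself"))  # {'crisis': True, 'matched_phrases': [...]}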


In this example, a flow module 58 parses the subject safety module output from the matching engine 61 to detect an indication that the user is in crisis. If detected, the flow module 58 triggers the execution of the crisis dialogue module 63. The crisis dialogue module 63 generates and outputs a pre-determined system response and/or a pre-determined system utterance as further described below. For instance, the crisis dialogue module 63 may provide the user with options for immediate support.


If the flow module 58 does not detect an indication of the user being in crisis from the subject safety module output of the matching engine 61, then the input data is provided to a language model 62 for further analysis. The language model 62 may be a general generative pre-trained transformer model that may be trained for text completion via prompting. The language model 62 may be a language model such as described in relation to FIG. 5 for example. In this example, the language model 62 is shown as part of the dialogue application 102. However, the language model 62 may be implemented on a separate system, with the system prompts being sent to and the language model outputs being received from the language model 62 system via a communication network.


In this example, the input data is provided to a language model module 74. The language model module 74 generates a subject safety module system prompt. The language model 62 receives the subject safety module system prompt and outputs a second subject safety module output. The language model 62 outputs the second subject safety module output based on the subject safety module system prompt. The second subject safety module output may include an indication of whether the user is in crisis.


The subject safety module system prompt may be a combination of a stored template and input data. In some examples, the language model module 74 may receive the stored template, for example it may retrieve the stored template from a stored location. Alternatively, the language model module 74 stores the stored template. The language model module 74 may combine the input data with the stored template to generate the subject safety module system prompt. User input may be included at a pre-defined point in the stored template, indicated by an identifier in the stored template, to generate the system prompt. The generated subject safety module system prompt is provided as input to the language model 62, which generates a subject safety module output. The generated subject safety module system prompts comprise a sequence of text. The system prompts may be generated in a similar manner to the prompt generation process described previously in relation to FIGS. 2 and 3. An example of a subject safety module system prompt is shown below:


    Someone said: \“{prompt}\”. As a therapist, do you think this person at the risk of self-harm or committing suicide? Answer with Yes or No\n

In the above example, a stored template including instructions to enable the language model 62 to determine whether the user is in crisis is combined with the user input to generate a subject safety module system prompt. The system prompt includes instructions to enable the language model 62 to determine whether the user is in crisis based on the user input. The language model 62 processes the system prompt including the input data and outputs a subject safety module system output with an indication of whether the user is in crisis. In this example, if the user is at risk of self-harm or committing suicide based on the user input, then the language model 62 outputs “Yes.” If the user is not at risk of self-harm or committing suicide based on the user input, then the language model 62 outputs “No.”


Another example of a subject safety module system prompt provided to the language model 62 is shown below:


    Someone said: “{input}”. Please indicate whether there is any indication that the person mentioned any abuse or harm? This could include abuse, psychological abuse, physical abuse, violence, harm or threat by others as well as harm to others. Answer with Yes or No

In this example, the system prompt includes instructions to enable the language model 62 to determine whether the user is at risk of abuse or harm from others or at risk to abuse or harm others based on the user input. If the user input includes an indication of harm or abuse, then the language model outputs “Yes.” If the user input does not include an indication of harm or abuse, then the language model outputs “No.”


In this example, each of these prompts is a short prompt, to improve latency. However, in some other examples, further instruction comprising additional context is provided in the prompt.


The language model 62 may be executed multiple times, with different subject safety module system prompts, until either a detection is made (for example the language model 62 outputs “Yes”) or all the system prompts have been performed. For example, the first system prompt above may be taken as input to the language model 62. If the language model outputs “Yes”, this is taken as the second subject safety module system output to the flow module 58. If the language model 62 outputs “No”, the second system prompt above is taken as input to the language model 62. The output of the language model 62 is then taken as the second subject safety module system output to the flow module 58.
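
A minimal sketch of this sequential querying is shown below; call_language_model stands in for the call to the language model 62, and the templates are abbreviated versions of the example prompts above.

    SAFETY_PROMPT_TEMPLATES = [
        'Someone said: "{user_input}". As a therapist, do you think this person is at '
        'the risk of self-harm or committing suicide? Answer with Yes or No',
        'Someone said: "{user_input}". Please indicate whether there is any indication '
        'that the person mentioned any abuse or harm? Answer with Yes or No',
    ]

    def run_safety_checks(user_input: str, call_language_model) -> str:
        # Query the language model with each subject safety module system prompt in turn,
        # stopping as soon as a detection ("Yes") is made.
        for template in SAFETY_PROMPT_TEMPLATES:
            answer = call_language_model(template.format(user_input=user_input))
            if answer.strip().lower().startswith("yes"):
                return "Yes"
        return "No"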


In this example, if the flow module 58 does not detect an indication of the user being in crisis from the matching engine 61, then the input data is provided to a language model 62 for further analysis, and one or more prompts are provided to the language model 62 in sequence. In some other examples however, separate instances of the language model 62 may be executed multiple times in parallel with different subject safety module system prompts. For example, a first instance of the language model 62 may be executed with a first subject safety module system prompt and a second instance of the language model 62 may be executed with a second subject safety module system prompt simultaneously. Executing instances of the language model 62 in parallel with different subject safety module prompts may reduce latency. The matching engine 61 may also be executed in parallel with the language model 62, such that the final second subject safety module system output is “Yes” if any of the language model calls or the regular expressions performed by the matching engine 61 returned Yes.


In this example, the second subject safety module system output, from the language model 62, is provided to the flow module 58. The flow module 58 parses the output from the language model 62, looking for information indicating that the user is in crisis. The information may be a sequence of characters. For example, the output from the language model 62 may comprise either "Yes," or "No." If the flow module 58 detects a "Yes," then the flow module 58 passes control to the crisis dialogue module 63. If the flow module 58 detects a "No," then the flow module 58 passes control to the prompt generation module 40.


Although in this example the language model 62 is described as being separate from language model 21 in the second module 20, in some examples, language model 62 and language model 21 may be the same language model. In such examples, a subject safety module system prompt including instructions to enable language model 21 to detect whether the user is in crisis is provided to the language model 21. The language model 21 outputs a subject safety module system output including an indication of whether the user is in crisis. The flow module 58 parses the subject safety module system output from the language model 21 and passes control either to the crisis dialogue module 63 or to the prompt generation module 40.


In some examples, the check performed by the language model module 74 may alternatively be performed by a neural network that may be trained to classify user input with a label (e.g., "risk," "no risk," or "crisis," "no crisis," etc.). The model is trained using a dataset of user utterances labelled with ground truth labels by clinicians, for example, where the user utterances determined as corresponding to a crisis are labelled "Yes" and the user utterances determined as not corresponding to a crisis are labelled "No". The loss function may be cross-entropy loss. Again, if user input indicating that the user is in crisis is detected, then the flow module 58 triggers the execution of the crisis dialogue module 63.
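
A minimal sketch of such a classifier is shown below, assuming utterance embeddings as input and clinician-provided crisis labels; the architecture, dimensions and optimiser settings are illustrative assumptions.

    import torch
    import torch.nn as nn

    # Binary classifier over 768-dimensional utterance embeddings: 1 = crisis ("Yes"), 0 = no crisis ("No").
    classifier = nn.Sequential(nn.Linear(768, 128), nn.ReLU(), nn.Linear(128, 2))
    loss_fn = nn.CrossEntropyLoss()
    optimiser = torch.optim.Adam(classifier.parameters(), lr=1e-4)

    def training_step(embeddings: torch.Tensor, labels: torch.Tensor) -> float:
        # One supervised update on a batch of (embedding, clinician label) pairs using cross-entropy loss.
        optimiser.zero_grad()
        logits = classifier(embeddings)
        loss = loss_fn(logits, labels)
        loss.backward()
        optimiser.step()
        return loss.item()

    # Illustrative batch of random embeddings with random labels.
    loss = training_step(torch.randn(16, 768), torch.randint(0, 2, (16,)))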


Using both the matching engine 61 and the language model 62 to identify whether the user is in crisis may improve the accuracy of the input safety module 60. In particular, although the matching engine 61 may search the user input for a pre-defined set of words and/or phrases (e.g., "trigger" words and/or phrases), the user input may include other words and/or phrases that may still indicate that the user is in crisis. For instance, the user input may include synonyms of the words and/or phrases from the pre-defined set. A search by the matching engine 61 may not detect such synonyms. Additionally, words and/or phrases may be interpreted in different ways. The language model 62 may be able to detect synonyms, various interpretations, and the undertone of the user input to identify a crisis that was not detected by the matching engine 61.


If the flow module 58 detects an indication of crisis in the output from the matching engine 61 and/or in the output from the language model 62, then the flow module 58 triggers the execution of the crisis dialogue module 63. Otherwise, the prompt generation module 40 continues the normal dialogue.


The crisis dialogue module 63 may comprise one or more stored templates of system responses and/or system utterances that are to be outputted to the user. For example, each word and/or phrase from the pre-defined set of words and/or phrases in the matching engine 61 may be associated with a corresponding stored template. The corresponding stored template is output to the user as a system utterance and/or system response if that word and/or phrase is detected in the user input. Additionally or alternatively, each subject safety module system prompt to enable detection of crisis provided to the language model 62 may be associated with a corresponding stored template. The corresponding stored template is output to the user as a system utterance and/or system response if the language model 62 outputs a “Yes” to that subject safety module system prompt. Alternatively, a single stored template of a system response may be used for any crisis detection.


The stored template may comprise information sign-posting options for support, for example a phone number that the user may dial. In some examples, the crisis dialogue module 63 may trigger a specific dialogue flow, asking if the user needs immediate support and providing sign-posting. For example, the crisis dialogue module 63 may trigger a rules based dialogue flow. FIG. 13(b) is a schematic illustration of a rules based dialogue flow, in which a series of safety questions, provided in stored templates, are output to the user. If a concern is detected in response to one of the safety questions, sign-posting of options for support is provided.



FIGS. 14a-14c are example dialogue flows that the crisis dialogue module 63 triggers to provide support to the user. In FIG. 14a, the matching engine 61 and/or the language model 62 may detect that the user is in crisis based on the phrase “I want to harm myself” in the user input. In response to detecting that the user is in crisis, the crisis dialogue module 63 may perform a specific rules based dialogue flow asking if the user needs urgent support, based on stored text templates. If the user answers with a “No” as seen in FIG. 14a, then the crisis dialogue module 63 may output a further stored template suggesting that the user talk to a human if they cannot keep themselves or others around safe. The crisis dialogue module 63 may also output a stored template providing options for support to the user. For example, crisis dialogue module 63 may output phone numbers (e.g., phone number of emergency services, phone number of Samaritans, etc.) that the user may dial to seek support.


If the user answers with a “Yes” as seen in FIG. 14b and FIG. 14c, then the crisis dialogue module 63 may output a stored template including a follow-up question asking the user whether the user can keep themselves and others safe. If the user answers the follow-up question with a “Yes,” as seen in FIG. 14b, then the crisis dialogue module 63 may output stored templates similar to the outputs seen in FIG. 14a. For example, the crisis dialogue module 63 may output a stored template suggesting that the user talk to a human if they cannot keep themselves or others around safe. The crisis dialogue module 63 may also output a stored template providing options for support to the user (e.g., phone numbers to seek support).


If the user answers the follow-up question with a “No,” as seen in FIG. 14c, then the crisis dialogue module 63 may output a different stored template. For example, the crisis dialogue module 63 may output a system response and/or a system utterance providing words of support to the user such as “you are not alone” as seen in FIG. 14c. Additionally, the crisis dialogue module 63 outputs a stored template providing options for support to the user (e.g., phone numbers to seek support). The crisis dialogue module 63 may also prompt the app to display one or more functions for making a telephone call to a specific number (for example an emergency service or a helpline). For example, the app may display a widget (e.g., a button, a pop-up, an icon, a switch, and/or the like) presenting the option to make the telephone call, whereby if the user selects the widget, the telephone call is made from the user device.


In FIG. 14a and FIG. 14b, the app displays a widget "I'd like to talk it through". When the user interacts (e.g., by pressing a button, by clicking on a pop-up, by touching an icon, by pressing a switch, etc.) with this widget on the app, the dialogue application 102 may automatically dial the specific number, thereby providing immediate help to the user. The app also displays a widget "Back to my session". When the user interacts with this widget on the app, the flow module 58 may pass control to the prompt generation module 40 to initiate a session and/or dialogue with the user. In FIG. 14c, the app displays a widget "Call 999" that enables the dialogue application 102 to automatically place a telephone call to the emergency services. In a similar manner, the widget "Call Samaritans" enables the dialogue application 102 to automatically place a telephone call to the Samaritans. The widget may enable the dialogue application 102 to automatically place a telephone call to a third party through the user device. Although specific examples of widgets are illustrated in FIGS. 14a-14c, it should be readily understood that the crisis dialogue module 63 may prompt the app to display any suitable widget to enable providing immediate help to the user, for example by enabling the user device to automatically dial a telephone number in response to the user's interaction with the widget. Accordingly, the dialogue application 102 described herein may provide immediate help to the user when the user is in crisis.


In some examples, if it is detected by the input safety module 60 that the user is in crisis, then the dialogue application 102 may communicate the information to the user's therapist or some other second user. For instance, the dialogue application 102 may transmit information relating to the user input and/or output from the input safety module 60 to the user's therapist via email, text, phone call, or other kind of alert (e.g., by using stored contact details of a therapist). The input safety module 60 may generate a pre-session summary including information relating to the user input and/or the output of the input safety module 60. The input safety module 60 may enable the user's therapist to view user input. For example, the user's therapist may interact with the app to view inputs from the user. In some examples, the input safety module may generate periodic summaries (e.g., weekly, bi-weekly, monthly, etc.) of the user input. These periodic summaries may be provided to the user's therapist.


The input safety module 60 uses a combination of “trigger words” (checked using regular expressions in the matching engine 61) and a zero-shot large language model 62 to detect whether the user is in crisis, in other words whether the user is in immediate danger of self-harm or of harming others. This is run on every user input in the app and triggers a “crisis dialogue” if detection is made. This crisis dialogue may provide contact information for mental health crisis services, meaning that the user is routed to appropriate points of call if the app can't help with their problem.


If the flow module 58 does not detect an indication of crisis in the system output from the matching engine 61 and/or in the system output from the language model 62, then the flow module 58 triggers the execution of the interaction module 56 and the first module 30.


The interaction module 56 comprises a second module 20. In this example, the second module 20 comprises a language model 21 to engage in dialogue with the user. The language model 21 may be a language model as described previously in relation to FIG. 5 for example. The interaction module 56 further comprises a prompt generation module 40. In this example, the prompt generation module 40 comprises a first prompt generator 52a, second prompt generator 52b, third prompt generator 52c, and fourth prompt generator 52d (collectively referred to as “prompt generators 52”). The prompt generators 52 are configured to generate one or more system prompts that are provided to the language model 21. The generated system prompts each comprise a sequence of text.


The system prompts are generated in a similar manner to the prompt generation process described previously in relation to FIGS. 2 and 3. In particular, a stored text template is retrieved. The prompt generation module 40 comprises one or more stored text templates associated with each of the first to fourth prompt generators 52a to 52d. In this example, there is a single stored text template associated with each of the first to fourth prompt generators 52a to 52d. However, in other examples, each of the first to fourth prompt generators 52a to 52d may select a stored text template from a plurality of stored text templates associated with the prompt generator, based on information output from the first module 30 or information taken from the user input for example.


The prompt generator combines the stored text template with the user input to generate the system prompt. For example, the previous user input may be included at a pre-defined point in the stored text template, indicated by an identifier in the stored text template, to generate the system prompt. All of the previous user inputs and system responses may be included in the stored text template, to generate the system prompt. Additional information may be included in the stored text template to generate the system prompt, as will be described below. The generated system prompt is provided as input to the language model 21, to output a language model output comprising a system utterance and/or a system response. The system utterance and/or system response is outputted, by way of an output 101, to the user.
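
A minimal sketch of this combination step is shown below; the identifier {dialogue_history} marking the pre-defined insertion point, and the example template text, are assumed conventions rather than the exact format used.

    def generate_system_prompt(stored_template: str, dialogue_history: list) -> str:
        # Insert the previous user inputs and system responses at the identifier in the stored template.
        history_text = "\n".join(f"{speaker}: {utterance}" for speaker, utterance in dialogue_history)
        return stored_template.replace("{dialogue_history}", history_text)

    template = (
        "You are having a supportive, undirected conversation with the user.\n"
        "Conversation so far:\n{dialogue_history}\n"
        "Write the next system response."
    )
    prompt = generate_system_prompt(template, [("System", "How are you feeling today?"), ("User", "A bit stressed.")])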


In this example, the language model 21 is shown as part of the dialogue application 102. However, the language model 21 may be implemented on a separate system, with the system prompts being sent to, and the language model outputs being received from, the language model system via a communication network.


In this example, there is a single language model 21, that receives system prompts from each of the first to fourth prompt generators 52a to 52d at different stages in the interaction with the user, and that generates a language model output based on the system prompt.


In an alternative example however, the second module 20 comprises four individual language models, each corresponding to one of the prompt generators 52a to 52d. For instance, the second module 20 comprises a first language model that receives a first type of system prompt from the first prompt generator 52a as input. The second module 20 comprises a second language model that receives a second type of system prompt from the second prompt generator 52b as input. The second module 20 comprises a third language model that receives a third type of system prompt from the third prompt generator 52c as input. The second module 20 comprises a fourth language model that receives a fourth type of system prompt from the fourth prompt generator 52d as input.


The first prompt generator 52a enables initial engagement with the user. For example, when the dialogue application 102 is executed, the first prompt generator 52a generates an initial system prompt. As described previously, a user may interact with the dialogue system 100 through an app running on the user device 200. For example, the dialogue application 102 may be triggered by the user initiating a “coaching session”, for example by clicking a button in the app. The first prompt generator 52a then generates an initial system prompt, which in turn is used to generate an initial system utterance. In this example, no dialogue history is included in the initial system prompt. Alternatively, a summary of any previous dialogues (e.g., one or more previous engagements) between the dialogue application 102 and the user may be included in the initial system prompt. The initial system prompt may also include information relating to the user such as, for example, interventions previously provided by the dialogue application 102, interventions previously performed by the user, a clinical state of the user following previous interventions, previous system utterances and/or system responses during previous engagement with the user, key life events of the user, a therapy goal of the user and/or the like. Such information relating to the user may be included as a summary in the initial system prompt. Alternatively, the entire information relating to the user may be included in the initial system prompt. Alternatively, the initial system output may simply be a stored text template.


The user then provides an input in response to the initial system utterance. The first prompt generator 52a then generates a first system prompt including the previous system utterance and the previous user input. The first system prompt includes instructions to enable a language model to generate a first system utterance and/or a first system response that can be outputted to the user. Thus the first system prompt is used to generate the first system response, by inputting the first system prompt to the language model 21. The user then provides the next input. In particular, in response to the first system utterance and/or the first system response, the user provides user utterance and/or patient utterance as input data. The first prompt generator 52a then generates the next first system prompt including the previous system responses and the previous user inputs. The first system prompt includes instructions to enable a language model to generate the next first system response that can be outputted to the user. In this manner, the dialogue application 102 engages with the user. In this first stage, the user engages in an undirected conversation with the dialogue application 102, during which the user may talk about any issues they are facing or simply vent. The system responses in this session are generated based on first system prompts generated by the first prompt generator 52a.


The first system prompt may contain background information and an indication of the goal for the conversation with the user. The first system prompt further comprises instructions informing the behaviour of the language model 21. In this case, the first system prompt instructs the language model 21 to perform an undirected conversation with the user.


Additionally, the first system prompt instructs the language model 21 to analyse the user input data to monitor for one or more user intentions. The first system prompt may instruct the language model 21 to continuously monitor for the following user intentions:

    • 1. Whether the user wants to do a particular intervention right away (e.g. one from their treatment plan)—if so, the interaction module 56 transitions directly to the fourth prompt generator 52d to deliver this intervention;
    • 2. Whether the user has said anything significant about their mental health—if so, the interaction module 56 transitions to the second prompt generator 52b, which generates a second system prompt instructing the language model 21 to explore this in more detail;
    • 3. Whether the user wants to exit the conversation—if so, the interaction module 56 is configured to end the conversation and the user returns to the home screen.


If one of the above conditions is satisfied, the language model 21 passes control to the relevant next component by indicating this in its output. The interaction module 56 in this example comprises a flow module 58. The output of the language model 21 is reviewed by the flow module 58, before a system response is provided to the user. The flow module 58 parses the output text from the language model 21, looking for flag information indicating one of the above intentions. The flag information may be a sequence of characters. For example, the first system prompt may comprise the following instruction:

    • “Explore”: string // Flag to indicate whether the user has said something significant about their mental health. This must be either ‘True’ or ‘False’


The output from the language model 21 will then comprise the text "Explore": True or the text "Explore": False. The flow module 58 parses the language model output to detect the flag. For example, the flow module 58 may comprise a regular expression corresponding to each flag. If the flow module 58 detects that one of the flags has been set as true, in other words that one of the above conditions is satisfied, control passes to the relevant next component. The flow module 58 detects the flag and instructs, for example, the second prompt generator 52b to generate a second system prompt in response to detecting the relevant flag. This second system prompt is then taken as input to the language model 21 to generate the system response to the user.
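
A minimal sketch of this parsing step is shown below; the flag names follow the examples given here and below ([Intent], [Explore], [Finished]), but the exact output format produced by the language model is an assumption.

    import re

    FLAG_PATTERNS = {
        "Intent": re.compile(r'"Intent"\s*:\s*\'?True\'?', re.IGNORECASE),
        "Explore": re.compile(r'"Explore"\s*:\s*\'?True\'?', re.IGNORECASE),
        "Finished": re.compile(r'"Finished"\s*:\s*\'?True\'?', re.IGNORECASE),
    }

    def parse_flags(language_model_output: str) -> dict:
        # Flow module check: which flags has the language model set to true in its output?
        return {name: bool(pattern.search(language_model_output)) for name, pattern in FLAG_PATTERNS.items()}

    print(parse_flags('"Explore": True'))  # {'Intent': False, 'Explore': True, 'Finished': False}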


The first system prompt includes one or more data flags, and instructions as to when to set the data flags, to provide definition to the input data from the user. More specifically, the first system prompt includes data flags that are to be used by the language model 21 to define user intention from the input data. The first system prompt comprises instructions to the language model 21 to output a sequence of one or more characters indicating that the flag is set (e.g. true) if the user intent is detected.


As described above, in one example the first system prompt includes a first flag to enable identifying whether the user would like to perform a specific intervention of their choice. In this example, the first flag is [Intent]. The first system prompt comprising the first flag and the previous user input is provided as an input to the language model 21. The language model 21 generates a language model output. The flow module 58 searches the language model output to detect if the first flag is set. The language model 21 is instructed to set the first flag as “true” in the language model output if the user input data included in the first system prompt includes an intent from the user to perform a specific intervention of their choice. If the first flag is set as “true” then the flow module 58 identifies that the user intends to perform a specific intervention.


In some examples, the flow module 58 also identifies the intervention that the user intends to perform, by parsing the language model output and/or the user input to detect an intervention from a pre-determined list of interventions, in response to detecting that the first flag is set as true. Thus after the flow module 58 identifies that the user intends to perform a specific intervention, the flow module 58 may parse the language model output and/or user input to identify the intervention that the user intends to perform. For instance, the flow module 58 may compare the language model output and/or user input to a pre-defined list of words and/or phrases using one or more regular expressions. This pre-defined list of words and/or phrases may include a list of interventions. The first system prompt may include an instruction to the language model 21 to ask the user a question relating to the intervention that the user would like to perform if the user input data included in the first system prompt includes an intent from the user to perform a specific intervention of their choice. The question output may be "It sounds like you would like to do a specific exercise. Which one would you like to do?". The flow module 58 may compare the user's next response with the pre-determined list of words and/or phrases to identify which specific intervention the user would like to perform. Based on the first flag, the flow module 58 transitions to the fourth prompt generator 52d instead of outputting a system response. The fourth prompt generator 52d generates a fourth prompt, used to instruct the language model 21 to deliver the specific intervention as further described below.


In another example, the first system prompt includes a second flag to enable the determination of a clinical state of the user. In this example the second flag is [Explore]. For instance, the first system prompt includes a second flag to enable the determination of a mental health issue that the user has or has mentioned. The first system prompt with the second flag and the previous user input is provided as input to the language model 21. The language model 21 generates a language model output. The flow module 58 searches the language model output to determine if the second flag is set. The language model 21 is instructed to set the second flag as “true” if the user input data included in the first system prompt includes a disclosure from the user relating to the mental health of the user. If the second flag is set as “true,” the flow module 58 determines that the user input has information that relates to the clinical state of the user that may need an intervention. In particular, if the second flag is set as “true”, flow module 58 determines that the user may have a mental health issue that may need intervention. Based on the second flag, the flow module 58 transitions to the second prompt generator 52b used to instruct the language model 21 to explore the clinical state of the user further.


In another example, the first system prompt includes a third flag to enable terminating a dialogue with the user. In this example the third flag is [Finished]. The first prompt with the third flag and the previous user input is provided as input to the language model 21. The language model 21 generates a language model output. The language model 21 is instructed to set the third flag as “true” in its output if the dialogue history included in the first system prompt comprises a pre-determined number of exchanges with the user (where the pre-determined number is specified in the first system prompt), and if none of the exchanges include any disclosure from the user relating to the mental health of the user. If the flow module 58 determines that the third flag is set as “true,” then the flow module 58 terminates the dialogue with the user. The pre-determined number of exchanges can be any suitable number of exchanges (e.g., ten exchanges, nine exchanges, eight exchanges, seven exchanges, six exchanges, five exchanges, four exchanges, three exchanges, or two exchanges) and is specified in the first prompt.


Although the examples described above discuss the first system prompt as having a first flag, a second flag, or a third flag, it should be readily understood that the first system prompt can include a combination of all three of the first flag, second flag, and third flag or a combination of any two of the first flag, second flag, and third flag.
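

As an illustration of how the flow module 58 might detect such flags, the following is a minimal sketch in Python. It assumes the language model emits bracketed markers such as “[Intent]: true” in plain text; the flag format, function names and routing targets are assumptions made for illustration rather than the exact implementation.

```python
import re

# Illustrative flag names from the examples above; the exact markers and
# output format produced under the first system prompt are assumptions.
FLAGS = ("Intent", "Explore", "Finished")

def parse_flags(model_output: str) -> dict:
    """Return a mapping of flag name -> bool parsed from a language model output.

    Assumes markers of the form "[Intent]: true"; an absent flag is treated as unset.
    """
    flags = {}
    for name in FLAGS:
        match = re.search(rf"\[{name}\]\s*:\s*(true|false)", model_output, re.IGNORECASE)
        flags[name] = bool(match) and match.group(1).lower() == "true"
    return flags

# Example: the flow module could branch on the parsed flags.
output = "Sure, I can help with that. [Intent]: true [Explore]: false [Finished]: false"
parsed = parse_flags(output)
if parsed["Intent"]:
    next_stage = "fourth_prompt_generator"   # deliver the chosen intervention
elif parsed["Explore"]:
    next_stage = "second_prompt_generator"   # explore the clinical state further
elif parsed["Finished"]:
    next_stage = "terminate_dialogue"
else:
    next_stage = "first_prompt_generator"    # continue the initial conversation
```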


As discussed above, the first prompt instructs the language model 21 to determine whether the input data includes information relating to a clinical state (e.g., mental health issue) of the user. In particular, the first prompt instructs the language model 21 to set the second flag as true if the input data includes information relating to a clinical state. If the second flag is detected, the second prompt generator 52b is triggered to explore the clinical state of the user further. Put differently, if the language model 21 as instructed by the first prompt determines that the user may have a mental health issue, then the second prompt generator 52b is triggered to explore the mental health issue further. The second prompt generator 52b instructs the language model 21 to obtain further information from the user as subsequent input data and generate labels used to generate structured information from this subsequent input data.


Having established that the user has significant issues to talk about, the flow module 58 forwards the current conversation to the next stage. Here, the aim is to gain insight into the user and their current issues. A technique based on CBT is to generate information relating to an “X-Column Thought Record”, where X is either 3, 5, or 7. In this example, an approach based on a 3-Column version, which asks for thoughts, feelings, and situations, is used. Each of the columns, also referred to here as topics, corresponds to a CBT component such as thoughts, feelings, activating events, behaviours, automatic thoughts etc. The columns may also include supporting or disconfirming evidence relating to each of the CBT components—in other words, two columns for each component. By recording data relating to these concepts in a free-flowing conversation, user engagement is maintained and structured information is gathered by internally annotating incoming data.


In response to the determination by the flow module 58 that the input data includes information relating to the clinical state of the user, the second prompt generator 52b generates a second system prompt. The second system prompt includes instructions to enable language model 21 to obtain subsequent input data from the user so as to explore the clinical state of the user further. These instructions may include example questions that can be outputted to the user to explore the clinical state of the user further. These instructions may include topics on which questions should be asked and instructions as to how to ask the questions. The topics may be: thoughts, situations, and feelings. The instructions may include instructions to ask one or more cognitive behavioural therapy questions that the language model can output to the user. Some examples of cognitive behavioural therapy questions include questions relating to a 5-area model of cognitive behavioural therapy. In this example, the second system prompt includes instructions to output questions relating to the user's thoughts, situations, and feelings to the user.


The second system prompt is provided as input to the language model 21 to generate a second system utterance and/or second system response that is outputted to the user. In response to the second system utterance and/or the second system response, the user provides input data. The second prompt generator 52b then generates the next second system prompt, including the previous user inputs and system outputs. The next second system prompt is taken as input to the language model 21 to generate the next second system response. In this manner, the subsequent input data is obtained, including the user's responses to the questions (e.g., questions relating to cognitive behavioural therapy) in the second system responses.


The second system prompt includes additional instructions to output follow-up questions to the user if the user provides short responses to the questions in the second system responses or if the language model has not understood the user's responses to the questions. In this manner, by outputting questions and follow-up questions, the dialogue application 102 continues to maintain engagement with the user.


The second prompt generator 52b is further configured to instruct the language model 21 to label and/or annotate the system responses, allowing structured information to be generated from the subsequent input data. In particular, the second prompt comprises instructions to the language model to include a label in the language model output indicating the topic. For example, the second system prompt includes instructions for the language model 21 to label the system responses with labels relating to the 5-area model of cognitive behavioural therapy, such as “thoughts,” “feelings,” “situation,” “physical sensations,” and “behaviour”. In this example, the second system prompt includes instructions to label the system responses relating to thoughts as “thought”, the system responses relating to feelings as “feeling” and the system responses relating to situation as “situation”. The second prompt may include an instruction to set an [Asking] field to “Situation”, “Feeling”, “Thought” or “Other” for example.


The flow module 58 then labels the user inputs using the labels output by the language model 21 for the preceding system response. For example, the flow module 58 is configured to label the user's inputs in response to system responses comprising a question relating to a thought as a “thought”, to label the user's inputs in response to system responses comprising a question relating to a feeling as “feeling,” and to label the user's inputs in response to system responses comprising a question relating to a situation as “situation”. This information is stored in the knowledge bank 33 as structured information. This structured information is generated from the subsequent input data received from the user—in other words the user inputs received in response to the labelled system responses. This structured information comprises information to identify an intervention for the user. The second prompt instructs the language model 21 to label each question it asks as either “thought”, “feeling”, or “situation”, allowing the flow module 58 to label the user's replies to these questions. The label generated by the language model 21 is stored together with the user input in the knowledge bank 33.


The second prompt is thus configured to instruct the language model 21 to generate labels used to generate structured information from the subsequent input data. The second system prompt is provided as input to the language model 21. The language model 21 generates a language model output based on the second system prompt. The language model output from the language model 21 includes one or more questions relating to cognitive behaviour therapy. These questions are annotated and/or labelled in the language model output, and the labels are included in the subsequent second system prompts. Generating the structured information may include labelling and/or annotating the user utterances in the subsequent input data based on the labelled questions. For example, user utterances provided in response to questions relating to thoughts may be labelled and/or annotated by the flow module 58 as “thought,” user utterances provided in response to questions relating to feelings may be labelled and/or annotated as “feeling”, and user utterances provided in response to questions relating to situation may be labelled and/or annotated as “situation”.
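

The following is a minimal sketch of how the flow module 58 might carry the label of the preceding system question over to the user's reply and store the pair as structured information. It assumes the language model emits an [Asking] field as described above; the field format, the in-memory stand-in for the knowledge bank 33 and the function names are illustrative assumptions.

```python
import re

knowledge_bank = []  # stand-in for the knowledge bank 33; holds structured records

def extract_asking_field(model_output: str) -> str:
    """Parse the [Asking] field from a second-stage language model output.

    Assumes the model is instructed to emit e.g. "[Asking]: Feeling";
    returns "Other" if no field is found.
    """
    match = re.search(r"\[Asking\]\s*:\s*(Situation|Feeling|Thought|Other)",
                      model_output, re.IGNORECASE)
    return match.group(1).capitalize() if match else "Other"

def record_user_reply(previous_model_output: str, user_reply: str) -> None:
    """Label the user's reply with the topic of the question that preceded it."""
    label = extract_asking_field(previous_model_output)
    knowledge_bank.append({"label": label, "utterance": user_reply})

# Example exchange
system_turn = "What was going through your mind at that moment? [Asking]: Thought"
record_user_reply(system_turn, "I kept thinking I am just a bad person.")
# knowledge_bank now holds one record labelled "Thought"
```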


The second system prompt includes additional instructions for the language model 21 to summarize the user's responses. For example, the second system prompt includes instructions for the language model 21 to create a cognitive behavioural therapy formulation by summarizing the subsequent input data from the user based on the labels and/or annotations. More specifically, the second prompt generator 52b is configured to instruct the language model 21 to summarize how thoughts, feelings, and situations of the user influence each other based on the responses annotated and/or labelled as “thoughts”, “feelings”, and “situations”.


In this example, the second system prompt includes further instructions to output the summary to the user to determine the accuracy of the summary. For example, the second prompt includes instructions to output the summary to the user along with a question relating to the accuracy of the summary. For instance, the second prompt may include instructions to output the summary and ask the user if the summary sounds correct. The second system prompt includes a data flag (e.g., flag, label, tag, etc.) that can be set as “true” after the user has responded to the accuracy of the summary. For example, the second system prompt includes a fourth flag. In this example the fourth flag is [Finished]. The second prompt with the fourth flag and the previous system responses and user inputs is provided as input to the language model 21. The language model 21 generates a language model output. The language model 21 is instructed to set the fourth flag as “true” in its output if the user input included in the second system prompt verifies the accuracy of the summary. If the flow module 58 determines that the fourth flag is set as “true”, the flow module 58 transitions to the first module 30. If the fourth data flag is set as “true” the dialogue application 102 terminates the execution of the second prompt generator 52b and the dialogue application 102 determines the next sequence of steps to provide an intervention to the user.


The second prompt may further include instructions to transmit clarifying questions to the user if the user responds by indicating that the summary is inaccurate. The language model 21 may continue asking the user clarifying questions and generating summaries until the user verifies the accuracy of a summary. Once the accuracy is verified, the fourth flag in the language model output would be set as “true” and the control is passed to the first module 30.


While this approach means that the second prompt generator 52b will run for at least three exchanges, additional interactions are allowed for by giving the user space to go into detail and by checking in with the user whether the model has understood them correctly. As with other components, the second prompt generator 52b also instructs the language model 21 to continuously monitor whether the conversation should move on and to emit a “finished” flag when it should. Unless the language model 21 notices the user wants to end the conversation completely, the conversation will move on to the first module 30.


The structured information enables the first module 30 to identify one or more interventions for the user. The third prompt generator 52c is then configured to instruct the language model 21 to motivate the user to perform an intervention. In particular, the third prompt generator 52c is configured to instruct the language model 21 to motivate the user to perform an intervention from a suggested list of interventions. This suggested list of interventions may comprise suggestions determined by the first module 30 as further discussed below. In one example, this list of interventions can also include one or more interventions from a sequence of interventions that the user is to perform, for example from a treatment plan. The user chooses from the list of suggested interventions provided in the third system response or responses.


As disclosed above, the third prompt generator 52c is configured to motivate the user to perform an intervention. This allows the decision to be taken collaboratively, and interventions are not suggested without first clearly explaining why they are helpful. Patient choice is central, and here the approach seeks to explain to the user what an intervention entails, why it may be useful to them specifically, and what other interventions are currently recommended.


The third prompt generator 52c generates a third system prompt which is provided to the language model 21. The language model 21 outputs a third system utterance and/or a third system response motivating the user to perform the identified intervention. The third system prompt comprises one or more interventions output by the recommender module 32 in the first module 30. The third system prompt may further comprise one or more interventions from a treatment plan. In this example, the third prompt generator 52c generates a third system prompt comprising the following information retrieved from the knowledge bank 33:

    • What interventions the user has performed previously;
    • What core utterances (e.g. frequent thoughts or feelings) they have expressed (e.g., based on the labels from the language model output generated based on second system prompt);
    • Any intervention currently scheduled in the user treatment plan;
    • What intervention(s) have been suggested by the recommender module 32;
    • What each of the above interventions entails (short description).


In addition to the above listed information, in some examples, the third prompt generator 52c may generate a third system prompt comprising additional information retrieved from the knowledge bank 33 such as for example:

    • User's clinical information (e.g., previous diagnosis, previous interventions, current diagnosis, type of therapy that the user is in, other relevant medical information etc);
    • User's personal characteristics (e.g., goal for therapy, progress so far, clinical state, problem, etc.);
    • Demographic information (e.g., age, gender, job, race, ethnicity, etc.).


A third system utterance and/or the third system response may include one or more of the following: 1) list of suggested interventions to be performed, 2) rationale behind performing each intervention in the list of interventions, 3) personalized arguments and recommendations. In some examples, the list of suggested interventions may include at most two interventions. For instance, the two interventions may include an intervention from a treatment plan and an intervention identified by the first module 30 as further described below. In some variations, the third system utterance and/or a third system response may also include the user's behavioural pattern. For example, the third system utterance and/or the third system response may include the user's behavioural pattern if it is relevant to the list of suggested interventions.


The third system prompt includes instructions for the language model 21 to generate the third system utterance and/or the third system response. For the third system utterance and/or the third system response to include the rationale behind performing interventions, personalized arguments and recommendations, and/or the user's behavioural pattern, the third system prompt includes historical information related to the user. For example, the third system prompt includes information retrieved by the third prompt generator 52c from the knowledge bank 33. In one example, the historical information includes information on the interventions that were previously performed by the user. Additionally or alternatively, the historical information includes frequent thoughts or frequent feelings that the user may have expressed. As an example, if the historical information indicates that a specific thought has been occurring frequently, then the third system utterance may state this thought pattern to the user and explain why a specific intervention will help with the thought pattern. As another example, if the historical information includes information relating to a personal aspect of the life of the user, then the third system utterance provides an example of how a specific intervention may help improve the personal aspect of the user's life. Additionally or alternatively, the historical information includes the therapy goal for the user. The third system prompt further comprises a transcript of the conversation so far.


For the third system utterance and/or the third system response to include the list of suggested interventions and/or rationale behind performing interventions, the third system prompt includes this information. For example, one or more interventions are identified by the first module 30 as further described below and included in the third system prompt. The third system prompt includes a list of suggested interventions including the identified intervention(s) from the first module 30. The identified interventions may be included in the third system prompt in order of how useful they are determined to be. The suggested interventions may further comprise one or more interventions from a pre-set treatment plan. In this way, the user may be presented with interventions from a pre-set treatment plan as well as interventions identified as being useful based on the user conversation. During cognitive-behavioural therapy (CBT), a therapist may follow a pre-set treatment manual for a given mental health condition, which specifies roughly which step to take at each point during treatment. This may include things like reading materials (psycho-education), as well as a broader set of “interventions”, which may include behavioural exercises (e.g. relaxation techniques), reflective exercises (e.g. writing down one's worries), and exercises related to one's feelings (e.g. emotion regulation techniques). While most treatment plans will use this manual as a guideline, it is also helpful to take account of what the patient needs at the current moment. For example, the patient may struggle with frequent automatic thoughts (e.g. “I am a failure”). The dialogue application 102 may then suggest engaging in an appropriate intervention (in this case, e.g. cognitive restructuring) to address this issue in a timely manner. Thus the dialogue application 102 may provide the user with the option to stick to the treatment manual or to engage in immediate action. Whenever this question arises, the user will decide the next action, making this process highly collaborative.


Furthermore, for the third system utterance and/or the third system response to include a rationale behind performing the list of interventions and/or the identified intervention, the third system prompt includes a short description of each of the interventions (e.g., providing a description of what the intervention entails). Such information relating to the description of the intervention is retrieved from the knowledge bank 33. In this manner, the third system prompt is configured to instruct the language model 21 to motivate the user to perform an intervention.


This context then serves as the basis to gently motivate the user to perform one of the available interventions. The conversation can go on for multiple steps, wherein the language model 21 can provide information about the rationale for each intervention. The third prompt comprises information about the user and their characteristics, allowing the language model 21 to make personalised arguments. For example, it could say: “Worry Diary is an effective intervention for people with anxiety. It could help you better manage your worries around your relationship with your partner”, where the second sentence explicitly references something the user has mentioned, making them feel heard and understood.


The third system prompt may include one or more data flags (e.g., label, tag, annotation, flag, etc.) to identify the intervention that the user has chosen. For example, a data flag may be associated with each intervention of the list of interventions and/or with the identified intervention. If the user chooses a specific intervention, the data flag associated with that intervention is set to “true”. The flow module 58 may parse the output from the language model to detect the data flag that is set to “true”, thereby identifying the intervention that the user has chosen. The third system prompt may comprise an output field, and an instruction to the language model to ask the user to identify the intervention they want to do. Once the user has chosen an intervention out of the suggested list, this is delivered as a continuation of their conversation, allowing the language model 21 to reference everything that was said since the user started the coaching session. In other words, the conversation history is included in a fourth prompt, used to instruct the language model to deliver the intervention.


In this step, the fourth prompt generator 52d is configured to generate a system prompt instructing the language model 21 to deliver the intervention to the user. The intervention is delivered in the form of a dialogue (e.g., conversation) with the user. The dialogue may provide instructions to the user of how to perform the intervention, or the dialogue may perform the intervention. The fourth prompt generator 52d generates a fourth system prompt. The fourth system prompt is provided to language model 21 so as to output a fourth system utterance and/or a fourth system response. The fourth system utterance and/or the fourth system responses form a dialogue with the user and include the intervention to be performed by the user.


In one example, the intervention is delivered in a structured manner. For example, the fourth system prompt includes instructions with a specific order of steps to be followed by the user so as to have the user perform the intervention. Put differently, the fourth system utterance and/or the fourth system response is a structured dialogue with the user, comprising a sequence of steps. The fourth system utterance and/or the fourth system responses form a sequence of system utterances that are outputted in a specific order. These sequences of system utterances enable the user to perform the intervention in a specific order. By including a structured list of steps in the fourth system prompt, and an instruction to the language model 21 to cover the steps step-by-step, the content of the conversation can be tightly controlled, while still allowing the language model 21 to respond to what the user says in the moment (e.g. by providing an empathetic response).
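

For illustration, the following sketch assembles a fourth system prompt containing an ordered list of steps. The intervention name, step wording and prompt text are placeholders and not the actual prompt used; the sketch only shows how a structured, step-by-step delivery could be encoded in the prompt.

```python
# Illustrative only: the intervention steps and prompt wording are assumptions.
INTERVENTION_STEPS = {
    "Worry Diary": [
        "Explain the purpose of keeping a worry diary.",
        "Ask the user to describe one worry they had today.",
        "Ask when the worry occurred and how intense it felt (0-10).",
        "Summarise the entry back to the user and ask if it is accurate.",
    ],
}

def build_fourth_prompt(intervention: str, conversation_history: str) -> str:
    """Assemble a structured fourth system prompt for the chosen intervention."""
    steps = "\n".join(f"{i + 1}. {step}" for i, step in enumerate(INTERVENTION_STEPS[intervention]))
    return (
        "You are delivering the following intervention step by step. "
        "Cover the steps in order, one per turn, and respond empathetically to the user.\n"
        f"Intervention: {intervention}\n"
        f"Steps:\n{steps}\n"
        f"Conversation so far:\n{conversation_history}\n"
    )

print(build_fourth_prompt("Worry Diary", "User: I keep worrying about work."))
```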


In another example, the intervention is delivered in an unstructured manner. For example, the fourth system prompt includes instructions to achieve the goal of the intervention without following steps in a specific order. Put differently, the fourth system utterance and/or the fourth system responses form an unstructured dialogue with the user. The fourth system utterance and/or the fourth system responses form a series of system utterances that are outputted to the user. However, these series of system utterances are not in any particular order. The user achieves the goal of the intervention by engaging with the series of system utterances without following any particular order. The fourth system prompt may comprise a description of what needs to be done in a particular intervention. This is most suitable to interventions which require a lot of user input. For example, if targeting behavioural activation, this will heavily rely on the user giving a list of activities they enjoy and talking about how to integrate these into their day. It is difficult to follow a tight script for this. Such interventions can be provided using the language model 21, whereas a hard-coded chat format may not be suitable.


Accordingly, the fourth prompt generator 52d delivers the intervention to the user in the form of a dialogue. In this example, the intervention is delivered using the fourth prompt generator 52d. However, in other examples, it may be delivered in a pre-set chat format.


In the above dialogue flow, the language model 21 annotates and/or labels outputs. As discussed above, the first system prompt, the second system prompt, the third system prompt, and the fourth system prompt include one or more output fields or data flags used by the language model 21 to label and/or annotate the output. The dialogue application 102 further comprises a flow module 58, which may include an output parser to identify which part of the language model output corresponds to which output field.


Missing output values may be populated with default values before the full language model output is saved in the knowledge bank 33. This information may then be passed to the next component—for example the next prompt generator or the subject understanding module 31. The annotated and/or labelled outputs are tagged with the prompt generator that generated them and are saved in the knowledge bank 33. Below is an example of formatting instructions provided in a system prompt to the language model 21, where outputs are coming from the language model 21. In this example, the remaining fields are populated outside of the language model 21 (e.g. database inputs, code annotations, etc). Here, two interventions are available to the user in the motivate-intervention component—i.e. the language model 21 prompted by the third prompt generator 52c.














The output should be a markdown code snippet formatted in the following schema, including the leading and trailing “```json” and “```”:

```json
{
    "response": string // Response to the user's input
    "finished": string // Flag to indicate whether the user has selected an exercise. This must be either 'True' or 'False'
    "chosen_intervention": string // Indicates which exercise has been chosen by the user. Which must be chosen out of the following list ['5 Areas Model', 'Cognitive Restructuring']
}
```
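

The following is a minimal sketch of how an output parser, such as the one in the flow module 58, might extract the fields defined by the schema above and populate missing values with defaults before the output is saved. The default values and function name are assumptions.

```python
import json
import re

# Assumed defaults for the three fields defined by the schema above.
DEFAULTS = {"response": "", "finished": "False", "chosen_intervention": ""}

def parse_model_output(raw_output: str) -> dict:
    """Extract the JSON payload from a ```json ... ``` snippet and apply defaults."""
    match = re.search(r"```json\s*(\{.*?\})\s*```", raw_output, re.DOTALL)
    fields = dict(DEFAULTS)
    if match:
        try:
            fields.update(json.loads(match.group(1)))
        except json.JSONDecodeError:
            pass  # keep defaults if the snippet is malformed
    return fields

raw = ('```json\n{"response": "Great choice!", "finished": "True", '
       '"chosen_intervention": "Cognitive Restructuring"}\n```')
print(parse_model_output(raw))
```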










FIG. 17 shows the list of output information that may be generated by the language model 21 at various stages in the dialogue. This information may then be included in a subsequent system prompt. The dialogue application 102 comprises four language model stages comprising the four prompt generators, mechanistic models (deep neural networks), and a decision logic layer (the recommender module). However, the conversation with the user is delivered in a seamless way. A user will not notice that the underlying prompt generator has changed thanks to message-passing between the stages. The procedure allows the dialogue to flow seamlessly from one component to the next, by passing a series of messages and “flags” from a language model output to the next language model prompt, such that a down-stream prompt receives context and guidance from up-stream language model outputs or models. For example, the most relevant user messages from a mechanistic model (e.g. “I am a good person”) may be provided in a system prompt such that the language model 21 can reference them in the conversation (e.g. as a “core belief”).


A number of different language model-powered components are used as outlined above, which each serve a different function, e.g. to ask the user questions about their mental health or then serve an intervention. To the user, the transition between each component is seamless. For each component, the language model is instructed to generate annotated data. For example, the second prompt generator instructs the language model 21 to indicate whether the user is talking about a thought, or a feeling, or a situation. Furthermore, each of the first system prompts, second system prompts, third system prompts and fourth system prompts ask the language model 21 to indicate that it has finished the relevant stage of the dialogue. The flow is handled by the flow module 58, which parses the language model output for information indicating that the component has finished. For some components, e.g. the first prompt generator 52a, there are additional options, e.g. a flag indicating whether the user wants to do a specific intervention, which can then be served directly. Here, the language model output would indicate which intervention to serve, the flow module 58 would recognise this, retrieve the intervention from the knowledge bank 33, and then jump to the fourth prompt generator 52d which delivers interventions. The output data is stored in the knowledge bank 33. The language model output will emit information indicating which down-stream module to call, enabling flexible routing. It may emit additional data on its final call for each component (e.g. which intervention the user has chosen to do) which is then passed on to the relevant down-stream component.


This allows the user to be routed along different pathways. For example, if a user enters the conversation with a strong intention of doing a specific intervention, this is detected in the initial conversation, and the flow module 58 transitions into delivering this intervention, without calling any of the intermediate steps. Furthermore, each step is explainable. Why a certain action has been taken (e.g. because a negative automatic thought was detected) is stored and can be used in development as well as in an audit.



FIG. 18 is a schematic illustration of a dialogue application 102 according to another example, that may be stored and executed by a dialogue system 100 as described in relation to FIG. 1. Execution of the dialogue application 102 will cause methods as described herein to be implemented. The dialogue application 102 is similar to that described in relation to FIG. 12. However, in this example, the second module 20 comprises: first model 54a, second model 54b, third model 54c, and fourth model 54d. The first model 54a may comprise one or more neural networks trained to perform the functions of the first prompt generator 52a together with the language model as described in FIG. 12. The second model 54b may comprise one or more neural networks trained to perform the functions of the second prompt generator 52b together with the language model as described in FIG. 12. The third model 54c may comprise one or more neural networks trained to perform the functions of the third prompt generator 52c together with the language model as described in FIG. 12. The fourth model 54d may comprise one or more neural networks trained to perform the functions of the fourth prompt generator 52d together with the language model. In alternative examples, the second module 20 may comprise a single model, for example a single neural network trained to perform the above functions.


The first model 54a comprises a first neural network trained to generate the first system utterance and/or first system responses. The first model 54a may further comprise a further neural network trained to determine whether the input data includes information relating to the clinical state of the user. The second model 54b comprises a second neural network trained to generate the second system utterances and/or second system responses. The second model 54b may further comprise one or more further neural networks trained to generate structured information from the subsequent input data. The third model 54c comprises a third neural network trained to generate the third system utterances and/or third system responses. The fourth model 54d comprises a fourth neural network trained to generate fourth system utterances and/or the fourth system responses.



FIG. 19(a) is a schematic illustration of first module 30 that may be included in the dialogue application 102 described in relation to FIG. 12 or in the dialogue application 102 described in relation to FIG. 18 for example. In this example, the first module 30 includes a subject understanding module 31, a recommender module 32, and a knowledge bank 33. FIG. 19(b) is a schematic illustration of the functions performed by these modules.


The interaction module 56 generates structured information from input data (e.g., subsequent input data) received from the user as discussed above. The subject understanding module 31 receives this structured information. The subject understanding module 31 generates subject profile information based on the structured information. The recommender module 32 identifies an intervention for the user based on the subject profile information. In addition to the subject profile information, the recommender module 32 may identify the intervention based on data stored in the knowledge bank 33.


The subject understanding module 31 comprises one or more subject understanding models, also referred to as understanding models. The subject understanding models are also referred to throughout as “mechanistic models.” Examples of subject understanding models include thought detection model 31a, distorted thought detection model 31b, sentiment analysis model 31c, and behavioural pattern detection model 31d. Each of the subject understanding models are configured to analyse the structured information generated by the second module 20 to generate subject profile information. These “mechanistic” models are used to draw inferences about the user and their issues. The models are deep learning models with varying parameter counts and are trained using supervised learning with gradient descent on clinician-labelled examples. In some examples, these “mechanistic” models comprise any suitable machine learning methodologies such as for example, neural networks, decision trees, regular expressions, look-up tables, a combination thereof, and/or the like.


The structured information comprises information relating to cognitive behavioural therapy formulation of the user. More specifically, the structured information comprises information derived from subsequent input data (received in response to second system utterances and/or second system responses) received from the user that have been annotated and/or labelled based on cognitive behavioural therapy models (e.g., 5-area model of cognitive behavioural therapy). The subject understanding module 31 categorises the user inputs based on the annotations and/or labels included in the structured information. For example, the structured information may comprise utterances from the user that have been annotated and/or labelled as “thoughts,” “feelings,” or “situations”. The subject understanding module 31 categorises user utterances relating to thoughts into a list of “thoughts,” user utterances relating to feelings into a list of “feelings,” and user utterances relating to situations into a list of “situations”. The subject understanding module 31 lists the user utterances under a respective category based on the labels. The items from one or more of these lists are used as input to each of the understanding models (e.g., understanding models 31a, 31b, 31c, and 31d). In this step, lists of utterances that were given by the user are created following the three categories (thought, feeling, situation). These are then passed into the mechanistic models. In this step, the user utterances are categorised without having to run complex natural language processing processes that identify these components from each utterance (or entire conversations).
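

As a minimal sketch of this categorisation step, the following groups labelled utterances from the structured information into per-topic lists that can then be passed to the understanding models. The record format and function names are assumptions made for illustration.

```python
from collections import defaultdict

# Structured information as stored by the flow module: each entry pairs a
# label from the 3-column record with the corresponding user utterance.
structured_information = [
    {"label": "Situation", "utterance": "I had an argument with my partner."},
    {"label": "Thought", "utterance": "I am just a bad person."},
    {"label": "Feeling", "utterance": "I felt ashamed afterwards."},
]

def categorise(records: list[dict]) -> dict[str, list[str]]:
    """Group labelled user utterances into per-topic lists for the understanding models."""
    lists = defaultdict(list)
    for entry in records:
        lists[entry["label"].lower()].append(entry["utterance"])
    return dict(lists)

categories = categorise(structured_information)
# categories["thought"], categories["feeling"] and categories["situation"]
# are then passed as inputs to the subject understanding models.
```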


In some examples, a potential mismatch between what was asked and what the user's answer contains is addressed by classifying the relevant user responses via one or more classifiers. For example, the first module 30 includes one or more classifiers (e.g. the thought detection model 31a or activity classifier described below). Labelled user utterances (e.g., user replies labelled as “thought”, “feeling” or “situation”) are passed to the classifiers. The classifier(s) identify if there is a mismatch between a question included in the system response and the corresponding user input. If there is a mismatch, then the classifier(s) included in the first module 30 classify the user utterance appropriately. For example, if a question relating to “situation” was transmitted to the user (such as “Can you tell me more about what happened?”), and the user responds with “I am just a bad person”, then the flow module 58 labels the user utterance “I am just a bad person” as a “situation” based on the question in the system response. The user utterance with the label is passed on to one or more classifier(s) in the subject understanding module 31 as mentioned above. The classifier(s) identify that the user's response is a thought, and the subject understanding module 31 may classify the user utterance as “situation asked but thought given” and provide this information in the subject profile information. Accordingly, discrepancies between questions transmitted to the user and responses from the user may be identified. The one or more classifiers, such as the thought detection model 31a, are run on a handful of utterances, rather than potentially dozens in an entirely unstructured conversation.


In this example, there is a thought detection model 31a that classifies whether a user utterance relates to a thought and detects any discrepancy between the category and the classification (for example where a user utterance categorised as a situation is classified by the thought detection classifier as a thought). In this example, there is also an activity classifier, which forms part of the behavioural pattern detection model 31d. However, in some other examples, the subject understanding module 31 further comprises a feeling classifier, checking whether a user utterance corresponds to a feeling. Such classifiers may be run on all structured utterances.


The thought detection model 31a is configured to identify whether the user utterances are thoughts. The output of the thought detection model 31a is used firstly to exclude user utterances that have been categorised as thoughts but are actually non-thoughts from further analysis by the distorted thought detection model 31b.


The output of the thought detection model 31a may also be used to identify whether the user can differentiate between thoughts, and feelings and situations. Many patients (and the general population) struggle to disentangle thoughts from feelings (and situations), and an intervention may comprise helping patients better separate these aspects. If the user cannot differentiate between thoughts, and feelings and situations, then the recommender module 32 suggests interventions (e.g., situation analysis) based on cognitive behaviour therapy to help the user realize how thoughts impact a situation as further described below. Thus in some examples, all user utterances that have been labelled and/or annotated (e.g., labelled and/or annotated as “thoughts”, “feelings”, and “situation”) in the structured information are passed to the thought detection model 31a. Thoughts that are wrongly labelled and/or annotated in the user utterances are appropriately labelled and/or annotated by the thought detection model 31a. For example, in response to a question relating to a “situation” in the second system prompt, the user may respond with “I am a bad person”. The structured information may label and/or annotate this user utterance as “situation” given that the user utterance is in response to a question relating to a situation. This user utterance is passed to the thought detection model 31a. The thought detection model 31a may identify this user utterance labelled and/or annotated as “situation” as in fact being a thought. The thought detection model 31a may annotate and/or label this user utterance as “situation asked but thought given”. Accordingly, user utterances that are erroneously labelled and/or annotated may be appropriately labelled and/or annotated by the thought detection model 31a, and this is indicated in the subject profile information.


Each user utterance from the list of user utterances is taken as input to the thought detection model 31a in turn. The input utterance is first transformed into a vector representation by the thought detection model 31a. The vector representation is an embedding. The vector representation is generated using a pre-trained sentence embedder. For example, the thought detection model 31a comprises a pre-trained general embedding model. For example, a sentence embedding may be generated using a model from the SentenceBERT family. An implementation is described in the paper “sentence-BERT”, Reimers & Gurevych, 2019, see arXiv:1908.10084 the entire contents of which are incorporated by reference herein. In one embodiment, the embedding has a size of 1024 dimensions.
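

A minimal sketch of generating such embeddings with the sentence-transformers library is shown below. The specific checkpoint is chosen for illustration because it produces 1024-dimensional embeddings; the actual embedder used is only specified above as being SentenceBERT-based.

```python
from sentence_transformers import SentenceTransformer

# The checkpoint name is illustrative; it is a SentenceBERT-family model that
# produces 1024-dimensional embeddings, matching the dimensionality above.
embedder = SentenceTransformer("all-roberta-large-v1")

utterances = ["I am just a bad person.", "I went for a run this morning."]
embeddings = embedder.encode(utterances)
print(embeddings.shape)  # (2, 1024)
```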


The thought detection model 31a comprises a deep learning algorithm which is trained to classify whether a user utterance represented by the input vector representation is a thought or a non-thought. In this example, the thought detection model 31a comprises a two layer feedforward neural network with Gaussian Error Linear Unit activation and a Softmax output layer. In this example, the model has 20,542 trainable parameters. The neural network outputs a probability that the processed utterance is a thought. The higher the probability value, the higher the likelihood of the user utterance being a thought in this example. Applying the deep learning algorithm to the vector representation allows the thought detection model 31a to evaluate whether a user utterance is a thought. The thought detection model 31a may output the user utterances that it identifies as a thought, for example the user utterances corresponding to a probability greater than or equal to a threshold value, together with the probability.
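

The following PyTorch sketch shows a classifier of this shape. A hidden width of 20 happens to reproduce the 20,542 trainable parameters quoted above, but the exact layer sizes and the decision threshold are assumptions.

```python
import torch
import torch.nn as nn

class ThoughtClassifier(nn.Module):
    """Two-layer feedforward classifier over a 1024-dimensional sentence embedding."""

    def __init__(self, embedding_dim: int = 1024, hidden_dim: int = 20, num_classes: int = 2):
        super().__init__()
        # hidden_dim=20 gives 20,542 parameters; the actual configuration is an assumption
        self.net = nn.Sequential(
            nn.Linear(embedding_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, num_classes),
            nn.Softmax(dim=-1),
        )

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        return self.net(embedding)  # probabilities for [non-thought, thought]

model = ThoughtClassifier()
probs = model(torch.randn(1, 1024))
is_thought = probs[0, 1].item() >= 0.5  # threshold value is an assumption
```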


The deep learning algorithm is trained using a training dataset, prior to implementation of the system 100. The training dataset may comprise historical patient utterances and may be used to learn the weights of the deep learning algorithm prior to implementation of the dialogue system 100. The training data may comprise clinician-labelled datasets, which may comprise a number of example patient utterances. To prepare the training dataset, a number of clinicians review the patient utterances and label them as corresponding to a “thought” (these utterances are given the label 1) or not a thought (these utterances are given the label 0). The training data comprises data associated with an individual or a group of individuals. A training data set may include data collected for a given individual, and the deep learning algorithm (e.g., for the individual or other persons) may be trained using the individual training data set. This may enable generating deep learning algorithms that are tailored to trends for the individual. A training data set may include data collected for multiple individuals, and the deep learning algorithm (e.g., for some or all of the individuals in the group or other persons) may be trained using the group training data set. This may enable generating deep learning algorithms that benefit from a relatively large data set that incorporates general trends across multiple individuals. In some examples, further data can be collected as the deep learning algorithm is used and further training can be performed. In this example, the deep learning algorithm is trained using supervised learning with gradient descent and binary cross-entropy loss (as has been described previously) on N-examples.
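

A minimal training sketch under these assumptions is shown below. The optimiser, learning rate, number of epochs and the placeholder data are illustrative; only the use of gradient descent with binary cross-entropy on labelled examples follows the description above.

```python
import torch
from torch import nn, optim

# Placeholder clinician-labelled data: (N, 1024) utterance embeddings and 0/1
# thought labels; in practice these would come from the labelled dataset described above.
embeddings = torch.randn(256, 1024)
labels = torch.randint(0, 2, (256,)).float()

# Same two-layer architecture as sketched above.
model = nn.Sequential(nn.Linear(1024, 20), nn.GELU(), nn.Linear(20, 2), nn.Softmax(dim=-1))
optimiser = optim.Adam(model.parameters(), lr=1e-3)  # optimiser and learning rate are assumptions
loss_fn = nn.BCELoss()                               # binary cross-entropy, as described above

for epoch in range(20):                              # number of epochs is an assumption
    optimiser.zero_grad()
    probability_of_thought = model(embeddings)[:, 1]
    loss = loss_fn(probability_of_thought, labels)
    loss.backward()
    optimiser.step()
```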


After the thought detection model 31a has identified thoughts from the user utterances, the distorted thought detection model 31b detects whether any of those thoughts are distorted. For instance, a distorted thought could be all-or-nothing thinking such as “I am a failure,” generalisations such as “everyone hates me,” or other cognitive distortions. The distorted thought detection model 31b detects whether such distortions exist in user utterances that have been identified as thoughts by the thought detection model 31a. The distorted thought detection model 31b is an example of a cognitive distortion understanding model 300 as has been described previously. Automated detection is performed because most people do not readily recognize their own distorted thoughts, so need careful feedback and challenging.


Each user utterance from the list of user utterances identified as thoughts by the thought detection model 31a is taken as input in turn. The input utterance is transformed into a vector representation. The vector representation generated previously by the thought detection model 31a may be retrieved. Alternatively, the vector representation is generated using a pre-trained sentence embedder. For example, a pre-trained sentence embedding may be generated using a SentenceBERT based model, such as described in the paper “sentence-BERT”, Reimers & Gurevych, 2019, see arXiv:1908.10084 the entire contents of which are incorporated by reference herein. In one embodiment, the embedding has a size of 1024 dimensions.


In this example, the distorted thought detection model 31b comprises a deep learning algorithm which is trained to classify whether a user utterance that has been identified as a thought, represented by the input vector representation, is a distorted thought or not a distorted thought. For example, the distorted thought detection model 31b comprises a two layer feedforward neural network with Gaussian Error Linear Unit activation and a Softmax output layer. In this example, the model has 30,812 trainable parameters in total. The neural network outputs a probability that the processed utterance is a distorted thought. The higher the probability value, the higher the likelihood of the thought being distorted in this example. Applying the deep learning algorithm to the vector representation allows the distorted thought detection model 31b to evaluate whether a user utterance identified as a thought is distorted. The distorted thought detection model 31b may output distorted thoughts, for example user utterances corresponding to a probability greater than or equal to a threshold value, together with the probability value.


The deep learning algorithm is trained using a training dataset, prior to implementation of the system 100. The training dataset may comprise historical patient utterances (sequences of text) and may be used to learn the weights of the deep learning algorithm prior to implementation of the dialogue system 100. The training data may comprise clinician-labelled datasets, which may comprise a number of example patient utterances. To prepare the training dataset, a number of clinicians review the patient utterances and label them as corresponding to a “distorted thought” (these utterances are given the label 1) or not a distorted thought (these utterances are given the label 0). The training data comprises data associated with an individual or a group of individuals. A training data set may include data collected for a given individual, and the deep learning algorithm (e.g., for the individual or other persons) may be trained using the individual training data set. This may enable generating deep learning algorithms that are tailored to trends for the individual. A training data set may include data collected for multiple individuals, and the deep learning algorithm (e.g., for some or all of the individuals in the group or other persons) may be trained using the group training data set. This may enable generating deep learning algorithms that benefit from a relatively large data set that incorporates general trends across multiple individuals. In some examples, further data can be collected as the deep learning algorithm is used and further training can be performed. In this example, the deep learning algorithm is trained using supervised learning with gradient descent and binary cross-entropy loss on N-examples, as has been described previously in relation to the cognitive distortion understanding model 300 shown in FIG. 4 above.


The sentiment analysis model 31c is configured to detect whether a user utterance (e.g., user utterance annotated and/or labelled as “feelings” or “situation”) relates to a negative sentiment or a positive sentiment. Detecting negative sentiment in a user utterance may be useful while determining an intervention for the user. For example, the dialogue application 102 may recommend an intervention to challenge a negative sentiment if a negative sentiment is detected. Similarly, the dialogue application 102 may recommend an intervention to better regulate negative emotions if a negative sentiment is detected. In a similar manner, the dialogue application 102 may recommend an intervention to get the user physically active or to repeat positive experiences if a negative sentiment is detected.


Each utterance from the list of user utterances which are “feelings” and the list of user utterances which are “situations” is taken as input in turn. An example of a trained model that may be used as a sentiment analysis model 31c is described in “pysentimiento: A Python Toolkit for Sentiment Analysis and SocialNLP tasks”, Juan Manuel Pérez, Juan Carlos Giudici and Franco Luque, arXiv:2106.09462, the entire contents of which are incorporated by reference herein, and is available at https://huggingface.co/finiteautomata/bertweet-base-sentiment-analysis. Applying the deep learning algorithm allows the sentiment analysis model 31c to evaluate whether a user utterance includes positive sentiment, negative sentiment, or neutral sentiment. The model takes in an embedded utterance, again of 1,024 dimensions, and emits one of three labels: positive, neutral or negative sentiment. The sentiment analysis model 31c outputs user utterances classified as negative sentiments, user utterances classified as neutral sentiments, and user utterances classified as positive sentiments, together with the corresponding probabilities. In some examples, a softmax function is applied to the output, to produce a vector of probability values, each value corresponding to the three labels positive, negative, and neutral. For example, the sentiment analysis model 31c may output a user utterance with a vector of probability values corresponding to positive, negative, and neutral (e.g., positive: 0.8, neutral: 0.1, negative: 0.1), thereby classifying the user utterance based on the corresponding probability.
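

A minimal sketch using the pysentimiento toolkit cited above is shown below. Note that it applies the pretrained model directly to the raw utterance, whereas the description above feeds the model an embedded utterance; the sketch therefore shows one possible integration rather than the exact one.

```python
from pysentimiento import create_analyzer

# The English sentiment analyzer provided by pysentimiento uses the
# bertweet-based checkpoint cited above.
analyzer = create_analyzer(task="sentiment", lang="en")

result = analyzer.predict("I felt ashamed afterwards.")
print(result.output)   # one of "POS", "NEU", "NEG"
print(result.probas)   # probability for each of the three labels
```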


After the sentiment analysis model 31c detects whether a user utterance is related to a negative sentiment, a positive sentiment, or a neutral sentiment, the behavioural pattern detection model 31d determines whether the user utterance classified by the sentiment analysis model 31c as positive or negative relates to an activity. Detecting an activity may be useful while determining an intervention for the user. For example, if the user has low activity levels, an intervention to improve the activity of the user can be recommended, or to reinforce positive experiences the patient has described. One core assumption within CBT is that one's activities influence one's feelings, meaning that improving activity levels is often a core part of treatment to improve the patient's underlying mood or anxiety issues. In one example, the behavioural pattern detection model 31d is the same as the behavioural understanding model described above (e.g., in relation to FIG. 6 and FIG. 10).


Each input user utterance is transformed into a vector representation. The vector representation is generated using a pre-trained sentence embedder. For example, the embedding may be generated using a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model. In one embodiment, the embedding has a size of 1024 dimensions.


In this example, the behavioural pattern detection model 31d comprises a deep learning algorithm which is trained to classify whether a user utterance represented by the input vector representation is an activity. This is referred to as an activity classifier. For example, the behavioural pattern detection model 31d comprises a two layer feedforward neural network with Gaussian Error Linear Unit activation and a Softmax output layer. In this example the model comprises 52,576 trainable parameters. Applying the deep learning algorithm to the vector representation allows the behavioural pattern detection model 31d to evaluate whether a user utterance relates to an activity.


The deep learning algorithm is trained using a training dataset, prior to implementation of the system 100. The training dataset may comprise historical patient utterances (sequences of text) and may be used to learn the weights of the deep learning algorithm prior to implementation of the dialogue system 100. The training data may comprise clinician-labelled datasets, which may comprise a number of example patient utterances. To prepare the training dataset, a number of clinicians review the patient utterances and label them as corresponding to an “activity” or not an activity. The training data comprises data associated with an individual or a group of individuals. A training data set may include data collected for a given individual, and the deep learning algorithm (e.g., for the individual or other persons) may be trained using the individual training data set. This may enable generating deep learning algorithms that are tailored to trends for the individual. A training data set may include data collected for multiple individuals, and the deep learning algorithm (e.g., for some or all of the individuals in the group or other persons) may be trained using the group training data set. This may enable generating deep learning algorithms that benefit from a relatively large data set that incorporates general trends across multiple individuals. In some examples, further data can be collected as the deep learning algorithm is used and further training can be performed.


In this example, the deep learning algorithm is trained using supervised learning with gradient descent and binary cross-entropy loss on N-examples.


After the user utterances are classified as an activity or not an activity, the behavioural pattern detection model 31d is further configured to identify whether a detected activity is a positive activity or a negative activity. For example, the behavioural pattern detection model comprises a clustering algorithm. The clustering algorithm in this example identifies which out of a list of pre-determined activities (equally split between positive and negative) the activity belongs to. The list of pre-determined activities is split equally into positive activities and negative activities. If a user utterance relates to an activity, then the behavioural pattern detection model 31d implements the clustering algorithm. The clustering algorithm identifies an activity from the list of pre-determined activities that the activity in the user utterance is the closest to. If the activity in the user utterance belongs to a positive activity in the pre-determined list then it is classified as a positive activity. If the activity in the user utterance belongs to a negative activity in the pre-determined list then it is classified as a negative activity. Put differently, the behavioural pattern detection model 31d classifies a user utterance with an activity into one of a pre-determined list of activities. The pre-determined list of activities is split into negative activities and positive activities. The pre-determined list of negative and positive activities may be stored in a look-up table. Once the behavioural pattern detection model 31d classifies the user utterance as one of the pre-determined list of activities, the look-up table may be retrieved to identify whether the classified activity is a positive activity or a negative activity. Accordingly, the user utterance may be classified as a positive activity or a negative activity.


Therefore, the behavioural pattern detection model 31d outputs a list of positive activities and negative activities from the user utterances. In one example, the list of pre-determined activities comprises 12 activities equally split between positive activities and negative activities. The output of the behavioural pattern detection model 31d is a list of detected positive and negative named activities, together with the probabilities. For instance, a softmax function is applied to the output, to produce a vector of probability values, each value corresponding to an activity from the pre-determined list of activities (e.g., a 12-dimensional vector with each value corresponding to one of the 12 activities).


The user utterance is classified as an activity from the pre-determined list of activities with the highest probability value. The behavioural pattern detection model 31d determines whether the activity from the pre-determined list that the user utterance was classified into is a positive activity or a negative activity (e.g., by retrieving and/or accessing a look-up table). In one example, negative activity includes reduced activity such as for example, not doing anything.
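A minimal sketch of this classification and look-up step is shown below; the activity names in the look-up table are illustrative placeholders, not the pre-determined list itself.

# Sketch of mapping a classified activity to positive or negative using a
# look-up table (activity names are illustrative placeholders).
ACTIVITY_POLARITY = {
    "exercising": "positive",
    "socialising": "positive",
    "staying in bed": "negative",
    "not doing anything": "negative",
    # ... remaining pre-determined activities, split equally ...
}

def classify_activity(probabilities: dict) -> tuple:
    # probabilities: softmax output keyed by pre-determined activity name.
    activity = max(probabilities, key=probabilities.get)      # highest probability
    return activity, ACTIVITY_POLARITY.get(activity)          # e.g. ("exercising", "positive")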


Accordingly, the thought detection model 31a determines whether a user utterance classified as a thought is actually a thought. The distorted thought detection model 31b determines whether the user utterance identified as a thought by the thought detection model 31a is a distorted thought. The sentiment analysis model 31c identifies whether a user utterance relates to positive sentiment, negative sentiment, or neutral sentiment. The behavioural pattern detection model 31d lists out positive activities and negative activities in user utterances.


In this example, the subject understanding module 31 is configured to annotate and/or label each user utterance with the outputs of the subject understanding models (e.g., thought detection model 31a, distorted thought detection model 31b, sentiment analysis model 31c, behavioural pattern detection model 31d). Some non-limiting examples of annotations and/or labels include “difficulties analysing situation detected”, “positive activity detected”, “reduced activity detected”, “distorted thought detected”, etc. For example, if a user utterance includes “I am a failure,” then the thought detection model 31a identifies this user utterance as a thought and outputs the user utterance classifying it as a thought. The distorted thought detection model 31b identifies that the thought is a distorted thought and outputs the user utterance classifying it as a distorted thought. The subject understanding module 31 annotates and/or labels this user utterance as a “distorted thought”. For instance, the subject understanding module may annotate and/or label it as, distorted thought detected: [“I am a failure”]. As another example, if a user utterance includes “struggle to get out of bed,” then the behavioural pattern detection model 31d identifies this as a negative activity and lists the user utterance under the list of negative activities. The subject understanding module 31 annotates and/or labels this user utterance as “negative activity.” For instance, the subject understanding module 31 may annotate and/or label it as, reduced activity detected: [“struggle to get out of bed”]. The output of the subject understanding module may comprise the following, for example:

















reduced activity detected: [“struggle to get out of bed”, “haven't left the house much”]

Automatic thoughts detected: [“I am a failure”]










In one example, when the sentiment analysis model 31c classifies a sentiment in a user utterance as “negative” and the behavioural pattern detection model 31d classifies the user utterance as “negative activity”, the subject understanding module 31 may annotate and/or label the user utterance as “reduced activity”.


In this manner, the subject understanding module 31 generates subject profile information of the user. The subject profile information may be used by the recommender module 32 to identify an intervention for the user. The subject profile information may also be provided to the interaction module, and included in the third prompt and optionally the fourth prompt.


The subject profile information may be stored in a knowledge bank 33. The knowledge bank 33 comprises stored therapy-related knowledge. For instance, the knowledge bank 33 comprises therapy-related knowledge relating to the users of the dialogue application 102. In this example, the knowledge bank 33 comprises patient information 33a and historical information 33b. Patient information 33a includes demographic information of each user of the dialogue application. For example, patient information 33a includes age, location, gender, race, ethnicity, etc. of each user of the dialogue application 102. In addition, patient information 33a also includes ground truth information relating to the users of the dialogue application 102. For example, patient information includes responses to questionnaires answered by the users, therapist diagnoses, etc. Historical information 33b includes a history of therapy-related information relating to the users of the dialogue application 102. For example, historical information 33b includes a history of interventions recommended to a user, interventions completed by a user, the number of times a user has performed a specific intervention, when the interventions were performed, etc.


The knowledge bank 33 may be stored in a memory device such as a random access memory (RAM), a memory buffer, a hard drive, an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a read-only memory (ROM), Flash memory, and the like. Additionally and/or alternatively, the knowledge bank 33 may be stored on a cloud-based platform such as Amazon Web Services®.


The recommender module 32 identifies interventions based on the subject profile information from the subject understanding module 31. In addition, the recommender module 32 may base the identification of interventions on other information retrieved from the knowledge bank 33, for example. In one example, the recommender module 32 may base the identification of interventions on a previous diagnosis of the user, demographic information of users of the app, app usage data of users, and feedback provided by the users. The identified interventions are delivered to the user by the interaction module 56.


In some examples, after the dialogue application 102 engages with the user and/or delivers an intervention, the first module 30 may be configured to monitor feedback from the user in response to the interventions. The first module 30 analyses the feedback to gain insights about the user. Put differently, the first module 30 may use knowledge gained from the user to gain insights about the user. Additionally or alternatively, the second module 20 may implicitly gain insights about the user (e.g., based on the language model's 21 interaction and/or engagement with the user). These insights may be used by the recommender module 32 while identifying interventions.


In this example the following information is taken as input to the recommender module 32. The thought detection model 31a outputs any utterances that were originally categorised as situations (based on the label) but are classified as thoughts, and the corresponding classification probabilities. The distorted thought detection model 31b outputs any distorted thoughts and a corresponding probability. The sentiment analysis model 31c outputs any negative sentiments or positive sentiments and the corresponding probabilities. The behavioural pattern detection model 31d lists any negative and positive activities of the user, together with the corresponding probabilities.


The recommender module 32 applies a threshold to the probability values. More specifically, the recommender module 32 applies a threshold value to the probability with which a subject understanding model outputs and/or classifies a user utterance. The recommender module 32 compares the probability outputted by a subject understanding model to a threshold. Thus the probability of each inferred state (e.g. presence of a distorted thought) is thresholded, so that only sufficiently strong states cross that threshold. In one example, each state (such as for example, distorted thought, negative activity, negative sentiment, positive sentiment, positive activity, situation/thought discrepancy) may be associated with a pre-determined threshold probability value. The same threshold may be used for each state. The state and associated threshold value may be stored in a look-up table. The look-up table may be retrieved and/or accessed to identify states that cross the pre-fixed threshold values. All the user utterances that exceed the threshold are then ranked against each other. The user utterances are sorted based on their ranks. The sorted user utterances and their corresponding state(s) are mapped to one or more goals/interventions as discussed below. In this manner, the recommender module 32 may identify an intervention or determine a goal for the user.
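A minimal sketch of this thresholding and ranking step is given below; the threshold values and the spelling of the state keys are illustrative assumptions.

# Sketch of thresholding the inferred states and ranking the surviving
# utterances (threshold values and state keys are illustrative assumptions).
STATE_THRESHOLDS = {
    "distorted_thought": 0.7,
    "negative_activity": 0.6,
    "negative_sentiment": 0.6,
    "positive_sentiment": 0.6,
    "positive_activity": 0.6,
    "situation_thought_discrepancy": 0.7,
}

def rank_states(detections):
    # detections: (utterance, state, probability) tuples from the subject
    # understanding models. Keep only states that cross their threshold and
    # sort the remainder by probability, highest first.
    kept = [d for d in detections if d[2] >= STATE_THRESHOLDS[d[1]]]
    return sorted(kept, key=lambda d: d[2], reverse=True)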


In some examples, the recommender module 32 may determine that the information is insufficient to identify interventions. For example, the recommender module 32 may receive the subject understanding profile and corresponding probabilities from the subject understanding module 31. If none of the probabilities are greater than the threshold values, then the recommender module 32 may identify that there is insufficient information to identify interventions. The recommender module 32 may instruct the interaction module 56 to engage in further conversations with the user. For instance, the recommender module 32 may instruct the interaction module 56 to explore the top K states (based on their probability values) with the user, through further interaction, by generating additional second system prompts for the language model for example. K is a positive integer number, which is pre-determined. The additional data from the interaction module 56 may be used by the subject understanding module 31 and the recommender module 32 to generate an updated subject understanding profile and to identify interventions. This could happen as a loop wherein the recommender module 32 analyses the available data, decides that no sufficient evidence is available (e.g. where no interventions meet the relevant thresholds), then instructs the interaction module 56 to engage in conversation with the user to explore the top-K sub-threshold interventions, then runs the analysis again; this loop could repeat for a fixed number of steps, or until some other criterion is met.
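A minimal sketch of this explore-then-re-analyse loop follows, using the rank_states sketch above and assuming illustrative helper functions (run_subject_understanding, explore_states_with_user, map_states_to_interventions) that stand in for the subject understanding module 31, the interaction module 56 and the mapper respectively.

# Sketch of the explore-then-re-analyse loop (helper functions are illustrative
# stand-ins; K and max_steps are pre-determined values).
def recommend_with_exploration(conversation, K: int = 3, max_steps: int = 2):
    for _ in range(max_steps):
        detections = run_subject_understanding(conversation)   # (utterance, state, probability)
        ranked = rank_states(detections)                        # keep only supra-threshold states
        if ranked:                                              # sufficient evidence available
            return map_states_to_interventions(ranked)
        # No state crossed its threshold: explore the K most probable
        # sub-threshold states in further conversation, then analyse again.
        top_k = sorted(detections, key=lambda d: d[2], reverse=True)[:K]
        conversation = explore_states_with_user(conversation, top_k)
    return []   # still insufficient evidence after max_steps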


Thus the probability of each inferred state (e.g. presence of a distorted thought) is thresholded, so that only sufficiently strong states cross that threshold. All the user utterances that exceed the threshold are then ranked against each other in order to choose the most likely states to determine the selected goal. As discussed above, in this example, the possible states are:

    • 1. Distorted thought;
    • 2. Negative activity;
    • 3. Negative sentiment;
    • 4. Positive sentiment;
    • 5. Positive activity;
    • 6. Situation asked but thought given.


The recommender module 32 thus determines a list of the k most likely labels from the list 1 to 6 above, also referred to as states, from the subject profile information. Thus a list of the k most likely states is generated, where k is a pre-determined positive integer.


The k most likely states are then provided to a mapper in the recommender module 32.


In this example, the recommender module 32 comprises decision logic to identify one or more interventions for the user. In other examples, the mapper may comprise a trained model (for example a trained neural network). The decision logic maps the k outputs from the subject understanding module 31 to a pre-determined list of interventions. The pre-determined list of interventions may be stored in the recommender module 32. Additionally or alternatively, the pre-determined list of interventions may be stored in the knowledge bank 33.


In one example, the decision logic comprises a set of rules to determine the suggested list of interventions from the pre-determined list of interventions. In one example, the decision logic comprises a lookup table. The mapper is configured to map the user utterances to one or more interventions from the pre-determined list of interventions. The pre-determined list of interventions in this example is:

    • 1. Thought restructuring—this involves challenging difficult thoughts and encourages the user to reflect on evidence for and against a given thought;
    • 2. Behavioural activation—this involves encouraging the patient to be more physically active and/or encouraging the patient to perform an activity that the patient enjoys but doesn't do as much;
    • 3. Emotion regulation—this involves teaching techniques to better regulate negative emotions;
    • 4. Positive reinforcement—this involves trying to reinforce positive experiences the patient has described and encourages users to engage in the activities that made them feel positive more often. Positive reinforcement is described in more detail below.


Behavioural activation and positive reinforcement achieve something similar, but act at different times: behavioural activation encourages the user to do something they have not done a lot recently, whereas positive reinforcement acts right after they have done something which they enjoy.

    • 5. Situation analysis—involves the following steps:
      • a. Explain to the user that CBT is based on the idea that it is not the situation that impacts how we feel and what we do, it is the thoughts we have about the situation. For example, someone bumping into the user doesn't impact how the user feels, rather the thought or belief that the user has about the person bumping into them (eg. “they did that on purpose”) impacts how the user feels. In this example, thinking “they did that on purpose” will have a different impact on mood (and what the user does) than thinking “that must have been an accident”.
      • b. Ask the user to tell a recent situation that made them feel low or anxious. They should be as specific as possible about what happened.
      • c. Ask the user what thoughts they had when in this situation.
      • d. Ask the user what the consequence was in this situation. Did it make them feel low or anxious? Did it make them do something, like get away from the situation or say something they later regretted?
      • e. Lastly, the user should be supported to see that the thought they identified in step c led to the consequence they identified in step d. If the user wants to stop this cycle in the future then they can use thought challenging.


To deliver positive reinforcement as an intervention, the user should have displayed positive activity (e.g., positive activity identified by the behavioural pattern detection model 31d such as socializing, exercising, etc.) and positive sentiment (e.g., positive sentiment identified by the sentiment analysis model 31c) towards the positive activity. For example, consider that the structured information generated by the dialogue application 102 includes a situation: “I went for a run” and an emotion: “it made me feel great”. The behavioural pattern detection model 31d may identify the situation as a positive activity and the sentiment analysis model 31c may identify the emotion as a positive emotion. In such an example, the recommender module 32 may identify positive reinforcement as an intervention to be delivered to the user. In another example, consider that the structured information generated by the dialogue application 102 includes a situation: “I was watching TV and eating crisps” and an emotion: “it made me feel great”. The behavioural pattern detection model 31d may identify this behaviour as a negative activity (or reduced activity) and the sentiment analysis model 31c may identify the sentiment as positive sentiment. In such an example, the recommender module may not determine positive reinforcement as an intervention for the user. Similarly, consider that the structured information includes a situation: “I went for a run” and an emotion: “I absolutely hated every second of it”. In such an example, the recommender module 32 may still not identify positive reinforcement as an intervention for the user since the sentiment towards the activity is negative sentiment.


The mapper performs the following mapping. If a distorted thought is included in the list of k most likely labels from the subject profile information, thought restructuring is identified as an intervention. If a negative activity is included, behavioural activation is identified. If negative sentiment is included, emotion regulation is identified. If both positive sentiment and positive activity are included, positive reinforcement is identified. If one or more utterances labelled as situations were judged by the thought detection module to be thoughts, then situation analysis is selected. Thus in this example, decision logic is used to identify which out of a set of CBT interventions the user may benefit most from. Each subject understanding model output triggers a specific intervention. For example, if automatic thoughts are detected, the algorithm would recommend “thought restructuring”. The mapper maps the user utterances to one or more interventions from the pre-determined list of interventions based on the annotations and/or labels outputted by the subject understanding module 31.
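A minimal sketch of this decision logic is shown below (it also elaborates the map_states_to_interventions function assumed in the earlier loop sketch); the knowledge-bank filtering described below is represented only by a simple "previously performed" filter, which is an assumption for the sketch.

# Sketch of the mapper's state-to-intervention decision logic; the
# "previously_performed" filter is a simplified stand-in for the
# knowledge-bank filtering step.
STATE_TO_INTERVENTION = {
    "distorted_thought": "thought restructuring",
    "negative_activity": "behavioural activation",
    "negative_sentiment": "emotion regulation",
    "situation_thought_discrepancy": "situation analysis",
}

def map_states_to_interventions(ranked_states, previously_performed=()):
    # ranked_states: (utterance, state, probability) tuples, ranked by probability.
    states_present = {state for _, state, _ in ranked_states}
    interventions = []
    for _, state, _ in ranked_states:
        if state in STATE_TO_INTERVENTION:
            interventions.append(STATE_TO_INTERVENTION[state])
    if {"positive_sentiment", "positive_activity"} <= states_present:
        interventions.append("positive reinforcement")   # requires both states
    # Remove duplicates and previously performed interventions, keeping rank order.
    result, seen = [], set(previously_performed)
    for intervention in interventions:
        if intervention not in seen:
            seen.add(intervention)
            result.append(intervention)
    return result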


The selected interventions may be ranked, using the order of probability of the states. The recommender module 32 may then filter the ranked interventions based on information retrieved from the knowledge bank 33 (e.g., patient data and/or historical data). In this example, this step filters out previously performed interventions. In other examples, the filtering is performed based on suitability for the patient, for example based on their diagnosis, treatment plan, previous interventions, etc. In some examples, filtering may be performed based on knowledge about the user (e.g., a user in an abusive relationship with a partner may not be asked to restructure their thoughts about the partner, a user with heart disease may not be encouraged to perform physical exercise, etc.).


These ranked and filtered interventions are delivered as identified interventions to the interaction module 56, and included in the third prompt as described previously.


The recommender module 32 processes the output of the mechanistic models 31 and makes a recommendation about the treatment as an immediate step (not as a long-term plan). The first module 30 takes in a structured conversation history (annotated to reflect whether a user's utterance was a “thought” or a “sentiment” or a “situation”), processes this using a set of mechanistic models 31, maps this output to a “goal” (in most cases a treatment intervention) using the recommender module 32, and then serves this to the interaction module 56, which takes in further inputs (the patient's goal of therapy, history, characteristics, etc.) and motivates the user to choose between a set of interventions (one or more if there is already one in the treatment plan). This intervention will then be served using the fourth prompt generator 52d to prompt the language model 21.


In this example, the recommender module 32 comprises decision logic. One or more pre-determined threshold values are used to select subject profile information, which is then used to identify one or more interventions. In this example, a fixed pre-determined threshold is used. In some examples however, a deep learning model may be implemented to determine the threshold values. For example, starting with random thresholds for each state from the subject profile information, the thresholds can then be updated to maximise future symptom reduction. This is an example of learning a policy over possible interventions, where the inputs are the probabilities determined by the subject understanding module.


In other examples, the recommender module may comprise one or more trained specialised machine learning prediction models to predict the most effective goal given the state inputs. For example, a reinforcement learning model may be used, in which the states correspond to the states and probabilities output from the subject understanding modules and the actions correspond to the interventions. A deep-learning or tree-based model optimised for symptom reduction, using reinforcement learning or time-series forecasting, may be used to predict which intervention reduced symptoms most effectively. In some other examples, a deep-learning or tree-based model optimised to meet clinician recommendations may be used, for example trained on a dataset with the user utterances as inputs and the clinician-labelled recommendations as target data. In such examples, the training data may include clinician-labelled user utterances and/or clinician-labelled states.


The recommender model 32 may identify one or more interventions for the user while optimizing and/or maximizing one or more of the target metrics. The machine learning models may identify one or more interventions while optimizing and/or maximizing one or more of the target metrics. Some non-limiting examples of target metrics include therapeutic alliance, patient outcomes, and information gain. Therapeutic alliance encapsulates user engagement with the dialogue application 102. For instance, therapeutic alliance defines whether user engagement with the dialogue application 102 is short term or long term based on insights about the user. For example, these insights may be generated based on information such as, whether the dialogue application 102 has been understanding the user, the trust and bond that the user may have developed with the dialogue application 102, and/or the like. Patient outcomes may measure the changes experienced by the user following a dialogue with the dialogue application 102. For instance, dialogue application 102 may ask the user to rate their mood after the dialogue application 102 delivers an intervention. Additionally or alternatively, the dialogue application 102 may provide questionnaires for the user to fill out periodically (e.g., weekly) to gauge the mood of the user. Furthermore, the dialogue application 102 may analyse interactions of the user with the dialogue application 102 following interventions. These may provide insights about the changes that the user experiences following a dialogue with the dialogue application 102. Information gain captures metrics such as for example, a certainty with which the subject understanding module 31 may generate the subject profile information. In some examples, the recommender module 32 may be configured to identify interventions while maximizing and/or optimizing all three of the target metrics. In other examples, the recommender module 32 may be configured to identify interventions while maximizing and/or optimizing any two of the target metrics. In other examples, the recommender module 32 may be configured to identify interventions while maximizing and/or optimizing any one of the target metrics.


Thus in some examples, the recommender module 32 may comprise one or more machine learning models to identify interventions. For example, the recommender module 32 may comprise a neural network that is trained to maximize one or more metrics described above such as for example, therapeutic alliance, patient outcomes, or insights. The model may be trained based on reinforcement learning and/or based on supervised learning.


The recommender module 32 identifies a set of one or more interventions for the user to choose from. For instance, the recommender module 32 identifies a set of interventions and ranks them in order of effectiveness. The interaction module 56 includes the one or more interventions in the third prompt. One or more additional interventions may also be included in the third prompt—for example from a treatment plan. The user then selects an intervention from these suggested interventions. The interaction module 56 then delivers this selected intervention to the user.



FIG. 20 shows a flow chart of a dialogue method according to an example, that may be performed by the dialogue application 102 when executed on the dialogue system 100 as described above. The method in FIG. 20 will be described with reference to example system prompts provided to one or more language models to engage with the user and to identify an intervention for the user.


In S1501, input data from the user is received. In S1502, the method includes determining whether the input data includes information relating to the clinical state of the user. In S1502, the input data is included in a first system prompt. The method includes generating a first system prompt (e.g., first system prompt described in FIG. 12) with one or more data flags to a language model. As discussed with reference to FIG. 12, the first system prompt may include a first flag to identify whether the input data includes a user intention to perform a specific intervention, a second flag to identify whether the input data includes information relating to a clinical state of the user, and a third flag to identify whether to terminate dialogue with the user. Below is an example of a stored text template used to generate the first system prompt provided to a language model:


The flag [Intent] in the example above is an example of the first flag. The flag [Explore] in the example above is an example of the second flag. The flag [Finished] in the example above is an example of the third flag.


If the first flag is set as “true,” then the method proceeds directly to S1507. In S1507, the specific intervention intended to be performed by the user is delivered to the user. This scenario is shown in FIG. 22.


If the third flag is set as “true,” the method in FIG. 20 terminates, thereby terminating the dialogue with the user. If the second flag is set as “true,” the method determines that the input data includes information relating to the clinical state of the user. The method proceeds to S1503. If the first, second and third flags are not set as true, the method proceeds to output a first system response and/or a first utterance, which was generated by the language model 21 based on the first system prompt. The method then returns to S1501, in which the next user input is received, and the next first system prompt is generated. In S1502, the first prompt generator 52a prompts the language model 21 to hold a free-flowing, unstructured conversation with the user. The user is engaged in a conversation using a first component—the first prompt generator 52a and the language model 21.
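A minimal sketch of this flag-based routing is given below; the way the flags are represented and the downstream function names are illustrative assumptions, not the stored template or the actual implementation.

# Sketch of routing on the parsed first-system-prompt flags (function names
# and flag representation are illustrative assumptions).
def route_first_turn(flags: dict, first_system_response: str):
    if flags.get("Intent") == "true":
        return deliver_specific_intervention()        # proceed directly to S1507
    if flags.get("Finished") == "true":
        return terminate_dialogue()                   # end the dialogue
    if flags.get("Explore") == "true":
        return start_structured_conversation()        # proceed to S1503
    output_to_user(first_system_response)             # free-flowing conversation continues
    return receive_next_user_input()                  # back to S1501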


In S1503, the method includes providing a second system prompt (e.g., second system prompt described in FIG. 12) to a language model. The second system prompt includes one or more questions relating to cognitive behavioural therapy. Step S1503 involves generating a second system prompt, outputting the system response, receiving the user input, generating the next second system prompt, outputting the system response, receiving the next user input and so on. The received user inputs are subsequent input data.


The second system prompt also includes a field [Asking] as shown below. This field is set as “thoughts” when the system response comprises a question relating to thoughts. Similarly, this field is set as “feelings” when the system response comprises a question relating to feelings and it is set as “situation” when the system response comprises a question relating to a situation. This information can then be used to generate structured information from the subsequent input data.


The second system prompt may also include instructions to summarize the user responses in the form of cognitive behavioural therapy formulations once the subsequent input data has been collected.


The second system prompt includes the previous conversation text, in other words the previous user inputs and system responses. Below is an example of a stored text template used to generate the second system prompt:


Thus if the patient mentions something relevant to their mental health (e.g. I am feeling very low), the second prompt generator 52b instructs the language model 21 to engage the user in a structured conversation called a 3-column thought record: this tries to disentangle the patient's thoughts and feelings from the situations in which they arose. Treatment-relevant information is derived from what the user says using a second component—the second prompt generator 52b instructing the language model 21.


In addition to the [Asking] field, the second system prompt includes a flag to indicate whether the structured information has been generated. For example, in the example above the second system prompt includes a [Finished] flag. After the structured information is generated, the user may be provided with the structured information to verify the accuracy of the structured information. For example, the user may be provided with a summary of the cognitive behavioural therapy formulations to verify whether the user has been understood. If the user responds by indicating that the summary is accurate, the [Finished] flag is set to “true” and the method proceeds to S1504.
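A minimal sketch of how the [Asking] field might be used to assemble structured information from the subsequent input data is shown below; the dictionary layout is an illustrative assumption.

# Sketch of assembling structured information (a 3-column thought record)
# from the [Asking] field; the dictionary layout is an illustrative assumption.
def build_thought_record(turns):
    # turns: (asking_field, user_input) pairs, where asking_field is the value
    # of [Asking] ("situation", "thoughts" or "feelings") at that turn.
    record = {"situation": [], "thoughts": [], "feelings": []}
    for asking, user_input in turns:
        record.setdefault(asking, []).append(user_input)
    return record   # passed on to the subject understanding module in S1504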


In S1504, the method includes generating subject understanding profile information based on the structured information generated in S1503. For example, the method may include executing a subject understanding module 31 as described herein to generate the subject profile information. The subject understanding profile may include user utterances and corresponding annotations and/or labels (e.g., annotations and/or labels outputted by the subject understanding module 31 described in FIG. 14) to indicate whether the user utterance is a thought, a distorted thought, a negative sentiment, a positive sentiment, a positive activity, a negative activity, and corresponding probabilities for example. In this step, specialized mechanistic models are used to identify the patient's “state”. The structured information is processed in a number of machine learning models in S1504.


In S1505, the method includes identifying one or more suggested interventions based on the subject understanding profile. The interventions may also be identified based on historical information, and patient data. For example, the method may comprise executing the recommender module 32 as described herein to identify the one or more interventions. The method may include mapping user utterances to a pre-determined list of possible interventions. The method may also include ranking the interventions and generating a list of one or more interventions to be delivered to the user. The subject profile information is used to determine the best “immediate” action in this step. A relevant set of one or more interventions (a one-off treatment plan) is inferred.


In S1506, the method includes generating a third system prompt (e.g., third system prompt described in FIG. 12) to output information relating to one or more interventions to the user. The third system prompt may include the conversation history, including user responses to the first system prompt and the second system prompt. The third system prompt may include the subject profile information generated by the first module 30. The third system prompt further comprises the list of one or more identified interventions output by the recommender module 32. The third system prompt may further comprise additional interventions, for example from a treatment plan. The third system prompt may further comprise information indicating the likely usefulness of the interventions—for example the suggested interventions may be provided in an ordered list. An example of a stored text template used to generate the third prompt is below:


As seen in the example above, providing a third system prompt includes providing information relating to one or more interventions that have been identified in S1505. The third system prompt includes a field to provide the description of the intervention so as to motivate the user to perform the intervention.


In some examples, the third system prompt includes data from the knowledge bank 33. More specifically, generating the third system prompt may include retrieving therapy-related information or patient-related information from the knowledge bank 33. The third system prompt may include a field to include the retrieved information so as to provide a language model with additional context about the user and/or the interventions. For example, the retrieved information may include a list of interventions the user has performed, the goal for the user's therapy, and/or the like. An example of such a stored text template used to generate the third system prompt that includes information retrieved by the subject understanding models is below:


In S1506 a third LLM component—the third prompt generator 52c instructing the language model 21—is used to motivate the user to either engage in the identified intervention or in a pre-selected one (which may have come from a therapist or pre-specified treatment plan). The user is thus presented with one or more available options for interventions. This includes the intervention on the treatment plan, plus any interventions determined as an immediate need by the recommender module 32. In S1506, a discussion is held with the user, informing them about how each intervention works and why it may be useful to them personally.


Once the user has selected an intervention, the method moves to S1507. The intervention is delivered in chat format, using a fourth LLM component—the fourth prompt generator 52d and the language model 21. In S1507, the method includes generating a fourth system prompt (e.g., fourth system prompt described in FIG. 12) to deliver one or more interventions to the user in the form of a dialogue. Below is an example of a stored text template used to generate the fourth system prompt:


The fourth system prompt includes the conversation history, including the user response to first system prompt, second system prompt, and third system prompt. The fourth system prompt may also include the subject profile information. FIG. 21 illustrates the default flow of the full module as described above. There are multiple further exit points and routes for user flow, which are not shown here but described above.


As has been described above, the interaction module 56 and the first module 30 generate dynamically determined system responses to be outputted to the user. Although in the above description, a specific method for generating dynamically determined system responses is described, in other examples, the dynamically determined system responses may be generated using a different method using a trained model. Before outputting a dynamically determined system response, the output safety module 50 evaluates the dynamically determined system response. For example, the output safety module 50 may evaluate the dynamically determined system response to determine whether it is suitable for being delivered to the user.



FIG. 15 is a schematic illustration of an output safety module 50 that may be included in the dialogue application 102 described in relation to FIG. 12. The output safety module 50 comprises a language model 64 and a modifier module 65.


The language model 64 may be a general generative pre-trained transformer model that may be trained for text completion via prompting. The language model 64 may be a language model such as described in relation to FIG. 5 for example. In this example, the language model 64 is shown as part of the dialogue application 102. However, the language model 64 may be implemented on a separate system, with the system prompts being sent to and the language model outputs being received from the language model 64 system via a communication network.


The language model 64 receives a safety system prompt as input and outputs an output safety module system output.


The safety system prompt may be a combination of a stored template and the dynamically determined system response. The generated safety system prompt comprises a sequence of text. The system prompts may be generated in a similar manner to the prompt generation process described previously in relation to FIGS. 2 and 3. In this example, the dynamically determined system response is provided to a checking module 76. The checking module 76 generates a safety system prompt. The language model 64 receives the safety system prompt. In some examples, the checking module 76 may retrieve the stored text template from a stored location. Alternatively, the checking module 76 stores the stored template. The checking module 76 may combine the dynamically determined system response with the stored template to generate the safety system prompt. The dynamically determined system response may be included at a pre-defined point in the stored template, indicated by an identifier in the stored template, to generate the system prompt. The generated safety system prompt is provided as input to the language model 64.
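A minimal sketch of this combination step is shown below; the placeholder identifier "{system_response}" is an illustrative assumption and the stored template text itself is not reproduced.

# Sketch of the checking module 76 combining the stored template with the
# dynamically determined system response ("{system_response}" is an assumed
# placeholder identifier).
def build_safety_prompt(stored_template: str, system_response: str) -> str:
    # The response is inserted at the pre-defined point indicated by the
    # identifier in the stored template.
    return stored_template.replace("{system_response}", system_response)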


The safety system prompt includes instructions that enable the language model 64 to evaluate the dynamically determined system response. The instructions may include instructions to evaluate whether the dynamically determined system response includes information that is unsuitable for the user. More specifically, the safety system prompt may include instructions to evaluate whether the dynamically determined system response includes one or more of: information that is undesirable to the user, or information that is harmful to the user, or information that include medical advice for the user. The output safety module 50 therefore categorises the dynamically determined system responses into four categories in this example: generally undesirable behaviour (e.g. use of slang, engaging in debates, etc), harmful or offensive utterances (e.g. racist, misogynistic, homophobic, etc), giving medical advice (or other forms of clear recommendations/opinions on health matters), or none of these (safe).


System responses and/or system utterances that include information that is undesirable to the user may include information that:

    • weighs in or engages in political debates or questions
    • provides advice, recommendations, or thoughts on illegal behaviour
    • discloses private information about an individual, such as address, phone number, gender, age, etc.
    • provides any sort of medical opinion
    • takes on a character
    • uses slang, impressions, or other forms of non-standard English
    • provides legal advice
    • reveals information about one or more system prompts (including the safety system prompt).


The safety system prompt includes instructions to determine whether one or more of the above information is present in the dynamically determined response. If such information is present in the dynamically determined response, then the language model 64 outputs an indication that the dynamically determined response includes undesirable information. For example, the system output may comprise a label and/or tag that is associated with undesirable information. One or more questions relating to each of the above conditions may be included in the system prompt under the label and/or tag associated with undesirable information to enable the language model 64 to evaluate the dynamically determined responses. If one or more of the above conditions are satisfied based on the evaluation, the language model 64 may be enabled to assign flag information indicating that the dynamically determined system response includes undesirable information. The flag information may be a sequence of characters. For instance, the language model may assign “Yes” to a data flag (e.g., tag, label, flag, etc.) associated with a label relating to undesirable information if one or more of the conditions are satisfied. The language model 64 may assign “No” to the data flag if none of the conditions are satisfied.


System responses and/or system utterances that include information that is harmful to the user may include information that:

    • contains swearwords or slurs
    • contains racist, homophobic, misogynistic, or patronising language
    • belittles the user
    • contains information about drugs or medications
    • encourages or recommends risky behaviour
    • includes sexually explicit conversations
    • includes conversations relating to violent topics


The safety system prompt includes instructions to determine whether one or more of the above information is present in the dynamically determined response. The language model 64 is instructed to output an indication that the dynamically determined response includes harmful information if such information is present in the dynamically determined response. For example, the language model 64 may output a label and/or tag that is associated with harmful information. One or more questions relating to each of the above conditions may be included in the safety system prompt, associated with the label and/or tag associated with harmful information, to enable the language model 64 to evaluate the dynamically determined responses. If one or more of the above conditions are satisfied based on the evaluation, the language model 64 may be enabled to assign flag information indicating that the dynamically determined system response includes harmful information. The flag information may be a sequence of characters. For instance, the language model may assign “Yes” to a data flag (e.g., tag, label, flag, etc.) associated with a label relating to harmful information if one or more of the conditions are satisfied. The language model 64 may assign “No” to the data flag if none of the conditions are satisfied.


System responses and/or system utterances that include information that is related to medical advice may include information such as:

    • including utterances such as “You should do [X]”
    • providing advice or opinion about medication or drugs
    • including utterances such as “Your doctor is wrong,” or “Your therapist is wrong”
    • including personalized statements such as “This will be best for you,” “I'm sure you will benefit from this” as opposed to “You might benefit from this”


The safety system prompt includes instructions to monitor whether one or more of the above information is present in the dynamically determined response. If such information is present in the dynamically determined response, then the language model 64 outputs an indication that the dynamically determined response includes medical advice. For example, the system output may comprise a label and/or tag that is associated with medical advice. If one or more of the above conditions are satisfied based on the evaluation, the language model 64 may be enabled to assign flag information indicating that the dynamically determined system response includes medical advice. The flag information may be a sequence of characters. For instance, the language model may assign “Yes” to a data flag (e.g., tag, label, flag, etc.) associated with a label relating to medical advice if one or more of the conditions are satisfied. The language model 64 may assign “No” to the data flag if none of the conditions are satisfied.


An example text template used to generate a safety system prompt is shown below:


As seen in the above example, the safety system prompt comprises the label “general_undesired_behavior” associated with undesirable information, the label “harm” associated with harmful information, and the label “medical-advice” associated with information relating to medical advice. The safety system prompt includes a set of conditions under each of these labels. If a condition under undesirable information is met, the language model 64 assigns “Yes” next to the flag “Did the AI tool utterance contain any generally undesirable behaviour?” If a condition under harmful information is met, the language model 64 assigns “Yes” next to the flag “Did the AI tool utterance contain anything that could be seen as offensive or harmful?” If a condition under medical advice is met, the language model 64 assigns “Yes” next to the flag “Did the AI tool utterance give any form of medical advice or direct recommendations?”. The language model 64 in this example is run as a zero-shot large language model detecting harmful or offensive utterances. In some examples, moderation filters may also be run to detect such utterances.
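A minimal sketch of parsing such flag information out of the output safety module system output is shown below; the exact output format (one "label: Yes/No" line per check) is an assumption.

# Sketch of parsing the safety flags from the language model 64 output
# (the "label: Yes/No" line format is an assumption).
def parse_safety_flags(safety_output: str) -> dict:
    flags = {}
    for line in safety_output.splitlines():
        if ":" in line:
            label, value = line.split(":", 1)
            flags[label.strip().lower()] = value.strip().lower().startswith("y")
    return flags

# Example: parse_safety_flags("harm: No\nmedical-advice: Yes")
# returns {"harm": False, "medical-advice": True}; any True value is then
# handled by the flow module 58 as described below.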


In some examples, an additional safety system prompt is provided to the language model 64, for example after the first safety system prompt or in parallel with the first safety system prompt and to a separate instance of the language model 64. The language model 64 in this example is also run as a zero-shot large language model detecting “abuse”, which is detecting whether the user wants to use the language model 64 in unintended ways, e.g. by using prompt injections, or by engaging in very general conversation outside the scope of the component (e.g. asking the model to write a poem or recipe). An example stored text template used to generate the second safety system prompt for detecting abuse is:


As seen above, the additional safety module system prompt includes instructions to detect whether the user is using the dialogue application 102 in unintended ways such as enabling developer mode, engaging in conversation outside of the scope described in system prompts discussed herein, engaging in potentially abusive behaviour, etc. The additional safety system prompt may include a data flag that is set to “Y” and/or “Yes” if the user is using the dialogue application 102 in unintended ways and that is set to “N” and/or “No” otherwise. The additional safety module system prompt is generated using the stored text template, the dynamically determined system response and/or the user input.


The checking module 76 may also be configured to detect the reading age level of the dynamically generated system response. For instance, the checking module 76 may implement the Flesch-Kincaid Grade Level and/or the Flesch Reading Ease scale to detect the reading age level of the dynamically generated system response. For example, the checking module 76 may import a library such as Textstat to execute functions that implement the Flesch-Kincaid Grade Level and/or the Flesch Reading Ease scale. The checking module 76 determines whether the dynamically generated system response is readable to most people in the user population. The detected reading age is therefore compared to a pre-determined reading age threshold or range. In other words, it is checked whether the determined reading age level of the dynamically generated system response is below a pre-determined maximum reading age. It may additionally be checked whether the determined reading age is above a pre-determined minimum. The maximum is selected so that a majority of the population fall within the range—in other words the maximum reading age threshold is set to the lowest reading level that would easily be understandable to a user. If the detected reading age is within the range, an output indicating this is passed to the flow module 58. If all other checks are also satisfied, the dynamically determined system response is output to the user, by way of output 101. If the reading age is not within the range, then the flow module 58 passes control to the modifier module 65 to modify or change the dynamically determined system response.
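A minimal sketch of this reading-age check using the Textstat library is given below; the grade-level bounds are illustrative assumptions rather than the system's configured range.

# Sketch of the reading-age check using the Textstat library (grade-level
# bounds are illustrative assumptions, not the configured range).
import textstat

def reading_age_ok(system_response: str, max_grade: float = 8.0, min_grade: float = 0.0) -> bool:
    grade = textstat.flesch_kincaid_grade(system_response)   # U.S. school grade level
    return min_grade <= grade <= max_grade

# A failing check is passed to the flow module 58, which hands control to the
# modifier module 65 as described above.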


In this example, the detected reading age of the system response is compared to a pre-determined range. However, in other examples, the checking module 76 is further configured to detect the actual reading level of the user from previous user inputs, and compares the detected reading age of the system response to the actual reading level of the user. For example, the actual reading level of the user is set as the maximum of the range. In this way, if the user writes text that shows a high reading age, then the system responses need not be edited as much as if the user's inputs show a lower reading age.


The checking module 76 may also determine whether the dynamically determined system response is in line with clinical guidance. This check may be performed when the interaction module 56 is providing an intervention to the user. In this step, the fourth prompt generator 52d described above is configured to generate a fourth system prompt instructing the language model 21 to deliver the intervention to the user for example. The language model 21 generates one or more fourth system responses which form a dialogue with the user and include the intervention to be performed by the user. These system responses may be checked by the checking module 76 to determine if they are in line with clinical guidance corresponding to the intervention. This check may be triggered when the fourth prompt generator 52d is used. In this step, it is checked whether a given system response is in line with the provided steps for the intervention, for example whether it is closer to the step of the selected intervention being delivered than it is to other steps from different interventions and/or distractor text.


The purpose of this check is to determine that the interventions are delivered accurately and completely to users. This check assesses the output generated by the language model 21 and measures its proximity to the intended intervention instructions. This is done by transforming the system response into an embedding, which converts the text generated by the language model 21 into a numeric vector representation. The distance between this embedding and the embeddings of all the intervention steps is then measured. If this embedding is closest to the right step from the right intervention, the system response is safe and is presented to the patient. Otherwise, the flow module 58 passes control to the modifier module 65 to modify or change the dynamically determined system response. By implementing this safety check, the accuracy and completeness of interventions can be checked, thereby reducing the risk of errors and improving patient outcomes. The check is also able to dynamically adapt to new interventions.


For example, the checking module 76 may access one or more databases to retrieve stored segments (referred to as snippets) of text from the databases. These databases comprise clinical databases including segments of text corresponding to clinical guidance, and in particular corresponding to specific interventions. Additionally, in some examples, these databases comprise databases including segments of likely alternative texts (referred to as distractor outputs) e.g., outputs to user inputs that may be beyond the scope of the dialogue application 102 or outputs to user inputs that may be undesirable.


In some variations, the checking module 76 may perform similarity matching to determine whether the dynamically determined system response aligns with clinical guidance. The checking module 76 may first transform the dynamically determined system response to a vector representation using a pre-trained sentence embedder such as for example, a SentenceBERT based model, such as described in the paper “sentence-BERT”, Reimers & Gurevych, 2019, see arXiv:1908.10084 the entire contents of which are incorporated by reference herein. The vector representation is then compared against a vector representation of the segments of text from the database, and a similarity measure determined. The similarity measure may be a cosine similarity for example. In this example, if the closest segment is the right step from the right intervention, the system response is determined to be in line with clinical guidance. In some other examples, the checking module 76 may retrieve a pre-determined number of most similar snippets from the database. If the retrieved snippets include a pre-determined percentage of snippets that are related to the specified clinical guidance, in other words corresponding to the selected intervention, then the checking module 76 determines that the response is in line with clinical guidance. That is, if the number of snippets corresponding to the correct clinical guidance database are greater than a pre-determined percentage, then the checking module 76 may determine that the response is in line with clinical guidance.
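A minimal sketch of this similarity check, assuming the sentence-transformers library and an off-the-shelf SentenceBERT-style model (the model name and the snippet handling are illustrative assumptions), is shown below.

# Sketch of the clinical-guidance similarity check (model name and snippet
# handling are illustrative assumptions).
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def closest_snippet_index(system_response: str, snippets: list) -> int:
    # snippets: intervention-step texts plus distractor texts from the databases.
    response_embedding = embedder.encode(system_response, convert_to_tensor=True)
    snippet_embeddings = embedder.encode(snippets, convert_to_tensor=True)
    similarities = util.cos_sim(response_embedding, snippet_embeddings)[0]   # cosine similarity
    return int(similarities.argmax())   # index of the most similar stored segment

# The response is in line with clinical guidance if the closest snippet is the
# correct step of the selected intervention.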


The checking module 76 may therefore comprise a deep learning algorithm to evaluate whether the system response represented by the vector representation is aligned with clinical guidance. If it is determined that the response is in line with the correct clinical guidance, an output indicating this is passed to the flow module 58. If all other checks are also satisfied, the dynamically determined system response is output to the user, by way of output 101. If it is determined that the response is not in line with the correct clinical guidance, then the flow module 58 passes control to the modifier module 65 to modify or change the dynamically determined system response.


The output safety module system output taken from the language model 64 is parsed by the flow module 58. If any of the checks described above is triggered, for example the flags relating to undesirable information, harmful information, or medical advice is assigned “Yes”, then the flow module 58 does one of the following:

    • the flow module 58 may use the original system prompt, and pass to the language model 21 again. The language model 21 may receive as input the previous system prompt to generate another dynamically determined system response. This corresponds to running the language model call which triggered the safety check again, potentially multiple times in a loop until the safety check is not triggered. This relies on the non-deterministic nature of the language model 21, which means a different system response may be generated from the same system prompt.
    • the flow module 58 may pass control to the modifier module 65. The modifier module 65 may retrieve the previous system prompt used by the language model 21 to generate the dynamically determined response. The modifier module 65 may modify the previous system prompt to include additional context, by annotating why the previous system response was flagged by the output safety module for example. The modifier system prompt may be provided to the language model 21 in the second module 20 to generate another dynamically determined system response.
    • the flow module 58 may pass control to the modifier module 65 to generate a new, modifier, system prompt. The modifier system prompt may be provided to the language model 21 in the second module 20 to generate another dynamically determined system response. The modifier system prompt may include instructions to change the language, rewrite sentences, omit sensitive information, etc. The modifier system prompt may comprise all the information from the original prompt as well as the annotations annotating why the previous system response was flagged by the output safety module. The modifier system prompt further comprises more specific instructions as to how to re-write the system response to meet the criteria applied by the output safety module.


In some examples, which option is used depends on which check in the output safety module was triggered. For example for reading age, the previous system prompt information may not be as relevant, since reading age can be modified on a single utterance level. Thus a new modifier system prompt may be generated comprising the previous system response and an instruction to modify the reading age.


To generate the modifier system prompt, the modifier module 65 may select a stored template and may combine the stored template with: the previous system prompt provided to the language model 21, all of the previous user inputs and system outputs of the language model 21 (e.g., including a history of dialogue between the user and the dialogue application 102), and output of the language model 64 (e.g., including information that was flagged, criteria for which conditions were not met, etc.). This information may be included at one or more pre-defined points in the stored template, indicated by one or more identifiers in the stored template, to generate the prompt. For example, the checking module 76 outputs information indicating which criteria were not met (e.g. “Medical advice: True”). This information is then annotated in the {reason_for_revision} field in the system prompt. The modifier system prompt is provided as input to the language model 21. The prompt comprises a sequence of text. The prompt may be generated in a similar manner to the prompt generation process described previously in relation to FIGS. 2 and 3. An example template used to generate a modifier system prompt is shown below:


The language model 21 may modify the previous dynamically determined response or may generate a new dynamically determined response altogether. The modified or new dynamically determined response is evaluated by the output safety module 50 again for suitability.
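As an illustration only of the assembly described above, the following Python sketch shows how a stored template with identifier-marked fields might be combined with the dialogue context and the checker output. The template wording and all field names other than {reason_for_revision} are assumptions.

    MODIFIER_TEMPLATE = (
        "Conversation so far:\n{dialogue_history}\n"
        "Original instructions:\n{previous_system_prompt}\n"
        "Previous response:\n{previous_response}\n"
        "The previous response was flagged because: {reason_for_revision}\n"
        "Rewrite the response so that it no longer triggers these criteria."
    )

    def build_modifier_prompt(previous_system_prompt, dialogue_history, previous_response, failed_criteria):
        # Format the checker output (e.g. "Medical advice: True") and insert each piece
        # of context at the identifier-marked points in the stored template.
        reason = "; ".join(f"{name}: True" for name in failed_criteria)
        return MODIFIER_TEMPLATE.format(
            dialogue_history="\n".join(dialogue_history),
            previous_system_prompt=previous_system_prompt,
            previous_response=previous_response,
            reason_for_revision=reason,
        )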


If all of the flags relating to undesirable information, harmful information, and medical advice are assigned "No", then the flow module 58 may pass control to the output safety module 50 so that the output safety module 50 delivers the dynamically determined system response. The dynamically determined system response that is suitable for the user is output, by way of output 101, to the user.


In this example, the language model 64 and language model 21 are separate individual models. FIG. 16a shows a flow chart of a method for outputting dynamically determined system responses according to such an example. The method may be performed by the dialogue application 102 when executed on the dialogue system 100 as described above.


In S1601, the method includes providing a system prompt as input to the language model 21 included in the second module 20. In S1602, the method includes generating a dynamically determined system response based on the system prompt. As discussed above, before outputting the dynamically determined system response, the output safety module 50 evaluates the dynamically determined system response for suitability. Accordingly, in S1603, the method includes providing a safety system prompt to the language model 64 so as to evaluate the dynamically determined system response. The safety system prompt may be similar to the safety system prompt described in relation to FIG. 15. For example, the safety system prompt may include instructions to enable the language model 64 to identify undesirable information, harmful information, and medical advice in the dynamically determined system response. In S1604, the method includes evaluating the dynamically determined system response based on the instructions in the safety system prompt. The language model 64 may assign flag information to labels relating to undesirable information, harmful information, and medical advice based on the evaluation. In S1605, the method includes monitoring the flag information to determine whether the dynamically determined system response is suitable for output. If it is suitable, in S1606, the method includes outputting the dynamically determined system response. If it is not suitable, in S1607, the method includes generating a modifier system prompt and/or modifying the system prompt from S1601 at the modifier module 65. The modifier system prompt may be similar to the modifier system prompt described in relation to FIG. 15. The modifier system prompt may include instructions to enable the language model 21 in the second module 20 to either modify a portion of the dynamically determined system response or generate a new dynamically determined system response altogether. The modifier system prompt is provided as input to the language model 21 (e.g., in S1601). Additionally or alternatively, the system prompt modified in S1607 may be provided as input to the language model 21. The method is implemented again until a system response is output to the user (e.g., in S1606). In some variations, if the dynamically determined response is not suitable, instead of S1607, the method may include generating another dynamically determined response using the system prompt in S1601.
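For illustration only, the loop of FIG. 16a might be sketched as follows, with the generation model (language model 21) and the safety evaluation model (language model 64) represented as two separate callables; the function names and flag labels are assumptions.

    def respond_with_output_safety(system_prompt, safety_system_prompt,
                                   generation_model, safety_model, modifier_module):
        prompt = system_prompt
        while True:
            response = generation_model(prompt)                   # S1601/S1602: generate a response
            flags = safety_model(safety_system_prompt, response)  # S1603/S1604: evaluate it
            # S1605: monitor the flag information
            if not (flags["undesirable"] or flags["harmful"] or flags["medical_advice"]):
                return response                                   # S1606: output the response
            # S1607: build a modifier prompt (or modify the original prompt) and loop again
            prompt = modifier_module(prompt, response, flags)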


Alternatively, the language model 64 and the language model 21 may be the same model. FIG. 16b shows a flow chart of a method for outputting dynamically determined system responses according to such an example. The method may be performed by the dialogue application 102 when executed on the dialogue system 100 as described above.


In S1611, the method includes providing a system prompt as input to the language model (e.g., language model 21 and/or language model 64). In S1612, the method includes generating a dynamically determined system response based on the system prompt. In S1613, the method includes providing a safety system prompt to the language model so as to evaluate the dynamically determined system response. The safety system prompt may be similar to the safety system prompt described in relation to FIG. 15. For example, the safety system prompt may include instructions to enable the language model to identify undesirable information, harmful information, and medical advice in the dynamically determined system response. In S1614, the method includes evaluating the dynamically determined system response based on the instructions in the safety system prompt. The language model may assign flag information to labels relating to undesirable information, harmful information, and medical advice based on the evaluation. In S1615, the method includes monitoring the flag information to determine whether the dynamically determined system response is suitable for output. If it is suitable, in S1616, the method includes outputting the dynamically determined system response. In some variations, if the dynamically determined system response is not suitable, the method may include generating another dynamically determined system response using the system prompt in S1611. Additionally or alternatively, if it is not suitable, in S1617, the method includes, at the modifier module 65, generating a modifier system prompt and/or modifying the system prompt provided to the language model in S1611. The modifier system prompt may be similar to the modifier system prompt described in relation to FIG. 15. The modifier system prompt may include instructions to enable the language model to either modify a portion of the dynamically determined system response or generate a new dynamically determined system response altogether. The modifier system prompt is provided as input to the language model (e.g., in S1611). Additionally or alternatively, the system prompt modified in S1617 is provided to the language model to generate another dynamically determined system response. The method is implemented again until a system response is output to the user (e.g., in S1616).


In some examples, the modifier module 65 may include an optional language model 68. The optional language model 68 may be separate from language model 64 and from language model 21. The optional language model 68 may be executed when a dynamically determined system response is to be rewritten. That is, if not just a portion but the entire dynamically determined system response is to be changed, then the optional language model 68 may be executed. In such a scenario, the modifier module 65 may generate an editor system prompt. The editor system prompt may be provided to the optional language model 68 as input. The optional language model 68 may be trained to output a new dynamically determined system response based on the editor system prompt.
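A short sketch, under the same caveats as above, of how the modifier module 65 might route between a partial modification (via the language model 21 and a modifier system prompt) and a full rewrite (via the optional language model 68 and an editor system prompt); all function names are assumptions.

    def revise_response(flagged_response, flags, full_rewrite_needed,
                        build_modifier_prompt, build_editor_prompt,
                        language_model_21, language_model_68):
        if full_rewrite_needed:
            # The entire response is to be changed: generate an editor system prompt
            # and provide it to the optional, separate editor model.
            return language_model_68(build_editor_prompt(flagged_response, flags))
        # Only a portion needs changing: ask the main generation model to modify it
        # using a modifier system prompt.
        return language_model_21(build_modifier_prompt(flagged_response, flags))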


In some examples, the modifier module 65 may modify a portion of the dynamically determined system response or change the dynamically determined system response based on the detected reading age described previously. For example, the modifier module 65 may generate a modifier system prompt with instructions to enable a language model to change the dynamically determined system response based on a specified reading age or reading age range.


In some other examples, if the system response does not align with clinical guidance, the modifier module 65 may generate a modifier system prompt with instructions to enable a language model to change the dynamically determined system response to align with clinical guidance.


The output safety module 50 performs safety checks on all model outputs, and may re-sample a language model if, for example, content that could be seen as harmful or as giving medical advice is detected. This component allows the language model 21 to adhere to clear guardrails (e.g. avoiding offensive outputs and avoiding giving direct medical advice).


Conversations between a patient and a therapist are a key component of mental health treatment, and are one of the most challenging and complex conversations humans can engage in. The disclosed dialogue system provides an artificial intelligence solution to this task that uses a two-pronged approach to produce psychotherapy-like conversations with a patient. In an example, this system comprises (1) a first module comprising a theoretically informed and empirically evaluated model of psychotherapy captured through machine learning to guide patient insights and interventions and (2) a language model that produces human-like speech based on these insights. The dialogue system may be an integrated system to deliver digital mental health care, harnessing both mechanistic insights and recommendations and large language models. The mechanistic insights and recommendations serve to ground the system in existing or developing approaches in psychotherapy, to allow for longer term goal-directed conversations, and to provide additional explainability and interpretability. The large language model is used to provide natural, empathetic and personalised conversations with the patient.


Further embodiments are set out in the following clauses:


1. A dialogue system, comprising:

    • an input configured to obtain input data relating to speech or text provided by a user;
    • an output configured to provide output data relating to speech or text to a user;
    • one or more processors, configured to:
      • receive, by way of the input, input data relating to speech or text provided by a user;
      • provide the input data to a first module, the first module comprising a subject understanding module, wherein the subject understanding module comprises one or more subject understanding models, each of the one or more subject understanding models configured to take as input the input data and provide as output subject profile information;
      • generate a system input using the subject profile information;
      • provide the system input to a language model, the language model configured to provide a system response; and
      • output, by way of the output, the system response.


2. The dialogue system according to clause 1, wherein the system input is a system prompt, wherein the system prompt is provided as input to the language model to output a system response.


3. The dialogue system according to clause 1, wherein the system prompt is generated using the input data and the subject profile information.


4. The dialogue system according to clause 1, wherein the first module further comprises a subject recommendation model configured to take as input the subject profile information and provide as output a subject recommendation, wherein the system prompt is generated using the subject recommendation.


5. The dialogue system according to clause 1, wherein the language model is a general generative pre-trained transformer model.


6. The dialogue system according to clause 1, wherein the one or more subject understanding models comprises a trained cognitive understanding model configured to take as input the input data, determine one or more classifications, and provide the determined one or more classifications as output subject profile information, wherein the one or more classifications comprise at least one classification predicting a specific medical diagnosis.


7. The dialogue system according to clause 6, wherein the specific medical diagnosis comprises one or more cognitive distortions.


8. The dialogue system according to clause 7, wherein the specific medical diagnosis comprises any or any combination of: catastrophizing; dichotomous thinking; negative filtering; fortune telling; mind reading; and/or personalising.


9. The dialogue system according to clause 2, wherein the language model is configured to generate the subsequent words in a sequence of text beginning with the system prompt.


10. The dialogue system according to clause 1, wherein one or more of the one or more subject understanding models predicts information about the subject relating to an aspect of a clinical process.


11. The dialogue system according to clause 10, wherein one or more of the one or more subject understanding models is trained using data that is labelled with information relating to the corresponding aspect.


12. The dialogue system according to clause 1, wherein the language model is a trained model and one or more of the one or more subject understanding models is a trained model, wherein the one or more trained subject understanding models each has a smaller number of trained parameters than the language model.


13. The dialogue system according to clause 1, further comprising an output safety module, the output safety module configured to receive the system response from the language model and evaluate the system response before it is output.


14. The dialogue system according to clause 13, wherein the output safety module is further configured to prevent output of the system response depending on the output of the evaluation.


15. The dialogue system according to clause 1, further comprising a subject safety module, the subject safety module configured to receive the input data and evaluate the input data before a system response is output, wherein the subject safety module is further configured to prevent output of a system response depending on the output of the evaluation.


16. The dialogue system according to clause 1, wherein the one or more processors are further configured to output, by way of the output, a record comprising one or more system responses provided to the user and the subject profile information used as system input to generate each system response.


17. The dialogue system according to clause 1, wherein the one or more subject understanding models comprises a trained behavioural understanding model configured to take as input the input data, determine one or more activity classifications using a trained model, classify the activity classifications as user adaptive or user maladaptive behaviours and provide the determined one or more classifications as output subject profile information.


18. The dialogue system according to clause 1, wherein the one or more subject understanding models represent clinical frameworks.


19. A computer implemented dialogue method, comprising

    • receiving, by way of an input, input data relating to speech or text provided by a user;
    • providing the input data to a first module, the first module comprising a subject understanding module, wherein the subject understanding module comprises one or more subject understanding models, each of the one or more subject understanding models configured to take as input the input data and provide as output subject profile information;
    • generating a system input using the subject profile information;
    • providing the system input to a language model, the language model configured to provide a system response; and
    • outputting, by way of an output, the system response.


20. A non-transitory computer readable storage medium comprising computer readable code configured to cause a computer to perform the method of clause 19.


While certain examples have been described, these examples have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed the novel methods and apparatus described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of methods and apparatus described herein may be made.

Claims
  • 1. A dialogue system, comprising: an input configured to receive input data relating to speech or text provided by a user; an output configured to provide output data relating to speech or text to a user; and one or more processors, configured to: receive, by way of the input, input data relating to speech or text provided by a user; provide the input data to a subject safety module, the subject safety module configured to receive the input data and evaluate the input data before a system response is output, evaluating the input data comprising performing a first determination on the input data using a matching module and a second determination on the input data using a trained model, wherein the matching module performs the first determination to determine whether the input data matches one or more items from a pre-determined set of one or more items; generate a system response using a first process or using a second process, wherein the first process uses at least one trained language model to generate a dynamically determined system response and wherein the second process retrieves a pre-determined system response, wherein a selection between the first process and the second process is made based on the evaluation of the input data; output, by way of the output, the system response.
  • 2. The dialogue system according to claim 1, wherein responsive to the system response being generated by the second process, the one or more processors are further configured to provide a function to the user to contact a third party.
  • 3. The dialogue system according to claim 1, wherein responsive to the system response being generated by the second process, the one or more processors are further configured to transmit information comprising the input data to a second user.
  • 4. The dialogue system according to claim 1, wherein the pre-determined system response is retrieved based on a rule based dialogue flow.
  • 5. The dialogue system according to claim 1, wherein the dynamically determined system response is generated by generating a system prompt comprising the input data and providing the system prompt to the at least one trained language model.
  • 6. The dialogue system according to claim 1, wherein the subject safety module is configured to generate a first output based on the evaluation of the input data, wherein the one or more processors are configured to select the second process if the first output includes an indication that the user is in crisis, and wherein the one or more processors are configured to select the first process if the first output does not include an indication that the user is in crisis.
  • 7. The dialogue system according to claim 1, wherein the trained model comprises a language model and the second determination comprises generating a system prompt including the input data and instructions to evaluate the input data, and providing the system prompt to the language model.
  • 8. A dialogue system, comprising: an input configured to obtain input data relating to speech or text provided by a user; an output configured to provide output data relating to speech or text to a user; one or more processors, configured to: receive, by way of the input, input data relating to speech or text provided by a user; provide the input data to a first module, the first module comprising a subject understanding module, wherein the subject understanding module comprises one or more subject understanding models, each of the one or more subject understanding models configured to take as input the input data and provide as output subject profile information; generate a system input using the subject profile information; provide the system input to a language model, the language model configured to provide a system response; and output, by way of the output, the system response, wherein the one or more processors are further configured to provide the input data to a subject safety module and evaluate the input data before a system response is output, wherein the subject safety module is further configured to prevent output of the system response depending on the output of the evaluation.
  • 9. A dialogue method, comprising: receiving, by way of the input, input data relating to speech or text provided by a user; providing the input data to a subject safety module, the subject safety module configured to receive the input data and evaluate the input data before a system response is output, evaluating the input data comprising performing a first determination on the input data using a matching module and a second determination on the input data using a trained model, wherein the matching module performs the first determination to determine whether the input data matches one or more items from a pre-determined set of one or more items; generating a system response using a first process or using a second process, wherein the first process uses at least one trained language model to generate a dynamically determined system response and wherein the second process retrieves a pre-determined system response, wherein a selection between the first process and the second process is made based on the evaluation of the input data; outputting, by way of the output, the system response.
  • 10. A dialogue system, comprising: an input configured to obtain input data relating to speech or text provided by a user; an output configured to provide output data relating to speech or text to a user; and one or more processors, configured to: receive, by way of the input, input data relating to speech or text provided by a user; generate a first system response from the input data using a first trained model; provide the first system response to an output safety module, wherein the output safety module is configured to evaluate the first system response and provide an evaluation result; responsive to a first evaluation result, output, by way of the output, the first system response; responsive to a second evaluation result, generate a second system response; output, by way of the output, the second system response.
  • 11. The dialogue system according to claim 10, wherein generating the first system response comprises generating a first system prompt comprising the input data and providing the system prompt to the first trained model, wherein the first trained model comprises a first language model and wherein generating the second system response comprises generating a second system prompt, the second system prompt comprising the second evaluation result, and providing the second system prompt to the first trained model.
  • 12. The dialogue system according to claim 10, wherein generating the first system response comprises generating a first system prompt comprising the input data and providing the system prompt to the first trained model, wherein the first trained model comprises a first language model and wherein generating the second system response comprises generating a second system prompt, the second system prompt comprising the second evaluation result, and providing the second system prompt to a second trained model, wherein the second trained model comprises a second language model.
  • 13. The dialogue system according to claim 10, wherein the second evaluation result includes an indication that the dynamically determined system response includes information unsuitable for the user.
  • 14. The dialogue system of claim 10, wherein the second evaluation result includes an indication that the reading age of the first system response is outside a pre-determined range.
  • 15. The dialogue system of claim 10, wherein the output safety module is further configured to: compare the first system response to one or more stored text sequences and determine the evaluation result based on the comparison, wherein the one or more stored text sequences comprise one or more stored text sequences corresponding to one or more interventions.
  • 16. The dialogue system of claim 15, wherein the one or more processors are further configured to: receive, at a first module, structured information comprising information relating to a clinical state of the user, the structured information being generated from the input data, the first module comprising a subject understanding module and a subject recommendation module, wherein the subject understanding module comprises one or more subject understanding models, each of the one or more subject understanding models configured to take as input the structured information and provide as output subject profile information; generate, at the subject understanding module, subject profile information based on the structured information; and determine a subject recommendation comprising an intervention for the user, determining the subject recommendation comprising providing the subject profile information as input to the subject recommendation module; wherein a stored text sequence corresponds to the intervention.
  • 17. A dialogue system, comprising: an input configured to obtain input data relating to speech or text provided by a user; an output configured to provide output data relating to speech or text to a user; one or more processors, configured to: receive, by way of the input, input data relating to speech or text provided by a user; provide the input data to a first module, the first module comprising a subject understanding module, wherein the subject understanding module comprises one or more subject understanding models, each of the one or more subject understanding models configured to take as input the input data and provide as output subject profile information; generate a system input using the subject profile information; provide the system input to a language model, the language model configured to provide a system response; and output, by way of the output, the system response, wherein the one or more processors are further configured to provide the system response to an output safety module, wherein the output safety module is configured to receive the system response from the language model and evaluate the system response before it is output.
  • 18. A dialogue method, comprising: receiving, by way of the input, input data relating to speech or text provided by a user; generating a first system response from the input data using a first trained model; providing the first system response to an output safety module, wherein the output safety module is configured to evaluate the first system response and provide an evaluation result; responsive to a first evaluation result, outputting, by way of the output, the first system response; responsive to a second evaluation result, generating a second system response; outputting, by way of the output, the second system response.
  • 19. A non-transitory computer readable storage medium comprising computer readable code configured to cause a computer to perform the method of claim 9.
  • 20. A non-transitory computer readable storage medium comprising computer readable code configured to cause a computer to perform the method of claim 18.
Priority Claims (1)
Number Date Country Kind
2308287.8 Jun 2023 GB national