Embodiments described herein relate to methods and systems for generating a user profile. The user profile may be in a diagnostic system.
A diagnostic system may include a knowledge base including medical concepts, a statistical inference engine, and a chatbot for interfacing with a user in order to diagnose the user's condition using the medical concepts from the knowledge base. The chatbot may generate one or more concepts from the consultation. The concepts may be encoded using, for example, XML and sent to a database for storage. Upon request from a medical practitioner, the concepts may be retrieved from the database to build a user profile to analyse the clinical history of a patient after a number of consultations.
The present disclosure is best described with reference to the accompanying figures, in which:
Embodiments of the present disclosure relate to a computer-implemented method of building a user profile for a medical diagnostic system. The method comprises: receiving a new event including data describing a chatbot consultation with the user; encoding the new event using JSON; storing the encoded new event in a queue of events; decoding and translating the new event into a form compatible with the user profile; and adding the translated new event to the user profile.
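The claimed method steps can be illustrated with a minimal end-to-end sketch. All names here are illustrative assumptions, not the diagnostic system's actual API; the "translation" step is stubbed.

```python
import json
from collections import deque

# Hypothetical sketch of the claimed steps; names are illustrative only.
event_queue = deque()

def receive_and_store(event: dict) -> None:
    """Receive a new event, encode it using JSON, store it in the queue."""
    event_queue.append(json.dumps(event))

def build_profile(profile: list) -> list:
    """Decode each queued event, translate it, add it to the user profile."""
    while event_queue:
        decoded = json.loads(event_queue.popleft())
        # "Translation" is stubbed: keep only the fields the profile needs.
        profile.append({"concepts": decoded.get("concepts", [])})
    return profile

profile: list = []
receive_and_store({"event_id": "e1", "concepts": ["Pain"]})
build_profile(profile)
```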
It is an object of the present disclosure to improve on the prior art. In particular, the present disclosure addresses a technical problem tied to computer technology and arising in the realm of computer networks, namely the technical problem of bandwidth usage and processing speed. The disclosed system solves this technical problem using a technical solution, namely by encoding events from a chatbot consultation using JSON and storing each encoded event in a queue for subsequent retrieval, decoding and translation into a user profile. JSON provides a standardised format for concept encoding that requires reduced bandwidth for transmission, and using the queue allows the user profile to be built up incrementally, saving processing each time the user profile is generated.
With reference to
The mobile phone 3 will communicate with interface 5. Interface 5 has two primary functions, the first function 7 is to take the words uttered by the user and turn them into a form that can be understood by the inference engine 11. The second function 9 is to take the output of the inference engine 11 and to send this back to the user's mobile phone 3.
In some embodiments, Natural Language Processing (NLP) is used in the interface 5. NLP is one of the tools used to interpret, understand, and then use everyday human language and language patterns. It breaks both speech and text down into shorter components and interprets these more manageable blocks to understand what each individual component means and how it contributes to the overall meaning, linking the occurrence of medical terms to the knowledge base. Through NLP it is possible to transcribe consultations, summarise clinical records and chat with users in a more natural, human way.
However, simply understanding how users express their symptoms and risk factors is not enough to identify and reason about the underlying set of diseases. For this, the inference engine 11 is used. The inference engine 11 is a powerful set of machine learning systems, capable of reasoning over hundreds of billions of combinations of symptoms, diseases and risk factors per second to suggest possible underlying conditions. The inference engine 11 can provide reasoning efficiently, at scale, to bring healthcare to millions.
In an embodiment, a knowledge base 13 is a large structured set of data defining a medical knowledge base. The knowledge base 13 describes an ontology, which in this case relates to the medical field. It captures human knowledge of modern medicine encoded for machines. This is used to allow the above components to speak to each other. The knowledge base 13 keeps track of the meaning behind medical terminology across different medical systems and different languages. In particular, the knowledge base 13 includes data patterns describing a plurality of semantic triples, each including a medical related subject, a medical related object, and a relation linking the subject and the object. An example use of the knowledge base would be in automatic diagnostics, where the user 1, via mobile device 3, inputs symptoms they are currently experiencing, and the inference engine 11 identifies possible causes of the symptoms using the semantic triples from the knowledge base 13. The subject-matter of this disclosure relates to creating and enhancing the knowledge base 13 based on information described in unstructured text.
A user graph 15 is also provided and linked to the knowledge base as discussed in more detail below.
With reference to
With reference to
At step 102, an event is generated to describe the consultation. The event may include one or more concepts describing symptoms as well as the diagnoses (as described in more detail below). The event may also include event information such as an event identifier (ID), a user ID, a time stamp at which the consultation took place, and a location of the consultation. The user ID may be determined from the IP address of the user device. The time stamp may be obtained from the clock of the user device. The location of the consultation may be obtained from a positioning system of the user device, e.g. a global positioning system (GPS). The event information may be included in the form of metadata.
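An event of this kind can be sketched as follows; the field names and values are assumptions made for this illustration, not the system's actual schema.

```python
import json
from datetime import datetime, timezone

# Illustrative event record (step 102); all field names are assumptions.
event = {
    "event_id": "evt-0001",
    "user_id": "user-42",                          # e.g. derived from the device IP address
    "timestamp": datetime(2020, 1, 1, tzinfo=timezone.utc).isoformat(),
    "location": {"lat": 51.5074, "lon": -0.1278},  # e.g. from the device GPS
    "concepts": ["Pain", "Acute", "LeftLowerLimb"],
}

encoded = json.dumps(event)  # step 104: encode the event using JSON
```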
At step 104, the event is encoded as described below.
During the chatbot consultation, the patient may input “I have an acute pain in my left leg”, which contains the complex medical notion “Acute pain in left leg” that needs to be identified and encoded in a formal way using concepts from the medical knowledge base 13 (
Pain ⊓ ∃hasQualifier.Acute ⊓ ∃findingSite.LeftLowerLimb
where Pain, Acute, and LeftLowerLimb are concepts from the knowledge base 13 and hasQualifier and findingSite are properties (binary relations).
The concept above is written in (abstract) description logic (DL) syntax and in order to be transmitted between different computer systems, or even saved in a data store, it needs to be serialised into a machine readable format.
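As a minimal sketch of such serialisation (the structure below is a hypothetical assumption; the disclosure's own BNF-based JSON format is defined later), the concept above could be serialised and round-tripped as follows:

```python
import json

# Hypothetical JSON serialisation of the DL concept
#   Pain ⊓ ∃hasQualifier.Acute ⊓ ∃findingSite.LeftLowerLimb
# The structure is illustrative only, not the disclosure's actual format.
concept = {
    "base": "Pain",
    "attributes": [
        {"property": "hasQualifier", "filler": "Acute"},
        {"property": "findingSite", "filler": "LeftLowerLimb"},
    ],
}

serialised = json.dumps(concept)    # transmit or store this string
roundtrip = json.loads(serialised)  # deserialise at the receiving service
```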
Various different services within the diagnostic system may need to exchange concepts between them. For example, an engine for triaging may need to ask the user further questions about their reported symptoms in order to proceed with the symptom checking process, or to retrieve some information from the user graph 15. The answers to questions also represent complex medical notions. For example, for a user reporting an injury to their hand, the symptom checking system may need to ask the following question:
The user friendly text above is associated with a concept that captures its meaning. Answers are captured through a (complex) medical concept which is built using other concepts, where each concept is described in association with an internationalized resource identifier (IRI) from the medical knowledge base 13. For example, answer 1 corresponds to the complex concept
whereas answer 2 corresponds to complex concept
These complex concepts need to be transmitted to the user together with the respective user friendly text to be rendered. In addition, the final diagnosis for a patient, the reported symptoms and any other related conditions, which are again represented as complex medical concepts, need to be stored in the patient's profile. Besides the aforementioned ones, several other services within the diagnostic system generate, store or exchange complex medical knowledge for users/patients.
Thus, it becomes apparent that serialising and transmitting concepts between these services is of paramount importance in order for the services to intercommunicate, coordinate, and interoperate. The format needs to be simple, compact, easy to serialise/deserialise, and transmit over the network, as well as comprehensible by software engineers and medical personnel that are developing the services.
An encoding format only specifies some general rules for creating concepts. The freedom of the encoding rules makes it possible for services (and indeed even humans) to create erroneous concepts, or simply concepts of low quality. For example, the following concept is of low quality in the sense that it is an empty concept (it does not represent anything in the real world).
This is the case because the two concepts Wound and Bleeding are of different semantic types (they are not like for like) and hence their intersection (conjunction) is empty. Although this example is quite obvious, there are more involved examples such as
In order to achieve high levels of interoperability and quality, and to reduce the number of empty concepts, an additional set of constraints is required that can be used to eliminate or prevent the generation of such concepts. In order to do this, the diagnostic system uses a JavaScript Object Notation (JSON)-based format for serialising and exchanging concepts. The use of JSON makes the format easy to process and exchange, as JSON is one of the most popular formats for exchanging data between web services.
Concepts used by the diagnostic system may be defined by the following Backus-Naur Form (BNF) syntax:
A modified concept can be encoded in JSON as follows. For example, for the concept “a bleeding wound”, the encoded form in JSON would be:
The JSON format is a syntax and does not provide formal semantics for its constructs, structural restrictions on concepts, or deeper insight into the complexity or properties of the concepts that can be constructed using it. To do so, it is useful to map this syntax to a formal language such as that of Description Logic. Table 1 below presents a mapping from the JSON constructs defined above to the corresponding DL notation.
The above translation can produce concepts of the form E ⊓ ∄R.D. Semantically, these concepts imply concepts of the form E ⊓ ¬∃R.D. Summarising, the complex concepts constructed using the JSON syntax presented above roughly correspond to DL concepts constructed using the following syntax.
U is an operator called “unknown”. It is a non-standard operator in Description Logic but its semantics can be given using some 3-valued logic where UC obtains the truth value of 0.5.
Example semantics for unknown concepts are the following:
which follows Lukasiewicz logic in the sense that the proposition “unknown implies unknown” is true. Hence, if KB ⊨ B ⊑ A and A is unknown, then B is implied to be unknown. In contrast, in Kleene logic (min-max logic), “unknown implies unknown” is unknown and “false implies unknown” is true.
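The two implication semantics discussed above can be sketched with truth values 1 (true), 0.5 (unknown) and 0 (false); the function names are illustrative:

```python
# Three-valued implications: Lukasiewicz vs Kleene (min-max) logic.
# Truth values: 0 = false, 0.5 = unknown, 1 = true.

def lukasiewicz_implies(a: float, b: float) -> float:
    return min(1.0, 1.0 - a + b)

def kleene_implies(a: float, b: float) -> float:
    return max(1.0 - a, b)

# "unknown implies unknown" is true under Lukasiewicz logic...
assert lukasiewicz_implies(0.5, 0.5) == 1.0
# ...but only unknown under Kleene logic,
assert kleene_implies(0.5, 0.5) == 0.5
# while "false implies unknown" is true in both.
assert lukasiewicz_implies(0.0, 0.5) == 1.0
assert kleene_implies(0.0, 0.5) == 1.0
```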
Consequently, with the Lukasiewicz logic-based semantics, if some concept is set to “unknown” then all sub-concepts of it in the Knowledge Base are implied to be “unknown” as well.
The mapping to Description Logics presented above can help us develop a set of constraints that can be used for ensuring the quality and coherency of complex concepts created using the JSON format. In the following we present a definition of the current constraints, implemented as a validation service for complex concepts. Some of these constraints are implemented with the help of the Knowledge Base and an upper-level model encoded in the Knowledge Base. This upper-level model describes constraints on the acceptable models.
Definition. Let K be some KB and let δ be mappings from properties in a KB to their domain. Similarly, let ρ be mappings from properties in a KB to their range. Let also sty be a set of concepts from K called semantic types. A complex concept is well-formed if the following conditions hold:
At step 106, the concepts may be filtered according to the criteria numbered 1-6 above. In this way, only concepts that fulfil the above criteria, and are therefore of sufficient quality, are encoded. This reduces the overall number of concepts being encoded and hence the processing burden. Step 106 is shown as a broken line as it is optional.
At step 108, the encoded event is stored in a queue of events. The other events in the queue may include other events that have previously been obtained from the chatbot for the same user. The queue of events is stored as electronic data in the memory 24 (
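Step 108 can be sketched as a per-user queue keyed by user ID; a dict stands in for the persistent store (memory 24), and the field names are assumptions for this illustration.

```python
import json
from collections import defaultdict

# Sketch of step 108: append each JSON-encoded event to a per-user queue.
queues: dict = defaultdict(list)

def enqueue(encoded_event: str) -> None:
    user_id = json.loads(encoded_event)["user_id"]
    queues[user_id].append(encoded_event)

enqueue(json.dumps({"user_id": "user-42", "event_id": "e1"}))
enqueue(json.dumps({"user_id": "user-42", "event_id": "e2"}))
```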
Once a queue of events is available for a user, a user profile can be built as a projection by a projector. The user profile can take the form of a user graph or a table of information specific to the patient.
With reference to
At step 152, the queue is checked to determine if a new event has been recorded since the previous iteration of the user profile.
If there has been a new event recorded since the previous iteration of the user profile the new event is retrieved from the queue at step 154.
At step 156, the event is decoded from the format used to store the event in the queue. In particular, the event is decoded from the JSON format. The decoded event is translated into a form used for the user profile. Where the user profile is a user graph, the event is translated into a set of nodes and edges linking the nodes, as described below in relation to
At step 158, the latest version/iteration of the user profile is retrieved from the memory 24 (
In the event that there is no new event in the queue, the latest iteration of the user profile is retrieved from the memory 24 (
At step 156a, the event is decoded and translated into an interim graph. The interim graph includes a plurality of nodes and edges linking the nodes. The nodes represent information from the event. For instance, one node corresponds to an event identifier (ID), one node may correspond to a concept derived from the chatbot consultation (e.g. the concept may define the diagnosis), one node may correspond to a time stamp associated with the consultation, one node may correspond to a location of the user during the consultation, and one node may correspond to an identifier of the user.
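The translation at step 156a can be sketched as follows; nodes are (label, value) tuples and edges are node pairs, and the field names are illustrative assumptions.

```python
# Sketch of step 156a: translate a decoded event into an interim graph.
def event_to_interim_graph(event: dict):
    event_node = ("event_id", event["event_id"])
    nodes = [event_node]
    edges = []
    for key in ("concept", "timestamp", "location", "user_id"):
        if key in event:
            node = (key, event[key])
            nodes.append(node)
            edges.append((event_node, node))  # link each item to the event node
    return nodes, edges

nodes, edges = event_to_interim_graph({
    "event_id": "e1",
    "concept": "Dementia",
    "timestamp": "2020-01-01T00:00:00Z",
    "user_id": "user-42",
})
```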
At step 158a, the previous version of the user graph is retrieved from the memory 24 (
As shown in
The other information of an event shown as nodes 52 in
With further reference to
At step 161a, the nodes of the interim graph and the knowledge graph corresponding to the matched concepts are linked using an edge. In this way, the newly added interim graph is joined to the knowledge graph and so is integrated within the user graph.
At step 162a, the user profile is finalised. The user profile may be transmitted to the memory 24 (
With reference to
Once the user profile is available in either form (e.g. user graph or table), a user (e.g. a medical professional) can request analytics.
With reference to
A query is generated by the inference engine 11 (
Once the user history has been compiled, the concept in the query may be used to filter the extracted information from the user history. For instance, where the concept relates to dementia, and where all of the users have been included in the query, several users may have no history of dementia. Accordingly, at step 208, the filtering will return to the user interface only information relating to users whose user history includes dementia.
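The filtering at step 208 can be sketched as follows; the histories shown are illustrative assumptions.

```python
# Sketch of step 208: given compiled user histories, return only the
# users whose history includes the queried concept.
histories = {
    "user-1": ["Dementia", "Hypertension"],
    "user-2": ["Influenza"],
    "user-3": ["SenileDementia", "Dementia"],
}

def filter_by_concept(histories: dict, concept: str) -> dict:
    return {uid: h for uid, h in histories.items() if concept in h}

matching = filter_by_concept(histories, "Dementia")
```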
In this way, the requesting user can obtain analytics relating to a particular condition, for a particular patient or group of patients. Such knowledge may be used to generate warnings relating to outbreaks of certain conditions, for example, a new strain of flu for users in a particular area. For example, the query may include a reference to a geographical region, and cover all users within that region, together with the condition or symptoms. In this way, the user nodes will be identified by identifying all user nodes linked to a node 52 (
When obtaining the concepts at step 206, the projector extracts the IRI of each identified concept. For instance, for the following event:
the extracted IRIs are 266713003 and 372665008, which are value IRIs and are added to indexed fields in the projection. In terms of the actual implementation, they become elements in a list stored in the field entitled “all_iris”.
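The extraction into the "all_iris" field can be sketched as a recursive walk over the encoded event; the event structure shown is an assumption for this illustration.

```python
# Sketch of IRI extraction for the projection: recursively walk an
# encoded event and collect every value IRI into the indexed
# "all_iris" field. The event structure below is illustrative.
def extract_iris(node, iris=None):
    if iris is None:
        iris = []
    if isinstance(node, dict):
        if "iri" in node:
            iris.append(node["iri"])
        for value in node.values():
            extract_iris(value, iris)
    elif isinstance(node, list):
        for item in node:
            extract_iris(item, iris)
    return iris

event = {"concepts": [{"iri": "266713003"},
                      {"attribute": {"iri": "372665008"}}]}
projection = {"all_iris": extract_iris(event)}
```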
As a counter example, consider an architecture that uses event sourcing but does not encode the events using JSON as described above. Not having this common JSON structure enforced in events would mean that, for each source, it would be necessary to implement some ad-hoc logic in the projector (or some stream transformation) to extract the IRIs. For example, drug reports could be received as:
and maybe medical conditions could be reported by another system as
The projection would then have to be aware of the different structure of events and parse them differently based on the source. In this way, by encoding the events using JSON and storing each new event in a queue (event sourcing), it is possible to construct the user profile more efficiently.
The process outlined in
An alternative to this would be to start from a medical concept, e.g. dementia, and explore the knowledge base searching for all the possible risk factors (e.g. being a smoker) and the subtypes of dementia (e.g. senile dementia). This query is not particularly complex (it is linear in the size of the graph, or O(b^d), where b is the maximum branching factor and d is the maximum depth). This would return a set of concepts C to capture in the user history. Then it would be necessary to query the clinical history table (
If event sourcing were not being used, this would be more complicated and would involve additional network calls, since it would be necessary to query multiple databases to aggregate all the events at query time.
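The knowledge-base exploration described above can be sketched as a breadth-first search; the toy edges are illustrative assumptions.

```python
from collections import deque

# Sketch of exploring the knowledge base from a starting concept
# (e.g. dementia) to collect subtypes and risk factors. Complexity is
# O(b^d) for branching factor b and depth d. Edges are illustrative.
knowledge_base = {
    "Dementia": ["SenileDementia", "Smoker"],  # subtype, risk factor
    "SenileDementia": [],
    "Smoker": [],
}

def related_concepts(start: str) -> set:
    seen, frontier = {start}, deque([start])
    while frontier:
        concept = frontier.popleft()
        for neighbour in knowledge_base.get(concept, []):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append(neighbour)
    return seen

C = related_concepts("Dementia")  # concepts to look up in the history table
```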
With reference to
At step 254, the interface engine 11 (
The process outlined in
Using the information for the patient over one week, various analytics can be implemented. For instance, it is possible to construct a co-occurrence matrix.
For instance, it is possible to use a “map-reduce” function to aggregate concepts by time bucket and patient. For example, all of the concepts related to the patient X, for week W, are saved into a bucket B=&lt;X,W&gt;. The bucket, B, may be stored as electronic data in the memory 24 (
The output may be:
or normalised as:
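The map-reduce aggregation and co-occurrence counting described above can be sketched as follows; the event fields and raw (un-normalised) counts are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations

# Map step: place each event's concepts into a bucket B = <patient, week>.
events = [
    {"patient": "X", "week": 1, "concepts": ["Cough", "Fever"]},
    {"patient": "X", "week": 1, "concepts": ["Fever", "Headache"]},
]

buckets = defaultdict(list)
for e in events:
    buckets[(e["patient"], e["week"])].extend(e["concepts"])

# Reduce step: count concept pairs co-occurring within the same bucket.
cooccurrence = defaultdict(int)
for concepts in buckets.values():
    for a, b in combinations(sorted(set(concepts)), 2):
        cooccurrence[(a, b)] += 1
```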
Features of some embodiments are set out in the following clauses.