Every day, hospitals across the globe create a tremendous amount of clinical data. Analysis of this data is critical to gaining detailed insights into healthcare delivery and quality of care, and provides a basis for improving personalized healthcare. Unfortunately, a large proportion of recorded data is difficult to access and analyze because most data are captured in an unstructured form. Unstructured data may include, for example, healthcare provider notes, imaging or pathology reports, or any other data that are neither associated with a structured data model nor organized in a pre-defined manner to define the context and/or meaning of the data. Structured data may include data that are mapped to certain fields, codes, etc. that define the context and/or meaning of the mapped data, such that the meaning/context of the data can be determined based on the mapping.
Hospitals, as well as other health care providers, try to address this limitation by using a combination of automated or semi-automated and manual processes as part of human-based abstraction to abstract unstructured data into structured data that can be readily interpreted based on the mapping. As part of an abstraction process, abstractors read various documents including unstructured data across a number of formats documenting the clinical encounter (typically electronic health records, pathology reports, imaging reports, and laboratory reports), interpret these documents, and structure pertinent information into structured patient data records, such as a cancer registry. As used herein, a cancer registry can include an information system designed for the collection, management, and analysis of data on persons with the diagnosis of a malignant or neoplastic disease, such as cancer. The data stored in a cancer registry can be useful for many applications, such as performing quality of care analysis, cancer research, etc. But the process to manually extract and/or abstract such information into structured medical data records is laborious, slow, costly, and error-prone.
Disclosed herein are techniques for a workflow to convert unstructured patient data into structured patient data records, such as a cancer registry, for a medical application. The medical application may include, for example, a quality of care evaluation tool to evaluate a quality of care administered to a patient, a medical research tool to determine a correlation between various information of the patient (e.g., demographic information) and tumor information (e.g., prognosis or expected survival) of the patient, etc. The techniques can also be applied to other registries, applications, etc. (e.g., an oncology workflow), and in other types of disease areas.
In some embodiments, the techniques include receiving or retrieving patient data of a patient. The patient data can originate from various primary sources (at one or more healthcare institutions) including, for example, an EMR (electronic medical record) system, a PACS (picture archiving and communication system), a Digital Pathology (DP) system, a LIS (laboratory information system) including genomic data, RIS (radiology information system), patient reported outcomes, wearable and/or digital technologies, social media etc. The patient data can include raw structured and unstructured patient data from the primary sources, as well as processed data (e.g. ingested, normalized, tagged, etc.) derived from the raw patient data.
The techniques may further include, as part of a workflow, processing the patient data using a learning system with an Artificial Intelligence (AI)-assisted clinical extraction tool. The learning system can include, for example, a rule-based extraction system, a machine learning (ML) model (which may include a deep learning neural network or other machine learning models), a natural language processor (NLP), etc., which can extract data elements from the unstructured patient data, classify (e.g., as part of a normalization process) the data elements, and map the data elements to pre-defined data representations (e.g., codes, fields, etc.) to form structured data based on the classification. A data representation may include data that is formatted/translated to a certain standard/protocol such that the data representation can be readily mapped to various data fields of a registry (e.g., a cancer registry). Moreover, as part of the normalization process, the learning system can also detect and correct data errors. The techniques can further include creating/updating a structured medical record, such as a cancer registry, based on the mapping of the data elements, and providing the structured medical record to a medical application for additional processing. The structured medical record can also be provided to other organizations to update other databases containing structured medical records, such as state cancer registries.
As part of the workflow, the AI-assisted clinical extraction tool can be continuously adapted based on new patient data. For example, some of the raw unstructured patient data from the primary sources can be post-processed (e.g., tagged) to indicate mappings of certain data elements as ground truth. The tagged unstructured patient data can be used to train the ML model and the NLP to perform the extraction, classification, and mapping. Moreover, rules of the rule-based extraction system can also be adapted based on the processed patient data to improve the error detection and correction processing. At least some of the tagging operations can be performed by abstractors to train the AI-assisted clinical extraction tool. The AI-assisted clinical extraction tool can then automatically perform the extraction, classification, mapping and correction on other patient data.
These and other embodiments of the invention are described in detail below. For example, other embodiments are directed to systems, devices, and computer readable media associated with methods described herein.
A better understanding of the nature and advantages of embodiments of the present invention may be gained with reference to the following detailed description and the accompanying drawings.
The detailed description is set forth with reference to the accompanying figures.
Disclosed herein are techniques for automated extraction of information into a structured patient data record, such as a cancer registry, based on learning system(s) with AI-assisted clinical abstraction and data normalization operations, and providing the structured patient data record to a medical application. The medical application may include, for example, a quality of care evaluation tool to evaluate a quality of care administered to a patient, a medical research tool to determine a correlation between various information of the patient (e.g., demographic information) and tumor information (e.g., prognosis results) of the patient, etc. The techniques can also be applied to other registries, applications, etc. (e.g., an oncology workflow), and in other types of disease areas.
More specifically, patient data of a patient can be received or retrieved from multiple sources. The patient data can originate from various primary sources (at one or more healthcare institutions) including, for example, an EMR (electronic medical record) system, a PACS (picture archiving and communication system), a Digital Pathology (DP) system, a LIS (laboratory information system) including genomic data, RIS (radiology information system), patient reported outcomes, wearable and/or digital technologies, social media etc. The patient data can include raw structured and unstructured patient data from the primary sources, as well as processed data (e.g. ingested, normalized, tagged, etc.) derived from the raw patient data.
As part of a workflow, the patient data can be processed using a learning system with Artificial Intelligence (AI)-assisted clinical extraction tool. The learning system can include, for example, a rule-based extraction system, a machine learning (ML) model (which may include a deep learning neural network or other machine learning models), a natural language processor (NLP), etc., which can extract data elements from the unstructured patient data, classify the data elements, and map the data elements to pre-defined data representations (e.g., codes, fields, etc.) to form structured data. Data errors can also be detected and corrected. Examples of the unstructured patient data can include, for example, pathological report, doctor's notes, etc. The pre-defined data representations can include, for example, International Classification of Diseases (ICD), Systematized Nomenclature of Medicine (SNOMED), indications representing biographical information of the patient (e.g., identification, age, sex, etc.), indications representing medical history of the patient (e.g., tumor information, biomarker, history of treatments received, adverse events after the treatments, etc.), etc. Some of the received/retrieved patient data can also include structured data elements in these pre-defined data representations.
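As a minimal sketch of the rule-based portion of such a learning system, the following Python example extracts two data elements from a free-text note with regular expressions. The rule table, field names, and sample note are illustrative assumptions and are not taken from the disclosure.

```python
import re

# Hypothetical rules: each maps a field name to a regex with one capture group.
RULES = {
    "age": re.compile(r"\b(\d{1,3})[- ]year[- ]old\b", re.IGNORECASE),
    "sex": re.compile(r"\b(male|female)\b", re.IGNORECASE),
}

def extract_elements(text):
    """Return {field: extracted string} for every rule that matches the text."""
    found = {}
    for field, pattern in RULES.items():
        match = pattern.search(text)
        if match:
            found[field] = match.group(1).lower()
    return found

# Illustrative note, not real patient data.
note = "Patient is a 62-year-old female presenting with a breast mass."
elements = extract_elements(note)
```

In a fuller system, the extracted strings would then be classified and mapped to pre-defined data representations such as ICD or SNOMED codes.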
A structured patient data record can be updated/created based on the pre-defined data representations. For example, a cancer registry can include a structured data record of the patient including entries corresponding to, for example, medical history of the patient, biographical information of the patient, etc. The pre-defined data representations (e.g., ontology representations such as ICD and SNOMED, biographical information, etc.) extracted and mapped from the unstructured patient data, as well as those obtained from the structured patient data, can be used to automatically populate corresponding entries of the data record in the cancer registry. In some embodiments, the pre-defined data representations can also be provided to an abstractor as suggestions to assist the abstractor in populating the entries of the data record.
Moreover, as part of the workflow, the AI-assisted clinical extraction tool can be continuously adapted to new patient data to improve the mapping and normalization processes. For example, some of the original unstructured patient data from the primary sources can be tagged to indicate mappings of certain data elements as ground truth. For example, a sequence of texts in doctor's notes can be tagged as a ground truth indication of an adverse effect of a treatment. The tagging can indicate, for example, a particular data category for a text string. The tagged doctor's notes can be used to train, for example, an NLP of the AI-assisted clinical extraction tool, to enable the NLP to extract text strings indicating adverse effects from other untagged doctor's notes. The NLP can also be trained with other training data sets including, for example, common data models, data dictionaries, hierarchical data (i.e. dependencies between/among text), to extract data elements based on a semantic and contextual understanding of the extracted data. For example, the natural language processor can be trained to select, from a set of standardized data candidates for a data element of the cancer registry, a candidate having a closest meaning as the extracted data. Moreover, some of the extracted data, such as numerical data, can also be updated or validated for consistency with one or more data normalization rules as part of the processing. Entries of the data records of the cancer registry can then be populated using the processed data.
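The candidate-selection step can be sketched as follows. This example uses plain string similarity from Python's standard `difflib` module as a stand-in for the semantic matching a trained natural language processor would perform; the function name and the candidate list are assumptions made for illustration.

```python
from difflib import SequenceMatcher

def closest_candidate(extracted, candidates):
    """Pick the candidate with the highest string-similarity ratio to the extracted text."""
    def score(candidate):
        return SequenceMatcher(None, extracted.lower(), candidate.lower()).ratio()
    return max(candidates, key=score)

# Hypothetical standardized candidates for one registry data element.
candidates = [
    "Invasive ductal carcinoma",
    "Invasive lobular carcinoma",
    "Ductal carcinoma in situ",
]
choice = closest_candidate("invasive ductal ca", candidates)
```

Character-level similarity only approximates closeness of meaning; a production system would more plausibly compare learned representations of the text.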
The disclosed techniques can enable automated extraction of patient data from various sources, as well as conversion of the extracted patient data into structured patient data records, such as a cancer registry, which can substantially speed up the generation of structured patient data records. Moreover, using techniques such as natural language processing and data normalization, the likelihood of introducing data errors to the cancer registry can be reduced, which can improve the reliability of the extraction. Moreover, the cancer registry can include data elements to support clinical research and quality of care metrics computation. With the improvements in the overall speed of data flow and in the correctness and completeness of data and quality metrics, wider and faster access to high-quality patient data can be provided for clinical and research purposes, which can facilitate the development of treatments and medical technologies, as well as the improvement of the quality of care provided to the patients.
As discussed above, manual extraction of patient data from electronic medical records 102 (e.g., pathology reports, imaging reports, etc.) and conversion into patient data records can be a laborious, slow, costly, and error-prone process, which in turn affects performances and timeliness of the medical applications that rely on the cancer registry. For example, errors in the patient data records 110 can lead to generation of inaccurate cancer summary reports 132, cohort characteristics 134, clinical care delivery information 142, and quality of care metrics 144. Moreover, the slow and laborious data entry for patient data records 110 can also introduce delay in, for example, detection and remedy of problems in the administration of care.
The present disclosure proposes a data processing system that can perform automated extraction of patient data from electronic medical records and conversion into a structured patient data record, such as a cancer registry. The automated extraction can reduce or even eliminate the need for manual extraction and entry of patient data, which are slow and laborious as explained above. The data processing system can include a learning system such as, for example, a rule-based extraction system, a machine learning (ML) model (which may include a deep learning neural network or other machine learning models), a natural language processor (NLP), etc., to extract data elements from the unstructured patient data, classify the data elements, and map the data elements to pre-defined data representations (e.g., codes, fields, etc.) to form structured data, and then populate various fields of a structured patient data record (e.g., a cancer registry) based on the structured data. The data processing system can also operate in various modes, such as a fully automated mode in which the data processing system automatically populates the fields, or a hybrid mode in which some of the fields are populated by the data processing system while the rest of the fields are populated by a human abstractor. The hybrid mode can be part of the learning process to update the machine learning model.
A. System Overview
In some examples, patient data abstraction module 202 can receive raw patient data 210 of patients from primary data sources 212. Primary data sources 212 may include an EMR (electronic medical record) system, a PACS (picture archiving and communication system), a Digital Pathology (DP) system, an LIS (laboratory information system) including genomic data, an RIS (radiology information system), patient reported outcomes, wearable and/or digital technologies, social media, etc. Patient data processor 200 can perform an abstraction process on the patient data, which includes extraction of data elements from the raw patient data 210 and mapping of the extracted data elements to various data element fields/entries of patient data records 110.
Patient data abstraction module 202 can perform abstraction of data using various techniques. For example, patient data abstraction module 202 can include a learning system with an Artificial Intelligence (AI)-assisted clinical extraction tool. The learning system can include, for example, a rule-based extraction system, a machine learning (ML) model (which may include a deep learning neural network or other machine learning models), a natural language processor (NLP), etc., which can extract data elements from raw unstructured patient data (e.g., pathological report, doctor's notes, etc.), classify the data elements, and map the data elements to pre-defined data representations (e.g., codes, fields, etc.) to form structured data. The pre-defined data representations can include ontology representations including, for example, International Classification of Diseases (ICD) and Systematized Nomenclature of Medicine (SNOMED). The data representations may also include indications representing biographical information of the patient (e.g., identification, age, sex, etc.), indications representing medical history of the patient (e.g., tumor information, biomarker, history of treatments received, adverse events after the treatments, etc.), etc. Moreover, the natural language processor can select, from a set of standardized data candidates for a data element field of the cancer registry, one or more candidates having the closest meaning to the extracted data.
Patient data abstraction module 202 can also perform data normalization on the numerical data (e.g., checking a value against its expected range) to validate the numerical data, and to correct or flag invalid numerical data. The data normalization can be performed based on one or more data normalization rules. In some examples, raw patient data 210 may also include structured medical data having the pre-defined data representations, and patient data abstraction module 202 can extract data elements based on identifying the pre-defined data representations of the data elements.
Based on an operation mode, patient data abstraction module 202 can automatically populate different fields of patient data records 110 using the processed data, or assist an abstractor in populating the fields of patient data records 110. For example, in one operation mode, patient data abstraction module 202 can automatically populate, via server 122, different fields of patient data records 110 of database 120 based on a pre-determined mapping between the pre-defined data representations and the fields of patient data records 110. Moreover, in a different operation mode, patient data abstraction module 202 may allow manual extraction as a backup option when, for example, the AI-assisted clinical extraction tool reports a low confidence level for its output, which may indicate that raw patient data 210 include data that are inconsistent with the training data set. In some examples, patient data abstraction module 202 may adopt a hybrid approach by allowing a human abstractor to populate certain data element fields, via a display interface 206 and server 122, while using the AI-assisted clinical extraction tool to populate other data element fields. Patient data abstraction module 202 may generate other information, such as a progress report for tracking the completion of a patient's data record, the percentages of fields being populated manually versus being populated automatically by the AI-assisted clinical extraction tool, etc., to facilitate the management of abstraction operations.
As part of the workflow, the AI-assisted clinical extraction tool can be continuously adapted, as described above. Specifically, patient data abstraction module 202 can receive processed patient data 214 from secondary data sources 216, such as a training data database, to train or adapt the models/rules for extracting data elements. Processed patient data 214 can be derived from some of the prior raw patient data 210 that have been processed (e.g., tagged) to indicate mappings of certain data elements as ground truth. The tagged raw patient data can be used to train the learning system (e.g., an ML model, an NLP, etc.) to perform the extraction, classification, and mapping processing. Moreover, rules of the rule-based extraction system can also be adapted based on the processed patient data to improve the error detection and correction processing. Processed patient data 214 can also be generated by the manual population of data element fields via display interface 206.
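A minimal sketch of range-based validation under a data normalization rule might look like the following. The field names and numeric ranges are hypothetical placeholders rather than values from the disclosure, and flagging is shown instead of automatic correction.

```python
# Hypothetical normalization rules: field name -> (min, max) expected range.
RANGE_RULES = {"age_years": (0, 120), "tumor_size_mm": (0, 500)}

def normalize_numeric(field, raw_value):
    """Return (value, status); status is 'valid', 'out_of_range', or 'not_numeric'."""
    try:
        value = float(raw_value)
    except (TypeError, ValueError):
        # The raw text could not be parsed as a number at all.
        return None, "not_numeric"
    low, high = RANGE_RULES[field]
    if low <= value <= high:
        return value, "valid"
    # Parsed, but outside the expected range: flag for review or correction.
    return value, "out_of_range"
```

For example, `normalize_numeric("age_years", "620")` returns the parsed value together with the `'out_of_range'` status, which a downstream step could use to route the field to an abstractor.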
To further improve the quality of data stored in the patient data records 110 (e.g., the processed data reflecting the correct interpretation of the extracted data), the data of patient data records 110 can be validated as part of a periodic data curation process, which can be automated or handled manually on a regular basis. As part of the data curation process, any erroneous data in patient data records 110 can also be corrected. The learning system can be retrained based on the extracted data input and the desired processing output. Moreover, the one or more data normalization rules can be revised if incorrect normalization outputs are detected. As the learning system is re-trained using a more complete and accurate training data set, and the data normalization rules are also adjusted, the quality of processing output as well as the speed of processing can be improved.
After patient data abstraction module 202 populates patient data records 110 in database 120, data analytics module 204 can obtain data included in multiple sections of patient data records 110 from multiple patients included in database 120, and perform various analyses on patient data records 110. For example, in a case where patient data records 110 are part of a cancer registry, data analytics module 204 may include a cancer data analytics module 220 to perform analysis on data related to cancer types represented in patient data records 110 to generate, for example, cancer summary reports 132, cohort characteristics 134, etc. Moreover, a care quality metrics analytics module 222 can perform analysis on data related to a quality of care delivered to the patients represented in patient data records 110 to generate, for example, clinical care delivery information 142, quality of care metrics 144, etc. Further, patient data processor 200 may include a reporting module (not shown in
Display interface 206 allows a user (e.g., an abstractor, an epidemiologist/clinical researcher, a hospital administrator, etc.) to interact with the patient data processor 200. For example, the display interface 206 allows the abstractor to instruct the patient data abstraction module 202 to perform automatic population of the fields of patient data records 110, to view the populated data, etc. Display interface 206 also allows a hospital administrator to retrieve and view reports of various quality of care metrics as well as other derived reports (e.g., accreditation report, etc.). The display interface 206 also allows a researcher to retrieve and view reports from cancer data analytics module 220 (e.g., cancer summary report, cohort characteristics, etc.). In some examples, as to be described below, the display interface 206 can be in the form of a dashboard which allows the user to select and customize the displayed information.
B. Patient Data Abstraction Module
AI-assisted clinical extraction tool 302 can include a natural language processor 304 to extract data elements from unstructured raw patient data 210, map the extracted data elements to a pre-determined data representation, and populate the fields of patient data records 110 that correspond to the pre-determined data representation.
Specifically, referring to
Moreover, node 316b is connected to a node 318c representing a medication category, as well as to a node 318d representing other categories. This represents that for a sequence of words/phrases represented by node 314 and 316b (e.g., “Jane Doe takes”), the category of the word/phrase that follows can be for a medication or other information, and there is a 90% chance (represented by “0.9” in
Further, node 316c is connected to a node 318e representing a medication category with a 90% chance, as well as to a node 318f representing other categories. The combination of nodes 314, 316c, and 318e can indicate that a patient subject stops taking a certain medication. Node 318e is further connected to a set of nodes, including nodes 320, 322a, and 322b representing possible explanations of why the patient subject stops taking the medication. Node 322a represents a side-effect of the medication, whereas node 322b represents other reasons. There is a 90% chance that the phrase/word that follows node 318e refers to a side-effect of the medication, and there is a 10% chance that the phrase/word that follows node 318e refers to other reasons why the patient stops taking the medication. The probabilities can be based on the prior raw patient data entered by the user into primary data sources 212.
Natural language processor 304 can refer to the decision tree to determine a category of the word/phrase extracted from raw patient data 210. For example, if natural language processor 304 extracts a sequence of words/phrases “Jane Doe is”, which maps to a sequence of nodes 314 and 316a, natural language processor 304 can determine that the next word/phrase to be extracted more likely refers to a gender than an age of the patient. Also, if natural language processor 304 extracts a sequence of words/phrases “Jane Doe takes”, which maps to a sequence of nodes 314 and 316b, natural language processor 304 can determine that the next word/phrase to be extracted more likely refers to a medication taken by the patient. Further, if natural language processor 304 extracts a sequence of words/phrases “Jane Doe does not take”, which maps to a sequence of nodes 314 and 316c, natural language processor 304 can determine that the next word/phrase to be extracted more likely refers to a medication. If the sequence of nodes 314, 316c, and 318e is followed by words/phrases representing a reasoning statement (indicated by node 320), the reasoning statement is more likely to refer to a side-effect of the medication.
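The lookup described above can be sketched with a small table of next-category probabilities. The 0.9 values follow the probabilities stated in the description; the 0.1 complements for the “takes” and “does not take” branches, the phrase “because of” as a trigger for a reasoning statement, and all identifiers are assumptions made for illustration.

```python
# Next-category probabilities keyed by the phrase that precedes the word to classify.
# The 0.9 values follow the description; 0.1 complements are assumed where not stated.
LANGUAGE_MODEL = {
    "takes": {"medication": 0.9, "other": 0.1},
    "does not take": {"medication": 0.9, "other": 0.1},
    # Hypothetical trigger phrase for a reasoning statement.
    "because of": {"side_effect": 0.9, "other_reason": 0.1},
}

def most_likely_category(preceding_phrase):
    """Return (category, probability) with the highest probability, or None if unknown."""
    options = LANGUAGE_MODEL.get(preceding_phrase)
    if options is None:
        return None
    category = max(options, key=options.get)
    return category, options[category]
```

The returned probability can double as a confidence level for the classification, which is how the later sections use it when deciding between automatic and manual population.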
While
Based on the determination of the categories of data elements 334, 336, and 338, data normalization module 306 can map each of data elements 334, 336, and 338 to, respectively, data representations 344, 346, and 348. For example, data representation 344 uses a patient identifier (“001”) to represent the patient's name (“Ms. Smith”). Data representation 346 uses a code (“ABC”), which can be based on SNOMED, ICD, or other standards, to represent the drug taken by Ms. Smith (“RX1”). Further, data representation 348 can link data element 338 (“nausea”) to a field representing the adverse effect developed by Ms. Smith as a result of taking drug ABC. At least some of the mapping can be based on data table 330 of
Each of data representations 344, 346, and 348 can correspond to various fields of a patient data record. For example, data representation 344 (patient identifier) can correspond to a patient's identifier field in patient biography information 112. Data representations 346 (drug) and 348 (adverse effect of the drug) can correspond to fields in treatment history 116 concerning a drug the patient has taken, and the adverse side effect the patient has developed from the drug. AI-assisted clinical extraction tool 302 can then populate the fields of patient data records 110 based on these data representations.
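The mapping from categorized data elements to record fields can be sketched as below, reusing the example values from the description (“Ms. Smith” to “001”, “RX1” to “ABC”, and “nausea” as the adverse effect). The lookup tables, dictionary keys, and function name are hypothetical.

```python
# Lookup tables standing in for the mapping table referenced in the description.
PATIENT_IDS = {"Ms. Smith": "001"}
DRUG_CODES = {"RX1": "ABC"}

def build_record(elements):
    """Map categorized data elements to fields of a structured patient data record."""
    return {
        "patient_biography": {
            "patient_id": PATIENT_IDS[elements["patient_name"]],
        },
        "treatment_history": {
            "drug_code": DRUG_CODES[elements["medication"]],
            "adverse_effect": elements["adverse_effect"],
        },
    }

record = build_record(
    {"patient_name": "Ms. Smith", "medication": "RX1", "adverse_effect": "nausea"}
)
```

Here the two top-level keys loosely mirror patient biography information 112 and treatment history 116; a real registry schema would define many more fields.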
C. Training Operation to Perform Data Element Extraction
NLP 304 and data normalization module 306 (or other machine learning model, or a rule-based extractor) can be trained/adapted to identify data elements 334, 336, and 338 and their categories based on a training data set 350. Training data set 350 may include, for example, a common data model 360, dictionaries 362, hierarchical data 364, tagged data 366, etc., to identify data elements 334, 336, and 338 based on a semantic and contextual understanding of the extracted data developed through the training.
Specifically, a common data model 360 may define, for example, semantic structure of sentences, which enables NLP 304 to recognize a semantic structure and to deduce a meaning of a text based on the semantic structure and the text's location in the structure. Part of language extraction model 312 of
In the example of
In addition, NLP 304 can also be trained by tagged data 366. Tagged data 366 may include raw unstructured patient data 210 which has been processed by, for example, having certain data elements tagged. The tagging can be performed by, for example, an abstractor, an administrator of patient data processor 200, etc. Tagged data 366 may include a similar pattern of data elements as text data 332, and the data elements can be tagged to indicate, for example, which data categories the data elements belong to, which data representations the data elements are mapped to as ground truth, etc. NLP 304 can be trained by tagged data 366 to, for example, update the probability of a word/phrase representing a certain data category in language extraction model 312. As a result, when NLP 304 receives untagged text data 332 including data elements 334, 336, and 338, NLP 304 can recognize the data pattern and determine the data representations for the data elements based on the recognized data pattern.
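One simple way to picture the probability update is a frequency count over tagged examples. This is only a sketch of the idea, assuming each tagged example reduces to a (preceding phrase, category) pair; the disclosure does not specify the estimation method, and the names below are illustrative.

```python
from collections import Counter, defaultdict

def estimate_probabilities(tagged_examples):
    """Estimate P(category | preceding phrase) from (phrase, category) ground-truth tags."""
    counts = defaultdict(Counter)
    for phrase, category in tagged_examples:
        counts[phrase][category] += 1
    return {
        phrase: {cat: n / sum(cats.values()) for cat, n in cats.items()}
        for phrase, cats in counts.items()
    }

# Illustrative tags: nine of ten words following "takes" were tagged as medications.
tagged = [("takes", "medication")] * 9 + [("takes", "other")]
model = estimate_probabilities(tagged)
```

Rerunning the estimate as newly tagged notes arrive is one way the probabilities in a language extraction model could be kept current.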
D. Data Normalization
Referring back to
In some examples, natural language processor 304 and data normalization module 306 can operate together in various ways to handle the extracted data. For example, the natural language processor 304 and data normalization module 306 can operate in parallel to handle different sets of extracted data. In one example, data normalization module 306 can be assigned to handle shorter text strings, numerical values, etc., for which data normalization rules can define a reference numerical range or a set of standardized text data candidates. Natural language processor 304 can be assigned to handle more complex text strings, which may require some forms of contextual and semantic analyses to determine the intended meaning of the text strings for the output. Data normalization module 306 and natural language processor 304 can also operate in a serial fashion on the same set of extracted data. For example, data normalization module 306 can perform pre-processing on the extracted data to correct typos and/or out-of-range values. Natural language processor 304 can then process the pre-processed data to generate an output associated with data elements in patient data records 110.
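For the parallel arrangement, a routing step could decide which component handles each extracted string. The sketch below sends numerical values and short strings to rule-based normalization and longer strings to the natural language processor; the word-count threshold and the function name are assumptions, since the description does not define what counts as a shorter text string.

```python
def route(extracted, max_short_words=3):
    """Assign an extracted string to 'normalization' or 'nlp' (threshold is an assumption)."""
    text = extracted.strip()
    try:
        float(text)
        return "normalization"  # numerical values are checked against range rules
    except ValueError:
        pass
    # Short strings can be matched against standardized candidates by rule;
    # longer strings need contextual and semantic analysis.
    return "normalization" if len(text.split()) <= max_short_words else "nlp"
```

In the serial arrangement, the same normalization step would instead run first on every string and pass its cleaned output to the natural language processor.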
E. Manual Cancer Registry Population Assistance
Patient data abstraction module 202 further includes a manual population module 308, which allows a human abstractor to manually populate the fields of patient data records 110 via a display interface 206. The manual population module 308 can operate with AI-assisted clinical extraction tool 302 in various ways. For example, a display interface 206 can provide a selection option for each data element to select between automatic population and manual population. If automatic population is selected for a given data element, the AI-assisted clinical extraction tool 302 can extract the data from its primary data source(s) 212 tagged with a tag corresponding to the field, and populate the extracted data in the field. If manual population is selected, the user can enter the data for the field manually via the display interface 206. As another example, automatic population may be set as default, whereas manual population is provided as a backup when, for example, the confidence level of the natural language processor output is below a threshold.
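The selection between automatic and manual population can be sketched as a small decision function. The threshold value of 0.8 and the parameter names are assumptions; the description only states that manual population serves as a backup when the confidence level is below a threshold.

```python
def choose_population_mode(confidence, threshold=0.8, user_choice=None):
    """Return 'automatic' or 'manual' for one data element field."""
    if user_choice in ("automatic", "manual"):
        return user_choice  # an explicit selection made via the interface wins
    # Default: populate automatically unless confidence falls below the threshold.
    return "automatic" if confidence >= threshold else "manual"
```

With these defaults, a field classified at a confidence of 0.9 is populated automatically, while one at 0.5 is handed to the abstractor unless the user has explicitly chosen otherwise.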
F. Abstraction Management Module
Abstraction management module 310 can generate analytical results of the abstraction operations and manage the abstraction operations based on these results. For example, the extraction management module 310 can generate data-driven results reflecting the abstraction progress, such as percentage of completion of each patient's malignancy included in a given patient data record. The abstraction progress analysis results can also be aggregated at different levels, such as for different human abstractors assigned for the abstraction operations or for different caregivers (e.g., hospitals, clinics, etc.). The abstraction progress analysis results can be displayed via the display interface 206 and/or provided via other means to facilitate management of the abstraction operations. The abstraction progress analysis can also be used by abstraction management module 310 to track the progress of the automatic abstraction operations if the operations are fully automated. In addition, abstraction management module 310 can also generate results reflecting the confidence levels of the automatically populated data element fields (e.g., the confidence levels of the outputs of natural language processor 304). The confidence level can be based on, for example, a probability of a data element mapped to a particular data category as indicated in language extraction model 312. The confidence level information can be displayed via the display interface 206 to, for example, allow a user to select between automatic and manually populated data elements, as described above.
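The progress figures can be sketched as a per-record completion percentage that is then averaged per abstractor. The record layout, the list of required fields, and the choice of a simple mean for aggregation are assumptions made for illustration.

```python
from collections import defaultdict

def completion_percentage(record, required_fields):
    """Percentage of required fields that hold a non-empty value."""
    filled = sum(1 for f in required_fields if record.get(f) not in (None, ""))
    return 100.0 * filled / len(required_fields)

def average_by_abstractor(assignments, required_fields):
    """assignments: list of (abstractor, record). Returns mean completion per abstractor."""
    buckets = defaultdict(list)
    for abstractor, record in assignments:
        buckets[abstractor].append(completion_percentage(record, required_fields))
    return {name: sum(vals) / len(vals) for name, vals in buckets.items()}
```

The same grouping could be keyed by hospital or clinic instead of abstractor to produce the caregiver-level view.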
In addition, abstraction management module 310 can perform data validation at a routine cadence to improve the quality of the data included in patient data records 110 (e.g., so that the processed data reflect the correct interpretation of the extracted data). This data curation process can be performed according to a management schedule. As part of the data curation process, the data of patient data records 110 can be validated and erroneous data can be corrected. Moreover, natural language processor 304 can be retrained based on newly extracted data, and the one or more data normalization rules can be revised if incorrect normalization outputs are detected. In some examples, the validation can be performed automatically by abstraction management module 310. For example, natural language processor 304 can be retrained using a set of the most recently extracted data. After the retraining, AI-assisted clinical extraction tool 302 can revisit earlier extracted data that have been processed and stored in patient data records 110, and reprocess those data with the retrained natural language processor 304. To further support data validation and improve the quality of the data in patient data records 110, AI-assisted clinical extraction tool 302 can update the data of patient data records 110 if the data do not match the reprocessed data.
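The reprocess-and-update step can be sketched as follows, assuming the stored records and their source text are keyed by a patient identifier and that the retrained processor is available as a callable. The function `revalidate` and its arguments are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List, Tuple


def revalidate(
    records: Dict[str, Dict[str, str]],
    sources: Dict[str, str],
    extract: Callable[[str], Dict[str, str]],
) -> List[Tuple[str, str, object, str]]:
    """Reprocess stored source text and update fields that no longer match.

    `extract` stands in for the retrained natural language processor.
    Returns a log of (patient_id, field, old_value, new_value) changes.
    """
    changes = []
    for patient_id, record in records.items():
        reprocessed = extract(sources[patient_id])
        for field, new_value in reprocessed.items():
            old_value = record.get(field)
            if old_value != new_value:
                changes.append((patient_id, field, old_value, new_value))
                record[field] = new_value  # update the stored record in place
    return changes
```

Returning a change log rather than silently overwriting values is a design choice of this sketch; it would let a human abstractor review the automatic corrections.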
Data contained within patient data records 110 can be obtained by a data analytics module 204 to perform various automated analyses on the data. As described above, cancer data analytics module 220 can generate, for example, cancer summary reports 132, describe cohort characteristics 134, etc. Moreover, care quality metrics analytics module 222 can generate, for example, clinical care delivery outcomes 142, quality of care metrics 144, etc. All these reports can also be displayed in an analytics dashboard provided by display interface 206. The analysis can be performed based on all or a subset of the patient data records 110 in database 120.
Moreover, as shown in
The analytics data shown in display interface 206 of
In addition, the patient data stored in patient data records 110 can be provided to different medical applications including, for example, a clinical decision application, regional/national cancer registries, accreditation boards, etc. For example, treatment history 116 can be used to predict the effect of treatment on a patient having similar characteristics (e.g., based on tumor information 114, biomarkers 118, etc.) as other patients whose records are stored in patient data records 110. Moreover, the patient data stored in patient data records 110 can be reported to regional/national cancer registries, accreditation boards, etc., to, for example, support effective oversight of the caregivers.
In operation 702, patient data processor 200 can receive patient data for an individual patient. The patient data (e.g., electronic medical records) can be received from one or more sources comprising at least one of: an EMR (electronic medical record) system, a PACS (picture archiving and communication system), a Digital Pathology (DP) system, an LIS (laboratory information system), a RIS (radiology information system), wearable and/or digital technologies, social media, etc.
In operation 704, patient data processor 200 can process the patient data using a learning system with Artificial Intelligence (AI)-assisted clinical extraction tool (e.g., AI-assisted clinical extraction tool 302). The processing may include extracting, based on a trained language extraction model that reflects language semantics and a user's prior habit of entering other patient data, data elements from the patient data and data categories represented by the data elements, and mapping the extracted data elements to pre-determined data representations based on the data categories.
The learning system can include, for example, a rule-based extraction system, a machine learning (ML) model (which may include a deep learning neural network or other machine learning models), a natural language processor (NLP), etc., which can extract data elements from the unstructured patient data and determine their data categories based on a trained language extraction model, such as language extraction model 312 of
In operation 706, patient data processor 200 can populate fields of a data record of the patient corresponding to the data representations. The data representations (e.g., patient biography data, medication, side effects, etc.) may correspond to certain fields of the data record, and the fields can be populated based on the corresponding data representations.
In operation 708, patient data processor 200 can store the populated patient data record in a database accessible by the medical application. The medical application may include, for example, a quality of care evaluation tool to evaluate the quality of care administered to a patient or patient population, a medical research tool to estimate a correlation between various information of the patient (e.g., demographic information) and tumor information (e.g., prognosis results) of the patient, a reporting tool to report the patient data record (e.g., a cancer registry) to a regional/national cancer registry, etc. Patient data processor 200 may include a data analytics module (e.g., data analytics module 204) to obtain data from sections (i.e., tables) included in the patient data record and to perform data analytics operations, with display of the data in a display interface (e.g., display interface 206), based on the techniques described above.
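Operations 704 through 708 can be sketched end to end under simplifying assumptions. The example below stands in for the learning system with a small rule-based extractor built on regular expressions (one of the options named above), and uses an SQLite table as the database. The rule set, the normalization functions, the table layout, and all function names are hypothetical.

```python
import re
import sqlite3
from typing import Dict

# Hypothetical extraction rules: data category -> pattern with one group.
RULES = {
    "stage": re.compile(r"\bstage\s+(IV|III|II|I)\b", re.IGNORECASE),
    "medication": re.compile(r"\b(cisplatin|tamoxifen)\b", re.IGNORECASE),
}
# Hypothetical mapping to pre-determined data representations.
NORMALIZE = {"stage": str.upper, "medication": str.lower}


def extract(text: str) -> Dict[str, str]:
    """Operation 704: extract data elements and map them by category."""
    elements = {}
    for category, pattern in RULES.items():
        match = pattern.search(text)
        if match:
            elements[category] = NORMALIZE[category](match.group(1))
    return elements


def populate(patient_id: str, elements: Dict[str, str]) -> dict:
    """Operation 706: populate record fields from the data representations."""
    record = {"patient_id": patient_id, "stage": None, "medication": None}
    record.update(elements)
    return record


def store(conn: sqlite3.Connection, record: dict) -> None:
    """Operation 708: store the populated record in a database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS patient_records "
        "(patient_id TEXT PRIMARY KEY, stage TEXT, medication TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO patient_records "
        "VALUES (:patient_id, :stage, :medication)",
        record,
    )
    conn.commit()
```

A trained language extraction model would replace the fixed patterns in `RULES`; the populate and store steps would be unchanged in this sketch.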
Any of the computer systems mentioned herein may utilize any suitable number of subsystems. Examples of such subsystems are shown in
The subsystems shown in
A computer system can include a plurality of the same components or subsystems, e.g., connected together by external interface 81 or by an internal interface. In some embodiments, computer systems, subsystems, or apparatuses can communicate over a network. In such instances, one computer can be considered a client and another computer a server, where each can be part of a same computer system. A client and a server can each include multiple systems, subsystems, or components.
Aspects of embodiments can be implemented in the form of control logic using hardware (e.g., an application specific integrated circuit or field programmable gate array) and/or using computer software with a generally programmable processor in a modular or integrated manner. As used herein, a processor includes a single-core processor, a multi-core processor on a same integrated chip, or multiple processing units on a single circuit board or networked. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will know and appreciate other ways and/or methods to implement embodiments of the present invention using hardware and a combination of hardware and software.
Any of the software components or functions described in this application may be implemented as software code to be executed by a processor using any suitable computer language such as, for example, Java, C, C++, C#, Objective-C, Swift, or scripting language such as Perl or Python using, for example, conventional or object-oriented techniques. The software code may be stored as a series of instructions or commands on a computer readable medium for storage and/or transmission. A suitable non-transitory computer readable medium can include random access memory (RAM), a read only memory (ROM), a magnetic medium such as a hard-drive or a floppy disk, or an optical medium such as a compact disk (CD) or DVD (digital versatile disk), flash memory, and the like. The computer readable medium may be any combination of such storage or transmission devices.
Such programs may also be encoded and transmitted using carrier signals adapted for transmission via wired, optical, and/or wireless networks conforming to a variety of protocols, including the Internet. As such, a computer readable medium may be created using a data signal encoded with such programs. Computer readable media encoded with the program code may be packaged with a compatible device or provided separately from other devices (e.g., via Internet download). Any such computer readable medium may reside on or within a single computer product (e.g. a hard drive, a CD, or an entire computer system), and may be present on or within different computer products within a system or network. A computer system may include a monitor, printer, or other suitable display for providing any of the results mentioned herein to a user.
Any of the methods described herein may be totally or partially performed with a computer system including one or more processors, which can be configured to perform the steps. Thus, embodiments can be directed to computer systems configured to perform the steps of any of the methods described herein, potentially with different components performing a respective step or a respective group of steps. Although presented as numbered steps, steps of methods herein can be performed at the same time or in a different order. Additionally, portions of these steps may be used with portions of other steps from other methods. Also, all or portions of a step may be optional. Additionally, any of the steps of any of the methods can be performed with modules, units, circuits, or other means for performing these steps.
The specific details of particular embodiments may be combined in any suitable manner without departing from the spirit and scope of embodiments of the invention. However, other embodiments of the invention may be directed to specific embodiments relating to each individual aspect, or specific combinations of these individual aspects.
The above description of example embodiments of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form described, and many modifications and variations are possible in light of the teaching above.
A recitation of “a”, “an” or “the” is intended to mean “one or more” unless specifically indicated to the contrary. The use of “or” is intended to mean an “inclusive or,” and not an “exclusive or” unless specifically indicated to the contrary. Reference to a “first” component does not necessarily require that a second component be provided. Moreover, reference to a “first” or a “second” component does not limit the referenced component to a particular location unless expressly stated.
All patents, patent applications, publications, and descriptions mentioned herein are incorporated by reference in their entirety for all purposes. None is admitted to be prior art.
The present application is a continuation of International Patent Application No. PCT/US2020/019089, filed Feb. 20, 2020, which claims priority to U.S. Provisional Pat. Appl. No. 62/807,898, filed on Feb. 20, 2019, each of which is incorporated herein by reference in its entirety for all purposes.
Number | Date | Country
---|---|---
62807898 | Feb 2019 | US

Relation | Number | Date | Country
---|---|---|---
Parent | PCT/US2020/019089 | Feb 2020 | US
Child | 17445475 | | US