A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
The Affordable Care Act of 2010 has prompted healthcare organizations to shift from a fee-for-service financial model to a value-based model. The goal of a value-based model is to deliver better quality care while lowering costs by developing patient-centered integrated care and support. Integrated care requires that providers take a holistic view of their patients and understand each patient's unique characteristics and the full spectrum of their ailments. Integrated care approaches improve care and efficiency by allowing providers to develop an individualized care plan. While delivering targeted interventions on an individual basis may not be cost-effective, segmenting a population into groups of individuals with similar characteristics is tractable and has proven effective in increasing the quality of care while reducing costs and future hospital admissions. For instance, it has been shown that state-wide programs that identify elderly people who have unmet long-term care needs and connect them to Medicaid home and community based services lowered annual Medicaid spending by 24%.
The growing adoption of electronic medical records (EMR) by providers has set the stage for integrated health care delivery. EMRs provide easy access to quantitative (e.g., laboratory and diagnostic test results) and qualitative (e.g., text-based discharge summaries) health data as well as transactional and financial data within and across multiple health systems. Experts urge the use of these rich data sources in EMRs for extraction of knowledge to improve decision making and care delivery.
Individualized healthcare requires that a physician have a comprehensive view of the patient's physical, behavioral, social and environmental conditions. People with multiple medical disorders and mental illness have a much higher risk of hospitalization, mortality and poor outcomes. To improve care by targeting interventions, analytical methods are being developed to identify individuals who are at risk of poor outcomes. Many of these risk prediction methods rely on structured data in the EMR such as lab results, diagnosis codes and procedure codes. However, better methods are needed to extract knowledge from the unstructured qualitative data in the EMR and gain a deeper understanding of the complex characteristics of patients, so that targeted interventions can be incorporated into their care delivery.
It is well known that structured data such as diagnosis codes (ICD9/10) alone cannot capture the complexities of individual patients. For example, structured data may indicate that a patient has late stage diabetes or uncontrolled diabetes, but it does not suggest possible causes. Knowing that a patient is not taking the prescribed medication because of the cost or because of limited access to a pharmacy can help the provider develop a more effective care plan. However, this type of qualitative information does not appear in the structured data; rather, it appears in the unstructured text-based notes. Similarly, addressing the social and behavioral health of patients will improve treatment and management of physical disorders.
Certain illustrative embodiments illustrating organization and method of operation, together with objects and advantages may be best understood by reference to the detailed description that follows taken in conjunction with the accompanying drawings in which:
While this invention is susceptible of embodiment in many different forms, there is shown in the drawings and will herein be described in detail specific embodiments, with the understanding that the present disclosure of such embodiments is to be considered as an example of the principles and not intended to limit the invention to the specific embodiments shown and described. In the description below, like reference numerals are used to describe the same, similar or corresponding parts in the several views of the drawings.
The terms “a” or “an”, as used herein, are defined as one or more than one. The term “plurality”, as used herein, is defined as two or more than two. The term “another”, as used herein, is defined as at least a second or more. The terms “including” and/or “having”, as used herein, are defined as comprising (i.e., open language). The term “coupled”, as used herein, is defined as connected, although not necessarily directly, and not necessarily mechanically.
Reference throughout this document to “one embodiment”, “certain embodiments”, “an embodiment” or similar terms means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of such phrases in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments without limitation.
In an embodiment, a system and method are presented to extract highly meaningful terms from unstructured fields in an Electronic Medical Record (EMR) that distinguish two different patient populations. The differences between patient populations that, at a high level, may present similar symptoms and lead to the same diagnosis may provide healthcare professionals with information to more quickly and efficiently target populations of patients requiring broader or more targeted healthcare services. The system enables healthcare providers to identify root causes of health outcomes and to discover potential targets for intervention and for improving healthcare delivery for segments of patient populations that may require broader or more substantial healthcare services. In a non-limiting example, using this system, healthcare providers can quickly identify terms that are highly associated with a specific set of patients, for instance, chronic heart failure (CHF) patients who are high utilizers of the emergency department (ED) compared to CHF patients with low ED utilization. An additional non-limiting example might be the identification of diabetic patients suffering from neuropathy within a larger population of diabetic patients overall.
In an embodiment, the EMRs of all patients contain large amounts of descriptive, unstructured text from which data relevant to diagnosis, treatment, and prognosis may be gleaned. However, culling through this text is a gargantuan task for which healthcare professionals have little time, given the caseloads they service on a daily basis. The system and method analyze this descriptive, unstructured text to identify, and present to healthcare professionals, categories of patients based upon pre-determined identifiers such as, in a non-limiting example, diagnosis, along with sub-categories within these categories that present characteristics distinguishing each sub-category from the overall category. The system and method may create characterizations of patient populations for an overall category. Within that category, the system and method may identify terms, where terms may be single words or groupings of words, that distinguish a set of patients from the overall category, and create a sub-category that maintains the original identifier but presents an additional identifier distinguishing the patient population in the sub-category from the patient population in the overall category. The automated system may also use the sub-category patient population as a starting population and analyze it to create additional patient populations whose additional identifiers further distinguish them from the patient population in the sub-category. In this manner, the automated system may derive multiple sets of patient populations whose members share characteristics and identifiers that distinguish them from broader categories of patient populations.
In an embodiment, the automated system may be pre-configured with limits on the eligible terms to be applied to the patient populations when performing the textual analysis of the EMRs. The eligible terms may also be weighted on a percentage of interest to the healthcare professional to restrict the patient population discovered and to provide a ranking for patients in the discovered population. These configuration parameters permit the system to present the healthcare provider with ranked lists of patient categories and permit focus on only the terms most relevant to the healthcare provider from the patient EMRs.
In an embodiment, upon the conclusion of the semantic analysis of EMRs for requested or pre-configured terms, the patient populations, in any categorization, may be filtered and displayed to the healthcare professional according to various standard vocabularies in use in healthcare systems. Further filtering the patient population using terms from one or more standard vocabularies makes it possible to place the patients so identified at the top of the list of patients within a category or population. By way of example and not of limitation, vocabularies such as the Systematized Nomenclature of Medicine (SNOMED) or U.S. Food and Drug Administration (FDA) drug lists may be used as parameters by the system to further identify and categorize patient populations. Such vocabularies may permit more rapid identification of patients within a population whose records contain text from these vocabularies. However, the two vocabularies presented are simply examples of vocabularies that may be pre-configured for use by the system, and the system may be configured to use any number of standard vocabularies for further analysis of EMRs based upon the instructions of a healthcare provider.
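The vocabulary-based ordering described above can be illustrated with a minimal sketch. It assumes each patient has already been reduced to a set of extracted terms, and it uses tiny hypothetical stand-in vocabularies (`VOCABULARIES`, `drug_list`, `clinical_findings`) in place of SNOMED or an FDA drug list; the helper names are illustrative only. Patients whose terms match the chosen vocabulary most often are placed first.

```python
from typing import Dict, List, Set

# Hypothetical, tiny stand-ins for standard vocabularies; a real deployment
# would load SNOMED concepts or an FDA drug list instead.
VOCABULARIES: Dict[str, Set[str]] = {
    "drug_list": {"filgrastim", "pegfilgrastim", "sargramostim", "metformin"},
    "clinical_findings": {"neuropathy", "homeless", "schizophrenia"},
}


def vocabulary_hits(terms: Set[str], vocab: Set[str]) -> Set[str]:
    """Return the patient's terms that appear in the given vocabulary."""
    return terms & vocab


def rank_by_vocabulary(patients: Dict[str, Set[str]], vocab_name: str) -> List[str]:
    """Order patient IDs so those with the most vocabulary matches come first.

    Ties are broken by patient ID so the ordering is deterministic.
    """
    vocab = VOCABULARIES[vocab_name]
    return sorted(
        patients,
        key=lambda pid: (-len(vocabulary_hits(patients[pid], vocab)), pid),
    )


patients = {
    "p1": {"fever", "cough"},
    "p2": {"pegfilgrastim", "neuropathy"},
    "p3": {"filgrastim", "sargramostim"},
}
print(rank_by_vocabulary(patients, "drug_list"))  # ['p3', 'p2', 'p1']
```

Counting matches is only one possible ordering rule; the same structure would accept a weighted score per vocabulary term if a provider preferred one.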
In an additional embodiment, the system and method may present to the healthcare professional records that are selected during semantic analysis either because identifiers are found in the records or because the identifiers are not found in the records. The identifiers may be defined by a healthcare professional as terms of interest for analysis of the EMRs. The healthcare professional may direct the system to present records either containing or not containing terms that are of interest to the healthcare professional. This embodiment of the automated system provides the healthcare professional with lists of patients with terms of interest either highlighted or absent, quickly showing the healthcare professional which terms of interest are not found at the same prevalence in the reference set of patient records as in a comparison set of patient records.
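A minimal sketch of the presence/absence selection and the prevalence comparison follows, assuming each record has been reduced to a set of terms. The function names are hypothetical, and sorting by absolute prevalence gap is an assumed way of surfacing terms that are "not found at the same prevalence" in the two sets.

```python
from typing import Dict, Iterable, List, Set, Tuple

Records = Dict[str, Set[str]]  # patient ID -> terms found in that patient's record


def select_records(patients: Records, term: str, present: bool = True) -> List[str]:
    """Return IDs of patients whose records contain (present=True) or lack (present=False) the term."""
    return sorted(pid for pid, terms in patients.items() if (term in terms) == present)


def prevalence(patients: Records, term: str) -> float:
    """Fraction of patients in a set whose records contain the term."""
    if not patients:
        return 0.0
    return sum(term in terms for terms in patients.values()) / len(patients)


def prevalence_gap(
    reference: Records, comparison: Records, terms_of_interest: Iterable[str]
) -> List[Tuple[str, float, float]]:
    """Return (term, reference prevalence, comparison prevalence), largest gap first."""
    rows = [(t, prevalence(reference, t), prevalence(comparison, t)) for t in terms_of_interest]
    return sorted(rows, key=lambda row: (-abs(row[1] - row[2]), row[0]))


reference = {"a": {"homeless", "chf"}, "b": {"homeless"}}
comparison = {"c": {"chf"}, "d": set()}
print(select_records(reference, "chf", present=False))  # ['b']
print(prevalence_gap(reference, comparison, ["homeless", "chf"]))
```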
The transformative nature of the system is such that users can quickly assess specific terms extracted from the unstructured fields in the EMRs of a group of patients to discover and explore unique characteristics. The system provides a decision-support tool for providers to understand the complexity of their patients and patient populations to develop targeted intervention or treatment strategies. The tool will assist providers to improve quality of care and to reduce cost for the healthcare system. Some examples of the transformative application of the semantic analysis system are provided below.
In an exemplary embodiment, TABLE 1 shows the performance of the automated semantic analysis system in identifying diagnoses from physicians' notes, as compared to analysis of discharge diagnosis codes alone, for a group of patients with frequent ED visits. Typically, diagnosis codes underestimate the frequency of indications. In this non-limiting example, analysis of ICD9 codes revealed that 19% of the frequent ED utilizers at an urban hospital were coded for schizophrenia, whereas the automated semantic analysis system identified 51% of the frequent ED utilizers as associated with schizophrenia.
In this embodiment, it is important to note that some terms, such as homelessness, may not be represented in the diagnosis codes provided in discharge data, but may be highly relevant to delivering high quality care to specific patients. A healthcare professional may specify such additional terms to assist in identifying patients or groups of patients by screening for terms that, in the professional's experience, may be ancillary but relevant to a particular discharge code or to the discharge data for ED utilizers. The automated semantic analysis system may then preferentially analyze physicians' notes for the defined terms, either in association with particular discharge data records or absent from particular discharge data records, to provide the healthcare professional with patients that may require additional or more specific services. In this non-limiting example, the automated semantic analysis system identified 72% of the frequent ED utilizers at an urban hospital as being associated with the term ‘homeless.’ In addition, the automated semantic analysis system identified 65% of the frequent ED utilizers as associated with congestive heart failure (CHF), compared to only 3% found by analysis of the ICD9 diagnosis codes in the discharge data.
In an additional embodiment, as shown in TABLE 2, the automated semantic analysis system has shown great utility in discovering a potential cause of admission in a subset of 1,267 patients out of a total of 3,466 patients in an oncology clinic. In the non-limiting example presented by TABLE 2, among the top terms associated with the admission group, the automated semantic analysis system identified three different formulations of granulocyte colony-stimulating factor drugs (Filgrastim, Pegfilgrastim, and Sargramostim) as being present in physicians' notes for the patients of the clinic. These drugs are often given during chemotherapy to stimulate the production of white blood cells. Importantly, the system identified that Sargramostim was associated with a 50% higher rate of hospital admission than the alternative therapy Pegfilgrastim. This result is consistent with a recent report entitled “Comparative effectiveness of filgrastim, pegfilgrastim and sargramostim as prophylaxis against hospitalization for neutropenic complications in patients with cancer receiving chemotherapy”. The authors reported that the risk of hospitalization was 2.1%, 1.1%, and 2.5% for filgrastim, pegfilgrastim and sargramostim, respectively, and that the adjusted odds of hospitalization were significantly higher for filgrastim and sargramostim than for pegfilgrastim. Likewise, the automated semantic analysis system discovered lower admission rates for patients receiving pegfilgrastim in the group of oncology patients examined. These results highlight the utility of the automated semantic analysis system in several ways. First, the automated semantic analysis system quickly identified risk factors associated with high admission rates in a subpopulation of oncology patients. Second, the automated semantic analysis system revealed areas where best practices may not be implemented in a given provider group.
Thus, using the automated semantic analysis system, healthcare providers may quickly identify areas for improving the quality of care specific to the patient population under their care.
In an embodiment, additional non-limiting examples of the transformative nature of the automated semantic analysis system for discovery and population health management include: 1) identification of the root cause of out-of-control diabetes within a group of 400 patients who, after a 12-month diabetes program, were not able to lower their HbA1C below 8%, and 2) identification of a major cause of poor performance among stroke inpatients which, when addressed with appropriate treatment, decreased hospital costs by $2 million per year.
Turning now to
Upon creation or retrieval, the concatenated text document is processed using natural language processing methods 120 to remove lab results, negations (e.g., ‘patient does not have diabetes,’ or ‘the test result is negative for HIV’), or other comments and observations that have been defined by the healthcare provider. In addition, only a single family history and history-and-physical result is represented for each patient in a patient document to avoid inflating term counts for a given patient. The collection of all patient concatenated text documents in a particular healthcare system is represented in a patient document corpus 130. Any of a plurality of standard term weighting methods 140 (e.g., tf-idf, log entropy) may be applied to the patient document corpus, such that each term in the patient document corpus is assigned a weight representing the frequency of the term in the patient's document relative to the frequency of the term across all documents in the corpus. At 150, weighted terms may be mapped to a variety of standard vocabularies or ontologies (e.g., ICD9, CPT, SNOMED, FDA drug list) to further identify, categorize, and characterize patient populations. At 160, the system provides highly ranked terms and vocabulary/ontology classifications to the healthcare provider to quickly summarize the highly relevant characteristics associated with the attributes of interest defined for each patient in any given analysis effort. The automated semantic analysis system may then create patient populations with similar characteristics based upon these highly relevant characteristics.
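The negation-removal and term-weighting steps (120 and 140) can be sketched as follows. This is an illustration under stated assumptions, not the system's NLP pipeline: negation handling is a crude, hypothetical rule that drops any sentence containing a cue from `NEGATION_CUES` (a production system would use a dedicated negation detector), lab-result removal is omitted, and tf-idf is chosen among the weighting methods the text names, with tf as count divided by document length and idf as log(N / df).

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Hypothetical negation cues; a sentence containing any of them is dropped.
NEGATION_CUES = re.compile(r"\b(no|not|denies|without|negative for)\b", re.IGNORECASE)


def drop_negated_sentences(text: str) -> str:
    """Remove whole sentences that contain a negation cue."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return " ".join(s for s in sentences if s and not NEGATION_CUES.search(s))


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(corpus: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Weight each term in each patient document by tf-idf.

    tf is the term count divided by the document length; idf is log(N / df),
    where N is the number of patient documents and df is the number of
    documents containing the term.
    """
    docs = {pid: tokenize(drop_negated_sentences(text)) for pid, text in corpus.items()}
    n_docs = len(docs)
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    weights: Dict[str, Dict[str, float]] = {}
    for pid, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens) or 1
        weights[pid] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return weights


def top_terms(weights: Dict[str, float], n: int) -> List[str]:
    """Return the n highest-weighted terms, ties broken alphabetically."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


corpus = {
    "p1": "Patient is homeless. Patient does not have diabetes. Reports neuropathy.",
    "p2": "Patient reports chest pain.",
}
w = tfidf(corpus)
print(top_terms(w["p1"], 3))  # 'diabetes' is absent because its sentence was negated
```

Terms shared by every document (here "patient" and "reports") receive zero weight, which is the intended effect of the idf factor. Log entropy weighting could be substituted inside `tfidf` without changing the rest of the flow.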
Turning now to
In an exemplary embodiment, to continue an analysis of each patient against any patient population, the system requires the input of a patient population reference list, at 210, which includes a list of patients to be used for comparison against the patient reference list for each group of patients with specified attributes of interest. In a non-limiting example, the patient population reference list could be the entire population or a subset of patients which had good outcomes for a particular disease compared to bad outcomes. At 160, patient summarization data may be extracted from both the patient reference list and the patient population reference list. At 220, the user may define the minimum number of top ranked weighted terms to be used for comparisons during the list analysis step.
At 240, the system calculates differences in frequencies and odds ratios for terms between the reference input list and the patient population reference input list. For each term t, where t represents an attribute of interest, the frequency in R, the patient reference list, is calculated as
freq(t, R) = x(t, R) / |R|,
where x(t, R) is the count of patients in R whose records contain term t and |R| is the number of patients in R. The frequency freq(t, S) in S, the patient population reference list, is calculated in the same way. The odds ratio for each term t with respect to R and S may then be calculated as
odds(t) = freq(t, R) / freq(t, S),
where odds(t) is set to infinity when freq(t, S) = 0.
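The calculations at 240 and the ranking at 250 can be sketched directly from the formulas above. The sketch assumes each patient is represented by a set of terms (for example, the top-ranked weighted terms kept at 220); the names `PatientTerms`, `freq`, `odds`, and `rank_terms` are hypothetical. Note that, as defined here, odds(t) is a ratio of two proportions (what epidemiology would call a relative frequency or risk ratio) rather than a classical odds ratio of p/(1 - p) values; the sketch follows the document's definition.

```python
import math
from typing import Dict, List, Set, Tuple

# Patient ID -> set of terms retained for that patient.
PatientTerms = Dict[str, Set[str]]


def freq(term: str, patients: PatientTerms) -> float:
    """freq(t, X) = x(t, X) / |X|, the share of patients whose terms include t."""
    if not patients:
        return 0.0
    return sum(term in terms for terms in patients.values()) / len(patients)


def odds(term: str, reference: PatientTerms, population: PatientTerms) -> float:
    """odds(t) = freq(t, R) / freq(t, S), set to infinity when freq(t, S) = 0."""
    f_s = freq(term, population)
    if f_s == 0:
        return math.inf
    return freq(term, reference) / f_s


def rank_terms(
    reference: PatientTerms, population: PatientTerms, by: str = "odds"
) -> List[Tuple[str, float, float, float, float]]:
    """Rank every term seen in R.

    Each row is (term, freq in R, freq in S, frequency difference, odds).
    Rows are sorted by odds (by="odds") or by freq in R (any other value),
    highest first, with ties broken by term.
    """
    terms = set().union(*reference.values())
    rows = []
    for t in terms:
        f_r, f_s = freq(t, reference), freq(t, population)
        rows.append((t, f_r, f_s, f_r - f_s, odds(t, reference, population)))
    key_index = 4 if by == "odds" else 1
    return sorted(rows, key=lambda row: (-row[key_index], row[0]))


R = {"a": {"neuropathy", "chf"}, "b": {"neuropathy"}, "c": {"chf"}, "d": set()}
S = dict(R, e={"chf"}, f=set(), g=set(), h={"chf"})  # S may contain R's patients
for row in rank_terms(R, S):
    print(row)
```

Here S is a larger population that includes R, matching the text's example of using the entire population as the patient population reference list.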
In this exemplary embodiment, the automated semantic analysis system produces, at 250, a ranked list of terms that differentiate between the two input patient lists. At 260, the user may select specific terms to produce a new patient population reference list, and the process is repeated from step 160. The user selects terms based on a subjective review of the top-ranked differentiating terms according to the user's expertise and preferences. Once a term is selected, the system identifies all patients in the population whose records explicitly contain the term of interest. At this point, the user may choose a subset of the patients identified as having the term of interest as a new reference population, upon which a new term differential analysis can be performed.
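The iterative refinement at 260 can be sketched as a loop that narrows the population one selected term at a time. This is a hypothetical illustration: term selection, which the text describes as the user's subjective choice, is represented simply by a list of terms passed in, and pairing each narrowed set with the set it was drawn from as the comparison list is an assumption, since the text does not fix which list the new reference population is compared against.

```python
from typing import Dict, Iterable, List, Set, Tuple

PatientTerms = Dict[str, Set[str]]  # patient ID -> terms in that patient's record


def patients_with_term(population: PatientTerms, term: str) -> PatientTerms:
    """Keep only patients whose records explicitly contain the term."""
    return {pid: terms for pid, terms in population.items() if term in terms}


def refine(
    population: PatientTerms, selected_terms: Iterable[str]
) -> List[Tuple[PatientTerms, PatientTerms]]:
    """Return one (new reference, comparison) pair per refinement round.

    Assumption: each round compares the narrowed set against the set it was
    drawn from; each pair would then feed a new term differential analysis.
    """
    rounds = []
    current = population
    for term in selected_terms:
        narrowed = patients_with_term(current, term)
        rounds.append((narrowed, current))
        current = narrowed
    return rounds


population = {
    "p1": {"diabetes", "neuropathy"},
    "p2": {"diabetes"},
    "p3": {"diabetes", "neuropathy", "homeless"},
    "p4": {"chf"},
}
rounds = refine(population, ["diabetes", "neuropathy"])
print([sorted(ref) for ref, _ in rounds])  # diabetic patients, then diabetic patients with neuropathy
```

This mirrors the earlier example of finding diabetic patients with neuropathy within a larger diabetic population.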
Turning now to
After computation, the system displays the results in one panel, records all of the user's activities in a ‘History’ log, and displays that log in a different panel. The user may delete previous runs or designate specific runs as ‘Favorites’ by clicking on the star icon (⋆) for ease of future access. In the display panel, terms may be ranked based on frequency percentage or odds ratios in the reference list RPOI compared to the comparison list CPOI.
In an exemplary embodiment, in one analysis operation the system found that 55 patients (12%) in the RPOI were associated with the term ‘neuropathy.’ The odds that the term neuropathy appears in the records of the RPOI were 7.04 times higher than the odds of the term appearing in the records of the CPOI.
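Applying the definitions at 240 to this result gives a rough sense of the numbers involved. This is a back-of-the-envelope check, taking R as the RPOI and S as the CPOI, assuming that the reported 7.04 is freq(t, R)/freq(t, S) as defined above and that 12% is a rounded figure; the CPOI frequency below is implied by those assumptions, not reported.

$$
|R| \approx \frac{55}{0.12} \approx 458,
\qquad
\mathrm{freq}(\text{neuropathy}, S) = \frac{\mathrm{freq}(\text{neuropathy}, R)}{\mathrm{odds}(\text{neuropathy})} \approx \frac{0.12}{7.04} \approx 0.017
$$

Under these assumptions, roughly 1.7% of the patients in the CPOI would have records containing the term ‘neuropathy.’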
Turning now to
Turning now to
In this non-limiting example, odds ratios are computed for each highlighted attribute of interest that is present in the patient records. The frequency percentage is displayed to the user as freq(t, R). In this result, any highlighted terms quickly show which terms are not found at the same prevalence in the selected RPOI versus the CPOI utilized in the analysis.
Turning now to
Turning now to
While certain illustrative embodiments have been described, it is evident that many alternatives, modifications, permutations and variations will become apparent to those skilled in the art in light of the foregoing description. By way of example and not of limitation, machine learning and natural language processing algorithms can be developed to further filter the differentiating terms between two patient populations to suggest clinically impactful terms. In addition, further improvements to the system could automatically suggest probabilistically favorable courses of action based on the term differential analysis to support clinical decision-making for individual and population level health management.