1. Field of the Invention
The present invention is related to aggregating population data according to member similarity and more particularly to aggregating electronic health records from multiple data sources based on patient similarities.
2. Background Description
Healthcare digitization has produced voluminous data. Doctors' offices, which have been converting paper patient records to electronic records, collect new patient data in an electronic format, e.g., as electronic health records (EHR). EHRs make patient histories readily available, e.g., for making/supporting clinical decisions. Existing EHR data can facilitate subsequent patient diagnosis and treatment. Matching a new patient's symptoms and other characteristics to patient histories, to find patients with similar symptoms and characteristics, may provide the patient's doctor with an early diagnosis and suggest treatment. At the very least, it can narrow the potential diagnoses and treatments to a few likely candidates. However, while multiple patients may have the same diagnosis, no two people are identical, e.g., their symptoms and treatments may differ. Thus, complete matches are typically infrequent.
While finding complete matches in the voluminous, multi-dimensional data may be a relatively simple task, defining and finding similar cases can be much more complicated. The desired degree of similarity, for example, can complicate matching similar patient histories. Further, because the raw history data is collected by multiple health care providers, it may reside in multiple locations, in different databases/sources, and in multiple incompatible formats. The data formats may include, for example, International Classification of Diseases, Ninth Revision (ICD9) codes, Current Procedural Terminology (CPT) codes, National Drug Codes (NDC), laboratory (LAB) results, and clinical notes. These formats rely heavily on coding the data, both to categorize it quickly and to handle it efficiently.
However, the variety and variation of these codes can further complicate comparing data. Typically there is no one-to-one mapping between codes, making it more difficult to assess the relevance of the raw data, determine event timeliness, and determine, for each match, which coded events are more important than others. Missing data or mismatched codes may mask similarities. Noise in the raw data, e.g., unrelated symptoms, can further skew results. Moreover, once similar results are matched, those results are not an ultimate determination; that determination is typically made by a requesting physician. Currently, there is no mechanism that allows the requesting physician to provide feedback on similarity goodness based on the clinical intuition he/she uses to make a final diagnosis and prescribe an appropriate treatment.
Thus, there is a need for a way to identify similarities in patient histories and aggregate the results to reflect a global similarity.
A feature of the invention is a similarity measure for grouping members of a population based on member similarities;
Another feature of the invention is improved matching of medical patients with similar conditions based on patient similarities;
Another feature of the invention is improved matching of medical patients with similar conditions based on feedback from medical professionals with regard to previous groupings;
Yet another feature of the invention is a similarity measure for matching medical patients based on patient similarities, further honed by feedback from medical professionals with regard to previous groupings.
The present invention relates to a system, method and program product for matching members of a population, e.g., patients, based on member similarities. Patients are mapped to a bipartite graph in which patient nodes are connected by weighted edges to factor nodes, and the factor nodes are clustered categorically. When a new patient query is received, a similarity measure between the new patient and each other patient is generated for each cluster by comparing the patients' edges in that cluster. The cluster similarity measures are aggregated for each patient to provide a global closeness measure to every other patient. Based on the global closeness measure, a list of the closest patients is displayed, and feedback on the measurement may be provided.
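The mapping of patients to a bipartite graph of weighted patient-to-factor edges, with factor nodes grouped into categorical clusters, can be sketched in Python as below. This is a minimal sketch under assumptions not stated in this description: each coded event becomes a factor node keyed by (cluster, code), the cluster is simply the code type (e.g., CCS, CPT, NDC), and the edge weight is the number of times the patient is linked to that factor. The helper names `build_bipartite_graph`, `glucose_factor`, and `clusters`, and the glucose cut-offs `GLUCOSE_LOW` and `GLUCOSE_HIGH`, are hypothetical. They illustrate a logical low/normal/high factor node and are not clinical thresholds. The codes reuse the placeholder labels CCS.1, CPT.2, and NDC.2 that appear elsewhere in this description.

```python
from collections import defaultdict

# Hypothetical glucose cut-offs (mg/dL), for illustration only.
GLUCOSE_LOW, GLUCOSE_HIGH = 70.0, 100.0


def glucose_factor(value):
    """Discretize a glucose reading into a low/normal/high factor label."""
    if value < GLUCOSE_LOW:
        return "low"
    if value > GLUCOSE_HIGH:
        return "high"
    return "normal"


def build_bipartite_graph(records):
    """Map (patient, code_type, value) records to weighted patient->factor edges.

    Factor nodes are (cluster, label) pairs; the edge weight counts how often
    the patient is linked to the factor (an assumed weighting scheme).
    """
    graph = defaultdict(lambda: defaultdict(float))
    for patient, code_type, value in records:
        if code_type == "GLUCOSE":
            # Lab values become logical factor nodes in a LAB cluster.
            factor = ("LAB", "glucose_" + glucose_factor(float(value)))
        else:
            factor = (code_type, str(value))
        graph[patient][factor] += 1.0
    return {p: dict(edges) for p, edges in graph.items()}


def clusters(graph):
    """Group factor labels into categorical clusters by code type."""
    out = defaultdict(set)
    for edges in graph.values():
        for cluster, label in edges:
            out[cluster].add(label)
    return dict(out)


records = [
    ("x", "CCS", "1"), ("x", "CPT", "2"), ("x", "GLUCOSE", 130),
    ("y", "CCS", "1"), ("y", "NDC", "2"), ("y", "GLUCOSE", 128),
    ("y", "CCS", "1"),
]
graph = build_bipartite_graph(records)
print(graph["y"])
print(clusters(graph))
```

Keeping each factor as a (cluster, label) pair means the clustering falls out of the node key itself, so adding a new data source only requires a new code type.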
The foregoing and other objects, aspects and advantages will be better understood from the following detailed description of a preferred embodiment of the invention with reference to the drawings, in which:
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
Turning now to the drawings and, more particularly,
The similarity measurement module 102 determines a pairwise patient similarity score for a current patient against the histories of other individual patients, e.g., in storage 110, to identify similar conditions. In particular, the similarity measurement module 102 uses a general patient similarity measure, as set forth hereinbelow, for handling heterogeneous patient records. The similarity match module 104 searches the resulting similarity scores and retrieves the histories for the top-k similarity scores. The top-k similarity scores are returned, e.g., displayed 112, for a medical professional, e.g., a doctor, to select one or more similar patients, make a diagnosis for the current patient, and suggest treatment. The feedback module 106 receives feedback from experts, e.g., on the efficacy of the treatment selected, and incorporates it into the general patient similarity measure to further customize and hone the similarity match performed by the similarity measurement module 102.
So, as shown in the example of
The similarity measurement module 102 determines 124 a cluster similarity score, $s_1, s_2, \ldots, s_n$, between each new or requesting patient x and each other patient y, i.e., nodes 140-1-140-m, in each factor cluster 142-1-142-n. For example, if two patients x and y connect to a common factor f, the match result between x and y on f is 1; otherwise, it is 0, i.e., no match. This match result can be generalized to the weighted value $w_x \cdot w_y \cdot t$, where $w_x$ and $w_y$ are the edge weights from x and y, respectively, to f, and t is the type weight of f. A general example of determining a similarity measure between members of one population based on their connections to members of another population is described by J. Sun et al., “Neighborhood Formation and Anomaly Detection in Bipartite Graphs,” Fifth IEEE International Conference on Data Mining (ICDM), pp. 418-425, November 2005, the contents of which are incorporated herein by reference. Then, the similarity measurement module 102 combines 126 the cluster scores for each patient 140-1-140-m to provide a global similarity for each, $S_{x,y} = t_1 s_1 + t_2 s_2 + \cdots + t_n s_n = \sum_{i=1}^{n} t_i s_i$, where $t_1, \ldots, t_n$ are the weighting coefficients on the factors, $s_i$ is the match result of x and y on factor cluster i, and i ranges from 1 to n.
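The per-cluster scores $s_i$ and the global similarity $S_{x,y}$ can be sketched in Python as follows. This is an illustrative sketch, not the claimed implementation. It assumes each factor node is a (cluster, label) pair, that a cluster score $s_i$ is the sum of the weighted match results $w_x \cdot w_y \cdot t$ over the factors x and y share in that cluster (the description does not say how matches within a cluster are combined), and that a cluster with no entry in the hypothetical `cluster_weight` mapping gets $t_i = 1$. With unit edge and type weights, each shared factor contributes exactly 1, which recovers the binary match result.

```python
from collections import defaultdict


def cluster_scores(edges_x, edges_y, type_weight=None):
    """Per-cluster scores s_i for patients x and y.

    edges_x, edges_y: dict mapping factor (cluster, label) -> edge weight.
    Each shared factor f adds w_x * w_y * t_f (t_f defaults to 1.0);
    summing within a cluster is an assumption of this sketch.
    """
    type_weight = type_weight or {}
    scores = defaultdict(float)
    # Only factors connected to both patients contribute a match.
    for factor in edges_x.keys() & edges_y.keys():
        cluster = factor[0]
        scores[cluster] += edges_x[factor] * edges_y[factor] * type_weight.get(factor, 1.0)
    return dict(scores)


def global_similarity(edges_x, edges_y, cluster_weight, type_weight=None):
    """S_{x,y} = sum_i t_i * s_i; clusters missing from cluster_weight use t_i = 1."""
    s = cluster_scores(edges_x, edges_y, type_weight)
    return sum(cluster_weight.get(c, 1.0) * s_i for c, s_i in s.items())


x = {("CCS", "1"): 1.0, ("CPT", "2"): 1.0, ("NDC", "2"): 2.0}
y = {("CCS", "1"): 2.0, ("NDC", "2"): 0.5, ("NDC", "3"): 1.0}
t = {"CCS": 0.5, "CPT": 0.3, "NDC": 0.2}  # hypothetical coefficients t_i
print(cluster_scores(x, y))        # per-cluster s_i
print(global_similarity(x, y, t))  # S_{x,y}
```

Here x and y share CCS.1 (contributing 1.0 × 2.0 = 2.0) and NDC.2 (2.0 × 0.5 = 1.0), so $S_{x,y} = 0.5 \cdot 2.0 + 0.2 \cdot 1.0 = 1.2$; the CPT cluster contributes nothing because y has no CPT factors.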
In this example, the factor clusters 142-1-142-n are categories for the individual factor nodes and include a diagnosis code cluster 142-1, e.g., Clinical Classifications Software (CCS) codes; a procedure code (CPT) cluster 142-2; and a drug code (NDC) cluster 142-n. Individual factor nodes can also indicate symptoms, represent a temporal logical sequence, or be very general (e.g., logical) indicators. For example, factor nodes can indicate a glucose level as normal, low, or high. In another example, a factor node can indicate the logical sequence “CCS.1 follows with (CPT.2 and NDC.2).” For each cluster 142-1-142-n, the similarity measurement module 102 determines the cluster similarity 124 of requesting patient x with each existing patient y 140-1-140-m based on the correlation of factors between the two patients x and y. Optionally, instead of using a weighted similarity approach to arrive at similarity measurements, a random walk approach, also described by Sun et al., may be used. The similarity measurement module 102 stores 128 the global similarity measure $S_{x,y}$, e.g., in storage 110, for use by the similarity match module 104.
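As an alternative to the weighted match, a random walk approach can be illustrated with a generic random walk with restart (RWR) on the bipartite graph. This is a sketch in the spirit of Sun et al., not their exact formulation. The function name `random_walk_scores`, the restart probability of 0.15, and the fixed iteration count are assumptions. The walker starts at the requesting patient, follows edges with probability proportional to their (assumed positive) weights, and jumps back to the start with probability `restart`. A patient's steady-state visiting probability then serves as its closeness to the query.

```python
def random_walk_scores(graph, query, restart=0.15, iters=200):
    """Random walk with restart from patient `query` over the bipartite graph.

    graph: dict patient -> {factor: positive edge weight}. Returns the
    approximate steady-state visiting probability of every other patient
    (higher means closer to the query patient).
    """
    # Undirected adjacency over patient nodes ("P", id) and factor nodes ("F", f).
    adj = {}
    for p, edges in graph.items():
        for f, w in edges.items():
            adj.setdefault(("P", p), {})[("F", f)] = w
            adj.setdefault(("F", f), {})[("P", p)] = w
    start = ("P", query)
    if start not in adj:  # query patient has no edges: nothing is reachable
        return {p: 0.0 for p in graph if p != query}
    r = {node: 0.0 for node in adj}
    r[start] = 1.0
    for _ in range(iters):  # power iteration
        nxt = {node: 0.0 for node in adj}
        for u, mass in r.items():
            if mass == 0.0:
                continue
            total = sum(adj[u].values())
            for v, w in adj[u].items():
                nxt[v] += (1.0 - restart) * mass * w / total
        nxt[start] += restart  # restart mass returns to the query patient
        r = nxt
    # Patients with no edges never receive mass and score 0.
    return {p: r.get(("P", p), 0.0) for p in graph if p != query}


graph = {
    "x": {"A": 1.0, "B": 1.0},
    "y": {"A": 1.0, "B": 1.0},  # shares both factors with x
    "z": {"B": 1.0, "C": 1.0},  # shares one factor with x
    "w": {"C": 1.0},            # reachable from x only through z
}
scores = random_walk_scores(graph, "x")
print(sorted(scores, key=scores.get, reverse=True))
```

Unlike the direct weighted match, the walk also gives patient w a small nonzero score even though it shares no factor with x, because it is reachable through patient z.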
The similarity match module 104 searches, retrieves, and displays 130 the similarity scores $S_{x,1}, \ldots, S_{x,m}$ for similarity matches. Matches may be selected as the top-k similarity scores, where k is some number between 1 and m, the number of matched patients. Further, k can be selected, for example, by default or upon request. The similarity match module 104 retrieves and presents 130 the matching similarity scores, e.g., displaying 112 the matches for a medical professional, such as a nurse or a doctor. The medical professional can review the displayed results, either the individual scores $S_{x,1}, \ldots, S_{x,m}$ or only the selected similarity matches. The medical professional may further review, for example, the efficacy of the treatment selected and/or the similarity to patient y or to the group of patients, and provide feedback 132 based on that review.
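Selecting the top-k matches from the stored scores $S_{x,1}, \ldots, S_{x,m}$ can be sketched with Python's standard `heapq.nlargest`, which avoids sorting all m scores. The function name `top_k_matches`, the default of k = 5, and the tie-breaking by patient id are assumptions for illustration; k is clamped to the range 1 to m described above.

```python
import heapq


def top_k_matches(scores, k=5):
    """Return the k (patient, S_{x,y}) pairs with the highest scores.

    scores: dict patient_id -> global similarity S_{x,y} for one requesting
    patient x. k is clamped to 1..m, where m is the number of candidates.
    Ties are broken by patient id so the result is deterministic.
    """
    k = max(1, min(k, len(scores)))
    return heapq.nlargest(k, scores.items(), key=lambda item: (item[1], item[0]))


scores = {"p1": 0.4, "p2": 1.2, "p3": 0.9, "p4": 0.1}
print(top_k_matches(scores, k=2))  # [('p2', 1.2), ('p3', 0.9)]
```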
The feedback module 106 receives feedback from experts for incorporation into the general patient similarity measure, e.g., including/excluding certain data sources and varying the weights for each. So, for example, using a typical graphical user interface (GUI), the medical professional can select individual factor nodes or clusters for exclusion from the similarity measure $S_{x,y}$. The medical professional can also adjust both edge weights and factor weights. Based on this feedback 132, the similarity measurement module 102 regenerates the global similarity measures $S_{x,1}, \ldots, S_{x,m}$ for the patient x.
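Feedback handling of this kind (excluding factor nodes or clusters, adjusting edge and cluster weights, then regenerating the scores for patient x) might look like the following Python sketch. The functions `apply_feedback` and `rescore` and their parameters are hypothetical names. For brevity, `rescore` computes $S_{x,y} = \sum_i t_i s_i$ with per-factor type weights fixed at 1. The GUI itself is out of scope; the feedback arrives as plain Python values.

```python
def apply_feedback(graph, cluster_weight, excluded_factors=(), excluded_clusters=(),
                   edge_updates=None, weight_updates=None):
    """Return a new (graph, cluster_weight) pair reflecting expert feedback.

    graph: dict patient -> {(cluster, label): edge weight}.
    excluded_factors / excluded_clusters: factor nodes or clusters to drop.
    edge_updates: {(patient, factor): new edge weight}.
    weight_updates: {cluster: new coefficient t_i}.
    """
    edge_updates = edge_updates or {}
    new_graph = {}
    for p, edges in graph.items():
        new_graph[p] = {
            f: edge_updates.get((p, f), w)
            for f, w in edges.items()
            if f not in excluded_factors and f[0] not in excluded_clusters
        }
    new_weights = {c: t for c, t in cluster_weight.items() if c not in excluded_clusters}
    new_weights.update(weight_updates or {})
    return new_graph, new_weights


def rescore(graph, cluster_weight, x):
    """Regenerate S_{x,y} = sum_i t_i * s_i for every other patient y."""
    scores = {}
    for y, edges_y in graph.items():
        if y == x:
            continue
        shared = graph[x].keys() & edges_y.keys()
        scores[y] = sum(cluster_weight.get(f[0], 1.0) * graph[x][f] * edges_y[f] for f in shared)
    return scores


graph = {
    "x": {("CCS", "1"): 1.0, ("NDC", "2"): 1.0},
    "y": {("CCS", "1"): 1.0},
    "z": {("NDC", "2"): 1.0},
}
t = {"CCS": 1.0, "NDC": 1.0}
print(rescore(graph, t, "x"))  # before feedback
# Expert drops the drug-code cluster and doubles the diagnosis-code weight.
g2, t2 = apply_feedback(graph, t, excluded_clusters={"NDC"}, weight_updates={"CCS": 2.0})
print(rescore(g2, t2, "x"))    # after feedback
```

Returning new structures rather than mutating the originals keeps the pre-feedback graph available, so a professional's changes can be compared against or rolled back to the earlier scores.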
Thus, advantageously, a preferred system 100 handles multiple data sources and incorporates expert feedback to arrive at the best selection of similar patients. The preferred similarity measurement module leverages the flexibility of a preferred factor graph model to selectively add or remove features or data sources from consideration. The factor graph model also enables varying weighting coefficients on different features. Optimal weighting coefficients may be determined by solving a classification problem on all pairs of patients, with experts labeling each pair positively or negatively.
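The description does not name a classifier. One plausible reading, sketched below in Python, treats each patient pair's per-cluster scores $[s_1, \ldots, s_n]$ as a feature vector and each expert label (1 = similar, 0 = not) as the target, then fits logistic regression by full-batch gradient descent. The learned weights play the role of $t_1, \ldots, t_n$. Logistic regression, the learning rate, the epoch count, and the toy data are all assumptions. The bias term has no counterpart in $S_{x,y}$, and learned weights can be negative, so a deployment might clip them.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def learn_cluster_weights(pairs, labels, lr=0.5, epochs=2000):
    """Fit coefficients t_1..t_n by logistic regression on labeled patient pairs.

    pairs: list of [s_1, ..., s_n] per-cluster scores, one row per patient pair.
    labels: 1 if an expert labeled the pair positively (similar), else 0.
    Returns (weights, bias), trained with full-batch gradient descent.
    """
    n, m = len(pairs[0]), len(pairs)
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for s, label in zip(pairs, labels):
            # Gradient of the log-loss is (prediction - label) * feature.
            err = sigmoid(b + sum(wi * si for wi, si in zip(w, s))) - label
            for i in range(n):
                grad_w[i] += err * s[i]
            grad_b += err
        w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


# Toy, made-up data: cluster 0 tracks the expert label, cluster 1 does not.
pairs = [[1.0, 0.2], [0.9, 0.8], [0.8, 0.1], [0.1, 0.9], [0.2, 0.3], [0.0, 0.7]]
labels = [1, 1, 1, 0, 0, 0]
weights, bias = learn_cluster_weights(pairs, labels)
print(weights, bias)
```

On this toy data the informative cluster receives the larger coefficient, which is the behavior the weighting step is meant to capture; with real EHR pairs, the labels would come from the expert feedback 132.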
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a,” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.