An example embodiment of the present invention relates generally to the association, e.g., matching, of patient identifying information and, more particularly, to the association of patient identifiers utilizing principal component analysis.
Many patients have a variety of healthcare records maintained by the same or different healthcare providers. In this regard, each healthcare provider may maintain its own records of the patient's visits, treatments and the like. Each patient record generally includes identifiers, e.g., information, identifying the patient, such as by name, address or other demographic information.
In some instances, the healthcare records maintained by different healthcare providers may be reviewed in order to identify healthcare records of the different healthcare providers that are associated with the same patient. For example, a comprehensive healthcare record of a patient may be established by collecting the healthcare records of the patient maintained by the various healthcare providers. In order to ensure that the healthcare records are associated with the same patient, a number of identifiers that are associated with the respective patient may be reviewed and algorithmically matched. Various algorithmic matching techniques may be utilized including, for example, the determination of a match score for patient name similarity based on edit distance or a match score based on components of the address of the patient.
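By way of illustration only, an edit-distance match score of the kind mentioned above may be sketched as follows. The function names `edit_distance` and `name_match_score`, the use of Levenshtein distance and the normalization by the longer name length are assumptions made for this sketch and are not taken from any particular matching system.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character of a
                            curr[j - 1] + 1,      # insert a character of b
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def name_match_score(a: str, b: str) -> float:
    """Hypothetical score in [0, 1]; 1.0 means identical after case folding."""
    a, b = a.casefold(), b.casefold()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest
```

Each call costs time proportional to the product of the two name lengths, and a score of this kind must be computed for every pair of records under consideration.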
While such approaches may permit healthcare records of a patient to be matched in terms of being associated with the same patient, algorithmic matching techniques generally do not scale well to large data sets and may disadvantageously require substantial processing. Additionally, each set of identifying information of a patient that is considered requires separate algorithmic processing and weighting relative to the other identifiers that are considered, thereby further increasing the processing requirements and further reducing the scalability to large data sets.
A method, apparatus and computer program product are provided in accordance with one embodiment to associate patient identifying information, such as patient identifiers, utilizing principal component analysis. By defining a higher dimensional space formed by the various components of patient identifying information, this identifying information may be matched such that patient records related to the same patient may be identified. By reducing the dimensionality of the higher dimensional space through creation of component scores, the method, apparatus and computer program product of an example embodiment may narrow the complete data set to a smaller set to which more thorough algorithmic methods may be applied to associate, e.g., match, patient identifying information, such as patient identifiers, in a manner that is efficient in terms of the requisite processing resources and is readily scalable to large data sets.
In one embodiment, a method is provided that includes, for each of a plurality of patient identifying information, such as patient identifiers, determining a set of vectors representative of a plurality of components of a respective patient identifying information. The method of this embodiment also performs a principal component analysis of each set of vectors and compares results of the principal component analysis of each set of vectors. The method also determines whether two or more of the patient identifying information, such as patient identifiers, are associated with a same or similar patient based upon a comparison of the results of the principal component analysis of each set of vectors. In another embodiment, an apparatus comprising processing circuitry configured to perform comparable functionality is provided. In a further embodiment, a computer program product including at least one non-transitory computer-readable storage medium having computer-executable program code instructions stored therein that include program code instructions configured to perform comparable functionality is also provided.
Having thus described certain embodiments of the invention in general terms, reference will now be made to the accompanying drawings, which are not necessarily drawn to scale, and wherein:
The present invention now will be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all embodiments of the inventions are shown. Indeed, these inventions may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements. Like numbers refer to like elements throughout.
A method, apparatus and computer program product are provided in order to permit patient identifying information to be associated. Although described hereinbelow in conjunction with patient identifiers, the method, apparatus and computer program product of example embodiments of the present invention may also work with other types of patient identifying information, e.g., name, date of birth, zip code, etc. As such, the method, apparatus and computer program product of an example embodiment may permit healthcare records associated with the same patient to be matched based upon the patient identifiers of the various healthcare records. As described below, the method, apparatus and computer program product may utilize principal component analysis in order to permit the patient identifiers to be associated in a manner that is efficient in terms of the processing resources required. As such, the method, apparatus and computer program product of an example embodiment are more readily scalable to a large data set.
A computing device 10 may provide for the association of patient identifiers utilizing principal component analysis in accordance with an example embodiment of the present invention. The computing device may be embodied by one or more servers, computer workstations or the like. Regardless of the type of computing device, the computing device may be a centralized computing device or a distributed computing device. However, one example of a computing device is depicted in
As shown in
The communication interface 18 may include one or more interface mechanisms for enabling communication with the other entities. In this regard, the communication interface may include, for example, an antenna (or multiple antennas) and supporting hardware and/or software for enabling the communications, such as secure communications as noted above.
In an example embodiment, the memory 16 may include one or more non-transitory memory devices such as, for example, volatile and/or non-volatile memory that may be either fixed or removable. The memory may be configured to store information, data, applications, instructions or the like for enabling the computing device 10 to carry out various functions in accordance with example embodiments of the present invention. For example, the memory could be configured to buffer input data for processing by the processor 14. Additionally or alternatively, the memory could be configured to store instructions for execution by the processor.
The processor 14 may be embodied in a number of different ways. For example, the processor may be embodied as various processing means such as one or more of a microprocessor or other processing element, a coprocessor, a controller or various other computing or processing devices including integrated circuits such as, for example, an ASIC (application specific integrated circuit), an FPGA (field programmable gate array), or the like. In an example embodiment, the processor may be configured to execute instructions stored in the memory 16 or otherwise accessible to the processor. As such, whether configured by hardware or by a combination of hardware and software, the processor may represent an entity (e.g., physically embodied in circuitry, in the form of processing circuitry) specifically configured to perform operations according to embodiments of the present invention while configured accordingly. Thus, for example, when the processor is embodied as an ASIC, FPGA or the like, the processor may be specifically configured hardware for conducting the operations described herein. Alternatively, as another example, when the processor is embodied as an executor of software instructions, the instructions may specifically configure the processor to perform the operations described herein.
Referring now to
The computing device 10, such as the processing circuitry 12, e.g., the processor 14, may be configured to determine a set of vectors in various manners. With regard to a set of vectors associated with a patient's name, for example, the patient's first name may be represented by a 26-element vector with each element associated with a respective letter of the alphabet and the value of each element representative of the number of occurrences of the respective letter of the alphabet within the patient's first name. Other vectors of the same set of vectors may be determined in the same fashion for the patient's middle name and the patient's last name in accordance with this example embodiment. This set of vectors is provided by way of example, however, and the computing device, such as the processing circuitry, e.g., the processor, may represent the various components of a respective patient identifier with different types of vectors in other embodiments. By way of another example, bigram vectors may be constructed for each of a plurality of patient identifiers, e.g., first name, middle name, last name, date of birth, etc.
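By way of illustration only, the letter-count and bigram representations described above may be sketched as follows. The function names, the folding of names to lower case and the skipping of characters outside the 26-letter alphabet are assumptions made for this sketch.

```python
import string
from collections import Counter
from typing import List

ALPHABET = string.ascii_lowercase  # the 26 letters, 'a' through 'z'


def letter_count_vector(name: str) -> List[int]:
    """26-entry vector; entry i counts occurrences of the i-th letter in name."""
    counts = Counter(ch for ch in name.lower() if ch in ALPHABET)
    return [counts[ch] for ch in ALPHABET]


def name_vectors(first: str, middle: str, last: str) -> List[List[int]]:
    """One letter-count vector per name component (first, middle, last)."""
    return [letter_count_vector(part) for part in (first, middle, last)]


def bigram_counts(text: str) -> Counter:
    """Counts of adjacent character pairs, e.g. 'anna' -> an, nn, na."""
    cleaned = text.lower()
    return Counter(cleaned[i:i + 2] for i in range(len(cleaned) - 1))
```

For instance, `letter_count_vector("Anna")` has the value 2 in the positions for "a" and "n" and 0 elsewhere. A missing middle name simply yields an all-zero vector, so every record produces a set of vectors of the same shape.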
The set of vectors representative of a plurality of components of a respective patient identifier may then be simplified by being decomposed into principal components. As shown in block 22 of
The principal component analysis may be performed on the vectors representative of a plurality of components of all of the patient identifiers. Alternatively, the principal component analysis may be performed on the vectors representative of components of a subset of the patient identifiers, such as the patient identifiers that are most variable across the patient population and that accordingly contribute the most to the unique identification of a patient. For example, the feature space may include vector representations of name components, the edit distance of those name components, date of birth, edit distance of date of birth, etc. In this embodiment, the computing device 10, such as the processing circuitry 12 and, more particularly, the processor 14, may be configured to calculate a mean set of these features and may then rapidly calculate the difference from the mean for each feature. As such, the vector representations may be transformed to a lower-dimensional set of constituent features describing the greatest variation in the underlying data. Once obtained through principal component analysis, these constituent features can be compared more quickly than the underlying data.
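By way of illustration only, the mean-centering and dimensionality reduction described above may be sketched as follows. The sketch assumes that each record has already been flattened into one numeric feature row, and it extracts the leading components by power iteration with deflation. That technique, the function names and the parameter `k` (the number of components retained) are assumptions made for this sketch; an embodiment may instead rely on a numerical library.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def center(rows: Matrix) -> Tuple[Matrix, List[float]]:
    """Subtract the per-feature mean from every row."""
    n = len(rows)
    mean = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, mean)] for row in rows], mean


def covariance(centered: Matrix) -> Matrix:
    """Sample covariance matrix of centered rows (needs at least two rows)."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov: Matrix, iters: int = 500, seed: int = 0):
    """Dominant eigenvector and eigenvalue of cov by power iteration."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in cov]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left to explain
            break
        v = [x / norm for x in w]
    cov_v = [sum(c * x for c, x in zip(row, v)) for row in cov]
    eigenvalue = sum(a * b for a, b in zip(v, cov_v))
    return v, eigenvalue


def pca(rows: Matrix, k: int):
    """Return (mean, k principal components, per-row component scores)."""
    centered, mean = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(k):
        v, lam = top_component(cov)
        components.append(v)
        # Deflation: remove the variance already explained by v.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    scores = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
              for row in centered]
    return mean, components, scores
```

Each record is then described by only `k` component scores rather than by its full feature row, which is what makes the subsequent comparison inexpensive.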
As shown in block 24 of
Thereafter, the computing device 10, such as the processing circuitry 12 and, more particularly, the processor 14, may be configured to determine whether two or more of the patient identifiers and, therefore, two or more of the healthcare records with which the patient identifiers are associated, are associated with the same patient based upon the comparison of the results of the principal component analysis of each set of vectors representative of the plurality of components of the respective patient identifiers of the different healthcare records. See block 26 of
As such, the computing device 10 and, therefore, the method, apparatus and computer program product of an example embodiment embodied by the computing device may identify two or more healthcare records that are associated with the same patient based upon an analysis of the patient identifiers of the healthcare records and, more particularly, based upon the determination and comparison of a set of vectors representative of a plurality of components of the patient identifier associated with each healthcare record. By utilizing principal component analysis, healthcare records associated with the same patient may be identified in a manner that is efficient in terms of the processing resources required for such a determination and, as such, may be more readily scalable to large data sets.
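By way of illustration only, the comparison of component scores and the resulting match determination may be sketched as follows. The use of Euclidean distance, the `threshold` parameter and the function names are assumptions made for this sketch; the exhaustive pairwise loop is shown for clarity only, and a large data set would call for sorting or bucketing the scores first.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def euclidean(a: List[float], b: List[float]) -> float:
    """Straight-line distance between two component-score vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def candidate_matches(scores: Dict[str, List[float]],
                      threshold: float) -> List[Tuple[str, str, float]]:
    """Pairs of record ids whose component scores lie within threshold."""
    pairs = []
    for (id_a, s_a), (id_b, s_b) in combinations(scores.items(), 2):
        distance = euclidean(s_a, s_b)
        if distance <= threshold:
            pairs.append((id_a, id_b, distance))
    return sorted(pairs, key=lambda pair: pair[2])  # closest pairs first
```

Pairs returned by such a comparison may be treated as candidates for the same patient and, where desired, confirmed by a more thorough algorithmic match on the underlying identifiers.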
As noted above,
Accordingly, blocks of the flowchart support combinations of means for performing the specified functions and combinations of operations for performing the specified functions. It will also be understood that one or more blocks of the flowchart, and combinations of blocks in the flowchart, can be implemented by special purpose hardware-based computer systems which perform the specified functions, or combinations of special purpose hardware and computer instructions. In some embodiments, certain ones of the operations above may be modified or further amplified and additional optional operations may be included. It should be appreciated that each of the modifications, optional additions or amplifications below may be included with the operations above either alone or in combination with any others among the features described herein.
Many modifications and other embodiments of the inventions set forth herein will come to mind to one skilled in the art to which these inventions pertain having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the inventions are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.
This application claims the benefit of U.S. Provisional Application No. 61/751,606, filed Jan. 11, 2013, which is incorporated by reference herein in its entirety.
Number | Date | Country
---|---|---
61751606 | Jan 2013 | US