Healthcare-associated infections (HAIs) are patient-acquired infections received during healthcare treatment for conditions unrelated to the infection. A healthy patient entering a hospital for a surgical procedure to repair a hernia who subsequently develops a staph infection at the surgical site while in the recovery ward is an example of a patient-acquired HAI. HAIs in the medical literature are often referred to as nosocomial infections. According to a survey conducted by the CDC in 2011, approximately 1 out of every 25 patients hospitalized will contract an HAI. The study estimated that there were approximately 721,000 HAIs. HAIs cause or contribute to approximately 75,000 deaths each year.
Nosocomial infections can cause severe pneumonia and infections of the urinary tract, bloodstream and other parts of the body. Many types are difficult to treat with antibiotics, and antibiotic resistance is spreading to Gram-negative bacteria that can infect people outside the hospital. In the USA, the most frequent type of infection hospital-wide is pneumonia (21.8%), followed by surgical site infection (21.8%), and gastrointestinal infection (17.1%). (Magill S S, Edwards J R, Bamberg W, et al. “Multistate Point-Prevalence Survey of Health Care-Associated Infections,” N Engl J Med 2014;370:1198-208.)
According to a 2009 report by the CDC, HAIs cost U.S. hospitals approximately $35 billion per year. Much of the cost is related to longer patient stays, quarantining parts of the hospital, and discovering and eradicating the source of infection. Approximately 25.6% of HAIs are believed to be caused by medical devices such as catheters and ventilators. The remaining infections are believed to be associated with surgical procedures and other sources within the hospital. (Scott II, R. D., “The Direct Medical Costs of Healthcare-Associated Infections in U.S. Hospitals and the Benefits of Prevention,” CDC, March 2009.)
As genetic sequencing technology becomes more widely available, it is becoming more feasible to collect samples from patients to sequence genetic information. This genetic information may be from infection causing pathogens, patient tissue, or other sources. Similarities found across large data sets may be used to draw conclusions about the nature of the organism from which the genetic information was derived. However, misclassification of sequences in the large data sets may skew results. Furthermore, due to the massive data contained in genetic sequences, medical staff can become overwhelmed by the information and be unable to act on it.
According to an illustrative embodiment of the invention, a method may include accessing a sequence of a sample isolate in a memory accessible by at least one processing unit; comparing, with the at least one processing unit, the sequence of the sample isolate to at least one reference sequence of a reference isolate stored in a database accessible to the processor to determine variants between the sample isolate sequence and the at least one reference sequence; calculating an evolutionary distance between the sample isolate and the at least one reference sequence, based at least in part, on the variants; determining whether the sample isolate is deviant from the at least one reference sequence with the at least one processing unit based at least in part on the evolutionary distance; and storing the sequence of the sample isolate in the memory with a flag if the sample isolate is deviant, wherein the flag may indicate that the sequence of the sample isolate requires further analysis. The method may further include analyzing the flagged sequence of the sample isolate for contaminants. The sample isolate may be determined to be deviant from the at least one reference sequence if the evolutionary distance is above a desired threshold value.
According to an illustrative embodiment of the invention, a method may include comparing, with at least one processing unit, a sequence of a sample isolate stored in a memory accessible by the at least one processing unit to at least one reference sequence stored in a database accessible to the at least one processing unit to determine variants between the sample isolate sequence and the at least one reference sequence; determining, with the at least one processing unit, an evolutionary distance of the sample isolate from the at least one reference sequence, based at least in part, on the variants; calculating a probability that the sample isolate is deviant from the at least one reference isolate, based at least in part, on a difference of the evolutionary distance of the sample isolate and the distribution of evolutionary distances of the plurality of sequences; determining that the sample isolate is deviant from the at least one reference isolate may be responsive to the probability being above a desired threshold value; and flagging the sample isolate in memory, wherein flagging may indicate that the sequence of the sample isolate may require further analysis. The determination of whether the sample isolate is deviant may be based, at least in part, on whether the evolutionary distance of the infection isolate falls within a desired confidence interval of the distribution of evolutionary distances of the plurality of sequences.
According to an illustrative embodiment of the invention, a system may include a processing unit, a memory accessible to the processing unit, a database accessible to the processing unit, and a display coupled to the processing unit, wherein the processing unit may be configured to compare a sequence of a sample isolate stored in the memory to at least one reference sequence stored in the database to determine variants between the sample isolate sequence and the at least one reference sequence, calculate an evolutionary distance of the sample isolate, based at least in part, on the variants, compare the evolutionary distance of the sample isolate to an evolutionary distance of the at least one reference sequence, determine that the sample isolate sequence is deviant from the at least one reference sequence responsive to a difference of the evolutionary distance of the sample isolate and the evolutionary distance of the at least one reference sequence exceeding a desired threshold value, store the sample isolate sequence with a flag in the memory if determined to be deviant, wherein the flag may indicate that the sequence of the sample isolate may require further analysis. The system may further include a computer system accessible to the processing unit, wherein the processing unit may be configured to provide the determination of whether the sample isolate is deviant. The system may further include a sequencing unit that may be configured to provide the sequence of the sample isolate to the memory.
The following description of certain exemplary embodiments is merely exemplary in nature and is in no way intended to limit the invention or its applications or uses. In the following detailed description of embodiments of the present systems and methods, reference is made to the accompanying drawings which form a part hereof, and in which are shown by way of illustration specific embodiments in which the described systems and methods may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the presently disclosed systems and methods, and it is to be understood that other embodiments may be utilized and that structural and logical changes may be made without departing from the spirit and scope of the present system.
The following detailed description is therefore not to be taken in a limiting sense, and the scope of the present system is defined only by the appended claims. The leading digit(s) of the reference numbers in the figures herein typically correspond to the figure number, with the exception that identical components which appear in multiple figures are identified by the same reference numbers. Moreover, for the purpose of clarity, detailed descriptions of certain features will not be discussed when they would be apparent to those with skill in the art so as not to obscure the description of the present system.
Medical facilities may analyze genetic information such as genetic sequences for a variety of applications. An example application of analyzing genetic sequences is infection control. An infection may be caused by a pathogen such as bacteria, a virus, a fungus, a parasite, or other organism. Some infections may be caused by multiple types of organisms present at the same time. In some instances, an infection may be transmitted between two living organisms. In other instances, an infection may be transmitted to a living organism from a non-living specimen.
Hospitals and other health care facilities often have a baseline level of HAIs. Despite stringent infection control protocols, pathogens may still be present in the facility. Infection control staff may monitor the baseline level of HAIs to watch for signs of outbreaks and/or changes in virulence of HAIs. An outbreak occurs when a large number of patients acquire HAIs in a short period of time. An outbreak may be caused by a new source of infection or a change in virulence of a previously present pathogen. When an outbreak occurs, infection control staff may attempt to determine whether the HAIs are from a single source and whether the source or sources are inside or outside the facility. This may allow them to determine how to prevent new patients from acquiring an HAI.
When an outbreak is suspected, samples may be collected by medical staff from patients, surfaces, food, equipment, or other suspected sources. Medical staff may also collect samples on a routine basis as part of regular HAI monitoring. Samples may include tissue, blood, water, and swabs of surfaces. The samples may then be processed to isolate the pathogen causing the infection from other materials in the sample. The infection isolate and/or other isolate of interest may then be analyzed by a variety of methods. The analysis may determine the pathogen type, species, drug resistance, and/or other properties. If a large number of samples are collected, the infection control staff may have difficulty finding patterns or analysis errors in the collected data. Overlooking patterns or using erroneous data may cause the infection control staff to draw improper conclusions about the source of an HAI.
For example, an outbreak of staph infections may occur in a burn ward of a hospital. The medical staff collects samples from the patients for analysis. If one patient's sample was contaminated by a non-sterile sample receptacle, that patient may be misclassified as having an infection contracted by a different source than the rest of the patients. The infection control staff may waste time and resources searching erroneously for a second infection source. In another example, one patient may have a more virulent strain of staph infection, even though the patient was infected by the same source. The change in virulence may be caused by a genetic mutation in the infection. This change in virulence may be overlooked or the patient may be misclassified as above as having an infection from a different source than the rest of the patients.
By collecting an infection isolate from a sample and analyzing its genetic sequence, it may be possible to determine a source of infection, virulence of the infection, and species of the pathogen causing the infection by using phylogenetic methods. An isolate is a component of the sample that includes genetic information from an organism of interest. In addition to infection sourcing, phylogenetic methods may also be used to find samples that may be contaminated or were incorrectly identified by a previous analysis. Phylogenetics is the study of evolutionary relationships between organisms. Phylogenetic methods analyze all or a portion of a genetic sequence of an organism. By determining an evolutionary history of an infection, it may be possible to provide an understanding of how different incidents of an infection are or are not related. For example, the sequences of infection isolates from multiple infected patients may be compared. It may be possible to determine that one or more of the patients are infected by a different strain of bacteria or if one or more patients have a more virulent strain of the bacteria.
Multiple phylogenetic methods exist, including methods based on evolutionary distances, parsimony, and maximum likelihood. In distance-based methods, an evolutionary distance is calculated between each pair of organisms. The evolutionary distance is calculated based on the degree of similarity between genetic sequences of organisms. Differences between two sequences are often referred to as variants. The fewer variants between sequences, the smaller the evolutionary distance between the organisms. One such method for determining evolutionary distances is the Jukes-Cantor method (T. H. Jukes, C. R. Cantor, “Evolution of protein molecules,” in Mammalian protein metabolism, Vol. III (1969), pp. 21-132, edited by M. N. Munro), in which the transition from any particular letter in the genome to another occurs with the same probability:
In Equation 1, above, the instantaneous rate matrix Q represents the rates of change between a pair of nucleotides per instant of time. The probability transition matrix P is given as
$$P(t) = e^{Qt} \qquad \text{(Equation 2)}$$
As a result, the evolutionary distance between any two organisms under this model is simply:
$$d_{ab} = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right) \qquad \text{(Equation 3)}$$
Here, p is the proportion of sites along the DNA, or of single nucleotide polymorphism (SNP) positions, that differ between the two sequences. The distance goes to infinity as p approaches the equilibrium value (75% of sites differ). This simple model, however, does not take into account the biological consideration that transitions (purine to purine (a-g) or pyrimidine to pyrimidine (t-c)) and transversions (purine to pyrimidine or vice versa) occur at different rates. Another distance model, the Kimura 2-parameter model (Kimura, Motoo. “A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences.” Journal of molecular evolution 16.2 (1980): 111-120), attempts to correct for this. In this case:
$$d = -\frac{1}{2}\ln\left[(1 - 2p - q)\sqrt{1 - 2q}\right] \qquad \text{(Equation 4)}$$
Here, p is the proportion of sites that differ by a transition and q is the proportion of sites that differ by a transversion.
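For illustration only, the two distance formulas of Equations 3 and 4 may be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the sequences are assumed to be pre-aligned, of equal length, and limited to the upper-case letters A, C, G, and T, and all function names are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_proportions(seq_a, seq_b):
    """Return (transition proportion, transversion proportion).

    Assumption: the sequences are already aligned, have equal length,
    and contain only the letters A, C, G, T.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    transitions = transversions = 0
    for x, y in zip(seq_a, seq_b):
        if x == y:
            continue
        # purine<->purine or pyrimidine<->pyrimidine is a transition
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    n = len(seq_a)
    return transitions / n, transversions / n


def jukes_cantor_distance(p):
    """Equation 3, where p is the proportion of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75); the distance diverges at 0.75")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def kimura_2p_distance(p, q):
    """Equation 4, where p = transition and q = transversion proportion."""
    a = 1.0 - 2.0 * p - q
    b = 1.0 - 2.0 * q
    if a <= 0.0 or b <= 0.0:
        raise ValueError("distance is undefined for these proportions")
    return -0.5 * math.log(a * math.sqrt(b))


# Example: one transition (A->G) and one transversion (C->A) in ten sites.
ts, tv = substitution_proportions("ACGTACGTAC", "GCGTACGTAA")
d_jc = jukes_cantor_distance(ts + tv)
d_k2p = kimura_2p_distance(ts, tv)
```

Both corrected distances are larger than the raw proportion of differing sites, because each model attempts to account for multiple substitutions at the same site.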
Once sample isolate sequences have been compared to determine their evolutionary distances, rates of evolution may be determined. The evolutionary distances and relationships between isolates from samples may then be plotted in graphical form, such as a tree plot. Neighbor Joining (Saitou N, Nei M. “The neighbor-joining method: a new method for reconstructing phylogenetic trees.” Molecular Biology and Evolution, volume 4, issue 4, pp. 406-425, July 1987) is one method of building unrooted trees. The method corrects for unequal evolutionary rates between sequences by first finding a pair of neighboring leaves i and j which have the same parent node k. That is, leaves i and j may be organisms that evolved from a common organism k. Leaves i and j may then be removed from the list of leaf nodes and k is added to the current list of nodes, and node distances are recalculated. This algorithm is an example of a greedy “minimum evolution” algorithm.
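The step of finding a pair of neighboring leaves may be sketched as follows. This is an illustrative sketch only: the selection criterion used here (commonly called the Q-criterion) is taken from the standard formulation of neighbor joining and is not spelled out in the present description, and the function name `nj_pick_pair` and the dictionary layout of the distances are hypothetical.

```python
from itertools import combinations


def nj_pick_pair(names, dist):
    """Pick the pair of leaves to join next in neighbor joining.

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) to the evolutionary distance
           between leaves a and b (hypothetical layout).
    """
    n = len(names)
    if n < 3:
        raise ValueError("at least three leaves are needed")
    # Sum of distances from each leaf to every other leaf.
    totals = {
        a: sum(dist[frozenset((a, b))] for b in names if b != a)
        for a in names
    }

    def q(a, b):
        # Q-criterion: penalizes pairs whose members are far from all others.
        return (n - 2) * dist[frozenset((a, b))] - totals[a] - totals[b]

    return min(combinations(names, 2), key=lambda pair: q(*pair))


# Example with five leaves and made-up distances.
example_names = ["a", "b", "c", "d", "e"]
raw = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
example_dist = {frozenset(k): v for k, v in raw.items()}
pair = nj_pick_pair(example_names, example_dist)
```

In this example the pair with the smallest raw distance (d and e) is not the pair selected, which illustrates how the criterion differs from simply joining the closest leaves.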
Another method of building phylogenetic trees is the unweighted pair group method with arithmetic mean (UPGMA) (Sokal R., Michener C. “A statistical method for evaluating systematic relationships.” University of Kansas Science Bulletin 38: 1409-1438, 1958). The UPGMA algorithm is agglomerative and generates a rooted tree. Initially, each sequence defines a single cluster. With each iteration, clusters are combined to form larger clusters. This continues until all sequences are included in a single cluster. With each iteration, two clusters of sequences that are found to have the shortest evolutionary distance are combined into a higher-level cluster. The evolutionary distance between clusters is the average of all evolutionary distances between corresponding pairs of sequences in each of the clusters. The algorithm reiterates until all sequences are placed in the tree.
Single-linkage clustering is a method of building rooted trees similar to UPGMA. However, rather than using the average evolutionary distance between all corresponding pairs of sequences between clusters, the evolutionary distance between clusters is defined by the minimum distance between a sequence in a first cluster and a sequence in a second cluster. That is, the distance of a single pair of sequences defines the distance between clusters.
Complete-linkage clustering is also a method of building rooted trees similar to UPGMA and single-linkage clustering. As with single-linkage clustering, the evolutionary distance between a single pair of sequences, each included in a different cluster, defines the evolutionary distance between two clusters. However, in complete-linkage clustering, the pair of sequences that has the greatest evolutionary distance defines the evolutionary distance between the two clusters.
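The three clustering variants described above differ only in how the distance between two clusters is derived from the pairwise distances. A minimal sketch, assuming pairwise evolutionary distances have already been computed, is given below; the function names and the dictionary layout are hypothetical, and branch lengths are omitted for brevity.

```python
from itertools import combinations, product
from statistics import mean

# Average = UPGMA, min = single linkage, max = complete linkage.
LINKAGE = {"average": mean, "single": min, "complete": max}


def cluster_distance(c1, c2, dist, linkage):
    """Distance between two clusters (tuples of sequence names)."""
    return LINKAGE[linkage](
        dist[frozenset((a, b))] for a, b in product(c1, c2)
    )


def agglomerate(names, dist, linkage="average"):
    """Merge the two closest clusters until a single cluster remains.

    Returns (nested-tuple tree, list of (cluster, cluster, distance) merges).
    """
    clusters = [(name,) for name in names]      # each sequence starts alone
    trees = {(name,): name for name in names}   # cluster -> subtree
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], dist, linkage),
        )
        d = cluster_distance(c1, c2, dist, linkage)
        merged = c1 + c2
        trees[merged] = (trees[c1], trees[c2])
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
        merges.append((c1, c2, d))
    return trees[clusters[0]], merges


# Example with four sequences and made-up distances.
raw = {
    ("a", "b"): 2, ("c", "d"): 3,
    ("a", "c"): 6, ("a", "d"): 8, ("b", "c"): 7, ("b", "d"): 9,
}
example_dist = {frozenset(k): v for k, v in raw.items()}
tree_avg, merges_avg = agglomerate(["a", "b", "c", "d"], example_dist, "average")
_, merges_single = agglomerate(["a", "b", "c", "d"], example_dist, "single")
_, merges_complete = agglomerate(["a", "b", "c", "d"], example_dist, "complete")
```

In this example all three variants produce the same tree shape, but the distance at which the final two clusters merge differs according to the linkage rule.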
Unlike neighbor joining, the UPGMA algorithm and related clustering algorithms assume a constant rate of evolution. The above methods of generating phylogenetic trees are provided for example purposes only. Other methods of generating phylogenetic trees may be used without departing from the scope of the invention.
Using the tree representation of many organism isolate sequence samples, it may be possible to estimate relative timing of one organism to another organism. Without loss of generality, a method called Mean Path Lengths (MPL) may be used (Britton, Tom, et al. “Phylogenetic dating with confidence intervals using mean path lengths.” Molecular phylogenetics and evolution 24.1 (2002): 58-65). The MPL method estimates the age of a node with the mean of the distances from this node to all leaves descending from it. Under the assumption of a similar molecular clock, that is, a rate of evolution, standard-errors of the estimated node ages can be computed. Using this method, mutation rates may be calculated for the different sample isolates.
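The mean path length of a node may be sketched as follows. This is a simplified illustration that assumes a hypothetical tree representation in which a leaf is a string and an internal node is a list of (child, branch length) pairs; it computes only the mean path length itself and not the standard errors or confidence intervals.

```python
from statistics import mean


def leaf_path_lengths(node):
    """Return the path length from `node` to each leaf descending from it."""
    if isinstance(node, str):       # a leaf is at distance zero from itself
        return [0.0]
    lengths = []
    for child, branch_length in node:
        lengths.extend(branch_length + d for d in leaf_path_lengths(child))
    return lengths


def mean_path_length(node):
    """Estimate the relative age of `node` as the mean distance to its leaves."""
    return mean(leaf_path_lengths(node))


# Example tree with made-up branch lengths:
# root -> (inner, 0.2) and ("C", 0.5); inner -> ("A", 0.1) and ("B", 0.3)
inner = [("A", 0.1), ("B", 0.3)]
root = [(inner, 0.2), ("C", 0.5)]
age_inner = mean_path_length(inner)
age_root = mean_path_length(root)
```

If a calendar date is known for one node, the relative ages produced this way could in principle be scaled to dates, from which a mutation rate may be estimated.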
It may be possible to determine that one or more organisms originated from the same source based on the evolutionary distance and/or mutation rate. Different sources that have reservoirs of pathogens or other organisms may include but are not limited to blood, saliva, food, surgical tables, sinks, toilets, and bed linens. Genetic isolates from samples that are found to have similar evolutionary distances and/or rates of mutation compared to a reference isolate may have all originated from the reference isolate. Sample isolates whose sequences deviate more than what should be expected from a reference isolate sequence or sequences, based on one or more phylogenetic models, may be from a different source, a more virulent strain, misclassified as a particular species/subspecies, and/or contaminated. Deviant sample isolate sequences may need further analysis by technical staff or infection control staff to determine the cause of deviation.
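One way to express "deviating more than what should be expected" is to compare the evolutionary distance of a sample isolate against the distribution of distances observed for a set of reference sequences. The sketch below is illustrative only and rests on assumptions not stated in the description: the reference distances are treated as approximately normally distributed, the interval is taken as the mean plus or minus a multiple of the sample standard deviation, and the function name and the default cutoff of 1.96 are hypothetical.

```python
from statistics import mean, stdev


def is_deviant(sample_distance, reference_distances, z_cutoff=1.96):
    """Return (flag, z_score) for a sample isolate.

    flag is True when the sample distance lies outside
    mean +/- z_cutoff * standard deviation of the reference distances.
    Assumption: a normal approximation is adequate for the references.
    """
    mu = mean(reference_distances)
    sigma = stdev(reference_distances)   # needs at least two references
    if sigma == 0:
        # All references identical: any different value is treated as deviant.
        z = 0.0 if sample_distance == mu else float("inf")
    else:
        z = (sample_distance - mu) / sigma
    return abs(z) > z_cutoff, z


# Example with made-up distances.
refs = [0.010, 0.012, 0.011, 0.009, 0.013]
typical_flag, typical_z = is_deviant(0.011, refs)
outlier_flag, outlier_z = is_deviant(0.050, refs)
```

A flagged isolate is not necessarily contaminated or misclassified; the flag only indicates that further analysis may be warranted.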
The isolate sequence may then be compared to one or more sequences at Step 115 by the processing unit. The other sequences may be from other collected isolates, reference sequences of known organisms from public or private databases, and/or sequences from other sources. The comparison may include determining variants between the isolate sequence and the one or more other sequences. Variants may be found using existing software tools such as BWA-samtools and Golden Helix. These variants may be used at Step 120 to determine the evolutionary history of the isolate sequence in relation to the one or more sequences. The evolutionary history may be determined by one of the methods described above or another method. Based on the evolutionary history, the isolate sequence may be analyzed at Step 125 to determine if it is deviant from the one or more sequences. Deviation may be based on an analysis of evolutionary distances and/or mutation rates of the sequences. For example, the greater the evolutionary distance, the more likely the isolate sequence may be considered deviant. A thresholding technique based on the evolutionary distance may be used for making a determination of deviance. Other methods of determining deviant sequences or other categorization of sample isolate sequences may be possible. Any deviant sequences may be flagged in the memory for further analysis at Step 130.
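The comparison, distance, and flagging steps may be sketched end to end as follows. This is a minimal sketch under stated assumptions and not the claimed method itself: variants are counted by direct comparison of pre-aligned, equal-length sequences rather than by a variant-calling tool, the Jukes-Cantor distance is used, the smallest distance to any reference is the value compared to the threshold, and the record type, function names, and threshold value are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class IsolateRecord:
    name: str
    sequence: str
    distance: float   # smallest distance to any reference (assumption)
    flagged: bool     # True if the isolate may require further analysis


def count_variants(seq_a, seq_b):
    """Count differing sites between two pre-aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y)


def jc_distance(seq_a, seq_b):
    """Jukes-Cantor distance; infinite once 75% or more of sites differ."""
    p = count_variants(seq_a, seq_b) / len(seq_a)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def flag_if_deviant(name, sequence, references, threshold):
    """Compare a sample to every reference and flag it if too distant.

    references: dict of reference name -> sequence (must not be empty).
    """
    nearest = min(jc_distance(sequence, ref) for ref in references.values())
    return IsolateRecord(name, sequence, nearest, nearest > threshold)


# Example with made-up 20-base sequences and an arbitrary threshold.
references = {"ref1": "ACGT" * 5}
close = flag_if_deviant("patient_1", "ACGT" * 4 + "ACGA", references, 0.1)
far = flag_if_deviant("patient_2", "TCGT" * 5, references, 0.1)
```

Storing the flag alongside the sequence, as in the record above, keeps the deviant isolates available for later analysis rather than discarding them.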
An example of a system 200 used for determining and flagging deviant sequences according to an embodiment of the disclosure is shown as a block diagram in
Once a sample isolate sequence is flagged as deviant by one or more of the methods described below, one or more actions may be taken by the system 200. The system may provide a visual indicator to a user on the display 220 to alert the user of the deviant sequences. The deviant sequences may be kept in memory 205, stored in a portion of the database 210 separate from reference and non-deviant sequences, and/or transmitted to the remote computer system 225. The one or more processing units 215 may automatically conduct further analysis on deviant sequences or a user may initiate further analysis. The analysis may be executed by the one or more processing units 215 or by a separate system, such as computer system 225. For example, the one or more processing units 215 and/or computer system 225 may run an analysis configured to detect contamination. Alternatively or in addition, the one or more processing units 215 and/or computer system 225 may run an analysis configured to detect characteristics in the deviant sequence that are associated with increased virulence and/or drug resistance. The results of these additional analyses may then be provided to a user and/or stored in a database, such as database 210.
The user may use the flagged deviant sequences to determine which samples need to be re-sequenced and/or whether new samples need to be collected. New samples may be acquired from sources whose sample isolates were flagged as deviant. The user may run the above processes on the deviant sequences against a different database of reference sequences to determine if the sequences were misclassified as another organism. In infection control, deviant sequences may be determined to have been acquired outside the hospital rather than classified as an HAI.
Alternatively, or in addition, the relative evolutionary distance trees 400A-D may be converted into dated phylogenetic trees if a time point of one or more of the sequences is known. The MPL method described above or another method may be used. The dated tree may then be used to calculate mutation rates for each strain.
Although reference sequences are grouped into a single distribution in the examples shown in
Although many of the above examples are given in reference to HAIs and infection control in hospitals, other applications of determining and flagging deviant sequences may be possible. The examples given are for illustrative purposes only to assist in understanding the principles of the disclosure, and should not be considered to be limiting the scope of the invention.
Of course, it is to be appreciated that any one of the above embodiments or processes may be combined with one or more other embodiments and/or processes or be separated and/or performed amongst separate devices or device portions in accordance with the present systems, devices and methods.
Finally, the above discussion is intended to be merely illustrative of the present system and should not be construed as limiting the appended claims to any particular embodiment or group of embodiments. Thus, while the present system has been described in particular detail with reference to exemplary embodiments, it should also be appreciated that numerous modifications and alternative embodiments may be devised by those having ordinary skill in the art without departing from the broader and intended spirit and scope of the present system as set forth in the claims that follow. Accordingly, the specification and drawings are to be regarded in an illustrative manner and are not intended to limit the scope of the appended claims.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/IB2015/056858 | 9/8/2015 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62056875 | Sep 2014 | US |