Enterohemorrhagic Escherichia coli (EHEC) includes a diverse population of Shiga toxin-producing E. coli that causes outbreaks of foodborne and waterborne disease (1-3). EHEC often resides in bovine reservoirs and is transmitted via many food vehicles, including meat products, such as hamburger (4) and salami (5), and raw vegetables, such as lettuce (6, 7) and spinach (8). In North America, E. coli O157:H7 is the most common EHEC serotype, responsible for more than 75,000 human infections (9) and 17 outbreaks (3) per year.
The population genetics and epidemiology of E. coli O157:H7 infections have changed dramatically since the first outbreaks of illness associated with contaminated ground beef occurred in the early 1980s (1). New routes of infection, including direct contact with animals, and survival in novel food vehicles, particularly fresh produce, have become major sources of new disease cases and have contributed to widespread epidemics (3). This changing epidemiology is also influenced by the genetic variation and “relentless evolution” (41) of the O157 pathogen population. As EHEC O157 strains have increased in frequency and spread geographically, the population has diversified genetically. Isolates of EHEC O157 from clinical and bovine sources have been shown to be genotypically diverse by different methods, including pulsed-field gel electrophoresis (PFGE) (26), octamer-based genome scanning (42), and multilocus variable-number tandem repeat analysis (MLVA) (43). Studies of prophages and prophage remnants in EHEC O157 strains have indicated that genotypic diversity is largely attributable to bacteriophage-related insertions, deletions, and duplications of DNA fragments of variable size (24, 25, 44).
Substantial variability in clinical presentation also has been observed among patients with EHEC O157 infections. This variation is apparent even among different O157 outbreaks, some of which have had remarkably high frequencies of HUS and hospitalization relative to others (Table 1). These observations suggest that virulence varies extensively among distinct clades of O157.
It is not clear why outbreaks of EHEC O157 vary dramatically in the severity of illness and the frequency of the most serious complication, hemolytic uremic syndrome (HUS) (10-12). The 1993 outbreak in western North America (4) and the large 1996 outbreak in Japan (13) had low rates of hospitalization and HUS (14, 15), whereas the 2006 North American spinach outbreak (8) had high rates of both hospitalization (>50%) and HUS (>10%). One hypothesis is that outbreak strains differ in virulence as a result of variation in the presence and expression of different Shiga toxin (Stx) gene combinations (16-19).
Although molecular subtyping methods such as PFGE reveal extensive genomic diversity among O157 outbreaks, “DNA fingerprinting” data are not amenable to population genetic or phylogenetic analyses. PFGE analysis has demonstrated that differences between O157 strains result from discrete insertions or deletions that alter restriction sites, rather than from SNPs (24). Comparison of multiple O157 genomes has shown that bacteriophage variation is a major factor in generating genomic diversity (25) and presumably underlies most of the genomic variability detected by PFGE (24, 26).
The inventors have developed primers for use in a method for genotyping E. coli O157:H7 by detecting the nucleotides at 96 single nucleotide polymorphism (SNP) loci, and have applied this method to more than 500 E. coli O157:H7 clinical strains. Phylogenetic analyses identified 39 SNP genotypes (SGs) that are separated into nine distinct clades; two strains selected at random differ, on average, at approximately 20% of SNP loci. Differences were observed between clades in the frequency and distribution of Shiga toxin genes and in the type of clinical disease reported. Patients with hemolytic uremic syndrome (HUS) were significantly more likely to be infected with clade 8 strains, which have increased in frequency over the past 5 years. Genome sequencing of a spinach outbreak strain, a member of clade 8, also revealed substantial genomic differences relative to previously sequenced O157 genomes. Results obtained with the present method suggest that an emergent subpopulation of the clade 8 lineage has acquired critical factors that contribute to more severe disease.
More specifically, the present invention includes methods for detecting E. coli O157:H7 strains. The present invention further includes detecting E. coli O157:H7 strains belonging to any of 36 SNP genotypes using multiplexed primer sets that detect 32 SNPs. In one embodiment, these methods are used to detect E. coli O157:H7 strains with increased virulence, e.g., E. coli O157:H7 strains that are, or would be, included in clade 8, as defined herein.
The present invention also includes methods for diagnosing diseases caused by E. coli O157:H7 infections. In one embodiment, these methods are used to diagnose diseases associated with infection by E. coli O157:H7 strains that may have increased virulence, e.g., E. coli O157:H7 strains from clade 8, as defined herein.
The present invention includes a method for genotyping E. coli O157:H7, including providing a sample of DNA from a possible E. coli O157:H7 infection; detecting in the sample whether the identity of the nucleotide at position 125 of SEQ ID NO. 11 is thymine (T) or guanine (G), the nucleotide at position 648 of SEQ ID NO. 82 is T or cytosine (C), the nucleotide at position 299 of SEQ ID NO. 47 is T or C, the nucleotide at position 339 of SEQ ID NO. 15 is T or C, the nucleotide at position 144 of SEQ ID NO. 67 is adenine (A) or G, the nucleotide at position 417 of SEQ ID NO. 78 is T or C, the nucleotide at position 3971 of SEQ ID NO. 52 is G or T, the nucleotide at position 1186 of SEQ ID NO. 75 is C or G, the nucleotide at position 2244 of SEQ ID NO. 81 is T or C, the nucleotide at position 1151 of SEQ ID NO. 10 is T or C, the nucleotide at position 1678 of SEQ ID NO. 16 is G or C, the nucleotide at position 1545 of SEQ ID NO. 17 is G or A, the nucleotide at position 311 of SEQ ID NO. 21 is G or A, the nucleotide at position 1340 of SEQ ID NO. 48 is G or A, the nucleotide at position 776 of SEQ ID NO. 35 is G or A, the nucleotide at position 132 of SEQ ID NO. 57 is G or T, the nucleotide at position 348 of SEQ ID NO. 46 is A or C, the nucleotide at position 928 of SEQ ID NO. 20 is G or A, the nucleotide at position 849 of SEQ ID NO. 36 is G or A, the nucleotide at position 247 of SEQ ID NO. 79 is G or A, the nucleotide at position 83 of SEQ ID NO. 1 is T or C, the nucleotide at position 117 of SEQ ID NO. 6 is C or A, the nucleotide at position 259 of SEQ ID NO. 22 is C or T, the nucleotide at position 379 of SEQ ID NO. 18 is C or T, the nucleotide at position 739 of SEQ ID NO. 4 is G or A, the nucleotide at position 527 of SEQ ID NO. 47 is C or T, the nucleotide at position 693 of SEQ ID NO. 74 is C or T, the nucleotide at position 281 of SEQ ID NO. 11 is C or T, the nucleotide at position 267 of SEQ ID NO. 57 is G or A, the nucleotide at position 2707 of SEQ ID NO. 66 is C or A, the nucleotide at position 354 of SEQ ID NO. 47 is C or A, and the nucleotide at position 339 of SEQ ID NO. 70 is T or A; and using the identities of these nucleotides to determine whether the possible E. coli O157:H7 has a particular single nucleotide polymorphism (SNP) genotype (SG) of an E. coli O157:H7 that is defined by these nucleotides.
The invention also includes the above method wherein the identity of the nucleotide at position 125 of SEQ ID NO. 11 is G, the nucleotide at position 648 of SEQ ID NO. 82 is C, the nucleotide at position 299 of SEQ ID NO. 47 is C, the nucleotide at position 339 of SEQ ID NO. 15 is C, the nucleotide at position 144 of SEQ ID NO. 67 is G, the nucleotide at position 417 of SEQ ID NO. 78 is C, the nucleotide at position 3971 of SEQ ID NO. 52 is T, the nucleotide at position 1186 of SEQ ID NO. 75 is G, the nucleotide at position 2244 of SEQ ID NO. 81 is T, the nucleotide at position 1151 of SEQ ID NO. 10 is C, the nucleotide at position 1678 of SEQ ID NO. 16 is G, the nucleotide at position 1545 of SEQ ID NO. 17 is G, the nucleotide at position 311 of SEQ ID NO. 21 is G, the nucleotide at position 1340 of SEQ ID NO. 48 is A, the nucleotide at position 776 of SEQ ID NO. 35 is A, the nucleotide at position 132 of SEQ ID NO. 57 is G, the nucleotide at position 348 of SEQ ID NO. 46 is A, the nucleotide at position 928 of SEQ ID NO. 20 is G, the nucleotide at position 849 of SEQ ID NO. 36 is G, the nucleotide at position 247 of SEQ ID NO. 79 is G, the nucleotide at position 83 of SEQ ID NO. 1 is C, the nucleotide at position 117 of SEQ ID NO. 6 is C, the nucleotide at position 259 of SEQ ID NO. 22 is C or T, the nucleotide at position 379 of SEQ ID NO. 18 is C or T, the nucleotide at position 739 of SEQ ID NO. 4 is G or A, the nucleotide at position 527 of SEQ ID NO. 47 is C or T, the nucleotide at position 693 of SEQ ID NO. 74 is C or T, the nucleotide at position 281 of SEQ ID NO. 11 is T, the nucleotide at position 267 of SEQ ID NO. 57 is G, the nucleotide at position 2707 of SEQ ID NO. 66 is C, the nucleotide at position 354 of SEQ ID NO. 47 is C, and the nucleotide at position 339 of SEQ ID NO. 70 is T; and the possible E. coli O157:H7 is determined to have a SG of an E. coli O157:H7 genotype associated with more severe disease.
With the inventive method, the SG determination may be used to identify the strain or clade of E. coli O157:H7 in large-scale epidemiological studies, or as a tool to diagnose infection by E. coli O157:H7 in a clinical setting. Further, the inventive method may be used to test a sample from a plant or animal, including a human, to determine whether E. coli O157:H7 is present by screening for the SG and, possibly, other identifying genetic characteristics in the sample.
The inventive method also can involve the use of real-time polymerase chain reaction (PCR) assays to detect the nucleotides at each of the SNP loci together or individually. Primer trios may be used in the PCR assay, and the primer trios may be selected from the oligonucleotides identified by SEQ ID NOs. 83-382 herein.
Finally, the inventive method also includes identifying the organism in the sample as having one of thirty-nine SGs that are defined by the above-described nucleotides at the SNP loci.
The foregoing summary, as well as the following detailed description of the invention, will be better understood when read in conjunction with the appended drawings. For the purpose of illustrating the invention, there are shown in the drawings and tables, certain embodiment(s) which are presently preferred. It should be understood, however, that the invention is not limited to the precise arrangements and instrumentalities shown.
Before the subject invention is described further, it is to be understood that the invention is not limited to the particular embodiments of the invention described below, as variations of the particular embodiments may be made and still fall within the scope of the appended claims. It is also to be understood that the terminology employed is for the purpose of describing particular embodiments, and is not intended to be limiting. Instead, the scope of the present invention will be established by the appended claims.
Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range, and any other stated or intervening value in that stated range, is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.
All references, patents, patent publications, articles, and databases, referred to in this application are incorporated herein by reference in their entirety, as if each were specifically and individually incorporated herein by reference. Such patents, patent publications, articles, and databases are incorporated for the purpose of describing and disclosing the subject components of the invention that are described in those patents, patent publications, articles, and databases, which components might be used in connection with the presently described invention. The information provided below is not admitted to be prior art to the present invention, but is provided solely to assist the understanding of the reader.
The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, embodiments, and advantages of the invention will be apparent from the description and drawings, Examples, Sequence Listing, and from the claims. The preferred embodiments of the present invention may be understood more readily by reference to the following detailed description of the specific embodiments, the Examples, and the Sequence Listing included hereafter.
The text file filed concurrently with this application, titled “MIC037P349 Sequence Listing.txt” contains material identified as SEQ ID NOS: 1-384 which material is incorporated herein by reference. This text file was created on Mar. 5, 2010, and is 218,851 bytes.
For clarity of disclosure, and not by way of limitation, the detailed description of the invention is divided into the subsections that follow.
Unless defined otherwise, all technical and scientific terms used herein have the meaning commonly understood by one of ordinary skill in the art to which this invention belongs. Generally, the nomenclature used herein and the laboratory procedures in cell culture, molecular genetics, organic chemistry and nucleic acid chemistry described below are those well known and commonly employed in the art. Although any methods, devices and materials similar or equivalent to those described herein can be used in the practice or testing of the invention, the preferred methods, devices and materials are now described.
In this specification and the appended claims, the singular forms “a,” “an” and “the” include plural reference unless the context clearly dictates otherwise.
The inventors genotyped more than 500 clinical strains of EHEC O157 based on 96 SNPs that separated strains into genetically distinct groups, and sequenced the genome of the O157 strain implicated in the spinach outbreak. These data form a basis for addressing how EHEC O157 has diversified and evolved in genome content, and for assessing intrinsic differences among O157 lineages with regard to clinical presentation and disease severity.
The evaluation of more than 500 O157 strains from clinical sources for up to 96 SNP loci highlights the degree of genetic variation among strains, and identifies a specific O157 lineage (clade 8) that has increased in frequency (
The observation that clade 8 strains more frequently have both the stx2 and stx2c genes suggests that carriage of both the Stx2 and Stx2c phages contributes in part to the greater virulence of clade 8 strains. The Stx genes, encoded by lambda-like bacteriophages, can circulate among hundreds of different E. coli strains (46) and integrate into many sites in the O157 genome (25, 44). Previous studies have observed correlations between specific Stx genes and disease, particularly for stx2 and stx2c (18, 19), although it has not been suggested that having both variants together may increase virulence. Because not all clade 8 strains have both stx2 and stx2c, and none of the strains have only stx2c, the presence and presumed production of the Stx2c variant alone cannot be solely responsible for the enhanced virulence attributed to this lineage. The same is true for Stx2, which was detected in nearly every strain representing all nine clades. We cannot, however, rule out the possibility that stx2c is rapidly lost during infection, which would prevent its detection in some strains. What accounts for the greater intrinsic virulence of clade 8 strains and other O157 genotypes is not fully understood. A constellation of mobile genetic elements contributes to the virulence of pathogenic E. coli (47), and it is possible that a novel combination of virulence factors has emerged in the clade 8 lineage.
Among the three most common clades (2, 7, and 8) examined, there are noteworthy differences in transmission and clinical disease characteristics (Table 2) in addition to the association between clade 8 and HUS.
Table 2 shows crude associations between patient characteristics and infection with E. coli O157 strains (n=333) of different clades. Differences in the distribution of clades by clinical data and bacterial characteristics were tested using the likelihood ratio chi-square test (1 degree of freedom); odds ratios (OR), 95% confidence intervals (95% CI), and P values (P) were obtained from these distributions. * indicates that percentages and associations are relative to all other clades combined; clade 9 strains were omitted from the analysis. Only 1 strain per outbreak or cluster was used in the analyses. † indicates that the number varies by characteristic because some data were missing.
For example, patients infected with either clade 2 or clade 8 strains reported bloody diarrhea more frequently than patients with clade 7 infections. Furthermore, clades 7 and 8 were more common among female patients, and clade 8 was associated with disease in younger (<18 yrs) patients (
To determine when specific clades first appeared in human disease, and to assess whether clade 8 strains have also increased in frequency among strains recovered outside of Michigan, the inventors evaluated a subset of O157 strains isolated during different time periods. Through this screening, the inventors identified clade 8 strains from clinical cases dating back to 1984 on multiple continents (Table 3), suggesting that clade 8 has not recently emerged. This result was confirmed by both the spinach outbreak genome (
Table 3 shows the distribution and frequency of single nucleotide polymorphism (SNP) genotypes (SGs) among 528 E. coli O157 strains and close relatives. Strain isolation dates are listed individually, separated by commas, for SGs with fewer than two strains, and as a range for categories with more strains and for those with an unknown collection date. * indicates that SG-35 contains 7 strains, including β-glucuronidase-positive (GUD+), sorbitol-negative (Sor−) strains that are O157:H7; SG-36 contains 6 strains isolated in Germany that are GUD+/Sor+ and have serotype O157:H−; and the SG-37 strain represents a nontypeable (NT) serotype (O antigen) isolated from a healthy marmoset. † indicates that the strains are O55:H7 serotypes and represent the evolutionarily derived lineages (GUD−/Sor−).
Unlike clade 8 strains from Michigan patients, stx2c (with or without stx2) did not increase in frequency over time, and stx2c was detected in a strain isolated in 1984, indicating that it, too, has not recently emerged.
It is clear that EHEC O157 is genetically diversified and comprises multiple detectable clades with substantial genomic, biological, and epidemiological variation. SNP genotyping has revealed clades that reflect the genetic variability among pathogenic strains associated with clinical infection. These results support the hypothesis that the clade 8 lineage has recently acquired novel factors that contribute to enhanced virulence. Evolutionary changes in the clade 8 subpopulation could explain its emergence in several recent foodborne outbreaks; however, it is not clear why this virulent subpopulation is increasing in prevalence. Because humans are largely incidental hosts for EHEC O157, further investigation of the bovine reservoir (48, 49) and the environment is critical, as is the evaluation of agricultural practices in areas where livestock and produce are farmed side by side. Identifying the underlying factors that lead to enhanced virulence and to the successful transmission of EHEC O157 in contaminated food and water is imperative. Similarly, large-scale molecular epidemiologic studies are necessary to assess the actual distribution of SGs, clades, and Stx variants in environmental reservoirs and across broad geographic scales (50). The development and deployment of a rapid, inexpensive molecular test that can identify more virulent O157 subtypes also would help clinical laboratories identify patients with an increased likelihood of developing HUS.
The systematic analysis of SNPs is useful for E. coli outbreak investigations: it can resolve closely related bacterial genotypes, provide insights into the micro-evolutionary history of genome divergence (20, 27), and contribute to an epidemiologic assessment of associations between bacterial genotypes and disease. Accordingly, to assess the genetic diversity and variability in virulence among E. coli O157 strains, the inventors developed a system for identifying synonymous and non-synonymous mutations as single nucleotide polymorphisms (“SNPs”) (20-23). In one embodiment, the system includes identifying the SNPs through the use of real-time PCR. Other methods of identifying the polymorphic nucleotide will be apparent to those of skill in the art.
The present invention includes a method for identifying a strain of E. coli O157:H7 by identifying the SNP genotype of the strain, including: (1) providing a sample of DNA from a possible E. coli O157:H7 infection; (2) detecting the nucleotides at a grouping or subset of SNP loci identified in Table 4 herein; (3) based on the nucleotide present at the SNP loci in the sample, identifying a SNP genotype (“SG”) for the sample (e.g., a SG selected from the SGs listed in Table 6 below); and, based on that SG, identifying the strain of E. coli O157:H7. In one embodiment, the SG is used to identify the clade, or phylogenetic lineage, of the strain (e.g., the clade is one of the nine clades identified in Table 6).
The O157 Sakai genome is used as a point of reference for identifying the location of the ninety-six SNPs of the present invention (Table 4); this genome consists of 5,498,450 base pairs (see, Genbank Accession No. NC_002695; as well as
The location of each of the SNPs of the present invention also is identified by its position within a gene of the O157 Sakai genome. For example, again referring to Table 4, the SNP identified as “03_83” is located in gene (or open reading frame) “ECs0333” (SEQ ID NO. 1) at nucleotide position 83 of this gene. The same system of identification is used for the other 95 SNPs. SEQ ID NOs. 1-82 describe the nucleotide sequences for the genes (or ORFs) in which the 96 SNPs are located.
In addition to the detection methods described herein, other methods that could be used to detect the nucleotide at a SNP locus include real-time PCR, DNA sequencing and 454 pyrosequencing, which involves sequencing short stretches of DNA containing the SNPs (56).
In one embodiment of the invention, the nucleotides at the SNP loci are detected using real-time PCR. In this embodiment, primers are designed to detect a subset of the 96 SNPs identified in Table 4. For example, those primers may be one or more of the primer trios identified in Table 5 below. These primers have the nucleotide sequences identified in SEQ ID NOs. 83-382 and are used to detect the nucleotide at the SNP loci in the genes having the nucleotide sequences identified in SEQ ID NOs. 1-82. For example, the trio of primers having the nucleotide sequences of SEQ ID NOs. 86-88 can be used to detect the nucleotide at SNP position 83 in the gene having the nucleotide sequence of SEQ ID NO. 1. The primers are made according to methods known in the art and are used to detect the occurrence of the SNPs in a sample of DNA from a possible E. coli O157:H7 infection.
Based on the nucleotide detected at each SNP locus in the sample, a SNP genotype can be identified for the sample (e.g., a SNP genotype selected from those listed in Table 6 below), and, based on the SNP genotype, the clade of E. coli O157:H7 in the sample can be identified. For example, a sample can be identified as having “SNP genotype 1” in Table 6 if the DNA of that sample contains the nucleotides listed for all 32 SNPs in the Table 6 row labeled “1” under “SNP genotype” (i.e., if that DNA includes a thymine at SNP 03_83, a guanine at SNP 95_739, an adenine at SNP 09_117, etc.). The same process is used to determine whether an organism has any of the other 38 SNP genotypes shown in Table 6.
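As a rough illustration of the row-matching rule described above, the following Python sketch compares a sample's nucleotide calls with reference SG profiles and returns every SG whose row is fully matched. The reference table is hypothetical: SG “1” holds only the three SG 1 alleles quoted above (03_83, 95_739, 09_117) rather than its full 32-locus row, and “SG_X” and “SG_Y” are invented placeholders, not Table 6 entries. The name `match_sgs` is likewise illustrative.

```python
from typing import Dict, List

# Nucleotide calls keyed by SNP locus name, e.g. {"03_83": "T"}.
Profile = Dict[str, str]

# HYPOTHETICAL reference table for illustration only (not Table 6).
# SG "1" lists just three of its 32 loci, using the alleles quoted in the text;
# "SG_X" and "SG_Y" are invented and deliberately share one profile.
REFERENCE_SGS: Dict[str, Profile] = {
    "1": {"03_83": "T", "95_739": "G", "09_117": "A"},
    "SG_X": {"03_83": "C", "95_739": "A", "09_117": "C"},
    "SG_Y": {"03_83": "C", "95_739": "A", "09_117": "C"},
}


def match_sgs(sample: Profile, reference: Dict[str, Profile]) -> List[str]:
    """Return every SG whose listed nucleotide matches the sample at all of its loci.

    A missing call at any locus in a row counts as a mismatch. Several SGs are
    returned when the loci typed cannot tell them apart.
    """
    return [
        sg
        for sg, row in reference.items()
        if all(sample.get(locus) == base for locus, base in row.items())
    ]


if __name__ == "__main__":
    print(match_sgs({"03_83": "T", "95_739": "G", "09_117": "A"}, REFERENCE_SGS))
    print(match_sgs({"03_83": "C", "95_739": "A", "09_117": "C"}, REFERENCE_SGS))
```

Returning a list rather than a single SG leaves room for cases in which two SG numbers share one row under the 32-locus set, as noted for SGs 4 and 6.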
All 96 SNPs, or different groupings or subsets of the 96 SNPs, can be used to identify a SNP genotype and, therefore, a strain of E. coli O157:H7. For example, one grouping is the 32 SNPs identified in Table 6; other groupings include all 96 SNPs identified in Table 4, or any other subset of these 96 SNPs that can be used to identify a SNP genotype and, therefore, a strain of E. coli O157:H7. The grouping of 32 SNPs shown in Table 6 could be used for rapid detection in diagnostic or clinical applications, whereas all 96 SNPs identified in Table 4 could be used as a genotyping tool.
In one embodiment, nucleotides are detected at the 32 SNP loci shown in Table 6, and, based on the nucleotides present at these positions, a determination is made whether the organism has any one of the thirty-six SNP genotypes described in Table 6. Note that in Table 6 some SGs are identified by more than one SG number; for example, one SG is identified as both “4” and “6” (see also SGs 16 and 17, and SGs 20 and 23).
The methods of the present invention also include identifying an E. coli O157:H7 as belonging to one of the clades shown in Table 6 below. The methods of the present invention may be used to identify a strain of E. coli O157:H7 that either is known or unknown.
Having now generally described the invention, the same will be more readily understood through reference to the following examples, which are provided by way of illustration, and are not intended to be limiting of the present invention, unless specified.
Bacterial strains. A total of 528 EHEC O157 strains and close relatives were genotyped; 444 were from Michigan patients identified through surveillance by the Michigan Department of Community Health (MDCH), Bureau of Laboratories, from 2001 to 2006 (40). Patients were confirmed to have O157-associated disease by culture, enzyme immunoassay, and real-time PCR for stx1 and stx2 (40). Strains with unique PFGE patterns or with patterns present in 2 or fewer strains (n=333) were included in the epidemiological analyses. The additional 94 strains were selected based on epidemiological data to provide a sample representing different geographic locations and collection dates.
SNP loci and real-time PCR assays. The 96 SNP loci (Table 4) were identified from data generated by comparative genome sequencing microarrays (23), multilocus sequence typing (28), virulence gene sequencing, and in silico comparisons of the two published O157 genomes, Sakai (29) and EDL-933 (30).
SEQ ID NOs. 1-82 include the nucleotide sequences for the genes or ORFs in which the 96 SNPs are located.
Table 4 shows the ninety-six single nucleotide polymorphism (SNP) loci examined by real-time PCR assays. In the “Min” column, “1” marks SNPs that belong to both the initial set of 32 SNP loci and the set of 96 SNP loci, and “0” marks SNPs that belong only to the 96-locus set. “N” denotes a non-synonymous substitution, and “S” denotes a synonymous substitution.
Hairpin-shaped primers (Table 5) were designed for each locus by adding a 5′ tail complementary to the 3′ end of each linear primer (22), and real-time PCR was used to identify the SNP. Six strains were genotyped in duplicate to serve as internal controls; identical SNP profiles were observed. Table 5 shows the primer trios (three primers for each of the 96 SNPs) used to detect the SNPs (see SEQ ID NOs. 83-382).
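The text does not spell out how real-time PCR signals are converted into a nucleotide call. One common trio design is two allele-specific primers, each carrying a distinct fluorophore, plus a shared reverse primer; assuming that design, a call could be made from background-subtracted endpoint signals roughly as sketched below. The function name `call_allele`, the two-signal input, and both thresholds are assumptions for illustration, not values from the source.

```python
from typing import Optional


def call_allele(signal_a: float, signal_b: float, allele_a: str, allele_b: str,
                min_signal: float = 0.2, min_ratio: float = 3.0) -> Optional[str]:
    """Call a SNP nucleotide from two allele-specific endpoint signals (hypothetical scheme).

    signal_a / signal_b: background-subtracted endpoint fluorescence from the
    primers specific to allele_a / allele_b. Thresholds are illustrative only.
    Returns None when no call can be made.
    """
    strongest = max(signal_a, signal_b)
    weakest = min(signal_a, signal_b)
    if strongest < min_signal:
        return None  # neither allele amplified: failed reaction
    if weakest > 0 and strongest / weakest < min_ratio:
        return None  # both alleles amplified comparably, e.g. a mixed culture
    return allele_a if signal_a > signal_b else allele_b


if __name__ == "__main__":
    print(call_allele(1.5, 0.1, "T", "C"))  # strong T-specific signal
    print(call_allele(1.0, 0.8, "T", "C"))  # ambiguous
```

Returning `None` for ambiguous reactions, rather than forcing a call, is a deliberate choice: a sample that amplifies both alleles may be a mixed culture, like SGs 21 and 22 removed from the clade designations.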
To reduce the number of SNP assays needed to classify strains into SGs, the inventors used the SNPT program (21), which identified the initial set of 32 SNP loci (shown as “1” in the “Min” column of Table 4) that delineates 39 SGs. Additional assays were performed to confirm certain SGs. A second set of 32 SNP loci that also delineates 39 SGs was then developed. Compared with the initial set, the second set replaces three SNP loci that resolved SGs 35 through 39 (fimA_354, aspC_267, and espA_339) with three different loci for classifying SGs 1 through 34 (90_1097G, espA_370, and 26_510).
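The source does not describe the algorithm inside the SNPT program, so the sketch below is a swapped-in technique, not SNPT itself: a generic greedy set-cover heuristic that repeatedly picks the locus separating the most still-unresolved SG pairs. It shows the general idea of shrinking a 96-locus panel to a smaller diagnostic set. The function name and the toy profiles in the usage lines are hypothetical.

```python
from itertools import combinations
from typing import Dict, List

Profile = Dict[str, str]  # SNP locus name -> nucleotide


def greedy_min_loci(profiles: Dict[str, Profile]) -> List[str]:
    """Greedily choose loci until every pair of SGs that differs anywhere is separated."""
    sgs = list(profiles)
    loci = sorted({locus for row in profiles.values() for locus in row})

    def separates(locus: str, a: str, b: str) -> bool:
        return profiles[a].get(locus) != profiles[b].get(locus)

    # Pairs identical at every locus can never be resolved, so leave them out.
    unresolved = {
        (a, b)
        for a, b in combinations(sgs, 2)
        if any(separates(locus, a, b) for locus in loci)
    }
    chosen: List[str] = []
    while unresolved:
        # Pick the locus that separates the most still-unresolved pairs.
        best = max(loci, key=lambda locus: sum(separates(locus, a, b) for a, b in unresolved))
        chosen.append(best)
        unresolved = {(a, b) for a, b in unresolved if not separates(best, a, b)}
    return chosen


if __name__ == "__main__":
    toy = {  # hypothetical SGs and loci
        "A": {"L1": "T", "L2": "G", "L3": "A"},
        "B": {"L1": "C", "L2": "G", "L3": "A"},
        "C": {"L1": "C", "L2": "A", "L3": "A"},
        "D": {"L1": "C", "L2": "A", "L3": "G"},
    }
    print(greedy_min_loci(toy))
```

A greedy choice is not guaranteed to find the smallest possible set, which is one reason a dedicated tool, plus confirmatory assays as described above, may be preferable in practice.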
Those strains responsible for the extensive recombination depicted in
The clade designations in Table 6 are as follows: clade 1 is SGs 1 and 2; clade 2 is SGs 3-11; clade 3 is SGs 12-18; clade 4 is SGs 19 and 20; clade 5 is SG 23; clade 6 is SGs 24-26; clade 7 is SGs 28 and 29; clade 8 is SGs 30-34; and clade 9 is SGs 35-39. SGs 21 and 22 were removed because they are mixed cultures, and, as compared to the original set, SG 27 was removed because of culture contamination. SG 23 is now classified as clade 5 because it is equidistant from SGs 20, 24, and 28. Three SGs (6, 17, and 23) cannot be distinguished from three other SGs using this particular system; additional SNPs from Table 4 (96 loci) are required to differentiate these SGs.
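The SG-to-clade assignments listed above can be encoded as a simple lookup table. The Python sketch below uses exactly those ranges; SGs 21, 22, and 27, which were removed, map to `None`. The names `CLADE_SGS` and `clade_of` are illustrative.

```python
from typing import Dict, Optional

# SG numbers per clade, as listed for Table 6. SGs 21 and 22 (mixed cultures)
# and SG 27 (culture contamination) were removed and belong to no clade.
CLADE_SGS: Dict[int, range] = {
    1: range(1, 3),    # SGs 1-2
    2: range(3, 12),   # SGs 3-11
    3: range(12, 19),  # SGs 12-18
    4: range(19, 21),  # SGs 19-20
    5: range(23, 24),  # SG 23
    6: range(24, 27),  # SGs 24-26
    7: range(28, 30),  # SGs 28-29
    8: range(30, 35),  # SGs 30-34
    9: range(35, 40),  # SGs 35-39
}


def clade_of(sg: int) -> Optional[int]:
    """Return the clade containing an SG number, or None if it is unassigned."""
    for clade, sgs in CLADE_SGS.items():
        if sg in sgs:
            return clade
    return None


if __name__ == "__main__":
    print(clade_of(31))  # a clade 8 SG
    print(clade_of(27))  # removed SG
```

Paired with an SG-matching step, this lookup would flag a sample as clade 8 whenever its matched SG falls in 30-34.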
Phylogenetic analyses. Distance between SGs was measured as the pairwise number of nucleotide differences. Minimum evolution (ME) trees were used to infer the evolutionary relationships among the 39 SGs from the pairwise distance matrix of the concatenated SNP data using MEGA3 (51). Bootstrap confidence levels for the ME trees (based on 1000 replicate trees), also computed with MEGA3 (51), were used to classify SGs into clades. A phylogenetic network based on the Neighbor-net algorithm (33) was applied to 48 parsimony-informative (PI) sites using the SplitsTree4 program (52).
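The distance measure described above, the pairwise number of nucleotide differences between concatenated SNP profiles, can be sketched in Python as follows. This computes only the input matrix; tree building and bootstrapping were done in MEGA3 and are not reproduced here. The function name and example profiles are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple


def pairwise_differences(profiles: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Count nucleotide differences between every pair of concatenated SNP profiles.

    Each profile is a string with one nucleotide per SNP locus, in the same
    locus order for every SG.
    """
    if len({len(seq) for seq in profiles.values()}) != 1:
        raise ValueError("all profiles must cover the same loci in the same order")
    return {
        (a, b): sum(x != y for x, y in zip(profiles[a], profiles[b]))
        for a, b in combinations(sorted(profiles), 2)
    }


if __name__ == "__main__":
    # Hypothetical 4-locus profiles, not real SG data.
    print(pairwise_differences({"SG_A": "TGAC", "SG_B": "TGAT", "SG_C": "CAAT"}))
```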
Spinach outbreak strain genomic analysis. A culture isolated from a Michigan patient hospitalized in September 2006, and linked to the spinach outbreak pattern by the MDCH and CDC using the PulseNet PFGE system (53), was sequenced. The Michigan State University (MSU) Genomic Research Support Technical Facility performed parallel pyrosequencing on the 454 GS20, comprising four standard sequencing runs and one paired-end run. The final assembly had 201 large contigs (>500 nt) with ~20× coverage, arranged into 79 scaffolds totaling 5,307,096 nt, and 680 small contigs totaling 213,699 nt (4% of the total assembled length). Contigs were aligned to the published Sakai (29) and EDL-933 (30) genomes with MUMmer (38). Sakai/EDL-933 genes with at least one alignment of >90% nucleotide identity in the spinach genome were considered present in the spinach strain.
To evaluate the distribution of SNPs in the spinach genome, a strict set of comparison rules was applied. Conserved genes were included only if the alignment was 100% unique in both genomes (i.e., genes present in multiple copies in either genome were excluded), the identity between the aligned regions was over 90%, and the alignment covered more than 90% of the length of the Sakai/EDL-933 gene. Insertions and deletions were excluded. A total of 2,741 genes that met these criteria and occurred in all three genomes were compared to identify SNP differences. A map was plotted with GENOMEVIZ™ (54).
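The two gene-level rules above (the presence rule and the strict comparison rule) can be written as simple filters. In the Python sketch below, the `GeneAlignment` record and its fields are hypothetical stand-ins for values one would parse from aligner output; the requirement that a gene occur in all three genomes and the exclusion of insertions and deletions are not modeled.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class GeneAlignment:
    """One alignment of a Sakai/EDL-933 gene to the spinach assembly (hypothetical record)."""
    gene: str
    gene_length: int            # length of the reference gene, nt
    aligned_length: int         # portion of the reference gene covered by the alignment, nt
    percent_identity: float     # nucleotide identity of the aligned region, 0-100
    unique_in_reference: bool   # False if the gene is multi-copy in the reference genome
    unique_in_query: bool       # False if the region is multi-copy in the spinach genome


def gene_present(alignments: Iterable[GeneAlignment]) -> bool:
    """Presence rule: at least one alignment with >90% nucleotide identity."""
    return any(a.percent_identity > 90.0 for a in alignments)


def usable_for_snp_comparison(a: GeneAlignment) -> bool:
    """Strict rule: unique in both genomes, >90% identity, >90% of gene length aligned."""
    return (
        a.unique_in_reference
        and a.unique_in_query
        and a.percent_identity > 90.0
        and a.aligned_length > 0.9 * a.gene_length
    )


def genes_for_snp_comparison(alignments: Iterable[GeneAlignment]) -> List[str]:
    """Names of genes whose alignments pass the strict rule."""
    return [a.gene for a in alignments if usable_for_snp_comparison(a)]
```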
Stx2c detection. Multiplex PCR was used to detect stx2c and the Stx2c-phage o and q genes (39) in 519 strains; stx data were missing for 19 strains, 4 of which were repeatedly stx negative. The malate dehydrogenase (mdh) gene served as a positive control. Strains were considered positive for stx2c if amplicons for mdh (835 bp), stx2c (182 bp), o (533 bp), and q (321 bp) were all present.
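The scoring rule above can be sketched as a small classifier over observed band sizes. The ±10 bp sizing tolerance and the decision to make no call when the mdh control band is absent are assumptions added for illustration; the text itself only states that all four amplicons must be present for a positive call.

```python
# Expected multiplex PCR amplicon sizes (bp) taken from the text above.
EXPECTED_BP = {"mdh": 835, "stx2c": 182, "o": 533, "q": 321}


def band_seen(bands_bp, size, tolerance):
    """True if any observed band lies within +/- tolerance bp of the expected size."""
    return any(abs(b - size) <= tolerance for b in bands_bp)


def call_stx2c(bands_bp, tolerance=10):
    """Return 'stx2c-positive', 'stx2c-negative', or 'no call' from observed band sizes."""
    if not band_seen(bands_bp, EXPECTED_BP["mdh"], tolerance):
        return "no call"  # assumed: mdh positive control absent, reaction not interpretable
    targets = ("stx2c", "o", "q")
    if all(band_seen(bands_bp, EXPECTED_BP[t], tolerance) for t in targets):
        return "stx2c-positive"
    return "stx2c-negative"


# Hypothetical gel readings (bp).
print(call_stx2c([835, 182, 533, 321]))  # stx2c-positive
print(call_stx2c([835, 533, 321]))       # stx2c-negative
```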
The multiplex PCR does not distinguish between stx2 and stx2c (the encoded toxins differ by only three amino acids in the B subunit (55)); thus, the inventors developed an RFLP-based method that amplifies a larger PCR product (1152 bp) using primers stx2 F61 (5′-TATTCCCRGGARTTTAYGATAGA-3′) and stx2-2g_R1213 (5′-ATCCRGAGCCTGATKCACAG-3′) (see SEQ ID NOs. 383 and 384). PCR conditions include a 10-min soak at 94° C. and 35 cycles of 92° C. for 1 min, 59° C. for 30 sec, and 72° C. for 1 min, followed by a 5-min soak at 72° C. Digestion with FokI at 37° C. for 3 hours yields banding patterns specific for stx2 (453 bp, 362 bp, 211 bp, and 126 bp) or stx2c (488 bp, 453 bp, and 211 bp). All bands from both patterns are visible in strains carrying both stx2 and stx2c.
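Because the 453 bp and 211 bp fragments are shared, only the 362/126 bp fragments (stx2) and the 488 bp fragment (stx2c) discriminate between the patterns. A minimal sketch of reading a FokI digest against the two patterns given above follows; the ±5 bp tolerance is a hypothetical choice, and the sketch does not attempt to resolve co-migrating bands.

```python
# FokI fragment sizes (bp) of the 1152-bp amplicon, taken from the text above.
STX2_PATTERN = {453, 362, 211, 126}
STX2C_PATTERN = {488, 453, 211}


def pattern_seen(bands_bp, pattern, tolerance):
    """True if every fragment of the pattern matches some observed band (within tolerance)."""
    return all(any(abs(b - s) <= tolerance for b in bands_bp) for s in pattern)


def interpret_foki(bands_bp, tolerance=5):
    """Classify a FokI digest as 'stx2', 'stx2c', 'stx2+stx2c', or 'undetermined'."""
    has_stx2 = pattern_seen(bands_bp, STX2_PATTERN, tolerance)
    has_stx2c = pattern_seen(bands_bp, STX2C_PATTERN, tolerance)
    if has_stx2 and has_stx2c:
        return "stx2+stx2c"
    if has_stx2:
        return "stx2"
    if has_stx2c:
        return "stx2c"
    return "undetermined"


# Each pattern accounts for the full 1152-bp product.
assert sum(STX2_PATTERN) == sum(STX2C_PATTERN) == 1152
```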
Epidemiological analyses. The inventors tested for differences in the frequency of clinical characteristics among Michigan patients using the likelihood ratio chi-square test and described the distributions using odds ratios with 95% confidence intervals. Clade 9 was omitted from the analysis, as was one strain not assigned to a clade. To adjust for factors associated with infection by clade, logistic regression models were fit with adjustment for age, gender, and symptoms. The final epidemiologic analysis was limited to 333 of the 444 Michigan patients, as only one strain from each outbreak or cluster was included.
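The unadjusted odds ratios described above can be illustrated with a 2×2 table. The sketch below assumes Woolf's log-based interval, one common choice; the source does not say which interval method was used, the counts are hypothetical rather than Michigan data, and the adjusted logistic regression models are not reproduced.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval for a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    if 0 in (a, b, c, d):
        raise ValueError("zero cell: the Woolf interval is undefined without a correction")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts: 10 of 30 exposed vs 5 of 45 unexposed patients with a symptom.
or_, lo, hi = odds_ratio_ci(10, 20, 5, 40)  # OR = (10 * 40) / (20 * 5) = 4.0
```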
A total of 96 SNP loci were evaluated in 83 O157 genes (
Pairwise comparisons of the nucleotide profiles from 403 E. coli O157 and closely related strains from clinical sources worldwide distinguished 39 distinct SNP genotypes (SGs) (Table 3). Overall, the number of nucleotide differences between O157 SGs ranged from 1 to 57, with an average of 23.1±1.6 across the 96 loci. The nucleotide diversity, a measure of the degree of polymorphism within the O157 population, is 0.212±0.199, indicating that two strains selected at random differ on average at ~21% of SNP loci (
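The interpretation above follows from one common estimator of per-site nucleotide diversity: the average proportion of differing loci over all pairs of sampled profiles. A minimal sketch is below; the profiles are hypothetical, the ± standard error is not computed, and whether the reported value was calculated over strains or over SGs (and with which software) is not stated in the text.

```python
from itertools import combinations


def nucleotide_diversity(profiles):
    """Per-site nucleotide diversity: mean fraction of loci differing over all pairs.

    profiles: one equal-length SNP profile string per strain, so common SGs
    contribute in proportion to their frequency.
    """
    pairs = list(combinations(profiles, 2))
    if not pairs:
        raise ValueError("need at least two profiles")
    n_loci = len(profiles[0])
    total_diffs = sum(sum(x != y for x, y in zip(p, q)) for p, q in pairs)
    return total_diffs / (len(pairs) * n_loci)


# Hypothetical 4-locus profiles for three strains: pair differences are 1, 2, and 1.
pi = nucleotide_diversity(["AAAA", "AAAT", "AATT"])  # 4 / (3 * 4) = 0.333...
```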
Subsequent analyses of the 39 SG profiles revealed phylogenetically informative loci, defined as sites with at least two variants that are each found in two or more SGs. Among the 96 SNP loci, 71 sites had complete data; of these, 23 were singletons and 48 were parsimony-informative (PI) sites. The 48 PI sites were used to construct a Neighbor-net network (33) to determine whether the informative sites support conflicting phylogenies or a single tree (
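The site classification above (complete versus incomplete data, singleton versus parsimony-informative) can be sketched per locus. The missing-data codes and the example profiles are assumptions for illustration.

```python
from collections import Counter

MISSING = set("N?-")  # assumed codes for missing or ambiguous calls


def classify_site(column):
    """Classify one SNP locus across SG profiles."""
    if any(base in MISSING for base in column):
        return "incomplete"
    counts = Counter(column)
    if len(counts) < 2:
        return "invariant"
    # Parsimony-informative: at least two variants each present in two or more SGs.
    if sum(1 for n in counts.values() if n >= 2) >= 2:
        return "informative"
    return "singleton"  # variable, but not parsimony-informative


def classify_sites(profiles):
    """Classify every locus; profiles are equal-length strings, one per SG."""
    return [classify_site(column) for column in zip(*profiles)]


# Hypothetical profiles for four SGs at four loci.
sites = classify_sites(["AAGT", "AAGC", "GACC", "GTCC"])
# -> ['informative', 'singleton', 'informative', 'singleton']
```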
To further examine the distribution of O157 genotypes, the inventors devised a minimal set of 32 SNP loci that resolves all 39 SGs and genotyped 135 additional O157 strains from clinical sources, including five from well-known outbreaks. In all, with the additional screening based on the minimal SNP set, 528 O157 strains were genotyped and classified into SGs and clades. Virtually all of the 528 strains were classified into one of 9 clades, and more than 75% of strains belonged to one of four clades. The most common genotypes were SG-9 (n=184; 35%) of clade 2, followed by SG-30 (n=94; 18%) of clade 8; 20 of the 39 SGs were represented by only one or two strains (
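The text does not describe how the 32-locus set was chosen. As one possible approach, a greedy set-cover heuristic (named here as a swapped-in technique, not the inventors' method) repeatedly adds the locus that separates the most still-unresolved SG pairs. It yields a small resolving set but is not guaranteed to find the true minimum; the profiles below are hypothetical.

```python
from itertools import combinations


def resolves_all(profiles, loci):
    """True if the chosen loci give every SG a distinct sub-profile."""
    keys = {tuple(p[i] for i in loci) for p in profiles.values()}
    return len(keys) == len(profiles)


def greedy_locus_set(profiles):
    """Greedy set-cover heuristic: repeatedly pick the locus that separates the most
    still-unresolved SG pairs. Small, but not guaranteed to be the true minimum."""
    n_loci = len(next(iter(profiles.values())))
    unresolved = set(combinations(profiles, 2))
    chosen = []
    while unresolved:
        def gain(i):
            return sum(profiles[a][i] != profiles[b][i] for a, b in unresolved)
        best = max(range(n_loci), key=gain)
        if gain(best) == 0:
            raise ValueError("some SGs have identical full profiles")
        chosen.append(best)
        # Keep only the pairs that the chosen locus fails to separate.
        unresolved = {(a, b) for a, b in unresolved if profiles[a][best] == profiles[b][best]}
    return sorted(chosen)


# Hypothetical 4-locus profiles for four SGs.
sg_profiles = {"SG-A": "AACG", "SG-B": "AATG", "SG-C": "GACG", "SG-D": "GATG"}
minimal = greedy_locus_set(sg_profiles)  # loci 0 and 2 separate all four SGs
```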
Because the production of Stx has been linked to virulence in O157 strains (35), we estimated, by clade, the frequency of strains carrying one or more of three Stx variant genes (stx1, stx2, and stx2c). Although stx1 was found in over half (~65%) of the 519 O157 strains tested (of the 528 genotyped), its distribution is highly non-random across clades (
The stx2c gene also has a non-random distribution and is concentrated in clades 4, 6, 7, and 8 (
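The text does not name the test behind "non-random distribution" for these genes. As one illustration, a Pearson chi-square test of independence between clade and gene presence can be computed from a clade-by-presence count table; the counts below are hypothetical. The p-value would come from the chi-square distribution with the returned degrees of freedom (e.g., `scipy.stats.chi2.sf`), which is not computed here, and with sparse cells an exact test may be more appropriate.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c count table.

    Rows are clades; columns are gene present / absent counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("empty row or column: drop it before testing")
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts of strains with / without a gene in three clades.
stat, dof = chi_square_independence([[20, 0], [0, 20], [10, 10]])
```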
Clade 1 contains two SGs and includes the O157 genome strain Sakai (29) (SG-1), implicated in the 1996 Japanese outbreak (Table 1) linked to radish sprouts (13). Clade 2, the predominant lineage identified, contains nine SGs and includes strain 93-111 (SG-9) from the 1993 outbreak associated with contaminated hamburgers in western North America (4). Clade 3 consists of seven genotypes and includes the genome strain EDL-933 (30) (SG-12) from the first human O157 outbreak in 1982, linked to hamburgers sold at a fast food restaurant chain in Michigan and Oregon (36). Although these outbreaks representing clades 1, 2, and 3 affected 12,000 people combined, the rates of HUS and hospitalization were low for each (4, 14, 15, 36) compared to the average rates for 350 North American outbreaks (3) (Table 1). Clade 8, in contrast, consists of five SGs that include O157 strains from multistate outbreaks linked to contaminated spinach (37) and lettuce (7) (SG-30) in North America. These 2006 outbreaks caused reportable illnesses in more than 275 patients and resulted in remarkably high rates of severe disease, characterized by hospitalization (average 63%) and HUS (average 13%), an HUS rate three times greater than the average HUS rate for 350 outbreaks (Table 1).
To assess whether the high rates of severe disease associated with the spinach outbreak are attributable to intrinsic differences between the spinach outbreak strain (clade 8) and other previously sequenced strains (e.g., Sakai, clade 1; EDL-933, clade 3), we used massively parallel pyrosequencing (GS 20, 454 Life Sciences, Branford, Conn.) to sequence the genome of a strain (TW14359) linked to the 2006 spinach outbreak. Contig alignment of the spinach outbreak strain to the O157 Sakai genome (29) using MUMmer (38) revealed significant matches to 5,061 (96.3%) of the 5,253 Sakai genes. The spinach strain genome was missing 192 Sakai genes, of which 26 are backbone genes and 166 are genes of prophage and prophage-like elements. For example, the Mu-like phage Sp18 that is integrated into the sorbose operon of the Sakai genome (25) is absent from the spinach strain genome. Alignment to the Sakai pO157 plasmid revealed that 111 of 112 pO157 genes are present in the spinach outbreak strain, suggesting that the plasmid is conserved between the two strains.
Among the 4,103 backbone genes shared by the Sakai and spinach genomes, the average sequence identity is 99.8%; among the 958 island genes shared with Sakai, it is 97.96%. The average sequence identity for all shared genes (n=5,061) is 99.25%. We then compared the conservation of backbone genes and identified 2,741 shared genes with less than 0.5% nucleotide divergence among all three O157 genomes (
To determine whether O157 infections caused by clade 8 pathogens differ in clinical presentation, the inventors examined epidemiological data for all laboratory-confirmed O157 cases (n=333 patients) identified in Michigan since 2001 (40). Univariate (Table 2) and multivariate (Table 7) analyses revealed significant associations between specific O157 clades and both patient symptoms and disease severity. Table 7 shows logistic regression results identifying predictors of hemolytic uremic syndrome (HUS) and of infection with various E. coli O157 clades among the 333 Michigan patients. *: the models used patients without HUS as the reference group and were adjusted for bloody diarrhea, abdominal pain, diarrhea, chills, body aches, hospitalization, age, and gender. †: the models used patients infected with all other clades except clade 9 as the reference group and were adjusted for bloody diarrhea, abdominal pain, diarrhea, chills, body aches, HUS, hospitalization, age, and gender.
Patients infected with clade 8 strains were significantly more likely to be young (ages 0 to 18), and despite the small number of HUS cases identified (n=11), patients with HUS were 7 times more likely to have a clade 8 infection than an infection with strains from clades 1 to 7 combined (
Three HUS patients had infections caused by strains of clade 2, the numerically dominant clade; however, patients with HUS were still more likely to have a clade 8 infection than a clade 2 infection (Tables 2 and 7). In this analysis, the inventors also observed that clade 2 strains were more common in male patients and that clade 7 strains caused less severe disease, as measured by the reported frequencies of bloody diarrhea and other symptoms, although not all of these associations were significant (
Because both the 2006 spinach and lettuce outbreaks were caused by members of the same SG within clade 8, the inventors estimated the frequency of clade 8 over time in an epidemiologically relevant setting. There was a significant increase (Mantel-Haenszel chi-square=32.5, df=1, P<0.0001) in the frequency of disease caused by clade 8 strains among all 444 O157 cases in Michigan (Fig. S2). Specifically, the frequency of clade 8 strains increased from 10% in 2002 to 46% in 2006, despite a steady decrease in the total number of O157 cases identified via surveillance (40) since 2002 (
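A 1-df trend statistic of the kind reported above can be computed from per-case data as M² = (N − 1)·r², where r is the Pearson correlation between an ordinal score (here, year of isolation) and a 0/1 indicator of clade 8 infection. This form is commonly reported as the Mantel-Haenszel chi-square for ordinal data, but the source does not state the exact computation used; the input below is hypothetical, not the Michigan case data.

```python
def mh_trend_chi_square(scores, outcomes):
    """Mantel-Haenszel chi-square for linear trend (1 df): (N - 1) * r**2.

    scores: an ordinal score per case (here, year of isolation)
    outcomes: 1 if the case was a clade 8 infection, else 0
    r is the Pearson correlation between scores and outcomes.
    """
    n = len(scores)
    mean_x = sum(scores) / n
    mean_y = sum(outcomes) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, outcomes))
    sxx = sum((x - mean_x) ** 2 for x in scores)
    syy = sum((y - mean_y) ** 2 for y in outcomes)
    if sxx == 0 or syy == 0:
        raise ValueError("scores and outcomes must both vary")
    return (n - 1) * sxy ** 2 / (sxx * syy)


# Hypothetical cases: two from 2002 (not clade 8) and two from 2006 (clade 8).
m2 = mh_trend_chi_square([2002, 2002, 2006, 2006], [0, 0, 1, 1])  # r = 1, so M2 = 3.0
```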
While the foregoing specification has been described with regard to certain preferred embodiments, and many details have been set forth for the purpose of illustration, it will be apparent to those skilled in the art that the invention may be subject to various modifications and additional embodiments, and that certain of the details described herein can be varied considerably without departing from the spirit and scope of the invention. Such modifications, equivalent variations and additional embodiments are also intended to fall within the scope of the appended claims.
This application claims benefit of provisional application Ser. No. 61/158,633, filed Mar. 9, 2009, entitled “Methods of Detecting and Genotyping Escherichia coli O157:H7”, the entire contents of which are incorporated herein by reference.
This invention was made in part with United States government support awarded by the following agency: National Institutes of Health/NIAID grant number N01 AI30058 and NIH grant AI049353. The United States has certain rights in this invention.
Number | Date | Country
---|---|---
61158633 | Mar 2009 | US